Common methods of the jieba library

Step 1: install the jieba library

  Enter the command: pip install jieba

Commonly used functions of the jieba library:

  jieba supports three word segmentation modes:
  1. Precise mode: splits the text exactly, with no redundant words
  2. Full mode: scans out every possible word in the text, so the result contains redundancy
  3. Search engine mode: on top of precise mode, splits long words again

  Precise mode (the sample sentence means "China is a great country"):

  >>> import jieba
  >>> jieba.lcut("中国是一个伟大的国家")
  Building prefix dict from the default dictionary …
  Loading model from cache C:\Users\25282\AppData\Local\Temp\jieba.cache
  Loading model cost 0.869 seconds.
  Prefix dict has been built succesfully.
  ['中国', '是', '一个', '伟大', '的', '国家']

  Full mode:

  >>> jieba.lcut("中国是一个伟大的国家", cut_all=True)
  ['中国', '国是', '一个', '伟大', '的', '国家']

  Search engine mode (the sample sentence means "The People's Republic of China is great"):

  >>> jieba.lcut_for_search("中华人民共和国是伟大的")
  ['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']

  Add a new word to the segmentation dictionary:

  >>> jieba.add_word("蟒蛇语言")
  >>> jieba.lcut("python是蟒蛇语言")
  ['python', '是', '蟒蛇语言']

jieba application example 1: counting the words that appear in the "Eight Honors and Eight Shames"
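
A word count of this kind could be sketched as follows. This is only a minimal sketch, not the original code from the post: the function names `count_tokens` and `count_words` are made up for illustration, and skipping tokens shorter than two characters (which drops punctuation and single-character particles) is an assumption. Only `jieba.lcut` comes from the jieba library itself.

```python
from collections import Counter


def count_tokens(tokens, min_length=2):
    """Count tokens, ignoring surrounding whitespace and short tokens.

    min_length=2 is an assumption: it filters out punctuation and
    single-character words, which would otherwise dominate the counts.
    """
    counts = Counter()
    for token in tokens:
        token = token.strip()
        if len(token) >= min_length:
            counts[token] += 1
    return counts


def count_words(text):
    """Segment text in precise mode with jieba, then count the words."""
    import jieba  # imported here so count_tokens can be used without jieba

    return count_tokens(jieba.lcut(text))
```

Calling `count_words(text).most_common()` on the text to be analysed returns the words sorted from most to least frequent.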

jieba application example 2: word statistics for the Romance of the Three Kingdoms

 (1) Find the ten most frequent words in the “threekingdoms.txt” file
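
One possible way to do this is sketched below. The helper names `top_words` and `top_words_in_file` are hypothetical, and the sketch assumes the file is UTF-8 encoded and that words of a single character should be ignored. Neither assumption is confirmed by the original post.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs.

    Tokens shorter than two characters are skipped (an assumption,
    to keep punctuation and particles out of the ranking).
    """
    counts = Counter(t for t in tokens if len(t.strip()) >= 2)
    return counts.most_common(n)


def top_words_in_file(path="threekingdoms.txt", n=10, encoding="utf-8"):
    """Read the file, segment it in precise mode, and rank the words."""
    import jieba  # imported here so top_words can be used without jieba

    with open(path, encoding=encoding) as f:
        text = f.read()
    return top_words(jieba.lcut(text), n)
```

With the file in the working directory, `top_words_in_file()` returns a list of ten `(word, count)` pairs.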

 (2) Count how many times names such as “Guan Yu”, “Cao Cao”, “Zhuge Liang” and “Liu Bei” appear in the “threekingdoms.txt” file
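
A rough sketch of such a name count follows. The names `NAMES`, `count_names` and `count_names_in_file` are hypothetical, the Chinese spellings of the four names and the UTF-8 file encoding are assumptions, and a name is only counted when jieba returns it as a whole token.

```python
from collections import Counter

# Guan Yu, Cao Cao, Zhuge Liang, Liu Bei (assumed Chinese spellings)
NAMES = ("关羽", "曹操", "诸葛亮", "刘备")


def count_names(tokens, names=NAMES):
    """Count how often each name occurs as a whole token."""
    counts = Counter(t for t in tokens if t in names)
    # Counter returns 0 for names that never occur
    return {name: counts[name] for name in names}


def count_names_in_file(path="threekingdoms.txt", names=NAMES, encoding="utf-8"):
    """Read the file, segment it in precise mode, and count the names."""
    import jieba  # imported here so count_names can be used without jieba

    with open(path, encoding=encoding) as f:
        return count_names(jieba.lcut(f.read()), names)
```

This sketch does not merge alternative names for the same character, for example courtesy names, so the totals per character may be lower than a reader expects.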
