Common methods of the jieba library

Step 1: install the jieba library

  Run this command in a terminal: pip install jieba

Commonly used jieba functions:

  jieba offers three segmentation modes:
  1. Precise mode: splits the text into words exactly, with no redundancy.
  2. Full mode: lists every possible word in the text, so the output is redundant (words may overlap).
  3. Search engine mode: starts from precise mode and splits long words again into shorter ones.

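  Precise mode, applied to 中国是一个伟大的国家 ("China is a great country"). The first call loads the dictionary, which prints the log lines shown here:

  >>> import jieba
  >>> jieba.lcut("中国是一个伟大的国家")
  Building prefix dict from the default dictionary ...
  Loading model from cache C:\Users\25282\AppData\Local\Temp\jieba.cache
  Loading model cost 0.869 seconds.
  Prefix dict has been built succesfully.
  ['中国', '是', '一个', '伟大', '的', '国家']

  (中国 China, 是 is, 一个 a, 伟大 great, 的 possessive particle, 国家 country)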

  Full mode, applied to 中国是一个伟大的国家. The extra word 国是 ("affairs of state") overlaps 中国 and 是, which is the redundancy described above:

  >>> jieba.lcut("中国是一个伟大的国家", cut_all=True)
  ['中国', '国是', '一个', '伟大', '的', '国家']

  Search engine mode, applied to 中华人民共和国是伟大的 ("The People's Republic of China is great"). The long word 中华人民共和国 is returned together with the shorter words inside it:

  >>> jieba.lcut_for_search("中华人民共和国是伟大的")
  ['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']

  Add a new word to the segmentation dictionary with jieba.add_word(). Once 蟒蛇语言 ("python language", 蟒蛇 being the snake) is added, it is kept as a single word:

  >>> jieba.add_word("蟒蛇语言")
  >>> jieba.lcut("python是蟒蛇语言")
  ['python', '是', '蟒蛇语言']

Application example 1 of the jieba library: counting the words in the "Eight Honors and Eight Shames"
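
  A minimal sketch of such a word count, assuming the text has already been segmented with jieba.lcut(). The function name word_frequencies and the PUNCTUATION set are illustrative choices for this sketch, not part of jieba:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative punctuation/whitespace set (an assumption of this sketch);
# extend it to match the text being counted.
PUNCTUATION = set("，。、；：！？“”‘’（）,.;:!?()\"' \n\t\u3000")


def word_frequencies(tokens: Iterable[str]) -> List[Tuple[str, int]]:
    """Count tokens (e.g. the list returned by jieba.lcut) by frequency.

    Tokens made up only of whitespace or punctuation are skipped.
    """
    counts = Counter(
        t for t in tokens
        if t.strip() and not set(t) <= PUNCTUATION
    )
    # most_common() with no argument returns every (word, count) pair,
    # highest count first; equal counts keep first-seen order.
    return counts.most_common()
```

  Usage, where text is a string holding the passage:

  >>> import jieba
  >>> word_frequencies(jieba.lcut(text))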

Application example 2 of the jieba library: word statistics for Romance of the Three Kingdoms

  (1) Find the ten most frequent words in the file "threekingdoms.txt"
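
  A minimal sketch of this task. It assumes the file is UTF-8 encoded and that single-character tokens (mostly particles and punctuation) should be ignored. The tokenizer is passed in as the segment parameter, normally jieba.lcut; top_words_in_file is a hypothetical name:

```python
from collections import Counter
from pathlib import Path
from typing import Callable, List, Tuple


def top_words_in_file(path: str,
                      segment: Callable[[str], List[str]],
                      n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words of two or more characters in a UTF-8 file.

    `segment` is the tokenizer, normally jieba.lcut; taking it as a
    parameter is an assumption of this sketch, so the counting logic
    does not depend on how the text is segmented.
    """
    text = Path(path).read_text(encoding="utf-8")
    tokens = (t.strip() for t in segment(text))
    # Single-character tokens are mostly particles and punctuation
    # (的, 了, ，, 。), so only words of two or more characters are counted.
    counts = Counter(t for t in tokens if len(t) >= 2)
    return counts.most_common(n)
```

  Usage, printing each word left-aligned and its count right-aligned:

  >>> import jieba
  >>> for word, count in top_words_in_file("threekingdoms.txt", jieba.lcut):
  ...     print(f"{word:<10}{count:>5}")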

  (2) Count how many times names such as "Guan Yu", "Cao Cao", "Zhuge Liang" and "Liu Bei" appear in the file "threekingdoms.txt"
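
  The novel often refers to characters by other names, for example 孔明 (Kongming) for Zhuge Liang, 玄德 (Xuande) for Liu Bei, 云长 (Yunchang) for Guan Yu and 孟德 (Mengde) for Cao Cao, so a count is more useful if those forms are merged. A minimal sketch, assuming the text has been segmented with jieba.lcut(); the ALIASES table and the count_names function are hypothetical and cover only a few forms:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical alias table: each key is a form that may appear in the text,
# each value is the canonical name it is counted under. Extend as needed.
ALIASES: Dict[str, str] = {
    "关羽": "关羽", "关公": "关羽", "云长": "关羽",
    "曹操": "曹操", "孟德": "曹操",
    "诸葛亮": "诸葛亮", "孔明": "诸葛亮",
    "刘备": "刘备", "玄德": "刘备",
}


def count_names(tokens: Iterable[str],
                aliases: Dict[str, str] = ALIASES) -> List[Tuple[str, int]]:
    """Count mentions of each character, merging aliases into one name.

    Tokens that are not keys of `aliases` are ignored.
    """
    counts = Counter(aliases[t] for t in tokens if t in aliases)
    # Highest count first; equal counts keep first-seen order.
    return counts.most_common()
```

  Usage; if jieba splits a name it does not know, adding it with jieba.add_word() before segmenting should keep it whole:

  >>> import jieba
  >>> from pathlib import Path
  >>> text = Path("threekingdoms.txt").read_text(encoding="utf-8")
  >>> count_names(jieba.lcut(text))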
