Step 1: Install the jieba library

　　Run the command: pip install jieba

Commonly used jieba functions:

　　jieba supports three word segmentation modes:
　　1. Precise mode: splits the text into words exactly, with no redundant words
　　2. Full mode: scans out every possible word in the text, so the result contains redundancy
　　3. Search engine mode: builds on precise mode by segmenting the long words again

　　Precise mode:

　　>>> import jieba
　　>>> jieba.lcut("中国是一个伟大的国家")  # "China is a great country"
　　Building prefix dict from the default dictionary …
　　Loading model from cache C:\Users\25282\AppData\Local\Temp\jieba.cache
　　Loading model cost 0.869 seconds.
　　Prefix dict has been built succesfully.
　　['中国', '是', '一个', '伟大', '的', '国家']

　　Full mode:

　　>>> jieba.lcut("中国是一个伟大的国家", cut_all=True)  # "China is a great country"
　　['中国', '国是', '一个', '伟大', '的', '国家']

　　Search engine mode:

　　>>> jieba.lcut_for_search("中华人民共和国是伟大的")  # "The People's Republic of China is great"
　　['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']

　　Add new words to the word segmentation dictionary:

　　>>> jieba.add_word("蟒蛇语言")  # "python (the snake) language"
　　>>> jieba.lcut("python是蟒蛇语言")  # "python is the python language"
　　['python', '是', '蟒蛇语言']

jieba application example 1: word frequency statistics for the “Eight Honors and Eight Shames”

jieba application example 2: word frequency statistics for Romance of the Three Kingdoms

　(1) Find the ten most frequent words in the “threekingdoms.txt” file
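
Task (1) can be sketched roughly as follows. This is only a minimal sketch, not the original post's code: the helper names top_words and top_words_in_file are made up for illustration, and it assumes that “threekingdoms.txt” is a UTF-8 encoded Chinese text file. Tokens shorter than two characters are skipped, which drops most punctuation and particles.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs.

    Tokens shorter than two characters (after stripping whitespace) are
    ignored, which removes most punctuation and single-character particles.
    """
    counts = Counter()
    for token in tokens:
        word = token.strip()
        if len(word) >= 2:
            counts[word] += 1
    return counts.most_common(n)


def top_words_in_file(path, n=10, encoding="utf-8"):
    """Segment the file at `path` in precise mode and rank its words.

    The encoding default is an assumption; adjust it to match the file.
    """
    import jieba  # third-party; imported here so top_words works without it

    with open(path, encoding=encoding) as f:
        text = f.read()
    return top_words(jieba.lcut(text), n)
```

With jieba installed, top_words_in_file("threekingdoms.txt") would return a list of up to ten (word, count) pairs, most frequent first.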

　(2) Count how many times names such as “Guan Yu”, “Cao Cao”, “Zhuge Liang” and “Liu Bei” appear in the “threekingdoms.txt” file
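
One possible way to approach task (2) is sketched below. The helper names count_names and count_names_in_file are hypothetical, the file is assumed to be UTF-8 encoded Chinese text, and the alias table is an illustrative assumption (characters in the novel are often referred to by courtesy names), not something taken from the original post.

```python
# Canonical names to count: Guan Yu, Cao Cao, Zhuge Liang, Liu Bei.
NAMES = ("关羽", "曹操", "诸葛亮", "刘备")

# Hypothetical alias table mapping alternative names to a canonical name.
# These entries are assumptions for illustration; extend or change as needed.
ALIASES = {
    "关公": "关羽",
    "云长": "关羽",
    "孟德": "曹操",
    "孔明": "诸葛亮",
    "玄德": "刘备",
}


def count_names(tokens, names=NAMES, aliases=ALIASES):
    """Count occurrences of each canonical name in an already segmented text."""
    counts = dict.fromkeys(names, 0)
    for token in tokens:
        name = aliases.get(token, token)  # map an alias to its canonical name
        if name in counts:
            counts[name] += 1
    return counts


def count_names_in_file(path, encoding="utf-8"):
    """Segment the file at `path` in precise mode and count the names."""
    import jieba  # third-party; imported here so count_names works without it

    with open(path, encoding=encoding) as f:
        text = f.read()
    return count_names(jieba.lcut(text))
```

The counts depend on how jieba segments the text: if a name comes out glued to a neighbouring character, it will not match, so the alias table may need extra entries for such tokens.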
