Step 1: install the jieba library
Run the command: pip install jieba
Commonly used jieba functions:
jieba supports three word segmentation modes:
1. Precise mode: splits the text into words exactly, with no redundant words
2. Full mode: scans out every possible word in the text, so the result contains redundancy
3. Search engine mode: builds on precise mode by splitting long words again
Precise mode (the sample sentence 中国是一个伟大的国家 means "China is a great country"):
>>> import jieba
>>> jieba.lcut("中国是一个伟大的国家")
Building prefix dict from the default dictionary …
Loading model from cache C:\Users\25282\AppData\Local\Temp\jieba.cache
Loading model cost 0.869 seconds.
Prefix dict has been built succesfully.
['中国', '是', '一个', '伟大', '的', '国家']
Full mode:
>>> jieba.lcut("中国是一个伟大的国家", cut_all=True)
['中国', '国是', '一个', '伟大', '的', '国家']
Search engine mode (中华人民共和国是伟大的 means "The People's Republic of China is great"):
>>> jieba.lcut_for_search("中华人民共和国是伟大的")
['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']
Add a new word to the segmentation dictionary (蟒蛇语言 means "Python language"):
>>> jieba.add_word("蟒蛇语言")
>>> jieba.lcut("python是蟒蛇语言")
['python', '是', '蟒蛇语言']
jieba application example 1: counting the words that appear in the "Eight Honors and Eight Shames"
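The text for this example is not reproduced here, so the following is only a minimal sketch of how such a count could be done. It assumes the text is already available as a Python string. The helper names `count_words` and `segment_and_count` are hypothetical, and skipping tokens shorter than two characters is an assumption made to filter out particles, punctuation and whitespace.

```python
from collections import Counter


def count_words(tokens, min_length=2):
    """Count tokens, ignoring those shorter than min_length characters."""
    # strip() makes whitespace-only tokens (spaces, newlines) count as length 0
    return Counter(t for t in tokens if len(t.strip()) >= min_length)


def segment_and_count(text, min_length=2):
    """Segment text with jieba in precise mode and count the resulting words."""
    import jieba  # imported here so count_words can be used without jieba

    return count_words(jieba.lcut(text), min_length)
```

Calling `segment_and_count(text).most_common(10)` would then return the ten most frequent words with their counts.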
jieba application example 2: word statistics for Romance of the Three Kingdoms
(1) Find the ten most frequent words in the "threekingdoms.txt" file
(2) Count how many times the names "Guan Yu", "Cao Cao", "Zhuge Liang" and "Liu Bei" appear in the "threekingdoms.txt" file
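Both tasks can be approached with one frequency count. The following is a sketch under stated assumptions, not the original solution. It assumes "threekingdoms.txt" is a UTF-8 file in the working directory. The `ALIASES` table, which folds alternative names into one character, is illustrative and far from complete, and the helper names are hypothetical. Whether jieba returns each name as a single token depends on its dictionary, so the counts should be checked against the actual output.

```python
from collections import Counter

# Illustrative alias table (an assumption, not exhaustive): alternative
# names that should be counted under one character.
ALIASES = {
    "孔明": "诸葛亮",  # Kongming -> Zhuge Liang
    "玄德": "刘备",    # Xuande -> Liu Bei
    "云长": "关羽",    # Yunchang -> Guan Yu
    "孟德": "曹操",    # Mengde -> Cao Cao
}

# Guan Yu, Cao Cao, Zhuge Liang, Liu Bei
NAMES = ["关羽", "曹操", "诸葛亮", "刘备"]


def count_words(tokens, aliases=None):
    """Count tokens of two or more characters, merging aliases."""
    aliases = aliases or {}
    counts = Counter()
    for token in tokens:
        if len(token.strip()) < 2:
            continue  # skip single characters, punctuation and whitespace
        counts[aliases.get(token, token)] += 1
    return counts


def analyze(path="threekingdoms.txt", encoding="utf-8"):
    """Return (top ten words, counts for the names in NAMES) for the file."""
    import jieba  # imported here so count_words can be used without jieba

    with open(path, encoding=encoding) as f:
        text = f.read()
    counts = count_words(jieba.lcut(text), ALIASES)
    top_ten = counts.most_common(10)
    name_counts = {name: counts[name] for name in NAMES}
    return top_ten, name_counts
```

Because a `Counter` returns 0 for a missing key, a name that never appears as a token is reported with a count of 0 instead of raising an error.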