Definition
Synonyms: words with the same meaning that are matched together in search results. If we enter a word, the results also include its configured synonyms.
Stop words: words that are excluded from the results when searching, for example is, a, are, “的”, “得”, “我” and so on. These words occur many times in a sentence but carry little meaning for search, so they need to be filtered out during word segmentation.
Extension words: additional words that appear in the segmentation result. An extension word can only be the entered text itself or a substring of it. For example, if we enter “Chongqing Kaixianren”, normal segmentation yields “Chongqing”, “Kaixian” and “ren” (人, “person”); after we add “Chongqing Kaixian” as an extension word, the segmentation yields “Chongqing Kaixian”, “Chongqing”, “Kaixian” and “ren”.
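The effect of extension words and stop words on segmentation can be illustrated with a toy sketch. It is not IK Analyzer's actual algorithm: it simply reports every dictionary word found in the input and then drops stop words. The romanized, space-free input string and the function name `segment` are assumptions made for illustration.

```python
def segment(text, dictionary, stop_words=frozenset()):
    """Toy segmenter: return every dictionary word found in `text`,
    ordered by start position (longer matches first), minus stop words."""
    matches = []
    for start in range(len(text)):
        # Try the longest candidate first so longer words are listed first.
        for end in range(len(text), start, -1):
            word = text[start:end]
            if word in dictionary:
                matches.append(word)
    return [w for w in matches if w not in stop_words]


# Hypothetical base dictionary, romanized from the example in the text.
base_dict = {"chongqing", "kaixian", "ren"}
text = "chongqingkaixianren"

# Normal segmentation.
print(segment(text, base_dict))
# With "chongqingkaixian" added as an extension word.
print(segment(text, base_dict | {"chongqingkaixian"}))
# With "kaixian" configured as a stop word.
print(segment(text, base_dict, stop_words={"kaixian"}))
```

The second call shows why an extension word must be a substring of the input: it can only be reported if it actually occurs in the text being segmented.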
Configure synonyms
1. Define the synonym field type text_syn in schema.xml under the conf directory of solr_home:
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
2. Assuming the shortName field in schema.xml (under the conf directory of solr_home) needs synonym support, set its type attribute to the “text_syn” type configured above:
<field name="shortName" type="text_syn" indexed="true" stored="true"/>
3. Add synonyms to synonyms.txt in the conf directory.
A note on the format: in a rule written with =>, the words on the right are the synonyms of the word on the left; separate multiple words with spaces. Also, avoid opening and saving synonyms.txt directly in a plain text editor: if the file is not saved as UTF-8, the Chinese entries you added will not be matched.
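One way to avoid the encoding problem is to write the rules with a script that forces UTF-8. The following is a minimal sketch under stated assumptions: the helper name `append_synonym_rules`, the temporary directory standing in for the conf directory, and the sample rule are all hypothetical.

```python
import tempfile
from pathlib import Path


def append_synonym_rules(path, rules):
    """Append synonym rule lines to `path`, always encoding them as UTF-8."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        for rule in rules:
            f.write(rule + "\n")


# A temporary directory stands in for solr_home/conf in this sketch.
conf_dir = Path(tempfile.mkdtemp())
synonyms_file = conf_dir / "synonyms.txt"

# Hypothetical rule: the word on the right is a synonym of the word on the left.
append_synonym_rules(synonyms_file, ["中国 => 中华"])

# Reading the raw bytes back as UTF-8 confirms the Chinese text survived.
print(synonyms_file.read_bytes().decode("utf-8"))
```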
4. Test synonyms
Enter an English term: (result screenshot not preserved)
Enter a Chinese term: (result screenshot not preserved)
5. Now let us solve the problem seen above. Whichever of two synonymous words we enter, the other one should be matched as well.
First, separate the synonyms in synonyms.txt with English (half-width) commas, then set the expand attribute in the synonym configuration above to true.
We then enter “big” to verify the result (screenshot not preserved).
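The difference between a one-directional => rule and a comma-separated list with expand can be sketched in a few lines. This is a simplified model of how I understand the documented behaviour of Solr's synonym rules (with expand set to true every word in a comma list maps to all of them, with false they are all reduced to the first one), not Solr's implementation. The function name `parse_synonym_rules` and the words "large" and "huge" are assumptions.

```python
def parse_synonym_rules(lines, expand):
    """Build a {token: [replacement tokens]} map from synonym rule lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit rule: only the left-hand words are rewritten.
            left, right = line.split("=>", 1)
            sources = [w.strip() for w in left.split(",") if w.strip()]
            targets = [w.strip() for w in right.split(",") if w.strip()]
        else:
            # Equivalence list: behaviour depends on `expand`.
            sources = [w.strip() for w in line.split(",") if w.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


# One-directional rule: "large" itself has no mapping.
print(parse_synonym_rules(["big => large"], expand=False))
# Comma-separated list with expand=true: every word maps to all of them.
print(parse_synonym_rules(["big,large,huge"], expand=True))
```

In this model, the comma-separated form with expand enabled is what makes the match symmetric, which is the problem step 5 sets out to solve.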
Configure stop words and the extension dictionary
1. Copy stopword.dic and IKAnalyzer.cfg.xml from the extracted IKAnalyzer folder to tomcat/webapps/solr/WEB-INF/classes, and create a new file ext.dic there in the same format as stopword.dic.
2. Modify IKAnalyzer.cfg.xml as follows. Multiple stop-word or extension dictionary files can be configured, separated by semicolons.
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">ext.dic;</entry>
  <entry key="ext_stopwords">english_stopword.dic;stopword.dic</entry>
</properties>
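To double-check which dictionary files a configuration like this refers to, the entries can be read with a short script. This is only a sketch: it parses an inline copy of the configuration rather than the real file, and the function name `configured_dictionaries` is an assumption.

```python
import xml.etree.ElementTree as ET

# Inline copy of the configuration discussed in the text.
CFG = """<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">ext.dic;</entry>
  <entry key="ext_stopwords">english_stopword.dic;stopword.dic</entry>
</properties>"""


def configured_dictionaries(cfg_xml):
    """Return {entry key: [dictionary file names]} from the config text."""
    root = ET.fromstring(cfg_xml)
    result = {}
    for entry in root.findall("entry"):
        # File names are separated by semicolons; ignore empty pieces.
        names = [n.strip() for n in (entry.text or "").split(";") if n.strip()]
        result[entry.get("key")] = names
    return result


print(configured_dictionaries(CFG))
```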
When we enter “Chongqing Kaixian”, the default segmentation yields only “Chongqing” and “Kaixian”.
After adding “Chongqing Kaixian” to ext.dic, the result also contains the whole phrase “Chongqing Kaixian”.
Likewise, for the input “Chongqing Kaixian” the default segmentation yields “Chongqing” and “Kaixian”.
After adding “Kaixian” to stopword.dic, “Kaixian” is filtered out of the result.
Note
If a field needs word segmentation, stop words, or extension words, set the field's type attribute to the corresponding analyzed field type when declaring the field in schema.xml. Here that type is text_ik, for example:
<field name="companyName" type="text_ik" indexed="false" stored="true" multiValued="false"/>