Configuring synonyms, stop words, and extension dictionaries in Solr (using the IK tokenizer as an example)

Definition

Synonyms: words with the same meaning as the search term that also appear in search results. For example, if we enter “okay”, the results also include matches for its synonyms.

Stop words: words that do not appear in the results when searching, for example is, a, are, “的”, “得”, “我” and so on. These words occur many times in a sentence but carry little meaning, so they need to be filtered out during word segmentation.

Extension words: additional words that appear in search results. An extension word can only be the entered text itself or a substring of it. For example, if we enter “Chongqing Kaixianren”, normal segmentation produces “Chongqing”, “Kaixian”, and “ren” (人, "person"); after we add “Chongqing Kaixian” as an extension word, segmentation produces “Chongqing Kaixian”, “Chongqing”, “Kaixian”, and “ren”.

Configure synonyms

1. Define a synonym field type, text_syn, in schema.xml under the conf directory of solr_home:

<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

2. Suppose the shortName field in schema.xml (in the conf directory of solr_home) needs synonym support. Set its type attribute to the “text_syn” type defined above:

<field name="shortName" type="text_syn" indexed="true" stored="true"/>

3. Add synonyms to synonyms.txt in the conf directory, for example:

[Screenshot: entries in synonyms.txt, with one => entry marked by a red box]

Two more remarks. In the red box above, the words to the right of => are synonyms of the word on the left, with multiple words separated by spaces. Also, it is best not to edit synonyms.txt directly in a plain text editor: if the file is not saved as UTF-8, the Chinese words you add will not be found.
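One way to sidestep the encoding problem is to write the file from a script that sets the encoding explicitly. The following is a minimal sketch, not part of the original setup: the helper name write_synonyms, the temporary target path, and the sample entries are all hypothetical and only illustrate the idea.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_synonyms(path: Path, rules: Iterable[str]) -> None:
    """Write one synonym rule per line, always encoded as UTF-8."""
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for rule in rules:
            handle.write(rule + "\n")


# Hypothetical entries, for illustration only.
SAMPLE_RULES = [
    "开心 => 高兴",
    "happy, glad",
]

if __name__ == "__main__":
    # A temporary file stands in for conf/synonyms.txt.
    target = Path(tempfile.mkdtemp()) / "synonyms.txt"
    write_synonyms(target, SAMPLE_RULES)
    print(target.read_text(encoding="utf-8"))
```

Reading the file back with the same explicit encoding, as the last line does, is a quick check that the Chinese entries survived the round trip.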

4. Test synonyms

Enter “happy”:

[Screenshot: analysis result]

Enter a Chinese term:

[Screenshot: analysis result]

5. Based on the problem seen above, let’s look at how to solve it. After all, whichever word of a synonym pair we enter, the corresponding synonyms should be found.

First, separate the synonyms in synonyms.txt with ASCII (half-width) commas, then set the expand attribute of the synonym filter configured above to true.

We enter “big”, and the result is as follows:

[Screenshot: analysis result]
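To make the difference between a one-way => entry and a comma-separated entry with expand="true" concrete, here is a toy model in Python. It is a sketch of the rule semantics described in Solr's synonym filter documentation, not Solr's actual implementation; the function names and the sample rule "big, large" are assumptions made for illustration.

```python
def parse_rules(lines, expand):
    """Build a term -> replacement-list mapping from Solr-style synonym rules."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit one-way mapping: left-hand terms are replaced by right-hand terms.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalence list: with expand, every term maps to all terms;
            # without it, every term collapses to the first one.
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


def apply_synonyms(tokens, mapping):
    """Replace each token by its mapped terms; unmapped tokens pass through."""
    result = []
    for token in tokens:
        result.extend(mapping.get(token, [token]))
    return result


if __name__ == "__main__":
    one_way = parse_rules(["big => large"], expand=False)
    two_way = parse_rules(["big, large"], expand=True)
    for word in ("big", "large"):
        print(word, apply_synonyms([word], one_way), apply_synonyms([word], two_way))
```

In this model the one-way rule changes only the word on the left of =>, while the comma-separated rule with expand enabled maps either word to both, which is the behavior step 5 is after.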

Configure stop words and extension dictionaries

1. Copy stopword.dic and IKAnalyzer.cfg.xml from the unpacked IKAnalyzer folder to tomcat/webapps/solr/WEB-INF/classes, and create a new ext.dic there in the same format as stopword.dic.

2. Modify IKAnalyzer.cfg.xml as shown below. Multiple stop word or extension dictionary files can be listed, separated by semicolons.

<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">ext.dic;</entry>
  <entry key="ext_stopwords">english_stopword.dic;stopword.dic</entry>
</properties>
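As a quick sanity check of such a file, the entries can be read back and split on semicolons. The sketch below uses Python's standard xml.etree.ElementTree; the function name dictionary_files is hypothetical, and the semicolon splitting mirrors the list format shown above rather than quoting IK's own loader.

```python
import xml.etree.ElementTree as ET

# Same structure as the configuration shown above.
CFG = """<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">ext.dic;</entry>
  <entry key="ext_stopwords">english_stopword.dic;stopword.dic</entry>
</properties>"""


def dictionary_files(cfg_xml):
    """Return {entry key: [file names]} for every <entry> element."""
    root = ET.fromstring(cfg_xml)
    files = {}
    for entry in root.findall("entry"):
        value = entry.text or ""
        # A trailing semicolon yields an empty piece, which is dropped here.
        files[entry.get("key")] = [p.strip() for p in value.split(";") if p.strip()]
    return files


if __name__ == "__main__":
    for key, names in dictionary_files(CFG).items():
        print(key, names)
```

A malformed file (for example an unclosed entry tag) makes ET.fromstring raise a ParseError, which is a cheap way to catch copy-and-paste damage before restarting Solr.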

When “Chongqing Kaixian” is entered, normal segmentation yields only “Chongqing” and “Kaixian”.
After adding “Chongqing Kaixian” to ext.dic, the test result is:
[Screenshot: segmentation result]

When “Chongqing Kaixian” is entered, normal segmentation yields only “Chongqing” and “Kaixian”.
After adding “Kaixian” to stopword.dic, the test result is:
[Screenshot: segmentation result]
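The effect of the two dictionaries can be imitated with a small toy segmenter. This is only a sketch under stated assumptions: it is not IK's algorithm, it simply reports every dictionary word found in the input, and it uses unspaced romanized strings as a stand-in for unsegmented Chinese text. The function name segment and the three-word base dictionary are hypothetical.

```python
def segment(text, dictionary, stopwords=frozenset()):
    """Return every dictionary word found in text, scanning left to right
    and trying the longest candidate first at each position."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            word = text[start:end]
            if word in dictionary and word not in stopwords:
                tokens.append(word)
    return tokens


# Hypothetical base dictionary standing in for IK's built-in word list.
BASE = {"chongqing", "kaixian", "ren"}

if __name__ == "__main__":
    text = "chongqingkaixianren"
    print(segment(text, BASE))
    # Adding an extension word makes the longer match appear as well.
    print(segment(text, BASE | {"chongqingkaixian"}))
    # A stop word is dropped from the output.
    print(segment("chongqingkaixian", BASE, stopwords={"kaixian"}))
```

In this model an extension word only adds tokens and a stop word only removes them, which matches the roles the two dictionary files play in the tests above.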

Note

For a field to use word segmentation, stop words, or extension words, set its type attribute to the segmenting field type when you declare the field in schema.xml. Here that type is text_ik, for example:

<field name="companyName" type="text_ik" indexed="false" stored="true" multiValued="false"/>
