Open Source Search Engine Tool

Search engine tools

graph TB
A((Lucene))
A --> B[Solr]
A --> C[ElasticSearch]
A --> H[Java]
B --> D[RESTful & Admin]
B --> E[Core & Field]
B --> F[IKAnalyzer]
C --> F
C --> G[Kibana]


Lucene

Lucene (['lu:si:n]) is an open source full-text indexing toolkit written in Java.

The content of this article is derived from how2j

Sample code:

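import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

// The statements below are meant to run inside a method declared with "throws Exception".

// 1. Prepare the Chinese word segmenter
IKAnalyzer analyzer = new IKAnalyzer();

// 2. Use test data to create an in-memory index
List<String> productNames = new ArrayList<>();
productNames.add("Philips led bulb e27 screw-mouth warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb");
productNames.add("Philips led bulb e14 screw candle bulb 3W pointed bubble tailed energy-saving bulb warm yellow light source Lamp");
productNames.add("NVC lighting LED bulb e27 large screw socket energy-saving lamp 3W bulb lamp Lamp led energy-saving bulb");
productNames.add("Philips led bulb e27 screw socket household 3W warm white ball Bulb energy saving lamp 5W bulb LED single lamp 7w");
productNames.add("Philips led small bulb e14 screw 4.5w transparent LED energy saving bulb lighting source lamp single lamp");
productNames.add("Philips dandelion eye protection desk lamp work, study and reading energy-saving lamps 30508 with light source");
productNames.add("op lighting led bulb candle energy-saving bulb e14 screw bulb lamp super bright lighting single light source");
productNames.add("Opu lighting led bulb energy saving bulb super bright light source e14e27 spiral screw mouth small bulb warm yellow household");
productNames.add("Polyop lighting led bulb energy-saving bulb e27 screw bulb household led lighting single lamp super bright light source");
// Directory index = createIndex(analyzer, productNames);
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(index, config);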

// Add one document per product name, stored in the "name" field
for (String name : productNames) {
    Document doc = new Document();
    doc.add(new TextField("name", name, Field.Store.YES));
    writer.addDocument(doc);
}
writer.close();

// 3. Build the query
String keyword = "eye protection with light source";
Query query = new QueryParser("name", analyzer).parse(keyword);

// 4. Search
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
int numberPerPage = 1000;
System.out.printf("Currently there are %d pieces of data%n", productNames.size());
System.out.printf("The query keyword is:\"%s\"%n", keyword);
System.out.printf("Query string word segmentation result:\"%s\"%n", query);
ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;

// 5. Display query results: rank, score, then every stored field
for (int i = 0; i < hits.length; i++) {
    ScoreDoc scoreDoc = hits[i];
    int docId = scoreDoc.doc;
    Document d = searcher.doc(docId);
    List<IndexableField> fields = d.getFields();
    System.out.print((i + 1));
    System.out.print("\t" + scoreDoc.score);
    for (IndexableField f : fields) {
        System.out.print("\t" + d.get(f.name()));
    }
    System.out.println();
}
// 6. Close the reader
reader.close();

Execution result 1:

The query keyword is: "eye protection with light source"
Query string word segmentation result: "name:eye protection name:with name:light source"
1 5.159822 Philips dandelion eye protection desk lamp work, study and reading energy-saving lamps 30508 with light source
2 0.43331528 Op lighting led light bulb candle energy-saving light bulb e14 screw ball Bulb super bright lighting single light source
3 0.425806 Philips led bulb e14 screw candle bulb 3W pointed bulb tailed energy-saving bulb warm yellow light source Lamp
4 0.425806 Philips led bulb e14 screw 4.5w transparent LED energy-saving bulb lighting source lamp single lamp
5 0.425806 Op lighting led light bulb energy-saving bulb super bright light source e14e27 spiral screw small bulb warm yellow household
6 0.425806 Poly Op lighting led light bulb energy-saving bulb e27 screw Bulb household led lighting single lamp super bright light source

Execution result 2:

The query keyword is: "led yellow"
Query string word segmentation result: "name:led name:yellow"
1 0.22168216 Philips led small bulb e14 screw 4.5w transparent LED energy-saving bulb lighting source lamp single lamp
2 0.22168216 Polyop lighting led bulb energy-saving bulb e27 screw bulb household led lighting single lamp super bright light source
3 0.21906272 NVC lighting LED bulb e27 large screw energy saving lamp 3W bulb led energy saving bulb
4 0.20684227 Philips led bulb e27 screw mouth household 3w warm white bulb energy saving lamp 5W bulb LED single lamp 7w
5 0.1634743 Philips led bulb e27 screw-mouth warm white bulb home lighting super bright energy-saving bulb to color temperature bulb
6 0.1634743 Op lighting led bulb candle energy-saving bulb e14 screw-mouth bulb super bright lighting single light source
7 0.16064131 Philips led light bulb e14 screw candle light bulb 3W pointed bubble pull tail energy-saving light bulb warm yellow light source Lamp
8 0.16064131 Op lighting led light bulb energy-saving light bulb super bright light source e14e27 spiral screw light bulb warm yellow home

Execution result 3:

The query keyword is: "led yellow"
Query string word segmentation result: "name:led name:黄"
1 2.0358434 OP Lighting led bulb energy saving bulb super bright light source e14e27 spiral screw small bulb warm yellow household
2 0.22168216 Philips led small bulb e14 screw 4.5w transparent led energy saving bulb lighting source lamp single lamp
3 0.22168216 Juop lighting led bulb energy-saving bulb e27 screw bulb household led lighting single lamp super bright light source
4 0.21906272 NVC lighting LED bulb e27 large screw energy-saving lamp 3W bulb lamp Lamp led energy-saving bulb
5 0.20684227 Philips led bulb e27 screw socket household 3w warm white bulb energy-saving lamp 5W bulb LED single lamp 7w
6 0.1634743 Philips led bulb e27 screw socket warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb
7 0.1634743 Op lighting led bulb candle energy-saving bulb e14 screw bulb lamp super bright single light source
8 0.16064131 Philips led bulb e14 screw candle bulb 3W pointed bulb energy-saving bulb warm yellow light source Lamp

Compare execution results 2 and 3. The two search intents are nearly identical, yet Lucene returned very different results. A real search engine takes the user's search intent into account. For example, it may automatically rewrite the user's query string into variants with similar semantics (intent), run those queries as well, and then show the user the merged result.
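One way to picture that kind of rewriting is to expand each term of the keyword with near-synonyms before it reaches the query parser. The sketch below is only an illustration in plain Java: the class `QueryExpander`, its `expand` method, and the synonym table are all hypothetical, and real engines derive such rewrites from much richer signals than a fixed table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QueryExpander {

    // Hypothetical synonym table; a real system would learn or load this.
    private final Map<String, List<String>> synonyms;

    public QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Rewrites each whitespace-separated term into "(term OR synonym ...)"
    // when synonyms are known, and leaves the term unchanged otherwise.
    public String expand(String keyword) {
        List<String> groups = new ArrayList<>();
        for (String term : keyword.trim().split("\\s+")) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(term);
            alternatives.addAll(synonyms.getOrDefault(term, List.of()));
            groups.add(alternatives.size() == 1
                    ? term
                    : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        // Assumed mapping, for illustration only
        QueryExpander expander = new QueryExpander(Map.of("yellow", List.of("amber")));
        System.out.println(expander.expand("led yellow")); // led (yellow OR amber)
    }
}
```

The expanded string uses the grouping and OR syntax of Lucene's classic query parser, so it could be passed to the same `QueryParser` call as the original keyword.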

Solr

Solr (['səulə]) is an open source search platform for building search applications. It is built on Lucene, a full-text search engine library.

Solr exposes a RESTful-style HTTP API so that other systems can communicate with it, and it provides a web Admin interface that makes it convenient to operate and manage.

IKAnalyzer

The default Solr installation package does not include a Chinese word segmenter. Without one, Solr breaks Chinese sentences, paragraphs, or articles into individual characters, so word-level retrieval of Chinese text cannot be achieved. IKAnalyzer is a third-party Chinese word segmenter.

Take "研究生命科学" ("research life science") as an example. With the default Solr word segmenter, the result is 6 independent Chinese characters:

研|究|生|命|科|学

With IKAnalyzer as the tokenizer, the result is 5 Chinese words:

研究生|研究|生命科学|生命|科学 (graduate student | research | life science | life | science)
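The difference between the two results can be sketched with a toy dictionary lookup. The code below is not IKAnalyzer's actual algorithm. It is a minimal illustration that assumes the example sentence is 研究生命科学 and uses a hypothetical five-word dictionary: one method splits the text into single characters, and the other emits every dictionary word found at each position, longest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SegmentationSketch {

    // Splits the text into single characters, like a segmenter with no dictionary
    static List<String> singleCharacters(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // At each start position, emits every dictionary word beginning there, longest first
    static List<String> dictionaryWords(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int longest = Math.min(maxWordLength, text.length() - start);
            for (int len = longest; len >= 1; len--) {
                String candidate = text.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "研究生命科学";
        // Hypothetical dictionary, chosen only to cover this one sentence
        Set<String> dictionary = Set.of("研究生", "研究", "生命科学", "生命", "科学");
        System.out.println(String.join("|", singleCharacters(text)));
        // 研|究|生|命|科|学
        System.out.println(String.join("|", dictionaryWords(text, dictionary, 4)));
        // 研究生|研究|生命科学|生命|科学
    }
}
```

Overlapping words such as 研究生 and 研究 are both kept, which is why a query for either word can match the same document.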

Concept

  • Core

    If Solr is equivalent to a database, then Core is equivalent to a table

  • Field: the field name must be specified when querying

    A Core is equivalent to a table, so the next step is to define fields for that table to store data. When querying Solr, specify both the field to search and the query term, for example: SolrUtil.query("name:小米电视片",0,10) or SolrUtil.query("category:Home appliances",0,10)
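SolrUtil is a helper class from the original tutorial, and its implementation is not shown here. As a rough idea of what such a field query looks like over HTTP, the sketch below builds a URL for Solr's /select handler with the `q`, `start`, and `rows` parameters. The class `SolrSelectUrl` and its `build` method are hypothetical, the host address is an assumption, and the port 8983 and core name how2java are taken from the commands listed in this article.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrSelectUrl {

    // Builds a /select URL: "field:term" as the query, plus paging parameters
    static String build(String baseUrl, String core, String field, String term,
                        int start, int rows) {
        String q = URLEncoder.encode(field + ":" + term, StandardCharsets.UTF_8);
        return baseUrl + "/solr/" + core + "/select?q=" + q
                + "&start=" + start + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; adjust host, port, and core as needed
        System.out.println(build("http://127.0.0.1:8983", "how2java", "name", "led", 0, 10));
        // http://127.0.0.1:8983/solr/how2java/select?q=name%3Aled&start=0&rows=10
    }
}
```

Encoding the query value matters because the colon and any non-ASCII characters in the term are not safe to place in a URL unescaped.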

Common commands (Windows)

  • solr.cmd start
  • solr.cmd stop -all
  • solr.cmd create -c how2java -p 8983 (create a core)
  • solr.cmd delete -c core1

ElasticSearch

Like Solr, ElasticSearch is built on Lucene, and it provides more convenient ways to access and call it.

Like Solr, ES does not ship with a Chinese word segmenter by default and requires a third-party tool. The installation command:

elasticsearch-plugin.bat install file:.\elasticsearch-analysis-ik-6.2.2.zip

Kibana

Kibana is an open source analysis and visualization platform designed to work with Elasticsearch, used for analyzing the data stored in ES. Kibana also provides Dev Tools, which let users send RESTful-style requests to the ES service.

Default access address: 127.0.0.1:5601

Index

An index is comparable to a database on a database server, so an index can be regarded as a database within Elasticsearch.

Use RESTful style to control ES

PUT /how2java?pretty (add an index)
GET /_cat/indices?v (list indices)
DELETE /how2java?pretty (delete an index)

// insert a piece of data into product
PUT /how2java/product/1?pretty
{
  "name": "Candle"
}
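The same insert request can also be issued from application code rather than from Kibana. The sketch below uses the JDK's built-in HTTP client types (Java 11 or later) to build the PUT request. The class `EsRequestSketch` and its method are hypothetical, and the address 127.0.0.1:9200 is an assumption based on Elasticsearch's default HTTP port. The /index/type/id path follows the 6.x style used in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EsRequestSketch {

    // Assumed address: a local Elasticsearch node on its default HTTP port
    static final String ES_BASE = "http://127.0.0.1:9200";

    // Builds (but does not send) the PUT request that stores one document
    static HttpRequest putDocument(String index, String type, String id, String json) {
        URI uri = URI.create(ES_BASE + "/" + index + "/" + type + "/" + id + "?pretty");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putDocument("how2java", "product", "1", "{\"name\": \"Candle\"}");
        System.out.println(request.method() + " " + request.uri());
        // PUT http://127.0.0.1:9200/how2java/product/1?pretty
        // To send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Building the request separately from sending it keeps the example runnable without a live ES node.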

