Performance: effect of index size on search speed (stored vs. not stored fields)

We currently use Solr as a full-text index: all document fields are indexed but not stored.
There are millions of documents, and the index size is 50 GB. The average query time is about 100 milliseconds.

To use features such as highlighting, we are considering also storing the text. However, this may double the size of the index.
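For reference, "also storing the text" is a per-field schema change. The sketch below builds (but does not send) a Solr Schema API `replace-field` command that keeps a field indexed and sets `stored` to true. The core URL, the field name `body`, and the type `text_general` are hypothetical placeholders, and it assumes a managed (mutable) schema. Existing documents only gain stored values after they are reindexed.

```python
import json
import urllib.request

# Hypothetical core URL: replace with your own Solr host and core name.
SOLR_URL = "http://localhost:8983/solr/mycore"


def stored_field_command(name: str, field_type: str) -> dict:
    """Build a Schema API 'replace-field' command that keeps the field
    indexed and additionally stores its original text (for highlighting)."""
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,  # the change under discussion (currently false)
        }
    }


def schema_request(command: dict) -> urllib.request.Request:
    """Prepare, but do not send, a POST to the core's /schema endpoint."""
    body = json.dumps(command).encode("utf-8")
    return urllib.request.Request(
        SOLR_URL + "/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = schema_request(stored_field_command("body", "text_general"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would apply it to a running Solr instance.
```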

I know that there is no linear relationship between index size and query time: increasing the number of documents by a factor of 10 makes almost no difference to query time.

However, the system (Solr / Lucene / Linux / …) has to handle more data: the index files, for example, span more inodes, and so on.

So I'm sure that query time will be affected by the index size. (But is this obvious?)

First:
Do you think I am right?
Do you have any experience with index size and search speed with and without stored text?
Is it wise and reasonable to blow up the index by storing the text?

Second:
Do you know how Solr/Lucene handles stored text? Is it kept in a separate file? (If so, it would have no effect on simple searches that do not need the stored text, right?)

Thank you.

Yes, if you store large fields the index will grow, but if you want to highlight them there is no other way. I don't think search speed will decrease much; the main extra cost is that retrieving results means loading more data, and that is not significant.
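One way to keep that retrieval cost small is to ask Solr to return only the fields you need (`fl`) while still highlighting the large stored field (`hl`, `hl.fl`). The sketch below only builds the request URL; the core URL and the field names `id` and `body` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical core URL: replace with your own Solr host and core name.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def highlight_query_url(query: str, text_field: str, rows: int = 10) -> str:
    """Build a select URL that highlights `text_field` but returns only the
    document id, so the full stored text is not shipped with every hit."""
    params = {
        "q": query,
        "rows": rows,
        "fl": "id",            # stored fields to return in the result list
        "hl": "true",          # enable highlighting
        "hl.fl": text_field,   # field whose stored value is highlighted
        "hl.snippets": 2,      # at most two snippets per document
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    url = highlight_query_url("body:lucene", "body")
    print(url)
    print(parse_qs(urlsplit(url).query))
```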

For the Lucene index format and the different files that make up an index, see the Lucene index file-format documentation. Stored fields are kept in their own files.
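To see how much of an existing index the stored fields actually take up, you can sum file sizes by extension. This sketch assumes the usual Lucene stored-field extensions `.fdt` (field data), `.fdx` (field index) and `.fdm` (field metadata, only in newer versions), and a Solr index directory that is typically something like `<core>/data/index`. Small segments written in the compound file format (`.cfs`) hide their stored fields inside that file, so the result is a lower bound.

```python
import sys
from collections import Counter
from pathlib import Path
from typing import Tuple

# Lucene stored-field file extensions (.fdm exists only in newer versions).
STORED_FIELD_EXTS = {".fdt", ".fdx", ".fdm"}


def stored_field_share(index_dir: str) -> Tuple[int, int]:
    """Return (bytes in stored-field files, total bytes) for one index dir.

    Files inside compound (.cfs) segments are counted as 'other', so the
    stored-field figure is a lower bound.
    """
    sizes = Counter()
    for path in Path(index_dir).iterdir():
        if path.is_file():
            kind = "stored" if path.suffix in STORED_FIELD_EXTS else "other"
            sizes[kind] += path.stat().st_size
    return sizes["stored"], sizes["stored"] + sizes["other"]


if __name__ == "__main__":
    stored, total = stored_field_share(sys.argv[1])
    share = stored / total if total else 0.0
    print(f"stored fields: {stored} of {total} bytes ({share:.1%})")
```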
