NOSQL – GET consistency (and quorum) in Elasticsearch

I am new to Elasticsearch and I am evaluating it for a project.

In ES, replication can be synchronous or asynchronous. In the asynchronous case, success is returned to the client as soon as the document has been written to the primary shard. The document is then pushed to the replicas asynchronously.

When writing asynchronously, how do we ensure that a GET returns the data even if it has not yet propagated to all replicas? When we perform a GET in ES, the request is forwarded to one of the copies of the corresponding shard. If we write asynchronously, the primary shard may have the document, but the copy selected to serve the GET may not have received or written it yet. In Cassandra, we can specify a consistency level (ONE, QUORUM, ALL) for both writes and reads. Is something like that possible for reads in ES?

Yes, you can set replication to asynchronous (the default is synchronous) so that the write does not wait for the replicas, although in practice this doesn't buy you much.

Whenever you read data, you can specify the preference parameter to control which shard copy the document is taken from. If you use preference=_primary, the document is always fetched from the primary shard. Otherwise, if the get is executed before the document is available on all replicas, you may hit a shard copy that doesn't have it yet. That said, given that the get API works in real time, it usually makes sense to keep replication synchronous, so that once the index operation has returned you can always get the document back by id from any shard copy that is supposed to contain it. Still, if you try to get a document while it is being indexed for the first time, you may not find it.
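As a rough illustration of the preference parameter, here is a minimal Python sketch that only builds the URL of a get-by-id request. It assumes an older Elasticsearch release in which the _primary preference value discussed above is still accepted and in which documents are addressed as index/type/id. The host, index name, type name, and the helper get_by_id_url are all hypothetical.

```python
from urllib.parse import urlencode


def get_by_id_url(base_url, index, doc_type, doc_id, preference=None):
    """Build the URL for a get-by-id request.

    Hypothetical helper: it only assembles a URL string and sends nothing.
    `preference` is passed through as the `preference` query parameter.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    if preference is not None:
        url += "?" + urlencode({"preference": preference})
    return url


# Assumed host and names, for illustration only.
print(get_by_id_url("http://localhost:9200", "articles", "article", "1",
                    preference="_primary"))
```

Leaving preference unset lets the cluster choose any copy of the shard, which is the case where a replica that has not yet received the document could answer the request.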

There is also a write consistency parameter in Elasticsearch, but it works differently from other data stores, and it is unrelated to whether replication is synchronous or asynchronous. With the consistency parameter you control how many copies of the data must be available for a write operation to be accepted. If not enough copies are available, the write operation fails (after waiting for up to 1 minute, an interval you can change through the timeout parameter). This is just a preliminary check to decide whether to accept the operation. It does not mean that the operation is rolled back if it fails on a replica. In fact, if a write operation fails on a replica but succeeds on the primary, the assumption is that something is wrong with the replica (or the hardware it is running on), so that shard copy is marked as failed and recreated on another node. The default value of consistency is quorum, and it can also be set to one or all.
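The preliminary check described above can be sketched in Python as follows. This is only a model, not Elasticsearch code: the function names are hypothetical, and the quorum rule (a majority of primary plus replicas, enforced only when there is more than one replica) is my understanding of how older versions behaved and should be verified against the documentation for your version.

```python
def required_copies(consistency, number_of_replicas):
    """Number of shard copies that must be active for a write to be accepted.

    Hypothetical model of the pre-write check; `consistency` is one of
    "one", "quorum", "all".
    """
    total = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total
    if consistency == "quorum":
        # Assumption: quorum is a majority, and it is only enforced when
        # there is more than one replica; otherwise one copy is enough.
        return total // 2 + 1 if number_of_replicas > 1 else 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


def write_accepted(consistency, number_of_replicas, active_copies):
    """True if enough copies are active for the write to be attempted."""
    return active_copies >= required_copies(consistency, number_of_replicas)


# With 2 replicas there are 3 copies in total.
print(required_copies("quorum", 2))
print(write_accepted("quorum", 2, active_copies=1))
```

Note that the check only gates whether the write is attempted. As described above, a failure on a replica afterwards does not roll the write back.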

That said, when it comes to the get API, Elasticsearch is not eventually consistent but simply consistent, because once a document is indexed you can retrieve it.

The fact that newly added documents are not available for search until the next refresh operation, which happens automatically every second by default, is not really about eventual consistency (the documents are there and can be retrieved by id). It has more to do with how search and Lucene work, and with how documents are made visible through Lucene.
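The difference between get-by-id and search visibility can be pictured with a toy model in Python. This is not how Elasticsearch or Lucene is implemented. The ToyShard class and its methods are hypothetical and only mimic the behaviour described above: a document can be fetched by id right after indexing, but shows up in search results only after a refresh.

```python
class ToyShard:
    """Toy model of one shard copy: real-time get, refresh-gated search."""

    def __init__(self):
        self._docs = {}               # every indexed document, by id
        self._searchable_ids = set()  # ids visible to search since the last refresh

    def index(self, doc_id, text):
        # The document is stored immediately but is not yet searchable.
        self._docs[doc_id] = text

    def get(self, doc_id):
        # Get-by-id sees the document as soon as it has been indexed.
        return self._docs.get(doc_id)

    def refresh(self):
        # A refresh makes everything indexed so far visible to search.
        self._searchable_ids = set(self._docs)

    def search(self, word):
        # Search only considers documents made visible by a refresh.
        return sorted(i for i in self._searchable_ids if word in self._docs[i])


shard = ToyShard()
shard.index("1", "hello world")
print(shard.get("1"))         # retrievable by id straight away
print(shard.search("hello"))  # not searchable before a refresh
shard.refresh()
print(shard.search("hello"))  # searchable after the refresh
```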
