Classic distributed structure
As shown in the figure, when the front end receives a request, it computes the hash of the key, takes that hash modulo 3, and routes the request to the corresponding back-end server.
However, this structure runs into trouble when back-end servers are added or removed. Every time the server count changes, the hash of all stored data must be recalculated, taken modulo the new server count, and the data re-inserted. The cost of that data migration is far too high, and this is what motivates consistent hashing. A minimal sketch of the modulo scheme is shown below.
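Here is a minimal sketch of the modulo scheme (the backend names, the key, and the use of MD5 are illustrative choices, not prescribed above):

```python
import hashlib

# Hypothetical list of 3 back-end servers, as in the figure.
backends = ["server-0", "server-1", "server-2"]

def pick_backend(key: str) -> str:
    # Hash the key, then take the hash modulo the number of servers.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

print(pick_backend("user:42"))  # always lands on the same server...
# ...until the backend count changes: with 4 servers almost every key
# maps to a different index, forcing a near-total data migration.
```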
Consistent Hashing
The front-end structure stays the same; everything below concerns the back-end servers.
Assume the hash function produces values in the range [0, 2^64), and think of this range as a ring, as follows:
Now place the servers on this ring: each server also gets a hash value, computed from its unique identifier (IP, MAC, hostname, etc.), as follows:
When a request arrives, its hash value is calculated; that hash lands somewhere on the ring, and the request is sent to the first server found by walking clockwise, as follows:
In other words, we find the first server whose hash value is greater than or equal to the request's hash. A minimal ring sketch follows.
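A minimal sketch of the ring lookup, assuming MD5 truncated to 64 bits as the hash function and IP strings as server identifiers (both are illustrative assumptions):

```python
import bisect
import hashlib

def h(value: str) -> int:
    # Map anything to a point on the 0..2^64-1 ring (MD5 truncated to 64 bits).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (1 << 64)

class Ring:
    def __init__(self, servers):
        # Each server is placed on the ring at the hash of its identifier
        # (IP / hostname / etc.), kept sorted by position.
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        # Walk clockwise: the first server whose hash is >= the key's hash;
        # wrap around to the first point if we run off the end of the ring.
        k = h(key)
        i = bisect.bisect_left(self.points, (k, ""))
        return self.points[i % len(self.points)][1]

ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.lookup("user:42"))
```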
With this structure in place, adding a server only requires finding the server that was originally responsible for the new server's range, handing the data for that range over to the new server, and deleting it from the original one, as follows:
The same applies when a server is removed; a small check of this is sketched below.
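To make the migration cost concrete, here is a small check that reuses the Ring class from the sketch above (the key names and the extra server 10.0.0.4 are made up for illustration):

```python
# Adding a server only affects the keys that now map to it;
# every other key stays exactly where it was.
old_ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
new_ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])

keys = [f"user:{i}" for i in range(10000)]
moved = [k for k in keys if old_ring.lookup(k) != new_ring.lookup(k)]

# Every moved key leaves exactly one old server (the new node's clockwise
# successor) and lands on the new node 10.0.0.4.
print(len(moved), {(old_ring.lookup(k), new_ring.lookup(k)) for k in moved})
```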
This does reduce the cost of data migration, but a new problem appears: data balance is no longer guaranteed. Because the hash values produced by the hash function are effectively random, two servers can easily end up unevenly spaced on the ring:
Here S1 is responsible for most of the data while S2 handles only a small part. Even if the distribution happens to be even, with S1 and S2 landing on opposite ends of the ring, adding a new server will inevitably break that balance:
This clearly won't do, so how do we solve the problem?
What causes it? The hash function. With a large number of points, a hash function produces a roughly uniform distribution, but with only a few points it does not, so the fix is to put many more points on the ring.
We allocate 10,000 virtual nodes to each server and spread those virtual nodes over the ring; the range a server is responsible for is the union of the ranges of its 10,000 virtual nodes. With that many points on the ring, the hash values are distributed evenly, and the problem is solved. A sketch of the virtual-node version follows.
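A sketch of the virtual-node version, with 10,000 replicas per server as the text suggests (the "#i" suffix used to name virtual nodes and the server IPs are illustrative assumptions):

```python
import bisect
import hashlib
from collections import Counter

def h(value: str) -> int:
    # Same 64-bit ring as before (MD5 truncated to 64 bits).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (1 << 64)

class VirtualRing:
    def __init__(self, servers, replicas=10000):
        # Place `replicas` virtual nodes per physical server on the ring;
        # each virtual node maps back to its physical server.
        self.points = sorted(
            (h(f"{s}#{i}"), s) for s in servers for i in range(replicas)
        )

    def lookup(self, key: str) -> str:
        # Clockwise lookup, exactly as before, but over the virtual nodes.
        k = h(key)
        i = bisect.bisect_left(self.points, (k, ""))
        return self.points[i % len(self.points)][1]

# With enough virtual nodes, the share of keys per physical server evens out.
ring = VirtualRing(["10.0.0.1", "10.0.0.2"])
print(Counter(ring.lookup(f"user:{i}") for i in range(10000)))
```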
That's a brief introduction to consistent hashing!
Feel free to follow my official account. Thanks!