One ZooKeeper function

1. File system

2. Notification mechanism

Two Zookeeper file system

Each directory entry is called a znode. As in a file system, we can freely add and delete znodes, and add or delete child znodes under a znode. The only difference is that a znode can also store data.

There are four types of znodes:
1. PERSISTENT: persistent directory node
After the client disconnects from Zookeeper, the node still exists.
2. PERSISTENT_SEQUENTIAL: persistent, sequentially numbered directory node
After the client disconnects from Zookeeper, the node still exists, and Zookeeper appends a sequence number to the node name.
3. EPHEMERAL: temporary directory node
After the client disconnects from Zookeeper, the node is deleted.
4. EPHEMERAL_SEQUENTIAL: temporary, sequentially numbered directory node
After the client disconnects from Zookeeper, the node is deleted, and Zookeeper appends a sequence number to the node name.

Because the sequence numbers impose an order on the nodes, sequential nodes can be used to build a distributed lock.
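The four node types can be illustrated with a toy in-memory model. This is only a sketch, not the real client API: the class `ZnodeTree`, its methods, and the paths are hypothetical names chosen for illustration. The zero-padded ten-digit suffix and the per-parent counter mirror how Zookeeper names sequential nodes.

```python
class ZnodeTree:
    """Toy in-memory model of the four znode types (hypothetical, not the real client API)."""

    def __init__(self):
        self.nodes = {}      # path -> (data, owning session for ephemeral nodes, else None)
        self.counters = {}   # parent path -> next sequence number

    def create(self, path, data=b"", session=None, ephemeral=False, sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0]
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"   # sequence number appended to the node name
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        self.nodes[path] = (data, session if ephemeral else None)
        return path

    def close_session(self, session):
        # Ephemeral nodes are deleted when their client disconnects.
        for path in [p for p, (_, owner) in self.nodes.items() if owner == session]:
            del self.nodes[path]


tree = ZnodeTree()
tree.create("/app/config", b"v1")                                            # PERSISTENT
a = tree.create("/app/job-", session="s1", ephemeral=True, sequential=True)  # EPHEMERAL_SEQUENTIAL
b = tree.create("/app/job-", sequential=True)                                # PERSISTENT_SEQUENTIAL
tree.close_session("s1")                                                     # removes only node a
```

After the session closes, only the ephemeral node is gone; the persistent nodes remain.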

Three Zookeeper notification mechanism

A client registers a watch on the directory nodes it cares about. When a watched node changes (its data changes, it is deleted, or a child node is added or removed), Zookeeper notifies the client.
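The register-then-notify pattern can be sketched with a small in-memory store. The names `WatchedStore`, `watch`, `set`, and `delete` are hypothetical, and the sketch assumes each watch fires once and is then dropped, so a client that wants further notifications must register again.

```python
from collections import defaultdict


class WatchedStore:
    """Toy node store with one-shot watches (hypothetical, not the real client API)."""

    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)   # path -> callbacks waiting for the next event

    def watch(self, path, callback):
        self.watchers[path].append(callback)

    def _parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def _notify(self, path, event):
        # Take the current callbacks and clear them: each watch fires only once.
        callbacks, self.watchers[path] = self.watchers[path], []
        for callback in callbacks:
            callback(path, event)

    def set(self, path, value):
        created = path not in self.data
        self.data[path] = value
        self._notify(path, "created" if created else "changed")
        if created:
            self._notify(self._parent(path), "child_added")

    def delete(self, path):
        del self.data[path]
        self._notify(path, "deleted")
        self._notify(self._parent(path), "child_removed")


events = []
record = lambda path, event: events.append((path, event))
store = WatchedStore()
store.watch("/services", record)
store.set("/services/a", "host1")     # fires the watch on /services
store.watch("/services/a", record)
store.set("/services/a", "host2")     # fires the watch on /services/a
store.set("/services/a", "host3")     # no watch registered any more, nothing fires
store.watch("/services", record)
store.delete("/services/a")           # fires the re-registered watch on /services
```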

Four Zookeeper design goals

1. Eventual consistency: no matter which server a client connects to, it is shown the same view. This is Zookeeper's most important property.
2. Reliability: simple, robust, and with good performance. If a message is accepted by one server, it will be accepted by all servers.
3. Real-time: Zookeeper guarantees that a client obtains a server's updates, or learns of the server's failure, within a bounded time interval. However, because of network delay and other factors, Zookeeper cannot guarantee that two clients see newly updated data at the same time. If the latest data is needed, call the sync() interface before reading.
4. Wait-free: a slow or failed client must not interfere with the requests of a fast client, so that no client is held up by another.
5. Atomicity: an update either succeeds or fails; there is no intermediate state.
6. Sequentiality: this covers global ordering and partial ordering. Global ordering means that if message a is published before message b on one server, then a is published before b on all servers. Partial ordering means that if message b is published after message a by the same sender, a must be ordered before b.

Five Zookeeper server working states

Each server is in one of three states while it is working:
LOOKING: the current server does not know who the leader is and is searching for one
LEADING: the current server is the elected leader
FOLLOWING: a leader has been elected, and the current server synchronizes with it

Six Zookeeper election process

Zk has two election algorithms: one is based on basic paxos and the other is based on fast paxos. The default election algorithm of the system is fast paxos.

basic paxos
1. The election thread is the thread on the current server that initiates the election. Its main job is to count the voting results and select the recommended server;
2. The election thread first sends a query to all servers (including itself);
3. After receiving a reply, the election thread verifies that it answers a query the thread itself initiated (by checking that the zxid matches), then obtains the other party's id (myid) and stores it in the list of queried servers for this round. Finally, it obtains the leader information (id, zxid) proposed by the other party and stores it in the voting record table of the current election;
4. After receiving replies from all servers, it determines the server with the largest zxid and sets that server's information as the one to vote for in the next round;
5. The thread sets the server with the largest zxid as the leader recommended by the current server. If the winning server has obtained votes from n/2 + 1 of the n servers at this point, the recommended leader is set as the winner and the current server sets its own state according to the winner's information. Otherwise, the process continues until a leader is elected.

From this process it follows that, for the Leader to get the support of a majority of servers, the total number of servers is best chosen as an odd number 2n+1, and the number of surviving servers must not be less than n+1. Each server repeats the above process after it starts. In recovery mode, if a server has just recovered from a crash or has just started, it also restores data and session information from the disk snapshot. Zookeeper records a transaction log and takes periodic snapshots to make state recovery easier.

[Flowchart: leader selection process]
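The tallying in steps 4 and 5, and the majority arithmetic behind the 2n+1 recommendation, can be sketched as follows. The function names are hypothetical, and breaking zxid ties by the larger myid is an assumption of this sketch rather than something the steps above state.

```python
from collections import Counter


def pick_candidate(replies):
    """replies: list of (myid, zxid). Pick the server with the largest zxid.

    Ties are broken by the larger myid (an assumption of this sketch).
    """
    return max(replies, key=lambda reply: (reply[1], reply[0]))[0]


def elected_leader(votes, total_servers):
    """votes: {voter id: candidate id}. Return the winner once it has a majority, else None."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= total_servers // 2 + 1 else None


def tolerated_failures(total_servers):
    """How many servers may fail while a majority can still be formed."""
    return total_servers - (total_servers // 2 + 1)


replies = [(1, 0x100), (2, 0x105), (3, 0x103)]
candidate = pick_candidate(replies)                    # server 2 has the largest zxid
winner = elected_leader({1: 2, 2: 2, 3: 3}, 3)         # 2 of 3 votes is a majority
undecided = elected_leader({1: 2}, 3)                  # 1 of 3 votes is not
```

With these definitions, `tolerated_failures(5)` and `tolerated_failures(6)` both return 2, which is why adding a sixth server to a five-server ensemble buys no extra fault tolerance.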

fast paxos

During the election, a server first proposes to all servers that it should become the leader. When the other servers receive the proposal, they resolve any conflict of epoch and zxid, accept the proposal, and send the proposer a message that the proposal has been accepted. This process repeats until a Leader is finally elected.
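A minimal sketch of this propose-and-accept loop is shown below. It assumes votes are ordered by epoch first, then zxid, then server id; the text only says that epoch and zxid conflicts are resolved, so the exact ordering and the names `Vote` and `elect` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Vote:
    # Field order defines the comparison: epoch, then zxid, then server id (assumed).
    epoch: int
    zxid: int
    server_id: int


def elect(initial_votes):
    """Every server starts by proposing itself, then adopts any better proposal it sees."""
    ballots = {vote.server_id: vote for vote in initial_votes}
    changed = True
    while changed:
        changed = False
        for sid in list(ballots):
            for proposal in list(ballots.values()):
                if proposal > ballots[sid]:
                    ballots[sid] = proposal    # accept the other party's proposal
                    changed = True
    leader, count = Counter(b.server_id for b in ballots.values()).most_common(1)[0]
    return leader if count >= len(ballots) // 2 + 1 else None


same_epoch = elect([Vote(1, 100, 1), Vote(1, 105, 2), Vote(1, 105, 3)])
newer_epoch = elect([Vote(2, 10, 1), Vote(1, 999, 2), Vote(1, 5, 3)])
```

In the first call servers 2 and 3 tie on zxid, so the larger server id wins; in the second, the higher epoch outranks a larger zxid.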

Seven Zookeeper workflow-Leader

1. Restore data;
2. Maintain the heartbeat with each Learner, receive Learner requests, and determine the message type of each request;
3. Learner messages are mainly of four types: PING, REQUEST, ACK, and REVALIDATE. Each type is processed differently.
PING message: the Learner's heartbeat information;
REQUEST message: proposal information sent by a Follower, including write requests and synchronization requests;
ACK message: a Follower's reply to a proposal; once more than half of the Followers have acknowledged it, the proposal is committed;
REVALIDATE message: used to extend the validity period of a SESSION.
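The message dispatch and the "more than half of the Followers" commit rule can be sketched like this. `LeaderSketch` and its return strings are hypothetical; the sketch counts only Follower acknowledgements, as the description above does, and identifies each proposal by a zxid.

```python
class LeaderSketch:
    """Toy dispatcher for Learner messages (hypothetical, not Zookeeper's implementation)."""

    def __init__(self, follower_ids):
        self.followers = set(follower_ids)
        self.acks = {}         # zxid -> set of followers that acknowledged the proposal
        self.committed = []    # zxids committed so far, in commit order

    def handle(self, msg_type, sender, zxid=None):
        if msg_type == "PING":
            return "heartbeat"
        if msg_type == "REQUEST":
            self.acks.setdefault(zxid, set())      # open a proposal for this request
            return "proposed"
        if msg_type == "ACK":
            acks = self.acks.setdefault(zxid, set())
            acks.add(sender)
            # Commit once more than half of the followers have acknowledged.
            if zxid not in self.committed and len(acks) > len(self.followers) / 2:
                self.committed.append(zxid)
                return "committed"
            return "acked"
        if msg_type == "REVALIDATE":
            return "session extended"
        raise ValueError(f"unknown message type: {msg_type}")


leader = LeaderSketch([1, 2, 3])
results = [
    leader.handle("PING", 1),
    leader.handle("REQUEST", 1, zxid=7),
    leader.handle("ACK", 1, zxid=7),    # 1 of 3 followers
    leader.handle("ACK", 2, zxid=7),    # 2 of 3 followers, more than half
    leader.handle("ACK", 3, zxid=7),    # already committed
]
```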

Eight Zookeeper workflow-Follower

Follower has four main functions:
1. Send requests to Leader (PING message, REQUEST message, ACK message, REVALIDATE message);
2. Receive Leader message and process it;
3. Receive Client requests; if a request is a write request, forward it to the Leader for voting;
4. Return the result to the Client.

Follower’s message loop processes the following types of messages from Leader:
1. PING message: heartbeat message;
2. PROPOSAL message: a proposal initiated by the Leader that requires the Follower to vote;
3. COMMIT message: information about the latest proposal on the server side;
4. UPTODATE message: indicates that synchronization is complete;
5. REVALIDATE message: depending on the Leader's REVALIDATE result, close the session being revalidated or allow it to accept messages;
6. SYNC message: returns the SYNC result to the client. This message is originally initiated by the client to force the latest update.
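The message loop above might be sketched as a simple dispatcher. Everything here is hypothetical: the function name, the payload fields (`zxid`, `data`, `valid`, `client`), and the reply values are invented for illustration, and the sketch assumes a COMMIT names the zxid of a previously received PROPOSAL.

```python
def run_follower(messages):
    """Process (type, payload) pairs from the Leader; return (replies, applied)."""
    pending = {}    # zxid -> proposed data waiting for a COMMIT
    applied = []    # data applied so far, in commit order
    replies = []    # what the Follower sends back or reports
    for kind, payload in messages:
        if kind == "PING":
            replies.append("PING")                        # answer the heartbeat
        elif kind == "PROPOSAL":
            pending[payload["zxid"]] = payload["data"]
            replies.append(("ACK", payload["zxid"]))      # vote for the proposal
        elif kind == "COMMIT":
            applied.append(pending.pop(payload["zxid"]))  # apply the committed proposal
        elif kind == "UPTODATE":
            replies.append("SYNCED")                      # synchronization is complete
        elif kind == "REVALIDATE":
            replies.append("SESSION_OK" if payload["valid"] else "SESSION_CLOSED")
        elif kind == "SYNC":
            replies.append(("SYNC_RESULT", payload["client"]))
        else:
            raise ValueError(f"unknown message type: {kind}")
    return replies, applied


replies, applied = run_follower([
    ("UPTODATE", None),
    ("PING", None),
    ("PROPOSAL", {"zxid": 1, "data": "x=1"}),
    ("COMMIT", {"zxid": 1}),
    ("REVALIDATE", {"valid": False}),
    ("SYNC", {"client": "c1"}),
])
```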

Nine Zookeeper roles


What can Zookeeper do

1. Registry

Create a directory in Zookeeper's file system, which gives us a unique path. When we use tborg and cannot determine which machine the upstream program is deployed on, the upstream program can agree on a path with the downstream program, and the two can discover each other through that path.

2. Configuration management

Programs always need configuration. If a program is deployed on multiple machines, changing the configuration machine by machine becomes difficult. Instead, put all of the configuration on Zookeeper, stored in a directory node, and have every related application watch that node. Once the configuration changes, each application receives a notification from Zookeeper, fetches the new configuration from Zookeeper, and applies it to the system.
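A minimal sketch of this watch-and-reload cycle follows. `ConfigNode` and `App` are hypothetical names; the sketch assumes watches are one-shot, so each application registers a new watch every time it re-reads the configuration.

```python
class ConfigNode:
    """Toy stand-in for the directory node that holds the configuration (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self._watchers = []

    def get(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self.value

    def set(self, value):
        self.value = value
        # Swap the list out before calling back, so re-registered watches are kept.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher()


class App:
    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.config = None
        self.reload()

    def reload(self):
        # Fetch the latest configuration and watch for the next change.
        self.config = self.node.get(watcher=self.reload)


node = ConfigNode({"timeout": 30})
apps = [App("web-1", node), App("web-2", node)]
node.set({"timeout": 60})
after_first = [app.config["timeout"] for app in apps]
node.set({"timeout": 90})
after_second = [app.config["timeout"] for app in apps]
```

Both applications pick up each change without anyone touching the individual machines.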

3. Cluster management

Cluster management covers two things: tracking machines that exit and join, and electing a master.
For the first, all machines agree to create a temporary directory node under the parent directory GroupMembers and then watch the parent for child-change messages. Once a machine goes down, its connection to Zookeeper is broken, the temporary directory node it created is deleted, and all other machines are notified that a sibling directory has been deleted.

A machine joining is handled similarly. All machines are notified that a new sibling directory has been added, and the member count is updated again.

For master election we change this slightly: all machines create temporary, sequentially numbered directory nodes, and each time the machine with the smallest number is selected as the master.
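Membership tracking and smallest-number master election can be sketched together. The class `GroupMembers` reuses the parent directory name from the text, but its methods and the `member-` node prefix are hypothetical; calling `leave` stands in for a machine's temporary node being deleted when its connection drops.

```python
class GroupMembers:
    """Toy model of temporary sequential nodes under one parent (hypothetical)."""

    def __init__(self):
        self.members = {}     # node name -> machine
        self.next_seq = 0
        self.listeners = []   # callbacks that receive the current member list

    def join(self, machine):
        name = f"member-{self.next_seq:010d}"   # zero-padded, so names sort by number
        self.next_seq += 1
        self.members[name] = machine
        self._notify()
        return name

    def leave(self, name):
        del self.members[name]
        self._notify()

    def master(self):
        # The machine holding the smallest sequence number is the master.
        return self.members[min(self.members)] if self.members else None

    def _notify(self):
        for listener in self.listeners:
            listener(sorted(self.members.values()))


seen = []
group = GroupMembers()
group.listeners.append(seen.append)
node_a = group.join("host-a")
node_b = group.join("host-b")
first_master = group.master()     # host-a holds the smallest number
group.leave(node_a)               # host-a goes down
second_master = group.master()    # host-b takes over
group.join("host-c")
third_master = group.master()     # a newcomer gets a larger number, host-b stays
```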

4. Distributed lock

With Zookeeper's consistent file system, locking becomes easy. Lock services fall into two categories: one keeps exclusive access, the other controls ordering.
For the first category, we treat a znode on Zookeeper as the lock and implement it by creating a znode. All clients try to create the /distribute_lock node, and the client whose creation succeeds owns the lock. When it is finished, it deletes the /distribute_lock node it created to release the lock.
For the second category, /distribute_lock already exists and all clients create temporary, sequentially numbered directory nodes under it. As with master election, the client with the smallest number gets the lock, and it deletes its node when it is done.
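Both lock categories can be sketched in memory. `ExclusiveLock` and `OrderedLock` are hypothetical names; the first models "whoever creates the node owns the lock", the second models "smallest sequence number owns the lock".

```python
class ExclusiveLock:
    """Category 1: the lock is held by whoever managed to create the lock node."""

    def __init__(self):
        self.owner = None    # None means the lock node does not exist

    def try_acquire(self, client):
        if self.owner is None:
            self.owner = client    # creation succeeded
            return True
        return False               # node already exists, creation fails

    def release(self, client):
        if self.owner == client:
            self.owner = None      # deleting the node releases the lock


class OrderedLock:
    """Category 2: clients queue up with sequence numbers; the smallest holds the lock."""

    def __init__(self):
        self.waiting = {}    # sequence number -> client
        self.next_seq = 0

    def enqueue(self, client):
        seq = self.next_seq
        self.next_seq += 1
        self.waiting[seq] = client
        return seq

    def holder(self):
        return self.waiting[min(self.waiting)] if self.waiting else None

    def release(self, seq):
        del self.waiting[seq]


exclusive = ExclusiveLock()
got_a = exclusive.try_acquire("client-a")
got_b = exclusive.try_acquire("client-b")          # fails while client-a holds the lock
exclusive.release("client-a")
got_b_retry = exclusive.try_acquire("client-b")

ordered = OrderedLock()
seq_a = ordered.enqueue("client-a")
seq_b = ordered.enqueue("client-b")
first_holder = ordered.holder()
ordered.release(seq_a)
second_holder = ordered.holder()
```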

5. Queue management

Two types of queues:
1. Synchronous queue: the queue becomes usable only when all of its members have gathered; until then it keeps waiting for all members to arrive.
2. FIFO queue: items are enqueued and dequeued in first-in, first-out order.

The first type is implemented by creating temporary directory nodes under an agreed directory and watching whether the number of nodes reaches the number required. The second type follows the same basic principle as the ordering-control scenario in the distributed lock service: items are numbered when they enter, and they leave in numbered order.
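The two queue types can be sketched as follows. `SyncQueue` and `FifoQueue` are hypothetical names; the first counts member nodes against a required number, the second hands out sequence numbers on entry and always removes the smallest one.

```python
class SyncQueue:
    """Becomes usable only once the required number of members has arrived."""

    def __init__(self, required):
        self.required = required
        self.members = set()

    def arrive(self, member):
        self.members.add(member)    # stands in for creating a temporary node
        return self.ready()

    def ready(self):
        return len(self.members) >= self.required


class FifoQueue:
    """Items are numbered on entry and leave in numbered order."""

    def __init__(self):
        self.items = {}      # sequence number -> item
        self.next_seq = 0

    def put(self, item):
        self.items[self.next_seq] = item
        self.next_seq += 1

    def get(self):
        return self.items.pop(min(self.items))    # smallest number leaves first


barrier = SyncQueue(required=3)
progress = [barrier.arrive(m) for m in ("worker-1", "worker-2", "worker-3")]

queue = FifoQueue()
for task in ("task-a", "task-b", "task-c"):
    queue.put(task)
drained = [queue.get(), queue.get(), queue.get()]
```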

ZK summary