ZooKeeper's core principles


The origin of ZooKeeper

  1. Data consistency across nodes

  2. How to ensure that a task is executed on only one node

  3. If orderserver1 goes down, how do the other nodes discover this and take over its work

  4. Mutual exclusion and safety when shared resources exist


Apache's ZooKeeper

Google's Chubby is a distributed lock service; Google uses Chubby to solve problems such as distributed coordination and Master election. Apache's ZooKeeper provides the same kind of distributed lock service.

ZooKeeper's design considerations

  1. Prevent single points of failure

    1. A cluster (Leader/Follower) also spreads requests across nodes, giving both high availability and high performance

  2. The data on every node must be consistent (a leader is required)

    Leader/Master (centralized) vs. redis-cluster (decentralized)

  3. What if the leader in the cluster goes down? How is the data recovered?

    Election mechanism? Data recovery

  4. How to ensure data consistency? (Distributed transactions)

    The 2PC protocol (two-phase commit)

2PC

2PC (Two-Phase Commit Protocol): when a transaction spans multiple distributed nodes, a "coordinator" (TM) is introduced to preserve the ACID properties of the transaction. The TM schedules the execution logic of all the distributed nodes, which are called participants (APs), and finally decides whether those APs really commit the transaction. Because the whole transaction is committed in two phases, it is called 2PC.

Phase 1: Submit transaction request

  1. Transaction inquiry

    1. The coordinator sends the transaction content to all participants, asks whether the transaction commit operation can be executed, and starts waiting for each participant's response.

  2. Execute the transaction

    1. Each participant node executes the transaction operation and records Undo and Redo information in its transaction log, trying to complete all time-consuming work of the commit path in advance so that the subsequent commit is almost guaranteed to succeed.

  3. Each participant reports the result of the transaction inquiry back to the coordinator

    1. If a participant executed the transaction operation successfully, it replies Yes, meaning the transaction can be committed;

    2. If a participant failed to execute the transaction, it replies No to the coordinator, meaning the transaction cannot be committed;

    3. This first phase of the 2PC protocol is called the "voting phase": each participant votes on whether to proceed with the actual transaction commit.

Phase 2: Execute the transaction commit

In this phase, the coordinator decides whether to commit the transaction based on the feedback from each participant.

Two possibilities (a minimal coordinator sketch follows this list):

  • Commit the transaction

  • Abort (interrupt) the transaction
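The flow above can be made concrete with a short sketch. This is only an illustration, not a real transaction manager: the Participant interface and its method names are hypothetical stand-ins for the APs, and timeouts, retries, and logging are omitted.

```java
import java.util.List;

// A minimal 2PC coordinator sketch; Participant is an illustrative interface, not a real API.
public class TwoPhaseCommitDemo {

    interface Participant {
        boolean prepare(String tx);  // phase 1: execute the transaction, write undo/redo logs, vote yes/no
        void commit(String tx);      // phase 2: commit for real
        void rollback(String tx);    // phase 2: abort using the undo log
    }

    static void runTransaction(String tx, List<Participant> participants) {
        // Phase 1: transaction inquiry - every participant prepares and votes.
        boolean allYes = participants.stream().allMatch(p -> p.prepare(tx));

        // Phase 2: commit only if every participant voted yes, otherwise abort everywhere.
        if (allYes) {
            participants.forEach(p -> p.commit(tx));
        } else {
            participants.forEach(p -> p.rollback(tx));
        }
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean prepare(String tx)  { System.out.println("prepared " + tx); return true; }
            public void commit(String tx)      { System.out.println("committed " + tx); }
            public void rollback(String tx)    { System.out.println("rolled back " + tx); }
        };
        runTransaction("tx-1", List.of(ok, ok));   // both participants vote yes -> commit
    }
}
```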


ZooKeeper's cluster roles

In ZooKeeper, a client connects to a random node in the cluster.

If it is a read request, the data is read directly from the node the client is connected to.


If it is a write request, the request is forwarded to the leader, which submits the transaction and broadcasts it to the follower nodes in the cluster (note that observer nodes do not take part in voting). Each follower returns an ACK to the leader (the ACK indicates whether that node can execute the transaction); as long as more than half of the nodes write successfully, the write request is committed. This is why the cluster needs 2n+1 nodes.
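As a minimal sketch of the client's view of this, the following example uses the standard org.apache.zookeeper.ZooKeeper client to issue one write and one read; the connect string and znode path are placeholders. The client never needs to know which node is the leader, because write requests are forwarded to it internally.

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ZkReadWriteDemo {
    public static void main(String[] args) throws Exception {
        // Connect to any node in the ensemble; the connect string below is a placeholder.
        ZooKeeper zk = new ZooKeeper("localhost:2181,localhost:2182,localhost:2183",
                                     15000, event -> System.out.println("event: " + event));

        // A write request: forwarded to the leader and committed only after a quorum of ACKs.
        if (zk.exists("/demo", false) == null) {
            zk.create("/demo", "hello".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // A read request: served directly by the node the client is connected to.
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo", false, stat);
        System.out.println(new String(data) + " (zxid=" + Long.toHexString(stat.getMzxid()) + ")");

        zk.close();
    }
}
```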


Leader role

The leader is the core of ZooKeeper and plays the leading role in the whole cluster:

  1. Schedules and processes transaction requests

  2. Guarantees the ordering of transaction processing

Follower role


  • Handles non-transaction (read) requests

  • Forwards transaction requests to the leader server

  • Votes on transaction request Proposals (a Proposal issued by the leader requires votes from more than half of the servers before the leader is notified to commit the data)

  • Votes in leader elections

Observer role

As the name suggests, an observer:

  1. Observes the state changes in the cluster and synchronizes those states

  2. Works the same way as a follower node; the only difference is that it does not vote on transaction requests and does not take part in leader election

  3. Serves non-transaction requests only; observers are usually added to improve the cluster's non-transaction (read) throughput without affecting its transaction processing capacity
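For reference, a sketch of how an observer is typically declared in zoo.cfg, assuming a hypothetical four-node ensemble with three voting members; hostnames and ports are placeholders.

```
# zoo.cfg on the observer node itself:
peerType=observer

# Server list shared by all nodes; the ":observer" suffix marks the non-voting member:
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
server.4=zoo4:2888:3888:observer
```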

Note: why 2n+1 nodes are needed

2n+1 means an odd number of nodes. For ZooKeeper to serve clients normally, its voting mechanism requires that more than half of the machines are working and able to communicate with each other, so that transaction votes can reach a result. For example, a 5-node ensemble (n = 2) tolerates 2 failures, while a 6-node ensemble still tolerates only 2, so the extra even node adds no fault tolerance.

ZAB protocol

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support, designed specifically for ZooKeeper as a distributed coordination service. ZooKeeper relies mainly on the ZAB protocol to achieve distributed data consistency; based on this protocol, ZooKeeper implements an active/standby architecture that keeps the data replicas in the cluster consistent.

In short, ZAB is an atomic broadcast protocol that supports crash recovery and is used mainly for data consistency.

The ZAB protocol has two basic modes:

  • Crash recovery (recover the leader node and recover the data)

  • Atomic (message) broadcast

How message broadcast works

The message broadcast process is essentially a simplified two-phase commit (2PC); a leader-side sketch follows the steps below.

  1. When the leader receives a message request, it assigns the message a globally unique, monotonically increasing 64-bit id (the ZXID). Ordering messages by ZXID gives them a causal order.

  2. The leader prepares a FIFO queue for each follower and distributes the message, together with its ZXID, to all followers as a proposal.

  3. When a follower receives the proposal, it first writes the proposal to disk and then replies to the leader with an ACK.

  4. When the leader has received a quorum (more than half) of ACKs, it sends a commit command to the followers and applies the message locally.

  5. When a follower receives the commit for the message, it commits the message.

Note: the leader's voting process does not require the observers' ACKs, but observers must synchronize the leader's data to preserve consistency.
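A minimal leader-side model of this flow, assuming a five-node ensemble; the class and method names are illustrative, not ZooKeeper internals, and the actual network sends are left as comments.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ZabBroadcastDemo {
    private final AtomicLong counter = new AtomicLong();   // lower 32 bits of the zxid
    private final long epoch = 1;                          // upper 32 bits of the zxid
    private final int followerCount = 4;                   // e.g. 1 leader + 4 followers
    private final Map<Long, Integer> acks = new ConcurrentHashMap<>();

    // Steps 1-2: assign a globally unique, increasing zxid and propose the message to every follower.
    long propose(byte[] message) {
        long zxid = (epoch << 32) | counter.incrementAndGet();
        acks.put(zxid, 1);                                  // the leader's own write counts toward the quorum
        // each follower has its own FIFO queue: sendToAllFollowers(zxid, message);
        return zxid;
    }

    // Steps 3-5: followers ACK after logging the proposal; commit once a majority has acknowledged.
    void onAck(long zxid) {
        int count = acks.merge(zxid, 1, Integer::sum);
        int quorum = (followerCount + 1) / 2 + 1;           // majority of a 5-node ensemble = 3
        if (count == quorum) {
            // sendCommitToFollowers(zxid); followers apply the message when they receive the commit
            System.out.println("commit " + Long.toHexString(zxid));
        }
    }

    public static void main(String[] args) {
        ZabBroadcastDemo leader = new ZabBroadcastDemo();
        long zxid = leader.propose("set /demo hello".getBytes());
        leader.onAck(zxid);   // ACK from follower 1
        leader.onAck(zxid);   // ACK from follower 2 -> quorum of 3 reached, commit
    }
}
```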

Crash recovery

  1. The leader loses contact with more than half of the follower nodes, or

  2. The leader server goes down.

In either case the cluster enters the crash recovery phase. For data recovery, two guarantees are needed:

  1. Messages that have already been processed cannot be lost

    1. When the leader has received a quorum of follower ACKs, it broadcasts a commit command to every follower and commits the transaction message itself.

    2. If the leader crashes before every follower has received the commit command, some nodes will have received the commit and some will not.

    3. The ZAB protocol must ensure that such already-processed messages are not lost.

  2. Messages that have been discarded cannot reappear

    1. This is the case where the leader receives a transaction request but crashes before it initiates the transaction vote; that message must be discarded and must not reappear after recovery.

To satisfy these two requirements, the ZAB protocol must design a leader election algorithm that guarantees that transaction proposals already committed by the old leader are eventually committed, while proposals that were only received but never voted on are discarded.

ZAB's design ideas

  • The new leader has the largest ZXID

    If the leader election algorithm guarantees that the newly elected leader holds the transaction proposal with the highest number (the largest ZXID) among all machines in the cluster, then the new leader is guaranteed to have every committed proposal. A proposal can only be committed after more than half of the followers have ACKed it, which means more than half of the servers hold that proposal in their transaction logs; so as long as a quorum of nodes is working normally, at least one of them holds the state of every committed message.

  • The concept of epoch

    Every time a new leader is elected, the epoch is incremented by 1. The ZXID is a 64-bit number: the lower 32 bits are a message counter, incremented for every message and reset to 0 when a new leader is elected, and the upper 32 bits store the epoch number. This design ensures that an old leader that restarts after crashing cannot be elected leader again, because its ZXID is necessarily smaller than that of the current new leader. When the old leader reconnects to the new leader as a follower, the new leader makes it clear all proposals carrying the old epoch number that have not been committed.

ZXID

ZXID is the transaction id. To guarantee the ordering of transactions, ZooKeeper uses an incrementing transaction id to identify each transaction, and every Proposal is issued with its ZXID attached.

The ZXID is a 64-bit number split into a low 32-bit part and a high 32-bit part (a small bit-extraction sketch follows this list):

  • Lower 32 bits: the message counter

  • Higher 32 bits: the epoch, used to mark whether the leader has changed. Every time a new leader is elected there is a new epoch (the previous epoch + 1), identifying the current period of that leader's reign.
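For illustration, a tiny sketch of how the two parts can be pulled out of a zxid value; the class name is made up for this example.

```java
public class ZxidDemo {
    // The upper 32 bits of a zxid hold the epoch, the lower 32 bits hold the message counter.
    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long zxid = 0x100000001L;                             // epoch 1, counter 1
        System.out.println("epoch   = " + epochOf(zxid));     // 1
        System.out.println("counter = " + counterOf(zxid));   // 1
    }
}
```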

Note:

The epoch can be understood as the era or dynasty of the current cluster; each leader is like an emperor with its own era name, and whenever the leader changes, 1 is added to the previous epoch.

Viewing the epoch on Linux

Under the /tmp/zookeeper/VERSION-2 path you will find a currentEpoch file; its content is the current epoch.

Viewing the transaction log with a command

java -cp :/opt/zookeeper/zookeeper-3.4.10/lib/slf4j-api-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.jar org.apache.zookeeper.server.LogFormatter /tmp/zookeeper/version-2/log.100000001

Leader election

Note: the election algorithm is FastLeaderElection. The election looks at the following values:

  • ZXID (transaction id): the larger the ZXID, the newer the data; the node with the largest ZXID is preferred as leader.

  • epoch: each round of voting increases the epoch.

  • myid (server id): the larger the myid, the greater its weight in the leader election.

Server states: when a service starts, its state is LOOKING (looking for a leader). The other states are:

  • LEADING

  • FOLLOWING

  • OBSERVING

Leader election at server startup

  1. Each server issues a vote. Initially each server votes for itself as the leader and then sends its vote to the other servers in the cluster.

    1. The vote contains (myid, zxid, epoch).

  2. Each server receives votes from the other servers.

    1. It judges whether each vote is valid:

      1. check whether it belongs to the current round of voting (epoch);

      2. check whether it comes from a server in the LOOKING state.

  3. Process the votes (see the comparison sketch after this list).

    1. Compare ZXIDs: the server with the larger ZXID is chosen as the leader candidate.

    2. If the ZXIDs are equal, compare myids: the server with the larger myid is chosen.

  4. Count the votes.

    1. Determine whether more than half of the servers have received the same voting information.

    2. If more than half accept it, the leader is considered elected.

  5. Change server state.

    1. If a server is a follower, its state becomes FOLLOWING.

    2. If it is the leader, its state becomes LEADING.
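A minimal sketch of the comparison rule described above, assuming the order epoch, then ZXID, then myid; the Vote type and method names are made up for this illustration and are not ZooKeeper's internal classes.

```java
public class VoteDemo {
    record Vote(long myid, long zxid, long epoch) {}

    // Returns true if the candidate vote should replace our current choice of leader.
    static boolean supersedes(Vote candidate, Vote current) {
        if (candidate.epoch() != current.epoch()) return candidate.epoch() > current.epoch();
        if (candidate.zxid()  != current.zxid())  return candidate.zxid()  > current.zxid();
        return candidate.myid() > current.myid();
    }

    public static void main(String[] args) {
        Vote mine  = new Vote(1, 0x100000005L, 1);
        Vote other = new Vote(3, 0x100000005L, 1);          // same epoch and zxid, larger myid
        System.out.println(supersedes(other, mine));        // true: server 3 wins the comparison
    }
}
```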

Leader election when the leader crashes

  1. Change state.

    1. After the leader goes down, the remaining non-observer servers change their state to LOOKING,

    2. and the leader election process begins.

  2. Each server issues a vote.

    1. During normal operation each server's ZXID may differ; each server sends its vote to all machines in the cluster.

  3. The rest of the process is the same as at startup.
