- How to ensure that a task is executed on only one node
- If orderserver1 goes down, how will the other nodes detect this and take over?
- Shared resources: mutual exclusion and safety

Google's Chubby is a distributed lock service. Chubby is used to solve problems that depend on a distributed lock service, such as distributed coordination and Master election.
- A cluster solution (Leader/Follower) can also spread requests across nodes, giving both high availability and high performance.
- The data on every node must be consistent (this requires a leader).
  - leader/master (centralized) vs. redis-cluster (decentralized)
- How is data consistency guaranteed? (Distributed transactions)
  - The 2PC protocol (two-phase commit)
- What happens when the leader in the cluster goes down? How is data recovered?
  - Election mechanism and data recovery
Phase 1: voting

- Transaction inquiry
  - The coordinator sends the transaction content to all participants, asks whether the transaction can be committed, and waits for each participant's response.
- Execute the transaction
  - Each participant node executes the transaction operation and records Undo and Redo information in its transaction log, trying to complete all time-consuming work and preparation in advance so that the subsequent commit is guaranteed to succeed.
- Each participant reports its response to the coordinator
  - If a participant executed the transaction successfully, it replies "yes", meaning the transaction can be committed.
  - If a participant failed to execute the transaction, it replies "no", meaning the transaction cannot be committed.

The first phase of the 2PC protocol is therefore called the "voting phase": each participant votes on whether the transaction commit should proceed.

Phase 2: executing the transaction commit

In this phase, the coordinator decides, based on the feedback from all participants, whether the transaction can be committed. There are two possibilities:

- Commit the transaction
- Abort the transaction
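The two phases above can be sketched in a few lines of Python. This is a minimal illustration of the 2PC decision rule; the class and function names are made up for the sketch, not taken from any real library.

```python
# Minimal sketch of two-phase commit (2PC); names are illustrative.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self):
        # Phase 1: execute the transaction, record undo/redo logs, then vote.
        self.state = "prepared" if self.will_succeed else "aborted"
        return "yes" if self.will_succeed else "no"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator asks every participant to prepare and collects votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every participant voted "yes"; otherwise abort.
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

A single "no" vote in phase 1 is enough to abort the whole transaction, which is exactly why 2PC blocks on slow or failed participants.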
- If it is a read request, the data is read directly from the current node.
- If it is a write request, the request is forwarded to the leader, which submits the transaction and then broadcasts it to the follower nodes in the cluster (note that observer nodes do not vote). Each follower returns an ack to the leader (the ack indicates whether that node can execute the transaction). As long as more than half of the nodes write successfully, the write request is committed. This is why the cluster needs 2n+1 nodes.

Leader role

The leader is the core of zookeeper and plays the leading role in the entire cluster:
- Schedules and processes transaction requests
- Guarantees the ordering of transaction processing

Follower role

1. Handles non-transaction requests and forwards transaction requests to the leader server
2. Votes on transaction request Proposals (a Proposal issued by the leader needs yes votes from more than half of the servers before the leader commits the data)
3. Votes in leader elections
Observer role

An observer:

- Observes the state changes in the cluster and synchronizes those states
- Works the same way as a follower node, except that it does not vote on transaction requests and does not take part in leader elections
- Serves non-transaction requests only; observers are usually added to improve the cluster's non-transaction (read) throughput without affecting its transaction processing capacity

Note: the cluster should have an odd number of voting nodes. For zookeeper to serve requests normally, more than half of the machines must be working and able to communicate with each other, so that transaction votes can reach a result.
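The "odd number of nodes" advice follows directly from the majority rule. A quick sketch of the arithmetic (illustrative helper names, not zookeeper code):

```python
# Quorum arithmetic for a cluster of `size` voting nodes: committing or
# electing requires a strict majority.

def quorum(size: int) -> int:
    # Smallest number of nodes that is more than half of `size`.
    return size // 2 + 1

def tolerated_failures(size: int) -> int:
    # Number of nodes that may fail while a majority can still form.
    return size - quorum(size)
```

A 3-node and a 4-node cluster both tolerate exactly one failure, so the 4th node adds no fault tolerance (only more voting traffic); 5 nodes tolerate two failures. Hence odd sizes.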
ZAB is an atomic broadcast protocol that supports crash recovery; it is mainly used to keep data consistent.

ZAB protocol basic modes:

- Crash recovery (recover the leader node and recover data)
- Atomic broadcast

The message broadcast process is in fact a simplified two-phase commit (2PC):
1. When the leader receives a message request, it assigns the message a globally unique, monotonically increasing 64-bit id (the ZXID); ordering by ZXID is what gives ZAB its causal-order property.
2. The leader maintains a FIFO queue for each follower and sends the message, tagged with its zxid, to every follower as a proposal.
3. When a follower receives the proposal, it first writes the proposal to disk and then replies to the leader with an ack.
4. After the leader receives acks from a legal quorum (more than half), it sends a commit command to the followers and applies the message locally.
5. When a follower receives the commit for a message, it commits the message.

Note: the leader does not require an ack from Observers during voting, but Observers must still synchronize the leader's data to keep their data consistent.
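The broadcast steps above can be modeled in a single-process sketch. This is a deliberately simplified illustration of the protocol flow (one FIFO queue per follower, quorum commit), not zookeeper's actual implementation; all names are invented for the example.

```python
# Single-process sketch of ZAB-style atomic broadcast; illustrative only.
from collections import deque

class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []        # proposals persisted to "disk"
        self.committed = []  # committed zxids

    def receive_proposal(self, zxid, msg):
        self.log.append((zxid, msg))  # write the proposal first...
        return "ack"                  # ...then ack back to the leader

    def receive_commit(self, zxid):
        self.committed.append(zxid)

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0
        self.queues = {f.name: deque() for f in followers}  # one FIFO per follower

    def broadcast(self, msg):
        self.zxid += 1  # globally unique, monotonically increasing id
        acks = 0
        for f in self.followers:
            # Enqueue the proposal in this follower's FIFO queue, then deliver.
            self.queues[f.name].append((self.zxid, msg))
            zxid, m = self.queues[f.name].popleft()
            if f.receive_proposal(zxid, m) == "ack":
                acks += 1
        # Commit once a majority (the leader counts itself) has acked.
        if acks + 1 > (len(self.followers) + 1) // 2:
            for f in self.followers:
                f.receive_commit(self.zxid)
            return "committed"
        return "pending"
```

The per-follower FIFO queue is what preserves the zxid order on each channel, so followers always see proposals in the order the leader issued them.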
When the leader server goes down, the cluster enters the crash recovery phase. For data recovery, two guarantees are needed:

- Messages that have been processed must not be lost
  - When the leader receives acks from a legal quorum of followers, it broadcasts a commit command to every follower and commits the transaction message locally.
  - If the leader crashes before every follower has received the commit command, some nodes will have received the commit and some will not.
  - The ZAB protocol must guarantee that such already-processed messages are not lost.
- Messages that have been discarded must not reappear
  - This covers the case where the leader crashes after receiving a transaction request but before starting the transaction vote; that proposal is discarded and must not resurface after recovery.
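The two guarantees above can be illustrated with a toy model. This is a simplified sketch of the recovery invariant, not the actual ZAB recovery algorithm: a proposal that was persisted on a quorum may have been committed somewhere and must survive; a proposal that never reached a quorum (e.g. the leader crashed before the vote) is dropped.

```python
# Toy model of ZAB's two crash-recovery guarantees; illustrative only.

def recover(logs, quorum):
    """logs: one set of persisted zxids per node. Returns (keep, discard)."""
    all_zxids = set().union(*logs)
    keep, discard = set(), set()
    for z in sorted(all_zxids):
        holders = sum(1 for log in logs if z in log)
        if holders >= quorum:
            # Persisted on a quorum: it may already be committed on some node,
            # so it must survive ("processed messages cannot be lost").
            keep.add(z)
        else:
            # Never reached a quorum: drop it, and it must not come back
            # ("discarded messages cannot appear again").
            discard.add(z)
    return keep, discard
```

For example, a proposal the crashed leader logged alone (zxid 3 below) is discarded, while one acked by a majority (zxid 2) is kept even if one node missed the commit.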
ZAB's design ideas

1. Elect the node whose zxid is the largest
   - If the leader election algorithm guarantees that the newly elected leader holds the transaction proposal with the highest number (the largest ZXID) among all machines in the cluster, then the new leader is guaranteed to hold every committed proposal. Because a proposal must receive acks from more than half of the followers before it is committed, more than half of the servers have that proposal in their transaction logs; so as long as a legal quorum of nodes is working normally, at least one node holds every committed message.
2. The epoch concept
   - Every time a new leader is elected, the epoch is incremented by 1. The zxid is a 64-bit number: the lower 32 bits are a message counter that is incremented for each message and reset to 0 when a new leader is elected, and the upper 32 bits store the epoch number. This design ensures that when an old leader restarts after crashing, it cannot be elected leader again, because its zxid is necessarily smaller than the current leader's. When the old leader reconnects to the new leader as a follower, the new leader makes it discard all uncommitted proposals carrying the old epoch number.

To keep the transaction order consistent, Zookeeper identifies each transaction with an incrementing transaction id; every Proposal is issued with a zxid attached. The ZXID is a 64-bit number split into a low 32-bit part and a high 32-bit part:
- Lower 32 bits: the message counter
- Upper 32 bits: the epoch, used to tell whether the leadership has changed. Every time a leader is elected there is a new epoch (the previous epoch + 1), which identifies the current leader's "reign".

Note: each leader is like a dynasty with its own era number; every change of leader adds 1 to the previous era. Under the path /tmp/zookeeper/VERSION-2 you will find a currentEpoch file, which shows the current epoch.

View the transaction log with the command:

java -cp /opt/zookeeper/zookeeper-3.4.10/lib/slf4j-api-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.jar org.apache.zookeeper.server.LogFormatter /tmp/zookeeper/version-2/log.100000001
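The zxid layout described above is simple bit packing, which a short sketch makes concrete (helper names are invented for the example):

```python
# 64-bit zxid layout: upper 32 bits = epoch, lower 32 bits = message counter.

def make_zxid(epoch: int, counter: int) -> int:
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def epoch_of(zxid: int) -> int:
    return zxid >> 32

def counter_of(zxid: int) -> int:
    return zxid & 0xFFFFFFFF

# A new leader bumps the epoch and resets the counter to 0, so every zxid
# of the new epoch compares greater than any zxid the old leader issued.
```

This is also, presumably, why the log file above is named log.100000001: zookeeper names log files after the first zxid they contain in hex, and 0x100000001 decodes to epoch 1, counter 1.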
Leader election involves:

- ZXID (transaction id): the larger the transaction id, the newer the data; the node with the largest one is set as Leader.
- epoch: incremented with each round of voting.
- myid (server id): the larger the myid, the greater its weight as a tiebreak in the leader election.

Server states:

- LOOKING: the state at startup, looking for a leader
- LEADING
- FOLLOWING
- OBSERVING
Election process at startup:

1. Each server casts a vote; the vote contains (myid, zxid, epoch).
2. Receive the votes from the other servers and judge whether each vote is valid:
   - Check that it comes from this round of voting (same epoch)
   - Check that it comes from a server in LOOKING state
3. Process the votes:
   - Compare the ZXIDs; the server with the larger ZXID is preferred as Leader.
   - If the ZXIDs are equal, compare the myids; the server with the larger myid is set as Leader.
4. Count the votes:
   - Determine whether more than half of the machines have received the same vote.
   - If more than half of the machines agree on the same vote, the leader is considered elected.
5. Change server state:
   - A server that became a Follower changes its state to FOLLOWING.
   - The server that became Leader changes its state to LEADING.
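The vote-comparison rule above (epoch first, then zxid, then myid as tiebreak) is easy to sketch. Function names are invented for illustration; this is not zookeeper's FastLeaderElection code.

```python
# Sketch of the leader-election comparison rule: epoch > zxid > myid.

def better_vote(a, b):
    """Each vote is a (myid, zxid, epoch) tuple; return the winning vote."""
    # Higher epoch wins; then higher zxid (newer data); then higher myid.
    key_a = (a[2], a[1], a[0])
    key_b = (b[2], b[1], b[0])
    return a if key_a >= key_b else b

def elect(votes):
    # All servers converge on the single best vote; once more than half of
    # them agree on it, that candidate becomes the leader.
    winner = votes[0]
    for v in votes[1:]:
        winner = better_vote(winner, v)
    return winner[0]  # myid of the elected leader
```

With equal zxids the largest myid wins, which matches the observation that a fresh 3-node cluster started in order usually elects server 2 or 3, never server 1.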
Election process after the Leader crashes:

1. Change state: after the Leader goes down, the remaining non-Observer servers change their state to LOOKING and enter the Leader election process.
2. Each server casts a vote. Since the cluster has been running, the servers' ZXIDs may differ; each server sends its own vote to every machine in the cluster.
3. The rest is the same as the election process at startup.