ZooKeeper's core principles


The origin of ZooKeeper

  1. Data consistency across nodes

  2. How to ensure that a task is executed on only one node

  3. If orderserver1 goes down, how do the other nodes discover this and take over its work

  4. Mutual exclusion and safety when shared resources exist


Apache's ZooKeeper

Google's Chubby is a distributed lock service; Google uses Chubby to solve problems such as distributed coordination and Master election. Apache's ZooKeeper provides the same kind of distributed lock service.

ZooKeeper's design considerations

  1. Prevent single points of failure

    1. A cluster (Leader/Follower) also spreads requests across nodes, giving both high availability and high performance

  2. The data on every node must be consistent (a leader is required)

    Leader/Master (centralized) vs. redis-cluster (decentralized)

  3. What if the leader in the cluster goes down? How is the data recovered?

    Election mechanism? Data recovery

  4. How to ensure data consistency? (Distributed transactions)

    The 2PC protocol (two-phase commit)

2PC

2PC (Two-Phase Commit Protocol): when a transaction spans multiple distributed nodes, a "coordinator" (TM) is introduced to preserve the ACID properties of the transaction. The TM schedules the execution logic of all the distributed nodes, which are called participants (APs), and finally decides whether those APs really commit the transaction. Because the whole transaction is committed in two phases, it is called 2PC.

Phase 1: Submit transaction request

  1. Transaction inquiry

    1. The coordinator sends the transaction content to all participants, asks whether the transaction commit operation can be executed, and starts waiting for each participant's response.

  2. Execute the transaction

    1. Each participant node executes the transaction operation and records Undo and Redo information in its transaction log, trying to complete all time-consuming work of the commit path in advance so that the subsequent commit is almost guaranteed to succeed.

  3. Each participant reports the result of the transaction inquiry back to the coordinator

    1. If a participant executed the transaction operation successfully, it replies Yes, meaning the transaction can be committed;

    2. If a participant failed to execute the transaction, it replies No to the coordinator, meaning the transaction cannot be committed;

    3. This first phase of the 2PC protocol is called the "voting phase": each participant votes on whether to proceed with the actual transaction commit.

Phase 2: Execute the transaction commit

In this phase, the coordinator decides whether to commit the transaction based on the feedback from each participant.

Two possibilities (a minimal coordinator sketch follows this list):

  • Commit the transaction

  • Abort (interrupt) the transaction
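The flow above can be made concrete with a short sketch. This is only an illustration, not a real transaction manager: the Participant interface and its method names are hypothetical stand-ins for the APs, and timeouts, retries, and logging are omitted.

```java
import java.util.List;

// A minimal 2PC coordinator sketch; Participant is an illustrative interface, not a real API.
public class TwoPhaseCommitDemo {

    interface Participant {
        boolean prepare(String tx);  // phase 1: execute the transaction, write undo/redo logs, vote yes/no
        void commit(String tx);      // phase 2: commit for real
        void rollback(String tx);    // phase 2: abort using the undo log
    }

    static void runTransaction(String tx, List<Participant> participants) {
        // Phase 1: transaction inquiry - every participant prepares and votes.
        boolean allYes = participants.stream().allMatch(p -> p.prepare(tx));

        // Phase 2: commit only if every participant voted yes, otherwise abort everywhere.
        if (allYes) {
            participants.forEach(p -> p.commit(tx));
        } else {
            participants.forEach(p -> p.rollback(tx));
        }
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean prepare(String tx)  { System.out.println("prepared " + tx); return true; }
            public void commit(String tx)      { System.out.println("committed " + tx); }
            public void rollback(String tx)    { System.out.println("rolled back " + tx); }
        };
        runTransaction("tx-1", List.of(ok, ok));   // both participants vote yes -> commit
    }
}
```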


ZooKeeper's cluster roles

In ZooKeeper, a client connects to a random node in the cluster.

If it is a read request, the data is read directly from the node the client is connected to.


If it is a write request, the request is forwarded to the leader, which submits the transaction and broadcasts it to the follower nodes in the cluster (note that observer nodes do not take part in voting). Each follower returns an ACK to the leader (the ACK indicates whether that node can execute the transaction); as long as more than half of the nodes write successfully, the write request is committed. This is why the cluster needs 2n+1 nodes.
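As a minimal sketch of the client's view of this, the following example uses the standard org.apache.zookeeper.ZooKeeper client to issue one write and one read; the connect string and znode path are placeholders. The client never needs to know which node is the leader, because write requests are forwarded to it internally.

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ZkReadWriteDemo {
    public static void main(String[] args) throws Exception {
        // Connect to any node in the ensemble; the connect string below is a placeholder.
        ZooKeeper zk = new ZooKeeper("localhost:2181,localhost:2182,localhost:2183",
                                     15000, event -> System.out.println("event: " + event));

        // A write request: forwarded to the leader and committed only after a quorum of ACKs.
        if (zk.exists("/demo", false) == null) {
            zk.create("/demo", "hello".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // A read request: served directly by the node the client is connected to.
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo", false, stat);
        System.out.println(new String(data) + " (zxid=" + Long.toHexString(stat.getMzxid()) + ")");

        zk.close();
    }
}
```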


Leader role

The leader is the core of ZooKeeper and plays the leading role in the whole cluster:

  1. Schedules and processes transaction requests

  2. Guarantees the ordering of transaction processing

Follower role


  • Handles non-transaction (read) requests

  • Forwards transaction requests to the leader server

  • Votes on transaction request Proposals (a Proposal issued by the leader requires votes from more than half of the servers before the leader is notified to commit the data)

  • Votes in leader elections

Observer role

As the name suggests, an observer:

  1. Observes the state changes in the cluster and synchronizes those states

  2. Works the same way as a follower node; the only difference is that it does not vote on transaction requests and does not take part in leader election

  3. Serves non-transaction requests only; observers are usually added to improve the cluster's non-transaction (read) throughput without affecting its transaction processing capacity
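For reference, a sketch of how an observer is typically declared in zoo.cfg, assuming a hypothetical four-node ensemble with three voting members; hostnames and ports are placeholders.

```
# zoo.cfg on the observer node itself:
peerType=observer

# Server list shared by all nodes; the ":observer" suffix marks the non-voting member:
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
server.4=zoo4:2888:3888:observer
```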

Note: why 2n+1 nodes are needed

2n+1 means an odd number of nodes. For ZooKeeper to serve clients normally, its voting mechanism requires that more than half of the machines are working and able to communicate with each other, so that transaction votes can reach a result. For example, a 5-node ensemble (n = 2) tolerates 2 failures, while a 6-node ensemble still tolerates only 2, so the extra even node adds no fault tolerance.

ZAB protocol

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support, designed specifically for ZooKeeper as a distributed coordination service. ZooKeeper relies mainly on the ZAB protocol to achieve distributed data consistency; based on this protocol, ZooKeeper implements an active/standby architecture that keeps the data replicas in the cluster consistent.

In short, ZAB is an atomic broadcast protocol that supports crash recovery and is used mainly for data consistency.

The ZAB protocol has two basic modes:

  • Crash recovery (recover the leader node and recover the data)

  • Atomic (message) broadcast

How message broadcast works

The message broadcast process is essentially a simplified two-phase commit (2PC); a leader-side sketch follows the steps below.

  1. When the leader receives a message request, it assigns the message a globally unique, monotonically increasing 64-bit id (the ZXID). Ordering messages by ZXID gives them a causal order.

  2. The leader prepares a FIFO queue for each follower and distributes the message, together with its ZXID, to all followers as a proposal.

  3. When a follower receives the proposal, it first writes the proposal to disk and then replies to the leader with an ACK.

  4. When the leader has received a quorum (more than half) of ACKs, it sends a commit command to the followers and applies the message locally.

  5. When a follower receives the commit for the message, it commits the message.

Note: the leader's voting process does not require the observers' ACKs, but observers must synchronize the leader's data to preserve consistency.
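A minimal leader-side model of this flow, assuming a five-node ensemble; the class and method names are illustrative, not ZooKeeper internals, and the actual network sends are left as comments.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ZabBroadcastDemo {
    private final AtomicLong counter = new AtomicLong();   // lower 32 bits of the zxid
    private final long epoch = 1;                          // upper 32 bits of the zxid
    private final int followerCount = 4;                   // e.g. 1 leader + 4 followers
    private final Map<Long, Integer> acks = new ConcurrentHashMap<>();

    // Steps 1-2: assign a globally unique, increasing zxid and propose the message to every follower.
    long propose(byte[] message) {
        long zxid = (epoch << 32) | counter.incrementAndGet();
        acks.put(zxid, 1);                                  // the leader's own write counts toward the quorum
        // each follower has its own FIFO queue: sendToAllFollowers(zxid, message);
        return zxid;
    }

    // Steps 3-5: followers ACK after logging the proposal; commit once a majority has acknowledged.
    void onAck(long zxid) {
        int count = acks.merge(zxid, 1, Integer::sum);
        int quorum = (followerCount + 1) / 2 + 1;           // majority of a 5-node ensemble = 3
        if (count == quorum) {
            // sendCommitToFollowers(zxid); followers apply the message when they receive the commit
            System.out.println("commit " + Long.toHexString(zxid));
        }
    }

    public static void main(String[] args) {
        ZabBroadcastDemo leader = new ZabBroadcastDemo();
        long zxid = leader.propose("set /demo hello".getBytes());
        leader.onAck(zxid);   // ACK from follower 1
        leader.onAck(zxid);   // ACK from follower 2 -> quorum of 3 reached, commit
    }
}
```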

Crash recovery

  1. The leader loses contact with more than half of the follower nodes, or

  2. The leader server goes down.

In either case the cluster enters the crash recovery phase. For data recovery, two guarantees are needed:

  1. Messages that have already been processed cannot be lost

    1. When the leader has received a quorum of follower ACKs, it broadcasts a commit command to every follower and commits the transaction message itself.

    2. If the leader crashes before every follower has received the commit command, some nodes will have received the commit and some will not.

    3. The ZAB protocol must ensure that such already-processed messages are not lost.

  2. Messages that have been discarded cannot reappear

    1. This is the case where the leader receives a transaction request but crashes before it initiates the transaction vote; that message must be discarded and must not reappear after recovery.

To satisfy these two requirements, the ZAB protocol must design a leader election algorithm that guarantees that transaction proposals already committed by the old leader are eventually committed, while proposals that were only received but never voted on are discarded.

ZAB's design ideas

  • The new leader has the largest ZXID

    If the leader election algorithm guarantees that the newly elected leader holds the transaction proposal with the highest number (the largest ZXID) among all machines in the cluster, then the new leader is guaranteed to have every committed proposal. A proposal can only be committed after more than half of the followers have ACKed it, which means more than half of the servers hold that proposal in their transaction logs; so as long as a quorum of nodes is working normally, at least one of them holds the state of every committed message.

  • The concept of epoch

    Every time a new leader is elected, the epoch is incremented by 1. The ZXID is a 64-bit number: the lower 32 bits are a message counter, incremented for every message and reset to 0 when a new leader is elected, and the upper 32 bits store the epoch number. This design ensures that an old leader that restarts after crashing cannot be elected leader again, because its ZXID is necessarily smaller than that of the current new leader. When the old leader reconnects to the new leader as a follower, the new leader makes it clear all proposals carrying the old epoch number that have not been committed.

ZXID

ZXID is the transaction id. To guarantee the ordering of transactions, ZooKeeper uses an incrementing transaction id to identify each transaction, and every Proposal is issued with its ZXID attached.

The ZXID is a 64-bit number split into a low 32-bit part and a high 32-bit part (a small bit-extraction sketch follows this list):

  • Lower 32 bits: the message counter

  • Higher 32 bits: the epoch, used to mark whether the leader has changed. Every time a new leader is elected there is a new epoch (the previous epoch + 1), identifying the current period of that leader's reign.
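For illustration, a tiny sketch of how the two parts can be pulled out of a zxid value; the class name is made up for this example.

```java
public class ZxidDemo {
    // The upper 32 bits of a zxid hold the epoch, the lower 32 bits hold the message counter.
    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long zxid = 0x100000001L;                             // epoch 1, counter 1
        System.out.println("epoch   = " + epochOf(zxid));     // 1
        System.out.println("counter = " + counterOf(zxid));   // 1
    }
}
```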

Note:

The epoch can be understood as the era or dynasty of the current cluster; each leader is like an emperor with its own era name, and whenever the leader changes, 1 is added to the previous epoch.

Viewing the epoch on Linux

Under the /tmp/zookeeper/VERSION-2 path you will find a currentEpoch file; its content is the current epoch.

Viewing the transaction log with a command

java -cp :/opt/zookeeper/zookeeper-3.4.10/lib/slf4j-api-1.6.1.jar:/opt/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.jar org.apache.zookeeper.server.LogFormatter /tmp/zookeeper/version-2/log.100000001

Leader election

Note: the election algorithm is FastLeaderElection. The election looks at the following values:

  • ZXID (transaction id): the larger the ZXID, the newer the data; the node with the largest ZXID is preferred as leader.

  • epoch: each round of voting increases the epoch.

  • myid (server id): the larger the myid, the greater its weight in the leader election.

Server states: when a service starts, its state is LOOKING (looking for a leader). The other states are:

  • LEADING

  • FOLLOWING

  • OBSERVING

Leader election at server startup

  1. Each server issues a vote. Initially each server votes for itself as the leader and then sends its vote to the other servers in the cluster.

    1. The vote contains (myid, zxid, epoch).

  2. Each server receives votes from the other servers.

    1. It judges whether each vote is valid:

      1. check whether it belongs to the current round of voting (epoch);

      2. check whether it comes from a server in the LOOKING state.

  3. Process the votes (see the comparison sketch after this list).

    1. Compare ZXIDs: the server with the larger ZXID is chosen as the leader candidate.

    2. If the ZXIDs are equal, compare myids: the server with the larger myid is chosen.

  4. Count the votes.

    1. Determine whether more than half of the servers have received the same voting information.

    2. If more than half accept it, the leader is considered elected.

  5. Change server state.

    1. If a server is a follower, its state becomes FOLLOWING.

    2. If it is the leader, its state becomes LEADING.
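A minimal sketch of the comparison rule described above, assuming the order epoch, then ZXID, then myid; the Vote type and method names are made up for this illustration and are not ZooKeeper's internal classes.

```java
public class VoteDemo {
    record Vote(long myid, long zxid, long epoch) {}

    // Returns true if the candidate vote should replace our current choice of leader.
    static boolean supersedes(Vote candidate, Vote current) {
        if (candidate.epoch() != current.epoch()) return candidate.epoch() > current.epoch();
        if (candidate.zxid()  != current.zxid())  return candidate.zxid()  > current.zxid();
        return candidate.myid() > current.myid();
    }

    public static void main(String[] args) {
        Vote mine  = new Vote(1, 0x100000005L, 1);
        Vote other = new Vote(3, 0x100000005L, 1);          // same epoch and zxid, larger myid
        System.out.println(supersedes(other, mine));        // true: server 3 wins the comparison
    }
}
```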

Leader election when the leader crashes

  1. Change state.

    1. After the leader goes down, the remaining non-observer servers change their state to LOOKING,

    2. and the leader election process begins.

  2. Each server issues a vote.

    1. During normal operation each server's ZXID may differ; each server sends its vote to all machines in the cluster.

  3. The rest of the process is the same as at startup.
