Implementing distributed lock safety with ZooKeeper

  • Background
  • ConnectionLoss: the connection is lost
  • SessionExpired: the session expires
  • Bypassing the zookeeper broker for status notification
  • Leader election and disconnection from the zkNode
  • Static expansion, dynamic expansion
  • Be idempotent
  • Background

    Distributed locks are used more and more often, usually to coordinate multiple concurrent tasks. In common application scenarios there are unsafe usages, and an unsafe usage lets multiple masters execute in parallel. The side effects may be repeated business calculations or duplicated data, or a node acting as master even though it never obtained the lock, and so on.

    To obtain a distributed lock reliably, and to accurately track how the lock moves between nodes in a distributed setting, we have to deal with the chain reactions triggered by network changes, such as the common sessionExpired and connectionLoss events. How do we make sure we really hold the lock when we set the lock state?

    When designing tasks we also need a stop-point strategy: a mechanism that can hand over execution rights as soon as the loss of the lock is detected. Whether this needs to be handled so strictly depends on the business scenario. For example, if the downstream tasks are already idempotent, repeated calculations do no harm; but in some cases strict and precise control really is required.

    ConnectionLoss: the connection is lost

    Let’s start with the first scenario, the connectionLoss event. This event means that a submitted commit may have succeeded or failed. Success means the request was executed on the zookeeper broker, but the TCP connection dropped while the response was being returned, so we never received the resulting status. Failure means the connection broke before the request ever reached the zookeeper broker.

    So when we acquire the lock we need to handle the connectionLoss event. Let’s look at an example.

    protected void runForMaster() {

        logger.info("master:run for master.");

        AsyncCallback.StringCallback createCallback =
            (rc, path, ctx, name) -> {
                switch (KeeperException.Code.get(rc)) {
                    case CONNECTIONLOSS:
                        checkMaster(); // connection lost: check whether the master znode was created successfully
                        return;
                    case OK:
                        isLeader = true;
                        logger.info("master:I'm the leader serverId:" + serverId);
                        addMasterWatcher();    // watch the master znode
                        this.takeLeadership(); // exercise leader rights
                        break;
                    case NODEEXISTS:
                        isLeader = false;
                        String serverId = this.getMasterServerId();
                        this.takeBackup(serverId);
                        break;
                }
            };

        zk.create(rootPath + "/master", serverId.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL, createCallback, null); // create the master znode
    }

    /**
     * check master: loop until a definite state is obtained
     */
    private void checkMaster() {

        AsyncCallback.DataCallback masterCheckCallback =
            (rc, path, ctx, data, stat) -> {
                switch (KeeperException.Code.get(rc)) {
                    case CONNECTIONLOSS:
                        checkMaster();
                        return;
                    case NONODE:
                        runForMaster();
                        return;
                    default: {
                        String serverId = this.getMasterServerId();
                        isLeader = serverId.equals(this.serverId);
                        if (BooleanUtils.isNotTrue(isLeader)) {
                            this.takeBackup(serverId);
                        } else {
                            this.takeLeadership();
                        }
                        return;
                    }
                }
            };

        zk.getData(masterZnode, false, masterCheckCallback, null);
    }

    The master here means having the right to execute. Only after the master role has been successfully acquired may the node exercise the master's rights.

    Once the runForMaster method sees a connectionLoss, it calls checkMaster to verify the result, and checkMaster itself also handles connectionLoss, looping until a definite status is obtained. By then another node may already have acquired the master role, in which case the current node takes the backup role and waits for its chance.

    We need to capture all of zookeeper's state changes: we need to know when the master fails so we can prepare to apply for the role, and when we are the master and the session fails, the master rights must be released.

    /**
     * Watch the master znode for master/slave switching
     */
    private void addMasterWatcher() {

        AsyncCallback.StatCallback addMasterWatcher = (rc, path, ctx, stat) -> {
            switch (KeeperException.Code.get(rc)) {
                case CONNECTIONLOSS:
                    addMasterWatcher();
                    break;
                case OK:
                    if (stat == null) {
                        runForMaster(); // the master znode no longer exists
                    } else {
                        logger.info("master:watcher master znode ok.");
                    }
                    break;
                case NONODE:
                    logger.info("master:master znode delete.");
                    runForMaster();
                    break;
            }
        };

        zk.exists(masterZnode, MasterExistsWatcher, addMasterWatcher, null);
    }

    The zookeeper watcher mechanism is used to monitor these states and to keep track of changes in the network and in zookeeper itself.

    SessionExpired: the session expires

    Now for the second problem. The first problem was how to reliably obtain the current state when acquiring the lock, the state being whether we hold the master role or the backup role.

    Suppose we have successfully established a connection with the zookeeper broker, acquired the master role and are fulfilling the master's duties, and zookeeper suddenly reports that the session has expired. The SessionExpired event means that zookeeper has deleted all ephemeral znodes created by the current session, which in turn means the master znode can now be created by another session.

    At this point we must hand over our master rights, that is, stop the tasks we are currently executing, and this stopped state must be visible globally. The most likely problem here is that we are no longer the master yet are still quietly exercising the master's execution rights. On a dashboard you would then see a very strange picture: a server that is not the master is still executing.

    case SESSIONEXPIRED:
        // execute the stop-point notification
        this.stopPoint();
        break;

    So we need to design our tasks with a stop-point strategy, similar to the JVM safepoint, so that they can respond to a global stop at any time.
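    A minimal sketch of such a stop-point strategy, assuming the task runs as a loop; running and processNextBatch are illustrative names, while takeLeadership, stopPoint and isLeader correspond to the code above:

    private volatile boolean running = false;

    private void takeLeadership() {
        running = true;
        while (running) {
            // each iteration is a stop point: the flag is checked before
            // the next unit of work, similar to a JVM safepoint
            processNextBatch();
        }
    }

    // called from the SESSIONEXPIRED branch: give up master rights at once
    private void stopPoint() {
        isLeader = false;
        running = false; // the task loop observes this at its next stop point
    }

    private void processNextBatch() {
        // business work goes here (illustrative placeholder)
    }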

    Bypassing the zookeeper broker for status notification

    Another common usage is to bypass zookeeper for status notification.

    We all know that a zookeeper cluster is composed of multiple instances, and each instance may be located in a different region of the country or even the world, so there can be a large synchronization delay between the leader and these nodes. Zookeeper completes a commit with a quorum-based two-phase commit.

    For example, suppose 7 instances form a zookeeper cluster. When a client writes a commit, only more than half of the cluster needs to complete the write for the commit to be considered successful. However, the client immediately executes the next task after receiving the success response, and that task may be to read all the state data under a certain znode; at that moment the node it reads from may not yet have this state.

    For a distributed lock, this means the transfer of the lock inside the zk cluster may not stay in sync with the client cluster. Therefore, as long as cluster scheduling is based on zookeeper, status notification must go through zookeeper itself; you cannot bypass zookeeper and do the scheduling on your own.
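    If a client does need to read the very latest state right after another client's commit, one option is zookeeper's sync call, which asks the server the client is connected to to catch up with the leader before the read is issued. The following is only a sketch; statusZnode is an illustrative path and the retry handling from the earlier callbacks is omitted.

    zk.sync(statusZnode, (rc, path, ctx) -> {
        if (KeeperException.Code.get(rc) == KeeperException.Code.OK) {
            try {
                // after sync, the connected server has caught up with the leader,
                // so this read reflects the latest committed state
                byte[] data = zk.getData(statusZnode, false, null);
                // ... use the up-to-date status here
            } catch (KeeperException | InterruptedException e) {
                // handle connectionLoss / retries as in the callbacks above
            }
        }
    }, null);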

    Leader election and disconnection from the zkNode

    The zookeeper leader is the serializer of all state changes. Adds, updates and deletes must be processed by the leader and then propagated to all follower and observer nodes.

    All sessions are stored on the leader, while all watchers are stored on the zookeeper node the client is connected to. Both of these facts can cause notifications of state transitions to arrive late.

    If zookeeper is a cluster spanning multiple data centers, there is a remote-synchronization delay problem. The leader should be placed in the data center where the writes happen and should have the largest zid; ideally the machines with high zids are all in the data center that takes the writes. This ensures that a leader crash does not easily move the leader election to another data center.

    But followers and observers also have clients attached, and distributed clusters are also coordinated through these nodes.

    Let’s talk about how leader election causes delayed perception on remote nodes. Suppose the current zookeeper cluster consists of 7 machines:

    dataCenter shanghai: zid=100, zid=80, zid=50
    dataCenter beijing: zid=10, zid=20
    dataCenter shenzhen: zid=30, zid=40

    Due to a network problem the cluster holds a leader election; zid=100 temporarily leaves the cluster and zid=80 becomes the leader. Here we ignore how up-to-date each log is and assume the election simply prefers the highest zid.

    Because all sessions in the cluster were stored on the original leader with zid=100, the new leader has no session information, so all sessions are lost.

    How long a session is retained depends on the sessionTimeout we set. The client uses pings to send heartbeats to the zkNode it is connected to, which may be a node of any role; that zkNode in turn heartbeats with the zk leader node to keep the session alive, and the zkNode also uses the pings to track whether the session has timed out.

    When the original client later reconnects to a zkNode it will be notified of sessionExpired. The sessionExpired event is delivered by the zkNode: once the session has been lost or has expired, the client is told that the session has expired the next time it manages to connect to a zkNode.

    If the client only reacts to sessionExpired, there will obviously be multiple-master situations, because while you are disconnected from the zkNode and have not yet received the sessionExpired event, another client may already have created the master znode and taken over the master rights.

    The same thing happens when a zkNode leaves the cluster: the sessionExpired notification is delayed after the zkNode disconnects, and all watchers have to be re-created on a new zkNode before new events can be received.
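    For this reason it helps to react to the Disconnected connection event as well, not only to Expired. The sketch below is one hypothetical way to do that with a default watcher; pauseLeadership is an illustrative name, while addMasterWatcher, checkMaster and stopPoint refer to the methods above.

    private final Watcher connectionWatcher = event -> {
        if (event.getType() == Watcher.Event.EventType.None) {
            switch (event.getState()) {
                case Disconnected:
                    // we may already have lost the master znode on the zk side;
                    // stop acting as master until the state is confirmed
                    pauseLeadership(); // hypothetical helper
                    break;
                case SyncConnected:
                    // reconnected: re-register the watcher and re-check who the master is
                    addMasterWatcher();
                    checkMaster();
                    break;
                case Expired:
                    // session gone, ephemeral znodes deleted: hand over master rights
                    stopPoint();
                    break;
                default:
                    break;
            }
        }
    };

    Such a watcher would typically be passed as the default watcher when constructing the client, for example new ZooKeeper(connectString, sessionTimeout, connectionWatcher).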

    Static expansion, dynamic expansion

    In extreme cases static expansion can cause serious data inconsistency in the zookeeper cluster. For example, an existing cluster has instances A, B and C and needs to be expanded statically: the A, B and C instances are stopped and the D and E instances are added. If C is the most lagging instance of the three and A and B happen to start more slowly than C, then C, D and E form a new cluster first, and the new epoch will overwrite the original log on A and B. Of course, static expansion is basically no longer used; nowadays expansion is essentially dynamic.

    Dynamic expansion has similar problems in extreme cases. Suppose there are three machine rooms, 1, 2 and 3. Machine room 1 holds the leader with zid=200 and a node with zid=100; machine room 2 has zid=80 and zid=50; machine room 3 has zid=40. Assume the last commit was acknowledged only by zid=200, zid=100 and zid=50. If machine room 1 then loses network connectivity, the nodes zid=80 and zid=50 in machine room 2 and zid=40 in machine room 3 form a new cluster, and the new epoch is produced on zid=50.
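    Dynamic expansion here refers to zookeeper's dynamic reconfiguration, available since version 3.5 through org.apache.zookeeper.admin.ZooKeeperAdmin. A minimal sketch, assuming reconfiguration is enabled on the ensemble; the connect string and the server entry are purely illustrative values:

    // assumes ZooKeeper 3.5+ with dynamic reconfiguration enabled
    ZooKeeperAdmin admin = new ZooKeeperAdmin("zk1:2181,zk2:2181,zk3:2181",
            15000, event -> { });

    // add one new participant at a time, so each change is confirmed
    // by a quorum of the previous configuration
    String joining = "server.4=zk4:2888:3888:participant;2181";
    admin.reconfigure(joining, null, null, -1, new Stat());
    admin.close();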

    Be idempotent

    When zookeeper is used to implement distributed locks or cluster scheduling, many distributed problems can occur. To ensure that these problems do not leave the business systems or business data inconsistent, we should still build idempotence into these tasks.

    For example, data calculations, time checks, version checks and so on. If you build an independent distributed system on top of zookeeper, even more of this work is required.
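    One simple building block for such version checks is the znode version itself, used as an optimistic check so that a stale master cannot overwrite state written by a newer one. A minimal sketch; statusZnode and computeNewStatus are hypothetical:

    Stat stat = new Stat();
    byte[] current = zk.getData(statusZnode, false, stat);

    byte[] updated = computeNewStatus(current); // hypothetical business step

    try {
        // setData succeeds only if nobody has written since we read this version;
        // BadVersionException means another node got there first
        zk.setData(statusZnode, updated, stat.getVersion());
    } catch (KeeperException.BadVersionException e) {
        // lost the race: re-read the state and decide whether the work is still needed
    }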

    Author: Wang Qingpei (Senior Architect of Hujiang Group)
