ZooKeeper fundamentals

1. The origin of ZooKeeper

  ZooKeeper originated in a research group at Yahoo Research. At the time, researchers noticed that many large systems within Yahoo depended on some similar system for distributed coordination, but those systems often suffered from single points of failure.

   Yahoo developers therefore set out to build a general-purpose distributed coordination framework with no single point of failure, so that developers could focus on business logic.

   There is an interesting anecdote about the name of the “ZooKeeper” project. Early in the project, since many internal projects were named after animals (such as the well-known Pig project), Yahoo engineers wanted to give this project an animal name as well. Raghu Ramakrishnan, the chief scientist of the institute at the time, joked: “If this continues, we will become a zoo!” At that, everyone agreed the project should be called ZooKeeper: put together, the distributed components named after animals made Yahoo’s entire distributed system look like a large zoo.

   ZooKeeper happens to be the thing that coordinates that distributed environment, and so the name ZooKeeper was born.

2. ZooKeeper overview

  ZooKeeper is an open source distributed coordination service. The ZooKeeper framework was originally built at Yahoo! so that its applications could be accessed in a simple and robust way.

   Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks.

   For example, Apache HBase uses ZooKeeper to track the state of distributed data.

  The design goal of ZooKeeper is to encapsulate complex, error-prone distributed consistency services into an efficient and reliable set of primitives, and to offer users a set of simple, easy-to-use interfaces.

   Primitive: a term from operating systems and computer networking. A primitive is made up of several instructions that together carry out a particular function. It is indivisible: its execution must be continuous and may not be interrupted.

   ZooKeeper is a typical distributed data consistency solution. Distributed applications can build on ZooKeeper to implement functions such as data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, master election, distributed locks, and distributed queues.

   One of the most common usage scenarios for ZooKeeper is serving as the registry for service producers and service consumers.

   Service producers register the services they provide with ZooKeeper. When service consumers make a service call, they first look the service up in ZooKeeper; once they have the service producer's details, they can call the producer to obtain its content and data.
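   The register-then-look-up flow described above can be sketched with a small in-memory model. This is only an illustration: `ToyRegistry`, the service name, and the addresses are hypothetical, and the class is not the ZooKeeper or Dubbo API.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory stand-in for a service registry (not the ZooKeeper API)."""

    def __init__(self):
        # service name -> set of provider addresses
        self._providers = defaultdict(set)

    def register(self, service, address):
        """Called by a service producer to advertise where it can be reached."""
        self._providers[service].add(address)

    def unregister(self, service, address):
        """Called when a producer goes away."""
        self._providers[service].discard(address)

    def lookup(self, service):
        """Called by a service consumer before it makes a call."""
        return sorted(self._providers.get(service, ()))


registry = ToyRegistry()

# Two producers register the same (hypothetical) service.
registry.register("com.example.GreetingService", "10.0.0.1:20880")
registry.register("com.example.GreetingService", "10.0.0.2:20880")

# A consumer looks the service up, then picks an address to call.
providers = registry.lookup("com.example.GreetingService")
print(providers)
```

   In a real deployment the registry state lives in ZooKeeper rather than in one process, so producers and consumers on different machines see the same list.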

  As shown in the figure below, ZooKeeper plays the role of the registry in the Dubbo architecture.

  [Figure: ZooKeeper as the registry in the Dubbo architecture]

3. My own use of ZooKeeper

   In my own projects, ZooKeeper is mainly used as the registry for Dubbo (Dubbo officially recommends ZooKeeper as its registry).

   In addition, when building a Solr cluster, I used ZooKeeper as the management tool for the cluster.

   In that role, ZooKeeper mainly provides the following functions:

    1. Cluster management: fault tolerance and load balancing.
    2. Centralized management of configuration files.
    3. The entry point to the cluster.

   I personally think that it is best to use the clustered version of ZooKeeper rather than the stand-alone version.

   The architecture diagram on the official website depicts a clustered ZooKeeper. Usually 3 servers are enough to form a ZooKeeper cluster.

  Why is it best to use an odd number of servers to form a ZooKeeper cluster?

     ZooKeeper relies on the Zab protocol for leader election. The core idea of Zab is that a write is considered successful once a majority of the servers have written it successfully:

     1. If there are 3 servers, a majority is 2, so at most 1 server is allowed to fail.

     2. If there are 4 servers, a majority is 3, so still at most 1 server is allowed to fail.

   Since a cluster of 3 and a cluster of 4 can each tolerate at most 1 failed server, their reliability is the same, and the fourth server adds no fault tolerance.

   So simply choose an odd number of ZooKeeper servers; here, 3 servers.
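   The majority arithmetic behind this argument can be checked in a few lines. This is a minimal sketch of the counting only; the function names are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority still survives."""
    return servers - quorum_size(servers)


for n in range(3, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

   Going from 3 to 4 servers (or from 5 to 6) raises the quorum size without raising the number of failures the cluster can survive, which is why odd cluster sizes are preferred.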

4. Important concepts of ZooKeeper

  a. ZooKeeper itself is a distributed program (as long as more than half of the nodes survive, ZooKeeper can serve requests normally).

  b. To ensure high availability, it is best to deploy ZooKeeper as a cluster. Then, as long as a majority of the machines in the cluster are available (that is, some machine failures are tolerated), ZooKeeper itself remains available.

  c. ZooKeeper keeps its data in memory, which gives it high throughput and low latency (but memory limits how much can be stored, which is a further reason to keep the amount of data stored in each znode small).

  d. ZooKeeper is high-performance, especially in applications where reads outnumber writes, because a write causes state to be synchronized across all servers. (Reads outnumbering writes is the typical scenario for a coordination service.)

  e. ZooKeeper has the concept of ephemeral (temporary) nodes. As long as the client session that created an ephemeral node remains active, the node exists; when the session ends, the node is deleted. A persistent node, by contrast, stays stored in ZooKeeper once it is created, unless it is explicitly removed.

  f. At the bottom layer, ZooKeeper actually provides only two functions: ① managing (storing and reading) the data submitted by user programs; ② providing a data node watch (monitoring) service to user programs.
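  Points e and f can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the real ZooKeeper client API: `ToyZooKeeper`, its method names, the paths, and the session id are all hypothetical. It models persistent versus ephemeral nodes, and watches that fire once and are then removed.

```python
class ToyZooKeeper:
    """In-memory toy model of znodes; not the real ZooKeeper client API."""

    def __init__(self):
        self._nodes = {}    # path -> (data, owning session id or None)
        self._watches = {}  # path -> callbacks waiting for the next change

    def create(self, path, data, ephemeral_owner=None):
        """Create a node. ephemeral_owner=None makes it persistent."""
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, ephemeral_owner)
        self._fire(path)

    def get(self, path, watch=None):
        """Read a node's data, optionally leaving a one-shot watch on the path."""
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        node = self._nodes.get(path)
        return None if node is None else node[0]

    def set(self, path, data):
        """Overwrite the data of an existing node and notify watchers."""
        _, owner = self._nodes[path]
        self._nodes[path] = (data, owner)
        self._fire(path)

    def delete(self, path):
        """Remove a node explicitly and notify watchers."""
        del self._nodes[path]
        self._fire(path)

    def close_session(self, session_id):
        """End a session: every ephemeral node it owns disappears."""
        owned = [p for p, (_, owner) in self._nodes.items()
                 if owner is not None and owner == session_id]
        for path in owned:
            self.delete(path)

    def _fire(self, path):
        # Watches are one-shot: remove them first, then notify.
        for callback in self._watches.pop(path, []):
            callback(path)


zk = ToyZooKeeper()
zk.create("/config", "v1")                                     # persistent
zk.create("/workers/w1", "alive", ephemeral_owner="session-1")  # ephemeral

events = []
zk.get("/workers/w1", watch=events.append)  # watch the ephemeral node

zk.close_session("session-1")  # the ephemeral node goes away, the watch fires
print(events, zk.get("/workers/w1"), zk.get("/config"))
```

  The callback receives only the path, so a client that wants the new state has to read the node again and, if it wants further notifications, set a new watch.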

http://developer.51cto.com/art/201809/583184.htm
