ZooKeeper installation and deployment

1. System requirements

ZooKeeper (abbreviated zk below) runs on a variety of platforms. Table 1 lists the supported platforms and whether each is supported for development and for production use.
Table 1: Platforms supported by ZooKeeper
System     Development environment   Production environment
Linux      Supported                 Supported
Solaris    Supported                 Supported
FreeBSD    Supported                 Supported
Windows    Supported                 Not supported
MacOS      Supported                 Not supported
ZooKeeper is written in Java and runs on the Java platform, so a Java runtime environment must be installed on every machine where zk is deployed. zk requires JRE 1.6 or later.
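As a rough illustration of checking this requirement, the sketch below turns a Java version string into its major number and compares it with the minimum of 6. The helper names java_major and java_ok_for_zk are hypothetical, not part of ZooKeeper, and the parsing assumes the two common version schemes (1.x.y for older releases, x.y.z from Java 9 on).

```shell
# Hypothetical helper: turn a Java version string into its major number.
# "1.6.0_45" gives 6 (old 1.x scheme); "11.0.2" gives 11 (newer scheme).
java_major() {
    version=$1
    first=${version%%.*}              # text before the first dot
    if [ "$first" = "1" ]; then
        rest=${version#1.}            # drop the leading "1."
        echo "${rest%%.*}"
    else
        echo "$first"
    fi
}

# Succeeds when the given version meets the JRE 1.6 minimum stated above.
java_ok_for_zk() {
    [ "$(java_major "$1")" -ge 6 ]
}

java_major 1.8.0_292
```

The version string to pass in is the quoted value on the first line printed by java -version, which writes to standard error.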
For a cluster-mode deployment, three ZooKeeper server processes are the recommended minimum. The processes should run on different physical machines, so that the failure of one machine does not take down the cluster and the cluster stays highly available.
ZooKeeper's hardware requirements are modest. At Yahoo!, for example, ZooKeeper is typically deployed on machines with a dual-core processor, 2 GB of memory, and an 80 GB hard disk.

2. Download

ZooKeeper can be downloaded from https://zookeeper.apache.org/releases.html. At the time of writing, the latest stable version is 3.4.8. Choose a nearby mirror for a faster download.

3. Directory structure

After downloading and unpacking the ZooKeeper archive, you will see the following files and directories:

bin directory: zk executable scripts, including the scripts for the zk server process and the zk client. The .sh files are for Linux and the .cmd files are for Windows.
conf directory: configuration files. zoo_sample.cfg is a sample configuration file; copy or rename it to a name of your own, usually zoo.cfg. log4j.properties is the logging configuration file.
lib directory: the libraries that zk depends on.
contrib directory: additional tools for working with zk.
recipes directory: code examples showing typical uses of zk.

4. Stand-alone mode

ZooKeeper can be installed in stand-alone mode or in cluster mode.
Stand-alone mode is the simpler of the two: a single zk process is deployed, and clients communicate with it directly.
Development and test environments usually have limited physical resources, so stand-alone mode is the common choice there. A cluster can also be deployed on a single physical machine, but doing so increases the resource consumption on that machine.
Stand-alone mode must not be used in production, because it meets production requirements for neither reliability nor read/write performance.

4.1 Run Configuration

As mentioned above, the conf directory contains the sample configuration zoo_sample.cfg. Before running zk, rename it (or copy it) to zoo.cfg.
Open zoo.cfg to see the default settings.
tickTime
The basic unit of time used by zk, in milliseconds. For example, tickTime is the heartbeat interval, and the minimum client session timeout is 2 * tickTime.
The value in the sample file is 2000 milliseconds. A lower tickTime detects timeouts sooner, but it also increases network traffic (heartbeat messages) and CPU usage (session tracking).
clientPort
The TCP port on which the zk server process listens for client connections. The sample configuration uses port 2181.
dataDir
Has no built-in default and must be set. It is the directory where snapshot files are stored. If dataLogDir is not set, the transaction logs are stored in this directory as well.
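The three settings described above are enough for a stand-alone server. The following is a minimal sketch that writes such a zoo.cfg; CONF_DIR and DATA_DIR are placeholders (temporary directories here, so the sketch can be run anywhere) and would be replaced by the real conf directory and a permanent data directory.

```shell
# Sketch: write a minimal stand-alone zoo.cfg.
# CONF_DIR and DATA_DIR are placeholder locations, not ZooKeeper defaults.
CONF_DIR=$(mktemp -d)
DATA_DIR=$(mktemp -d)

{
    echo "# basic time unit, in milliseconds"
    echo "tickTime=2000"
    echo "# snapshots go here; so do transaction logs, because dataLogDir is unset"
    echo "dataDir=$DATA_DIR"
    echo "# TCP port that clients connect to"
    echo "clientPort=2181"
} > "$CONF_DIR/zoo.cfg"

cat "$CONF_DIR/zoo.cfg"
```

For anything beyond a quick test, avoid a temporary directory for dataDir, since losing it loses the snapshots and transaction logs.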

4.2 Start

On Windows, double-click zkServer.cmd. On Linux, change to the bin directory and run:
./zkServer.sh start
This starts the zk server process in the background.
To check the running status of zk, run:
./zkServer.sh status
In cluster mode, the output also shows whether the process is the leader or a follower.
To run the server in the foreground and watch its log output, run:
./zkServer.sh start-foreground
The server then prints detailed information about what it is doing.
If you open zkServer.cmd or zkServer.sh in a text editor, you will see that it calls zkEnv.cmd or zkEnv.sh. The zkEnv script sets the environment variables that zk runs with, such as the location and name of the configuration file.
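When the status command is used in scripts, it helps to extract just the role of the process. The sketch below assumes that the status output contains a line beginning with Mode: followed by standalone, leader, or follower. The helper name zk_mode is hypothetical, and the sample text (including its config path) is made up so that the sketch runs without a server.

```shell
# Hypothetical helper: read status output on stdin and print the value of
# its "Mode:" line (standalone, leader or follower).
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Sample text standing in for real output; the config path is invented.
sample_status='Using config: /opt/zookeeper/conf/zoo.cfg
Mode: standalone'

printf '%s\n' "$sample_status" | zk_mode
```

Against a running server, the same function would be fed from ./zkServer.sh status through a pipe.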

4.3 Connection

To connect to a zk process on the same host, run zkCli.cmd (on Windows) or zkCli.sh (on Linux) from the bin/ directory.
Run without arguments, zkCli.cmd and zkCli.sh connect to zk at host 127.0.0.1, port 2181. To connect to zk on a different machine, use the -server parameter, for example:
bin/zkCli.sh -server 192.168.0.1:2181

5. Cluster mode

Although a stand-alone zk process is convenient for development and testing, it is not suitable for production. In production, zk must be deployed in cluster mode.
In cluster mode, deploy at least three zk processes, and preferably an odd number. If only two processes are deployed and one of them fails, the remaining process is not a majority, so no quorum can be formed. A two-process deployment is therefore even less reliable than stand-alone mode, because the probability that at least one of two processes is unavailable is higher than the probability that a single process is unavailable.
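The majority argument can be made concrete with a little arithmetic. The sketch below uses two hypothetical helper functions, majority and tolerated_failures, and prints both values for clusters of one to five processes.

```shell
# Smallest number of processes that forms a majority of an n-process cluster.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of processes that can fail while a majority is still running.
tolerated_failures() {
    echo $(( $1 - $(majority "$1") ))
}

for n in 1 2 3 4 5; do
    echo "processes=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two processes tolerate no failures, just like one, and four tolerate only one failure, just like three. This is why an odd number of processes is the usual choice.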

5.1 Run configuration

In cluster mode, all zk processes can share the same configuration file, provided each process is deployed on a different machine. For example:
tickTime=2000
dataDir=/home/myname/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.229.160:2888:3888
server.2=192.168.229.161:2888:3888
server.3=192.168.229.162:2888:3888
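initLimit and syncLimit in the configuration above are counted in ticks, so the actual timeouts are obtained by multiplying them by tickTime. A small sketch of that calculation follows; cfg_value is a hypothetical helper, and the settings are repeated inline so the sketch is self-contained.

```shell
# Hypothetical helper: print the value of KEY from key=value lines on stdin.
cfg_value() {
    sed -n "s/^$1=//p"
}

# The three timing settings from the configuration above.
cfg='tickTime=2000
initLimit=5
syncLimit=2'

tick=$(printf '%s\n' "$cfg" | cfg_value tickTime)
init_ticks=$(printf '%s\n' "$cfg" | cfg_value initLimit)
sync_ticks=$(printf '%s\n' "$cfg" | cfg_value syncLimit)

echo "initLimit timeout: $(( tick * init_ticks )) ms"
echo "syncLimit timeout: $(( tick * sync_ticks )) ms"
```

With these values the initial synchronization may take up to 10 seconds and a follower-leader exchange up to 4 seconds.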
initLimit
A ZooKeeper cluster contains multiple zk processes: one is the leader and the rest are followers.
When a follower first connects to the leader, a considerable amount of data may be transferred between them, especially if the follower's data is far behind the leader's. initLimit sets the maximum time allowed for this synchronization after the follower connects to the leader.
syncLimit
The maximum time allowed for a message exchange (a request and its response) between a follower and the leader.
tickTime
The basic unit for the two timeouts above. For example, an initLimit of 5 means a timeout of 5 * 2000 ms = 10 seconds.
server.id=host:port1:port2
id is a number that identifies the zk process; the same number is the content of the myid file in the dataDir directory.
host is the IP address of the machine running the zk process, port1 is the port followers use to exchange messages with the leader, and port2 is the port used for leader election.
dataDir
Has the same meaning as in stand-alone mode, except that in cluster mode the directory also contains a myid file. The myid file consists of a single line holding a number between 1 and 255. This number is the id in server.id described above and identifies the zk process.
Note that if you deploy several zk processes on the same machine just to test a cluster deployment, each server.id=host:port1:port2 entry must use different ports. For real deployments, however, it is strongly recommended to run the zk processes on different physical machines to reduce the impact of a machine failure.
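For the single-machine test setup mentioned above, one possible layout is sketched below. It generates three configuration files under a placeholder directory BASE (a temporary directory here). The port numbers are arbitrary choices that only need to be unique on the host. Beyond what the text states, the sketch also gives every process its own dataDir and clientPort, since processes on one host cannot share a data directory or a client port.

```shell
# Sketch: generate three configs for a test-only cluster on one machine.
# BASE and all port numbers are illustrative choices, not ZooKeeper defaults.
BASE=$(mktemp -d)

for id in 1 2 3; do
    mkdir -p "$BASE/zk$id/data"
    {
        echo "tickTime=2000"
        echo "initLimit=5"
        echo "syncLimit=2"
        echo "dataDir=$BASE/zk$id/data"
        echo "clientPort=$(( 2180 + id ))"
        # Same server list in every file, with distinct ports per process.
        for peer in 1 2 3; do
            echo "server.$peer=127.0.0.1:$(( 2887 + peer )):$(( 3887 + peer ))"
        done
    } > "$BASE/zk$id/zoo.cfg"
    # Each data directory gets a myid file matching its server.N entry.
    echo "$id" > "$BASE/zk$id/data/myid"
done

cat "$BASE/zk2/zoo.cfg"
```

Each process is then started with its own configuration file.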

5.2 Start

Suppose we deploy one zk process on each of three machines, 192.168.229.160, 192.168.229.161, and 192.168.229.162, to form a zk cluster.
The three zk processes all use the same zoo.cfg:
tickTime=2000
dataDir=/home/myname/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.229.160:2888:3888
server.2=192.168.229.161:2888:3888
server.3=192.168.229.162:2888:3888
On each of the three machines, create a myid file in the dataDir directory (/home/myname/zookeeper). Its content is 1 on 192.168.229.160, 2 on 192.168.229.161, and 3 on 192.168.229.162, matching the server.1, server.2, and server.3 entries. Then start the zk process on all three machines to bring up the zk cluster.
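A mismatch between myid and the server entries is easy to introduce, so a quick check before starting can help. The sketch below is one possible check, not a ZooKeeper tool: the hypothetical function check_myid verifies that myid holds a number from 1 to 255 and that the configuration contains a matching server.N entry. The demo uses a temporary directory in place of a real machine.

```shell
# Hypothetical check: does DATA_DIR/myid hold a number from 1 to 255 that
# also appears as a server.N entry in the given zoo.cfg?
check_myid() {
    data_dir=$1
    cfg_file=$2
    [ -f "$data_dir/myid" ] || { echo "missing myid"; return 1; }
    id=$(cat "$data_dir/myid")
    case $id in
        ''|*[!0-9]*) echo "myid is not a number"; return 1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "myid out of range"; return 1
    fi
    grep -q "^server\.$id=" "$cfg_file" || { echo "no server.$id entry"; return 1; }
    echo "ok: server.$id"
}

# Demo in a temporary directory standing in for the second machine.
demo=$(mktemp -d)
printf 'server.%s\n' '1=192.168.229.160:2888:3888' \
    '2=192.168.229.161:2888:3888' '3=192.168.229.162:2888:3888' > "$demo/zoo.cfg"
echo 2 > "$demo/myid"
check_myid "$demo" "$demo/zoo.cfg"
```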

5.3 Connection

You can use the following command to connect to a zk cluster:
bin/zkCli.sh -server 192.168.229.160:2181,192.168.229.161:2181,192.168.229.162:2181
After a successful connection, you can see the following output:

2016-06-28 19:29:18,074 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=192.168.229.160:2181,192.168.229.161:2181,192.168.229.162:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@770537e4
Welcome to ZooKeeper!
2016-06-28 19:29:18,146 [myid:] - INFO [main-SendThread(192.168.229.162:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 192.168.229.162/192.168.229.162:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2016-06-28 19:29:18,161 [myid:] - INFO [main-SendThread(192.168.229.162:2181):ClientCnxn$SendThread@852] - Socket connection established to 192.168.229.162/192.168.229.162:2181, initiating session
2016-06-28 19:29:18,199 [myid:] - INFO [main-SendThread(192.168.229.162:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 192.168.229.162/192.168.229.162:2181, sessionid = 0x3557c39d2810029, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.229.160:2181,192.168.229.161:2181,192.168.229.162:2181(CONNECTED) 0]

The log output shows that the client connected to the process at 192.168.229.162:2181 (the server in the list that the client connects to is chosen at random) and that the connection to the zk cluster succeeded.
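The log also reports a requested sessionTimeout of 30000 and a negotiated timeout of 30000. As far as I understand the default behaviour, the server keeps the requested value within 2 * tickTime and 20 * tickTime, and these bounds can be overridden with minSessionTimeout and maxSessionTimeout. The sketch below models only that default rule; negotiated_timeout is a hypothetical function, not a ZooKeeper API.

```shell
# Sketch of the assumed default rule: the requested session timeout is kept
# within [2 * tickTime, 20 * tickTime]. All values are in milliseconds.
negotiated_timeout() {
    tick=$1
    requested=$2
    min=$(( 2 * tick ))
    max=$(( 20 * tick ))
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

negotiated_timeout 2000 30000
```

With tickTime=2000 the allowed range is 4000 to 40000 ms, so the requested 30000 ms is accepted unchanged, which is consistent with the log.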

