Cassandra Cluster Management – Add new node

Add a node to the Cassandra cluster

Note

This document is one part of a larger set of system documentation. For the previous installment, see:
https://blog.51cto.com/michaelkang/2419518

Scenario:

This procedure is used for node expansion. Test method: wipe all data on the node 172.20.101.165 and use it to simulate adding a brand-new node.
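
A minimal sketch of the wipe, assuming the default package layout (data under /var/lib/cassandra, init script under /etc/init.d); adjust paths to your installation:

# run on 172.20.101.165 to make it look like a brand-new node
/etc/init.d/cassandra stop
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*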

Points to confirm:

1: Use the same Cassandra version as the existing nodes.
2: Seed nodes do not bootstrap. Make sure the new node is not listed in the -seeds list, and do not make every node a seed node.
3: Copy the configuration files from an existing node in the same DC to the new node, then adjust them. The files involved are:
cassandra.yaml, plus cassandra-topology.properties or cassandra-rackdc.properties

4: Make sure the following properties are set correctly in the cassandra.yaml file (a sample excerpt follows this checklist):
auto_bootstrap:
If this option was previously set to false, it must be set back to true. It is not listed in the default cassandra.yaml configuration file and defaults to true.

cluster_name:
The name of the cluster the new node is joining.

listen_address/broadcast_address:
Use the IP address that other Cassandra nodes use to connect to the new node.

endpoint_snitch:
The snitch Cassandra uses to locate nodes and route requests.

num_tokens:
The number of vnodes to allocate to the node. Use the same number of tokens as on the other nodes in the data center. Token ranges are distributed proportionally, so if hardware capabilities differ, assign more token ranges to systems with higher capacity and better performance.

allocate_tokens_for_local_replication_factor:
Specify the replication factor (RF) of the data center's keyspaces.

5: While the node is being added, monitor streaming traffic and progress to make sure the task has not stalled (a monitoring sketch follows step 9 below).
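
For reference, a minimal cassandra.yaml excerpt for the new node might look like the following. The cluster name, snitch, and token count are illustrative placeholders; copy the real values from an existing node in the same DC:

# cassandra.yaml excerpt for the new node (illustrative values)
cluster_name: 'my_cluster'                    # must match the existing cluster
listen_address: 172.20.101.165                # IP other nodes use to reach this node
endpoint_snitch: GossipingPropertyFileSnitch  # same snitch as the rest of the DC
num_tokens: 256                               # same as the other nodes in this DC
auto_bootstrap: true                          # absent by default; defaults to true
# allocate_tokens_for_local_replication_factor: 3   # optional: the DC's keyspace RF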

Procedure:

1: Install the Cassandra service;
2: Sync the configuration files from an existing node and adjust them;
3: Modify the streaming_socket_timeout_in_ms value in the cassandra.yaml file. The default is 3600000 ms, i.e. 1 hour; change it to 172800000 (48 hours) to make sure there is enough time to stream all the data.
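
Steps 2-3 might look like the sketch below. The source node 172.20.101.157 and the config path /etc/cassandra/ are assumptions; adjust them to your installation, and remember to change the node-specific settings (listen_address, etc.) afterwards:

# run on the new node: pull configs from an existing node in the same DC
scp 172.20.101.157:/etc/cassandra/cassandra.yaml /etc/cassandra/
scp 172.20.101.157:/etc/cassandra/cassandra-rackdc.properties /etc/cassandra/
# raise the streaming timeout to 48 hours (append the line if it is not present)
sed -i 's/^[# ]*streaming_socket_timeout_in_ms:.*/streaming_socket_timeout_in_ms: 172800000/' /etc/cassandra/cassandra.yaml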

Note: streaming during bootstrap puts a heavy load on the cluster, so disable or throttle what you can to avoid affecting online traffic. Steps 4-5 are optional.
Run them on all nodes, including the new node (on the new node, run them after it has started):
4: Disable automatic compaction on all nodes: nodetool disableautocompaction
5: Stop compactions that are already running: nodetool stop COMPACTION

6: Throttle the streaming (data migration) traffic on all nodes: nodetool setstreamthroughput 32/64/larger
Note: this caps each node at 32 Mb/s, 64 Mb/s, and so on. If your cluster has 10 machines, the new node will receive roughly 32*10 Mb/s in total. Adjust the value according to migration progress, network pressure, node load, disk pressure, and the number of nodes that have already finished.
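
Since steps 4-6 must run on every node, a small loop helps. A minimal sketch, assuming a hosts.txt file with one node IP per line and passwordless SSH (both are assumptions, not part of the original procedure):

# apply steps 4-6 on every node in the cluster
while read -r host; do
  ssh "$host" 'nodetool disableautocompaction && nodetool stop COMPACTION && nodetool setstreamthroughput 32'
done < hosts.txt

The same loop, with enableautocompaction and setstreamthroughput 0, reverses this in steps 10-11.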

7: Fix the data directory permissions. If you changed any of the cluster's directory settings, correct ownership first: chown -R cassandra:cassandra /var/lib/cassandra
8: Start the bootstrapping node (run on the newly added node): /etc/init.d/cassandra start
9: Use nodetool status to verify that the node has fully bootstrapped and that all nodes are Up/Normal (UN) and not in any other state.
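
To monitor bootstrap progress (see point 5 above), the standard nodetool views can be polled; a minimal sketch:

# run against any node; repeat until the new node shows UN
watch -n 60 nodetool status
# on the new node: per-file streaming progress (hide finished files)
nodetool netstats | grep -v "100%"
# active compactions and their progress
nodetool compactionstats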

10: Re-enable automatic compaction on all nodes: nodetool enableautocompaction
11: Remove the streaming throttle on all nodes: nodetool setstreamthroughput 0 (0 means unthrottled)

12: Manually reclaim disk space on every pre-existing node: nodetool cleanup
After the new node has joined successfully, running nodetool cleanup on each pre-existing node takes a long time, so it is recommended to run it in the background. If you skip this, data that no longer belongs to the old nodes will remain on them, taking up disk space.
Cleanup is a single-threaded operation with little overall impact, so there is no need to disable compaction.
Cleanup is node-local, so there is no need to throttle streaming.
Avoid peak business hours and execute it node by node.
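
A minimal sketch of running cleanup in the background on one pre-existing node (the output path is an arbitrary choice):

# on each old node, one at a time, off-peak
nohup nodetool cleanup > /tmp/cleanup.out 2>&1 &
# check whether it is still running: cleanup shows up in compactionstats
nodetool compactionstats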

Verify cluster data

# cqlsh 172.20.101.157 -u cassandra -p cassandra

cassandra@cqlsh> SELECT * from kevin_test.t_users;

 user_id | emails                                    | first_name | last_name
---------+-------------------------------------------+------------+-----------
       6 | {'[email protected]','[email protected]'} | kevin6     | kang
       7 | {'[email protected]','[email protected]'} | kevin7     | kang
       9 | {'[email protected]','[email protected]'} | kevin9     | kang
       4 | {'[email protected]','[email protected]'} | kevin4     | kang
       3 | {'[email protected]','[email protected]'} | kevin3     | kang
       5 | {'[email protected]','[email protected]'} | kevin5     | kang
       0 | {'[email protected]','[email protected]'} | kevin0     | kang
       8 | {'[email protected]','[email protected]'} | kevin8     | kang
       2 | {'[email protected]','[email protected]'} | kevin2     | kang
       1 | {'[email protected]','[email protected]'} | kevin1     | kang

Reference information:

https://blog.csdn.net/yuanjian0814/article/details/78768889
https://www.jianshu.com/p/1dcca8f19894
http://cassandra.apache.org/doc/latest/tools/nodetool/nodetool.html?highlight=setstreamthroughput
https://zhaoyanblog.com/archives/684.html
https://blog.csdn.net/yuanjian0814/article/details/78777735
https://blog.csdn.net/iteye_19004/article/details/82648737
