Cassandra Cluster Management: Replacing an Abnormal Node

To replace a dead node, start Cassandra on the replacement node with the JVM startup flag -Dcassandra.replace_address_first_boot=<dead_node_ip>. With this property set, the node starts in a hibernate state, during which all other nodes see it as down. The replacement node then bootstraps, streaming its data from the remaining nodes in the cluster. The main difference from the normal bootstrap of a new node is that the replacement node does not accept any writes during this phase. Once the bootstrap completes, the node is marked "UP", and hinted handoff is relied on to make the new node consistent, because it has not accepted writes since the bootstrap started.
If the replacement takes longer than max_hint_window_in_ms, you must run a repair to make the replaced node consistent again, because it missed the writes that arrived during the bootstrap.
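
As a rough illustration of the rule above, the check below compares how long a replacement took with the hint window. The helper name needs_repair is hypothetical, and the 10800000 ms default is assumed from the stock cassandra.yaml (3 hours); a cluster may override it, so pass the real value if yours differs.

```shell
#!/bin/sh
# needs_repair ELAPSED_MS [MAX_HINT_WINDOW_MS]
# Prints "repair" if the replacement outlasted the hint window,
# otherwise "hints" (hinted handoff alone should catch the node up).
# Assumption: 10800000 ms is the default max_hint_window_in_ms.
needs_repair() {
    elapsed_ms=$1
    window_ms=${2:-10800000}
    if [ "$elapsed_ms" -gt "$window_ms" ]; then
        echo "repair"
    else
        echo "hints"
    fi
}

# Example: a replacement that took 4 hours against the default window.
# needs_repair $((4 * 60 * 60 * 1000))
```

When the answer is "repair", the follow-up on the replaced node would be a plain nodetool repair.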

Note:

This article is one part of a series. For the earlier parts, see:
Test preparation and taking a healthy node offline: https://blog.51cto.com/michaelkang/2419518
Restarting an abnormal node: https://blog.51cto.com/michaelkang/2419524
Adding a new node: https://blog.51cto.com/michaelkang/2419521
Removing an abnormal node: https://blog.51cto.com/michaelkang/2419525

View cluster status

# nodetool status
Datacenter: dc1
--  Address  Load  Tokens  Owns  Host ID  Rack
.......

The abnormal node is reported with status DN (Down, Normal):

DN  172.20.101.166  76.83 MiB  256  ?  88e16e35-50dd-4ee3-aa1a-f10a8c61a3eb  rack1
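
To pick the down node out of a longer listing, the status output can be filtered on its first column. This is only a sketch: down_nodes is a hypothetical helper, and it assumes the column layout shown in this article (status code first, address second).

```shell
#!/bin/sh
# down_nodes: read `nodetool status` output on stdin and print the
# address of every node whose status/state code is DN (Down, Normal).
down_nodes() {
    awk '$1 == "DN" { print $2 }'
}

# Usage on a live cluster (requires nodetool on PATH):
# nodetool status | down_nodes
```

The printed address is the value to use for the replace flag in the next step.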

Replacing a node: precautions

See "Cassandra Cluster Adding Nodes":
https://blog.51cto.com/michaelkang/2419521

Modify the configuration file

vi /etc/cassandra/conf/jvm.options 
To replace a dead node, restart a new node in its place, specifying the address of the dead node. The data directory of the new node must not contain any data.
Line 47:
#-Dcassandra.replace_address=listen_address or broadcast_address of dead node

Change it to:
-Dcassandra.replace_address=172.20.101.166
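
The edit can also be scripted. The sketch below is hedged: set_replace_address and unset_replace_address are hypothetical helpers, and they work on whatever file path you pass in (this article uses /etc/cassandra/conf/jvm.options). Note that this step uses the replace_address flag, while the introduction quotes the replace_address_first_boot variant. As far as I understand, a node that keeps replace_address set after it has joined may refuse to start on its next restart, so commenting the flag out afterwards seems a sensible habit; verify that on your version.

```shell
#!/bin/sh
# set_replace_address FILE DEAD_NODE_IP
# Appends -Dcassandra.replace_address=<ip> to FILE unless an uncommented
# replace_address line is already there (returns 1 in that case).
set_replace_address() {
    file=$1
    ip=$2
    if grep -q '^-Dcassandra\.replace_address=' "$file"; then
        echo "replace_address already set in $file" >&2
        return 1
    fi
    printf '%s\n' "-Dcassandra.replace_address=$ip" >> "$file"
}

# unset_replace_address FILE
# Comments the flag out again once the replacement has finished.
unset_replace_address() {
    file=$1
    tmp=$(mktemp) || return 1
    sed 's/^-Dcassandra\.replace_address=/#&/' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (paths and address taken from this article):
# set_replace_address /etc/cassandra/conf/jvm.options 172.20.101.166
# ... start the node, wait until it reports UN ...
# unset_replace_address /etc/cassandra/conf/jvm.options
```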

Clean up old data and start the service

Before starting the node, delete the following directories and their contents:
- data/
- commitlog/ 
- saved_caches/

rm -rf /var/lib/cassandra/data/ /var/lib/cassandra/commitlog/ /var/lib/cassandra/saved_caches/
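
Because rm -rf is unforgiving, the cleanup can be wrapped in a small guard. This is a sketch under assumptions: clean_cassandra_dirs is a hypothetical helper, and it assumes the three directories listed above live directly under one base directory such as /var/lib/cassandra.

```shell
#!/bin/sh
# clean_cassandra_dirs BASE
# Removes data/, commitlog/ and saved_caches/ under BASE, leaving BASE
# itself (and anything else in it) untouched. Refuses to run when BASE
# is empty or is not an existing directory.
clean_cassandra_dirs() {
    base=$1
    if [ -z "$base" ] || [ ! -d "$base" ]; then
        echo "refusing to clean: '$base' is not a directory" >&2
        return 1
    fi
    for sub in data commitlog saved_caches; do
        # ${base:?} aborts instead of expanding to "/" if base is ever empty.
        rm -rf "${base:?}/$sub"
    done
}

# Usage (run only on the replacement node, with Cassandra stopped):
# clean_cassandra_dirs /var/lib/cassandra
```

Keeping the base directory in place preserves its ownership and permissions, so the service can recreate the subdirectories when it starts.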

Start:
/etc/init.d/cassandra start

Wait for the data recovery to complete, then verify the cluster status:

# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  172.20.101.164  68.13 MiB   256     ?     dcbbad83-fe7c-4580-ade7-aa763b8d2c40  rack1
UN  172.20.101.165  59.21 MiB   256     ?     b985de23-6ad1-40b9-a252-dbaeb5d4cb12  rack1
UN  172.20.101.166  154.7 KiB   256     ?     f9a72fb2-55bd-40ec-b8e7-717404b80f19  rack1   <= restored node
UN  172.20.101.167  71.93 MiB   256     ?     8808aaf7-690c-4f0c-be9b-ce655c1464d4  rack1
UN  172.20.101.160  66.23 MiB   256     ?     57cc39fc-e47b-4c96-b9b0-b004f2b79242  rack1
UN  172.20.101.157  55b-48 MiB  256     ?     091ff0dc -b4ce-e70c84bbfafc           rack1
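
Rather than re-running the status command by hand, the wait can be scripted as a polling loop. This is a minimal sketch: node_status and wait_for_un are hypothetical helpers, the retry count and pause are arbitrary defaults, and the parsing assumes the column layout shown above.

```shell
#!/bin/sh
# node_status IP
# Reads `nodetool status` output on stdin and prints the two-letter
# status/state code (UN, DN, UJ, ...) of the node with address IP.
node_status() {
    awk -v ip="$1" '$2 == ip { print $1 }'
}

# wait_for_un IP [TRIES] [SLEEP_SECS] [STATUS_CMD]
# Polls STATUS_CMD (default: nodetool status) until IP reports UN.
# Returns 0 once the node is Up/Normal, 1 if TRIES run out first.
wait_for_un() {
    ip=$1
    tries=${2:-60}
    pause=${3:-30}
    status_cmd=${4:-nodetool status}
    while [ "$tries" -gt 0 ]; do
        if [ "$($status_cmd | node_status "$ip")" = "UN" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$pause"
    done
    return 1
}

# Usage: poll every 30 seconds, for up to 60 attempts.
# wait_for_un 172.20.101.166 && echo "replacement node is UN"
```

The fourth argument exists mainly so the loop can be exercised against canned output without a running cluster.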

Verification query

cqlsh 172.20.101.157 -u cassandra -p cassandra
cassandra@cqlsh> SELECT * FROM kevin_test.t_users;

 user_id | emails                                    | first_name | last_name
---------+-------------------------------------------+------------+-----------
       6 | {'[email protected]','[email protected]'} | kevin6     | kang
       7 | {'[email protected]','[email protected]'} | kevin7     | kang
       9 | {'[email protected]','[email protected]'} | kevin9     | kang
       4 | {'[email protected]','[email protected]'} | kevin4     | kang
       3 | {'[email protected]','[email protected]'} | kevin3     | kang
       5 | {'[email protected]','[email protected]'} | kevin5     | kang
       0 | {'[email protected]','[email protected]'} | kevin0     | kang
       8 | {'[email protected]','[email protected]'} | kevin8     | kang
       2 | {'[email protected]','[email protected]'} | kevin2     | kang
       1 | {'[email protected]','[email protected]'} | kevin1     | kang
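
For a repeatable check, the same query can be run non-interactively and reduced to a row count. This is a sketch with assumptions: row_count is a hypothetical helper, and it relies on cqlsh printing a "(N rows)" footer after the result set, which is not shown in the transcript above.

```shell
#!/bin/sh
# row_count: read cqlsh output on stdin and print N from the
# "(N rows)" footer line. Prints nothing if no footer is found.
row_count() {
    sed -n 's/^(\([0-9][0-9]*\) rows).*$/\1/p'
}

# Usage (host, credentials and table are the ones used in this article):
# cqlsh 172.20.101.157 -u cassandra -p cassandra \
#     -e 'SELECT * FROM kevin_test.t_users;' | row_count
```

Comparing the count before and after the replacement gives a quick sanity check, although it does not prove that every replica is consistent.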

Test result:

After restarting the node repeatedly, queries against the table still return the expected rows.

