Cassandra cluster management: abnormal node restart

Note:

This document is one part of a series. The related documents are:
Test preparation + taking a healthy node offline: https://blog.51cto.com/michaelkang/2419518
Abnormal node restart: https://blog.51cto.com/michaelkang/2419524
Adding a new node: https://blog.51cto.com/michaelkang/2419521
Removing an abnormal node: https://blog.51cto.com/michaelkang/2419525

Scenario:

A node restarts abnormally. This document records how the rest of the cluster responds.

cassandra.log shows essentially no output:

tailf /var/log/cassandra/cassandra.log

system.log

system.log clearly reports that 172.20.101.166 is DOWN.

On node 172.20.101.165:

# tailf /var/log/cassandra/system.log
INFO [GossipStage:1] 2019-07-11 18:19:23,372 Gossiper.java:1026 - InetAddress /172.20.101.166 is now DOWN
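To list the affected addresses without reading the log by eye, the Gossiper message shown above can be filtered. The following is a minimal sketch that assumes the "InetAddress /<ip> is now DOWN" wording seen in this log; the function name `down_nodes` is hypothetical and not part of Cassandra.

```shell
# down_nodes: read system.log lines on stdin and print every address that
# Gossiper reported as DOWN, one per line, with duplicates removed.
# Assumption: messages look like "InetAddress /<ip> is now DOWN".
down_nodes() {
    sed -n 's|.*InetAddress /\([0-9.]*\) is now DOWN.*|\1|p' | sort -u
}

# Usage (log path as used in this document):
#   down_nodes < /var/log/cassandra/system.log
```

Because the function only reads stdin, it can be run against a copied log file as well as against the live one.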

Check for abnormal nodes:

# nodetool describecluster
Cluster Information:
 Name: pttest
 Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
 DynamicEndPointSnitch: enabled
 Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 Schema versions:
 cfce5a85-19c8-327a-ab19-e1faae2358f7: [172.20.101.164, 172.20.101.165, 172.20.101.167, 172.20.101.160, 172.20.101.157]

 UNREACHABLE: [172.20.101.166]
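The UNREACHABLE line in the output above is the part that matters for this scenario. As a rough sketch, it can be extracted with standard text tools; the function name `unreachable_nodes` is hypothetical, and the parsing assumes the "UNREACHABLE: [ip, ip, ...]" layout shown here.

```shell
# unreachable_nodes: read `nodetool describecluster` output on stdin and
# print the addresses listed under UNREACHABLE, one per line.
# Prints nothing when no UNREACHABLE line is present.
unreachable_nodes() {
    sed -n 's/.*UNREACHABLE: \[\(.*\)\].*/\1/p' \
        | tr -d ' ' \
        | tr ',' '\n' \
        | sed '/^$/d'
}

# Usage:
#   nodetool describecluster | unreachable_nodes
```

An empty result means every node that the local node knows about answered the schema check.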

debug.log

debug.log contains many entries reporting failed connections to 172.20.101.166.

On node 172.20.101.164:

tailf /var/log/cassandra/debug.log

DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:205 - Enqueuing socket close for /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Small] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:411 - Socket to /172.20.101.166 closed
DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:205 - Enqueuing socket close for /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:411 - Socket to /172.20.101.166 closed
DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 FailureDetector.java:313 - Forcing conviction of /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:24,740 OutboundTcpConnection.java:425 - Attempting to connect to /172.20.101.166
INFO [HANDSHAKE-/172.20.101.166] 2019-07-11 18:19:24,741 OutboundTcpConnection.java:561 - Handshaking version with /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:24,742 OutboundTcpConnection.java:533 - Done connecting to /172.20.101.166
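The entries above trace one peer through socket close, conviction by the failure detector, and reconnection. To isolate that sequence for a single peer in a busy debug.log, a filter along the following lines may help. It is only a sketch: the function name `peer_events` is hypothetical, and the matched phrases are taken from the log lines in this document, so other Cassandra versions may word them differently.

```shell
# peer_events <ip>: read debug.log lines on stdin and print only the
# connection lifecycle messages (socket closed, forced conviction,
# connection attempt, connection done) that name the given peer.
# The peer must appear as its own field ("/<ip>") in the message, so that
# an address which is a prefix of another address does not match.
peer_events() {
    awk -v peer="/$1" '
        /Socket to .* closed|Forcing conviction|Attempting to connect|Done connecting/ {
            for (i = 1; i <= NF; i++)
                if ($i == peer) { print; next }
        }'
}

# Usage (log path as used in this document):
#   peer_events 172.20.101.166 < /var/log/cassandra/debug.log
```

Comparing the timestamps of the first "closed" line and the last "Done connecting" line gives an approximate view of how long the peer was unavailable from this node's perspective.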

Verification query

After the host comes back up, the Cassandra service starts automatically and rejoins the cluster normally.

cqlsh> SELECT * FROM kevin_test.t_users;

 user_id | emails                                     | first_name | last_name
---------+--------------------------------------------+------------+-----------
       6 | {'[email protected]', '[email protected]'} |     kevin6 |      kang
       7 | {'[email protected]', '[email protected]'} |     kevin7 |      kang
       9 | {'[email protected]', '[email protected]'} |     kevin9 |      kang
       4 | {'[email protected]', '[email protected]'} |     kevin4 |      kang
       3 | {'[email protected]', '[email protected]'} |     kevin3 |      kang
       5 | {'[email protected]', '[email protected]'} |     kevin5 |      kang
       0 | {'[email protected]', '[email protected]'} |     kevin0 |      kang
       8 | {'[email protected]', '[email protected]'} |     kevin8 |      kang
       2 | {'[email protected]', '[email protected]'} |     kevin2 |      kang
       1 | {'[email protected]', '[email protected]'} |     kevin1 |      kang

Test result:

After the node was restarted repeatedly, queries against the table still returned the expected content.
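When the restart is repeated several times, comparing the row count after each run is quicker than reading the full result. The helper below is a sketch under two assumptions: the first column is an integer key, as with user_id in kevin_test.t_users, and the cqlsh output uses the pipe-separated layout shown above. The name `count_rows` is hypothetical.

```shell
# count_rows: read cqlsh query output on stdin and print the number of
# data rows, i.e. lines whose first column is an integer followed by " |".
# The header, the separator line and the "(N rows)" footer are not counted.
count_rows() {
    grep -cE '^ *[0-9]+ \|'
}

# Usage (run against any live node; the address here is only an example):
#   cqlsh 172.20.101.165 -e 'SELECT * FROM kevin_test.t_users;' | count_rows
```

For the table in this test the expected count is 10, one row per user_id from 0 to 9.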