How much data can each node in a NoSQL (Cassandra) cluster hold?

At what point do the mechanics of compacting SSTables (major and minor) become impractical?

If I have a major compaction of 500 GB of SSTables and my final SSTable will be over 1 TB, is it efficient for a node to “rewrite” this big a data set?

This can take about a day of hard-drive time and double the space on disk, so are there any best practices for this?

1 TB is a reasonable limit on how much data a single node can handle, but in reality a node is not limited by the size of its data at all, only by the rate of operations.

A node might have only 80 GB of data, but if you absolutely pound it with random reads and it does not have a lot of RAM, it might not even be able to handle that number of requests at a reasonable rate. Similarly, a node might have 10 TB of data, but if you rarely read from it, or if only a small portion of your data is hot (so that it can be cached effectively), it will do just fine.

When there is a large amount of data on a node, compaction is definitely an issue to be aware of, but there are a few things to keep in mind:

First, the “biggest” compactions, the ones whose result is a single huge SSTable, happen rarely, and even more rarely as the amount of data on the node grows. (The number of minor compactions that must occur before a top-level compaction happens grows exponentially with the number of top-level compactions you have already performed.)
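To see why those top-level compactions become so rare, here is a toy Python model of size-tiered compaction (not Cassandra’s actual code, just a sketch assuming the default threshold of four similar-sized SSTables per merge). It counts how many memtable flushes happen before the first compaction at each tier.

```python
# Toy model of size-tiered compaction: whenever `threshold` SSTables pile up
# at the same tier, they are merged into one SSTable of the next tier.
# This is a sketch of the general idea, not Cassandra's actual algorithm.

def flushes_until_tier(top_tier, threshold=4):
    """Return the number of memtable flushes before the first compaction
    that produces an SSTable of `top_tier`."""
    tiers = {}      # tier -> number of SSTables currently sitting at that tier
    flushes = 0
    while True:
        flushes += 1
        tiers[0] = tiers.get(0, 0) + 1          # each flush adds a tier-0 SSTable
        tier = 0
        # Cascade: a merge promotes one SSTable to the next tier, which may
        # itself now have enough SSTables to trigger another merge.
        while tiers.get(tier, 0) >= threshold:
            tiers[tier] -= threshold
            tier += 1
            tiers[tier] = tiers.get(tier, 0) + 1
            if tier == top_tier:
                return flushes

for t in range(1, 6):
    print(f"first tier-{t} compaction after {flushes_until_tier(t)} flushes")
```

With four SSTables per merge this prints 4, 16, 64, 256 and 1024 flushes, which is why the compaction that would produce one enormous SSTable is seen less and less often as the node fills up.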

Second, your node will still be able to handle requests; reads will just be slower.

Third, if your replication factor is greater than 1 and you are not reading at consistency level ALL, the other replicas will be able to respond to read requests quickly, so from the client’s perspective you should not see a big difference in latency.
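As a rough illustration of that point, here is a minimal sketch using the DataStax Python driver (cassandra-driver); the contact point, keyspace, table, and query are placeholders, and the read is issued at consistency level ONE rather than ALL so that a single replica busy with compaction does not dominate the response time.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Placeholder contact point and keyspace; adjust for your own cluster.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Reading at ONE (or QUORUM) instead of ALL means the coordinator does not
# have to wait for every replica, so one node slowed down by a large
# compaction has little effect on client-side latency.
query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(query, ("some-user-id",)).one()
print(row)

cluster.shutdown()
```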

Finally, there are plans to improve the compaction strategy that may help with some larger data sets.
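One improvement along those lines that later shipped is LeveledCompactionStrategy, which keeps SSTables in small, fixed-size levels instead of periodically rewriting everything into one huge file. A hedged sketch of switching a table to it with the Python driver follows; the keyspace, table name, and SSTable size are placeholders.

```python
from cassandra.cluster import Cluster

# Placeholder contact point, keyspace, and table name.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# LeveledCompactionStrategy bounds the size of any single compaction, so a
# node never has to rewrite its entire data set (or reserve double the disk
# space) the way a size-tiered major compaction can require.
session.execute("""
    ALTER TABLE users
    WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': 160
    }
""")

cluster.shutdown()
```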

