RocksDB Chinese Wiki · Write Stalls Tuning

When we insert a large amount of data continuously, write performance may drop sharply at some point. When this happens, check the LOG file or the statistics for signs of a write stall.

Where Stalls Occur

Write stalls usually occur in the following situations:

Too many memtables

When the number of memtables waiting to be flushed to level 0 reaches or exceeds max_write_buffer_number, RocksDB stops writes completely until the flush finishes. In addition, when max_write_buffer_number is greater than 3 and the number of memtables waiting to be flushed is greater than or equal to max_write_buffer_number - 1, RocksDB stalls (slows down) writes. LevelDB has only one memtable and one immutable memtable, so it has no such mechanism.
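
The memtable rule can be written out as a small decision function. This is only a simplified sketch in Python, not RocksDB's actual code: the function name and the WriteState enum are hypothetical, and it assumes the slowdown case requires max_write_buffer_number to be strictly greater than 3, as the English wiki states.

```python
from enum import Enum


class WriteState(Enum):
    NORMAL = "normal"
    DELAYED = "delayed"   # writes are slowed down (stalled)
    STOPPED = "stopped"   # writes wait until a flush finishes


def memtable_write_state(unflushed_memtables: int,
                         max_write_buffer_number: int) -> WriteState:
    """Hypothetical model of the memtable-based stall rule."""
    # All write buffers are waiting for flush: writes stop completely.
    if unflushed_memtables >= max_write_buffer_number:
        return WriteState.STOPPED
    # Assumption: the slowdown only applies when more than 3 buffers are allowed.
    if (max_write_buffer_number > 3
            and unflushed_memtables >= max_write_buffer_number - 1):
        return WriteState.DELAYED
    return WriteState.NORMAL
```

With max_write_buffer_number set to 4 in this model, three unflushed memtables slow writes down and a fourth stops them.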

Too many level-0 SST files

When the number of level 0 SST files reaches level0_slowdown_writes_trigger, RocksDB stalls (slows down) writes. When the number of level 0 SST files reaches level0_stop_writes_trigger, RocksDB stops writes completely until compaction from level 0 to level 1 finishes and the number of level 0 SST files drops.

Too many pending compaction bytes

When the estimated number of bytes pending compaction reaches soft_pending_compaction_bytes_limit, RocksDB stalls (slows down) writes. When it reaches hard_pending_compaction_bytes_limit, writes are stopped. LevelDB does not have this mechanism.
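
The level 0 file count and the pending compaction bytes follow the same pattern: a lower threshold that slows writes down and a higher one that stops them. A minimal Python sketch of that pattern follows; the function names and the numeric limits are illustrative assumptions, not values read from any real configuration.

```python
def threshold_state(value: int, slowdown_at: int, stop_at: int) -> str:
    """Classify a metric against a slowdown threshold and a stop threshold."""
    if value >= stop_at:
        return "stopped"
    if value >= slowdown_at:
        return "delayed"
    return "normal"


GIB = 1 << 30

# Illustrative limits only; real values come from the column family options.
LIMITS = {
    # (level0_slowdown_writes_trigger, level0_stop_writes_trigger)
    "level0_files": (20, 36),
    # (soft_pending_compaction_bytes_limit, hard_pending_compaction_bytes_limit)
    "pending_compaction_bytes": (64 * GIB, 256 * GIB),
}


def compaction_stall_report(level0_files: int,
                            pending_compaction_bytes: int) -> dict:
    """Return the state each compaction-related trigger would report."""
    return {
        "level0_files": threshold_state(level0_files, *LIMITS["level0_files"]),
        "pending_compaction_bytes": threshold_state(
            pending_compaction_bytes, *LIMITS["pending_compaction_bytes"]),
    }
```

Reporting the two triggers separately makes it easier to see which one to tune.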

Mitigating Stalls

We cannot prevent stalls entirely; we can only reduce them as much as possible through configuration.

When a stall occurs, RocksDB reduces the write rate to delayed_write_rate, and possibly even lower. Also note that the slowdown/stop triggers and the pending compaction limits are evaluated per column family (CF), but a stall applies to the entire DB. If a program uses multiple CFs and one of them stalls, the entire DB stalls.
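
The per-CF versus whole-DB behavior can be pictured as taking the most severe state across all column families. The Python sketch below is a hypothetical model, not RocksDB's API; the state names and the column family names in the usage line are made up for illustration.

```python
from typing import Mapping

# Higher number means more severe.
SEVERITY = {"normal": 0, "delayed": 1, "stopped": 2}


def db_write_state(cf_states: Mapping[str, str]) -> str:
    """Return the most severe state among all column families.

    One stalled column family is enough to stall the whole DB.
    """
    return max(cf_states.values(), key=SEVERITY.__getitem__, default="normal")


# Hypothetical column families: only "logs" is stalled.
state = db_write_state({"default": "normal", "logs": "delayed"})
```

In this model, state is "delayed" even though the default column family is healthy.
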
If the stall is caused by memtables not being flushed in time, we can try:

Increase max_background_flushes, so that more threads can flush memtables at the same time.
Increase max_write_buffer_number and use smaller memtables to speed up each flush.

If the stall is caused by too many level 0 files or too many pending compaction bytes, we need to speed up compaction. Reducing write amplification also helps, because the smaller the write amplification, the less data compaction has to process. So we can try:

Increase max_background_compactions to use more threads for compaction.
Increase write_buffer_size to get larger memtables and reduce write amplification.
Increase min_write_buffer_number_to_merge so that memtables are merged before flush, which reduces the number of keys written but slows down reads from the memtables.
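
The advice above can be collected into a small lookup from stall cause to the options worth reviewing. This is a minimal Python sketch; the cause labels and the helper name are hypothetical, while the option names are the ones listed above.

```python
# Stall cause -> options the text suggests increasing.
TUNING_HINTS = {
    "memtable": [
        "max_background_flushes",
        "max_write_buffer_number",
    ],
    "level0": [
        "max_background_compactions",
        "write_buffer_size",
        "min_write_buffer_number_to_merge",
    ],
    "pending_compaction": [
        "max_background_compactions",
        "write_buffer_size",
        "min_write_buffer_number_to_merge",
    ],
}


def options_to_review(cause: str) -> list:
    """Return the options to consider raising for a given stall cause."""
    # Unknown causes yield an empty list instead of raising an error.
    return list(TUNING_HINTS.get(cause, []))
```

For example, options_to_review("memtable") returns the two flush-related options.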

RocksDB Chinese Wiki address: https://github.com/cld3786326…
English original: https://github.com/facebook/r…
