[Data Structure] Red Black Tree and Jump Table – (Sortset) – (TreeMap) – (Treeset)

SortSet

   An ordered set is actually TreeSet in Java is the only implementation class of SortSet, which is implemented internally through TreeMap; while TreeMap is implemented through red-black trees; and in Redis, it is implemented through jump tables;

SkipList

  The idea is similar to a balanced binary tree, but it is different; here is an introduction:

Introduction to   skiplist data structure (from: https ://www.cnblogs.com/Elliott-Su-Faith-change-our-life/p/7545940.html)

  skiplist is essentially a search structure , Used to solve the search problem in the algorithm (Searching), that is, according to the given key, quickly find its location (or the corresponding value).

   When we introduced dict in the first article of the “Redis Internal Data Structure Detailed Explanation” series, we discussed: The general search problem is divided into two categories: one is based on various balanced trees , One is based on a hash table. However, skiplist is rather special, and it cannot be classified into these two categories.

   This data structure was invented by William Pugh and first appeared in his paper “Skip Lists: A Probabilistic Alternative to Balanced Trees” published in 1990. Students who are interested in the details can download the original paper to read.

skiplist, as the name implies, first of all it is a list. In fact, it is developed on the basis of an ordered linked list.

   Let’s first look at an ordered linked list, as shown in the figure below (the gray node on the far left represents an empty head node):

   In such a linked list, if we want To find a certain data, you need to compare one by one from the beginning until the node containing the data is found, or the first node larger than the given data is found (not found). In other words, the time complexity is O(n). Similarly, when we want to insert new data, we have to go through the same search process to determine the insertion position.

   If we add a pointer for every two adjacent nodes, let the pointer point to the next node, as shown below:

share picture

   so that all newly added pointers are connected into a new linked list, but it contains The number of nodes is only half of the original number (7, 19, 26 in the figure above). Now when we want to find data, we can first look up along this new linked list. When it encounters a node larger than the data to be checked, it returns to the original linked list to search. For example, we want to search for 23, and the search path is along the direction pointed by the pointer marked in red in the figure below:

share picture

  • First compare 23 with 7, and then compare with 19, which are larger than them, and continue to compare backwards.

  • But when 23 is compared with 26, it is smaller than 26, so return to the following linked list (original linked list) and compare with 22.

  • 23 is greater than 22. Continue to compare with 26 along the pointer below. 23 is smaller than 26, indicating that the data 23 to be checked does not exist in the original linked list, and its insertion position should be between 22 and 26.

   In this search process, due to the newly added pointer, we no longer need to compare each node in the linked list one by one. The number of nodes to be compared is about half of the original number.

  Using the same method, we can continue to add a pointer for every two adjacent nodes on the newly generated upper-level linked list, thereby generating a third-level linked list. As shown below:

Share a picture

p>

   In this new three-level linked list structure, if we still look for 23, then the first to be compared along the top linked list is 19, and we find that 23 is greater than 19, and then we know that we only need to 19 Continue to search later, thus skipping all the nodes in front of 19 at once. It is conceivable that when the linked list is long enough, this multi-level linked list search method allows us to skip many lower-level nodes, greatly speeding up the search.

  skiplist is inspired by the idea of ​​this multi-layer linked list. In fact, according to the above method of generating the linked list, the number of nodes in each level of the linked list above is half of the number of nodes in the lower level, so the search process is very similar to a binary search, so that the time complexity of the search can be reduced. To O(log n). However, this method has a big problem when inserting data. After a new node is inserted, the strict 2:1 correspondence of the number of nodes on the upper and lower adjacent linked lists will be disrupted. If you want to maintain this correspondence, you must re-adjust all nodes behind the newly inserted node (including the newly inserted node), which will reduce the time complexity to O(n) again. Deleting data has the same problem.

  skiplist In order to avoid this problem, it does not require a strict correspondence between the number of nodes between the upper and lower adjacent levels of the linked list, but a random level for each node. For example, if the number of layers randomly selected by a node is 3, then it is linked to the three-layer linked list from layer 1 to layer 3. For clarity, the following figure shows how to form a skiplist through the step-by-step insertion operation (click to enlarge):

share picture

   From the above skiplist creation and insertion process, we can see that the level of each node is random, and Inserting a new node will not affect the number of layers of other nodes. Therefore, the insert operation only needs to modify the pointers before and after the inserted node, and does not need to adjust many nodes. This reduces the complexity of the insert operation. In fact, this is a very important feature of skiplist, which makes it significantly better than the balanced tree solution in terms of insertion performance. This will be mentioned later.

   According to the skiplist structure in the figure above, it is easy for us to understand the origin of the name of this data structure. Skiplist, translated into Chinese, can be translated into “jump list” or “jump list”, which means that in addition to the bottom first level linked list, it will generate several sparse linked lists, and the pointers in these linked lists are deliberately skipped Some nodes (and the higher the level of the linked list, the more nodes are skipped). This allows us to search in the high-level linked list first when looking for data, and then reduce it level by level, and finally down to the first level linked list to accurately determine the data location. In this process, we skipped some nodes, thus speeding up the search.

   The skiplist just created contains a total of 4 levels of linked lists. Now suppose we are still looking for 23 in it. The following figure shows the search path:

Share a picture

   It should be noted that the process of inserting each node in the previous demonstration is actually before inserting It is also necessary to go through a similar search process first, and then complete the insertion operation after determining the insertion position.

   So far, we have been very clear about the search and insert operations of skiplist. The delete operation is similar to the insert operation, and we can easily imagine it. We should also be able to easily implement these operations in code.

   Of course, each node of the skiplist in practical applications should contain two parts: key and value. In the previous description, we did not specifically distinguish between key and value, but in fact the list is sorted according to key, and the search process is also based on key comparison.

Red-black tree:

  This introduction Much, to sum up, a self-balancing binary search tree.

Leave a Comment

Your email address will not be published.