A data structure based on luck, guess what?

Time: 2021-7-20

Ranking List

As soon as you see this subheading, you can probably guess that I'm going to talk about Redis's Zset from the angle of leaderboards.

Yes, it's the classic interview question "please implement a leaderboard." Most of the time, it tests whether you know Redis's Zset structure and its operations.

Of course, a leaderboard can also be built on other solutions, for example MySQL.

I once built a leaderboard on MySQL, and a single SQL query did the job. However, that only works when the data volume is fairly small and the performance requirements are modest (at the time, I only had 11 teams on the leaderboard, refreshed once a minute).

There are plenty of write-ups online for this kind of stock interview question, so this article won't analyze it in detail.

It's just a starting point.

If you don't know how to implement it, or don't understand what the question is asking, remember to read some related articles after this one. Better yet, try implementing it yourself.

Believe me, you'll still have to memorize these stock answers.

Internal encodings of Zset

As we all know, Redis provides five basic data types, but under the hood each of them can use quite different internal encodings.

Starting with Redis 3.2, the list type also uses an internal encoding called quicklist. It's not the focus of this article, so I'll just mention it; if you're interested, you can look into it yourself.

This article focuses on the Zset (sorted set) type.

Zset has two internal encodings: ziplist and skiplist.

Don't think of this as anything magical. You are already familiar with this kind of "two-faced" structure that looks like one thing from the outside and is something else inside.

It's a collection class in the JDK. Go ahead and say its name out loud: HashMap.

In addition to the basic array, HashMap uses two other data structures: the linked list and the red-black tree.

The analogy isn't exact, but it gives you a rough picture.

When a bucket's linked list grows longer than 8 nodes and the array length is at least 64, HashMap converts that list into a red-black tree.

Zset works the same way. So what triggers the switch between ziplist and skiplist?

The answer is hidden in the redis.conf file, in two settings (shown with their default values): `zset-max-ziplist-entries 128` and `zset-max-ziplist-value 64`.

They mean that as long as the number of elements in the sorted set does not exceed zset-max-ziplist-entries, and no member is longer than zset-max-ziplist-value bytes, the internal encoding of Zset is ziplist.

Otherwise, skiplist is used.
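The rule can be written as a small Java function. This is a hypothetical model, not Redis source code: the class name `ZsetEncodingRule` and the method `encodingFor` are made up for illustration, the thresholds assume the default values of 128 entries and 64 bytes, and member length is measured in UTF-8 bytes. It also checks a whole member list at once, whereas Redis makes the decision incrementally as elements are added.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical model of the Zset encoding rule; not taken from the Redis source.
public class ZsetEncodingRule {
    // Assumed defaults of zset-max-ziplist-entries and zset-max-ziplist-value.
    static final int MAX_ZIPLIST_ENTRIES = 128;
    static final int MAX_ZIPLIST_VALUE = 64;

    public static String encodingFor(List<String> members) {
        if (members.size() > MAX_ZIPLIST_ENTRIES) {
            return "skiplist"; // more than 128 elements
        }
        for (String m : members) {
            // Member length is compared in bytes, not characters.
            if (m.getBytes(StandardCharsets.UTF_8).length > MAX_ZIPLIST_VALUE) {
                return "skiplist"; // one member is longer than 64 bytes
            }
        }
        return "ziplist";
    }

    public static void main(String[] args) {
        System.out.println(encodingFor(List.of("a", "b")));                   // ziplist
        System.out.println(encodingFor(List.of("a", "b", "x".repeat(65))));   // skiplist
    }
}
```

`List.of` and `String.repeat` need Java 11 or later. Note the strict "greater than" comparisons: exactly 128 elements, or a member of exactly 64 bytes, still counts as ziplist, which matches the experiment below.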

With the theory in place, let me demonstrate.

First, we add two members to a sorted set whose key is memberscore, and then check its internal encoding.

At this point, the sorted set has 2 elements, and its internal encoding is ziplist.

For your convenience, let me draw a picture:

Next, let's trigger the switch of the internal encoding from ziplist to skiplist.

To verify the zset-max-ziplist-value setting first, we insert into memberscore a member longer than 64 bytes (the default value of zset-max-ziplist-value).

Now the sorted set whose key is memberscore has three elements, and one of them is particularly long, exceeding 64 bytes.

In this case, the internal encoding is skiplist.

Next, we add many members to a Zset to check what happens when the number of elements exceeds zset-max-ziplist-entries.

Let's create a new key named whytestkey.

First, insert two elements into whytestkey. Its internal encoding is still ziplist.

So here's the question: the configuration says zset-max-ziplist-entries 128.

Does the switch happen when the count reaches 128, or only when it exceeds 128?

Never mind if you don't know. Just try it.

There are already two elements, so let's add another 126.

The experiment shows that with 128 elements in whytestkey, the internal encoding is still ziplist.

So the transition from ziplist to skiplist is triggered only when the number of elements is greater than 128. Let's add one more:

Sure enough, the internal encoding changed from ziplist to skiplist.

The experiments confirm the theory: Zset really does have two faces.

This article focuses on the skiplist encoding.

It’s what the title says: a data structure based on luck.

What is a skiplist?

This structure was proposed by William Pugh in the 1990 paper "Skip Lists: A Probabilistic Alternative to Balanced Trees".

Address: ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

When I write articles, I usually search online to see what the big shot looks like. No other reason; I mainly want to know whether their hair is thinning.

Before finding a photo of the paper's author, I called him Mr. William. After finding it, I wanted to give him a nickname: Fire Man.

This is the only picture on his homepage. Then I clicked on his website:

It lists his accomplishments.

At a glance, three things, which I circled, caught my interest:

  • The first is that he invented the skip list.
  • The second is that he took part in JSR-133, "Java Memory Model and Thread Specification Revision".
  • The third is that he learned to eat fire while at Google. Google really is full of talented people, but how do they even teach that?

Eating fire. Big shots really do have unusual hobbies.

Since he clearly likes playing with fire, I'll call him Fire Man.

In the abstract of Fire Man's paper, the skip list is introduced as follows:

The abstract says: skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing, and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than the equivalent algorithms for balanced trees.

In the part of the paper that describes the skip list algorithm in detail, he writes:

First, Fire Man points out that to search a sorted linked list, we may have to examine every node, as in part a of his figure.

Let me pull that part out on its own:

So far so good, right? Searching a linked list node by node is the basic operation.

Since the list is sorted, we can give every other node an additional pointer to the node two positions ahead of it.

In effect, this pulls some nodes up into a new level.

Which nodes? Every other one. Compared with diagram a, the result looks like this (diagram b):

What do we gain by pulling nodes up?

Suppose the node we want to query is 25.

In an ordinary sorted list, we traverse from the head node. The path is:

head -> 3 -> 6 -> 7 -> 9 -> 12 -> 17 -> 19 -> 21 -> 25

It takes nine node visits to find 25.
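The count of nine is easy to check by simulating the search. The Java sketch below builds a plain sorted singly linked list from the keys in the paper's example and counts the nodes visited before the target is found. The class and method names are illustrative, not taken from the paper.

```java
// Illustrative sketch: linear search in a sorted singly linked list.
public class SortedListSearch {
    static final class Node {
        final int key;
        Node next;
        Node(int key) { this.key = key; }
    }

    // Builds head -> keys[0] -> keys[1] -> ...; head is a sentinel without a real key.
    public static Node build(int... keys) {
        Node head = new Node(Integer.MIN_VALUE);
        Node tail = head;
        for (int k : keys) {
            tail.next = new Node(k);
            tail = tail.next;
        }
        return head;
    }

    // Returns how many nodes were visited to find target, or -1 if it is absent.
    public static int visitsToFind(Node head, int target) {
        int visits = 0;
        // Because the list is sorted, we can stop as soon as a key exceeds the target.
        for (Node cur = head.next; cur != null && cur.key <= target; cur = cur.next) {
            visits++;
            if (cur.key == target) {
                return visits;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Node head = build(3, 6, 7, 9, 12, 17, 19, 21, 25, 26);
        System.out.println(visitsToFind(head, 25)); // 9
    }
}
```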

With the structure of diagram b, the search path becomes:

head -> 6 -> 9 -> 17 -> 21 on the upper level, then down one level and one step right to 25.

Only five nodes are visited to find 25.

In this structure, finding any element requires examining no more than ⌈n/2⌉ + 1 nodes, where n is the number of nodes in the list.

This raises a small question: how do we get from 21 directly to 25?

The figure in the paper makes this a little hard to see.

So let me draw a new sketch:

See? There is an extra down pointer. It doesn't add anything new; the paper's figure just doesn't draw it explicitly.

So the search path for 25 is the one indicated by the hollow arrows:

The logic at nodes 21 and 26 is simple.

Node 21's right pointer points to 26. We first check the right neighbor and find that 26 is greater than the target, 25.

So the down pointer comes into play: we drop one level and continue checking right pointers from there.

In fact, every node applies the same check; at the earlier nodes, the check simply told us to keep moving right.

Following the same idea of pulling nodes up, suppose we go up to four levels. That gives the diagram in the paper:

Now a search for 25 takes only two hops.

The first hop skips every element before 21.

Pretty neat, right?
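The search rule described above (move right while the next key does not overshoot the target, otherwise follow the down pointer) can be sketched in Java. This is an illustrative model rather than Pugh's code: the class name `DeterministicSkipSearch` is hypothetical, and it builds the idealized, non-random structure in which level i holds every 2^(i-1)-th node. With two levels it reproduces the five-node path for 25, and with four levels the two-hop path.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model: level i (1-based) holds every 2^(i-1)-th node of the base
// list, and nodes are linked by right and down pointers. Not Pugh's code.
public class DeterministicSkipSearch {
    static final class Level {
        final int key;
        Level right;      // next node on the same level
        final Level down; // node with the same key one level lower (null on the base list)
        Level(int key, Level down) { this.key = key; this.down = down; }
    }

    // Builds the levels bottom-up and returns the head (sentinel) of the top level.
    public static Level build(int levels, int... keys) {
        Level head = null;
        Level[] below = new Level[keys.length]; // previous level's nodes, by position
        for (int lvl = 1; lvl <= levels; lvl++) {
            head = new Level(Integer.MIN_VALUE, head); // new head points down to the old one
            Level tail = head;
            Level[] current = new Level[keys.length];
            int step = 1 << (lvl - 1);
            for (int pos = step; pos <= keys.length; pos += step) { // 1-based positions
                Level node = new Level(keys[pos - 1], below[pos - 1]);
                current[pos - 1] = node;
                tail.right = node;
                tail = node;
            }
            below = current;
        }
        return head;
    }

    // Returns the keys of the nodes moved onto (heads excluded) while searching for target.
    public static List<Integer> searchPath(Level topHead, int target) {
        List<Integer> path = new ArrayList<>();
        Level cur = topHead;
        while (cur != null) {
            // Move right while the next key does not overshoot the target.
            while (cur.right != null && cur.right.key <= target) {
                cur = cur.right;
                path.add(cur.key);
                if (cur.key == target) {
                    return path;
                }
            }
            cur = cur.down; // next key is too large (or missing): drop one level
        }
        return path; // target absent; the path shows where the search ended
    }

    public static void main(String[] args) {
        int[] keys = {3, 6, 7, 9, 12, 17, 19, 21, 25, 26};
        System.out.println(searchPath(build(2, keys), 25)); // [6, 9, 17, 21, 25]
        System.out.println(searchPath(build(4, keys), 25)); // [21, 25]
    }
}
```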

However, there is a catch.

Here is what Fire Man says in his paper:

This data structure could be used for fast searching, but insertion and deletion would be impractical.

Searching is fast, but insertion and deletion would be impractical.

What does "impractical" mean?

See, you've learned another CET-4 vocabulary word.

In other words, insertion and deletion are barely workable.

Think about it: I started you off with a ready-made sorted list at the bottom.

Then I said that, starting from this sorted list, every other node is pulled up to build a new list on the level above, so each level has half as many nodes as the one below it (a 1:2 ratio), and so on.

In reality, though, we don't start with that sorted list; we have to build it ourselves.

When you insert a node into an existing skip list, it always goes into the bottom sorted list.

But doesn't that break the 1:2 ratio between adjacent levels?

What then? Readjust the levels one by one.

That works, but think about how hard it is to code and what it costs in time complexity.

Trying to do that would scare most people off.

Overwhelmed already?

And I haven't even mentioned deletion.

What can we do?

Let's see what the paper says:

First, look at the first passage underlined in red.

Fire Man writes that 50% of the nodes are level 1, 25% are level 2, 12.5% are level 3, and so on, where a level-k node is one with k forward pointers.

What do you think he's telling you?

Besides the proportion of nodes at each level, it also tells us how levels are numbered:

There is no level 0, at least not in the paper.

If you insist on calling the bottom sorted list, which contains every node, level 0, I think that's fine. However, I think it's more appropriate to call it the base linked list.

Now look at the second underlined passage.

Fire Man mentions a key word: random.

You may not believe it, but the skip list uses randomness to avoid restructuring the levels after an insertion (or deletion).

How do we randomize? By flipping a coin.

Yes, I'm not kidding you. It really is a coin toss.

The "coin" in the skip list

Fire Man's point is that when an element is inserted into a skip list, we don't need to strictly maintain the 1:2 node ratio between adjacent levels.

Whether the new element gets indexed, and up to which level, is decided by tossing a coin.

Put another way: it is decided by the probability of a coin toss.

Let me ask you: what's the probability that a tossed coin lands heads up?

50%, right?

If we call this probability p, then p = 1/2.

How is this probability used?

Fire Man's paper has this passage:

The passage is about choosing a random level. Assuming p = 1/2, he refers us to Figure 5.

Figure 5 is as follows:

This is a very important figure.

In just a few lines of pseudocode, it describes the algorithm for choosing a random level.

First, the level starts at 1 (lvl := 1).

Then there is a comment: random() returns a random value in [0…1].

Next comes a while ... do loop.

The loop has two conditions.

First: random() < p. Since p = 1/2, this condition holds with probability 1/2.

Each time random() < p holds, the level increases by one.

What if you get lucky and the random number is below p one hundred times in a row? Would the level go past 100?

The second condition, lvl < MaxLevel, prevents exactly this: it guarantees that the computed level never exceeds the specified MaxLevel.

So although each level is decided by chance, the overall proportion of nodes promoted from one level to the next trends toward 1/2.
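The Figure 5 pseudocode translates almost line for line into Java. In this sketch, `java.util.Random.nextDouble()` (which returns a value in [0, 1)) stands in for random(); the class name and the values p = 0.5 and maxLevel = 16 in `main` are example choices, not values from the paper. The counting loop shows the roughly halving proportions: about 50% of nodes at level 1, 25% at level 2, 12.5% at level 3.

```java
import java.util.Random;

// Java translation of the randomLevel() pseudocode (Figure 5 in Pugh's paper).
public class RandomLevel {
    public static int randomLevel(Random random, double p, int maxLevel) {
        int lvl = 1;                                        // lvl := 1
        while (random.nextDouble() < p && lvl < maxLevel) { // "heads", and not yet capped
            lvl++;                                          // promote one more level
        }
        return lvl;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        int trials = 100_000;
        int[] counts = new int[17];
        for (int i = 0; i < trials; i++) {
            counts[randomLevel(random, 0.5, 16)]++;
        }
        // Prints roughly 0.5, 0.25, 0.125, 0.0625.
        for (int lvl = 1; lvl <= 4; lvl++) {
            System.out.printf("level %d: %.3f%n", lvl, counts[lvl] / (double) trials);
        }
    }
}
```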

The advantage is that each insertion is independent of the others: you only need to adjust the pointers of the nodes adjacent to the inserted node.

An insertion is simply a search followed by pointer updates, as in the following diagram:

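To show that insertion really is just a search plus a few pointer updates, here is a minimal single-threaded Java sketch in the style of the paper's algorithm. The `update` array records, for each level, the last node visited before the insertion point; once the coin flips decide the new node's height, only those predecessors are rewired. The class name `SimpleSkipList`, `MAX_LEVEL = 16` and `P = 0.5` are assumptions for illustration, and deletion is omitted to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal, single-threaded skip list of int keys: search, then splice. Illustrative only.
public class SimpleSkipList {
    static final int MAX_LEVEL = 16;   // assumed cap, like MaxLevel in the paper
    static final double P = 0.5;       // assumed promotion probability

    static final class Node {
        final int key;
        final Node[] forward;          // forward[i] = next node on level i + 1
        Node(int key, int level) { this.key = key; this.forward = new Node[level]; }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random random;
    private int level = 1;             // number of levels currently in use

    public SimpleSkipList(long seed) { this.random = new Random(seed); }

    private int randomLevel() {
        int lvl = 1;
        while (random.nextDouble() < P && lvl < MAX_LEVEL) lvl++;
        return lvl;
    }

    public boolean contains(int key) {
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].key < key) x = x.forward[i];
        }
        x = x.forward[0];
        return x != null && x.key == key;
    }

    // Returns false if the key was already present.
    public boolean insert(int key) {
        Node[] update = new Node[MAX_LEVEL];      // rightmost node before key, per level
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {     // step 1: search, remembering the path
            while (x.forward[i] != null && x.forward[i].key < key) x = x.forward[i];
            update[i] = x;
        }
        if (x.forward[0] != null && x.forward[0].key == key) return false;

        int newLevel = randomLevel();             // step 2: flip coins for the height
        if (newLevel > level) {
            for (int i = level; i < newLevel; i++) update[i] = head;
            level = newLevel;
        }
        Node node = new Node(key, newLevel);      // step 3: splice in on each of its levels
        for (int i = 0; i < newLevel; i++) {
            node.forward[i] = update[i].forward[i];
            update[i].forward[i] = node;
        }
        return true;
    }

    public List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node x = head.forward[0]; x != null; x = x.forward[0]) out.add(x.key);
        return out;
    }

    public static void main(String[] args) {
        SimpleSkipList list = new SimpleSkipList(2021);
        for (int k : new int[] {21, 3, 26, 9, 17, 6, 25, 12, 19, 7}) list.insert(k);
        System.out.println(list.toList());     // [3, 6, 7, 9, 12, 17, 19, 21, 25, 26]
        System.out.println(list.contains(25)); // true
        System.out.println(list.contains(20)); // false
    }
}
```

Whatever heights the coin flips produce, the base list stays sorted, and no existing node other than the predecessors in `update` is touched.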

Fire Man also devotes a subsection of the paper to the choice of p, with a table:

His conclusion: he recommends p = 1/4, unless the variability of running times is a primary concern, in which case p = 1/2.

Here is my understanding. First, this is a classic example of trading space for time.

In a sorted array, binary search is theoretically the fastest way to find an element. A skip list repeatedly extracts nodes (also called indexes) from the base list to form new lists above it.

So with p = 1/2, the structure resembles binary search and queries are fast, but there are more levels and more space is used.

With p = 1/4, elements are promoted less often and the structure is shorter overall. Queries may be somewhat slower or less predictable, but less space is used.
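Pugh's paper backs this intuition with numbers. Ignoring the MaxLevel cap, a node reaches level k with probability p^(k-1), so the expected number of forward pointers per node is 1/(1 - p), while the expected search cost grows with log_{1/p}(n)/p. Plugging in both candidate values (my own arithmetic on these formulas) gives:

```latex
\begin{aligned}
\Pr[\text{level} \ge k] &= p^{k-1}, &
\mathbb{E}[\text{pointers per node}] &= \sum_{k \ge 1} p^{k-1} = \frac{1}{1-p}, \\
p = \tfrac{1}{2} &\Rightarrow \frac{1}{1-p} = 2, &
p = \tfrac{1}{4} &\Rightarrow \frac{1}{1-p} = \tfrac{4}{3}, \\
\frac{\log_{1/p} n}{p} &= \frac{\ln n}{p \ln(1/p)}, &
p = \tfrac{1}{2}\ \text{or}\ \tfrac{1}{4} &\Rightarrow \frac{\ln n}{p \ln(1/p)} = \frac{2 \ln n}{\ln 2}.
\end{aligned}
```

Because (1/2)·ln 2 and (1/4)·ln 4 are equal, the search-cost term comes out the same for both values, while p = 1/4 needs a third fewer pointers. That is consistent with Pugh's recommendation of p = 1/4; the price is more variation in running time.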

In Redis, p (ZSKIPLIST_P) is 0.25, that is, 1/4, and the maximum level (ZSKIPLIST_MAXLEVEL) is 32 (this depends on the version; some versions use 64).

The paper also spends a good deal of space analyzing the time complexity. If you're interested, read the paper and work through the analysis yourself.

Skip lists in Java

The skip list is a relatively niche data structure.

But Java actually ships an implementation of it.

Let me ask you: most Map implementations are unordered. Do you know any Map that is ordered?

TreeMap and LinkedHashMap are ordered, right?

But they are not thread-safe.

So what about a Map that is both thread-safe and ordered?

Here it is: ConcurrentSkipListMap, a class that few people notice.

Look, its name contains both List and Map.

Look at a test case:

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class MainTest {
    public static void main(String[] args) {
        // Keys are inserted in ascending order here, but the map keeps them sorted regardless.
        ConcurrentSkipListMap<Integer, String> skipListMap = new ConcurrentSkipListMap<>();
        skipListMap.put(3, "3");
        skipListMap.put(6, "6");
        skipListMap.put(7, "7");
        skipListMap.put(9, "9");
        skipListMap.put(12, "12");
        skipListMap.put(17, "17");
        skipListMap.put(19, "19");
        skipListMap.put(21, "21");
        skipListMap.put(25, "25");
        skipListMap.put(26, "26");
        System.out.println("skipListMap = " + skipListMap);
    }
}
```

The output shows that it really is ordered:

skipListMap = {3=3, 6=6, 7=7, 9=9, 12=12, 17=17, 19=19, 21=21, 25=25, 26=26}
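Because its keys are kept sorted, ConcurrentSkipListMap also implements NavigableMap, which is what makes it handy for leaderboard-style range queries. The snippet below uses only standard JDK methods (`firstKey`, `ceilingKey`, `floorKey`, `headMap`, `descendingKeySet`); the class name `NavigableDemo` and the probe key 22 are just for illustration.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class NavigableDemo {
    public static ConcurrentSkipListMap<Integer, String> sample() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        for (int k : new int[] {3, 6, 7, 9, 12, 17, 19, 21, 25, 26}) {
            map.put(k, String.valueOf(k));
        }
        return map;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = sample();
        System.out.println(map.firstKey());           // 3: smallest key
        System.out.println(map.ceilingKey(22));       // 25: smallest key >= 22
        System.out.println(map.floorKey(22));         // 21: largest key <= 22
        System.out.println(map.headMap(12).keySet()); // [3, 6, 7, 9]: keys < 12
        System.out.println(map.descendingKeySet());   // [26, 25, 21, 19, 17, 12, 9, 7, 6, 3]
    }
}
```

For a leaderboard, scores could serve as keys so that the descending view lists the highest scores first. Equal scores would collide as map keys, though, so a real leaderboard would need a composite key.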

Let's analyze it a little, starting with its three key structures.

The first is Index:

Index contains a node, a right pointer and a down pointer.

The second is HeadIndex:

It extends Index and adds only a level field, which records which level the index belongs to.

The third is Node:

There isn't much to say about Node: it's an ordinary linked-list node holding a key, a value, and a next pointer.

The relationship between the three is illustrated as follows:

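The three classes can be mirrored in a few lines of plain Java. This is a simplified, single-threaded sketch: the field names follow the description above, but JDK 8 declares several of them volatile and updates them with CAS, which is omitted here. The class name `SkipListShapes` and the small two-level example built in `sample()` are made up for illustration, not taken from the debugger screenshots.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, non-concurrent mirror of Node, Index and HeadIndex.
public class SkipListShapes {
    static final class Node<K, V> {                 // entry in the base sorted list
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    static class Index<K, V> {                      // one index entry on some level
        final Node<K, V> node;                      // the base node this index refers to
        final Index<K, V> down;                     // same node, one level lower
        Index<K, V> right;                          // next index on the same level
        Index(Node<K, V> node, Index<K, V> down, Index<K, V> right) {
            this.node = node; this.down = down; this.right = right;
        }
    }

    static final class HeadIndex<K, V> extends Index<K, V> { // leftmost index of a level
        final int level;
        HeadIndex(Node<K, V> node, Index<K, V> down, Index<K, V> right, int level) {
            super(node, down, right);
            this.level = level;
        }
    }

    // Hypothetical example: base list 3 -> 6 -> 7 -> 9; level 1 indexes 6 and 9; level 2 indexes 9.
    public static HeadIndex<Integer, String> sample() {
        Node<Integer, String> n9 = new Node<>(9, "9", null);
        Node<Integer, String> n7 = new Node<>(7, "7", n9);
        Node<Integer, String> n6 = new Node<>(6, "6", n7);
        Node<Integer, String> n3 = new Node<>(3, "3", n6);
        Node<Integer, String> header = new Node<>(null, null, n3); // base header, no key
        Index<Integer, String> i9 = new Index<>(n9, null, null);
        Index<Integer, String> i6 = new Index<>(n6, null, i9);
        HeadIndex<Integer, String> level1 = new HeadIndex<>(header, null, i6, 1);
        Index<Integer, String> i9up = new Index<>(n9, i9, null);
        return new HeadIndex<>(header, level1, i9up, 2);
    }

    // Follows down from the top HeadIndex and lists the keys on each level.
    public static <K, V> List<List<K>> levels(HeadIndex<K, V> top) {
        List<List<K>> result = new ArrayList<>();
        for (Index<K, V> h = top; h != null; h = h.down) {
            List<K> keys = new ArrayList<>();
            for (Index<K, V> i = h.right; i != null; i = i.right) keys.add(i.node.key);
            result.add(keys);
        }
        return result;
    }

    // The head's node field leads to the whole base list.
    public static <K, V> List<K> baseKeys(HeadIndex<K, V> top) {
        List<K> keys = new ArrayList<>();
        for (Node<K, V> n = top.node.next; n != null; n = n.next) keys.add(n.key);
        return keys;
    }

    public static void main(String[] args) {
        HeadIndex<Integer, String> top = sample();
        System.out.println(top.level);     // 2
        System.out.println(levels(top));   // [[9], [6, 9]]
        System.out.println(baseKeys(top)); // [3, 6, 7, 9]
    }
}
```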

Let's debug the earlier MainTest example and fill the diagram with real values.

In the debugger, you can see that there are currently two levels:

First, look at the second level's linked list, that is, the right field of the second level's HeadIndex:

So the second level's linked list looks like this:

Besides the right field we just looked at, the second level's HeadIndex also has a down field, which points to the level below, that is, the first level's HeadIndex:

You can see that the down field of the first level's HeadIndex is null, but its right field has a value:

So the first level's linked list can be drawn like this:

We can also see that the node field leads to the entire sorted base list (in fact, every level's HeadIndex has one):

So the whole skip list looks like this:

But if you debug the same program yourself, you may find that your skip list doesn't look like this.

Of course it's different. It would be spooky if it were identical.

Don't forget that the levels of the index are randomly generated.

How does ConcurrentSkipListMap randomize them?

Let’s take a look at the source code of put.

The code labeled ① is long, but its core job is to insert the element into the bottom sorted list. I won't walk through it, so I've folded that code.

The line labeled ② is (rnd & 0x80000001) == 0.

Here, rnd is the random value produced by the previous line.

In binary, 0x80000001 is 10000000 00000000 00000000 00000001.

Its highest and lowest bits are 1, and all other bits are 0.

So the if condition (rnd & 0x80000001) == 0 holds only when both the highest and the lowest bit of rnd are 0.

A highest bit of 0 means rnd is non-negative, and a lowest bit of 0 means it is even, so rnd is a non-negative even number. For a uniformly random int, that happens with probability 1/4.

Only when such a number comes up does the new element get any index at all; otherwise it lives only in the base list.

The code labeled ③ determines how many index levels to build for the current element.

Its loop condition is ((rnd >>>= 1) & 1) != 0. Since rnd is known to be even, its lowest bit is 0; starting from the second-lowest bit, every consecutive 1 bit adds one to the level.

Don't get it? No problem, here's an example.

Suppose the random number is 110, whose binary form is 01101110. Starting from the second-lowest bit there are three consecutive 1s, so the level is incremented three times starting from 1, ending at 4.

So what if the map currently has only two index levels? Does it jump straight to level 4?

This is where the code labeled ④ comes in.

If the new level is greater than the current number of levels, the map grows by only one level.
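Steps ② through ④ can be condensed into one standalone Java method. This is a sketch modelled on the logic described above, not a copy of the JDK source: the class `IndexLevel` and method `levelFor` are hypothetical names. It returns 0 when the element gets no index, and otherwise the index level after the grow-by-one cap.

```java
// Sketch of how one random int becomes an index level, following the
// JDK 8 ConcurrentSkipListMap logic described above. Hypothetical helper.
public class IndexLevel {
    /**
     * @param rnd          a random int
     * @param currentLevel number of index levels the map currently has
     * @return 0 if the new element gets no index, otherwise its index level
     */
    public static int levelFor(int rnd, int currentLevel) {
        // Step 2: the highest (sign) bit and the lowest bit must both be 0,
        // i.e. rnd is a non-negative even number (probability 1/4).
        if ((rnd & 0x80000001) != 0) {
            return 0;
        }
        // Step 3: count consecutive 1 bits starting from the second-lowest bit.
        int level = 1;
        while (((rnd >>>= 1) & 1) != 0) {
            ++level;
        }
        // Step 4: never grow by more than one level at a time.
        return Math.min(level, currentLevel + 1);
    }

    public static void main(String[] args) {
        System.out.println(levelFor(110, 10)); // 0b1101110 -> 4
        System.out.println(levelFor(110, 2));  // capped at 2 + 1 = 3
        System.out.println(levelFor(111, 10)); // odd -> 0, no index
    }
}
```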

Now, let's go back to the random algorithm in Fire Man's paper:

So now you know: because of randomness, the same input can produce a different skip list structure every time.

For example, when I debugged the earlier code for the screenshots, there were two index levels.

But sometimes I got three.

Don't ask why; think it over and you'll see.

Finally, back to Redis, where we started: the overall idea of the Redis skip list is the same, but there are some small differences.

For example, Redis adds a span field to each forward pointer of its skiplist (the counterpart of Index here), recording how many nodes that pointer skips over. Summing the spans along a search path is what lets Redis compute an element's rank efficiently.

The book Redis In-Depth Adventure describes this field:

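To see why span helps with ranking, here is a small Java sketch. It is a simplified model, not Redis's zskiplist code: the class `SpanRank` is hypothetical, node heights are passed in explicitly instead of being random, and each forward pointer stores how many base-list positions it jumps. Adding up the spans of the pointers followed during a search yields the element's 1-based rank without walking the whole base list.

```java
import java.util.Arrays;

// Simplified model of a skip list whose forward pointers carry a span.
public class SpanRank {
    static final class Node {
        final int key;
        final Node[] forward;
        final int[] span; // span[i]: base-list positions jumped by forward[i]
        Node(int key, int height) {
            this.key = key;
            this.forward = new Node[height];
            this.span = new int[height];
        }
    }

    final Node head;

    // keys must be sorted ascending; heights[i] (<= maxHeight) is the number of levels of keys[i].
    public SpanRank(int[] keys, int[] heights, int maxHeight) {
        head = new Node(Integer.MIN_VALUE, maxHeight);
        Node[] last = new Node[maxHeight];  // last node linked on each level
        int[] lastPos = new int[maxHeight]; // its position in the base list (head = 0)
        Arrays.fill(last, head);
        for (int i = 0; i < keys.length; i++) {
            Node node = new Node(keys[i], heights[i]);
            int pos = i + 1;
            for (int lvl = 0; lvl < heights[i]; lvl++) {
                last[lvl].forward[lvl] = node;
                last[lvl].span[lvl] = pos - lastPos[lvl]; // how far this pointer jumps
                last[lvl] = node;
                lastPos[lvl] = pos;
            }
        }
    }

    // Returns the 1-based rank of key, or 0 if it is absent.
    public int rank(int key) {
        Node x = head;
        int rank = 0;
        for (int lvl = head.forward.length - 1; lvl >= 0; lvl--) {
            while (x.forward[lvl] != null && x.forward[lvl].key <= key) {
                rank += x.span[lvl]; // accumulate the distance skipped
                x = x.forward[lvl];
            }
            if (x != head && x.key == key) {
                return rank;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] keys = {3, 6, 7, 9, 12, 17, 19, 21, 25, 26};
        int[] heights = {1, 2, 1, 3, 1, 2, 1, 4, 1, 2}; // the idealized four-level layout
        SpanRank list = new SpanRank(keys, heights, 4);
        System.out.println(list.rank(25)); // 9: span 8 to reach 21, then span 1 to 25
        System.out.println(list.rank(3));  // 1
        System.out.println(list.rank(20)); // 0 (absent)
    }
}
```

The design choice is that span costs one extra int per pointer but turns a rank query into the same O(log n) walk as a search.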

One last word

Well, that’s all for this article.

If you spot anything wrong, please point it out and I'll fix it. Thank you for reading. I only publish original work, and I really appreciate your following.

I'm why, a would-be writer held back by code. Not a big shot, but I like to share. A warm and informative guy from Sichuan.

Feel free to follow me.
