Redis scan Command Principle

Time:2019-3-29

Scan type command

SCAN cursor [MATCH pattern] [COUNT count]

SSCAN KEY cursor [MATCH pattern] [COUNT count]

HSCAN KEY cursor [MATCH pattern] [COUNT count]

ZSCAN KEY cursor [MATCH pattern] [COUNT count]

SCAN: iterates over the keys in the current database

SSCAN: iterates over the members of a set

HSCAN: iterates over the fields of a hash and returns each field with its value

ZSCAN: iterates over the members of a sorted set and returns each member with its score

Redis executes commands in a single thread, so commands such as KEYS and SMEMBERS can block the server when they run against large data sets. The SCAN family of commands was introduced to solve this: each call returns a cursor, so a collection can be iterated incrementally.

Implementation of scan type command

SCAN, SSCAN, HSCAN, and ZSCAN each have their own command entry point. Each entry validates its arguments and parses the cursor, then hands off to the shared function scanGenericCommand. The HSCAN entry, for example, parses the cursor, looks up the key, checks that it holds a hash, and calls scanGenericCommand.

scanGenericCommand has four main steps:

  • Parse the COUNT and MATCH parameters. If COUNT is not given, it defaults to 10. COUNT is a hint for how much work each call does, not an exact number of returned elements.
  • Start iterating over the collection. If the key is encoded as a ziplist or intset, all elements are returned in one call and the returned cursor is 0. Redis only uses these encodings when the collection is small, so this does not hurt performance.

The cursor matters only when the data is stored in a hash table. The entry point for that case is dictScan, which is described in detail below.

  • Filter the results against the MATCH pattern. Keys that have already expired are filtered out as well (Redis does not delete a key the instant it expires).
  • Return the result to the client as a two-element array. The first element is the next cursor, and the second is the array of elements found (keys, members, field-value pairs, or member-score pairs).
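The reply shape for the small-encoding path in the second step can be illustrated with a minimal Python sketch. The function name scan_small_collection is hypothetical, and Python's fnmatch is only a stand-in for Redis's own glob matcher, whose rules may differ in some details.

```python
import fnmatch
from typing import List, Optional


def scan_small_collection(elements: List[str],
                          match: Optional[str] = None,
                          count: int = 10) -> list:
    """Toy model of a scan call on a compactly encoded collection.

    Everything is returned in one call, so count is accepted but unused
    and the returned cursor is always 0.
    """
    kept = [e for e in elements
            if match is None or fnmatch.fnmatchcase(e, match)]
    return [0, kept]  # [next cursor, elements]


if __name__ == "__main__":
    print(scan_small_collection(["user:1", "user:2", "order:9"],
                                match="user:*"))
```

A cursor of 0 in the reply tells the client that the iteration is complete, so a client loop over this toy function would stop after a single call.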

Cursor implementation in dictScan

When iterating over a hash table, there are three cases:

  • The hash table is never rehashed between the start and the end of the iteration.
  • The hash table is rehashed between the start and the end of the iteration, but every individual call sees a table that has either not started rehashing or has already finished.
  • One or more calls during the iteration arrive while the hash table is in the middle of rehashing.

During a rehash, Redis keeps two hash tables, ht[0] and ht[1], and rehashes progressively rather than all at once. New key-value pairs are stored in ht[1], and the data in ht[0] is moved to ht[1] a little at a time. When the rehash is complete, ht[1] is assigned to ht[0] and ht[1] is reset to empty.
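The two-table scheme can be sketched with a toy model in Python. This is a simplification under stated assumptions, not the Redis implementation: the class ToyDict and its methods are hypothetical, keys are integers used as their own hash, and each step moves exactly one bucket index.

```python
class ToyDict:
    """Toy dict with two tables; integer keys act as their own hash."""

    def __init__(self, size: int = 4) -> None:
        self.ht = [[[] for _ in range(size)], None]
        self.rehash_idx = -1  # -1 means no rehash is in progress

    def is_rehashing(self) -> bool:
        return self.rehash_idx >= 0

    def add(self, key: int) -> None:
        # While rehashing, new keys go straight to ht[1].
        table = self.ht[1] if self.is_rehashing() else self.ht[0]
        table[key & (len(table) - 1)].append(key)

    def start_rehash(self, new_size: int) -> None:
        self.ht[1] = [[] for _ in range(new_size)]
        self.rehash_idx = 0

    def rehash_step(self) -> None:
        """Move one bucket of ht[0] into ht[1]; finish when none are left."""
        if not self.is_rehashing():
            return
        for key in self.ht[0][self.rehash_idx]:
            self.ht[1][key & (len(self.ht[1]) - 1)].append(key)
        self.ht[0][self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.ht[0]):
            # ht[1] becomes ht[0], and ht[1] is reset.
            self.ht = [self.ht[1], None]
            self.rehash_idx = -1
```

Between start_rehash and the final rehash_step, a given key may live in either table, which is why a scan that arrives mid-rehash has to look at both.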

The cursor implementation has to handle all three cases. The requirements in each case are as follows:

  • The first case is simple. Suppose the hash table size is 4. The first call uses cursor 0 and reads the first bucket, then returns cursor 1; the next call reads the second bucket, and so on in order.
  • The second case is more complex. Suppose the hash table size is 4 and becomes 8 after a rehash. Consider what happens if the cursor is still advanced sequentially as above.

Suppose bucket 0 has been read and cursor 1 was returned. By the time the client comes back with cursor 1, the hash table has been rehashed and its size has doubled to 8. Redis computes the bucket of a key as follows:

hash(key)&(size-1)

That is, the bucket is hash(key) & 0b11 when the size is 4 and hash(key) & 0b111 when the size is 8. When the table grows from 4 to 8, the keys in old bucket 0 are therefore split between buckets 0 (000) and 4 (100). The full bucket mapping is:

Bucket at size 4    Buckets at size 8
0 (00)              0 (000), 4 (100)
1 (01)              1 (001), 5 (101)
2 (10)              2 (010), 6 (110)
3 (11)              3 (011), 7 (111)

In binary terms, at size 4 the bucket is the low two bits of hash(key), and at size 8 it is the low three bits. A key whose low two bits are 00 lands in bucket 000 if its third bit is 0 and in bucket 100 if its third bit is 1; the other buckets split the same way. With a sequential cursor, resuming at 1 after the resize eventually reaches bucket 4 and returns keys that were already returned from old bucket 0, so those values are duplicated.
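The bucket formula and the way a bucket splits on growth can be checked with a short Python sketch. The function names bucket_index and split_targets are illustrative only, and table sizes are assumed to be powers of two.

```python
from typing import List


def bucket_index(hash_value: int, size: int) -> int:
    """Bucket of a hash value in a power-of-two table: hash & (size - 1)."""
    return hash_value & (size - 1)


def split_targets(old_bucket: int, old_size: int, new_size: int) -> List[int]:
    """Buckets of the larger table that can receive keys from old_bucket.

    The low bits stay the same; only the newly exposed high bits vary.
    """
    return list(range(old_bucket, new_size, old_size))


if __name__ == "__main__":
    for b in range(4):
        print(b, "->", split_targets(b, 4, 8))
```

The same rule extends to larger jumps: going from 4 to 16, each old bucket spreads over four new buckets spaced 4 apart.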

  • In the third case, suppose a rehash is in progress when the client returns with cursor 1. Some of the data in bucket 1 of ht[0] may already have been moved to bucket 1 or bucket 5 of ht[1], so the corresponding buckets in both ht[0] and ht[1] must be traversed completely; otherwise data may be missed.

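To handle all three cases without missing data and with as little repetition as possible, Redis uses a method called reverse binary iteration. Instead of adding 1 at the lowest bit of the cursor, dictScan adds 1 at the highest bit covered by the table mask and carries toward the lower bits: it sets the bits above the mask, reverses the bits of the cursor, increments it, and reverses the bits back.

The logic is simple. The examples below show why this method produces no repetition when the table grows from 4 to 8 or from 4 to 16, and why it can repeat elements when the table shrinks from 8 to 4 or from 16 to 4.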

When traversing a table of size 4, the cursor sequence is 0-2-1-3.

Similarly, when the size is 8, the cursor sequence is 0-4-2-6-1-5-3-7.

When the size is 16, the cursor sequence is 0-8-4-12-2-10-6-14-1-9-5-13-3-11-7-15.
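These sequences can be reproduced with a small Python sketch of reverse binary iteration. It is an illustration rather than the Redis source: the names reverse_bits, next_cursor, and cursor_order are hypothetical, and the 64-bit cursor width is an assumption.

```python
from typing import List

CURSOR_BITS = 64  # assumed cursor width
FULL_MASK = (1 << CURSOR_BITS) - 1


def reverse_bits(v: int) -> int:
    """Reverse the bit order of a CURSOR_BITS-wide unsigned integer."""
    return int(format(v, "0{}b".format(CURSOR_BITS))[::-1], 2)


def next_cursor(v: int, size: int) -> int:
    """Advance cursor v for a table whose size is a power of two."""
    m0 = size - 1
    v |= ~m0 & FULL_MASK      # set every bit above the table mask
    v = reverse_bits(v)
    v = (v + 1) & FULL_MASK   # increment; wraps to 0 after the last bucket
    return reverse_bits(v)


def cursor_order(size: int) -> List[int]:
    """Cursors visited by a full scan of an unchanging table."""
    order, v = [], 0
    while True:
        order.append(v)
        v = next_cursor(v, size)
        if v == 0:
            return order


if __name__ == "__main__":
    for size in (4, 8, 16):
        print(size, "-".join(str(c) for c in cursor_order(size)))
```

Setting the bits above the mask before reversing makes the increment carry straight through them, so only the bits inside the mask change.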


When the size grows, every cursor from the smaller table maps to a position in the larger table's sequence, in the same relative order, so nothing is read twice and nothing is omitted.

For example, suppose the size changes from 4 to 8 and the rehash completes before the second call, which arrives with cursor 2. By the rule above, bucket 2 at size 4 splits into buckets 2 and 6 at size 8, and bucket 0 at size 4 splits into buckets 0 and 4 at size 8.

Bucket 0 at size 4 has already been traversed, which means buckets 0 and 4 at size 8 are already covered. Traversal simply continues from cursor 2 through 2, 6, 1, 5, 3, 7, with no repetition and no omission.

Now consider the size changing from large to small, say from 16 to 4. There are two cases. In the first, the cursor held by the client is one of 0, 2, 1, 3; traversal continues with nothing missed and nothing repeated.

In the second, the cursor is not one of those four values. Suppose it is 10: after the resize, 10 & 0b11 = 2, so traversal continues from bucket 2. But buckets 2, 10, 6, and 14 at size 16 all rehash into bucket 2 at size 4, and bucket 2 at size 16 has already been read, so its keys are read again.

The keys from bucket 2 at size 16 are repeated, but nothing is omitted.

To sum up: when a Redis hash table grows, the SCAN commands neither repeat nor omit elements. When it shrinks, they may repeat elements but will not omit any.
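Both outcomes can be checked with a small simulation in Python. It is a sketch under simplifying assumptions: integer keys act as their own hash, the resize completes entirely between two calls, each call reads one bucket, and all function names are hypothetical.

```python
from collections import Counter
from typing import List

CURSOR_BITS = 64  # assumed cursor width
FULL_MASK = (1 << CURSOR_BITS) - 1


def reverse_bits(v: int) -> int:
    return int(format(v, "0{}b".format(CURSOR_BITS))[::-1], 2)


def next_cursor(v: int, size: int) -> int:
    """Reverse binary increment for a power-of-two table size."""
    v |= ~(size - 1) & FULL_MASK
    v = (reverse_bits(v) + 1) & FULL_MASK
    return reverse_bits(v)


def scan_with_resize(keys: List[int], old_size: int, new_size: int,
                     resize_at: int) -> List[int]:
    """Full scan; the table is resized just before the call that
    arrives with cursor resize_at."""
    size, v, returned = old_size, 0, []
    while True:
        if v == resize_at:
            size = new_size
        mask = size - 1
        returned.extend(k for k in keys if (k & mask) == (v & mask))
        v = next_cursor(v, size)
        if v == 0:
            return returned


def duplicates(returned: List[int]) -> List[int]:
    return sorted(k for k, n in Counter(returned).items() if n > 1)


if __name__ == "__main__":
    keys = list(range(32))
    print("grow 4->8:", duplicates(scan_with_resize(keys, 4, 8, 2)))
    print("shrink 16->4:", duplicates(scan_with_resize(keys, 16, 4, 10)))
```

In the shrink run, the repeated keys are exactly those that sat in bucket 2 of the size-16 table, matching the example with cursor 10.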

Cases 1 and 2 are now fully handled. Case 3 remains.

Case 3 requires reading from both ht[0] and ht[1]. The main difficulty is deciding which buckets to visit. Redis reads the bucket at the cursor in the smaller table, then loops over the buckets of the larger table that it expands into.

The loop condition is:

v&(m0^m1)

The mask m0 for size 4 is 00000011, and the mask m1 for size 8 is 00000111. Their XOR is 00000100, that is, the high bits that are present in the larger mask but not in the smaller one. ANDing this with v tests whether the cursor still has any of those high bits set.

The next cursor inside the loop is computed as follows:

v = (((v | m0) + 1) & ~m0) | (v & m0)

The right part, (v & m0), keeps the low bits of v, for example v & 00000011 when the size is 4. The left part produces the high bits of v.

In the left part, (v | m0) sets all the low bits of v to 1, so adding 1 carries into the high bits; & ~m0 then keeps only the high bits.

Overall, each step adds 1 to the high bits of cursor v and leaves the low bits unchanged. For example:

Suppose the cursor is 2 and a rehash from size 4 to size 8 is in progress. Then m0 = 00000011, m1 = 00000111, and v = 00000010.

The next cursor from the formula is ((00000010 | 00000011) + 1) & 11111100 | (00000010 & 00000011) = 00000100 | 00000010 = 00000110, which is 6.

The loop condition is evaluated on the updated cursor: 00000110 & (00000011 ^ 00000111) = 00000110 & 00000100 = 00000100, which is nonzero, so bucket 6 of the larger table is visited as well. The following step gives v = 00001010 (10), and 00001010 & 00000100 = 0, which ends the loop.
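The loop over the larger table can be sketched in Python using the update formula and the loop condition given above. The function name buckets_in_larger_table is hypothetical, and the sketch only lists bucket indexes; it does not read any entries.

```python
from typing import List


def buckets_in_larger_table(v: int, small_size: int,
                            large_size: int) -> List[int]:
    """Buckets of the larger table visited for cursor v during a rehash.

    Both sizes are assumed to be powers of two with small_size < large_size.
    """
    m0, m1 = small_size - 1, large_size - 1
    visited = []
    while True:
        visited.append(v & m1)
        # Add 1 to the bits above m0 and keep the low bits unchanged.
        v = (((v | m0) + 1) & ~m0) | (v & m0)
        # Stop once the bits covered by the mask difference are all zero.
        if (v & (m0 ^ m1)) == 0:
            return visited


if __name__ == "__main__":
    print(buckets_in_larger_table(2, 4, 8))
    print(buckets_in_larger_table(2, 4, 16))
```

For cursor 2 with sizes 4 and 8, the visited buckets are 2 and 6, which are exactly the two buckets that old bucket 2 splits into.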