Redis design and implementation 9: set, one of the five data types

Time:2021-4-15

Set objects have two encodings: intset and hashtable.

Encoding 1: intset

The structure of intset

The integer set (intset) is one of the underlying implementations of the set type. As the name suggests, it is a collection type designed specifically for integers.
Its structure is defined in intset.h as follows:

typedef struct intset {
    // Encoding: the width of each element (see the INTSET_ENC_* constants)
    uint32_t encoding;
    // Number of elements in the set
    uint32_t length;
    // Element array; the real element type is determined by encoding
    int8_t contents[];
} intset;
  • contents holds the elements in sorted order with no duplicates. Although it is declared as an array of int8_t, the actual type of the stored elements depends on encoding.
  • encoding takes one of the following values, defined in intset.c:
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))
encoding            type      bytes
INTSET_ENC_INT16    int16_t   2
INTSET_ENC_INT32    int32_t   4
INTSET_ENC_INT64    int64_t   8
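
To make the table concrete, here is a minimal sketch of how a value could be mapped to the smallest encoding that can hold it. This is the job the _intsetValueEncoding helper performs in the listings further down. The function name value_encoding is hypothetical and the body is an illustration, not a copy of the Redis source.

```c
#include <assert.h>
#include <stdint.h>

#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))

/* The encoding constants double as the element width in bytes. */
static_assert(INTSET_ENC_INT16 == 2, "int16_t is 2 bytes");
static_assert(INTSET_ENC_INT32 == 4, "int32_t is 4 bytes");
static_assert(INTSET_ENC_INT64 == 8, "int64_t is 8 bytes");

/* Hypothetical helper: return the smallest encoding (bytes per element)
 * that can represent v. */
uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX)
        return INTSET_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX)
        return INTSET_ENC_INT32;
    return INTSET_ENC_INT16;
}
```

Because the constants are ordered by width, comparing two encodings with < or > is the same as comparing element sizes, which is what the insert and search code relies on.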

The following figure shows the set structure with three integer elements: 1, 2 and 3
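
For the three-element set above, the encoding is INTSET_ENC_INT16 and the length is 3, so contents occupies 3 * 2 = 6 bytes after the header. The sketch below computes the total allocation size under that layout. The function name intset_bytes is hypothetical, and the struct is repeated only so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];   /* flexible array member: adds nothing to sizeof */
} intset;

/* Hypothetical helper: total bytes needed for a set with the given
 * encoding (bytes per element) and element count. */
size_t intset_bytes(uint32_t encoding, uint32_t length) {
    assert(encoding == 2 || encoding == 4 || encoding == 8);
    return sizeof(intset) + (size_t)length * encoding;
}
```

On common platforms the header is 8 bytes, so intset_bytes(2, 3) would come to 14 bytes for the set {1, 2, 3}.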

Common operation source code analysis

The source code is in intset.c.

1. Create an empty collection

Create an empty intset. The initial encoding is the smallest one, INTSET_ENC_INT16.

intset *intsetNew(void) {
    intset *is = zmalloc(sizeof(intset));
    is->encoding = intrev32ifbe(INTSET_ENC_INT16);
    is->length = 0;
    return is;
}

2. Search

Because the integers in the set are stored in sorted order, lookup uses binary search, which has a low time complexity of \(O(\log n)\).

uint8_t intsetFind(intset *is, int64_t value) {
    uint8_t valenc = _intsetValueEncoding(value);
    // If the encoding of value is larger than the set's encoding, value cannot be in the set
    // intsetSearch is the lower-level lookup; it is a binary search and its source is shown below
    return valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,NULL);
}

// Binary search over the set.
// If the value is found, return 1 and store its position in *pos
// If it is not found, return 0 and store the position where the value could be inserted in *pos
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
    int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
    int64_t cur = -1;

    // The set is empty
    if (intrev32ifbe(is->length) == 0) {
        if (pos) *pos = 0;
        return 0;
    } else {
        // If value is larger than the largest element or smaller than the smallest, it cannot exist, so return immediately
        if (value > _intsetGet(is,max)) {
            if (pos) *pos = intrev32ifbe(is->length);
            return 0;
        } else if (value < _intsetGet(is,0)) {
            if (pos) *pos = 0;
            return 0;
        }
    }

    //Binary search
    while(max >= min) {
        mid = ((unsigned int)min + (unsigned int)max) >> 1;
        cur = _intsetGet(is,mid);
        if (value > cur) {
            min = mid+1;
        } else if (value < cur) {
            max = mid-1;
        } else {
            break;
        }
    }

    if (value == cur) {
        if (pos) *pos = mid;
        return 1;
    } else {
        if (pos) *pos = min;
        return 0;
    }
}

3. Get from the specified location

// If pos is valid, return 1 and store the element in *value
// Otherwise return 0
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value) {
    if (pos < intrev32ifbe(is->length)) {
        *value = _intsetGet(is,pos);
        return 1;
    }
    // pos is out of range (not less than the length), so there is nothing to return
    return 0;
}
static int64_t _intsetGet(intset *is, int pos) {
    // Read the element using the set's current encoding
    return _intsetGetEncoded(is,pos,intrev32ifbe(is->encoding));
}
static int64_t _intsetGetEncoded(intset *is, int pos, uint8_t enc) {
    int64_t v64;
    // ...

    // Copy the bytes at the given position according to the encoding width and return the value
    if (enc == INTSET_ENC_INT64) {
        memcpy(&v64,((int64_t*)is->contents)+pos,sizeof(v64));
        memrev64ifbe(&v64);
        return v64;
    } else if (enc == INTSET_ENC_INT32) {
        // ...
        return v32;
    } else {
        // ...
    }
}

4. Insert

The steps of insertion are as follows:

  1. Check whether the encoding of the new element is larger than the set's encoding. If so, upgrade the set and insert the element.
  2. If no upgrade is needed, check whether the element already exists. If it does, return immediately.
  3. If the element does not exist, grow the set and insert the value at its sorted position (the elements after it are moved back by one).
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    // The encoding required by the value being inserted
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 1;

    // If the value needs a larger encoding than the set currently uses, the set must be upgraded
    if (valenc > intrev32ifbe(is->encoding)) {
        return intsetUpgradeAndAdd(is,value);
    } else {
        // First search for the element. If it already exists, return immediately
        if (intsetSearch(is,value,&pos)) {
            if (success) *success = 0;
            return is;
        }
		
        // Grow the allocation by one element
        is = intsetResize(is,intrev32ifbe(is->length)+1);
        // Move the memory block starting at pos back by one slot to make room for the new value
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }

    // Write the new value at position pos
    _intsetSet(is,pos,value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}

static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) {
    void *src, *dst;
    uint32_t bytes = intrev32ifbe(is->length)-from;
    uint32_t encoding = intrev32ifbe(is->encoding);

    if (encoding == INTSET_ENC_INT64) {
        src = (int64_t*)is->contents+from;
        dst = (int64_t*)is->contents+to;
        bytes *= sizeof(int64_t);
    } else if (encoding == INTSET_ENC_INT32) {
        // ...
    } else {
        // ...
    }
    memmove(dst,src,bytes);
}
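
The search, grow and shift steps of insertion can be tried in isolation. The following is a minimal sketch on a plain int32_t array with a fixed capacity, so it leaves out the encoding and reallocation handling of the real intset. The function name sorted_insert and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: insert v into the sorted, duplicate-free array a of
 * length *len (capacity cap). Returns 1 if inserted, 0 if already present. */
int sorted_insert(int32_t *a, uint32_t *len, uint32_t cap, int32_t v) {
    uint32_t lo = 0, hi = *len;
    /* Binary search for the first index whose element is >= v. */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    if (lo < *len && a[lo] == v) return 0;   /* duplicate: nothing to do */
    assert(*len < cap);                      /* caller must leave room */
    /* Shift the tail [lo, len) one slot to the right; memmove is safe
     * for overlapping source and destination. */
    memmove(a + lo + 1, a + lo, (*len - lo) * sizeof(a[0]));
    a[lo] = v;
    (*len)++;
    return 1;
}
```

As in intsetMoveTail, memmove is used rather than memcpy because the source and destination ranges overlap.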

5. Upgrade

When an element is inserted into an intset, its size is checked first to determine which encoding the element requires.
If that encoding is larger than the intset's current encoding (the widest encoding in the whole set), the set is upgraded before the element is added.

Upgrading an integer set and adding the new element takes three steps:

  1. Based on the encoding of the new element, expand the underlying array of the integer set and allocate space for the new element.
  2. Convert all existing elements of the underlying array to the same type as the new element and place each converted element in its correct position. The sorted order of the underlying array must be preserved while the elements are moved.
  3. Add the new element to the underlying array.
// Upgrade the set and insert the new value
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    // Current encoding
    uint8_t curenc = intrev32ifbe(is->encoding);
    // New encoding
    uint8_t newenc = _intsetValueEncoding(value);
    // Current number of elements
    int length = intrev32ifbe(is->length);
    // The encoding of value is larger than that of every existing element, so value is either larger than all of them or smaller than all of them.
    // If it is larger, it goes at the end of the array; if it is smaller (negative), it goes at the front
    int prepend = value < 0 ? 1 : 0;

    // Set the encoding field to the new encoding
    is->encoding = intrev32ifbe(newenc);
    // Grow the allocation to the size required by the new encoding; intsetResize is shown below
    is = intsetResize(is,intrev32ifbe(is->length)+1);

    // Move the existing values from back to front. Going front to back would not work: the array is widened in place, so writing a widened element would overwrite elements that have not been read yet
    while(length--)
        // _intsetGetEncoded(is,length,curenc) reads the value at that position using the old encoding
        // The +prepend offset leaves an empty slot at the front when value is the new minimum
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

    if (prepend)
        // value is the new minimum, so it goes into the empty first slot
        _intsetSet(is,0,value);
    else
        // value is the new maximum, so it goes into the last slot
        _intsetSet(is,intrev32ifbe(is->length),value);
    // Increase the length by 1
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}

// Reallocate the memory of the integer set
static intset *intsetResize(intset *is, uint32_t len) {
    // Compute the space needed for the elements based on the encoding
    uint32_t size = len*intrev32ifbe(is->encoding);
    // Reallocate the header plus the element array
    is = zrealloc(is,sizeof(intset)+size);
    return is;
}
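
The back-to-front conversion can be reproduced on a raw buffer. The sketch below widens int16_t elements to int32_t in place and then adds the new value, assuming the buffer has already been grown to hold n + 1 int32_t values. The function name upgrade16to32_and_add is hypothetical, and byte-order handling is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: buf holds n sorted int16_t values and has room for
 * (n + 1) int32_t values. Widen every element in place, then add v, which
 * is assumed to lie outside the int16_t range. */
void upgrade16to32_and_add(void *buf, int n, int32_t v) {
    assert(v < INT16_MIN || v > INT16_MAX);
    char *base = buf;
    int prepend = v < 0 ? 1 : 0;   /* a negative v becomes the new minimum */
    int i = n;
    /* Walk from the last element to the first: the widened element i lands
     * at byte offset 4*(i+prepend), which is never below the old offset 2*i,
     * so narrow elements that have not been read yet are not overwritten. */
    while (i--) {
        int16_t old;
        int32_t wide;
        memcpy(&old, base + (size_t)i * sizeof(old), sizeof(old));
        wide = old;
        memcpy(base + (size_t)(i + prepend) * sizeof(wide), &wide, sizeof(wide));
    }
    /* Put v in the free slot: index 0 when prepending, index n otherwise. */
    memcpy(base + (size_t)(prepend ? 0 : n) * sizeof(v), &v, sizeof(v));
}
```

Running the loop in the other direction would write the 4-byte version of element 0 over the 2-byte element 1 before it had been read.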

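6. Downgrade

intset does not support downgrading: once the set has been upgraded, it keeps the wider encoding even if the elements that required it are later removed.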

7. Deletion

The steps of deletion are as follows:

  1. Find the position pos of the value.
  2. Move the elements after pos forward by one, overwriting the element at pos.
  3. Shrink the allocation and decrease the length by one.
intset *intsetRemove(intset *is, int64_t value, int *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 0;

    // Find the position of the value
    if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,&pos)) {
        uint32_t len = intrev32ifbe(is->length);
        if (success) *success = 1;
        // Move the elements after the deleted position forward, overwriting the element at pos
        if (pos < (len-1)) intsetMoveTail(is,pos+1,pos);
        // Shrink the allocation
        is = intsetResize(is,len-1);
        is->length = intrev32ifbe(len-1);
    }
    return is;
}
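
Deletion is the mirror image of insertion. Here is a minimal sketch on a plain int32_t array, again ignoring encoding and reallocation. The function name sorted_remove is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: remove v from the sorted array a of length *len.
 * Returns 1 if v was found and removed, 0 otherwise. */
int sorted_remove(int32_t *a, uint32_t *len, int32_t v) {
    assert(a != NULL && len != NULL);
    uint32_t lo = 0, hi = *len;
    /* Binary search for the first index whose element is >= v. */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    if (lo == *len || a[lo] != v) return 0;   /* not present */
    /* Shift the tail (lo, len) one slot to the left over the removed item.
     * When the last element is removed, the byte count is zero. */
    memmove(a + lo, a + lo + 1, (*len - lo - 1) * sizeof(a[0]));
    (*len)--;
    return 1;
}
```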

Encoding 2: hashtable

The hashtable encoding uses a dictionary (dict) as its underlying implementation. dict was covered in detail in the earlier article "Redis design and implementation 4: Dictionary Dict", which includes a source-code walkthrough of the basic dict operations.

The following figure shows the set structure with four elements, "a", "b", "c" and "d":

Encoding conversion

A set object uses the intset encoding when it satisfies both of the following conditions:

  1. All elements are integers.
  2. The number of elements does not exceed 512 (configurable through set-max-intset-entries).

If either condition is not met, the hashtable encoding is used.
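
The two conditions can be written down as a small decision function. This is only a sketch of the rule as stated above, not the Redis implementation: the names set_encoding, looks_like_int64 and choose_set_encoding are hypothetical, and the strtoll-based integer check is an approximation that may be more lenient than the check Redis itself applies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { SET_ENC_INTSET, SET_ENC_HASHTABLE } set_encoding;

/* Approximate check: does s parse completely as a 64-bit integer? */
static bool looks_like_int64(const char *s) {
    char *end;
    errno = 0;
    (void)strtoll(s, &end, 10);
    return errno == 0 && end != s && *end == '\0';
}

/* Hypothetical helper: pick the encoding for a set with the given members.
 * max_intset_entries plays the role of set-max-intset-entries. */
set_encoding choose_set_encoding(const char **members, size_t count,
                                 size_t max_intset_entries) {
    assert(members != NULL || count == 0);
    if (count > max_intset_entries) return SET_ENC_HASHTABLE;
    for (size_t i = 0; i < count; i++)
        if (!looks_like_int64(members[i])) return SET_ENC_HASHTABLE;
    return SET_ENC_INTSET;
}
```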
