Handwritten Redis in Java from scratch (VIII): performance optimization with a plain LRU eviction algorithm

Time: 2022-01-14

Preface

Handwritten Redis in Java from scratch (I): how to implement a fixed-size cache?

Handwritten Redis in Java from scratch (III): the principle of redis expire expiration

Handwritten Redis in Java from scratch (III): how to avoid losing in-memory data on restart?

Handwritten Redis in Java from scratch (IV): adding a listener

Handwritten Redis in Java from scratch (V): another implementation idea for the expiration strategy

Handwritten Redis in Java from scratch (VI): the AOF persistence principle explained and implemented

So far we have implemented several Redis features in a simple way. In Handwritten Redis in Java from scratch (I): how to implement a fixed-size cache?, we implemented a first-in-first-out (FIFO) eviction strategy.

In practice, however, an LRU or LFU eviction strategy is generally recommended.

LRU Basics

What is it?

LRU stands for Least Recently Used; the algorithm is widely applied in caching mechanisms.

When the cache reaches its space limit, some existing data must be evicted to keep the cache usable, and the LRU algorithm decides which data to evict.

The basic idea of LRU is temporal locality, a form of the locality principle:

If a piece of data is being accessed, it is likely to be accessed again in the near future.

Further reading

Apache Commons LRUMap source code explained

Using Redis as an LRU cache

Handwritten Redis in Java from scratch (VII): the Redis LRU eviction strategy explained and implemented

Simple implementation ideas

Array-based

Scheme: attach an extra timestamp attribute to each entry, and update it to the current time on every access.

When the space is full, scan the entire array and evict the entry with the smallest timestamp.

Disadvantages: maintaining timestamps requires extra space, and eviction requires scanning the whole array; the time complexity is poor, and the space overhead is not great either.
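To make the weakness concrete, here is a minimal, illustrative sketch of the array scheme (class and method names such as `ArrayLruSketch` are ours for illustration, not part of the cache project):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the array-based scheme: every entry carries a timestamp-like
 * counter, and eviction scans the whole array for the smallest one,
 * which is exactly the O(n) weakness described above.
 */
public class ArrayLruSketch {

    private static class Entry {
        final String key;
        long lastAccess; // refreshed on every access

        Entry(String key, long lastAccess) {
            this.key = key;
            this.lastAccess = lastAccess;
        }
    }

    private final Entry[] entries;
    private int size;
    private long clock; // logical clock instead of wall time, for determinism

    public ArrayLruSketch(int capacity) {
        this.entries = new Entry[capacity];
    }

    /** Access (or insert) a key: O(n) lookup, plus an O(n) eviction scan when full. */
    public void access(String key) {
        clock++;
        for (int i = 0; i < size; i++) {
            if (entries[i].key.equals(key)) {
                entries[i].lastAccess = clock; // hit: just refresh the timestamp
                return;
            }
        }
        if (size == entries.length) {
            // full: evict the entry with the smallest timestamp
            int oldest = 0;
            for (int i = 1; i < size; i++) {
                if (entries[i].lastAccess < entries[oldest].lastAccess) {
                    oldest = i;
                }
            }
            entries[oldest] = entries[size - 1]; // overwrite the victim with the last slot
            size--;
        }
        entries[size++] = new Entry(key, clock);
    }

    public List<String> keys() {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            keys.add(entries[i].key);
        }
        return keys;
    }
}
```

Both the lookup loop and the eviction loop walk the whole array, which is why this scheme is only a baseline.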

Based on a fixed-length doubly linked list

Scheme: when accessing data, if it is not in the linked list, insert it at the head; if it is already in the list, move it to the head. When the space is full, evict the data at the tail of the list.

Drawback: inserting or looking up data requires scanning the whole linked list.

This is the approach we implemented in the previous section. The drawback is still obvious: every check for whether an element exists costs O(n) query time.

Based on a doubly linked list plus a hash table

Scheme: to fix the list-scanning defect above, use a hash table to map each key to its node in the linked list, reducing the time complexity of insertion and lookup from O(n) to O(1).

Disadvantage: this is the optimization idea mentioned in the previous section, but it still has a cost: the space overhead roughly doubles.

Choosing a data structure

(1) Array-based implementation

An array or ArrayList is not recommended here: reads are O(1), but updates are relatively slow, even though the JDK uses System.arraycopy under the hood.

(2) Linked-list-based implementation

If we choose a linked list, we cannot simply store each key and its index in a HashMap.

Traversing a linked list is still O(n); a doubly linked list can halve that in theory, but it is not the O(1) we want.

(3) Doubly linked list plus HashMap

The doubly linked list itself stays unchanged.

The value stored in the map for each key becomes the corresponding node of the doubly linked list.

So the task boils down to implementing a doubly linked list.

Code implementation

  • Node definition
/**
 * Doubly linked list node
 * @author binbin.hou
 * @since 0.0.12
 * @param <K> key
 * @param <V> value
 */
public class DoubleListNode<K,V> {

    /**
     * Key
     * @since 0.0.12
     */
    private K key;

    /**
     * Value
     * @since 0.0.12
     */
    private V value;

    /**
     * Previous node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> pre;

    /**
     * Next node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> next;

    //fluent get & set
}
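The `//fluent get & set` comment elides the accessors. Judging from calls such as `head.next(newNode)` and `tailPre.key()` elsewhere in the article, they are presumably fluent-style (each setter returns `this` so calls can chain); a hypothetical reconstruction:

```java
// Hypothetical reconstruction of the elided "fluent get & set" accessors:
// getter and setter share the property name, and setters return this,
// matching usages like head.next(newNode) and tailPre.key().
public class DoubleListNode<K, V> {

    private K key;
    private V value;
    private DoubleListNode<K, V> pre;
    private DoubleListNode<K, V> next;

    public K key() { return key; }
    public DoubleListNode<K, V> key(K key) { this.key = key; return this; }

    public V value() { return value; }
    public DoubleListNode<K, V> value(V value) { this.value = value; return this; }

    public DoubleListNode<K, V> pre() { return pre; }
    public DoubleListNode<K, V> pre(DoubleListNode<K, V> pre) { this.pre = pre; return this; }

    public DoubleListNode<K, V> next() { return next; }
    public DoubleListNode<K, V> next(DoubleListNode<K, V> next) { this.next = next; return this; }
}
```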
  • Core code implementation

We keep the original interface unchanged, and the implementation is as follows:

public class CacheEvictLruDoubleListMap<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLruDoubleListMap.class);


    /**
     *Head node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> head;

    /**
     *Tail node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> tail;

    /**
     *Map information
     *
     *Key: element information
     *Value: the node information corresponding to the element in the list
     * @since 0.0.12
     */
    private Map<K, DoubleListNode<K,V>> indexMap;

    public CacheEvictLruDoubleListMap() {
        this.indexMap = new HashMap<>();
        this.head = new DoubleListNode<>();
        this.tail = new DoubleListNode<>();

        this.head.next(this.tail);
        this.tail.pre(this.head);
    }

    @Override
    protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
        ICacheEntry<K, V> result = null;
        final ICache<K,V> cache = context.cache();
        //If the limit is exceeded, remove the element at the end of the queue
        if(cache.size() >= context.size()) {
            //Gets the previous element of the tail node
            DoubleListNode<K,V> tailPre = this.tail.pre();
            if(tailPre == this.head) {
                log.error("The current list is empty; nothing can be evicted");
                throw new CacheRuntimeException("The head node cannot be deleted!");
            }

            K evictKey = tailPre.key();
            V evictValue = cache.remove(evictKey);
            result = new CacheEntry<>(evictKey, evictValue);
        }

        return result;
    }


    /**
     * Put an element
     *
     * (1) Delete it if it already exists
     * (2) Insert the new element at the head of the list
     *
     * @param key element key
     * @since 0.0.12
     */
    @Override
    public void update(final K key) {
        //1.  Execute delete
        this.remove(key);

        //2. Insert the new element at the head
        // head<->next
        // becomes: head<->new<->next
        DoubleListNode<K,V> newNode = new DoubleListNode<>();
        newNode.key(key);

        DoubleListNode<K,V> next = this.head.next();
        this.head.next(newNode);
        newNode.pre(this.head);
        next.pre(newNode);
        newNode.next(next);

        //2.2 inserting into map
        indexMap.put(key, newNode);
    }

    /**
     * Remove an element
     *
     * 1. Look up the element in the map
     * 2. If absent, return directly; otherwise:
     * 2.1 delete the node from the doubly linked list
     * 2.2 delete the entry from the map
     *
     * @param key element key
     * @since 0.0.12
     */
    @Override
    public void remove(final K key) {
        DoubleListNode<K,V> node = indexMap.get(key);

        if(ObjectUtil.isNull(node)) {
            return;
        }

        //Delete the list node
        // A<->B<->C
        // after deleting B: A<->C
        DoubleListNode<K,V> pre = node.pre();
        DoubleListNode<K,V> next = node.next();

        pre.next(next);
        next.pre(pre);

        //Delete corresponding information in map
        this.indexMap.remove(key);
    }

}

The implementation is not difficult; it is essentially a simplified doubly linked list.

Only node lookup relies on the map, which brings the time complexity down to O(1).

Test

Let’s verify our implementation:

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruDoubleListMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

//Visit A once
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 09:37:41.007] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Because A was accessed once, B became the least recently used element and was evicted.

Implementation based on LinkedHashMap

In fact, LinkedHashMap itself already combines a hash table with a linked list, so we can use the JDK's LinkedHashMap directly for the implementation.

Direct implementation

public class LRUCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LRUCache(int capacity) {
        //Note that accessOrder of LinkedHashMap is set to true here
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        //Evict the eldest entry once the capacity is exceeded
        return size() > capacity;
    }
}

A LinkedHashMap does not evict data by default, so we override its removeEldestEntry() method: once the number of entries exceeds the preset upper limit, the eldest entry is evicted. Setting accessOrder to true makes the map order its entries by access order.

The amount of code in the whole implementation is not large. It mainly applies the characteristics of LinkedHashMap.
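As a self-contained sketch of how this behaves (the class is reproduced locally so the demo runs on its own; with `accessOrder = true`, iteration goes from least to most recently accessed):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache on top of LinkedHashMap, mirroring the class above.
class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LRUCache(int capacity) {
        // accessOrder = true: iteration order follows access order,
        // least recently used first
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once we exceed the cap
        return size() > capacity;
    }
}

public class LruLinkedHashMapDemo {
    public static void main(String[] args) {
        LRUCache<String, String> cache = new LRUCache<>(3);
        cache.put("A", "hello");
        cache.put("B", "world");
        cache.put("C", "FIFO");
        cache.get("A");          // touch A, so B becomes the eldest entry
        cache.put("D", "LRU");   // capacity exceeded: B is evicted
        System.out.println(cache.keySet()); // prints [C, A, D]
    }
}
```

Note that both put and get count as an "access" in an access-ordered LinkedHashMap, which is exactly what LRU needs.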

Simple transformation

We simply adapt this approach so that it fits the interface we have defined.


Test

  • Code
ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruLinkedHashMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
//Visit A once
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 10:20:57.842] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Summary

The O(n) traversal problem of the array approach mentioned in the previous section has been essentially solved in this section.

However, this algorithm still has problems. For example, an occasional batch operation can push hot data out of the cache in favor of cold data. In the next section, we will learn how to improve the LRU algorithm further.

This article mainly describes the idea; for reasons of space, not all of the implementation is shown.

Open source address: https://github.com/houbb/cache

If you found this article helpful, feel free to like, comment, bookmark, and follow~

Your encouragement is my greatest motivation~
