Performance optimization of a simple LRU eviction algorithm

Time:2020-10-29

Preface

Previous posts in the "Java implements Redis from scratch" series include:

  • How do we keep in-memory data from being lost across restarts?
  • Another way to implement the Redis expiration policy (part 5)

So far we have implemented several Redis features, including the first-in-first-out (FIFO) eviction strategy.

In practice, however, LRU / LFU is generally recommended.

LRU Basics

What is it?

LRU stands for Least Recently Used. It is an eviction algorithm widely used in caching.

When the cache reaches its capacity limit, part of the existing data must be evicted to keep the cache usable, and the LRU algorithm decides which entries to evict.

The basic idea of the LRU algorithm is temporal locality, a form of the principle of locality:

If an item has been accessed recently, it is likely to be accessed again in the near future.

Extended reading

  • Apache Commons LRUMap source code in detail
  • Using Redis as an LRU cache
  • Java implements Redis from scratch (series)

Simple implementation ideas

Array-based

Scheme: each entry carries an extra attribute, a timestamp, which is updated to the current time whenever the entry is accessed.

When the cache is full, the whole array is scanned and the entry with the smallest timestamp is evicted.

Disadvantages: extra space is needed to maintain the timestamps, and the whole array must be scanned on every eviction, so the time complexity is poor and the space overhead is not great either.
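To make the array-based scheme concrete, here is a minimal sketch. The class name ArrayLruSketch and the use of a logical clock instead of wall-clock timestamps are our own illustrative choices, not part of the project:

```java
//Illustrative sketch of the array-based scheme: every slot carries a
//timestamp (a logical clock here), updated on access; eviction scans
//the whole array for the smallest timestamp, which is O(n).
public class ArrayLruSketch {

    private final Object[] keys;
    private final long[] lastAccess;
    private long clock = 0;
    private int size = 0;

    public ArrayLruSketch(int capacity) {
        this.keys = new Object[capacity];
        this.lastAccess = new long[capacity];
    }

    /**
     * Access (or insert) a key.
     * @return the evicted key, or null if nothing was evicted
     */
    public Object access(Object key) {
        clock++;
        //O(n): linear search for the key
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) {
                lastAccess[i] = clock;
                return null;
            }
        }
        Object evicted = null;
        int slot = size;
        if (size == keys.length) {
            //O(n): scan for the slot with the smallest timestamp
            slot = 0;
            for (int i = 1; i < size; i++) {
                if (lastAccess[i] < lastAccess[slot]) {
                    slot = i;
                }
            }
            evicted = keys[slot];
        } else {
            size++;
        }
        keys[slot] = key;
        lastAccess[slot] = clock;
        return evicted;
    }
}
```

Both loops are linear scans, which is exactly the cost the schemes below try to avoid.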

Based on a bounded doubly linked list

Scheme: when a piece of data is accessed and it is not yet in the linked list, it is inserted at the head of the list; if it is already in the list, it is moved to the head. When the cache is full, the entry at the tail of the list is evicted.

Disadvantages: inserting or looking up data requires scanning the whole list.

This is the approach we implemented in the previous section, and its drawback is still obvious: every check for whether an element exists costs O(n).
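For reference, the list-only scheme can be sketched in a few lines on top of the JDK's LinkedList. The class name ListOnlyLru is a hypothetical example, not the project's actual class:

```java
import java.util.LinkedList;

//Sketch of the list-only scheme: move-to-front on access, evict from the
//tail when full. The remove() call scans the list, so every access is O(n).
public class ListOnlyLru<K> {

    private final int capacity;
    private final LinkedList<K> list = new LinkedList<>();

    public ListOnlyLru(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Access (or insert) a key.
     * @return the evicted key, or null if nothing was evicted
     */
    public K access(K key) {
        list.remove(key);   //O(n) scan -- the bottleneck this section removes
        list.addFirst(key); //most recently used entries live at the head
        return list.size() > capacity ? list.removeLast() : null;
    }
}
```

The move-to-front and tail-eviction steps are cheap; it is the membership scan that dominates.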

Based on a doubly linked list plus a hash table

Scheme: to fix the list-scanning defect, a hash table maps each key to its node in the linked list, reducing the time complexity of both insert and read operations from O(n) to O(1).

Disadvantages: this is the optimization idea mentioned at the end of the previous section, but it still has a cost: the space overhead roughly doubles.

The choice of data structure

(1) Array-based implementation

An array or ArrayList is not recommended here: reads are O(1), but updates are relatively slow, even though the JDK uses System.arraycopy under the hood.

(2) Linked-list-based implementation

If we choose a singly linked list, we cannot simply store each key's position in a HashMap, because reaching a node by position still means traversing the list, which is O(n). A doubly linked list can in theory halve the traversal, but that is still not the O(1) we want.

(3) Based on a doubly linked list

We keep the doubly linked list unchanged.

As the value for each key in the map, we store the corresponding node of the doubly linked list.

The implementation approach, then, is to write our own doubly linked list.

Code implementation

  • Node definition
/**
 *Doubly linked list node
 * @author binbin.hou
 * @since 0.0.12
 * @param <K> key
 * @param <V> value
 */
public class DoubleListNode<K,V> {

    /**
     *Key
     * @since 0.0.12
     */
    private K key;

    /**
     *Value
     * @since 0.0.12
     */
    private V value;

    /**
     *Previous node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> pre;

    /**
     *Next node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> next;

    //fluent get & set
}
  • Core code implementation

We keep the original interface unchanged and implement it as follows:

public class CacheEvictLruDoubleListMap<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLruDoubleListMap.class);


    /**
     *Head node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> head;

    /**
     *Tail node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> tail;

    /**
     *Map information
     *
     *Key: element information
     *Value: the node information corresponding to the element in the list
     * @since 0.0.12
     */
    private Map<K, DoubleListNode<K,V>> indexMap;

    public CacheEvictLruDoubleListMap() {
        this.indexMap = new HashMap<>();
        this.head = new DoubleListNode<>();
        this.tail = new DoubleListNode<>();

        this.head.next(this.tail);
        this.tail.pre(this.head);
    }

    @Override
    protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
        ICacheEntry<K, V> result = null;
        final ICache<K,V> cache = context.cache();
        //The cache is already at capacity: evict from the tail of the list
        if(cache.size() >= context.size()) {
            //Get the node just before the tail sentinel
            DoubleListNode<K,V> tailPre = this.tail.pre();
            if(tailPre == this.head) {
                log.error("The current list is empty and nothing can be evicted.");
                throw new CacheRuntimeException("Cannot delete the head node!");
            }

            K evictKey = tailPre.key();
            V evictValue = cache.remove(evictKey);
            result = new CacheEntry<>(evictKey, evictValue);
        }

        return result;
    }


    /**
     * Put in an element:
     *
     * (1) Delete the existing entry, if any
     * (2) Insert the new element at the head of the list
     *
     * @param key element key
     * @since 0.0.12
     */
    @Override
    public void update(final K key) {
        //1. Remove the element if it already exists
        this.remove(key);

        //2. Insert the new element at the head
        //Before: head<->next
        //After:  head<->new<->next
        DoubleListNode<K,V> newNode = new DoubleListNode<>();
        newNode.key(key);

        DoubleListNode<K,V> next = this.head.next();
        this.head.next(newNode);
        newNode.pre(this.head);
        next.pre(newNode);
        newNode.next(next);

        //2.2 insert into map
        indexMap.put(key, newNode);
    }

    /**
     * Remove an element:
     *
     * 1. Look up the node in the map
     * 2. If absent, return directly; otherwise:
     * 2.1 remove the node from the doubly linked list
     * 2.2 remove the entry from the map
     *
     * @param key element key
     * @since 0.0.12
     */
    @Override
    public void remove(final K key) {
        DoubleListNode<K,V> node = indexMap.get(key);

        if(ObjectUtil.isNull(node)) {
            return;
        }

        //Remove the list node
        // A<->B<->C
        //After deleting B this must become: A<->C
        DoubleListNode<K,V> pre = node.pre();
        DoubleListNode<K,V> next = node.next();

        pre.next(next);
        next.pre(pre);

        //Delete the corresponding information in the map
        this.indexMap.remove(key);
    }

}

The implementation is straightforward: it is just a simple hand-rolled doubly linked list.

With the help of the map, the time complexity of each operation drops to O(1).

Test

Let's verify our implementation:

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruDoubleListMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

//Access A
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 09:37:41.007] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Because we accessed A once, B became the least recently used element and was evicted.

Implementation based on LinkedHashMap

In fact, LinkedHashMap is itself a data structure that combines a linked list with a hash map, so we can implement the cache directly on top of the JDK's LinkedHashMap.

Direct implementation

public class LRUCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LRUCache(int capacity) {
        //Note: the accessOrder flag of LinkedHashMap is set to true
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        //Called after each insertion; evict the eldest entry once the size exceeds the capacity
        return size() > capacity;
    }
}

By default, LinkedHashMap does not evict data, so we override its removeEldestEntry() method: once the number of entries exceeds the preset capacity, the eldest entry is removed. accessOrder is set to true, which means entries are kept in access order.

The whole implementation is short; it mainly relies on the built-in features of LinkedHashMap.
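As a standalone illustration of the accessOrder flag this implementation relies on (a minimal demo, independent of the project's code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

//Demonstrates the accessOrder flag: with accessOrder=true, get() moves the
//entry to the end of the iteration order, so the first (eldest) entry is
//always the least recently used one.
public class AccessOrderDemo {

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("A", "1");
        map.put("B", "2");
        map.put("C", "3");

        map.get("A"); //touching A moves it to the tail of the order

        //B is now the eldest entry and would be evicted first
        System.out.println(map.keySet()); //prints [B, C, A]
    }
}
```

This is exactly why removeEldestEntry() can implement LRU eviction: the "eldest" entry under access order is the least recently used one.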

Simple transformation

We adapted this approach slightly to fit the interface we defined earlier; the adapted version is exposed as CacheEvicts.lruLinkedHashMap() and is exercised by the test below.

Test

  • code
ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruLinkedHashMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
//Access A
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 10:20:57.842] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Summary

This section basically solves the O(n) traversal problem of the array approach mentioned in the previous section.

However, the algorithm still has shortcomings: an occasional batch operation can squeeze hot data out of the cache with cold data. In the next section, we will learn how to improve the LRU algorithm further.

This article mainly describes the ideas; for reasons of space, not all of the implementation code is shown.

Open source address: https://github.com/houbb/cache

If you found this article helpful, feel free to leave a comment~

Your encouragement is my biggest motivation~

