Handwriting Redis in Java (x): the LFU (Least Frequently Used) cache eviction algorithm

Time: 2020-10-26

Preface

Earlier installments in the "handwriting Redis in Java from scratch" series:

  • Java handwriting Redis from scratch
  • How to restart without losing in-memory data?
  • Another way to implement the Redis expiration policy (5)
  • Performance optimization of the simple LRU eviction algorithm

In this section we will study another common cache eviction algorithm: LFU, least frequently used.

Basic knowledge of LFU

Concept

LFU (Least Frequently Used) evicts the data that has been used least frequently in the recent past.

LRU is time based: it evicts whatever has gone longest without access, keeping recently accessed items at the head of its list for algorithmic efficiency. LFU is frequency based: it evicts whatever is accessed least often.

Since it is frequency based, LFU has to store an access count for every entry.

In terms of storage, LFU therefore needs some extra space to hold the counts compared with LRU.

Core idea

If a piece of data has been used infrequently in the recent past, it is unlikely to be used in the future.

Implementation ideas

O(n) deletion

To evict the least-used entry, one's first instinct is a plain HashMap<String, Integer>: the String is the key, and the Integer is the access count.

Both set and get are then O(1); deletion, however, is troublesome, because finding the minimum count requires traversing and comparing every entry, which is O(n).
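A minimal sketch of this naive approach (the class and method names are my own illustration, not the project's API):

import java.util.HashMap;
import java.util.Map;

// Naive LFU: O(1) put/get, O(n) eviction via a full scan of the counts.
public class NaiveLfu<K, V> {

    private final Map<K, V> data = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    public void put(K key, V value) {
        data.put(key, value);
        counts.merge(key, 1, Integer::sum); // O(1)
    }

    public V get(K key) {
        if (data.containsKey(key)) {
            counts.merge(key, 1, Integer::sum); // O(1)
        }
        return data.get(key);
    }

    public K evict() { // O(n): scan every entry for the minimum count
        K minKey = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<K, Integer> e : counts.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                minKey = e.getKey();
            }
        }
        if (minKey != null) {
            data.remove(minKey);
            counts.remove(minKey);
        }
        return minKey;
    }
}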

O(log n) deletion

Alternatively, a small top heap (min heap) combined with a HashMap supports insertion and deletion in O(log n) time, which is more efficient than the first approach. In Java, a TreeMap keyed by frequency gives the same bound.
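For example, a TreeMap-based sketch (again my own illustration): the red-black tree keeps the frequency buckets ordered, so the minimum bucket is reachable in O(log n).

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.TreeMap;

// LFU index with O(log n) operations: a TreeMap orders the frequency buckets.
public class TreeMapLfuIndex<K> {

    private final Map<K, Integer> counts = new HashMap<>();
    private final TreeMap<Integer, LinkedHashSet<K>> buckets = new TreeMap<>();

    public void touch(K key) { // O(log n): move the key up one frequency bucket
        int freq = counts.getOrDefault(key, 0);
        if (freq > 0) {
            LinkedHashSet<K> old = buckets.get(freq);
            old.remove(key);
            if (old.isEmpty()) {
                buckets.remove(freq);
            }
        }
        counts.put(key, freq + 1);
        buckets.computeIfAbsent(freq + 1, f -> new LinkedHashSet<>()).add(key);
    }

    public K evictMin() { // O(log n): the first bucket holds the minimum frequency
        Map.Entry<Integer, LinkedHashSet<K>> min = buckets.firstEntry();
        if (min == null) {
            return null;
        }
        K victim = min.getValue().iterator().next(); // FIFO within the bucket
        min.getValue().remove(victim);
        if (min.getValue().isEmpty()) {
            buckets.remove(min.getKey());
        }
        counts.remove(victim);
        return victim;
    }
}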

O(1) deletion

Can this be optimized further?

In fact there is an O(1) algorithm; see this paper:

An O(1) algorithm for implementing the LFU cache eviction scheme

To put it simply, my personal take:

To get O(1) operations we cannot do without hashing. In the O(n) approach above, put/get are already O(1); what is slow is deletion, because finding the entry with the fewest accesses requires comparing them all.

private Map<K, Node> map;                            // key -> node mapping
private Map<Integer, LinkedHashSet<Node>> freqMap;   // frequency -> linked list of nodes with that frequency

class Node {
    K key;
    V value;
    int frequency = 1;
}

A double hash basically solves the problem.

map stores the mapping from key to node, so put/get stay O(1).

The node mapped from a key carries its frequency; nodes with the same frequency are tied together through freqMap, so the linked list for any given frequency can be fetched directly.

Deletion is also simple. A new entry always enters with frequency 1, so instead of looping over frequencies from 1 up to n, we can track the current minimum frequency minFreq and delete the first element of the linked list at that frequency.

As for the eviction order within a single list, FIFO works, or whatever policy you like.
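To make the access path concrete, here is a compact sketch of the O(1) get under this double-hash design, assuming the map/freqMap fields and Node class above plus an int minFreq field (a simplification of my own; the full implementation follows later in this article):

// O(1) access path: one hash lookup, then move the node between two buckets.
public V get(K key) {
    Node node = map.get(key);                        // O(1) hash lookup
    if (node == null) {
        return null;
    }
    LinkedHashSet<Node> oldSet = freqMap.get(node.frequency);
    oldSet.remove(node);                             // unlink from the old bucket, O(1)
    if (oldSet.isEmpty() && minFreq == node.frequency) {
        minFreq++;                                   // the old minimum bucket drained
    }
    node.frequency++;
    freqMap.computeIfAbsent(node.frequency, f -> new LinkedHashSet<>())
           .add(node);                               // link into the new bucket, O(1)
    return node.value;
}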

Introduction to the core content of the paper

"Stones from other hills may serve to polish jade."

Let's read through the O(1) paper before implementing the code ourselves.

Introduction

The structure of the paper is as follows:

  • A description of LFU's use cases, where it can be shown to be superior to other cache eviction algorithms
  • The dictionary operations an LFU cache should support, which are the operations that determine the runtime complexity of the strategy
  • A description of the best-known LFU algorithm and its runtime complexity
  • A proposed LFU algorithm with O(1) runtime complexity for each operation

Purpose of LFU

Consider a caching network proxy application for the HTTP protocol.

Such a proxy typically sits between the Internet and a user or a group of users.

It ensures that all users can access the Internet while enabling the sharing of shareable resources, for optimal network utilization and response speed.

A caching proxy like this should try to maximize the amount of data it can cache within the limited storage or memory at its disposal.

Typically, static resources (such as images, CSS stylesheets, and JavaScript code) are easy to cache for a long time before they are replaced by newer versions.

These static resources, or what programmers call "assets", appear on almost every page, so caching them pays off most because they are needed by almost every request.

Furthermore, since a network proxy is required to handle thousands of requests per second, the overhead of doing so should be kept to a minimum.

To that end, it should evict only resources that are not used frequently.

Frequently used resources should therefore be retained at the expense of less frequently used ones, since the former have proved themselves useful over a period of time.

There is of course a counter-argument, that resources used heavily so far may not be needed in the future, but we find that is not the case in most situations.

For example, the static resources of a frequently used page are requested by every user of that page.

So an LFU cache, which evicts the least frequently used entries, can serve as the replacement strategy when the cache is full.

LRU might also seem a suitable strategy here, but LRU breaks down when the request pattern cycles through a set of items slightly larger than the cache, so that none of the requested items stays cached.

PS: cyclic requests over the data are exactly the scenario LRU cannot adapt to.

Under LRU, items would constantly enter and leave the cache, with no user request ever being served from the cache.

Under the same conditions, however, the LFU algorithm performs much better, with most of the cached items producing cache hits.
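A tiny self-contained Java demo of that LRU failure mode (my own, not from the paper): with capacity 3 and the keys A, B, C, D requested cyclically, every single access misses under LRU.

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCycleDemo {
    public static void main(String[] args) {
        // Access-order LinkedHashMap with capacity 3 behaves as an LRU cache.
        Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > 3;
            }
        };
        String[] keys = {"A", "B", "C", "D"};
        int misses = 0;
        for (int round = 0; round < 3; round++) {
            for (String k : keys) {
                if (lru.get(k) == null) {   // miss: the key was already evicted
                    misses++;
                    lru.put(k, k);
                }
            }
        }
        System.out.println("misses = " + misses);   // prints 12: every access missed
    }
}

Under LFU, keys that had already accumulated higher counts would stay cached in this scenario, which is the paper's point.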

Pathological behavior of the LFU algorithm is not impossible either.

We are not trying to make the case for LFU here; rather, we are trying to show that if LFU is an applicable strategy, there is a better implementation than the previously published ones.

Dictionary operations supported by the LFU cache

When we talk about cache eviction algorithms, we mainly need three different operations on the cached data (captured as a small interface sketch after this list).

  1. Set (or insert) an item in the cache
  2. Retrieve (or look up) an item in the cache, incrementing its usage count (for LFU)
  3. Evict (or delete) the least used item from the cache (as dictated by the eviction algorithm's policy)
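These three operations can be captured in a minimal Java interface (the names are my own, not from the paper):

// The three dictionary operations an LFU cache must support.
public interface LfuCache<K, V> {

    /**
     * Set (insert) an item in the cache.
     */
    void set(K key, V value);

    /**
     * Look up an item, incrementing its usage count.
     */
    V get(K key);

    /**
     * Evict and return the least frequently used key.
     */
    K evict();
}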

Best-known complexity of the LFU algorithm

At the time of writing, the best-known runtimes for each of these operations on an LFU cache eviction strategy were as follows:

Insert: O(log n)

Lookup: O(log n)

Delete: O(log n)

These complexity values follow directly from a binomial-heap implementation and a standard collision-free hash table.

An LFU caching strategy is easy to implement reasonably well with a min-heap data structure and a hash map.

The min heap is ordered by the items' usage counts, and the hash table is indexed by the elements' keys.

All operations on a collision-free hash table are O(1), so the runtime of the LFU cache is dominated by the operations on the min heap.

When an element is inserted into the cache, it enters with a usage count of 1. Since a min-heap insert costs O(log n), inserting into the LFU cache takes O(log n) time.

When an element is looked up, it is found through a hash function that maps the key to the element. At the same time its usage count (its key within the min heap) is incremented by 1, which reorganizes the min heap and moves the element away from the root.

Since the element can sink as many as log(n) levels at any stage, this operation also takes O(log n) time.

When an element is selected for eviction and finally removed from the heap, it can cause significant reorganization of the heap data structure.

The element with the smallest usage count sits at the root of the min heap.

Removing the root of the min heap means replacing it with the last leaf node and bubbling that node down to its correct position.

The runtime complexity of this operation is likewise O(log n).
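As a quick illustration in Java (my own, not from the paper): java.util.PriorityQueue provides exactly the O(log n) insert and delete-min described above. Note, though, that bumping an arbitrary element's count requires remove(), which is O(n) in PriorityQueue; the paper's O(log n) lookup assumes a heap that tracks element positions.

import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapLfuDemo {

    // Heap entry: a key plus its usage count.
    static final class Entry {
        final String key;
        int count;
        Entry(String key, int count) { this.key = key; this.count = count; }
    }

    public static void main(String[] args) {
        // Min heap ordered by usage count.
        PriorityQueue<Entry> heap =
                new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.count));
        heap.add(new Entry("A", 3));    // O(log n) insert
        heap.add(new Entry("B", 1));    // O(log n) insert
        heap.add(new Entry("C", 2));    // O(log n) insert
        Entry victim = heap.poll();     // O(log n) delete-min
        System.out.println(victim.key); // prints B, the least used
    }
}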

The proposed LFU algorithm

For each dictionary operation (insert, lookup, delete) that can be performed on an LFU cache, the proposed LFU algorithm has O(1) runtime complexity.

This is achieved by maintaining two kinds of linked lists: one over the access frequencies, and, per frequency, one over all elements sharing that access frequency.

A hash table provides access to elements by key (omitted from the figure below for clarity).

A doubly linked list links together nodes that each represent a set of elements with the same access frequency (shown as rectangular blocks in the figure below).

We call this doubly linked list the frequency list. The set of elements with the same access frequency is itself a doubly linked list of nodes (shown as circular nodes in the figure below).

We refer to this (local, per-frequency) doubly linked list as a node list.

Each node in a node list has a pointer to its parent node in the frequency list (omitted from the figure for clarity). So nodes x and y have a pointer back to frequency node 1, nodes z and a have a pointer back to frequency node 2, and so on.

(Figure: the frequency list with its per-frequency node lists, as described above.)
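A sketch of these two structures in Java terms (my own reconstruction of the paper's design; the implementation later in this article simplifies the node list to a LinkedHashSet):

import java.util.LinkedHashSet;
import java.util.Set;

// Frequency-list node: one per distinct access frequency, doubly linked.
class FreqListNode<K> {
    int frequency;
    FreqListNode<K> prev;
    FreqListNode<K> next;
    // The paper's pseudo code also allows a set here instead of a second
    // doubly linked list; either way every operation on it stays O(1).
    final Set<K> items = new LinkedHashSet<>();

    FreqListNode(int frequency) {
        this.frequency = frequency;
    }
}

// Per-element record: the parent pointer leads back into the frequency
// list, so the element can be unlinked from its bucket in O(1).
class ItemNode<K, V> {
    final K key;
    V value;
    FreqListNode<K> parent;

    ItemNode(K key, V value) {
        this.key = key;
        this.value = value;
    }
}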

The following pseudo code shows how the LFU cache is initialized.

The hash table used to locate elements by key is represented by the variable bykey.

To simplify the implementation, the pseudo code uses a set in place of the linked list for storing elements with the same access frequency.

The variable items is a standard set data structure holding the keys of the elements that share an access frequency.

Its insert, lookup and delete runtime complexity is O(1).

(Figure: pseudo code for initializing the LFU cache.)

Pseudo code

The rest of the paper is pseudo code; this article will land the idea in concrete code instead.

Understanding the core idea is what matters, so let's move on to the real code.

Thoughts

The core of the O(1) algorithm is actually not much code; it would probably rate as a medium-difficulty LeetCode problem.

Strangely, though, the paper only appeared in 2010. Was O(log n) considered the limit before then?

Java code implementation

Basic attributes

public class CacheEvictLfu<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLfu.class);

    /**
     * Key mapping information
     * @since 0.0.14
     */
    private final Map<K, FreqNode<K,V>> keyMap;

    /**
     * Frequency map
     * @since 0.0.14
     */
    private final Map<Integer, LinkedHashSet<FreqNode<K,V>>> freqMap;

    /**
     * Minimum frequency
     * @since 0.0.14
     */
    private int minFreq;

    public CacheEvictLfu() {
        this.keyMap = new HashMap<>();
        this.freqMap = new HashMap<>();
        this.minFreq = 1;
    }

}

Node definition

  • FreqNode.java
public class FreqNode<K,V> {

    /**
     * Key
     * @since 0.0.14
     */
    private K key;

    /**
     * Value
     * @since 0.0.14
     */
    private V value = null;

    /**
     * Frequency
     * @since 0.0.14
     */
    private int frequency = 1;

    public FreqNode(K key) {
        this.key = key;
    }

    //fluent getter & setter
    // toString() equals() hashCode()
}

Removing Elements

/**
 * Remove an element.
 *
 * 1. Remove it from freqMap
 * 2. Remove it from keyMap
 * 3. Update the minFreq information
 *
 * @param key key of the element
 * @since 0.0.14
 */
@Override
public void removeKey(final K key) {
    FreqNode<K,V> freqNode = this.keyMap.remove(key);
    //1. Get the frequency of the key
    int freq = freqNode.frequency();
    LinkedHashSet<FreqNode<K,V>> set = this.freqMap.get(freq);
    //2. Remove the node from its frequency bucket
    set.remove(freqNode);
    log.debug("freq={} removed element node: {}", freq, freqNode);
    //3. Update minFreq
    if(CollectionUtil.isEmpty(set) && minFreq == freq) {
        minFreq--;
        log.debug("minFreq decreased to: {}", minFreq);
    }
}

Update element

/**
 * Update an element, refreshing the minFreq information.
 * @param key key of the element
 * @since 0.0.14
 */
@Override
public void updateKey(final K key) {
    FreqNode<K,V> freqNode = keyMap.get(key);
    //1. It already exists
    if(ObjectUtil.isNotNull(freqNode)) {
        //1.1 remove the original node information
        int frequency = freqNode.frequency();
        LinkedHashSet<FreqNode<K,V>> oldSet = freqMap.get(frequency);
        oldSet.remove(freqNode);
        //1.2 update minimum data frequency
        if (minFreq == frequency && oldSet.isEmpty()) {
            minFreq++;
            log.debug("minFreq increased to: {}", minFreq);
        }
        //1.3 update frequency information
        frequency++;
        freqNode.frequency(frequency);
        //1.4 put in new set
        this.addToFreqMap(frequency, freqNode);
    } else {
        //2. It doesn't exist
        //2.1 building new elements
        FreqNode<K,V> newNode = new FreqNode<>(key);
        //2.2 add it to the list with frequency 1
        this.addToFreqMap(1, newNode);
        //2.3 update minfreq information
        this.minFreq = 1;
        //2.4 add to keymap
        this.keyMap.put(key, newNode);
    }
}

/**
 * Add the node to the frequency map.
 * @param frequency frequency
 * @param freqNode node
 */
private void addToFreqMap(final int frequency, FreqNode<K,V> freqNode) {
    LinkedHashSet<FreqNode<K,V>> set = freqMap.get(frequency);
    if (set == null) {
        set = new LinkedHashSet<>();
    }
    set.add(freqNode);
    freqMap.put(frequency, set);
    log.debug ("freq = {} add element node: {}", frequency, freqnode);
}

Data eviction

@Override
protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
    ICacheEntry<K, V> result = null;
    final ICache<K,V> cache = context.cache();
    //Remove the element with the lowest frequency if the limit is exceeded
    if(cache.size() >= context.size()) {
        FreqNode<K,V> evictNode = this.getMinFreqNode();
        K evictKey = evictNode.key();
        V evictValue = cache.remove(evictKey);
        log.debug ("eliminate the minimum frequency information, key: {}, value: {}, freq: {}"),
                evictKey, evictValue, evictNode.frequency());
        result = new CacheEntry<>(evictKey, evictValue);
    }
    return result;
}

/**
 * Get the node with the minimum frequency.
 *
 * @return result
 * @since 0.0.14
 */
private FreqNode<K, V> getMinFreqNode() {
    LinkedHashSet<FreqNode<K,V>> set = freqMap.get(minFreq);
    if(CollectionUtil.isNotEmpty(set)) {
        return set.iterator().next();
    }
    throw new CacheRuntimeException("No key with minimum frequency found");
}

Test

Code

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lfu())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
//Access A
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());

Log

[DEBUG] [2020-10-03 21:23:43.722] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1 added element node: FreqNode{key=A, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.723] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1 added element node: FreqNode{key=B, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.725] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1 added element node: FreqNode{key=C, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.727] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=2 added element node: FreqNode{key=A, value=null, frequency=2}
[DEBUG] [2020-10-03 21:23:43.728] [main] [c.g.h.c.c.s.e.CacheEvictLfu.doEvict] - evicting the min-frequency entry, key: B, value: world, freq: 1
[DEBUG] [2020-10-03 21:23:43.731] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[DEBUG] [2020-10-03 21:23:43.732] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1 added element node: FreqNode{key=D, value=null, frequency=1}
[D, A, C]

LFU vs LRU

Difference

LFU evicts based on access frequency, while LRU evicts based on access recency.

Advantages

When data access frequencies follow a normal distribution, the LFU algorithm achieves a higher cache hit rate than LRU.

Disadvantages

  • LFU is more complex than LRU.
  • The access frequency of every entry must be maintained and updated on each access.
  • Early data accumulates counts more easily than later data, making it hard for newer data to get cached.
  • Newly added data is evicted easily, causing "jitter" at the tail of the cache.

Summary

In practice, however, LFU's application scenarios are not that broad.

Real-world access patterns are skewed and hot data is the norm, so LRU generally performs better than LFU.

Open source address: https://github.com/houbb/cache

If you found this article helpful, please like and comment. Your encouragement is my greatest motivation~

We have now implemented both the LRU and LFU algorithms with good performance, yet neither is quite what operating systems actually use. In the next section we will learn about the clock eviction algorithm favored by operating systems.

What did you take away from this? Or do you have other ideas? Feel free to discuss with me in the comments; I look forward to your thoughts.
