Hand-written Redis in Java from scratch (11): the clock eviction algorithm and its implementation

Time:2020-10-24

preface

Java implements redis from zero handwriting?

Java realizes redis from zero handwriting

How to restart memory data without losing it?

Java realizes redis from zero handwriting

Another way to realize the expiration policy of redis (5) from zero handwriting in Java

Java realizes redis from zero handwriting

Java handwritten redis from scratch (7) detailed explanation of LRU cache elimination strategy

We have already implemented common eviction strategies such as FIFO, LRU, and LFU, but operating systems in practice tend to use the clock page replacement algorithm.

LRU performs very well, but it consumes more memory and is more cumbersome to implement.

The clock page replacement algorithm approximates LRU and can be regarded as an improvement on FIFO.

Clock page replacement algorithm

Why do we need the clock algorithm?

The performance of LRU is close to that of OPT, but it is hard to implement and expensive; FIFO is simple to implement, but performs poorly.

So operating system designers tried many algorithms that approach the performance of LRU at a relatively small cost. These algorithms are all variants of the clock algorithm.

The clock algorithm is also known as the not recently used (NRU) algorithm.

Basic ideas

Each page has an access bit. When a page is loaded into memory, the bit is initialized to 0. If the page is accessed (read / write), the hardware sets the bit to 1.

The pages are organized into a circular linked list (like a clock face), and a pointer points to the oldest page (the one that was loaded first).

When a page fault occurs, the page the pointer points to is examined. If its access bit is 0, it is evicted immediately. If the access bit is 1, the bit is reset to 0 and the pointer moves on to the next page. This repeats until a page is evicted, after which the pointer moves to the cell after it.

Personal doubts

(1) What if we go all the way around and find that every element has its bit set to 1?

If the first element is then selected by default, the algorithm falls back to plain FIFO.

(2) Access performance issues

The structure being traversed can be regarded as a circular linked list.

Content of each node:

K key;
boolean accessFlag;

Plain FIFO is very simple: elements are appended to a queue, and the oldest one is evicted.

If a linked list alone is used as the data structure, lookup and update are O(n), which is not good enough.

One workable scheme is to store each key together with its doubly linked list node in a HashMap.

Compared with the performance-optimized LRU, a node is not unlinked and moved on every update; only its flag bit is updated.

Simple clock algorithm

Each page is associated with a reference bit, which some sources call the use bit.

The main idea: when a page is loaded into main memory, its use bit is initialized to 0; when the page is accessed later, the use bit is set to 1.

For page replacement, the set of candidate frames can be regarded as a circular buffer with a pointer associated with it. When a page is replaced, the pointer moves to the next frame in the buffer.

If a page must be loaded when no frame is free and the use bits of all pages are 1, the pointer sweeps one full cycle around the buffer, clearing every use bit to 0, returns to its original position, and swaps out the page in that frame.

PS: in other words, when there is no free frame and every use bit is 1, all use bits are cleared to zero.

example

Take the following page replacement process as an example. The pages visited are: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.

There are four free frames in main memory, and each page is represented as (page number, use bit).

At the beginning, page 1 is requested. Since page 1 is not in main memory, a page fault occurs; a free frame is available, so the page is loaded and its use bit is set to 1 (this example sets the bit as soon as a page is loaded).

Similarly, pages 2, 3, and 4 each cause a page fault, are loaded into main memory, and get their use bits set to 1.

The following accesses to pages 1 and 2 need no handling, because both pages are already in main memory.

When page 5 is requested next, there is no free frame. The pointer sweeps the entire buffer, clearing the use bits of pages 1, 2, 3, and 4 to 0, and returns to its original position. Page 1 is replaced, page 5 is loaded in its place, and its use bit is set to 1.

Continuing in the same way, the clock algorithm incurs 10 page faults in total.
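The walkthrough above can be replayed in code. The following is a minimal sketch, not part of the cache project: the class SimpleClockDemo and the method countFaults are hypothetical names, and, like the example, the sketch assumes the use bit is set to 1 as soon as a page is loaded.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays the simple clock policy from the example (assumption: use bit = 1 on load). */
final class SimpleClockDemo {

    /** Returns the number of page faults for the given reference string. */
    static int countFaults(int[] references, int frameCount) {
        int[] pages = new int[frameCount];               // page held by each frame
        boolean[] useBit = new boolean[frameCount];      // use bit of each frame
        Map<Integer, Integer> frameOf = new HashMap<>(); // page -> frame index
        int loaded = 0; // number of frames filled so far
        int hand = 0;   // clock pointer
        int faults = 0;

        for (int page : references) {
            Integer frame = frameOf.get(page);
            if (frame != null) {
                useBit[frame] = true; // hit: only the flag changes
                continue;
            }
            faults++;
            if (loaded < frameCount) {
                // A free frame exists: fill it without moving the pointer.
                pages[loaded] = page;
                useBit[loaded] = true;
                frameOf.put(page, loaded);
                loaded++;
                continue;
            }
            // Sweep: clear use bits until a frame with use bit 0 is found.
            while (useBit[hand]) {
                useBit[hand] = false;
                hand = (hand + 1) % frameCount;
            }
            frameOf.remove(pages[hand]);
            pages[hand] = page;
            useBit[hand] = true;
            frameOf.put(page, hand);
            hand = (hand + 1) % frameCount;
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println(countFaults(refs, 4)); // prints 10
    }
}
```

If the use bit were instead initialized to 0 on load, as in the basic description, the same reference string would produce a different fault count, so the initial value of the bit is worth stating explicitly.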


GCLOCK (generalized clock page replacement algorithm)

Algorithm idea

This algorithm is a variant of clock.

Whereas clock uses a binary flag (0 or 1), the flag of GCLOCK is an integer, which in theory can grow without bound.

Working principle

(1) When the object to be cached is already in the cache, the value of its flag is increased by 1, and the pointer moves to the object after it.

(2) If it is not in the cache, check the flag of the object the pointer points to. If it is 0, that object is replaced by the object to be cached; otherwise, the flag is decreased by 1 and the pointer moves to the next object. This continues until an object is evicted. Since the flag may be greater than 1, the pointer may go around several times before evicting an object.

PS: this resembles a simplified LFU, in that it counts the number of accesses.
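The two rules above can be sketched as follows. This is only an illustration under stated assumptions: GClockCache and access are hypothetical names, a new object starts with counter 0 (the text does not say), and, as a simplification, the pointer stays in place on a hit instead of moving past the hit object.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal GCLOCK sketch that tracks keys only (hypothetical class). */
final class GClockCache<K> {
    private final Object[] keys;  // key stored in each slot
    private final int[] counters; // integer flag of each slot
    private final Map<K, Integer> slotOf = new HashMap<>();
    private int hand = 0; // clock pointer
    private int size = 0; // number of slots in use

    GClockCache(int capacity) {
        this.keys = new Object[capacity];
        this.counters = new int[capacity];
    }

    /** Records an access; returns the evicted key, or null if nothing was evicted. */
    @SuppressWarnings("unchecked")
    K access(K key) {
        Integer slot = slotOf.get(key);
        if (slot != null) {
            counters[slot]++; // rule (1): a hit increments the counter
            return null;
        }
        if (size < keys.length) {
            // Free slot available. Assumption: new objects start at 0.
            keys[size] = key;
            counters[size] = 0;
            slotOf.put(key, size);
            size++;
            return null;
        }
        // Rule (2): decrement non-zero counters until a zero is found.
        while (counters[hand] > 0) {
            counters[hand]--;
            hand = (hand + 1) % keys.length;
        }
        K evicted = (K) keys[hand];
        slotOf.remove(evicted);
        keys[hand] = key;
        counters[hand] = 0;
        slotOf.put(key, hand);
        hand = (hand + 1) % keys.length;
        return evicted;
    }

    public static void main(String[] args) {
        GClockCache<String> cache = new GClockCache<>(2);
        cache.access("A");
        cache.access("B");
        cache.access("A");
        cache.access("A"); // counter of A is now 2
        System.out.println(cache.access("C")); // prints B
        System.out.println(cache.access("D")); // prints C
        System.out.println(cache.access("E")); // prints A
    }
}
```

The usage in main shows the LFU-like effect: A, which was hit twice, survives two sweeps before it is finally evicted.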

WSClock (working set clock page replacement algorithm)

Algorithm idea

This algorithm is also a variant of clock, and may be the most widely used one in practice.

It adopts the principle of clock and is an enhanced version of the working set (WS) algorithm.

The data structure is a circular linked list. Each cached object stores its “last used time” RT and a “referenced” flag R, and the algorithm uses a periodic timer t. The age of an object is the difference between the current time and RT.

Working principle

(1) When the object to be cached is already in the cache, its RT is updated to the current time, and the pointer moves to the object after it.

(2) If it is not in the cache and the cache is not full, it is stored at the position the pointer points to, with RT set to the current time and R set to 1, and the pointer moves to the next object. If the cache is full, one object has to be evicted. Check the object the pointer points to:

  • If R is 1, the object is in the working set: reset R to 0 and move the pointer to the next object.
  • If R is 0 and the age is greater than t, the object is not in the working set: replace it, setting R to 1 and RT to the current time. If the age is not greater than t, keep looking for a victim. If the pointer returns to where it started without finding one, the first object with R = 0 is evicted.
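The eviction rules above might be sketched like this. Several points are assumptions rather than part of the description: WsClockCache and access are hypothetical names, the current time is passed in explicitly to keep the example deterministic, a hit also sets R to 1, the pointer stays in place on a hit, and if every object had R = 1 during the sweep the object at the starting position is evicted.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal WSClock sketch that tracks keys only (hypothetical class). */
final class WsClockCache<K> {
    private final Object[] keys;
    private final long[] lastUsed;      // RT of each slot
    private final boolean[] referenced; // R of each slot
    private final long tau;             // age threshold t
    private final Map<K, Integer> slotOf = new HashMap<>();
    private int hand = 0;
    private int size = 0;

    WsClockCache(int capacity, long tau) {
        this.keys = new Object[capacity];
        this.lastUsed = new long[capacity];
        this.referenced = new boolean[capacity];
        this.tau = tau;
    }

    /** Records an access at time now; returns the evicted key or null. */
    @SuppressWarnings("unchecked")
    K access(K key, long now) {
        Integer slot = slotOf.get(key);
        if (slot != null) {
            lastUsed[slot] = now;
            referenced[slot] = true; // assumption: a hit also sets R
            return null;
        }
        if (size < keys.length) {
            store(size++, key, now);
            return null;
        }
        int victim = -1;
        int firstUnreferenced = -1;
        for (int step = 0; step < keys.length; step++) {
            int candidate = (hand + step) % keys.length;
            if (referenced[candidate]) {
                referenced[candidate] = false; // in the working set
            } else if (now - lastUsed[candidate] > tau) {
                victim = candidate;            // old and unreferenced
                break;
            } else if (firstUnreferenced < 0) {
                firstUnreferenced = candidate; // fallback victim
            }
        }
        if (victim < 0) {
            // Full sweep without an old object: use the fallback.
            victim = firstUnreferenced >= 0 ? firstUnreferenced : hand;
        }
        K evicted = (K) keys[victim];
        slotOf.remove(evicted);
        store(victim, key, now);
        hand = (victim + 1) % keys.length;
        return evicted;
    }

    private void store(int slot, K key, long now) {
        keys[slot] = key;
        lastUsed[slot] = now;
        referenced[slot] = true;
        slotOf.put(key, slot);
    }

    public static void main(String[] args) {
        WsClockCache<String> cache = new WsClockCache<>(2, 10);
        cache.access("A", 0);
        cache.access("B", 1);
        System.out.println(cache.access("C", 2));  // prints A
        System.out.println(cache.access("D", 20)); // prints B
    }
}
```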

Second chance method (or enhanced clock)

Improved clock algorithm

Idea: reduce the cost of handling page faults

Modify the clock algorithm so that dirty pages always survive one sweep of the clock hand, and use both the dirty bit and the use bit to guide replacement.

Algorithm flow

In addition to the use bit, the earlier clock algorithm gains a modified bit, which some sources call the dirty bit.

Each page now has two status bits, (use bit, modified bit), which gives the following four cases:

(0,0): neither used nor modified recently; the best candidate for replacement!

(0,1): modified but not used recently; it has to be written back before it is replaced

(1,0): used but not modified; likely to be used again in the next round

(1,1): used and modified; the last choice for replacement in the next round

example

Take the following page replacement process as an example:

The visited pages are: 0, 1, 3, 6, 2, 4, 5, 2, 5, 0, 3, 1, 2, 5, 4, 1, 0. In the original figure, red numbers mark the accesses that modify a page, that is, set its modified bit to 1; those pages are shown in italics, together with their use bits and modified bits. The “fault?” row indicates how many times a frame had to be searched for when a page fault occurred.


Replacement order

  1. Starting from the current pointer position, look for a page in main memory whose (use bit, modified bit) is (0,0);
  2. If step 1 finds no such page, look for a page whose status is (0,1);
  3. If none is found either, the pointer returns to its original position and the use bits of all pages in the set are cleared to 0. Repeat step 1 and, if necessary, step 2; this is guaranteed to find a page to replace.
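The three steps above translate almost directly into code. The sketch below follows them literally; EnhancedClockDemo and selectVictim are hypothetical names, and the two boolean arrays stand in for the per-frame use and modified bits.

```java
import java.util.Arrays;

/** Victim selection for the enhanced clock, following the three steps literally. */
final class EnhancedClockDemo {

    /**
     * Returns the index of the frame to replace, scanning from hand.
     * Side effect: clears all use bits if step 3 is reached.
     */
    static int selectVictim(boolean[] used, boolean[] dirty, int hand) {
        int n = used.length;
        for (int round = 0; round < 2; round++) {
            // Step 1: look for (use, modified) == (0, 0).
            for (int i = 0; i < n; i++) {
                int frame = (hand + i) % n;
                if (!used[frame] && !dirty[frame]) {
                    return frame;
                }
            }
            // Step 2: look for (0, 1).
            for (int i = 0; i < n; i++) {
                int frame = (hand + i) % n;
                if (!used[frame] && dirty[frame]) {
                    return frame;
                }
            }
            // Step 3: clear every use bit, then repeat steps 1 and 2.
            Arrays.fill(used, false);
        }
        // After step 3 every frame is (0,0) or (0,1), so this is only
        // reached when there are no frames at all.
        throw new IllegalStateException("no frames to choose from");
    }

    public static void main(String[] args) {
        boolean[] used = {true, true, false};
        boolean[] dirty = {false, true, true};
        System.out.println(selectVictim(used, dirty, 0)); // prints 2
    }
}
```

Preferring (0,0) over (0,1) is what saves work: a clean page can be dropped without a write-back, which is the cost this variant tries to avoid.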

Implementation of clock algorithm in Java

Notes

This article implements a simple version of the clock algorithm and adds a performance optimization to the conventional implementation. (This may be exclusive on the web, or at least the first implementation of its kind.)

The optimization is about performance. As with the earlier LRU optimization, lookup is improved from O(n) to O(1).

Implementation ideas

We define a circular linked list tailored to the current business scenario (it could later be extracted into a standalone data structure project for easier reuse, when time permits).

Define a node that contains accessFlag.

We use a doubly linked list instead of a singly linked one, so that deletion performs best.

A map stores the keys, which avoids traversing the whole linked list to determine whether a key exists, trading space for time.

OK, next comes the happy coding phase.

Code implementation

Node definition

/**
 * Circular linked list node
 * @author binbin.hou
 * @since 0.0.15
 * @param <K> key
 * @param <V> value
 */
public class CircleListNode<K,V> {

    /**
     * Key
     * @since 0.0.15
     */
    private K key;

    /**
     * Value
     * @since 0.0.15
     */
    private V value = null;

    /**
     * Whether the node has been accessed
     * @since 0.0.15
     */
    private boolean accessFlag = false;

    /**
     * Previous node
     * @since 0.0.15
     */
    private CircleListNode<K, V> pre;

    /**
     * Next node
     * @since 0.0.15
     */
    private CircleListNode<K, V> next;

    // Constructor taking the key, plus fluent getters & setters (e.g. pre() / pre(node)), omitted
}

The node has just a few simple fields: key, value, accessFlag (whether the node has been accessed), and next and pre, which are used to implement the doubly linked list.

Implementation of the doubly linked list

Basic attributes

To stay consistent with the original LRU doubly linked list, we implement the same interface.

import java.util.HashMap;
import java.util.Map;

public class LruMapCircleList<K,V> implements ILruMap<K,V> {

    private static final Log log = LogFactory.getLog(LruMapCircleList.class);

    /**
     * Head node (sentinel)
     * @since 0.0.15
     */
    private CircleListNode<K,V> head;

    /**
     * Map from key to its list node
     * @since 0.0.15
     */
    private Map<K, CircleListNode<K,V>> indexMap;

    public LruMapCircleList() {
        // Circular doubly linked list: the empty list is a head that points to itself
        this.head = new CircleListNode<>(null);
        this.head.next(this.head);
        this.head.pre(this.head);

        indexMap = new HashMap<>();
    }

}

The head node is initialized, and indexMap is used to store the mapping from each key to its doubly linked list node.

Delete element

/**
 * Remove an element
 *
 * 1. If the key does not exist, ignore it
 * 2. If it exists, remove it from both the linked list and the map
 *
 * head==>1==>2==>head
 *
 * After deleting 2:
 * head==>1==>head
 * @param key element
 * @since 0.0.15
 */
@Override
public void removeKey(final K key) {
    CircleListNode<K,V> node = indexMap.get(key);
    if(ObjectUtil.isNull(node)) {
        log.warn("No node found for the key to remove: {}", key);
        return;
    }
    CircleListNode<K,V> pre = node.pre();
    CircleListNode<K,V> next = node.next();
    // 1 --> (x2) --> 3: unlink 2 directly
    pre.next(next);
    next.pre(pre);
    indexMap.remove(key);
    log.debug("Key: {} is removed from the circular list", key);
}

Deleting a node is not difficult: unlink the node from the circular linked list and remove its entry from indexMap at the same time.

Update

put and get share the same update method here. To implement the enhanced clock algorithm, it would be better to distinguish the two. The principle is similar, though, so it is not implemented here. This is probably the last article on eviction algorithms.

/**
 * Put in an element
 *
 * Similar to FIFO, it is placed directly at the end of the queue
 *
 * head==>1==>head
 * After adding an element:
 *
 * head==>1==>2==>head
 *
 * (1) If the element does not exist, it is inserted directly.
 * The default accessFlag is 0;
 * (2) If it already exists, update accessFlag = 1;
 *
 * @param key element
 * @since 0.0.15
 */
@Override
public void updateKey(final K key) {
    CircleListNode<K,V> node = indexMap.get(key);
    // The node exists
    if(ObjectUtil.isNotNull(node)) {
        node.accessFlag(true);
        log.debug("Node already exists, set its access flag to true, key: {}", key);
    } else {
        // If it does not exist, insert it at the tail
        node = new CircleListNode<>(key);
        CircleListNode<K,V> tail = head.pre();
        tail.next(node);
        node.pre(tail);
        node.next(head);
        head.pre(node);
        // Put it into indexMap for fast lookup
        indexMap.put(key, node);
        log.debug("Node does not exist, add node to linked list: {}", key);
    }
}

Here we distinguish whether the node already exists.

(1) If it already exists, fetch the node and set accessFlag = true;

(2) If it does not exist, insert a new node with accessFlag = false.

Evicting data

/**
 * Remove the eldest element
 *
 * (1) Traverse from head.next. If an element has accessFlag = 0, remove it directly
 * (2) If accessFlag = 1, set it to 0 and move on to the next node.
 *
 * @return the evicted entry
 * @since 0.0.15
 */
@Override
public ICacheEntry<K, V> removeEldest() {
    // fast-fail
    if(isEmpty()) {
        log.error("The current list is empty and nothing can be removed");
        throw new CacheRuntimeException("Cannot remove the head node!");
    }
    // Start from the oldest element, head.next; remembering the last position is a possible future optimization
    CircleListNode<K,V> node = this.head;
    while (node.next() != this.head) {
        // Next element
        node = node.next();
        if(!node.accessFlag()) {
            // Not accessed: evict directly
            K key = node.key();
            this.removeKey(key);
            return CacheEntry.of(key, node.value());
        } else {
            // Reset accessFlag to 0 and continue with the next node
            node.accessFlag(false);
        }
    }
    // Nothing found in one pass: fall back to the first element (FIFO).
    // Unlink it as well, so that it does not stay behind in the list and the map.
    CircleListNode<K,V> firstNode = this.head.next();
    this.removeKey(firstNode.key());
    return CacheEntry.of(firstNode.key(), firstNode.value());
}

Traverse the nodes and evict the first one whose accessFlag is 0.

If accessFlag is 1, set it to 0 and move on to the next node. (It is a bit like a pardon that can only be used once.)

If the loop finds nothing, we simply take head.next, which degrades to FIFO. Of course, since every accessFlag has just been reset to 0, we could also simply continue the loop.

  • Deficiencies in the implementation

There is one thing to improve: we should not scan from the head every time. The drawback is obvious: the element that entered the queue first is bound to be evicted the second time it is scanned, while unaccessed elements further back may survive indefinitely. A field can remember the scan position (the node after the last evicted one), which is closer to the idea of the clock algorithm.

Another option is not to reset accessFlag to 0 for accessed elements, and to degrade to FIFO when the loop finds no victim. However, once most elements have been accessed, performance gets worse. Remembering the position of the last scan is therefore the recommended approach.
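The suggested improvement, remembering where the last scan stopped, might look like the following standalone sketch. ClockKeyRing, update, remove, and evict are hypothetical names chosen for illustration; the class does not implement the project's ILruMap interface and only tracks keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Circular key list with a clock hand that persists between evictions. */
final class ClockKeyRing<K> {

    private static final class Node<K> {
        final K key;
        boolean accessFlag;
        Node<K> pre;
        Node<K> next;

        Node(K key) {
            this.key = key;
        }
    }

    private final Node<K> head = new Node<>(null); // sentinel
    private final Map<K, Node<K>> index = new HashMap<>();
    private Node<K> hand = head; // next eviction candidate

    ClockKeyRing() {
        head.pre = head;
        head.next = head;
    }

    /** Inserts a new key at the tail, or marks an existing key as accessed. */
    void update(K key) {
        Node<K> node = index.get(key);
        if (node != null) {
            node.accessFlag = true;
            return;
        }
        node = new Node<>(key);
        Node<K> tail = head.pre;
        tail.next = node;
        node.pre = tail;
        node.next = head;
        head.pre = node;
        index.put(key, node);
    }

    /** Unlinks a key; the hand moves on if it pointed at that node. */
    void remove(K key) {
        Node<K> node = index.remove(key);
        if (node == null) {
            return;
        }
        if (hand == node) {
            hand = node.next;
        }
        node.pre.next = node.next;
        node.next.pre = node.pre;
    }

    /** Evicts the first unflagged key at or after the hand. */
    K evict() {
        if (index.isEmpty()) {
            throw new IllegalStateException("nothing to evict");
        }
        while (true) {
            if (hand == head) {
                hand = hand.next; // skip the sentinel
            } else if (hand.accessFlag) {
                hand.accessFlag = false; // second chance
                hand = hand.next;
            } else {
                K victim = hand.key;
                remove(victim); // also advances the hand
                return victim;
            }
        }
    }

    public static void main(String[] args) {
        ClockKeyRing<String> ring = new ClockKeyRing<>();
        ring.update("A");
        ring.update("B");
        ring.update("C");
        ring.update("A"); // A is accessed
        System.out.println(ring.evict()); // prints B
        ring.update("D");
        System.out.println(ring.evict()); // prints C
    }
}
```

In the second eviction, a scan restarting from the head would pick A, whose flag was cleared during the first scan; with the remembered hand the scan resumes at C instead, so A keeps its second chance for a full lap.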

Invocation

When the cache is full, we call into the circular list:

import com.github.houbb.cache.api.ICache;
import com.github.houbb.cache.api.ICacheEntry;
import com.github.houbb.cache.api.ICacheEvictContext;
import com.github.houbb.cache.core.model.CacheEntry;
import com.github.houbb.cache.core.support.struct.lru.ILruMap;
import com.github.houbb.cache.core.support.struct.lru.impl.LruMapCircleList;
import com.github.houbb.log.integration.core.Log;
import com.github.houbb.log.integration.core.LogFactory;

/**
 *Elimination strategy clock algorithm
 *
 * @author binbin.hou
 * @since 0.0.15
 */
public class CacheEvictClock<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictClock.class);

    /**
     *Circular linked list
     * @since 0.0.15
     */
    private final ILruMap<K,V> circleList;

    public CacheEvictClock() {
        this.circleList = new LruMapCircleList<>();
    }

    @Override
    protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
        ICacheEntry<K, V> result = null;
        final ICache<K,V> cache = context.cache();
        // Evict an element once the size limit is reached
        if(cache.size() >= context.size()) {
            ICacheEntry<K,V> evictEntry = circleList.removeEldest();
            // Remove the evicted key from the cache
            final K evictKey = evictEntry.key();
            V evictValue = cache.remove(evictKey);

            log.debug("Evict key: {}, value: {} based on the clock algorithm", evictKey, evictValue);
            result = new CacheEntry<>(evictKey, evictValue);
        }

        return result;
    }


    /**
     *Update information
     *@ param key element
     * @since 0.0.15
     */
    @Override
    public void updateKey(final K key) {
        this.circleList.updateKey(key);
    }

    /**
     *Remove element
     *
     *@ param key element
     * @since 0.0.15
     */
    @Override
    public void removeKey(final K key) {
        this.circleList.removeKey(key);
    }

}

The calling code is not difficult: it simply delegates to the methods implemented above.

test

OK, after the code is written, let’s simply verify it.

Test code

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>clock())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
// Access A
cache.get("A");
cache.put("D", "LRU");
Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());

Log

[DEBUG] [2020-10-07 11:32:55.396] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - node does not exist, add node to linked list: A
[DEBUG] [2020-10-07 11:32:55.398] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - node does not exist, add node to linked list: B
[DEBUG] [2020-10-07 11:32:55.401] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - node does not exist, add node to linked list: C
[DEBUG] [2020-10-07 11:32:55.403] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - the node already exists. Set the node access ID to true, key: A
[DEBUG] [2020-10-07 11:32:55.404] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.removeKey] - key: B is removed from the circular list
[DEBUG] [2020-10-07 11:32:55.406] [main] [c.g.h.c.c.s.e.CacheEvictClock.doEvict] - eliminate key: B, value: world based on clock algorithm
[DEBUG] [2020-10-07 11:32:55.410] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[DEBUG] [2020-10-07 11:32:55.411] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - node does not exist, add node to linked list: D
[D, A, C]

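This is in line with our expectations: A was accessed, so its flag spared it, and B, the oldest key that had not been accessed, was evicted.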

Comparison of LRU, FIFO and clock

LRU and FIFO are both first-in-first-out at heart. LRU orders pages by their most recent access time, so the order has to be adjusted dynamically on every page access (the most recent access time of a page keeps changing); FIFO orders pages by the time they entered memory, which is fixed, so the order of the pages is fixed as well.

If the program exhibits locality, LRU works well. If none of the pages in memory are accessed again, LRU degenerates to FIFO (if a page is never accessed after entering memory, its most recent access time equals the time it entered memory).

LRU performs better but has a higher system cost; FIFO has a lower system cost, but may suffer from Belady's anomaly.

Therefore, the best trade-off is the clock algorithm. On a page access it does not adjust the page's position in the linked list; it only sets a mark, and defers the work until a page fault, when the page is moved to the end of the list.

For pages in memory that have not been accessed, clock performs as well as LRU; for pages that have been accessed, it cannot remember their exact access order the way LRU does.

Replacement algorithm supplement

We have now described the common replacement algorithms.

However, these algorithms have many variants, and different scenarios call for yet other algorithms. The ones below are added for completeness, without detailed explanation or a corresponding implementation.

The goal is to round out the overall picture of eviction algorithms.

Optimal replacement algorithm (OPT)

The page chosen for eviction by the optimal (OPT) replacement algorithm is one that will never be used again, or that will not be accessed for the longest time in the future, which guarantees the lowest page fault rate.

However, it is impossible to know in advance which of the pages in memory will go unaccessed the longest.

The optimal replacement algorithm can still be used to evaluate other algorithms. Suppose the system allocates three physical blocks to a process, and consider the following page reference string:

7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

When the process runs, pages 7, 0, and 1 are loaded into memory in turn.

When the process accesses page 2, a page fault occurs. Under the optimal replacement algorithm, page 7 is evicted, because it is not needed again until the 18th access.

Next, page 0 is accessed; it is already in memory, so no page fault occurs. When page 3 is accessed, page 1 is evicted under the optimal replacement algorithm, and so on, as shown in Figure 3-26.

The figure shows the state of memory at each step under the optimal replacement algorithm.

As you can see, there are 9 page faults and 6 page replacements.


Of course, this is a purely theoretical algorithm that cannot be realized in practice, because we cannot predict how later data will be accessed.
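Offline, when the whole reference string is known in advance, OPT is easy to simulate, which is exactly how it serves as a benchmark. The sketch below replays the reference string from this section; OptDemo, countFaults, and nextUse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Offline simulation of the optimal (OPT) replacement algorithm. */
final class OptDemo {

    /** Returns the number of page faults for the given reference string. */
    static int countFaults(int[] refs, int frameCount) {
        List<Integer> frames = new ArrayList<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            if (frames.contains(refs[i])) {
                continue; // hit
            }
            faults++;
            if (frames.size() < frameCount) {
                frames.add(refs[i]); // a free frame is available
                continue;
            }
            // Evict the page whose next use lies farthest in the future.
            int victim = 0;
            int farthest = -1;
            for (int f = 0; f < frames.size(); f++) {
                int next = nextUse(refs, i + 1, frames.get(f));
                if (next > farthest) {
                    farthest = next;
                    victim = f;
                }
            }
            frames.set(victim, refs[i]);
        }
        return faults;
    }

    /** Index of the next reference to page at or after from, or MAX_VALUE if none. */
    static int nextUse(int[] refs, int from, int page) {
        for (int j = from; j < refs.length; j++) {
            if (refs[j] == page) {
                return j;
            }
        }
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        System.out.println(countFaults(refs, 3)); // prints 9
    }
}
```

The 9 faults consist of the 3 initial loads plus 6 replacements, matching the numbers above. The simulation only works because nextUse can look into the future of the reference string, which a real cache cannot do.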

PBA: page buffering algorithm

Although the LRU and clock replacement algorithms are better than FIFO, both need some hardware support and incur more overhead. Moreover, replacing a modified page costs more than replacing an unmodified one.

The page buffering algorithm (PBA) improves the performance of a paging system while using a simple replacement strategy.

The VAX / VMS operating system uses the page buffering algorithm. It adopts variable allocation with local replacement, and uses FIFO as the replacement algorithm.

The algorithm puts each evicted page into one of two linked lists: if the page has not been modified, it goes onto the free page list; otherwise, it goes onto the modified page list. Note that the page is not physically moved in memory at this point; only its entry in the page table is moved to one of the two lists.

The free page list is really a list of free physical blocks, each of which can be loaded with program code or data. When a page needs to be read in, the first physical block in the free list is used to hold it. When an unmodified page is to be swapped out, it is not actually removed from memory; the physical block it occupies is appended to the end of the free page list.

Similarly, when a modified page is replaced, its physical block is appended to the end of the modified page list. This way, both modified and unmodified pages remain in memory. If the process accesses one of these pages again, returning it to the process’s resident set costs very little. When the number of modified pages reaches a certain value, for example 64, they are written back to disk together, which significantly reduces the number of disk I / O operations.

The Mach operating system implements a simpler page buffering algorithm that does not distinguish between modified and unmodified pages.

Comparison of replacement algorithms

Algorithm                              Notes
Optimal algorithm                      Not achievable, but can be used as a benchmark
NRU (not recently used) algorithm      Rough approximation of LRU
FIFO algorithm                         Important (frequently used) pages may be discarded
Second chance algorithm                Much better than FIFO
Clock algorithm                        Practical
LRU (least recently used) algorithm    Excellent, but hard to implement
NFU (not frequently used) algorithm    Approximation of LRU
Aging algorithm                        Very close to LRU
Working set algorithm                  Costly to implement
Working set clock algorithm            Good and efficient algorithm

Summary

The clock algorithm is a trade-off, and it is what operating systems choose in practice.

The advantage of clock is that we do not have to update an element’s position on every access; we only need to update it once, at eviction time. Even though the doubly linked list optimization makes each LRU update O(1), that work is still wasted.

That basically wraps up the cache eviction algorithms. Thank you for your support, and I hope you got something out of it.

Open source address: https://github.com/houbb/cache

If you found this article helpful, you are welcome to comment on it. Your encouragement is my biggest motivation~

What did you take away from it? If you have more ideas, feel free to discuss them with me in the comments; I look forward to hearing your thoughts.
