Redis data skew and JD open source hotkey source code analysis

Time: 2022-09-22

1 Introduction
A colleague next to me asked me about hot data. After walking him through a few Redis data skew cases, I also reviewed the methodology for handling hot data and recalled the JD open source project hotkey that I studied last year, a framework dedicated to solving hot data problems. Combining the knowledge points of the two, this article uses a few small diagrams and brief explanations to walk through the relevant methodology and a source code analysis of hotkey.

2 Redis data skew
2.1 Definition and Hazards
Let's start with the definition of data skew, borrowing the explanation from the Baidu encyclopedia entry:

In a cluster system, the cache is generally distributed, that is, different nodes are responsible for different ranges of cached data. When cache data is not dispersed enough and a large amount of it ends up concentrated on one or a few service nodes, we call it data skew. Generally speaking, data skew is caused by a poorly performing load balancing implementation.

As the definition above shows, data skew is usually caused by ineffective load balancing, which leaves data heavily concentrated on a few nodes.

So what harm does it do?

If data skew occurs, the instances holding large amounts of data or hotspot data come under increased pressure and slow down; they may even exhaust their memory and crash. This is exactly what we want to avoid when using a sharded cluster.

2.2 Classification of data skew
2.2.1 Data volume skew (write skew)

1. Diagram

As shown in the figure, in some cases, the data distribution on the instances is not balanced, and there is a particularly large amount of data on a certain instance.

2. bigkey causes skew

A bigkey happens to be stored on one instance. Either its value is very large (String type) or it holds a large number of collection elements (collection types); either way, the amount of data on that instance grows, and its memory consumption grows with it.

Solution

When generating data in the business layer, try to avoid storing too much data in a single key-value pair.
If the bigkey happens to be a collection type, another option is to split it into many smaller collection keys and store them on different instances, as sketched below.
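As a rough illustration of the collection-splitting idea, here is a minimal Java sketch. The key naming scheme ("fans:shard:i") and the shard count are illustrative assumptions, not a rule from any particular project.

public class BigKeySplitSketch {

    //Map a member of a big collection to one of N smaller keys; the same member always maps to the same shard
    static String shardKey(String bigKey, String member, int shards) {
        int slot = Math.floorMod(member.hashCode(), shards);
        return bigKey + ":shard:" + slot;   // e.g. fans:shard:7
    }

    //Write side:  SADD shardKey("fans", userId, 16) userId
    //Read side:   iterate fans:shard:0 .. fans:shard:15 and merge the results
}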
3. Uneven slot allocation causes skew

Let's briefly introduce the concept of a slot. The full name is Hash Slot. A Redis Cluster has 16384 hash slots, which act like data partitions: each key is mapped to a hash slot. The Redis Cluster scheme uses hash slots to handle the mapping between data and instances.

The following diagram shows the mapping among data, hash slots, and instances.

The CRC16(key1) % 16384 here can be simply understood as taking the CRC16 hash of key1 and then taking it modulo the number of slots. The result is slot 14484, whose corresponding instance is the third node.

When building a sharded cluster, operations staff need to allocate the hash slots manually, and all 16384 slots must be allocated, otherwise the Redis cluster will not work properly. Because the allocation is manual, some instances may be assigned too many slots, which leads to data skew.

Solution

Use the CLUSTER SLOTS command to check the slot allocation.

Depending on the slot allocation, use the CLUSTER SETSLOT, CLUSTER GETKEYSINSLOT, and MIGRATE commands to migrate slot data. The details are not covered here; interested readers can study them on their own.

4. Hash Tag causes skew

Hash Tag definition: when a key contains {}, the whole key is not hashed; only the string inside the {} is hashed.
Suppose the hash algorithm is sha1. For user:{user1}:ids and user:{user1}:tweets, the hash value equals sha1(user1).
Hash Tag advantage: if different keys share the same Hash Tag content, the data for those keys is mapped to the same slot and therefore assigned to the same instance.
Hash Tag disadvantage: if used carelessly, a large amount of data may be concentrated on one instance, causing data skew and an unbalanced load in the cluster. A small sketch of the slot calculation, including the Hash Tag rule, follows.
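To make the slot calculation concrete, here is a minimal Java sketch covering both the CRC16-modulo mapping and the Hash Tag rule. The crc16 below is the CCITT/XMODEM variant that Redis Cluster uses, written bitwise for brevity; treat it as an illustration rather than a drop-in replacement for a client library's implementation.

import java.nio.charset.StandardCharsets;

public class SlotCalcSketch {

    //CRC16 (CCITT/XMODEM): polynomial 0x1021, initial value 0x0000
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    //Hash Tag rule: if the key contains a non-empty {...} section, only that part is hashed
    static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        //user:{user1}:ids and user:{user1}:tweets land in the same slot because both hash only "user1"
        System.out.println(slotFor("user:{user1}:ids") == slotFor("user:{user1}:tweets")); // true
    }
}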
2.2.2 Data access skew (read skew, the hot key problem)

Generally speaking, data access skew is caused by the hot key problem, and how to deal with Redis hot keys is a frequent interview question, so understanding the related concepts and methodology is essential.

1. Diagram

As shown in the figure, although the amount of data on each cluster instance is not much different, the data on a certain instance is hot data and is accessed very frequently.

But why is hot data generated in the first place?

2. Causes and hazards of hot keys

1) The data consumed by users is far larger than the data produced (hot-selling products, hot news, hot comments, celebrity live streams).

In unexpected, sudden events in daily life and work, for example when popular products are discounted during Double Eleven and a single product is viewed or purchased tens of thousands of times in a short period, a large spike in requests forms and causes a hotspot problem.

Similarly, hot news, hot comments, celebrity live streams and other content that is published once but browsed massively, typical read-heavy and write-light scenarios, also produce hotspot issues.

2) The requests concentrated on a shard exceed the performance limit of a single server.

When the server reads data, the data is usually sharded. During this process a given key is accessed on one particular host server, and when the access volume exceeds that server's limit, the hot key problem arises.

If hotspots are too concentrated and the hot keys' cache volume exceeds the current cache capacity, the cache shard service will be overwhelmed. Once the cache service crashes, the incoming requests fall through to the backend DB. Since the DB itself has weaker performance, large volumes of requests easily penetrate it, which can further lead to an avalanche and seriously degrade overall performance.

3. Common solutions to the hot key problem:

Solution 1: Backup hot key

You can make multiple copies of the hot data and append a different random suffix to the key of each copy, so that the copies are not mapped to the same slot.

This is equivalent to copying the data onto other instances, and the same kind of random suffix is appended at access time, so the access pressure on one instance is spread evenly across the others.

For example, when writing to the cache, we split the cache key of the corresponding business into multiple different keys. As shown in the figure below, on the cache-update side we first split the key into N parts. For a key named "good_100", for instance, we can split it into four keys: "good_100_copy1", "good_100_copy2", "good_100_copy3", "good_100_copy4". Every update or insert has to touch all N of these keys. This step is the key splitting.

On the read side, we need to find a way to spread the access traffic evenly across the copies.

How do we add a suffix to the hot key we are about to access? There are several options: hash the local machine's IP or MAC address and take it modulo the number of split keys, which decides what suffix is appended and therefore which instance is hit; or generate a random number at service startup and take it modulo the number of split keys.

The pseudo code is as follows:

const M = N * 2
//generate a random number
random = GenRandom(0, M)
//construct the backup key
bakHotKey = hotKey + "_" + random
data = redis.GET(bakHotKey)
if data == NULL {
  data = GetFromDB()
  //write back with a slightly randomized expiration so the copies do not all expire at once
  redis.SET(bakHotKey, data, expireTime + GenRandom(0,5))
}
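On the write side, all N backup keys have to be refreshed together. Below is a minimal Java sketch following the good_100_copyN naming from the example above; redisSet(key, value, ttlSeconds) is a stand-in for whatever Redis client call is actually in use, not a specific library API.

import java.util.concurrent.ThreadLocalRandom;

public class HotKeyBackupWriter {

    //Refresh every backup copy of a hot key when the business data changes
    public void updateCopies(String hotKey, String value, int copies, int baseTtlSeconds) {
        for (int i = 1; i <= copies; i++) {
            String bakHotKey = hotKey + "_copy" + i;   // e.g. good_100_copy1 ... good_100_copyN
            //stagger the expiration slightly so the copies do not all expire at the same moment
            int ttl = baseTtlSeconds + ThreadLocalRandom.current().nextInt(0, 5);
            redisSet(bakHotKey, value, ttl);
        }
    }

    private void redisSet(String key, String value, int ttlSeconds) {
        //delegate to the concrete Redis client (Jedis, Lettuce, ...) used by the project
    }
}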

Solution 2: Local cache + dynamic calculation to automatically discover hotspot cache

This solution solves the hot key problem by actively discovering hotspots and caching them. First, the Client accesses the SLB, which distributes the requests to the Proxy layer, and the Proxy forwards each request to the backend Redis based on routing.

The hot key solution here is to add a cache on the server side. Specifically, a local cache is added on the Proxy; it uses the LRU algorithm to cache hot data, while the backend node adds a hotspot-calculation module that feeds the hot data back.

The main advantages of the Proxy architecture are as follows:

Proxy local cache hotspot, read capability can be scaled horizontally
DB node regularly calculates hot data collection
DB feedback proxy hotspot data
Completely transparent to the client, no need to do any compatibility
Discovery and storage of hotspot data

For hotspot discovery, requests for each key are first counted within a cycle; once a key's request volume reaches the threshold, it is identified as a hot key and placed into a small LRU list. Then, when a request arrives through the Proxy, if Redis finds that the accessed key is a hotspot, it enters a feedback stage and marks the data.

An etcd or zk cluster can be used to store the fed-back hotspot data; all local nodes then listen for this hotspot data and load it into their local JVM cache.

Acquisition of hotspot data

The handling of hot keys covers both writing and reading. On the write path, when SLB receives data K1 and writes it to Redis through a Proxy, the write is complete.

If the backend hotspot module then determines that K1 is a hot key, the Proxy caches it locally, and the next time the client accesses K1 the request no longer needs to go to Redis.

Finally, since the proxy can be expanded horizontally, the access capability of hotspot data can be enhanced arbitrarily.

The most mature solution: JD's open source hotkey. This is a relatively mature solution for automatic hot key detection with distributed consistent caching. The principle is to observe on the client side and report candidate hot keys; once the server (worker) side detects a hot key, it pushes the key to the corresponding application servers for local caching, while keeping the local cache consistent with the remote cache.

We won't go into details here. The third part of this article: JD open source hotkey source code analysis will lead you to understand its overall principle.

3 JD open source hotkey – automatic detection of hot keys, distributed consistent cache solution
3.1 Solve Pain Points
As can be seen from the above, the hot key problem occurs more frequently in systems with high concurrency (especially when doing seckill activities), and the harm to the system is also great.

So what is hotkey for? What pain points does it address? And how does it work?

Here is a summary taken from the project itself: for any sudden hot data that cannot be anticipated in advance, including but not limited to hot keys (such as a burst of requests for the same product), hot users (such as malicious crawlers), and hot interfaces (a sudden flood of requests to the same interface), it can detect them accurately within milliseconds. The detected hot data, hot users and so on are then pushed into the JVM memory of every application server, which greatly reduces the impact on the backend data storage layer, and users can decide how to use these hot keys (for example, local caching for hot products, denying access for hot users, circuit-breaking hot interfaces, or returning default values). The hot data stays consistent across the whole server cluster, and applications are isolated from each other.

Core function: hot data detection and push to each server in the cluster

3.2 Integration method
The integration method will not be described in detail here, and interested students can search by themselves.

3.3 Source code analysis
3.3.1 Introduction to Architecture

1. Panorama diagram

Process introduction:

By referencing the hotkey client package, the client reports its own information to the worker at startup and establishes a long connection with it, and it periodically pulls the rule information and worker cluster information from the configuration center.
The client calls hotkey's isHotKey() method, which first matches the rules and then counts whether the key is hot.
The client uploads the collected hot key data to the worker nodes through a scheduled task.
After the worker cluster receives all the data for a key (a hash decides which worker a key is uploaded to, so the same key always lands on the same worker node), it matches the data against the defined rules to decide whether the key is hot; if so, it pushes the key to the clients, which cache it locally.
2. Role composition

Here is a direct borrowing from the author's description:

1) etcd cluster: as a high-performance configuration center, etcd provides efficient watch and subscription services with minimal resource usage. It is mainly used to store the rule configuration, the IP addresses of all workers, the detected hot keys and the manually added hot keys.

2) The client-side jar package is the reference jar added to the service. After introduction, it is possible to judge whether a key is a hot key in a convenient way. At the same time, the jar completes key reporting, monitoring rule changes in etcd, worker information changes, hot key changes, and local caffeine caching for hot keys.

3) Worker-side cluster The worker-side is an independently deployed Java program. After startup, it will connect to etcd and regularly report its own IP information for the client to obtain the address and make a long connection. After that, the main thing is to accumulate and calculate the key to be tested sent by each client. When the rule threshold set in etcd is reached, the hot key is pushed to each client.

4) Dashboard console The console is a Java program with a visual interface, which is also connected to etcd, and then sets the key rules of each APP in the console, such as 20 times in 2 seconds to calculate the heat. Then, when the worker detects the hot key, it will send the key to etcd, and the dashboard will also monitor the hot key information and store it in the database to save the record. At the same time, the dashboard can also manually add and delete hot keys for each client to monitor.

3. Hotkey project structure

3.3.2 Client side

The source code is mainly analyzed from the following three aspects:

1. Client Launcher

1) Startup method

@PostConstruct
public void init() {
    ClientStarter.Builder builder = new ClientStarter.Builder();
    ClientStarter starter = builder.setAppName(appName).setEtcdServer(etcd).build();
    starter.startPipeline();
}

appName: the name of the application, usually the value of ${spring.application.name}; all subsequent configuration is keyed by it.

etcd: the address of the etcd cluster (the configuration center), comma separated when there are multiple nodes.

You can also see that ClientStarter uses the builder pattern, which makes the code more readable.
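For context, here is a hedged sketch of how this init() might be wired up in a Spring application; the property name hotkey.etcd.server is an assumption for illustration, not something defined by the project.

import javax.annotation.PostConstruct;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HotKeyClientConfig {

    @Value("${spring.application.name}")
    private String appName;

    //hypothetical property holding the comma-separated etcd cluster address
    @Value("${hotkey.etcd.server}")
    private String etcd;

    @PostConstruct
    public void init() {
        ClientStarter.Builder builder = new ClientStarter.Builder();
        ClientStarter starter = builder.setAppName(appName).setEtcdServer(etcd).build();
        starter.startPipeline();
    }
}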

2) Core entry
com.jd.platform.hotkey.client.ClientStarter#startPipeline

/**
 * Start listening etcd
 */
public void startPipeline() {
    JdLogger.info(getClass(), "etcdServer:" + etcdServer);
    //Set the maximum capacity of caffeine
    Context.CAFFEINE_SIZE = caffeineSize;

    //Set etcd address
    EtcdConfigFactory.buildConfigCenter(etcdServer);
    //Start timing push
    PushSchedulerStarter.startPusher(pushPeriod);
    PushSchedulerStarter.startCountPusher(10);
    //Open the worker reconnector
    WorkerRetryConnector.retryConnectWorkers();

    registEventBus();

    EtcdStarter starter = new EtcdStarter();
    //Monitors related to etcd are enabled
    starter.start();
}

This method does five main things:

① Set the maximum value of the local cache (caffeine) and create an etcd instance

//Set the maximum capacity of caffeine
Context.CAFFEINE_SIZE = caffeineSize;

//Set etcd address
EtcdConfigFactory.buildConfigCenter(etcdServer);

caffeineSize is the maximum value of the local cache, which can be set at startup, and defaults to 200000 if it is not set.
etcdServer is the etcd cluster address mentioned above.

Context can be understood as a configuration class, which contains two fields:

public class Context {
    public static String APP_NAME;

    public static int CAFFEINE_SIZE;
}

EtcdConfigFactory is the factory class for the etcd configuration center

public class EtcdConfigFactory {
    private static IConfigCenter configCenter;

    private EtcdConfigFactory() {}

    public static IConfigCenter configCenter() {
        return configCenter;
    }

    public static void buildConfigCenter(String etcdServer) {
        // When connecting multiple, comma separated
        configCenter = JdEtcdBuilder.build(etcdServer);
    }
}

An etcd instance object is created and obtained through its configCenter() method. The IConfigCenter interface encapsulates the behavior of the etcd instance (basic CRUD, watching, lease renewal, and so on).
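As a rough reference, here is a partial sketch of IConfigCenter limited to the calls that appear later in this article (putAndGrant, delete, watch, watchPrefix); the real interface exposes more CRUD and lease-renewal methods, and the exact signatures may differ.

public interface IConfigCenter {

    //put a value under a key with a lease (TTL in seconds)
    void putAndGrant(String key, String value, long leaseSeconds);

    //delete a key
    void delete(String key);

    //watch a single node for changes
    KvClient.WatchIterator watch(String key);

    //watch a node and all of its children for changes
    KvClient.WatchIterator watchPrefix(String key);
}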

② Create and start timed tasks: PushSchedulerStarter

//Start timing push
PushSchedulerStarter.startPusher(pushPeriod);//Push the key to be tested every 0.5 seconds
PushSchedulerStarter.startCountPusher(10);//Push count statistics every 10 seconds, not configurable

pushPeriod is the push interval, which can be set at startup; the minimum is 0.05 s. The shorter the interval, the more intensive the detection and the sooner hot keys are detected, but the client's resource consumption increases accordingly.

The PushSchedulerStarter class:

/**
     * Push the key to be tested every 0.5 seconds
     */
    public static void startPusher(Long period) {
        if (period == null || period <= 0) {
            period = 500L;
        }
        @SuppressWarnings("PMD.ThreadPoolCreationRule")
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("hotkey-pusher-service-executor", true));
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            //collector of hot keys
            IKeyCollector<HotKeyModel, HotKeyModel> collectHK = KeyHandlerFactory.getCollector();
            //Every 0.5 s, push the collected hot key information to the worker via netty; this is mainly hot key metadata (the app and key type the hot key comes from, whether it is a delete event, and the number of times the key has been reported)
            //Each report of a hot key also gets a globally unique id, and the report's createTime is set when netty sends it, so keys in the same batch share the same time
            List<HotKeyModel> hotKeyModels = collectHK.lockAndGetResult();
            if(CollectionUtil.isNotEmpty(hotKeyModels)){
                //The key set accumulated for half a second is distributed to different workers according to the hash
                KeyHandlerFactory.getPusher().send(Context.APP_NAME, hotKeyModels);
                collectHK.finishOnce();
            }

        },0, period, TimeUnit.MILLISECONDS);
    }
    /**
     * Push statistics every 10 seconds
     */
    public static void startCountPusher(Integer period) {
        if (period == null || period <= 0) {
            period = 10;
        }
        @SuppressWarnings("PMD.ThreadPoolCreationRule")
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("hotkey-count-pusher-service-executor", true));
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            IKeyCollector<KeyHotModel, KeyCountModel> collectHK = KeyHandlerFactory.getCounter();
            List<KeyCountModel> keyCountModels = collectHK.lockAndGetResult();
            if(CollectionUtil.isNotEmpty(keyCountModels)){
                //Accumulate the amount of 10 seconds and distribute it to different workers according to the hash
                KeyHandlerFactory.getPusher().sendCount(Context.APP_NAME, keyCountModels);
                collectHK.finishOnce();
            }
        },0, period, TimeUnit.SECONDS);
    }

From the above two methods we can see that the scheduled tasks are implemented with scheduled thread pools, and the threads are all daemon threads.

Let's focus on the KeyHandlerFactory class, a clever design on the client side whose name literally means key handler factory. The concrete instance object is DefaultKeyHandler:

public class DefaultKeyHandler {
    //The pusher that sends HotKeyMsg messages out via netty
    private IKeyPusher iKeyPusher = new NettyKeyPusher();
    //The collector for keys to be tested; it holds two maps whose key is the hot key name and whose value is the hot key metadata (e.g. which app and key type it comes from, and whether it is a delete event)
    private IKeyCollector<HotKeyModel, HotKeyModel> iKeyCollector = new TurnKeyCollector();
    //The count collector; it holds two maps whose key is the corresponding rule and whose value, HitCount, holds the rule's total access count and the access count after the key became hot
    private IKeyCollector<KeyHotModel, KeyCountModel> iKeyCounter = new TurnCountCollector();

    public IKeyPusher keyPusher() {
        return iKeyPusher;
    }
    public IKeyCollector<HotKeyModel, HotKeyModel> keyCollector() {
        return iKeyCollector;
    }
    public IKeyCollector<KeyHotModel, KeyCountModel> keyCounter() {
        return iKeyCounter;
    }
}

It has three member objects: NettyKeyPusher, which encapsulates pushing messages over netty; TurnKeyCollector, the collector of keys to be tested; and TurnCountCollector, the count collector. The latter two implement the IKeyCollector interface, which cleanly aggregates the hotkey processing and reflects good cohesion in the code. A sketch of the KeyHandlerFactory that exposes these objects follows.
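For reference, KeyHandlerFactory itself is essentially a static holder around a single DefaultKeyHandler. The sketch below is consistent with the getPusher()/getCollector()/getCounter() calls seen in the code above, but the details are assumed rather than copied from the project.

public class KeyHandlerFactory {

    private static final DefaultKeyHandler DEFAULT_HANDLER = new DefaultKeyHandler();

    private KeyHandlerFactory() {}

    public static IKeyPusher getPusher() {
        return DEFAULT_HANDLER.keyPusher();
    }

    public static IKeyCollector<HotKeyModel, HotKeyModel> getCollector() {
        return DEFAULT_HANDLER.keyCollector();
    }

    public static IKeyCollector<KeyHotModel, KeyCountModel> getCounter() {
        return DEFAULT_HANDLER.keyCounter();
    }
}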
Let's first look at NettyKeyPusher, which encapsulates pushing messages over netty:

/**
 * push msg to netty's pusher
 * @author wuweifeng wrote on 2020-01-06
 * @version 1.0
 */
public class NettyKeyPusher implements IKeyPusher {
    @Override
    public void send(String appName, List<HotKeyModel> list) {
        //The key set accumulated for half a second is distributed to different workers according to the hash
        long now = System.currentTimeMillis();

        Map<Channel, List<HotKeyModel>> map = new HashMap<>();
        for(HotKeyModel model : list) {
            model.setCreateTime(now);
            Channel channel = WorkerInfoHolder.chooseChannel(model.getKey());
            if (channel == null) {
                continue;
            }
            List<HotKeyModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
            newList.add(model);
        }

        for (Channel channel : map.keySet()) {
            try {
                List<HotKeyModel> batch = map.get(channel);
                HotKeyMsg hotKeyMsg = new HotKeyMsg(MessageType.REQUEST_NEW_KEY, Context.APP_NAME);
                hotKeyMsg.setHotKeyModels(batch);
                channel.writeAndFlush(hotKeyMsg).sync();
            } catch (Exception e) {
                try {
                    InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
                    JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
                } catch (Exception ex) {
                    JdLogger.error(getClass(),"flush error");
                }
            }
        }
    }
    @Override
    public void sendCount(String appName, List<KeyCountModel> list) {
        //Accumulate the amount of 10 seconds and distribute it to different workers according to the hash
        long now = System.currentTimeMillis();
        Map<Channel, List<KeyCountModel>> map = new HashMap<>();
        for(KeyCountModel model : list) {
            model.setCreateTime(now);
            Channel channel = WorkerInfoHolder.chooseChannel(model.getRuleKey());
            if (channel == null) {
                continue;
            }
            List<KeyCountModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
            newList.add(model);
        }
        for (Channel channel : map.keySet()) {
            try {
                List<KeyCountModel> batch = map.get(channel);
                HotKeyMsg hotKeyMsg = new HotKeyMsg(MessageType.REQUEST_HIT_COUNT, Context.APP_NAME);
                hotKeyMsg.setKeyCountModels(batch);
                channel.writeAndFlush(hotKeyMsg).sync();
            } catch (Exception e) {
                try {
                    InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
                    JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
                } catch (Exception ex) {
                    JdLogger.error(getClass(),"flush error");
                }
            }
        }
    }
}

send(String appName, List<HotKeyModel> list)
This method pushes the keys to be tested collected by TurnKeyCollector to the worker via netty. The HotKeyModel object mainly carries hot key metadata (the app and key type the hot key comes from, whether it is a delete event, and the number of times the key has been reported).

sendCount(String appName, List<KeyCountModel> list)
This method pushes the keys corresponding to the rules collected by TurnCountCollector to the worker via netty. The KeyCountModel object mainly carries the rule information corresponding to the keys and their access counts.

WorkerInfoHolder.chooseChannel(model.getRuleKey())
This obtains the worker responsible for the key according to a hash algorithm and routes the data to that worker's Channel connection, so the worker side can scale horizontally without pressure, as sketched below.
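A minimal sketch of what this hash-based routing amounts to; the real WorkerInfoHolder.chooseChannel may differ in detail, and Server#getChannel is assumed here for illustration.

public static Channel chooseChannel(String key) {
    int size = WORKER_HOLDER.size();
    if (StrUtil.isEmpty(key) || size == 0) {
        return null;
    }
    //the same key always hashes to the same worker, so all reports for a key land on one worker
    int index = Math.abs(key.hashCode() % size);
    return WORKER_HOLDER.get(index).getChannel();
}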

Next let's analyze the two key collectors, TurnKeyCollector and TurnCountCollector. Both implement the IKeyCollector interface:

/**
 * Aggregate hotkey
 * @author wuweifeng wrote on 2020-01-06
 * @version 1.0
 */
public interface IKeyCollector<T, V> {
    /**
     * Return value after locking
     */
    List<V> lockAndGetResult();
    /**
     * input parameters
     */
    void collect(T t);

    void finishOnce();
}

lockAndGetResult()
Its main purpose is to return the information gathered by the collect method and clear the locally buffered data, so that the next statistical cycle can accumulate fresh data.

collect(T t)
As the name suggests, it stores the key information collected from the API calls locally.

finishOnce()
The current implementation of this method is empty, no need to pay attention.

Key collector to be tested: TurnKeyCollector

public class TurnKeyCollector implements IKeyCollector<HotKeyModel, HotKeyModel> {
    //The key in this map is mainly the name of the hot key, and the value is mainly the metadata information of the hot key (for example: the type of app and key from which the hot key comes, and whether it is a deletion event)
    private ConcurrentHashMap<String, HotKeyModel> map0 = new ConcurrentHashMap<>();
    private ConcurrentHashMap<String, HotKeyModel> map1 = new ConcurrentHashMap<>();
    private AtomicLong atomicLong = new AtomicLong(0);

    @Override
    public List<HotKeyModel> lockAndGetResult() {
        //After self-increment, the corresponding map will stop being written and wait to be read
        atomicLong.addAndGet(1);
        List<HotKeyModel> list;
        //Compare this with the collect method: when one map is being read, the other is being written, so reading a map never blocks writing to the other
        //The two maps take turns providing read and write capacity; the design is very clever and worth learning
        if (atomicLong.get() % 2 == 0) {
            list = get(map1);
            map1.clear();
        } else {
            list = get(map0);
            map0.clear();
        }
        return list;
    }
    private List<HotKeyModel> get(ConcurrentHashMap<String, HotKeyModel> map) {
        return CollectionUtil.list(false, map.values());
    }
    @Override
    public void collect(HotKeyModel hotKeyModel) {
        String key = hotKeyModel.getKey();
        if (StrUtil.isEmpty(key)) {
            return;
        }
        if (atomicLong.get() % 2 == 0) {
            //If it does not exist, return null and put the key-value in. If the same key already exists, return the value corresponding to the key without overwriting
            HotKeyModel model = map0.putIfAbsent(key, hotKeyModel);
            if (model != null) {
                //Increase the number of times the hot key has been reported
                model.add(hotKeyModel.getCount());
            }
        } else {
            HotKeyModel model = map1.putIfAbsent(key, hotKeyModel);
            if (model != null) {
                model.add(hotKeyModel.getCount());
            }
        }
    }
    @Override
    public void finishOnce() {}
}

As you can see, the class holds two ConcurrentHashMaps and one AtomicLong. By incrementing the AtomicLong and taking it modulo 2, the read and write roles of the two maps are switched separately: each map takes turns being readable and writable, and the same map is never read and written at the same time, which avoids blocking between concurrent reads and writes of the collection. This lock-free design is very clever and greatly improves collection throughput.

Key count collector: TurnCountCollector
Its design is similar to TurnKeyCollector, so we won't repeat it. It is worth mentioning that it has a parallel-processing mechanism: when the number of collected entries exceeds the threshold DATA_CONVERT_SWITCH_THRESHOLD = 5000, lockAndGetResult uses Java parallel streams to improve processing efficiency, as sketched below.
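An illustrative, generic sketch of that switch; the real TurnCountCollector's model classes and conversion logic are simplified to a plain map of hit counts here.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelConvertSketch {

    private static final int DATA_CONVERT_SWITCH_THRESHOLD = 5000;

    //convert the collected hit counts into report entries, in parallel only for large batches
    static List<String> convert(Map<String, Long> hitCounts) {
        return (hitCounts.size() > DATA_CONVERT_SWITCH_THRESHOLD
                ? hitCounts.entrySet().parallelStream()
                : hitCounts.entrySet().stream())
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.toList());
    }
}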

③ Open the worker reconnector

//Open the worker reconnector
WorkerRetryConnector.retryConnectWorkers();
public class WorkerRetryConnector {

    /**
     * Regularly reconnect unconnected workers
     */
    public static void retryConnectWorkers() {
        @SuppressWarnings("PMD.ThreadPoolCreationRule")
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("worker-retry-connector-service-executor", true));
        //Enable pulling etcd worker information, if the pulling fails, continue to pull regularly
        scheduledExecutorService.scheduleAtFixedRate(WorkerRetryConnector::reConnectWorkers, 30, 30, TimeUnit.SECONDS);
    }

    private static void reConnectWorkers() {
        List<String> nonList = WorkerInfoHolder.getNonConnectedWorkers();
        if (nonList.size() == 0) {
            return;
        }
        JdLogger.info(WorkerRetryConnector.class, "trying to reConnect to these workers :" + nonList);
        NettyClient.getInstance().connect(nonList);//The netty connection method channelActive will be triggered here
    }
}

This is also executed by a scheduled thread; the interval defaults to 30 s and cannot be changed.
The client's worker connection information is managed by WorkerInfoHolder. The connections are kept in a List, specifically a CopyOnWriteArrayList, since this is a read-mostly, write-rarely scenario similar to metadata.

/**
 * Save the mapping between the worker's ip address and the Channel, which is in order. Every time the client sends a message, it will hash according to the size of the map
 * For example, key-1 is sent to the first Channel of the workerHolder, and key-2 is sent to the second Channel
 */
private static final List<Server> WORKER_HOLDER = new CopyOnWriteArrayList<>();
④ Register EventBus event subscriber

private void registEventBus() {
    //netty connector will pay attention to WorkerInfoChangeEvent event
    EventBusCenter.register(new WorkerChangeSubscriber());
    //The hot key detection callback pays attention to the hot key event
    EventBusCenter.register(new ReceiveNewKeySubscribe());
    //Rule change event
    EventBusCenter.register(new KeyRuleHolder());
}

The project uses Guava's EventBus message bus to decouple components with the publish/subscribe pattern; with very little code, multiple components can communicate with each other.

The basic schematic is as follows:
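For readers unfamiliar with Guava's EventBus, here is a minimal standalone example of the same register/post/@Subscribe pattern; the event and subscriber names are made up for illustration.

import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

public class EventBusDemo {

    //a stand-in for events such as WorkerInfoChangeEvent or ReceiveNewKeyEvent
    static class DemoEvent {
        final String payload;
        DemoEvent(String payload) { this.payload = payload; }
    }

    static class DemoSubscriber {
        @Subscribe
        public void onEvent(DemoEvent event) {
            System.out.println("received: " + event.payload);
        }
    }

    public static void main(String[] args) {
        EventBus eventBus = new EventBus();
        eventBus.register(new DemoSubscriber());              //like EventBusCenter.register(...)
        eventBus.post(new DemoEvent("worker list changed"));  //like posting a WorkerInfoChangeEvent
    }
}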

Monitor worker information changes: WorkerChangeSubscriber

/**
 * Monitor worker information changes
 */
@Subscribe
public void connectAll(WorkerInfoChangeEvent event) {
    List<String> addresses = event.getAddresses();
    if (addresses == null) {
        addresses = new ArrayList<>();
    }

    WorkerInfoHolder.mergeAndConnectNew(addresses);
}
/**
 * When the connection between the client and the worker is disconnected, delete
 */
@Subscribe
public void channelInactive(ChannelInactiveEvent inactiveEvent) {
    //Get the disconnected channel
    Channel channel = inactiveEvent.getChannel();
    InetSocketAddress socketAddress = (InetSocketAddress) channel.remoteAddress();
    String address = socketAddress.getHostName() + ":" + socketAddress.getPort();
    JdLogger.warn(getClass(), "this channel is inactive : " + socketAddress + " trying to remove this connection");

    WorkerInfoHolder.dealChannelInactive(address);
}

Listen to the hot key callback event: ReceiveNewKeySubscribe

private ReceiveNewKeyListener receiveNewKeyListener = new DefaultNewKeyListener();

@Subscribe
public void newKeyComing(ReceiveNewKeyEvent event) {
    HotKeyModel hotKeyModel = event.getModel();
    if (hotKeyModel == null) {
        return;
    }
    //Receive new key push
    if (receiveNewKeyListener != null) {
        receiveNewKeyListener.newKey(hotKeyModel);
    }
}

After this method receives a new hot key push event, it hands the HotKeyModel to the ReceiveNewKeyListener (DefaultNewKeyListener) for processing.

Core processing logic: DefaultNewKeyListener#newKey:

@Override
public void newKey(HotKeyModel hotKeyModel) {
    long now = System.currentTimeMillis();
    //If 1 second has passed since the key arrived, record it. When manually deleting the key, there is no CreateTime
    if (hotKeyModel.getCreateTime() != 0 && Math.abs(now - hotKeyModel.getCreateTime()) > 1000) {
        JdLogger.warn(getClass(), "the key comes too late : " + hotKeyModel.getKey() + " now " +
                +now + " keyCreateAt " + hotKeyModel.getCreateTime());
    }
    if (hotKeyModel.isRemove()) {
        //If it is a delete event, delete it directly
        deleteKey(hotKeyModel.getKey());
        return;
    }
    //It is already a hot key, push the same hot key again, make a log record, and refresh it
    if (JdHotKeyStore.isHot(hotKeyModel.getKey())) {
        JdLogger.warn(getClass(), "receive repeat hot key :" + hotKeyModel.getKey() + " at " + now);
    }
    addKey(hotKeyModel.getKey());
}
private void deleteKey(String key) {
        CacheFactory.getNonNullCache(key).delete(key);
}
private void addKey(String key) {
  ValueModel valueModel = ValueModel.defaultValue(key);
  if (valueModel == null) {
      // does not match any rules
      deleteKey(key);
      return;
  }

  //If the key already exists, its value and expiration time are reset; if it does not exist, the hot key is newly added
  JdHotKeyStore.setValueDirectly(key, valueModel);
}
If the HotKeyModel is a delete event, get the caffeine instance corresponding to the key's timeout from RULE_CACHE_MAP, delete the key's cache entry from it (this is equivalent to deleting the local cache), and return.
If it is not a delete event, add the key to the caffeine cache corresponding to its rule in RULE_CACHE_MAP.
One point to note: for a non-delete event, the value that addKey() writes into caffeine is a default ValueModel placeholder (a magic value), not the real business value, so a get on the key still returns null until the business code sets the value.
Listen to Rule change events: KeyRuleHolder

You can see that there are two member attributes: RULE_CACHE_MAP, KEY_RULES

/**
 * Save the mapping between timeout and caffeine, key is timeout, value is caffeine[(String,Object)]
 */
private static final ConcurrentHashMap<Integer, LocalCache> RULE_CACHE_MAP = new ConcurrentHashMap<>();
/**
 * Here KEY_RULES is to save all the rules corresponding to the appName in etcd
 */
private static final List<KeyRule> KEY_RULES = new ArrayList<>();

ConcurrentHashMap RULE_CACHE_MAP:

Saves the mapping between timeout and caffeine: the key is the timeout and the value is a caffeine cache of (String, Object).
Clever design: the key's expiration time is used as the bucketing strategy, so keys with the same expiration time end up in the same bucket (one caffeine instance). Each of these caffeine instances is the client's local cache for hot keys, i.e. the cached KV data of the hot keys actually lives there. A sketch of building such a bucket follows below.
List KEY_RULES:

Here KEY_RULES is to save all the rules corresponding to the appName in etcd.
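A hedged sketch of what building one caffeine bucket per expire-duration could look like; the real CacheFactory/LocalCache wrapper in hotkey may differ, and only the idea of one cache per timeout is shown.

import java.util.concurrent.TimeUnit;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class CaffeineBucketSketch {

    //build one cache whose entries all share the same TTL; keys whose rule has this duration land here
    public static Cache<String, Object> build(int durationSeconds) {
        return Caffeine.newBuilder()
                .maximumSize(Context.CAFFEINE_SIZE)                    //the cap set at client startup
                .expireAfterWrite(durationSeconds, TimeUnit.SECONDS)   //bucket-wide expiration
                .build();
    }
}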
Specifically listening to the KeyRuleInfoChangeEvent event method:

@Subscribe
public void ruleChange(KeyRuleInfoChangeEvent event) {
    JdLogger.info(getClass(), "new rules info is :" + event.getKeyRules());
    List<KeyRule> ruleList = event.getKeyRules();
    if (ruleList == null) {
        return;
    }

    putRules(ruleList);
}

Core processing logic: KeyRuleHolder#putRules:

/**
 * All rules, if the rule's timeout changes, caffeine will be rebuilt
 */
public static void putRules(List<KeyRule> keyRules) {
    synchronized (KEY_RULES) {
        //If the rule is empty, clear the rule table
        if (CollectionUtil.isEmpty(keyRules)) {
            KEY_RULES.clear();
            RULE_CACHE_MAP.clear();
            return;
        }
        KEY_RULES.clear();
        KEY_RULES.addAll(keyRules);
        Set<Integer> durationSet = keyRules.stream().map(KeyRule::getDuration).collect(Collectors.toSet());
        for (Integer duration : RULE_CACHE_MAP.keySet()) {
            //First clear those stored in RULE_CACHE_MAP, but not in rule
            if (!durationSet.contains(duration)) {
                RULE_CACHE_MAP.remove(duration);
            }
        }
        // loop through all rules
        for (KeyRule keyRule : keyRules) {
            int duration = keyRule.getDuration();
            //Here, if there is no value with a timeout time of duration in RULE_CACHE_MAP, create a new one and put it in RULE_CACHE_MAP
            //For example, RULE_CACHE_MAP is originally empty, then the mapping relationship of RULE_CACHE_MAP is constructed here
            //TODO If the keyRules contains a keyRule of the same duration, only one key will be created as duration and value as caffeine, where caffeine is (string, object)
            if (RULE_CACHE_MAP.get(duration) == null) {
                LocalCache cache = CacheFactory.build(duration);
                RULE_CACHE_MAP.put(duration, cache);
            }
        }
    }
}

Use the synchronized keyword to ensure thread safety;
If the rule is empty, clear the rule table (RULE_CACHE_MAP, KEY_RULES);
Override KEY_RULES with the passed in keyRules;
Clear the mapping relationship that is not in keyRules in RULE_CACHE_MAP;
Traverse all keyRules; if RULE_CACHE_MAP has no caffeine for a rule's timeout, create one and put it in;
⑤ Start EtcdStarter (etcd connection manager)

EtcdStarter starter = new EtcdStarter();
//Monitors related to etcd are enabled
starter.start();

public void start() {
    fetchWorkerInfo();
    fetchRule();
    startWatchRule();
    //Listen to hot key events, only listen to manually added and deleted keys
    startWatchHotKey();
}

fetchWorkerInfo()

Pull worker cluster address information allAddress from etcd and update WORKER_HOLDER in WorkerInfoHolder

/**
 * Pull worker information every 30 seconds
 */
private void fetchWorkerInfo() {
    ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
    //Enable pulling etcd worker information, if the pulling fails, continue to pull regularly
    scheduledExecutorService.scheduleAtFixedRate(() -> {
        JdLogger.info(getClass(), "trying to connect to etcd and fetch worker info");
        fetch();

    }, 0, 30, TimeUnit.SECONDS);
}

A single-threaded scheduled thread pool is used to execute this.
The worker information is fetched from etcd periodically; the path is /jd/workers/+$appName or default, the interval is fixed at 30 seconds and cannot be changed, and the ip+port of each worker is stored there.
A WorkerInfoChangeEvent event is posted.
Note: The address has $appName or default, which is configured in the worker. If the worker is placed under an appName, the worker will only participate in the calculation of the app.
fetchRule()

This is executed by a single-threaded scheduled pool; the interval is fixed at 5 seconds and cannot be changed. Once the rule configuration and the manually configured hot keys have been pulled successfully, the scheduled thread is shut down (in other words, it only needs to succeed once); if it fails, it keeps retrying.

private void fetchRule() {
    ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
    //Enable pulling etcd worker information, if the pulling fails, continue to pull regularly
    scheduledExecutorService.scheduleAtFixedRate(() -> {
        JdLogger.info(getClass(), "trying to connect to etcd and fetch rule info");
        boolean success = fetchRuleFromEtcd();
        if (success) {
            //Pull the existing hot key
            fetchExistHotKey();
            //Here, if the pull rule and the manually configured hotKey are successfully pulled, the timing execution thread stops
            scheduledExecutorService.shutdown();
        }
    }, 0, 5, TimeUnit.SECONDS);
}

fetchRuleFromEtcd()

Get the rule rule configured by the appName from etcd, the address is /jd/rules/+$appName.
If the rule rules are found to be empty, the local rule configuration cache and all rule key caches will be cleared by publishing the KeyRuleInfoChangeEvent event.
Post the KeyRuleInfoChangeEvent event.
fetchExistHotKey()

Obtain the hot key manually configured by the appName from etcd, the address is /jd/hotkeys/+$appName.
The ReceiveNewKeyEvent event is posted, and the content HotKeyModel is not a delete event.
startWatchRule()

/**
 * Asynchronously monitor rule changes
 */
private void startWatchRule() {
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    executorService.submit(() -> {
        JdLogger.info(getClass(), "--- begin watch rule change ----");
        try {
            IConfigCenter configCenter = EtcdConfigFactory.configCenter();
            KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.rulePath + Context.APP_NAME);
            //If there is a new event, that is, the change of the rule, re-pull all the information
            while (watchIterator.hasNext()) {
                //This call is required; next() blocks until there really is a new rule change
                WatchUpdate watchUpdate = watchIterator.next();
                List<Event> eventList = watchUpdate.getEvents();
                JdLogger.info(getClass(), "rules info changed. begin to fetch new infos. rule change is " + eventList);

                // Pull rule information in full
                fetchRuleFromEtcd();
            }
        } catch (Exception e) {
            JdLogger.error(getClass(), "watch err");
        }
    });
}

Asynchronously monitor rule changes, and use etcd to monitor changes in nodes whose address is /jd/rules/+$appName.
Use a thread pool, single thread, and asynchronously monitor rule changes. If there is an event change, call the fetchRuleFromEtcd() method.
startWatchHotKey()
Asynchronously start monitoring hot key change information, use etcd to listen to the address prefix as /jd/hotkeys/+$appName

/**
 * Asynchronously starts to monitor the hot key change information, only the manually added key information is in this directory
 */
private void startWatchHotKey() {
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    executorService.submit(() -> {
        JdLogger.info(getClass(), "--- begin watch hotKey change ----");
        IConfigCenter configCenter = EtcdConfigFactory.configCenter();
        try {
            KvClient.WatchIterator watchIterator = configCenter.watchPrefix(ConfigConstant.hotKeyPath + Context.APP_NAME);
            //If there is a new event, that is, a new key is generated or deleted
            while (watchIterator.hasNext()) {
                WatchUpdate watchUpdate = watchIterator.next();

                List<Event> eventList = watchUpdate.getEvents();
                KeyValue keyValue = eventList.get(0).getKv();
                Event.EventType eventType = eventList.get(0).getType();
                try {
                    //It can be seen from this place that the return given by etcd is the full path of the node, and the key we need needs to remove the prefix
                    String key = keyValue.getKey().toStringUtf8().replace(ConfigConstant.hotKeyPath + Context.APP_NAME + "/", "");
                    //If the key is deleted, delete it immediately
                    if (Event.EventType.DELETE == eventType) {
                        HotKeyModel model = new HotKeyModel();
                        model.setRemove(true);
                        model.setKey(key);
                        EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
                    } else {
                        HotKeyModel model = new HotKeyModel();
                        model.setRemove(false);
                        String value = keyValue.getValue().toStringUtf8();
                        //Add hot key
                        JdLogger.info(getClass(), "etcd receive new key : " + key + " --value:" + value);
                        //If this is a delete instruction, do nothing
                        //TODO A question here: the lazy-delete instruction issued by the worker's automatic detection is also skipped here, but then the local cache is not updated, right?
                        //TODO So presumably the client API that checks whether a cache entry exists should check whether the cached value is the "#[DELETE]#" delete marker
                        //Disambiguation: only manually configured hot keys are actually watched here; the etcd path /jd/hotkeys/+$appName only holds manually configured hot keys, and hot keys automatically detected by the worker are pushed to the client directly over the netty channel
                        if (Constant.DEFAULT_DELETE_VALUE.equals(value)) {
                            continue;
                        }
                        //The manually created value is a timestamp
                        model.setCreateTime(Long.valueOf(keyValue.getValue().toStringUtf8()));
                        model.setKey(key);
                        EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
                    }
                } catch (Exception e) {
                    JdLogger.error(getClass(), "new key err :" + keyValue);
                }

            }
        } catch (Exception e) {
            JdLogger.error(getClass(), "watch err");
        }
    });

}

A single-threaded executor is used to asynchronously watch hot key changes.
etcd watches the node at the prefix address and all value changes of its child nodes.
Delete-node action:
A ReceiveNewKeyEvent is posted whose HotKeyModel is a delete event.
Add/update-node action:
If the changed value is the delete marker #[DELETE]#, it means the worker's automatic detection or a client issued a delete instruction.
In that case nothing is done and the event is skipped. (From the HotKeyPusher#push method you can see that for a delete event it first writes a node under /jd/hotkeys/+$appName whose value is the delete marker and then deletes the node at the same path, which triggers the delete-node event above; so a value equal to the delete marker can simply be skipped here.)
If the value is not the delete marker:
A ReceiveNewKeyEvent is posted whose HotKeyModel.createTime is the timestamp stored in the kv.
Question: the code comments say that only manually added or deleted hot keys are watched here. Does that mean the /jd/hotkeys/+$appName path is only for manually configured keys?

Answer: yes. Only manually configured hot keys are watched here; the etcd path /jd/hotkeys/+$appName holds only manually configured hot keys, while hot keys automatically detected by the worker are pushed to the client directly over the netty channel.

2. API analysis

1) Flow chart
① Query process

② Deletion process:

From the above flow charts you should be able to see how a hot key flows through the code. Here I will explain the core API from a source code perspective; due to space limits we won't paste all the related source code one by one, but simply describe what the internal logic looks like.

2) Core class: JdHotKeyStore

JdHotKeyStore is the core API class wrapping the client calls. It contains the 10 public methods shown above; we focus on analyzing 6 of them:

① isHotKey(String key)
Determine whether it is in the rule, if not, return false
Determine whether it is a hot key, if not or if it is and the expiration time is within 2s, collect it for TurnKeyCollector#collect
Finally, do statistical collection for TurnCountCollector#collect

② get(String key)
Get value from local caffeine
If the obtained value is a magic value, it only means that it is added to the caffeine cache, and the query is null

③ smartSet(String key, Object value)
Determine whether it is a hot key, whether it is in the rule or not, if it is a hot key, assign a value to the value, if it is not a hot key, do nothing

④ forceSet(String key, Object value)
Force assignment to value
If the key is not in the rule configuration, the passed value will not take effect, and the assigned value of the local cache will be changed to null

⑤ getValue(String key, KeyType keyType)
Get the value, if the value does not exist, call the HotKeyPusher#push method to send it to netty
If there is no rule configured for the key, there is no need to report the key and return null directly
If the obtained value is a magic value, it only means that it is added to the caffeine cache, and the query is null

⑥ remove(String key)
Deletes a key (from the local caffeine cache) and notifies the entire cluster to delete it (the cluster-wide delete is propagated through etcd)
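Putting these methods together, a typical client-side read path might look like the sketch below; loadFromRedisOrDb is a hypothetical loader standing in for the normal remote cache or DB access.

public Object query(String key) {
    if (JdHotKeyStore.isHotKey(key)) {           //counts the key and tells us whether it is currently hot
        Object cached = JdHotKeyStore.get(key);  //read the local caffeine cache
        if (cached != null) {
            return cached;
        }
        Object value = loadFromRedisOrDb(key);   //fall back to the normal remote cache / DB
        JdHotKeyStore.smartSet(key, value);      //only cached locally if the key really is hot
        return value;
    }
    return loadFromRedisOrDb(key);               //not hot: go through the normal path
}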

3) Client uploads hot key entry calling class: HotKeyPusher
Core method:

public static void push(String key, KeyType keyType, int count, boolean remove) {
    if (count <= 0) {
        count = 1;
    }
    if (keyType == null) {
        keyType = KeyType.REDIS_KEY;
    }
    if (key == null) {
        return;
    }
    //LongAdder is used here to keep multi-threaded counting safe: although it is created inside this method, the HotKeyModel instance ends up stored in TurnKeyCollector's two maps,
    //so multiple threads may increment the count concurrently, and without LongAdder the count could become inaccurate
    LongAdder adderCnt = new LongAdder();
    adderCnt.add(count);

    HotKeyModel hotKeyModel = new HotKeyModel();
    hotKeyModel.setAppName(Context.APP_NAME);
    hotKeyModel.setKeyType(keyType);
    hotKeyModel.setCount(adderCnt);
    hotKeyModel.setRemove(remove);
    hotKeyModel.setKey(key);


    if (remove) {
        //If it is a delete, it is sent directly to etcd without aggregation. But there is a problem: this deletion can only remove manually added keys, not the ones detected by the worker,
        //because each client only watches the manually-added path, not the auto-detected path, so if the key does not exist under the manual path it cannot be deleted,
        //and the cluster-wide delete via the watch event cannot take effect. The workaround is to first add the key as a new hot key and then delete it.
        //TODO Why not just delete the node directly here? Won't the hot keys automatically detected by the worker add new events to the node?
        //Disambiguation: according to the detection rules, when the worker decides a key is hot it does not create a node under keyPath; it simply has the client put a null placeholder into its local cache to represent the hot key.
        EtcdConfigFactory.configCenter().putAndGrant(HotKeyPathTool.keyPath(hotKeyModel), Constant.DEFAULT_DELETE_VALUE, 1);
        EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyPath(hotKeyModel)); //TODO this part is quite clever; to be elaborated
        //Also delete the directory detected by the worker
        EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyRecordPath(hotKeyModel));
    } else {
        //If the key is the key to be detected in the rule, it will accumulate and wait for transmission
        if (KeyRuleHolder.isKeyInRule(key)) {
            //Accumulate and wait to send every half second
            KeyHandlerFactory.getCollector().collect(hotKeyModel);
        }
    }
}

From the source code above:

LongAdder is used to keep multi-threaded counting safe: although it is created inside the method, the HotKeyModel instance is stored in TurnKeyCollector's two maps, so multiple threads may modify the count attribute concurrently, and without LongAdder the count could become inaccurate.
If it is a remove (delete) operation, in addition to deleting the manually configured hot key path, the path the dashboard uses to display detected hot keys is also deleted.
Only the key configured in the rule will be detected and sent to the worker for calculation.
3. Communication mechanism (interacting with workers)

1) NettyClient: the netty connector

public class NettyClient {
    private static final NettyClient nettyClient = new NettyClient();

    private Bootstrap bootstrap;

    public static NettyClient getInstance() {
        return nettyClient;
    }

    private NettyClient() {
        if (bootstrap == null) {
            bootstrap = initBootstrap();
        }
    }

    private Bootstrap initBootstrap() {
        // less threads
        EventLoopGroup group = new NioEventLoopGroup(2);

        Bootstrap bootstrap = new Bootstrap();
        NettyClientHandler nettyClientHandler = new NettyClientHandler();
        bootstrap.group(group).channel(NioSocketChannel.class)
                .option(ChannelOption.SO_KEEPALIVE, true)
                .option(ChannelOption.TCP_NODELAY, true)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ByteBuf delimiter = Unpooled.copiedBuffer(Constant.DELIMITER.getBytes());
                        ch.pipeline()
                                .addLast(new DelimiterBasedFrameDecoder(Constant.MAX_LENGTH, delimiter))//This is to define the separator between multiple TCP packets, in order to better unpack
                                .addLast(new MsgDecoder())
                                .addLast(new MsgEncoder())
                                //When there is no message for 30 seconds, send a heartbeat packet
                                .addLast(new IdleStateHandler(0, 0, 30))
                                .addLast(nettyClientHandler);
                    }
                });
        return bootstrap;
    }
}

Uses the Reactor threading model, with only 2 worker threads and no separate boss thread group configured
Long connection, with TCP_NODELAY enabled
Netty's delimiter "$( )$" acts as the boundary between TCP packets, which makes unpacking easier
Protobuf serialization and deserialization
When no message is sent to the peer in 30s, a heartbeat packet is sent to determine the activity
Worker thread handler NettyClientHandler
JD hotkey's TCP protocol sends and receives strings, with each TCP message packet delimited by the special characters $( )$
Advantage: this is very simple to implement.

After a message packet is obtained, it is deserialized with JSON or protobuf.

Disadvantage: two layers of deserialization are needed, byte stream -> string -> message object, which costs some performance.

Fortunately protobuf serialization is fast, whereas JSON serialization only reaches a few hundred thousand operations per second, so it does consume some performance.
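To make the framing concrete, here is a minimal sketch of a delimiter-terminated encoder, assuming the message body has already been serialized to bytes (protobuf in hotkey); the project's actual MsgEncoder/MsgDecoder may be implemented differently.

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToByteEncoder;

import java.nio.charset.StandardCharsets;

//Illustrative only: writes the serialized body followed by the "$( )$" delimiter,
//so the peer's DelimiterBasedFrameDecoder can split the byte stream back into frames.
public class DelimiterMsgEncoderSketch extends MessageToByteEncoder<byte[]> {
    private static final byte[] DELIMITER = "$( )$".getBytes(StandardCharsets.UTF_8);

    @Override
    protected void encode(ChannelHandlerContext ctx, byte[] serializedMsg, ByteBuf out) {
        out.writeBytes(serializedMsg); //protobuf/json bytes of the message object
        out.writeBytes(DELIMITER);     //frame terminator used for unpacking
    }
}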

2) NettyClientHandler: Worker thread processor

@ChannelHandler.Sharable
public class NettyClientHandler extends SimpleChannelInboundHandler<HotKeyMsg> {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            IdleStateEvent idleStateEvent = (IdleStateEvent) evt;
            //ALL_IDLE: neither a read nor a write has happened within the idle window
            if (idleStateEvent.state() == IdleState.ALL_IDLE) {
                //Send message to server
                ctx.writeAndFlush(new HotKeyMsg(MessageType.PING, Context.APP_NAME));
            }
        }

        super.userEventTriggered(ctx, evt);
    }
    //When the Channel registers the EventLoop, binds the SocketAddress and connects to the ChannelFuture, it may trigger the invocation of the channelActive method of the ChannelInboundHandler
    //Similar to trigger after successful TCP three-way handshake
    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        JdLogger.info(getClass(), "channelActive:" + ctx.name());
        ctx.writeAndFlush(new HotKeyMsg(MessageType.APP_NAME, Context.APP_NAME));
    }
    //Similar to TCP's four-way close: triggered after waiting 2MSL (about 180s); for example, closing the channel (channel.close()) will trigger it

    //When the client channel actively closes the connection, it will send a write request to the server, and then the selector where the server channel is located will listen to an OP_READ event, and then
    //Execute the data read operation, and find that the client channel has been closed when reading, then the number of read data bytes returns -1, and then execute the close operation to close the underlying socket corresponding to the channel,
    //And in the pipeline, start from the head, go down to the InboundHandler, and trigger the execution of the handler's channelInactive and channelUnregistered methods, as well as a series of operations to remove the handlers in the pipeline.
    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        super.channelInactive(ctx);
        //The connection is disconnected, maybe only the client and server are disconnected, but they are not disconnected from etcd. It may also be that the client is disconnected from the network, or the server may be disconnected.
        // Post disconnection event. Reconnect after 10 seconds, and decide whether to reconnect according to the worker information in etcd. If there is nothing in etcd, it will not reconnect. If there is in etcd, reconnect
        notifyWorkerChange(ctx.channel());
    }
    private void notifyWorkerChange(Channel channel) {
        EventBusCenter.getInstance().post(new ChannelInactiveEvent(channel));
    }
    @Override
    protected void channelRead0(ChannelHandlerContext channelHandlerContext, HotKeyMsg msg) {
        if (MessageType.PONG == msg.getMessageType()) {
            JdLogger.info(getClass(), "heart beat");
            return;
        }
        if (MessageType.RESPONSE_NEW_KEY == msg.getMessageType()) {
            JdLogger.info(getClass(), "receive new key : " + msg);
            if (CollectionUtil.isEmpty(msg.getHotKeyModels())) {
                return;
            }
            for (HotKeyModel model : msg.getHotKeyModels()) {
                EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
            }
        }
    }
}

userEventTriggered

When the ALL_IDLE event fires (no read or write for 30s), send new HotKeyMsg(MessageType.PING, Context.APP_NAME) to the peer as a heartbeat
channelActive

When the Channel registers the EventLoop, binds the SocketAddress and connects to the ChannelFuture, it may trigger the invocation of the channelActive method of the ChannelInboundHandler
Similar to triggering after successful TCP three-way handshake, sending new HotKeyMsg(MessageType.APP_NAME, Context.APP_NAME) to the peer
channelInactive

Similar to TCP's four-way close, it is triggered after waiting 2MSL (about 180s); for example, closing the channel (channel.close()) triggers it, the ChannelInactiveEvent is posted, and the connection is re-established after 10s.
channelRead0

When receiving the PONG message type, make a log and return
When receiving the RESPONSE_NEW_KEY message type, publish the ReceiveNewKeyEvent event
3.3.3 Worker side

1. Entry startup loading: 7 @PostConstruct

1) The worker side handles etcd-related processing: EtcdStarter
① The first @PostConstruct: watchLog()

@PostConstruct
public void watchLog() {
    AsyncPool.asyncDo(() -> {
        try {
            // Take etcd whether to open the log configuration, address /jd/logOn
            String loggerOn = configCenter.get(ConfigConstant.logToggle);
            LOGGER_ON = "true".equals(loggerOn) || "1".equals(loggerOn);
        } catch (StatusRuntimeException ex) {
            logger.error(ETCD_DOWN);
        }
        //Monitor etcd address /jd/logOn whether to enable log configuration, and change the switch in real time
        KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.logToggle);
        while (watchIterator.hasNext()) {
            WatchUpdate watchUpdate = watchIterator.next();
            List<Event> eventList = watchUpdate.getEvents();
            KeyValue keyValue = eventList.get(0).getKv();
            logger.info("log toggle changed : " + keyValue);
            String value = keyValue.getValue().toStringUtf8();
            LOGGER_ON = "true".equals(value) || "1".equals(value);
        }
    });
}

Asynchronous execution in the thread pool
Take etcd's log configuration, address /jd/logOn, default true
Monitor etcd address /jd/logOn whether the log configuration is enabled, and change the switch in real time
Because it keeps watching etcd, this task runs continuously instead of finishing after one pass
② The second @PostConstruct: watch()

/**
 * Start the callback listener to listen for rule changes
 */
@PostConstruct
public void watch() {
    AsyncPool.asyncDo(() -> {
        KvClient.WatchIterator watchIterator;
        if (isForSingle()) {
            watchIterator = configCenter.watch(ConfigConstant.rulePath + workerPath);
        } else {           
            watchIterator = configCenter.watchPrefix(ConfigConstant.rulePath);
        }
        while (watchIterator.hasNext()) {
            WatchUpdate watchUpdate = watchIterator.next();
            List<Event> eventList = watchUpdate.getEvents();
            KeyValue keyValue = eventList.get(0).getKv();
            logger.info("rule changed : " + keyValue);
            try {
                ruleChange(keyValue);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
}
/**
 * When the rule changes, update the cached rule
 */
private synchronized void ruleChange(KeyValue keyValue) {
    String appName = keyValue.getKey().toStringUtf8().replace(ConfigConstant.rulePath, "");
    if (StrUtil.isEmpty(appName)) {
        return;
    }
    String ruleJson = keyValue.getValue().toStringUtf8();
    List<KeyRule> keyRules = FastJsonUtils.toList(ruleJson, KeyRule.class);
    KeyRuleHolder.put(appName, keyRules);
}

The etcd.workerPath configuration determines whether the worker serves a single app exclusively. The default value is "default", which means the worker participates in the calculation for all app clients on etcd; otherwise it only computes for the specified app.

Etcd is used to watch for rule changes. For a shared worker the watched prefix is "/jd/rules/"; for a worker dedicated to one app the watched address is "/jd/rules/"+$etcd.workerPath

If the rules change, the rule cache stored locally for the corresponding app is updated, and the app's locally stored KV cache is cleared at the same time

KeyRuleHolder: rule cache local storage

Map<String, List<KeyRule>> RULE_MAP: a ConcurrentHashMap whose key/value are the appName and that app's rules
Difference from the client's KeyRuleHolder: the worker stores the rules of all apps, with one rule bucket per app, hence the map
CaffeineCacheHolder: key cache local storage

Map<String, Cache> CACHE_MAP: also a ConcurrentHashMap, whose key/value are the appName and the Caffeine cache holding that app's keys
Differences from the client's Caffeine usage: first, the worker has no LocalCache-style cache interface; second, on the client the map's key/value are the timeout duration and the cache bucket holding keys with that same timeout.
Runs asynchronously in the thread pool; because it keeps watching etcd, it runs continuously instead of once (a structural sketch of the two holders follows this list).
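A minimal structural sketch of the two worker-side holders described above; the generics follow the description here, while the field and method names are illustrative, and the snippet compiles against the project's KeyRule class.

import com.github.benmanes.caffeine.cache.Cache;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WorkerHoldersSketch {
    //appName -> rules configured for that app (KeyRuleHolder's RULE_MAP)
    private static final Map<String, List<KeyRule>> RULE_MAP = new ConcurrentHashMap<>();
    //appName -> Caffeine cache holding that app's key -> slidingWindow entries (CaffeineCacheHolder's CACHE_MAP)
    private static final Map<String, Cache<String, Object>> CACHE_MAP = new ConcurrentHashMap<>();

    public static void putRules(String appName, List<KeyRule> keyRules) {
        RULE_MAP.put(appName, keyRules);
        //When an app's rules change, its local key cache is also cleared, as described above.
        Cache<String, Object> cache = CACHE_MAP.get(appName);
        if (cache != null) {
            cache.invalidateAll();
        }
    }

    public static List<KeyRule> getRules(String appName) {
        return RULE_MAP.getOrDefault(appName, Collections.emptyList());
    }
}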

③ The third @PostConstruct: watchWhiteList()

/**
 * Start the callback listener, monitor the change of the whitelist, and only monitor the app you are in. The whitelist key does not participate in the hot key calculation and is ignored directly.
 */
@PostConstruct
public void watchWhiteList() {
    AsyncPool.asyncDo(() -> {
        // Get all whitelists from etcd config
        fetchWhite();
        KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.whiteListPath + workerPath);
        while (watchIterator.hasNext()) {
            WatchUpdate watchUpdate = watchIterator.next();
            logger.info("whiteList changed ");
            try {
                fetchWhite();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
}

Pull and monitor etcd whitelist key configuration, the address is /jd/whiteList/+$etcd.workerPath
The keys in the whitelist do not participate in the hot key calculation and are ignored directly.
Runs asynchronously in the thread pool; because it keeps watching etcd, it runs continuously instead of once (a parsing sketch follows this list).
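For illustration, a sketch of turning the raw whitelist value pulled from etcd into a lookup set; the comma-separated format assumed here is not confirmed by the source, and the project's fetchWhite() may parse a different representation (for example JSON).

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public final class WhiteListParserSketch {
    private WhiteListParserSketch() {
    }

    //Parses an assumed etcd value such as "sku_1,sku_2" into the in-memory whitelist set.
    //Keys found in this set are skipped by HotKeyFilter before they reach the producer queue.
    public static Set<String> parse(String rawEtcdValue) {
        if (rawEtcdValue == null || rawEtcdValue.isEmpty()) {
            return Set.of();
        }
        return Arrays.stream(rawEtcdValue.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}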
④ The fourth @PostConstruct:makeSureSelfOn()

/**
 * Check every once in a while, if you are still in etcd
 */
@PostConstruct
public void makeSureSelfOn() {
    //Enable upload worker information
    ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
    scheduledExecutorService.scheduleAtFixedRate(() -> {
        try {
            if (canUpload) {
                uploadSelfInfo();
            }
        } catch (Exception e) {
            //do nothing
        }
    }, 0, 5, TimeUnit.SECONDS);
}

Runs on a single-threaded scheduled executor at a fixed 5s interval
Periodically reports the local worker's hostName and ip:port to etcd as a kv at /jd/workers/+$etcd.workPath+"/"+$hostName, with an 8s lease
A canUpload switch controls whether the worker keeps renewing this lease. When the switch is off, the worker stops renewing, the kv above expires and etcd deletes the node, so the clients' polling loop detects that the worker list has changed (a renewal sketch follows this list).
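A sketch of that periodic self-registration; the path and the putAndGrant signature follow the snippets in this article, while the surrounding fields (hostName, port) and the exact value format are assumptions.

//Illustrative only: write hostName -> ip:port under /jd/workers/ with an 8s lease.
//The 5s schedule in makeSureSelfOn() keeps renewing it; once canUpload is false (or the
//worker crashes / pauses too long in GC), the lease expires, etcd removes the node, and
//clients polling the worker list notice the change.
private void uploadSelfInfoSketch() {
    String value = IpUtils.getIp() + ":" + port;
    configCenter.putAndGrant("/jd/workers/" + workerPath + "/" + hostName, value, 8);
}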
2) Push the hot key to the dashboard for storage: DashboardPusher

① The fifth @PostConstruct:uploadToDashboard()

@Component
public class DashboardPusher implements IPusher {
    /**
     * Staging queue for hot keys
     */
    private static LinkedBlockingQueue<HotKeyModel> hotKeyStoreQueue = new LinkedBlockingQueue<>();

    @PostConstruct
    public void uploadToDashboard() {
        AsyncPool.asyncDo(() -> {
            while (true) {
                try {
                    //Report to the dashboard once when either 1,000 keys have accumulated or 1 second has passed
                    List<HotKeyModel> tempModels = new ArrayList<>();
                    Queues.drain(hotKeyStoreQueue, tempModels, 1000, 1, TimeUnit.SECONDS);
                    if (CollectionUtil.isEmpty(tempModels)) {
                        continue;
                    }

                    // Push the hot key to the dashboard
                    DashboardHolder.flushToDashboard(FastJsonUtils.convertObjectToJSON(tempModels));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}

When the number of hot keys reaches 1000 or every 1s, the data of the hot keys is sent to the dashboard through the netty channel with the dashboard, and the data type is REQUEST_HOT_KEY
LinkedBlockingQueue<HotKeyModel> hotKeyStoreQueue: the staging queue for the hot keys the worker computes for the dashboard; every hot key pushed to the dashboard goes through it (an IPusher sketch follows)
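How DashboardPusher most plausibly implements the IPusher interface used later in newKey()/removeKey(); the method bodies below are an assumption consistent with the drain loop above, not copied from the project.

//Illustrative only: push() just enqueues; the @PostConstruct drain loop batches to the dashboard.
@Override
public void push(HotKeyModel model) {
    hotKeyStoreQueue.offer(model); //non-blocking enqueue; batching (1000 keys or 1s) happens in uploadToDashboard()
}

//remove() is a no-op here: as noted later, hot-key deletions are only pushed to clients, not to the dashboard.
@Override
public void remove(HotKeyModel model) {
}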
3) Push to each client server: AppServerPusher

① The sixth @PostConstruct:batchPushToClient()

public class AppServerPusher implements IPusher {
    /**
     * Staging queue for hot keys
     */
    private static LinkedBlockingQueue<HotKeyModel> hotKeyStoreQueue = new LinkedBlockingQueue<>();

    /**
     * The main difference from the push on the dashboard is that the app is pushed every 10ms, and the dashboard is pushed every 1s
     */
    @PostConstruct
    public void batchPushToClient() {
        AsyncPool.asyncDo(() -> {
            while (true) {
                try {
                    List<HotKeyModel> tempModels = new ArrayList<>();
                    // push every 10ms
                    Queues.drain(hotKeyStoreQueue, tempModels, 10, 10, TimeUnit.MILLISECONDS);
                    if (CollectionUtil.isEmpty(tempModels)) {
                        continue;
                    }
                    Map<String, List<HotKeyModel>> allAppHotKeyModels = new HashMap<>();
                    //Split out the hot key set of each app and stack by app
                    for (HotKeyModel hotKeyModel : tempModels) {
                        List<HotKeyModel> oneAppModels = allAppHotKeyModels.computeIfAbsent(hotKeyModel.getAppName(), (key) -> new ArrayList<>());
                        oneAppModels.add(hotKeyModel);
                    }
                    // Traverse all apps and push
                    for (AppInfo appInfo : ClientInfoHolder.apps) {
                        List<HotKeyModel> list = allAppHotKeyModels.get(appInfo.getAppName());
                        if (CollectionUtil.isEmpty(list)) {
                            continue;
                        }
                        HotKeyMsg hotKeyMsg = new HotKeyMsg(MessageType.RESPONSE_NEW_KEY);
                        hotKeyMsg.setHotKeyModels(list);

                        // Send the entire app
                        appInfo.groupPush(hotKeyMsg);
                    }
                    //After pushing, clean up unused memory in time
                    allAppHotKeyModels = null;
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}

It will be grouped according to the appName of the key, and then pushed through the channelGroup of the corresponding app
When the number of hot keys reaches 10 or every 10ms, the data of the hot keys is sent to the app through the netty channel with the app, and the data type is RESPONSE_NEW_KEY
LinkedBlockingQueue<HotKeyModel> hotKeyStoreQueue: the staging queue for the hot keys the worker computes for clients; every hot key pushed to clients goes through it (a groupPush sketch follows)
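A sketch of the per-app fan-out mentioned above, assuming AppInfo wraps the appName and a Netty ChannelGroup of that app's client channels; the names and method bodies are illustrative.

import io.netty.channel.Channel;
import io.netty.channel.group.ChannelGroup;
import io.netty.channel.group.DefaultChannelGroup;
import io.netty.util.concurrent.GlobalEventExecutor;

public class AppInfoSketch {
    private final String appName;
    //DefaultChannelGroup automatically removes channels once they are closed.
    private final ChannelGroup channelGroup = new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);

    public AppInfoSketch(String appName) {
        this.appName = appName;
    }

    public String getAppName() {
        return appName;
    }

    public void addChannel(Channel channel) {
        channelGroup.add(channel);
    }

    public void removeChannel(Channel channel) {
        channelGroup.remove(channel);
    }

    //One write is fanned out to every connected client of this app (RESPONSE_NEW_KEY messages here).
    public void groupPush(Object hotKeyMsg) {
        channelGroup.writeAndFlush(hotKeyMsg);
    }
}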
4) Client instance node processing: NodesServerStarter
① The seventh @PostConstruct:start()

public class NodesServerStarter {
    @Value("${netty.port}")
    private int port;
    private Logger logger = LoggerFactory.getLogger(getClass());
    @Resource
    private IClientChangeListener iClientChangeListener;
    @Resource
    private List<INettyMsgFilter> messageFilters;

    @PostConstruct
    public void start() {
        AsyncPool.asyncDo(() -> {
            logger.info("netty server is starting");
            NodesServer nodesServer = new NodesServer();
            nodesServer.setClientChangeListener(iClientChangeListener);
            nodesServer.setMessageFilters(messageFilters);
            try {
                nodesServer.startNettyServer(port);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }
}

Runs asynchronously in the thread pool and starts the worker's netty server that client instances connect to
The two injected dependencies, iClientChangeListener and messageFilters, are eventually handed to the netty message handler: iClientChangeListener handles channel-offline events and removes offline or timed-out channels from ClientInfoHolder, while messageFilters act as the processing filters for the netty messages received (chain of responsibility pattern)
② Dependent beans: IClientChangeListener iClientChangeListener

public interface IClientChangeListener {
    /**
     * discover new connections
     */
    void newClient(String appName, String channelId, ChannelHandlerContext ctx);
    /**
     * Client disconnected
     */
    void loseClient(ChannelHandlerContext ctx);
}

Manages clients: new connections (newClient, driven by netty's channelActive) and disconnections (loseClient, driven by netty's channelInactive())
The connection information of the client is mainly in the ClientInfoHolder

List<AppInfo> apps, where AppInfo mainly holds the appName and its corresponding channelGroup
Entries in apps are added via newClient and removed via loseClient (see the holder sketch below)
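A sketch of that bookkeeping, assuming apps is a concurrent list of AppInfo objects (as in the AppInfo sketch above); the exact signatures in the project may differ.

import io.netty.channel.Channel;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ClientInfoHolderSketch {
    public static final List<AppInfoSketch> apps = new CopyOnWriteArrayList<>();

    //Driven by the APP_NAME message (newClient): find or create the app's AppInfo and register the channel.
    public static void newClient(String appName, Channel channel) {
        for (AppInfoSketch appInfo : apps) {
            if (appInfo.getAppName().equals(appName)) {
                appInfo.addChannel(channel);
                return;
            }
        }
        AppInfoSketch appInfo = new AppInfoSketch(appName); //first client of this app
        appInfo.addChannel(channel);
        apps.add(appInfo);
    }

    //Driven by channelInactive (loseClient): removing from every group is enough, absent members are ignored.
    public static void loseClient(Channel channel) {
        for (AppInfoSketch appInfo : apps) {
            appInfo.removeChannel(channel);
        }
    }
}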
③ Dependent beans: List<INettyMsgFilter> messageFilters

/**
 * Filter the messages from netty
 * @author wuweifeng wrote on 2019-12-11
 * @version 1.0
 */
public interface INettyMsgFilter {
    boolean chain(HotKeyMsg message, ChannelHandlerContext ctx);
}

There are four implementation classes that filter the netty messages sent by clients to the worker; in other words, the four filters below receive and process the clients' netty messages (a dispatch sketch follows).
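A plausible dispatch loop for that chain of responsibility; the convention assumed here (not verified against the project) is that a filter returning false means the message was consumed and later filters are skipped.

import io.netty.channel.ChannelHandlerContext;

import java.util.List;

public class FilterDispatcherSketch {
    private final List<INettyMsgFilter> messageFilters;

    public FilterDispatcherSketch(List<INettyMsgFilter> messageFilters) {
        this.messageFilters = messageFilters;
    }

    //Walk the injected filters in order: HeartBeatFilter, AppNameFilter, HotKeyFilter, KeyCounterFilter.
    public void dispatch(HotKeyMsg message, ChannelHandlerContext ctx) {
        for (INettyMsgFilter filter : messageFilters) {
            boolean continueChain = filter.chain(message, ctx);
            if (!continueChain) {
                return; //assumed convention: false = handled, stop the chain
            }
        }
    }
}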

④ The type of each message processing: MessageType

APP_NAME((byte) 1),
REQUEST_NEW_KEY((byte) 2),
RESPONSE_NEW_KEY((byte) 3),
REQUEST_HIT_COUNT((byte) 7), //Hit rate
REQUEST_HOT_KEY((byte) 8), //hot key, worker -> dashboard
PING((byte) 4), PONG((byte) 5),
EMPTY((byte) 6);
Order 1: HeartBeatFilter

When the message type is PING, return PONG to the corresponding client instance
Sequence 2: AppNameFilter

When the message type is APP_NAME, the connection between the client and the worker is successfully established, and then the newClient method of iClientChangeListener is called to add the apps metadata information
Sequence 3: HotKeyFilter

Handle received messages of type REQUEST_NEW_KEY
First add 1 to the HotKeyFilter.totalReceiveKeyCount atomic class, which represents the total number of keys received by the worker instance
The publishMsg method sends the message to the producer for distribution and consumption through the self-built producer-consumer model (KeyProducer, KeyConsumer).
The received HotKeyMsg contains a list of HotKeyModels
First determine whether the key in the HotKeyModel is in the whitelist, if so, skip it, otherwise send the HotKeyModel through the KeyProducer
Sequence 4: KeyCounterFilter

Handles messages of type REQUEST_HIT_COUNT
This filter exists specifically so the dashboard can count keys, so the appName is set directly to the appName configured on the worker
The data source for this filter is the client's NettyKeyPusher#sendCount(String appName, List list); by default the client accumulates 10s of data before sending (the 10s is configurable, as discussed in the client section)
A new KeyCountItem(appName, models.get(0).getCreateTime(), models) is built and put into the blocking queue LinkedBlockingQueue COUNTER_QUEUE, and CounterConsumer then consumes and processes it; the consumption logic is single-threaded
CounterConsumer: the hot key statistics consumer
Runs single-threaded in the shared thread pool
Takes data from the blocking queue COUNTER_QUEUE and publishes the key statistics to etcd at /jd/keyHitCount/+ appName + "/" + IpUtils.getIp() + "-" + System.currentTimeMillis(); this path belongs to the client cluster (or "default") served by the worker and stores the clients' hotKey hit counts and total access counts, which the dashboard subscribes to for display (a sketch follows)
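A sketch of the etcd write CounterConsumer performs, following the path described above; the getter names on KeyCountItem, the JSON value layout and the 60s lease are assumptions for illustration.

//Illustrative only: publish one batch of hit-count statistics for the dashboard to subscribe to.
private void publishHitCountSketch(KeyCountItem item) {
    String path = "/jd/keyHitCount/" + item.getAppName() + "/"
            + IpUtils.getIp() + "-" + System.currentTimeMillis();
    //Lease so stale statistics nodes clean themselves up (60s is an assumed value).
    configCenter.putAndGrant(path, FastJsonUtils.convertObjectToJSON(item.getModels()), 60);
}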
2. Three scheduled tasks: 3 @Scheduled

1) Scheduled task 1: EtcdStarter#pullRules()

/**
 * Pull every 1 minute, all the rules of the app
 */
@Scheduled(fixedRate = 60000)
public void pullRules() {
    try {
        if (isForSingle()) {
            String value = configCenter.get(ConfigConstant.rulePath + workerPath);
            if (!StrUtil.isEmpty(value)) {
                List<KeyRule> keyRules = FastJsonUtils.toList(value, KeyRule.class);
                KeyRuleHolder.put(workerPath, keyRules);
            }
        } else {
            List<KeyValue> keyValues = configCenter.getPrefix(ConfigConstant.rulePath);
            for (KeyValue keyValue : keyValues) {
                ruleChange(keyValue);
            }
        }
    } catch (StatusRuntimeException ex) {
        logger.error(ETCD_DOWN);
    }
}

Every minute, pull the rules under the etcd path /jd/rules/; if the rules of the app served by this worker (or the default rules) have changed, update the rule cache and clear the local key cache for that appName

2) Scheduled task 2: EtcdStarter#uploadClientCount()

/**
     * Upload the number of clients to etcd every 10 seconds
     */
    @Scheduled(fixedRate = 10000)
    public void uploadClientCount() {
        try {
            String ip = IpUtils.getIp();
            for (AppInfo appInfo : ClientInfoHolder.apps) {
                String appName = appInfo.getAppName();
                int count = appInfo.size();
                //Even the full gc cannot exceed 3 seconds, because the expiration time given here is 13s. Since the scheduled task is executed every 10s, if the full gc or the time reported to etcd exceeds 3s,
                //The number of clients cannot be queried on the dashboard
                configCenter.putAndGrant(ConfigConstant.clientCountPath + appName + "/" + ip, count + "", 13);
            }
            configCenter.putAndGrant(ConfigConstant.caffeineSizePath + ip, FastJsonUtils.convertObjectToJSON(CaffeineCacheHolder.getSize()), 13);
            //Report QPS per second (number of received keys, number of processed keys)
            String totalCount = FastJsonUtils.convertObjectToJSON(new TotalCount(HotKeyFilter.totalReceiveKeyCount.get(), totalDealCount.longValue()));
            configCenter.putAndGrant(ConfigConstant.totalReceiveKeyCount + ip, totalCount, 13);
            logger.info(totalCount + " expireCount:" + expireTotalCount + " offerCount:" + totalOfferCount);
            //If it is a stable application that always has keys sent, it is recommended to enable this monitoring to avoid possible network failures
            if (openMonitor) {
                checkReceiveKeyCount();
            }
//            configCenter.putAndGrant(ConfigConstant.bufferPoolPath + ip, MemoryTool.getBufferPool() + "", 10);
        } catch (Exception ex) {
            logger.error(ETCD_DOWN);
        }
    }

Every 10s the worker reports the client information it maintains to etcd so the dashboard can query and display it: /jd/count/ holds the number of clients, /jd/caffeineSize/ the Caffeine cache sizes, and /jd/totalKeyCount/ the total number of keys this worker has received and processed
As the code shows, all of these etcd nodes are given a 13s lease while the task runs every 10s, so if a full GC or the write to etcd takes more than 3s, the client statistics can no longer be queried on the dashboard
If no keys are received for a long time, the network is judged to be unhealthy and the worker stops renewing the lease on its /jd/workers/+$workerPath node; since clients poll that path for changes, they can then reconnect to a healthy worker or drop the disconnected one (a sketch of this check follows)
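A sketch of that self-protection check, assuming a simple "has totalReceiveKeyCount moved since the last run" comparison; only totalReceiveKeyCount and canUpload appear in the article, the other fields and the 6-round threshold are assumptions.

private long lastReceiveCount = 0;
private int idleRounds = 0;

//Called from uploadClientCount() every 10s when openMonitor is enabled (assumed wiring).
private void checkReceiveKeyCountSketch() {
    long current = HotKeyFilter.totalReceiveKeyCount.get();
    if (current == lastReceiveCount) {
        idleRounds++;
    } else {
        idleRounds = 0;
        lastReceiveCount = current;
    }
    //About a minute with no keys at all: stop renewing the 8s lease on /jd/workers/...,
    //so clients drop this worker and reconnect elsewhere.
    if (idleRounds >= 6) {
        canUpload = false;
    }
}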
3) Scheduled task 3: EtcdStarter#fetchDashboardIp()

/**
 * Get the address of the dashboard every 30 seconds
 */
@Scheduled(fixedRate = 30000)
public void fetchDashboardIp() {
    try {
        //Get DashboardIp
        List<KeyValue> keyValues = configCenter.getPrefix(ConfigConstant.dashboardPath);
        // is empty, give a warning
        if (CollectionUtil.isEmpty(keyValues)) {
            logger.warn("very important warn !!! Dashboard ip is null!!!");
            return;
        }
        String dashboardIp = keyValues.get(0).getValue().toStringUtf8();
        NettyClient.getInstance().connect(dashboardIp);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Every 30s, pull the dashboard connection ip from etcd under the prefix /jd/dashboard/; if DashboardHolder.hasConnected shows the worker is not connected to the dashboard, re-establish the netty channel between the worker and the dashboard

3. Self-built producer-consumer model (KeyProducer, KeyConsumer)

The general producer-consumer model consists of three elements: producer, consumer, and message storage queue
Here the message storage queue is QUEUE in DispatcherConfig, a LinkedBlockingQueue with a default capacity of 2,000,000 (200W), as sketched below
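A minimal sketch of that buffering queue and how the consumers could be wired to it; the 2,000,000 capacity and the "thread.count" consumer count come from this article, while the wiring itself is illustrative.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherConfigSketch {
    //Shared buffer between the netty IO threads (producer side) and the KeyConsumer threads.
    public static final LinkedBlockingQueue<HotKeyModel> QUEUE = new LinkedBlockingQueue<>(2_000_000);

    //One KeyConsumer per configured "thread.count"; each loops on QUEUE.take() in beginConsume().
    public void startConsumers(int threadCount, IKeyListener keyListener) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            KeyConsumer consumer = new KeyConsumer();
            consumer.setKeyListener(keyListener);
            pool.submit(consumer::beginConsume);
        }
    }
}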

1)KeyProducer

@Component
public class KeyProducer {
    public void push(HotKeyModel model, long now) {
        if (model == null || model.getKey() == null) {
            return;
        }
        //Outdated messages 5 seconds ago will not be processed
        if (now - model.getCreateTime() > InitConstant.timeOut) {
            expireTotalCount.increment();
            return;
        }
        try {
            QUEUE.put(model);
            totalOfferCount.increment();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

}

Checks whether the received HotKeyModel is older than the time configured by "netty.timeOut"; if so, increments the expireTotalCount expired counter and returns without queueing it

2)KeyConsumer

public class KeyConsumer {

private IKeyListener iKeyListener;
public void setKeyListener(IKeyListener iKeyListener) {
    this.iKeyListener = iKeyListener;
}
public void beginConsume() {
    while (true) {
        try {
            //It can be seen from here that the producer-consumer model here is essentially a pull mode. The reason why EventBus is not used is because a queue is needed for buffering
            HotKeyModel model = QUEUE.take();
            if (model.isRemove()) {
                iKeyListener.removeKey(model, KeyEventOriginal.CLIENT);
            } else {
                iKeyListener.newKey(model, KeyEventOriginal.CLIENT);
            }
            // After processing, increment the number by 1
            totalDealCount.increment();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

}

@Override
public void removeKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
   //key in the cache, appName+keyType+key
   String key = buildKey(hotKeyModel);
   hotCache.invalidate(key);
   CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);
   //Push all clients to delete
   hotKeyModel.setCreateTime(SystemClock.now());
   logger.info(DELETE_KEY_EVENT + hotKeyModel.getKey());
   for (IPusher pusher : iPushers) {
       //You can see here that the netty message to delete the hot key is only sent to the client, not to the dashboard (remove in DashboardPusher is an empty method)
       pusher.remove(hotKeyModel);
   }
}
    @Override
    public void newKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
        //key in cache
        String key = buildKey(hotKeyModel);
        // Determine if it is just hot soon
        //The validity period of the caffeine corresponding to hotCache is 5s, that is to say, the key will be saved for 5s, and the same hotKey will not be processed repeatedly within 5s.
        //After all, hotKey is instantaneous traffic, which can avoid repeated pushes to the client and dashboard within these 5s, and avoid invalid network overhead
        Object o = hotCache.getIfPresent(key);
        if (o != null) {
            return;
        }

        //********** watch here ************//
        //This method is called concurrently by InitConstant.threadCount consumer threads, so there is a potential multi-threading issue
        //The addCount call below is locked, so the accumulation for a key is atomic: nothing is added twice or lost, and the key becomes hot when the configured threshold is reached
        //For example, with a threshold of 2 and several threads accumulating, the not-hot -> hot transition is still correct: thread1 adds 1, thread2 adds 1, and thread2's call returns true and starts the push
        //But in an extreme case, say the threshold is 10 and the current count is 9: thread1 reaches this point, adds 1 and gets true; thread2 also reaches this point, adds 1 (now 11) and also gets true
        //The key then takes the else branch below twice, i.e. it is pushed twice
        //The root cause is that the hotCache.getIfPresent(key) check above is not re-evaluated under concurrency, so two threads holding the same key can both reach the addCount step
        //The test code is in the TestBlockQueue class; running it directly shows the same key becoming hot twice at the same time

        //So should this be fixed? NO. 1. The preconditions are extremely hard to hit; even with concurrency as high as JD.com's, the same key has never been observed online to trigger two pushes in a row
        //2. Even if it is triggered, the consequence is acceptable: just 2 pushes, no impact, and the client does not notice. Fixing it would require locking the slidingWindow instance, which has a real cost

        //So as long as a key is never over-counted, counting a little less is fine: a genuinely hot key is accessed so frequently that missing a few counts does not matter, whereas over-counting could wrongly promote a non-hot key to hot
        SlidingWindow slidingWindow = checkWindow(hotKeyModel, key);//From here, each key of each app will correspond to a sliding window
        // see if it's hot
        boolean hot = slidingWindow.addCount(hotKeyModel.getCount());

        if (!hot) {
            //If there is no hot, put it again, the cache will automatically refresh the expiration time
            CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).put(key, slidingWindow);
        } else {
            //The value put in here is 1, because hotCache is used to store the hotKey just generated
            //The validity period of the caffeine corresponding to hotCache is 5s, that is to say, the key will be saved for 5s, and the same hotKey will not be processed repeatedly within 5s.
            //After all, hotKey is instantaneous traffic, which can avoid repeated pushes to the client and dashboard within these 5s, and avoid invalid network overhead
            hotCache.put(key, 1);

            // delete the key
            //This key is actually the key for slidingWindow, its combination logic is appName+keyType+key, not the hotKey pushed to client and dashboard
            CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);

            // start push
            hotKeyModel.setCreateTime(SystemClock.now());

            // When the switch is on, print the log. Close the log during the big promotion, it will not be printed
            if (EtcdStarter.LOGGER_ON) {
                logger.info(NEW_KEY_EVENT + hotKeyModel.getKey());
            }

            // Push to each client and etcd respectively
            for (IPusher pusher : iPushers) {
                pusher.push(hotKeyModel);
            }

        }

    }

The "thread.count" configuration sets the number of consumers; multiple consumers consume the single QUEUE together
The producer-consumer model is essentially a pull model. The reason why EventBus is not used is because a queue is required for buffering.
Processing branches on whether the HotKeyModel's message type is a delete:

delete message type
Builds the key used in Caffeine (newkey) from the HotKeyModel as appName+keyType+key; this newkey is mainly used to map to the key's slidingWindow
Invalidates the newkey entry in hotCache. hotCache stores the freshly generated hot keys as newKey -> 1; its Caffeine entries live for 5s, meaning the same hotKey is not processed again within 5s. Since a hot key is instantaneous traffic, this avoids repeated pushes to the client and dashboard within those 5s and the pointless network overhead
Invalidates the newKey in the app's Caffeine cache in CaffeineCacheHolder, which stores the slidingWindow
Push to all client instances corresponding to the HotKeyModel to let the client delete the HotKeyModel
non-delete message type
Builds the key used in Caffeine (newkey) from the HotKeyModel as appName+keyType+key; this newkey is mainly used to map to the key's slidingWindow
Use hotCache to determine whether the newkey has just been hot, and if so, return
Calculate and determine whether the key is a hotKey according to the sliding time window (you can learn the design of the sliding time window here), and return or generate the sliding window corresponding to the newKey
If the hot key standard is not met
Re-put through CaffeineCacheHolder, the cache will automatically refresh the expiration time
If the hot key criterion is met
Puts newkey into hotCache with value 1, marking it as a freshly generated hot key
Deletes the sliding window cache corresponding to newkey in CaffeineCacheHolder
Pushes a netty message to the clients of the app in the hotKeyModel announcing the new hotKey, so the clients cache it locally; note the pushed message only marks the key as hot, the client's local cache does not yet hold the key's value, which must be assigned through the JdHotKeyStore API
Push the hotKeyModel to the dashboard, indicating that a new hotKey is generated
3) Calculate the design of hot key sliding window
Due to space limitations, I won't go into details here, but directly post the explanatory article written by the project author: Java Simple Implementation of Sliding Window
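For readers who want the gist without leaving this page, here is a minimal, illustrative bucketed sliding-window counter. It is not the project's SlidingWindow implementation, only a sketch of the idea: split the window into fixed time buckets, lazily reset expired buckets, and report "hot" once the sum over the live buckets reaches the threshold (addCount is synchronized, matching the locking mentioned in the comments above).

import java.util.concurrent.atomic.AtomicLongArray;

public class SimpleSlidingWindow {
    private final int bucketCount;              //number of buckets in the window
    private final long bucketMillis;            //width of a single bucket in ms
    private final long threshold;               //hot-key threshold within the window
    private final AtomicLongArray counts;       //per-bucket counters
    private final AtomicLongArray bucketStarts; //start time of the period each bucket currently holds

    public SimpleSlidingWindow(int windowSeconds, int bucketCount, long threshold) {
        this.bucketCount = bucketCount;
        this.bucketMillis = windowSeconds * 1000L / bucketCount;
        this.threshold = threshold;
        this.counts = new AtomicLongArray(bucketCount);
        this.bucketStarts = new AtomicLongArray(bucketCount);
    }

    //Adds delta to the current bucket and reports whether the window total reached the threshold.
    public synchronized boolean addCount(long delta) {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / bucketMillis) % bucketCount);
        long bucketStart = now - (now % bucketMillis);
        //Lazily reset a bucket that still holds counts from an older rotation of the ring.
        if (bucketStarts.get(idx) != bucketStart) {
            bucketStarts.set(idx, bucketStart);
            counts.set(idx, 0);
        }
        counts.addAndGet(idx, delta);

        //Sum only the buckets whose period still falls inside the window.
        long windowLowerBound = now - (long) bucketCount * bucketMillis;
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStarts.get(i) >= windowLowerBound) {
                total += counts.get(i);
            }
        }
        return total >= threshold;
    }
}

For example, new SimpleSlidingWindow(2, 10, 500) would mark a key as hot once it accumulates 500 hits within a 2-second window.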

3.3.4 dashboard side

There is not much to say here: the dashboard connects to etcd and MySQL and does basic CRUD. JD's front-end framework is quite convenient, rendering a table simply by returning a list.

4 Summary
The second part of the article explained the causes of Redis data skew and the corresponding countermeasures, then went deeper into hotspot issues, summarizing the two key problems of discovering hot keys and handling hot keys.

The third part of the article covered a solution to the hot key problem, a source code analysis of JD's open source hotkey, explained comprehensively from the client side, the worker side and the dashboard side, including its design, usage and underlying principles.

I hope this article helps everyone not only learn the relevant methodology but also understand its concrete implementation, so that we can learn and grow together.

Author: Li Peng