Application of Compute-Storage Separation in Message Queues


Editor's note from Yunmei:

With the continuous development of the Internet, handling big data at high concurrency is no longer a distant concern; it is a necessary capability for most projects, and within it message queuing is an almost indispensable skill. Many mature message queue products already exist. This article introduces JCQ, a message queue developed in-house at JD Zhilian Cloud.

_The full name of JCQ is JD Cloud Message Queue, a distributed message middleware with cloud-native features developed by JD Zhilian Cloud._ JCQ was designed from the outset to adapt message middleware to the cloud, with characteristics including high availability, data reliability, physical isolation of replicas, service autonomy, health status reporting, little or no operations burden, container deployment, elastic scaling, tenant isolation, pay-as-you-go billing, and integration with the cloud account and authorization systems.

I. The evolution of JCQ

Development of JCQ 1.0 began as early as mid-2017, and it was officially launched for sale as GA in November 2018. In version 1.0, however, a topic was confined to a single server, which could not satisfy users with very large topics.

Version 2.0 therefore focused on scalability and was officially launched in April 2019. Its main new features were the expansion and shrinkage of topics, load balancing of hot topics across brokers, and traffic transfer away from hot brokers.

In July 2019, JCQ made another major architectural evolution, the separation of computing and storage, released as JCQ 3.0 at the end of 2019. This separation brought clear architectural benefits and resolved many day-to-day pain points.

The advantages of this evolution, and the pain points it solved, are described in detail below:

1. Effectively control the impact scope of upgrades

In JCQ 2.0, the computing module and the storage module ran in the same process, so upgrading the computing module forced a restart of the storage module as well. Restarting the storage module is a relatively heavy operation: it must load a large amount of data, cross-check the message data against the message index, truncate dirty data, and so on. Repairing a small bug in the computing module thus often required this very heavy storage restart, even though in practice most upgrades are driven by computing-module updates or bug fixes.

To solve this problem, JCQ 3.0 deploys the computing module and the storage module independently, with the two communicating through RPC, so upgrading one no longer affects the other. As shown in the figure below:

(Figure: the computing node and storage node deployed as independent services connected via RPC)

The computing node, the broker, is responsible only for business logic: message production and push, authentication and authorization, rate limiting, congestion control, client load balancing, and so on. It is a stateless service, so it is lightweight and fast to upgrade.

The storage node, the store, is responsible only for writing data, synchronizing replicas, and reading data. Because its business logic is simple and its functionality stable, it rarely needs to change or be upgraded except for optimizations.
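The division of responsibilities above can be sketched as two independent services with an RPC boundary between them. This is a minimal illustration, not JCQ's actual API: the class and method names are invented, and the RPC call is modeled as a plain in-process method call.

```python
# Minimal sketch of the broker/store split: the stateless broker handles
# business logic and delegates all persistence to the store across an RPC
# boundary (modeled here as a direct method call). All names are illustrative.

class Store:
    """Storage node: appends to and reads from the message log (replication elided)."""
    def __init__(self):
        self._log = []  # append-only message log

    def append(self, topic, payload):
        self._log.append((topic, payload))
        return len(self._log) - 1  # offset of the written message

    def read(self, offset):
        return self._log[offset]

class Broker:
    """Stateless computing node: auth, rate limiting, etc. live here."""
    def __init__(self, store):
        self.store = store  # would be an RPC stub in a real deployment

    def produce(self, tenant, topic, payload):
        if not self._authorized(tenant, topic):   # authentication/authorization hook
            raise PermissionError(tenant)
        return self.store.append(topic, payload)  # delegate persistence to the store

    def _authorized(self, tenant, topic):
        return True  # placeholder for real auth logic

store = Store()
broker = Broker(store)
offset = broker.produce("tenant-a", "orders", b"msg-1")
```

Because the Broker holds no message state, restarting or upgrading it skips the heavy log-reload and dirty-data checks that only the Store must perform.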

2. Independent deployment breaks hardware limitations

JCQ is a shared (multi-tenant) message middleware: users apply for topics with different TPS specifications and are unaware of hardware metrics such as CPU, memory, and disk. The JCQ service provider therefore has to consider how to use these hardware resources efficiently.

JCQ is deployed in containers and comprises many kinds of components with diverse hardware requirements, of which the computing module and the storage module consume the most resources. In JCQ 2.0 the two modules were deployed together, so machine selection had to account for CPU, memory, and disk at once. The resulting narrow machine profile made it difficult to co-deploy with other product lines, and even within the same resource pool, scheduling could fail purely because of scheduling order. For example, a machine's remaining resources might be exactly enough to schedule container A, which needs a large disk; but if container B is scheduled onto that machine first, the remainder can no longer fit container A, and the machine's disk is wasted.
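The scheduling-order problem above can be made concrete with a toy first-fit placement. The machine sizes and container demands below are made up purely for illustration; real schedulers are far more sophisticated.

```python
# Toy illustration of the scheduling-order problem: with greedy first-fit
# placement, whether the disk-hungry container A finds a home depends on
# whether the cpu-hungry container B was scheduled first. Numbers are made up.

def first_fit(machines, containers):
    """Place each container on the first machine with enough free resources.
    Returns the list of containers that could not be placed anywhere."""
    free = [list(m) for m in machines]  # copy so the input is untouched
    unplaced = []
    for c in containers:
        for m in free:
            if all(need <= have for need, have in zip(c, m)):
                for i, need in enumerate(c):
                    m[i] -= need
                break
        else:
            unplaced.append(c)
    return unplaced

machines = [(8, 1000), (8, 100)]  # (cpu cores, disk GB); only machine 0 has a big disk
a = (4, 900)                      # container A: needs a large disk
b = (6, 50)                       # container B: cpu-heavy, small disk

both_fit = first_fit(machines, [a, b])  # A first: A -> machine 0, B -> machine 1
b_first = first_fit(machines, [b, a])   # B first grabs machine 0's CPU; A cannot fit
```

In the second ordering, machine 0's large disk sits idle even though capacity for both containers exists, which is exactly the waste the paragraph above describes.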

After JCQ 3.0, the computing node (broker) and the storage node (store) are deployed independently. Each component can choose the machine profile suited to its own workload and be deployed into the corresponding resource pool. In this way, JCQ can be deployed alongside other products and share the resource pool's overall capacity watermark, instead of bearing that watermark alone.

(Figure: brokers and stores deployed into separate, appropriately sized resource pools shared with other products)

3. Cost reduction from the architecture improvement

In JCQ 3.0 the computing node (broker) is a stateless service, so master-slave switchover is relatively light and failover completes in seconds; deployment also takes physical anti-affinity into account, such as cross-rack and cross-AZ placement. We can therefore make a trade-off between availability and resource cost: for example, using M:1 cold standby for high availability instead of a 1:1 ratio, thereby saving hardware resources.
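The saving from M:1 standby is simple arithmetic; the numbers below are illustrative, not JCQ's actual fleet sizes.

```python
# Illustrative cost comparison: with stateless brokers and fast failover,
# one cold-standby node can back M primaries instead of one each.
import math

def standby_nodes(primaries, m):
    """Standby nodes needed at an M:1 standby ratio."""
    return math.ceil(primaries / m)

primaries = 12
one_to_one = standby_nodes(primaries, 1)  # classic 1:1 cold standby
m_to_one = standby_nodes(primaries, 4)    # M:1 with M = 4
```

With 12 primaries, the 1:1 scheme needs 12 standby nodes while the 4:1 scheme needs only 3, a 75% reduction in standby hardware at the cost of weaker protection against simultaneous failures.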

(Figure: M:1 cold-standby deployment of brokers across racks and AZs)

4. Solving Raft performance problems

At the start of the JCQ 1.0 design, the Raft algorithm was adopted to provide high availability and data consistency. A message log and a Raft log share many characteristics, such as sequential writes, random reads, and hot data concentrated at the tail, so using the Raft log directly as the message log is very natural.

During JCQ's evolution, we also ran into performance problems inherent to Raft, such as sequential replication, sequential commit, and processing steps that can only run on a single thread. The most direct and effective remedy is to increase the number of Raft groups: within a certain order of magnitude, concurrency capacity grows linearly with the number of groups. This approach is known as Multi-Raft, as shown in the figure below:

(Figure: Multi-Raft layout, with four Raft groups spread across StoreNode processes)

In the figure above, each StoreNode is an independent process hosting four logical Raft groups (the orange nodes are the group leaders). The groups are independent of one another, so replication and commit can proceed in parallel across groups.

Because of the extensive use of NIO, these Raft groups can share a communication thread pool, so increasing the number of groups does not cause thread resources to grow linearly.
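The two properties just described, per-group ordering with cross-group parallelism over a shared thread pool, can be sketched as follows. This is a schematic of the Multi-Raft idea only: replication and consensus are elided, and the names are invented.

```python
# Sketch of Multi-Raft concurrency: each group's log is strictly ordered
# (a per-group lock stands in for sequential commit), while different groups
# append in parallel on one shared thread pool rather than one thread each.
from concurrent.futures import ThreadPoolExecutor
import threading

class RaftGroup:
    """One logical consensus group; its own log is strictly ordered."""
    def __init__(self, gid):
        self.gid = gid
        self.log = []
        self._lock = threading.Lock()  # enforces per-group sequential commit

    def append(self, entry):
        with self._lock:
            self.log.append(entry)
            return len(self.log) - 1   # offset within this group's log

# One StoreNode process hosting four groups, sharing one communication pool.
groups = [RaftGroup(i) for i in range(4)]
pool = ThreadPoolExecutor(max_workers=8)  # shared pool, not one thread per group

futures = [pool.submit(groups[i % 4].append, f"msg-{i}") for i in range(40)]
offsets = [f.result() for f in futures]
pool.shutdown()
```

Adding more RaftGroup instances raises cross-group parallelism without adding threads, which mirrors the claim that thread resources need not grow with the group count.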

5. Fast failure recovery and lightweight load balancing

In JCQ 3.0 the broker is a lightweight, stateless service. Master-slave switchover and failure recovery are lighter than in 2.0, so external service capability is restored faster.

Meanwhile, the broker abstracts producer and consumer connection requests as PubTasks and SubTasks (collectively referred to as tasks below). A task is very lightweight: it only describes the mapping between a client and a broker, and it is scheduled and managed centrally by the metadata manager. Transferring a task requires only modifying its content; the client then reconnects to the new broker.

Generally speaking, the broker's main bottleneck is network bandwidth. Each broker periodically measures its inbound and outbound network traffic and reports it to the management node, the manager. The manager checks these figures against inbound, outbound, and total bandwidth thresholds; when a threshold is exceeded, it transfers the corresponding tasks to less-loaded brokers according to configured policies and notifies the affected producers and consumers. On receiving the notification, producers and consumers fetch the task's routing information again and automatically reconnect to the new broker to continue producing and consuming.
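The threshold-and-transfer loop above can be sketched as a small rebalancing function. The threshold value, traffic figures, and the "move to the least-loaded broker" policy are simplified stand-ins for JCQ's real scheduling logic.

```python
# Sketch of traffic-based task transfer: the manager compares each broker's
# reported traffic against a bandwidth threshold and moves tasks from hot
# brokers to the currently least-loaded one. Policy and numbers are illustrative.

def rebalance(brokers, threshold_mbps):
    """brokers: {name: {"traffic": float, "tasks": [{"id": str, "mbps": float}]}}.
    Returns a list of (task_id, from_broker, to_broker) transfer decisions."""
    transfers = []
    for name, b in brokers.items():
        while b["traffic"] > threshold_mbps and b["tasks"]:
            target = min(brokers, key=lambda n: brokers[n]["traffic"])
            if target == name:
                break  # nowhere better to move the task
            task = b["tasks"].pop()
            b["traffic"] -= task["mbps"]
            brokers[target]["traffic"] += task["mbps"]
            brokers[target]["tasks"].append(task)
            transfers.append((task["id"], name, target))
    return transfers

brokers = {
    "broker-1": {"traffic": 120.0, "tasks": [{"id": "sub-a", "mbps": 40.0},
                                             {"id": "pub-b", "mbps": 50.0}]},
    "broker-2": {"traffic": 30.0, "tasks": []},
}
moves = rebalance(brokers, threshold_mbps=100.0)
```

Here broker-1 exceeds the 100 Mbps threshold, so one task is transferred to broker-2; after the move both brokers are back under the threshold and the loop stops.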

(Figure: the manager transferring tasks from a hot broker to a less-loaded broker)

6. High fan-out

Imagine a large topic with N consumption groups. The total consumption TPS is then N times the total production TPS, and each additional consumption group increases total consumption linearly. Beyond a certain number of consumption groups, a single broker's network card bandwidth can no longer sustain this fan-out; a single server simply cannot solve the problem.

In JCQ 3.0, the SubTasks of these different consumption groups can be spread across several brokers, each responsible for a subset. A broker prefetches messages from the store and pushes them to its consumers. In this way, multiple brokers jointly carry the traffic of all consumption groups, cooperating to provide high fan-out capability.
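The bandwidth math behind this fan-out scheme can be sketched as follows. The round-robin assignment and the traffic numbers are illustrative assumptions, not JCQ's actual placement policy.

```python
# Sketch of fan-out distribution: SubTasks for N consumption groups are
# spread across brokers so that no single broker's NIC must carry all N
# copies of the topic's traffic. Assignment here is simple round-robin.

def assign_subtasks(consumer_groups, brokers):
    """Map each consumption group's SubTask to a broker, round-robin."""
    return {group: brokers[i % len(brokers)]
            for i, group in enumerate(consumer_groups)}

def per_broker_mbps(assignment, topic_mbps):
    """Outbound bandwidth each broker serves (one copy of the topic per group)."""
    load = {}
    for group, broker in assignment.items():
        load[broker] = load.get(broker, 0) + topic_mbps
    return load

groups = [f"group-{i}" for i in range(6)]  # N = 6 consumption groups
assignment = assign_subtasks(groups, ["broker-1", "broker-2", "broker-3"])
load = per_broker_mbps(assignment, topic_mbps=100)
```

With a 100 Mbps topic and six groups, a single broker would have to push 600 Mbps; spread over three brokers, each serves only 200 Mbps.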

7. Support for multiple storage engines

A notable characteristic of message middleware is that in most scenarios the hot data sits at the tail of the log, while backtracking to messages from days ago is uncommon. Data therefore naturally divides into hot and cold.

The JCQ computing node introduces a storage abstraction layer, StoreBridge, which can plug in different storage engines: a remote Raft cluster, the distributed file system WOS, or S3. Better still, it can periodically offload cold data from expensive local disks to a cheaper storage engine.
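A StoreBridge-style abstraction can be sketched as a common read/write interface over pluggable engines, plus a periodic offload of old offsets to the cheap engine. The engine classes, the hot-window rule, and all names below are illustrative assumptions, not JCQ's implementation.

```python
# Sketch of a StoreBridge-style abstraction: one interface over pluggable
# engines, with cold segments offloaded from the "hot" (local-disk) engine
# to a cheap one. For brevity, offload peeks at DictEngine's internal dict.
from abc import ABC, abstractmethod

class StorageEngine(ABC):
    @abstractmethod
    def write(self, offset, data): ...
    @abstractmethod
    def read(self, offset): ...

class DictEngine(StorageEngine):
    """Stand-in for any concrete engine (local disk, WOS, S3, ...)."""
    def __init__(self):
        self.data = {}
    def write(self, offset, data):
        self.data[offset] = data
    def read(self, offset):
        return self.data[offset]

class StoreBridge:
    """Routes reads to the hot or cold engine; offloads old offsets to cold."""
    def __init__(self, hot, cold, hot_window):
        self.hot, self.cold = hot, cold
        self.hot_window = hot_window
        self.tail = -1           # offset of the newest message
        self.cold_boundary = -1  # everything <= this lives in cold storage

    def append(self, data):
        self.tail += 1
        self.hot.write(self.tail, data)
        return self.tail

    def offload(self):
        """Move messages older than the hot window to the cheap engine."""
        boundary = self.tail - self.hot_window
        for off in range(self.cold_boundary + 1, boundary + 1):
            self.cold.write(off, self.hot.data.pop(off))
        self.cold_boundary = max(self.cold_boundary, boundary)

    def read(self, offset):
        engine = self.hot if offset > self.cold_boundary else self.cold
        return engine.read(offset)

bridge = StoreBridge(DictEngine(), DictEngine(), hot_window=2)
for i in range(5):
    bridge.append(f"msg-{i}")
bridge.offload()  # offsets 0-2 move to cold storage; 3-4 stay hot
```

Readers backtracking to old offsets transparently hit the cold engine, while tail consumers keep reading from the expensive, fast hot engine.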

(Figure: StoreBridge connecting the computing node to multiple storage engines)

8. Side effects

Compared with JCQ 2.0, communication between the computing node and the storage node changes from in-process interface calls to RPC calls, which adds some latency. In our tests the added latency is mostly around 1 ms, and in most scenarios sacrificing 1 ms has little impact on the business.

II. Future prospects of JCQ

Going forward, JCQ will evolve mainly in multi-protocol compatibility, on-demand automatic scaling, and cloud-native support.

1. Multi protocol compatibility

At present, JCQ uses a private protocol, which is a significant obstacle when guiding users to migrate. In the future, the JCQ kernel will be decoupled and different protocol access layers will be exposed externally, making it convenient for users to move to JCQ from other MQs.

2. Automatic scaling

JCQ is a shared message middleware, but it currently lacks serverless-style automatic scaling. Around every major promotion, such as 618, 11.11, and other important events, business parties find it hard to estimate their peak traffic; underestimating it leads to problems such as topic throttling. If JCQ's overall service capacity can be guaranteed, elastic automatic scaling of topics would greatly help users and truly achieve peak shaving and valley filling.

3. Cloud native

In the future, JCQ will support deployment and delivery in Kubernetes environments and provide a native operator, so that it can be deployed quickly on k8s and better serve private-cloud and hybrid-cloud projects.
