Yan Yanfei: Unveiling and Optimizing Kafka's High Performance

Time: 2021-01-23

Welcome to the Tencent Cloud + Community, where you can find more of Tencent's hands-on technical content.

This article was first published in the Cloud + Community and may not be reproduced without permission.

Good afternoon, everyone. I'm Yan Yanfei, a senior engineer on the CKafka team in Tencent Cloud's Infrastructure Department. Today I will first share the key points behind open-source Kafka's high performance, then some of the optimizations Tencent Cloud CKafka has made on top of community Kafka, and finally my outlook on the future of the Kafka community.

Kafka’s high performance

First, I will introduce Kafka's architecture so that you have a more macro-level picture of Kafka. Then I will describe Kafka's storage organization and its concrete message storage format in more detail. Finally, to give you an intuitive sense of Kafka's performance, I will present its performance numbers.

Overall structure

As the diagram shows, the entire Kafka cluster contains only two kinds of components: brokers and ZooKeeper.

The broker is the core engine of the whole Kafka cluster; it stores and forwards messages and serves clients. A Kafka cluster can be scaled out or in very simply by adding or removing brokers. The basic unit of service Kafka exposes is the topic, so making topics scale in parallel is what makes applications scale in parallel. To achieve this, Kafka partitions each topic and places different partitions on different brokers, so that the capacity of more brokers can be used and the application can scale horizontally.
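As an illustration of partition-based scaling, here is a minimal sketch using Kafka's Java AdminClient to create a topic whose partitions are spread across brokers. The broker address, topic name, partition count, and replication factor are placeholder values chosen for the example, not anything from the talk.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address; any broker in the cluster will do.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 2: partitions land on different
            // brokers, so producers and consumers can use more than one machine.
            NewTopic topic = new NewTopic("demo-topic", 6, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```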

ZooKeeper is mainly responsible for storing the cluster's metadata, such as configuration, broker, and topic information, and it also takes on part of the coordination and election work; you can think of it as the configuration management center of the Kafka cluster. At this point you may wonder: brokers can be added and removed dynamically to grow or shrink the cluster, but there is only one ZooKeeper ensemble, so will ZooKeeper become the bottleneck that limits the cluster's horizontal scalability? Indeed, in some older versions of Kafka, both producers and consumers had to talk to ZooKeeper to pull metadata, coordinate consumer groups, and commit and store consumer-group offsets. Because every client communicated with ZooKeeper directly, this put great pressure on ZooKeeper, hurt the stability of the system, and ultimately limited the cluster's ability to scale. However, from version 0.9 onward, the Kafka team added new protocols and coordination modules so that producing and consuming clients no longer need to communicate with ZooKeeper at all. Today ZooKeeper acts only as the configuration management center; the pressure on it is very small, and it will not become the bottleneck that restricts the cluster's horizontal scaling.
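To make this concrete, here is a minimal consumer sketch using the newer Java client: only the brokers' bootstrap.servers address is configured, there is no zookeeper.connect setting, and group coordination and offset storage are handled by the brokers. Topic, group, and broker names are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoZkConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only broker addresses are needed; the client never talks to ZooKeeper.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "demo-group");      // group coordination happens on a broker
        props.put("enable.auto.commit", "true");  // offsets are stored in an internal topic
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```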

You can also see that producers and consumers interact with brokers directly to produce and consume. Kafka does not achieve scalability in the traditional way of adding a proxy layer; instead, through its internal routing protocol, producers and consumers negotiate with the brokers directly, so clients can produce to and consume from brokers without any third-party proxy. This proxy-free design not only shortens the data path and reduces latency, but also improves the stability of the whole system and saves a great deal of cost.

In summary, Kafka's overall architecture has the following main advantages. First, the cluster can be scaled horizontally at the cluster level simply by adding or removing brokers. Second, by partitioning topics, applications can scale out almost without limit. Third, thanks to a well-designed protocol, clients talk to the back-end brokers directly; removing the proxy shortens the data path, reduces latency, and greatly reduces cost.

So far you should have a more macro-level understanding of Kafka. The overall architecture of a system determines its capacity ceiling, but the performance of its key components determines how many servers are needed for a given capacity. More servers mean not only higher cost but also more operational burden and lower stability. So next I will introduce the architecture of Kafka's core engine, the broker.

Broker Architecture

The broker is a typical reactor model. It mainly consists of a network thread pool, which handles network requests, sending and receiving, packing and unpacking, and then pushes requests through a request queue to the core processing module, which does the real business-logic processing (Kafka persists all messages, so this is mostly file I/O). Kafka uses a multi-threaded design to take full advantage of the multiple cores of modern machines, and the queue decouples the network module from the core module asynchronously, so network processing and file I/O proceed in parallel, which greatly improves the efficiency of the whole system.
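The following is a highly simplified, hypothetical sketch of this reactor-style layout, not Kafka's actual code: network threads decode requests and hand them to a shared request queue, and a pool of I/O threads pulls from that queue and does the heavy work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReactorSketch {
    // Placeholder for a decoded request; a real broker carries much more state.
    static class Request { final byte[] payload; Request(byte[] p) { payload = p; } }

    // Single shared queue decoupling network threads from I/O threads.
    static final BlockingQueue<Request> requestQueue = new ArrayBlockingQueue<>(500);

    public static void main(String[] args) {
        // Network threads: receive bytes, unpack them, enqueue the request.
        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                while (true) {
                    byte[] packet = receiveFromSocket();   // stand-in for selector-based I/O
                    requestQueue.offer(new Request(packet)); // drops if full; real code applies backpressure
                }
            }, "network-" + i).start();
        }
        // Core I/O threads: take requests and do the real work (mostly file I/O).
        for (int i = 0; i < 8; i++) {
            new Thread(() -> {
                while (true) {
                    try {
                        Request r = requestQueue.take();
                        appendToLog(r.payload);            // stand-in for writing to the partition log
                    } catch (InterruptedException e) { return; }
                }
            }, "io-" + i).start();
        }
    }

    static byte[] receiveFromSocket() { return new byte[0]; } // placeholder
    static void appendToLog(byte[] bytes) { }                 // placeholder
}
```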

At this point you have a macro-level view of Kafka's architecture. As mentioned above, Kafka persists every message to disk. So why is Kafka not afraid of the disk the way traditional message queues are, which try to serve everything from cache and avoid touching the disk? Why did Kafka choose to persist all messages? Next I will explain its storage organization and storage format and reveal these secrets one by one.

How storage is organized

This is Kafka's current storage organization. A topic is just a logical concept and does not correspond to any physical entity. To allow topics to scale horizontally, Kafka partitions them; a partition is represented as a directory, and the data inside a partition is further split into segments. During production, Kafka can therefore quickly find the latest segment and simply append to the end of the file. In other words, production makes full use of sequential disk writes, which greatly improves produce throughput. Segmented storage has another advantage: old segments can simply be deleted to expire old messages. To make consumption convenient, Kafka also uses a trick in naming the segments: each segment file is named after the offset of the first message it contains, so when consuming, a binary search over the segment names quickly locates the segment that holds a given message. To locate messages quickly within a segment, Kafka also builds two sparse index files for each data segment; a binary search over the index locates the position of the requested message in the data file, and consumption proceeds from there. As you can see, both production and consumption in Kafka are in effect sequential reads and writes that make full use of the disk's sequential I/O capability, and consumption uses a two-level binary search whose cost depends only on the size of one partition's index, not on the total amount of data in the system.
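Here is a minimal sketch of the first level of that lookup, assuming segments are keyed by the base offset encoded in their file names: a floor lookup over a sorted map (effectively a binary search) selects the segment, and a second search over that segment's sparse index would then narrow down the file position. The file names and offsets below are illustrative.

```java
import java.util.TreeMap;

public class SegmentLookup {
    // Base offsets taken from segment file names, e.g. 00000000000000000000.log,
    // 00000000000000368769.log, ... (values here are made up for the example).
    static final TreeMap<Long, String> segments = new TreeMap<>();

    public static void main(String[] args) {
        segments.put(0L,       "00000000000000000000.log");
        segments.put(368_769L, "00000000000000368769.log");
        segments.put(737_337L, "00000000000000737337.log");

        long targetOffset = 500_000L;
        // floorEntry = largest base offset <= target, i.e. the segment holding it.
        String segment = segments.floorEntry(targetOffset).getValue();
        System.out.println("offset " + targetOffset + " lives in " + segment);
        // A second lookup in the segment's sparse index (offset -> file position)
        // then gives a nearby position from which the broker scans forward.
    }
}
```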

The basic unit Kafka processes is the message. So what does a Kafka message look like, and what is the exact format in which messages are stored on disk? These choices greatly affect the performance of the system, so I will also introduce Kafka's message format in detail.

Message format

To make it easier to understand, the slide shows the message format in the form of C code. A Kafka message uses a simple binary encoding stored in network byte order, so it can be encoded and decoded very efficiently. The message header is very compact, only about 30 bytes, and includes a CRC checksum for verifying the message. The most ingenious point is that the message format is identical in the producer, on the network, on the broker, and in the final file on disk, so a message needs no transcoding anywhere along its path through the system, which is extremely efficient.

Of course, to raise overall throughput Kafka produces and consumes messages in batches; a batch is simply the messages laid out one after another in memory in binary form. To improve network and disk utilization, Kafka also implements message compression. The flow chart on the right illustrates the process: the whole batch is compressed as a unit into a new binary string, and that string is then wrapped as the value field of a new message. In other words, Kafka cleverly uses message nesting to compress whole batches, which improves the compression ratio while keeping the message format consistent. Keeping the format consistent has several advantages. First, in the entire message flow only the producer compresses; once the compressed message reaches the broker, the broker needs just one decompression to validate the messages and assign offsets, and can then write the message to the file directly without re-compressing, so even with compression enabled the cost on the broker side is very low. Second, because the compressed format is preserved, a consume request requires no decompression or compression on the broker at all: the broker sends the data to the consumer still compressed, and the consumer decompresses it. The broker therefore does no compression work during consumption, which greatly improves its performance. In this way Kafka achieves end-to-end compression and shifts the computational cost to the producers and consumers.
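To make the layout concrete, here is a rough Java sketch of how a legacy-style (pre-0.11) Kafka record might be laid out in a ByteBuffer: an 8-byte offset and 4-byte size, followed by a compact header with a CRC, a magic byte, attribute flags, and length-prefixed key and value. The field sizes are my own reconstruction of the old format and should be treated as illustrative, not as a normative definition.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class LegacyRecordSketch {
    // Assumed layout:
    // offset(8) size(4) | crc(4) magic(1) attributes(1) keyLen(4) key valueLen(4) value
    static ByteBuffer encode(long offset, byte[] key, byte[] value) {
        int bodySize = 4 + 1 + 1 + 4 + key.length + 4 + value.length;
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + bodySize);
        buf.putLong(offset);           // log offset assigned by the broker
        buf.putInt(bodySize);          // size of the message body that follows
        int crcPos = buf.position();
        buf.putInt(0);                 // CRC placeholder, filled in below
        buf.put((byte) 0);             // magic: format version
        buf.put((byte) 0);             // attributes: e.g. compression codec bits
        buf.putInt(key.length).put(key);
        buf.putInt(value.length).put(value);

        // CRC covers everything after the CRC field itself.
        CRC32 crc = new CRC32();
        crc.update(buf.array(), crcPos + 4, buf.position() - (crcPos + 4));
        buf.putInt(crcPos, (int) crc.getValue());
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer record = encode(42L, "k".getBytes(), "hello".getBytes());
        System.out.println("encoded " + record.remaining() + " bytes");
    }
}
```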

Kafka high performance

Due to time constraints, that covers the key points of Kafka's high performance. Of course, the Kafka team has made many other ingenious designs for performance that I will not go through one by one here; the slide on this page lists them in detail, and if you are interested you can think through each of them carefully.

I have talked a lot about why Kafka is fast, but what numbers does it actually achieve? To give you a more intuitive sense of Kafka's performance, I will present the relevant benchmark data. Of course, performance numbers without the test conditions are meaningless, so first the setup. First, the test scenario is a single broker with one topic and multiple partitions. Second, the machine has 32 cores and 64 GB of memory, a 10-gigabit network card, and 12 SATA disks of 2 TB each. Third, the broker version is 0.10.2.0, configured to flush to disk every 100,000 messages or every two seconds. Fourth, we use community Kafka's native benchmarking tool and run 140 clients concurrently to simulate a certain level of concurrency. All of the benchmark data that follows uses this same configuration, so I will not repeat the test conditions later.
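For reference, the flush policy described above corresponds to Kafka's topic-level flush.messages and flush.ms settings; the sketch below is our own illustration (broker address, topic name, and partition count are placeholders) of creating a test topic with those values via the Java AdminClient.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateBenchmarkTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        Map<String, String> configs = new HashMap<>();
        configs.put("flush.messages", "100000"); // flush after 100,000 unflushed messages...
        configs.put("flush.ms", "2000");         // ...or after 2 seconds, whichever comes first

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("bench-topic", 6, (short) 1).configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```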

From the table below we can see that Kafka easily reaches millions of messages per second with small packets, and still hundreds of thousands of QPS with 1 KB packets; overall, Kafka's performance is very strong. At the same time, we noticed that during the test the broker's CPU utilization, and especially its disk I/O utilization, was not high, which suggests that Kafka still has room for optimization. So next, let me introduce some of the optimizations CKafka has made on top of community Kafka.

Kafka performance optimization

In this part I will first take you deeper into the broker-side architecture; by understanding it in more detail we can find its potential bottlenecks and then optimize them. After that I will pick a few specific optimizations and describe them in detail.

Analysis of current architecture

To make this easier to follow, let's walk through a real request and its path to see how the broker's modules interact. When a producer starts, it first needs to connect to the broker. On the broker side an accept thread listens for connections and establishes the new connection, then hands the socket off, round-robin, to one of the threads in the network send/receive thread pool. At this point the connection is established. Next comes the data path: the produce request sent by the producer is handled by that network thread; whenever the network thread has received a complete packet and unpacked it, it pushes the request onto the request queue, and the back-end core processing threads do the real logical work. A core I/O thread competes to pull a produce request from the request queue. It first parses the request and creates the corresponding message objects, then validates the messages and assigns offsets, and finally writes the messages to the file. After writing, it checks whether the number of unflushed messages has reached the flush threshold; if so, the same thread performs the flush itself. Finally, the result is returned to the corresponding network thread through that thread's response queue, and the network thread packs the result and sends it back to the producer. At that point one produce request has been fully processed.

From this flow we can spot several things. First, in the whole Kafka architecture there is only one request queue, and it is not lock-free, so there is fierce lock contention between the network thread pool and the core I/O thread pool, which may limit the concurrency and performance of the whole system. That is the first optimization point we identified. Second, as just mentioned, Kafka flushes to disk directly inside the core thread, which blocks the whole core path and hurts overall performance; that is the second optimization point. Third, we found that a large number of message objects are created while validating produced messages; so many objects put heavy pressure on JVM GC and may become a performance bottleneck. Of course, we have made many other optimizations to community Kafka, but due to time I will focus on these three.

Lock optimization

As the architecture diagram shows, the first round of lock optimization is actually very simple: we replaced the single request queue on the broker side with a lock-free request queue, and we replaced all the response queues in the same way. What effect did this round of optimization have? This slide shows the result for the lock-free queue. Comparing the two, we found that after the lock-free queue optimization the overall performance is essentially the same as community Kafka. According to our earlier analysis there should have been a larger improvement, so why did it not deliver the expected effect?
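For illustration, here is a hypothetical sketch of this kind of change, not CKafka's actual code: the bounded blocking queue is swapped for a lock-free, CAS-based queue such as ConcurrentLinkedQueue, so enqueueing never takes a lock.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class LockFreeQueueSketch {
    static class Request { }

    // Before: a bounded blocking queue guarded by locks.
    static final ArrayBlockingQueue<Request> lockedQueue = new ArrayBlockingQueue<>(500);

    // After: an unbounded lock-free queue based on CAS operations.
    static final Queue<Request> lockFreeQueue = new ConcurrentLinkedQueue<>();

    static void networkThreadEnqueue(Request r) {
        lockFreeQueue.offer(r);            // never blocks, no lock contention
    }

    static Request ioThreadDequeue() {
        Request r;
        while ((r = lockFreeQueue.poll()) == null) {
            Thread.onSpinWait();           // spin/park strategy is a separate design choice
        }
        return r;
    }
}
```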

To find out, we did a more detailed statistical analysis on the broker and counted the number of requests. The chart below shows that, for both community Kafka and our optimized version, even at millions of messages per second the number of produce requests is very small, below 100,000 per second. Then we understood: because open-source Kafka batches messages on the producer side, a large number of messages are merged into each produce request, which sharply reduces the number of requests reaching the broker. Fewer requests mean less lock contention between the network threads and the core threads, and at this request rate lock contention is simply not the real bottleneck, so it is normal that our lock-free queue did not bring the expected gain. You could say our first optimization was not very successful, but the road of optimization is long and winding, and one failure will not scare us off; CKafka keeps moving forward.

Disk flush optimization

Now the second optimization: asynchronous disk flushing. For this, CKafka adds a dedicated group of flush threads that are responsible only for flushing. When a core thread finds that a flush is needed, it simply creates a flush task and pushes it to the flush threads through a lock-free queue, and they do the flushing. This way, flushing no longer blocks the core processing threads, which greatly improves the performance of the system. Did this round of optimization have an effect? Let's look at the performance comparison below.
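Here is a minimal sketch of the idea, again as our own illustration rather than CKafka's code: the core I/O thread enqueues a flush task instead of calling force() itself, and a dedicated flush thread drains the queue and pays the fsync cost.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AsyncFlushSketch {
    // A flush task carries the channel (segment file) that needs an fsync.
    static class FlushTask { final FileChannel channel; FlushTask(FileChannel c) { channel = c; } }

    static final Queue<FlushTask> flushQueue = new ConcurrentLinkedQueue<>();

    // Called from the core I/O thread: hand off the fsync instead of doing it inline.
    static void requestFlush(FileChannel segmentChannel) {
        flushQueue.offer(new FlushTask(segmentChannel));
        // Return immediately; the produce path is no longer blocked by the disk.
    }

    // Dedicated flush thread: the only place that pays the fsync latency.
    static final Thread flushThread = new Thread(() -> {
        while (true) {
            FlushTask task = flushQueue.poll();
            if (task == null) { Thread.onSpinWait(); continue; }
            try {
                task.channel.force(true);   // the expensive blocking call
            } catch (IOException e) {
                // Real code would mark the segment or log directory as failed.
            }
        }
    }, "flusher");

    public static void main(String[] args) {
        flushThread.start();
    }
}
```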

After the asynchronous flush optimization, the optimized throughput is 4 to 5 times that of the community version with small packets, and even with large packets it is about twice as high (as packet size and partition count grow, the gain shrinks). We also found that during the asynchronous-flush tests the system's I/O utilization was very high, basically above 90%, which suggests that the bottleneck of the system is now disk I/O. An I/O utilization above 90% also means we have nearly exhausted the disks' capability, so the remaining headroom for throughput optimization is not large. Does that mean there is nothing left to optimize in Kafka? Not at all: optimization is not only about raising throughput; it can also reduce resource usage at the same throughput. So CKafka's optimization of community Kafka did not stop there, and we went on to optimize GC. We hoped this optimization would not only help throughput, but more importantly reduce resource usage, and therefore cost, in comparable scenarios.

GC optimization

On the GC side, community Kafka creates a message object for every message while validating produced messages, producing a huge number of objects. CKafka instead validates messages directly on the ByteBuffer, i.e. on the binary data, so validation creates no message objects at all; far fewer objects means far less pressure on JVM GC and better system performance. Comparing the data before and after the optimization, we can see it has a clear effect: after the optimization the share of time spent in GC is below 2.5%, so GC is no longer a bottleneck. Community open-source Kafka, with many partitions and small messages, can spend up to 10% of its time in GC, so our GC optimization reduces the GC time share by 1.5% to 7%. We can also see that at roughly the same throughput, our system's CPU usage is about 5% to 10% lower than the community version's, so we can say with confidence that the GC optimization effectively reduces CPU consumption. Finally, both before and after the GC optimization the system's I/O was essentially saturated and remained the bottleneck, so, as predicted, the GC optimization does not significantly increase throughput; its value is mainly in reducing resource consumption and, for the client side, somewhat reducing latency.
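As an illustration of validating directly on the raw bytes, the sketch below checks a record's CRC by reading fields from the ByteBuffer in place, without materializing any message object. The field offsets follow the legacy layout sketched earlier and are illustrative, not Kafka's exact definition.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ZeroCopyValidation {
    // Assumed layout (see earlier sketch): offset(8) size(4) crc(4) magic(1) ...
    static boolean isValid(ByteBuffer buf) {
        int start = buf.position();
        int bodySize = buf.getInt(start + 8);          // absolute reads: no objects, no copies
        int storedCrc = buf.getInt(start + 12);

        CRC32 crc = new CRC32();
        ByteBuffer body = buf.duplicate();             // duplicate shares the same bytes
        body.position(start + 16);                     // CRC covers everything after the CRC field
        body.limit(start + 12 + bodySize);
        crc.update(body);                              // CRC32.update(ByteBuffer), Java 8+
        return (int) crc.getValue() == storedCrc;
    }
}
```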

That concludes the performance-optimization part, again due to time. To make the overall result easier to see, this slide shows the final before-and-after comparison. With the full set of optimizations we get a 4-to-5x improvement with small packets and about 2x even with 1 KB packets (as partition count and message size grow, the gain declines somewhat). We can also see that system I/O has become the bottleneck, which gives us a reference for future hardware choices: perhaps we will mount more disks to push throughput further and better balance the ratio of CPU to disk. By choosing more suitable hardware we can reach an appropriate ratio of CPU, disk, and network, and thus maximize resource utilization.

Next I will talk about some problems we have found while operating CKafka and the optimizations we made for them. We also hope the Kafka community will adopt some of the key suggestions we arrived at while operating CKafka, so that Kafka can better fit production environments.

First, community Kafka currently cannot consume in a pipelined ("pipe") way, which causes the following problems. First, consumer performance depends heavily on the network latency to the broker: when consumers and brokers are in different cities, the added latency makes consumption performance very poor, which limits Kafka's use cases; in particular it cannot work well for cross-city data synchronization. Second, Kafka reuses the consumption logic for replica replication, so replica fetching cannot be pipelined either; replica synchronization is therefore slow and very sensitive to latency, which makes it hard to deploy a Kafka cluster across regions, limits deployment flexibility, and under heavy load easily causes ISR jitter and hurts system performance. For the second point we have already optimized CKafka so that replica fetching works in pipelined mode; even if we need cross-city deployment in the future, replica synchronization performance will be sufficient. The first problem, however, we cannot solve on our side: CKafka is designed to stay compatible with the community version so that customers can use the open-source SDK directly, so we cannot change the client. Here we hope the Kafka community will adopt the suggestion and implement a pipelined consumer, so that consumption performance no longer depends on network latency and users are not constrained geographically.
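To illustrate what pipelined consumption means, here is a purely schematic sketch, not any real Kafka API: instead of the strict send-one-request, wait-for-one-response loop, the client keeps several fetch requests in flight, so throughput is bounded by bandwidth rather than by round-trip latency. The function names and constants are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

public class PipelinedFetchSketch {
    static final int MAX_IN_FLIGHT = 8;   // pipeline depth; tune for latency * bandwidth

    // Stand-in for a real network client; this is hypothetical.
    static CompletableFuture<byte[]> sendFetchRequest(long offset) {
        return CompletableFuture.completedFuture(new byte[0]);
    }

    public static void main(String[] args) throws Exception {
        Deque<CompletableFuture<byte[]>> inFlight = new ArrayDeque<>();
        long nextOffset = 0;

        for (int i = 0; i < 100; i++) {
            // Keep the pipe full: issue requests before earlier responses arrive.
            while (inFlight.size() < MAX_IN_FLIGHT) {
                inFlight.addLast(sendFetchRequest(nextOffset));
                nextOffset += 1_000;       // illustrative: each fetch covers a range of offsets
            }
            // Consume the oldest response; with a synchronous client this wait would
            // happen after every single request, making latency the bottleneck.
            byte[] batch = inFlight.removeFirst().get();
            process(batch);
        }
    }

    static void process(byte[] batch) { }
}
```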

Second, for performance reasons, community Kafka currently does not let low-version consumers directly consume messages produced by high-version producers, and Kafka has three message format versions, which makes upgrading and downgrading producers and consumers very unfriendly. In our cloud version we have implemented message format conversion, so messages of different versions can be stored mixed in the same file and produced and consumed by clients of any version. We hope the community will consider adding similar support, because in a production system, compatibility is one of the most important criteria. And even though our implementation transcodes between high- and low-version messages, the CPU still has headroom and is not the bottleneck of the system, so we hope the community can adopt this.

That is basically the end of my talk. This is my personal WeChat; if you have any questions, feel free to add me. And of course, our CKafka team is hiring; if you are interested, please contact us.

Q/A

Q: I'd like to ask a question. When you showed the flush optimization just now, I noticed that CPU usage also increased several times along with the performance. Is that mainly because you introduced an extra copy in the optimization?

A: No. In fact, the Kafka community has already optimized memory copies. In the whole message flow, for example when producing, Kafka allocates a ByteBuffer at the network layer to hold the message packet, and that same ByteBuffer is passed around through the whole system without new copies; only when a message needs to be stored in a different format and must be transcoded does Kafka copy it into a new buffer, otherwise there are no extra memory copies. Our asynchronous flush optimization does not add any memory copies either. The several-fold increase in CPU usage comes mainly from the increase in throughput: as you can see, with small packets our throughput improved 4 to 5 times, which means more network operations, more packing and unpacking, and more system I/O, and ultimately more CPU consumption, so it is normal for CPU utilization to rise several times.

Q: I have a question. The slides also mentioned that under heavy load the replica fetch rate cannot keep up with the production rate. In our tests, and online, when replica fetching falls behind production it also affects other nodes and can cause an avalanche effect. Could you talk about your solution? You just said you have one; what does it look like?

A: The main reason community Kafka's replicas cannot keep up is that replica fetching reuses the consumption path, and that path does not support pipelining. Kafka's replica fetch is synchronous: it sends a fetch request, waits for the response, and only then sends the next one; it cannot pipeline. In this synchronous mode, the broker's network latency becomes the critical factor in replica synchronization, and under heavy load the broker-side latency can reach seconds, so the real cause of replicas falling behind is that too few fetch requests get through. If we use pipelining to increase the number of in-flight fetch requests, replica synchronization performance naturally improves.

Q: Do you have any more specific solutions?

A: The main thing is that we use a new protocol for replica synchronization that allows fetch requests to be pipelined, which increases the number of requests and therefore the speed of replica synchronization. The real reason community Kafka's replica fetching cannot keep up is that the requests are synchronous: high latency reduces the number of requests, and fewer requests ultimately degrade replication until it falls behind. So what we had to do was increase the number of replica fetch requests, which solves the problem at its root, and pipelining is how we increase that number. That is really the core of it, and also the simplest solution.

Q: Excuse me, I just saw the asynchronous flush optimization. The data you showed contains no latency figures; does this optimization increase service latency?

A: Because there were many clients in the test, adding latency statistics would have made the charts cluttered, so they are not shown here. In fact, in our tests latency is better with asynchronous flushing, because the flush no longer blocks any part of the core request path: the flush task is simply pushed to a queue and the response can be returned to the client directly. Without asynchronous flushing, the flush happens in the core path and blocks it; a single flush takes a long time, often around 400 milliseconds, so latency is much higher. With asynchronous flushing, our tests show that even at the maximum throughput of around 700 MB/s, latency stays very good, averaging between 15 ms and 30 ms, whereas in the community Kafka setup latency is around 200 milliseconds.

Q: With asynchronous flushing, don't you need to wait until the data is actually flushed before replying to the client? And if the data has not been flushed yet, can messages be lost?

A: This really depends on Kafka's application scenarios. Community Kafka today does not flush every single message; flushing is triggered by a configured message-count interval or time interval, so on a power failure community Kafka also cannot guarantee that messages are not lost. Kafka is generally not used in scenarios that absolutely cannot tolerate message loss; it is mainly used for log collection and other scenarios with high real-time and throughput requirements, and Kafka has in effect chosen to trade some reliability for throughput. Our first version of asynchronous flushing is deliberately simple: we push the task to the flush-thread queue and immediately return success to the client, so a power failure can indeed lose some messages; for now we prioritize throughput, to match Kafka's typical use cases. We will consider an option that only returns success to the user after the data has really been flushed. Implementing it is actually simple: we suspend the produce request until the flush thread has actually flushed, and only then reply to the client. But in that case, as you just said, latency may increase, because you must wait for the real flush to complete. So you may need to choose between the two modes depending on the application, trading off high throughput against high reliability.
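As an illustration of the stricter variant mentioned in this answer (a hypothetical sketch, not CKafka's code): the produce path attaches a future to the flush task and only answers the client once the flush thread has completed the fsync, trading latency for durability.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class FlushThenAckSketch {
    static class FlushTask {
        final FileChannel channel;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        FlushTask(FileChannel c) { channel = c; }
    }

    static final Queue<FlushTask> flushQueue = new ConcurrentLinkedQueue<>();

    // Produce path: enqueue the flush and delay the response until it finishes.
    static CompletableFuture<Void> appendAndFlush(FileChannel segment) {
        FlushTask task = new FlushTask(segment);
        flushQueue.offer(task);
        return task.done;   // caller sends the produce response when this completes
    }

    // Flush thread: fsync, then wake up the waiting produce request.
    static void flushLoop() {
        while (true) {
            FlushTask task = flushQueue.poll();
            if (task == null) { Thread.onSpinWait(); continue; }
            try {
                task.channel.force(true);
                task.done.complete(null);
            } catch (IOException e) {
                task.done.completeExceptionally(e);
            }
        }
    }
}
```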

Q: Hello, two questions. The first is about data durability: you just mentioned that data in Kafka is not necessarily reliable, so what optimizations and solutions has Tencent made for data reliability?

A: On the hardware side, all of our storage disks use RAID10, so even if a small number of disks fail there is no risk of data loss. On top of that, like community Kafka, CKafka supports multiple replicas, and with a reasonable configuration this ensures data is not lost when a machine fails. Third, in the implementation, CKafka also flushes proactively based on time and message count, which reduces the data lost in an unexpected power failure. So compared with community Kafka, CKafka provides stronger data-reliability guarantees in both hardware and software.

Q: The second question: you just said there is a problem with cross-city replica synchronization. Is Tencent's deployment cross-city?

A: At present CKafka is deployed within a single city and availability zone. CKafka itself could be deployed across cities, but we do not currently offer that deployment mode, mainly because the performance of consumers using the community Kafka SDK depends strongly on the network latency to the broker. If we deployed across regions, client consumption performance could not be guaranteed: inter-region latency is often tens or even hundreds of milliseconds, which would degrade consumption performance so badly that it could not meet business needs. Of course, if the community Kafka SDK adopts our suggestion above and implements pipelined consumption, then cross-region deployment will no longer be a problem.

Q: OK, that raises another question. If there were an accident, for example something like the Tianjin explosion destroying Tencent's Tianjin data center, do you have a migration plan for that kind of event?

A: CKafka is deployed as separate clusters in each region, and users can purchase instances in different regions. On the one hand this allows access from nearby; on the other hand it enables cross-region disaster recovery to keep services available. Users can also synchronize data between regions using Kafka synchronization tools, which we provide, to achieve region-level disaster recovery.

Q: For the business, if they use the synchronization tool, does that add cost? Do they need to modify their programs to access the service across cities?

A: Business users can use existing open-source tools and approaches directly, and can achieve cross-region access without any changes. The Kafka community's ecosystem and tooling are quite mature; if you spend time in the community you can usually find a tool that fits your needs.

For more details, please see the following link:
Kafka – high performance disclosure and optimization.pdf


This article was published on the Tencent Cloud + Community with the author's authorization: https://cloud.tencent.com/dev…