This article was first published in the Cloud + Community and may not be reproduced without permission.
Introduction
I am an engineer at Zhihu, responsible for Zhihu's storage-related components. This talk covers three topics: how Kafka is used at Zhihu, why we built our Kafka platform on Kubernetes, and how we implemented that platform.
Application of Kafka in Zhihu
Kafka is an excellent component for messaging and data streaming. At Zhihu it carries three kinds of traffic: logs, data transmission, and message queuing. The logs include business logs, from running debug logs up to critical logs.
Data transmission: for example, as users browse Zhihu, user-behavior events and content features are collected and shipped through our platform.
The third is Kafka as a message service. Put simply, when I follow user A, a series of actions has to be triggered by that follow event; this is the message-queuing service. Our platform now runs more than 40 Kafka clusters, all independent of each other, carrying more than 1,000 topics on more than 2,000 brokers. The platform has been running for two years since launch and carries about 100 TB of data. We deliberately designed for multiple clusters rather than one, because an internal company platform has to guarantee high availability. In the platform architecture, the bottom layer is the brokers themselves, and above them sits an abstraction layer, so the individual clusters are invisible to the business. On top of that is a management platform, for operations such as creating topics, creating partitions, and troubleshooting. The upper layer therefore exposes only two things: the management platform and the clients used by the business. For the clients, we need convergence: Kafka natively supports a Java client, and different clients behave differently, so we standardize on one.
This design grew out of the problems we ran into. In the early days, Zhihu ran a single Kafka cluster. While utilization was low and data growth was not explosive, a single cluster was fine. Then one day a broker went down, and when the whole cluster went down we found that many things simply stopped working. We concluded that a large system concentrated in one place is dangerous: having every business, whether it writes logs, sends messages, or transmits data, depend on one central cluster is not acceptable. In Kafka, each topic effectively represents a different business scenario, so we needed to classify those scenarios internally, for example to identify which data is important. When we first planned this and combed through it, we discovered how deeply the business was coupled to Kafka, why we had so many topics, and why after an outage some users were fine while others were furious. With a single shared cluster, one incident becomes a disaster.
At that time we found that our topics fall into three types: logs, data, and messages. Data topics are things like offline-computation inputs, collected user data, or events from tracking points in the app; they are consumed through data pipelines, Spark jobs, or other computing tasks. Message topics are the ones just mentioned: a user follows or likes something and triggers a chain of processing. Clearly these types have different priorities, so we decided to split Kafka into multiple clusters internally, divided along these business lines.
Topics of the same type are then managed and configured in their own clusters. The simplest lever is replication: Kafka has replicas, and we use them for high availability, but for log clusters the replica count can be reduced, since the message volume there matters less and this saves capacity. The volumes also differ enormously: anyone who does data analysis with Kafka knows that offline data volumes can be thousands of times larger than online messaging volumes. Then another problem appears: we plan a design, implement it, and the requirements keep changing. One business considers itself very important and asks for dedicated brokers; another carries hundreds of terabytes of data a day and asks for a new cluster that is not mixed with anyone else, because it provides basic data. So the number of clusters keeps growing; we now have more than 40. That raises the question of how to use server resources, because the early deployments were one broker per machine, which is wasteful: a machine with large memory and several terabytes of disk running only a light messaging workload is mostly idle. To improve utilization we want to deploy multiple brokers on a single machine while minimizing the interference between them. And in our practical experience, the disk is the one problem Kafka can never get around.
First, capacity. Kafka relies on the disk for data persistence, and we have hit many problems there: when traffic surges, disk capacity is the first thing to break. For example, we may provision a 1 TB disk and find 3 TB written to it within a day or two.
The other problem is disk I/O: if one broker drives the IOPS too high, the performance of every broker sharing that disk drops sharply. Since we now deploy multiple brokers per machine, we must isolate them at the disk level, first of all ensuring that data traffic and message traffic do not affect each other. So the approach we chose is physical separation: each broker gets its own disks, with replicas kept elsewhere.
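As a minimal sketch of this per-disk isolation, the check below verifies that no two brokers on the same host share a physical disk. The broker names and `/data/diskN` paths are illustrative, not Zhihu's actual layout.

```python
# Two brokers sharing one host must not share physical disks
# (paths and broker IDs are invented for illustration).
broker_log_dirs = {
    "broker-0": ["/data/disk1/kafka-logs", "/data/disk2/kafka-logs"],
    "broker-1": ["/data/disk3/kafka-logs", "/data/disk4/kafka-logs"],
}

def disks_disjoint(assignment):
    """Return True if no physical disk is shared between brokers."""
    seen = set()
    for dirs in assignment.values():
        disks = {d.split("/")[2] for d in dirs}  # e.g. "disk1"
        if seen & disks:
            return False
        seen |= disks
    return True
```

With disjoint disks, one broker's IOPS spike cannot degrade its neighbors, and a single disk failure touches exactly one broker.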
This kind of physical separation is acceptable: when a failure happens, it stays within a controllable and predictable scope. So we chose our servers accordingly. Tencent's Blackstone service offers a high-performance physical server that fits our needs very well: it has 12 high-performance disks, used as individual disks without RAID, each with a large capacity. It also has advantages in CPU and memory, which matters because Kafka depends on memory for the file cache, and current Intel CPUs perform well enough to meet our needs. We adopted these Blackstone high-performance servers, and our platform is now mainly deployed on them. With the underlying servers ready and the resources divided up, the next question was how to manage it all.
Here is an interesting detour. Before the current platform, we had a self-developed management platform that handled broker deployment, including configuration rendering and migration. But it was quite private: inconvenient to operate and maintain, with a small audience, and new colleagues had to learn it at the code level. We asked whether there was a better solution, because as the number of clusters grows, so does the number of server failures, and scheduling has to be considered along the disk dimension. So we thought of Kubernetes. We already had a lot of containerization practice: before this project, many computing tasks were deployed on Kubernetes and a lot of experience had accumulated there. We wanted to use its container technology for resource management, and its application management as well.
Kafka on Kubernetes
To containerize Kafka, we first had to design the Kafka container, which comes down to four problems: memory, CPU, network, and storage. After that comes a fifth problem: how to schedule Kafka containers.
First, memory and CPU. CPU is hard to predict, because consumption differs by workload; Kafka itself does not depend heavily on the CPU. Still, problems appear in practice. Kafka batches messages, but a user who does not understand this well may make the batches very small, for example to reduce latency and make sure every single message is delivered immediately, so the broker receives many tiny requests. The result is high CPU usage, which we can address by allocating more CPU. As for memory, without very large traffic the usage generally stays below eight gigabytes, and is usually lower, so our baseline container is set to 8 GB. We then adjust it according to actual usage, which is easy to change on the platform.
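A toy calculation makes the batching trade-off concrete. Each produce request carries one batch, and request handling is where broker CPU goes, so shrinking the batch multiplies the request rate. (In the real Java producer the relevant settings are `batch.size` and `linger.ms`; the message rates below are invented.)

```python
# Smaller batches mean proportionally more produce requests per second,
# and therefore more broker CPU spent on request handling.
def produce_requests_per_sec(msgs_per_sec, avg_batch_size):
    return msgs_per_sec / avg_batch_size

# 100k msgs/s sent one-by-one vs. in batches of 500:
low_latency = produce_requests_per_sec(100_000, 1)    # one request per message
high_throughput = produce_requests_per_sec(100_000, 500)
```

At the same message rate, per-message sends generate 500 times as many requests as batches of 500, which is why the fix is either bigger batches or more CPU.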
For the network, the brokers serve clients outside the node, so we adopted an independent intranet IP per broker. Many containers are deployed on each machine, each container gets its own IP, and that IP is registered in the intranet DNS; the advantage is that users never need to know the IP of a specific container. Designing a multi-IP network for a single host in this way at least meets our needs. That leaves storage in the container design. The default supported disk mounting method is the hostPath volume, and for us it is the best choice, because Kafka performs best on local disks and can make full use of the local file cache, and our disk performance is very good, certainly good enough for our needs.
So we mount a local directory into the container as a hostPath volume: when the container comes up on Kubernetes, the configured directory is mounted onto one of the server's disks. In the diagram, the black boxes are our containers, and the blue boxes that their data directories point to are disks, or directories, on the server. Each node ends up hosting many deployed brokers; viewed from the business side, each blue block represents one broker.
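A hypothetical pod spec fragment sketches this mounting scheme; the names, image, and paths are all made up, and the dict form mirrors the YAML a Kubernetes deployment would use.

```python
import json

# One broker container with one dedicated disk mounted via hostPath
# (all names and paths are illustrative).
broker_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kafka-broker-0"},
    "spec": {
        "containers": [{
            "name": "kafka",
            "image": "kafka:latest",
            "volumeMounts": [{"name": "data", "mountPath": "/var/lib/kafka"}],
        }],
        "volumes": [{
            "name": "data",
            # hostPath maps the container's data directory straight onto a
            # physical disk on the node, keeping local-disk performance.
            "hostPath": {"path": "/data/disk1/broker-0",
                         "type": "DirectoryOrCreate"},
        }],
    },
}
print(json.dumps(broker_pod, indent=2))
```

The broker inside the container writes to `/var/lib/kafka`, which is physically a directory on the node's own disk, so the page cache and disk throughput behave exactly as on bare metal.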
For scheduling, CPU and memory are not a problem, and the network has been tested: the server network provides 20G of bandwidth, and each disk delivers on the order of a gigabit of throughput, so even with every disk running flat out the total stays within that bandwidth. The network is therefore not the constraint. What we do have to consider is the high-availability goal for disks: first, the brokers of a single cluster should be spread across nodes as much as possible.
Second, storage usage across the nodes should be kept as even as possible.
The scheduling algorithm scores each server according to the state of its disks, and the node with the higher score wins. One factor is disk usage: if a node has more disks available, we tend to place brokers on it. The method is simple. Suppose a red cluster is being created: nodes A and C are both usable, but C is best, because the number of brokers already on C is relatively small. If a blue cluster is then created, A is obviously best. Actual use is more complex, because replica high availability also has to be considered. Implementing this algorithm runs into a practical problem: hostPath has serious limitations and poor consistency. You have to manage the schedulable nodes yourself, for example by registering labels and selecting on them, otherwise you do not know which node a broker will land on; and the directories mounted on the host are not managed by anyone. What we wanted was to keep the hostPath property of using only local disks, for performance, while also managing it properly.
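The scoring idea can be sketched in a few lines. This is a toy version under invented numbers and field names: more free disks raise a node's score, a broker of the same cluster already on the node is a heavy penalty (to spread each cluster across nodes), and any existing broker is a light penalty (to keep usage even).

```python
# Toy disk-aware node scoring; all weights and node data are illustrative.
def score_node(node, cluster):
    same_cluster = node["brokers"].count(cluster)
    return node["free_disks"] - 10 * same_cluster - len(node["brokers"])

def pick_node(nodes, cluster):
    """Choose the highest-scoring node for a new broker of `cluster`."""
    return max(nodes, key=lambda name: score_node(nodes[name], cluster))

nodes = {
    "A": {"free_disks": 3, "brokers": []},
    "B": {"free_disks": 1, "brokers": ["blue"]},
    "C": {"free_disks": 5, "brokers": ["blue"]},
}
```

With this data, a new "red" broker lands on C (most free disks, no red broker yet), while a new "blue" broker lands on A, since B and C already carry blue brokers, matching the red/blue example above.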
We wanted to select a reasonable node and manage the storage at the same time. So we modified Kubernetes to implement a disk-aware scheduling algorithm that keeps disk information updated in real time, and broker instances are then created on the nodes it selects.
Local disk management
Once a broker is running on a node and its disk is in use, how is the disk managed? In practice, disk management is delegated to a third-party agent on each node.
We reserve space both for fault handling and to improve resource utilization. To handle a fault quickly, re-replicating a broker's data over the network is too expensive; what we do instead is fast recovery. On each host we reserve one or two disks as fast-recovery disks, so when a data disk fails, we simply point the broker at a reserved disk and it can come back immediately, without much network overhead. In addition, replication is placed separately at the host level to keep the cluster highly available.
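The fast-recovery step reduces to picking a spare disk on the same host. The sketch below uses an invented disk-state layout; the real agent would also remount directories and restart the broker.

```python
# When a data disk fails, repoint the broker at a reserved disk on the
# same host instead of re-replicating over the network.
def pick_spare(disks):
    """Return the first reserved disk, or None if no spare is left."""
    for name, state in disks.items():
        if state == "reserved":
            return name
    return None  # no spare: fall back to slow network re-replication

host_disks = {"disk1": "in_use", "disk2": "failed", "disk3": "reserved"}
```

Because the spare disk is local, recovery costs a broker restart rather than shipping terabytes across the network.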
One problem remained: as a platform, we needed to unify the clients. How? The client reads the cluster information from a configuration service and checks there whether a topic is available. Another advantage appears during migration: many producers and consumers are involved, and in the usual process the producers move first and the consumers follow, which disturbs the business. If both sides are registered through the configuration service, migration can be synchronized: change the information in one place and every producer and consumer picks it up, so ease of use improves. A further benefit is disaster recovery: if an entire cluster breaks down, although that has never happened, we keep a standby cluster and can migrate all clients to it directly through the same mechanism.
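The unified-client idea can be sketched as follows: clients resolve a topic's cluster through a registry before connecting, so a migration or a disaster-recovery switch only needs a registry update. The in-memory dict and the addresses here stand in for the real configuration service.

```python
# Minimal registry-aware client: resolve the bootstrap address per topic
# at connect time (registry contents are invented for illustration).
class RegistryAwareClient:
    def __init__(self, registry):
        self.registry = registry  # topic -> bootstrap address

    def bootstrap_for(self, topic):
        if topic not in self.registry:
            raise KeyError(f"unknown topic: {topic}")
        return self.registry[topic]

registry = {"user-follow-events": "kafka-msg-1.example:9092"}
client = RegistryAwareClient(registry)
```

Updating `registry["user-follow-events"]` to point at another cluster redirects every producer and consumer of that topic at once, which is exactly the coordinated-migration property described above.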
Q: Hello. There may be many topics in one cluster. When different users consume these topics, how are they isolated from each other? Can a user consume other topics' data? Is there a good way to isolate them? If there are multiple topics in a cluster and I do not want others to see my data, then of course once I hand a client to someone, he can see all the data. Is there a good way?
A: In our case, if brokers in one cluster would interfere with each other, we still suggest separating them, but a cluster cannot be dedicated to a single user: in our large clusters, many users consume their own data from many different topics. (Q: Right, and when a user consumes, if there is no isolation and I just give him the client, can he see all the data? The only thing I can do is put an API service in front. Is there a good way for Kafka to do authentication?)
For more details, please follow the link below:
Design and implementation of Kafka platform based on kubernetes.pdf
Q & A
How to use Apache Kafka vs Apache Storm?
Chen Xinyu: application of CKafka in a face-recognition PaaS
Yang Yuan: Tencent Cloud Kafka automated operations practice
Rao Jun: past, present and future of Apache Kafka
This article was published on Tencent Cloud + Community with the author's authorization: https://cloud.tencent.com/dev…