Message queuing middleware (hereinafter, message middleware) uses an efficient and reliable message delivery mechanism to exchange data in a platform-independent way, and integrates distributed systems on top of that data communication. Through its message delivery and message queuing models, it provides application decoupling, elastic scaling, redundant storage, traffic peak clipping, asynchronous communication, data synchronization, and other functions in a distributed environment. It is an important component of distributed system architecture.
There are many open source message middleware products today, such as ActiveMQ, RabbitMQ, Kafka, RocketMQ, and ZeroMQ. Whichever one you choose, some parts of it will not fit your needs well; after all, it was not tailor-made for you. Some large companies have accumulated experience through long-term use and have relatively stable message queue scenarios, or find that the products on the market cannot meet their needs, and they have enough energy and manpower to develop a message middleware of their own. The vast majority of companies, however, will not reinvent the wheel, so choosing a suitable message middleware is particularly important. Even the former go through such a selection process before developing stable and reliable products of their own.
Introducing message middleware into an overall architecture involves weighing many factors, such as cost and benefit: how do you get the best value for money? Although there are many kinds of message middleware, each has its own focus. The best approach is to choose the right one, play to its strengths, and avoid its weaknesses. If you are unsure how to go about it, this article may serve as a reference.
Brief introduction of various message queues
ActiveMQ is a message middleware from Apache, written in Java and based on the JMS 1.1 specification. It provides efficient, scalable, stable, and secure enterprise-level messaging for applications. However, it carries heavy historical baggage, and its market share is now lower than that of the next three products described below. Its latest architecture is named Apollo and is billed as the next generation of ActiveMQ; interested readers can look into it.
RabbitMQ is a message middleware implemented in Erlang and based on the AMQP protocol. It originated in financial systems, where it was used to store and forward messages in distributed systems. RabbitMQ is now recognized by more and more people, thanks to its strong showing in reliability, availability, scalability, and richness of features.
Kafka is a distributed, multi-partition, multi-replica messaging system based on ZooKeeper. It was originally developed by LinkedIn in Scala and has since been donated to the Apache Foundation. It is a high-throughput distributed publish/subscribe messaging system, widely used for its horizontal scalability and high throughput. More and more open source distributed processing systems, such as Cloudera, Apache Storm, Spark, and Flink, support integration with Kafka.
RocketMQ is Alibaba’s open source message middleware, which has been donated to the Apache Foundation. It is written in Java, offers high throughput and high availability, and is suited to large-scale distributed systems. It has withstood the test of Double 11 (Alibaba’s Singles’ Day shopping festival), so its strength should not be underestimated.
ZeroMQ, billed as the fastest message queue in history, is implemented in C++ and exposes a C API. It is a message processing library that scales flexibly across threads, cores, and hosts. Although we usually count it as part of the message queue family, it is fundamentally different from the products above. ZeroMQ is not a message queue server; it is closer to a low-level network communication library that wraps the raw socket API in an additional layer.
There are many other message middleware products on the market, such as Tencent’s PhxQueue, CMQ, and CKafka, and NSQ, which is written in Go. Products like Redis are sometimes also treated as message middleware. They are all excellent, but space does not allow this article to cover them exhaustively. Below we take two typical message middleware products, RabbitMQ and Kafka, as the subjects of analysis, and try to lay out the key points of message middleware selection from a fair and impartial position.
Summary of key selection points
To judge whether a message middleware meets your requirements, you need to examine it along several dimensions. The first is the functional dimension, which directly determines how far you can use the product out of the box, shortening the project cycle and reducing cost. If a message middleware lacks a feature you need, you will have to do secondary development, which increases the technical difficulty, complexity, and duration of the project.
1. Functional dimension
The functional dimension can be broken down into several sub-dimensions, roughly as follows:
A priority queue differs from a first-in, first-out (FIFO) queue: messages with higher priority are consumed first, which lets you offer different service levels to downstream consumers. There is a precondition, though. If consumers consume faster than producers produce and no messages accumulate on the message middleware server (commonly called the broker), then setting a priority on messages has no practical effect. Each message is consumed as soon as the producer sends it, so the broker holds at most one message at a time, and priority is meaningless for a single message.
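The precondition above can be illustrated with a minimal in-memory sketch. The `PriorityBroker` class and its method names are hypothetical and do not correspond to any real broker API; the sketch only assumes that a larger number means a higher priority.

```python
import heapq
import itertools


class PriorityBroker:
    """Toy in-memory broker: a larger priority number is delivered first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def publish(self, body, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), body))

    def consume(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


messages = [("low", 1), ("high", 9), ("mid", 5)]

# Backlog: all three messages wait in the broker, so priority reorders them.
backlog = PriorityBroker()
for body, prio in messages:
    backlog.publish(body, prio)
with_backlog = [backlog.consume() for _ in messages]

# No backlog: each message is consumed right after it is sent,
# so the broker never holds more than one message and priority has no effect.
drained = PriorityBroker()
without_backlog = []
for body, prio in messages:
    drained.publish(body, prio)
    without_backlog.append(drained.consume())
```

With a backlog the consumer sees the messages in priority order; without one it sees them in send order, whatever their priority.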
When you shop online, have you seen a prompt like “if you don’t pay within 30 minutes, the order will be cancelled automatically”? This is a typical application of a delay queue. A delay queue stores delayed messages: messages that consumers should not receive immediately after they are sent, but only after a specified time has passed. There are two types of delay queue: message-based and queue-based. Message-based delay sets a separate delay for each message, so every time a new message enters the queue, the queue must be re-sorted by delay time, which hurts performance considerably. In practice, queue-based delay is used more often: queues are set up for different delay levels, such as 5 s, 10 s, 30 s, 1 min, 5 min, and 10 min. Every message in a given queue has the same delay, which avoids the cost of sorting by delay, and expired messages can be delivered through a scanning strategy (for example, a timer).
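A queue-based delay scheme can be sketched roughly as follows. The class name, the delay levels, and the explicit `now` argument (a logical clock in seconds, used instead of wall-clock time to keep the example deterministic) are all assumptions made for illustration.

```python
from collections import deque


class LevelDelayQueue:
    """Queue-based delay: one FIFO per fixed delay level, no per-message sorting."""

    def __init__(self, levels):
        self._queues = {level: deque() for level in levels}

    def publish(self, body, delay, now):
        if delay not in self._queues:
            raise ValueError(f"unsupported delay level: {delay}")
        # Every message in a level has the same delay, so arrival order is due order.
        self._queues[delay].append((now + delay, body))

    def poll(self, now):
        """Scan only the head of each level and return every message that is due."""
        due = []
        for queue in self._queues.values():
            while queue and queue[0][0] <= now:
                due.append(queue.popleft())
        due.sort(key=lambda item: item[0])
        return [body for _, body in due]


dq = LevelDelayQueue(levels=[5, 10, 30])
dq.publish("cancel order A", delay=30, now=0)
dq.publish("remind user B", delay=5, now=1)
early = dq.poll(now=6)   # only the 5 s message is due
late = dq.poll(now=30)   # now the 30 s message is due
```

Because each level is a plain FIFO, the scan only ever inspects queue heads, which is where the saving over per-message sorting comes from.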
Dead letter queue
Sometimes a message cannot be delivered correctly. To make sure such messages are not silently discarded, they are usually placed in a queue with a special role, generally called the dead letter queue. A related concept is the “fallback queue”. Suppose a consumer hits an exception while consuming a message. It does not send an acknowledgement (ack), so the message is rolled back and placed at the head of the queue again, where it is consumed and rolled back over and over, leaving the queue stuck in an infinite loop. To solve this, each queue can be given a fallback queue, which, together with the dead letter queue, provides a safety mechanism for exception handling. In practice, the role of the fallback queue can be filled by the dead letter queue and the retry queue.
The retry queue can be seen as a kind of fallback queue: when a consumer fails to consume a message, the message is rolled back to the broker so that it is not lost. Unlike a plain fallback queue, a retry queue is usually divided into several retry levels, each with its own redelivery delay, and the delay grows with the number of retries. For example, when a message fails for the first time it enters retry queue Q1, whose redelivery delay is 5 s, so it is redelivered after 5 s. If it fails again, it enters retry queue Q2, whose redelivery delay is 10 s, and so on: the more retries, the longer the wait before redelivery. An upper limit on retries is therefore needed, and a message that exceeds it goes to the dead letter queue. Retry queues and delay queues both need delay levels. They differ in that a delay queue is triggered internally while a retry queue is triggered by the external consumer, and in that a delay queue acts on a message once while a retry queue carries the message forward through successive levels.
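The escalation from retry levels to the dead letter queue might look like the sketch below. The queue names, the three delay values, and both function names are hypothetical; a real broker would also wait for each delay before redelivering, which is omitted here.

```python
RETRY_DELAYS = [5, 10, 30]  # hypothetical redelivery delays (seconds) for Q1, Q2, Q3


def route_failed_message(failures):
    """Pick the destination for a message after its `failures`-th failed consumption."""
    if failures <= len(RETRY_DELAYS):
        # First failure goes to Q1, second to Q2, and so on.
        return (f"retry-Q{failures}", RETRY_DELAYS[failures - 1])
    return ("dead-letter", None)  # retry limit exceeded


def deliver_with_retries(message, handler):
    """Call handler until it succeeds or retries run out; return the routing trail."""
    trail = []
    failures = 0
    while True:
        try:
            handler(message)
            return trail
        except Exception:
            failures += 1
            decision = route_failed_message(failures)
            trail.append(decision)
            if decision[0] == "dead-letter":
                return trail


def always_fails(_message):
    raise RuntimeError("consumer error")


trail = deliver_with_retries("order-42", always_fails)
```

A handler that eventually succeeds simply ends the trail early, so the message never reaches the dead letter queue.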
Consumption can work in push mode or pull mode. In push mode the broker actively pushes messages to the consumer. This gives good real-time behavior, but it needs a flow control mechanism so that the messages pushed by the server do not overwhelm the consumer. In pull mode the consumer actively requests messages from the broker, usually on a timer or in fixed-size batches. Pull mode is less real-time than push mode, but the consumer can control how many messages it pulls according to its own processing capacity.
There are two message delivery models: point-to-point (P2P) and publish/subscribe (pub/sub). In the point-to-point model, a message is removed from the queue once it has been consumed, so a consumer cannot consume a message that has already been consumed. A queue can have multiple consumers, but each message is consumed by only one of them. The publish/subscribe model defines how messages are published to and subscribed from a content node called a topic. A topic acts as the intermediary for message delivery: publishers publish messages to the topic, and subscribers subscribe to messages from it. The topic keeps publishers and subscribers independent of each other, so messages can be delivered without the two sides being in contact. Publish/subscribe is used for one-to-many broadcast of messages. RabbitMQ is a typical point-to-point model, while Kafka is a typical publish/subscribe model. However, RabbitMQ can implement publish/subscribe and achieve broadcast consumption by choosing the exchange type, and Kafka can be consumed in a point-to-point fashion: you can think of a consumer group as a queue. By comparison, Kafka supports broadcast consumption better than RabbitMQ because of its message backtracking capability.
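The idea that a consumer group behaves like a queue can be sketched with a toy topic. This is a deliberate simplification: the `Topic` class is hypothetical, and it hands messages to group members round-robin, whereas Kafka assigns whole partitions to members.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy topic: every group sees every message; members of a group share them."""

    def __init__(self):
        self._groups = {}                  # group name -> round-robin iterator over members
        self.received = defaultdict(list)  # (group, member) -> messages received

    def subscribe(self, group, members):
        self._groups[group] = cycle(members)

    def publish(self, message):
        for group, members in self._groups.items():
            member = next(members)  # exactly one member per group gets the message
            self.received[(group, member)].append(message)


topic = Topic()
topic.subscribe("billing", ["b1", "b2"])  # two members share the stream (point-to-point)
topic.subscribe("audit", ["a1"])          # a second group gets its own full copy (broadcast)
for msg in ["m1", "m2", "m3", "m4"]:
    topic.publish(msg)
```

Within the billing group each message reaches only one member, while the audit group still receives all four messages.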
Normally, once a message has been consumed it is considered done and cannot be consumed again. Message backtracking is the opposite: after a message has been consumed, it can still be consumed again later. Users of messaging systems often face the problem of “lost messages”, and it is hard to tell whether the cause is a defect in the message middleware or misuse by the user. If the middleware supports message backtracking, the “lost” message can be replayed through backtracking consumption to find the source of the problem. Backtracking is useful for much more than that, such as index recovery and rebuilding local caches, and some business compensation schemes can also be implemented with it.
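One way to picture backtracking is an append-only log in which consuming only advances an offset, so rewinding the offset replays old messages. The `MessageLog` class and its `poll` and `seek` methods are hypothetical names for this sketch, not the API of any particular client.

```python
class MessageLog:
    """Append-only log; consuming moves an offset instead of deleting messages."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, message):
        self._messages.append(message)

    def poll(self, consumer, max_messages=10):
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Move a consumer's position so that it replays from `offset`."""
        if not 0 <= offset <= len(self._messages):
            raise ValueError("offset out of range")
        self._offsets[consumer] = offset


log = MessageLog()
for m in ["a", "b", "c"]:
    log.append(m)
first_pass = log.poll("indexer")  # normal consumption
log.seek("indexer", 1)            # backtrack to offset 1
replayed = log.poll("indexer")    # already consumed messages are read again
```

Rebuilding an index or a local cache then amounts to seeking back to an earlier offset and consuming again.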
Message accumulation + persistence
Traffic peak clipping is a very important function of message middleware, and it relies on the middleware’s ability to accumulate messages. In a sense, a message middleware that cannot accumulate messages is not a qualified message middleware. Messages can accumulate in memory or on disk. RabbitMQ typically accumulates in memory, but not exclusively: when certain conditions are triggered, it pages messages out from memory to disk (paging reduces throughput), or a lazy queue can be used to persist messages directly to disk. Kafka typically accumulates on disk: all messages are stored on disk. Disk capacity is generally much larger than memory capacity, and for disk-based accumulation the capacity is the size of the whole disk. From another angle, message accumulation also gives the message middleware a redundant storage function. One example is the New York Times case (https://www.confluent.io/blog…), which uses Kafka directly as a storage system.
Trace (link tracing) in distributed architectures will be familiar to most readers. For message middleware, message link tracing (hereinafter, message tracing) is just as important. Put simply, message tracing means knowing where a message came from, where it is stored, and where it went. With this capability, sent or consumed messages can be traced along the link, so that problems can be located and investigated quickly.
Message filtering means delivering only specified categories of messages to downstream users according to given filtering rules. With Kafka, for example, different categories of messages can be sent to different topics, which achieves filtering in a loose sense, or messages within the same topic can be classified by partition. In a stricter sense, though, message filtering means filtering a given set of messages by some rule. Again taking Kafka as an example, you can filter messages with the ConsumerInterceptor interface provided by the client or with the filter function of Kafka Streams.
Multi-tenancy, also called multi-tenancy technology, is a software architecture technique for sharing the same system or program components among multiple users while still keeping each user’s data isolated. RabbitMQ supports multi-tenancy. Each tenant is represented as a vhost, which is essentially an independent miniature RabbitMQ server with its own queues, exchanges, bindings, and permissions. A vhost is like a virtual machine on a physical host: it provides logical separation between instances and keeps the data of different applications apart. It distinguishes the many clients of a single RabbitMQ instance and also avoids naming conflicts between queues and exchanges.
Multi protocol support
A message is a carrier of information. For producers and consumers to understand each other (producers need to know how to construct messages, and consumers need to know how to parse them), messages must be described in a unified format, called the message protocol. A valid message must have some format; a message without one is meaningless. Common messaging protocols include AMQP, MQTT, STOMP, and XMPP (JMS is more a specification than a protocol). The more protocols a message middleware supports, the wider its range of applications and the more general-purpose it is. For example, RabbitMQ’s support for MQTT has earned it a place in Internet of Things applications. Other message middleware products, such as Kafka, run on their own proprietary protocol.
Cross language support
Many companies use several programming languages in their technology stack, such as C/C++, Java, Go, and PHP. Message middleware already decouples applications, and support for clients in multiple languages extends the reach of that decoupling. Cross-language support also reflects how popular a message middleware is.
Flow control addresses the speed mismatch between sender and receiver. It matches the two speeds by throttling the sender’s rate so that the receiver can keep up. Common flow control methods include stop-and-wait, sliding window, and token bucket.
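Of the methods listed, the token bucket is the easiest to show in a few lines. The sketch below is a generic token bucket, not the flow control of any specific broker; the `now` argument is an assumption that replaces wall-clock time (such as `time.monotonic()`) to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket driven by an explicit clock so the behavior is deterministic."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.last = now

    def allow(self, now, cost=1):
        # Refill in proportion to the elapsed time, capped at the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the sender must slow down


bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # a burst of 3 passes, the 4th is throttled
after_wait = bucket.allow(now=0.5)                 # 0.5 s at 2 tokens/s refills one token
```

The capacity bounds how large a burst the receiver must absorb, and the rate bounds the sustained sending speed.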
As the name suggests, message ordering guarantees that messages are delivered in order. A very common application is CDC (Change Data Capture). Take MySQL as an example: if the binlog events it emits arrive in the wrong order, say the original operations were to add 1 to a value and then multiply it by 2, but after reordering they become multiply by 2 and then add 1, the data becomes inconsistent.
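The add-then-multiply example can be worked through directly. The event format `(op, operand)` and the starting value 10 are assumptions made for illustration and are not the real binlog format.

```python
def apply_changes(value, changes):
    """Replay a list of (op, operand) change events onto a value, in order."""
    for op, operand in changes:
        if op == "add":
            value += operand
        elif op == "mul":
            value *= operand
        else:
            raise ValueError(f"unknown op: {op}")
    return value


changes = [("add", 1), ("mul", 2)]
in_order = apply_changes(10, changes)                      # (10 + 1) * 2
out_of_order = apply_changes(10, list(reversed(changes)))  # 10 * 2 + 1
```

The same two events produce 22 in the original order and 21 when swapped, so a replica fed out-of-order events diverges from the source.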
Kafka added two security mechanisms in version 0.9: authentication and authorization. Authentication verifies the identity of the parties to a connection, including connections between client and broker, between brokers, and between broker and ZooKeeper; SSL, SASL, and other mechanisms are currently supported. Authorization controls the client’s read and write operations, covering operations on messages and on the Kafka cluster. Authorization is pluggable and supports integration with external authorization services. RabbitMQ likewise provides authentication (TLS/SSL, SASL) and authorization (for read and write operations).
There are generally three levels of delivery guarantee between producers and consumers. At most once: messages may be lost but are never delivered more than once. At least once: messages are never lost but may be delivered more than once. Exactly once: every message is delivered once and only once. Most message middleware provides only at-most-once and at-least-once guarantees. The third is hard to achieve, which also makes message idempotency hard to guarantee.
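On the consumer side, the difference between the first two guarantees often comes down to whether the acknowledgement is sent before or after processing. The simulation below is a simplified sketch under that assumption; the function name and the single simulated crash are hypothetical.

```python
def consume(messages, crash_on, ack_first):
    """Simulate one crash between ack and processing, then a restart.

    Returns the list of messages that were actually processed.
    """
    processed, acked, crashed = [], set(), False
    for _attempt in range(2):  # the second pass models the restart after the crash
        for index, message in enumerate(messages):
            if index in acked:
                continue  # the broker does not redeliver acknowledged messages
            if ack_first:
                acked.add(index)           # at most once: ack, then process
            else:
                processed.append(message)  # at least once: process, then ack
            if message == crash_on and not crashed:
                crashed = True
                break  # the consumer dies between the two steps
            if ack_first:
                processed.append(message)
            else:
                acked.add(index)
    return processed


at_most_once = consume(["m1", "m2", "m3"], crash_on="m2", ack_first=True)
at_least_once = consume(["m1", "m2", "m3"], crash_on="m2", ack_first=False)
```

Acknowledging first loses m2, while processing first handles m2 twice, which is why at-least-once delivery is usually paired with idempotent processing downstream.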
Kafka introduced idempotence and transactions in version 0.11. Kafka’s idempotence is scoped to a single producer, a single partition, and a single session, while transactions guarantee atomic writes across multiple partitions: the messages written to those partitions either all succeed or are all rolled back. Together, these two features give Kafka EOS (Exactly Once Semantics).
Global idempotence, however, also has to be considered upstream and downstream, that is, at the business level; idempotent processing is an important business-level concern in its own right. Take the downstream consumer as an example: a consumer may hit an exception after consuming a message but before it has had time to acknowledge it, and after recovery it will consume the same message again. Idempotence of this kind cannot be guaranteed by the message middleware. Ensuring global idempotence requires additional external resources, for example using the order number as a unique identifier and keeping a de-duplication table downstream.
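A downstream de-duplication table keyed by order number could be sketched like this. The in-memory set stands in for a real table, and the message fields `order_no` and `amount` are hypothetical; in a real system the business update and the de-duplication record would need to be written atomically, for example in one database transaction.

```python
class IdempotentConsumer:
    """Skips messages whose business key has already been processed."""

    def __init__(self):
        self._processed = set()  # stands in for a de-duplication table
        self.balance = 0

    def handle(self, message):
        key = message["order_no"]  # unique business identifier
        if key in self._processed:
            return False  # duplicate delivery: the side effect is not repeated
        self.balance += message["amount"]
        self._processed.add(key)
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"order_no": "A100", "amount": 30},
    {"order_no": "A100", "amount": 30},  # redelivered after a missed ack
    {"order_no": "A101", "amount": 12},
]
results = [consumer.handle(m) for m in deliveries]
```

The redelivered message is recognized by its order number and skipped, so the balance reflects each order only once.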
Transactions are a familiar concept: a transaction consists of all the operations executed between begin transaction and end transaction. Both Kafka and RabbitMQ are message middleware that support transactions. In both cases, though, the transaction only covers the producer’s sending of messages, which either succeeds or fails. Message middleware can serve as one means of implementing distributed transactions, but it does not itself provide global distributed transactions.
The following table summarizes and supplements the feature comparison of Kafka and RabbitMQ.
2. Performance
The functional dimension is an important reference in message middleware selection, but it is not the only one. Sometimes performance matters more than features, and the two are often at odds: you cannot always have both. Kafka’s performance drops when idempotence and transactions are enabled, and enabling the rabbitmq_tracing plug-in greatly reduces RabbitMQ’s performance. The performance of message middleware generally refers to its throughput. Although RabbitMQ has the edge over Kafka in features, Kafka’s throughput is one to two orders of magnitude higher. Generally, a single RabbitMQ node handles up to about 10,000 QPS, while a single Kafka node can sustain around 100,000 QPS and can even reach one million.
The throughput of message middleware is always bounded by the hardware. Take network bandwidth as an example. If a machine has a single network card with 1 Gbps of bandwidth, then reaching a throughput of one million messages per second requires the message body not to exceed (1 Gb / 8) / 1,000,000, which is about 134 B if 1 Gb is taken as 2^30 bits (125 B with the decimal definition of 10^9 bits). In other words, if the message body is larger than that, million-level throughput is impossible. The same calculation can be applied to memory and disk.
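The bandwidth estimate above can be reproduced with a few lines of arithmetic. The function name is hypothetical, and protocol overhead such as headers and framing is ignored, so the real limit would be somewhat lower.

```python
def max_message_bytes(link_bits_per_second, messages_per_second):
    """Largest average message size a link can carry at the given message rate."""
    return link_bits_per_second / 8 / messages_per_second


binary_gigabit = max_message_bytes(2**30, 1_000_000)   # 1 Gb taken as 2^30 bits
decimal_gigabit = max_message_bytes(10**9, 1_000_000)  # 1 Gb taken as 10^9 bits
```

The binary reading gives about 134 bytes per message and the decimal reading gives 125 bytes; either way, the budget per message at one million messages per second is very small.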
Latency, an important performance indicator, is often overlooked in the message middleware field, because scenarios that use message middleware usually do not have strict timeliness requirements; where they do, RPC can be used instead. Message middleware can accumulate messages, and the larger the backlog, the longer the end-to-end latency. Delay queues are also a signature feature of some message middleware. So why pay attention to latency at all? Message middleware decouples systems. With a low-latency message middleware, upstream producers return quickly after sending a message, and consumers receive messages sooner. When there is no backlog, this makes the chain of actions between upstream and downstream applications more efficient. Message middleware is not recommended for scenarios with strict timeliness requirements, but if its latency is excellent, the performance of the whole system improves considerably.
3. Reliability + availability
Message loss is a pain point that everyone using message middleware has to face, and the message reliability behind it is a key measure of a message middleware’s quality. In the financial payment field in particular, message reliability is especially important. Any discussion of reliability must also cover availability, and the two should not be confused. The reliability of message middleware is the degree to which it guarantees that messages are not lost, while its availability is the percentage of time it runs without failure, usually measured in nines.
In a narrow sense, a distributed system architecture is an applied implementation of consistency protocol theory, and message reliability and availability can likewise be traced back to the consistency protocol behind the message middleware. Kafka uses a consistency protocol similar to PacificA: it keeps multiple replicas in sync through the ISR (in-sync replicas) and supports strong consistency semantics (through the acks setting). RabbitMQ, correspondingly, implements multiple replicas and strong consistency semantics through mirrored queues. With multiple replicas, when the master node goes down unexpectedly, a slave can be promoted to the new master and continue serving, which preserves availability. Kafka was originally designed for log processing, which left the impression that it does not care much about data reliability. As versions have been upgraded and optimized, however, its reliability has improved greatly; see KIP-101 for details. At present, RabbitMQ is used mostly in financial payment, while Kafka is used mostly in log processing and big data. As RabbitMQ’s performance keeps improving and Kafka’s reliability keeps strengthening, each can be expected to gain a share of the fields where it was previously weak.
Synchronous flushing to disk is an effective way to improve a component’s reliability, and message middleware is no exception. Both Kafka and RabbitMQ support it. The author has reservations about it, though: in most cases, the reliability of a component should be guaranteed not by synchronous flushing, which is extremely costly for performance, but by a multi-replica mechanism.
One more aspect worth mentioning here is extensibility, which I narrowly group under availability. Extensibility can enhance the availability of a message middleware and widen its range of use. RabbitMQ, for example, supports multiple message protocols through extensions built on its plug-in mechanism. In terms of cluster deployment, Kafka’s horizontal scalability gives it almost linear capacity growth; LinkedIn’s accounts of its own practice mention Kafka clusters of more than 1,000 machines.
4. Operation and maintenance management
When using message middleware, all kinds of anomalies inevitably appear, on both the client side and the server side. How do you monitor and repair them promptly and effectively? Business traffic has peaks and troughs, especially in e-commerce, so how do you assess capacity effectively, particularly during big promotions? Incidents such as a kicked power cable or a network cable being dug up never stop happening, so how do you run active-active deployments across sites well? All of this depends on the operation and maintenance management that grows up around message middleware.
Operation and maintenance management can be further subdivided into resource application, review, monitoring, alerting, management, disaster recovery, deployment, and so on.
Application and review are easy to understand. Controlling resources at the source both corrects how application teams use the middleware and supports traffic statistics and traffic assessment. Application and review are usually integrated with a company’s internal systems, so open source products are not a good fit here.
Monitoring and alerting are also easy to understand. Comprehensive monitoring of message middleware usage provides baseline data for the system and, combined with alerting when anomalies are detected, lets operations staff and developers step in quickly. Besides the general monitoring items (hardware, GC, and so on), message middleware also requires attention to end-to-end latency, message auditing, message accumulation, and the like. For RabbitMQ, the standard monitoring and management tool is the rabbitmq_management plug-in, but the community offers many excellent products as well, such as AppDynamics, collectd, Datadog, Ganglia, Munin, Nagios, New Relic, Prometheus, and Zenoss. Kafka is not inferior in this respect, with products such as Kafka Manager, Kafka Monitor, Kafka Offset Monitor, Burrow, Chaperone, and Confluent Control Center; Cruise Control can even provide automated operations.
Capacity expansion, downgrading, version upgrades, cluster node deployment, and fault handling all depend on management tools. A complete set of management tools lets you achieve twice the result with half the effort when changes are needed. Faults may be large or small. Most are application anomalies or single-machine failures such as power loss, network problems, or disk damage, and multiple replicas within a single data center are enough to handle these. A failure of the whole data center, however, calls for remote disaster recovery, where the key question is how to replicate data effectively. For Kafka, see products such as MirrorMaker and uReplicator; for RabbitMQ, see Federation and Shovel.
5. Community dynamics and ecological development
With popular programming languages such as Java and Python, most exceptions you run into can be solved with a search engine, because the more people use a product, the more pitfalls have already been hit and the more solutions exist. The same applies to message middleware. If you choose a niche product, it may be handy in some respects, but its releases will be slow and community support will be hard to get when you hit thorny problems. A popular product, by contrast, is updated actively: earlier shortcomings are fixed quickly, and new features are added to keep up with fast-moving technology, so that you can “stand on the shoulders of giants”. As mentioned under operation and maintenance management, Kafka and RabbitMQ both have a range of open source monitoring and management products, which they owe to their fast-growing communities and ecosystems.
Common misconceptions in message middleware selection
Before selecting message middleware, ask yourself: do you really need one? Once that is settled, ask a second question: do you need to maintain it yourself? To save costs, many start-ups simply buy message middleware as a cloud service; they only need to focus on sending and receiving messages and can outsource the rest.
Many people feel an urge to build their own message middleware. You can simply wrap Java’s ArrayBlockingQueue, or build one on top of files, a database, Redis, or some other underlying storage. But as a basic component, message middleware is not as simple as it seems: it needs a supporting set of products to manage and operate the whole ecosystem. If documentation is incomplete and operations are not standardized, newcomers are in for a nightmarish experience. Is in-house development really necessary? Unless you are under KPI pressure, first consider two questions: 1. Are the message middleware products on the market really unable to meet current business needs? 2. Does the team have enough capability, manpower, funding, and energy to support in-house development?
Many people consult comparison articles on the Internet when selecting message middleware, but the expertise, rigor, and neutrality of those articles need to be verified, so they should be read skeptically. Some articles pass judgment on a message middleware without stating any constraints or scenarios, and some compare features and performance without specifying the versions or the test environment.
Choosing message middleware is like the fable of the little horse crossing the river: what matters most is what suits you. The choice must fit your own business needs, because technology serves the business. In general, candidates can be screened one by one against the six dimensions covered in the previous section, such as features and performance. A deeper choice depends on whether you grasp the essence of each product. In my humble opinion, the essence of RabbitMQ is routing and the essence of Kafka is streaming. Understanding this is particularly important for choosing the right message middleware.
Message middleware selection should not blindly chase performance or features. Performance can be tuned, and features can be added through secondary development. If you must choose between the two, prefer performance, because on the whole there is less room for performance optimization than for feature extension. For long-term development, however, ecosystem matters more than either.
Reliability is often misunderstood: people look for a product that guarantees absolute message reliability. Unfortunately, nothing in the world is absolute; a product can only get as close to perfect as possible. Making messages as reliable as possible involves not only the message middleware itself but also the upstream and downstream systems, so effort is needed on three fronts: production, the service, and consumption. The article “Rabbitmq message reliability analysis” analyzes RabbitMQ’s reliability along these three dimensions.
Another criterion is to fit the team’s own technology stack as closely as possible. Although there is no bad message middleware, only bad programmers, a team with a C stack will find it much easier to dig into PhxQueue than into Kafka, which is written in Scala.
Source: Zhan Xiaolang’s blog, “In depth analysis of message middleware selection”.