RocketMQ producers, consumers, and deployment configuration: best practices

Time:2021-1-16

1 producer

1.1 notes for sending messages

1. Use of tags

An application should use a single topic where possible and distinguish message subtypes with tags. Tags can be set freely by the application. The consumer can filter messages by tag at the broker when subscribing only if the producer set tags when sending, for example message.setTags("TagA").

2. Use of keys

The business-level unique identifier of each message should be set in the keys field, which makes it easier to investigate lost messages later. The server builds a hash index on the keys of each message, so the application can query the message content, and who consumed the message, by topic and key. Because it is a hash index, keep keys as unique as possible to avoid hash collisions.

// Order ID
String orderId = "20034568923546";
message.setKeys(orderId);

3. Printing of log

Log every send, whether it succeeds or fails, and be sure to include the SendResult and the key. As long as the send method does not throw an exception, the message was sent successfully. A successful send can still end in one of several states, which are defined in SendResult. Each state is described below:

  • SEND_OK

The message was sent successfully. Note that a successful send does not by itself mean the message is stored reliably. To ensure that no messages are lost, you should also enable synchronous replication to the slave or synchronous disk flushing, that is, SYNC_MASTER or SYNC_FLUSH.

  • FLUSH_DISK_TIMEOUT

The message was sent successfully, but the server timed out while flushing it to disk. At this point the message is already in the server's queue (memory), so it is lost only if the server goes down. The flush mode and the synchronous flush timeout are set in the message store configuration. This state is returned when the broker is configured with flushDiskType=SYNC_FLUSH (the default is ASYNC_FLUSH) and does not finish flushing within the synchronous flush timeout (5 s by default).

  • FLUSH_SLAVE_TIMEOUT

The message was sent successfully, but the server timed out while synchronizing it to the slave. At this point the message is already in the server's queue, so it is lost only if the server goes down. This state is returned when the broker's role is SYNC_MASTER (the default is ASYNC_MASTER) and the slave broker does not finish synchronizing with the master within the synchronous flush timeout (5 s by default).

  • SLAVE_NOT_AVAILABLE

The message was sent successfully, but no slave is available. This state is returned when the broker's role is SYNC_MASTER (the default is ASYNC_MASTER) and no slave broker is configured.
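
As a rough illustration of how an application might act on these states, the sketch below mirrors the four state names in a local enum and builds a log line that always includes the key. The SendLog class, its Status enum and both methods are hypothetical names written for this example; in a real application the status would come from the send result returned by the client library.

```java
// Hypothetical sketch: a local mirror of the four send states described above.
class SendLog {
    enum Status { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    // True when the broker accepted the message but could not confirm the
    // durability level it was configured for (disk flush or slave replication).
    static boolean durabilityUnconfirmed(Status status) {
        return status != Status.SEND_OK;
    }

    // Builds the log line; it always contains the business key and the status.
    static String describe(String key, Status status) {
        String level = durabilityUnconfirmed(status) ? "WARN" : "INFO";
        return level + " send finished, key=" + key + ", status=" + status;
    }

    public static void main(String[] args) {
        System.out.println(describe("20034568923546", Status.SEND_OK));
        System.out.println(describe("20034568923546", Status.FLUSH_DISK_TIMEOUT));
    }
}
```

Logging the three non-OK states at a higher level is one possible choice; whether they warrant an alert depends on how strict the business is about message loss.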

1.2 handling method of message sending failure

The send method of producer itself supports internal retrying. The retrying logic is as follows:

  • At most 2 retries (2 for synchronous sending, 0 for asynchronous sending).
  • If a send fails, the producer rotates to the next broker. The total time spent in this method does not exceed the value of sendMsgTimeout, which is 10 s by default.
  • If sending to the broker itself raises a timeout exception, the producer does not retry.

These strategies ensure, to a certain extent, that messages are sent successfully. If the business requires high message reliability, it is recommended to add retry logic in the application as well. For example, when a synchronous send fails, store the message in a database and have a background thread retry periodically, so that the message eventually reaches the broker.
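
The store-then-retry idea can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the client's own mechanism: FallbackSender is a hypothetical class, the Predicate stands in for a synchronous send call that returns false (or throws) on failure, and the in-memory deque stands in for the database table of unsent messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of application-level retry with a fallback store.
class FallbackSender {
    private final Predicate<String> sender;                   // stand-in for a synchronous send
    private final Deque<String> pending = new ArrayDeque<>(); // stand-in for a database table

    FallbackSender(Predicate<String> sender) {
        this.sender = sender;
    }

    // Try to send; on failure, store the message for a later retry.
    synchronized boolean sendOrStore(String message) {
        if (trySend(message)) {
            return true;
        }
        pending.addLast(message);
        return false;
    }

    // Meant to be called periodically by a background thread: resend stored
    // messages in order and stop at the first failure so it is retried next time.
    synchronized int retryPending() {
        int resent = 0;
        while (!pending.isEmpty() && trySend(pending.peekFirst())) {
            pending.removeFirst();
            resent++;
        }
        return resent;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    private boolean trySend(String message) {
        try {
            return sender.test(message);
        } catch (RuntimeException e) {
            return false; // treat an exception like a failed send
        }
    }
}
```

In an application, retryPending() could be scheduled with a ScheduledExecutorService, and the deque would be replaced by a durable table so that stored messages survive a process restart.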

This database-backed retry is left to the application rather than built into the MQ client for the following reasons. First, the MQ client is designed to be stateless, which makes it easy to scale horizontally, and it consumes only CPU, memory and network resources. Second, if a key-value store were built into the MQ client, the data would be reliable only if it were flushed to disk synchronously. Synchronous flushing is expensive, so asynchronous flushing would normally be used. Because shutting down the application is not under the control of the MQ operations staff, the process may often be stopped abruptly, for example with kill -9, and data that has not yet been flushed would be lost. Third, the machine on which the producer runs is less reliable. It is usually a virtual machine and is not suitable for storing important data. In conclusion, it is recommended that the retry process be controlled by the application.

1.3 select oneway to send

Generally, message sending is a process like this:

  • The client sends the request to the server
  • The server processes the request
  • The server returns a response to the client

The time taken to send a message is therefore the sum of these three steps. Some scenarios need very low latency but do not need high reliability. Log collection applications, for example, can send in oneway mode. A oneway send only issues the request and does not wait for a response, and at the client implementation level issuing the request is just an operating system call. This usually takes microseconds.

2 consumers

2.1 consumption process idempotent

RocketMQ cannot guarantee exactly-once delivery, so duplicate messages are possible. If the business is sensitive to duplicate consumption, it must deduplicate at the business level, for example with the help of a relational database. First, determine the unique key of the message, which can be the msgId or a unique identifier field in the message body, such as an order ID. Before consuming, check whether the unique key already exists in the database. If it does not, insert it and consume the message; otherwise skip the message. (In practice atomicity must be considered: rather than checking first, attempt the insert directly. If it fails with a primary key conflict, skip the message.)

A msgId is always a globally unique identifier, but in practice the same message may end up with two different msgIds (for example, when a consumer actively resends it, or when the client's resend mechanism causes a duplicate). In that case deduplication has to be based on a business field.
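
The insert-first deduplication described above can be sketched as follows. This is a minimal illustration, assuming an in-memory concurrent set as a stand-in for a database table with a primary key on the business key; IdempotentConsumer and its method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of idempotent consumption keyed on a business field.
class IdempotentConsumer {
    // Stand-in for a table with a unique (primary) key on the business key.
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    IdempotentConsumer(Consumer<String> handler) {
        this.handler = handler;
    }

    // Returns true if the message was processed, false if it was a duplicate.
    boolean consume(String businessKey, String body) {
        // add() is atomic: it returns false when the key already exists, which
        // plays the role of an INSERT failing on a primary key conflict.
        if (!processedKeys.add(businessKey)) {
            return false;
        }
        try {
            handler.accept(body);
        } catch (RuntimeException e) {
            processedKeys.remove(businessKey); // let a redelivery try again
            throw e;
        }
        return true;
    }
}
```

With a real database, the key insert and the business update would normally run in the same transaction, so that a failure rolls back both.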

2.2 treatment of slow consumption

1. Improve the parallel degree of consumption

Most message consumption is I/O intensive: it operates on a database or makes RPC calls. The consumption speed in these cases depends on the throughput of the back-end database or external system. Increasing consumption parallelism raises total consumption throughput, but beyond a certain point throughput starts to decline. The application must therefore set a reasonable degree of parallelism. There are several ways to change consumption parallelism:

  • Within the same consumer group, increase the number of consumer instances (note that consumer instances beyond the number of subscribed queues receive no messages). You can add machines or start multiple processes on existing machines.
  • Increase the number of consumption threads of a single consumer by changing the consumeThreadMin and consumeThreadMax parameters.

2. Batch consumption

If a business process supports batch consumption, consumption throughput can be greatly improved. For example, in an order deduction application, processing one order at a time takes 1 s, while processing 10 orders in one batch may take only 2 s. The batch size is controlled by the consumer's consumeMessageBatchMaxSize parameter. The default value is 1, meaning only one message is consumed at a time. If it is set to N, each consumption call receives at most N messages.

3. Skip unimportant messages

When messages accumulate because consumption cannot keep up with sending, and the business does not have strict data requirements, you can choose to discard unimportant messages. For example, when a queue holds more than 100000 unconsumed messages, discard some or all of them so that consumption can quickly catch up with sending. The sample code is as follows:

public ConsumeConcurrentlyStatus consumeMessage(
        List<MessageExt> msgs,
        ConsumeConcurrentlyContext context) {
    // Queue offset of the first message in this batch
    long offset = msgs.get(0).getQueueOffset();
    // Current maximum offset of the queue, carried as a message property
    String maxOffset =
            msgs.get(0).getProperty(MessageConst.PROPERTY_MAX_OFFSET);
    // Number of messages still waiting in the queue
    long diff = Long.parseLong(maxOffset) - offset;
    if (diff > 100000) {
        // TODO: special handling for message accumulation (here the batch is skipped)
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
    // TODO: normal consumption process
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
}

4. Optimize the consumption process of each message

For example, the consumption process of a message is as follows:

  • Query [data 1] from DB according to message
  • Query [data 2] from DB according to message
  • Complex business computing
  • Insert [data 3] into DB
  • Insert [data 4] into DB

This process involves four interactions with the DB. If each takes 5 ms, they take 20 ms in total, and with 5 ms of business computation the total is 25 ms. If the four DB interactions can be reduced to two, the total drops to 15 ms, a 40% reduction in processing time. In addition, if the application is latency sensitive, the DB can be deployed on SSDs, whose response time (RT) is much lower than that of SCSI disks.

2.3 consumption print log

If the message volume is small, it is recommended to log each message, along with how long consumption took, in the consumption entry method to make later troubleshooting easier.

public ConsumeConcurrentlyStatus consumeMessage(
        List<MessageExt> msgs,
        ConsumeConcurrentlyContext context) {
    log.info("RECEIVE_MSG_BEGIN: " + msgs.toString());
    // TODO: normal consumption process
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
}

If you can log each message together with its consumption time, it is much easier to investigate online problems such as slow consumption.
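
One way to record the consumption time is to wrap the consumption logic in a small timing helper. The sketch below is plain Java and independent of the client library; TimedConsume and timed are hypothetical names, and System.out stands in for the application's logger.

```java
import java.util.function.Supplier;

// Hypothetical sketch: measure how long the consumption logic takes and log it
// together with the message key.
class TimedConsume {
    static <T> T timed(String msgKey, Supplier<T> consumeLogic) {
        long start = System.nanoTime();
        try {
            return consumeLogic.get();
        } finally {
            // Runs on both success and failure, so slow failures are logged too.
            long costMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("CONSUME_MSG_END: key=" + msgKey + ", costMs=" + costMs);
        }
    }
}
```

Inside a listener, the normal consumption process would be passed as the supplier and its return value handed back as the consumption status.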

2.4 other consumption suggestions

1 about consumers and subscriptions

The first thing to note is that different consumer groups can consume the same topics independently, and each consumer group has its own consumption offsets. Make sure that all consumers in the same group have identical subscription information.

2 about ordered messages

The consumer locks each message queue to make sure its messages are consumed one by one. This reduces performance, but it is useful when you care about message order. Throwing an exception is not recommended; return ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT instead.

3 about concurrent consumption

As the name suggests, the consumer consumes messages concurrently. This mode is recommended for good performance. Throwing an exception is not recommended; return ConsumeConcurrentlyStatus.RECONSUME_LATER instead.

4 about consumption status

In a concurrent consumption listener, you can return RECONSUME_LATER to tell the consumer that the message cannot be consumed now and should be consumed again later. The consumer then continues with other messages. In an ordered message listener you cannot skip a message, because order matters, but you can return SUSPEND_CURRENT_QUEUE_A_MOMENT to tell the consumer to wait for a moment.

5 about blocking

Blocking the listener is not recommended, because it blocks the thread pool and may eventually stop the consuming process.

6 about thread number setting

The consumer uses a ThreadPoolExecutor internally to consume messages, so you can change its size with setConsumeThreadMin or setConsumeThreadMax.

7 about consumption offsets

When creating a new consumer group, you need to decide whether to consume the historical messages that already exist in the broker. CONSUME_FROM_LAST_OFFSET ignores historical messages and consumes only messages produced afterwards. CONSUME_FROM_FIRST_OFFSET consumes every message that exists in the broker. You can also use CONSUME_FROM_TIMESTAMP to consume messages produced after a specified timestamp.

3 Broker

3.1 broker role

Broker roles are ASYNC_MASTER, SYNC_MASTER and SLAVE. If message reliability requirements are strict, deploy SYNC_MASTER plus SLAVE. If they are not, ASYNC_MASTER plus SLAVE can be used. If you only need a convenient setup for testing, you can deploy an ASYNC_MASTER alone or a SYNC_MASTER alone.

3.2 FlushDiskType

Compared with ASYNC_FLUSH (asynchronous flushing), SYNC_FLUSH (synchronous flushing) costs a lot of performance but is more reliable, so the choice is a trade-off that depends on the actual business scenario.

3.3 broker configuration


4 NameServer

In rocketmq, name servers are designed for simple route management. Its responsibilities include:

  • Brokers periodically register routing data with each name server.
  • The name server provides the latest routing information for clients, including producers, consumers and command line clients.


5 client configuration

Relative to the RocketMQ broker cluster, producers and consumers are both clients. This section describes the behavior configuration of producers and consumers.

5.1 client addressing mode

A RocketMQ client first finds the name server and then finds the broker through the name server. The name server address can be configured in several ways, listed below from highest to lowest priority. A higher-priority setting overrides a lower-priority one.

  • The name server address is specified in code; multiple name server addresses are separated by semicolons

producer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");

consumer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");

  • The name server address is specified in the Java startup parameters (quoted so that the shell does not treat the semicolon as a command separator)

-Drocketmq.namesrv.addr="192.168.0.1:9876;192.168.0.2:9876"

  • The name server address is specified in an environment variable

export NAMESRV_ADDR="192.168.0.1:9876;192.168.0.2:9876"

  • HTTP static server addressing (default)

After the client starts, it periodically accesses a static HTTP server at the following address: http://jmenv.tbsite.net:8080/… The content returned by this URL looks like this:

192.168.0.1:9876;192.168.0.2:9876   

By default, the client accesses the HTTP server every 2 minutes and updates its local name server address. The URL is hard-coded in the client. You can change which server is accessed by modifying the /etc/hosts file. For example, add the following entry to /etc/hosts:

10.232.22.67    jmenv.tbsite.net

It is recommended to use HTTP static server addressing mode. The advantage is that the client deployment is simple, and the name server cluster can be hot upgraded.
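
The priority order described in this section can be sketched as a small resolver. This is an illustration of the ordering only, not the client's actual code: NamesrvAddrResolver and its methods are hypothetical, and the three arguments stand in for the value set in code, the rocketmq.namesrv.addr system property, and the NAMESRV_ADDR environment variable.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the addressing priority: code, then JVM startup
// parameter, then environment variable, then HTTP static server addressing.
class NamesrvAddrResolver {
    // Returns the highest-priority non-empty setting, or null when none is set
    // (the case in which HTTP static server addressing would be used).
    static String resolve(String fromCode, String fromSystemProperty, String fromEnv) {
        for (String candidate : new String[] {fromCode, fromSystemProperty, fromEnv}) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return null;
    }

    // Splits a semicolon-separated address list into individual addresses.
    static List<String> split(String addrs) {
        return Arrays.asList(addrs.split(";"));
    }

    public static void main(String[] args) {
        String addrs = resolve(null,
                System.getProperty("rocketmq.namesrv.addr"),
                System.getenv("NAMESRV_ADDR"));
        System.out.println(addrs == null
                ? "no explicit address, fall back to HTTP addressing"
                : split(addrs).toString());
    }
}
```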

5.2 client configuration

DefaultMQProducer, TransactionMQProducer, DefaultMQPushConsumer and DefaultMQPullConsumer all inherit from the ClientConfig class, which is the common configuration class for clients. Client configuration is exposed through getters and setters, so each parameter can be configured with Spring or in code. For example, the namesrvAddr parameter can be set with producer.setNamesrvAddr("192.168.0.1:9876"), and the other parameters work the same way.

1 Public configuration of client


2 producer configuration


3 pushconsumer configuration


4 pullconsumer configuration


5 message data structure


6 system configuration

This section mainly introduces the configuration of the system (JVM / OS).

6.1 JVM options

The latest release of JDK 1.8 is recommended. Setting -Xms and -Xmx to the same value prevents the JVM from resizing the heap, which gives better performance. A simple JVM configuration is as follows:

-server -Xms8g -Xmx8g -Xmn4g

If you don't care about the startup time of the RocketMQ broker, a better option is to "pre-touch" the Java heap so that every page is allocated during JVM initialization:

-XX:+AlwaysPreTouch

Disabling biased locking may reduce JVM pauses:

-XX:-UseBiasedLocking

As for garbage collection, the G1 collector is recommended with JDK 1.8:

-XX:+UseG1GC -XX:G1HeapRegionSize=16m   
-XX:G1ReservePercent=25 
-XX:InitiatingHeapOccupancyPercent=30

These GC options may look aggressive, but they have proven to perform well in our production environment. In addition, do not set -XX:MaxGCPauseMillis too small; otherwise the JVM will use a small young generation to meet the goal, which leads to very frequent minor GCs. It is also recommended to use rolling GC log files:

-XX:+UseGCLogFileRotation   
-XX:NumberOfGCLogFiles=5 
-XX:GCLogFileSize=30m

If writing GC log files increases the broker's latency, consider redirecting the GC log to an in-memory file system:

-Xloggc:/dev/shm/mq_gc_%p.log

6.2 Linux kernel parameters

The os.sh script in the bin folder lists many kernel parameters, which can be used in production with minor changes. The following parameters need attention. For more details, refer to the documentation for /proc/sys/vm/*.

  • vm.extra_free_kbytes tells the VM to keep extra free memory between the threshold at which background reclaim (kswapd) starts and the threshold at which direct reclaim (by allocating processes) starts. RocketMQ uses this parameter to avoid long latency in memory allocation. (It depends on the specific kernel version.)
  • vm.min_free_kbytes: if it is set lower than 1024 KB, the system becomes subtly broken and is prone to deadlock under high load.
  • vm.max_map_count limits the maximum number of memory map areas a process may have. RocketMQ uses mmap to load the CommitLog and ConsumeQueue, so a larger value is recommended for this parameter.
  • vm.swappiness defines how aggressively the kernel swaps memory pages. A higher value increases aggressiveness, and a lower value reduces the amount of swapping. A value of 10 is recommended to avoid swap latency.
  • File descriptor limits: RocketMQ needs to open file descriptors for files (CommitLog and ConsumeQueue) and network connections. We recommend setting the file descriptor limit to 655350.
  • Disk scheduler: the deadline I/O scheduler is recommended for RocketMQ. It attempts to provide guaranteed latency for requests.
