Open source Kafka enhancement: okmq-1.0.0


The core idea of this tool is a gamble: messages are seriously affected only when two basic components die at the same time. Oh, and except for a power failure.

MQ is a good thing, and we all use it. That is exactly why MQ must be highly available. Our group has had several production incidents because of this component, ha ha.

In most business systems the required message semantics are at least once: messages may be duplicated, but they are never lost. Even so, plenty of problems remain:

1. MQ availability cannot be guaranteed. When MQ dies unexpectedly, sends from the producer side fail. Many messages then have to be replayed by digging them out of logs, which is costly and time-consuming.

2. MQ blocks the normal running of the business. If MQ hangs or the network has problems, business threads get stuck in MQ's send method and normal business cannot continue, with disastrous consequences.

3. Message delay. If MQ is dead there is nothing more to say: it died before the message was delivered. Otherwise, message delay is mainly caused by weak consuming capacity on the client or by having only a single consumption channel.

okmq uses composite storage to ensure reliable delivery of messages.

Note: okmq focuses on reliability. Other concerns such as ordering and transactions are not considered. Speed, of course, is a must.

Design idea

I have used two sets of redis to simulate some MQ operations, and that worked better than some existing solutions. But it is certainly not what we need here: the backlog capacity of redis is too limited, and watching memory usage climb in a straight line does not feel good.

But we can use redis as an additional send confirmation mechanism. The idea was mentioned in the article “using multithreading to increase Kafka’s consumption ability”, and now it is time to implement it.

Let’s start with the API

OkmqKafkaProducer producer = new ProducerBuilder()
        .any("okmq.redis.mode", "single")
        .any("okmq.redis.endpoint", "")
        .any("okmq.redis.poolConfig.maxTotal", 100)
        // the remaining builder calls, including the one that builds the producer, are cut off here

Packet packet = new Packet();
packet.setContent("i will send you a msg");
producer.sendAsync(packet, null);

Take redis as an example

The numbered steps are as follows:

1. Before a message is sent to Kafka, it is first stored in redis. Because the later callback needs a unique identifier, we added a UUID to the Packet.

2. The underlying API is called for the real message delivery.

3. By listening to Kafka’s callback, the corresponding key in redis is deleted. This also gives you the exact ack time of a message. If a key has still not been deleted after a long time, the delivery is treated as failed.

4. A background thread traverses and resends these failed messages; we call it recovery. This is the most complicated part. With redis, it first competes for a lock that lasts 5min and then traverses the relevant hash keys.
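
The lock in step 4 can be illustrated with a small in-memory model. This is only a sketch under assumptions: the class RecoveryLock and its methods are hypothetical names, the lock state lives in memory instead of redis, and the caller passes the clock in so that the behavior is easy to follow. It mirrors the idea of taking a lock with a token and an expiry, and releasing it only if the token still matches.

```java
/** Hypothetical in-memory stand-in for the redis lock key; not okmq's actual code. */
final class RecoveryLock {
    private String owner;      // token of the current holder, null when the lock is free
    private long expiresAtMs;  // absolute time at which the lock expires

    /** Takes the lock only when it is free or has expired. */
    synchronized boolean tryLock(String token, long ttlMs, long nowMs) {
        if (owner != null && nowMs < expiresAtMs) {
            return false; // another recovery thread still holds the lock
        }
        owner = token;
        expiresAtMs = nowMs + ttlMs;
        return true;
    }

    /** Releases the lock only if the caller's token still owns it. */
    synchronized boolean unlock(String token, long nowMs) {
        if (owner == null || nowMs >= expiresAtMs || !owner.equals(token)) {
            return false;
        }
        owner = null;
        return true;
    }
}
```

With a 5min expiry, a recovery thread that crashes while holding the lock blocks the others for at most 5min before one of them can take over.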

For the code above, redis therefore receives the following commands:

1559206423.395597 [0] "HEXISTS" "okmq:indexhash" "okmq:5197354"
1559206423.396670 [0] "HSET" "okmq:indexhash" "okmq:5197354" ""
1559206423.397300 [0] "HSET" "okmq:5197354" "okmq::2b9b33fd-95fd-4cd6-8815-4c572f13f76e" "{\"content\":\"i will send you a msg104736623015238\",\"topic\":\"okmq-test-topic\",\"identify\":\"2b9b33fd-95fd-4cd6-8815-4c572f13f76e\",\"timestamp\":1559206423318}"
1559206423.676212 [0] "HDEL" "okmq:5197354" "okmq::2b9b33fd-95fd-4cd6-8815-4c572f13f76e"
1559206428.327788 [0] "SET" "okmq:recovery:lock" "01fb85a9-0670-40c3-8386-b2b7178d4faf" "px" "300000"
1559206428.337930 [0] "HGETALL" "okmq:indexhash"
1559206428.341365 [0] "HSCAN" "okmq:5197354" "0"
1559206428.342446 [0] "HDEL" "okmq:indexhash" "okmq:5197354"
1559206428.342788 [0] "GET" "okmq:recovery:lock"
1559206428.343119 [0] "DEL" "okmq:recovery:lock"
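
One detail in this log is worth a closer look. The hash key okmq:5197354 matches the message timestamp 1559206423318 divided by 300000, which is 5min in milliseconds. That suggests, though the log alone does not prove it, that messages are grouped into one hash per time window. A minimal sketch of that arithmetic, where BucketKey is a hypothetical helper and not part of okmq:

```java
/** Hypothetical helper; the formula is inferred from the log above, not taken from okmq's source. */
final class BucketKey {
    /** Maps a message timestamp to the hash key of its time window. */
    static String of(long timestampMs, long windowMs) {
        return "okmq:" + (timestampMs / windowMs);
    }
}
```

Under this assumption, BucketKey.of(1559206423318L, 300000L) yields okmq:5197354, the key seen in the HSET and HDEL commands.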

Answers to the above questions

So the answers to the three questions above are as follows:

1. MQ availability cannot be guaranteed.

Why recover after the fact? Wouldn’t it be better to build the recovery mechanism in? The process can be automated by traversing the messages that have not received an ack.

2. MQ blocks the normal running of the business.

Setting Kafka’s max.block.ms parameter (MAX_BLOCK_MS_CONFIG) does keep the business from blocking, but it loses messages. Other storage can be used to make sure these lost messages are resent.
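
As a rough illustration, the setting is just one entry in the producer properties. The class name and the value passed in below are assumptions for the example; only the key max.block.ms comes from Kafka itself.

```java
import java.util.Properties;

/** Hypothetical helper that prepares producer properties with a bounded blocking time. */
final class NonBlockingProducerConfig {
    static Properties build(long maxBlockMs) {
        Properties props = new Properties();
        // "max.block.ms" bounds how long the Kafka producer's send() may block the caller
        props.setProperty("max.block.ms", Long.toString(maxBlockMs));
        return props;
    }
}
```

A small value such as NonBlockingProducerConfig.build(3000L) frees the business thread quickly. Sends that time out are exactly the messages the standby storage has to resend.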

3. Message delay.

When MQ is dead, other standby channels keep the service running normally. Some teams are forced to double-write to MQ and double-consume to keep the process going. If Kafka dies, the service switches to the standby channel for consumption.

Extend your HA

If you do not want to use redis and would rather use HBase, for example, that is also simple: you just need to implement the HA interface.

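public interface HA {
    // release any resources held by the standby store
    void close();

    // receive the okmq.* settings
    void configure(Properties properties);

    // called before the message is sent to Kafka
    void preSend(Packet packet) throws HaException;

    // called after the send has been acknowledged
    void postSend(Packet packet) throws HaException;

    // called by the recovery thread to resend failed messages
    void doRecovery(AbstractProducer producer) throws HaException;
}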

Before using it, register your plug-in.

AbstractProducer.register("log", "com.sayhiai.arch.okmq.api.producer.ha.Ha2SimpleLog");
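
To give a feel for what such a plug-in has to do, here is a minimal in-memory sketch. It is not a drop-in okmq plug-in. SketchPacket and InMemoryHaSketch are hypothetical stand-ins, and doRecovery takes a plain callback instead of an AbstractProducer so that the example stays self-contained. A real plug-in would implement the HA interface and keep its data in durable storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Stand-in for okmq's Packet; only an identifier and the content are modelled. */
final class SketchPacket {
    final String identify;
    final String content;

    SketchPacket(String identify, String content) {
        this.identify = identify;
        this.content = content;
    }
}

/** Hypothetical standby store that follows the preSend / postSend / doRecovery hooks. */
final class InMemoryHaSketch {
    private final Map<String, SketchPacket> pending = new ConcurrentHashMap<>();

    /** Before the real send: remember the packet until it is acknowledged. */
    void preSend(SketchPacket packet) {
        pending.put(packet.identify, packet);
    }

    /** After the ack: the packet is safe, so forget it. */
    void postSend(SketchPacket packet) {
        pending.remove(packet.identify);
    }

    /** Recovery: hand every unacknowledged packet to the resend callback. */
    int doRecovery(Consumer<SketchPacket> resend) {
        List<SketchPacket> snapshot = new ArrayList<>(pending.values());
        for (SketchPacket packet : snapshot) {
            resend.accept(packet);
        }
        return snapshot.size();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Packets stay in the store after a resend and are removed only by postSend, so a message may be delivered more than once but is not dropped. That matches the at least once semantics the tool aims for.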

Important parameters

okmq.ha.recoveryperiod detection period of the recovery thread, default 5 seconds

okmq.redis.mode redis mode, one of: single, sentinel, cluster
okmq.redis.endpoint address, multiple addresses separated by
okmq.redis.connectiontimeout connection timeout
okmq.redis.sotimeout socket timeout
okmq.redis.lockpx holding time of the distributed lock, default 5min
okmq.redis.splitmillis interval after which redis switches to a new key, default 5min
okmq.redis.poolConfig.* all jedis-compatible pool parameters

Version 1.0.0 features

1. The high availability of the producer side is abstracted, with Kafka as the implemented example.

2. Adds the SimpleLog implementation, which logs ping and pong.

3. Adds redis as a producer-side standby channel, in three modes: single, cluster and sentinel.

4. Other standby channels can be customized.

5. Compatible with all Kafka parameter settings.



1. Integrate ActiveMQ.

2. Integrate a standby channel on the consumer side.

3. Add producer integration with embedded KV storage.

4. Control the behavior of the system more precisely.

5. Add a switch and warm-up to keep a newly started MQ from being crushed.

6. A redis sharding mechanism, intended for very large systems.


1. Add a monitoring function.

2. Add a REST interface.

Usage restriction

When you set the ha parameter to true, you accept the following restrictions. Otherwise, the system behaves just like the original.

Restrictions on use:
This tool is only suitable for ordinary message delivery that is unordered and non-transactional, where the client has implemented idempotence. Businesses such as some order systems and message notifications are a good fit. If you need other features, this page is not for you.

If Kafka alone or redis alone dies, messages will still eventually be sent. Only if Kafka and redis die at the same time will a message fail to send, and it is then recorded in the log file.

Under normal conditions redis uses very little capacity. Under abnormal conditions its limited capacity fills up quickly. The time redis has left is your stopwatch: you have to recover your messaging system within it. Be sure to hold out.


The system is currently at version 1.0.0 and in a small-scale online trial. The tool is small, but it fits most application scenarios. If you are looking for a solution like this, you are welcome to help improve the code.

GitHub address:

You are also welcome to follow the “little sister flavor” WeChat official account to get in touch.
