Technical exploration: transactional event streaming in Apache Pulsar

Time: 2021-12-31

About Apache Pulsar

Apache Pulsar is a top-level project of the Apache Software Foundation. It is a cloud-native, distributed messaging and streaming platform that integrates messaging, storage, and lightweight functional computing. It adopts an architecture that separates compute from storage, supports multi-tenancy, persistent storage, and cross-region data replication across multiple data centers, and offers stream storage features such as strong consistency, high throughput, low latency, and high scalability.
GitHub address: http://github.com/apache/pulsar/

Introduction: This article is the written version of the talk “Technical exploration: transactional event streaming in Apache Pulsar”, delivered by Cong Bo, a development engineer at StreamNative and Apache Pulsar committer, at Pulsar Summit Asia 2020. The talk covers the principles behind Apache Pulsar transactions and the plans for them.

My name is Cong Bo, and I am a development engineer at StreamNative. My topic today is “Technical exploration: transactional event streaming in Apache Pulsar”.

Message semantics

As we all know, messaging systems and streaming data platforms offer different delivery semantics for messages. There are three common ones: at most once, at least once, and exactly once.

  • At most once: a message is delivered at most once. The system does not care whether the send succeeded, and the message is never resent.
  • At least once: a message is delivered at least once. Duplicates are allowed, but the message must arrive.
  • Exactly once: a message is delivered exactly once; it is neither lost nor duplicated.

At most once

Pulsar already implemented at-most-once semantics before version 1.2.0.

At least once

Pulsar has followed at-least-once semantics since its initial design. Retrying when a send fails is the basic way to guarantee at-least-once delivery, but retries can produce duplicate messages. Some use cases require that the producer never writes duplicates and that the consumer never consumes a message twice, which is where exactly-once semantics comes from.

Exactly once

Implementing exactly-once requires deduplication on both the producing and the consuming side.

How does Pulsar deduplicate?

  • Producer: idempotent producer;
  • Broker: guaranteed message deduplication (PIP-6);
  • Consumer: Reader + checkpoints (Flink / Spark).

How do I turn on exactly-once?

Enable deduplication on the topic's namespace through pulsar-admin, then configure the producer:

  • bin/pulsar-admin namespaces set-deduplication -e tenant/namespace
  • Set the producer name and the sequence ID when creating the producer;
  • Specify an increasing sequence ID for each message that is produced.
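To make the role of the producer name and sequence ID concrete, here is a minimal sketch of how broker-side deduplication can work. It is a simplified in-memory model, not Pulsar's implementation: the class names DedupTopic and DedupDemo, the producer name "transfer-producer", and the rule "drop anything whose sequence ID is not higher than the last one stored for that producer" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of broker-side deduplication; not Pulsar's actual code.
final class DedupTopic {
    // Highest sequence ID stored so far, tracked per producer name.
    private final Map<String, Long> lastSequenceId = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    /** Returns true if the message was appended, false if it was dropped as a duplicate. */
    boolean publish(String producerName, long sequenceId, String payload) {
        Long last = lastSequenceId.get(producerName);
        if (last != null && sequenceId <= last) {
            return false; // already stored: this is a retry of an earlier send
        }
        lastSequenceId.put(producerName, sequenceId);
        log.add(payload);
        return true;
    }

    /** Returns a copy of the messages stored in the topic. */
    List<String> messages() {
        return new ArrayList<>(log);
    }
}

final class DedupDemo {
    public static void main(String[] args) {
        DedupTopic topic = new DedupTopic();
        topic.publish("transfer-producer", 0, "m0");
        topic.publish("transfer-producer", 1, "m1");
        // The producer never saw the response for m1 and retries with the same sequence ID.
        boolean stored = topic.publish("transfer-producer", 1, "m1");
        System.out.println("retry stored: " + stored);
        System.out.println(topic.messages());
    }
}
```

In this model the retry is rejected and the topic holds m0 and m1 once each, which is why the producer has to keep a stable name and strictly increasing sequence IDs.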

Limitations:

  • It only works when producing messages to a single partition;
  • It only covers producing a single message;
  • There is no atomicity when producing multiple messages to one or more partitions;
  • The consumer has to store the message ID together with its state, and seek to that message ID when restoring the state.

How transactions handle events

The following transfer example shows how a transaction in a streaming message system handles events.

Suppose Alice wants to transfer ten yuan to Bob. How can this be implemented with Pulsar?

  • Transfer topic: records transfer requests;
  • Transfer function: processes transfers;
  • BalanceUpdate topic: records balance update requests.

Alice transfers money to Bob. After receiving this transfer message, the transfer function sends one message to the BalanceUpdate topic saying that Bob's balance increases by ten yuan, and another saying that Alice's balance decreases by ten yuan. Once both sends have returned, it acks the transfer message. This works as long as no operation fails, but in practice any of these operations can fail.

As shown in Figure 1, if the ack fails, the transfer message is consumed again. As a result, Alice transfers another ten yuan to Bob, twenty yuan in total. If the ack keeps failing, Alice's account may end up deep in debt while Bob becomes a billionaire.

As shown in Figure 2, the message that increases Bob's balance was not successfully sent to the BalanceUpdate topic. As a result, Alice's balance decreased but Bob's did not increase.
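The two failure modes can be reproduced with a small in-memory model of the transfer function. This is only a sketch: the class NaiveTransfer, the starting balances (100 for Alice, 0 for Bob), and the boolean flags that simulate a lost send or a lost ack are all assumptions made for illustration, and balances are updated directly instead of going through a BalanceUpdate topic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the non-transactional transfer function; all names are illustrative.
final class NaiveTransfer {
    final Map<String, Integer> balances = new HashMap<>();

    NaiveTransfer() {
        balances.put("Alice", 100); // assumed starting balances
        balances.put("Bob", 0);
    }

    /**
     * Processes one delivery of the transfer message.
     * Returns true if the message was acknowledged; an unacknowledged message is redelivered.
     */
    boolean handle(String from, String to, int amount, boolean creditSucceeds, boolean ackSucceeds) {
        balances.merge(from, -amount, Integer::sum);  // the debit update is sent
        if (creditSucceeds) {
            balances.merge(to, amount, Integer::sum); // the credit update may be lost (Figure 2)
        }
        return ackSucceeds;                           // the ack may be lost (Figure 1)
    }
}

final class NaiveTransferDemo {
    public static void main(String[] args) {
        // Figure 1: both updates land but the ack is lost, so the message is redelivered.
        NaiveTransfer ackLost = new NaiveTransfer();
        boolean acked = ackLost.handle("Alice", "Bob", 10, true, false);
        while (!acked) {
            acked = ackLost.handle("Alice", "Bob", 10, true, true);
        }
        System.out.println("ack lost:    " + ackLost.balances);

        // Figure 2: the credit to Bob is lost but the message is still acknowledged.
        NaiveTransfer creditLost = new NaiveTransfer();
        creditLost.handle("Alice", "Bob", 10, false, true);
        System.out.println("credit lost: " + creditLost.balances);
    }
}
```

In the first run Alice ends at 80 and Bob at 20, so the transfer was applied twice. In the second run Alice ends at 90 while Bob stays at 0, so money disappears.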

Pulsar Transaction

How can Pulsar transactions solve this?

Transaction semantics:

  • Atomic message writes across multiple partitions;
  • Atomic acknowledgment across multiple subscriptions;
  • All operations in a transaction either succeed together or fail together;
  • Consumers can only read committed messages.

How would the above example be implemented without the transaction API?

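// Receive one message from the input topic.
Message<String> message = inputConsumer.receive();

// Send one output message to each topic; the two sends are independent of each other.
CompletableFuture<MessageId> sendFuture1 =
    producer1.newMessage().value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
    producer2.newMessage().value("output-message-2").sendAsync();

// Acknowledge the input message; nothing ties this ack to the two sends.
inputConsumer.acknowledgeAsync(message.getMessageId());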

As shown in Figure 3:

After a message is received from the input consumer, producer 1 sends a message to topic1, producer 2 sends a message to topic2, and the received message is then acknowledged.

Pulsar's transaction API is quite simple and requires little change to the original logic:

Message<String> message = inputConsumer.receive();
// Open a transaction with a timeout.
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();

// Both sends and the ack are bound to the same transaction.
CompletableFuture<MessageId> sendFuture1 =
    producer1.newMessage(txn).value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
    producer2.newMessage(txn).value("output-message-2").sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);

// Commit: the two messages become visible and the ack takes effect together.
txn.commit().get();

MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();

Pulsar transactions have the following three components:

  • TC (transaction coordinator) manages transaction metadata.
  • TB (transaction buffer) handles messages sent within a transaction.
  • TP (transaction pending ack) handles ack requests made within a transaction.

As shown in Figure 4, the creation of a transaction is recorded in the TC.

As shown in Figure 5, the Pulsar client has successfully created txn1 and asks the TC to register that txn1 will send messages to topic1 and topic2. After receiving the request, the TC records this metadata and responds to the client. The client then sends one message each to topic1 and topic2.

As shown in Figure 6, the flow is essentially the same as in Figure 5, except that the operation is an acknowledgment rather than a send.

As shown in Figures 7 and 8, the Pulsar client waits for all ack and produce operations to complete and then commits the transaction. After the TC receives the commit request, it changes the status of txn1 to committing and processes the information that TP and TB hold for txn1.

As shown in Figure 9, once TP and TB have been processed, the TC changes the status of txn1 to committed.

This is the complete life cycle of a transaction.
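The life cycle described above can be summarized as a small state machine. The sketch below is a simplified model of the metadata a coordinator might keep for one transaction, not Pulsar's implementation: the class TxnMetadata, its method names, and the OPEN name for the initial state are assumptions made for illustration (the text itself only names the committing and committed states), and abort handling is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified model of what the TC tracks for one transaction; all names are illustrative.
final class TxnMetadata {
    enum State { OPEN, COMMITTING, COMMITTED }

    private State state = State.OPEN;
    private final Set<String> producedTopics = new LinkedHashSet<>();     // processed via TB on commit
    private final Set<String> ackedSubscriptions = new LinkedHashSet<>(); // processed via TP on commit

    /** Registers a topic the transaction will send to (Figure 5). */
    void addProducedTopic(String topic) {
        requireState(State.OPEN);
        producedTopics.add(topic);
    }

    /** Registers a subscription the transaction will acknowledge on (Figure 6). */
    void addAckedSubscription(String subscription) {
        requireState(State.OPEN);
        ackedSubscriptions.add(subscription);
    }

    /** The client asks to commit: the intent is recorded first (Figures 7 and 8). */
    void beginCommit() {
        requireState(State.OPEN);
        state = State.COMMITTING;
    }

    /** Called once TB and TP have processed every registered participant (Figure 9). */
    void finishCommit() {
        requireState(State.COMMITTING);
        state = State.COMMITTED;
    }

    State state() {
        return state;
    }

    private void requireState(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}

final class TxnLifecycleDemo {
    public static void main(String[] args) {
        TxnMetadata txn1 = new TxnMetadata();
        txn1.addProducedTopic("topic1");
        txn1.addProducedTopic("topic2");
        txn1.addAckedSubscription("input-subscription");
        txn1.beginCommit();  // OPEN -> COMMITTING
        txn1.finishCommit(); // COMMITTING -> COMMITTED
        System.out.println(txn1.state());
    }
}
```

Recording the committing state before touching TB and TP is what lets a coordinator resume an interrupted commit instead of leaving the transaction half applied.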

Let's return to the transfer example:

With Pulsar transaction support, all operations either succeed together or fail together. This keeps the balance updates for Alice and Bob correct.
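The all-or-nothing behavior can be illustrated with a toy model in which messages written inside a transaction stay invisible until commit. This is a sketch under stated assumptions, not how Pulsar stores transactional messages: the classes BufferedTopic and ToyTxn and the message strings are hypothetical, and only the two balance updates are modeled (the ack of the transfer message would be committed or aborted in the same way).

```java
import java.util.ArrayList;
import java.util.List;

// Toy topic: consumers can only read messages that were committed.
final class BufferedTopic {
    private final List<String> committed = new ArrayList<>();

    List<String> read() {
        return new ArrayList<>(committed);
    }

    void append(List<String> messages) {
        committed.addAll(messages);
    }
}

// Toy transaction: sends are buffered and published only on commit.
final class ToyTxn {
    private final BufferedTopic topic;
    private final List<String> pending = new ArrayList<>();
    private boolean finished = false;

    ToyTxn(BufferedTopic topic) {
        this.topic = topic;
    }

    void send(String message) {
        requireOpen();
        pending.add(message);
    }

    /** Publishes every buffered message at once. */
    void commit() {
        requireOpen();
        topic.append(pending);
        pending.clear();
        finished = true;
    }

    /** Discards every buffered message. */
    void abort() {
        requireOpen();
        pending.clear();
        finished = true;
    }

    private void requireOpen() {
        if (finished) {
            throw new IllegalStateException("transaction already finished");
        }
    }
}

final class AtomicTransferDemo {
    public static void main(String[] args) {
        BufferedTopic balanceUpdate = new BufferedTopic();

        // Bob's update cannot be sent, so the whole transaction is aborted.
        ToyTxn failed = new ToyTxn(balanceUpdate);
        failed.send("Alice -10");
        failed.abort();
        System.out.println(balanceUpdate.read());

        // The retry sends both updates and commits them together.
        ToyTxn retried = new ToyTxn(balanceUpdate);
        retried.send("Alice -10");
        retried.send("Bob +10");
        retried.commit();
        System.out.println(balanceUpdate.read());
    }
}
```

After the aborted attempt the topic is still empty, and after the committed retry it contains both updates, so a reader never observes Alice's debit without Bob's credit.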

Future plans for Pulsar transactions

Pulsar transactions are designed to make event streaming systems simpler and more reliable. In many business scenarios, they reduce the amount of idempotency handling the application has to do.

The future development plan for Pulsar transactions is as follows:

  • Transaction support in other languages (e.g. C++, Go)
  • Transaction in Pulsar Functions & Pulsar IO
  • Transaction in Kafka-on-Pulsar (KOP)
  • Transaction for Flink / Spark job
  • Transaction for State storage in Pulsar Functions
