Alibaba Cloud Message Queue Kafka-Message Retrieval Practice

Time: 2022-11-23

Author: Kafka & Tablestore

This article describes how to troubleshoot common message queue pain points such as message loss and repeated consumption, walks through a practical scenario for the Message Queue Kafka “retrieval component”, and explains its key technologies. The goal is to help you become familiar with the component’s features and usage so that you can resolve problems found during message troubleshooting more efficiently.

Scenario Pain Point Introduction

Because a message queue is a distributed system, problems such as message loss and duplicate delivery are hard to avoid when using one.

  • In log aggregation scenarios, multiple heterogeneous data sources usually produce data into Kafka for consumption by downstream computing engines such as Spark. When some logs are missing, the variety of sending methods and data structures makes it difficult to investigate directly from the client logs.
  • In message forwarding, a consumer may consume the same data more than once. Determining whether the message was produced repeatedly requires retrieving data from the message queue by content, but a message queue can only be traversed by partition and consumer offset, so scanning cannot provide flexible message retrieval.
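The limitation in the second point can be illustrated with a small in-memory sketch. It assumes nothing about a real Kafka client: `Record`, `scan_partition`, and `find_by_content` are hypothetical names, and the partitions are plain Python lists. The point it shows is that a content lookup built on offset-ordered traversal has to read every record.

```python
from typing import Dict, Iterator, List, NamedTuple


class Record(NamedTuple):
    partition: int
    offset: int
    value: str


def scan_partition(log: List[Record], start_offset: int = 0) -> Iterator[Record]:
    """Yield records in offset order from start_offset onward."""
    for record in log:
        if record.offset >= start_offset:
            yield record


def find_by_content(partitions: Dict[int, List[Record]], needle: str) -> List[Record]:
    """Scan every record in every partition; cost grows with total log size."""
    hits = []
    for partition_id in sorted(partitions):
        for record in scan_partition(partitions[partition_id]):
            if needle in record.value:
                hits.append(record)
    return hits


# Hypothetical sample data: two partitions, one message appears in both.
partitions = {
    0: [Record(0, 0, '{"PID":"275"}'), Record(0, 1, '{"PID":"276"}')],
    1: [Record(1, 0, '{"PID":"276"}'), Record(1, 1, '{"PID":"277"}')],
}
duplicates = find_by_content(partitions, '"PID":"276"')
```

Here the lookup touches all four records to find two matches, which is why an index on message content is needed for large topics.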

No existing message queue product in the industry offers a good tool or method for retrieving message content, which greatly increases the difficulty and cost of troubleshooting.

Kafka Message Retrieval Component

Introduction to the Retrieval Component

The Message Queue Kafka “retrieval component” is a fully managed, highly flexible, interactive retrieval component. It offers second-level response times when retrieving content across trillions of messages, and it aims to address the fact that message products in the industry do not support retrieval by message content. The component transfers the message data in a topic to Tablestore through the Kafka Connector and provides message retrieval based on Tablestore’s multi-index (search index) feature. It supports retrieval by one condition or a combination of conditions such as message partition, offset, and sending time range, and it also supports full-text retrieval on the message Key and Value.

Case Practice

Case Background

Assume that an operations team needs to monitor an online cluster. The team collects process-level logs, imports them into Kafka, and consumes them downstream with Flink to calculate the resource consumption of each process in real time. When Flink shows that the log data of a certain process is missing for a certain period, the team needs to use the Message Queue Kafka “retrieval component” to retrieve messages by message Value and time range and determine whether the logs were successfully pushed to Message Queue Kafka.

For example, the collected logs are in JSON format, and one log message looks like this:

key   =  276
value =  {"PID":"276","COMMAND":"Google Chrom","CPU_USE":"7.2","TIME":"00:01:44","MEM":"8836K","STATE":"sleeping","UID":"0","IP":"164.29.0.1"}
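As a minimal sketch of how a consumer or a troubleshooting script might read such a message, the snippet below parses the Value above with Python's standard `json` module. The variable names are illustrative only; the field names come from the sample message.

```python
import json

# Key and Value copied from the sample message above.
key = "276"
value = (
    '{"PID":"276","COMMAND":"Google Chrom","CPU_USE":"7.2",'
    '"TIME":"00:01:44","MEM":"8836K","STATE":"sleeping",'
    '"UID":"0","IP":"164.29.0.1"}'
)

record = json.loads(value)

# Every field arrives as a string, so numeric fields need explicit conversion.
cpu_use = float(record["CPU_USE"])
key_matches_pid = record["PID"] == key
```

Because all fields are strings, a search for PID 276 is a text match on `"276"` rather than a numeric comparison.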

Enable Message Retrieval

  1. Log in to the Alibaba Cloud Message Queue Kafka console, select the target topic, and activate the message retrieval service.

  2. After the message retrieval service is activated, a Tablestore instance is created automatically, the message data is transferred to Tablestore, and an index is created to provide message retrieval. Each topic corresponds to one data table in Tablestore. You can view the retrieval component details of each topic in the Message Queue Kafka console.

Message Retrieval Practice

  1. After the message retrieval service is activated, you can combine multiple search conditions to retrieve messages and solve the case above. For example, specify a time range and retrieve the messages whose Value contains PID = 276.

  2. The console returns the messages that match the search conditions.
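The combined condition used in this case (a sending time range plus PID = 276 in the Value) can be sketched offline as follows. This is not the console's implementation: `StoredMessage`, its field names, and the small integer timestamps are assumptions made for illustration.

```python
import json
from typing import List, NamedTuple


class StoredMessage(NamedTuple):
    partition: int
    offset: int
    timestamp_ms: int  # hypothetical sending time in milliseconds
    key: str
    value: str


def search(messages: List[StoredMessage], start_ms: int, end_ms: int, pid: str) -> List[StoredMessage]:
    """Return messages sent within [start_ms, end_ms] whose JSON Value has the given PID."""
    matches = []
    for message in messages:
        if not (start_ms <= message.timestamp_ms <= end_ms):
            continue
        try:
            body = json.loads(message.value)
        except json.JSONDecodeError:
            continue  # skip Values that are not valid JSON
        if isinstance(body, dict) and body.get("PID") == pid:
            matches.append(message)
    return matches


# Hypothetical sample data.
messages = [
    StoredMessage(0, 10, 1000, "276", '{"PID":"276","CPU_USE":"7.2"}'),
    StoredMessage(0, 11, 2000, "277", '{"PID":"277","CPU_USE":"1.0"}'),
    StoredMessage(1, 5, 2500, "276", '{"PID":"276","CPU_USE":"6.8"}'),
    StoredMessage(1, 6, 9000, "276", '{"PID":"276","CPU_USE":"5.1"}'),
]
found = search(messages, start_ms=1500, end_ms=3000, pid="276")
```

If `found` is empty for the period in which Flink reported missing data, the logs most likely never reached the topic; if it is not empty, the problem lies downstream of Kafka.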

Capability Expansion

Introduction to Tablestore

Tablestore is a structured data store built on Alibaba Cloud’s Apsara (Feitian) platform. It provides data storage at the hundreds-of-billions scale and millisecond-level data retrieval. After Message Queue Kafka dumps messages to Tablestore, the messages can also be retrieved through Tablestore’s native data access methods, which support more complex retrieval logic as well as SQL syntax. There are two ways to retrieve messages:

Multi-Index Search

  1. Log in to the Tablestore console, open the Tablestore instance and data table that hold the dumped Kafka message data, and choose the multi-index search on the index management page.

  2. For example, retrieve the messages whose Value contains PID=276 or PID=277.

  3. The matching messages are returned.
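To give an intuition for why an index can answer an "A or B" query without scanning every message, here is a toy inverted index over JSON fields. It is only a sketch of the general technique, not a description of how Tablestore's multi-index is built; `build_index`, `search_any`, and the sample values are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_index(values: List[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Map each (field, value) pair to the set of row ids that contain it."""
    index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for row_id, raw in enumerate(values):
        for field, field_value in json.loads(raw).items():
            index[(field, str(field_value))].add(row_id)
    return index


def search_any(index: Dict[Tuple[str, str], Set[int]], field: str, candidates: List[str]) -> List[int]:
    """OR query: union the posting sets of every candidate value."""
    hits: Set[int] = set()
    for candidate in candidates:
        hits |= index.get((field, candidate), set())
    return sorted(hits)


# Hypothetical message Values; the row id is the position in the list.
values = [
    '{"PID":"275","STATE":"running"}',
    '{"PID":"276","STATE":"sleeping"}',
    '{"PID":"277","STATE":"sleeping"}',
]
index = build_index(values)
rows = search_any(index, "PID", ["276", "277"])
```

The query cost depends on the number of candidate values and matching rows, not on the total number of stored messages.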

Retrieving Messages with SQL

  1. Tablestore supports retrieving messages with SQL syntax. First, create an SQL mapping table on the data table where the messages are dumped.

  2. Retrieve the messages with PID=276 using Tablestore SQL.
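The shape of such a query can be tried locally with Python's built-in `sqlite3` module as a stand-in. This does not connect to Tablestore, and the table name `kafka_messages` and its column names are assumptions, since the article does not list the schema of the dumped table; only the general `SELECT ... WHERE` pattern carries over.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the SQL mapping table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kafka_messages ("
    " partition_id INTEGER, msg_offset INTEGER, ts_ms INTEGER,"
    " msg_key TEXT, msg_value TEXT)"
)

# Hypothetical rows; ts_ms values are arbitrary epoch milliseconds.
sample_rows = [
    (0, 0, 1669161600000, "275", '{"PID":"275","STATE":"running"}'),
    (0, 1, 1669161660000, "276", '{"PID":"276","STATE":"sleeping"}'),
    (1, 0, 1669161720000, "277", '{"PID":"277","STATE":"sleeping"}'),
]
conn.executemany("INSERT INTO kafka_messages VALUES (?, ?, ?, ?, ?)", sample_rows)

# Match messages whose Value contains the JSON fragment "PID":"276".
query = (
    "SELECT partition_id, msg_offset, msg_value FROM kafka_messages "
    "WHERE msg_value LIKE ? ORDER BY partition_id, msg_offset"
)
result = conn.execute(query, ('%"PID":"276"%',)).fetchall()
conn.close()
```

Adding a condition on the timestamp column to the `WHERE` clause would narrow the result to a time range in the same way as the console search.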

Summary

The Alibaba Cloud Message Queue Kafka “retrieval component” is the first component in the message queue field to support interactive retrieval of message content. It provides message retrieval by dumping message data to Tablestore, supports retrieval by any free combination of conditions such as Key, Value, and Partition, and supports full-text search on the message Key and Value. It requires no development or operations work and is highly flexible. Messages can also be retrieved directly through the Tablestore index or SQL, which greatly speeds up day-to-day checks on whether a message exists and whether it is correct.

If you have questions about the Tablestore multi-index or SQL queries described in this article, you are welcome to join the technical exchange group, which provides free online expert support. To join, search for group number 23307953.
