Kafka (1) – meet Kafka

Time: 2020-11-23

Background of Message Oriented Middleware

Problems that message middleware can solve

Asynchronous

In many business scenarios, we need to turn synchronous tasks into asynchronous tasks.

Take the registration feature of an e-commerce platform as a simple example. Registering a user is not just a matter of inserting a row into the database; it also triggers a series of operations, such as sending an activation email, issuing a new-user red packet or points, and sending a marketing SMS. If each operation in this chain takes 1 s, the whole registration flow takes 4 s to respond to the user.

So we need to split these operations out and turn them into asynchronous processing logic.

  • We can use a blocking queue plus a thread pool to implement the producer-consumer pattern (see the sketch after this list).

    • However, this approach only works on a single machine: once the machine goes down, the data held in the blocking queue is lost.
  • Alternatively, use message middleware.
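
A minimal sketch of the in-process approach with Java's built-in BlockingQueue and a thread pool; the task names and registration steps are illustrative assumptions, not a production design:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncRegistration {

    // Bounded in-memory queue holding the slow follow-up tasks (illustrative)
    private static final BlockingQueue<String> TASKS = new ArrayBlockingQueue<>(1000);

    public static void main(String[] args) throws InterruptedException {
        // Consumer side: a small thread pool drains the queue in the background
        ExecutorService workers = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 3; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        String task = TASKS.take(); // blocks until a task is available
                        System.out.println("processing: " + task);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // exit quietly on shutdown
                }
            });
        }

        // Producer side: registration inserts the DB row elsewhere, then only
        // enqueues the slow steps and returns to the user immediately
        TASKS.put("send activation email");
        TASKS.put("grant new-user red packet / points");
        TASKS.put("send marketing SMS");

        Thread.sleep(500);     // give the workers a moment (demo only)
        workers.shutdownNow(); // interrupt the worker threads
    }
}

As the second bullet notes, this queue lives in a single JVM: if the process dies, queued tasks are lost, which is exactly the gap message middleware fills.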

Peak clipping

Requests are first written to a queue. The queue has a bounded length; once that length is exceeded, new requests are discarded directly. This cuts off the peak traffic.
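
A hedged sketch of that discard policy with a bounded Java queue; the capacity and the accept method are assumptions chosen for illustration:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PeakShaving {

    // The queue length caps how much burst traffic we are willing to buffer
    private static final BlockingQueue<String> QUEUE = new ArrayBlockingQueue<>(10000);

    // Called on the request path: enqueue or discard, but never block
    static boolean accept(String request) {
        return QUEUE.offer(request); // returns false when full -> request discarded
    }

    public static void main(String[] args) {
        System.out.println(accept("order-1") ? "queued" : "discarded");
    }
}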

Rate limiting

In a flash-sale (seckill) scenario, the core business logic receives messages from the message queue and processes them; the processing rate is therefore bounded by the consumer's throughput.

Decoupling

Programs written in different programming languages can communicate with each other through the message queue.

Message persistence

Messages are persisted, so they can still be consumed even if the application goes down.

Of course, message middleware has many more application scenarios. For example, in a weak-consistency transaction model, a distributed message queue can implement best-effort notification to achieve eventual consistency of data.

Thinking about the design of message middleware

We can start from the basic requirements (a hypothetical interface sketch follows the list):

  • The most basic: support for sending and receiving messages

    • For network communication, NIO would be the usual consideration
  • Storage of messages

    • Persistent or non-persistent
  • Serialization and deserialization of messages
  • Cross-language support
  • Message acknowledgement mechanism

    • How to avoid duplicate delivery caused by retransmission
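
To make the checklist concrete, here is a hypothetical Java interface covering the basics above (send/receive, pluggable serialization, and explicit acknowledgement); none of these names come from a real library:

import java.io.Closeable;

// Hypothetical client-side API for a message middleware (illustration only)
interface MessageClient extends Closeable {

    // Basic send: the serializer turns the payload into bytes for the wire
    <T> void send(String topic, T payload, Serializer<T> serializer);

    // Basic receive: the handler acks explicitly so the server knows
    // the message was processed and need not be redelivered
    <T> void subscribe(String topic, Deserializer<T> deserializer, Handler<T> handler);

    interface Serializer<T>   { byte[] toBytes(T value); }
    interface Deserializer<T> { T fromBytes(byte[] bytes); }

    interface Handler<T> {
        void onMessage(T message, Ack ack); // call ack.ack() after successful processing
    }

    interface Ack { void ack(); }
}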

Advanced features

  • Ordering of messages
  • Whether event messages are supported
  • Send and receive performance, with support for high concurrency and large data volumes
  • Whether clustering is supported
  • Reliable transmission of messages
  • Whether multiple protocols are supported

The development of Message Oriented Middleware

In fact, the history of message-oriented middleware is quite interesting. We know that any technology emerges to solve a practical problem; here, the problem of heavy inter-application communication was solved through a common software bus, that is, a communication system shared between applications.

The earliest application field was financial trading: at the time, traders needed to complete transactions through different terminals, and each terminal displayed different information.

By connecting the terminals to a message bus, a trader only needs to operate on one terminal, while the other terminals subscribe to the messages they are interested in. Thus publish-subscribe (PubSub) was born, along with TIB (The Information Bus), the world's first modern message queuing software. TIB lets developers define a set of rules describing message content; as long as messages are published according to these rules, any consumer application can subscribe to the messages it cares about.

As the benefits of TIB became widely recognized across industries, IBM also began developing its own message-oriented middleware, and three years later released its MQSeries product line. Over time, MQSeries evolved into WebSphere MQ, which came to dominate the commercial message queuing platform market.

Microsoft later also developed its own message queuing product, MSMQ.

Major vendors each built their own MQ, but ran them as commercial products. Commercial MQ aimed to solve application interoperability within a single product line, not to create standard interfaces that would let different vendors' MQ products interoperate.

Switching between different message oriented middleware

As a result, some large financial companies used MQ products from multiple vendors to serve different applications within the enterprise. Then the problem arises: if an application already subscribes to TIB MQ messages and suddenly also needs to consume IBM MQ messages, the whole implementation becomes very troublesome.

JMS standard

To solve this problem, the Java Message Service (JMS) appeared in 2001. By providing a common Java API, JMS hides the implementation interfaces of individual MQ vendors, so the same code can consume messages across different MQ products, solving the interoperability problem. At the technical level, a Java application only needs to program against the JMS API and choose the appropriate MQ driver, and JMS handles the rest. However, gluing interfaces together this way still leaves a problem:

how can two programs written in different programming languages communicate with each other through an asynchronous messaging mechanism?

The emergence of AMQP

At this point a general standard for asynchronous messaging was needed, and so AMQP (Advanced Message Queuing Protocol) came into being. It specifies a standard underlying wire protocol and adds many other features to support interoperability, enriching message passing for the needs of modern applications. Because the encoding is standardized, a client can interact with an MQ server provided by any AMQP vendor.

MQTT

In addition to the JMS and AMQP specifications, there is also MQTT (Message Queuing Telemetry Transport), which is designed specifically for small devices. Devices with limited computing power cannot cope with AMQP's complex operations; they need a simple and interoperable way to communicate.

That is the basic requirement MQTT addresses, and today MQTT is one of the main components of the Internet of Things (IoT) ecosystem.

Kafka does not follow any of the protocol specifications mentioned above; it focuses on throughput instead. The trade-off is similar to the one between UDP and TCP.

Introduction to Kafka

What is Kafka

Kafka is a distributed message publish-subscribe system characterized by high performance and high throughput.

It was originally designed as a pipeline for LinkedIn's activity streams and operational data. This data is mainly used for user profile analysis and server performance monitoring.

Kafka was therefore designed from the beginning as a distributed, high-throughput messaging system, which makes it well suited to big data transmission scenarios.

Application scenarios of Kafka

Because Kafka offers high throughput together with built-in partitioning, replication, and fault tolerance (it can process hundreds of thousands of messages per second), it has become a good solution for large-scale message processing applications.

In enterprise applications, it is therefore mainly used in the following areas.

Behavior tracking

Kafka can be used to track user behaviors such as browsing and searching. Through the publish-subscribe model, these events are recorded in the corresponding topic in real time, then ingested and analyzed by a back-end big data platform for further real-time processing and monitoring.

Log collection

There are many excellent products for log collection, such as Apache Flume, and many companies use Kafka for log aggregation.

Log aggregation means collecting log files from servers into a centralized platform (a file server) for processing. In practice, an application writes its logs to the local disk; if the application runs as a load-balanced cluster of dozens of machines, quickly locating a problem through those logs becomes very troublesome. A unified log-collection platform is therefore generally used to manage the logs and to quickly diagnose application problems. Many companies funnel their application logs into Kafka and then import them into Elasticsearch (ES) for real-time search and analysis, and into HDFS for offline statistics and data backup. Kafka itself also provides a good API for integrating with log collection.

Kafka’s own architecture

A typical Kafka cluster contains:

  • Several producers

    • these may be messages generated by application nodes
    • or events collected from logs by Flume
  • Several brokers (Kafka supports horizontal scaling)
  • Several consumer groups
  • A zookeeper cluster

Kafka manages cluster configuration and service coordination through zookeeper.

Producers publish messages to brokers in push mode, while consumers subscribe to and consume messages from brokers by pulling.

Multiple brokers work together, with producers and consumers deployed in the respective business logic and coordination handled by zookeeper. Together, this forms a high-performance distributed message publish-subscribe system.

There is one detail in this architecture that differs from other MQ middleware: the producer pushes messages to the broker, while the consumer pulls messages from the broker, actively fetching data rather than having the broker push data to it.

Explanation of terms

Broker

The Kafka cluster contains one or more servers, which are called brokers.

The broker does not maintain the consumption state of the data, which improves performance.

It uses the disk directly for storage, with linear (sequential) reads and writes, so it is fast:

it avoids copying data between JVM memory and system memory, and reduces the overhead of object creation and garbage collection.

Producer

Responsible for publishing messages to the Kafka broker.

Consumer

The message consumer: the client that reads messages from the Kafka broker. Consumers pull data from the broker and process it.

Topic

Each message published to the Kafka cluster belongs to a category, called a topic.

  • Physically

    • Messages of different topics are stored separately
  • Logically

    • Although a topic's messages are stored on one or more brokers, users only need to specify the topic of a message to produce or consume data, without caring where the data is stored

Partition

Partition is a physical concept; each topic contains one or more partitions.

Consumer Group

Each consumer belongs to a specific consumer group (a group name can be specified for each consumer; otherwise it belongs to the default group).

Topic & Partition

Logically, a topic can be regarded as a queue. Every message produced or consumed must specify its topic, which can be understood simply as declaring which queue the message should be put into.

In order to scale Kafka's throughput linearly, a topic is physically divided into one or more partitions. Each partition corresponds to a folder on disk that stores all the messages and index files of that partition. If two topics, topic1 and topic2, are created with 13 and 19 partitions respectively, 32 folders in total will be created across the whole cluster (the cluster used here has 8 nodes, and the replication factor of both topic1 and topic2 is 1).
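
To see why partitions scale throughput, note that a message's key determines which partition it lands in; Kafka's default partitioner hashes the key (with murmur2) modulo the partition count. The sketch below uses a plain Java hash to show the idea, so it is not byte-for-byte Kafka's algorithm:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionDemo {

    // Simplified stand-in for Kafka's default partitioner (which uses murmur2)
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit, then modulo
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, preserving per-key order
        System.out.println(partitionFor("user-42", 13)); // e.g. topic1 with 13 partitions
        System.out.println(partitionFor("user-42", 19)); // e.g. topic2 with 19 partitions
    }
}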

Install and configure

Download Kafka

wget https://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz
tar -zxvf kafka_2.11-2.0.0.tgz -C /usr/local

Configure zookeeper

Since Kafka relies on zookeeper for master (controller) election and other metadata maintenance, a zookeeper node must be started first.

Kafka ships with a built-in zookeeper service, so the following scripts are provided in the bin directory:

zookeeper-server-start.sh
zookeeper-server-stop.sh

In the config directory, there are the corresponding configuration files:

zookeeper.properties
server.properties

So we can start the zookeeper service with the following script; of course, we can also build a zookeeper cluster ourselves:

sudo sh zookeeper-server-start.sh -daemon ../config/zookeeper.properties

Start and stop Kafka

We can also point Kafka at our own zookeeper instance.

Modify server.properties and add the zookeeper configuration:

zookeeper.connect=localhost:2181

  • Start Kafka

    sudo bin/kafka-server-start.sh -daemon config/server.properties
  • Stop Kafka

    sh bin/kafka-server-stop.sh

Basic operation of Kafka

Create topic

sh bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

  • --replication-factor

    • This indicates how many copies of the topic's data are kept across different brokers. Setting it to 1 here means only a single copy is kept.
  • --partitions

    • Number of partitions
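
The same topic can also be created programmatically; here is a minimal sketch with the Java AdminClient from kafka-clients, assuming a broker running on localhost:9092:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // name, partitions, replication factor -- matching the CLI flags above
            NewTopic topic = new NewTopic("test", 1, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}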

View topic

sh bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
first_topic
test
test_partition

Create a consumer

sh bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test  --from-beginning
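
The console consumer has a programmatic equivalent; a minimal sketch with the Java KafkaConsumer (the group id test-group is an arbitrary choice):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");        // consumers sharing this id form one group
        props.put("auto.offset.reset", "earliest"); // same effect as --from-beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("test"));
            while (true) {
                // Pull model: the consumer asks the broker for data
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}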

Send messages

sh bin/kafka-console-producer.sh --broker-list localhost:9092 --topic first_topic
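
And the matching producer: a minimal sketch with the Java KafkaProducer, pushing to the same first_topic:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Push model: the producer sends (pushes) the record to the broker
            producer.send(new ProducerRecord<>("first_topic", "key-1", "hello kafka"));
            producer.flush(); // make sure the record is actually sent before closing
        }
    }
}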

Cluster environment installation

Environment preparation

  • Prepare three virtual machines
  • Deploy Kafka’s installation package on three machines

Modify configuration

The following configuration changes are all made in server.properties:

  • Modify the server.properties configuration: the broker.id of each machine in the same cluster must be unique
  • broker.id=0
    broker.id=1
    broker.id=2
  • Modify the zookeeper connection configuration (point it at the machine currently running zookeeper)

    zookeeper.connect=192.168.30.2:2181
  • Modify listeners configuration

    • If listeners is configured, message producers and consumers will use the listeners configuration to send and receive messages; otherwise they will use localhost
    • PLAINTEXT indicates the protocol; plaintext is the default, and other, encrypted protocols can be chosen

      • listeners=PLAINTEXT://192.168.13.102:9092
  • Start the three servers separately
