Kafka learning materials

Time: 2020-11-25

Kafka

1、 Benefits of message middleware

1. Decoupling

It allows you to extend or modify the processing on either side independently, as long as both sides comply with the same interface constraints.

2. Asynchronous

Most of the time, users don’t want or need to process messages immediately. Message queuing provides an asynchronous processing mechanism that allows users to put a message on the queue but not process it immediately. Put as many messages as you want in the queue, and then process them when necessary.

3. Flexibility / peak shaving

When traffic increases sharply, the application still needs to keep working, but such burst traffic is uncommon. It would be a great waste to keep resources on standby just to handle such peak visits. Using a message queue lets key components withstand sudden access pressure instead of collapsing completely under overload.

4. Recoverability

When a part of the system fails, the whole system will not be affected. Message queuing reduces the coupling between processes, so even if a process processing messages is down, messages added to the queue can still be processed after the system recovers.

5. Buffer

It helps control and optimize the speed of data flow through the system, and smooths out the mismatch between the speed at which messages are produced and the speed at which they are consumed.

2、 Mode of message queue communication

1. Point to point mode (one to one, consumers pull data actively, and messages are cleared after receiving messages)


2. Publish subscribe mode (one to many, messages will not be cleared after consumers consume data)

Kafka generally has the consumer pull data, which means the consumer keeps polling and may waste resources. (A wait time can be set: if no data is fetched, the consumer waits for the configured time before returning.)


3、 Kafka

Kafka is an open source stream processing platform developed by the Apache Software Foundation, written in Scala (a JVM language) and Java. Kafka is a high-throughput, distributed message queue based on the publish/subscribe model, mainly used for real-time processing in the big data field (e.g. feeding real-time analysis frameworks such as Spark).

1. Characteristics of Kafka
  • High throughput and low latency: Kafka can process hundreds of thousands of messages per second, with latency as low as a few milliseconds. Each topic can be divided into multiple partitions, and a consumer group consumes the partitions in parallel.
  • Scalability: Kafka cluster supports hot expansion
  • Persistence and reliability: messages are persisted to local disk, and data backup is supported to prevent data loss
  • Fault tolerance: allow nodes in the cluster to fail (if the number of copies is n, n-1 nodes are allowed to fail)
  • High concurrency: support thousands of clients to read and write at the same time
2. Basic structure of Kafka


1) Producer: the message producer, i.e. the client that sends messages to the Kafka broker.
2) Consumer: the message consumer, i.e. the client that fetches messages from the Kafka broker.
3) Consumer Group (CG): a group consisting of multiple consumers. Each consumer in the group consumes data from different partitions; a partition can only be consumed by one consumer within a group, and consumer groups do not affect each other. Every consumer belongs to some consumer group, so the consumer group is the logical subscriber.
4) Broker: a Kafka server is a broker. A cluster consists of multiple brokers, and one broker can hold multiple topics.
5) Topic: can be understood as a queue; both producers and consumers work against a topic.
6) Partition: for scalability, a very large topic can be distributed over multiple brokers (i.e. servers); a topic can be divided into multiple partitions, each of which is an ordered queue.
7) Replica: so that Kafka can keep working when a partition on some node fails, Kafka provides a replication mechanism. Each partition of a topic has several replicas: one leader and several followers.
8) Leader: the "master" of the replicas of each partition; producers send data to the leader and consumers consume data from the leader.
9) Follower: a "slave" of the replicas of each partition; it synchronizes data from the leader in real time and keeps in sync with the leader. When the leader fails, one of the followers becomes the new leader.

The optimal design is that the number of consumer threads in the consumer group is equal to the number of partitions, so the efficiency is the highest.

Therefore, for a service deployed as multiple distributed instances, the number of Kafka consumers in each instance is less than the number of partitions of the corresponding topic, but the total number of consumers across all instances should equal the number of partitions. This works because all consumers of the distributed service belong to one consumer group; if they belonged to different consumer groups, they would process duplicate messages (consumers in the same consumer group cannot consume the same partition, while different consumer groups can each consume the whole topic, so every message would be processed once per group. In general, two consumer groups are started for one topic only when two different pieces of business logic need to process it).

If producer traffic increases and the number of partitions of the current topic already equals the number of consumers, the solution is to increase the number of partitions of the topic and add consumers to the consumer group accordingly (see the sketch below).
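
The partition count of an existing topic can be raised with the kafka-topics.sh --alter command shown later in the basic commands section. As an illustration only, the following is a minimal sketch of doing the same thing programmatically with the Java AdminClient, which is available in kafka-clients releases newer than the 0.10.x client used elsewhere in this document; the topic name "first" and the bootstrap address are placeholders.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            //Raise the total partition count of topic "first" to 6
            //(partitions can only be increased, never decreased)
            admin.createPartitions(Collections.singletonMap("first", NewPartitions.increaseTo(6)))
                 .all()
                 .get();
        }
        //After this, more consumers can be added to the consumer group, up to one consumer per partition.
    }
}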

3. Kafka installation and deployment
3.1 Download the installation package

http://kafka.apache.org/downloads.html

The version used here is kafka_2.11-0.11.0.0.tgz.

3.2 cluster deployment

1) Unzip the installation package

[atguigu@hadoop102 software]$ tar -zxvf kafka_2.11-0.11.0.0.tgz -C /opt/module/

2) Modify the extracted file name

[atguigu@hadoop102 module]$ mv kafka_2.11-0.11.0.0/ kafka

3) Create the logs folder in the / opt / module / Kafka directory

[atguigu@hadoop102 kafka]$ mkdir logs

4) Modify configuration file

[atguigu@hadoop102 kafka]$ cd config/

[atguigu@hadoop102 config]$ vi server.properties

Enter the following:

The globally unique ID of the broker; it must not be duplicated

broker.id=0

Enable the topic deletion feature

delete.topic.enable=true

Number of threads processing network requests

num.network.threads=3

Number of threads handling disk I/O

num.io.threads=8

The buffer size of the send socket

socket.send.buffer.bytes=102400

Buffer size of the receive socket

socket.receive.buffer.bytes=102400

The buffer size of the request socket

socket.request.max.bytes=104857600

Path of Kafka running log storage

log.dirs=/opt/module/kafka/logs

The number of partitions of topic on the current broker

num.partitions=1

The number of threads per data directory used for log recovery and cleanup

num.recovery.threads.per.data.dir=1

The maximum time a log segment is retained before it is deleted

log.retention.hours=168

Configure the Zookeeper cluster connection addresses

zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181

5) Configure environment variables

[atguigu@hadoop102 module]$ sudo vi /etc/profile

#KAFKA_HOME

export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin

[atguigu@hadoop102 module]$ source /etc/profile

6) Similarly, modify the configuration on the other two Kafka brokers

Set broker.id=1 and broker.id=2 respectively in server.properties on the other two machines.
Note: broker.id must not be repeated.

7) Start cluster

[atguigu@hadoop102 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[atguigu@hadoop103 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[atguigu@hadoop104 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties

8) Shut down the cluster

[atguigu@hadoop102 kafka]$ bin/kafka-server-stop.sh stop
[atguigu@hadoop103 kafka]$ bin/kafka-server-stop.sh stop
[atguigu@hadoop104 kafka]$ bin/kafka-server-stop.sh stop

9) Kafka group script

for i in hadoop102 hadoop103 hadoop104
do
echo "========== $i =========="
ssh $i '/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties'
done

4. Kafka basic command

1) View all topics in the current server

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --list

2) Create topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 1 --topic first

--topic defines the topic name, --replication-factor defines the number of replicas, and --partitions defines the number of partitions.

3) Delete topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --delete --topic first

This requires delete.topic.enable=true to be set in server.properties; otherwise the topic is only marked for deletion.

4) Send message

[atguigu@hadoop102 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world
>atguigu zywx

5) Consume messages

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --topic first

Connecting via zookeeper is deprecated; it can still be used temporarily, but it is not recommended.

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first

--from-beginning: reads all existing data in the topic from the beginning.

6) View the details of a topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first

7) Modify the number of partitions

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --alter --topic first --partitions 6

4、 Kafka architecture in depth

1. Kafka producers

1.1 Partitioning strategy

1) The reasons for partitioning are as follows: (1) it makes it easy to scale in a cluster, since each partition can be sized to fit the machine it is on and a topic can consist of multiple partitions, so the whole cluster can handle data of any size; (2) it improves concurrency, because reads and writes can be done per partition. 2) Partitioning principle: the data sent by the producer is encapsulated in a ProducerRecord object.


(1) When a partition is specified, the specified value is used directly as the partition. (2) If no partition is specified but a key is present, the partition is obtained by taking the hash of the key modulo the number of partitions of the topic. (3) When there is neither a partition nor a key, a random integer is generated on the first call (and incremented on each subsequent call); the partition is obtained by taking this value modulo the number of available partitions of the topic. This is commonly known as the round robin algorithm. (See the sketch below.)
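
As an illustration, here is a minimal sketch of the three cases using the ProducerRecord constructors; the topic name "first" and the key/value strings are placeholders:

import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionRuleExamples {
    public static void main(String[] args) {
        //Case (1): partition 0 is specified explicitly, so it is used directly
        ProducerRecord<String, String> r1 = new ProducerRecord<>("first", 0, "key", "value");

        //Case (2): no partition, but a key is given: partition = hash(key) % number of partitions
        ProducerRecord<String, String> r2 = new ProducerRecord<>("first", "key", "value");

        //Case (3): neither partition nor key: partitions are chosen round robin,
        //starting from a random integer that is incremented on each send
        ProducerRecord<String, String> r3 = new ProducerRecord<>("first", "value");

        //r2 and r3 have no partition yet; it is computed by the partitioner when the record is sent
        System.out.println(r1.partition() + ", " + r2.partition() + ", " + r3.partition());
    }
}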

1.2 data reliability assurance

To ensure that the data sent by the producer reliably reaches the specified topic, each partition of the topic needs to send an ACK (acknowledgement) to the producer after receiving the data. If the producer receives the ACK, it proceeds with the next round of sending; otherwise it resends the data.


1) Replica data synchronization policy

Scheme 1: send the ACK as soon as more than half of the followers have completed synchronization.
  Advantage: low latency (follower synchronization can be fast or slow; sending the ACK once more than half have finished filters out the slow ones).
  Disadvantage: to tolerate the failure of n nodes when electing a new leader, 2n+1 replicas are required (a majority needs n+1 surviving nodes, so n+1+n replicas in total).

Scheme 2: send the ACK only after all followers have completed synchronization.
  Advantage: to tolerate the failure of n nodes when electing a new leader, only n+1 replicas are required.
  Disadvantage: high latency (the ACK must wait for the slowest follower to finish synchronizing).

Kafka chooses the second scheme (send the ACK only after all followers have completed synchronization), for the following reasons:

  1. Similarly, in order to tolerate the failure of N nodes, the first scheme needs 2n + 1 copies, while the second scheme only needs n + 1 copies. Each partition of Kafka has a large amount of data, and the first scheme will cause a lot of data redundancy.
  2. Although the network delay of the second scheme will be higher, the network delay has little influence on Kafka.

2)ISR

After adopting the second scheme, imagine the following scenario: the leader receives the data, and all the followers start to synchronize the data, but there is a follower who can’t synchronize with the leader due to some fault. The leader has to wait until it completes the synchronization before sending the ACK. How to solve this problem?

The leader maintains a dynamic in-sync replica set (ISR), meaning the set of followers that are in sync with the leader. When the followers in the ISR have completed data synchronization, the leader sends the ACK to the producer. If a follower fails to synchronize with the leader for a long time, it is kicked out of the ISR; the time threshold is set by the replica.lag.time.max.ms parameter. When the leader fails, a new leader is elected from the ISR.

3) ACK response mechanism

For some less important data, the reliability requirement is not very high and a small amount of data loss can be tolerated, so it is not necessary to wait for all followers in the ISR to receive the data successfully.

Kafka therefore provides three reliability levels; users can trade off between reliability and latency and choose the appropriate configuration (a configuration sketch follows the list below).

acks parameter configuration:

  • 0: the producer does not wait for the broker's ACK. This gives the lowest latency: the broker returns as soon as it receives the message, before it has been written to disk, so data may be lost when the broker fails.
  • 1: the producer waits for the broker's ACK; the leader of the partition returns the ACK after writing the message to disk. If the leader fails before the followers finish synchronizing, data may be lost, because the ACK has already been returned, the system assumes the newly elected leader already has the data, and the producer does not retry.
  • -1 (all): the producer waits for the broker's ACK; the ACK is returned only after both the leader and the followers of the partition have persisted the message. However, if the leader fails after the followers finish synchronizing but before the broker sends the ACK, the ACK never reaches the producer, the retry mechanism resends the data to the newly elected leader, and data duplication occurs.
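
A minimal sketch of setting this level on the producer; the broker address is a placeholder, and this is the same acks property used in the producer examples in section 5:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");
        //"0": do not wait for the ACK, "1": leader only, "all" (or "-1"): leader plus all ISR followers
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("first", "hello"));
        }
    }
}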

4) Failure handling details


LEO (log end offset): the latest offset in each replica. HW (high watermark): the largest offset visible to consumers, i.e. the smallest LEO among the replicas in the ISR. For example, if the replicas in the ISR have LEOs of 12, 10 and 15, then HW = 10 and consumers can only read messages up to offset 10.

(1) Follower failure: a follower is temporarily kicked out of the ISR after it fails. After the follower recovers, it reads the last HW recorded on its local disk, truncates the part of its log file above the HW, and synchronizes from the leader starting at the HW. Once the follower's LEO is greater than or equal to the partition's HW, i.e. the follower has caught up with the leader, it can rejoin the ISR. (2) Leader failure: after the leader fails, a new leader is elected from the ISR. To guarantee data consistency across replicas, the remaining followers first truncate the parts of their log files above the HW and then synchronize data from the new leader. Note: this only guarantees consistency between replicas; it does not guarantee that data will not be lost or duplicated.

2. Consumers

2.1 mode of consumption

The consumer uses pull mode to read data from the broker. Push mode has difficulty adapting to consumers with different consumption rates, because the send rate is decided by the broker: its goal is to deliver messages as fast as possible, but this easily leaves the consumer unable to keep up, with denial of service and network congestion as typical symptoms. Pull mode lets the consumer fetch messages at a rate appropriate to its own consumption capacity.

The drawback of pull mode is that if Kafka has no data, the consumer may get stuck in a loop that keeps returning empty data. To address this, Kafka consumers pass a timeout parameter when pulling data: if no data is currently available for consumption, the consumer waits for up to this period before returning (a minimal sketch follows).
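
A minimal sketch of this timeout, assuming the 0.10/0.11-era client used in this document where poll() takes a millisecond value (newer clients use poll(Duration)); the broker address, group id and topic are placeholders:

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollTimeoutExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "poll-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("first"));

        while (true) {
            //If no data is available, poll blocks for at most 500 ms and then returns an empty batch,
            //so the loop does not spin and waste CPU while the topic is idle
            ConsumerRecords<String, String> records = consumer.poll(500);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}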

2.2 partition allocation strategy

There are multiple consumers in a consumer group and multiple partitions in a topic, so it must be determined which consumer consumes which partition. Kafka has two assignment strategies: RoundRobin (assignment across the whole consumer group) and Range (assignment per topic).

Assignment happens when a consumer is started and when consumers are added or removed (a configuration sketch follows).
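
A minimal sketch of selecting the strategy on the consumer side; in the Java client the Range assignor is the default, and the property value below switches the group to round robin (the group id is a placeholder):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class AssignmentStrategyConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        //Default is org.apache.kafka.clients.consumer.RangeAssignor (per-topic range assignment);
        //RoundRobinAssignor spreads all subscribed partitions across the group's consumers in turn
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        System.out.println(props);
    }
}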

2.3 maintenance of offset

Since the consumer may have power failure and other failures during consumption, after the consumer recovers, it needs to continue consumption from the position before the failure. Therefore, the consumer needs to record the offset it consumes in real time, so as to continue consumption after fault recovery.

Before Kafka version 0.9, the consumer saved offsets in Zookeeper by default. Starting with version 0.9, the consumer saves offsets in a built-in Kafka topic by default; this topic is __consumer_offsets.
1) Modify configuration file

Add exclude.internal.topics=false to consumer.properties.

2) Read offset

Before 0.11.0.0:

bin/kafka-console-consumer.sh --topic __consumer_offsets --zookeeper hadoop102:2181 --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" --consumer.config config/consumer.properties --from-beginning

Versions 0.11.0.0 and later:

bin/kafka-console-consumer.sh --topic __consumer_offsets --zookeeper hadoop102:2181 --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --consumer.config config/consumer.properties --from-beginning

2.4 consumer group cases

1) Requirement: verify that within the same consumer group, only one consumer consumes a given message at a time. 2) Steps: (1) On hadoop102 and hadoop103, modify the group.id property in the consumer.properties configuration file to the same arbitrary group name.

[atguigu@hadoop102 config]$ vi consumer.properties
group.id=zywx

(2) start consumers on Hadoop 102 and Hadoop 103 respectively

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop103:9092 --topic first --consumer.config config/consumer.properties
[atguigu@hadoop103 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --consumer.config config/consumer.properties

(3) start producers on Hadoop 104

[atguigu@hadoop104 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world

(4) check the recipients of Hadoop 102 and Hadoop 103. Only one consumer receives the message at the same time.

5、 Kafka API

5.1 Producer API

5.1.1 message sending process

Kafka’s producer sends messages asynchronously. Message sending involves two threads, the main thread and the sender thread, and a shared buffer, the RecordAccumulator. The main thread puts messages into the RecordAccumulator, and the sender thread continuously pulls messages from the RecordAccumulator and sends them to the Kafka broker.


5.1.2 asynchronous send API

1) Import dependency

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.10.1.1</version>
</dependency>

2) Write code

1. API without callback function

package com.zywx.producer;

import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class MyProducer {
    public static void main(String[] args) {
        //1. Create configuration information of the Kafka producer
        Properties properties = new Properties();
        //Specifies the Kafka cluster to which the connection is made
        properties.put("bootstrap.servers", "192.168.25.128:9091");

        //ACK response level
        properties.put("acks", "all");

        //Number of retries
        properties.put("retries", 1);

        //Batch size
        properties.put("batch.size", 16384);

        //Waiting time
        properties.put("linger.ms", 1);

        //RecordAccumulator buffer size
        properties.put("buffer.memory", 33554432);

        //Serialization classes of key and value
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        //Create producer object
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        //Send data
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>("first", "zywx----" + i));
        }

        //Close resources
        producer.close();
    }
}


2. API with callback function

The callback function is invoked when the producer receives the ACK; the call is asynchronous. The method has two parameters, RecordMetadata and Exception. If Exception is null, the message was sent successfully; otherwise the send failed.

public class CallBackProducer {
    public static void main(String[] args) {
        //1. Create configuration information
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.25.128:9091");

        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        //2. Create producer object
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        //3. Send data
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>("lol", "zywx===" + i), (recordMetadata, e) -> {
                if (e == null) {
                    System.out.println(recordMetadata.partition() + "======" + recordMetadata.offset());
                } else {
                    e.printStackTrace();
                }
            });
        }

        //4. Close resources
        producer.close();
    }
}


package com.zywx.producer;

import org.apache.kafka.clients.producer.*;

import java.util.ArrayList;
import java.util.Properties;

public class CallBackProducer {
    public static void main(String[] args) {
        //1. Create configuration information
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.25.128:9091");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

        //2. Create producer object
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        ArrayList<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        //3. Send data with a key, so records are routed by hash(key)
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>("lol", list.get(i % 3), "zywx===" + i), (recordMetadata, e) -> {
                if (e == null) {
                    System.out.println(recordMetadata.partition() + "======" + recordMetadata.offset());
                } else {
                    e.printStackTrace();
                }
            });
        }

        //4. Close resources
        producer.close();
    }
}


5.1.3 Custom partitioner

Write a custom partitioner

package com.zywx.partitioner;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

public class MyPartitioner implements Partitioner {

@Override
public int partition(String s, Object o, byte[] bytes, Object o1, byte[] bytes1, Cluster cluster) {
// Integer count = cluster.partitionCountForTopic(topic);
// return key.toString().hashCode() % count;
return 1;
}

@Override
public void close() {

}

@Override
public void configure(Map<String, ?> map) {

}
}

Interface implemented by custom partition

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package org.apache.kafka.clients.producer;

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.Configurable;

public interface Partitioner extends Configurable {
int partition(String var1, Object var2, byte[] var3, Object var4, byte[] var5, Cluster var6);

void close();
}

Interface implementation class (default partition method)

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package org.apache.kafka.clients.producer.internals;

import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class DefaultPartitioner implements Partitioner {


private final AtomicInteger counter = new AtomicInteger((new Random()).nextInt());

public DefaultPartitioner() {
}

public void configure(Map<String, ?> configs) {
}

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = this.counter.getAndIncrement();
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return ((PartitionInfo)availablePartitions.get(part)).partition();
        } else {
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}

public void close() {
}

}

Test custom partition

package com.zywx.producer;

import org.apache.kafka.clients.producer.*;

import java.util.Properties;

public class PartitionProducer {

    public static void main(String[] args) {
        //1. Create configuration information of the Kafka producer
        Properties properties = new Properties();
        //Specifies the Kafka cluster to which the connection is made
        properties.put("bootstrap.servers", "192.168.25.128:9091");
        //ACK response level
        properties.put("acks", "all");

        //Number of retries
        properties.put("retries", 1);

        //Batch size
        properties.put("batch.size", 16384);

        //Waiting time
        properties.put("linger.ms", 1);

        //RecordAccumulator buffer size
        properties.put("buffer.memory", 33554432);

        //Serialization classes of key and value
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        //Register the custom partitioner
        properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.zywx.partitioner.MyPartitioner");

        //Create producer object
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        //Send data
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>("first", "zywx----" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println(recordMetadata.partition());
                    }
                }
            });
        }

        //Close resources
        producer.close();
    }

}


5.1.4 synchronous sending API

Synchronous sending means that after a message is sent, the current thread will be blocked until ack is returned.

Since the send method returns a Future object, we can achieve the effect of synchronous sending by simply calling get() on the returned Future.

…………
    
    //Send data
    for (int i = 0; i < 10; i++) {
        //The get method of future will block other threads to achieve synchronization
        Future<RecordMetadata> future = producer.send(new ProducerRecord<String, String>("first", "zywx----" + i));
        try {
            RecordMetadata recordMetadata = future.get();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
    
    ……………

5.2 Consumer API

5.2.1 basic message monitoring

package com.zywx.consumer;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Arrays;
import java.util.Properties;

public class MyConsumer {

    public static void main(String[] args) {
        //1. Create consumer configuration information
        Properties properties = new Properties();

        //2. Assign values to the configuration information
        //Cluster to connect to
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.25.128:9091");
        //Turn on auto commit
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
        //Delay of auto commit offset
        properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");

        //Key/value deserialization
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");

        //Consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "demodata");

        //3. Create the consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);

        //4. Subscribe to topics
        consumer.subscribe(Arrays.asList("first", "lol"));

        while (true) {
            //5. Fetch data
            ConsumerRecords<String, String> records = consumer.poll(100);

            //6. Parse and print
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.key() + "=======" + record.value());
            }
        }
    }
}

Reread previously sent messages

(Only messages still retained on the broker can be re-consumed; with the default retention of seven days, consumption starts from the earliest message within that period.)

//The consumer should belong to a new consumer group
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "demodata");
//Configure resetting the consumer's offset
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

The code above uses automatic offset commit, which may cause messages to be missed or to be consumed repeatedly.

5.2.2 manually commit offset

1) Synchronous submission

…………

properties.put("enable.auto.commit", "false");   //Turn off auto commit offset

…………

    while (true){
        //5. Access to data
        ConsumerRecords<String, String> records = consumer.poll(100);

        //6. Analyze and print
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.key() + "=======" + record.value());
        }
        
        //The current thread will block until the offset commit is successful
         consumer.commitSync();
    }

…………

2) Asynchronous commit

…………

properties.put("enable.auto.commit", "false");   //Turn off auto commit offset

…………

    while (true){
        //5. Access to data
        ConsumerRecords<String, String> records = consumer.poll(100);

        //6. Analyze and print
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.key() + "=======" + record.value());
        }
        
        //Asynchronous commit
         consumer.commitAsync(new OffsetCommitCallback() {
             @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                 if (exception != null) {
                     System.err.println("Commit failed for " + offsets);
                 }
             }
         });

    }

…………

Manual asynchronous commit can avoid missed consumption, but it still cannot avoid repeated consumption.

5.2.3 custom commit offset

The maintenance of offsets is quite cumbersome, because consumer rebalances need to be taken into account.

When a new consumer joins the consumer group, an existing consumer leaves the consumer group, or the partitions of the subscribed topic change, the partitions are redistributed among the consumers. This redistribution process is called a rebalance.

The trigger conditions of a consumer rebalance are as follows: (1) adding or removing consumers triggers a rebalance of the consumer group; (2) adding or removing brokers triggers a consumer rebalance.

After rebalancing, the consumption partition of each consumer will change. Therefore, consumers should first obtain the partition to which they have been reassigned, and locate the latest submitted offset position of each partition to continue consumption.

package com.zywx.consumer;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.*;

public class CustomConsumer {

private static Map<TopicPartition, Long> currentOffset = new HashMap<>();
public static void main(String[] args) {

    //Create configuration information
    Properties props = new Properties();

    //Kafka cluster
    props.put("bootstrap.servers", "hadoop102:9092");

    //Consumer group: consumers with the same group.id belong to the same consumer group
    props.put("group.id", "test");

    //Turn off auto commit offset
    props.put("enable.auto.commit", "false");

    //Deserialization classes for key and value
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    //Create a consumer
    final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

    //The consumer subscribes to topics
    consumer.subscribe(Arrays.asList("first"), new ConsumerRebalanceListener() {

        //This method is called before a rebalance
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            commitOffset(currentOffset);
        }

        //This method is called after a rebalance
        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            currentOffset.clear();
            for (TopicPartition partition : partitions) {
                //Locate the last committed offset of each partition and continue consumption from there
                consumer.seek(partition, getOffset(partition));
            }
        }
    });

    while (true) {
        //5. Fetch data
        ConsumerRecords<String, String> records = consumer.poll(100);

        //6. Parse and print, remembering the offset of each record
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.key() + "=======" + record.value());
            currentOffset.put(new TopicPartition(record.topic(), record.partition()), record.offset());
        }

        //Asynchronous commit
        consumer.commitAsync(new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                if (exception != null) {
                    System.err.println("Commit failed for " + offsets);
                }
            }
        });
    }
}

//Gets the last committed offset of a partition (e.g. from a database)
private static long getOffset(TopicPartition partition) {
    return 0;
}

//Commits the offsets of all partitions of the consumer
private static void commitOffset(Map<TopicPartition, Long> currentOffset) {
    //Store the offset information into the database here
}
}

In short, when the consumed data is stored in the database, the consumed offsets are also stored in a dedicated offset table (the table uses the three columns consumer group id + topic + partition as a composite primary key). After a failure or a rebalance, the latest offset is read back from this table (a sketch follows).
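
A minimal sketch of such getOffset/commitOffset helpers backed by a database, assuming a hypothetical MySQL table consumer_offsets(group_id, topic, partition_id, offset_value) whose first three columns form the composite primary key; the JDBC URL, credentials and group id are placeholders, and the MySQL driver must be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

public class DbOffsetStore {

    private static final String URL = "jdbc:mysql://hadoop102:3306/kafka_offsets";
    private static final String USER = "root";
    private static final String PASSWORD = "root";
    private static final String GROUP_ID = "test";

    //Read the last committed offset of a partition; 0 means "start from the beginning"
    public static long getOffset(TopicPartition partition) {
        String sql = "SELECT offset_value FROM consumer_offsets WHERE group_id=? AND topic=? AND partition_id=?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, GROUP_ID);
            ps.setString(2, partition.topic());
            ps.setInt(3, partition.partition());
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    //Write the offsets of all partitions; ideally this happens in the same transaction as the consumed data
    public static void commitOffset(Map<TopicPartition, Long> currentOffset) {
        String sql = "REPLACE INTO consumer_offsets(group_id, topic, partition_id, offset_value) VALUES (?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<TopicPartition, Long> entry : currentOffset.entrySet()) {
                ps.setString(1, GROUP_ID);
                ps.setString(2, entry.getKey().topic());
                ps.setInt(3, entry.getKey().partition());
                ps.setLong(4, entry.getValue());
                ps.addBatch();
            }
            ps.executeBatch();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}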

5.3 custom interceptor

5.3.1 interceptor principle

For the producer, interceptors allow users to customize messages before they are sent and before the producer callback logic runs, for example by modifying them. The producer also allows users to specify multiple interceptors that act on the same message in order, forming an interceptor chain.

The interceptor interface is org.apache.kafka.clients.producer.ProducerInterceptor, which defines the following methods:

(1) configure(configs):

Called when the configuration is read and data is initialized.

(2) onSend(ProducerRecord):

This method is wrapped inside KafkaProducer.send(), i.e. it runs in the user's main thread. The producer guarantees that this method is called before the message is serialized and its partition is computed. The user can do anything to the message in this method, but it is best not to modify the topic and partition the message belongs to, otherwise the computation of the target partition will be affected.

(3) onAcknowledgement(RecordMetadata, Exception):

This method is called after the message has been successfully sent from the RecordAccumulator to the Kafka broker, or when the send fails, and it is normally called before the producer callback logic is triggered. onAcknowledgement runs in the producer's I/O thread, so do not put heavy logic in this method, otherwise it will slow down the producer's message sending.

(4) close:

Closes the interceptor; mainly used to perform resource cleanup.

As mentioned above, interceptors may run in multiple threads, so users need to ensure thread safety during implementation. In addition, if more than one interceptor is specified, the producer will call them in the specified order, and only catch the exceptions that each interceptor may throw and record them to the error log instead of passing them up. This should be paid special attention to in the process of use.

5.3.2 interceptor coding

(1) Add timestamp interceptor

package com.zywx.interceptor;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Map;

public class TimeInterceptor implements ProducerInterceptor<String,String> {

@Override
public void configure(Map<String, ?> map) {

}

@Override
public ProducerRecord<String, String> onSend(ProducerRecord<String, String> producerRecord) {
    //1. Take out the data
    String value = producerRecord.value();
    //2. Create a new producer record (no setValue method)
    return new ProducerRecord<String, String>(producerRecord.topic(), producerRecord.partition(), producerRecord.key(), System.currentTimeMillis() + "," + value);
}

@Override
public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {

}

@Override
public void close() {

}

}

(2) Count the number of messages sent successfully and failed, and print the two counters when producer is closed

package com.zywx.interceptor;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Map;

/**
*
*/
public class CounterInterceptor implements ProducerInterceptor<String,String> {

int success;
int error;

/**
 * Null is returned by default; the record itself should be returned instead
 * @param producerRecord
 * @return
 */
@Override
public ProducerRecord<String, String> onSend(ProducerRecord<String, String> producerRecord) {
    return producerRecord;
}

@Override
public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {
    if (recordMetadata != null){
        success++;
    }else {
        error++;
    }
}

@Override
public void close() {
    System.out.println("success:" + success);
    System.out.println("error:" + error);
}

@Override
public void configure(Map<String, ?> map) {

}

}

(3) Add interceptor / interception chain in producer

…………

//Add interceptor
    ArrayList<String> interceptors = new ArrayList<>();
    interceptors.add("com.zywx.interceptor.TimeInterceptor");
    interceptors.add("com.zywx.interceptor.CounterInterceptor");
    properties.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);

…………

result:


6、 Kafka monitoring

Kafka Eagle

1) Modify Kafka start command

In the kafka-server-start.sh script, change

if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi

Revised as

if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-server -Xms2G -Xmx2G -XX:PermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=70"
    export JMX_PORT="9999"
    #export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi

Each node in the cluster needs to be modified

2) Upload the package kafka-eagle-bin-1.3.7.tar.gz to the /opt/software directory on the cluster

3) Unzip to local

[atguigu@hadoop102 software]$ tar -zxvf kafka-eagle-bin-1.3.7.tar.gz

4) Enter the directory you just unzipped

[atguigu@hadoop102 kafka-eagle-bin-1.3.7]$ ll
total 82932
-rw-rw-r--. 1 atguigu atguigu 84920710 Aug 13 23:00 kafka-eagle-web-1.3.7-bin.tar.gz

5) Extract kafka-eagle-web-1.3.7-bin.tar.gz to /opt/module

[atguigu@hadoop102 kafka-eagle-bin-1.3.7]$ tar -zxvf kafka-eagle-web-1.3.7-bin.tar.gz -C /opt/module/

6) Rename the directory

[atguigu@hadoop102 module]$ mv kafka-eagle-web-1.3.7/ eagle

7) Give the startup file execution permission

[atguigu@hadoop102 eagle]$ cd bin/
[atguigu@hadoop102 bin]$ ll
total 12
-rw-r--r--. 1 atguigu atguigu Aug 22 2017 ke.bat
-rw-r--r--. 1 atguigu atguigu 7190 Jul 30 20:12 ke.sh
[atguigu@hadoop102 bin]$ chmod 777 ke.sh

8) Modify configuration file

multi zookeeper&kafka cluster list

kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=hadoop102:2181,hadoop103:2181,hadoop104:2181

kafka offset storage

cluster1.kafka.eagle.offset.storage=kafka

enable kafka metrics

kafka.eagle.metrics.charts=true
kafka.eagle.sql.fix.error=false

kafka jdbc driver address

kafka.eagle.driver=com.mysql.jdbc.Driver
kafka.eagle.url=jdbc:mysql://hadoop102:3306/ke?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
kafka.eagle.username=root
kafka.eagle.password=root

9) Add environment variable

export KE_HOME=/opt/module/eagle
export PATH=$PATH:$KE_HOME/bin

Note: run source /etc/profile afterwards

10) Start

[atguigu@hadoop102 eagle]$ bin/ke.sh start


11) Log in to the web page to view the monitoring data


7、 Integration of Kafka and spring boot

7.1 project dependency

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.3.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>spring_boot_kafka</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring_boot_kafka</name>
    <description>Demo project for Spring Boot</description>

<properties>
    <java.version>1.8</java.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>

    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.junit.vintage</groupId>
                <artifactId>junit-vintage-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.44</version>
    </dependency>

</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>

</project>

7.2 entity class user

package com.example.spring_boot_kafka.entity;

import lombok.Data;
import lombok.experimental.Accessors;

@Data
@Accessors(chain = true)
public class User {

private Integer id;
private String name;
private Integer age;

}

7.3 message sending

package com.example.spring_boot_kafka.producer;

import com.alibaba.fastjson.JSON;
import com.example.spring_boot_kafka.entity.User;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class UserProducer {

@Autowired
private KafkaTemplate kafkaTemplate;

public void sendUser(Integer id){
    User user = new User();
    user.setId(id).setAge(17).setName("Zhang San");
    System.err.println("send user log: " + user);
    kafkaTemplate.send("user", JSON.toJSONString(user));
}

}
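
Optionally, delivery can be confirmed through the ListenableFuture returned by KafkaTemplate.send() in the spring-kafka version pulled in by Boot 2.3.x; the following is a minimal sketch of a variant of the producer above (the class name is made up for illustration, the topic "user" matches the example):

package com.example.spring_boot_kafka.producer;

import com.alibaba.fastjson.JSON;
import com.example.spring_boot_kafka.entity.User;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Component;
import org.springframework.util.concurrent.ListenableFuture;

@Component
public class UserProducerWithCallback {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void sendUser(Integer id) {
        User user = new User();
        user.setId(id).setAge(17).setName("Zhang San");
        //send() returns a ListenableFuture; the callbacks log whether the broker accepted the record,
        //mirroring the callback idea from section 5.1.2
        ListenableFuture<SendResult<String, String>> future =
                kafkaTemplate.send("user", JSON.toJSONString(user));
        future.addCallback(
                result -> System.out.println("sent user " + id + " to partition "
                        + result.getRecordMetadata().partition()),
                ex -> System.err.println("send failed: " + ex.getMessage()));
    }
}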

7.4 message receiving (listening)

package com.example.spring_boot_kafka.consumer;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

import java.util.Optional;

@Component
@Slf4j
public class UserConsumer {

@KafkaListener(topics = {"user"})
public void consumer(ConsumerRecord consumerRecord){
    //Judge whether it is null or not
    Optional kafkaMessage = Optional.ofNullable(consumerRecord.value());
    log.info(">>>>record=" + kafkaMessage);
    if (kafkaMessage.isPresent()){
        //Get the value in the Optional instance
        Object message = kafkaMessage.get();
        System.err.println("consume message: " + message);
    }
    }
}

}

7.5 configuration files

spring.application.name=kafka-user
server.port=8080

============== kafka ===================

Specify the Kafka broker address; multiple addresses can be given

spring.kafka.bootstrap-servers=localhost:9092

=============== provider =======================

spring.kafka.producer.retries=0

Number of messages per batch sent

spring.kafka.producer.batch-size=16384
spring.kafka.producer.buffer-memory=33554432

Specifies the encoding and decoding method of message key and message body

spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

=============== consumer =======================

Specify default consumer group ID

spring.kafka.consumer.group-id=user-log-group

spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=true
spring.kafka.consumer.auto-commit-interval=100

Specifies the encoding and decoding method of message key and message body

spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

7.6 startup class

package com.example.spring_boot_kafka;

import com.example.spring_boot_kafka.producer.UserProducer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import javax.annotation.PostConstruct;

@SpringBootApplication
public class SpringBootKafkaApplication {
@Autowired
private UserProducer kafkaSender;
@PostConstruct
public void init(){
for (int i = 0; i < 10; i++){
kafkaSender.sendUser(i);
}
}

public static void main(String[] args) {
SpringApplication.run(SpringBootKafkaApplication.class, args);
}

}

7.7 test results

