Kafka Study Notes (2): a preliminary study of Kafka

Time:2021-1-27

After reading the previous article, you should have a basic understanding of messaging systems and of Kafka’s overall architecture. The best way to learn something is to use it, so today we will get hands-on with Kafka and build our first small program.

The life of a message in Kafka

Things are best mastered step by step, but a bird’s-eye view first makes the details easier to learn. So let’s start with what a message goes through from being sent to being received.

[Figure: the flow of a message through Kafka]

The figure above sketches the whole flow of a message through Kafka. It assumes that Kafka has already been deployed and that the corresponding topics and partitions have been created; we will cover those separately later.

  • 1. The producer publishes a message to a specific topic, and the message is assigned to one of that topic’s partitions, either by some algorithm or at random;
  • 2. Depending on actual needs, message processing logic may or may not be applied;
  • 3. If it is, the results are published to an output topic once that logic has run;
  • 4. Consumers subscribe to the relevant topics and consume the messages they need.
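Step 1 says a message lands in a partition "according to a certain algorithm or randomly". The sketch below is only a toy model of that idea, not Kafka's own partitioner: the class name PartitionSketch and its choose method are made up for illustration, and String.hashCode stands in for whatever hash a real client uses. It shows the two usual cases: a message with a key always goes to the same partition, and a message without a key is spread across partitions in turn.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy partition chooser (hypothetical, not Kafka's implementation). */
final class PartitionSketch {
    private final int numPartitions;
    private final AtomicInteger counter = new AtomicInteger();

    PartitionSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Same key -> same partition; null key -> next partition in turn. */
    int choose(String key) {
        if (key == null) {
            // No key: rotate through the partitions to spread the load.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
        // Keyed message: floorMod keeps the result non-negative
        // even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch(3);
        System.out.println(p.choose("user-42") == p.choose("user-42")); // prints "true"
        System.out.println(p.choose(null) + " " + p.choose(null) + " " + p.choose(null)); // prints "0 1 2"
    }
}
```

Keeping one key on one partition is what lets consumers see all messages for that key in order, since ordering is only guaranteed within a partition.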

Overall, the process is clear and simple. Let’s practice Kafka’s basic operations together and finish with a small word-count demo.

Basic operation

The code below was tested on macOS with JDK 1.8. It should also run on Linux. Windows users can download the corresponding version from the official website and try the same exercises.

Download Kafka

macOS users can install Kafka with brew:

brew install kafka

Linux users can download the release archive from the official website and extract it, or simply run the following commands:

cd 
mkdir test-kafka && cd test-kafka
curl -o kafka_2.11-1.0.1.tgz http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/1.0.1/kafka_2.11-1.0.1.tgz
tar -xzf kafka_2.11-1.0.1.tgz
cd kafka_2.11-1.0.1

Start up

Kafka uses ZooKeeper to maintain cluster information, so we need to start ZooKeeper first. We will look more closely at the relationship between Kafka and ZooKeeper in later articles; after all, there is no need to learn everything in one go.

bin/zookeeper-server-start.sh config/zookeeper.properties

Next, we start a Kafka server node:

bin/kafka-server-start.sh config/server.properties

At this point, Kafka is up and running.

Create topic

With everything ready, we come to an extremely important step: creating a topic. Topics are at the core of the whole message flow. A topic also has a number of parameters, such as the replication factor and the number of partitions. For simplicity, we set both to 1 here, which is convenient for testing:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka-test

The parameters have the following meanings:

Parameter               Meaning
--create                Create a new topic
--zookeeper             ZooKeeper connection string (host:port)
--replication-factor    Number of replicas kept for each partition
--partitions            Number of partitions in the topic
--topic                 Topic name

We have now created a topic called kafka-test.
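To get a feel for what the partition count and the replication factor mean together, here is a rough sketch. It is a simplified model and not the assignment algorithm Kafka really uses; the class ReplicaAssignmentSketch and its assign method are invented for this note. Each partition simply gets its replicas on consecutive brokers. The one real constraint it does reflect is that the replication factor cannot exceed the number of brokers, which is why 1 is the natural choice for our single-node setup.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replica placement (hypothetical; Kafka's real assignment is more involved). */
final class ReplicaAssignmentSketch {

    /** Returns, for each partition, the ids of the brokers holding its replicas. */
    static List<List<Integer>> assign(int partitions, int replicationFactor, int brokers) {
        if (partitions < 1 || replicationFactor < 1 || brokers < 1) {
            throw new IllegalArgumentException("all arguments must be at least 1");
        }
        if (replicationFactor > brokers) {
            // Two replicas of one partition never share a broker.
            throw new IllegalArgumentException(
                    "replication factor " + replicationFactor + " exceeds broker count " + brokers);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add((p + r) % brokers); // consecutive brokers, wrapping around
            }
            result.add(replicas);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(1, 1, 1)); // our test topic: prints "[[0]]"
        System.out.println(assign(3, 2, 3)); // prints "[[0, 1], [1, 2], [2, 0]]"
    }
}
```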

Send a message to topic

After we have a topic, we can send a message to it:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-test

Then we type some messages into the console:

this is my first test kafka
so good

At this point, the messages have been published to the kafka-test topic.

Get message from topic

Now that there are messages on the topic, we can consume them:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic kafka-test --from-beginning

We can now see in the console:

this is my first test kafka
so good
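Why does the consumer need --from-beginning to see messages that were sent before it started? A partition behaves like an append-only log in which every record has an offset, and a console consumer with no saved position starts at the end of the log by default. The following is a minimal in-memory sketch of that idea only; LogSketch and its methods are hypothetical names, not part of any Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy append-only log with offsets (hypothetical model of one partition). */
final class LogSketch {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns the offset it was stored at. */
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    /** One past the last record: where a brand-new consumer starts by default. */
    long endOffset() {
        return records.size();
    }

    /** Returns every record at or after the given offset. */
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        LogSketch log = new LogSketch();
        log.append("this is my first test kafka");
        log.append("so good");
        long latest = log.endOffset();   // a consumer "joins" here
        log.append("a later message");

        // Like --from-beginning: start at offset 0 and see everything.
        System.out.println(log.readFrom(0));
        // Without it: only records appended after the consumer joined.
        System.out.println(log.readFrom(latest)); // prints "[a later message]"
    }
}
```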

So far, we have run the simplest possible Kafka demo. I hope you will try it yourself: simple as it is, it will make you more familiar with the whole Kafka flow.

WordCount

Now let’s use the basic operations above to implement a simple WordCount program with the following features:

  • 1. It supports continuous input of phrases, that is, the producer generates messages continuously;
  • 2. The program automatically reads the raw data from the input topic, processes it, and publishes the results to the counting topic;
  • 3. Consumers can read the word counts from the counting topic.

1. Start Kafka

This is the same as the start-up described above; just follow those steps.

2. Create input topic

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic kafka-word-count-input --partitions 1 --replication-factor 1

3. Send messages to the input topic

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-word-count-input

4. Stream processing logic

This part is the core of the whole example. The code comes in a Java 8+ version and a Scala version. I find that stream processing reads more concisely and clearly in functional syntax, so I recommend trying to write the following in a functional style; you may find that you never want to write Java anonymous inner classes again.

Let’s start with a Java 8 version:

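import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-word-count");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        final StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.<String, String>stream("kafka-word-count-input");
        // "\\W+" matches runs of non-word characters; the backslash must be escaped in a Java string.
        Pattern pattern = Pattern.compile("\\W+");
        source
           .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase(Locale.getDefault())))) // one record per word
           .groupBy((key, value) -> value) // re-key each record by the word itself
           .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
           .mapValues(value -> Long.toString(value)) // write counts as strings to match the default value serde
           .toStream()
           .to("kafka-word-count-output");
        final KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Close the application cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}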

Surprising, isn’t it, that Java can be this concise? Wherever it fits, I recommend trying to write Java code in a functional style.
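If the pipeline above feels abstract, it may help to look at the same three steps (split into words, group by word, count) on an ordinary in-memory list, without Kafka at all. This is only an illustration of the logic using plain Java streams; the class WordCountSketch is made up for this note, and Locale.ROOT is used here just to keep the result independent of the machine's locale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Word count over an in-memory list (illustration only, no Kafka involved). */
final class WordCountSketch {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // Like flatMapValues: one element per lowercase word.
                .flatMap(line -> Arrays.stream(NON_WORD.split(line.toLowerCase(Locale.ROOT))))
                // split() yields an empty first token when a line starts with punctuation.
                .filter(word -> !word.isEmpty())
                // Like groupBy + count; TreeMap keeps the words sorted for printing.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("this is my first test kafka", "Kafka is so good");
        System.out.println(count(lines));
        // prints "{first=1, good=1, is=2, kafka=2, my=1, so=1, test=1, this=1}"
    }
}
```

The difference in the streaming version is that the input never ends, so the counts are kept in a state store and updated as each new line arrives.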

Let’s take a look at the Scala version:


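import java.lang.Long
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.{KStream, Materialized, Produced}
import org.apache.kafka.streams.state.KeyValueStore

// Passing Scala lambdas to the Java API relies on SAM conversion (Scala 2.12+).
object WordCount {
  def main(args: Array[String]): Unit = {
    val props: Properties = {
      val p = new Properties()
      p.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-word-count")
      p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      p.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
      p.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
      p
    }

    val builder: StreamsBuilder = new StreamsBuilder()
    val source: KStream[String, String] = builder.stream("kafka-word-count-input")
    // Counts are java.lang.Long, so the output value serde is set explicitly. To print them, the console
    // consumer needs --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
    source
      .flatMapValues(textLine => textLine.toLowerCase.split("\\W+").toIterable.asJava)
      .groupBy((_, word) => word)
      .count(Materialized.as[String, Long, KeyValueStore[Bytes, Array[Byte]]]("counts-store"))
      .toStream()
      .to("kafka-word-count-output", Produced.`with`(Serdes.String(), Serdes.Long()))
    val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
    streams.start()
  }
}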

As you can see, Java 8 code written in a functional style looks very similar to the Scala version.

5. Start processing logic

Many readers do not have SBT installed, so the Java version, built with Maven, is demonstrated here. For the specific steps, please refer to the instructions for the Kafka word count example.

6. Start the consumer process

Finally, we start the consumer process and type some words into the producer, such as:

[Figure: words typed into the producer console]

The consumer is started with the command below, and its console then shows the word counts:

bin/kafka-console-consumer.sh --topic kafka-word-count-output --from-beginning --bootstrap-server localhost:9092  --property print.key=true

[Figure: word counts printed in the consumer console]
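One thing that can be confusing the first time: the output topic does not hold one final line per word. Every time a count changes, a new record with the updated count is written, so the same word can show up several times with growing numbers. Below is a small sketch of that behaviour under simplifying assumptions; CountUpdatesSketch is a hypothetical name, and the tab between word and count mirrors the console consumer's default key separator when print.key=true is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Emits one "word<TAB>count" line per incoming word (toy model of the output topic). */
final class CountUpdatesSketch {

    static List<String> updates(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String word : words) {
            // Increment the running count and emit the new value immediately.
            long updated = counts.merge(word, 1L, Long::sum);
            out.add(word + "\t" + updated);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints three lines: "kafka\t1", "streams\t1", "kafka\t2"
        updates(Arrays.asList("kafka", "streams", "kafka")).forEach(System.out::println);
    }
}
```

In a real run, Kafka Streams may combine several quick updates to the same word into a single record, so you can see fewer lines than this sketch produces; the latest line for each word is still the current count.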

Summary

This article mainly walked through the basic workflow and some basic operations of Kafka. They are simple, but they are an indispensable step in learning anything new: only with a solid foundation can we understand Kafka more deeply and see why it is designed the way it is. I ran into plenty of trouble along the way myself, so I hope you will practice it yourself and get more out of it.