Kafka was originally developed at LinkedIn in Scala as a multi-partition, multi-replica distributed messaging system coordinated by ZooKeeper, and was later donated to the Apache Software Foundation. Today Kafka is positioned as a distributed streaming platform and is widely used for its high throughput, persistence, horizontal scalability, and support for stream processing. More and more open source distributed processing systems, such as Cloudera, Storm, Spark, and Flink, support integration with Kafka.
Kafka owes its growing popularity to the three roles it can play:
- Message system: like a traditional message system (also known as message middleware), Kafka provides system decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, scalability, and recoverability. In addition, Kafka offers message ordering guarantees and the ability to rewind and re-consume messages, which most message systems find difficult to implement.
- Storage system: Kafka persists messages to disk, which effectively reduces the risk of data loss compared with memory-based storage systems. Thanks to its message persistence and multi-replica mechanism, Kafka can serve as a long-term data storage system; we only need to set the data retention policy to "permanent" or enable log compaction for the topic.
- Streaming platform: Kafka not only provides a reliable data source for popular stream processing frameworks, but also ships with a complete stream processing library that supports operations such as windowing, joins, transformations, and aggregations.
A typical Kafka architecture includes several producers, several brokers, several consumers, and a ZooKeeper cluster, as shown in the figure below. Kafka uses ZooKeeper to manage cluster metadata and elect the controller. Producers send messages to brokers, brokers store the received messages on disk, and consumers subscribe to and consume messages from brokers.
Three terms appear throughout the Kafka architecture:
- Producer: the party that sends messages. The producer is responsible for creating messages and delivering them to Kafka.
- Consumer: the party that receives messages. Consumers connect to Kafka, fetch messages, and then execute the corresponding business logic.
- Broker: a service proxy node. For Kafka, a broker can be simply regarded as an independent Kafka service node or Kafka service instance. In most cases a broker can also be regarded as a Kafka server, provided that only one Kafka instance is deployed on that server. One or more brokers form a Kafka cluster. By convention, we use the lowercase term broker to denote a service proxy node.
There are two particularly important concepts in Kafka: topic and partition. Messages in Kafka are categorized by topic. Producers send messages to specific topics (every message sent to the Kafka cluster must specify a topic), while consumers subscribe to topics and consume the messages in them.
Typical production logic involves the following steps:
- Configure the producer client parameters and create the corresponding producer instance.
- Build the message to be sent.
- Send a message.
- Close the producer instance.
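The four steps above can be sketched as follows. This is a minimal illustration, not the listing from the text: the broker address `localhost:9092` and the topic name `topic-demo` are placeholders, and it assumes the `kafka-clients` library is on the classpath.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerQuickStart {
    public static void main(String[] args) {
        // 1. Configure the producer client parameters...
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // ...and create the corresponding producer instance.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            // 2. Build the message to be sent.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("topic-demo", "Hello, Kafka!");
            // 3. Send the message (fire-and-forget; send() returns a Future).
            producer.send(record);
        } finally {
            // 4. Close the producer instance, flushing buffered messages.
            producer.close();
        }
    }
}
```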
The message object ProducerRecord is not a simple message; it contains multiple attributes. The business-related message body to be sent is only its value attribute. For example, "Hello, Kafka!" is just one attribute of a ProducerRecord object. The ProducerRecord class is defined as follows (only the member variables are shown):
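An abbreviated view of the class from `org.apache.kafka.clients.producer`, reduced to its member variables (constructors and accessor methods omitted):

```java
public class ProducerRecord<K, V> {
    private final String topic;       // topic the record is sent to
    private final Integer partition;  // target partition number, may be null
    private final Headers headers;    // message headers
    private final K key;              // key, also used to compute the partition
    private final V value;            // the business message body
    private final Long timestamp;     // message timestamp
    // constructors and accessor methods omitted
}
```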
The topic and partition fields represent the topic and the partition number the message is sent to. The headers field holds the message headers; this field was introduced in Kafka 0.11.x and is mostly used to carry application-specific metadata, so it can be omitted when not needed. The key field specifies the message key; it is not only extra metadata for the message but is also used to compute the partition number, so that the message can be routed to a specific partition. As mentioned earlier, messages are categorized by topic, and the key lets messages be classified a second time: messages with the same key are assigned to the same partition.
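For example, two records sharing a key are hashed to the same partition by the default partitioner, which preserves their relative order for consumers. The topic name and key below are illustrative:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedRecordDemo {
    public static void main(String[] args) {
        // Both records carry the key "user-42", so the default partitioner
        // routes them to the same partition of "topic-demo".
        ProducerRecord<String, String> login =
                new ProducerRecord<>("topic-demo", "user-42", "login");
        ProducerRecord<String, String> logout =
                new ProducerRecord<>("topic-demo", "user-42", "logout");
        System.out.println(login.key().equals(logout.key()));
    }
}
```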
Necessary parameter settings
Before creating an actual producer instance, you need to configure the corresponding parameters, such as the address of the Kafka cluster to connect to; refer to the initConfig() method in the client code above. In the Kafka producer client, three parameters are mandatory.
- bootstrap.servers: specifies the list of broker addresses the producer client uses to connect to the Kafka cluster, in the format host1:port1,host2:port2. One or more addresses can be set, separated by commas. The default value of this parameter is the empty string "". Note that not all broker addresses need to be listed here, because the producer will discover the other brokers from the ones given. However, it is recommended to set at least two broker addresses, so that the producer can still connect to the Kafka cluster if one of them goes down.
- key.serializer and value.serializer: messages received by the broker must be in the form of byte arrays. In code listing 3-1, the generic parameters of KafkaProducer&lt;String, String&gt; and ProducerRecord&lt;String, String&gt; correspond to the types of the message key and value; writing the client this way keeps the code readable. However, before a message is sent to the broker, its key and value must be serialized into byte arrays. The key.serializer and value.serializer parameters specify the serializers that perform these serialization operations, and neither parameter has a default value.
In the client code above, the initConfig() method also sets the client.id parameter, which specifies the client ID of the KafkaProducer. Its default value is the empty string "". If the client does not set it, KafkaProducer automatically generates a non-empty string such as "producer-1" or "producer-2", that is, the string "producer-" concatenated with a number.
KafkaProducer has far more parameters than the example initConfig() method shows. Developers can adjust the defaults of these parameters according to the actual needs of their business applications to achieve flexible deployment. In general, ordinary developers cannot remember all the parameter names and retain only a general impression of them.
In practice, strings such as "key.serializer", "max.request.size", and "interceptor.classes" are easily mistyped. To guard against this, we can use the org.apache.kafka.clients.producer.ProducerConfig class directly; every parameter has a corresponding constant in ProducerConfig. Taking the initConfig() method in code listing 3-1 as an example, the result of introducing ProducerConfig is as follows:
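A sketch of the revised initConfig() method; the broker address and client ID below are placeholders rather than values from the original listing:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class InitConfigWithProducerConfig {
    public static Properties initConfig() {
        Properties props = new Properties();
        // ProducerConfig constants replace the error-prone raw strings.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.CLIENT_ID_CONFIG,
                "producer.client.id.demo"); // placeholder client ID
        return props;
    }
}
```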
Notice in the code above that the fully qualified class names used for the key.serializer and value.serializer parameters are relatively long and easy to get wrong. Here we make a further improvement using a small Java technique. The relevant code is as follows:
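The technique is to derive the class name from a class literal with StringSerializer.class.getName() instead of spelling it out, so the compiler checks the class for us. A sketch (broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class InitConfigWithClassLiterals {
    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092"); // placeholder broker address
        // The class literal is resolved at compile time, so a typo in the
        // serializer class name becomes a compile error, not a runtime one.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        return props;
    }
}
```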
In this way the code is much simpler and the chance of human error is further reduced. With the parameters configured, we can use them to create a producer instance, as in the following example:
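A minimal sketch of constructing the producer from the configured Properties; the configuration values here are placeholders standing in for the result of initConfig():

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class CreateProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Constructing the producer validates the configuration locally;
        // no broker connection is required at this point.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
```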
KafkaProducer is thread safe: a single KafkaProducer instance can be shared among multiple threads, or KafkaProducer instances can be pooled for other threads to use.
KafkaProducer has several constructors. If, for example, key.serializer and value.serializer are not set when creating the KafkaProducer instance, the corresponding serializers need to be passed to the constructor instead. An example follows:
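A sketch using the KafkaProducer constructor overload that accepts serializer instances directly (broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConstructorDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // key.serializer and value.serializer are deliberately NOT set here;
        // the serializer instances are supplied to the constructor instead.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props,
                new StringSerializer(), new StringSerializer());
        producer.close();
    }
}
```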