[big data practice] Kafka producer programming (5) — detailed explanation of producer config (Part 2)

Time: 2020-10-26

preface

In the last article, [big data practice] Kafka producer programming (4) — detailed explanation of producer config (Part 1), we introduced some of the Kafka producer's configuration items. In this article, we continue with the remaining ones.

ProducerConfig class

buffer.memory

Importance: high
Type: long
Default value: 33554432 bytes, i.e. 32 MB

The size of the buffer the producer uses to cache message records waiting to be sent to the server. When records arrive in the buffer faster than they can be transmitted to the server, they wait there. If the buffer fills up, the producer blocks for at most the number of milliseconds specified by max.block.ms, after which an exception is thrown.

Note: the buffer size corresponds roughly to the total memory the producer will use, but not exactly, because not all of the buffer is used to store records waiting to be sent. For example, some of it is used to compress data (when compression is enabled), and some is used to maintain the list of in-flight requests.
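A minimal sketch of how these two related settings could be wired together. The keys are the real configuration names; the values shown are simply the defaults, and in practice the Properties object would be passed to a KafkaProducer constructor:

```java
import java.util.Properties;

// Sketch: buffer.memory together with max.block.ms, which controls how
// long send() may block when the buffer is full before throwing.
public class BufferConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Total bytes of memory the producer may use to buffer records (default 32 MB).
        props.setProperty("buffer.memory", "33554432");
        // Milliseconds send() may block on a full buffer before an exception (default 60000).
        props.setProperty("max.block.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```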

retry.backoff.ms

Importance: low
Type: long
Default value: 100 ms

The number of milliseconds to wait before retrying when a producer request to a given partition fails. This avoids repeatedly resending requests in a tight loop under some failure scenarios.

compression.type

Importance: high
Type: String
Default value: "none"

The compression type the producer uses for data. Valid values include:

  • none: no compression
  • gzip
  • snappy
  • lz4

The producer compresses entire batches of data at a time, not individual records. Therefore, the more records a batch contains, the higher the compression ratio and the better the compression effect.
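Since compression works on whole batches, it pays to let batches fill up. A sketch of a compression-friendly configuration (keys are the real config names; the batch.size and linger.ms values are illustrative choices, not defaults):

```java
import java.util.Properties;

// Sketch: enabling batch compression on the producer.
public class CompressionConfig {
    public static Properties build() {
        Properties props = new Properties();
        // One of "none", "gzip", "snappy", "lz4" (per the list above).
        props.setProperty("compression.type", "lz4");
        // Larger, fuller batches compress better; these two settings let batches fill up.
        props.setProperty("batch.size", "65536"); // bytes per batch (illustrative value)
        props.setProperty("linger.ms", "20");     // wait up to 20 ms to fill a batch (illustrative value)
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```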

metrics.sample.window.ms

Importance: low
Type: long
Default value: 30000 ms, i.e. 30 seconds

The window of time over which metrics samples are computed. These metrics are used for Kafka monitoring.

metrics.num.samples

Importance: low
Type: int
Default value: 2

The number of samples maintained to compute metrics.

metrics.recording.level

Importance: low
Type: String
Default: INFO

The highest recording level for metrics.

metric.reporters

Importance: low
Type: List
Default value: Collections.emptyList()

A list of classes to be used as metrics reporters. Each class must implement the org.apache.kafka.common.metrics.MetricsReporter interface, which allows it to be notified when a new metric is created. The list usually includes the JmxReporter class, which registers JMX statistics.

JMX(Java Management Extensions)

Kafka uses JMX to expose the broker's internal data so that sensitive metrics can be monitored.

JMX related information:

Play JMX from scratch (1) — Introduction and standard MBean
Play JMX from scratch (2) — condition
Play JMX from scratch (3) — model MBean
Play JMX from scratch (4) — Apache commons modeler & dynamic MBean

max.in.flight.requests.per.connection

Importance: low
Type: int
Default value: 5

The maximum number of unacknowledged requests the producer client will send on a single connection before blocking. That is, once the number of unacknowledged requests on a connection exceeds this setting, the producer client blocks. Note: if this value is greater than 1 and the retries configuration item is enabled, a failed send carries the risk of message reordering.

retries

Importance: low
Type: int
Default value: 0, which means no retrying

When this value is greater than 0, the client will resend any message record whose send failed with a potentially transient error. Note that this retry is no different from the client resending the record itself after receiving the error. When the retries configuration item is greater than 0 and max.in.flight.requests.per.connection is greater than 1, retried records may be reordered, i.e. message records may arrive out of order. The reason: if two batches are sent to the same partition and the first fails while the second succeeds, the retried first batch will arrive after the second.
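One way to get retries without the reordering risk just described is to cap in-flight requests at 1, at some cost in throughput. A sketch (real config names; the retries value is illustrative):

```java
import java.util.Properties;

// Sketch: retries enabled while preserving ordering.
public class OrderedRetryConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("retries", "3");            // retry failed sends (illustrative value)
        props.setProperty("retry.backoff.ms", "100"); // wait between attempts (the default)
        // With only one in-flight request per connection, a retried batch
        // cannot overtake a later batch, so ordering is preserved.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```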

key.serializer

Importance: high
Type: Class
Default: none

The serializer class for the key of a message record.

value.serializer

Importance: high
Type: Class
Default: none

The serializer class for the value of a message record.
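These two settings have no default and must always be supplied. A sketch for a String-keyed, String-valued producer, using the StringSerializer that ships with the Kafka client library:

```java
import java.util.Properties;

// Sketch: the two mandatory serializer settings.
public class SerializerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```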

connections.max.idle.ms

Importance: medium
Type: long
Default value: 540000 ms, i.e. 9 minutes

If the idle time of a connection exceeds the configured value, the connection will be closed.

partitioner.class

Importance: medium
Type: Class
Default: none

The class that computes which partition a message record is assigned to. The partitioner was explained in detail in the previous article [big data practice] Kafka producer programming (3) — interceptor & partitioner.

interceptor.classes

Importance: low
Type: List
Default value: Collections.emptyList()

The interceptor chain was explained in detail in the previous article [big data practice] Kafka producer programming (3) — interceptor & partitioner.

enable.idempotence

Importance: low
Type: Boolean
Default value: false

Whether to enable idempotence. If set to true, the producer ensures that exactly one copy of each message is written to the stream. If set to false, a producer retry after a failed send to the broker may write duplicates to the stream.

Note: using idempotence, i.e. setting enable.idempotence to true, requires that max.in.flight.requests.per.connection be less than or equal to 5, that retries be greater than 0, and that acks be set to all. If the user does not explicitly set these values, suitable values are chosen automatically. If incompatible values are set, a ConfigException is thrown.
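A sketch of a configuration that satisfies all three constraints, plus a toy re-statement of the check itself (the real client performs this validation internally and throws ConfigException; the isValid helper here is only an illustration):

```java
import java.util.Properties;

// Sketch: idempotent producer configuration and the constraints it must meet.
public class IdempotenceConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("max.in.flight.requests.per.connection", "5"); // must be <= 5
        props.setProperty("retries", "3");                               // must be > 0
        props.setProperty("acks", "all");                                // must be "all"
        return props;
    }

    // Toy illustration of the constraints; the real client throws
    // ConfigException when they are violated.
    public static boolean isValid(Properties p) {
        return Integer.parseInt(p.getProperty("max.in.flight.requests.per.connection")) <= 5
                && Integer.parseInt(p.getProperty("retries")) > 0
                && "all".equals(p.getProperty("acks"));
    }

    public static void main(String[] args) {
        System.out.println(isValid(build()));
    }
}
```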

transaction.timeout.ms

Importance: low
Type: int
Default value: 60000 ms, i.e. 60 seconds

The maximum time the transaction coordinator will wait for a transaction status update from the producer before proactively aborting an ongoing transaction. If this value is greater than the transaction.max.timeout.ms setting on the Kafka broker, the producer's request will fail with an InvalidTransactionTimeout error.

transactional.id

Importance: low
Type: String
Default value: null, which means transactions cannot be used

The transactional ID used for transactional delivery. This configuration item provides reliability semantics that span multiple producer sessions, because it guarantees that all transactions with the same transactional ID have completed before a new transaction starts.

Note: if transactional.id is configured, enable.idempotence must be true. In a production environment, transactions require a Kafka cluster of at least three brokers (the recommended setting). In a development environment, you can lower the broker configuration item transaction.state.log.replication.factor to make development easier.
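A sketch of a transactional producer configuration. The transactional ID shown is an illustrative name, and the producer itself is not instantiated here since that requires a running cluster; the comment records the call sequence a real KafkaProducer would go through:

```java
import java.util.Properties;

// Sketch: configuration for a transactional producer. On a real
// KafkaProducer the call sequence would be:
//   initTransactions() -> beginTransaction() -> send(...) ->
//   commitTransaction()  (or abortTransaction() on error)
public class TransactionConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("transactional.id", "my-tx-producer-1"); // illustrative ID
        props.setProperty("enable.idempotence", "true");           // required, per the note above
        props.setProperty("transaction.timeout.ms", "60000");      // the default, see above
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```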

Summary

This concludes our walk through the producer configuration items. In general, the items marked with high importance deserve the most attention. Along the way we also met some new concepts, such as metrics and transactions, which may be explained in future articles.