Comparison and selection between Kafka and Kinesis


In modern large-scale data environments, sending and processing messages has become very important.

The heavyweight in the field of message sending and processing is Kafka.


The direct relationship between Kafka and Kinesis

Before comparing Kafka with Kinesis, we need to know something about each of them.

What is Kafka

Apache Kafka is an open source, distributed and scalable publish-subscribe messaging system. The software is maintained by the Apache Software Foundation. The code is written in Scala and Java and was originally developed by LinkedIn. It was open-sourced in 2011 and later became a top-level Apache project.

The project aims to provide a unified, low-latency platform that can process data feeds in real time. It is increasingly valuable for enterprise infrastructures that require integration between systems. Systems that want to integrate can publish or subscribe to specific Kafka topics according to their needs.

Kafka's design is influenced by the transaction log: the idea behind Apache Kafka is to be a scalable message queue with a structure similar to a transaction log.

The platform is designed for real-time data streams.

Kafka allows you to organize data under specific topics.
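To make the "topics on top of a log" idea concrete, here is a minimal in-memory sketch. It is not Kafka's API: the `TopicLog` class and its `publish` and `read` methods are hypothetical names invented for illustration, and a real broker adds partitions, replication and persistence on top of this.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a broker: one append-only list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every message in the topic from the given offset onward."""
        return list(self._topics[topic][offset:])


broker = TopicLog()
broker.publish("orders", "order-1 created")
broker.publish("orders", "order-1 paid")
broker.publish("clicks", "home page viewed")

# Two independent subscribers read the same topic from their own offsets;
# reading does not remove anything from the log.
billing_view = broker.read("orders", offset=0)
late_joiner_view = broker.read("orders", offset=1)
```

Because messages are only appended and never removed on read, each subscriber can keep its own position in the log, which is the property the transaction-log analogy is pointing at.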

In short, Kafka processes messages fast, very fast.

What is Kinesis

In short, Kinesis is the managed streaming service of the AWS cloud platform, filling the role that Kafka plays elsewhere.

Compared with deploying Kafka yourself, you do not need to maintain or pay for the hardware platform, and you can deploy very quickly.

Amazon Kinesis makes it easy to collect, process and analyze real-time streaming data so that you can gain timely insights and respond quickly to new information. Amazon Kinesis provides a variety of core functions that can cost-effectively process streaming data of any size, while leaving you free to choose the tools that best meet the needs of your application.

With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs and website clickstreams, as well as IoT telemetry data, for machine learning, analytics and other applications.

How to choose

For readers and companies who find it hard to choose, perhaps the following comparison can help you make a decision.

Main differences

Kafka is an open source distributed messaging solution, and Kinesis is a managed platform provided by Amazon.

With Kafka, you are responsible for installing and managing the cluster, as well as ensuring high availability, durability and fault recovery. If you are using Kinesis, you don't have to worry about hosting the software and resources.

You can easily learn Kafka by installing it on your local system, which Kinesis does not allow.

Pricing in Kinesis depends on the number of shards you use. If you intend to retain your messages for a long time, you must pay an additional fee.

For Kafka, the cost mainly depends on the number of brokers you use. Kafka also needs a DevOps team for maintenance, which can be costly.

However, with Kafka, you can keep messages for longer without paying extra as long as you don’t run out of storage space.
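How much storage "longer" actually needs is simple arithmetic. The sketch below is a rough estimate only: the function name `retained_bytes` and the example figures are made up for illustration, and the default replication factor of 3 is an assumption (a common Kafka setting, not something stated in this article). It also ignores compression and index overhead.

```python
def retained_bytes(messages_per_day, avg_message_bytes, retention_days,
                   replication_factor=3):
    """Rough disk space needed to keep messages for retention_days.

    replication_factor=3 is an assumed value; every replica stores a
    full copy of the data, so it multiplies the total.
    """
    return (messages_per_day * avg_message_bytes
            * retention_days * replication_factor)


# Hypothetical workload: 50,000 messages a day, about 1 KB each, kept 30 days.
total = retained_bytes(50_000, 1_000, 30)
total_gb = total / 1_000_000_000
```

For this made-up workload the total comes to 4.5 GB across the cluster, so long retention is cheap at small volumes and only becomes a planning concern as the message rate grows.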

Although both Kafka and Kinesis have producers, Kafka producers write messages to topics, while Kinesis producers write data to Kinesis Data Streams (KDS).

Kinesis also imposes certain restrictions on message size and message consumption rate.

The maximum message size in Kinesis is 1 MB, while the Kafka message size limit is configurable and can be set larger.

In Kinesis, each shard supports up to 5 read transactions per second and up to 2 MB of reads per second, and it accepts at most 1,000 records per second for writes.

Kafka does not impose such built-in limits, so the rate is determined by the underlying hardware and your configuration, and you can write data as fast as the cluster allows.
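These per-shard limits translate directly into a shard count. The following is a back-of-the-envelope sketch rather than an AWS tool: `shards_needed` is a hypothetical helper, the 1 MB per second write limit comes from AWS's published per-shard quotas rather than from this article, megabytes are approximated as 1,000,000 bytes, and the 2 MB per second of read throughput is assumed to be shared by all consumers of a shard.

```python
import math

# Per-shard limits. The write byte limit is an assumption taken from
# AWS's published quotas; MB is approximated as 1,000,000 bytes.
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1_000_000
READ_BYTES_PER_SEC = 2_000_000


def shards_needed(records_per_sec, avg_record_bytes, consumers=1):
    """Estimate the minimum shard count for a steady workload."""
    bytes_per_sec = records_per_sec * avg_record_bytes
    by_record_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write_bytes = math.ceil(bytes_per_sec / WRITE_BYTES_PER_SEC)
    # Assumes every consumer reads the full stream from shared throughput.
    by_read_bytes = math.ceil(bytes_per_sec * consumers / READ_BYTES_PER_SEC)
    return max(1, by_record_count, by_write_bytes, by_read_bytes)


small_records = shards_needed(2_500, 200)               # many tiny records
large_records = shards_needed(500, 10_000)              # fewer, bigger records
fan_out = shards_needed(500, 10_000, consumers=3)       # three readers
```

With these example inputs the record-count limit dominates for small records (3 shards), the write byte limit dominates for large records (5 shards), and adding consumers makes the read limit the bottleneck (8 shards).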

In terms of security, Kafka provides many client-side security functions, such as data encryption, client authentication and client authorization, while Kinesis provides server-side encryption through AWS KMS keys to encrypt the data stored in the data stream.

With Kinesis, encrypting data yourself on the client side is harder to implement than turning on server-side encryption.

Server-side encryption can also serve as a second layer of security on top of client-side encryption.


After reading all of the above, are you still a little confused?

In fact, discussing a solution without considering the data volume is meaningless.

Put simply, Kinesis is quick to get started with. Even without much technical expertise, you can use it with a few clicks in the AWS console.

Deploying Kafka has costs and a learning curve. First, Kafka has traditionally relied on ZooKeeper to run (newer Kafka releases can run without it in KRaft mode). A highly available ZooKeeper ensemble requires at least 3 servers. If it needs to be expanded, the next step is 5 servers, because ZooKeeper needs a majority of servers to stay up, which makes odd-sized ensembles the sensible choice.

If your ZooKeeper ensemble is deployed with 4 servers, its fault tolerance is the same as that of 3 servers: either one can survive the loss of only one server.
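The reason 4 servers buy nothing over 3 follows from majority-quorum arithmetic. This is a small sketch of that arithmetic only; the function names are made up for illustration and nothing here talks to a real ZooKeeper ensemble.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still available."""
    return servers - quorum_size(servers)


# ensemble size -> (quorum size, failures tolerated)
summary = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}

for n, (quorum, failures) in summary.items():
    print(f"{n} servers: quorum {quorum}, tolerates {failures} failure(s)")
```

An ensemble of 3 and an ensemble of 4 both tolerate a single failure, while 5 servers tolerate two, so the fourth server adds cost without adding availability.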

All of this adds operating and learning costs.

If you expect only tens of thousands of messages a day for the foreseeable future, and you don't have many engineers, you can use either one. Kinesis may be more convenient and faster to adopt.

If you receive 10,000 messages within minutes, Kafka is worth considering, because as the message volume grows Kinesis is not cheap, and its message retention time is limited.

Kafka can be scaled by expanding the underlying hardware, with the maintenance cost that this entails.…