Setting up big data components: version conflicts and solutions

Time: 2021-12-7

Recently I needed to set up Hadoop, Flink and Kafka environments to build a complete big data solution, so I checked the current status of each component on its official website.
For Hadoop I chose CDH. The latest release at the moment is CDH 6.0.0. The versions of HBase, Hive, Spark and the other components it ships are as follows:

Component                           Version                Release         CDH version
Supervisord                         3.0                    Not available   Not applicable
Cloudera Manager Agent              6.0.0                  530873.el7      Not applicable
Cloudera Manager Management Daemon  6.0.0                  530873.el7      Not applicable
Flume NG                            1.8.0+cdh6.0.0         537114          CDH 6.0.0
Hadoop                              3.0.0+cdh6.0.0         537114          CDH 6.0.0
HDFS                                3.0.0+cdh6.0.0         537114          CDH 6.0.0
HttpFS                              3.0.0+cdh6.0.0         537114          CDH 6.0.0
hadoop-kms                          3.0.0+cdh6.0.0         537114          CDH 6.0.0
MapReduce 2                         3.0.0+cdh6.0.0         537114          CDH 6.0.0
YARN                                3.0.0+cdh6.0.0         537114          CDH 6.0.0
HBase                               2.0.0+cdh6.0.0         537114          CDH 6.0.0
Lily HBase Indexer                  1.5+cdh6.0.0           537114          CDH 6.0.0
Hive                                2.1.1+cdh6.0.0         537114          CDH 6.0.0
HCatalog                            2.1.1+cdh6.0.0         537114          CDH 6.0.0
Hue                                 3.9.0+cdh6.0.0         537114          CDH 6.0.0
Impala                              3.0.0+cdh6.0.0         537114          CDH 6.0.0
Java 8                              1.8.0_181              Not available   Not applicable
Kafka                               1.0.0+cdh6.0.0         537114          CDH 6.0.0
Kite (CDH 5 only)                   1.0.0+cdh6.0.0         537114          CDH 6.0.0
Kudu                                1.6.0+cdh6.0.0         537114          CDH 6.0.0
Oozie                               5.0.0-beta1+cdh6.0.0   537114          CDH 6.0.0
Parquet                             1.9.0+cdh6.0.0         537114          CDH 6.0.0
Pig                                 0.17.0+cdh6.0.0        537114          CDH 6.0.0
Sentry                              2.0.0+cdh6.0.0         537114          CDH 6.0.0
Solr                                7.0.0+cdh6.0.0         537114          CDH 6.0.0
Spark                               2.2.0+cdh6.0.0         537114          CDH 6.0.0
Sqoop                               1.4.7+cdh6.0.0         537114          CDH 6.0.0
ZooKeeper                           3.4.5+cdh6.0.0         537114          CDH 6.0.0

Java 8 details: java version "1.8.0_181", Java(TM) SE Runtime Environment (build 1.8.0_181-b13), Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
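Each entry in the version column packs two versions together: the upstream Apache release and the vendor release it ships in. For example, 3.0.0+cdh6.0.0 is Apache Hadoop 3.0.0 as packaged for CDH 6.0.0. Cloudera's Kafka package further down uses the same pattern (0.10.2+kafka2.2.0).

A minimal Python sketch that splits such strings apart might look like this. It assumes the "upstream+letters digits" shape seen in the table. The names `parse_version` and `VendorVersion` are hypothetical and chosen only for this illustration.

```python
import re
from typing import NamedTuple, Optional


class VendorVersion(NamedTuple):
    upstream: str                  # Apache release the build is based on, e.g. "3.0.0"
    vendor: Optional[str]          # vendor tag after the '+', e.g. "cdh"; None if absent
    vendor_version: Optional[str]  # vendor release, e.g. "6.0.0"; None if absent


# Assumed shape: "<upstream>" optionally followed by "+<letters><digits and dots>",
# e.g. "3.0.0+cdh6.0.0", "5.0.0-beta1+cdh6.0.0", "0.10.2+kafka2.2.0" or plain "3.0".
_VERSION_RE = re.compile(
    r"^(?P<upstream>[^+]+)(?:\+(?P<vendor>[a-z]+)(?P<vendor_version>[0-9.]+))?$"
)


def parse_version(text: str) -> VendorVersion:
    """Split a vendor-style version string into its upstream and vendor parts."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return VendorVersion(
        match.group("upstream"),
        match.group("vendor"),
        match.group("vendor_version"),
    )
```

For instance, `parse_version("2.0.0+cdh6.0.0").upstream` gives "2.0.0". That is the Apache HBase version to look up in other projects' compatibility notes, rather than the CDH number.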

The latest Flink release is 1.6:

  • Flink 1.6.0 – 2018-08-08 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.3 – 2018-08-21 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.2 – 2018-07-31 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.1 – 2018-07-12 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.0 – 2018-05-25 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.2 – 2018-03-08 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.1 – 2018-02-15 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.0 – 2017-11-29 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.3 – 2018-03-15 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.2 – 2017-08-05 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.1 – 2017-06-23 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.0 – 2017-06-01 (Source, Binaries, Docs, Javadocs, ScalaDocs)

The latest Kafka release is 2.0.0:

  • Released July 30, 2018

Kafka 2.0 has many attractive new features, so I installed the latest version.

Then the problems began...

I ran the Kafka example that ships with the Flink 1.6 release, and it connected to Kafka successfully. However, the data sent to Kafka was never consumed by Flink, and warnings about unrecognized configuration appeared while the job was running:

13:50:15,604 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'output-topic' was supplied but isn't a known config.
13:50:15,604 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'zookeeper.connect' was supplied but isn't a known config.
13:50:15,605 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'input-topic' was supplied but isn't a known config.
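These WARN lines are what the Kafka client logs for every key in the supplied properties that is not one of its own settings. The Flink example appears to pass all of its command-line parameters, such as input-topic and output-topic, straight to the client. On their own, these warnings are therefore most likely harmless.

A small Python sketch of that check is below. It assumes a hand-picked subset of real producer setting names; the complete list lives in the client's ProducerConfig. The helpers `unknown_keys` and `client_only` and the example parameter values are hypothetical.

```python
from typing import Dict, List, Mapping

# A few real Kafka producer setting names, hand-picked for this illustration only.
# The actual client checks against its complete ProducerConfig definition.
KNOWN_PRODUCER_KEYS = frozenset({
    "bootstrap.servers",
    "acks",
    "retries",
    "key.serializer",
    "value.serializer",
})


def unknown_keys(props: Mapping[str, str], known: frozenset = KNOWN_PRODUCER_KEYS) -> List[str]:
    """Keys that would be reported as "supplied but isn't a known config", sorted."""
    return sorted(key for key in props if key not in known)


def client_only(props: Mapping[str, str], known: frozenset = KNOWN_PRODUCER_KEYS) -> Dict[str, str]:
    """Copy of props restricted to recognized keys, i.e. what the client actually uses."""
    return {key: value for key, value in props.items() if key in known}


# Hypothetical parameters in the spirit of the Flink example's command line.
example_params = {
    "input-topic": "test-input",
    "output-topic": "test-output",
    "bootstrap.servers": "localhost:9092",
    "zookeeper.connect": "localhost:2181",
}
```

Passing only `client_only(example_params)` to the client would silence these warnings. It would not change which Kafka client version the connector uses.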

Suspecting a version mismatch between the components, I quickly checked the Kafka client version used by the example, and my suspicion was confirmed.
The example reads data from Kafka through the flink-connector-kafka-0.10_${scala.binary.version} package, and the version of kafka-clients that this connector depends on is:

<properties>
  <kafka.version>0.10.2.1</kafka.version>
</properties>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.version}</version>
</dependency>
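To confirm which kafka-clients version a connector actually pulls in, the ${kafka.version} placeholder has to be resolved against the <properties> section. Below is a Python sketch using only the standard library. It assumes a single self-contained POM and does not follow parent POMs or Maven profiles, which real builds may rely on; `mvn dependency:tree` reports the fully resolved version. The POM text is a stand-in reduced to the snippet above, and `dependency_version` and `major_minor` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Stand-in for the connector's pom.xml, reduced to the two parts quoted above.
POM = """<project>
  <properties>
    <kafka.version>0.10.2.1</kafka.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>${kafka.version}</version>
    </dependency>
  </dependencies>
</project>"""


def _local(tag: str) -> str:
    """Drop a '{namespace}' prefix so POMs that declare xmlns also work."""
    return tag.rsplit("}", 1)[-1]


def dependency_version(pom_xml: str, artifact_id: str) -> Optional[str]:
    """Version of artifact_id, with ${...} placeholders resolved from <properties>."""
    root = ET.fromstring(pom_xml)
    props = {}
    for elem in root.iter():
        if _local(elem.tag) == "properties":
            for child in elem:
                props[_local(child.tag)] = (child.text or "").strip()
    for elem in root.iter():
        if _local(elem.tag) != "dependency":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("artifactId") == artifact_id:
            version = fields.get("version", "")
            # Replace each ${name} with its property value; leave unknown names as-is.
            return re.sub(r"\$\{([^}]+)\}", lambda m: props.get(m.group(1), m.group(0)), version)
    return None


def major_minor(version: str) -> Tuple[int, int]:
    """'0.10.2.1' -> (0, 10); '2.0.0' -> (2, 0)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```

Comparing `major_minor(dependency_version(POM, "kafka-clients"))`, which is (0, 10), with `major_minor("2.0.0")`, which is (2, 0), makes the gap between the connector's client and the installed broker explicit.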

But the Kafka I had installed was 2.0.0, two major versions newer than this client, so it is no wonder there were problems.
In the end I found that the newest Kafka version supported by the current Flink Kafka connector is 0.11, so I decided to install Kafka 0.11, Flink 1.6 and CDH 6.0.0 separately.
Flink 1.6 can also run on YARN on CDH 6.0.0.
But I suspect the journey through the pitfalls is far from over. Wish me luck.
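This choice comes down to matching the broker's release line to one of the version-specific connectors. The Python sketch below assumes that Flink 1.6 offers dedicated connectors for Kafka 0.8, 0.9, 0.10 and 0.11. That is my reading of the Flink 1.6 Kafka connector documentation, so check it against your exact release. `connector_for` is a hypothetical helper, and the Scala suffix simply defaults to 2.11.

```python
from typing import Optional, Tuple

# Kafka release lines with a dedicated connector in Flink 1.6 (assumed from the
# Flink 1.6 Kafka connector docs; verify for the exact Flink release in use).
FLINK_16_CONNECTORS = {
    (0, 8): "flink-connector-kafka-0.8",
    (0, 9): "flink-connector-kafka-0.9",
    (0, 10): "flink-connector-kafka-0.10",
    (0, 11): "flink-connector-kafka-0.11",
}


def release_line(version: str) -> Tuple[int, int]:
    """'0.11.0.3' -> (0, 11); '2.0.0' -> (2, 0)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def connector_for(broker_version: str, scala_binary_version: str = "2.11") -> Optional[str]:
    """Connector artifactId for the broker's release line, or None if none targets it."""
    base = FLINK_16_CONNECTORS.get(release_line(broker_version))
    return f"{base}_{scala_binary_version}" if base else None
```

With this table, a 2.0.0 broker yields None, while any 0.11.x broker maps to the 0.11 connector. This matches the conclusion that Kafka 0.11 is the newest line the Flink 1.6 connectors cover.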

Component  Version            Release        CDH version
Kafka      0.10.2+kafka2.2.0  1.2.2.0.p0.92  Not applicable

This article is for reference only.