Setting up big data components and resolving version conflicts

Time:2020-9-28

Recently I needed to set up a Hadoop, Flink, and Kafka environment as part of a complete big data solution.
I checked each component's official website for its latest release.
For Hadoop I chose CDH, whose latest version is CDH 6.0.0. The corresponding versions of HBase, Hive, Spark, and the other components are as follows:

Component | Version | Release | CDH version
Supervisord | 3.0 | Not available | Not applicable
Cloudera Manager Agent | 6.0.0 | 530873.el7 | Not applicable
Cloudera Manager Management Daemon | 6.0.0 | 530873.el7 | Not applicable
Flume NG | 1.8.0+cdh6.0.0 | 537114 | CDH 6.0.0
Hadoop | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
HDFS | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
HttpFS | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
hadoop-kms | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
MapReduce 2 | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
YARN | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
HBase | 2.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Lily HBase Indexer | 1.5+cdh6.0.0 | 537114 | CDH 6.0.0
Hive | 2.1.1+cdh6.0.0 | 537114 | CDH 6.0.0
HCatalog | 2.1.1+cdh6.0.0 | 537114 | CDH 6.0.0
Hue | 3.9.0+cdh6.0.0 | 537114 | CDH 6.0.0
Impala | 3.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Java 8 | java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) | Not available | Not applicable
Kafka | 1.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Kite (CDH 5 only) | 1.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Kudu | 1.6.0+cdh6.0.0 | 537114 | CDH 6.0.0
Oozie | 5.0.0-beta1+cdh6.0.0 | 537114 | CDH 6.0.0
Parquet | 1.9.0+cdh6.0.0 | 537114 | CDH 6.0.0
Pig | 0.17.0+cdh6.0.0 | 537114 | CDH 6.0.0
Sentry | 2.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Solr | 7.0.0+cdh6.0.0 | 537114 | CDH 6.0.0
Spark | 2.2.0+cdh6.0.0 | 537114 | CDH 6.0.0
Sqoop | 1.4.7+cdh6.0.0 | 537114 | CDH 6.0.0
ZooKeeper | 3.4.5+cdh6.0.0 | 537114 | CDH 6.0.0

The latest Flink release is 1.6:

  • Flink 1.6.0 – 2018-08-08 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.3 – 2018-08-21 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.2 – 2018-07-31 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.1 – 2018-07-12 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.5.0 – 2018-05-25 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.2 – 2018-03-08 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.1 – 2018-02-15 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.4.0 – 2017-11-29 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.3 – 2018-03-15 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.2 – 2017-08-05 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.1 – 2017-06-23 (Source, Binaries, Docs, Javadocs, ScalaDocs)
  • Flink 1.3.0 – 2017-06-01 (Source, Binaries, Docs, Javadocs, ScalaDocs)

The latest Kafka version is 2.0.0:

  • Released July 30, 2018

Kafka 2.0 has many new features, which made it the more attractive choice, so I decided to install the latest version.

Then the problems started...

I ran the Kafka example shipped with the Flink 1.6 release, and it connected to Kafka successfully. However, Flink could not consume any of the data sent to Kafka, and warnings about unrecognized configuration keys appeared during the run:

13:50:15,604 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'output-topic' was supplied but isn't a known config.
13:50:15,604 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'zookeeper.connect' was supplied but isn't a known config.
13:50:15,605 WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration 'input-topic' was supplied but isn't a known config.
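
The warnings above name keys that the Kafka producer does not define. A minimal sketch of one way to avoid them follows. It assumes the job hands its own parameters (such as input-topic) to the Kafka client in the same Properties object. The class name KafkaProps, the method clientOnly, and the list of job-level keys are hypothetical illustrations and are not part of Flink or Kafka:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class KafkaProps {
    // Hypothetical list of job-level keys (taken from the warnings above)
    // that should not be passed to the Kafka client.
    private static final Set<String> JOB_KEYS = new HashSet<>(
            Arrays.asList("input-topic", "output-topic", "zookeeper.connect"));

    /** Returns a copy of the given properties without the job-level keys. */
    public static Properties clientOnly(Properties all) {
        Properties filtered = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (!JOB_KEYS.contains(name)) {
                filtered.setProperty(name, all.getProperty(name));
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("bootstrap.servers", "localhost:9092");
        all.setProperty("group.id", "demo");
        all.setProperty("input-topic", "in");
        all.setProperty("output-topic", "out");
        all.setProperty("zookeeper.connect", "localhost:2181");
        // Only bootstrap.servers and group.id remain in the filtered copy.
        System.out.println(clientOnly(all).stringPropertyNames());
    }
}
```

Filtering the keys only silences the warnings. It does not by itself explain why no data was consumed.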

Suspecting yet another version mismatch between components, I checked the Kafka client version used by the example.
Flink reads data from Kafka through flink-connector-kafka-0.10_${scala.binary.version},
and the kafka-clients version declared in that connector is as follows:

<properties>
  <kafka.version>0.10.2.1</kafka.version>
</properties>
<dependency>
   <groupId>org.apache.kafka</groupId>
   <artifactId>kafka-clients</artifactId>
   <version>${kafka.version}</version>
</dependency>

The Kafka I installed is 2.0.0, two major versions ahead of that client. No wonder there were problems.
In the end I found that the newest Kafka version supported by the current Flink Kafka connector is 0.11, so I plan to install Kafka, Flink 1.6, and CDH 6.0.0 separately.
Flink 1.6 can also run on YARN on CDH 6.0.0.
This is far from over, though. Wish me luck.
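
The gap noted above (kafka-clients 0.10.2.1 in the connector against an installed Kafka 2.0.0) can be checked mechanically before deploying. The following is a minimal sketch that compares only the leading number of each dotted version string. The class VersionGap and its methods are hypothetical helpers, not part of Flink or Kafka:

```java
public class VersionGap {
    /** Parses the leading numeric component of a dotted version string. */
    public static int major(String version) {
        String head = version.split("\\.")[0];
        return Integer.parseInt(head);
    }

    /** Number of major versions separating client and broker. */
    public static int majorGap(String clientVersion, String brokerVersion) {
        return Math.abs(major(brokerVersion) - major(clientVersion));
    }

    public static void main(String[] args) {
        // Versions taken from the text: the connector's kafka-clients
        // version and the installed Kafka version.
        System.out.println(majorGap("0.10.2.1", "2.0.0")); // prints 2
    }
}
```

A nonzero gap does not prove incompatibility. It is only a cue to check which Kafka versions the connector supports.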

Component | Version | Release | CDH version
Kafka | 0.10.2+kafka2.2.0 | 1.2.2.0.p0.92 | Not applicable

This article is for reference only.