Oozie 5.2.1 + Hadoop 3 compilation

Time: 2022-5-24

Compiling Oozie 5.2.1 against Hadoop 3

System requirements

  • Java JDK 1.8+
  • Maven 3.0.1+
  • Hadoop 3.0.0+
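
Before building, it can help to confirm that the tools above are installed and recent enough. The following is a minimal sketch, not part of the original procedure: the helper name version_ge is hypothetical, and it assumes a sort that supports -V (GNU coreutils).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version banner of each required tool, or a warning if it is absent.
for tool in java mvn hadoop; do
    if command -v "$tool" >/dev/null 2>&1; then
        case "$tool" in
            java)   java -version 2>&1 | head -n 1 ;;    # java prints its version on stderr
            mvn)    mvn -version 2>/dev/null | head -n 1 ;;
            hadoop) hadoop version 2>/dev/null | head -n 1 ;;
        esac
    else
        echo "WARNING: $tool not found on PATH"
    fi
done
```

A version number read from the banners can then be compared by hand, for example `version_ge 3.1.1 3.0.0 && echo "Hadoop is new enough"`.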

Compilation summary

git clone https://github.com/apache/oozie.git


# If you are building for Hadoop 3, you must activate the hadoop-3 profile. The following properties should be specified when building a release:
-DgenerateDocs: forces generation of the Oozie documentation
-DskipTests: skips the tests
-Dvc.revision=: specifies the source control revision number of the distribution
-Dvc.url=: specifies the source control URL of the distribution

Modify the POM files to match your cluster

Modify oozie/examples/pom.xml

<profile>
    <id>hadoop-3</id>
    <activation>
        <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </dependency>
    </dependencies>
</profile>

Modify oozie/sharelib/pig/pom.xml

<profile>
    <id>hadoop-3</id>
    <activation>
        <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </dependency>
    </dependencies>
</profile>

Modify oozie/webapp/pom.xml

<profile>
    <id>hadoop-3</id>
    <activation>
        <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </dependency>
    </dependencies>
</profile>
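
To check that the hadoop-3 profile is really active in a module after these edits, the Maven Help Plugin goal help:active-profiles can be used. This matters because Maven switches off an activeByDefault profile when another profile in the same POM is activated explicitly. The sketch below is an assumption-laden helper, not part of the original steps: has_active_profile is a hypothetical name, and it assumes the plugin lists each active profile on a line of the form " - <id> (source: ...)".

```shell
#!/usr/bin/env bash
# has_active_profile NAME: reads the text printed by `mvn help:active-profiles`
# on stdin and succeeds if NAME appears as a listed profile id.
# Assumption: each active profile is printed as " - <id> (source: ...)".
has_active_profile() {
    grep -Eq "^[[:space:]]*-[[:space:]]+$1([[:space:]]|\$)"
}

# Example, run from the Oozie source root (repeat for sharelib/pig and webapp):
#   mvn -f examples/pom.xml help:active-profiles | has_active_profile hadoop-3 \
#       && echo "hadoop-3 is active in examples"
```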

Modify oozie/pom.xml

....
<hadoop.version>3.1.1</hadoop.version>
<hadoop.majorversion>3</hadoop.majorversion>
<!--
    Changed because org.apache.hadoop.hbase.security.token.TokenUtil could not be found.
    The default version 1.2.3 is missing classes, so it is changed to 1.7.0.
-->
<hbase.version>1.7.0</hbase.version>
.....

<!-- Sharelib component versions: adjust these to match the versions actually deployed on the cluster -->
<!-- Higher Hive versions do not contain org.apache.hadoop.hive.thrift.DelegationTokenIdentifier, so version 2.3.6 is used -->

<!-- Hive 3.1 can also be used; see the adjustments described below -->
 <hive.version>2.3.6</hive.version>  
 <hive.jline.version>2.12</hive.jline.version>
 <pig.version>0.16.0</pig.version>
 <pig.classifier>h2</pig.classifier>
 <hive.classifier>core</hive.classifier>
 <sqoop.version>1.4.7</sqoop.version>
 <spark.version>2.4.6</spark.version>
 <spark.streaming.kafka.version>2.4.6</spark.streaming.kafka.version>
 <spark.bagel.version>2.4.6</spark.bagel.version>
 <spark.guava.version>14.0.1</spark.guava.version>
 <spark.scala.binary.version>2.10</spark.scala.binary.version>
 <sqoop.classifier>hadoop260</sqoop.classifier>
 <tez.version>0.10.0</tez.version>
 <joda.time.version>2.9.9</joda.time.version>
 <avro.version>1.8.2</avro.version>
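
After editing the properties, a quick way to double-check the values that ended up in the root POM is to read them back. The following is a rough sketch: pom_property is a hypothetical helper that does a plain text match with sed rather than real XML parsing, so it assumes each property sits on a single line as in the listing above.

```shell
#!/usr/bin/env bash
# pom_property FILE NAME: print the text of the first <NAME>...</NAME> element
# found on a single line of FILE. Plain text matching, not an XML parser.
pom_property() {
    sed -n "s:.*<$2>\(.*\)</$2>.*:\1:p" "$1" | head -n 1
}

# When run from the Oozie source root, show the versions changed above.
if [ -f pom.xml ]; then
    for name in hadoop.version hbase.version hive.version spark.version; do
        echo "$name = $(pom_property pom.xml "$name")"
    done
fi
```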

Compilation adjustments for sharelib compatibility with Hive 3.1.0

// Replace the DelegationTokenIdentifier import with TokenIdentifier (around line 50)
// Before:
import org.apache.hadoop.hive.thrift.DelegationTokenIdentifier;
// After:
import org.apache.hadoop.security.token.TokenIdentifier;


// Change the token type to TokenIdentifier (around line 290)
// Before:
Token<DelegationTokenIdentifier> token = new Token<DelegationTokenIdentifier>();
// After:
Token<TokenIdentifier> token = new Token<TokenIdentifier>();
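
The two substitutions above could also be applied with sed instead of by hand. This is only a sketch: use_token_identifier is a hypothetical helper, the source file to patch is not named here so its path must be supplied by the reader, and any other reference to DelegationTokenIdentifier in that file would still need a manual edit.

```shell
#!/usr/bin/env bash
# use_token_identifier FILE: rewrite FILE in place, keeping a copy as FILE.bak.
# Swaps the Hive DelegationTokenIdentifier import for Hadoop's TokenIdentifier
# and changes Token<DelegationTokenIdentifier> to Token<TokenIdentifier>.
use_token_identifier() {
    sed -i.bak \
        -e 's/^import org\.apache\.hadoop\.hive\.thrift\.DelegationTokenIdentifier;/import org.apache.hadoop.security.token.TokenIdentifier;/' \
        -e 's/Token<DelegationTokenIdentifier>/Token<TokenIdentifier>/g' \
        "$1"
}

# Example (the path is a placeholder for the Java file being adjusted):
#   use_token_identifier path/to/TheCredentialsClass.java
```

Comparing the result against the .bak copy with diff before compiling shows exactly which lines changed.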

Compile

bin/mkdistro.sh -DskipTests -Dvc.revision=<revision> -Dvc.url=https://github.com/apache/oozie.git -DgenerateDocs

Installation package notes

During installation, several JARs turned out to be missing from the built package, which prevented jobs from running:

  1. Submitting a job fails because hadoop-mapreduce-client-common-3.1.1.jar is missing; copy it from libext to the lib directory.
  2. Submitting a Hive job fails because hive-webhcat-java-client-2.3.7.jar is missing; copy it into the lib and libext directories.
  3. Submitting a Hive job fails because hive-hcatalog-core-2.3.7.jar is missing; copy it into the lib and libext directories.
  4. When submitting MapReduce and Sqoop jobs, YARN schedules them successfully, but the Oozie UI shows them as suspended; the hadoop-mapreduce-client-jobclient-3.1.1.2 JAR is missing.
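
The copy step from item 1 can be scripted along these lines. This is a sketch under assumptions: copy_jars is a hypothetical helper, and OOZIE_HOME in the example stands for wherever the Oozie distribution was unpacked. For the Hive JARs in items 2 and 3, the source directory would be wherever those JARs are available on your system.

```shell
#!/usr/bin/env bash
# copy_jars SRC DEST PATTERN...: copy every file in SRC that matches one of the
# glob PATTERNs into DEST. Returns non-zero if DEST is missing or a pattern
# matched nothing.
copy_jars() {
    local src dest pattern jar missing
    src=$1
    dest=$2
    missing=0
    shift 2
    if [ ! -d "$dest" ]; then
        echo "destination directory not found: $dest" >&2
        return 1
    fi
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so the glob expands.
        for jar in "$src"/$pattern; do
            if [ -e "$jar" ]; then
                cp -v "$jar" "$dest"/
            else
                echo "no JAR matching $pattern in $src" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

# Example for item 1 (OOZIE_HOME is an assumed variable, not set by Oozie itself):
#   copy_jars "$OOZIE_HOME/libext" "$OOZIE_HOME/lib" 'hadoop-mapreduce-client-common-*.jar'
```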