Flink Introduction (3) – environment and deployment

Time: 2020-5-21


Flink is an open-source big data stream processing framework that supports both batch and stream processing, with fault tolerance, high throughput, and low latency. This article briefly covers installing Flink on Windows and Linux and running the sample programs, in both a local debugging environment and a cluster environment. It also introduces how to set up a Flink development project.

First, to run Flink we need to download and unzip a Flink binary package. The download address is: https://flink.apache.org/down…

Flink binaries are offered per Scala version; here we download the latest release, Apache Flink 1.9.0 for Scala 2.12.

After the download completes, on Windows you can run Flink either through its .bat scripts or through Cygwin.

On Linux, the installation options are single-node, cluster, and on Hadoop (YARN).

Running from the Windows .bat scripts

First, open a CMD window, change into the Flink folder, and run start-cluster.bat.

Note: running Flink requires a Java environment. Make sure that the Java environment variables are configured.

$ cd flink
$ cd bin
$ start-cluster.bat
Starting a local cluster with one JobManager process and one TaskManager process.
You can terminate the processes via CTRL-C in the spawned shell windows.
Web interface by default on http://localhost:8081/.


Once it reports a successful start, visit http://localhost:8081/ to see Flink's web management page.

Running through Cygwin

Cygwin is a Unix-like environment that runs on Windows. Download it from the official site: http://cygwin.com/install.html

After installing it, start the Cygwin terminal and run the start-cluster.sh script.

$ cd flink
$ bin/start-cluster.sh
Starting cluster.

Once it reports a successful start, visit http://localhost:8081/ to see Flink's web management page.


Installing Flink on Linux

Single-node installation

On Linux, single-node installation works the same as under Cygwin: download Apache Flink 1.9.0 for Scala 2.12, unzip it, and run start-cluster.sh.

Cluster installation

Cluster installation is divided into the following steps:

1. Copy the extracted Flink directory to each machine.

2. Select one machine as the master node, then modify conf/flink-conf.yaml on all machines (a fuller sample file follows these steps):

jobmanager.rpc.address: <master hostname>

3. Modify conf/slaves to list all worker nodes:

work01
work02

4. Start the cluster on the master node:

bin/start-cluster.sh
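
For reference, a minimal conf/flink-conf.yaml for a small standalone cluster might look like the sketch below; the hostname and the slot/memory values are illustrative assumptions, not values taken from this walkthrough.

jobmanager.rpc.address: master01       # hostname of the master (JobManager) node
jobmanager.rpc.port: 6123              # default JobManager RPC port
jobmanager.heap.size: 1024m            # JobManager heap
taskmanager.heap.size: 1024m           # heap per TaskManager (Flink 1.9 key name)
taskmanager.numberOfTaskSlots: 2       # slots per TaskManager
parallelism.default: 2                 # default parallelism for jobs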

Installing on Hadoop (YARN)

We can also choose to run Flink on a YARN cluster:

1. Download the Flink package for Hadoop.

2. Make sure HADOOP_HOME is set correctly.

3. Start bin/yarn-session.sh.
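
As a sketch (the flag values are illustrative; run bin/yarn-session.sh -h to see the options of your version), starting a detached session and submitting a job to it could look like:

# start a detached YARN session: 1 GB JobManager, 4 GB TaskManagers, 2 slots each
bin/yarn-session.sh -jm 1024m -tm 4096m -s 2 -d

# jobs submitted afterwards run on that session
bin/flink run examples/batch/WordCount.jar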

Run the Flink sample program

Batch example:

Submit Flink's batch example program:

bin/flink run examples/batch/WordCount.jar

This is a batch example program from Flink's examples directory; it counts the occurrences of each word.
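
Its core logic is roughly the following DataSet sketch (a simplified illustration, not the exact shipped source):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // default input; the real example reads --input when given
        DataSet<String> text = env.fromElements("to be or not to be");

        DataSet<Tuple2<String, Integer>> counts = text
                // tokenize each line into (word, 1) pairs
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.toLowerCase().split("\\W+")) {
                            if (!word.isEmpty()) {
                                out.collect(new Tuple2<>(word, 1));
                            }
                        }
                    }
                })
                .groupBy(0)  // group by the word
                .sum(1);     // sum the 1s per word

        counts.print();  // print() triggers execution for the DataSet API
    }
}

Running the packaged example against the default data set produces output like this: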

$ bin/flink run examples/batch/WordCount.jar
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)

Here the program has counted the words of its built-in default data set. You can specify the input and output with --input and --output.
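
For example (the paths are placeholders to substitute with your own):

bin/flink run examples/batch/WordCount.jar \
    --input /path/to/input.txt \
    --output /path/to/wordcount-result.txt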

We can view the job's status in the web UI:


Streaming example:

Start an nc (netcat) server:

nc -l 9000

Submit Flink's streaming example program:

bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

This is a stream processing example program from Flink's examples directory; it receives data from the socket and counts the words.
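
Its core logic is roughly the following DataStream sketch (simplified; the shipped example uses a small POJO instead of tuples, so treat the details as illustrative):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // read lines from the socket opened with `nc -l 9000`
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        DataStream<Tuple2<String, Integer>> counts = text
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split("\\s+")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .keyBy(0)                     // key the stream by the word
                .timeWindow(Time.seconds(5))  // 5-second tumbling windows
                .sum(1);                      // count per word and window

        counts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }
}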

Type some words on the nc side:

$ nc -l 9000
lorem ipsum
ipsum ipsum ipsum
bye

The counts appear in the TaskManager's log output:

$ tail -f log/flink-*-taskexecutor-*.out
lorem : 1
bye : 1
ipsum : 4

Stop Flink:

$ ./bin/stop-cluster.sh

With Flink installed, you can get started quickly by setting up a Flink project and writing your application code.

Build tools

Flink projects can be built using different build tools. To get a quick start, Flink provides project templates for the following build tools:

  • Maven
  • Gradle

These templates help you structure your project and create the initial build files.

Maven

Environment requirements

The only requirements are Maven 3.0.4 (or later) and Java 8.x.

Create project

Create a project using one of the following commands.

Use a Maven archetype:

 $ mvn archetype:generate                               \
      -DarchetypeGroupId=org.apache.flink              \
      -DarchetypeArtifactId=flink-quickstart-java      \
      -DarchetypeVersion=1.9.0

Or run the quickstart script:

 curl https://flink.apache.org/q/quickstart.sh | bash -s 1.9.0

After downloading, view the project directory structure:

tree quickstart/
quickstart/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The sample project is a Maven project containing two classes: StreamingJob and BatchJob, the skeletons of a DataStream and a DataSet program respectively.
The main method is the program's entry point, used both for IDE testing/execution and for deployment.
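
For orientation, the generated StreamingJob is roughly the following skeleton (trimmed; the archetype's file also carries a license header and longer comments):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // TODO: build your pipeline here, e.g. env.addSource(...) ... sink

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }
}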

We suggest importing this project into your IDE for development and testing.
IntelliJ IDEA supports Maven projects out of the box. If you are using Eclipse, the m2e plugin lets you import Maven projects.
Some Eclipse bundles include the plugin by default; others require you to install it manually.

Please note: Flink's default JVM heap may be too small, and you should increase it manually.
In Eclipse, select Run Configurations -> Arguments and enter -Xmx800m in the VM Arguments box.
In IntelliJ IDEA, the recommended way to change the JVM options is via Help | Edit Custom VM Options.

Build project

To build/package your project, run the 'mvn clean package' command in the project directory. Afterwards you will find a JAR file that contains your application, together with the connectors and libraries added to it as dependencies: target/<artifact-id>-<version>.jar

Note: if you use a class other than StreamingJob as the application's main class/entry point, we recommend changing the mainClass setting in pom.xml accordingly. That way, Flink can run the application from the JAR file without additionally specifying the main class.
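
In the quickstart POM, that setting lives in the maven-shade-plugin configuration; a trimmed sketch (replace the class with your own entry point):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <!-- the entry class written into the JAR manifest -->
                <mainClass>org.myorg.quickstart.StreamingJob</mainClass>
            </transformer>
        </transformers>
    </configuration>
</plugin>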

Gradle

Environment requirements

The only requirements are Gradle 3.x (or later) and Java 8.x.

Create project

Create a project using one of the following commands.

Gradle example:

build.gradle

buildscript {
    repositories {
        jcenter() // this applies only to the Gradle 'Shadow' plugin
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4'
    }
}

plugins {
    id 'java'
    id 'application'
    // shadow plugin to produce fat JARs
    id 'com.github.johnrengelman.shadow' version '2.0.4'
}


// artifact properties
group = 'org.myorg.quickstart'
version = '0.1-SNAPSHOT'
mainClassName = 'org.myorg.quickstart.StreamingJob'
description = """Flink Quickstart Job"""

ext {
    javaVersion = '1.8'
    flinkVersion = '1.9.0'
    scalaBinaryVersion = '2.11'
    slf4jVersion = '1.7.7'
    log4jVersion = '1.2.17'
}


sourceCompatibility = javaVersion
targetCompatibility = javaVersion
tasks.withType(JavaCompile) {
    options.encoding = 'UTF-8'
}

applicationDefaultJvmArgs = ["-Dlog4j.configuration=log4j.properties"]

task wrapper(type: Wrapper) {
    gradleVersion = '3.1'
}

// declare where to find the dependencies of your project
repositories {
    mavenCentral()
    maven { url "https://repository.apache.org/content/repositories/snapshots/" }
}

// NOTE: we cannot use "compileOnly" or "shadow" configurations since then we
// could not run code in the IDE or with "gradle run". We also cannot exclude
// transitive dependencies from the shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159 ).
// -> Explicitly define the libraries we want to be included in the "flinkShadowJar" configuration!
configurations {
    flinkShadowJar // dependencies which go into the shadowJar

    // always exclude these dependencies (also from transitive dependencies), since Flink provides them
    flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading'
    flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305'
    flinkShadowJar.exclude group: 'org.slf4j'
    flinkShadowJar.exclude group: 'log4j'
}

// declare the dependencies for your production and test code
dependencies {
    // --------------------------------------------------------------
    // Compile-time dependencies that should NOT be part of the
    // shadow jar; they are provided in Flink's lib directory.
    // --------------------------------------------------------------
    compile "org.apache.flink:flink-java:${flinkVersion}"
    compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}"

    // --------------------------------------------------------------
    // Dependencies that should be part of the shadow jar, e.g. connectors.
    // They MUST be in the flinkShadowJar configuration!
    // --------------------------------------------------------------
    //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}"

    compile "log4j:log4j:${log4jVersion}"
    compile "org.slf4j:slf4j-log4j12:${slf4jVersion}"

    // Add test dependencies here.
    // testCompile "junit:junit:4.12"
}

// make compileOnly dependencies available for tests:
sourceSets {
    main.compileClasspath += configurations.flinkShadowJar
    main.runtimeClasspath += configurations.flinkShadowJar

    test.compileClasspath += configurations.flinkShadowJar
    test.runtimeClasspath += configurations.flinkShadowJar

    javadoc.classpath += configurations.flinkShadowJar
}

run.classpath = sourceSets.main.runtimeClasspath

jar {
    manifest {
        attributes 'Built-By': System.getProperty('user.name'),
                'Build-Jdk': System.getProperty('java.version')
    }
}

shadowJar {
    configurations = [project.configurations.flinkShadowJar]
}

settings.gradle

rootProject.name = 'quickstart'

Or run the quickstart script:

    bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- 1.9.0 2.11

To view the directory structure:

tree quickstart/
quickstart/
├── README
├── build.gradle
├── settings.gradle
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The sample project is a Gradle project containing two classes: StreamingJob and BatchJob, the skeletons of a DataStream and a DataSet program respectively. The main method is the program's entry point, used both for IDE testing/execution and for deployment.

We suggest importing this project into your IDE for development and testing. IntelliJ IDEA supports Gradle projects after the Gradle plugin has been installed. Eclipse supports Gradle projects through the Eclipse Buildship plugin (since the shadow plugin requires Gradle >= 3.0, make sure to specify that Gradle version in the last step of the import wizard). You can also use Gradle's IDE integration to generate project files from Gradle.

Build project

To build/package the project, run the 'gradle clean shadowJar' command in the project directory. Afterwards you will find a JAR file that contains your application, together with the connectors and libraries added to it as dependencies: build/libs/<project-name>-<version>-all.jar

Note: if you use a class other than StreamingJob as the application's main class/entry point, we recommend changing the mainClassName setting in build.gradle accordingly. That way, Flink can run the application from the JAR file without additionally specifying the main class.
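
For example, with the group and version from the build.gradle above, building and submitting could look like this (the exact JAR name depends on your project name and version):

gradle clean shadowJar
bin/flink run build/libs/quickstart-0.1-SNAPSHOT-all.jar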

Flink series:

Flink Introduction (1) – Introduction to Apache Flink
Flink Introduction (2) – Introduction to Flink Architecture

For more posts on real-time computing, Flink, Kafka, and related technologies, follow the Real-Time Streaming Computing blog.
