Flink on Zeppelin series: Yarn application mode support

Time: 2021-7-30

Author: Zhang Jianfeng (Jian Feng)

Last year at Flink Forward, when we discussed the future of Flink on Zeppelin, we mentioned planned support for application mode. Today we have good news: the community has implemented this feature, and you are welcome to download the latest version to try it.

Application mode is a deployment mode introduced in Flink 1.11. It reduces the pressure on the client by running the user's main function in the JobManager instead of on the client. This mode suits Flink on Zeppelin very well, because the client of Flink on Zeppelin is the Flink interpreter process, and the Flink interpreter is a long-running main function that continuously receives commands from the front end and performs the corresponding operations (submitting jobs, stopping jobs, and so on). Next, we will look at how Zeppelin implements Yarn application mode and how to use it.

Architecture

Before discussing the architecture of Yarn application mode, let's review how the architecture of Flink on Zeppelin has evolved.

Normal Flink on Yarn mode

In this mode, the client (the Flink interpreter) runs on the Zeppelin server machine, and each client corresponds to one Flink cluster on Yarn. If there are many Flink interpreter processes, they put great pressure on the Zeppelin server machine.

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/wt1g3h

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=6


Yarn interpreter mode

Yarn interpreter mode moves the client (the Flink interpreter) into the Yarn cluster and shifts the resource pressure to the Yarn cluster, which solves some of the problems of the normal Flink on Yarn mode above. However, this mode needs an additional Yarn container for each Flink cluster to run the Flink interpreter, so its resource utilization is not very efficient.

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/gcah8t

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=24


Yarn application mode

Yarn application mode solves the problems of the previous two modes by running the Flink interpreter inside the JobManager. It neither adds resource pressure to the Zeppelin server machine nor wastes Yarn cluster resources.


How to use Yarn application mode

Configuring Yarn application mode is very simple: just set flink.execution.mode to yarn-application. All other configuration is the same as in the other modes, and all of the following features of Flink on Zeppelin can be used as usual in Yarn application mode. The sections below review these features.
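As a rough illustration of the configuration step, the Python sketch below builds the text of a Zeppelin %flink.conf paragraph that switches the interpreter to Yarn application mode. Only the flink.execution.mode property comes from this article. The value spelling yarn-application, the helper name render_flink_conf, and the FLINK_HOME and HADOOP_CONF_DIR paths are assumptions or placeholders that should be adapted to your own environment.

```python
# Sketch: generate a %flink.conf paragraph for Yarn application mode.
# render_flink_conf is a hypothetical helper, not part of Zeppelin.


def render_flink_conf(properties):
    """Render interpreter properties as "key value" lines under %flink.conf."""
    lines = ["%flink.conf", ""]
    for key, value in properties.items():
        lines.append(f"{key} {value}")
    return "\n".join(lines)


properties = {
    "FLINK_HOME": "/opt/flink",             # placeholder path
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",  # placeholder path
    "flink.execution.mode": "yarn-application",
}

paragraph = render_flink_conf(properties)
print(paragraph)
```

The printed text can be pasted into the first paragraph of a note so that it runs before the Flink interpreter process starts. The same property can also be set on the interpreter setting page instead.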

Multilingual support

The following three languages are supported in the same Flink cluster, and they are interoperable (they share the same catalog and the same ExecutionEnvironment):

  • Scala (%flink)
  • PyFlink (%flink.pyflink)
  • SQL (%flink.ssql, %flink.bsql)

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/pg5s82

https://www.yuque.com/jeffzhangjianfeng/gldg8w/ggxz76

https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=4

Hive integration

Hive integration can be enabled with a simple configuration.

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/agf94n

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=10

UDF support

There are four ways to define and use Flink UDFs:

  • Write a Scala UDF directly in Zeppelin
  • Write a PyFlink UDF directly in Zeppelin
  • Create a UDF with SQL
  • Use flink.udf.jars to specify the jar that contains the UDF

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/dthfu2

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=17

https://www.bilibili.com/video/BV1Te411W73b?p=18

https://www.bilibili.com/video/BV1Te411W73b?p=19

Third-party dependencies

Third-party dependencies can be specified in Zeppelin in two ways:

  • flink.execution.packages
  • flink.execution.jars (note that in Yarn application mode an HDFS path must be specified here. The Flink interpreter runs in the JobManager, and the JobManager runs in a Yarn container, so the jars you want to specify are not necessarily present on the NodeManager machine that hosts that container)
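The HDFS requirement for flink.execution.jars can be checked before starting the interpreter. The Python sketch below is a hypothetical pre-flight check, not part of Zeppelin: it assumes the property holds a comma-separated list of jars and that the mode value is spelled yarn-application. The helper name non_hdfs_jars and the jar paths are made up for illustration.

```python
from urllib.parse import urlparse


def non_hdfs_jars(properties):
    """Return the flink.execution.jars entries that are not hdfs:// URIs.

    The check applies only in Yarn application mode; for any other
    mode an empty list is returned.
    """
    if properties.get("flink.execution.mode") != "yarn-application":
        return []
    raw = properties.get("flink.execution.jars", "")
    entries = [jar.strip() for jar in raw.split(",") if jar.strip()]
    return [jar for jar in entries if urlparse(jar).scheme != "hdfs"]


properties = {
    "flink.execution.mode": "yarn-application",
    # Hypothetical jar locations, for illustration only.
    "flink.execution.jars": "hdfs:///libs/my-udf.jar,/home/me/local-connector.jar",
}

# Lists the jars that would not be visible inside the Yarn container.
print(non_hdfs_jars(properties))
```

Any entry the check reports would need to be uploaded to HDFS and referenced by its HDFS path.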

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/rn6g1s

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=15

Checkpoint & Savepoint

Checkpoints and savepoints work as usual.

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/mlnswx

SQL advanced features

Zeppelin has made a series of enhancements to Flink SQL, and these enhancements can be used as usual in Yarn application mode.

Reference documents: https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c