Author: Zhang Jianfeng (Jian Feng)
At last year's Flink Forward, when we discussed the future of Flink on Zeppelin, we mentioned support for application mode. Today we have good news: the community has implemented this feature, and you are welcome to download the latest version to try it.
Application mode is a new deployment mode introduced in Flink 1.11. It reduces the pressure on the client by running the user's main function in the JobManager instead of on the client. This mode suits Flink on Zeppelin well, because the client of Flink on Zeppelin is the Flink interpreter process, and the Flink interpreter is in effect a long-running main function that continuously receives commands from the front end and acts on them (submitting jobs, stopping jobs, and so on). Below we describe how Zeppelin implements Yarn application mode and how to use it.
Before describing the architecture of Yarn application mode, let's look at how the architecture of Flink on Zeppelin has evolved.
Normal Flink on Yarn mode
In this mode, the client (the Flink interpreter) runs on the Zeppelin server machine, and each client corresponds to one Flink cluster on Yarn. If there are many Flink interpreter processes, they put great pressure on the Zeppelin server.
Yarn interpreter mode
Yarn interpreter mode moves the client (the Flink interpreter) into the Yarn cluster and shifts the resource pressure there, which solves some of the problems of the normal Flink on Yarn mode above. However, this mode requires an additional Yarn container for each Flink cluster just to run the Flink interpreter, which is not an efficient use of resources.
Yarn application mode
Yarn application mode solves the problems of both previous modes by running the Flink interpreter inside the JobManager. It neither adds resource pressure on the Zeppelin server machine nor wastes Yarn cluster resources on an extra container.
How to use Yarn application mode
Configuring Yarn application mode is very simple: just set flink.execution.mode to yarn_application. All other configuration is the same as in the other modes, and all of the following features of Flink on Zeppelin can be used as usual in Yarn application mode. Let's review what Flink on Zeppelin offers.
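The mode setting described above can be sketched as follows. This is a minimal illustration, assuming the property is supplied through a Zeppelin %flink.conf paragraph with one "key value" pair per line; it could equally be set on the interpreter settings page. The helper render_flink_conf is hypothetical and not part of Zeppelin, and the mode name is spelled as in this article, so check the documentation of your Zeppelin version for the exact value.

```python
# Hypothetical helper (not part of Zeppelin): renders interpreter properties
# as the text of a %flink.conf paragraph, one "key value" pair per line.
def render_flink_conf(props):
    lines = ["%flink.conf"]
    lines += [f"{key} {value}" for key, value in props.items()]
    return "\n".join(lines)


# The mode name is written as it appears in this article; verify the exact
# spelling against the documentation of the Zeppelin version you run.
conf_paragraph = render_flink_conf({"flink.execution.mode": "yarn_application"})
print(conf_paragraph)
```

The printed text is what would be pasted into a notebook paragraph and run before the Flink interpreter starts.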
The following three languages are supported in the same Flink cluster, and they are interoperable (shared catalog and shared ExecutionEnvironment):
- Scala (%flink)
- PyFlink (%flink.pyflink)
- SQL (%flink.ssql, %flink.bsql)
Hive integration can be enabled with a few configuration settings.
There are four ways to define and use Flink UDFs:
- Write a Scala UDF directly in Zeppelin
- Write a PyFlink UDF directly in Zeppelin
- Create a UDF with SQL
- Use flink.udf.jars to specify a jar containing UDFs
Third-party dependencies
Third-party dependencies can be specified in Zeppelin through:
- flink.execution.jars (note that in Yarn application mode an HDFS path must be specified here, because the Flink interpreter runs in the JobManager, and the JobManager runs in a Yarn container. The jars you want to specify are not necessarily present on the NodeManager machine that hosts that container)
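The HDFS requirement above lends itself to a quick sanity check before starting the interpreter. The following is a minimal sketch, assuming flink.execution.jars holds a comma-separated list of jar locations; the function name non_hdfs_jars and the example paths are hypothetical, and the mode name is spelled as in this article.

```python
from urllib.parse import urlparse


def non_hdfs_jars(mode, jars):
    """Return the entries of a comma-separated jar list that are not hdfs:// URIs.

    Only Yarn application mode is checked; for other modes an empty list
    is returned, since local paths are acceptable there.
    """
    if mode != "yarn_application":
        return []
    entries = [j.strip() for j in jars.split(",") if j.strip()]
    return [j for j in entries if urlparse(j).scheme != "hdfs"]


# Hypothetical jar locations, used only for illustration.
jars = "hdfs:///libs/my-udf.jar,/opt/libs/local-only.jar"
print(non_hdfs_jars("yarn_application", jars))
```

Any entry this check reports would need to be uploaded to HDFS first, because the container running the JobManager cannot be assumed to have it on local disk.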
Checkpoint & Savepoint
Checkpoints and savepoints work as usual.
SQL advanced features
Zeppelin has made a series of enhancements to Flink SQL. These enhancements can be used as usual, such as:
- Both batch SQL and streaming SQL are supported
- Multi statement support
- Comment support
- Job parallelism support
- Multiple insert support
- Job name setting
- Visualization of streaming SQL data
For details, see the reference documentation: https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c