Big news! A preview of Apache Flink 1.11 features

Date: 2020-07-05

Compiled by: Gao Yu and Cheng Hequn
Reviewed by: Wang Zhijiang

Flink 1.11 will be released soon! To satisfy everyone's curiosity and expectations, we have invited core Flink developers to interpret and share the features of the 1.11 release. Flink 1.11 improves on 1.10 in many areas and is committed to further improving Flink's usability and performance.

This article introduces the new features, improvements, important changes, and future development plans of version 1.11. For more information, please refer to the corresponding FLIP or JIRA pages, and stay tuned for our follow-up live broadcasts.

Cluster deployment and resource management

Cluster deployment

1. [FLIP-85] Flink supports application mode

Currently, Flink creates the JobGraph and submits the job through a separate client. In practice, this causes problems such as the download of the job jar consuming a large amount of the client machine's bandwidth, and the need to start a separate client process that occupies unmanaged resources. To solve these problems, Flink 1.11 provides a new application mode, which moves JobGraph generation and job submission to the master node.

Users can use application mode via bin/flink run-application. Currently, application mode supports both YARN and Kubernetes (K8s) deployments. In YARN application mode, the client ships all the dependencies required to run the job to the Flink master via YARN local resources, and the job is then submitted on the master side. In K8s application mode, users build an image that contains the user jar and its dependencies; TaskManagers are created automatically according to the job, and the whole cluster is destroyed when the job finishes.

2. [FLINK-13938] [FLINK-17632] Flink on YARN supports caching remote Flink lib jars and creating jobs from remote jars

Before 1.11, every time a job was submitted to YARN, Flink had to upload the jars under Flink's lib directory, which consumed extra storage space and bandwidth. Flink 1.11 allows users to provide multiple remote lib directories; the files in these directories are cached on the YARN nodes, which avoids unnecessary jar uploads and downloads and makes submission and startup faster:

./bin/flink run -m yarn-cluster -d \
-yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
examples/streaming/WindowJoin.jar

In addition, 1.11 also allows users to create jobs directly from jar packages on a remote file system, further reducing the cost of downloading jars:

./bin/flink run-application -p 10 -t yarn-application \
-yD yarn.provided.lib.dirs="hdfs://myhdfs/flink/lib" \
hdfs://myhdfs/jars/WindowJoin.jar

3. [FLINK-14460] Flink on Kubernetes enhancements

Compared with Flink 1.10, Flink 1.11 provides more complete support for native Kubernetes deployments.

In addition, Flink adds support for several Kubernetes features such as node selectors, labels, annotations, and tolerations. To make integration with Hadoop more convenient, it can also automatically mount the Hadoop configuration based on environment variables.

4. [FLIP-111] Docker image unification

Previously, the Flink project provided several different Dockerfiles for creating Flink Docker images. They have now been unified into the apache/flink-docker [1] project.

5. [FLINK-15911] Support configuring the locally bound network interface and the externally visible address and port separately

In some scenarios (such as Docker or NAT port mapping), the local network address and port seen by the JM/TM process may differ from the address and port that other processes use to reach it from the outside. Previously, Flink did not allow users to set different local and remote addresses for the TM/JM, which caused problems when Flink ran in the NAT networks used by Docker and made it impossible to restrict the exposure of the listening ports.

In 1.11, separate options are introduced for the local and the externally visible listening address and port. The options

  • jobmanager.rpc.address
  • jobmanager.rpc.port
  • taskmanager.host
  • taskmanager.rpc.port
  • taskmanager.data.port

are used to configure the externally visible address and port, while

  • jobmanager.bind-host
  • jobmanager.rpc.bind-port
  • taskmanager.bind-host
  • taskmanager.rpc.bind-port
  • taskmanager.data.bind-port

are used to configure the locally bound listening address and port.

Resource management

1. [FLINK-16614] Unified JM memory resource configuration

A big change in Flink 1.10 was the redefinition of the TM memory model and configuration rules [2]. Flink 1.11 further adjusts the JM memory model and configuration rules so that the JM's memory is configured in a way consistent with the TM's.


For the specific memory configuration, please refer to the corresponding user documentation [3].

2. [FLIP-108] Add scheduling support for extended resources (such as GPUs)

With the development of machine learning and deep learning, more and more Flink jobs embed machine learning or deep learning models and therefore need GPU resources. Before 1.11, Flink could not manage extended resources such as GPUs. To solve this problem, Flink 1.11 provides a unified management framework for extended resources and, based on this framework, built-in support for GPU resources.

For the configuration of the extended resource management framework and of GPU resource management, please refer to the Public Interfaces section of the corresponding FLIP page: https://cwiki.apache.org/conf… (the corresponding user documentation is being prepared by the community and can be consulted once it is available).

3. [FLINK-16605] Allow users to limit the maximum number of slots of a batch job

To prevent Flink batch jobs from occupying too many resources, Flink 1.11 introduces a new configuration option, slotmanager.number-of-slots.max, which limits the maximum number of slots of the entire Flink cluster. This parameter is only recommended for batch Table / SQL jobs that use the Blink planner.

Flink 1.11 Web UI enhancements

1. [FLIP-103] Improve the JM / TM log display in the Web UI

Previously, users could only read the .log and .out logs through the Web UI, while other files, such as GC logs, may also exist in the log directory. The new interface allows users to access all files in the log directory, and adds log reloading, downloading, and full-screen display.

2. [FLIP-99] Allow more historical failover exceptions to be displayed

Previously, for a single job, the Web UI could only display the 20 most recent historical failover exceptions. When a job fails frequently, the initial exception (which is more likely to be the root cause) is quickly pushed out, which makes troubleshooting harder. The new Web UI supports pagination so that more historical exceptions can be displayed.


3. [FLINK-14816] Allow users to take thread dumps directly on the page

Thread dumps are very helpful for locating problems in some jobs. Before 1.11, users had to log in to the machine where the TM runs in order to take a thread dump. In 1.11, the Web UI integrates this functionality: a Thread Dump tab is added that allows users to obtain a TM thread dump directly through the Web UI.


Source & Sink

1. [FLIP-27] New Source API

FLIP-27 is one of the larger features in 1.11. Flink's traditional source interface has several problems: different sources have to be implemented for streaming and batch jobs, there is no unified logic for data partition discovery, source implementers have to handle locking logic themselves, and the lack of a common framework forces source developers to handle multithreading manually. These problems make implementing sources in Flink harder than necessary.

FLIP-27 introduces a new Source interface. The new interface provides unified data partition discovery and management; users only need to focus on reading partition information and reading data, and no longer have to deal with complex thread-synchronization problems. This greatly reduces the burden of implementing a source and lays the foundation for providing more built-in source functionality in the future.

2. [FLINK-11395] StreamingFileSink adds support for the Avro and ORC formats

For the commonly used StreamingFileSink, 1.11 adds support for the Avro and ORC file formats.

Avro:

stream.addSink(StreamingFileSink.forBulkFormat(
   Path.fromLocalFile(folder),
   AvroWriters.forSpecificRecord(Address.class)).build());

ORC:

OrcBulkWriterFactory<Record> factory = new OrcBulkWriterFactory<>(
        new RecordVectorizer(schema), writerProps, new Configuration());
stream.addSink(StreamingFileSink
      .forBulkFormat(new Path(outDir.toURI()), factory)
      .build());

State management

1. [FLINK-5763] Change the savepoint file structure so that savepoints are self-contained and relocatable

Flink 1.11 replaces the absolute file paths inside a savepoint with relative paths, so that users can move a savepoint to a different location without manually modifying the paths in its metadata. (Note: this feature is not supported when entropy injection is enabled for the S3 file system.)

2. [FLINK-8871] Add a checkpoint-failure callback and notify the TM

Before Flink 1.11, only a notification of checkpoint success was provided. In 1.11, a new mechanism is added to notify the TM when a checkpoint fails. On the one hand, this makes it possible to cancel the checkpoint that is in progress; on the other hand, users can also receive the notification through the new notifyCheckpointAborted method added to the CheckpointListener interface, as sketched below.
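The sketch below illustrates how a user function might react to both notifications. This is a minimal, illustrative example: the class and the commented behavior are made up, and the CheckpointListener interface is assumed to live in org.apache.flink.runtime.state as in previous releases.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.runtime.state.CheckpointListener;

// Illustrative function that reacts to both checkpoint notifications.
public class CheckpointAwareMapper extends RichMapFunction<String, String> implements CheckpointListener {

    @Override
    public String map(String value) {
        return value; // pass records through; any side effects tied to checkpoints would go here
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // existing callback: e.g. commit side effects prepared for this checkpoint
    }

    @Override
    public void notifyCheckpointAborted(long checkpointId) throws Exception {
        // new in 1.11: e.g. discard side effects prepared for the aborted checkpoint
    }
}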

3. [FLINK-12692] Heap keyed state backend supports spilling data to disk

(This feature was not actually merged into the Flink 1.11 code base, but users can download it from https://flink-packages.org/pa… .)

The heap state backend can achieve better performance because it maintains state directly as Java objects. However, the memory occupied by the heap state backend is not controllable, which can cause serious GC problems.

To solve this problem, the spillable keyed state backend supports spilling data to disk, allowing the state backend to limit the amount of memory it uses. For more information on the spillable keyed state backend, please refer to https://flink-packages.org/pa… .

4. [FLINK-15507] Enable local recovery by default for the RocksDB state backend

With local recovery enabled by default, recovery from failures becomes faster.

5. Change the default value of state.backend.fs.memory-threshold to 20K

(This part of the work is still in progress, but it should be included in 1.11.)

state.backend.fs.memory-threshold determines the size below which the FS state backend keeps state data in memory (as part of the checkpoint metadata) instead of writing it out to files. The previous default of 1K caused a large number of small files in many cases and hurt state access performance, so in 1.11 this value has been increased to 20K. Note that this change may increase JM memory usage, especially when operator parallelism is high or union state is used. [4]

Table & SQL

1. [FLIP-65] Optimize the type inference mechanism of Table API UDFs

Compared with the previous type inference mechanism, the new mechanism provides more type information about the input parameters, allowing users to implement more flexible processing logic. Currently this is supported for scalar functions (UDF) and table functions (UDTF); aggregate functions (UDAF) are not supported yet.
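As an illustration, the scalar function below uses the annotation-based type inference introduced by FLIP-65 to accept an argument of any type. This is a minimal sketch: the function name is made up, and it could then be registered (for example via createTemporarySystemFunction) and called from the Table API or SQL.

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.functions.ScalarFunction;

// A scalar UDF that accepts an argument of any type thanks to the new type inference.
public class HashFunction extends ScalarFunction {
    public int eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
        return o.hashCode();
    }
}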

2. [FLIP-84] Optimize the TableEnvironment interface

Flink 1.11 enhances the TableEnvironment in the following respects (see the sketch after the list):

  1. In the past, sqlUpdate() behaved differently for DDL and DML: the former was executed immediately, while the latter had to wait for env.execute(). In 1.11, both are executed when env.executeSql() is called.
  2. Support is provided for queries that return results, such as SHOW TABLES, EXPLAIN, and so on.
  3. Support is provided for buffering the execution of multiple SQL statements.
  4. The new collect() method allows users to obtain the results of a query.
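A minimal sketch of the unified execution path (the table name, schema, and datagen connector settings are only for illustration):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

TableEnvironment tEnv = TableEnvironment.create(
    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

// DDL and DML are now both executed through executeSql
tEnv.executeSql(
    "CREATE TABLE Orders (order_id BIGINT, price DOUBLE) WITH ('connector' = 'datagen')");

// Queries return a TableResult whose rows can be fetched with collect()
TableResult result = tEnv.executeSql("SELECT order_id, price FROM Orders");
try (CloseableIterator<Row> it = result.collect()) {
    while (it.hasNext()) {
        System.out.println(it.next()); // the datagen source is unbounded, so this runs until cancelled
    }
}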

3. [FLIP-93] Support JDBC- and Postgres-based catalogs

Before 1.11, when users read from or wrote to a relational database, or read a changelog, with Flink, they had to manually copy the database's table schemas into Flink. This process is tedious and error-prone, which greatly increased the cost of using Flink. 1.11 provides JDBC- and Postgres-based catalog management, enabling Flink to read the table schemas automatically and thus reducing the manual work.
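The sketch below registers a catalog backed by a PostgreSQL instance. It is only a sketch based on FLIP-93: the catalog name, database, credentials, and URL are placeholders, tEnv refers to a TableEnvironment as created in the earlier sketch, and the exact class and constructor should be checked against the 1.11 documentation.

import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;

// Register a catalog backed by PostgreSQL so that table schemas are read automatically.
JdbcCatalog catalog = new JdbcCatalog(
    "mypg",                              // catalog name
    "postgres",                          // default database
    "username", "password",              // credentials (placeholders)
    "jdbc:postgresql://localhost:5432"); // base JDBC URL
tEnv.registerCatalog("mypg", catalog);
tEnv.useCatalog("mypg");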

4. [FLIP-105] Add support for changelog sources

Importing the changes of external systems into Flink through change data capture (CDC) mechanisms (such as the MySQL binlog or Kafka compacted topics), and writing Flink's update / retract streams to external systems, are features users have long asked for. Flink 1.11 adds support for reading and writing CDC data. Currently, Flink supports two CDC formats, Debezium and Canal.
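For example, a table backed by Debezium change events read from Kafka could be declared roughly as follows (the topic, server address, and schema are placeholders, tEnv refers to a TableEnvironment as created earlier, and the option keys follow the new FLIP-122 style):

// Declare a changelog source: Debezium-encoded changes of a "products" table read from Kafka.
tEnv.executeSql(
    "CREATE TABLE products (" +
    "  id INT," +
    "  name STRING," +
    "  weight DECIMAL(10, 2)" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'products_binlog'," +
    "  'properties.bootstrap.servers' = 'localhost:9092'," +
    "  'properties.group.id' = 'testGroup'," +
    "  'format' = 'debezium-json'" +
    ")");
// Queries on this table see the INSERT / UPDATE / DELETE changes as a changelog stream.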

5. [FLIP-95] New TableSource and TableSink interfaces

FLIP-95 simplifies the structure of the current table source / sink interfaces, provides a foundation for the CDC support, avoids the dependency on the DataStream API, and solves the problem that only the Blink planner could support efficient source / sink implementations.

For more specific interface changes, please refer to:

https://cwiki.apache.org/conf…

6. [FLIP-122] Revise the connector configuration options

FLIP-122 reorganizes the "WITH" configuration options of the Table / SQL connectors. For historical reasons these options contained some redundancy and inconsistencies; for example, all options started with connector., and different options followed different naming patterns. The revised options resolve this redundancy and inconsistency. (It should be emphasized that the existing options can still be used.)
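As a rough illustration, a file system table that previously needed keys such as 'connector.type', 'connector.path', and 'format.type' can now be declared with the shorter keys (the table name, path, and schema are placeholders, and tEnv refers to a TableEnvironment as created earlier):

// New-style options for a file system table
// (previously roughly: 'connector.type' = 'filesystem', 'connector.path' = ..., 'format.type' = 'csv').
tEnv.executeSql(
    "CREATE TABLE transactions (" +
    "  account STRING," +
    "  amount DOUBLE" +
    ") WITH (" +
    "  'connector' = 'filesystem'," +
    "  'path' = 'file:///tmp/transactions'," +
    "  'format' = 'csv'" +
    ")");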

For a list of new configuration items, please refer to:

https://cwiki.apache.org/conf…

7. [FLIP-113] Flink SQL supports dynamic table options

Dynamic table options allow users to modify a table's options dynamically when querying the table, which saves them from having to re-declare the table's DDL whenever an option changes. As shown below, dynamic options allow users to override option values from the DDL with /*+ OPTIONS('k1'='v1') */.

SELECT *
FROM
  EMP /*+ OPTIONS('k1'='v1', 'k2'='v2') */
  JOIN
  DEPT /*+ OPTIONS('a.b.c'='v3', 'd.e.f'='v4') */
ON
  EMP.deptno = DEPT.deptno

8. [FLIP-115] Enhance Flink SQL's support for the file system connector and Hive

  1. The file system connector supports five formats (CSV, ORC, Parquet, JSON, and Avro), fully supported in both batch and streaming mode.
  2. A Hive streaming sink is provided.

9. [FLIP-123] Support Hive-compatible DDL and DML statements

FLIP-123 provides support for the Hive dialect, which enables users to work with Hive's DDL and DML statements, for example:
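A minimal sketch (the table definition is illustrative, a Hive catalog is assumed to be registered already, and tEnv refers to a TableEnvironment as created earlier):

import org.apache.flink.table.api.SqlDialect;

// Switch the SQL parser to the Hive dialect, run Hive-style DDL, then switch back.
tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tEnv.executeSql(
    "CREATE TABLE page_views (user_id STRING, cnt INT) " +
    "PARTITIONED BY (dt STRING) STORED AS ORC");
tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);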

DataStream API

1. [FLINK-15670] Kafka Shuffle: use Kafka as a message bus to provide a mechanism that exchanges and stores data between operators at the same time

Flink Kafka Shuffle provides a DataStream API for using Kafka as a message bus between Flink operators and, at the same time, as a mechanism for exchanging and persisting data. The advantages of this approach are:

  1. The shuffled data can be reused.
  2. When the job recovers from a failure, the persisted data serves as partition boundaries, so that the whole job graph does not need to be restarted while exactly-once semantics are still maintained.

Until Flink's ongoing work on failure recovery is completed, this mechanism can serve as a complement for failure recovery of large-scale streaming jobs.

2. [FLIP-126] Optimize the watermark assigner interfaces of sources

(Note that this work has been completed, but whether it will be included in 1.11 is still under discussion.)

The new WatermarkGenerator / WatermarkStrategy interface unifies the previous two kinds of watermark assigners, AssignerWithPunctuatedWatermarks and AssignerWithPeriodicWatermarks, which simplifies what sources need to implement to support watermark insertion in future development.
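A minimal sketch of the unified API (the event type MyEvent, its timestamp accessor, the five-second bound, and the events stream are illustrative assumptions):

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

// Assign timestamps and bounded-out-of-orderness watermarks with the unified WatermarkStrategy.
DataStream<MyEvent> withWatermarks = events.assignTimestampsAndWatermarks(
    WatermarkStrategy
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
        .withTimestampAssigner((event, previousTimestamp) -> event.getTimestamp()));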

3. [FLIP-92] Support operators with more than two inputs

Flink 1.11 adds support for multiple-input operators. However, this feature does not yet come with a complete DataStream API; users who want to use it need to create the MultipleInputTransformation and MultipleConnectedStreams manually:

MultipleInputTransformation<Long> transform = new MultipleInputTransformation<>(
   "My Operator",
   new SumAllInputOperatorFactory(),
   BasicTypeInfo.LONG_TYPE_INFO,
   1);

env.addOperator(transform
   .addInput(source1.getTransformation())
   .addInput(source2.getTransformation())
   .addInput(source3.getTransformation()));

new MultipleConnectedStreams(env)
   .transform(transform)
   .addSink(resultSink);

PyFlink & ML

1. [FLINK-15636] Support running Python UDFs in the batch mode of the Flink planner

Before this, Python UDFs could only run in the streaming and batch modes of the Blink planner and the streaming mode of the Flink planner. Now, the streaming and batch modes of both planners support running Python UDFs.

2. [FLINK-14500] Python UDTF support

A UDTF can produce multiple output rows for a single input row. The streaming and batch modes of both planners support running Python UDTFs.

3. [FLIP-121] Optimize the execution efficiency of Python UDFs with Cython

The coders (serialization and deserialization) and the operations are optimized with Cython; the end-to-end performance is dozens of times higher than in version 1.10.

4. [FLIP-97] Pandas UDF support

Pandas UDFs use pandas.Series as the input and output type and support processing data in batches. In general, Pandas UDFs perform better than ordinary UDFs, because they reduce the serialization and deserialization overhead of exchanging data between the Java and Python processes, and because batching reduces both the number of Python UDF calls and the per-call overhead. In addition, Pandas UDFs let users use pandas-related Python libraries more conveniently and naturally.

5. [FLIP-120] Support conversion between PyFlink Table and pandas DataFrame

Users can call the to_pandas() method on a Table object to obtain a corresponding pandas DataFrame object, or convert a pandas DataFrame into a Table object via the from_pandas() method.

import pandas as pd
import numpy as np

# Create a PyFlink Table
pdf = pd.DataFrame(np.random.rand(1000, 2))
table = t_env.from_pandas(pdf, ["a", "b"]).filter("a > 0.5")

# Convert the PyFlink Table to a Pandas DataFrame
pdf = table.to_pandas()

6. [FLIP-112] Support user-defined metrics in Python UDFs

Currently, four custom metric types are supported: counters, gauges, meters, and distributions. Defining user scopes and user variables for metrics is also supported.

7. [FLIP-106] Support Python UDFs in SQL DDL and the SQL Client

Previously, Python UDFs could only be used in the Python Table API. With support for registering Python UDFs via DDL, SQL users can now use them conveniently as well. In addition, the SQL Client also supports Python UDFs, including registering them and managing their dependencies.

8. [FLIP-96] Support a Python Pipeline API

Flink 1.9 introduced a new ML Pipeline API to improve the ease of use and extensibility of Flink ML. Since Python is widely used in the ML field, FLIP-96 provides a corresponding Python Pipeline API for the convenience of Python users.

Runtime optimization

1. [FLIP-76] Support unaligned checkpoints

Under Flink's existing checkpoint mechanism, each operator has to wait until the barriers from all upstream channels are aligned before it takes a snapshot and forwards the barrier downstream. Under backpressure, it may take a long time for a barrier to travel from an upstream operator to a downstream one, leading to checkpoint timeouts.

To solve this problem, Flink 1.11 adds an unaligned checkpoint mechanism. With unaligned checkpoints enabled, a checkpoint can be taken as soon as the first barrier is received, and the data in flight between upstream and downstream operators is also saved into the snapshot as state. This greatly shortens the checkpoint completion time, which no longer depends on the operators' processing throughput, and solves the problem that checkpoints cannot be completed for a long time under backpressure.

The unaligned checkpoint mechanism can be enabled via env.getCheckpointConfig().enableUnalignedCheckpoints(), for example:
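A minimal sketch (the checkpoint interval is illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Checkpoint every 60 seconds and let checkpoint barriers overtake in-flight data under backpressure.
env.enableCheckpointing(60_000);
env.getCheckpointConfig().enableUnalignedCheckpoints();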

2. [FLINK-13417] Support ZooKeeper 3.5

Flink is now integrated with ZooKeeper 3.5. This allows users to use new ZooKeeper features such as SSL.

3. [FLINK-16408] Support slot-level classloader reuse

Flink 1.11 changes the classloader loading logic on the TM side: instead of creating a new classloader after every failure, 1.11 caches the corresponding classloader as long as the job still occupies a slot on the TM. This change has some impact on the semantics of job failure, because static fields are not reloaded after a failure, but it avoids the JVM metaspace exhaustion that can be caused by creating a large number of classloaders.

4. [FLINK-15672] Upgrade the logging system to Log4j 2

Flink 1.11 upgrades the logging system from Log4j 1.x to Log4j 2.x, which solves some problems of Log4j 1.x and makes the new features of 2.x available.

5. [FLINK-10742] Reduce the number of data copies and the memory consumption on the TM receiver side

When receiving data on the downstream network side, Flink 1.11 reuses Flink's own buffer memory management, which reduces copying from the network layer into Flink's buffers and the extra direct-memory overhead, thereby reducing the probability of direct-memory OOMs or of containers being killed for exceeding their memory limit.

This concludes our preview of Flink 1.11. The community will continue to arrange technical sharing sessions on these topics.

References:

[1]https://github.com/apache/fli…

[2]https://ci.apache.org/project…

[3]https://ci.apache.org/project…

[4]https://lists.apache.org/thre…