Translation | Gao
Review | Zhu Zhu, Ma Guowei
Flink 1.13 is released! It includes more than 1,000 fixes and improvements contributed by more than 200 contributors.
In this release, one of Flink's long-standing goals has made important progress: making stream processing applications as simple and natural to use as ordinary applications. The newly introduced passive scaling in Flink 1.13 makes scaling streaming jobs as simple as scaling other applications: users only need to change the parallelism.
This version also includes a series of important changes that help users better understand the performance of streaming jobs. When a streaming job performs worse than expected, these changes help users analyze the cause. They include load and backpressure visualization for identifying bottleneck nodes, CPU flame graphs for analyzing operator hotspot code, and state access latency metrics for analyzing the state backend.
Beyond these features, the Flink community has added many other improvements, some of which are discussed later in this article. We hope users enjoy the new version and its features. At the end of the article, we also cover changes to be aware of when upgrading Flink.
Passive (reactive) scaling
An original goal of the Flink project is that stream processing applications should be as simple and natural to use as ordinary applications. Passive scaling is Flink's latest progress toward this goal.
When it comes to resource management and deployment, Flink has two possible models. Users can deploy Flink applications onto resource management systems such as Kubernetes or YARN, with Flink actively managing resources and allocating and releasing them on demand. This model is useful for jobs whose resource needs change over time, such as batch jobs and real-time SQL queries. In this model, the number of workers Flink starts is determined by the parallelism the application sets. In Flink, this model is called active scaling.
For long-running stream processing applications, a more suitable model is for users to simply start the job like any other long-running service, regardless of whether it is deployed on Kubernetes, YARN, or another resource management platform, and without having to plan how many resources to request. Instead, the job's scale is determined by the number of workers allocated to it: when the number of workers changes, Flink automatically adjusts the application's parallelism. In Flink, this model is called passive scaling.
Flink's application deployment mode started the effort to make Flink jobs feel like ordinary applications (i.e., starting a Flink job no longer requires two separate steps: starting a cluster and submitting the application). Passive scaling completes this goal: users no longer need extra tools (such as scripts or a Kubernetes operator) to keep the number of workers consistent with the application's parallelism setting.
Users can now apply auto-scaling tools to Flink applications just as to ordinary applications, as long as they understand the cost of scaling: stateful streaming applications must redistribute their state when scaling up or down.
To try passive scaling, add the configuration option scheduler-mode: reactive and start an application cluster (standalone or Kubernetes). See the documentation on reactive scaling for more details.
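As a sketch, on a standalone cluster the configuration and startup could look like the following (paths and the job class name are placeholders, not from this article):

```yaml
# flink-conf.yaml — enable the reactive (passive) scheduler
scheduler-mode: reactive
```

```bash
# start an application cluster; the job's parallelism follows the
# number of available TaskManagers
./bin/standalone-job.sh start --job-classname com.example.StreamingJob
./bin/taskmanager.sh start    # starting another TaskManager scales the job up
```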
Analyze application performance
Being able to easily analyze and understand application performance is a key capability for any application. It is even more important for Flink, because Flink applications are typically data-intensive (they process large amounts of data) and need to deliver results within (near) real-time latencies.
When a Flink application cannot keep up with the rate of incoming data, or when an application uses more resources than expected, the tools described below can help analyze the cause.
Bottleneck detection and back pressure monitoring
In Flink performance analysis, the first question to answer is often: which operator is the bottleneck?
To answer it, Flink introduced metrics that describe how busy an operator is (i.e., processing data) and how backpressured it is (unable to emit output because a downstream operator cannot consume results fast enough). Possible bottlenecks are operators that are busy and whose upstream is backpressured.
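The rule of thumb above can be sketched in a few lines of plain Python (an illustration of the heuristic, not Flink code; the metrics dictionary shape is made up for the example):

```python
def find_bottlenecks(chain):
    """Return likely bottlenecks in an operator chain (source -> sink):
    operators that are busy while their upstream operator is backpressured."""
    bottlenecks = []
    for i, op in enumerate(chain):
        upstream_backpressured = i > 0 and chain[i - 1]["backpressured"]
        if op["busy"] and upstream_backpressured:
            bottlenecks.append(op["name"])
    return bottlenecks

# Toy metrics: the source is backpressured because 'parse' cannot keep up.
chain = [
    {"name": "source", "busy": False, "backpressured": True},
    {"name": "parse",  "busy": True,  "backpressured": False},
    {"name": "sink",   "busy": False, "backpressured": False},
]
print(find_bottlenecks(chain))  # ['parse']
```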
Flink 1.13 optimizes the backpressure detection logic (using task-mailbox-based timing instead of stack sampling) and reimplements the job graph display in the UI: Flink now shows busyness and backpressure through colors and numeric values.
CPU flame graphs in the Web UI
Another performance question Flink users often need to answer: which part of the computation in the bottleneck operator is expensive?
An effective visualization tool for this problem is the flame graph. It helps answer questions such as:
- Which method is currently using CPU?
- What is the percentage of CPU used by different methods?
- What is the stack on which a method is called?
A flame graph is built by repeatedly sampling thread stacks. In the flame graph, each method call is represented as a rectangle, and the length of the rectangle is proportional to how often the method appears in the samples. An example flame graph in the UI is shown in the figure below.
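The sampling idea can be illustrated with plain Python: count how often each stack appears in the samples; each distinct stack becomes a rectangle whose width is proportional to its count (an illustration of the principle, not Flink's implementation; the sample data is invented):

```python
from collections import Counter

# Simulated stack samples: each tuple is one thread stack captured at a
# sampling instant, ordered from the outermost frame to the running method.
samples = [
    ("main", "process", "parse"),
    ("main", "process", "parse"),
    ("main", "process", "serialize"),
    ("main", "checkpoint"),
]

# Each distinct stack becomes one rectangle in the flame graph; its width
# is proportional to how often the stack appeared in the samples.
stack_counts = Counter(samples)
for stack, count in stack_counts.most_common():
    share = 100 * count / len(samples)
    print(" -> ".join(stack), f"{share:.0f}%")
```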
The flame graph documentation includes more details and instructions for enabling this feature.
State access latency metrics
Another possible performance bottleneck is the state backend, especially when the job's state exceeds the available memory and the RocksDB state backend must be used.
This is not to say that RocksDB's performance is inadequate (we like RocksDB very much!), but certain conditions must be met for it to achieve its best performance. For example, users may unintentionally fail to meet RocksDB's I/O performance requirements in the cloud by using the wrong disk resource type.
On top of the CPU flame graph, the new state backend latency metrics help users judge whether worse-than-expected performance is caused by the state backend. For example, if a single RocksDB access takes several milliseconds, users should examine their memory and I/O configuration. These metrics can be enabled via the option state.backend.rocksdb.latency-track-enabled. They are collected by sampling, so their impact on RocksDB state backend performance is negligible.
Switching state backends with savepoints
A Flink application can now switch its state backend when restoring from a savepoint. This means an application is no longer tied to the state backend it used when it first ran.
With this feature, users can start with the HashMap state backend (a pure in-memory state backend) and later switch to the RocksDB state backend if the state grows too large.
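As a sketch of the workflow (CLI flags, paths, and the job ID are illustrative; consult the savepoint documentation for your setup): stop the job with a savepoint, change the state backend configuration, and resume from the savepoint.

```bash
# take a savepoint and stop the job
./bin/flink stop --savepointPath /tmp/savepoints <jobId>

# resume from the savepoint with a different state backend,
# e.g. switching from hashmap to rocksdb via configuration
./bin/flink run -s /tmp/savepoints/savepoint-xxxx \
    -Dstate.backend=rocksdb my-job.jar
```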
Under the hood, Flink now unifies the savepoint format across all state backends to make this possible.
User-specified pod templates for Kubernetes deployments
Native Kubernetes deployments (where Flink actively asks Kubernetes to start pods) can now use custom pod templates.
With these templates, users can configure the JobManager and TaskManager pods in a more Kubernetes-native way, with more flexibility than the built-in configuration options of Flink's Kubernetes integration offer.
Production-ready unaligned checkpoints
Unaligned checkpoints are now production-ready. We encourage users to try this feature if their applications are subject to backpressure.
Specifically, the following features introduced in Flink 1.13 make unaligned checkpoints easier to use:
- Applications can now be scaled up or down while using unaligned checkpoints. This is convenient for users who must rely on retained checkpoints because savepoints are not an option for performance reasons.
- For applications without backpressure, enabling unaligned checkpoints is now cheaper. Unaligned checkpoints can be triggered automatically by a timeout: an application uses aligned checkpoints by default (not storing in-flight data) and automatically switches to unaligned checkpoints (storing in-flight data) only when the alignment takes longer than a certain threshold.
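As a configuration sketch (option names as of Flink 1.13; verify against the checkpointing documentation for your version):

```yaml
# flink-conf.yaml
execution.checkpointing.unaligned: true
# use aligned checkpoints unless alignment takes longer than this timeout
execution.checkpointing.alignment-timeout: 30 s
```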
For how to enable unaligned checkpoints, please refer to the related documentation.
Machine learning migrated to a separate repository
To accelerate the progress of Flink's machine learning efforts (unified stream and batch machine learning), the work has moved to the new flink-ml repository, managed similarly to the Stateful Functions project. Using a separate repository simplifies merging code and allows separate release cycles, improving development efficiency.
SQL / Table API progress
As in previous versions, SQL and the Table API still account for a large share of all development.
Defining time windows with table-valued functions
One of the most frequent operations in streaming SQL queries is defining time windows. Flink 1.13 introduces a new way to define windows: table-valued functions. This approach is not only more expressive (it allows new window types to be defined) but also more consistent with the SQL standard.
Flink 1.13 supports tumble and hop windows in the new syntax; session windows will follow in a subsequent version. The following two examples demonstrate the expressiveness of this approach:
- Example 1: the newly introduced CUMULATE window function assigns windows that expand in fixed steps until a maximum window size is reached:
```sql
SELECT window_time, window_start, window_end, SUM(price) AS total_price
  FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end, window_time;
```
- Example 2: users can reference the start and end time of a window in the table-valued window function, which enables new kinds of window-based functionality. For example, in addition to regular window aggregations and window joins, users can now implement window-based Top-K aggregations:
```sql
SELECT window_time, ...
  FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end
                                 ORDER BY total_price DESC) AS rank
    FROM t
  ) WHERE rank <= 100;
```
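For comparison, a plain tumbling window follows the same table-valued pattern (reusing the illustrative Bid table from Example 1):

```sql
SELECT window_start, window_end, SUM(price) AS total_price
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
```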
Improved interoperability between the DataStream API and the Table API / SQL
This version greatly simplifies mixing the DataStream API with the Table API.
The Table API is a very convenient development interface because it supports expressions and provides a large number of built-in functions. However, users sometimes need to switch back to the DataStream API, for example when they need more expressiveness, flexibility, or direct state access.
The newly introduced StreamTableEnvironment.toDataStream()/.fromDataStream() methods allow a DataStream declared with the DataStream API to be used as a table source or sink. Major improvements include:
- Automatic type conversion between the DataStream and Table API type systems.
- Seamless integration of event-time configuration, with highly consistent watermark behavior.
- A greatly enhanced Row type (the representation of data in the Table API), including optimized toString()/hashCode()/equals() methods, access to field values by name, and support for sparse representations.
```java
Table table = tableEnv.fromDataStream(
    dataStream,
    Schema.newBuilder()
        .columnByMetadata("rowtime", "TIMESTAMP(3)")
        .watermark("rowtime", "SOURCE_WATERMARK()")
        .build());

DataStream<Row> dataStream = tableEnv.toDataStream(table)
    .keyBy(r -> r.getField("user"))
    .window(...);
```
SQL Client: initialization scripts and statement sets
The SQL Client is a simple way to run and deploy SQL streaming or batch jobs directly, calling SQL from the command line or as part of a CI/CD process, without writing any code.
This version greatly improves the SQL Client. Everything that is supported when writing queries programmatically in Java (i.e., via TableEnvironment) is now also supported by the SQL Client and in SQL scripts. This means SQL users no longer need glue code to deploy their SQL jobs.
Configuration simplification and code sharing
Flink will no longer support configuring the SQL Client via YAML (note: it is still supported, but marked as deprecated). Instead, the SQL Client now supports initialization scripts to configure the environment before the main SQL script is executed.
These initialization scripts can be shared between teams and deployments. They can be used to load common catalogs, apply common configurations, or define standard views.
```bash
./sql-client.sh -i init1.sql init2.sql -f sqljob.sql
```
More configuration items
By adding configuration options and improving the SET/RESET commands, users can more easily control execution from within the SQL Client and SQL scripts.
Supporting multiple queries with statement sets
Multi-query support allows executing multiple SQL queries (or statements) as a single Flink job. This is useful for long-running streaming SQL queries.
Statement sets can be used to group several queries into one set that is executed as one job.
The following SQL script can be executed by the SQL Client. It initializes and configures the environment and then executes multiple queries. Because the script contains all queries plus all initialization and configuration work, it serves as a self-contained deployment artifact.
```sql
-- set up a catalog
CREATE CATALOG hive_catalog WITH ('type' = 'hive');
USE CATALOG hive_catalog;

-- or use temporary objects
CREATE TEMPORARY TABLE clicks (
  user_id BIGINT,
  page_id BIGINT,
  viewtime TIMESTAMP
) WITH (
  'connector' = 'kafka',
  'topic' = 'clicks',
  'properties.bootstrap.servers' = '...',
  'format' = 'avro'
);

-- set the execution mode for jobs
SET execution.runtime-mode=streaming;

-- set the sync/async mode for INSERT INTOs
SET table.dml-sync=false;

-- set the job's parallelism
SET parallelism.default=10;

-- set the job name
SET pipeline.name = my_flink_job;

-- restore state from the specific savepoint path
SET execution.savepoint.path=/tmp/flink-savepoints/savepoint-bb0dab;

BEGIN STATEMENT SET;

INSERT INTO pageview_pv_sink
SELECT page_id, count(1) FROM clicks GROUP BY page_id;

INSERT INTO pageview_uv_sink
SELECT page_id, count(distinct user_id) FROM clicks GROUP BY page_id;

END;
```
Hive query syntax compatibility
Users can now also use Hive SQL syntax on Flink. In addition to the Hive DDL dialect, Flink now supports the commonly used Hive DML and DQL dialects.
To use the Hive SQL dialect, set table.sql-dialect to hive and load the HiveModule. The latter is important, because Hive's built-in functions must be loaded to achieve correct compatibility with Hive syntax and semantics. Example:
```sql
CREATE CATALOG myhive WITH ('type' = 'hive');  -- setup HiveCatalog
USE CATALOG myhive;
LOAD MODULE hive;                              -- setup HiveModule
USE MODULES hive, core;
SET table.sql-dialect = hive;                  -- enable Hive dialect
SELECT key, value FROM src CLUSTER BY key;     -- run some Hive queries
```
Note that DML and DQL statements in Flink's own syntax are not supported while the Hive dialect is active. To use Flink syntax, switch the dialect configuration back to default.
Improved SQL time functions
Time handling is an important part of data processing, but at the same time, dealing with different time zones, dates, and times is an increasingly complex task.
In Flink 1.13, we put a lot of effort into simplifying the use of time functions. We adjusted the return types of time-related functions such as PROCTIME(), CURRENT_TIMESTAMP, and NOW() to make them more precise.
Second, users can now define an event-time attribute on a column of type TIMESTAMP_LTZ, which makes window processing gracefully support daylight saving time.
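A sketch of such a definition (table name, columns, and the elided connector options are illustrative, not from this article):

```sql
CREATE TABLE user_actions (
  user_name STRING,
  ts BIGINT,                          -- epoch milliseconds
  ts_ltz AS TO_TIMESTAMP_LTZ(ts, 3), -- TIMESTAMP_LTZ(3) computed column
  WATERMARK FOR ts_ltz AS ts_ltz - INTERVAL '5' SECOND
) WITH (...);
```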
Users can refer to the release notes for the complete list of changes in this area.
PyFlink core improvements
The PyFlink improvements in this version mainly make the Python DataStream API and Table API more consistent in functionality with their Java/Scala counterparts.
Stateful operators in the Python DataStream API
In Flink 1.13, Python developers can use the full capabilities of Flink's stateful stream processing APIs. The Python DataStream API, reworked in Flink 1.12, now has complete state access, so users can record information in state and access it later.
Stateful processing is the foundation of many complex data processing scenarios that rely on sharing state across records (such as window operators).
The following example shows the implementation of a custom counting-window average:
```python
class CountWindowAverage(FlatMapFunction):

    def __init__(self, window_size):
        self.window_size = window_size

    def open(self, runtime_context: RuntimeContext):
        descriptor = ValueStateDescriptor(
            "average",  # the state name
            Types.TUPLE([Types.LONG(), Types.LONG()]))  # (count, sum)
        self.sum = runtime_context.get_state(descriptor)

    def flat_map(self, value):
        # access the current state value
        current_sum = self.sum.value()
        if current_sum is None:
            current_sum = (0, 0)

        # update the count and the running sum
        current_sum = (current_sum[0] + 1, current_sum[1] + value[1])

        # if the count reaches window_size, emit the average and clear the state
        if current_sum[0] >= self.window_size:
            self.sum.clear()
            yield value[0], current_sum[1] // current_sum[0]
        else:
            self.sum.update(current_sum)

ds = ...  # type: DataStream
ds.key_by(lambda row: row[0]) \
  .flat_map(CountWindowAverage(5))
```
User-defined windows in the PyFlink DataStream API
Flink 1.13 adds support for user-defined windows to the PyFlink DataStream API. Users can now use window definitions beyond the standard ones.
Since windows are the core mechanism for processing infinite streams (by splitting them into bounded "buckets"), this greatly improves the expressiveness of the API.
Row-based operations in the PyFlink Table API
The Python Table API now supports row-based operations, such as user-defined functions over row data. This allows users to go beyond the built-in functions when processing data.
An example of a map() operation in the Python Table API:
```python
@udf(result_type=DataTypes.ROW(
    [DataTypes.FIELD("c1", DataTypes.BIGINT()),
     DataTypes.FIELD("c2", DataTypes.STRING())]))
def increment_column(r: Row) -> Row:
    return Row(r[0] + 1, r[1])

table = ...  # type: Table
mapped_result = table.map(increment_column)
In addition to map(), the API also supports flat_map(), aggregate(), flat_aggregate(), and other row-based operations. This brings the Python Table API closer to feature parity with the Java Table API.
Batch execution mode in the PyFlink DataStream API
For bounded streams, the PyFlink DataStream API now supports the batch execution mode introduced in the Flink 1.12 DataStream API.
By exploiting the bounded nature of the data to skip state backends and checkpointing, batch execution mode simplifies operations and improves the performance of bounded stream processing.
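One way to select batch execution without code changes is via configuration when submitting a bounded job (the script name is a placeholder; check the CLI documentation for your setup):

```bash
./bin/flink run -py my_bounded_job.py -Dexecution.runtime-mode=BATCH
```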
Hugo-based Flink documentation
The Flink documentation has been migrated from Jekyll to Hugo. Please let us know if you find any problems; we look forward to hearing what users think of the new interface.
Exception history in the Web UI
The Flink Web UI can now display up to n of the historical exceptions that caused a job to fail, improving debugging in scenarios where one exception leads to multiple subsequent exceptions. Users can find the root exception in the exception history.
Improved reporting of failed checkpoints and failure causes
Flink now also provides statistics for failed or aborted checkpoints, so users can determine the cause of a checkpoint failure more easily, without having to inspect the logs.
In previous versions of Flink, metrics (such as the size of persisted data and the trigger time) were only reported when a checkpoint succeeded.
Exactly-once JDBC sink
Starting with 1.13, the JDBC sink can provide exactly-once delivery guarantees for databases that support XA transactions, by committing data with transactions. This feature requires the target database to have (or be connected to) an XA transaction manager.
The sink is currently only available in the DataStream API. Users can create it with JdbcSink.exactlyOnceSink(...) (or by explicitly instantiating a JdbcXaSinkFunction).
Group-window user-defined aggregate functions in the PyFlink Table API
The PyFlink Table API now supports both general Python user-defined aggregate functions (UDAFs) and Pandas UDAFs in group windows. These functions are important for many data analysis and machine learning workloads.
Before Flink 1.13, these functions could only be used in unbounded group-by aggregations; Flink 1.13 removes this limitation.
Sort-merge shuffle optimization in batch execution mode
Flink 1.13 improves the performance and memory usage of the sort-merge blocking shuffle for batch programs. This shuffle mode was introduced in Flink 1.12 by FLIP-148.
The optimization avoids OutOfMemoryError: Direct Memory in large-scale jobs and improves performance through I/O scheduling and broadcast optimizations (especially on spinning disks).
HBase connector supports asynchronous lookups and a lookup cache
The HBase lookup table source now supports asynchronous lookups and a lookup cache. This greatly improves the performance of Table/SQL dimension-table joins that use this source and reduces the number of I/O requests to HBase in typical cases.
In previous versions, the HBase lookup source only supported synchronous communication, which lowered job throughput and resource utilization.
Changes to be aware of when upgrading to Flink 1.13
- FLINK-21709 – the old Table & SQL API planner has been deprecated and will be removed in Flink 1.14. The Blink planner became the default several versions ago and will be the only planner in future versions. This means that BatchTableEnvironment and interoperability with the DataSet API will no longer be supported; users should switch to the unified TableEnvironment for both streaming and batch jobs.
- FLINK-22352 – the Flink community has decided to deprecate support for Apache Mesos; it may be removed entirely in the future. Users are advised to switch to another resource management system.
- FLINK-21935 – the state.backend.async option has been disabled, because Flink now always takes snapshots asynchronously (the previous default), and there is no longer a synchronous snapshot implementation.
- FLINK-17012 – a task's RUNNING state has been split into two phases: INITIALIZING and RUNNING. The INITIALIZING phase covers loading state and, when unaligned checkpoints are enabled, recovering in-flight data. By explicitly distinguishing these two states, monitoring systems can better tell whether a task is actually doing work.
- FLINK-21698 – direct casts between NUMERIC and TIMESTAMP types, such as CAST(numeric AS TIMESTAMP(3)), were problematic and are now disabled. Users should use TO_TIMESTAMP(FROM_UNIXTIME(numeric)) instead.
- FLINK-22133 – the new source interface has a minor breaking change: the SplitEnumerator.snapshotState() method now accepts an additional checkpoint ID parameter, representing the ID of the checkpoint to which the ongoing snapshot operation belongs.
- FLINK-19463 – the old StateBackend interface has been deprecated because it carried too many semantics and was easy to confuse. This is a pure API-level change and does not affect application runtime. For upgrading existing jobs, refer to the job migration guidelines.
If you want to upgrade to Flink 1.13, please refer to the release notes. This version is compatible with previous 1.x versions for interfaces annotated with @Public.
Copyright notice: this article was contributed by a real-name registered user of the Alibaba Cloud community, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not bear corresponding legal liability; see the Alibaba Cloud Developer Community User Service Agreement and the Alibaba Cloud Developer Community Intellectual Property Protection Guidelines for the specific rules. If you find content suspected of plagiarism in the community, fill in the infringement complaint form to report it; once verified, the community will immediately delete the suspected infringing content.