Flink system — task execution — tasks — parallelism


Flink parallelism:

Priority: operator level > environment level > client level > system level

Operator level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = [...]
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new LineSplitter())
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .sum(1).setParallelism(5);

wordCounts.print();

env.execute("Word Count Example");
  • Operators, data sources, and data sinks can call the setParallelism() method to set their own parallelism

Execution environment level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);

DataStream<String> text = [...]
DataStream<Tuple2<String, Integer>> wordCounts = [...]
wordCounts.print();

env.execute("Word Count Example");
  • Calling setParallelism on the execution environment sets the default parallelism for all operators, data sources, and data sinks in that program; if an operator, data source, or data sink sets its own parallelism, that setting overrides the environment default

Client level

./bin/flink run -p 10 ../examples/*WordCount-java*.jar


try {
    PackagedProgram program = new PackagedProgram(file, args);
    InetSocketAddress jobManagerAddress = RemoteExecutor.getInetFromHostport("localhost:6123");
    Configuration config = new Configuration();

    Client client = new Client(jobManagerAddress, config, program.getUserCodeClassLoader());

    // set the parallelism to 10 here
    client.run(program, 10, true);

} catch (ProgramInvocationException e) {
    e.printStackTrace();
}
  • With the CLI client, parallelism can be specified with the -p option when the job is submitted from the command line; in Java/Scala programs, it can be passed as a parameter of Client.run

System level

# The parallelism used for programs that did not specify any other parallelism.

parallelism.default: 1
  • In flink-conf.yaml, the parallelism.default configuration item specifies the system-level default parallelism for all execution environments
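As a plain-Java sketch of the four-level precedence described above (the class and method names here are ours, not Flink API; Flink resolves this internally), the levels combine like this:

```java
// Toy model of Flink's parallelism precedence (not Flink API):
// operator level > environment level > client level > system level.
// A value of -1 means "not set at that level".
public class ParallelismResolver {

    public static int resolve(int operatorLevel, int environmentLevel,
                              int clientLevel, int systemDefault) {
        if (operatorLevel > 0) {
            return operatorLevel;      // setParallelism() on the operator wins
        }
        if (environmentLevel > 0) {
            return environmentLevel;   // env.setParallelism(...)
        }
        if (clientLevel > 0) {
            return clientLevel;        // flink run -p ...
        }
        return systemDefault;          // parallelism.default in flink-conf.yaml
    }

    public static void main(String[] args) {
        // Operator level set: it overrides everything else.
        System.out.println(resolve(5, 3, 10, 1));   // 5
        // Only the client (-p 10) and the system default are set.
        System.out.println(resolve(-1, -1, 10, 1)); // 10
        // Nothing set: fall back to parallelism.default.
        System.out.println(resolve(-1, -1, -1, 1)); // 1
    }
}
```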

Introduction to Flink data flow diagram

1.1 Logical view of Flink jobs

In the field of big data, the WordCount program is like a programming language's Hello World: it demonstrates the basic workings of a big data engine. Small as it is, the example is complete, and from it we can glimpse Flink's design and operating principles.


As shown in Figure 1, the program has three parts: the first reads the data source, the second transforms the data, and the last outputs the transformation result to a sink. The methods in the code are called operators; they are the interfaces Flink provides to programmers, who operate on data through them. The source operator reads data from the data source, which can be a data stream or a file in a file system. The transformation operators perform the necessary computation and processing on the data. The sink operator outputs the results, usually to a database, a file system, or a downstream stream processing program.

We can think of an operator as the plus sign in the expression 1 + 2. The plus sign (+) is the symbolic representation of an operator that adds the numbers 1 and 2. Similarly, in a big data engine such as Flink or Spark, an operator performs some operation on the data, and programmers call the appropriate operators to complete their computation. Common operators include map, flatMap, keyBy, and timeWindow; each performs a different type of operation on the data stream.
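To make the operator idea concrete, here is what a WordCount flatMap such as the LineSplitter above does logically, written in plain Java without Flink (the class name and the return-a-list shape are ours; Flink's real FlatMapFunction emits pairs through a Collector instead):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Plain-Java sketch of a WordCount flatMap: split a line into (word, 1) pairs.
public class LineSplitterSketch {

    public static List<Entry<String, Integer>> flatMap(String line) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // one (word, 1) pair per token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatMap("To be, or not to be"));
    }
}
```

Downstream, keyBy would group these pairs by the word and sum would add up the 1s per key.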


Before the program actually runs, Flink does a light processing pass over the user's code to generate a logical view like the one in Figure 2. Figure 2 shows how data flows from one operator to another in the WordCount program. The circles represent operators, and the arrows between them represent data streams. The data stream is computed by the program's operators in turn, finally producing the target data. Note that keyBy, timeWindow, and sum together perform an aggregation over a time window and are fused into a single operator. We can click a job in Flink's Web UI to view its logical view.

For word frequency counting, the logic is simply to extract words from the data stream, count frequencies with a key-value structure, and output the result. Such logic could be written in a few lines of ordinary code, yet here it takes the form of operators, which can confuse newcomers. Why must we write programs as operators? Operators evolved into their current form much as human counting evolved from stones to fingers to the abacus to computers: lower-level methods can complete certain tasks, but as the scale of computation grows they become inefficient and cannot meet larger, higher-level requirements. Imagine implementing the above logic ourselves instead of using the operators a big data engine provides. We could quickly finish the current word-count task, but for each new task we would have to rewrite the program; the horizontal scalability of our own code would likely be poor, and when input data surged we would need major changes to deploy on more machines.

The operators of a big data engine abstract computation, which carries some learning cost for newcomers, but once the technique is mastered, the scale of data one can handle grows enormously. Operators emerged as a new computing form for the big data scenario, in which data is distributed across many nodes and a unified language is needed to describe computation on it. With Flink operators we define a logical view of the data flow to express a big data computation; the remaining problems, such as data exchange, horizontal scaling, and failure recovery, are solved by the engine.

1.2 From logical view to physical execution

In most big data processing scenarios, a single machine cannot process all the data. When the amount of data exceeds the capacity of one machine, the data must be split into multiple partitions, each of which resides on a virtual or physical machine.

As mentioned in the previous section, operators of big data engines provide programming interfaces, and we can use operators to build logical views of data streams. Considering that the data is distributed in multiple nodes, the logical view is only an abstraction, which needs to be transformed into a physical execution diagram before it can be executed in a distributed environment.

Figure 3 shows the physical execution diagram of the WordCount program, where the data stream is distributed over two partitions. The arrows indicate data stream partitions, and the circles indicate the operator subtasks on each partition. After the logical view becomes a physical execution diagram, the flatMap operator has one subtask in each partition to process that partition's data: the flatMap [1/2] subtask processes the first data stream partition, and so on.

Operator subtasks are also called operator instances. When an operator executes in parallel, there are multiple operator instances, so even as input data grows we can scale horizontally by deploying more of them. As Figure 3 shows, every operator except the sink is split into two operator instances (parallelism 2), while the sink has parallelism 1. Parallelism is configurable: setting an operator's parallelism to 2 means two operator subtasks (two operator instances) execute in parallel. In practice, parallelism is usually set according to the input data volume, the available computing resources, and other factors.
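To make parallel subtasks concrete, here is a toy sketch (ours, not Flink's implementation) of how a keyed record could be routed to one of N operator subtasks after keyBy. Flink actually hashes keys into key groups with murmur hashing, but the principle, hash the key and map it to a subtask index, is the same:

```java
// Toy routing of keyed records to parallel operator subtasks (illustrative only).
public class KeyPartitioner {

    public static int subtaskFor(String key, int parallelism) {
        // Mask to non-negative instead of Math.abs, which overflows on MIN_VALUE.
        return (key.hashCode() & Integer.MAX_VALUE) % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 2;
        for (String word : new String[] {"hello", "world", "flink"}) {
            // Every occurrence of a given word lands on the same subtask,
            // which is what makes per-key aggregation correct.
            System.out.println(word + " -> subtask " + subtaskFor(word, parallelism));
        }
    }
}
```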

Note that in this example, for demonstration, we set the parallelism of all operators to 2 with env.setParallelism(2); and the parallelism of the final output to 1 with wordCount.print().setParallelism(1);. If the parallelism of print were not set separately, it would also be 2.

An operator subtask is the basic unit of Flink's physical execution. Operator subtasks are independent of one another: each has its own thread, and different subtasks may run on different nodes. We will return to operator subtasks in the section on Flink resource allocation.

From logical view to physical execution diagram

Having covered Flink's distributed architecture and core components, we now look at the transformation from logical view to physical execution diagram at a finer granularity. The process has four layers: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution diagram.


  • StreamGraph: the initial graph generated from the user's code, representing the topology of the Flink job. In the StreamGraph, each node (a StreamNode) is an operator.
  • JobGraph: the data structure submitted to the JobManager. The StreamGraph is optimized into the JobGraph; the main optimization is to link eligible nodes together into a single JobVertex node, which reduces the cost of data exchange. This linking is called operator chaining and is described in the next section. After operator chaining, a JobVertex contains one or more operators, and its output is an IntermediateDataSet, the data set produced by those operators.
  • ExecutionGraph: the JobManager converts the JobGraph into the ExecutionGraph, the parallel version of the JobGraph. If a JobVertex has a parallelism of 2, it is split into 2 ExecutionVertex instances; an ExecutionVertex represents one operator subtask and monitors the execution of that single subtask. Each ExecutionVertex outputs an IntermediateResultPartition, the output of a single subtask, which is delivered to downstream nodes via ExecutionEdges. An ExecutionJobVertex is the collection of these parallel subtasks and monitors the operation of the whole operator. The ExecutionGraph is the core data structure of the scheduling layer.
  • Physical execution diagram: after scheduling the job according to the ExecutionGraph, the JobManager deploys concrete tasks on the TaskManagers. The physical execution diagram is not a concrete data structure.
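The JobGraph-to-ExecutionGraph step can be illustrated with a toy model (these are not Flink's real data structures): a JobVertex with parallelism p is expanded into p ExecutionVertex instances, one per operator subtask:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of expanding a JobVertex into its parallel ExecutionVertex
// instances, the way the JobManager builds the ExecutionGraph.
public class GraphExpansion {

    public static List<String> expand(String jobVertexName, int parallelism) {
        List<String> executionVertices = new ArrayList<>();
        for (int i = 1; i <= parallelism; i++) {
            // Flink's UI labels subtasks like "flatMap [1/2]".
            executionVertices.add(jobVertexName + " [" + i + "/" + parallelism + "]");
        }
        return executionVertices;
    }

    public static void main(String[] args) {
        System.out.println(expand("Source -> flatMap", 2)); // two subtasks
        System.out.println(expand("Sink", 1));              // one subtask
    }
}
```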

As you can see, Flink takes great pains over data flow graphs; there are no fewer than four kinds. Newcomers need not study the underlying implementation of all these details, but should understand the following core ideas:

  • Flink uses a master-slave architecture: the master manages and coordinates, while the TaskManagers carry out physical execution. During execution, matters such as data exchange and lifecycle management are handled by the engine.
  • The user calls the Flink API to construct a logical view. Flink optimizes the logical view and converts it into a parallel physical execution graph, which is finally executed.

Tasks, operator subtasks and operator chains

When constructing the physical execution graph, Flink links some operator subtasks together to form operator chains. After chaining, they are scheduled and executed by the TaskManager in the form of tasks. Operator chaining is a very effective optimization: it reduces the transmission cost between operator subtasks. A task formed by chaining is one thread in the TaskManager.


For example, data propagates from the source to flatMap without any cross-partition exchange, so source and flatMap can be combined into one task. At keyBy a data exchange occurs and data crosses partitions, so keyBy and the window aggregation behind it cannot be chained with flatMap. Since the parallelism of the window aggregation is 2 and that of the sink is 1, the data is exchanged again, so the window aggregation and the sink cannot be chained together either. As mentioned in Section 1.2, the sink's parallelism was deliberately set to 1; if we set it to 2, those two operators could be chained together.

By default, Flink tries to chain as many subtasks together as possible, since this avoids some unnecessary data transmission overhead. However, chaining is impossible when a subtask has more than one input or when a data exchange occurs. Whether two operators can be chained is governed by a set of rules; interested readers can read the isChainable method in org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator in the Flink source code. The purpose of the StreamingJobGraphGenerator class is to convert a StreamGraph into a JobGraph.
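As a heavily simplified sketch of the kind of conditions isChainable checks (the real method examines more, such as chaining strategy and slot sharing groups; this code is illustrative only, not Flink's implementation):

```java
// Simplified sketch of operator chainability checks, in the spirit of
// StreamingJobGraphGenerator#isChainable (not the actual Flink code).
public class ChainabilitySketch {

    public static boolean isChainable(int upstreamParallelism,
                                      int downstreamParallelism,
                                      boolean forwardPartitioner,
                                      int downstreamInputCount,
                                      boolean chainingEnabled) {
        return chainingEnabled
            && downstreamInputCount == 1                       // single input only
            && forwardPartitioner                              // no cross-partition exchange
            && upstreamParallelism == downstreamParallelism;   // same parallelism
    }

    public static void main(String[] args) {
        // source -> flatMap: same parallelism, forward edge => chainable.
        System.out.println(isChainable(2, 2, true, 1, true));
        // window aggregation (p=2) -> sink (p=1): parallelism differs => not chainable.
        System.out.println(isChainable(2, 1, false, 1, true));
    }
}
```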

Although chaining operators reduces some transmission overhead, it is not always desirable. For example, sometimes we need to break a very long operator chain so that computation originally concentrated in one thread is split across multiple threads to run in parallel. Flink lets developers configure chaining manually: env.disableOperatorChaining() disables chaining for the whole job, while startNewChain() and disableChaining() control it for individual operators.

Task slots and computing resources

Task slot

As mentioned above, the TaskManager is responsible for executing specific tasks. A TaskManager is a JVM process in which multiple tasks can run in parallel. Before the program executes, optimization links some subtasks together into tasks; each task is one thread and needs the TaskManager to allocate resources to it. The TaskManager allocates resources to tasks through task slots.

Before explaining Flink's task slots, let's review processes and threads. At the operating system level, a process is the independent unit of resource allocation and scheduling, while a thread is the basic unit of CPU scheduling. For example, Microsoft Word occupies one operating system process after it starts; on Windows the Task Manager shows the currently active processes, and on Linux the top command does. A thread is part of a process: it usually handles some specific task, does not own system resources independently, and holds only the resources it needs while running, such as a program counter. A process has at least one thread and may have many. In a multithreaded scenario, each thread handles a small task, and many threads process many small tasks concurrently, which improves processing capacity.
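The multithreading idea above can be shown with a small self-contained Java example (unrelated to Flink's API): split the input into partitions and let a pool of threads process them concurrently, then combine the partial results:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split an array into one chunk per thread and sum the chunks concurrently.
public class MultiThreadSum {

    public static long parallelSum(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads;
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t * chunk;
                final int end = Math.min(data.length, start + chunk);
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // combine the partial sums
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 3)); // 1 + 2 + ... + 100 = 5050
    }
}
```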

Back to Flink's slot allocation mechanism: a TaskManager is one process. A TaskManager can manage one or more tasks; each task is a thread occupying one slot. Each slot's resources are a subset of the whole TaskManager's resources. For example, a TaskManager with three slots gives each slot 1/3 of the memory it manages, so the tasks in the first slot do not compete with the tasks in the second slot for memory. Note that Flink does not explicitly allocate CPU resources to slots.


Suppose we assign two TaskManagers to the WordCount program, each with three slots, for six slots in total. Combined with the job's parallelism settings in Figure 7, the whole job is divided into five tasks, using five threads, which can be allocated to the six slots as shown in Figure 8.

Flink lets users set the number of slots per TaskManager, so users can decide the granularity at which tasks are isolated from one another. If each TaskManager has only one slot, the task running in it has the JVM to itself; if a TaskManager has multiple slots, the tasks in those slots share JVM resources, such as TCP connections, heartbeat messages, and some data structures. The official recommendation is to set the number of slots to the number of CPU cores available to the TaskManager, so that each slot gets about one CPU core on average.
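The slot arithmetic in this example is simple enough to sketch in plain Java (class and method names are ours):

```java
// Back-of-the-envelope slot arithmetic: total slots available to a job is the
// number of TaskManagers times the slots configured per TaskManager, and the
// official recommendation is roughly one slot per CPU core.
public class SlotMath {

    public static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // The WordCount setup above: 2 TaskManagers with 3 slots each.
        int slots = totalSlots(2, 3);
        System.out.println(slots);            // 6
        int threads = 5;                      // the job's five tasks (threads)
        System.out.println(threads <= slots); // enough slots for every thread
    }
}
```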