Flink performance tuning (I)

Time:2022-1-1


1. Configure memory

Operation scenario

Flink relies on in-memory computation, so insufficient memory during processing severely degrades its execution efficiency. You can determine whether memory is a performance bottleneck by monitoring garbage collection (GC) and evaluating memory usage and headroom, then optimize accordingly.
Monitor the GC logs of the YARN containers that run the node processes. If full GCs occur frequently, GC needs to be tuned.
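Besides reading GC logs, the same two signals (how often the collectors run and how much heap headroom is left) can be read from inside any JVM through the standard java.lang.management API. The sketch below is illustrative only: it inspects the JVM it runs in, it is not a Flink API, and the collector names it prints (which tell young and old/full collections apart) depend on the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcMonitor {
    /** Sum of collection counts over all collectors (collectors reporting -1 are skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Fraction of the maximum heap currently in use, or -1.0 if the maximum is undefined. */
    static double heapUsageRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        // One line per collector, e.g. a young-generation and an old-generation collector
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("heap used: %.1f%% of max%n", heapUsageRatio() * 100);
    }
}
```

Sampling these values periodically and watching how fast the old-generation count grows gives the same "frequent full GC" signal the logs do.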


GC configuration: in the client's “conf/flink-conf.yaml” configuration file, add the following parameters to the “env.java.opts” configuration item:

-Xloggc:<LOG_DIR>/gc.log 
-XX:+PrintGCDetails 
-XX:-OmitStackTraceInFastThrow 
-XX:+PrintGCTimeStamps 
-XX:+PrintGCDateStamps 
-XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=20 
-XX:GCLogFileSize=20M

In this setup, these GC logging parameters are already included by default.
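Because these options only take effect if “env.java.opts” actually reaches the JVM, it can help to check a running JVM's arguments. The minimal sketch below uses the standard RuntimeMXBean; the list of expected flags is an illustrative assumption. Note also that these GC logging flags are JDK 8 options: JDK 9 and later replaced them with unified logging (-Xlog:gc*), and several of them (for example -XX:+PrintGCDateStamps) are no longer recognized there.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
    /** Returns true if any JVM input argument starts with the given prefix. */
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (not including the main-class arguments)
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Illustrative subset of the options listed above
        String[] expected = {"-Xloggc:", "-XX:+PrintGCDetails", "-XX:+UseGCLogFileRotation"};
        for (String flag : expected) {
            System.out.println(flag + (hasFlag(jvmArgs, flag) ? " present" : " MISSING"));
        }
    }
}
```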


Operation steps

  • Optimize GC.

    Adjust the ratio of the old generation to the young generation. In the client's “conf/flink-conf.yaml” configuration file, add the “-XX:NewRatio” parameter to the “env.java.opts” configuration item. For example, “-XX:NewRatio=2” means the ratio of the old generation to the young generation is 2:1, so the young generation occupies 1/3 of the heap and the old generation 2/3.

  • When developing Flink applications, optimize how DataStream data is partitioned or grouped.

    • If partitioning causes data skew, consider optimizing the partitioning.
    • Avoid non-parallel operations. Some DataStream operations, such as windowAll, cannot run in parallel.
    • Try not to use String keys in keyBy.
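To see why a skewed key distribution hurts, it helps to count how many records each parallel subtask would receive. The sketch below is a simplification under stated assumptions: it assigns a key to a subtask with Math.floorMod(hashCode, parallelism), whereas Flink's real assignment first maps keys to key groups using a murmur hash. It uses Integer keys, in line with the advice above to avoid String keys.

```java
import java.util.ArrayList;
import java.util.List;

public class KeySkew {
    /** Simplified key-to-subtask assignment (NOT Flink's actual key-group scheme). */
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Number of records each subtask would receive. */
    static long[] loadPerSubtask(List<?> keys, int parallelism) {
        long[] load = new long[parallelism];
        for (Object key : keys) {
            load[subtaskFor(key, parallelism)]++;
        }
        return load;
    }

    /** Busiest subtask's load divided by the average load; 1.0 means perfectly even. */
    static double skewFactor(long[] load) {
        long max = 0, sum = 0;
        for (long l : load) {
            max = Math.max(max, l);
            sum += l;
        }
        return sum == 0 ? 1.0 : (double) max * load.length / sum;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 900; i++) keys.add(42); // one hypothetical hot key
        for (int i = 0; i < 100; i++) keys.add(i);  // 100 distinct cold keys
        long[] load = loadPerSubtask(keys, 4);
        System.out.printf("skew factor: %.2f%n", skewFactor(load)); // 3.70
    }
}
```

Here the hot key sends 925 of 1,000 records to one of four subtasks, so that subtask does 3.7 times the average work no matter how high the parallelism is set.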

Supplement:

#GC log file path
-Xloggc:<LOG_DIR>/gc.log
#GC details
-XX:+PrintGCDetails
#Keep full stack traces even for frequently thrown exceptions
-XX:-OmitStackTraceInFastThrow
#Print GC time information
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
#Rotate GC log files: keep up to 20 files of at most 20 MB each
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=20M
#Sets the ratio of the old generation to the young generation. A value of 2 means old:young = 2:1, so the young generation takes 1/3 of the heap and the old generation 2/3.
#Likewise, a value of 3 means young:old = 1:3, so the young generation takes 1/4 of the young and old generations combined.
-XX:NewRatio=2
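The NewRatio arithmetic above can be worked through directly. This is a minimal sketch that computes the nominal split only; the heap size is a hypothetical value, and the JVM rounds sizes to its own alignment, while some collectors size the young generation adaptively.

```java
public class GenerationSizes {
    /** Young generation implied by -XX:NewRatio=n: old:young = n:1, so young = heap / (n + 1). */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Old generation takes the rest of the heap. */
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 3L * 1024 * 1024 * 1024; // hypothetical 3 GiB heap
        System.out.println("young: " + youngBytes(heap, 2) / (1024 * 1024) + " MiB"); // 1024 MiB
        System.out.println("old:   " + oldBytes(heap, 2) / (1024 * 1024) + " MiB");   // 2048 MiB
    }
}
```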
======================================================================================================

Heap settings
-Xms: initial heap size
-Xmx: maximum heap size
-XX:NewSize=n: sets the size of the young generation
-XX:NewRatio=n: sets the ratio of the old generation to the young generation. For example, 3 means young:old = 1:3, so the young generation takes 1/4 of the young and old generations combined
-XX:SurvivorRatio=n: sets the ratio of Eden to one survivor space in the young generation. There are two survivor spaces, so 3 means Eden:Survivor = 3:2, and each survivor space takes 1/5 of the young generation
-XX:MaxPermSize=n: sets the permanent generation size (JDK 7 and earlier; JDK 8 replaced it with -XX:MaxMetaspaceSize)
Collector settings
-XX:+UseSerialGC: use the serial collector
-XX:+UseParallelGC: use the parallel collector
-XX:+UseParallelOldGC: use the parallel collector for the old generation
-XX:+UseConcMarkSweepGC: use the concurrent (CMS) collector
Garbage collection statistics
-XX:+PrintHeapAtGC: print heap details at each GC
-XX:+PrintGCDetails: print GC details
-XX:+PrintGCTimeStamps: print GC time information
-XX:+PrintTenuringDistribution: print object age (tenuring) information
-XX:+HandlePromotionFailure: old-generation promotion guarantee (true or false)
Parallel collector settings
-XX:ParallelGCThreads=n: sets the number of CPUs used by the parallel collector, i.e., the number of parallel collection threads
-XX:MaxGCPauseMillis=n: sets the maximum pause time for parallel collection
-XX:GCTimeRatio=n: sets the target share of GC time in total run time, computed as 1/(1+n)
Concurrent collector settings
-XX:+CMSIncrementalMode: enables incremental mode, intended for single-CPU machines
-XX:ParallelGCThreads=n: sets the number of CPUs (parallel collection threads) the concurrent collector uses when it collects the young generation in parallel
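The SurvivorRatio and GCTimeRatio formulas in the list above reduce to simple arithmetic. The sketch below computes the nominal values only, using a hypothetical young-generation size; real JVMs apply alignment and may resize the spaces adaptively.

```java
public class GcRatios {
    /** One survivor space under -XX:SurvivorRatio=n: Eden:S0:S1 = n:1:1, so survivor = young / (n + 2). */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets whatever the two survivor spaces do not. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Target GC share of total time under -XX:GCTimeRatio=n: 1 / (1 + n). */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        long young = 500L * 1024 * 1024; // hypothetical 500 MiB young generation
        System.out.println("survivor: " + survivorBytes(young, 3) / (1024 * 1024) + " MiB"); // 100 MiB
        System.out.println("eden:     " + edenBytes(young, 3) / (1024 * 1024) + " MiB");     // 300 MiB
        System.out.printf("GC time target: %.1f%%%n", gcTimeFraction(99) * 100);          // 1.0%
    }
}
```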

2. Set parallelism

Operation scenario

  • Parallelism controls the number of tasks and determines how many blocks the data is split into after an operation. Tune the parallelism to balance the number of tasks, the amount of data each task processes, and the processing capacity of the machines.
  • Check CPU and memory usage. When tasks and data are concentrated on a few nodes instead of being evenly distributed, increase the parallelism so that tasks and data spread more evenly across nodes. Higher task parallelism makes full use of the cluster's computing power. In general, set the parallelism to 2-3 times the total number of CPU cores in the cluster.
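The 2-3× rule of thumb above is straightforward arithmetic. This is a minimal sketch with a hypothetical cluster size; the result is only a starting point to adjust against observed CPU and memory usage.

```java
public class ParallelismHint {
    /** Range implied by the "2-3 times the total CPU cores" rule of thumb. */
    static int[] suggestedRange(int nodes, int coresPerNode) {
        int totalCores = nodes * coresPerNode;
        return new int[] {2 * totalCores, 3 * totalCores};
    }

    public static void main(String[] args) {
        int[] range = suggestedRange(4, 8); // hypothetical cluster: 4 nodes x 8 cores
        System.out.println("parallelism between " + range[0] + " and " + range[1]); // 64 and 96
    }
}
```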

Operation steps

Task parallelism can be specified at the following four levels, listed from highest to lowest priority. Adjust the parallelism according to the actual memory, CPU, data volume, and application logic.

  • Operator level
    The parallelism of an individual operator, data source, or data sink can be set by calling its setParallelism() method, for example:
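import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = [...];
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new LineSplitter())        // LineSplitter: user-defined FlatMapFunction<String, Tuple2<String, Integer>>
    .keyBy(value -> value.f0)           // key by word (replaces the deprecated keyBy(0))
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))  // replaces the deprecated timeWindow(...)
    .sum(1).setParallelism(5);          // only this windowed sum runs with parallelism 5

wordCounts.print();

env.execute("Word Count Example");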
  • Execution environment level
    A Flink program runs in the context of an execution environment, which defines a default parallelism for all the operators, data sources, and data sinks it executes.
    The default parallelism of the execution environment can be set by calling its setParallelism() method, for example:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3); // default for every operator, source and sink that does not set its own
DataStream<String> text = [...];
DataStream<Tuple2<String, Integer>> wordCounts = [...];
wordCounts.print();
env.execute("Word Count Example");
  • Client level
    The parallelism can be set when the client submits a job to Flink. For the CLI client, specify it with the “-p” parameter, for example:
    ./bin/flink run -p 10 ../examples/*WordCount-java*.jar
  • System level
    At the system level, you can set the default parallelism for all execution environments with the “parallelism.default” option in the “flink-conf.yaml” file in the conf directory of the Flink client.
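The priority order of the four levels can be modeled as a simple fallback chain. The sketch below models only the rule described above and is not Flink code; a null argument stands for "not set at that level", and the example values are hypothetical.

```java
public class EffectiveParallelism {
    /** Priority: operator > execution environment > client (-p) > system (parallelism.default). */
    static int resolve(Integer operatorLevel, Integer envLevel, Integer clientLevel, int systemDefault) {
        if (operatorLevel != null) return operatorLevel;
        if (envLevel != null) return envLevel;
        if (clientLevel != null) return clientLevel;
        return systemDefault;
    }

    public static void main(String[] args) {
        // parallelism.default = 1, "flink run -p 10", env.setParallelism(3), one operator with setParallelism(5)
        System.out.println(resolve(5, 3, 10, 1));         // 5: the operator's own setting wins
        System.out.println(resolve(null, 3, 10, 1));      // 3: other operators use the environment default
        System.out.println(resolve(null, null, 10, 1));   // 10: falls back to the client -p value
        System.out.println(resolve(null, null, null, 1)); // 1: system default
    }
}
```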

3. Configure process parameters

Operation scenario

  • In Flink on YARN mode, a Flink cluster consists of JobManager and TaskManager processes, which carry most of the work of scheduling and running tasks.

  • Therefore, the configuration of the JobManager and TaskManagers has a large impact on how Flink applications execute. You can optimize the performance of a Flink cluster with the following steps.

Operation steps

1. Configure JobManager memory.
  • The JobManager is responsible for task scheduling and for message communication between the TaskManagers and the ResourceManager (RM). As the number of tasks and their parallelism grow, the JobManager needs correspondingly more memory.

Set an appropriate amount of JobManager memory based on the actual number of tasks.
• When using the yarn-session command, add the “-jm MEM” parameter to set the memory.
• When submitting in yarn-cluster mode, add the “-yjm MEM” parameter to set the memory.

2. Configure the number of TaskManagers.

Each core of a TaskManager can run one task at a time, so increasing the number of TaskManagers increases task concurrency. When resources are sufficient, increase the number of TaskManagers to improve execution efficiency.
• When using the yarn-session command, add the “-n num” parameter to set the number of TaskManagers.
• When submitting in yarn-cluster mode, add the “-yn num” parameter to set the number of TaskManagers.

3. Configure the number of TaskManager slots.

Each TaskManager has multiple cores that can run multiple tasks simultaneously, which also increases task concurrency. However, because all cores share the TaskManager's memory, balance the amount of memory against the number of cores (slots).
• When using the yarn-session command, add the “-s num” parameter to set the number of slots.
• When submitting in yarn-cluster mode, add the “-ys num” parameter to set the number of slots.
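Steps 2 and 3 together determine how many tasks can run at once: total slots equal the number of TaskManagers times the slots per TaskManager. The sketch below assumes Flink's default slot sharing, under which a job needs as many slots as its highest operator parallelism; the concrete numbers are hypothetical.

```java
public class SlotPlanner {
    /** Total slots offered by identical TaskManagers (the TaskManager count times the slots per TaskManager). */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** Fewest TaskManagers whose slots cover the job's highest operator parallelism (ceiling division). */
    static int taskManagersNeeded(int maxOperatorParallelism, int slotsPerTaskManager) {
        if (slotsPerTaskManager <= 0) throw new IllegalArgumentException("slots must be positive");
        return (maxOperatorParallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        int slotsPerTm = 4;                            // hypothetical "-s 4"
        int tms = taskManagersNeeded(10, slotsPerTm);  // hypothetical job run with "-p 10"
        System.out.println(tms + " TaskManagers, " + totalSlots(tms, slotsPerTm) + " slots"); // 3 TaskManagers, 12 slots
    }
}
```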

4. Configure TaskManager memory.

TaskManager memory is mainly used for task execution, communication, and similar work. A large task may need more resources, so increase the memory accordingly.
• When using the yarn-session command, add the “-tm MEM” parameter to set the memory.
• When submitting in yarn-cluster mode, add the “-ytm MEM” parameter to set the memory.
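Because all slots of a TaskManager share its memory, adding slots shrinks each slot's share. The sketch below is a rough budget under a loud assumption: the "reserved" amount is a hypothetical stand-in for JVM overhead, network buffers, and other memory not available to tasks, and slots do not isolate the JVM heap, so the per-slot figure is only indicative.

```java
public class SlotMemory {
    /** Naive per-slot share: (TaskManager memory - assumed reserved memory) / slots. */
    static long perSlotMb(long taskManagerMb, long reservedMb, int slots) {
        if (slots <= 0) throw new IllegalArgumentException("slots must be positive");
        return Math.max(0L, taskManagerMb - reservedMb) / slots;
    }

    public static void main(String[] args) {
        // Hypothetical "-tm 4096" TaskManager with 1024 MB assumed reserved for overhead
        for (int slots : new int[] {2, 4, 8}) {
            System.out.println(slots + " slots -> " + perSlotMb(4096, 1024, slots) + " MB per slot");
        }
        // 2 slots -> 1536, 4 slots -> 768, 8 slots -> 384 MB per slot
    }
}
```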