Flink system — task execution — tasks — task recovery

Time:2020-10-26

Restart strategies

There are three restart strategies:

  • Fixed delay: restarts the job after a fixed delay; the value in the configuration file is fixed-delay
  • Failure rate: restarts the job as long as the failure rate is not exceeded; the value in the configuration file is failure-rate
  • No restart: the value in the configuration file is none

Fixed delay restart strategy

The fixed delay restart strategy attempts to restart a job a given number of times. If the maximum number of attempts is exceeded, the job eventually fails. Between two consecutive restart attempts, the strategy waits a fixed amount of time.
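
The counting rule described above can be modeled in a few lines of plain Java. This is only a minimal sketch for illustration, not Flink's own implementation: the class FixedDelaySketch and its method onFailure are hypothetical names, and the sketch assumes that each failure consumes one attempt.

```java
import java.time.Duration;
import java.util.OptionalLong;

/** Hypothetical model of a fixed-delay restart decision; not a Flink class. */
final class FixedDelaySketch {
    private final int maxAttempts;
    private final Duration delay;
    private int attemptsUsed = 0;

    FixedDelaySketch(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    /**
     * Called once per job failure. Returns the wait time in milliseconds before
     * the next restart, or empty when all attempts are used up and the job fails.
     */
    OptionalLong onFailure() {
        if (attemptsUsed >= maxAttempts) {
            return OptionalLong.empty();
        }
        attemptsUsed++;
        return OptionalLong.of(delay.toMillis());
    }

    public static void main(String[] args) {
        // Same values as in the configuration example: 3 attempts, 10 s delay.
        FixedDelaySketch strategy = new FixedDelaySketch(3, Duration.ofSeconds(10));
        for (int failure = 1; failure <= 4; failure++) {
            OptionalLong wait = strategy.onFailure();
            System.out.println("failure " + failure + ": "
                + (wait.isPresent() ? "restart after " + wait.getAsLong() + " ms" : "job fails"));
        }
    }
}
```

With 3 attempts, the first three failures each lead to a restart after 10000 ms, and the fourth failure makes the job fail.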

This strategy is enabled by setting the following configuration parameters in flink-conf.yaml; it is also the default strategy when checkpointing is enabled.

restart-strategy: fixed-delay
# Number of times Flink retries the execution before the job is declared failed. The default value is 1.
restart-strategy.fixed-delay.attempts: 3
# Delay between retries: after a failed execution, re-execution does not start immediately but only after this delay. The default is 10s.
restart-strategy.fixed-delay.delay: 10 s

The fixed delay restart strategy can also be set programmatically:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
  3, // number of restart attempts
  Time.of(10, TimeUnit.SECONDS) // delay
));

Failure rate restart strategy

The failure rate restart strategy restarts the job after a failure, but once the failure rate (failures per time interval) is exceeded, the job eventually fails. Between two consecutive restart attempts, the strategy waits a fixed amount of time.
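
One way to picture "failures per time interval" is a sliding window over failure timestamps. The sketch below is a hedged illustration of that idea only, not Flink's actual code: FailureRateSketch and canRestartAfterFailure are hypothetical names, and the rule it encodes (the job fails once more than the allowed number of failures fall inside the interval) is an assumption taken from the description in this article.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window model of a failure-rate check; not a Flink class. */
final class FailureRateSketch {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateSketch(int maxFailuresPerInterval, Duration interval) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = interval.toMillis();
    }

    /** Records a failure at nowMillis and reports whether a restart is still allowed. */
    boolean canRestartAfterFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
        // Drop failures that have fallen out of the measuring interval.
        while (nowMillis - failureTimestamps.peekFirst() > intervalMillis) {
            failureTimestamps.removeFirst();
        }
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }

    public static void main(String[] args) {
        // Same values as in the configuration example: 3 failures per 5 minutes.
        FailureRateSketch strategy = new FailureRateSketch(3, Duration.ofMinutes(5));
        for (long minute = 0; minute < 4; minute++) {
            boolean restart = strategy.canRestartAfterFailure(minute * 60_000L);
            System.out.println("failure at minute " + minute + ": " + (restart ? "restart" : "job fails"));
        }
    }
}
```

In this run the failures at minutes 0, 1, and 2 are followed by a restart, while the fourth failure at minute 3 exceeds the rate and the job fails. The fixed delay between restart attempts is left out to keep the sketch short.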

This strategy is enabled by setting the following configuration parameters in flink-conf.yaml.
With the values below, the job fails if it fails more than three times within 5 minutes.

restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 5 min
restart-strategy.failure-rate.delay: 10 s

The failure rate restart strategy can also be set programmatically:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.failureRateRestart(
  3, // max failures per interval
  Time.of(5, TimeUnit.MINUTES), // time interval for measuring failure rate
  Time.of(10, TimeUnit.SECONDS) // delay
));

No restart strategy

The job fails directly, and no restart is attempted.

restart-strategy: none

The no restart strategy can also be set programmatically:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.noRestart());

Failover strategies

Flink supports different failover strategies, which are selected through the jobmanager.execution.failover-strategy option in the flink-conf.yaml configuration file. There are two choices:

  • Restart all: the value in the configuration file is full
  • Restart pipelined region: the value in the configuration file is region

Restart all (full)

This strategy restarts all tasks in the job to recover from a task failure.

Restart pipelined region failover strategy (region)

This strategy groups tasks into disjoint regions. When a task failure is detected, it computes the smallest set of regions that must be restarted to recover from the failure. For some jobs, this results in fewer tasks being restarted than with the restart all failover strategy.
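
To make the region idea concrete, here is a simplified sketch in plain Java, not Flink's implementation. It rests on several labeled assumptions: tasks are plain integer ids, tasks connected by a pipelined data exchange belong to the same region, a failure restarts the failed task's region plus every region that consumes from it (transitively), and all intermediate results produced upstream are still available. The names RegionFailoverSketch, Edge, and tasksToRestart are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of region-based failover; task ids and Edge are illustrative only. */
final class RegionFailoverSketch {

    /** A data exchange from producer to consumer; pipelined edges bind both tasks into one region. */
    static final class Edge {
        final int producer;
        final int consumer;
        final boolean pipelined;

        Edge(int producer, int consumer, boolean pipelined) {
            this.producer = producer;
            this.consumer = consumer;
            this.pipelined = pipelined;
        }
    }

    private final int[] parent; // union-find forest over task ids
    private final List<Edge> edges;

    RegionFailoverSketch(int taskCount, List<Edge> edges) {
        this.edges = edges;
        this.parent = new int[taskCount];
        for (int i = 0; i < taskCount; i++) {
            parent[i] = i;
        }
        // Merge the two endpoints of every pipelined edge into one region.
        for (Edge e : edges) {
            if (e.pipelined) {
                parent[find(e.producer)] = find(e.consumer);
            }
        }
    }

    /** Returns the representative task of the region that contains the given task. */
    private int find(int task) {
        while (parent[task] != task) {
            parent[task] = parent[parent[task]]; // path halving
            task = parent[task];
        }
        return task;
    }

    /** Tasks to restart: the failed task's region plus all regions consuming from it, transitively. */
    Set<Integer> tasksToRestart(int failedTask) {
        Set<Integer> regions = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.add(find(failedTask));
        while (!pending.isEmpty()) {
            int region = pending.remove();
            if (!regions.add(region)) {
                continue; // region already scheduled for restart
            }
            for (Edge e : edges) {
                if (find(e.producer) == region) {
                    pending.add(find(e.consumer));
                }
            }
        }
        Set<Integer> tasks = new TreeSet<>();
        for (int t = 0; t < parent.length; t++) {
            if (regions.contains(find(t))) {
                tasks.add(t);
            }
        }
        return tasks;
    }
}
```

As a usage example, take six tasks with pipelined edges 0→1, 2→3, and 4→5 and a non-pipelined edge 1→2. The regions are {0, 1}, {2, 3}, and {4, 5}. A failure of task 2 restarts only tasks 2 and 3, and a failure of task 0 restarts tasks 0 to 3, whereas the restart all strategy would restart all six tasks in both cases.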
