Flink source code reading (10) — Flink heartbeat mechanism

Time:2022-1-7

1. Background

A heartbeat mechanism detects whether a client or server is still alive by having one side send requests to the other at regular intervals. There are two common approaches:
   1. The socket option SO_KEEPALIVE provides a built-in heartbeat: heartbeat packets are sent to the peer periodically, and the peer replies automatically on receipt;
   2. The application implements the heartbeat itself, again by sending requests at regular intervals.
Flink implements the second scheme.

In the Flink engine, the RM (ResourceManager), JM (JobMaster) and TM (TaskExecutor) use heartbeats to monitor one another.

  1. The RM actively sends requests to detect whether the JobMaster and the TaskExecutor are alive.
  2. The JobMaster actively sends requests to detect whether the TaskExecutor is alive, so that it can restart tasks or handle failures.
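
The application-level scheme can be sketched in a few lines. The class below is a hypothetical, heavily simplified monitor (it is not Flink's heartbeat manager; the name SimpleHeartbeatMonitor and the use of plain string ids and caller-supplied timestamps are assumptions made for illustration). It only shows the core idea: remember when each target last answered, and treat a target as dead once its last answer is older than the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified monitor; not Flink's actual heartbeat manager. */
class SimpleHeartbeatMonitor {
    private final long timeoutMillis;
    // target id -> timestamp of the last sign of life
    private final Map<String, Long> lastSeen = new HashMap<>();

    SimpleHeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Start monitoring a target; registration counts as the first sign of life. */
    void monitorTarget(String targetId, long nowMillis) {
        lastSeen.put(targetId, nowMillis);
    }

    /** Record a heartbeat answer; answers from unknown targets are ignored. */
    void receiveHeartbeat(String targetId, long nowMillis) {
        lastSeen.computeIfPresent(targetId, (id, old) -> nowMillis);
    }

    /** Return every target whose last answer is older than the timeout. */
    List<String> timedOutTargets(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a real implementation would typically schedule the timeout check on an executor instead.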

2. Heartbeat registration

An earlier article on job execution introduced the job startup steps. Here we walk through them again to see how the Flink heartbeat mechanism works.

2.1 interaction between JobMaster and ResourceManager

Here we mainly look at the JobMaster#startJobMasterServices method:

    private void startJobMasterServices() throws Exception {
        startHeartbeatServices();

        // start the slot pool make sure the slot pool now accepts messages for this leader
        slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());

        // TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
        // try to reconnect to previously known leader
        reconnectToResourceManager(new FlinkException("Starting JobMaster component."));

        // job is ready to go, try to establish connection with resource manager
        //   - activate leader retrieval for the resource manager
        //   - on notification of the leader, the connection will be established and
        //     the slot pool will start requesting slots
        resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
    }
2.1.1 heartbeat service startup: startHeartbeatServices
  • The JobMaster has two heartbeat services: taskManagerHeartbeatManager (created by heartbeatServices.createHeartbeatManagerSender) and resourceManagerHeartbeatManager (created by heartbeatServices.createHeartbeatManager).
  • taskManagerHeartbeatManager checks whether the heartbeats of all TMs registered with the JM are normal, and actively sends heartbeat requests to all of them; resourceManagerHeartbeatManager handles the heartbeat exchange with the RM and accepts heartbeat requests from the RM.
2.1.2 information registration

Call chain:

JobMaster#reconnectToResourceManager --> JobMaster#tryConnectToResourceManager -->  JobMaster#connectToResourceManager --> RegisteredRpcConnection#start 

  – JM registers with RM.
  – After JM has registered with RM successfully, JM records the monitoring information of RM: RM is added as a monitoring target in the JobMaster's resourceManagerHeartbeatManager. Because RM is the side that actively sends heartbeat requests to JM, the target registered in the JobMaster's resourceManagerHeartbeatManager does not need to implement requestHeartbeat (used when actively sending heartbeats to a target); receiveHeartbeat is implemented to respond to the heartbeat requests from RM.
  – After JM has registered, RM also records the monitoring information of JM: jobManagerResourceId is added as a monitoring target in RM's jobManagerHeartbeatManager. Because RM actively sends heartbeat requests to JM, the target in RM's jobManagerHeartbeatManager only needs to implement requestHeartbeat (used to send heartbeat requests to JM); receiveHeartbeat does not need to be implemented.
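
The asymmetry between the two sides (one implements only requestHeartbeat, the other only receiveHeartbeat) can be illustrated with a small model. Everything below is a sketch under assumptions: SketchHeartbeatTarget, PassiveHeartbeatManager and SenderHeartbeatManager are hypothetical names, ids are plain strings, and payloads and RPC are omitted. It is not Flink's implementation; it only mirrors the roles described above, with the RM as the sending side and the JM as the answering side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a heartbeat target; the signatures are assumptions. */
interface SketchHeartbeatTarget {
    void requestHeartbeat(String origin); // "please answer me"
    void receiveHeartbeat(String origin); // "here is my answer"
}

/** Passive side: never initiates, only answers incoming requests. */
class PassiveHeartbeatManager {
    final String ownId;
    final Map<String, SketchHeartbeatTarget> targets = new HashMap<>();
    final List<String> log = new ArrayList<>();

    PassiveHeartbeatManager(String ownId) {
        this.ownId = ownId;
    }

    void monitorTarget(String id, SketchHeartbeatTarget target) {
        targets.put(id, target);
    }

    /** A request arrived from origin: answer through the registered target. */
    void requestHeartbeat(String origin) {
        log.add("request from " + origin);
        SketchHeartbeatTarget target = targets.get(origin);
        if (target != null) {
            target.receiveHeartbeat(ownId);
        }
    }

    void receiveHeartbeat(String origin) {
        log.add("reply from " + origin);
    }
}

/** Active side: additionally triggers requests to all monitored targets. */
class SenderHeartbeatManager extends PassiveHeartbeatManager {
    SenderHeartbeatManager(String ownId) {
        super(ownId);
    }

    /** A real system would call this from a scheduler at every heartbeat interval. */
    void triggerRound() {
        for (SketchHeartbeatTarget target : targets.values()) {
            target.requestHeartbeat(ownId);
        }
    }
}

class HeartbeatRolesDemo {
    static List<String> run() {
        SenderHeartbeatManager rm = new SenderHeartbeatManager("rm");
        PassiveHeartbeatManager jm = new PassiveHeartbeatManager("jm");
        // RM's view of JM: only requestHeartbeat does anything.
        rm.monitorTarget("jm", new SketchHeartbeatTarget() {
            public void requestHeartbeat(String origin) { jm.requestHeartbeat(origin); }
            public void receiveHeartbeat(String origin) { /* not needed on the RM side */ }
        });
        // JM's view of RM: only receiveHeartbeat does anything.
        jm.monitorTarget("rm", new SketchHeartbeatTarget() {
            public void requestHeartbeat(String origin) { /* never called on the JM side */ }
            public void receiveHeartbeat(String origin) { rm.receiveHeartbeat(origin); }
        });
        rm.triggerRound();
        List<String> all = new ArrayList<>(jm.log);
        all.addAll(rm.log);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model one round produces a request logged on the JM side followed by a reply logged on the RM side, which is the round trip the registration steps set up.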

2.2 interaction between TaskManager, ResourceManager and JobMaster

The entry point for requesting TaskManagers is JobMaster#resetAndStartScheduler, which is invoked during job deployment and restart.

2.2.1 TaskManager request and startup
Requesting a TM

The TM request is buried deep in the code, so here is the call chain:

JobMaster#resetAndStartScheduler --> JobMaster#startScheduling --> SchedulerBase#startScheduling --> DefaultScheduler#startSchedulingInternal --> PipelinedRegionSchedulingStrategy#startScheduling --> PipelinedRegionSchedulingStrategy#maybeScheduleRegions --> PipelinedRegionSchedulingStrategy#maybeScheduleRegion --> DefaultScheduler#allocateSlotsAndDeploy  --> DefaultScheduler#allocateSlots --> SlotSharingExecutionSlotAllocator#allocateSlotsFor --> SlotSharingExecutionSlotAllocator#getOrAllocateSharedSlot --> PhysicalSlotProviderImpl#allocatePhysicalSlot --> PhysicalSlotProviderImpl#requestNewSlot --> SlotPoolImpl#requestNewAllocatedSlot --> SlotPoolImpl#requestNewAllocatedSlotInternal --> SlotPoolImpl#requestSlotFromResourceManager --> ResourceManager#requestSlot --> SlotManagerImpl#registerSlotRequest --> SlotManagerImpl#internalRequestSlot --> SlotManagerImpl#fulfillPendingSlotRequestWithPendingTaskManagerSlot --> SlotManagerImpl#allocateResource --> ResourceActionsImpl#allocateResource --> ActiveResourceManager#startNewWorker -->  ActiveResourceManager#requestNewWorker --> YarnResourceManagerDriver#requestResource --> AMRMClientAsyncImpl#addContainerRequest

During job deployment, if freeSlots contains no available slot, resources are requested from the RM. The request follows the call chain above. Finally, addContainerRequest adds the request to a map so that the resources are requested asynchronously.

TM start

Once the resources have been allocated, the callback AMRMClientAsync#onContainersAllocated --> YarnResourceManagerDriver#onContainersAllocated is invoked. This method starts the TM asynchronously (note: the TM is started asynchronously through the YARN NodeManager client).
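
The request-then-callback pattern described here can be sketched with CompletableFuture. The class below is hypothetical (SketchResourceDriver and its methods are illustrative names, not the YARN or Flink API): a request is parked in a map and returns immediately, and a later callback from the cluster manager completes it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical driver: resource requests are parked until the cluster calls back. */
class SketchResourceDriver {
    // resource profile -> futures waiting for a container of that profile
    private final Map<String, Queue<CompletableFuture<String>>> pending = new HashMap<>();

    /** Non-blocking: records the request and returns a future for the container. */
    CompletableFuture<String> requestResource(String profile) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.computeIfAbsent(profile, p -> new ArrayDeque<>()).add(future);
        return future;
    }

    /** Callback from the cluster manager; returns false for a container nobody asked for. */
    boolean onContainerAllocated(String profile, String containerId) {
        Queue<CompletableFuture<String>> queue = pending.get(profile);
        CompletableFuture<String> future = (queue == null) ? null : queue.poll();
        if (future == null) {
            return false;
        }
        // In the real flow the TM process would be launched in this container.
        future.complete(containerId);
        return true;
    }
}
```

The caller never blocks: it attaches follow-up work to the returned future, and that work runs once the allocation callback arrives.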

Next, let's look at the TM startup process in detail, which covers how it registers with RM and JM. The entry point is the TaskExecutor#onStart method; we focus on startTaskExecutorServices:

    private void startTaskExecutorServices() throws Exception {
        try {
            // start by connecting to the ResourceManager
            // resourceManagerLeaderRetriever is actually an implementation of EmbeddedLeaderService, a simple leader election service that selects a leader among contenders and notifies listeners
            resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());

            // tell the task slot table who's responsible for the task slot actions
            taskSlotTable.start(new SlotActionsImpl(), getMainThreadExecutor());

            // start the job leader service
            jobLeaderService.start(
                    getAddress(), getRpcService(), haServices, new JobLeaderListenerImpl());

            fileCache =
                    new FileCache(
                            taskManagerConfiguration.getTmpDirectories(),
                            blobCacheService.getPermanentBlobService());
        } catch (Exception e) {
            handleStartTaskExecutorServicesException(e);
        }
    }
2.2.2 interaction between TaskManager and ResourceManager

Look at ResourceManagerLeaderListener. After the leader is elected, its notifyLeaderAddress method is called, which calls notifyOfNewResourceManagerLeader and finally TaskExecutor#connectToResourceManager --> RegisteredRpcConnection#start:

    public void start() {
        checkState(!closed, "The RPC connection is already closed");
        checkState(
                !isConnected() && pendingRegistration == null,
                "The RPC connection is already started");

        final RetryingRegistration<F, G, S> newRegistration = createNewRegistration();

        if (REGISTRATION_UPDATER.compareAndSet(this, null, newRegistration)) {
            newRegistration.startRegistration();
        } else {
            // concurrent start operation
            newRegistration.cancel();
        }
    }
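
The compare-and-set guard in start() can be shown in isolation. The sketch below uses the JDK's AtomicReferenceFieldUpdater in the same way as the listing, on a hypothetical class (StartOnceConnection and its String field are assumptions chosen for brevity): only the first caller installs its registration, and any concurrent or later caller sees false and would cancel its own attempt.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

/** Hypothetical connection class; only the start-once guard mirrors the listing. */
class StartOnceConnection {
    private static final AtomicReferenceFieldUpdater<StartOnceConnection, String> PENDING =
            AtomicReferenceFieldUpdater.newUpdater(
                    StartOnceConnection.class, String.class, "pending");

    // The updater requires the field to be volatile.
    private volatile String pending;

    /** Returns true if this call installed the registration, false if one was already set. */
    boolean start(String registration) {
        return PENDING.compareAndSet(this, null, registration);
    }

    String pending() {
        return pending;
    }
}
```

A field updater avoids allocating a separate AtomicReference object per connection while giving the same atomic "set if still null" behaviour.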

As with JM registration, RegisteredRpcConnection#start is executed, but the concrete implementations of invokeRegistration and onRegistrationSuccess differ.

  • Step 1: register (invokeRegistration)
       1. For JM, this corresponds to registerJobManager.
       2. For TM, this corresponds to registerTaskExecutor:
         2.1 ResourceManager#registerTaskExecutor
           > If this TM has registered before (its taskExecutorResourceId exists in the taskExecutors map), all slots of this TM are released (that is, the TM's slots are removed from the RM).
           > The TM's heartbeat information is added to RM: the TM is added as a monitoring target of RM's taskManagerHeartbeatManager so that RM can monitor whether the TM is alive. Because RM actively sends heartbeat requests to TM, the target in RM's taskManagerHeartbeatManager only needs to implement requestHeartbeat to send heartbeat requests to the TM; receiveHeartbeat does not need to be implemented.

  • Step 2: registration succeeded (onRegistrationSuccess)
       1. For JM, this corresponds to the JobMaster#establishResourceManagerConnection method:
         1.1 RM is added as a monitoring target of JM's resourceManagerHeartbeatManager, for heartbeat interaction with RM.
       2. For TM, this corresponds to the TaskExecutor#establishResourceManagerConnection method:
         2.1 RM is added as a monitoring target of TM's resourceManagerHeartbeatManager, for heartbeat interaction with RM. Because RM actively sends heartbeat requests to TM, the target in TM's resourceManagerHeartbeatManager only needs to implement receiveHeartbeat to respond to the heartbeat requests sent by RM; requestHeartbeat does not need to be implemented.
         2.2 ResourceManager#sendSlotReport --> SlotManagerImpl#registerTaskManager: registers the new TM with the SlotManager so that the TM's slots become known and can be allocated.

2.2.3 interaction between TaskManager and JobMaster

Next, follow TaskExecutor#startTaskExecutorServices --> jobLeaderService.start --> JobLeaderListenerImpl#jobManagerGainedLeadership --> the TaskExecutor#establishJobManagerConnection method:

  1. First, check whether a JobMaster connection already exists for the job.
       1.1 If it does not exist, continue.
       1.2 If a JM exists and its id differs from the new jobMasterId, the JM has been restarted; disconnectJobManagerConnection is called to fail all tasks of that job.
  2. Add heartbeat monitoring of JM to TM, for heartbeat interaction with JM.
        That is, jobManagerResourceId is added as a monitoring target of TM's jobManagerHeartbeatManager. Because JM actively sends requests to TM, the target only needs to implement receiveHeartbeat to respond to heartbeat requests from JM; requestHeartbeat does not need to be implemented.
  3. Notify the JobMaster of all slots of the job that have not yet been assigned tasks.
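
The three steps above can be condensed into a small model. This is a sketch under assumptions, not the TaskExecutor code: JobConnectionTable is a hypothetical class, ids are plain strings, the actions are only recorded as text, and the early return for an unchanged JobMaster id is an assumption that the steps above do not spell out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the TM-side bookkeeping; all names are illustrative. */
class JobConnectionTable {
    // job id -> id of the JobMaster the TM is currently connected to
    private final Map<String, String> leaderByJob = new HashMap<>();
    final List<String> events = new ArrayList<>();

    void establishJobManagerConnection(String jobId, String newJobMasterId) {
        String current = leaderByJob.get(jobId);
        if (current != null) {
            if (current.equals(newJobMasterId)) {
                return; // assumption: same leader, nothing to do
            }
            // Step 1.2: a different id means the JM was restarted, so drop the old connection.
            events.add("disconnect " + current + " (fail tasks of " + jobId + ")");
        }
        leaderByJob.put(jobId, newJobMasterId);
        // Step 2: start heartbeat monitoring of the new JM.
        events.add("monitor " + newJobMasterId);
        // Step 3: tell the new JM about the job's unassigned slots.
        events.add("offer slots to " + newJobMasterId);
    }
}
```

Connecting the same job first to "jm-1" and later to "jm-2" records a disconnect of "jm-1" before the new JM is monitored, which is the restart case from step 1.2.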

3. Heartbeat timeout processing

HeartbeatListener is the common interface for heartbeat timeout handling. RM, JM and TM each have their own implementation classes. Below we look at how each of them handles a heartbeat timeout.

3.1 in RM, JM heartbeat timeout is detected

The ResourceManager#closeJobManagerConnection method is called:
      1. RM's heartbeat service stops monitoring the JM, that is, the JM is removed from the targets of the jobManagerHeartbeatManager object.
      2. If resourceManagerAddress != null and the resourceManagerId is unchanged, JM reconnects to RM (JM saves RM's heartbeat monitoring information; SlotPoolImpl establishes a connection with RM and requests resources from RM according to the slot request queue waitingForResourceManager).

3.2 in RM, TM heartbeat timeout is detected

The ResourceManager#closeTaskManagerConnection method is called:
      1. RM's heartbeat service stops monitoring this TM, that is, the TM is removed from the targets of the taskManagerHeartbeatManager object.
      2. All slots of this TM are removed:
       2.1 the slots are removed from freeSlots of SlotManagerImpl;
       2.2 slot requests in pending state are ended with completeExceptionally;
       2.3 if a slot has been assigned a task, the task on the slot is failed and the TaskSlotState is set to RELEASING to free the allocated memory;
       2.4 if the RPC endpoint of the TM is still in the started state, TM reconnects to RM (TM saves RM's heartbeat monitoring information; RM also saves this TM in the taskExecutors map, that is, TM registers with RM).

The summary is as follows:
   The connection between TaskManager and ResourceManager is closed, and the TaskManager tries to re-register with the ResourceManager. When the ResourceManager closes the connection to the TaskManager, it also unregisters the TaskManager from the SlotManager. The SlotManager (the real work is done by the SlotPool) then releases all corresponding slots and notifies the JobMaster to fail the assigned tasks according to their allocation IDs. If registration has not succeeded after the maximum registration time, the TaskManager process exits.
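
The slot cleanup in step 2 can be sketched as follows. SketchSlotManager is a hypothetical, much reduced model (string ids, one pending request per slot); it is not SlotManagerImpl and only illustrates removing a TM's slots from the free set and failing the pending requests bound to them with completeExceptionally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, reduced slot bookkeeping; not Flink's SlotManagerImpl. */
class SketchSlotManager {
    final Map<String, String> slotOwner = new HashMap<>(); // slot id -> TM id
    final Set<String> freeSlots = new HashSet<>();
    // slot id -> pending request that was bound to this slot
    final Map<String, CompletableFuture<String>> pending = new HashMap<>();

    void registerSlot(String tmId, String slotId) {
        slotOwner.put(slotId, tmId);
        freeSlots.add(slotId);
    }

    /** Bind a pending request to a slot; the slot is no longer free. */
    CompletableFuture<String> assignPending(String slotId) {
        freeSlots.remove(slotId);
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(slotId, future);
        return future;
    }

    /** Called when the TM's heartbeat timed out: forget its slots, fail its pending requests. */
    void unregisterTaskManager(String tmId, Exception cause) {
        Iterator<Map.Entry<String, String>> it = slotOwner.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (!entry.getValue().equals(tmId)) {
                continue; // slot belongs to another TM
            }
            String slotId = entry.getKey();
            freeSlots.remove(slotId);
            CompletableFuture<String> future = pending.remove(slotId);
            if (future != null) {
                future.completeExceptionally(cause);
            }
            it.remove();
        }
    }
}
```

Failing the future exceptionally is what lets the requester notice the loss and issue a new request instead of waiting forever.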

3.3 in JM, TM heartbeat timeout is detected

The JobMaster#disconnectTaskManager method is called:
      1. JM's heartbeat service stops monitoring this TM, that is, the TM is removed from the targets of the taskManagerHeartbeatManager object.
      2. All slots on the TM are removed:
       2.1 the slots are removed from the allocatedSlots and availableSlots objects of the SlotPool;
       2.2 if a slot has been assigned a task, the task on the slot is failed and the TaskSlotState is set to RELEASING to free the allocated memory.
      3. TM directly fails all tasks of the job on this TM and disconnects from JM.
      4. TM reconnects to JM, and JM requests a TM from RM. Once the request is fulfilled, JM reschedules.

The summary is as follows:
   The TaskManager directly fails all tasks of the job that run on it. When the JobManager marks the tasks as failed, it fails them as well and reschedules.

4. Summary

4.1 heartbeat registration

  1. JobMaster heartbeat registration
    In the startJobMasterServices method, JM registers with RM and saves RM's heartbeat monitoring information. At the same time, when JM registers with RM, RM saves JM's heartbeat monitoring information and can then send heartbeat requests to JM.
  2. TaskManager heartbeat registration
    2.1 The entry point for requesting TaskManagers is JobMaster#resetAndStartScheduler, which is invoked during job deployment and restart.
    2.2 The startup entry point of the TaskManager is TaskExecutor#startTaskExecutorServices.
       2.2.1 TM first registers with RM. After successful registration, it saves RM's heartbeat monitoring information. At the same time, RM saves TM's heartbeat monitoring information and can then send heartbeat requests to TM.
       2.2.2 After registering with RM, TM registers with JM. Once TM has established a connection with JM, it saves JM's heartbeat monitoring information; JM in turn saves TM's heartbeat monitoring information and can then send heartbeat requests to TM.

4.2 heartbeat timeout processing

  1. In RM, a JM heartbeat timeout is detected: JM reconnects to RM.
  2. In RM, a TM heartbeat timeout is detected
       The connection between TaskManager and ResourceManager is closed, and the TaskManager tries to re-register with the ResourceManager. When the ResourceManager closes the connection to the TaskManager, it also unregisters the TaskManager from the SlotManager. The SlotManager (the real work is done by the SlotPool) then releases all corresponding slots and notifies the JobMaster to fail the assigned tasks according to their allocation IDs. If registration has not succeeded after the maximum registration time, the TaskManager process exits.
  3. In JM, a TM heartbeat timeout is detected
       The TaskManager directly fails all tasks of the job that run on it. When the JobManager marks the tasks as failed, it fails them as well and reschedules.
