Interview Spark Module: Spark Workflow


Question: What is Spark's workflow?

Approach to the answer

Every Spark program consists of two inseparable parts, program initialization and task execution, so the question can be answered by walking through each part in turn.

1、 Process of program initialization

  1. After the user submits the program with spark-submit, the driver program starts running (the driver is the submitted program itself and can be thought of as Spark's main program).
  2. The driver program first initializes the SparkContext.
  3. The most important job of the SparkContext object is to construct a DAGScheduler and a TaskScheduler.
  4. Once the TaskScheduler is built, it uses its background scheduler backend to register the driver's application with the Spark Master node. The registration includes the resources the Spark program requires.
  5. When the Master receives the application, it launches the corresponding executor processes on its Worker nodes according to the resources the application requires.
  6. Once an executor process on a Worker node has started, it registers back (reverse registration) with the TaskScheduler, notifying it that the executor is ready to accept work.
  7. At this point, new SparkContext() has finished initializing, and the TaskScheduler has acquired its executor resources.
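The initialization steps above can be sketched as a small Python toy model. Everything in it is hypothetical: AppDescription, Worker, Master, Executor, TaskScheduler, DAGScheduler and SparkContext only mimic the roles described in steps 1 to 7, they are not Spark's real classes or APIs, and the registrations that Spark performs asynchronously over RPC are modeled here as plain synchronous method calls.

```python
from dataclasses import dataclass

# Toy model of Spark's initialization handshake. All class names are
# hypothetical stand-ins for the roles in steps 1-7, not Spark's real API.


@dataclass
class AppDescription:
    name: str
    num_executors: int       # resources requested by the application
    cores_per_executor: int


@dataclass
class Worker:
    host: str
    free_cores: int


class TaskScheduler:
    def __init__(self):
        self.executors = []

    def register_with_master(self, master, app):
        # Step 4: register the application (with its resource request).
        master.register_application(app, self)

    def register_executor(self, executor):
        # Step 6: an executor reports that it is ready to work.
        self.executors.append(executor)


class DAGScheduler:
    def __init__(self, task_scheduler):
        self.task_scheduler = task_scheduler


class Executor:
    def __init__(self, executor_id, host, cores, task_scheduler):
        self.executor_id = executor_id
        self.host = host
        self.cores = cores
        task_scheduler.register_executor(self)  # reverse registration


class Master:
    def __init__(self, workers):
        self.workers = workers

    def register_application(self, app, task_scheduler):
        # Step 5: start executors on workers that still have enough free cores.
        launched = 0
        for worker in self.workers:
            while (launched < app.num_executors
                   and worker.free_cores >= app.cores_per_executor):
                worker.free_cores -= app.cores_per_executor
                Executor(f"exec-{launched}", worker.host,
                         app.cores_per_executor, task_scheduler)
                launched += 1


class SparkContext:
    def __init__(self, app, master):
        # Steps 2-3: build the TaskScheduler and the DAGScheduler.
        self.task_scheduler = TaskScheduler()
        self.dag_scheduler = DAGScheduler(self.task_scheduler)
        self.task_scheduler.register_with_master(master, app)
        # Step 7: in this synchronous toy, executors are registered by now.


master = Master([Worker("worker-1", 4), Worker("worker-2", 4)])
sc = SparkContext(AppDescription("demo", num_executors=3, cores_per_executor=2), master)
print([(e.executor_id, e.host) for e in sc.task_scheduler.executors])
# [('exec-0', 'worker-1'), ('exec-1', 'worker-1'), ('exec-2', 'worker-2')]
```

The Master in this sketch simply fills workers in order until the requested executor count is reached. Spark's real Master has its own placement policies (for example spreading executors across workers), so treat the allocation logic as an assumption made for illustration.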

2、 Process of task execution

  1. Split the job into stages. When the program reaches an action, the DAGScheduler creates a job and splits it into stages at wide (shuffle) dependencies. Each stage is wrapped in a TaskSet, which is sent to the TaskScheduler. One TaskSet corresponds to one stage, and a job consists of one or more stages.
  2. Send tasks to the executors. When the TaskScheduler receives a TaskSet, it breaks it into individual tasks and submits each task to an assigned executor for execution.
  3. Run the tasks. When an executor receives a task, it wraps the task in a TaskRunner, and the TaskRunner runs on a thread taken from the executor's thread pool.
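The execution steps can likewise be sketched as a toy Python model, assuming a purely linear lineage (real Spark handles arbitrary DAGs). The names RDD, split_stages, make_task_sets, Executor.launch_task and run_job are hypothetical. The sketch shows the three ideas from the list: stages are cut at wide (shuffle) dependencies while narrow dependencies are pipelined into the same stage, each stage becomes a TaskSet with one task per partition, and each executor runs its tasks on threads from a pool. Round-robin task assignment is a simplifying assumption; Spark's TaskScheduler also takes data locality into account.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

# Toy model of job execution; names echo Spark's concepts but are hypothetical.


@dataclass
class RDD:
    name: str
    num_partitions: int
    parent: Optional["RDD"] = None
    wide: bool = False  # True if this RDD depends on `parent` through a shuffle


def split_stages(final_rdd):
    """DAGScheduler role: walk back from the action's RDD, cutting at wide deps."""
    stages, current, rdd = [], [final_rdd], final_rdd
    while rdd.parent is not None:
        if rdd.wide:
            stages.append(current)      # shuffle boundary closes the stage
            current = [rdd.parent]
        else:
            current.append(rdd.parent)  # narrow dep: pipelined into the same stage
        rdd = rdd.parent
    stages.append(current)
    # Parent stages run first; list RDDs inside each stage from source to sink.
    return [list(reversed(stage)) for stage in reversed(stages)]


def make_task_sets(stages):
    """One TaskSet per stage, one task (stage_id, partition) per partition."""
    return [[(stage_id, p) for p in range(stage[-1].num_partitions)]
            for stage_id, stage in enumerate(stages)]


class Executor:
    def __init__(self, executor_id, cores):
        self.executor_id = executor_id
        self.pool = ThreadPoolExecutor(max_workers=cores)

    def launch_task(self, task):
        # Stand-in for a TaskRunner: wrap the task and run it on a pool thread.
        stage_id, partition = task
        return self.pool.submit(
            lambda: f"stage {stage_id} partition {partition} on {self.executor_id}")


def run_job(final_rdd, executors):
    """TaskScheduler role: break each TaskSet into tasks and dispatch them."""
    results = []
    for task_set in make_task_sets(split_stages(final_rdd)):
        futures = [executors[i % len(executors)].launch_task(task)  # round-robin
                   for i, task in enumerate(task_set)]
        results.extend(f.result() for f in futures)  # finish stage before the next
    return results


# Word-count-like lineage: narrow steps, then one shuffle (reduceByKey).
lines = RDD("textFile", 4)
words = RDD("flatMap", 4, lines)
pairs = RDD("map", 4, words)
counts = RDD("reduceByKey", 2, pairs, wide=True)

stages = split_stages(counts)
print([[r.name for r in stage] for stage in stages])
# [['textFile', 'flatMap', 'map'], ['reduceByKey']]

executors = [Executor("exec-0", 2), Executor("exec-1", 2)]
results = run_job(counts, executors)
for line in results:
    print(line)
for e in executors:
    e.pool.shutdown()
```

Here the single shuffle splits the job into two stages: a 4-task TaskSet for the pipelined textFile, flatMap and map steps, followed by a 2-task TaskSet for reduceByKey.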

WeChat Official Account: Guowei

Focused on explaining big data interview questions and making each solution clear in the simplest possible language.
