Spark operation principle



The operating principle of Spark is particularly important for learning Spark: if you don't understand how it runs, you won't be able to write Spark programs well. This will be the last article on Spark theory; next, we will cover Spark in practice.

  • The composition of the Spark architecture is as follows:

[Figure: Spark architecture components]

  • Cluster Manager: in Standalone mode, this is the Master node, which controls the whole cluster and monitors the Workers; in YARN mode, it is the ResourceManager.
  • Worker node: a slave node, responsible for controlling the compute node and starting the Executor or Driver.
  • Driver: runs the main() function of the Application.
  • Executor: a process launched for an Application on a Worker node; each Executor holds a thread pool.
  • The Spark task execution flow is as follows:

[Figure: Spark task execution flow]

  1. Build the running environment of the Spark Application and start the SparkContext.
  2. The SparkContext applies to the resource manager (Standalone, Mesos, or YARN) for Executor resources, and the resource manager starts StandaloneExecutorBackend.
  3. The Executors apply to the SparkContext for Tasks.
  4. The SparkContext distributes the application code to the Executors.
  5. The SparkContext builds the DAG, decomposes it into Stages, and sends the TaskSets to the TaskScheduler, which finally sends Tasks to the Executors.
  6. Tasks run on the Executors, and all resources are released when they finish.
    Spark operation features:
  1. Each Application gets its own Executor processes, which stay up for the lifetime of the Application and run Tasks in a multithreaded manner. This application-isolation mechanism has advantages, both from the scheduling perspective (each Driver schedules its own Tasks) and from the running perspective (Tasks from different Applications run in different JVMs). Of course, it also means that Spark Applications cannot share data with each other unless the data is written to an external storage system.
  2. Spark is agnostic to the resource manager: all it needs is to acquire Executor processes and keep communicating with them.
  3. The client that submits the SparkContext should be close to the Worker nodes (the nodes running the Executors), preferably in the same rack, because there is a lot of communication between the SparkContext and the Executors while a Spark Application runs.
  4. Tasks use data-locality and speculative-execution optimizations.
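The data-locality preference mentioned above can be sketched as a toy assignment rule: prefer an executor on the host that already holds the task's data, and fall back to any executor otherwise. This is a conceptual sketch only; all names are hypothetical and Spark's real TaskScheduler (with PROCESS_LOCAL/NODE_LOCAL/RACK_LOCAL levels and wait timeouts) is far more involved.

```python
# Conceptual sketch of data-locality-aware task assignment.
# Hypothetical names -- this is NOT Spark's real API.
def assign_task(task_data_host, executors):
    """Pick an executor on the same host as the task's input data if
    possible (node-local), otherwise fall back to any executor."""
    for ex in executors:
        if ex["host"] == task_data_host:
            return ex["id"]          # node-local assignment
    return executors[0]["id"]        # no locality: run anywhere

executors = [{"id": "exec-1", "host": "worker-a"},
             {"id": "exec-2", "host": "worker-b"}]

print(assign_task("worker-b", executors))  # data lives on worker-b -> exec-2
print(assign_task("worker-z", executors))  # no local executor -> exec-1
```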

Common terms:

  • Application: a Spark application written by the user, consisting of the code for one Driver and the Executor code that runs on multiple nodes in the cluster.
  • Driver: the Driver runs the main function of the Application and creates the SparkContext; the purpose of creating the SparkContext is to prepare the running environment of the Spark Application. In Spark, the SparkContext is responsible for communicating with the Cluster Manager, applying for resources, and allocating and monitoring Tasks. Once the Executor side has finished running, the Driver is also responsible for closing the SparkContext. The SparkContext is usually used to represent the Driver.
  • Executor: a process launched for an Application on a Worker node. It is responsible for running Tasks and keeping data in memory or on disk, and each Application has its own batch of Executors. In Spark on YARN mode the process is named CoarseGrainedExecutorBackend. A CoarseGrainedExecutorBackend has exactly one Executor object, which wraps each Task as a TaskRunner and takes an idle thread from the thread pool to run it. The number of Tasks each CoarseGrainedExecutorBackend can run in parallel depends on the number of CPUs allocated to it.
  • Cluster Manager: the external service that acquires resources on the cluster. There are currently three types:

    1. Standalone: Spark's native resource management, with the Master responsible for resource allocation.
    2. Apache Mesos: a resource-scheduling framework with good compatibility with Hadoop MapReduce.
    3. Hadoop YARN: mainly refers to the ResourceManager in YARN.
  • Worker: any node in the cluster that can run Application code. In Standalone mode it refers to the worker nodes configured through the slaves file; in Spark on YARN mode it is the NodeManager node.
  • Task: a unit of work sent to an Executor; it is the basic unit for running an Application, analogous to MapTask and ReduceTask in Hadoop MapReduce. Multiple Tasks make up a Stage, and the TaskScheduler is responsible for scheduling and managing them.
  • Job: a parallel computation consisting of multiple Tasks, usually triggered by a Spark Action. One Application often generates multiple Jobs.
  • Stage: each Job is divided into multiple groups of Tasks; each group, as a TaskSet, is called a Stage. The DAGScheduler is responsible for dividing and scheduling Stages. Stages come in two types: non-final Stages (ShuffleMapStage) and final Stages (ResultStage). A Stage boundary is wherever a shuffle occurs.
  • DAGScheduler: builds the DAG (directed acyclic graph) of Stages for each Job and submits the Stages to the TaskScheduler. Stages are divided based on the dependencies between RDDs, aiming to find the lowest-cost scheduling, as shown in the following figure.

[Figure: DAGScheduler stage division]

  • TaskScheduler: submits TaskSets to the Workers to run. The TaskScheduler maintains all TaskSets; when an Executor sends a heartbeat to the Driver, the TaskScheduler allocates Tasks according to the remaining resources. It also tracks the running state of all Tasks and retries failed ones. The figure below shows the role of the TaskScheduler.
  • [Figure: role of the TaskScheduler]
  • In different run modes, the TaskScheduler is as follows:

    1. Spark on Standalone mode: TaskScheduler
    2. yarn-client mode: YarnClientClusterScheduler
    3. yarn-cluster mode: YarnClusterScheduler
  • The runtime hierarchy of these terms is as follows:
  • [Figure: runtime hierarchy of Spark terms]
  • Job = multiple Stages; Stage = multiple Tasks of the same kind. Tasks are divided into ShuffleMapTask and ResultTask, and dependencies are divided into ShuffleDependency and NarrowDependency.
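The stage-splitting rule above (a new Stage begins at every shuffle boundary) can be sketched in plain Python. This is a conceptual illustration only, not Spark's DAGScheduler; the operator names and the shuffle flags are assumptions for the example.

```python
# Conceptual sketch: split a job's operator chain into stages at shuffle
# boundaries (illustrative only; not Spark's DAGScheduler).
def split_into_stages(ops):
    """ops: list of (name, is_shuffle). A shuffling operator closes the
    current stage (a ShuffleMapStage); what follows starts a new one."""
    stages, current = [], []
    for name, is_shuffle in ops:
        current.append(name)
        if is_shuffle:               # stage boundary: shuffle occurs here
            stages.append(current)
            current = []
    if current:
        stages.append(current)       # the final stage (ResultStage)
    return stages

job = [("textFile", False), ("map", False),
       ("reduceByKey", True),        # wide dependency -> shuffle
       ("mapValues", False), ("collect", False)]

print(split_into_stages(job))
# -> [['textFile', 'map', 'reduceByKey'], ['mapValues', 'collect']]
```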

Spark run modes:

  • Spark's run modes are varied and flexible. Deployed on a single machine, it can run in local mode or in pseudo-distributed mode; deployed on a distributed cluster, there are many run modes to choose from, depending on the actual situation of the cluster. The underlying resource scheduling can either rely on an external resource-scheduling framework or use Spark's built-in Standalone mode.
  • As for external resource-scheduling frameworks, the current implementations include the relatively stable Mesos mode and the Hadoop YARN mode.
  • Local mode: commonly used for local development and testing; it comes in local mode and local-cluster mode.
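For reference, the local variants are selected through the master URL passed at launch; a sketch of the common forms (the resource numbers here are illustrative):

```shell
# Local mode master URLs (illustrative command sketches; they assume a
# Spark installation on the PATH):
spark-shell --master local        # single thread
spark-shell --master local[4]     # 4 worker threads

# local-cluster: pseudo-distributed mode for testing --
# 2 workers, 1 core and 1024 MB of memory each (example values)
spark-shell --master "local-cluster[2,1,1024]"
```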

Standalone: independent cluster operation mode

  • Standalone mode uses Spark's own resource-scheduling framework.
  • It adopts the typical master/slaves architecture, with ZooKeeper used to provide HA for the Master.
  • The framework structure is as follows:
  • [Figure: Standalone mode architecture]

The main nodes in this mode are the Client node, the Master node, and the Worker nodes. The Driver can run either on the Master node or on the local Client. When the spark-shell interactive tool is used to submit a Spark Job, the Driver runs on the Master node; when the spark-submit tool is used, or when the task is launched from a development platform such as Eclipse or IDEA with new SparkConf().setMaster("spark://master..."), the Driver runs on the local Client.
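The two submission paths above can be sketched as commands. These are illustrative only: the host name "master" and the application jar name are placeholders, and 7077 is assumed as the default Standalone master port.

```shell
# Illustrative Standalone-mode launches (host/port/jar are placeholders):

# spark-shell: interactive submission; per the text, the Driver
# runs on the Master node in this case
spark-shell --master spark://master:7077

# spark-submit with client deploy mode: the Driver runs on the
# local Client machine that performs the submission
spark-submit --master spark://master:7077 --deploy-mode client myapp.jar
```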

Running in YARN mode:

  • Spark on YARN mode can be divided into two modes according to where the Driver runs in the cluster: yarn-client mode and yarn-cluster (also called yarn-standalone) mode.
  • In yarn-client mode, the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its status can be accessed through the Web UI, by default at http://hadoop1:4040, while YARN itself is reachable at http://hadoop1:8088.

The workflow of yarn-client consists of the following steps:
[Figure: yarn-client workflow]

  • In yarn-cluster mode, when a user submits an Application to YARN, YARN runs it in two phases:

    1. In the first phase, the Spark Driver is started as an ApplicationMaster in the YARN cluster;
    2. In the second phase, the ApplicationMaster creates the Application, applies to the ResourceManager for resources on its behalf, starts Executors to run Tasks, and monitors the whole run until it completes.
  • The yarn-cluster workflow is divided into the following steps:
  • [Figure: yarn-cluster workflow]
  1. The Spark YARN client submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, the program to be run in the Executors, etc.
  2. After receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks the NodeManager to start the application's ApplicationMaster in this Container; the ApplicationMaster then initializes the SparkContext, etc.
  3. The ApplicationMaster registers with the ResourceManager, so that users can view the application's running status directly through the ResourceManager. It then applies for resources for the Tasks by polling over an RPC protocol, and monitors their running status until the run ends.
  4. Once the ApplicationMaster has obtained a resource (a Container), it communicates with the corresponding NodeManager and asks it to start a CoarseGrainedExecutorBackend in the obtained Container. After starting, the CoarseGrainedExecutorBackend registers with the SparkContext in the ApplicationMaster and applies for Tasks. This is the same as in Standalone mode, except that when the SparkContext is initialized in the Spark Application, CoarseGrainedSchedulerBackend works with YarnClusterScheduler to schedule Tasks; YarnClusterScheduler is just a thin wrapper around TaskSchedulerImpl that adds logic such as waiting for Executors.
  5. The SparkContext in the ApplicationMaster assigns Tasks to the CoarseGrainedExecutorBackends to execute. Each CoarseGrainedExecutorBackend runs its Tasks and reports their status and progress to the ApplicationMaster, so that the ApplicationMaster keeps track of each Task's state and can restart a Task when it fails.
  6. After the application finishes running, the ApplicationMaster applies to the ResourceManager to deregister and shuts itself down.

The differences between yarn-client and yarn-cluster are as follows:

  • Before digging into the difference between yarn-client and yarn-cluster, one concept should be made clear: the ApplicationMaster. In YARN, every application instance has an ApplicationMaster process, which runs in the first Container started for the application. It is responsible for dealing with the ResourceManager to request resources and, after getting them, for telling the NodeManagers to start Containers. At a deeper level, the difference between yarn-cluster and yarn-client mode is really the difference in the ApplicationMaster process.
  • In yarn-cluster mode, the Driver runs inside the AM (ApplicationMaster), which is responsible for applying to YARN for resources and supervising the running status of the Job. After submitting the Job, the user can shut down the client, and the Job will continue to run on YARN; therefore yarn-cluster mode is not suitable for interactive Jobs.
  • In yarn-client mode, the ApplicationMaster only requests Executors from YARN, and the client communicates with the requested Containers to schedule their work, which means the client cannot go away.
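The two YARN modes discussed above are selected at submission time. A sketch of the corresponding spark-submit invocations (the jar name is a placeholder; a configured HADOOP_CONF_DIR is assumed):

```shell
# Illustrative spark-submit invocations for the two YARN modes:

# yarn-client: the Driver stays on the submitting machine -- suitable for
# interactive work, but the client must stay up for the whole Job
spark-submit --master yarn --deploy-mode client myapp.jar

# yarn-cluster: the Driver runs inside the ApplicationMaster on the
# cluster -- the client can disconnect once the Job is accepted
spark-submit --master yarn --deploy-mode cluster myapp.jar
```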