Abstract: This article introduces Spark with Volcano, starting from the development history and working principle of Spark on Kubernetes, and explains how Volcano can help Spark run more efficiently.
The first part of this article covers the development history and working principle of Spark on Kubernetes. The second part gives a brief introduction to Spark with Volcano and shows how Volcano helps Spark run more efficiently.
Spark on Kubernetes
Let's first look at the background of Spark on Kubernetes. Since version 2.3, Spark has provided native Kubernetes support, which allows Spark users to run jobs on Kubernetes and use Kubernetes to manage the resource layer. Version 2.4 added support for client mode and the Python language. Spark 3.0, released this year, adds many important features to Spark on Kubernetes, such as dynamic resource allocation, remote shuffle service, and Kerberos support.
Advantages of spark on kubernetes
1) Elastic expansion and contraction
2) Resource utilization
3) Unified technology stack
4) Fine grained resource allocation
5) Logging and monitoring
Spark-submit working principle
One of the earliest ways Spark supported Kubernetes was through the official spark-submit method. The client submits a job with spark-submit; the Spark driver then calls the kube-apiserver to request the creation of executor pods. Once all executors are up, they perform the actual computation, and logs are backed up afterwards.
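A cluster-mode submission against Kubernetes looks roughly like the following; this is a minimal sketch, and the API server address and container image name are placeholders you would replace with your own:

```shell
# Submit a Spark job directly to a Kubernetes cluster (cluster mode).
# <apiserver-host> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples.jar
```

The `k8s://` master URL is what tells Spark to talk to the kube-apiserver instead of a standalone or YARN master.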
One advantage of this method is that the user experience changes very little for traditional Spark users after switching to Kubernetes. The drawback is the lack of job life-cycle management.
Spark-operator working principle
The second way to use Spark on Kubernetes is the operator, which is a more Kubernetes-native approach. Looking at the whole job submission flow: the user submits a YAML file through kubectl, which defines an object of the operator's own CRD, SparkApplication. After the SparkApplication is created, the controller watches the creation of these resources. The rest of the process actually reuses the first mode, but the operator makes it more complete.
Compared with the first method, the controller here can maintain the object's life cycle, watch the state of the Spark driver, and update the state of the application, which makes it a more complete solution.
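A minimal SparkApplication manifest for the spark-operator might look like the sketch below; the image name, namespace, and file path are placeholders, and the resource figures are illustrative:

```yaml
# Minimal SparkApplication sketch for the spark-operator CRD.
# <your-spark-image> is a placeholder.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <your-spark-image>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.0.0"
  driver:
    cores: 1
    memory: 512m
  executor:
    instances: 4
    cores: 1
    memory: 512m
```

Submitting this with `kubectl apply -f` is what triggers the controller flow described above.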
These two approaches each have their own advantages, and many companies use both. Both are also introduced on the official website.
Spark with Volcano
Volcano has integrated and supports both of the working modes above. This link points to the Spark open-source code repository we maintain:
What Volcano does is actually very simple. Looking at the whole submission process: first, the job is submitted through spark-submit. When the job is submitted, a PodGroup is created; the PodGroup contains the scheduling-related information configured by the user. As you can see from its YAML file, the two roles of driver and executor have been added in the right part of the page.
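The PodGroup created for a job might look like this sketch; the names and resource figures are illustrative, assuming the 1 driver + 4 executor shape used later in this article:

```yaml
# Sketch of the PodGroup Volcano creates for a Spark job.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-pi-podgroup
spec:
  queue: default     # the queue this job is bound to
  minMember: 5       # 1 driver + 4 executors must be schedulable together
  minResources:
    cpu: "5"
    memory: "2Gi"
```

`minMember` is what lets the scheduler treat the driver and its executors as one unit instead of scheduling them independently.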
We actually talked about this in the first and second sessions: because Kubernetes has no queue support, resources cannot be shared fairly when multiple users or departments share a cluster. But whether in the HPC field or in big data, resource sharing through queues is a basic requirement.
When sharing resources through queues, we provide a variety of mechanisms. At the top of the figure, we create two queues to share the resources of the whole cluster: one queue is given 40% of the cluster's resources and the other 60%. In this way, the two queues can be mapped to different departments or projects, each using its own queue. When the resources of one queue are idle, they can be used by jobs in the other queue. The figure below shows resource balancing between two different namespaces. In Kubernetes, when users of two different application systems submit jobs, the user who submits more jobs gets more of the cluster's resources. Therefore, we do fair scheduling based on namespaces to ensure that the cluster's resources are shared according to weight.
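The 40/60 split above can be expressed with Volcano Queue objects; this is a sketch with assumed queue names, using relative weights of 4 and 6:

```yaml
# Two Volcano queues sharing the cluster by weight: with weights 4
# and 6 they receive roughly 40% and 60% of resources when both are
# busy; idle capacity can be borrowed by jobs in the other queue.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: department-a
spec:
  weight: 4
  reclaimable: true   # borrowed resources can be taken back when needed
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: department-b
spec:
  weight: 6
  reclaimable: true
```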
Volcano: Pod delay creation
When I introduced this scenario before, some students didn't fully understand it, so I have added a few slides to expand on it.
For example, in our performance test we submit 16 concurrent jobs. Each job has the specification 1 driver + 4 executors, and the whole cluster has 4 machines with 16 cores in total.
When the 16 Spark jobs are submitted at the same time, there is a time gap between the creation of the driver pods and the creation of the executor pods. Because of this gap, the 16 driver pods can come up first and occupy all 16 cores of the cluster. No executor can then be created, every driver waits for executors that will never start, and the whole cluster deadlocks.
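The deadlock is easy to see with a little arithmetic, sketched below as a toy model (this is illustrative Python, not Volcano code; the core counts come from the example above):

```python
# Toy model of the driver/executor deadlock: 16 jobs, each needing
# 1 driver core + 4 executor cores, on a 16-core cluster.
CLUSTER_CORES = 16
JOBS = 16
DRIVER_CORES = 1
EXECUTOR_CORES = 4  # 4 executors x 1 core each

def greedy_schedule(cluster=CLUSTER_CORES, jobs=JOBS):
    """All driver pods are created before any executor pod (the race
    described in the article). Returns how many jobs can actually run."""
    free = cluster
    drivers = 0
    while drivers < jobs and free >= DRIVER_CORES:
        free -= DRIVER_CORES
        drivers += 1
    # Now try to place executors for the drivers that got in.
    runnable = 0
    for _ in range(drivers):
        if free >= EXECUTOR_CORES:
            free -= EXECUTOR_CORES
            runnable += 1
    return runnable

def gang_schedule(cluster=CLUSTER_CORES, jobs=JOBS):
    """Admit a job only if its driver and executors fit together,
    which is what a PodGroup with minMember gives you."""
    free = cluster
    runnable = 0
    for _ in range(jobs):
        need = DRIVER_CORES + EXECUTOR_CORES  # 5 cores per whole job
        if free >= need:
            free -= need
            runnable += 1
    return runnable

print(greedy_schedule())  # 0: the 16 drivers fill the cluster, deadlock
print(gang_schedule())    # 3: three jobs run to completion, the rest queue
```

With greedy pod-by-pod scheduling, zero jobs make progress; with gang admission, three whole jobs run at a time and the rest wait their turn.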
To solve this, we first tried a static split: let one node run only driver pods, while the other three nodes run only executor pods. This prevents the driver pods from taking up all the resources and resolves the deadlock.
But this approach also has drawbacks. In this example the node split is 1:3, but in real scenarios users' job specifications are dynamic, and a static partition cannot keep in step with them. There will always be some resource fragmentation and wasted resources.
Therefore, we added the pod delay creation feature. With this feature there is no need to statically partition the nodes; the cluster remains four undivided nodes. When the 16 jobs are submitted, the concept of a PodGroup is attached to each job, and the Volcano scheduler plans resources according to each job's PodGroup.
In this way, pods are never over-committed: all the resources of the four nodes can be used, and the pace of pod creation is controlled in high-concurrency scenarios without any waste. It is also very simple to use and allocates resources according to your needs, solving both the deadlock and the low efficiency seen in high-concurrency scenarios.
Volcano: Spark external shuffle service
We know that Spark itself is mature and has many functions that are very easy to use. Volcano ensures that no major functionality is lost after migrating to Kubernetes:
1) The external shuffle service (ESS) is deployed on each node as a DaemonSet
2) Executors write shuffle data locally; shuffle data can be read both locally and remotely
3) Dynamic resource allocation is supported
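Point 1 above can be sketched as a standard Kubernetes DaemonSet; this is only an illustration, with a placeholder image name, assuming Spark's default shuffle service port 7337 and a host-path shuffle directory:

```yaml
# Sketch: external shuffle service as a per-node DaemonSet.
# <spark-shuffle-image> is a placeholder.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spark-shuffle-service
spec:
  selector:
    matchLabels:
      app: spark-shuffle-service
  template:
    metadata:
      labels:
        app: spark-shuffle-service
    spec:
      containers:
        - name: shuffle
          image: <spark-shuffle-image>
          ports:
            - containerPort: 7337   # Spark's default shuffle service port
              hostPort: 7337
          volumeMounts:
            - name: shuffle-dir
              mountPath: /tmp/spark-shuffle
      volumes:
        - name: shuffle-dir
          hostPath:
            path: /tmp/spark-shuffle   # local disk for shuffle files
```

Running one ESS pod per node is what lets executors be removed under dynamic allocation without losing the shuffle files they wrote.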