How Spark works on MaxCompute

Time:2020-3-22

1、 Spark system overview


The architecture diagram shows native Spark on the left and Spark on MaxCompute on the right. Spark on MaxCompute runs on the Cupid platform developed by Alibaba Cloud. Cupid can run the open source community computing frameworks that YARN supports, such as Spark.

2、 Configuring and using Spark from the client


2.1 Open the link below to download the client to your local machine

http://odps-repo.oss-cn-hangzhou.aliyuncs.com/spark/2.3.0-odps0.30.0/spark-2.3.0-odps0.30.0.tar.gz?spm=a2c4g.11186623.2.12.666a4b69yO8Qur&file=spark-2.3.0-odps0.30.0.tar.gz
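If you fetch the archive from the command line instead of a browser, note that the link carries query parameters (spm, file) that should not end up in the saved file name. The following is a minimal sketch, assuming curl is installed; archive_name_from_url and download_client are hypothetical helper names, not part of the client.

```shell
# Hypothetical helpers (not part of the Spark client).

# Derive the archive file name from a download URL by dropping the query string.
archive_name_from_url() {
  local url="$1"
  local path="${url%%\?*}"   # strip "?" and everything after it
  basename "$path"
}

# Download the archive into the current directory and print the saved file name.
download_client() {
  local url="$1"
  local out
  out=$(archive_name_from_url "$url")
  curl -fL -o "$out" "$url" || return 1
  echo "$out"
}
```

Calling download_client with the link above should save the file as spark-2.3.0-odps0.30.0.tar.gz, which is the name the later tar command expects.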

2.2 Upload the file to ECS

2.3 Extract the archive

tar -zxvf spark-2.3.0-odps0.30.0.tar.gz
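The extraction step can be wrapped in a small check so that a missing or misnamed upload fails early. This is only a sketch: extract_client is a hypothetical helper, and it assumes the archive unpacks into a single top-level directory.

```shell
# Hypothetical helper: extract a .tar.gz archive into a destination directory
# and print the name of the top-level directory it contains.
extract_client() {
  local archive="$1"
  local dest="${2:-.}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar -zxf "$archive" -C "$dest" || return 1
  # Read the first entry of the archive and keep only its first path component
  tar -ztf "$archive" | head -n 1 | sed -e 's|^\./||' -e 's|/.*$||'
}
```

For example, `extract_client spark-2.3.0-odps0.30.0.tar.gz /opt` would unpack the client under /opt and print the directory name, which you can then use to locate conf/ and bin/.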

2.4 Configure spark-defaults.conf

# spark-defaults.conf

In most cases, you only need to fill in the MaxCompute account information in the default template to use Spark.

spark.hadoop.odps.project.name =
spark.hadoop.odps.access.id =
spark.hadoop.odps.access.key =

# Other configurations can keep their default values
spark.hadoop.odps.end.point = http://service.cn.maxcompute….
spark.hadoop.odps.runtime.end.point = http://service.cn.maxcompute….-inc.com/api
spark.sql.catalogImplementation=odps
spark.hadoop.odps.task.major.version = cupid_v2
spark.hadoop.odps.cupid.container.image.enable = true
spark.hadoop.odps.cupid.container.vm.engine.type = hyper
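Before submitting a job, it can help to confirm that the three account settings were actually filled in. The sketch below assumes the file uses the "key = value" form shown in the template; missing_keys is a hypothetical helper, not a tool shipped with the client.

```shell
# Hypothetical helper: print each required account key that is missing
# or has an empty value in the given spark-defaults.conf file.
missing_keys() {
  local conf="$1"
  local key value
  for key in spark.hadoop.odps.project.name \
             spark.hadoop.odps.access.id \
             spark.hadoop.odps.access.key; do
    # Find the line whose key matches, then take the trimmed text after the first "="
    value=$(awk -F'=' -v k="$key" '
      { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
      name == k {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
        exit
      }
    ' "$conf")
    [ -n "$value" ] || echo "$key"
  done
}
```

Running `missing_keys conf/spark-defaults.conf` prints nothing when all three values are set, so an empty result is the signal to proceed.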

2.5 Download the example code from GitHub

https://github.com/aliyun/MaxCompute-Spark

2.6 Upload the code to ECS and extract it

unzip MaxCompute-Spark-master.zip

2.7 Package the code into a JAR (make sure Maven is installed)

cd MaxCompute-Spark-master/spark-2.x
mvn clean package

2.8 Check the JAR and run it

bin/spark-submit --master yarn-cluster --class com.aliyun.odps.spark.examples.SparkPi \
MaxCompute-Spark-master/spark-2.x/target/spark-examples_2.11-1.0.0-SNAPSHOT-shaded.jar
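The check-then-run step can be sketched as a small wrapper that refuses to submit when the JAR has not been built. submit_example and the DRY_RUN variable are hypothetical names introduced for illustration; the spark-submit options are the ones used in the command above.

```shell
# Hypothetical wrapper: verify that the JAR exists, then run spark-submit.
# With DRY_RUN=1 the command is printed instead of executed.
submit_example() {
  local spark_home="$1"
  local jar="$2"
  local class="${3:-com.aliyun.odps.spark.examples.SparkPi}"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar (run mvn clean package first)" >&2
    return 1
  fi
  # Build the command as an argument list so paths with spaces stay intact
  set -- "$spark_home/bin/spark-submit" --master yarn-cluster --class "$class" "$jar"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

A call might look like `submit_example ./spark-2.3.0-odps0.30.0 MaxCompute-Spark-master/spark-2.x/target/spark-examples_2.11-1.0.0-SNAPSHOT-shaded.jar`, assuming the client was extracted into the current directory.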

3、 Configuring and using Spark in DataWorks


3.1 Open the DataWorks console and click Business Process

3.2 Open the business process and create an ODPS Spark node

3.3 Upload the JAR package as a resource: select the corresponding JAR package, upload it, and submit

3.4 Configure the ODPS Spark node, click Save and then Submit, and click Run to view the run status

4、 Using Spark in a local IntelliJ IDEA test environment


4.1 Download and extract the client and the template code

Client: http://odps-repo.oss-cn-hangzhou.aliyuncs.com/spark/2.3.0-odps0.30.0/spark-2.3.0-odps0.30.0.tar.gz?spm=a2c4g.11186623.2.12.666a4b69yO8Qur&file=spark-2.3.0-odps0.30.0.tar.gz

Template code:

https://github.com/aliyun/MaxCompute-Spark

4.2 Open IDEA, click Open, and select the template code

4.3 Install the Scala plug-in

4.4 Configure Maven

4.5 Configure the JDK and related dependencies

This article is original content from Alibaba Cloud and may not be reproduced without permission.