Spark 2.4.5 cluster installation and local development

Time: 2020-10-31

Download

Official download address: https://www.apache.org/dyn/cl…

Verify that Java is installed

java -version

JDK download address

Unzip and install:

tar -zxvf jdk-14.0.1_linux-x64_bin.tar.gz
mv jdk-14.0.1 /usr/local/java

(Note: Spark 2.4.x is built and tested against Java 8; a newer JDK such as 14 may cause compatibility problems.)

Verify that Scala is installed

scala -version

If it is not installed, download and install it. (Note: Spark 2.4.5 itself is built against Scala 2.11/2.12; for application development use a matching Scala version.)

wget https://downloads.lightbend.com/scala/2.13.1/scala-2.13.1.tgz
tar xvf scala-2.13.1.tgz
mv scala-2.13.1 /usr/local/
vi /etc/profile

export JAVA_HOME=/usr/local/java
export SCALA_HOME=/usr/local/scala-2.13.1
export SPARK_HOME=/usr/local/spark
export CLASSPATH=$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH:$SPARK_HOME/bin

source /etc/profile
  • Verify again that the installation succeeded
scala -version
java -version

Install Spark

  1. Unzip and move to the appropriate directory
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz 
mv spark-2.4.5-bin-hadoop2.7 /usr/local/spark
  2. Set the Spark environment variable
vi /etc/profile
export PATH=$PATH:/usr/local/spark/bin

Save and reload:

source /etc/profile
  3. Verify the Spark shell
spark-shell

The following message appears:
(screenshot: spark-shell startup banner and scala> prompt)
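Beyond the banner, a quick smoke test at the scala> prompt confirms the shell really works. A minimal sketch, using only the sc and spark objects the shell creates for you:

// run at the scala> prompt
sc.parallelize(1 to 100).sum()   // distributed sum; prints res0: Double = 5050.0
spark.range(5).show()            // prints a five-row, one-column table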

Setting up the Spark master node

Spark ships with template configuration files; copy the environment template and edit it:

cd /usr/local/spark/conf/
cp spark-env.sh.template  spark-env.sh
vi spark-env.sh
  • Set the IP of the master node
SPARK_MASTER_HOST='192.168.56.109'
JAVA_HOME=/usr/local/java
  • To start the master by itself (standalone):
./sbin/start-master.sh
  • Open http://192.168.56.109:8080/

The following interface appears:
(screenshot: Spark master web UI)

  • To stop it:
./sbin/stop-master.sh
  • Set /etc/hosts (on all three machines):
192.168.56.109 master
192.168.56.110 slave01
192.168.56.111 slave02

Password-free login

Execute on the master:

ssh-keygen -t rsa -P ""

The key files are generated under /root/.ssh:
(screenshot: generated key files)

Copy id_rsa.pub to each slave. Note that authorized_keys is simply id_rsa.pub renamed to authorized_keys on the slave machine:

scp -r id_rsa.pub root@192.168.56.110:/root/.ssh/authorized_keys
scp -r id_rsa.pub root@192.168.56.111:/root/.ssh/authorized_keys
cp id_rsa.pub authorized_keys

On each slave machine:

chmod 700 .ssh
  • Check that you can log in to slave01 and slave02 without a password
ssh slave01
ssh slave02

Configure the worker nodes (on the master)

cd /usr/local/spark/conf
cp slaves.template slaves

Add the two slave nodes. Note: do not add master to the slaves file, or the master will also become a worker node.

vi slaves
slave01
slave02

Start the cluster from the master node

cd /usr/local/spark
./sbin/start-all.sh

If a "JAVA_HOME is not set" error appears, add JAVA_HOME=/usr/local/java to spark-env.sh in the configuration directory of the slave nodes.

If the startup succeeds, two workers will appear at http://192.168.56.109:8080/
(screenshot: master web UI listing two workers)
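To confirm that work is really distributed across the two workers, you can attach a shell to the cluster and run a small job. A minimal sketch, assuming the standalone master listens on the default port 7077 under the master hostname configured above:

// start the shell with: spark-shell --master spark://master:7077
sc.master                                      // should print spark://master:7077
sc.parallelize(1 to 1000, 8).map(_ * 2).sum()  // 8 partitions spread over the workers; prints 1001000.0

While the shell is running, the application and its executors should also show up on the master web UI.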

Local development

Unzip the same spark-2.4.5-bin-hadoop2.7 package to a local directory on Windows and double-click spark-shell.cmd in the bin directory. You will most likely hit the following error:

Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

The error occurs because the Hadoop Windows executable (winutils.exe) has not been downloaded. Since there is no local Hadoop environment, winutils can simulate one; there is no need to build Hadoop itself. Download a winutils build that matches the Hadoop version of the Spark package (2.7.x here).

  • Set the local environment variable (HADOOP_HOME, pointing to the directory that contains bin\winutils.exe)

(screenshot: Windows environment variable settings)

Restart spark-shell.cmd; if the normal startup banner appears, the fix worked.

(screenshot: successful spark-shell startup on Windows)

  • Add the same environment variable (HADOOP_HOME) to the Run/Debug configuration in IDEA

(screenshot: IDEA Run/Debug configuration)

  • IDEA also needs the Scala plugin; after that you can happily call data.show() to inspect your tables.

(screenshots: IDEA Scala plugin and data.show() output)
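To tie the pieces together, here is a minimal self-contained program of the kind you would run from IDEA. The object name and sample rows are invented for this sketch, and it assumes the HADOOP_HOME/winutils setup above:

import org.apache.spark.sql.SparkSession

object LocalSparkDemo {   // hypothetical demo object
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside the IDE process using all CPU cores
    val spark = SparkSession.builder()
      .appName("LocalSparkDemo")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._
    // toy data; toDF comes from spark.implicits._
    val data = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
    data.show()   // prints the table to the console

    spark.stop()
  }
}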

