Spark 1.6.0 single-machine installation and configuration (basic)

Time: 2020-12-7

This article covers deploying Apache Spark 1.6.0 on a single machine. The procedure is essentially the same as for a cluster, minus the master and slaves file configuration. On a single machine it is enough to install Scala and Spark directly; however, if you want to use HDFS, Hadoop and the JDK also have to be configured, so it is recommended to install and configure all of them.
Original post on my blog: http://blog.tomgou.xyz/spark-160-dan-ji-an-zhuang-pei-zhi.html

0. Spark Installation Preparation

The documentation on Spark's official website (http://spark.apache.org/docs/latest/) says:

Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).

My computer environment is Ubuntu 14.04.4 LTS, and I need to install:

  • JDK 1.8 (jdk-8u73)

  • Hadoop 2.6.0

  • Scala 2.10.6

  • Spark 1.6.0 (pre-built for Hadoop 2.6)

1. Install JDK

Unzip the JDK installation package to any directory:

$ cd /home/tom
$ tar -xzvf jdk-8u73-linux-x64.tar.gz
$ sudo vim /etc/profile

Edit the /etc/profile file and add the Java environment variables at the end:

export JAVA_HOME=/home/tom/jdk1.8.0_73/
export JRE_HOME=/home/tom/jdk1.8.0_73/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

Save and update /etc/profile:

$ source /etc/profile

To check whether the installation succeeded:

$ java -version
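
If another JDK is already installed system-wide, it is worth confirming which java the shell actually picks up from the PATH (a small extra check of my own, not in the original steps):

$ which java
$ echo $JAVA_HOME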

2. Configure SSH localhost

Make sure SSH is installed:

$ sudo apt-get update
$ sudo apt-get install openssh-server
$ sudo /etc/init.d/ssh start

Generate an SSH key and add it to the authorized keys:

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

If the key has already been generated, just execute the last two lines.
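
ssh-keygen prompts for a key file location and a passphrase; for passwordless login just press Enter at each prompt. The key can also be generated non-interactively (my own variant, not part of the original article):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
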
Test SSH to localhost:

$ ssh localhost
$ exit

3. Install Hadoop 2.6.0

Unzip Hadoop 2.6.0 to any directory:

$ cd /home/tom
$ wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar -xzvf hadoop-2.6.0.tar.gz

Edit the /etc/profile file and add the Hadoop environment variables at the end:

export HADOOP_HOME=/home/tom/hadoop-2.6.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file:

$ vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add at the end:

export JAVA_HOME=/home/tom/jdk1.8.0_73/

Modify the configuration files:

$ cd $HADOOP_HOME/etc/hadoop

Modify core-site.xml:

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
</configuration>

Modify hdfs-site.xml:

<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
    <value>file:///home/tom/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///home/tom/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

The first property, dfs.replication, is the HDFS replication factor; one copy is enough on a single machine. The last two are the local directories used by the NameNode and the DataNode.
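
Hadoop normally creates these directories itself when the NameNode is formatted and the DataNode starts, but creating them up front avoids permission surprises later (my own addition; the paths match hdfs-site.xml above):

$ mkdir -p /home/tom/hadoopdata/hdfs/namenode
$ mkdir -p /home/tom/hadoopdata/hdfs/datanode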

Modify mapred-site.xml:
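
A stock Hadoop 2.6.0 download only ships mapred-site.xml.template, so copy it first if mapred-site.xml does not exist yet:

$ cp mapred-site.xml.template mapred-site.xml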

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

Modify yarn-site.xml:

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
</configuration>

Initialize Hadoop:

$ hdfs namenode -format

Start Hadoop:

$ $HADOOP_HOME/sbin/start-all.sh

Stop Hadoop:

$ $HADOOP_HOME/sbin/stop-all.sh
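
In Hadoop 2.x, start-all.sh and stop-all.sh are marked as deprecated and simply call the per-service scripts, so the services can also be started separately (my own note, not in the original):

$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh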

Check the web UIs by opening the ports in a browser, e.g. http://localhost:8088:

  • port 8088: cluster and all applications

  • port 50070: Hadoop NameNode

  • port 50090: Secondary NameNode

  • port 50075: DataNode

Once Hadoop is running, check it with the jps command; the output should look like this:

10057 Jps
9611 ResourceManager
9451 SecondaryNameNode
9260 DataNode
9102 NameNode
9743 NodeManager
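
As an extra smoke test of HDFS itself (my own addition, not in the original article), create a home directory and copy a file into it:

$ hdfs dfs -mkdir -p /user/tom
$ hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/tom/
$ hdfs dfs -ls /user/tom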

4. Install Scala

Unzip the Scala installation package to any directory:

$ cd /home/tom
$ tar -xzvf scala-2.10.6.tgz
$ sudo vim /etc/profile

In /etc/profile, add the Scala environment variables at the end:

export SCALA_HOME=/home/tom/scala-2.10.6
export PATH=$SCALA_HOME/bin:$PATH

Save and update /etc/profile:

$ source /etc/profile

To check whether the installation succeeded:

$ scala -version
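
You can also check that the REPL itself works by evaluating a one-liner from the shell (a small extra check of my own):

$ scala -e 'println("Scala " + util.Properties.versionString)'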

5. Install Spark

Unzip the Spark installation package to any directory:

$ cd /home/tom
$ tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz
$ mv spark-1.6.0-bin-hadoop2.6 spark-1.6.0
$ sudo vim /etc/profile

In /etc/profile, add the Spark environment variables at the end:

export SPARK_HOME=/home/tom/spark-1.6.0
export PATH=$SPARK_HOME/bin:$PATH

Save and update /etc/profile:

$ source /etc/profile

In the conf directory, copy spark-env.sh.template and rename it to spark-env.sh:

$ cd $SPARK_HOME/conf
$ cp spark-env.sh.template spark-env.sh
$ vim spark-env.sh

In spark-env.sh, add:

export JAVA_HOME=/home/tom/jdk1.8.0_73/
export SCALA_HOME=/home/tom/scala-2.10.6
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=4G
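
If Spark should pick up the Hadoop configuration (for example to resolve hdfs:// paths or to run on YARN later), it also helps to point it at Hadoop's configuration directory; this line is my own addition, not part of the original setup:

export HADOOP_CONF_DIR=/home/tom/hadoop-2.6.0/etc/hadoop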

Start Spark:

$ $SPARK_HOME/sbin/start-all.sh

Stop Spark:

$ $SPARK_HOME/sbin/stop-all.sh

To test whether spark is installed successfully:

$ $SPARK_HOME/bin/run-example SparkPi

The result looks like this:

Pi is roughly 3.14716
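
The line is easy to miss among the INFO log output; you can filter it out, or run the same example explicitly through spark-submit in local mode (my own additions; the examples jar name may differ slightly depending on the download):

$ $SPARK_HOME/bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
$ $SPARK_HOME/bin/spark-submit --master local[2] \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar 10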

Check the web UI by opening http://localhost:8080 in a browser.
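
Finally, to confirm that Spark and HDFS work together, you can run the bundled word-count example against a file stored in HDFS. This is a sketch of my own, assuming HDFS is running as configured above and that the examples jar name matches the download:

$ hdfs dfs -mkdir -p /user/tom
$ hdfs dfs -put $SPARK_HOME/README.md /user/tom/
$ $SPARK_HOME/bin/spark-submit --master local[2] \
    --class org.apache.spark.examples.JavaWordCount \
    $SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    hdfs://localhost:9000/user/tom/README.md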