Deploying a Spark development environment with IntelliJ IDEA on Windows

Time:2019-12-21

0x01 environment description

Blog address: http://www.cnblogs.com/ning-wang/p/7359977.html

1.1 local

OS: windows 10
jdk: jdk1.8.0_121
scala: scala-2.11.11
IDE: IntelliJ IDEA ULTIMATE 2017.2.1

1.2 server

OS: CentOS_6.5_x64
jdk: jdk1.8.111
hadoop: hadoop-2.6.5
spark: spark-1.6.3-bin-hadoop2.6
scala: scala-2.11.11

0x02 Windows configuration

2.1 install JDK

Configure the following environment variables (example values are sketched after the list):

JAVA_HOME
CLASSPATH
Path
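A hedged example of the values, assuming the jdk1.8.0_121 from section 1.1 is installed under C:\Program Files\Java (adjust the path to the actual install location):

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_121
CLASSPATH = .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
Path      = %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;...(keep the existing entries)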

2.2 configure hosts

2.2.1 file location

C:\Windows\System32\drivers\etc

2.2.2 content to add

The host entries are the same as those on the cluster nodes:

192.168.1.100    master
192.168.1.101    slave1
192.168.1.102    slave2
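After saving the hosts file, a quick check from a Windows command prompt confirms that the names resolve to the cluster addresses:

C:\> ping master
C:\> ping slave1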

2.3 installing IntelliJ IDEA

Note: make sure the Maven plug-in is installed.

2.4 install the Scala plug-in in IDEA

Open File -> Settings -> Plugins, search for Scala, and install the plug-in.

0x03 server-side configuration

3.1 install JDK

3.2 installing Hadoop

Configure remote access so that the Windows client can reach HDFS and the Spark master; a minimal sketch follows.
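Concretely, "remote access" here means the NameNode must listen on an address the Windows client can resolve (matching the hdfs://master:9000 path used in the code later), and the relevant ports (9000 for HDFS, 7077 for the Spark master) must be open in the firewall. A minimal core-site.xml sketch, assuming an otherwise standard configuration:

<!-- core-site.xml on the cluster: bind the NameNode to a hostname the client can resolve -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>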

3.3 installing spark
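The code in section 0x04 connects to spark://master:7077, i.e. a standalone cluster. A minimal sketch of starting it (the install path is an assumption; adjust it to where Spark is unpacked):

# On the master node, inside the Spark install directory
$ cd /usr/local/spark-1.6.3-bin-hadoop2.6
# Starts the standalone master (port 7077, web UI on 8080) and the workers listed in conf/slaves
$ ./sbin/start-all.sh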

0x04 test

4.1 create a new Maven project

In IDEA, choose File -> New -> Project, select Maven, and create the project.

4.2 add dependency package

In File -> Project Structure -> Libraries, add spark-assembly-1.6.3-hadoop2.6.0.jar (located under spark/lib/ on the server side). An equivalent Maven dependency is sketched below.
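Alternatively, since this is a Maven project, the dependency can be declared in pom.xml instead of copying the assembly jar by hand. A hedged sketch, assuming the cluster's Spark 1.6.3 build matches the Scala 2.11 listed in section 0x01 (use spark-core_2.10 if the prebuilt package was compiled against Scala 2.10):

<!-- pom.xml: Spark core for local compilation; the cluster supplies Spark at runtime -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.3</version>
</dependency>

As with the assembly jar, these classes should not end up in the final artifact (see 4.5.3).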


4.3 create a ConnectionUtil class

Create a new Java class named ConnectionUtil under the src\main\java directory:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConnectionUtil {

    public static final String master = "spark://master:7077";

    public static void main(String[] args) {
        // Connect to the standalone master defined above
        SparkConf sparkConf = new SparkConf().setAppName("demo").setMaster(master);
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        // Printing the context object confirms that it was created successfully
        System.out.println(javaSparkContext);
        javaSparkContext.stop();
    }
}

4.4 compile and run


Run ConnectionUtil as a normal Java application. If the console prints the JavaSparkContext object, the connection to the cluster is working correctly.

4.5 running JavaWordCount

4.5.1 data preparation

Prepare a text file (any format will do) and upload it to HDFS:

$ vim wordcount.txt
hello Tom
hello Jack
hello Ning
# Upload the file
$ hadoop fs -put wordcount.txt /user/hadoop/
# Check whether the file was uploaded successfully
$ hadoop fs -ls /user/hadoop/

4.5.2 code

The code below comes from the examples shipped with the Spark installation package; the jar package path and the input file path have been filled in.

import scala.Tuple2;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class JavaWordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
//      if (args.length < 1) {
//          System.err.println("Usage: JavaWordCount <file>");
//          System.exit(1);
//      }
        SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount")
                .setMaster("spark://master:7077")
                .set("spark.executor.memory", "512M");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        // Ship the packaged jar to the executors (note the escaped backslashes in the path)
        ctx.addJar("D:\\workspace\\spark\\JavaWordCount.jar");

        String path = "hdfs://master:9000/user/hadoop/wordcount.txt";
        JavaRDD<String> lines = ctx.textFile(path);

        // Split each line into words
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) {
                return Arrays.asList(SPACE.split(s));
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // Sum the counts for each word
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        ctx.stop();
    }
}

4.5.3 packaging

This step is very important; otherwise all kinds of errors will occur at runtime and the program may not run at all.
In File -> Project Structure -> Artifacts, click the green "+", then Add -> JAR -> From modules with dependencies.


Enter the name of the main class (the entry point), and delete all jar packages under the output layout (the Spark runtime environment already contains them). If a META-INF folder already exists, delete it first, then click Apply and OK.
To compile: Build -> Build Artifacts..., then select the project's artifact to build.


The output jar package can be found in the out directory generated by the project; place it in the location specified in the program (that is, the path set in the addJar() method).
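Alternatively, instead of keeping the jar on the Windows side and shipping it with addJar(), the packaged jar can be copied to the server and submitted there. A minimal sketch, assuming the jar is copied to /home/hadoop on the master (remove or adjust the addJar() call first, since the D:\ path does not exist on the server):

# Run on the master node from the Spark install directory; the input path is hard-coded in the program
$ ./bin/spark-submit --class JavaWordCount --master spark://master:7077 /home/hadoop/JavaWordCount.jar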


4.5.4 run the program

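Running JavaWordCount against the wordcount.txt prepared in 4.5.1 should print each word with its count (the order of the lines is not deterministic), roughly:

hello: 3
Tom: 1
Jack: 1
Ning: 1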

0x05 problems

5.1 class file for scala.Cloneable not found

Problem description: java: cannot access scala.Cloneable; class file for scala.Cloneable not found.
Reason: the package originally used was spark-2.1.0-bin-hadoop2.4, which does not ship the spark-assembly-1.6.3-hadoop2.6.0.jar dependency package (Spark 2.x no longer provides an assembly jar).
Solution: the cluster's original Hadoop version was 2.5.2, for which the matching dependency package is no longer offered on the official website, so the platform's Hadoop environment was upgraded to 2.6.5. Since documentation for Spark 2.x was still scarce, the Spark version was changed to 1.6.3.

Created: 2017-08-12 10:33:55 Saturday
Updated: 2017-08-14 20:10:47 Monday
Updated: 2017.10.17
Migrated from WizNote to SegmentFault.
