VSCode+Maven+Hadoop development environment setup

Time:2022-8-18

With the help of the Maven plugin, writing Java in VSCode is actually very convenient. In this article, we show how to use Maven to build a Hadoop development environment in VSCode.

1. Java environment installation

First, we need to set up a Java development environment. Download the Java archive or installer for the version you want from https://www.oracle.com/java/technologies/downloads/. An archive has to be extracted to a directory of your choice, while an installer can simply be run with its default options. Here I downloaded the macOS installer for Java 17. After running it, Java was installed by default under /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents.

Then configure the environment variable. Mac users add the following line to ~/.zshrc:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home

Then run source ~/.zshrc and that's it (for Linux users, if I remember correctly, the file should be ~/.bashrc).

You can run the java -version command to check the Java version on the current machine.

(base) [email protected] ~ % java -version
openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode)
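
As a cross-check from inside Java itself, the following minimal sketch prints the version and installation directory of the JVM that is actually running, together with the JAVA_HOME environment variable. The class name JavaEnvCheck is only an illustrative choice; nothing in the setup depends on it.

```java
// Minimal sketch: report which JVM is running and what JAVA_HOME is set to.
// The class name is hypothetical; only standard JDK calls are used.
class JavaEnvCheck {

    // Builds a three-line report from standard system properties and the environment.
    static String describe() {
        String version = System.getProperty("java.version"); // version of the running JVM
        String javaHome = System.getProperty("java.home");   // directory of the running JVM
        String envHome = System.getenv("JAVA_HOME");         // may be null if not exported
        return "java.version = " + version + "\n"
             + "java.home    = " + javaHome + "\n"
             + "JAVA_HOME    = " + (envHome == null ? "(not set)" : envHome);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Saved as JavaEnvCheck.java, it can be run directly with java JavaEnvCheck.java (single-file launch works on Java 11 and later). If java.home and JAVA_HOME point to different directories, a different JDK is probably first on the PATH.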

The reader then needs to install the Java extensions for VSCode, including Maven for Java.
Note that the Maven for Java extension is used to build larger Java projects, that is, projects that use not only the packages bundled with the JDK but also external third-party packages. Bundled packages are found automatically when we compile and run with the javac and java commands, but external packages are slightly more complicated, and the easiest way to handle them is to use the Maven tool.

2. Create a new Maven project

First, open a directory for the project in VSCode (on my computer it is "/Users/lonelyprince7/Documents/LocalCode/Hadoop-MapReduce"), then select "Create New Project from Maven Archetype" in the context menu.

Then select "maven-archetype-quickstart" from the drop-down menu.

Then choose any version of the archetype.

Then enter the name of the organization, i.e. the group ID (this name is used when packaging your project for distribution). The organization name usually has the form com.×××; here I used com.orion. Note that, following Java package naming conventions, the organization name should consist only of lowercase characters and underscores (separated by dots), and should not include uppercase characters or spaces.

Next, enter the name of the project (the artifact ID); here we name it "hello_world".

Then select "/Users/lonelyprince7/Documents/LocalCode/Hadoop-MapReduce" as the project directory.

After that, VSCode will ask whether to open the directory of the "hello_world" project; select "Yes" to switch to it.
Once inside that directory, we will find that the console is initializing the Maven configuration. Along the way it prompts us for a few options; we just need to press Enter to accept the default values.
Finally, we see that the project "hello_world" has been created successfully, with the standard Maven layout: pom.xml in the project root and App.java under src/main/java/com/orion.
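
The generated project follows Maven's standard directory layout, where the Java package (by default the same as the group ID in this archetype) becomes a directory path under src/main/java. The sketch below only illustrates that mapping; the class and method names are hypothetical and it does not call Maven itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch of how a package name maps to a source path in the
// standard Maven layout. Names here are illustrative, not part of Maven.
class MavenLayoutSketch {

    // Returns <projectDir>/src/main/java/<package as directories>/<className>.java
    static Path mainSourceFile(String projectDir, String packageName, String className) {
        Path path = Paths.get(projectDir, "src", "main", "java");
        for (String part : packageName.split("\\.")) {
            path = path.resolve(part); // each dot-separated segment is one directory
        }
        return path.resolve(className + ".java");
    }

    public static void main(String[] args) {
        // Values taken from the project created above.
        System.out.println(mainSourceFile("hello_world", "com.orion", "App"));
    }
}
```

The test sources generated by the archetype follow the same rule under src/test/java.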

3. Test the project

We can see that App.java has been generated for us as follows:

package com.orion;

/**
 * Hello world!
 *
 */
public class App 
{
    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }
}

The generated pom.xml declares only the basic JUnit dependency and some basic Maven project configuration:

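<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.orion</groupId>
  <artifactId>hello_world</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>hello_world</name>
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>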

Select the App.java file and press F5 to compile and run it; the console successfully prints Hello World!

(base) [email protected] hello_world %  /usr/bin/env /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java -agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=localhost:50684 -XX:+ShowCodeDetailsInExceptionMessages -cp /Users/lonelyprince7/Documents/LocalCode/Hadoop-MapReduce/hello_world/target/classes com.orion.App 
Hello World!

Next, let's try importing the Hadoop dependencies with Maven. We only need to declare the dependencies in pom.xml. The names and version numbers of the relevant dependencies can be found on the Maven repository website https://mvnrepository.com/. For example, we can look up the dependency declaration for hadoop-common (version 3.3.1) there.

Generally speaking, a Hadoop project needs the hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-core, hadoop-client and hadoop-yarn-api dependencies. We add them between the <dependencies> and </dependencies> tags, that is:

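  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
        <version>3.3.1</version>
    </dependency>
  </dependencies>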
A sharp-eyed reader might have noticed, however, that the version number here is completely hardcoded. If I later want to change the Hadoop version, I have to modify each occurrence one by one, which is very tedious. Fortunately, we can define constants (such as version numbers) between the <properties> and </properties> tags and then reference them later in the form ${variable name}. For example, we add the Hadoop version number as hadoop.version between the <properties> tags:

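  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <hadoop.version>3.3.1</hadoop.version>
  </properties>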

After that, we can use ${hadoop.version} in place of the repeated 3.3.1.
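
To make the idea of property substitution concrete, here is a minimal sketch of replacing ${name} placeholders with values from a map. It is not Maven's actual implementation, only an illustration of the behaviour described above; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of ${name} substitution; NOT Maven's real interpolation code.
class PropertyInterpolationSketch {

    // Matches ${...} and captures the property name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String interpolate(String text, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            // Unknown properties are left untouched in this sketch.
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of("hadoop.version", "3.3.1");
        // prints <version>3.3.1</version>
        System.out.println(interpolate("<version>${hadoop.version}</version>", properties));
    }
}
```

Changing the single value in the map changes every place that references it, which is exactly why the version is defined once as a property.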
Finally, here is our complete pom.xml file:

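<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.orion</groupId>
  <artifactId>hello_world</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>hello_world</name>
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <hadoop.version>3.3.1</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>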

Press ctrl/command+s to save; the project will then re-parse pom.xml and import the packages we added.

Finally, we try importing the Hadoop-related packages in App.java:

package com.orion;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
 * Hello world!
 *
 */
public class App 
{
    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }
}

Press F5 to recompile and run. Hello World! is printed successfully, indicating that our packages were imported successfully.

(base) [email protected] hello_world %  /usr/bin/env /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java -agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=localhost:54181 --enable-preview -XX:+ShowCodeDetailsInExceptionMessages -cp "/Users/lonelyprince7/Library/Application Support/Code/User/workspaceStorage/fe34994f622cdbfbf60e8ed045c6bde3/redhat.java/jdt_ws/jdt.ls-java-project/bin" com.orion.App 
Hello World!
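
Since the imports above are never used, a slightly stronger check is to ask the JVM at run time whether the Hadoop classes can be located. The sketch below does this with the standard Class.forName call; the class name HadoopClasspathCheck and the helper isOnClasspath are hypothetical, and the Hadoop class names are the ones imported in App.java above. It compiles and runs even without Hadoop, in which case it simply reports the classes as missing.

```java
import java.util.List;

// Minimal sketch: check whether given classes are visible on the runtime classpath.
class HadoopClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> hadoopClasses = List.of(
                "org.apache.hadoop.conf.Configuration",
                "org.apache.hadoop.fs.FileSystem",
                "org.apache.hadoop.mapreduce.Job");
        for (String name : hadoopClasses) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any class is reported as MISSING when run from the project, the dependencies in pom.xml were probably not resolved onto the classpath used for the run.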

Note that if you get an error such as "org.apache.hadoop.fs.FileSystem cannot be resolved", you need to clear the project cache first (in VSCode, for example, with the "Java: Clean Java Language Server Workspace" command), and then recompile and run.

So far, we have successfully set up a Hadoop development environment with VSCode+Maven. In Hadoop, the most basic distributed programming paradigm is MapReduce. Next time, we will explain the basics of MapReduce programming in Hadoop through "WordCount" (word frequency counting), the entry-level MapReduce example. Stay tuned.