Submitting tasks to a Hadoop cluster from Eclipse on a Mac

Date: 2020-12-03

The Hadoop cluster is built with VirtualBox + Ubuntu 14.04 + Hadoop 2.6.0.

After setting up the cluster, install Eclipse on the Mac and connect it to the Hadoop cluster.

1. Access the cluster

1.1 Modify the Mac's hosts file

Add the master's IP address to /etc/hosts on the Mac:

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1    localhost
255.255.255.255    broadcasthost
::1             localhost

192.168.56.101    master    # add the master's IP

1.2 Access the cluster

Start the cluster on the master. On the Mac, open http://master:50070/ in a browser; if the page loads, the cluster is reachable and you can see its status information.
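
If the daemons are not yet running, a minimal start sequence looks like this (the install path ~/hadoop is an assumption; adjust to your setup):

# On the master; ~/hadoop is the assumed Hadoop 2.6.0 install directory
cd ~/hadoop
sbin/start-dfs.sh     # starts the NameNode, DataNodes, and SecondaryNameNode
sbin/start-yarn.sh    # starts the ResourceManager and NodeManagers
jps                   # verify that the daemons are running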

2. Download and install Eclipse

Eclipse IDE for Java Developers

http://www.eclipse.org/downloads/package…

3. Configure Eclipse

3.1 Configure the Hadoop Eclipse plugin

3.1.1 Download the Hadoop Eclipse plugin

You can download the hadoop2x-eclipse-plugin (alternate download address: http://pan.baidu.com/s/1i4ikIoP).

3.1.2 Install the Hadoop Eclipse plugin

Find Eclipse in the Applications folder, right-click it, and choose Show Package Contents.


Copy the plugin JAR into the plugins directory inside the package, then restart Eclipse.
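
For example (the Eclipse install path and the plugin JAR name below are assumptions that vary with the Eclipse version):

# copy the plugin JAR into Eclipse's plugins directory, then restart Eclipse
cp hadoop-eclipse-plugin-2.6.0.jar /Applications/Eclipse.app/Contents/Eclipse/plugins/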


3.2 Connect to the Hadoop cluster

3.2.1 Configure the Hadoop installation directory

Unzip the Hadoop distribution to any directory on the Mac (no configuration is needed), then point Eclipse at that directory.
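
For example, assuming the same hadoop-2.6.0.tar.gz used on the cluster has been downloaded to the Mac:

# unzip the Hadoop distribution; no configuration is needed on the Mac side
tar -zxf hadoop-2.6.0.tar.gz -C ~/
# then point Eclipse at ~/hadoop-2.6.0 in the plugin's preferences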


3.2.2 Configure the cluster address

Click the plus sign in the upper-right corner of the Eclipse window.


Add the Map/Reduce perspective.


In the Map/Reduce Locations view, right-click and select New Hadoop location...


Fill in a Location name, set the host to master (which resolves to the IP configured in the Mac's hosts file), and change the ports and user name under Map/Reduce Master and DFS Master to match the cluster; then click Finish. The DFS Master port must match the port in fs.defaultFS in the cluster's core-site.xml.
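
If you are unsure which port to enter for DFS Master, you can query the cluster's configuration directly (a quick check; assumes the hdfs command is on the master's PATH):

# on the master: print the address and port HDFS listens on
hdfs getconf -confKey fs.defaultFS    # e.g. hdfs://master:9000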


3.2.3 View HDFS

Check whether you can browse HDFS directly from the DFS Locations view.

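If DFS Locations stays empty, a cross-check from the Mac's terminal helps separate Eclipse problems from connectivity problems (port 9000 is an assumption; use the value from fs.defaultFS):

# run from the unpacked Hadoop directory on the Mac
bin/hadoop fs -ls hdfs://master:9000/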

4. Run WordCount on the cluster

4.1 Create the project

File -> New -> Other -> Map/Reduce Project

Enter the project name (wordcount) and click Finish.

4.2 Create the class

Create a class in package org.apache.hadoop.examples with the class name WordCount.

4.3 WordCount code

Copy the following code into WordCount.java:

package org.apache.hadoop.examples;
 
import java.io.IOException;
import java.util.StringTokenizer;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
 
public class WordCount {
 
  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
 
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
 
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
 
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
 
    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
 
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

4.4 Configure Hadoop parameters

Copy all of the modified cluster configuration files, plus log4j.properties, into the project's src directory.

Here I copied slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
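
One possible way to fetch them, assuming the cluster's configuration lives in ~/hadoop/etc/hadoop on the master and the project is at ~/workspace/wordcount (both paths are assumptions):

# copy the cluster's configuration files into the project's src directory
scp master:~/hadoop/etc/hadoop/{slaves,core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml} ~/workspace/wordcount/src/
# log4j.properties can come from the local Hadoop copy
cp ~/hadoop-2.6.0/etc/hadoop/log4j.properties ~/workspace/wordcount/src/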

4.5 Configure the HDFS I/O paths

Right-click WordCount.java and choose Run As > Java Application.


At this point the program will not run correctly, because no input or output paths have been supplied. Right-click again, choose Run As > Run Configurations..., and fill in the input and output paths under Program arguments, separated by a space.
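
For example (these HDFS paths are assumptions; the output directory must not exist before the run):

hdfs://master:9000/user/hadoop/input hdfs://master:9000/user/hadoop/output

Create the input directory on the cluster first and put some text files in it:

# on the master
hadoop fs -mkdir -p /user/hadoop/input
hadoop fs -put etc/hadoop/*.xml /user/hadoop/input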


After configuring, click Run. A Permission denied error will appear; see section 5.

5. Problems encountered

5.1 Permission denied

The Mac user has no write access to HDFS:

# Suppose the Mac's user name is hadoop; run these on the master
groupadd supergroup                 # create the supergroup group
useradd -G supergroup hadoop        # create a hadoop user belonging to supergroup

# Open up the permissions of the HDFS files so the submitting user
# can read and write (coarse-grained, but acceptable on a test cluster)
hadoop fs -chmod 777 /
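
An alternative that avoids opening up HDFS permissions, assuming the cluster uses Hadoop's default simple authentication, is to submit the job as a user the cluster already knows. Set HADOOP_USER_NAME in the Environment tab of the Eclipse Run Configuration (the user name hadoop is an assumption):

# Environment tab of the Run Configuration, or the shell that launches Eclipse
HADOOP_USER_NAME=hadoop    # a user that owns the HDFS paths being written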

6. Browse the Hadoop source code

6.1 Download the source code

http://apache.claz.org/hadoop/common/had…

6.2 Attach the source code

In the search box in the upper-right corner, search for Open Type.


Enter NameNode and select the NameNode class; the source code is not visible yet.

Click Attach Source > External location > External folder, and point it to the unpacked Hadoop source directory.


References

Using Eclipse to Compile and Run MapReduce Programs on Hadoop 2.6.0 (Ubuntu/CentOS)