Build Hadoop cluster: VirtualBox + Ubuntu 14.04 + Hadoop 2.6.0
After setting up the cluster, install Eclipse on the Mac and connect it to the Hadoop cluster
1. Access cluster
1.1 Modify the Mac's hosts file
Add the master's IP to the Mac's /etc/hosts:
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
192.168.56.101 master   # add the master's IP here
1.2 access cluster
On master, start the cluster.
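If the daemons are not already running, the usual start scripts bring up HDFS and YARN. A minimal sketch, assuming HADOOP_HOME on master points at the Hadoop 2.6.0 installation:
# run on master; HADOOP_HOME is an assumption about your install location
$HADOOP_HOME/sbin/start-dfs.sh    # starts the NameNode and DataNodes
$HADOOP_HOME/sbin/start-yarn.sh   # starts the ResourceManager and NodeManagers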
On the Mac, open http://master:50070/
If the page loads, the cluster is reachable and you can see its status information.
2. Download and install eclipse
Eclipse IDE for Java Developers
http://www.eclipse.org/downloads/package…
3. Configure eclipse
3.1 configure Hadoop eclipse plugin
3.1.1 download Hadoop eclipse plugin
You can download the hadoop2x-eclipse-plugin (alternate download address: http://pan.baidu.com/s/1i4ikIoP)
3.1.2 install Hadoop eclipse plugin
In the Applications folder, find Eclipse, right-click it, and choose Show Package Contents.
Copy the plugin jar into the plugins directory, then reopen Eclipse.
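For example, a sketch only (the jar name and the path inside Eclipse.app depend on the plugin build and the Eclipse version you installed):
# placeholder paths: adjust to where the plugin jar was downloaded and where Eclipse lives
cp ~/Downloads/hadoop2x-eclipse-plugin/release/hadoop-eclipse-plugin-2.6.0.jar \
   /Applications/Eclipse.app/Contents/Eclipse/plugins/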
3.2 Connect to the Hadoop cluster
3.2.1 configure Hadoop installation directory
Unpack the Hadoop installation package into any directory (no configuration is needed), then point Eclipse at that directory in the plugin's preferences.
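For example (paths are placeholders; use the same 2.6.0 release the cluster runs):
# unpack the Hadoop 2.6.0 distribution somewhere on the Mac, e.g. the home directory
tar -zxf ~/Downloads/hadoop-2.6.0.tar.gz -C ~/
# then point the plugin's Hadoop installation directory preference at ~/hadoop-2.6.0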
3.2.2 configure cluster address
Click the plus sign in the upper right corner
Add the Map/Reduce view.
In the Map/Reduce Locations view, right-click and choose New Hadoop location.
Fill in the Location name and Host, and change the port under DFS Master and the user name to match your cluster (the host master resolves to the IP configured in the Mac's hosts file), then click Finish.
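The DFS Master host and port should match the fs.defaultFS value in the cluster's core-site.xml, and the user name should be the user that owns the HDFS files. As an illustration only (9000 is a common choice, not necessarily yours), a core-site.xml like the following means the DFS Master port is 9000:
<!-- excerpt from the cluster's core-site.xml; host and port are examples -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>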
3.2.3 view HDFS
Check whether you can now browse HDFS directly from Eclipse.
4. Run wordcount in the cluster
4.1 create project
File -> New -> Other -> Map/Reduce Project
Enter the project name: wordcount and click finish
4.2 creating classes
Create a class with package org.apache.hadoop.examples and class name WordCount.
4.3 wordcount code
Copy the following code into WordCount.java:
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
4.4 configure Hadoop parameters
Copy all of the modified configuration files and log4j.properties from the cluster into the project's src directory.
Here I copied slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
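A sketch of the copy, assuming Hadoop lives under /usr/local/hadoop on master and the Eclipse workspace is ~/Documents/workspace (both paths are placeholders):
# pull the cluster's configuration files from master into the project's src directory
scp hadoop@master:/usr/local/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves,log4j.properties} \
    ~/Documents/workspace/wordcount/src/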
4.5 configure HDFS input/output paths
Right-click WordCount.java and choose Run As > Java Application.
At this point the program will not run correctly. Right-click again, choose Run As, and select Run Configurations.
Fill in the input and output paths as program arguments (separated by a space).
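For example, assuming fs.defaultFS is hdfs://master:9000 and the input files have already been uploaded to /user/hadoop/input (both placeholders), the program arguments could be:
hdfs://master:9000/user/hadoop/input hdfs://master:9000/user/hadoop/output
Note that the output directory must not exist yet; Hadoop refuses to write into an existing output path.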
After configuring, click Run; a Permission denied error will appear.
5. Problems encountered while running
5.1 Permission denied
The local Mac user has no permission to access HDFS.
# Suppose the Mac's user name is hadoop; run these on the master node
groupadd supergroup                  # create the supergroup group
useradd -G supergroup hadoop         # add the hadoop user to the supergroup group
# Change the group permissions of the HDFS files in the Hadoop cluster so that all users in the supergroup group can read and write
hadoop fs -chmod 777 /
6. Check Hadoop source code
6.1 download source code
http://apache.claz.org/hadoop/common/had…
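After downloading, unpack the source tarball somewhere convenient (the file name and path are placeholders for the 2.6.0 source release):
tar -zxf ~/Downloads/hadoop-2.6.0-src.tar.gz -C ~/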
6.2 link source code
In the search box in the upper right corner, search for Open Type.
Enter NameNode and select it; you will not be able to see the source code yet.
Click Attach Source > External location > External Folder and point it at the unpacked source directory.
Reference material
Using Eclipse to compile and run MapReduce programs with Hadoop 2.6.0 (Ubuntu/CentOS)