Sharpening the axe won't delay cutting the firewood. So before we build anything, here's a question:
What is the relationship between MapReduce and YARN?
A: YARN is not the next-generation MapReduce (MRv2). MRv2 has exactly the same programming interface and data-processing engine (MapTask and ReduceTask) as the first-generation MapReduce (MRv1); you can think of MRv2 as reusing those modules from MRv1. What differs is the resource-management and job-management system. In MRv1, both resource management and job management were implemented by the JobTracker, which combined the two functions. In MRv2, the two are separated: job management is handled by the ApplicationMaster, while resource management is handled by the new system, YARN. Because YARN is general-purpose, it can also serve as the resource-management system for computing frameworks other than MapReduce, such as Spark and Storm. We usually call a computing framework running on YARN "X on YARN", e.g. "MapReduce on YARN", "Spark on YARN", "Storm on YARN".
Hadoop 2.0 consists of three subsystems: HDFS, YARN, and MapReduce. YARN is a new resource-management system, while MapReduce is just one application that runs on YARN. If you regard YARN as a cloud operating system, MapReduce can be regarded as an app running on that operating system.
With the relationship between MapReduce and YARN clarified, let's formally build the environment.
Preparation: see steps one through six of "Building the hadoop-0.20.2 environment".
System: ubuntu-12.04 (other versions work too)
Mode: pseudo distributed
Build user: Hadoop
Hadoop-2.2.0 download address: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/
Choose the installation package you need; here we choose hadoop-2.2.0.tar.gz.
Apache Hadoop mirror list: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Statement 1: the directory where I configure hadoop-2.2.0 is /home/hadoop.
Statement 2: a yarn directory is created under /home/hadoop; both the hadoop-2.2.0 directory and the Hadoop data directory live under this yarn directory.
Statement 3: you can change /home/hadoop to your own directory in the build process below.
Step 1: upload hadoop-2.2.0.tar.gz and extract it into the /home/hadoop/yarn directory, which produces the hadoop-2.2.0 directory under yarn. Then fix its ownership:
sudo chown -R hadoop:hadoop hadoop-2.2.0
Create Hadoop data directory:
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode
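If you want to rehearse the two mkdir commands first, the same layout can be exercised under a throwaway root (the mktemp directory below is purely illustrative; in the real setup the root is /home/hadoop/yarn):

```shell
# Rehearse the data-directory layout under a temporary root; substitute
# /home/hadoop/yarn for $ROOT in the actual setup.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/yarn_data/hdfs/namenode"
mkdir -p "$ROOT/yarn_data/hdfs/datanode"
ls "$ROOT/yarn_data/hdfs"
```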
Before setting up the configuration files, here is a general introduction to the folders in the hadoop-2.2.0 directory; note the differences from hadoop-1:
The outer startup scripts are in the sbin directory.
The inner scripts they call are in the bin directory.
The native .so files are in the lib/native directory.
The configuration utility scripts are placed in libexec.
The configuration files are all in the etc directory, corresponding to the conf directory of the previous version.
All the jar packages are under the share/hadoop directory.
Step 2: configure environment variables
I did not make the hadoop-2.2.0 environment variables global, so I did not configure them in the system-wide /etc/profile.
If you do configure /etc/profile, execute
source /etc/profile
to make it take effect.
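If you prefer per-user environment variables instead, a typical sketch appended to ~/.bashrc looks like the following (the variable names are the conventional Hadoop ones; the paths assume the layout from the statements above):

```shell
# Hypothetical per-user Hadoop environment (append to ~/.bashrc);
# paths follow Statement 2 above.
export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing, run source ~/.bashrc so the current shell picks up the variables.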
Step 3: configure core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml
Next, we will configure them in the /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop directory.
core-site.xml configuration
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>Specify the hostname (or IP address) and port of the NameNode</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Number of block replicas</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 4: slaves configuration
Since this is pseudo-distributed, the slaves file contains only localhost.
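The slaves file is just one worker hostname per line, so for pseudo-distributed mode it holds the single entry localhost. A sketch (written to a temporary path here; the real file is etc/hadoop/slaves under the hadoop-2.2.0 directory):

```shell
# Demonstrate the one-line slaves file; in the real setup write to
# etc/hadoop/slaves instead of the temporary path used here.
SLAVES=/tmp/slaves_demo
echo localhost > "$SLAVES"
cat "$SLAVES"
```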
Step 5: synchronize the configured hadoop-2.2.0 distribution to each data node
Because it’s pseudo distributed, skip this step.
Step 6: format namenode
Execute command:
bin/hdfs namenode -format
or
bin/hadoop namenode -format
Step 7: start HDFS and yarn
Start HDFS:
sbin/start-dfs.sh
Start yarn:
sbin/start-yarn.sh
Or execute
sbin/start-all.sh
to start HDFS and YARN together (this script is deprecated in 2.x but still works).
In addition, you need to start the history server, otherwise the history links in the web UI cannot be opened.
sbin/mr-jobhistory-daemon.sh start historyserver
Next, use the JPS command to view the startup process:
4504 ResourceManager
4066 DataNode
4761 NodeManager
5068 JobHistoryServer
4357 SecondaryNameNode
3833 NameNode
5127 Jps
Step 8: Test
HDFS test:
Create a directory in HDFS: bin/hadoop fs -mkdir /wordcount
Upload a file to HDFS: bin/hadoop fs -put /home/hadoop/file2.txt /wordcount
View the HDFS directory listing: bin/hdfs dfs -ls /
YARN test: run the wordcount example program:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /wordcount /output2
View the results:
bin/hadoop fs -cat /output2/*
The output is as follows:
hadoop 1
hello 2
java 4
jsp 1
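The counting that the WordCount job performs can be imitated locally with standard Unix tools, which is a quick way to sanity-check expected output on a small sample (the file name and contents below are made up for illustration):

```shell
# Emulate word counting on a tiny sample file with plain Unix tools:
# split on spaces, sort, count duplicates, then print "word count".
printf 'hello java\nhello hadoop\n' > /tmp/wordcount_demo.txt
tr -s ' ' '\n' < /tmp/wordcount_demo.txt | sort | uniq -c | awk '{print $2, $1}'
# hadoop 1
# hello 2
# java 1
```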
At this point, the hadoop-2.2.0 environment is set up. The configuration files should be adjusted to your specific requirements; there may be improper settings here, and corrections are welcome.