Hadoop (3): Pseudo-distributed operation of Hadoop

Time: 2020-3-31

This article is also published on liaosi's blog: Hadoop (3) pseudo distributed operation of Hadoop

The environment used in this article: a VMware virtual machine running 64-bit CentOS 7, Hadoop 2.8.2, and JDK 1.8. The account used is the previously created hadoop account (see Hadoop (I) Hadoop introduction and preparation before installation).
Before installing Hadoop, make sure the system has a Java JDK installed and the Java environment variables configured.

In the previous article, Hadoop in stand-alone mode did not start any HDFS or YARN daemons; running the jps command showed only a single process called RunJar. In practice Hadoop usually runs as a large cluster, consisting of an HDFS cluster and a YARN cluster. If both clusters run on a single machine, that is the pseudo-distributed mode described in this article; if they span multiple machines, that is fully distributed mode, which is what real deployments use.

So how do we run Hadoop in pseudo-distributed mode?

1、 Preparations

The preparation is the same as for Hadoop in stand-alone mode (see Hadoop (2) HelloWorld (installation and use in stand-alone mode)) and consists of two steps:

  1. Configure the Java installation directory in Hadoop's runtime environment file
    Edit the ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh file and set JAVA_HOME to the root of the Java installation (a minimal example is shown after this list).

      cd $HADOOP_HOME/etc/hadoop
      vim hadoop-env.sh
    


  2. Configure environment variables for Hadoop
    Add the following to /etc/profile:

        export HADOOP_HOME=/home/hadoop/app/hadoop-2.8.2
        export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    

    For example, my /etc/profile contains exactly these two lines.
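As a quick check that both steps took effect, here is a minimal sketch (the JDK path below is hypothetical; substitute your own installation paths):

      # in ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, point JAVA_HOME at the JDK root
      export JAVA_HOME=/usr/local/jdk1.8.0_152   # hypothetical JDK path

      # reload the profile and verify that the hadoop command is on the PATH
      source /etc/profile
      hadoop version   # should report Hadoop 2.8.2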

2、 Edit the configuration files

Enter the $HADOOP_HOME/etc/hadoop directory to start configuring the Hadoop single-node cluster.
For a description of Hadoop's main configuration files, see the official documentation at http://hadoop.apache.org/docs… (the Configuration section is in the lower-left corner of that page).


1. Edit core-site.xml

<configuration>
    <!-- 1. Configure the default file system. HDFS is configured here, but it is only one kind of file system;
         it is specified by a URI of the form
         {protocol}://{namenode host address or host name}:{namenode port}
         The namenode port is also the port through which clients access the HDFS file system. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>

    <!-- 2. Configure Hadoop's temporary data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.8.2/tmp</value>
    </property>
</configuration>
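With fs.defaultFS set this way, a bare path such as /user/hadoop resolves against HDFS by default. Once the cluster is started later in this article, the two commands below should list the same thing (just a quick sanity check):

hadoop fs -ls hdfs://localhost:9000/
hadoop fs -ls /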

2. Edit hdfs-site.xml

<configuration>
        <!-- 1. Where the namenode stores the name table (fsimage) locally.
             The default is file://${hadoop.tmp.dir}/dfs/name -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/hadoopdata/dfs/name</value>
        </property>

        <!-- 2. Where the datanode stores its blocks locally.
             The default is file://${hadoop.tmp.dir}/dfs/data -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/hadoop/hadoopdata/dfs/data</value>
        </property>

        <!-- 3. The number of replicas of each file in the HDFS file system. The default is 3 -->
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>

        <!-- 4. The host address and port of the secondary namenode.
             This can be omitted; the default in Hadoop 2.x is 0.0.0.0:50090 -->
        <property>
                <name>dfs.secondary.http.address</name>
                <value>0.0.0.0:50090</value>
        </property>
</configuration>
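To confirm that these are the values Hadoop actually picks up (once the environment variables from section 1 are loaded), you can query the effective configuration from the command line:

hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.name.dir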

3. Edit mapred-site.xml

In this directory there is only mapred-site.xml.template. Copy it to mapred-site.xml with the command cp mapred-site.xml.template mapred-site.xml, because Hadoop looks for mapred-site.xml rather than the template.
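As concrete commands (run from the Hadoop configuration directory):

cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml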

<configuration>
    <!-- The framework that MapReduce programs run on; use yarn -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

4. Edit yarn-site.xml

<configuration>
    <!-- How reduce tasks fetch their data: via the mapreduce_shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3、 Format the file system

Execute command:

hdfs namenode -format

The output information is similar to the following:

[hadoop@server04 hadoop]$ hdfs namenode -format
18/05/05 16:33:37 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hadoop
STARTUP_MSG:   host = server04/192.168.128.4
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.8.2
……
……
18/05/05 16:33:39 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/05/05 16:33:39 INFO util.ExitUtil: Exiting with status 0
18/05/05 16:33:39 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server04/192.168.128.4
************************************************************/

4、 Start the Hadoop cluster

cd $HADOOP_HOME/sbin/

1. Execute the start-dfs.sh script


./start-dfs.sh

You will be prompted for the password of the currently logged-in hadoop account. It is better to configure passwordless login so that you do not have to type the password every time; for details, see the article on SSH connections to Linux servers (a minimal sketch is also shown after the output below).
The final output information is as follows:

[hadoop@server04 sbin]$ ./start-dfs.sh 
Starting namenodes on [localhost]
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 44:92:44:48:ec:dc:71:9b:90:a0:6e:92:20:8b:cf:16.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password: 
localhost: starting namenode, logging to /home/hadoop/app/hadoop-2.8.2/logs/hadoop-hadoop-namenode-server04.out
hadoop@localhost's password: 
localhost: starting datanode, logging to /home/hadoop/app/hadoop-2.8.2/logs/hadoop-hadoop-datanode-server04.out
Starting secondary namenodes [localhost]
hadoop@localhost's password: 
localhost: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.8.2/logs/hadoop-hadoop-secondarynamenode-server04.out
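A minimal sketch of setting up that passwordless SSH login to localhost (standard OpenSSH tools, default key paths; adjust as needed):

# generate a key pair for the hadoop account (default path, empty passphrase)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# authorize the key for logins to localhost
ssh-copy-id hadoop@localhost
# verify: this should no longer prompt for a password
ssh localhost exit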

2. Execute the start-yarn.sh script

./start-yarn.sh

The output information is as follows:

[hadoop@server04 sbin]$ ./start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.8.2/logs/yarn-hadoop-resourcemanager-server04.out
hadoop@localhost's password: 
localhost: starting nodemanager, logging to /home/hadoop/app/hadoop-2.8.2/logs/yarn-hadoop-nodemanager-server04.out
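At this point all of the pseudo-distributed daemons should be up. A quick way to confirm is the jps command mentioned at the beginning of this article:

jps
# expected processes (the PIDs will differ on your machine):
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager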

5、 Viewing Hadoop services in the browser

1. The default web UI port of the Hadoop NameNode is 50070. You can visit http://192.168.128.4:50070/ in the browser to check the status and usage of the HDFS file system. Here, 192.168.128.4 is the IP address of my virtual machine.


2. Port 8088 shows Hadoop cluster and application information (the YARN ResourceManager web UI): http://192.168.128.4:8088/.


3. Port 50090 shows the secondary namenode information: http://192.168.128.4:50090/.


4. Port 50075 shows the datanode information: http://192.168.128.4:50075/.

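If you prefer the command line, you can also check from the shell that each of these web endpoints responds (plain curl against my virtual machine's IP; substitute your own address):

# expect an HTTP 200 from each daemon's web UI
for port in 50070 8088 50090 50075; do
    curl -s -o /dev/null -w "$port: %{http_code}\n" "http://192.168.128.4:$port/"
done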

6、 Test the HDFS file system of the single Hadoop node

1. Execute the following commands to create an HDFS directory

hadoop fs -mkdir /user
hadoop fs -mkdir /user/hadoop

2. Copy a local file (test.log) to the Hadoop distributed file system

hadoop fs -put /home/hadoop/test.log /user/hadoop/log

To see if the copy was successful:

[hadoop@server04 ~]$ hadoop fs -ls /user/hadoop
Found 1 items
-rw-r--r--   1 hadoop supergroup      44486 2018-05-05 17:59 /user/hadoop/log

You can also browse the file in the browser, through the NameNode web UI's file browser (Utilities > Browse the file system).
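To round off the test, here are a couple of commands that read the file back out of HDFS (test.log is the same local file used above; the local copy path is arbitrary):

# print the beginning of the file stored in HDFS
hadoop fs -cat /user/hadoop/log | head
# copy it back to the local file system
hadoop fs -get /user/hadoop/log /tmp/test.log.copy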
