5. Install Hadoop cluster

Time: 2020-03-25

Hadoop is a distributed system infrastructure developed by the Apache Foundation. The core of the framework is HDFS and MapReduce: HDFS (the Hadoop Distributed File System) provides storage for massive data, while MapReduce provides computation over it.

Stand-alone mode installation

  1. Extract the installation package to the specified location and rename it.

    tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
    mv /opt/module/hadoop-2.7.2 /opt/module/hadoop
  2. Add Hadoop to the environment variables in “/etc/profile”.

    # Hadoop
    export HADOOP_HOME=/opt/module/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin

    Remember to run “source /etc/profile” after saving so that the changes take effect immediately.

  3. Run “hadoop version” to check the installed version of Hadoop.
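
    If the environment variables are set correctly, the command prints the version on its first line (the shell prompt below is illustrative; the build details in the full output will vary):

    [user@hadoop151 ~]$ hadoop version
    Hadoop 2.7.2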

Fully distributed installation

  1. Based on the stand-alone installation, enter the “etc/hadoop” directory under the Hadoop installation directory and modify the configuration files.
  2. Set “JAVA_HOME” in the “hadoop-env.sh”, “mapred-env.sh” and “yarn-env.sh” files, as shown below.
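
    For example, if the JDK is installed under “/opt/module” (the JDK directory below is an assumption; substitute your own), each of the three files gets a line like:

    export JAVA_HOME=/opt/module/jdk1.8.0_144   # assumed path; adjust to your JDK install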
  3. Modify the “core-site.xml” file.

    <!-- specify the address of the NameNode in HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop151:9000</value>
    </property>
    
    <!-- specify the storage directory for files generated at Hadoop runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop/data/tmp</value>
    </property>
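
    Formatting the NameNode in step 9 will create what it needs under this path, but the directory can also be created up front:

    mkdir -p /opt/module/hadoop/data/tmp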
  4. Modify the “hdfs-site.xml” file.

    <!-- specify the number of HDFS replicas; optional, the default value is 3 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    
    <!-- specify the host for the Hadoop SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop153:50090</value>
    </property>
  5. Modify the “mapred-site.xml” file.

    <!-- specify that MR runs on YARN -->
    <property>    
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
    <!-- history server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop151:10020</value>
    </property>
    
    <!-- history server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop151:19888</value>
    </property>
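
    Note that the history server is not launched by the start scripts in step 10; once the cluster is running, it can be started separately on hadoop151 (the user name in the prompt is illustrative):

    [user@hadoop151 ~]$ mr-jobhistory-daemon.sh start historyserver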
  6. Modify the slaves file.

    hadoop151
    hadoop152
    hadoop153
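
    These host names must resolve on every machine; if DNS is not configured, map them in “/etc/hosts” (the IP addresses below are placeholders for your own):

    192.168.1.151 hadoop151
    192.168.1.152 hadoop152
    192.168.1.153 hadoop153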
  7. Modify the “yarn-site.xml” file.

    <!-- the auxiliary service through which reducers fetch map output -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <!-- specify the host of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop152</value>
    </property>
    
    <!-- enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    
    <!-- set the log retention time to 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
  8. Perform all the above steps on the other two virtual machines, or sync the configuration directory as sketched below.
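
    A minimal sketch of syncing the configuration instead, assuming the same directory layout on all machines and SSH access between them (the user name is illustrative):

    rsync -av /opt/module/hadoop/etc/hadoop/ user@hadoop152:/opt/module/hadoop/etc/hadoop/
    rsync -av /opt/module/hadoop/etc/hadoop/ user@hadoop153:/opt/module/hadoop/etc/hadoop/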
  9. On hadoop151, enter the Hadoop directory and run “bin/hdfs namenode -format” to initialize the cluster. A success message appears when formatting completes.
  10. Start HDFS on hadoop151 and YARN on hadoop152.

    [user@hadoop151 ~]$ start-dfs.sh
    [user@hadoop152 ~]$ start-yarn.sh
  11. Run the “jps” command on each of the three virtual machines to view the running Java processes.
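
    Based on the configuration above, each node should show roughly the following processes (process IDs will differ):

    hadoop151: NameNode, DataNode, NodeManager
    hadoop152: ResourceManager, DataNode, NodeManager
    hadoop153: SecondaryNameNode, DataNode, NodeManager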

  12. Finally, open a browser on the physical machine and go to “hadoop151:50070” to view the HDFS web UI.

Configure Hadoop to support LZO compression

LZO (short for Lempel-Ziv-Oberhumer) is a data compression algorithm focused on decompression speed. Enabling LZO compression is very useful for small-scale clusters: compressed logs shrink to roughly 1/3 of their original size, and decompression remains fast.

  1. Install the LZO libraries on each virtual machine. LZO is not supported natively by the Linux system, so the packages need to be downloaded and installed.

    [user@hadoop151 ~]$ sudo yum install -y lzo lzo-devel
    [user@hadoop152 ~]$ sudo yum install -y lzo lzo-devel
    [user@hadoop153 ~]$ sudo yum install -y lzo lzo-devel
  2. Put “hadoop-lzo-0.4.20.jar” into the “hadoop/share/hadoop/common” directory.
  3. Modify the “core-site.xml” file to register the LZO codecs.

    <property>
       <name>io.compression.codecs</name>
       <value>
           org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.DefaultCodec,
           org.apache.hadoop.io.compress.BZip2Codec,
           org.apache.hadoop.io.compress.SnappyCodec,
           com.hadoop.compression.lzo.LzoCodec,
           com.hadoop.compression.lzo.LzopCodec
       </value>
    </property>
    
    <property>
       <name>io.compression.codec.lzo.class</name>
       <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
  4. Repeat the above steps on the other two virtual machines, then restart the cluster.
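
    To sanity-check the setup, hadoop-lzo ships an indexer that makes “.lzo” files splittable for MapReduce jobs (the input path below is illustrative):

    hadoop jar /opt/module/hadoop/share/hadoop/common/hadoop-lzo-0.4.20.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo   # illustrative path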
