Building Hadoop High Availability Cluster Based on ZooKeeper

Time: 2019-10-1

I. Introduction to High Availability

High availability in Hadoop covers both HDFS high availability and YARN high availability. Their implementations are broadly similar, but the HDFS NameNode has far stricter requirements on data storage and consistency than the YARN ResourceManager, so its implementation is more complex. It is therefore explained first below.

1.1 High Availability Architecture

The HDFS high availability architecture is as follows:

[Figure: HDFS high availability architecture]

Image source: https://www.edureka.co/blog/h…

The HDFS high availability architecture consists of the following components:

  • Active NameNode and Standby NameNode: the two NameNodes back each other up. One is in the Active state and acts as the primary NameNode; the other is in the Standby state. Only the active NameNode serves read and write requests to clients (the status-check commands after this list show how to query the current roles).
  • Primary/standby switching controller (ZKFailoverController): ZKFailoverController runs as a separate process and controls NameNode failover. It monitors the health of its NameNode and, when the active NameNode fails, uses ZooKeeper to perform automatic leader election and switchover. NameNode also supports manual failover without ZooKeeper.
  • ZooKeeper cluster: provides leader-election support for the switching controller.
  • Shared storage system: the shared storage system is the most critical part of NameNode high availability. It preserves the HDFS metadata generated by the active NameNode during operation. The active and standby NameNodes keep their metadata in sync through this shared storage; during a failover, the new active NameNode starts serving only after confirming that its metadata is fully caught up.
  • DataNode: besides the metadata shared through the shared storage system, the active and standby NameNodes also need the mapping between HDFS data blocks and DataNodes, so each DataNode reports block location information to both NameNodes.
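Once the cluster built later in this article is running, the current NameNode roles can be queried from the command line. The commands below are only a small illustrative sketch; they assume the NameNode IDs nn1 and nn2 configured in hdfs-site.xml in section IV:

# Query each NameNode for its current HA role (active or standby)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2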

1.2 Data Synchronization in the QJM-Based Shared Storage System

Hadoop currently supports Quorum Journal Manager (QJM) or Network File System (NFS) as the shared storage system. This article uses a QJM cluster as the example: the active NameNode first writes its EditLog to the JournalNode cluster, and the standby NameNode then synchronizes the EditLog from that cluster. When the active NameNode goes down, the standby NameNode can serve requests once it has confirmed that its metadata is fully synchronized.

Note that writing the EditLog to the JournalNode cluster follows a "more than half written is success" rule, so at least three JournalNode nodes are required. The number of nodes can be increased further, but it should remain odd. With 2N + 1 JournalNodes, up to N of them can fail while the cluster keeps working (for example, 3 nodes tolerate 1 failure and 5 nodes tolerate 2).

[Figure: EditLog synchronization through the JournalNode (QJM) cluster]
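As a quick sanity check once the JournalNodes are running (section 5.2), the edit log segments replicated to a JournalNode can be listed on disk. This is only a sketch: it assumes the dfs.journalnode.edits.dir value and the mycluster nameservice configured later in this article, plus the usual <edits.dir>/<nameservice>/current on-disk layout:

# On any JournalNode host: list the edit segments stored for the "mycluster" nameservice
ls /home/hadoop/journalnode/data/mycluster/current/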

1.3 NameNode Primary/Standby Switching

The NameNode failover process is shown in the following figure (a sketch of how to force this transition manually follows the numbered steps):

[Figure: NameNode failover workflow]

  1. After HealthMonitor initialization completes, it starts internal threads that periodically call the HAServiceProtocol RPC interface of the NameNode to check its health.
  2. If HealthMonitor detects a change in the NameNode's health state, it calls back the corresponding method registered by ZKFailoverController for handling.
  3. If ZKFailoverController decides that a failover is needed, it first uses ActiveStandbyElector to carry out an automatic leader election.
  4. ActiveStandbyElector interacts with ZooKeeper to complete the automatic election.
  5. Once the election finishes, ActiveStandbyElector calls back the corresponding ZKFailoverController method to notify the current NameNode whether it has become the active or the standby NameNode.
  6. ZKFailoverController calls the NameNode's HAServiceProtocol RPC interface to transition it to the Active or Standby state.
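If the transition ever has to be driven by hand while automatic failover is enabled, the haadmin transition commands refuse to run unless explicitly forced, because forcing bypasses the coordination done by ZKFailoverController. The sketch below is for illustration only, assumes the nn1/nn2 IDs configured later in this article, and should be used with great care:

# Force nn2 to standby and nn1 to active even though ZKFC-based automatic failover is enabled
hdfs haadmin -transitionToStandby --forcemanual nn2
hdfs haadmin -transitionToActive --forcemanual nn1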

1.4 YARN High Availability

High availability for the YARN ResourceManager is similar to that of the HDFS NameNode. Unlike the NameNode, however, the ResourceManager has far less metadata to maintain, so its state can be written directly to ZooKeeper, which is also relied on for the active/standby election.

[Figure: YARN ResourceManager high availability architecture]
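Analogously to HDFS, the ResourceManager roles can be queried once YARN HA is running; a minimal sketch assuming the rm1/rm2 IDs configured in yarn-site.xml below:

# Query each ResourceManager for its current HA role
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2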

II. Cluster Planning

In line with the high availability design goal, at least two NameNodes and two ResourceManagers are needed, and at least three JournalNodes are required to satisfy the "more than half written is success" rule. Three hosts are used to build the cluster, planned as follows:

  • hadoop001: NameNode, DataNode, JournalNode, DFSZKFailoverController, NodeManager, ZooKeeper
  • hadoop002: NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager, ZooKeeper
  • hadoop003: DataNode, JournalNode, ResourceManager, NodeManager, ZooKeeper

III. Pre-conditions

  • JDK is installed on all servers; for the installation steps see: JDK installation under Linux;
  • A ZooKeeper cluster has been built; for the steps see: Zookeeper stand-alone environment and cluster environment;
  • Passwordless SSH login is configured between all servers (a minimal setup sketch follows this list).
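For reference, passwordless SSH between the three hosts is typically set up as follows. This is only a sketch; it assumes the root account and the default RSA key path (/root/.ssh/id_rsa) that the fencing configuration below also relies on:

# On each of the three servers: generate a key pair and copy the public key to every host
ssh-keygen -t rsa
ssh-copy-id root@hadoop001
ssh-copy-id root@hadoop002
ssh-copy-id root@hadoop003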

IV. Cluster Configuration

4.1 Download and Unzip

Download Hadoop. The CDH version of Hadoop is used here, downloaded from: http://archive.cloudera.com/c…

# tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz 

4.2 Configure environment variables

Edit the profile file:

# vim /etc/profile

Add the following configuration:

export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export  PATH=${HADOOP_HOME}/bin:$PATH

Run the source command to make the configuration take effect immediately:

# source /etc/profile
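To confirm that the environment variables are in effect, the Hadoop version can be printed; the 2.6.0-cdh5.15.2 build downloaded above should be reported:

# hadoop version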

4.3 Configuration Modification

Go to the ${HADOOP_HOME}/etc/hadoop directory and modify the configuration files. The contents of each configuration file are as follows:

1. hadoop-env.sh

# Specify the installation location of JDK
export JAVA_HOME=/usr/java/jdk1.8.0_201/

2. core-site.xml

<configuration>
    <property>
        <!-- Communication address of the NameNode's HDFS file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Directory where the Hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <!-- Addresses of the ZooKeeper cluster -->
        <name>ha.zookeeper.quorum</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- Timeout for ZKFC connections to ZooKeeper -->
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>10000</value>
    </property>
</configuration>

3. hdfs-site.xml

<configuration>
    <property>
        <!-- Number of HDFS replicas -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- Storage location of NameNode data (metadata); multiple comma-separated directories can be specified for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- Storage location of DataNode data (i.e. data blocks) -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/datanode/data</value>
    </property>
    <property>
        <!-- Logical name of the cluster name service -->
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <!-- List of NameNode IDs -->
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <!-- RPC address of nn1 -->
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop001:8020</value>
    </property>
    <property>
        <!-- RPC address of nn2 -->
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop002:8020</value>
    </property>
    <property>
        <!-- HTTP address of nn1 -->
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <property>
        <!-- HTTP address of nn2 -->
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <property>
        <!-- Shared storage directory for NameNode metadata on the JournalNodes -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
    </property>
    <property>
        <!-- Storage directory for JournalNode edit files -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/journalnode/data</value>
    </property>
    <property>
        <!-- Fencing mechanism ensuring that only one NameNode is active at any given time -->
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <!-- Passwordless SSH login is required when using the sshfence mechanism -->
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- SSH timeout -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <!-- Proxy provider class used by clients to determine which NameNode is currently active -->
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <!-- Enable automatic failover -->
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

4. yarn-site.xml

<configuration>
    <property>
        <!-- Auxiliary services running on the NodeManager; mapreduce_shuffle is needed to run MapReduce on YARN -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Whether log aggregation is enabled (optional) -->
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- Retention time for aggregated logs, in seconds (optional) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <!-- Enable ResourceManager HA -->
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- RM cluster identifier -->
        <name>yarn.resourcemanager.cluster-id</name>
        <value>my-yarn-cluster</value>
    </property>
    <property>
        <!-- List of logical RM IDs -->
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <!-- Service address of rm1 -->
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop002</value>
    </property>
    <property>
        <!-- Service address of rm2 -->
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop003</value>
    </property>
    <property>
        <!-- Web application address of rm1 -->
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop002:8088</value>
    </property>
    <property>
        <!-- Web application address of rm2 -->
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop003:8088</value>
    </property>
    <property>
        <!-- Addresses of the ZooKeeper cluster -->
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <!-- Enable automatic recovery -->
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- Class used for persistent state storage -->
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

5. mapred-site.xml

<configuration>
    <property>
        <!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

6. slaves

Configure the host names or IP addresses of all worker nodes, one per line. The DataNode service and the NodeManager service will be started on every worker node.

hadoop001
hadoop002
hadoop003

4.4 Distribute the Installation Package

Distribute the Hadoop installation package to the other two servers. It is recommended to configure the Hadoop environment variables on those servers as well; a sketch of doing this remotely follows the commands below.

# Distribute the installation package to hadoop002
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/  hadoop002:/usr/app/
# Distribute the installation package to hadoop003
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/  hadoop003:/usr/app/
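If the environment variables should also be set on hadoop002 and hadoop003 without logging in to each machine, the lines from section 4.2 can be appended remotely. This is only a sketch assuming root SSH access; repeating section 4.2 by hand on each server works just as well:

# Example for hadoop002 (repeat for hadoop003): append the same variables as in section 4.2
ssh hadoop002 "cat >> /etc/profile" <<'EOF'
export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH
EOF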

V. Starting the Cluster

5.1 Start ZooKeeper

Start the ZooKeeper service on all three servers:

 zkServer.sh start

5.2 Start Journal node

Go to the ${HADOOP_HOME}/sbin directory on all three servers and start the journalnode process:

hadoop-daemon.sh start journalnode

5.3 Initialize NameNode

Run the NameNode initialization command on hadoop001:

hdfs namenode -format

After executing the initialization command, copy the contents of the NameNode metadata directory to the other, unformatted NameNode. The metadata directory is the one specified by the dfs.namenode.name.dir property in hdfs-site.xml. Here it is copied to hadoop002:

 scp -r /home/hadoop/namenode/data hadoop002:/home/hadoop/namenode/
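As an aside, the same synchronization can usually be achieved without copying files by hand: Hadoop provides a bootstrap command that pulls the formatted metadata from the other NameNode. Either approach should leave hadoop002 with a consistent copy of the metadata:

# Alternative, run on hadoop002 (the formatted NameNode on hadoop001 must be running)
hdfs namenode -bootstrapStandby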

5.4 Initialize HA State

Run the following command on either NameNode to initialize the HA state in ZooKeeper:

hdfs zkfc -formatZK
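Optionally, the znode created by this command can be checked from the ZooKeeper command-line client; a small sketch assuming zkCli.sh from the ZooKeeper installation is on the PATH:

# Connect to ZooKeeper and verify that the HA election znode was created
zkCli.sh -server hadoop001:2181
# then, inside the client shell:
ls /hadoop-ha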

5.5 Start HDFS

Go to the ${HADOOP_HOME}/sbin directory on hadoop001 and start HDFS. This starts the NameNode services on hadoop001 and hadoop002 and the DataNode services on all three servers:

start-dfs.sh

5.6 Start YARN

Go to the ${HADOOP_HOME}/sbin directory on hadoop002 and start YARN. This starts the ResourceManager service on hadoop002 and the NodeManager services on all three servers:

start-yarn.sh

Note that at this point the ResourceManager service on hadoop003 is usually not started and needs to be started manually:

yarn-daemon.sh start resourcemanager

VI. Viewing the Cluster

6.1 View Processes

After successful startup, the processes on each server should be as follows:

[root@hadoop001 sbin]# jps
4512 DFSZKFailoverController
3714 JournalNode
4114 NameNode
3668 QuorumPeerMain
5012 DataNode
4639 NodeManager


[root@hadoop002 sbin]# jps
4499 ResourceManager
4595 NodeManager
3465 QuorumPeerMain
3705 NameNode
3915 DFSZKFailoverController
5211 DataNode
3533 JournalNode


[root@hadoop003 sbin]# jps
3491 JournalNode
3942 NodeManager
4102 ResourceManager
4201 DataNode
3435 QuorumPeerMain

6.2 View Web UI

The web UI ports of HDFS and YARN are 50070 and 8088 respectively. The interfaces should look as follows:

At this point the NameNode on hadoop001 is in the active state:

[Screenshot: NameNode web UI on hadoop001 (active)]

while the NameNode on hadoop002 is in the standby state:

[Screenshot: NameNode web UI on hadoop002 (standby)]

The ResourceManager on hadoop002 is in the active state:

[Screenshot: ResourceManager web UI on hadoop002 (active)]

and the ResourceManager on hadoop003 is in the standby state:

[Screenshot: ResourceManager web UI on hadoop003 (standby)]

The NameNode web interface also shows information about the Journal Manager:

[Screenshot: Journal Manager information on the NameNode web UI]

VII. Restarting the Cluster

The initial start-up of the cluster above involves some necessary initialization operations, so the process is slightly cumbersome. Once the cluster has been built, however, starting it again is straightforward. The steps are as follows (first make sure the ZooKeeper cluster is started):

Start HDFS on hadoop001; this starts all HDFS-related services, including NameNode, DataNode and JournalNode:

start-dfs.sh

Start YARN on hadoop002:

start-yarn.sh

As before, the ResourceManager service on hadoop003 is usually not started automatically and needs to be started manually:

yarn-daemon.sh start resourcemanager
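For convenience, the restart steps in this section can be collected into a small helper script. This is only a sketch run from hadoop001; it assumes the passwordless root SSH configured earlier and that zkServer.sh is on the PATH of non-interactive SSH sessions:

#!/usr/bin/env bash
# restart-cluster.sh - restart helper based on the steps in this section
HADOOP_SBIN=/usr/app/hadoop-2.6.0-cdh5.15.2/sbin

# 1. Start ZooKeeper on all three servers
for host in hadoop001 hadoop002 hadoop003; do
    ssh "$host" "zkServer.sh start"
done

# 2. Start HDFS (NameNode, DataNode and JournalNode) from hadoop001
"$HADOOP_SBIN"/start-dfs.sh

# 3. Start YARN on hadoop002, then the second ResourceManager on hadoop003
ssh hadoop002 "$HADOOP_SBIN/start-yarn.sh"
ssh hadoop003 "$HADOOP_SBIN/yarn-daemon.sh start resourcemanager"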

Reference material

The steps above mainly follow the official documentation:

  • HDFS High Availability Using the Quorum Journal Manager
  • ResourceManager High Availability

For a detailed analysis of the Hadoop high availability principles, the following is recommended reading:

Hadoop NameNode High Availability Implementation Parsing

For more articles in the big data series, see the personal GitHub open-source project: A Guide to Big Data.