3 - Build Hadoop HA

Time: 2022-06-20

0. Change the host names

Run the corresponding command on each of the three nodes:

 hostnamectl set-hostname master
 hostnamectl set-hostname master2
 hostnamectl set-hostname slave
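
As a quick check, each node should now report the name just set (a minimal hedged sketch):

 hostname   # run on each node; should print master / master2 / slave respectively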

1. SSH password-free login

(1) Generate a key pair (run on all three nodes)

 ssh-keygen

(2) Configure /etc/hosts (on all three nodes)

 vi /etc/hosts

 192.168.204.152 master
 192.168.204.153 master2
 192.168.204.154 slave

(3) Append the public key to every host (run on all three nodes)

First host:
 ssh-copy-id master
 yes
 000000 (enter the password that was set)
Second host:
 ssh-copy-id master2
 yes
 000000 (enter the password that was set)
Third host (note: the other two hosts are also connecting to it for the first time here, so answer "yes" first and then enter the password):
 ssh-copy-id slave
 yes
 000000 (enter the password that was set)
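
As a quick check (a minimal sketch, assuming the three hostnames above), every node should now be able to reach the others without a password prompt:

 for h in master master2 slave; do ssh $h hostname; done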

2. Turn off the firewall (on all three nodes)

Stop the firewall:
 systemctl stop firewalld
Disable the firewall at boot:
 systemctl disable firewalld.service
Check the firewall status:
 systemctl status firewalld
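
Since passwordless SSH is already in place, the same commands can also be pushed to all three nodes from one host (a hedged sketch, assuming the hostnames above):

 for h in master master2 slave; do
   ssh $h "systemctl stop firewalld && systemctl disable firewalld"
 done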

3. JDK installation and configuration

(1). Unpack the JDK

Use Xftp to upload the JDK archive to the /soft directory:
 mkdir /soft
Create /usr/java:
 mkdir -p /usr/java/
Unpack the archive:
 cd /soft
 tar -zxvf jdk-8u77-linux-x64.tar.gz -C /usr/java/

(2). Configure environment variables

Append the following to /etc/profile:
 vi /etc/profile
 export JAVA_HOME=/usr/java/jdk1.8.0_77
 export PATH=$PATH:$JAVA_HOME/bin
Make the new environment variables take effect immediately:
 source /etc/profile

(3). Synchronize JDK and environment variables to the other two servers

 scp -r /usr/java/ master2:/usr/
 scp /etc/profile master2:/etc/

 scp -r /usr/java/ slave:/usr/
 scp /etc/profile slave:/etc/

Then refresh the environment on master2 and slave:
 source /etc/profile
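
To confirm the JDK is visible on every node (a minimal sketch; /etc/profile is sourced explicitly because ssh does not load it for non-interactive commands):

 for h in master master2 slave; do echo "== $h =="; ssh $h "source /etc/profile; java -version"; done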

4. ZooKeeper installation and configuration

(1). Unpack the ZooKeeper component

Upload the archive to the /soft directory using Xftp.

Create a new directory on all three machines: /usr/hadoop

mkdir /usr/hadoop

Enter the /soft directory and unpack ZooKeeper:

cd /soft
tar -zxvf zookeeper-3.4.10.tar.gz -C /usr/hadoop/

(2). Configure environment variables

Add the following to /etc/profile:

vi /etc/profile
export ZOOKEEPER_HOME=/usr/hadoop/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Make the environment variables take effect immediately:

source /etc/profile

(3). Configure ZooKeeper

1. Enter the conf directory and create the configuration file:

cd /usr/hadoop/zookeeper-3.4.10/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

2. Delete all of the existing contents and add the following:

tickTime=2000
initLimit=10
syncLimit=5
# data storage location
dataDir=/hadoop/zookeeper/zkdata
# log storage location
dataLogDir=/hadoop/zookeeper/zklog
# client port
clientPort=2181
# the three nodes that run ZooKeeper
server.1=master:2888:3888
server.2=master2:2888:3888
server.3=slave:2888:3888

(4). Synchronize ZooKeeper and the environment variables to the other two servers

scp -r /usr/hadoop/zookeeper-3.4.10 master2:/usr/hadoop/
scp /etc/profile master2:/etc/
scp -r /usr/hadoop/zookeeper-3.4.10 slave:/usr/hadoop/
scp /etc/profile slave:/etc/

Then refresh the environment on master2 and slave:

source /etc/profile

(5). Create the data and log directories

On each of the three nodes:

mkdir -p /hadoop/zookeeper/zkdata
mkdir -p /hadoop/zookeeper/zklog

(6). Create myid

Create the file myid under /hadoop/zookeeper/zkdata

cd /hadoop/zookeeper/zkdata
vi myid

The file contains only the current server's id (1, 2, or 3), which must match the server.N entries in zoo.cfg above:

master node  -> 1
master2 node -> 2
slave node   -> 3
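
Equivalently, the myid files can be written with one command per node (a minimal sketch; run the matching line on each host):

 echo 1 > /hadoop/zookeeper/zkdata/myid   # on master
 echo 2 > /hadoop/zookeeper/zkdata/myid   # on master2
 echo 3 > /hadoop/zookeeper/zkdata/myid   # on slave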

5. Hadoop installation and configuration

(1). Unpack the Hadoop component

Upload the archive to the /soft directory, then unpack it:

cd /soft
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop/

(2). Modify the corresponding configuration file

Enter the configuration file directory

cd /usr/hadoop/hadoop-2.7.3/etc/hadoop/

1) Configure core-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1/</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>master:2181,master2:2181,slave:2181</value>
</property>

2) Configure hdfs-site.xml

Note: with NameNode HA enabled, the SecondaryNameNode is no longer needed. Add the following properties inside the <configuration> element:

<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>master:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>master:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>master2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;master2:8485;slave:8485/ns1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hadoop/hadoop/edits</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/bin/true)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

3) Configure yarn-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yrc</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>master:2181,master2:2181,slave:2181</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

4) Configure mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

Open the file and add the following property inside the <configuration> element:

vi mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

5) Configure hadoop-env.sh

Set JAVA_HOME explicitly:

export JAVA_HOME=/usr/java/jdk1.8.0_77

6) Configure slaves

Specify the DataNode hosts (by hostname):

vim /usr/hadoop/hadoop-2.7.3/etc/hadoop/slaves

Delete the original localhost entry and add the following:

master
master2
slave

7) Configure environment variables

Edit /etc/profile:

vi /etc/profile

Add the following:

export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the changes:

source /etc/profile

8) Synchronize Hadoop and the configuration files to the other nodes

After the master node is configured, synchronize everything to the other two servers:

cd /usr/hadoop/
scp -r /usr/hadoop/hadoop-2.7.3 master2:/usr/hadoop/
scp /etc/profile master2:/etc/
scp -r /usr/hadoop/hadoop-2.7.3 slave:/usr/hadoop/
scp /etc/profile slave:/etc/

After syncing, refresh the environment on master2 and slave:

source /etc/profile

6. Cluster startup

(1). Start ZooKeeper

Execute the following command on all three nodes:

zkServer.sh start

Check the status; there should be one leader and two followers:

zkServer.sh status
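
To check all three nodes from one terminal (a minimal sketch; /etc/profile is sourced because ssh does not load it for non-interactive commands):

 for h in master master2 slave; do echo "== $h =="; ssh $h "source /etc/profile; zkServer.sh status"; done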

(2). Start the JournalNodes

Execute the following command on all three nodes:

hadoop-daemon.sh start journalnode

(3). Format HDFS

1. Format the NameNode on master:

hdfs namenode -format

2. Start the NameNode on master:

hadoop-daemon.sh start namenode

3. Bootstrap the standby NameNode on master2:

hdfs namenode -bootstrapStandby

(4). Format ZKFC

Execute on master:

hdfs zkfc -formatZK

(5). Start HDFS

Execute on master:

start-dfs.sh

(6). Start YARN

 # 1. Execute on master:
 start-yarn.sh
 # 2. Execute on master2:
 yarn-daemon.sh start resourcemanager
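
With both ResourceManagers up, their HA state can also be checked on the command line; one should report active and the other standby (a quick sketch using the rm1/rm2 ids from yarn-site.xml):

 yarn rmadmin -getServiceState rm1   # ResourceManager on master
 yarn rmadmin -getServiceState rm2   # ResourceManager on master2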

(7). Start the JobHistory Server

 # execute the following command on each host
 mr-jobhistory-daemon.sh start historyserver
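
A quick check that the history server is up (the web UI port below is the Hadoop 2.x default, since mapred-site.xml above does not override it):

 jps | grep JobHistoryServer
 # web UI: http://master:19888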

7. Verify that the NameNode is highly available

1. Visit the master:50070 and master2:50070 web pages and check the state of the two NameNodes: master should be active and master2 standby.
2. Stop the active NameNode on master to trigger a failover:
 hadoop-daemon.sh stop namenode
3. Refresh master2:50070: master2 should now be active.
4. Manually start the NameNode service on master again:
 hadoop-daemon.sh start namenode
5. Go back to the master:50070 web page: master is now in standby state, while master2 remains active.
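
The same states can be read from the command line instead of the web UI (a minimal sketch using the nn1/nn2 ids from hdfs-site.xml):

 hdfs haadmin -getServiceState nn1   # NameNode on master
 hdfs haadmin -getServiceState nn2   # NameNode on master2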

8. Process validation

On the master node:
 jps
 10417 ResourceManager
 2226 QuorumPeerMain
 10994 Jps
 10519 NodeManager
 10312 DFSZKFailoverController
 10953 NameNode
 10044 DataNode
 9614 JournalNode
On the master2 node:
 jps
 9586 DataNode
 9811 NodeManager
 10181 NameNode
 9882 ResourceManager
 9708 DFSZKFailoverController
 10285 Jps
 9406 JournalNode
 2063 QuorumPeerMain
On the slave node:
 jps
 9504 DataNode
 2085 QuorumPeerMain
 9783 Jps
 9626 NodeManager
 9422 JournalNode
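
Finally, a small end-to-end smoke test (a hedged sketch with an arbitrary test path) confirms that clients resolve the ns1 nameservice and can write to HDFS:

 hdfs dfs -mkdir -p /tmp/ha-test
 hdfs dfs -put /etc/hosts /tmp/ha-test/
 hdfs dfs -ls /tmp/ha-test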