3 - Build Hadoop HA
0. Change the host names
Run the appropriate command on each machine:
hostnamectl set-hostname master
hostnamectl set-hostname master2
hostnamectl set-hostname slave
1. Passwordless SSH login
(1). Generate the key pair (run on all three hosts):
ssh-keygen
(2). Configure /etc/hosts (on all three hosts):
vi /etc/hosts
Add the following entries:
192.168.204.152 master
192.168.204.153 master2
192.168.204.154 slave
(3). Copy the public key to each host by appending it (run on all three hosts):
First host:
ssh-copy-id master
yes
000000 (enter the password you set)
Second host:
ssh-copy-id master2
yes
000000 (enter the password you set)
Third host (note: at this point the other two hosts are still connecting to it for the first time, so you still need to answer yes and enter the password):
ssh-copy-id slave
yes
000000 (enter the password you set)
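As an optional quick check (not part of the original steps), confirm passwordless login works from any of the hosts; each command should print the remote hostname without prompting for a password:
for h in master master2 slave; do ssh $h hostname; done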
2. Turn off the firewall
Turn off the firewall:
systemctl stop firewalld
Disable the firewall at boot:
systemctl disable firewalld.service
Check the firewall status:
systemctl status firewalld
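These firewall commands must run on every node; a possible shortcut (a sketch assuming the passwordless root login set up above) is to run them from one host over SSH:
for h in master master2 slave; do ssh $h "systemctl stop firewalld && systemctl disable firewalld"; done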
3. JDK installation and configuration
(1). Unpack the JDK component
Use Xftp to upload the installation packages to the /soft directory:
mkdir /soft
Create the installation directory /usr/java:
mkdir -p /usr/java/
Enter /soft and extract the JDK to /usr/java:
cd /soft
tar -zxvf jdk-8u77-linux-x64.tar.gz -C /usr/java/
(2). Configure environment variables
Edit /etc/profile and add the following:
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_77
export PATH=$PATH:$JAVA_HOME/bin
Make the environment variables take effect immediately:
source /etc/profile
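An optional check that the JDK is installed and on the PATH:
java -version      # should report version 1.8.0_77
echo $JAVA_HOME    # should print /usr/java/jdk1.8.0_77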
(3). Synchronize JDK and environment variables to the other two servers
scp -r /usr/java/ master2:/usr/
scp /etc/profile master2:/etc/
scp -r /usr/java/ slave:/usr/
scp /etc/profile slave:/etc/
Then make the variables take effect on master2 and slave:
source /etc/profile
4. ZooKeeper installation and configuration
(1). Unpack the ZooKeeper component
Upload the package to the /soft directory using Xftp.
Create a new directory on all three machines: /usr/hadoop
mkdir /usr/hadoop
Enter the /soft directory and extract ZooKeeper:
cd /soft
tar -zxvf zookeeper-3.4.10.tar.gz -C /usr/hadoop/
(2). Configure environment variables
Add the following to /etc/profile:
vi /etc/profile
export ZOOKEEPER_HOME=/usr/hadoop/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Make the environment variables take effect immediately:
source /etc/profile
(3). Configure ZooKeeper
1. Enter the configuration directory and edit the configuration file:
cd /usr/hadoop/zookeeper-3.4.10/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
2. Delete the previous contents and add the following:
tickTime=2000
initLimit=10
syncLimit=5
# Data storage location
dataDir=/hadoop/zookeeper/zkdata
# Log storage location
dataLogDir=/hadoop/zookeeper/zklog
# Client port
clientPort=2181
# The three nodes that run ZooKeeper
server.1=master:2888:3888
server.2=master2:2888:3888
server.3=slave:2888:3888
(4). Synchronize ZooKeeper and the environment variables to the other two servers
scp -r /usr/hadoop/zookeeper-3.4.10 master2:/usr/hadoop/
scp /etc/profile master2:/etc/
scp -r /usr/hadoop/zookeeper-3.4.10 slave:/usr/hadoop/
scp /etc/profile slave:/etc/
Then make the variables take effect on master2 and slave:
source /etc/profile
(5). Create the data and log directories
On each node:
mkdir -p /hadoop/zookeeper/zkdata
mkdir -p /hadoop/zookeeper/zklog
(6). Create myid
Create the file myid under /hadoop/zookeeper/zkdata
cd /hadoop/zookeeper/zkdata
vi myid
The file content is this server's ID (1, 2, or 3) and must match the server.N entries in zoo.cfg above:
master node -> 1
master2 node -> 2
slave node -> 3
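Because the value differs per node, one way to write all three files from master is the following sketch (it assumes the passwordless SSH and the directories created above):
echo 1 > /hadoop/zookeeper/zkdata/myid                 # on master
ssh master2 "echo 2 > /hadoop/zookeeper/zkdata/myid"
ssh slave "echo 3 > /hadoop/zookeeper/zkdata/myid"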
5. Hadoop installation and configuration
(1). Unpack the Hadoop component
Upload the package to the /soft directory, then extract it:
cd /soft
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop/
(2). Modify the configuration files
Enter the configuration directory:
cd /usr/hadoop/hadoop-2.7.3/etc/hadoop/
1) Configure core-site.xml (add the following inside the <configuration> element):
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>master:2181,master2:2181,slave:2181</value>
</property>
2) Configure hdfs-site.xml
Note: with NameNode HA enabled, the SecondaryNameNode is no longer needed.
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>master2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>master2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;master2:8485;slave:8485/ns1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop/hadoop/edits</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
3) Configure yarn-site.xml
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>master2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>master:2181,master2:2181,slave:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
4) Configure mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
Open the configuration file:
vi mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
5) Configure hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_77
6) Configure slaves
Specify the DataNode hosts (by hostname):
vim /usr/hadoop/hadoop-2.7.3/etc/hadoop/slaves
Delete the original localhost entry and add the following:
master
master2
slave
7) Configure environment variables
Edit /etc/profile:
vi /etc/profile
Add the following:
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the changes take effect:
source /etc/profile
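An optional check that the Hadoop binaries are on the PATH:
hadoop version    # should report Hadoop 2.7.3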
8) Synchronize Hadoop and the configuration files to the other nodes
After the master node is configured, synchronize it to the other two servers:
cd /usr/hadoop/
scp -r /usr/hadoop/hadoop-2.7.3 master2:/usr/hadoop/
scp /etc/profile master2:/etc/
scp -r /usr/hadoop/hadoop-2.7.3 slave:/usr/hadoop/
scp /etc/profile slave:/etc/
After syncing, update the environment on master2 and slave:
source /etc/profile
6. Cluster startup
(1). Start ZooKeeper
Execute the following command on all three nodes:
zkServer.sh start
Check the status; there should be one leader and two followers:
zkServer.sh status
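To check all three nodes from master, an optional sketch (sourcing /etc/profile because it is not loaded automatically for non-interactive SSH commands):
for h in master master2 slave; do echo "== $h =="; ssh $h "source /etc/profile && zkServer.sh status"; done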
(2). Start the JournalNodes
Execute the following command on all three nodes:
hadoop-daemon.sh start journalnode
(3). Format HDFS
1. Format the NameNode on master:
hdfs namenode -format
2. Start the NameNode on master:
hadoop-daemon.sh start namenode
3. Synchronize the metadata to the standby NameNode on master2:
hdfs namenode -bootstrapStandby
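-bootstrapStandby copies the formatted metadata from the NameNode running on master. As an optional check on master2 (a sketch; the path assumes the default dfs.namenode.name.dir under the hadoop.tmp.dir configured above):
ls /hadoop/hadoop/tmp/dfs/name/current    # should contain fsimage_* and VERSION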
(4). Format ZKFC
Execute on master:
hdfs zkfc -formatZK
(5). Start HDFS
Execute on master:
start-dfs.sh
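Optionally, verify the NameNode HA state from the command line (nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml above):
hdfs haadmin -getServiceState nn1    # expected: active
hdfs haadmin -getServiceState nn2    # expected: standby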
(6). Start YARN
1. Execute on master:
start-yarn.sh
2. Execute on master2:
yarn-daemon.sh start resourcemanager
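Optionally, verify the ResourceManager HA state (rm1 and rm2 as defined in yarn-site.xml above):
yarn rmadmin -getServiceState rm1    # expected: active
yarn rmadmin -getServiceState rm2    # expected: standby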
(7). Start the JobHistoryServer
Execute the following command on each host:
mr-jobhistory-daemon.sh start historyserver
7. Verify that the NameNode is highly available
(1) Visit the master:50070 and master2:50070 web pages to view the status of the two NameNodes:
master is in active state; master2 is in standby state.
(2) Stop (or kill) the NameNode process on master to simulate a failure.
(3) Refresh master2:50070; master2 should now be active.
(4) Manually restart the NameNode service on master:
hadoop-daemon.sh start namenode
(5) Go back to the master:50070 web page: master is now in standby state while master2 stays active.
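As an extra optional check that clients keep working through the ns1 nameservice no matter which NameNode is active (the /tmp/ha-test path is just an example):
hdfs dfs -mkdir -p /tmp/ha-test
hdfs dfs -put /etc/hosts /tmp/ha-test/
hdfs dfs -ls /tmp/ha-test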
8. Process validation
master node:
jps
10417 ResourceManager
2226 QuorumPeerMain
10994 Jps
10519 NodeManager
10312 DFSZKFailoverController
10953 NameNode
10044 DataNode
9614 JournalNode
master2 node:
jps
9586 DataNode
9811 NodeManager
10181 NameNode
9882 ResourceManager
9708 DFSZKFailoverController
10285 Jps
9406 JournalNode
2063 QuorumPeerMain
slave node:
jps
9504 DataNode
2085 QuorumPeerMain
9783 Jps
9626 NodeManager
9422 JournalNode
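To collect the process lists of all three nodes at once, an optional sketch run from master (sourcing /etc/profile so that jps is found in non-interactive SSH sessions):
for h in master master2 slave; do echo "== $h =="; ssh $h "source /etc/profile && jps"; done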