ZooKeeper installation and demonstration

Time:2019-10-21

High availability (HA) – zookeeper

ZooKeeper

  • An open-source distributed coordination project that provides services for distributed applications
  • Provides a set of primitives on top of which distributed applications can build higher-level synchronization services
  • Roles

    Observer pattern:
            Leader: responsible for initiating and deciding votes and for updating system status.
            Learner:
                Follower: accepts client requests, returns results to the client, and participates in voting.
                Observer: accepts client requests and forwards them to the leader; does not participate in voting, only synchronizes state from the leader.
            Client: initiates requests.
        Other applications of the observer pattern: software skins, editor settings.
  • install

    1. Pseudo-distributed mode

      1) install ZooKeeper (again, pay attention to permissions)
      $ tar -zxf  /opt/software/zookeeper-3.4.5.tar.gz -C  /opt/modules/
      
      2) create the ZooKeeper data directory
      $ mkdir zkData    # optional: it is generated automatically on startup if not created manually
      
      3) modify the configuration file in ${ZOOKEEPER_HOME}/conf (note: only a template ships; copy it and name the copy zoo.cfg)
      $ cd /opt/modules/zookeeper-3.4.5/
      $ cp conf/zoo_sample.cfg conf/zoo.cfg
      $ vi conf/zoo.cfg
      dataDir=/opt/modules/zookeeper-3.4.5/zkData
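      For reference, the remaining settings come straight from zoo_sample.cfg; a minimal standalone zoo.cfg sketch (defaults, with only dataDir changed) looks like this:
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/opt/modules/zookeeper-3.4.5/zkData
      clientPort=2181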
      
      4) start zookeeper
      $bin/zkServer.sh start
      $ jps    # view the Java processes; output similar to:
      2088 QuorumPeerMain
      
      5) check the status of zookeeper
      $ bin/zkServer.sh status    # output similar to:
      JMX enabled by default
      Using config: /opt/modules/zookeeper-3.4.5/bin/../conf/zoo.cfg
      Mode: standalone
      6) understanding some basic commands
      $ bin/zkCli.sh    # enter the ZooKeeper shell
       help                          # list available commands
       quit                          # exit the shell
       create [-e] [-s] path data    # -e creates an ephemeral (temporary) znode, -s an automatically numbered (sequential) one
       get path                      # view a znode's data and metadata
       ls path                       # list the children of the given znode
       rmr path                      # delete a znode recursively
      
      ls /                      # view the root directory
      create -e /myapp msg      # create an ephemeral znode /myapp with data "msg"
       get /myapp               # view /myapp's data and creation information
       ls / watch               # register a watch (attention event) on the root znode
      rmr /myapp                # delete /myapp, which triggers the watch event
       quit
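       For example, after running "ls / watch" a single notification is printed when the children of / change; on 3.4.x the output looks roughly like this (exact wording may vary by version):
       WATCHER::
       WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/
       A sequential znode created with -s gets an automatic counter appended to its name, e.g.:
       create -s /myapp data        # returns something like: Created /myapp0000000001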
    2. Fully distributed mode

      1. Install the JDK (the JDK must be installed on all 3 machines)
      2. Install a fully distributed cluster
      1) install ZK (pay attention to permissions)
          $ tar -zxvf /opt/software/zookeeper-3.4.5.tar.gz -C /opt/modules/
      2) configure zoo.cfg file
          $ cd /opt/modules/zookeeper-3.4.5/
          $ cp conf/zoo_sample.cfg conf/zoo.cfg
          $ vi conf/zoo.cfg    # modify and add the following
          Modify:
          dataDir=/opt/modules/zookeeper-3.4.5/zkData
          # add around line 15:
          server.1=centos01.ibeifeng.com:2888:3888
          server.2=centos02.ibeifeng.com:2888:3888
          server.3=centos03.ibeifeng.com:2888:3888
      3) create the zkData directory, create the myid file inside it, and distribute to the other machines
          $ mkdir zkData
          $ touch zkData/myid
          $ cd /opt/modules/    # work under modules
          Copy from PC1 to PC2 and PC3:
          $ scp -r zookeeper-3.4.5/ centos02.ibeifeng.com:/opt/modules/
          $ scp -r zookeeper-3.4.5/ centos03.ibeifeng.com:/opt/modules/
      
      4) modify the myid files of PC1, PC2, PC3
          $ cd /opt/modules/zookeeper-3.4.5/
          $ vi zkData/myid    # must correspond to the server.N entries in conf/zoo.cfg
          The zkData/myid content on PC1 is 1
          The zkData/myid content on PC2 is 2
          The zkData/myid content on PC3 is 3
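          For example, one quick way to write the three files (run each command on its own host; paths as above):
          $ echo 1 > /opt/modules/zookeeper-3.4.5/zkData/myid    # on PC1 (centos01)
          $ echo 2 > /opt/modules/zookeeper-3.4.5/zkData/myid    # on PC2 (centos02)
          $ echo 3 > /opt/modules/zookeeper-3.4.5/zkData/myid    # on PC3 (centos03)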
      5) start ZooKeeper on all 3 machines
          $ bin/zkServer.sh start
      
      6) check the processes
          $ jps
          3050 QuorumPeerMain
          3111 Jps
      
      7) check the status on all 3 machines
          $ bin/zkServer.sh status
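          With all three instances running, the expected output is Mode: follower on two machines and Mode: leader on one, for example:
          JMX enabled by default
          Using config: /opt/modules/zookeeper-3.4.5/bin/../conf/zoo.cfg
          Mode: follower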
      
      ZooKeeper shell commands
      $ bin/zkCli.sh -server hostname:2181    (or simply: bin/zkCli.sh)
I) file system (data structure)
        Znode
            1. Represents a node (directory) in the ZooKeeper namespace.
            2. Can stand in for a client (such as a NameNode).
        
        |---/
            |---app
                |---app1
                |---app2
                |---app3

        A znode in ZooKeeper can represent, for example, the NameNode in Hadoop


    II) watch events
        1. When a NameNode starts, it registers with ZooKeeper, which triggers a registration event; at the same time a unique znode (directory) is created for it.
        2. The node's status (alive or down) is confirmed through heartbeat messages.
        3. If the machine goes down and no heartbeat is received for a long time, a node-loss event is triggered and the znode is deleted.

    III) voting (an even number of nodes can deadlock)
        A leader candidate must receive more than half of the votes (at least n/2 + 1)
        The number of surviving nodes in the ZooKeeper cluster must also be more than half of the total
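        For example, with quorum = floor(n/2) + 1: a 3-node ensemble needs 2 votes and tolerates 1 failure, a 5-node ensemble needs 3 and tolerates 2, while a 4-node ensemble also needs 3 votes yet still tolerates only 1 failure, which is why odd cluster sizes are preferred.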

Namenode HA

I) install Hadoop
This step requires IP/host-name mapping, JDK installation and configuration, and unpacking Hadoop.
If a distributed cluster has already been set up, the following must be done on all three machines:
    Delete the data and logs directories!
    Be sure to stop the relevant processes before changing the configuration.
    Delete the /tmp/*.pid files (be careful not to delete everything under /tmp; only remove the files ending in .pid):
    $ rm /tmp/*.pid
    Set up passwordless (key-based) SSH login; a sketch follows this list.
    Then modify the configuration as described in the following steps.
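    A minimal sketch of the passwordless-login step (assuming the hadoop user on all three hosts; adjust host names to your cluster):
    $ ssh-keygen -t rsa                         # accept the defaults
    $ ssh-copy-id centos01.ibeifeng.com
    $ ssh-copy-id centos02.ibeifeng.com
    $ ssh-copy-id centos03.ibeifeng.com
    $ ssh centos02.ibeifeng.com date            # should print the date without asking for a password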
II) configure the environment files
    Check:
    hadoop-env.sh
    mapred-env.sh
    yarn-env.sh
    and configure in each:
    export JAVA_HOME=/opt/modules/jdk1.7.0_67

III) configure the HDFS configuration files

1.========core-site.xml========
    <!-- logical access name (nameservice) for NameNode HA -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data</value>
    </property>

    2.=======hdfs-site.xml=============
    <!-- the number of HDFS replicas is set to 3 -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- disable permission checks for users and user groups -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- the HDFS nameservice is ns1; it must be consistent with the value in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    
    <!-- there are two NameNodes under ns1: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>centos01.ibeifeng.com:8020</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>centos01.ibeifeng.com:50070</value>
    </property>
    
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>centos02.ibeifeng.com:8020</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>centos02.ibeifeng.com:50070</value>
    </property>
    
    <!-- which JournalNodes store the NameNode edit log -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://centos01.ibeifeng.com:8485;centos02.ibeifeng.com:8485;centos03.ibeifeng.com:8485/ns1</value>
    </property>
    <!-- where the JournalNode stores its data on the local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/modules/hadoop-2.5.0/journal</value>
    </property>
    
        <!-- when the active NameNode fails, the standby switches to active; if the original active process is still running it is forcibly killed (fenced). sshfence kills it over SSH; shell(/bin/true) is a fallback so fencing still succeeds when the failed host is unreachable -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    
    <!-- sshfence requires passwordless SSH. /home/hadoop is my user's home directory (my user is hadoop); SSH must already be able to log in without a password, remember!! -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    
    <!-- timeout for the sshfence mechanism -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    
    3. ========== configure slaves ==========
    centos01.ibeifeng.com
    centos02.ibeifeng.com
    centos03.ibeifeng.com



    4. ===================== distribute the configuration to the second and third machines ===============
    $ cd /opt/modules/hadoop-2.5.0/
    
    The other two machines already have Hadoop installed, so simply overwrite the corresponding configuration files.
    (If the other servers do not have Hadoop yet, distribute the whole hadoop-2.5.0 directory to them instead.)
    $ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml  centos02.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/
    $ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml  centos03.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/
    
    On each target machine, check that the distribution succeeded:
    $  cat etc/hadoop/hdfs-site.xml 
    $ cat etc/hadoop/core-site.xml 
    Then switch to the ZooKeeper directory:
    $ cd /opt/modules/zookeeper-3.4.5/


IV) start relevant processes
1. =========== start ZooKeeper ===========
    $ bin/zkServer.sh start
    $ jps    # view the process
        3129 QuorumPeerMain
    $ bin/zkServer.sh status
        JMX enabled by default
        Using config: /opt/modules/zookeeper-3.4.5/bin/../conf/zoo.cfg
        Mode: leader or follower




    2. ============ start HDFS HA ================
    ## Note: follow each of the following steps strictly, in order
        $ cd /opt/modules/hadoop-2.5.0/    # switch directories
        1. Start the JournalNodes
            $ sbin/hadoop-daemon.sh start journalnode    # start the JournalNode (edit-log synchronization) process on each of the three machines
        $ jps    # all three servers should show:
            3422 Jps
            3281 QuorumPeerMain
            3376 JournalNode
    2. Format the NameNode; run this only on the first machine! Remember!
        $ bin/hdfs namenode -format
        Look for the message: successfully formatted
    3. Start namenode on the first (centos01)
        $ sbin/hadoop-daemon.sh start namenode    # use jps to check that it started
    4. Switch to the second server (centos02.ibeifeng.com) and have the other NameNode copy the metadata.
        $ bin/hdfs namenode -bootstrapStandby
        $ sbin/hadoop-daemon.sh start namenode
    
        You can check on the web UIs; at this point only the NameNodes are started, and both are in standby state.
        http://centos01.ibeifeng.com:50070
        http://centos02.ibeifeng.com:50070
    
    5. Use the following command to make one NameNode active
        $ bin/hdfs haadmin -transitionToActive nn2    # make the second NameNode active
        http://centos02.ibeifeng.com:50070 now shows the centos02 NameNode in the active state

Enable automatic failover

First stop the HDFS processes; execute on the machine centos01
    $ sbin/stop-dfs.sh
    
    1. Configure failover (append)
        1)=====core-site.xml
            <!-- the ZooKeeper quorum to register with -->
            <property>
                <name>ha.zookeeper.quorum</name>
                <value>centos01.ibeifeng.com:2181,centos02.ibeifeng.com:2181,centos03.ibeifeng.com:2181</value>
            </property>

        2)====hdfs-site.xml
            <!-- enable HA automatic failover -->
            <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
            </property>
            
            <property>
                <name>dfs.client.failover.proxy.provider.ns1</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
            </property>
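            With this proxy provider in place, clients can address HDFS by the logical nameservice instead of a specific NameNode; once HDFS is running again (step 3 below), a quick sanity check might be:
            $ bin/hdfs dfs -ls hdfs://ns1/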

    2. Distribute the configuration files
        $ scp etc/hadoop/core-site.xml  etc/hadoop/hdfs-site.xml centos02.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/
        $ scp etc/hadoop/core-site.xml  etc/hadoop/hdfs-site.xml centos03.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/

    3. Start failover service
            1) stop HDFS and ZK first
            $ sbin/stop-dfs.sh
            ## stop ZK (on all 3 servers)
            $ bin/zkServer.sh stop        
            Then start ZK again (on all 3 servers)
            $ bin/zkServer.sh start        

            2) initialize ZKFC on the first machine [PC1, i.e. nn1] (cd /opt/modules/hadoop-2.5.0/)
            $ bin/hdfs zkfc -formatZK    # initialize ZKFC
            17/12/22 04:50:55 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns1 in ZK.
            17/12/22 04:50:55 INFO zookeeper.ZooKeeper: Session: 0x2607abd3bee0000 closed
            17/12/22 04:50:55 INFO zookeeper.ClientCnxn: EventThread shut down
            3) start HDFS (on centos01)
            $ sbin/start-dfs.sh    # start HDFS
        
            $ bin/hdfs haadmin -getServiceState nn1    # check nn1's state
            $ bin/hdfs haadmin -getServiceState nn2    # check nn2's state
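            Each command prints just the state; for example (which NameNode is active depends on the ZKFC election):
            $ bin/hdfs haadmin -getServiceState nn1
            active
            $ bin/hdfs haadmin -getServiceState nn2
            standby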
    4. Check the processes on all three machines and confirm they started
            $jps 
            [centos01]
            [hadoop@centos01 hadoop-2.5.0]$ jps
            3281 QuorumPeerMain
            4793 JournalNode
            4610 DataNode
            5137 Jps
            4518 NameNode
            4974 DFSZKFailoverController

            [centos02]
            [hadoop@centos02 hadoop-2.5.0]$ jps
            3129 QuorumPeerMain
            4270 Jps
            4176 DFSZKFailoverController
            3892 NameNode
            3955 DataNode
            4046 JournalNode

            [centos03]
            [hadoop@centos03 hadoop-2.5.0]$ jps
            3630 Jps
            3553 JournalNode
            3022 QuorumPeerMain
            3465 DataNode
    5. Simulate an active NameNode failure
        $ kill -9 4518    # kill the active NameNode; the standby NameNode becomes active
        $ sbin/hadoop-daemon.sh start namenode    # restart the NameNode on this host; it now comes back as standby
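        A quick way to confirm the failover from the command line (using the nn1/nn2 ids defined above):
        $ bin/hdfs haadmin -getServiceState nn1    # the restarted NameNode now reports standby
        $ bin/hdfs haadmin -getServiceState nn2    # the surviving NameNode now reports active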

ResourceManager HA

Cluster planning
    PC01             PC02             PC03
    NameNode         NameNode
    ZKFC             ZKFC
                     ResourceManager  ResourceManager
    DataNode         DataNode         DataNode
    JournalNode      JournalNode      JournalNode
    NodeManager      NodeManager      NodeManager
    ZooKeeper        ZooKeeper        ZooKeeper

    Stop the HDFS processes first
    $ sbin/stop-dfs.sh

    1) modify the configuration file

        ===== yarn-site.xml (overwrite the existing contents) =====
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>86400</value>
        </property>

        <!-- enable ResourceManager HA -->
        <property>
           <name>yarn.resourcemanager.ha.enabled</name>
           <value>true</value>
        </property>
            
        <property>
           <name>yarn.resourcemanager.cluster-id</name>
           <value>rmcluster</value>
        </property>

        <property>
           <name>yarn.resourcemanager.ha.rm-ids</name>
           <value>rm1,rm2</value>
        </property>

        <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>centos02.ibeifeng.com</value>
        </property>

        <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>centos03.ibeifeng.com</value>
        </property>

        <!-- the addresses of the ZooKeeper cluster -->
        <property>
           <name>yarn.resourcemanager.zk-address</name>  
           <value>centos01.ibeifeng.com:2181,centos02.ibeifeng.com:2181,centos03.ibeifeng.com:2181</value>
        </property>

        <!-- enable automatic recovery -->
        <property>
           <name>yarn.resourcemanager.recovery.enabled</name>
           <value>true</value>
        </property>

        <!-- store the ResourceManager state information in the ZooKeeper cluster -->
        <property>
           <name>yarn.resourcemanager.store.class</name>
           <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>


    2) distribute the configuration file
        Distribute yarn-site.xml to the other two machines
        $ scp etc/hadoop/yarn-site.xml  centos02.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/
        $ scp etc/hadoop/yarn-site.xml  centos03.ibeifeng.com:/opt/modules/hadoop-2.5.0/etc/hadoop/
    3) start the ResourceManagers
        Start HDFS at centos01 first
            $ sbin/start-dfs.sh 
        On RM1 (centos02):
            $ sbin/start-yarn.sh

        Start manually on RM2 (centos03):
            $ sbin/yarn-daemon.sh start resourcemanager
    4) check whether the process is started
        $ jps

        [hadoop@centos01 hadoop-2.5.0]$ jps
        6737 DFSZKFailoverController
        6559 JournalNode
        3281 QuorumPeerMain
        6375 DataNode
        6975 Jps
        6277 NameNode
        6854 NodeManager

        [hadoop@centos02 hadoop-2.5.0]$ jps
        5471 DataNode
        4917 ResourceManager
        5403 NameNode
        3129 QuorumPeerMain
        6020 Jps
        5866 NodeManager
        5687 DFSZKFailoverController
        5564 JournalNode

        [hadoop@centos03 hadoop-2.5.0]$ jps
        3022 QuorumPeerMain
        4373 NodeManager
        4174 DataNode
        4577 Jps
        4263 JournalNode
        4518 ResourceManager

    5) check the status of the ResourceManager node with rm-id rm1
        $ bin/yarn rmadmin -getServiceState rm1
        Or check via the web UI:
        http://centos02.ibeifeng.com:8088/cluster
        http://centos03.ibeifeng.com:8088/cluster
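        The command prints the state directly; for example (whichever ResourceManager won the election is active):
        $ bin/yarn rmadmin -getServiceState rm1
        active
        $ bin/yarn rmadmin -getServiceState rm2
        standby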
        
    6) test ResourceManager HA
        $ kill -9 4917    # kill the active ResourceManager, then check access through the web UI
        centos03 becomes active
        
        Start the ResourceManager on centos02 again; its status is now standby
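        To confirm that YARN still accepts work after the failover, a sample job can be submitted from any node; a sketch, assuming the standard Hadoop 2.5.0 examples jar location:
        $ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar pi 2 10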