Hadoop HDFS (2): HDFS working mechanisms and cluster safe mode


1. Heartbeat mechanism between namenode and datanode

1) When the namenode starts, it starts an IPC server service.
2) After a datanode starts, it actively connects to the namenode's IPC server. By default it connects every 3 seconds; this is the heartbeat.
The interval can be set with the dfs.heartbeat.interval parameter.
3) The datanode registers with the namenode through the heartbeat. Through the heartbeat, the namenode learns the datanode's status and the datanode receives the operation instructions issued by the namenode. The datanode also periodically reports all of its block information to the namenode.
4) When the namenode has not received a heartbeat from a datanode for a long time, it considers that datanode dead.
The same heartbeat mechanism exists between the ResourceManager and the NodeManagers in YARN.

This is Hadoop's master/slave architecture: the namenode and ResourceManager are masters, and the datanodes and NodeManagers are slaves.
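To make the heartbeat bookkeeping concrete, here is a minimal Python sketch of the namenode side. It is only a toy model, not Hadoop code: the class `HeartbeatMonitor` and its methods are hypothetical names, and the 630-second timeout is the default value derived in section 4.

```python
import time

HEARTBEAT_INTERVAL_S = 3  # default of dfs.heartbeat.interval, per the text


class HeartbeatMonitor:
    """Toy stand-in for the namenode side of the heartbeat protocol."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # datanode id -> time of its last heartbeat

    def receive_heartbeat(self, datanode_id, now=None):
        """Record a heartbeat; the first one also acts as registration."""
        self.last_seen[datanode_id] = time.monotonic() if now is None else now

    def dead_nodes(self, now=None):
        """Return the datanodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            dn for dn, seen in self.last_seen.items() if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=630)
monitor.receive_heartbeat("dn1", now=0)
monitor.receive_heartbeat("dn2", now=0)

# dn1 keeps reporting every 3 seconds; dn2 goes silent after registering.
for t in range(HEARTBEAT_INTERVAL_S, 700, HEARTBEAT_INTERVAL_S):
    monitor.receive_heartbeat("dn1", now=t)

print(monitor.dead_nodes(now=700))  # ['dn2']
```

The explicit `now` argument is only there so the example runs instantly; a real monitor would rely on the clock.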

2. Working mechanism of namenode and secondarynamenode

First, how is the namenode's metadata stored?
The metadata of the namenode has to be kept in memory, because the namenode is queried for metadata very frequently. If it were kept only on disk, access would be very slow.

Since it is stored in memory, there must be a mechanism to keep that data safe, because data in memory is lost as soon as the power goes off. So the metadata in memory must also be persisted to disk. This is the fsimage.

But that is not the whole story. The metadata in memory may be updated at any time. Should fsimage be updated synchronously? If we do update it, efficiency inevitably suffers. If we do not, the metadata in memory and fsimage become inconsistent, and once the namenode loses power, some data is lost.

So we introduce a log file named edits. Whenever metadata in memory is added or updated, the operation record is synchronously appended to edits. Appending one record is far cheaper than rewriting the whole fsimage. This way, even if the namenode loses power, we can recover the metadata from edits and fsimage.
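The idea of "log first, then recover by replaying" can be sketched in a few lines of Python. This is a simplified illustration, not the real on-disk format: `ToyNameNode`, the `put`/`delete` operations, and the JSON log lines are all assumptions made for the example.

```python
import json


class ToyNameNode:
    """In-memory metadata backed by a snapshot (fsimage) and a log (edits)."""

    def __init__(self, fsimage=None, edits=None):
        self.metadata = dict(fsimage or {})  # in-memory metadata
        self.edits = list(edits or [])       # stand-in for the edits file on disk
        # Recovery: replay every logged operation on top of the snapshot.
        for line in self.edits:
            self._apply(json.loads(line))

    def _apply(self, op):
        if op["op"] == "put":
            self.metadata[op["path"]] = op["blocks"]
        elif op["op"] == "delete":
            self.metadata.pop(op["path"], None)

    def execute(self, op):
        self.edits.append(json.dumps(op))  # 1. append the record to edits
        self._apply(op)                    # 2. then change the metadata in memory


snapshot = {"/a": ["blk_1"]}
nn = ToyNameNode(fsimage=snapshot)
nn.execute({"op": "put", "path": "/b", "blocks": ["blk_2"]})
nn.execute({"op": "delete", "path": "/a"})

# Simulate a power failure: memory is gone, only fsimage and edits survive.
recovered = ToyNameNode(fsimage=snapshot, edits=nn.edits)
print(recovered.metadata)  # {'/b': ['blk_2']}
```

Note that the snapshot itself is never rewritten here; only the log grows.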

Now a new problem arises. The metadata in memory changes often, so constantly appending records to edits makes the file larger and larger. When we later need to recover the metadata, replaying it takes much longer, which hurts efficiency. Therefore, fsimage and edits need to be merged on a regular basis.

OK, so here is the task: merge fsimage and edits regularly. Who does it? Can the namenode do it? Of course it can, but that would make the namenode's workload too heavy and hurt efficiency. To keep the namenode efficient, the task is handed over to someone else: the secondary namenode.

It can be seen from this that the secondary namenode is not a hot standby of the namenode. When the namenode goes down, the secondary namenode cannot replace it, but it can be used to help recover it.

The specific workflow of the namenode and secondary namenode is as follows.
Stage 1:
1) When the cluster is started for the first time, the namenode must first be formatted. This creates fsimage and edits, which are stored in $HADOOP_HOME/data/name/current.
On later starts, the namenode loads edits and fsimage directly into memory.
2) A client sends a request to add, delete, or modify metadata.
3) The namenode first records the operation in the edits log, and then adds, deletes, or modifies the metadata in memory.

Stage 2:
The secondarynamenode performs a merge operation called a checkpoint, which has two trigger conditions.
The first is a time interval, 1 hour by default, which can be adjusted.
The second is the number of operations: the secondarynamenode checks the operation count once a minute and triggers a checkpoint when the count reaches the configured upper limit.
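The two trigger conditions boil down to a simple "either/or" check. The sketch below is only an illustration; the function name is hypothetical, and the limit of 1,000,000 operations is an assumption that is not stated in the text. In recent Hadoop releases these settings should correspond to `dfs.namenode.checkpoint.period`, `dfs.namenode.checkpoint.txns`, and `dfs.namenode.checkpoint.check.period`, but check the hdfs-default.xml of your version.

```python
CHECKPOINT_PERIOD_S = 3600   # 1 hour, per the text
CHECKPOINT_TXNS = 1_000_000  # assumed operation limit (not given in the text)
CHECK_PERIOD_S = 60          # the operation count is polled once a minute


def should_checkpoint(seconds_since_last, txns_since_last,
                      period_s=CHECKPOINT_PERIOD_S, txn_limit=CHECKPOINT_TXNS):
    """Return True when either trigger condition is met."""
    return seconds_since_last >= period_s or txns_since_last >= txn_limit


print(should_checkpoint(3600, 10))        # True: one hour has passed
print(should_checkpoint(120, 1_000_000))  # True: operation limit reached
print(should_checkpoint(120, 10))         # False: neither condition holds
```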

1) First, the secondarynamenode asks the namenode whether a checkpoint is needed.
2) It receives the namenode's reply and requests a checkpoint.
3) The namenode rolls the in-progress edits log, and the edits and fsimage files from before the roll are copied to the secondarynamenode.
4) The secondarynamenode loads the two files into memory and merges them to generate a new file, fsimage.chkpoint, which is copied back to the namenode.
5) The namenode renames fsimage.chkpoint to fsimage.
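The checkpoint steps above can be modeled with plain Python data structures. This is a rough sketch under simplifying assumptions: files are replaced by a dict and lists, and the names `NameNode`, `roll_edits`, `install_checkpoint`, and `secondary_merge` are made up for the example.

```python
def apply_edit(metadata, op):
    """Apply one (kind, path, value) edit record to a metadata dict."""
    kind, path, value = op
    if kind == "put":
        metadata[path] = value
    elif kind == "delete":
        metadata.pop(path, None)


class NameNode:
    def __init__(self):
        self.fsimage = {}           # last checkpointed snapshot
        self.finalized_edits = []   # rolled edits waiting to be merged
        self.edits_inprogress = []  # edits segment currently being written

    def log(self, op):
        self.edits_inprogress.append(op)

    def roll_edits(self):
        """Step 3: close the current segment and hand out the old files."""
        self.finalized_edits.extend(self.edits_inprogress)
        self.edits_inprogress = []
        return dict(self.fsimage), list(self.finalized_edits)

    def install_checkpoint(self, fsimage_chkpoint):
        """Step 5: the merged image becomes the new fsimage."""
        self.fsimage = fsimage_chkpoint
        self.finalized_edits = []


def secondary_merge(fsimage, edits):
    """Step 4: replay the edits on top of fsimage to build fsimage.chkpoint."""
    merged = dict(fsimage)
    for op in edits:
        apply_edit(merged, op)
    return merged


nn = NameNode()
nn.log(("put", "/a", 1))
nn.log(("put", "/b", 2))

image, edits = nn.roll_edits()
nn.log(("delete", "/a", None))  # arrives while the checkpoint is running
nn.install_checkpoint(secondary_merge(image, edits))

print(nn.fsimage)           # {'/a': 1, '/b': 2}
print(nn.edits_inprogress)  # [('delete', '/a', None)]
```

Rolling the log first is what lets the namenode keep accepting writes during the merge: new operations go to the fresh segment and are picked up by the next checkpoint.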

Of course, with HA, the secondarynamenode is rarely used, because the standby namenode performs the checkpoints instead.

3. How does datanode ensure the integrity of stored data

We know that the data is stored on the datanodes. If the data on a datanode becomes corrupted (a corrupted compressed archive, for example, can no longer be unpacked), how does the datanode deal with this problem to ensure the integrity of the data?

Generally, data integrity is ensured with a data verification technique:
1) Parity check
2) MD5, SHA1, etc.
3) CRC-32 cyclic redundancy check

HDFS uses CRC checksums. The io.bytes.per.checksum property sets how many bytes of data each checksum covers (512 bytes by default).

When writing data, the client sends the data together with its checksums to the datanodes. The last datanode in the pipeline is responsible for verifying the data. If an error is found in the data, the client receives a ChecksumException.

When reading data, the client computes checksums and compares them with the checksums stored on the datanode. If there is an error, the client reports it to the namenode and throws a ChecksumException. The namenode marks that copy of the block as corrupt and no longer directs requests to it.
It then arranges for a healthy copy of the block to be replicated to another datanode, and the damaged copy is deleted.
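Per-chunk checksum verification can be sketched as follows. This is an illustration only: it uses `zlib.crc32` from the Python standard library as a stand-in for the CRC variant HDFS actually uses, the 512-byte chunk size is the commonly documented default for bytes per checksum, and `ChecksumError` is a hypothetical stand-in for Hadoop's ChecksumException.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed default chunk size


class ChecksumError(Exception):
    """Stand-in for Hadoop's ChecksumException."""


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC-32 value per chunk of data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def verify(data, checksums, chunk=BYTES_PER_CHECKSUM):
    """Recompute the checksums and raise if any chunk does not match."""
    actual = compute_checksums(data, chunk)
    if len(actual) != len(checksums):
        raise ChecksumError("number of chunks does not match the checksums")
    for index, (got, expected) in enumerate(zip(actual, checksums)):
        if got != expected:
            raise ChecksumError(f"chunk {index} is corrupt")


data = bytes(range(256)) * 5        # 1280 bytes -> 3 chunks
sums = compute_checksums(data)      # written alongside the data
verify(data, sums)                  # a clean read passes silently

corrupted = bytearray(data)
corrupted[600] ^= 0xFF              # flip one byte inside the second chunk
try:
    verify(bytes(corrupted), sums)
except ChecksumError as err:
    print(err)                      # chunk 1 is corrupt
```

Checking per chunk rather than per block means a reader can tell which part of a block is bad without rereading everything.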

In addition, each datanode runs a background thread, DataBlockScanner (the data block scanner), which periodically verifies all the blocks stored on that datanode.

This is HDFS’s mechanism for ensuring data integrity.

4. Parameter setting of datanode offline time limit

What can cause a datanode to go offline?
For example, a network failure, a hung datanode process, or a server power failure.

When one of these events occurs, the namenode does not immediately decide that the datanode is down. It waits for a period of time first, which is the timeout.
The default timeout of HDFS is 10 minutes and 30 seconds. How did this time come about?
It comes from a formula:
timeout = 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval

Here timeout is the timeout duration. heartbeat.recheck.interval defaults to 5 minutes and dfs.heartbeat.interval defaults to 3 seconds, so the result is 2 * 5 minutes + 10 * 3 seconds = 10 minutes and 30 seconds.
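As a quick check of the arithmetic, here is the formula in Python with both defaults converted to seconds. The function and constant names are made up for the example; also note that in a real configuration file the two properties may use different units (as far as I know, the recheck interval is given in milliseconds and the heartbeat interval in seconds), so convert before plugging values in.

```python
HEARTBEAT_RECHECK_INTERVAL_S = 5 * 60  # default 5 minutes, per the text
DFS_HEARTBEAT_INTERVAL_S = 3           # default 3 seconds, per the text


def datanode_timeout_s(recheck_interval_s, heartbeat_interval_s):
    """timeout = 2 * recheck interval + 10 * heartbeat interval (in seconds)."""
    return 2 * recheck_interval_s + 10 * heartbeat_interval_s


timeout = datanode_timeout_s(HEARTBEAT_RECHECK_INTERVAL_S, DFS_HEARTBEAT_INTERVAL_S)
minutes, seconds = divmod(timeout, 60)
print(f"{timeout} s = {minutes} min {seconds} s")  # 630 s = 10 min 30 s
```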

5. Safe mode

As mentioned earlier, when the namenode starts, the first step is to load the fsimage and edits files into memory so that it has the latest metadata. This is also a merge: a new fsimage and an empty edits file are generated. The namenode then starts its IPC server service and listens for requests from the datanodes. During this period, the namenode's file system is read-only to the outside world. This is safe mode.

After that, the datanodes start, and each one sends its latest block list to the namenode through the namenode's IPC server.

The namenode leaves safe mode once the minimum replication condition is met. This condition is set by the dfs.replication.min parameter and refers to the minimum replication level that the blocks in the file system must reach.
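The exit check can be pictured as counting how many blocks have reached the minimum replication level. The sketch below is an assumption-heavy illustration: the function name is hypothetical, and the 0.999 threshold (the fraction of blocks that must satisfy the condition) does not appear in the text above; I believe it matches the default safe mode threshold in Hadoop, but verify it against your version's documentation.

```python
def can_leave_safe_mode(block_replicas, min_replication=1, threshold=0.999):
    """block_replicas maps a block id to the number of replicas reported so far.

    Returns True when the fraction of blocks with at least min_replication
    replicas reaches the threshold (an assumed value, see the note above).
    """
    if not block_replicas:
        return True  # nothing to wait for in an empty file system
    safe = sum(1 for count in block_replicas.values() if count >= min_replication)
    return safe / len(block_replicas) >= threshold


# 1000 blocks known from fsimage; datanodes have reported 998 of them so far.
reported = {f"blk_{i}": 1 for i in range(998)}
reported.update({"blk_998": 0, "blk_999": 0})
print(can_leave_safe_mode(reported))  # False: only 99.8% of blocks are safe

# The remaining datanodes check in and report the last two blocks.
reported["blk_998"] = 1
reported["blk_999"] = 1
print(can_leave_safe_mode(reported))  # True
```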