HDFS fault tolerance implementation

Time: 2021-4-19

HDFS fault tolerance mechanisms

Summary

  • HDFS deals with three kinds of faults, each with its own countermeasures

    • Node failure

      • If the NameNode fails, the whole cluster goes down in a non-high-availability (non-HA) deployment, because the NameNode is a single point of failure
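      • If a DataNode fails, the NameNode detects it through heartbeats: every DataNode sends a heartbeat to the NameNode every 3 seconds to show that it is working normally. If no heartbeat is received for about 10 minutes (10 minutes 30 seconds with default settings), the DataNode is considered dead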
    • Network failure

      • Whenever a message is sent, the sender waits for an acknowledgement (ACK). If the ACK is not received, the sender concludes that a fault has occurred
    • Data corruption

      • When data is written, a checksum is computed for it and stored together with the data so that the data can be verified later. The next time the block is read, the data and its checksum are retrieved and checked; any mismatch is reported to the NameNode, which identifies the corrupted block and restores it from a healthy replica
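The heartbeat rule for DataNode failures can be sketched as a small liveness tracker. This is a toy model, not HDFS source code: the class and method names are hypothetical, and time is passed in explicitly as seconds. The timeout follows the formula HDFS applies with its default settings (2 * `dfs.namenode.heartbeat.recheck-interval` + 10 * `dfs.heartbeat.interval`), which works out to 630 seconds, i.e. the "about 10 minutes" mentioned in the list.

```python
# Assumed stock-HDFS defaults (values in seconds).
HEARTBEAT_INTERVAL_S = 3     # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300     # dfs.namenode.heartbeat.recheck-interval (5 min)

# Expiry used to declare a DataNode dead: 2 * recheck + 10 * heartbeat = 630 s.
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    """Hypothetical NameNode-side tracker of DataNode liveness."""

    def __init__(self, timeout_s: float = DEAD_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}  # node id -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        # Called whenever a heartbeat arrives; remember when we last heard from the node.
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list[str]:
        # Nodes that have been silent for longer than the timeout are considered dead.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn2", now=600)      # dn2 keeps reporting, dn1 goes silent
print(monitor.dead_nodes(now=631))     # ['dn1']
```

Keeping only the last-seen timestamp per node means the check costs nothing per heartbeat; the real NameNode runs a comparable scan periodically (every recheck interval) rather than on every heartbeat.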
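The ACK rule for network failures can be illustrated with a generic send, wait, and retry loop. This is a hedged sketch, not HDFS's actual pipeline code: the function name, the in-memory `queue.Queue` standing in for the network, and the timeout and retry values are all assumptions chosen for illustration (the list above does not say how many times HDFS retries).

```python
import queue
from typing import Callable


def send_with_ack(send: Callable[[int], None], acks: "queue.Queue[int]",
                  msg_id: int, timeout_s: float = 0.2, retries: int = 3) -> bool:
    """Send msg_id and wait for a matching ACK; False means a suspected fault."""
    for _ in range(retries):
        send(msg_id)                                # (re)transmit the message
        try:
            if acks.get(timeout=timeout_s) == msg_id:
                return True                         # matching ACK: delivery confirmed
        except queue.Empty:
            pass                                    # no ACK in time: try again
    return False                                    # still no ACK: treat as a fault


acks: "queue.Queue[int]" = queue.Queue()
healthy_link = acks.put                # receiver ACKs every message immediately
broken_link = lambda msg_id: None      # message is lost, no ACK ever comes back
print(send_with_ack(healthy_link, acks, 1))                  # True
print(send_with_ack(broken_link, acks, 2, timeout_s=0.05))   # False
```

The timeout is what turns "no ACK" into a decision: without it the sender would block forever on a dead link.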
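The checksum scheme for data corruption can be sketched as follows. HDFS stores one checksum per fixed-size chunk of a block (512 bytes and CRC32C by default); this sketch uses Python's `zlib.crc32` (plain CRC-32) as a stand-in for CRC32C, and the function names are hypothetical. Reporting to the NameNode and restoring from another replica are left out.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # mirrors the HDFS default for dfs.bytes-per-checksum


def compute_checksums(data: bytes, chunk: int = BYTES_PER_CHECKSUM) -> list[int]:
    # On write: one CRC per fixed-size chunk, stored alongside the block data.
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def find_corrupt_chunks(data: bytes, stored: list[int],
                        chunk: int = BYTES_PER_CHECKSUM) -> list[int]:
    # On read: recompute the CRCs and return indices of chunks that no longer match.
    fresh = compute_checksums(data, chunk)
    bad = [i for i, (a, b) in enumerate(zip(fresh, stored)) if a != b]
    # A chunk-count mismatch means data was truncated or extended; flag the extra indices.
    bad += range(min(len(fresh), len(stored)), max(len(fresh), len(stored)))
    return bad


block = bytes(range(256)) * 8          # 2048 bytes -> 4 chunks of 512
stored = compute_checksums(block)
damaged = bytearray(block)
damaged[700] ^= 0xFF                   # flip bits inside chunk 1 (bytes 512-1023)
print(find_corrupt_chunks(bytes(damaged), stored))   # [1]
```

Checksumming per chunk rather than per block localizes the damage to a 512-byte range. A non-empty result is the point at which, in HDFS, the replica is reported to the NameNode so it can be restored from a healthy copy.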