Only the bald can grow stronger.
This article is included in my selected articles on GitHub. Stars are welcome: https://github.com/ZhongFuCheng3y/3y
The previous article was "Introduction to big data". In this article, we will learn about HDFS. If there are mistakes in the article, please point them out in the comment area~
1、 HDFS introduction
As mentioned in the previous article, as the amount of data grows, it becomes impossible to store all of it on one machine. So we spread the data across different machines, but this brings a problem: it is inconvenient to manage and maintain.
Therefore, we want a system that manages the data spread across these servers in a unified way. That system is a distributed file system.
- HDFS is one of the most widely used distributed file systems
Using HDFS is very simple: although HDFS stores files on different machines, I can treat those files as if they were stored on a single machine (even though multiple machines are doing the work behind the scenes):
- An analogy: I call an RPC interface, pass it parameters, and it returns a response. I don't know what the RPC interface does internally (it may well call another RPC interface). The implementation details are hidden, which keeps it user-friendly
To be clear: HDFS is a distributed file system, and it is still a file system. What do we use it for? Storing data.
Now, let's look at some HDFS fundamentals that will help us "use" HDFS better.
2、 HDFS learning
As mentioned above, HDFS is a distributed file system, so its data is stored across multiple machines. For example, a 1GB file is split into several smaller pieces, and each server stores some of them.
Then someone will ask: how large is each piece? By default, each piece is 128MB. In HDFS, each such piece is called a block.
Naturally, this 128MB size is configurable. Setting it too small or too large is not good. If blocks are too small, the data of a single file ends up spread across many machines and addressing takes longer. If blocks are too large, transferring a single block takes a long time.
PS: in older versions the default was 64MB
Suppose a user sends a request to store a 1GB file. The HDFS client receives the request and, according to the configured block size (the default is now 128MB), splits the file into 8 pieces (also called blocks). The servers then store these blocks. For now, let's assume each server stores two of them.
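The arithmetic above can be checked with a few lines of Python. This is only a sketch of the idea: `split_into_blocks` is a name I made up for illustration, and it works on byte offsets rather than real files.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, the default block size mentioned above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs, one per block, for a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


one_gb = 1024 * 1024 * 1024
print(len(split_into_blocks(one_gb)))  # 8
```

A file that is not an exact multiple of 128MB simply ends with one shorter block.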
The servers that store the real data are called DataNodes in HDFS.
Now the question is: after splitting the file according to the configuration, how does the HDFS client know which server (DataNode) to send each block to? This is where another role comes in, the manager (NameNode).
The NameNode manages all kinds of information about files (the technical term for this information is metadata), including the file path and name, the ID of each block, the storage locations, and so on.
So, whether it is a read or a write, the HDFS client first goes to the NameNode, gets the relevant information from it, and only then talks to the DataNodes.
- For a write, the HDFS client splits the file and then asks the NameNode which DataNodes the blocks should be written to.
- For a read, the HDFS client takes the file name and asks the NameNode which DataNodes the data should be read from.
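To make the read and write paths concrete, here is a toy model in Python. It is a sketch under my own assumptions, not how HDFS is implemented: `ToyNameNode`, `allocate`, and `locate` are invented names, and the round-robin placement is just the simplest policy I could pick.

```python
from itertools import cycle


class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where each block lives."""

    def __init__(self, datanodes):
        self._next_node = cycle(datanodes)  # hypothetical round-robin placement
        self.file_blocks = {}     # file path -> ordered list of block IDs
        self.block_location = {}  # block ID -> DataNode name

    def allocate(self, path, num_blocks):
        """Write path: decide which DataNode receives each block of the file."""
        block_ids = [f"{path}#blk{i}" for i in range(num_blocks)]
        for block_id in block_ids:
            self.block_location[block_id] = next(self._next_node)
        self.file_blocks[path] = block_ids
        return [(b, self.block_location[b]) for b in block_ids]

    def locate(self, path):
        """Read path: tell the client where each block of the file is stored."""
        return [(b, self.block_location[b]) for b in self.file_blocks[path]]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
write_plan = namenode.allocate("/logs/app.log", num_blocks=8)
read_plan = namenode.locate("/logs/app.log")
print(read_plan == write_plan)  # True
```

Note that the toy NameNode never touches file contents. The client would send or fetch the actual bytes directly to or from the DataNodes named in the plan.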
2.1 HDFS backup
In a distributed system (large files are split into smaller pieces and stored on different machines), if there are no replicas, a single machine going down is enough to make the data unavailable.
If you have read my Kafka and Elasticsearch articles, you may already understand this. The idea is the same.
Kafka replicates partitions, Elasticsearch replicates shards, and HDFS replicates blocks.
Replicas are placed on different machines as far as possible, so even if one machine goes down, a replica on another machine can be used instead.
For readers who don't know Kafka and Elasticsearch, you can follow my GitHub and search for those keywords (I think the articles are easy to follow)
Note: the replicas do not need to be written by the HDFS client. The DataNodes transfer the data to each other.
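Here is a small sketch of the "replicas on different machines" idea. The function name `place_replicas` and the round-robin rule are my own simplifications for illustration. Real HDFS placement is rack-aware and more involved than this.

```python
def place_replicas(block_ids, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes (simplified round-robin)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Consecutive positions modulo the node count are always distinct
        # because replication <= len(datanodes).
        placement[block_id] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


placement = place_replicas(["blk_0", "blk_1", "blk_2"], ["dn1", "dn2", "dn3"], replication=2)
print(placement["blk_0"])  # ['dn1', 'dn2']

# Even if dn1 goes down, every block still has a copy somewhere else.
survivors = {b: [n for n in nodes if n != "dn1"] for b, nodes in placement.items()}
print(all(survivors.values()))  # True
```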
2.2 A bit about the NameNode
As we saw above, the NameNode has to handle requests from HDFS clients (it is where the metadata is stored, so every read and write goes through it).
Now the question is, how does the NameNode store metadata?
- If the NameNode kept metadata only in memory, the metadata would be lost whenever the NameNode restarted.
- If the NameNode wrote every change to disk and kept the metadata only on disk, lookups and modifications would be very slow (because these are pure disk IO operations)
This reminds me of Kafka again. Kafka also writes partitions to disk, but how does it write them? With sequential IO.
The NameNode does the same thing: it modifies the metadata in memory and then appends the change to a file called editlog.
Because an append is sequential IO, it is not slow. So creates, deletes, updates, and queries are all served from memory, and only creates, deletes, and updates append a record to the editlog file on disk. This way, even if the NameNode restarts, the metadata can still be restored from the editlog file.
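The append-then-replay idea can be sketched in a few lines of Python. Everything here is hypothetical and only for illustration: the class `ToyMetadataStore`, the JSON-lines record format, and the file name are my own choices, not the real editlog format.

```python
import json
import os
import tempfile


class ToyMetadataStore:
    """Metadata lives in memory; every change is also appended to an edit log."""

    def __init__(self, editlog_path):
        self.editlog_path = editlog_path
        self.metadata = {}  # file path -> list of block IDs
        self._replay()      # rebuild memory from the log, e.g. after a restart

    def _replay(self):
        if not os.path.exists(self.editlog_path):
            return
        with open(self.editlog_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, edit):
        if edit["op"] == "put":
            self.metadata[edit["path"]] = edit["blocks"]
        elif edit["op"] == "delete":
            self.metadata.pop(edit["path"], None)

    def _log(self, edit):
        # Append mode only ever writes at the end of the file (sequential IO).
        # Logging before applying is a choice of this sketch.
        with open(self.editlog_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(edit) + "\n")
        self._apply(edit)

    def put(self, path, blocks):
        self._log({"op": "put", "path": path, "blocks": blocks})

    def delete(self, path):
        self._log({"op": "delete", "path": path})

    def get(self, path):
        return self.metadata.get(path)  # queries never touch the disk


log_path = os.path.join(tempfile.mkdtemp(), "editlog")
store = ToyMetadataStore(log_path)
store.put("/a.txt", ["blk_1", "blk_2"])
store.put("/b.txt", ["blk_3"])
store.delete("/b.txt")

restarted = ToyMetadataStore(log_path)  # simulates a restart
print(restarted.metadata)  # {'/a.txt': ['blk_1', 'blk_2']}
```

Notice that the restart has to read every record ever written, which is exactly the weakness discussed next.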
There is still a problem: if the NameNode has been running for a long time, the editlog file keeps growing (because every metadata change has to be appended to it). On restart, the NameNode relies on the editlog file to recover the data. If the file is too large, won't startup be too slow?
It's true. So what does HDFS do? To keep the editlog from growing so large that recovery on restart takes a long time, the NameNode also keeps a memory snapshot, called fsimage.
Speaking of snapshots, does this remind you of Redis's RDB?
This way, a restart only needs to load the memory snapshot fsimage plus part of the editlog.
The idea is good, but a few practical questions remain: when should the memory snapshot fsimage be generated? And how do I know which part of the editlog to load?
The problem looks complicated, but all we really need is a scheduled task.
If I had to build it myself, I might think: add a configuration so that once the editlog reaches a certain size, or after a certain amount of time, we merge the data in the editlog file with the memory snapshot fsimage. That produces a new fsimage, after which we clear the editlog and overwrite the old fsimage snapshot.
- This way, every time the NameNode restarts, it loads the latest fsimage file, and the editlog contains only the changes that have not yet been merged into fsimage. From these two files, the latest metadata can be restored.
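The merge can be sketched like this. It is a minimal in-memory model under my own assumptions: the snapshot is a plain dict, the edit log is a list of `(op, path, blocks)` tuples, and the names `apply_edits`, `checkpoint`, and `recover` are invented for this example.

```python
def apply_edits(image, edits):
    """Return a new metadata dict: the snapshot with the edits replayed on top."""
    merged = dict(image)
    for op, path, blocks in edits:
        if op == "put":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
    return merged


def checkpoint(fsimage, editlog):
    """Merge the edit log into the snapshot; return (new_fsimage, emptied_editlog)."""
    return apply_edits(fsimage, editlog), []


def recover(fsimage, editlog):
    """On restart: load the snapshot, then replay only the edits not yet merged."""
    return apply_edits(fsimage, editlog)


fsimage = {"/a.txt": ["blk_1"]}
editlog = [("put", "/b.txt", ["blk_2"]), ("delete", "/a.txt", None)]

fsimage, editlog = checkpoint(fsimage, editlog)  # the scheduled merge
editlog.append(("put", "/c.txt", ["blk_3"]))     # a change made after the merge

print(recover(fsimage, editlog))  # {'/b.txt': ['blk_2'], '/c.txt': ['blk_3']}
```

After the checkpoint, recovery replays one edit instead of three, which is the whole point of the snapshot.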
HDFS works much like this, but instead of running a scheduled task on the NameNode, it uses a new role: the SecondaryNameNode. As for why, perhaps HDFS considers the merge too resource-intensive. Having different servers do different jobs also fits the distributed philosophy.
But there is still a problem: in this architecture, the NameNode is a single machine. The SecondaryNameNode only merges the NameNode's editlog and fsimage files. If the NameNode goes down, clients cannot make requests, and since every request has to go through the NameNode, the whole HDFS cluster becomes unavailable.
So we need to make the NameNode highly available. Nowadays this is generally achieved with ZooKeeper, using a primary NameNode and a standby NameNode.
The primary and standby NameNodes need to keep their metadata consistent (if the primary NameNode goes down, the standby has to take over, so it must hold the same information as the primary).
Therefore, shared edits are introduced to synchronize the primary and standby NameNodes. Shared edits are also called JournalNodes. When the primary NameNode has a metadata update, its editlog is written to the JournalNodes. The standby NameNode then reads the changes from the JournalNodes and applies them. The job of the SecondaryNameNode mentioned above (merging editlog and fsimage) is also taken over by the standby NameNode.
To recap:
- The NameNode handles client requests and is where the metadata is stored
- All metadata operations on the NameNode happen in memory, and creates, deletes, and updates are persisted to disk through the editlog (not slow, thanks to sequential IO)
- The editlog can grow too large, which makes restarting the NameNode too slow (because recovery depends on replaying the editlog). This leads to the fsimage memory snapshot. A scheduled task is needed to merge fsimage and editlog, which leads to the SecondaryNameNode
- Because the NameNode is a single machine, it is a single point of failure. Therefore, we maintain a primary and a standby NameNode through ZooKeeper, and keep their metadata consistent through the JournalNodes (shared edits). This makes the NameNode highly available.
As we know from the above, our data is stored on the DataNodes (with replicas).
If a DataNode goes offline, how does HDFS know?
When a DataNode starts, it registers with the NameNode, and the two maintain a heartbeat. If no heartbeat is received from the DataNode within the time threshold, HDFS considers that DataNode to be down.
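A heartbeat check boils down to remembering when each node was last heard from. The sketch below is my own toy version: `HeartbeatMonitor` is an invented name, the 30-second threshold is an arbitrary example value rather than the real HDFS setting, and time is passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last time each DataNode was heard from."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def heartbeat(self, datanode, now):
        """Called on registration and on every subsequent heartbeat."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """DataNodes whose last heartbeat is older than the threshold."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)  # arbitrary example threshold
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)   # dn1 keeps reporting, dn2 goes silent
print(monitor.dead_nodes(now=40))  # ['dn2']
```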
There is another problem: after we save a block to a DataNode, part of that DataNode's disk may still get damaged. The DataNode is not offline, so we would not know that the block is corrupted.
In addition to the data itself, a block also stores a piece of metadata (including the length of the data block, the checksum of the block data, and a timestamp). The DataNode also periodically reports all of its current block information to the NameNode, and this metadata makes it possible to verify whether each block is in a normal state.
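Here is a minimal sketch of checksum-based verification. The `BlockMeta` fields mirror the three items listed above, but the structure itself and the helper names are my own. I use `zlib.crc32` over the whole block as a stand-in, so treat the choice of checksum as an assumption of this example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class BlockMeta:
    length: int       # length of the block data in bytes
    checksum: int     # checksum of the block data (CRC32 in this sketch)
    timestamp: float  # when the block was written


def make_meta(data, timestamp):
    """Compute the metadata to store alongside a freshly written block."""
    return BlockMeta(length=len(data), checksum=zlib.crc32(data), timestamp=timestamp)


def is_healthy(data, meta):
    """A block is healthy if its length and checksum still match its metadata."""
    return len(data) == meta.length and zlib.crc32(data) == meta.checksum


block = b"some block contents"
meta = make_meta(block, timestamp=0.0)

print(is_healthy(block, meta))                   # True
print(is_healthy(b"some block c0ntents", meta))  # False
```

When a check like this fails, the damaged copy can be discarded and restored from one of the replicas on another machine.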
In fact, when you study HDFS, you will find that many of its ideas are similar to ones you have seen before, for example in Kafka and Elasticsearch, two other commonly used distributed components.
If you do not know Kafka, Elasticsearch, ZooKeeper, Redis, etc., you can find the corresponding articles on my GitHub or official account. I think they are fairly easy to understand.
Some day I will write another article that brings together the persistence features of these frameworks (because their persistence mechanisms turn out to be very similar)
If nothing unexpected happens, the next article will be about MapReduce. Thank you for reading.