HDFS in detail


Someone once told me: "I hid an *.avi file on the Linux server and you will never find it." I logged on and found that the etc folder was several GB in size....

Why not cut the avi into pieces and put them on different servers? Then who could find it?

1. HDFS foreword

That was a joke: HDFS is not meant to be used as a network disk.

● Design idea

○ Divide and conquer: store large files, and large numbers of files, across many servers in a distributed manner, so that massive data can be computed and analyzed in a divide-and-conquer way.

● Role in a big data system:

Provides data storage services for various distributed computing frameworks (such as MapReduce, Spark, Tez, ...)

Key concepts: file splitting, replica storage, metadata

2. Concept and characteristics of HDFS

First, it is a file system: it stores files and locates them through a unified namespace, the directory tree.
Second, it is distributed: it is implemented by many servers working together, and each server in the cluster has its own role.

Important features are as follows:

  1. Files in HDFS are physically stored in blocks. The block size can be set with the configuration parameter dfs.blocksize; the default is 128 MB in Hadoop 2.x and 64 MB in older versions.
  2. HDFS presents a unified abstract directory tree to the client, which accesses files by path, for example: hdfs://namenode:port/dir-a/dir-b/file.data
  3. The directory structure and the file block information (metadata) are managed by the namenode, the master node of the HDFS cluster. It maintains the directory tree of the entire file system and, for each path (file), the corresponding block information (the block IDs and the datanode servers that hold them).
  4. The storage of each file block is handled by the datanodes, the slave nodes of the HDFS cluster. Each block can be stored as multiple replicas on multiple datanodes (the number of replicas can be set with the parameter dfs.replication).
  5. HDFS is designed for write-once, read-many scenarios and does not support modifying files in place.
(Note: it suits data analysis and does not suit network disk applications, because files are inconvenient to modify, latency is high, and the cost is high.)
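To make the block idea concrete, here is a minimal sketch of how a file length maps onto fixed-size blocks. It is an illustration only, not HDFS code: the class name BlockSplitSketch and its methods are hypothetical, and real HDFS also tracks block IDs and replica locations.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: shows how a file length maps onto fixed-size blocks. */
final class BlockSplitSketch {

    /** One block of a file: where it starts and how many bytes it holds. */
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits a file of fileLength bytes into blocks of at most blockSize bytes. */
    static List<Block> split(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last block holds whatever is left and may be shorter than blockSize.
            blocks.add(new Block(offset, Math.min(blockSize, fileLength - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (Block b : split(300 * mb, 128 * mb)) {
            System.out.println("offset=" + b.offset + " length=" + b.length);
        }
    }
}
```

With a 128 MB block size, a 300 MB file yields three blocks, and the last one holds only the remaining 44 MB.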

3. Shell (command line client) operation of HDFS

Anyone who knows basic Linux commands will pick up the HDFS shell quickly: just put hadoop fs - in front of the command, and the files or directories to operate on after it.

3.1 command parameters supported by command line client

  • help

Function: print the usage manual for the command parameters

  • ls

Function: display directory information
Example: hadoop fs -ls hdfs://hadoop-server01:9000/
Note: in all of these commands, HDFS paths can be abbreviated
--> hadoop fs -ls / has the same effect as the previous command

  • moveFromLocal

Function: cut from local and paste to HDFS

Example: hadoop fs -moveFromLocal /home/hadoop/a.txt /aaa/bbb/cc/dd

  • moveToLocal

Function: cut from HDFS and paste to local (many Hadoop releases only print a "not implemented yet" message for this command)

Example: hadoop fs -moveToLocal /aaa/bbb/ccc/dd/b.txt /home/hadoop/a.txt

  • appendToFile

Function: append a file to the end of an existing file

Example: hadoop fs -appendToFile ./hello.txt /hello.txt

  • cat

Function: display file contents
Example: hadoop fs -cat /hello.txt

  • tail

Function: display the end of a file
Example: hadoop fs -tail /weblog/access_log.1

  • text

Function: print the contents of a file in character form
Example: hadoop fs -text /weblog/access_log.1

  • chgrp
  • chmod
  • chown

Function: change file permissions and ownership, used the same way as in a Linux file system
Example: hadoop fs -chmod 666 /hello.txt
hadoop fs -chown someuser:somegrp /hello.txt

  • copyFromLocal

Function: copy files from the local file system to an HDFS path
Example: hadoop fs -copyFromLocal ./jdk.tar.gz /aaa/

  • copyToLocal

Function: copy from HDFS to local
Example: hadoop fs -copyToLocal /aaa/jdk.tar.gz

  • cp

Function: copy from one HDFS path to another HDFS path
Example: hadoop fs -cp /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

  • mv

Function: move files between HDFS directories
Example: hadoop fs -mv /aaa/jdk.tar.gz /

  • get

Function: equivalent to copyToLocal, that is, download files from HDFS to local
Example: hadoop fs -get /aaa/jdk.tar.gz

  • getmerge

Function: merge multiple files and download them as one
Example: suppose the HDFS directory /aaa/ contains several files: log.1, log.2, log.3

hadoop fs -getmerge /aaa/log.* ./log.sum

  • put

Function: equivalent to copyFromLocal
Example: hadoop fs -put /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

  • rm

Function: delete files or folders
Example: hadoop fs -rm -r /aaa/bbb/

  • rmdir

Function: delete an empty directory
Example: hadoop fs -rmdir /aaa/bbb/ccc

  • df

Function: report the available space of the file system
Example: hadoop fs -df -h /

  • du

Function: report folder sizes
Example: hadoop fs -du -s -h /aaa/*

  • count

Function: count the number of file nodes in a specified directory
Example: hadoop fs -count /aaa/

  • setrep

Function: set the number of replicas of a file in HDFS
Example: hadoop fs -setrep 3 /aaa/jdk.tar.gz
(The number of replicas set here is only recorded in the namenode's metadata. Whether that many replicas actually exist depends on the number of datanodes.)

  • View HDFS status

hdfs dfsadmin -report

HDFS principle

4. Working mechanism of HDFS

(The point of studying the working mechanism is to deepen your understanding of distributed systems, to strengthen your ability to analyze and solve problems as they arise, and to build some capacity for cluster operation and maintenance.)
Note: many people who do not really understand the Hadoop technology stack assume that HDFS can be used for network disk applications, but this is not the case. To apply a technology in the right place, you must understand it deeply.

4.1 overview

  1. An HDFS cluster has two roles: namenode and datanode
  2. The namenode manages the metadata of the entire file system
  3. The datanodes manage the users' file data blocks
  4. A file is cut into blocks of a fixed size, which are distributed across several datanodes
  5. Each file block can have multiple replicas, stored on different datanodes
  6. Each datanode periodically reports the file blocks it holds to the namenode, and the namenode is responsible for maintaining the number of replicas of each file
  7. The internal working mechanism of HDFS is transparent to the client; the client accesses HDFS by sending its requests to the namenode

4.2 HDFS data writing process

4.2.1 overview

To write data to HDFS, the client first communicates with the namenode to confirm that the file can be written and to obtain the datanodes that will receive the file's blocks. The client then transfers the blocks one by one, in order, to the corresponding datanodes, and the datanode that receives a block is responsible for copying it to the other datanodes.

This shows that the file is split on the client side, not on the namenode. The data is also sent from the client directly to the designated datanode, and the replicas are passed on from that datanode to the other datanodes.

4.2.3 detailed steps (important)

  1. The client communicates with the namenode to request a file upload; the namenode checks whether the target file already exists and whether the parent directory exists
  2. The namenode replies whether the upload is allowed
  3. The client asks which datanode servers the first block (and its replicas) should be transferred to
  4. The namenode returns three datanode servers, A, B and C (assuming the number of replicas is three). The selection first prefers a node on the same rack, then one on a different rack, then another machine on the same rack, and also takes server capacity into account
  5. The client asks A, one of the three datanodes, to receive the data (essentially an RPC call that builds a pipeline). On receiving the request, A calls B, and B calls C, which completes the pipeline; the responses are returned to the client level by level
  6. The client starts uploading the first block to A (the data is first read from disk into a local memory cache), in units of packets. When A receives a packet it passes it to B, and B passes it to C; each packet that A sends is put into an ack queue to wait for acknowledgement
  7. When the transfer of a block is complete, the client asks the namenode again for the datanodes that will receive the second block
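The packet-by-packet forwarding in steps 5 and 6 can be pictured with a toy model. This is a sketch under strong assumptions: everything happens in memory with no networking, acknowledgements arrive immediately, and the class WritePipelineSketch and its method are hypothetical names, not part of the HDFS client.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the write pipeline; no networking, everything happens in memory. */
final class WritePipelineSketch {

    /** Sends one block through the pipeline packet by packet; returns what each node stored. */
    static Map<String, byte[]> writeBlock(byte[] block, int packetSize, List<String> pipeline) {
        if (packetSize <= 0) {
            throw new IllegalArgumentException("packetSize must be positive");
        }
        Map<String, ByteArrayOutputStream> stores = new LinkedHashMap<>();
        for (String node : pipeline) {
            stores.put(node, new ByteArrayOutputStream());
        }
        // Packets that have been sent but not yet acknowledged.
        Deque<Integer> ackQueue = new ArrayDeque<>();
        int seq = 0;
        for (int offset = 0; offset < block.length; offset += packetSize) {
            int length = Math.min(packetSize, block.length - offset);
            ackQueue.addLast(seq);
            // The packet travels down the pipeline in order: A, then B, then C.
            for (String node : pipeline) {
                stores.get(node).write(block, offset, length);
            }
            // In this toy model the acknowledgement comes back immediately.
            int acked = ackQueue.removeFirst();
            if (acked != seq) {
                throw new IllegalStateException("ack out of order");
            }
            seq++;
        }
        Map<String, byte[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, ByteArrayOutputStream> e : stores.entrySet()) {
            result.put(e.getKey(), e.getValue().toByteArray());
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] block = "0123456789".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> stored = writeBlock(block, 4, Arrays.asList("A", "B", "C"));
        for (Map.Entry<String, byte[]> e : stored.entrySet()) {
            System.out.println(e.getKey() + " stored " + e.getValue().length + " bytes");
        }
    }
}
```

The point of the model is only that every node in the pipeline ends up with an identical copy of the block, and that the client sends the data once.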

4.3. HDFS data reading process

4.3.1 overview

The client sends the path of the file to be read to the namenode. The namenode looks up the file's metadata (mainly the storage locations of its blocks) and returns it to the client. Using this information, the client contacts the corresponding datanodes, fetches the file's blocks one by one, and appends and merges them on the client side to obtain the whole file.

4.3.2 detailed step analysis

  1. The client communicates with the namenode to query the metadata and find the datanode servers where the file's blocks are located
  2. It selects a datanode server and requests a socket stream
  3. The datanode starts sending data (it reads the data from disk into the stream, with verification done per packet)
  4. The client receives the data packet by packet, caches it locally, and then writes it to the target file
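The client-side merge can be sketched as follows. This is an illustration only: the class ReadAssembleSketch is hypothetical, blocks are plain byte arrays keyed by their offset in the file, and fetching them from datanodes is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the client-side merge: blocks are appended in offset order. */
final class ReadAssembleSketch {

    /** Rebuilds the file from blocks keyed by their start offset in the file. */
    static byte[] assemble(Map<Long, byte[]> blocksByOffset) {
        // Sort by offset so the blocks are appended in file order.
        TreeMap<Long, byte[]> ordered = new TreeMap<>(blocksByOffset);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        long expectedOffset = 0;
        for (Map.Entry<Long, byte[]> e : ordered.entrySet()) {
            if (e.getKey() != expectedOffset) {
                throw new IllegalStateException("missing block at offset " + expectedOffset);
            }
            file.write(e.getValue(), 0, e.getValue().length);
            expectedOffset += e.getValue().length;
        }
        return file.toByteArray();
    }

    public static void main(String[] args) {
        Map<Long, byte[]> blocks = new HashMap<>();
        blocks.put(6L, "world".getBytes(StandardCharsets.UTF_8));
        blocks.put(0L, "hello ".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(assemble(blocks), StandardCharsets.UTF_8));
    }
}
```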

5. Working mechanism of namenode

Learning objectives: understand the working mechanism of the namenode, especially its metadata management, in order to better understand how HDFS works and to build the ability to analyze and solve performance-tuning and namenode failure problems when operating a Hadoop cluster.

Problem scenario:

  1. After the cluster starts, files can be viewed, but uploading a file reports an error. The web UI shows that the namenode is in safemode. What should I do?
  2. A disk failure on the namenode server brings the namenode down. How can the cluster and its data be saved?
  3. Can there be more than one namenode? How much memory should the namenode be given? Is the namenode related to the cluster's data storage capacity?
  4. Is it better to adjust the file block size up or down?

Answering such questions requires a deep understanding of how the namenode itself works.

5.1 namenode responsibilities

Namenode responsibilities:
Responding to client requests
Managing metadata (queries and modifications)

5.2 metadata management

The namenode keeps metadata in three forms:
In-memory metadata (NameSystem)
An on-disk metadata image file (fsimage)
Operation log files (edits), from which the metadata can be recomputed by replaying the log

5.2.1 metadata storage mechanism (important)

A. Memory holds a complete copy of the metadata
B. The disk holds a "nearly complete" metadata image file, fsimage (in the working directory of the namenode)
C. The operation log (edits files) bridges the in-memory metadata and the persistent image fsimage

Note: when a client adds or modifies files in HDFS, the operation is first recorded in the edits log file. Once the client operation succeeds, the corresponding metadata is updated in the in-memory metadata.
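The relationship between the three forms can be sketched with a toy namespace. This is not namenode code: the class EditLogSketch, the "ADD"/"DEL" record format and the use of a plain set of paths as the "image" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy namespace: every change is logged first, then applied to memory. */
final class EditLogSketch {
    private final TreeSet<String> memory = new TreeSet<>();   // stands in for the in-memory metadata
    private final List<String> edits = new ArrayList<>();     // stands in for the edits log file

    void create(String path) {
        edits.add("ADD " + path);   // 1. record the operation in the log
        memory.add(path);           // 2. then update the in-memory metadata
    }

    void delete(String path) {
        edits.add("DEL " + path);
        memory.remove(path);
    }

    SortedSet<String> memory() {
        return new TreeSet<>(memory);
    }

    List<String> edits() {
        return new ArrayList<>(edits);
    }

    /** Rebuilds the namespace from an image plus the operations logged after it. */
    static SortedSet<String> replay(Collection<String> fsimage, List<String> edits) {
        SortedSet<String> result = new TreeSet<>(fsimage);
        for (String op : edits) {
            String path = op.substring(4);
            if (op.startsWith("ADD ")) {
                result.add(path);
            } else if (op.startsWith("DEL ")) {
                result.remove(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EditLogSketch nn = new EditLogSketch();
        nn.create("/a");
        nn.create("/b");
        nn.delete("/a");
        // Replaying the log over an empty image reproduces the in-memory state.
        System.out.println(replay(new ArrayList<String>(), nn.edits()).equals(nn.memory()));
    }
}
```

Because the image plus the log always reproduces the in-memory state, the image on disk only needs to be "nearly complete".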

5.2.2 manual view of metadata

You can inspect the contents of the edits and fsimage files with tools that ship with HDFS:
hdfs oev -i edits -o edits.xml
hdfs oiv -i fsimage_0000000087 -p XML -o fsimage.xml

-i inputFile: the fsimage file to view
-o outputFile: where to save the formatted output
-p processor: which processor to use for decoding (XML, Web, ...)

5.2.3 checkpoint of metadata

At regular intervals (30 minutes in this description; the period is configurable), the secondary namenode downloads all the edits accumulated on the namenode (the edits file is rolled immediately beforehand, so that the checkpoint covers the latest operations) together with the latest fsimage (normally the fsimage is downloaded only for the first checkpoint, because in later checkpoints the secondary namenode already holds the latest fsimage locally). It loads them into memory, merges them, and dumps the result into a new image file. It then uploads that file to the namenode, where it is renamed to fsimage. This process is called a checkpoint.

Detailed process:

Configuration parameters for the trigger conditions of the checkpoint operation:

dfs.namenode.checkpoint.check.period=60   # check every 60 seconds whether a checkpoint is needed
# dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir set the local working directory of the secondary namenode during a checkpoint

dfs.namenode.checkpoint.max-retries=3     # maximum number of retries
dfs.namenode.checkpoint.period=3600       # time between two checkpoints: 3600 seconds
dfs.namenode.checkpoint.txns=1000000      # maximum number of operation records between two checkpoints
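Read together, the period and txns parameters above mean that a checkpoint is due when either limit is reached, and the check itself runs once per check period. A minimal sketch of that decision follows; the class CheckpointTriggerSketch and its method are hypothetical names, not the secondary namenode's own code.

```java
/** Sketch of the checkpoint trigger: time elapsed OR enough transactions accumulated. */
final class CheckpointTriggerSketch {

    /**
     * @param secondsSinceLast seconds since the previous checkpoint
     * @param txnsSinceLast    operations logged since the previous checkpoint
     * @param periodSeconds    value of dfs.namenode.checkpoint.period
     * @param maxTxns          value of dfs.namenode.checkpoint.txns
     */
    static boolean shouldCheckpoint(long secondsSinceLast, long txnsSinceLast,
                                    long periodSeconds, long maxTxns) {
        return secondsSinceLast >= periodSeconds || txnsSinceLast >= maxTxns;
    }

    public static void main(String[] args) {
        // Two minutes and 5000 operations after the last checkpoint: nothing to do yet.
        System.out.println(shouldCheckpoint(120, 5_000, 3600, 1_000_000));
        // A busy cluster can hit the transaction limit long before the hour is up.
        System.out.println(shouldCheckpoint(120, 1_000_000, 3600, 1_000_000));
    }
}
```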

The side effect of checkpoint

The namenode and the secondary namenode use the same storage structure in their working directories. So when the namenode fails and has to be recovered, fsimage can be copied from the working directory of the secondary namenode into the working directory of the namenode to restore the namenode's metadata (the namenode loads fsimage when it starts).

5.2.4 metadata directory description

When deploying a Hadoop cluster for the first time, we need to format the disk on the namenode (NN) node:
hdfs namenode -format
After formatting, the following file structure is created in the $dfs.namenode.name.dir/current directory:
    |-- VERSION
    |-- edits_*
    |-- fsimage_00000000008547077
    |-- fsimage_00000000008547077.md5
    |-- seen_txid
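dfs.namenode.name.dir is configured in the hdfs-site.xml file. The default value is as follows:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>

hadoop.tmp.dir is configured in core-site.xml. The default value is as follows:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>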

The dfs.namenode.name.dir property can be set to multiple directories, such as /data1/dfs/name, /data2/dfs/name, /data3/dfs/name. The file structure and content stored in each directory are exactly the same, so they act as backups of one another. The advantage is that damage to one directory does not affect the Hadoop metadata. In particular, when one of the directories is on a network file system (NFS), the metadata survives even if the machine itself is damaged.

The following explains the files in the $dfs.namenode.name.dir/current/ directory.

1. The VERSION file is a Java properties file with the following contents:
#Fri Nov 15 19:47:46 CST 2013
namespaceID=934548976
clusterID=CID-cdff7d73-93cd-4783-9399-0a22e6dce196
cTime=0
storageType=NAME_NODE
blockpoolID=BP-893790215-


(1) namespaceID is the unique identifier of the file system, generated when the file system is first formatted.
(2) storageType indicates which process's data structures this directory stores (on a datanode, storageType=DATA_NODE).
(3) cTime is the creation time of the namenode storage. Since my namenode has never been upgraded, the value recorded here is 0; after the namenode is upgraded, cTime records the timestamp of the upgrade.
(4) layoutVersion is the version of the HDFS persistent data structures. Whenever the data structures change, the version number is decremented, and HDFS must be upgraded as well. Otherwise the disk still holds the old data structures and the new version of the namenode cannot be used.
(5) clusterID is a cluster ID that is generated by the system or specified manually. It can be passed with the -clusterId option, as follows:
    a. Format a namenode using the following command:
        $HADOOP_HOME/bin/hdfs namenode -format [-clusterId <cluster_id>]
        Choose a unique cluster_id that does not conflict with other clusters in the environment. If you do not provide a cluster_id, a unique one is generated automatically.
    b. Format the other namenodes using the following command:
        $HADOOP_HOME/bin/hdfs namenode -format -clusterId <cluster_id>
    c. Upgrade the cluster to the latest version. You need to provide a cluster ID during the upgrade, for example:
        $HADOOP_PREFIX_HOME/bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId <cluster_ID>
        If no cluster ID is provided, one is generated automatically.
(6) blockpoolID is the ID of the block pool corresponding to each namespace. The BP-893790215- above is the ID of the block pool under the namespace of my server1; this ID includes the IP address of the corresponding namenode.

2. $dfs.namenode.name.dir/current/seen_txid is very important. It is the file that stores the transaction ID, and after formatting it is 0. It represents the final number of the edits_* files on the namenode. When the namenode restarts, it replays the edits files in sequence, from the beginning up to the number in seen_txid. So if your HDFS restarts abnormally, make sure the number in seen_txid matches the trailing number of your last edits file. Otherwise metadata will be missing when the namenode is rebuilt, which causes block information on the datanodes to be treated as redundant and deleted.

3. When formatting, the fsimage and edits files and their corresponding MD5 checksum files are generated in the $dfs.namenode.name.dir/current directory.

Supplement: seen_txid
The file records the sequence number at which the edits were last rolled. Every time the namenode restarts, this tells it which edits to load.

6. Working mechanism of datanode

Problem scenario:
1. Cluster capacity is not enough, how to expand?
2. What if some datanodes go down?
3. Datanode is clearly started, but it is not in the list of available datanodes in the cluster. What should I do?

The answer to these questions depends on a deep understanding of the working mechanism of datanode.

6.1 overview

1. Responsibilities of datanode:

Store and manage the users' file block data
Regularly report the blocks it holds to the namenode (via heartbeat information)
(This matters because it is how the cluster restores the original number of replicas when some block replicas are lost.)
The reporting interval is the setting dfs.blockreport.intervalMsec, whose description reads "Determines block reporting interval in milliseconds."
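As a rough picture of why these reports matter, the sketch below counts replicas per block from a set of reports and lists the blocks that fall short of the target. It is an illustration only: the class BlockReportSketch is hypothetical, and a block that no datanode reports at all would not show up in this simplified version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model: find blocks with fewer replicas than the target, based on block reports. */
final class BlockReportSketch {

    /** reports maps a datanode name to the set of block IDs it says it holds. */
    static List<Long> underReplicated(Map<String, Set<Long>> reports, int targetReplication) {
        Map<Long, Integer> replicaCounts = new TreeMap<>();
        for (Set<Long> blocks : reports.values()) {
            for (Long blockId : blocks) {
                replicaCounts.merge(blockId, 1, Integer::sum);
            }
        }
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : replicaCounts.entrySet()) {
            if (e.getValue() < targetReplication) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> reports = new HashMap<>();
        reports.put("dn1", new HashSet<>(Arrays.asList(1L, 2L)));
        reports.put("dn2", new HashSet<>(Arrays.asList(1L, 2L)));
        reports.put("dn3", new HashSet<>(Arrays.asList(1L)));
        // Block 2 has only two replicas, so it is the one that needs another copy.
        System.out.println(underReplicated(reports, 3));
    }
}
```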

2. Timeout parameters for declaring a datanode offline

If a datanode cannot communicate with the namenode because the datanode process has died or the network has failed, the namenode does not immediately declare the node dead. It waits for a period of time, called the timeout here. The default HDFS timeout is 10 minutes + 30 seconds. The timeout is calculated as:
timeout = 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval
The default heartbeat.recheck.interval is 5 minutes and the default dfs.heartbeat.interval is 3 seconds, which gives 10 minutes + 30 seconds.
Note that in the hdfs-site.xml configuration file the unit of heartbeat.recheck.interval is milliseconds, while the unit of dfs.heartbeat.interval is seconds. For example, if heartbeat.recheck.interval is set to 5000 (milliseconds) and dfs.heartbeat.interval is 3 (seconds, the default), the total timeout is 40 seconds.
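The formula and its mixed units are easy to get wrong, so here is the arithmetic as a small sketch. The class name DatanodeTimeoutSketch is hypothetical; the two inputs use the units described above.

```java
/** Works through the timeout formula, keeping the two different units explicit. */
final class DatanodeTimeoutSketch {

    /**
     * @param recheckIntervalMillis    heartbeat.recheck.interval, in milliseconds
     * @param heartbeatIntervalSeconds dfs.heartbeat.interval, in seconds
     * @return the timeout in milliseconds
     */
    static long timeoutMillis(long recheckIntervalMillis, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMillis + 10 * heartbeatIntervalSeconds * 1000;
    }

    public static void main(String[] args) {
        // Defaults: 5 minutes and 3 seconds give 630000 ms, i.e. 10 minutes + 30 seconds.
        System.out.println(timeoutMillis(5 * 60 * 1000, 3));
        // The example from the text: 5000 ms and 3 s give 40000 ms, i.e. 40 seconds.
        System.out.println(timeoutMillis(5000, 3));
    }
}
```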

6.2 observe and verify datanode function

Upload a file and observe the specific physical storage of the file block:
In this directory on each datanode machine, you can find the file chunks:

7. Java API operation HDFS

In production, work with HDFS is mainly client development. The core step is to construct an HDFS client object from the API that HDFS provides, and then operate on the files in HDFS (create, delete, modify, query) through that client object.


1. Import dependency



To operate HDFS in Java, first obtain a client instance

Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(conf);

Our target is HDFS, so the fs object obtained should be an instance of DistributedFileSystem.

How does the get method decide which client class to instantiate?
It decides from the value of one parameter in conf: fs.defaultFS.

If fs.defaultFS is not specified in our code and no corresponding configuration is provided on the project classpath, the value in conf comes from core-default.xml in the Hadoop jar, where the default is file:///. The object obtained is then not an instance of DistributedFileSystem but a client for the local file system.
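The selection can be pictured as a lookup on the URI scheme. The sketch below is a deliberate simplification using only the JDK: the class DefaultFsSketch and the hard-coded mapping are assumptions for illustration, and the real FileSystem.get resolves the implementation class through Hadoop's own configuration rather than an if-chain.

```java
import java.net.URI;

/** Simplified picture of how the scheme of fs.defaultFS selects the client type. */
final class DefaultFsSketch {

    /** Returns the name of the client class that the given fs.defaultFS value would lead to. */
    static String clientFor(String defaultFs) {
        String scheme = URI.create(defaultFs).getScheme();
        if ("hdfs".equals(scheme)) {
            return "DistributedFileSystem";
        }
        if ("file".equals(scheme)) {
            return "LocalFileSystem";
        }
        return "unknown (" + scheme + ")";
    }

    public static void main(String[] args) {
        System.out.println(clientFor("hdfs://hdp-node01:9000"));
        System.out.println(clientFor("file:///"));
    }
}
```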

7.3 example of HDFS client operation data code:

7.3.1 adding, deleting, modifying and querying files

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class HdfsClient {
    FileSystem fs = null;

    /** Builds the client; call this before any of the other methods. */
    public void init() throws Exception {
        // Construct a configuration object and set one parameter: the URI of the HDFS we want to access.
        // From it, FileSystem.get() knows that it should build a client for HDFS, and at which address.
        // new Configuration() first loads the default configuration files bundled in the Hadoop jars,
        // then the *-site.xml files found on the classpath.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hdp-node01:9000");
        // Parameter priority: 1. values set in client code, 2. user-defined configuration files
        // on the classpath, 3. the server's default configuration.
        conf.set("dfs.replication", "3");
        // With fs.defaultFS set as above, this call would return a DistributedFileSystem instance:
        // fs = FileSystem.get(conf);
        // With the call below, fs.defaultFS does not need to be set in conf at all,
        // and the client already acts as the user "hadoop".
        fs = FileSystem.get(new URI("hdfs://server1:9000"), conf, "hadoop");
    }

    /** Upload a file to HDFS. */
    public void testAddFileToHdfs() throws Exception {
        // Local path of the file to upload
        Path src = new Path("/home/redis-recommend.zip");
        // Destination path in HDFS
        Path dst = new Path("/aaa");
        fs.copyFromLocalFile(src, dst);
    }

    /** Copy a file from HDFS to the local file system. */
    public void testDownloadFileToLocal() throws IllegalArgumentException, IOException {
        fs.copyToLocalFile(new Path("/jdk8.tar.gz"), new Path("/home"));
    }

    public void testMkdirAndDeleteAndRename() throws IllegalArgumentException, IOException {
        // Create a directory
        fs.mkdirs(new Path("/a1/b1/c1"));
        // Delete a folder; for a non-empty folder the second argument must be true
        fs.delete(new Path("/aaa"), true);
        // Rename a file or folder
        fs.rename(new Path("/a1"), new Path("/a2"));
    }

    /** View directory information; only files are listed. */
    public void testListFiles() throws FileNotFoundException, IllegalArgumentException, IOException {
        // Why return an iterator rather than a container such as List?
        // With tens of thousands of files, loading them all into a collection is costly.
        // An iterator fetches entries from HDFS only as they are accessed.
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus fileStatus = listFiles.next();
            System.out.println(fileStatus.getLen());
            // Get the block information of the file
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            for (BlockLocation bl : blockLocations) {
                System.out.println("block-length:" + bl.getLength() + "--" + "block-offset:" + bl.getOffset());
                String[] hosts = bl.getHosts();
                for (String host : hosts) {
                    // The loop body is missing in the source; printing the host is an assumed completion
                    System.out.println(host);
                }
            }
        }
    }

    /** View file and folder information. */
    public void testListAll() throws FileNotFoundException, IllegalArgumentException, IOException {
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus fstatus : listStatus) {
            // Decide the flag per entry, so that a directory listed after a file is not mislabeled
            String flag = fstatus.isFile() ? "f--" : "d--";
            System.out.println(flag + fstatus.getPath().getName());
        }
    }
}

Note the difference between listFiles and listStatus: listFiles can recursively traverse all files, while listStatus only lists the files and folders directly inside a given directory.
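The same distinction exists in the JDK's local file API, which makes it easy to try without a cluster. The sketch below is an analogy only, assuming that Files.walk plays the role of listFiles(path, true) and Files.list the role of listStatus; the class ListingSketch is a hypothetical name and no HDFS API is involved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Local-file-system analogy for recursive versus single-level listing. */
final class ListingSketch {

    /** Like listFiles(path, true): regular files only, at any depth. */
    static long countFilesRecursively(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    /** Like listStatus(path): direct children only, files and directories alike. */
    static long countDirectChildren(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("listing-sketch");
        Files.createFile(root.resolve("a.txt"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));
        Files.createFile(sub.resolve("c.txt"));
        System.out.println("files at any depth: " + countFilesRecursively(root));
        System.out.println("direct children:    " + countDirectChildren(root));
    }
}
```

Here the recursive count sees the three files, while the single-level listing sees two entries: a.txt and the sub directory.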

7.4.2 access to HDFS by streaming

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * A lower-level way of operating than the encapsulated methods above.
 * Upper-layer computing frameworks such as MapReduce and Spark use this underlying API when they read data from HDFS.
 */
public class StreamAccess {
    FileSystem fs = null;

    public void init() throws Exception {
        Configuration conf = new Configuration();
        fs = FileSystem.get(new URI("hdfs://server1:9000"), conf, "hadoop");
    }

    /** Upload a file to HDFS by streaming. */
    public void testUpload() throws Exception {
        FSDataOutputStream outputStream = fs.create(new Path("/hello.txt"), true);
        FileInputStream inputStream = new FileInputStream("/home/hello.txt");
        // Hadoop's IOUtils.copyBytes is used in place of the source's IOUtils.copy; true closes both streams
        IOUtils.copyBytes(inputStream, outputStream, 4096, true);
    }

    /** Download a file from HDFS by streaming. */
    public void testDownLoadFileToLocal() throws IllegalArgumentException, IOException {
        // Get an input stream for the file on HDFS
        FSDataInputStream in = fs.open(new Path("/jdk8.tar.gz"));
        // Build an output stream for the local file
        FileOutputStream out = new FileOutputStream(new File("/home/jdk.tar.gz"));
        // Transfer the data from the input stream to the output stream; true closes both when done
        IOUtils.copyBytes(in, out, 4096, true);
    }

    /**
     * HDFS supports reading from an arbitrary position in a file, and reading a specified length.
     * This is what lets upper-layer distributed computing frameworks process data concurrently.
     */
    public void testRandomAccess() throws IllegalArgumentException, IOException {
        // Get an input stream for the file on HDFS
        FSDataInputStream in = fs.open(new Path("/hello.txt"));
        // Set the start offset of the stream. The source omits the value; 12 is a hypothetical example.
        in.seek(12L);
        // Build an output stream for the local file
        FileOutputStream out = new FileOutputStream(new File("/home/hello.line.2.txt"));
        // Assumed completion: copy from the offset to the end of the file, then close both streams
        IOUtils.copyBytes(in, out, 4096, true);
    }

    /** Display the contents of a file on HDFS. */
    public void testCat() throws IllegalArgumentException, IOException {
        FSDataInputStream in = fs.open(new Path("/hello.txt"));
        // System.out must stay open, so only the input stream is closed afterwards
        IOUtils.copyBytes(in, System.out, 1024);
        in.close();
    }
}
