Hadoop framework: introduction to HDFS and shell management commands

Time: 2021-9-26

Source code of this article: GitHub || Gitee (addresses at the end of this article)

1、 HDFS basic overview

1. HDFS description

The big data field has always revolved around two core problems: data storage and data computing. HDFS is one of the most important big data storage technologies and offers high fault tolerance, stability and reliability. HDFS (Hadoop Distributed File System) is a distributed file system that stores files and locates them through a directory tree. It was designed to manage hundreds of servers and disks so that applications can store large-scale file data as if they were using an ordinary file system. It suits write-once, read-many workloads such as data analysis: files cannot be modified in place, although new data can be appended to the end of an existing file.

2. Architecture

[Figure: HDFS architecture]

HDFS uses a master/slave architecture built around two core components: the NameNode (master) and the DataNodes (slaves). Clients and the Secondary NameNode complete the picture.

NameNode

Manages the file system metadata (file path names, data block IDs, block storage locations and similar information), configures the replication policy, and handles client read and write requests.

DataNode

Performs the actual storage, reading and writing of file data. Each DataNode stores a portion of the data blocks, so the blocks of a file are distributed across the servers of the whole HDFS cluster.

Client

When uploading a file to HDFS, the client splits the file into blocks and uploads them one by one. It obtains file location information from the NameNode, communicates with DataNodes to read or write data, and provides commands for accessing and managing HDFS.

Secondary-NameNode

It is not a hot standby for the NameNode. Instead, it takes over part of the NameNode's workload, such as periodically merging the fsimage and edits files and pushing the result back to the NameNode. In an emergency, its checkpoint can help recover the NameNode.
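To make the client's block splitting concrete, here is a minimal shell sketch, assuming sizes are plain integers in one consistent unit (for example MB) and a shell that supports `local`, such as bash or dash. The function name split_into_blocks is hypothetical and the script does not talk to HDFS; it only shows how a file maps onto fixed-size blocks, with every block full except possibly the last.

```shell
#!/bin/sh
# split_into_blocks FILE_SIZE BLOCK_SIZE (hypothetical helper; sizes in one unit, e.g. MB).
# Prints one line per block: index, starting offset and length. Does not contact HDFS.
split_into_blocks() {
  local file_size=$1 block_size=$2 offset=0 idx=0 len
  if [ "$block_size" -le 0 ]; then
    echo "block size must be positive" >&2
    return 1
  fi
  while [ "$offset" -lt "$file_size" ]; do
    len=$((file_size - offset))
    # Every block is full-size except possibly the last one.
    if [ "$len" -gt "$block_size" ]; then
      len=$block_size
    fi
    echo "block $idx offset=$offset length=$len"
    offset=$((offset + len))
    idx=$((idx + 1))
  done
}

split_into_blocks 300 128
# block 0 offset=0 length=128
# block 1 offset=128 length=128
# block 2 offset=256 length=44
```

With a 128 MB block size, a 300 MB file becomes two full blocks plus a 44 MB final block; in HDFS that last block only occupies as much disk space as the data it actually holds.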

3. High fault tolerance

[Figure: multi-replica storage of data blocks across DataNodes]

The schematic diagram shows multi-replica storage of data blocks. The file /users/sameerp/data/part-0 has a replication factor of 2 and is stored as blocks 1 and 3; the file /users/sameerp/data/part-1 has a replication factor of 3 and is stored as blocks 2, 4 and 5. Because the replicas of each block are placed on different servers, every block still has at least one surviving replica after any single server goes down, so file access is unaffected and overall fault tolerance improves.
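The fault-tolerance argument can be checked mechanically. The sketch below assumes a hypothetical placement of the example's blocks onto four invented DataNodes (dn1 to dn4); only the block IDs and replica counts come from the example above. The hypothetical function lost_blocks_if_down prints the blocks that would have no surviving replica if the listed nodes failed.

```shell
#!/bin/sh
# Hypothetical placement: "<block id> <DataNode> <DataNode> ...".
# Block IDs and replica counts follow the example above; node names are invented.
PLACEMENT='1 dn1 dn2
3 dn3 dn4
2 dn1 dn3 dn4
4 dn2 dn3 dn4
5 dn1 dn2 dn4'

# lost_blocks_if_down NODE...: print IDs of blocks with no replica outside the failed nodes.
lost_blocks_if_down() {
  printf '%s\n' "$PLACEMENT" | awk -v down=" $* " '{
    alive = 0
    for (i = 2; i <= NF; i++)
      if (index(down, " " $i " ") == 0) alive++   # replica on a surviving node
    if (alive == 0) print $1
  }'
}

# Try every single-node failure and report whether all blocks stay readable.
check_single_failures() {
  local node lost
  for node in $(printf '%s\n' "$PLACEMENT" | awk '{ for (i = 2; i <= NF; i++) print $i }' | sort -u); do
    lost=$(lost_blocks_if_down "$node")
    if [ -z "$lost" ]; then
      echo "$node down: every block still has a replica"
    else
      echo "$node down: lost blocks" $lost
    fi
  done
}

check_single_failures        # four lines, all "every block still has a replica"
lost_blocks_if_down dn3 dn4  # prints 3
```

Any single failure leaves every block readable, but losing dn3 and dn4 together loses block 3, which has only two replicas in this placement. On a real cluster, `hdfs fsck <path> -files -blocks -locations` reports where each block's replicas actually live.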

Files in HDFS are physically stored in blocks. The block size is set by the parameter dfs.blocksize (128 MB by default in Hadoop 2.x). If blocks are too small, more time is spent locating blocks (addressing); if they are too large, the time needed to transfer a block from disk far exceeds the addressing time, so processing each block becomes slow. The block size is therefore determined mainly by the disk transfer rate.
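A commonly quoted rule of thumb (an assumption here, not something stated above) is to size blocks so that the seek time is about 1% of the block's transfer time. The hypothetical helper suggest_block_size_mb applies that rule to an assumed seek time and disk transfer rate and rounds the result up to a power of two; like the other sketches, it assumes a shell that supports `local`.

```shell
#!/bin/sh
# suggest_block_size_mb SEEK_MS RATE_MB_PER_S (hypothetical helper).
# Assumed rule of thumb: seek time should be about 1% of the block transfer time.
suggest_block_size_mb() {
  local seek_ms=$1 rate=$2 transfer_ms raw_mb size=1
  # Transfer time that makes the seek 1% of it: seek_ms / 0.01 = seek_ms * 100.
  transfer_ms=$((seek_ms * 100))
  # Data moved in that time, in MB: rate (MB/s) * transfer_ms / 1000.
  raw_mb=$((rate * transfer_ms / 1000))
  # Round up to the next power of two (e.g. 64, 128, 256 MB).
  while [ "$size" -lt "$raw_mb" ]; do
    size=$((size * 2))
  done
  echo "$size"
}

suggest_block_size_mb 10 100   # prints 128
suggest_block_size_mb 10 200   # prints 256
```

With a 10 ms seek time and a 100 MB/s disk, the rule gives roughly 100 MB, which rounds up to the 128 MB default; faster disks push the figure toward 256 MB. On a running cluster, `hdfs getconf -confKey dfs.blocksize` prints the configured value.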

2、 Basic shell commands

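1. Basic commands

Run either command from the Hadoop installation directory (hadoop2.7) to list the available file system shell commands.

bin/hadoop fs
bin/hdfs dfs

hadoop fs is the generic file system shell and works with any file system Hadoop supports; when HDFS is the target, hdfs dfs is a synonym for it.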

2. View help for a command

hadoop fs -help ls

3. Create directory recursively

hadoop fs -mkdir -p /hopdir/myfile

4. View directory

hadoop fs -ls /
hadoop fs -ls /hopdir

5. Move (cut) a local file into HDFS

hadoop fs -moveFromLocal /opt/hopfile/java.txt /hopdir/myfile
##View file
hadoop fs -ls /hopdir/myfile

6. View file contents

##View all
hadoop fs -cat /hopdir/myfile/java.txt
##View the last kilobyte
hadoop fs -tail /hopdir/myfile/java.txt

7. Append a local file to an HDFS file

hadoop fs -appendToFile /opt/hopfile/c++.txt /hopdir/myfile/java.txt

8. Copy a local file to HDFS

copyFromLocal works like put, except that its source must be a local file.

hadoop fs -copyFromLocal /opt/hopfile/c++.txt /hopdir
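Steps 3 and 8 are often combined in scripts. The following is a minimal sketch under stated assumptions: the helper name upload_file is hypothetical, and the HADOOP variable is an addition of this sketch so the commands can be dry-run (for example with HADOOP=echo). It relies on -mkdir -p succeeding when the directory already exists and on -put -f overwriting an existing target, so it can be re-run safely.

```shell
#!/bin/sh
# Command used to invoke Hadoop; set HADOOP=echo to print the commands instead (dry run).
HADOOP=${HADOOP:-hadoop}

# upload_file LOCAL_FILE HDFS_DIR (hypothetical helper)
upload_file() {
  # -p: create parent directories, no error if the directory already exists
  "$HADOOP" fs -mkdir -p "$2" || return 1
  # -f: overwrite the destination file if it already exists
  "$HADOOP" fs -put -f "$1" "$2" || return 1
  # list the directory so the caller can see the result
  "$HADOOP" fs -ls "$2"
}

# Example (same paths as above):
# upload_file /opt/hopfile/c++.txt /hopdir
```

Keeping the command name in a variable costs nothing in normal use and makes it easy to review what a script would run before pointing it at a real cluster.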

9. Copy HDFS files to local

hadoop fs -copyToLocal /hopdir/myfile/java.txt /opt/hopfile/

10. Copy files in HDFS

hadoop fs -cp /hopdir/myfile/java.txt /hopdir

11. Move files within HDFS

hadoop fs -mv /hopdir/c++.txt /hopdir/myfile

12. Merge and download multiple files

getmerge concatenates the matched HDFS files into a single local file. For downloading individual files, get and copyToLocal have the same effect.

hadoop fs -getmerge /hopdir/myfile/* /opt/merge.txt

13. Delete file

hadoop fs -rm /hopdir/myfile/java.txt

14. View directory size (-s: total only, -h: human-readable units)

hadoop fs -du -s -h /hopdir/myfile

15. Delete a directory recursively

bin/hdfs dfs -rm -r /hopdir/file0703
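Running -rm on a path that does not exist reports an error, which can break scripts. A minimal sketch, assuming the hypothetical helper name remove_dir_if_exists and an added HADOOP variable for dry runs, first checks the path with -test -d, which exits with status 0 only when the path is an existing directory.

```shell
#!/bin/sh
# Command used to invoke Hadoop; set HADOOP=echo (or a test function) for a dry run.
HADOOP=${HADOOP:-hadoop}

# remove_dir_if_exists HDFS_DIR (hypothetical helper)
remove_dir_if_exists() {
  # -test -d exits with status 0 only if the path exists and is a directory
  if "$HADOOP" fs -test -d "$1"; then
    "$HADOOP" fs -rm -r "$1"
  else
    # Nothing to delete: report it and treat the call as successful
    echo "skip: $1 is not an existing HDFS directory" >&2
  fi
}

# Example (same path as above):
# remove_dir_if_exists /hopdir/file0703
```

If the HDFS trash is enabled (fs.trash.interval greater than 0), -rm -r moves the directory into the user's trash instead of deleting it immediately; adding -skipTrash deletes it at once. When the extra directory check is not needed, -rm -r -f is a simpler way to ignore a missing path.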

3、 Source code address

GitHub · address
https://github.com/cicadasmile/big-data-parent
Gitee · address
https://gitee.com/cicadasmile/big-data-parent
