Hadoop framework: HDFS introduction and shell management command

Time: 2020-11-16


1、 Overview of HDFS

1. HDFS description

HDFS (Hadoop Distributed File System) is the core storage technology of the big data ecosystem, offering high fault tolerance, stability, and reliability. It is a distributed file system that stores files and locates them through a directory tree. It was designed to manage hundreds of servers and disks so that applications can store large-scale file data as if on an ordinary file system. HDFS is suited to write-once, read-many scenarios and does not support modifying files in place, which makes it a good fit for data analysis workloads.

2. Infrastructure


HDFS has a master/slave architecture with two core components: the NameNode and the DataNode.

NameNode

Manages the file system metadata, including file path names, block IDs, and storage locations. It also applies the replica placement policy and handles client read and write requests.

DataNode

Performs the actual storage and read/write operations on file data. Each DataNode stores a subset of the file blocks, and files are distributed across the entire HDFS server cluster.

Client

When uploading a file to HDFS, the client splits the file into blocks before uploading. The client obtains block location information from the NameNode, then reads or writes data by communicating directly with DataNodes. The client can also access and manage HDFS through shell commands.

Secondary-NameNode

It is not a hot standby for the NameNode, but it shares part of the NameNode's workload: for example, it periodically merges the fsimage and edits files and pushes the result back to the NameNode. In an emergency it can help recover the NameNode's state.
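The roles above can be observed directly from the command line. The following is a minimal sketch, assuming a running HDFS cluster and the `hdfs` client on the PATH; the output contents depend entirely on your cluster:

```shell
# Print cluster status as reported by the NameNode:
# configured capacity, remaining space, and the list of live/dead DataNodes
hdfs dfsadmin -report

# Check whether the NameNode is currently in safe mode
# (a read-only state entered during startup or recovery)
hdfs dfsadmin -safemode get
```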

3. High fault tolerance


As an example of multi-replica block storage: for the file /users/sameerp/data/part-0, the replication factor is set to 2 and the stored block IDs are 1 and 3; for the file /users/sameerp/data/part-1, the replication factor is set to 3 and the stored block IDs are 2, 4, and 5. If any single server goes down, at least one replica of each block still exists, so file access is unaffected and overall fault tolerance is improved.
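The replication factor described above can be inspected and changed per file. A minimal sketch, assuming a running cluster; the path /hopdir/myfile/java.txt is the example file used later in this article:

```shell
# Show the replication factor of a file (%r is the replication field)
hadoop fs -stat "%r" /hopdir/myfile/java.txt

# Change the replication factor to 2; -w waits until replication completes
hadoop fs -setrep -w 2 /hopdir/myfile/java.txt

# List the blocks of the file and the DataNodes holding each replica
hdfs fsck /hopdir/myfile/java.txt -files -blocks -locations
```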

Files in HDFS are physically stored in blocks. The block size is controlled by the parameter dfs.blocksize (128 MB by default in Hadoop 2.x). If the block size is set too small, addressing time increases; if it is set too large, transferring a block from disk takes too long. The block size should therefore be chosen mainly according to the disk transfer rate.
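The block size can be checked and overridden per command. A minimal sketch, assuming a running cluster; the file /opt/hopfile/big.log is a hypothetical example:

```shell
# Print the effective value of dfs.blocksize for this client
hdfs getconf -confKey dfs.blocksize

# Upload one file with a non-default block size of 256 MB
# (256 * 1024 * 1024 = 268435456 bytes; must be a multiple of 512)
hadoop fs -D dfs.blocksize=268435456 -put /opt/hopfile/big.log /hopdir
```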

2、 Basic shell commands

1. Basic command

List the available Hadoop shell operation commands.

bin/hadoop fs
bin/hdfs dfs

DistributedFileSystem (DFS) is the implementation class of the generic FileSystem (FS) interface: hadoop fs works with any supported file system, while hdfs dfs targets HDFS specifically.

2. View command description

hadoop fs -help ls

3. Recursively create directories

hadoop fs -mkdir -p /hopdir/myfile

4. List directory contents

hadoop fs -ls /
hadoop fs -ls /hopdir

5. Move a local file to HDFS

hadoop fs -moveFromLocal /opt/hopfile/java.txt /hopdir/myfile
## View the moved file
hadoop fs -ls /hopdir/myfile

6. View file content

## View the whole file
hadoop fs -cat /hopdir/myfile/java.txt
## View the end of the file
hadoop fs -tail /hopdir/myfile/java.txt

7. Append content to a file

hadoop fs -appendToFile /opt/hopfile/c++.txt /hopdir/myfile/java.txt

8. Copy files

The -copyFromLocal command behaves the same as the -put command.

hadoop fs -copyFromLocal /opt/hopfile/c++.txt /hopdir

9. Copy HDFS file to local

hadoop fs -copyToLocal /hopdir/myfile/java.txt /opt/hopfile/

10. Copy files in HDFS

hadoop fs -cp /hopdir/myfile/java.txt /hopdir

11. Moving files within HDFS

hadoop fs -mv /hopdir/c++.txt /hopdir/myfile

12. Merge and download multiple files

The basic -get command has the same effect as the -copyToLocal command.

hadoop fs -getmerge /hopdir/myfile/* /opt/merge.txt

13. Delete file

hadoop fs -rm /hopdir/myfile/java.txt

14. View directory size information

hadoop fs -du -s -h /hopdir/myfile

15. Delete a directory

bin/hdfs dfs -rm -r /hopdir/file0703

3、 Source code address

GitHub · address
https://github.com/cicadasmile/big-data-parent
Gitee · address
https://gitee.com/cicadasmile/big-data-parent

