1、 Overview of HDFS
1. HDFS description
HDFS (Hadoop Distributed File System) is a distributed file system known for its fault tolerance, stability, and reliability. It stores files and locates them through a directory tree. It was designed to manage hundreds of servers and disks so that applications can store large-scale file data as they would on an ordinary file system. HDFS suits write-once, read-many workloads such as data analysis: existing file content cannot be modified in place, although data can be appended to the end of a file.
2. Infrastructure
HDFS uses a master/slave architecture with two core components: NameNode and DataNode.
NameNode
Manages the file system metadata, including file path names, data block IDs, and block storage locations. It also applies the replica policy and handles client read and write requests.
DataNode
Stores the actual file data and performs the read and write operations on it. Each DataNode stores a portion of the file data blocks, so a file is distributed across the whole HDFS cluster.
Client
When uploading a file to HDFS, the client splits it into blocks and uploads them. The client obtains block location information from the NameNode and then reads or writes the data by communicating directly with the DataNodes. The client also accesses and manages HDFS through shell commands.
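The splitting step can be sketched with plain shell arithmetic. This is a minimal illustration, not HDFS client code: block_layout is a hypothetical helper, and the 300 MB file and 128 MB block size are assumed values (sizes are given in MB to keep the numbers readable).

```shell
# block_layout FILE_SIZE BLOCK_SIZE
# Hypothetical helper: prints one line per block as "<index> <offset> <length>".
block_layout() {
  local file_size="$1" block_size="$2"
  local offset=0 index=0 length
  while [ "$offset" -lt "$file_size" ]; do
    # A block holds at most block_size; the last one holds the remainder.
    length=$(( file_size - offset ))
    if [ "$length" -gt "$block_size" ]; then
      length="$block_size"
    fi
    echo "$index $offset $length"
    offset=$(( offset + length ))
    index=$(( index + 1 ))
  done
}

# Assumed example: a 300 MB file stored with 128 MB blocks.
block_layout 300 128
```

For these assumed sizes the file needs three blocks, and the last block holds only the remaining 44 MB. A block that is not full occupies only as much disk space as the data it holds.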
Secondary NameNode
Not a hot standby for the NameNode, but it takes over part of the NameNode's workload: it periodically merges fsimage and edits and pushes the merged result back to the NameNode. In an emergency, this merged copy can help recover the NameNode.
3. High fault tolerance
The schematic diagram of multi-replica block storage shows two files. For /users/sameerp/data/part-0 the replication factor is 2 and the stored block IDs are 1 and 3; for /users/sameerp/data/part-1 the replication factor is 3 and the stored block IDs are 2, 4, and 5. If any single server goes down, at least one replica of every data block is still available, so file access is not affected and the overall fault tolerance improves.
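The claim that any single server can fail without losing a block can be checked with a small shell sketch. The placement of replicas on DataNodes dn1 to dn4 below is hypothetical (the description above gives only block IDs and replica counts), and lost_blocks is an illustrative helper, not an HDFS command.

```shell
# lost_blocks FAILED_NODE
# Hypothetical helper: reads "block_id node node ..." lines on stdin and
# prints the IDs of blocks that have no replica outside FAILED_NODE.
lost_blocks() {
  local failed="$1" block nodes node alive
  while read -r block nodes; do
    alive=0
    for node in $nodes; do
      if [ "$node" != "$failed" ]; then
        alive=$(( alive + 1 ))
      fi
    done
    if [ "$alive" -eq 0 ]; then
      echo "$block"
    fi
  done
  return 0
}

# Assumed placement: blocks 1 and 3 have 2 replicas, blocks 2, 4, 5 have 3.
placement='1 dn1 dn3
3 dn2 dn4
2 dn1 dn2 dn3
4 dn1 dn3 dn4
5 dn2 dn3 dn4'

# Take each DataNode down in turn and report which blocks become unavailable.
for node in dn1 dn2 dn3 dn4; do
  lost="$(printf '%s\n' "$placement" | lost_blocks "$node")"
  echo "$node down: lost blocks = [$lost]"
done
```

With this placement every line reports an empty list, because each block keeps at least one replica on a surviving node.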
Files in HDFS are physically stored as blocks. The block size is set by the dfs.blocksize parameter (128 MB by default in Hadoop 2.x). If the block size is too small, a file is split into many blocks and the time spent locating them (addressing time) increases; if it is too large, transferring a single block from disk takes far longer than locating it, and processing that block becomes slow. The appropriate block size therefore depends mainly on the disk transfer rate.
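One commonly cited rule of thumb is to choose a block size whose transfer time is about 100 times the seek time, so that seeking costs roughly 1% of reading a block. The sketch below applies that rule with assumed figures (10 ms seek time, 100 MB/s transfer rate) and rounds up to a power of two. suggest_block_mb is a hypothetical helper, and the rounding is an illustrative choice rather than anything HDFS enforces.

```shell
# suggest_block_mb SEEK_MS RATE_MB_PER_S
# Hypothetical helper: prints a block size in MB such that the transfer
# time is about 100x the seek time, rounded up to a power of two.
suggest_block_mb() {
  local seek_ms="$1" rate_mb_s="$2"
  # size = rate * (100 * seek time); seek time is converted from ms to s.
  local raw_mb=$(( rate_mb_s * seek_ms * 100 / 1000 ))
  local size_mb=1
  while [ "$size_mb" -lt "$raw_mb" ]; do
    size_mb=$(( size_mb * 2 ))
  done
  echo "$size_mb"
}

# Assumed disk: 10 ms seek time, 100 MB/s sustained transfer rate.
suggest_block_mb 10 100   # prints 128
```

With these assumed figures the raw value is 100 MB, which rounds up to 128 MB. A faster disk pushes the suggested size up, which is why the block size is tied to the transfer rate.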
2、 Basic shell commands
1. Basic command
List the Hadoop shell commands.
bin/hadoop fs
bin/hdfs dfs
Both forms accept the same subcommands here; dfs is the HDFS-specific implementation of fs.
2. View command description
hadoop fs -help ls
3. Recursively create directory
hadoop fs -mkdir -p /hopdir/myfile
4. List directory contents
hadoop fs -ls /
hadoop fs -ls /hopdir
5. Move (cut) a local file to HDFS
hadoop fs -moveFromLocal /opt/hopfile/java.txt /hopdir/myfile
## List the moved file
hadoop fs -ls /hopdir/myfile
6. View file content
## View the whole file
hadoop fs -cat /hopdir/myfile/java.txt
## View the end of the file (last kilobyte)
hadoop fs -tail /hopdir/myfile/java.txt
7. Append content to a file
hadoop fs -appendToFile /opt/hopfile/c++.txt /hopdir/myfile/java.txt
8. Copy a local file to HDFS
The copyFromLocal command works like put, except that the source must be a local file.
hadoop fs -copyFromLocal /opt/hopfile/c++.txt /hopdir
9. Copy HDFS file to local
hadoop fs -copyToLocal /hopdir/myfile/java.txt /opt/hopfile/
10. Copy files in HDFS
hadoop fs -cp /hopdir/myfile/java.txt /hopdir
11. Moving files within HDFS
hadoop fs -mv /hopdir/c++.txt /hopdir/myfile
12. Merge and download multiple files
The basic get command has the same effect as copyToLocal; getmerge additionally concatenates the matched HDFS files into a single local file.
hadoop fs -getmerge /hopdir/myfile/* /opt/merge.txt
13. Delete file
hadoop fs -rm /hopdir/myfile/java.txt
14. View directory size
hadoop fs -du -s -h /hopdir/myfile
15. Delete a directory
bin/hdfs dfs -rm -r /hopdir/file0703
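Because -rm -r deletes recursively, a script may want to confirm that the target is an existing directory first. The following is a minimal sketch, assuming hadoop is on the PATH and a cluster is reachable; remove_dir_if_exists is a hypothetical wrapper name, while -test -d and -rm -r are standard hadoop fs options.

```shell
# remove_dir_if_exists HDFS_DIR
# Hypothetical wrapper around two hadoop fs subcommands:
#   hadoop fs -test -d PATH   exits 0 when PATH is an existing directory
#   hadoop fs -rm -r PATH     deletes PATH and everything below it
remove_dir_if_exists() {
  local dir="$1"
  if hadoop fs -test -d "$dir"; then
    hadoop fs -rm -r "$dir"
  else
    echo "skip: $dir is not an existing HDFS directory" >&2
  fi
}

# Usage (needs a running cluster):
#   remove_dir_if_exists /hopdir/file0703
```

The check avoids the error that -rm -r reports for a missing path, which is convenient when the cleanup step runs repeatedly in a script.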
3、 Source code address
GitHub · address
https://github.com/cicadasmile/big-data-parent
Gitee · address
https://gitee.com/cicadasmile/big-data-parent