Building an Updated Hadoop Cluster with Docker


Abstract: kiwenlau/hadoop-cluster-docker was developed last year for the Docker fun competition, where it took second prize and won an Apple Watch. The project currently has 236 stars on GitHub and 2000+ downloads of its Docker Hub image, so it has been fairly popular. This post introduces the updated version of the project.

1、 Project introduction

By packaging Hadoop into a Docker image, a Hadoop cluster can be built quickly on a single machine, which is convenient for beginners who want to test and learn.

The Hadoop master and slaves run in separate Docker containers. NameNode and ResourceManager run in the hadoop-master container, while DataNode and NodeManager run in the hadoop-slave containers. NameNode and DataNode are components of HDFS, Hadoop's distributed file system, and are responsible for storing input and output data. ResourceManager and NodeManager are components of YARN and are responsible for scheduling CPU and memory resources.


Previous versions used serf/dnsmasq to provide DNS for the Hadoop cluster. Docker's networking features have since been updated, so this is no longer needed. In the updated version, a separate network is created for the Hadoop cluster with the following command:

sudo docker network create --driver=bridge hadoop

Then, when running the Hadoop containers, use the "--net=hadoop" option. All containers then run in the hadoop network and can reach each other by container name.
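As a rough illustration of the option described above, the sketch below only prints a docker run command for one node instead of executing it. The function name hadoop_run_cmd is hypothetical, and the flags other than --net=hadoop (-itd, --name, --hostname) are an assumption about how such a container might be started, not the project's actual script.

```shell
#!/bin/sh
# Print (not execute) a docker run command that attaches one node to the
# "hadoop" network. hadoop_run_cmd is a hypothetical helper; the image tag
# kiwenlau/hadoop:1.0 is the one pulled in this post, the other flags are
# an assumption.
hadoop_run_cmd() {
    name=$1
    printf 'sudo docker run -itd --net=hadoop --name %s --hostname %s kiwenlau/hadoop:1.0\n' \
        "$name" "$name"
}

# Show the commands for the master and one slave.
hadoop_run_cmd hadoop-master
hadoop_run_cmd hadoop-slave1
```

Because the container name and the hostname are the same, other containers on the same network can refer to the node by that single name.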

Key changes in this update:

  • Remove serf/dnsmasq

  • Merge the master and slave images

  • Install the Hadoop build compiled by the kiwenlau/compile-hadoop project

  • Optimize the Hadoop configuration

2、 Steps to build a 3-node Hadoop cluster

1. Download docker image

sudo docker pull kiwenlau/hadoop:1.0

2. Download GitHub repository

git clone

3. Create Hadoop network

sudo docker network create --driver=bridge hadoop
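Running the create command a second time fails because the network already exists. A minimal sketch of a guard against that is shown below. The function name ensure_network and the DOCKER override variable are hypothetical; set DOCKER="sudo docker" if the docker CLI needs sudo, as in this post.

```shell
#!/bin/sh
# Create the bridge network only when it is not already present.
# ensure_network and the DOCKER variable are hypothetical names used for
# illustration; DOCKER defaults to plain "docker".
ensure_network() {
    net=$1
    if ${DOCKER:-docker} network inspect "$net" >/dev/null 2>&1; then
        echo "network $net already exists"
    else
        ${DOCKER:-docker} network create --driver=bridge "$net"
    fi
}

# Usage: DOCKER="sudo docker" ensure_network hadoop
```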

4. Run docker container

cd hadoop-cluster-docker

Output:

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
root@hadoop-master:~#

  • Three containers are started: one master and two slaves

  • After startup, you are placed in the /root directory of the hadoop-master container
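To confirm from the host that all three containers are up, one option is to list the containers attached to the hadoop network. The sketch below assumes the container names shown in the output above. check_cluster and the DOCKER override are hypothetical names.

```shell
#!/bin/sh
# Report any expected container that is not running on the "hadoop" network.
# check_cluster and DOCKER are hypothetical names; DOCKER defaults to "docker".
check_cluster() {
    # Names of running containers attached to the hadoop network, one per line.
    running=$(${DOCKER:-docker} ps --filter network=hadoop --format '{{.Names}}')
    missing=0
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$running" | grep -qxF "$name"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}

# Usage: DOCKER="sudo docker" check_cluster hadoop-master hadoop-slave1 hadoop-slave2
```

The function returns a non-zero status when any container is missing, so it can gate later steps in a script.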

5. Start Hadoop


6. Run wordcount


Output:

input file1.txt:
Hello Hadoop
input file2.txt:
Hello Docker
wordcount output:
Docker    1
Hadoop    1
Hello    2
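To sanity-check the expected counts without a cluster, the same result can be reproduced locally with standard shell tools. This is only a local emulation of what wordcount computes, not the Hadoop job itself. The wordcount function name and the temporary directory are illustrative.

```shell
#!/bin/sh
# Local emulation of the wordcount result using standard tools.
# This does NOT run Hadoop; it only reproduces the expected counts.
wordcount() {
    # One word per line, sort so equal words are adjacent, count them,
    # then print "word<TAB>count".
    cat "$@" | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Recreate the two input files from this post in a scratch directory.
workdir=$(mktemp -d)
echo "Hello Hadoop" > "$workdir/file1.txt"
echo "Hello Docker" > "$workdir/file2.txt"

wordcount "$workdir/file1.txt" "$workdir/file2.txt"
rm -r "$workdir"
```

The printed lines are Docker 1, Hadoop 1 and Hello 2, matching the job output above.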

The Hadoop web management pages are served from the IP address of the host running the containers.

3、 Steps to build an N-node Hadoop cluster

1. Preparation

  • Follow steps 1-3 of part 2: download the Docker image, download the GitHub repository, and create the Hadoop network

2. Rebuild the docker image

./ 5
  • Any N greater than 1 can be specified
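One plausible reason the image has to be rebuilt is that the list of slave hostnames is part of the Hadoop configuration inside the image. That is an assumption here, not something the post states. Under that assumption, and extending the hadoop-slave1, hadoop-slave2 naming from the 3-node cluster, the hostnames for an N-node cluster could be generated as sketched below. slave_names is a hypothetical helper.

```shell
#!/bin/sh
# Print the slave hostnames for an N-node cluster: one master plus N-1 slaves.
# slave_names is a hypothetical helper; the naming scheme is assumed to extend
# the hadoop-slave1 / hadoop-slave2 pattern from the 3-node cluster.
slave_names() {
    n=$1
    i=1
    while [ "$i" -lt "$n" ]; do
        echo "hadoop-slave$i"
        i=$((i + 1))
    done
}

# For N = 5 this prints hadoop-slave1 through hadoop-slave4.
slave_names 5
```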

3. Start the docker container

./ 5
  • Use the same N as in step 2

4. Run Hadoop

  • Follow steps 5-6 of part 2: start Hadoop and run wordcount


Copyright notice
When reprinting, please credit the author, KiwenLau, and include the address of this article.
