Developing a distributed file system takes considerable effort, but it is invaluable if the problem is solved well. Ceph's goals are simply stated: to scale easily to petabytes of capacity, and to deliver high performance across varied workloads (both input/output operations per second [IOPS] and bandwidth).
Unfortunately, these goals compete with one another (for example, scalability can reduce or inhibit performance, or affect reliability). Ceph has developed some very interesting concepts to deal with this (for example, dynamic metadata partitioning, and data distribution and replication), which are only discussed briefly in this article. Ceph's design also includes fault tolerance to protect against single points of failure, on the assumption that at large scale (petabyte-level storage) failures are the norm rather than the exception. Its design does not assume a particular workload, but includes the ability to adapt to changing workloads and still deliver the best possible performance. It accomplishes all of this while remaining POSIX-compatible, allowing it to transparently support applications that rely on POSIX semantics (through Ceph-targeted enhancements). Finally, Ceph is open-source distributed storage and has been part of the mainline Linux kernel since 2.6.34.
Now let's look at Ceph's architecture and its core elements at a high level. Then I'll go down a level and explain some of the key aspects of Ceph in more detail.
The Ceph ecosystem can be roughly divided into four parts (see Figure 1): clients (the data users), metadata servers (which cache and synchronize the distributed metadata), an object storage cluster (which stores both data and metadata as objects and performs other key functions), and finally the cluster monitors (which perform monitoring functions).
As the figure shows, clients use the metadata servers to perform metadata operations (to determine where data is located). The metadata servers manage the location of data and where new data should be stored. Note that the metadata itself is stored in the storage cluster (labeled "metadata I/O"). Actual file I/O occurs between the client and the object storage cluster. In this way, higher-level POSIX functions (for example, open, close, rename) are managed by the metadata servers, while POSIX functions such as read and write are handled directly by the object storage cluster.
Figure 2 provides another architectural view. A set of servers accesses the Ceph ecosystem through a client interface, which makes clear the relationship between the metadata servers and the object-level storage. The distributed storage system can be viewed as a number of layers, including a storage-device format (EBOFS, the Extent and B-tree-based Object File System, or an alternative) and an overlying management layer that handles data replication, failure detection, recovery, and subsequent data migration, known as Reliable Autonomic Distributed Object Storage (RADOS). Finally, monitors are used to identify component failures and issue the corresponding notifications.
Sample system resources
**CEPH-STORAGE**
OS: CentOS Linux 7 (Core)
RAM: 1 GB
CPU: 1 CPU
DISK: 20 GB
IP: 45.79.136.163
FQDN: ceph-storage.linoxide.com
**CEPH-NODE**
OS: CentOS Linux 7 (Core)
RAM: 1 GB
CPU: 1 CPU
DISK: 20 GB
IP: 45.79.171.138
FQDN: ceph-node.linoxide.com
Configuration before installation
Before installing Ceph storage, we need to complete a few steps on each node. The first is to make sure each node's network is configured and that the nodes can reach each other.
Configure hosts
To configure the hosts entry on each node, open the default hosts configuration file as follows.
The code is as follows:
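The editor command itself is not shown; assuming the stock vi editor, it would be:
# vi /etc/hosts
Then add the entries below: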
45.79.136.163 ceph-storage ceph-storage.linoxide.com
45.79.171.138 ceph-node ceph-node.linoxide.com
Installing VMware Tools
If your working environment is a VMware virtual environment, it is recommended that you install the open VM tools package. You can install it with the following command.
The code is as follows:
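The command is missing here; a reasonable sketch, assuming the open-vm-tools package from the standard CentOS 7 repositories, would be:
# yum install -y open-vm-tools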
Configure firewall
If you are working in a restrictive environment with the firewall enabled, make sure the following ports are open on your Ceph storage/management node and on your client nodes.
You must open ports 80, 2003, and 4505-4506 on your Calamari admin node, and allow access to the Ceph or Calamari management node on port 80 so that clients in your network can reach the Calamari web user interface.
You can start and enable the firewall on CentOS 7 with the following commands.
The code is as follows:
# systemctl enable firewalld
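The matching start command is not shown above; it would be:
# systemctl start firewalld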
Run the following commands on the Calamari admin node to open the ports mentioned above.
The code is as follows:
# firewall-cmd --zone=public --add-port=2003/tcp --permanent
# firewall-cmd --zone=public --add-port=4505-4506/tcp --permanent
# firewall-cmd --reload
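Note that the rule for port 80 (needed for the Calamari web interface mentioned above) is not shown; it would follow the same pattern:
# firewall-cmd --zone=public --add-port=80/tcp --permanent
# firewall-cmd --reload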
On the Ceph monitor node, allow the following ports through the firewall.
The code is as follows:
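The commands are missing here; the Ceph monitor listens on TCP port 6789 by default, so a minimal sketch would be:
# firewall-cmd --zone=public --add-port=6789/tcp --permanent
# firewall-cmd --reload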
Then allow the following default port range so that the OSDs can interact with clients and monitor nodes and send data to other OSDs.
The code is as follows:
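The port list is missing here; Ceph OSD daemons bind to ports in the 6800-7300 range by default, so the rule would look like:
# firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
# firewall-cmd --reload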
If you are working in a non-production environment, you may simply disable the firewall and SELinux. In our test environment, we will disable both.
The code is as follows:
# systemctl disable firewalld
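The remaining commands are not shown; a sketch for stopping the firewall and disabling SELinux (setenforce only lasts until reboot, while the edit to /etc/selinux/config makes it permanent) would be:
# systemctl stop firewalld
# setenforce 0
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config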
System upgrade
Now upgrade your system and restart for the required changes to take effect.
The code is as follows:
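The upgrade command itself is omitted; on CentOS 7 it would presumably be yum, followed by the reboot shown below:
# yum update -y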
# shutdown -r 0
Setting up the Ceph user
Now we will create a separate sudo user on each node for installing the ceph-deploy tool, and allow that user password-less access to each node, because ceph-deploy needs to install software and configuration files on the Ceph nodes without prompting for passwords.
Run the following commands to create a new user with its own home directory on the Ceph storage host.
The code is as follows:
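The commands are missing here; a sketch using useradd and passwd, with the user named ceph (the name assumed by the sudoers entry later in this guide), would be:
# useradd -d /home/ceph -m ceph
# passwd ceph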
The new user on each node must have sudo privileges. You can grant them with the entry shown below.
The code is as follows:
ceph ALL = (root) NOPASSWD:ALL
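The command that writes this entry is not shown; assuming the file /etc/sudoers.d/ceph (which the chmod below operates on), it could be created like this:
# echo "ceph ALL = (root) NOPASSWD:ALL" | tee /etc/sudoers.d/ceph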
# sudo chmod 0440 /etc/sudoers.d/ceph
Set up SSH keys
Now we will generate an SSH key on the Ceph management node and copy it to each Ceph cluster node.
On the Ceph node, first generate the key, then copy it to the Ceph storage node.
The code is as follows:
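The command that produced the output below is not shown; it is presumably ssh-keygen run as root on the Ceph node:
# ssh-keygen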
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory ‘/root/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
5b:*:*:*:*:*:*:*:*:*:c9 root@ceph-node
The key’s randomart image is:
+--[ RSA 2048]----+
The code is as follows:
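The copy command is missing here; assuming the ceph user created earlier and the ceph-storage host name from /etc/hosts, ssh-copy-id would be used:
# ssh-copy-id ceph@ceph-storage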
Configure the PID count
To configure the maximum PID value, first check the kernel's default with the command below. By default it allows a maximum of 32768 threads.
Then raise the value to a larger number by editing the system configuration file, as shown below.
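Neither command appears in the original; a sketch for checking kernel.pid_max and raising it through /etc/sysctl.conf (4194303 is the usual 64-bit maximum and is an assumed value here) would be:
# cat /proc/sys/kernel/pid_max
# echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf
# sysctl -p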
Configure the management node
After configuring and validating the network on all nodes, we now install ceph-deploy as the ceph user. First check the hosts entries by opening the file.
The code is as follows:
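As before, the command to open the file is omitted; assuming vi:
# vi /etc/hosts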
45.79.136.163 ceph-storage
45.79.171.138 ceph-node
Run the following command to add the Ceph repository.
The code is as follows:
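The exact command is not shown; at the time this guide was written the usual approach was to install Ceph's release RPM, roughly as follows (the release name giant and the URL are assumptions and should be adjusted to your target release):
# rpm -Uhv http://ceph.com/rpm-giant/el7/noarch/ceph-release-1-0.el7.noarch.rpm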
Alternatively, create a new repository file and fill in the Ceph repository parameters. Don't forget to replace the placeholders with your Ceph release name and distribution.
The code is as follows:
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-{ceph-release}/{distro}/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
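As a concrete illustration (assuming the file is created as /etc/yum.repos.d/ceph.repo and targets the Giant release on EL7), the template above would read:
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc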
Then update your system and install the ceph-deploy package.
Install the ceph-deploy package
Run the following commands to update the system along with the newly added Ceph repository and to install ceph-deploy.
The code is as follows:
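The commands are missing here; a sketch, assuming yum and the repository configured above, would be:
# yum update -y
# yum install ceph-deploy -y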
Configure cluster
Use the following commands to create a new directory on the Ceph management node and change into it; this directory will collect all of the output files and logs.
The code is as follows:
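The directory-creation command appears to have been dropped; it would be:
# mkdir ~/ceph-cluster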
# cd ~/ceph-cluster
# ceph-deploy new storage
If you successfully execute the above command, you will see that it creates a new configuration file.
Now configure Ceph's default configuration file: open it in any editor and add the following two lines under the [global] section, adjusted for your public network.
The code is as follows:
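Assuming vi and the ceph.conf file that ceph-deploy generated in the cluster directory:
# vi ceph.conf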
osd pool default size = 1
public network = 45.79.0.0/16
Installing Ceph
Now we are ready to install Ceph on each node associated with the cluster. We use the following command to install Ceph on the ceph-storage and ceph-node hosts.
The code is as follows:
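The command is not shown; assuming the host names match the /etc/hosts entries above, it would presumably be:
# ceph-deploy install ceph-node ceph-storage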
It will take some time to process all the required repositories and install the required software packages.
When the Ceph installation on both nodes is complete, we will next create the monitor and collect the keys by running the following command on the same node.
The code is as follows:
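The command is omitted here; a sketch assuming ceph-deploy's create-initial subcommand, which creates the initial monitor and gathers the keys in one step, would be:
# ceph-deploy mon create-initial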
Setting up OSD and OSD Daemons
Now we’re going to set up disk storage. First run the following command to list all the available disks.
The code is as follows:
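The command is not shown; using ceph-deploy against the storage node (the host name storage matches the zap command below), it would be:
# ceph-deploy disk list storage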
The output will list the disks attached to your storage node, which you will use to create OSDs. Now run the following command, substituting your own disk name.
The code is as follows:
# ceph-deploy disk zap storage:sdb
To complete the OSD configuration, run the following commands to set up the journal disk and data disk.
The code is as follows:
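The prepare step appears to be missing before the activate command below; a sketch, assuming sdb as the data disk and /dev/sda1 as the journal partition (matching the activate arguments), would be:
# ceph-deploy osd prepare storage:sdb:/dev/sda1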
# ceph-deploy osd activate storage:/dev/sdb1:/dev/sda1
You need to run the same commands on all storage nodes, and note that they will erase everything on the disks. Afterwards, for the cluster to work, we need to copy the various keys and configuration files from the Ceph management node to all the relevant nodes with the following command.
The code is as follows:
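The command is not shown; ceph-deploy's admin subcommand pushes the configuration file and admin keyring, so it would presumably be (the host names are assumptions matching the earlier commands):
# ceph-deploy admin ceph-node storage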
Test Ceph
We are almost finished setting up the Ceph cluster. Let's run the following command on the Ceph management node to check the status of the running cluster.
The code is as follows:
# ceph health
HEALTH_OK
If you don't see any error messages in the Ceph status output, you have successfully installed the Ceph storage cluster on CentOS 7.