Linux Operations and Maintenance: A Summary of Knowledge Points from Beginner to Advanced


Operations and maintenance work is hard in the early stages. During this period you may spend your days repairing computers, crimping network cables, and moving machines, and it can feel as though you have no status at all. Your time is fragmented by all sorts of trivial tasks, it is hard to show your personal value, and gradually you become confused about the industry and feel that it has no future.

This dull, tedious work really can wear people down. Technically speaking, though, these are all basic skills that will quietly help your later operations work. I came up the same way, so I understand it well. During this period, keep a positive attitude and keep learning; I believe it will pay you back someday.

Now, to the point: based on my years of operations and maintenance experience, here is a learning route toward becoming a senior operations engineer.


Primary

1. Linux Basics

To begin with, you need to be familiar with installing Linux and Windows operating systems, the directory structure, the startup process, and so on.

2. System Management

This stage focuses on Linux. Work in production is done almost entirely at the command line, so you need to master dozens of commonly used basic management commands covering user management, disk partitioning, package management, file permissions, text processing, process management, performance analysis tools, and so on.

3. Network Foundation

You must be familiar with the OSI and TCP/IP models, and you should know the basic concepts and working principles of switches and routers.

4. Shell Scripting Basics

Master the basic syntax of the shell so that you can write simple scripts.


Intermediate

1. Network Services

You must be able to deploy the most commonly used network services, such as vsftpd, NFS, Samba, BIND, and DHCP.

A version control system is indispensable. Learn the mainstream ones, SVN and Git, well enough to deploy them and handle basic use.

Data often has to be transferred between servers, so you need to know rsync and scp.

Data synchronization: inotify/sersync.

Repetitive work can be written as scripts and run on a schedule, so you need to be able to configure crond, the scheduled task service on Linux.

2. Web Services

Almost every company has a website, and to keep it running you need to build a web service platform.

If the site is developed in PHP, the usual platform is LAMP or LNMP. Each name is an acronym for a stack of technologies (Linux, Apache or Nginx, MySQL, PHP), so you must be able to deploy Apache, Nginx, MySQL, and PHP individually.

If the site is developed in Java, Tomcat is usually used to run the project. To improve access speed, Nginx can act as a reverse proxy in front of Tomcat: Nginx serves static pages and Tomcat handles dynamic pages, which separates static from dynamic content.

Deployment alone is not enough: you also need to know how HTTP works and be able to do simple performance tuning.

3. Database

MySQL is one of the most widely used open source databases in the world, so learning it is a safe choice. You need to know basic SQL statements, user management, the commonly used storage engines, and database backup and recovery.

To go deeper, learn master-slave replication, performance optimization, and the mainstream cluster solutions such as MHA and MGR. NoSQL is popular as well, so learn Redis and MongoDB.

4. Security

Security is very important. Do not wait until the system has been intruded to set a security policy; by then it is too late. As soon as a server goes online, apply access control policies immediately, for example using iptables to allow access only from trusted source IPs and shutting down unused services and ports.
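To make the idea of "only trusted source IPs" concrete, here is a minimal Python sketch of the allowlist decision itself. It is not iptables and does not filter any traffic; it only shows the matching logic a firewall rule expresses. The network ranges and the name is_trusted are made up for illustration.

```python
import ipaddress

# Hypothetical trusted networks; replace with your own office or VPN ranges.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_trusted(source_ip: str) -> bool:
    """Return True if source_ip falls inside any trusted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


print(is_trusted("192.168.1.25"))  # True
print(is_trusted("203.0.113.7"))   # False
```

Anything that is not explicitly trusted is rejected, which is the same default-deny stance you want from the real firewall rules.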

You must also know the common types of attack; otherwise, how can you prescribe the right remedy? Examples include CC, DDoS, and ARP attacks.

5. Monitoring System

Monitoring is indispensable. It is the lifeline for detecting problems in time and tracing them back. You can learn Zabbix, a mainstream open source monitoring system with rich features that covers basic monitoring needs. Things to monitor include basic server resources, interface status, service performance, PV/UV, logs, and so on.
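As a rough illustration of the PV/UV item, the sketch below counts page views and unique visitors from access-log lines. It assumes that the first whitespace-separated field of each line is the client IP and that one IP equals one visitor; real systems often identify visitors by cookie or user ID instead. The function name and the sample lines are invented for the example.

```python
def count_pv_uv(log_lines):
    """Return (PV, UV) for access-log lines whose first field is the client IP."""
    page_views = 0
    visitors = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        page_views += 1
        visitors.add(fields[0])
    return page_views, len(visitors)


sample = [
    '198.51.100.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.2 - - [01/Jan/2024:10:00:01 +0000] "GET /about HTTP/1.1" 200 256',
    '198.51.100.1 - - [01/Jan/2024:10:00:02 +0000] "GET /news HTTP/1.1" 200 128',
]
print(count_pv_uv(sample))  # (3, 2)
```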

You can also set up a dashboard such as Grafana to display a few key metrics in real time, which looks great.

6. Advanced Shell Scripting

Shell scripts are a powerful tool for automating work on Linux. You must be fluent in writing them, so learn more about functions, arrays, signals, sending e-mail, and so on.

You have to be very good with the three text-processing tools (grep, sed, awk); text processing on Linux depends on them.

7. Python Development Basics

Shell scripts can only handle fairly basic tasks. For more complex tasks, such as calling APIs or multiprocessing, you need to learn a high-level language.

Python is very widely used in operations and maintenance and is easy to pick up, so it is the right one to learn. At this stage, master the basic syntax, file object operations, functions, iterables, exception handling, sending e-mail, database programming, and so on.
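A small script in that spirit might look like the sketch below, which combines a function, a file object, iteration, and exception handling. The file name app.log, the keyword, and the function name are placeholders chosen for the example, and the demo writes its own temporary file so that it can run anywhere.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count lines in a text file that contain keyword; return None if unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return sum(1 for line in handle if keyword in line)
    except OSError as error:
        print(f"cannot read {path}: {error}")
        return None


# Demo with a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "app.log")
    with open(log_path, "w", encoding="utf-8") as handle:
        handle.write("INFO started\nERROR disk full\nERROR timeout\n")
    print(count_matching_lines(log_path, "ERROR"))               # 2
    print(count_matching_lines(log_path + ".missing", "ERROR"))  # None
```

Catching OSError covers a missing file as well as permission problems, so the script reports the problem instead of crashing.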


Advanced

1. Web Static Cache

Users keep complaining that the website is slow, yet when you look, the server still has plenty of resources. Slow access is not necessarily caused by saturated server resources; many factors are involved, such as the network and the number of forwarding layers.

On the network side, there is an interconnection problem between networks in the north and the south, which slows access. A CDN solves this and also caches static pages, intercepting requests as close to the front as possible and reducing back-end requests and response time.

If you do not use a CDN, you can put a caching service such as Squid, Varnish, or Nginx at the traffic entrance to cache static pages.

2. Cluster

The resources of a single server are limited, so it cannot withstand high traffic. The key technique is to put a load balancer in front and scale out horizontally with multiple web servers that serve requests at the same time, multiplying capacity. The main open source load balancers are LVS, HAProxy, and Nginx. Be familiar with one or two of them.
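The simplest scheduling policy a load balancer can use is round robin, where requests go to each back-end server in turn. The sketch below shows only that policy; it is not how LVS, HAProxy, or Nginx are implemented, and the class name and back-end addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out back-end servers in turn, one per request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        return next(self._cycle)


# Hypothetical back-end addresses.
balancer = RoundRobinBalancer(["web1:8080", "web2:8080", "web3:8080"])
print([balancer.next_backend() for _ in range(4)])
# ['web1:8080', 'web2:8080', 'web3:8080', 'web1:8080']
```

Adding a fourth web server is then just one more entry in the list, which is what horizontal scaling means here.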

Once the web server bottleneck is solved, the database becomes the critical point, and the answer is again a cluster. Taking MySQL as an example, start with a master-slave architecture, then add read-write splitting: the master handles writes, the slaves handle reads, and the slaves can be scaled out horizontally. With a layer-4 load balancer in front, carrying tens of millions of PV is entirely feasible.
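The routing rule behind read-write splitting can be sketched in a few lines. The version below makes a deliberately cautious assumption: only statements that start with SELECT go to a slave, and everything else goes to the master. Real middleware also has to handle transactions and replication lag, which this sketch ignores. The class name and server names are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to the slaves in turn and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads if there are no slaves.
        self._read_cycle = itertools.cycle(list(slaves) or [master])

    def route(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._read_cycle)
        return self.master


router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
print(router.route("INSERT INTO users VALUES (1)"))  # db-master
print(router.route("SELECT * FROM users"))           # db-slave1
print(router.route("select count(*) from users"))    # db-slave2
```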

You also need to know high-availability software, the tool for avoiding single points of failure. The mainstream options are Keepalived, Heartbeat, and so on.

What if there are a lot of images and NFS shared storage is too slow? That is easy to handle: move to a distributed file system, which processes tasks in parallel and offers no single point of failure, high reliability, and high performance. The mainstream ones are FastDFS, MFS, HDFS, Ceph, GFS, and so on. To start with, I suggest learning FastDFS, which meets small and medium-sized needs.

3. Virtualization

Hardware servers often have very low resource utilization, which is wasteful. A lightly loaded server can be virtualized into many virtual machines, each running a complete operating system, which greatly improves utilization. I recommend learning the open source KVM + OpenStack cloud platform.

Virtual machines work well as a base platform, but they are too heavy for applications that need elasticity: startup takes several minutes, the images are large, and rapid scaling is laborious.

The answer is containers, whose main characteristics are rapid deployment and environment isolation. A service is packaged into an image, and hundreds of containers can be created from it in minutes.

Docker is the mainstream container technology.

Of course, a single Docker host cannot meet business needs in most production environments. You can deploy Kubernetes or Swarm to manage containers as a cluster, forming a large, centrally managed resource pool that provides strong support for the infrastructure.

4. Automation

Repetitive work neither improves efficiency nor shows your value.

Standardize all operations work: environment versions, directory structure, operating systems, and so on. Only on the basis of standardization can more things be automated. Completing a complex task with a mouse click or a few commands is very satisfying.

Therefore, automate every operation as far as possible to reduce human error and improve efficiency.

Mainstream centralized server management tools: Ansible, SaltStack

Either of the two will do.

Continuous Integration Tool: Jenkins

5. Advanced Python Development

Go further with Python development and master object-oriented programming.

It is best to also learn a web framework such as Django or Flask, mainly in order to develop an operations management system: build complex processes into the platform, then integrate the centralized management tools to create a management platform that belongs to the operations team itself.

6. Log Analysis System

Logs are also very important. Analyzing them regularly can reveal hidden risks and extract valuable information.

An open source log system: ELK

Learn to deploy and use it so that you can give developers a way to view logs.

7. Performance Optimization

Deployment alone is not enough. Performance optimization lets a service carry the maximum load.

This area is relatively hard, but it is also one of the keys to a high salary, so it is worth the effort to learn.

You can approach it from four dimensions: the hardware layer, the operating system layer, the software layer, and the architecture layer.


Advice

1. Persistence

Learning is a long process, something each of us has to keep up throughout our career.

Persistence is precious, persistence is hard, and persistence leads to success.

2. Goals

Without a goal it is not really work; without quantification it is not really a goal.

At each stage, set a goal.

For example: first set yourself a small, achievable goal, such as making 100 million!

3. Sharing

Learn to share. The value of technology lies in passing knowledge on effectively so that more people know it.

If everyone contributed something, imagine what it would be like.

If the direction is right, do not be afraid of the long road.

Ten Basic Linux Concepts

1. GNU and GPL

The GNU Project was publicly announced by Richard Stallman on September 27, 1983, as a collaborative free software project. Its goal is to create a completely free operating system. GNU is also described as a free software engineering project.

The GPL is the GNU General Public License, which embodies the concept of “copyleft” (the reverse of copyright). It is one of the GNU licenses. Its purpose is to protect the freedom to use, copy, study, modify, and distribute GNU software, and it requires that the software be released together with its source code.

Combining the GNU system with the Linux kernel yields a complete operating system: a Linux-based GNU system, usually called “GNU/Linux”, or Linux for short.

2. Linux Distribution

A typical Linux distribution includes the Linux kernel, some GNU libraries and tools, a command-line shell, the X Window System graphical interface with a desktop environment such as KDE or GNOME, and thousands of applications ranging from office suites, compilers, and text editors to scientific tools.

Mainstream distributions:

Red Hat Enterprise Linux, CentOS, SUSE, Ubuntu, Debian, Fedora, Gentoo

3. Unix and Linux

Linux is modeled on Unix and is a Unix-like system. Unix operating systems support multiple users, multitasking, multithreading, and multiple CPU architectures. Linux inherits the network-centric design of Unix and is a stable multi-user network operating system.

4. Swap Partition

The swap partition is the swap area that the system uses when physical memory is insufficient. When physical memory runs short, the system frees part of it by moving data out to a reserved area of the hard disk, making room for the programs currently running. When the programs whose data was moved out run again, the saved data is restored from the swap partition to memory. The programs whose memory is reclaimed this way are generally ones that have been idle for a long time.

As a rule of thumb, swap space should be at least the size of physical memory, with a minimum of 64 MB and a maximum of twice the physical memory.
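The rule of thumb above is simple arithmetic, and a tiny sketch makes the bounds explicit. The function name is made up, and the numbers only encode the rule as stated here; they are not a recommendation for any particular workload.

```python
def swap_size_range_mb(physical_memory_mb):
    """Return (minimum, maximum) swap size in MB under the rule of thumb above."""
    minimum = max(physical_memory_mb, 64)           # at least RAM size, never below 64 MB
    maximum = max(2 * physical_memory_mb, minimum)  # up to twice RAM
    return minimum, maximum


print(swap_size_range_mb(4096))  # (4096, 8192)
print(swap_size_range_mb(16))    # (64, 64)
```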

5. Concept of GRUB

GNU GRUB (GRand Unified Bootloader, “GRUB” for short) is a boot manager for multiple operating systems from the GNU Project.

On a computer with several operating systems installed, GRUB lets the user choose which one to run at startup. GRUB can also boot different kernels on a Linux system partition and can pass startup parameters to the kernel, for example to enter single-user mode.

6. Buffer and Cache

A cache is temporary storage that sits between the CPU and memory. Its capacity is much smaller than memory, but it is much faster. The cache resolves the mismatch between CPU speed and memory read/write speed by holding recently used data, which speeds up data exchange between the CPU and memory. In general, the larger the cache, the faster the CPU can work.

A buffer accelerates access to data on disk by holding disk (I/O device) data blocks in memory, which reduces I/O and improves the speed of data exchange between memory and the hard disk (or other I/O device). As a simplification: a buffer holds data that is about to be written to disk, while a cache holds data that has been read from disk.
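On a Linux machine, the memory used for buffers and for cached file data is reported in /proc/meminfo under the fields Buffers and Cached (note that Cached here means file data kept in memory, not the CPU cache). The sketch below parses text in that format. The sample values are invented, and parse_meminfo is a hypothetical helper, not a standard function.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


# Illustrative sample, not real measurements.
sample = """MemTotal:        8000000 kB
MemFree:         1200000 kB
Buffers:          150000 kB
Cached:          2500000 kB
"""
info = parse_meminfo(sample)
print(info["Buffers"], info["Cached"])  # 150000 2500000
```

On a real system you could pass in the contents of /proc/meminfo instead of the sample string.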

7. The TCP Three-Way Handshake

(1) The requester sends a SYN packet with sequence number A (seq = A) and waits for confirmation from the responder.

(2) The responder receives the SYN and returns a SYN+ACK packet to the requester: an acknowledgment of A + 1 (ack = A + 1) together with its own sequence number K (seq = K).

(3) The requester receives the SYN+ACK packet and sends an ACK packet with ack = K + 1 back to the responder.

The requester and the responder have now established a TCP connection; the three-way handshake is complete and data transmission can begin.
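The sequence and acknowledgment numbers in the three steps can be traced with a short simulation. This is only a bookkeeping sketch: no packets are sent, the initial sequence numbers 1000 and 5000 are arbitrary, and real TCP sequence numbers are 32-bit values that wrap around, which is ignored here.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN+ACK", server_isn, client_isn + 1)
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
# ('client', 'SYN', 1000, None)
# ('server', 'SYN+ACK', 5000, 1001)
# ('client', 'ACK', 1001, 5001)
```

Each side acknowledges the other side's initial sequence number plus one, which is how both ends confirm that they can send and receive.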

8. Linux System Directory Structure

The Linux file system uses a tree-shaped directory structure with links: there is a single root directory (written “/”) that contains subdirectories and files, and each subdirectory can in turn contain further subdirectories and files.

/: The root directory of the entire file system hierarchy. It is the entry point of the file system and the top-level directory.

/boot: Contains the files needed by the Linux kernel and the boot process, such as the kernel and initrd; the GRUB boot manager is also in this directory.

/bin: Commands required by the basic system, similar to “/usr/bin”. Files in this directory are executable and can be run by ordinary users as well.

/sbin: Basic system maintenance commands, generally used only by the superuser.

/etc: System configuration files.

/dev: Device files, such as terminals, disks, and CD-ROM drives.

/var: Frequently changing data, such as logs and mail.

/home: The default location of home directories for ordinary users.

/opt: Third-party software; user-defined packages and packages compiled from source are typically installed here.

/lib: Library files and kernel modules, including the shared libraries needed by system programs.

9. Hard Link and Soft Link

Hard link: A hard link is a link that shares the same index node (inode number), so multiple file names can point to the same inode. Hard links cannot point to directories and cannot cross partitions. Deleting one hard link does not affect the inode or the other hard links to it; the data remains until the last link is removed.

ln source new-link

Soft link (symbolic link): A symbolic link is a link created in the form of a path, similar to a shortcut in Windows. Multiple symbolic links can point to the same source file; if the source file is deleted, all symbolic links to it stop working. Symbolic links can point to directories and can cross partitions and file systems.

ln -s source new-link
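The difference between the two kinds of link is easy to observe. The sketch below uses the os.link and os.symlink functions from the Python standard library, which correspond to the two ln commands above, inside a temporary directory. It assumes a Linux or other Unix-like system; the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.txt")
    hard = os.path.join(workdir, "hard-link")
    soft = os.path.join(workdir, "soft-link")

    with open(source, "w") as handle:
        handle.write("hello\n")

    os.link(source, hard)     # same effect as: ln source hard-link
    os.symlink(source, soft)  # same effect as: ln -s source soft-link

    # A hard link shares the inode of the original name.
    same_inode = os.stat(source).st_ino == os.stat(hard).st_ino

    os.remove(source)         # delete the original name

    with open(hard) as handle:
        hard_content = handle.read()   # the data is still reachable
    soft_is_link = os.path.islink(soft)
    soft_works = os.path.exists(soft)  # False: the symlink now dangles

print(same_inode, hard_content.strip(), soft_is_link, soft_works)
# True hello True False
```

After the original name is removed, the hard link still reads the data, while the symbolic link still exists but points at a path that is gone.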

10. RAID Technology

RAID stands for Redundant Array of Independent Disks (originally Redundant Array of Inexpensive Disks).

RAID is a technology that combines multiple independent physical hard disks in different ways into a disk group (a logical hard disk) that offers higher storage performance and data redundancy than a single disk. RAID can combine multiple disks into one logical volume that spans them; it can split data into blocks and write or read them across several disks in parallel to speed up access; and it can provide fault tolerance through mirroring or parity. Which of these functions you get depends on the RAID level.

From the user’s point of view, a RAID disk group looks like a single hard disk that can be partitioned, formatted, and so on. RAID is much faster than a single hard disk and can provide automatic data redundancy and good fault tolerance.

RAID levels: the different combinations are classified into RAID levels:

RAID 0: Striping. All disks are read and written in parallel. It is the simplest form of disk array, needs only two or more hard disks, is low in cost, and delivers the combined performance and throughput of all the disks. However, RAID 0 provides no data redundancy or error recovery: the failure of a single disk can lose all the data. (RAID 0 only improves capacity and performance, offers no reliability guarantee, and suits environments with low data safety requirements.)
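Striping can be pictured as dealing blocks of data to the disks in turn. The sketch below does that with in-memory lists standing in for disks; the function name, block size, and sample data are chosen only for illustration and do not reflect how any RAID controller lays out data.

```python
def stripe(data, disk_count, block_size):
    """Split data into blocks and deal them round-robin across disk_count disks."""
    disks = [[] for _ in range(disk_count)]
    for index in range(0, len(data), block_size):
        block = data[index:index + block_size]
        disks[(index // block_size) % disk_count].append(block)
    return disks


print(stripe(b"ABCDEFGHIJ", disk_count=2, block_size=2))
# [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
```

Because each disk holds only part of every file, losing one disk leaves gaps that cannot be filled in, which is why RAID 0 has no fault tolerance.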

RAID 1: Mirroring. Redundancy is achieved by mirroring one disk onto another, so the same data exists on both disks and the usable capacity equals that of one disk. When data is written to one disk, a mirror copy is written to the other, maximizing the reliability and repairability of the system; when the original disk is busy, data can be read directly from the mirror copy (whichever of the two disks responds faster), which improves read performance. Writes, on the other hand, are slower. RAID 1 generally supports hot swapping: a disk in the array can be removed or replaced while the system is running, without shutting it down. RAID 1 has the highest cost per unit of storage among the RAID levels, but it provides high data safety, reliability, and availability. When a disk fails, the system automatically switches to the mirror disk for reads and writes, with no need to rebuild the failed data first.

RAID 0+1: Often mentioned together with RAID 10, it combines RAID 0 and RAID 1 (strictly speaking, RAID 0+1 mirrors two striped sets, while RAID 10 stripes across mirrored pairs). It splits data into consecutive stripes and reads/writes multiple disks in parallel, while mirroring each disk for redundancy. Data is distributed across several disks, each with a physical mirror, so one disk can fail without affecting data availability, and read/write speed is high. RAID 0+1 requires at least four hard disks. It provides both high data reliability and efficient reads and writes.

RAID 5: A storage solution that balances storage performance, data safety, and storage cost; it can be understood as a compromise between RAID 0 and RAID 1. RAID 5 requires at least three hard disks. It provides data protection, but to a lesser degree than mirroring, while its disk space utilization is higher than mirroring. Read speed is close to that of RAID 0, but each stripe carries an additional block of parity information, and writes are slightly slower than writing to a single disk. Because several data blocks share one parity block, RAID 5 has higher space utilization and lower storage cost than RAID 1, which is why it is widely used.
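The parity information mentioned for RAID 5 is typically the XOR of the data blocks in a stripe, which is what allows one lost block to be rebuilt from the rest. The sketch below demonstrates the idea on byte strings. It is a simplification: a real RAID 5 array also rotates the parity block across the disks, and the block contents and the helper name here are invented.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for position, byte in enumerate(block):
            result[position] ^= byte
    return bytes(result)


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe spread over three data disks
parity = xor_blocks(data_blocks)           # parity block stored on a fourth disk

# Simulate losing the second disk and rebuilding its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

One parity block protects the whole stripe, which is why RAID 5 uses disk space more efficiently than mirroring, and also why it can survive the loss of only one disk at a time.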