Practical notes: handling high CPU usage caused by a security incident

Time: 2021-7-27

This article is reposted from the @twt community; author: Boya.

Recently, an application test server showed high CPU usage and could not be logged in to normally. The details are as follows:

At 6:00 p.m. on February 28, a developer sent me a screenshot saying that he could not log in to server 187 and asked whether I had changed the password, as shown in Figure 1 below:

[Figure 1]

The only thing I had used server 187 for recently was installing the inspection and operations monitoring tool, and that was 10 days earlier. I had not changed the password, so out of curiosity I tried to log in and take a look. There really was a problem: the login failed and I could not re-enter the password, as shown in Figure 2 below:

[Figure 2]

Troubleshooting:

Server 187 could not be logged in to normally, but the prompt message indicated that the server was not down; rather, the SSH connection had been tampered with. My first thought was that an intruder had planted an executable SSH backdoor, with its components installed as services to give the malware persistence.

Out of curiosity, and to put the newly deployed monitoring tool to use, I logged in to the operations monitoring service and found that it was still collecting resource data such as CPU usage from server 187, as shown in Figure 3 below. CPU utilization was high, which suggested that some malware was consuming resources for its own purposes. It also showed that server 187 was still up; only new SSH connections could not be established.

[Figure 3]

Fortunately, I have a lazy habit: I had CRT open on another computer and rarely close it after use. A session to server 187 happened to still be open there, so I could get in directly. I found that the tsm service was driving CPU usage high and that cron was using a lot of memory. I asked the developer, who confirmed that the service was not in use, so the first step was to kill it.

So we killed the process directly and then changed the system login password, but the root cause still had to be found. After the process was killed, CPU usage dropped immediately, as shown in Figure 4 below:

[Figure 4]

After checking: tsm64 is a scanner that spreads cryptocurrency miners and backdoors through SSH brute-force attacks. It can send remote commands to download and execute malware.

Inspecting the process shows its installation path and configuration path:

root 31803 31798 84 07:44 ? 08:36:57 /tmp/.X19-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X19-unix/.rsync/c/lib/64/ /usr/sbin/httpd rsync/c/tsm64 -t 505 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
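The process line above has two features that tend to stand out: the executable lives under /tmp, and its path runs through hidden (dot-prefixed) directories. A minimal Python sketch of how such rows could be flagged in `ps -ef` output follows. The function names and the list of suspicious directories are assumptions made for illustration, not part of the original investigation, and the check is a heuristic that can produce false positives.

```python
from pathlib import PurePosixPath

# Assumed set of world-writable locations where legitimate daemons rarely live.
SUSPICIOUS_ROOTS = ("/tmp/", "/var/tmp/", "/dev/shm/")


def parse_ps_line(line):
    """Split one `ps -ef` row into (user, pid, command), or None if malformed."""
    # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
    fields = line.split(None, 7)
    if len(fields) < 8:
        return None
    return fields[0], int(fields[1]), fields[7]


def is_suspicious(command):
    """Heuristic: executable sits in a temp directory or under a hidden directory."""
    exe = command.split()[0]
    in_temp_dir = exe.startswith(SUSPICIOUS_ROOTS)
    # Look at the directory components only, not the file name itself.
    under_hidden_dir = any(
        part.startswith(".") for part in PurePosixPath(exe).parts[:-1]
    )
    return in_temp_dir or under_hidden_dir


if __name__ == "__main__":
    row = (
        "root 31803 31798 84 07:44 ? 08:36:57 "
        "/tmp/.X19-unix/.rsync/c/lib/64/tsm --library-path "
        "/tmp/.X19-unix/.rsync/c/lib/64/ /usr/sbin/httpd"
    )
    user, pid, command = parse_ps_line(row)
    if is_suspicious(command):
        print(f"review PID {pid} (user {user}): {command.split()[0]}")
```

The sketch only reports candidates for manual review; whether to kill a process is still a decision for whoever knows what should be running on the server.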

The service appears to be only a shell (wrapper) service. The records collected by remote monitoring show that the server was compromised and the malware was planted shortly after 4 a.m. on February 27, which caused the high CPU utilization and prevented CRT from logging in normally, as shown in Figure 5 and Figure 6 below:
[Figure 5]
[Figure 6]

Based on this analysis, something was starting the service process, which drove CPU and memory usage up, while the cron process was responsible for the high memory usage. Checking with crontab -e confirmed that scheduled jobs were indeed starting the process, as shown in Figure 7 below:
[Figure 7]
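The crontab check described above can also be done programmatically. The following is a minimal Python sketch, assuming the marker strings below are a reasonable sign of trouble; the function names, the markers, and the sample entries are all hypothetical and are not taken from the compromised server.

```python
import subprocess

# Assumed markers: temp directories and any hidden ("/.") path component.
SUSPICIOUS_MARKERS = ("/tmp/", "/var/tmp/", "/dev/shm/", "/.")


def read_current_crontab():
    """Return the current user's crontab text, or "" if there is none."""
    result = subprocess.run(
        ["crontab", "-l"], capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else ""


def suspicious_cron_entries(crontab_text, markers=SUSPICIOUS_MARKERS):
    """Return non-comment crontab lines that mention any suspicious marker."""
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if any(marker in line for marker in markers):
            hits.append(line)
    return hits


# Hypothetical crontab content, for illustration only.
SAMPLE_CRONTAB = """\
# nightly backup
0 2 * * * /usr/local/bin/backup.sh
*/5 * * * * /tmp/.X19-unix/.rsync/c/example-job >/dev/null 2>&1
@reboot /home/app/.cache/example-start
"""

if __name__ == "__main__":
    for entry in suspicious_cron_entries(SAMPLE_CRONTAB):
        print("review:", entry)
```

On a live system, `suspicious_cron_entries(read_current_crontab())` would run the same check against the real crontab of the current user. Other users' crontabs and the system-wide cron directories would need to be checked separately.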

The next steps were straightforward: stop the service, delete the files under the corresponding path along with the scheduled jobs, and keep observing for two days. As shown in Figure 8 below, the problem did not recur.

[Figure 8]

Summary:

The problem took only a little over ten minutes from discovery to resolution, but solving it that quickly was pure luck. It also shows that handling non-functional service faults is multidimensional and calls for divergent technical thinking and broad knowledge on the part of operations staff, which mostly comes from hands-on experience. The main causes of this problem were:

1. The main cause was that the server password was too simple, which gave the attacker an opening.

2. The server's security protection settings were incomplete.

3. The project team did not rigorously review uploaded files, so files containing viruses were uploaded.

4. Login permissions for the server's system users were not properly configured.

5. When you run into a problem, don't panic; stay calm.
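Causes 1 and 4 both concern how logins to the server are controlled. As one possible follow-up check, the Python sketch below reads sshd_config text and reports settings that leave password-based logins open. `PasswordAuthentication` and `PermitRootLogin` are real OpenSSH directives; the function names, the wording of the findings, and the sample configuration are assumptions for illustration. The sketch stops at the first `Match` block and does not follow `Include` files, so it is not a complete audit.

```python
def parse_sshd_config(text):
    """Collect global keyword/value pairs from sshd_config text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        # sshd uses the first value it obtains for each keyword.
        settings.setdefault(key, value)
    return settings


def weak_login_settings(settings):
    """Return a list of findings related to password-based logins."""
    findings = []
    # PasswordAuthentication defaults to "yes" when it is not set.
    if settings.get("passwordauthentication", "yes").lower() == "yes":
        findings.append("password logins are allowed")
    if settings.get("permitrootlogin", "").lower() == "yes":
        findings.append("root may log in with a password")
    return findings


# Hypothetical configuration, for illustration only.
SAMPLE_CONFIG = """\
Port 22
PermitRootLogin yes
#PasswordAuthentication no
PasswordAuthentication yes
"""

if __name__ == "__main__":
    for finding in weak_login_settings(parse_sshd_config(SAMPLE_CONFIG)):
        print("finding:", finding)
```

A finding here does not prove that this was how the intruder got in; it only points at settings worth reviewing alongside a stronger password policy.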
