Linux logs explained in detail, and how to use them to track down errors

Time:2021-6-12

Linux system logs
Linux automatically creates many valuable log files for you. You can find them in the /var/log directory. Here’s what this directory looks like in a typical Ubuntu system:


Some of the most important Linux system logs include:

/var/log/syslog or /var/log/messages stores all global system activity data, including boot messages. Debian-based systems such as Ubuntu store it in /var/log/syslog, while Red Hat-based systems such as RHEL or CentOS store it in /var/log/messages.
/var/log/auth.log or /var/log/secure stores logs from the Pluggable Authentication Modules (PAM) system, including successful logins, failed login attempts and authentication methods. Ubuntu and Debian store authentication information in /var/log/auth.log, while Red Hat and CentOS store it in /var/log/secure.
/var/log/kern.log stores kernel error and warning data, which is particularly useful for troubleshooting custom kernels.
/var/log/cron stores information about cron jobs. Use this data to verify that your cron jobs are running successfully.

DigitalOcean has a complete tutorial on how rsyslog creates these files in common distributions such as Red Hat and CentOS.

Applications also write log files to this directory. For example, common server programs such as Apache, Nginx and MySQL can write their log files here. Some of these files are created by the application itself, while others are created through syslog (see below).

What is syslog?
How are Linux system log files created? The answer is through the syslog daemon, which listens for log messages on the syslog socket /dev/log and writes them to the appropriate log file.

The word “syslog” is overloaded, and is often used as shorthand for one of the following:

Syslog daemon – a program used to receive, process and send syslog messages. It can forward syslog to a remote centralized server or write it to a local file. Common examples include rsyslogd and syslog-ng. In this sense, people often say “send to syslog”.
Syslog protocol – a transport protocol that specifies how logs are transmitted over the network, together with a data format definition for syslog messages (see below for details). It is formally defined in RFC 5424. The standard ports are 514 for plaintext logs and 6514 for encrypted logs. In this sense, people often say “send over syslog”.
Syslog message – a log message or event in syslog format, which includes a header with several standard fields. In this sense, people often say “send a syslog”.

Syslog messages or events include a header with several standard fields that make analysis and routing easier. They include the timestamp, the name of the application, the classification or location of the message source within the system, and the priority of the event.

The following log message includes the syslog header. It comes from the sshd daemon, which controls remote logins to the system, and describes a failed login attempt:

<34>1 2003-10-11T22:14:15.003Z server1.com sshd - - pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2

Syslog format and fields
Each syslog message contains a header with fields. These fields are structured data, which makes it easier to analyze and route events. Here is the format used to generate the syslog example above. You can match each value to a specific field name.

<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% %msg%\n
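
To make the field mapping concrete, here is a minimal Python sketch that splits the example message into the fields named in the format above. The regular expression and the function name parse_syslog_line are illustrative assumptions, not part of rsyslog, and the pattern only handles lines laid out exactly like this example.

```python
import re

# Pattern mirroring the format above:
# <pri>version timestamp hostname app-name procid msgid msg
SYSLOG_LINE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<msg>.*)"
)


def parse_syslog_line(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# The sshd example from above ("-" marks an empty procid and msgid).
example = (
    "<34>1 2003-10-11T22:14:15.003Z server1.com sshd - - "
    "pam_unix(sshd:auth): authentication failure; "
    "logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2"
)
fields = parse_syslog_line(example)
print(fields["pri"], fields["timestamp"], fields["hostname"], fields["app_name"])
```

All values come back as strings; the sections below look at what the individual fields mean.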

Next, you’ll see some of the syslog fields that are most commonly used when searching or troubleshooting:

Timestamp
The timestamp (2003-10-11T22:14:15.003Z in the example above) indicates the date and time at which the message was generated on the sending system. This time may differ from the time at which the message is received on another system. The timestamp in the example breaks down as follows:

2003-10-11 is the date: October 11, 2003.
T is a required element of the timestamp, which separates the date from the time.
22:14:15.003 is the time in 24-hour format, including the number of milliseconds (003) into the next second.
Z is an optional element indicating UTC. Instead of Z, the timestamp can carry an offset such as -08:00, which means the time is 8 hours behind UTC, that is, PST.
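
The breakdown above can be checked with Python’s standard datetime module. This is a small sketch; the helper name parse_rfc3339 is an assumption made for illustration, and the trailing Z is rewritten to +00:00 because older Python versions do not accept Z in fromisoformat.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(stamp):
    """Parse an RFC 3339 timestamp, treating a trailing Z as UTC (+00:00)."""
    if stamp.endswith(("Z", "z")):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


sent = parse_rfc3339("2003-10-11T22:14:15.003Z")
print(sent.date(), sent.time(), sent.utcoffset())

# The same instant written with a -08:00 offset (PST).
pst = timezone(timedelta(hours=-8))
print(sent.astimezone(pst).isoformat())
```

Converting every timestamp to one zone like this makes it easier to compare messages that were sent from systems in different time zones.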

Hostname
The hostname field (server1.com in the example above) is the name of the host or system that sent the message.

Application name
The application name field (sshd in the example above) is the name of the program that sent the message.

Priority
The priority field, or pri (<34> in the example above), tells us how urgent or critical the event is. It combines two numeric fields: the facility and the severity. The severity ranges from 7 for debug events down to 0 for emergencies. The facility describes which kind of process created the event. It ranges from 0 for kernel messages to 23 for local application use.

Pri can be output in two ways. The first is a single number, calculated by multiplying the facility value by 8 and then adding the severity value: (facility × 8) + severity. The second is pri-text, which is output as a string in the form “facility.severity”. The latter format is easier to read and search, but takes up more storage space.
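
As a worked example of this arithmetic, the sketch below decodes the value 34 from the sshd message and converts it to the text form. The function names are hypothetical, and the short keyword names are the conventional ones; names for facilities 12 to 15 in particular vary between implementations.

```python
# Severity 0 (emergency) to 7 (debug), as conventional short keywords.
SEVERITIES = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")

# Facility 0 (kernel) to 23 (local7). Names for 12-15 are an assumption here.
FACILITIES = (
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7",
)


def encode_pri(facility, severity):
    """Combine facility and severity into the single pri number."""
    return facility * 8 + severity


def decode_pri(pri):
    """Split a pri number back into (facility, severity)."""
    if not 0 <= pri <= 191:
        raise ValueError("pri must be between 0 and 191")
    return divmod(pri, 8)


def pri_text(pri):
    """Render pri in the 'facility.severity' text form."""
    facility, severity = decode_pri(pri)
    return FACILITIES[facility] + "." + SEVERITIES[severity]


print(decode_pri(34))  # facility 4, severity 2
print(pri_text(34))
```

So the failed login above was logged by the authentication facility (4) with critical severity (2), since 4 × 8 + 2 = 34.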

Debugging with logs in Linux
Causes of login failures
If you want to check whether your system is secure, you can look in the authentication log for failed logins and for successful but suspicious ones. Authentication failures occur when someone logs in with incorrect or invalid credentials, typically when using SSH for remote login or su to switch to another local user. These events are recorded by PAM. You will see strings such as “Failed password” and “user unknown” in your logs. Successful authentication records include strings such as “Accepted password” and “session opened”.

Examples of failure:

pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2
Failed password for invalid user hoover from 10.0.2.2 port 4791 ssh2
pam_unix(sshd:auth): check pass; user unknown
PAM service(sshd) ignoring max retries; 6 > 3

Successful examples:

Accepted password for hoover from 10.0.2.2 port 4792 ssh2
pam_unix(sshd:session): session opened for user hoover by (uid=0)
pam_unix(sshd:session): session closed for user hoover

You can use grep to find out which user names have the most failed logins. These are often names that potential attackers tried without success. Here is an example on an Ubuntu system.

$ grep "invalid user" /var/log/auth.log | cut -d ' ' -f 10 | sort | uniq -c | sort -nr
23 oracle
18 postgres
17 nagios
10 zabbix
6 test
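
The same count can be sketched in Python. Matching on the message text with a regular expression, rather than on a fixed field position as cut does, is a design choice that tolerates differences in the header. The function name and the sample lines are illustrative; the samples follow the format of the failure and success examples above and are not taken from a real log.

```python
import re
from collections import Counter

# Matches the "Failed password for invalid user <name> from <ip>" message.
INVALID_USER = re.compile(r"Failed password for invalid user (\S+) from (\S+)")


def count_invalid_users(lines):
    """Count failed logins per invalid user name."""
    counts = Counter()
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input. For a real log, pass an open file instead, for example:
#   with open("/var/log/auth.log", errors="replace") as fh:
#       counts = count_invalid_users(fh)
sample = [
    "Failed password for invalid user hoover from 10.0.2.2 port 4791 ssh2",
    "Failed password for invalid user oracle from 10.0.2.2 port 4793 ssh2",
    "Failed password for invalid user oracle from 10.0.2.2 port 4794 ssh2",
    "Accepted password for hoover from 10.0.2.2 port 4792 ssh2",
]
counts = count_invalid_users(sample)
for user, attempts in counts.most_common():
    print(attempts, user)
```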

Since there is no standard format, you need to use different commands for each application’s log. A log management system can parse logs automatically, classify them effectively, and help you extract key fields such as the user name.

A log management system can use automated parsing to extract user names from Linux logs. This lets you see information per user and filter on it with a click. In this example, the root user appears to have logged in as many as 2700 times, because the filtered view shows only login attempts by root.

A log management system also lets you view charts over time, which makes it easier to spot anomalies. If someone fails to log in once or twice within a few minutes, it may be a real user who forgot the password. However, if there are hundreds of failed logins with different user names, it is more likely to be an attempt to attack the system. Here, you can see that on March 12, someone tried to log in as nagios hundreds of times. This is obviously not a legitimate system user.

Causes of restarts
Sometimes, a server goes down due to a system crash or restart. How do you know when it happened and who did it?

Shutdown command
If someone ran the shutdown command manually, you can see it in the authentication log file. Here, you can see that someone logged in remotely as the user ubuntu from IP 50.0.134.125, and then shut down the system.

Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: Accepted publickey for ubuntu from 50.0.134.125 port 52538 ssh
Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Mar 19 18:37:09 ip-172-31-11-231 sudo: ubuntu : TTY=pts/1 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/sbin/shutdown -r now
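
A search for such entries can be sketched in Python as follows. The pattern and the function name find_shutdowns are assumptions for illustration; the pattern only recognizes shutdown or reboot commands run through sudo, in the line layout shown above.

```python
import re

# Matches sudo entries whose COMMAND runs shutdown or reboot.
SUDO_SHUTDOWN = re.compile(
    r"^(?P<when>\w{3}\s+\d+ [\d:]{8}) \S+ sudo:\s+(?P<user>\S+) : "
    r".*COMMAND=(?P<command>\S*(?:shutdown|reboot)\b.*)$"
)


def find_shutdowns(lines):
    """Return (timestamp, user, command) for each shutdown run via sudo."""
    found = []
    for line in lines:
        match = SUDO_SHUTDOWN.match(line.rstrip("\n"))
        if match:
            found.append((match["when"], match["user"], match["command"]))
    return found


# The authentication log entries from the example above.
sample = [
    "Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: Accepted publickey for ubuntu from 50.0.134.125 port 52538 ssh",
    "Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)",
    "Mar 19 18:37:09 ip-172-31-11-231 sudo: ubuntu : TTY=pts/1 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/sbin/shutdown -r now",
]
shutdowns = find_shutdowns(sample)
for when, user, command in shutdowns:
    print(when, user, command)
```

The preceding sshd lines with the same user name then tell you where that user logged in from.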

Kernel initialization
If you want to see every server restart regardless of the cause (including crashes), you can search the kernel initialization logs. Search for messages from the kernel facility that mention initialization, such as the “Initializing cgroup subsys cpu” line shown here.

Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Initializing cgroup subsys cpu
Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Linux version 3.8.0-44-generic ([email protected]) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 (Ubuntu 3.8.0-44.66~precise1-generic 3.8.13.25)
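
One way to list boots, sketched in Python, is to look for the kernel’s “Linux version” banner at uptime 0.000000. This is an assumption about what marks a boot in your logs, and the names used are hypothetical. The third sample line is a shortened copy of the banner line above.

```python
import re

# Kernel banner printed at the very start of a boot (uptime 0.000000).
BOOT_BANNER = re.compile(
    r"^(?P<when>\w{3}\s+\d+ [\d:]{8}) \S+ kernel: "
    r"\[\s*0\.000000\] Linux version (?P<version>\S+)"
)


def find_boots(lines):
    """Return (timestamp, kernel version) for each boot banner found."""
    boots = []
    for line in lines:
        match = BOOT_BANNER.match(line)
        if match:
            boots.append((match["when"], match["version"]))
    return boots


sample = [
    "Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Initializing cgroup subsys cpuset",
    "Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Initializing cgroup subsys cpu",
    "Mar 19 18:39:30 ip-172-31-11-231 kernel: [ 0.000000] Linux version 3.8.0-44-generic",
]
boots = find_boots(sample)
print(boots)
```

Comparing these boot times with the shutdown entries in the authentication log helps separate planned restarts from crashes.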

Detecting memory problems
A server can crash for many reasons, but a common one is running out of memory.

When your system runs out of memory, processes are killed, usually starting with the one using the most resources. The error occurs when the system has used all of its memory and a new or existing process tries to use more. Look for strings such as “Out of memory” or kernel warnings such as “to kill” in your log files. These messages indicate that the system killed the process or application intentionally, rather than the process crashing on its own.

For example:

[33238.178288] Out of memory: Kill process 6230 (firefox) score 53 or sacrifice child
[29923450.995084] select 5230 (docker), adj 0, size 708, to kill

You can use tools like grep to find these logs. This example is in Ubuntu:

$ grep "Out of memory" /var/log/syslog
[33238.178288] Out of memory: Kill process 6230 (firefox) score 53 or sacrifice child
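
If you also want the process ID and name of each victim, a regular expression can pull them out. The sketch below is an illustration with hypothetical names; it accepts both “Kill process” and “Killed process”, on the assumption that the wording differs between kernel versions, and it does not match the “select … to kill” format of the second example above.

```python
import re

# Matches "Out of memory: Kill process <pid> (<name>)" and the "Killed" variant.
OOM_KILL = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_kills(lines):
    """Return (pid, process name) for each out-of-memory kill found."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match["pid"]), match["name"]))
    return kills


# The two kernel messages from the example above.
sample = [
    "[33238.178288] Out of memory: Kill process 6230 (firefox) score 53 or sacrifice child",
    "[29923450.995084] select 5230 (docker), adj 0, size 708, to kill",
]
kills = find_oom_kills(sample)
print(kills)
```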

Keep in mind that grep itself uses memory, so running grep on a system that is already short of memory can trigger another out of memory error. This is one more reason to centralize your log storage!

Cron job error logs
The cron daemon is a scheduler that runs processes at specified dates and times. If a process fails to run or fails to complete, a cron error appears in your log files. Depending on your distribution, you can find these logs in /var/log/cron, /var/log/messages or /var/log/syslog. Cron jobs fail for many reasons. Usually the problem lies in the process being run, not in the cron daemon itself.

By default, cron jobs send their output by email, in this case through Postfix. The log below shows that the message was sent. Unfortunately, you can’t see the content of the email here.

Mar 13 16:35:01 PSQ110 postfix/pickup[15158]: C3EDC5800B4: uid=1001 from=<hoover>
Mar 13 16:35:01 PSQ110 postfix/cleanup[15727]: C3EDC5800B4: message-id=<[email protected]>
Mar 13 16:35:01 PSQ110 postfix/qmgr[15159]: C3EDC5800B4: from=<[email protected]>, size=607, nrcpt=1 (queue active)
Mar 13 16:35:05 PSQ110 postfix/smtp[15729]: C3EDC5800B4: to=<[email protected]>, relay=gmail-smtp-in.l.google.com[74.125.130.26]:25, delay=4.1, delays=0.26/0/2.2/1.7, dsn=2.0.0, status=sent (250 2.0.0 OK 1425985505 f16si501651pdj.5 - gsmtp)

Consider logging cron’s standard output to help you locate problems. The following example uses the logger command to redirect a cron job’s standard output to syslog. Replace the echo command with your own script; helloCron can be set to any application name you like.

*/5 * * * * echo 'Hello World!' 2>&1 | /usr/bin/logger -t helloCron

The log entries it creates:

Apr 28 22:20:01 ip-172-31-11-231 CRON[15296]: (ubuntu) CMD (echo 'Hello World!' 2>&1 | /usr/bin/logger -t helloCron)
Apr 28 22:20:01 ip-172-31-11-231 helloCron: Hello World!
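
If the cron job is itself a Python script, a similar effect can be sketched with the standard library instead of piping to logger. Everything named here (run_and_capture, syslog_logger, log_command) is a hypothetical helper, and the sketch assumes the syslog socket is available at /dev/log as described earlier. Prefixing each message with the tag is an approximation of what logger -t does.

```python
import logging
import logging.handlers
import subprocess
import sys


def run_and_capture(command):
    """Run a command; return (exit status, output lines) with stderr merged in, like 2>&1."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.returncode, result.stdout.splitlines()


def syslog_logger(tag, address="/dev/log"):
    """Build a logger that writes to the local syslog socket, prefixing the tag."""
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(logging.Formatter(tag + ": %(message)s"))
    logger = logging.getLogger(tag)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


def log_command(command, logger):
    """Run a command and send each line of its output to the given logger."""
    status, lines = run_and_capture(command)
    for line in lines:
        logger.info(line)
    if status != 0:
        logger.error("command exited with status %d", status)
    return status


# Stand-in for a real job: a child Python process that prints one line.
status, lines = run_and_capture([sys.executable, "-c", "print('Hello World!')"])
print(status, lines)
```

In a real job you would call log_command(["/path/to/your/script"], syslog_logger("helloCron")), where the script path is a placeholder for your own command.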

Each cron job logs differently, depending on the type of job and how it outputs its data.

If the logs don’t yet give you clues about the root cause of a problem, you can add more logging as needed.