A structured log viewing tool (I): why do you need structured logs

Time: 2021-07-29

A structured log, as the name suggests, is no longer free-form text but follows a fixed structure: each log line is a JSON object. The benefits are obvious: parsing becomes trivial, and the subsequent processing, analysis and querying of logs becomes convenient and efficient.
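For example, a single entry might look like the line below (an illustrative line, not taken from the original article; the exact field names depend on the logging library):

```json
{"@timestamp":"2021-07-29T10:15:30.123+08:00","level":"INFO","logger":"com.example.OrderService","thread":"http-nio-8080-exec-1","message":"order 10086 created","traceId":"5f2a9c","spanId":"8d41b7"}
```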

By contrast, a traditional log record is a string of one or more lines, with fields separated by delimiters such as spaces and an output format that is defined freely. For example, here are a few lines of logs from a Java Spring Boot application:

[Screenshot in the original post: a few lines of log output from a Java Spring Boot application]

The format being "freely defined" means that the output format of the logs above has to be configured somewhere, in this case with a pattern like the following:

%d{yyyy-MM-dd HH:mm:ss.SSS} %level ${application_name} trace=%X{X-B3-TraceId},span=%X{X-B3-SpanId} <%t> %c{40} – %msg%n
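Turning such free-form lines back into fields downstream is exactly the grok work discussed below. Purely as an illustration (this is not from the original article, and the captured field names are made up), a logstash filter for roughly this format might look like:

```
filter {
  grok {
    # one capture per field of the logback pattern above
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA:application} trace=%{DATA:trace_id},span=%{DATA:span_id} <%{DATA:thread}> %{DATA:logger} - %{GREEDYDATA:msg}"
    }
  }
  date {
    # use the parsed timestamp as the event time
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss.SSS"]
  }
}
```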

Logs like these can be called flat logs. The disadvantage is that they are inconvenient for ELK and similar log systems to process: a grok regular expression has to be configured in logstash to do the parsing, and what comes out of that parsing is, in effect, a structured log, which is then sent to elasticsearch and queried through kibana. (Since the log ends up structured in elasticsearch anyway, why not skip the parsing step?) Obviously, whether logstash parses correctly and reliably matters a great deal for all subsequent log analysis and search, and one of the keys is getting the grok regular expressions right. The problem is that regular expressions are not easy to write. There are countless log formats, and every field of every format needs a correct grok expression before it can be parsed - that alone sounds like a lot of work. Writing them correctly is hard enough; writing them efficiently is even harder. Maintaining grok regular expressions is therefore a tiring, annoying, error-prone "dirty job".

In practice, people reason like this: elasticsearch can do full-text search anyway, so even if the parsing is inaccurate or missing we can still get by; better to just throw the logs at elasticsearch without parsing. So in reality only a few key fields may get parsed, and it is not uncommon for nothing to be parsed at all - the whole log line is dumped into ELK as one lump of text, and the log you find in kibana ends up looking like this:

[Screenshot in the original post: a kibana view where the entire log line sits unparsed inside the message field]

In the log above, even basic fields such as the timestamp and level are left unparsed, buried inside the message field. If I want to find error-level entries within a given time range, full-text search is my only option, and with log volumes now measured in TB or PB, how fast can full-text search possibly be? Moreover, microservices and distributed systems are now the mainstream: a single front-end request spans many services, so we often need to extract the distributed-tracing fields from logs coming from many different sources (in the earlier example, X-B3-TraceId and X-B3-SpanId are fields that Spring Cloud Sleuth adds to the log through MDC), join on them to display a complete call chain, and then combine that with time and other business fields in the query. At that point full-text search can meet neither the speed nor the accuracy requirement: an imprecise query easily drags in a pile of irrelevant logs, and the query itself frequently times out.

Yet if you think about it, most query conditions are just ANDs and ORs of 'field = value'. Plain equality is entirely sufficient, so why does something that should be simple turn into full-text search everywhere? Doing less full-text search would improve efficiency, but as explained earlier, the precondition is that the fields in the original log are parsed accurately. So log parsing is one of the main reasons queries are slow. Wouldn't letting applications output structured logs directly solve the problem? It would also spare us the unreliable per-line parsing computation, which is another way to cut the resources spent on every single log line.

This is the main reason structured logging is becoming more and more popular. Mainstream logging libraries - log4j, log4net and the like - now basically all support structured JSON output. Kubernetes and Docker appear to print traditional unstructured string logs by default, but under the hood the logs are actually stored in a structured form. MongoDB 4 has likewise switched to structured logs.

However, there are two sides to everything. Producing good structured logs is not just a matter of setting the log library's output format; programmers' habits when writing log statements also need to change, which I will cover in a later article. Here I mainly want to point out that structured logs have disadvantages too, though they are basically acceptable or solvable. The first that comes to mind is that in JSON format the endlessly repeated field names make log files larger, taking up more disk space and bandwidth. Compression takes care of that: compressed, the size is almost the same as a traditional text log. The second disadvantage is one programmers may not notice until they actually start living with structured logs day to day and discover what their tail -f output has turned into:

[Screenshot in the original post: single-line JSON log entries as seen with tail -f]

If the log contains a stack trace, it gets even worse:

[Screenshot in the original post: a JSON log entry with an escaped stack trace crammed into one line]

This is easy for machines to parse - convenient for operations and for grok - but our eyes are ruined.

Then why not pretty-print the JSON log: align, indent, wrap, even colorize it? We usually don't, because logs need to stay compact. Formatting means extra whitespace separators and font-color escape codes, a single formatted entry may span many lines, and far fewer entries fit on one screen. Spanning multiple lines is also, I suspect, unfriendly to line-oriented parsing. Strictly speaking, structured logs do not even comply with the JSON specification: each entry is a well-formed JSON object, but consecutive entries are not separated by commas, and there is no left square bracket at the beginning or right square bracket at the end - in other words, the file is not a JSON array. Think about it: logs are streaming data, so writing an opening bracket at the start would be easy enough, but when would you ever write the closing bracket that marks the end? Besides, commas already appear in large numbers inside each JSON object, so they are less efficient to split on than a newline treated as a dedicated separator. And if the newline marks the end of a single JSON entry, newlines must not appear inside that entry, so the fields have to be crowded onto one line.

This JSON-like-but-not-JSON format is also commonly used in big-data processing and is known as JSON Lines (http://jsonlines.org). Looking back, then, the traditional unstructured string log is not useless after all: at least it is easier to read than a structured one.
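Being one entry per line is precisely what makes JSON Lines cheap to consume. A minimal Go sketch of such a reader (an illustration under assumptions, not jog's actual code; the @timestamp/level/message field names are only examples):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var entry map[string]interface{}
		// each line is expected to be one complete JSON object;
		// anything else (a prefix, a banner line) is passed through untouched
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			fmt.Println(scanner.Text())
			continue
		}
		fmt.Printf("%v %v %v\n", entry["@timestamp"], entry["level"], entry["message"])
	}
}
```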

There are generally two solutions to this second problem. The first is to output different formats in different environments: the traditional format in the development environment, for reading by eye, and the structured format everywhere else. For example, this can be set up in logback.xml:

[Screenshot in the original post: a logback.xml configuration that switches the output format by environment]
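The screenshot is not reproduced here; as a rough sketch of the idea (assuming Spring Boot's logback-spring.xml profile support and the logstash-logback-encoder library, which the original configuration may or may not have used):

```xml
<configuration>
  <!-- human-readable pattern while developing -->
  <springProfile name="dev">
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
      <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %level %logger{40} - %msg%n</pattern>
      </encoder>
    </appender>
    <root level="INFO"><appender-ref ref="CONSOLE"/></root>
  </springProfile>
  <!-- one JSON object per line everywhere else -->
  <springProfile name="!dev">
    <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
      <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>
    <root level="INFO"><appender-ref ref="JSON"/></root>
  </springProfile>
</configuration>
```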

Alternatively, you can vary the format by output target: the traditional format for console output, the structured format for the log file. The drawback is that the problem remains unsolved whenever you need to read the log file to look at history; moreover, given that one of the mainstream log collection schemes in today's Kubernetes environments is to collect the console output directly as the log, the scheme gets bypassed again.

Then there is the second solution, which is the dedicated tool I want to promote in this series of articles: convert while viewing. You can open a log file directly, tail or tail -f style, or redirect a program's console output into the tool, and the tool takes care of converting it to the traditional format. That spares us the if/else in the log library's configuration file: the library simply outputs structured logs, plain and direct, and the complex job is done by composing simple tools, in line with the single-responsibility design principle. Conversion tools of this kind have existed from the beginning - Bunyan and zap, for example, come with their own conversion/viewing tools - but my choice was to write a new one myself, and promoting it is the motivation for writing this series. Let me first list the main shortcomings of the existing tools:

  1. None of them is universal. Because there is no unified standard for structured logs, every log library and tool only "sweeps the snow in front of its own door" and considers nothing but its own format. Bunyan, for example, uses numbers for the log level values, while logback puts an @ in front of the time field name to make it @timestamp; they do not recognize each other. Any moderately large system has a variety of log formats from different sources, which calls for a common, simple, direct tool.
  2. They cannot handle prefixes added by other tools. For example, in order to combine the logs of multiple services, docker-compose prefixes every log line with the service name, so the output looks like this:

[Screenshot in the original post: docker-compose output with a service-name prefix in front of every log line]
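Roughly, each line comes out like this (an invented illustration, not the original screenshot):

```
web_1  | {"@timestamp":"2021-07-29T10:15:30.123Z","level":"INFO","message":"request handled"}
db_1   | {"@timestamp":"2021-07-29T10:15:30.456Z","level":"WARN","message":"slow query"}
```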

Moreover, even without docker-compose's prefix, what if the service is something like MySQL, which does not support structured log output at all? Predictably, tools that only care about their own proprietary format cannot handle it. Worse, if a service is started under a process manager such as PM2, yet another layer is added and the prefix becomes two levels of nesting - then what? Spring Boot is another example: even if we configure it to always output logs in JSON, it still prints a banner for us at startup, as shown in the figure below:

[Screenshot in the original post: the Spring Boot startup banner mixed into the JSON log output]

Amusing and good-looking as that banner is, it is a tragedy for conversion tools like Bunyan's.

  3. ELK is not always available, nor does it always need to be; in those cases we generally grep or awk through the log files. grep's main weaknesses are that it truncates entries printed across multiple lines and that searching by a time range is hard. With traditional logs we never bothered: writing the regular expressions to extract and convert the timestamps was too much trouble, so we muddled along, and when a time-range search was really needed we paged through with our eyes. But now that the log is structured, parsing a time value out of JSON is something any programmer can do, so extracting it and searching by time range should be easy. Bunyan's tool lacks even this simple feature (not its focus, I guess), yet it is not hard to add and it is very useful.

jq (https://stedolan.github.io/jq/), however, is worth recommending. It is very powerful: it recognizes even this not-quite-standard JSON log data and can do very flexible filtering, transformation and formatted output. But jq is a general-purpose JSON processing tool and not as convenient as a dedicated log tool, and it cannot handle prefixed or otherwise mixed input - you must filter that out with other tools first.
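For instance, filtering error entries within a time window could look like this (a hypothetical invocation; app.log and the @timestamp/level field names are assumptions):

```sh
# -R reads raw lines; fromjson? silently skips lines that are not valid JSON
jq -R 'fromjson? | select(.level == "ERROR" and .["@timestamp"] >= "2021-07-29T00:00:00")' app.log
```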

In short, for the reasons above I wrote one myself and named it jog, meaning JSON log. It started as a few dozen lines of code sitting in a directory of a company git project; later I scrapped that, rewrote it in my spare time and published it to GitHub (https://github.com/qiangyt/jog). For easy cross-platform use, jog is written in Go, so it is a single executable - cross-platform, no installation needed. This article is already quite long, so I am not going to cover usage here; the basics are as simple as 'jog <file name>' or 'docker-compose logs | jog', and the other features will be described in a separate article. Below are a screenshot of the converted output and the feature description printed by 'jog -h'. Over.

[Screenshot in the original post: log output after conversion by jog]

[Screenshot in the original post: the help text printed by 'jog -h']