Starting with a glibc bug spanning 20 years

Time:2021-10-20

one   Origin

Recently I was bringing up a cross-compilation toolchain based on GCC 7.5.0 + glibc 2.23. Because -Werror is enabled by default in this build with GCC 7.5.0, I stumbled on a long-lived bug in glibc that had stayed hidden for 20 years.

The bug was introduced at the very start of the glibc 2.0 series and was not fixed until version 2.25. Even counting only from the release of glibc-2.0.1.bin.alpha-linux.tar.gz (04-Feb-1997) to the release of glibc-2.25.tar.bz2 (05-Feb-2017), it lasted 20 years and one day.

When compiling with GCC 7.5 with both -Wall and -Werror enabled (-Wall is documented as "enable most warning messages"; -Werror turns every warning into an error that cannot be ignored), the following errors are reported:

nss_nisplus/nisplus-alias.c: In function '_nss_nisplus_getaliasbyname_r':
nss_nisplus/nisplus-alias.c:300:12: error: argument 1 null where non-null expected [-Werror=nonnull]
   char buf[strlen (name) + 9 + tablename_len];
            ^~~~~~~~~~~~~
In file included from ../include/string.h:54:0,
                 from ../sysdeps/generic/hp-timing-common.h:40,
                 from ../sysdeps/x86_64/hp-timing.h:38,
                 from ../include/libc-internal.h:7,
                 from ../sysdeps/x86_64/nptl/tls.h:29,
                 from ../sysdeps/x86_64/atomic-machine.h:20,
                 from ../include/atomic.h:50,
                 from nss_nisplus/nisplus-alias.c:19:
../string/string.h:394:15: note: in a call to function 'strlen' declared here
 extern size_t strlen (const char *__s)
               ^~~~~~
nss_nisplus/nisplus-alias.c:303:39: error: '%s' directive argument is null [-Werror=format-truncation=]
   snprintf (buf, sizeof (buf), "[name=%s],%s", name, tablename_val);
                                       ^~
cc1: all warnings being treated as errors

 

Without -Werror, the compiler only reports warnings and the build completes normally. The two diagnostics above check that the arguments passed to strlen and snprintf are non-null. According to the code logic, whenever these two statements execute, the name argument is necessarily a null pointer.

The source code is as follows:

276 enum nss_status
277 _nss_nisplus_getaliasbyname_r (const char *name, struct aliasent *alias,
278                 char *buffer, size_t buflen, int *errnop)
279 {
280   int parse_res;
281  
282   if (tablename_val == NULL)
283     {
284       __libc_lock_lock (lock);
285  
286       enum nss_status status = _nss_create_tablename (errnop);
287  
288       __libc_lock_unlock (lock);
289  
290       if (status != NSS_STATUS_SUCCESS)
291         return status;
292     }
293  
294   if (name != NULL)
295     {
296       *errnop = EINVAL;
297       return NSS_STATUS_UNAVAIL;
298     }
299  
300   char buf[strlen (name) + 9 + tablename_len];
301   int olderr = errno;
302  
303   snprintf (buf, sizeof (buf), "[name=%s],%s", name, tablename_val);
304  
305   nis_result *result = nis_list (buf, FOLLOW_PATH | FOLLOW_LINKS, NULL, NULL);

  

The strlen call on line 300 requires a non-null argument. But line 294 returns whenever name is non-null, so execution can only get past that check and reach line 300 when the name pointer is null. At that point strlen is asked for the length of a null pointer, which fails.

How does it fail? A simple example shows it:

#include <stdio.h>
#include <string.h>
int main()
{
    printf("%d", strlen(NULL));
    return 0;
}

Compiled with no extra options, GCC reports warnings but still builds the program. Running it ends in a segmentation fault:

gcc test1.c
test1.c: In function 'main':
test1.c:5:5: warning: null argument where non-null required (argument 1) [-Wnonnull]
     printf("%d", strlen(NULL));
     ^
test1.c:5:12: warning: format '%d' expects argument of type 'int', but argument 2 has type 'size_t {aka long unsigned int}' [-Wformat=]
     printf("%d", strlen(NULL));
            ^

./a.out
Segmentation fault

With -Wall -Werror added, the same diagnostics become errors and the compilation fails:

gcc -Wall -Werror test1.c
test1.c: In function 'main':
test1.c:5:5: error: null argument where non-null required (argument 1) [-Werror=nonnull]
     printf("%d", strlen(NULL));
     ^
test1.c:5:12: error: format '%d' expects argument of type 'int', but argument 2 has type 'size_t {aka long unsigned int}' [-Werror=format=]
     printf("%d", strlen(NULL));
            ^
cc1: all warnings being treated as errors

The direct cause is that strlen in libc has no null pointer protection: it reads the memory its argument points to directly, so a null argument results in a null pointer access and the program exits abnormally.

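Similarly, snprintf on line 303 requires that the argument matching %s is not a null pointer. Passing one is undefined behavior: glibc's printf family usually prints "(null)" in that case, but the C standard does not guarantee it, and a segmentation fault is also possible.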

 

The analysis above shows that some warnings are in fact errors and should be treated as such. Over glibc's long evolution, many execution paths may never have been exercised (without unit tests at 100% coverage and a thorough code review process, perhaps nobody ever would find them), or they simply did not get in the way of a release. But the code these warnings point at causes a fatal error as soon as it is reached.

The eventual glibc fix is very simple: "if (name != NULL)" on line 294 is changed to "if (name == NULL)", a single inverted operator.

For many high-impact bugs, the actual change once the bug is located is only one or two lines of code. The hard part is finding and locating the bug, and then testing everything the fix affects.

That this bug survived for 20 years without anyone noticing suggests that glibc contains a good deal of code that is never exercised in real-world use.

two   Compiler evolution

 

The following table gives the number of new static-check warnings introduced in each clang or GCC version. For brevity, all warnings from clang 7 or older and all warnings from GCC 4 or older are aggregated. With each major version upgrade, the compiler team hands development teams new tools that can uncover more bugs in their own code.

Among the 1219 warnings summarized below (922 from clang, 297 from GCC), 119 are provided by both clang and GCC, and the other 981 are unique to GCC or clang, at least by name. Of these, 803 checks are unique to clang (as of clang 12) and 178 are unique to GCC (as of GCC 9). By this measure alone, clang offers far more static checks than GCC, and the 2012 ACM Software System Award is well deserved.

However, clang exists to serve LLVM, so for many functions unrelated to LLVM it directly calls GCC library interfaces. In that sense clang ships its own product standing on the shoulders of the giant, GCC.

Many companies have now adopted static analysis tools to improve code quality. The first step, though, should be to make full use of the static checks built into the compiler, the ancestor of all static analysis tools, and only then to work through the findings of other tools. For that first step, introducing clang is well worth it.

First introduced in compiler version    Count of new warning options
clang 7 or older                        584
clang 8                                 12
clang 9                                 223
clang 10                                55
clang 11                                33
clang 12                                15
gcc 4 or older                          172
gcc 5                                   26
gcc 6                                   24
gcc 7                                   35
gcc 8                                   16
gcc 9                                   24
Grand Total                             1219