An Experiment with Checking the GNU C Library (glibc) Reveals Bugs and Analysis Challenges

An Experiment with Checking the glibc Library
Author: Andrey Karpov
Date: 26.02.2014
We have recently carried out an experiment: checking the glibc library with PVS-Studio. The purpose
was to find out how good our analyzer is at checking Linux projects. The basic conclusion is: not very
good yet. Non-standard extensions used in such projects make the analyzer generate a huge pile of false
positives. However, we have found some interesting bugs.
glibc
glibc is the GNU Project's implementation of the C standard library. It was originally written by the Free
Software Foundation (FSF) for the GNU operating system.
Released under the GNU Lesser General Public License, glibc is free software.
The description is taken from the Wikipedia article: glibc.
Not long ago, the release of a new version of the glibc library was announced. It prompted us to check
this library with our analyzer PVS-Studio. Unfortunately, I was busy with another task for the last two
weeks and got the opportunity to write the article about that check only now. I was actually engaged in
carrying out a thorough comparison of several static analyzers and then writing a large article about the
results of that comparison. This undertaking is very important for us because people are constantly
asking about the differences between our analyzer and Cppcheck and Visual Studio 2013's static
analyzer. So glibc (glibc-2-19-90) had to wait a while before I could finally get down to it.
We were not expecting to find anything terrible there - and we haven't. The glibc library is very high-quality
and it is regularly checked by many analyzers, among which are at least the following ones:
• Coverity;
• Clang;
• Cppcheck.
So it would be a great success to find at least one bug in its code.

Difficulties of analysis
Those not familiar with the internal mechanisms and principles of static analysis tools may view them as
very simple utilities. That is a mistake: these are very complex programs.
There are tools like RATS that may confuse you. If you take a look at RATS' code, you'll see that it merely
searches for certain function names in files. Tools like that are called static code analyzers too, but their
job is actually very far from what real static code analyzers do. Static analysis has nothing to do with a
search with regular expressions [1].
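To make the difference concrete, here is a minimal sketch of a name-matching check. It is not RATS's actual code; the function naive_flags_line() is hypothetical and only assumes that the tool looks for a risky function name as a substring of a source line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name-matching check: a line is "suspicious" if it mentions
   the risky name anywhere. The check has no notion of whether the name is
   a call, part of another identifier, or text inside a comment. */
int naive_flags_line(const char *line, const char *risky_name)
{
    assert(line != NULL && risky_name != NULL);
    return strstr(line, risky_name) != NULL;
}
```

Such a check reports a comment like "do not use strcpy here" and a call to my_strcpy_safe() just as readily as a real strcpy() call, which is why it can hardly be compared to an analyzer that parses the code.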
We have mentioned a number of times that a Linux version of an application is by no means the
same thing as a recompiled executable module [2]. There is a huge gulf between an executable module
and a finished software product. One of the obstacles one faces when making a Linux version is the need
to support compiler-specific extensions and the like.
An ordinary programmer who has never dealt with all those aspects doesn't have the slightest idea of how
difficult and extensive this work is. For example, take a call of the strcmp() function:
cmpres = strcmp (newp->from_string, root->from_string);
An ordinary programmer doesn't even suspect what terrible code this single line will turn into after
preprocessing and which non-standard extensions will be used in it. In this particular case, the line
above turns into the following:
cmpres = __extension__ ({ size_t __s1_len, __s2_len;
(__builtin_constant_p (newp->from_string) &&
__builtin_constant_p (root->from_string) &&
(__s1_len = strlen (newp->from_string),
__s2_len = strlen (root->from_string),
(!((size_t)(const void *)((newp->from_string) + 1) -
(size_t)(const void *)(newp->from_string) == 1) ||
__s1_len >= 4) &&
(!((size_t)(const void *)((root->from_string) + 1) -
(size_t)(const void *)(root->from_string) == 1) ||
__s2_len >= 4)) ?
__builtin_strcmp (newp->from_string, root->from_string) :
((size_t)(const void *)((newp->from_string) + 1) -
(size_t)(const void *)(newp->from_string) == 1) &&
(__s1_len = strlen (newp->from_string), __s1_len < 4) ?
(__builtin_constant_p (root->from_string) &&
((size_t)(const void *)((root->from_string) + 1) -
(size_t)(const void *)(root->from_string) == 1) ?
(__extension__ ({ const unsigned char *__s2 =
(const unsigned char *) (const char *) (root->from_string);
int __result = (((const unsigned char *) (const char *)
(newp->from_string))[0] - __s2[0]);
if (__s1_len > 0 && __result == 0) {
__result = (((const unsigned char *) (const char *)
(newp->from_string))[1] - __s2[1]);
if (__s1_len > 1 && __result == 0) { __result =
(((const unsigned char *) (const char *)
(newp->from_string))[2] - __s2[2]);
if (__s1_len > 2 && __result == 0)
__result = (((const unsigned char *)
(const char *) (newp->from_string))[3] -
__s2[3]); } } __result; }))) :
(__builtin_constant_p (root->from_string) &&
((size_t)(const void *)((root->from_string) + 1) -
(size_t)(const void *)(root->from_string) == 1) &&
(__s2_len = strlen (root->from_string), __s2_len < 4) ?
((size_t)(const void *)((newp->from_string) + 1) -
(size_t)(const void *)(newp->from_string) == 1) ?
(- (__extension__ ({ const unsigned char *__s2 =
(const unsigned char *) (const char *) (newp->from_string);
int __result = (((const unsigned char *) (const char *)
(root->from_string))[0] - __s2[0]);
if (__s2_len > 0 && __result == 0) { __result =
(((const unsigned char *) (const char *)
(root->from_string))[1] - __s2[1]);
if (__s2_len > 1 && __result == 0)
{ __result = (((const unsigned char *)
(const char *) (root->from_string))[2] -
__s2[2]); if (__s2_len > 2 && __result == 0)
__result = (((const unsigned char *) (const char *)
(root->from_string))[3] - __s2[3]); } } __result; })))) :
__builtin_strcmp (newp->from_string, root->from_string))));
});
The analyzer is not ready for such a turn of events and starts generating silly false positives on such constructs.
Let me explain this point about false positives by a simpler example. Assume we have the following line
of code:
assert(MAP_FAILED == (void *) -1);
The assert() macro expands into the following fragment:
((((void *) -1) == (void *) -1) ? (void) (0) :
__assert_fail ("((void *) -1) == (void *) -1",
"loadmsgcat.c", 840, __PRETTY_FUNCTION__));
The PVS-Studio analyzer generates a false positive on the comparison (((void *) -1) == (void *) -1):
V501 There are identical sub-expressions to the left and to the right of the '==' operator: ((void *) - 1) ==
(void *) - 1 loadmsgcat.c 840
There is nothing surprising here. We have already been through all of this when adapting our tool
for applications built with Visual C++, and there was quite a lot to wonder at there, too. It takes a lot of
time and effort to teach the analyzer to understand all those nuances. You have to teach it
that it is dealing with the "assert" macro, which is harmless and whose only job is to check that the
MAP_FAILED macro equals "(void *) -1". We have already done all that for Visual C++, but not for Linux.
It is teaching the analyzer to handle such constructs correctly that makes up the largest part of the work on
supporting other compilers. This work is invisible to others, but it requires you to thoroughly
investigate, support and test all the nuances of the compiler and standard libraries.
I've just opened this door to Hell slightly for you to peek in. I'm going to write a series of articles soon to
show all the difficulties of developing static analysis tools. I'm sure you'll like them.

Suspicious code fragments found in the glibc library
Although glibc is tested by many tools, we still have managed to find a few interesting bugs. Let's take a
look at them.
A strange expression
char *DCIGETTEXT (....)
{
....
/* Make CATEGORYVALUE point to the next element of the list. */
while (categoryvalue[0] != '\0' && categoryvalue[0] == ':')
++categoryvalue;
....
}
V590 Consider inspecting this expression. The expression is excessive or contains a misprint. dcigettext.c
582
The condition can be shortened:
while (categoryvalue[0] == ':')
Perhaps there is no error here and the first part of the condition, (categoryvalue[0] != '\0'), is simply
redundant. But I suspect the error is real and the code should look like this:
while (categoryvalue[0] != '\0' && categoryvalue[0] != ':')
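The claim that the condition can be shortened is easy to check by brute force. The sketch below is only an illustration: the helper names (cond_original, cond_short, cond_suspected, conditions_agree) are made up for this example and do not exist in glibc.

```c
#include <limits.h>

/* The condition as it is written in dcigettext.c. */
int cond_original(char c) { return c != '\0' && c == ':'; }

/* The shortened form suggested by the V590 warning. */
int cond_short(char c) { return c == ':'; }

/* The form the article suspects was intended: keep going
   until either the terminator or a ':' is met. */
int cond_suspected(char c) { return c != '\0' && c != ':'; }

/* Returns 1 if the original and the shortened conditions give the
   same answer for every possible char value, 0 otherwise. */
int conditions_agree(void)
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v)
        if (cond_original((char)v) != cond_short((char)v))
            return 0;
    return 1;
}
```

Since ':' is never equal to the terminator, the first comparison cannot change the result, while the suspected variant behaves differently for ordinary characters.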
Dereferencing a pointer before checking it
This fragment is not necessarily unsafe - perhaps the pointer can never be null. But I should mention it
anyway:
static enum clnt_stat
clntraw_call (h, proc, xargs, argsp, xresults, resultsp, timeout)
CLIENT *h;
u_long proc;
xdrproc_t xargs;
caddr_t argsp;
xdrproc_t xresults;
caddr_t resultsp;
struct timeval timeout;
{
struct clntraw_private_s *clp = clntraw_private;
XDR *xdrs = &clp->xdr_stream;
....
if (clp == NULL)
return RPC_FAILED;
....
}
V595 The 'clp' pointer was utilized before it was verified against nullptr. Check lines: 145, 150.
clnt_raw.c 145
In the same file, not far from this piece, there is a similar defect: V595 The 'clp' pointer was utilized
before it was verified against nullptr. Check lines: 232, 235. clnt_raw.c 232
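If the check is meant to protect anything, it has to come before the first use of the pointer. Below is a minimal sketch of that ordering. The types and names (private_s, xdr_stream_s, use_private, STATUS_OK, STATUS_FAILED) are hypothetical stand-ins and not the real glibc declarations.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the glibc types; only the shape matters. */
struct xdr_stream_s { int pos; };
struct private_s { struct xdr_stream_s xdr_stream; };

enum status { STATUS_OK, STATUS_FAILED };

enum status use_private(struct private_s *clp)
{
    /* Verify the pointer first ... */
    if (clp == NULL)
        return STATUS_FAILED;

    /* ... and only then form pointers into the object it refers to. */
    struct xdr_stream_s *xdrs = &clp->xdr_stream;
    xdrs->pos = 0;
    return STATUS_OK;
}
```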
Dangerous optimization (a vulnerability)
char *
__sha256_crypt_r (key, salt, buffer, buflen)
const char *key;
const char *salt;
char *buffer;
int buflen;
{
....
unsigned char temp_result[32]
....
memset (temp_result, '\0', sizeof (temp_result));
....
.... // temp_result not used further on
}
V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer.
The RtlSecureZeroMemory() function should be used to erase the private data. sha256-crypt.c 385

The compiler is allowed to remove the call of the memset() function when compiling the Release
version, and an optimizing compiler will usually do so. Since the 'temp_result' buffer is
not used anywhere after the memset() call, the call itself has no observable effect and is not needed either.
This is a vulnerability because the private data will not be erased. The memset() function should be
replaced with a more appropriate one. The analyzer suggests RtlSecureZeroMemory(), which is of course
absent in Linux. But there are other alternatives.
The same defect: V597 The compiler could delete the 'memset' function call, which is used to flush
'temp_result' buffer. The RtlSecureZeroMemory() function should be used to erase the private data.
sha512-crypt.c 396
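One commonly used alternative is to clear the buffer through a volatile-qualified pointer, so that the compiler has to treat every store as an observable access. The sketch below assumes this idiom; the name wipe_buffer() is hypothetical and is not a glibc function. Later glibc releases (2.25 and newer) also provide explicit_bzero() for the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zeroes len bytes at buf. The stores go through a
   volatile-qualified pointer, so the compiler may not drop them even if
   the buffer is never read again. */
void wipe_buffer(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--)
        *p++ = 0;
}
```

In the fragments above, such a helper would be called in place of memset(), for example wipe_buffer(temp_result, sizeof (temp_result)).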
Undefined behavior
One might expect the glibc library's code to be written in such a way as to provide maximum portability.
However, it contains quite a lot of shift constructs that cannot be considered safe from the viewpoint of
portability.
This is what the C language standard has to say about shifts:
The integer promotions are performed on each of the operands. The type of the result is that of the
promoted left operand. If the value of the right operand is negative or is greater than or equal to the
width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an
unsigned type, the value of the result is E1 * 2 pow E2, reduced modulo one more than the maximum
value representable in the result type. If E1 has a signed type and nonnegative value, and E1 * 2 pow E2
is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a
signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2
pow E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It follows from this text that left-shifting a negative number is undefined behavior, and right-shifting
one gives an implementation-defined result. However, both are very common operations in the glibc library.
An example of left shift:
static void init_cacheinfo (void)
{
....
count_mask = ~(-1 << (count_mask + 1));
....
}
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. cacheinfo.c 645
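The same kind of mask can be built without shifting a negative value if the shifted operand is unsigned. This is only a sketch of one possible rewrite, not the fix adopted by glibc; the function name low_bits_mask() is hypothetical.

```c
#include <limits.h>

/* Returns a value with the low `bits` bits set, using unsigned shifts only. */
unsigned int low_bits_mask(unsigned int bits)
{
    const unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);

    /* Shifting by the full width (or more) would itself be undefined. */
    if (bits >= width)
        return ~0u;

    return ~(~0u << bits);
}
```

With such a helper, the expression from init_cacheinfo() would correspond to low_bits_mask(count_mask + 1).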

An example of right shift:
utf8_encode (char *buf, int val)
{
....
*buf = (unsigned char) (~0xff >> step);
....
}
The "~0xff" expression is of the 'int' type and equals -256.
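For the right shift, making the constant unsigned is enough to get a fully defined result. The sketch below assumes that 'step' is a small non-negative value, smaller than the width of unsigned int; the function name lead_byte_mask() is hypothetical.

```c
/* Same low byte as (~0xff >> step) on implementations with an arithmetic
   right shift, but computed on an unsigned operand, so the result does not
   depend on how the implementation shifts negative values.
   Assumption: step is less than the width of unsigned int. */
unsigned char lead_byte_mask(unsigned int step)
{
    return (unsigned char)(~0xffu >> step);
}
```

For steps 2, 3 and 4 this gives 0xC0, 0xE0 and 0xF0 respectively.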
Below is a list of all the code fragments with incorrect shift operations:
• strxfrm_l.c 68
• clock_nanosleep.c 38
• ifaddrs.c 786
• xdr_intXX_t.c 35
• xdr_intXX_t.c 41
• private.h 327
• private.h 331
• zic.c 696
• zdump.c 212
• zdump.c 216
• timer_create.c 47
• timer_create.c 49
• loop.c 331
• loop.c 437
• mktime.c 207
• mktime.c 208
• mktime.c 211
• mktime.c 212
• mktime.c 230
• mktime.c 298
• mktime.c 298
• ld-collate.c 298
Using an uninitialized variable
static int send_vc(....)
{
....
int truncating, connreset, resplen, n;
....
#ifdef _STRING_ARCH_unaligned
*anssizp2 = orig_anssizp - resplen;
*ansp2 = *ansp + resplen;
#else
....
}
V614 Uninitialized variable 'resplen' used. res_send.c 790
Incorrect string formatting
In some fragments, '%u' is used to print signed variables, and in some other fragments, '%d' is used to
print unsigned variables. These are trifles, of course, but they are worth mentioning.
For example:
typedef unsigned int __uid_t;
typedef __uid_t uid_t;
int
user2netname (...., const uid_t uid, ....)
{
....
sprintf (netname, "%s.%d@%s", OPSYS, uid, dfltdom);
....
}
V576 Incorrect format. Consider checking the fourth actual argument of the 'sprintf' function. The
SIGNED integer type argument is expected. netname.c 51
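The fix for this kind of warning is to make the conversion specifier match the signedness of the argument. Here is a small sketch with '%u' for an unsigned identifier; the names example_uid_t and format_netname() are hypothetical, and snprintf() is used instead of sprintf() only to keep the example bounded.

```c
#include <stdio.h>

/* Stand-in for uid_t, which the fragment above defines as unsigned int. */
typedef unsigned int example_uid_t;

/* Formats "<opsys>.<uid>@<domain>" into out; '%u' matches the unsigned
   argument. Returns what snprintf() returns. */
int format_netname(char *out, size_t out_size, const char *opsys,
                   example_uid_t uid, const char *domain)
{
    return snprintf(out, out_size, "%s.%u@%s", opsys, uid, domain);
}
```

With '%d', a large identifier such as 4000000000 would typically be printed as a negative number on platforms with a 32-bit int.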
Other defects of this kind:
• Consider checking the second actual argument of the 'printf' function. The SIGNED integer type
argument is expected. locarchive.c 1741
• Consider checking the fourth actual argument of the 'printf' function. The SIGNED integer type
argument is expected. locarchive.c 1741
• Consider checking the fifth actual argument of the 'fprintf' function. The SIGNED integer type
argument is expected. res_debug.c 236
• Consider checking the third actual argument of the 'sprintf' function. The UNSIGNED integer
type argument is expected. inet_net_ntop.c 134
• Consider checking the fourth actual argument of the 'sprintf' function. The SIGNED integer type
argument is expected. ns_print.c 500
• Consider checking the fifth actual argument of the 'sprintf' function. The SIGNED integer type
• Consider checking the third actual argument of the 'sprintf' function. The SIGNED integer type
• Consider checking the fourth actual argument of the 'sprintf' function. The SIGNED integer type
• Consider checking the fifth actual argument of the 'sprintf' function. The SIGNED integer type
type argument is expected. ns_print.c 628
• Consider checking the fourth actual argument of the 'sprintf' function. The UNSIGNED integer
• Consider checking the fifth actual argument of the 'sprintf' function. The UNSIGNED integer type
• Consider checking the third actual argument of the 'sprintf' function. The SIGNED integer type
argument is expected. nis_print.c 209
argument is expected. sprof.c 480
Conclusion
We should admit that we made a wrong choice for the experiment with testing our analyzer on code
from the Linux world - the glibc project is just too high-quality. :) There are too few worthy defects to
make the article interesting. But anyway, there are many other well-known and interesting projects
under Linux waiting to be tested by PVS-Studio to demonstrate its capabilities.
References
1. Andrey Karpov. Static analysis and regular expressions.
2. Dmitry Tkachenko. PVS-Studio and CppCat: An Interview with Andrey Karpov, the Project CTO
and Developer.
