Bounds checking for Variable Length Arrays (VLA)? - c

Is there a way to check for buffer overflows in VLAs? I used -fstack-protector-all -Wstack-protector but get these warnings:
warning: not protecting local variables: variable length buffer
Is there a library for achieving this? (-lefence is for heap memory)
I'm currently using Valgrind and gdb.

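You can use -fmudflap instead of -fstack-protector-all. (Note that mudflap was removed in GCC 4.9; on newer compilers, -fsanitize=address covers similar ground.)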
Update: Some documentation and options are here http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging

Perhaps using alloca() will help. That's annoying, because C99 should save you from having to use it, but the GCC man page seems to say that the stack protection code will be turned on if you use alloca().
Of course the real solution is to write perfect, bug free code that never tries to corrupt the stack.

I don't see how a library could do this for you; with a variable-length array, you're not calling any functions to do the indexing, so there's no place to "hook in" a library. With malloc(), the allocation is explicit in a function and you can track it.
Of course, you could go through the code and use preprocessor trickery to add some macro to each indexing point, and have the macro expand to code that checks the boundaries. But that is very intrusive.
I'm thinking something like changing:
void work(int n)
{
    int data[n]; /* Our variable-length array. */
    data[0] = 0;
}
into something like:
#include "vla-tracking.h"
void work(int n)
{
    VLA_NEW(int, data, n); /* Our variable-length array. */
    VLA_SET(data, 0, 0);
}
Then come up with suitable macro definitions (and auxiliary code) to track the accesses. As I said, it won't be pretty. Of course, the idea is that the macros would be able to "compile out" to just the plain definitions, controlled by some build-time setting (debug/release mode, or whatever).
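A minimal sketch of what such macros might look like, assuming a C99 compiler and using NDEBUG as the build-time switch. VLA_NEW and VLA_SET are the names from the example above; VLA_GET, the name##_len companion variable and the fill_squares() function are made up for illustration. It only catches accesses that go through the macros, so it is not a general solution:

```c
#include <assert.h>
#include <stddef.h>

#ifndef NDEBUG
/* Debug build: keep the element count in a companion variable.
   Note: this expands to two declarations, so it must be used at block scope.
   n must be positive, exactly as for a plain VLA. */
#define VLA_NEW(type, name, n) \
    size_t name##_len = (size_t)(n); \
    type name[name##_len]
/* A negative index converts to a huge size_t and fails the check too. */
#define VLA_SET(name, i, value) \
    (assert((size_t)(i) < name##_len), name[(i)] = (value))
#define VLA_GET(name, i) \
    (assert((size_t)(i) < name##_len), name[(i)])
#else
/* Release build: the macros compile out to the plain definitions. */
#define VLA_NEW(type, name, n) type name[(n)]
#define VLA_SET(name, i, value) (name[(i)] = (value))
#define VLA_GET(name, i) (name[(i)])
#endif

/* Hypothetical user: fills a VLA with squares and returns the last one. */
static int fill_squares(int n)
{
    VLA_NEW(int, data, n); /* Our variable-length array. */
    int i;
    for (i = 0; i < n; i++)
        VLA_SET(data, i, i * i);
    return VLA_GET(data, n - 1);
}
```

In a debug build, something like VLA_SET(data, n, 0) would abort via assert() instead of silently writing past the array.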

Related

Unlogical C6001 warning: Using uninitialized memory warning in C with Visual Studio

Given this code:
#include <stdlib.h>
typedef struct
{
    int *p;
} MyStruct;
MyStruct Test()
{
    MyStruct ms;
    ms.p = malloc(sizeof(int) * 5);
    if (!ms.p) exit(-1);
    return ms;
}
int main(void)
{
    while (1)
    {
        MyStruct t = Test();
        free(t.p); // C6001: Using uninitialized memory 't.p'.
    }
}
Visual Studio shows the C6001 warning on the line with the free call. However, I see no way to reach that line with t.p uninitialized. What am I missing?
This is very much a false positive and still exists even in MSVC 2019. There is no way that the t.p variable could be uninitialised.
In fact, there is no way it could reach the free() statement without it being initialised to a non-NULL value. But even if you allow for the possibility that the compiler doesn't know that exit() won't return, that's actually irrelevant. Whether it returns or not, the structure would still be initialised to something and, in any case, it's perfectly legal to free(NULL).
Removing the if .. exit has no effect on the warning so I doubt that's the issue. It's more likely that this is just MSVC being aggressive in reporting warnings and the best way to stop it from bothering you is to simply ignore it.
By that, I don't mean you ignoring the warning (I could never do that given my nature), I mean telling MSVC to shut up about it:
while (1) {
    MyStruct t = Test();
    // MSVC wrongly reports this as using uninitialised variable.
    #pragma warning(push)
    #pragma warning(disable: 6001)
    free(t.p);
    #pragma warning(pop)
}
Some points:
Sometimes SAL warnings can be "treated" by replacing malloc() with calloc(), for two possible reasons:
a) calloc() is more precise (element size and count are passed separately), which may give the analyzer better information;
b) it is a different API, possibly one that is not instrumented, so the analyzer says nothing. ;-P
The analysis might be confused by the exit() inside that function, which smells like a [missing] noreturn attribute (the case is very similar to leaving a value-returning function via an exception throw); see e.g. https://en.cppreference.com/w/cpp/language/attributes. On the other hand, the function only exits on some code paths, so marking it noreturn would be imprecise or wrong (the caller is trying to use its result, after all).
Generally, try to aggressively strip the code down until the warning disappears, removing progressively larger pieces of the implementation. E.g. in this case, removing the exit() line may change the SAL behaviour and thus give a clue as to which aspect is actually the "problem".
Perhaps the design is less than optimal; in such cases, some limited rework might lead to more "obvious"/"elegant"/"modern" handling that does not produce such SAL warnings.

C4996 (function unsafe) warning for strcpy but not for memcpy

I am writing code in VS2010, and after compilation the compiler gives me a C4996 warning ("This function or variable may be unsafe") for strcpy and sprintf calls.
However, I don't get similar warnings for memcpy (and maybe there are a few more similar 'unsafe' function calls in the code).
int _tmain(int argc, _TCHAR* argv[])
{
    char buf1[100], buf2[100];
    strcpy (buf1, buf2); // Warning C4996 displayed here asking to use strcpy_s instead
    memcpy (buf1, buf2, 100); // No warning here asking to use memcpy_s
    memcpy_s(buf1, 100, buf2, 100);
    return 0;
}
Why is this so? How can I turn on C4996 warning for all possible unsafe calls in my code?
In general, to compile C code you need a conforming C compiler. Visual Studio is a non-conforming C++ compiler.
You get the warning because Visual Studio is bad. See this.
C4996 appears whenever you use a function that Microsoft regards as obsolete. Apparently, Microsoft has decided that they should dictate the future of the C language, rather than the ISO C working group. Thus you get false warnings for perfectly fine code. The compiler is the problem.
There is nothing wrong with the strcpy() function, that's a myth. This function has existed for some 30-40 years and every little bit of it is properly documented. So what the function does and what it does not should not come as a surprise, even to beginner C programmers.
What strcpy does and does not:
It copies a null-terminated string into another memory location.
It does not take any responsibility for error handling.
It does not fix bugs in the caller application.
It does not take any responsibility for educating C programmers.
Because of the last remark above, you must know the following before calling strcpy:
If you pass a string of unknown length to strcpy, without checking its length in advance, you have a bug in the caller application.
If you pass some chunk of data which does not end with \0, you have a bug in the caller application.
If you pass two pointers to strcpy(), which point at memory locations that overlap, you invoke undefined behavior. Meaning you have a bug in the caller application.
For example, in the code you posted, you never initialized the arrays, so your program will likely crash and burn. That bug isn't in the slightest related to the strcpy() function and will not be solved by swapping out strcpy() for something else.
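To make the "check its length in advance" rule concrete, here is a rough sketch. The helper name copy_checked() and its return convention are made up for illustration; it assumes src really is null-terminated and that the two buffers do not overlap, which are exactly the caller obligations listed above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, terminator included.
   Returns 0 on success, -1 if src is too long (dst is then left untouched). */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);          /* requires a null-terminated src */
    if (len >= dst_size)
        return -1;              /* would overflow: refuse instead of copying */
    strcpy(dst, src);           /* safe now: length was checked in advance */
    return 0;
}
```

Note that on MSVC the strcpy() call inside the helper would still trigger C4996, even though the length has been checked.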
strcpy is unsafe if the terminating NUL is missing, as it may copy more characters than fit in the destination area. With memcpy, the number of bytes copied is fixed.
The memcpy_s function actually makes it easier for programmers to do it wrong -- you pass two lengths, and it uses the smaller of both, and all you get is an error code that can be silently ignored with no effort. Calling memcpy requires filling out the size parameter, which should make programmers think about what to pass.
Add this definition to the header "stdafx.h":
#define _CRT_SECURE_NO_WARNINGS
As for the difference between strcpy and memcpy: the latter has a third parameter that explicitly specifies how many bytes to copy. The former has no information about how many characters will be copied from the source string to the destination, so in the general case the copy may run past the memory allocated for the destination string.
You get these warnings because not passing the length of the string and relying on \0 termination is unsafe: it may cause a buffer overrun. With memcpy you pass the length, so there is no such overrun issue.
You can use something like
#ifdef _MSC_VER
# pragma warning(push)
# pragma warning(disable:4996)
#endif
strcpy... ; // Code that causes unsafe warning
#ifdef _MSC_VER
# pragma warning(pop)
#endif
If you don't worry about portability, you can use alternatives like strcpy_s etc.
Because strcpy and sprintf really are unsafe functions: whether they overflow depends on the content of the string. Instead you should use strncpy and snprintf to make sure they do not overwrite memory.
memcpy is a different case: it takes the length, so it does not overwrite memory as long as the length is correct.
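A small sketch of the bounded variants, assuming a C99-conforming snprintf (the helper names format_name() and copy_truncating() are invented for this example). Two details are easy to miss: snprintf reports truncation only through its return value, and strncpy does not write a terminator when the source fills the whole buffer, so it has to be added by hand:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the formatted text fit into dst, 0 if it was truncated. */
static int format_name(char *dst, size_t dst_size, const char *name)
{
    int n;

    assert(dst != NULL && name != NULL);
    /* C99 snprintf always terminates dst (if dst_size > 0) and returns
       the length the full output would have had. */
    n = snprintf(dst, dst_size, "name=%s", name);
    return n >= 0 && (size_t)n < dst_size;
}

/* Copies at most dst_size - 1 characters and always terminates dst. */
static void copy_truncating(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy alone would leave this unset */
}
```

Silent truncation can be a bug in its own right, which is why the first helper reports it to the caller.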
The warning means that the function is deprecated and will not be available in future versions: http://msdn.microsoft.com/en-US/en-en/library/ttcz0bys.aspx You can't add other functions to Microsoft's deprecation list.
The reason for the deprecation is that the functions are "unsafe", but that's different from your assumption "C4996 shows you all unsafe functions".
The reason why you get a warning on sprintf and strcpy, and not on memcpy, is that memcpy has a length parameter that limits how much memory you copy. For strcpy and sprintf, the input has to be terminated with a \0. If it is not, the function will continue out of bounds. You can limit this by using the snprintf and strncpy functions. Those explicitly limit how much can be copied.
Note that Microsoft has deprecated snprintf, so you should use the replacement function _snprintf instead. However, this is an MSVC-specific function.
I would advise doing away with char * buffers altogether and switching to C++, using STL containers such as std::string. These will save you a lot of debugging headaches and keep your code portable.

Variadic function and arbitrary argument saving for future execution

I am developing a thread pool in C and I want to allow a task to have an arbitrary number of arguments. I could use a function pointer like
int (*task) ();
This pointer could be called with any type of arguments; for example, I could do
int fib(int n) { return n < 2 ? n : fib(n-1) + fib(n-2); }
...
task = fib;
printf("fib(10)=%d\n",task(10));
However, what I want is to be able to save the arguments and run the task later, without having to call malloc; otherwise I would prefer to just use a task like
void * (*task) (void *);
in which case I would only have to save the void * argument in a struct.
However, I want to do that for arbitrary arguments. Is it possible to do this automatically for any kind of function I want, without even using a va_list?
Thanks in advance.
I'm afraid what you want is not possible - given I correctly understand your question.
The way I'd implement it is with an opaque pointer to a struct whose layout is known by the callback and by the caller, but not by the thread pool, which only carries a single pointer around.
Sadly, that solution implies using malloc(), or a nasty memory copy into preallocated space, which could be on the stack or global.
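Roughly, the struct-pointer approach could look like the sketch below. All names here (pool_task, fib_args, fib_task, task_execute) are invented for illustration and are not part of any real thread pool API. The argument block lives in storage owned by the caller, so no malloc() is needed as long as that storage outlives the task:

```c
#include <assert.h>
#include <stddef.h>

/* All the pool ever stores: one function pointer and one opaque pointer. */
typedef struct {
    void *(*run)(void *);
    void *arg;
} pool_task;

/* Argument block whose layout only the caller and the callback know. */
typedef struct {
    int n;       /* input  */
    int result;  /* output */
} fib_args;

static int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

/* Callback: unpacks the arguments, runs the real function, stores the result. */
static void *fib_task(void *p)
{
    fib_args *a = p;
    a->result = fib(a->n);
    return a;
}

/* What a worker thread would do with a queued task. */
static void *task_execute(pool_task t)
{
    assert(t.run != NULL);
    return t.run(t.arg);
}
```

A caller would fill in a fib_args (for example as a local or static variable), queue a pool_task pointing at it, and read result once the task has run. The price is one small wrapper function per task signature.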
If it was only the argument, then I'd use:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstrict-prototypes"
typedef void *(*task_t)();
#pragma GCC diagnostic pop
But that won't work for return types.
The 1st #pragma makes GCC remember which -W flags were provided on the command line. The current warning flags get pushed on a stack.
The 2nd #pragma makes GCC not whine about the absence of the arguments.
The 3rd #pragma restores the old warnings.
The ‘#pragma’ directive is the method specified by the C standard for providing additional information to the compiler, beyond what is conveyed in the language itself.
  — https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html

Prettiest way to declare a C array either fixed size or variable size?

I am writing a small piece of C code for an algorithm. The main targets are embedded microcontrollers; however, for testing purposes, a Matlab/Python interface is required.
I am following an embedded programming standard (MISRA-C 2004), which requires the use of C90 and discourages the use of malloc and friends. Therefore, all the arrays in the code have their memory allocated at compile time. If you change the size of the input arrays, you need to recompile the code, which is alright in the microcontroller scenario.
However, when prototyping with Matlab/Python, the size of the input arrays changes rather often, and recompiling every time does not seem like an option. In this case, the use of C99 is acceptable, and the size of the arrays should be determined at runtime.
The question is: what options do I have in C to make these two scenarios coexist in the same code, while keeping the code clean?
I must emphasize that my main concern is how to make the code easy to maintain. I have considered using #ifdef to select either the statically allocated array or the dynamically allocated array. But there are too many arrays, and I think #ifdef makes the code look ugly.
I've thought of a way that you can get away with only one #ifdef. I would personally just bite the bullet and recompile my code when I need to. The idea of using a different dialect of C for production and test makes me a bit nervous.
Anyway, here's what you can do.
#ifdef EMBEDDED
#define ARRAY_SIZE(V,S) (S)
#else
#define ARRAY_SIZE(V,S) (V)
#endif
int myFunc(int n)
{
    int myArray[ARRAY_SIZE(n, 6)];
    // work with myArray
}
The ARRAY_SIZE macro chooses the variable V, if not in the embedded environment; or the fixed size S, if in the embedded environment.
MISRA-C:2004 forbids C99 and thereby VLAs, so if you are writing strictly-conforming MISRA code you can't use them. It is also very likely that VLAs will be explicitly banned in the upcoming MISRA-C standard.
Is it not an option to use statically allocated arrays whose size is left unspecified? That is:
uint8_t arr[] = { ... };
...
n = sizeof(arr)/sizeof(uint8_t);
This is most likely the "prettiest" way. Alternatively you can have a debug build in C99 with VLAs, and then change it to statically allocated arrays in the release build.
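One way this might be combined with the rest of the code, sketched under the assumption that the algorithm can take a pointer plus an element count: the function itself then does not care whether the array was sized at compile time or at runtime. The names ELEMENT_COUNT, samples and sum_samples are hypothetical, and unsigned char stands in for uint8_t because <stdint.h> is not part of C90:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not of a pointer). */
#define ELEMENT_COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Size is deduced from the initializer at compile time. */
static const unsigned char samples[] = { 3, 1, 4, 1, 5 };

/* The algorithm only sees a pointer and a length. */
static unsigned long sum_samples(const unsigned char *data, size_t n)
{
    unsigned long total = 0;
    size_t i;

    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

The embedded build would call sum_samples(samples, ELEMENT_COUNT(samples)), while a test harness could pass a buffer of any length it likes.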

Why do compilers not warn about out-of-bounds static array indices?

A colleague of mine recently got bitten badly by writing out of bounds to a static array on the stack (he added an element to it without increasing the array size). Shouldn't the compiler catch this kind of error? The following code compiles cleanly with gcc, even with the -Wall -Wextra options, and yet it is clearly erroneous:
int main(void)
{
int a[10];
a[13] = 3; // oops, overwrote the return address
return 0;
}
I'm positive that this is undefined behavior, although I can't find an excerpt from the C99 standard saying so at the moment. But in the simplest case, where the size of an array is known at compile time and the indices are known at compile time, shouldn't the compiler emit a warning at the very least?
GCC does warn about this. But you need to do two things:
Enable optimization. Without at least -O2, GCC is not doing enough analysis to know what a is, and that you ran off the edge.
Change your example so that a[] is actually used, otherwise GCC generates a no-op program and has completely discarded your assignment.
$ cat foo.c
int main(void)
{
    int a[10];
    a[13] = 3; // oops, overwrote the return address
    return a[1];
}
$ gcc -Wall -Wextra -O2 -c foo.c
foo.c: In function ‘main’:
foo.c:4: warning: array subscript is above array bounds
BTW: If you returned a[13] in your test program, that wouldn't work either, as GCC optimizes out the array again.
Have you tried -fmudflap with GCC? These are runtime checks, but they are useful, as most often you are dealing with indices calculated at runtime anyway. Instead of silently continuing to work, the program will notify you about those bugs.
-fmudflap -fmudflapth -fmudflapir
For front-ends that support it (C and C++), instrument all risky pointer/array dereferencing operations, some standard library string/heap functions, and some other associated constructs with range/validity tests. Modules so instrumented should be immune to buffer overflows, invalid heap use, and some other classes of C/C++ programming errors. The instrumentation relies on a separate runtime library (libmudflap), which will be linked into a program if -fmudflap is given at link time. Run-time behavior of the instrumented program is controlled by the MUDFLAP_OPTIONS environment variable. See "env MUDFLAP_OPTIONS=-help a.out" for its options.
Use -fmudflapth instead of -fmudflap to compile and to link if your program is multi-threaded. Use -fmudflapir, in addition to -fmudflap or -fmudflapth, if instrumentation should ignore pointer reads. This produces less instrumentation (and therefore faster execution) and still provides some protection against outright memory corrupting writes, but allows erroneously read data to propagate within a program.
Here is what mudflap gives me for your example:
[js#HOST2 cpp]$ gcc -fstack-protector-all -fmudflap -lmudflap mudf.c
[js#HOST2 cpp]$ ./a.out
*******
mudflap violation 1 (check/write): time=1229801723.191441 ptr=0xbfdd9c04 size=56
pc=0xb7fb126d location=`mudf.c:4:3 (main)'
/usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7fb126d]
./a.out(main+0xb9) [0x804887d]
/usr/lib/libmudflap.so.0(__wrap_main+0x4f) [0xb7fb0a5f]
Nearby object 1: checked region begins 0B into and ends 16B after
mudflap object 0x8509cd8: name=`mudf.c:3:7 (main) a'
bounds=[0xbfdd9c04,0xbfdd9c2b] size=40 area=stack check=0r/3w liveness=3
alloc time=1229801723.191433 pc=0xb7fb09fd
number of nearby objects: 1
[js#HOST2 cpp]$
It has a bunch of options. For example, it can fork off a gdb process upon violations, show you where your program leaked (using -print-leaks), or detect uninitialized variable reads. Use MUDFLAP_OPTIONS=-help ./a.out to get a list of options. Since mudflap only outputs addresses and not filenames and lines of the source, I wrote a little gawk script:
/^ / {
    file = gensub(/([^(]*).*/, "\\1", 1);
    addr = gensub(/.*\[([x[:xdigit:]]*)\]$/, "\\1", 1);
    if(file && addr) {
        cmd = "addr2line -e " file " " addr
        cmd | getline laddr
        print $0 " (" laddr ")"
        close (cmd)
        next;
    }
}
1 # print all other lines
Pipe the output of mudflap into it, and it will display the sourcefile and line of each backtrace entry.
Also -fstack-protector[-all] :
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
You're right, the behavior is undefined. C99 pointers must point within or just one element beyond declared or heap-allocated data structures.
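To illustrate the "within or just one element beyond" rule, here is a short sketch (the function name sum_ints is made up). Forming and comparing against the one-past-the-end pointer is fine; dereferencing it, or forming a pointer further out such as a + 13 for an int a[10], is undefined:

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints starting at first, using a one-past-the-end pointer as the stop mark. */
static int sum_ints(const int *first, size_t n)
{
    const int *end;
    const int *p;
    int total = 0;

    assert(first != NULL);
    end = first + n;            /* one past the last element: valid to form and compare */
    for (p = first; p != end; ++p)
        total += *p;            /* p is always inside the array here */
    /* *end would be undefined behavior, and so would first + n + 1. */
    return total;
}
```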
I've never been able to figure out how the gcc people decide when to warn. I was shocked to learn that -Wall by itself will not warn of uninitialized variables; at minimum you need -O, and even then the warning is sometimes omitted.
I conjecture that because unbounded arrays are so common in C, the compiler probably doesn't have a way in its expression trees to represent an array that has a size known at compile time. So although the information is present at the declaration, I conjecture that at the use it is already lost.
I second the recommendation of valgrind. If you are programming in C, you should run valgrind on every program, all the time until you can no longer take the performance hit.
It's not a static array.
Undefined behavior or not, it's writing to an address 13 integers from the beginning of the array. What's there is your responsibility. There are several C techniques that intentionally misallocate arrays for reasonable reasons. And this situation is not unusual in incomplete compilation units.
Depending on your flag settings, there are a number of features of this program that would be flagged, such as the fact that the array is never used. And the compiler might just as easily optimize it out of existence and not tell you - a tree falling in the forest.
It's the C way. It's your array, your memory, do what you want with it. :)
(There are any number of lint tools for helping you find this sort of thing, and you should use them liberally. They don't all work through the compiler though; compiling and linking are often tedious enough as it is.)
The reason C doesn't do it is that C doesn't have the information. A statement like
int a[10];
does two things: it allocates sizeof(int)*10 bytes of space (plus, potentially, a little dead space for alignment), and it puts an entry in the symbol table that reads, conceptually,
a : address of a[0]
or in C terms
a : &a[0]
and that's all. In fact, in C you can interchange *(a+i) with a[i] in (almost*) all cases with no effect BY DEFINITION. So your question is equivalent to asking "why can I add any integer to this (address) value?"
* Pop quiz: what is the one case in which this isn't true?
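A quick sketch of the equivalence (the function names are invented for this example). It also shows one place where an array visibly differs from a plain address: inside the scope of the declaration, sizeof still sees the whole array, so the declared size is not entirely lost to the compiler:

```c
#include <assert.h>
#include <stddef.h>

/* a[i], *(a + i) and even i[a] all name the same element. */
static int same_element(int *a, size_t i)
{
    assert(a != NULL);
    return &a[i] == a + i && &i[a] == a + i;
}

/* Where the array type is visible, sizeof gives the full size, not a pointer size. */
static size_t declared_count(void)
{
    int a[10];
    return sizeof a / sizeof a[0];
}
```

Once the array is passed to a function such as same_element(), only the address arrives, which is the situation the answer above describes.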
The C philosophy is that the programmer is always right. So it will silently allow you to access whatever memory address you give there, assuming that you always know what you are doing and will not bother you with a warning.
I believe that some compilers do in certain cases. For example, if my memory serves me correctly, newer Microsoft compilers have a "Buffer Security Check" option which will detect trivial cases of buffer overruns.
Why don't all compilers do this? Either (as previously mentioned) the internal representation used by the compiler doesn't lend itself to this type of static analysis, or it just isn't high enough on the writers' priority list. Which, to be honest, is a shame either way.
shouldn't the compiler emit a warning at the very least?
No; C compilers generally do not perform array bounds checks. The obvious negative effect of this is, as you mention, an error with undefined behavior, which can be very difficult to find.
The positive side of this is a possible small performance advantage in certain cases.
There are some extensions in gcc for that (from the compiler side):
http://www.doc.ic.ac.uk/~awl03/projects/miro/
On the other hand, splint, rat and quite a few other static code analysis tools would have found that.
You can also use valgrind on your code and see the output.
http://valgrind.org/
Another widely used library seems to be libefence.
It's simply a design decision that was made once, and it now leads to things like this.
Regards
Friedrich
-fbounds-checking option is available with gcc.
It is worth going through this article:
http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html
'le dorfier' has given an apt answer to your question though: it's your program, and this is the way C behaves.

Resources