Why do compilers not warn about out-of-bounds static array indices? - c

A colleague of mine recently got bitten badly by writing out of bounds to a static array on the stack (he added an element to it without increasing the array size). Shouldn't the compiler catch this kind of error? The following code compiles cleanly with gcc, even with the -Wall -Wextra options, and yet it is clearly erroneous:
int main(void)
{
int a[10];
a[13] = 3; // oops, overwrote the return address
return 0;
}
I'm positive that this is undefined behavior, although I can't find an excerpt from the C99 standard saying so at the moment. But in the simplest case, where the size of the array and the indices are both known at compile time, shouldn't the compiler emit a warning at the very least?

GCC does warn about this. But you need to do two things:
Enable optimization. Without at least -O2, GCC is not doing enough analysis to know what a is, and that you ran off the edge.
Change your example so that a[] is actually used, otherwise GCC generates a no-op program and has completely discarded your assignment.
$ cat foo.c
int main(void)
{
int a[10];
a[13] = 3; // oops, overwrote the return address
return a[1];
}
$ gcc -Wall -Wextra -O2 -c foo.c
foo.c: In function ‘main’:
foo.c:4: warning: array subscript is above array bounds
BTW: If you returned a[13] in your test program, that wouldn't work either, as GCC optimizes out the array again.
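Independently of the warning, the original mistake (adding an element without growing the array) can be made harder to commit by letting the compiler count the elements. The following is only a sketch of that idea: the ARRAY_LEN macro, the primes table and sum_primes() are hypothetical names, not something from the question.

```c
#include <stddef.h>

/* Number of elements in a true array (this does not work on a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table: the size is omitted, so the compiler counts the
 * initializers and the array grows whenever an element is added. */
static const int primes[] = { 2, 3, 5, 7, 11 };

/* Sum every element; the loop bound follows the initializer list. */
int sum_primes(void)
{
    int total = 0;
    for (size_t i = 0; i < ARRAY_LEN(primes); i++)
        total += primes[i];
    return total;
}
```

With this layout, appending a sixth initializer changes ARRAY_LEN(primes) automatically, so there is no separate size constant to forget.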

Have you tried -fmudflap with GCC? Its checks happen at run time, but they are useful because most of the time you are dealing with indices calculated at run time anyway. Instead of silently continuing to run, the program will notify you about those bugs.
-fmudflap -fmudflapth -fmudflapir
For front-ends that support it (C and C++), instrument all risky pointer/array dereferencing operations, some standard library string/heap functions, and some other associated constructs with range/validity tests. Modules so instrumented should be immune to buffer overflows, invalid heap use, and some other classes of C/C++ programming errors. The instrumentation relies on a separate runtime library (libmudflap), which will be linked into a program if -fmudflap is given at link time. Run-time behavior of the instrumented program is controlled by the MUDFLAP_OPTIONS environment variable. See "env MUDFLAP_OPTIONS=-help a.out" for its options.
Use -fmudflapth instead of -fmudflap to compile and to link if your program is multi-threaded. Use -fmudflapir, in addition to -fmudflap or -fmudflapth, if instrumentation should ignore pointer reads. This produces less instrumentation (and therefore faster execution) and still provides some protection against outright memory corrupting writes, but allows erroneously read data to propagate within a program.
Here is what mudflap gives me for your example:
[js#HOST2 cpp]$ gcc -fstack-protector-all -fmudflap -lmudflap mudf.c
[js#HOST2 cpp]$ ./a.out
*******
mudflap violation 1 (check/write): time=1229801723.191441 ptr=0xbfdd9c04 size=56
pc=0xb7fb126d location=`mudf.c:4:3 (main)'
/usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7fb126d]
./a.out(main+0xb9) [0x804887d]
/usr/lib/libmudflap.so.0(__wrap_main+0x4f) [0xb7fb0a5f]
Nearby object 1: checked region begins 0B into and ends 16B after
mudflap object 0x8509cd8: name=`mudf.c:3:7 (main) a'
bounds=[0xbfdd9c04,0xbfdd9c2b] size=40 area=stack check=0r/3w liveness=3
alloc time=1229801723.191433 pc=0xb7fb09fd
number of nearby objects: 1
[js#HOST2 cpp]$
It has a bunch of options. For example, it can fork off a gdb process upon violations, show you where your program leaked (using -print-leaks), or detect uninitialized variable reads. Use MUDFLAP_OPTIONS=-help ./a.out to get a list of options. Since mudflap only outputs addresses and not the file names and line numbers of the source, I wrote a little gawk script:
/^ / {
file = gensub(/([^(]*).*/, "\\1", 1);
addr = gensub(/.*\[([x[:xdigit:]]*)\]$/, "\\1", 1);
if(file && addr) {
cmd = "addr2line -e " file " " addr
cmd | getline laddr
print $0 " (" laddr ")"
close (cmd)
next;
}
}
1 # print all other lines
Pipe the output of mudflap into it, and it will display the source file and line of each backtrace entry.
Also -fstack-protector[-all] :
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
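To make the guard idea concrete, here is a rough simulation of it in plain C. It is not how GCC implements -fstack-protector; it only mimics the "arm the guard on entry, verify it on exit" pattern inside one flat array so that the demonstration itself stays well defined. BUF_SIZE, GUARD_VALUE and copy_and_check() are names invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8u        /* bytes that belong to the simulated buffer */
#define GUARD_VALUE 0xA5u  /* arbitrary marker byte, an assumption of this sketch */

/* Copy n bytes into a simulated frame made of BUF_SIZE buffer bytes followed
 * by one guard byte. The copy is not checked against BUF_SIZE (the bug being
 * simulated). Returns 1 if the guard survived, 0 if it was overwritten. */
int copy_and_check(const unsigned char *src, size_t n)
{
    unsigned char frame[BUF_SIZE + 1];

    if (n > sizeof frame)              /* never leave the simulated frame */
        n = sizeof frame;

    frame[BUF_SIZE] = GUARD_VALUE;     /* "function entry": arm the guard */
    memcpy(frame, src, n);             /* copy that ignores BUF_SIZE */
    return frame[BUF_SIZE] == GUARD_VALUE;  /* "function exit": verify it */
}
```

Copying 8 bytes leaves the guard intact, while copying 9 bytes of zeros overwrites it, which is the condition under which a real stack protector would print an error and exit.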

You're right, the behavior is undefined. C99 pointers must point within or just one element beyond declared or heap-allocated data structures.
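As a small illustration of the "one element beyond" rule, the sketch below forms a pointer one past the end of an array and uses it only as a loop bound, never dereferencing it. The function name sum_ints() is made up for this example.

```c
#include <stddef.h>

/* Sum count ints starting at first. The pointer "end" points one past the
 * last element: it may be formed and compared, but never dereferenced. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;
    int total = 0;

    for (const int *p = first; p != end; ++p)
        total += *p;
    return total;
}
```

For int a[10], the expression a + 10 is fine as a bound, whereas a + 13 (and therefore a[13]) is already outside what the standard allows.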
I've never been able to figure out how the gcc people decide when to warn. I was shocked to learn that -Wall by itself will not warn of uninitialized variables; at minimum you need -O, and even then the warning is sometimes omitted.
I conjecture that because unbounded arrays are so common in C, the compiler probably doesn't have a way in its expression trees to represent an array that has a size known at compile time. So although the information is present at the declaration, I conjecture that at the use it is already lost.
I second the recommendation of valgrind. If you are programming in C, you should run valgrind on every program, all the time until you can no longer take the performance hit.

It's not a static array.
Undefined behavior or not, it's writing to an address 13 integers from the beginning of the array. What's there is your responsibility. There are several C techniques that intentionally misallocate arrays for reasonable reasons. And this situation is not unusual in incomplete compilation units.
Depending on your flag settings, there are a number of features of this program that would be flagged, such as the fact that the array is never used. And the compiler might just as easily optimize it out of existence and not tell you - a tree falling in the forest.
It's the C way. It's your array, your memory, do what you want with it. :)
(There are any number of lint tools for helping you find this sort of thing, and you should use them liberally. They don't all work through the compiler, though; compiling and linking are often tedious enough as it is.)

The reason C doesn't do it is that C doesn't have the information. A statement like
int a[10];
does two things: it allocates sizeof(int)*10 bytes of space (plus, potentially, a little dead space for alignment), and it puts an entry in the symbol table that reads, conceptually,
a : address of a[0]
or in C terms
a : &a[0]
and that's all. In fact, in C you can interchange *(a+i) with a[i] in (almost*) all cases with no effect BY DEFINITION. So your question is equivalent to asking "why can I add any integer to this (address) value?"
* Pop quiz: what is the one case in which this isn't true?
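The equivalence of a[i] and *(a+i) can be checked directly. The three helper functions below are hypothetical names for this sketch; each reads the same element through a different spelling of the same operation.

```c
#include <stddef.h>

/* Ordinary subscript notation. */
int get_by_subscript(const int *a, size_t i)
{
    return a[i];
}

/* The definition of subscripting: add the index, then dereference. */
int get_by_pointer(const int *a, size_t i)
{
    return *(a + i);
}

/* Addition commutes, so the operands of [] may be swapped as well. */
int get_by_swapped(const int *a, size_t i)
{
    return i[a];
}
```

All three return the same value for any valid index, which is why an out-of-range subscript is just pointer arithmetic that went too far.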

The C philosophy is that the programmer is always right. So it will silently allow you to access whatever memory address you give there, assuming that you always know what you are doing and will not bother you with a warning.

I believe that some compilers do in certain cases. For example, if my memory serves me correctly, newer Microsoft compilers have a "Buffer Security Check" option which will detect trivial cases of buffer overruns.
Why don't all compilers do this? Either (as previously mentioned) the internal representation used by the compiler doesn't lend itself to this type of static analysis, or it just isn't high enough on the compiler writers' priority list. Which, to be honest, is a shame either way.

shouldn't the compiler emit a warning at the very least?
No; C compilers generally do not perform array bounds checks. The obvious negative effect of this is, as you mention, an error with undefined behavior, which can be very difficult to find.
The positive side of this is a possible small performance advantage in certain cases.
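To show what that trade-off looks like, here is a sketch of a hand-written checked accessor. The name checked_get() and its error convention (0 on success, -1 on failure) are assumptions for illustration; the extra comparison on every access is the cost that unchecked indexing avoids.

```c
#include <stddef.h>

/* Read a[i] only if i is inside [0, len). On success store the value in
 * *out and return 0; on a bad argument or index return -1. */
int checked_get(const int *a, size_t len, size_t i, int *out)
{
    if (a == NULL || out == NULL || i >= len)
        return -1;          /* out of bounds: report instead of reading */
    *out = a[i];
    return 0;
}
```

With int a[10], a call such as checked_get(a, 10, 13, &v) fails cleanly instead of touching memory past the array.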

There are some extensions for gcc that do this (on the compiler side):
http://www.doc.ic.ac.uk/~awl03/projects/miro/
On the other hand, splint, rat and quite a few other static code analysis tools would have found that.
You can also run valgrind on your code and look at the output.
http://valgrind.org/
Another widely used library seems to be libefence.
It's simply a design decision that was made once, which now leads to these things.
Regards
Friedrich

A -fbounds-checking option is available with gcc.
This article is worth going through:
http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html
'le dorfier' has given an apt answer to your question, though: it's your program, and this is the way C behaves.

Related

I want to know whether unused strings consume space, and why the program runs with unused strings

I want to know how the C compiler behaves with strings.
I am using Code::Blocks with GCC on Windows 7.
int main()
{
"1145"; "ho";
printf("hello");
}
So I want to know whether unused strings consume memory or not.
First you need to understand l(eft)-values and r(ight)-values.
l-values designate actual memory locations, where objects are stored.
r-values are data that are supposed to be stored somewhere in memory (in an l-value).
So your construct "1145"; "ho";
consists of two expression statements whose values are not assigned anywhere. You can even write this (perfectly valid) code:
int main(){
;;
printf("hello");
}
This is allowed because a lone ; is a null statement. You will often see expressions like
while(*ptr++); // advances ptr until it has stepped past a 0 element
where the body that while executes on every iteration is just ;
I'm 99% sure that these strings don't use any space at all, because GCC, even without any options, recognized the unused statement and didn't generate any code for this line.
Compiling the code shown and assuming you enabled enough warnings you can expect the following being issued by the compiler:
warning: statement with no effect [-Wunused-value]
So the compiler seems to have noticed that those strings are "unused". Knowing this, and being told to optimise, the compiler may very well remove those strings so that they do not use any memory at all.
If the compiler has been told not to optimise, the strings will be part of the program and use at least sizeof "1145" + sizeof "ho" bytes.
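The figure sizeof "1145" + sizeof "ho" can be worked out directly: each literal occupies its characters plus a terminating null byte, so 5 + 3 bytes. The helper name unused_literal_bytes() below is made up for this sketch.

```c
#include <stddef.h>

/* Storage needed for the two literals from the question if they are kept:
 * "1145" is 4 characters + '\0', "ho" is 2 characters + '\0'. */
size_t unused_literal_bytes(void)
{
    return sizeof "1145" + sizeof "ho";   /* 5 + 3 */
}
```

Note that sizeof does not evaluate its operand, so this function by itself does not force the literals into the program either.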
Further readings:
To enable GCC's warnings use its -Wxyz options.
To steer optimisation with GCC use its -O option.

string literals and strcat

I am not sure why strcat works in this case for me:
char* foo="foo";
printf(strcat(foo,"bar"));
It successfully prints "foobar" for me.
However, as per an earlier topic discussed on stackoverflow here: I just can't figure out strcat
It says, that the above should not work because foo is declared as a string literal. Instead, it needs to be declared as a buffer (an array of a predetermined size so that it can accommodate another string which we are trying to concatenate).
In that case, why does the above program work for me successfully?
This code invokes Undefined Behavior (UB), meaning that you have no guarantee of what will happen (it may well fail).
The reason is that string literals are immutable: they must not be modified, and any attempt to do so invokes UB.
Note how subtle the logical errors caused by UB can be: the code might work (today, on your system), but it is still wrong, so you are likely to miss the error and carry on as if everything were fine.
PS: In this Live Demo, I am lucky enough to get a Segmentation fault. I say lucky, because this seg fault will make me investigate and debug the code.
It's worth noting that GCC issues no warning, and the warning from Clang is not relevant to this problem either:
prog.c:7:8: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
printf(strcat(foo,"bar"));
^~~~~~~~~~~~~~~~~
prog.c:7:8: note: treat the string as an argument to avoid this
printf(strcat(foo,"bar"));
^
"%s",
1 warning generated.
String literals are immutable in the sense that the compiler will operate under the assumption that you won't mutate them, not that you'll necessarily get an error if you try to modify them. In legalese, this is "undefined behavior", so anything can happen, and, as far as the standard is concerned, it's fine.
Now, on modern platforms and with modern compilers you do have extra protections: on platforms that have memory protection the string table generally gets placed in a read-only memory area, so that modifying it will get you a runtime error.
Still, you may have a compiler that doesn't provide any of the runtime-enforced checks, either because you are compiling for a platform without memory protection (e.g. pre-80386 x86, so pretty much any C compiler for DOS such as Turbo C, most microcontrollers when operating on RAM and not on flash, ...), or with an older compiler which doesn't exploit this hardware capability by default to remain compatible with older revisions (older VC++ for a long time), or with a modern compiler which has such an option explicitly enabled, again for compatibility with older code (e.g. gcc with -fwritable-strings). In all these cases, it's normal that you won't get any runtime error.
Finally, there's an extra devious corner case: current-day optimizers actively exploit undefined behavior - i.e. they assume that it will never happen, and modify the code accordingly. It's not impossible that a particularly smart compiler can generate code that just drops such a write, as it's legally allowed to do anything it likes most for such a case.
This can be seen for some simple code, such as:
int foo() {
char *bar = "bar";
*bar = 'a';
if(*bar=='b') return 1;
return 0;
}
here, with optimizations enabled:
VC++ sees that the write is used just for the condition that immediately follows, so it simplifies the whole thing to return 0; no memory write, no segfault, it "appears to work" (https://godbolt.org/g/cKqYU1);
gcc 4.1.2 "knows" that literals don't change; the write is redundant and it gets optimized away (so, no segfault), the whole thing becomes return 1 (https://godbolt.org/g/ejbqDm);
any more modern gcc chooses a more schizophrenic route: the write is not elided (so you get a segfault with the default linker options), but if it succeeded (e.g. if you manually fiddle with memory protection) you'd get a return 1 (https://godbolt.org/g/rnUDYr) - so, memory modified but the code that follows thinks it hasn't been modified; this is particularly egregious on AVR, where there's no memory protection and the write succeeds.
clang does pretty much the same as gcc.
Long story short: don't try your luck and tread carefully. Always assign string literals to const char * (not plain char *) and let the type system help you avoid this kind of problem.
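For comparison, a version of the "foo" + "bar" concatenation that stays within defined behavior might look like the sketch below. The function name build_foobar() and its return convention are assumptions; the point is only that the destination is a writable buffer that is large enough, and that the literals are reached through const char *.

```c
#include <string.h>

/* Build "foobar" in a caller-supplied writable buffer of cap bytes.
 * Returns 0 on success, -1 if the buffer is missing or too small. */
int build_foobar(char *out, size_t cap)
{
    const char *head = "foo";   /* literals are only read, never written */
    const char *tail = "bar";

    if (out == NULL || strlen(head) + strlen(tail) + 1 > cap)
        return -1;              /* no room for both parts plus '\0' */

    strcpy(out, head);          /* out is writable storage, unlike a literal */
    strcat(out, tail);
    return 0;
}
```

A caller would declare something like char buf[16]; and print the result with printf("%s", buf), which also avoids the format-string warning Clang reported.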

How to switch off local array merging in clang?

There is simple code, where clang and gcc behave differently.
int t;
extern void abort (void);
int f(int t, const int *a)
{
const int b[] = { 1, 2, 3};
if (!t)
return f(1, b);
return b == a;
}
int main(void)
{
if (f(0, 0))
abort ();
return 0;
}
Clang:
> clang -v
clang version 4.0.1 (tags/RELEASE_401/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
> clang test.c
> ./a.out
Aborted
GCC:
> gcc -v
Target: x86_64-suse-linux
Thread model: posix
gcc version 7.2.0 (GCC)
> gcc test.c
> ./a.out
> echo $?
0
The reason is pretty obvious: the behavior is implementation-defined and clang merges constant local arrays into a global one.
But let's say I want consistent behavior. Can I turn some switch on or off in clang to disable this optimization and make it honestly create different local arrays (even constant ones) for different stack frames?
The option in clang you're looking for is -fno-merge-all-constants. And you can enable it in gcc with -fmerge-all-constants if you want to achieve the opposite. But the documentation of the option in gcc makes me curious:
Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior.
The only bit that might somehow suggest that clang is allowed to get away with this is (C11, 6.5.2.5):
String literals, and compound literals with const-qualified types, need not designate distinct objects.
The problem here is that your code doesn't have a compound literal.
There is in fact a bug report for clang about this and it appears that the developers are aware that this is non-conforming: https://bugs.llvm.org/show_bug.cgi?id=18538
The interesting comment in there is:
This is the only case I can think of (off the top of my head) where clang deliberately does not conform to the standard by default, and has a flag to make it conform. There are a few other places where we deliberately don't conform because we think the standard is wrong (and generally we try to get the standard fixed in those cases).
It does appear that clang is reusing the same array for variable b in both local scopes of f(), but that cannot be justified on the basis of implementation-defined behavior. Implementation-defined behaviors are explicitly called out in the standard, and conforming implementations document their actual behavior for each area of implementation-defined behavior. This is not an area where the standard grants such latitude.
On the contrary, the behavior of the clang-generated program is non-conforming, and Clang is non-conforming for producing such code. The standard specifically says that
For [an object with automatic storage duration] that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.
(C2011, 6.2.4/6; emphasis added)
The objects in question in this case do have automatic storage duration and do not have variable-length array type, so the standard specifies that they be distinct objects. That they are arrays with a const-qualified element type does not permit clang to reuse the array.
But let's say I want consistent behavior. Can I turn some switch on or off in clang to disable this optimization and make it honestly create different local arrays (even constant ones) for different stack frames?
You can and should report a bug against Clang, unless this issue has already been reported. The time frame for that bug being fixed is probably longer than you want to wait, but I do not find documentation of any command-line flag that would modulate this behavior.
The other answer suggests that there is in fact an option controlling this behavior. I'm entirely prepared to believe that, as I have previously found Clang's documentation to be incomplete in other ways, but it should not be necessary to explicitly turn off such an option to achieve language conformance.

Strange C code - dynamic arrays?

I have a bit of code copied from an unknown source:
int Len=0;
printf("Please input the length of vector");
scanf("%d",&Len);
float x[Len],y[Len],sig[Len];
Now normally I believe that arrays cannot be initialized during runtime with a variable. However, this does allegedly compile. Problem is that again I do not know the compiler. Is there a C variant where this is legal? The compiler I am using, IAR C, does not like it.
I am also seeing arrays indexed from 1 rather than 0, which suggests this is translated from something like Pascal originally. Any opinions?
Now normally I believe that arrays cannot be initialized during runtime with a variable.
That was true before the C99 standard. It is also illegal in C++ (although some compilers, such as gcc, offer it as an extension).
Is there a C variant where this is legal?
Any C99 compiler will do.
I am also seeing arrays indexed from 1 rather than 0
This is OK as well, as long as you are fine allocating an extra element, and not using element at index zero.
Note: since accessing an element past the end of an array is undefined behavior, an invalid program may appear to work and produce the desired result in your test runs. If you suspect that some array indexes may be off by one, consider running your program under a memory profiler, such as valgrind, to see if the program has hidden errors related to invalid memory access.
This feature was introduced in C99 and is called a VLA (Variable Length Array). These arrays are also indexed starting from 0, not 1, and ending at length-1 (Len-1 in your case), just like a normal array.
In C99 this is valid and is called a VLA.
This is called a Variable Length Array (VLA) and is a C99 feature.
If your compiler does not recognise it on its own, then try switching C standards.
Try:
--std=c99
-std=c99
--std=gnu99
-std=gnu99
The manual page of your compiler will be able to tell you the exact flag.
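If the compiler cannot be switched to C99 at all, the same effect can be had with heap allocation, which works in any C dialect. This is only a sketch of that alternative, not what the original code did; make_vector() is a hypothetical helper, and the caller is assumed to release the result with free().

```c
#include <stdlib.h>

/* Allocate a vector of len floats, all set to 0. Returns NULL when len is 0,
 * when the byte count would overflow, or when the allocation fails. */
float *make_vector(size_t len)
{
    float *v;
    size_t i;

    if (len == 0 || len > (size_t)-1 / sizeof *v)
        return NULL;

    v = malloc(len * sizeof *v);
    if (v == NULL)
        return NULL;

    for (i = 0; i < len; i++)
        v[i] = 0.0f;
    return v;
}
```

In the code from the question, float x[Len] would become float *x = make_vector(Len); followed by a NULL check, with free(x) once the vector is no longer needed. Unlike a VLA, this also lets the program reject an unreasonable Len instead of overflowing the stack.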

Why do we include stdlib.h?

C function malloc() is defined under stdlib.h.
It should give an error if we don't include this file, but this code works fine with a little warning.
My question is, if malloc() works without this header file, then why do we need to include it? Please help clarify my concepts.
# include <stdio.h>
int main()
{
int a, b, *p;
p = (int*)malloc(sizeof(int)*5);
for(a=0;a<5;a++)p[a]=a*9;
for(b=0;b<5;b++)printf("%d ",p[b]);
}
In C (before C99), unfortunately, you don't need to declare functions before using them. If the compiler encounters an unknown function, it creates an implicit declaration for it ("mmm'kay, this is how it is used, so I will assume that it returns int and takes these arguments...").
Do not rely on this "feature" and in general do not write code that compiles with warnings.
Read the warning. It says it's invalid. The compiler is simply too kind to you. In Clang this works, but it might not in other compilers.
At least include it to suppress the warning. Unnecessary warnings are annoying. Any program should compile with warnings treated as errors (I always enable that).
It appears that that's your compiler's magic. Not including the necessary headers may work on your compiler (which I suppose is by Microsoft), but it won't necessarily compile elsewhere (and that includes future versions of the same compiler). Write standard-conforming, portable code.
stdlib.h is the general-purpose standard header; it declares the dynamic memory allocation functions and other standard functions.
For example, if you want to display a message at the end of your program's execution, you may want the getch() function, which reads a character from the keyboard and thus gives the user time to read the displayed information.
Note, however, that getch() is not declared in stdlib.h: it is a non-standard function that DOS/Windows compilers declare in conio.h.
Like many things in C, the reason that an error isn't generated when there is no prototype is historical. In the early days people often didn't bother prototyping functions because pointers and integers were usually the same size and integral types smaller than an integer were promoted to an integer when passed as a parameter (and floating point was rarely used for systems programming).
If at any point they had changed the compiler to give an error if a function was not prototyped then it would have broken many programs and would not have gained widespread acceptance.
With 64 bit addressing we are now entering a period when integers and pointers are not the same size and programs will most likely break if you do not prototype functions like malloc() that return a pointer.
In gcc always set the following options for your own programs: -Werror -Wstrict-prototypes
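For reference, a version of the question's allocation with the header included could look like the sketch below. The function name make_multiples() is invented here; the substance is that stdlib.h supplies the real declaration of malloc(), so no cast is needed, the returned pointer is not truncated on 64-bit systems, and the result is checked before use.

```c
#include <stdlib.h>   /* declares malloc() and free() */

/* Return a newly allocated array of n ints holding 0, factor, 2*factor, ...
 * Returns NULL if n is 0 or the allocation fails; the caller frees it. */
int *make_multiples(size_t n, int factor)
{
    int *p;
    size_t i;

    if (n == 0 || n > (size_t)-1 / sizeof *p)
        return NULL;

    p = malloc(n * sizeof *p);   /* no cast: the prototype is visible */
    if (p == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        p[i] = (int)i * factor;
    return p;
}
```

Calling make_multiples(5, 9) reproduces the values the original loop stored (0, 9, 18, 27, 36), and the program should call free() on the pointer when it is done.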
