For straight C and GCC, why doesn't the pointed-to string get corrupted here?
#include <stdio.h>
int main(int argc, char *argv[])
{
char* str_ptr = NULL;
{
//local to this scope-block
char str[4]={0};
sprintf(str, "AGH");
str_ptr = str;
}
printf("str_ptr: %s\n", str_ptr);
getchar();
return 0;
}
|----OUTPUT-----|
str_ptr: AGH
|--------------------|
Here's a link to the above code compiled and executed using an online compiler.
I understand that if str was a string literal, str would be stored in the bss ( essentially as a static ), but sprintf(ing) to a stack-allocated buffer, I thought the string buffer would be purely stack-based ( and thus the address meaningless after leaving the scope block )? I understand that it may take additional stack allocations to over-write the memory at the given address, but even using a recursive function until a stack-overflow occurred, I was unable to corrupt the string pointed to by str_ptr.
FYI I am doing my testing in a VS2008 C project, although GCC seems to exhibit the same behavior.
While nasal lizards are a popular part of C folklore, code whose behaviour is undefined can actually exhibit any behaviour at all, including magically resuscitating variables whose lifetime has expired. The fact that code with undefined behaviour can appear to "work" should neither be surprising nor an excuse to neglect correcting it. Generally, unless you're in the business of writing compilers, it's not very useful to examine the precise nature of undefined behaviour in any given environment, especially as it might be different after you blink.
In this particular case, the explanation is simple, but it's still undefined behaviour, so the following explanation cannot be relied upon at all. It might at any time be replaced with reptilian emissions.
Generally speaking, C compilers will make each function's stack frame a fixed size, rather than expanding and contracting as control flow enters and leaves internal blocks. Unless called functions are inlined, their stack frames will not overlap with the stack frame of the caller.
So, in certain C compilers with certain sets of compile options and except for particular phases of the moon, the character array str will not be overwritten by the call to printf, even though the variable's lifetime has expired.
Most likely the compiler does some sort of simple optimizations resulting in the string still being in the same place on the stack. In other words, the compiler allows the stack to grow to store 'str'. But it doesn't shrink the stack in the scope of main, because it is not required to do so.
If you really want to see the result of saving the address of variables on the stack, call a function.
#include <stdio.h>
char * str_ptr = NULL;
void onstack(void)
{
char str[4] = {0};
sprintf(str,"AGH");
str_ptr = str;
}
int main(int argc, char *argv[])
{
onstack();
int x = 0x61626364;
printf("str_ptr: %s\n", str_ptr);
printf("x:%i\n",x);
getchar();
return 0;
}
With gcc -O0 -std=c99 strcorrupt.c I get random output on the first printf. It will vary from machine to machine and architecture to architecture.
Related
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Undefined, unspecified and implementation-defined behavior
This should seg fault. Why doesn't it.
#include <string.h>
#include <stdio.h>
char str1[] = "Sample string. Sample string. Sample string. Sample string. Sample string. ";
char str2[2];
int main ()
{
strcpy (str2,str1);
printf("%s\n", str2);
return 0;
}
I am using gcc version 4.4.3 with the following command:
gcc -std=c99 testString.c -o test
I also tried setting optimisation to o (-O0).
This should seg fault
There's no reason it "should" segfault. The behaviour of the code is undefined. This does not mean it necessarily has to crash.
A segmentation fault only occurs when you perform an access to memory the operating system knows you're not supposed to.
So, what's likely going on, is that the OS allocates memory in pages (which are typically around 4KiB). str2 is probably on the same page as str1, and you're not running off the end of the page, so the OS doesn't notice.
That's the thing about undefined behavior. Anything can happen. Right now, that program actually "works" on your machine. Tomorrow, str2 may be put at the end of a page, and then segfault. Or possibly, you'll overwrite something else in memory, and have completely unpredictable results.
edit: how to cause a segfault:
Two ways. One is still undefined behavior, the other is not.
int main() {
*((volatile char *)0) = 42; /* undefined behavior, but normally segfaults */
}
Or to do it in a defined way:
#include <signal.h>
int main() {
raise(SIGSEGV); /* segfault using defined behavior */
}
edit: third and fourth way to segfault
Here is a variation of the first method using strcpy:
#include <string.h>
const char src[] = "hello, world";
int main() {
strcpy(0, src); /* undefined */
}
And this variant only crashes for me with -O0:
#include <string.h>
const char src[] = "hello, world";
int main() {
char too_short[1];
strcpy(too_short, src); /* smashes stack; undefined */
}
Your program writes beyond the allocated bounds of the array, this results in Undefined Behavior.
The program is ill-formed and It might crash or may not.An explanation may or may not be possible.
It probably doesn't crash because it overwrites some memory beyond the array bounds which is not being used, bt it will once the rightful owner of that memory tries to access it.
A seg-fault is NOT guaranteed behavior.It is one possible (and sometimes likely) outcome of doing something bad.Another possible outcome is that it works by pure luck.A third possible outcome is nasal demons.
if you really want to find out what this might be corrupting i would suggest you see what follows the over-written memory generate a linker map file that should give you a fair idea but then again this all depends on how things are layed out in memory, even can try running this with gdb to reason why it does or does not segfault, that being said, the granularity for built checks in access violations (HW assisted) cannot be finer than a page unless some software magic is thrown in (even with this page granularity access checking it may happen that the immediately next page does really point to something else for the program which you are executing and that it is a Writable page), someone who knows about valgrind can explain how it is able to detect such access violations (also libefence), most likely (i might be very wrong with this explanation, Correct me if i am wrong!) it uses some form of markers or comparisons for checking if out of bounds access has happened.
A program accessing illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go in Windows, Linux, OpenVMS, and Mac OS and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved instruction register for main() to return, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that has copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segment fault could happen if there is too little environment data. Or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by cleaning out the environment variables and re-running the program.
This code would not be so harmless if any of the C runtime conventions are not true:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
Whether main() is called with arguments. (Some light duty environments, like embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that does just mean that you can't really rely on what it does, not that it should crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
It would be crashed with great probability if you write to unacceed area. But you are reading, it can be ok. But the behaviour will be still undefined.
Since there is an option -fstack-protector-strong in gcc to detect stack smashing. However, it can not always detect stack buffer overflow. For the first function func, when I input a 10 char more string, the program does not always crash. My question is where there is a way to detect stack buffer overflow.
void func()
{
char array[10];
gets(array);
}
void func2()
{
char buffer[10];
int n = sprintf(buffer, "%s", "abcdefghpapeas");
printf("aaaa [%d], [%s]\n", n, buffer);
}
int main ()
{
func();
func2();
}
Overflows on the stack are either hard to detect or very expensive to detect - chose your poison.
In a nutshell, when you have this:
char a,b;
char *ptr=&a;
ptr[1] = 0;
then this is technically legal: There is space allocated on the stack which belongs to the function. It's just very dangerous.
So the solution might be to add a gap between a and b and fill that with a pattern. But, well, some people actually write code as above. So your compiler needs to detect that.
Alternatively, we could create a bit-map of all bytes that your code has really allocated and then instrument all the code to check against this map. Very safe, pretty slow, bloats your memory usage. On the positive side, there are tools to help with this (like Valgrind).
See where I'm going?
Conclusion: In C, there is no good way to automatically detect many memory problems because the language and the API is often too sloppy. The solution is to move code into helper functions that check their parameters rigorously, always to the right thing and have good unit test coverage.
Always use snprintf() versions of functions if you have a choice. If old code uses the unsafe versions, change it.
You can use a tool called Valgrind
http://valgrind.org/
My question is where there is a way to detect stack buffer overflow...
void func()
{
char array[10];
gets(array);
}
void func2()
{
char buffer[10];
int n = sprintf(buffer, "%s", "abcdefghpapeas");
printf("aaaa [%d], [%s]\n", n, buffer);
}
Because you are using GCC, you can use FORTIFY_SOURCES.
FORTIFY_SOURCE uses "safer" variants of high risk functions like memcpy, strcpy and gets. The compiler uses the safer variants when it can deduce the destination buffer size. If the copy would exceed the destination buffer size, then the program calls abort(). If the compiler cannot deduce the destination buffer size, then the "safer" variants are not used.
To disable FORTIFY_SOURCE for testing, you should compile the program with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.
The C Standard has "safer" functions via ISO/IEC TR 24731-1, Bounds Checking Interfaces. On conforming platforms, you can simply call gets_s and sprintf_s. They offer consistent behavior (like always ensuring a string is NULL terminated) and consistent return values (like 0 on success or an errno_t).
Unfortunately, gcc and glibc does not conform to the C Standard. Ulrich Drepper (one of the glibc maintainers) called bounds checking interfaces "horribly inefficient BSD crap", and they were never added. Hopefully it will change in the future.
First of all Do Not Use gets. By now almost everyone knows the all the security and reliability problems that can occur with gets. But it's included here for historical reasons as well because it's a very good example of bad programming.
Let's look at all the problems with the code:
// Really bad code
char line[100];
gets(line);
Because gets does not do bounds checking a string longer than 100 characters will overwrite memory. If you're lucky the program will just crash Or it might exhibit strange behavior.
The gets function is so bad that the GNU gcc linker issues a warning whenever it's used.
/tmp/ccI5WJ5m.o(.text+0x24): In function `main':
: warning: the `gets' function is dangerous and should not be used.
Protect array accesses with assert
C/C++ does not do bound checking.
for example:
int data[10]
i = 20
data[20] = 100 //Memory Corruption
Use the assert function for above code
#include<assert.h>
int data[10];
i=20
assert((i >= 0) && (i < sizeof(data) / sizeof(data[0]))); // throws
data[i] = 100
Array overflows are one of the most common programming errors and are extremely frustrating to try and locate. This code doesn't eliminate them, but it does cause buggy code to abort early in a way that makes the problem tremendously easier to find.
And use snprintf(buffer, sizeof(buffer), "%s", "abcdefghpapeas") and some tools like valgrind or GDB.
Hope this helps you..
A program accessing illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go in Windows, Linux, OpenVMS, and Mac OS and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved instruction register for main() to return, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that has copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segment fault could happen if there is too little environment data. Or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by cleaning out the environment variables and re-running the program.
This code would not be so harmless if any of the C runtime conventions are not true:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
Whether main() is called with arguments. (Some light duty environments, like embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that does just mean that you can't really rely on what it does, not that it should crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
It would be crashed with great probability if you write to unacceed area. But you are reading, it can be ok. But the behaviour will be still undefined.
Here's a simple example of a program that concatenates two strings.
#include <stdio.h>
void strcat(char *s, char *t);
void strcat(char *s, char *t) {
while (*s++ != '\0');
s--;
while ((*s++ = *t++) != '\0');
}
int main() {
char *s = "hello";
strcat(s, " world");
while (*s != '\0') {
putchar(*s++);
}
return 0;
}
I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?
Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.
I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.
Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)
It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.
You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.
It also depends on the how the pointer is declared. For example, can change ptr, and what ptr points to:
char * ptr;
Can change what ptr points to, but not ptr:
char const * ptr;
Can change ptr, but not what ptr points to:
const char * ptr;
Can't change anything:
const char const * ptr;
According to the C99 specifification (C99: TC3, 6.4.5, §5), string literals are
[...] used to initialize an array of static storage duration and length just
sufficient to contain the sequence. [...]
which means they have the type char [], ie modification is possible in principle. Why you shouldn't do it is explained in §6:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
I'm wondering why it works
It doesn't. It causes a Segmentation Fault on Ubuntu x64; for code to work it shouldn't just work on your machine.
Moving the modified data to the stack gets around the data area protection in linux:
int main() {
char b[] = "hello";
char c[] = " ";
char *s = b;
strcat(s, " world");
puts(b);
puts(c);
return 0;
}
Though you then are only safe as 'world' fits in the unused spaces between stack data - change b to "hello to" and linux detects the stack corruption:
*** stack smashing detected ***: bin/clobber terminated
The compiler is allowing you to modify s because you have improperly marked it as non-const -- a pointer to a static string like that should be
const char *s = "hello";
With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
Two observations:
The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).