While executing the following program in Visual Studio 2012 Console App:
#include <stdio.h>
int main() {
int integer1, integer2, sum;
char str[5];
scanf("%s",str); /* Try to enter 10 chars */
printf("%s\n",str);
printf( "Enter first integer\n" );
scanf( "%d", &integer1 );
printf( "Enter second integer\n" );
scanf( "%d", &integer2 );
sum = integer1 + integer2;
printf( "Sum = %d\n", sum );
return 0;
}
It throws a "StackOverFlow" exception, which is clearly caused by this statement:
scanf("%s",str); /* Try to enter 10 chars */
My question is: Why does the program continue executing (printing the str string, asking for the 2 integers, summing them and printing the result) even though the exception should have happened earlier?
The stack grows down, from high addresses to lower addresses. A CPU register, the stack pointer, keeps track of the top of the stack - which is in reality at the lowest address, because the stack grows towards lower addresses. The compiler looks at your function (main in this case) to see how much automatic storage it needs, that is, storage for local variables. It generates code to decrement the stack pointer by the amount of local storage needed by your function. When the function gets called, the caller pushes the return address on the stack (decrementing the stack pointer) and then branches to the called function, which in turn decrements the stack pointer (creating a stack frame) to make room for local variables.
If a program overflows the local variables (as yours did) it is likely to trash the return address. Since the stack grew down towards lower addresses, writing beyond the stack frame (towards higher addresses) will overwrite older stack frames (your caller, and the caller's caller, etc).
Although main() is the first function to be called in your program, there is already an active stack frame, corresponding to main()'s caller, which is the runtime environment.
Any side effects of trashing the stack (like overwriting the return address) won't be noticed until your function, main() in this case, tries to return. What happens then is anyone's guess. If the return address was overwritten with a value that points to a location on the stack, the CPU will branch there. This is the classic exploit used by malicious code that takes advantage of overflows in buffers allocated on the stack.
These links are helpful in understanding stack based buffer overflow:
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow3.html
http://en.wikipedia.org/wiki/Format_string_attack
Recent microprocessors provide a security feature that prevents execution of data. With it enabled, the CPU raises an exception as soon as your program attempts to return to a corrupted address that points to data (like the stack).
http://en.wikipedia.org/wiki/NX_bit
Because C doesn't check everything (anything?). Your long string has scribbled on the stack, and the corruption is only noticed when the function returns.
It's worth noting that safe versions of scanf type functions should always be used.
In C, code can't throw exceptions. Also, scanf() doesn't check the stack.
What probably happens is that Visual Studio creates the environment for your program, including setting up the stack. While it does that, it fills the stack with a pattern.
When main() returns, the pattern is checked. Only at that time, the C runtime will notice that you trashed the stack.
Conclusion: Never use the unsafe versions of scanf() and sprintf(). The runtime might catch the error but it will do it too late and even when you get an error message, that won't help you one bit to find out when it happened.
A program accessing illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go in Windows, Linux, OpenVMS, and Mac OS and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved instruction register for main() to return, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that have copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segment fault could happen if there is too little environment data. Or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by cleaning out the environment variables and re-running the program.
This code would not be so harmless if any of the C runtime conventions are not true:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
main() is called with arguments. (Some light duty environments, like embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that does just mean that you can't really rely on what it does, not that it should crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
It would very probably crash if you wrote to an inaccessible area. But you are only reading, so it can be OK. The behaviour is still undefined, though.
The problem seems to be the "*p = 20;" command, although I simply do not get why. Whenever I add it, I get the error "stack around the variable 'var' was corrupted".
main(void)
{
int* p;
int var;
p = &var;
*p = 16;
p++;
*p = 20;
system("pause");
}
After this statement
p++;
the pointer p does not point to a valid object (it points now to the memory beyond the object var of the type int). Thus this statement
*p = 20;
results in undefined behavior. That is
stack around the variable 'var' was corrupted
Vlad from Moscow's answer is correct. Here's some additional information on why that is that I thought would be useful.
The reason you are getting this error is that you're overwriting protected memory on the stack, as Vlad from Moscow has shown. Compilers can protect you against stack corruption. There are various ways to implement this, but one way is to use a "canary," which is just a value or values at a specific memory location. If those values are changed, the generated check knows that the stack was corrupted and can give you an error message like "stack around the variable 'var' was corrupted." In your case, the memory location 4 bytes beyond the variable var is probably a canary value on the stack (if your particular compiler uses canaries), which you are changing, and that causes the error.
See the Wikipedia article on buffer overflow protection for more information.
Here's an excerpt from the article:
Typically, buffer overflow protection modifies the organization of stack-allocated data so it includes a canary value that, when destroyed by a stack buffer overflow, shows that a buffer preceding it in memory has been overflowed. By verifying the canary value, execution of the affected program can be terminated, preventing it from misbehaving or from allowing an attacker to take control over it. Other buffer overflow protection techniques include bounds checking, which checks accesses to each allocated block of memory so they cannot go beyond the actually allocated space, and tagging, which ensures that memory allocated for storing data cannot contain executable code.
Edit:
I actually really like the next paragraph in the wiki article as well:
Overfilling a buffer allocated on the stack is more likely to influence program execution than overfilling a buffer on the heap because the stack contains the return addresses for all active function calls. However, similar implementation-specific protections also exist against heap-based overflows.
#include<stdio.h>
int *fun();
int main()
{
int *ptr;
ptr=fun();
printf("%d",*ptr);
printf("%d",*ptr);
}
int * fun()
{
int k=4;//If auto then cannot print it two times.....stack will be changed
return(&k);
}
O/P: 4
-2
Calling printf() for the first time prints the correct value.
But after calling any function (even printf( ) ) immediately after the call to fun( ), printf( ) prints a garbage value. Why does this happen? Why do we not get a garbage value in the first print statement itself?
This is not behavior you can rely on; it may and likely will differ on different systems, even different versions of the compiler or different compiler switches.
Given that, what is likely happening is this: fun returns a pointer to where it stored k. That part of the stack is no longer reliable, because the function that allocated it has exited. Nonetheless, nobody has written over it yet, so the 4 is still where it was written. Then main prepares to call printf. To do so, it gets the first argument, *ptr. To do this, it loads from the place ptr points, which is the (former) address of k, so the load gets the 4 that is there. This 4 is stored in a register or stack location to be passed to printf. Then the address of the format string, "%d", is stored to be passed to printf. Then printf is called. At this point, printf uses a great deal of stack and writes new data where k used to be. However, the 4 that was passed as an argument is in a safe place, where arguments to printf should be, so printf prints it. Then printf returns. Then the main routine prepares to call printf again. This time, when it loads from where ptr points, the 4 is no longer there; it is some value that was written during the first call to printf. So that value is what is passed to printf and is what is printed.
Never write code that uses this behavior. It is not reliable, and it is not proper code.
Why does it surprise you? The behavior is undefined, but there's nothing unusual in observing what you observed.
All variables live somewhere in memory. When a variable gets formally destroyed (like local variables when function exits) the memory it used to occupy still exists and, most likely, still holds the last value that was written to it. That memory is now officially free, but it will continue to hold that last value until some other code reuses that memory for other purposes and overwrites it.
This is what you observe in your experiment. Even though the variable k no longer exists, pointer ptr still points to its former location. And that former location still happens to hold the last value of k, which is 4.
The very first printf "successfully" receives a copy of that value for printing. And that very first printf is actually the one that reuses the old memory location and overwrites the former value of k. All further attempts to dereference ptr will show that 4 is no longer there, which is why your second printf prints something else.
Variable k is local to fun(), which means it is destroyed when the function returns. Returning its address is a very bad coding technique and will always lead to problems.
And the reason why the first printf returns the correct value:
First of all, it might or might not return the value. Suppose k is written somewhere in stack memory. The first time, right after the function returns, printf might get the correct value because that part of memory might still hold it for a while. But this is not guaranteed.
Here is my code
#include<stdio.h>
int * fun(int a1,int b)
{
int a[2];
a[0]=a1;
a[1]=b;
//int c=5;
printf("%x\n",&a[0]);
return a;
}
int main()
{
int *r=fun(3,5);
printf("%d\n",r[0]);
printf("%d\n",r[0]);
}
I am running codeblocks on Windows 7
Every time I run the loop I get the outputs as
22fee8
3
2293700
Here is the part I do not understand :
r expects a pointer to a part of memory which is interpreted as a sequence of boxes (each box of 4 byte width - >Integers ) on invoking fun function
What should happen is printf of function will print the address of a or address of a[0]:
Seconded
NOW THE QUESTION IS :
Why do I get the same address each time I run the program?
The array a should be destroyed at the end of function fun; only the pointer should remain after the function call.
Then why on earth does r[0] print 3?
r is pointing to something that doesn't exist anymore. You are returning a pointer to something on the stack. That stack will rewind when fun() ends. It can point to anything after that but nothing has overwritten it because another function is never called.
Nothing forces r[0] to be 3 - it's just a result of going for the simplest acceptable behaviour.
Basically, you're right that a must be destroyed at the end of fun. All this means is that the returned pointer (r in your case) is completely unreliable. That is, even though r[0] == 3 for you on your particular machine with your particular compiler, there's no guarantee that this will always hold on every machine.
To understand why it is so consistent for you, think about this: what does it mean for a to be destroyed? Only that you can't use it in any reliable way. The simplest way of satisfying this simple requirement is for the stack pointer to move back to the point where fun was called. So when you use r[0], the values of a are still present, but they are junk data - you can't count on them existing.
This is what happens:
int a[2]; is allocated on the stack (or similar). Suppose it gets allocated at the stack at address 0x12345678.
Various data gets pushed on the stack at this address, as the array is filled. Everything works as expected.
The address 0x12345678 pointing at the stack gets returned. (Ironically, the address itself likely gets returned on the stack.)
The memory allocated on the stack for a ceases to be valid. For now the two int values still sit at the given address in RAM, containing the values assigned to them. But the stack pointer isn't reserving those cells, nor is anything else in the program keeping track of that data. Computers don't delete data by erasing the value; they delete cells by forgetting that anything of use is stored at that memory location.
When the function ends, those memory cells are free to be used for the rest of the program. For example, a value returned by the function might end up there instead.
The function returned a pointer to a segment on the stack where there used to be valid data. The pointer is still 0x12345678 but at that address, anything might be stored by now. Furthermore, the contents at that address may change as different items are pushed/popped from the stack.
Printing the contents of that address will therefore give random results. Or it could print the same garbage value each time the program is executed. In fact it isn't guaranteed to print anything at all: printing the contents of an invalid memory cell is undefined behavior in C. The program could even crash when you attempt it.
r is no longer valid once the stack frame of int * fun(int a1,int b) is released, right after the function ends, so r[0] can be 3 or 42 or any other value. It still contains the value you expect because that memory has not been reused yet: a chunk of memory is reserved for your program's stack, and nothing has used that part of it since fun() returned. After the first 3 is printed you get another value, which means that part of the stack was used for something else. You can blame printf(), since it is the only thing running and it does a LOT of work to get those numbers onto the console.
Why does it always print the same results? Because you always do the same process, there's no magic in it. But there's no guarantee that it'll be 3 since that 'memory space' is not yours and you are only 'peeking' into it.
Also, check the optimization level of your compiler. fun() and main(), being as simple as they are, could be inlined or replaced when the binary is optimized, which would reduce the 'randomness' you expect in your results. Although I wouldn't expect it to change much either.
You can find pretty good answers here:
can-a-local-variables-memory-be-accessed-outside-its-scope
returning-the-address-of-local-or-temporary-variable
return-reference-to-local-variable
Though the examples are for C++, the underlying idea is the same.