Why does a program accessing illegal pointer to pointer not crash? - c

A program that accesses an illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go on Windows, Linux, OpenVMS, and Mac OS, and none of them have ever complained.
#include <stdio.h>
#include <string.h>

void printx(void *rec) { // I know this should have been a **
    char str[1000];
    memcpy(str, rec, 1000);
    printf("%*.s\n", 1000, str);
    printf("Whoa..!! I have not crashed yet :-P");
}

int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}

I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992, depending on the size of a pointer) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of the stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved return address for main() to return to, usually into the C runtime library's startup code. In all implementations I have investigated, the stack frames below that have copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except that some non-printable characters will appear, which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during the access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by clearing out the environment variables and re-running the program.
This code would not be so harmless if any of these common C runtime conventions did not hold:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
main() is called with arguments. (Some light-duty environments, such as embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
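For example, a minimal sketch along these lines (my illustration, assuming a conventional downward-growing stack and a hosted Unix-like environment; in strict C terms it is still undefined behaviour) dumps the bytes just above a local pointer variable while filtering out non-printable characters so the terminal is not disturbed:

#include <stdio.h>
#include <ctype.h>

int main(void) {
    void *x = 0;
    const unsigned char *p = (const unsigned char *)&x;

    // Peek a little way "up" the stack: on typical implementations this walks
    // through main()'s frame, the startup code's frame, argv and the
    // environment strings, exactly as described above.
    for (size_t i = 0; i < 256; i++) {
        unsigned char c = p[i];
        putchar(isprint(c) ? c : '.');
    }
    putchar('\n');
    return 0;
}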

Illegal memory access is undefined behaviour. This means that your program might crash, but it is not guaranteed to, because the exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that does just mean that you can't really rely on what it does, not that it should crash.

If you dereference an invalid pointer, you are invoking undefined behaviour, which means the program can crash, it can work, it could cook some coffee... whatever.

When you have
int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}
you are declaring x as a pointer with value 0, and that pointer lives on the stack since it is a local variable. Now, you are passing printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or, in fact, from the stack itself) to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same mapped page(s), since you are copying just 1000 bytes, so you get no segmentation fault. Ultimately, though, as already written, we are talking about undefined behavior.
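If you want to check this on your own machine, a small sketch like the following (assuming a POSIX system, where sysconf(_SC_PAGESIZE) reports the page size) shows which page(s) the 1000-byte read starting at &x would touch:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int main(void) {
    void *x = 0;
    uintptr_t start = (uintptr_t)&x;
    long page = sysconf(_SC_PAGESIZE);   // typically 4096 bytes

    // If both page indices match, the whole 1000-byte read stays within the
    // page that holds x itself, which is certainly mapped.
    printf("page of first byte: %lu\n", (unsigned long)(start / (uintptr_t)page));
    printf("page of last  byte: %lu\n", (unsigned long)((start + 999) / (uintptr_t)page));
    return 0;
}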

Writing to an inaccessible area would very probably crash; since you are only reading, it can appear to be OK. But the behaviour is still undefined.

Related

Why is Pointer = NULL defined as clean code?

So my programming teacher told us that if you declare a pointer but don't use it yet, it is always better to initialize it with NULL. How does that prevent any errors if I don't even use it?
Or, if I am wrong, what are the benefits of doing it?
Dereferencing NULL is likely (guaranteed?) to cause a segmentation fault, immediately crashing your app and alerting you to the unsafe memory access you just performed.
Leaving your pointer uninitialized means it still holds whatever junk was left over from the previous user of that memory. It's entirely possible for that to be a pointer to a real memory region in your app. Dereferencing it is undefined behavior; it might cause a seg fault, or it might not. The latter is the worst case, and the one you should fear: your app will just keep chugging along with whatever nonsense behavior resulted from that.
Here's a demonstration:
#include <stdio.h>
#include <stdlib.h>

void i_segfault() {
    fflush(stdout); // Flush whatever is left of STDOUT before we blow up
    int *i = NULL;
    printf("%i", *i); // Boom
}

void use_some_memory() {
    int *some_pointer = malloc(sizeof(int));
    *some_pointer = 123;
    printf(
        "I used the address %p to store a pointer to %p, which contains %i\n",
        &some_pointer, some_pointer, *some_pointer
    );
}

void i_dont_segfault() {
    int *my_new_pointer; // uninitialized
    printf(
        "I re-used the address %p, which still has a lingering value %p, which still points to %i\n",
        &my_new_pointer, my_new_pointer, *my_new_pointer
    );
}

int main(int argc, char *argv[]) {
    use_some_memory();
    i_dont_segfault();
    i_segfault();
}
There are at least two ways to make sure your program does not use an uninitialized pointer:
You can design and write your program carefully so that you are sure that program control never flows through a path that uses the pointer without initializing it first.
You can initialize the pointer to NULL.
The first can be hard, depending on the program, and humans keep making mistakes with it. The second is easy.
On the other hand, the second only solves one problem: it ensures that if you use the pointer without otherwise initializing it, it has a known value. Further, on many systems, it ensures that if you attempt to use that value to access a pointed-to object, your program will crash rather than do something worse, like corrupt data and produce wrong results or erase valuable information.
That is a useful problem to solve, because it means this bug of failing to assign the desired value is likely to be caught during testing, and even if it is not, the damage it may do is likely to be limited. However, it does not solve the problem of ensuring that the pointer is assigned the desired value before it is used. So, like many such coding recommendations, it is a useful tip that helps limit human error, but it is not a complete solution.
If you don't initialize a variable (by mistake), the problem is that your program can behave even more erratically than usual. Imagine you have declared an enum variable with the possible values ONE, TWO, and THREE, you forget to initialize it, and at some point your code includes the following:
switch (my_var) {
    case ONE: /* do one */
        ...
        break;
    case TWO: /* do two */
        ...
        break;
    case THREE: /* do three */
        ...
        break;
}
and you can go nuts, because you assume that at least one of the possible values of the type must be handled... but that's simply not true, as the variable has not been initialized and its value does not even need to comply with the constraints of the data type it represents.
Initializing a variable may add a couple of extra instructions to your code, but it saves a lot of nightmares when hunting for errors.
In the case of an uninitialized pointer the situation is worse, as many programmers free() memory allocated from the heap based on the test
if (var) free(var);
and an uninitialized automatic variable of pointer type will most probably point somewhere non-null, so the code will still try to free() it.
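As an illustration, here is a minimal sketch (the function and its parameter are made up for the example) of why initializing to NULL makes that idiom safe; free(NULL) is defined to do nothing:

#include <stdlib.h>

void process(int want_buffer) {
    char *buf = NULL;          /* initialized: free(NULL) is a guaranteed no-op */

    if (want_buffer)
        buf = malloc(128);

    /* ... work with buf if it was allocated ... */

    free(buf);                 /* safe whether or not malloc() ever ran */
    /* Had buf been left uninitialized and want_buffer been 0, free(buf)
       would hand a garbage pointer to the allocator: undefined behaviour,
       quite possibly heap corruption. */
}

int main(void) {
    process(0);                /* no allocation: free(NULL) does nothing */
    process(1);                /* allocation happens and is released */
    return 0;
}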

Accessing array beyond its size [duplicate]


In which manner array of characters is allocated memory locally? [duplicate]

This is the code I've written,
#include <stdio.h>
#include <string.h>

char *foo();

void main()
{
    char *str = foo();
    strcpy(str, "Holy sweet moses! I blew my stack!!");
    printf("%s", str);
}

char *foo()
{
    char str[256];
    return str;
}
When I use a char array in foo(), the strcpy() in main() doesn't copy the string into str. But when I use an int array in foo(), the strcpy() in main() copies successfully.
i.e. with
    int str[256]; // in function foo()
the output is
    Holy sweet moses! I blew my stack!!
but with
    char str[256]; // in foo()
the output is: nothing!
What you are doing is clearly undefined behaviour, but... let's try to understand WHY it works with ints and not with chars.
TL;DR: printf uses the stack, overwriting some of the space pointed to by str; but since the int array is bigger than the char array, its memory sits "further ahead" in the stack and does not get overwritten.
An int is typically 4 bytes, so 256 ints will be 1024 bytes.
If the array is on the stack, str will point to, for example, RBP - 1024.
With chars, a char is 1 byte, so 256 chars will be 256 bytes.
If the array is on the stack, str will point to, for example, RBP - 256.
What does this mean? The str pointer will point 1024 or 256 bytes "ahead" of the current stack pointer when foo returns.
SO... when you call strcpy(str, "yourstring");, that memory can be overwritten by the stack that strcpy and printf themselves use. The thing is that it IS overwritten, but not the whole stack, just a little; enough, though, to cover 256 bytes, and thus those functions can overwrite the copied string. This doesn't happen with your int array, because there the string is copied 1024 bytes ahead of the stack pointer, and strcpy and printf don't use that much stack.
So the picture is roughly this: the char buffer ends up only 256 bytes below the frames that strcpy and printf later build, while the int buffer ends up 1024 bytes below them, out of their reach. If you change the size of the char array it will probably work.
All of this is undefined behaviour and completely depends on your architecture, computer and compiler. I'm using Linux x86_64 at the moment.
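To make that concrete, here is a small sketch of my own (it assumes a conventional downward-growing stack such as Linux x86_64, and it demonstrates nothing the C standard guarantees) that prints how far each local array sits below a local variable in the caller:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

static void with_char_array(uintptr_t caller_local) {
    char buf[256];
    printf("char buf[256] starts %td bytes below the caller's local\n",
           (ptrdiff_t)(caller_local - (uintptr_t)buf));
}

static void with_int_array(uintptr_t caller_local) {
    int buf[256];
    printf("int  buf[256] starts %td bytes below the caller's local\n",
           (ptrdiff_t)(caller_local - (uintptr_t)buf));
}

int main(void) {
    int anchor = 0;
    // The char buffer typically lands only a few hundred bytes below main()'s
    // frame; the int buffer lands roughly 1 KiB below, further out of reach of
    // the stack space later used by strcpy() and printf().
    with_char_array((uintptr_t)&anchor);
    with_int_array((uintptr_t)&anchor);
    return 0;
}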
Are you aware of the concept of scope and lifetime of variables? If you are, then you know that what you are trying to do invokes undefined behaviour. You are lucky your code even prints something (or prints nothing at all) rather than crashing outright from referencing memory it no longer owns.
From another SO answer:
To what extent are the stack and heap controlled by the OS or language runtime?
The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
What is their scope?
The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.
What determines the size of each of them?
The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
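As a side note (mine, not part of the quoted answer), on a POSIX system you can query the current stack limit with getrlimit():

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("stack limit: unlimited\n");
        else
            printf("stack limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
    }
    return 0;
}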
When your function foo() returns, it returns the address of a string on the stack. When the function exits, your pointer is useless, because your string is removed from the stack when foo() stops. So you have a pointer to some place in memory, but you can't tell what's there.
Read the compiler warnings; I bet there was at least one telling you that your function returns a pointer to a local variable. (Mine does, and I have made that mistake a few times.)
From your string about blowing the stack I conclude that you know that what you're doing is wrong. Therefore my answer will be: Undefined behavior is undefined. When writing to memory that's not yours to write to anything can happen including things that you might expect and things that you might not expect. Other undefined behaviors might be marginally interesting to explore, writing to memory that doesn't belong to you isn't. It's always wrong, it will always do something you don't expect and there's no situation where the correct solution is anything other than not doing it anymore.
When you change the array from char to int, you change the size of the array, and since the stack most likely grows down on your architecture, it changes the address of the memory you're not supposed to overwrite in the first place.

pointer to a structure

This gives proper output even though I have not allocated memory and have only declared a pointer to struct two inside main:
struct one
{
    char x;
    int y;
};

struct two
{
    char a;
    struct one *ONE;
};

main()
{
    struct two *TWO;
    scanf("%d", &TWO->ONE->y);
    printf("%d\n", TWO->ONE->y);
}
But when I declare the pointer to struct two after the structure definition, outside main, I get a segmentation fault. Why is it that I don't get a segmentation fault in the previous case?
struct one
{
    char x;
    int y;
};

struct two
{
    char a;
    struct one *ONE;
} *TWO;

main()
{
    scanf("%d", &TWO->ONE->y);
    printf("%d\n", TWO->ONE->y);
}
In both cases TWO is a pointer to an object of type struct two.
In case 1 the pointer is wild and can be pointing anywhere.
In case 2 the pointer is NULL, as it is global.
But in both cases it is a pointer that does not point to a valid struct two object. Your code in scanf is treating this pointer as though it referred to a valid object. This leads to undefined behavior.
Because what you are doing is undefined behaviour. Sometimes it seems to work. That doesn't mean you should do it :-)
The most likely explanation has to do with how the variables are initialised. Automatic variables (on the stack) get whatever garbage happened to be on the stack when the stack pointer was decremented.
Variables outside functions (like in the second case) are always initialised to zero (null pointer for pointer types).
That's the basic difference between your two situations but, as I said, the first one is working purely by accident.
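A tiny sketch of that difference, purely as illustration (not part of the original answer):

#include <stdio.h>

int *global_ptr;                 /* static storage duration: zero-initialized, i.e. NULL */

int main(void) {
    int *local_ptr;              /* automatic storage: indeterminate ("garbage") */
    printf("global_ptr = %p\n", (void *)global_ptr);   /* always prints a null pointer */
    /* Even reading local_ptr before assigning it is undefined behaviour,
       so it is only taken by address here. */
    (void)&local_ptr;
    return 0;
}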
When you declare a global pointer, it is initialized to zero, so the addresses generated from it are small numbers that may or may not be readable on your system.
When you declare an automatic pointer, its initial value is likely to be much more interesting. It will be, in this case, whatever the run-time library left at that spot on the stack prior to calling main(), or perhaps a left-over value from the compiler-generated stack-frame setup code. It is somewhat likely to be a saved stack pointer or frame pointer, which is a valid pointer if used with small offsets.
So anyway, the uninitialized pointer does have something in it, and one value leads to a fault while the other, for now, on your system, does not.
And that's because the segmentation fault is a mechanism of the OS and not the C language.
Memory protection is a page-based mechanism: the OS maps some number of pages, each several KB, for itself and for each program, and it protects its own and other programs' pages while allowing your program free rein within its own. You must stray outside your mapped pages, or try to write a read-only page (even one of your own), to generate a fault. Simply breaking a language rule is not necessarily enough. The OS is happy to let your program misbehave and act oddly because of its wild references, just as long as it only reads and writes (or clobbers) itself.
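For completeness, here is one way the program could be written so that both pointers actually refer to allocated objects (illustrative only; error handling kept minimal):

#include <stdio.h>
#include <stdlib.h>

struct one { char x; int y; };
struct two { char a; struct one *ONE; };

int main(void) {
    struct two *TWO = malloc(sizeof *TWO);        /* allocate the outer struct */
    if (TWO == NULL) return 1;
    TWO->ONE = malloc(sizeof *TWO->ONE);          /* and the inner one it points to */
    if (TWO->ONE == NULL) { free(TWO); return 1; }

    if (scanf("%d", &TWO->ONE->y) == 1)
        printf("%d\n", TWO->ONE->y);

    free(TWO->ONE);
    free(TWO);
    return 0;
}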

in which segment of the program are function pointers stored?

I wanted to know in which section of the program function pointers are stored. Is it on the program stack, or is there a separate section for them?
#include <stddef.h>   /* for NULL */

void f(void) {}

int main(void) {
    int x[10];
    void (*fp)(void) = NULL;
    fp = f;
    return 0;
}
Now, will the address of x and fp be in the same segment of the program's stack memory?
A function pointer is no different from any other pointer in terms of storage, which is again no different from any other variable. So yes, they'll all be stored together in the same place, which is the stack for local variables.
With a good compiler, they won't exist anywhere because their values are never used and contribute nothing to the output of the program.
The answer to this precise question is that your two examples (an array of ints and a pointer-to-a-function) are both local variables and both are kept on "the stack" (the stack is a bit conceptual but at the level of your question, it's the right way to think about it), so the addresses of x and fp are both there.
What you might possibly be getting at, however (with "which section of the program are function pointers stored"), may be something a bit different: if you assign a value to the pointer-to-function, as in you assign it the address of an actual function, the address it contains will almost certainly be somewhere else, because executable code is located in a different part of system memory than the execution stack.
(The array of ints is allocated entirely on the stack and if you treat x as a pointer, it will point into the stack area.)
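A small sketch makes the distinction visible (converting a function pointer to void * for printing is a common extension required by POSIX, not strict ISO C; the exact regions are implementation details, not anything the standard defines):

#include <stdio.h>

void f(void) {}

int main(void) {
    int x[10];
    void (*fp)(void) = f;

    printf("local array x        : %p\n", (void *)x);       /* stack */
    printf("local pointer fp     : %p\n", (void *)&fp);      /* stack */
    printf("function f (its code): %p\n", (void *)fp);       /* text segment; common extension */
    printf("string literal       : %p\n", (void *)"hello");  /* read-only data */
    return 0;
}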
