I wanted to know in which section of the program function pointers are stored. Are they on the program stack, or is there a separate section for them?
#include <stddef.h> /* for NULL */

void f(void) {}

int main(void) {
    int x[10];
    void (*fp)(void) = NULL;
    fp = f;
    return 0;
}
Now, will the address of x and fp be in the same segment of the program's stack memory?
A function pointer is no different from any other pointer in terms of storage, which is again no different from any other variable. So yes, they'll all be stored together in the same place, which is the stack for local variables.
With a good compiler, they won't exist anywhere because their values are never used and contribute nothing to the output of the program.
The answer to this precise question is that your two examples (an array of ints and a pointer-to-a-function) are both local variables and both are kept on "the stack" (the stack is a bit conceptual but at the level of your question, it's the right way to think about it), so the addresses of x and fp are both there.
What you might be getting at, however (with "in which section of the program are function pointers stored"), may be something a bit different: if you assign a value to the pointer-to-function, that is, the address of an actual function, the address it contains will almost certainly be somewhere else, because executable code is located in a different part of memory than the execution stack.
(The array of ints is allocated entirely on the stack and if you treat x as a pointer, it will point into the stack area.)
A program accessing illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go in Windows, Linux, OpenVMS, and Mac OS and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of the stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved return address for main() to return to, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that have copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by cleaning out the environment variables and re-running the program.
This code would not be so harmless if any of the C runtime conventions are not true:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
Whether main() is called with arguments. (Some light duty environments, like embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that just means that you can't rely on what it does, not that it should crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
It would very probably crash if you wrote to an inaccessible area. But since you are only reading, it can be OK. The behaviour is still undefined.
How does a compiler know if something is allocated on the heap or stack, for instance if I made a variable in a function and returned the address of the variable, the compiler warns me that "function returns address of a local variable":
#include <stdio.h>
int* something() {
int z = 21;
return &z;
}
int main() {
int *d = something();
return 0;
}
I understand why this is a warning: when the function exits, the stack frame is gone, and if you have a pointer to that memory and you change its value, you can cause a segmentation fault. What I wonder is how the compiler knows whether that variable is allocating memory via malloc, or how it can tell that it's a local variable on the stack?
A compiler builds a syntax tree from which it is able to analyze each part of the source code.
It builds a symbol table which associates to each symbol defined some information. This is required for many aspects:
finding undeclared identifiers
checking that types are convertible
and so on
Once you have this symbol table it is quite easy to know if you are trying to return the address of a local variable since you end up having a structure like
ReturnStatement
+ UnaryOperator (&)
+ Identifier (z)
So the compiler can easily check if the identifier is a local stack variable or not.
Mind that this information could in theory propagate along assignments but in practice I don't think many compilers do it, for example if you do
int* something() {
int z = 21;
int* pz = &z;
return pz;
}
The warning goes away. With static code flow analysis you could prove that pz can only refer to a local variable, but in practice that doesn't happen.
The example in your question is really easy to figure out.
int* something() {
int z = 21;
return &z;
}
Look at the expression in the return statement. It takes the address of the identifier z.
Find out where z is declared. Oh, it is a local variable.
Not all cases will be as easy as this one and it's likely that you can trick the compiler into giving false positives or negatives if you write sufficiently weird code.
If you're interested in this kind of stuff, you might enjoy watching some of the talks given at CppCon'15 where static analysis of C++ code was a big deal. Some remarkable talks:
Bjarne Stroustrup: “Writing Good C++14”
Herb Sutter: “Writing Good C++14… By Default”
Neil MacIntosh: “Static Analysis and C++: More Than Lint”
The compiler knows which chunk of memory holds the current stack frame. Every time a function is called, a new stack frame is created and the frame and stack pointers are moved appropriately, which effectively gives the current frame a beginning and an end in memory. Checking whether you're trying to return a pointer to memory that's about to be released is relatively simple given that setup.
What I wonder is how the compiler will know if that variable is
allocating memory via. malloc, or how it can tell if it's a local
variable on the stack?
The compiler has to analyse all the code and generate machine code from it.
When functions need to be called, the compiler has to push the parameters on the stack (or reserve registers for them), update the stack pointer, look if there are local variables, initialize those on the stack too and update the stack pointer again.
So obviously the compiler knows about local variables being pushed on the stack.
I know that main()'s automatic local variables are stored on the stack, as are the automatic local variables of any other function, but when I tried the following code with gcc version 4.6.3:
#include <stdio.h>
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("%p\n%p\n%p\n%p\n",&var1,&var2,&var3,&var4);
}
the results are :
0xbfca41e0
0xbfca41e4
0xbfca41e8
0xbfca41ec
According to the results, var4 is at the top of the stack and var1 is at the bottom, and the stack pointer is now pointing to the address below var1's address. But why is var4 at the top of the stack and var1 at the bottom? var4 is declared after var1, so I would think that var1 should be at the top of the stack and any variable declared after var1 should be below it in memory. So in my example I expected this:
>>var1 at 0xbfca41ec
>>var2 at 0xbfca41e8
>>var3 at 0xbfca41e4
>>var4 at 0xbfca41e0
>>and stack pointer pointing here
..
..
EDIT 1:
After reading the comment by #AusCBloke I’ve tried the following code :
#include <stdio.h>
void fun(){
int var1;
int var2;
printf("inside the function\n");
printf("%p\n%p\n",&var1,&var2);
}
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("inside the main\n");
printf("%p\n%p\n%p\n%p\n",&var1,&var2,&var3,&var4);
fun();
return 0;
}
And the results :
inside the main
0xbfe82d60
0xbfe82d64
0xbfe82d68
0xbfe82d6c
inside the function
0xbfe82d28
0xbfe82d2c
So the variables inside fun()'s stack frame are below the variables inside main()'s stack frame, which is what the nature of the stack implies, but within the same stack frame they are not necessarily ordered from top to bottom.
thanks #AusCBloke..... your comment helped me a lot
There is no requirement for these variables to be allocated in the order in which they were declared. They can be moved around by the compiler, or even optimized out entirely. If you need the relative addresses to stay the same, use a struct.
Objects with automatic storage duration are typically stored on the stack, but the language standard doesn't require it. In fact, the standard (the link is to the latest pre-release C11 draft) doesn't even mention the word "stack".
The word "stack", unfortunately, is ambiguous.
In the most abstract sense, a stack is a data structure in which the most recently added items are removed first (last-in first-out, or LIFO). The requirements regarding the lifetime of objects with automatic storage duration (i.e., objects defined within a function with no static keyword) imply some kind of stack-like allocation.
The word "stack" is also commonly used to refer to a contiguous region of memory, typically controlled by a "stack pointer" pointing to the top-most element. The stack grows by moving the stack pointer away from the base, and shrinks by moving it toward the base. (It can grow in either direction, toward higher or lower memory addresses.) Most C compilers use this kind of contiguous stack to implement automatic objects -- but not all do. There have been C compilers for IBM mainframe systems which allocate storage for function calls from a heap-like structure, and the addresses for nested calls need not be uniformly in either increasing or decreasing order.
This is an unusual implementation, and there are very good reasons that this approach is not commonly used (a contiguous stack is simpler, more efficient, and is typically supported by the CPU). But the C standard is carefully written to avoid requiring a specific scheme, and C code that's carefully written to be portable will work correctly regardless of which method a compiler chooses. You don't need to know. All you really need to know about the address of var1 is that it's &var1. If you write if (&var1 < &var2) { ... }, then you're probably doing something wrong (that expression's behavior is undefined, BTW).
That's the standard C answer. I see that your question is tagged gcc. As far as I know, all versions of gcc use a contiguous stack. But even so, there's rarely any benefit in taking advantage of this.
On many (most) modern platforms the stack grows from higher addresses in memory to lower addresses. That is, when you start your program, the stack pointer is immediately set to some address in memory, which is determined by the maximum stack size of your program. Once things get pushed onto the stack, the stack pointer actually moves down.
I could be wrong, but stacks start at lower memory addresses and are then added to. So it is correct for var4 to be on top. It is a stack after all!
Edit: the assembly code behind it has the stack pointer at the bottom of the memory stack, and whenever data is added, the stack pointer is incremented so that the next variable falls on top.
I'm 99.9999% sure that the answer is Yes. Also, the stack grows downwards on Intel architecture machines, not upwards. The lower area becomes the virtual "top" of the stack (it's upside-down, so to speak).
So technically, the variables are in the correct order in stack memory.
EDIT: This is probably still compiler-specific, though.
This gives proper output even though I have not allocated any memory and have only declared a pointer to struct two inside main:
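#include <stdio.h>

struct one
{
    char x;
    int y;
};
struct two
{
    char a;
    struct one *ONE;
};
int main(void)
{
    struct two *TWO; /* uninitialized: holds an indeterminate value */
    scanf("%d", &TWO->ONE->y); /* undefined behaviour: TWO points to no object */
    printf("%d\n", TWO->ONE->y);
    return 0;
}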
But when I declare the pointer to struct two right after the structure, outside main, I get a segmentation fault. Why don't I get a segmentation fault in the previous case?
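#include <stdio.h>

struct one
{
    char x;
    int y;
};
struct two
{
    char a;
    struct one *ONE;
} *TWO; /* file scope: initialized to a null pointer */
int main(void)
{
    scanf("%d", &TWO->ONE->y); /* undefined behaviour: TWO is null */
    printf("%d\n", TWO->ONE->y);
    return 0;
}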
In both cases TWO is a pointer to an object of type struct two.
In case 1 the pointer is wild and can be pointing anywhere.
In case 2 the pointer is NULL as it is global.
But in both cases it is a pointer that does not point to a valid struct two object. Your scanf call treats this pointer as though it referred to a valid object. This leads to undefined behavior.
Because what you are doing is undefined behaviour. Sometimes it seems to work. That doesn't mean you should do it :-)
The most likely explanation is to do with how the variables are initialised. Automatic variables (on the stack) will get whatever garbage happens to be on the stack when the stack pointer was decremented.
Variables outside functions (like in the second case) are always initialised to zero (null pointer for pointer types).
That's the basic difference between your two situations but, as I said, the first one is working purely by accident.
When declaring a global pointer, it will be initialized to zero, and so the generated addresses will be small numbers that may or may not be readable on your system.
When declaring an automatic pointer, its initial value is likely to be much more interesting. It will be, in this case, whatever the run-time library left at that point on the stack prior to calling main(), or perhaps a left-over value from the compiler-generated stack-frame setup code. It is somewhat likely to be a saved stack pointer or frame pointer, which is a valid pointer if used with small offsets.
So anyway, the uninitialized pointer does have something in it, and one value leads to a fault while the other, for now, on your system, does not.
And that's because the segmentation fault is a mechanism of the OS and not the C language.
Memory protection is a block-based mechanism: the OS allocates to itself and to other programs some number of pages, each several KB in size, and it protects its own and other programs' pages while allowing your program free rein over its own. You must stray outside your own pages, or try to write to a read-only page (even one of yours), to generate a fault. Simply breaking a language rule is not necessarily enough. The OS is happy to let your program misbehave and act oddly due to its wild references, just as long as it only reads and writes (or clobbers) itself.