This is the code I've written,
char *foo();
void main()
{
char *str=foo();
strcpy(str,"Holy sweet moses! I blew my stack!!");
printf("%s",str);
}
char * foo()
{
char str[256];
return str;
}
When I use char array in function foo(), the strcpy in main() function doesn't copy the string into str. But, when I use int array in function foo(), main() strcpy copies successfully.
i.e.
int str[256]; //in function foo
output
Holy sweet moses! I blew my stack!!
if
char str[256]; //in foo()
output : nothing!
What you are doing is clearly undefined behavior, but let's try to understand why it appears to work with ints and not with chars.
TL;DR: printf uses the stack and overwrites some of the space pointed to by str. Because the int array takes up more memory than the char array, its start is "far ahead" in the stack and doesn't get overwritten.
An int is 4 bytes, so 256 ints will be 1024 bytes.
If the array is in the stack, this will point to RBP - 1024 for example.
With chars, a char is 1 byte, 256 chars will be 256 bytes.
If the array is in the stack, this will point to RBP - 256 for example.
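The sizes quoted above can be checked with sizeof. This is only a minimal sketch: the function names are made up for illustration, and the 1024 figure assumes a 4-byte int, which the C standard does not guarantee.

```c
#include <stddef.h>

/* Bytes occupied by the char version of the array: always 256,
   because sizeof(char) is 1 by definition. */
size_t char_version_bytes(void)
{
    return sizeof(char[256]);
}

/* Bytes occupied by the int version: 256 * sizeof(int),
   which is 1024 only where int is 4 bytes wide. */
size_t int_version_bytes(void)
{
    return sizeof(int[256]);
}
```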
What does this mean? When foo returns, the str pointer will point 1024 or 256 bytes "ahead" of the current stack pointer.
So when you call strcpy(str, "yourstring"); that memory can get overwritten by the stack frames that strcpy and printf use. Those calls don't overwrite the whole stack, only a small part of it, but that part is enough to reach 256 bytes, so printf can overwrite the copied string. This doesn't happen with your int array, because the string is copied 1024 bytes ahead of the stack pointer, and strcpy and printf don't use that much stack.
Let me show you how your stack will end up:
If you change the size of the char array it will probably work.
All of this is undefined behaviour and completely depends on your architecture, computer and compiler. I'm using Linux x86_64 at the moment.
Are you aware of the concepts of scope and lifetime of variables? If you are, then you know that what you are trying to do invokes "undefined behavior". You are lucky that your code prints something, or nothing at all, rather than crashing because it wrote to memory it no longer owns.
From another Stack Overflow answer:
To what extent are stack/heap controlled by the OS or language runtime?
The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
What is their scope?
The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.
What determines the size of each of them?
The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
When your function foo() returns, it returns the address of an array on the stack. Once the function exits, that pointer is useless, because the array's lifetime ends when foo() returns. So you have a pointer to some place in memory, but you can't tell what's there.
Read your compiler warnings. I bet there was at least one telling you that your function returns a pointer to a local variable (mine does, and I have done this a few times).
From your string about blowing the stack I conclude that you know that what you're doing is wrong. Therefore my answer will be: Undefined behavior is undefined. When writing to memory that's not yours to write to anything can happen including things that you might expect and things that you might not expect. Other undefined behaviors might be marginally interesting to explore, writing to memory that doesn't belong to you isn't. It's always wrong, it will always do something you don't expect and there's no situation where the correct solution is anything other than not doing it anymore.
When you change the array from char to int, you change the size of the array and since most likely the stack grows down on your architecture it changes the address of the memory you're not supposed to overwrite in the first place.
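For completeness, two common ways to hand a string back to the caller without returning the address of a local array can be sketched as follows. The names fill_message and make_message are hypothetical and not from the question; this is an illustration under those assumptions, not the only correct approach.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: the caller owns the buffer and passes its size.
   snprintf never writes more than size bytes and always terminates
   the result when size > 0. Returns 0 if the text fit, -1 otherwise. */
int fill_message(char *buf, size_t size)
{
    int needed = snprintf(buf, size, "%s", "Holy sweet moses!");
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}

/* Option 2: allocated storage outlives the function that created it.
   The caller must free() the result. Returns NULL if malloc fails. */
char *make_message(void)
{
    const char *text = "Holy sweet moses!";
    char *copy = malloc(strlen(text) + 1);   /* +1 for the '\0' */
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}
```

With the first option, main() would declare char str[256]; and call fill_message(str, sizeof str); with the second, it would call make_message() and later free() the result.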
I have an infinite while loop, I am not sure if I should use a char array or char pointer. The value keeps getting overwritten and used in other functions. With a char pointer, I understand there could be a memory leak, so is it preferred to use an array?
char *recv_data = NULL;
int main(){
.....
while(1){
.....
recv_data = cJSON_PrintUnformatted(root);
.....
}
}
or
char recv[256] = {0};
int main(){
.....
while(1){
.....
strcpy(recv, cJSON_PrintUnformatted(root));
.....
}
}
The first version should be preferred.
It doesn't have a limit on the size of the returned string.
You can use free(recv_data) to fix the memory leak.
The second version has these misfeatures:
The memory returned from the function can't be freed, because you never assigned it to a variable that you can pass to free().
It's a little less efficient, since it performs an unnecessary copy.
Based on how you used it, cJSON_PrintUnformatted returns a pointer to a char array. Since you don't pass it a buffer to write into, it probably allocates memory inside the function dynamically. You probably have to free that memory. So you need the returned pointer in order to deallocate the memory yourself.
The second option discards that returned pointer, so you lose your only way to free the allocated memory. Hence it remains allocated: a memory leak.
But of course this all depends on how the function is implemented. Maybe it just manipulates a global array and returns a pointer to it, in which case there is no need to free it.
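The pattern described above (keep the returned pointer, use it, then release it before the next iteration) can be sketched like this. render_message is a hypothetical stand-in for a function that returns malloc'ed memory; it is not part of cJSON, and whether cJSON's buffer should be released with free() or a library-specific call should be checked against the cJSON documentation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a call that returns heap memory the
   caller must release. Returns NULL on allocation failure. */
char *render_message(int n)
{
    char *out = malloc(32);
    if (out != NULL)
        snprintf(out, 32, "{\"n\":%d}", n);
    return out;
}

/* Runs the loop body `iterations` times and returns the total number
   of characters processed. Each buffer is freed before the next one
   is requested, so nothing leaks. */
size_t process_messages(int iterations)
{
    size_t total = 0;
    for (int i = 0; i < iterations; i++) {
        char *recv_data = render_message(i);
        if (recv_data == NULL)
            break;                 /* out of memory: stop early */
        total += strlen(recv_data);
        free(recv_data);           /* release before the next iteration */
    }
    return total;
}
```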
Indeed, the second version has a memory leak, as @Barmar points out.
However, even if you were to fix the memory leak, you still can't really use the second version of your code: with the second version, you have to decide at compile time what the maximum length of the string returned by cJSON_PrintUnformatted() is. Now,
If you choose a value that's too low, the strcpy() function would exceed the array bounds and corrupt the memory next to the array.
If you choose a value that's so high as to be safe, and the array is a local variable, you might exceed the amount of space available for your program's stack, causing a Stack Overflow (yes, like the name of this site). You could fix that using strncpy(), giving the maximum size, and then what you'd have is a truncated string (which strncpy() does not null-terminate for you).
So you really don't have much choice other than to use whatever memory is pointed to by cJSON_PrintUnformatted()'s return value (it's probably heap-allocated memory). Plus - why make a copy of it when it's already there for you to use? Be lazy :-)
PS - What should really happen is for the cJSON_PrintUnformatted() to take a buffer and a buffer size as parameters, giving its caller more control over memory allocation and resource limits.
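An interface of the kind suggested in the PS might look something like the sketch below. format_reading and its parameters are invented for illustration; they are not cJSON functions, and the library's own documentation should be consulted for what it really offers.

```c
#include <stdio.h>

/* Writes a small JSON-like record into a caller-supplied buffer.
   Returns the number of characters the full text needs (excluding
   the terminator), as snprintf does, so the caller can detect
   truncation by comparing the result with the buffer size. */
int format_reading(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "{\"value\":%d}", value);
}
```

A caller that gets back a value greater than or equal to the buffer size knows the output was truncated and can retry with a larger buffer, so the memory policy stays entirely in the caller's hands.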
A program accessing illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have given this program a go in Windows, Linux, OpenVMS, and Mac OS and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, it is printing memory near the top of stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved instruction register for main() to return, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that have copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by clearing out the environment variables and re-running the program.
This code would not be so harmless if any of the following C runtime conventions did not hold:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
main() is called with arguments. (Some light-duty environments, like embedded processors, invoke main() without parameters.)
In all mainstream modern implementations, all of these are generally true.
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that does just mean that you can't really rely on what it does, not that it should crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
Writing to an inaccessible area would very probably crash. But you are only reading, so it can appear to work. The behaviour is still undefined.
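A version of the copy that cannot read past the object it was given needs to be told that object's size. The sketch below shows one way to do that; copy_object is a hypothetical helper, not code from the question.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size bytes from an object of src_size bytes.
   Returns the number of bytes copied, which never exceeds the size
   of the source object, so no neighbouring memory is read. */
size_t copy_object(void *dst, size_t dst_size,
                   const void *src, size_t src_size)
{
    size_t n = src_size < dst_size ? src_size : dst_size;
    memcpy(dst, src, n);
    return n;
}
```

Called as copy_object(str, sizeof str, &x, sizeof x), it reads only the sizeof(void *) bytes that make up x instead of 1000 bytes of whatever happens to follow it.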
Am I correct in thinking that:
char *buff[500];
... creates a stack variable, and:
char *buff = (char *)malloc(500);
... creates a heap variable?
If that's correct, when and why would you use heap variables over stack variables and vice versa? I understand the stack is faster; is there anything else?
One last question: is the main function a stack frame on the stack?
Yes, the first one creates an array of char pointers on the stack, 500 * sizeof(char *) bytes (about 500*4 bytes with 4-byte pointers), and the second one allocates 500 chars on the heap and points a stack char pointer at them.
Allocating on the stack is easy and fast, but the stack is limited; the heap is slower but much bigger. Apart from that, stack-allocated values are "deleted" once you leave the scope, so the stack is very good for small local values like primitive variables.
If you allocate too much on the stack you might run out of stack and crash. main, like every function you execute, has a stack frame on the stack, and all the function's local variables are stored there, so going too deep into nested function calls might get you a stack overflow as well.
In general, it is a good rule of thumb to allocate anything that is bigger than a hundred bytes or so and used often on the heap, and to keep small variables and pointers on the stack.
Seeing that you wrote
char *buff = (char *)malloc(500);
you probably meant
char buff[500]; instead of
char* buff[500];
in your first example (so you have a char-array, not an array of pointers to chars)
Yes, "allocation" on the stack is faster because you just adjust the stack pointer (the ESP register on 32-bit x86).
You need heap-variables if you want:
1) more memory than fits on the stack (the stack generally runs out much earlier than the heap)
2) pass memory that was allocated by a called function to the calling function.
Your buffs are not equivalent.
The first one (char *buff[500]) is an array of 500 pointers; the 2nd one (char *buff = (char *)malloc(500)) is a pointer.
The pointer (on the stack) points to 500 bytes of memory (if the malloc call succeeded) on the heap.
The array of pointers is on the stack. Its pointers are not initialized.
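The difference between the two declarations shows up directly in their sizes. The following is a small sketch with made-up function names; the exact pointer size depends on the platform.

```c
#include <stddef.h>

/* Size of `char *buff[500]`: an array of 500 pointers to char. */
size_t array_of_pointers_bytes(void)
{
    return sizeof(char *[500]);
}

/* Size of `char buff[500]`: an array of 500 chars, exactly 500 bytes. */
size_t array_of_chars_bytes(void)
{
    return sizeof(char[500]);
}

/* Size of the single pointer that `char *buff = malloc(500);`
   places in automatic storage; the 500 bytes live elsewhere. */
size_t single_pointer_bytes(void)
{
    return sizeof(char *);
}
```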
Unless you are using C99 (which adds variable-length arrays), the size of an array on the stack must be known at compile time. That means you cannot do:
int size = 3; // somewhere, possibly from user input
char *buff[size];
But using "the heap" (dynamic allocation), you can provide any dimensions you like. That's because the memory allocation is performed at run-time, rather than hardcoded into the executable.
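As an illustration of a size that is only known at run time, here is a minimal sketch. repeat_char is a hypothetical helper, not something from the question, and it assumes count + 1 does not overflow size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Builds a string of `count` copies of `c`. The size is only known
   at run time, so the storage comes from malloc. The caller must
   free() the result. Returns NULL if allocation fails. */
char *repeat_char(char c, size_t count)
{
    char *s = malloc(count + 1);   /* +1 for the terminating '\0' */
    if (s == NULL)
        return NULL;
    memset(s, c, count);
    s[count] = '\0';
    return s;
}
```

Because the memory has allocated storage, it is still valid after repeat_char returns, which is also how a called function can pass memory back to its caller.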
The C standard contains neither the words heap nor stack. What we have here instead are two storage durations (of 4 in total): automatic and allocated:
char buff[500]; // note the missing * to match the malloc example
within a function declares the object buff as an array of char and having automatic storage duration. The object will cease to be when the block where the object was declared, is exited.
char *buff = malloc(500); // no cast necessary; this is C
will declare buff as a pointer to char. malloc will reserve 500 contiguous bytes and return a pointer to them. The returned 500-byte object will exist until it is explicitly freed with a call to free. The object is said to have allocated storage duration.
That's all the C standard says. It doesn't specify that the char buff[500] needs to be allocated from a "stack", or that there needs to be a stack. It doesn't specify that the malloc needs to use some "heap". On the contrary, a compiler might internally implement the char buff[500] like
{
char *buff = malloc(500);
free(buff);
}
Or it can deduce that the allocated memory is not used, or that it is only used once, and use a stack-based allocation instead of actually calling malloc.
In practice, most current compilers and environments will use a memory layout called stack for automatic variables, and the objects with allocated storage duration are said to come from "heap" which is a metaphor for the unorganized mess that it is compared to the orderly stack, but it is not something that has to be so.
Heap variables can be created dynamically, i.e. you can ask your user for a size and malloc a new variable of that size.
The size of a stack variable must be known at compile time.
Like you said, stack variables are faster to allocate and access. So I'd recommend using them whenever you know the size at compile time. Otherwise you have no choice: you must use malloc().
This is indeed a variable allocated on the stack:
char buff[500]; // You had a typo here?
and this is on the heap:
char *buff = (char *)malloc(500);
Why would you use one vs the other?
In char *buff[500], the 500 needs to be a compile-time constant. You can't use it if 500 is computed at runtime.
On the other hand, stack allocations are instantaneous while heap allocations take time (thus they incur a runtime performance cost).
Space on the stack is limited by the thread's stack size (typically 1MB before you get a stack overflow), while there's much more available on the heap.
If you allocate an array on the stack big enough to take up more than 2 pages of virtual memory as managed by the OS, and access the end of the array before doing anything else, there's the possibility of getting a protection fault (this depends on the OS)
Finally: every function called has a frame on the stack. The main function is no different. It isn't even any more special than the other functions in your program, since when your program starts running the first code that runs is inside the C runtime environment. After the runtime is ready to begin execution of your own code, it calls main just as you would call any other function.
Those two aren't equivalent. The first is an array of size 500 (on the stack) with pointers to characters. The second is a pointer to a memory chunk of 500 which can be used with the indexing operator.
char buff[500];
char *buff = (char *)malloc(sizeof(char)*500);
Stack variables should be preferred because they require no deallocation. Heap variables allow passing of data between scopes as well as dynamic allocation.
What exactly happens, in terms of memory, when i declare something like:
char arr[4];
How many bytes are reserved for arr?
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
I was writing a socket program, and when I tried to suffix NULL at arr[4] (i.e. the 5th memory location), I ended up replacing the values of some other variables of the program (overflow) and got into a big time mess.
Any descriptions of how compilers (gcc is what I used) manage memory?
sizeof(arr) bytes are saved* (plus any padding the compiler wants to put around it, though that isn't for the array per se). On an implementation with a stack, this just means moving the stack pointer sizeof(arr) bytes down. (That's where the storage comes from. This is also why automatic allocation is fast.)
'\0' isn't accommodated. If you copy "abcd" into it, you get a buffer overrun, because that takes up 5 bytes total, but you only have 4. You enter undefined behavior land, and anything could happen.
In practice you'll corrupt the stack and crash sooner or later, or experience what you did and overwrite nearby variables (because they too are allocated just like the array was.) But nobody can say for certain what happens, because it's undefined.
* Which is sizeof(char) * 4. sizeof(char) is always 1, so 4 bytes.
What exactly happens, in terms of memory, when i declare something like:
char arr[4];
4 * sizeof(char) bytes of stack memory is reserved for the string.
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
You cannot. You can only store 3 characters; the 4th element (i.e. arr[3]) must be the '\0' character for a proper string.
when I tried to suffix NULL at arr[4]
The behavior will be undefined as you are accessing an invalid memory location. In the best case, your program will crash immediately, but it might also corrupt the stack and crash at a later point in time.
In C, what you ask for is--usually--exactly what you get. char arr[4] is exactly 4 bytes.
But anything in quotes has a 'hidden' null added at the end, so char arr[] = "oops"; reserves 5 bytes.
Thus, if you do this:
char arr[4];
strcpy(arr, "oops");
...you will copy 5 bytes (o o p s \0) when you've only reserved space for 4. Whatever happens next is unpredictable and often catastrophic.
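One way to avoid this is to check that the text and its terminator both fit before copying. The helper below is a hypothetical sketch, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if the text and its terminating '\0'
   both fit in cap bytes. Returns 0 on success, -1 if it would not
   fit (dst is left untouched in that case). */
int copy_checked(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > cap)
        return -1;
    memcpy(dst, src, len + 1);     /* len characters plus the '\0' */
    return 0;
}
```

With char arr[4], copy_checked(arr, sizeof arr, "oops") is refused because "oops" needs 5 bytes, while a 3-character string such as "abc" fits.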
When you define a variable like char arr[4], it reserves exactly 4 bytes for that variable. As you've found, writing beyond that point causes what the standard calls "undefined behavior" -- a euphemism for "you screwed up -- don't do that."
The memory management of something like this is pretty simple: if it's a global, it gets allocated in a global memory space. If it's a local, it gets allocated on the stack by subtracting an appropriate amount from the stack pointer. When you return, the stack pointer is restored, so they cease to exist (and when you call another function, will normally get overwritten by parameters and locals for that function).
When you make a declaration like char arr[4];, the compiler allocates as many bytes as you asked for, namely four. The compiler might allocate extra in order to accommodate efficient memory accesses, but as a rule you get exactly what you asked for.
If you then declare another variable in the same function, that variable will generally follow arr in memory, unless the compiler makes certain optimizations again. For that reason, if you try to write to arr but write more characters than were actually allocated for arr, then you can overwrite other variables on the stack.
This is not really a function of gcc. All C compilers work essentially the same way.