I recently read about scope rules in C. It says that a local or auto variable is available only inside the block of the function in which it is declared. Once outside the function it no longer is visible. Also that its lifetime is only till the end of the final closing braces of the function body.
Now here is the problem: what happens when the address of a local variable is returned from the function to the calling function?
For example:
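int *fun(void); /* declared before main so the call is type-checked */
int main(void)
{
int *p = fun();
return 0;
}
int *fun(void)
{
int localvar = 0;
return &localvar; /* address of a local: invalid once fun returns */
}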
Once control returns from the function fun, the variable localvar is no longer alive. So how will main be able to access the contents at this address?
The address can be returned, but the value stored at the address cannot reliably be read. Indeed, it is not even clear that you can safely assign it, though the chances are that on most machines there wouldn't be a problem with that.
You can often read through the address, but the behaviour is undefined (read 'bad: to be avoided at all costs!'). In particular, the address may be reused for other variables in other functions, so if you access it after calling other functions, you are unlikely to see the last value stored in the variable by the function that returned the pointer to it.
Why then is a function returning a pointer ever required?
One reason is often 'dynamic memory'. The malloc() family of functions return a pointer to new (non-stack) memory.
Another reason is 'found something at this location in a value passed to me'. Consider strchr() or strstr().
Another reason is 'returning pointer to a static object, either hidden in the function or in the file containing the source for the function'. Consider asctime() et al (and worry about thread-safety).
There are probably a few others, but those are probably the most common.
Note that none of these return a pointer to a local (stack-based) variable.
The variable is gone, but the memory location still exists and might even still contain the value you set. It will however probably get overwritten pretty fast as more functions are called and the memory address gets reused for another function's local variables. You can learn more by reading about the Call Stack, which is where local variables of functions are stored.
Referencing that location in memory after the function has returned is dangerous. Of course the location still exists (and it may still contain your value), but you no longer have any claim to that memory region and it will likely be overwritten with new data as the program continues and new local variables are allocated on the stack.
gcc gives me the following warning:
t.c: In function ‘test’:
t.c:3:2: warning: function returns address of local variable [enabled by default]
Consider this test program:
#include <stdio.h> /* for printf */
int * test(int p) {
int loc = p;
return &loc;
}
int main(void) {
int *c = test(4);
test(5);
printf("%d\n", *c);
return 0;
}
What do you think this prints?
Below is the sample code written in C:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
int* second;
void myTest1(int a, bool check){
if(check){
second = &a;
}
printf("%d", *(second));
printf(" ");
}
int main(int argc, char const *argv[])
{
int a =1;
int b = 2;
int c=3;
myTest1(a,true);
myTest1(b,false);
myTest1(c,false);
}
I expected the output to be
1 1 1
But the actual output is
1 2 3
I am a bit confused about it, void myTest1(int a, bool check) here I believed a should have function scope. But it seems that the memory location of a is reused in every function call.
I am building by using command gcc <filename>.c
Below are some system details:
OS: Ubuntu
GCC compiler version: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
You are setting second to the address of a variable that goes out of scope at the end of the function invocation. Once the variable is out of scope, the memory it occupied is no longer yours, and it can be reused. That memory address happens to be reused in each subsequent invocation, with the new argument's value copied into it.
Don't store the address of a local variable and access that address after the variable has gone out of scope. This produces undefined behaviour. You cannot make any assumptions about what may be on the other end of that pointer.
Regarding the following:
here I believed a should have function scope
It does...
But it seems that the memory location of "a" is reused in every function call.
Well, reuse is a necessary and natural result of a having "function scope". Unless you are assuming some kind of garbage-collection behavior, where keeping a pointer to this memory prevents it from being reused, a's memory location should be reused once a is inaccessible, otherwise this is the very definition of a memory leak. If the memory for function arguments weren't reused, every function invocation would leak memory by design.
In C and C++ it's your job not to store the address of stack-allocated variables that have gone out of scope (or at least, not to try to use that address after the variable has gone out of scope). The act of storing an address in a pointer does not innately protect that memory from reuse. It's up to you to either allocate that memory on the heap and manage its lifetime yourself, or to allow the stack to manage your memory and not hold onto pointers that outlive the variables they point to.
You're causing undefined behavior. When a function returns, all its automatic variables are destroyed, and any pointers to them become invalid.
On the second and third calls to myTest1(), second points to a variable from the first call. Since this variable no longer exists, dereferencing the pointer results in undefined behavior.
You're getting the result you see because in practice each successive call to the function happens to use the same location for the stack frame. So the address of a in each call is the same, so the old pointer will point to the value that was passed in the new call.
I came across this page that illustrates common ways in which dangling pointers are created.
The code below is used to illustrate dangling pointers by returning address of a local variable:
// The pointer pointing to a local variable becomes
// dangling when the local variable is not static.
#include<stdio.h>
int *fun()
{
// x is local variable and goes out of scope
// after an execution of fun() is over.
int x = 5;
return &x;
}
// Driver Code
int main()
{
int *p = fun();
fflush(stdout);
// p points to something which is not valid anymore
printf("%d", *p);
return 0;
}
On running this, this is the compiler warning I get (as expected):
In function 'fun':
12:2: warning: function returns address of local variable [-Wreturn-local-addr]
return &x;
^
And this is the output I get (good so far):
32743
However, when I comment out the fflush(stdout) line, this is the output I get (with the same compiler warning):
5
What is the reason for this behaviour? How exactly is the presence/absence of the fflush command causing this behaviour change?
Returning a pointer to an object on the stack is bad, as you've mentioned. The reason you only see a problem with your fflush() call in place is that the stack is unmodified if it's not there. That is, the 5 is still in place, so the pointer dereference still gives that 5 to you. If you call a function (almost any function, probably) in between fun and printf, it will almost certainly overwrite that stack location, making the later dereference return whatever junk that function happened to leave there.
This is because calling fflush(stdout) writes onto the stack where x was.
Let me explain. The stack in assembly language (which is what all programming languages eventually run as in one way or another) is commonly used to store local variables, return addresses, and function parameters. When a function is called, it pushes these things onto the stack:
the address of where to continue executing code once the function completes.
the parameters to the function, in an order determined by the calling convention used.
the local variables that the function uses.
These things are then popped off of the stack, one by one, simply by changing where the CPU thinks the top of the stack is. This means the data still exists, but it's not guaranteed to continue to exist.
Calling another function after fun() overwrites the previous values above the top of the stack, in this case probably with data such as the value of stdout, and so the value seen through the pointer changes.
Without calling another function, the data happens to stay there, so dereferencing the pointer appears to work. It is still undefined behaviour; the pointer is not valid in either case.
As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
int m;
return(&m);
}
int main(void)
{
int* p=abc();
*p=32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local variable m is deallocated once abc() completes, so we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
int m;
return(&m);
int p=9;
}
int main(void)
{
int* p=abc();
*p=32;
}
Here I am getting the same warning. But I guess that m will still retain its lifetime when returning. What is happening? Please explain the error. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreachable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as the return statement executes, control passes back to the caller and the lifetime of the called function's local variables ends; they are popped off the stack. So the last statement in your second example is inconsequential, and the warning is justified.
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address,25) and retains
its last-stored value throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
25) The term ‘‘constant address’’ means that two pointers to the object constructed at possibly different
times will compare equal. The address may be different during two different executions of the same
program.
26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way to keep m valid after the function exits (while staying as close as possible to your code) is to declare it with the static keyword:
int* abc()
{
static int m;
m = 42;
return &m;
}
Anything after a return is a "dead branch" that won't be ever executed.
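int m is only valid inside abc(). If the value has to outlive the call, declare m as an int * that points to allocated memory (for example from malloc()) and return that pointer directly.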
Why can I return from a function an array setup by malloc:
int *dog = (int*)malloc(n * sizeof(int));
but not an array setup by
int cat[3] = {0,0,0};
The "cat[ ]" array is returned with a Warning.
Thanks all for your help
This is a question of scope.
int cat[3]; // declares a local variable cat
Local variables versus malloc'd memory
Local variables exist on the stack. When this function returns, these local variables will be destroyed. At that point, the addresses used to store your array are recycled, so you cannot guarantee anything about their contents.
If you call malloc, you will be allocating from the heap, so the memory will persist beyond the life of your function.
If the function is supposed to return a pointer (in this case, a pointer-to-int which is the first address of the integer array), that pointer should point to good memory. Malloc is the way to ensure this.
Avoiding Malloc
You do not have to call malloc inside of your function (although it would be normal and appropriate to do so).
Alternatively, you could pass an address into your function which is supposed to hold these values. Your function would do the work of calculating the values and would fill the memory at the given address, and then it would return.
In fact, this is a common pattern. If you do this, however, you will find that you do not need to return the address, since you already know the address outside of the function you are calling. Because of this, it's more common to return a value which indicates the success or failure of the routine, like an int, than it is to return the address of the relevant data.
This way, the caller of the function can know whether or not the data was successfully populated or if an error occurred.
#include <stdio.h> // include stdio for the printf function
int rainCats (int *cats); // pass a pointer-to-int to function rainCats
int main (int argc, char *argv[]) {
int cats[3]; // cats is the address to the first element
int success; // declare an int to store the success value
success = rainCats(cats); // pass the address to the function
if (success == 0) {
int i;
for (i=0; i<3; i++) {
printf("cat[%d] is %d \n", i, cats[i]);
getchar();
}
}
return 0;
}
int rainCats (int *cats) {
int i;
for (i=0; i<3; i++) { // put a number in each element of the cats array
cats[i] = i;
}
return 0; // return a zero to signify success
}
Why this works
Note that you never did have to call malloc here because cats[3] was declared inside of the main function. The local variables in main will only be destroyed when the program exits. Unless the program is very simple, malloc will be used to create and control the lifespan of a data structure.
Also notice that rainCats is hard-coded to return 0. Nothing happens inside of rainCats which would make it fail, such as attempting to access a file, a network request, or other memory allocations. More complex programs have many reasons for failing, so there is often a good reason for returning a success code.
There are two key parts of memory in a running program: the stack, and the heap. The stack is also referred to as the call stack.
When you make a function call, information about the parameters, where to return, and all the variables defined in the scope of the function are pushed onto the stack. (It used to be the case that C variables could only be defined at the beginning of a block, mostly because it made life easier for the compiler writers.)
When you return from a function, everything on the stack is popped off and is gone (and soon when you make some more function calls you'll overwrite that memory, so you don't want to be pointing at it!)
Anytime you allocate memory with malloc() you are allocating it from the heap. That's some other part of memory, maintained by the allocation manager. Once you "reserve" part of it, you are responsible for it, and if you want to stop pointing at it, you're supposed to let the manager know. If you drop the pointer and can't ask to have it released any more, that's a leak.
You're also supposed to only look at the part of memory you said you wanted. Overwriting not just the part you said you wanted, but past (or before) that part of memory is a classic technique for exploits: writing information into part of memory that is holding computer instructions instead of data. Knowledge of how the compiler and the runtime manage things helps experts figure out how to do this. Well designed operating systems make that much harder.
heap:
int *dog = (int*)malloc(n*sizeof(int));
stack:
int cat[3] = {0,0,0};
Because int cat[3] = {0,0,0}; is declaring an automatic variable that only exists while the function is being called.
There is a special "dispensation" in C for string literals: they have static storage duration, so a pointer to a quoted string can be returned. It doesn't generalize to automatic arrays, including automatic arrays of char initialized from a string literal.
cat[] is allocated on the stack of the function you are calling, when that stack is freed that memory is freed (when the function returns the stack should be considered freed).
If what you want to do is populate an array of ints in the calling frame, pass in a pointer to an array that you control from the calling frame:
void findMyCats(int *cats); /* declared before its first use */
void somefunction() {
int cats[3];
findMyCats(cats);
}
void findMyCats(int *cats) {
cats[0] = 0;
cats[1] = 0;
cats[2] = 0;
}
Of course this is contrived, and I've hardcoded the array length as 3, but this is what you have to do to get array data back from an invoked function.
A single value works because it's copied back to the calling frame:
int findACat() {
int cat = 3;
return cat;
}
In findACat, 3 is copied from findACat to the calling frame; since it's a known quantity, the compiler can do that for you. The data a pointer points to can't be copied because the compiler does not know how much to copy.
When you define a variable like 'cat' the compiler assigns it an address. The association between the name and the address is only valid within the scope of the definition. In the case of auto variables that scope is the function body from the point of definition onwards.
Auto variables are allocated on the stack. The same address on the stack is associated with different variables at different times. When you return an array, what is actually returned is the address of the first element of the array. Unfortunately, after the return, the compiler can and will reuse that storage for completely unrelated purposes. What you'd see at a source code level would be your returned variable mysteriously changing for no apparent reason.
Now, if you really must return an initialized array, you can declare that array as static. A static variable has a permanent rather than a temporary storage allocation. You'll need to keep in mind that the same memory will be used by successive calls to the function, so the results from the previous call may need to be copied somewhere else before making the next call.
Another approach is to pass the array in as an argument and write into it in your function. The calling function then owns the variable, and the issues with stack variables don't arise.
None of this will make much sense unless you carefully study how the stack works. Good luck.
You cannot return an array. You are returning a pointer. This is not the same thing.
You can return a pointer to the memory allocated by malloc() because malloc() has allocated the memory and reserved it for use by your program until you explicitly use free() to deallocate it.
You may not return a pointer to the memory allocated by a local array because as soon as the function ends, the local array no longer exists.
This is a question of object lifetime, not scope or stack or heap. While those terms are related to the lifetime of an object, they aren't equivalent to lifetime, and it's the lifetime of the object that you're returning that's important. For example, a dynamically allocated object has a lifetime that extends from allocation to deallocation. A local variable's lifetime might end when the scope of the variable ends, but if it's static its lifetime won't end there.
The lifetime of an object that has been allocated with malloc() is until that object has been freed using the free() function. Therefore when you create an object using malloc(), you can legitimately return the pointer to that object as long as you haven't freed it - it will still be alive when the function ends. In fact you should take care to do something with the pointer so it gets remembered somewhere or it will result in a leak.
The lifetime of an automatic variable ends when the scope of the variable ends (so scope is related to lifetime). Therefore, it doesn't make sense to return a pointer to such an object from a function - the pointer will be invalid as soon as the function returns.
Now, if your local variable is static instead of automatic, then its lifetime extends beyond the scope that it's in (therefore scope is not equivalent to lifetime). So if a function has a local static variable, the object will still be alive even when the function has returned, and it would be legitimate to return a pointer to a static array from your function. Though that brings in a whole new set of problems because there's only one instance of that object, so returning it multiple times from the function can cause problems with sharing the data (it basically only works if the data doesn't change after initialization or there are clear rules for when it can and cannot change).
Another example taken from another answer here is regarding string literals - pointers to them can be returned from a function not because of a scoping rule, but because of a rule that says that string literals have a lifetime that extends until the program ends.
I know C pretty well, however I'm confused about how temporary storage works.
For instance, when a function returns, everything allocated inside that function is freed (from the stack, or however the implementation decides to do this).
For example:
void f() {
int a = 5;
} // a's value doesn't exist anymore
However we can use the return keyword to transfer some data to the outside world:
int f() {
int a = 5;
return a;
} // a's value exists because it's transferred to the outside world
Please stop me if any of this is wrong.
Now here's the weird thing, when you do this with arrays, it doesn't work.
int []f() {
int a[1] = {5};
return a;
} // a's value doesn't exist. WHY?
I know arrays are only accessible by pointers, and you can't pass arrays around like another data structure without using pointers. Is this the reason you can't return arrays and use them in the outside world? Because they're only accessible by pointers?
I know I could be using dynamic allocation to keep the data to the outside world, but my question is about temporary allocation.
Thanks!
When you return something, its value is copied. a does not exist outside the function in your second example; its value does. (It exists as an rvalue.)
In your last example, you implicitly convert the array a to an int*, and that copy is returned. a's lifetime ends, and you're pointing at garbage.
No variable lives outside its scope, ever.
In the first example the data itself is copied and returned to the calling function. The second returns a pointer, so only the pointer is copied and returned, while the data it points to is cleaned up.
In implementations of C I use (primarily for embedded 8/16-bit microcontrollers), space is allocated for the return value in the stack when the function is called.
Before calling the function, assume the stack is this (the lines could represent various lengths, but are all fixed):
[whatever]
...
When the routine is called (e.g. sometype myFunc(arg1,arg2)), C throws the parameters for the function (arguments and space for the return value, which are all of fixed length) on to the stack, followed by the return address to continue code execution from, and possibly backs up some processor registers.
[myFunc local variables...]
[return address after myFunc is done]
[myFunc argument 1]
[myFunc argument 2]
[myFunc return value]
[whatever]
...
By the time the function fully completes and returns to the code it was called from, all of its variables have been deallocated off the stack (they might still be there in theory, but there is no guarantee).
In any case, in order to return the array, you would need to allocate space for it somewhere else, then return the address to the 0th element.
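Some compilers store return values in processor registers rather than on the stack. In the embedded toolchains I use this is rare (I have only seen it on some AVR compilers), though common desktop calling conventions such as the x86-64 System V ABI return small values in registers.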
When you attempt to return a locally allocated array like that, the calling function gets a pointer to where the array used to live on the stack. This can make for some spectacularly gruesome crashes when, later on, something else writes to the array and clobbers a stack frame, which may not manifest itself until much later if the corrupted frame is deep in the calling sequence. The maddening thing with debugging this type of error is that the real error (returning a local array) can make some other, absolutely perfect function blow up.
You still return a memory address, and you can try to check its value, but the contents it points to are not valid beyond the scope of the function, so don't confuse value with reference.
int []f() {
int a[1] = {5};
return a;
} // a's value doesn't exist. WHY?
First, the compiler wouldn't know what size of array to return. I just got syntax errors when I used your code, but with a typedef I was able to get an error that said that functions can't return arrays, which I knew.
typedef int ia[1];
ia h(void) {
ia a = {5};
return a;
}
Secondly, you can't do that anyway. You also can't do
int a[1] = {4};
int b[1];
b = a; // Here both a and b are interpreted as pointer literals or pointer arithmetic
While you don't write it out like that, and the compiler wouldn't even have to generate any code for it, this assignment would have to happen semantically so that a new variable name could refer to the value returned by the function. If you enclosed the array in a struct, the compiler would be just fine with copying the data.
Also, outside of the declaration and sizeof statements (and possibly typeof operations if the compiler has that extension) whenever an array name appears in code it is thought of by the compiler as either a pointer literal or as a chunk of pointer arithmetic that results in a pointer. This means that the return statement would end looking like you were returning the wrong type -- a pointer rather than an array.
If you want to know why this can't be done -- it just can't. A compiler could implicitly think about the array as though it were in a struct and make it happen, but that's just not how the C standard says it is to be done.