I'm a beginner to the C programming language. I sort of understand the general definition of stack memory, heap memory, malloc, pointers, and memory addresses. But I'm a little overwhelmed on understanding when to use each technique in practice and the difference between them.
I've written three small programs to serve as examples. They all do the same thing, and I'd like a little commentary and explanation about what the difference between them is. I do realize that's a naive programming question, but I'm hoping to connect some basic dots here.
Program 1:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated
on the stack in A.
*/
*worthRef = *worthRef + 1;
}
void A() {
int netWorth = 20;
B(&netWorth);
printf("%d", netWorth); // Prints 21
}
int main() {
A();
}
Program 2:
int B (int worthRef) {
/* worthRef is now a local variable. If
I return it, will it get destroyed
once B finishes execution?
*/
worthRef = worthRef + 1;
return (worthRef);
}
void A() {
int netWorth = 20;
int result = B(netWorth);
printf("%d", result); // Also prints 21
}
int main() {
A();
}
Program 3:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated on
the heap.
*/
*worthRef = *worthRef + 1;
}
void A() {
int *netWorth = (int *) malloc(sizeof(int));
*netWorth = 20;
B(netWorth);
printf("%d", *netWorth); // Also prints 21
free(netWorth);
}
int main() {
A();
}
Please check my understanding:
Program 1 allocates memory on the stack for the variable netWorth, and uses a pointer to this stack memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns it back to A() as result. This is an example of pass by value.
Does the local copy of worthRef get destroyed when it's returned?
Program 3 allocates memory on the heap for the variable netWorth, variable, and uses a pointer to this heap memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
My main point of confusion is between Program 1 and Program 3. Both are passing pointers around; it's just that one is passing a pointer to a stack variable versus one passing a pointer to a heap variable, right? But in this situation, why do I even need the heap? I just want to have a single function to change a single value, directly, which I can do just fine without malloc.
The heap allows the programmer to choose the lifetime of the variable, right? In what circumstances would the programmer want to just keep a variable around (e.g. netWorth in this case)? Why not just make it a global variable in that case?
I sort of understand the general definition of stack memory, heap
memory, malloc, pointers, and memory addresses... I'd like a little
commentary and explanation about what the difference between them
is...
<= OK...
Program 1 allocates memory on the stack for the variable netWorth, and
uses a pointer to this stack memory address to directly modify the
variable netWorth. This is an example of pass by reference.
<= Absolutely correct!
Q: Program 2 ... Does the local copy of worthRef get destroyed when it's returned?
A: int netWorth exists only within the scope of A(), whicn included the invocation of B().
Program 1 and Program 3 ... one is passing a pointer to a stack variable versus one passing a pointer to a heap variable.
Q: But in this situation, why do I even need the heap?
A: You don't. It's perfectly OK (arguably preferable) to simply take addressof (&) int, as you did in Program 1.
Q: The heap allows the programmer to choose the lifetime of the variable, right?
A: Yes, that's one aspect of allocating memory dynamically. You are correct.
Q: Why not just make it a global variable in that case?
A: Yes, that's another alternative.
The answer to any question "Why choose one design alternative over another?" is usually "It depends".
For example, maybe you can't just declare everything "local variable" because you're environment happens to have a very small, limited stack. It happens :)
In general,
If you can declare a local variable instead of allocating heap, you generally should.
If you can avoid declaring a global variable, you generally should.
Checking your understanding:
Program 1 allocates memory on within the function stack frame of A() for the variable netWorth, and passes the address of netWorth as a pointer to this stack memory address to function B() allowing B() to directly modify the value for the variable netWorth stored at that memory address. This is an example of pass by reference passing the 'address of' a variable by value. (there is no pass by reference in C -- it's all pass by value) No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns the integer back to A() as result. (a function can always return its own type) This is an example of pass by value (because there is only pass by value in C).
(Does the local copy of worthRef get destroyed when it's returned? - answer yes, but since a function can always return its own type, it can return the int value to A() [which is handled by the calling convention for the platform])
Program 3 allocates memory on the heap for the variable netWorth, and uses a pointer to this heap memory address to directly modify the variable netWorth. While "stack/heap" are commonly used terms, C has no concept of stack or heap, the distiction is that variables are either declared with Automatic Storage Duration which is limited to the scope within which they are declared --or-- when you allocate with malloc/calloc/realloc, the block of memory has Allocated Storage Duration which is good for the life of the program or until the block of memory is freed. (there are also static and thread local storage durations not relevant here) See: C11 Standard - 6.2.4 Storage durations of objects This is an example of pass by reference. (It's all pass by Value!) No copy of the netWorth variable is made. A pointer to the address originally returned by malloc is used throughout to reference this memory location.
Related
Below is the sample code written in C:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
int* second;
void myTest1(int a, bool check){
if(check){
second = &a;
}
printf("%d", *(second));
printf(" ");
}
int main(int argc, char const *argv[])
{
int a =1;
int b = 2;
int c=3;
myTest1(a,true);
myTest1(b,false);
myTest1(c,false);
}
I expect the output be like
1 1 1
But the actual output is
1 2 3
I am a bit confused about it, void myTest1(int a, bool check) here I believed a should have function scope. But it seems that the memory location of a is reused in every function call.
I am building by using command gcc <filename>.c
Below are some system details:
OS: Ubuntu
GCC compiler version: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
You are setting second to the address of a variable that goes out of scope at the end of the function invocation. Once the variable is out of scope, the memory it occupied is no longer yours, and it can be reused. That memory address happens to be reused in each subsequent invocation, with the new argument's value copied into it.
Don't store the address of a local variable and access that address after the variable has gone out of scope. This produces undefined behaviour. You cannot make any assumptions about what may be on the other end of that pointer.
Regarding the following:
here I believed a should have function scope
It does...
But it seems that the memory location of "a" is reused in every function call.
Well, reuse is a necessary and natural result of a having "function scope". Unless you are assuming some kind of garbage-collection behavior, where keeping a pointer to this memory prevents it from being reused, a's memory location should be reused once a is inaccessible, otherwise this is the very definition of a memory leak. If the memory for function arguments weren't reused, every function invocation would leak memory by design.
In C and C++ it's your job not to store the address of stack-allocated variables that have gone out of scope (or at least, not to try to use that address after the variable has gone out of scope). The act of storing an address in a pointer does not innately protect that memory from reuse. It's up to you to either allocate that memory on the heap and manage its lifetime yourself, or to allow the stack to manage your memory and not hold onto pointers that outlive the lifetime the variables they point to.
You're causing undefined behavior. When a function returns, all its automatic variables are destroyed, and any pointers to them become invalid.
On the second and third calls to myTest1(), second points to a variable from the first call. Since this variable no longer exists, dereferencing the pointer results in undefined behavior.
You're getting the result you see because in practice each successive call to the function happens to use the same location for the stack frame. So the address of a in each call is the same, so the old pointer will point to the value that was passed in the new call.
My question is about returning a value from a function call in c. I've read many questions and answers on this topic such as:
Returning a local variable confusion in C
I also understand that returning a pointer to a local variable is a problem since the local variable has no limited lifetime after the return. In my case below I am returning a pointer value not a pointer to a local variable. This should be ok yes? (i.e. returning &x would be bad)
int *foo(int a) {
int *x; //local scope local lifetime
...
x = &something_non-local;
...
return x; //return value of pointer
}
int main(void) {
int *bar;
bar = foo(10);
}
You can look at it this way. A variable represents a place in memory. Lifetime of a variable is defined by validity of this place in memory. The place in memory keeps a value. Any copy operation just copies values from one place in memory to another.
Every such memory place has an address. A pointer to a variable is just another variable which value is an address of another variable. A copy of a pointer is just a copy of an address from one place to another.
If a variable is declared in the global scope, than its memory will be valid till the exit of the program. If the variable is declared non-statically in a procedure, then its memory is valid till the end of this procedure. Technically it is probably allocated on stack which gets unallocated when the procedure returns, making this memory not valid.
In your case if pointer x points to a variable from a global scope, the variable itself is valid until the exit from the procedure. However, the return statement copies the value from the x into a different location just before the latter becomes invalid. As a result the value of x will end up in bar. This value is the address of a static variable which is still valid.
There would be a different story if you try to return the address of x, i.e. &x. This would be an address of memory which existed inside of the procedure. After return it will point to an invalid memory location.
So, if your something_non-local points to such a thing, than you are in trouble. It should point to something static or something in heap.
BTW, malloc allocates a memory in heap which is valid till you use free.
What is the difference between declaring an array "dynamically",
[ie. using realloc() or malloc(), etc... ]
vs
declaring an array within main() with Global scope?,
eg.
int main()
{
int array[10];
return 0;
}
I am learning, and at the moment it feels that there is not much differnce between
declaring a variable (array, whatever) -with Global scope,
when compared to a
dynamically allocated variable (array, whatever) -AND never calling free() on it AND allowing it to be 'destoryed' when the program ends'
What are the consequences of either option?
EDIT
Thank you for your responses.
Global scope should have been 'local scope' -local to main()
When you declare an array like int arr[10] in a function, the space for the array is allocated on the stack. The memory will be freed when your function exits.
When you declare an array or any other data structure using malloc() or realloc(), you allocated the space on the heap and the memory will only be freed afer the program exits. So when the program is running, you are responsible for freeing it using free() after you no longer want to use it. If you don't free it and make your array pointer point to something else, you will create a memory leak. However, your computer will always be able to retrieve all the program's used memory after the program ends because of virtual memory.
As kaylum said in comment below your question, the array in your second example does not have global scope. Its scope is limited to main(), and it is inaccessible in other scopes unless main() explicitly makes it available (e.g. passes it by argument to another function).
Dynamic memory allocation means that the programmer explicitly allocates memory when needed, and explicitly releases it when no longer needed. Because of that, the amount of memory allocated can be determined at run time (e.g. calculated from user input). Also, if the programmer forgets to release the memory, or reallocates it inappropriately, memory can be leaked (still allocated by the program, but not accessible by the program). For example;
/* within a function */
char *p = malloc(100);
p = malloc(200);
free(p);
leaks 100 bytes, every time this code is executed, because the result of the first malloc() call is never released, and it is then inaccessible to the program because its value is not stored anywhere.
Your second example is actually an array of automatic storage duration. As far as your program is concerned, it only exists until the end of the scope in which it is created. In your case, as main() returns, the array will cease to exist.
An example of an array with global scope is
int array[10];
void f() {array[0] = 42;}
int main()
{
array[0] = 10;
f();
/* array[0] will be 42 here */
}
The difference is that this array exists and is accessible to every function that has visibility of the declaration, within the same compilation unit.
One other important difference is that global arrays are (usually) zero initialised - a global array of int will have all elements zero. A dynamically allocated array will not have elements initialised (unless created with calloc(), which does initialise to zero). Similarly, an automatic array will not have elements initialised. It is undefined behaviour to access the value of something (including an array element) that is uninitialised.
So
#include <stdio.h>
int array[10];
int main()
{
int *array2;
int array3[10];
array2 = malloc(10*sizeof(*array2));
printf("%d\n", array[0]); /* okay - will print 0 */
printf("%d\n", array2[0]); /* undefined behaviour. array2[0] is uninitialised */
printf("%d\n", array3[0]); /* undefined behaviour. array3[0] uninitialised */
return 0;
}
Obviously the way to avoid undefined behaviour is to initialise array elements to something valid before trying to access their value (e.g. printing them out, in the example above).
I have a function that is called recursively a number of times. Inside this function I malloc memory for a struct and pass it as an argument into the recursive call of this function. I am confused whether I can keep the name of the variable I am mallocing the same. Or is this going to be a problem?
struct Student{
char *studentName;
int studentAge;
};
recursiveFunction(*struct){ //(Whoever calls this function sends in a malloced struct)
Student *structptr = malloc(sizeof(Student));
<Do some processing>
.
.
if(condition met){
return;
}
else{
recursiveFunction(structptr);
}
}
free(){} // All malloced variables are free'd in another function
Would this be a problem since the name of the variable being malloced doesnt change in each recursive call.
The short answer is no. When you declare a variable it is scoped at the level where it is declared, in your case within this function. Each successive recursive call creates a new scope and allocates that memory within that scope so the name of your variable will not cause problems. However, you do want to be very careful that you free any memory that you malloc() before returning from your function as it will not be accessible outside the scope of your function unless you pass back a pointer to it. This question provides a lot of helpful information on using malloc() within functions. I also recommend reading more about scope here.
Each malloc() must have a matching free(). Either you need to free the record inside recursiveFunction (e.g. immediately before it exits), or in a function called by recursiveFunction or you need to maintain a list of them and free them elsewhere.
The name of the 'variable being malloced' being the same is irrelevant. In any case, it is not the variable that is being malloc()d; rather it is memory that is being malloc()d and the address stored in a variable. Each recursive iteration of recursiveFunction has a different stack frame and thus a different instance of this variable. So all you need to do is ensure that each malloc() is paired with a free() that is passed the address returned by malloc().
If you want to check you've done your malloc() / free() right, run valgrind on the code.
Can keep the name of the variable I am mallocing the same?
Yes, in a recursive function this is fine. As the function is called recursively, each variable holding the malloc'd pointer (it doesn't hold the memory itself) will be allocated on a new stack frame.
However, you're going to have to free that memory somehow. Only the pointer to the memory is on the stack, so only the pointer is freed when the function exits. The malloc'd memory lives on. Either at the end of each call to the function, or all that memory will have to be returned as part of a larger structure and freed later.
I am confused whether I can keep the name of the variable I am mallocing the same.
You seem to be confused about the concept of scope. Functions in C define scopes for the (local) variables you declare within them. That means that when you declare a local variable bar inside some function foo, then when you reference bar inside that function you reference whatever you declared it to be.
int bar = 21;
void foo(void) {
int bar = 42;
// ...
bar; // This is the bar set to 42
}
Now scope is only the theoretical concept. It's implemented using (among other details that I skip over here) so called stack frames:
When you call foo, then a new stack frame is created on the call stack, containing (this is highly dependent on the target architecture) things like return address (i.e. the address of the instruction that will be executed after foo), parameters (i.e. the values that you pass to a function) and, most importantly, space for the local variables (bar).
Accessing the variable bar in foo is done using addresses relative to the current stack frame. So accessing bar could mean access byte 12 relative to the current stack frame.
When in a recursive function the function calls itself, this is handled (mostly, apart from possible optimizations) like any other function call, and thus a new stack frame is created. Accessing the same (named) variable from within different stack frames will (because, as said, the access is using a relative address) thus access a different entities.
[Note: I hope this rather rough descriptions helps you, this is a topic that is - when talked about in depth - extremely depending on actual implementations (compilers), used optimizations, calling convention, operating system, target architecture, ... ]
I put together a simple stupid example, which hopefully shows that what you want to do should be possible, given that you appropriately free whatever you allocated:
unsigned int crazy_factorial(unsigned int const * const n) {
unsigned int result;
if (*n == 0) {
result = 1;
} else {
unsigned int * const nextN = malloc(sizeof(unsigned int));
*nextN = *n - 1;
result = *n * crazy_factorial(nextN);
free(nextN);
}
return result;
}
Running this with some output shows what's going on.
Why can I return from a function an array setup by malloc:
int *dog = (int*)malloc(n * sizeof(int));
but not an array setup by
int cat[3] = {0,0,0};
The "cat[ ]" array is returned with a Warning.
Thanks all for your help
This is a question of scope.
int cat[3]; // declares a local variable cat
Local variables versus malloc'd memory
Local variables exist on the stack. When this function returns, these local variables will be destroyed. At that point, the addresses used to store your array are recycled, so you cannot guarantee anything about their contents.
If you call malloc, you will be allocating from the heap, so the memory will persist beyond the life of your function.
If the function is supposed to return a pointer (in this case, a pointer-to-int which is the first address of the integer array), that pointer should point to good memory. Malloc is the way to ensure this.
Avoiding Malloc
You do not have to call malloc inside of your function (although it would be normal and appropriate to do so).
Alternatively, you could pass an address into your function which is supposed to hold these values. Your function would do the work of calculating the values and would fill the memory at the given address, and then it would return.
In fact, this is a common pattern. If you do this, however, you will find that you do not need to return the address, since you already know the address outside of the function you are calling. Because of this, it's more common to return a value which indicates the success or failure of the routine, like an int, than it is to return the address of the relevant data.
This way, the caller of the function can know whether or not the data was successfully populated or if an error occurred.
#include <stdio.h> // include stdio for the printf function
int rainCats (int *cats); // pass a pointer-to-int to function rainCats
int main (int argc, char *argv[]) {
int cats[3]; // cats is the address to the first element
int success; // declare an int to store the success value
success = rainCats(cats); // pass the address to the function
if (success == 0) {
int i;
for (i=0; i<3; i++) {
printf("cat[%d] is %d \r", i, cats[i]);
getchar();
}
}
return 0;
}
int rainCats (int *cats) {
int i;
for (i=0; i<3; i++) { // put a number in each element of the cats array
cats[i] = i;
}
return 0; // return a zero to signify success
}
Why this works
Note that you never did have to call malloc here because cats[3] was declared inside of the main function. The local variables in main will only be destroyed when the program exits. Unless the program is very simple, malloc will be used to create and control the lifespan of a data structure.
Also notice that rainCats is hard-coded to return 0. Nothing happens inside of rainCats which would make it fail, such as attempting to access a file, a network request, or other memory allocations. More complex programs have many reasons for failing, so there is often a good reason for returning a success code.
There are two key parts of memory in a running program: the stack, and the heap. The stack is also referred to as the call stack.
When you make a function call, information about the parameters, where to return, and all the variables defined in the scope of the function are pushed onto the stack. (It used to be the case that C variables could only be defined at the beginning of the function. Mostly because it made life easier for the compiler writers.)
When you return from a function, everything on the stack is popped off and is gone (and soon when you make some more function calls you'll overwrite that memory, so you don't want to be pointing at it!)
Anytime you allocate memory you are allocating if from the heap. That's some other part of memory, maintained by the allocation manager. Once you "reserve" part of it, you are responsible for it, and if you want to stop pointing at it, you're supposed to let the manager know. If you drop the pointer and can't ask to have it released any more, that's a leak.
You're also supposed to only look at the part of memory you said you wanted. Overwriting not just the part you said you wanted, but past (or before) that part of memory is a classic technique for exploits: writing information into part of memory that is holding computer instructions instead of data. Knowledge of how the compiler and the runtime manage things helps experts figure out how to do this. Well designed operating systems prevent them from doing that.
heap:
int *dog = (int*)malloc(n*sizeof(int*));
stack:
int cat[3] = {0,0,0};
Because int cat[3] = {0,0,0}; is declaring an automatic variable that only exists while the function is being called.
There is a special "dispensation" in C for inited automatic arrays of char, so that quoted strings can be returned, but it doesn't generalize to other array types.
cat[] is allocated on the stack of the function you are calling, when that stack is freed that memory is freed (when the function returns the stack should be considered freed).
If what you want to do is populate an array of int's in the calling frame pass in a pointer to an that you control from the calling frame;
void somefunction() {
int cats[3];
findMyCats(cats);
}
void findMyCats(int *cats) {
cats[0] = 0;
cats[1] = 0;
cats[2] = 0;
}
of course this is contrived and I've hardcoded that the array length is 3 but this is what you have to do to get data from an invoked function.
A single value works because it's copied back to the calling frame;
int findACat() {
int cat = 3;
return cat;
}
in findACat 3 is copied from findAtCat to the calling frame since its a known quantity the compiler can do that for you. The data a pointer points to can't be copied because the compiler does not know how much to copy.
When you define a variable like 'cat' the compiler assigns it an address. The association between the name and the address is only valid within the scope of the definition. In the case of auto variables that scope is the function body from the point of definition onwards.
Auto variables are allocated on the stack. The same address on the stack is associated with different variables at different times. When you return an array, what is actually returned is the address of the first element of the array. Unfortunately, after the return, the compiler can and will reuse that storage for completely unrelated purposes. What you'd see at a source code level would be your returned variable mysteriously changing for no apparent reason.
Now, if you really must return an initialized array, you can declare that array as static. A static variable has a permanent rather than a temporary storage allocation. You'll need to keep in mind that the same memory will be used by successive calls to the function, so the results from the previous call may need to be copied somewhere else before making the next call.
Another approach is to pass the array in as an argument and write into it in your function. The calling function then owns the variable, and the issues with stack variables don't arise.
None of this will make much sense unless you carefully study how the stack works. Good luck.
You cannot return an array. You are returning a pointer. This is not the same thing.
You can return a pointer to the memory allocated by malloc() because malloc() has allocated the memory and reserved it for use by your program until you explicitly use free() to deallocate it.
You may not return a pointer to the memory allocated by a local array because as soon as the function ends, the local array no longer exists.
This is a question of object lifetime - not scope or stack or heap. While those terms are related to the lifetime of an object, they aren't equivalent to lifetime, and it's the lifetime of the object that you're returning that's important. For example, a dynamically alloced object has a lifetime that extends from allocation to deallocataion. A local variable's lifetime might end when the scope of the variable ends, but if it's static its lifetime won't end there.
The lifetime of an object that has been allocated with malloc() is until that object has been freed using the free() function. Therefore when you create an object using malloc(), you can legitimately return the pointer to that object as long as you haven't freed it - it will still be alive when the function ends. In fact you should take care to do something with the pointer so it gets remembered somewhere or it will result in a leak.
The lifetime of an automatic variable ends when the scope of the variable ends (so scope is related to lifetime). Therefore, it doesn't make sense to return a pointer to such an object from a function - the pointer will be invalid as soon as the function returns.
Now, if your local variable is static instead of automatic, then its lifetime extends beyond the scope that it's in (therefore scope is not equivalent to lifetime). So if a function has a local static variable, the object will still be alive even when the function has returned, and it would be legitimate to return a pointer to a static array from your function. Though that brings in a whole new set of problems because there's only one instance of that object, so returning it multiple times from the function can cause problems with sharing the data (it basically only works if the data doesn't change after initialization or there are clear rules for when it can and cannot change).
Another example taken from another answer here is regarding string literals - pointers to them can be returned from a function not because of a scoping rule, but because of a rule that says that string literals have a lifetime that extends until the program ends.