I was reading this article and saw this: "This article assumes that you already know and understand at least basically how the memory
map in GNU/Linux system works, and specially the difference between memory statically
allocated in the stack and memory dynamically allocated in the heap."
This confused me because I thought that stack and heap are dynamically allocated, meaning only allocated if necessary, and global variables and variables declared as "static" inside of a function are statically allocated, meaning always allocated.
For example, if I have
void f() {
int x = 1;
...
}
the value 1 only gets put on the stack and the stack pointer only gets incremented if the function f() gets called. Likewise, if I have
void f() {
int *x = malloc(1 * sizeof(int));
...
}
that heap memory only gets allocated if f() is called. However, if I have "int x = 1;" in the global section of the program or "static int x = 1;" within a function body, any time I run this program, that memory will be allocated in the data section with the value 1.
Am I wrong about any of this?
The stack itself is statically allocated. Variables allocated in the stack come and go, as control flow enters and leaves their scope.
Static variables are initialized only once, even if the initialization statement is inside of a function body.
Check the wikipedia example:
#include <stdio.h>
void func() {
static int x = 0;
/* x is initialized only once across five calls of func() and
the variable will get incremented five
times after these calls. The final value of x will be 5. */
x++;
printf("%d\n", x); // outputs the value of x
}
int main() { //int argc, char *argv[] inside the main is optional in the particular program
func(); // prints 1
func(); // prints 2
func(); // prints 3
func(); // prints 4
func(); // prints 5
return 0;
}
The stack is basically a large statically allocated array with a movable pointer that starts at the beginning of it. When you call a function (starting with main), the pointer moves (creating a stack frame), and the space in the stack frame is sliced up and given to local variables (how many local variables you have determines how big your function's stack frame will be). So local variables are kind of dynamic (they only come into existence once you enter the function), but the stack is of a static size. If you allocate a very large structure on it or recurse too deeply, you'll go past the end and the OS will take you down, a phenomenon known as stack overflow.
(
The icon actually illustrates this. The gray container at the bottom represents the static array that the stack is. The orange rectangles are frames created by function calls. As in a proper stack overflow, the frames overflow the container and boom: your program is dead. The fact that the frames go up illustrates another, rather special thing about the stack: new frames have lower addresses than the old ones, so stack overflows really happen at the beginning of the stack array rather than at the end of it (unless you think of arrays as starting at their largest index and ending at 0).
)
The stack is allocated in units of stack frames.
When a function is called, a stack frame is allocated for it.
When a function returns, its stack frame disappears.
The stack stores function arguments and local variables.
Once a function has its stack frame, it uses the bytes within that frame dynamically, as needed.
Allocating on the stack dynamically, like the heap
If you want to allocate stack memory the way you would allocate heap memory, you can use alloca() from <alloca.h>. It is quicker than heap allocation, but in most cases you don't need it, and it has disadvantages, so it is not recommended in general.
Describing stack allocation in different contexts might make it clearer:
From the point of view of a Linux thread (each process has one thread on creation by default, the main thread), the stack has a fixed size and is allocated when the thread is created (2 MB for IA-32 and 32 MB for IA-64 by default), and you can change the default size as needed. So you can say it is fixed, and static.
From the point of view of a function within a thread or process, its stack frame is allocated from the thread's stack memory when the function starts, and the stack frame disappears when the function finishes.
From the point of view of a non-static local variable inside a function, the variable is allocated from the function's stack frame as needed, dynamically.
So whether it should be called static or dynamic is for you to decide.
Related
void foo()
{
int i;
printf("%d",i++);
}
int main()
{
int j;
for(j=0;j<10;j++)
{
foo();
}
}
The output of the code is a series of 10 random but continuous numbers.
I wanted to know how is this possible if i is being initialized each time and my storage class is also auto?
Also is the stack frame for foo() assigned again every time it is called or is it the same one?
auto variables are not automatically initialized; they contain garbage. Those variables are typically allocated on the stack, with a quick stack operation. In foo(), you have such a variable, so the printf outputs random data.
Even if the function foo() were called recursively, the problem with auto variables would remain: every call makes a new stack frame containing garbage.
So,
Is the stack frame of a function called multiple times different each time?
YES. Unless you use static variables in that function. Those variables keep their value between calls, but they are in reality always the same object: no longer really "local", but rather global variables that are only visible inside that function's scope.
======== EDIT after comment ========
Well, the above sentence contains a formal error. It is not true that the stack frame will be different: it can be the same between calls (probably not if recursion is used). But, you can not be sure it is the same, so you must assume it is different each time. By assuming it is different, you state it is different, even if it is not true. Unless you want to exploit some arcane algorithm...
I'm a beginner to the C programming language. I sort of understand the general definition of stack memory, heap memory, malloc, pointers, and memory addresses. But I'm a little overwhelmed on understanding when to use each technique in practice and the difference between them.
I've written three small programs to serve as examples. They all do the same thing, and I'd like a little commentary and explanation about what the difference between them is. I do realize that's a naive programming question, but I'm hoping to connect some basic dots here.
Program 1:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated
on the stack in A.
*/
*worthRef = *worthRef + 1;
}
void A() {
int netWorth = 20;
B(&netWorth);
printf("%d", netWorth); // Prints 21
}
int main() {
A();
}
Program 2:
int B (int worthRef) {
/* worthRef is now a local variable. If
I return it, will it get destroyed
once B finishes execution?
*/
worthRef = worthRef + 1;
return (worthRef);
}
void A() {
int netWorth = 20;
int result = B(netWorth);
printf("%d", result); // Also prints 21
}
int main() {
A();
}
Program 3:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated on
the heap.
*/
*worthRef = *worthRef + 1;
}
void A() {
int *netWorth = (int *) malloc(sizeof(int));
*netWorth = 20;
B(netWorth);
printf("%d", *netWorth); // Also prints 21
free(netWorth);
}
int main() {
A();
}
Please check my understanding:
Program 1 allocates memory on the stack for the variable netWorth, and uses a pointer to this stack memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns it back to A() as result. This is an example of pass by value.
Does the local copy of worthRef get destroyed when it's returned?
Program 3 allocates memory on the heap for the variable netWorth, and uses a pointer to this heap memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
My main point of confusion is between Program 1 and Program 3. Both are passing pointers around; it's just that one is passing a pointer to a stack variable versus one passing a pointer to a heap variable, right? But in this situation, why do I even need the heap? I just want to have a single function to change a single value, directly, which I can do just fine without malloc.
The heap allows the programmer to choose the lifetime of the variable, right? In what circumstances would the programmer want to just keep a variable around (e.g. netWorth in this case)? Why not just make it a global variable in that case?
I sort of understand the general definition of stack memory, heap
memory, malloc, pointers, and memory addresses... I'd like a little
commentary and explanation about what the difference between them
is...
<= OK...
Program 1 allocates memory on the stack for the variable netWorth, and
uses a pointer to this stack memory address to directly modify the
variable netWorth. This is an example of pass by reference.
<= Absolutely correct!
Q: Program 2 ... Does the local copy of worthRef get destroyed when it's returned?
A: int netWorth exists only within the scope of A(), which includes the invocation of B().
Program 1 and Program 3 ... one is passing a pointer to a stack variable versus one passing a pointer to a heap variable.
Q: But in this situation, why do I even need the heap?
A: You don't. It's perfectly OK (arguably preferable) to simply take addressof (&) int, as you did in Program 1.
Q: The heap allows the programmer to choose the lifetime of the variable, right?
A: Yes, that's one aspect of allocating memory dynamically. You are correct.
Q: Why not just make it a global variable in that case?
A: Yes, that's another alternative.
The answer to any question "Why choose one design alternative over another?" is usually "It depends".
For example, maybe you can't just declare everything as a local variable because your environment happens to have a very small, limited stack. It happens :)
In general,
If you can declare a local variable instead of allocating heap, you generally should.
If you can avoid declaring a global variable, you generally should.
Checking your understanding:
Program 1 allocates memory within the stack frame of A() for the variable netWorth, and passes the address of netWorth as a pointer to function B(), allowing B() to directly modify the value of netWorth stored at that memory address. This is an example of passing the 'address of' a variable by value. (There is no pass by reference in C; it's all pass by value.) No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns the integer back to A() as result. (a function can always return its own type) This is an example of pass by value (because there is only pass by value in C).
(Does the local copy of worthRef get destroyed when it's returned? - answer yes, but since a function can always return its own type, it can return the int value to A() [which is handled by the calling convention for the platform])
Program 3 allocates memory on the heap for the variable netWorth, and uses a pointer to this heap memory address to directly modify the variable netWorth. While "stack" and "heap" are commonly used terms, C has no concept of stack or heap. The distinction is that variables are either declared with Automatic Storage Duration, which is limited to the scope within which they are declared, or, when you allocate with malloc/calloc/realloc, the block of memory has Allocated Storage Duration, which lasts for the life of the program or until the block of memory is freed. (There are also static and thread-local storage durations, not relevant here.) See: C11 Standard - 6.2.4 Storage durations of objects. This is again a pointer passed by value, not pass by reference. (It's all pass by value!) No copy of the netWorth variable is made. A pointer to the address originally returned by malloc is used throughout to reference this memory location.
I have a function that is called recursively a number of times. Inside this function I malloc memory for a struct and pass it as an argument into the recursive call of this function. I am confused whether I can keep the name of the variable I am mallocing the same. Or is this going to be a problem?
typedef struct Student {
char *studentName;
int studentAge;
} Student;
void recursiveFunction(Student *student) { // Whoever calls this function sends in a malloced struct
Student *structptr = malloc(sizeof(Student));
// <Do some processing>
// .
// .
if (condition met) {
return;
}
else {
recursiveFunction(structptr);
}
}
// All malloced variables are free'd in another function
Would this be a problem, since the name of the variable being malloced doesn't change in each recursive call?
The short answer is no. When you declare a variable it is scoped at the level where it is declared, in your case within this function. Each successive recursive call creates a new scope and allocates that memory within that scope so the name of your variable will not cause problems. However, you do want to be very careful that you free any memory that you malloc() before returning from your function as it will not be accessible outside the scope of your function unless you pass back a pointer to it. This question provides a lot of helpful information on using malloc() within functions. I also recommend reading more about scope here.
Each malloc() must have a matching free(). Either you need to free the record inside recursiveFunction (e.g. immediately before it exits), or in a function called by recursiveFunction or you need to maintain a list of them and free them elsewhere.
The name of the 'variable being malloced' being the same is irrelevant. In any case, it is not the variable that is being malloc()d; rather it is memory that is being malloc()d and the address stored in a variable. Each recursive iteration of recursiveFunction has a different stack frame and thus a different instance of this variable. So all you need to do is ensure that each malloc() is paired with a free() that is passed the address returned by malloc().
If you want to check you've done your malloc() / free() right, run valgrind on the code.
Can I keep the name of the variable I am mallocing the same?
Yes, in a recursive function this is fine. As the function is called recursively, each variable holding the malloc'd pointer (it doesn't hold the memory itself) will be allocated on a new stack frame.
However, you're going to have to free that memory somehow. Only the pointer to the memory is on the stack, so only the pointer is freed when the function exits. The malloc'd memory lives on. Either it is freed at the end of each call to the function, or all that memory will have to be returned as part of a larger structure and freed later.
I am confused whether I can keep the name of the variable I am mallocing the same.
You seem to be confused about the concept of scope. Functions in C define scopes for the (local) variables you declare within them. That means that when you declare a local variable bar inside some function foo, then when you reference bar inside that function you reference whatever you declared it to be.
int bar = 21;
void foo(void) {
int bar = 42;
// ...
bar; // This is the bar set to 42
}
Now scope is only the theoretical concept. It's implemented using (among other details that I skip over here) so called stack frames:
When you call foo, then a new stack frame is created on the call stack, containing (this is highly dependent on the target architecture) things like return address (i.e. the address of the instruction that will be executed after foo), parameters (i.e. the values that you pass to a function) and, most importantly, space for the local variables (bar).
Accessing the variable bar in foo is done using addresses relative to the current stack frame. So accessing bar could mean access byte 12 relative to the current stack frame.
When a recursive function calls itself, this is handled (mostly, apart from possible optimizations) like any other function call, and thus a new stack frame is created. Accessing the same (named) variable from within different stack frames will therefore access different entities, because, as said, the access uses a relative address.
[Note: I hope this rather rough description helps you. When discussed in depth, this topic depends heavily on the actual implementation (compiler), the optimizations used, the calling convention, the operating system, the target architecture, ... ]
I put together a simple stupid example, which hopefully shows that what you want to do should be possible, given that you appropriately free whatever you allocated:
unsigned int crazy_factorial(unsigned int const * const n) {
unsigned int result;
if (*n == 0) {
result = 1;
} else {
unsigned int * const nextN = malloc(sizeof(unsigned int));
*nextN = *n - 1;
result = *n * crazy_factorial(nextN);
free(nextN);
}
return result;
}
Running this with some output shows what's going on.
When we declare a pointer, it points to some random location or address in memory unless we explicitly assign a particular value (the address of some variable) to it.
Here is code:
int *p;
printf("int is %p\n",p);
float *j;
printf("float is %p\n",j);
double *dp;
printf("double is %p\n",dp);
char *ch ;
printf("char is %p\n",ch);
j=(float *)p;
printf("cast int to float %p\n",j);
output:
int is (nil)
float is 0x400460
double is 0x7fff9f0f1a20
char is (nil)
cast int to float (nil)
Rather than printing a random location, it prints (nil).
What is (nil) here? I don't understand the behaviour of the pointers here.
Uninitialized pointer variables don't point to a random address. Where they point is undefined.
In practice, they hold whatever value was left on the stack, which you can't know for sure. In your example, several of the pointers happen to have the value 0, so they happen to be null pointers, and they print as (nil).
Don't rely on such behavior, ever.
If by the "gnu" tag you mean that you're using glibc, then the reason is that the printf implementation in glibc will print "(nil)" when encountering a NULL pointer.
In other words, several of your pointers happen to have the value NULL (0), because that's what happened to be on the stack at that particular location.
Elaborating on what @yu hao has said:
Whenever a subroutine is invoked, a stack frame is allocated for it. This frame exists until a return statement is encountered.
A subroutine frequently needs memory space for storing the values of local variables, the variables that are known only within the active subroutine and do not retain values after it returns. To do so, the compiler allocates space by simply moving the top of the stack by enough to provide it. This is very fast compared to dynamic memory allocation, which uses the heap. Note that each separate activation of a subroutine gets its own separate space on the stack for its locals, known as a stack frame.
The main reason for doing this is to keep track of the point to which each active subroutine should return control when it finishes executing. An active subroutine is one that has been called but is yet to complete execution after which control should be handed back to the point of call. Such activations of subroutines may be nested to any level (recursive as a special case), hence the stack structure. If, for example, a subroutine DrawSquare calls a subroutine DrawLine from four different places, DrawLine must know where to return when its execution completes. To accomplish this, the address following the call instruction, the return address, is also pushed onto the call stack with each call.
Coming back to your question: during its execution, the function can make changes to its stack frame, and when a function returns, its frame is popped from the stack.
But the contents of the stack remain unchanged in this process. Only the stack pointer is modified to point to the previous frame. So when a new subroutine is called, a new frame is allocated on top of the previous one, and if the subroutine has uninitialized variables, they will print whatever value is stored in the memory allocated to them, which depends on the state of the stack at that point in time.
All of your pointers are either nil/undefined or just random values from the stack!
See: http://ideone.com/wwiy8F
If I define an array in an if statement, does memory get allocated at compile time? E.g.
if(1)
{
int a[1000];
}
else
{
float b[1000];
}
Then does memory of 2 * 1000 for the ints + 4 * 1000 for the floats get allocated?
It is reserved on the stack at run-time (assuming a non-trivial condition - in your case, the compiler would just exclude the else part). That means it only exists inside the scope block (between the {}).
In your example, only the memory for the ints gets allocated on the stack (1000 * sizeof(int)).
As you can guess, this is happening at run time. The generated code has instructions to allocate the space on the stack when the corresponding block of code is entered.
Keep in mind that this is happening because of the semantics of the language. The block structure introduces a new scope, and any automatic variables allocated in that scope have a lifetime that lasts as long as the scope does. In C, this is implemented by allocating it on the stack, which collapses as the scope disappears.
Just to drive home the point, note that the allocation would be different had the variables been of different nature.
if(1)
{
static int a[1000];
}
else
{
static float b[1000];
}
In this case, space is allocated for both the ints and the floats. The lifetime of these variables is the program. But the visibility is within the block scope they are allocated in.
Scope
Variables declared inside the scope of a pair of { } are on the stack. This applies to variables declared at the beginning of a function or in any pair of { } within the function.
int myfunc()
{
int i = 0; // On the stack, scoped: myfunc
printf("%i\n", i);
if (1)
{
int j = 1; // On the stack, scope: this if statement
printf("%i %i\n",i,j);
}
printf("%i %i\n",i,j); // Won't work, no j
}
These days the scope of the variables is limited to the surrounding { }. I recall that some older Microsoft compilers didn't limit the scope, and that in the example above the final printf() would compile.
So Where is it in memory?
The memory of i and j is merely reserved on the stack. This is not the same as memory allocation done with malloc(). That is important, because calling malloc() is very slow in comparison. Also with memory dynamically allocated using malloc() you have to call free().
In effect the compiler knows ahead of time what space is needed for a function's variables and will generate code that refers to memory relative to whatever the stack pointer is when myfunc() is called. So long as the stack is big enough (commonly somewhere between 1 MB and 8 MB by default, depending on the OS), all is good.
Stack overflow occurs in the situation where myfunc() is called with the stack pointer already close to the end of the stack (i.e. myfunc() is called by a function which in turn had been called by another, which itself was called by yet another, etc. Each layer of nested function calls moves the stack pointer on a bit more, and it is only moved back when functions return).
If the space between the stack pointer and the end of the stack isn't big enough to hold all the variables that are declared in myfunc(), the code for myfunc() will simply try to use locations beyond the end of the stack. That is almost always a bad thing, and exactly how bad and how hard it is to notice that something has gone wrong depends on the operating system. On small embedded micro controllers it can be a nightmare as it usually means some other part of the program's data (eg global variables) get silently overwritten, and it can be very hard to debug. On bigger systems (Linux, Windows) the OS will tell you what's happened, or will merely make the stack bigger.
Runtime Efficiency Considerations
In the example above I'm assigning values to i and j. This does actually take up a small amount of runtime. j is assigned 1 only after evaluation of the if statement and subsequent branch into where j is declared.
Say for example the if statement hadn't evaluated as true; in that case j is never assigned 1. If j was declared at the start of myfunc() then it would always get assigned the value of 1 regardless of whether the if statement was true - a minor waste of time. But consider a less trivial example where a large array is declared and initialised; that would take more execution time.
int myfunc()
{
int i = 0; // On the stack, scoped: myfunc
int k[10000] = {0}; // On the stack, scoped: myfunc. A complete waste of time
// when the if statement evaluates to false.
printf("%i\n", i);
if (0)
{
int j = 1; // On the stack, scope: this if statement
// It would be better to move the declaration of k to here
// so that it is initialised only when the if evaluates to true.
printf("%i %i %i\n",i,j,k[500]);
}
printf("%i %i\n",i,j); // Won't work, no j
}
Placing the declaration of k at the top of myfunc() means that a loop 10,000 long is executed to initialise k every time myfunc() is called. However it never gets used, so that loop is a complete waste of time.
Of course, in these trivial examples compilers will optimise out the unnecessary code, etc. In real code where the compiler cannot predict ahead of time what the execution flow will be then things are left in place.
Memory for the array in the if block will be allocated on stack at run time. else part will be optimized (removed) by the compiler. For more on where the variables will be allocated memory, see Segmentation Fault when writing to a string
As DCoder & paddy corrected me, the memory will be calculated at compile time but allocated at run-time in stack memory segment, but with the scope & lifetime of the block in which the array is defined. The size of memory allocated depends on size of int & float in your system. Read this for an overview on C memory map