Is the stack pre-allocated in a process? - c

Well, my question is as follows, I saw somewhere that a linux process allocates 8 MiB on the stack to be used, if I have a C program for example, that I only allocate two variables on the stack, it is right to say that I allocated or is it better to say that I just reused that space? Since a process allocates 8 MiB on the stack it does not depend on the size that I am going to use in my program, as long as it does not exceed my stack, that is, whichever term is appropriate, I will allocate a data on the stack or I will reuse a data that has already been allocated by a linux process?
#include <stdio.h>
void f() {
int x = 5;
printf("Value = %d End = %p\n", x, &x);
}
void g() {
int y = 10;
printf("Value = %d End = %p\n", y, &y);
}
int main(){
f();
g();
return 0;
}
See that the addresses will be the same, because I reused the size that had already been allocated, the same wouldn't happen with malloc, summarizing the term Allocated right data in the Stack isn't very correct?

Is the stack pre-allocated in a process?
On a stack-based architecture, a process will have stack space available to it from the beginning of its execution. That could be described as "pre-allocated". However, do note that in some contexts, it may be possible for a process's stack to be extended during the lifetime of the process. Perhaps that changes how you would view it?
In any case, that has little to do with whether the process of assigning storage space for automatic variables should be described as "allocation". Although it has technical implications, it is of little account linguistically that such space may be carved out of the stack, as opposed to out of some other area of memory controlled by the process. The lifetimes of such objects do obey different rules than the lifetimes of mallocated objects, but so what?
if I have a C program for example, that I only allocate two variables on the stack, it is right to say that I allocated or is it better to say that I just reused that space?
People are likely to understand you just fine either way. Although I'm sure there are some who would quibble over whether "allocate" is technically correct for automatic variables, it is nevertheless widely used for them. If you are conversing with people, as opposed to writing technical documentation to which the distinction is important, then I would not hesitate to use "allocate" to describe assigning storage space to automatic variables.

Related

Why does malloc need to be used for dynamic memory allocation in C?

I have been reading that malloc is used for dynamic memory allocation. But if the following code works...
int main(void) {
int i, n;
printf("Enter the number of integers: ");
scanf("%d", &n);
// Dynamic allocation of memory?
int int_arr[n];
// Testing
for (int i = 0; i < n; i++) {
int_arr[i] = i * 10;
}
for (int i = 0; i < n; i++) {
printf("%d ", int_arr[i]);
}
printf("\n");
}
... what is the point of malloc? Isn't the code above just a simpler-to-read way to allocate memory dynamically?
I read on another Stack Overflow answer that if some sort of flag is set to "pedantic", then the code above would produce a compile error. But that doesn't really explain why malloc might be a better solution for dynamic memory allocation.
Look up the concepts for stack and heap; there's a lot of subtleties around the different types of memory. Local variables inside a function live in the stack and only exist within the function.
In your example, int_array only exists while execution of the function it is defined in has not ended, you couldn't pass it around between functions. You couldn't return int_array and expect it to work.
malloc() is used when you want to create a chunk of memory which exists on the heap. malloc returns a pointer to this memory. This pointer can be passed around as a variable (eg returned) from functions and can be used anywhere in your program to access your allocated chunk of memory until you free() it.
Example:
'''C
int main(int argc, char **argv){
int length = 10;
int *built_array = make_array(length); //malloc memory and pass heap pointer
int *array = make_array_wrong(length); //will not work. Array in function was in stack and no longer exists when function has returned.
built_array[3] = 5; //ok
array[3] = 5; //bad
free(built_array)
return 0;
}
int *make_array(int length){
int *my_pointer = malloc( length * sizeof int);
//do some error checking for real implementation
return my_pointer;
}
int *make_array_wrong(int length){
int array[length];
return array;
}
'''
Note:
There are plenty of ways to avoid having to use malloc at all, by pre-allocating sufficient memory in the callers, etc. This is recommended for embedded and safety critical programs where you want to be sure you'll never run out of memory.
Just because something looks prettier does not make it a better choice.
VLAs have a long list of problems, not the least of which they are not a sufficient replacement for heap-allocated memory.
The primary -- and most significant -- reason is that VLAs are not persistent dynamic data. That is, once your function terminates, the data is reclaimed (it exists on the stack, of all places!), meaning any other code still hanging on to it are SOL.
Your example code doesn't run into this problem because you aren't using it outside of the local context. Go ahead and try to use a VLA to build a binary tree, then add a node, then create a new tree and try to print them both.
The next issue is that the stack is not an appropriate place to allocate large amounts of dynamic data -- it is for function frames, which have a limited space to begin with. The global memory pool, OTOH, is specifically designed and optimized for this kind of usage.
It is good to ask questions and try to understand things. Just be careful that you don't believe yourself smarter than the many, many people who took what now is nearly 80 years of experience to design and implement systems that quite literally run the known universe. Such an obvious flaw would have been immediately recognized long, long ago and removed before either of us were born.
VLAs have their place, but it is, alas, small.
Declaring local variables takes the memory from the stack. This has two ramifications.
That memory is destroyed once the function returns.
Stack memory is limited, and is used for all local variables, as well as function return addresses. If you allocate large amounts of memory, you'll run into problems. Only use it for small amounts of memory.
When you have the following in your function code:
int int_arr[n];
It means you allocated space on the function stack, once the function will return this stack will cease to exist.
Image a use case where you need to return a data structure to a caller, for example:
Car* create_car(string model, string make)
{
Car* new_car = malloc(sizeof(*car));
...
return new_car;
}
Now, once the function will finish you will still have your car object, because it was allocated on the heap.
The memory allocated by int int_arr[n] is reserved only until execution of the routine ends (when it returns or is otherwise terminated, as by setjmp). That means you cannot allocate things in one order and free them in another. You cannot allocate a temporary work buffer, use it while computing some data, then allocate another buffer for the results, and free the temporary work buffer. To free the work buffer, you have to return from the function, and then the result buffer will be freed to.
With automatic allocations, you cannot read from a file, allocate records for each of the things read from the file, and then delete some of the records out of order. You simply have no dynamic control over the memory allocated; automatic allocations are forced into a strictly last-in first-out (LIFO) order.
You cannot write subroutines that allocate memory, initialize it and/or do other computations, and return the allocated memory to their callers.
(Some people may also point out that the stack memory commonly used for automatic objects is commonly limited to 1-8 mebibytes while the memory used for dynamic allocation is generally much larger. However, this is an artifact of settings selected for common use and can be changed; it is not inherent to the nature of automatic versus dynamic allocation.)
If the allocated memory is small and used only inside the function, malloc is indeed unnecessary.
If the memory amount is extremely large (usually MB or more), the above example may cause stack overflow.
If the memory is still used after the function returned, you need malloc or global variable (static allocation).
Note that the dynamic allocation through local variables as above may not be supported in some compiler.

Storage class of referenced memory [duplicate]

Stumbled upon this interview question somewhere,
In C,
Given a variable x, how do you find out if the space for that variable is allocated on the stack or heap?
(Is there any way to find it out programatically and not having to go through the symbol table, etc? And does finding if the space is allocated in stack or heap has any practical implications?)
No, not in general.
Do you know of gcc -fsplit-stack ?
It is up to the implementation to decide whether to allocate a contiguous stack or a stack where blocks are interleaved with heap blocks in memory. Good luck figuring out whether a block was allocated for the heap or the stack when the latter is split.
If you are working on an architecture that stores the stack on a larger address than the heap, you could compare the variable address with the bottom of the stack. Using the pthread threading API, this comparison would look like this:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <inttypes.h>
int is_stack(void *ptr)
{
pthread_t self = pthread_self();
pthread_attr_t attr;
void *stack;
size_t stacksize;
pthread_getattr_np(self, &attr);
pthread_attr_getstack(&attr, &stack, &stacksize);
return ((uintptr_t) ptr >= (uintptr_t) stack
&& (uintptr_t) ptr < (uintptr_t) stack + stacksize);
}
The test:
int main()
{
int x;
int *p1 = malloc(sizeof(int));
int *p2 = &x;
printf("%d %d\n", is_stack(p1), is_stack(p2));
return 0;
}
...prints 0 1, as expected.
The above code will not detect storage from stacks in other threads. To do that, the code would need to track all created threads.
This is NOT guaranteed by any standard BUT
on most platforms the stack grows down from highest address available, and heap grows up from the bottom if the most significant byte of the address is in the top half of the available memory space for your platform, and you haven't allocated gigabytes of memory, it's a pretty good bet it's on the stack.
#include <iostream>
#include <stdlib.h>
int main()
{
int x = 0;
int* y = new int;
unsigned int a1 = (int) &x;
unsigned int a2 = (int) y;
std::cout<<std::hex<<a1<<" "<<a2<<std::endl;
}
gives the output ffbff474 21600 on the machine I'm typing this.
It might be a trick question. Variables have either automatic or static storage duration[*]. You can fairly safely say that automatics are allocated "on the stack", at least assuming they aren't optimized into registers. It's not a requirement of the standard that there be "a stack", but a conforming C implementation must maintain a call stack and associate automatic variables with the levels of the call stack. So whatever the details of what it actually does, you can pretty much call that "the stack".
Variables with static storage duration generally inhabit one or more data sections. From the POV of the OS, data sections might be allocated from the heap before the program starts, but from the POV of the program they have no relation to the "free store".
You can tell the storage duration of a variable by examining its definition in the source -- if it's in function scope then it's automatic unless marked static. If it's not in function scope then it has static duration regardless of whether or not it is marked static (since the static keyword means something different there).
There is no portable way to tell the storage duration of a variable from its address, but particular implementations might provide ways to do it, or tools that you can use with greater or lesser reliability to take a guess.
Objects can also have dynamic storage duration (which generally is what "allocated on the heap" is intended to mean), but such objects are not variables, so that would be the trick if there is one.
[*] Or thread-local in C11 and C++11.
I don't think it has solutions. The code may adjust var's address by stack(heap) address scope, but it's would not be an exact way. At most, the code can only run in some certain platforms.
No it's not possible to determine that by memory location, the compiler would have to support it with isstack() to be portable.
One possible way to track memory allocation SPECIFICALY ON THE HEAP in C++ is overloading the new operator to keep tracking that for you. Then you know that if the memory is not allocated on the heap, it must be allocated on the stack.
#include <iostream>
#include <string>
// Variable to track how many allocations has been done.
static uint32_t s_alloc_count = 0;
void* operator new(size_t size){
s_alloc_count++;
std::cout << "Allocating " << size << " bytes\n";
return malloc(size);
}
Tracking the variable s_alloc_count you should be capable of see how many allocations on the heap has been done and have the size of these allocations printed on the console. Using debug tools like breakpoints, running the code "step-by-step" and console logging is one way to track where are these allocations. This is not an automated way but is a way to do that.
OBS: This tip should be only used for tests, avoid this type of code in production code.

It creates an array in C, What happen when come out of the program in the memory?

i have this doubt, You supose have an array with one billion of elements and you create the array as follows:
int array[1000] = {1,2, ..., n.}
and you finish the program.
not being like java where the machine java garbage collection the memory clean records about stored of this arrangement.
They stay in memory or released?
When the process exits, the operating system frees all memory that the process was using.
(This does not apply to certain small embedded operating systems.)
The memory of a program is managed by the operating system so it will get released in any case when the program quits.
In any case you're allocation, as it is written, could be
static and outside of any function (so it won't get managed on heap like it would happen in Java), it is not really allocated neither deallocated
automatic on stack (which is dynamic indeed but it is not how it is done in Java anyway) and it is released when it exits its scope
A real comparison would be something like
int *array = malloc(sizeof(int)*1000);
This would reside in memory until a free(array) is called or program quits.
If you do this
if (true) {
int array[1000] = {1,2,3};
//...
}
// array[1000] "freed" here.
Then the memory will be freed when you exit the "if" braces. This is because the memory resides on the stack and an "allocation" is just movement of the stack pointer. When the scope exits, the stack pointer is returned to where it was before the scope was entered. So in this case, allocation and deallocation is pretty much free (performance wise), assuming your stack is big enough to hold an additional 1000 integers. On some embedded systems, it won't be, and your app will crash.
Same goes for
int foo( int x )
{
int array[1000] = {1,2,3};
// ...
return array[0];
}
// array[] "freed" here.
Edit: in the last case, if you replace foo() with main, then the array will be "freed" when the program exits.
Normally OS will not allow you to have int a[10000000000000000000].
Their values stay in memory. But the memory will be returned.
Simply put, the memory will be released at the end of the program.
In c, If you define array like a[1000] then it will allocate in stack part and it will automatically free the memory as program exist. But if you dynamically allocate (by using malloc) then you have to use free(array) for freeing the memory.
Maximum size of your array depends on your system RAM size.

Do the automatic local variables are stored in the stack in C?

Okay I know that main()'s automatic local variables are stored in the stack and also any function automatic local variables too, but when I have tried the following code on gcc version 4.6.3:
#include <stdio.h>
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("%p\n%p\n%p\n%p\n",&var1,&var2,&var3,&var4);
}
the results are :
0xbfca41e0
0xbfca41e4
0xbfca41e8
0xbfca41ec
according to the results var4 on the top of the stack and var1 on the bottom of the stack and the stack pointer now pointing on the address below var1 address....but why var4 on the
top of the stack and var1 on the bottom...its declared after var1 so I think logically that var1 should be on the top of the stack and any variable declared after var1 should be below
it in memory...so in my example like this:
>>var1 at 0xbfca41ec
>>var2 at 0xbfca41e8
>>var3 at 0xbfca41e4
>>var4 at 0xbfca41e0
>>and stack pointer pointing here
..
..
EDIT 1:
After reading the comment by #AusCBloke I’ve tried the following code :
#include <stdio.h>
void fun(){
int var1;
int var2;
printf("inside the function\n");
printf("%p\n%p\n",&var1,&var2);
}
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("inside the main\n");
printf("%p\n%p\n%p\n%p\n",&var1,&var2,&var3,&var4);
fun();
return 0;
}
And the results :
inside the main
0xbfe82d60
0xbfe82d64
0xbfe82d68
0xbfe82d6c
inside the function
0xbfe82d28
0xbfe82d2c
so the variables inside fun() stack frame are below the variables inside main() stack frame and that’s true according to the nature of the stack ,..but inside the same stack frame its not necessary to be ordered from top to the bottom.
thanks #AusCBloke..... your comment helped me a lot
There is no requirement for these variables to be allocated in the order in which they were declared. They can be moved around by the compiler, or even optimized out entirely. If you need the relative addresses to stay the same, use a struct.
Objects with automatic storage duration are typically stored on the stack, but the language standard doesn't require it. In fact, the standard (the link is to the latest pre-release C11 draft)
doesn't even mention the word "stack".
The word "stack", unfortunately, is ambiguous.
In the most abstract sense, a stack is a data structure in which the most recently added items are removed first (last-in first-out, or LIFO). The requirements regarding the lifetime of objects with automatic storage duration (i.e., objects defined within a function with no static keyword) imply some kind of stack-like allocation.
The word "stack" is also commonly used to refer to a contiguous region of memory, typically controlled by a "stack pointer" pointing to the top-most element. The stack grows by moving the stack pointer away from the base, and shrinks by moving it toward the base. (It can grow in either direction, toward higher or lower memory addresses.) Most C compilers use this kind of contiguous stack to implement automatic objects -- but not all do. There have been C compilers for IBM mainframe systems which allocate storage for function calls from a heap-like structure, and the addresses for nested calls need not be uniformly in either increasing or decreasing order.
This is an unusual implementation, and there are very good reasons that this approach is not commonly used (a contiguous stack is simpler, more efficient, and is typically supported by the CPU). But the C standard is carefully written to avoid requiring a specific scheme, and C code that's carefully written to be portable will work correctly regardless of which method a compiler chooses. You don't need to know. All you really need to know about the address of var1 is that it's &var1. If you write if (&var1 < &var2) { ... }, then you're probably doing something wrong (that expression's behavior is undefined, BTW).
That's the standard C answer. I see that your question is tagged gcc. As far as I know, all versions of gcc use a contiguous stack. But even so, there's rarely any benefit in taking advantage of this.
On many (most) modern platform stack grows from higher addresses in memory to lower addresses. I..e. when you start your program, the stack pointer is immediately put to some address in memory, which is determined by the maximum stack size in your program. Once things get pushed into stack, the stack pointer actually moves down.
I could be wrong but stacks start in lower memory addresses and are then added to. So it is correct for var4 to be on top. It is a stack after all!
edit: the assembly code behind it has the stack pointer at the bottom of the memory stack and whenever data is added, the stackpointer is incremented so that the next variable falls ontop.
I'm 99.9999% sure that the answer is Yes. Also, the stack grows downwards on Intel architecture machines, not upwards. The lower area becomes the virtual "top" of the stack (it's upside-down, so to speak).
So technically, the variables are in the correct order in stack memory.
EDIT: This is probably still compiler-specific, though.

memory location patterns on stack and heap

I'm just curious if there is any correlation between the length of the address of a variable (pointer) on stack and heap. On many occasions I have seen that those regarding stack variables are usually longer when compared to heap. For example consider the following simple test:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i = 0;
int *j = malloc(sizeof(int)); *j = 0;
printf("&i = %p\n j = %p\n", &i, j);
free(j);
return 0;
}
output:
&i = 0x7fffe9c7fa5c
j = 0x100e010
These results are obtained in linux using gcc; could this be OS/compiler dependent?
The results depend on positions of the heap(s) and stack(s) in the address space of the program. These are determined by linker and processor architecture.
Due to ASLR, the exact numbers should be random on modern systems.
Nevertheless, heaps will usually grow upwards, and stacks downwards. Additionally, for performance and memory management reasons, both heaps and stacks will always start on page boundaries.
I believe it's because of the physical parts of the memory which we decide that they're called stack and heap. Since they start at opposite ends and grow towards the middle, it makes sense that one is lower and the other higher. It would be interesting to see what happens if you allocate 2 consecutive vars on the stack and 2 consecutive ones on the heap. This would help see which way the stack and heap grow. Actually I think for this to work you need to make a new stack frame (a new method) and allocate the second vars there, otherwise you remain in the same stack frame.

Resources