Storage class of referenced memory [duplicate] - c

Stumbled upon this interview question somewhere,
In C,
Given a variable x, how do you find out if the space for that variable is allocated on the stack or heap?
(Is there any way to find this out programmatically, without going through the symbol table, etc.? And does knowing whether the space is allocated on the stack or the heap have any practical implications?)

No, not in general.
Do you know of gcc -fsplit-stack ?
It is up to the implementation to decide whether to allocate a contiguous stack or a stack where blocks are interleaved with heap blocks in memory. Good luck figuring out whether a block was allocated for the heap or the stack when the latter is split.

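If you are working on an architecture that places the stack at higher addresses than the heap, you could compare the variable's address with the bounds of the current thread's stack. Using the pthread threading API (including the non-portable GNU extension pthread_getattr_np), this comparison would look like this:
#define _GNU_SOURCE /* needed for pthread_getattr_np, a GNU extension */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
/* Returns 1 if ptr lies within the calling thread's stack, 0 otherwise. */
int is_stack(void *ptr)
{
    pthread_attr_t attr;
    void *stack;      /* lowest address of the stack region */
    size_t stacksize;
    int found = 0;
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;     /* could not query the stack; report "not on the stack" */
    if (pthread_attr_getstack(&attr, &stack, &stacksize) == 0)
        found = (uintptr_t) ptr >= (uintptr_t) stack
            && (uintptr_t) ptr < (uintptr_t) stack + stacksize;
    pthread_attr_destroy(&attr);
    return found;
}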
The test:
int main()
{
int x;
int *p1 = malloc(sizeof(int));
int *p2 = &x;
printf("%d %d\n", is_stack(p1), is_stack(p2));
return 0;
}
...prints 0 1, as expected.
The above code will not detect storage from stacks in other threads. To do that, the code would need to track all created threads.

This is NOT guaranteed by any standard BUT
on most platforms the stack grows down from the highest available address, and the heap grows up from the bottom. If the most significant byte of the address is in the top half of the available address space for your platform, and you haven't allocated gigabytes of memory, it's a pretty good bet that it's on the stack.
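#include <cstdint>
#include <iostream>
int main()
{
    int x = 0;
    int* y = new int;
    // uintptr_t can hold a full pointer; a cast to int would truncate on 64-bit platforms
    std::uintptr_t a1 = reinterpret_cast<std::uintptr_t>(&x);
    std::uintptr_t a2 = reinterpret_cast<std::uintptr_t>(y);
    std::cout << std::hex << a1 << " " << a2 << std::endl;
    delete y;
}
gives the output ffbff474 21600 on the machine I'm typing this on.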

It might be a trick question. Variables have either automatic or static storage duration[*]. You can fairly safely say that automatics are allocated "on the stack", at least assuming they aren't optimized into registers. It's not a requirement of the standard that there be "a stack", but a conforming C implementation must maintain a call stack and associate automatic variables with the levels of the call stack. So whatever the details of what it actually does, you can pretty much call that "the stack".
Variables with static storage duration generally inhabit one or more data sections. From the POV of the OS, data sections might be allocated from the heap before the program starts, but from the POV of the program they have no relation to the "free store".
You can tell the storage duration of a variable by examining its definition in the source -- if it's in function scope then it's automatic unless marked static. If it's not in function scope then it has static duration regardless of whether or not it is marked static (since the static keyword means something different there).
There is no portable way to tell the storage duration of a variable from its address, but particular implementations might provide ways to do it, or tools that you can use with greater or lesser reliability to take a guess.
Objects can also have dynamic storage duration (which generally is what "allocated on the heap" is intended to mean), but such objects are not variables, so that would be the trick if there is one.
[*] Or thread-local in C11 and C++11.
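As a minimal sketch of the storage durations described above, the following C functions show that the lifetime follows from how the object is defined, not from its address. The function names count_static_calls, count_automatic_calls, and make_allocated_int are invented for this illustration.

```c
#include <stdlib.h>

/* Static storage duration: one object for the whole program run,
   so the value survives between calls. */
int count_static_calls(void)
{
    static int calls = 0;
    return ++calls;
}

/* Automatic storage duration: a fresh object on every call,
   so the count always restarts. */
int count_automatic_calls(void)
{
    int calls = 0;
    return ++calls;
}

/* Allocated storage duration: the int lives until free() is called.
   The pointer variable p itself is automatic; the object it points to is not. */
int *make_allocated_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}
```

Calling count_static_calls() twice yields 1 and then 2, while count_automatic_calls() yields 1 every time. The caller of make_allocated_int() is responsible for calling free() on the result.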

I don't think there is a solution. Code could classify a variable's address by comparing it against the stack (or heap) address range, but that would not be an exact method. At best, such code would only work on certain platforms.

No, it's not possible to determine that from the memory location; to be portable, the compiler would have to support it with something like isstack().

One possible way to track memory allocation specifically on the heap in C++ is to overload operator new so that it keeps track for you. Then you know that memory which was not allocated this way is not on the heap, so it must be on the stack (or in static storage).
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <new>
// Counts how many allocations have been made through operator new.
static std::uint32_t s_alloc_count = 0;
void* operator new(std::size_t size) {
    s_alloc_count++;
    std::cout << "Allocating " << size << " bytes\n";
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}
// Matching operator delete, since the memory came from malloc.
void operator delete(void* p) noexcept { std::free(p); }
By watching s_alloc_count you can see how many heap allocations have been made, and the size of each one is printed to the console. Debugging tools such as breakpoints, single-stepping, and console logging are one way to find where these allocations happen. This is not automated, but it works.
Note: this tip should only be used for testing; avoid this kind of code in production.
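Since the original question is about C, the same idea can be sketched there with a wrapper around malloc that records every pointer it hands out. This is only an illustration under stated assumptions: the names tracked_malloc, tracked_free, is_tracked_heap, tracked_live_blocks, and the fixed capacity MAX_TRACKED are all hypothetical, and the sketch only knows about blocks allocated through the wrapper.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64             /* arbitrary capacity for this sketch */

static void *tracked[MAX_TRACKED]; /* live blocks handed out by tracked_malloc */
static size_t tracked_count = 0;

/* Allocate with malloc and remember the pointer.
   Returns NULL on failure or when the table is full. */
void *tracked_malloc(size_t size)
{
    if (tracked_count == MAX_TRACKED)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        tracked[tracked_count++] = p;
    return p;
}

/* Return 1 if ptr is the start of a live block from tracked_malloc. */
int is_tracked_heap(const void *ptr)
{
    for (size_t i = 0; i < tracked_count; i++)
        if (tracked[i] == ptr)
            return 1;
    return 0;
}

/* Forget the pointer and release the block; unknown pointers are ignored. */
void tracked_free(void *ptr)
{
    for (size_t i = 0; i < tracked_count; i++) {
        if (tracked[i] == ptr) {
            tracked[i] = tracked[--tracked_count];
            free(ptr);
            return;
        }
    }
}

/* Number of blocks currently allocated through the wrapper. */
size_t tracked_live_blocks(void)
{
    return tracked_count;
}
```

A pointer for which is_tracked_heap() returns 0 is not necessarily on the stack: it may point to static storage, into the middle of a tracked block, or to memory obtained by calling malloc directly.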

Related

Is the stack pre-allocated in a process?

My question is as follows. I read somewhere that a Linux process allocates 8 MiB for its stack. If I have a C program that only puts two variables on the stack, is it right to say that I allocated them, or is it better to say that I just reused that space? The process gets its 8 MiB of stack regardless of how much my program actually uses, as long as I don't exceed it. So which term is appropriate: do I allocate data on the stack, or do I reuse space that was already allocated for the process?
#include <stdio.h>
void f() {
    int x = 5;
    printf("Value = %d Address = %p\n", x, (void *)&x);
}
void g() {
    int y = 10;
    printf("Value = %d Address = %p\n", y, (void *)&y);
}
int main(){
    f();
    g();
    return 0;
}
Note that the addresses will be the same, because I reused space that had already been allocated; the same wouldn't happen with malloc. In short, isn't the term "allocating data on the stack" not quite correct?
Is the stack pre-allocated in a process?
On a stack-based architecture, a process will have stack space available to it from the beginning of its execution. That could be described as "pre-allocated". However, do note that in some contexts, it may be possible for a process's stack to be extended during the lifetime of the process. Perhaps that changes how you would view it?
In any case, that has little to do with whether the process of assigning storage space for automatic variables should be described as "allocation". Although it has technical implications, it is of little account linguistically that such space may be carved out of the stack, as opposed to out of some other area of memory controlled by the process. The lifetimes of such objects do obey different rules than the lifetimes of mallocated objects, but so what?
if I have a C program for example, that I only allocate two variables on the stack, it is right to say that I allocated or is it better to say that I just reused that space?
People are likely to understand you just fine either way. Although I'm sure there are some who would quibble over whether "allocate" is technically correct for automatic variables, it is nevertheless widely used for them. If you are conversing with people, as opposed to writing technical documentation to which the distinction is important, then I would not hesitate to use "allocate" to describe assigning storage space to automatic variables.
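For what it's worth, the 8 MiB figure mentioned in the question is, as far as I know, the usual default soft limit on the main thread's stack size on Linux rather than memory that is committed up front. A minimal sketch of querying that limit with the POSIX getrlimit call follows; the helper name query_stack_limit is made up for this example.

```c
#include <sys/resource.h>

/* Store the soft stack-size limit (in bytes) in *limit.
   Returns 0 on success and -1 on failure, like getrlimit itself.
   The stored value may be RLIM_INFINITY if no limit is set. */
int query_stack_limit(rlim_t *limit)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *limit = rl.rlim_cur;
    return 0;
}
```

The limit says how far the stack may grow, not how much of it a program has used, which fits the answer's point that the space is available from the start regardless of how the act of using it is described.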

Why does malloc need to be used for dynamic memory allocation in C?

I have been reading that malloc is used for dynamic memory allocation. But if the following code works...
#include <stdio.h>
int main(void) {
int n;
printf("Enter the number of integers: ");
scanf("%d", &n);
// Dynamic allocation of memory?
int int_arr[n];
// Testing
for (int i = 0; i < n; i++) {
int_arr[i] = i * 10;
}
for (int i = 0; i < n; i++) {
printf("%d ", int_arr[i]);
}
printf("\n");
}
... what is the point of malloc? Isn't the code above just a simpler-to-read way to allocate memory dynamically?
I read on another Stack Overflow answer that if some sort of flag is set to "pedantic", then the code above would produce a compile error. But that doesn't really explain why malloc might be a better solution for dynamic memory allocation.
Look up the concepts for stack and heap; there's a lot of subtleties around the different types of memory. Local variables inside a function live in the stack and only exist within the function.
In your example, int_arr only exists until the function it is defined in returns. You can pass it down to functions you call, but you can't return int_arr and expect it to work.
malloc() is used when you want to create a chunk of memory which exists on the heap. malloc returns a pointer to this memory. This pointer can be passed around as a variable (eg returned) from functions and can be used anywhere in your program to access your allocated chunk of memory until you free() it.
Example:
'''C
#include <stdlib.h>
int *make_array(int length);
int *make_array_wrong(int length);
int main(int argc, char **argv){
    int length = 10;
    int *built_array = make_array(length); //malloc memory and pass heap pointer
    int *array = make_array_wrong(length); //will not work. Array in function was in stack and no longer exists when function has returned.
    built_array[3] = 5; //ok (assuming malloc succeeded)
    array[3] = 5; //bad: undefined behavior, shown only to illustrate the bug
    free(built_array);
    return 0;
}
int *make_array(int length){
    int *my_pointer = malloc(length * sizeof(int));
    //do some error checking for real implementation
    return my_pointer;
}
int *make_array_wrong(int length){
    int array[length];
    return array; //returns the address of a local that is about to disappear
}
'''
Note:
There are plenty of ways to avoid having to use malloc at all, by pre-allocating sufficient memory in the callers, etc. This is recommended for embedded and safety critical programs where you want to be sure you'll never run out of memory.
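A minimal sketch of that pre-allocation pattern: the caller owns the storage and the callee only fills it in, so no malloc or free is involved. The function name fill_multiples and its parameters are hypothetical.

```c
#include <stddef.h>

/* Fill a caller-provided buffer with 0, step, 2*step, ...
   The caller decides where the storage lives (stack, static, or heap). */
void fill_multiples(int *out, size_t count, int step)
{
    for (size_t i = 0; i < count; i++)
        out[i] = (int)i * step;
}
```

A caller could declare int buf[4]; and call fill_multiples(buf, 4, 10); the array stays valid for as long as the caller's own scope lasts.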
Just because something looks prettier does not make it a better choice.
VLAs have a long list of problems, not the least of which is that they are not a sufficient replacement for heap-allocated memory.
The primary -- and most significant -- reason is that VLAs are not persistent dynamic data. That is, once your function terminates, the data is reclaimed (it exists on the stack, of all places!), meaning any other code still hanging on to it is SOL.
Your example code doesn't run into this problem because you aren't using it outside of the local context. Go ahead and try to use a VLA to build a binary tree, then add a node, then create a new tree and try to print them both.
The next issue is that the stack is not an appropriate place to allocate large amounts of dynamic data -- it is for function frames, which have a limited space to begin with. The global memory pool, OTOH, is specifically designed and optimized for this kind of usage.
It is good to ask questions and try to understand things. Just be careful that you don't believe yourself smarter than the many, many people who took what now is nearly 80 years of experience to design and implement systems that quite literally run the known universe. Such an obvious flaw would have been immediately recognized long, long ago and removed before either of us were born.
VLAs have their place, but it is, alas, small.
Declaring local variables takes the memory from the stack. This has two ramifications.
That memory is destroyed once the function returns.
Stack memory is limited, and is used for all local variables, as well as function return addresses. If you allocate large amounts of memory, you'll run into problems. Only use it for small amounts of memory.
When you have the following in your function code:
int int_arr[n];
This means you allocated space on the function's stack; once the function returns, that space ceases to exist.
Imagine a use case where you need to return a data structure to a caller, for example:
Car* create_car(const char *model, const char *make)
{
    Car* new_car = malloc(sizeof(*new_car));
    ...
    return new_car;
}
Now, once the function finishes, you will still have your car object, because it was allocated on the heap.
The memory allocated by int int_arr[n] is reserved only until execution of the routine ends (when it returns or is otherwise terminated, as by longjmp). That means you cannot allocate things in one order and free them in another. You cannot allocate a temporary work buffer, use it while computing some data, then allocate another buffer for the results, and free the temporary work buffer. To free the work buffer, you have to return from the function, and then the result buffer will be freed too.
With automatic allocations, you cannot read from a file, allocate records for each of the things read from the file, and then delete some of the records out of order. You simply have no dynamic control over the memory allocated; automatic allocations are forced into a strictly last-in first-out (LIFO) order.
You cannot write subroutines that allocate memory, initialize it and/or do other computations, and return the allocated memory to their callers.
(Some people may also point out that the stack memory commonly used for automatic objects is commonly limited to 1-8 mebibytes while the memory used for dynamic allocation is generally much larger. However, this is an artifact of settings selected for common use and can be changed; it is not inherent to the nature of automatic versus dynamic allocation.)
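The work-buffer scenario described above can be sketched with malloc as follows. This is only an illustration; the function name make_squares is invented, and the "work" step is deliberately trivial.

```c
#include <stdlib.h>

/* Return a malloc'ed array holding the squares of 0..n-1, or NULL on failure.
   The caller must free() the result. */
int *make_squares(size_t n)
{
    int *work = malloc(n * sizeof *work);     /* temporary buffer, allocated first */
    if (work == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        work[i] = (int)i;

    int *result = malloc(n * sizeof *result); /* result buffer, allocated second */
    if (result != NULL)
        for (size_t i = 0; i < n; i++)
            result[i] = work[i] * work[i];

    free(work);      /* freed first, although it was allocated first */
    return result;   /* outlives this function */
}
```

With automatic arrays neither step is possible: the work buffer could not be released before the result buffer, and the result could not be handed back to the caller.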
If the allocated memory is small and used only inside the function, malloc is indeed unnecessary.
If the memory amount is extremely large (usually MB or more), the above example may cause stack overflow.
If the memory is still used after the function has returned, you need malloc or a global variable (static allocation).
Note that dynamic allocation through local variables as above (a variable-length array) may not be supported by some compilers.

Where is an array stored in memory?

I am trying to understand how memory is managed in a C program. I know that there are the following segments in memory:
Initialized Data segment
BSS
Stack
Heap
Code
Now consider the following program:
#include <stdio.h>
int main(){
int arr[4] = {1,2,3,4};
int x = 10;
printf("Hello World!");
}
In the above program, both arr and x are locally declared within the main function. I thought that this would mean that they would both be allocated space on the function stack.
However, when I ran the size command on Linux, I found out that the array is actually being allocated space in the data segment.
I have searched for this online but have found conflicting information. Some answers say that all locally declared variables should go on the stack, while others say that the array should go on the heap. I think the array would go on the heap if I were dynamically allocating memory using malloc, which is not the case in this example.
I have searched for this online but have found conflicting information.
Please do not read random blogs or such, they usually have bad information. On Stack Overflow wrong information tends to be downvoted or at least would usually have comments pointing out the inaccuracies and fallacies.
In the above program, both arr and x are locally declared within the main function. I thought that this would mean that they would both be allocated space on the function stack.
The C standard does not specify how memory for objects should be allocated. It only specifies that objects have storage durations, which define the lifetime of the object:
static, which will have lifetime from the beginning of the program until the very end
automatic, which will have the lifetime of the innermost block { ... } which contains the declaration (or compound literal), until the end of the block
thread-local, which will have the lifetime of a thread
allocated objects, which will be alive from malloc/calloc/realloc/aligned_alloc until corresponding free/realloc.
In addition to that, the C standard specifies that during its lifetime, an object will
have memory reserved for it
and have a constant address (which you can observe by using the & operator)
Now, in addition to that, there is the so-called as-if rule which says that a compiler can produce any program code for as long as the external behaviour of the program is the same, external behaviour meaning input, output, access to volatile objects and so on.
The variables in your program have automatic storage duration, which means every time you enter the main function you will have new objects with new lifetime until the end of the main function. Usually this would mean that they would be stored on the stack, because it will nicely handle the allocations and deallocations with minimal overhead. But your program has the same external behaviour as
#include <stdio.h>
int main(void) {
printf("Hello World!");
}
It means that the compiler can completely eliminate these two variables and not reserve any space for them.
Now, if you print address of the variables:
#include <stdio.h>
int main(void) {
int arr[4] = {1,2,3,4};
int x = 10;
printf("Hello World! %p, %p\n", (void *)arr, (void *)&x);
}
because the variables have their addresses taken and used for output, the compiler cannot optimize them out. Are they on the stack now? Well, the C standard does not say. They need to have a lifetime from at least the beginning of main until the end, but the C compiler can decide not to use the stack for them, as the external behaviour of that program would be the same as
#include <stdio.h>
static int arr[4] = {1,2,3,4};
static int x = 10;
int main(void) {
printf("Hello World! %p, %p\n", (void *)arr, (void *)&x);
}
which would put these variables in the static data segment; of course the addresses would be different but again C does not give any guarantees about where the particular objects are located in memory, just that they will have addresses.
However, when I ran the size command on linux, I found out that the array is actually being allocated space in the data segment.
I think you have misunderstood what you have seen.
The C-standard doesn't say anything about this. It only says that arr has automatic storage duration. However, most (if not all) systems will save both x and arr on a stack.
Try this code:
#include<stdio.h>
int main(){
int arr[4] = {1,2,3,4};
int x = 10;
static int i = 0;
printf("Hello World! arr is here %p and x is here %p\n", (void*)arr, (void*)&x);
++i;
if (i < 3) main();
return 0;
}
Possible output:
Hello World! arr is here 0x7ffcdaba4170 and x is here 0x7ffcdaba416c
Hello World! arr is here 0x7ffcdaba4140 and x is here 0x7ffcdaba413c
Hello World! arr is here 0x7ffcdaba4110 and x is here 0x7ffcdaba410c
Even if this isn't solid proof, it strongly indicates that the system is using a stack, that the stack grows towards lower addresses, and that both arr and x are stored on that stack.
BTW: Printing the stack-pointer can't be done in a portable way but this is a good read: Print out value of stack pointer
The storage of a program in C typically works as follows:
global variables -------> data
static variables -------> data
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables (declared and defined in functions) --------> stack
variables declared and defined in main function -----> also stack
pointers (ex: char *arr, int *arr) -------> data or stack, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space (using malloc, calloc, realloc) --------> heap
It is worth mentioning that what is informally called the "stack" corresponds to automatic storage duration in the C standard.

Are automatic local variables stored on the stack in C?

Okay, I know that main()'s automatic local variables are stored on the stack, as are any other function's automatic local variables, but when I tried the following code on gcc version 4.6.3:
#include <stdio.h>
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("%p\n%p\n%p\n%p\n", (void *)&var1, (void *)&var2, (void *)&var3, (void *)&var4);
}
the results are :
0xbfca41e0
0xbfca41e4
0xbfca41e8
0xbfca41ec
According to the results, var4 is at the top of the stack and var1 at the bottom, with the stack pointer now pointing just below var1's address. But why is var4 at the top and var1 at the bottom? var4 is declared after var1, so logically I would expect var1 to be at the top of the stack and every variable declared after it to be below it in memory. So in my example I expected this:
>>var1 at 0xbfca41ec
>>var2 at 0xbfca41e8
>>var3 at 0xbfca41e4
>>var4 at 0xbfca41e0
>>and stack pointer pointing here
..
..
EDIT 1:
After reading the comment by @AusCBloke I've tried the following code:
#include <stdio.h>
void fun(){
int var1;
int var2;
printf("inside the function\n");
printf("%p\n%p\n", (void *)&var1, (void *)&var2);
}
int main(int argc, char *argv[]) {
int var1;
int var2;
int var3;
int var4;
printf("inside the main\n");
printf("%p\n%p\n%p\n%p\n", (void *)&var1, (void *)&var2, (void *)&var3, (void *)&var4);
fun();
return 0;
}
And the results :
inside the main
0xbfe82d60
0xbfe82d64
0xbfe82d68
0xbfe82d6c
inside the function
0xbfe82d28
0xbfe82d2c
So the variables inside fun()'s stack frame are below the variables inside main()'s stack frame, which matches the nature of the stack, but within the same stack frame the variables are not necessarily ordered from top to bottom.
Thanks @AusCBloke, your comment helped me a lot.
There is no requirement for these variables to be allocated in the order in which they were declared. They can be moved around by the compiler, or even optimized out entirely. If you need the relative addresses to stay the same, use a struct.
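A small sketch of the struct suggestion: members of a struct are laid out in declaration order at increasing addresses, which separate local variables are not. The names struct vars and members_in_declaration_order are chosen for illustration only.

```c
#include <stddef.h>

/* Grouping the values in a struct fixes their relative order.
   Padding between members is still up to the implementation. */
struct vars {
    int var1;
    int var2;
    int var3;
    int var4;
};

/* Returns 1 if each member sits at a higher offset than the previous one,
   which the C standard guarantees for struct members. */
int members_in_declaration_order(void)
{
    return offsetof(struct vars, var1) < offsetof(struct vars, var2)
        && offsetof(struct vars, var2) < offsetof(struct vars, var3)
        && offsetof(struct vars, var3) < offsetof(struct vars, var4);
}
```

Comparing &v.var1 < &v.var4 for a struct vars v is also well defined, because both pointers refer to members of the same object.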
Objects with automatic storage duration are typically stored on the stack, but the language standard doesn't require it. In fact, the standard (the link is to the latest pre-release C11 draft) doesn't even mention the word "stack".
The word "stack", unfortunately, is ambiguous.
In the most abstract sense, a stack is a data structure in which the most recently added items are removed first (last-in first-out, or LIFO). The requirements regarding the lifetime of objects with automatic storage duration (i.e., objects defined within a function with no static keyword) imply some kind of stack-like allocation.
The word "stack" is also commonly used to refer to a contiguous region of memory, typically controlled by a "stack pointer" pointing to the top-most element. The stack grows by moving the stack pointer away from the base, and shrinks by moving it toward the base. (It can grow in either direction, toward higher or lower memory addresses.) Most C compilers use this kind of contiguous stack to implement automatic objects -- but not all do. There have been C compilers for IBM mainframe systems which allocate storage for function calls from a heap-like structure, and the addresses for nested calls need not be uniformly in either increasing or decreasing order.
This is an unusual implementation, and there are very good reasons that this approach is not commonly used (a contiguous stack is simpler, more efficient, and is typically supported by the CPU). But the C standard is carefully written to avoid requiring a specific scheme, and C code that's carefully written to be portable will work correctly regardless of which method a compiler chooses. You don't need to know. All you really need to know about the address of var1 is that it's &var1. If you write if (&var1 < &var2) { ... }, then you're probably doing something wrong (that expression's behavior is undefined, BTW).
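If the addresses of unrelated objects really must be ordered, for example while debugging, one hedged alternative is to convert them to integers first. The sketch below assumes the implementation provides the optional type uintptr_t; the helper name address_is_lower is hypothetical. The result of the conversion is implementation-defined, so the ordering is meaningful only on a given platform.

```c
#include <stdint.h>

/* Compare two addresses as integers. Converting a pointer to uintptr_t is
   implementation-defined rather than undefined, so this is legal but not
   portable: the integer order need not match any "real" memory order. */
int address_is_lower(const void *a, const void *b)
{
    return (uintptr_t)a < (uintptr_t)b;
}
```

On the common flat-address platforms this agrees with the numeric values printed by %p, but portable code should still avoid depending on it.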
That's the standard C answer. I see that your question is tagged gcc. As far as I know, all versions of gcc use a contiguous stack. But even so, there's rarely any benefit in taking advantage of this.
On many (most) modern platforms the stack grows from higher addresses in memory to lower addresses. I.e., when you start your program, the stack pointer is immediately set to some address in memory, which is determined by the maximum stack size of your program. As things get pushed onto the stack, the stack pointer actually moves down.
I could be wrong, but stacks start at lower memory addresses and are then added to. So it is correct for var4 to be on top. It is a stack after all!
Edit: the assembly code behind it has the stack pointer at the bottom of the memory stack, and whenever data is added the stack pointer is incremented so that the next variable falls on top.
I'm 99.9999% sure that the answer is Yes. Also, the stack grows downwards on Intel architecture machines, not upwards. The lower area becomes the virtual "top" of the stack (it's upside-down, so to speak).
So technically, the variables are in the correct order in stack memory.
EDIT: This is probably still compiler-specific, though.

memory location patterns on stack and heap

I'm just curious if there is any correlation between the length of the address of a variable (pointer) on stack and heap. On many occasions I have seen that those regarding stack variables are usually longer when compared to heap. For example consider the following simple test:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i = 0;
int *j = malloc(sizeof(int)); *j = 0;
printf("&i = %p\n j = %p\n", (void *)&i, (void *)j);
free(j);
return 0;
}
output:
&i = 0x7fffe9c7fa5c
j = 0x100e010
These results are obtained in linux using gcc; could this be OS/compiler dependent?
The results depend on positions of the heap(s) and stack(s) in the address space of the program. These are determined by linker and processor architecture.
Due to ASLR, the exact numbers should be random on modern systems.
Nevertheless, heaps will usually grow upwards, and stacks downwards. Additionally, for performance and memory management reasons, both heaps and stacks will always start on page boundaries.
I believe it's because of which regions of memory are designated as the stack and the heap. Since they start at opposite ends and grow towards the middle, it makes sense that one is lower and the other higher. It would be interesting to see what happens if you allocate 2 consecutive vars on the stack and 2 consecutive ones on the heap. This would help show which way the stack and heap grow. Actually, I think for this to work you need to make a new stack frame (a new function) and allocate the second variable there; otherwise you remain in the same stack frame.
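The experiment suggested above might look roughly like this in C: take the address of a local in one function, call a second function, and compare it with the address of a local there. The names callee_is_lower and stack_growth_hint are invented for this sketch, and it assumes uintptr_t is available. The result is a hint only, since the compiler may inline the call or place locals in any order.

```c
#include <stdint.h>

/* Compare the address of a local in this (deeper) frame with the address
   of a local in the caller's frame. */
static int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns -1 if the stack appears to grow toward lower addresses,
   +1 if it appears to grow toward higher addresses. */
int stack_growth_hint(void)
{
    volatile char caller_local = 0;
    return callee_is_lower(&caller_local) ? -1 : 1;
}
```

The same comparison on two consecutive malloc results says little about heap growth, because an allocator is free to hand out blocks in any order.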
