Can we get the elements in the middle of "stack" directly? - c

Consider the following situation.
int main(void)
{
...
int a=10; // a goes onto the stack.
int b=20; // b goes onto the stack.
int c=30; // c goes onto the stack.
...
}
As we know, the “stack segment” follows the storage discipline of the stack data structure, and here the local variables a, b, and c are exactly stored in such a direction of memory, so in theory we can only access the element at the top of the stack.
But what if we do something like this?
printf("b = %d",b);
Local variable b is in the middle of a and c, but we can get it.
So...can we say that we can directly get the element in the middle of the stack?
[Image: a, b, and c stored in the stack]

the local variables a, b, and c are exactly stored in such a direction of memory
I don't know where you got this from, but it is not true, at least with modern compilers.
First of all, C itself doesn't specify anything about using a stack. How function calls are implemented is left to the implementation. Many common implementations use a stack-like data structure for function calls, in the sense that the most recently called function returns first.
But this doesn't mean that the local variables are stored in a stack-like structure. The compiler has many options, such as:
It can eliminate a variable completely if it is not needed at run time.
It can place variables in registers.
It can reorder variables.
In all of these cases, the only thing the compiler guarantees is that the observable behavior of the code does not change.
Since variables are not stored in a stack-like data structure, there is no problem accessing one in the middle.

First of all, the C standard never mentions a stack anywhere. Moving on from the standard: implementors may choose to store local variables in stack memory as part of a function's frame.
Now you are thinking that a stack is always accessed at the top, yet here we can access the variable b directly even though it sits somewhere in the middle. That reasoning is wrong: it is the function frames that are pushed onto the stack and popped off when the function is done (an implementation can do it another way, but speaking in general). The variables are not the unit of operation here.
By accessing b we are not violating any rule of the stack data structure. The function frames are the ones accessed in LIFO manner, not the variables inside those frames.
Also, it is a bit out of date nowadays to classify storage like that. We can simply say that these variables have automatic storage duration, and that's it. They could also be kept in a group of registers (the standard won't stop that). The function frames are the ones likely to have that stack data structure behavior.

Using the stack to store automatic variables is only an implementation detail; nothing in the standard requires it. But at a lower level it is indeed the most common implementation. This is because processors have a special register (the stack pointer) that tracks where return addresses are saved during function calls (the call and return instructions). When automatic variables are also stored on that stack, it is trivial to reclaim their storage at return time or at the end of the block. But they are not individually pushed onto a stack: a frame pointer marks the memory zone for the current block (including a reference to the enclosing frames), and the stack pointer is adjusted in one single operation by the size of the frame containing all the local variables. Those variables are then located by their offset from the current frame pointer. So they are known by their own addresses and not as elements of a stack.

Related

How stack structure works with compound statements inside a function?

I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read, I find that when a function is called, a stack frame is created on the stack which contains all the data for the function call: parameters, return address and local variables. The stack frame is removed, releasing the memory, when the function returns.
But what if we have a compound statement inside the function which has its own variables? Is the memory for the block's local variables also allocated inside the stack frame when the function is called, and released when it returns?
Example
int main(){
int a = 10;
if(int a<50){
int b=9;
}
else{
int c=10;
}
}
Is the memory for b and c allocated together with a when the function starts executing?
And deallocated when the function returns?
If so, then there is no difference, other than the visibility of the variable, between declaring it at the beginning of the function and declaring it inside another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables, or it may reserve space for 2 variables, i.e. one for a, while b and c share the other. Both implementations are legal. The compiler is even allowed to place the variables directly in registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
int a = 10;
if (a < 50) {
int b = 9;
} else {
int c = 10;
}
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.

Is the stack offset assigned to local stack variables ever reused, e.g. in case it becomes dead or goes out of scope?

In other words, will compilers allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function or do they look at liveness and the scope of variables too?
void zoo(int num) {
if (num) {
int a = foo();
bar(a);
} else {
int b = foo();
bar(b);
}
}
For example, in the above code a and b will be assigned different offsets on the stack, even though assigning them a single offset (e.g. rbp - 8) would have been legal too. My question is: will compilers like gcc and clang ever output assembly where multiple variables are assigned the same static offset?
Is there anything in the specifications about this?
I want to know if there is a unique mapping between source variables and the stack offsets present in a compiled assembly file.
There is, in general, no unique mapping between objects with automatic storage duration (“local” objects defined inside a function or block) and stack offsets. I have seen compiler-generated code reuse the same stack location for different objects, either because the use of one did not overlap the use of the other in the C code or because the compiler had moved one into a register for whatever purposes and no longer needed to use the stack location for it.
The C and C++ standards do not require implementations to implement their stack allocation in any particular way. They are free to reuse stack locations. They are also free to allocate all the stack space that might be needed [1] or to wait to see if particular blocks are entered or not before further allocating stack space for the objects inside those blocks.
Note
[1] Implementations that support variable-length arrays generally must wait until the size of the array can be determined before allocating space for it.

Assembly local variable and parameters

I have the following code
#include<stdio.h>
int adunare(int a,int b)
{
int c=3;
int d=6;
while(c>10) c++;
if(c>15) return a+b+c+d;
else return a+b+c-d;
}
int main()
{
int w=5;
int y=6;
printf("%d",adunare(w,y));
}
My question: in the generated assembly, the variables w and y are placed at [esp+24] and [esp+28].
Why does it put my variables there?
I know that local variables are always [ebp-....].
Why is it not [ebp-..] here?
I know that local variables are always [ebp-....]
They're not (as evidenced by your question too, I suppose).
It's legal for a compiler to compile really naively, always using a frame pointer (even in functions that don't do variable-size stack allocations) and always putting locals on the stack in the first place (which is definitely not a rule). In a first-year university course, it is sometimes pretended that this is normal, to keep things simple.
Not using a frame pointer is usually possible. It works mostly the same as if you had used one, except that offsets are calculated relative to the stack pointer, which you are now only allowed to move in predictable ways. Because it has to be predictable (that is, every instruction that references a stack slot can use a constant offset to do so), this optimization cannot be used in functions that use alloca or VLAs. In your example function neither is used, so no frame pointer is necessary.
Also, in general you should not expect local variables to correspond to specific stack slots in the first place, regardless of how they are addressed. It is allowed, common, and often a good thing to keep a variable in a register over the entire lifetime of the variable, especially if that lifetime is short or if the usage density is very high. On top of that, variables with non-overlapping lifetimes can (and should, because it reduces the stack size) share stack slots, since at most one of them needs storage at any one moment (thanks to the assumption of non-overlapping lifetimes).
It's also allowed to have a variable hop from one stack slot to another. This might happen when you swap two variables in a way that allows the swap to be resolved "virtually", by just changing which stack slot the variables live in and not actually exchanging the data.
That's probably a compiler optimization. The variables aren't used within main scope so are placed directly on the stack, ready for the function call.

Organization of Virtual Memory in C

For each of the following, where does it appear to be stored in memory, and in what order: global variables, local variables, static local variables, function parameters, global constants, local constants, the functions themselves (and is main a special case?), dynamically allocated variables.
How can I evaluate this experimentally, i.e., using C code?
I know that
global variables -- data
static variables -- data
constant data types -- code
local variables(declared and defined in functions) -- stack
variables declared and defined in main function -- stack
pointers(ex: char *arr,int *arr) -- data or stack
dynamically allocated space(using malloc,calloc) -- heap
You could write some code to create all of the above, and then print out their addresses. For example:
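#include <stdio.h>
void func(int a) {
int i = 0;
/* %p expects a void *, so cast the object pointers */
printf("local i address is %p\n", (void *)&i);
printf("parameter a address is %p\n", (void *)&a);
}
int main(void) {
func(1);
/* function pointer to void * is a common extension, not strict ISO C */
printf("func address is %p\n", (void *)&func);
}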
Note that the function address is a bit tricky: you have to cast it to void *, and when you take the address of a function you omit the (). Compare the memory addresses and you will start to get a picture of where things are. Normally text (instructions) is at the bottom (closest to 0x0000), the heap is in the middle, and the stack starts at the top and grows down.
In theory
Pointers are no different from other variables as far as memory location is concerned.
Local variables and parameters might be allocated on the stack or directly in registers.
constant strings will be stored in a special data section, but basically the same kind of location as data.
numerical constants themselves will not be stored anywhere, they will be put into other variables or translated directly into CPU instructions.
for instance int a = 5; will store the constant 5 into the variable a (the actual memory is tied to the variable, not the constant), but a *= 5 will generate the code necessary to multiply a by the constant 5.
main is just a function like any other as far as memory location is concerned. A local main variable is no different from any other local variable, main code is located somewhere in code section like any other function, argc and argv are just parameters like any others (they are provided by the startup code that calls the main), etc.
code generation
Now if you want to see where the compiler and runtime put all these things, a possibility is to write a small program that defines a few of each, and ask the compiler to produce an assembly listing. You will then see how each element is stored.
For heap data, you will see calls to malloc, which is responsible for interfacing with the dynamic memory allocator.
For stack data, you will see strange-looking references relative to the stack pointer or frame pointer (the esp and ebp registers on 32-bit x86), which are used both for parameters and for (automatic) local variables.
For global/static data, you will see labels named after your variables.
Constant strings will probably be labelled with an awful name, but you will notice they all go into a read-only data section (usually named .rodata with gcc, .rdata with MSVC) that will be linked next to data.
runtime addresses
Alternatively, you can run this program and ask it to print the addresses of each element. This, however, will not show you the register usage.
If you take the address of a variable, you force the compiler to put it into memory, whereas it could otherwise have kept it in a register.
Note also that the memory organization is compiler and system dependent. The same code compiled with gcc and MSVC may have completely different addresses and elements in a completely different order.
The optimizer is likely to do strange things too, so I advise compiling your sample code with all optimizations disabled first.
Looking at what the compiler does to gain size and/or speed might be interesting though.

why windows use stacks for storing the local variables?

Why does C use a stack for storing local variables? Is this just to have an independent memory space, or to get automatic cleanup of all the local variables and objects once they go out of scope?
I have a few more questions along the same lines.
Question 1) How are local variables referenced from the instruction part? Consider NewThreadFunc, the function that is called via CreateThread.
DWORD WINAPI NewThreadFunc(PVOID p_pParam)
{
int l_iLocalVar1 = 10;
int l_iLocalVar2 = 20;
int l_iSumLocalVar = l_iLocalVar1 + l_iLocalVar2;
}
The stack for this thread would look like this,
| p_pParam |
| NewThreadFunc()|
| 10 |
| 20 |
| 30 |
| |
.
.
.
Now my question is: while executing this function, how does the CPU know the addresses of the local variables (l_iSumLocalVar, l_iLocalVar1 and l_iLocalVar2)? These variables are not pointers that store the address from which the value has to be fetched. My question is with respect to the stack above.
Question 2) If this function calls another function, how does the stack behave? As far as I know, the stack gets further divided. If this is true, how do the local variables of the calling function get hidden from the called function? Basically, how do local variables maintain the scope rules?
I know these could be very basic questions, but somehow I could not think of an answer to them.
Firstly, it is not "Windows" that uses a stack for local variables. It has absolutely nothing to do with "Windows", or with any other OS for that matter. It is your compiler that does that. Nobody forces your compiler to use the system stack for that purpose, but normally this is the simplest and most efficient way to implement local variables.
Secondly, compilers use stacks to store local variables (be that system-provided stacks or compiler-implemented stack) simply because stack-like storage matches the language-mandated semantics of local variables very precisely. The storage duration of local variables is defined by their declarative regions (blocks) which strictly nest into each other. This immediately means that storage durations of local variables follow the LIFO principle: last in - first out. So, using a stack - a LIFO data structure - for allocating objects with LIFO storage duration is the first and the most natural thing that comes to mind.
Local variables are typically addressed by their offset from the beginning of the currently active stack frame. The compiler knows the exact offset of each local variable at compile time. The compiler generates the code that will allocate the stack frame for the current function by: 1) memorizing the current position of the stack pointer when the function is entered (let's say it is memorized in register R1) and 2) moving the current stack pointer by the amount necessary to store all local variables of the function. Once the stack frame is allocated in this fashion, your local variables l_iLocalVar1, l_iLocalVar2 and l_iSumLocalVar will simply be accessed through addresses R1 + 6, R1 + 10 and R1 + 14 (I used arbitrary offsets). In other words, local variables are not accessed by specific address values, since these addresses are not known at compile time. Instead local variables are accessed through calculated addresses. They are calculated as some run-time base address value + some compile-time offset value.
Normally the system calling convention reserves a register to be used as a "stack pointer". Local variable accesses are made relative to this register's value. Since every function must know how much stack space it uses, the compiler emits code to ensure the stack pointer is adjusted correctly for each function's requirements.
The scope of local variables is only enforced by the compiler, since it's a language construct, not anything to do with hardware. You can pass addresses of stack variables to other functions and they'll work correctly.
why is the stack used for local variables?
Well, the stack is an easy to use structure to reserve space for temporary variables. It has the benefit that it will be removed almost automatically when the function returns. An alternative would be to allocate memory from the OS, but then this would cause heavy memory fragmentation.
The stack can be easily allocated as well as freed again, so it is a natural choice.
All the variable addresses are relative to the stack pointer, which is adjusted at each function call and return. It is a fast, easy way to allocate and clean up the memory used by these variables.

Resources