How assembly accesses/stores variables on the stack

How assembly accesses/stores variables on the stack - c

In assembly You can store data in registers or on the stack. Only the top of the stack can be accessed at any given moment (right?). Consider the following C code:
main(){
int x=2;
func();
}
func( int x ){
int i;
char a;
}
Upon calling func() the following is pushed onto the stack (consider a 32bit system):
variable x (4 bytes, pushed by main)
<RETURN ADDRESS> (4 bytes pushed by main?)
<BASE POINTER> (4 bytes pushed by func())
variable i (4 bytes, pushed by func())
variable a (1 byte, pushed by func())
I have the following questions:
In C code you can access the local variable from anywhere inside the function, but in assembly you can only access the top of the stack. The C code is translated into assembly (in machine code but assembly is the readable form of it). So how does assembly support the reading of variables that are not on top of the stack?
Did I leave out anything that would also be pushed to the stack in my example?
In assembly if you push a char on the stack or an int, how can it determine whethere it needs to push 4 bytes or 1 byte? Because it uses the same operation (push) right?
Thanks in advance
Gr. Maricruzz

The stack pointer at the beginning of the function is put into a register, and then the variables/arguments are accessed via this base address plus the offset for the variable.
If you want to see the code, instead of creating object files, let the compiler stop at creating assembler files. Then you can see exactly how it works. (Of course, that requires you to have a valid C program, unlike the one you have in the question now.)

The compiler is generating the assembly, each instruction set may differ but at the end of the day the stack is just a register holding an address to memory. The compiler is creating and knows the whole scope of the function it is creating and knows how far down to find each data item on the stack for local data items, so it will then create the appropriate code based on that instruction set to access those local items.
Some instruction sets you need to make a copy of the stack pointer and/or do math with the stack pointer as an operand but some other register as a result of that math, then based on that math (stack pointer + 8 words for example) you access that memory address. Some instruction sets have an addressing mode where you can in the load or store apply an offset to the stack pointer the math is done as part of the instruction execution and you dont have to use an intermediate result and a register.

Only the top of the stack can be accessed at any given moment (right?)
No, generally the ISA has instructions to access other elements on the stack as well. That is, accessing elements on the stack is not limited to push and pop like operations; typically you can just mov things back and forth between a stack location and a register.

Assembly can accesss any memory by address (just like C).
Simple, not optimized programs would put all local variables on stack before method execution, so variables addresses are address of execution frame plus some shift.
Then program can simple use pop and push method to store additional variables (i.e. subresults of some expression) on the top of the stack.
Summary:
There is register (ESP in x86) pointing to the top of the stack
Calling push is moving variable to the top of the stack and increasing this register
Calling pop is moving variable from the top of the stack and decreasing this register
Calling mov is moving variable between memory and registers and do nothing to stack register (ESP).

Related

In terms of using the stack, why do we need a base pointer and a stack pointer [duplicate]

This question already has answers here:
Why is it better to use the ebp than the esp register to locate parameters on the stack?
(1 answer)
What is the purpose of the EBP frame pointer register?
(5 answers)
Closed 3 years ago.
In terms of x86 assembly code anyways. I've been reading about function calls, but still can't fully grasp the need for a base / frame pointer (EBP) along with a stack pointer (ESP).
When we call a function, the current value of EBP will be placed on the stack and then EBP gets the current ESP value.
Place holders for the return value, function arguments and local variables of the function will then be placed on the stack, and the stack pointer ESP value will decrease (or increase) to point to after the last placeholder placed on the stack.
Now we have the EBP pointing to the beginning of the current stack frame, and ESP pointing to the end of the stack frame.
The EBP will be used to access the arguments and local variables of the function due to constant offsets from the EBP. That is fine. What I don't understand is, why can't the ESP just be used to access these variables also by using its offsets. The EBP points to the beginning of the stack frame , and the ESP points to the end of the stack frame. What's the difference?
The ESP shouldn't change from once there has been a placeholder for all the local variables etc. or should it?

Technically, it is possible (but sometimes hard) to track how many local and temporary variables are stored on the stack, so that accessing function input, and local variables can be done without EBP.
Consider the following "C" code ;
int func(int arg) {
int result ;
double x[arg+5] ;
// Do something with x, calculate result
return result ;
} ;
The numbers of items that are stored on the stack is now variables (arg+5 items of double). Calculating the location of 'arg' from the stack require run time calculation, which can have significant negative impact on performance.
With extra register (EBP), the location of arg is always at fixed location (EBP-2). Executing a 'return' is always simple - move BP to SP, and return, etc.
Bottom line, the decision to commit the EBP register to a single function (instead of using it as a general register) is a trade off between performance, simplicity, code size and other factors. Practical experience has shown the benefit outweigh the cost.

Side note about debugger/runtime tools:
Using of EBP make it easier for debugger (and other runtime tools) to 'walk the stack'. Tools can examine the stack at run-time, and without knowing anything about the current program stack (e.g., how many items have been pushed into eac frame), they can travel the stack all the way to the "main".
Without EBP pointing to the 'next' frame, run-time tools (including debugger) will face the very hard (impossible ?) task of knowing how to move from the ESP to specific local variables.

Why do all C functions converted to assembler start and end with some identical operations? [duplicate]

I know data in nested function calls go to the Stack.The stack itself implements a step-by-step method for storing and retrieving data from the stack as the functions get called or returns.The name of these methods is most known as Prologue and Epilogue.
I tried with no success to search material on this topic. Do you guys know any resource ( site,video, article ) about how function prologue and epilogue works generally in C ? Or if you can explain would be even better.
P.S : I just want some general view, not too detailed.

There are lots of resources out there that explain this:
Function prologue (Wikipedia)
x86 Disassembly/Calling Conventions (WikiBooks)
Considerations for Writing Prolog/Epilog Code (MSDN)
to name a few.
Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:
Keeping track of where to return to, when calling a function
Storage of local variables in the context of a function call
Passing arguments from calling function to callee.
The prolouge is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilog is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)
The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)
Then the prologue is executed:
push ebp ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp ; Set the stack-frame base pointer to be the current
; location on the stack.
sub esp, N ; Grow the stack by N bytes to reserve space for local variables
At this point, we have:
...
ebp + 4: Return address
ebp + 0: Calling function's old ebp value
ebp - 4: (local variables)
...
The epilog:
mov esp, ebp ; Put the stack pointer back where it was when this function
; was called.
pop ebp ; Restore the calling function's stack frame.
ret ; Return to the calling function.

C Function Call Conventions and the Stack explains well the concept of a call stack
Function prologue briefly explains the assembly code and the hows and whys.
The gen on function perilogues

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.
Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.
I will quote the below from the accepted answer & try to extend it's explanation.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep
track of the function's stack frame. The esp register is used by the
processor to point to the most recent addition (the top value) on the
stack.
The call instruction does two things: First it pushes the return
address onto the stack, then it jumps to the function being called.
Immediately after the call, esp points to the return address on the
stack.
The last line in the quote above says immediately after the call, esp points to the return address on the stack.
Why's that?
So let's say that our code that's getting currently executed has the following situation, as shown in the (really badly drawn) diagram below
So our next instruction to be executed is, say at the address 2. This is where the EIP is pointing. The current instruction has a function call (that would internally translate to the assembly call instruction).
Now ideally, because the EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But since there's sort of a diversion from the current execution flow path, (that is now expected because of the call) the EIP's value would change. Why? Because now another instruction, that may be somewhere else, say at the address 1234 (or whatever), may need to get executed. But in order to complete the execution flow of the program as was intended by the programmer, after the diversion activities are done, the control must return back to the address 2 as that is what should have been executed next should the diversion have not happened. Let us call this address 2 as the return address in the context of the call that is being made.
Problem 1
So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.
There could have been many choices of storing it in any of the available registers, or some memory location etc. But for (I believe good reason) it was decided that the return address would be stored onto the stack.
So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:
So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.
Problem 2
So, although the return address of ours, is still stored on the stack, at location 293, after the called function finishes off executing, how would the execution flow know that it should now goto 293 & that's where it would find the return address?
So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:
The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.
With the above strategy in place, we solve for the Problem 2 discussed above. How? So now when the execution flow wants to know where from should it fetch the return address, it would :
first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).
Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.
Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.
It will finally pop this 2 out into the EIP, that's the instruction that should have ideally been executed should the diversion have not happened, remember.
And the steps that we just saw being performed, with all the jugglery, to store the return address temporarily & then retrieve it is exactly what gets done with the function prolog (before the function call) & the epilog (before the function ret). The how was already answered, we just answered the why as well.
Just an end note: For the sake of brevity, I have not taken care of the fact that the stack addresses may grow the other way round.

Every function has an identical prologue(The starting of function code) and epilogue ( The ending of a function).
Prologue: The structure of Prologue is look like:
push ebp
mov esp,ebp
Epilogue: The structure of Prologue is look like:
leave
ret
More in detail : what is Prologue and Epilogue

How does a process keep track of its local variables

As far as I know, when a process allocates local variables, it does so by pushing them onto memory as a stack, but still accesses them as random memory by using an offset from the stack pointer to reference them (from this thread What is the idea behind using a stack for local variables?).
However, how does it know which variables have what offset? Am I thinking about this in the right way?

Offsets of local variables are "baked into" the machine code as constants. By the time the compiler is done, things that your program referred to as local variables are replaced with fixed memory offsets assigned by compiler.
Let's say you declare three local variables:
char a[8];
int b;
short c;
The compiler assigns offsets to these variables: a is at offset 0, b is at offset 8, and c is at offset 12. Let's say your code does b += c. Compiler translates this into a block of code that looks like this:
LOAD #(SP+8)
ADD #(SP+12)
STORE #(SP+8)
The only value that changes here is SP (stack pointer). All offsets are numeric constants.

Preface: The following text uses the x86 architecture as example. Other architectures do handle things differently.
[...] it does so by pushing them into memory as a stack, [...]
That's close. it does so by pushing them into memory ON THE stack [of the current process]. Every process has its own stack. Therefore with every context switch this Stack Frame does change - and so do its local variables (on the stack).
Usually(!) locally defined variables are referenced relative to the Stack Frame saved and present in the EBP register. This happens in contrast to globally defined varables which are referenced relative to the Data Segment Base. So every process does have its own stack with its own local variables.
Newer compilers can spare the register EBP and reference the variables relative to the ESP register. This has two consequences:
one register more available to use
one possibility less for debugging (debugging often used the EBP value as reference for the current Stack Frame to identify local variables). So this makes debugging harder without a separate debugging information file.
So to answer your main question
How does a process keep track of its local variables
Processes keep track of their Stack Frame (which contains the Local Variables), but not of their Local Variables themselves. And the Stack Frame changes with each Process Switch. The Local Variables are merely referenced relative to the Stack Frame Pointer kept in the register EBP (or relative to the Stack Pointer ESP, which depends on the compiler settings).

Compiler does the job in memorizing the offsets. These offsets are simply hardcoded. Like to load the variable to register (eg. to eax) compiler would produce something like mov eax, [esp-4], where esp is stack pointer register and 4 is the offset. If new variable will be pushed next mov to get/set variable will have bigger offset. All this is compilation time analysis.
Also, the stack on some platform may be reversed - so offset will be positive.

How do pointers work "under the hood" in C?

Take a simple program like this:
int main(void)
{
char p;
char *q;
q = &p;
return 0;
}
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime? If at runtime, is there some table of variables or something where it looks these things up? Does the OS keep track of them and it just asks the OS?
My question may not even make sense in the context of the correct explanation, so feel free to set me straight.

How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime?
This is an implementation detail of the compiler. Different compilers can choose different techniques depending on the kind of operating system they are generating code for and the whims of the compiler writer.
Let me describe for you how this is typically done on a modern operating system like Windows.
When the process starts up, the operating system gives the process a virtual address space, of, let's say 2GB. Of that 2GB, a 1MB section of it is set aside as "the stack" for the main thread. The stack is a region of memory where everything "below" the current stack pointer is "in use", and everything in that 1MB section "above" it is "free". How the operating system chooses which 1MB chunk of virtual address space is the stack is an implementation detail of Windows.
(Aside: whether the free space is at the "top" or "bottom" of the stack, whether the "valid" space grows "up" or "down" is also an implementation detail. Different operating systems on different chips do it differently. Let's suppose the stack grows from high addresses to low addresses.)
The operating system ensures that when main is invoked, the register ESP contains the address of the dividing line between the valid and free portions of the stack.
(Aside: again, whether the ESP is the address of the first valid point or the first free point is an implementation detail.)
The compiler generates code for main that pushes the stack pointer by lets say five bytes, by subtracting from it if the stack is growing "down". It decreases by five because it needs one byte for p and four for q. So the stack pointer changes; there are now five more "valid" bytes and five fewer "free" bytes.
Let's say that q is the memory that is now in ESP through ESP+3 and p is the memory now in ESP+4. To assign the address of p to q, the compiler generates code that copies the four byte value ESP+4 into the locations ESP through ESP+3.
(Aside: Note that it is highly likely that the compiler lays out the stack so that everything that has its address taken is on an ESP+offset value that is divisible by four. Some chips have requirements that addresses be divisible by pointer size. Again, this is an implementation detail.)
If you do not understand the difference between an address used as a value and an address used as a storage location, figure that out. Without understanding that key difference you will not be successful in C.
That's one way it could work but like I said, different compilers can choose to do it differently as they see fit.

The compiler cannot know the full address of p at compile-time because a function can be called multiple times by different callers, and p can have different values.
Of course, the compiler has to know how to calculate the address of p at run-time, not only for the address-of operator, but simply in order to generate code that works with the p variable. On a regular architecture, local variables like p are allocated on the stack, i.e. in a position with fixed offset relative to the address of the current stack frame.
Thus, the line q = &p simply stores into q (another local variable allocated on the stack) the address p has in the current stack frame.
Note that in general, what the compiler does or doesn't know is implementation-dependent. For example, an optimizing compiler might very well optimize away your entire main after analyzing that its actions have no observable effect. The above is written under the assumption of a mainstream architecture and compiler, and a non-static function (other than main) that may be invoked by multiple callers.

This is actually an extraordinarily difficult question to answer in full generality because it's massively complicated by virtual memory, address space layout randomization and relocation.
The short answer is that the compiler basically deals in terms of offsets from some “base”, which is decided by the runtime loader when you execute your program. Your variables, p and q, will appear very close to the “bottom” of the stack (although the stack base is usually very high in VM and it grows “down”).

Address of a local variable cannot be completely calculated at compile time. Local variables are typically allocated in the stack. When called, each function allocates a stack frame - a single continuous block of memory in which it stores all its local variables. The physical location of the stack frame in memory cannot be predicted at compile time. It will only become known at run-time. The beginning of each stack frame is typically stored at run-time in a dedicated processor register, like ebp on Intel platform.
Meanwhile, the internal memory layout of a stack frame is pre-determined by the compiler at compile-time, i.e. it is the compiler who decides how local variables will be laid out inside the stack frame. This means that the compiler knows the local offset of each local variable inside the stack frame.
Put this all together and we get that the exact absolute address of a local variable is the sum of the address of the stack frame itself (the run-time component) and the offset of this variable inside that frame (the compile-time component).
This is basically exactly what the compiled code for
q = &p;
will do. It will take the current value of the stack frame register, add some compile-time constant to it (offset of p) and store the result in q.

In any function, the function arguments and the local variables are allocated on the stack, after the position (program counter) of the last function at the point where it calls the current function. How these variables get allocated on the stack and then deallocated when returning from the function, is taken care of by the compiler during compile time.
For e.g. for this case, p (1 byte) could be allocated first on the stack followed by q (4 bytes for 32-bit architecture). The code assigns the address of p to q. The address of p naturally then is 5 added or subtracted from the the last value of the stack pointer. Well, something like that, depends on how the value of the stack pointer is updated and whether the stack grows upwards or downwards.
How the return value is passed back to the calling function is something that I'm not certain of, but I'm guessing that it is passed through the registers and not the stack. So, when the return is called, the underlying assembly code should deallocate p and q, place zero into the register, then return to the last position of the caller function. Of course, in this case, it is the main function, so it is more complicated in that, it causes the OS to terminate the process. But in other cases, it just goes back to the calling function.
In ANSI C, all the local variables should be placed at the top of the function and is allocated once into the stack when entering the function and deallocated when returning from the function. In C++ or later versions of C, this becomes more complicated when local variables can also be declared inside blocks (like if-else or while statement blocks). In this case, the local variable is allocated onto the stack when entering the block and deallocated when leaving the block.
In all cases, the address of a local variable is always a fixed number added or subtracted from the stack pointer (as calculated by the compiler, relative to the containing block) and the size of the variable is determined from the variable type.
However, static local variables and global variables are different in C. These are allocated in fixed locations in the memory, and thus there's a fixed address for them (or a fixed offset relative to the process' boundary), which is calculated by the linker.
Yet a third variety is memory allocated on the heap using malloc/new and free/delete. I think this discussion would be too lengthy if we include that as well.
That said, my description is only for a typical hardware architecture and OS. All of these are also dependent on a wide variety of things, as mentioned by Emmet.

p is a variable with automatic storage. It lives only as long as the function it is in lives. Every time its function is called memory for it is taken from the stack, therefore, its address can change and is not known until runtime.

Segment prefix when using pointers as function parameters

I have an assembler/c question. I just read about segment prefixes, for example ds:varX and so on. The prefix is important for the calculation of the logical address. I read too, that default is "ds" and as soon as you use the ebp register to calculate an address, "ss" is used. For code "cs" is default. That all makes sense.
Now I have the following in c:
int x; // some static var in ds
void test(int *p){
...
*p =5;
}
... main(){
test(&x);
//now x is 5
}
If you now think about the implemention of test-function... you get the pointer to x on the stack. If you want to dereference the pointer, you first get the pointer-value(address of x) from the stack and save it in eax for example. Then you can dereference eax to change the value of x. But how does the c-compiler know if the given pointer(address) references memory on the stack (for example if i call test from another function and push the address of a localvariable as parameter for test) or the data segment? How is the full logical address calculated? The function cannot know which segment the given address offset relates to..?!

In general case, on a segmented platform your can't just read the pointer value "into eax" as you suggest. On a segmented platform the pointer would generally hold both the segment value and offset value, meaning that reading such a pointer would imply initializing at least two registers - segment and offset - not just one eax.
But in specific cases it depends on so called the memory model. Compilers on segmented platforms supported several memory models.
For starters, for obvious reasons it does not matter which segment register you use as long as the segment register holds the correct value. For example, if DS and ES registers hold the same value inside, then DS:<offset> will point to the same location in memory as ES:<offset>.
In so called "tiny" memory model, for one example, all segment registers were holding the same value, i.e. everything - code, data, stack - would fit in one segment (which is why it was called "tiny"). In this memory model each pointer was just an offset in this segment and, of course, it simply didn't matter which segment register to use with that offset.
In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). But on such memory models pointer object would normally hold both the offset and segment part of the address inside of it. In your example pointer p would actually be a two-part object, holding both segment value and offset value at the same time. In order to dereference such pointer the compiler would generate the code that would read both segment and offset values from p and use both of them. For example, the segment value would be read into ES register, while the offset value would be read into si register. The code would then access ES:[di] in order to read *p value.
There were also "intermediate" memory models, when code would be stored in one segment (CS), while data and stack would both be stored in another segment, so DS and SS would hold the same value. On that platform, obviously, there was no need to differentiate between DS and SS.
In the largest memory models you could have multiple data segments. In this case it is rather obvious that proper data addressing in segmented mode is not really a matter of choosing the proper segment register (as you seem to believe), but rather a matter of taking pretty much any segment register and initializing it with the correct value before performing the access.

What AndreyT described was what happened on DOS days. These days, modern operating systems use the so called flat memory model (or rather something very similar), in which all (protected mode) segments are setup so that they all can access the whole address space (i.e: they have a base of 0 and a limit = the whole address space).

On a machine with a segmented memory model, the C implementation must do one of the following things to be conformant:
Store the full address (with segment) in each pointer, OR
Ensure that all stack addresses that will be used for variables whose addresses are taken can be accessed via the data segment, either at the same relative address or via some magic offset the compiler can apply when taking the address of local variables, OR
Not use the stack for local variables whose addresses are taken, and perform a hidden malloc/free on every function entry/return (with special handling for longjmp!).
Perhaps there are other ways of doing it, but these are the only ones I can think of. Segmented memory models were really pretty disagreeable with C, and they were abandoned for good reason.

Segmentation is the legacy artifact of the Intel 16-bit 8086 processor. In reality, you probably operate in virtual memory, where everything is just a linear address. Compile with -S flag and see the resulting assembly.

Since you move the address to eax before dereferencing it, it defaults to the ds segment. However, as Nikolai mentioned, in user level code the segments probably all point to the same address.

Under x86, direct usage of the stack will use the stack segment, but indirect usage treats it as a data segment. You can see this if you disassemble a pointer dereference and write to a stack section pointer. Under x86 cs, ss and ds are treated pretty much the same(atleast in non kernel modes) due to linear addressing. the intel reference manuals should also have a section on segment addressing

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight