As far as I know, when a process allocates local variables, it does so by pushing them onto memory as a stack, but still accesses them as random memory by using an offset from the stack pointer to reference them (from this thread What is the idea behind using a stack for local variables?).
However, how does it know which variables have what offset? Am I thinking about this in the right way?
Offsets of local variables are "baked into" the machine code as constants. By the time the compiler is done, the things your program referred to as local variables have been replaced with fixed memory offsets assigned by the compiler.
Let's say you declare three local variables:
char a[8];
int b;
short c;
The compiler assigns offsets to these variables: a is at offset 0, b is at offset 8, and c is at offset 12. Let's say your code does b += c. The compiler translates this into a block of code that looks like this:
LOAD #(SP+8)
ADD #(SP+12)
STORE #(SP+8)
The only value that changes here is SP (stack pointer). All offsets are numeric constants.
Preface: The following text uses the x86 architecture as an example. Other architectures may handle things differently.
[...] it does so by pushing them into memory as a stack, [...]
That's close. It does so by pushing them into memory ON THE stack [of the current process]. Every process has its own stack. Therefore the active stack frame changes with every context switch, and so do the local variables (on the stack).
Usually(!) locally defined variables are referenced relative to the stack frame address held in the EBP register. This is in contrast to globally defined variables, which are referenced relative to the Data Segment Base. So every process has its own stack with its own local variables.
Newer compilers can spare the register EBP and reference the variables relative to the ESP register. This has two consequences:
one register more available to use
one possibility less for debugging (debugging often used the EBP value as reference for the current Stack Frame to identify local variables). So this makes debugging harder without a separate debugging information file.
So to answer your main question
How does a process keep track of its local variables
Processes keep track of their Stack Frame (which contains the Local Variables), but not of their Local Variables themselves. And the Stack Frame changes with each Process Switch. The Local Variables are merely referenced relative to the Stack Frame Pointer kept in the register EBP (or relative to the Stack Pointer ESP, which depends on the compiler settings).
The compiler does the job of remembering the offsets. These offsets are simply hardcoded. For example, to load a variable into a register (e.g. eax) the compiler would produce something like mov eax, [esp+4], where esp is the stack pointer register and 4 is the offset. If another value is pushed afterwards, the next mov that gets or sets the same variable will use a bigger offset, because esp has moved. All of this is compile-time analysis.
Also, on platforms where the stack grows in the opposite direction, the sign of the offset is reversed.
In terms of x86 assembly code anyways. I've been reading about function calls, but still can't fully grasp the need for a base / frame pointer (EBP) along with a stack pointer (ESP).
When we call a function, the current value of EBP will be placed on the stack and then EBP gets the current ESP value.
Place holders for the return value, function arguments and local variables of the function will then be placed on the stack, and the stack pointer ESP value will decrease (or increase) to point to after the last placeholder placed on the stack.
Now we have the EBP pointing to the beginning of the current stack frame, and ESP pointing to the end of the stack frame.
The EBP will be used to access the arguments and local variables of the function due to constant offsets from the EBP. That is fine. What I don't understand is, why can't the ESP just be used to access these variables also by using its offsets. The EBP points to the beginning of the stack frame , and the ESP points to the end of the stack frame. What's the difference?
ESP shouldn't change once placeholders for all the local variables etc. have been set up, or should it?
Technically, it is possible (but sometimes hard) to track how many local and temporary variables are stored on the stack, so that accessing function input, and local variables can be done without EBP.
Consider the following C code:
int func(int arg) {
    int result = 0;
    double x[arg + 5];   /* variable-length array: its size is only known at run time */
    /* Do something with x, calculate result */
    return result;
}
The number of items stored on the stack is now variable (arg+5 items of type double). Calculating the location of 'arg' from ESP requires a run-time calculation, which can have a significant negative impact on performance.
With an extra register (EBP), arg is always at a fixed location relative to EBP (EBP+8 under the usual 32-bit cdecl convention). Executing a 'return' is also always simple: move EBP to ESP, pop the saved EBP, and return.
Bottom line: the decision to commit the EBP register to a single function (instead of using it as a general register) is a trade-off between performance, simplicity, code size and other factors. Practical experience has shown that the benefits outweigh the cost.
Side note about debugger/runtime tools:
Using EBP makes it easier for debuggers (and other run-time tools) to 'walk the stack'. Tools can examine the stack at run time and, without knowing anything about the current program's stack (e.g., how many items have been pushed into each frame), they can traverse the stack all the way to "main".
Without EBP pointing to the 'next' frame, run-time tools (including debuggers) face the very hard (impossible?) task of working out how to get from ESP to specific local variables.
Take a simple program like this:
int main(void)
{
char p;
char *q;
q = &p;
return 0;
}
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime? If at runtime, is there some table of variables or something where it looks these things up? Does the OS keep track of them and it just asks the OS?
My question may not even make sense in the context of the correct explanation, so feel free to set me straight.
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime?
This is an implementation detail of the compiler. Different compilers can choose different techniques depending on the kind of operating system they are generating code for and the whims of the compiler writer.
Let me describe for you how this is typically done on a modern operating system like Windows.
When the process starts up, the operating system gives the process a virtual address space, of, let's say 2GB. Of that 2GB, a 1MB section of it is set aside as "the stack" for the main thread. The stack is a region of memory where everything "below" the current stack pointer is "in use", and everything in that 1MB section "above" it is "free". How the operating system chooses which 1MB chunk of virtual address space is the stack is an implementation detail of Windows.
(Aside: whether the free space is at the "top" or "bottom" of the stack, whether the "valid" space grows "up" or "down" is also an implementation detail. Different operating systems on different chips do it differently. Let's suppose the stack grows from high addresses to low addresses.)
The operating system ensures that when main is invoked, the register ESP contains the address of the dividing line between the valid and free portions of the stack.
(Aside: again, whether the ESP is the address of the first valid point or the first free point is an implementation detail.)
The compiler generates code for main that moves the stack pointer by, let's say, five bytes, by subtracting from it if the stack is growing "down". It decreases by five because it needs one byte for p and four for q. So the stack pointer changes; there are now five more "valid" bytes and five fewer "free" bytes.
Let's say that q is the memory that is now in ESP through ESP+3 and p is the memory now in ESP+4. To assign the address of p to q, the compiler generates code that copies the four byte value ESP+4 into the locations ESP through ESP+3.
(Aside: Note that it is highly likely that the compiler lays out the stack so that everything that has its address taken is on an ESP+offset value that is divisible by four. Some chips have requirements that addresses be divisible by pointer size. Again, this is an implementation detail.)
If you do not understand the difference between an address used as a value and an address used as a storage location, figure that out. Without understanding that key difference you will not be successful in C.
That's one way it could work but like I said, different compilers can choose to do it differently as they see fit.
The compiler cannot know the full address of p at compile-time because a function can be called multiple times by different callers, and p can have a different address on each call.
Of course, the compiler has to know how to calculate the address of p at run-time, not only for the address-of operator, but simply in order to generate code that works with the p variable. On a regular architecture, local variables like p are allocated on the stack, i.e. in a position with fixed offset relative to the address of the current stack frame.
Thus, the line q = &p simply stores into q (another local variable allocated on the stack) the address p has in the current stack frame.
Note that in general, what the compiler does or doesn't know is implementation-dependent. For example, an optimizing compiler might very well optimize away your entire main after analyzing that its actions have no observable effect. The above is written under the assumption of a mainstream architecture and compiler, and a non-static function (other than main) that may be invoked by multiple callers.
This is actually an extraordinarily difficult question to answer in full generality because it's massively complicated by virtual memory, address space layout randomization and relocation.
The short answer is that the compiler basically deals in terms of offsets from some “base”, which is decided by the runtime loader when you execute your program. Your variables, p and q, will appear very close to the “bottom” of the stack (although the stack base is usually very high in VM and it grows “down”).
Address of a local variable cannot be completely calculated at compile time. Local variables are typically allocated in the stack. When called, each function allocates a stack frame - a single continuous block of memory in which it stores all its local variables. The physical location of the stack frame in memory cannot be predicted at compile time. It will only become known at run-time. The beginning of each stack frame is typically stored at run-time in a dedicated processor register, like ebp on Intel platform.
Meanwhile, the internal memory layout of a stack frame is pre-determined by the compiler at compile-time, i.e. it is the compiler who decides how local variables will be laid out inside the stack frame. This means that the compiler knows the local offset of each local variable inside the stack frame.
Put this all together and we get that the exact absolute address of a local variable is the sum of the address of the stack frame itself (the run-time component) and the offset of this variable inside that frame (the compile-time component).
This is basically exactly what the compiled code for
q = &p;
will do. It will take the current value of the stack frame register, add some compile-time constant to it (offset of p) and store the result in q.
In any function, the function arguments and the local variables are allocated on the stack, next to the return address, i.e. the position (program counter) in the calling function at the point where it calls the current function. How these variables get allocated on the stack and then deallocated when returning from the function is taken care of by the compiler at compile time.
For example, in this case p (1 byte) could be allocated first on the stack, followed by q (4 bytes on a 32-bit architecture). The code assigns the address of p to q. The address of p is then naturally the last value of the stack pointer with 5 added or subtracted. Well, something like that; it depends on how the value of the stack pointer is updated and whether the stack grows upwards or downwards.
How the return value is passed back to the calling function is something that I'm not certain of, but I'm guessing that it is passed through the registers and not the stack. So, when the return is called, the underlying assembly code should deallocate p and q, place zero into the register, then return to the last position of the caller function. Of course, in this case, it is the main function, so it is more complicated in that, it causes the OS to terminate the process. But in other cases, it just goes back to the calling function.
In ANSI C (C89), all local variables have to be declared at the top of a block, and a typical compiler allocates them on the stack once when entering the function and deallocates them when returning from it. In C++ and in C99 or later, local variables can be declared anywhere inside a block (such as an if-else or while statement block). Conceptually, such a variable is allocated when its block is entered and deallocated when the block is left, although compilers commonly reserve space for all of a function's locals in one step at function entry.
In all cases, the address of a local variable is always a fixed number added or subtracted from the stack pointer (as calculated by the compiler, relative to the containing block) and the size of the variable is determined from the variable type.
However, static local variables and global variables are different in C. These are allocated in fixed locations in the memory, and thus there's a fixed address for them (or a fixed offset relative to the process' boundary), which is calculated by the linker.
Yet a third variety is memory allocated on the heap using malloc/new and free/delete. I think this discussion would be too lengthy if we include that as well.
That said, my description is only for a typical hardware architecture and OS. All of these are also dependent on a wide variety of things, as mentioned by Emmet.
p is a variable with automatic storage. It lives only as long as the function it is in lives. Every time its function is called memory for it is taken from the stack, therefore, its address can change and is not known until runtime.
In assembly you can store data in registers or on the stack. Only the top of the stack can be accessed at any given moment (right?). Consider the following C code:
main(){
int x=2;
func();
}
func( int x ){
int i;
char a;
}
Upon calling func() the following is pushed onto the stack (consider a 32bit system):
variable x (4 bytes, pushed by main)
<RETURN ADDRESS> (4 bytes pushed by main?)
<BASE POINTER> (4 bytes pushed by func())
variable i (4 bytes, pushed by func())
variable a (1 byte, pushed by func())
I have the following questions:
In C code you can access the local variable from anywhere inside the function, but in assembly you can only access the top of the stack. The C code is translated into assembly (in machine code but assembly is the readable form of it). So how does assembly support the reading of variables that are not on top of the stack?
Did I leave out anything that would also be pushed to the stack in my example?
In assembly, if you push a char or an int on the stack, how can it determine whether it needs to push 4 bytes or 1 byte? Because it uses the same operation (push), right?
Thanks in advance
Gr. Maricruzz
The stack pointer at the beginning of the function is put into a register, and then the variables/arguments are accessed via this base address plus the offset for the variable.
If you want to see the code, instead of creating object files, let the compiler stop at creating assembler files. Then you can see exactly how it works. (Of course, that requires you to have a valid C program, unlike the one you have in the question now.)
The compiler generates the assembly. Each instruction set may differ, but at the end of the day the stack pointer is just a register holding a memory address. The compiler creates the function, so it knows the function's whole scope and how far down the stack each local data item is; it then emits the appropriate code for that instruction set to access those local items.
In some instruction sets you need to make a copy of the stack pointer and/or do math with the stack pointer as an operand and some other register as the result, then use that result (stack pointer + 8 words, for example) to access the memory address. Other instruction sets have an addressing mode in which a load or store can apply an offset to the stack pointer; the math is done as part of the instruction's execution, so you don't need an intermediate result or an extra register.
Only the top of the stack can be accessed at any given moment (right?)
No, generally the ISA has instructions to access other elements on the stack as well. That is, accessing elements on the stack is not limited to push and pop like operations; typically you can just mov things back and forth between a stack location and a register.
Assembly can access any memory by address (just like C).
Simple, unoptimized programs put all local variables on the stack before the function body executes, so a variable's address is the address of the execution frame plus some offset.
The program can then simply use pop and push to store additional values (e.g. intermediate results of some expression) on the top of the stack.
Summary:
There is a register (ESP on x86) pointing to the top of the stack
push moves a value onto the top of the stack and decreases this register (on x86 the stack grows towards lower addresses)
pop moves a value off the top of the stack and increases this register
mov moves a value between memory and registers and does nothing to the stack register (ESP).
Why does C use a stack for storing local variables? Is this just to have an independent memory space, or to get automatic clean-up of all local variables and objects once they go out of scope?
I have a few more questions along the same lines.
Question 1) How are local variables referenced from the instructions? Consider NewThreadFunc, a function that is called by the createThread function.
DWORD WINAPI NewThreadFunc(PVOID p_pParam)
{
int l_iLocalVar1 = 10;
int l_iLocalVar2 = 20;
int l_iSumLocalVar = l_iLocalVar1 + l_iLocalVar2;
}
The stack for this thread would look like this,
| p_pParam |
| NewThreadFunc()|
| 10 |
| 20 |
| 30 |
| |
.
.
.
Now my question is: while executing this function, how does the CPU know the addresses of the local variables (l_iSumLocalVar, l_iLocalVar1 and l_iLocalVar2)? These variables are not pointers that store the address from which a value has to be fetched. My question refers to the stack shown above.
Question 2) If this function calls another function, how does the stack behave? As far as I know, the stack gets subdivided further. If this is true, how are the local variables of the caller hidden from the called function? Basically, how do local variables maintain the scope rules?
I know these could be very basic questions, but somehow I could not think of an answer to them.
Firstly, it is not "Windows" that uses stack for local variables. It has absolutely nothing to do with "Windows", or with any other OS for that matter. It is your compiler that does that. Nobody forces your compiler to use system stack for that purpose, but normally this is the simplest and most efficient way to implement local variables.
Secondly, compilers use stacks to store local variables (be that system-provided stacks or compiler-implemented stack) simply because stack-like storage matches the language-mandated semantics of local variables very precisely. The storage duration of local variables is defined by their declarative regions (blocks) which strictly nest into each other. This immediately means that storage durations of local variables follow the LIFO principle: last in - first out. So, using a stack - a LIFO data structure - for allocating objects with LIFO storage duration is the first and the most natural thing that comes to mind.
Local variables are typically addressed by their offset from the beginning of the currently active stack frame. The compiler knows the exact offset of each local variable at compile time. The compiler generates the code that will allocate the stack frame for the current function by: 1) memorizing the current position of the stack pointer when the function is entered (let's say it is memorized in register R1) and 2) moving the current stack pointer by the amount necessary to store all local variables of the function. Once the stack frame is allocated in this fashion, your local variables l_iLocalVar1, l_iLocalVar2 and l_iSumLocalVar will simply be accessed through addresses R1 + 6, R1 + 10 and R1 + 14 (I used arbitrary offsets). In other words, local variables are not accessed by specific address values, since these addresses are not known at compile time. Instead local variables are accessed through calculated addresses. They are calculated as some run-time base address value + some compile-time offset value.
Normally the system calling convention reserves a register to be used as a "stack pointer". Local variable accesses are made relative to this register's value. Since every function must know how much stack space it uses, the compiler emits code to ensure the stack pointer is adjusted correctly for each function's requirements.
The scope of local variables is only enforced by the compiler, since it's a language construct, not anything to do with hardware. You can pass addresses of stack variables to other functions and they'll work correctly.
why is the stack used for local variables?
Well, the stack is an easy to use structure to reserve space for temporary variables. It has the benefit that it will be removed almost automatically when the function returns. An alternative would be to allocate memory from the OS, but then this would cause heavy memory fragmentation.
The stack can be easily allocated as well as freed again, so it is a natural choice.
All the variable addresses are relative to the stack pointer, which is adjusted at each function call and return. It is a fast and easy way to allocate and clean up the memory used by these variables.
I have some questions about memory layout of C programs.
Text Segment
Here is my first question:
When I searched for the text segment (or code segment) I read that "Text segment contain executable instructions". But what are the executable instructions of a function? Could you give some different examples?
I also read that "Text segment is sharable so that only a single copy needs to be in memory for frequently executed programs such as text editors, the C compiler, etc.", but I couldn't make a connection between C programs and "text editors".
What should I understand from this statement?
Initialized Data Segment
It is said that the "Initialized Data Segment" contains the global variables and static variables, but I also read that const char* string = "hello world" causes the string literal "hello world" to be stored in the initialized read-only area and the character pointer variable string in the initialized read-write area. Is char* string stored in the read-only area or the read-write area? Since both are mentioned here, I'm a bit confused.
Stack
From what I understand, the stack contains the local variables. Is this right?
The text segment contains the actual code of your program, i.e. the machine code emitted by your compiler. The idea of the last statement is that your C program and, say, a text editor is exactly the same thing; it's just machine code instructions executing from memory.
For example, we'll take the following code, and a hypothetical architecture I've just thought up now because I can't remember x86 assembly.
while(i != 10)
{
x -= 5;
i++;
}
This would translate to the following instructions
LOOP_START:
CMP eax, 10 # EAX contains i. Is it 10?
JZ LOOP_END # If it's 10, exit the loop
SUB ebx, 5 # Otherwise, subtract 5 from EBX (x)
ADD eax, 1 # And add 1 to i
JMP LOOP_START # And then go to the top of the loop.
LOOP_END:
# Do something else
These are low-level operations that your processor can understand. These would then be translated into binary machine code, which is then stored in memory. The actual data stored might be 5, 2, 7, 6, 4, 9, for example, given a mapping between operation and opcode that I just thought up. For more information on how this actually happens, look up the relationship between assembler and machine code.
-- Ninja-edit - if you take RBK's comment above, you can view the actual instructions which make up your application using objdump or a similar disassembler. There's one in Visual Studio somewhere, or you could use OllyDbg or IDA on Windows.
Because the actual instructions of your program should be read-only, the text segment doesn't need to be replicated for multiple runs of your program since it should always be the same.
As for your question on the data segment, char* string will actually be stored in the .bss segment, since it doesn't have an initializer. This is an area of memory that is cleared before your program runs (by the crt0 or equivalent) unless you give GCC a flag that I can't remember off-hand. The .bss segment is read-write.
Yes, the stack segment contains your local variables. In reality, it stores what are called "stack frames". One of these is created for each function you call, and they stack on top of each other. It contains stuff like the local variables, as you said, and other useful bits like the address that the function was called from, and other useful data so that when the function exits, the previous state can be reinstated. For what is actually contained on a stack frame, you need to delve into your architecture's ABI (Application Binary Interface).
The text segment is often also called "code" ("text" tends to be the Unix/Linux name; other OSes don't necessarily use that name).
And it is shareable in the sense that if you run TWO processes that both execute the C-compiler, or you open the text editor in two different windows, both of those share the same "text" section - because it doesn't change during the running of the code (self-modifying code is not allowed in text-segment).
Initialized string value is stored in either "ro-data" or "text", depending on the compiler. And yes, it's not writeable.
If string is a global variable, it will end up in "initialized data", which will hold the address of the "hello world" message in the value of string. The const part refers to the fact that the contents the pointer points at are constant, so we can actually change the pointer itself with string = "foo bar"; later in the code.
The stack is, indeed, used for local variables and, typically, the call stack (where the code returns to after it finishes the current function).
However, the actual layout of a program's in-memory image is left entirely up to the operating system, and often the program itself as well. Yet, conceptually we can think of two segments of memory for a running program[1].
Text or Code Segment - Contains compiled program code.
Data Segment - Contains data (global, static, and local) both initialized and uninitialized. Data segment can further be sub-categorized as follows:
2.1 Initialized Data Segments
2.2 Uninitialized Data Segments
2.3 Stack Segment
2.4 Heap Segment
Initialized data segment stores all global, static, constant, and external variables (declared with extern keyword) that are initialized beforehand.
Uninitialized data segment or .bss segment stores all uninitialized global, static, and external variables (declared with extern keyword).
Stack segment is used to store all local variables and is used for passing arguments to the functions along with the return address of the instruction which is to be executed after the function call is over.
Heap segment is also part of RAM where dynamically allocated variables are stored.
Coming to your first question: if you are aware of function pointers, then you know that a function name evaluates to the address of the function (which is the entry point of that function). The executable instructions are the machine-code instructions stored from that address onwards; they are usually shown in their readable form, assembly. The instruction set varies from architecture to architecture.
Text or code section is shareable - If more than one running process belong to the same program then the common compiled code need not be loaded into memory separately. For example if you have opened two .doc documents then there will be two processes for them but definitely there will be some common code being used by both processes.
The stack segment is the area where local variables are stored. Local variables are all the variables declared inside any function of your C program, including main().
When we call a function, a stack frame is created, and when the function returns, the stack frame is destroyed, including all local variables of that function.
A stack frame contains data such as the return address, the arguments passed to the function, its local variables, and any other information needed by the invoked function.
A “stack pointer (SP)” keeps track of the top of the stack: each push or pop operation adjusts the stack pointer to the next or previous address.
You can refer to this link for practical info: http://www.firmcodes.com/memory-layout-c-program-2/