When loading an executable, segments like code, data, bss and so on need to be placed in memory. I am just wondering if someone could tell me where, on a standard x86 for example, the libc library is placed. Is that at the top or the bottom of memory? My guess is at the bottom, close to the application code, i.e., it would look something like this:
--------- 0x1000
Stack
|
V
^
|
Heap
----------
Data + BSS
----------
App Code
----------
libc
---------- 0x0000
It depends on the whims of the loader.
In particular, on any modern system that uses ASLR, you can't predict where a particular library is going to end up.
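You can see this for yourself with a minimal sketch (assuming a C compiler on a system with ASLR enabled; casting a function pointer to void * is technically non-portable but works on common platforms):

#include <stdio.h>

int main(void)
{
    /* Print where libc's printf ended up in this particular run. */
    printf("printf is at %p\n", (void *)&printf);
    return 0;
}

Run it a few times; with ASLR active, the printed address will typically differ on every run.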
I am currently developing a small project for an STM32F103 microcontroller which features a Cortex-M3 CPU.
Thanks to the CMSIS standard header files, it is possible to use the exact same code with IAR and Keil µVision. Because of that, I found it interesting to compare the two compilers regarding code size (which is the most critical resource for small microcontrollers).
Both compilers were set to the maximum optimization level (for size). Unfortunately, I am not quite sure how IAR and Keil measure code size.
For example, IAR gives me this output:
868 bytes of readonly code memory
28 bytes of readonly data memory
2'056 bytes of readwrite data memory
and Keil this:
Program Size: Code=676 RO-data=252 RW-data=0 ZI-data=1640
At first glance I am not able to tell which byte counts relate to used flash and which to used SRAM.
Of course I know that flash is read-only and SRAM is read-write, but then there is code memory and data memory on IAR's side, and ZI-data and Code on Keil's side.
Is there anyone here with more in-depth knowledge about this?
Let's try to break this down systematically. From a programmer's point of view we want to differentiate between code (instructions) and data.
Code is usually stored in some kind of non-volatile memory (ROM, FLASH, etc.) and is read and executed at runtime by the processor core. Modern MCUs usually read their instructions from FLASH but can also execute code from RAM. This is mainly useful for running code faster (since FLASH is rather slow to access) or for implementing update functionality that can rewrite the whole FLASH memory. Running code from RAM can also be used to build self-modifying software, but that is a rather exotic use case.
When talking about data we usually first think of variables that are modified during run-time (read-write) and therefore need to be stored in random-access memory (RAM), which is usually volatile (values are lost at power-down). But there are more types to keep in mind:
Constants: Data values that do not change during run-time, e.g. a peripheral register address or a hard-coded delay time. These values need to be placed in non-volatile memory, which can be read-only (e.g. FLASH).
Initialized variables: Most variables of a program need to have a defined initial value (e.g. the starting value of a loop counter). This initial value is actually nothing other than a constant data value (see above) that is automatically copied to its associated variable at the beginning of its lifetime. Since a typical program requires quite a lot of those initialization values, modern compilers implement different optimizations to reduce the memory footprint of these initializers. These include clustering all variables that are initialized to zero (zero-initialization) and applying data compression to non-zero initializers.
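To illustrate the three kinds of data (a sketch; the section names are the usual ELF/toolchain ones and may differ between toolchains):

const int delay_ms = 50;   /* constant: read-only memory, e.g. .rodata in FLASH      */
int counter_start = 100;   /* initialized variable: lives in RAM, its initializer
                              is stored in FLASH and copied over at startup          */
int rx_buffer[256];        /* zero-initialized: RAM only (.bss / ZI-data), no
                              initializer value needs to be stored in FLASH          */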
With these things in mind we can make an educated guess regarding the output of the IAR and Keil linker:
+---------------------+-----------------------+-------------------+
| Memory Object | IAR term | Keil term |
+---------------------+-----------------------+-------------------+
| Code in ROM | readonly code memory | Code |
| Code in RAM | readwrite code memory | ? |
| Constants in ROM | readonly data memory | RO-Data |
| Initializers in ROM | readonly data memory | (RW-Data) |
| Variables in RAM | readwrite data memory | RW-Data + ZI-Data |
+---------------------+-----------------------+-------------------+
Calculating memory usage is pretty straightforward with IAR:
ROM usage = (readonly code memory) + (readonly data memory)
RAM usage = (readwrite code memory) + (readwrite data memory)
For Keil it is a bit more complicated:
ROM usage = (Code) + (RO-data) + (RW-data)
RAM usage = (RW-data) + (ZI-data)
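Plugging the numbers from the question into these formulas (the two builds need not produce identical totals, since the compilers optimize differently):

IAR:  ROM = 868 + 28      = 896 bytes,  RAM = 2056 bytes (no readwrite code reported)
Keil: ROM = 676 + 252 + 0 = 928 bytes,  RAM = 0 + 1640   = 1640 bytes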
In C, each function has an activation record which is allocated on a stack frame. Local variables are allocated in their own function's activation record. So, what is the case with global variables? Where are they allocated?
For example
#include <stdio.h>

int a;              /* global variable */

void v()
{
    a = 2;
    int b = 0;      /* local (automatic) variable */
    b++;
}

int main()
{
    int f;          /* local (automatic) variable, unused */
    printf("\n%d", a);
    v();
}
----- Activation records -----
------------------------------
activation record for main
------------------------------
int f
------------------------------
activation record of v
------------------------------
int a
------------------------------
int b
------------------------------
Where is the variable a stored according to this activation-record logic?
In C each function has an activation record which is allocated on a stack frame.
Nope. However, this is how it is usually implemented by the compiler, at least if you have not enabled any optimizations.
Firstly, the C standard does not say anything about a stack at all. So an answer to this will be about how it's usually solved in practice.
And usually they are in the data segment or bss segment. A typical layout looks like this:
Stack - Grows down towards the heap. Used for local variables.
----
...
...
...
----
Heap - Grows up towards the stack. Used for dynamically allocated memory.
----
BSS - Uninitialized data. Used for uninitialized global and static variables.
----
Data - Initialized data.
----
Text - Runnable code
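To make that concrete, here is a sketch of where a few declarations typically end up (assuming the usual layout above; note that with older GCC versions an uninitialized global may be emitted as a "common" symbol rather than going straight to BSS):

int a;              /* uninitialized global -> BSS                   */
int c = 42;         /* initialized global   -> Data                  */
static int s;       /* uninitialized static -> BSS                   */

void v(void)
{
    int b = 1;      /* automatic local -> stack (or just a register) */
    (void)(b + s + c);
}

You can verify the actual placement with nm or objdump -t on the compiled object file.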
In C each function has an activation record which is allocated on a stack frame.
Wrong: an optimizing compiler might not do that (and gcc -O3 -flto won't, on Linux/x86-64 with a recent GCC). It will inline some functions. Some locals are kept only in processor registers (so they have no memory location); read about register allocation, e.g. in the Dragon Book or some other compiler textbook. Be aware of automatic variables. Be also aware that you don't even need a computer to run a C program (a good way of teaching C is to have the classroom play at being a computer; you could run a C program on paper with a pencil).
The globals are usually not, in practice, on the call stack (which holds call frames, i.e. activation records). They might sit in the data segment (or could be optimized out entirely).
The C11 specification does not require any call stack. Check by reading n1570. Some implementations don't use any call stack (or activation records). Be aware of the crt0 calling your main.
Read linkers and loaders for more. Read also a textbook about operating systems.
On Linux, try cat /proc/self/maps to understand the virtual address space of the process running that cat command; see proc(5)
Look into the assembler code generated by gcc -O2 -fverbose-asm -S, using Linux. Read about invoking GCC.
On Linux, play with nm(1), readelf(1), objdump(1) on your executable or object file (in ELF format).
I just learned about different memory segments like Text, Data, Stack and Heap. My questions are:
1- Where are the boundaries between these sections defined? Is it in the compiler or the OS?
2- How does the compiler or OS know which addresses belong to each section? Should we define them anywhere?
This answer is from the point of view of a more special-purpose embedded system rather than a more general-purpose computing platform running an OS such as Linux.
Where are the boundaries between these sections defined? Is it in the compiler or the OS?
Neither the compiler nor the OS do this. It's the linker that determines where the memory sections are located. The compiler generates object files from the source code. The linker uses the linker script file to locate the object files in memory. The linker script (or linker directive) file is a file that is a part of the project and identifies the type, size and address of the various memory types such as ROM and RAM. The linker program uses the information from the linker script file to know where each memory starts. Then the linker locates each type of memory from an object file into an appropriate memory section. For example, code goes in the .text section which is usually located in ROM. Variables go in the .data or .bss section which are located in RAM. The stack and heap also go in RAM. As the linker fills one section it learns the size of that section and can then know where to start the next section. For example, the .bss section may start where the .data section ended.
The size of the stack and heap may be specified in the linker script file or as project options in the IDE.
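To give a feel for it, here is a minimal GNU-ld-style linker script sketch (the memory names, addresses and sizes are purely illustrative, not for any particular part):

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 64K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
}

SECTIONS
{
    .text : { *(.text*) } > FLASH           /* code goes to ROM             */
    .data : { *(.data*) } > RAM AT> FLASH   /* initialized variables: RAM,
                                               initializers stored in ROM   */
    .bss  : { *(.bss*)  } > RAM             /* zero-initialized variables   */
}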
IDEs for embedded systems typically provide a generic linker script file automatically when you create a project. The generic linker file is suitable for many projects so you may never have to customize it. But as you customize your target hardware and application further you may find that you also need to customize the linker script file. For example, if you add an external ROM or RAM to the board then you'll need to add information about that memory to the linker script so that the linker knows how to locate stuff there.
The linker can generate a map file which describes how each section was located in memory. The map file may not be generated by default and you may need to turn on a build option if you want to review it.
How does the compiler or OS know which addresses belong to each section?
Well, I don't believe the compiler or OS actually knows this information, at least not in the sense that you could query them for it. The compiler has finished its job before the memory sections are located by the linker, so the compiler doesn't know. The OS, well, how do I explain this? An embedded application may not even use an OS. The OS is just some code that provides services for an application. The OS doesn't know and doesn't care where the boundaries of memory sections are. All that information is already baked into the executable code by the time the OS is running.
Should we define them anywhere?
Look at the linker script (or linker directive) file and read the linker manual. The linker script is input to the linker and provides the rough outlines of memory. The linker locates everything in memory and determines the extent of each section.
For your query:
Where are the boundaries between these sections defined? Is it in the compiler or the OS?
The answer is: the OS.
There is no universally common addressing scheme for the layout of the .text segment (executable code), .data segment (variables) and other program segments. However, the layout of the program itself is well-formed according to the system (OS) that will execute the program.
How does the compiler or OS know which addresses belong to each section? Should we define them anywhere?
I divided this question of yours into three questions:
What about the text (code) and data sections and their limits?
Text and data are prepared by the compiler. The requirement for the compiler is to make sure that they are accessible, and to pack them into the lower portion of the address space. The accessible address space will be limited by the hardware, e.g. if the instruction pointer register is 32 bits wide, then the text address space would be 2^32 bytes = 4 GiB.
What about the heap section and its limit? Is it the total available RAM?
After text and data, the area above that is the heap. With virtual memory, the heap can practically grow up close to the maximum address space.
Do the stack and the heap have a static size limit?
The final segment in the process address space is the stack. The stack occupies the end of the address space; it starts at the end and grows down.
Because the heap grows up and the stack grows down, they basically limit each other. Also, because both types of segments are writable, it wasn't always a violation for one of them to cross the boundary, so you could have a buffer or stack overflow. Now there are mechanisms to stop that from happening.
There is a set limit on the heap (and stack) for each process to start with. This limit can be changed at runtime (using brk()/sbrk()). Basically, what happens is that when the process needs more heap space and has run out of allocated space, the standard library issues the call to the OS. The OS will allocate a page, which is usually then managed by the user-space library for the program to use. I.e., if the program wants 1 KiB, the OS will give an additional 4 KiB, and the library will hand 1 KiB to the program and keep 3 KiB for when the program asks for more next time.
Most of the time the layout will be Text, Data, Heap (grows up), unallocated space and finally Stack (grows down). They all share the same address space.
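You can watch the program break move with a small sketch (assuming Linux with glibc; sbrk() is deprecated but still available, and malloc() may use mmap() instead of the break for large allocations):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);      /* current program break (top of the heap)    */
    void *p = malloc(1024);      /* ask the library for 1 KiB                  */
    void *after = sbrk(0);       /* the break may have grown by a page or more */

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);
    free(p);
    return 0;
}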
The sections are defined by a format which is loosely tied to the OS. For example, on Linux you have ELF and on Mac OS you have Mach-O.
In 99.9% of cases, you do not define the sections explicitly as a programmer. The compiler knows what to put where.
I know this may be considered a silly question. But my curiosity is stronger than the fear of downvotes. The code below simply reserves 1 GB of the process's virtual memory, prints the address of the reserved block and releases the block.
#include <iostream>
#include <Windows.h>
int main()
{
// Reserve 1 GB of the process's virtual address space
LPVOID lp1 = VirtualAlloc(NULL, 0x40000000, MEM_RESERVE, PAGE_NOACCESS);
std::cout << lp1 << '\n';
// Release the 1 GB block (dwSize must be 0 when using MEM_RELEASE)
VirtualFree(lp1, 0, MEM_RELEASE);
}
I ran this code on an x64 machine a few times and obtained the following addresses for lp1:
0x1e 9c22 0000
0xe1 8000 0000
0x16 92a3 0000
0x34 83ec 0000
Why do the addresses vary so much from one run to the next? I know the MS docs don't say anything about this, but I'd like to know if there is some reasonable explanation for this behavior.
There is no reason why it should not differ between allocations, but one popular reason for handing out different addresses on subsequent allocations is to make security exploits harder to pull off.
The idea is that exploit code is easier to write if it can know where memory is between program runs. Another reason could be that the different addresses you see are just a side effect of how the allocator keeps track of memory.
You probably link with the /DYNAMICBASE linker option; it is turned on by default for x64 projects. That also gets the /HIGHENTROPYVA option turned on in the executable file header. Run Dumpbin.exe /headers on your EXE file:
OPTIONAL HEADER VALUES
20B magic # (PE32+)
...
8160 DLL characteristics
High Entropy Virtual Addresses <== here
Dynamic base <== and here
NX compatible
Terminal Server Aware
This asks the memory manager to generate highly randomized addresses. It makes your program much more difficult to attack with malware.
Beware that /DYNAMICBASE is also turned on in the Debug configuration. While that can be somewhat helpful in getting your program to bomb when it has pointer bugs, it is much more likely to be a massive pain when you have to diagnose such a bug. Don't hesitate to turn it off; it is only intended to protect your program in the wild. Project > Properties > Linker > Advanced > Randomized Base Address = "No".
In the gcc documentation, one reason is given for using the section attribute: mapping to special hardware. But that does not seem to be my case.
I have been given the task of modifying a shared library that we use in our project. It is a Linux library. There are variable declarations in the library that puzzle me. They look (roughly) like this:
static int my_var_1 __attribute__((section("STACK"))) = 0;
Update 1:
There are a dozen variables defined this way (__attribute__((section("STACK"))))
Update 2:
my_var_1 is not a constant. my_var_1 might be changed in code during initialization:
my_var_1 = atoi(getenv("MY_VAR_1") ? getenv("MY_VAR_1") : "0");
later in the library it is used like this:
inline void do_something() __attribute__((always_inline));
inline void do_something()
{
if (my_var_1)
do_something_else();
}
What might be the point of using __attribute__((section("STACK")))? I understand that section tells the compiler to put a variable in a particular section. But what might be the point of putting a static int in the "STACK" section, of all places?
Update 3
These lines are an excerpt from the output of readelf -t my_lib.so
[23] .got.plt
PROGBITS 00000000002103f0 00000000000103f0 0
00000000000003a8 0000000000000008 0 8
[0000000000000003]: WRITE, ALLOC
[24] .data
PROGBITS 00000000002107a0 00000000000107a0 0
00000000000000b0 0000000000000000 0 16
[0000000000000003]: WRITE, ALLOC
[25] STACK
PROGBITS 0000000000210860 0000000000010860 0
00000000000860e0 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
[26] .bss
NOBITS 0000000000296940 0000000000096940 0
0000000000000580 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
Update 4
I managed to get information from the author of the shared library.
__attribute__((section("STACK"))) was added because he had not managed to build the library on Solaris; then he found this workaround. Before the workaround, the definition of my_var_1 was:
int my_var_1 = 0;
and everything was OK. Then he changed it, since my_var_1 was in fact needed only in this translation unit:
static int my_var_1 = 0;
After that change he could not build the library on Solaris anymore. So he added __attribute__((section("STACK"))) and it helped somehow.
First, the STACK section won't be the stack of any running task.
Putting variables and functions in a specific section lets you select a memory area for them (via the linker script). On some (mostly embedded) architectures, you want to put often-accessed data in faster memory.
Another possibility: some development post-link script sets the whole STACK section to 1, so a development build always does do_something_else(), while the released software keeps the default value of 0.
Yet another possibility: if there are other variables in the STACK section, the developer wants to keep them close together in memory. All variables in the STACK section will be near each other. Maybe a cache optimization?
There may be many reasons and it is difficult to tell without details. Some of the reasons might be:
The section marked STACK is linked at run-time to a closely coupled memory with a faster access time than other RAMs. It makes sense to map the stack to such a RAM to avoid stalls during function calls. Now, if you have a variable that is accessed a lot and you want to map it to the same fast-access RAM, putting it in the same section as the stack makes sense.
The section marked STACK might be mapped to a region of memory that is accessible when other parts of memory are not. For example, boot loaders need to initialize the memory controller before they can access RAM. But you really want to be able to write the code that does that in C, which requires a stack. So you find some special memory (such as the data cache programmed to write-back mode) and map the stack there, so you can run the code that gets the memory controller working and makes RAM usable. Once again, if you happen to have a global variable that also needs to be accessed before RAM is available, you might decide to put it in the STACK section.
A better programmer would have renamed the STACK section to something else if it is used for more than just the stack.
In some operating systems, the same region of address space is used for every thread's stack; when execution switches between threads, the mapping of that space is changed accordingly. On such systems, every thread will have its own independent set of any static variables located within that region of address space. Putting variables that need to be maintained separately for each thread in such an address range avoids the need to manually swap them on each task switch.
Another occasional use for forcing variables into a stack area is to add stack sentinels (variables that can be checked periodically to see whether a stack overflow has clobbered them), as sketched below.
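A sentinel along those lines might look like the following sketch (hypothetical: the fill value and the handler handle_stack_overflow are illustrative, not taken from the library in question):

/* Placed next to the stack by the linker script, so a stack overflow
   is likely to clobber it before it reaches anything else. */
static unsigned int stack_sentinel __attribute__((section("STACK"))) = 0xDEADBEEFu;

extern void handle_stack_overflow(void);   /* hypothetical handler */

/* Call this periodically, e.g. from a timer interrupt. */
void check_stack(void)
{
    if (stack_sentinel != 0xDEADBEEFu)
        handle_stack_overflow();
}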
A third use occurs on 8086 and 80286 platforms (probably not so much later chips in the family): the 8086 and 80286 are limited to efficiently accessing things in four segments without having to reload segment registers. If code needs to do something equivalent to
for (n=0; n<256; n++)
*dest++ = xlat[*src++];
and none of the items can be put in the code segment, being able to force one of the items into the stack segment can make code much faster. Hand-written assembly code would be required to achieve the speedup, but it can be extremely massive (nearly a factor of two in some real-world situations I've done on 8086, and perhaps even greater in some situations on the 80286).