I have an m.c:
extern void a(char*);
int main(int ac, char **av){
static char string [] = "Hello , world!\n";
a(string);
}
and an a.c:
#include <unistd.h>
#include <string.h>
void a(char* s){
write(1, s, strlen(s));
}
I compile and build these as:
g++ -c -g -std=c++14 -MMD -MP -MF "m.o.d" -o m.o m.c
g++ -c -g -std=c++14 -MMD -MP -MF "a.o.d" -o a.o a.c
g++ -o linux m.o a.o -lm -lpthread -ldl
Then I examine the executable, linux, as follows:
objdump -drwxCS -Mintel linux
The output of this on my Ubuntu 16.04.6 starts off with:
start address 0x0000000000400540
Then, later, comes the .init section:
00000000004004c8 <_init>:
4004c8: 48 83 ec 08 sub rsp,0x8
Finally comes the .fini section:
0000000000400704 <_fini>:
400704: 48 83 ec 08 sub rsp,0x8
400708: 48 83 c4 08 add rsp,0x8
40070c: c3 ret
The program references the string Hello , world!\n, which is in the .data section, as shown by the command:
objdump -sj .data linux
Contents of section .data:
601030 00000000 00000000 00000000 00000000 ................
601040 48656c6c 6f202c20 776f726c 64210a00 Hello , world!..
All of this tells me that the executable has been created so as to be loaded at actual memory addresses starting from around 0x0000000000400540 (near the address of .init) and that the program accesses data at actual memory addresses extending until at least 601040 (the address of .data).
I base this on Chapter 7 of "Linkers & Loaders" by John R Levine, where he states:
A linker combines a set of input files into a single output file that
is ready to be loaded at a specific address.
My question is about the next line.
If, when the program is loaded, storage at that address isn't
available, the loader has to relocate the loaded program to reflect
the actual load address.
(1) Suppose I have another executable that is currently running on my machine and already using the memory space between 400540 and 601040. How is it decided where to start my new executable, linux?
(2) Related to this, in Chapter 4, it is stated:
..ELF objects...are loaded in about the middle of the address space so
the stack can grow down below the text segment and the heap can grow
up from the end of the data, keeping the total address space in use
relatively compact.
Suppose a previously running application started at, say, 200000 and now linux starts around 400540. There is no clash or overlap of memory addresses. But as the programs continue, suppose the heap of the previous application creeps up to 300000, while the stack of the newly launched linux has grown downward to 310000. Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?
If, when the program is loaded, storage at that address isn't available, the loader has to relocate the loaded program to reflect the actual load address.
Not all file formats support this:
GCC for 32-bit Windows will add the information required for the loader in the case of dynamic libraries (.dll). However, the information is not added to executable files (.exe), so such an executable file must be loaded to a fixed address.
Under Linux it is a bit more complicated; however, many (typically older 32-bit) executable files likewise cannot be loaded to different addresses, while dynamic libraries (.so) can be loaded to different addresses.
Suppose I have another executable that is currently running on my machine already using the memory space between 400540 and 601040 ...
Modern computers (all 32-bit x86 computers) have a paging MMU which is used by most modern operating systems. This is a circuit (typically part of the CPU) which translates the addresses seen by the software into the addresses seen by the RAM. In your example, 400540 could be translated to 1234000, so accessing the address 400540 will actually access the address 1234000 in RAM.
The point is: modern OSs use different MMU configurations for different tasks. So if you start your program again, a different MMU configuration is used that translates the address 400540 seen by the software to the address 2345000 in RAM. Both programs using address 400540 can run at the same time because one program will actually access address 1234000 and the other one will access address 2345000 in RAM when the programs access the address 400540.
This means that some address (e.g. 400540) will never be "already in use" when the executable file is loaded.
The address may already be in use when a dynamic library (.so/.dll) is loaded because these libraries share the memory with the executable file.
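You can see the "private virtual address space" part from user space with a tiny test program (a minimal sketch; the file name addr.c is my own, and on a toolchain that builds PIEs by default you would add -no-pie to reproduce the fixed-address case):

/* addr.c - print the virtual address of a static object, then wait.
   Built as a non-PIE, two instances running at the same time will both
   report the same virtual address, even though each instance is backed
   by different physical RAM. */
#include <stdio.h>
#include <unistd.h>

static char buffer[16] = "hello";

int main(void) {
    printf("pid %d: buffer is at %p\n", (int)getpid(), (void *)buffer);
    sleep(30);   /* keep the process alive so two instances overlap */
    return 0;
}

Compile it with something like gcc -o addr addr.c (add -no-pie if needed) and start it twice, e.g. ./addr & ./addr: both processes print the same virtual address, because each one gets its own MMU configuration.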
... how is it decided where to start my new executable linux?
Under Linux the executable file will be loaded to the fixed address if it was linked in a way that it cannot be moved to another address. (As already said: This was typical for older 32-bit files.) In your example the "Hello world" string would be located at address 0x601040 if your compiler and linker created the executable that way.
However, most 64-bit executables can be loaded to a different address. Linux will load them at some random address for security reasons, making it more difficult for viruses or other malware to attack the program.
... so the stack can grow down below the text segment ...
I've never seen this memory layout in any operating system:
Both under Linux and under Solaris the stack was located at the end of the address space (somewhere around 0xBFFFFF00), while the text segment was loaded quite close to the start of the memory (maybe address 0x401000).
... and the heap can grow up from the end of the data, ...
suppose the heap of the previous application creeps up ...
Many implementations since the late 1990s do not use the heap any more. Instead, they use mmap() to reserve new memory.
According to the manual page of brk(), the heap was declared a "legacy feature" in the year 2001, so it should not be used by new programs any longer.
(However, according to Peter Cordes malloc() still seems to use the heap in some cases.)
Unlike "simple" operating systems like MS-DOS, Linux does not allow you "simply" to use the heap, but you have to call the function brk() to tell Linux how much heap you want to use.
If a program uses the heap and it needs more heap than is available, the brk() function returns an error code and the malloc() function simply returns NULL.
However, this situation typically happens because no more RAM is available and not because the heap overlaps with some other memory area.
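To make the brk()/heap mechanism concrete, here is a minimal sketch using sbrk() (the thin library wrapper around brk); error handling is omitted and the printed addresses are of course system-specific:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);   /* current position of the program break */
    sbrk(4096);               /* ask the kernel to grow the heap by one page */
    void *after = sbrk(0);
    printf("break before: %p\n", before);
    printf("break after:  %p\n", after);
    return 0;
}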
... while the stack of the newly launched linux has grown downward to ...
Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?
Indeed, the size of the stack is limited.
If you use too much stack, you have a "stack overflow".
This program will intentionally use too much stack - just to see what happens:
.globl _start
_start:
    sub $0x100000, %rsp    # move the stack pointer down by 1 MiB
    push %rax              # touch the newly claimed stack area ...
    push %rax              # ... so the memory is actually accessed
    jmp _start             # repeat until the stack limit is exceeded
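(To try it yourself: assuming the source is saved as example.s, assemble and link it without the C startup files, e.g. as -o example.o example.s && ld -o example_program example.o. The file names are just my choice, picked to match the transcript below.)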
In the case of an operating system with an MMU (such as Linux), your program will crash with an error message:
~$ ./example_program
Segmentation fault (core dumped)
~$
EDIT/ADDENDUM
Is stack for all running programs located at the "end"?
In older Linux versions, the stack was located near (but not exactly at) the end of the virtual memory accessible by the program: Programs could access the address range from 0 to 0xBFFFFFFF in those Linux versions. The initial stack pointer was located around 0xBFFFFE00. (The command line arguments and environment variables came after the stack.)
And is this the end of actual physical memory? Will not the stack of different running programs then get mixed up? I was under the impression that all of the stack and memory of a program remains contiguous in actual physical memory, ...
On a computer using an MMU, the program never sees physical memory:
When the program is loaded, the OS will search some free area of RAM - maybe it finds some at the physical address 0xABC000. Then it configures the MMU in a way that the virtual addresses 0xBFFFF000-0xBFFFFFFF are translated to the physical addresses 0xABC000-0xABCFFF.
This means: Whenever the program accesses address 0xBFFFFE20 (for example using a push operation), the physical address 0xABCE20 in the RAM is actually accessed.
There is no possibility for a program at all to access a certain physical address.
If you have another program running, the MMU is configured in a way that the addresses 0xBFFFF000-0xBFFFFFFF are translated to the addresses 0x345000-0x345FFF when the other program is running.
So if one of the two programs will perform a push operation and the stack pointer is 0xBFFFFE20, the address 0xABCE20 in RAM will be accessed; if the other program performs a push operation (with the same stack pointer value), the address 0x345E20 will be accessed.
Therefore, the stacks will not mix up.
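The translation arithmetic itself is trivial: the offset within the page is kept and only the page's base is swapped. Here is a toy model in C, using nothing but the example numbers from above (a real user-space program has no way to query these physical addresses):

#include <stdio.h>
#include <stdint.h>

/* toy model: same virtual address, different physical page per process */
static uintptr_t translate(uintptr_t virt, uintptr_t virt_page, uintptr_t phys_page) {
    return phys_page + (virt - virt_page);   /* keep the offset within the page */
}

int main(void) {
    uintptr_t sp = 0xBFFFFE20;   /* same stack pointer value in both programs */
    printf("program A accesses RAM at 0x%lx\n",
           (unsigned long)translate(sp, 0xBFFFF000, 0xABC000));   /* 0xABCE20 */
    printf("program B accesses RAM at 0x%lx\n",
           (unsigned long)translate(sp, 0xBFFFF000, 0x345000));   /* 0x345E20 */
    return 0;
}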
OSs not using an MMU but supporting multi-tasking (examples are the Amiga 500 or early Apple Macintoshes) will of course not work this way. Such OSs use special file formats (and not ELF) which are optimized for running multiple programs without MMU. Compiling programs for such OSs is much more complex than compiling programs for Linux or Windows. And there are even restrictions for the software developer (example: functions and arrays should not be too long).
Also, does each program have its own stack pointer, base pointer, registers, etc.? Or does the OS just have one set of these registers to be shared by all programs?
(Assuming a single-core CPU), the CPU has one set of registers; and only one program can run at the same time.
When you start multiple programs, the OS will switch between the programs. This means program A runs for (for example) 1/50 second, then program B runs for 1/50 second, then program A runs for 1/50 second, and so on. It appears to you as if the programs run at the same time.
When the OS switches from program A to program B, it must first save the values of the registers (of program A). Then it must change the MMU configuration. Finally it must restore program B's register values.
Yes, objdump on this executable shows addresses where its segments will be mapped. (Linking collects sections into segments: What's the difference of section and segment in ELF file format) .data and .text get linked into different segments with different permissions (read+write vs. read+exec).
If, when the program is loaded, storage at that address isn't available
That could only happen when loading a dynamic library, not the executable itself. Virtual memory means that each process has its own private virtual address space, even if they were started from the same executable. (This is also why ld can always pick the same default base address for the text and data segments, not trying to slot every executable and library on the system into a different spot in a single address space.)
An executable is the first thing that gets to lay claim to parts of that address space, when it's loaded/mapped by the OS's ELF program loader. That's why traditional (non-PIE) ELF executables can be non-relocatable, unlike ELF shared objects like /lib/libc.so.6
If you single-step a program with a debugger, or include a sleep, you'll have time to look at less /proc/<PID>/maps. Or cat /proc/self/maps to have cat show you its own map. (Also /proc/self/smaps for more detailed info on each mapping, like how much of it is dirty, using hugepages, etc.)
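For example, a tiny program can dump its own map and then pause so you can also inspect it from another terminal (a minimal sketch, with only basic error handling):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *maps = fopen("/proc/self/maps", "r");   /* this process's own mappings */
    if (!maps)
        return 1;
    int c;
    while ((c = getc(maps)) != EOF)   /* copy the map to stdout */
        putchar(c);
    fclose(maps);
    printf("pid %d sleeping; try: less /proc/%d/maps\n", (int)getpid(), (int)getpid());
    sleep(60);
    return 0;
}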
(Newer GNU/Linux distros configure GCC to make PIE executables by default: 32-bit absolute addresses no longer allowed in x86-64 Linux?. In that case objdump would only see addresses relative to a base of 0 or 1000 or something. And compiler-generated asm would have used PC-relative addressing, not absolute.)
Assume this simple code:
int main(){return 0;}
using objdump we can see the memory addresses:
0000000100003fa0 _main:
100003fa0: 55 pushq %rbp
100003fa1: 48 89 e5 movq %rsp, %rbp
100003fa4: 31 c0 xorl %eax, %eax
100003fa6: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
100003fad: 5d popq %rbp
100003fae: c3 retq
I know that 0x100003fa0 (as an example) is a virtual memory address.
The OS will map it to the physical memory when my program is loaded.
2 Questions:
1- Can the initial address of main be random? As the addresses are virtual, I'm guessing it can be any value and the virtual memory will take care of the rest? I.e., can I start literally from 0x1 (not 0x0, as it's reserved for null)?
2- How does the linker come up with the initial address? (again, is the starting address random?)
Can the initial address of main be random? As the addresses are virtual, I'm guessing it can be any value and the virtual memory will take care of the rest? I.e., can I start literally from 0x1 (not 0x0, as it's reserved for null)?
The memory being virtual doesn’t mean that all of the virtual address space is yours to do with as you please. On most OSes, the executable modules (programs and libraries) need to use a subset of the address space or the loader will refuse to load them. That is highly platform-dependent of course.
So the address can be whatever you want as long as it is within the platform-specific range. I doubt that any platform would allow 0x1, not only because some platforms need the code to be aligned to something larger than a byte.
Furthermore, on many platforms the addresses are merely hints: if they can be used as-is, the loader doesn't have to relocate a given section in the binary. Otherwise, it'll move it to a block of the address space that is available. This is fairly common, e.g. on Windows, the 32-bit binaries (e.g. DLLs) have base addresses: if available, the loader can load the binary faster. So, in the hypothetical case of the "initial address" being 0x1, assuming that alignment wasn't a problem, the address will just end up being moved elsewhere in the address space.
It's also worth noting that the "initial address" is a bit of an unspecific term. The binary modules that are loaded when an executable starts, consist of something akin to sections. Each of the sections has its own base address, and possibly also internal (relative) addresses or address references that are tabulated. In addition, one or more of the executable sections will also have an "entry" address. Those addresses will be used by the loader to execute initialization code (e.g. DllMain concept on Windows) - that code always returns quickly. Eventually, one of the sections, that nothing else depends on, will have a suitably named entry point and will be the "actual" program you wrote - that one will keep running and return only when the program has been exited. At that point the control may return to the loader, which will note that nothing else is to be executed, and the process will be torn down. The details of all this are highly platform dependent - I'm only giving a high-level overview, it's not literally done that way on any particular platform.
How does the linker come up with the initial address? (again is the starting address random?)
The linker has no idea what to do by itself. When you link your program, the linker is fed several more files that come with the platform itself. Those files are linker scripts and various static libraries needed to make the code able to start up. The linker scripts give the linker the constraints within which it can assign addresses. So it's all highly platform-specific again. The linker can either assign the addresses in a completely deterministic fashion, i.e. the same inputs always produce identical output, or it can be told to assign certain kinds of addresses at random (in a non-overlapping fashion, of course). That's known as ASLR (address space layout randomization).
Not sure about Visual C but gcc (or rather ld) uses a linker script to determine final addresses. This can be specified using the -T option. Full details of gcc linker scripts can be found at: https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts.
Normally you don't need to play with this since your toolchain will be built either for the host machine or when cross compiling with the correct settings for a target.
For ASLR and .so files you will need to compile with -fPIC or -fPIE (position-independent code or position-independent executable) and, for executables, link with -pie. Your compiled code will then only contain offsets against some base address in memory. The base address then gets set by the operating system loader before running your application.
Those addresses are base addresses and offsets. The ELF file contains special information on how to calculate the actual addresses when the program is loaded. It is a rather advanced topic, but you can read about how the ELF file is loaded and executed here: How do I load and execute an ELF binary executable manually? or https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/
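A quick way to see the "base address plus offset" idea is to print the run-time address of a function and compare it with the link-time address in the file (a hedged sketch; whether the number changes between runs depends on whether the executable is a PIE and whether ASLR is active):

#include <stdio.h>

int main(void) {
    /* For a PIE this prints the randomized load base plus the link-time
       offset that nm/objdump report for main, so it varies from run to run
       under ASLR. For a non-PIE it matches the fixed address in the file. */
    printf("main is at %p\n", (void *)&main);
    return 0;
}

Run it a few times and compare the output with nm ./a.out | grep ' T main'.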
I'm learning about the layout of executable binaries. My end goal is to analyze a specific executable for things that could be refactored (in its source) to reduce the compiled output size.
I've been using https://www.embeddedrelated.com/showarticle/900.php and https://www.geeksforgeeks.org/memory-layout-of-c-program/ as references for this initial learning.
From what I've learned, a linker script specifies the addresses where sections of compiled binaries are placed. E.g.
> ld --verbose | grep text
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
*(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
I think this means that the text segments of compiled binaries starts at memory address 0x400000 - true?
What does that value, 0x400000, represent? I'm probably not understanding something properly, but surely that 0x400000 does not represent a physical memory location, does it? E.g. if I were to run two instances of my compiled a.out executable in parallel, they couldn't both simultaneously occupy the space at 0x400000, right?
0x400000 is not a physical address in the sense of how your memory chips see it. It is a virtual address, as seen from the CPU's point of view.
The loader of your program will map a few pages of physical memory to VA 0x400000 and copy the contents of the text segment into them. And yes, another instance of your program could occupy the same physical and virtual block of memory for the text segment, because text (code) is readable and executable but not writeable. Other segments (data, bss, stack, heap) may have identical VAs, but each will be mapped to its own private, protected physical block of memory.
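Those per-segment permissions are easy to provoke from C (a minimal sketch; the second write is undefined behaviour and is expected to be killed with SIGSEGV on a typical Linux setup, which is exactly the point):

char writable[] = "data";          /* initialized array: private, writable copy in .data */
const char *readonly = "rodata";   /* the string literal lands in a read-only mapping */

int main(void) {
    writable[0] = 'D';             /* fine: the .data mapping is read-write */
    ((char *)readonly)[0] = 'R';   /* typically crashes: that mapping is read-only */
    return 0;
}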
What is 0x400000
I think this means that the text segments of compiled binaries starts at memory address 0x400000 - true?
No, this is well explained in the official documentation at: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html
SEGMENT_START(segment, default)
Return the base address of the named segment. If an explicit value has already been given for this segment (with a command-line ‘-T’ option) then that value will be returned otherwise the value will be default. At present, the ‘-T’ command-line option can only be used to set the base address for the “text”, “data”, and “bss” sections, but you can use SEGMENT_START with any segment name.
Therefore, SEGMENT_START is not setting the address, but rather returning it, and 0x400000 in your case is just the default used if that value was not explicitly set by one of the CLI mechanisms mentioned in the documentation (e.g. -Ttext=0x200 as mentioned in man ld).
Physical vs virtual
As you've said, doing things in physical addresses is very uncommon in userland, and would at the very least always require sudo as it would break process separation. Here is an example of userland doing physical address stuff for example: How to access physical addresses from user space in Linux?
Therefore, when the kernel loads an ELF binary with the exec syscalls, all addresses are interpreted as virtual addresses.
Note however that this is just a matter of convention. For example, when I give my Linux kernel ELF binary for QEMU to load into memory to start simulation, or when a bootloader does that in a real system, the ELF addresses would then be treated as physical addresses since there is no page table available at that point.
./hello is a simple echo program in C.
According to the objdump file headers,
$ objdump -f ./hello
./hello: file format elf32-i386
architecture: i386, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00000430
./hello has start address 0x430
Now loading this binary in gdb.
(gdb) file ./hello
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) x/x _start
0x430 <_start>: 0x895eed31
(gdb) break _start
Breakpoint 1 at 0x430
(gdb) run
Starting program: /1/vooks/cdac/ditiss/proj/binaries/temp/hello
Breakpoint 1, 0x00400430 in _start ()
(gdb) x/x _start
0x400430 <_start>: 0x895eed31
(gdb)
In the above output, before setting the breakpoint or running the binary, _start has address 0x430; but after running it, this address changes to 0x400430.
$ readelf -l ./hello | grep LOAD
LOAD 0x000000 0x00000000 0x00000000 0x007b4 0x007b4 R E 0x1000
LOAD 0x000eec 0x00001eec 0x00001eec 0x00130 0x00134 RW 0x1000
How does this mapping happen?
Kindly help.
Basically, after linking, the ELF file format provides all the necessary information for the loader to load the program into memory and run it.
Each piece of code and data is placed at an offset inside a section (data section, text section, etc.), and a specific function or global variable is accessed by adding the proper offset to the section's start address.
Now, the ELF file format also includes a program header table:
An executable or shared object file's program header table is an array
of structures, each describing a segment or other information that the
system needs to prepare the program for execution. An object file
segment contains one or more sections, as described in "Segment
Contents".
Those structures are then used by the OS loader to load the image to memory. The structure:
typedef struct {
Elf32_Word p_type;
Elf32_Off p_offset;
Elf32_Addr p_vaddr;
Elf32_Addr p_paddr;
Elf32_Word p_filesz;
Elf32_Word p_memsz;
Elf32_Word p_flags;
Elf32_Word p_align;
} Elf32_Phdr;
Note the following fields:
p_vaddr
The virtual address at which the first byte of the segment resides in memory
p_offset
The offset from the beginning of the file at which the first byte of
the segment resides.
And p_type
The kind of segment this array element describes or how to interpret the array element's information. Type values and their meanings are specified in Table 7-35.
From Table 7-35, note PT_LOAD:
Specifies a loadable segment, described by p_filesz and p_memsz. The
bytes from the file are mapped to the beginning of the memory segment.
If the segment's memory size (p_memsz) is larger than the file size
(p_filesz), the extra bytes are defined to hold the value 0 and to
follow the segment's initialized area. The file size can not be larger
than the memory size. Loadable segment entries in the program header
table appear in ascending order, sorted on the p_vaddr member.
So, by looking at those fields (and more) the loader can locate the segments (which can contain multiple sections) within the ELF file, and load them (PT_LOAD) into memory at a given virtual address.
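As a concrete illustration, here is a minimal sketch of the very first step a loader performs: walking the program header table described above and printing each entry (a 32-bit little-endian ELF is assumed, to match the ./hello above; a real loader would mmap the PT_LOAD segments and do far more validation):

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <32-bit ELF file>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f)
        return 1;

    Elf32_Ehdr eh;                                    /* ELF header sits at offset 0 */
    if (fread(&eh, sizeof eh, 1, f) != 1)
        return 1;

    for (int i = 0; i < eh.e_phnum; i++) {            /* walk the program header table */
        Elf32_Phdr ph;
        fseek(f, eh.e_phoff + (long)i * eh.e_phentsize, SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1)
            return 1;
        printf("type=%-2u offset=0x%06x vaddr=0x%08x filesz=0x%05x memsz=0x%05x%s\n",
               (unsigned)ph.p_type, (unsigned)ph.p_offset, (unsigned)ph.p_vaddr,
               (unsigned)ph.p_filesz, (unsigned)ph.p_memsz,
               ph.p_type == PT_LOAD ? "  <- PT_LOAD: gets mapped into memory" : "");
    }
    fclose(f);
    return 0;
}

Running it on the ./hello above should show the same two PT_LOAD entries that readelf -l printed.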
Now, can the virtual address of an ELF file segment be changed at runtime (load time)? Yes:
The virtual addresses in the program headers might not represent the
actual virtual addresses of the program's memory image. See "Program
Loading (Processor-Specific)".
So, the program header contains the segments the OS loader will load into memory (loadable segments, which contain loadable sections), but the virtual addresses at which the loader places them can differ from the addresses in the ELF file.
How?
To understand it, let's first read about the Base Address:
Executable and shared object files have a base address, which is the
lowest virtual address associated with the memory image of the
program's object file. One use of the base address is to relocate the
memory image of the program during dynamic linking.
An executable or shared object file's base address is calculated
during execution from three values: the memory load address, the
maximum page size, and the lowest virtual address of a program's
loadable segment. The virtual addresses in the program headers might
not represent the actual virtual addresses of the program's memory
image. See "Program Loading (Processor-Specific)".
So the practice is the following:
position-independent code. This code enables a segment's virtual
address to change from one process to another, without invalidating
execution behavior.
Though the system chooses virtual addresses for individual processes,
it maintains the relative positions of the segments. Because
position-independent code uses relative addressing between segments,
the difference between virtual addresses in memory must match the
difference between virtual addresses in the file.
So, by using relative addressing (PIE, position-independent executable), the actual placement can differ from the address in the ELF file.
From PeterCordes's answer:
0x400000 is the Linux default base address for loading PIE
executables with ASLR disabled (like GDB does by default).
So for your specific case (a PIE executable on Linux), the loader picks this base address.
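In other words, the link-time address of _start in the file (0x430) plus that 0x400000 base gives 0x00400430, which is exactly what GDB reports for _start once the program is running.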
Of course, position independence is just an option. A program can be compiled without it, and then absolute addressing takes place, in which there must be no difference between the segment address in the ELF file and the real memory address the segment is loaded to:
Executable file segments typically contain absolute
code. For the process to execute correctly, the segments must reside
at the virtual addresses used to create the executable file. The
system uses the p_vaddr values unchanged as virtual addresses.
I would recommend taking a look at the Linux implementation of ELF image loading here, and these two SO threads here and here.
Paragraphs taken from the Oracle ELF documents (here and here).
You have a PIE executable (Position Independent Executable), so the file only contains an offset relative to the load address, which the OS chooses (and can randomize).
0x400000 is the Linux default base address for loading PIE executables with ASLR disabled (like GDB does by default).
If you compile with -m32 -fno-pie -no-pie hello.c to make a normal position-dependent, dynamically linked executable (one that can load from static locations with mov eax, [symname] instead of having to get EIP into a register and use it for PC-relative addressing, since 32-bit x86 lacks x86-64's RIP-relative addressing modes), objdump -f will say:
./hello-32-nopie: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048380 # hard-coded load address, can't be ASLRed
instead of
architecture: i386, flags 0x00000150: # some different flags set
HAS_SYMS, DYNAMIC, D_PAGED # different ELF type
start address 0x000003e0
In a "regular" position-dependent executable, the linker chooses that base address by default, and does embed it in the executable. The OS's program loader does not get to choose for ELF executables, only for ELF shared objects. A non-PIE executable can't be loaded at any other address, so only their libraries can be ASLRed, not the executable itself. This is why PIE executables were invented.
A non-PIE is allowed to embed absolute addresses without any metadata that would let an OS try to relocate it. Or it's allowed to contain hand-written asm that takes advantage of whatever it wants to about the numeric values of addresses.
A PIE is an ELF shared object with an entry-point. Until PIEs were invented, ELF shared objects were usually only used for shared libraries. See 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIEs.
They're quite inefficient for 32-bit code; I'd recommend not making 32-bit PIEs.
A static executable can't be a PIE, so gcc -static will create a non-PIE elf executable; it implies -no-pie. (So will linking with ld directly, because only gcc changed to making PIEs by default, gcc needs to pass -pie to ld to do that.)
So it's easy to understand why you wrote "static vs. dynamic" in your title, if the only dynamic executables you ever looked at were PIEs. But a dynamically linked non-PIE ELF executable is totally normal, and what you should be doing if you care about performance but for some reason want / need to make 32-bit executables.
Until the last couple years or so, normal binaries like /bin/ls in normal Linux distros were non-PIE dynamic executables. For x86-64 code, being PIE only slows them down by maybe 1% I think I've read. Slightly larger code for putting a static address in a register, or for indexing a static array. Nowhere near the amount of overhead that 32-bit code has for PIC/PIE.
Virtual memory along with logical memory helps to make sure programs do not corrupt each other's data.
Program relocation does something similar, making sure that multiple programs do not corrupt each other. Relocation modifies the object program so that it can be loaded at a new, alternate address.
How are virtual memory, logical memory and program relocation related ? Are they similar ?
If they are same/similar, then why do we need program relocation ?
Relocatable programs, or, said another way, position-independent code, are traditionally used in two circumstances:
systems without virtual memory (or with only very basic virtual memory, e.g. classic MacOS), for any code
for dynamic libraries, even on systems with virtual memory, given that a dynamic library could find itself loaded at an address that is not its preferred one if other code already occupies that spot in the address space of the host program.
However, today even main executable programs on systems with virtual memory tend to be position-independent (e.g. the PIE* build flag on Mac OS X) so that they can be loaded at a randomized address to protect against exploits, e.g. those using ROP**.
* Position Independent Executable
** Return-Oriented Programming
Virtual memory does not prevent programs from interfering with each other. It is logical memory that does so. Unfortunately, it is common for the two concepts to be conflated as "virtual memory."
There are two types of relocation and it is not clear which you are referring to. However, they are connected. On the other hand, the concept is not really related to virtual memory.
The first concept is relocatable code. This is critical for shared libraries that usually have to be mapped to different addresses.
Relocatable code uses offsets rather than absolute addresses. When a program results in an instruction sequence something like:
JMP SOMELABEL
. . .
SOMELABEL:
The compiler or assembler encodes this as
JUMP the-number-of-bytes-to-SOMELABEL
rather than
JUMP to-the-address-of-somelabel.
By using offsets the code works the same way no matter where the JMP instruction is located.
The second type of relocation uses the first. In the past, relocation was mostly used for libraries. Now, some OSes will load program segments at different places in memory. That is intended for security: it is designed to defeat malicious attacks that depend upon the application being loaded at a specific address.
Both of these concepts work with or without virtual memory.
Note that generally the program is not modified to relocate it. I say generally because an executable file will usually have some addresses that need to be fixed up at run time.
When a C program is compiled and the object file (ELF) is created, the object file contains different sections such as bss, data, and text. I understand that these sections of the ELF are part of the virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual address space and a page table associated with the compiled program. The page table maps the virtual memory addresses present in the ELF to real physical memory addresses when the program is loaded. Is my understanding correct?
I read that in the created ELF file, the bss section just keeps a reference to the uninitialised global variables. Here "uninitialised global variable" means a variable that is not initialised at its declaration?
Also, I read that local variables will be allocated space at run time (i.e., on the stack). Then how will they be referenced in the object file?
If the program contains a particular section of code that allocates memory dynamically, how will those variables be referenced in the object file?
I am confused about whether these different segments of the object file (text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all programs are executed.
But I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?
1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size; since it's supposed to be all zeros, there is no need to actually have the zeros in the ELF file.) Note that locations can be absolute locations (like virtual address 0x100000) or relative locations (like 4096 bytes after the end of text).
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to setup a virtual address space for an instance of a program. The different sections describe different needs. For example ".rodata" says I'd like to store read-only data (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by their offset from the top of the stack. When a function executes, the stack grows to make space for local variables. The compiler determines the order of the local variables on the stack, so it knows the offset of each variable from the top of the stack.
The stack in memory is positioned upside down: the beginning of the stack usually has the highest memory address available. As the program runs and allocates space for local variables, the address of the top of the stack decreases (and can potentially lead to stack overflow, overlapping with segments at lower addresses :-) ).
5: Using pointers. The address of a dynamically allocated variable is stored in a (local) variable. This corresponds to using pointers in C.
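A small sketch makes 4 and 5 visible at run time (the printed addresses are illustrative and vary per system and per run):

#include <stdio.h>
#include <stdlib.h>

int global_var = 42;                            /* .data segment */

int main(void) {
    int local_var = 1;                          /* lives on the stack */
    int *heap_var = malloc(sizeof *heap_var);   /* dynamically allocated */

    printf("global (data):   %p\n", (void *)&global_var);
    printf("local  (stack):  %p\n", (void *)&local_var);
    printf("heap   (malloc): %p\n", (void *)heap_var);

    free(heap_var);
    return 0;
}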
I have found a nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU, together with the operating system, performs the translation from virtual addresses to physical RAM addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resources, and your C code is just a lightweight script that drives the OS and performs simple operations on registers.
Virtual memory and physical memory are about the CPU's MMU (with its TLB and page tables) letting a user-space process use what appears to be contiguous memory.
So the actual physical memory mapped to that contiguous virtual memory can be scattered anywhere in RAM.
The compiled program doesn't know about this TLB/page-table business or about physical memory addresses; they are managed in OS kernel space.
BSS is a section which the OS prepares as zero-filled memory, because the variables in it were not initialized in the C/C++ source code and are therefore marked as bss by the compiler/linker.
The stack starts out as only a small amount of memory prepared by the OS; every time a function call is made, the stack pointer is pushed down so that there is more space to place the local variables, and it pops back up when you return from the function.
New physical memory will be allocated for the virtual addresses when that first small amount of memory is full and the bottom is reached: a page fault exception occurs, the OS kernel prepares new physical memory, and the user process can continue working.
No magic. In object code, every operation done to the pointer returned from malloc is handled as an offset from the register value returned by the malloc function call.
Actually malloc does quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocation, but they all ultimately get new memory regions from the OS using sbrk or mmap(/dev/zero), which is called anonymous memory.
Just read the man page for readelf; the readelf command will show you the starting addresses of the different segments of your program.
Regarding the first question you are absolutely right. Since most of today's systems use run-time binding it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question, it happens at run time due to runtime binding. The third question is true: all uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are initialized to 0.
4.
If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack with push instructions or by changing the value of the ESP register. They are then initialized with mov instructions or something similar.