static int arr[10] memory address always ends in 060 - c

I have a c program that looks like this
main.c
#include <stdio.h>
#define SOME_VAR 10
static int heap[SOME_VAR];
int main(void) {
printf("%p", heap);
return 0;
}
and outputs this when I run the compiled program a few times
0x58aa7c49060
0x56555644060
0x2f8d1f8e060
0x92f58280060
0x59551c53060
0xd474ed6e060
0x767c4561060
0xf515aeda060
0xbe62367e060
Why does it always end in 060? And is the array stored in heap?
Edit: I am on Linux and I have ASLR on. I compiled the program using gcc

The addresses differ because of ASLR (Address space layout ramdomization). Using this, the binary can be mapped at different locations in the virtual address space.
The variable heap is - in contrast to it's name - not located on the heap, but on the bss. The offset in the address space is therefore constant.
Pages are mapped at page granularity, which is 4096 bytes (hex: 0x1000) on many platforms. This is the reason, why the last three hex digits of the address is the same.
When you did the same with a stack variable, the address could even vary in the last digits on some platforms (namely linux with recent kernels), because the stack is not only mapped somewhere else but also receives a random offset on startup.

If you are using Windows, the reason is PE structure.
Your heap variable is stored in .data section of file and its address is calculated based on start of this section. Each section is loaded in an address independently, but its starting address is multiple of page size. Because you have no other variables, its address is probably start of .data section, so its address will be multiple of chunk size.
For example, this is the table of the compiled Windows version of your code:
The .text section is were your compiled code is and .data contains your heap variable. When your PE is loaded into memory, sections are loaded in different address and which is returned by VirtualAlloc() and will be multiple of page size. But address of each variable is relative to start of section that is now a page size. So you will always see a fixed number on lower digits. Since the relative address of heap from start of section is based on compiler, compile options, etc. you will see different number from same code but different compilers, but every time what will be printed is fixed.
When I compile code, I noticed heap is placed on 0x8B0 bytes after start of .data section. So every time that I run this code, my address end in 0x8B0.

The compiler happened to put heap at offset 0x60 bytes in a data segment it has, possibly because the compiler has some other stuff in the first 0x60 bytes, such as data used by the code that starts the main routine. That is why you see “060”; it is just where it happened to be, and there is no great significance to it.
Address space layout randomization changes the base address(es) used for various parts of program memory, but it always does so in units of 0x1000 bytes (because this avoids causing problems with alignment and other issues). So you see the addresses fluctuate by multiples of 0x1000, but the last three digits do not change.
The definition static int heap[SOME_VAR]; defines heap with static storage duration. Typical C implementations store it in a general data section, not in the heap. The “heap” is a misnomer for memory that is used for dynamic allocation. (It is a misnomer because malloc implementations may use a variety of data structures and algorithms, not limited to heaps. They may even use multiple methods in one implementation.)

Related

where is it documented that global array in C, compiled by gcc, is initialized like "copy-on-write"?

For this C code:
foobar.c:
static int array[256];
int main() {
return 0;
}
the array is initialized to all 0's, by the C standard. However, when I compile
gcc -S foobar.c
this produces the assembly code foobar.s that I can inspect, and nowhere in foobar.s, is there any initialization of contents of the array.
Hence I reason, that the contents are not initialized, only when an element of the array is inspected, is it initialized, kind of like "copy-on-write" mechanism for fork.
Is my reasoning correct? If so, is this a documented feature, and if so where can I find that documentation?
There's kind of a lot of levels here. This answer addresses Linux in particular, but the same concepts are likely to apply on other systems, possibly with different names.
The compiler requires that the object be "zero initialized". In other words, when a memory read instruction is executed with an address in that range, the value that it reads must be zero. As you say, this is necessary to achieve the behavior dictated by the C standard.
The compiler accomplishes this by asking the assembler to fill the space with zeros, one way or another. It may use the .space or .zero directive which implicitly requests this. It will also place the object in a section with the special name .bss (the reasons for this name are historical). If you look further up in the assembly output, you should see a directive like .bss or .section .bss. The assembler and linker promises that this entire section will be (somehow) initialized to zero. This is documented:
The bss section is used for local common variable storage. You may allocate address space in the bss section, but you may not dictate data to load into it before your program executes. When your program starts running, all the contents of the bss section are zeroed bytes.
Okay, so now what do the assembler and linker do to make it happen? Well, an ELF executable file has a segment header, which specifies how and where code and data from the file should be mapped into the program's memory. (Please note that the use of the word "segment" here has nothing to do with the x86 memory segmentation model or segment registers, and is only vaguely related to the term "segmentation fault".) The size of the segment, and the amount of data to be mapped, are specified separately. If the size is greater, then all remaining bytes are to be initialized to zero. This is also documented in the above-linked man page:
PT_LOAD
The array element specifies a loadable segment,
described by p_filesz and p_memsz. The bytes
from the file are mapped to the beginning of the
memory segment. If the segment's memory size
p_memsz is larger than the file size p_filesz,
the "extra" bytes are defined to hold the value
0 and to follow the segment's initialized area.
So the linker ensures that the ELF executable contains such a segment, and that all objects in the .bss section are in this segment, but not within the part that is mapped to the file.
Once all this is done, then the observable behavior is guaranteed: as above, when an instruction attempts to read from this object before it has been written, the value it reads will be zero.
Now as to how that behavior is ensured at runtime: that is the job of the kernel. It could do it by pre-allocating actual physical memory for that range of virtual addresses, and filling it with zeros. Or by an "allocate on demand" method, like what you describe, by leaving those pages unmapped in the CPU's page tables. Then any access to those pages by the application will cause a page fault, which will be handled by the kernel, which will allocate zero-filled physical memory at that time, and then restart the faulting instruction. This is completely transparent to the application. It just sees that the read instruction got the value zero. If there was a page fault, then it just seems to the application like the read instruction took a long time to execute.
The kernel normally uses the "on demand" method, because it is more efficient in case not all of the "zero initialized" memory is actually used. But this is not going to be documented as guaranteed behavior; it is an implementation detail. An application programmer need not care, and in fact must not care, how it works under the hood. If the Linux kernel maintainers decide tomorrow to switch everything to the pre-allocate method, every application will work exactly as it did before, just maybe a little faster or slower.

How memory allocation of variables or data in a program are done by compiler and OS

Want to get an overview on a few things about how exactly the memory for a variable is allocated.
In C programming,
Taking the context of "auto" variables, which are allocated on the stack section, I have the following question:
Does the compiler generate a logical address for the variables? If yes, then how? Won't the compiler need OS permission to generate or assign such addresses? If no, then is there some sort of indication or instruction that the compiler puts in the code segment asking the OS to allocate memory when running the executable?
Now taking the context of heap allocated variables,
Is the heap of the same size for all programs? If not, then does the executable consist of a header or something that tells the OS how much heap space it needs for dynamic allocation?
I'd be grateful if someone provides the answer or shares any related content/links that explains this.
Stack (most implementations use stack for automatic storage duration objects) and static storage duration objects memory is allocated during the program load and startup.
Does the compiler generate a logical address for the variables? If
yes, then how?
I do not know what is the "logical address" but compilers do "calculate" the references to the automatic storage duration objects. How? Simply compiler knows how far from the stack pointer address the automatic storage duration object is located (offset).
Generally the same applies to the static duration objects and the code, the compiler only calculates the offset from the their sections.
Is the heap of the same size for all programs?
It is implementation defined.
A method typically used in operating systems is that, when a program is starting, there is a piece (or collection) of software used that loads the program. The program loader reads the executable file and sets up memory for the program.
Part of the executable file says what size stack should be allocated for it. Most often, this is set by default when linking the program. (It is 8 MiB for macOS, 2 MiB for Linux, and 1 MiB for Windows.) However, it can be changed by asking the linker to set a different size.
The program loader calls operating system routines to request virtual memory be mapped. It does this for the stack and for other parts of the program, such as the code sections (the parts of memory that contain, mostly, the executable instructions of the program), and the initialized and uninitialized data. When it starts the program, it tells the program where the stack starts by putting that address into a designated register (or similar means).
One of the processor registers is used as a stack pointer; it points to address within the memory allocated for the stack that is the current top of stack. When the compiler arranges to use stack space for objects, it generates instructions that adjust the stack pointer. The addresses for the objects are calculated relative to the stack pointer. If a function needs 128 bytes of data, the compiler generates an instruction that subtracts 128 from the stack pointer. (This may occur in multiple steps, such as “call” and “push” instructions that make some changes to the stack pointer plus an additional “subtract” instruction that finishes the changes.) Then the addresses of all the objects in this stack frame are calculated as offsets from the value of the stack pointer. For example, by taking the stack pointer and adding 40, we calculate the address of the object that has been assigned to be 40 bytes higher than the top of the stack.
(There is some confusion about the wording of directions here because stacks commonly grow from high addresses to low addresses. The program loader may allocate some chunk of memory from, say, address 12300000016 to 12400000016. The stack pointer will start at 12400000016. Subtracting 128 will make it 123FFFF8016. Then 123FFFFA816 is an address that is 40 bytes “higher” than 123FFFF8016 in the address space, but the “top of stack” is below that. That is because the term “top of stack” refers to the model of physically stacking things on top of each other, with the latest thing on top.)
The so-called “heap” is not the same size of all programs. In typical implementations, the memory management routines call system routines to request more virtual memory when they need it.
Note that “heap” is properly a word for a general data structure. Heaps may be used to organize things other than available memory, and the memory management routine keep track of available memory using data structures other than heaps. When referring to memory allocated via the memory management routines, you can call it “dynamically allocated memory.” It may also be shortened to “allocated memory,” but that can be confusing in some situations since all memory that has been reserved for some use is allocated memory.
Some background first
In C programming, Taking the context of "auto" variables, which are allocated on the stack section ...
To understand my answer, you first need to know how the stack works.
Imagine you write the following function:
int myFunction()
{
return function1() + function2() + function3();
}
Unfortunately, you do not use C as programming language but you use a programming language that neither supports local variables nor return values. (This is how most CPUs work internally.)
You may return a value from a function in a global variable:
function1()
{
result = 1234; // instead of: return 1234;
}
And your program may now look the following way if you use a global variable instead of local ones:
int a;
myFunction()
{
function1();
a = result;
function2();
a += result;
function3();
result += a;
}
Unfortunately, one of the three functions (e.g. function3()) may call myFunction() (so the function is called recursively) and the variable a is overwritten when calling function3().
To solve this problem, you may define an array for local variables (myvars[]) and a variable named mypos. In the example, the elements 0...mypos in myvars[] are used; the elements (mypos+1)...(MAX_LOCALS-1) are free:
int myvars[MAX_LOCALS];
int mypos;
...
myFunction()
{
function1();
mypos++;
myvars[mypos] = result;
function2();
myvars[mypos] += result;
function3();
result += myvars[mypos];
mypos--;
}
By changing the value of mypos from 10 to 11 (as an example), your program indicates that the element mypos[11] is now in use and that the functions being called shall store their data in elements mypos[x] with x>=12.
Exactly this is how the stack is working.
Typically, the "variable" mypos is not a variable but a CPU register named "stack pointer". (However, there are a few historic CPUs where an ordinary variable was used for this!)
The actual answers
Does the compiler generate a logical address for the variables?
In the example above, the compiler will perform a mypos+=3 if there are 3 local variables. Let's say they are named a, b and c.
The compiler simply replaces a by myvars[mypos-2], b by myvars[mypos-1] and c by myvars[mypos].
On most CPUs, the stack pointer (named mypos in the example) is not an index into an array but already a pointer (comparable to int * mypos;), so the compiler would replace a by *(mypos-2) instead of myvars[mypos-2] in the example.
For global variables, the compiler simply counts the number of bytes needed for all global variables. In the simplest case, it chooses a range of memory of the same size (e.g. 0x10000...0x10123) and places the variables there.
Won't the compiler need OS permission to generate or assign such addresses?
No.
The "stack" (in the example this is the array myvars[]) is already provided by the OS and the stack pointer (mypos in the example) is also set to a correct value by the OS before the program is started.
Your program knows that the elements myvars[x] with x>mypos can be used.
For global variables, the information about the range used by global variables (e.g. 0x10000...0x10123) is stored in the executable file. The OS must ensure that this memory range can be used by the program. (For example by configuring the MMU accordingly.)
If this is not possible, the OS will simply refuse to start the program with an error message.
... asking the OS to allocate memory when running the executable?
For variables on the stack:
There may be operating systems where this is done.
However, in most cases, the program will simply crash with a "stack overflow" if too much stack is needed. In the example, this would mean: The program crashes if an elements myvars[x] with x>=MAX_LOCALS is accessed.
Now taking the context of heap allocated variables ...
Please first note that global variables are not stored on the heap.
The heap is used for data allocated using malloc() and similar functions (new in C++ for example).
Under many operating systems, malloc() calls an operating system function - so it is actually the operating system that allocates memory on the heap.
... and if there is not enough space, the OS (and malloc()) will return NULL.
Does the compiler generate a logical address for the variables? If yes, then how?
Yes, but they are related to the stack pointer at function entry time (which is normally saved as a constant base pointer, stored in a cpu register) This is because the function can be recursive, and you can have two calls to the function with different instances for that variable (and related to different copies of the base pointer), the compiler assigns the offset to the base pointer for the variable, but the base pointer can be different, depending on the stack contents at function entry time.
Won't the compiler need OS permission to generate or assign such addresses?
Nope, the compiler just generates an executable in the format and form needed for the operating system to manage process' memory. When the program starts, it is given normally three (or more) segments of memory:
text segment. A normally read-only (or execute only) segment that gives no write access to the program. This is normally because the text segment is shared between all programs that are using the same executable at the same time. A program can demand exclusive read-write acces to the text (to allow programs that modify their own executable code) but this happens only rarely. This is normally specified to the compiler and the compiler writes an special flag in the text segment to inform the kernel of this requirement.
Data segment. A read-write segment, that can be grown by means of a system cal (sbrk(2)) This is used for global variables and the heap (while in modern systems, the heap is allocated into a new segment acquired by calling the mmap(2) system call. Sometimes this segment is divided in two. A data segment read-only for constants (so the program receives a signal in case you try to change the value of a constant) and a read-write segment, freely usable by the program. This is where global variables are stored.
Stack segment. A read-write segment, that is allocated for the process to use as the stack segment. It has the capability of growing in one direction as the process starts using it. When the process accesses the data one memory page below the start of the segment, it generates a page fault trap that results in a new page being appended to the segment, so its workings are transparent to the process. This is the memory we are talking about.
the process can ask the kernel explicitly to get a new segment if it wants to (let's say it needs to map some file on memory, or if it has to load a shared executable/library) and on some systems, the read only variables (declared as const) are explititly stored in the text segment or in a specific section called .rodata that demands from the system a special data segment that is read-only. The compiler doesn't normally code this kind of resource itself, it is normally encoded in the program being compiled.
The complete memory is limited by system imposed limits, so if you try to overpass them (around 8Mb of stack space, by default, and depending on the operating system) you will get signalled by the system and your program aborted.
As you see, the process memory is owned by the process, and it can make whatever use it is permitted to. The stack memory is read/write, and allocated on demand, you can use up to 8Mb, but there's no provision to check on the use you do about it.
If no, then is there some sort of indication or instruction that the compiler puts in the code segment asking the OS to allocate memory when running the executable?
The system will know the size of the text segment of the process by the size it has on the executable. The data segment is divided into two parts, normally based on the assumption of what are the global initialized variables and what are the ones defaulting to zero (the memory allocated by the kernel to a process is initialized to zeros for security reasons) so the sum of both the initialized/data and the non initialized data sections are added to know how much memory to assign to the data segment. And the stack segment is assigned initialy just one page of memory, but as the process starts running and filling the stack, it grows as the process generates page faults on the so called next page of the stack segment. As you see, there's no nedd for the compiler to embed in the code any instruction to ask for more memory. Everything is read from the executable file.
The compiler runs as a normal program... it only generates all this information and writes it in a file (the executable file) for the kernel to know the resources needed to run the program. But the compiler's communication with the kernel is just to ask it to open files, write on them, read from source code and struggle it's head to achieve its task. :)
In most POSIX systems, the kernel loads a program in memory by means of the exec*(2) system calls. The kernel reads the executable file pointed to in a parameter of the call and creates the segments above mentioned, based on the parameters passed in the file, checks if another instance of the same program is running in the system to avoid loading the instructions from the file and referencing in this process the segment already open by the other. The data segment contents is initialized to zeros, and the contents of the initialization data are read into the segment (so the first part has the .data section of initialized global variables and the .bss section, which has only a size, is used to calculate the total size of the data segment). Then the stack is normally allocated one or more pages, depending on the initial contents that the exec() calls put in the initial stack. The initial stack is filled with:
a structure of data containing references to the program parameter list that was used on legacy systems to provide the kernel about the command line parameters to show in the ps(1) command output (this is still being generated for legacy purposes, but not used by the kernel for obvious security reasons) Today, a special system call is used to indicate the kernel the command line parameters to be output in the ps(1) output.
a snippet of machine code to use in the return from a system call to allow the execution (in user mode) of any signal handler that should be executed (this is the reason for the requirement that all signal handlers are called when the kernel returns from kernel mode and switches back again to user mode, and not otherwise)
the environment of the process.
the array of pointers to environment strings.
the command line parameters.
the array of char pointers that point to the command line parameters.
the envp array referenct to main().
the argv array reference to the command line parameters.
the argc counter of the number of command line parameters.
Once all these data is pushed to the stack, the program jumps to the start address (fixed by the linker, or by the user by a linker option) and is let to start running.
Before the program jumps to main() the executed code is part of the C runtime, that loads a special shared executable (called /lib/ld.so or similar) that is responsible of searching and loading of all the shared libraries that are linked to the program. Not all the programs have this feature (but almost all of them today are dynamically linked) but IMHO this is out of the scope to this question, as the program has already started and is running.

How does the global variable declaration solve the stack overflow in C?

I have some C code.
What it does is simple, get some array from io, then sort it.
#include <stdio.h>
#include <stdlib.h>
#define ARRAY_MAX 2000000
int main(void) {
int my_array[ARRAY_MAX];
int w[ARRAY_MAX];
int count = 0;
while (count < ARRAY_MAX && 1 == scanf("%d", &my_array[count])) {
count++;
}
merge_sort(my_array, w, count);
return EXIT_SUCCESS;
}
And it works well, but if I really give it a group of number which is 2000000, it cause a stack overflow. Yes, it used up all the stack. One of the solution is to use malloc() to allocate a memory space for these 2 variables, to move them to the heap, so no problem at all.
The other solution is to move the below 2 declaration to the global scope, to make them global variables.
int my_array[ARRAY_MAX];
int w[ARRAY_MAX];
My tutor told me that this solution does the same job: to move these 2 variables into the heap.
But I checked some documents online. Global variables, without initialisation, they will reside in the bss segment, right?
I checked online, the size of this section is just few bytes.
How could it prevent the stack overflow?
Or, because these 2 types are array, so they are pointers, and global pointers reside in data segment, and it indicates the size of data segment can be dynamically changed as well?
The bss (block started by symbol) section is tiny in the object file (4 or 8 bytes) but the value stored is the number of bytes of zeroed memory to allocate after the initialized data.
It avoids the stack overflow by allocating the storage 'not on the stack'. It is normally in the data segment, after the text segment and before the start of the heap segment — but that simple memory picture can be more complicated these days.
Officially, there should be caveats about 'the standard doesn't say that there must be a stack' and various other minor bits'n'pieces, but that doesn't alter the substance of the answer. The bss section is small because it is a single number — but the number can represent an awful lot of memory.
Disclaimer: This is not a guide, it is an overview. It is based on how Linux does things, though I may have gotten some details wrong. Most (desktop) operating systems use a very similar model, with different details. Additionally, this only applies to userspace programs. Which is what you're writing unless you're developing for the kernel or working on modules (linux), drivers (windows), kernel extensions (osx).
Virtual Memory: I'll go into more detail below, but the gist is that each process gets an exclusive 32-/64-bit address space. And obviously a process' entire address space does not always map to real memory. This means A) one process' addresses mean nothing to another process and B) the OS decides which parts of a process' address space are loaded into real memory and which parts can stay on disk, at any given point in time.
Executable File Format
Executable files have a number of different sections. The ones we care about here are .text, .data, .bss, and .rodata. The .text section is your code. The .data and .bss sections are global variables. The .rodata section is constant-value 'variables' (aka consts). Consts are things like error strings and other messages, or perhaps magic numbers. Values that your program needs to refer to but never change. The .data section stores global variables that have an initial value. This includes variables defined as <type> <varname> = <value>;. E.g. a data structure containing state variables, with initial values, that your program uses to keep track of itself. The .bss section records global variables that do not have an initial value, or that have an initial value of zero. This includes variables defined as <type> <varname>; and <type> <varname> = 0;. Since the compiler and the OS both know that variables in the .bss section should be initialized to zero, there's no reason to actually store all of those zeros. So the executable file only stores variable metadata, including the amount of memory that should be allocated for the variable.
Process Memory Layout
When the OS loads your executable, it creates six memory segments. The bss, data, and text segments are all located together. The data and text segments are loaded (not really, see virtual memory) from the file. The bss section is allocated to the size of all of your uninitialized/zero-initialized variables (see VM). The memory mapping segment is similar to the data and text segments in that it consists of blocks of memory that are loaded (see VM) from files. This is where dynamic libraries are loaded.
The bss, data, and text segments are fixed-size. The memory mapping segment is effectively fixed-size, but it will grow when your program loads a new dynamic library or uses another memory mapping function. However, this does not happen often and the size increase is always the size of the library or file (or shared memory) being mapped.
The Stack
The stack is a bit more complicated. A zone of memory, the size of which is determined by the program, is reserved for the stack. The top of the stack (low memory address) is initialized with the main function's variables. During execution, more variables may be added to or removed from the bottom of the stack. Pushing data onto the stack 'grows' it down (higher memory address), increasing stack pointer (which maintains the address of the bottom of the stack). Popping data off the stack shrinks it up, reducing the stack pointer. When a function is called, the address of the next instruction in the calling function (the return address, within the text segment) is pushed onto the stack. When a function returns, it restores the stack to the state it was in before the function was called (everything it pushed onto the stack is popped off) and jumps to the return address.
If the stack grows too large, the result is dependent on many factors. Sometimes you get a stack overflow. Sometimes the run-time (in your case, the C runtime) tries to allocate more memory for the stack. This topic is beyond the scope of this answer.
The Heap
The heap is used for dynamic memory allocation. Memory allocated with one of the alloc functions lives on the heap. All other memory allocations are not on the heap. The heap starts as a large block of unused memory. When you allocate memory on the heap, the OS tries to find space within the heap for your allocation. I'm not going to go over how the actual allocation process works.
Virtual Memory
The OS makes your process think that it has the entire 32-/64-bit memory space to play in. Obviously, this is impossible; often this would mean your process had access to more memory than your computer physically has; on a 32-bit processor with 4GB of memory, this would mean your process had access to every bit of memory, with no room left for other processes.
The addresses that your process uses are fake. They do not map to actual memory. Additionally, most of the memory in your process' address space is inaccessible, because it refers to nothing (on a 32-bit processor it may not be most). The ranges of usable/valid addresses are partitioned into pages. The kernel maintains a page table for each process.
When your executable is loaded and when your process loads a file, in reality, it is mapped to one or more pages. The OS does not necessarily actually load that file into memory. What it does is create enough entries in the page table to cover the entire file while notating that those pages are backed by a file. Entries in the page table have two flags and an address. The first flag (valid/invalid) indicates whether or not the page is loaded in real memory. If the page is not loaded, the other flag and the address are meaningless. If the page is loaded, the second flag indicates whether or not the page's real memory has been modified since it was loaded and the address maps the page to real memory.
The stack, heap, and bss work similarly, except they are not backed by a 'real' file. If the OS decides that one of your process' pages isn't being used, it will unload that page. Before it unloads the page, if the modified flag is set in the page table for that page, it will save the page to disk somewhere. This means that if a page in the stack or heap is unloaded, a 'file' will be created that now maps to that page.
When your process tries to access a (virtual) memory address, the kernel/memory management hardware uses the page table to translate that virtual address to a real memory address. If the valid/invalid flag is invalid, a page fault is triggered. The kernel pauses your process, locates or makes a free page, loads part of the mapped file (or fake file for the stack or heap) into that page, sets the valid/invalid flag to valid, updates the address, then reruns the original instruction that triggered the page fault.
AFAIK, the bss section is a special page or pages. When a page in this section is first accessed (and triggers a page fault), the page is zeroed before the kernel returns control to your process. This means that the kernel doesn't pre-zero the entire bss section when your process is loaded.
Further Reading
Anatomy of a Program in Memory
How the Kernel Manages Your Memory
Global variables are not allocated on the stack. They are allocated in the data segment (if initialised) or the bss (if they are uninitialised).

Questions about loading executables into memory

So I'm working with the linux 0.11 kernel on a virtual machine, and I need to write a program that analyses executable files that are ran on that kernel. The files are in the format a.out. What I want to know is, how does the operating system decide where to load the file in (virtual?) memory? Is it decided by something called "base address", and if so, how come I can't seem to find any mention of it in the a.out header?
//where is base address?
struct exec {
unsigned long a_magic; /* Use macros N_MAGIC, etc for access */
unsigned a_text; /* length of text, in bytes */
unsigned a_data; /* length of data, in bytes */
unsigned a_bss; /* length of uninitialized data area for file, in bytes */
unsigned a_syms; /* length of symbol table data in file, in bytes */
unsigned a_entry; /* start address */
unsigned a_trsize; /* length of relocation info for text, in bytes */
unsigned a_drsize; /* length of relocation info for data, in bytes */
};
I tried looking for documentations about the format, but the only information I found just explains what each of these fields are, what values a_magic can have, etc.
I need to know about it because the program needs to print out file and line numbers when given an address in memory of an instruction in the executable, and the debug symbols only have their addresses as offsets (e.g. relative to the start of text section, etc).
Also, out of curiosity, I know that in C, "(void*)0" is NULL, which you can't dereference. How then would you get the content of memory address 0?
As you see, I know very little about linux kernel and operating systems in general, so please start from the basics...
I appreciate any help you can give, thanks.
The base address is the a_entry field.
Also, out of curiosity, I know that in C, "(void*)0" is NULL, which you can't dereference. How then would you get the content of memory address 0?
Any system that puts memory usable by a C program at address zero would have to make it work, somehow. While one can imagine possible ways to do this, I don't know of anyone who bothers. Virtual address zero is, for all intents and purposes, never used.
The operating system can load the application at any location it chooses and then relocate the embedded addresses to be relative to that point. This relocation information is recorded in the a.out file. The base address depends on the architecture and other details and is often non-zero.
If you look at a linker map file, you should see a symbol that is either at the beginning of the memory image, or at a fixed offset from it. At runtime, subtract this value from the actual addresses you note for debugging to get to the relative address of the position you are interested in.
Note, the symbols will not be present in the executable if your linker script strips them.
Also, out of curiosity, I know that in C, "(void*)0" is NULL, which you can't dereference. How then would you get the content of memory address 0?
Actually you can dereference NULL, but the results are not defined. For convenience, most operating systems trap the access to help you debug pointer problems.
Also, memory location with address 0 in a process space is different from the memory location with address 0 in the 'hardware space'. The 'pagination' support in the CPU and operating system are 'decoupling' the physical memory from the virtual memory. It could happen that a virtual page be mapped at address 0, although there you usually have interrupt vectors and other special device memory and not real RAM anyway.

How are the different segments like heap, stack, text related to the physical memory?

When a C program is compiled and the object file(ELF) is created. the object file contains different sections such as bss, data, text and other segments. I understood that these sections of the ELF are part of virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual memory and page table associated with the compiled program. Page table associates the virtual memory address present in ELF to the real physical memory address when loading the program. Is my understanding correct?
I read that in the created ELF file, bss sections just keeps the reference of the uninitialised global variables. Here uninitialised global variable means, the variables that are not intialised during declaration?
Also, I read that the local variables will be allocated space at run time (i.e., in stack). Then how they will be referenced in the object file?
If in the program, there is particular section of code available to allocate memory dynamically. How these variables will be referenced in object file?
I am confused that these different segments of object file (like text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all the programs are executed.
But I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?
1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size, since its supposed to be all zeros, there is no need to actually have the zeros in the ELF file). Note that locations can be absolute locations (like virtual address 0x100000 or relative locations like 4096 bytes after the end of text.)
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to setup a virtual address space for an instance of a program. The different sections describe different needs. For example ".rodata" says I'd like to store read-only data (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by offset from the top of the stack. When executing a function, the top of the stack is increased to allocate space for local variables. Compiler determines the order of local variables in the stack so the compiler nows what is the offset of the variables from the top of the stack.
Stack in physical memory is positioned upside down. Beginning of stack usually has highest memory address available. As programs runs and allocates space for local variables the address of the top of the stack decrements (and can potentially lead to stack overflow - overlapping with segments on lower addresses :-) )
5: Using pointers - Address of dynamically allocated variable is stored in (local) variable. This corresponds to using pointers in C.
I have found nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU with the operating system performs the translation from the virtual addresses to the RAM physical addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource. And C code is just a light weight script to drive the OS and to run only simple operation with registers.
Virtual memory and Physical memory is about CPU's TLB letting the user space process to use contiguous memory virtually through the power of TLB (using page table) hardware.
So the actual physical memory, mapped to the contiguous virtual memory can be scattered to anywhere on the RAM.
Compiled program doesn't know about this TLB stuff and physical memory address stuff. They are managed in the OS kernel space.
BSS is a section which OS prepares as zero filled memory addresses, because they were not initialized in the c/c++ source code, thus marked as bss by the compiler/linker.
Stack is something prepared only a small amount of memory at first by the OS, and every time function call has been made, address will be pushed down, so that there is more space to place the local variables, and pop when you want to return from the function.
New physical memory will be allocated to the virtual address when the first small amount of memory is full and reached to the bottom, and page fault exception would occur, and the OS kernel will prepare a new physical memory and the user process can continue working.
No magic. In object code, every operation done to the pointer returned from malloc is handled as offsets to the register value returned from malloc function call.
Actually malloc is doing quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocations, but actually they are all getting new memory region from the OS using sbrk or mmap(/dev/zero), which is called anonymous memory.
Just do a man on the command readelf to find out the starting addresses of the different segments of your program.
Regarding the first question you are absolutely right. Since most of today's systems use run-time binding it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question it is at the run-time due to runtime binding. The third question is true. All uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are initialized to 0.
4.
If you look at a assembler code generated by gcc you can see that memory local variables is allocated in stack through command push or through changing value of the register ESP. Then they are initiated with command mov or something like that.

Resources