Linux memory map regions to store procedures - c

I have a question asking me to explain in what regions of a linux memory map a procedure is stored. The question instructs me to use objdump -h to find this information.
Now, I am a little bit confused what "regions in memory" means.
I know that for a given procedure we have certain registers that we work with (say %eax, %edx...) and also for each variable we have a memory location it is stored in (say 8(%ebp)). In addition I know that we have the %esp and %ebp registers to "take care" of the stack.
I also ran objdump -h on my file, but I cannot tell anything specific from the output.
So should I just mention the registers being used and the memory addresses where the variables of this procedure are being stored?

I believe your question is asking where the linker has designated your actual code to reside in memory when it's loaded by the operating system. While the code runs, the program counter register (%EIP on x86) holds the address of the current instruction within that area.
Typically on Linux, program code as well as read-only variables are stored in the lower regions of mapped memory for the process, with the stack in the upper regions (i.e., the stack grows down).

You could easily do an internet search for linux memory map; after all, it is your homework, and you would learn how to problem-solve and do research.
Each program has certain segments, here are a few:
bss - uninitialized data
data - initialized data (strings, arrays etc...)
text - code "procedures"
Sections are located relative to the start address of the program, at positive or negative offsets.
Here is a good page:
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory

Related

Beginner's confusion about x86 stack

First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.
I've been told that conceptually, the stack is like a Coke bottle. The sugar is at the bottom and you fill it up to the top. With this in mind, how does the Call tell the EIP register to "target" the called function if the EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "Code Segment of RAM" (the place where functions are kept) is the place where the EIP register is.
Typically, a computer program uses four kinds of memory areas (also called sections or segments):
The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. This would better be called "code" section, but the name has historical reasons.
The data section: This contains variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and popping from the stack. This is also called "LIFO": last in first out. This is where local variables of a function reside. If a function completes, the data is removed from the stack and is lost (basically).
The heap: This is also a dynamic memory region. There are special functions in the programming language which "allocate" (reserve) a piece of this area on request of the program. Another function is available to return this area to the heap if it is not required anymore. As the data is released explicitly, it can be used to store data which lives longer than just a function call (different from the stack).
The data for the text and data sections is stored in the program file (on Linux they can be found using objdump, for example; add a . to the names). The stack and heap are not stored anywhere in the file, as they are allocated dynamically (on demand) by the program itself.
Normally, after the program has been loaded, the remaining memory area is treated as a single large block where both stack and heap are located. They start from opposite ends of that area and grow towards each other. For most architectures the heap grows from low to high memory addresses (ascending) and the stack downwards (descending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt the heap (change foreign data) or vice versa. This may result in any kind of error, depending on how and what data has changed. If the stack gets corrupted, this may result in the program going wild (this is actually one way a trojan might work). Modern operating systems, however, should take measures to detect this situation before it becomes critical.
This is not only true for x86, but also for most other CPU families and operating systems, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS-X, iOS, Android (which uses Linux OS), DOS. For microcontrollers, there is often no heap (all memory is allocated statically, before run-time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.
Disclaimer: This is very simplified, so please no comments like "how about bss, const, myspecialarea";-) . There is also no requirement in the C standard for these areas, specifically to use a heap or a stack. Indeed there are implementations which don't use either. Those are most often embedded systems with small (8 or 16 bit) MCUs or DSPs. Also, modern architectures use CPU registers instead of the stack to pass parameters and keep local variables. Those are defined in the Application Binary Interface of the target platform.
For the stack, you might read the wikipedia article. Note the difference in implementation between the data structure "stack" and the "hardware stack" as implemented in a typical (micro)processor.

Access process memory directly

simple question:
Is it possible, and how is it possible, to access the Virtual Memory of my program directly?
To be specific,
instead of typing
int someValue = 5;
can I do something like this:
VirtualMemory[0x0] = (int)5;
I'm just asking because I want the values to be stored next to each other to get a nice and small memory map.
When I look into assembler basics, the processor stores values directly after each other and I was wondering how to do so in C.
Thanks for all of your replies.
Cheers,
Lucky
Not exactly, because in the source code you don't know which memory address your program is going to be "loaded into". So all memory addresses in the program are encoded in an "offset from the start of program" type manner.
Part of the "process loader"'s responsibility in copying the program into memory is to add the "base offset pointer" to all the other offsets, so all the "names" describing memory addresses refer to actual memory addresses instead of "offsets from the beginning of the program".
That's generally a good thing, as if they were encoded directly, two programs that needed the same set of addresses couldn't be run at the same time without corrupting each other's shared memory. In addition, loading a program into a different starting address would not be possible, as walking outside of the memory of your program (nearly guaranteed if you relocate the program without rewriting the memory address references) is going to raise a segfault in the operating system's memory management monitors.
Also you need a name to start at, and this means that the offsets are bound to the variable names. Generally it is much easier to do fishing around in the heap based off of an alloc'd item than it is to truly find the start of the program loaded in memory (because the C programming language doesn't really capture that address into an in-language variable name, and the layout is somewhat system dependent).

Big empty space in memory?

I'm very new to embedded programming (started yesterday, actually) and I've noticed something I think is strange. I have a very simple program doing nothing but return 0.
int main() {
return 0;
}
When I run this in IAR Embedded Workbench I have a memory view showing me the program's memory. I've noticed that there is some memory in use, then a big block of empty space, and then memory in use again (I suck at explaining :P so here is an image of the memory)
Please help me understand this a little more than I do now. I don't really know what to search for because I'm so new to this.
The first two lines are the 8 interrupt vectors, expressed as 32-bit instructions with the highest byte last. That is, read them in groups of 4 bytes, with the highest byte last, and then convert to an instruction via the usual method. The first few vectors, including the reset at memory location 0, turn out to be LDR instructions, which load an immediate address into the PC register. This causes the processor to jump to that address. (The reset vector is also the first instruction to run when the device is switched on.)
You can see the structure of an LDR instruction here, or at many other places via an internet search. If we write the reset vector 18 f0 95 e5 as e5 95 f0 18, then we see that the PC register is loaded with the address located at an offset of 0x20.
So the next two lines are memory locations referred to by instructions in the first two lines. The reset vector sends the PC to 0x00000080, which is where the C runtime of your program starts. (The other vectors send the PC to 0x00000170 near the end of your program. What this instruction is is left to the reader.)
Typically, the C runtime is code added to the front of your program that loads the global variables into RAM from flash, and sets the uninitialized RAM to 0. Your program starts after that.
Your original question was: why have such a big gap of unused flash? The answer is that flash memory is not really at a premium, so we can waste a little, and that having extra space there allows for forward-compatibility. If we need to increase the vector table size, then we don't need to move the code around. In fact, this interrupt model has been changed in the new ARM Cortex processors anyway.
Physical (not virtual) memory addresses map to physical circuits. The lowest addresses often map to registers, not RAM arrays. In the interest of consistency, a given address usually maps to the same functionality on different processors of the same family, and missing functionality appears as a small hole in the address mapping.
Furthermore, RAM is assigned to a contiguous address range, after all the I/O registers and housekeeping functions. This produces a big hole between all the registers and the RAM.
Alternately, as @Martin suggests, it may represent uninitialized and read-only Flash memory as -- bytes. Unlike truly unassigned addresses, access to this is unlikely to produce an exception, and you might even be able to make them "reappear" using appropriate Flash controller commands.
On a modern desktop-class machine, virtual memory hides all this from you, and even parts of the physical address map may be configurable. Many embedded-class processors allow configuration to the extent of specifying the location of the interrupt vector table.
UncleO is right but here is some additional information.
The project's linker command file (*.icf for IAR EW) determines where sections are located in memory. (Look under Project->Options->Linker->Config to identify your linker configuration file.) If you view the linker command file with a text editor you may be able to identify where it locates a section named .intvec (or similar) at address 0x00000000. And then it may locate another section (maybe .text) at address 0x00000080.
You can also see these memory sections identified in the .map file, along with their locations. (Ensure "Generate linker map file" is checked under Project->Options->Linker->List.) The map file is an output from the build, however, and it's the linker command file that determines the locations.
So that space in memory is there because the linker command file instructed it to be that way. I'm not sure whether that space is necessary but it's certainly not a problem. You might be able to experiment with the linker command file and move that second section around. But the exception table (a.k.a. interrupt vector table) must be located at 0x00000000. And you'll want to ensure that the reset vector points to the new location of the startup code if you move it.

How are the different segments like heap, stack, text related to the physical memory?

When a C program is compiled and the object file (ELF) is created, the object file contains different sections such as bss, data, text and other segments. I understood that these sections of the ELF are part of virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual memory and page table associated with the compiled program. Page table associates the virtual memory address present in ELF to the real physical memory address when loading the program. Is my understanding correct?
I read that in the created ELF file, the bss section just keeps a reference to the uninitialised global variables. Here, does "uninitialised global variable" mean a variable that is not initialised at its declaration?
Also, I read that local variables are allocated space at run time (i.e., on the stack). Then how are they referenced in the object file?
If the program has a section of code that allocates memory dynamically, how are those variables referenced in the object file?
I am confused about whether these different segments of the object file (like text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all the programs are executed.
But I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?
1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size; since it's supposed to be all zeros, there is no need to actually have the zeros in the ELF file.) Note that locations can be absolute (like virtual address 0x100000) or relative (like 4096 bytes after the end of text).
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to set up a virtual address space for an instance of a program. The different sections describe different needs. For example ".rodata" says I'd like to store read-only data (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by their offset from the top of the stack. When a function executes, the top of the stack is moved to allocate space for its local variables. The compiler determines the order of the local variables on the stack, so it knows each variable's offset from the top of the stack.
The stack is positioned upside down in memory. The beginning of the stack usually has the highest memory address available. As the program runs and allocates space for local variables, the address of the top of the stack decreases (and can potentially lead to stack overflow - overlapping with segments at lower addresses :-) )
5: Using pointers - Address of dynamically allocated variable is stored in (local) variable. This corresponds to using pointers in C.
I have found nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU with the operating system performs the translation from the virtual addresses to the RAM physical addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource. And C code is just a light weight script to drive the OS and to run only simple operation with registers.
Virtual memory and physical memory are about the CPU's MMU (with its TLB, driven by the page tables) letting a user-space process use memory that is contiguous virtually.
So the actual physical memory that is mapped to the contiguous virtual memory can be scattered anywhere in RAM.
A compiled program doesn't know about this TLB and physical address stuff. It is managed in the OS kernel space.
BSS is a section which the OS prepares as zero-filled memory, because its variables were not initialized in the c/c++ source code and were therefore marked as bss by the compiler/linker.
The stack is given only a small amount of memory by the OS at first. Every time a function call is made, the stack pointer is pushed down so that there is more space for local variables, and it is popped back when the function returns.
When that first small amount of memory is used up and the stack reaches its bottom, a page fault exception occurs; the OS kernel then maps new physical memory to the virtual address and the user process continues working.
No magic. In object code, every operation done to the pointer returned from malloc is handled as offsets to the register value returned from malloc function call.
Actually malloc is doing quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocations, but in the end they all get new memory regions from the OS using sbrk or mmap(/dev/zero), which is called anonymous memory.
Just do a man on the command readelf to find out the starting addresses of the different segments of your program.
Regarding the first question you are absolutely right. Since most of today's systems use run-time binding it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question it is at the run-time due to runtime binding. The third question is true. All uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are initialized to 0.
4. If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack through the push instruction or by changing the value of the ESP register. They are then initialized with a mov instruction or something similar.

C : Memory layout of C program execution

I wanted to know how the kernel provides memory for a simple C program.
For example :
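#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
int my_global = 10;
int main(void)
{
    char *str;
    static int val;
    str = malloc(100);
    if (str == NULL)
        return 1;
    /* the width limit keeps the input inside the 100-byte buffer */
    if (scanf("%99s", str) == 1)
        printf(" val:%s\n", str);
    free(str);
    return 0;
}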
See, in this program I have used static, global and malloc for allocating dynamic memory.
So, what will the memory layout be?
Can anyone give me a URL with detailed information about this process?
Very basically, in C programs built to target ELF (Executable and Linkable Format) such as those built on Linux there is a standard memory layout that is created. Similar layouts probably exist for other architectures, but I don't know enough to tell you more about them.
The Layout:
There are some global data sections that are initialized at low memory addresses in memory (such as sections for the currently executing code, global data, and any strings that are created with "..." inside your C code).
Above that (at higher addresses) there is a heap of open memory that can be used. The heap grows automatically as calls to malloc move what is called the "program break" to higher addresses in memory; free may later move it back.
Starting at a high address in memory, the stack grows towards lower addresses. The stack contains memory for any locally allocated variables, such as those at the top of functions or within a scope ({ ... }).
More Info:
There is a good description of a running ELF program here and more details on the format itself on the Wikipedia article. If you want an example of how a compiler goes about translating C code into assembly you might look at GCC, their Internals Manual has some interesting stuff in it; the most relevant sections are probably those in chapter 17, especially 17.10, 17.19 and 17.21. Finally, Intel has a lot of information about memory layout in its IA-32 Architectures Software Developer’s Manual. It describes how Intel processors handle memory segmentation and the creation of stacks and the like. There's no detail about ELF, but it's possible to see where the two match up. The most useful bits are probably section 3.3 of Volume 1: Basic Architecture, and chapter 3 of Volume 3A: System Programming Guide, Part 1.
I hope this helps anyone diving into the internals of running C programs, good luck.
There's a brief discussion at wikipedia.
A slightly longer introduction is here.
More details available here, but I'm not sure it's presented very well.
All static and global variables are stored in the data segment, all automatic and temporary variables are stored on the stack, and all dynamic variables are stored on the heap.
All function parameters are stored on the stack, and there is a different stack frame for each function call; this is how recursive functions work.
For more on this, see this site.
In practical words, when you run any C-program, its executable image is loaded into RAM of computer in an organized manner which is called process address space or Memory layout of C program.
http://www.firmcodes.com/memory-layout-c-program-2/
All static and global uninitialized variables go into bss (Block Started by Symbol).
Initialized global and static variables (including static locals) are further divided into
read-only, e.g.
const int x=10;
and read/write, e.g.
char Str[]="StackOverFlow";
The stack segment is the area where local variables are stored. Local variables here means all variables declared inside any function of your C program, including main().
The text segment contains the executable instructions of your C program; it's also called the code segment. This is the machine language representation of the program steps to be carried out, including all functions making up the program, both user defined and system. The text segment is sharable so that only a single copy needs to be in memory for different executing programs, such as text editors, shells, and so on. Usually, the text segment is read-only, to prevent a program from accidentally modifying its instructions.
One more region in the memory layout of a program is the unmapped or reserved segment, which contains command line arguments and other program-related data such as the lower and higher addresses of the executable image.

Resources