Tool to clearly visualize Memory Layout of a C Program

Suppose I have this code:
#include <stdlib.h>

void do_something(void);

int main(void) {
    int var1;
    char *ptr = malloc(5 * sizeof(char));
    //...........
    do_something();
    //...........
    return 0;
}
We know that the actual memory layout is divided into segments like .text, .data and .bss, plus the run-time heap and stack.
I know how to use objdump, readelf, etc. But I want to get a better view of the memory layout, where I can see things like:
.heap ptr
.stack do_something()
.text main()
.bss var1
The main point is: The actual variable names are missing from the output of objdump, readelf etc.
I am compiling this code with -g, thus retaining the symbol table.
Then, why am I not able to see the memory layout with local/global variable names included?
objdump -x shows the names of variables if they have static storage duration, but not otherwise. Why?

There are a few methods to track memory allocation, but none of them is built in and all of them require some additional work on your side. In order to visualise memory you will have to use code instrumentation and/or event logging, i.e. record memory allocation and deallocation events, then replay all the events and generate graphs from them.
Take a look at this paper: Visualizing Dynamic Memory Allocations (in C programs).
GCSpy (for heap visualisation) is available here: https://www.cs.kent.ac.uk/projects/gc/gcspy/. While it was initially used for the JVM, you can visualise the heap of a C program too, for instance by using dlmalloc.
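For a rough idea of the event-logging approach, here is a minimal sketch; the wrapper names (log_malloc/log_free) and the log format are made up for illustration and are not part of any of the tools above:

#include <stdio.h>
#include <stdlib.h>

static FILE *log_file;

/* Wrap allocation calls and record each event so the sequence can be
 * replayed and turned into graphs later. */
void *log_malloc(size_t size, const char *tag)
{
    void *p = malloc(size);
    if (log_file)
        fprintf(log_file, "alloc %s %p %zu\n", tag, p, size);
    return p;
}

void log_free(void *p, const char *tag)
{
    if (log_file)
        fprintf(log_file, "free %s %p\n", tag, p);
    free(p);
}

int main(void)
{
    log_file = fopen("mem_events.log", "w");
    char *ptr = log_malloc(5, "ptr");   /* logged allocation */
    log_free(ptr, "ptr");               /* logged deallocation */
    if (log_file)
        fclose(log_file);
    return 0;
}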
I completely understand why you would like to do that - I was looking for the same thing. While I don't find memory layout snapshotting very useful per se, I do find observing how memory is allocated over time very interesting and useful for debugging performance issues.
I remember that Xcode has some instrumentation tools built in. I've never used them, but it is perhaps worth exploring what they offer.

Sorry to say you're a bit confused about this. Consider:
all your functions go in the .text section
all your non-static local variables go on the stack: the fact that they may be pointers and you intend to assign them a value returned from malloc doesn't put them on the heap, it just attempts to create a pointed-to object on the heap. No static tool looking at the binary (such as objdump or readelf) can know whether malloc will return memory or fail.
your global and static variables are likely to end up in an initialised or uninitialised data segment - which depends on whether the initial bit pattern is entirely 0s, and whether the compiler can convince itself of that at compile time.
Further, if you understand the above, then you don't need anything to draw you a little chart on a variable-by-variable basis; you just know instantly what type of memory you're using.
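To make that concrete, here is a small annotated example. Where things actually end up can vary with the compiler, options and platform, so the comments describe the typical case rather than a guarantee:

#include <stdlib.h>

int global_zero;               /* zero-initialized: typically .bss */
int global_init = 42;          /* non-zero initializer: typically .data */
static int file_static = 7;    /* static storage: .data (.bss if zero) */
const char *msg = "hello";     /* the pointer lives in a data section,
                                  the string literal in .rodata */

int main(void)                 /* the machine code for main: .text */
{
    int local = 0;             /* automatic: on the stack (or in a register) */
    char *ptr = malloc(5);     /* ptr itself is on the stack; the 5 bytes it
                                  points to (if malloc succeeds) are on the heap */
    (void)local;
    (void)file_static;
    free(ptr);
    return 0;
}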

Related

stack memory management in embedded systems

In a course I am taking about embedded systems there are certain statements which lack a deep explanation, which has left me confused at some points. I would be grateful if someone could offer me some clarification.
I have been told that, if there are initialized variables, their initialization values are stored in the code segment (maybe in flash) and are loaded (maybe to RAM) by startup routines before running the program. This makes sense to me considering global variables, as they are allocated to the .data section. I presume that global variables have a fixed address for the entire program and the initialization value is loaded to a specific address location (please correct me if I am wrong). Now, how is this done for local variables, considering that they don't have a fixed address location on the stack? Considering that local variables come into existence only during function execution, how do they get initialized each time the function is invoked?
Also, the instructor says, "The stack is reserved at compile time and the data is allocated at runtime by pre-compiled instructions". Can someone please help me understand the latter half of this statement?
Your understanding of static variables and the .data section is correct. You may also want to consider zero-initialized static variables in the .bss section. These are initialized at the same time as those in the .data section, but their initial values do not need to be stored because they are zero.
Automatic variables may be on the stack or may be optimized to live only in processor registers. Either way, code is generated by the compiler to initialize them each time the function using them is called. If they are on the stack then this will include an instruction to adjust the stack pointer to "allocate" space for them when they are needed and "free" it when they go out of scope.
The space for the entire stack is usually allocated in the linker script. In an embedded microcontroller system no instructions are necessary to "allocate" it. Depending on the hardware there may be code required to enable access to external memory, but in most cases there is a bank of fast SRAM ready to use as soon as the system powers on, and the first stack will be in this.
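For illustration, the startup code that loads .data and clears .bss typically looks something like the sketch below. The boundary symbols come from the linker script and differ between toolchains; the names used here are only placeholders:

/* Copy the initial values of .data from flash to RAM and zero-fill .bss,
 * before main() is entered. */
extern unsigned char _data_load_start;  /* initial values of .data, in flash */
extern unsigned char _data_start;       /* start of .data in RAM */
extern unsigned char _data_end;         /* end of .data in RAM */
extern unsigned char _bss_start;        /* start of .bss in RAM */
extern unsigned char _bss_end;          /* end of .bss in RAM */

void startup_init_memory(void)
{
    unsigned char *src = &_data_load_start;
    unsigned char *dst = &_data_start;

    while (dst < &_data_end)            /* load .data initializers into RAM */
        *dst++ = *src++;

    for (dst = &_bss_start; dst < &_bss_end; dst++)
        *dst = 0;                       /* .bss starts out as all zeros */
}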

When are shared library functions loaded into the heap?

(This question concerns only the logical addresses)
I was experimenting with some code where I print out the addresses of different types/scopes of variables to better visualize the process image.
My confusion arose when I printed the addresses of a few variables that had been allocated on the heap by malloc, and then also printed the address of the printf function out of curiosity.
What I discovered is that printf was being stored at a much higher address (i.e. closer to the stack) than my malloc-allocated variables. This doesn't make sense to me because I assumed that the library functions would be loaded onto the heap first thing at runtime, before any other instructions are executed. I even put a printf statement before any malloc statements, in case library functions were loaded on the fly as they were needed, but it didn't change anything.
Thanks.
(This answer concerns Unix only. I don't know how it is on Windows.)
Most shared libraries are loaded into RAM before control reaches main, and the library containing printf definitely will be. The functions in dlfcn.h can be used to load more shared libraries during execution of the program; that's the most important exception.
Shared libraries have never been loaded as part of the "heap", if by that you mean the region of memory used to satisfy malloc requests. They are loaded using the system primitive mmap, and could be placed anywhere at all in memory. As user3386109 pointed out in a comment on the question, on modern systems, their locations are intentionally randomized as a countermeasure for various exploits.
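If you want to see the dlfcn.h mechanism in action, a minimal sketch looks like the following (the library name libm.so.6 is just an example and varies by system; link with -ldl on older glibc):

#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    /* Loaded via mmap somewhere in the address space, not on the malloc heap. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0.0) = %f, function address %p\n", cosine(0.0), (void *)cosine);

    dlclose(handle);
    return 0;
}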

Are activation records created on stack or heap in C?

I am reading about memory allocation and activation records. I have some doubts. Can anyone make the following crystal clear?
A). My first doubt is: "Are activation records created on the stack or the heap in C?"
B). These are a few lines from an abstract I am referring to:
Even though memory on stack area is created during run time- the
amount of memory (activation record size) is determined at compile
time. Static and global memory area is compile time determined and
this is part of the binary. At run time, we cannot change this. Only
memory area freely available for the process to change during runtime
is heap.At compile time compiler only reserves the stack space for
activation record. This gets used (allocated on actual memory) only
during program run. Only DATA segment part of the program like static
variables, string literals etc. are allocated during compile time. For
heap area, how much memory to be allocated is also determined at run
time.
Can anyone please elaborate on these lines, as I am unable to understand anything?
I am sure the explanation would be of great help to me.
As a quick answer, I don't even really know what an activation record is. The rest of the quote has very poor English and is quite misleading.
Honestly, the abstract is talking about absolutes when in reality these are not absolutes at all. You do define a main stack at compile time, yes (though you can create many stacks at runtime as well).
Yes, when you want to allocate memory, one usually creates a pointer to store that information, but where you place that pointer is completely up to you. It can be on the stack, it can be in global memory, it can be in the heap from another allocation, or you can just leak the memory and not store it anywhere at all if you wish. Perhaps this is what is meant by an activation record?
Or perhaps, it means that when dynamic memory is created, somewhere in memory, there has to be some sort of information that keeps track of used and unused memory. For many allocators, this is a list of pointers stored somewhere in the allocated memory, though others store it in a different piece of memory and some could even place that on the stack. It all depends on the needs of the memory system.
Finally, where dynamic memory is allocated from can vary as well. It can come from a call to the OS, though in some cases, it can also just be overlayed onto existing global (or even stack) memory - which is not uncommon in embedded programming.
As you can see, this abstract is not even close to what dynamic memory represents.
Additional info:
Many are jumping all over me stating that 'C' has no stack in the standard. Correct. That said, how many people have truly coded in C without one? I'll leave that alone for now.
Defined memory, as you call it, is anything declared with the 'static' keyword within a function or any variable declared outside of a function without the 'extern' keyword in front of it. This is memory that the compiler knows about and can reserve space for without any additional help.
Allocated memory is not a good term, as defined memory can also be considered allocated. Instead, use the term dynamic memory. This is memory that you allocate from a heap at run-time. An example:
#include <stdlib.h>

char *foo;
int my_value;

int main(void)
{
    foo = malloc(10 * sizeof(char));
    // Do stuff with foo
    free(foo);
    return 0;
}
foo is "defined" as you say as a pointer. If nothing else were done, it would only reserve that much memory, but when the malloc is reached in main(), it now points to at least 10 bytes of dynamic memory as well. Once the free is reached, that memory is now made available to the program for other uses. It's allocated size is 'dynamic'. Compare that to my_value which will always be the size of an int and nothing else.
In C (given how it is almost universally implemented), an activation record is exactly the same thing as a stack frame, which is the same thing as a call frame. They are always created on the stack.
The stack segment is a memory area the process gets "for free" from the OS when it is created. It does not need to malloc or free it. On x86-64, a machine register (RSP) points to the current end of the stack, and stack frames/activation records/call frames are "allocated" by decrementing the pointer in that register by the number of bytes to allocate. For example:
int my_func() {
    int x = 123;
    int y = 234;
    int z = 345;
    ...
    return 1;
}
An unoptimizing C compiler could generate assembly code for keeping those three variables in the stack frame like this:
my_func:
    ; "allocate" 24 bytes of stack space
    sub rsp, 24
    ; Initialize the allocated stack memory
    mov dword [rsp], 345      ; z = 345
    mov dword [rsp+8], 234    ; y = 234
    mov dword [rsp+16], 123   ; x = 123
    ...
    ; "free" the allocated stack space
    add rsp, 24
    ; return 1
    mov rax, 1
    ret
In other contexts and languages activation records can be implemented differently. For example using linked lists. But as the language is C and the context is low-level programming I don't think it is useful to discuss that.
In theory, a C99 (or C11) compatible implementation (e.g. a C compiler and C standard library implementation) does not even need (in all cases) a call stack. For example, one could imagine a whole-program compiler (notably for a freestanding C implementation) which would analyze the entire program and decide that stack frames are unneeded (e.g. each local variable could be allocated statically, or fit in a register). Or one could imagine an implementation allocating the call frames as continuation frames (perhaps after CPS transformation by the compiler) elsewhere (e.g. in some "heap"), using techniques similar to those described in Appel's old book Compiling with Continuations (describing an SML/NJ compiler).
(remember that a programming language is a specification -not some software-, often written in English, perhaps with additional formalization, in some technical report or standard document. AFAIK, the C99 or C11 standards do not even mention any stack or activation record. But in practice, most C implementations are made of a compiler and a standard library implementation.)
In practice, activation records are call frames (for C, they are synonyms; things are more complex with nested functions) and are allocated on a hardware-assisted call stack on all reasonable C implementations I know. On z/Architecture there is no hardware stack pointer register, so the stack pointer is a convention (some register is dedicated to play that role).
So look first at the call stack wikipage. It has a nice picture worth many words.
Are activation records created on stack or heap
In practice, they (activation records) are call frames on the call stack (allocated following calling conventions and ABIs). Of course the layout, slot usage, and size of a call frame are computed at compile time by the compiler.
In practice, a local variable may correspond to some slot inside the call frame. But sometimes the compiler will keep it only in a register, or reuse the same slot (which has a fixed offset in the call frame) for various purposes, e.g. for several local variables in different blocks, etc.
But most C compilers are optimizing compilers. They are able to inline a function, or sometimes make a tail call to it (then the caller's call frame is reused as, or overwritten by, the callee's call frame), so the details are more complex.
See also the question How was C ported to architectures that had no hardware stack? on the Retrocomputing Stack Exchange.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by a C program in a limited-resource environment of 256 KB of memory (currently I am testing in an emulator).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much memory my C program is using (including globals, local variables [from the perspective of my main function loop], the program code itself, etc.)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is a Hitachi SH2, and I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper actions (tick a checkbox, probably) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it into Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program. When the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally stop it and inspect the stack memory. You will see non-zero values instead of zeros; the used part of the stack extends up to where the zeros start again.
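If you cannot use breakpoints (as in your case), the same fill-and-inspect idea can be done by the program itself. Below is a rough sketch: the stack boundary symbols are placeholders that must come from your linker script, and it assumes a stack that grows downwards:

extern unsigned char _stack_bottom;   /* lowest address of the stack region */
extern unsigned char _stack_top;      /* highest address (initial stack pointer) */

#define STACK_FILL 0xAAu

void stack_paint(void)                /* call this as early as possible */
{
    volatile unsigned char *p = &_stack_bottom;
    unsigned char marker;             /* its address is roughly the current stack top */

    while (p < &marker)               /* fill only the part not yet in use */
        *p++ = STACK_FILL;
}

unsigned long stack_high_water_mark(void)
{
    volatile unsigned char *p = &_stack_bottom;

    while (p < &_stack_top && *p == STACK_FILL)
        p++;                          /* skip bytes that were never overwritten */

    return (unsigned long)(&_stack_top - p);   /* worst-case bytes of stack used */
}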
Generally you will have different sections in the generated map file, showing where the data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text ... and so on,
with other attributes like Base, Size (hex), Size (dec), etc. for each section.
While at any given time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time-variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is itself a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and the linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file be generated, and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst-case stack usage for any particular function. In a single-threaded environment, the stack usage from main() will give worst-case overall usage (although interrupt handlers need consideration: the linker is not thread- or interrupt-aware, and some architectures have separate interrupt stacks while others share the main stack).
Although the heap itself is typically a fixed allocation (often all the memory left over after the linker has performed static allocation of the stack and static data), if you are using dynamic memory allocation it may be useful at run-time to know how much memory has been allocated from the heap, as well as the number of allocations, the average allocation size, and the number of free blocks and their sizes. Because dynamic memory allocation is implemented by your system's standard library, any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().
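If you have to implement such heap accounting yourself and you are using a GNU toolchain, one common approach is the linker's --wrap option. Here is a minimal sketch that counts only calls and requested bytes; tracking the bytes released by free would need per-block bookkeeping:

/* Build with:  -Wl,--wrap=malloc -Wl,--wrap=free
 * Calls to malloc/free are then routed to __wrap_malloc/__wrap_free,
 * while __real_malloc/__real_free reach the library implementation. */
#include <stddef.h>

void *__real_malloc(size_t size);
void __real_free(void *ptr);

static unsigned long bytes_requested;
static unsigned long malloc_calls;
static unsigned long free_calls;

void *__wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);
    if (p) {
        bytes_requested += size;
        malloc_calls++;
    }
    return p;
}

void __wrap_free(void *ptr)
{
    if (ptr)
        free_calls++;
    __real_free(ptr);
}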

What might be the point in putting a variable exactly in the "STACK" section with __attribute__ ((section("STACK"))?

In the gcc documentation one reason is given for using section: to map to special hardware. But this does not seem to be my case.
So I have been given a task to modify a shared library that we use in our project. It is a Linux library. There are variable declarations in the library that puzzle me. They look like this (roughly):
static int my_var_1 __attribute__((section("STACK"))) = 0;
Update 1:
There are a dozen variables defined in this way (__attribute__((section("STACK")))).
Update 2:
my_var_1 is not a constant. my_var_1 might be changed in code during initialization:
my_var_1 = atoi(getenv("MY_VAR_1") ? getenv("MY_VAR_1") : "0");
later in the library it is used like this:
inline void do_something() __attribute__((always_inline));
inline void do_something()
{
    if (my_var_1)
        do_something_else();
}
What might be the point in using __attribute__((section("STACK")))? I understand that section tells the compiler to put a variable in the particular section. However what might be the point in putting static int exactly in the "STACK" section?
Update 3
These lines are excerpt from the output from readelf -t my_lib.so
[23] .got.plt
PROGBITS 00000000002103f0 00000000000103f0 0
00000000000003a8 0000000000000008 0 8
[0000000000000003]: WRITE, ALLOC
[24] .data
PROGBITS 00000000002107a0 00000000000107a0 0
00000000000000b0 0000000000000000 0 16
[0000000000000003]: WRITE, ALLOC
[25] STACK
PROGBITS 0000000000210860 0000000000010860 0
00000000000860e0 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
[26] .bss
NOBITS 0000000000296940 0000000000096940 0
0000000000000580 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
Update 4
Managed to get information from the author of the shared library.
__attribute__((section("STACK"))) was added because he had not managed to build the library on Solaris and then found this workaround. Before the workaround the definition of my_var_1 was:
int my_var_1 = 0;
and everything was OK. Then he changed it since my_var_1 was in fact needed only in this translation unit:
static int my_var_1 = 0;
And after that change he did not manage to build the library on Solaris. So he added __attribute__((section("STACK"))) and it helped somehow.
First, the STACK section won't be the stack of any running task.
Putting variables and functions in a specific section allows a memory area to be selected for them (via the linker script). On some (mostly embedded) architectures, you want to put often-accessed data in faster memory.
Another possibility: some development post-link script sets the whole STACK section to 1, so that a development build always does do_something_else(), while the released software keeps the default value of 0.
Yet another possibility: if there are other variables in the STACK section, the developer wants to keep them close together in memory. All variables in the STACK section will be near each other. Maybe a cache optimization?
There may be many reasons and it is difficult to tell without details. Some of the reasons might be:
The section marked STACK is linked at run-time to a closely coupled memory with a faster access time than the other RAMs. It makes sense to map the stack to such a RAM to avoid stalls during function calls. Now if you suddenly have a variable that is accessed a lot and you want to map it to the same fast-access RAM, putting it in the same section as the stack makes sense.
The section marked STACK might be mapped to a region of memory that is accessible when other parts of memory might not be. For example, boot loaders need to initialize the memory controller before they can access RAM. But you really want to be able to write the code that does that in C, which requires a stack. So you find some special memory (such as the data cache programmed to write-back mode) and map the stack there, so you can run the code that gets the memory controller working and then use RAM. Once again, if you now happen to have a global variable that still needs to be accessed before RAM is available, you might decide to put it in the STACK section.
A better programmer would have renamed the STACK section to something else if it is used for more than just the stack.
In some operating systems, the same region of addressing space is used for every thread's stack; when execution switches between threads, the mapping of that space is changed accordingly. On such systems, every thread will have its own independent set of any static variables located within that region of address space. Putting variables that need to be maintained separately for each thread in such an address range will avoid the need to manually swap them with each task switch.
Another occasional use for forcing variables into a stack area is to add stack sentinels (variables that can be periodically checked to see if a stack overflow clobbered them).
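As a sketch of that sentinel idea (whether section("STACK") really ends up adjacent to the run-time stack depends entirely on the linker script, so treat this as illustrative only):

/* A variable forced into the stack region and periodically checked for
 * corruption; if the stack overflows into it, the magic value is clobbered. */
#define SENTINEL_MAGIC 0xDEADBEEFu

static volatile unsigned int stack_sentinel
    __attribute__((section("STACK"))) = SENTINEL_MAGIC;

int stack_overflowed(void)
{
    /* call periodically, e.g. from a timer tick or the main loop */
    return stack_sentinel != SENTINEL_MAGIC;
}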
A third use occurs on 8086 and 80286 platforms (probably not so much later chips in the family): the 8086 and 80286 are limited to efficiently accessing things in four segments without having to reload segment registers. If code needs to do something equivalent to
for (n = 0; n < 256; n++)
    *dest++ = xlat[*src++];
and none of the items can be put in the code segment, being able to force one of the items into the stack segment can make the code much faster. Hand-written assembly code would be required to achieve the speedup, but it can be very large (nearly a factor of two in some real-world situations I've handled on the 8086, and perhaps even greater in some situations on the 80286).
