Benchmarking memory use in the Keil uVision 5 simulator - arm

I have a Keil uVision project that I would like to benchmark extensively. The code is currently running in simulator mode. To visualize results we simply store characters in a memory region and display said region as ASCII.
This approach works pretty well to obtain timings using the Cortex-M system ticks. However, I don't have an idea for the ram usage of the code:
Ideally I would the simulator to stop the execution when the maximum amount of ram was used.
I would also like to see maximum heap usage (or even per function).
Is there a way to do obtain these values? I'm aware of the maximum stack size reported by the build system.
Is there a way to limit the amount of ram available in the uVision simulator?
Thanks

There is a fairly obvious solution: Just count the ram in the memory window. First find the memory regions allocated for the heap and the stack (this can usually be found in the startup assembly file). Then go through the memory window in the debugger and just look where the memory didn't get changed.
Keil usually initializes the memory with 0 so the stack boundaries can easily be seen.
The total stack usage can be computed in the following way
$TOTAL = $TOP - $BOTTOM
If the boundary can't be seen it might make sense to first initialize the memory with a pattern (see here).

Related

Is it able to adjust HEAP size IN APPLICATION on a STM32 microcontroller?

In my program, I'm using a lot of malloc(), and it's targeting chips with different RAM sizes(same peripherals and footprint of course), so I want to set HEAP size in my c code application, before the first call to malloc().
Setting HEAP or STACK size in IDE before compile is not I wanted. I searched all around and found nothing.
So, is that possible?

Linux Heap Fragmentation

I have a question that keeps bothering me for the last week.
In Windows debugger there is the !heap -s command that outputs the virtual memory's heap status and calculates the external fragmentation using the formula :
External fragmentation = 1 - (larget free block / total free size)
Is there a similar method in linux, that outputs those statistics needed to calculate the effect?
Long story now:
I have a C application that keeps allocating and deallocating space of different sizes, using malloc and free, each allocation has different life span.
The platform I am using is Lubuntu, so ptmalloc2 algorithm is the default.
I am aware that those allocations are served in the virtual user space heap(except those that are larger than 128Kb, where the allocator uses mmap), and are mapped to physical pages when actually accessed .
The majority of the allocations is of size < 80 bytes, so they are served from FastBins.
Using Valgrind and Massif I can get the internal fragmentation, since it reports the extra bytes used for each allocation.
However, my main concern is how to figure out the external fragmentation.
I am aware of the /proc/[pid]/smaps heap size and the pmap-d[pid] anon statistics, but I find it difficult to interpret them in terms of external fragmentation.
I am also aware of LD_PRELOAD , and I can dynamically connect the /lib/i386-linux-gnu/libmemusage.so. This library outputs the heap total, peak and the distribution of the requested allocation sizes.
I know that __malloc__hook is deprecated now, and I do not really want to rely on implementation specific statistics like malloc_stats() and mallinfo() . However, If you have any suggestions using those two please let me know.
I can tell that the external fragmentation problem, is when a request can not be satisfied, because there is no contiguous space in the heap, but the requested total size is scattered all around that area.
I still have not figured out, how to get the statistics needed so I can calculate this effect. For example different formulas stating that I have to capture the live_memory or get the total_free_pages, or get the size of the largest_free_block.
How can I have a function to "traverse" through the heap and gather those statistics?
Thank you everyone in advance.
I believe that this will depend on the allocator you are using. That is, you would probably need a different strategy for whichever malloc (et al) and free implementation you are using. If the implementation doesn't offer the information you seek as an extension, you will probably have to read its source-code and type your own logic to examine the state of allocations.
I believe that the mapping of pages to swap-space and physical RAM is at a lower level and so won't particularly help you in your goal. The malloc (et al) and free implementation might or might not care about those lower-level details.
If you are certain that you are using ptmalloc2, are you able to find its source-code?

profiling maximum memory usage in C application - linux

I'm developing C module for php under linux and I'm trying to find a way that could help me profile my code by maximum memory spike (usage).
Using valgrind I can get total memory allocation within the code. But as it is with allocated memory it comes and goes ;). What I need to get is the highest memory usage that appeared during C application run so I could get total overview on memory requirements and have some measurement point for optimization of the code.
Does anyone know any tool/trick/good practice that could help ?
Take a look at Massif: http://valgrind.org/docs/manual/ms-manual.html
Have you checked massif (one of Valgrind's tool)?
this is actually what you are looking for
another possibility would be memusage (one of glibc's utilities, glibc-utils)

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by C program in a limited resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is Hitachi SH2, I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options make the proper actions (mark a checkobx, probably) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: The memory segments, their sizes, how much memory is utilzed in each one, program memory, data memory, etc.. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint in the start of you program. When the application stops there zero the entire stack. Resume the application and let it work for a while. Finally stop it and inspect the stack memory. You will see non-zero values instead of zeros. The used stack goes until the zeros part starts again.
Generally you will have different sections in mmap generated file, where data goes, like :
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text .... and so on!!!
with other attributes like Base,Size(hex),Size(dec) etc for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part id dynamically allocated data, but even then sich data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().

Can you tune C runtime heap segment reservation size on XP?

When the VC6 C runtime on XP can't serve an allocation request within an existing heap segment, it reserves a new segment. The size of these new segments increase by factors of 2 (until there are not large enough free areas to do that, at which point it falls down to smaller segments.)
In any case, is there any way to control this behavior on XP with the VC6 runtime? For example, doubling up to a point, but capping at 64MB segments.
If there is no way on XP but there is on 7, that would be good to know too. Or if there is no way on VC6 but there is on VC8 or up would be interesting.
If you want specific allocation behaviour, write your own allocator. VirtualAlloc etc are there to help you do it. Using a compiler and CRT which is still in support would help too.

Resources