C embedded systems stack and heap size

How can I determine the current stack and heap size of a running C program on an embedded system? Also, how can I discover the maximum stack and heap sizes that my embedded system will allow? I thought about calling malloc() with increasing sizes until it fails in order to find the heap size; however, I am more interested in the size of the stack.
I am using an mbed NXP LPC1768, and I am using an offline compiler developed on GitHub called gcc4mbed.
Any better ideas? All help is greatly appreciated!

For the maximum sizes, look at your linker script: it defines how much space is allocated to the stack and to the heap.
To measure stack usage, do this:
At startup (before C main()), while memory is being initialized, fill all your stack bytes with a known value such as 0xAA or 0xCD. Run your program; at any point you can stop and see how many magic values are left. If you don't see any magic values, then you have overflowed your stack and weirdness may start to happen.
At runtime you can also check the last 4 bytes or so (maybe the last two words; this is really up to you). If they don't match your magic value, force a reset. This only works if your system is well behaved on reset, and it is best if it starts up quickly and isn't doing something "real time" or mission critical.
Here's a really helpful whitepaper from IAR on the subject.
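The painting technique above can be sketched in C as follows. This is a minimal host-side illustration, not startup code for any particular board: the function names are hypothetical, and on a real target the region and its size would come from symbols defined in your linker script (whose names vary between toolchains). It assumes a descending stack, so the lowest addresses are the last to be touched, and it assumes the region is painted before that part of the stack is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAAu   /* magic value, as suggested above */

/* Paint a stack region with the magic value (done once, before use). */
void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    memset(region, STACK_FILL, size);
}

/* Count bytes that still hold the magic value, scanning upward from the
   lowest address. Assumes a descending stack (hypothetical helper). */
size_t stack_unused_bytes(const uint8_t *region, size_t size)
{
    size_t n = 0;
    while (n < size && region[n] == STACK_FILL)
        n++;
    return n;
}
```

Calling stack_unused_bytes() from a debugger command or a low-priority task gives a high-water mark: the region size minus the returned count is the deepest the stack has been so far. A result of zero suggests the stack has overflowed.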

A crude way of measuring the current stack size at runtime is to declare
static void* mainsp;
then start your main with e.g.:
int main(int argc, char**argv) {
int here;
mainsp = (void*) &here;
then inside some leaf routine, when the call stack is deep enough, do something similar to
int local;
/* needs <stdio.h> and <stdint.h>; assumes the stack grows downward, as on ARM and x86 */
printf ("stack size = %ld\n",
(long) ((intptr_t) mainsp - (intptr_t) &local));
Statically estimating the required stack size from the full source code of an application is undecidable in general (think of recursion and function pointers), and in practice very difficult (even for a severely restricted class of applications). Look into Couverture. You might also consider customizing a recent GCC compiler with your own plugin (perhaps Bismon in mid 2021; email me at basile.starynkevitch#cea.fr about it) for such purposes, but that won't be easy and will give you over-approximations.
If compiling with GCC, you might use its return address builtins (e.g. __builtin_frame_address) to query the stack frame pointer at run time. On some architectures this is not available with some optimization flags. You could also use the -Wstack-usage=byte-size and/or -Wframe-larger-than=byte-size warning options of recent GCC.
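As a rough sketch of the builtin approach, the following uses __builtin_frame_address(0), which GCC and Clang provide, to record a reference frame and later report the distance from it. The helper names stack_mark_base and stack_depth are hypothetical, and the result is only an approximation of stack depth that depends on the ABI and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t base_frame;   /* frame address recorded early in main */

/* Record a reference point; call this once near the start of main. */
__attribute__((noinline)) void stack_mark_base(void)
{
    base_frame = (uintptr_t) __builtin_frame_address(0);
}

/* Approximate distance in bytes between the current frame and the
   recorded one. The absolute difference is used so that the result
   does not depend on the direction of stack growth. */
__attribute__((noinline)) size_t stack_depth(void)
{
    uintptr_t here = (uintptr_t) __builtin_frame_address(0);
    return (size_t) (here > base_frame ? here - base_frame
                                       : base_frame - here);
}
```

Calling stack_depth() from a deeply nested routine and keeping the maximum seen gives a cheap runtime estimate, without replacing a proper static analysis.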
As to how heap and stack spaces are distributed, this is system dependent. You might parse /proc/self/maps file on Linux. See proc(5). You could limit stack space on Linux in user-space using setrlimit(2).
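On Linux (not on a bare-metal target), the stack limit mentioned above can be queried and lowered with getrlimit(2) and setrlimit(2). The wrapper names below are hypothetical; this is only a sketch of how the calls fit together.

```c
#include <sys/resource.h>

/* Fetch the soft and hard stack limits in bytes.
   Returns 0 on success, -1 on error. */
int query_stack_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Set the soft stack limit to `bytes`, keeping the hard limit as is.
   Fails if `bytes` exceeds the hard limit. */
int set_soft_stack_limit(rlim_t bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

Either limit may be reported as RLIM_INFINITY, so compare against that constant before printing the value as a byte count.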
Be aware, however, of Rice's theorem.
With multi-threaded applications things could be more difficult. Read some Pthread tutorial.
Notice that in simple cases, GCC may be capable of tail-call optimizations. You could compile your foo.c with gcc -Os -fverbose-asm -S foo.c and look inside the generated foo.s assembler code.
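A small foo.c to try this with might look like the sketch below (the function is a made-up example). The recursive call is the last action of the function, so GCC may compile it into a jump instead of a call at -Os or -O2; this is not guaranteed, which is why inspecting foo.s is worthwhile.

```c
/* foo.c: tail-recursive sum of 1..n, accumulated in `acc`. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call: nothing left to do after it */
}
```

If the optimization was applied, the generated assembler for sum_to contains a loop or jump and no call to itself, so stack usage stays constant regardless of n.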
If you don't care about portability, consider also using the extended asm features of GCC.

Related

Does C at first tries to assign a certain address? [duplicate]

I'm trying to understand how C allocates memory on the stack. I always thought variables on the stack could be pictured like struct member variables: they occupy a successive, contiguous block of bytes within the stack. To help illustrate this issue, which I found somewhere, I created this small program, which reproduces the phenomenon.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void function(int *i) {
int *_prev_int = (int *) ((long unsigned int) i - sizeof(int)) ;
printf("%d\n", *_prev_int );
}
void main(void)
{
int x = 152;
int y = 234;
function(&y);
}
See what I'm doing? Suppose sizeof(int) is 4: I'm looking 4 bytes behind the passed pointer, as that would read the 4 bytes before where int y sits in the caller's stack.
It did not print 152. Strangely, when I look at the next 4 bytes instead:
int *_prev_int = (int *) ((long unsigned int) i + sizeof(int)) ;
it works and prints whatever is in x inside the caller's stack. Why does x have a higher address than y? Are stack variables stored upside down?
Stack organization is completely unspecified and is implementation specific. In practice, it depends a lot on the compiler (even on its version) and on optimization flags.
Some variables don't even sit on the stack (e.g. because they are just kept inside some registers, or because the compiler optimized them away, e.g. by inlining, constant folding, etc.).
BTW, you could have some hypothetical C implementation which does not use any stack (even if I cannot name such implementation).
To understand more about stacks:
Read the wikipage on call stacks, tail calls, threads, and on continuations
Become familiar with your computer's architecture & instruction set (e.g. x86) & ABI, then ...
Ask your compiler to show the assembler code and/or some intermediate compiler representations. If using GCC, compile some simple code with gcc -S -fverbose-asm (to get assembler code foo.s when compiling foo.c) and try several optimization levels (at least -O0, -O1, -O2 ....). Try also the -fdump-tree-all option (it dumps hundreds of files showing some internal representations of the compiler for your source code). Notice that GCC also provides return address builtins.
Read Appel's old paper on garbage collection can be faster than stack allocation, and understand garbage collection techniques (since they often need to inspect and possibly change some pointers inside call stack frames). To know more about GC, read the GC handbook.
Sadly, I know no low-level language (like C, D, Rust, C++, Go, ...) where the call stack is accessible at the language level. This is why coding a garbage collector for C is difficult (since GC-s need to scan the call stack pointers)... But see Boehm's conservative GC for a very practical and pragmatic solution.
Almost all processor architectures nowadays support stack manipulation instructions (e.g. the LDM and STM instructions on ARM). Compilers use them to implement the stack. In most cases the stack pointer decrements when data is pushed onto the stack (it grows downwards) and increments when data is popped from the stack.
So how the stack is implemented depends on the processor architecture and the compiler.
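One way to observe the growth direction on a given platform is to compare the address of a local variable in a caller with one in its callee. The sketch below assumes GCC or Clang (for the noinline attribute) and uses hypothetical function names. Comparing addresses of unrelated objects through uintptr_t is implementation-defined, so treat the result as an observation about one build, not a guarantee.

```c
#include <stdint.h>

/* noinline keeps the callee in its own stack frame. */
__attribute__((noinline))
static int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t) &callee_local < (uintptr_t) caller_local;
}

/* Returns 1 if the stack appears to grow toward lower addresses, else 0. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    return callee_is_lower(&caller_local);
}
```

This says nothing about the order of variables inside a single frame, which the compiler is free to choose.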
It depends on the compiler and platform. The same thing can be done in more than one way, as long as it is done consistently by a program (in this case the compiler's translation to assembly, i.e. machine code) and the platform supports it (good compilers try to optimize the assembly to get the “most” out of each platform).
A very good source for deeply understanding what goes on behind the scenes of C, what happens when compiling a program, and why, is the free book Reverse Engineering for Beginners (Understanding Assembly Language) by Dennis Yurichev; the latest version can be found at his site.

How to instrument/profile memory(heap, pointers) reads and writes in C?

I know this might be a bit vague and far-fetched (sorry, stackoverflow police!).
Is there a way, without external forces, to instrument (track basically) each pointer access and track reads and writes - either general reads/writes or quantity of reads/writes per access. Bonus if it can be done for all variables and differentiate between stack and heap ones.
Is there a way to wrap pointers in general or should this be done via custom heap? Even with custom heap I can't think of a way.
Ultimately I'd like to see a visual representation of said logs that would show me variables represented as blocks (of bytes or multiples of) and heatmap over them for reads and writes.
Ultra simple example:
int i = 5;
int *j = &i;
printf("%d", *j); /* Log would record that *j was accessed for a read of sizeof(int) bytes */
Attempt of rephrasing in more concise manner:
(How) can I intercept (and log) access to a pointer in C without external instrumentation of binary? - bonus if I can distinguish between read and write and get name of the pointer and size of read/write in bytes.
I guess (or hope for you) that you are developing on Linux/x86-64 with a recent GCC (5.2 in October 2015) or perhaps the Clang/LLVM compiler (3.7).
I also guess that you are tracking a naughty bug, and not asking this (too broad) question from a purely theoretical point of view.
(Notice that there is no simple answer to your question, because in practice C compilers produce machine code close to the hardware, and most hardware does not have sophisticated instrumentation like the one you dream of.)
Of course, compile with all warnings and debug info (gcc -Wall -Wextra -g). Use the debugger (gdb), notably its watchpoint facilities which are related to your issue. Use also valgrind.
Notice also that GDB (recent versions like 7.10) is scriptable in Python (or Guile), and you could code some scripts for GDB to assist you.
Notice also that recent GCC & Clang/LLVM have several sanitizers. Use some of the -fsanitize= debugging options, notably the address sanitizer with -fsanitize=address; they instrument the code to help detect invalid pointer accesses, so they sort of do what you want. Of course, the instrumented code runs slower (depending on the sanitizer, the slowdown can be 10 or 20%, or as much as a factor of 50).
Finally, you might even consider adding your own instrumentation by customizing your compiler, e.g. with MELT, a high-level domain-specific language designed for such customization tasks for GCC. This would take months of work, unless you are already familiar with GCC internals (then, only several weeks). You could add an "optimization" pass inside GCC which would instrument (by changing the Gimple code) whatever accesses or stores you want.
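A much cruder alternative to compiler-level instrumentation, and not what the paragraph above describes, is to route the accesses you care about through logging macros in the source itself. The sketch below is purely illustrative: TRACKED_READ, TRACKED_WRITE and the access_log structure are made-up names, and only accesses that are explicitly rewritten to use the macros get logged.

```c
#include <stddef.h>
#include <stdio.h>

/* Running totals of tracked accesses (hypothetical structure). */
struct access_log {
    unsigned long reads, writes;
    size_t bytes_read, bytes_written;
};
static struct access_log g_log;

/* Log a read through pointer expression p, then yield *p.
   #p captures the pointer's name; sizeof *(p) gives the access size. */
#define TRACKED_READ(p) \
    (g_log.reads++, g_log.bytes_read += sizeof *(p), \
     fprintf(stderr, "read  %s: %zu bytes\n", #p, sizeof *(p)), *(p))

/* Log a write through pointer expression p, then store v into *p. */
#define TRACKED_WRITE(p, v) \
    (g_log.writes++, g_log.bytes_written += sizeof *(p), \
     fprintf(stderr, "write %s: %zu bytes\n", #p, sizeof *(p)), *(p) = (v))
```

With int i = 5; int *j = &i;, the expression TRACKED_READ(j) evaluates to 5 and logs a read of sizeof(int) bytes under the name j. The log lines could then be post-processed into the kind of heatmap the question describes.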
Read more about aspect-oriented programming.
Notice also that if your C code is generated, that is if you are meta-programming, then changing the C code generator might be very relevant. Read more about reflection and homoiconicity. Dynamic software updating is also related to your issues.
Look also into profiling tools like oprofile and into sound static source analyzers like Frama-C.
You could also run your program inside some (instrumenting) emulator (like Qemu, Unisim, etc...).
You might also compile for a fictitious architecture like MMIX and instrument its emulator.

How can I optimize GCC compilation for memory usage?

I am developing a library which should use as little memory as possible (I am not concerned about anything else, like the binary size, or speed optimizations).
Are there any GCC flags (or any other GCC-related options) I can use? Should I avoid some level of -O* optimization?
Your library, or any code in idiomatic C, has several kinds of memory usage:
binary code size, and indeed -Os should optimize that
heap memory, using C dynamic allocation, that is malloc; you obviously should know how, and how much, heap memory is allocated (and later free-d). The actual memory consumption depends on your particular malloc implementation (e.g. many implementations, when calling malloc(25), could in fact consume 32 bytes), not on the compiler. BTW, you might design your library to use some memory pools or even implement your own allocator (above OS syscalls like mmap, or above malloc, etc.)
local variables, that is the call frames on the call stack. This mostly depends on your code (but an optimizing compiler, e.g. -Os or -O2 for gcc, would probably use more registers and perhaps slightly less stack when optimizing). You could pass -fstack-usage to gcc to ask it to give the size of every call frame, and you might give -Wstack-usage=len to be warned when a call frame exceeds len bytes.
global or static variables. You should know how much memory they need (and you might use nm or some other binutils program to query them). BTW, carefully declaring some variables inside a function as static would lower the stack consumption (but you cannot do that for every variable or every function).
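To make the memory pool idea from the heap item concrete, here is a minimal sketch of a fixed-size block pool. The block size, block count and function names are all hypothetical; a real library would size them from its own allocation pattern. Because every block has the same size, there is no per-allocation header and no fragmentation.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  32u   /* hypothetical block size in bytes */
#define POOL_BLOCK_COUNT 16u   /* hypothetical number of blocks */

/* A free block stores the link to the next free block in its own bytes. */
union pool_block {
    union pool_block *next;
    unsigned char payload[POOL_BLOCK_SIZE];
};

static union pool_block pool_storage[POOL_BLOCK_COUNT];
static union pool_block *pool_free_list;

/* Put every block on the free list. */
void pool_init(void)
{
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Pop one block from the free list, or return NULL if the pool is empty. */
void *pool_alloc(void)
{
    union pool_block *b = pool_free_list;
    if (b != NULL)
        pool_free_list = b->next;
    return b;
}

/* Push a block obtained from pool_alloc() back onto the free list. */
void pool_free(void *p)
{
    union pool_block *b = p;
    assert(b >= pool_storage && b < pool_storage + POOL_BLOCK_COUNT);
    b->next = pool_free_list;
    pool_free_list = b;
}
```

The total footprint is fixed at compile time (POOL_BLOCK_SIZE times POOL_BLOCK_COUNT bytes of static storage), which makes the library's worst-case heap usage easy to state. Blocks are only aligned for pointer-sized objects in this sketch.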
Notice also that in some limited cases, GCC is doing tail calls, and then the stack usage is lowered (since the stack frame of the caller is reused in the callee). (See also this old question).
You might also ask the compiler to pack some particular struct-s (beware, this could slow down performance significantly). You'll want to use some type attributes like __attribute__((packed)), and perhaps also some variable attributes.
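The effect of packing can be seen with a small sketch like the following, using the GCC/Clang __attribute__((packed)) type attribute. The struct and function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler typically inserts padding after `tag`
   so that `value` is suitably aligned. */
struct plain_record {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so `value` may be misaligned and slower
   (or, on some CPUs, impossible) to access directly. */
struct packed_record {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

size_t plain_size(void)  { return sizeof(struct plain_record); }
size_t packed_size(void) { return sizeof(struct packed_record); }
```

The packed struct occupies exactly 5 bytes, while the plain one is usually 8 on common ABIs. The saving only matters when many such records are stored, and taking the address of a packed member can yield a misaligned pointer.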
Perhaps you should read more about Garbage Collection, since GC techniques, concepts, and terminology might be relevant. See this answer.
If on Linux, the valgrind tool should be useful too... (and during the debugging phase the -fsanitize=address option of recent GCC).
You might perhaps also use some code generation options like -fstack-reuse= or -fshort-enums or -fpack-struct or -fstack-limit-symbol= or -fsplit-stack ; be very careful: some such options make your binary code incompatible with your existing C (and others!) libraries (then you might need to recompile all used libraries, including your libc, with the same code generation flags).
You probably should enable link-time optimizations by compiling and linking with -flto (in addition to other optimization flags like -Os).
You certainly should use a recent version of GCC. Notice that GCC 5.1 was released a few days ago (in April 2015).
If your library is large enough to be worth the effort, you might even consider customizing your GCC compiler with MELT (to help you find out how to spend less memory). This might take weeks or months of work.
There are advantages to using 'stack frames', but they use more stack space to save the stack frame pointer.
You can tell the compiler not to use stack frames (with GCC, -fomit-frame-pointer). This will (generally) slightly increase the code size but will reduce the amount of stack used.
You can use char and short rather than int for values whose range allows it.
It is poor programming practice, but you can reuse variables and arrays for multiple purposes.
If some set of variables is mutually exclusive in usage, you can place them in a union.
If the function parameter lists are all very short, you can force the compiler to pass all the parameters in registers (having an architecture with lots of general-purpose registers really helps here).
Use only one malloc that covers ALL the area needed for malloc-style operations, so as to minimize the allocator's per-allocation overhead.
There are many techniques. Most make the code much more difficult to debug and maintain, and often much harder for humans to read.
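The union suggestion from the list above can be illustrated with a short sketch. The rx_state and tx_state types are hypothetical and stand for any two pieces of state that are never needed at the same time.

```c
#include <stddef.h>
#include <stdint.h>

/* Two hypothetical states that are never live at the same time. */
struct rx_state { uint8_t buf[64]; uint16_t len;  };
struct tx_state { uint8_t buf[48]; uint16_t sent; };

/* Overlaying them costs only as much as the larger of the two. */
union link_state {
    struct rx_state rx;
    struct tx_state tx;
};

/* Bytes saved compared with keeping both structs side by side. */
size_t link_state_saving(void)
{
    return sizeof(struct rx_state) + sizeof(struct tx_state)
         - sizeof(union link_state);
}
```

The price is that the code must track which member is currently valid; reading the inactive member yields the other member's bytes reinterpreted.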
When possible, you can use the -m32 option to compile your application for 32-bit. On x86-64 Linux, pointers and long then take 4 bytes instead of 8, so pointer-heavy data structures can shrink by up to half; other data is unaffected.
apt-get install libc6-dev-i386
gcc -m32 application.c -o application

How to Find Stack Overflow in Microcontrollers using C in Run Time

I am creating an application on an STM32 microcontroller. I am using a library stack, and my application sits on top of it. I have two questions:
How can I detect and handle a stack overflow at runtime, given that I don't know how much memory that library is using?
How can I detect and handle a stack overflow at runtime if I am developing the code from scratch? I read somewhere that we have to maintain a count for each declaration. Is this the correct way, or is there a standard way to find it?
If you are limited to your device and no highly sophisticated tools are available, you could at least try "the old way". A simple stack guard may help. Somewhere in your code (depending on the tools you use), there must be the definition of the stack area. Something similar to:
.equ stacksize, 1024
stack: .space stacksize,0
(GNU as syntax; yours might be different)
with your device's startup code somewhere initializing the stack register to the top address of the stack area.
A stack guard would then just add a "magic number" to the stack top and bottom:
.equ stackmagic,0xaffeaffe
.equ stacksize, 1024
stacktop: .int stackmagic
stack: .space stacksize,0
stackbottom: .int stackmagic
with some code at least periodically checking (e.g. in a timer interrupt routine or - if available - in your debugger) if the stackmagic values are still there.
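The periodic check could look roughly like this in C. The sketch models the guard, stack, guard layout as a struct so that it can be tried on a host machine; on the target you would instead reference the assembler labels (for example as extern uint32_t objects, assuming they are exported with .global), and the function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_MAGIC 0xAFFEAFFEu
#define STACK_WORDS 256u          /* 1024 bytes, matching stacksize above */

/* Host-side stand-in for the assembler layout: guard, stack, guard. */
struct guarded_stack {
    uint32_t top_guard;
    uint32_t stack[STACK_WORDS];
    uint32_t bottom_guard;
};

/* Write the magic number into both guard words. */
void guards_init(struct guarded_stack *s)
{
    s->top_guard = STACK_MAGIC;
    s->bottom_guard = STACK_MAGIC;
}

/* True while both guard words still hold the magic number. */
bool guards_intact(const struct guarded_stack *s)
{
    return s->top_guard == STACK_MAGIC && s->bottom_guard == STACK_MAGIC;
}
```

A timer interrupt handler could call guards_intact() and trigger a reset or a breakpoint when it returns false. The check only notices an overflow after the fact, and only if the overflow actually wrote over a guard word.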
If your software is not tiny, I would first try to debug most of it on your laptop or desktop or tablet (perhaps running Linux, because it has good tools and very standard compliant compilers, similar to the cross-compiler you are using). Then you can profit from tools like valgrind or GCC compilation options like -Wall -Wextra -g -fsanitize=address etc....
You might store an approximation of the top of stack at the start of your main function (e.g. by declaring extern int* start_top_of_stack; then doing int i=0; start_top_of_stack= &i; near the beginning of your main function). You could then have some local int j=0; in several functions and check at the start of each that the distance between &j and start_top_of_stack is not too big.
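A minimal sketch of this trick, written so that it does not depend on the direction of stack growth, might look as follows. The function names and the idea of passing an explicit byte budget are illustrative assumptions, and comparing addresses of unrelated locals is implementation-defined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t start_top_of_stack;   /* set once near the start of main */

/* Record the address of a local variable of main as the reference point. */
void stack_record_start(const void *local_in_main)
{
    start_top_of_stack = (uintptr_t) local_in_main;
}

/* True if `local_here` lies within `budget` bytes of the recorded start. */
bool stack_within_budget(const void *local_here, size_t budget)
{
    uintptr_t here = (uintptr_t) local_here;
    uintptr_t used = here > start_top_of_stack ? here - start_top_of_stack
                                               : start_top_of_stack - here;
    return used <= budget;
}
```

In main you would call stack_record_start(&i) for some local i, and in functions of interest check stack_within_budget(&j, limit) with a local j, treating a false result as an early warning.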
But there is no silver bullet. I am just suggesting a trick.
If your application is critical to the point of accepting costly development efforts, you could use some formal method & source static program analysis tools (e.g. Frama-C, or make your own using MELT). If you are cross-compiling with a recent GCC you might want to use -Wstack-usage=some length and/or -fstack-usage to check that every call frame is not too big, or to compute manually the required stack depth.
