C - Investigating stack and heap contents program

C - Investigating stack and heap contents program - c

How would I go about writing a C program that lists ALL the contents of stack and heap including its own variables and mallocations ?
The purpose of it is for me to be able to see what's going on in the memory as I write and test code.

The c standard doesn't explicitly mention a stack or a heap. That, along with the fact that variable and function names are compiled out, means that your task is impossible.
You could build your own compiler which would effectively be a debugging tool. But that would be ridiculous as such a thing would take a long time to build and you'd have to adapt it constantly as the standard evolves. Or you could use the output of a compiler that generates debugging symbols.
Better still, learn to use a good debugger.

Related

C function code in malloc'd memory

Is there a way to malloc memory space and then copy function code inside the space in C?
This question might not make sense in practice. I ask this question out of curiosity so that I can get a better understanding about how c and its underlying implementation work.
Here's the follow-up questions if it is possible to copy the code into heap:
How to determine the size for the function binary code when copy?
Can we use function pointer to execute the code? (the code is placed inside malloc'd memory, and that part of memory might be marked as non-executable for safety reason, but I'm not sure about this)

This (or something like it) is possible on most machines, but the techniques you'd use are system-specific -- there's no standard C or C++ way to do it.
Even figuring out the length of a function so you can copy it is difficult. I don't think you can do it reliably if the function is in the same translation unit, because the compiler may have done optimization magic that you can't see. However, if the function is in a different file, then the interface to it will probably be more reliable (although there could be linker magic going on that you would have to understand and emulate to accomplish your goal.)
Other problems (on some systems) are that malloc'd memory may not be executable. (This is often the case to improve security by preventing execution of code placed in an overrun buffer area.) However, systems with executable protection often have an alternate memory allocation function that can give you a chunk of memory where executable code can be placed, and to which execution can transfer. Some variation of this feature is necessary to implement shared libraries.
Finally, although self modifying code is probably the first thing people probably think of when considering your question, a reasonable, legitimate use of the relevant techniques might be in a native-code, just-in-time compilation system.
You may get better answers by specifying a particular OS and CPU where you want to do this.

The C standard (e.g. C11, read n1570) or the C++ one (e.g. C++11, C++14 and notice that they have lambda expressions and std::function; read more about closures ...) does not define what is a function address or pointer (it only defines what calling such an address does, then function pointers should point to existing functions and there is no standard way to build new ones dynamically at runtime). In some systems (pure Harvard architectures) a function sits in a different address space than the C heap (and on these systems executing anything in malloc-ed heap makes no sense and is undefined behavior). so the C11 standard forbids casting function pointers to data pointers and vice-versa.
So, to your question
Is there a way to malloc memory space and then put function code inside the space in C?
the answer is NO in general (but on some systems you could generate code at runtime, see below).
However, on desktop or laptop PCs or server PCs or tablets (running common OSes like Linux, Windows, MacOSX, Android), you usually have a Von Neumann architecture and there is (for a given process) a single virtual address space sharing both code and data (notably heap data obtained with malloc). That virtual address space organised in pages, and each page has its own memory protection. Read more about computer architecture, instruction sets, MMUs. Quite often heap allocated data is non-executable thru the NX bit.
The operating system plays an essential role. You need to read an entire book about OS, such as Operating Systems : Three Easy Pieces.
(I am guessing that you want to "create" some new functions in your program at runtime and call them thru C function pointers; you should explain why; I suppose you are coding some application for a PC or a tablet with a Unix-like OS, practically a Linux-x86_64 distribution, but you could adapt my answer to Windows)
You could use some libraries for JIT compilation such as asmjit, libgccjit, LLVM (or libjit or GNU lightning) and they generate code which is executable.
You could also use dynamic loading techniques on some plugin; on POSIX systems look into dlopen & dlsym (which can be used to "create" function addresses from a loaded plugin, beyond what the C11 standard allows). A possible way would be to generate some C code in a temporary file, compile it into a plugin, and dlopen that generated plugin. See this answer for more details.
On Linux, you can use the mmap(2) and related system calls (used to implement malloc in your C standard library, and also by dlopen(3)) to change your virtual address space, and the mprotect(2) system call to change protection (on a page by page basis). So if you want to explicitly copy or generate some function code it has to go into an executable page (PROT_EXEC).
Notice that because of relocation issues (and offsets or absolute addresses in machine code), it is not easy to copy machine code. Copying with memcpy the bytes of a given function code into some executable page usually won't work without pain: often CALL or JUMP machine instructions are using PC-relative addressing, so copying them without changing their offset won't work.
if it is possible to copy the code into heap
No, it is not possible in general; and in practice it is much more difficult than what you believe (even on Linux-x86_64, where other approaches that I mentioned are preferable); if you want to go that route you need to care about low level implementation details (instruction set, processor, compiler, calling conventions, ABIs, relocation) and your code would be non-portable and brittle.
How to determine the size for the function binary code when copy?
That question (and the notion of function size) has no sense in general. Some optimizing compilers are able to emit some machine code which is shared between several C functions, or to emit several non-contiguous machine code chunks for a given function (and gcc -O2 is likely to do these optimizations, read about function cloning). On Linux you could use dladdr(3) (or the nm or readelf programs) to get a "symbol size" in the ELF sense, but that size might not mean much. And as I explained, you can't just byte-copy binary machine code, you need to relocate (some parts of) it.

Avoiding stack overflows by allocating stack parts on the heap?

Is there a language where we can enable a mechanism that allocates new stack space on the heap when the original stack space is exceeded?
I remember doing a lab in my university where we fiddled with inline assembly in C to implement a heap-based extensible stack, so I know it should be possible in principle.
I understand it may be useful to get a stack overflow error when developing an app because it terminates a crazy infinite recursion quickly without making your system take lots of memory and begin to swap.
However, when you have a finished well-tested application that you want to deploy and you want it to be as robust as possible (say it's a pretty critical program running on a desktop computer), it would be nice to know it won't miserably fail on some other systems where the stack is more limited, where some objects take more space, or if the program is faced with a very particular case requiring more stack memory than in your tests.
I think it's because of these pitfalls that recursion is usually avoided in production code. But if we had a mechanism for automatic stack expansion in production code, we'd be able to write more elegant programs using recursion knowing it won't unexpectedly segfault while the system has 16 gigabytes of heap memory ready to be used...

There is precedent for it.
The runtime for GHC, a Haskell compiler, uses the heap instead of the stack. The stack is only used when you call into foreign code.
Google's Go implementation uses segmented stacks for goroutines, which enlarge the stack as necessary.
Mozilla's Rust used to use segmented stacks, although it was decided that it caused more problems than it solved (see [rust-dev] Abandoning segmented stacks in Rust).
If memory serves, some Scheme implementations put stack frames on the heap, then garbage collected the frames just like other objects.
In traditional programming styles for imperative languages, most code will avoid calling itself recursively. Stack overflows are rarely seen in the wild, and they're usually triggered by either sloppy programming or by malicious input--especially to recursive descent parsers and the like, which is why some parsers reject code when the nesting exceeds a threshold.
The traditional advice for avoiding stack overflows in production code:
Don't write recursive code. (Example: rewrite a search algorithm to use an explicit stack.)
If you do write recursive code, prove that recursion is bounded. (Example: searching a balanced tree is bounded by the logarithm of the size of the tree.)
If you can't prove that it's unbounded, add a bound to it. (Example: add a limit to the amount of nesting that a parser supports.)

I don't believe you will find a language mandating this. But a particular implementation could offer such a mechanism, and depending on the operating system it can very well be that the runtime environment enlarges the stack automatically as needed.

According to gcc's documentation, gcc can generate code which does this, if you compile with the -fsplit_stack option:
-fsplit-stack
Generate code to automatically split the stack before it overflows.
The resulting program has a discontiguous stack which can only
overflow if the program is unable to allocate any more memory.
This is most useful when running threaded programs, as it is no
longer necessary to calculate a good stack size to use for each
thread. This is currently only implemented for the i386 and
x86_64 backends running GNU/Linux.
When code compiled with -fsplit-stack calls code compiled
without -fsplit-stack, there may not be much stack space
available for the latter code to run. If compiling all code,
including library code, with -fsplit-stack is not an option,
then the linker can fix up these calls so that the code compiled
without -fsplit-stack always has a large stack. Support for
this is implemented in the gold linker in GNU binutils release 2.21
and later.
The llvm code generation framework provides support for segmented stacks, which are used in the go language and were originally used in Mozilla's rust (although they were removed from rust on the grounds that the execution overhead is too high for a "high-performance language". (See this mailing list thread)
Despite the rust-team's objections, segmented stacks are surprisingly fast, although the stack-thrashing problem can impact particular programs. (Some Go programs suffer from this issue, too.)
Another mechanism for heap-allocating stack segments in a relatively efficient way was proposed by Henry Baker in his 1994 paper Cheney on the MTA and became the basis of the run-time for Chicken Scheme, a compiled mostly R5RS-compatible scheme implementation.

Recursion is certainly not avoided in production code -- it's just used where and when appropriate.
If you're worried about it, the right answer may not simply be to switch to a manually-maintained stack in a vector or whatever -- though you can do that -- but to reorganize the logic. For example, the compiler I was working on replaced one deep recursive process with a worklist-driven process, since there wasn't a need to maintain strict nesting in the order of processing, only to ensure that variables we had a dependency upon were computed before being used.

If you link with a thread library (e.g. pthreads in C), each thread has a separate stack. Those stacks are allocated one way or another, ultimately (in a UNIX environment) with brk or an anonymous mmap. These might or might not use the heap on the way.
I note all the above answers refer to separate stacks; none explicitly says "on the heap" (in the C sense). I am taking it the poster simply means "from dynamically allocated memory" rather than the calling processor stack.

Maximum stack size needed for a C program on MSP430

In a C program that doesn't use recursion, it should be possible in theory to work out the maximum/worst case stack size needed to call a given function, and anything that it calls. Are there any free, open source tools that can do this, either from the source code or compiled ELF files?
Alternatively, is there a way to extract a function's stack frame size from an ELF file, so I can try to work it out manually?
I'm compiling for the MSP430 using MSPGCC 3.2.3 (I know it's an old version, but I have to use it in this case). The stack space to allocate is set in the source code, and should be as small as possible so that the rest of memory can be used for other things. I have read that you need to take account of the stack space used by interrupts, but the system I'm using already takes account of this - I'm trying to work out how much extra space to add on top of that. Also, I've read that function pointers make this difficult. In the few places where function pointers are used here, I know which functions they can call, so could take account of these cases manually if the stack space needed for the called functions and the calling functions was known.
Static analysis seems like a more robust option than stack painting at runtime, but working it out at runtime is an option if there's no good way to do it statically.
Edit:
I found GCC's -fstack-usage flag, which saves the frame size for each function as it is compiled. Unfortunately, MSPGCC doesn't support it. But it could be useful for anyone who is trying to do something similar on a different platform.

While static analysis is the best method for determining maximum stack usage you may have to resort to an experimental method. This method cannot guarantee you an absolute maximum but can provide you with a very good idea of your stack usage.
You can check your linker script to get the location of __STACK_END and __STACK_SIZE. You can use these to fill the stack space with an easily recognizable pattern like 0xDEAD or 0xAA55. Run your code through a torture test to try and make sure as many interrupts are generated as possible.
After the test you can examine the stack space to see how much of the stack was overwritten.

Interesting question.
I would expect this information to be statically available in the debugging data included in debug builds.
I had a brief look at the DWARF standard, and it does specify two attributes for functions called DW_AT_frame_base and DW_AT_static_link which can be used to "computes the frame
base of the relevant instance of the subroutine
that immediately encloses the subroutine or entry point".

I think that the only to go is by static analysis. You need to account the space for all non-static local variables, which are going to be mostly pointers, but pointers that are going to be stored in the stack anyway, you'll need also to reserve space for the current running address within the caller, as it's going to be stored by the compiler on the stack so control can be return to the caller after your function returns, and also, you need space for all your function parameters.
Based on that, if you have a tool able to count all parameters, auto variables and figure out their size, you should be able to calculate the minimum stack frame size you'll need.
Please note that the compiler could also try to align values on the stack for your particular architecture, what could make the stack space requirements a little bigger that what you'd expect from this calculation.

Some embedded IDE can give info on stack usageduring runtime
I know that IAR eembedded workbench supports it.
Be aware that you need to take in account that interrupts occur asynchronously, so take the biggest stack usage scenario and add interrupt context to it. If nested interrupts are supported like in ARM processors you need to take this in account also.

TinyOS has some work done on stack size analysis. It is described here:
http://tinyos.stanford.edu/tinyos-wiki/index.php/Stack_Analysis
They only support AVR, but say that "MSP430 is not difficult to support but this is not super high priority". In any case, the page provides lots of resources.

Initialise Stack pointer on ATtiny2313

I am programming an ATtiny2313 using avrdude and a makefile. I believe the stack pointer is not properly initialised, since when I call a function, the program appears to freeze. I found the following assembly code:
.include "tn2313def.inc"
ldi r16, low(RAMEND) ; Main program start
out SPL,r16 ;Set Stack Pointer to top of RAM
which I think might work, but I don't know how I can incorporate it into the c code that I created. ie. do I need to include a special header file or somehow denote that it is assembly and not c. I am relatively new to programming and I would appreciate any help either as to how to implement this code properly or another way of making my current c code initialise a stack pointer.
Thank you in advance.
Stephen

It really depends on how you've got your makefile configured as to whether the stack pointer will be initialised. If you're using gcc and the normal compile and link options, the linker ensures that some startup code crtX.o is also included in your executable. The linker automatically chooses the correct crtX.o file for your processor and compile options.
Amongst other things, the code in the crtX.o files will clear the bss segment to be all zeros as required by the C standard, configure your stack pointer and provide interrupt vectors in the correct location for those which have not been overridden.
Remember that the ATTiny2313 only has 128 bytes of SRAM. This area must be big enough for any initialised data you have in your program and the stack. Just the process of calling a simple function will use up quite a number of bytes of RAM to save the registers on the stack before calling the function.
So, I'd suggest to do these things:
Use the standard makefile if one is provided by your compiler, it will ensure that the standard startup code is included and that the stack/RAM is set up correctly before main() is called.
Turn on the linker map and symbol file output and verify that you actually have some space free that can be used for the stack.
The Atmel IDE has a reasonable simulator, so try running your code in the simulator. You'll be able to watch stack usage as you are calling the function and location any odd behaviour.
You may just happen to have a stack overflow (which is why you came to stackoverflow.com right?

Checking stack usage at compile time

Is there a way to know and output the stack size needed by a function at compile time in C ?
Here is what I would like to know :
Let's take some function :
void foo(int a) {
char c[5];
char * s;
//do something
return;
}
When compiling this function, I would like to know how much stack space it will consume whent it is called. This might be useful to detect the on stack declaration of a structure hiding a big buffer.
I am looking for something that would print something like this :
file foo.c : function foo stack usage is n bytes
Is there a way not to look at the generated assembly to know that ? Or a limit that can be set for the compiler ?
Update : I am not trying to avoid runtime stack overflow for a given process, I am looking for a way to find before runtime, if a function stack usage, as determined by the compiler, is available as an output of the compilation process.
Let's put it another way : is it possible to know the size of all the objects local to a function ? I guess compiler optimization won't be my friend, because some variable will disappear but a superior limit is fine.

Linux kernel code runs on a 4K stack on x86. Hence they care. What they use to check that, is a perl script they wrote, which you may find as scripts/checkstack.pl in a recent kernel tarball (2.6.25 has got it). It runs on the output of objdump, usage documentation is in the initial comment.
I think I already used it for user-space binaries ages ago, and if you know a bit of perl programming, it's easy to fix that if it is broken.
Anyway, what it basically does is to look automatically at GCC's output. And the fact that kernel hackers wrote such a tool means that there is no static way to do it with GCC (or maybe that it was added very recently, but I doubt so).
Btw, with objdump from the mingw project and ActivePerl, or with Cygwin, you should be able to do that also on Windows and also on binaries obtained with other compilers.

StackAnlyser seems to examinate the executable code itself plus some debugging info.
What is described by this reply, is what I am looking for, stack analyzer looks like overkill to me.
Something similar to what exists for ADA would be fine. Look at this manual page from the gnat manual :
22.2 Static Stack Usage Analysis
A unit compiled with -fstack-usage will generate an extra file that specifies the maximum amount of stack used, on a per-function basis. The file has the same basename as the target object file with a .su extension. Each line of this file is made up of three fields:
* The name of the function.
* A number of bytes.
* One or more qualifiers: static, dynamic, bounded.
The second field corresponds to the size of the known part of the function frame.
The qualifier static means that the function frame size is purely static. It usually means that all local variables have a static size. In this case, the second field is a reliable measure of the function stack utilization.
The qualifier dynamic means that the function frame size is not static. It happens mainly when some local variables have a dynamic size. When this qualifier appears alone, the second field is not a reliable measure of the function stack analysis. When it is qualified with bounded, it means that the second field is a reliable maximum of the function stack utilization.

I don't see why a static code analysis couldn't give a good enough figure for this.
It's trivial to find all the local variables in any given function, and the size for each variable can be found either through the C standard (for built in types) or by calculating it (for complex types like structs and unions).
Sure, the answer can't be guaranteed to be 100% accurate, since the compiler can do various sorts of optimizations like padding, putting variables in registers or completely remove unnecessary variables. But any answer it gives should be a good estimate at least.
I did a quick google search and found StackAnalyzer but my guess is that other static code analysis tools have similar capabilities.
If you want a 100% accurate figure, then you'd have to look at the output from the compiler or check it during runtime (like Ralph suggested in his reply)

Only the compiler would really know, since it is the guy that puts all your stuff together. You'd have to look at the generated assembly and see how much space is reserved in the preamble, but that doesn't really account for things like alloca which do their thing at runtime.

Assuming you're on an embedded platform, you might find that your toolchain has a go at this. Good commercial embedded compilers (like, for example the Arm/Keil compiler) often produce reports of stack usage.
Of course, interrupts and recursion are usually a bit beyond them, but it gives you a rough idea if someone has committed some terrible screw-up with a multi megabyte buffer on the stack somewhere.

Not exactly "compile time", but I would do this as a post-build step:
let the linker create a map file for you
for each function in the map file read the corresponding part of the executable, and analyse the function prologue.
This is similar to what StackAnalyzer does, but a lot simpler. I think analysing the executable or the disassembly is the easiest way you can get to the compiler output. While the compiler knows those things internally, I am afraid you will not be able to get it from it (you might ask the compiler vendor to implement the functionality, or if using open source compiler, you could do it yourself or let someone do it for you).
To implement this you need to:
be able to parse map file
understand format of the executable
know what a function prologue can look like and be able to "decode" it
How easy or difficult this would be depends on your target platform. (Embedded? Which CPU architecture? What compiler?)
All of this definitely can be done in x86/Win32, but if you never did anything like this and have to create all of this from the scratch, it can take a few days before you are done and have something working.

Not in general. The Halting Problem in theoretical computer science suggests that you can't even predict if a general program halts on a given input. Calculating the stack used for a program run in general would be even more complicated. So: no. Maybe in special cases.
Let's say you have a recursive function whose recursion level depends on the input which can be of arbitrary length and you are already out of luck.