Allocating a new call stack - c

(I think there's a high chance of this question either being a duplicate or otherwise answered here already, but searching for the answer is hard thanks to interference from "stack allocation" and related terms.)
I have a toy compiler I've been working on for a scripting language. In order to be able to pause the execution of a script while it's in progress and return to the host program, it has its own stack: a simple block of memory with a "stack pointer" variable that gets incremented using the normal C code operations for that sort of thing and so on and so forth. Not interesting so far.
At the moment I compile to C. But I'm interested in investigating compiling to machine code as well - while keeping the secondary stack and the ability to return to the host program at predefined control points.
So... I figure it's not likely to be a problem to use the conventional stack registers within my own code, I assume what happens to registers there is my own business as long as everything is restored when it's done (do correct me if I'm wrong on this point). But... if I want the script code to call out to some other library code, is it safe to leave the program using this "virtual stack", or is it essential that it be given back the original stack for this purpose?
Answers like this one and this one indicate that the stack isn't a conventional block of memory, but that it relies on special, system specific behaviour to do with page faults and whatnot.
So:
is it safe to move the stack pointers into some other area of memory? Stack memory isn't "special"? I figure threading libraries must do something like this, as they create more stacks...
assuming any area of memory is safe to manipulate using the stack registers and instructions, I can think of no reason why it would be a problem to call any functions with a known call depth (i.e. no recursion, no function pointers) as long as that amount is available on the virtual stack. Right?
stack overflow is obviously a problem in normal code anyway, but would there be any extra-disastrous consequences to an overflow in such a system?
This is obviously not actually necessary, since simply returning the pointers to the real stack would be perfectly serviceable, or for that matter not abusing them in the first place and just putting up with fewer registers, and I probably shouldn't try to do it at all (not least due to being obviously out of my depth). But I'm still curious either way. Want to know how these sorts of things work.
EDIT: Sorry of course, should have said. I'm working on x86 (32-bit for my own machine), Windows and Ubuntu. Nothing exotic.

All of these answer are based on "common processor architectures", and since it involves generating assembler code, it has to be "target specific" - if you decide to do this on processor X, which has some weird handling of stack, below is obviously not worth the screensurface it's written on [substitute for paper]. For x86 in general, the below holds unless otherwise stated.
is it safe to move the stack pointers into some other area of memory?
Stack memory isn't "special"? I figure threading libraries
must do something like this, as they create more stacks...
The memory as such is not special. This does however assume that it's not on an x86 architecture where the stack segment is used to limit the stack usage. Whilst that is possible, it's rather rare to see in an implementation. I know that some years ago Nokia had a special operating system using segments in 32-bit mode. As far as I can think of right now, that's the only one I've got any contact with that uses the stack segment for as x86-segmentation mode describes.
Assuming any area of memory is safe to manipulate using the stack
registers and instructions, I can think of no reason why it would be a
problem to call any functions with a known call depth (i.e. no
recursion, no function pointers) as long as that amount is available
on the virtual stack. Right?
Correct. Just as long as you don't expect to be able to get back to some other function without switching back to the original stack. Limited level of recursion would also be acceptable, as long as the stack is deep enough [there are certain types of problems that are definitely hard to solve without recursion - binary tree search for example].
stack overflow is obviously a problem in normal code anyway,
but would there be any extra-disastrous consequences to an overflow in
such a system?
Indeed, it would be a tough bug to crack if you are a little unlucky.
I would suggest that you use a call to VirtualProtect() (Windows) or mprotect() (Linux etc) to mark the "end of the stack" as unreadable and unwriteable so that if your code accidentally walks off the stack, it crashes properly rather than some other more subtle undefined behaviour [because it's not guaranteed that the memory just below (lower address) is unavailable, so you could overwrite some other useful things if it does go off the stack, and that would cause some very hard to debug bugs].
Adding a bit of code that occassionally checks the stack depth (you know where your stack starts and ends, so it shouldn't be hard to check if a particular stack value is "outside the range" [if you give yourself some "extra buffer space" between the top of the stack and the "we're dead" zone that you protected - a "crumble zone" as they would call it if it was a car in a crash]. You can also fill the entire stack with a recognisable pattern, and check how much of that is "untouched".

Typically, on x86, you can use the existing stack without any problems so long as:
you don't overflow it
you don't increment the stack pointer register (with pop or add esp, positive_value / sub esp, negative_value) beyond what your code starts with (if you do, interrupts or asynchronous callbacks (signals) or any other activity using the stack will trash its contents)
you don't cause any CPU exception (if you do, the exception handling code might not be able to unwind the stack to the nearest point where the exception can be handled)
The same applies to using a different block of memory for a temporary stack and pointing esp to its end.
The problem with exception handling and stack unwinding has to do with the fact that your compiled C and C++ code contains some exception-handling-related data structures like the ranges of eip with the links to their respective exception handlers (this tells where the closest exception handler is for every piece of code) and there's also some information related to identification of the calling function (i.e. where the return address is on the stack, etc), so you can bubble up exceptions. If you just plug in raw machine code into this "framework", you won't properly extend these exception-handling data structures to cover it, and if things go wrong, they'll likely go very wrong (the entire process may crash or become damaged, despite you having exception handlers around the generated code).
So, yeah, if you're careful, you can play with stacks.

You can use any region you like for the processor's stack (modulo the memory protections).
Essentially, you simply load the ESP register ("MOV ESP, ...") with a pointer to the new area, however you managed to allocate it.
You have to have enough for your program, and whatever it might call (e.g., a Windows OS API), and whatever funny behaviours the OS has. You might be able to figure out how much space your code needs; a good compiler can easily do that. Figuring how much is needed by Windows is harder; you can always allocate "way too much" which is what Windows programs tend to do.
If you decide to manage this space tightly, you'll probably have to switch stacks to call Windows functions. That won't be enough; you'll likely get burned by various Windows surprises. I describe one of them here Windows: avoid pushing full x86 context on stack. I have mediocre solutions, but not good solutions for this.

Related

How to know/limit static stack size in C program with GCC/Clang compiler? [duplicate]

This question already has answers here:
How to determine maximum stack usage in embedded system with gcc?
(7 answers)
Closed 1 year ago.
I'm writing an embedded program that uses a static limited stack area of a known size (in other words, I have X bytes for the stack, and there's no overlaying OS that can allocate more stack on demand for me). I want to avoid errors during runtime, and catch them in build time instead - to have some indication if I mistakenly declared too much variables in some function block that won't fit in the stack during the runtime.
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path? Or at least know how much space my variables will take in a single block (function) if the compiler is not smart enough to analyze it on all the nested calls?
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path?
Only if you don't use interrupts. Which is extremely likely in any embedded system. So you'll have to find out stack use with dynamic analysis.
The old school way is to set your whole stack area to a value like 0xAA upon reset from a debugger, then let the program run for a while, make sure to provoke all use-cases. Then halt and inspect how far down you still have 0xAA in memory. It isn't a 100% scientific, fool-proof method but works just fine in practice, in the vast majority of cases.
Other methods involve setting write breakpoints at certain stack locations where you don't expect the program to end up, sort of like a "hardware stack canary". Run the program and ensure that the breakpoint never triggers. If it does, then investigate from there, move the breakpoint further down the memory map to see exactly where.
Another good practice is to always memory map your stack so that it can only overflow into forbidden memory or at least into read-only flash etc - ideally you'd get a hardware exception for stack overflow. You definitely want to avoid the stack overflowing into other RAM sections like .data/.bss, as that will cause severe and extremely subtle error scenarios.

Usage example of the nonexistent sstk() system call?

Keeping with the festivities of Stackoverflow's new logo design, I was curious what the point of sstk() was supposed to be in BSD and other UNIX-Like operating systems?
According to the Linux kernel system call interface manpages, sstk(2) was supposed to:
...[change] the size of the stack area. The stack area is also automatically extended as needed. On the VAX the text and data areas are adjacent in the P0 region, while the stack section is in the P1 region, and grows downward.
However, also according to the manual:
This call is not supported in 4.3BSD or 4.4BSD or glibc or Linux or any other known Unix-like system. Some systems have a routine of this name that returns ENOSYS.
Which can be noticed by viewing glibc's sstk.c source
My question is, why would one want to manually change the size of the stack? sbrk() and friends make sense, but is there any use in manually re-sizing the stack size in your program manually?
As I first expressed in comments, the fact that the call is not supported any-known-where suggests that in fact there isn't much point to it, at least not any more.
The docs on linux.die.net attribute the function's heritage to BSD (though it's apparently not supported in modern BSD any more than it is anywhere else), and BSD traces its lineage to bona fide AT&T Unix. It may have made more sense in days when RAM was precious. In those days, you also might not have been able to rely on the stack size being increased automatically. Thus you might, say, enlarge the stack dynamically within a deeply recursive algorithm, and then shrink it back afterward.
Another plausible reason for explicitly growing the stack with a syscall would be so you can get a clean error indication if the request is too big, as opposed to the normal method of handling stack allocation failures (i.e. don't even try, if any allocation fails, just let the process crash).
It will be hard to know exactly how much stack space you need to perform some recursive operation, but you could make a reasonable guess and sstk(guess*10) just to be sure.

Catching stack overflow

What's the best way to catch stack overflow in C?
More specifically:
A C program contains an interpreter for a scripting language.
Scripts are not trusted, and may contain infinite recursion bugs. The interpreter has to be able to catch these and smoothly continue. (Obviously this can partly be handled by using a software stack, but performance is greatly improved if substantial chunks of library code can be written in C; at a minimum, this entails C functions running over recursive data structures created by scripts.)
The preferred form of catching a stack overflow would involve longjmp back to the main loop. (It's perfectly okay to discard all data that was held in stack frames below the main loop.)
The fallback portable solution is to use addresses of local variables to monitor the current stack depth, and for every recursive function to contain a call to a stack checking function that uses this method. Of course, this incurs some runtime overhead in the normal case; it also means if I forget to put the stack check call in one place, the interpreter will have a latent bug.
Is there a better way of doing it? Specifically, I'm not expecting a better portable solution, but if I had a system specific solution for Linux and another one for Windows, that would be okay.
I've seen references to something called structured exception handling on Windows, though the references I've seen have been about translating this into the C++ exception handling mechanism; can it be accessed from C, and if so is it useful for this scenario?
I understand Linux lets you catch a segmentation fault signal; is it possible to reliably turn this into a longjmp back to your main loop?
Java seems to support catching stack overflow exceptions on all platforms; how does it implement this?
Off the top of my head, one way to catch excessive stack growth is to check the relative difference in addresses of stack frames:
#define MAX_ROOM (64*1024*1024UL) // 64 MB
static char * first_stack = NULL;
void foo(...args...)
{
char stack;
// Compare addresses of stack frames
if (first_stack == NULL)
first_stack = &stack;
if (first_stack > &stack && first_stack - &stack > MAX_ROOM ||
&stack > first_stack && &stack - first_stack > MAX_ROOM)
printf("Stack is larger than %lu\n", (unsigned long)MAX_ROOM);
...code that recursively calls foo()...
}
This compares the address of the first stack frame for foo() to the current stack frame address, and if the difference exceeds MAX_ROOM it writes a message.
This assumes that you're on an architecture that uses a linear always-grows-down or always-grows-up stack, of course.
You don't have to do this check in every function, but often enough that excessively large stack growth is caught before you hit the limit you've chosen.
AFAIK, all mechanisms for detecting stack overflow will incur some runtime cost. You could let the CPU detect seg-faults, but that's already too late; you've probably already scribbled all over something important.
You say that you want your interpreter to call precompiled library code as much as possible. That's fine, but to maintain the notion of a sandbox, your interpreter engine should always be responsible for e.g. stack transitions and memory allocation (from the interpreted language's point of view); your library routines should probably be implemented as callbacks. The reason being that you need to be handling this sort of thing at a single point, for reasons that you've already pointed out (latent bugs).
Things like Java deal with this by generating machine code, so it's simply a case of generating code to check this at every stack transition.
(I won't bother those methods depending on particular platforms for "better" solutions. They make troubles, by limiting the language design and usability, with little gain. For answers "just work" on Linux and Windows, see above.)
First of all, in the sense of C, you can't do it in a portable way. In fact, ISO C mandates no "stack" at all. Pedantically, it even seems when allocation of automatic objects failed, the behavior is literally undefined, as per Clause 4p2 - there is simply no guarantee what would happen when the calls nested too deep. You have to rely on some additional assumptions of implementation (of ISA or OS ABI) to do that, so you end up with C + something else, not only C. Runtime machine code generation is also not portable in C level.
(BTW, ISO C++ has a notion of stack unwinding, but only in the context of exception handling. And there is still no guarantee of portable behavior on stack overflow; though it seems to be unspecified, not undefined.)
Besides to limit the call depth, all ways have some extra runtime cost. The cost would be quite easily observable unless there are some hardware-assisted means to amortize it down (like page table walking). Sadly, this is not the case now.
The only portable way I find is to not rely on the native stack of underlying machine architecture. This in general means you have to allocate the activation record frames as part of the free store (on the heap), rather than the native stack provided by ISA. This does not only work for interpreted language implementations, but also for compiled ones, e.g. SML/NJ. Such software stack approach does not always incur worse performance because they allow providing higher level abstraction in the object language so the programs may have more opportunities to be optimized, though it is not likely in a naive interpreter.
You have several options to achieve this. One way is to write a virtual machine. You can allocate memory and build the stack in it.
Another way is to write sophisticated asynchronous style code (e.g. trampolines, or CPS transformation) in your implementation instead, relying on less native call frames as possible. It is generally difficult to get right, but it works. Additional capabilities enabled by such way are easier tail call optimization and easier first-class continuation capture.

How will I know when my memory is full?

I'm writing a firmware for a Atmel XMEGA microcontroller in c and I think I filled up the 4 KB of SRAM. As far as I know I only do have static/global data and local stack variables (I don't use malloc within my code).
I use a local variable to buffer some pixel data. If I increase the buffer to 51 bytes my display is showing strange results - a buffer of 6 bytes is doing fine. This is why I think my ram is full and the stack is overwriting something.
Creating more free memory is not my problem because I can just move some static data into the flash and only load it when its needed. What bothers me is the fact that I could have never discovered that the memory got full.
Is it somehow possible to dected (e.g. by reseting the microcontroller) when the memory got filled up instead of letting it overwrite some other data?
It can be very difficult to predict exactly how much stack you'll need (some toolchains can have a go at this if you turn on the right options, but it's only ever a rough guide).
A common way of checking on the state of the stack is to fill it completely with a known value at startup, run the code as hard/long as you can, and then see how much had not been overwritten.
The startup code for your toolchain might even have an option to fill the stack for you.
Unfortunately, although the concepts are very simple: fill the stack with a known value, count the number of those values which remain, the reality of implementing it can require quite a deep understanding of the way your specific tools (particularly the startup code and the linker) work.
Crude ways to check if stack overflow is what's causing your problem are to make all your local arrays 'static' and/or to hugely increase the size of the stack and then see if things work better. These can both be difficult to do on small embedded systems.
"Is it somehow possible to dected (e.g.
by reseting the microcontroller) when
the memory got filled up instead of
letting it overwrite some other data?"
I suppose currently you have a memory mapping like (1).
When stack and/or variable space grow to much, they collide and overwrite each other (*).
Another possibility is a memory mapping like (2).
When stack or variable space exceeds the maximum space, they hit the not mapped addr space (*).
Depending on the controller (I am not sure about AVR family) this causes a reset/trap or similar (= what you desired).
[not mapped addr space][ RAM mapped addr space ][not mapped addr space]
(1) [variables ---> * <--- stack]
(2) *[ <--- stack variables ---> ]*
(arrows indicate growing direction if more variable/stack is used)
Of course it is better to make sure beforehand that RAM is big enough.
Typically the linker is responsible for allocating the memory for code, constants, static data, stacks and heaps. Often you must specify required stack sizes (and available memory) to the linker which will then flag an error if it can't fit everything in.
Note also that if you're dealing with a multithreaded application, then each thread has it's own stack and these are frequently allocated off the heap as the thread starts.
Unless your processor has some hardware checking for stack overflow on it (unlikely), there are a couple of tricks you can use to monitor the stack usage.
Fill the stack with a known marker pattern, and examine the stack memory (as allocated by the linker) to determine how much of the marker remains uncorrupted.
In a timer interrupt (or similar) compare the main thread stack pointer with the base of the stack to check for overflow
Both of these approaches are useful in debugging, but they are not guaranteed to catch all problems and will typically only flag a problem AFTER the stack has already corrupted something else...
Usually your programming tool knows the parameters of the controller, so you should be warned if you used more (without mallocs, it is known at compile time).
But you should be careful with pixeldata, because most displays don't have linear address space.
EDIT: usually you can specify the stack size manually. Leave just enough memory for static variables, and reserve the rest for stack.

Are programming languages and methods inefficient? (assembler and C knowledge needed)

for a long time, I am thinking and studying output of C language compiler in assembler form, as well as CPU architecture. I know this may be silly to you, but it seems to me that something is very ineffective. Please, don´t be angry if I am wrong, and there is some reason I do not see for all these principles. I will be very glad if you tell me why is it designed this way. I actually truly believe I am wrong, I know the genius minds of people which get PCs together knew a reason to do so. What exactly, do you ask? I´ll tell you right away, I use C as a example:
1: Stack local scope memory allocation:
So, typical local memory allocation uses stack. Just copy esp to ebp and than allocate all the memory via ebp. OK, I would understand this if you explicitly need allocate RAM by default stack values, but if I do understand it correctly, modern OS use paging as a translation layer between application and physical RAM, when address you desire is further translated before reaching actual RAM byte. So why don´t just say 0x00000000 is int a,0x00000004 is int b and so? And access them just by mov 0x00000000,#10? Because you wont actually access memory blocks 0x00000000 and 0x00000004 but those your OS set the paging tables to. Actually, since memory allocation by ebp and esp use indirect addressing, "my" way would be even faster.
2: Variable allocation duplicity:
When you run application, Loader load its code into RAM. When you create variable, or string, compiler generates code that pushes these values on the top o stack when created in main. So there is actual instruction for do so, and that actual number in memory. So, there are 2 entries of the same value in RAM. One in form of instruction, second in form of actual bytes in the RAM. But why? Why not to just when declaring variable count at which memory block it would be, than when used, just insert this memory location?
How would you implement recursive functions? What you are describing is equivalent to using global variables everywhere.
That's just one problem. How can you link to a precompiled object file and be sure it won't corrupt the memory of your procedures?
Because C (and most other languages) support recursion, so a function can call itself, and each call of the function needs separate copies of any local variables. Also, on most current processors, your way would actually be slower -- indirect addressing is so common that processors are optimized for it.
You seem to want the behavior of C (or at least that C allows) for string literals. There are good and bad points to this, such as the fact that even though you've defined a "variable", you can't actually modify its contents (without affecting other variables that are pointing at the same location).
The answers to your questions are mostly wrapped up in the different semantics of different storage classes
Google "data segment"
Think about the difference in behavior between global and local variables.
Think about how constant and non-constant variables have different requirements when functions are called repeatedly (or as Mehrdad says, recursively)
Think about the difference between static and non static automatic variables again in the context of multiple or recursive calls.
Since you are comparing assembler and c (which are very close together from an architectural standpoint), I'm inclined to say that you're describing micro-optimization, which is meaningless unless you profile the code to see if it performs better.
In general, programming languages are evolving towards a more declarative style (i.e. telling the computer what you want done, rather than how you want it done). When you program in an imperative language (like assembly or c), you specify in extreme detail how you want the problem solved. This gives the compiler little room to make optimization decisions on your behalf.
However, as the languages become more declarative, the compilers are getting smarter, because we are giving them the room they need to make more intelligent performance optimizations.
If every function would put its first variable at offset 0 and so on then you would have to change the memory mapping each time you enter a function (you could not allocate all variables to unique addresses if you want recursion). This is doable, but with current hardware it's very slow. Furthermore, the address translation performed by the virtual memory is not free either, it's actually quite complicated to implement this efficiently.
Addressing off ebp (or any other register) costs having a mux (to select the register) and an adder (to add the offset to the register). The time taken for this can often be overlapped with other operations.
If you want to be able to modify the static value you have to copy it to the stack. If you don't (saying it's 'const') then a good C compiler will no copy it to the stack.

Resources