Safe usage of `setjmp` and `longjmp` - c

I know people always say don't use longjmp, it's evil, it's dangerous.
But I think it can be useful for exiting deep recursions/nested function calls.
Is a single longjmp faster than a lot of repeated checks and returns like if(returnVal != SUCCESS) return returnVal;?
As for safety, as long as dynamic memory and other resources are released properly, there shouldn't be a problem, right?
So far it seems using longjmp isn't difficult and it even makes my code terser. I'm tempted to use it a lot.
(IMHO in many cases there is no dynamic memory/resources allocated within a deep recursion in the first place. Deep function call seems more common for data parsing/manipulation/validation. Dynamic allocation often happens at a higher level, before invoking the function where setjmp appears.)

setjmp and longjmp can be seen as a poor man's exception mechanism. BTW, Ocaml exceptions are as quick as setjmp but have a much clearer semantics.
Of course a longjmp is much faster than repeatedly returning error codes in intermediate functions, since it pops up a perhaps significant call stack portion.
(I am implicitly focusing on Linux)
They are valid and useful as long as no resources are allocated between them, including:
heap memory (malloc)
fopen-ing FILE* handles
opening operating system file descriptors (e.g. for sockets)
other operating system resources, such as timers or signal handlers
getting some external resource managed by some server, e.g. X11 windows (hence using any widget toolkit like GTK), or database handle or connection...
etc...
The main issue is that that property of not leaking resources is a global whole-program property (or at least global to all functions possibly called between setjmp and longjmp), so it prohibits modular software development : any other colleague having to improve some code in any function between setjmp and longjmp has to be aware of that limitation and follow that discipline.
Hence, if you use setjmp document that very clearly.
BTW, if you only care about malloc, using systematically Boehm's conservative garbage collector would help a lot; you'll use GC_malloc instead of malloc everywhere and you won't care about free, and practically that is enough; then you can use setjmp without fears (since you could call GC_malloc between setjmp and longjmp).
(notice that the concepts and the terminology around garbage collector are quite related to exception handling and setjmp, but many people don't know them enough. Reading the Garbage Collection Handbook should be worthwhile)
Read also about RAII and learn about C++11 exceptions (and their relation to destructors). Learn a bit about continuations and CPS.
Read setjmp(3), longjmp(3) (and also about sigsetjmp, siglongjmp, and setcontext(3)) and be aware that the compiler has to know about setjmp

You should note that calling setjmp in some contexts is not guaranteed to be safe (for example, you can't portably store the return value of setjmp).
Also, if you want to access local variables after calling setjmp, in the same function, that could have been changed you should mark that variables as volatile.
Using setjmp and longjmp is also useful because if the recursion causes a Stack Overflow, you can recover with a longjmp from the signal handler (don't forget to set an alternate stack) and return an error instead. If you want to do that you should consider to use sigsetjmp and siglongjmp for preserving signal dispositions.

Related

Deep stack unwinding

First of all, this is definitely about C, no C++ solutions are requested.
Target:
Return to the caller function (A) beyond multiple stack frames.
I have some solutions, but none of them feels like the best option.
The easiest one in the sense of implementation is longjmp/setjmp, but I am not sure
if it destroys auto variables, because as wiki refers, no normal stack unwinding
taking part if longjmp is performed.
Here is a short description of the program flow:
the A function calls file processing function, which results in many internal
and recursive invocations. At some point, file reader meets EOF, so the job of
file processing is done and control should be given to A function.
Comparing each read character against EOF or '\0'? No, thanks.
UPD: I can avoid dynamic allocations in the call chain between setjmp and longjmp.
Not being sure about auto variables, I do not know what will happen in sequential calls
to file processing (there is more than 1 file).
So:
1) Whats about 'no stack unwinding' by longjmp? How danger is that if I got all the
data holders available (pointers).
2) Other neat and effective ways to go back to the A frame?
I don't know what you read somewhere, but setjmp/longjmp is exactly the tool foreseen for the task.
longjmp re-establishes the "stack" exactly (well sort of) as it has been at the call to setjmp, all modifications to the "stack" that had been done between the two are lost, including all auto variables that have been defined. This re-establishment of the stack is brute forward, in C there is no concept of destructors, and this is perhaps meant by "no stack unwinding".
I put "stack" in quotes since this is not a term that the C standard applies, it only talks about state and allows that this is organized how it pleases to the implementation.
Now the only information that you are able to keep from the time between setjmp and longjmp are:
the value that you pass to longjmp
the value of modified volatile objects that you defined before setjmp
So in the branch where you come back from longjmp you have to use this (and only this) information to cleanup your mess: close files, free objects that you malloced etc.

Call C function with different stack pointer (gcc)

I'm looking for a way to call a C function in a different stack, i.e. save the current stack pointer, set the stack pointer to a different location, call the function and restore the old stack pointer when it returns.
The purpose of this is a lightweight threading system for a programming language. Threads will operate on very small stacks, check when more stack is needed and dynamically resize it. This is so that thousands of threads can be allocated without wasting a lot of memory. When calling in to C code it is not safe to use a tiny stack, since the C code does not know about checking and resizing, so I want to use a big pthread stack which is used only for calling C (shared between lightweight threads on the same pthread).
Now I could write assembly code stubs which will work fine, but I wondered if there is a better way to do this, such as a gcc extension or a library which already implements it. If not, then I guess I'll have my head buried in ABI and assembly language manuals ;-) I only ask this out of laziness and not wanting to reinvent the wheel.
Assuming you're using POSIX threads and on a POSIX system, you can achieve this with signals. Setup an alternate signal handling stack (sigaltstack) and designate one special real-time signal to have its handler run on the alternate signal stack. Then raise the signal to switch to the stack, and have the signal handler read the data for what function to call, and what argument to pass it, from thread-local data.
Note that this approach is fairly expensive (multiple system calls to change stacks), but should be 100% portable to POSIX systems. Since it's slow, you might want to make arch-specific call-on-alt-stack functions written in assembly, and only use my general solution as a fallback for archs where you haven't written an assembly version.

Is function call an effective memory barrier for modern platforms?

In a codebase I reviewed, I found the following idiom.
void notify(struct actor_t act) {
write(act.pipe, "M", 1);
}
// thread A sending data to thread B
void send(byte *data) {
global.data = data;
notify(threadB);
}
// in thread B event loop
read(this.sock, &cmd, 1);
switch (cmd) {
case 'M': use_data(global.data);break;
...
}
"Hold it", I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main memory. If thread A and thread B will run in two different processors - this scheme might fail".
The senior programmer grinned, and explained slowly, as if explaining his five years old boy how to tie his shoelaces: "Listen young boy, we've seen here many thread related bugs, in high load testing, and in real clients", he paused to scratch his longish beard, "but we've never had a bug with this idiom".
"But, it says in the book..."
"Quiet!", he hushed me promptly, "Maybe theoretically, it's not guaranteed, but in practice, the fact you used a function call is effectively a memory barrier. The compiler will not reorder the instruction global.data = data, since it can't know if anyone using it in the function call, and the x86 architecture will ensure that the other CPUs will see this piece of global data by the time thread B reads the command from the pipe. Rest assured, we have ample real world problems to worry about. We don't need to invest extra effort in bogus theoretical problems.
"Rest assured my boy, in time you'll understand to separate the real problem from the I-need-to-get-a-PhD non-problems."
Is he correct? Is that really a non-issue in practice (say x86, x64 and ARM)?
It's against everything I learned, but he does have a long beard and a really smart looks!
Extra points if you can show me a piece of code proving him wrong!
Memory barriers aren't just to prevent instruction reordering. Even if instructions aren't reordered it can still cause problems with cache coherence. As for the reordering - it depends on your compiler and settings. ICC is particularly agressive with reordering. MSVC w/ whole program optimization can be, too.
If your shared data variable is declared as volatile, even though it's not in the spec most compilers will generate a memory variable around reads and writes from the variable and prevent reordering. This is not the correct way of using volatile, nor what it was meant for.
(If I had any votes left, I'd +1 your question for the narration.)
In practice, a function call is a compiler barrier, meaning that the compiler will not move global memory accesses past the call. A caveat to this is functions which the compiler knows something about, e.g. builtins, inlined functions (keep in mind IPO!) etc.
So a processor memory barrier (in addition to a compiler barrier) is in theory needed to make this work. However, since you're calling read and write which are syscalls that change the global state, I'm quite sure that the kernel issues memory barriers somewhere in the implementation of those. There is no such guarantee though, so in theory you need the barriers.
The basic rule is: the compiler must make the global state appear to be exactly as you coded it, but if it can prove that a given function doesn't use global variables then it can implement the algorithm any way it chooses.
The upshot is that traditional compilers always treated functions in another compilation unit as a memory barrier because they couldn't see inside those functions. Increasingly, modern compilers are growing "whole program" or "link time" optimization strategies which break down these barriers and will cause poorly written code to fail, even though it's been working fine for years.
If the function in question is in a shared library then it won't be able to see inside it, but if the function is one defined by the C standard then it doesn't need to -- it already knows what the function does -- so you have to be careful of those also. Note that a compiler will not recognise a kernel call for what it is, but the very act of inserting something that the compiler can't recognise (inline assembler, or a function call to an assembler file) will create a memory barrier in itself.
In your case, notify will either be a black box the compiler can't see inside (a library function) or else it will contain a recognisable memory barrier, so you are most likely safe.
In practice, you have to write very bad code to fall over this.
In practice, he's correct and a memory barrier is implied in this specific case.
But the point is that if its presence is "debatable", the code is already too complex and unclear.
Really guys, use a mutex or other proper constructs. It's the only safe way to deal with threads and to write maintainable code.
And maybe you'll see other errors, like that the code is unpredictable if send() is called more than one time.

Catching stack overflow

What's the best way to catch stack overflow in C?
More specifically:
A C program contains an interpreter for a scripting language.
Scripts are not trusted, and may contain infinite recursion bugs. The interpreter has to be able to catch these and smoothly continue. (Obviously this can partly be handled by using a software stack, but performance is greatly improved if substantial chunks of library code can be written in C; at a minimum, this entails C functions running over recursive data structures created by scripts.)
The preferred form of catching a stack overflow would involve longjmp back to the main loop. (It's perfectly okay to discard all data that was held in stack frames below the main loop.)
The fallback portable solution is to use addresses of local variables to monitor the current stack depth, and for every recursive function to contain a call to a stack checking function that uses this method. Of course, this incurs some runtime overhead in the normal case; it also means if I forget to put the stack check call in one place, the interpreter will have a latent bug.
Is there a better way of doing it? Specifically, I'm not expecting a better portable solution, but if I had a system specific solution for Linux and another one for Windows, that would be okay.
I've seen references to something called structured exception handling on Windows, though the references I've seen have been about translating this into the C++ exception handling mechanism; can it be accessed from C, and if so is it useful for this scenario?
I understand Linux lets you catch a segmentation fault signal; is it possible to reliably turn this into a longjmp back to your main loop?
Java seems to support catching stack overflow exceptions on all platforms; how does it implement this?
Off the top of my head, one way to catch excessive stack growth is to check the relative difference in addresses of stack frames:
#define MAX_ROOM (64*1024*1024UL) // 64 MB
static char * first_stack = NULL;
void foo(...args...)
{
char stack;
// Compare addresses of stack frames
if (first_stack == NULL)
first_stack = &stack;
if (first_stack > &stack && first_stack - &stack > MAX_ROOM ||
&stack > first_stack && &stack - first_stack > MAX_ROOM)
printf("Stack is larger than %lu\n", (unsigned long)MAX_ROOM);
...code that recursively calls foo()...
}
This compares the address of the first stack frame for foo() to the current stack frame address, and if the difference exceeds MAX_ROOM it writes a message.
This assumes that you're on an architecture that uses a linear always-grows-down or always-grows-up stack, of course.
You don't have to do this check in every function, but often enough that excessively large stack growth is caught before you hit the limit you've chosen.
AFAIK, all mechanisms for detecting stack overflow will incur some runtime cost. You could let the CPU detect seg-faults, but that's already too late; you've probably already scribbled all over something important.
You say that you want your interpreter to call precompiled library code as much as possible. That's fine, but to maintain the notion of a sandbox, your interpreter engine should always be responsible for e.g. stack transitions and memory allocation (from the interpreted language's point of view); your library routines should probably be implemented as callbacks. The reason being that you need to be handling this sort of thing at a single point, for reasons that you've already pointed out (latent bugs).
Things like Java deal with this by generating machine code, so it's simply a case of generating code to check this at every stack transition.
(I won't bother those methods depending on particular platforms for "better" solutions. They make troubles, by limiting the language design and usability, with little gain. For answers "just work" on Linux and Windows, see above.)
First of all, in the sense of C, you can't do it in a portable way. In fact, ISO C mandates no "stack" at all. Pedantically, it even seems when allocation of automatic objects failed, the behavior is literally undefined, as per Clause 4p2 - there is simply no guarantee what would happen when the calls nested too deep. You have to rely on some additional assumptions of implementation (of ISA or OS ABI) to do that, so you end up with C + something else, not only C. Runtime machine code generation is also not portable in C level.
(BTW, ISO C++ has a notion of stack unwinding, but only in the context of exception handling. And there is still no guarantee of portable behavior on stack overflow; though it seems to be unspecified, not undefined.)
Besides to limit the call depth, all ways have some extra runtime cost. The cost would be quite easily observable unless there are some hardware-assisted means to amortize it down (like page table walking). Sadly, this is not the case now.
The only portable way I find is to not rely on the native stack of underlying machine architecture. This in general means you have to allocate the activation record frames as part of the free store (on the heap), rather than the native stack provided by ISA. This does not only work for interpreted language implementations, but also for compiled ones, e.g. SML/NJ. Such software stack approach does not always incur worse performance because they allow providing higher level abstraction in the object language so the programs may have more opportunities to be optimized, though it is not likely in a naive interpreter.
You have several options to achieve this. One way is to write a virtual machine. You can allocate memory and build the stack in it.
Another way is to write sophisticated asynchronous style code (e.g. trampolines, or CPS transformation) in your implementation instead, relying on less native call frames as possible. It is generally difficult to get right, but it works. Additional capabilities enabled by such way are easier tail call optimization and easier first-class continuation capture.

Patterns for freeing memory in C?

I'm currently working on a C based application am a bit stuck on freeing memory in a non-antipattern fashion. I am a memory-management amateur.
My main problem is I declare memory structures in various different scopes, and these structures get passed around by reference to other functions. Some of those functions may throw errors and exit().
How do I go about freeing my structures if I exit() in one scope, but not all my data structures are in that scope?
I get the feeling I need to wrap it all up in a psuedo exception handler and have the handler deal with freeing, but that still seems ugly because it would have to know about everything I may or may not need to free...
Consider wrappers to malloc and using them in a disciplined way. Track the memory that you do allocate (in a linked list maybe) and use a wrapper to exit to enumerate your memory to free it. You could also name the memory with an additional parameter and member of your linked list structure. In applications where allocated memory is highly scope dependent you will find yourself leaking memory and this can be a good method to dump the memory and analyze it.
UPDATE:
Threading in your application will make this very complex. See other answers regarding threading issues.
You don't need to worry about freeing memory when exit() is called. When the process exits, the operating system will free all of the associated memory.
I think to answer this question appropriately, we would need to know about the architecture of your entire program (or system, or whatever the case may be).
The answer is: it depends. There are a number of strategies you can use.
As others have pointed out, on a modern desktop or server operating system, you can exit() and not worry about the memory your program has allocated.
This strategy changes, for example, if you are developing on an embedded operating system where exit() might not clean everything up. Typically what I see is when individual functions return due to an error, they make sure to clean up anything they themselves have allocated. You wouldn't see any exit() calls after calling, say, 10 functions. Each function would in turn indicate an error when it returns, and each function would clean up after itself. The original main() function (if you will - it might not be called main()) would detect the error, clean up any memory it had allocated, and take the appropriate actions.
When you just have scopes-within-scopes, it's not rocket science. Where it gets difficult is if you have multiple threads of execution, and shared data structures. Then you might need a garbage collector or a way to count references and free the memory when the last user of the structure is done with it. For example, if you look at the source to the BSD networking stack, you'll see that it uses a refcnt (reference count) value in some structures that need to be kept "alive" for an extended period of time and shared among different users. (This is basically what garbage collectors do, as well.)
You can create a simple memory manager for malloc'd memory that is shared between scopes/functions.
Register it when you malloc it, de-register it when you free it. Have a function that frees all registered memory before you call exit.
It adds a bit of overhead, but it helps keep track of memory. It can also help you hunt down pesky memory leaks.
Michael's advice is sound - if you are exiting, you don't need to worry about freeing the memory since the system will reclaim it anyway.
One exception to that is shared memory segments - at least under System V Shared Memory. Those segments can persist longer than the program that creates them.
One option not mentioned so far is to use an arena-based memory allocation scheme, built on top of standard malloc(). If the entire application uses a single arena, your cleanup code can release that arena, and all is freed at once. (APR - Apache Portable Runtime - provides a pools feature which I believe is similar; David Hanson's "C Interfaces and Implementations" provides an arena-based memory allocation system; I've written one that you could use if you wanted to.) You can think of this as "poor man's garbage collection".
As a general memory discipline, every time you allocate memory dynamically, you should understand which code is going to release it and when it can be released. There are a few standard patterns. The simplest is "allocated in this function; released before this function returns". This keeps the memory largely under control (if you don't run too many iterations on the loop that contains the memory allocation), and scopes it so that it can be made available to the current function and the functions it calls. Obviously, you have to be reasonably sure that the functions you call are not going to squirrel away (cache) pointers to the data and try to reuse them later after you've released and reused the memory.
The next standard pattern is exemplified by fopen() and fclose(); there's a function that allocates a pointer to some memory, which can be used by the calling code, and then released when the program has finished with it. However, this often becomes very similar to the first case - it is usually a good idea to call fclose() in the function that called fopen() too.
Most of the remaining 'patterns' are somewhat ad hoc.
People have already pointed out that you probably don't need to worry about freeing memory if you're just exiting (or aborting) your code in case of error. But just in case, here's a pattern I developed and use a lot for creating and tearing down resources in case of error. NOTE: I'm showing a pattern here to make a point, not writing real code!
int foo_create(foo_t *foo_out) {
int res;
foo_t foo;
bar_t bar;
baz_t baz;
res = bar_create(&bar);
if (res != 0)
goto fail_bar;
res = baz_create(&baz);
if (res != 0)
goto fail_baz;
foo = malloc(sizeof(foo_s));
if (foo == NULL)
goto fail_alloc;
foo->bar = bar;
foo->baz = baz;
etc. etc. you get the idea
*foo_out = foo;
return 0; /* meaning OK */
/* tear down stuff */
fail_alloc:
baz_destroy(baz);
fail_baz:
bar_destroy(bar);
fail_bar:
return res; /* propagate error code */
}
I can bet I'm going to get some comments saying "this is bad because you use goto". But this is a disciplined and structured use of goto that makes code clearer, simpler, and easier to maintain if applied consistently. You can't achieve a simple, documented tear-down path through the code without it.
If you want to see this in real in-use commercial code, take a look at, say, arena.c from the MPS (which is coincidentally a memory management system).
It's a kind of poor-man's try...finish handler, and gives you something a bit like destructors.
I'm going to sound like a greybeard now, but in my many years of working on other people's C code, lack of clear error paths is often a very serious problem, especially in network code and other unreliable situations. Introducing them has occasionally made me quite a bit of consultancy income.
There are plenty of other things to say about your question -- I'm just going to leave it with this pattern in case that's useful.
Very simply, why not have a reference counted implementation, so when you create an object and pass it around you increment and decrement the reference counted number (remember to be atomic if you have more than one thread).
That way, when an object is no longer used (zero references) you can safely delete it, or automatically delete it in the reference count decrement call.
This sounds like a task for a Boehm garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Depends on the system of course whether you can or should afford to use it.

Resources