Deep stack unwinding - c

First of all, this is definitely about C, no C++ solutions are requested.
Target:
Return to the caller function (A) beyond multiple stack frames.
I have some solutions, but none of them feels like the best option.
The easiest one to implement is longjmp/setjmp, but I am not sure
whether it destroys auto variables because, as the wiki says, no normal stack
unwinding takes place when longjmp is performed.
Here is a short description of the program flow:
the A function calls file processing function, which results in many internal
and recursive invocations. At some point, file reader meets EOF, so the job of
file processing is done and control should be given to A function.
Comparing each read character against EOF or '\0'? No, thanks.
UPD: I can avoid dynamic allocations in the call chain between setjmp and longjmp.
Not being sure about auto variables, I do not know what will happen in sequential calls
to file processing (there is more than 1 file).
So:
1) What about the 'no stack unwinding' by longjmp? How dangerous is that if I still have
all the data holders (pointers) available?
2) Are there other neat and effective ways to go back to the A frame?

I don't know what you read somewhere, but setjmp/longjmp is exactly the tool foreseen for the task.
longjmp re-establishes the "stack" exactly (well, sort of) as it was at the call to setjmp; all modifications to the "stack" made between the two are lost, including all auto variables that have been defined. This re-establishment of the stack is brute force: in C there is no concept of destructors, and this is perhaps what is meant by "no stack unwinding".
I put "stack" in quotes since this is not a term that the C standard applies, it only talks about state and allows that this is organized how it pleases to the implementation.
Now the only local state from the time between setjmp and longjmp that you can rely on afterwards is:
the value that you pass to longjmp
the value of modified volatile objects that you defined before setjmp
So in the branch where you come back from longjmp you have to use this (and only this) information to clean up your mess: close files, free objects that you malloced, etc.
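A minimal sketch of the flow described above, assuming a string terminator stands in for EOF. All names (back_to_caller, process, process_input) are hypothetical and not from the question. The counter is volatile because it is modified between setjmp and longjmp and read afterwards.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not the asker's code. */
static jmp_buf back_to_caller;

/* Stands in for the recursive file-processing code: consumes one
 * character per call and jumps out when it meets the terminator. */
static void process(const char *p, volatile size_t *consumed)
{
    if (*p == '\0')
        longjmp(back_to_caller, 1);   /* "EOF": leave every frame at once */
    ++*consumed;
    process(p + 1, consumed);
}

/* Plays the role of function A. Returns the number of characters seen. */
size_t process_input(const char *text)
{
    /* Modified after setjmp and read after longjmp, hence volatile. */
    volatile size_t consumed = 0;

    assert(text != NULL);
    if (setjmp(back_to_caller) == 0)
        process(text, &consumed);     /* only ever comes back via longjmp */

    /* Reached through longjmp: this is the place to close files and
     * free anything that was allocated before the jump. */
    return consumed;
}
```

Because process_input calls setjmp afresh on every invocation, calling it once per file in sequence should be fine: process_input("abc") yields 3 and a following process_input("") yields 0.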

Related

Python C API - Is it thread safe?

I have a C extension that is called from my multithreaded Python application. I use a static variable i somewhere in a C function, and I have a few i++ statements later on that can be run from different Python threads (that variable is only used in my C code though, I don't yield it to Python).
For some reason I haven't met any race condition so far, but I wonder if it's just luck...
I don't have any thread-related C code (no Py_BEGIN_ALLOW_THREADS or anything).
I know that the GIL only guarantees single bytecode instructions to be atomic and thread-safe, thus statements as i+=1 in Python are not thread-safe.
But I don't know about a i++ instruction in a C extension. Any help ?
Python will not release the GIL when you are running C code (unless you either tell it to or cause the execution of Python code - see the warning note at the bottom!). It only releases the GIL just before a bytecode instruction (not during) and from the interpreter's point of view running a C function is part of executing the CALL_FUNCTION bytecode.* (Unfortunately I can't find a reference for this paragraph currently, but I'm almost certain it's right)
Therefore, unless you do anything specific your C code will be the only thread running and thus any operation you do in it should be thread safe.
If you specifically want to release the GIL - for example because you're doing a long calculation which doesn't interfere with Python, reading from a file, or sleeping while waiting for something else to happen - then the easiest way is to do Py_BEGIN_ALLOW_THREADS then Py_END_ALLOW_THREADS when you want to get it back. During this block you cannot use most Python API functions and it's your responsibility to ensure thread safety in C. The easiest way to do this is to only use local variables and not read or write any global state.
If you've already got a C thread running without the GIL (thread A) then simply holding the GIL in thread B does not guarantee that thread A won't modify C global variables. To be safe you need to ensure that you never modify global state without some kind of locking mechanism (either the Python GIL or a C mechanism) in all your C functions.
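As one example of a "C mechanism", here is a minimal sketch that replaces a plain static counter with a C11 atomic, so that the increment stays safe even in code that runs without the GIL. This swaps in atomics rather than a mutex, and the names call_count and bump_call_count are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Static storage duration, so the counter starts at zero. */
static atomic_int call_count;

/* Hypothetical helper an extension function could call instead of i++.
 * Returns the counter value after the increment. */
int bump_call_count(void)
{
    /* The read-modify-write happens as one indivisible step. */
    int previous = atomic_fetch_add(&call_count, 1);

    assert(previous < INT_MAX);   /* guard against wrap-around */
    return previous + 1;
}
```

An atomic only protects the single variable. State that spans several variables still needs a real lock.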
Additional thought
* One place where the GIL can be released in C code is if the C code calls something that causes Python code to be executed. This might be through using PyObject_Call. A less obvious place would be if Py_DECREF caused a destructor to be executed. You'd have the GIL back by the time your C code resumed, but you could no longer guarantee that global objects were unchanged. This obviously doesn't affect simple C like x++.
Belated Edit:
It should be emphasised that it's really, really, really easy to cause the execution of Python code. For this reason you shouldn't use the GIL in place of a mutex or actual locking mechanism. You should only consider it for operations that are really atomic (i.e. a single C API call) or that work entirely on non-Python C objects. You won't lose the GIL unexpectedly while executing C code, but a lot of C API calls may release the GIL, do something else, and then regain the GIL before returning to your C code.
The purpose of the GIL is to make sure that the Python internals don't get corrupted. The GIL will continue to serve this purpose within an extension module. However, race conditions that involve valid Python objects arranged in ways you don't expect are still available to you. For example:
PySequence_SetItem(some_list, 0, some_item);
PyObject* item = PySequence_GetItem(some_list, 0);
assert(item == some_item); // may not be true
// the destructor of the previous contents of item 0 may have released the GIL

Recursion With Memory Allocation

I was watching a video about CUDA and the Barnes-Hut algorithm where it was stated that it is necessary to place a depth limit on the tree for the GPU, and then the idea popped into my head about possibly doing recursion in the heap.
Basically, I am wondering just that: Is it possible to allocate memory from the heap and use that as a temporary "stack" in which to place function calls for the recursive function in question to somewhat delay a stack overflow?
If so, how could it be implemented, would we allocate space for a pointer to the function? I assume it would involve storing function address in the heap however I'm not too sure.
[edit] I just wanted to add that this is purely a theoretical question, and I would imagine that doing this would slow the program down because of the heap usage.
[edit] As per request, the compiler I am using is GCC 4.8.4 on Ubuntu 14.04 (64-bit)
Sure. This is called continuation-passing style. The standard library supports it with setjmp() and longjmp(), and stores the information needed to restore control to an earlier point in a structure called jmp_buf. (There are several restrictions on where you can restore from.) You would store them in a stack, which is just a LIFO queue.
A more general approach is to run the program as a state machine and store the information needed to backtrack the program state, called a continuation, in a data structure called a trampoline. A common reason to want to do this is to get the equivalent of tail-recursion in an implementation that doesn’t optimize it and might chew up lots of stack space. One real-world application where someone I know is currently writing a trampoline is a GLL parser where the grammar is represented as a directed graph, the result of the parse is a shared packed parse forest, and the parser often needs to backtrack to try a different rule.
Continuation-passing and trampolines seem to be regarded as fancy style because they come from the world of functional programming, while longjmp() is regarded as an ugly low-level hack and even the Linux man page says not to use it.
You can simulate this by implementing your own heap-based stack as an array of structures, with each structure representing a stack frame that holds the equivalent of parameters and local variables. Instead of a function calling itself recursively, the function loops and each "call" explicitly pushes a new frame onto the stack.
I did exactly this years ago while attempting to solve a simple board game. The program was originally recursive, and it took forever to run. I changed it to the above structure, and this made it simple to make the app interruptible/restartable. When interrupted the app dumped its "stack" to a state file. When restarted, the app loaded the state file and continued where it left off.
This does require some care if the stack frame structure contains embedded pointers, but it's not insurmountable.
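The heap-based stack described above can be sketched roughly as follows. The example is an assumption chosen for illustration (summing a binary tree), and the names node, frame and tree_sum are hypothetical. Each "call" becomes a push onto a malloc'd array and each "return" becomes a pop.

```c
#include <assert.h>
#include <stdlib.h>

struct node { int value; struct node *left, *right; };

/* One frame per pending "call". Here a call only needs the node,
 * but parameters and locals of a real function would live here too. */
struct frame { const struct node *n; };

/* Sums all values in the tree without recursion. Returns 0 on success,
 * -1 if the heap-allocated stack could not be created or grown. */
int tree_sum(const struct node *root, long *out)
{
    size_t cap = 16, top = 0;
    struct frame *stack = malloc(cap * sizeof *stack);
    long sum = 0;

    assert(out != NULL);
    if (stack == NULL)
        return -1;
    if (root != NULL)
        stack[top++].n = root;                   /* the initial "call" */

    while (top > 0) {
        const struct node *n = stack[--top].n;   /* pop the next frame */
        sum += n->value;

        /* Up to two pushes follow, so make room for both. */
        if (top + 2 > cap) {
            struct frame *grown = realloc(stack, 2 * cap * sizeof *stack);
            if (grown == NULL) {
                free(stack);
                return -1;
            }
            stack = grown;
            cap *= 2;
        }
        if (n->right != NULL) stack[top++].n = n->right;
        if (n->left != NULL)  stack[top++].n = n->left;
    }

    free(stack);
    *out = sum;
    return 0;
}
```

The depth limit is now set by available heap memory rather than by the size of the call stack, and running out of memory shows up as an ordinary error return instead of a crash.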

Safe usage of `setjmp` and `longjmp`

I know people always say don't use longjmp, it's evil, it's dangerous.
But I think it can be useful for exiting deep recursions/nested function calls.
Is a single longjmp faster than a lot of repeated checks and returns like if(returnVal != SUCCESS) return returnVal;?
As for safety, as long as dynamic memory and other resources are released properly, there shouldn't be a problem, right?
So far it seems using longjmp isn't difficult and it even makes my code terser. I'm tempted to use it a lot.
(IMHO in many cases there is no dynamic memory/resources allocated within a deep recursion in the first place. Deep function call seems more common for data parsing/manipulation/validation. Dynamic allocation often happens at a higher level, before invoking the function where setjmp appears.)
setjmp and longjmp can be seen as a poor man's exception mechanism. BTW, OCaml exceptions are as quick as setjmp but have much clearer semantics.
Of course a longjmp is much faster than repeatedly returning error codes in intermediate functions, since it pops a possibly significant portion of the call stack in one go.
(I am implicitly focusing on Linux)
They are valid and useful as long as no resources are allocated between them, including:
heap memory (malloc)
fopen-ing FILE* handles
opening operating system file descriptors (e.g. for sockets)
other operating system resources, such as timers or signal handlers
getting some external resource managed by some server, e.g. X11 windows (hence using any widget toolkit like GTK), or database handle or connection...
etc...
The main issue is that that property of not leaking resources is a global whole-program property (or at least global to all functions possibly called between setjmp and longjmp), so it prohibits modular software development : any other colleague having to improve some code in any function between setjmp and longjmp has to be aware of that limitation and follow that discipline.
Hence, if you use setjmp, document that very clearly.
BTW, if you only care about malloc, using systematically Boehm's conservative garbage collector would help a lot; you'll use GC_malloc instead of malloc everywhere and you won't care about free, and practically that is enough; then you can use setjmp without fears (since you could call GC_malloc between setjmp and longjmp).
(notice that the concepts and the terminology around garbage collector are quite related to exception handling and setjmp, but many people don't know them enough. Reading the Garbage Collection Handbook should be worthwhile)
Read also about RAII and learn about C++11 exceptions (and their relation to destructors). Learn a bit about continuations and CPS.
Read setjmp(3), longjmp(3) (and also about sigsetjmp, siglongjmp, and setcontext(3)) and be aware that the compiler has to know about setjmp
You should note that calling setjmp in some contexts is not guaranteed to be safe (for example, you can't portably store the return value of setjmp).
Also, if you want to access local variables of the function that called setjmp after the jump, and those variables could have been changed after the setjmp call, you should mark them as volatile.
Using setjmp and longjmp is also useful because if the recursion causes a stack overflow, you can recover with a longjmp from the signal handler (don't forget to set an alternate stack) and return an error instead. If you want to do that, you should consider using sigsetjmp and siglongjmp so that the signal mask is preserved.
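The two cautions about where setjmp may appear and about volatile locals can be illustrated with a small sketch. The retry scenario and the names retry_point, flaky_step and attempts_needed are assumptions made up for this example.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf retry_point;

/* Hypothetical worker: "fails" by jumping back until the third attempt. */
static void flaky_step(int attempt)
{
    assert(attempt > 0);
    if (attempt < 3)
        longjmp(retry_point, attempt);
}

/* Returns the number of attempts that were needed. */
int attempts_needed(void)
{
    /* Changed between setjmp and longjmp and read afterwards,
     * so it has to be volatile to have a defined value. */
    volatile int attempt = 0;

    /* Not portable:  int rc = setjmp(retry_point);
     * Portable: setjmp compared against a constant as the whole
     * controlling expression of an if, switch or loop. */
    if (setjmp(retry_point) != 0) {
        /* Arrived here through longjmp; fall through and retry. */
    }

    attempt = attempt + 1;
    flaky_step(attempt);
    return attempt;
}
```

Without the volatile qualifier the compiler would be free to keep attempt in a register, and its value after the jump would be indeterminate.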

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
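A short sketch of what function pointers allow: passing one as an argument, returning one, and storing one in a variable. The names binary_op, pick_op and fold are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Returns a function pointer chosen at run time. */
binary_op pick_op(char symbol)
{
    return symbol == '*' ? mul : add;
}

/* Takes a function pointer as an argument and applies it to each element. */
int fold(const int *xs, size_t n, int init, binary_op op)
{
    int acc = init;

    assert(op != NULL);
    for (size_t i = 0; i < n; ++i)
        acc = op(acc, xs[i]);
    return acc;
}
```

Only the address is copied around here. The machine code of add and mul stays where the linker put it.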
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth through a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you wanted to pass memcpy() to the kernel. Keep in mind that it is a very special function. It is defined in the C standard, quite simple, and of quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there were a way to get the sizeof() of a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has emitted local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also, I know several compilers and linkers that optimize at file level (not only at function level). This results in machine code where parts of C functions are mixed into each other.
The only way I consider possible may be:
Generate the opcodes of your function (e.g. by compiling/assembling it on its own).
Copy those opcodes into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations common to typical "data" on that array.
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and don't have your function do anything fancy, no other function calls, not accessing data from outside function, you might even get away with doing it once. I'd say the safest possibility is to limit the maximum size of supported function to, say, 1 kilobyte and just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that i wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please, WHY are you trying to do that, and see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor will almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
A function isn't just an object you can copy. What about cross-references, symbols and so on? Of course you can take something like the standard Linux "binutils" package and torture your binaries, but is that what you want?
By the way if you simply are trying to replace memcpy() implementation, look around LD_PRELOAD mechanics.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler and linker dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (slowish memory with 16-bit access) to the high-speed work RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int) (NextFuncPtr - SourceFuncPtr). This worked well, but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution can be as below.
For ex: if you want to know func() size in program a.c, and have indicators before and after the function.
Try writing a Perl script that compiles this file into object format (cc -c), making sure that the preprocessor statements are not removed. You need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.

Patterns for freeing memory in C?

I'm currently working on a C-based application and am a bit stuck on freeing memory in a non-antipattern fashion. I am a memory-management amateur.
My main problem is I declare memory structures in various different scopes, and these structures get passed around by reference to other functions. Some of those functions may throw errors and exit().
How do I go about freeing my structures if I exit() in one scope, but not all my data structures are in that scope?
I get the feeling I need to wrap it all up in a pseudo exception handler and have the handler deal with freeing, but that still seems ugly because it would have to know about everything I may or may not need to free...
Consider wrappers to malloc and using them in a disciplined way. Track the memory that you do allocate (in a linked list maybe) and use a wrapper to exit to enumerate your memory to free it. You could also name the memory with an additional parameter and member of your linked list structure. In applications where allocated memory is highly scope dependent you will find yourself leaking memory and this can be a good method to dump the memory and analyze it.
UPDATE:
Threading in your application will make this very complex. See other answers regarding threading issues.
You don't need to worry about freeing memory when exit() is called. When the process exits, the operating system will free all of the associated memory.
I think to answer this question appropriately, we would need to know about the architecture of your entire program (or system, or whatever the case may be).
The answer is: it depends. There are a number of strategies you can use.
As others have pointed out, on a modern desktop or server operating system, you can exit() and not worry about the memory your program has allocated.
This strategy changes, for example, if you are developing on an embedded operating system where exit() might not clean everything up. Typically what I see is when individual functions return due to an error, they make sure to clean up anything they themselves have allocated. You wouldn't see any exit() calls after calling, say, 10 functions. Each function would in turn indicate an error when it returns, and each function would clean up after itself. The original main() function (if you will - it might not be called main()) would detect the error, clean up any memory it had allocated, and take the appropriate actions.
When you just have scopes-within-scopes, it's not rocket science. Where it gets difficult is if you have multiple threads of execution, and shared data structures. Then you might need a garbage collector or a way to count references and free the memory when the last user of the structure is done with it. For example, if you look at the source to the BSD networking stack, you'll see that it uses a refcnt (reference count) value in some structures that need to be kept "alive" for an extended period of time and shared among different users. (This is basically what garbage collectors do, as well.)
You can create a simple memory manager for malloc'd memory that is shared between scopes/functions.
Register it when you malloc it, de-register it when you free it. Have a function that frees all registered memory before you call exit.
It adds a bit of overhead, but it helps keep track of memory. It can also help you hunt down pesky memory leaks.
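A minimal sketch of such a registry, assuming a doubly linked list whose node sits in front of each allocation. The names tracked_malloc, tracked_free, tracked_free_all and tracked_live are hypothetical, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every tracked block. The union with
 * max_align_t keeps the payload behind it suitably aligned. */
union header {
    struct { union header *prev, *next; } link;
    max_align_t align;
};

static union header *head = NULL;
static size_t live_blocks = 0;

/* malloc replacement that registers the block. */
void *tracked_malloc(size_t size)
{
    union header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->link.prev = NULL;
    h->link.next = head;
    if (head != NULL)
        head->link.prev = h;
    head = h;
    ++live_blocks;
    return h + 1;               /* payload starts right after the header */
}

/* free replacement that de-registers the block. */
void tracked_free(void *p)
{
    union header *h;

    if (p == NULL)
        return;
    h = (union header *)p - 1;
    if (h->link.prev != NULL)
        h->link.prev->link.next = h->link.next;
    else
        head = h->link.next;
    if (h->link.next != NULL)
        h->link.next->link.prev = h->link.prev;
    assert(live_blocks > 0);
    --live_blocks;
    free(h);
}

/* Frees everything that is still registered, e.g. right before exit(). */
void tracked_free_all(void)
{
    while (head != NULL)
        tracked_free(head + 1);
}

/* Number of blocks currently registered; handy when hunting leaks. */
size_t tracked_live(void)
{
    return live_blocks;
}
```

One way to use it would be to register tracked_free_all with atexit() once at startup, so that every exit() path releases whatever is still registered.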
Michael's advice is sound - if you are exiting, you don't need to worry about freeing the memory since the system will reclaim it anyway.
One exception to that is shared memory segments - at least under System V Shared Memory. Those segments can persist longer than the program that creates them.
One option not mentioned so far is to use an arena-based memory allocation scheme, built on top of standard malloc(). If the entire application uses a single arena, your cleanup code can release that arena, and all is freed at once. (APR - Apache Portable Runtime - provides a pools feature which I believe is similar; David Hanson's "C Interfaces and Implementations" provides an arena-based memory allocation system; I've written one that you could use if you wanted to.) You can think of this as "poor man's garbage collection".
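To make the arena idea concrete, here is a minimal sketch of a bump allocator built on malloc(). It is not the APR or Hanson implementation; the names arena, arena_alloc, arena_release and the chunk size are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_CHUNK 4096u       /* assumed default chunk size */

/* Chunk header. The union with max_align_t keeps the bytes that
 * follow it suitably aligned for any object type. */
union chunk {
    struct { union chunk *next; size_t used, cap; } info;
    max_align_t align;
};

struct arena { union chunk *chunks; };   /* initialise with { NULL } */

/* Hands out size bytes from the newest chunk, adding a chunk if needed.
 * Returns NULL on failure. Individual blocks are never freed. */
void *arena_alloc(struct arena *a, size_t size)
{
    const size_t al = _Alignof(max_align_t);
    union chunk *c;
    void *p;

    assert(a != NULL);
    if (size > SIZE_MAX - (al - 1))
        return NULL;
    size = (size + al - 1) / al * al;    /* round up to keep alignment */

    c = a->chunks;
    if (c == NULL || c->info.cap - c->info.used < size) {
        size_t cap = size > ARENA_CHUNK ? size : ARENA_CHUNK;
        if (cap > SIZE_MAX - sizeof *c)
            return NULL;
        c = malloc(sizeof *c + cap);
        if (c == NULL)
            return NULL;
        c->info.next = a->chunks;
        c->info.used = 0;
        c->info.cap = cap;
        a->chunks = c;
    }
    p = (unsigned char *)(c + 1) + c->info.used;
    c->info.used += size;
    return p;
}

/* Releases every chunk, and with it every block, in one go. */
void arena_release(struct arena *a)
{
    assert(a != NULL);
    while (a->chunks != NULL) {
        union chunk *next = a->chunks->info.next;
        free(a->chunks);
        a->chunks = next;
    }
}
```

The cleanup code then only has to know about the arena, not about each structure that was allocated from it.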
As a general memory discipline, every time you allocate memory dynamically, you should understand which code is going to release it and when it can be released. There are a few standard patterns. The simplest is "allocated in this function; released before this function returns". This keeps the memory largely under control (if you don't run too many iterations on the loop that contains the memory allocation), and scopes it so that it can be made available to the current function and the functions it calls. Obviously, you have to be reasonably sure that the functions you call are not going to squirrel away (cache) pointers to the data and try to reuse them later after you've released and reused the memory.
The next standard pattern is exemplified by fopen() and fclose(); there's a function that allocates a pointer to some memory, which can be used by the calling code, and then released when the program has finished with it. However, this often becomes very similar to the first case - it is usually a good idea to call fclose() in the function that called fopen() too.
Most of the remaining 'patterns' are somewhat ad hoc.
People have already pointed out that you probably don't need to worry about freeing memory if you're just exiting (or aborting) your code in case of error. But just in case, here's a pattern I developed and use a lot for creating and tearing down resources in case of error. NOTE: I'm showing a pattern here to make a point, not writing real code!
int foo_create(foo_t *foo_out) {
    int res;
    foo_t foo;   /* foo_t is presumably a pointer to foo_s */
    bar_t bar;
    baz_t baz;

    res = bar_create(&bar);
    if (res != 0)
        goto fail_bar;
    res = baz_create(&baz);
    if (res != 0)
        goto fail_baz;
    foo = malloc(sizeof(foo_s));
    if (foo == NULL) {
        res = -1; /* res is still 0 here; -1 is a placeholder error code */
        goto fail_alloc;
    }
    foo->bar = bar;
    foo->baz = baz;
    /* etc. etc. you get the idea */
    *foo_out = foo;
    return 0; /* meaning OK */

    /* tear down stuff, in reverse order of creation */
fail_alloc:
    baz_destroy(baz);
fail_baz:
    bar_destroy(bar);
fail_bar:
    return res; /* propagate error code */
}
I can bet I'm going to get some comments saying "this is bad because you use goto". But this is a disciplined and structured use of goto that makes code clearer, simpler, and easier to maintain if applied consistently. You can't achieve a simple, documented tear-down path through the code without it.
If you want to see this in real in-use commercial code, take a look at, say, arena.c from the MPS (which is coincidentally a memory management system).
It's a kind of poor-man's try...finish handler, and gives you something a bit like destructors.
I'm going to sound like a greybeard now, but in my many years of working on other people's C code, lack of clear error paths is often a very serious problem, especially in network code and other unreliable situations. Introducing them has occasionally made me quite a bit of consultancy income.
There are plenty of other things to say about your question -- I'm just going to leave it with this pattern in case that's useful.
Very simply, why not have a reference counted implementation, so when you create an object and pass it around you increment and decrement the reference counted number (remember to be atomic if you have more than one thread).
That way, when an object is no longer used (zero references) you can safely delete it, or automatically delete it in the reference count decrement call.
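A minimal sketch of that scheme, assuming C11 atomics are available for the counter. The buffer type and the names buffer_new, buffer_ref and buffer_unref are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct buffer {
    atomic_int refs;            /* number of current owners */
    size_t len;
    unsigned char *data;
};

/* Creates a buffer owned by the caller (reference count 1). */
struct buffer *buffer_new(size_t len)
{
    struct buffer *b = malloc(sizeof *b);

    if (b == NULL)
        return NULL;
    b->data = calloc(len ? len : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = len;
    atomic_init(&b->refs, 1);
    return b;
}

/* Call whenever another part of the program keeps the pointer. */
struct buffer *buffer_ref(struct buffer *b)
{
    assert(b != NULL);
    atomic_fetch_add(&b->refs, 1);
    return b;
}

/* Drops one reference. Returns 1 if this call freed the object. */
int buffer_unref(struct buffer *b)
{
    assert(b != NULL);
    if (atomic_fetch_sub(&b->refs, 1) != 1)
        return 0;               /* other owners remain */
    free(b->data);
    free(b);
    return 1;
}
```

Here the decrement call itself deletes the object when the count reaches zero, which is the second of the two options mentioned above.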
This sounds like a task for a Boehm garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Depends on the system of course whether you can or should afford to use it.
