C Programming: Debugging with pthreads - c

One of the hardest things for me to initially adjust to was my first intense experience programming with pthreads in C. I was used to knowing exactly what the next line of code to be run would be and most of my debugging techniques centered around that expectation.
What are some good techniques for debugging with pthreads in C? You can suggest personal methodologies without any added tools, tools you use, or anything else that helps you debug.
P.S. I do my C programming using gcc on Linux, but don't let that necessarily restrict your answer.

Valgrind is an excellent tool for finding race conditions and pthreads API misuse. It keeps a model of the program's memory accesses (and perhaps of shared resources) and will detect missing locks even when the bug is currently benign (which of course means that it will completely unexpectedly become less benign at some later point).
To use it, invoke valgrind --tool=helgrind. There is also valgrind --tool=drd. Both are documented in the Valgrind manual. Helgrind and DRD use different models, so they detect overlapping but possibly different sets of bugs. False positives may also occur.
Anyway, valgrind has saved countless hours of debugging (not all of them though :) for me.

One of the things that will surprise you about debugging threaded programs is that you will often find the bug changes, or even goes away, when you add printf's or run the program in the debugger (colloquially known as a Heisenbug).
In a threaded program, a Heisenbug usually means you have a race condition. A good programmer will look for shared variables or resources that are order-dependent. A crappy programmer will try to blindly fix it with sleep() statements.

Debugging a multithreaded application is difficult. A good debugger, such as GDB (with the optional DDD front end) for the *nix environment or the one that comes with Visual Studio on Windows, will help tremendously.

In the 'thinking' phase, before you start coding, use the State Machine concept. It can make the design much clearer.
printf's can help you understand the dynamics of your program. But they clutter up the source code, so use a macro DEBUG_OUT() and in its definition enable it with a boolean flag. Better still, set/clear this flag with a signal that you send via 'kill -USR1'. Send the output to a log file with a timestamp.
Also consider using assert(), and then analyze your core dumps using gdb and ddd.
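The switchable DEBUG_OUT() idea above could be sketched roughly as follows. This is only a minimal sketch: the names debug_enabled, toggle_debug, debug_init and debug_out are made up for illustration, and it writes to stderr on the assumption that you redirect stderr to a log file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical flag, flipped by 'kill -USR1 <pid>'. */
static volatile sig_atomic_t debug_enabled = 0;

/* Handler only flips a flag, which is safe to do inside a signal handler. */
static void toggle_debug(int signo)
{
    (void)signo;
    debug_enabled = !debug_enabled;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int debug_init(void)
{
    struct sigaction sa;
    sa.sa_handler = toggle_debug;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Format the message first, then emit it with one fprintf call so that
 * lines from different threads are not interleaved mid-line. */
static void debug_out(const char *file, int line, const char *fmt, ...)
{
    char msg[256];
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);
    fprintf(stderr, "[%ld] %s:%d: %s\n", (long)time(NULL), file, line, msg);
}

/* Costs one flag test when tracing is switched off. */
#define DEBUG_OUT(...)                                     \
    do {                                                   \
        if (debug_enabled)                                 \
            debug_out(__FILE__, __LINE__, __VA_ARGS__);    \
    } while (0)
```

Call debug_init() once at startup, sprinkle DEBUG_OUT("x=%d", x) where needed, and run the program with 2>trace.log. Keep in mind that the extra output can itself change thread timing.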

My approach to multi-threaded debugging is similar to single-threaded, but more time is usually spent in the thinking phase:
Develop a theory as to what could be causing the problem.
Determine what kind of results could be expected if the theory is true.
If necessary, add code that can disprove or verify your results and theory.
If your theory is true, fix the problem.
Often, the 'experiment' that proves the theory is the addition of a critical section or mutex around suspect code. I will then try to narrow down the problem by systematically shrinking the critical section. Critical sections are not always the best fix (though can often be the quick fix). However, they're useful for pinpointing the 'smoking gun'.
Like I said, the same steps apply to single-threaded debugging, though there it is far too easy to just jump into a debugger and have at it. Multi-threaded debugging requires a much stronger understanding of the code, as I usually find that running multi-threaded code through a debugger doesn't yield anything useful.
Also, Helgrind is a great tool. Intel's Thread Checker performs a similar function for Windows, but costs a lot more than Helgrind.
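The mutex 'experiment' described above might look something like this minimal sketch. The names counter, counter_lock, worker and run_experiment are invented for illustration; the suspect code here is just a shared counter increment.

```c
#include <assert.h>
#include <pthread.h>

enum { ITERATIONS = 100000, NUM_THREADS = 4 };

static long counter = 0;   /* shared data suspected of being raced on */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Experiment: serialize the suspect code. If the symptom
         * disappears, the race is somewhere inside this region. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs all workers to completion and returns the final counter value. */
static long run_experiment(void)
{
    pthread_t tid[NUM_THREADS];

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Build with gcc -pthread. With the lock in place the result should equal ITERATIONS * NUM_THREADS. Removing the lock/unlock pair typically (not always) gives a smaller number, and running that version under valgrind --tool=helgrind should report the unprotected access.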

I pretty much develop in an exclusively multi-threaded, high performance world so here's the general practice I use.
Design- the best optimization is a better algorithm:
1) Break your functions into LOGICALLY separable pieces. This means that a call does "A" and ONLY "A"- not A then B then C...
2) NO SIDE EFFECTS: Abolish all nakedly global variables, static or not. If you cannot fully abolish side effects, isolate them to a few locations (concentrate them in the code).
3) Make as many isolated components RE-ENTRANT as possible. This means they're stateless- they take all their inputs as constants and only manipulate DECLARED, logically constant parameters to produce the output. Pass-by-value instead of reference wherever you can.
4) If you have state, make a clear separation between stateless sub-assemblies and the actual state machine. Ideally the state machine will be a single function or class manipulating stateless components.
Debugging:
Threading bugs tend to come in 2 broad flavors- races and deadlocks. As a rule, deadlocks are much more deterministic.
1) Do you see data corruption?: YES => Probably a race.
2) Does the bug arise on EVERY run or just some runs?: EVERY run => Likely a deadlock. Only some runs => Likely a race (races are generally non-deterministic).
3) Does the process ever hang?: YES => There's a deadlock somewhere. If it only hangs sometimes, you probably have a race too.
Breakpoints often act much like synchronization primitives THEMSELVES in the code, because they're logically similar- they force execution to stall in the current context until some other context (you) sends a signal to resume. This means that you should view any breakpoints you have in code as altering its multi-threaded behavior, and breakpoints WILL affect race conditions but (in general) not deadlocks.
As a rule, this means you should remove all breakpoints, identify the type of bug, THEN reintroduce them to try and fix it. Otherwise, they simply distort things even more.

I tend to use lots of breakpoints. If you don't actually care about the thread function itself, but do care about its side effects, a good time to check them might be right before it exits or loops back to its waiting state or whatever else it's doing.

When I started doing multithreaded programming I... stopped using debuggers.
For me the key point is good program decomposition and encapsulation.
Monitors are the easiest way of error-free multithreaded programming.
If you cannot avoid complex lock dependencies then it is easy to check if they are cyclic
- wait until the program hangs and check the stack traces using 'pstack'.
You can break cyclic locks by introducing some new threads and asynchronous communication buffers.
Use assertions, and make sure to write single-threaded unit tests for particular components of your software - you can then run them in a debugger if you want.
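One way to combine assertions with locks is an error-checking mutex, which reports some misuse through return codes instead of hanging. A minimal sketch, with checked_mutex_init as a hypothetical helper name. Note that this only catches a thread relocking its own mutex or unlocking one it does not hold. It does not detect cyclic waits between different threads.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as an error-checking mutex.
 * Returns 0 on success or an errno-style error number. */
static int checked_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex, a second pthread_mutex_lock() from the owning thread returns EDEADLK rather than blocking forever, and unlocking a mutex the thread does not own returns EPERM, so both can be asserted on in debug builds.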

Related

where can I find signal and alarm function definitions?

I studied about signal and alarm functions but not satisfied hence thought that their definitions can help me.
These might help; they give good explanations:
1. http://www.gnu.org/software/libc/manual/html_node/Signal-Handling.html#Signal-Handling
2. http://www.cs.cf.ac.uk/Dave/C/node24.html
3. http://www.thegeekstuff.com/2012/03/catch-signals-sample-c-code/
The signal(3) C library function is usually a thin wrapper around a system call to the underlying kernel. alarm(3) does a bit more work, but again falls back on the kernel's idea of time handling and signal delivery.
If you really want to know how they work, you'd have to dig into the source of a Unix(y) kernel. Be warned: the code you'll find is probably very complex. Kernel programmers have to handle some very exotic corner cases and be wary of weird uses that might lead to security problems, all while keeping it as fast as possible (it's code that'll be used hundreds or thousands of times a second on millions of machines).
Next best would be to check out a book on Unix internals.
Man pages are the best place to learn about the working and usage of any system-defined function. From your question, I guess you actually mean the implementations of the functions.
But ask yourself, why do you need to look into the code? All that happens is it induces more questions, since most of the variables and functions used will be defined somewhere else and you need to look into them again.
If you are so dissatisfied, then try some tough problems involving these functions.
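As a starting point for experimenting, here is a small sketch that uses alarm() together with a SIGALRM handler. It installs the handler with sigaction() rather than signal(), since sigaction() has better-defined semantics across systems. The names alarm_fired, on_alarm and wait_for_alarm are made up for this example, and it assumes a single-threaded program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

/* Handler: just record that SIGALRM arrived. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/* Arm a timer and sleep until SIGALRM arrives.
 * Returns 0 on success, -1 on failure. */
static int wait_for_alarm(unsigned int seconds)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;

    assert(seconds > 0);   /* alarm(0) would cancel the timer instead */

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM so it cannot slip in between the flag test
     * and the call to sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);

    alarm_fired = 0;
    alarm(seconds);
    while (!alarm_fired)
        sigsuspend(&wait_mask);   /* atomically unblock and wait */

    return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

Calling wait_for_alarm(1) should return after roughly one second. Try removing the sigprocmask() calls and replacing sigsuspend() with pause() to see why the blocking is there: the signal can then arrive just before pause() and the program may sleep forever.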

How can I implement cooperative lightweight threading with C on Mac OS X?

I'm trying to find a lightweight cooperative threading solution to try implementing an actor model.
As far as I know, the only solution is setcontext/getcontext,
but that functionality has been deprecated(?) by Apple. I'm confused about why they did this; in any case, I'm looking for a replacement.
Pthreads are not an option because I need cooperative model instead of preemptive model to control context switching timing precisely/manually without expensive locking.
-- edit --
Reason of avoiding pthreads:
Because pthreads are not cooperative/deterministic and are too expensive. I need an actor model for game logic code, so thousands of execution contexts are required at a minimum. Hardware threads require MB of memory and are expensive to create and destroy. Parallelism is not important; in fact, I just need concurrent execution of many functions. This could be implemented with many divided functions and some kind of object model, but my goal is to reduce those overheads.
If I know something wrong, please correct me. It'll be very appreciated.
The obvious 'lightweight' solution is to avoid complex nested calling except for limited situations where the execution time will be tightly bounded, then store an explicit state structure for each "thread" and implement the main program logic as a state machine that's easily suspendable/resumable at most points. Then you can simply swap out the pointer to the state structure for 'context switch'. Basically this technique amounts to keeping all of your important state variables, including what would conventionally be local variables, in the state structure.
Whether this is worthwhile probably depends on your reason for avoiding pthreads. If your reason is to be portable to non-POSIX systems, or if you really need deterministic program flow, then it may be worthwhile. But if you're just worried about performance overhead and memory synchronization issues, I think you should use pthreads and manage these issues. If you avoid unnecessary locking, use fine-grained locks, and minimize the amount of time locks are held, performance should not suffer.
Edit: Based on your further details posted in the comments on the main question, I think the solution I've proposed is the right one. Each actor should have their own context in which you store the state of the actor's action/thinking/etc. You would have a run_actor function which would take an actor context and a number of "ticks" to advance the actor's state by, and a run_all_actors function which would iterate over a list of active actors and call run_actor for each with the specified number of ticks.
Further, note that this solution still allows you to use real threads to take advantage of SMP/multicore machines. You simply divide the actors up between threads. You may need some degree of locking if one actor needs to examine another's context (e.g. for collision detection).
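A minimal sketch of the run_actor / run_all_actors idea described above. The actor itself (it walks toward a target position and then idles) and the field names are invented purely for illustration. The point is that everything a real thread would keep in local variables lives in the context structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actor states. */
enum actor_state { ACTOR_WALKING, ACTOR_IDLE };

/* Per-actor context: replaces the stack a real thread would have. */
struct actor {
    enum actor_state state;
    int position;
    int target;
};

/* Advance one actor by at most `ticks` steps. The function can stop
 * after any tick and resume later, because all state is in *a. */
static void run_actor(struct actor *a, int ticks)
{
    while (ticks-- > 0) {
        switch (a->state) {
        case ACTOR_WALKING:
            if (a->position < a->target)
                a->position++;
            else if (a->position > a->target)
                a->position--;
            if (a->position == a->target)
                a->state = ACTOR_IDLE;
            break;
        case ACTOR_IDLE:
            return;   /* nothing left to do in this time slice */
        }
    }
}

/* The "scheduler": give every active actor the same number of ticks. */
static void run_all_actors(struct actor *actors, size_t count, int ticks)
{
    assert(actors != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        run_actor(&actors[i], ticks);
}
```

A context switch here is nothing more than moving on to the next array element, so thousands of actors cost only thousands of small structs. To use several cores, each real thread could call run_all_actors on its own slice of the array.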
I was researching this question as well, and I ran across GNU Pth (not to be confused with Pthreads). See http://www.gnu.org/software/pth/
It aims to be a portable solution for cooperative threads. It does mention it is implemented via setcontext/getcontext if available (so it may not be on Mac OSX). Otherwise it says it uses longjmp/setjmp, but it's not clear to me how that works.
Hope this is helpful to anyone who searches for this question.
I have discovered that some of the required functionality from setcontext/getcontext is implemented in libunwind.
Unfortunately the library won't compile on Mac OS X because of the deprecation of setcontext/getcontext. However, Apple has implemented its own libunwind, which is compatible with GNU's implementation at the source level. The library exists on Mac OS X 10.6, 10.7, and iOS. (I don't know the exact version in the case of iOS.)
This library is not documented, but I could find the headers from these locations.
/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS5.0.sdk/usr/include/libunwind.h
/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator4.3.sdk/usr/include/libunwind.h
/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator5.0.sdk/usr/include/libunwind.h
/Developer/SDKs/MacOSX10.6.sdk/usr/include/libunwind.h
/Developer/SDKs/MacOSX10.7.sdk/usr/include/libunwind.h
A note in the header file says to go to the GNU libunwind site for documentation.
I'll bet on the library.

Writing a VM - well formed bytecode?

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)
I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well and it's actually pretty cool being able to feed it lines (doing something like write_line(&prog[1], "jmp", regA, regB, 0);) and then running the program:
while (machine.code_pointer <= BOUNDS && DONE != true)
{
    run_line(&prog[machine.cp]);
}
I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.
My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
I only ask this because what would happen if someone would edit a binary file and screw stuff up (delete arbitrary parts of it, etc). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems, I'm just curious what they are!
Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
You get to decide.
Best practice is to have the VM do a single check before execution, with cost proportional to the size of the program, which is sophisticated enough to guarantee that nothing wonky can happen during execution. Then during actual execution of the bytecode, you run with no checks.
However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often have some checks at run time (example: array bounds).
For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.
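A rough sketch of what checking sanity on every instruction could look like. The bytecode layout is an assumption made up for this example (4 bytes per instruction: opcode, then three operand bytes), as are the opcode numbers and all names. Only the add/jmp/end instruction set comes from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding, not the asker's real format. */
enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2 };
enum { NUM_REGS = 4, INSN_SIZE = 4 };

enum vm_status {
    VM_OK, VM_BAD_OPCODE, VM_BAD_OPERAND, VM_TRUNCATED, VM_STEP_LIMIT
};

struct vm {
    uint32_t reg[NUM_REGS];   /* unsigned, so overflow simply wraps */
    size_t pc;                /* byte offset into the code */
};

/* Execute until OP_END, an error, or max_steps instructions. */
static enum vm_status vm_run(struct vm *vm, const unsigned char *code,
                             size_t len, size_t max_steps)
{
    assert(vm != NULL && code != NULL);
    for (size_t step = 0; step < max_steps; step++) {
        /* Never read past the end of a truncated or edited file. */
        if (vm->pc > len || len - vm->pc < INSN_SIZE)
            return VM_TRUNCATED;

        const unsigned char *in = code + vm->pc;
        switch (in[0]) {
        case OP_ADD:   /* reg[a] = reg[b] + reg[c] */
            if (in[1] >= NUM_REGS || in[2] >= NUM_REGS || in[3] >= NUM_REGS)
                return VM_BAD_OPERAND;
            vm->reg[in[1]] = vm->reg[in[2]] + vm->reg[in[3]];
            vm->pc += INSN_SIZE;
            break;
        case OP_JMP: { /* jump to instruction index in[1] */
            size_t target = (size_t)in[1] * INSN_SIZE;
            if (target >= len)
                return VM_BAD_OPERAND;
            vm->pc = target;
            break;
        }
        case OP_END:
            return VM_OK;
        default:
            return VM_BAD_OPCODE;
        }
    }
    return VM_STEP_LIMIT;   /* probably an infinite loop */
}
```

Every way a hand-edited file can go wrong here (unknown opcode, out-of-range register, jump outside the program, file cut short) turns into a status code instead of a crash in the VM itself. The step limit is an extra guard against a program that never reaches end.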
The same issue arises in Java, and as I recall, in that case the VM does have to do some checks to make sure the bytecode is well formed. In that situation, it's actually a serious issue because of the potential for security problems: if someone can alter a Java bytecode file to contain something that the compiler would never output (such as accessing a private variable from another class), it could potentially expose sensitive data being held in the application's memory, or could allow the application to access a website that it shouldn't be allowed to, or something. Java's virtual machine includes a bytecode verifier to make sure, to the extent possible, that these sorts of things don't happen.
Now, in your case, unless your homemade language takes off and becomes popular, the security aspect is something you don't have to worry about so much; after all, who's going to be hacking your programs, other than you? Still, I would say it's a good idea to make sure that your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't process, it should detect that and fail with an error message, which will make debugging easier on your part.
Virtual machines that interpret bytecode generally have some way of validating their input; for example, Java will throw a VerifyError if the class file is in an inconsistent state.
However, it sounds like you're implementing a processor, and since they tend to be lower-level there are fewer ways you can manage to get things into a detectably invalid state -- giving it an undefined opcode is one obvious way. Real processors will signal that the process attempted to execute an illegal instruction, and the OS will deal with it (Linux kills it with SIGILL, for example).
If you're concerned about someone having edited the binary file, then there is only one answer to your question: the VM must do the check. It's the only way you have a chance to detect the tampering. The compiler just creates the binary. It has no way of detecting downstream tampering.
It makes sense to have the compiler do as much sanity checking as possible (since it only has to do it once), but there's always going to be issues that can't be detected by static analysis, like [cough] stack overflow, array range errors, and the like.
I'd say it's legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is to run several instances of the guest simultaneously, it would be very bad if one guest were able to cause others to fail.

What are some "good" ways to use longjmp/setjmp for C error handling?

I have to use C for one project and I am thinking of using longjmp/setjmp for error handling as I think it will be much easier to handle error in one central place than return codes. I would appreciate if there are some leads on how to do this.
I am particularly concerned with resource cleanup being correctly done if any such error occurs.
Also how do I handle errors that result in multi-threaded programs using them?
Even better, is there some C library that already exists for error/exception handling?
Have a look at this example/tutorial:
http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html
If you are worried about resource cleanup, you have to seriously wonder whether longjmp() and setjmp() are a good idea.
If you design your resource allocation system so that you can in fact clean up accurately, then it is OK - but that design tends to be tricky, and typically incomplete if, in fact, the standard libraries that your code uses themselves allocate resources that must be released. It requires extraordinary care, and because it is not wholly reliable, it is not suitable for long-running systems that might need to survive multiple uses of the setjmp()/longjmp() calls (they'll leak, expand, and eventually cause problems).
Symbian implemented its Leave mechanism in terms of longjmp(), and this serves as a good walk-through of all the things you need to do.
Symbian has a global 'cleanup stack' onto which you push (and from which you pop) the things you want cleaned up should a jump happen. This is the manual alternative to the automatic stack unwinding that a C++ compiler does when a C++ exception is thrown.
Symbian had 'trap harnesses' that it would jump out to; these could be nested.
(Symbian more recently reimplemented it in terms of C++ exceptions, but the interface remains unchanged).
All together, I think that proper C++ exceptions are less prone to coding errors and much faster than rolling your own C equivalent.
(Modern C++ compilers are very good at 'zero overhead' exceptions when they are not thrown, for example; longjmp() has to store the state of all the registers and such even when the jump is not later taken, so can fundamentally never be as fast as exceptions.)
Using C++ as a better C, where you only adopt exceptions and RAII, would be a good route should using longjmp() for exception emulation be tempting to you.
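For illustration, a cleanup stack combined with setjmp()/longjmp() might be sketched like this in plain C. This is not Symbian's actual API. Every name here (cleanup_push, cleanup_pop, cleanup_unwind, leave, try_work, risky_work) is hypothetical, and the single global trap means the sketch supports neither nested traps nor multiple threads.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

enum { CLEANUP_MAX = 16 };

struct cleanup_item {
    void (*fn)(void *);
    void *arg;
};

static struct cleanup_item cleanup_stack[CLEANUP_MAX];
static int cleanup_top = 0;
static int cleanups_run = 0;   /* statistics only */
static jmp_buf trap;
static int leave_code = 0;

static void cleanup_push(void (*fn)(void *), void *arg)
{
    assert(cleanup_top < CLEANUP_MAX);
    cleanup_stack[cleanup_top].fn = fn;
    cleanup_stack[cleanup_top].arg = arg;
    cleanup_top++;
}

/* Discard the newest entry without running it (normal path). */
static void cleanup_pop(void)
{
    assert(cleanup_top > 0);
    cleanup_top--;
}

/* Run every handler pushed above `mark`, newest first. */
static void cleanup_unwind(int mark)
{
    while (cleanup_top > mark) {
        cleanup_top--;
        cleanup_stack[cleanup_top].fn(cleanup_stack[cleanup_top].arg);
        cleanups_run++;
    }
}

/* "Throw": remember the error code and jump back to the trap. */
static void leave(int code)
{
    leave_code = code;
    longjmp(trap, 1);
}

/* Example callee that owns a resource while it might fail. */
static void risky_work(int fail)
{
    char *buf = malloc(64);
    if (buf == NULL)
        leave(1);
    cleanup_push(free, buf);   /* freed automatically if we leave */
    if (fail)
        leave(2);
    cleanup_pop();             /* success: release it ourselves */
    free(buf);
}

/* "Try/catch": returns 0 on success or the code passed to leave(). */
static int try_work(int fail)
{
    int mark = cleanup_top;    /* not modified after setjmp */

    if (setjmp(trap) != 0) {
        cleanup_unwind(mark);
        return leave_code;
    }
    risky_work(fail);
    return 0;
}
```

try_work(0) returns 0 and try_work(1) returns 2 after freeing the buffer through the cleanup stack. Anything allocated by code that does not cooperate with the cleanup stack (for example inside a third-party library) still leaks when leave() is called, which is exactly the weakness described in the earlier answer.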
I have only ever found one use for setjmp()/longjmp() and it wasn't to do with error handling.
There really is no need to use it for that since it can always be refactored into something easier to follow. The use of setjmp()/longjmp() is very similar to goto in that it can be easily abused. Anything that makes your code less readable is a bad idea in general. Note that I'm not saying they're inherently bad, just that they can lead to bad code easier than the alternatives.
FWIW, the one place they were invaluable was a project I did in the early days of the industry (MS-DOS 6 time frame). I managed to put together a co-operative multi-threading library using Turbo C which used those functions in a yield() function to switch tasks.
I'm pretty certain I haven't touched them (or had the need to) since those days.
Exceptions are by far a better general mechanism, but in the deep dark days of C past, I wrote a processor emulator that included a command shell. The shell used to setjmp/longjmp for interrupt handling (ie, the processor is running and the user hits break/ctrl-c, the code traps SIGINT and longjmps back to the shell).
I've used setjmp/longjmp reasonably tidily, to escape from within a callback, without having to negotiate my way up through various other library levels.
That case (if I recall correctly) was where code within a yacc-generated parser could detect a (non-syntactical) problem, and wanted to abandon the parse but give a reasonably useful error report back to the caller on the other side of all the yacc-generated code. Another example was within a callback called from an Expat parser. In each case, there were other ways of doing this, but they seemed more cumbersome and obscure than simply bailing out in this way.
As other answers have pointed out, though, it's necessary to be careful about clean-up, and very thoughtful about making sure that the longjmp code is callable only within the scope of the region dynamically protected by the setjmp.
Doing it in the context of multi-threaded programming? I'm sure that's not impossible, but Oooh: get out your family-pack of aspirin now. It's probably wise to keep the setjmp/longjmp pairs as close together as possible. As long as a matching setjmp/longjmp pair are within the same thread, I expect you'll be OK, but ... be careful out there.

Reinitialize global/static memory at runtime or static analysis of global/static variables

The Problem
I'm working on a large C project (C99) that makes heavy use of global variables (I know, I know). The program works fairly well, but it was originally designed to run once and exit.
As such, it relies on its global/static memory being initialized with 0 (or whatever value it was declared with), and during runtime it modifies these variables (as most programs do).
However, instead of exiting on completion, I want to run the program again. I want to make a parent program that has control and visibility into this large program. Having complete visibility into the running program is very important.
The solution needs to work on macOS, Linux, and Windows.
I've considered:
1. Forking it
Make a small wrapper program that serves as the "shell", and execute the large program as needed.
Pros
OS does the hard work of resetting the memory to the correct values
Guaranteed to operate as intended
Cons
Lost visibility into the program
Can't inspect memory of executing program from wrapper during runtime, harder to tweak settings before launching, harder to collect runtime information
Need to implement a system to get internal data in/out of the program, potentially touching a lot of code
Unified experience harder (sharing a GUI window, etc)
2. Identify critical structures manually
Peruse the source, run the program multiple times, wait for program to blow up on a sanity check or bad memory access.
Pros
Easy to do
Easy to start
High visibility, code sharing, and unification
Cons
Does not catch every case, very patchwork
Time consuming
3. Refactor
Collect all globals into a single structure to memset, create initializers for variables that are initialized with a value. Handle statics on a case-by-case basis.
Pros
Conceptually easy, sledgehammer approach
High visibility, code sharing, and unification
Cons
Very time consuming, codebase large, would touch pretty much everything
4. Magic wand
Tell the OS to reinitialize global/static memory. If I need to save a value, I'll store it locally and then rewrite it when it's done.
Pros
Mostly perfect :)
Cons
Doesn't exist (?)
Very black magic
Probably not cross platform
May anger 3rd-party libs
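For reference, option 3 (Refactor) could look roughly like the sketch below. The struct members are invented stand-ins for real globals. Instead of memset plus separate initializers, this variant copies a constant template, which also covers variables that were declared with a non-zero value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical globals gathered into one structure. */
struct program_state {
    int  frame_count;      /* was: static int frame_count;      */
    int  verbosity;        /* was: int verbosity = 2;           */
    char last_error[64];   /* was: static char last_error[64];  */
};

/* Template holding the values the program expects at startup.
 * Members not listed are zero-initialized. */
static const struct program_state initial_state = { .verbosity = 2 };

/* The one remaining global; code refers to g.frame_count etc. */
static struct program_state g;

/* Put everything back to its startup value before a rerun. */
static void reset_program_state(void)
{
    g = initial_state;
    assert(g.verbosity == initial_state.verbosity);
}
```

The parent program would call reset_program_state() before each run, and because all state sits in one struct it can also be inspected or snapshotted in one place. Function-local statics still have to be hunted down separately.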
What I am doing now
I am going with option 2 right now, just making my way through the code, leaning on the program to crash and point me in the right direction.
I'd say this method has gotten me about 80% of the way there. I've identified and reinitialized enough things that the program, more or less, can be rerun. It's not as widespread as I thought, and it gives me a lot of hope.
Occasionally, strange things happen or it doesn't operate as intended, but it also doesn't crash. This makes tracking it down more difficult.
I just need something to get me that last 20%. Maybe some sort of static analysis tool, or something to help me go through the source and see where globals are touched.
To easily detect the global and static variables, you can try CppDepend and execute a CQLinq query like this one:
from f in Fields where f.IsGlobal || f.IsStatic
select f
You can also modify the query if you want the variables used by a specific function or in a specific file.

Resources