Tips to write thread-safe UNIX code?

What are the guidelines to write thread-safe UNIX code in C and C++?
I know only a few:
Don't use globals
Don't use static local storage
What others are there?

The simple thing to do is read a little. The following list contains some stuff to look at and research.
Spend time reading the Open Group Base Specification, particularly the General Information section and the subsection on threads. This is the basic reference for multithreading under most UN*X-alike systems.
Learn the difference between a mutex and a semaphore
Realize that everything that is shared MUST be protected. This applies to global variables, static variables, and any shared dynamically allocated memory.
Replace global state flags with condition variables. These are implemented using pthread_cond_init and related functions.
Once you understand the basics, learn about the common problems so that you can identify them when they occur:
Lock inversion deadlocks
Priority inversion - if you are interested in a real life scenario, then read this snippet about the Mars Pathfinder
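As a rough illustration of the "condition variables instead of global state flags" advice above, here is a minimal sketch. The names `ready_flag`, `producer`, and `wait_for_payload` are made up for this example, and it assumes a POSIX threads implementation. The flag is only ever touched with the mutex held, and the waiter sleeps in `pthread_cond_wait` instead of polling a global.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: the flag lives next to the mutex and
   condition variable that guard it, instead of being a bare global. */
struct ready_flag {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
    int             payload;
};

/* Producer: publish the payload, set the flag, wake the waiter. */
static void *producer(void *arg)
{
    struct ready_flag *f = arg;

    pthread_mutex_lock(&f->lock);
    f->payload = 42;
    f->ready = true;
    pthread_cond_signal(&f->cond);
    pthread_mutex_unlock(&f->lock);
    return NULL;
}

/* Starts the producer, sleeps until the flag is set, and returns the
   payload (or -1 if setup fails). */
int wait_for_payload(void)
{
    struct ready_flag f = { .ready = false, .payload = 0 };
    pthread_t tid;
    int value;

    if (pthread_mutex_init(&f.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&f.cond, NULL) != 0) {
        pthread_mutex_destroy(&f.lock);
        return -1;
    }
    if (pthread_create(&tid, NULL, producer, &f) != 0) {
        pthread_cond_destroy(&f.cond);
        pthread_mutex_destroy(&f.lock);
        return -1;
    }

    pthread_mutex_lock(&f.lock);
    while (!f.ready)                 /* re-check: wakeups can be spurious */
        pthread_cond_wait(&f.cond, &f.lock);
    assert(f.ready);                 /* invariant after the wait loop */
    value = f.payload;
    pthread_mutex_unlock(&f.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&f.cond);
    pthread_mutex_destroy(&f.lock);
    return value;
}
```

The `while` loop around `pthread_cond_wait` matters: the predicate is re-tested after every wakeup, so the code is correct whether the producer runs before or after the waiter takes the lock.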

It really comes down to shared state; globals and static locals are examples of shared state. If you don't share state, you won't have a problem. Other examples of shared state include multiple threads writing to the same file or socket.
Any shared resource will need to be managed properly - that might mean making something mutex protected, opening another file, or intelligently serializing requests.
If two threads are reading and writing from the same struct, you'll need to handle that case.
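A minimal sketch of the "two threads, one struct" case, assuming POSIX threads. The `account` struct and the function names are invented for illustration. Both fields are updated under the same mutex, so no thread can observe one changed without the other.

```c
#include <assert.h>
#include <pthread.h>

enum { DEPOSITS_PER_THREAD = 100000 };

/* Hypothetical shared struct: both fields are protected by lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
    long deposits;
};

static void *depositor(void *arg)
{
    struct account *a = arg;

    for (int i = 0; i < DEPOSITS_PER_THREAD; i++) {
        pthread_mutex_lock(&a->lock);
        a->balance += 1;      /* both updates happen inside one */
        a->deposits += 1;     /* critical section               */
        pthread_mutex_unlock(&a->lock);
    }
    return NULL;
}

/* Runs two depositor threads against one struct and returns the final
   balance (or -1 if a thread or the mutex could not be created). */
long run_depositors(void)
{
    struct account a = { .balance = 0, .deposits = 0 };
    pthread_t t1, t2;
    long result;

    if (pthread_mutex_init(&a.lock, NULL) != 0)
        return -1;
    if (pthread_create(&t1, NULL, depositor, &a) != 0) {
        pthread_mutex_destroy(&a.lock);
        return -1;
    }
    if (pthread_create(&t2, NULL, depositor, &a) != 0) {
        pthread_join(t1, NULL);
        pthread_mutex_destroy(&a.lock);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* After the joins no other thread touches a, so plain reads are fine. */
    assert(a.balance == a.deposits);
    result = a.balance;
    pthread_mutex_destroy(&a.lock);
    return result;
}
```

Without the lock, the two `+= 1` updates from different threads could interleave and lose increments, so the final balance would usually come out below 200000.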

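Beware of the sem_t functions: a blocking call such as sem_wait may return before it has acquired the semaphore when it is interrupted by a signal (signals raised for I/O, SIGCHLD, etc.), failing with errno set to EINTR. If you need them, be sure to always handle that case.
The pthread_mutex_t and pthread_cond_t functions are safe with respect to EINTR.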
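One common way to handle the interrupted case is to retry the wait. The helper below is a sketch; `sem_wait_retry` is a made-up name, not a POSIX function.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper: wait on sem, retrying whenever a signal handler
   interrupts the call. Returns 0 once the semaphore has been acquired,
   or -1 with errno set for any error other than EINTR. */
int sem_wait_retry(sem_t *sem)
{
    int rc;

    assert(sem != NULL);
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

Whether a silent retry is appropriate depends on the program: if a signal is meant to cancel the wait, the caller would need to see the EINTR instead.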

A good open book about concurrency in general can be found here: Little Book of Semaphores
It presents various problems that are solved step by step and includes solutions to common concurrency issues like starvation, race conditions, etc.
It is not language-specific but contains short chapters about implementing the solutions in C with the Pthread-Library or Python.

Related

How to pass variables between pthreads?

I have two types of threads, one student and the other librarian. I also have a list of structs holding basic info about each book, such as book name, ISBN, and publishing year (this list is a shared resource between the threads). I want to pass a pointer to a certain book from a student thread/routine to a librarian thread using condition variables, so that a librarian can reserve the book for the student by means of signaling. How can I accomplish this, and is this even the right way to go about it?
The easiest way is to use pipes (see man 2 pipe).
Faster in terms of performance, but far more complicated, is to use a virtual ring buffer (man 3 vrb, a userland pipe) or some other message-passing middleware.
If these are threads (using pthread library) in the same process, you can share data since the address space is common to them. However, be aware of synchronization issues.
A common way to do that is to use a mutex for every (read or write) access to that common data. Perhaps also use condition variables for synchronization (i.e. thread A needing to tell thread B that something significant changed).
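A sketch of that mutex-plus-condition-variable approach applied to the student/librarian question. All the names here (`book`, `mailbox`, `librarian`, `reserve_book`) are invented for the example. The student posts a book pointer into a one-slot mailbox and signals, and the librarian waits until the slot is filled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct book {
    const char *title;
    int reserved;
};

/* Hypothetical one-slot mailbox shared by the two threads. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  posted;
    struct book    *request;   /* NULL means "no request pending" */
};

/* Librarian: wait for a request, take it, mark the book reserved. */
static void *librarian(void *arg)
{
    struct mailbox *mb = arg;
    struct book *b;

    pthread_mutex_lock(&mb->lock);
    while (mb->request == NULL)
        pthread_cond_wait(&mb->posted, &mb->lock);
    b = mb->request;
    mb->request = NULL;
    b->reserved = 1;
    pthread_mutex_unlock(&mb->lock);
    return b;                  /* hand the pointer back via pthread_join */
}

/* Student side: start a librarian, post the request, wait for the result.
   Returns the book's reserved flag, or -1 if setup fails. */
int reserve_book(struct book *wanted)
{
    struct mailbox mb = { .request = NULL };
    pthread_t tid;
    void *handled = NULL;

    if (pthread_mutex_init(&mb.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&mb.posted, NULL) != 0) {
        pthread_mutex_destroy(&mb.lock);
        return -1;
    }
    if (pthread_create(&tid, NULL, librarian, &mb) != 0) {
        pthread_cond_destroy(&mb.posted);
        pthread_mutex_destroy(&mb.lock);
        return -1;
    }

    pthread_mutex_lock(&mb.lock);
    mb.request = wanted;                /* pass the pointer ...      */
    pthread_cond_signal(&mb.posted);    /* ... and wake the librarian */
    pthread_mutex_unlock(&mb.lock);

    pthread_join(tid, &handled);
    assert(handled == wanted);
    pthread_cond_destroy(&mb.posted);
    pthread_mutex_destroy(&mb.lock);
    return wanted->reserved;
}
```

Only the pointer travels through the mailbox; the book itself stays where it is, which works because threads of one process share an address space.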
Read a good pthread tutorial (and this perhaps also).
is this even the right way to go about it?
Your example is very artificial... the only reason why you would use threads and some strange local variable list for this, is because some teacher tells you to do so. So no, this is not the right way to implement a program to be used in the real world.
In the real world, things like these would almost certainly be implemented through a database, where the DBMS handles access to individual records. Most likely in some kind of client/server system, where there is a client used by the librarian. I don't see why the student would even be part of the system, except as a data record of who borrowed the book.

Multithreading and mutexes

I'm currently beginning development on an indie game in C using the Allegro cross-platform library. I figured that I would separate things like input, sound, game engine, and graphics into their own separate threads to increase the program's robustness. Having no experience in multithreading whatsoever, my question is:
If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will, or would each thread have to use a mutex to lock the memory, then read or write, then unlock?
In particular, I was thinking about the interaction between the game engine and the video renderer. (This is in 2D.) My plan was for the engine to process user input, then spit out the appropriate audio and video to be fed to the speakers and monitor. I was thinking that I'd have a global pointer to the next bitmap to be drawn on the screen, and the code for the game engine and the renderer would be something like this:
ALLEGRO_BITMAP *nextBitmap;
boolean using;
void GameEngine ()
{
ALLEGRO_BITMAP *oldBitmap;
while (ContinueGameEngine())
{
ALLEGRO_BITMAP *bitmap = al_create_bitmap (width, height);
MakeTheBitmap (bitmap);
while (using) ; //The other thread is using the bitmap. Don't mess with it!
al_destroy_bitmap (nextBitmap);
nextBitmap = bitmap;
}
}
void Renderer ()
{
while (ContinueRenderer())
{
ALLEGRO_BITMAP *bitmap = al_clone_bitmap (nextBitmap);
DrawBitmapOnScreen (bitmap);
}
}
This seems unstable... maybe something would happen in the call to al_clone_bitmap but I am not quite certain how to handle something like this. I would use a mutex on the bitmap, but mutexes seem like they take time to lock and unlock and I'd like both of these threads (especially the game engine thread) to run as fast as possible. I also read up on something called a condition, but I have absolutely no idea how a condition would be applicable or useful, although I'm sure they are. Could someone point me to a tutorial on mutexes and conditions (preferably POSIX, not Windows), so I can try to figure all this out?
If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will
The answer is "it depends" which usually means "no".
Depending on what you're writing/reading, and depending on the logic of your program, you could wind up with wild results or corruption if you try writing and reading with no synchronization and you're not absolutely sure that writes and reads are atomic.
So you should just use a mutex unless:
You're absolutely sure that writes and reads are atomic, and you're absolutely sure that one thread is only reading (ideally you'd use some kind of specific support for atomic operations such as the Interlocked family of functions from WinAPI).
You absolutely need the tiny performance gain from not locking.
Also worth noting that your while (using); construct would be a lot more reliable, correct, and would probably even perform better if you used a spin lock (again if you're absolutely sure you need a spin lock, rather than a mutex).
The tool that you need is called atomic operations, which ensure that the reader thread only reads whole data as written by the other thread. If you don't use such operations, the data may be read only partially, so what the reader sees may make no sense at all in terms of your application.
The new C11 standard has these operations, but it is not yet widely implemented. Many compilers have extensions that implement them, though. E.g., gcc has a series of builtin functions that start with a __sync prefix.
There are a lot of tutorials and man pages that Google will find for you. I found http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html after a few minutes of searching.
Also, begin with a very small example and increase the difficulty gradually. Start with thread creation and termination, thread return values, and thread synchronization. Continue with POSIX mutexes and conditions until you understand all these terms.
Another important source of documentation is the Linux man and info pages.
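For a compiler that does ship C11's `<stdatomic.h>`, a pointer can be published from one thread to another roughly like this. It is a sketch only: the `frame` struct and the function names are made up. The writer fills in the data first and then stores the pointer with release ordering; a reader that loads the pointer with acquire ordering is guaranteed to see the completed fields.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct frame {
    int width;
    int height;
};

static struct frame frame_storage;

/* Static atomic objects start out zero-initialized, i.e. NULL here. */
static _Atomic(struct frame *) published;

static void *writer(void *arg)
{
    (void)arg;
    frame_storage.width = 640;          /* fill in the data first ... */
    frame_storage.height = 480;
    atomic_store_explicit(&published, &frame_storage,
                          memory_order_release);   /* ... then publish */
    return NULL;
}

/* Starts the writer, waits for the pointer to appear, and returns
   width * height as seen by the reader (or -1 on failure). */
int read_published_area(void)
{
    pthread_t tid;
    struct frame *f;
    int area;

    if (pthread_create(&tid, NULL, writer, NULL) != 0)
        return -1;

    while ((f = atomic_load_explicit(&published,
                                     memory_order_acquire)) == NULL)
        sched_yield();                  /* nothing published yet */

    assert(f == &frame_storage);
    area = f->width * f->height;        /* sees the complete struct */
    pthread_join(tid, NULL);
    return area;
}
```

This only covers handing over a finished object. It does not make it safe for the writer to modify or free the object while the reader is still using it; that still needs a lock or a more elaborate scheme.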
Good luck
If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will, or would each thread have to use a mutex to lock the memory, then read or write, then unlock?
If two different threads are reading and writing the same section of data in memory, the code that accesses it is called a critical section, and the situation is the classic producer-consumer problem.
There are many resources that speak to this issue:
https://docs.oracle.com/cd/E19455-01/806-5257/sync-31/index.html
https://stackoverflow.com/questions/tagged/producer-consumer
But yes if you are going to be using two different threads to read and write you will have to implement the use of mutexes or another form of locking and unlocking.

Reader-Writer using semaphores and shared memory in C

I'm trying to make a simple reader/writer program using POSIX named semaphores. It's working, but on some systems it halts immediately on the first semaphore and that's it... I'm really desperate by now. Can anyone help, please? It's working fine on my system, so I can't track the problem with ltrace. (Sorry for the comments, I'm from the Czech Republic.)
https://www.dropbox.com/s/hfcp44u2r0jd7fy/readerWriter.c
POSIX semaphores are not well suited for application code since they are interruptible. Basically any sort of IO to your processes will mess up your signalling. Please have a look at this post.
So you'd have to be really careful to interpret all error returns from the sem_ functions properly. In the code that you posted there is no such thing.
If your implementation of POSIX supports them, just use rwlocks, they are made for this, are much higher level and don't encounter that difficulty.
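A minimal sketch of the rwlock suggestion, using threads in one process rather than the separate processes and shared memory of the question. The struct and function names are invented for this example. Readers take the lock with `pthread_rwlock_rdlock` and may overlap with each other; the writer takes it with `pthread_rwlock_wrlock` and excludes everyone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

enum { WRITES = 1000, READERS = 2 };

/* Hypothetical shared data guarded by a readers-writer lock. */
struct shared_table {
    pthread_rwlock_t lock;
    int value;
};

static void *writer_thread(void *arg)
{
    struct shared_table *t = arg;

    for (int i = 0; i < WRITES; i++) {
        pthread_rwlock_wrlock(&t->lock);    /* exclusive access */
        t->value += 1;
        pthread_rwlock_unlock(&t->lock);
    }
    return NULL;
}

static void *reader_thread(void *arg)
{
    struct shared_table *t = arg;

    for (int i = 0; i < WRITES; i++) {
        pthread_rwlock_rdlock(&t->lock);    /* shared with other readers */
        int v = t->value;
        pthread_rwlock_unlock(&t->lock);
        assert(v >= 0 && v <= WRITES);
    }
    return NULL;
}

/* Runs one writer and READERS readers; returns the final value, which is
   WRITES when the writer thread ran, or -1 if the lock cannot be set up. */
int run_rwlock_demo(void)
{
    struct shared_table t = { .value = 0 };
    pthread_t threads[READERS + 1];
    int started = 0;
    int final;

    if (pthread_rwlock_init(&t.lock, NULL) != 0)
        return -1;
    if (pthread_create(&threads[started], NULL, writer_thread, &t) == 0)
        started++;
    for (int i = 0; i < READERS; i++)
        if (pthread_create(&threads[started], NULL, reader_thread, &t) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    final = t.value;            /* safe: every thread has been joined */
    pthread_rwlock_destroy(&t.lock);
    return final;
}
```

For readers and writers in different processes, the lock would have to live in the shared memory segment and be initialized with the process-shared attribute (`pthread_rwlockattr_setpshared`), on systems that support it.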
In computer science, the readers-writers problems are examples of a common computing problem in concurrency. There are at least three variations of the problems, which deal with situations in which many threads try to access the same shared memory at one time. Some threads may read and some may write, with the constraint that no process may access the share for either reading or writing, while another process is in the act of writing to it. (In particular, it is allowed for two or more readers to access the share at the same time.) A readers-writer lock is a data structure that solves one or more of the readers-writers problems.

Information Exchange between two threads by calling a shared DLL

Can you create a "conversation" (or-Information Exchange) between 2 threads, if those two threads are calling a shared DLL library? And, if this conversation is possible, What are the requirements or restrictions for it to actually take place between the threads?
This question was given to us by our professor. I can only assume, by the question's context, that my professor is referring to synchronization required between the two threads for the conversation to succeed, or restricting the DLL linking type (Implicit or Explicit).
Then again, assumptions or not, I am rather at a loss here :)
P.s. - In this case, we are programming in C.
Thanks in advance for your help :)
It appears that your professor is testing your understanding of what space DLLs are loaded into, and how this relates to threads.
Without doing your homework for you, I encourage you to consider what happens if two threads each call LoadLibrary() on a particular DLL. Is the DLL loaded into the process twice?
Given the result of the above, what implications does this have regarding the two threads making calls into that DLL?
Did you think about using Boost.Interprocess, because C++ has many implicit allocations. In general you need a system-wide mutex in order to synchronize access to that portion of memory.
I think that, given that each thread calls LoadLibrary(), the system will allocate a different memory segment for each DLL, so the threads will not have a mutual resource to work with and will be unable to exchange any information.
but...
Say we link explicitly to the DLL using #pragma comment(lib, "myDLL.lib")
I think that in this way you'll be able to share resources between threads because the DLL is fully loaded at the program startup.
Jeff? .. is this right ?...

Safety nets in complex multi-threaded code?

As a developer who has just finished writing thousands of lines of complex multi-threaded 'C' code in a project, and which is going to be enhanced, modified etc. by several other developers unfamiliar with this code in the future, I wanted to find out what kind of safety nets do you guys try to put in such code? As an example I could do these:
Define accessor macros for lock-protected structure members, which assert that the corresponding lock is held. This makes it clear to anyone unfamiliar with this code that these members are lock-protected.
Functions which are supposed to be called with some spinlock held assert that the spinlock is being held.
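The two ideas in the list above could be sketched roughly as follows. POSIX has no portable "is this lock held by me?" query, so this example assumes a small wrapper that records the owner itself. Everything here (`checked_lock`, `lock_held`, `JOB_QUEUE_PENDING`, `job_queue_add`) is a hypothetical name, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mutex wrapper that remembers which thread holds it. */
struct checked_lock {
    pthread_mutex_t mutex;
    pthread_t       owner;   /* meaningful only while held is true */
    bool            held;
};

int checked_lock_init(struct checked_lock *l)
{
    l->held = false;
    return pthread_mutex_init(&l->mutex, NULL);
}

/* True if the calling thread holds l. This is a debugging aid meant for
   assertions in code that expects to hold the lock; a thread that does
   not hold the lock reads these fields without synchronization. */
bool lock_held(struct checked_lock *l)
{
    return l->held && pthread_equal(l->owner, pthread_self());
}

void checked_lock_acquire(struct checked_lock *l)
{
    pthread_mutex_lock(&l->mutex);
    l->owner = pthread_self();
    l->held = true;
}

void checked_lock_release(struct checked_lock *l)
{
    assert(lock_held(l));    /* catches unlocking a lock we do not hold */
    l->held = false;
    pthread_mutex_unlock(&l->mutex);
}

struct job_queue {
    struct checked_lock lock;
    int pending_;            /* lock-protected: use JOB_QUEUE_PENDING */
};

/* Accessor macro: asserts the lock is held, then yields the member
   as an lvalue. */
#define JOB_QUEUE_PENDING(q) \
    (*(assert(lock_held(&(q)->lock)), &(q)->pending_))

/* Must be called with q->lock held. */
void job_queue_add(struct job_queue *q, int n)
{
    assert(lock_held(&q->lock));
    JOB_QUEUE_PENDING(q) += n;
}
```

A caller would write `checked_lock_acquire(&q.lock); job_queue_add(&q, 3); checked_lock_release(&q.lock);`. Forgetting the acquire trips the assertion in a debug build instead of silently racing.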
What kind of safety nets have you put into multi-threaded code that you have written?
What kind of problems have you faced when other developers modified such code?
What kind of debugging aids have you put into such code?
Thanks for your comments.
There are a number of things we do in our product (a hypervisor designed to help you find concurrency bugs in applications) that are more generally useful. Note that we do these in our code itself (because it's a highly concurrent piece of software) and that some of these are useful whether or not you are writing concurrent code.
Like you, we have the ability to assert(lock_held(...)) and use it.
We also (because we have our own scheduler) can assert(single_threaded()) for those (rare) situations where we count on no other thread being active in the system.
Memory corruption from one thread to another is pretty common (and hard to debug), so we do two things to address this. Sprinkled throughout our thread stack are some magic cookies. We periodically (in our get_thread_id() function) invoke a validate_thread_stack() function that checks these cookies to make sure the stack is not corrupted.
Our malloc sticks magic cookies before and after a malloc block of memory and checks these on free. If anyone overruns their data these can be used to find the corruption early.
On free() we blast a well known pattern (in our case 0xdddd...) over the memory. This nicely corrupts anyone else who had a dangling pointer left over to that memory region.
We have a guard page (a memory page not mapped into the address space) near the bottom of the thread stack. If the thread overruns its stack, we catch it via page fault and drop into our debugger.
Our locks are witnessed. Check out the FreeBSD lock witness code. It's like that, but homebrew. Basically the witness code is a lightweight way of detecting potential deadlocks by looking at cycles in the lock acquisition graph.
Our locks are also wrapped with accessors that record the file/line number of acquisition and release. For double unlocks or double locks, you get pretty debug information on your screwup.
Our locks are also profiled. Once you get your code working you want it working well. We track the usual things like how many acquisitions, how long it took to acquire it.
In our system, we have an expectation that locks are not contended (we carefully designed the code this way). So if you wait for a spin lock longer than a second or two in our system, you get dropped into the debugger because it's most likely not a good thing.
Our variables that are meant to be updated atomically are wrapped inside of C structs. The reason for this is to prevent sloppy code where you mix good use, atomic_increment(&var), with bad use, var++. We make it very hard to write the latter code.
"volatile" is forbidden in our code base because it's ambiguously implemented by compilers. It's a bad way to try to cobble together synchronization.
And of course code reviews. If you can't explain your concurrency assumptions and locking discipline to a colleague, then there are definitely issues with the code :-)
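The struct-wrapping trick for atomically updated variables might look like this with C11 atomics. This is a sketch: the answer does not show its own implementation, and apart from `atomic_increment`, which the answer mentions, the names are invented here. Because the counter is a struct, writing `hits++` on it does not compile, so every update has to go through the accessor functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Wrapping the atomic in a struct means `counter++` is a compile error. */
typedef struct {
    atomic_int value;
} atomic_counter;

void atomic_counter_init(atomic_counter *c, int v)
{
    atomic_init(&c->value, v);
}

/* Atomically adds one and returns the new value. */
int atomic_increment(atomic_counter *c)
{
    return atomic_fetch_add(&c->value, 1) + 1;
}

int atomic_read(atomic_counter *c)
{
    return atomic_load(&c->value);
}

enum { INCREMENTS = 100000 };

static void *bump(void *arg)
{
    atomic_counter *c = arg;

    for (int i = 0; i < INCREMENTS; i++)
        atomic_increment(c);
    return NULL;
}

/* Two threads increment one counter; returns the final count
   (or -1 if a thread could not be created). */
int count_with_two_threads(void)
{
    atomic_counter hits;
    pthread_t t1, t2;

    atomic_counter_init(&hits, 0);
    if (pthread_create(&t1, NULL, bump, &hits) != 0)
        return -1;
    if (pthread_create(&t2, NULL, bump, &hits) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    assert(atomic_read(&hits) == 2 * INCREMENTS);
    return atomic_read(&hits);
}
```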
Make everything absolutely obvious, so that other developers cannot miss the synchronization scope when they view subsections of the code in isolation.
for example: don't hold a lock in code that spans multiple files.
Seems like you've answered your own question: put lots of assertions into the code. They will tell other developers what invariants and preconditions must hold.

Resources