Multithreading and mutexes


I'm currently beginning development on an indie game in C using the Allegro cross-platform library. I figured that I would separate things like input, sound, game engine, and graphics into their own separate threads to increase the program's robustness. Having no experience in multithreading whatsoever, my question is:
If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will, or would each thread have to use a mutex to lock the memory, then read or write, then unlock?
In particular, I was thinking about the interaction between the game engine and the video renderer. (This is in 2D.) My plan was for the engine to process user input, then spit out the appropriate audio and video to be fed to the speakers and monitor. I was thinking that I'd have a global pointer to the next bitmap to be drawn on the screen, and the code for the game engine and the renderer would be something like this:
    ALLEGRO_BITMAP *nextBitmap;
    boolean using;

    void GameEngine ()
    {
        ALLEGRO_BITMAP *oldBitmap;
        while (ContinueGameEngine())
        {
            ALLEGRO_BITMAP *bitmap = al_create_bitmap (width, height);
            MakeTheBitmap (bitmap);
            while (using) ; //The other thread is using the bitmap. Don't mess with it!
            al_destroy_bitmap (nextBitmap);
            nextBitmap = bitmap;
        }
    }

    void Renderer ()
    {
        while (ContinueRenderer())
        {
            ALLEGRO_BITMAP *bitmap = al_clone_bitmap (nextBitmap);
            DrawBitmapOnScreen (bitmap);
        }
    }
This seems unstable... maybe something would happen in the call to al_clone_bitmap but I am not quite certain how to handle something like this. I would use a mutex on the bitmap, but mutexes seem like they take time to lock and unlock and I'd like both of these threads (especially the game engine thread) to run as fast as possible. I also read up on something called a condition, but I have absolutely no idea how a condition would be applicable or useful, although I'm sure they are. Could someone point me to a tutorial on mutexes and conditions (preferably POSIX, not Windows), so I can try to figure all this out?

If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will
The answer is "it depends" which usually means "no".
Depending on what you're writing/reading, and depending on the logic of your program, you could wind up with wild results or corruption if you try writing and reading with no synchronization and you're not absolutely sure that writes and reads are atomic.
So you should just use a mutex unless:
You're absolutely sure that writes and reads are atomic, and you're absolutely sure that one thread is only reading (ideally you'd use some kind of specific support for atomic operations such as the Interlocked family of functions from WinAPI).
You absolutely need the tiny performance gain from not locking.
Also worth noting that your while (using); construct would be a lot more reliable, correct, and would probably even perform better if you used a spin lock (again if you're absolutely sure you need a spin lock, rather than a mutex).
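As a rough illustration of the "just use a mutex" advice, here is a minimal sketch of handing a frame from an engine thread to a renderer. The `frame` type is a hypothetical stand-in for ALLEGRO_BITMAP, and `publish_frame`, `take_frame` and `run_handoff_demo` are made-up names. It also swaps the question's clone-the-bitmap idea for an ownership transfer, which is an assumption about what would suit the game, not something from the original code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ALLEGRO_BITMAP so the sketch builds without Allegro. */
typedef struct frame { int id; } frame;

static pthread_mutex_t frame_lock = PTHREAD_MUTEX_INITIALIZER;
static frame *next_frame = NULL;   /* shared; only touched with frame_lock held */

/* Engine side: publish a new frame and free the one nobody picked up. */
void publish_frame(frame *f)
{
    pthread_mutex_lock(&frame_lock);
    frame *old = next_frame;
    next_frame = f;
    pthread_mutex_unlock(&frame_lock);
    free(old);                     /* freed outside the lock; free(NULL) is a no-op */
}

/* Renderer side: take ownership of the latest frame, or NULL if nothing is new. */
frame *take_frame(void)
{
    pthread_mutex_lock(&frame_lock);
    frame *f = next_frame;
    next_frame = NULL;
    pthread_mutex_unlock(&frame_lock);
    return f;
}

static void *engine_thread(void *arg)
{
    int count = *(int *)arg;
    for (int i = 1; i <= count; i++) {
        frame *f = malloc(sizeof *f);
        assert(f != NULL);
        f->id = i;
        publish_frame(f);
    }
    return NULL;
}

/* Runs the engine in a second thread; returns the id of the last frame taken. */
int run_handoff_demo(int count)
{
    pthread_t t;
    int last = 0;
    int rc = pthread_create(&t, NULL, engine_thread, &count);
    assert(rc == 0);
    (void)rc;
    while (last < count) {         /* the "renderer" may skip frames, never tears one */
        frame *f = take_frame();
        if (f) {
            last = f->id;
            free(f);
        }
    }
    pthread_join(t, NULL);
    return last;
}
```

The lock is held only long enough to swap a pointer, so the cost the question worries about should be small next to building or drawing a bitmap.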

The tool that you need is called atomic operations, which ensure that the reader thread only reads whole data as written by the other thread. If you don't use such operations, the data may be read only partially, so what was read may make no sense at all in terms of your application.
The new C11 standard has these operations, but it is not yet widely implemented. Many compilers have extensions that implement them, though. E.g. gcc has a series of builtin functions that start with a __sync prefix.
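For a compiler that does ship C11's <stdatomic.h>, the idea might look like the sketch below: a single atomic pointer is swapped, so a reader sees either the old frame or the new one, never a half-written pointer. The `frame` type and the `publish`, `claim` and `atomic_swap_demo` names are hypothetical, and the demo is single-threaded only to keep it short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct frame { int id; } frame;   /* hypothetical payload */

static _Atomic(frame *) latest;           /* static storage, so it starts out NULL */

/* Writer: swap in the new frame and free whatever was still unclaimed. */
void publish(frame *f)
{
    frame *old = atomic_exchange(&latest, f);
    free(old);                            /* free(NULL) is a no-op */
}

/* Reader: claim the current frame (may be NULL); the caller then owns it. */
frame *claim(void)
{
    return atomic_exchange(&latest, (frame *)NULL);
}

/* Publishes two frames, claims once, and returns the id that was seen. */
int atomic_swap_demo(void)
{
    frame *a = malloc(sizeof *a);
    frame *b = malloc(sizeof *b);
    assert(a != NULL && b != NULL);
    a->id = 1;
    b->id = 2;
    publish(a);
    publish(b);                           /* frees a, which was never claimed */
    frame *got = claim();
    int id = got ? got->id : -1;
    free(got);
    return (claim() == NULL) ? id : -1;   /* a second claim finds nothing */
}
```

Because each side takes the pointer out with an exchange, only one thread ever owns a given frame, which is what makes the `free` calls safe.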

There are a lot of tutorials and man pages to be found via Google; search for them. I found http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html after a few minutes of searching.
Besides that, begin with a very small example and increase the difficulty gradually. Start with thread creation and termination, thread return values, and thread synchronization. Then continue with POSIX mutexes and conditions until you understand all these terms.
One important source of documentation is the Linux man and info pages.
Good luck

If I have a section of data in memory (say, a pointer to a data structure), is it okay for one thread to write to it at will and another to read from it at will, or would each thread have to use a mutex to lock the memory, then read or write, then unlock?
If you have a section of data in memory that two different threads read and write, the code that accesses it is called a critical section, and coordinating that access is the classic producer-consumer problem.
There are many resources that speak to this issue:
https://docs.oracle.com/cd/E19455-01/806-5257/sync-31/index.html
https://stackoverflow.com/questions/tagged/producer-consumer
But yes, if two different threads are going to read and write the same data, you will have to use mutexes or another form of locking and unlocking.
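Since the question also asks what a condition is good for, here is a minimal producer-consumer sketch with a one-item slot, a POSIX mutex and two condition variables. All names (`slot_put`, `slot_get`, `run_producer_consumer`) are made up for illustration. The point is that a thread with nothing to do sleeps in pthread_cond_wait instead of spinning.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t slot_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  slot_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  slot_empty = PTHREAD_COND_INITIALIZER;
static int  slot;                 /* the shared data */
static bool has_item = false;     /* the condition both sides wait on */

/* Producer side: wait until the slot is empty, then fill it. */
void slot_put(int value)
{
    pthread_mutex_lock(&slot_lock);
    while (has_item)                              /* loop guards against spurious wakeups */
        pthread_cond_wait(&slot_empty, &slot_lock);
    slot = value;
    has_item = true;
    pthread_cond_signal(&slot_full);
    pthread_mutex_unlock(&slot_lock);
}

/* Consumer side: wait until the slot is full, then empty it. */
int slot_get(void)
{
    pthread_mutex_lock(&slot_lock);
    while (!has_item)
        pthread_cond_wait(&slot_full, &slot_lock);
    int value = slot;
    has_item = false;
    pthread_cond_signal(&slot_empty);
    pthread_mutex_unlock(&slot_lock);
    return value;
}

static void *producer_main(void *arg)
{
    int n = *(int *)arg;
    for (int i = 1; i <= n; i++)
        slot_put(i);
    return NULL;
}

/* Produces 1..n in one thread, consumes in the caller, returns the sum. */
long run_producer_consumer(int n)
{
    pthread_t t;
    long sum = 0;
    int rc = pthread_create(&t, NULL, producer_main, &n);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < n; i++)
        sum += slot_get();
    pthread_join(t, NULL);
    return sum;
}
```

pthread_cond_wait releases the mutex while the thread sleeps and reacquires it before returning, which is why the check of `has_item` can safely sit inside the locked region.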

Related

Do I need a mutex to protect a int value which could be get/set via sysfs?

Multiple user-space processes could access this value at the same time, so I guess we should use a lock or a memory barrier to be safe, but I could find quite a lot of code in Linux drivers that doesn't, or that only protects the write case.
Do we really need a mutex for both read case and write case?

It depends on the CPU and the system the code is executed on. You can actually do this without synchronization techniques if the operation is atomic. As long as you're not sure about this, it's better to use a synchronization object. For int/dword values, most of the time people do this without a sync object.
Read this article
http://preshing.com/20130618/atomic-vs-non-atomic-operations/
and also the same issue in: Are C++ Reads and Writes of an int Atomic?

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.

(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. The particular case where two or more threads can enter a shared zone at the same time, and the outcome differs depending on which gets there first, is called a race condition (the term also extends to/from electronics and other areas).
The hard thing about threading is that computing is such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A one-thread affair would guarantee that it can do all tasks in the same order, always. When you've got multiple threads, the order depends on how fast each can complete a task and on other applications wanting to share the CPU, so the underlying hardware affects the results.
There's no single "sure-fire way to do threading"; instead there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well known technique is using semaphores (or locks), and the most well known semaphore is the mutex one, which only allows one thread at a time to access a shared space, by having a sort of "flag" that is raised once a thread has entered.
    if (locked == NO)
    {
        locked = YES;
        // Do ya' thing
        locked = NO;
    }
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () before either sets the variable (which threads can easily do). So there's hardware support for this kind of operation that guarantees only one thread can succeed: the test-and-set operation, which checks the variable and, if it is free, sets it in a single atomic step. (x86 has instructions for this in its instruction set.)
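In C11 the test-and-set operation is exposed as atomic_flag_test_and_set. The sketch below builds a very small spin lock on it and uses it to protect a counter. The names `spin_lock`, `spin_unlock` and `run_spin_demo` are invented for this example, and a real program would usually reach for a library mutex or spin lock first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag locked = ATOMIC_FLAG_INIT;
static long counter = 0;          /* protected by the spin lock below */

/* Test-and-set: atomically set the flag and learn whether it was already set. */
static void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&locked, memory_order_acquire))
        ;                         /* someone else holds it: keep trying */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&locked, memory_order_release);
}

static void *spin_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        spin_lock();
        counter++;                /* "Do ya' thing" */
        spin_unlock();
    }
    return NULL;
}

/* Two threads each add n to the counter; returns the final count. */
long run_spin_demo(int n)
{
    pthread_t a, b;
    counter = 0;
    int rc = pthread_create(&a, NULL, spin_worker, &n);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, spin_worker, &n);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Unlike the plain if/assign version, the check and the set cannot be separated, so two threads can never both see the flag as free.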
In the same vein as locks and semaphores, there's also the read-write lock, which allows multiple readers or one writer, and is especially useful for things with low volatility. And there are many other variations, some of which limit access to X threads at a time, and whatnot.
But overall, locks are lame, since they are basically forcing serialisation of multi-threading, where threads actually need to get stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, or try to have "try and leave" kind of threads that check whether the lock is free and go away if it isn't, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread pattern doesn't solve the problems discussed before, but it changes the paradigm to make it easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you just switch the focus to "tasks that need to be executed" and the matter of which thread is doing it, becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions). Try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)

Calling convention defines how stack & registers are used to implement function calls. Because each thread has its own stack & registers, synchronising threads and calling convention are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
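A minimal sketch of that arrangement with POSIX threads follows. The function `one_at_a_time` and the bookkeeping counters are hypothetical; the counters exist only so the demo can report how many threads were ever inside the protected region at once.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 4

static pthread_mutex_t call_lock = PTHREAD_MUTEX_INITIALIZER;
static int inside = 0;        /* threads currently in the protected region */
static int max_inside = 0;    /* highest value `inside` ever reached */

/* Only one thread at a time gets past the lock; the rest block until it unlocks. */
void one_at_a_time(void)
{
    pthread_mutex_lock(&call_lock);
    inside++;
    if (inside > max_inside)
        max_inside = inside;
    inside--;
    pthread_mutex_unlock(&call_lock);
}

static void *caller_main(void *arg)
{
    int calls = *(int *)arg;
    for (int i = 0; i < calls; i++)
        one_at_a_time();
    return NULL;
}

/* Starts THREADS callers and reports the most threads ever seen inside at once. */
int run_serialized_demo(int calls)
{
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, caller_main, &calls);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return max_inside;
}
```

The second caller does not fail or skip the call; it simply waits in pthread_mutex_lock until the first one has unlocked, which is the behaviour the question describes.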
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.

Making process survive failure in its thread

I'm writing an app that has many independent threads. Since I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a proper way to do this?
Currently I intercept the aforementioned signals, and in their signal handler I call pthread_exit(NULL). It seems to work, but since pthread_exit is not an async-signal-safe function, I'm a bit concerned about this solution.
I know that splitting this app into multiple processes would solve the problem, but in this case it's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.

The PROPER way to do this is to let the whole process die, and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe that doesn't involve letting the whole process die. (Note also that sometimes just the act of continuing after this sort of error is "undefined behaviour" - it doesn't mean that you are definitely going to fall over, just that it MAY be a problem).
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.

Low-latency code design involves a careful "be aware of the system you run on" type of coding and deployment. That means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgrcv to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the PThreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow ... because they involve something that takes thousands of CPU cycles ... namely, context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern - both producers as well as consumers spin in tight loops permanently polling a single or at worst a small number of atomically-updated memory locations that say where the next item-to-be-processed is found and/or to mark a processed item complete. Bind all producers / consumers to separate CPU cores so that they will never context switch.
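To make the polling idea concrete, here is a much simplified single-producer/single-consumer ring buffer in C11. It is not the Disruptor itself, only a sketch in the same spirit: each side spins on one atomically updated index. `RING_SIZE`, `ring_push`, `ring_pop` and `run_ring_demo` are made-up names, and pinning the threads to cores is left out because it is platform-specific.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 64              /* arbitrary capacity chosen for this sketch */

static int ring[RING_SIZE];
static atomic_size_t head;        /* next slot to write; only the producer stores it */
static atomic_size_t tail;        /* next slot to read; only the consumer stores it */

/* Producer: returns false when the ring is full. */
static bool ring_push(int v)
{
    size_t h = atomic_load_explicit(&head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return false;
    ring[h % RING_SIZE] = v;
    /* release: the slot contents become visible before the new head does */
    atomic_store_explicit(&head, h + 1, memory_order_release);
    return true;
}

/* Consumer: returns false when the ring is empty. */
static bool ring_pop(int *out)
{
    size_t t = atomic_load_explicit(&tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&head, memory_order_acquire);
    if (h == t)
        return false;
    *out = ring[t % RING_SIZE];
    /* release: the slot is handed back only after it has been read */
    atomic_store_explicit(&tail, t + 1, memory_order_release);
    return true;
}

static void *ring_producer(void *arg)
{
    int n = *(int *)arg;
    for (int i = 1; i <= n; i++)
        while (!ring_push(i))
            ;                     /* full: spin until the consumer catches up */
    return NULL;
}

/* Pushes 1..n from a second thread, pops them here, returns the sum. */
long run_ring_demo(int n)
{
    pthread_t t;
    long sum = 0;
    int v = 0;
    int rc = pthread_create(&t, NULL, ring_producer, &n);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < n; i++) {
        while (!ring_pop(&v))
            ;                     /* empty: spin until the producer publishes */
        sum += v;
    }
    pthread_join(t, NULL);
    return sum;
}
```

No system call is made on the hot path, which is the property the answer is after; the price is that both threads burn a core while waiting.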
In this type of usecase, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference because TLBs and data caches are "always hot" (you never context switch).
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed / timed out messages, but such garbage collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system call latency is a few micros even in best case.
From my point of view, you're trying to mix oil and water here; you're required to use library code not specifically written for use in low-latency deployments / library code not under your control, combined with the requirement to do message dispatch with nanosec latencies. There is no way to make e.g. pthread_cond_signal() give you nsec latency because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...

Reading Critical Section Data using pthreads

I have a multi-threaded application, I'm using pthreads with the pthread_mutex_lock function. The only data I need to protect is in one data structure. Is it safe if I apply the lock only when I write to the data structure? Or should I apply the lock whenever I read or write?
I found a question similar to this, but it was for Windows; from that question it would seem that the answer to mine is that it is OK. I just want to make sure, though.
EDIT
Follow-up: So I want to pass in a command-line argument and only read from it (from different threads). Do I still have to use pthread_mutex_lock?

You could use a pthread_rwlock_t to allow "one-writer OR N-readers" concurrency. But if you stick with the general pthread_mutex_lock, it needs to be acquired for ANY access to the shared data structure it's protecting, so you're cutting things down to "one reader-or-writer" concurrency.
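A minimal sketch of the read-write lock variant, assuming a made-up `pair` structure whose two fields must always sum to zero. Readers take the lock in shared mode and the writer in exclusive mode, so a reader can never observe the structure halfway through an update. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_rwlock_* under strict -std= modes */
#include <assert.h>
#include <pthread.h>

typedef struct { int x, y; } pair;    /* hypothetical shared data; invariant: x + y == 0 */

static pthread_rwlock_t pair_lock = PTHREAD_RWLOCK_INITIALIZER;
static pair shared_pair = {0, 0};

/* Writer: exclusive access while both fields are updated. */
void pair_set(int v)
{
    pthread_rwlock_wrlock(&pair_lock);
    shared_pair.x = v;
    shared_pair.y = -v;
    pthread_rwlock_unlock(&pair_lock);
}

/* Reader: shared access; any number of readers may hold the lock together. */
int pair_sum(void)
{
    pthread_rwlock_rdlock(&pair_lock);
    int s = shared_pair.x + shared_pair.y;
    pthread_rwlock_unlock(&pair_lock);
    return s;
}

typedef struct { int reads; long violations; } reader_arg;

static void *pair_reader(void *p)
{
    reader_arg *a = p;
    for (int i = 0; i < a->reads; i++)
        if (pair_sum() != 0)
            a->violations++;          /* would mean a half-written pair was seen */
    return NULL;
}

static void *pair_writer(void *p)
{
    int n = *(int *)p;
    for (int i = 1; i <= n; i++)
        pair_set(i);
    return NULL;
}

/* One writer, two readers; returns how many inconsistent reads were observed. */
long run_rwlock_demo(int n)
{
    pthread_t w, r[2];
    reader_arg args[2] = { {n, 0}, {n, 0} };
    int rc = pthread_create(&w, NULL, pair_writer, &n);
    assert(rc == 0);
    for (int i = 0; i < 2; i++) {
        rc = pthread_create(&r[i], NULL, pair_reader, &args[i]);
        assert(rc == 0);
    }
    (void)rc;
    pthread_join(w, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(r[i], NULL);
    return args[0].violations + args[1].violations;
}
```

Locking only in `pair_set` and not in `pair_sum` would let a reader run between the two field writes, which is exactly the situation the question asks about.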

It is necessary to apply the lock when you read as well unless you can guarantee atomic writes (at which point you don't even need to lock on write). The problem arises from writes that take more than 1 cycle.
Imagine you write 8 bytes as two 4-byte writes. If the other thread kicks in after only half has been written, the read will return invalid data. It's very uncommon for this to happen, but when it does it's a hell of a bug to track down.

Yes, you need to be locked for reads as well as writes.
Compilers and CPUs do not necessarily write to a field in a structure atomically. In addition your code may not write atomically, and the structure may at certain points be out of sync with regards to itself.
If all you need to share is a single integer value, you might choose to use atomic integers. GCC has atomic builtins you can use. This is not as portable as using pthreads locks.

Safety nets in complex multi-threaded code?

As a developer who has just finished writing thousands of lines of complex multi-threaded 'C' code in a project, and which is going to be enhanced, modified etc. by several other developers unfamiliar with this code in the future, I wanted to find out what kind of safety nets do you guys try to put in such code? As an example I could do these:
Define accessor macros for lock-protected structure members, which assert that the corresponding lock is held. This makes it clear to anyone unfamiliar with this code that these members are lock-protected.
Functions which are supposed to be called with some spinlock held assert that the spinlock is being held.
What kind of safety nets have you put into multi-threaded code that you have written?
What kind of problems have you faced when other developers modified such code?
What kind of debugging aids have you put into such code?
Thanks for your comments.

There are a number of things we do in our product (a hypervisor designed to help you find concurrency bugs in applications) that are more generally useful. Note that we do these in our code itself (because it's a highly concurrent piece of software) and that some of these are useful whether or not you are writing concurrent code.
Like you, we have the ability to assert(lock_held(...)) and use it.
We also (because we have our own scheduler) can assert(single_threaded()) for those (rare) situations where we count on no other thread being active in the system.
Memory corruption from one thread to another is pretty common (and hard to debug), so we do two things to address this: sprinkled throughout our thread stack are some magic cookies. We periodically (in our get_thread_id() function) invoke a "validate_thread_stack()" function that checks these cookies to make sure the stack is not corrupted.
Our malloc sticks magic cookies before and after a malloc block of memory and checks these on free. If anyone overruns their data these can be used to find the corruption early.
On free() we blast a well known pattern (in our case 0xdddd...) over the memory. This nicely corrupts anyone else who had a dangling pointer left over to that memory region.
We have a guard page (a memory page not mapped into the address space) near the bottom of the thread stack. If the thread overruns its stack, we catch it via page fault and drop into our debugger.
Our locks are witnessed. Check out the FreeBSD lock witness code. It's like that, but homebrew. Basically the witness code is a lightweight way of detecting potential deadlocks by looking at cycles in the lock acquisition graph.
Our locks are also wrapped with accessors that record the file/line number of acquisition and release. For double unlocks or double locks, you get pretty debug information on your screwup.
Our locks are also profiled. Once you get your code working you want it working well. We track the usual things like how many acquisitions, how long it took to acquire it.
In our system, we have an expectation that locks are not contended (we carefully designed the code this way). So if you wait for a spin lock longer than a second or two in our system you get dropped into the debugger, because it's most likely not a good thing.
Our variables that are meant to be updated atomically are wrapped inside of C structs. The reason for this is to prevent sloppy code where you mix good use, atomic_increment(&var);, and bad use, var++. We make it very hard to write the latter code.
"volatile" is forbidden in our code base because it's ambiguously implemented by compilers. It's a bad way to try and cobble together synchronization.
And of course code reviews. If you can't explain your concurrency assumptions and locking discipline to a colleague, then there's definitely issues with the code :-)
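A few of the ideas above (assert(lock_held(...)), recording the file/line of acquisition, and accessor macros for lock-protected members) can be sketched on top of pthreads roughly as follows. This is not the answerer's actual code: `tracked_lock`, `LOCK`, `ACCOUNT_BALANCE` and the `account` structure are all invented for the example, and the owner is identified by the address of a thread-local variable, which is just one possible choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static _Thread_local char thread_tag;    /* its address identifies the calling thread */

typedef struct {
    pthread_mutex_t mutex;
    _Atomic(const void *) owner;          /* &thread_tag of the holder, NULL when free */
    const char *file;                     /* where it was last acquired */
    int line;
} tracked_lock;

#define TRACKED_LOCK_INIT { .mutex = PTHREAD_MUTEX_INITIALIZER }
#define LOCK(l) tracked_lock_at((l), __FILE__, __LINE__)

/* True only if the calling thread currently holds the lock. */
static bool lock_held(tracked_lock *l)
{
    return atomic_load(&l->owner) == (const void *)&thread_tag;
}

static void tracked_lock_at(tracked_lock *l, const char *file, int line)
{
    assert(!lock_held(l));                /* catches a double lock by the same thread */
    pthread_mutex_lock(&l->mutex);
    atomic_store(&l->owner, (const void *)&thread_tag);
    l->file = file;
    l->line = line;
}

static void tracked_unlock(tracked_lock *l)
{
    assert(lock_held(l));                 /* catches a double unlock or wrong thread */
    atomic_store(&l->owner, (const void *)NULL);
    pthread_mutex_unlock(&l->mutex);
}

typedef struct {
    tracked_lock lock;
    int balance;                          /* protected by lock; use ACCOUNT_BALANCE() */
} account;

/* Accessor macro: touching the member without holding the lock trips the assert. */
#define ACCOUNT_BALANCE(a) (*(assert(lock_held(&(a)->lock)), &(a)->balance))

/* Returns the balance if every check behaved as expected, -1 otherwise. */
int run_tracked_demo(void)
{
    static account acct = { .lock = TRACKED_LOCK_INIT };
    LOCK(&acct.lock);
    ACCOUNT_BALANCE(&acct) += 5;
    int balance = ACCOUNT_BALANCE(&acct);
    bool held_inside = lock_held(&acct.lock);
    bool site_recorded = acct.lock.file != NULL && acct.lock.line > 0;
    tracked_unlock(&acct.lock);
    bool held_after = lock_held(&acct.lock);
    return (held_inside && site_recorded && !held_after) ? balance : -1;
}
```

Compiling with NDEBUG turns the assertions into no-ops, so release builds pay only for the owner bookkeeping.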

Make everything absolutely obvious, so that other developers cannot miss the synchronization scope when they view subsections of the code in isolation.
For example: don't hold a lock in code that spans multiple files.

Seems like you've answered your own question: put lots of assertions into the code. They will tell other developers what invariants and preconditions must hold.
