Reader-Writer using semaphores and shared memory in C - c

I'm trying to make a simple reader/writer program using POSIX named semaphores, its working, but on some systems, it halts immediately on the first semaphore and thats it ... I'm really desperate by now. Can anyone help please? Its working fine on my system, so i can't track the problem by ltrace. (sorry for the comments, I'm from czech republic)
https://www.dropbox.com/s/hfcp44u2r0jd7fy/readerWriter.c

POSIX semaphores are not well suited for application code since they are interruptible. Basically any sort of IO to your processes will mess up your signalling. Please have a look at this post.
So you'd have to be really careful to interpret all error returns from the sem_ functions properly. In the code that you posted there is no such thing.
If your implementation of POSIX supports them, just use rwlocks, they are made for this, are much higher level and don't encounter that difficulty.

In computer science, the readers-writers problems are examples of a common computing problem in concurrency. There are at least three variations of the problems, which deal with situations in which many threads try to access the same shared memory at one time. Some threads may read and some may write, with the constraint that no process may access the share for either reading or writing, while another process is in the act of writing to it. (In particular, it is allowed for two or more readers to access the share at the same time.) A readers-writer lock is a data structure that solves one or more of the readers-writers problems.

Related

What happens when two or more threads or processes ftruncate(2) the same file?

As far as I understand, ftruncate(2) can't be atomic when I am expanding a file upto 2 GB length.
But what exactly happens behind the scenes? I have applied it and it seems to work fine when more than one thread expands the file but I can't be sure if it doesn't cause any data loss.
Also, let's suppose 2 threads call ftruncate(2) at the same time, the first one is on the way to expand file to 2 GB. Meanwhile the second thread calls ftruncate(2). Now the doubt is the first thread didn't finish the complete job and 2nd thread also started ftruncate, so what will happen?
Also if this causes any trouble, is using file locks a wise solution? This code I am writing is a library, so when the library is being used, I won't know state and config of the process the library is being used in. Is there a possibility of a deadlock in such a case?
Well answering this old question of mine, ftruncate(2) is atomic as far as what I observed in the past.
In the comments #BearAqua shared a link of improper functioning of ftruncate(2) with POSIX shared memory on macOS, but that's not valid for regular files.
Although as far as I remember I couldn't find it documented anywhere that ftruncate(2) is atomic or some similar reason, but yea as far as what I tested it is atomic.

How to pass variables between pthreads?

I have two types of threads, one student the other librarian. Also I have a list of struct which holds the basic info like book name, ISBN, publishing year regarding to each books.(which is a shared resource between threads) I want to pass the pointer of a certain book in a student thread/routine to a librarian thread using condition variables. (so that a librarian could reserve the book for the student by means of signaling) How can I accomplish this is or is this even the right way to go about it?
The easiest way is to use pipes man 2 pipe.
Performance wise faster, but far more complicated ways are to use a virtual ring buffer man 3 vrb (userland pipe) or any other message passing middleware.
If these are threads (using pthread library) in the same process, you can share data since the address space is common to them. However, be aware of synchronization issues.
A common way to do that is to use a mutex for every (read or write) access to that common data. Perhaps also use condition variables for synchronization (i.e. thread A needing to tell thread B that something significant changed).
Read a good pthread tutorial (and this perhaps also).
is this even the right way to go about it?
Your example is very artificial... the only reason why you would use threads and some strange local variable list for this, is because some teacher tells you to do so. So no, this is not the right way to implement a program to be used in the real world.
In the real world, things like these would almost certainly be implemented through a database, where the DBMS handles the accessing of individual posts. Most likely in some kind of client/server system, where there is a client used by the librarian. I don't see why the student would even be part of the system, except as a data post over who borrowed the book.

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared zone at a time, and have a different outcome, is called a race condition (and also extends to/from electronics, and other areas).
The hard thing about threading is that computing is such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which vary per platform/OS.
A one-thread affair would guarantee that it can do all tasks in the same order, always, but when you got multiple threads, and the order depends on how fast they can complete a task, shared other applications wanting to use the CPU, then the underlying hardware affects the results.
There's not much of a "sure fire way to do threading", as there's techniques, tools and libraries to deal with individual cases.
Locking in
The most well known technique is using semaphores (or locks), and the most well known semaphore is the mutex one, which only allows one thread at a time to access a shared space, by having a sort of "flag" that is raised once a thread has entered.
if (locked == NO)
{
locked = YES;
// Do ya' thing
locked = NO;
}
The code above, although it looks like it could work, it would not guarantee against cases where both threads pass the if () and then set the variable (which threads can easily do). So there's hardware support for this kind of operation, that guarantees that only one thread can execute it: The testAndSet operation, that checks and then, if available, sets the variable. (Here's the x86 instruction from the instruction set)
On the same vein of locks and semaphores, there's also the read-write lock, that allows multiple readers and one writer, specially useful for things with low volatility. And there's many other variations, some that limit an X amount of threads and whatnot.
But overall, locks are lame, since they are basically forcing serialisation of multi-threading, where threads actually need to get stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading, is to minimise the amount of shared space that threads need to use, possibly, elmininating it completely. Maybe use rwlocks when volatility is low, try to have "try and leave" kind of threads, that check if the lock is up, and then go away if it isn't, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread pattern doesn't solve the problems discussed before, but it changes the paradigm to make it easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you just switch the focus to "tasks that need to be executed" and the matter of which thread is doing it, becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
All the theoretical things above mentioned are implemented already, at POSIX level (semaphore.h, pthreads.h, etc. pthreads has a very nice of r/w locking functions), try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
Calling convention defines how stack & registers are used to implement function calls. Because each thread has its own stack & registers, synchronising threads and calling convention are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.

Tips to write thread-safe UNIX code?

What are the guidelines to write thread-safe UNIX code in C and C++?
I know only a few:
Don't use globals
Don't use static local storage
What others are there?
The simple thing to do is read a little. The following list contains some stuff to look at and research.
Spend time reading the Open Group Base Specification particularly the General Information section and the subsection on threads. This is the basis information for multithreading under most UN*X-alike systems.
Learn the difference between a mutex and a semaphore
Realize that everything that is shared MUST be protected. This applies to global variables, static variables, and any shared dynamically allocated memory.
Replace global state flags with condition variables. These are implemented using pthread_cond_init and related functions.
Once you understand the basics, learn about the common problems so that you can identify them when they occur:
Lock inversion deadlocks
Priority inversion - if you are interested in a real life scenario, then read this snippet about the Mars Pathfinder
It really comes down to shared state, globals and static local are examples of shared state. If you don't share state, you won't have a problem. Other examples of shared state include multiple threads writing to a file or socket.
Any shared resource will need to be managed properly - that might mean making something mutex protected, opening another file, or intelligently serializing requests.
If two threads are reading and writing from the same struct, you'll need to handle that case.
Beware of the sem_t functions, they may return uncompleted on interrupts, IO, SIGCHLD etc. If you need them, be sure to allways capture that case.
pthread_mut_t and pthread_cond_t functions are safe with respect to EINTR.
A good open book about concurrency in general can be found here: Little Book of Semaphores
It presents various problems that are solved step-by step and include solutions to common concurrency issues like starvation, race conditions etc.
It is not language-specific but contains short chapters about implementing the solutions in C with the Pthread-Library or Python.

Safety nets in complex multi-threaded code?

As a developer who has just finished writing thousands of lines of complex multi-threaded 'C' code in a project, and which is going to be enhanced, modified etc. by several other developers unfamiliar with this code in the future, I wanted to find out what kind of safety nets do you guys try to put in such code? As an example I could do these:
Define accessor macros for lock protected
structure members, which assert that
the corresponding lock is held. This
makes it clear that these members
are lock-protected to anyone unfamiliar with this code.
Functions which are supposed to be
called with some spinlock held,
assert that the spinlock is being held.
What kind of safety nets have you put into multi-threaded code that you have written?
What kind of problems have you faced when other developers modified such code?
What kind of debugging aids have you put into such code?
Thanks for your comments.
There are a number of things we do in our product (a hypervisor designed to help you find concurrency bugs in applications) that are more generally useful. Note that we do these in our code itself (because its a highly concurrent piece of software) and that some of these are useful whether or not you are writing concurrent code.
Like you, we have the ability to assert(lock_held(...)) and use it.
We also (because we have our own scheduler) can assert(single_threaded()) for those (rare) situations where we count on no other thread being active in the system.
Memory corruption from one thread to another is pretty common (and hard to debug) so we do two things to address this: sprinkled throughout our thread stack are some magic cookies. We periodically (in our get_thread_id()) function invoke a "validate_thread_stack()" function that checks these cookies to make sure the stack is not corrupted.
Our malloc sticks magic cookies before and after a malloc block of memory and checks these on free. If anyone overruns their data these can be used to find the corruption early.
On free() we blast a well known pattern (in our case 0xdddd...) over the memory. This nicely corrupts anyone else who had a dangling pointer left over to that memory region.
We have a guard page (a memory page not mapped into the address space) near the bottom of the thread stack. If the thread overruns its stack, we catch it via page fault and drop into our debugger.
Our locks are witnessed. Checkout the FreeBSD lock witness code. Its like that but homebrew. Basically the witness code is a lightweight way of detecting potential deadlocks by looking at cycles in the lock acquisition graph.
Our locks are also wrapped with accessors that record the file/line number of acquisition and release. For double unlocks or double locks, you get pretty debug information on your screwup.
Our locks are also profiled. Once you get your code working you want it working well. We track the usual things like how many acquisitions, how long it took to acquire it.
In our system, we have an expectation that locks are not contended (we carefully designed the code this way). So if you wait for a spin lock longer than a second or two in our system you get dropped into the debugger because its most likely not a good thing.
Our variables that are meant to be updated atomically are wrapped inside of C struct's. The reason for this is to prevent sloppy code where you mix good use: atomic_increment(&var); and bad use var++. We make it very hard to write the latter code.
"volatile" is forbidden in our code base because its ambiguously implemented by compilers. Its a bad way to try and cobble together synchronization.
And of course code reviews. If you can't explain your concurrency assumptions and locking discipline to a colleague, then there's definitely issues with the code :-)
Make everything absolutely obvious, so that other developers cannot miss the synchronization scope when they view subsections of the code in isolation.
for example: don't hold a lock in code that spans multiple files.
Seems like you've answered your own question: put lots of assertions into the code. They will tell other developers what invariants and preconditions must hold.

Resources