I've heard from various sources (1, 2) that one should avoid recursive mutexes, as their use may be a sign of a hack or bad design. Sometimes, however, I presume they may be necessary. In light of that, is the following an appropriate use case for a recursive mutex?
// main.c
// gcc -Wall -Wextra -Wpedantic main.c -pthread
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif /* _GNU_SOURCE */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct synchronized_counter
{
    int count;
    pthread_mutex_t mutex;
    pthread_mutexattr_t mutexattr;
} synchronized_counter;

synchronized_counter* create_synchronized_counter()
{
    synchronized_counter* sc_ptr = malloc(sizeof(synchronized_counter));
    assert(sc_ptr != NULL);
    sc_ptr->count = 0;
    assert(pthread_mutexattr_init(&sc_ptr->mutexattr) == 0);
    assert(pthread_mutexattr_settype(&sc_ptr->mutexattr,
                                     PTHREAD_MUTEX_RECURSIVE) == 0);
    assert(pthread_mutex_init(&sc_ptr->mutex, &sc_ptr->mutexattr) == 0);
    return sc_ptr;
}

void synchronized_increment(synchronized_counter* sc_ptr)
{
    assert(pthread_mutex_lock(&sc_ptr->mutex) == 0);
    sc_ptr->count++;
    assert(pthread_mutex_unlock(&sc_ptr->mutex) == 0);
}

int main()
{
    synchronized_counter* sc_ptr = create_synchronized_counter();
    // I need to increment this counter three times in succession without
    // having another thread increment it in between. Therefore, I acquire
    // a lock before beginning.
    assert(pthread_mutex_lock(&sc_ptr->mutex) == 0);
    synchronized_increment(sc_ptr);
    synchronized_increment(sc_ptr);
    synchronized_increment(sc_ptr);
    assert(pthread_mutex_unlock(&sc_ptr->mutex) == 0);
    return 0;
}
EDIT:
I wanted to ask the question with a simple example, but perhaps it came across as too simple. This is what I had imagined would be a more realistic scenario: I have a stack data structure that will be accessed by multiple threads. In particular, sometimes a thread needs to pop n elements from the stack all at once (without another thread pushing or popping in between). The crux of the design issue is whether I should have the client manage locking the stack themselves with a non-recursive mutex, or have the stack provide simple, synchronized methods along with a recursive mutex, which the client could use to compose multiple synchronized operations into one atomic transaction.
Both of your examples - the original synchronized_counter and the stack in your edit - are correct examples of using a recursive mutex, but they would be considered poor API design if you were building a data structure. I'll try to explain why.
Exposing internals - the caller is required to use the same lock that protects internal access to the members of the data structure. This opens the possibility of misusing that lock for purposes other than accessing the data structure, which could lead to lock contention - or worse - deadlock.
Efficiency - it's often more efficient to implement specialized bulk operations like increment_by(n) or pop_many(n); see the sketch after this list.
First, it allows the data structure to optimize the operations - perhaps the counter can just do count += n or the stack could remove n items from a linked list in one operation. [1]
Second, you save time by not having to lock/unlock the mutex for every operation.[2]
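As an illustration, here is a minimal sketch of such a bulk operation for the counter from the question (the name synchronized_increment_by is made up here; a pop_many(n) for the stack would have the same shape):
void synchronized_increment_by(synchronized_counter* sc_ptr, int n)
{
    assert(pthread_mutex_lock(&sc_ptr->mutex) == 0);
    sc_ptr->count += n; // one addition instead of n locked increments
    assert(pthread_mutex_unlock(&sc_ptr->mutex) == 0);
}
With this, main would simply call synchronized_increment_by(sc_ptr, 3), and the mutex no longer needs to be recursive.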
Perhaps a better example for using a recursive mutex would be as follows:
I have a class with two methods Foo and Bar.
The class was designed to be single-threaded.
Sometimes Foo calls Bar.
I want to make the class thread-safe, so I add a mutex to the class and lock it inside Foo and Bar. Now I need to make sure that Bar can lock the mutex when called from Foo.
One way to solve this without a recursive mutex is to create a private unsynchronized_bar and have both Foo and Bar call it after locking the mutex.
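A minimal sketch of that pattern in C (the struct and function names stand in for the class from the example, and the mutex is assumed to have been initialized with pthread_mutex_init):
#include <pthread.h>

typedef struct object {
    pthread_mutex_t mutex; // plain, non-recursive
    /* ... data ... */
} object;

/* Private helper: the caller must already hold obj->mutex. */
static void unsynchronized_bar(object *obj)
{
    (void)obj; /* ... Bar's real work ... */
}

void Bar(object *obj)
{
    pthread_mutex_lock(&obj->mutex);
    unsynchronized_bar(obj);
    pthread_mutex_unlock(&obj->mutex);
}

void Foo(object *obj)
{
    pthread_mutex_lock(&obj->mutex);
    /* ... Foo's work ... */
    unsynchronized_bar(obj); // no recursive lock needed: we already hold the mutex
    pthread_mutex_unlock(&obj->mutex);
}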
This can get tricky if Foo is a virtual method that can be implemented by a sub-class and used to call Bar, or if Foo calls out to some other part of the program that can call back into Bar. However, if you have code inside critical sections (code protected by a mutex) calling into other arbitrary code, then the behaviour of the program will be hard to understand, and it is easy to cause deadlocks between different threads, even if you use a recursive mutex.
The best advice is to solve concurrency problems through good design rather than fancy synchronization primitives.
[1] There are some trickier patterns like "pop an item, look at it, decide if I pop another one", but those can be implemented by supplying a predicate to the bulk operation.
[2] Practically speaking, locking a mutex you already own should be pretty cheap, but in your example it still requires at least a call to an external library function, which cannot be inlined.
The logic that you describe is not, in fact, a recursive mutex, nor is it an appropriate case for one.
And, if you actually need to ensure that another thread won't increment your counter, I'm sorry to tell you that your logic as-written won't ensure that.
I therefore suggest that you step back from this, clear your head, and reconsider your actual use case. I think that confusion about recursive mutexes has led you astray. It may well be that the logic you now have in synchronized_increment (in fact, the need for the entire method) is unnecessary, and that the logic you show in main is all you really need; it's just a simple variable after all.
I was wondering if I can take a mutex within a task, but by calling an external function. Here is my code below:
void TakeMutexDelay50(SemaphoreHandle_t mutex)
{
    while(xSemaphoreTake(mutex, 10) == pdFALSE)
    {
        vTaskDelay(50);
    }
}

bool ContinueTaskCopy()
{
    TakeMutexDelay50(ContinueTask_mutex);
    bool Copy = ContinueTask;
    xSemaphoreGive(ContinueTask_mutex);
    return Copy;
}
Basically, my task calls the function ContinueTaskCopy(). Would this be good practice?
The code above will work, but if you are not doing anything in the while loop while waiting for the mutex, you could just set the timeout to portMAX_DELAY and avoid all the context switches every 50 ticks.
To answer your doubts: yes, this code will work. From a technical point of view, the RTOS code itself doesn't really care about the function which takes or releases the mutex; it's the task that executes the code that matters (or, more specifically, the context, as we also include interrupts).
In fact, a while ago (some FreeRTOS 7 version, I think?) an additional check was introduced in the function releasing the mutex, which compares the task releasing the mutex to the task that holds it. If they are not the same, an assert fails; that assert is effectively an endless loop that stops your task from continuing so that you notice and fix the issue (there are extensive comments around the asserts to help you diagnose it). It's done this way because mutexes are used to guard resources (think SD card access, display access and similar), and taking a mutex from one task and releasing it from another goes against this whole idea, or at least points to smelly code. If you need to do something like this, you likely wanted a semaphore instead.
Now, as for the second part of your question - whether this is good practice - I'd say it depends on how complicated you make it, but generally I consider it questionable. In general, code operating on a mutex looks something like this:
if(xSemaphoreTake(mutex, waitTime) == pdTRUE)
{
    doResourceOperation();
    xSemaphoreGive(mutex);
}
It's simple, and it's easy to understand, because that's how most people are used to writing code like this. It avoids a whole class of problems which may arise if you start complicating the code that takes and releases the mutex ("Why isn't it released?", "Who holds this mutex?", "Am I in a deadlock?"). These kinds of problems tend to be hard to debug and sometimes even hard to fix, because fixing them may involve rearranging parts of the code.
To give some general advice: keep it simple. Weird things being done around a mutex often mean there are questionable things going on, possibly some nasty deadlocks or race conditions. As in your example, instead of trying to take the mutex every 50 ticks until it succeeds, just wait forever by specifying portMAX_DELAY as the delay for xSemaphoreTake, and put it inside the same function that uses the resource and releases the mutex.
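A sketch of that simplification, reusing the names from the question (ContinueTask and ContinueTask_mutex are assumed to be defined and created elsewhere):
bool ContinueTaskCopy()
{
    // Block until the mutex is available: no polling, no 50-tick wakeups.
    xSemaphoreTake(ContinueTask_mutex, portMAX_DELAY);
    bool Copy = ContinueTask;
    xSemaphoreGive(ContinueTask_mutex);
    return Copy;
}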
I have written the following code, and so far in all my tests it seems as if I have written a working Mutex for my 4 Threads, but I would like to get someone else's opinion on the validity of my solution.
typedef struct Mutex{
    int turn;
    int * waiting;
    int num_processes;
} Mutex;

void enterLock(Mutex * lock, int id){
    int i;
    for(i = 0; i < lock->num_processes; i++){
        lock->waiting[id] = 1;
        if (i != id && lock->waiting[i])
            i = -1;
        lock->waiting[id] = 0;
    }
    printf("ID %d Entered\n",id);
}

void leaveLock(Mutex * lock, int id){
    printf("ID %d Left\n",id);
    lock->waiting[id] = 0;
}

void foo(Mutex * lock, int id){
    enterLock(lock,id);
    // do stuff now that i have access
    leaveLock(lock,id);
}
I feel compelled to write an answer here because the question is a good one, considering that it could help others understand the general problem of mutual exclusion. In your case, you came a long way toward hiding this problem, but you can't avoid it. It boils down to this:
01 /* pseudo-code */
02 if (! mutex.isLocked())
03 mutex.lock();
You always have to expect a thread switch between lines 02 and 03. So there is a possible situation where two threads find mutex unlocked and are interrupted right after that ... only to resume later and lock the mutex individually. You will have two threads inside the critical section at the same time.
What you definitely need for reliable mutual exclusion is therefore an atomic operation that tests a condition and at the same time sets a value, without any chance of being interrupted in between.
01 /* pseudo-code */
02 while (! test_and_lock(mutex));
As long as this test_and_lock function cannot be interrupted, your implementation is safe. Until C11, C didn't provide anything like this, so implementations of pthreads needed to use e.g. assembly or special compiler intrinsics. With C11, there is finally a "standard" way to write atomic operations like this; a minimal sketch follows. For general use, the pthreads library will give you what you need.
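For illustration, here is a small C11 spinlock built on atomic_flag, whose test-and-set is guaranteed atomic by the standard (the names spin_lock/spin_unlock are made up for this sketch):
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void spin_lock(void)
{
    // Atomically set the flag and return its previous value;
    // keep trying until the previous value was clear, i.e. we own the lock.
    while (atomic_flag_test_and_set(&lock_flag))
        ; // busy wait
}

void spin_unlock(void)
{
    atomic_flag_clear(&lock_flag);
}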
edit: of course, this is still simplified -- in a multi-processor scenario, you additionally need to ensure that memory accesses become visible to the other processors in the right order.
The problem I see in your code:
The idea behind a mutex is to provide mutual exclusion, meaning that when thread_a is in the critical section, thread_b must wait (in case it also wants to enter) until thread_a is done.
This waiting part should be implemented in the enterLock function. But what you have is a for loop, which might end way before thread_a is done with the critical section, so thread_b could enter as well; hence you can't have mutual exclusion.
Way to fix it:
Take a look, for example, at Peterson's algorithm or Dekker's (more complicated). What they use there is called busy waiting, which is basically a while loop that says:
while(i can't enter) { do nothing and wait...}
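For reference, here is a sketch of Peterson's algorithm for two threads, in its textbook form. Note that on modern hardware this is only correct if flag and turn are accessed with sequentially consistent atomics or fences; the next answer explains why:
// Peterson's algorithm for two threads with ids 0 and 1 (textbook form).
int flag[2] = {0, 0}; // flag[i] == 1: thread i wants to enter
int turn = 0;         // which thread yields if both want in

void lock(int id)
{
    int other = 1 - id;
    flag[id] = 1;
    turn = other;                         // let the other thread go first
    while (flag[other] && turn == other)
        ; // busy wait
}

void unlock(int id)
{
    flag[id] = 0;
}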
You are totally ignoring the topic of memory models. Unless you are on a machine with a sequentially consistent memory model (which none of today's PC CPUs are), your code is incorrect, because a store executed by one thread is not necessarily immediately visible to the other CPUs. Yet exactly this seems to be an assumption in your code.
Bottom line: use the existing synchronization primitives provided by the OS or a runtime library such as POSIX or the Win32 API, and don't try to be smart and implement this yourself. Unless you have years of experience in parallel programming as well as in-depth knowledge of CPU architecture, chances are quite good that you will end up with an incorrect implementation. And debugging parallel programs can be hell...
After enterLock() returns, the state of the Mutex object is the same as it was before the function was called. Hence it will not prevent a second thread from entering the same Mutex object even before the first one has released it by calling leaveLock(). There is no mutual exclusion.
I have a C program with 16-million-odd linked lists, and 4 worker threads.
No two threads should work on the same linked list at the same time, otherwise they might be modifying it simultaneously, which would be bad.
My initial naive solution was something like this:
int linked_lists_locks[NUM_LINKED_LISTS];
for (i = 0; i < NUM_LINKED_LISTS; i++)
    linked_lists_locks[i] = 0;
then later, in a section executed by each thread as it works:
while ( linked_lists_locks[some_list] == 1 ) {
    /* busy wait */
}
linked_lists_locks[some_list] = 1; // mark it locked
/* work with the list */
linked_lists_locks[some_list] = 0; // unlock it
However, with 4 threads and ~250,000,000 operations, I quickly ran into cases where two threads did the same "is it locked?" check simultaneously, and problems ensued. Smart people here would have seen that coming :-)
I've looked at some locking algorithms like Dekker's and Peterson's, but they seem to be more "lock this section of code", whereas what I'm looking for is "lock this variable". I suspect that if I lock the "work with the list" section of code, everything slows to a crawl, because then only one thread can work at a time (though I haven't tried it). Essentially, each worker's job is limited to doing some math and populating these lists. Cases where two threads want to work on the same list simultaneously are rare, btw - only a few thousand times out of 250M operations - but they do happen.
Is there an algorithm or approach for implementing locks on many variables as opposed to sections of code? This is C (on Linux if that matters) so synchronized array lists, etc. from Java/C#/et al are not available.
It would be useful to know more about how your application is organized, but here are a few ideas about how to approach the problem.
A common solution for "synchronized" objects is to assign a mutex to each object. Before working on an object, the thread needs to acquire the object's mutex; when it is done, it releases the mutex. That's simple and effective, but if you really have 16 million lockable objects, it's a lot of overhead. More seriously, if two tasks really try to work on the same object at the same time, one of them will end up sleeping until the other one releases the lock. If there was something else the tasks might have been doing, the opportunity has been lost.
A simple solution to the first problem -- the overhead of 16 million mutexes -- is to use a small vector of mutexes and a hash function which maps each object to one mutex. If you only have four tasks, and you used a vector of, say, 1024 mutexes, you will occasionally end up with a thread needlessly waiting for another thread, but it won't be very common.
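A minimal sketch of that idea (the stripe count of 1024 and the trivial modulo hash are arbitrary choices for illustration):
#include <pthread.h>

#define NUM_STRIPES 1024

/* One mutex per stripe; many lists share each mutex via the hash below.
   Each element must be initialized once with pthread_mutex_init
   (or PTHREAD_MUTEX_INITIALIZER) before use. */
static pthread_mutex_t stripe_locks[NUM_STRIPES];

static pthread_mutex_t *lock_for(int list_index)
{
    return &stripe_locks[list_index % NUM_STRIPES];
}

/* usage in a worker thread:
     pthread_mutex_lock(lock_for(some_list));
     ... work with the list ...
     pthread_mutex_unlock(lock_for(some_list)); */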
If lock contention really turns out to be a problem and it is possible to vary the order in which work is done, a reasonable model is a workqueue. Here, when a thread wants to do something, it takes a task off the workqueue, attempts to lock the task's object (using trylock instead of lock), and, if that works, does the task. If the lock fails, it puts the task back on the workqueue and grabs another one. To avoid contention on the workqueue lock itself, it's common for threads to grab a handful of tasks instead of one; each thread then manages its own subqueue. Tuning the various parameters in this solution requires knowing at least a bit about the characteristics of the tasks. (There is a kind of race condition in this solution, but it doesn't matter; it just means that occasionally tasks will be deferred unnecessarily, but they should always get executed eventually.)
You should use an atomic test-and-set operation. Unfortunately, you may need to use an assembly routine if your compiler doesn't have a built-in for that. See this article:
http://en.wikipedia.org/wiki/Test-and-set
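As an illustration, GCC and Clang offer __sync_lock_test_and_set and __sync_lock_release built-ins, which could turn the busy-wait flags from the question into a correct spinlock (a sketch, assuming NUM_LINKED_LISTS as in the question):
static volatile int linked_lists_locks[NUM_LINKED_LISTS]; // all zero initially

void list_lock(int some_list)
{
    // Atomically write 1 and get the old value; old value 0 means we got the lock.
    while (__sync_lock_test_and_set(&linked_lists_locks[some_list], 1))
        ; // spin
}

void list_unlock(int some_list)
{
    __sync_lock_release(&linked_lists_locks[some_list]); // writes 0 with release semantics
}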
If you are absolutely forced to use this many lists, and you have very few threads, you might not want to lock the lists themselves, but instead let each worker thread claim a single list at a time. In this case you need a structure storing the number of the list each thread currently holds, and each list must be identified by exactly one number (no aliasing).
Since you didn't seem to use any library I'll add some pseudo-code to clarify my idea:
/*
* list_number, the number of the list you want to lock
* my_id, the id of the thread trying to lock this list
* mutex, the mutex used to control locking the lists
* active_lists, array containing the lists currently held by the threads
* num_threads, size of the array and also number of threads
*/
void lock_list(int list_number, int my_id, some_mutex *mutex,
               atomic_int *active_lists, size_t num_threads) {
    int ok = 0;
    int i;
    while (true) { //busy wait to claim the lock
        //first check if anyone seems to hold the list we want.
        //Do this in a non-locking way to avoid lock contention
        while (!ok) {
            ok = 1;
            for (i = 0; i < num_threads; ++i) {
                if (active_lists[i].load() == list_number && i != my_id) {
                    ok = 0;
                    /*
                     * we have to restart - potential to optimize:
                     * at this point, you could defer the work on this list
                     * and do some other work instead
                     */
                    break;
                }
            }
        }
        while (try_to_lock(mutex)); //spin until we hold the mutex
        //rerun the check to see if anyone has taken the list in the meantime
        //(ok == 1 at this point)
        for (i = 0; i < num_threads; ++i) {
            if (active_lists[i].load() == list_number && i != my_id) {
                ok = 0;
                break;
            }
        }
        //this must not be set from anywhere else!
        if (ok) active_lists[my_id].store(list_number);
        unlock(mutex);
        //if the list was still free, we have claimed it and are done;
        //otherwise someone grabbed it in the meantime, so start over.
        if (ok) break;
    }
}
There are a few constraints on the pseudo-types. some_mutex obviously has to be a mutex. What I call atomic_int here must somehow support fetching its latest value from main memory, to prevent you from seeing stale, cached values. The same goes for the store: it must not be cached core-locally before being written. Using a regular int together with lfence, sfence and/or mfence may work as well.
There are obviously some trade-offs here, where the main one is probably memory vs speed. This example will create contention at the single mutex used to store which list you have locked, so it will scale poorly with a large number of threads, but well with a large number of lists. If lists are claimed infrequently this would work well even at a larger number of threads. The advantage is that the storage requirement depends mainly on the number of threads. You have to pick a storage type which can hold a number equivalent to the maximum number of lists though.
I am not sure what exactly your scenario is, but recently lock-free lists have also gained some momentum. With the introduction of advanced support for lock-free code in C11 and C++11, there have been a few working (as in not shown to be broken) examples around. Herb Sutter gave a talk on how to do this in C++11. It is C++, but he discusses the relevant points of writing a lock free singly linked list, which are also true for plain old C. You can also try to find an existing implementation, but you should inspect it carefully because this is kind of bleeding edge stuff. However using lock-free lists would erase the need to lock at all.
How are they implemented, especially in the case of pthreads? What pthread synchronization APIs do they use under the hood? A little bit of pseudocode would be appreciated.
I haven't done any pthreads programming for a while, but when I did, I never used POSIX read/write locks. The problem is that most of the time a mutex will suffice: i.e. your critical section is small, and the region isn't so performance-critical that the double barrier is worth worrying about.
In those cases where performance is an issue, using atomic operations (generally available as a compiler extension) is normally a better option (i.e. the extra barrier is the problem, not the size of the critical section).
By the time you eliminate all these cases, you are left with cases where you have specific performance/fairness/rw-bias requirements that require a true rw-lock; and that is when you discover that all the relevant performance/fairness parameters of POSIX rw-lock are undefined and implementation specific. At this point you are generally better off implementing your own so you can ensure the appropriate fairness/rw-bias requirements are met.
The basic algorithm is to keep a count of how many of each are in the critical section, and if a thread isn't allowed access yet, to shunt it off to an appropriate queue to wait. Most of your effort will be in implementing the appropriate fairness/bias between servicing the two queues.
The following C-like pthreads-like pseudo-code illustrates what I'm trying to say.
struct rwlock {
    mutex admin;         // used to serialize access to other admin fields, NOT the critical section.
    int count;           // threads in critical section: +ve for readers, -ve for writers.
    fifoDequeue dequeue; // acts like a cond_var with fifo behaviour and both append and prepend operations.
    void *data;          // represents the data covered by the critical section.
};

void read(struct rwlock *rw, void (*readAction)(void *)) {
    lock(rw->admin);
    if (rw->count < 0) {
        append(rw->dequeue, rw->admin);
    }
    while (rw->count < 0) {
        prepend(rw->dequeue, rw->admin); // Used to avoid starvation.
    }
    rw->count++;
    // Wake the new head of the dequeue, which may be a reader.
    // If it is a writer it will put itself back on the head of the queue and wait for us to exit.
    signal(rw->dequeue);
    unlock(rw->admin);

    readAction(rw->data);

    lock(rw->admin);
    rw->count--;
    signal(rw->dequeue); // Wake the new head of the dequeue, which is probably a writer.
    unlock(rw->admin);
}

void write(struct rwlock *rw, void *(*writeAction)(void *)) {
    lock(rw->admin);
    if (rw->count != 0) {
        append(rw->dequeue, rw->admin);
    }
    while (rw->count != 0) {
        prepend(rw->dequeue, rw->admin);
    }
    rw->count--;
    // As we only allow one writer in at a time, we don't bother signaling here.
    unlock(rw->admin);

    // NOTE: This is the critical section, but it is not covered by the mutex!
    // The critical section is, rather, covered by the rw-lock itself.
    rw->data = writeAction(rw->data);

    lock(rw->admin);
    rw->count++;
    signal(rw->dequeue);
    unlock(rw->admin);
}
Something like the above code is a starting point for any rwlock implementation. Give some thought to the nature of your problem and replace the dequeue with the appropriate logic that determines which class of thread should be woken up next. It is common to allow a limited number/period of readers to leapfrog writers, or vice versa, depending on the application.
Of course, my general preference is to avoid rw-locks altogether, generally by using some combination of atomic operations, mutexes, STM, message-passing, and persistent data structures. However, there are times when what you really need is a rw-lock, and when you do, it is useful to know how they work, so I hope this helped.
EDIT - In response to the (very reasonable) question of where the wait happens in the pseudo-code above:
I have assumed that the dequeue implementation contains the wait, so that somewhere within append(dequeue, mutex) or prepend(dequeue, mutex) there is a block of code along the lines of:
while (!readyToLeaveQueue()) {
    wait(dequeue->cond_var, mutex);
}
which was why I passed in the relevant mutex to the queue operations.
Each implementation can be different, but normally they have to favor readers by default, because POSIX requires that a thread be able to obtain the read lock on an rwlock multiple times. If they favored writers, then whenever a writer was waiting, the reader would deadlock on its second read-lock attempt, unless the implementation could determine that the reader already holds a read lock; but the only way to determine that is to store a list of all threads holding read locks, which is very inefficient in both time and space.
I'm not sure how pthread thread-specific data works. Considering the following code (found on the web): does this mean I can create, for example, 5 threads in main and call func in only some of them (let's say 2), and those threads would have the data for 'key' set to something (ptr = malloc(OBJECT_SIZE)), while the other threads would have the same key existing but with a NULL value?
static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void
make_key()
{
    (void) pthread_key_create(&key, NULL);
}

void
func()
{
    void *ptr;

    (void) pthread_once(&key_once, make_key);
    if ((ptr = pthread_getspecific(key)) == NULL) {
        ptr = malloc(OBJECT_SIZE);
        ...
        (void) pthread_setspecific(key, ptr);
    }
    ...
}
Some explanation of how thread-specific data works, and of how it may have been implemented in pthreads (in a simple way), would be appreciated!
Your reasoning is correct. These calls are for thread-specific data. They're a way of giving each thread a "global" area where it can store what it needs, but only if it needs it.
The key is shared among all threads, since it's created with pthread_once() the first time it's needed, but the value given to that key is different for each thread (unless it remains set to NULL). By having the value a void* to a memory block, a thread that needs thread-specific data can allocate it and save the address for later use. And threads that don't call a routine that needs thread-specific data never waste memory since it's never allocated for them.
The one area where I have used them is to make a standard C library thread-safe. In an implementation I was involved in, the strtok() function (as opposed to the thread-safe strtok_r(), which was considered an abomination when we were doing this) used almost this exact same code the first time it was called, to allocate some memory for storing information for subsequent calls. Those subsequent calls would retrieve the thread-specific data to continue tokenizing the string without interfering with other threads doing the exact same thing.
It meant users of the library didn't have to worry about cross-talk between threads; they still had to ensure a single thread didn't call the function again until the previous tokenization had finished, but that's the same as with single-threaded code.
It allowed us to give a 'proper' C environment to each thread running in our system without the usual "you have to call these special non-standard re-entrant routines" limitations that other vendors imposed on their users.
As for the implementation, from what I remember of DCE user-mode threads (which I think were the precursor to the current pthreads), each thread had a single structure which stored things like instruction pointers, stack pointers, register contents and so on. It was a very simple matter to add one pointer to this structure to achieve very powerful functionality at minimal cost. The pointer pointed to an array (a linked list in some implementations) of key/pointer pairs, so each thread could have multiple keys (e.g., one for strtok(), one for rand()).
The answer to your first question is yes. In simple terms, it allows each thread to allocate and save its own data. This is roughly equivalent to each thread simply allocating and passing its own data structure around; the API saves you the trouble of passing the thread-local structure to all subfunctions, and allows you to look it up on demand instead.
The implementation really doesn't matter all that much (it may vary per-OS), as long as the results are the same.
You can think of it as a two-level hashmap. The key specifies which thread-local "variable" you want to access, and the second level might perform a thread-id lookup to request the per-thread value.
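To make that concrete, here is a toy sketch of the array-of-key/pointer-pairs scheme described in the previous answer; current_thread(), MAX_KEYS, and the field names are purely hypothetical:
#define MAX_KEYS 128

/* One of these hangs off each thread's control block. */
struct tsd {
    void *values[MAX_KEYS]; /* one slot per key, NULL until the thread sets it */
};

/* Hypothetical lookup: getspecific reduces to indexing the calling
   thread's own table, so no locking is required. */
void *toy_getspecific(unsigned key)
{
    return current_thread()->tsd.values[key]; /* current_thread() is assumed */
}

void toy_setspecific(unsigned key, void *value)
{
    current_thread()->tsd.values[key] = value;
}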