I am implementing a condition variable's wait operation. I have a struct for my condition variable; so far it has a monitor, a queue, and a spinlock. But I am not sure whether a condition variable should have a queue of its own. My notify looks like this:
void uthread_cv_notify (uthread_cv_t* cv) {
    uthread_t* waiter_thread;
    spinlock_lock(&cv->spinlock);
    waiter_thread = dequeue(&cv->waiter_queue);
    if (waiter_thread)
    {
        uthread_monitor_exit(cv->mon);
        uthread_stop(TS_BLOCKED);
        uthread_monitor_enter(cv->mon);
        spinlock_unlock(&cv->spinlock);
    }
}
But I wonder: in the notify and wait functions, should I instead just enqueue and dequeue on the monitor's waiting queue?
Thanks
The signal operation (the one you're calling notify) should not require that the monitor be entered; requiring that is inefficient.
It seems like you're trying to implement a clumsy, old-fashioned condition/monitor system in which the caller of "notify" must be inside the monitor, and it is guaranteed that if a thread is waiting, that thread gets the monitor before the "notify" caller returns to it. (And that waiting thread does not have to re-test the condition in a loop, either.)
That may be how C. A. R. Hoare initially described monitors and conditions, but the formalism is impractical and inefficient on modern multiprocessor systems. It is also impractical in threading implementations that do not have the luxury of being tightly integrated with the low-level scheduler (which would let them control precisely which thread runs when, so there are no races over who acquires a mutex first, and would let them, for instance, transfer a thread from one wait queue to another).
Note how you're extending the monitor's critical section over the spinlock_lock operation and over the dequeue operation. Neither of these belongs under the monitor. The spinlock is independent, and the queue is guarded by the spinlock, not by the monitor. The monitor should protect only the shared variables of the user code (together with the special atomic property of the wait operation).
So why do you need an extra queue? You are already storing all the threads that need to be notified.
Also, you probably want to do something like this:
void uthread_cv_notify (uthread_cv_t* cv) {
    uthread_t* waiter_thread;
    spinlock_lock(&cv->spinlock);
    waiter_thread = dequeue(&cv->waiter_queue);
    if (waiter_thread)
    {
        uthread_monitor_exit(cv->mon);
        uthread_stop(TS_BLOCKED);
        uthread_monitor_enter(cv->mon);
    }
    spinlock_unlock(&cv->spinlock);
}
This will ensure that the spin lock is always released.
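For reference, the wait operation in this style of library usually has the same shape. Here is a minimal sketch using your names; uthread_self() is my guess at how the current thread is obtained, and a real implementation must arrange with the scheduler that the monitor exit and the block happen atomically with respect to notify (otherwise a notify can slip in between and be lost):

void uthread_cv_wait (uthread_cv_t* cv) {
    spinlock_lock(&cv->spinlock);
    enqueue(&cv->waiter_queue, uthread_self()); // register as a waiter
    spinlock_unlock(&cv->spinlock);
    uthread_monitor_exit(cv->mon);              // leave the monitor...
    uthread_stop(TS_BLOCKED);                   // ...and block until notified
    uthread_monitor_enter(cv->mon);             // reacquire the monitor on wakeup
}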
I have this code:
int _break = 0;
while (_break == 0) {
    if (someCondition) {
        //...
        if (someOtherCondition) _break = 1; // exit the loop
        //...
    }
}
The problem is that if someCondition is false, the loop becomes heavy on the CPU. Is there a way to sleep for some milliseconds in the loop so that the CPU does not take such a huge load?
Update
What I'm trying to do is a server-client application, without using sockets, just using shared memory, semaphores and system calls. I'm doing this on linux.
someOtherCondition becomes true when the application receives the "kill" signal, while someCondition is true if the received message is valid. If the message is not valid, the program keeps waiting for a valid one, and the while loop becomes a heavy infinite loop (it works, but it loads the CPU too much). I would like to make it lightweight.
I'm working on Linux (Debian 7).
If you have a single-threaded application, then it won't make any difference whether you suspend the execution or not.
If you have multiple threads running, then you should use a binary semaphore instead of polling a global variable.
This thread should acquire the semaphore at the beginning of each iteration, and one of the other threads should release the semaphore whenever you wish this thread to run.
This method is also known as the "producer-consumer" pattern.
When a thread attempts to acquire a binary semaphore:
If the semaphore is released, then the calling thread acquires it and continues the execution.
If the semaphore is already acquired, then the calling thread "asks" the OS to block itself, and the OS will unblock it as soon as some other thread releases the semaphore.
The entire procedure is "atomic", i.e., no context switch between threads can take place while the semaphore code is executed. This is generally achieved by disabling interrupts. Everything is implemented within the semaphore code, so you need not "worry" about it.
Since you did not specify what OS you're using, I cannot provide any technical details (i.e., code)...
UPDATE:
If you are trying to protect a critical section inside the loop (i.e., if you are accessing some other global variable, which is also being accessed by other threads, and at least one of those threads is changing that global variable), then you should use a Mutex instead of a binary semaphore.
There are two advantages for using a Mutex in this case:
It can be released only by the thread which has acquired it (thus ensuring mutual exclusion).
It can resolve a specific type of deadlock that occurs when a high-priority thread is waiting for a low-priority thread to complete, while a medium-priority thread is preventing the low-priority thread from completing (a.k.a. priority inversion).
Of course, a Mutex is required only if you really need to ensure mutual exclusion for accessing the data.
UPDATE #2:
Now that you've added some specific details on your system, here is the general scheme:
Step #1 - Before starting your threads:

// Declare a global semaphore 'sem'
// Initialize it with count = 0 (i.e., as acquired): sem_init(&sem, 0, 0)

Step #2 - In this thread:

// Declare the global semaphore 'sem' as 'extern'
while (1)
{
    sem_wait(&sem); // blocks until some other thread posts
    //...
}

Step #3 - In the receiving (Rx) code path:

// Declare the global semaphore 'sem' as 'extern'
sem_post(&sem);
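Since you mentioned you're on Linux, here is a minimal, compilable sketch of the same scheme using POSIX semaphores (build with -pthread; the sleep/post loop in main just simulates messages arriving):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t sem;

static void* consumer(void* arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        sem_wait(&sem); // sleeps here with no CPU load
        printf("woke up: message %d available\n", i);
    }
    return NULL;
}

int main(void)
{
    sem_init(&sem, 0, 0); // count = 0, i.e., starts "acquired"

    pthread_t t;
    pthread_create(&t, NULL, consumer, NULL);

    for (int i = 0; i < 3; i++) {
        sleep(1);         // simulate a message arriving once per second
        sem_post(&sem);   // wake the consumer
    }
    pthread_join(t, NULL);
    return 0;
}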
Spinning in a loop without any delay will use a fair amount of CPU; you're right that a small time delay will reduce that.
Using Sleep() is the easiest way; on Windows it is declared in the windows.h header. (On Linux, the closest equivalents are usleep() and nanosleep().)
Having said that, the most elegant solution is to thread your code so that it only ever runs when your condition is true; that way it will truly sleep until you wake it up.
I suggest you look into pthread and mutex. This will allow you to sleep that loop of yours entirely until the condition becomes true.
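For example, here is a rough sketch of that approach with a pthread mutex and condition variable (the flag name mirrors yours; adapt the message handling to your code):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int someCondition = 0;

void wait_for_condition(void)                 // replaces the busy loop
{
    pthread_mutex_lock(&lock);
    while (!someCondition)                    // loop: wakeups can be spurious
        pthread_cond_wait(&cond, &lock);      // sleeps, releasing the lock
    someCondition = 0;                        // consume the event
    pthread_mutex_unlock(&lock);
    // ... handle the message ...
}

void signal_condition(void)                   // call when a valid message arrives
{
    pthread_mutex_lock(&lock);
    someCondition = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}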
Hope that helps in some way :)
I am developing a user-level thread library as part of a project. I came up with an approach to implement a mutex and would like to hear your views before going ahead with it. Basically, I need to implement just 3 functions in my library:
mutex_init, mutex_lock and mutex_unlock
I thought my mutex_t structure would look something like
typedef struct
{
    int available; // indicates whether the mutex is locked or unlocked
    queue listofwaitingthreads;
    gtthread_t owningthread;
} mutex_t;
In my mutex_lock function, I will first check in a while loop whether the mutex is available. If it is not, I will yield the processor so the next thread can execute.
In my mutex_unlock function, I will check whether the owning thread is the current thread. If it is, I will set available to 0.
Is this the way to go about it? Also, what about deadlock? Should I take care of those conditions in my user-level library, or should I leave it to application programmers to write code properly?
This won't work, because you have a race condition. If 2 threads try to catch the lock at the same time, both will see available == 0, and both will think they succeeded with taking the mutex.
If you want to do this properly, without using an already-existing lock, you must use hardware atomic operations such as TAS (test-and-set) or CAS (compare-and-swap).
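For instance, here is a minimal test-and-set spinlock using a GCC/Clang builtin, just to illustrate the kind of hardware support meant here (a real library would add yielding or sleeping instead of pure spinning):

static volatile int lock_flag = 0; // 0 = unlocked, 1 = locked

void my_lock(void)
{
    // the atomic swap returns the previous value: 1 means someone else holds it
    while (__sync_lock_test_and_set(&lock_flag, 1))
        ; // spin
}

void my_unlock(void)
{
    __sync_lock_release(&lock_flag); // atomically stores 0
}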
There are algorithms that give you mutual exclusion without such hardware support, but they make assumptions that often do not hold. For more details about this, I highly recommend reading Herlihy and Shavit's The Art of Multiprocessor Programming, chapter 7.
You shouldn't worry about deadlocks at this level - mutex locks should be simple enough, and there is some assumption that the programmer using them takes care not to cause deadlocks (advanced mutexes can check for self-deadlock, meaning a thread that calls lock twice without calling unlock in between).
Not only do you have to use atomic operations to read and modify the flag (as Eran pointed out), you also have to make sure that your queue can handle concurrent accesses. This is not completely trivial; it is a sort of chicken-and-egg problem.
But if you really implemented this by spinning, you wouldn't even need such a queue; the access order to the lock would then be mostly random, though.
Just yielding would probably not be enough either; it can be quite costly if threads hold the lock for more than a few processor cycles. Consider using nanosleep with a small time value for the wait.
In general, a mutex implementation should look like:
Lock:
    while (trylock() == failed) {
        atomic_inc(waiter_cnt);
        atomic_sleep_if_locked();
        atomic_dec(waiter_cnt);
    }

Trylock:
    return atomic_swap(&lock, 1);

Unlock:
    atomic_store(&lock, 0);
    if (waiter_cnt) wakeup_sleepers();
Things get more complex if you want recursive mutexes, mutexes that can synchronize their own destruction (i.e. freeing the mutex is safe as soon as you get the lock), etc.
Note that atomic_sleep_if_locked and wakeup_sleepers correspond to FUTEX_WAIT and FUTEX_WAKE ops on Linux. The other atomics are probably CPU instructions, but could be system calls or kernel-assisted userspace function code, as in the case of Linux/ARM and the 0xffff0fc0 atomic compare-and-swap call.
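To make that concrete, here is a rough Linux-only sketch of the scheme above using GCC builtins and raw futex syscalls (glibc provides no futex() wrapper, hence syscall(); error handling omitted):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static int lock_word;  // 0 = unlocked, 1 = locked
static int waiter_cnt; // threads sleeping on the lock

static int my_trylock(void)
{
    return __sync_lock_test_and_set(&lock_word, 1); // returns old value; 0 = success
}

void my_mutex_lock(void)
{
    while (my_trylock() != 0) {
        __sync_fetch_and_add(&waiter_cnt, 1);
        // FUTEX_WAIT sleeps only if lock_word is still 1, checked atomically
        // (this is the "atomic_sleep_if_locked" from the pseudocode)
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        __sync_fetch_and_sub(&waiter_cnt, 1);
    }
}

void my_mutex_unlock(void)
{
    __sync_lock_release(&lock_word); // atomic_store(&lock, 0)
    if (waiter_cnt)
        syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0); // wake one sleeper
}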
You do not need atomic instructions for a user level thread library, because all the threads are going to be user level threads of the same process. So actually when your process is given the time slice to execute, you are running multiple threads during that time slice but on the same processor. So, no two threads are going to be in the library function at the same time. Considering that the functions for mutex are already in the library, mutual exclusion is guaranteed.
I wrote a simple program that implements a master/worker scheme, where the master is the main thread and the workers are created by it.
The main thread writes something to a shared buffer, and the worker threads read from it; writing to and reading from the shared buffer are coordinated by a read/write lock.
Unfortunately, this scheme definitely leads to starvation of the main thread, since a single write has to wait for several reads to complete. One possible solution is to increase the priority of the master thread, so that when it wants to write something it gets immediate access to the shared buffer.
According to a great post on a similar issue, I discovered that manipulating the priority of a thread under the SCHED_OTHER policy is probably not allowed; only the nice value can be changed.
I wrote a procedure to give worker threads lower priority than master thread, but it seems not to work correctly.
void assignWorkerThreadPriority(pthread_t* worker)
{
    struct sched_param worker_sched_param;
    worker_sched_param.sched_priority = 0; // any value other than 0 gives an error?
    int policy = SCHED_OTHER;
    // pthread_setschedparam returns the error code directly; it does not set errno
    int result = pthread_setschedparam(*worker, policy, &worker_sched_param);
    printf("Result of changing priority is: %d - %s\n", result, strerror(result));
}
I have a two-fold question:
How can I set the nice value of the worker threads to avoid starvation of the main thread?
If that is not possible, how can I change the scheduling policy to one that allows changing the priority?
Edit: I managed to run the program using other policies, such as SCHED_FIFO; all I had to do was run the program as a superuser.
You cannot avoid problems using a read/write lock when the read and write usage is so even. You need a different method. You need a lock-free message queue or independent work queues or one of many other techniques.
Here is another way to do the job, the way I would do it. The worker can take the buffer away and work on it rather than keeping it shared:
Write thread:
Create work item.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Add work item to queue.
Release the lock.
Optionally signal a condition variable or Event. Another option is for worker threads to check for work on a timer.
Worker thread:
Create a new queue.
Wait for a condition variable or event or other signal, or wait on a timer.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Set the current queue pointer to the new queue.
Release the lock.
Proceed to work on the now private queue.
Delete the queue when all work items complete.
Now the write thread creates more work items. When all the worker threads have their own private copies of the queue to work on, it will be able to write many items in peace.
You can modify this. For example, a worker thread may lock the queue and move a limited number of work items off into its own internal queue instead of taking the whole thing.
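Here is a rough pthread sketch of the queue-swap idea (work_item and the function names are illustrative, not from your code):

#include <pthread.h>
#include <stddef.h>

typedef struct work_item {
    struct work_item* next;
    // payload fields ...
} work_item;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
static work_item* current_queue;  // shared; guarded by queue_lock

// Write thread: add one item and signal a worker.
void push_work(work_item* item)
{
    pthread_mutex_lock(&queue_lock);
    item->next = current_queue;   // LIFO for brevity; keep a tail pointer for FIFO
    current_queue = item;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

// Worker thread: take the whole queue private, then work without the lock.
work_item* take_all_work(void)
{
    pthread_mutex_lock(&queue_lock);
    while (current_queue == NULL)
        pthread_cond_wait(&queue_cond, &queue_lock);
    work_item* mine = current_queue;  // swap in an empty queue
    current_queue = NULL;
    pthread_mutex_unlock(&queue_lock);
    return mine;  // now private to this worker; free items when done
}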
My current understanding of condition variables is that all blocked (waiting) threads are inserted into a basic FIFO queue, the first item of which is awakened when signal() is called.
Is there any way to modify this queue (or create a new structure) to perform as a priority queue instead? I've been thinking about it for a while, but most solutions I have end up being hampered by the existing queue structure inherent to C.V.'s and mutexes.
Thanks!
I think you should rethink what you're trying to do. If you're trying to optimize your performance, you're probably barking up the wrong tree.
pthread_cond_signal() isn't even guaranteed to unblock exactly one thread -- it's guaranteed to unblock at least one thread, so your code better be able to handle the situation where multiple threads are unblocked simultaneously. The typical way to do this is for each thread to re-check the condition after becoming unblocked, and, if false, return to waiting again.
You could implement some sort of scheme where you kept your own priority queue of waiting threads: each thread would add itself to that queue immediately before beginning to wait, and then check the queue when unblocking. But this would add a lot of complexity and a lot of potential for serious problems (race conditions, deadlocks, etc.). It would also add a non-trivial amount of overhead.
Also, what happens if a higher-priority thread starts waiting on a condition variable at the same moment that condition variable is being signalled? Who gets unblocked, the newly arrived high-priority thread or the former highest priority thread?
The order that threads get unblocked in is entirely dependent on the kernel's thread scheduler, so you are at its mercy. I wouldn't even assume FIFO ordering, either.
Since condition variables are basically just a barrier and you have no control over the queue of waiting threads there's no real way to apply priorities. It's invalid to assume waiting threads will act in a FIFO manner.
With a combination of atomics, additional condition variables, and pre-knowledge of the threads/priorities involved, you could construct a solution where a signaled thread re-signals the master CV and then re-blocks on a priority CV, but it certainly wouldn't be a generic solution. That's also off the top of my head, so it might have some other flaw.
It's the scheduler that determines which thread will run. You can look at pthread_setschedparam and pthread_getschedparam and fiddle with the policies (SCHED_OTHER, SCHED_FIFO, or SCHED_RR) and the priorities. But it probably won't get you to where I suspect you want to go.
It sounds as if you want to make something predictable from the inherently non-deterministic. As Andrew notes, you might hack something together, but my guess is that this will lead to heartache or a lot of code you will hate yourself for writing in six months (or both).
I am working on a multi-threaded C application using pthreads. I have one thread which writes to a database (the database library is only safe to use from a single thread), and several threads which gather data, process it, and then need to send the results to the database thread for storage. I've seen it mentioned that it is "possible" to make a multiple-writer-safe queue in C, but every place I see this mentioned simply says that it's "too complicated for this example" and merely demonstrates a single-writer-safe queue.
I need the following things:
Efficient insertion and removal. I would assume that, like any other queue, O(1) enqueueing and dequeueing is possible.
Dynamically allocated memory, i.e. a linked structure. I need to not have an arbitrary limit on the size of the queue, so an array really isn't what I'm looking for.
EDIT: Reading threads should not spin on an empty queue, since there is likely to be minutes worth of time with no writes, with short bursts of large numbers of writes.
Sure, there are lockless queues. Based on what you've said in comments, though, performance here is not at all critical, since you're creating a thread per write anyway.
So, this is a standard use case for a condition variable. Make yourself a struct containing a mutex, a condition variable, a linked list (or circular buffer if you like), and a cancel flag:
write:
    lock the mutex
    (optionally - check the cancel flag to prevent leaks of stuff on the list)
    add the event to the list
    signal the condition variable
    unlock the mutex

read:
    lock the mutex
    while (list is empty AND cancel is false):
        wait on the condition variable with the mutex
    if cancel is false:  // or "if list non-empty", depending on cancel semantics
        remove an event from the list
    unlock the mutex
    return event if we have one, else NULL meaning "cancelled"

cancel:
    lock the mutex
    set the cancel flag
    (optionally - dispose of anything on the list, since the reader will quit)
    signal the condition variable
    unlock the mutex
If you're using a list with external nodes, you might want to allocate the memory outside the mutex lock, just to reduce the time the lock is held. But if you design the events with an intrusive list node, that's probably easiest.
Edit: you can also support multiple readers (with no portable guarantees for which one gets a given event) if in cancel you change the "signal" to "broadcast". Although you don't need it, it doesn't really cost anything either.
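Here is one possible C rendering of that pseudocode, using an intrusive singly linked list (event_t and all names here are illustrative; this variant drains remaining events after cancel, and cancel broadcasts so it works with multiple readers):

#include <pthread.h>
#include <stddef.h>

typedef struct event {
    struct event* next;
    // payload ...
} event_t;

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    event_t* head;  // FIFO list of pending events
    event_t* tail;
    int cancel;
} event_queue_t;

void queue_write(event_queue_t* q, event_t* ev)
{
    pthread_mutex_lock(&q->mutex);
    ev->next = NULL;
    if (q->tail) q->tail->next = ev; else q->head = ev;
    q->tail = ev;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

event_t* queue_read(event_queue_t* q)  // NULL means "cancelled"
{
    event_t* ev = NULL;
    pthread_mutex_lock(&q->mutex);
    while (q->head == NULL && !q->cancel)
        pthread_cond_wait(&q->cond, &q->mutex);
    if (q->head) {  // drain remaining events even after cancel
        ev = q->head;
        q->head = ev->next;
        if (q->head == NULL) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->mutex);
    return ev;
}

void queue_cancel(event_queue_t* q)
{
    pthread_mutex_lock(&q->mutex);
    q->cancel = 1;
    pthread_cond_broadcast(&q->cond);  // wake every reader
    pthread_mutex_unlock(&q->mutex);
}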
http://www.liblfds.org
Lock-free data structure library written in C.
It has the Michael & Scott (M&S) queue.
If you don't need a lock-free queue, then you could just wrap an existing queue with a lock.
Mutex myQueueLock;
Queue myQueue;

void mtQueuePush(int value)
{
    lock(myQueueLock);
    queuePush(myQueue, value);
    unlock(myQueueLock);
}

int mtQueueNext()
{
    lock(myQueueLock);
    int value = queueFront(myQueue);
    queuePop(myQueue);
    unlock(myQueueLock);
    return value;
}
The only thing after that is to add some sort of handling for mtQueueNext when the queue is empty.
EDIT:
If you have a single reader, single writer lockless queue, you only need to have a lock around mtQueuePush, to prevent multiple simultaneous writers.
There are a number of single reader/writer lockless queues around; however, most of them are implemented as C++ template classes. Do a Google search and, if need be, work out how to rewrite them in plain C.
I'd go for multiple single-writer queues (one per writer thread). Then you can check this for how to get the single reader to read the various queues.