I'm trying to implement counting semaphores using OpenMP's locking constructs, but I'm running into a problem (the program hangs) when trying to use omp_set_lock() inside a critical region.
I'm testing the implementation with a simple producer-consumer program. This is what I've come up with:
#include <omp.h>
#include <stdio.h>
#define N 50
int semaphore_count = 0;
omp_lock_t semaphore_lock;
int A[N];
void semaphore_increment()
{
int my_id = omp_get_thread_num();
#pragma omp critical
{
printf("[%lf][%d] semaphore_count %d --> %d\n", omp_get_wtime(), my_id,
semaphore_count, semaphore_count + 1);
semaphore_count++;
if(semaphore_count == 1) {
// Semaphore was previously locked, so unlock it.
printf("[%lf][%d] Releasing lock.\n", omp_get_wtime(), my_id);
omp_unset_lock(&semaphore_lock);
}
}
}
void semaphore_decrement()
{
int my_id = omp_get_thread_num();
#pragma omp critical
{
printf("[%lf][%d] semaphore_count: %d\n", omp_get_wtime(), my_id,
semaphore_count);
if (semaphore_count == 0) {
printf("[%lf][%d] Sleeping\n", omp_get_wtime(), my_id);
omp_set_lock(&semaphore_lock);
}
else {
printf("[%lf][%d] Working\n", omp_get_wtime(), my_id);
// Creating a critical region here instead of in the beginning of
// the function solves the problem.
// #pragma omp critical
// {
semaphore_count--;
// }
if (semaphore_count == 0) {
omp_set_lock(&semaphore_lock);
}
}
}
}
void produce() {
for (int i = 0; i < N; ++i) {
A[i] = i;
#pragma omp flush
semaphore_increment();
}
}
void consume() {
int sum = 0;
for (int i = 0; i < N; ++i) {
semaphore_decrement();
sum += A[i];
}
printf("Sum is: %d\n", sum);
}
int main() {
omp_init_lock(&semaphore_lock);
omp_set_lock(&semaphore_lock);
#pragma omp parallel
{
#pragma omp single nowait
produce();
#pragma omp single nowait
consume();
}
omp_destroy_lock(&semaphore_lock);
return 0;
}
This version of the program hangs every time the consumer thread goes to sleep. If I modify it to reduce the critical region to a smaller portion of the code (as indicated by the comment in the program), then it works.
What I don't understand is: why does this happen? It seems that the producer thread, which only increments the semaphore, stops running, and then everything hangs, but I don't understand why.
This answer is WRONG (see the comments). I leave it here as an example of how not to do it.
As noted in the comments by EOF, the program shown in the question is wrong, because it has a race condition regarding the update of the semaphore_count variable, since that can happen simultaneously in two different critical sections.
I ended up replacing the functions semaphore_increment and semaphore_decrement with the following function, which either increments or decrements the semaphore value depending on the value passed in as operation.
void semaphore_update(int operation) {
int my_id = omp_get_thread_num();
int set_lock = 0;
#pragma omp critical
{
if (operation == 0) { // Decrement operation
if (semaphore_count == 0) {
set_lock = 1;
}
else {
semaphore_count--;
if (semaphore_count == 0) {
// Locking here won't actually lock the current thread, only set
// the semaphore so that the *next* thread will be put to sleep.
set_lock = 1;
}
}
}
else { // Increment operation
semaphore_count++;
if(semaphore_count == 1) {
// Semaphore was previously locked, so unlock it.
omp_unset_lock(&semaphore_lock);
}
}
}
// The actual call to omp_set_lock has to happen outside the critical region
// otherwise any threads trying to unlock the semaphore won't be able to
// get access to the critical region.
if (set_lock) {
omp_set_lock(&semaphore_lock);
}
}
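One way to avoid blocking inside the critical region altogether is to keep the critical region for the counter update only and let the waiting thread poll. The following is a minimal sketch under stated assumptions, not the asker's code: the names semaphore_post, semaphore_wait and run_producer_consumer are made up for illustration, and the consumer busy-waits instead of sleeping. It also avoids having one thread unset an OpenMP lock that a different thread set, which, as far as I know, the OpenMP specification treats as non-conforming.

```c
#include <assert.h>

#define N 50

static int semaphore_count = 0;   /* items produced but not yet consumed */
static int A[N];

/* Increment: the critical region is held only for the update itself. */
void semaphore_post(void)
{
    #pragma omp critical(semaphore)
    semaphore_count++;
}

/* Decrement: poll until the count is positive. The thread never blocks
 * while it is inside the critical region, so the producer can always
 * get in to increment the count. */
void semaphore_wait(void)
{
    int acquired = 0;
    while (!acquired) {
        #pragma omp critical(semaphore)
        {
            if (semaphore_count > 0) {
                semaphore_count--;
                assert(semaphore_count >= 0);
                acquired = 1;
            }
        }
    }
}

/* Runs one producer section and one consumer section; returns the sum
 * of the consumed values. */
int run_producer_consumer(void)
{
    int sum = 0;
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            for (int i = 0; i < N; ++i) {
                A[i] = i;
                semaphore_post();   /* critical implies a flush of A[i] */
            }
        }
        #pragma omp section
        {
            for (int i = 0; i < N; ++i) {
                semaphore_wait();
                sum += A[i];
            }
        }
    }
    return sum;
}
```

With two threads, run_producer_consumer() should return 0 + 1 + ... + 49 = 1225. If a single thread ends up executing both sections, the sketch relies on the producer section being run first, so treat it as an illustration rather than a robust semaphore.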
What's the difference between mutex_lock and pthread_join in these two source codes? They seem both to do the same thing, making the main function wait for the thread to finish execution.
This code:
#include "philo.h"
typedef struct s_bablo
{
pthread_mutex_t mutex;
} t_bablo;
void *myturn(void *arg)
{
t_bablo *bablo = (t_bablo *)arg;
int i = 0;
while(i < 10)
{
printf("My Turn ! %d\n", i);
i++;
sleep(1);
}
pthread_mutex_unlock(&bablo->mutex);
}
void *yourturn()
{
int i = 0;
while(i < 5)
{
printf("Your Turn ! %d\n", i);
i++;
sleep(1);
}
}
int main ()
{
t_bablo bablo;
pthread_mutex_init(&bablo.mutex, NULL);
pthread_t ph;
pthread_mutex_lock(&bablo.mutex);
pthread_create(&ph, NULL, myturn, &bablo);
yourturn();
pthread_mutex_lock(&bablo.mutex);
}
And this code :
#include "philo.h"
void *myturn(void *arg)
{
int i = 0;
while(i < 10)
{
printf("My Turn ! %d\n", i);
i++;
sleep(1);
}
}
void *yourturn()
{
int i = 0;
while(i < 5)
{
printf("Your Turn ! %d\n", i);
i++;
sleep(1);
}
}
int main ()
{
pthread_t ph;
pthread_create(&ph, NULL, myturn, NULL);
yourturn();
pthread_join(ph, NULL);
}
Not to be rude, but you can easily find the difference by searching for both function names.
pthread_mutex_lock locks a mutex, which is normally used to protect shared data. Once a thread holds the mutex, any other thread that calls pthread_mutex_lock on it has to wait until the owner calls pthread_mutex_unlock.
pthread_join waits for the specified thread to finish its execution before continuing.
I encourage you to read the man pages; they are really self-explanatory.
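To make the distinction concrete, here is a small sketch that uses both calls for their intended purposes. The names add_to_total and run_two_workers are hypothetical: the mutex protects the shared counter while the threads run, and pthread_join is what makes the caller wait for the threads to finish.

```c
#include <pthread.h>

#define INCREMENTS 100000

long shared_total = 0;
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each worker updates the shared counter under the mutex. The same
 * thread that locks the mutex is the one that unlocks it. */
static void *add_to_total(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&total_mutex);
        shared_total++;
        pthread_mutex_unlock(&total_mutex);
    }
    return NULL;
}

/* Starts two workers and waits for both with pthread_join.
 * Returns the final counter value, or -1 if a thread could not be created. */
long run_two_workers(void)
{
    pthread_t a, b;
    shared_total = 0;
    if (pthread_create(&a, NULL, add_to_total, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_to_total, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);   /* wait for thread a to finish */
    pthread_join(b, NULL);   /* wait for thread b to finish */
    return shared_total;
}
```

The mutex version in the question only appears to behave like pthread_join because the new thread unlocks a mutex that main locked, which is not something a default mutex guarantees to support.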
I have a use case where I want to run two functions in parallel: the first one contains work that I want to execute in thread 0, and the other contains a "for" loop that I want to share among the remaining 3 threads.
my code is like:
void fct1(){
//do some work1
};
void fct2(){
int p;
#pragma omp for schedule(static)
for (p=start; p < end; p++) {
//do some work2
}
};
int main(){
#pragma omp parallel
{
int tid = omp_get_thread_num();
if (tid==0)
fct1();
if(tid!=0)
fct2();
}
return 0;
}
the problem is that the "omp for" in fct2 hangs because it also waits for thread 0 to execute its part.
Do you have any suggestions?
Thank you.
You can do that with a single (or master, if you really want thread 0) construct. The nowait clause lets the other threads continue without waiting at the end of the single region.
You should use dynamic scheduling in your for loop, as it deals better with a variable number of threads: once thread 0 has finished fct1, it joins the other threads in the loop.
#include <stdio.h>
#include <omp.h>
#define end 2000
void fct1(){
printf("Hey I am thread %d\n", omp_get_thread_num());
};
void fct2(){
int p;
# pragma omp for schedule(dynamic,128) // adapt chunk size to your problem
for (p=0; p < end; p++) {
printf("%d (%d)\t",p,omp_get_thread_num());
} // all, including thread 0, will be synchronized here
};
int main(){
# pragma omp parallel
{
# pragma omp single nowait
fct1();
fct2();
}
return 0;
}
I'm trying to code the producer/consumer problem using semaphores. I have three: one acting as a mutex, and another two for the buffer that the producers and consumers add to and remove from. When adding to or removing from the buffer, I use the binary semaphore to lock and unlock it so that the global variables aren't subject to any race conditions. The produce semaphore represents how many spots are available in the buffer (how many things can be put into the buffer), while the consume semaphore represents how many things can be removed from the buffer. I think my logic is wrong, because I always reach a deadlock. Also, when I removed the produce and consume semaphores just to test whether the program does what it's supposed to do, I still got race conditions even though the binary semaphore should be preventing that. What am I doing wrong?
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>
#define MAXITEMS 100
#define PRODUCER_NO 5
#define NUM_PRODUCED 5
void *producer_function (void);
void *consumer_function (void);
void add_buffer (long i);
long get_buffer ();
long sum_value = 0;
long finished_producers;
long buffer[MAXITEMS];
int size = 0;
int front, rear = 0;
pthread_t producerThread[5];
pthread_t consumerThread;
sem_t mutex, produce, consume;
int main(void)
{
int i = 0;
srand (time(NULL));
sem_init (&mutex, 0, 1);
sem_init (&produce, 0, 100);
sem_init (&consume, 0, 0);
for (i = 0; i < 5; i++)
{
pthread_create (&producerThread[i], NULL, (void *) producer_function, NULL);
}
pthread_create (&consumerThread, NULL, (void *) consumer_function, NULL);
for (i = 0; i < 5; i++)
{
pthread_join (producerThread[i], NULL);
}
pthread_join (consumerThread, NULL);
return(0);
}
void *producer_function(void)
{
long counter = 0;
long producer_sum = 0L;
while (counter < NUM_PRODUCED)
{
sem_wait (&mutex);
sem_wait (&produce);
long rndNum = rand() % 10;
producer_sum += rndNum;
add_buffer (rndNum);
sem_post (&consume);
counter++;
if (counter == NUM_PRODUCED)
{
finished_producers++;
}
sem_post (&mutex);
usleep(1000);
}
printf("--+---+----+----------+---------+---+--+---+------+----\n");
printf("The sum of produced items for this producer at the end is: %ld \n", producer_sum);
printf("--+---+----+----------+---------+---+--+---+------+----\n");
return(0);
}
void *consumer_function (void)
{
while (1)
{
sem_wait (&mutex);
sem_wait (&consume);
long readnum = get_buffer();
sem_post (&produce);
sum_value += readnum;
sem_post (&mutex);
//printf ("%ld\n", sum_value);
if ((finished_producers == PRODUCER_NO) && (size == 0))
{
printf("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
printf("The sum of the all produced items at the end is: %ld \n", sum_value);
printf("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
break;
}
}
}
void add_buffer(long i){
buffer[rear] = i;
rear = (rear+1) % MAXITEMS;
size++;
}
long get_buffer(){
long v;
v = buffer[front];
front = (front+1) % MAXITEMS;
size--;
return v;
}
user2929779,
I think it's essential not to hold the mutex while waiting for a consume notification in the consumer or, vice versa, for a produce notification in the producer. Imagine the consumer is blocked waiting for a consume notification that no producer has posted yet: the consumer keeps the mutex locked, so no producer ever gets the chance to produce a new item...
So the order is important here:
1.) First wait for notification from remote side
2.) lock mutex
3.) modify global data
4.) release mutex
5.) notify remote side
Try this instead:
void *producer_function(void)
{
    long counter = 0;
    long producer_sum = 0L;
    while (counter < NUM_PRODUCED)
    {
        sem_wait (&produce);   /* 1. wait for a free slot */
        sem_wait (&mutex);     /* 2. lock the shared data */
        long rndNum = rand() % 10;
        producer_sum += rndNum;
        add_buffer (rndNum);   /* 3. modify global data */
        counter++;
        if (counter == NUM_PRODUCED)
        {
            finished_producers++;
        }
        sem_post (&mutex);     /* 4. release the mutex */
        sem_post (&consume);   /* 5. notify the consumer */
        usleep(1000);
    }
    printf("--+---+----+----------+---------+---+--+---+------+----\n");
    printf("The sum of produced items for this producer at the end is: %ld \n", producer_sum);
    printf("--+---+----+----------+---------+---+--+---+------+----\n");
    return(0);
}
void *consumer_function (void)
{
    while (1)
    {
        sem_wait (&consume);
        sem_wait (&mutex);
        long readnum = get_buffer();
        sum_value += readnum;
        /* Evaluate the exit condition while the shared data is still locked,
           then release the mutex before leaving the loop. */
        int done = (finished_producers == PRODUCER_NO) && (size == 0);
        sem_post (&mutex);
        sem_post (&produce);
        //printf ("%ld\n", sum_value);
        if (done)
        {
            printf("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
            printf("The sum of the all produced items at the end is: %ld \n", sum_value);
            printf("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
            break;
        }
    }
    return(0);
}
P.S. Return values of the system calls are ignored here just to keep the example short...
P.P.S. See also the pseudocode on Wikipedia: http://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem#Using_semaphores ...
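On the ignored return values: sem_wait returns -1 and sets errno to EINTR when a signal handler interrupts it, so a small retry wrapper is a common pattern. The sketch below assumes unnamed POSIX semaphores as in the question; sem_wait_retry is a hypothetical helper name, not part of any library.

```c
#include <errno.h>
#include <semaphore.h>

/* Calls sem_wait and retries if the call was interrupted by a signal.
 * Returns 0 on success, or -1 with errno set for any other failure. */
int sem_wait_retry(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

In the producer and consumer above, each sem_wait call could be replaced by sem_wait_retry, with the thread bailing out if it returns -1.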
I am trying to use OpenMP for threading as it is cross-platform. However, I can't work out how to make the code after the parallel region continue while the loop is running. It basically just executes the first loop in parallel but never gets to the second, non-parallel loop.
int main() {
#pragma omp parallel
while(1) {
Sleep(4000);
printf("doing work in thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
while (1) {
Sleep(4000);
printf("Hello from main %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
}
I think you could just make one of the threads your special thread within your omp parallel block:
int main() {
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) {
            while (1) {
                Sleep(4000);
                printf("Hello from main %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
            }
        } else {
            while (1) {
                Sleep(4000);
                printf("doing work in thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
            }
        }
    }
}
Whether this makes sense in your case is hard to judge without more details.
You could also use sections. Example from here: http://bisqwit.iki.fi/story/howto/openmp/#Sections :
#pragma omp parallel // starts a new team
{
//Work0(); // this function would be run by all threads.
#pragma omp sections // divides the team into sections
{
// everything herein is run only once.
{ Work1(); }
#pragma omp section
{ Work2();
Work3(); }
#pragma omp section
{ Work4(); }
}
//Work5(); // this function would be run by all threads.
}
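As a compilable variant of the sections idea, the sketch below runs two independent pieces of work as separate sections. It uses finite loops instead of the infinite ones in the question, and the names sum_to, run_both, background_result and foreground_result are made up for illustration.

```c
/* Stand-in for real work: sums the integers 1..n. */
static long sum_to(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++)
        total += i;
    return total;
}

long background_result;
long foreground_result;

/* Each section is executed once, by whichever thread of the team picks
 * it up; with at least two threads the two calls can run concurrently. */
void run_both(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            background_result = sum_to(1000);
        }
        #pragma omp section
        {
            foreground_result = sum_to(10);
        }
    }
}
```

The parallel region still ends with an implicit barrier, so run_both() returns only after both sections have finished.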
You can also use nested parallelisation: OpenMP: What is the benefit of nesting parallelizations?
Using trylock:
FILE *fp;
pthread_mutex_t demoMutex;
void * printHello (void* threadId)
{
pthread_mutex_trylock (&demoMutex);
pthread_t writeToFile = pthread_self ();
unsigned short iterate;
for (iterate = 0; iterate < 10000; iterate++)
{
fprintf (fp, " %d ", iterate, 4);
fprintf (fp, " %lu ", writeToFile, sizeof (pthread_t));
fprintf (fp, "\n", writeToFile, 1);
}
pthread_mutex_unlock (&demoMutex);
pthread_exit (NULL);
}
and then main ():
int main ()
{
pthread_t arrayOfThreadId [5];
int returnValue;
unsigned int iterate;
fp = fopen ("xyz", "w");
pthread_mutex_init (&demoMutex, NULL);
for (iterate = 0; iterate < 5; iterate++)
{
if (returnValue = pthread_create (&arrayOfThreadId [iterate],
NULL,
printHello,
(void*) &arrayOfThreadId [iterate]) != 0)
{
printf ("\nerror: pthread_create failed with error number %d", returnValue);
}
}
for (iterate = 0; iterate < 5; iterate++)
pthread_join (arrayOfThreadId [iterate], NULL);
return 0;
}
Here the output first prints part of the first thread's output, then the rest of the threads, and then the first thread again. The lock isn't working. If I replace it with pthread_mutex_lock, everything is printed strictly sequentially!
What's the ridiculous mistake here?
It does not make sense to call pthread_mutex_trylock() without testing the result.
If it fails to acquire the mutex, you should not enter the critical section, and you should not unlock it later. For example, you could rewrite it like so (note that you are also very confused about how fprintf() should be called):
void *printHello(void *threadId)
{
if (pthread_mutex_trylock(&demoMutex) == 0)
{
unsigned short iterate;
for (iterate = 0; iterate < 10000; iterate++)
{
fprintf (fp, " %d\n", iterate);
}
pthread_mutex_unlock (&demoMutex);
}
pthread_exit (NULL);
}
However, it probably makes more sense to use pthread_mutex_lock() instead of pthread_mutex_trylock(), so that your thread will wait for the mutex to be available if it is contended. pthread_mutex_lock() is in almost all cases what you want; the _trylock variant is only for optimising some unusual cases - if you ever encounter a situation where _trylock is needed, you'll know.
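For completeness, here is a sketch of the kind of case where _trylock can be reasonable: an optional update that is simply skipped when the mutex is busy. The names stats_mutex, stats_total and try_add_to_total are hypothetical and not from the question.

```c
#include <pthread.h>

pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
long stats_total = 0;

/* Tries to add `amount` to the shared total without ever blocking.
 * Returns 1 if the update was applied, 0 if the mutex could not be
 * acquired and the update was skipped. */
int try_add_to_total(long amount)
{
    if (pthread_mutex_trylock(&stats_mutex) != 0)
        return 0;   /* not acquired: do not touch the data, do not unlock */
    stats_total += amount;
    pthread_mutex_unlock(&stats_mutex);
    return 1;
}
```

The caller has to be able to cope with a return value of 0. If the work must happen, as with the file output in the question, pthread_mutex_lock() is the right call.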
...
while (pthread_mutex_trylock(&demoMutex) == 0)
...
Your code makes no sense. Where does it force the lock? It's like a non-working spinlock that uses more CPU?!
trylock returns 0 when it locks, so:
if(!pthread_mutex_trylock(&demoMutex))
{
// mutex locked
}
The pthread_mutex_trylock() function shall return zero if a lock on
the mutex object referenced by mutex is acquired. Otherwise, an error
number is returned to indicate the error.
caf had a great answer on how to use it; I just had to work through that explanation for myself. However, I learned in class that pthread_mutex_lock() has far more overhead, so I tested it using the <time.h> library, and the performance of my loop increased significantly with pthread_mutex_trylock(). Just adding my two cents, since he mentioned that you should probably use pthread_mutex_lock() instead!
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define NUM_THREADS 4
#define LOOPS 100000
int counter;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
// using pthread_mutex_lock
void* worker(void *arg) {
for (int i = 0; i < LOOPS; i++) {
pthread_mutex_lock(&mutex);
counter++;
pthread_mutex_unlock(&mutex);
}
pthread_exit(NULL);
}
// using try_lock - obviously only use one at a time
void* worker(void *arg) {
for (int i = 0; i < LOOPS; i++) {
while (pthread_mutex_trylock(&mutex) != 0) {
// wait - treated as spin lock in this example
}
counter++;
pthread_mutex_unlock(&mutex);
}
pthread_exit(NULL);
}
int main(int argc, char *argv[]) {
clock_t begin = clock();
pthread_t t[NUM_THREADS];
int rc;
counter = 0;
for (int i = 0; i < NUM_THREADS; i++) {
rc = pthread_create(&t[i], NULL, worker, NULL);
if (rc) {
printf("Thread #%d failed\n", i);
}
}
for (int i = 0; i < NUM_THREADS; i++) {
pthread_join(t[i], NULL);
}
printf("%d\n", counter);
clock_t end = clock();
double time = (double)(end - begin) / CLOCKS_PER_SEC;
printf("Time Spent: %f", time);
return 0;
}
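Obviously you would comment out one worker to test it. When I try it out, I get Time Spent: 1.36200 as an average for pthread_mutex_lock() and Time Spent: 0.36714 for pthread_mutex_trylock(). Note that clock() measures the CPU time used by the whole process rather than elapsed wall-clock time, so these numbers are only a rough comparison.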
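If you want elapsed time rather than CPU time, one option is clock_gettime with CLOCK_MONOTONIC. This is only a sketch: seconds_between and time_wall_clock are hypothetical helper names, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Difference between two timestamps, in seconds. */
double seconds_between(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Runs fn once and returns the elapsed wall-clock time in seconds. */
double time_wall_clock(void (*fn)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return seconds_between(start, end);
}
```

To use it with the benchmark above, the thread creation and join loops would go into a function that is passed to time_wall_clock.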
Goes faster again if you use Atomics.
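A sketch of what the atomic version could look like, assuming a C11 compiler with <stdatomic.h>; atomic_counter, atomic_worker and run_atomic_workers are names chosen here for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define LOOPS 100000

static atomic_int atomic_counter;

/* Each increment is a single atomic read-modify-write, so no mutex is needed. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOOPS; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
int run_atomic_workers(void)
{
    pthread_t t[NUM_THREADS];
    int started = 0;
    atomic_store(&atomic_counter, 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&t[started], NULL, atomic_worker, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&atomic_counter);
}
```

If all four threads start, the function returns NUM_THREADS * LOOPS. This only works because the protected operation is a single counter update; a longer critical section still needs a mutex.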
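The code needs to block at the point where you call pthread_mutex_trylock() in order to ensure mutual exclusion. If the call fails and the thread carries on anyway, it later calls pthread_mutex_unlock() on a mutex it does not hold, which is undefined behavior. Therefore you must call pthread_mutex_lock().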
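A modified version that forces the lock with a while loop should be more stable. Note that the loop has to keep spinning while pthread_mutex_trylock() returns non-zero, because 0 means the mutex was acquired.
void *printHello(void *threadId)
{
    /* Spin until trylock returns 0, i.e. until this thread holds the mutex. */
    while (pthread_mutex_trylock(&demoMutex) != 0)
    {
        /* busy-wait */
    }
    unsigned short iterate;
    for (iterate = 0; iterate < 10000; iterate++)
    {
        fprintf (fp, " %d\n", iterate);
    }
    pthread_mutex_unlock (&demoMutex);
    pthread_exit (NULL);
}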
pthread_mutex_trylock is used to ensure that you will not cause a race on a specific command.
In order to do so, you must use pthread_mutex_trylock as a condition, and not assume that it will work by itself.
Example:
while (pthread_mutex_trylock(&my_mutex) != 0) {
    /* the mutex is held elsewhere: keep trying */
}
printf("The mutex is in my control!!\n");
That way you can be sure that even if the mutex is currently locked, you are busy-waiting for it in that particular thread.