Consumer/producer program got stuck


Below is the code of an example consumer/producer model:
int buffer[MAX];
int fill_ptr = 0;
int use_ptr = 0;
int count = 3;
void put(int value) {
buffer[fill_ptr] = value;
fill_ptr = (fill_ptr + 1) % MAX;
count++;
}
int get() {
int tmp = buffer[use_ptr];
use_ptr = (use_ptr + 1) % MAX;
count--;
return tmp;
}
cond_t empty, fill;
mutex_t mutex;
void *producer(void *arg) {
int i;
for (i = 0; i < loops; i++) {
pthread_mutex_lock(&mutex); // p1
while (count == MAX) // p2
pthread_cond_wait(&empty, &mutex); // p3
put(i);// p4
pthread_cond_signal(&fill); // p5
pthread_mutex_unlock(&mutex); // p6
}
}
void* consumer(void *arg) {
int i;
for (i = 0; i < loops; i++) {
pthread_mutex_lock(&mutex); // c1
while (count == 0) // c2
pthread_cond_wait(&fill, &mutex); // c3
int tmp = get(); // c4
pthread_cond_signal(&empty); // c5
pthread_mutex_unlock(&mutex); // c6
printf("%d\n", tmp);
}
}
However, I think there is a problem in here. Assume MAX=3, and initially the buffer is full (count = 3), then the consumer can execute a get() and signal to the producer. After the producer receives a signal, it wakes up and begins to execute put() in buffer[0] with the mutex held.
Assume the producer just got stuck in put(); then the consumer cannot continue either (because the mutex is held by producer) even though there are 2 resources left.
Is my understanding correct? If so, that's unfair because there are 2 resources left which can be consumed.

Is my understanding correct?
Both yes and no.
Yes, it is correct that the consumer will be stuck if the producer calls put and put gets stuck (e.g. by entering an endless loop).
However, you can't assume that there are 2 resources left. pthread_cond_signal does not promise that the producer runs before the consumer has read all three elements. All you know is that the consumer has read at least one element; it may have read 2 or even all 3 before the producer runs.
If so, that's unfair ....
No, it is not unfair. It is exactly what a mutex is for, i.e. making sure that only one thread has access to the shared resource.
Therefore it is important to make sure that a thread holding a mutex will not get stuck! That is your responsibility as a programmer.
Note: In your case there is nothing inside put that can cause the thread to be stuck.
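As a minimal sketch of that advice (not the original poster's code), the version below does any potentially slow work outside the lock and holds the mutex only for the short buffer update. The helper make_item, the constant LOOPS, and the function run_producer_consumer are hypothetical names introduced for illustration, and the buffer starts empty here rather than full.

```c
#include <pthread.h>

#define MAX 3        /* buffer capacity, as assumed in the question */
#define LOOPS 1000   /* hypothetical number of items for this sketch */

static int buffer[MAX];
static int fill_ptr = 0, use_ptr = 0, count = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t fill = PTHREAD_COND_INITIALIZER;

/* Hypothetical stand-in for work that might be slow or might block. */
static int make_item(int i) { return i * 2; }

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < LOOPS; i++) {
        int value = make_item(i);            /* slow part: no lock held */
        pthread_mutex_lock(&mutex);
        while (count == MAX)
            pthread_cond_wait(&empty, &mutex);
        buffer[fill_ptr] = value;            /* short, bounded critical section */
        fill_ptr = (fill_ptr + 1) % MAX;
        count++;
        pthread_cond_signal(&fill);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

static void *consumer(void *arg) {
    long *sum = arg;
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&mutex);
        while (count == 0)
            pthread_cond_wait(&fill, &mutex);
        int tmp = buffer[use_ptr];
        use_ptr = (use_ptr + 1) % MAX;
        count--;
        pthread_cond_signal(&empty);
        pthread_mutex_unlock(&mutex);
        *sum += tmp;                         /* processing happens after unlock */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long run_producer_consumer(void) {
    pthread_t p, c;
    long sum = 0;
    fill_ptr = use_ptr = count = 0;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

If make_item ever blocked, only the producer would stall; the consumer could still drain whatever is already in the buffer because the mutex is not held during that call.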

Assume the producer just got stuck in put(); then the consumer cannot continue either (because the mutex is held by producer) even though there are 2 resources left.
You must never do something that can get stuck while you hold the mutex, and your code does not.

Related

Why does my simple counting program take longer to run with multiple threads? (in C)

Here's my code:
#define COUNT_TO 100000000
#define MAX_CORES 4
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
long long i = 0;
void* start_counting(void *arg){
for(;;){
pthread_mutex_lock(&mutex);
if(i >= COUNT_TO){
pthread_mutex_unlock(&mutex);
return NULL;
}
i++;
pthread_mutex_unlock(&mutex);
//printf("i = %lld\n", i);
}
}
int main(int argc, char* argv[]){
int i = 0;
pthread_t * thread_group = malloc(sizeof(pthread_t) * MAX_CORES);
for(i = 0; i < MAX_CORES; i++){
pthread_create(&thread_group[i], NULL, start_counting, NULL);
}
for(i = 0; i < MAX_CORES; i++){
pthread_join(thread_group[i], NULL);
}
return 0;
}

This is what your threads do:
1. Read the value of i.
2. Increment the value we read.
3. Write back the incremented value of i.
4. Go to step 1.
Clearly, another thread cannot read the value of i after a different thread has accomplished step 1 but before it has completed step 3. So there can be no overlap between two threads doing steps 1, 2, or 3.
So all your threads are fighting over access to the same resource -- i (or the mutex that protects it). No thread can make useful forward progress without exclusive access to one or both of those. Given that, there is no benefit to using multiple threads since only one of them can accomplish useful work at a time.
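One way to let the threads actually work in parallel, sketched here under the assumption that the goal is only the final count, is to give each thread a private counter and take the mutex once per thread instead of once per increment. The names count_share, count_in_parallel, total_mutex and NUM_THREADS are illustrative and not from the question.

```c
#include <pthread.h>

#define COUNT_TO 100000000LL
#define NUM_THREADS 4

static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
static long long total = 0;

/* Each thread counts its own share privately, then publishes it once. */
static void *count_share(void *arg) {
    long long share = *(const long long *)arg;
    long long local = 0;
    for (long long k = 0; k < share; k++)
        local++;                       /* no shared state touched here */
    pthread_mutex_lock(&total_mutex);
    total += local;                    /* one lock/unlock per thread */
    pthread_mutex_unlock(&total_mutex);
    return NULL;
}

long long count_in_parallel(void) {
    pthread_t threads[NUM_THREADS];
    long long shares[NUM_THREADS];
    total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        shares[t] = COUNT_TO / NUM_THREADS;
        if (t == 0)
            shares[t] += COUNT_TO % NUM_THREADS;  /* first thread takes the remainder */
        pthread_create(&threads[t], NULL, count_share, &shares[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return total;
}
```

An optimizing compiler may reduce the private loop to a single addition, so this is only meant to show where the lock goes, not to serve as a benchmark.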

How to speed up C mutex?

I have this incorrect code.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define MAX 1000
struct TContext {
const char* Name;
int* Counter;
int Mod;
};
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
printf("%d ", (*counter)++);
}
}
pthread_exit(0);
}
int main()
{
pthread_t t1;
pthread_t t2;
int counter = 0;
struct TContext ctxt1 = {"even", &counter, 0};
struct TContext ctxt2 = {"odd", &counter, 1};
pthread_create(&t1, 0, ThreadFunc, &ctxt1);
pthread_create(&t2, 0, ThreadFunc, &ctxt2);
pthread_join(t1, 0);
pthread_join(t2, 0);
printf("\n");
return 0;
}
My aim is to synchronize it and get the sequence 0, 1, 2, 3, 4, 5, ... .
I tried to lock and unlock a mutex in this way:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
pthread_mutex_lock(&mutex);
printf("%d ", (*counter)++);
pthread_mutex_unlock(&mutex);
}
}
pthread_exit(0);
}
But it runs very slowly (I exceed the time limit of one second).
How can I synchronize this code in a more effective way? Or maybe I can optimize the C mutex?

A slightly more traditional way than Chris Hall's is:
pthread_cond_t cv;
pthread_mutex_t lock;
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
pthread_mutex_lock(&lock);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
printf("%d ", (*counter)++);
pthread_cond_broadcast(&cv);
} else {
pthread_cond_wait(&cv, &lock);
}
}
pthread_mutex_unlock(&lock);
pthread_exit(0);
}
and in main:
pthread_mutex_init(&lock, 0);
pthread_cond_init(&cv, 0);
somewhere before creating the threads. This also lets you add an arbitrary number of even + odd threads without interference ( although no speedup, just intellectual curiosity ).

I suggest:
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
volatile int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (1)
{
int count ;
count = *counter ; // NB: volatile*
if (count >= MAX)
break ;
if ((count % 2) == ctxt->Mod)
{
printf("%d ", count) ;
*counter = count + 1 ;
} ;
} ;
pthread_exit(0);
}
Which, for x86/x86_64 at least, will have the effect I think you were looking for, namely that the two threads take turns in incrementing the counter.
The really interesting question is why this works :-)
Postscript
The code above depends, critically, on four things:
1. there is only one value being shared between the threads -- the counter,
2. the counter is simultaneously data and control -- the least significant bit of the counter signals which thread should proceed,
3. reading and writing the counter must be atomic -- so every read of the counter reads the last value written (and not some combination of the previous and current write),
4. the compiler must emit code to actually read/write the counter from/to memory inside the loop.
Now (1) and (2) are specific to this particular problem. (3) is generally true for int (though may require correct alignment). (4) is achieved by defining the counter as volatile.
So, I originally said that this would work "for x86/x86_64 at least" because I know (3) is true for those devices, but I also believe it is true for many (most ?) common devices.
A cleaner implementation would define the counter _Atomic, as follows:
#include <stdatomic.h>
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
atomic_int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (1)
{
int count ;
count = atomic_load_explicit(counter, memory_order_relaxed) ;
if (count > MAX) // printing up to and including MAX
break ;
if ((count % 2) == ctxt->Mod)
{
printf("%d ", count) ;
atomic_store_explicit(counter, count + 1, memory_order_relaxed) ;
} ;
} ;
pthread_exit(0);
}
Which makes (3) and (4) explicit. But note that (1) and (2) still mean that we don't need any memory ordering. Every time each thread reads the counter, bit0 tells it whether it "owns" the counter. If it does not own the counter, the thread loops to read it again. If it does own the counter, it uses the value and then writes a new value -- and because that passes "ownership" it returns to the read loop (it cannot do anything further with the counter until it "owns" it again). Once MAX+1 has been written to the counter neither thread will use or change it, so that's safe too.
Brother Employed Russian is correct, there is a "data race" here, but that is resolved by a data dependency, particular to this case.
More Generally
The code above is not terribly useful, unless you have other applications with a single shared value. But this can be generalised, using memory_order_acquire and memory_order_release atomic operations.
Suppose we have some struct shared which contains some (non-trivial) amount of data which one thread will produce and another will consume. Suppose we again use atomic_uint counter (initially zero) to manage access to a given struct shared parcel. Now we have a producer thread which:
void* ThreadProducerFunc(void* arg)
{
atomic_uint* counter = &count ; // somehow
....
while (1)
{
uint count ;
do
count = atomic_load_explicit(counter, memory_order_acquire) ;
while ((count & 1) == 1) ;
... fill the struct shared parcel, somehow ...
atomic_store_explicit(counter, count + 1, memory_order_release) ;
} ;
....
}
And a consumer thread which:
void* ThreadConsumerFunc(void* arg)
{
atomic_uint* counter = &count ; // somehow
....
while (1)
{
uint count ;
do
count = atomic_load_explicit(counter, memory_order_acquire) ;
while ((count & 1) == 0) ;
... empty the struct shared parcel, somehow ...
atomic_store_explicit(counter, count + 1, memory_order_release) ;
} ;
....
}
The load-acquire operations synchronize with the store-release operations, so:
in the producer: the filling of the parcel will not start until the producer has "ownership" (as above), and will then "complete" (writes become visible to the other thread) before the count is updated (and the new value becomes visible to the other thread).
in the consumer: the emptying of the parcel will not start until the consumer has "ownership" (as above), and will then "complete" (all reads will have read from memory) before the count is updated (and the new value becomes visible to the other thread).
Clearly, the two threads are busy waiting for each other. But with two or more parcels and counters, the threads can progress at the speed of the slower.
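To make the single-parcel handoff concrete, here is a small self-contained sketch. It assumes one producer, one consumer and one parcel; the parcel layout, ROUNDS, PARCEL_LEN and the function run_handoff are invented for illustration. The parity of the counter decides who owns the parcel, exactly as described above.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ROUNDS 200      /* hypothetical number of handoffs */
#define PARCEL_LEN 8    /* hypothetical parcel size */

struct shared { int data[PARCEL_LEN]; };

static struct shared parcel;
static atomic_uint counter;   /* even: producer owns parcel, odd: consumer owns it */

static void *producer_fn(void *arg) {
    (void)arg;
    for (unsigned round = 0; round < ROUNDS; round++) {
        unsigned count;
        do                                   /* busy-wait for ownership */
            count = atomic_load_explicit(&counter, memory_order_acquire);
        while ((count & 1u) == 1u);
        for (int k = 0; k < PARCEL_LEN; k++) /* fill the parcel */
            parcel.data[k] = (int)round;
        /* release: the writes above become visible before the new count */
        atomic_store_explicit(&counter, count + 1, memory_order_release);
    }
    return NULL;
}

static void *consumer_fn(void *arg) {
    long *sum = arg;
    for (unsigned round = 0; round < ROUNDS; round++) {
        unsigned count;
        do                                   /* busy-wait for ownership */
            count = atomic_load_explicit(&counter, memory_order_acquire);
        while ((count & 1u) == 0u);
        for (int k = 0; k < PARCEL_LEN; k++) /* empty the parcel */
            *sum += parcel.data[k];
        atomic_store_explicit(&counter, count + 1, memory_order_release);
    }
    return NULL;
}

/* Runs the handoff and returns the sum of everything the consumer read. */
long run_handoff(void) {
    pthread_t p, c;
    long sum = 0;
    atomic_store(&counter, 0);
    pthread_create(&p, NULL, producer_fn, NULL);
    pthread_create(&c, NULL, consumer_fn, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

Each round writes the round number into all eight slots, so the consumer's total is 8 times the sum of 0..199. Both threads spin while waiting, which is acceptable for a demonstration but wasteful if the two sides run at very different speeds.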
Finally -- x86/x86_64 and acquire/release
With x86/x86_64, all memory reads and writes are implicitly acquire-reads and release-writes. This means that there is zero overhead in atomic_load_explicit(..., memory_order_acquire) and atomic_store_explicit(..., memory_order_release).
Conversely, all read-modify-write operations (and memory_order_seq_cst operations), carry overheads in the several-10s of clocks -- 30?, 50?, more if the operation is contended (depending on the device).
So, where performance is critical, it may be worth understanding what's possible (and what isn't).

How can I synchronize this code in a more effective way?
You can't: the code is fundamentally inefficient.
The issue is that the amount of work that you do (incrementing an integer) is minuscule compared to the synchronization overhead, so the latter dominates.
To fix the problem, you need to do more work for each lock/unlock pair.
In a real program, you would have each thread perform 1000 or 10000 "work items" for each lock/unlock iteration. Something like:
lock;
const int start = *ctx->Counter;
*ctx->Counter += N;
unlock;
for (int j = start; j < start + N; j++) /* do work on j-th iteration here */;
But your toy program isn't amenable to this.
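For a workload that is amenable to it, the batching idea might look like the sketch below. It assumes the work items are independent; here "work" is just adding the item index to a sum. TOTAL_ITEMS, BATCH, NUM_WORKERS, worker and run_batched are hypothetical names chosen for this example.

```c
#include <pthread.h>

#define TOTAL_ITEMS 100000
#define BATCH 1000        /* items claimed per lock/unlock pair */
#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_item = 0;           /* first item not yet claimed */
static long long grand_total = 0;

static void *worker(void *arg) {
    (void)arg;
    long long local = 0;
    for (;;) {
        /* Claim a whole batch of items under the lock. */
        pthread_mutex_lock(&lock);
        int start = next_item;
        int end = start + BATCH;
        if (end > TOTAL_ITEMS)
            end = TOTAL_ITEMS;
        next_item = end;
        pthread_mutex_unlock(&lock);

        if (start >= TOTAL_ITEMS)
            break;                   /* nothing left to claim */

        /* Process the batch without holding the lock. */
        for (int j = start; j < end; j++)
            local += j;              /* stand-in for real work on item j */
    }
    pthread_mutex_lock(&lock);
    grand_total += local;            /* publish the private result once */
    pthread_mutex_unlock(&lock);
    return NULL;
}

long long run_batched(void) {
    pthread_t tid[NUM_WORKERS];
    next_item = 0;
    grand_total = 0;
    for (int t = 0; t < NUM_WORKERS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NUM_WORKERS; t++)
        pthread_join(tid[t], NULL);
    return grand_total;
}
```

With BATCH set to 1000, the mutex is taken roughly once per thousand items, so the synchronization cost is spread over much more useful work.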
Or maybe I can optimize the C mutex?
I suggest trying to implement a correct mutex first. You'll quickly discover that this is far from trivial.

Why are my threads I created not printed in order?

I have this program:
void *func(void *arg) {
pthread_mutex_lock(&mutex);
int *id = (int *)arg;
printf("My ID is %d\n" , *id);
pthread_mutex_unlock(&mutex);
}
int main() {
int i;
pthread_t tid[3];
// Let us create three threads
for (i = 0; i < 3; i++) {
pthread_create(&tid[i], NULL, func, (void *)&i);
}
for (i = 0; i < 3; i++) {
pthread_join(tid[i], NULL);
}
pthread_exit(NULL);
return 0;
}
I expected it to output this:
My ID is 0
My ID is 1
My ID is 2
But instead I get random output, such as this:
My ID is 0
My ID is 0
My ID is 2
Since I already added mutex lock, I thought it would solve the problem. What else did I do wrong? Is this related to race condition?

Here id points to the same variable i in main for all the threads.
int *id = (int *)arg;
printf("My ID is %d\n" , *id);
But the variable i is constantly being updated by the two for-loops in main behind the threads' backs. So before a thread reaches the printf, the value of i, and therefore also the value of *id, may have changed.
There are a few ways to solve this. The best way depends on the use case:
Wait in main until the thread signals that it has made a copy of *id before modifying i or letting it go out of scope.
Declare and initialize an array, int thread_id[], and create the threads like this:
pthread_create(&tid[i], NULL, func, &thread_id[i]);
malloc some memory and initialize it with a copy of i:
int *thread_id = malloc(sizeof(*thread_id));
*thread_id = i;
pthread_create(&tid[i], NULL, func, thread_id);
Just don't forget to free the memory in the thread when you are finished using it, or in main if the thread fails to start.
If i fits in a void *, you can pass its value directly as the parameter to the thread. To make sure it fits, you can declare it as intptr_t rather than int
(we basically abuse the fact that pointers are nothing more than magic integers):
void *func(void *arg) {
pthread_mutex_lock(&mutex);
// Here we interpret a pointer value as an integer value
intptr_t id = (intptr_t )arg;
printf("My ID is %d\n" , (int)id);
pthread_mutex_unlock(&mutex);
return NULL;
}
int main() {
intptr_t i;
pthread_t tid[3];
// Let us create three threads
for (i = 0; i < 3; i++) {
// Here we squeeze the integer value of `i` into something that is
// supposed to hold a pointer
pthread_create(&tid[i], NULL, func, (void *)i);
}
for (i = 0; i < 3; i++) {
pthread_join(tid[i], NULL);
}
// This does not belong here !!
// pthread_exit(NULL);
return 0;
}
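The array variant from the list above can be sketched the same way. Each thread receives a pointer to its own array element, which main never modifies after creating the thread. The seen array and the function run_threads are additions for this example so that the result can be checked; they are not part of the question's code.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 3

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int seen[NUM_THREADS];   /* how many times each ID was reported */

static void *func(void *arg) {
    int id = *(const int *)arg;  /* this element is never changed by main */
    pthread_mutex_lock(&mutex);
    printf("My ID is %d\n", id);
    seen[id]++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Starts the threads and returns how many distinct IDs were reported once. */
int run_threads(void) {
    pthread_t tid[NUM_THREADS];
    int thread_id[NUM_THREADS];  /* one slot per thread, outlives the threads */
    int distinct = 0;

    for (int i = 0; i < NUM_THREADS; i++)
        seen[i] = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        thread_id[i] = i;
        pthread_create(&tid[i], NULL, func, &thread_id[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        if (seen[i] == 1)
            distinct++;
    return distinct;
}
```

Every ID is now printed exactly once, but the order of the lines still depends on the scheduler; the mutex only keeps the threads from running the critical section at the same time.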

There can be a race condition on i, because all threads access it: each thread is started with a pointer to i. However, the main problem is that there is no guarantee that a thread will start and run its critical section while i holds the value you expect, or in the order you expect.
I'm assuming you declared the variable mutex globally and called pthread_mutex_init() somewhere to initialize it.
Mutexes are great to allow only one thread to access a critical section of code at a time. So the code as you've written creates all three threads to run in parallel, but only lets one thread at a time run the following code.
int *id = (int *)arg;
printf("My ID is %d\n" , *id);

How to use critical section

Hello, I would like to write a program with 2 concurrent threads. The first thread writes the letter 'A' to the array and the second one writes 'B'. My question is how to use a critical section so that the array alternately contains only the letter A and only the letter B. Here is my code, but it does not work properly. What is wrong with it?
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>
#include <psapi.h>
#define SIZE_TAB 200
volatile char program[SIZE_TAB];
CRITICAL_SECTION CriticalSection;
DWORD WINAPI aa(void *v);
DWORD WINAPI bb(void *v);
int main(int argc, char *argv[])
{
InitializeCriticalSection(&CriticalSection);
HANDLE thread_a = CreateThread(NULL, 0, aa, 0, 0, 0);
HANDLE thread_b = CreateThread(NULL, 0, bb, 0, 0, 0);
while (1)
{
for (int i = 0; i<SIZE_TAB; i++)
printf("%c", program[i]);
Sleep(1000);
printf("\n\n");
}
DeleteCriticalSection(&CriticalSection);
CloseHandle(thread_a);
CloseHandle(thread_b);
return 0;
}
DWORD WINAPI aa(void *v)
{
EnterCriticalSection(&CriticalSection);
for (int i = 0; i < SIZE_TAB; i++)
{
program[i] = 'A';
for (int j = 0; j<8000; j++);
}
LeaveCriticalSection(&CriticalSection);
}
DWORD WINAPI bb(void *v)
{
EnterCriticalSection(&CriticalSection);
for (int i = 0; i<SIZE_TAB; i++)
{
program[i] = 'B';
for (int j = 0; j<8000; j++);
}
LeaveCriticalSection(&CriticalSection);
}

Critical section is a way of protecting data in a multi-threaded program. Once one thread enters a critical section, another thread cannot enter that same critical section until the first thread leaves it.
You have three threads in play here: the main thread, aa and bb. You have ensured that threads aa and bb cannot access the same data at the same time by protecting it with a critical section, but you left it open for the main thread to access it at any time (in the main loop where you print out the array). The main thread is not modifying it, but it is accessing it, so it will print out whatever it finds in the array at the time: the first thread that entered the critical section may have finished modifying the data or it may have not. Furthermore, you have surrounded the entire function body with a critical section in both aa and bb, which means that the first thread to enter it will have fully run through the loop before the other thread gets the chance.
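The sketch below illustrates the fix for the reader side. Note the assumption: it uses a POSIX mutex as a stand-in for the Win32 CRITICAL_SECTION in the question, so it is an analogue and not the Windows API itself. The reader copies the array while holding the same lock the writers use, so it can never observe a half-written array. The names writer, snapshot_is_uniform, run_writers and ROUNDS are invented for this example.

```c
#include <pthread.h>
#include <string.h>

#define SIZE_TAB 200
#define ROUNDS 50     /* hypothetical number of fills and snapshots */

static char program[SIZE_TAB];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Fills the whole array with one letter per lock acquisition. */
static void *writer(void *arg) {
    char letter = *(const char *)arg;
    for (int r = 0; r < ROUNDS; r++) {
        pthread_mutex_lock(&lock);
        for (int i = 0; i < SIZE_TAB; i++)
            program[i] = letter;
        pthread_mutex_unlock(&lock);  /* released between fills so others get a turn */
    }
    return NULL;
}

/* Copies the array under the lock; returns 1 if every letter is the same. */
static int snapshot_is_uniform(void) {
    char copy[SIZE_TAB];
    pthread_mutex_lock(&lock);
    memcpy(copy, program, SIZE_TAB);
    pthread_mutex_unlock(&lock);
    for (int i = 1; i < SIZE_TAB; i++)
        if (copy[i] != copy[0])
            return 0;
    return 1;
}

/* Returns 1 if every snapshot taken by the reader was all 'A' or all 'B'. */
int run_writers(void) {
    pthread_t ta, tb;
    char a = 'A', b = 'B';
    int uniform = 1;

    memset(program, 'A', SIZE_TAB);
    pthread_create(&ta, NULL, writer, &a);
    pthread_create(&tb, NULL, writer, &b);
    for (int r = 0; r < ROUNDS; r++)
        uniform &= snapshot_is_uniform();
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return uniform;
}
```

This guarantees consistent snapshots only. It does not force the writers to take strict turns; for that, a turn variable with a condition variable (or events on Windows) would be needed in addition to the lock.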

Compute the summation of a given interval using multiple threads

For my homework, I need to compute the squares of integers in the interval (0,N) (e.g. (0,50)) in such a way that the load is distributed equally among threads (e.g. 5 threads). I have been advised to split the interval into small chunks and assign them to the threads. For that, I am using a queue. Here's my code:
#include <stdio.h>
#include <pthread.h>
#define QUEUE_SIZE 50
typedef struct {
int q[QUEUE_SIZE];
int first,last;
int count;
} queue;
void init_queue(queue *q)
{
q->first = 0;
q->last = QUEUE_SIZE - 1;
q->count = 0;
}
void enqueue(queue *q,int x)
{
q->last = (q->last + 1) % QUEUE_SIZE;
q->q[ q->last ] = x;
q->count = q->count + 1;
}
int dequeue(queue *q)
{
int x = q->q[ q->first ];
q->first = (q->first + 1) % QUEUE_SIZE;
q->count = q->count - 1;
return x;
}
queue q; //declare the queue data structure
void* threadFunc(void* data)
{
int my_data = (int)data; /* data received by thread */
int sum=0, tmp;
while (q.count)
{
tmp = dequeue(&q);
sum = sum + tmp*tmp;
usleep(1);
}
printf("SUM = %d\n", sum);
printf("Hello from new thread %u - I was created in iteration %d\n",pthread_self(), my_data);
pthread_exit(NULL); /* terminate the thread */
}
int main(int argc, char* argv[])
{
init_queue(&q);
int i;
for (i=0; i<50; i++)
{
enqueue(&q, i);
}
pthread_t *tid = malloc(5 * sizeof(pthread_t) );
int rc; //return value
for(i=0; i<5; i++)
{
rc = pthread_create(&tid[i], NULL, threadFunc, (void*)i);
if(rc) /* could not create thread */
{
printf("\n ERROR: return code from pthread_create is %u \n", rc);
return(-1);
}
}
for(i=0; i<5; i++)
{
pthread_join(tid[i], NULL);
}
}
The output is not always correct. Most of the time it is correct, 40425, but sometimes the value is bigger. Is it because the threads are running in parallel and accessing the queue at the same time (the processor in my laptop is an Intel i7)? I would appreciate feedback on my concerns.

I think contrary to what some of the other people here suggested, you don't need any synchronization primitives like semaphores or mutexes at all. Something like this:
Given some array like
int values[50];
I'd create a couple of threads (say: 5), each of which gets a pointer to a struct holding a pointer to its starting position in the values array and the number of squares to compute, like
typedef struct ThreadArgs {
int *values;
size_t numSquares;
} ThreadArgs;
You can then start your threads, each of which is told to process 10 numbers:
for ( i = 0; i < 5; ++i ) {
ThreadArgs *args = malloc( sizeof( ThreadArgs ) );
args->values = values + 10 * i;
args->numSquares = 10;
pthread_create( ...., threadFunc, args );
}
Each thread then simply computes the squares it was assigned, like:
void *threadFunc( void *data )
{
ThreadArgs *args = data;
int i;
for ( i = 0; i < args->numSquares; ++i ) {
args->values[i] = args->values[i] * args->values[i];
}
free( args );
return NULL;
}
At the end, you'd just use a pthread_join to wait for all threads to finish, after which you have your squares in the values array.
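Put together, the partitioning approach could look like the following sketch. It assumes the values are simply 0..49 and that the final sum is computed in the calling thread after all joins. To keep it short, the ThreadArgs structs live in an array instead of being allocated with malloc, and sum_of_squares is a hypothetical wrapper name.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_VALUES 50
#define NUM_THREADS 5

typedef struct ThreadArgs {
    int *values;        /* start of this thread's slice */
    size_t numSquares;  /* number of elements in the slice */
} ThreadArgs;

/* Squares the elements of its own slice; slices never overlap. */
static void *threadFunc(void *data) {
    ThreadArgs *args = data;
    for (size_t i = 0; i < args->numSquares; ++i)
        args->values[i] = args->values[i] * args->values[i];
    return NULL;
}

/* Squares 0..49 in five threads and returns the sum of the squares. */
long sum_of_squares(void) {
    int values[NUM_VALUES];
    ThreadArgs args[NUM_THREADS];
    pthread_t tid[NUM_THREADS];
    size_t chunk = NUM_VALUES / NUM_THREADS;
    long sum = 0;

    for (int i = 0; i < NUM_VALUES; ++i)
        values[i] = i;
    for (int t = 0; t < NUM_THREADS; ++t) {
        args[t].values = values + chunk * t;
        args[t].numSquares = chunk;
        pthread_create(&tid[t], NULL, threadFunc, &args[t]);
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(tid[t], NULL);   /* after this, all squares are written */
    for (int i = 0; i < NUM_VALUES; ++i)
        sum += values[i];
    return sum;
}
```

No mutex is needed because each thread writes only to its own slice and the main thread reads the array only after every thread has been joined.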

All your threads read from the same queue. This leads to a race condition. For instance, if the number 10 is read simultaneously by two threads, your result will be off by 100. You should protect your queue with a mutex. Put the following print in the dequeue function to see which numbers are repeated:
printf("Dequeuing %d in thread %lu\n", x, (unsigned long)pthread_self());
Your code doesn't show where the results are accumulated to a single variable. You should protect that variable with a mutex as well.
Alternatively, you can pass the start number as the input parameter to each thread from the loop, so that each thread works on its own set of numbers. The first thread will work on 1-10, the second one on 11-20, and so on. In this approach, you need a mutex only around the part where the threads update the global sum variable at the end of their execution.

First you need to define what it means to be "distributed equally among threads." If you mean that each thread does the same amount of work as the other threads, then I would create a single queue, put all the numbers in the queue, and start all threads (which run the same code). Each thread tries to get a value from the queue, which must be protected by a mutex unless it is thread safe, calculates the partial answer from the value taken from the queue, and adds the result to the total, which must also be protected by a mutex.
If you mean that each thread will execute an equal number of times as each of the other threads, then you need to make a priority queue and put all the numbers in the queue along with the thread number that should compute on it. Each thread then tries to get a value from the queue that matches its thread number.
From the thread's point of view, it should try to get a value from the queue, do the work, then try to get another value. If there are no more values to get, then the thread should exit. The main program does a join on all threads and the program exits when all threads have exited.
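A rough sketch of the single shared queue variant follows. It assumes the queue is filled completely before any worker starts, so an empty queue means the work is done. The names try_dequeue, worker and run_queue_workers are hypothetical, and each thread adds its partial sum to the total once at the end instead of after every item, which is a small deviation from the description above.

```c
#include <pthread.h>

#define QUEUE_SIZE 50
#define NUM_THREADS 5

typedef struct {
    int q[QUEUE_SIZE];
    int first;   /* index of the next item to hand out */
    int count;   /* number of items still in the queue */
} queue;

static queue work;
static long total;
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 and stores an item in *out, or returns 0 if the queue is empty.
   The emptiness check and the removal happen under the same lock. */
static int try_dequeue(int *out) {
    int ok = 0;
    pthread_mutex_lock(&queue_mutex);
    if (work.count > 0) {
        *out = work.q[work.first];
        work.first = (work.first + 1) % QUEUE_SIZE;
        work.count--;
        ok = 1;
    }
    pthread_mutex_unlock(&queue_mutex);
    return ok;
}

static void *worker(void *arg) {
    (void)arg;
    int x;
    long partial = 0;
    while (try_dequeue(&x))
        partial += (long)x * x;
    pthread_mutex_lock(&total_mutex);
    total += partial;                 /* protected update of the shared total */
    pthread_mutex_unlock(&total_mutex);
    return NULL;
}

/* Fills the queue with 0..49, runs the workers, returns the sum of squares. */
long run_queue_workers(void) {
    pthread_t tid[NUM_THREADS];
    work.first = 0;
    work.count = QUEUE_SIZE;
    for (int i = 0; i < QUEUE_SIZE; i++)
        work.q[i] = i;
    total = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return total;
}
```

The important detail is that checking for emptiness and removing an item are one atomic step. Testing q.count outside the lock and dequeuing afterwards, as the question's while (q.count) loop does, would leave a window in which two threads take the same item.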