I saw a blog stating that the code below is thread safe, but with the check on count outside the mutex I think it can corrupt data: two threads could both pass the check before either acquires the mutex, and then, once one of them completes, the other would blindly add one more value to the array.
char arr[10];
int count = 0;

int func(char ctr)
{
    int i = 0;
    if (count >= sizeof(arr))
    {
        printf("\n No storage\n");
        return -1;
    }
    /* lock the mutex here */
    arr[count] = ctr;
    count++;
    /* unlock the mutex here */
    return count;
}
Would I be right if I made the following changes? Or is there a better/efficient way to do it
int func(char ctr)
{
    int i = 0;
    /* lock the mutex here */
    if (count >= sizeof(arr))
    {
        printf("\n No storage\n");
        /* unlock the mutex here */
        return -1;
    }
    arr[count] = ctr;
    count++;
    /* unlock the mutex here */
    return count;
}
You are correct. By doing the check outside of the critical section you are opening the doors for a possible buffer overrun. However, note that the returned count may not be the same index used to store ctr. That's an issue even in the corrected code.
In order to remedy that you could rewrite it like this:
int func(char ctr)
{
    /* lock the mutex here */
    if (count >= sizeof(arr)) {
        printf("\n No storage\n");
        /* unlock the mutex here */
        return -1;
    }
    arr[count] = ctr;
    int c = count++;
    /* unlock the mutex here */
    return c;
}
It's worth noting that, if that's the only function changing "count", then no two threads would be able to change the same memory position in arr and this would actually be safe:
int func(char ctr)
{
    /* lock the mutex here */
    if (count >= sizeof(arr)) {
        printf("\n No storage\n");
        /* unlock the mutex here */
        return -1;
    }
    int c = count++;
    /* unlock the mutex here */
    arr[c] = ctr;
    return c;
}
If that's the pattern, maybe you can refactor that code into two functions, like so:
int get_sequence(void)
{
    /* lock the mutex here */
    int c = count++;
    /* unlock the mutex here */
    return c;
}

int func(char ctr)
{
    int c = get_sequence();
    if (c >= sizeof(arr)) {
        printf("\n No storage\n");
        return -1;
    }
    arr[c] = ctr;
    return c;
}
Note that will only work as long as get_sequence is really the only function changing count variable.
First, you are correct that the code from the blog has the potential to write beyond the end of the array. The limit checking only works if it's done after the mutex has been acquired.
Here's how I would write the function:
bool func(char ctr)
{
    bool result;
    /* lock the mutex here */
    if (count >= sizeof(arr))
    {
        result = FAILURE;
    }
    else
    {
        arr[count] = ctr;
        count++;
        result = SUCCESS;
    }
    /* unlock the mutex here */
    if (result == FAILURE)
        printf("\n No storage\n");
    return result;
}
The features of this code worth noting:
The mutex lock and unlock appear only once in the function, and there are no return statements in the critical section. This makes it clear that the mutex will always be unlocked.
The printf is outside of the critical section. printf is relatively slow, and any function that uses a mutex should hold the mutex for as little time as possible.
IMO the function shouldn't return a count, but rather should only return a bool indicating success or failure. Any code that needs to know how many entries are in the array should lock the mutex and examine the count directly.
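The snippet above assumes a few supporting definitions that aren't shown; presumably something like:
#include <stdbool.h>
#include <stdio.h>

/* hypothetical result values used by the function above */
#define SUCCESS true
#define FAILURE false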
Nothing wrong with the previous answers, but there is a better way. A mutex is not needed.
int func(char ctr) {
    int c = interlocked_increment(&count);
    if (c >= sizeof(arr)) {
        printf("\n No storage\n");
        interlocked_decrement(&count);
        return -1;
    }
    arr[c - 1] = ctr;
    return c;
}
This depends on the availability of interlocked increment and decrement functions, which have to be provided by your operating system or third party library.
Every value of c that is within range is guaranteed to be one not seen by any other thread, and is therefore a valid slot in arr, and no thread will miss a slot if there is one available. The order of storing the value is indeterminate, but that is true of most of the other solutions too. The maximum value reached by count if many threads are competing for storage is also indeterminate, and if that is an issue a different approach might be needed. The behaviour if count is decremented is another unknown. This stuff is hard, and it's always possible to add additional constraints to make it harder.
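If a C11 toolchain is available, <stdatomic.h> provides the same capability portably; here is a sketch of the equivalent (it assumes count is declared _Atomic, which is not how the original question declares it):

#include <stdatomic.h>
#include <stdio.h>

char arr[10];
atomic_int count;   /* replaces the plain int count */

int func(char ctr) {
    /* atomic_fetch_add returns the value *before* the increment */
    int c = atomic_fetch_add(&count, 1);
    if (c >= (int)sizeof(arr)) {
        printf("\n No storage\n");
        atomic_fetch_sub(&count, 1);   /* give the reservation back */
        return -1;
    }
    arr[c] = ctr;
    return c + 1;   /* mirror the interlocked version, which returns the new count */
}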
Just to point out that there is another possible implementation based on a CSET (Check and Set) function. This is a function that checks whether some location is equal to a value and if so sets it to another value atomically, returning true if so. It avoids some of the criticism leveled at the above function.
int func(char ctr) {
    for (;;) {
        int c = count;
        if (c >= sizeof(arr)) {
            printf("\n No storage\n");
            return -1;
        }
        if (CSET(&count, c, c + 1)) {
            arr[c] = ctr;
            return c;
        }
    }
}
The C++ standard atomic operations library contains a set of atomic_compare_exchange functions which should serve the purpose, if available.
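In plain C, the same CSET primitive could be approximated with C11's atomic_compare_exchange_strong; a sketch, assuming count is declared atomic_int:

#include <stdatomic.h>
#include <stdbool.h>

/* returns true if *loc still held 'expected' and was atomically set to 'desired' */
static bool CSET(atomic_int *loc, int expected, int desired) {
    /* atomic_compare_exchange_strong writes the observed value back into
       'expected' on failure; this wrapper only reports success or failure */
    return atomic_compare_exchange_strong(loc, &expected, desired);
}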
Despite the number of similar questions on Stackoverflow, I can't come up with a solution for the following Producer-Consumer problem:
My program has three threads:
One writer thread that reads from a file, saves data into a sensor_data_t struct, and writes it into a dynamic pointer-based buffer using sbuffer_insert(buffer, &sensor_data). Once this thread finishes reading it sends an end-of-stream data struct represented by data->id == 0.
Two reader threads that remove data from the buffer head (FIFO-style), and store it into a temporary data struct using sbuffer_remove(buffer, &data) and then print it to the cmd line for testing purposes.
I think I have to avoid at the least:
My reader threads to try to consume/remove from the buffer while it is empty.
My reader threads to consume/remove from the buffer at the same time.
On the other hand, I don't think my writer thread in sbuffer_insert() needs to worry if the readers are changing the head because it only appends to the tail. Is this reasoning correct or am I missing something?
Here's what I've done so far:
My main function:
sbuffer_t *buffer;
void *writer(void *fp);
void *reader(void *fp);
int main()
{
// Initialize the buffer
sbuffer_init(&buffer);
// Open sensor_data file
FILE *sensor_data_fp;
sensor_data_fp = fopen("sensor_data", "rb");
// Start thread for reading sensor_data file adding elements to the sbuffer
pthread_t writer_thread;
pthread_create(&writer_thread, NULL, &writer, sensor_data_fp);
// Open sensor_data_out file
FILE *sensor_data_out_fp;
sensor_data_out_fp = fopen("sensor_data_out", "w");
// Start thread 1 and 2 for writing sensor_data_out file
pthread_t reader_thread1;
pthread_t reader_thread2;
pthread_create(&reader_thread1, NULL, &reader, sensor_data_out_fp);
pthread_create(&reader_thread2, NULL, &reader, sensor_data_out_fp);
// Wait for threads to finish and join them
pthread_join(reader_thread1, NULL);
pthread_join(reader_thread2, NULL);
pthread_join(writer_thread, NULL);
// Close sensor_data file
fclose(sensor_data_fp);
// Close sensor_data_out file
fclose(sensor_data_out_fp);
// free buffer
sbuffer_free(&buffer);
return 0;
}
My reader and writer threads:
typedef uint16_t sensor_id_t;
typedef double sensor_value_t;
typedef time_t sensor_ts_t; // UTC timestamp as returned by time() - notice that the size of time_t is different on 32/64 bit machine
typedef struct {
sensor_id_t id;
sensor_value_t value;
sensor_ts_t ts;
} sensor_data_t;
void *writer(void *fp)
{
// cast fp to FILE
FILE *sensor_data_fp = (FILE *)fp;
// make char buffers of size sensor_id_t, sensor_value_t and sensor_ts_t
char sensor_id_buffer[sizeof(sensor_id_t)];
char sensor_value_buffer[sizeof(sensor_value_t)];
char sensor_ts_buffer[sizeof(sensor_ts_t)];
// parse sensor_data file into sensor_id_buffer, sensor_value_buffer and sensor_ts_buffer
while(fread(sensor_id_buffer, sizeof(sensor_id_t), 1, sensor_data_fp) == 1 &&
fread(sensor_value_buffer, sizeof(sensor_value_t), 1, sensor_data_fp) == 1 &&
fread(sensor_ts_buffer, sizeof(sensor_ts_t), 1, sensor_data_fp)) {
// create sensor_data_t
sensor_data_t sensor_data;
// copy sensor_id_buffer to sensor_data.id
memcpy(&sensor_data.id, sensor_id_buffer, sizeof(sensor_id_t));
// copy sensor_value_buffer to sensor_data.value
memcpy(&sensor_data.value, sensor_value_buffer, sizeof(sensor_value_t));
// copy sensor_ts_buffer to sensor_data.ts
memcpy(&sensor_data.ts, sensor_ts_buffer, sizeof(sensor_ts_t));
// print sensor_data for testing
// printf("sensor_data.id: %d, sensor_data.value: %f, sensor_data.ts: %ld\n", sensor_data.id, sensor_data.value, sensor_data.ts);
// insert sensor_data into buffer
sbuffer_insert(buffer, &sensor_data);
}
// Add dummy data to buffer to signal end of file
sensor_data_t sensor_data;
sensor_data.id = 0;
sensor_data.value = 0;
sensor_data.ts = 0;
sbuffer_insert(buffer, &sensor_data);
return NULL;
}
void *reader(void *fp)
{
// cast fp to FILE
//FILE *sensor_data_out_fp = (FILE *)fp;
// Init data as sensor_data_t
sensor_data_t data;
do{
// read data from buffer
if (sbuffer_remove(buffer, &data) == 0) { // SBUFFER_SUCCESS 0
// write data to sensor_data_out file
// fwrite(&data, sizeof(sensor_data_t), 1, sensor_data_out_fp);
// print data for testing
printf("data.id: %d, data.value: %f, data.ts: %ld \n", data.id, data.value, data.ts);
}
}
while(data.id != 0);
// free allocated memory
// free(fp);
return NULL;
}
Global variables and buffer initialization:
typedef struct sbuffer_node {
struct sbuffer_node *next;
sensor_data_t data;
} sbuffer_node_t;
struct sbuffer {
sbuffer_node_t *head;
sbuffer_node_t *tail;
};
pthread_mutex_t mutex;
pthread_cond_t empty, removing;
int count = 0; // reader count
int sbuffer_init(sbuffer_t **buffer) {
*buffer = malloc(sizeof(sbuffer_t));
if (*buffer == NULL) return SBUFFER_FAILURE;
(*buffer)->head = NULL;
(*buffer)->tail = NULL;
// Initialize mutex and condition variables
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&empty, NULL);
pthread_cond_init(&removing, NULL);
return SBUFFER_SUCCESS;
}
sbuffer_remove (Consumer)
int sbuffer_remove(sbuffer_t *buffer, sensor_data_t *data) {
sbuffer_node_t *dummy;
if (buffer == NULL) return SBUFFER_FAILURE;
// while the count is 0, wait
pthread_mutex_lock(&mutex);
while (count > 0) {
pthread_cond_wait(&removing, &mutex);
}
pthread_mutex_unlock(&mutex);
pthread_mutex_lock(&mutex);
while (buffer->head == NULL){
pthread_cond_wait(&empty, &mutex); // Wait until buffer is not empty
if (data->id == 0){ // end-of-stream
pthread_mutex_unlock(&mutex);
return SBUFFER_NO_DATA;
}
}
count++;
*data = buffer->head->data;
dummy = buffer->head;
if (buffer->head == buffer->tail) // buffer has only one node
{
buffer->head = buffer->tail = NULL;
} else // buffer has many nodes empty
{
buffer->head = buffer->head->next;
}
free(dummy);
count--;
pthread_cond_signal(&removing); // Signal that data is removed
pthread_mutex_unlock(&mutex);
return SBUFFER_SUCCESS;
}
sbuffer_insert (Producer)
int sbuffer_insert(sbuffer_t *buffer, sensor_data_t *data) {
sbuffer_node_t *dummy;
if (buffer == NULL) return SBUFFER_FAILURE;
dummy = malloc(sizeof(sbuffer_node_t));
if (dummy == NULL) return SBUFFER_FAILURE;
dummy->data = *data;
dummy->next = NULL;
if (buffer->tail == NULL) // buffer empty (buffer->head should also be NULL
{
pthread_mutex_lock(&mutex);
buffer->head = buffer->tail = dummy;
pthread_cond_signal(&empty); // Signal that buffer is not empty
pthread_mutex_unlock(&mutex);
} else // buffer not empty
{
buffer->tail->next = dummy;
buffer->tail = buffer->tail->next;
}
return SBUFFER_SUCCESS;
}
Currently, the code has very unstable behavior. Sometimes it runs and prints everything, sometimes it doesn't print anything and gets stuck in a loop, sometimes it prints everything but the last value comes after the end-of-stream code and it doesn't terminate.
I would really appreciate a solution that explains what I'm doing wrong or a comment that redirects me to a duplicate of my question.
I think I have to avoid at the least:
My reader threads to try to consume/remove from the buffer while it is empty.
My reader threads to consume/remove from the buffer at the same time.
Yes, you must avoid those. And more.
On the other hand, I don't think my writer thread in sbuffer_insert()
needs to worry if the readers are changing the head because it only
appends to the tail. Is this reasoning correct or am I missing
something?
You are missing at least that
when the buffer contains fewer than two nodes, there is no distinction between the head node and the tail node. This manifests at the code level at least in the fact that your sbuffer_insert() and sbuffer_remove() functions both access both buffer->head and buffer->tail. From the perspective of synchronization requirements, it is this lower-level view that matters.
Insertion and removal modifies the node objects themselves, not just the overall buffer object.
Synchronization is not just, nor even primarily, about avoiding threads directly interfering with each other. It is even more about the consistency of different threads' views of memory. You need appropriate synchronization to ensure that one thread's writes to memory are (ever) observed by other threads, and to establish ordering relationships among operations on memory by different threads.
Currently, the code has very unstable behavior. Sometimes it runs and
prints everything, sometimes it doesn't print anything and gets stuck
in a loop, sometimes it prints everything but the last value comes
after the end-of-stream code and it doesn't terminate.
This is unsurprising, because your program contains data races, and its behavior is therefore undefined.
Do ensure that neither the reader nor the writer accesses any member of the buffer object without holding the mutex locked. As the code is presently structured, that will synchronize access not only to the buffer structure itself, but also to the data in the nodes, which presently are involved in their own data races.
Now note that here ...
while (buffer->head == NULL){
pthread_cond_wait(&empty, &mutex); // Wait until buffer is not empty
if (data->id == 0){ // end-of-stream
pthread_mutex_unlock(&mutex);
return SBUFFER_NO_DATA;
}
}
... you are testing for an end-of-data marker before actually reading an item from the buffer. That check is not useful in practice. In particular, it does not prevent the end-of-stream item from being removed from the buffer, so only one reader will see it. The other(s) will then end up waiting indefinitely for data that will never arrive.
Next, consider this code executed by the readers:
// while the count is 0, wait
pthread_mutex_lock(&mutex);
while (count > 0) {
pthread_cond_wait(&removing, &mutex);
}
pthread_mutex_unlock(&mutex);
Note well that the reader unlocks the mutex while count is 0, so there is an opportunity for another reader to reach the while loop and pass through. I'm not sure that two threads both getting past that point at the same time produces a problem in practice, but the point seems to be to avoid that, so do it right: move the count++ from later in the function to right after the while loop, prior to unlocking the mutex.
Alternatively, once you've done that, it should be clear(er) that you've effectively hand-rolled a binary semaphore. You could simplify your code by switching to an actual POSIX semaphore for this purpose. Or if you want to continue with a mutex + CV for this, then consider using a different mutex, as the data to be protected for this purpose are disjoint from the buffer and its contents. That would get rid of the weirdness of re-locking the mutex immediately after unlocking it.
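For reference, a POSIX-semaphore version of that reader gate might look like the sketch below (sbuffer_remove_gated is just a hypothetical wrapper name); as the next paragraph argues, though, you probably don't need this gate at all.

#include <semaphore.h>

static sem_t reader_gate;   /* sem_init(&reader_gate, 0, 1) once, e.g. in sbuffer_init() */

int sbuffer_remove_gated(sbuffer_t *buffer, sensor_data_t *data) {
    sem_wait(&reader_gate);                  /* at most one reader in here at a time */
    int rc = sbuffer_remove(buffer, data);   /* the mutex-protected body as before */
    sem_post(&reader_gate);
    return rc;
}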
Or on the third hand, consider whether you need to do any of that at all. How is the (separate) mutex protection of the rest of the body of sbuffer_remove() not sufficient by itself? I propose to you that it is sufficient. After all, you're presently using your hand-rolled semaphore exactly as (another) mutex.
The bones of this code seem reasonably good, so I don't think repairs will be too hard.
First, add the needed mutex protection in sbuffer_insert(). Or really, just expand the scope of the critical section that's already there:
int sbuffer_insert(sbuffer_t *buffer, sensor_data_t *data) {
    sbuffer_node_t *dummy;
    if (buffer == NULL) return SBUFFER_FAILURE;
    dummy = malloc(sizeof(sbuffer_node_t));
    if (dummy == NULL) return SBUFFER_FAILURE;
    dummy->data = *data;
    dummy->next = NULL;
    pthread_mutex_lock(&mutex);
    if (buffer->tail == NULL) // buffer empty (buffer->head should also be NULL)
    {
        assert(buffer->head == NULL);
        buffer->head = buffer->tail = dummy;
        pthread_cond_signal(&empty); // Signal that buffer is not empty
    } else // buffer not empty
    {
        buffer->tail->next = dummy;
        buffer->tail = buffer->tail->next;
    }
    pthread_mutex_unlock(&mutex);
    return SBUFFER_SUCCESS;
}
Second, simplify and fix sbuffer_remove():
int sbuffer_remove(sbuffer_t *buffer, sensor_data_t *data) {
    if (buffer == NULL) {
        return SBUFFER_FAILURE;
    }
    pthread_mutex_lock(&mutex);
    // Wait until the buffer is nonempty
    while (buffer->head == NULL) {
        pthread_cond_wait(&empty, &mutex);
    }
    // Copy the first item from the buffer
    *data = buffer->head->data;
    if (data->id == 0) {
        // end-of-stream: leave the item in the buffer for other readers to see
        pthread_mutex_unlock(&mutex);
        pthread_cond_signal(&empty); // other threads can consume this item
        return SBUFFER_NO_DATA;
    } // else remove the item
    sbuffer_node_t *dummy = buffer->head;
    buffer->head = buffer->head->next;
    if (buffer->head == NULL) {
        // buffer is now empty
        buffer->tail = NULL;
    }
    pthread_mutex_unlock(&mutex);
    free(dummy);
    return SBUFFER_SUCCESS;
}
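With that version, the reader thread's loop should treat SBUFFER_NO_DATA as its termination signal instead of inspecting data.id; roughly (a sketch reusing your names):

void *reader(void *fp)
{
    sensor_data_t data;
    int rc;
    do {
        rc = sbuffer_remove(buffer, &data);
        if (rc == SBUFFER_SUCCESS) {
            printf("data.id: %d, data.value: %f, data.ts: %ld \n",
                   data.id, data.value, data.ts);
        }
    } while (rc != SBUFFER_NO_DATA);   /* every reader eventually sees the marker */
    return NULL;
}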
Not a complete answer here, but I see this in your sbuffer_remove function:
// while the count is 0, wait
pthread_mutex_lock(&mutex);
while (count > 0) {
pthread_cond_wait(&removing, &mutex);
}
pthread_mutex_unlock(&mutex);
That looks suspicious to me. What is the purpose of waiting for the count to become zero? Your code waits for the count to become zero, but then it does nothing else before it unlocks the mutex.
I don't know what count represents, but if the other reader thread is concurrently manipulating it, then there is no guarantee that it will still be zero once you've unlocked the mutex.
But, maybe that hasn't caused a problem for you because...
...This also looks suspicious:
pthread_mutex_unlock(&mutex);
pthread_mutex_lock(&mutex);
Why do you unlock the mutex and immediately lock it again? Are you thinking that will afford the other consumer a chance to lock it? Technically speaking, it does that, but in practical terms, the chance it offers is known as, "a snowball's chance in Hell." If thread A is waiting for a mutex that is locked by thread B, and thread B unlocks and then immediately tries to re-lock, then in most languages/libraries/operating systems, thread B will almost always succeed while thread A goes back to try again.
Mutexes work best when they are rarely contested. If you have a program in which threads spend any significant amount of time waiting for mutexes, then you probably are not getting much benefit from using multiple threads.
Here's my code:
#define COUNT_TO 100000000
#define MAX_CORES 4
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
long long i = 0;
void* start_counting(void *arg){
for(;;){
pthread_mutex_lock(&mutex);
if(i >= COUNT_TO){
pthread_mutex_unlock(&mutex);
return NULL;
}
i++;
pthread_mutex_unlock(&mutex);
//printf("i = %lld\n", i);
}
}
int main(int argc, char* argv[]){
int i = 0;
pthread_t * thread_group = malloc(sizeof(pthread_t) * MAX_CORES);
for(i = 0; i < MAX_CORES; i++){
pthread_create(&thread_group[i], NULL, start_counting, NULL);
}
for(i = 0; i < MAX_CORES; i++){
pthread_join(thread_group[i], NULL);
}
return 0;
}
This is what your threads do:
Read the value of i.
Increment the value we read.
Write back the incremented value of i.
Go to step 1.
Clearly, another thread cannot read the value of i after a different thread has accomplished step 1 but before it has completed step 3. So there can be no overlap between two threads doing steps 1, 2, or 3.
So all your threads are fighting over access to the same resource -- i (or the mutex that protects it). No thread can make useful forward progress without exclusive access to one or both of those. Given that, there is no benefit to using multiple threads since only one of them can accomplish useful work at a time.
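If each increment were accompanied by real, independent work, the usual fix is to claim a batch of iterations under one lock acquisition and do the work outside the lock. A rough sketch using the question's globals (the function name and batch size are arbitrary):

void* start_counting_batched(void *arg){
    for(;;){
        long long start, end;
        pthread_mutex_lock(&mutex);
        if(i >= COUNT_TO){
            pthread_mutex_unlock(&mutex);
            return NULL;
        }
        start = i;
        end = start + 100000;              /* hypothetical batch size */
        if(end > COUNT_TO) end = COUNT_TO;
        i = end;                           /* claim [start, end) while holding the lock */
        pthread_mutex_unlock(&mutex);
        for(long long k = start; k < end; k++){
            /* per-item work for iteration k would go here, outside the lock */
        }
    }
}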
I have this broken code.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define MAX 1000
struct TContext {
const char* Name;
int* Counter;
int Mod;
};
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
printf("%d ", (*counter)++);
}
}
pthread_exit(0);
}
int main()
{
pthread_t t1;
pthread_t t2;
int counter = 0;
struct TContext ctxt1 = {"even", &counter, 0};
struct TContext ctxt2 = {"odd", &counter, 1};
pthread_create(&t1, 0, ThreadFunc, &ctxt1);
pthread_create(&t2, 0, ThreadFunc, &ctxt2);
pthread_join(t1, 0);
pthread_join(t2, 0);
printf("\n");
return 0;
}
My aim is to synchronize it and get the sequence 0, 1, 2, 3, 4, 5, ... .
I tried to lock and unlock a mutex this way:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
pthread_mutex_lock(&mutex);
printf("%d ", (*counter)++);
pthread_mutex_unlock(&mutex);
}
}
pthread_exit(0);
}
But it runs very slowly (I exceed the one-second time limit).
How can I synchronize this code more efficiently? Or maybe the mutex usage can be optimized?
A slightly more traditional way than Chris Hall's is:
pthread_cond_t cv;
pthread_mutex_t lock;
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
pthread_mutex_lock(&lock);
while (*counter < MAX) {
if (*counter % 2 == ctxt->Mod) {
printf("%d ", (*counter)++);
pthread_cond_broadcast(&cv);
} else {
pthread_cond_wait(&cv, &lock);
}
}
pthread_mutex_unlock(&lock);
pthread_exit(0);
}
and in main:
pthread_mutex_init(&lock, 0);
pthread_cond_init(&cv, 0);
somewhere before creating the threads. This also lets you add an arbitrary number of even + odd threads without interference (although with no speedup; it's just an intellectual curiosity).
I suggest:
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
volatile int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (1)
{
int count ;
count = *counter ; // NB: volatile*
if (count >= MAX)
break ;
if ((count % 2) == ctxt->Mod)
{
printf("%d ", count) ;
*counter = count + 1 ;
} ;
} ;
pthread_exit(0);
}
Which, for x86/x86_64 at least, will have the effect I think you were looking for, namely that the two threads take turns in incrementing the counter.
The really interesting question is why this works :-)
Postscript
The code above depends, critically, on four things:
there is only one value being shared between the threads -- the counter,
the counter is simultaneously data and control -- the least-significant bit of the counter signals which thread should proceed.
reading and writing the counter must be atomic -- so every read of the counter reads the last value written (and not some combination of the previous and current write).
the compiler must emit code to actually read/write the counter from/to memory inside the loop.
Now (1) and (2) are specific to this particular problem. (3) is generally true for int (though may require correct alignment). (4) is achieved by defining the counter as volatile.
So, I originally said that this would work "for x86/x86_64 at least" because I know (3) is true for those devices, but I also believe it is true for many (most ?) common devices.
A cleaner implementation would define the counter _Atomic, as follows:
#include <stdatomic.h>
void* ThreadFunc(void* arg) {
struct TContext* ctxt = arg;
atomic_int* counter = ctxt->Counter;
fprintf(stderr, "This is %s thread\n", ctxt->Name);
while (1)
{
int count ;
count = atomic_load_explicit(counter, memory_order_relaxed) ;
if (count > MAX) // printing up to and including MAX
break ;
if ((count % 2) == ctxt->Mod)
{
printf("%d ", count) ;
atomic_store_explicit(counter, count + 1, memory_order_relaxed) ;
} ;
} ;
pthread_exit(0);
}
Which makes (3) and (4) explicit. But note that (1) and (2) still mean that we don't need any memory ordering. Every time each thread reads the counter, bit0 tells it whether it "owns" the counter. If it does not own the counter, the thread loops to read it again. If it does own the counter, it uses the value and then writes a new value -- and because that passes "ownership" it returns to the read loop (it cannot do anything further with the counter until it "owns" it again). Once MAX+1 has been written to the counter neither thread will use or change it, so that's safe too.
Brother Employed Russian is correct, there is a "data race" here, but that is resolved by a data dependency, particular to this case.
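For that _Atomic variant to compile, the shared counter and the struct field would presumably need matching types; something like the following (a sketch, not part of the original question):

#include <stdatomic.h>

struct TContext {
    const char* Name;
    atomic_int* Counter;   /* was int* in the original struct */
    int Mod;
};

/* in main():
   atomic_int counter = 0;
   struct TContext ctxt1 = {"even", &counter, 0};
   struct TContext ctxt2 = {"odd",  &counter, 1};
*/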
More Generally
The code above is not terribly useful, unless you have other applications with a single shared value. But this can be generalised, using memory_order_acquire and memory_order_release atomic operations.
Suppose we have some struct shared which contains some (non-trivial) amount of data which one thread will produce and another will consume. Suppose we again use atomic_uint counter (initially zero) to manage access to a given struct shared parcel. Now we have a producer thread which:
void* ThreadProducerFunc(void* arg)
{
    atomic_uint *counter = &count ;   // somehow
    ....
    while (1)
    {
        unsigned count ;
        do
            count = atomic_load_explicit(counter, memory_order_acquire) ;
        while ((count & 1) == 1) ;
        ... fill the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release) ;
    } ;
    ....
}
And a consumer thread which:
void* ThreadConsumerFunc(void* arg)
{
    atomic_uint *counter = &count ;   // somehow
    ....
    while (1)
    {
        unsigned count ;
        do
            count = atomic_load_explicit(counter, memory_order_acquire) ;
        while ((count & 1) == 0) ;
        ... empty the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release) ;
    } ;
    ....
}
The load-acquire operations synchronize with the store-release operations, so:
in the producer: the filling of the parcel will not start until the producer has "ownership" (as above), and will then "complete" (writes become visible to the other thread) before the count is updated (and the new value becomes visible to the other thread).
in the consumer: the emptying of the parcel will not start until the consumer has "ownership" (as above), and will then "complete" (all reads will have read from memory) before the count is updated (and the new value becomes visible to the other thread).
Clearly, the two threads are busy waiting for each other. But with two or more parcels and counters, the threads can progress at the speed of the slower.
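The declarations that the sketches above leave as "somehow" would presumably be along these lines (purely illustrative; the payload field is hypothetical):

#include <stdatomic.h>

struct shared {
    /* ... whatever the producer fills and the consumer empties ... */
    int payload;
};

static struct shared parcel;
static atomic_uint count;   /* even value: producer's turn, odd value: consumer's turn */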
Finally -- x86/x86_64 and acquire/release
With x86/x86_64, all memory reads and writes are implicitly acquire-reads and release-writes. This means that there is zero overhead in atomic_load_explicit(..., memory_order_acquire) and atomic_store_explicit(..., memory_order_release).
Conversely, all read-modify-write operations (and memory_order_seq_cst operations), carry overheads in the several-10s of clocks -- 30?, 50?, more if the operation is contended (depending on the device).
So, where performance is critical, it may be worth understanding what's possible (and what isn't).
How can I synchronize this code more efficiently?
You can't: the code is fundamentally inefficient.
The issue is that the amount of work that you do (incrementing an integer) is minuscule compared to the synchronization overhead, so the latter dominates.
To fix the problem, you need to do more work for each lock/unlock pair.
In a real program, you would have each thread perform 1000 or 10000 "work items" for each lock/unlock iteration. Something like:
pthread_mutex_lock(&mutex);
const int start = *ctx->Counter;
*ctx->Counter += N;
pthread_mutex_unlock(&mutex);
for (int j = start; j < start + N; j++) { /* do work on the j-th iteration here */ }
But your toy program isn't amenable to this.
Or maybe the mutex usage can be optimized?
I suggest trying to implement a correct mutex first. You'll quickly discover that this is far from trivial.
For starters, I am a student who wasn't a CS undergrad but is moving into a CS master's, so I welcome any and all help anyone is willing to give.
The purpose of this was to create N threads (between 2 and 4) and then, using a randomly generated array of lowercase characters, make them uppercase.
This needed to be done using the N threads (N is given on the command line when executed), dividing the work up as evenly as possible, using pthread.
My main question is: have I avoided race conditions between my threads?
I am also struggling to understand how to divide the work among the threads. As I understand it (correct me if I'm wrong), the order in which the threads run is generally unpredictable during execution. So I'm assuming I need to dynamically divide the array among the N threads so that each thread performs the uppercasing of a same-sized subsection of the array?
I know there are likely a number of other discrepancies I need to get better at within my code, but I haven't coded long and just started using C/C++ about a month ago.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <ctype.h>
//Global variable for threads
char randChars[60];
int j=0;
//Used to avoid race conditions
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
//Establish the threads
void* upperThread(void* argp)
{
while(randChars[j])
{
pthread_mutex_lock( &mutex1 );
putchar (toupper(randChars[j]));
j++;
pthread_mutex_unlock( &mutex1 );
}
return NULL;
}
int main(int argc, char **argv)
{
//Initialize variables and threads
int N,randNum,t;
long i;
pthread_t pth[N];
pthread_mutex_init(&mutex1, NULL);
char randChar = ' ';
//Check number of command inputs given
if(argc!=2)
{
fprintf(stderr,"usage: %s <enter a value for N>\n", argv[0]);
exit(0);
}
N = atoi(argv[1]);
//Checks command inputs for correct values
if(N<2||N>4){
printf("Please input a value between 2 and 4 for the number of threads.\n");
exit(0);
}
//Seed random to create a randomized value
srand(time(NULL));
printf("original lower case version:\n");
for (i=0; i<61; i++)
{
//Generate a random integer in lower alphabetical range
randNum = rand()%26;
randNum = randNum+97;
//Convert int to char and add to array
randChar = (char) randNum;
randChars[i] = randChar;
printf("%c", randChar);
}
//Create N threads
for (i=0; i<N; i++)
{
pthread_create(pth + i, NULL, upperThread, (void *)i);
}
printf("\n\nupper case version:\n");
//Join the threads
for(t=0; t < N; t++)
{
pthread_join(pth[t], NULL);
}
printf("\n");
pthread_exit(NULL);
return 0;
}
The example you provided is not a good multithreaded program. The reason is that your threads will constantly wait for whichever one holds the lock, which basically makes your program sequential. I would change your upperThread to
void* upperThread(void* argp){
    int temp;
    while(randChars[j]){
        pthread_mutex_lock( &mutex1 );
        temp = j;
        j++;
        pthread_mutex_unlock( &mutex1 );
        putchar (toupper(randChars[temp]));
    }
    return NULL;
}
This way a thread waits for whichever one holds the lock only while it extracts the value of j, increments it, and releases the lock; the rest of its operations happen outside the lock.
The general rule is that you have to acquire the lock only when you are dealing with a critical section or critical data; in this case that is the index into your string. Read about critical sections and race conditions here. There is also a sketch below of dividing the array itself among the threads.
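As for the other part of the question -- dividing the work evenly -- a common approach is to give each thread its own contiguous slice of the array, so no locking is needed at all. A rough sketch reusing the question's globals (the slice_t type and upperSlice function are hypothetical names, and it converts the array in place rather than printing):

typedef struct {
    int start;   /* first index this thread handles */
    int end;     /* one past the last index it handles */
} slice_t;

void* upperSlice(void* argp)
{
    slice_t *s = argp;
    for (int k = s->start; k < s->end; k++)
        randChars[k] = toupper(randChars[k]);   /* disjoint ranges: no mutex needed */
    return NULL;
}

/* in main(), after filling randChars (assuming 60 characters and N threads):
   slice_t slices[4];
   int per = 60 / N, extra = 60 % N, pos = 0;
   for (i = 0; i < N; i++) {
       slices[i].start = pos;
       slices[i].end   = pos + per + (i < extra ? 1 : 0);
       pos = slices[i].end;
       pthread_create(pth + i, NULL, upperSlice, &slices[i]);
   }
*/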
I'm trying to write a simple multi-threaded consumer/producer in which multiple reader and writer threads read from a file into a buffer and then from the buffer back into a file. It should be thread safe; however, it is not behaving as I expect. It halts halfway through, but every time on a different line.
Please help me understand what I am doing wrong.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
//TODO Define global data structures to be used
#define BUF_SIZE 5
FILE *fr;
FILE *to; /* declare the file pointer */
struct _data {
pthread_mutex_t mutex;
pthread_cond_t cond_read;
pthread_cond_t cond_write;
int condition;
char buffer[BUF_SIZE];
int datainbuffer;
}dc1 = {
PTHREAD_MUTEX_INITIALIZER,PTHREAD_COND_INITIALIZER,PTHREAD_COND_INITIALIZER,0,{0},0
};
void *reader_thread(void *arg) {
//TODO: Define set-up required
struct _data *d = (struct _data *)arg;
int killreaders = 0;
while(1) {
//TODO: Define data extraction (queue) and processing
pthread_mutex_lock(&d->mutex);
while (d->condition == 0 || d->datainbuffer<=0){
pthread_cond_wait( &d->cond_read, &d->mutex );
if(killreaders == 1){
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
pthread_cond_signal(&d->cond_write);
return NULL;
}
}
d->condition = 0;
int i;
char res;
//if the buffer is not full, that means the end of file is reached and it time to kill the threads remaining.
if(d->datainbuffer!=BUF_SIZE)
killreaders = 1;
for (i=0; i<(sizeof d->datainbuffer); i++) {
res = d->buffer[i];
printf("to file:%c",res);
fputc(res, to);
}
d->datainbuffer = 0;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal( &d->cond_write );
}
return NULL;
}
void *writer_thread(void *arg) {
//TODO: Define set-up required
struct _data *d = (struct _data *)arg;
char * pChar;
int killwriters = 0;
while(1){
pthread_mutex_lock(&d->mutex);
while( d->condition == 1 || d->datainbuffer>0){
pthread_cond_wait( &d->cond_write, &d->mutex );
if(killwriters==1){
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_write);
pthread_cond_signal(&d->cond_read);
return NULL;
}
}
d->condition = 1;
int i;
char rc;
for (i = 0; i < BUF_SIZE; i++){
if((rc = getc(fr)) == EOF){
killwriters = 1;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
return NULL;
}
d->datainbuffer = i+1;
d->buffer[i] = rc;
printf("%c",rc);
}
int m = 0;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
}
return NULL;
}
#define M 10
#define N 20
int main(int argc, char **argv) {
struct _data dc=dc1;
fr = fopen ("from.txt", "rt"); /* open the file for reading */
if (fr == NULL)
{
printf("Could not open file!");
return 1;
}
to = fopen("to.txt", "wt");
int i;
pthread_t readers[N];
pthread_t writers[M];
for(i = 0; i < N; i++) {
pthread_create(&readers[i], NULL, reader_thread, (void*)&dc);
}
for(i = 0; i < M; i++) {
pthread_create(&writers[i], NULL, writer_thread, (void*)&dc);
}
fclose(fr);
fclose(to);
return 0;
}
any suggestion is appreciated!
Your threads are reading from and writing to files, which you open & close in main. But main doesn't explicitly wait for the threads to finish before closing those files.
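A minimal fix for that part would be to join all the threads before closing the files; something along these lines at the end of main (a sketch using the question's variable names):

/* wait for every thread before closing the files */
for (i = 0; i < N; i++)
    pthread_join(readers[i], NULL);
for (i = 0; i < M; i++)
    pthread_join(writers[i], NULL);

fclose(fr);
fclose(to);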
In addition to the problem pointed out by Scott Hunter, your readers and writers do all their "real work" while holding the mutex, defeating the point of having more than one thread in the first place.
Readers should operate as follows:
1) Acquire mutex.
2) Block on the condition variable until work is available.
3) Remove work from queue, possibly signal condition variable.
4) Release mutex.
5) Process the work.
6) Go to step 1.
Writers should operate as follows:
1) Get the information we need to write.
2) Acquire the mutex.
3) Block on the condition variable until there is space on the queue.
4) Place information in the queue, possibly signal condition variable.
5) Release the mutex.
6) Go to step 1.
Notice both threads do the "real work" without holding the mutex? Otherwise, why have multiple threads if only one of them can do work at a time?
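Applied to your struct _data, the body of the reader loop might look roughly like this (a sketch only; termination handling is omitted, local and n are hypothetical names, and memcpy needs <string.h>):

/* inside reader_thread's while(1): */
char local[BUF_SIZE];
int n;

pthread_mutex_lock(&d->mutex);
while (d->datainbuffer == 0)
    pthread_cond_wait(&d->cond_read, &d->mutex);
n = d->datainbuffer;              /* take the work out of the shared buffer */
memcpy(local, d->buffer, n);
d->datainbuffer = 0;
pthread_cond_signal(&d->cond_write);
pthread_mutex_unlock(&d->mutex);

fwrite(local, 1, n, to);          /* the real work happens without the mutex held */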
I'm not sure whether my answer is going to help you or not, but I'm going to give my best by giving you some reference code.
I have written a similar program (except that it does not write to the file; instead it displays the queued/produced/consumed items on stdout). It can be found here - https://github.com/sangeeths/pc . I have separated the command-line processing and queue logic into separate files.
Hope this helps!