How to use a critical section - C

Hello, I would like to write a program with 2 concurrent threads. The first thread fills an array with the letter 'A' and the second with the letter 'B'. My question is: how do I use a critical section so that the printed array alternates between containing only 'A's and only 'B's? Here is my code, but it does not work properly. What is wrong with it?
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>
#include <psapi.h>

#define SIZE_TAB 200

volatile char program[SIZE_TAB];
CRITICAL_SECTION CriticalSection;

DWORD WINAPI aa(void *v);
DWORD WINAPI bb(void *v);

int main(int argc, char *argv[])
{
    InitializeCriticalSection(&CriticalSection);
    HANDLE thread_a = CreateThread(NULL, 0, aa, 0, 0, 0);
    HANDLE thread_b = CreateThread(NULL, 0, bb, 0, 0, 0);
    while (1)
    {
        for (int i = 0; i < SIZE_TAB; i++)
            printf("%c", program[i]);
        Sleep(1000);
        printf("\n\n");
    }
    DeleteCriticalSection(&CriticalSection);
    CloseHandle(thread_a);
    CloseHandle(thread_b);
    return 0;
}

DWORD WINAPI aa(void *v)
{
    EnterCriticalSection(&CriticalSection);
    for (int i = 0; i < SIZE_TAB; i++)
    {
        program[i] = 'A';
        for (int j = 0; j < 8000; j++);
    }
    LeaveCriticalSection(&CriticalSection);
}

DWORD WINAPI bb(void *v)
{
    EnterCriticalSection(&CriticalSection);
    for (int i = 0; i < SIZE_TAB; i++)
    {
        program[i] = 'B';
        for (int j = 0; j < 8000; j++);
    }
    LeaveCriticalSection(&CriticalSection);
}

A critical section is a way of protecting data in a multi-threaded program: once one thread enters a critical section, no other thread can enter that same critical section until the first thread leaves it.

You have three threads in play here: the main thread, aa, and bb. You have ensured that aa and bb cannot access the array at the same time by protecting it with a critical section, but you have left it open for the main thread to access it at any time (in the main loop where you print out the array). The main thread does not modify the array, but it does read it, so it prints whatever it finds in the array at that moment: the thread that entered the critical section may or may not have finished modifying the data. Furthermore, you have wrapped the entire function body in the critical section in both aa and bb, which means the first thread to enter will run its whole loop before the other thread gets a chance. And since each thread runs its loop only once and then exits, the output cannot keep alternating.
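For illustration, here is a minimal sketch of one way to restructure this (an assumption about the intended behaviour, not the only possible design): both writers loop forever, each fills the whole array while holding the critical section, and the main thread enters the same critical section before printing, so it can only ever observe an all-'A' or an all-'B' array. The Sleep(10) is a guess to nudge the scheduler into alternating; a critical section guarantees mutual exclusion, not fairness.

#include <stdio.h>
#include <windows.h>

#define SIZE_TAB 200

volatile char program[SIZE_TAB];
CRITICAL_SECTION CriticalSection;

/* One writer body shared by both threads: fill the whole array while
   holding the critical section, then leave it so the other writer (or
   the printer) can take a turn. */
DWORD WINAPI writer(void *v)
{
    char letter = (char)(INT_PTR)v;
    while (1)
    {
        EnterCriticalSection(&CriticalSection);
        for (int i = 0; i < SIZE_TAB; i++)
            program[i] = letter;
        LeaveCriticalSection(&CriticalSection);
        Sleep(10); /* give the other threads a chance to get in */
    }
    return 0;
}

int main(void)
{
    InitializeCriticalSection(&CriticalSection);
    HANDLE thread_a = CreateThread(NULL, 0, writer, (LPVOID)(INT_PTR)'A', 0, NULL);
    HANDLE thread_b = CreateThread(NULL, 0, writer, (LPVOID)(INT_PTR)'B', 0, NULL);
    while (1)
    {
        EnterCriticalSection(&CriticalSection); /* the printer takes a turn too */
        for (int i = 0; i < SIZE_TAB; i++)
            printf("%c", program[i]);
        LeaveCriticalSection(&CriticalSection);
        printf("\n\n");
        Sleep(1000);
    }
}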

Related

Why does my simple counting program take longer to run with multiple threads? (in C)

Here's my code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT_TO 100000000
#define MAX_CORES 4

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

long long i = 0;

void* start_counting(void *arg){
    for(;;){
        pthread_mutex_lock(&mutex);
        if(i >= COUNT_TO){
            pthread_mutex_unlock(&mutex);
            return NULL;
        }
        i++;
        pthread_mutex_unlock(&mutex);
        //printf("i = %lld\n", i);
    }
}

int main(int argc, char* argv[]){
    int i = 0;
    pthread_t * thread_group = malloc(sizeof(pthread_t) * MAX_CORES);
    for(i = 0; i < MAX_CORES; i++){
        pthread_create(&thread_group[i], NULL, start_counting, NULL);
    }
    for(i = 0; i < MAX_CORES; i++){
        pthread_join(thread_group[i], NULL);
    }
    return 0;
}
This is what your threads do:
Read the value of i.
Increment the value we read.
Write back the incremented value of i.
Go to step 1.
Clearly, another thread cannot read the value of i after a different thread has completed step 1 but before it has completed step 3. So there can be no overlap between two threads performing steps 1, 2, or 3.
So all your threads are fighting over access to the same resource -- i (or the mutex that protects it). No thread can make useful forward progress without exclusive access to one or both of those. Given that, there is no benefit to using multiple threads since only one of them can accomplish useful work at a time.
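The usual fix, for contrast, is to give each thread an independent share of the work so that the hot loop touches no shared state and needs no lock at all. A minimal sketch (count_share and parts are names invented here; the empty loop stands in for real work, and a clever optimizer may collapse it):

#include <pthread.h>
#include <stdio.h>

#define COUNT_TO 100000000LL
#define MAX_CORES 4

/* Each thread counts its own share in a private variable; nothing shared
   is touched until the single join-and-sum at the end. */
static void *count_share(void *arg)
{
    long long *result = arg;
    long long local = 0;
    for (long long n = 0; n < COUNT_TO / MAX_CORES; n++)
        local++;
    *result = local;
    return NULL;
}

int main(void)
{
    pthread_t threads[MAX_CORES];
    long long parts[MAX_CORES];
    long long total = 0;
    for (int i = 0; i < MAX_CORES; i++)
        pthread_create(&threads[i], NULL, count_share, &parts[i]);
    for (int i = 0; i < MAX_CORES; i++) {
        pthread_join(threads[i], NULL);
        total += parts[i];
    }
    printf("total = %lld\n", total);
    return 0;
}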

How to speed up C mutex?

I have this broken code.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX 1000

struct TContext {
    const char* Name;
    int* Counter;
    int Mod;
};

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            printf("%d ", (*counter)++);
        }
    }
    pthread_exit(0);
}

int main()
{
    pthread_t t1;
    pthread_t t2;
    int counter = 0;
    struct TContext ctxt1 = {"even", &counter, 0};
    struct TContext ctxt2 = {"odd", &counter, 1};
    pthread_create(&t1, 0, ThreadFunc, &ctxt1);
    pthread_create(&t2, 0, ThreadFunc, &ctxt2);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    printf("\n");
    return 0;
}
My aim is to synchronize the threads and get the sequence 0, 1, 2, 3, 4, 5, ... .
I tried to lock and unlock a mutex this way:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            pthread_mutex_lock(&mutex);
            printf("%d ", (*counter)++);
            pthread_mutex_unlock(&mutex);
        }
    }
    pthread_exit(0);
}
But it runs very slowly (I hit the one-second time limit).
How can I synchronize this code more efficiently? Or can the mutex somehow be optimized?
A slightly more traditional way than Chris Hall's is:
pthread_cond_t cv;
pthread_mutex_t lock;

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    pthread_mutex_lock(&lock);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            printf("%d ", (*counter)++);
            pthread_cond_broadcast(&cv);
        } else {
            pthread_cond_wait(&cv, &lock);
        }
    }
    pthread_mutex_unlock(&lock);
    pthread_exit(0);
}
and in main:
pthread_mutex_init(&lock, 0);
pthread_cond_init(&cv, 0);
somewhere before creating the threads. This also lets you add an arbitrary number of even + odd threads without interference (although with no speedup; it is just an intellectual curiosity).
I suggest:
void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    volatile int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (1)
    {
        int count;
        count = *counter;            // NB: volatile*
        if (count >= MAX)
            break;
        if ((count % 2) == ctxt->Mod)
        {
            printf("%d ", count);
            *counter = count + 1;
        }
    }
    pthread_exit(0);
}
Which, for x86/x86_64 at least, will have the effect I think you were looking for, namely that the two threads take turns in incrementing the counter.
The really interesting question is why this works :-)
Postscript
The code above depends, critically, on four things:

1. there is only one value being shared between the threads -- the counter;

2. the counter is simultaneously data and control -- the least significant bit of the counter signals which thread should proceed;

3. reading and writing the counter must be atomic -- so every read of the counter reads the last value written (and not some combination of the previous and current write);

4. the compiler must emit code to actually read/write the counter from/to memory inside the loop.

Now (1) and (2) are specific to this particular problem. (3) is generally true for int (though it may require correct alignment). (4) is achieved by defining the counter as volatile.
So, I originally said that this would work "for x86/x86_64 at least" because I know (3) is true for those devices, but I also believe it is true for many (most?) common devices.
A cleaner implementation would define the counter _Atomic, as follows:
#include <stdatomic.h>

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    atomic_int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (1)
    {
        int count;
        count = atomic_load_explicit(counter, memory_order_relaxed);
        if (count > MAX)             // printing up to and including MAX
            break;
        if ((count % 2) == ctxt->Mod)
        {
            printf("%d ", count);
            atomic_store_explicit(counter, count + 1, memory_order_relaxed);
        }
    }
    pthread_exit(0);
}
Which makes (3) and (4) explicit. But note that (1) and (2) still mean that we don't need any memory ordering. Every time each thread reads the counter, bit0 tells it whether it "owns" the counter. If it does not own the counter, the thread loops to read it again. If it does own the counter, it uses the value and then writes a new value -- and because that passes "ownership" it returns to the read loop (it cannot do anything further with the counter until it "owns" it again). Once MAX+1 has been written to the counter neither thread will use or change it, so that's safe too.
Brother Employed Russian is correct: there is a "data race" here, but it is resolved by a data dependency particular to this case.
More Generally
The code above is not terribly useful, unless you have other applications with a single shared value. But this can be generalised, using memory_order_acquire and memory_order_release atomic operations.
Suppose we have some struct shared which contains some (non-trivial) amount of data which one thread will produce and another will consume. Suppose we again use atomic_uint counter (initially zero) to manage access to a given struct shared parcel. Now we have a producer thread which:
void* ThreadProducerFunc(void* arg)
{
    atomic_uint * counter = &count;   // somehow
    ....
    while (1)
    {
        unsigned count;
        do
            count = atomic_load_explicit(counter, memory_order_acquire);
        while ((count & 1) == 1);
        ... fill the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release);
    }
    ....
}
And a consumer thread which:
void* ThreadConsumerFunc(void* arg)
{
    atomic_uint * counter = &count;   // somehow
    ....
    while (1)
    {
        unsigned count;
        do
            count = atomic_load_explicit(counter, memory_order_acquire);
        while ((count & 1) == 0);
        ... empty the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release);
    }
    ....
}
The load-acquire operations synchronize with the store-release operations, so:
in the producer: the filling of the parcel will not start until the producer has "ownership" (as above), and will then "complete" (writes become visible to the other thread) before the count is updated (and the new value becomes visible to the other thread).
in the consumer: the emptying of the parcel will not start until the consumer has "ownership" (as above), and will then "complete" (all reads will have read from memory) before the count is updated (and the new value becomes visible to the other thread).
Clearly, the two threads are busy waiting for each other. But with two or more parcels and counters, the threads can progress at the speed of the slower.
Finally -- x86/x86_64 and acquire/release
With x86/x86_64, all memory reads and writes are implicitly acquire-reads and release-writes. This means that there is zero overhead in atomic_load_explicit(..., memory_order_acquire) and atomic_store_explicit(..., memory_order_release).
Conversely, all read-modify-write operations (and memory_order_seq_cst operations) carry overheads of several tens of clocks -- 30? 50? more if the operation is contended (depending on the device).
So, where performance is critical, it may be worth understanding what's possible (and what isn't).
How I can synchronize this code in more effective way?
You can't: the code is fundamentally inefficient.
The issue is that the amount of work that you do (incrementing an integer) is minuscule compared to the synchronization overhead, so the latter dominates.
To fix the problem, you need to do more work for each lock/unlock pair.
In a real program, you would have each thread perform 1000 or 10000 "work items" for each lock/unlock iteration. Something like:
pthread_mutex_lock(&mutex);
const int start = *ctx->Counter;
*ctx->Counter += N;
pthread_mutex_unlock(&mutex);

for (int j = start; j < start + N; j++) {
    /* do work on the j-th iteration here */
}
But your toy program isn't amenable to this.
Or maybe I can optimize C-mutex?
I suggest trying to implement a correct mutex first. You'll quickly discover that this is far from trivial.
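For a taste of what is involved, here is a sketch of roughly the simplest possible lock -- a spinlock built on C11 atomic_flag (spinlock_t, spin_lock, and spin_unlock are names invented here). Everything a real mutex adds beyond this -- sleeping in the kernel instead of burning CPU, fairness, error checking -- is exactly the non-trivial part.

#include <stdatomic.h>

/* A minimal spinlock: the easy fraction of what a real mutex does.
   It burns CPU while waiting and offers no fairness at all. */
typedef struct { atomic_flag flag; } spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *s)
{
    /* test-and-set returns the previous value: keep trying until we
       observe "was clear", i.e. we are the thread that set it. */
    while (atomic_flag_test_and_set_explicit(&s->flag, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *s)
{
    atomic_flag_clear_explicit(&s->flag, memory_order_release);
}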

Trying to understand Race Conditions/Threads in C

For starters, I am a student who wasn't a CS undergrad, but I am moving into a CS master's. So I welcome any and all help anyone is willing to give.
The purpose of this was to create N threads (between 2 and 4), then, using a randomly generated array of lower-case characters, make them upper-case.
This needed to be done using the N threads (specified on the command line when executed), dividing the work up as evenly as possible, using pthreads.
My main question: have I avoided race conditions between my threads?
I am also struggling to understand how to divide the work among the threads. As I understand it (correct me if I'm wrong), the order in which the threads run is effectively random. So I'm assuming I need to dynamically divide the array among the N threads, so that each thread performs the uppercasing of a same-sized subsection of the array?
I know there are likely a number of other issues in my code I need to get better at, but I haven't been coding long and only started using C/C++ about a month ago.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <ctype.h>

//Global variable for threads
char randChars[60];
int j=0;

//Used to avoid race conditions
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

//Establish the threads
void* upperThread(void* argp)
{
    while(randChars[j])
    {
        pthread_mutex_lock( &mutex1 );
        putchar (toupper(randChars[j]));
        j++;
        pthread_mutex_unlock( &mutex1 );
    }
    return NULL;
}

int main(int argc, char **argv)
{
    //Initialize variables and threads
    int N,randNum,t;
    long i;
    pthread_t pth[N];
    pthread_mutex_init(&mutex1, NULL);
    char randChar = ' ';

    //Check number of command inputs given
    if(argc!=2)
    {
        fprintf(stderr,"usage: %s <enter a value for N>\n", argv[0]);
        exit(0);
    }
    N = atoi(argv[1]);

    //Checks command inputs for correct values
    if(N<2||N>4){
        printf("Please input a value between 2 and 4 for the number of threads.\n");
        exit(0);
    }

    //Seed random to create a randomized value
    srand(time(NULL));
    printf("original lower case version:\n");
    for (i=0; i<61; i++)
    {
        //Generate a random integer in lower alphabetical range
        randNum = rand()%26;
        randNum = randNum+97;
        //Convert int to char and add to array
        randChar = (char) randNum;
        randChars[i] = randChar;
        printf("%c", randChar);
    }

    //Create N threads
    for (i=0; i<N; i++)
    {
        pthread_create(pth + i, NULL, upperThread, (void *)i);
    }
    printf("\n\nupper case version:\n");

    //Join the threads
    for(t=0; t < N; t++)
    {
        pthread_join(pth[t], NULL);
    }
    printf("\n");
    pthread_exit(NULL);
    return 0;
}
The example you provided is not a good multithreaded program, because your threads will constantly wait for the one that holds the lock, which basically makes your program sequential. I would change your upperThread to:
void* upperThread(void* argp){
    int temp;
    for(;;){
        pthread_mutex_lock( &mutex1 );
        if(!randChars[j]){           /* check under the lock: j is shared */
            pthread_mutex_unlock( &mutex1 );
            break;
        }
        temp = j;
        j++;
        pthread_mutex_unlock( &mutex1 );
        putchar (toupper(randChars[temp]));
    }
    return NULL;
}
This way each thread holds the lock only long enough to extract the value of j and increment it; the uppercasing and printing happen outside the lock.
The general rule is that you acquire the lock only while you are dealing with the critical section or critical data -- in this case, the shared index into your string. Read about critical sections and race conditions here.
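As for the other half of the question -- dividing the work evenly -- a common pattern is to give each thread its own contiguous slice of the array, so the threads never touch the same element and no mutex is needed at all. A self-contained sketch (upper_slice, slice_t, and NTHREADS are names invented here, not part of the question's code):

#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 4

static char text[] = "dividing work among threads without any locking";

/* Each thread owns a non-overlapping slice [start, end), so no lock
   is needed anywhere. */
typedef struct { size_t start, end; } slice_t;

static void *upper_slice(void *argp)
{
    slice_t *s = argp;
    for (size_t i = s->start; i < s->end; i++)
        text[i] = (char)toupper((unsigned char)text[i]);
    return NULL;
}

int main(void)
{
    pthread_t pth[NTHREADS];
    slice_t slices[NTHREADS];
    size_t len = strlen(text);
    size_t chunk = (len + NTHREADS - 1) / NTHREADS; /* ceiling division */
    for (int i = 0; i < NTHREADS; i++) {
        slices[i].start = i * chunk;
        slices[i].end = slices[i].start + chunk < len ? slices[i].start + chunk : len;
        pthread_create(&pth[i], NULL, upper_slice, &slices[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(pth[i], NULL);
    puts(text);
    return 0;
}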

Changing parts of arrays/structs/... in threads without blocking the whole thing, in pure C

I want to modify some (not all) fields of an array (or of structs) from multiple threads, without blocking the rest of the array while it is being modified in other threads. How is this achieved? I found some answers, but they are for C++ and I want to do it in C.
Here is the code I got so far:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ARRAYLENGTH 5
#define TARGET 10000

int target;

typedef struct zstr{
    int* array;
    int place;
    int run;
    pthread_mutex_t* locks;
}zstr;

void *countup(void *);

int main(int argc, char** args){
    int al;
    if(argc>2){
        al=atoi(args[1]);
        target=atoi(args[2]);
    }else{
        al=ARRAYLENGTH;
        target=TARGET;
    }
    printf("%d %d\n", al, target);
    zstr* t=malloc(sizeof(zstr));
    t->array=calloc(al, sizeof(int));
    t->locks=calloc(al, sizeof(pthread_mutex_t));
    int* rua=calloc(al, sizeof(int));
    pthread_t id[4*al];
    for(int i=0; i<al; i++)
        pthread_mutex_init(&(t->locks[i]), NULL);
    for(int j=0; j<4*al; j++){
        int st=j%al;
        t->run=rua[st]++;
        t->place=st;
        pthread_create(&id[j], NULL, &countup, t);
    }
    for(int k=0; k<4*al; k++){
        pthread_join(id[k], NULL);
    }
    for(int u=0; u<al; u++)
        printf("%d\n", t->array[u]);
    free(rua);
    free(t->locks);
    free(t->array);
    return 0;
}

void *countup(void* table){
    zstr* nu=table;
    if(!nu->run){
        pthread_mutex_lock(nu->locks + nu->place);
    }else{
        pthread_mutex_trylock(nu->locks + nu->place);
    }
    while(nu->array[nu->place]<target)
        nu->array[nu->place]++;
    pthread_mutex_unlock(nu->locks + nu->place);
    return NULL;
}
Sometimes this works just fine, but then it calculates wrong values, and for quite short problems (like the default values) it takes very long (strangely, it worked once when I handed them in as parameters).
There isn't anything special about part of an array or structure. What matters is that the mutex or other synchronization you apply to a given value is used correctly.
In this case, it seems like you're not checking your locking function results.
The design of the countup function only allows a single thread to ever access the object, running the value all the way up to target before releasing the lock, but you don't check the trylock result.
So what's probably happening is that the first thread gets the lock, and subsequent threads on the same mutex call trylock, fail to get the lock, but the code doesn't check the result. You then get multiple threads incrementing the same value without synchronization. Given all the pointer dereferences, the index and increment operations are not guaranteed to be atomic, leading to problems where the values grow well beyond target.
The moral of the story is to check function results and handle errors.
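For example, the trylock branch could at minimum bail out when the lock was not acquired -- a sketch reusing the question's variable names:

int rc = pthread_mutex_trylock(nu->locks + nu->place);
if (rc != 0) {
    /* EBUSY (or an error): another thread holds this element's lock,
       so do not fall through and touch the data anyway. */
    return NULL;
}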
Sorry, I don't have enough reputation to comment yet.
Adding to Brad's comment about not checking the result of pthread_mutex_trylock, there is a misconception that shows up many times with Pthreads:
You assume that the thread created by pthread_create will start immediately and will read the values passed to it (here, the pointer t to your struct) atomically. That is not true. The thread might start any time later and will then find the contents, like t->run and t->place, already changed by the next iteration of the j-loop in main.
Moreover, you might want to read David Butenhof's book "Programming with POSIX Threads" (old, but still a good reference) and read up on synchronization and condition variables.
It's not good style to start that many threads in the first place ;)
As this has come up a few times and might come up again, I have restructured the code a bit to issue work items to the started threads. The code below could be extended with a function that maps an index in the array to a fixed area_lock, or with a queue that feeds the running threads further work items...
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

/*
 * Macros for default values. To make it more interesting, set:
 * ARRAYLENGTH != THREADS
 * INCREMENTS != TARGET
 * NUM_AREAS != THREADS
 * Please note that NUM_AREAS must be <= ARRAYLENGTH.
 */
#define ARRAYLENGTH 10
#define TARGET 100
#define INCREMENTS 10
#define NUM_AREAS 2
#define THREADS 5

/* These variables are initialized once in main, then only read... */
int array_len;
int target;
int num_areas;
int threads;
int increments;

/**
 * A long array that is going to be equally split into a number of areas.
 * Each area is covered by a lock. The number of areas does not have to
 * equal the length of the array, but must be smaller...
 */
typedef struct shared_array {
    int * array;
    int num_areas;
    pthread_mutex_t * area_locks;
} shared_array;

/**
 * A work-item a thread is assigned to upon startup (or later on).
 * Then a value of { 0, any } might signal the ending of this thread.
 * The thread is working on index within zstr->array, counting up increments
 * (or up until the target is reached).
 */
typedef struct work_item {
    shared_array * zstr;
    int work_on_index;
    int increments;
} work_item;

/* Local function declarations */
void * countup(void *);

int main(int argc, char * argv[]) {
    int i;
    shared_array * zstr;
    if (argc == 1) {
        array_len = ARRAYLENGTH;
        target = TARGET;
        num_areas = NUM_AREAS;
        threads = THREADS;
        increments = INCREMENTS;
    } else if (argc == 6) {
        array_len = atoi(argv[1]);
        target = atoi(argv[2]);
        num_areas = atoi(argv[3]);
        threads = atoi(argv[4]);
        increments = atoi(argv[5]);
    } else {
        fprintf(stderr, "USAGE: %s len target areas threads increments", argv[0]);
        exit(-1);
    }
    assert(array_len >= num_areas);
    zstr = malloc(sizeof (shared_array));
    zstr->array = calloc(array_len, sizeof (int));
    zstr->num_areas = num_areas;
    zstr->area_locks = calloc(num_areas, sizeof (pthread_mutex_t));
    for (i = 0; i < num_areas; i++)
        pthread_mutex_init(&(zstr->area_locks[i]), NULL);

    pthread_t * id = calloc(threads, sizeof (pthread_t));
    work_item * work_items = calloc(threads, sizeof (work_item));
    for (i = 0; i < threads; i++) {
        work_items[i].zstr = zstr;
        work_items[i].work_on_index = i % array_len;
        work_items[i].increments = increments;
        pthread_create(&(id[i]), NULL, &countup, &(work_items[i]));
    }

    // Let's just do this one work-item.
    for (i = 0; i < threads; i++) {
        pthread_join(id[i], NULL);
    }

    printf("Array: ");
    for (i = 0; i < array_len; i++)
        printf("%d ", zstr->array[i]);
    printf("\n");

    free(id);
    free(work_items);
    free(zstr->area_locks);
    free(zstr->array);
    return 0;
}

void *countup(void* first_work_item) {
    work_item * wi = first_work_item;
    int inc;

    // Extract the information from this work-item.
    int idx = wi->work_on_index;
    int area = idx % wi->zstr->num_areas;
    pthread_mutex_t * lock = &(wi->zstr->area_locks[area]);

    pthread_mutex_lock(lock);
    for (inc = wi->increments; inc > 0 && wi->zstr->array[idx] < target; inc--)
        wi->zstr->array[idx]++;
    pthread_mutex_unlock(lock);

    return NULL;
}

How do I properly allocate memory in my C program?

I am writing a Windows program in C for a homework assignment, and I am running into a problem that causes my program to crash with "program.exe has stopped working". I believe this is due to memory not being allocated correctly.
The program is supposed to start multiple threads to perform a task. I found an example on MSDN on creating threads and have added parts of its code into my program.
My program:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <Windows.h>

#define MAX_THREADS 4
#define BUFFER_SIZE 65000

DWORD WINAPI SomeFunction( LPVOID lpParam );

char fileBuffer[BUFFER_SIZE];

typedef struct MyData {
    int val1;
    int val2;
} MYDATA, *PMYDATA;

int main(int argc, char *argv[])
{
    int i = 0;
    int j = 0;
    PMYDATA pDataArray[MAX_THREADS];
    DWORD dwThreadIdArray[MAX_THREADS];
    HANDLE hThreadArray[MAX_THREADS];

    for (i; i < MAX_THREADS; i++)
    {
        pDataArray[i] = (PMYDATA) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY,
                sizeof(MYDATA));
        if( pDataArray[i] == NULL )
        {
            // If the array allocation fails, the system is out of memory
            // so there is no point in trying to print an error message.
            // Just terminate execution.
            ExitProcess(2);
        }
        // Create the thread to begin execution on its own.
        hThreadArray[i] = CreateThread(NULL, 0, SomeFunction, pDataArray[i], 0, &dwThreadIdArray[i]);
        if (hThreadArray[i] == NULL)
        {
            printf("%s\n", "Error creating thread!");
            ExitProcess(3);
        }
    }

    for (j; j < MAX_THREADS; j++)
    {
        printf("%s%d\n", "j=", j);
        WaitForSingleObject(hThreadArray[j], INFINITE);
    }
    //WaitForMultipleObjects(MAX_THREADS, hThreadArray, TRUE, INFINITE);

    i = 0;
    for(i; i<MAX_THREADS; i++)
    {
        CloseHandle(hThreadArray[i]);
        if(pDataArray[i] != NULL)
        {
            HeapFree(GetProcessHeap(), 0, pDataArray[i]);
            pDataArray[i] = NULL; // Ensure address is not reused.
        }
    }
    printf("%s\n", "DONE!");
    return 0;
}

DWORD WINAPI SomeFunction( LPVOID lpParam)
{
    PMYDATA pDataArray;
    int anotherInt;
    anotherInt = pDataArray->val1; // PROBLEM OCCURS HERE!
    printf("%s%d\n", "Printing int ", anotherInt);
    return 0;
}
The program above should start multiple threads that each execute SomeFunction(). I have isolated the bug to this function, specifically the line anotherInt = pDataArray->val1;. pDataArray is an array of MyData structs, and each element is passed into a thread.
Did I not allocate the memory for the array correctly? If not, how would I access the members of the struct that was passed in as the parameter to SomeFunction()? I have gone over my code a couple of times and could not find anything wrong. The example I followed on MSDN is here.
In SomeFunction, the local PMYDATA pDataArray; doesn't magically become equal to the pDataArray in main. It's an uninitialized pointer, and pDataArray->val1 reads from a random memory location.
Hint: you also have an LPVOID lpParam parameter which you ignore.
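A minimal sketch of the fix, keeping the rest of the program unchanged: cast the lpParam that the system hands to the thread function -- it is the pDataArray[i] pointer that was passed to CreateThread.

DWORD WINAPI SomeFunction( LPVOID lpParam)
{
    // lpParam is the pDataArray[i] passed to CreateThread; cast it back
    // to the struct type before dereferencing it.
    PMYDATA pData = (PMYDATA) lpParam;
    int anotherInt = pData->val1; // reads the zeroed value from HeapAlloc
    printf("%s%d\n", "Printing int ", anotherInt);
    return 0;
}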
