My code does the following: it creates N threads, each of which increments the global variable counter M times. I am using a mutex to ensure the final value of counter is M*N.
I would like to observe the situation without a mutex, obtaining a different value for counter, in order to properly assess the mutex's work. I commented out the mutex, but the results are the same. Should I put the threads to sleep for a random period of time? How should I proceed?
#include <stdio.h>
#include <pthread.h>
#define N 10
#define M 4
pthread_mutex_t mutex;
int counter=0;
void *thread_routine(void *parameter)
{
pthread_mutex_lock(&mutex);
int i;
for (i=0; i<M; i++)
counter++;
pthread_mutex_unlock(&mutex);
return NULL; /* the function is declared to return void*, so return something */
}
int main(void)
{
pthread_t v[N];
int i;
pthread_mutex_init(&mutex,NULL);
for (i=0; i<N; i++)
{
pthread_create(&v[i],NULL,thread_routine,NULL);
}
for (i=0; i<N; i++)
{
pthread_join(v[i],NULL);
}
printf("%d %d\n",counter,N*M);
if (N*M==counter)
printf("Success!\n");
pthread_mutex_destroy(&mutex);
return 0;
}
I don't know which compiler you used, but g++ in this case can eliminate the work in the threads entirely and calculate the final value of counter at compile time.
To prevent that optimization, you can make the counter variable volatile:
volatile int counter=0;
This tells the compiler that the variable can be changed at any time by something outside the program's control, so it must not perform any optimization on that variable that assumes the value stays put. Since an external agent could change the value, the final result might not be N*M, and the value of counter therefore has to be computed at run-time.
Also, what WhozCraig stated in his comment most likely applies in your case, though I think he meant M, not N.
In addition to your original question: since you read counter only once all threads are joined, it may be worth giving each thread its own counter and summing all the threads' counters after they have finished. That way you can compute the final value without any locks or atomic operations.
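A minimal sketch of that idea, reusing your N and M macros; the per_thread_count array and the index argument are illustrative additions, and main would have to pass each thread a pointer to its own index (e.g. from an int indices[N] array filled before the create loop), not a pointer to the loop variable:
int per_thread_count[N]; /* one slot per thread; no slot is shared while running */

void *thread_routine(void *parameter)
{
    int index = *(int *)parameter; /* this thread's own slot */
    int i;
    for (i = 0; i < M; i++)
        per_thread_count[index]++; /* no lock needed: only this thread writes here */
    return NULL;
}

/* in main, after all pthread_join calls: */
/* for (i = 0; i < N; i++) counter += per_thread_count[i]; */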
Edit:
Your first test, with the mutex, would look like this:
#define N 10
#define M 10000000
pthread_mutex_t mutex;
volatile int counter=0;
void *thread_routine(void *parameter)
{
pthread_mutex_lock(&mutex);
int i;
for (i=0; i<M; i++)
counter++;
pthread_mutex_unlock(&mutex);
return NULL;
}
and your second test, without the mutex, like this:
#define N 10
#define M 10000000
pthread_mutex_t mutex;
volatile int counter=0;
void *thread_routine(void *parameter)
{
// pthread_mutex_lock(&mutex);
int i;
for (i=0; i<M; i++)
counter++;
// pthread_mutex_unlock(&mutex);
return NULL;
}
The first test will reliably produce N*M, while the second will exhibit the expected race conditions when incrementing the counter variable.
Compilation can be done using gcc -O3 test.c -lpthread (the library is best placed after the source file; gcc's -pthread flag also works).
Related
So I have a global variable called counter, and I run 4 threads that each increment it a million times, but the result I get at the end does not even reach 2 million.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
int nthread;
int counter=0;
void *f(void *arg)
{
int i = *(int *)arg;
int *p;
for (int c = 0; c < 1000000; c++)
{
counter++;
}
printf(" I am thread %d (out of %d),tid =% ld\n", i, nthread, pthread_self());
p = malloc(sizeof(int));
*p = i * 2;
pthread_exit(p); // return p
}
int main(int argc, char *argv[])
{
pthread_t *tid;
int e, i, *ti;
nthread = 4;
tid = malloc(nthread * sizeof(pthread_t));
ti = malloc(nthread * sizeof(int));
for (i = 0; i < nthread; i++)
{
ti[i] = i;
if ((e = pthread_create(&tid[i], NULL, f, &ti[i])) != 0)
send_error(e, " pthread_create ");
}
for (i = 0; i < nthread; i++)
{
void *r;
if ((e = pthread_join(tid[i], &r)) != 0)
send_error(e, " pthread_join ");
printf(" Return of thread %d = %d\n", i, *(int *)r);
free(r);
}
printf("counter is %d\n",counter);
free(tid);
free(ti);
}
What causes this, and how can I fix it?
PS: if the code does not compile for you, replace send_error with printf calls.
The pthreads standard is very clear that you may not access an object in one thread while another thread is, or might be, modifying it. Your code violates this rule.
There are many reasons for this rule, but the most obvious is this:
for (int c = 0; c < 1000000; c++)
{
counter++;
}
You want your compiler to optimize code like this. You want it to keep counter in a register or even eliminate the loop if it can. But without the requirement that you avoid threads overlapping modifications and accesses to the same object, the compiler would have to somehow prove that no other code in any other thread could touch counter while this code was running.
That would make a huge number of valuable optimizations impossible on the 99% of code that doesn't share objects across threads, just because the compiler can't prove that accesses never overlap.
It makes much more sense to require code that does have overlapping object accesses to clearly indicate that it does. And every threading standard, including pthreads, provides good ways to do this.
You can use any method to prevent this problem that you like. Using a mutex is the simplest and definitely the one you should learn first.
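A minimal sketch of that fix applied to the loop above; the mutex name counter_lock is an illustrative addition:
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* inside f(), replacing the bare loop: */
pthread_mutex_lock(&counter_lock);
for (int c = 0; c < 1000000; c++)
{
    counter++; /* now only one thread at a time touches counter */
}
pthread_mutex_unlock(&counter_lock);
Taking the lock once around the whole loop, rather than around each increment, keeps the result correct while avoiding a million lock/unlock pairs per thread.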
Using C11 threads, I'm trying to ensure that foo is thread-safe. While foo isn't reentrant, I'm trying to mitigate this with mutexes.
I don't understand why the value of thrdn is changing in the critical loop. My understanding was that each threaded call to foo would have its own version of thrdn, but it seems that it is being modified by other threads at run-time.
I've tried moving the mtx_lock above thrdn's declaration and changing thrdn to be of type atomic_int *; however, both result in the same behaviour.
#include <stdio.h>
#include <threads.h>
#include <string.h>
#include <stdlib.h>
#define THREAD_MAX 5
thrd_t threads[THREAD_MAX];
mtx_t mtx;
void foo(void * data)
{
int* thrdn = (int *)data;
mtx_lock(&mtx);
for(int i = 0; i < 3; ++i) {
printf("thread %d, number %d\n", *thrdn, i);
}
mtx_unlock(&mtx);
}
int main()
{
mtx_init(&mtx, mtx_plain | mtx_recursive);
for(int i = 0; i < THREAD_MAX; ++i){
thrd_create(&threads[i], foo, &i);
}
for(int i = 0; i < THREAD_MAX; ++i){
thrd_join(threads[i], NULL);
}
mtx_destroy(&mtx);
}
As has been noted in the comments, the issue was the reference to the local loop variable i, which every thread received a pointer to. Tracking thread ids separately, as seen in this answer, solved the issue.
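A minimal sketch of that fix; the ids array is an illustrative addition, and note that thrd_create expects an int (*)(void *), so foo's return type changes as well:
int ids[THREAD_MAX];

int foo(void *data)
{
    int thrdn = *(int *)data; /* copy the id; stable even after main's loop moves on */
    mtx_lock(&mtx);
    for (int i = 0; i < 3; ++i)
        printf("thread %d, number %d\n", thrdn, i);
    mtx_unlock(&mtx);
    return 0;
}

/* in main: */
for (int i = 0; i < THREAD_MAX; ++i) {
    ids[i] = i; /* each thread reads its own, stable slot */
    thrd_create(&threads[i], foo, &ids[i]);
}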
I'm trying to add up all the members of an array using OpenMP, this way:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int v[] ={1,2,3,4,5,6,7,8,9};
int sum = 0;
#pragma omp parallel private(v, sum)
{
#pragma reduction(+: sum)
{
for (int i = 0; i < sizeof(v)/sizeof(int); i++){
sum += v[i];
}
}
}
printf("%d\n",sum);
}
But when I print sum, the result is 0.
You are very confused about data-sharing attributes and work-sharing in OpenMP. This answer does not attempt to teach them to you properly, but only gives you a concise, specific example.
Your code does not make any sense and does not compile.
You do not need multiple regions or anything of the sort, and there are only two variables. v, which is defined outside, is read by all threads and must be shared, which it implicitly is because it is defined outside. Then there is sum, which is a reduction variable.
Further, you need to apply worksharing (for) to the loop. So in the end it looks like this:
int v[] ={1,2,3,4,5,6,7,8,9};
int sum = 0;
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < sizeof(v)/sizeof(int); i++){
sum += v[i];
}
printf("%d\n",sum);
Note there are no private variables in this example. Private variables are very dangerous because they are uninitialized inside the parallel region; simply don't use them explicitly. If you need something local, declare it inside the parallel region.
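For completeness: the snippet above still needs to sit inside main() with omp.h and stdio.h included, and OpenMP has to be enabled at compile time. Assuming GCC and an illustrative file name:
gcc -fopenmp sum.c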
I am facing a rather strange problem. It is not crucial for my work, but I still want to understand this behavior. I am running three tasks in the order of their priority, and I am calling one function from all of these threads with different arguments. For the highest-priority thread (l3_thread) I get the right value for J, but for the other, lower-priority threads (l2_thread) I see a garbage value for J. What is the concept at play here?
Code:
int p_task(int limit1, int limit2, int sleep_time, int prio){
int i, j;
for(i=limit1; i<=limit2; i++)
{
j=j+1;
printf("J = %d \n", j);
}
return 0;
}
void *l3_thread(void *arg){
/*call to p_task*/
pthread_exit(NULL);
}
void *l2_thread(void *arg){
/*call to p_task*/
pthread_exit(NULL);
}
I see garbage value for J...
This is because the variable j in the function p_task() is not initialized.
int i, j;
for(i=limit1; i<=limit2; i++)
{
j=j+1; //j is not initialized and used
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
[C Standard, 6.7.9 p10]
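The fix is a one-line change in p_task(), giving j a defined starting value:
int i, j = 0; /* j now starts from zero instead of an indeterminate value */
With that, each thread calling p_task() gets its own properly initialized j on its own stack, and the printed values are well-defined.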
I have the following code for filling an array with multiple threads:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define MAX_ITEMS 67108864
#define LINES_PER_THREAD 8388608
#define THREADS 8
static int *array;
static pthread_t pids[THREADS];
static int args[THREADS];
static void *init_array_line(void *arg) {
    int i = *(int *)arg; /* starting index for this thread */
    int max = i + LINES_PER_THREAD;
    for (; i < max; i++)
        array[i] = rand() % 10000 + 1;
    return NULL;
}
static void init_array() {
int i;
for ( i = 0; i < THREADS; i++) {
args[i]=i* LINES_PER_THREAD;
pthread_create(pids + i, NULL, init_array_line, args + i);
}
}
static void wait_all(void) {
for (int i = 0; i < THREADS; i++) {
pthread_join(pids[i], NULL);
}
}
int
main(int argc, char **argv)
{
array = (int *)malloc(MAX_ITEMS * sizeof(int));
init_array();
wait_all();
}
I am giving each thread one eighth of the array to fill (LINES_PER_THREAD), but it seems to take longer than filling the array with a single thread. Any suggestions as to why this might be?
I suspect the main bottleneck is the calls to rand(). rand() isn't required to be thread-safe, so it can't be used safely in a multi-threaded program when multiple threads could call it concurrently. The glibc implementation uses an internal lock to protect against such uses, which effectively serializes the calls to rand() across all threads and thus severely limits the multi-threaded nature of your program. Instead, use rand_r(), which doesn't maintain any internal state (the callers do) and can at least solve this aspect of your problem.
In general, if the threads don't do enough work, the thread creation/synchronization overhead can outweigh whatever concurrency is available on a multi-core system.
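A minimal sketch of the rand_r() variant of init_array_line(), assuming a POSIX system; the per-thread seeding scheme is an illustrative choice:
static void *init_array_line(void *arg) {
    int i = *(int *)arg;
    int max = i + LINES_PER_THREAD;
    unsigned int seed = (unsigned int)i + 1; /* private state, no lock contention */
    for (; i < max; i++)
        array[i] = rand_r(&seed) % 10000 + 1;
    return NULL;
}
Because each thread owns its seed, the threads no longer serialize on glibc's internal rand() lock.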