I'm writing a simple example to understand how things work in OpenMP programs.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    omp_set_num_threads(4);
    int j = 0;
    #pragma omp parallel private(j)
    {
        int i;
        for (i = 1; i < 2; i++) {
            printf("from thread %d : i is equal to %d and j is equal to %d\n",
                   omp_get_thread_num(), i, j);
        }
    }
}
So in this example I should get j == 0 each time; unfortunately, the result is j == 0 three times and j == 32707 one time.
What is wrong with my example?
Use firstprivate(j) rather than private(j) if you want each thread to have a private copy of j initialized with the value j had before entering the parallel region. With private(j) each thread's copy is uninitialized, which is why one thread prints a garbage value such as 32707.
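A minimal sketch of the corrected program (same loop as above, only the clause changes):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);
    int j = 0;
    /* firstprivate: each thread gets its own copy of j, initialized
       from the value j had before the region was entered */
    #pragma omp parallel firstprivate(j)
    {
        int i;
        for (i = 1; i < 2; i++) {
            printf("from thread %d : i is equal to %d and j is equal to %d\n",
                   omp_get_thread_num(), i, j);
        }
    }
    return 0;
}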
Using C11 threads, I'm trying to ensure that foo is thread safe. While foo isn't reentrant, I'm trying to mitigate this with mutexes.
I don't understand why the value of thrdn is changing in the critical loop. My understanding was that each threaded call to foo would have its own version of thrdn, but it seems that it is being modified by other threads at run-time.
I've tried moving the mtx_lock above thrdn's declaration and changing thrdn to be of type atomic_int *; however, both result in the same behaviour.
#include <stdio.h>
#include <threads.h>
#include <string.h>
#include <stdlib.h>

#define THREAD_MAX 5

thrd_t threads[THREAD_MAX];
mtx_t mtx;

void foo(void *data)
{
    int *thrdn = (int *)data;
    mtx_lock(&mtx);
    for (int i = 0; i < 3; ++i) {
        printf("thread %d, number %d\n", *thrdn, i);
    }
    mtx_unlock(&mtx);
}

int main()
{
    mtx_init(&mtx, mtx_plain | mtx_recursive);
    for (int i = 0; i < THREAD_MAX; ++i) {
        thrd_create(&threads[i], foo, &i);
    }
    for (int i = 0; i < THREAD_MAX; ++i) {
        thrd_join(threads[i], NULL);
    }
    mtx_destroy(&mtx);
}
As has been noted in the comments, the issue was the reference to the local variable i: every thread received a pointer to the same loop counter in main, which keeps changing while the threads run. Tracking thread ids separately, as in the sketch below, solved the issue.
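A minimal sketch of that fix, assuming an ids array sized to the thread count (my naming, not from the question). Note that thrd_create expects a start routine returning int, so foo's signature is adjusted as well:

#include <stdio.h>
#include <threads.h>

#define THREAD_MAX 5

thrd_t threads[THREAD_MAX];
int ids[THREAD_MAX];   /* stable storage: one id per thread */
mtx_t mtx;

int foo(void *data)    /* thrd_start_t routines return int */
{
    int *thrdn = data;
    mtx_lock(&mtx);
    for (int i = 0; i < 3; ++i)
        printf("thread %d, number %d\n", *thrdn, i);
    mtx_unlock(&mtx);
    return 0;
}

int main(void)
{
    mtx_init(&mtx, mtx_plain);
    for (int i = 0; i < THREAD_MAX; ++i) {
        ids[i] = i;                            /* each thread gets its own slot */
        thrd_create(&threads[i], foo, &ids[i]);
    }
    for (int i = 0; i < THREAD_MAX; ++i)
        thrd_join(threads[i], NULL);
    mtx_destroy(&mtx);
}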
I keep getting this error (for more than six hours now) when trying to compile C code with the -fopenmp flag using gcc:
error: invalid controlling predicate
for ( int i = 0; i < N; i++ )
I browsed Stack Overflow and stripped my code down to the point where it is an exact copy of an example from an OpenMP handbook, but it still doesn't compile.
#include <stdio.h>
#include <math.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char *argv[]) {
    double N;
    sscanf(argv[1], " %lf", &N);
    double integral = 0.0;
    #pragma omp parallel for reduction(+: integral)
    for ( int i = 0; i < N; i++ )
        integral = integral + i;
    printf("%20.18lf\n", integral);
    return 0;
}
Any suggestions?
Found it, sorry for the clutter.
To all other C newbies like myself: the error was in the double N. OpenMP wants your loop to run up to an integer N, not a double.
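For reference, a sketch of the repaired program with an integer loop bound:

#include <stdio.h>
#include <math.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char *argv[])
{
    int N;                  /* integer bound: i < N is now a valid controlling predicate */
    sscanf(argv[1], "%d", &N);
    double integral = 0.0;
    #pragma omp parallel for reduction(+: integral)
    for (int i = 0; i < N; i++)
        integral = integral + i;
    printf("%20.18f\n", integral);
    return 0;
}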
I'm trying to add all the members of an array using OpenMP this way:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int v[] = {1,2,3,4,5,6,7,8,9};
    int sum = 0;
    #pragma omp parallel private(v, sum)
    {
        #pragma reduction(+: sum)
        {
            for (int i = 0; i < sizeof(v)/sizeof(int); i++){
                sum += v[i];
            }
        }
    }
    printf("%d\n", sum);
}
But when I print sum, the result is 0.
You are very confused about data-sharing attributes and work-sharing in OpenMP. This answer does not attempt to teach them to you properly, but only to give you a concise, specific example.
Your code does not make any sense and does not compile.
You do not need multiple regions or anything of the sort, and there are only two variables: v, which is defined outside the region, is read by all threads and must be shared - which it implicitly is, because it is defined outside. Then there is sum, which is a reduction variable.
Further, you need to apply worksharing (for) to the loop. So in the end it looks like this:
int v[] = {1,2,3,4,5,6,7,8,9};
int sum = 0;
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < sizeof(v)/sizeof(int); i++){
    sum += v[i];
}
printf("%d\n", sum);
Note that there are no private variables in this example. Private variables are very dangerous because they are uninitialized inside the parallel region; simply don't use them explicitly. If you need something local, declare it inside the parallel region.
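A minimal sketch of that last point, using a hypothetical variable named local:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* declared inside the region, so every thread automatically gets
           its own properly initialized copy - no private() clause needed */
        int local = omp_get_thread_num() * 10;
        printf("thread %d: local = %d\n", omp_get_thread_num(), local);
    }
    return 0;
}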
My code does the following: it creates N threads, each of which increments the global variable counter M times. I am using a mutex to ensure that the final value of counter is M*N.
I would like to observe the situation without a mutex and obtain a different value for counter, in order to properly assess the mutex's work. I commented out the mutex, but the results are the same. Should I put the threads to sleep for a random period of time? How should I proceed?
#include <stdio.h>
#include <pthread.h>

#define N 10
#define M 4

pthread_mutex_t mutex;
int counter = 0;

void *thread_routine(void *parameter)
{
    pthread_mutex_lock(&mutex);
    int i;
    for (i = 0; i < M; i++)
        counter++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main(void)
{
    pthread_t v[N];
    int i;
    pthread_mutex_init(&mutex, NULL);
    for (i = 0; i < N; i++)
    {
        pthread_create(&v[i], NULL, thread_routine, NULL);
    }
    for (i = 0; i < N; i++)
    {
        pthread_join(v[i], NULL);
    }
    printf("%d %d\n", counter, N*M);
    if (N*M == counter)
        printf("Success!\n");
    pthread_mutex_destroy(&mutex);
    return 0;
}
I don't know what compiler you used, but g++ in this case would completely eliminate the threads and calculate the final value of counter at compile time.
To prevent that optimization you can make the counter variable volatile:
volatile int counter = 0;
This tells the compiler that the variable can be changed at any time by external resources, so it is forced not to perform any optimization on that variable that could have side effects. Since an external resource could change the value, the final value might not be N*M, and therefore counter must be computed at run-time.
Also what WhozCraig stated in his comment will most likely apply in your case. But I think he meant M, not N.
In addition to your original question: since you only read the counter once all threads are joined, it might be worth giving each thread its own counter and summing all the threads' counters after they finish computing (see the sketch below). That way you can compute the final value without any locks or atomic operations.
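A sketch of that lock-free variant; the counters array and the long totals are my assumptions, not code from the question:

#include <stdio.h>
#include <pthread.h>

#define N 10
#define M 10000000

long counters[N];   /* one counter per thread; each thread touches only its own slot */

void *thread_routine(void *parameter)
{
    long *my_counter = parameter;
    for (int i = 0; i < M; i++)
        (*my_counter)++;   /* no lock needed: no other thread touches this slot */
    return NULL;
}

int main(void)
{
    pthread_t v[N];
    for (int i = 0; i < N; i++)
        pthread_create(&v[i], NULL, thread_routine, &counters[i]);

    long total = 0;
    for (int i = 0; i < N; i++) {
        pthread_join(v[i], NULL);
        total += counters[i];   /* safe: this thread has already finished */
    }
    printf("%ld %d\n", total, N * M);
    return 0;
}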
Edit:
Your first test, with the mutex, would look like this:
#define N 10
#define M 10000000

pthread_mutex_t mutex;
volatile int counter = 0;

void *thread_routine(void *parameter)
{
    pthread_mutex_lock(&mutex);
    int i;
    for (i = 0; i < M; i++)
        counter++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}
and your second test, without the mutex, like this:
#define N 10
#define M 10000000

pthread_mutex_t mutex;
volatile int counter = 0;

void *thread_routine(void *parameter)
{
    // pthread_mutex_lock(&mutex);
    int i;
    for (i = 0; i < M; i++)
        counter++;
    // pthread_mutex_unlock(&mutex);
    return NULL;
}
The second test will show the expected race conditions when incrementing the counter variable.
Compilation can be done using gcc -O3 test.c -pthread.
What I am looking for is the best way to gather all the data from the parallel for loops into one variable. OpenMP seems to take a different approach than what I am used to: I started learning OpenMPI first, which has explicit scatter and gather routines.
Calculating PI (embarrassingly parallel routine)
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 100
#define CHUNKSIZE 20

int main(int argc, char *argv[])
{
    double step, x, pi, sum = 0.0;
    int i, chunk;
    chunk = CHUNKSIZE;
    step = 1.0/(double)NUM_STEPS;

    #pragma omp parallel shared(chunk) private(i,x,sum,step)
    {
        #pragma omp for schedule(dynamic,chunk)
        for (i = 0; i < NUM_STEPS; i++)
        {
            x = (i+0.5)*step;
            sum = sum + 4.0/(1.0+x*x);
            printf("Thread %d: i = %i sum = %f\n", omp_get_thread_num(), i, sum);
        }
        pi = step * sum;
    }
    return 0;
}
EDIT: It seems that I could use an array sum[NUM_STEPS / CHUNKSIZE] and sum the array into one value, or would it be better to use some sort of blocking routine to sum the product of each iteration?
Add this clause to your #pragma omp parallel ... statement:
reduction(+ : pi)
Then just do pi += step * sum; at the end of the parallel region. (Notice the plus!) OpenMP will then automagically sum up the partial sums for you.
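A sketch of how the program might look with that clause applied, keeping the dynamic schedule from the question; the per-thread sum is declared inside the region, so no private clause is needed for it:

#include <omp.h>
#include <stdio.h>

#define NUM_STEPS 100
#define CHUNKSIZE 20

int main(void)
{
    double step = 1.0 / (double)NUM_STEPS;
    double pi = 0.0;
    int chunk = CHUNKSIZE;

    /* each thread gets its own copy of pi, initialized to 0 by the
       reduction; the copies are summed into pi when the region ends */
    #pragma omp parallel reduction(+ : pi)
    {
        double x, sum = 0.0;   /* per-thread locals */
        #pragma omp for schedule(dynamic, chunk)
        for (int i = 0; i < NUM_STEPS; i++) {
            x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        pi += step * sum;      /* notice the plus */
    }
    printf("pi = %lf\n", pi);
    return 0;
}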
Let's see... I am not quite sure what happens, because I haven't got deterministic behaviour on the finished application, but I have something that looks like it resembles π. I removed the #pragma omp parallel shared(chunk) and changed the #pragma omp for schedule(dynamic,chunk) to #pragma omp parallel for schedule(dynamic) reduction(+:sum):
#pragma omp parallel for schedule(dynamic) reduction(+:sum)
This requires some explanation. I removed the schedule's chunk parameter just to make it all simpler (for me). The part that you are interested in is reduction(+:sum), which is a normal reduction operation with the operator + on the variable sum.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 100

int main(int argc, char *argv[])
{
    double step, pi, sum = 0.0;
    int i;
    step = 1.0/(double)NUM_STEPS;

    #pragma omp parallel for schedule(dynamic) reduction(+:sum)
    for (i = 0; i < NUM_STEPS; i++)
    {
        /* x is declared inside the loop so each thread has its own copy */
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
        printf("Thread %d: i = %i sum = %f\n", omp_get_thread_num(), i, sum);
    }
    pi = step * sum;
    printf("pi=%lf\n", pi);
    return 0;
}