OpenMP minimum value array - c

I have the original code:
min = INT_MAX;
for (i = 0; i < N; i++)
    if (A[i] < min)
        min = A[i];
for (i = 0; i < N; i++)
    A[i] = A[i] - min;
I want to get the parallel version of this and I did this:
min = INT_MAX;
#pragma omp parallel private(i)
{
    minl = INT_MAX;
    #pragma omp for
    for (i = 0; i < N; i++)
        if (A[i] < minl)
            minl = A[i];
    #pragma omp critical
    {
        if (minl < min)
            min = minl;
    }
    #pragma omp for
    for (i = 0; i < N; i++)
        A[i] = A[i] - min;
}
Note that a #pragma must end at the newline, so the opening brace of a parallel or critical block has to go on the following line.
Is the parallel code right? I was wondering if it is necessary to write #pragma omp barrier before #pragma omp critical so that I make sure that all the minimums are calculated before calculating the global minimum.

The code is nearly correct. You do not need a #pragma omp barrier before the critical construct: a thread may enter the critical section as soon as its own minl is ready, and the critical section serialises the merges, so the global minimum comes out right regardless of order.
What the code is missing is a barrier after the critical section: there is no implied barrier on entry to a worksharing for construct, so one thread can start computing A[i] = A[i] - min while another thread has not yet merged its minl into min. Add #pragma omp barrier between the critical block and the second loop.
Further, you do not need to declare the loop iteration variable i private explicitly: the iteration variable of a worksharing for loop is predetermined private.
You can improve the code by using a reduction instead of the manual merge of minl; the implicit barrier at the end of the reduction loop then also guarantees that min is final before the second loop starts:
#pragma omp for reduction(min:min)
for (i = 0; i < N; i++)
    if (A[i] < min)
        min = A[i];
Note: The min operator for reduction is available since OpenMP 3.1.

Using omp_get_num_threads() inside the parallel section

I would like to set the number of threads in OpenMP. When I use
omp_set_num_threads(2);
printf("nthreads = %d\n", omp_get_num_threads());
#pragma omp parallel for
...
I see nthreads = 1. As explained elsewhere, the reported number of threads belongs to the serial section, which is always 1. However, when I move the call to the line after the #pragma, I get a compilation error saying that a for loop is expected after the #pragma. So, how can I fix that?
Well, yeah: omp parallel for expects a for loop on the next line. You can call omp_get_num_threads() inside that loop. Outside a parallel region, you can call omp_get_max_threads() to get the maximum number of threads that will be spawned; that is what you are looking for.
int max_threads = omp_get_max_threads();
#pragma omp parallel for
for(...) {
    int current_threads = omp_get_num_threads();
    assert(current_threads == max_threads);
}
Alternatively, split the combined construct into a parallel region plus a separate for directive, and query the team size inside the region:
#pragma omp parallel
{
    int current_threads = omp_get_num_threads();
    #pragma omp for
    for(...) {
        ...
    }
}

Why does the result of #pragma omp parallel look like this

#include <stdio.h>
int main()
{
    int a = 0, b = 0;
    #pragma omp parallel num_threads(16)
    {
        // #pragma omp single
        a++;
        // #pragma omp critical
        b++;
    }
    printf("single: %d -- critical: %d\n", a, b);
}
Why is my output single: 5 -- critical: 5 here?
And why is it output single: 3 -- critical: 3 when num_threads(4)?
I should not write code like this, right? If the threads interfere here (I guess), why is the result consistent?
You have race conditions, therefore the values of a and b are undefined. To correct it you can
a) use reduction (it is the best solution):
#pragma omp parallel num_threads(16) reduction(+:a,b)
b) use atomic operations (generally it is less efficient):
#pragma omp atomic
a++;
#pragma omp atomic
b++;
c) use critical section (i.e. locks, which is the worst solution):
#pragma omp critical
{
    a++;
    b++;
}
That's just pure luck, helped by the inherent timing of the operations. Since this is the first parallel region in your program, the threads are created rather than reused. That occupies the main thread for a while and results in the threads starting one after another, so the chance of the increments overlapping is low. Try adding a #pragma omp barrier before the increments to make the race conditions more apparent.

Why is my nested OpenMP program taking more time to execute?

My OpenMP program of matrix multiplication which consists of nesting of for loops is taking more execution time than the non-nested version of the parallel program. This is the block where I have used nested parallelisation.
omp_set_nested(1);
#pragma omp parallel for
for (i = 0; i < N; i++) {
    #pragma omp parallel for
    for (j = 0; j < N; j++) {
        C[i][j] = 0.; // set initial value of resulting matrix C = 0
        #pragma omp parallel for
        for (m = 0; m < N; m++) {
            C[i][j] = A[i][m] * B[m][j] + C[i][j];
        }
        printf("C:i=%d j=%d %f \n", i, j, C[i][j]);
    }
}
(Note: omp_set_nested(1) has to be called before the outer parallel region, not inside it.)

Openmp: increase for loop iteration number

I have this parallel for loop
struct p
{
    int n;
    double *l;
} p;

#pragma omp parallel for default(none) private(i) shared(p)
for (i = 0; i < p.n; ++i)
{
    DoSomething(p, i);
}
Now, it is possible that inside DoSomething(), p.n is increased because new elements are added to p.l. I'd like to process these elements in a parallel fashion. OpenMP manual states that parallel for can't be used with lists, so DoSomething() adds these p.l's new elements to another list which is processed sequentially and then it is joined back with p.l. I don't like this workaround. Anyone knows a cleaner way to do this?
A construct to support dynamic execution was added in OpenMP 3.0: the task construct. Tasks are added to a queue and then executed as concurrently as possible. A sample code would look like this:
#pragma omp parallel private(i)
{
    #pragma omp single
    for (i = 0; i < p.n; ++i)
    {
        #pragma omp task
        DoSomething(p, i);
    }
}
This will spawn a new parallel region. One of the threads executes the for loop and creates a new OpenMP task for each value of i. Each DoSomething() call is converted to a task that later executes on an idle thread. There is a problem, though: if one of the tasks adds new values to p.l, this might happen after the creator thread has already exited the for loop. This can be fixed using task synchronisation constructs and an outer loop, like this:
#pragma omp single
{
    i = 0;
    while (i < p.n)
    {
        for (; i < p.n; ++i)
        {
            #pragma omp task
            DoSomething(p, i);
        }
        #pragma omp taskwait
        #pragma omp flush
    }
}
The taskwait construct makes the thread wait until all tasks queued so far have finished executing. If new elements were added to the list, the condition of the while becomes true again and a new round of task creation happens. The flush construct synchronises the memory view between threads, e.g. updating values cached in registers from the shared storage.
OpenMP 3.0 is supported by all modern C compilers except MSVC, which is stuck at OpenMP 2.0.

how to use omp barrier on a while loop with no equal number of iterations for threads

I'm trying to implement list ranking (also known as shortcutting, or pointer jumping) with OpenMP, to get the prefix sums of the array W.
I don't know if I'm using the flush pragma correctly, and I get this warning when compiling: "barrier region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region".
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int Q[9] = {1, 2, 3, 4, 5, 6, 7, 8, 0};
    int W[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int i, j = 6, id;

    printf("Before:\n");
    for (j = 0; j < 8; j++)
        printf("%d", W[j]);
    printf("\n");

    #pragma omp parallel for shared(Q,W) private(id) num_threads(7)
    for (i = 6; i >= 0; i--)
    {
        id = omp_get_thread_num();
        while ((Q[i] != 0) && (Q[Q[i]] != 0))
        {
            #pragma omp flush(W)
            W[i] = W[i] + W[Q[i]];
            #pragma omp flush(W)
            printf("Am %d \t W[%d]= %d", id, i, W[i]);
            #pragma omp barrier
            #pragma omp flush(Q)
            Q[i] = Q[Q[i]];
            #pragma omp flush(Q)
            printf("Am %d \n Q[%d]= %d", id, i, Q[i]);
        }
    }

    printf("Result:\n");
    for (j = 0; j < 8; j++)
        printf("%d \t", W[j]);
    printf("\n");
}
You can't use a barrier inside an omp parallel for; barriers are essentially only legal directly inside a plain omp parallel region.
The reason is that the iterations of a worksharing loop are distributed across the threads, and with your data-dependent while loop each thread would reach the barrier a different number of times, so the team could never line up at it. That is exactly what the compiler warning about barriers "closely nested inside of work-sharing" regions is telling you.
I didn't look up the algorithm here, but two reasonable choices are to refactor into two parallel for loops, one after the other, where the barrier currently is (each worksharing loop ends in an implicit barrier), or to refactor the algorithm into a #pragma omp parallel region where explicit barriers are legal.
I looked up the list ranking algorithm; you would be well served to find an implementation of prefix sum, or scan, if you must use OpenMP.
-Rick
