Suppose I would like to run the functions in parallel.
void foo()
{
    foo1(args);
    foo2(args);
    foo3(args);
    foo4(args);
}
I want these function calls to run in parallel. How can I do that with OpenMP in C?
Assuming that the code is running serially when you enter foo(), you have a couple of different options.
Option 1: use sections
void foo()
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            foo1(args);
            #pragma omp section
            foo2(args);
            #pragma omp section
            foo3(args);
            #pragma omp section
            foo4(args);
        }
    }
}
Option 2: use tasks
void foo()
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task
            foo1(args);
            #pragma omp task
            foo2(args);
            #pragma omp task
            foo3(args);
            #pragma omp task
            foo4(args);
        }
    }
}
Tasks are the more modern way of expressing this, and, potentially, allow you more freedom in controlling the execution.
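For something that can be compiled and tried directly, here is a minimal sketch of the task variant. The names work, results and run_all are hypothetical stand-ins for foo1..foo4 and foo, and the sketch assumes the four calls are independent (each writes only to its own slot). Built without -fopenmp, the pragmas are ignored and the calls simply run one after another.

```c
/* Hypothetical stand-ins for foo1..foo4: each call writes to its own slot,
   so the four calls are independent and safe to run concurrently. */
static int results[4];

static void work(int slot, int value)
{
    results[slot] = value * value;
}

/* Creates one task per call and waits until all of them have finished. */
void run_all(void)
{
    #pragma omp parallel
    {
        /* Only one thread creates the tasks; any thread of the team may run them. */
        #pragma omp single
        {
            #pragma omp task
            work(0, 1);
            #pragma omp task
            work(1, 2);
            #pragma omp task
            work(2, 3);
            #pragma omp task
            work(3, 4);

            /* Explicit wait for the four child tasks. The implicit barrier
               at the end of the single region would also complete them. */
            #pragma omp taskwait
        }
    }
}
```

Calling run_all() from serial code and then reading results gives the same values as four plain calls would; only the order in which the calls execute is unspecified.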
The following snippet is from one of the functions of my code:
static int i;
#pragma omp parallel for default(shared) private(i) schedule(static,1)
for (i=0; i<ttm_ic_last; i++)
{
    static int ni, ni1, ni2;
    static double ni_ratio;
    static double temp_e, temp_l;
    ...
}
It's odd that when I comment out the line starting with #pragma it works properly; otherwise the loop doesn't touch at least some of the intended values of i. (I'm not sure if 'touch' is the correct verb here.)
I'm using a workstation with
gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
I wonder what the cause of this error can be.
(Answer by Stefan)
Don't use static variables when OpenMP threads are involved.
The thing is, a static variable has a single storage location that is shared by all threads, so the threads are likely to interfere with each other. Your parallel loop iterations are all looking inside the same box.
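A minimal sketch of the fix, assuming the temporaries do not need to keep their values between iterations: declare them as ordinary automatic variables inside the loop body, so every thread works on its own copy. The function fill_squares and the size N are made up for illustration and are not taken from the question's code.

```c
#define N 8   /* arbitrary size, chosen only for this illustration */

/* Fills out[i] with i * i. The temporary is an automatic variable declared
   inside the loop body, so each thread (and each iteration) has its own copy. */
void fill_squares(int out[N])
{
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < N; i++) {
        int tmp = i * i;   /* automatic, not static: private to the executing thread */
        out[i] = tmp;
    }
}
```

Declaring the loop counter in the for statement also removes the need for the private(i) clause.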
I want to test #pragma omp parallel for and #pragma omp simd on a simple matrix addition program. When I use each of them separately, I get no error and everything seems fine. But I want to measure how much performance can be gained by using both of them. If I put #pragma omp parallel for before the outer loop and #pragma omp simd before the inner loop, I get no error either. The error occurs when I put both of them before the outer loop. I get an error at runtime not compile time. ICC and GCC return an error but Clang doesn't. That might be because Clang rejects the parallelization: in my experiments, Clang does not parallelize and runs the program with only one thread.
The program is here:
#include <stdio.h>
//#include <x86intrin.h>
#define N 512
#define M N
int __attribute__(( aligned(32))) a[N][M],
    __attribute__(( aligned(32))) b[N][M],
    __attribute__(( aligned(32))) c_result[N][M];
int main()
{
    int i, j;
    #pragma omp parallel for
    #pragma omp simd
    for( i=0;i<N;i++){
        for(j=0;j<M;j++){
            c_result[i][j]= a[i][j] + b[i][j];
        }
    }
    return 0;
}
The errors are:
ICC:
IMP1.c(20): error: omp directive is not followed by a parallelizable for loop
  #pragma omp parallel for
  ^
compilation aborted for IMP1.c (code 2)
GCC:
IMP1.c: In function ‘main’:
IMP1.c:21:10: error: for statement expected before ‘#pragma’
  #pragma omp simd
Because in my other tests #pragma omp simd on the outer loop gives better performance, I need to put it there (don't I?).
Platform: Intel Core i7 6700 HQ, Fedora 27
Tested compilers: ICC 18, GCC 7.2, Clang 5
Compiler command line:
icc -O3 -qopenmp -xHOST -no-vec
gcc -O3 -fopenmp -march=native -fno-tree-vectorize -fno-tree-slp-vectorize
clang -O3 -fopenmp=libgomp -march=native -fno-vectorize -fno-slp-vectorize
From OpenMP 4.5 Specification:
2.11.4 Parallel Loop SIMD Construct
The parallel loop SIMD construct is a shortcut for specifying a parallel
construct containing one loop SIMD construct and no other statement.
The syntax of the parallel loop SIMD construct is as follows:
#pragma omp parallel for simd
...
You can also write:
#pragma omp parallel
{
    #pragma omp for simd
    for ...
}
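Applied to the matrix addition from the question, the combined construct could look like the sketch below. The array sizes and the function name add_matrices are arbitrary choices for illustration. The loop counters are declared inside the for statements so that each thread gets its own j; with j declared outside the loops, as in the question, it would be shared under parallel for. The combined directive needs a compiler that supports OpenMP 4.0 or later.

```c
#define ROWS 64   /* arbitrary sizes for illustration */
#define COLS 64

static int a[ROWS][COLS], b[ROWS][COLS], c_result[ROWS][COLS];

/* Element-wise sum of a and b into c_result. A single combined directive
   replaces the two stacked pragmas that the compilers rejected. */
void add_matrices(void)
{
    #pragma omp parallel for simd
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            c_result[i][j] = a[i][j] + b[i][j];
        }
    }
}
```

Whether this is faster than parallel for on the outer loop combined with simd on the inner loop depends on the compiler and the hardware, so it is worth timing both.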
My OpenMP program is like this:
#include <stdio.h>
#include <omp.h>
int main (void)
{
    int i = 10;
    #pragma omp parallel lastprivate(i)
    {
        printf("thread %d: i = %d\n", omp_get_thread_num(), i);
        i = 1000 + omp_get_thread_num();
    }
    printf("i = %d\n", i);
    return 0;
}
Compiling it with gcc produces the following error:
# gcc -fopenmp test.c
test.c: In function 'main':
test.c:8:26: error: 'lastprivate' is not valid for '#pragma omp parallel'
#pragma omp parallel lastprivate(i)
^~~~~~~~~~~
Why does OpenMP forbid the use of lastprivate in #pragma omp parallel?
The meaning of lastprivate is to assign "the sequentially last iteration of the associated loops, or the lexically last section construct [...] to the original list item."
Hence, it has no meaning for a pure parallel construct. It would not be a good idea to give it a meaning like "the last thread to exit the parallel construct" - that would be a race condition.
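For contrast, here is a small sketch of a construct where lastprivate is well defined, a parallel loop. The function name last_iteration_value is made up for illustration. The value copied back to x is the one assigned in the sequentially last iteration (i == n - 1), regardless of which thread happens to execute that iteration or which thread finishes last.

```c
/* Returns the value that x receives in the sequentially last iteration.
   Assumes n > 0, so that a last iteration exists. */
int last_iteration_value(int n)
{
    int x = -1;

    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < n; i++) {
        x = 1000 + i;   /* each thread writes only its own private copy of x */
    }

    return x;           /* 1000 + (n - 1), copied back from the last iteration */
}
```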
This is a simple test code:
#include <stdlib.h>
__thread int a = 0;
int main() {
    #pragma omp parallel default(none)
    {
        a = 1;
    }
    return 0;
}
gcc compiles this without any problems with -fopenmp, but icc (ICC) 12.0.2 20110112 with -openmp complains with
test.c(7): error: "a" must be specified in a variable list at enclosing OpenMP parallel pragma
#pragma omp parallel default(none)
I have no clue which paradigm (i.e. shared, private, threadprivate) applies to this type of variables. Which one is the correct one to use?
I get the expected behaviour when calling a function that accesses that thread local variable, but I have trouble accessing it from within an explicit parallel section.
Edit:
My best solution so far is to return a pointer to the variable through a function
static inline int * get_a() { return &a; }
__thread is roughly analogous to the effect that the threadprivate OpenMP directive has. To a great extent (read as when no C++ objects are involved), both are often implemented using the same underlying compiler mechanism and therefore are compatible but this is not guaranteed to always work. Of course, the real world is far from ideal and we have to sometimes sacrifice portability for just having things working within the given development constraints.
threadprivate is a directive and not a clause, therefore you have to do something like:
#include "header_providing_a.h"
#pragma omp threadprivate(a)
void parallel_using_a()
{
    #pragma omp parallel default(none) ...
    ... use 'a' here
}
GCC (at least version 4.7.1) treats __thread as implicit threadprivate declaration and you don't have to do anything.
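A self-contained sketch of the portable variant, using a plain file-scope variable with the threadprivate directive instead of __thread. The names counter and sum_of_thread_copies are hypothetical. A threadprivate variable has a predetermined data-sharing attribute, so it may be used inside a default(none) region without being listed in any clause.

```c
static int counter = 0;
#pragma omp threadprivate(counter)

/* Every thread sets its own copy of counter to 1; the copies are then
   summed, so the result equals the number of threads in the team. */
int sum_of_thread_copies(void)
{
    int total = 0;

    #pragma omp parallel default(none) reduction(+:total)
    {
        counter = 1;        /* writes this thread's private copy */
        total += counter;   /* combined across threads by the reduction */
    }

    return total;
}
```

Compiled without -fopenmp the pragmas are ignored and the function returns 1.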
I have a single block enclosed in a sections block like this
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main (int argc, char *argv[])
{
    int nthreads, tid;
    /* Fork a team of threads giving them their own copies of variables */
    #pragma omp parallel private(tid)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                printf("First section %d \n" , tid);
            }
            #pragma omp section
            {
                #pragma omp single
                {
                    printf("Second Section block %d \n" , tid);
                }
            }
        }
    } /* All threads join master thread and disband */
    printf("Outside parallel block \n");
}
When I compile this code, the compiler gives the following warning:
work-sharing region may not be closely nested inside of work-sharing, critical, ordered or master region
Why is that?
It gives you this warning because you have an OpenMP single region nested inside an OpenMP sections region without an OpenMP parallel region nested between them.
This is known as a closely nested region, and a worksharing region may not be closely nested inside another worksharing region.
In C, the worksharing constructs are for, sections, and single.
For further information see the OpenMP Specification or see Intel's Documentation on Improper nesting of OpenMP* constructs.
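In order to have the code compile cleanly, the simplest fix is to remove the #pragma omp single: each section is already executed exactly once, by one thread of the team.
If you really need a single construct at that point, it has to sit inside a new #pragma omp parallel region nested within the section.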
See Guide into OpenMP: Easy multithreading programming for C++ for more information and examples.
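One way to avoid the closely nested worksharing construct, sketched here under the assumption that the single was only meant to make the block run once: drop it and rely on the section itself. The counter array ran and the function run_sections are hypothetical and exist only to show that each block executes exactly once.

```c
static int ran[2];   /* how many times each section body has executed */

void run_sections(void)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                ran[0] += 1;
            }
            #pragma omp section
            {
                /* No single needed here: a section is executed exactly
                   once, by one thread of the team. */
                ran[1] += 1;
            }
        }
    }
}
```

After one call to run_sections(), both counters are 1 no matter how many threads the team has.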