Problems with gcc 7 and 8 (Debian) in OpenMP offloading to nvptx - C

I installed gcc-7 and gcc-8, together with gcc-7-offload-nvptx and gcc-8-offload-nvptx.
With both I tried to compile a simple OpenMP program with offloading:
#include <omp.h>
#include <stdio.h>

int main(){
    #pragma omp target
    #pragma omp teams distribute parallel for
    for (int i = 0; i < omp_get_num_threads(); i++)
        printf("%d in %d of %d\n", i, omp_get_thread_num(), omp_get_num_threads());
}
using the following command line (and likewise with gcc-7):
gcc-8 code.c -fopenmp -foffload=nvptx-none
But the build fails at the link step, giving the following errors:
/tmp/ccKESWcF.o: In function "main":
teste.c:(.text+0x50): undefined reference to "GOMP_target_ext"
/tmp/cc0iOH1Y.target.o: In function "init":
ccPXyu6Y.c:(.text+0x1d): undefined reference to "GOMP_offload_register_ver"
/tmp/cc0iOH1Y.target.o: In function "fini":
ccPXyu6Y.c:(.text+0x41): undefined reference to "GOMP_offload_unregister_ver"
collect2: error: ld returned 1 exit status
Any clues?

Your code compiles and runs for me using -foffload=disable -fno-stack-protector with gcc-7 and gcc-7-offload-nvptx on Ubuntu 17.10.
But targeting the GPU (i.e. without -foffload=disable), it fails to compile, because you can't call printf from the GPU. Instead you can do this:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    int nthreads;
    /* ask the device how many threads it runs */
    #pragma omp target teams map(tofrom:nthreads)
    #pragma omp parallel
    #pragma omp single
    nthreads = omp_get_num_threads();

    int *ithreads = malloc(sizeof *ithreads * nthreads);
    /* record each thread's id on the device */
    #pragma omp target teams distribute parallel for map(tofrom:ithreads[0:nthreads])
    for (int i = 0; i < nthreads; i++) ithreads[i] = omp_get_thread_num();

    /* print the results back on the host */
    for (int i = 0; i < nthreads; i++)
        printf("%d in %d of %d\n", i, ithreads[i], nthreads);
    free(ithreads);
}
For me this outputs
0 in 0 of 8
1 in 0 of 8
2 in 0 of 8
3 in 0 of 8
4 in 0 of 8
5 in 0 of 8
6 in 0 of 8
7 in 0 of 8
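If you want to check whether a target region really ran on the GPU or silently fell back to the host, one option is to map back the result of omp_is_initial_device() (a standard OpenMP 4.0 call); a minimal sketch:
#include <omp.h>
#include <stdio.h>

int main(void){
    int on_host = 1;
    /* on_host becomes 0 when the region actually executes on the device */
    #pragma omp target map(tofrom:on_host)
    on_host = omp_is_initial_device();
    printf("target region ran on the %s\n", on_host ? "host" : "device");
}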

Related

Why getting incorrect results from an OpenMP program?

I'm writing a simple example to understand how things work in OpenMP programs.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char* argv[]){
    omp_set_num_threads(4);
    int j = 0;
    #pragma omp parallel private(j)
    {
        int i;
        for (i = 1; i < 2; i++){
            printf("from thread %d : i is equal to %d and j is equal to %d\n",
                   omp_get_thread_num(), i, j);
        }
    }
}
So in this example I should get j == 0 each time; unfortunately, the result is j == 0 three times and j == 32707 once.
What is wrong with my example?
Use firstprivate(j) rather than private(j) if you want each thread to have a private copy of j initialised with the value j had before entering the parallel region. With plain private(j), the private copies are left uninitialised, which is where the garbage value 32707 comes from.
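A minimal sketch of the difference (with hypothetical values, not from the question):
#include <omp.h>
#include <stdio.h>

int main(void){
    int j = 42;
    /* firstprivate: each thread's private copy of j starts at 42 */
    #pragma omp parallel firstprivate(j) num_threads(4)
    printf("thread %d sees j = %d\n", omp_get_thread_num(), j);
    /* with private(j) instead, each copy would be uninitialised and
       could hold any value, like the 32707 observed above */
    return 0;
}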

OpenMP not showing correct thread number - C

I have a simple program that uses OpenMP to run 4 threads, each of which reads a different text file and finds anagrams. I am just trying to figure out why the last thread reported shows a thread number of 26478; I can't quite figure it out. The function countAnagrams doesn't do anything with tid; it just prints it to the screen when the function is done running.
Below is my code and the output. Any help would be greatly appreciated.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void countAnagrams(char* fileName, int threadNum);

int main()
{
    char *fileNames[] = {"AnagramA.txt", "AnagramB.txt", "AnagramC.txt", "AnagramD.txt"};
    int i;
    int tid;
    int nthreads = 4;
    omp_set_num_threads(nthreads);

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[0], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[1], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[2], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[3], tid); }
        }
    }
}
Output:
Filename: AnagramD.txt
Hello from thread: 1
Number of anagrams: 286
Longest anagram: 8
Filename: AnagramB.txt
Hello from thread: 0
Number of anagrams: 1148
Longest anagram: 8
Filename: AnagramC.txt
Hello from thread: 2
Number of anagrams: 5002
Longest anagram: 8
Filename: AnagramA.txt
Hello from thread: 26478
Number of anagrams: 3184
Longest anagram: 8
What's causing your issue is that you have not declared your thread ID variable private when creating your parallel region, so the threads stomp on each other's writes to tid and garbage can result. To fix this, make sure that all variables that should only be accessible by a single thread are declared private, like so:
#pragma omp parallel private(tid)
The likely cause of this problem is that tid is declared in the main function, so it is shared between the threads by default. Try it in the following manner:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void countAnagrams(char* fileName, int threadNum);

int main()
{
    char *fileNames[] = {"AnagramA.txt", "AnagramB.txt", "AnagramC.txt", "AnagramD.txt"};
    int tid;
    int nthreads = 4;
    omp_set_num_threads(nthreads);

    #pragma omp parallel private(tid) /* now each thread has its own private copy of tid */
    {
        #pragma omp sections
        {
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[0], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[1], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[2], tid); }
            #pragma omp section
            { tid = omp_get_thread_num();
              countAnagrams(fileNames[3], tid); }
        }
    }
}
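An equivalent fix, since variables declared inside a parallel region are automatically private, is to drop the clause and declare the thread id locally in each section; a sketch of one section:
#pragma omp section
{
    int tid = omp_get_thread_num(); /* local to the section, so private by construction */
    countAnagrams(fileNames[0], tid);
}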

Segmentation fault when calling an mpf function

I wrote the following code, then compiled and ran the program. A segmentation fault occurs when calling mpf_set_si, but I can't understand why.
OS: Mac OS X 10.9.2
Compiler: i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
#include <stdio.h>
#include <gmp.h>
#include <math.h>

#define NUM_ITTR 1000000

int
main(void)
{
    unsigned long int i, begin, end, perTh;
    mpf_t pi, gbQuaterPi, quaterPi, pw, tmp;
    int tn, nt;

    mpf_init(quaterPi);
    mpf_init(gbQuaterPi);
    mpf_init(pw);
    mpf_init(tmp);
    mpf_init(pi);

    #pragma omp parallel private(tmp, pw, quaterPi, tn, begin, end, i)
    {
#ifdef OMP
        tn = omp_get_thread_num();
        nt = omp_get_num_threads();
        perTh = NUM_ITTR / nt;
        begin = perTh * tn;
        end = begin + perTh - 1;
#else
        begin = 0;
        end = NUM_ITTR - 1;
#endif
        for (i = begin; i <= end; i++){
            printf("Before set begin=%lu %lu tn= %d\n", begin, end, tn);
            mpf_set_si(tmp, -1); /* segmentation fault occurs here */
            printf("After set begin=%lu %lu tn= %d\n", begin, end, tn);
            mpf_pow_ui(pw, tmp, i);
            mpf_set_si(tmp, 2);
            mpf_mul_ui(tmp, tmp, i);
            mpf_add_ui(tmp, tmp, 1);
            mpf_div(tmp, pw, tmp);
            mpf_add(quaterPi, quaterPi, tmp);
        }
        #pragma omp critical
        {
            mpf_add(gbQuaterPi, gbQuaterPi, quaterPi);
        }
    }
    mpf_mul_ui(pi, gbQuaterPi, 4);
    gmp_printf("pi= %.30ZFf\n", pi);

    mpf_clear(pi);
    mpf_clear(tmp);
    mpf_clear(pw);
    mpf_clear(quaterPi);
    mpf_clear(gbQuaterPi);
    return 0;
}
-Command line-
$ setenv OMP_NUM_THREADS 2
$ gcc -g -DOMP -I/opt/local/include -fopenmp -o calcpi calcpi.c -lgmp -L/opt/local/lib
$ ./calcpi
Before set begin=0 499999 tn= 0
Before set begin=500000 999999 tn= 1
After set begin=1 999999 tn= 1
Segmentation fault
private variables are not initialised, so they can hold any value at the start of the parallel section. Here, each thread's private tmp, pw and quaterPi are uninitialised mpf_t objects, which is why the first mpf_set_si call crashes. Initialising them inside the parallel block works, though for plain types it often isn't the most efficient choice.
For ordinary scalars, a better way is usually firstprivate instead of private, which initialises each copy with the value the variable had before the parallel region. For GMP types it is not: mpf_t holds internal pointers, so a bitwise firstprivate copy would make the threads share the same storage. Instead, call mpf_init (and later mpf_clear) on each thread's private copies inside the parallel region.
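A minimal sketch of that fix applied to the code above (only the parallel region is shown; the loop body from the question is unchanged):
#pragma omp parallel private(tmp, pw, quaterPi, tn, begin, end, i)
{
    /* the private mpf_t copies start out uninitialised, so
       initialise them before any other GMP call */
    mpf_init(tmp);
    mpf_init(pw);
    mpf_init(quaterPi); /* initialised to 0, ready to accumulate */

    /* ... the original loop over i goes here ... */

    #pragma omp critical
    {
        mpf_add(gbQuaterPi, gbQuaterPi, quaterPi);
    }

    /* release the per-thread copies before leaving the region */
    mpf_clear(tmp);
    mpf_clear(pw);
    mpf_clear(quaterPi);
}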

OpenMP gathering data (join data?) after parallel for

What I am looking for is the best way to gather all the data from the parallel for loop into one variable. OpenMP seems to take a different approach from what I am used to seeing: I started by learning OpenMPI, which has scatter and gather routines.
Calculating PI (embarrassingly parallel routine)
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 100
#define CHUNKSIZE 20

int main(int argc, char *argv[])
{
    double step, x, pi, sum = 0.0;
    int i, chunk;
    chunk = CHUNKSIZE;
    step = 1.0/(double)NUM_STEPS;

    #pragma omp parallel shared(chunk) private(i,x,sum,step)
    {
        #pragma omp for schedule(dynamic,chunk)
        for (i = 0; i < NUM_STEPS; i++)
        {
            x = (i + 0.5) * step;
            sum = sum + 4.0/(1.0 + x*x);
            printf("Thread %d: i = %i sum = %f \n", omp_get_thread_num(), i, sum);
        }
        pi = step * sum;
    }
}
EDIT: It seems that I could use an array sum[NUM_STEPS / CHUNKSIZE] and then sum the array into one value, or would it be better to use some sort of blocking routine to sum the result of each iteration?
Add this clause to your #pragma omp parallel ... statement:
reduction(+ : pi)
Then just do pi += step * sum; at the end of the parallel region. (Notice the plus!) OpenMP will then automagically sum up the partial sums for you.
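Applied to the code above, that suggestion looks roughly like this (a sketch: sum stays private per thread, pi is the reduction variable, step is moved to firstprivate so each thread keeps its value, and pi must be initialised before the region):
pi = 0.0;
#pragma omp parallel shared(chunk) private(i,x,sum) firstprivate(step) reduction(+:pi)
{
    sum = 0.0; /* per-thread partial sum */
    #pragma omp for schedule(dynamic,chunk)
    for (i = 0; i < NUM_STEPS; i++)
    {
        x = (i + 0.5) * step;
        sum += 4.0/(1.0 + x*x);
    }
    pi += step * sum; /* each thread adds its part; OpenMP combines the copies of pi */
}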
Let's see. I am not quite sure what happens, because I haven't got deterministic behaviour from the finished application, but I have something that resembles π. I removed the #pragma omp parallel shared(chunk) and changed the #pragma omp for schedule(dynamic,chunk) to #pragma omp parallel for schedule(dynamic) reduction(+:sum).
#pragma omp parallel for schedule(dynamic) reduction(+:sum)
This requires some explanation. I removed the schedule's chunk just to make it all simpler (for me). The part you are interested in is reduction(+:sum), which is a normal reduce operation with the operator + applied to the variable sum.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 100

int main(int argc, char *argv[])
{
    double step, pi, sum = 0.0;
    int i;
    step = 1.0/(double)NUM_STEPS;

    #pragma omp parallel for schedule(dynamic) reduction(+:sum)
    for (i = 0; i < NUM_STEPS; i++)
    {
        double x = (i + 0.5) * step; /* declared inside the loop, so private to each thread */
        sum += 4.0/(1.0 + x*x);
        printf("Thread %d: i = %i sum = %f\n", omp_get_thread_num(), i, sum);
    }
    pi = step * sum;
    printf("pi=%lf\n", pi);
}
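For reference, this should build and run with something like the following (assuming the file is saved as pi.c):
$ gcc -fopenmp pi.c -o pi
$ ./pi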

Segfault with simple recursive openmp task

I am trying to parallelize this recursive function with OpenMP:
#include <stdio.h>
#include <omp.h>

void rec(int from, int to){
    int len = to - from;
    printf("%X %x %X %d\n", from, to, len, omp_get_thread_num());
    if (len > 1){
        int mid = (from + to) / 2;
        #pragma omp task
        rec(from, mid);
        #pragma omp task
        rec(mid, to);
    }
}

int main(int argc, char *argv[]){
    long len = 1024;
    #pragma omp parallel
    #pragma omp single
    rec(0, len);
    return 0;
}
But when I run it, I get a segfault:
$g++ -fopenmp -Wall -pedantic -lefence -g -O0 test.cpp && ./a.out
0 400 400 0
0 200 200 1
200 400 200 0
Segmentation fault
When I run it under valgrind, it shows no errors. Without -lefence it also works.
I have tried all possible combinations of #pragma omp clauses, and the result is either single-threaded execution or a segfault.
What is wrong?
Thanks a lot.
