Iteration variable error in for loop: C program, OpenMP

I am trying to execute a piece of code in C using OpenMP. The following is the code:
#pragma omp parallel \
    reduction(+:array[length])
{
    int start = 1, distance, nthreads;
    nthreads = omp_get_num_threads();
    printf("%d\n", nthreads);
    #pragma omp for
    for (distance = 1; distance < length; distance = distance + distance)
    {
        for (i = length - 1; i >= start; i--)
        {
            array[i] = array[i] + array[i - distance];
        }
        start *= 2;
    }
}
The compiler is throwing the following error:
**error**: increment expression refers to iteration variable ‘distance’
#pragma omp for
I tried searching for this error online, but didn't find much. Any help in decoding the error would be useful.
Also, should the reduction clause sit next to #pragma omp parallel, or on the #pragma omp for?

The OpenMP loop work-sharing construct requires a so-called canonical loop form: you can only increment the loop variable by a loop-invariant value. You have to restructure your loop, e.g. by iterating over a step index and recovering distance with <<. Also note that your use of start is not correct: with the iterations divided among threads, each thread only doubles its own private copy. Compute start from the loop iteration instead.
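As an illustration, here is a minimal sketch of one such restructuring of the work-shared loop (array and length come from the question; note that in the original loop start always equals distance, so both can be derived from a step index). This fixes only the canonical-form error; whether the doubling steps can safely be divided among threads is a separate question:

int nsteps = 0;
int step;
for (int d = 1; d < length; d += d)
    nsteps++;                          /* number of doubling steps */

#pragma omp for
for (step = 0; step < nsteps; step++)  /* increment is now loop-invariant */
{
    int distance = 1 << step;          /* 1, 2, 4, ... */
    int start = distance;              /* derived from the iteration, not accumulated */
    for (int i = length - 1; i >= start; i--)
        array[i] = array[i] + array[i - distance];
}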

Related

Parallelizing inner loop with residual calculations in OpenMP with SSE vectorization

I'm trying to parallelize the inner loop of a program that has data dependencies (min) outside the scope of the loops. I'm having an issue where the residual calculations occur outside the scope of the inner j loop. The code gets errors if the "#pragma omp parallel" part is included on the j loop, even if the loop doesn't run at all because k is too low, say (1, 2, 3) for example.
for (i = 0; i < 10; i++)
{
    #pragma omp parallel for shared(min) private(j, mm_a, mm_b, storer, arr)
    for (j = 0; j < k - 4; j += 4)
    {
        mm_a = _mm_load_ps(&x[j]);
        mm_b = _mm_load_ps(&y[j]);
        mm_a = _mm_add_ps(mm_a, mm_b);
        _mm_store_ps(storer, mm_a);
        #pragma omp critical
        {
            if (storer[0] < min)
            {
                min = storer[0];
            }
            if (storer[1] < min)
            {
                min = storer[1];
            }
            // etc.
        }
    }
    do
    {
        #pragma omp critical
        {
            if (x[j] + y[j] < min)
            {
                min = x[j] + y[j];
            }
        }
    } while (j++ < (k - 1));
    round_min = min;
}
The j-based loop is a parallel loop, so you cannot use j after the loop. This is especially true since you explicitly made j private, so it is only visible locally in each thread and not outside the parallel region. You can explicitly compute the position of the remaining j value as (k-4+3)/4*4 just after the parallel loop.
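For instance, a minimal sketch of that remainder handling (j_rem is a hypothetical name; x, y, k and min come from your snippet):

int j_rem = (k - 4 + 3) / 4 * 4;   /* first index the 4-wide SIMD loop did not reach */
for (int jj = j_rem; jj < k; jj++)
    if (x[jj] + y[jj] < min)
        min = x[jj] + y[jj];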
Furthermore, here are a few important points:
You may not really need to vectorize the code yourself: you can use omp simd reduction. OpenMP can do all the boring work of the residual calculations for you automatically. Moreover, the code will be portable and much simpler. The generated code will also likely be faster than yours. Note, however, that some compilers might not be able to vectorize the code (GCC and ICC do, while Clang and MSVC often need some help).
Critical sections (omp critical) are very costly. In your case they will simply annihilate any improvement from the parallel section, and the code will likely be slower due to cache-line bouncing.
Reading data written by _mm_store_ps is inefficient here, although some compilers (like GCC) may be able to understand the logic of your code and generate a faster implementation (extracting lane data).
Horizontal SIMD reductions are inefficient. Use vertical ones, which are much faster and can easily be applied here.
Here is corrected code taking the above points into account:
for (i = 0; i < 10; i++)
{
    // Assume min is already initialized correctly here
    #pragma omp parallel for simd reduction(min:min) private(j)
    for (j = 0; j < k; ++j)
    {
        const float tmp = x[j] + y[j];
        if (tmp < min)
            min = tmp;
    }
    // Use min here
}
The above code is vectorized correctly for the x86 architecture by GCC/ICC (both with -O3 -fopenmp), Clang (with -O3 -fopenmp -ffast-math) and MSVC (with /O2 /fp:precise -openmp:experimental).

Counting does not work properly in OpenMP

I have the function
void collatz(int startNumber, int endNumber, int* iter, int nThreads)
{
    int i, n, counter;
    int isodd; /* 1 if n is odd, 0 if even */
    #pragma omp parallel for
    for (i = startNumber; i <= endNumber; i++)
    {
        counter = 0;
        n = i;
        omp_set_num_threads(nThreads);
        while (n > 1)
        {
            isodd = n % 2;
            if (isodd)
                n = 3*n + 1;
            else
                n /= 2;
            counter++;
        }
        iter[i - startNumber] = counter;
    }
}
It works as I wish when run serially (i.e. compiled without OpenMP, or with #pragma omp parallel for and omp_set_num_threads(nThreads); commented out). However, the parallel version produces wrong results, and I think it is because the counter variable needs to be set to zero at the beginning of each loop iteration while another thread may be working with a non-zeroed counter value. But even if I use #pragma omp parallel for private(counter), the problem still occurs. What am I missing?
I compile the program as C89.
Inside your OpenMP parallel region, you are assigning values to the counter, n and isodd scalar variables. They therefore cannot simply be shared, as they are by default; you need to pay extra attention to them.
A quick analysis shows that their values are only meaningful inside the parallel region, and only to the current thread, so it becomes clear that they need to be declared private.
Adding a private( counter, n, isodd ) clause to your #pragma omp parallel directive should fix the issue.
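For what it's worth, a minimal sketch of the fixed function with that clause applied (as an aside, omp_set_num_threads is moved out of the loop here, since calling it inside a parallel region does not affect the region already running):

void collatz(int startNumber, int endNumber, int* iter, int nThreads)
{
    int i, n, counter, isodd;
    omp_set_num_threads(nThreads);     /* set the thread count once, up front */
    #pragma omp parallel for private(counter, n, isodd)
    for (i = startNumber; i <= endNumber; i++)
    {
        counter = 0;                   /* each thread now zeroes its own copy */
        n = i;
        while (n > 1)
        {
            isodd = n % 2;
            if (isodd)
                n = 3*n + 1;
            else
                n /= 2;
            counter++;
        }
        iter[i - startNumber] = counter;
    }
}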

Error while working with OpenMP: Parallel reduction calculation is invalid

While working with C and OpenMP to process a set of data in parallel, I keep getting the following errors with my for loops:
Parallel reduction calculation is invalid!
Parallel atomic calculation is invalid!
The code is:
#pragma omp parallel for num_threads(numberOfThreads \
        reduction(+:number_in_circle) shared(count)
for (count = 0; count < iterations; count++)
    // calculate number in circle

#pragma omp parallel for num_threads(numberOfThreads) private(x, y, \
        dist_sqrd) shared(count, number_in_circle, iterations)
for (count = 0; count < iterations; count++)
    // calculate number_in_circle using atomic instruction to add to it
Is there something wrong with my syntax, or is something wrong with the loop itself?
I'm not sure your copy of the OpenMP directives is 100% correct, but there are definitely issues with the ones shown here:
#pragma omp parallel for num_threads(numberOfThreads \
reduction(+:number_in_circle) shared(count)
for(count = 0; count < iterations; count++)
num_threads(numberOfThreads is missing the closing parenthesis.
shared(count) is invalid since count is the index of the for loop you want to parallelise. Such an index is made private automatically, and explicitly declaring it shared is forbidden by the OpenMP standard.
The same remark goes for the second directive you cited.
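For reference, a minimal sketch of the first directive with those two issues fixed (number_in_circle and iterations taken from your snippet; the loop index needs no clause at all):

#pragma omp parallel for num_threads(numberOfThreads) \
        reduction(+:number_in_circle)
for (count = 0; count < iterations; count++)
    // calculate number in circle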
Regarding the atomic and reduction clause errors, there isn't enough in your code snippet to give any advice.

Multiple pragmas directives on for loop (C and VS 2013)

I'm trying to use OpenMP to split a for-loop computation across multiple threads. Additionally, I'm trying to instruct the compiler to vectorize the chunk assigned to each thread. The code is the following:
#pragma omp for private(i)
__pragma(loop(ivdep))
for (i = 0; i < 4096; i++)
    vC[i] = vA[i] + SCALAR * vB[i];
The problem is that both pragmas expect the for loop to come right after them.
Is there any smart construct to make this work?
Some might argue that, due to the for-loop splitting done by OpenMP, the vectorization of the loop won't work. However, I read that #pragma omp for divides the loop into a number of contiguous chunks equal to the thread count. Is this right?
What about using #pragma omp for simd private(i) instead of the pragma + __pragma() combination?
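In context that would look something like the following sketch (assuming an OpenMP 4.0 compiler and the vA/vB/vC and SCALAR definitions from the question):

#pragma omp parallel
{
    #pragma omp for simd
    for (int i = 0; i < 4096; i++)     /* i declared in the loop is private automatically */
        vC[i] = vA[i] + SCALAR * vB[i];
}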
Edit: since OpenMP 4 doesn't seem to be an option for you, you can manually split your loop to get rid of the #pragma omp for: compute the index limits by hand using omp_get_num_threads() and omp_get_thread_num(), and keep the ivdep hint for the per-thread loop.
Edit 2: since I'm a nice guy, and since this is boilerplate (more common when programming in MPI, but still) that is quite annoying to get right the first time, here is a possible solution:
#pragma omp parallel
{
    int n = 4096;
    int tid = omp_get_thread_num();
    int nth = omp_get_num_threads();
    int chunk = n / nth;                             /* base chunk size per thread */
    int beg = tid * chunk + min( tid, n % nth );     /* min() is assumed defined, e.g. as a macro */
    int end = ( tid + 1 ) * chunk + min( tid + 1, n % nth );
    #pragma ivdep
    for ( int i = beg; i < end; i++ ) {
        vC[i] = vA[i] + SCALAR * vB[i];
    }
}
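To see how the splitting works (example values assumed for illustration): with n = 10 and nth = 4, chunk is 2 and n % nth is 2, so the per-thread ranges come out as [0,3), [3,6), [6,8) and [8,10); the first n % nth threads each take one extra element.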

OpenMP Parallel for with scheduling in C

I want to run a parallel, scheduled (e.g. static/dynamic/guided) for loop where each thread has its own set of variables, based on its thread id. I know that any variable declared within the parallel pragma is private, but I don't want to re-declare the variables in every iteration of the for loop.
In my specific situation, I'm counting whether a set of generated coordinates lies inside or outside of a circle, to approximate pi.
I'm using erand48(unsigned short seed[3]) to generate these coordinates in each of the threads, and by giving each thread a different set of values for seed, I get a greater variety of numbers to use in the approximation (which is also a requirement for this simulation).
long long int global_result = 0;
int tid = omp_get_thread_num();
unsigned short seed[3];
seed[0] = (((tid*tid + 15) * 3) / 7);
seed[1] = ((((tid + tid) * 44) / 3) + 2);
seed[2] = tid;
int this_result = 0;
double x, y;
#pragma omp parallel for num_threads(thread_count) schedule(runtime)
for (i = 0; i < chunksize; i++) {
    x = erand48(seed);
    y = erand48(seed);
    if ((x*x + y*y) >= 1)
        this_result++;
}
#pragma omp critical
{
    global_result += this_result;
}
This is the closest I can get to representing what I'm trying to do. I want the values of this_result, tid and seed to have private scope.
I know that any variable declared within the parallel pragma is
private, but I don't want to re-declare the variables in every
iteration of the for loop.
Separate the #pragma omp parallel for into its two components, #pragma omp parallel and #pragma omp for. Then you can declare the local variables inside the parallel region but outside the loop.
Something like this:
long long int global_result = 0;
#pragma omp parallel reduction(+:global_result)
{
    int i;
    int tid = omp_get_thread_num();
    /* erand48() takes an unsigned short[3] state; seed it per thread */
    unsigned short seed[3];
    seed[0] = (((tid*tid + 15) * 3) / 7);
    seed[1] = ((((tid + tid) * 44) / 3) + 2);
    seed[2] = tid;
    long long int this_result = 0;
    // Typo, as commented below
    // #pragma omp parallel for schedule(runtime)
    // What is intended!
    #pragma omp for schedule(runtime)
    for (i = 0; i < chunksize; i++) {
        double x = erand48(seed);
        double y = erand48(seed);
        if ((x*x + y*y) >= 1)
            this_result++;
    }
    global_result += this_result;
}
There are better ways to calculate pi, though :-)
You can use the private clause in your #pragma directive like this:
#pragma omp parallel for private(this_result, tid, seed) num_threads(thread_count) schedule(runtime)
If I understood your question correctly, that should do it. Note, though, that private variables are uninitialized on entry to the parallel region, so values assigned to tid and seed before the directive would need firstprivate instead in order to carry over.
