I am trying to add OpenMP parallelization to a working code (just to a single for loop), however I cannot get rid of a segmentation fault. The problem arises from this line:
pos += sprintf(com + pos, "%d ", i);
com is a character array, and I tried defining it as char com[255] or char *com = malloc(255*sizeof(char)), both inside and before the for loop. I added private(com) to #pragma omp parallel for directive when I defined com before the loop. I also tried initializing it and using firstprivate. (pos is an integer, initialized to 0)
When I do not add -fopenmp everything works fine, but with -fopenmp it gives segfault. What am I missing?
The segmentation fault comes from multiple threads updating the value of pos at the same time, therefore setting it to some value that turns com + pos into a pointer that points beyond or before the allocated memory for com. The proper way to parallelise such a loop would be to concatenate the values in private strings and then concatenate the private strings in an ordered fashion:
char com[255];
int pos = 0;
#pragma omp parallel
{
char mycom[255];
int mypos = 0;
#pragma omp for schedule(static) nowait
for (int i = 0; i < N; i++)
mypos += sprintf(mycom + mypos, "%d ", i);
// Concatenate the strings in an ordered fashion
#pragma omp for schedule(static) ordered
for (int i = 0; i < omp_get_num_threads(); i++)
{
#pragma omp ordered
pos += sprintf(com + pos, "%s", mycom);
}
}
The ordered construct ensures proper synchronisation so one does not need critical. The use of schedule(static) is important in order to guarantee each thread processes a single contiguous section of the iteration space.
Related
I am trying to do a reduction on multiple variables (an array) using OMP, but wasn't sure how to implement it with OMP. See the code below.
#pramga omp parallel for reduction( ??? )
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
[ compute value ... ]
y[j] += value
}
}
I thought I could do something like this, with the atomic keyword, but realised this would prevent two threads from updating y at the same time even if they are updating different values.
#pramga omp parallel for
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
[ compute value ... ]
#pragma omp atomic
y[j] += value
}
}
Does OMP have any functionality for something like this or otherwise how would I achieve this optimally without OMP's reduction keyword?
There is an array reduction available in OpenMP since version 4.5:
#pramga omp parallel for reduction(+:y[:m])
where m is the size of the array. The only limitation here is that the local array used in reduction is always reserved on the stack, so it cannot be used in the case of large arrays.
The atomic operation you mentioned should work fine, but it may be less efficient than reduction. Of course, it depends on the actual circumstances (e.g. actual value of n and m, time to compute value, false sharing, etc.).
#pragma omp atomic
y[j] += value
I have a problem with my code, it should print number of appearances of a certain number.
I want parallelize this code with OpenMP, and I tried to use reduction for arrays but it's obviously didn't working as I wanted.
The error is: "segmentation fault". Should some variables be private? or it's the problem with the way I'm trying to use the reduction?
I think each thread should count some part of array, and then merge it somehow.
#pragma omp parallel for reduction (+: reasult[:i])
for (i = 0; i < M; i++) {
for(j = 0; j < N; j++) {
if ( numbers[j] == i){
result[i]++;
}
}
}
Where N is big number telling how many numbers I have. Numbers is array of all numbers and result array with sum of each number.
First you have a typo on the name
#pragma omp parallel for reduction (+: reasult[:i])
should actually be "result" not "reasult"
Nonetheless, why are you section the array with result[:i]? Based on your code, it seems that you wanted to reduce the entire array, namely:
#pragma omp parallel for reduction (+: result)
for (i = 0; i < M; i++)
for(j = 0; j < N; j++)
if ( numbers[j] == i)
result[i]++;
When one's compiler does not support the OpenMP 4.5 array reduction feature one can alternatively explicitly implement the reduction (check this SO thread to see how).
As pointed out by #Hristo Iliev in the comments:
Provided that M * sizeof(result[0]) / #threads is a multiple of the
cache line size, and even if it isn't when the value of M is large
enough, there is absolutely no need to involve reduction in the
process. Unless the program is running on a NUMA system, that is.
Assuming that the aforementioned conditions are met, and if you analyze carefully the outermost loop iterations (i.e., variable i) are assigned to the threads, and since the variable i is used to access the result array, each thread will be updating a different position of the result array. Therefore, you can simplified your code to:
#pragma omp parallel for
for (i = 0; i < M; i++)
for(j = 0; j < N; j++)
if ( numbers[j] == i)
result[i]++;
I am trying to use OpenMP to add the numbers in an array. The following is my code:
int* input = (int*) malloc (sizeof(int)*snum);
int sum = 0;
int i;
for(i=0;i<snum;i++){
input[i] = i+1;
}
#pragma omp parallel for schedule(static)
for(i=0;i<snum;i++)
{
int* tmpsum = input+i;
sum += *tmpsum;
}
This does not produce the right result for sum. What's wrong?
Your code currently has a race condition, which is why the result is incorrect. To illustrate why this is, let's use a simple example:
You are running on 2 threads and the array is int input[4] = {1, 2, 3, 4};. You initialize sum to 0 correctly and are ready to start the loop. In the first iteration of your loop, thread 0 and thread 1 read sum from memory as 0, and then add their respective element to sum, and write it back to memory. However, this means that thread 0 is trying to write sum = 1 to memory (the first element is 1, and sum = 0 + 1 = 1), while thread 1 is trying to write sum = 2 to memory (the second element is 2, and sum = 0 + 2 = 2). The end result of this code depends on which one of the threads finishes last, and therefore writes to memory last, which is a race condition. Not only that, but in this particular case, neither of the answers that the code could produce are correct! There are several ways to get around this; I'll detail three basic ones below:
#pragma omp critical:
In OpenMP, there is what is called a critical directive. This restricts the code so that only one thread can do something at a time. For example, your for-loop can be written:
#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
#pragma omp critical
sum += *tmpsum;
}
This eliminates the race condition as only one thread accesses and writes to sum at a time. However, the critical directive is very very bad for performance, and will likely kill a large portion (if not all) of the gains you get from using OpenMP in the first place.
#pragma omp atomic:
The atomic directive is very similar to the critical directive. The major difference is that, while the critical directive applies to anything that you would like to do one thread at a time, the atomic directive only applies to memory read/write operations. As all we are doing in this code example is reading and writing to sum, this directive will work perfectly:
#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
#pragma omp atomic
sum += *tmpsum;
}
The performance of atomic is generally significantly better than that of critical. However, it is still not the best option in your particular case.
reduction:
The method you should use, and the method that has already been suggested by others, is reduction. You can do this by changing the for-loop to:
#pragma omp parallel for schedule(static) reduction(+:sum)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
sum += *tmpsum;
}
The reduction command tells OpenMP that, while the loop is running, you want each thread to keep track of its own sum variable, and add them all up at the end of the loop. This is the most efficient method as your entire loop now runs in parallel, with the only overhead being right at the end of the loop, when the sum values of each of the threads need to be added up.
Use reduction clause (description at MSDN).
int* input = (int*) malloc (sizeof(int)*snum);
int sum = 0;
int i;
for(i=0;i<snum;i++){
input[i] = i+1;
}
#pragma omp parallel for schedule(static) reduction(+:sum)
for(i=0;i<snum;i++)
{
sum += input[i];
}
I am trying to use OpenMP to add the numbers in an array. The following is my code:
int* input = (int*) malloc (sizeof(int)*snum);
int sum = 0;
int i;
for(i=0;i<snum;i++){
input[i] = i+1;
}
#pragma omp parallel for schedule(static)
for(i=0;i<snum;i++)
{
int* tmpsum = input+i;
sum += *tmpsum;
}
This does not produce the right result for sum. What's wrong?
Your code currently has a race condition, which is why the result is incorrect. To illustrate why this is, let's use a simple example:
You are running on 2 threads and the array is int input[4] = {1, 2, 3, 4};. You initialize sum to 0 correctly and are ready to start the loop. In the first iteration of your loop, thread 0 and thread 1 read sum from memory as 0, and then add their respective element to sum, and write it back to memory. However, this means that thread 0 is trying to write sum = 1 to memory (the first element is 1, and sum = 0 + 1 = 1), while thread 1 is trying to write sum = 2 to memory (the second element is 2, and sum = 0 + 2 = 2). The end result of this code depends on which one of the threads finishes last, and therefore writes to memory last, which is a race condition. Not only that, but in this particular case, neither of the answers that the code could produce are correct! There are several ways to get around this; I'll detail three basic ones below:
#pragma omp critical:
In OpenMP, there is what is called a critical directive. This restricts the code so that only one thread can do something at a time. For example, your for-loop can be written:
#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
#pragma omp critical
sum += *tmpsum;
}
This eliminates the race condition as only one thread accesses and writes to sum at a time. However, the critical directive is very very bad for performance, and will likely kill a large portion (if not all) of the gains you get from using OpenMP in the first place.
#pragma omp atomic:
The atomic directive is very similar to the critical directive. The major difference is that, while the critical directive applies to anything that you would like to do one thread at a time, the atomic directive only applies to memory read/write operations. As all we are doing in this code example is reading and writing to sum, this directive will work perfectly:
#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
#pragma omp atomic
sum += *tmpsum;
}
The performance of atomic is generally significantly better than that of critical. However, it is still not the best option in your particular case.
reduction:
The method you should use, and the method that has already been suggested by others, is reduction. You can do this by changing the for-loop to:
#pragma omp parallel for schedule(static) reduction(+:sum)
for(i = 0; i < snum; i++) {
int *tmpsum = input + i;
sum += *tmpsum;
}
The reduction command tells OpenMP that, while the loop is running, you want each thread to keep track of its own sum variable, and add them all up at the end of the loop. This is the most efficient method as your entire loop now runs in parallel, with the only overhead being right at the end of the loop, when the sum values of each of the threads need to be added up.
Use reduction clause (description at MSDN).
int* input = (int*) malloc (sizeof(int)*snum);
int sum = 0;
int i;
for(i=0;i<snum;i++){
input[i] = i+1;
}
#pragma omp parallel for schedule(static) reduction(+:sum)
for(i=0;i<snum;i++)
{
sum += input[i];
}
I have a c program as below:
int a[10];
int b;
for(int i = 0; i < 10; i++)
function1(a[i]);
function1(b);
Now I want to parallelize all these 11 calls of function function1(). How can I do this using openmp?
I have tried
#pragma omp parallel sections
{
#pragma omp section
#pragmal omp parallel for
for(int i = 0; i < 10; i++)
function1(a[i]);
#pragma omp section
function1(b);
}
But the above code doesn't seem to work.
EDIT: Please read function1(b) as some different function, ie function2(b).
A simple way, that doesn't depend on OpemMP, is to add b to the a array.
This way, you have a single loop to parallelize.
Just make a 11 ints long, and put the value of b in the last one.
In a more general case (assuming the members of a are not integers, but something larger), you may want to change function1 to get a pointer. Then build another array, of 11 pointers. Set 10 to point to cells of a, the last to b.
In an even more general case, the function called for b is a different one (possibly with entirely different parameters). In this case, you can still use one loop:
for (i=0; i<11; i++) {
if (i<10) {
function1(a[i]);
} else {
function2(b);
}
}
The easiest way is using the parallel for pragma:
#pragma omp parallel for
for(int i = 0; i < 10; i++)
function1(a[i]);
Remember that you must turn on the appropiate switch for your compiler to enable OMP support. In GCC, for example, that switch is -fopenmp