I ran an example OpenMP program to test how threadprivate works, but the result is unexpected and nondeterministic. (The program runs on Ubuntu 14.04 in VMware.)
The source code is shown below:
#include <stdio.h>
#include <omp.h>
int counter = 0;
#pragma omp threadprivate (counter)
int inc_counter()
{
counter++;
return counter;
}
int main()
{
#pragma omp parallel sections copyin(counter)
{
#pragma omp section
{
int count1;
for(int iter = 0; iter < 100; ++iter)
count1 = inc_counter();
printf("count1 = %d\n", count1);
}
#pragma omp section
{
int count2;
for(int iter=0; iter<200; iter++)
count2 = inc_counter();
printf("count2 = %d\n", count2);
}
}
printf("counter = %d\n", counter);
}
The output of the program is:
The correct output should be:
count1 = 100
count2 = 200
counter = 0/100/200?
What's wrong with it?
threadprivate differs from private in that it doesn't have to allocate a fresh local variable for each thread; the variable only has to be unique per thread. One of your threads (the master) uses the global counter defined by int counter = 0;. That thread therefore changes this value to 100, to 200, or leaves it unchanged (0), depending on which sections that single thread happened to execute.
As you highlighted, it may seem strange that the program gives the following results for (count1, count2, counter): (100,300,300) and (100,300,0).
(100,300,300)
The master thread executes both sections. You can check this by launching your code with a single thread: OMP_NUM_THREADS=1 ./ex_omp
(100,300,0)
Some other thread executes both sections while the master is idle. You can check this by introducing a third section (alongside your two):
#pragma omp section
{
sleep(1); // #include <unistd.h>
printf("hope it is 0 (master) -> %d\n", omp_get_thread_num());
}
If you have 2 threads and the master starts executing this section, then with high probability the other thread executes your two other sections and you get (100,300,0) as expected. Launch, for example, as OMP_NUM_THREADS=2 ./ex_omp.
If it still seems wrong that count2 = 300, notice that counter is not private to a section; it is private to a thread, and one thread can execute both sections.
Related
I'd like to run something like the following:
for (int index = 0; index < num; index++)
I want to run the for loop with four threads, with the iterations executed in the order 0,1,2,3,4,5,6,7,8, etc.
That is, while the threads are working on index = n,(n+1),(n+2),(n+3) (in any particular order, but always in this pattern), I want iterations index = 0,1,2,...,(n-1) to already be finished.
Is there a way to do this? ordered doesn't really work here, since making the body an ordered section would basically remove all parallelism for me, and scheduling doesn't seem to work because I don't want a thread to be working on a contiguous block of iterations k through k+num/4.
Thanks for any help!
You can do this, not with a parallel for loop, but with a parallel region that manages its own loop inside, plus a barrier to make sure all running threads have reached the same point in it before any can continue. Example:
#include <stdatomic.h>
#include <stdio.h>
#include <omp.h>
int main()
{
atomic_int chunk = 0;
int num = 12;
int nthreads = 4;
omp_set_num_threads(nthreads);
#pragma omp parallel shared(chunk, num, nthreads)
{
// Note: this pattern relies on num being a multiple of nthreads; otherwise
// some threads exit the loop while others still wait at the barrier.
for (int index; (index = atomic_fetch_add(&chunk, 1)) < num; ) {
printf("In index %d\n", index);
fflush(stdout);
#pragma omp barrier
// For illustrative purposes only; not needed in real code
#pragma omp single
{
puts("After barrier");
fflush(stdout);
}
}
}
puts("Done");
return 0;
}
One possible output:
$ gcc -std=c11 -O -fopenmp -Wall -Wextra demo.c
$ ./a.out
In index 2
In index 3
In index 1
In index 0
After barrier
In index 4
In index 6
In index 5
In index 7
After barrier
In index 10
In index 9
In index 8
In index 11
After barrier
Done
I'm not sure I understand your request correctly. If I try to summarize my interpretation, it would be something like: "I want 4 threads sharing the iterations of a loop, with the 4 threads always running on at most 4 consecutive iterations of the loop".
If that's what you want, what about something like this:
int nths = 4;
#pragma omp parallel num_threads( nths )
for( int index_outer = 0; index_outer < num; index_outer += nths ) {
int end = min( index_outer + nths, num ); // min() being e.g. a macro or inline helper
#pragma omp for
for( int index = index_outer; index < end; index++ ) {
// the loop body just as before
} // there's a thread synchronization here
}
I want to have 2 thread groups running at the same time. For example, 2 threads executing code block 1 and another 2 threads executing another code segment. There was a Stack Overflow question here, OpenMP: Divide all the threads into different groups, and I changed its code to see if it suits the logic I need in my code.
I have the below code with me.
#include <stdio.h>
#include <omp.h>
#define NUM_THREADS 1
int main(int argc, char **argv) {
omp_set_nested(1); /* make sure nested parallelism is on */
int nprocs = omp_get_num_procs();
int nthreads1 = NUM_THREADS;
int nthreads2 = NUM_THREADS;
int t1[nthreads1];
for (int i=0; i<nthreads1; i++) {
t1[i] = 0;
}
#pragma omp parallel default(none) shared(nthreads1, nthreads2, t1) num_threads(2)
#pragma omp single
{
#pragma omp task // section 1
#pragma omp parallel for num_threads(nthreads1) shared(t1)
for (int i=0; i<nthreads1; i++) {
printf("Task 1: thread %d of the %d children of %d: handling iter %d\n",
omp_get_thread_num(), omp_get_team_size(2),
omp_get_ancestor_thread_num(1), i);
t1[i] = 1;
}
#pragma omp task // section 2
#pragma omp parallel for num_threads(nthreads2) shared(t1)
for (int j=0; j<nthreads2; j++) {
while (!t1[j]) {
printf("Task 2: thread %d of the %d children of %d: handling iter %d\n",
omp_get_thread_num(), omp_get_team_size(2),
omp_get_ancestor_thread_num(1), j);
}
}
}
return 0;
}
To check whether my code runs 2 thread groups at once, I set the thread count in each group to 1 and keep a boolean array initialized to 0.
In the first code segment, I set the boolean value to 1, and in the 2nd code segment, I check the boolean value to break out of the while loop. It seems like the above code is only run by 1 thread, because if that thread starts with the 2nd code block/section, it gets stuck inside the while loop: no other thread ever sets the boolean value to 1.
How to run 2 thread groups in parallel?
UPDATE: My use case: I am writing a word count map-reduce program using OpenMP. I want one thread group to read files, adding the lines read to a queue. I want another thread group to process lines from those queues and update the counts in a chained hash table. I already wrote the code to first do the reading to formulate the queues and then do the mapping to take data from the queues and generate word counts -- but I want to change my program to have 2 thread groups doing the reading and the mapping in parallel -- at the same time. That's why I made this short code to check how I can implement 2 thread groups running in parallel, executing 2 different code segments.
It seems like the above can be solved using single directives with nowait plus task directives. The approach below puts the tasks onto a queue, and the threads then pick up work from that queue. So ideally, 2 thread groups end up working on 2 different tasks, which is what the question requires. Below is the code:
#include <stdio.h>
#include <omp.h>
#define NUM_THREADS 1
int main(int argc, char **argv) {
omp_set_nested(1); /* make sure nested parallelism is on */
int nprocs = omp_get_num_procs();
int nthreads1 = NUM_THREADS;
int nthreads2 = NUM_THREADS;
int t1[nthreads1];
for (int i=0; i<nthreads1; i++) {
t1[i] = 0;
}
#pragma omp parallel default(none) shared(nthreads1, nthreads2, t1)
{
#pragma omp single nowait // section 1
for (int i=0; i<nthreads1; i++) {
#pragma omp task
{
printf("Task 1: thread %d of the %d children of %d: handling iter %d\n",
omp_get_thread_num(), omp_get_team_size(2),
omp_get_ancestor_thread_num(1), i);
t1[i] = 1;
}
}
#pragma omp single nowait // section 2
for (int j=0; j<nthreads2; j++) {
#pragma omp task
{
while (!t1[j]) {
printf("Task 2: thread %d of the %d children of %d: handling iter %d\n",
omp_get_thread_num(), omp_get_team_size(2),
omp_get_ancestor_thread_num(1), j);
}
}
}
}
return 0;
}
Also, you can simply use an if-else statement inside the #pragma omp parallel construct to run 2 thread groups in parallel:
#include <stdio.h>
#include <omp.h>
#define NUM_THREADS 2
int main(int argc, char **argv) {
omp_set_nested(1); /* make sure nested parallelism is on */
int nprocs = omp_get_num_procs();
int nthreads1 = NUM_THREADS/2;
int nthreads2 = NUM_THREADS/2;
int t1[nthreads1];
for (int i=0; i<nthreads1; i++) {
t1[i] = 0;
}
#pragma omp parallel default(none) shared(t1, nthreads1) num_threads(NUM_THREADS)
{
int i = omp_get_thread_num(); // section 1
if (i<nthreads1) {
printf("Section 1: thread %d\n",i);
t1[i] = 1;
} else {
int j = i - nthreads1;
while (!t1[j]) {
printf("Section 2: thread %d, shift_value %d\n", i, j);
}
}
}
return 0;
}
I'm learning OpenMP these days and I just met the "threadprivate" directive. The code snippet below, which I wrote myself, didn't output the expected result:
// **** File: fun.h **** //
void seed(int x);
int drand();
// ********************* //
// **** File: fun.c **** //
extern int num;
int drand()
{
num = num + 1;
return num;
}
void seed(int num_initial)
{
num = num_initial;
}
// ************************ //
// **** File: main.c **** //
#include "fun.h"
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int num = 0;
#pragma omp threadprivate(num)
int main()
{
int num_inital = 4;
seed(num_inital);
printf("At the beginning, num = %d\n", num); // should num be 4?
#pragma omp parallel for num_threads(2) schedule(static,1) copyin(num)
for (int ii = 0; ii < 4; ii++) {
int my_rank = omp_get_thread_num();
//printf("Before processing, in thread %d num = %d\n", my_rank,num);
int num_in_loop = drand();
printf("Thread %d is processing loop %d: num = %d\n", my_rank,ii, num_in_loop);
}
system("pause");
return 0;
}
// ********************* //
Here are my questions:
Why is the result of printf("At the beginning, num = %d\n", num); num = 0 instead of num = 4?
As for the parallel for loop, multiple executions produce different results one of which is:
Thread 1 is processing loop 1: num = 5
Thread 0 is processing loop 0: num = 6
Thread 1 is processing loop 3: num = 7
Thread 0 is processing loop 2: num = 8
It seems that num is initialized to 4 in the for loop, which means the num in the copyin clause equals 4. Why is num in printf("At the beginning, num = %d\n", num) different from the one in copyin?
The OpenMP website says:
In parallel regions, references by the master thread will be to the copy of the variable in the thread that encountered the parallel region.
According to this explanation, thread 0 (the master thread) should initially have num = 4. Therefore, loop 0's output should always be: Thread 0 is processing loop 0: num = 5. Why is the result above different?
My working environment is Windows 10 with VS2015.
I think the problem is within the fun.c compilation unit. The compiler cannot determine that the extern int num; variable is also a TLS one.
I would include the directive #pragma omp threadprivate(num) in this file:
// **** File: fun.c **** //
extern int num;
#pragma omp threadprivate(num)
int drand()
{
num = num + 1;
return num;
}
void seed(int num_initial)
{
num = num_initial;
}
// ************************ //
In any case, the toolchain should warn about this at the linking phase.
The copyin clause is meant to be used across OpenMP teams (e.g. computation on computing accelerators).
Indeed, the OpenMP documentation says:
These clauses support the copying of data values from private or threadprivate variables on one implicit task or thread to the corresponding variables on other implicit tasks or threads in the team.
Thus, in your case, you should rather use the clause firstprivate.
Please note that the version (5.0) of the OpenMP documentation you are reading is probably not supported by VS2015. I advise you to read an older version compatible with VS2015. Otherwise, the results of the compiled program are likely to be undefined.
EDIT: changed the code and phrasing to make my question more explicit
I've been struggling for quite a while to parallelize a loop in C using OpenMP, and I'd like directions on how to tackle this challenge.
The loop consists of the following (in case you wish to know, this loop is the main loop of a simulated annealing algorithm):
for(attempt = 0; attempt < SATISFIED; attempt++) {
i = (rand() % (len-1)) + 1;
j = i + (rand() % (len-i));
if(...) {
...
//Update global static variables:
if(dst < best_distance)
set_best(dst, path);
//Stop this attempt:
attempt = -1;
}
//Decrease the temperature:
temp = change_temp(temp);
}
The problem with this loop is that the number of iterations cannot be computed from its condition, so I came up with a different way to write it in order to be able to use OpenMP:
while(keepGoing){
keepGoing = 0;
#pragma omp parallel for default(none) shared(len, best_distance, best_path, distances, avg_distance, path) private( i, j, seed, swp_dst) lastprivate(dst, temp, keepGoing) firstprivate(dst, temp, abort, keepGoing)
for(attempt = 0; attempt < SATISFIED; attempt++) {
#pragma omp flush (abort)
if (!abort) {
seed = omp_get_thread_num();
i = (rand_r(&seed) % (len-1)) + 1;
j = i + (rand_r(&seed) % (len-i));
//Update progress:
#pragma omp critical
{
if(...) {
...
//Update global static variables:
if(dst < best_distance)
set_best(dst, path);
//Stop this attempt:
keepGoing = 1;
abort = 1;
#pragma omp flush (abort)
#pragma omp flush (keepGoing)
}
}
//Decrease the temperature:
temp = change_temp(temp);
}
}
}
However, this solution gives different output than the sequential version I wrote before, for reasons I don't understand... Are the OpenMP directives placed correctly? Or should I use them differently? Thanks in advance for any answer.
Why am I not getting different thread ids when I use "#pragma omp parallel num_threads(4)"? All the thread ids are 0 in this case.
But when I comment out that line and use the default number of threads, I get different thread ids.
Note: I used the variable tid to get the thread id.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int nthreads, tid;
int x = 0;
#pragma omp parallel num_threads(4)
#pragma omp parallel private(nthreads, tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);
// /* Only master thread does this */
if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
}
}
Output of the above code:
Hello World from thread = 0
Hello World from thread = 0
Number of threads = 1
Hello World from thread = 0
Number of threads = 1
Hello World from thread = 0
Number of threads = 1
Number of threads = 1
Output when I comment out the line mentioned above:
Hello World from thread = 3
Hello World from thread = 0
Number of threads = 4
Hello World from thread = 1
Hello World from thread = 2
You are creating two nested parallel regions. It is the same as doing this:
#pragma omp parallel num_threads(4)
{
#pragma omp parallel private(nthreads, tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);
// /* Only master thread does this */
if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
}
}
omp_get_num_threads() returns the number of threads in the innermost enclosing parallel region. So you are running four outer threads, each of which runs an inner region containing just one thread.
The inner parallel region is only executing one thread, because you haven't enabled nested parallelism. You can enable it by calling omp_set_nested(1).
http://docs.oracle.com/cd/E19205-01/819-5270/aewbi/index.html
If instead of making two nested parallel regions, you wanted to make a single parallel region and specify two properties, you can do this:
#pragma omp parallel num_threads(4) private(nthreads,tid)
{
.
.
.
}
Nesting can also be enabled by setting the environment variable OMP_NESTED to true.