I am trying to write a program using OpenMP in which the structured block of the parallel region is a while loop.
#pragma omp parallel num_threads(x)
while(condition){
}
I have to decide how to code the condition on which any thread would stop. I need to know whether it is proper to have a break statement in the while loop.
I think it is better to use OpenMP cancellation.
The code would look similar to this:
#pragma omp parallel
{
    while (true)
    {
        #pragma omp cancellation point parallel

        // Do the heavy work

        if (condition == false)
        {
            #pragma omp cancel parallel
        }
    }
}
Your question is a bit incomplete. You said "condition on which any thread would stop", but what about the aftermath:
The rest of the threads should also exit.
The rest of the threads should continue until they meet the condition.
Case 1:
bool abort = false;
#pragma omp parallel num_threads(x) shared(abort)
{
    while (!abort)
    {
        // The work you need to do.
        #pragma omp critical
        {
            if (condition == false)
            {
                abort = true;   // shared, so every thread sees it and exits
            }
        }
    }
}
Case 2:
#pragma omp parallel num_threads(x)
{
    while (condition)
    {
        // The work you need to do.
    }
}
Related
I would like to set the number of threads in OpenMP. When I use
omp_set_num_threads(2);
printf("nthread = %d\n", omp_get_num_threads());
#pragma omp parallel for
...
I see nthreads=1. As explained here, the reported number of threads belongs to the serial section, where it is always 1. However, when I move the call to the line after the #pragma, I get a compilation error saying that after #pragma, a for loop is expected. So how can I fix that?
Well, yeah, omp parallel for expects a loop on the next line. You can call omp_get_num_threads inside that loop. Outside a parallel region, you can call omp_get_max_threads to get the maximum number of threads that will be spawned. That is what you are looking for.
int max_threads = omp_get_max_threads();
#pragma omp parallel for
for (...) {
    int current_threads = omp_get_num_threads();
    assert(current_threads == max_threads);
}
#pragma omp parallel
{
    int current_threads = omp_get_num_threads();
    #pragma omp for
    for (...) {
        ...
    }
}
Among the OpenMP examples, the following code can be found in section 6.2, "Worksharing Constructs Inside a critical Construct":
void critical_work()
{
    int i = 1;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            #pragma omp critical (name)
            {
                #pragma omp parallel
                {
                    #pragma omp single
                    {
                        i++;
                    }
                }
            }
        }
    }
}
Have you ever used this structure? Under what circumstances is it the best option in real life? My only guess is that it can be useful in error handling; what else?
I think this particular example just demonstrates that this kind of code still conforms to the standard.
If the question is just about having a worksharing construct inside a critical construct (inside a worksharing construct), I can roughly imagine hierarchical applications where you generally have two layers of nested OpenMP parallelism but most of the work is done outside the critical region, e.g.:
void mostly_uncritical_work()
{
    #pragma omp parallel
    {
        #pragma omp parallel
        {
            /* main workload */
        }
        #pragma omp critical (name)
        {
            #pragma omp parallel
            {
                /* smaller amount of work but still big enough */
                /* to profit from parallelization */
            }
        }
    }
}
So in the end the question boils down to "Are there applications for nested OpenMP parallelism?" and there my answer would certainly be yes. I use it for example to have a team of 2 threads in the outer team, one of them simulating things on a GPU and the other one analyzing the output of the GPU using an inner team of threads.
For some reason I need to stress my processor, and I want to fork a lot of threads in OpenMP. In pthreads you can easily do this in a for loop, since forking a thread is just a function call. But in OpenMP you have to have something like this:
#pragma omp parallel sections
{
    #pragma omp section
    {
        // section 0
    }
    #pragma omp section
    {
        // section 1
    }
    // ... repeat omp section n times
}
I am just wondering if there is any easier way to fork a large number of threads in OpenMP?
You hardly need to do anything special. Just write code for a compute-intensive task and put it inside a parallel region, then indicate how many threads you want. To do that, call omp_set_dynamic(0) to disable dynamic adjustment of the thread count (this helps you get the number of threads you want, though it is still not guaranteed), then call omp_set_num_threads(NUM_THREADS) with the number you want.
Then each thread will run the task you put in the region. Simple as that.
const int NUM_THREADS = 100;
omp_set_dynamic(0);
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
{
    // How many threads did we really get? Let's write it once only.
    #pragma omp single
    {
        std::cout << "using " << omp_get_num_threads() << " threads." << std::endl;
    }
    // write some compute-intensive code here
    // (be sure to print the result at the end, so that
    // the compiler doesn't discard the computation as useless)
}
To do what you want, you get the thread number and then do different things based on which thread you are.
// it's not guaranteed you will actually get this many threads
omp_set_num_threads(NUM_THREADS);

int actual_num_threads;
#pragma omp parallel
{
    #pragma omp single
    {
        actual_num_threads = omp_get_num_threads();
    }   // implicit barrier here: every thread sees the value afterwards
    int me = omp_get_thread_num();
    if (me < actual_num_threads / 2) {
        section1();
    }
    else {
        section2();
    }
}
I have an algorithm in which I have made groups of structures (i.e., an array of structures). I want each group to be processed by a single thread. I am giving the code below. In the for loop following #pragma omp for, I want i=0 to be executed in one thread, i=1 in another, and so on. Kindly help and tell me whether I am doing this correctly.
#pragma omp parallel shared(min,sgb,div,i) private(th_id)
omp_set_num_threads(4);
{
    th_id = omp_get_thread_num();
    printf("Thread %d\n", th_id);
    scanf("%c", &ch);
    #pragma omp for schedule(static,CHUNKSIZE)
    for (i = 0; i < div; i++)
    {
        sgb[i] = pso(sgb[i], kmax, c1, c2);
        min[i] = sgb[i].gbest;
        printf("in distribute gbest=%f x=%f y=%f index=%d\n", sgb[i].gbest, sgb[i].bestp[0], sgb[i].bestp[1], sgb[i].index);
    }
    #pragma omp barrier
    //fclose(fp);
    m = min[0];
    for (j = 0; j < div; j++)
    {
        printf("after barrier gbest=%f x=%f y=%f\n", sgb[j].gbest, sgb[j].bestp[0], sgb[j].bestp[1]);
        if (m > min[j])
        {
            m = min[j];
            k = j;
        }
    }
}
I am trying to parallelize a large program that is written by a third party. I cannot disclose the code, but I will try to give the closest example of what I wish to do.
This is based on the code below. As you can see, since the "parallel" construct is inside the while loop, the creation/destruction of the threads is done on each iteration, which is costly.
Note that I cannot move the initializers etc. outside the "while" loop.
--Base code
void funcPiece0()
{
    // many lines and branches of code
}

void funcPiece1()
{
    // also many lines and branches of code
}

void funcCore()
{
    funcInitThis();
    funcInitThat();
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                funcPiece0();
            } //omp section
            #pragma omp section
            {
                funcPiece1();
            } //omp section
        } //omp sections
    } //omp parallel
}

int main()
{
    funcInitThis();
    funcInitThat();
    #pragma omp parallel
    {
        while (1)
        {
            funcCore();
        }
    }
}
What I want is to avoid the per-iteration creation/destruction and do it only once at the start/end of the program. I tried many variations of where to place the "parallel" construct. What I have, in essence, is the code below (only ONE thread creation/destruction per program run):
--What I tried, but it failed with an "illegal access" in the initializing functions.
void funcPiece0()
{
    // many lines and branches of code
}

void funcPiece1()
{
    // also many lines and branches of code
}

void funcCore()
{
    funcInitThis();
    funcInitThat();
    //#pragma omp parallel
    //{
    #pragma omp sections
    {
        #pragma omp section
        {
            funcPiece0();
        } //omp section
        #pragma omp section
        {
            funcPiece1();
        } //omp section
    } //omp sections
    //} //omp parallel
}

int main()
{
    funcInitThis();
    funcInitThat();
    while (1)
    {
        funcCore();
    }
}
--
Any help would be highly appreciated!
Thanks!
Most OpenMP implementations create the worker threads only once and reuse them for subsequent parallel regions; the parallel pragma does not necessarily spawn new threads each time. How did you determine that threads are being spawned on every iteration?
This can be done! The key is to move the loop inside one single parallel region and to make sure that, whatever is used to decide whether to repeat, all threads make exactly the same decision. I've used shared variables and synchronized just before the loop condition is checked.
So this code:
initialize();

while (some_condition) {
    #pragma omp parallel
    {
        some_parallel_work();
    }
}
can be transformed into something like this:
#pragma omp parallel
{
    #pragma omp single
    {
        initialize(); // if initialization cannot be parallelized
    }

    while (some_condition_using_shared_variable) {
        some_parallel_work();
        update_some_condition_using_shared_variable();
        #pragma omp flush
    }
}
The most important thing is to be sure that every thread makes the same decision at the same points in your code.
As a final thought: essentially, what you are doing is trading the overhead of creating/destroying threads (every time a #pragma omp parallel region begins/ends) for the synchronization overhead of the threads' decision making. I think the synchronization should be faster, but there are so many parameters at play here that this may not always be the case.