OpenMP is not improving the performance [duplicate]

I have sequential code to parallelize via OpenMP. I have put in the corresponding pragmas and tested it. I measure the performance gain by checking the time spent in the main function.
The weird thing is that the elapsed times calculated via cpu_time() and omp_get_wtime() are different. Why?
The elapsed time according to cpu_time() is similar to the sequential time.
Before computation starts:
ctime1_ = cpu_time();
#ifdef _OPENMP
ctime1 = omp_get_wtime();
#endif
After computation ends:
ctime2_ = cpu_time();
#ifdef _OPENMP
ctime2 = omp_get_wtime();
#endif
cpu_time() function definition:
double cpu_time(void)
{
    /* clock() counts CPU time used by the whole process, all threads combined */
    double value;

    value = (double) clock() / (double) CLOCKS_PER_SEC;
    return value;
}
Printing result:
printf("%f - %f seconds.\n", ctime2 - ctime1, ctime2_ - ctime1_);
Sample result:
7.009537 - 11.575277 seconds.
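
(For context, the pattern above boils down to a self-contained sketch like the following; the parallel reduction is a placeholder workload, not my actual computation.)

#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* CPU time used by the whole process, in seconds */
double cpu_time(void)
{
    return (double) clock() / (double) CLOCKS_PER_SEC;
}

int main(void)
{
    double sum = 0.0;
    double ctime1_ = cpu_time();
#ifdef _OPENMP
    double ctime1 = omp_get_wtime();
#endif

    /* placeholder workload: a parallel reduction */
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long i = 1; i < 100000000L; i++)
        sum += 1.0 / (double) i;

    double ctime2_ = cpu_time();
#ifdef _OPENMP
    double ctime2 = omp_get_wtime();
    printf("%f - %f seconds. (sum=%f)\n", ctime2 - ctime1, ctime2_ - ctime1_, sum);
#else
    printf("%f seconds (CPU). (sum=%f)\n", ctime2_ - ctime1_, sum);
#endif
    return 0;
}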

The clock function measures CPU time, the time your process spends actively executing on the CPU (summed over all of its threads), while the OMP function measures wall-clock time, the real time that passes during execution. Those are two completely different things.
(If it were the other way around, with wall-clock time far exceeding CPU time, your process would likely be blocked waiting somewhere.)

What you observe is a perfectly valid result for any parallel application: the combined CPU time of all threads, as returned by clock(), is usually more than the wall-clock time measured by omp_get_wtime(), except if your application mostly sleeps or waits. In your sample output, 11.58 s of CPU time spread over 7.01 s of wall time means the threads kept about 1.65 cores busy on average, so the parallel version really is finishing sooner than the sequential one.

The clock() function returns CPU time, not wall time. Use gettimeofday() instead.
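
For example, a minimal wall-time measurement with gettimeofday() might look like this (POSIX only; the busy loop is just a placeholder workload):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval t1, t2;
    gettimeofday(&t1, NULL);

    /* placeholder workload */
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; i++)
        x += 1.0;

    gettimeofday(&t2, NULL);

    /* combine seconds and microseconds into a single double */
    double elapsed = (double)(t2.tv_sec - t1.tv_sec)
                   + (double)(t2.tv_usec - t1.tv_usec) / 1e6;
    printf("wall time: %f s\n", elapsed);
    return 0;
}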

Related

C (time.h) clock_t = clock() producing wrong duration

Some example code:
clock_t clock_start = clock();
for( ... ) { ... do stuff ... }
clock_t clock_stop = clock();
double duration = 1000.0 * (clock_stop - clock_start) / CLOCKS_PER_SEC;
printf("time: %f ms\n", duration);
When I ran this code it produced an output of:
time: 4756.869000 ms
This is clearly wrong. I estimate the actual time taken is about 10 seconds, and I verified this with a stopwatch.
There appears to be a factor of about 2-3 missing.
Is it possible that CLOCKS_PER_SEC is defined as something nonsensical on my system? (I am using a Raspberry Pi 3 with Raspberry Pi OS.) Is there any way to check this? Or is it more likely that something else is the cause of the issue?
I am aware of alternative methods of measuring time on posix systems. I will implement some tests with one of those as a possible alternative, regardless.
The clock() function returns an approximation of the processor time used by the program.
It says "processor time", not the amount of time that a stopwatch would measure. Your loop apparently spends only about half of its ~10 seconds actually executing on the CPU; the rest is presumably spent blocked (on I/O, for example) or descheduled. A nonsensical CLOCKS_PER_SEC is unlikely, since POSIX requires it to be exactly 1000000. If you want to measure the amount of time passing in the real world, you need to use one of those other functions.

Computing algorithm running time in C

I am using the time.h library in C to find the time taken to run an algorithm. The code structure is roughly as follows:
#include <stdio.h>
#include <time.h>

int main()
{
    time_t start, end, diff;
    start = clock();
    // ALGORITHM COMPUTATIONS
    end = clock();
    diff = end - start;
    printf("%d", diff);
    return 0;
}
The values for start and end are always zero. Is it that the clock() function doesn't work? Please help.
Thanks in advance.
Not that it doesn't work. In fact, it does. But it is not the right way to measure time, because the clock() function returns an approximation of the processor time used by the program. I am not sure about other platforms, but on Linux you should use clock_gettime() with the CLOCK_MONOTONIC flag, which will give you the real wall time elapsed.

You can also read the TSC, but be aware that it won't work on a multi-processor system if your process is not pinned to a particular core.

If you want to analyze and optimize your algorithm, I'd recommend using some performance measurement tools. I've been using Intel's VTune for a while and am quite happy with it. It will show you not only which part uses the most cycles, but will also highlight memory problems and possible parallelism issues. You may be very surprised by the results; for example, most of the CPU cycles might be spent waiting for the memory bus. Hope it helps!
UPDATE: Actually, if you run a later version of Linux, it might provide CLOCK_MONOTONIC_RAW, which is a hardware-based clock that is not subject to NTP adjustments. Here is a small piece of code you can use:
stopwatch.hpp
stopwatch.cpp
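
A minimal C sketch of the same idea, assuming the Linux-specific CLOCK_MONOTONIC_RAW is available (these helpers are hypothetical stand-ins, not the linked files):

#include <stdio.h>
#include <time.h>

/* hypothetical helpers; not the actual stopwatch.hpp/stopwatch.cpp */
static struct timespec stopwatch_start(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return ts;
}

static double stopwatch_elapsed_sec(struct timespec start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC_RAW, &now);
    return (double)(now.tv_sec - start.tv_sec)
         + (double)(now.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec t = stopwatch_start();

    /* placeholder workload */
    volatile double x = 0.0;
    for (long i = 0; i < 50000000L; i++)
        x += 1.0;

    printf("elapsed: %f s\n", stopwatch_elapsed_sec(t));
    return 0;
}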
Note that clock() returns the execution time in clock ticks, as opposed to wall clock time. Divide a difference of two clock_t values by CLOCKS_PER_SEC to convert the difference to seconds. The actual value of CLOCKS_PER_SEC is a quality-of-implementation issue. If it is low (say, 50), your process would have to run for 20ms to cause a nonzero return value from clock(). Make sure your code runs long enough to see clock() increasing.
I usually do it this way:
clock_t start = clock();
clock_t end;
// algorithm
end = clock();
printf("%f seconds\n", (double)(end - start) / CLOCKS_PER_SEC);
Consider the code below:
#include <stdio.h>
#include <time.h>

int main()
{
    clock_t t1, t2;
    t1 = t2 = clock();

    // loop until t2 gets a different value
    while (t1 == t2)
        t2 = clock();

    // print resolution of clock()
    printf("%f ms\n", (double)(t2 - t1) / CLOCKS_PER_SEC * 1000);
    return 0;
}
Output:
$ ./a.out
10.000000 ms
It might be that your algorithm runs for a shorter amount of time than that. Use gettimeofday() for a higher-resolution timer.
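
For example, a sketch that times a short operation with microsecond resolution (POSIX; the loop is a placeholder that finishes well below the ~10 ms clock() granularity measured above):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval t1, t2;
    gettimeofday(&t1, NULL);

    /* placeholder short workload */
    volatile long sum = 0;
    for (long i = 0; i < 100000L; i++)
        sum += i;

    gettimeofday(&t2, NULL);

    long usec = (t2.tv_sec - t1.tv_sec) * 1000000L
              + (t2.tv_usec - t1.tv_usec);
    printf("time: %ld us\n", usec);
    return 0;
}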
