Calculating the execution time of nested for loops - c

Suppose I have the nested for loop and if check shown below, and I want to know how many clock cycles (and ultimately how many seconds) a particular for loop or if check takes to execute.
Should the sum of the clock cycles (seconds) taken by the inner for loop and the if check equal, or approximately equal, the clock cycles (seconds) taken by the outermost for loop?
Or am I doing it wrong? If so, is there another way to time the loops?
Note: I have three functions that do nearly the same thing. I declared them separately so that each for loop or if check is measured on its own. If I timed all the sub-components in a single piece of code, I assume the time measured for the outer for loop would also include the instructions that compute the clock counts for the inner for loop and the if check.
void fun1(){
int i=0,j=0,k=0;
clock_t t=0,t_start=0,t_end=0;
//time the outermost forloop
t_start = clock();
for(i=0;i<100000;i++){
for(j=0;j<1000;j++){
//some code
}
if(k==0){
//some code
}
}
t_end = clock();
t=t_end-t_start;
double time_taken = ((double)t)/CLOCKS_PER_SEC;
printf("outer for-loop took %f seconds to execute \n", time_taken);
}
void fun2(){
int i=0,j=0,k=0;
clock_t t2=0,t2_start=0,t2_end=0;
for(i=0;i<100000;i++){
//time the inner for loop
t2_start=clock();
for(j=0;j<1000;j++){
//some code
}
t2_end=clock();
t2+=(t2_end-t2_start);
if(k==0){
//some code
}
}
double time_taken = ((double)t2)/CLOCKS_PER_SEC;
printf("inner for-loop took %f seconds to execute \n", time_taken);
}
void fun3(){
int i=0,j=0,k=0;
clock_t t3=0,t3_start=0,t3_end=0;
for(i=0;i<100000;i++){
for(j=0;j<1000;j++){
//some code
}
//time the if check
t3_start=clock();
if(k==0){
//some code
}
t3_end=clock();
t3+=(t3_end-t3_start);
}
double time_taken = ((double)t3)/CLOCKS_PER_SEC;
printf("if-check took %f seconds to execute \n", time_taken);
}

The expected answer is that t in fun1 will likely be slightly more than t2+t3 from fun2 and fun3, the difference being the additional time to evaluate the outer loop itself.
Less obvious, however, is the time added by the measurement itself: every measurement costs the time of the clock() calls around it. When measuring the inner loops, that cost is effectively multiplied by 100,000 because of the iterations of the outer loop.
Here's a program that measures the measurement itself and, for good measure, also the time to evaluate an empty outer loop.
#include <time.h>
#include <stdio.h>
int main () {
clock_t t = 0;
clock_t t_start, t_end;
for (int i = 0; i < 100000; i++) {
t_start = clock();
t_end = clock();
t += (t_end - t_start);
}
double time_taken = ((double) t) / CLOCKS_PER_SEC;
printf ("Time imposed by measurement itself: %fsec\n", time_taken);
t_start = clock();
for (int i = 0; i < 100000; i++) {
}
t_end = clock();
t = (t_end - t_start);
time_taken = ((double) t) / CLOCKS_PER_SEC;
printf ("Time to evaluate the loop: %fsec\n", time_taken);
}
Which, at least on my system, suggests the measurement may skew the results some:
Time imposed by measurement itself: 0.056949sec
Time to evaluate the loop: 0.000200sec
To get the amount of time your inner loops "really" take, you'll need to subtract the time added by the act of measuring them.

Related

Pragma omp parallel overhead

I have a problem with a #pragma omp parallel section in my code.
I have a program that should sort a given array of integers with quicksort using multiple threads. In every step, every thread is assigned a portion of the array, partitions it, and returns how many elements are smaller than a given global pivot. The code executes without errors, but the more threads I tell OpenMP to use, the slower it runs. I added logging for the execution times, and it seems that a huge part of the run time is spent on OpenMP overhead. The overhead seems to be consistent, so the speed difference is proportional to the size of the array to sort.
Here is the code which is executed in parallel:
void create_count_elems_lower(int lower, int upper, int global_pivot_position, int *block_sizes, int *data,
int *count_elems_lower) {
assert(lower >= 0);
times_function_called++;
int pivot = data[global_pivot_position];
double start = omp_get_wtime();
double wait_start = 0;
double wait_time = 0;
#pragma omp parallel for
for (int i = 0; i < omp_get_max_threads(); ++i) {
double start_thread = omp_get_wtime();
int lower_p = lower;
lower_p += i == 0 ? 0 : block_sizes[i - 1] * i;
count_elems_lower[i] = partition_fixed_pivot(lower_p, lower_p + block_sizes[i], pivot, data) -
lower_p; // - lower_p since it needs to be relative
assert(count_elems_lower[i] >= 0);
double end_thread = omp_get_wtime();
double time_spent = end_thread - start_thread;
time_spent_per_thread_sum += time_spent;
if (max_time_spent_per_thread[i] < time_spent) {
max_time_spent_per_thread[i] = time_spent;
}
if (wait_start == 0) {
wait_start = end_thread;
} else {
double time_waiting = end_thread - wait_start;
if (time_waiting > wait_time) {
wait_time = time_waiting;
}
}
}
double end = omp_get_wtime();
time_spent_in_function += end - start;
time_spent_idling += wait_time;
}
And here is the testing function:
printf("Num threads: %d\n", num_threads);
double start = omp_get_wtime();
test_sort_big();
double end = omp_get_wtime();
printf("total: %f\n", end - start);
printf("times function called: %f\n", times_function_called);
printf("time spent in create_count_elems_lower: %f\n", time_spent_in_function);
printf("time spent per thread approx: %f\n", time_spent_per_thread_sum / num_threads);
printf("time spent idling: %f\n", time_spent_idling);
for (int i = 0; i < num_threads; ++i) {
printf("max time spent by thread %d: %f \t", i, max_time_spent_per_thread[i]);
}
The program gets compiled and linked with:
gcc -fopenmp -O3 -c -o tests/tests.o tests/tests.c
gcc -fopenmp -o build_test tests/tests.o array_utils.o datagenerator.o quicksort.o
And the results are:
Num threads: 1
Testing sorting of 10000000 Elements
total: 9.204632
times function called: 10000000.000000
time spent in create_count_elems_lower: 5.914602
time spent per thread approx: 1.610363
time spent idling: 0.000000
max time spent by thread 0: 0.041889
Num threads: 4
Testing sorting of 10000000 Elements
total: 16.955334
times function called: 10000000.000000
time spent in create_count_elems_lower: 12.598185
time spent per thread approx: 0.874607
time spent idling: 2.130419
max time spent by thread 0: 0.016055 max time spent by thread 1: 0.013543 max time spent by thread 2: 0.013532 max time spent by thread 3: 0.018599
I run Fedora 27 64-bit with an Intel® Core™ i7-2760QM CPU @ 2.40GHz × 8
Edit:
As it turned out, the overhead was the problem: the function gets called many times with only one thread. Changing the algorithm to use a simple sort when only one thread is available improved the runtime a lot.

time execution can show up

#include <conio.h>
#include <stdio.h>
#include<time.h>
double multi();
void main()
{
clrscr();
clock_t start = clock();
for (int i = 0; i < 1000; i++)
{
multi();
//printf("Answer (%d)",s);
}
clock_t end = clock();
float diff;
diff = (float) (end - start) / CLOCKS_PER_SEC;
printf("time execution :%f", diff);
getch();
}
double multi()
{
double a;
a = 5 * 5;
return a;
}
The execution time appears as 0.000000. What is the problem?
Could it be because the time is on the order of nanoseconds?
The man page for the clock() function says:
The clock() function returns an approximation of processor time used by the program.
It is an approximation, so it is not going to be exact; it depends on the granularity of clock() on your system. For starters, you can check that granularity with something like:
clock_t start =clock(), end;
while(1)
{
if(start != (end=clock()))
break;
}
diff=(float)(end - start)/CLOCKS_PER_SEC;
printf("best time :%f",diff);
Doing this, I get 0.001 (which is 1 ms), so anything that takes less than 1 ms comes back as 0. That's what's happening to you: your code runs faster than the granularity of clock(), so you get back the best approximation, which happens to be 0.

Clock() return value 0 in usleep loop with more orders

I'm trying to measure how long some code in a thread takes to execute, but clock() is returning 0.
This is the code:
int main(int argc, char *argv[])
{
int begin, end;
float time_spent;
clock_t i,j;
struct timeval tv1;
struct timeval tv2;
for(i = 0; i<6; i++)
{
begin = clock();
// Send Audio Data
....
gettimeofday(&tv1,NULL);
usleep(200000); // Wait 200 ms
gettimeofday(&tv2,NULL);
printf("GETTIMEOFDAY %d\n", tv2.tv_usec-tv1.tv_usec); // Time using date WORKING
end = clock() - begin;
// Store time
...
printf ("It took me %d clicks (%f seconds).\n",begin,((float)begin)/CLOCKS_PER_SEC);
printf ("It took me %d clicks (%f seconds).\n",end,((float)end)/CLOCKS_PER_SEC);
time_spent = (((float)end) * 1000.0 / ((float)CLOCKS_PER_SEC)); // Time using clock BAD
printf("\n TIME %dms|%dms|%fms|%d\n",begin,end, time_spent,CLOCKS_PER_SEC);
}
return 0;
}
But I get 0 clicks every time. I suspect usleep is not waiting exactly 200 ms, so I need to measure how long the function that encodes audio with ffmpeg takes, for synchronization.
I think the problem is that you're using the clock() function.
The clock function returns the amount of processor time used by the calling process since it was invoked, measured in units of 1/CLOCKS_PER_SEC of a second.
So for example:
clock_t start = clock();
sleep(8);
clock_t finish = clock();
// clock_t is not an int, so convert to double before printing
printf("It took %f seconds of processor time to execute the sleep.\n",
(double)(finish - start) / CLOCKS_PER_SEC);
This code will give you a value of about 0, because the code was not using the processor; it was sleeping.
This code however:
long i;
clock_t start = clock();
for (i = 0; i < 100000000; ++i)
exp(log((double)i));
clock_t finish = clock();
// clock_t is not an int, so convert to double before printing
printf("It took %f seconds to execute the for loop.\n",
(double)(finish - start) / CLOCKS_PER_SEC);
will give you a nonzero count (about 8 seconds in this example), because the code was using the processor the whole time.

C how to measure time correctly?

This is the "algorithm", but when I want to measure the execution time it gives me zero. Why?
#define ARRAY_SIZE 10000
...
clock_t start, end;
start = clock();
for( i = 0; i < ARRAY_SIZE; i++)
{
non_parallel[i] = vec[i] * vec[i];
}
end = clock();
printf( "Number of seconds: %f\n", (end-start)/(double)CLOCKS_PER_SEC );
So what should I do to measure the time?
Two things:
10000 is not a lot on a modern computer. Therefore that loop will run in probably less than a millisecond - less than the precision of clock(). Therefore it will return zero.
If you aren't using the result of non_parallel, it's possible that the entire loop will be optimized out by the compiler.
Most likely, you just need a more expensive loop. Try increasing ARRAY_SIZE to something much larger.
Here's a test on my machine with a larger array size:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ARRAY_SIZE 100000000
int main(){
clock_t start, end;
double *non_parallel = (double*)malloc(ARRAY_SIZE * sizeof(double));
double *vec = (double*)malloc(ARRAY_SIZE * sizeof(double));
start = clock();
for(int i = 0; i < ARRAY_SIZE; i++)
{
non_parallel[i] = vec[i] * vec[i];
}
end = clock();
printf( "Number of seconds: %f\n", (end-start)/(double)CLOCKS_PER_SEC );
free(non_parallel);
free(vec);
return 0;
}
Output:
Number of seconds: 0.446000
This is an unreliable way to actually time number of seconds, since the clock() function is pretty low precision, and your loop isn't doing a lot of work. You can either make your loop do more so that it runs longer, or use a better timing method.
The higher-precision methods are platform-specific. For Windows, see "How to use QueryPerformanceCounter?"; for Linux, see "High resolution timer with C++ and Linux?"

C - measuring computing time

Is there a simple way to measure computing time in C? I tried the time utility on the whole executable, but I need to measure a specific part of a program.
Thanks
You can use the clock function in <time.h> along with the macro CLOCKS_PER_SEC:
clock_t start = clock() ;
do_some_work() ;
clock_t end = clock() ;
double elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
Now elapsed_time holds the time it took to call do_some_work, in fractional seconds.
You can try the profiler "gprof". More information here: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
You can generally use the clock() function to get the start and end times of a single call to your function being tested. If, however, do_some_work() is particularly fast, it needs to be put in a loop and have the cost of the loop itself factored out, something like:
#define COUNT 10000
// Get cost of naked loop.
clock_t start_base = clock();
for (int i = COUNT; i > 0; i--)
;
clock_t end_base = clock();
// Get cost of loop plus work.
clock_t start = clock();
for (int i = COUNT; i > 0; i--)
do_some_work() ;
clock_t end = clock();
// Calculate cost of single call.
double elapsed_time = end - start - (end_base - start_base);
elapsed_time = elapsed_time / CLOCKS_PER_SEC / COUNT;
This has at least two advantages:
you'll get an average time which is more representative of the actual time it should take; and
you'll get a more accurate answer in the case where the clock() function has a limited resolution.
@codebolt - Thank you! Very nice. On Mac OS X, I added an include of time.h and pasted in your four lines. Then I printed the values of start and end (integers) and the elapsed time. 1mS resolution.
output:
3 X: strcpy .name, .numDocks: start 0x5dc end 0x5e1 elapsed: 0.000005
calloc: start 0x622 end 0x630 elapsed: 0.000014
in my foo.c program I have
#include <libc.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
but it works without explicitly including time.h. One of the others must bring it in.
Actual code:
clock_t start = clock() ;
strcpy( yard2.name, temp ); /* temp is only persistant in main... */
strcpy( yard1.name, "Yard 1");
strcpy( yard3.name, "3 y 3 a 3 r 3 d 3");
yard1.numDocks = MAX_DOCKS; /* or so I guess.. */
yard2.numDocks = MAX_DOCKS; /* or so I guess.. */
yard3.numDocks = MAX_DOCKS; /* or so I guess.. */
clock_t end = clock() ;
double elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
printf("3 X: strcpy .name, .numDocks: start 0x%x end 0x%x elapsed: %-12:8f \n", start, end, elapsed_time );
start = clock() ;
arrayD = calloc( yard2.numDocks, sizeof( struct dock ) ); /* get some memory, init it to 0 */
end = clock() ;
elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
printf("calloc: start 0x%x end 0x%x elapsed: %-12:8f \n", start, end, elapsed_time );

Resources