This is the "algorithm", but when I want to measure the execution time it gives me zero. Why?
#define ARRAY_SIZE 10000
...
clock_t start, end;
start = clock();
for( i = 0; i < ARRAY_SIZE; i++)
{
    non_parallel[i] = vec[i] * vec[i];
}
end = clock();
printf( "Number of seconds: %f\n", (end-start)/(double)CLOCKS_PER_SEC );
So what should I do to measure the time?
Two things:
10000 iterations is not a lot on a modern computer, so the loop probably runs in less than a millisecond, which is below the resolution of clock(). The measured difference is therefore zero.
If you aren't using the result of non_parallel, it's possible that the compiler will optimize the entire loop away.
Most likely, you just need a more expensive loop. Try increasing ARRAY_SIZE to something much larger.
Here's a test on my machine with a larger array size:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ARRAY_SIZE 100000000
int main(){
    clock_t start, end;
    double *non_parallel = (double*)malloc(ARRAY_SIZE * sizeof(double));
    // vec is left uninitialized here; only the timing of the loop matters.
    double *vec = (double*)malloc(ARRAY_SIZE * sizeof(double));
    start = clock();
    for(int i = 0; i < ARRAY_SIZE; i++)
    {
        non_parallel[i] = vec[i] * vec[i];
    }
    end = clock();
    printf( "Number of seconds: %f\n", (end-start)/(double)CLOCKS_PER_SEC );
    free(non_parallel);
    free(vec);
    return 0;
}
Output:
Number of seconds: 0.446000
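If enlarging the array is not an option, a rough alternative is to repeat the small loop many times, divide by the repeat count, and keep the result live so the compiler has less reason to drop the work. This is only a sketch: REPEATS, time_square_loop and the sink variable are names made up for the illustration, and an aggressive optimizer may still simplify the repeated passes.

```c
#include <time.h>

#define ARRAY_SIZE 10000
#define REPEATS    1000   /* hypothetical repeat count; tune for your machine */

static double vec[ARRAY_SIZE];
static double non_parallel[ARRAY_SIZE];

/* Returns the average CPU time, in seconds, of one pass over the array.
   A checksum of the results is written to *checksum_out so the work is used. */
double time_square_loop(double *checksum_out)
{
    for (int i = 0; i < ARRAY_SIZE; i++)
        vec[i] = (double)i;

    volatile double sink = 0.0;   /* volatile accesses cannot be removed */
    clock_t start = clock();
    for (int r = 0; r < REPEATS; r++) {
        for (int i = 0; i < ARRAY_SIZE; i++)
            non_parallel[i] = vec[i] * vec[i];
        sink += non_parallel[r % ARRAY_SIZE];   /* consume one result per pass */
    }
    clock_t end = clock();

    *checksum_out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC / REPEATS;
}
```

Calling time_square_loop(&checksum) from main and printing the return value with %e gives the average time per pass.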
This is an unreliable way to measure elapsed seconds, since the clock() function has fairly low resolution and your loop isn't doing a lot of work. You can either make your loop do more so that it runs longer, or use a better timing method.
The higher-precision methods are platform-specific. For Windows, see "How to use QueryPerformanceCounter?", and for Linux, see "High resolution timer with C++ and Linux?".
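On POSIX systems, one commonly used higher-resolution option is clock_gettime with CLOCK_MONOTONIC. The sketch below assumes a POSIX platform; timespec_diff_sec, time_call_monotonic and example_work are names invented for this illustration, not part of any library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX clock_gettime declarations */
#endif
#include <time.h>

/* Seconds elapsed between two timespec readings. */
double timespec_diff_sec(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical workload, only here so the helper has something to time. */
void example_work(void)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        acc += i;
}

/* Wall-clock time taken by one call to fn, using the monotonic clock. */
double time_call_monotonic(void (*fn)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return timespec_diff_sec(&start, &end);
}
```

Note that this measures elapsed wall-clock time, not CPU time, so other activity on the machine shows up in the result.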
Suppose I have the nested for-loops and if-check shown below, and I want to see how many clock cycles (and ultimately how many seconds) a particular for-loop or if-check takes to execute.
Should the sum of the clock cycles (seconds) taken by the inner for-loop and the if-check be equal (or approximately equal) to the number taken by the outermost for-loop?
Or am I doing it wrong? Is there another way to time the loops?
Note: I have three functions doing pretty much the same thing. I declared them separately to measure each for-loop or if-check on its own, because if I timed all the sub-components in the same piece of code, I guess the time measured for the outer for-loop would include the extra instructions that compute the clock counts for the inner for-loop and the if-check.
void fun1(){
    int i=0,j=0,k=0;
    clock_t t=0,t_start=0,t_end=0;
    //time the outermost forloop
    t_start = clock();
    for(i=0;i<100000;i++){
        for(j=0;j<1000;j++){
            //some code
        }
        if(k==0){
            //some code
        }
    }
    t_end = clock();
    t=t_end-t_start;
    double time_taken = ((double)t)/CLOCKS_PER_SEC;
    printf("outer for-loop took %f seconds to execute \n", time_taken);
}
void fun2(){
    int i=0,j=0,k=0;
    clock_t t2=0,t2_start=0,t2_end=0;
    for(i=0;i<100000;i++){
        //time the inner for loop
        t2_start=clock();
        for(j=0;j<1000;j++){
            //some code
        }
        t2_end=clock();
        t2+=(t2_end-t2_start);
        if(k==0){
            //some code
        }
    }
    double time_taken = ((double)t2)/CLOCKS_PER_SEC;
    printf("inner for-loop took %f seconds to execute \n", time_taken);
}
void fun3(){
    int i=0,j=0,k=0;
    clock_t t3=0,t3_start=0,t3_end=0;
    for(i=0;i<100000;i++){
        for(j=0;j<1000;j++){
            //some code
        }
        //time the if check
        t3_start=clock();
        if(k==0){
            //some code
        }
        t3_end=clock();
        t3+=(t3_end-t3_start);
    }
    double time_taken = ((double)t3)/CLOCKS_PER_SEC;
    printf("if-check took %f seconds to execute \n", time_taken);
}
The expected answer is that t in fun1 will likely be slightly more than t2 + t3 from fun2 and fun3 respectively, with the difference representing the additional time to evaluate the outer loop itself.
Less obvious, however, is the time added by the measurement itself, which is the time to invoke clock() once for each measurement. When measuring the inner loop or the if-check, that cost is effectively multiplied by 100,000 because of the iterations of the outer loop.
Here's a program that measures the measurement itself and, for good measure, also the time to evaluate an empty outer loop.
#include <time.h>
#include <stdio.h>
int main () {
    clock_t t = 0;
    clock_t t_start, t_end;
    for (int i = 0; i < 100000; i++) {
        t_start = clock();
        t_end = clock();
        t += (t_end - t_start);
    }
    double time_taken = ((double) t) / CLOCKS_PER_SEC;
    printf ("Time imposed by measurement itself: %fsec\n", time_taken);
    t_start = clock();
    for (int i = 0; i < 100000; i++) {
    }
    t_end = clock();
    t = (t_end - t_start);
    time_taken = ((double) t) / CLOCKS_PER_SEC;
    printf ("Time to evaluate the loop: %fsec\n", time_taken);
}
Which, at least on my system, suggests the measurement may skew the results some:
Time imposed by measurement itself: 0.056949sec
Time to evaluate the loop: 0.000200sec
To get the amount of time your inner loops "really" take, you'll need to subtract the time added by the act of measuring them.
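The subtraction could be sketched roughly as follows. The helper names (clock_pair_overhead_ticks, corrected_seconds) are invented for this example, and the overhead figure is only an estimate, since the gap between two back-to-back clock() calls varies from run to run.

```c
#include <time.h>

/* Average cost, in clock ticks, of one back-to-back clock() pair. */
double clock_pair_overhead_ticks(int samples)
{
    clock_t total = 0;
    for (int i = 0; i < samples; i++) {
        clock_t a = clock();
        clock_t b = clock();
        total += b - a;
    }
    return (double)total / samples;
}

/* Remove the estimated measurement cost from an accumulated reading.
   measured_ticks is a sum such as t2 or t3; measurements is how many
   start/end pairs contributed to it (100000 in the question). */
double corrected_seconds(clock_t measured_ticks, int measurements,
                         double overhead_ticks_per_pair)
{
    double ticks = (double)measured_ticks
                 - measurements * overhead_ticks_per_pair;
    if (ticks < 0.0)
        ticks = 0.0;   /* noise can push the estimate below zero */
    return ticks / CLOCKS_PER_SEC;
}
```

For fun2, for instance, one would pass t2, 100000 and the value returned by clock_pair_overhead_ticks(100000).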
I am trying to measure the time it takes to bubble sort 10 large numbers. I put the numbers in an array of size 10, then bubble sort them 10 times and print the time taken each time.
The problem is that all I'm getting is zeros for some reason!
Here is what I have in main:
int n = sizeof(arr10)/sizeof(arr10[0]);
start=clock();
bubbleSort(arr10, n);
end=clock();
cpu_time_used = (double) (end - start) / CLOCKS_PER_SEC;
printf("Bubble Sort time= %f\n",cpu_time_used);
And here is the bubble sort function:
// swap is defined first so that bubbleSort can call it without a prototype.
void swap(int *xp, int *yp)
{
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}
void bubbleSort(int arr[], int n)
{
    int i, j;
    for (i = 0; i < n-1; i++)
        // Last i elements are already in place
        for (j = 0; j < n-i-1; j++)
            if (arr[j] > arr[j+1])
                swap(&arr[j], &arr[j+1]);
}
The result is probably zero because the resolution of clock() is too low. There are CLOCKS_PER_SEC ticks per second, and on some platforms (Windows, for example) CLOCKS_PER_SEC is 1000. Your example probably runs in less than one thousandth of a second (less than 1 millisecond).
Use the %e format specifier to print the seconds. Most likely the number is so small that %f just prints 0.
Regardless of how you print it, the results are pretty much useless: such a small time is not meaningful for measuring your algorithm, because other noise in the OS will dominate.
Measuring the sort time of 10 numbers will never yield any significant value. Try with thousands and millions of numbers. And don't forget to compile with optimizations enabled.
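As a rough illustration of timing a larger input, the sketch below fills an array with n pseudo-random values and times only the sort. The names bubble_sort and time_bubble_sort are made up for this example, and it uses the same clock() approach as the question.

```c
#include <stdlib.h>
#include <time.h>

static void bubble_sort(int arr[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        for (size_t j = 0; j + 1 < n - i; j++)
            if (arr[j] > arr[j + 1]) {
                int tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
}

/* Sorts n pseudo-random ints and returns the CPU time of the sort alone,
   in seconds, or -1.0 if the allocation fails. *sorted_ok is set to 1
   when the result is in non-decreasing order, 0 otherwise. */
double time_bubble_sort(size_t n, int *sorted_ok)
{
    int *arr = malloc(n * sizeof *arr);
    *sorted_ok = 0;
    if (arr == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        arr[i] = rand() % 10000;

    clock_t start = clock();
    bubble_sort(arr, n);
    clock_t end = clock();

    *sorted_ok = 1;
    for (size_t i = 0; i + 1 < n; i++)
        if (arr[i] > arr[i + 1])
            *sorted_ok = 0;

    free(arr);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with n = 1000, 2000, 4000 and so on should show the time growing roughly fourfold for each doubling, as expected for an O(n^2) sort.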
I wrote a program based on the idea of a Riemann sum to compute the value of an integral. It uses several threads, but its performance, compared to a sequential program I wrote later, is subpar. Algorithm-wise they are identical except for the threading, so the question is: what's wrong with it? I assume pthread_join is not the cause, because if one thread finishes sooner than the thread the join is currently waiting on, the later join for it will simply return immediately. Is that correct? The free call is probably wrong, and there is no error checking on thread creation; I'm aware of that, as I deleted it while testing various things. Sorry for my bad English, and thanks in advance.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>
int counter = 0;
float sum = 0;
pthread_mutex_t mutx;
float function_res(float);
struct range {
    float left_border;
    int steps;
    float step_range;
};
void *calcRespectiveRange(void *ranges) {
    struct range *rangs = ranges;
    float left_border = rangs->left_border;
    int steps = rangs->steps;
    float step_range = rangs->step_range;
    free(rangs);
    //printf("left: %f steps: %d step range: %f\n", left_border, steps, step_range);
    int i;
    float temp_sum = 0;
    for(i = 0; i < steps; i++) {
        temp_sum += step_range * function_res(left_border);
        left_border += step_range;
    }
    sum += temp_sum;
    pthread_exit(NULL);
}
int main() {
    clock_t begin, end;
    if(pthread_mutex_init(&mutx, NULL) != 0) {
        printf("mutex error\n");
    }
    printf("enter range, amount of steps and threads: \n");
    float left_border, right_border;
    int steps_count;
    int threads_amnt;
    scanf("%f %f %d %d", &left_border, &right_border, &steps_count, &threads_amnt);
    float step_range = (right_border - left_border) / steps_count;
    int i;
    pthread_t tid[threads_amnt];
    float chunk = (right_border - left_border) / threads_amnt;
    int steps_per_thread = steps_count / threads_amnt;
    begin = clock();
    for(i = 0; i < threads_amnt; i++) {
        struct range *ranges;
        ranges = malloc(sizeof(ranges));
        ranges->left_border = i * chunk + left_border;
        ranges->steps = steps_per_thread;
        ranges->step_range = step_range;
        pthread_create(&tid[i], NULL, calcRespectiveRange, (void*) ranges);
    }
    for(i = 0; i < threads_amnt; i++) {
        pthread_join(tid[i], NULL);
    }
    end = clock();
    pthread_mutex_destroy(&mutx);
    printf("\n%f\n", sum);
    double time_spent = (double) (end - begin) / CLOCKS_PER_SEC;
    printf("Time spent: %lf\n", time_spent);
    return(0);
}
float function_res(float lb) {
    return(lb * lb + 4 * lb + 3);
}
Edit: in short - can it be improved to reduce execution time (with mutexes, for example)?
The execution time will be shortened, provided you have multiple hardware threads available.
The problem is in how you measure time: clock returns the processor time used by the program, which means it sums the time taken by all the threads. If your program runs 2 threads in parallel and takes 1 second of wall-clock time, each thread has used 1 second of CPU time, and clock will return the equivalent of 2 seconds.
To get the actual time used (on Linux), use gettimeofday. I modified your code by adding
#include <sys/time.h>
and capturing the start time before the loop:
struct timeval tv_start;
gettimeofday( &tv_start, NULL );
and after:
struct timeval tv_end;
gettimeofday( &tv_end, NULL );
and calculating the difference in seconds:
printf("CPU Time: %lf\nTime passed: %lf\n",
    time_spent,
    ((tv_end.tv_sec * 1000*1000.0 + tv_end.tv_usec) -
     (tv_start.tv_sec * 1000*1000.0 + tv_start.tv_usec)) / 1000/1000
);
(I also fixed the malloc from malloc(sizeof(ranges)) which allocates the size of a pointer (4 or 8 bytes for 32/64 bit CPU) to malloc(sizeof(struct range)) (12 bytes)).
When running with the input parameters 0 1000000000 1000000000 1, that is, 1 billion iterations in 1 thread, the output on my machine is:
CPU Time: 4.352000
Time passed: 4.400006
When running with 0 1000000000 1000000000 2, that is, 1 billion iterations spread over 2 threads (500 million iterations each), the output is:
CPU Time: 4.976000
Time passed: 2.500003
For completeness' sake, I tested it with the input 0 1000000000 1000000000 4:
CPU Time: 8.236000
Time passed: 2.180114
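It is a little faster, but not twice as fast as with 2 threads, and it uses far more CPU time (8.2 s versus 5.0 s). This is because my CPU is a Core i3, a dual-core with hyperthreading: the two extra logical threads share the physical cores' execution resources, so they aren't true additional hardware threads.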
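The difference between CPU time and elapsed time can also be seen without threads. In the sketch below (POSIX assumed; measure_sleep is a name invented for this example) the process sleeps, so the wall clock advances while clock() barely moves. This is the mirror image of the multithreaded case, where clock() advances faster than the wall clock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and nanosleep */
#endif
#include <time.h>

/* Sleeps for the given number of milliseconds and reports how much CPU
   time (clock) and wall-clock time (CLOCK_MONOTONIC) passed meanwhile. */
void measure_sleep(long millis, double *cpu_sec, double *wall_sec)
{
    struct timespec req, w0, w1;
    req.tv_sec  = millis / 1000;
    req.tv_nsec = (millis % 1000) * 1000000L;

    clock_t c0 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w0);
    nanosleep(&req, NULL);            /* the process uses almost no CPU here */
    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    *cpu_sec  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_sec = (double)(w1.tv_sec - w0.tv_sec)
              + (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
}
```

With millis set to 100, the wall-clock figure should come out near 0.1 s while the CPU figure stays close to zero.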
I need to measure the time taken by various sorting algorithms in C.
I am checking the time taken to sort 100, 200, 300, 400 and 500 elements.
Surprisingly, the time taken to sort them seems to be the same!
What's wrong with the code that makes it report the same time, 0.000000, for every size? It is absurd that sorting the numbers would take 0 seconds.
What changes must be made to the code so that I get the accurate time taken (in seconds) for sorting the numbers?
#include <stdio.h>
#include <stdlib.h> // for rand() and srand()
#include <time.h>
// Bubble Sort on 100 to 500 elements
int main()
{
    int g;
    //clock_t ti;
    for(g=0;g<5;g++)
    {
        int n, i, j, swap;
        clock_t ti;
        time_t t;
        srand((unsigned) time(&t));
        //scanf("%d", &n);
        n=100;
        int array[n*(g+1)];
        for (i = 0; i < n; i++)
            array[i] = rand() % 10000;// Generating random numbers as array entries
        //printf("%f \n",ti);
        ti = clock();
        for (i = 0 ; i < ( n - 1 ); i++)
        {
            for (j = 0 ; j < n - i - 1; j++)
            {
                if (array[j] > array[j+1])
                {
                    swap = array[j];
                    array[j] = array[j+1];
                    array[j+1] = swap;
                }
            }
        }
        ti = clock() - ti;
        double time_taken = ((double)ti)/CLOCKS_PER_SEC;
        /*for ( i = 0 ; i < n ; i++ )
        printf("%d \n ", array[i]);*/
        printf("Time Taken to sort %d elements is %f\n",(g+1)*100,time_taken);
    }
    return 0;
}
The output that I am getting is:
Time Taken to sort 100 elements is 0.000000
Time Taken to sort 200 elements is 0.000000
Time Taken to sort 300 elements is 0.000000
Time Taken to sort 400 elements is 0.000000
Time Taken to sort 500 elements is 0.000000
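clock() might have too low a resolution. Check your value for CLOCKS_PER_SEC. Note also that n stays at 100 on every pass, so the code sorts 100 elements each time, whatever size it prints.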
clock_gettime() provides a high resolution timer.
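To see what resolution a given clock claims, POSIX also offers clock_getres. The following is a minimal sketch assuming a POSIX system; clock_resolution_sec and clock_tick_sec are helper names invented here. The reported figure is what the system advertises, which is not necessarily how often the value actually changes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_getres and clockid_t */
#endif
#include <time.h>

/* Reported resolution of a POSIX clock in seconds, or -1.0 on error. */
double clock_resolution_sec(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1.0;
    return (double)res.tv_sec + (double)res.tv_nsec / 1e9;
}

/* Length of one clock() tick in seconds, derived from CLOCKS_PER_SEC. */
double clock_tick_sec(void)
{
    return 1.0 / CLOCKS_PER_SEC;
}
```

Printing clock_resolution_sec(CLOCK_MONOTONIC) and clock_tick_sec() with %e makes it easy to compare the two timers on a given machine.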
I would prefer reading CPU cycles directly from a CPU register instead of using clock(). It's accurate and has very little overhead. How it's done depends on the CPU used. See e.g. http://software.intel.com/en-us/forums/topic/300007
You must of course convert cycles to time if you actually need time, like this (you must implement a CPU-specific get_cpu_cycles() for your system/environment):
unsigned long begin = 0, end = 0;
begin = get_cpu_cycles();
// Do what you like to measure
end = get_cpu_cycles();
unsigned long diff = end - begin;
const unsigned long cpu_speed = 1000000; // Core speed in kHz (1 GHz), so cpu_time is in milliseconds.
double cpu_time = ((double) diff) / cpu_speed;
printf("Time: %.6f ms\n", cpu_time);
Please note that the CPU speed is hard-coded as 1 GHz.
Is there any simple way to measure computing time in C? I tried the time utility when executing the program, but I need to measure a specific part of a program.
Thanks
You can use the clock function in <time.h> along with the macro CLOCKS_PER_SEC:
clock_t start = clock() ;
do_some_work() ;
clock_t end = clock() ;
double elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
Now elapsed_time holds the time it took to call do_some_work, in fractional seconds.
You can try the profiler "gprof". More information here: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
You can generally use the clock() function to get the start and end times of a single call to your function being tested. If, however, do_some_work() is particularly fast, it needs to be put in a loop and have the cost of the loop itself factored out, something like:
#define COUNT 10000
// Get cost of naked loop.
clock_t start_base = clock();
for (int i = COUNT; i > 0; i--)
    ;
clock_t end_base = clock();
// Get cost of loop plus work.
clock_t start = clock();
for (int i = COUNT; i > 0; i--)
    do_some_work();
clock_t end = clock();
// Calculate cost of single call.
double elapsed_time = (double)(end - start) - (double)(end_base - start_base);
elapsed_time = elapsed_time / CLOCKS_PER_SEC / COUNT;
This has at least two advantages:
you'll get an average time which is more representative of the actual time it should take; and
you'll get a more accurate answer in the case where the clock() function has a limited resolution.
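One caveat with the empty baseline loop is that an optimizing compiler may remove it entirely. A possible variation, sketched below, wraps the idea in a helper and touches a volatile counter in both loops so that neither disappears. average_call_seconds and sample_work are hypothetical names for this illustration, with sample_work standing in for do_some_work().

```c
#include <time.h>

/* Hypothetical stand-in for do_some_work(); it only counts its calls. */
static volatile long sample_calls = 0;
void sample_work(void) { sample_calls++; }

/* Average CPU time in seconds of one call to fn, over count calls.
   The cost of the bare loop is measured separately and subtracted. */
double average_call_seconds(void (*fn)(void), long count)
{
    volatile long guard = 0;   /* volatile keeps the bare loop from vanishing */

    clock_t start_base = clock();
    for (long i = 0; i < count; i++)
        guard++;
    clock_t end_base = clock();

    clock_t start = clock();
    for (long i = 0; i < count; i++) {
        guard++;
        fn();
    }
    clock_t end = clock();

    double ticks = (double)(end - start) - (double)(end_base - start_base);
    if (ticks < 0.0)
        ticks = 0.0;           /* timer noise can make the difference negative */
    return ticks / CLOCKS_PER_SEC / count;
}
```

A call such as average_call_seconds(sample_work, 1000000) returns the estimated cost of a single call in seconds.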
#codebolt - Thank you! Very nice. On Mac OS X, I added an include of time.h and pasted in your four lines. Then I printed the values of start and end (integers) and the elapsed time. The ticks are 1 µs apart (5 ticks gave 0.000005 s).
output:
3 X: strcpy .name, .numDocks: start 0x5dc end 0x5e1 elapsed: 0.000005
calloc: start 0x622 end 0x630 elapsed: 0.000014
In my foo.c program I have
#include <libc.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
but it works without explicitly including time.h. One of the others must bring it in.
Actual code:
clock_t start = clock() ;
strcpy( yard2.name, temp ); /* temp is only persistent in main... */
strcpy( yard1.name, "Yard 1");
strcpy( yard3.name, "3 y 3 a 3 r 3 d 3");
yard1.numDocks = MAX_DOCKS; /* or so I guess.. */
yard2.numDocks = MAX_DOCKS; /* or so I guess.. */
yard3.numDocks = MAX_DOCKS; /* or so I guess.. */
clock_t end = clock() ;
double elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
printf("3 X: strcpy .name, .numDocks: start 0x%lx end 0x%lx elapsed: %-12f \n", (unsigned long)start, (unsigned long)end, elapsed_time );
start = clock() ;
arrayD = calloc( yard2.numDocks, sizeof( struct dock ) ); /* get some memory, init it to 0 */
end = clock() ;
elapsed_time = (end-start)/(double)CLOCKS_PER_SEC ;
printf("calloc: start 0x%lx end 0x%lx elapsed: %-12f \n", (unsigned long)start, (unsigned long)end, elapsed_time );