I have the following code for a C program that displays the number of microseconds it took for a system() call to run:
#include <stdio.h>
#include <stdlib.h>
#include <profileapi.h>
long long measure(char* command) {
// define variables
LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;
// get the frequency of the counter
QueryPerformanceFrequency(&Frequency);
// get the current count
QueryPerformanceCounter(&StartingTime);
// run our command
system(command);
// get the end of the count
QueryPerformanceCounter(&EndingTime);
// calculate the difference in counts
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
// scale to microseconds
ElapsedMicroseconds.QuadPart *= 1000000;
// divide by the frequency of the counter
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
return ElapsedMicroseconds.QuadPart;
}
int main() {
// measure the time it takes to run the command "ls"
long long time = measure("echo hello");
// print the time elapsed
printf("%lld\n", time);
return 0;
}
Now, when I run the program I get somewhere between 16-20 milliseconds when I do the math, however I get much lower times in PowerShell with Measure-Command {echo hello | Out-Default}. This leads me to suspect I am doing something wrong with QueryPerformanceCount. I am getting much larger timespans than I should be.
I attached an image of one of the instances showing a large difference.
Am I doing anything wrong with my code?
Thanks
Image showing a instance of the problem
Using system creates a new process, that's why it's taking longer.
At the PowerShell example, you're not creating a new cmd.exe process, and performance is measured when PowerShell and all of its modules are already loaded.
(Answer from my comment)
I am evaluating the performance of a busy wait loop for firing events at consistent intervals. I have noticed some odd behavior using the following code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
int timespec_subtract(struct timespec *, struct timespec, struct timespec);
int main(int argc, char *argv[]) {
int iterations = atoi(argv[1])+1;
struct timespec t[2], diff;
for (int i = 0; i < iterations; i++) {
clock_gettime(CLOCK_MONOTONIC, &t[0]);
static volatile int i;
for (i = 0; i < 200000; i++)
;
clock_gettime(CLOCK_MONOTONIC, &t[1]);
timespec_subtract(&diff, t[1], t[0]);
printf("%ld\n", diff.tv_sec * 1000000000 + diff.tv_nsec);
}
}
On the test machine (dual 14-core E5-2683 v3 # 2.00Ghz, 256GB DDR4), 200k iterations of the for loop is approximately 1ms. Or maybe not:
1030854
1060237
1012797
1011479
1025307
1017299
1011001
1038725
1017361
... (about 700 lines later)
638466
638546
638446
640422
638468
638457
638468
638398
638493
640242
... (about 200 lines later)
606460
607013
606449
608813
606542
606484
606990
606436
606491
606466
... (about 3000 lines later)
404367
404307
404309
404306
404270
404370
404280
404395
404342
406005
When the times shift down the third time, they stay mostly consistent (within about 2 or 3 microseconds), except for occasionally jumping up to about 450us for a few hundred iterations. This behavior is repeatable on similar machines and over many runs.
I understand that busy loops can be optimized out by the compiler, but I don't think that's the issue here. I don't think cache should be affecting it, because no invalidation should be taking place, and wouldn't explain the sudden optimization. I also tried using a register int for the loop counter, with no noticeable effect.
Any thoughts on what is going on, and how to make this (more) consistent?
EDIT: For information, running this program with usleep, nanosleep, or the shown busy wait for 10k iterations all show ~20000 involuntary context switches with time -v.
I'd make 2 points
- Due to context swtiching sleep/usleep may sleep for more time than expected
- Moreover if there is some higher priority task like interrupts, there may come a situation when sleep may not be executed at all.
Thus if you want exact delay in your application you can use gettimeofday to calculate the time gap which can be subtracted from the delay in sleep/usleep call
One big issue with busy waiting is that, besides using up CPU resources, the amount of time you wait will be highly dependent on the CPU block speed. So the same loop can run for wildly different times on different machines.
The problem with any method of sleeping is that due to OS scheduling you may end up sleeping for longer than intended. The man pages for nanosleep says that it will use the rem argument to tell you the remaining time in case you received a signal, but it says nothing about waiting too long.
You need to grab the timestamp after each call to usleep so you know how long you actually slept for. If you slept too short, you add the deficit. If you slept too long, you subtract the overage.
Here's an example of how I did this in UFTP, a multicast file transfer application, in order to send packets at a consistent speed:
int64_t diff_usec(struct timeval t2, struct timeval t1)
{
return (t2.tv_usec - t1.tv_usec) +
(int64_t)1000000 * (t2.tv_sec - t1.tv_sec);
}
...
int32_t packet_wait = 10000;
int64_t overage = 0, tdiff;
struct timeval current_sent, last_sent;
gettimeofday(&last_sent, NULL);
while(...) {
...
if (packet_wait > overage) {
usleep(packet_wait - (int32_t)overage);
}
gettimeofday(¤t_sent, NULL);
tdiff = diff_usec(current_sent, last_sent);
overage += tdiff - packet_wait;
last_sent = current_sent;
...
}
I was trying to familiarize myself with the C time.h library by writing something simple in VS. The following code simply prints the value of x added to itself every two seconds:
int main() {
time_t start = time(NULL);
time_t clock = time(NULL);
time_t clockTemp = time(NULL); //temporary clock
int x = 1;
//program will continue for a minute (60 sec)
while (clock <= start + 58) {
clockTemp = time(NULL);
if (clockTemp >= clock + 2) { //if 2 seconds has passed
clock = clockTemp;
x = ADD(x);
printf("%d at %d\n", x, timeDiff(start, clock));
}
}
}
int timeDiff(int start, int at) {
return at - start;
}
My concern is with the amount of CPU that this program takes, about 22%. I figure this problem stems from the constant updating of the clockTemp (just below the while statement), but I'm not sure how to fix this issue. Is it possible that this is a visual studio problem, or is there a special way to check for time?
Solution
the code needed the sleep function so that it wouldn't need to run constantly.
I added sleep with #include <windows.h> and put Sleep (2000) //2 second sleep at the end of the while
while (clock <= start + 58) {
...
Sleep(2000); }
The problem is not in the way you are checking the current time. The problem is that there is nothing to limit the frequency with which the loop runs. Your program continues to execute statements as quickly as it can, and eats up a ton of processor time. (In the absence of other programs, on a single-threaded CPU, it would use 100% of your processor time.)
You need to add a "sleep" method inside your loop, which will indicate to the processor that it can stop processing your program for a short period of time. There are many ways to do this; this question has some examples.
I am running a C program using GCC and a proprietary DSP cross-compiler to simulate some functioality. I am using the following code to measure the execution time of particular part of my program:
clock_t start,end;
printf("DECODING DATA:\n");
start=clock();
conv3_dec(encoded, decoded,3*length,0);
end=clock();
duration = (double)(end - start) / CLOCKS_PER_SEC;
printf("DECODING TIME = %f\n",duration);
where conv3_dec() is a function defined in my program and I want to find the run-time of this function.
Now the thing is when my program runs, the conv3_dec() functions runs for almost 2 hours but the output from the printf("DECODING TIME = %f\n",duration) says that the execution of the function finished in just half a second (DECODING TIME = 0.455443) . This is very confusing for me.
I have used the clock_t technique to measure the runtimes of programs previously but the difference has never been so huge. Is this being caused by the cross-compiler. Just as a side note, the simulator simulates a DSP processor running at just 500MHz, so is the difference in the clock speeds of the DSP processor and my CPU causing the error is measuring the CLOCKS_PER_SEC.
clock measures the cpu time and not the wallclock time. Since you are not running the majority of your code on the cpu, this is not the right tool.
For durations like two hours, I wouldn't be too concerned about clock(), it's far more useful for measuring sub-second durations.
Just use time() if you want the actual elapsed time, something like (dummy stuff supplied for what was missing):
#include <stdio.h>
#include <time.h>
// Dummy stuff starts here
#include <unistd.h>
#define encoded 0
#define decoded 0
#define length 0
static void conv3_dec (int a, int b, int c, int d) {
sleep (20);
}
// Dummy stuff ends here
int main (void) {
time_t start, end, duration;
puts ("DECODING DATA:");
start = time (0);
conv3_dec (encoded, decoded, 3 * length, 0);
end = time (0);
duration = end - start;
printf ("DECODING TIME = %d\n", duration);
return 0;
}
which generates:
DECODING DATA:
DECODING TIME = 20
gettimeofday() function also can be considered.
The gettimeofday() function shall obtain the current time, expressed as seconds and microseconds since the Epoch, and store it in the timeval structure pointed to by tp. The resolution of the system clock is unspecified.
Calculating elapsed time in a C program in milliseconds
http://www.ccplusplus.com/2011/11/gettimeofday-example.html
I have a C program that aims to be run in parallel on several processors. I need to be able to record the execution time (which could be anywhere from 1 second to several minutes). I have searched for answers, but they all seem to suggest using the clock() function, which then involves calculating the number of clocks the program took divided by the Clocks_per_second value.
I'm not sure how the Clocks_per_second value is calculated?
In Java, I just take the current time in milliseconds before and after execution.
Is there a similar thing in C? I've had a look, but I can't seem to find a way of getting anything better than a second resolution.
I'm also aware a profiler would be an option, but am looking to implement a timer myself.
Thanks
CLOCKS_PER_SEC is a constant which is declared in <time.h>. To get the CPU time used by a task within a C application, use:
clock_t begin = clock();
/* here, do your time-consuming job */
clock_t end = clock();
double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
Note that this returns the time as a floating point type. This can be more precise than a second (e.g. you measure 4.52 seconds). Precision depends on the architecture; on modern systems you easily get 10ms or lower, but on older Windows machines (from the Win98 era) it was closer to 60ms.
clock() is standard C; it works "everywhere". There are system-specific functions, such as getrusage() on Unix-like systems.
Java's System.currentTimeMillis() does not measure the same thing. It is a "wall clock": it can help you measure how much time it took for the program to execute, but it does not tell you how much CPU time was used. On a multitasking systems (i.e. all of them), these can be widely different.
If you are using the Unix shell for running, you can use the time command.
doing
$ time ./a.out
assuming a.out as the executable will give u the time taken to run this
In plain vanilla C:
#include <time.h>
#include <stdio.h>
int main()
{
clock_t tic = clock();
my_expensive_function_which_can_spawn_threads();
clock_t toc = clock();
printf("Elapsed: %f seconds\n", (double)(toc - tic) / CLOCKS_PER_SEC);
return 0;
}
You functionally want this:
#include <sys/time.h>
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
/* stuff to do! */
gettimeofday(&tv2, NULL);
printf ("Total time = %f seconds\n",
(double) (tv2.tv_usec - tv1.tv_usec) / 1000000 +
(double) (tv2.tv_sec - tv1.tv_sec));
Note that this measures in microseconds, not just seconds.
Most of the simple programs have computation time in milli-seconds. So, i suppose, you will find this useful.
#include <time.h>
#include <stdio.h>
int main(){
clock_t start = clock();
// Execuatable code
clock_t stop = clock();
double elapsed = (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC;
printf("Time elapsed in ms: %f", elapsed);
}
If you want to compute the runtime of the entire program and you are on a Unix system, run your program using the time command like this time ./a.out
(All answers here are lacking, if your sysadmin changes the systemtime, or your timezone has differing winter- and sommer-times. Therefore...)
On linux use: clock_gettime(CLOCK_MONOTONIC_RAW, &time_variable);
It's not affected if the system-admin changes the time, or you live in a country with winter-time different from summer-time, etc.
#include <stdio.h>
#include <time.h>
#include <unistd.h> /* for sleep() */
int main() {
struct timespec begin, end;
clock_gettime(CLOCK_MONOTONIC_RAW, &begin);
sleep(1); // waste some time
clock_gettime(CLOCK_MONOTONIC_RAW, &end);
printf ("Total time = %f seconds\n",
(end.tv_nsec - begin.tv_nsec) / 1000000000.0 +
(end.tv_sec - begin.tv_sec));
}
man clock_gettime states:
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since some unspecified starting point. This clock is not affected by discontinuous jumps in the system time
(e.g., if the system administrator manually changes the clock), but is affected by the incremental adjustments performed by adjtime(3) and NTP.
Thomas Pornin's answer as macros:
#define TICK(X) clock_t X = clock()
#define TOCK(X) printf("time %s: %g sec.\n", (#X), (double)(clock() - (X)) / CLOCKS_PER_SEC)
Use it like this:
TICK(TIME_A);
functionA();
TOCK(TIME_A);
TICK(TIME_B);
functionB();
TOCK(TIME_B);
Output:
time TIME_A: 0.001652 sec.
time TIME_B: 0.004028 sec.
A lot of answers have been suggesting clock() and then CLOCKS_PER_SEC from time.h. This is probably a bad idea, because this is what my /bits/time.h file says:
/* ISO/IEC 9899:1990 7.12.1: <time.h>
The macro `CLOCKS_PER_SEC' is the number per second of the value
returned by the `clock' function. */
/* CAE XSH, Issue 4, Version 2: <time.h>
The value of CLOCKS_PER_SEC is required to be 1 million on all
XSI-conformant systems. */
# define CLOCKS_PER_SEC 1000000l
# if !defined __STRICT_ANSI__ && !defined __USE_XOPEN2K
/* Even though CLOCKS_PER_SEC has such a strange value CLK_TCK
presents the real value for clock ticks per second for the system. */
# include <bits/types.h>
extern long int __sysconf (int);
# define CLK_TCK ((__clock_t) __sysconf (2)) /* 2 is _SC_CLK_TCK */
# endif
So CLOCKS_PER_SEC might be defined as 1000000, depending on what options you use to compile, and thus it does not seem like a good solution.
#include<time.h>
#include<stdio.h>
int main(){
clock_t begin=clock();
int i;
for(i=0;i<100000;i++){
printf("%d",i);
}
clock_t end=clock();
printf("Time taken:%lf",(double)(end-begin)/CLOCKS_PER_SEC);
}
This program will work like charm.
You have to take into account that measuring the time that took a program to execute depends a lot on the load that the machine has in that specific moment.
Knowing that, the way of obtain the current time in C can be achieved in different ways, an easier one is:
#include <time.h>
#define CPU_TIME (getrusage(RUSAGE_SELF,&ruse), ruse.ru_utime.tv_sec + \
ruse.ru_stime.tv_sec + 1e-6 * \
(ruse.ru_utime.tv_usec + ruse.ru_stime.tv_usec))
int main(void) {
time_t start, end;
double first, second;
// Save user and CPU start time
time(&start);
first = CPU_TIME;
// Perform operations
...
// Save end time
time(&end);
second = CPU_TIME;
printf("cpu : %.2f secs\n", second - first);
printf("user : %d secs\n", (int)(end - start));
}
Hope it helps.
Regards!
ANSI C only specifies second precision time functions. However, if you are running in a POSIX environment you can use the gettimeofday() function that provides microseconds resolution of time passed since the UNIX Epoch.
As a side note, I wouldn't recommend using clock() since it is badly implemented on many(if not all?) systems and not accurate, besides the fact that it only refers to how long your program has spent on the CPU and not the total lifetime of the program, which according to your question is what I assume you would like to measure.
I've found that the usual clock(), everyone recommends here, for some reason deviates wildly from run to run, even for static code without any side effects, like drawing to screen or reading files. It could be because CPU changes power consumption modes, OS giving different priorities, etc...
So the only way to reliably get the same result every time with clock() is to run the measured code in a loop multiple times (for several minutes), taking precautions to prevent the compiler from optimizing it out: modern compilers can precompute the code without side effects running in a loop, and move it out of the loop., like i.e. using random input for each iteration.
After enough samples are collected into an array, one sorts that array, and takes the middle element, called median. Median is better than average, because it throws away extreme deviations, like say antivirus taking up all CPU up or OS doing some update.
Here is a simple utility to measure execution performance of C/C++ code, averaging the values near median: https://github.com/saniv/gauge
I'm myself still looking for a more robust and faster way to measure code. One could probably try running the code in controlled conditions on bare metal without any OS, but that will give unrealistic result, because in reality OS does get involved.
x86 has these hardware performance counters, which including the actual number of instructions executed, but they are tricky to access without OS help, hard to interpret and have their own issues ( http://archive.gamedev.net/archive/reference/articles/article213.html ). Still they could be helpful investigating the nature of the bottle neck (data access or actual computations on that data).
Every solution's are not working in my system.
I can get using
#include <time.h>
double difftime(time_t time1, time_t time0);
Some might find a different kind of input useful: I was given this method of measuring time as part of a university course on GPGPU-programming with NVidia CUDA (course description). It combines methods seen in earlier posts, and I simply post it because the requirements give it credibility:
unsigned long int elapsed;
struct timeval t_start, t_end, t_diff;
gettimeofday(&t_start, NULL);
// perform computations ...
gettimeofday(&t_end, NULL);
timeval_subtract(&t_diff, &t_end, &t_start);
elapsed = (t_diff.tv_sec*1e6 + t_diff.tv_usec);
printf("GPU version runs in: %lu microsecs\n", elapsed);
I suppose you could multiply with e.g. 1.0 / 1000.0 to get the unit of measurement that suits your needs.
If you program uses GPU or if it uses sleep() then clock() diff gives you smaller than actual duration. It is because clock() returns the number of CPU clock ticks. It only can be used to calculate CPU usage time (CPU load), but not the execution duration. We should not use clock() to calculate duration. We still should use gettimeofday() or clock_gettime() for duration in C.
perf tool is more accurate to be used in order to collect and profile the running program. Use perf stat to show all information related to the program being executed.
As simple as possible by using function-like macro
#include <stdio.h>
#include <time.h>
#define printExecTime(t) printf("Elapsed: %f seconds\n", (double)(clock()-(t)) / CLOCKS_PER_SEC)
int factorialRecursion(int n) {
return n == 1 ? 1 : n * factorialRecursion(n-1);
}
int main()
{
clock_t t = clock();
int j=1;
for(int i=1; i <10; i++ , j*=i);
printExecTime(t);
// compare with recursion factorial
t = clock();
j = factorialRecursion(10);
printExecTime(t);
return 0;
}
Comparison of execution time of bubble sort and selection sort
I have a program which compares the execution time of bubble sort and selection sort.
To find out the time of execution of a block of code compute the time before and after the block by
clock_t start=clock();
…
clock_t end=clock();
CLOCKS_PER_SEC is constant in time.h library
Example code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
int a[10000],i,j,min,temp;
for(i=0;i<10000;i++)
{
a[i]=rand()%10000;
}
//The bubble Sort
clock_t start,end;
start=clock();
for(i=0;i<10000;i++)
{
for(j=i+1;j<10000;j++)
{
if(a[i]>a[j])
{
int temp=a[i];
a[i]=a[j];
a[j]=temp;
}
}
}
end=clock();
double extime=(double) (end-start)/CLOCKS_PER_SEC;
printf("\n\tExecution time for the bubble sort is %f seconds\n ",extime);
for(i=0;i<10000;i++)
{
a[i]=rand()%10000;
}
clock_t start1,end1;
start1=clock();
// The Selection Sort
for(i=0;i<10000;i++)
{
min=i;
for(j=i+1;j<10000;j++)
{
if(a[min]>a[j])
{
min=j;
}
}
temp=a[min];
a[min]=a[i];
a[i]=temp;
}
end1=clock();
double extime1=(double) (end1-start1)/CLOCKS_PER_SEC;
printf("\n");
printf("\tExecution time for the selection sort is %f seconds\n\n", extime1);
if(extime1<extime)
printf("\tSelection sort is faster than Bubble sort by %f seconds\n\n", extime - extime1);
else if(extime1>extime)
printf("\tBubble sort is faster than Selection sort by %f seconds\n\n", extime1 - extime);
else
printf("\tBoth algorithms have the same execution time\n\n");
}