I've seen many topics about this, even on Stack Overflow, for example:
How can I measure CPU time and wall clock time on both Linux/Windows?
I want to measure both CPU time and wall time. Although the person who answered the question in the topic above recommended gettimeofday for measuring wall time, I've read that it's better to use clock_gettime instead. So I wrote the code below (is it OK? does it really measure wall time, not CPU time? I'm asking because I found a web page, http://nadeausoftware.com/articles/2012/03/c_c_tip_how_measure_cpu_time_benchmarking#clockgettme, which says that clock_gettime measures CPU time...). What's the truth, and which one should I use to measure wall time?
Another question is about CPU time. I found answers saying that clock() is good for this, so I wrote sample code for that too. But it's not what I want: for my code it reports 0 seconds of CPU time. Is it possible to measure CPU time more precisely (in seconds)? Thanks for any help (for now, I'm interested only in Linux solutions).
Here's my code:
#include <time.h>   /* clock_gettime, clock */
#include <stdio.h>  /* printf */
#include <math.h>   /* log */
#include <stdlib.h>

int main()
{
    int i;
    double sum = 0.0;  /* initialize, otherwise we accumulate into garbage */

    // measure elapsed wall time
    struct timespec tmstart, now;
    clock_gettime(CLOCK_REALTIME, &tmstart);
    for (i = 1; i < 1024; i++) {  /* start at 1: log(0) is -inf */
        sum += log((double)i);
    }
    clock_gettime(CLOCK_REALTIME, &now);
    double seconds = (now.tv_sec - tmstart.tv_sec) + (now.tv_nsec - tmstart.tv_nsec) * 1e-9;
    printf("wall time %fs\n", seconds);

    // measure cpu time
    double start = (double)clock() / CLOCKS_PER_SEC;
    for (i = 1; i < 1024; i++) {
        sum += log((double)i);
    }
    double end = (double)clock() / CLOCKS_PER_SEC;
    printf("cpu time %fs\n", end - start);

    return 0;
}
Compile it like this:
gcc test.c -o test -lrt -lm
and it shows me:
wall time 0.000424s
cpu time 0.000000s
I know I can make more iterations but thats not the point here ;)
IMPORTANT:
printf("CLOCKS_PER_SEC is %ld\n", CLOCKS_PER_SEC);
shows
CLOCKS_PER_SEC is 1000000
According to my manual page on clock it says
POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.
When increasing the number of iterations on my computer, the measured CPU time starts showing up at around 100,000 iterations. From the returned figures it seems the actual resolution is 10 milliseconds.
Beware that when you optimize your code, the whole loop may disappear because sum is a dead value. There is also nothing to stop the compiler from moving the clock statements across the loop as there are no real dependences with the code in between.
Let me elaborate a bit more on micro-measurements of code performance. The naive and tempting way to measure performance is indeed to add clock statements as you have done. However, since time is not a concept or side effect in C, compilers can often move these clock calls at will. To remedy this it is tempting to give the clock calls side effects, for example by having them access volatile variables. However, this still doesn't prohibit the compiler from moving side-effect-free code over the calls; think, for example, of accesses to regular local variables. Worse, by making the clock calls look scary to the compiler, you will actually inhibit optimizations around them, so the mere act of measuring performance impacts that performance in a negative and undesirable way.
If you use profiling, as already mentioned by someone, you can get a pretty good assessment of the performance of even optimized code, although the overall time of course is increased.
Another good way to measure performance is just asking the compiler to report the number of cycles some code will take. For a lot of architectures the compiler has a very accurate estimate of this. However most notably for a Pentium architecture it doesn't because the hardware does a lot of scheduling that is hard to predict.
Although it is not standard practice, I think compilers should support a pragma that marks a function to be measured. The compiler could then include high-precision, non-intrusive measuring points in the prologue and epilogue of the function and prohibit any inlining of it. Depending on the architecture, it could choose a high-precision clock to measure time, preferably with support from the OS for measuring only the time of the current process.
Related
I want to implement a variable delay on an ATmega8, but in _delay_us() I can only put a constant value. I think I could get a variable microsecond delay with a timer, but I don't know how to do that.
Please help me.
You can use a delay loop: you delay for one microsecond in each
iteration, and do as many iterations as microseconds you have to burn:
void delay_us(unsigned long us)
{
while (us--) _delay_us(1);
}
There are, however, a few issues with this approach:
it takes time to manage the iterations (decrement the counter, compare
to zero, conditional branch...), so the delay within the loop should
be significantly shorter than 1 µs
it takes time to call the function and return from it, and this should
be discounted from the iteration count, but since this time may not be
a full number of microseconds, you will have to add a small delay in
order to get to the next full microsecond
if the compiler inlines the function, everything will be off.
Trying to fix those issues yields something like this:
// Only valid with a 16 MHz clock.
void __attribute__((noinline)) delay_us(unsigned long us)
{
if (us < 2) return;
us -= 2;
_delay_us(0.4375);
while (us--) _delay_us(0.3125);
}
For a more complete version that can handle various clock frequencies,
see the delayMicroseconds() function from the Arduino AVR
core. Notice that the function is only accurate for a few discrete
frequencies. Notice also that the delay loop is done in inline assembly,
in order to be independent of compiler optimizations.
I am measuring the cycle count of different C functions which I try to make constant time in order to mitigate side channel attacks (crypto).
I am working with a microcontroller (aurix from infineon) which has an onboard cycle counter which gets incremented each clock tick and which I can read out.
Consider the following:
int result[32], cnt = 0;
int secret[32];
/** some other code ***/
reset_and_startCounter(); // resets cycles to 0 and starts the counter
int tmp = readCycles();   // read cycles before the function call
function(secret);         // the function I want to measure; should be constant time
result[cnt++] = readCycles() - tmp; // read cycles again and subtract to get the count
When I measure the cycles like shown above, I will sometimes receive a different amount of cycles depending on the input given to the function. (~1-10 cycles difference, function itself takes about 3000 cycles).
I now wondered whether it is not yet perfectly constant time and the calculations depend on the input. I looked into the function and did the following:
void function(int* input){
reset_and_startCounter();
int tmp = readCycles();
/*********************************
******calculations on input******
*********************************/
result[cnt++] = readCycles() - tmp;
}
and I received the same amount of cycles no matter what input is given.
I then also measured the time needed to call the function only, and to return from the function. Both measurements were the same no matter what input.
I was always using the gcc compiler flags -O3,-fomit-frame-pointer. -O3 because the runtime is critical and I need it to be fast. And also important, no other code has been running on the microcontroller (no OS etc.)
Does anyone have a possible explanation for this? I want to be sure that my code is constant time and that those extra cycles are arbitrary...
And sorry for not providing runnable code here, but I believe not many people have an Aurix lying around :O
Thank you
The Infineon Aurix microcontroller you're using is designed for hard real-time applications. It has intentionally been designed to provide consistent runtime performance -- it lacks most of the features that can lead to inconsistent performance on more sophisticated CPUs, like cache memory or branch prediction.
While showing that your code has constant runtime on this part is a start, it is still possible for your code to have variable runtime when run on other CPUs. It is also possible that a device containing this CPU may leak information through other channels, particularly through power analysis. If making your application resistant to side-channel analysis is critical, you may want to consider using a part designed for cryptographic applications. (The Aurix is not such a part.)
As has been known for a while (see, e.g., this old question, and the bug reports that pop up when you google this), clock_gettime() doesn't appear to report time monotonically. To rule out any silly error I might have overlooked, here is the relevant code (an excerpt from a larger program):
#include <time.h>

long nano_1, nano_2;
double delta;
struct timespec tspec;

clock_gettime(CLOCK_MONOTONIC_RAW, &tspec);
nano_1 = tspec.tv_nsec;
sort_selection(sorted_ptr, n);
clock_gettime(CLOCK_MONOTONIC_RAW, &tspec);
nano_2 = tspec.tv_nsec;
delta = (nano_2 - nano_1) / 1000000.0;
printf("\nSelection sort took %g milliseconds.\n", delta);
Sorting small arrays (about 1,000 elements) reports plausible times. When I sort larger ones (10,000+) using 3 sort algorithms, 1-2 of the 3 report back negative sort time. I tried all clock types mentioned in the man page, not only CLOCK_MONOTONIC_RAW - no change.
(1) Anything I overlooked in my code?
(2) Is there an alternative to clock_gettime() that measures time in increments finer than seconds? I don't need nanoseconds, but seconds is too coarse to really help.
System:
- Ubuntu 12.04.
- kernel 3.2.0-30
- gcc 4.6.3.
- libc version 2.15
- compiled with -lrt
This has nothing to do with the mythology of clock_gettime's monotonic clock not actually being monotonic (which probably has a basis in reality, but which was never well documented and probably fixed a long time ago). It's just a bug in your program. tv_nsec is the nanoseconds portion of a time value that's stored as two fields:
tv_sec - whole seconds
tv_nsec - nanoseconds in the range 0 to 999999999
Of course tv_nsec is going to jump backwards from 999999999 to 0 when tv_sec increments. To compute differences of timespec structs, you need to take 1000000000 times the difference in seconds and add that to the difference in nanoseconds. Of course this could quickly overflow if you don't convert to a 64-bit type first.
Based on a bit of reading around (including the link I provided above, and How to measure the ACTUAL execution time of a C program under Linux?) it seems that getrusage() or clock() should both provide you with a "working" timer that measures the time spent by your calculation only. It does puzzle me that your other function doesn't always give a >= 0 interval, I must say.
For use on getrusage, see http://linux.die.net/man/2/getrusage
**********************Original edit**********************
I am using different kind of clocks to get the time on Linux systems:
rdtsc, gettimeofday, clock_gettime
and already read various questions like these:
What's the best timing resolution can i get on Linux
How is the microsecond time of linux gettimeofday() obtained and what is its accuracy?
How do I measure a time interval in C?
faster equivalent of gettimeofday
Granularity in time function
Why is clock_gettime so erratic?
But I am a little confused:
What is the difference between granularity, resolution, precision, and accuracy?
Granularity (or resolution or precision) and accuracy are not the same things (if I am right ...)
For example, while using the "clock_gettime" the precision is 10 ms as I get with:
struct timespec res;
clock_getres(CLOCK_REALTIME, &res);
and the granularity (which is defined as ticks per second) is 100 Hz (or 10 ms), as I get when executing:
long ticks_per_sec = sysconf(_SC_CLK_TCK);
Accuracy is in nanoseconds, as the code below suggests:
struct timespec gettime_now;
clock_gettime(CLOCK_REALTIME, &gettime_now);
time_difference = gettime_now.tv_nsec - start_time;
In the link below, I saw that this is the Linux global definition of granularity and it's better not to change it:
http://wwwagss.informatik.uni-kl.de/Projekte/Squirrel/da/node5.html#fig:clock:hw
So my question is whether the remarks above are right, and also:
a) Can we see what is the granularity of rdtsc and gettimeofday (with a command)?
b) Can we change them (with any way)?
**********************Edit number 2**********************
I have tested some new clocks and I would like to share the information:
a) In the page below, David Terei wrote a fine program that compares various clocks and their performance:
https://github.com/dterei/Scraps/tree/master/c/time
b) I have also tested omp_get_wtime, as Raxman suggested, and I found nanosecond precision, but not really better than clock_gettime (as they found on this website):
http://msdn.microsoft.com/en-us/library/t3282fe5.aspx
I think it's a Windows-oriented time function.
Better results are obtained with clock_gettime using CLOCK_MONOTONIC than with CLOCK_REALTIME. That makes sense, because CLOCK_MONOTONIC measures elapsed time from a fixed (arbitrary) starting point and is immune to adjustments of the system clock, whereas CLOCK_REALTIME is the wall-clock time and can jump forwards or backwards while you measure.
c) I have also found the Intel function ippGetCpuClocks, but I've not tested it because you have to register first:
http://software.intel.com/en-us/articles/ipp-downloads-registration-and-licensing/
... or you may use a trial version
Precision is the amount of information, i.e. the number of significant digits you report. (E.g. I am 2 m, 1.8 m, 1.83 m, and 1.8322 m tall. All those measurements are accurate, but increasingly precise.)
Accuracy is the relation between the reported information and the truth. (E.g. "I'm 1.70 m tall" is more precise than "1.8 m", but not actually accurate.)
Granularity or resolution are about the smallest time interval that the timer can measure. For example, if you have 1 ms granularity, there's little point reporting the result with nanosecond precision, since it cannot possibly be accurate to that level of precision.
On Linux, the available timers with increasing granularity are:
clock() from <time.h> (20 ms or 10 ms resolution?)
gettimeofday() from Posix <sys/time.h> (microseconds)
clock_gettime() on Posix (nanoseconds?)
In C++, the <chrono> header offers a certain amount of abstraction around this, and std::high_resolution_clock attempts to give you the best possible clock.
I need to find the time taken to execute a single instruction or a couple of instructions and print it out in milliseconds. Can someone please share a small code snippet for this?
Thanks. I need this to measure the time taken to execute some instructions in my project.
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t t1 = clock();
    printf("Dummy Statement\n");
    clock_t t2 = clock();
    /* clock_t is not a floating type: convert before printing, and scale to ms */
    printf("The time taken is.. %g ms\n", (t2 - t1) * 1000.0 / CLOCKS_PER_SEC);
    return 0;
}
Please look at the links below too.
What’s the correct way to use printf to print a clock_t?
http://www.velocityreviews.com/forums/t454464-c-get-time-in-milliseconds.html
One instruction will take a lot shorter than 1 millisecond to execute. And if you are trying to measure more than one instruction it will get complicated (what about the loop that calls the instruction multiple times).
Also, most timing functions that you can use are just that: functions. That means they will execute instructions also. If you want to time one instruction then the best bet is to look up the specifications of the processor that you are using and see how many cycles it takes.
Doing this programatically isn't possible.
Edit:
Since you've updated your question to refer to a few instructions: you can measure sub-millisecond time on some processors. It would be nice to know the environment; the following applies to x86 and Linux, other environments will differ.
clock_gettime allows for nanosecond resolution. Or you can execute the rdtsc instruction yourself (good luck with this on a multiprocessor or SMP system - you could be measuring the wrong thing, e.g. if the two reads run on different processors).
The time to actually complete an instruction depends on the clock cycle time, and the depth of the pipeline the instruction traverses through the processor. As dave said, you can't really find this out by making a program. You can use some kind of timing function provided to you by your OS to measure the cpu time it takes to complete some small set of instructions. If you do this, try not to use any kind of instructions that rely on memory, or branching. Ideally you might do some kind of logical or arithmetic operations (so perhaps using some inline assembly in C).