Caching localtime_r(): is it worth it? (C)

Is it worth keeping a local copy of the struct tm and updating it only when required? The function below is not thread-safe... Also, I've seen that only 6 to 7% of CPU time can be saved...
struct tm *custom_localtime(time_t now_sec)
{
    static time_t cache_sec;    /* the second the cached result is for */
    static struct tm tms;       /* cached broken-down time */

    if (now_sec != cache_sec) { /* refresh only when the second changes */
        cache_sec = now_sec;
        localtime_r(&cache_sec, &tms);
    }
    return &tms;
}
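For reference, a thread-safe variant is possible with C11 thread-local storage (assuming a C11 compiler), so each thread keeps its own cache:

#include <time.h>

/* Thread-safe sketch: each thread gets its own cache_sec and tms. */
struct tm *custom_localtime_ts(time_t now_sec)
{
    static _Thread_local time_t cache_sec;
    static _Thread_local struct tm tms;

    if (now_sec != cache_sec) {
        cache_sec = now_sec;
        localtime_r(&cache_sec, &tms);
    }
    return &tms;
}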
Additional details:
- my app makes more than 3000 calls per second to localtime_r()
- I found at least a 33% CPU-time saving when caching time-stamp strings of the format "2011-12-09 10:32:45" against time_t seconds
thank you all nos, asc99c and Mircea.
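A minimal sketch of that string-caching idea, assuming the "%Y-%m-%d %H:%M:%S" format (like the original, not thread-safe):

#include <time.h>

/* Returns a cached "YYYY-MM-DD hh:mm:ss" string; reformats only
   when the second changes. Not thread-safe. */
const char *cached_timestamp(time_t now_sec)
{
    static time_t cache_sec = (time_t)-1;
    static char buf[sizeof "2011-12-09 10:32:45"];

    if (now_sec != cache_sec) {
        struct tm tms;
        cache_sec = now_sec;
        localtime_r(&now_sec, &tms);
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tms);
    }
    return buf;
}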

I would mention the 3000/s call rate in your question! Do it. I was recently profiling generation of a screen which was calling localtime approximately 1,000,000 * 10,000 times.
The nested loops could have been improved substantially with a bit of thought, but what I saw was that about 85% of the CPU time was used by localtime. Simply caching the result so that it was only called 10,000 times cut 85% off the page-generation time, and that made it easily fast enough.

"Avoiding a library function call that's not really needed" is worth it, of couse. The rest is only your tradeoff between memory and speed.
Since you're calling this 3000 times per second, you might want to go even further and put this function as static inline in a header, and also (if using GCC) use a branch-prediction hint on the conditional, stating that taking it is "unlikely":
if (__builtin_expect(now_sec != cache_sec, 0))
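Putting those two suggestions together, a minimal sketch of the header version (GCC-specific, and still not thread-safe; note that each translation unit gets its own copy of the cache):

/* e.g. in cached_time.h */
#include <time.h>

static inline struct tm *custom_localtime(time_t now_sec)
{
    static time_t cache_sec;
    static struct tm tms;

    /* the cache-miss branch is hinted as unlikely */
    if (__builtin_expect(now_sec != cache_sec, 0)) {
        cache_sec = now_sec;
        localtime_r(&cache_sec, &tms);
    }
    return &tms;
}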

Related

Function takes unusually long time to return

I am measuring how much time it takes to execute the function compute_secret(). The pseudo-code below shows how I do my measurement.
The problem is that there is quite a large difference between total_inside (the time it takes to execute the function body, measured inside the callee) and total_outside (measured inside the caller). For example, total_inside = 127 ms, total_outside = 238 ms.
I understand that it takes some time to call and return from a function, but it should not take "that" long.
So what are the common reasons why calling and returning from a function costs so much time?
int encrypt(args){
    time start_inside = get_current_time();
    /**
     actually compute stuff
    **/
    time end_inside = get_current_time();
    time total_inside = end_inside - start_inside;
}

int main(){
    time start_outside = get_current_time();
    encrypt(args);
    time end_outside = get_current_time();
    time total_outside = end_outside - start_outside;
}
From a glance at the pseudo-code, I see a number of things:
- The first is that it's pseudo-code, which rather hampers our ability to do a full analysis :-) So the comments below are based on what I see as the most likely options.
- You haven't actually posted the code for get_current_time(). Its execution time may be irrelevant, but that's not necessarily the case; sometimes timing functions get in the way of accurate timing.
- The reason I mention that last point is that the outer time calculation will consist of the function calculations themselves, the stack setup and teardown, and the inner time calculation. You should probably run with just the outer timing to see if the timings are reduced.
- Because of the vagaries of code execution, it's often better to time something in a loop and average the time taken (see the sketch after this answer). For all we know, the first call may take 250 ms and subsequent calls 1 ms.
If you're really concerned about stack setup and teardown (it's usually not too bad, but it's definitely not free, and for very short-lived functions it can often swamp the execution time of the actual work being done), there are options.
For example, you can choose higher optimisation levels, or force your function inline, such as by making it a function macro, assuming you protect against all the usual macro-related issues, of course.
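A sketch of the loop-and-average idea mentioned above; the encrypt() body here is a hypothetical stand-in, and CLOCK_MONOTONIC is one reasonable POSIX clock choice:

#include <stdio.h>
#include <time.h>

/* hypothetical workload standing in for the real encrypt() */
static void encrypt(void)
{
    volatile double x = 0.0;
    for (int i = 0; i < 100000; i++)
        x += i;
}

int main(void)
{
    enum { RUNS = 1000 };
    struct timespec t0, t1;
    double total_ms = 0.0;

    for (int i = 0; i < RUNS; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        encrypt();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) * 1e-6;
        if (i == 0)
            printf("first call: %.3f ms\n", ms); /* warm-up often differs */
        total_ms += ms;
    }
    printf("average over %d runs: %.3f ms\n", RUNS, total_ms / RUNS);
    return 0;
}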

Time measurements differ on microcontroller

I am measuring the cycle count of different C functions which I try to make constant-time in order to mitigate side-channel attacks (crypto).
I am working with a microcontroller (an Aurix from Infineon) which has an onboard cycle counter that is incremented each clock tick and that I can read out.
Consider the following:
int result[32], cnt = 0;
int secret[32];
/* ... some other code ... */

reset_and_startCounter();           // resets cycles to 0 and starts the counter
int tmp = readCycles();             // read cycles before the function call
function(secret);                   // the function to measure; should be constant time
result[cnt++] = readCycles() - tmp; // read cycles after and subtract to get the cost
When I measure the cycles as shown above, I sometimes get a different number of cycles depending on the input given to the function (~1-10 cycles difference; the function itself takes about 3000 cycles).
I wondered whether it was not yet perfectly constant-time and the calculations depended on some input, so I looked into the function and did the following:
void function(int* input){
    reset_and_startCounter();
    int tmp = readCycles();
    /*********************************
     ******calculations on input******
     *********************************/
    result[cnt++] = readCycles() - tmp;
}
and I received the same number of cycles no matter what input was given.
I then also measured the time needed to call the function only, and to return from the function. Both measurements were the same no matter what the input was.
I was always using the gcc compiler flags -O3 and -fomit-frame-pointer: -O3 because the runtime is critical and I need it to be fast. Also important: no other code was running on the microcontroller (no OS etc.).
Does anyone have a possible explanation for this? I want to be sure that my code is constant-time and that those extra cycles are arbitrary...
And sorry for not providing runnable code here, but I believe not many people have an Aurix lying around :O
Thank you
The Infineon Aurix microcontroller you're using is designed for hard real-time applications. It has intentionally been designed to provide consistent runtime performance -- it lacks most of the features that can lead to inconsistent performance on more sophisticated CPUs, like cache memory or branch prediction.
While showing that your code has constant runtime on this part is a start, it is still possible for your code to have variable runtime when run on other CPUs. It is also possible that a device containing this CPU may leak information through other channels, particularly through power analysis. If making your application resistant to side-channel analysis is critical, you may want to consider using a part designed for cryptographic applications. (The Aurix is not such a part.)

How to measure cpu time and wall clock time?

I saw many topics about this, even on Stack Overflow, for example:
How can I measure CPU time and wall clock time on both Linux/Windows?
I want to measure both CPU and wall time. Although the person who answered the question in the topic I posted recommends using gettimeofday to measure wall time, I read that it's better to use clock_gettime instead. So I wrote the code below. (Is it OK? Does it really measure wall time, not CPU time? I'm asking because I found a web page, http://nadeausoftware.com/articles/2012/03/c_c_tip_how_measure_cpu_time_benchmarking#clockgettme, which says that clock_gettime measures CPU time...) What's the truth, and which one should I use to measure wall time?
Another question is about CPU time. I found answers saying that clock() is great for it, so I wrote sample code for that too, but it's not what I really want: for my code it shows 0 seconds of CPU time. Is it possible to measure CPU time more precisely (in seconds)? Thanks for any help (for now, I'm interested only in Linux solutions).
Here's my code:
#include <time.h>
#include <stdio.h>  /* printf */
#include <math.h>   /* log */
#include <stdlib.h>

int main()
{
    int i;
    double sum = 0.0;  /* must be initialised; it was read uninitialised before */

    // measure elapsed wall time
    struct timespec now, tmstart;
    clock_gettime(CLOCK_REALTIME, &tmstart);
    for (i = 0; i < 1024; i++) {
        sum += log((double)i);  /* note: log(0) is -inf */
    }
    clock_gettime(CLOCK_REALTIME, &now);
    double seconds = (double)((now.tv_sec + now.tv_nsec * 1e-9) -
                              (double)(tmstart.tv_sec + tmstart.tv_nsec * 1e-9));
    printf("wall time %fs\n", seconds);

    // measure cpu time
    double start = (double)clock() / (double)CLOCKS_PER_SEC;
    for (i = 0; i < 1024; i++) {
        sum += log((double)i);
    }
    double end = (double)clock() / (double)CLOCKS_PER_SEC;
    printf("cpu time %fs\n", end - start);

    return 0;
}
Compile it like this:
gcc test.c -o test -lrt -lm
and it shows me:
wall time 0.000424s
cpu time 0.000000s
I know I can run more iterations, but that's not the point here ;)
IMPORTANT:
printf("CLOCKS_PER_SEC is %ld\n", CLOCKS_PER_SEC);
shows
CLOCKS_PER_SEC is 1000000
According to my manual page on clock(), it says:
POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.
When increasing the number of iterations on my computer, the measured CPU time starts showing at 100,000 iterations. From the returned figures it seems the resolution is actually 10 milliseconds.
Beware that when you optimize your code, the whole loop may disappear because sum is a dead value. There is also nothing to stop the compiler from moving the clock statements across the loop, as there are no real dependencies with the code in between.
Let me elaborate a bit more on micro-measurements of code performance. The naive and tempting way to measure performance is indeed to add clock statements, as you have done. However, since time is not a concept or side effect in C, compilers can often move these clock calls at will. To remedy this, it is tempting to give the clock calls side effects, for example by having them access volatile variables. However, this still doesn't prohibit the compiler from moving side-effect-free code over the calls; think, for example, of accesses to regular local variables. Worse, by making the clock calls look very scary to the compiler, you will actually inhibit optimizations. As a result, merely measuring the performance impacts that performance in a negative and undesirable way.
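One common workaround, sketched here under the assumption of GCC or Clang, is an empty inline-asm statement that claims to read and write the accumulator, so the timed loop can be neither deleted as dead code nor hoisted away:

#include <math.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    double sum = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 1; i <= 100000; i++) {
        sum += log((double)i);
        /* GCC/Clang: pretend sum is read and modified in memory,
           so the loop body cannot be optimised away */
        __asm__ volatile("" : "+m"(sum));
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("sum = %f\n", sum); /* using the result also keeps it live */
    printf("wall time %f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);
    return 0;
}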
If you use profiling, as already mentioned by someone, you can get a pretty good assessment of the performance of even optimized code, although the overall time of course is increased.
Another good way to measure performance is simply to ask the compiler to report the number of cycles some code will take. For a lot of architectures the compiler has a very accurate estimate of this; most notably it does not for the Pentium architecture, because the hardware does a lot of scheduling that is hard to predict.
Although it is not standard practice, I think compilers should support a pragma that marks a function to be measured. The compiler could then include high-precision, non-intrusive measuring points in the prologue and epilogue of the function and prohibit any inlining of it. Depending on the architecture it could choose a high-precision clock to measure time, preferably with support from the OS to measure only the time of the current process.
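For the CPU-time half of the question, one standard POSIX option with finer resolution than clock() is clock_gettime with CLOCK_PROCESS_CPUTIME_ID; a minimal sketch:

#include <math.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    double sum = 0.0;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    for (int i = 1; i <= 1000000; i++)
        sum += log((double)i);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

    printf("sum = %f\n", sum); /* keep the loop live */
    printf("cpu time %f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);
    return 0;
}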

gettimeofday/settimeofday for Making a Function Appear to Take No Time

I've got an auxiliary function that does some operations that are pretty costly.
I'm trying to profile the main section of the algorithm, but this auxiliary function gets called a lot within it. Consequently, the measured time includes the auxiliary function's time.
To solve this, I decided to set and restore the time so that the auxiliary function appears to be instantaneous. I defined the following macros:
#define TIME_SAVE struct timeval _time_tv; gettimeofday(&_time_tv,NULL);
#define TIME_RESTORE settimeofday(&_time_tv,NULL);
. . . and used them as the first and last lines of the auxiliary function. For some reason, though, the auxiliary function's overhead is still included!
So, I know this is kind of a messy solution, and so I have since moved on, but I'm still curious as to why this idea didn't work.
Can someone please explain why?
If you insist on profiling this way, do not set the system clock. This will break all sorts of things, if you have permission to do it. Basically you should forget you ever heard of settimeofday. What you want to do is call gettimeofday both before and after the function you want to exclude from measurement, and compute the difference. You can then exclude the time spent in this function from the overall time.
With that said, this whole method of "profiling" is highly flawed, because gettimeofday probably (1) takes a significant amount of time compared to what you're trying to measure, and (2) probably involves a transition into kernelspace, which will do some serious damage to your program's cache coherency. This second problem, whereby in attempting to observe your program's performance characteristics you actually change them, is the most problematic.
What you really should do is forget about this kind of profiling (gettimeofday or even gcc's -pg/gmon profiling) and instead use oprofile or perf or something similar. These modern profiling techniques work based on statistically sampling the instruction pointer and stack information periodically; your program's own code is not modified at all, so it behaves as closely as possible to how it would behave with no profiler running.
There are a couple possibilities that may be occurring. One is that Linux tries to keep the clock accurate and adjustments to the clock may be 'smoothed' or otherwise 'fixed up' to try to keep a smooth sense of time within the system. If you are running NTP, it will also try to maintain a reasonable sense of time.
My approach would have been to not modify the clock but instead track time consumed by each portion of the process. The calls to the expensive part would be accumulated (by getting the difference between gettimeofday on entry and exit, and accumulating) and subtracting that from overall time. There are other possibilities for fancier approaches, I'm sure.
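A sketch of that accumulation approach; expensive_aux() is a hypothetical stand-in for the auxiliary function:

#include <stdio.h>
#include <sys/time.h>

static double aux_seconds; /* time accumulated inside the auxiliary function */

static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

static void expensive_aux(void) /* hypothetical costly helper */
{
    double t0 = now_seconds();
    /* ... costly operations ... */
    aux_seconds += now_seconds() - t0;
}

int main(void)
{
    double t0 = now_seconds();
    for (int i = 0; i < 1000; i++)
        expensive_aux(); /* called a lot from the main algorithm */
    double total = now_seconds() - t0;

    /* subtract the accumulated auxiliary time from the overall time */
    printf("total %.6f s, aux %.6f s, main-only %.6f s\n",
           total, aux_seconds, total - aux_seconds);
    return 0;
}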

most efficient way to get current time/date/day in C

What is the most efficient way of getting the current time/date/day/year in C? I have to execute this many times, so I need a really efficient way.
I am on FreeBSD.
Thanks in advance.
/* ctime example */
#include <stdio.h>
#include <time.h>

int main ()
{
    time_t rawtime;
    time (&rawtime);
    printf ("The current local time is: %s", ctime (&rawtime));
    return 0;
}
You can use ctime, if you need it as a string.
Standard C provides only one way to get the time - time() - which can be converted to a time/date/year with localtime() or gmtime(). So trivially, that must be the most efficient way.
Any other methods are operating-system specific, and you haven't told us what operating system you're using.
It really depends on what you mean by "many" :-)
I think you'll probably find that using the ISO standard time() and localtime() functions will be more than fast enough. For example, on my "Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz", using unoptimised code, I can call time() ten million times in 1.045 seconds, and a time()/localtime() combination half a million times in 0.98 seconds. Whether that's fast enough for your needs, only you can decide, but I'm hard-pressed to come up with a use case that needs more grunt than that.
The time() function gives you the number of seconds since the epoch, while localtime() both converts it to local time (from UTC) and splits it into a more usable form, the struct tm structure.
#include <time.h>
time_t t = time (NULL);
struct tm* lt = localtime (&t);
// Use lt->tm_year, lt->tm_mday, and so forth.
Any attempt to cache the date/time and use other ways of finding out a delta to apply to it, such as with clock(), will almost invariably:
be slower; and
suffer from the fact you won't pick up external time changes.
The simplest is
#include <time.h>
//...
time_t current_time = time (NULL);
struct tm* local_time = localtime (&current_time);
printf ("the time is %s\n", asctime (local_time));
You can use the gettimeofday() function to get the time in seconds and microseconds, which is (I think) very fast (there is a similar function, do_gettimeofday(), in the Linux kernel), and then convert it to your required format (it might be possible to use the functions mentioned above for the conversion).
I hope this helps.
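For example, a minimal sketch of gettimeofday() plus a localtime_r() conversion (the output format here is just an illustration):

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);      /* seconds + microseconds since the epoch */

    struct tm lt;
    localtime_r(&tv.tv_sec, &lt); /* convert to broken-down local time */
    printf("%04d-%02d-%02d %02d:%02d:%02d.%06ld\n",
           lt.tm_year + 1900, lt.tm_mon + 1, lt.tm_mday,
           lt.tm_hour, lt.tm_min, lt.tm_sec, (long)tv.tv_usec);
    return 0;
}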
Just about the only way (that's standard, anyway) is to call time followed by localtime or gmtime.
Well, in general, directly accessing the OS's API to get the time is probably the most efficient, but not so portable...
The C time functions are OK.
But it really depends on your platform.
Assuming a one second resolution is enough, the most efficient way on FreeBSD (or any POSIX system) is likely
- Install a one-second interval timer with setitimer (ITIMER_REAL, ...)
- When it triggers SIGALRM, update a static variable holding the current time
- Use the value in the static variable whenever you need the time (a sketch follows below)
Even if signals get lost due to system overload this will correct itself the next time the process is scheduled.
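A minimal sketch of that scheme; the handler and variable names are illustrative, and note the caveat that a time_t store is not guaranteed atomic:

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Updated once per second from the signal handler. A time_t store is
   not guaranteed atomic, so treat this as illustrative only. */
static volatile time_t now_cached;

static void on_alarm(int sig)
{
    (void)sig;
    now_cached = time(NULL); /* time() is async-signal-safe per POSIX */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it = { { 1, 0 }, { 1, 0 } }; /* fire every second */
    setitimer(ITIMER_REAL, &it, NULL);

    now_cached = time(NULL); /* prime the cache */

    for (int i = 0; i < 5; i++) {
        /* hot path: read the cached value instead of calling time() */
        printf("cached time: %ld\n", (long)now_cached);
        sleep(1); /* sleep() may return early when SIGALRM arrives */
    }
    return 0;
}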
