How to measure the elapsead time below nanosecond for x86?

How to measure the elapsead time below nanosecond for x86? - c

I have searched and used many approaches for measuring the elapsed time. there are many questions for this purpose. For example, this question is very good but when you need an accurate time recorder I couldn't find a good method. For this, I want to share my method here to be used and be corrected if something is wrong.
UPDATE&NOTE: this question is for Benchmarking, less than one nanosecond. It's completely different from using clock_gettime(CLOCK_MONOTONIC,&start); it records time more than one nanosecond.
UPDATE : A common method to measure the speedup is repeating a section of the program which should be benchmarked. But, as mentioned in comment it might show different optimization when the researcher rely on autovectorizing.
NOTE It's not accurate enough to measure the elapsed time in one repeatinng. In some cases my results show that the section must be repeated more than 1K or 1M to get the smallest time.
SUGGESTION : I'm not familiar with shell programming (just know some basic commands...) But, it might be possible to measure the smallest time with out repeating inside the program.
MY CURRENT SOLUTION In order to prevent the branches I repeat the ode section using a macro #define REP_CODE(X) X X X... X X which X is the code section I want to benchmark as follows:
//numbers
#define FMAX1 MAX1*MAX1
#define COEFF 8
int __attribute__(( aligned(32))) input[FMAX1+COEFF]; //= {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
int __attribute__(( aligned(32))) output[FMAX1];
int __attribute__(( aligned(32))) coeff[COEFF] = {1,2,3,4,5,6,7,8};//= {1,1,1,1,1,1,1,1};//; //= {1,2,1,2,1,2,1,2,2,1};
int main()
{
REP_CODE(
t1_rdtsc=_rdtsc();
//Code
for(i = 0; i < FMAX1; i++){
for(j = 0; j < COEFF; j++){//IACA_START
output[i] += coeff[j] * input[i+j];
}//IACA_END
}
t2_rdtsc=_rdtsc();
ttotal_rdtsc[ii++]=t2_rdtsc-t1_rdtsc;
)
// The smallest element in `ttotal_rdtsc` is the answer
}
This does not impact the optimization but also is restricted by code size and compiling time is too much in some cases.
Any suggestion and correction?
Thanks in advance.

If you have problem with autovectorizer and want to limit it just add a asm("#somthing"); after your begin_rdtsc it will separate the do-while loop. I just checked and it vectorized your posted code which auto vectorizer was unable to vectorize it.
I changed your macro you can use it....
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc[do_while], ttbest_rdtsc = 99999999999999999, elapsed, elapsed_rdtsc=do_while, overal_time = OVERAL_TIME, ttime=0;
int ii=0;
#define begin_rdtsc\
do{\
asm("#mmmmmmmmmmm");\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
asm("#mmmmmmmmmmm");\
ttotal_rdtsc[ii]=t2_rdtsc-t1_rdtsc;\
}while (ii++<do_while);\
for(ii=0; ii<do_while; ii++){\
if (ttotal_rdtsc[ii]<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc[ii];}}\
printf("\nthe best is %lld in %lld iteration\n", ttbest_rdtsc, elapsed_rdtsc);

I have developed my first answer and got this solution. But, I still want a solution. Because it is very important to measure the time accurately and with the least impacts. I put this part in a header file and include it in main program files.
//Header file header.h
#define count 1000 // number of repetition
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc[count], ttbest_rdtsc = 99999999999999999, elapsed, elapsed_rdtsc=count, overal_time = OVERAL_TIME, ttime=0;
int ii=0;
#define begin_rdtsc\
do{\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
ttotal_rdtsc[ii]=t2_rdtsc-t1_rdtsc;\
}while (ii++<count);\
for(ii=0; ii<do_while; ii++){\
if (ttotal_rdtsc[ii]<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc[ii];}}\
printf("\nthe best is %lld in %lldth iteration \n", ttbest_rdtsc, elapsed_rdtsc);
//Main program
#include "header.h"
.
.
.
int main()
{
//before the section
begin_rdtsc
//put your code here to measure the clocks.
end_rdtsc
return 0
}

I recommend using this method for x86 micro-architecture.
NOTE:
NUM_LOOP should be a number which helps to increase the accuracy
with repeating your code to record the best time
ttbest_rdtsc must
be bigger than the worst time I recommend to maximize it.
I used (you might not want it) OVERAL_TIME as another checking rule because I used this for many kernels and in some cases NUM_LOOP was very big and I didn't want to change it. I planned OVERAL_TIME to limit the iterations and stop after specific time.
UPDATE: The whole program is this:
#include <stdio.h>
#include <x86intrin.h>
#define NUM_LOOP 100 //executes your code NUM_LOOP times to get the smalest time to avoid overheads such as cache misses, etc.
int main()
{
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc, ttbest_rdtsc = 99999999999999999;
int do_while = 0;
do{
t1_rdtsc = _rdtsc();
//put your code here
t2_rdtsc = _rdtsc();
ttotal_rdtsc = t2_rdtsc - t1_rdtsc;
//store the smalest time:
if (ttotal_rdtsc<ttbest_rdtsc)
ttbest_rdtsc = ttotal_rdtsc;
}while (do_while++ < NUM_LOOP);
printf("\nthe best is %lld in %d repetitions\n", ttbest_rdtsc, NUM_LOOP );
return 0;
}
that I have changed to this and added to a header for my self then I can use it simply in my program.
#include <x86intrin.h>
#define do_while NUM_LOOP
#define OVERAL_TIME 999999999
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc, ttbest_rdtsc = 99999999999999999, elapsed, elapsed_rdtsc=do_while, overal_time = OVERAL_TIME, ttime=0;
#define begin_rdtsc\
do{\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
ttotal_rdtsc=t2_rdtsc-t1_rdtsc;\
if (ttotal_rdtsc<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc;\
elapsed=(do_while-elapsed_rdtsc);}\
ttime+=ttotal_rdtsc;\
}while (elapsed_rdtsc-- && (ttime<overal_time));\
printf("\nthe best is %lld in %lldth iteration and %lld repetitions\n", ttbest_rdtsc, elapsed, (do_while-elapsed_rdtsc));
How to use this method? Well, it is very simple!
int main()
{
//before the section
begin_rdtsc
//put your code here to measure the clocks.
end_rdtsc
return 0
}
Be creative, You can change it to measure the speedup in your program, etc.
An example of the output is:
the best is 9600 in 384751th iteration and 569179 repetitions
my tested code got 9600 clock that the best was recorded in 384751enditeration and my code was tested 569179 times
I have tested them on GCC and Clang.

Related

Using Time stamp counter to get the time stamp

I have used the below code to get the clock cycle of the processor
unsigned long long rdtsc(void)
{
unsigned hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
I get some value say 43, but what is the unit here? Is it in microseconds or nanoseconds.
I used below code to get the frequency of my board.
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
1700000
I also used below code to find my processor speed
dmidecode -t processor | grep "Speed"
Max Speed: 3700 MHz
Current Speed: 3700 MHz
Now how do I use above frequency and convert it to microseconds or milliseconds?

A simple answer to the stated question, "how do I convert the TSC frequency to microseconds or milliseconds?" is: You do not. What the TSC (Time Stamp Counter) clock frequency actually is, varies depending on the hardware, and may vary during runtime on some. To measure real time, you use clock_gettime(CLOCK_REALTIME) or clock_gettime(CLOCK_MONOTONIC) in Linux.
As Peter Cordes mentioned in a comment (Aug 2018), on most current x86-64 architectures the Time Stamp Counter (accessed by the RDTSC instruction and __rdtsc() function declared in <x86intrin.h>) counts reference clock cycles, not CPU clock cycles. His answer to a similar question in C++ is valid for C also in Linux on x86-64, because the compiler provides the underlying built-in when compiling C or C++, and rest of the answer deals with the hardware details. I recommend reading that one, too.
The rest of this answer assumes the underlying issue is microbenchmarking code, to find out how two implementations of some function compare to each other.
On x86 (Intel 32-bit) and x86-64 (AMD64, Intel and AMD 64-bit) architectures, you can use __rdtsc() from <x86intrin.h> to find out the number of TSC clock cycles elapsed. This can be used to measure and compare the number of cycles used by different implementations of some function, typically a large number of times.
Do note that there are hardware differences as to how the TSC clock is related to CPU clock. The abovementioned more recent answer goes into some detail on that. For practical purposes in Linux, it is sufficient in Linux to use cpufreq-set to disable frequency scaling (to ensure the relationship between the CPU and TSC frequencies does not change during microbenchmarking), and optionally taskset to restrict the microbenchmark to specific CPU core(s). That ensures that the results gathered in that microbenchmark yield results that can be compared to each other.
(As Peter Cordes commented, we also want to add _mm_lfence() from <emmintrin.h> (included by <immintrin.h>). This ensures that the CPU does not internally reorder the RDTSC operation compared to the function to be benchmarked. You can use -DNO_LFENCE at compile time to omit those, if you want.)
Let's say you have functions void foo(void); and void bar(void); that you wish to compare:
#include <stdlib.h>
#include <x86intrin.h>
#include <stdio.h>
#ifdef NO_LFENCE
#define lfence()
#else
#include <emmintrin.h>
#define lfence() _mm_lfence()
#endif
static int cmp_ull(const void *aptr, const void *bptr)
{
const unsigned long long a = *(const unsigned long long *)aptr;
const unsigned long long b = *(const unsigned long long *)bptr;
return (a < b) ? -1 :
(a > b) ? +1 : 0;
}
unsigned long long *measure_cycles(size_t count, void (*func)())
{
unsigned long long *elapsed, started, finished;
size_t i;
elapsed = malloc((count + 2) * sizeof elapsed[0]);
if (!elapsed)
return NULL;
/* Call func() count times, measuring the TSC cycles for each call. */
for (i = 0; i < count; i++) {
/* First, let's ensure our CPU executes everything thus far. */
lfence();
/* Start timing. */
started = __rdtsc();
/* Ensure timing starts before we call the function. */
lfence();
/* Call the function. */
func();
/* Ensure everything has been executed thus far. */
lfence();
/* Stop timing. */
finished = __rdtsc();
/* Ensure we have the counter value before proceeding. */
lfence();
elapsed[i] = finished - started;
}
/* The very first call is likely the cold-cache case,
so in case that measurement might contain useful
information, we put it at the end of the array.
We also terminate the array with a zero. */
elapsed[count] = elapsed[0];
elapsed[count + 1] = 0;
/* Sort the cycle counts. */
qsort(elapsed, count, sizeof elapsed[0], cmp_ull);
/* This function returns all cycle counts, in sorted order,
although the median, elapsed[count/2], is the one
I personally use. */
return elapsed;
}
void benchmark(const size_t count)
{
unsigned long long *foo_cycles, *bar_cycles;
if (count < 1)
return;
printf("Measuring run time in Time Stamp Counter cycles:\n");
fflush(stdout);
foo_cycles = measure_cycles(count, foo);
bar_cycles = measure_cycles(count, bar);
printf("foo(): %llu cycles (median of %zu calls)\n", foo_cycles[count/2], count);
printf("bar(): %llu cycles (median of %zu calls)\n", bar_cycles[count/2], count);
free(bar_cycles);
free(foo_cycles);
}
Note that the above results are very specific to the compiler and compiler options used, and of course on the hardware it is run on. The median number of cycles can be interpreted as "the typical number of TSC cycles taken", because the measurement is not completely reliable (may be affected by events outside the process; for example, by context switches, or by migration to another core on some CPUs). For the same reason, I don't trust the minimum, maximum, or average values.
However, the two implementations' (foo() and bar()) cycle counts above can be compared to find out how their performance compares to each other, in a microbenchmark. Just remember that microbenchmark results may not extend to real work tasks, because of how complex tasks' resource use interactions are. One function might be superior in all microbenchmarks, but poorer than others in real world, because it is only efficient when it has lots of CPU cache to use, for example.
In Linux in general, you can use the CLOCK_REALTIME clock to measure real time (wall clock time) used, in the very same manner as above. CLOCK_MONOTONIC is even better, because it is not affected by direct changes to the realtime clock the administrator might make (say, if they noticed the system clock is ahead or behind); only drift adjustments due to NTP etc. are applied. Daylight savings time or changes thereof does not affect the measurements, using either clock. Again, the median of a number of measurements is the result I seek, because events outside the measured code itself can affect the result.
For example:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#ifdef NO_LFENCE
#define lfence()
#else
#include <emmintrin.h>
#define lfence() _mm_lfence()
#endif
static int cmp_double(const void *aptr, const void *bptr)
{
const double a = *(const double *)aptr;
const double b = *(const double *)bptr;
return (a < b) ? -1 :
(a > b) ? +1 : 0;
}
double median_seconds(const size_t count, void (*func)())
{
struct timespec started, stopped;
double *seconds, median;
size_t i;
seconds = malloc(count * sizeof seconds[0]);
if (!seconds)
return -1.0;
for (i = 0; i < count; i++) {
lfence();
clock_gettime(CLOCK_MONOTONIC, &started);
lfence();
func();
lfence();
clock_gettime(CLOCK_MONOTONIC, &stopped);
lfence();
seconds[i] = (double)(stopped.tv_sec - started.tv_sec)
+ (double)(stopped.tv_nsec - started.tv_nsec) / 1000000000.0;
}
qsort(seconds, count, sizeof seconds[0], cmp_double);
median = seconds[count / 2];
free(seconds);
return median;
}
static double realtime_precision(void)
{
struct timespec t;
if (clock_getres(CLOCK_REALTIME, &t) == 0)
return (double)t.tv_sec
+ (double)t.tv_nsec / 1000000000.0;
return 0.0;
}
void benchmark(const size_t count)
{
double median_foo, median_bar;
if (count < 1)
return;
printf("Median wall clock times over %zu calls:\n", count);
fflush(stdout);
median_foo = median_seconds(count, foo);
median_bar = median_seconds(count, bar);
printf("foo(): %.3f ns\n", median_foo * 1000000000.0);
printf("bar(): %.3f ns\n", median_bar * 1000000000.0);
printf("(Measurement unit is approximately %.3f ns)\n", 1000000000.0 * realtime_precision());
fflush(stdout);
}
In general, I personally prefer to compile the benchmarked function in a separate unit (to a separate object file), and also benchmark a do-nothing function to estimate the function call overhead (although it tends to give an overestimate for the overhead; i.e. yield too large an overhead estimate, because some of the function call overhead is latencies and not actual time taken, and some operations are possible during those latencies in the actual functions).
It is important to remember that the above measurements should only be used as indications, because in a real world application, things like cache locality (especially on current machines, with multi-level caching, and lots of memory) hugely affect the time used by different implementations.
For example, you might compare the speeds of a quicksort and a radix sort. Depending on the size of the keys, the radix sort requires rather large extra arrays (and uses a lot of cache). If the real application the sort routine is used in does not simultaneously use a lot of other memory (and thus the sorted data is basically what is cached), then a radix sort will be faster if there is enough data (and the implementation is sane). However, if the application is multithreaded, and the other threads shuffle (copy or transfer) a lot of memory around, then the radix sort using a lot of cache will evict other data also cached; even though the radix sort function itself does not show any serious slowdown, it may slow down the other threads and therefore the overall program, because the other threads have to wait for their data to be re-cached.
This means that the only "benchmarks" you should trust, are wall clock measurements used on the actual hardware, running actual work tasks with actual work data. Everything else is subject to many conditions, and are more or less suspect: indications, yes, but not very reliable.

2D array, prototype function and random numbers [duplicate]

I need a 'good' way to initialize the pseudo-random number generator in C++. I've found an article that states:
In order to generate random-like
numbers, srand is usually initialized
to some distinctive value, like those
related with the execution time. For
example, the value returned by the
function time (declared in header
ctime) is different each second, which
is distinctive enough for most
randoming needs.
Unixtime isn't distinctive enough for my application. What's a better way to initialize this? Bonus points if it's portable, but the code will primarily be running on Linux hosts.
I was thinking of doing some pid/unixtime math to get an int, or possibly reading data from /dev/urandom.
Thanks!
EDIT
Yes, I am actually starting my application multiple times a second and I've run into collisions.

This is what I've used for small command line programs that can be run frequently (multiple times a second):
unsigned long seed = mix(clock(), time(NULL), getpid());
Where mix is:
// Robert Jenkins' 96 bit Mix Function
unsigned long mix(unsigned long a, unsigned long b, unsigned long c)
{
a=a-b; a=a-c; a=a^(c >> 13);
b=b-c; b=b-a; b=b^(a << 8);
c=c-a; c=c-b; c=c^(b >> 13);
a=a-b; a=a-c; a=a^(c >> 12);
b=b-c; b=b-a; b=b^(a << 16);
c=c-a; c=c-b; c=c^(b >> 5);
a=a-b; a=a-c; a=a^(c >> 3);
b=b-c; b=b-a; b=b^(a << 10);
c=c-a; c=c-b; c=c^(b >> 15);
return c;
}

The best answer is to use <random>. If you are using a pre C++11 version, you can look at the Boost random number stuff.
But if we are talking about rand() and srand()
The best simplest way is just to use time():
int main()
{
srand(time(nullptr));
...
}
Be sure to do this at the beginning of your program, and not every time you call rand()!
Side Note:
NOTE: There is a discussion in the comments below about this being insecure (which is true, but ultimately not relevant (read on)). So an alternative is to seed from the random device /dev/random (or some other secure real(er) random number generator). BUT: Don't let this lull you into a false sense of security. This is rand() we are using. Even if you seed it with a brilliantly generated seed it is still predictable (if you have any value you can predict the full sequence of next values). This is only useful for generating "pseudo" random values.
If you want "secure" you should probably be using <random> (Though I would do some more reading on a security informed site). See the answer below as a starting point: https://stackoverflow.com/a/29190957/14065 for a better answer.
Secondary note: Using the random device actually solves the issues with starting multiple copies per second better than my original suggestion below (just not the security issue).
Back to the original story:
Every time you start up, time() will return a unique value (unless you start the application multiple times a second). In 32 bit systems, it will only repeat every 60 years or so.
I know you don't think time is unique enough but I find that hard to believe. But I have been known to be wrong.
If you are starting a lot of copies of your application simultaneously you could use a timer with a finer resolution. But then you run the risk of a shorter time period before the value repeats.
OK, so if you really think you are starting multiple applications a second.
Then use a finer grain on the timer.
int main()
{
struct timeval time;
gettimeofday(&time,NULL);
// microsecond has 1 000 000
// Assuming you did not need quite that accuracy
// Also do not assume the system clock has that accuracy.
srand((time.tv_sec * 1000) + (time.tv_usec / 1000));
// The trouble here is that the seed will repeat every
// 24 days or so.
// If you use 100 (rather than 1000) the seed repeats every 248 days.
// Do not make the MISTAKE of using just the tv_usec
// This will mean your seed repeats every second.
}

if you need a better random number generator, don't use the libc rand. Instead just use something like /dev/random or /dev/urandom directly (read in an int directly from it or something like that).
The only real benefit of the libc rand is that given a seed, it is predictable which helps with debugging.

On windows:
srand(GetTickCount());
provides a better seed than time() since its in milliseconds.

C++11 random_device
If you need reasonable quality then you should not be using rand() in the first place; you should use the <random> library. It provides lots of great functionality like a variety of engines for different quality/size/performance trade-offs, re-entrancy, and pre-defined distributions so you don't end up getting them wrong. It may even provide easy access to non-deterministic random data, (e.g., /dev/random), depending on your implementation.
#include <random>
#include <iostream>
int main() {
std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 eng(seed);
std::uniform_int_distribution<> dist{1,100};
for (int i=0; i<50; ++i)
std::cout << dist(eng) << '\n';
}
eng is a source of randomness, here a built-in implementation of mersenne twister. We seed it using random_device, which in any decent implementation will be a non-determanistic RNG, and seed_seq to combine more than 32-bits of random data. For example in libc++ random_device accesses /dev/urandom by default (though you can give it another file to access instead).
Next we create a distribution such that, given a source of randomness, repeated calls to the distribution will produce a uniform distribution of ints from 1 to 100. Then we proceed to using the distribution repeatedly and printing the results.

Best way is to use another pseudorandom number generator.
Mersenne twister (and Wichmann-Hill) is my recommendation.
http://en.wikipedia.org/wiki/Mersenne_twister

i suggest you see unix_random.c file in mozilla code. ( guess it is mozilla/security/freebl/ ...) it should be in freebl library.
there it uses system call info ( like pwd, netstat ....) to generate noise for the random number;it is written to support most of the platforms (which can gain me bonus point :D ).

The real question you must ask yourself is what randomness quality you need.
libc random is a LCG
The quality of randomness will be low whatever input you provide srand with.
If you simply need to make sure that different instances will have different initializations, you can mix process id (getpid), thread id and a timer. Mix the results with xor. Entropy should be sufficient for most applications.
Example :
struct timeb tp;
ftime(&tp);
srand(static_cast<unsigned int>(getpid()) ^
static_cast<unsigned int>(pthread_self()) ^
static_cast<unsigned int >(tp.millitm));
For better random quality, use /dev/urandom. You can make the above code portable in using boost::thread and boost::date_time.

The c++11 version of the top voted post by Jonathan Wright:
#include <ctime>
#include <random>
#include <thread>
...
const auto time_seed = static_cast<size_t>(std::time(0));
const auto clock_seed = static_cast<size_t>(std::clock());
const size_t pid_seed =
std::hash<std::thread::id>()(std::this_thread::get_id());
std::seed_seq seed_value { time_seed, clock_seed, pid_seed };
...
// E.g seeding an engine with the above seed.
std::mt19937 gen;
gen.seed(seed_value);

#include <stdio.h>
#include <sys/time.h>
main()
{
struct timeval tv;
gettimeofday(&tv,NULL);
printf("%d\n", tv.tv_usec);
return 0;
}
tv.tv_usec is in microseconds. This should be acceptable seed.

As long as your program is only running on Linux (and your program is an ELF executable), you are guaranteed that the kernel provides your process with a unique random seed in the ELF aux vector. The kernel gives you 16 random bytes, different for each process, which you can get with getauxval(AT_RANDOM). To use these for srand, use just an int of them, as such:
#include <sys/auxv.h>
void initrand(void)
{
unsigned int *seed;
seed = (unsigned int *)getauxval(AT_RANDOM);
srand(*seed);
}
It may be possible that this also translates to other ELF-based systems. I'm not sure what aux values are implemented on systems other than Linux.

Suppose you have a function with a signature like:
int foo(char *p);
An excellent source of entropy for a random seed is a hash of the following:
Full result of clock_gettime (seconds and nanoseconds) without throwing away the low bits - they're the most valuable.
The value of p, cast to uintptr_t.
The address of p, cast to uintptr_t.
At least the third, and possibly also the second, derive entropy from the system's ASLR, if available (the initial stack address, and thus current stack address, is somewhat random).
I would also avoid using rand/srand entirely, both for the sake of not touching global state, and so you can have more control over the PRNG that's used. But the above procedure is a good (and fairly portable) way to get some decent entropy without a lot of work, regardless of what PRNG you use.

For those using Visual Studio here's yet another way:
#include "stdafx.h"
#include <time.h>
#include <windows.h>
const __int64 DELTA_EPOCH_IN_MICROSECS= 11644473600000000;
struct timezone2
{
__int32 tz_minuteswest; /* minutes W of Greenwich */
bool tz_dsttime; /* type of dst correction */
};
struct timeval2 {
__int32 tv_sec; /* seconds */
__int32 tv_usec; /* microseconds */
};
int gettimeofday(struct timeval2 *tv/*in*/, struct timezone2 *tz/*in*/)
{
FILETIME ft;
__int64 tmpres = 0;
TIME_ZONE_INFORMATION tz_winapi;
int rez = 0;
ZeroMemory(&ft, sizeof(ft));
ZeroMemory(&tz_winapi, sizeof(tz_winapi));
GetSystemTimeAsFileTime(&ft);
tmpres = ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (__int32)(tmpres * 0.000001);
tv->tv_usec = (tmpres % 1000000);
//_tzset(),don't work properly, so we use GetTimeZoneInformation
rez = GetTimeZoneInformation(&tz_winapi);
tz->tz_dsttime = (rez == 2) ? true : false;
tz->tz_minuteswest = tz_winapi.Bias + ((rez == 2) ? tz_winapi.DaylightBias : 0);
return 0;
}
int main(int argc, char** argv) {
struct timeval2 tv;
struct timezone2 tz;
ZeroMemory(&tv, sizeof(tv));
ZeroMemory(&tz, sizeof(tz));
gettimeofday(&tv, &tz);
unsigned long seed = tv.tv_sec ^ (tv.tv_usec << 12);
srand(seed);
}
Maybe a bit overkill but works well for quick intervals. gettimeofday function found here.
Edit: upon further investigation rand_s might be a good alternative for Visual Studio, it's not just a safe rand(), it's totally different and doesn't use the seed from srand. I had presumed it was almost identical to rand just "safer".
To use rand_s just don't forget to #define _CRT_RAND_S before stdlib.h is included.

Assuming that the randomness of srand() + rand() is enough for your purposes, the trick is in selecting the best seed for srand. time(NULL) is a good starting point, but you'll run into problems if you start more than one instance of the program within the same second. Adding the pid (process id) is an improvement as different instances will get different pids. I would multiply the pid by a factor to spread them more.
But let's say you are using this for some embedded device and you have several in the same network. If they are all powered at once and you are launching the several instances of your program automatically at boot time, they may still get the same time and pid and all the devices will generate the same sequence of "random" numbers. In that case, you may want to add some unique identifier of each device (like the CPU serial number).
The proposed initialization would then be:
srand(time(NULL) + 1000 * getpid() + (uint) getCpuSerialNumber());
In a Linux machine (at least in the Raspberry Pi where I tested this), you can implement the following function to get the CPU Serial Number:
// Gets the CPU Serial Number as a 64 bit unsigned int. Returns 0 if not found.
uint64_t getCpuSerialNumber() {
FILE *f = fopen("/proc/cpuinfo", "r");
if (!f) {
return 0;
}
char line[256];
uint64_t serial = 0;
while (fgets(line, 256, f)) {
if (strncmp(line, "Serial", 6) == 0) {
serial = strtoull(strchr(line, ':') + 2, NULL, 16);
}
}
fclose(f);
return serial;
}

Include the header at the top of your program, and write:
srand(time(NULL));
In your program before you declare your random number. Here is an example of a program that prints a random number between one and ten:
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
//Initialize srand
srand(time(NULL));
//Create random number
int n = rand() % 10 + 1;
//Print the number
cout << n << endl; //End the line
//The main function is an int, so it must return a value
return 0;
}

Run code for exactly one second

I would like to know how I can program something so that my program runs as long as a second lasts.
I would like to evaluate parts of my code and see where the time is spend most so I am analyzing parts of it.
Here's the interesting part of my code :
int size = 256
clock_t start_benching = clock();
for (uint32_t i = 0;i < size; i+=4)
{
myarray[i];
myarray[i+1];
myarray[i+2];
myarray[i+3];
}
clock_t stop_benching = clock();
This just gives me how long the function needed to perform all the operations.
I want to run the code for one second and see how many operations have been done.
This is the line to print the time measurement:
printf("Walking through buffer took %f seconds\n", (double)(stop_benching - start_benching) / CLOCKS_PER_SEC);

A better approach to benchmarking is to know the % of time spent on each section of the code.
Instead of making your code run for exactly 1 second, make stop_benchmarking - start_benchmarking the total run time - Take the time spent on any part of the code and divide by the total runtime to get a value between 0 and 1. Multiply this value by 100 and you have the % of time consumed at that specific section.

Non-answer advice: Use an actual profiler to profile the performance of code sections.
On *nix you can set an alarm(2) with a signal handler that sets a global flag to indicate the elapsed time. The Windows API provides something similar with SetTimer.
#include <unistd.h>
#include <signal.h>
int time_elapsed = 0;
void alarm_handler(int signal) {
time_elapsed = 1;
}
int main() {
signal(SIGALRM, &alarm_handler);
alarm(1); // set alarm time-out to 1 second
do {
// stuff...
} while (!time_elapsed);
return 0;
}
In more complicated cases you can use setitimer(2) instead of alarm(2), which lets you
use microsecond precision and
choose between counting
wall clock time,
user CPU time, or
user and system CPU time.

Measuring time in millisecond precision

My program is going to race different sorting algorithms against each other, both in time and space. I've got space covered, but measuring time is giving me some trouble. Here is the code that runs the sorts:
void test(short* n, short len) {
short i, j, a[1024];
for(i=0; i<2; i++) { // Loop over each sort algo
memused = 0; // Initialize memory marker
for(j=0; j<len; j++) // Copy scrambled list into fresh array
a[j] = n[j]; // (Sorting algos are in-place)
// ***Point A***
switch(i) { // Pick sorting algo
case 0:
selectionSort(a, len);
case 1:
quicksort(a, len);
}
// ***Point B***
spc[i][len] = memused; // Record how much mem was used
}
}
(I removed some of the sorting algos for simplicity)
Now, I need to measure how much time the sorting algo takes. The most obvious way to do this is to record the time at point (a) and then subtract that from the time at point (b). But none of the C time functions are good enough:
time() gives me time in seconds, but the algos are faster than that, so I need something more accurate.
clock() gives me CPU ticks since the program started, but seems to round to the nearest 10,000; still not small enough
The time shell command works well enough, except that I need to run over 1,000 tests per algorithm, and I need the individual time for each one.
I have no idea what getrusage() returns, but it's also too long.
What I need is time in units (significantly, if possible) smaller than the run time of the sorting functions: about 2ms. So my question is: Where can I get that?

gettimeofday() has microseconds resolution and is easy to use.
A pair of useful timer functions is:
static struct timeval tm1;
static inline void start()
{
gettimeofday(&tm1, NULL);
}
static inline void stop()
{
struct timeval tm2;
gettimeofday(&tm2, NULL);
unsigned long long t = 1000 * (tm2.tv_sec - tm1.tv_sec) + (tm2.tv_usec - tm1.tv_usec) / 1000;
printf("%llu ms\n", t);
}

For measuring time, use clock_gettime with CLOCK_MONOTONIC (or CLOCK_MONOTONIC_RAW if it is available). Where possible, avoid using gettimeofday. It is specifically deprecated in favor of clock_gettime, and the time returned from it is subject to adjustments from time servers, which can throw off your measurements.

You can get the total user + kernel time (or choose just one) using getrusage as follows:
#include <sys/time.h>
#include <sys/resource.h>
double get_process_time() {
struct rusage usage;
if( 0 == getrusage(RUSAGE_SELF, &usage) ) {
return (double)(usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
(double)(usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / 1.0e6;
}
return 0;
}
I elected to create a double containing fractional seconds...
double t_begin, t_end;
t_begin = get_process_time();
// Do some operation...
t_end = get_process_time();
printf( "Elapsed time: %.6f seconds\n", t_end - t_begin );

The Time Stamp Counter could be helpful here:
static unsigned long long rdtsctime() {
unsigned int eax, edx;
unsigned long long val;
__asm__ __volatile__("rdtsc":"=a"(eax), "=d"(edx));
val = edx;
val = val << 32;
val += eax;
return val;
}
Though there are some caveats to this. The timestamps for different processor cores may be different, and changing clock speeds (due to power saving features and the like) can cause erroneous results.

Get average run-time of a C program

I'm trying to measure differences in speed of reading and writing misaligned vs aligned bits into binary files. I would like to know is there an utility I can use (Except for running time over & over again and writing my own) to sample an average run-time of a program (I'm running Linux based OS)?
Thanks

running time over & over again and writing my own
That's fine. You can perform the read/write ten thousand times both ways and compute the average time.
If you really want to use a library you can try Google Perftools.

Put this in a header file:
#ifndef TIMER_H
#define TIMER_H
#include <stdlib>
#include <sys/time.h>
typedef unsigned long long timestamp_t;
static timestamp_t
get_timestamp ()
{
struct timeval now;
gettimeofday (&now, NULL);
return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}
#endif
Include the header file into whichever .c file you'll be using, and do something like this:
#define N 10000
int main()
{
int i;
double avg;
timestamp_t start, end;
start = get_timestamp();
for(i = 0; i < N; i++)
foo();
end = get_timestamp();
avg = (end - start) / (double)N;
printf("%f", avg);
return 0;
}
Basically this calls whichever function you're trying to measure performance of N times, where N is a defined constant (doesn't have to be) in this case. It takes a timestamp before the for loop and after the for loop and then calculates the average time it's taken for the function to execute. The get_timestamp() function returns the number of microseconds, so if you need milliseconds, divide by 1000, seconds - divide by 1000000 etc.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to measure the elapsead time below nanosecond for x86? - c

Related

Using Time stamp counter to get the time stamp

2D array, prototype function and random numbers [duplicate]

Run code for exactly one second

Measuring time in millisecond precision

Get average run-time of a C program

Categories

Resources