I am currently working on the integration of a "shunt" type sensor on an electronic board. My choice was on a Linear (LTC2947), unfortunately it only has an Arduino driver. I have to translate everything in C under Linux to be compatible with my microprocessor (APQ8009 ARM Cortex-A7). I have a small question about one of the functions:
int16_t LTC2947_wake_up() //Wake up LTC2947 from shutdown mode and measure the wakeup time
{
byte data[1];
unsigned long wakeupStart = millis(), wakeupTime;
LTC2947_WR_BYTE(LTC2947_REG_OPCTL, 0);
do
{
delay(1);
LTC2947_RD_BYTE(LTC2947_REG_OPCTL, data);
wakeupTime = millis() - wakeupStart;
if (data[0] == 0) //! check if we are in idle mode
{
return wakeupTime;
}
if (wakeupTime > 200)
{
//! failed to wake up due to timeout, return -1
return -1;
}
}
while (true);
}
After finding usleep() as equivalent for delay(), I can not find it for millis() in C. Can you help me translate this function please?
Arduino millis() is based on a timer that trips an overflow interrupt at very close to 1 KHz, or 1 millisecond. To achieve the same thing, I suggest you setup a timer on the ARM platform and update a volatile unsigned long variable with a counter. That will be the equivalent of millis().
Here is what millis() is doing behind the scenes:
SIGNAL(TIMER0_OVF_vect)
{
// copy these to local variables so they can be stored in registers
// (volatile variables must be read from memory on every access)
unsigned long m = timer0_millis;
unsigned char f = timer0_fract;
m += MILLIS_INC;
f += FRACT_INC;
if (f >= FRACT_MAX) {
f -= FRACT_MAX;
m += 1;
}
timer0_fract = f;
timer0_millis = m;
timer0_overflow_count++;
}
unsigned long millis()
{
unsigned long m;
uint8_t oldSREG = SREG;
// disable interrupts while we read timer0_millis or we might get an
// inconsistent value (e.g. in the middle of a write to timer0_millis)
cli();
m = timer0_millis;
SREG = oldSREG;
return m;
}
Coming from the embedded world, arguably the first thing you should do when starting a project on a new platform is establish clocks and get a timer interrupt going at a prescribed rate. That is the "Hello World" of embedded systems. ;) If you choose to do this at 1 KHz, you're most of the way there.
#include <time.h>
unsigned int millis () {
struct timespec t ;
clock_gettime ( CLOCK_MONOTONIC_RAW , & t ) ; // change CLOCK_MONOTONIC_RAW to CLOCK_MONOTONIC on non linux computers
return t.tv_sec * 1000 + ( t.tv_nsec + 500000 ) / 1000000 ;
}
or
#include <sys/time.h>
unsigned int millis () {
struct timeval t ;
gettimeofday ( & t , NULL ) ;
return t.tv_sec * 1000 + ( t.tv_usec + 500 ) / 1000 ;
}
The gettimeofday() version probably does not work on non linux computers.
The clock_gettime() version probably does not work with old C compilers.
The arduino millis() returns unsigned long, 32 bit unsigned integer. Most
computers are 32 bit or 64 bit, so there is no need to use long except on
16 bit computers like arduino, so these versions return unsigned int. If
you want to measure a time period longer than 50 days in milliseconds, or if
you want the number of milliseconds since the beginning of unix in 1970, you
need a long long (64 bit) integer.
If a computer clock has the incorrect time, the operating system or system
administrator or program which synchonizes the computer clock with internet
clocks may change the computer clock to the correct time. This will affect
these functions, especially the gettimeofday() version. Usually there is a
big change in the computer clock when the computer boots, connects to the
network, and synchonizes the computer clock with the network time server.
But most programs are not running this early in the boot process, and thus
are not affected. Usually other changes to the computer clock are very
small, and the effect on other programs is very small. So usually changes
to the computer clock are not a problem.
The clock_gettime() requires a clock id.
CLOCK_MONOTONIC is not affected by discontinuous jumps in the system time,
but is affected by incremental adjustments, and does not count time computer
is suspended.
CLOCK_MONOTONIC_RAW is linux only, not affected by discontinuous jumps in
the system time, not affected by incremental adjustments, does not count
time computer is suspended.
CLOCK_BOOTTIME is linux only, not affected by discontinuous jumps in the
system time, but is affected by incremental adjustments, does count time
computer is suspended. It counts the time since the computer booted.
CLOCK_REALTIME is affected by discontinuous jumps in the system time, and by
incremental adjustments. It does count the time the computer is suspended.
It counts standard unix time (time since the beginning of unix in 1970).
I think CLOCK_MONOTONIC_RAW is the best choice for linux, and
CLOCK_MONOTONIC is the best choice for non linux. Usually millisecond time
is used to measure short periods of time, like how long it takes for part of
a computer program to run. In a short period of time, there will probably
be no changes to the computer clock, and the computer will probably not be
suspended, so any clock id will work, so the choice of clock id is not
important.
Precise time measurements are unreliable on multitasking computers because
the time measurement might be interrupted. Errors are usually small.
Sometimes this is a problem, and sometimes it isn't. If you need more
precise time measurements, you need dedicated hardware which cannot be
interrupted. Some computers have such hardware built in. For example, if a
program uses software pwm, changes to the output will be delayed if the
computer is interrupted at the time the computer needs to change the output.
But if the program uses hardware pwm, the hardware pwm controller cannot be
interrupted, and will change the output at the correct time.
Tested on a raspberry pi.
I hope this be useful. Works for me under Lubuntu 20.04 LTS.
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
struct timeval __millis_start;
void init_millis() {
gettimeofday(&__millis_start, NULL);
};
unsigned long int millis() {
long mtime, seconds, useconds;
struct timeval end;
gettimeofday(&end, NULL);
seconds = end.tv_sec - __millis_start.tv_sec;
useconds = end.tv_usec - __millis_start.tv_usec;
mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
return mtime;
};
int main()
{
init_millis();
printf("Elapsed time: %ld milliseconds\n", millis());
return 0;
}
Note:
Based on the discussion in comments (with dear #MarcCompere), I must mention that the conversion of seconds and useconds to mtime in the millis function is rounded by adding 0.5 (read comments to understand how!); but the 0.5 can be removed. It depends on your application. If you are using millis for accurate time measurement then add it to lower the "Mean Squared Error (MSE)" of conversion statistically. But if you need timing for general logic-based decisions (or closer behavior to that of Arduino), then the floor (natural behaviour when casting in this case) can be considered as the better option, so do not add the 0.5.
Related
I'm trying to establish what practical jitter I can achieve by using clock_nanosleep() in a loop and through experimentation I'm observing something I'm not confident I understand.
I'm using code posted in this SO question by another user to benchmark performance, targeting a 250ms interval. I've observed that on my system the sleep function returns very consistently 10us late with only about 2us jitter the vast majority of the time (fairly narrow statistical distribution).
NOTE: I haven't collected data to present a plot of statistical distribution but casual qualitative description should hopefully suffice.
I decided to subtract the 10us offset from the target wakeup time to compensate for it, and this caused the average error to be approximately zero as expected, however the jitter increased dramatically - I would estimate most wakeups are >100us early/late, and much more widely distributed.
Why is this?
My theory is that with the 10us correction the target waketimes are less nicely aligned with the underlying hardware clock, but it would be helpful to get confirmation. If this is true, is there a method to synchronize the phase of the target waketimes with the hardware clock?
Manpages for clock_nanosleep(2) say: "Furthermore, after the
sleep completes, there may still be a delay before the CPU
becomes free to once again execute the calling thread."
I tried to comprehend your question. For this I created the source code below based on the reference at SO which you provided. I include the source code such that you or someone else can check it, test it, play with it.
The debug print refers to a sleep of exactly 1 second. The debug print is shorter than the print in the comments - and the debug print will always refer to the deviation from 1 second, no matter which wakeTime has been defined. Thus, it is possible, to try a reduced wakeTime (wakeTime.tv_nsec-= some_value;) to achieve the target of 1 second.
Conclusions:
I would generally agree to all you (davegravy) write about it in your post, except that I am seeing much higher delays and deviations.
There are minor changes in the delay between a non-loaded and a heavy loaded system (all CPUs 100% load). On heavy loaded system scattering of delay reduces and the average delay also reduces (on my system - but not very significant).
As expected, the delay changes quite a bit when I try it on another machine (as expected raspberry pi is worse :o).
For a specific machine and moment it is possible to define a correction value of nanoseconds to bring the average sleep closer to the target. Anyway, the correction value is not necessarily equal to the delay error without correction. And the correction value might be different for different machines.
Idea: As the provided code can measure how good it is. There might be the chance, that the code does a few loops from which it can derive an optimized delay correction value by itself. (This auto-correction might be interesting just from a theoretical point of view. Well, it is an idea.)
Idea 2: Or some correction values can be created just to avoid a long-term shift when considering many intervals, one after another.
#include <pthread.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#define CLOCK CLOCK_MONOTONIC
//#define CLOCK CLOCK_REALTIME
//#define CLOCK CLOCK_TAI
//#define CLOCK CLOCK_BOOTTIME
static long calcTimeDiff(struct timespec const* t1, struct timespec const* t2)
{
long diff = t1->tv_nsec - t2->tv_nsec;
diff += 1000000000 * (t1->tv_sec - t2->tv_sec);
return diff;
}
static void* tickThread()
{
struct timespec sleepStart;
struct timespec currentTime;
struct timespec wakeTime;
long sleepTime;
long wakeDelay;
while(1)
{
clock_gettime(CLOCK, &wakeTime);
wakeTime.tv_sec += 1;
wakeTime.tv_nsec -= 0; // Value to play with for delay "correction"
clock_gettime(CLOCK, &sleepStart);
clock_nanosleep(CLOCK, TIMER_ABSTIME, &wakeTime, NULL);
clock_gettime(CLOCK, ¤tTime);
sleepTime = calcTimeDiff(¤tTime, &sleepStart);
wakeDelay = calcTimeDiff(¤tTime, &wakeTime);
{
/*printf("sleep req=%-ld.%-ld start=%-ld.%-ld curr=%-ld.%-ld sleep=%-ld delay=%-ld\n",
(long) wakeTime.tv_sec, (long) wakeTime.tv_nsec,
(long) sleepStart.tv_sec, (long) sleepStart.tv_nsec,
(long) currentTime.tv_sec, (long) currentTime.tv_nsec,
sleepTime, wakeDelay);*/
// Debug Short Print with respect to target sleep = 1 sec. = 1000000000 ns
long debugTargetDelay=sleepTime-1000000000;
printf("sleep=%-ld delay=%-ld targetdelay=%-ld\n",
sleepTime, wakeDelay, debugTargetDelay);
}
}
}
int main(int argc, char*argv[])
{
tickThread();
}
Some output with wakeTime.tv_nsec -= 0;
sleep=1000095788 delay=96104 targetdelay=95788
sleep=1000078989 delay=79155 targetdelay=78989
sleep=1000080717 delay=81023 targetdelay=80717
sleep=1000068001 delay=68251 targetdelay=68001
sleep=1000080475 delay=80519 targetdelay=80475
sleep=1000110925 delay=110977 targetdelay=110925
sleep=1000082415 delay=82561 targetdelay=82415
sleep=1000079572 delay=79713 targetdelay=79572
sleep=1000098609 delay=98664 targetdelay=98609
and with wakeTime.tv_nsec -= 65000;
sleep=1000031711 delay=96987 targetdelay=31711
sleep=1000009400 delay=74611 targetdelay=9400
sleep=1000015867 delay=80912 targetdelay=15867
sleep=1000015612 delay=80708 targetdelay=15612
sleep=1000030397 delay=95592 targetdelay=30397
sleep=1000015299 delay=80475 targetdelay=15299
sleep=999993542 delay=58614 targetdelay=-6458
sleep=1000031263 delay=96310 targetdelay=31263
sleep=1000002029 delay=67169 targetdelay=2029
sleep=1000031671 delay=96821 targetdelay=31671
sleep=999998462 delay=63608 targetdelay=-1538
Anyway, the delays change all the time. I tried different CLOCK definitions and different compiler options, but without any special results.
Some statistics from further testing, sample size = 100 in both cases.
targetdelay from wakeTime.tv_nsec -= 0;
Mean value = 97503 Standard deviation = 27536
targetdelay from wakeTime.tv_nsec -= 97508;
Mean value = -1909 Standard deviation = 32682
In both cases, there were a few massive outliers, such that even this result from 100 samples might not quite be representative.
I noticed the io_uring kernel side uses CLOCK_MONOTONIC at CLOCK_MONOTONIC, so for the first timer, I get the time with both CLOCK_REALTIME and CLOCK_MONOTONIC and adjust the nanosecond like below and use IORING_TIMEOUT_ABS flag for io_uring_prep_timeout. iorn/clock.c at master · hnakamur/iorn
const long sec_in_nsec = 1000000000;
static int queue_timeout(iorn_queue_t *queue) {
iorn_timeout_op_t *op = calloc(1, sizeof(*op));
if (op == NULL) {
return -ENOMEM;
}
struct timespec rts;
int ret = clock_gettime(CLOCK_REALTIME, &rts);
if (ret < 0) {
fprintf(stderr, "clock_gettime CLOCK_REALTIME error: %s\n", strerror(errno));
return -errno;
}
long nsec_diff = sec_in_nsec - rts.tv_nsec;
ret = clock_gettime(CLOCK_MONOTONIC, &op->ts);
if (ret < 0) {
fprintf(stderr, "clock_gettime CLOCK_MONOTONIC error: %s\n", strerror(errno));
return -errno;
}
op->handler = on_timeout;
op->ts.tv_sec++;
op->ts.tv_nsec += nsec_diff;
if (op->ts.tv_nsec > sec_in_nsec) {
op->ts.tv_sec++;
op->ts.tv_nsec -= sec_in_nsec;
}
op->count = 1;
op->flags = IORING_TIMEOUT_ABS;
ret = iorn_prep_timeout(queue, op);
if (ret < 0) {
return ret;
}
return iorn_submit(queue);
}
From the second time, I just increment the second part tv_sec and use IORING_TIMEOUT_ABS flag for io_uring_prep_timeout.
Here is the output from my example program. The millisecond part is zero but it is about 400 microsecond later than just second.
on_timeout time=2020-05-10T14:49:42.000442
on_timeout time=2020-05-10T14:49:43.000371
on_timeout time=2020-05-10T14:49:44.000368
on_timeout time=2020-05-10T14:49:45.000372
on_timeout time=2020-05-10T14:49:46.000372
on_timeout time=2020-05-10T14:49:47.000373
on_timeout time=2020-05-10T14:49:48.000373
Could you tell me a better way than this?
Thanks for your comments! I'd like to update the current time for logging like ngx_time_update(). I modified my example to use just CLOCK_REALTIME, but still about 400 microseconds late. github.com/hnakamur/iorn/commit/… Does it mean clock_gettime takes about 400 nanoseconds on my machine?
Yes, that sounds about right, sort of. But, if you're on an x86 PC under linux, 400 ns for clock_gettime overhead may be a bit high (order of magnitude higher--see below). If you're on an arm CPU (e.g. Raspberry Pi, nvidia Jetson), it might be okay.
I don't know how you're getting 400 microseconds. But, I've had to do a lot of realtime stuff under linux, and 400 us is similar to what I've measured as the overhead to do a context switch and/or wakeup a process/thread after a syscall suspends it.
I never use gettimeofday anymore. I now just use clock_gettime(CLOCK_REALTIME,...) because it's the same except you get nanoseconds instead of microseconds.
Just so you know, although clock_gettime is a syscall, nowadays, on most systems, it uses the VDSO layer. The kernel injects special code into the userspace app, so that it is able to access the time directly without the overhead of a syscall.
If you're interested, you could run under gdb and disassemble the code to see that it just accesses some special memory locations instead of doing a syscall.
I don't think you need to worry about this too much. Just use clock_gettime(CLOCK_MONOTONIC,...) and set flags to 0. The overhead doesn't factor into this, for the purposes of the ioring call as your iorn layer is using it.
When I do this sort of thing, and I want/need to calculate the overhead of clock_gettime itself, I call clock_gettime in a loop (e.g. 1000 times), and try to keep the total time below a [possible] timeslice. I use the minimum diff between times in each iteration. That compensates for any [possible] timeslicing.
The minimum is the overhead of the call itself [on average].
There are additional tricks that you can do to minimize latency in userspace (e.g. raising process priority, clamping CPU affinity and I/O interrupt affinity), but they can involve a few more things, and, if you're not very careful, they can produce worse results.
Before you start taking extraordinary measures, you should have a solid methodology to measure timing/benchmarking to prove that your results can not meet your timing/throughput/latency requirements. Otherwise, you're doing complicated things for no real/measurable/necessary benefit.
Below is some code I just created, simplified, but based on code I already have/use to calibrate the overhead:
#include <stdio.h>
#include <time.h>
#define ITERMAX 10000
typedef long long tsc_t;
// tscget -- get time in nanoseconds
static inline tsc_t
tscget(void)
{
struct timespec ts;
tsc_t tsc;
clock_gettime(CLOCK_MONOTONIC,&ts);
tsc = ts.tv_sec;
tsc *= 1000000000;
tsc += ts.tv_nsec;
return tsc;
}
// tscsec -- convert nanoseconds to fractional seconds
double
tscsec(tsc_t tsc)
{
double sec;
sec = tsc;
sec /= 1e9;
return sec;
}
tsc_t
calibrate(void)
{
tsc_t tscbeg;
tsc_t tscold;
tsc_t tscnow;
tsc_t tscdif;
tsc_t tscmin;
int iter;
tscmin = 1LL << 62;
tscbeg = tscget();
tscold = tscbeg;
for (iter = ITERMAX; iter > 0; --iter) {
tscnow = tscget();
tscdif = tscnow - tscold;
if (tscdif < tscmin)
tscmin = tscdif;
tscold = tscnow;
}
tscdif = tscnow - tscbeg;
printf("MIN:%.9f TOT:%.9f AVG:%.9f\n",
tscsec(tscmin),tscsec(tscdif),tscsec(tscnow - tscbeg) / ITERMAX);
return tscmin;
}
int
main(void)
{
calibrate();
return 0;
}
On my system, a 2.67GHz Core i7, the output is:
MIN:0.000000019 TOT:0.000254999 AVG:0.000000025
So, I'm getting 25 ns overhead [and not 400 ns]. But, again, each system can be different to some extent.
UPDATE:
Note that x86 processors have "speed step". The OS can adjust the CPU frequency up or down semi-automatically. Lower speeds conserve power. Higher speeds are maximum performance.
This is done with a heuristic (e.g. if the OS detects that the process is a heavy CPU user, it will up the speed).
To force maximum speed, linux has this directory:
/sys/devices/system/cpu/cpuN/cpufreq
Where N is the cpu number (e.g. 0-7)
Under this directory, there are a number of files of interest. They should be self explanatory.
In particular, look at scaling_governor. It has either ondemand [kernel will adjust as needed] or performance [kernel will force maximum CPU speed].
To force maximum speed, as root, set this [once] to performance (e.g.):
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Do this for all cpus.
However, I just did this on my system, and it had little effect. So, the kernel's heuristic may have improved.
As to the 400us, when a process has been waiting on something, when it is "woken up", this is a two step process.
The process is marked "runnable".
At some point, the system/CPU does a reschedule. The process will be run, based upon the scheduling policy and the process priority in effect.
For many syscalls, the reschedule [only] occurs on the next system timer/clock tick/interrupt. So, for some, there can be a delay of up to a full clock tick (i.e.) for HZ value of 1000, this can be up to 1ms (1000 us) later.
On average, this is one half of HZ or 500 us.
For some syscalls, when the process is marked runnable, a reschedule is done immediately. If the process has a higher priority, it will be run immediately.
When I first looked at this [circa 2004], I looked at all code paths in the kernel, and the only syscall that did the immediate reschedule was SysV IPC, for msgsnd/msgrcv. That is, when process A did msgsnd, any process B waiting for the given message would be run.
But, others did not (e.g. futex). They would wait for the timer tick. A lot has changed since then, and now, more syscalls will do the immediate reschedule. For example, I recently measured futex [invoked via pthread_mutex_*], and it seemed to do the quick reschedule.
Also, the kernel scheduler has changed. The newer scheduler can wakeup/run some things on a fraction of a clock tick.
So, for you, the 400 us, is [possibly] the alignment to the next clock tick.
But, it could just be the overhead of doing the syscall. To test that, I modified my test program to open /dev/null [and/or /dev/zero], and added read(fd,buf,1) to the test loop.
I got a MIN: value of 529 us. So, the delay you're getting could just be the amount of time it takes to do the task switch.
This is what I would call "good enough for now".
To get "razor's edge" response, you'd probably have to write a custom kernel driver and have the driver do this. This is what embedded systems would do if (e.g.) they had to toggle a GPIO pin on every interval.
But, if all you're doing is printf, the overhead of printf and the underlying write(1,...) tends to swamp the actual delay.
Also, note that when you do printf, it builds the output buffer and when the buffer in FILE *stdout is full, it flushes via write.
For best performance, it's better to do int len = sprintf(buf,"current time is ..."); write(1,buf,len);
Also, when you do this, if the kernel buffers for TTY I/O get filled [which is quite possible given the high frequency of messages you're doing], the process will be suspended until the I/O has been sent to the TTY device.
To do this well, you'd have to watch how much space is available, and skip some messages if there isn't enough space to [wholy] contain them.
You'd need to do: ioctl(1,TIOCOUTQ,...) to get the available space and skip some messages if it is less than the size of the message you want to output (e.g. the len value above).
For your usage, you're probably more interested in the latest time message, rather than outputting all messages [which would eventually produce a lag]
I have used the below code to get the clock cycle of the processor
unsigned long long rdtsc(void)
{
unsigned hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
I get some value say 43, but what is the unit here? Is it in microseconds or nanoseconds.
I used below code to get the frequency of my board.
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
1700000
I also used below code to find my processor speed
dmidecode -t processor | grep "Speed"
Max Speed: 3700 MHz
Current Speed: 3700 MHz
Now how do I use above frequency and convert it to microseconds or milliseconds?
A simple answer to the stated question, "how do I convert the TSC frequency to microseconds or milliseconds?" is: You do not. What the TSC (Time Stamp Counter) clock frequency actually is, varies depending on the hardware, and may vary during runtime on some. To measure real time, you use clock_gettime(CLOCK_REALTIME) or clock_gettime(CLOCK_MONOTONIC) in Linux.
As Peter Cordes mentioned in a comment (Aug 2018), on most current x86-64 architectures the Time Stamp Counter (accessed by the RDTSC instruction and __rdtsc() function declared in <x86intrin.h>) counts reference clock cycles, not CPU clock cycles. His answer to a similar question in C++ is valid for C also in Linux on x86-64, because the compiler provides the underlying built-in when compiling C or C++, and rest of the answer deals with the hardware details. I recommend reading that one, too.
The rest of this answer assumes the underlying issue is microbenchmarking code, to find out how two implementations of some function compare to each other.
On x86 (Intel 32-bit) and x86-64 (AMD64, Intel and AMD 64-bit) architectures, you can use __rdtsc() from <x86intrin.h> to find out the number of TSC clock cycles elapsed. This can be used to measure and compare the number of cycles used by different implementations of some function, typically a large number of times.
Do note that there are hardware differences as to how the TSC clock is related to CPU clock. The abovementioned more recent answer goes into some detail on that. For practical purposes in Linux, it is sufficient in Linux to use cpufreq-set to disable frequency scaling (to ensure the relationship between the CPU and TSC frequencies does not change during microbenchmarking), and optionally taskset to restrict the microbenchmark to specific CPU core(s). That ensures that the results gathered in that microbenchmark yield results that can be compared to each other.
(As Peter Cordes commented, we also want to add _mm_lfence() from <emmintrin.h> (included by <immintrin.h>). This ensures that the CPU does not internally reorder the RDTSC operation compared to the function to be benchmarked. You can use -DNO_LFENCE at compile time to omit those, if you want.)
Let's say you have functions void foo(void); and void bar(void); that you wish to compare:
#include <stdlib.h>
#include <x86intrin.h>
#include <stdio.h>
#ifdef NO_LFENCE
#define lfence()
#else
#include <emmintrin.h>
#define lfence() _mm_lfence()
#endif
static int cmp_ull(const void *aptr, const void *bptr)
{
const unsigned long long a = *(const unsigned long long *)aptr;
const unsigned long long b = *(const unsigned long long *)bptr;
return (a < b) ? -1 :
(a > b) ? +1 : 0;
}
unsigned long long *measure_cycles(size_t count, void (*func)())
{
unsigned long long *elapsed, started, finished;
size_t i;
elapsed = malloc((count + 2) * sizeof elapsed[0]);
if (!elapsed)
return NULL;
/* Call func() count times, measuring the TSC cycles for each call. */
for (i = 0; i < count; i++) {
/* First, let's ensure our CPU executes everything thus far. */
lfence();
/* Start timing. */
started = __rdtsc();
/* Ensure timing starts before we call the function. */
lfence();
/* Call the function. */
func();
/* Ensure everything has been executed thus far. */
lfence();
/* Stop timing. */
finished = __rdtsc();
/* Ensure we have the counter value before proceeding. */
lfence();
elapsed[i] = finished - started;
}
/* The very first call is likely the cold-cache case,
so in case that measurement might contain useful
information, we put it at the end of the array.
We also terminate the array with a zero. */
elapsed[count] = elapsed[0];
elapsed[count + 1] = 0;
/* Sort the cycle counts. */
qsort(elapsed, count, sizeof elapsed[0], cmp_ull);
/* This function returns all cycle counts, in sorted order,
although the median, elapsed[count/2], is the one
I personally use. */
return elapsed;
}
void benchmark(const size_t count)
{
unsigned long long *foo_cycles, *bar_cycles;
if (count < 1)
return;
printf("Measuring run time in Time Stamp Counter cycles:\n");
fflush(stdout);
foo_cycles = measure_cycles(count, foo);
bar_cycles = measure_cycles(count, bar);
printf("foo(): %llu cycles (median of %zu calls)\n", foo_cycles[count/2], count);
printf("bar(): %llu cycles (median of %zu calls)\n", bar_cycles[count/2], count);
free(bar_cycles);
free(foo_cycles);
}
Note that the above results are very specific to the compiler and compiler options used, and of course on the hardware it is run on. The median number of cycles can be interpreted as "the typical number of TSC cycles taken", because the measurement is not completely reliable (may be affected by events outside the process; for example, by context switches, or by migration to another core on some CPUs). For the same reason, I don't trust the minimum, maximum, or average values.
However, the two implementations' (foo() and bar()) cycle counts above can be compared to find out how their performance compares to each other, in a microbenchmark. Just remember that microbenchmark results may not extend to real work tasks, because of how complex tasks' resource use interactions are. One function might be superior in all microbenchmarks, but poorer than others in real world, because it is only efficient when it has lots of CPU cache to use, for example.
In Linux in general, you can use the CLOCK_REALTIME clock to measure real time (wall clock time) used, in the very same manner as above. CLOCK_MONOTONIC is even better, because it is not affected by direct changes to the realtime clock the administrator might make (say, if they noticed the system clock is ahead or behind); only drift adjustments due to NTP etc. are applied. Daylight savings time or changes thereof does not affect the measurements, using either clock. Again, the median of a number of measurements is the result I seek, because events outside the measured code itself can affect the result.
For example:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#ifdef NO_LFENCE
#define lfence()
#else
#include <emmintrin.h>
#define lfence() _mm_lfence()
#endif
static int cmp_double(const void *aptr, const void *bptr)
{
const double a = *(const double *)aptr;
const double b = *(const double *)bptr;
return (a < b) ? -1 :
(a > b) ? +1 : 0;
}
double median_seconds(const size_t count, void (*func)())
{
struct timespec started, stopped;
double *seconds, median;
size_t i;
seconds = malloc(count * sizeof seconds[0]);
if (!seconds)
return -1.0;
for (i = 0; i < count; i++) {
lfence();
clock_gettime(CLOCK_MONOTONIC, &started);
lfence();
func();
lfence();
clock_gettime(CLOCK_MONOTONIC, &stopped);
lfence();
seconds[i] = (double)(stopped.tv_sec - started.tv_sec)
+ (double)(stopped.tv_nsec - started.tv_nsec) / 1000000000.0;
}
qsort(seconds, count, sizeof seconds[0], cmp_double);
median = seconds[count / 2];
free(seconds);
return median;
}
static double realtime_precision(void)
{
struct timespec t;
if (clock_getres(CLOCK_REALTIME, &t) == 0)
return (double)t.tv_sec
+ (double)t.tv_nsec / 1000000000.0;
return 0.0;
}
void benchmark(const size_t count)
{
double median_foo, median_bar;
if (count < 1)
return;
printf("Median wall clock times over %zu calls:\n", count);
fflush(stdout);
median_foo = median_seconds(count, foo);
median_bar = median_seconds(count, bar);
printf("foo(): %.3f ns\n", median_foo * 1000000000.0);
printf("bar(): %.3f ns\n", median_bar * 1000000000.0);
printf("(Measurement unit is approximately %.3f ns)\n", 1000000000.0 * realtime_precision());
fflush(stdout);
}
In general, I personally prefer to compile the benchmarked function in a separate unit (to a separate object file), and also benchmark a do-nothing function to estimate the function call overhead (although it tends to give an overestimate for the overhead; i.e. yield too large an overhead estimate, because some of the function call overhead is latencies and not actual time taken, and some operations are possible during those latencies in the actual functions).
It is important to remember that the above measurements should only be used as indications, because in a real world application, things like cache locality (especially on current machines, with multi-level caching, and lots of memory) hugely affect the time used by different implementations.
For example, you might compare the speeds of a quicksort and a radix sort. Depending on the size of the keys, the radix sort requires rather large extra arrays (and uses a lot of cache). If the real application the sort routine is used in does not simultaneously use a lot of other memory (and thus the sorted data is basically what is cached), then a radix sort will be faster if there is enough data (and the implementation is sane). However, if the application is multithreaded, and the other threads shuffle (copy or transfer) a lot of memory around, then the radix sort using a lot of cache will evict other data also cached; even though the radix sort function itself does not show any serious slowdown, it may slow down the other threads and therefore the overall program, because the other threads have to wait for their data to be re-cached.
This means that the only "benchmarks" you should trust, are wall clock measurements used on the actual hardware, running actual work tasks with actual work data. Everything else is subject to many conditions, and are more or less suspect: indications, yes, but not very reliable.
I am trying to calculate the number of ticks a function uses to run and to do so using the clock() function like so:
unsigned long time = clock();
myfunction();
unsigned long time2 = clock() - time;
printf("time elapsed : %lu",time2);
But the problem is that the value it returns is a multiple of 10000, which I think is the CLOCK_PER_SECOND. Is there a way or an equivalent function value that is more precise?
I am using Ubuntu 64-bit, but would prefer if the solution can work on other systems like Windows & Mac OS.
There are a number of more accurate timers in POSIX.
gettimeofday() - officially obsolescent, but very widely available; microsecond resolution.
clock_gettime() - the replacement for gettimeofday() (but not necessarily so widely available; on an old version of Solaris, requires -lposix4 to link), with nanosecond resolution.
There are other sub-second timers of greater or lesser antiquity, portability, and resolution, including:
ftime() - millisecond resolution (marked 'legacy' in POSIX 2004; not in POSIX 2008).
clock() - which you already know about. Note that it measures CPU time, not elapsed (wall clock) time.
times() - CLK_TCK or HZ. Note that this measures CPU time for parent and child processes.
Do not use ftime() or times() unless there is nothing better. The ultimate fallback, but not meeting your immediate requirements, is
time() - one second resolution.
The clock() function reports in units of CLOCKS_PER_SEC, which is required to be 1,000,000 by POSIX, but the increment may happen less frequently (100 times per second was one common frequency). The return value must be divided by CLOCKS_PER_SEC to get time in seconds.
The most precise (but highly not portable) way to measure time is to count CPU ticks.
For instance on x86
unsigned long long int asmx86Time ()
{
unsigned long long int realTimeClock = 0;
asm volatile ( "rdtsc\n\t"
"salq $32, %%rdx\n\t"
"orq %%rdx, %%rax\n\t"
"movq %%rax, %0"
: "=r" ( realTimeClock )
: /* no inputs */
: "%rax", "%rdx" );
return realTimeClock;
}
double cpuFreq ()
{
ifstream file ( "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq" );
string sFreq; if ( file ) file >> sFreq;
stringstream ssFreq ( sFreq ); double freq = 0.;
if ( ssFreq ) { ssFreq >> freq; freq *= 1000; } // kHz to Hz
return freq;
}
// Timing
unsigned long long int asmStart = asmx86Time ();
doStuff ();
unsigned long long int asmStop = asmx86Time ();
float asmDuration = ( asmStop - asmStart ) / cpuFreq ();
If you don't have an x86, you'll have to re-write the assembler code accordingly to your CPU. If you need maximum precision, that's unfortunatelly the only way to go... otherwise use clock_gettime().
Per the clock() manpage, on POSIX platforms the value of the CLOCKS_PER_SEC macro must be 1000000. As you say that the return value you're getting from clock() is a multiple of 10000, that would imply that the resolution is 10 ms.
Also note that clock() on Linux returns an approximation of the processor time used by the program. On Linux, again, scheduler statistics are updated when the scheduler runs, at CONFIG_HZ frequency. So if the periodic timer tick is 100 Hz, you get process CPU time consumption statistics with 10 ms resolution.
Walltime measurements are not bound by this, and can be much more accurate. clock_gettime(CLOCK_MONOTONIC, ...) on a modern Linux system provides nanosecond resolution.
I agree with the solution of Jonathan. Here is the implementation of clock_gettime() with nanoseconds of precision.
//Import
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
int main(int argc, char *argv[])
{
struct timespec ts;
int ret;
while(1)
{
ret = clock_gettime (CLOCK_MONOTONIC, &ts);
if (ret)
{
perror ("clock_gettime");
return;
}
ts.tv_nsec += 20000; //goto sleep for 20000 n
printf("Print before sleep tid%ld %ld\n",ts.tv_sec,ts.tv_nsec );
// printf("going to sleep tid%d\n",turn );
ret = clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME,&ts, NULL);
}
}
Although It's difficult to achieve ns precision, but this can be used to get precision for less than a microseconds (700-900 ns). printf above is used to just print the thread # (it'll definitely take 2-3 micro seconds to just print a statement).
Using only ANSI C, is there any way to measure time with milliseconds precision or more? I was browsing time.h but I only found second precision functions.
There is no ANSI C function that provides better than 1 second time resolution but the POSIX function gettimeofday provides microsecond resolution. The clock function only measures the amount of time that a process has spent executing and is not accurate on many systems.
You can use this function like this:
struct timeval tval_before, tval_after, tval_result;
gettimeofday(&tval_before, NULL);
// Some code you want to time, for example:
sleep(1);
gettimeofday(&tval_after, NULL);
timersub(&tval_after, &tval_before, &tval_result);
printf("Time elapsed: %ld.%06ld\n", (long int)tval_result.tv_sec, (long int)tval_result.tv_usec);
This returns Time elapsed: 1.000870 on my machine.
#include <time.h>
clock_t uptime = clock() / (CLOCKS_PER_SEC / 1000);
I always use the clock_gettime() function, returning time from the CLOCK_MONOTONIC clock. The time returned is the amount of time, in seconds and nanoseconds, since some unspecified point in the past, such as system startup of the epoch.
#include <stdio.h>
#include <stdint.h>
#include <time.h>
int64_t timespecDiff(struct timespec *timeA_p, struct timespec *timeB_p)
{
return ((timeA_p->tv_sec * 1000000000) + timeA_p->tv_nsec) -
((timeB_p->tv_sec * 1000000000) + timeB_p->tv_nsec);
}
int main(int argc, char **argv)
{
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
// Some code I am interested in measuring
clock_gettime(CLOCK_MONOTONIC, &end);
uint64_t timeElapsed = timespecDiff(&end, &start);
}
Implementing a portable solution
As it was already mentioned here that there is no proper ANSI solution with sufficient precision for the time measurement problem, I want to write about the ways how to get a portable and, if possible, a high-resolution time measurement solution.
Monotonic clock vs. time stamps
Generally speaking there are two ways of time measurement:
monotonic clock;
current (date)time stamp.
The first one uses a monotonic clock counter (sometimes it is called a tick counter) which counts ticks with a predefined frequency, so if you have a ticks value and the frequency is known, you can easily convert ticks to elapsed time. It is actually not guaranteed that a monotonic clock reflects the current system time in any way, it may also count ticks since a system startup. But it guarantees that a clock is always run up in an increasing fashion regardless of the system state. Usually the frequency is bound to a hardware high-resolution source, that's why it provides a high accuracy (depends on hardware, but most of the modern hardware has no problems with high-resolution clock sources).
The second way provides a (date)time value based on the current system clock value. It may also have a high resolution, but it has one major drawback: this kind of time value can be affected by different system time adjustments, i.e. time zone change, daylight saving time (DST) change, NTP server update, system hibernation and so on. In some circumstances you can get a negative elapsed time value which can lead to an undefined behavior. Actually this kind of time source is less reliable than the first one.
So the first rule in time interval measuring is to use a monotonic clock if possible. It usually has a high precision, and it is reliable by design.
Fallback strategy
When implementing a portable solution it is worth to consider a fallback strategy: use a monotonic clock if available and fallback to time stamps approach if there is no monotonic clock in the system.
Windows
There is a great article called Acquiring high-resolution time stamps on MSDN about time measurement on Windows which describes all the details you may need to know about software and hardware support. To acquire a high precision time stamp on Windows you should:
query a timer frequency (ticks per second) with QueryPerformanceFrequency:
LARGE_INTEGER tcounter;
LARGE_INTEGER freq;
if (QueryPerformanceFrequency (&tcounter) != 0)
freq = tcounter.QuadPart;
The timer frequency is fixed on the system boot so you need to get it only once.
query the current ticks value with QueryPerformanceCounter:
LARGE_INTEGER tcounter;
LARGE_INTEGER tick_value;
if (QueryPerformanceCounter (&tcounter) != 0)
tick_value = tcounter.QuadPart;
scale the ticks to elapsed time, i.e. to microseconds:
LARGE_INTEGER usecs = (tick_value - prev_tick_value) / (freq / 1000000);
According to Microsoft you should not have any problems with this approach on Windows XP and later versions in most cases. But you can also use two fallback solutions on Windows:
GetTickCount provides the number of milliseconds that have elapsed since the system was started. It wraps every 49.7 days, so be careful in measuring longer intervals.
GetTickCount64 is a 64-bit version of GetTickCount, but it is available starting from Windows Vista and above.
OS X (macOS)
OS X (macOS) has its own Mach absolute time units which represent a monotonic clock. The best way to start is the Apple's article Technical Q&A QA1398: Mach Absolute Time Units which describes (with the code examples) how to use Mach-specific API to get monotonic ticks. There is also a local question about it called clock_gettime alternative in Mac OS X which at the end may leave you a bit confused what to do with the possible value overflow because the counter frequency is used in the form of numerator and denominator. So, a short example how to get elapsed time:
get the clock frequency numerator and denominator:
#include <mach/mach_time.h>
#include <stdint.h>
static uint64_t freq_num = 0;
static uint64_t freq_denom = 0;
void init_clock_frequency ()
{
mach_timebase_info_data_t tb;
if (mach_timebase_info (&tb) == KERN_SUCCESS && tb.denom != 0) {
freq_num = (uint64_t) tb.numer;
freq_denom = (uint64_t) tb.denom;
}
}
You need to do that only once.
query the current tick value with mach_absolute_time:
uint64_t tick_value = mach_absolute_time ();
scale the ticks to elapsed time, i.e. to microseconds, using previously queried numerator and denominator:
uint64_t value_diff = tick_value - prev_tick_value;
/* To prevent overflow */
value_diff /= 1000;
value_diff *= freq_num;
value_diff /= freq_denom;
The main idea to prevent an overflow is to scale down the ticks to desired accuracy before using the numerator and denominator. As the initial timer resolution is in nanoseconds, we divide it by 1000 to get microseconds. You can find the same approach used in Chromium's time_mac.c. If you really need a nanosecond accuracy consider reading the How can I use mach_absolute_time without overflowing?.
Linux and UNIX
The clock_gettime call is your best way on any POSIX-friendly system. It can query time from different clock sources, and the one we need is CLOCK_MONOTONIC. Not all systems which have clock_gettime support CLOCK_MONOTONIC, so the first thing you need to do is to check its availability:
if _POSIX_MONOTONIC_CLOCK is defined to a value >= 0 it means that CLOCK_MONOTONIC is avaiable;
if _POSIX_MONOTONIC_CLOCK is defined to 0 it means that you should additionally check if it works at runtime, I suggest to use sysconf:
#include <unistd.h>
#ifdef _SC_MONOTONIC_CLOCK
if (sysconf (_SC_MONOTONIC_CLOCK) > 0) {
/* A monotonic clock presents */
}
#endif
otherwise a monotonic clock is not supported and you should use a fallback strategy (see below).
Usage of clock_gettime is pretty straight forward:
get the time value:
#include <time.h>
#include <sys/time.h>
#include <stdint.h>
uint64_t get_posix_clock_time ()
{
struct timespec ts;
if (clock_gettime (CLOCK_MONOTONIC, &ts) == 0)
return (uint64_t) (ts.tv_sec * 1000000 + ts.tv_nsec / 1000);
else
return 0;
}
I've scaled down the time to microseconds here.
calculate the difference with the previous time value received the same way:
uint64_t prev_time_value, time_value;
uint64_t time_diff;
/* Initial time */
prev_time_value = get_posix_clock_time ();
/* Do some work here */
/* Final time */
time_value = get_posix_clock_time ();
/* Time difference */
time_diff = time_value - prev_time_value;
The best fallback strategy is to use the gettimeofday call: it is not a monotonic, but it provides quite a good resolution. The idea is the same as with clock_gettime, but to get a time value you should:
#include <time.h>
#include <sys/time.h>
#include <stdint.h>
uint64_t get_gtod_clock_time ()
{
struct timeval tv;
if (gettimeofday (&tv, NULL) == 0)
return (uint64_t) (tv.tv_sec * 1000000 + tv.tv_usec);
else
return 0;
}
Again, the time value is scaled down to microseconds.
SGI IRIX
IRIX has the clock_gettime call, but it lacks CLOCK_MONOTONIC. Instead it has its own monotonic clock source defined as CLOCK_SGI_CYCLE which you should use instead of CLOCK_MONOTONIC with clock_gettime.
Solaris and HP-UX
Solaris has its own high-resolution timer interface gethrtime which returns the current timer value in nanoseconds. Though the newer versions of Solaris may have clock_gettime, you can stick to gethrtime if you need to support old Solaris versions.
Usage is simple:
#include <sys/time.h>
void time_measure_example ()
{
hrtime_t prev_time_value, time_value;
hrtime_t time_diff;
/* Initial time */
prev_time_value = gethrtime ();
/* Do some work here */
/* Final time */
time_value = gethrtime ();
/* Time difference */
time_diff = time_value - prev_time_value;
}
HP-UX lacks clock_gettime, but it supports gethrtime which you should use in the same way as on Solaris.
BeOS
BeOS also has its own high-resolution timer interface system_time which returns the number of microseconds have elapsed since the computer was booted.
Example usage:
#include <kernel/OS.h>
void time_measure_example ()
{
bigtime_t prev_time_value, time_value;
bigtime_t time_diff;
/* Initial time */
prev_time_value = system_time ();
/* Do some work here */
/* Final time */
time_value = system_time ();
/* Time difference */
time_diff = time_value - prev_time_value;
}
OS/2
OS/2 has its own API to retrieve high-precision time stamps:
query a timer frequency (ticks per unit) with DosTmrQueryFreq (for GCC compiler):
#define INCL_DOSPROFILE
#define INCL_DOSERRORS
#include <os2.h>
#include <stdint.h>
ULONG freq;
DosTmrQueryFreq (&freq);
query the current ticks value with DosTmrQueryTime:
QWORD tcounter;
unit64_t time_low;
unit64_t time_high;
unit64_t timestamp;
if (DosTmrQueryTime (&tcounter) == NO_ERROR) {
time_low = (unit64_t) tcounter.ulLo;
time_high = (unit64_t) tcounter.ulHi;
timestamp = (time_high << 32) | time_low;
}
scale the ticks to elapsed time, i.e. to microseconds:
uint64_t usecs = (prev_timestamp - timestamp) / (freq / 1000000);
Example implementation
You can take a look at the plibsys library which implements all the described above strategies (see ptimeprofiler*.c for details).
timespec_get from C11
Returns up to nanoseconds, rounded to the resolution of the implementation.
Looks like an ANSI ripoff from POSIX' clock_gettime.
Example: a printf is done every 100ms on Ubuntu 15.10:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static long get_nanos(void) {
struct timespec ts;
timespec_get(&ts, TIME_UTC);
return (long)ts.tv_sec * 1000000000L + ts.tv_nsec;
}
int main(void) {
long nanos;
long last_nanos;
long start;
nanos = get_nanos();
last_nanos = nanos;
start = nanos;
while (1) {
nanos = get_nanos();
if (nanos - last_nanos > 100000000L) {
printf("current nanos: %ld\n", nanos - start);
last_nanos = nanos;
}
}
return EXIT_SUCCESS;
}
The C11 N1570 standard draft 7.27.2.5 "The timespec_get function says":
If base is TIME_UTC, the tv_sec member is set to the number of seconds since an
implementation defined epoch, truncated to a whole value and the tv_nsec member is
set to the integral number of nanoseconds, rounded to the resolution of the system clock. (321)
321) Although a struct timespec object describes times with nanosecond resolution, the available
resolution is system dependent and may even be greater than 1 second.
C++11 also got std::chrono::high_resolution_clock: C++ Cross-Platform High-Resolution Timer
glibc 2.21 implementation
Can be found under sysdeps/posix/timespec_get.c as:
int
timespec_get (struct timespec *ts, int base)
{
switch (base)
{
case TIME_UTC:
if (__clock_gettime (CLOCK_REALTIME, ts) < 0)
return 0;
break;
default:
return 0;
}
return base;
}
so clearly:
only TIME_UTC is currently supported
it forwards to __clock_gettime (CLOCK_REALTIME, ts), which is a POSIX API: http://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_getres.html
Linux x86-64 has a clock_gettime system call.
Note that this is not a fail-proof micro-benchmarking method because:
man clock_gettime says that this measure may have discontinuities if you change some system time setting while your program runs. This should be a rare event of course, and you might be able to ignore it.
this measures wall time, so if the scheduler decides to forget about your task, it will appear to run for longer.
For those reasons getrusage() might be a better better POSIX benchmarking tool, despite it's lower microsecond maximum precision.
More information at: Measure time in Linux - time vs clock vs getrusage vs clock_gettime vs gettimeofday vs timespec_get?
The best precision you can possibly get is through the use of the x86-only "rdtsc" instruction, which can provide clock-level resolution (ne must of course take into account the cost of the rdtsc call itself, which can be measured easily on application startup).
The main catch here is measuring the number of clocks per second, which shouldn't be too hard.
The accepted answer is good enough.But my solution is more simple.I just test in Linux, use gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0.
Alse use gettimeofday, the tv_sec is the part of second, and the tv_usec is microseconds, not milliseconds.
long currentTimeMillis() {
struct timeval time;
gettimeofday(&time, NULL);
return time.tv_sec * 1000 + time.tv_usec / 1000;
}
int main() {
printf("%ld\n", currentTimeMillis());
// wait 1 second
sleep(1);
printf("%ld\n", currentTimeMillis());
return 0;
}
It print:
1522139691342
1522139692342, exactly a second.
^
As of ANSI/ISO C11 or later, you can use timespec_get() to obtain millisecond, microsecond, or nanosecond timestamps, like this:
#include <time.h>
/// Convert seconds to milliseconds
#define SEC_TO_MS(sec) ((sec)*1000)
/// Convert seconds to microseconds
#define SEC_TO_US(sec) ((sec)*1000000)
/// Convert seconds to nanoseconds
#define SEC_TO_NS(sec) ((sec)*1000000000)
/// Convert nanoseconds to seconds
#define NS_TO_SEC(ns) ((ns)/1000000000)
/// Convert nanoseconds to milliseconds
#define NS_TO_MS(ns) ((ns)/1000000)
/// Convert nanoseconds to microseconds
#define NS_TO_US(ns) ((ns)/1000)
/// Get a time stamp in milliseconds.
uint64_t millis()
{
struct timespec ts;
timespec_get(&ts, TIME_UTC);
uint64_t ms = SEC_TO_MS((uint64_t)ts.tv_sec) + NS_TO_MS((uint64_t)ts.tv_nsec);
return ms;
}
/// Get a time stamp in microseconds.
uint64_t micros()
{
struct timespec ts;
timespec_get(&ts, TIME_UTC);
uint64_t us = SEC_TO_US((uint64_t)ts.tv_sec) + NS_TO_US((uint64_t)ts.tv_nsec);
return us;
}
/// Get a time stamp in nanoseconds.
uint64_t nanos()
{
struct timespec ts;
timespec_get(&ts, TIME_UTC);
uint64_t ns = SEC_TO_NS((uint64_t)ts.tv_sec) + (uint64_t)ts.tv_nsec;
return ns;
}
// NB: for all 3 timestamp functions above: gcc defines the type of the internal
// `tv_sec` seconds value inside the `struct timespec`, which is used
// internally in these functions, as a signed `long int`. For architectures
// where `long int` is 64 bits, that means it will have undefined
// (signed) overflow in 2^64 sec = 5.8455 x 10^11 years. For architectures
// where this type is 32 bits, it will occur in 2^32 sec = 136 years. If the
// implementation-defined epoch for the timespec is 1970, then your program
// could have undefined behavior signed time rollover in as little as
// 136 years - (year 2021 - year 1970) = 136 - 51 = 85 years. If the epoch
// was 1900 then it could be as short as 136 - (2021 - 1900) = 136 - 121 =
// 15 years. Hopefully your program won't need to run that long. :). To see,
// by inspection, what your system's epoch is, simply print out a timestamp and
// calculate how far back a timestamp of 0 would have occurred. Ex: convert
// the timestamp to years and subtract that number of years from the present
// year.
For a much-more-thorough answer of mine, including with an entire timing library I wrote, see here: How to get a simple timestamp in C.
#Ciro Santilli Путлер also presents a concise demo of C11's timespec_get() function here, which is how I first learned how to use that function.
In my more-thorough answer, I explain that on my system, the best resolution possible is ~20ns, but the resolution is hardware-dependent and can vary from system to system.
Under windows:
SYSTEMTIME t;
GetLocalTime(&t);
swprintf_s(buff, L"[%02d:%02d:%02d:%d]\t", t.wHour, t.wMinute, t.wSecond, t.wMilliseconds);