I'm working on a timing system and I'll implement a timer class.
#include <windows.h>
#include <stdio.h>
#include <time.h>
int main()
{
clock_t t1, t2;
t1 = clock();
Sleep(10);
t2 = clock();
printf("%i\n", (int)(t2 - t1));
return 0;
}
This program should print "10", but it prints "15" or "16". I need better accuracy than that, finer than 1 ms! Any suggestions? (Maybe with select()'s timeout?)
NOTE: I've run this program on Windows 7 Ultimate x86. Program compiled with MinGW (C/C++) x86.
Sleep() is only as accurate as the operating system's clock interrupt rate, which by default on Windows ticks 64 times per second, or once every 15.625 msec, as you found out.
You can increase that rate by calling timeBeginPeriod(10); call timeEndPeriod(10) when you're done. You are still subject to normal thread scheduling latencies, so you still don't have a guarantee that your thread will resume running after 10 msec, and it won't when the machine is heavily loaded. Use SetThreadPriority() to boost the priority, which increases the odds that it will.
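The tick arithmetic can be sketched as a rough model. This is not how the Windows scheduler is actually implemented; it simply assumes that a sleeping thread is only resumed on a timer tick, so a requested delay is effectively rounded up to a whole number of tick periods. The function names are made up for illustration.

```c
#include <assert.h>

/* Length of one timer tick in milliseconds for a given interrupt rate. */
double tick_period_ms(double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return 1000.0 / ticks_per_second;
}

/* Simplified model (an assumption, not the real scheduler): a sleep
 * ends on the first tick boundary at or after the requested delay. */
double modeled_sleep_ms(double requested_ms, double ticks_per_second)
{
    double period = tick_period_ms(ticks_per_second);
    long ticks = (long)(requested_ms / period);  /* whole ticks that fit */

    assert(requested_ms >= 0.0);
    if ((double)ticks * period < requested_ms)
        ticks++;                                 /* round up to the next tick */
    return (double)ticks * period;
}
```

With the default rate, modeled_sleep_ms(10.0, 64.0) gives 15.625, which matches the "15" or "16" printed by the program in the question. At 1000 ticks per second the same request comes out as 10.0.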
Your problem is that the "clock" only ticks 64 times per second (I've seen systems that can go up to 2000, but no higher). If you are creating a timer class, you probably want to have much higher resolution than even 2000. Use QueryPerformanceCounter to get higher resolution. See QueryPerformanceCounter and overflows for an example.
Note that if you want to sleep for very short intervals, you will have to call QueryPerformanceCounter in a tight loop.
Sleep() is not accurate in the way you want it to be.
It will cause the thread to sleep for AT LEAST the length of time you specify, but there's no guarantee that the OS will return control to your thread at exactly that specified time.
I have the following code:
#include <stdio.h>
#include <time.h>
int main(){
clock_t timerS;
int i=1, targetTime=2;
scanf("%d", &targetTime);
while(i!=0){
timerS = clock();
while ((double)((clock() - timerS) / CLOCKS_PER_SEC) < targetTime){
//do something
}
//do another thing but delayed by the given time
if(targetTime>=0.5)
targetTime-=0.02;
else i=0;
}
return 0;
}
What I want is a loop that does something for an (initially) user-supplied number of seconds, and then does another thing once targetTime seconds have passed.
After each pass through the loop, the interval should change (more specifically, shrink by 0.02 seconds in this case).
An example would be collecting multiple inputs from the user for 2 seconds, and then displaying all the inputs made during those 2 seconds.
First problem is
If the initial given time is smaller than 1 second (for example 0.6), the other thing isn't delayed by 0.6 seconds, but is done immediately.
Second problem is
Actually similar to the first, if I subtract 0.02 seconds (in this case) from targetTime, it again does the other thing immediately and not in targetTime-0.02 seconds as I intend it to.
I'm new to this "clock" and "time" topic in C so I guess I'm doing something wrong regarding how these operations should be done. Also, please don't give an overly-complicated explanation/solution because of the above-mentioned reason.
Thanks!
I would avoid clock(3) here: it is a library function (not a system call) that reports processor time, it is often coarse, and POSIX offers better replacements.
If your system supports it, you can use clock_gettime(2), which gives you up to nanosecond precision (depending on the platform, but at least on Linux on Intel architectures it is almost guaranteed). If you cannot use it, you at least have gettimeofday(2), which comes from BSD systems and provides a clock with microsecond resolution.
If you want to pause your program for some delay, you also have sleep(3) (second based), usleep(3) (microsecond based) or even nanosleep(2) (nanosecond based).
Anyway, none of these calls has a tick that is based on the system heartbeat, and the resolution is uniform and not system dependent.
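As a minimal sketch of those calls working together, assuming a POSIX system that provides CLOCK_MONOTONIC: the helper below pauses for a fractional number of seconds with nanosleep() and measures how long the pause really took with clock_gettime(). The helper names are illustrative, not part of any library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Seconds elapsed from a to b. */
double timespec_diff_s(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec)
         + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Current reading of the monotonic clock. */
struct timespec monotonic_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return ts;
}

/* Sleeps for a fractional number of seconds and returns the time
 * that actually passed, as measured by the monotonic clock. */
double sleep_for_s(double seconds)
{
    struct timespec start, req;

    assert(seconds >= 0.0);
    start = monotonic_now();
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);

    /* If a signal interrupts the sleep, continue with the remainder. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    return timespec_diff_s(start, monotonic_now());
}
```

Calling sleep_for_s(0.6) pauses for at least about 0.6 seconds; the returned value shows how far the real delay overshot the request on a given machine.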
I mistakenly declared targetTime as int instead of double. Changing it to double solves the issue easily. Sorry!
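One more thing to watch when making that change: in the loop condition from the question the division is done before the cast, so if clock_t is an integer type (as it is on the common Windows and Linux toolchains) the elapsed time is still truncated to whole seconds. The scanf format also has to become "%lf" once targetTime is a double. A small sketch of the difference, with illustrative function names:

```c
#include <assert.h>
#include <time.h>

/* Divide first, cast afterwards (as in the question): with an integer
 * clock_t the fractional part is lost before the cast happens. */
double elapsed_truncated(clock_t start, clock_t end)
{
    assert(end >= start);
    return (double)((end - start) / CLOCKS_PER_SEC);
}

/* Cast first, divide afterwards: keeps fractions of a second. */
double elapsed_fractional(clock_t start, clock_t end)
{
    assert(end >= start);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For a span of half a second's worth of clock ticks, the first version reports 0 and the second reports 0.5, so only the second one can honor a target such as 0.6 seconds.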
#include <windows.h>
#include <stdio.h>
#include <stdint.h>
// assuming we return times with microsecond resolution
#define STOPWATCH_TICKS_PER_US 1
uint64_t GetStopWatch()
{
LARGE_INTEGER t, freq;
uint64_t val;
QueryPerformanceCounter(&t);
QueryPerformanceFrequency(&freq);
return (uint64_t) (t.QuadPart / (double) freq.QuadPart * 1000000);
}
void task()
{
printf("hi\n");
}
int main()
{
uint64_t start = GetStopWatch();
task();
uint64_t stop = GetStopWatch();
printf("Elapsed time (microseconds): %llu\n", (unsigned long long)(stop - start));
}
The code above uses QueryPerformanceCounter, which retrieves the current value of the high-resolution performance counter, and QueryPerformanceFrequency, which retrieves the frequency of that counter. If I call task() multiple times, the difference between the start and stop times varies, but I expected to get the same difference every time. Could anyone help me identify the mistake in the above code?
The thing is, Windows is a pre-emptive multi-tasking operating system. What the hell does that mean, you ask?
'Simple' - windows allocates time-slices to each of the running processes in the system. This gives the illusion of dozens or hundreds of processes running in parallel. In reality, you are limited to 2, 4, 8 or perhaps 16 parallel processes in a typical desktop/laptop. An Intel i3 has 2 physical cores, each of which can give the impression of doing two things at once. (But in reality, there's hardware tricks going on that switch the execution between each of the two threads that each core can handle at once) This is in addition to the software context switching that Windows/Linux/MacOSX do.
These time-slices are not guaranteed to be of the same duration each time. You may find the PC does a sync with time.windows.com to update your clock, you may find that the virus-scanner decides to begin working, or any one of a number of other things. All of these events may occur after your task() function has begun, yet before it ends.
In the DOS days, you'd get very nearly the same result each and every time you timed a single iteration of task(). Though, thanks to TSR programs, you could still find an interrupt was fired and some machine-time stolen during execution.
It is for just these reasons that a more accurate determination of the time a task takes to execute may be calculated by running the task N times, dividing the elapsed time by N to get the time per iteration.
For some functions in the past, I have used values for N as large as 100 million.
EDIT: A short snippet.
LARGE_INTEGER tStart, tEnd;
LARGE_INTEGER tFreq;
double tSecsElapsed;
QueryPerformanceFrequency(&tFreq); // counter ticks per second
QueryPerformanceCounter(&tStart);
int i, n = 100;
for (i=0; i<n; i++)
{
// Do Something
}
QueryPerformanceCounter(&tEnd);
// total time for all n iterations, in seconds
tSecsElapsed = (tEnd.QuadPart - tStart.QuadPart) / (double)tFreq.QuadPart;
double tMsElapsed = tSecsElapsed * 1000;
double tMsPerIteration = tMsElapsed / (double)n;
Code execution time on modern operating systems and processors is very unpredictable. There is no scenario where you can be sure that the elapsed time actually measured the time taken by your code, your program may well have lost the processor to another process while it was executing. The caches used by the processor play a big role, code is always a lot slower when it is executed the first time when the caches do not yet contain the code and data used by the program. The memory bus is very slow compared to the processor.
It gets especially meaningless when you measure a printf() statement. The console window is owned by another process so there's a significant chunk of process interop overhead whose execution time critically depends on the state of that process. You'll suddenly see a huge difference when the console window needs to be scrolled for example. And most of all, there isn't actually anything you can do about making it faster so measuring it is only interesting for curiosity.
Profile only code that you can improve. Take many samples so you can get rid of the outliers. Never pick the lowest measurement, that just creates unrealistic expectations. Don't pick the average either, that is affected too much by the long delays that other processes can incur on your test. The median value is a good choice.
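Picking the median from a set of timing samples could look like the sketch below. It assumes the samples have already been collected into an array of doubles (in whatever unit you measured); the function name is made up for illustration, and note that it sorts the array in place.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparison callback for doubles, ascending order. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns their median. */
double median_of_samples(double *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    qsort(samples, count, sizeof samples[0], compare_doubles);
    if (count % 2 == 1)
        return samples[count / 2];
    /* even count: average the two middle values */
    return (samples[count / 2 - 1] + samples[count / 2]) / 2.0;
}
```

For the samples 5, 1, 100, 2, 3 the median is 3, whereas the mean is 22.2, dragged up by the single outlier of 100.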
I need to do precision timing to the 1 us level to time a change in duty cycle of a pwm wave.
Background
I am using a Gumstix Overo Water COM (https://www.gumstix.com/store/app.php/products/265/) that has a single-core ARM Cortex-A8 processor running at 499.92 BogoMIPS (the Gumstix page claims up to 1 GHz with 800 MHz recommended) according to /proc/cpuinfo. The OS is an Angstrom image of Linux based on kernel version 2.6.34, and it is stock on the Gumstix Water COM.
The Problem
I have done a fair amount of reading about precise timing in Linux (and have tried most of it) and the consensus seems to be that using clock_gettime() and referencing CLOCK_MONOTONIC is the best way to do it. (I would have liked to use the RDTSC register for timing since I have one core with minimal power saving abilities, but this is not an Intel processor.) So here is the odd part: while clock_getres() returns 1, suggesting resolution at 1 ns, actual timing tests suggest a minimum resolution of 30517 ns or (it can't be coincidence) exactly the time between ticks of a 32.768 kHz clock. Here's what I mean:
// Stackoverflow example
#include <stdio.h>
#include <time.h>
#define SEC2NANOSEC 1000000000
int main( int argc, const char* argv[] )
{
// //////////////// Min resolution test //////////////////////
struct timespec resStart, resEnd, ts;
ts.tv_sec = 0; // s
ts.tv_nsec = 1; // ns
int iters = 100;
double resTime,sum = 0;
int i;
for (i = 0; i<iters; i++)
{
clock_gettime(CLOCK_MONOTONIC, &resStart); // start timer
// clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, &ts);
clock_gettime(CLOCK_MONOTONIC, &resEnd); // end timer
resTime = ((double)resEnd.tv_sec*SEC2NANOSEC + (double)resEnd.tv_nsec)
- ((double)resStart.tv_sec*SEC2NANOSEC + (double)resStart.tv_nsec);
sum = sum + resTime;
printf("resTime = %f\n",resTime);
}
printf("Average = %f\n",sum/(double)iters);
}
(Don't fret over the double casting; tv_sec is a time_t and tv_nsec is a long.)
Compile with:
gcc soExample.c -o runSOExample -lrt
Run with:
./runSOExample
With the nanosleep commented out as shown, the result is either 0ns or 30517ns with the majority being 0ns. This leads me to believe that CLOCK_MONOTONIC is updated at 32.768kHz and most of the time the clock has not been updated before the second clock_gettime() call is made and in cases where the result is 30517ns the clock has been updated between calls.
When I do the same thing on my development computer (AMD FX(tm)-6100 Six-Core Processor running at 1.4 GHz) the minimum delay is a more constant 149-151ns with no zeros.
So, let's compare those results to the CPU speeds. For the Gumstix, that 30517 ns (32.768 kHz) equates to about 15257 cycles of the 499.93 MHz CPU. For my dev computer that 150 ns equates to 210 cycles of the 1.4 GHz CPU.
With the clock_nanosleep() call uncommented the average results are these:
Gumstix: Avg value = 213623 and the result varies, up and down, by multiples of that min resolution of 30517ns
Dev computer: 57710-68065 ns with no clear trend. In the case of the dev computer I expect the resolution to actually be at the 1 ns level and the measured ~150ns truly is the time elapsed between the two clock_gettime() calls.
So, my questions are these:
What determines that minimum resolution?
Why is the resolution of the dev computer 30000X better than the Gumstix when the processor is only running ~2.6X faster?
Is there a way to change how often CLOCK_MONOTONIC is updated and where? In the kernel?
Thanks! If you need more info or clarification just ask.
As I understand, the difference between two environments(Gumstix and your Dev-computer) might be the underlying timer h/w they are using.
Commented nanosleep() case:
You are using clock_gettime() twice. To give you a rough idea of what this clock_gettime() will ultimately get mapped to(in kernel):
clock_gettime -->clock_get() -->posix_ktime_get_ts -->ktime_get_ts() -->timekeeping_get_ns()
-->clock->read()
clock->read() basically reads the value of the counter provided by underlying timer driver and corresponding h/w. A simple difference with stored value of the counter in the past and current counter value and then nanoseconds conversion mathematics will yield you the nanoseconds elapsed and will update the time-keeping data structures in kernel.
For example, if you have an HPET timer which gives you a 10 MHz clock, the h/w counter will get updated at 100 ns intervals.
Let's say, on the first clock->read(), you get a counter value of X.
The Linux time-keeping data structures will read this value of X, get the difference 'D' compared to some old stored counter value, do some conversion mathematics from counter difference 'D' to nanoseconds 'n', and update the data structure by 'n'.
Yield this new time value to the user space.
When the second clock->read() is issued, it will again read the counter and update the time.
Now, for an HPET timer, this counter is getting updated every 100 ns, and hence you will see this difference being reported to the user space.
Now, let's replace this HPET timer with a slow 32.768 kHz clock. Now clock->read()'s counter will be updated only after 30517 ns, so if your second call to clock_gettime() comes before this period has elapsed, you will get 0 (which is the majority of the cases), and in some cases your second function call will land after the counter has incremented by 1, i.e. 30517 ns have elapsed. Hence the value of 30517 ns sometimes.
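The 0-or-30517 pattern can be reproduced with a small model. This is only an illustration of the counter arithmetic, not kernel code: it assumes the reported time is the number of whole counter ticks converted back to nanoseconds, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds as seen through a counter that advances
 * counter_hz times per second: whole ticks only, converted back to ns.
 * The multiplication overflows for very large true_ns values, which is
 * acceptable for a short illustration like this one. */
uint64_t quantized_ns(uint64_t true_ns, uint64_t counter_hz)
{
    uint64_t ticks;

    assert(counter_hz > 0);
    ticks = true_ns * counter_hz / 1000000000ULL;  /* whole ticks so far */
    return ticks * 1000000000ULL / counter_hz;     /* back to nanoseconds */
}
```

Two reads taken 150 ns apart usually fall within the same tick of a 32768 Hz counter and differ by 0; when they straddle a tick boundary, for example at 30400 ns and 30600 ns, they differ by 30517. With a 10 MHz counter a read at 250 ns is reported as 200.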
Uncommented Nanosleep() case:
Let's trace the clock_nanosleep() for monotonic clocks:
clock_nanosleep() -->nsleep --> common_nsleep() -->hrtimer_nanosleep() -->do_nanosleep()
do_nanosleep() will simply put the current task in the INTERRUPTIBLE state, wait for the timer to expire (which is 1 ns) and then set the current task to the RUNNING state again. You see, there are a lot of factors involved now, mainly when your kernel thread (and hence the user space process) will be scheduled again. Depending on your OS, you will always face some latency when you're doing a context switch, and this is what we observe with the average values.
Now Your questions:
What determines that minimum resolution?
I think the resolution/precision of your system will depend on the underlying timer hardware being used(assuming your OS is able to provide that precision to the user space process).
Why is the resolution of the dev computer 30000X better than the Gumstix when the processor is only running ~2.6X faster?
Sorry, I missed you here. How is it 30000X faster? To me, it looks like something around 200X faster (30517 ns / 150 ns ~ 200X?). But anyway, as I understand it, CPU speed may or may not have anything to do with the timer resolution/precision. So this assumption may be right on some architectures (when you are using the TSC h/w), though it might fail on others (using HPET, PIT etc.).
Is there a way to change how often CLOCK_MONOTONIC is updated and where? In the kernel?
You can always look into the kernel code for details (that's how I looked into it).
In the Linux kernel code, look for these source files and documentation:
kernel/posix-timers.c
kernel/hrtimer.c
Documentation/timers/hrtimers.txt
I do not have gumstix on hand, but it looks like your clocksource is slow.
run:
$ dmesg | grep clocksource
If you get back
[ 0.560455] Switching to clocksource 32k_counter
This might explain why your clock is so slow.
In recent kernels there is a directory /sys/devices/system/clocksource/clocksource0 with two files: available_clocksource and current_clocksource. If you have this directory, try switching to a different source by echoing its name into the second file.
I'm using something like this to measure how long my program takes from start to finish:
int main(){
clock_t startClock = clock();
.... // many codes
clock_t endClock = clock();
printf("%ld", (endClock - startClock) / CLOCKS_PER_SEC);
}
And my question is: since there are multiple processes running at the same time, if my process sits idle for some amount of time x, will the clock keep ticking within my program during that time?
So basically my concern is: say 1000 clock cycles have passed, but my process only used 500 of them. Will I get 500 or 1000 from (endClock - startClock)?
Thanks.
This depends on the OS. On Windows, clock() measures wall-time. On Linux/Posix, it measures the combined CPU time of all the threads.
If you want wall-time on Linux, you should use gettimeofday().
If you want CPU-time on Windows, you should use GetProcessTimes().
EDIT:
So if you're on Windows, clock() will measure idle time.
On Linux, clock() will not measure idle time.
clock on POSIX measures cpu time, but it usually has extremely poor resolution. Instead, modern programs should use clock_gettime with the CLOCK_PROCESS_CPUTIME_ID clock-id. This will give up to nanosecond-resolution results, and usually it's really just about that good.
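The difference between CPU time and wall time can be seen with a short sketch, assuming a POSIX system that supports both CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID. The helper names are illustrative. The process sleeps for a while, and the two clocks are read before and after.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

/* Reads the given clock and returns its value in seconds. */
double clock_seconds(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleeps for ms milliseconds and reports how much wall-clock time and
 * how much CPU time the process accumulated meanwhile. An early
 * wake-up caused by a signal is not handled in this sketch. */
void measure_sleep(long ms, double *wall_s, double *cpu_s)
{
    struct timespec delay;
    double wall0, cpu0;

    assert(ms >= 0 && wall_s != NULL && cpu_s != NULL);
    delay.tv_sec = ms / 1000;
    delay.tv_nsec = (ms % 1000) * 1000000L;

    wall0 = clock_seconds(CLOCK_MONOTONIC);
    cpu0 = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    nanosleep(&delay, NULL);
    *wall_s = clock_seconds(CLOCK_MONOTONIC) - wall0;
    *cpu_s = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```

After measure_sleep(50, &wall, &cpu), wall is at least about 0.05 seconds while cpu stays far smaller, because a sleeping process uses almost no processor time.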
As per the definition on the man page (in Linux),
The clock() function returns an approximation of processor time used
by the program.
it will try to be as accurate as possible, but as you say, some time (process switching, for example) is difficult to attribute to a process, so the numbers will be as accurate as possible, but not perfect.
I need a way to get the elapsed time (wall-clock time) since a program started, in a way that is resilient to users meddling with the system clock.
On windows, the non standard clock() implementation doesn't do the trick, as it appears to work just by calculating the difference with the time sampled at start up, so that I get negative values if I "move the clock hands back".
On UNIX, clock/getrusage refer to system time, whereas using functions such as gettimeofday to sample timestamps has the same problem as using clock on Windows.
I'm not really interested in precision, and I've hacked a solution by having a half-second resolution timer spinning in the background, countering the clock skews when they happen
(if the difference between the sampled time and the expected time exceeds 1 second, I use the expected time as the new baseline), but I think there must be a better way.
I guess you can always start some kind of timer. For example under Linux a thread
that would have a loop like this :
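/* assumes a pthread-style start routine, which must return void * */
static void *timer_thread(void *arg)
{
struct timespec delay;
unsigned int msecond_delay = ((app_state_t*)arg)->msecond_delay;
/* split the delay so that tv_nsec stays below one second (1e9 ns) */
delay.tv_sec = msecond_delay / 1000;
delay.tv_nsec = (long)(msecond_delay % 1000) * 1000000L;
while(1) {
some_global_counter_increment();
nanosleep(&delay, NULL);
}
return NULL; /* not reached */
}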
Where app_state_t is an application structure of your choice where you store variables. If you want to prevent tampering, you need to be sure no one killed your thread.
For POSIX, use clock_gettime() with CLOCK_MONOTONIC.
I don't think you'll find a cross-platform way of doing that.
On Windows what you need is GetTickCount (or maybe QueryPerformanceCounter and QueryPerformanceFrequency for a high resolution timer). I don't have experience with that on Linux, but a search on Google gave me clock_gettime.
Wall clock time can be calculated with the time() call.
If you have a network connection, you can always acquire the time from an NTP server. This will obviously not be affected in any way by the local clock.
/proc/uptime on linux maintains the number of seconds that the system has been up (and the number of seconds it has been idle), which should be unaffected by changes to the clock as it's maintained by the system interrupt (jiffies / HZ). Perhaps windows has something similar?
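Reading that file could be sketched as follows, assuming the usual Linux layout of two decimal numbers on one line (uptime, then idle time, both in seconds). The function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Parses one line in /proc/uptime format: "<uptime> <idle>".
 * Returns 1 on success and 0 if the line does not match. */
int parse_uptime_line(const char *line, double *uptime_s, double *idle_s)
{
    assert(line != NULL && uptime_s != NULL && idle_s != NULL);
    return sscanf(line, "%lf %lf", uptime_s, idle_s) == 2;
}

/* Returns the system uptime in seconds, or -1.0 if /proc/uptime
 * cannot be read or parsed (for example on a non-Linux system). */
double system_uptime_seconds(void)
{
    char line[128];
    double up, idle;
    FILE *f = fopen("/proc/uptime", "r");

    if (f == NULL)
        return -1.0;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1.0;
    }
    fclose(f);
    return parse_uptime_line(line, &up, &idle) ? up : -1.0;
}
```

A program could call system_uptime_seconds() once at startup and subtract that value from later readings to get its own elapsed time, independent of the wall-clock setting.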