I am currently using GetThreadTimes to tell me how much time I spend in my application's event loop.
I wonder how this will be affected by hibernating. Is hibernation time reported at all? Or perhaps as system time? Is the behavior the same on all versions of Windows?
Note: I asked the same question for POSIX here.
No, hibernation time is not reported.
How could it be?
When the computer is hibernated it is actually powered off: no timer is ticking and no counting takes place.
The timings you see with GetThreadTimes are computed at each tick (i.e. interrupt) of the system timer¹.
I made a small C program to test this.
It logs the timings every 1000 ms; I ran it and hibernated my laptop (at around second 9).
No gaps show up in the counting: as expected, each line is about 1000 ms after the previous one.
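For illustration only, here is a minimal sketch of such a test (not the original program): it busy-loops for roughly a second so that user time actually accumulates, then prints the thread times reported by GetThreadTimes.

#include <windows.h>
#include <stdio.h>

/* Convert a FILETIME (100-ns units) to milliseconds. */
static unsigned long long ft_to_ms(FILETIME ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000ULL;
}

int main(void)
{
    FILETIME creation, exit_time, kernel, user;
    for (;;) {
        /* Burn roughly one second of CPU so user time advances. */
        DWORD start = GetTickCount();
        while (GetTickCount() - start < 1000)
            ;
        GetThreadTimes(GetCurrentThread(), &creation, &exit_time, &kernel, &user);
        printf("kernel: %llu ms  user: %llu ms\n", ft_to_ms(kernel), ft_to_ms(user));
    }
}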
So hibernation time counts neither as system time nor as user time.
I don't know if this is consistent across all old versions of Windows, especially the ones where hibernation meant something different, but you can expect it to be consistent across all the versions that matter.
¹ The HPET, the LAPIC timer, or the good old PIT, whichever is available.
I am currently trying to talk to a piece of hardware in userspace (underneath the hood, everything is using the spidev kernel driver, but that's a different story).
The hardware will tell me that a command has been completed by indicating so with a special value in a register, that I am reading from. The hardware also has a requirement to get back to me in a certain time, otherwise the command has failed. Different commands take different times.
As a result, I am implementing a way to set a timeout and then check for that timeout using clock_gettime(). In my "set" function, I take the current time and add the time interval I should wait for (usually anywhere from a few ms to a couple of seconds). I then store this value for later use.
In my "check" function, I once again get the current time and compare it against the time I have saved. This seems to work as I had hoped.
Given my use case, should I be using CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW? I'm assuming CLOCK_MONOTONIC_RAW is better suited, since the intervals I am checking are short. I am worried that such a short interval might land on a system-wide outlier during which NTP is doing a lot of adjusting. Note that my target system is only Linux kernels 4.4 and newer.
Thanks in advance for the help.
Edited to add: given my use case, I need "wall clock" time, not CPU time. That is, I am checking to see if the hardware has responded in some wall clock time interval.
References:
Rutgers Course Notes
What is the difference between CLOCK_MONOTONIC & CLOCK_MONOTONIC_RAW?
Elapsed Time in C Tutorial
I have been reading about the difference between CLOCK_REALTIME and CLOCK_MONOTONIC:
Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?
CLOCK_REALTIME has discontinuities in time and can jump forwards as well as backwards: is that a bug in this clock? How could a clock that gives inconsistent time be reliable?
Despite its imperfections, CLOCK_REALTIME should be the system's best estimate of the current UTC or civil time. It's the basis for the system's ability to display the same time you'd see if you looked at your watch, or a clock on the wall, or your cell phone, or listened to a time broadcast on a radio station, etc. (The display does involve a conversion from UTC to local time; more on this later.)
But if CLOCK_REALTIME is going to match the UTC time out there in the real world, there are at least two pretty significant issues:
What if someone accidentally sets the clock on your computer wrong? They're going to have to fix it, and the fix might involve a time jump. Pretty much no way around that, especially if the error is large (like, hours or days).
Most computers unfortunately have no way of representing leap seconds. Therefore, when there's a leap second out in the real world, most computer clocks have to jump a little.
So when you read that CLOCK_REALTIME might have discontinuities, might jump forwards as well as backwards, that's not a bug, it's a feature: CLOCK_REALTIME must have those possibilities, if it's to cope with the real world with leap seconds and occasionally-wrong clocks.
So if you're writing code which is supposed to work with times matching those in the real world, CLOCK_REALTIME is what you want, warts and all. Ideally, though, you'll write your code in such a way that it behaves reasonably gracefully (does not crash or do something bizarre) if, once in a while, the system clock jumps forwards or backwards for some reason.
As you probably know from the other question you referenced, CLOCK_MONOTONIC is guaranteed to always step forward at exactly one second per second, with no jumps or discontinuities, but the absolute value of the clock doesn't mean much. If the CLOCK_MONOTONIC value is 13:05, that doesn't mean it's just after one in the afternoon, it typically means that the computer has been up and running for 13 hours and 5 minutes.
So if all you're interested in is relative times, CLOCK_MONOTONIC is fine. In particular, if you want to time how long something took, by subtracting the start time from the end time, it's preferable to use CLOCK_MONOTONIC values for this, since they won't give you a wrong answer if there was some kind of a time jump (that would have affected CLOCK_REALTIME) in between.
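As an illustrative sketch, the subtraction typically looks like this:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* ... the work being timed ... */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.9f s\n", elapsed);
    return 0;
}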
Or, in summary, as people said in the comments thread, CLOCK_REALTIME is what you need for absolute time, while CLOCK_MONOTONIC is better for relative time.
Now, a few more points.
As mentioned, CLOCK_REALTIME is not quite "wall time", because it actually deals in UTC. It uses the famous (infamous?) Unix/Posix representation of UTC seconds since 1970. For example, a CLOCK_REALTIME value of 1457852399 translates to 06:59:59 UTC on March 13, 2016. Where I live, five hours west of Greenwich, that corresponds to 01:59:59 local time. But one second later was not 2:00 in the morning for me! In fact, 1457852399 + 1 = 1457852400 corresponds to 03:00:00 Eastern time, because that's when Daylight Saving Time kicked in here.
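You can reproduce that arithmetic with gmtime and localtime; this sketch assumes the TZ environment variable is set to America/New_York so that the local rendering shows the DST jump:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t t = 1457852399;                       /* the value from the text */
    char buf[64];

    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    puts(buf);                                   /* 2016-03-13 06:59:59 UTC */

    /* With TZ=America/New_York this prints 01:59:59 EST, and for t + 1 it
     * prints 03:00:00 EDT, because that is when DST kicked in. */
    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %Z", localtime(&t));
    puts(buf);
    return 0;
}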
I suggested that if your clock was wrong, a time jump was pretty much the only way to fix it, but that's not quite true. If your clock is only slightly off, it's possible to correct it by "slewing" the time gradually (by changing the clock frequency slightly) so that after a few minutes or hours it will have drifted to the correct time without a jump. That's what NTP tries to do, although depending on its configuration it may only be willing to do that for errors that are pretty small.
I said that CLOCK_MONOTONIC was typically the time the computer has been up and running. That's not guaranteed by the standard; all the standard says is that CLOCK_MONOTONIC counts time since some arbitrary timepoint. On systems that do implement CLOCK_MONOTONIC as the time the system has been up, there can be two interpretations: is it time since boot, or the time the system has been up and running (that is, minus any time it was asleep or suspended)? On many systems, there's yet another clock, CLOCK_BOOTTIME, that counts time since boot (whether up or suspended), while CLOCK_MONOTONIC counts only time the system was up and running.
I said, "CLOCK_MONOTONIC is guaranteed to always step forward at exactly one second per second", but that may not be strictly correct, either. If your computer is in the middle of a time-slewing operation, trying to gradually correct an absolute time error, it may actually be the case that CLOCK_MONOTONIC is temporarily stepping at 1.001 seconds per second, or 0.999 seconds per second, or something like that. The discrepancy will usually be quite small, but in case it matters to you, some systems have yet other clock types you can use, such as CLOCK_MONOTONIC_RAW, that are supposed to be free of such perturbations.
Finally, if you want to track proper time, and you want to avoid jumps or discontinuities at leap seconds, you've got a problem, because of the poor handling of leap seconds in traditional Unix/Linux (and Windows, and all other) computer systems. Under recent (4.x?) Linux kernels, there's a CLOCK_TAI which may help. Some experimental systems may implement yet another clock, CLOCK_UTC, which handles UTC time with leap seconds properly. Both of those have some other costs, though, and you'd have to really know what you were doing to use them effectively, at least with today's level of support. See the LEAPSECS mailing list for more information.
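As a small sketch of the CLOCK_TAI idea (Linux-specific; the printed offset will be 0 unless the kernel has been given the TAI-UTC offset, e.g. by a leap-second-aware NTP setup):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec tai, utc;
    clock_gettime(CLOCK_TAI, &tai);
    clock_gettime(CLOCK_REALTIME, &utc);
    /* TAI - UTC is 37 s as of 2017 if the kernel knows about leap seconds. */
    printf("TAI - UTC = %ld s\n", (long)(tai.tv_sec - utc.tv_sec));
    return 0;
}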
If I understand correctly, when you launch a CUDA kernel asynchronously, it may begin execution immediately or it may wait for previous asynchronous calls (transfers, kernels, etc) to complete first. (I also understand that kernels can run concurrently in some cases, but I want to ignore that for now).
How can I find out the time between launching a kernel ("queuing") and when it actually begins execution? In fact, I really just want to know the average "queued time" for all launches in a single run of my program (generally in the tens or hundreds of thousands of kernel launches).
I can easily calculate the average execution time per kernel with events (~500 us). I tried to simulate this: I logged the result of clock() every time a kernel was launched, with the idea that I could then determine how long the launch queue was when each kernel was launched. But clock() does not have high enough precision (0.01 s): sometimes as many as 60 kernels appear to be launched at a single time, when of course in reality many are not.
Rather than clock(), use QueryPerformanceCounter, which is driven by a high-resolution hardware counter.
Code for QueryPerformanceCounter
Secondly, the profiling tool (Visual Profiler) only measures serial launches [see page 24] and [see post number 3].
Thus the best options are to (1) use QueryPerformanceCounter (or the Visual Profiler) so that you get an accurate measurement of a single launch, and (2) use QueryPerformanceCounter to get the timing of multiple launches and observe whether the timing results suggest that asynchronous launching took place.
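A rough sketch of the host-side timestamping, assuming a hypothetical kernel my_kernel and using the real QueryPerformanceCounter/QueryPerformanceFrequency APIs:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, before, after;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&before);
    /* my_kernel<<<grid, block>>>(args);  hypothetical asynchronous launch */
    QueryPerformanceCounter(&after);

    /* Host-side cost of the launch call itself, in microseconds; log one of
     * these per launch and compare against the event-measured execution time. */
    double launch_us = (double)(after.QuadPart - before.QuadPart)
                       * 1e6 / (double)freq.QuadPart;
    printf("launch call took %.3f us\n", launch_us);
    return 0;
}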
I have a C program with some functions and I need to find out the execution time of each function. I have tried using gettimeofday and rdtsc, but I suspect that because it is a multi-core system the reported time includes the switching time between the processors. I want it serialized. So could somebody give me an idea of how I should calculate the time, or at least tell me the syntax of rdtscp?
P.S. please reply as soon as possible
Thanks :)
It's a little impractical to expect nanosecond resolution.
You can't add code just to output the execution times of functions without increasing execution time. When you take the code out, the timing changes.
In practice, this kind of measurement is made by observing the CPU timing signals on an oscilloscope (or logic analyser).
If you have multiple cores, then the CPU timestamp counter won't be stable between them, so set the thread affinity to keep the thread on one core. You might also want to measure the CPU time used by the process or thread with clock_gettime(CLOCK_PROCESS_CPUTIME_ID) (or CLOCK_THREAD_CPUTIME_ID). Read the note for SMP systems in the man page for that function.
Both of these will affect the timing of the program, so perform multiple iterations of whatever you are benchmarking, and don't call the timing functions too often, to try and mitigate this.
There should be some way to set processor affinity to tell the operating system to only run that thread on a particular core.
On Windows there is the SetThreadAffinityMask call; I imagine there is a similar function on Linux, although I don't know what it is called.
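For reference, the Linux counterpart is sched_setaffinity(2) (or pthread_setaffinity_np for a specific thread); a minimal sketch pinning the caller to core 0:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to core 0, the rough equivalent of
 * SetThreadAffinityMask on Windows. Core 0 is chosen for illustration only. */
int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    puts("pinned to core 0");
    return 0;
}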
You could boot your dual core system to use one core only using the following kernel parameter:
maxcpus=1
But the measured time will still include process context switching and thus depend on the activity of the other processes on the system. Are you interested in the elapsed execution time, or in the CPU time needed to execute your task?
Mate, I'm not sure about this, but even if you're on a dual core, unless the program is threaded it will only run in one thread (meaning one core), so it should not involve any time spent switching between processors; I believe there is no such thing...
Pavium is correct: the only way to get decent timing at this resolution is with an oscilloscope and toggling GPIO pins.
Bear in mind that this is all a bit academic anyway: I suppose you are running with an operating system, etc, so there is no way to get a straight run at the hardware.
You really need to look at the reason you want this measurement. Is it a performance benchmark for some code? You could try running the code many thousands of times and gathering some statistics. For this kind of approach I would recommend you read Zed Shaw's diatribe to make sure the numbers aren't fooling you.
Precise performance measurement was impossible until Linux kernel 2.6.31. That kernel added a new interface for accessing the performance counters of the CPU and, IMHO, fixes to the time accounting in the scheduler.
Unfortunately I don't have more details, but maybe it is a starting point for further searching. I'm just adding this because nobody mentioned it before.
Use the struct timespec structure and the clock_gettime function as follows to obtain the execution time of the code with nanosecond precision:
#include <time.h>

struct timespec start, end;

clock_gettime(CLOCK_REALTIME, &start);
/* Do something */
clock_gettime(CLOCK_REALTIME, &end);
Each reading can be converted to a single nanosecond count as ((uint64_t)start.tv_sec * 1000000000ULL) + (uint64_t)start.tv_nsec; subtract the start value from the end value to get the elapsed time.
Moreover, I have used this with multithreaded code as well.
I hope this helps you obtain the execution time you need with nanosecond precision.
I am using QueryPerformanceCounter to time some code. I was shocked when the code starting reporting times that were clearly wrong. To convert the results of QPC into "real" time you need to divide by the frequency returned from QueryPerformanceFrequency, so the elapsed time is:
Time = (QPC.end - QPC.start)/QPF
After a reboot, the QPF frequency changed from 2.7 GHz to 4.1 GHz. I do not think that the actual hardware frequency changed as the wall clock time of the running program did not change although the time reported using QPC did change (it dropped by 2.7/4.1).
MyComputer->Properties shows:
Intel(R) Pentium(R) 4 CPU 2.80 GHz; 4.11 GHz; 1.99 GB of RAM; Physical Address Extension
Other than this, the system seems to be working fine.
I will try a reboot to see if the problem clears, but I am concerned that these critical performance counters could become invalid without warning.
Update:
While I appreciate the answers and especially the links, I do not have one of the affected chipsets, nor do I have a CPU clock that varies by itself. From what I have read, QPC and QPF are based on a timer in the PCI bus and not affected by changes in the CPU clock. The strange thing in my situation is that the FREQUENCY reported by QPF changed to an incorrect value, and this changed frequency was also reported in MyComputer -> Properties, which I certainly did not write.
A reboot fixed my problem (QPF now reports the correct frequency) but I assume that if you are planning on using QPC/QPF you should validate it against another timer before trusting it.
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install the AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).
From this answer.
You should always expect the core frequency to change on any CPU that supports a technology such as SpeedStep or Cool'n'Quiet. Wall-clock time is not affected; it uses an RTC. You should probably stop using the performance counters, unless you can tolerate a few (5-50) milliseconds' worth of occasional phase adjustments and are willing to do some math to perform that phase adjustment: continuously or periodically re-normalize your performance counter values based on the reported performance counter frequency and on low-resolution RTC time (you can do this on demand, or asynchronously from a high-resolution timer, depending on your application's ultimate needs).
You can try the Stopwatch class from .NET; it could help with your problem since it abstracts away all this low-level stuff.
Use the IsHighResolution property to see whether the timer is based on a high-resolution performance counter.
Note: On a multiprocessor computer, it does not matter which processor the thread runs on. However, because of bugs in the BIOS or the Hardware Abstraction Layer (HAL), you can get different timing results on different processors. To specify processor affinity for a thread, use the ProcessThread.ProcessorAffinity method.
Just a shot in the dark.
On my home PC I used to have "AI NOS" or something like that enabled in the BIOS. I suspect this screwed up the QueryPerformanceCounter/QueryPerformanceFrequency APIs because although the system clock ran at the normal rate, and normal apps ran perfectly, all full screen 3D games ran about 10-15% too fast, causing, for example, adjacent lines of dialog in a game to trip on each other.
I'm afraid you can't say "I shouldn't have this problem" when you're using QueryPerformance* - while the documentation states that the value returned by QueryPerformanceFrequency is constant, practical experimentation shows that it really isn't.
However you also don't want to be calling QPF every time you call QPC either. In practice we found that periodically (in our case once a second) calling QPF to get a fresh value kept the timers synchronised well enough for reliable profiling.
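A minimal sketch of that periodic refresh, assuming a hypothetical helper qpc_to_seconds that caches the frequency and re-reads it about once a second:

#include <windows.h>

/* Cache the QPF value and refresh it roughly once per second instead of
 * calling QueryPerformanceFrequency on every QPC conversion. */
static LARGE_INTEGER g_freq;
static ULONGLONG     g_lastRefresh;           /* GetTickCount64() value */

double qpc_to_seconds(LONGLONG startTicks, LONGLONG endTicks)
{
    ULONGLONG now = GetTickCount64();
    if (g_freq.QuadPart == 0 || now - g_lastRefresh >= 1000) {
        QueryPerformanceFrequency(&g_freq);   /* refresh about once a second */
        g_lastRefresh = now;
    }
    return (double)(endTicks - startTicks) / (double)g_freq.QuadPart;
}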
As has been pointed out as well, you need to keep all of your QPC calls on a single processor for consistent results. While this might not matter for profiling purposes (because you can just use ProcessorAffinity to lock the thread onto a single CPU), you don't want to do this for timing which is running as part of a proper multi-threaded application (because then you run the risk of locking a hard working thread to a CPU which is busy).
Especially don't arbitrarily lock to CPU 0, because you can guarantee that some other badly coded application has done that too, and then both applications will fight over CPU time on CPU 0 while CPU 1 (or 2 or 3) sits idle. Choose randomly from the set of available CPUs and you have at least a fighting chance of not being locked to an overloaded CPU.