I need to find out the time taken by a function in my application. The application is an MS Visual Studio 2005 solution, all C code.
I used the Windows API GetLocalTime(SYSTEMTIME *) to get the current system time before and after the function call whose time I want to measure.
But this has the shortcoming that its lowest resolution is only 1 ms; nothing below that, so I cannot get any time granularity in microseconds.
I know that time(), which gives the time elapsed since the epoch, has an even coarser resolution of 1 second (no milliseconds or microseconds).
1.) Is there any other Windows API that gives time in microseconds, which I can use to measure the time consumed by my function?
-AD
There are some other possibilities.
QueryPerformanceCounter and QueryPerformanceFrequency
QueryPerformanceCounter will return a "performance counter", which is actually a CPU-managed 64-bit counter that increments from 0 starting at computer power-on. The frequency of this counter is returned by QueryPerformanceFrequency. To get the time reference in seconds, divide the performance counter by the performance frequency. In Delphi:
function QueryPerfCounterAsUS: int64;
var
  perfFreq: int64;
begin
  // Convert the raw counter value into microseconds using the counter frequency.
  if QueryPerformanceCounter(Result) and
     QueryPerformanceFrequency(perfFreq)
  then
    Result := Round(Result / perfFreq * 1000000)
  else
    Result := 0;
end;
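Since the question is about plain C under Visual Studio, a minimal C sketch of the same approach might look like this (error handling omitted; %I64d is the MSVC spelling of the 64-bit format specifier):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, stop;
    LONGLONG elapsed_us;

    /* The counter frequency is fixed at boot, so it only needs to be read once. */
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    /* ... call the function you want to measure here ... */
    QueryPerformanceCounter(&stop);

    /* Convert elapsed ticks to microseconds; the difference is small, so
       multiplying before dividing keeps precision without overflowing. */
    elapsed_us = (stop.QuadPart - start.QuadPart) * 1000000 / freq.QuadPart;
    printf("elapsed: %I64d us\n", elapsed_us);
    return 0;
}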
On multiprocessor platforms, QueryPerformanceCounter should return consistent results regardless of the CPU the thread is currently running on. There are occasional problems, though, usually caused by bugs in hardware chips or BIOSes. Usually, patches are provided by motherboard manufacturers. Two examples from the MSDN:
Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2003 and in Windows XP
Performance counter value may unexpectedly leap forward
Another problem with QueryPerformanceCounter is that it is quite slow.
RDTSC instruction
If you can limit your code to one CPU (SetThreadAffinityMask), you can use the RDTSC assembler instruction to query the performance counter directly from the processor.
function CPUGetTick: int64;
asm
  dw 310Fh // rdtsc opcode (0F 31h); the result is returned in EDX:EAX
end;
RDTSC result is incremented with same frequency as QueryPerformanceCounter. Divide it by QueryPerformanceFrequency to get time in seconds.
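In C under Visual Studio, the same idea can be sketched with the __rdtsc() intrinsic instead of inline assembly; a rough example, pinning the thread so both readings come from the same core's counter:

#include <windows.h>
#include <intrin.h>   /* __rdtsc() intrinsic */
#include <stdio.h>

int main(void)
{
    unsigned __int64 start, cycles;
    DWORD_PTR old_mask;

    /* Pin the thread to CPU 0 so both readings come from the same TSC. */
    old_mask = SetThreadAffinityMask(GetCurrentThread(), 1);

    start = __rdtsc();
    /* ... function under test ... */
    cycles = __rdtsc() - start;

    SetThreadAffinityMask(GetCurrentThread(), old_mask);   /* restore affinity */

    printf("elapsed: %I64u cycles\n", cycles);
    return 0;
}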
QueryPerformanceCounter is much slower than RDTSC because it must take into account multiple CPUs and CPUs with variable frequency. From Raymond Chen's blog:
(QueryPerformanceCounter) counts elapsed time. It has to, since its value is
governed by the QueryPerformanceFrequency function, which returns a number
specifying the number of units per second, and the frequency is spec'd as not
changing while the system is running.
For CPUs that can run at variable speed, this means that the HAL cannot
use an instruction like RDTSC, since that does not correlate with elapsed time.
timeGetTime
timeGetTime belongs to the Win32 multimedia functions. It returns time in milliseconds with 1 ms resolution, at least on modern hardware. It doesn't hurt to call timeBeginPeriod(1) before you start measuring time and timeEndPeriod(1) when you're done.
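A rough C sketch (link with winmm.lib; the timeBeginPeriod/timeEndPeriod pair requests 1 ms scheduler resolution and restores it afterwards):

#include <windows.h>
#include <mmsystem.h>   /* timeGetTime, timeBeginPeriod; link with winmm.lib */
#include <stdio.h>

int main(void)
{
    DWORD start, elapsed_ms;

    timeBeginPeriod(1);                /* request 1 ms timer resolution */
    start = timeGetTime();
    /* ... function under test ... */
    elapsed_ms = timeGetTime() - start;
    timeEndPeriod(1);                  /* restore the previous resolution */

    printf("elapsed: %lu ms\n", (unsigned long)elapsed_ms);
    return 0;
}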
GetLocalTime and GetSystemTime
Before Vista, both GetLocalTime and GetSystemTime return current time with millisecond precision, but they are not accurate to a millisecond. Their accuracy is typically in the range of 10 to 55 milliseconds. (Precision is not the same as accuracy)
On Vista, GetLocalTime and GetSystemTime both work with 1 ms resolution.
You can try to use clock(), which will provide the number of "ticks" between two points. A "tick" is 1/CLOCKS_PER_SEC of a second (CLOCKS_PER_SEC is 1000 in the Microsoft CRT).
As a side note, you can't use clock() to determine the actual time of day, only the number of ticks between two points in your program.
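For example, a minimal sketch of interval timing with clock():

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start, end;
    double elapsed;

    start = clock();
    /* ... code being timed ... */
    end = clock();

    /* CLOCKS_PER_SEC converts ticks to seconds (1000 in the Microsoft CRT). */
    elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    printf("elapsed: %f seconds\n", elapsed);
    return 0;
}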
One caution on multiprocessor systems:
from http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL). To specify processor affinity for a thread, use the SetThreadAffinityMask function.
Al Weiner
On Windows you can use the 'high performance counter API'. Check out QueryPerformanceCounter and QueryPerformanceFrequency for the details.
The man page for clock_gettime() describes CLOCK_MONOTONIC_COARSE as:
A faster but less precise version of CLOCK_MONOTONIC. Use when you need very fast, but
not fine-grained timestamps.
What does it mean for one to be a "version of" the other?
Can I validly compare one to the other, assuming I truncate a CLOCK_MONOTONIC value to the same precision as the coarse one?
Here is the man page that lists the different "versions" of Posix/Linux clocks:
https://linux.die.net/man/2/clock_gettime
Sufficiently recent versions of glibc and the Linux kernel support the
following clocks:
CLOCK_REALTIME
System-wide clock that measures real (i.e., wall-clock) time.
Setting this clock requires appropriate privileges. This clock is
affected by discontinuous jumps in the system time (e.g., if the
system administrator manually changes the clock), and by the
incremental adjustments performed by adjtime(3) and NTP.
CLOCK_REALTIME_COARSE (since Linux 2.6.32; Linux-specific)
A faster but less precise version of CLOCK_REALTIME. Use when you
need very fast, but not fine-grained timestamps.
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since some
unspecified starting point. This clock is not affected by
discontinuous jumps in the system time (e.g., if the system
administrator manually changes the clock), but is affected by the
incremental adjustments performed by adjtime(3) and NTP.
CLOCK_MONOTONIC_COARSE (since Linux 2.6.32; Linux-specific)
A faster but less precise version of CLOCK_MONOTONIC. Use when you
need very fast, but not fine-grained timestamps.
CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments or the
incremental adjustments performed by adjtime(3).
CLOCK_BOOTTIME (since Linux 2.6.39; Linux-specific)
Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. This allows applications to get a
suspend-aware monotonic clock without having to deal with the
complications of CLOCK_REALTIME, which may have discontinuities if the
time is changed using settimeofday(2).
CLOCK_PROCESS_CPUTIME_ID
High-resolution per-process timer from the CPU.
CLOCK_THREAD_CPUTIME_ID
Thread-specific CPU-time clock.
As you can see above, CLOCK_MONOTONIC_COARSE was introduced in Linux 2.6.32. Here is the rationale (and the specific source patch):
https://lwn.net/Articles/347811/
After talking with some application writers who want very fast, but not
fine-grained timestamps, I decided to try to implement a new clock_ids
to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
which returns the time at the last tick. This is very fast as we don't
have to access any hardware (which can be very painful if you're using
something like the acpi_pm clocksource), and we can even use the vdso
clock_gettime() method to avoid the syscall. The only trade off is you
only get low-res tick grained time resolution.
This isn't a new idea, I know Ingo has a patch in the -rt tree that
made the vsyscall gettimeofday() return coarse grained time when the
vsyscall64 sysctrl was set to 2. However this affects all applications
on a system.
With this method, applications can choose the proper speed/granularity
trade-off for themselves.
thanks
-john
ADDENDUM:
Q: What use cases might benefit from using CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE?
A: In Linux 2.6.32 time frame (2010-2011), "...application workloads (especially databases and financial service applications) perform extremely frequent gettimeofday or similar time function calls":
Redhat Enterprise: 2.6. gettimeofday speedup
Many application workloads (especially databases and financial service
applications) perform extremely frequent gettimeofday or similar time
function calls. Optimizing the efficiency of this calls can provide
major benefits.
CLOCK_MONOTONIC_COARSE uses the same timebase as CLOCK_MONOTONIC (whereas CLOCK_MONOTONIC_RAW specifically does not). Specifically, they both use wall_to_monotonic to convert a value derived from tk's xtime. RAW uses a completely different time source.
Remember that CLOCK_MONOTONIC_COARSE is only updated once per tick (so usually about 1 ms, but ask clock_getres() to be sure). If that accuracy is good enough, then by all means subtract your clock values.
The short answer is YES (at least for Linux!), you can compare them, compute delays, etc...
The precision would be that of the less precise, most probably COARSE one.
See this short program:
#include <time.h>
#include <stdio.h>

int main(void)
{
    struct timespec res;
    int ret;

    /* Query and print the resolution of both clocks. */
    ret = clock_getres(CLOCK_MONOTONIC, &res);
    if (0 != ret)
        return ret;
    printf("CLOCK_MONOTONIC resolution is: %ld sec, %ld nsec\n",
           (long)res.tv_sec, (long)res.tv_nsec);

    ret = clock_getres(CLOCK_MONOTONIC_COARSE, &res);
    if (0 != ret)
        return ret;
    printf("CLOCK_MONOTONIC_COARSE resolution is: %ld sec, %ld nsec\n",
           (long)res.tv_sec, (long)res.tv_nsec);

    return 0;
}
It returns (Ubuntu 20.04, 64-bit, kernel 5.4):
CLOCK_MONOTONIC resolution is: 0 sec, 1 nsec
CLOCK_MONOTONIC_COARSE resolution is: 0 sec, 4000000 nsec
So MONOTONIC has nanosecond precision, and COARSE has 4-millisecond precision.
Unlike the comment above, I would on the contrary recommend using the COARSE version whenever the precision you need allows it.
Calls to the clock are so frequent in user programs that they have a place in the vDSO.
When you use the COARSE versions, you make exactly zero system calls, and a reading is as fast as your machine can run a few instructions. Thanks to the vDSO, your program stays entirely in userland during the call.
With the other clock types, you will make some system calls, and potentially access hardware, so at least a switch to kernel mode and back to userland.
This of course has zero importance if your program only makes a dozen calls, but it can be a huge time saver if, on the contrary, the program relies heavily on the clock. That is why the vDSO is there in the first place: performance!
Define first what accuracy you need for your timings. Is a second enough, or do you need milliseconds, microseconds, etc.?
Keep in mind, unless you are tinkering with RT systems, that time is a relative value! Imagine you called clock_gettime, and immediately after returning your thread gets interrupted for some kernel business: what accuracy did you really get? That is exactly the famous question that defeated HAL in 2001: A Space Odyssey: "what time is it?".
From that you can derive what type of clock you need.
You can mix MONOTONIC and the COARSE version of it and still compute delays or compare (that was the original question). But of course the precision is that of the less precise.
The monotonic clocks are best suited for measuring delays and making comparisons, since they don't depend on the real time (what your watch displays); they don't change when the user changes the actual time.
On the contrary, if you need to display at what time (meaningful to the user) an event occurred, don't use a monotonic clock!
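As a rough sketch of computing such a delay (assuming Linux; the result is only as precise as the COARSE reading):

#include <stdio.h>
#include <time.h>

/* Difference in nanoseconds between two timespec values (b - a). */
static long long diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

int main(void)
{
    struct timespec start, end;

    /* Both clocks share the same timebase, so mixing them is valid. */
    clock_gettime(CLOCK_MONOTONIC_COARSE, &start);
    /* ... work ... */
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("elapsed: %lld ns (coarse-limited)\n", diff_ns(&start, &end));
    return 0;
}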
gettimeofday() is hardware dependent, relying on the RTC.
Can someone suggest how we can avoid using it in application programming?
How can we approach the use of system ticks?
Thanks in advance!
To get time in ticks you might like to use times().
However, it is not clear whether those ticks are measured from boot time.
From man times:
RETURN VALUE
times() returns the number of clock ticks that have elapsed since an
arbitrary point in the past. [...]
[...]
NOTES
On Linux, the "arbitrary point in the past" from which the return
value of times() is measured has varied across kernel versions. On
Linux 2.4 and earlier this point is the moment the system was booted.
Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429
million) seconds before system boot time. This variability across
kernel versions (and across UNIX implementations), combined with the
fact that the returned value may overflow the range of clock_t, means
that a portable application would be wise to avoid using this value.
To measure changes in elapsed time, use clock_gettime(2) instead.
Reading this, using clock_gettime() with the CLOCK_BOOTTIME timer might be the more secure and more portable way to go. Whether this function and/or timer is available on a system without an RTC, I'm not sure; others are encouraged to clarify this.
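As a sketch (assuming a Linux system whose headers define CLOCK_BOOTTIME; at run time it falls back to CLOCK_MONOTONIC if the call fails):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;

    /* CLOCK_BOOTTIME counts monotonically from boot, including time spent
       suspended (Linux 2.6.39+); fall back to CLOCK_MONOTONIC if it fails. */
    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0 &&
        clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 1;

    printf("since boot: %ld.%09ld seconds\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}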
I know QueryPerformanceCounter() can be used for timing functions. I want to know:
1-Can I increase the resolution of the timer by over-clocking the CPU (so it ticks faster)?
2-Basically, what makes some timers more precise than others (e.g., QueryPerformanceCounter() is more precise than GetTickCount())? If there is a single crystal oscillator on the motherboard, why are some timers coarser than others?
QueryPerformanceCounter has very high resolution - normally less than one nanosecond. I don't see why you'd like to increase it. Overclocking will increase it, but it seems like a very weak reason for overclocking.
QueryPerformanceCounter is very accurate, but somewhat expensive and not very convenient.
a. It's expensive because it uses the expensive rdtsc instruction. Faster timers can just read an integer from memory. This integer needs to be updated, and we don't want to do it too often (1000 times a second is reasonable), so we get a very cheap timer, with low precision. That's basically GetTickCount.
b. It's inconvenient because it uses units which change between computers. Sometimes it will be nanoseconds, sometimes half-nano, or other values. It makes it harder to calculate with.
c. Another source of inconvenience is that it returns very large numbers, which may overflow when you try to do math with them, so you need to be careful.
The timing source for QPC is machine dependent. It is typically picked up from a frequency available somewhere in the chipset. Whether overclocking the CPU is going to affect it is highly dependent on your motherboard design. The simplest way is to just try it; use QueryPerformanceFrequency to see the effect.
GetTickCount is driven from an entirely different timer source, the signal that also generates the clock interrupt. It is not very precise, normally 1/64 of a second, but it is highly accurate. Your machine contacts a time server from time to time to recalibrate the clock and adjust the clock correction factor, which makes it accurate to about a second over an entire year. QPC is very precise, but not nearly as accurate; use it only to time short intervals.
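For example, a minimal sketch that just prints the reported frequency, so you can compare runs with and without overclocking:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq;

    /* The value is fixed at boot for a given configuration; compare runs
       with and without overclocking to see whether it changes. */
    QueryPerformanceFrequency(&freq);
    printf("QPC frequency: %I64d ticks per second\n", freq.QuadPart);
    return 0;
}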
1 - Yes. Internally, one of the better timers is rdtsc, which gives you the clock (cycle) value. Combining this with information from the cpuid instruction gives you the time.
2 - The other timers rely upon various timing sources, such as the 8253 timer, for instance.
QPF is a wrapper added by Microsoft on top of what rdtsc provides. Read this article for more info:
http://www.strchr.com/performance_measurements_with_rdtsc
I want to get the overall total CPU usage for an application in C, like the total CPU usage shown in the Task Manager.
I want to know, for both Windows and Linux, the current total CPU utilization by all processes, as seen in the Task Manager.
This is platform-specific:
In Windows, you can use the GetProcessTimes() function.
In Linux, you can actually just use clock().
These can be used to measure the amount of CPU time taken between two time intervals.
EDIT :
To get the CPU consumption (as a percentage), you will need to divide the total CPU time by the number of logical cores the OS sees, and then by the total wall-clock time:
% CPU usage = 100 * (CPU time) / (# of cores) / (wall time)
Getting the # of logical cores is also platform-specific:
Windows: GetSystemInfo()
Linux: sysconf(_SC_NPROCESSORS_ONLN)
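A rough Linux-flavoured sketch of that calculation (on Windows you would use GetProcessTimes() and GetSystemInfo() instead):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec wall_start, wall_end;
    clock_t cpu_start;
    long cores;
    double cpu_sec, wall_sec;

    cores = sysconf(_SC_NPROCESSORS_ONLN);
    cpu_start = clock();
    clock_gettime(CLOCK_MONOTONIC, &wall_start);

    /* ... workload being measured ... */

    clock_gettime(CLOCK_MONOTONIC, &wall_end);
    cpu_sec  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    wall_sec = (wall_end.tv_sec - wall_start.tv_sec)
             + (wall_end.tv_nsec - wall_start.tv_nsec) / 1e9;

    /* CPU usage as a percentage of the machine's total capacity. */
    printf("CPU usage: %.1f%%\n", 100.0 * cpu_sec / cores / wall_sec);
    return 0;
}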
Under POSIX, you want getrusage(2)'s ru_utime field. Use RUSAGE_SELF for just the calling process, and RUSAGE_CHILDREN for all terminated and wait(2)ed-upon children. Linux also supports RUSAGE_THREAD for just the calling thread. Use ru_stime if you want the system time, which can be summed with ru_utime for total time actively running (not wall time).
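For example, a minimal getrusage() sketch:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 1;

    /* ru_utime is user CPU time, ru_stime is system CPU time. */
    printf("user:   %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("system: %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}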
It is usually operating system specific.
You could use the clock function, returning a clock_t (some integer type, perhaps long). On Linux systems, CLOCKS_PER_SEC is 1,000,000, so it effectively measures the CPU time in microseconds.
I'm looking for a lightweight timer to measure the timing of a few sections of C code. This timer implementation shouldn't add to the overall program execution time.
Look at clock_gettime for POSIX-compliant platforms; you can do it yourself really easily by comparing one timestamp with one generated a little later.
Remember to use the CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID parameters to specify that you want CPU time taken just by that process (and its children) or thread, and not the wider, absolute, "wall" time.
An alternative on Windows might be GetProcessTimes.
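For instance, a small sketch with CLOCK_PROCESS_CPUTIME_ID:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;
    double elapsed;

    /* CPU time consumed by this process only, not wall-clock time. */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    /* ... section being timed ... */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

    elapsed = (end.tv_sec - start.tv_sec)
            + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("CPU time: %.9f s\n", elapsed);
    return 0;
}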
What is time?
Imagine your code took 3 milliseconds ... but it runs on 3 cores ... it used 2 milliseconds on core 1, 1.5 milliseconds on core 2, and 1.2 milliseconds on core 3, for a total of 4.7 milliseconds.
So ... is 3 milliseconds the same as 4.7 milliseconds?
Oh, and don't forget that those 4.7 milliseconds were in fact also used to filter incoming internet connections and to download the anti-virus database.
Use a profiler, and even then, don't trust the results :)
For POSIX, try gettimeofday() (obsolescent) or clock_gettime().
For Windows, apparently, you can use GetSystemTime().
The closest thing to profiling without observation interference is oprofile. But it can't directly measure intervals; it only gives you a statistical map of where the whole program (or whole system) is spending its time.
If you really want cheap interval timing, on x86 you can use the rdtsc instruction in inline asm.
static inline unsigned rdtsc(void)
{
    unsigned x;
    /* RDTSC writes the time-stamp counter into EDX:EAX; keep only the low
       32 bits (EAX) and tell the compiler that EDX is clobbered. */
    __asm__ __volatile__ ( "rdtsc" : "=a"(x) : : "edx" );
    return x;
}
Use this to save the timestamp before and after and take the difference. You could modify this code to save the full 64-bit result, but I opted just for the 32-bit result assuming you'll be timing intervals shorter than 4 billion cycles and don't want to waste time on 64-bit subtraction.
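For example (the unsigned subtraction gives the right interval as long as it is shorter than 2^32 cycles):

unsigned start = rdtsc();
/* ... code being timed ... */
unsigned cycles = rdtsc() - start;   /* elapsed cycles */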