I want to get the overall total CPU usage from a C application, like the total CPU usage shown in Task Manager.
I want to know, for Windows and Linux, the current total CPU utilization by all processes, as we see it in Task Manager.
This is platform-specific:
In Windows, you can use the GetProcessTimes() function.
In Linux, you can actually just use clock().
These can be used to measure the amount of CPU time consumed between two points in time.
EDIT :
To get the CPU consumption (as a percentage), you will need to divide the total CPU time by the # of logical cores that the OS sees, and then by the total wall-clock time:
% CPU usage = 100 * (CPU time) / (# of cores) / (wall time)
Getting the # of logical cores is also platform-specific:
Windows: GetSystemInfo()
Linux: sysconf(_SC_NPROCESSORS_ONLN)
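As a rough sketch of the calculation above on Linux, the following takes CPU time from clock(), wall time from clock_gettime(CLOCK_MONOTONIC), and the core count from sysconf(_SC_NPROCESSORS_ONLN). The helper names (cpu_usage_percent, wall_now, busy_work, measure_cpu_usage) are made up for illustration, and it covers only the calling process, not every process on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* (CPU time) / (# of cores) / (wall time), expressed as a percentage. */
double cpu_usage_percent(double cpu_seconds, double wall_seconds, long ncores)
{
    assert(wall_seconds > 0.0 && ncores > 0);
    return 100.0 * cpu_seconds / (double)ncores / wall_seconds;
}

/* Wall-clock seconds from a clock that is not affected by date changes. */
double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical workload: spins on the CPU for a short while. */
void busy_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 20000000UL; ++i)
        sink += i;
}

/* Runs work() and returns the CPU usage of this process while it ran. */
double measure_cpu_usage(void (*work)(void))
{
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1; /* fall back if the count is unavailable */

    double wall0 = wall_now();
    clock_t cpu0 = clock();
    work();
    clock_t cpu1 = clock();
    double wall = wall_now() - wall0;

    if (wall <= 0.0)
        return 0.0; /* interval too short to measure */
    return cpu_usage_percent((double)(cpu1 - cpu0) / CLOCKS_PER_SEC, wall, ncores);
}
```

For a single-threaded workload on an otherwise idle machine with N cores, measure_cpu_usage(busy_work) should land somewhere near 100/N.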
Under POSIX, you want getrusage(2)'s ru_utime field. Use RUSAGE_SELF for just the calling process, and RUSAGE_CHILDREN for all terminated and wait(2)ed-upon children. Linux also supports RUSAGE_THREAD for just the calling thread. Use ru_stime if you want the system time, which can be summed with ru_utime for the total time actively running (not wall time).
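A minimal sketch of reading those fields, assuming a POSIX system. The function names timeval_seconds and rusage_cpu_seconds are illustrative only, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Converts a struct timeval to seconds as a double. */
double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Stores user and system CPU seconds for `who` (RUSAGE_SELF or
 * RUSAGE_CHILDREN) and returns their sum, or -1.0 if getrusage() fails.
 */
double rusage_cpu_seconds(int who, double *user, double *sys)
{
    struct rusage ru;

    assert(user != NULL && sys != NULL);
    if (getrusage(who, &ru) != 0)
        return -1.0;

    *user = timeval_seconds(ru.ru_utime);
    *sys = timeval_seconds(ru.ru_stime);
    return *user + *sys;
}
```

Call it once before and once after the region of interest and subtract the two totals to get the CPU time that region consumed.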
It is usually operating system specific.
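You could use the clock function, which returns a clock_t (some arithmetic type, perhaps long). Divide the result by CLOCKS_PER_SEC to get seconds. On Linux systems CLOCKS_PER_SEC is 1000000, so the value is CPU time in microseconds, although the actual resolution is usually coarser.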
I got the following problem: I have to measure the time a program needs to be executed. A scalar version of the program works fine with the code below, but when using OpenMP, it works on my PC, but not on the resource I am supposed to use.
In fact:
scalar program rt 34s
openmp program rt 9s
That's my PC (everything working), compiled with Visual Studio.
The resource I have to use (I think Linux, compiled with gcc):
scalar program rt 9s
openmp program rt 9s (but the output appears almost immediately, so it should be 0-1s)
My guess is that it adds up the ticks of all cores, which is about the same amount, and divides them by the tick rate of a single core. My question is how to solve this, and whether there is a better way to display the run time in the console in C.
clock_t start, stop;
double t = 0.0;
start = clock(); /* keep the call outside assert so it survives NDEBUG */
assert(start != (clock_t)-1);
... code running
stop = clock();
t = (double)(stop - start) / CLOCKS_PER_SEC;
printf("Run time: %f\n", t);
To augment Mark's answer: DO NOT USE clock()
clock() is an awful misunderstanding from the old computer era, whose actual implementation differs greatly from platform to platform. Behold:
on Linux, *BSD, Darwin (OS X) -- and possibly other 4.3BSD descendants -- clock() returns the processor time (not the wall-clock time!) used by the calling process, i.e. the sum of each thread's processor time;
on IRIX, AIX, Solaris -- and possibly other SysV descendants -- clock() returns the processor time (again not the wall-clock time) used by the calling process AND all its terminated child processes for which wait, system or pclose was executed;
HP-UX doesn't even seem to implement clock();
on Windows clock() returns the wall-clock time (not the processor time).
In the descriptions above processor time usually means the sum of user and system time. This could be less than the wall-clock (real) time, e.g. if the process sleeps or waits for file IO or network transfers, or it could be more than the wall-clock time, e.g. when the process has more than one thread, actively using the CPU.
Never use clock(). Use omp_get_wtime() - it exists on all platforms, supported by OpenMP, and always returns the wall-clock time.
Converting my earlier comment to an answer in the spirit of doing anything for reputation ...
Use two calls to omp_get_wtime to get the wallclock time (in seconds) between two points in your code. Note that time is measured individually on each thread, there is no synchronisation of clocks across threads.
Your problem is clock. According to the C standard, it measures the time your process spent on the CPU, not wall clock time. This is what Linux does (it usually sticks to the standards), so the total CPU time for the sequential program and for the parallel program is the same, as it should be.
Windows deviates from that: there, clock returns wall clock time.
So use other time measurement functions. In standard C this would be time or, if you need more precision, timespec_get from the new C11 standard. For OpenMP there are other possibilities, as have already been mentioned.
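A small sketch of the timespec_get route next to clock(), assuming a C11 library. wall_seconds and cpu_seconds are hypothetical helper names.

```c
#include <assert.h>
#include <time.h>

/*
 * Wall-clock seconds since the epoch, via C11 timespec_get().
 * A double cannot hold full nanosecond detail at this magnitude, so
 * subtract the timespec fields directly if that matters.
 */
double wall_seconds(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC); /* timespec_get returns 0 on failure */
    (void)base;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Whatever clock() reports, in seconds: CPU time on Linux. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Taking both readings before and after the parallel region shows the effect described in the question: with several busy threads on Linux, the cpu_seconds() difference can be several times the wall_seconds() difference.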
Hypothetical Question.
I wrote a multithreaded program which creates 8 threads, processes the data on the different threads, and then finishes. I am also using a semaphore in the code. It gives me different execution times on different machines, which is obvious!
Execution time for same code:
On Intel(R) Core(TM) i3 CPU Machine: 36 sec
On AMD FX(tm)-8350 Eight-Core Processor Machine : 32 sec
On Intel(R) Core(TM) i5-2400 CPU Machine : 16.5 sec
So, my question is,
Is there any kind of setting/variable/command/switch i am missing which could be enabled in higher machine but not enabled in lower machine, which is making higher machine execution time faster? Or, is it the processor only, because of which the time difference is.
Any kind of help/suggestions/comments will be helpful.
Operating System: Linux (Centos5)
Multi-threading benchmarks should be performed with significant statistical sampling (e.g. around 50 experiments per machine). Furthermore, the environment in which the program runs is important too (e.g. whether Firefox was running at the same time or not).
Also, runtimes can vary depending on resource consumption. In other words, without a more complete picture of your experimental conditions, it's impossible to answer your question.
Some observations I have made from my personal experiments:
Huge memory consumption can alter the results depending on the swapping settings on the machine.
Two "identical" machines with the same OS installed under the same conditions can show different results.
When the total run time is short compared to 5 minutes, results appear pretty random.
etc.
I used to have a problem with time measurement: the time measured for the multithreaded version was larger than for the single-threaded one. I finally found that the mistake was measuring the time inside each thread and summing the results, instead of measuring around all the threads from outside. For example:
Wrong measure:
int main(void)
{
//create_thread();
//join_thread();
//sum the time
}
void thread(void *)
{
//measure time in thread
}
Right measure:
int main(void)
{
//record start time
//create_thread();
//join_thread();
//record end time
//calculate the diff
}
void thread(void *)
{
//do the work (no time measurement in the thread)
}
In section 3.9 of the classic APUE (Advanced Programming in the UNIX Environment), the author measures the user/system time consumed by his sample program (an I/O read/write program) as it runs with varying buffer sizes.
The result table looks roughly like this (all times are in seconds):
BUFF_SIZE  USER_CPU  SYSTEM_CPU  CLOCK_TIME      LOOPS
        1    124.89      161.65      288.64  103316352
      ...
      512      0.27        0.41        7.03     201789
      ...
I'm curious about and really wondering how to measure the USER/SYSTEM CPU time for a piece of program?
And in this example, what does the CLOCK TIME mean and how to measure it?
Obviously it isn't simply the sum of user CPU time and system CPU time.
You could easily measure the running time of a program using the time command under *nix:
$ time myprog
real 0m2.792s
user 0m0.099s
sys 0m0.200s
The real or CLOCK_TIME refers to the wall clock time, i.e. the time taken from the start of the program to its finish. It includes the time slices taken by other processes when the kernel context switches to them, as well as any time the process is blocked (on I/O events, etc.).
The user or USER_CPU refers to the CPU time spent in the user space, i.e. outside the kernel. Unlike the real time, it refers to only the CPU cycles taken by the particular process.
The sys or SYSTEM_CPU refers to the CPU time spent in the kernel space, (as part of system calls). Again this is only counting the CPU cycles spent in kernel space on behalf of the process and not any time it is blocked.
In the time utility, user and sys are calculated from either the times() system call or the resource usage reported by the wait family (wait3()/wait4()). The real time is usually calculated from the difference between two timestamps gathered with the gettimeofday() system call at the start and end of the program.
One more thing you might want to know is real != user + sys. On a multicore system the user or sys or their sum can quite easily exceed the real time.
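To get the same three numbers for a piece of code inside a program, times() can be called before and after it. This is only a sketch assuming a POSIX system; struct run_times, spin and measure_times are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/times.h>
#include <unistd.h>

struct run_times {
    double real; /* wall-clock seconds */
    double user; /* CPU seconds spent in user space */
    double sys;  /* CPU seconds spent in the kernel */
};

/* Hypothetical workload: spins on the CPU for a short while. */
void spin(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 20000000UL; ++i)
        sink += i;
}

/* Runs work() and fills *out; returns 0 on success, -1 on failure. */
int measure_times(void (*work)(void), struct run_times *out)
{
    struct tms t0, t1;
    long ticks = sysconf(_SC_CLK_TCK); /* times() ticks per second */

    assert(out != NULL);
    if (ticks <= 0)
        return -1;

    clock_t r0 = times(&t0);
    work();
    clock_t r1 = times(&t1);
    if (r0 == (clock_t)-1 || r1 == (clock_t)-1)
        return -1;

    out->real = (double)(r1 - r0) / (double)ticks;
    out->user = (double)(t1.tms_utime - t0.tms_utime) / (double)ticks;
    out->sys = (double)(t1.tms_stime - t0.tms_stime) / (double)ticks;
    return 0;
}
```

The tick rate is often only 100 per second, so short regions will come out as 0.00 or 0.01; this is a coarse tool in the same spirit as the time command.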
Partial answer:
Well, CLOCK_TIME is the same as the time shown by a clock, the time that passed in the so-called "real world".
One way to measure that is to use the POSIX function gettimeofday, which stores the time in the caller's struct timeval, containing a UNIX seconds field and a microseconds field (actual accuracy is often less). An example of using it in typical benchmark code (ignoring errors etc.):
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
do_operation_to_measure();
gettimeofday(&tv2, NULL);
// get difference, fix value if microseconds became negative
struct timeval tvdiff = { tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec };
if (tvdiff.tv_usec < 0) { tvdiff.tv_usec += 1000000; tvdiff.tv_sec -= 1; }
// print it
printf("Elapsed time: %ld.%06ld\n", (long)tvdiff.tv_sec, (long)tvdiff.tv_usec);
I'm using something like this to measure how long my program takes from start to finish:
int main(){
clock_t startClock = clock();
.... // many codes
clock_t endClock = clock();
printf("%ld", (endClock - startClock) / CLOCKS_PER_SEC);
}
My question is: since there are multiple processes running at the same time, if my process is idle for some amount of time x, will the clock keep ticking for my program during that time?
So basically my concern is: say 1000 clock cycles pass, but my process only uses 500 of them. Will I get 500 or 1000 from (endClock - startClock)?
Thanks.
This depends on the OS. On Windows, clock() measures wall-time. On Linux/Posix, it measures the combined CPU time of all the threads.
If you want wall-time on Linux, you should use gettimeofday().
If you want CPU-time on Windows, you should use GetProcessTimes().
EDIT:
So if you're on Windows, clock() will measure idle time.
On Linux, clock() will not measure idle time.
clock on POSIX measures cpu time, but it usually has extremely poor resolution. Instead, modern programs should use clock_gettime with the CLOCK_PROCESS_CPUTIME_ID clock-id. This will give up to nanosecond-resolution results, and usually it's really just about that good.
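A minimal sketch of that call, assuming a POSIX system that provides CLOCK_PROCESS_CPUTIME_ID. The wrapper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Reads the given POSIX clock and returns its value in seconds. */
double clock_seconds(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0); /* fails only for an unsupported clock id */
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds consumed so far by the whole process (all threads). */
double process_cpu_seconds(void)
{
    return clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
}

/* Resolution the system reports for that clock, or -1.0 on failure. */
double process_cpu_resolution(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0)
        return -1.0;
    return (double)res.tv_sec + (double)res.tv_nsec / 1e9;
}
```

Passing CLOCK_MONOTONIC to clock_seconds instead gives wall time, so the same helper covers both sides of the CPU-time versus wall-time distinction discussed above.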
As per the definition on the man page (in Linux),
The clock() function returns an approximation of processor time used
by the program.
It will try to be as accurate as possible but, as you say, some time (process switching, for example) is difficult to attribute to a process, so the numbers will be close but not perfect.
I need to find out the time taken by a function in my application. The application is an MS Visual Studio 2005 solution, all C code.
I used the Windows API GetLocalTime(SYSTEMTIME *) to get the current system time before and after the function call whose time I want to measure.
But this has the shortcoming that its finest resolution is only 1 ms. Nothing below that, so I cannot get any time granularity in microseconds.
I know that time(), which gives the time elapsed since the epoch, is coarser still, with a resolution of 1 second.
1.) Is there any other Windows API which gives time in microseconds which I can use to measure the time consumed by my function?
-AD
There are some other possibilities.
QueryPerformanceCounter and QueryPerformanceFrequency
QueryPerformanceCounter will return a "performance counter" which is actually a CPU-managed 64-bit counter that increments from 0 starting with the computer power-on. The frequency of this counter is returned by the QueryPerformanceFrequency. To get the time reference in seconds, divide performance counter by performance frequency. In Delphi:
function QueryPerfCounterAsUS: int64;
var
  perfFreq: int64; // counter ticks per second
begin
  if QueryPerformanceCounter(Result) and
     QueryPerformanceFrequency(perfFreq)
  then
    Result := Round(Result / perfFreq * 1000000) // no semicolon before else
  else
    Result := 0;
end;
On multiprocessor platforms, QueryPerformanceCounter should return consistent results regardless of the CPU the thread is currently running on. There are occasional problems, though, usually caused by bugs in hardware chips or BIOSes. Usually, patches are provided by motherboard manufacturers. Two examples from the MSDN:
Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2003 and in Windows XP
Performance counter value may unexpectedly leap forward
Another problem with QueryPerformanceCounter is that it is quite slow.
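Since the question is about C, here is a rough C counterpart of the Delphi routine above. The function name monotonic_microseconds is made up, and the non-Windows branch using clock_gettime(CLOCK_MONOTONIC) is an addition so that the sketch also compiles elsewhere; it is not part of the Windows API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Returns a monotonic timestamp in microseconds, or 0 on failure. */
int64_t monotonic_microseconds(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, count;

    if (!QueryPerformanceFrequency(&freq) || !QueryPerformanceCounter(&count))
        return 0;
    assert(freq.QuadPart > 0);

    /* Convert whole seconds and the remainder separately so the
       multiplication by 1000000 cannot overflow 64 bits. */
    return (count.QuadPart / freq.QuadPart) * 1000000
         + (count.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart;
#else
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
#endif
}
```

Take one reading before the function under test and one after; the difference is the elapsed time in microseconds.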
RDTSC instruction
If you can limit your code to one CPU (SetThreadAffinityMask), you can use the RDTSC assembler instruction to read the time-stamp counter directly from the processor.
function CPUGetTick: int64;
asm
dw 310Fh // rdtsc
end;
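The RDTSC result counts the processor's time-stamp ticks, which only advance at the rate reported by QueryPerformanceFrequency when the performance counter is itself backed by the TSC. In that case, divide it by QueryPerformanceFrequency to get time in seconds; otherwise, calibrate the TSC rate against a known interval first.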
QueryPerformanceCounter is much slower than RDTSC because it must take into account multiple CPUs and CPUs with variable frequency. From Raymond Chen's blog:
(QueryPerformanceCounter) counts elapsed time. It has to, since its value is
governed by the QueryPerformanceFrequency function, which returns a number
specifying the number of units per second, and the frequency is spec'd as not
changing while the system is running.
For CPUs that can run at variable speed, this means that the HAL cannot
use an instruction like RDTSC, since that does not correlate with elapsed time.
timeGetTime
timeGetTime belongs to the Win32 multimedia functions. It returns time in milliseconds with 1 ms resolution, at least on modern hardware. It doesn't hurt to call timeBeginPeriod(1) before you start measuring time and timeEndPeriod(1) when you're done.
GetLocalTime and GetSystemTime
Before Vista, both GetLocalTime and GetSystemTime return current time with millisecond precision, but they are not accurate to a millisecond. Their accuracy is typically in the range of 10 to 55 milliseconds. (Precision is not the same as accuracy)
On Vista, GetLocalTime and GetSystemTime both work with 1 ms resolution.
You can try to use clock(), which will provide the number of "ticks" between two points. A "tick" here is 1/CLOCKS_PER_SEC of a second, so divide the difference by CLOCKS_PER_SEC to get seconds.
As a side note, you can't use clock() to determine the actual time - only the number of ticks between two points in your program.
One caution on multiprocessor systems:
from http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL). To specify processor affinity for a thread, use the SetThreadAffinityMask function.
Al Weiner
On Windows you can use the 'high performance counter API'. Check out QueryPerformanceCounter and QueryPerformanceFrequency for the details.