Calculating runtime differences in time realtime - c

I got the following problem: I have to measure the time a program needs to be executed. A scalar version of the program works fine with the code below, but when using OpenMP, it works on my PC, but not on the resource I am supposed to use.
In fact:
scalar program rt 34s
openmp program rt 9s
thats my pc (everything working) -compiled with visual studio
the ressource I have to use (I think Linux, compiled with gcc):
scalar program rt 9s
openmp program rt 9s (but the text pops immediately afterwards up, so it should be 0-1s)
my gues is, that it adds all ticks, which is about the same amount and devides them by the tick rate of a single core. My question is how to solve this, and if there is a better way to watch the time in the console on c.
clock_t start, stop;
double t = 0.0;
assert((start = clock()) != -1);
... code running
t = (double)(stop - start) / CLOCKS_PER_SEC;
printf("Run time: %f\n", t);

To augment Mark's answer: DO NOT USE clock()
clock() is an awful misunderstanding from the old computer era, who's actual implementation differs greatly from platform to platform. Behold:
on Linux, *BSD, Darwin (OS X) -- and possibly other 4.3BSD descendants -- clock() returns the processor time (not the wall-clock time!) used by the calling process, i.e. the sum of each thread's processor time;
on IRIX, AIX, Solaris -- and possibly other SysV descendants -- clock() returns the processor time (again not the wall-clock time) used by the calling process AND all its terminated child processes for which wait, system or pclose was executed;
HP-UX doesn't even seem to implement clock();
on Windows clock() returns the wall-clock time (not the processor time).
In the descriptions above processor time usually means the sum of user and system time. This could be less than the wall-clock (real) time, e.g. if the process sleeps or waits for file IO or network transfers, or it could be more than the wall-clock time, e.g. when the process has more than one thread, actively using the CPU.
Never use clock(). Use omp_get_wtime() - it exists on all platforms, supported by OpenMP, and always returns the wall-clock time.

Converting my earlier comment to an answer in the spirit of doing anything for reputation ...
Use two calls to omp_get_wtime to get the wallclock time (in seconds) between two points in your code. Note that time is measured individually on each thread, there is no synchronisation of clocks across threads.

Your problem is clock. By the C standard it measures the time passed on the CPU for your process, not wall clock time. So this is what linux does (usually they stick to the standards) and then the total CPU time for the sequential program or the parallel program are the same, as they should be.
Windows OS deviate from that, in that there clock is the wall clock time.
So use other time measurement functions. For standard C this would be time or if you need more precision with the new C11 standard you could use timespec_get, for OpenMP there are other possibilities as have already be mentioned.

Related

How to measure user/system cpu time for a piece of program?

In section 3.9 of the classic APUE(Advanced Programming in the UNIX Environment), the author measured the user/system time consumed in his sample program which runs against varying buffer size(an I/O read/write program).
The result table goes kinda like(all the time are in the unit of second):
BUFF_SIZE USER_CPU SYSTEM_CPU CLOCK_TIME LOOPS
1 124.89 161.65 288.64 103316352
...
512 0.27 0.41 7.03 201789
...
I'm curious about and really wondering how to measure the USER/SYSTEM CPU time for a piece of program?
And in this example, what does the CLOCK TIME mean and how to measure it?
Obviously it isn't simply the sum of user CPU time and system CPU time.
You could easily measure the running time of a program using the time command under *nix:
$ time myprog
real 0m2.792s
user 0m0.099s
sys 0m0.200s
The real or CLOCK_TIME refers to the wall clock time i.e the time taken from the start of the program to finish and includes even the time slices taken by other processes when the kernel context switches them. It also includes any time, the process is blocked (on I/O events, etc.)
The user or USER_CPU refers to the CPU time spent in the user space, i.e. outside the kernel. Unlike the real time, it refers to only the CPU cycles taken by the particular process.
The sys or SYSTEM_CPU refers to the CPU time spent in the kernel space, (as part of system calls). Again this is only counting the CPU cycles spent in kernel space on behalf of the process and not any time it is blocked.
In the time utility, the user and sys are calculated from either times() or wait() system calls. The real is usually calculated using the time differences in the 2 timestamps gathered using the gettimeofday() system call at the start and end of the program.
One more thing you might want to know is real != user + sys. On a multicore system the user or sys or their sum can quite easily exceed the real time.
Partial answer:
Well, CLOCK_TIME is same as time shown by a clock, time passed in the so called "real world".
One way to measure that is to use gettimeofday POSIX function, which stores time to caller's struct timeval, containing UNIX seconds field and a microsecond field (actual accuracy is often less). Example for using that in typical benchmark code (ignoring errors etc):
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
do_operation_to_measure();
gettimeofday(&tv2, NULL);
// get difference, fix value if microseconds became negative
struct timeval tvdiff = { tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec };
if (tvdiff.tv_usec < 0) { tvdiff.tv_usec += 1000000; tvdiff.tv_sec -= 1; }
// print it
printf("Elapsed time: %ld.%06ld\n", tvdiff.tv_sec, tvdiff.tv_usec);

measuring time of multi-threading program

I measured time with function clock() but it gave bad results. I mean it gives the same results for program with one thread and for the same program running with OpenMP with many threads. But in fact, I notice with my watch that with many threads program counts faster.
So I need some wall-clock timer...
My question is: What is better function for this issue?
clock_gettime() or mb gettimeofday() ? or mb something else?
if clock_gettime(),then with which clock? CLOCK_REALTIME or CLOCK_MONOTONIC?
using mac os x (snow leopard)
If you want wall-clock time, and clock_gettime() is available, it's a good choice. Use it with CLOCK_MONOTONIC if you're measuring intervals of time, and CLOCK_REALTIME to get the actual time of day.
CLOCK_REALTIME gives you the actual time of day, but is affected by adjustments to the system time -- so if the system time is adjusted while your program runs that will mess up measurements of intervals using it.
CLOCK_MONOTONIC doesn't give you the correct time of day, but it does count at the same rate and is immune to changes to the system time -- so it's ideal for measuring intervals, but useless when correct time of day is needed for display or for timestamps.
I think clock() counts the total CPU usage among all threads, I had this problem too...
The choice of wall-clock timing method is personal preference. I use an inline wrapper function to take time-stamps (take the difference of 2 time-stamps to time your processing). I've used floating point for convenience (units are in seconds, don't have to worry about integer overflow). With multi-threading, there are so many asynchronous events that in my opinion it doesn't make sense to time below 1 microsecond. This has worked very well for me so far :)
Whatever you choose, a wrapper is the easiest way to experiment
inline double my_clock(void) {
struct timeval t;
gettimeofday(&t, NULL);
return (1.0e-6*t.tv_usec + t.tv_sec);
}
usage:
double start_time, end_time;
start_time = my_clock();
//some multi-threaded processing
end_time = my_clock();
printf("time is %lf\n", end_time-start_time);

Use clock() to count program execution time

I'm using something like this to count how long does it takes my program from start to finish:
int main(){
clock_t startClock = clock();
.... // many codes
clock_t endClock = clock();
printf("%ld", (endClock - startClock) / CLOCKS_PER_SEC);
}
And my question is, since there are multiple process running at the same time, say if for x amount of time my process is in idle, durning that time will clock tick within my program?
So basically my concern is, say there's 1000 clock cycle passed by, but my process only uses 500 of them, will I get 500 or 1000 from (endClock - startClock)?
Thanks.
This depends on the OS. On Windows, clock() measures wall-time. On Linux/Posix, it measures the combined CPU time of all the threads.
If you want wall-time on Linux, you should use gettimeofday().
If you want CPU-time on Windows, you should use GetProcessTimes().
EDIT:
So if you're on Windows, clock() will measure idle time.
On Linux, clock() will not measure idle time.
clock on POSIX measures cpu time, but it usually has extremely poor resolution. Instead, modern programs should use clock_gettime with the CLOCK_PROCESS_CPUTIME_ID clock-id. This will give up to nanosecond-resolution results, and usually it's really just about that good.
As per the definition on the man page (in Linux),
The clock() function returns an approximation of processor time used
by the program.
it will try to be as accurate a possible, but as you say, some time (process switching, for example) is difficult to account to a process, so the numbers will be as accurate as possible, but not perfect.

How to measure the ACTUAL execution time of a C program under Linux?

I know this question may have been commonly asked before, but it seems most of those questions are regarding the elapsed time (based on wall clock) of a piece of code. The elapsed time of a piece of code is unlikely equal to the actual execution time, as other processes may be executing during the elapsed time of the code of interest.
I used getrusage() to get the user time and system time of a process, and then calculate the actual execution time by (user time + system time). I am running my program on Ubuntu. Here are my questions:
How do I know the precision of getrusage()?
Are there other approaches that can provide higher precision than getrusage()?
You can check the real CPU time of a process on linux by utilizing the CPU Time functionality of the kernel:
#include <time.h>
clock_t start, end;
double cpu_time_used;
start = clock();
... /* Do the work. */
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
Source: http://www.gnu.org/s/hello/manual/libc/CPU-Time.html#CPU-Time
That way, you count the CPU ticks or the real amount of instructions worked upon by the CPU on the process, thus getting the real amount of work time.
The getrusage() function is the only standard/portable way that I know of to get "consumed CPU time".
There isn't a simple way to determine the precision of returned values. I'd be tempted to call the getrusage() once to get an initial value, and the call it repeatedly until the value/s returned are different from the initial value, and then assume the effective precision is the difference between the initial and final values. This is a hack (it would be possible for precision to be higher than this method determines, and the result should probably be considered a worst case estimate) but it's better than nothing.
I'd also be concerned about the accuracy of the values returned. Under some kernels I'd expect that a counter is incremented for whatever code happens to be running when a timer IRQ occurs; and therefore it's possible for a process to be very lucky (and continually block just before the timer IRQ occurs) or very unlucky (and unblock just before the timer IRQ occurs). In this case "lucky" could mean a CPU hog looks like it uses no CPU time, and "unlucky" could means a process that uses very little CPU time looks like a CPU hog.
For specific versions of specific kernels on specific architecture/s (potentially depending on if/when the kernel is compiled with specific configuration options in some cases), there may be higher precision alternatives that aren't portable and aren't standard...
You can use this piece of code :
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start, NULL);
.
.
.
gettimeofday(&end, NULL);
delta = ((end.tv_sec - start.tv_sec) * 1000000u +
end.tv_usec - start.tv_usec) / 1.e6;
printf("Time is : %f\n",delta);
It will show you the execution time for piece of your code

How can I find the execution time of a section of my program in C?

I'm trying to find a way to get the execution time of a section of code in C. I've already tried both time() and clock() from time.h, but it seems that time() returns seconds and clock() seems to give me milliseconds (or centiseconds?) I would like something more precise though. Is there a way I can grab the time with at least microsecond precision?
This only needs to be able to compile on Linux.
You referred to clock() and time() - were you looking for gettimeofday()?
That will fill in a struct timeval, which contains seconds and microseconds.
Of course the actual resolution is up to the hardware.
For what it's worth, here's one that's just a few macros:
#include <time.h>
clock_t startm, stopm;
#define START if ( (startm = clock()) == -1) {printf("Error calling clock");exit(1);}
#define STOP if ( (stopm = clock()) == -1) {printf("Error calling clock");exit(1);}
#define PRINTTIME printf( "%6.3f seconds used by the processor.", ((double)stopm-startm)/CLOCKS_PER_SEC);
Then just use it with:
main() {
START;
// Do stuff you want to time
STOP;
PRINTTIME;
}
From http://ctips.pbwiki.com/Timer
You want a profiler application.
Search keywords at SO and search engines: linux profiling
Have a look at gettimeofday,
clock_*, or get/setitimer.
Try "bench.h"; it lets you put a START_TIMER; and STOP_TIMER("name"); into your code, allowing you to arbitrarily benchmark any section of code (note: only recommended for short sections, not things taking dozens of milliseconds or more). Its accurate to the clock cycle, though in some rare cases it can change how the code in between is compiled, in which case you're better off with a profiler (though profilers are generally more effort to use for specific sections of code).
It only works on x86.
You might want to google for an instrumentation tool.
You won't find a library call which lets you get past the clock resolution of your platform. Either use a profiler (man gprof) as another poster suggested, or - quick & dirty - put a loop around the offending section of code to execute it many times, and use clock().
gettimeofday() provides you with a resolution of microseconds, whereas clock_gettime() provides you with a resolution of nanoseconds.
int clock_gettime(clockid_t clk_id, struct timespec *tp);
The clk_id identifies the clock to be used. Use CLOCK_REALTIME if you want a system-wide clock visible to all processes. Use CLOCK_PROCESS_CPUTIME_ID for per-process timer and CLOCK_THREAD_CPUTIME_ID for a thread-specific timer.
It depends on the conditions.. Profilers are nice for general global views however if you really need an accurate view my recommendation is KISS. Simply run the code in a loop such that it takes a minute or so to complete. Then compute a simple average based on the total run time and iterations executed.
This approach allows you to:
Obtain accurate results with low resolution timers.
Not run into issues where instrumentation interferes with high speed caches (l2,l1,branch..etc) close to the processor. However running the same code in a tight loop can also provide optimistic results that may not reflect real world conditions.
Don't know which enviroment/OS you are working on, but your timing may be inaccurate if another thread, task, or process preempts your timed code in the middle. I suggest exploring mechanisms such as mutexes or semaphores to prevent other threads from preemting your process.
If you are developing on x86 or x64 why not use the Time Stamp Counter: RDTSC.
It will be more reliable then Ansi C functions like time() or clock() as RDTSC is an atomic function. Using C functions for this purpose can introduce problems as you have no guarantee that the thread they are executing in will not be switched out and as a result the value they return will not be an accurate description of the actual execution time you are trying to measure.
With RDTSC you can better measure this. You will need to convert the tick count back into a human readable time H:M:S format which will depend on the processors clock frequency but google around and I am sure you will find examples.
However even with RDTSC you will be including the time your code was switched out of execution, while a better solution than using time()/clock() if you need an exact measurement you will have to turn to a profiler that will instrument your code and take into account when your code is not actually executing due to context switches or whatever.

Resources