How to measure the execution time of a program in C

I have a lot of short programs in C. Each program performs a simple task, for example: include a library, load something (e.g. a matrix) from a file, do a simple operation, write the matrix to a file, and end.
I want to measure the real execution time of the whole program (not only a fragment of the code).
My simple idea was to use htop or ps aux and read the TIME column, but this method isn't good because it doesn't give the exact execution time, only the time accumulated as of the last refresh, so I can miss it.
Is there a method to measure the time of a process on Linux?

If your program is named foo, then simply typing
~$ time foo
should do exactly what you want.

In addition to the other answers, which mostly suggest using the time utility or shell builtins:
time(7) is a very useful page to read.
You might use (inside your code) the clock(3) standard function to get CPU time in microseconds.
Resolution and accuracy of time measurements depend upon the hardware and the operating system kernel. You could prefer a "real-time" kernel (e.g. a linux-image-3.2.0-rt package), or at least a kernel configured with CONFIG_HZ_1000, to get more precise time measurements.
You might also use (inside your code) the clock_gettime(2) syscall (on older glibc you also need to link with -lrt).
When doing measurements, try to have the measured process run for at least a few seconds, and measure it several times (because of, e.g., disk cache effects).
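For illustration, a minimal self-contained sketch of measuring wall-clock time inside a program with clock_gettime(2); do_work is only a placeholder for whatever you want to time (on older glibc, also link with -lrt):

#include <stdio.h>
#include <time.h>

/* placeholder for the code being measured (hypothetical) */
static void do_work(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 10000000L; i++)
        x += i * 0.5;
}

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);  /* monotonic wall clock */
    do_work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.6f s\n", elapsed);
    return 0;
}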

If you use
time <PROGRAM> [ARGS]
this will provide some basic timing information; it normally runs your shell's built-in time command. Example output:
$ time sleep 2
real 0m2.002s
user 0m0.000s
sys 0m0.000s
But there is also
/usr/bin/time <PROGRAM> [ARGS]
which is more flexible and provides considerably more diagnostic information about timing and resource usage. This runs the standalone GNU time program rather than the shell built-in. Example output:
$ /usr/bin/time -v sleep 2
Command being timed: "sleep 2"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 202
Voluntary context switches: 2
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Related

Difference between MPI_Wtime and actual wall time

I implemented a MIMD genetic algorithm using C and OpenMPI where each process takes care of an independent subpopulation (island model). So, for a population of size 200, a 1-process run operates on the whole population, while 2 processes each evolve a population of size 100.
So, when measuring the execution time with MPI_Wtime, I get the expected execution time running on a 2-core machine with Ubuntu. However, it disagrees with both Ubuntu's time command and perception alone: it's noticeable that running with 2 processes takes longer for some reason.
$ time mpirun -n 1 genalg
execution time: 0.570039 s (MPI_Wtime)
real 0m0.618s
user 0m0.584s
sys 0m0.024s
$ time mpirun -n 2 genalg
execution time: 0.309784 s (MPI_Wtime)
real 0m1.352s
user 0m0.604s
sys 0m0.064s
For a larger population (4000), I get the following:
$ time mpirun -n 1 genalg
execution time: 11.645675 s (MPI_Wtime)
real 0m11.751s
user 0m11.292s
sys 0m0.392s
$ time mpirun -n 2 genalg
execution time: 5.872798 s (MPI_Wtime)
real 0m8.047s
user 0m11.472s
sys 0m0.380s
I get similar results whether or not there's communication between the processes, and I also tried MPI_Barrier. I got the same results with gettimeofday as well, and turning gcc optimization on or off doesn't make much difference.
What is possibly going on? It should run faster with 2 processes, like MPI_Wtime suggests, but in reality it's running slower, matching the real time.
Update: I ran it on another PC and didn't have this issue.
The code:
void runGA(int argc, char* argv[])
{
    /* (initializations) */
    if (MYRANK == 0)
        t1 = MPI_Wtime();
    genalg();
    Individual* ind = best_found();
    MPI_Barrier(MPI_COMM_WORLD);
    if (MYRANK != 0)
        return;
    t2 = MPI_Wtime();
    exptime = t2 - t1;
    printf("execution time: %f s\n", exptime);
}
My guess (and the asker's) is that time gives the sum of the time used by all cores. It's more like a cost: you have 2 processes on 2 cores, so the accounted time is time1 + time2, because the second core could have been used by another process, so you "lose" that time on the second core. MPI_Wtime() displays the actual elapsed time as a human perceives it.
That may explain why the real time is lower than the user time in the second case: the real time is closer to the MPI time than to the sum of user and sys. In the first case, initialization takes too much of the run and probably skews the result.
The issue was solved after upgrading Ubuntu Mate 15.10 to 16.04, which came with OpenMPI version 1.10.2 (the previous one was 1.6.5).
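As an aside, a common way to time an MPI region robustly is to synchronize all ranks before and after the measured block and report the maximum elapsed time over ranks; a minimal sketch, not the asker's code, with the work left as a placeholder:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);              /* start all ranks together */
    double t1 = MPI_Wtime();

    /* ... work to be timed ... */

    MPI_Barrier(MPI_COMM_WORLD);              /* wait for the slowest rank */
    double t2 = MPI_Wtime();

    double local = t2 - t1, maxtime = 0.0;
    MPI_Reduce(&local, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("elapsed (max over ranks): %f s\n", maxtime);

    MPI_Finalize();
    return 0;
}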

Why is the sys time of a process high when strace shows significantly less time spent in system calls?

I have written a C program that uses two threads.
Initially it was:
for (int i = 0; i < n; i++) {
    long_operation(arr[i]);
}
Then I divided the loop between two threads that execute concurrently: one thread carries out the operation for arr[0] to arr[n/2], and the other works on arr[n/2] to arr[n-1].
The long_operation function is thread safe.
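For reference, a minimal self-contained sketch of this kind of two-thread split, in its original join-based form; long_operation here is a hypothetical stand-in for the asker's function (compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double arr[N];

/* stand-in for the asker's thread-safe long_operation (hypothetical) */
static void long_operation(double *x)
{
    for (int k = 0; k < 100; k++)
        *x += k * 0.001;
}

struct range { int begin, end; };

static void *worker(void *p)
{
    struct range *r = p;
    for (int i = r->begin; i < r->end; i++)
        long_operation(&arr[i]);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    struct range r1 = { 0, N / 2 };   /* first half  */
    struct range r2 = { N / 2, N };   /* second half */
    pthread_create(&t1, NULL, worker, &r1);
    pthread_create(&t2, NULL, worker, &r2);
    pthread_join(t1, NULL);           /* the original join-based variant */
    pthread_join(t2, NULL);
    printf("done: arr[0] = %f\n", arr[0]);
    return 0;
}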
Initially I was using join, but it was taking a lot of sys time in the futex system call, which I observed using strace.
So I removed strace and used two volatile variables, one per thread, to keep track of whether each thread had completed, plus a busy loop in the thread-spawning function to halt the execution of the later code; I also made the threads detached and removed the join.
It improved performance a little bit, but when I used the time command, the sys part showed:
real 0m31.368s
user 0m53.738s
sys 0m15.203s
But when I checked using strace, the output was:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ---------------
 55.79    0.000602           9        66           clone
 44.21    0.000477           3       177           write
------ ----------- ----------- --------- --------- ---------------
100.00    0.001079                   243           total
So the time command was showing that around 15 seconds of CPU time was spent in the kernel by the process, but strace was showing that almost no time was spent in system calls.
Why, then, were 15 seconds spent in the kernel?
I have a dual-core, hyper-threaded Intel CPU.

Interpreting gprof result and granularity

I am using gprof to profile a C program for the first time. The following lines appear in the report I generated using:
$ gprof test_gprof gmon.out > analysis.txt
In Flat Profile
Each sample counts as 0.01 seconds.
Is this the maximum resolution in time?
In call graph:
granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
What exactly are the 4 bytes here? And what time is it talking about?
Please read chapters 5.1 and 5.2 of the GNU gprof manual. You can also read the manual on CentOS with the following command:
$ info gprof
In Flat Profile,
Each sample counts as 0.01 seconds.
indicates that the sampling rate is 100 Hz, which is not controlled by gprof itself. Therefore, 0.01 seconds is, in theory, not the maximum achievable resolution in time.
As for the call graph, according to Oracle's documentation:
The "4 bytes" means resolution to a single instruction. The "0.07% of 14.74 seconds" means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.

How to measure user/system cpu time for a piece of program?

In section 3.9 of the classic APUE (Advanced Programming in the UNIX Environment), the author measures the user/system CPU time consumed by his sample program, an I/O read/write program run with varying buffer sizes.
The result table looks roughly like this (all times are in seconds):
BUFF_SIZE  USER_CPU  SYSTEM_CPU  CLOCK_TIME  LOOPS
1          124.89    161.65      288.64      103316352
...
512        0.27      0.41        7.03        201789
...
I'm really wondering how to measure the USER/SYSTEM CPU time for a piece of a program.
Also, in this example, what does CLOCK_TIME mean and how is it measured?
Obviously it isn't simply the sum of user CPU time and system CPU time.
You could easily measure the running time of a program using the time command under *nix:
$ time myprog
real 0m2.792s
user 0m0.099s
sys 0m0.200s
The real, or CLOCK_TIME, refers to wall-clock time, i.e. the time taken from the start of the program to its finish. It includes time slices taken by other processes when the kernel context-switches them in, as well as any time the process spends blocked (on I/O events, etc.).
The user, or USER_CPU, refers to the CPU time spent in user space, i.e. outside the kernel. Unlike the real time, it counts only the CPU cycles used by this particular process.
The sys, or SYSTEM_CPU, refers to the CPU time spent in kernel space (as part of system calls). Again, this counts only the CPU cycles spent in kernel space on behalf of the process, not any time it is blocked.
In the time utility, user and sys are calculated from the times() or wait() system calls. real is usually calculated as the difference between two timestamps gathered with the gettimeofday() system call at the start and end of the program.
One more thing you might want to know: real != user + sys in general. On a multicore system, user or sys, or their sum, can quite easily exceed the real time.
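To obtain the same three numbers from inside a program, a minimal sketch using times(2) and sysconf(3); the busy loop is only a placeholder workload:

#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>

int main(void)
{
    struct tms start, end;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);    /* clock ticks per second */
    clock_t wall_start = times(&start);           /* returns elapsed ticks */

    /* placeholder workload */
    volatile double x = 0.0;
    for (long i = 0; i < 50000000L; i++)
        x += i * 0.5;

    clock_t wall_end = times(&end);
    printf("real: %.2f s\n", (double)(wall_end - wall_start) / ticks_per_sec);
    printf("user: %.2f s\n", (double)(end.tms_utime - start.tms_utime) / ticks_per_sec);
    printf("sys:  %.2f s\n", (double)(end.tms_stime - start.tms_stime) / ticks_per_sec);
    return 0;
}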
Partial answer:
Well, CLOCK_TIME is the same as the time shown by a clock, i.e. time passed in the so-called "real world".
One way to measure it is to use the gettimeofday POSIX function, which stores the current time into the caller's struct timeval, containing a UNIX seconds field and a microseconds field (the actual accuracy is often less). An example of using it in typical benchmark code (ignoring errors, etc.):
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
do_operation_to_measure();
gettimeofday(&tv2, NULL);
// get difference, fix value if microseconds became negative
struct timeval tvdiff = { tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec };
if (tvdiff.tv_usec < 0) { tvdiff.tv_usec += 1000000; tvdiff.tv_sec -= 1; }
// print it
printf("Elapsed time: %ld.%06ld\n", tvdiff.tv_sec, tvdiff.tv_usec);

How to get the CPU Usage in C?

I want to get the overall total CPU usage for an application in C, like the total CPU usage shown in the Task Manager.
I want to know, for both Windows and Linux, the current total CPU utilization by all processes, as shown in the task manager.
This is platform-specific:
In Windows, you can use the GetProcessTimes() function.
In Linux, you can actually just use clock().
These can be used to measure the amount of CPU time taken between two time intervals.
EDIT:
To get the CPU consumption as a percentage, you will need to divide the total CPU time by the number of logical cores that the OS sees, and then by the total wall-clock time:
% CPU usage = (CPU time) / (# of cores) / (wall time)
Getting the # of logical cores is also platform-specific:
Windows: GetSystemInfo()
Linux: sysconf(_SC_NPROCESSORS_ONLN)
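On Linux, a minimal sketch of that formula, using clock() for the CPU time, gettimeofday() for the wall time, and sysconf() for the core count; the busy loop is only a placeholder workload:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    struct timeval w1, w2;
    gettimeofday(&w1, NULL);
    clock_t c1 = clock();                 /* CPU time used by this process */

    /* placeholder workload */
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; i++)
        x += i * 0.5;

    clock_t c2 = clock();
    gettimeofday(&w2, NULL);

    double cpu   = (double)(c2 - c1) / CLOCKS_PER_SEC;
    double wall  = (w2.tv_sec - w1.tv_sec) + (w2.tv_usec - w1.tv_usec) / 1e6;
    long   cores = sysconf(_SC_NPROCESSORS_ONLN);

    printf("%% CPU usage: %.1f%%\n", 100.0 * cpu / cores / wall);
    return 0;
}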
Under POSIX, you want getrusage(2)'s ru_utime field. Use RUSAGE_SELF for just the calling process, and RUSAGE_CHILDREN for all terminated and wait(2)ed-upon children. Linux also supports RUSAGE_THREAD for just the calling thread. Use ru_stime if you want the system time, which can be summed with ru_utime for total time actively running (not wall time).
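A minimal sketch of getrusage(2) for the calling process; the workload is again a placeholder:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* placeholder workload */
    volatile double x = 0.0;
    for (long i = 0; i < 50000000L; i++)
        x += i * 0.5;

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("user: %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("sys:  %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}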
It is usually operating system specific.
You could use the clock function, which returns a clock_t (some arithmetic type, often long). It measures CPU time in units of CLOCKS_PER_SEC, which POSIX fixes at 1,000,000, so on Linux the value is effectively in microseconds of CPU time.
