Difference between MPI_Wtime and actual wall time - c

I implemented a MIMD genetic algorithm in C and OpenMPI where each process takes care of an independent subpopulation (island model). So, for a population of size 200, a 1-process run operates on the whole population, while 2 processes evolve subpopulations of size 100 each.
When I measure the execution time with MPI_Wtime, I get the expected times running on a 2-core machine under Ubuntu. However, this disagrees with both Ubuntu's time command and plain perception: it is noticeable that running with 2 processes takes longer for some reason.
$ time mpirun -n 1 genalg
execution time: 0.570039 s (MPI_Wtime)
real 0m0.618s
user 0m0.584s
sys 0m0.024s
$ time mpirun -n 2 genalg
execution time: 0.309784 s (MPI_Wtime)
real 0m1.352s
user 0m0.604s
sys 0m0.064s
For a larger population (4000), I get the following:
$ time mpirun -n 1 genalg
execution time: 11.645675 s (MPI_Wtime)
real 0m11.751s
user 0m11.292s
sys 0m0.392s
$ time mpirun -n 2 genalg
execution time: 5.872798 s (MPI_Wtime)
real 0m8.047s
user 0m11.472s
sys 0m0.380s
I get similar results whether or not there is communication between the processes, and I also tried adding an MPI_Barrier. I get the same results with gettimeofday, and turning gcc optimization on or off doesn't make much difference.
What could be going on? It should run faster with 2 processes, as MPI_Wtime suggests, but in reality it runs slower, matching the real time.
Update: I ran it on another PC and didn't have this issue.
The code:
void runGA(int argc, char* argv[])
{
    /* (initializations) */

    if (MYRANK == 0)
        t1 = MPI_Wtime();

    genalg();
    Individual* ind = best_found();

    MPI_Barrier(MPI_COMM_WORLD);
    if (MYRANK != 0)
        return;

    t2 = MPI_Wtime();
    exptime = t2 - t1;
    printf("execution time: %f s\n", exptime);
}
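For comparison, a common way to time an MPI section (this is not taken from the original program) is to synchronize first, let every rank measure its own elapsed time, and reduce the maximum to rank 0, since the run is only as fast as its slowest rank. A minimal sketch, with do_work() as a hypothetical stand-in for the per-rank work:

#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the actual per-rank work (e.g. genalg()). */
static void do_work(void) { /* ... */ }

int main(int argc, char* argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);       /* start all ranks together */
    double t1 = MPI_Wtime();

    do_work();

    double local = MPI_Wtime() - t1;   /* per-rank elapsed wall time */
    double slowest;
    /* the run is only as fast as its slowest rank */
    MPI_Reduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("execution time: %f s (max over ranks)\n", slowest);

    MPI_Finalize();
    return 0;
}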

My guess (and the asker's) is that time reports the total CPU time used across all cores. It is more like a cost: with 2 processes on 2 cores, the cost is time1 + time2, because the second core could otherwise have been used by another process, so that time is "spent" on the second core as well. MPI_Wtime() displays the actual elapsed time as a human perceives it.
That may also explain why the real time is lower than the user time in the second case: the real time is closer to the MPI time than to the sum of user and sys. For the larger run, for instance, user + sys ≈ 11.47 s + 0.38 s ≈ 11.85 s, which is close to 2 × 5.87 s, i.e. two cores each busy for roughly the MPI_Wtime duration. In the first case the initialization probably takes too long and skews the result.

The issue was solved after upgrading Ubuntu Mate 15.10 to 16.04, which came with OpenMPI version 1.10.2 (the previous one was 1.6.5).

Related

Why is the sys time of a process high when strace shows significantly less time?

I have written a C program which has two threads.
Initially it was:
for (int i = 0; i < n; i++) {
    long_operation(arr[i]);
}
Then I divided the loop between two threads that execute concurrently.
One thread carries out the operation for arr[0] to arr[n/2], and the other works on arr[n/2] to arr[n-1].
The long_operation function is thread safe.
Initially I used join, but it was spending a lot of sys time on the futex system call, which I observed using the strace command.
So I removed the join and instead used two volatile variables, one per thread, to keep track of whether each thread has completed, plus a busy loop in the thread-spawning function to hold off execution of the later code; I also made the threads detached.
It improved performance a little bit, but when I used the time command, the sys part was still taking a lot:
real 0m31.368s
user 0m53.738s
sys 0m15.203s
But when I checked using strace, the output was:
% time     seconds  usecs/call     calls    errors syscall
 55.79    0.000602           9        66           clone
 44.21    0.000477           3       177           write
------ ----------- ----------- --------- --------- ---------------
100.00    0.001079                     243          total
So the time command showed that around 15 seconds of CPU time were spent in the kernel by the process, but strace showed that almost no time was used for system calls.
Why were 15 seconds spent in the kernel, then?
I have a dual-core hyper-threaded Intel CPU.
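For context, here is a minimal, self-contained sketch of the join-based split described above; long_operation, arr and n are hypothetical stand-ins for the question's names, not the actual code:

#include <pthread.h>
#include <stdio.h>

#define N 1000                       /* hypothetical array size */

static int arr[N];

/* Hypothetical stand-in for the thread-safe long_operation() in the question. */
static void long_operation(int value)
{
    (void)value;
    /* ... expensive, thread-safe work ... */
}

struct range { int begin, end; };

/* Each worker processes its half of the array. */
static void* worker(void* arg)
{
    struct range* r = arg;
    for (int i = r->begin; i < r->end; i++)
        long_operation(arr[i]);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    struct range first  = { 0, N / 2 };   /* arr[0]   .. arr[N/2-1] */
    struct range second = { N / 2, N };   /* arr[N/2] .. arr[N-1]   */

    pthread_create(&t1, NULL, worker, &first);
    pthread_create(&t2, NULL, worker, &second);

    /* Join-based version: wait for both halves before continuing. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return 0;
}

Built with something like gcc -pthread, this is the join-based variant the question started from, before the switch to detached threads and busy-waiting on flags.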

Sequential part of the program takes more time as process count increases

I'm writing a parallel C program using MPICH. The program naturally has a sequential part and a parallel part. The parallel part seems to be working fine, however I'm having trouble with the sequential part.
In the sequential part, the program reads some values from a file and then proceeds to distribute them among the other processes. It is written as follows:
if (rank == 0)
{
    gettimeofday(&sequentialStartTime, NULL);

    // Read document ids and their weights into the corresponding variables
    documentCount = readDocuments(&weights, &documentIds, documentsFileName, dictionarySize);
    readQuery(&query, queryFileName, dictionarySize);

    gettimeofday(&endTime, NULL);
    timersub(&endTime, &sequentialStartTime, &resultTime);
    // convert the timeval difference to milliseconds
    printf("Sequential part: %.2f ms\n",
           resultTime.tv_sec * 1000.0 + resultTime.tv_usec / 1000.0);

    // distribute the data to other processes
}
else
{
    // wait for the data and then start working
}
Here, readQuery and readDocuments read values from files, and the elapsed time is printed after they complete. This piece of code actually works just fine. The problem arises when I try to run it with different numbers of processes.
I run the program with the following command
mpirun -np p ./main
where p is the process count. I expect the sequential part to run in roughly the same amount of time no matter how many processes I use. For p values 1 to 4 this holds, but when I use 5 to 8 as p values, the sequential part takes more time.
The processor I'm using is an Intel® Core™ i7-4790 CPU @ 3.60GHz × 8, and the host operating system is Windows 8.1 64-bit. I'm running the program on Ubuntu 14.04 in a virtual machine that has full access to my processor and 8 GB of RAM.
The only explanation that came to mind is that when the process count is higher than 4, the main process may have to share its physical core with another process, since I know this CPU has 4 physical cores but presents 8 logical ones through hyper-threading. However, when I increase p from 5 to 6 to 7 and so on, the execution time increases linearly, so that cannot be the case.
Any help or idea on this would be highly appreciated. Thanks in advance.
Edit: I realized that increasing p increases the run time no matter the value. I'm getting a linear increase in time as p increases.

How to measure user/system cpu time for a piece of program?

In section 3.9 of the classic APUE (Advanced Programming in the UNIX Environment), the author measures the user/system time consumed by his sample program, which is run against varying buffer sizes (an I/O read/write program).
The result table goes roughly like this (all times are in seconds):
BUFF_SIZE USER_CPU SYSTEM_CPU CLOCK_TIME LOOPS
1 124.89 161.65 288.64 103316352
...
512 0.27 0.41 7.03 201789
...
I'm really curious: how do you measure the USER/SYSTEM CPU time for a piece of a program?
And in this example, what does the CLOCK TIME mean and how is it measured?
Obviously it isn't simply the sum of the user CPU time and the system CPU time.
You could easily measure the running time of a program using the time command under *nix:
$ time myprog
real 0m2.792s
user 0m0.099s
sys 0m0.200s
The real or CLOCK_TIME refers to the wall-clock time, i.e. the time taken from the start of the program to its finish. It includes the time slices taken by other processes when the kernel context-switches them in, and also any time the process is blocked (on I/O events, etc.).
The user or USER_CPU refers to the CPU time spent in the user space, i.e. outside the kernel. Unlike the real time, it refers to only the CPU cycles taken by the particular process.
The sys or SYSTEM_CPU refers to the CPU time spent in the kernel space, (as part of system calls). Again this is only counting the CPU cycles spent in kernel space on behalf of the process and not any time it is blocked.
In the time utility, user and sys are calculated from either the times() or the wait() system call. real is usually calculated from the difference between two timestamps gathered with the gettimeofday() system call at the start and end of the program.
One more thing you might want to know: real != user + sys. On a multicore system the user or sys time, or their sum, can quite easily exceed the real time.
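As an illustration (not part of the original answer), here is a minimal sketch that reads these three values from inside a program using times(2); do_some_work() is a hypothetical placeholder for the code being measured:

#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical placeholder for the code being measured. */
static void do_some_work(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 50 * 1000 * 1000; i++)
        x += i * 0.5;
}

int main(void)
{
    struct tms start, end;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    clock_t wall_start = times(&start);   /* returns elapsed real time in ticks */
    do_some_work();
    clock_t wall_end = times(&end);

    printf("user: %.2f s\n",
           (double)(end.tms_utime - start.tms_utime) / ticks_per_sec);
    printf("sys : %.2f s\n",
           (double)(end.tms_stime - start.tms_stime) / ticks_per_sec);
    printf("real: %.2f s\n",
           (double)(wall_end - wall_start) / ticks_per_sec);
    return 0;
}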
Partial answer:
Well, CLOCK_TIME is the same as the time shown by a clock, i.e. time passed in the so-called "real world".
One way to measure it is to use the gettimeofday POSIX function, which stores the time into the caller's struct timeval, containing a UNIX seconds field and a microseconds field (the actual accuracy is often less). An example of using it in typical benchmark code (ignoring errors etc.):
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
do_operation_to_measure();
gettimeofday(&tv2, NULL);
// get difference, fix value if microseconds became negative
struct timeval tvdiff = { tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec };
if (tvdiff.tv_usec < 0) { tvdiff.tv_usec += 1000000; tvdiff.tv_sec -= 1; }
// print it
printf("Elapsed time: %ld.%06ld\n", tvdiff.tv_sec, tvdiff.tv_usec);

Way to measure the execution time of a program

I have lots of short programs in C. Each program performs a simple operation, for example: include a library, load something (e.g. a matrix) from a file, do a simple operation, write the matrix back to a file, end.
I want to measure the real execution time of the whole program (not only a fragment of the code).
My simple idea was to use htop or the TIME column of ps aux. But this method isn't good, because I don't get the exact execution time, only the time accumulated up to the last refresh, so I can miss it.
Is there a way to measure the execution time of a process on Linux?
If your program is named foo, then simply typing
~$ time foo
should do exactly what you want.
In addition to other answers, mostly suggesting to use the time utility or shell builtins:
time(7) is a very useful page to read.
You might use (inside your code) the clock(3) standard function to get CPU time in microseconds.
Resolution and accuracy of time measurements depend upon hardware and the operating system kernel. You could prefer a "real-time" kernel (e.g. a linux-image-3.2.0-rt package), or at least a kernel configured with CONFIG_HZ_1000, to get more precise time measurements.
You might also use (inside your code) the clock_gettime(2) syscall (so also link with the -lrt library).
When doing measurements, have your measured process run for at least a few seconds, and measure it several times (because of disk cache issues, for example).
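A minimal sketch of the clock_gettime approach with a monotonic clock (do_operation_to_measure() is a hypothetical placeholder; on older glibc you may need the -lrt link mentioned above):

#include <stdio.h>
#include <time.h>

/* Hypothetical placeholder for the code being measured. */
static void do_operation_to_measure(void) { /* ... */ }

int main(void)
{
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);   /* wall-clock, unaffected by date changes */
    do_operation_to_measure();
    clock_gettime(CLOCK_MONOTONIC, &t2);

    double elapsed = (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9;
    printf("Elapsed: %.6f s\n", elapsed);

    /* CLOCK_PROCESS_CPUTIME_ID can be used the same way to get CPU time instead. */
    return 0;
}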
If you use
time <PROGRAM> [ARGS]
this will provide some base-level information. This should run your shell's time command. Example output:
$ time sleep 2
real 0m2.002s
user 0m0.000s
sys 0m0.000s
But there is also
/usr/bin/time <PROGRAM> [ARGS]
which is more flexible and provides considerably more diagnostic information regarding timing. This runs a GNU timing program. This site has some usage examples. Example output:
$ /usr/bin/time -v sleep 2
Command being timed: "sleep 2"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 202
Voluntary context switches: 2
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Why did my C program take more time than the time it calculated itself?

I use the time command on Linux to measure how long my program took, and in my code I have put timers to calculate the time:
time took calculated by program: 71.320 sec
real 1m27.268s
user 1m7.607s
sys 0m3.785s
I don't know why my program took more real time than it calculated itself. How can I find the reason and resolve it?
======================================================
Here is how I calculate the time in my code:
clock_t cl;

cl = clock();                 /* clock() counts processor time used by the process */
do_some_work();
cl = clock() - cl;

float seconds = 1.0 * cl / CLOCKS_PER_SEC;
printf("time took: %.3f sec\n", seconds);
There is always overhead for starting up the process, starting the runtime and shutting the program down, and time itself probably has some overhead as well.
On top of that, in a multi-process operating system your process can be "switched out", meaning that other processes run while yours is put on hold. This can affect timings too.
Let me explain the output of time:
real means the actual clock time, including all overhead.
user is time spent in the actual program.
sys is time spent in the kernel (for example, the switching out I talked about earlier)
Note that user + sys is very close to your time: 1m7.607s + 0m3.785s == 71.392s.
Finally, how did you calculate the time? Without that information it's hard to tell exactly what the problem (if any) is.
