Interpreting gprof result and granularity - c

I am using gprof to profile a C program for the first time. The following lines appear in the report that I generated using:
$ gprof test_gprof gmon.out > analysis.txt
In Flat Profile
Each sample counts as 0.01 seconds.
Is this the maximum resolution in time?
In call graph:
granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13
What exactly is "4 bytes" here, and what time is it talking about?

Please read chapters 5.1 and 5.2 of this manual. You can also read this manual on CentOS with the following command:
$ info gprof
In Flat Profile,
Each sample counts as 0.01 seconds.
indicates that the sampling rate is 100 Hz (one sample every 0.01 seconds), which is not controlled by gprof itself (check this page and this one for more details). Therefore, 0.01 seconds is theoretically not the maximum resolution in time.
As for Call graph, according to this doc from Oracle,
The "4 bytes" means resolution to a single instruction. The "0.07% of 14.74 seconds" means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.


Difference between MPI_Wtime and actual wall time

I implemented a MIMD genetic algorithm using C and OpenMPI where each process takes care of an independent subpopulation (island model). So, for a population of size 200, a 1-process run operates on the whole population while 2 processes evolve populations of size 100.
So, by measuring the execution time with MPI_Wtime, I'm getting the expected execution time by running on a 2-core machine with Ubuntu. However, it disagrees with both Ubuntu's time command and perception alone: it's noticeable that running with 2 processes takes longer for some reason.
$ time mpirun -n 1 genalg
execution time: 0.570039 s (MPI_Wtime)
real 0m0.618s
user 0m0.584s
sys 0m0.024s
$ time mpirun -n 2 genalg
execution time: 0.309784 s (MPI_Wtime)
real 0m1.352s
user 0m0.604s
sys 0m0.064s
For a larger population (4000), I get the following:
$ time mpirun -n 1 genalg
execution time: 11.645675 s (MPI_Wtime)
real 0m11.751s
user 0m11.292s
sys 0m0.392s
$ time mpirun -n 2 genalg
execution time: 5.872798 s (MPI_Wtime)
real 0m8.047s
user 0m11.472s
sys 0m0.380s
I get similar results whether there's communication between the processes or not, and also tried MPI_Barrier. Also got the same results with gettimeofday, and turning gcc optimization on or off doesn't make much difference.
What is possibly going on? It should run faster with 2 processes, like MPI_Wtime suggests, but in reality it's running slower, matching the real time.
Update: I ran it on another PC and didn't have this issue.
The code:
void runGA(int argc,char* argv[])
if(MYRANK == 0)
t1 = MPI_Wtime();
Individual* ind = best_found();
if(MYRANK != 0)
t2 = MPI_Wtime();
exptime = t2-t1;
printf("execution time: %f s\n",exptime);
My guess is that the user time reported by time is the sum of the CPU time used on all cores. It is more like a cost: you have 2 processes on 2 cores, so the cost is time1 + time2, because the second core could have been used by another process, so you "lost" that time on the second core. MPI_Wtime() reports the elapsed time as a human would perceive it.
That may explain why the real time is lower than the user time in the second case (population 4000): the real time is closer to the MPI_Wtime value than to the sum of user and sys. In the first case (population 200), initialization probably takes up too large a share of the run and distorts the result.
The issue was solved after upgrading Ubuntu Mate 15.10 to 16.04, which came with OpenMPI version 1.10.2 (the previous one was 1.6.5).

Regarding CPU utilization

Considering the below piece of C code, I expected the CPU utilization to go up to 100% as the processor would try to complete the job (endless in this case) given to it. On running the executable for 5 mins, I found the CPU to go up to a max. of 48%. I am running Mac OS X 10.5.8; processor: Intel Core 2 Duo; Compiler: GCC 4.1.
int i = 10;
while(1) {
    i = i * 5;
}
Could someone please explain why the CPU usage does not go up to 100%? Does the OS limit the CPU from reaching 100%?
Please note that if I added a "printf()" inside the loop the CPU hits 88%. I understand that in this case, the processor also has to write to the standard output stream hence the sharp rise in usage.
Has this got something to do with the amount of job assigned to the processor per unit time?
You have a multicore processor and a single-threaded program, so you will only use one core at full throttle. Why would you expect the overall processor usage to go to 100% in that context?
Run two copies of your program at the same time. These will use both cores of your "Core 2 Duo" CPU and overall CPU usage will go to 100%
if I added a "printf()" inside the loop the CPU hits 88%.
printf sends some characters to the terminal/screen. Sending the information, displaying it, and updating the screen are handled by code outside your executable, which is likely to run on another thread. But displaying a few characters does not need 100% of such a thread. That is why you see 100% for core 1 and 76% for core 2, which results in the overall CPU usage of 88% that you see.

Way to measure time of execution program

I have a lot of short programs in C. Each program performs a simple operation, for example: include a library, load something (e.g. a matrix) from a file, do a simple operation, write the matrix to a file, end.
I want to measure the real execution time of a whole program (not only a fragment of code).
My simple idea is to use htop or ps aux and read the TIME column. But this method isn't good because I don't get the exact execution time, only the time as of the last refresh, and I can miss it.
Do you know a method to measure the time of a process in Linux?
If your program is named foo, then simply typing
~$ time foo
should do exactly what you want.
In addition to other answers, mostly suggesting to use the time utility or shell builtins:
time(7) is a very useful page to read.
You might use (inside your code) the clock(3) standard function to get CPU time; it returns clock ticks, and dividing by CLOCKS_PER_SEC (1,000,000 on POSIX systems, i.e. microsecond units) gives seconds.
Resolution and accuracy of time measurements depend on the hardware and the operating system kernel. You might prefer a "real-time" kernel (e.g. a linux-image-3.2.0-rt package), or at least a kernel configured with CONFIG_HZ_1000, to get more precise time measurements.
You might also use (inside your code) the clock_gettime(2) syscall (on glibc older than 2.17, also link with -lrt).
When doing measurements, try to have your measured process run a few seconds at least, and measure it several times (because e.g. of disk cache issues).
If you use
time <PROGRAM> [ARGS]
this will provide some base-level information. This should run your shell's time command. Example output:
$ time sleep 2
real 0m2.002s
user 0m0.000s
sys 0m0.000s
But there is also
/usr/bin/time <PROGRAM> [ARGS]
which is more flexible and provides considerably more diagnostic information regarding timing. This runs a GNU timing program. This site has some usage examples. Example output:
$ /usr/bin/time -v sleep 2
Command being timed: "sleep 2"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 202
Voluntary context switches: 2
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Calculating execution time of my C program? [duplicate]

I tried clock() from time.h, but it always gives me 0.0000 seconds as output. Is there any way to get the execution time in microseconds, milliseconds, or any other smaller unit?
Precede the execution of your program in shell with "time", i.e.:
user@linux:~$ time c_program_name
Running the following, for example:
sampson-chen@linux:~/src/reviewboard$ time ls -R
Gives the following time results:
real 0m0.046s
user 0m0.008s
sys 0m0.012s
See the manual for time to adjust the display formats / precision / verbosity.
clock() should work: call it once at the very beginning and once at the very end, then print the difference between the two values divided by CLOCKS_PER_SEC.
clock() displays 0.000 all the time because the execution is very fast and the execution time is negligible. Try timing some more complex algorithms, like Tower of Hanoi or N-Queens, with big values. Then you'll get an execution time of some milliseconds. I tried it for Tower of Hanoi with 15 discs and it gave me a nonzero value for the execution time.

Increasing CPU Utilization and keep it at a certain level using C code

I am writing C code (on Linux) that needs to consume a certain amount of CPU while it's running. I am carrying out an experiment in which I trigger certain actions upon reaching a certain CPU threshold. So, once the utilization reaches a certain threshold, I need to keep it at that level for, say, 30 seconds until I complete my experiments. I am monitoring the CPU utilization using the top command.
So my questions are -
1. How do I increase the CPU Utilization to a given value (in a deterministic way if possible)?
2. Once I get to the threshold, is there a way to keep it at that level for a pre-defined time?
Sample output of top command (the 9th column is CPU used by the 'top' process) -
19304 abcde 16 0 5448 1212 808 R 0.2 0.0 0:00.06 top
Similar to above, I will look at the line in top which shows the utilization of my binary.
Any help would be appreciated. Also, let me know if you need more details.
The following lines of code allowed me to control CPU utilization quite well. In this case I have 2 options: keep it above 50% or keep it below 50%. After some trial and error, I settled on the given usleep values.
endwait = clock() + ( seconds * CLOCKS_PER_SEC );
while( clock() < endwait ) {}
if (cpu_utilization > 50)
Hope this helps!
cpuburn is known to drive CPU utilization so high that it raises the CPU temperature to its maximum level.
It seems there is no longer an official website for it, but you can still get the source code from the Debian package or googlecode.
It's implemented in assembly, so you'll have to write some glue code in order to interact with it from C.
Something of this sort should have a constant CPU utilization, in my opinion:
md5sum < /dev/urandom
