Excluding a function from gprof results - c

I want to exclude some functions from the output generated by gprof. In other words, I do not want them to be included when the percentage of time spent in each function during execution is calculated. I read somewhere that the -E option can be used.
However, when I run gprof -E function_to_be_excluded my_program_name, nothing happens. The manual says the option is deprecated and that you should use symspecs instead. I have spent half an hour trying to figure out how to achieve this with symspecs, but no luck. Can anyone kindly help me with this?

Exactly: gprof -e and -E are deprecated and have been superseded by newer options that take a symspec argument. So try using:
gprof --no-time=symspec
The -n option causes "gprof", in its call graph analysis, not to propagate times for
symbols matching symspec.
e.g.
gprof --no-time=name_of_function_you_dont_want_to_profile
Use this along with your other gprof options (with -E and -e definitely ruled out).
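For instance, a complete round trip might look like this (compute_checksum is just a hypothetical function name standing in for the one you want to hide; substitute your own program and symbol):
gcc -pg -o my_program my_program.c
./my_program                                   # writes gmon.out in the current directory
gprof --no-time=compute_checksum my_program gmon.out
The call-graph analysis should then stop propagating time to compute_checksum, while the rest of the report is produced as usual.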

According to the man page:
To display the flat profile with a function excluded from it, use the -P option:
gprof main gmon.out -Pfunction_name
To display the call graph with a function excluded from it, use the -Q option:
gprof main gmon.out -Qfunction_name
These options can be repeated and combined:
gprof main gmon.out -Pfunction_name -Qfunction_name -Qother_function_name
If you need to exclude a function from one report but keep every function in the other, use the -p or -q options.
Example:
Create a program:
#include <stdio.h>
#include <stdlib.h>

void func_a () { printf ("%s ", __FUNCTION__); }
void func_b () { printf ("%s ", __FUNCTION__); }
void func_c () { printf ("%s ", __FUNCTION__); }

int main ()
{
    func_a ();
    func_b ();
    func_c ();
    return EXIT_SUCCESS;
}
Compile it:
gcc main.c -pg -o main
And launch:
$ ./main
func_a func_b func_c
Generate profile reports:
If you need to print only the flat profile, call:
$ gprof main gmon.out -b -p
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_a
0.00 0.00 0.00 1 0.00 0.00 func_b
0.00 0.00 0.00 1 0.00 0.00 func_c
If you need to print the flat profile excluding functions func_a and func_c, plus the full call graph, call:
$ gprof main gmon.out -b -Pfunc_a -Pfunc_c -q
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_b
index % time self children called name
0.00 0.00 1/1 main [9]
[1] 0.0 0.00 0.00 1 func_a [1]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[2] 0.0 0.00 0.00 1 func_b [2]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[3] 0.0 0.00 0.00 1 func_c [3]
-----------------------------------------------
If you need to print the flat profile excluding functions func_a and func_c, and the call graph excluding func_b, call:
$ gprof main gmon.out -b -Pfunc_a -Pfunc_c -Qfunc_b
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_b
index % time self children called name
0.00 0.00 1/1 main [9]
[1] 0.0 0.00 0.00 1 func_a [1]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[3] 0.0 0.00 0.00 1 func_c [3]
-----------------------------------------------

Unless I've misunderstood what you're asking...
gprof a.out --no-time=function_name
works for me.

Related

Gprof shows regular functions in the app as called from <spontaneous>

So - I've been writing a language interpreter as a side project for a year now. Today I have finally decided to test its performance for the first time! Maybe I should have done that sooner... it turns out running a Fibonacci function in the language takes 600 times as long as the equivalent Python program. Whoopsy daisy.
Anyway... I'm off to profiling. In the call graph, gprof regards a few functions (namely critical ones) as called from <spontaneous>. It's a problem because understanding what calls these functions the most frequently will aid me.
I compile the project as a whole like so:
gcc *.c -o app.exe -g -pg -O2 -Wall -Wno-unused -LC:/msys64_new/mingw64/lib -lShlwapi
I use gprof like so:
gprof app.exe > gprofoutput.txt
Since it's a language interpreter, many of these functions (all of them?) might be called as part of a mutual recursion chain. Is it possible that this is the problem? If so, is gprof to be trusted at all with this program?
The functions called by <spontaneous> are compiled as part of the *.c files of the project, and are not called by an external library or anything that I know of.
Since I have already checked this, the other answers here on SO about <spontaneous> haven't solved my issue. What could be causing these functions to appear as called from <spontaneous>, and how can I fix it?
Example gprof output (_mcount_private and __fentry__ are of course irrelevant; I'm including them here in case they offer any clues):
index % time self children called name
<spontaneous>
[1] 46.9 1.38 0.00 _mcount_private [1]
-----------------------------------------------
<spontaneous>
[2] 23.1 0.68 0.00 __fentry__ [2]
-----------------------------------------------
<spontaneous>
[3] 18.7 0.06 0.49 object_string_new [3]
0.17 0.24 5687901/5687901 cell_table_set_value [4]
0.00 0.08 5687901/7583875 make_native_function_with_params [7]
0.00 0.00 13271769/30578281 parser_parse [80]
-----------------------------------------------
0.17 0.24 5687901/5687901 object_string_new [3]
[4] 14.1 0.17 0.24 5687901 cell_table_set_value [4]
0.12 0.05 5687901/5930697 table_set_value_directly [6]
0.02 0.04 5687901/7341054 table_get_value_directly [9]
0.01 0.00 5687901/5930694 object_cell_new [31]
-----------------------------------------------
<spontaneous>
[5] 7.0 0.07 0.14 vm_interpret_frame [5]
0.01 0.05 1410341/1410345 cell_table_get_value_cstring_key [13]
0.01 0.02 242786/242794 cell_table_set_value_cstring_key [19]
0.02 0.00 3259885/3502670 object_thread_pop_eval_stack [22]
0.01 0.00 242785/242786 value_array_free [28]
0.00 0.01 242785/242785 vm_call_object [34]
0.00 0.00 681987/1849546 value_compare [32]
0.00 0.00 485570/31306651 table_init [20]
0.00 0.00 242785/242788 cell_table_free [38]
0.00 0.00 242785/25375951 cell_table_init [29]
0.00 0.00 1/1 object_load_attribute [50]
0.00 0.00 1/1 object_load_attribute_cstring_key [52]
0.00 0.00 1/2 object_user_function_new [56]
0.00 0.00 2/33884613 copy_cstring [17]
0.00 0.00 1/5687909 object_function_set_name [25]
0.00 0.00 1/17063722 copy_null_terminated_cstring [23]
0.00 0.00 1/72532402 allocate [21]
0.00 0.00 3502671/3502671 object_thread_push_eval_stack [81]
0.00 0.00 1167557/1167557 object_as_string [85]
0.00 0.00 681988/681995 two_bytes_to_short [86]
0.00 0.00 485572/485578 value_array_make [88]
0.00 0.00 242786/242786 object_thread_push_frame [96]
0.00 0.00 242786/242786 object_thread_peek_frame [95]
0.00 0.00 242785/242785 object_thread_pop_frame [97]
0.00 0.00 242785/485571 vm_import_module [89]
0.00 0.00 2/1167575 object_value_is [83]
-----------------------------------------------
..... etc .........
I'm running Mingw-w64 GCC on Windows 7.
From the gprof manual:
If the identity of the callers of a function cannot be determined, a dummy caller-line is printed which has `<spontaneous>' as the "caller's name" and all other fields blank. This can happen for signal handlers.
Looks like your caller's name is unknown to gprof. If any potential caller (including async dispatch, if you're using such) is compiled without symbols, the callers' names would not be known. What third-party libraries are you using? Can you get debugging symbols for them?
You can obtain Windows symbol packages, though I don't know which libraries are covered. That page also discusses using Microsoft's Symbol Server instead of downloading (potentially out-of-date) symbol packages.
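If it helps to see the mechanism in isolation, here is a minimal POSIX sketch (Linux-style; alarm/SIGALRM are not available under plain MinGW, so treat it purely as an illustration of the manual excerpt above): the only thing that ever invokes on_alarm is the kernel's signal delivery, so nothing in the profiled code is recorded as its caller.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t alarms = 0;

static void heavy_work(void)
{
    /* burn some CPU so the profiler has something to sample */
    for (volatile long i = 0; i < 100000000L; ++i)
        ;
}

static void on_alarm(int sig)
{
    (void)sig;
    heavy_work();
    alarms++;
    alarm(1);            /* re-arm the one-second timer */
}

int main(void)
{
    signal(SIGALRM, on_alarm);
    alarm(1);
    while (alarms < 3)
        pause();         /* sleep until the next signal arrives */
    printf("handled %d alarms\n", (int)alarms);
    return 0;
}
Compiled with gcc -pg demo.c -o demo and run, gprof's call graph should list on_alarm with <spontaneous> as its caller, while heavy_work is still correctly attributed to on_alarm.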

gprof : function 'etext' is taking 100.05% of time running

I used gprof to get a profile of a c code which is running too slowly. Here is what I get:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.05 0.16 0.16 etext
0.00 0.16 0.00 90993 0.00 0.00 Nel_wind
0.00 0.16 0.00 27344 0.00 0.00 calc_crab_dens
0.00 0.16 0.00 17472 0.00 0.00 Nel_radio
0.00 0.16 0.00 1786 0.00 0.00 sync
0.00 0.16 0.00 1 0.00 0.00 _fini
0.00 0.16 0.00 1 0.00 0.00 calc_ele
0.00 0.16 0.00 1 0.00 0.00 ic
0.00 0.16 0.00 1 0.00 0.00 initialize
0.00 0.16 0.00 1 0.00 0.00 make_table
I don't know what "etext" means or why it is taking 100.05% of the running time. Thanks for your help!
I was having a similar issue and it was caused by me calling gprof with a different executable.
The accident occurred because I was recompiling with different options and naively called gprof with the same executable name on two different gmon.out files that were generated with different executables.
gprof exec1 exec1.gmon.out # Good, expected output
gprof exec1 exec2.gmon.out # Weird etext function with 0 calls, but lots of time consumed
Make sure you're not doing something similar.
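A low-tech way to avoid the mix-up is to rename gmon.out right after each run, so every profile stays paired with the binary that produced it (the file names and compile options below are only illustrative):
gcc -pg -O0 -o exec1 main.c
./exec1 && mv gmon.out exec1.gmon.out
gcc -pg -O2 -o exec2 main.c
./exec2 && mv gmon.out exec2.gmon.out
gprof exec1 exec1.gmon.out    # analyze each binary with its own profile
gprof exec2 exec2.gmon.out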

Is GNU gprof buggy?

I have a C program that calls a function pi_calcPiItem() 600000000 times through the function pi_calcPiBlock. So to analyze the time spent in the functions I used GNU gprof. The result seems to be erroneous since all calls are attributed to main() instead. Furthermore the call graph does not make any sense:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
61.29 9.28 9.28 pi_calcPiItem
15.85 11.68 2.40 pi_calcPiBlock
11.96 13.49 1.81 _mcount_private
9.45 14.92 1.43 __fentry__
1.45 15.14 0.22 pow
0.00 15.14 0.00 600000000 0.00 0.00 main
Call graph
granularity: each sample hit covers 4 byte(s) for 0.07% of 15.14 seconds
index % time self children called name
<spontaneous>
[1] 61.3 9.28 0.00 pi_calcPiItem [1]
-----------------------------------------------
<spontaneous>
[2] 15.9 2.40 0.00 pi_calcPiBlock [2]
0.00 0.00 600000000/600000000 main [6]
-----------------------------------------------
<spontaneous>
[3] 12.0 1.81 0.00 _mcount_private [3]
-----------------------------------------------
<spontaneous>
[4] 9.4 1.43 0.00 __fentry__ [4]
-----------------------------------------------
<spontaneous>
[5] 1.5 0.22 0.00 pow [5]
-----------------------------------------------
6 main [6]
0.00 0.00 600000000/600000000 pi_calcPiBlock [2]
[6] 0.0 0.00 0.00 600000000+6 main [6]
6 main [6]
-----------------------------------------------
Is this a bug or do I have to configure the program somehow?
And what does <spontaneous> mean?
EDIT (more insight for you)
The code is all about the calculation of pi:
#define PI_BLOCKSIZE (100000000)
#define PI_BLOCKCOUNT (6)
#define PI_THRESHOLD (PI_BLOCKSIZE * PI_BLOCKCOUNT)
int32_t main(int32_t argc, char* argv[]) {
    double result;
    for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE ) {
        pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
    }
    printf("pi = %f\n", result);
    return 0;
}

static void pi_calcPiBlock(double* result, int32_t start, int32_t end) {
    double piItem;
    for ( int32_t i = start; i < end; ++i ) {
        pi_calcPiItem(&piItem, i);
        *result += piItem;
    }
}

static void pi_calcPiItem(double* piItem, int32_t index) {
    *piItem = 4.0 * (pow(-1.0, index) / (2.0 * index + 1.0));
}
And this is how I got the results (executed on Windows with the help of Cygwin):
> gcc -std=c99 -o pi *.c -pg -fno-inline-small-functions
> ./pi.exe
> gprof.exe pi.exe
Try:
- Using the noinline, noclone function attributes instead of -fno-inline-small-functions (by disassembling main I could see that -fno-inline-small-functions doesn't stop the inlining)
- Linking your program statically (-static)
You should also initialize result to 0.0 in main.
This worked for me on Linux, x86-64:
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define PI_BLOCKSIZE (100000000)
#define PI_BLOCKCOUNT (6)
#define PI_THRESHOLD (PI_BLOCKSIZE * PI_BLOCKCOUNT)

static void pi_calcPiItem(double* piItem, int32_t index);
static void pi_calcPiBlock(double* result, int32_t start, int32_t end);

int32_t main(int32_t argc, char* argv[]) {
    double result;
    result = 0.0;
    for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE ) {
        pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
    }
    printf("pi = %f\n", result);
    return 0;
}

__attribute__((noinline, noclone))
static void pi_calcPiBlock(double* result, int32_t start, int32_t end) {
    double piItem;
    for ( int32_t i = start; i < end; ++i ) {
        pi_calcPiItem(&piItem, i);
        *result += piItem;
    }
}

__attribute__((noinline, noclone))
static void pi_calcPiItem(double* piItem, int32_t index) {
    *piItem = 4.0 * (pow(-1.0, index) / (2.0 * index + 1.0));
}
Building the Code
$ cc pi.c -o pi -Os -Wall -g3 -I. -std=c99 -pg -static -lm
Output
$ ./pi && gprof ./pi
pi = 3.141593
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
85.61 22.55 22.55 __ieee754_pow_sse2
4.75 23.80 1.25 pow
4.14 24.89 1.09 600000000 1.82 1.82 pi_calcPiItem
2.54 25.56 0.67 __exp1
0.91 25.80 0.24 pi_calcPiBlock
0.53 25.94 0.14 matherr
0.47 26.07 0.13 __lseek_nocancel
0.38 26.17 0.10 frame_dummy
0.34 26.26 0.09 __ieee754_exp_sse2
0.32 26.34 0.09 __profile_frequency
0.00 26.34 0.00 1 0.00 0.00 main
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.04% of 26.34 seconds
index % time self children called name
<spontaneous>
[1] 85.6 22.55 0.00 __ieee754_pow_sse2 [1]
-----------------------------------------------
<spontaneous>
[2] 5.0 0.24 1.09 pi_calcPiBlock [2]
1.09 0.00 600000000/600000000 pi_calcPiItem [4]
-----------------------------------------------
<spontaneous>
[3] 4.7 1.25 0.00 pow [3]
-----------------------------------------------
1.09 0.00 600000000/600000000 pi_calcPiBlock [2]
[4] 4.1 1.09 0.00 600000000 pi_calcPiItem [4]
-----------------------------------------------
<spontaneous>
[5] 2.5 0.67 0.00 __exp1 [5]
-----------------------------------------------
<spontaneous>
[6] 0.5 0.14 0.00 matherr [6]
-----------------------------------------------
<spontaneous>
[7] 0.5 0.13 0.00 __lseek_nocancel [7]
-----------------------------------------------
<spontaneous>
[8] 0.4 0.10 0.00 frame_dummy [8]
-----------------------------------------------
<spontaneous>
[9] 0.3 0.09 0.00 __ieee754_exp_sse2 [9]
-----------------------------------------------
<spontaneous>
[10] 0.3 0.09 0.00 __profile_frequency [10]
-----------------------------------------------
0.00 0.00 1/1 __libc_start_main [827]
[11] 0.0 0.00 0.00 1 main [11]
-----------------------------------------------
Comments
As expected, pow() is the bottleneck. While pi is running, perf top (a sampling-based system profiler) also shows __ieee754_pow_sse2 taking 60%+ of CPU. Changing pow(-1.0,index) to ((index & 1) ? -1.0 : 1.0), as @Mike Dunlavey suggested, makes the code roughly 4 times faster.
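For reference, that substitution amounts to something like this (a sketch of the alternation trick applied to pi_calcPiItem, not necessarily the exact change the commenter had in mind):
__attribute__((noinline, noclone))
static void pi_calcPiItem(double* piItem, int32_t index) {
    /* (-1)^index merely alternates the sign, so a test on the low bit
       replaces the expensive pow() call */
    double sign = (index & 1) ? -1.0 : 1.0;
    *piItem = 4.0 * (sign / (2.0 * index + 1.0));
}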
In the 'man gprof' page, here is the explanation of "spontaneous":
Parents that are not themselves profiled will have the time of
their profiled children propagated to them, but they will appear to be
spontaneously invoked in the call graph listing, and will not have
their time propagated further. Similarly, signal catchers, even
though profiled, will appear to be spontaneous (although for more
obscure reasons). Any profiled children of signal catchers should
have their times propagated properly, unless the signal catcher was
invoked during the execution of the profiling routine, in which case
all is lost.

C: reducing runtime for optimization

I have below output from gprof for my program:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 30002 0.00 0.00 insert
0.00 0.00 0.00 10124 0.00 0.00 getNode
0.00 0.00 0.00 3000 0.00 0.00 search
0.00 0.00 0.00 1 0.00 0.00 initialize
I have done optimizations, and the run time I now have is 0.01 secs (this is calculated on a server where I upload my code), which is the lowest I am getting at the moment. I am not able to reduce it further, though I want to. Does the 0.01 sec run time of my program have anything to do with the sampling time I see above in the gprof output?
Call graph is as below:
gprof -q ./a.out gmon.out
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) no time propagated
index % time self children called name
0.00 0.00 30002/30002 main [10]
[1] 0.0 0.00 0.00 30002 insert [1]
0.00 0.00 10124/10124 getNode [2]
-----------------------------------------------
0.00 0.00 10124/10124 insert [1]
[2] 0.0 0.00 0.00 10124 getNode [2]
-----------------------------------------------
0.00 0.00 3000/3000 main [10]
[3] 0.0 0.00 0.00 3000 search [3]
-----------------------------------------------
0.00 0.00 1/1 main [10]
[4] 0.0 0.00 0.00 1 initialize [4]
-----------------------------------------------
While using `time /bin/sh -c ' ./a.out < inp.in '` on my machine, I get the output below, which varies slightly on every run.
real 0m0.024s
user 0m0.016s
sys 0m0.004s
real 0m0.017s
user 0m0.008s
sys 0m0.004s
I am a bit confused about how to correlate the time output with the gprof output.
According to your other question, you got it from 8 seconds down to 0.01 seconds.
That's pretty good.
Now if you want to go further, first do as #Peter suggested in his comment.
Run the code many times inside main() so it runs long enough to get samples.
Then you could try my favorite technique.
It will be much more informative than gprof.
P.S. Don't worry about CPU percent.
All it tells you is whether your machine is busy rather than waiting on I/O.
It does not tell you anything about your program itself.
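As a rough sketch of the "run it many times" suggestion above (run_workload and REPEAT_COUNT are hypothetical stand-ins for the question's initialize/insert/search driver, not code from the question):
#include <stdio.h>

#define REPEAT_COUNT 1000

/* hypothetical stand-in for one full initialize + insert + search pass */
static void run_workload(void)
{
    volatile long sink = 0;
    for (long i = 0; i < 100000; ++i)
        sink += i;
}

int main(void)
{
    /* Repeating the whole workload stretches the run time so that gprof's
       roughly 10 ms sampling clock can accumulate a meaningful number of hits. */
    for (int i = 0; i < REPEAT_COUNT; ++i)
        run_workload();
    printf("done\n");
    return 0;
}
Compiled with -pg and run this way, the flat profile should finally show nonzero self-seconds for the hot functions instead of a column of 0.00 entries.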

gprof command is not creating proper out.txt

First of all, I'm running Mac OS X 10.7.1. I've installed everything properly, Xcode 4 and all the libraries, to work with the C language.
I'm having trouble running the gprof command in the shell. I'll explain step by step what I'm doing and the output I'm receiving.
Step 1:
~ roger$ cd Path/to/my/workspace
~ roger$ ls
Output (Step 1):
queue.c queue.h testqueue.c
Step 2:
~ roger$ gcc -c -g -pg queue.c
~ roger$ ls
Output (Step 2):
queue.c queue.h queue.o testqueue.c
Step 3:
~ roger$ gcc -o testqueue -g -pg queue.o testqueue.c
~ roger$ ls
Output (Step 3):
queue.c queue.h queue.o testqueue testqueue.c
Step 4:
~ roger$ ./testqueue
~ roger$ ls
Output (Step 4):
enqueue element 16807
head=0,tail=1
enqueue element 282475249
head=0,tail=2
enqueue element 1622650073
head=0,tail=3
enqueue element 984943658
head=0,tail=4
enqueue element 1144108930
head=0,tail=5
enqueue element 470211272
head=0,tail=6
enqueue element 101027544
head=0,tail=7
enqueue element 1457850878
head=0,tail=8
enqueue element 1458777923
head=0,tail=9
enqueue element 2007237709
head=0,tail=10
queue is full
dequeue element 16807
dequeue element 282475249
dequeue element 1622650073
dequeue element 984943658
dequeue element 1144108930
dequeue element 470211272
dequeue element 101027544
dequeue element 1457850878
dequeue element 1458777923
dequeue element 2007237709
queue is empty
gmon.out queue.h testqueue
queue.c queue.o testqueue.c
Step 5:
~ roger$ gprof -b testqueue gmon.out > out.txt
~ roger$ nano out.txt
Output (Step 5):
GNU nano 2.0.6 File: out.txt
granularity: each sample hit covers 4 byte(s) no time propagated
called/total parents
index %time self descendents called+self name index
called/total children
^L
granularity: each sample hit covers 4 byte(s) no time accumulated
% cumulative self self total
time seconds seconds calls ms/call ms/call name
^L
Index by function name
By contrast, the output file should show something like this:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
33.34 0.02 0.02 7208 0.00 0.00 open
16.67 0.03 0.01 244 0.04 0.12 offtime
16.67 0.04 0.01 8 1.25 1.25 memccpy
16.67 0.05 0.01 7 1.43 1.43 write
16.67 0.06 0.01 mcount
0.00 0.06 0.00 236 0.00 0.00 tzset
0.00 0.06 0.00 192 0.00 0.00 tolower
0.00 0.06 0.00 47 0.00 0.00 strlen
0.00 0.06 0.00 45 0.00 0.00 strchr
0.00 0.06 0.00 1 0.00 50.00 main
0.00 0.06 0.00 1 0.00 0.00 memcpy
0.00 0.06 0.00 1 0.00 10.11 print
0.00 0.06 0.00 1 0.00 0.00 profil
0.00 0.06 0.00 1 0.00 50.00 report
...
Instead, it shows blank fields.
I searched here and found nothing helpful at all. I googled it, but found the same thing.
I would be very grateful if anyone could help me, please.
gprof does not work on OS X. The system call it needs was removed several versions ago. It's not clear why the utility still ships. The alternatives are to use dtrace and/or sample.
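For a quick look on OS X, the bundled sample tool can capture a time profile of a running process; something along these lines (check man sample for the exact options on your version, and note that the target has to stay alive long enough to be sampled, which a short test program like testqueue may not):
./testqueue &
sample testqueue 5 -file sample.txt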
There is no need to give gmon.out in the last line; just use gprof -b testqueue > out.txt.
see http://www.network-theory.co.uk/docs/gccintro/gccintro_80.html for further reference
