Is GNU gprof buggy? - c

I have a C program that calls a function pi_calcPiItem() 600000000 times through the function pi_calcPiBlock. So to analyze the time spent in the functions I used GNU gprof. The result seems to be erroneous since all calls are attributed to main() instead. Furthermore the call graph does not make any sense:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
61.29 9.28 9.28 pi_calcPiItem
15.85 11.68 2.40 pi_calcPiBlock
11.96 13.49 1.81 _mcount_private
9.45 14.92 1.43 __fentry__
1.45 15.14 0.22 pow
0.00 15.14 0.00 600000000 0.00 0.00 main
Call graph
granularity: each sample hit covers 4 byte(s) for 0.07% of 15.14 seconds
index % time self children called name
<spontaneous>
[1] 61.3 9.28 0.00 pi_calcPiItem [1]
-----------------------------------------------
<spontaneous>
[2] 15.9 2.40 0.00 pi_calcPiBlock [2]
0.00 0.00 600000000/600000000 main [6]
-----------------------------------------------
<spontaneous>
[3] 12.0 1.81 0.00 _mcount_private [3]
-----------------------------------------------
<spontaneous>
[4] 9.4 1.43 0.00 __fentry__ [4]
-----------------------------------------------
<spontaneous>
[5] 1.5 0.22 0.00 pow [5]
-----------------------------------------------
6 main [6]
0.00 0.00 600000000/600000000 pi_calcPiBlock [2]
[6] 0.0 0.00 0.00 600000000+6 main [6]
6 main [6]
-----------------------------------------------
Is this a bug or do I have to configure the program somehow?
And what does <spontaneous> mean?
EDIT (more insight for you)
The code is all about the calculation of pi:
#define PI_BLOCKSIZE (100000000)
#define PI_BLOCKCOUNT (6)
#define PI_THRESHOLD (PI_BLOCKSIZE * PI_BLOCKCOUNT)
int32_t main(int32_t argc, char* argv[]) {
double result;
for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE ) {
pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
}
printf("pi = %f\n",result);
return 0;
}
static void pi_calcPiBlock(double* result, int32_t start, int32_t end) {
double piItem;
for ( int32_t i = start; i < end; ++i ) {
pi_calcPiItem(&piItem, i);
*result += piItem;
}
}
static void pi_calcPiItem(double* piItem, int32_t index) {
*piItem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
}
And this is how I got the results (executed on Windows with the help of Cygwin):
> gcc -std=c99 -o pi *.c -pg -fno-inline-small-functions
> ./pi.exe
> gprof.exe pi.exe

Try:
Using the noinline, noclone function attributes instead of -fno-inline-small-functions
By disassembling main I could see that -fno-inline-small-functions doesn't stop inlining
Linking your program statically (-static)
You should also initialize result to 0.0 in main
This worked for me on Linux, x86-64:
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#define PI_BLOCKSIZE (100000000)
#define PI_BLOCKCOUNT (6)
#define PI_THRESHOLD (PI_BLOCKSIZE * PI_BLOCKCOUNT)
static void pi_calcPiItem(double* piItem, int32_t index);
static void pi_calcPiBlock(double* result, int32_t start, int32_t end);
int32_t main(int32_t argc, char* argv[]) {
double result;
result = 0.0;
for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE ) {
pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
}
printf("pi = %f\n",result);
return 0;
}
__attribute__((noinline, noclone))
static void pi_calcPiBlock(double* result, int32_t start, int32_t end) {
double piItem;
for ( int32_t i = start; i < end; ++i ) {
pi_calcPiItem(&piItem, i);
*result += piItem;
}
}
__attribute__((noinline, noclone))
static void pi_calcPiItem(double* piItem, int32_t index) {
*piItem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
}
Building the Code
$ cc pi.c -o pi -Os -Wall -g3 -I. -std=c99 -pg -static -lm
Output
$ ./pi && gprof ./pi
pi = 3.141593
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
85.61 22.55 22.55 __ieee754_pow_sse2
4.75 23.80 1.25 pow
4.14 24.89 1.09 600000000 1.82 1.82 pi_calcPiItem
2.54 25.56 0.67 __exp1
0.91 25.80 0.24 pi_calcPiBlock
0.53 25.94 0.14 matherr
0.47 26.07 0.13 __lseek_nocancel
0.38 26.17 0.10 frame_dummy
0.34 26.26 0.09 __ieee754_exp_sse2
0.32 26.34 0.09 __profile_frequency
0.00 26.34 0.00 1 0.00 0.00 main
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.04% of 26.34 seconds
index % time self children called name
<spontaneous>
[1] 85.6 22.55 0.00 __ieee754_pow_sse2 [1]
-----------------------------------------------
<spontaneous>
[2] 5.0 0.24 1.09 pi_calcPiBlock [2]
1.09 0.00 600000000/600000000 pi_calcPiItem [4]
-----------------------------------------------
<spontaneous>
[3] 4.7 1.25 0.00 pow [3]
-----------------------------------------------
1.09 0.00 600000000/600000000 pi_calcPiBlock [2]
[4] 4.1 1.09 0.00 600000000 pi_calcPiItem [4]
-----------------------------------------------
<spontaneous>
[5] 2.5 0.67 0.00 __exp1 [5]
-----------------------------------------------
<spontaneous>
[6] 0.5 0.14 0.00 matherr [6]
-----------------------------------------------
<spontaneous>
[7] 0.5 0.13 0.00 __lseek_nocancel [7]
-----------------------------------------------
<spontaneous>
[8] 0.4 0.10 0.00 frame_dummy [8]
-----------------------------------------------
<spontaneous>
[9] 0.3 0.09 0.00 __ieee754_exp_sse2 [9]
-----------------------------------------------
<spontaneous>
[10] 0.3 0.09 0.00 __profile_frequency [10]
-----------------------------------------------
0.00 0.00 1/1 __libc_start_main [827]
[11] 0.0 0.00 0.00 1 main [11]
-----------------------------------------------
Comments
As expected, pow() is the bottleneck. While pi is running, perf top (sampling based system profiler) also shows __ieee754_pow_sse2 taking 60%+ of CPU. Changing pow(-1.0,index) to ((i & 1) ? -1.0 : 1.0) as #Mike Dunlavey suggested makes the code roughly 4 times faster.

In 'man gprof' page, here is explanation for "spontaneous":
Parents that are not themselves profiled will have the time of
their profiled children propagated to them, but they will appear to be
spontaneously invoked in the call graph listing, and will not have
their time propagated further. Similarly, signal catchers, even
though profiled, will appear to be spontaneous (although for more
obscure reasons). Any profiled children of signal catchers should
have their times propagated properly, unless the signal catcher was
invoked during the execution of the profiling routine, in which case
all is lost.

Related

KNN Algorithm is Giving good Accuracy with Bad Confusion Matrix Results

I have data with multilabel classification. I used KNN model in order to classify it. The number of labels are 15, I got accuracy results for each label, averaged the results to get the accuracy of the model which is 93%.
The confusion matrix is showing bad numbers.
Would you tell me what does this mean? Is it overfitting? How can I solve my problem?
Accuracy and mean absolute error (mae) code
Input:
# Getting the accuracy of the model
y_pred1 = level_1_knn_model.predict(X_val1)
accuracy = (sum(y_val1==y_pred1)/y_val1.shape[0])*100
accuracy = sum(accuracy)/len(accuracy)
print("Accuracy: "+str(accuracy)+"%\n")
# Getting the mean absolute error
mae1 = mean_absolute_error(y_val1, y_pred1)
print("Mean Absolute Error: "+str(mae1))
Output:
Accuracy: [96.55462575 97.82146336 99.23207908 95.39247451 98.69340807 74.22793801
78.67975909 97.47825108 99.80189098 77.67264969 91.69399776 99.97084683
99.42621267 99.32682688 99.74159693]%
Accuracy: 93.71426804569977%
Mean Absolute Error: 9.703818402273944
Confusion Matrix and classification report code
Input:
# Calculate the confusion matrix
cMatrix1 = confusion_matrix(y_val1.argmax(axis=1), y_pred1.argmax(axis=1))
# Plot the confusion matrix
plt.figure(figsize=(11,10))
sns.heatmap(cMatrix1, annot=True, fmt='g')
# Calculate the classification report
classReport1 = classification_report(y_val1, y_pred1)
print("\nClassification Report:")
print(classReport1)
Output:
Classification Report:
precision recall f1-score support
0 0.08 0.00 0.01 5053
1 0.03 0.00 0.01 3017
2 0.00 0.00 0.00 1159
3 0.07 0.00 0.01 6644
4 0.00 0.00 0.00 1971
5 0.58 0.65 0.61 47222
6 0.39 0.33 0.36 27302
7 0.02 0.00 0.00 3767
8 0.00 0.00 0.00 299
9 0.58 0.61 0.60 40823
10 0.13 0.02 0.03 11354
11 0.00 0.00 0.00 44
12 0.00 0.00 0.00 866
13 0.00 0.00 0.00 1016
14 0.00 0.00 0.00 390
micro avg 0.54 0.43 0.48 150927
macro avg 0.13 0.11 0.11 150927
weighted avg 0.43 0.43 0.42 150927
samples avg 0.43 0.43 0.43 150927

Gprof shows regular functions in the app as called from <spontaneous>

So - I've been writing a language interpreter as a side project for a year now. Today I have finally decided to test its performance for the first time! Maybe I should have done that sooner... turns out running a Fibonacci function in the language takes x600 the time of the equivalent Python program. Whoopsy daisy.
Anyway... I'm off to profiling. In the call graph, gprof regards a few functions (namely critical ones) as called from <spontaneous>. It's a problem because understanding what calls these functions the most frequently will aid me.
I compile the project as a whole like so:
gcc *.c -o app.exe -g -pg -O2 -Wall -Wno-unused -LC:/msys64_new/mingw64/lib -lShlwapi
I use gprof like so:
gprof app.exe > gprofoutput.txt
Since it's a language interpreter, many of these functions (all of them?) might be called as part of a mutual recursion chain. Is it possible that this is the problem? If so, is gprof to be trusted at all with this program?
The functions called by <spontaneous> are compiled as part of the *.c files of the project, and are not called by an external library or anything that I know of.
Because I have checked this, the other answers here on SO about <spontaneous> haven't solved my issue. What can be causing these functions to appear as called from <spontaneous> and how can I fix this?
Example gprof output (_mcount_private and __fentry__ are of course irrelevant - including them here in case it grants any clues):
index % time self children called name
<spontaneous>
[1] 46.9 1.38 0.00 _mcount_private [1]
-----------------------------------------------
<spontaneous>
[2] 23.1 0.68 0.00 __fentry__ [2]
-----------------------------------------------
<spontaneous>
[3] 18.7 0.06 0.49 object_string_new [3]
0.17 0.24 5687901/5687901 cell_table_set_value [4]
0.00 0.08 5687901/7583875 make_native_function_with_params [7]
0.00 0.00 13271769/30578281 parser_parse [80]
-----------------------------------------------
0.17 0.24 5687901/5687901 object_string_new [3]
[4] 14.1 0.17 0.24 5687901 cell_table_set_value [4]
0.12 0.05 5687901/5930697 table_set_value_directly [6]
0.02 0.04 5687901/7341054 table_get_value_directly [9]
0.01 0.00 5687901/5930694 object_cell_new [31]
-----------------------------------------------
<spontaneous>
[5] 7.0 0.07 0.14 vm_interpret_frame [5]
0.01 0.05 1410341/1410345 cell_table_get_value_cstring_key [13]
0.01 0.02 242786/242794 cell_table_set_value_cstring_key [19]
0.02 0.00 3259885/3502670 object_thread_pop_eval_stack [22]
0.01 0.00 242785/242786 value_array_free [28]
0.00 0.01 242785/242785 vm_call_object [34]
0.00 0.00 681987/1849546 value_compare [32]
0.00 0.00 485570/31306651 table_init [20]
0.00 0.00 242785/242788 cell_table_free [38]
0.00 0.00 242785/25375951 cell_table_init [29]
0.00 0.00 1/1 object_load_attribute [50]
0.00 0.00 1/1 object_load_attribute_cstring_key [52]
0.00 0.00 1/2 object_user_function_new [56]
0.00 0.00 2/33884613 copy_cstring [17]
0.00 0.00 1/5687909 object_function_set_name [25]
0.00 0.00 1/17063722 copy_null_terminated_cstring [23]
0.00 0.00 1/72532402 allocate [21]
0.00 0.00 3502671/3502671 object_thread_push_eval_stack [81]
0.00 0.00 1167557/1167557 object_as_string [85]
0.00 0.00 681988/681995 two_bytes_to_short [86]
0.00 0.00 485572/485578 value_array_make [88]
0.00 0.00 242786/242786 object_thread_push_frame [96]
0.00 0.00 242786/242786 object_thread_peek_frame [95]
0.00 0.00 242785/242785 object_thread_pop_frame [97]
0.00 0.00 242785/485571 vm_import_module [89]
0.00 0.00 2/1167575 object_value_is [83]
-----------------------------------------------
..... etc .........
I'm running Mingw-w64 GCC on Windows 7.
From the gprof manual:
If the identity of the callers of a function cannot be determined, a dummy caller-line is printed which has `' as the "caller's name" and all other fields blank. This can happen for signal handlers.
Looks like your caller's name is unknown to gprof. If any potential caller (including async dispatch, if you're using such) is compiled without symbols, the callers names would not be known. What third party libraries are you using? Can you get debugging symbols for them?
You can obtain Windows symbol packages, though I don't know which libraries are covered. That page also discusses using Microsoft's Symbol Server instead of downloading (potentially out-of-date) symbols packages.

Tracing Mhash Files (Hashing)

I am trying to trace how this open source program, mhash computes it's hashing
I can run the program successfully by using using the following commands:
gcc -o example example.c -lmhash
(also, mhash is currently installed, and I am running Ubuntu Linux)
Mhash can be found here: http://mhash.sourceforge.net/
and the example that I have tried is here:
#include <mhash.h>
#include <stdio.h>
int main()
{
char password[] = "Jefe";
int keylen = 4;
char data[] = "what do ya want for nothing?";
int datalen = 28;
MHASH td;
unsigned char *mac;
int j;
td = mhash_hmac_init(MHASH_MD5, password, keylen,
mhash_get_hash_pblock(MHASH_MD5));
mhash(td, data, datalen);
mac = mhash_hmac_end(td);
/*
* The output should be 0x750c783e6ab0b503eaa86e310a5db738
* according to RFC 2104.
*/
printf("0x");
for (j = 0; j < mhash_get_block_size(MHASH_MD5); j++) {
printf("%.2x", mac[j]);
}
printf("\n");
exit(0);
}
I have read the API's, it has very well documentations, but there are soo many files, I do not know from which areas it inherits it's algorithms from?
Thanks for your time and help in advance
Your question seems a little vague to me ... I'm not sure I fully understand it. I'll adventure myself into an answer though.
If you simply don't know what gets executed for crunching that MD5 hash the easiest way to get into it is probably to attach yourself with a debugger on this example program of yours. Make sure you have the debug flags enabled on your mhash library (which seem to be on by default), then step in mhash and see where that gets you. You cannot miss anything this way.
In gdb it would look something like this (You'd probably want to use an IDE - eclipse perhaps, to make it a LOT prettier):
$ gdb ./test.exe
..
Reading symbols from /home/B41655/workspace/ctest/test.exe...done.
(gdb) break main
Breakpoint 1 at 0x4011af: file test.c, line 5.
(gdb) run
Starting program: /home/B41655/workspace/ctest/test.exe
[New Thread 10200.0x205c]
[New Thread 10200.0x27b0]
Breakpoint 1, main () at test.c:5
5 char password[] = "Jefe";
(gdb) s
6 int keylen = 4;
(gdb) s
7 char data[] = "what do ya want for nothing?";
(gdb) s
8 int datalen = 28;
(gdb) s
13 td = mhash_hmac_init(MHASH_MD5, password, keylen,
(gdb) s
mhash_get_hash_pblock (type=MHASH_MD5) at mhash.c:438
438 {
(gdb) s
441 MHASH_ALG_LOOP(ret = p->hash_pblock);
and so on ...
If by any chance you want to passively get some sort a call graph of your example program execution you could do that with a profiler. Using gprof on this program would issue something like this (this would require your library/program recompiled with -pg flag):
index % time self children called name
0.00 0.00 17/17 main [81]
[2] 0.0 0.00 0.00 17 mhash_get_block_size [2]
-----------------------------------------------
0.00 0.00 1/9 mhash [14]
0.00 0.00 2/9 mhash_hmac_deinit [17]
0.00 0.00 2/9 mhash_hmac_init [20]
0.00 0.00 2/9 MD5Update [9]
0.00 0.00 2/9 MD5Final [10]
[3] 0.0 0.00 0.00 9 mutils_memcpy [3]
-----------------------------------------------
0.00 0.00 1/6 mhash_deinit [15]
0.00 0.00 1/6 mhash_hmac_init [20]
0.00 0.00 2/6 mhash_hmac_deinit [17]
0.00 0.00 2/6 MD5Final [10]
[4] 0.0 0.00 0.00 6 mutils_bzero [4]
-----------------------------------------------
0.00 0.00 1/6 mhash_hmac_end_m [19]
0.00 0.00 1/6 mhash_hmac_init [20]
0.00 0.00 4/6 mhash_init_int [12]
[5] 0.0 0.00 0.00 6 mutils_malloc [5]
-----------------------------------------------
0.00 0.00 2/6 MD5Update [9]
0.00 0.00 4/6 MD5Final [10]
[6] 0.0 0.00 0.00 6 mutils_word32nswap [6]
-----------------------------------------------
0.00 0.00 1/5 mhash_deinit [15]
0.00 0.00 4/5 mhash_hmac_deinit [17]
[7] 0.0 0.00 0.00 5 mutils_free [7]
-----------------------------------------------
0.00 0.00 2/4 MD5Update [9]
0.00 0.00 2/4 MD5Final [10]
[8] 0.0 0.00 0.00 4 MD5Transform [8]
-----------------------------------------------
0.00 0.00 1/4 mhash [14]
0.00 0.00 1/4 mhash_hmac_init [20]
0.00 0.00 2/4 mhash_hmac_deinit [17]
[9] 0.0 0.00 0.00 4 MD5Update [9]
0.00 0.00 2/9 mutils_memcpy [3]
0.00 0.00 2/6 mutils_word32nswap [6]
0.00 0.00 2/4 MD5Transform [8]
-----------------------------------------------
0.00 0.00 1/2 mhash_deinit [15]
0.00 0.00 1/2 mhash_hmac_deinit [17]
[10] 0.0 0.00 0.00 2 MD5Final [10]
0.00 0.00 4/6 mutils_word32nswap [6]
0.00 0.00 2/4 MD5Transform [8]
0.00 0.00 2/9 mutils_memcpy [3]
0.00 0.00 2/6 mutils_bzero [4]
-----------------------------------------------
0.00 0.00 2/2 mhash_init_int [12]
[11] 0.0 0.00 0.00 2 MD5Init [11]
-----------------------------------------------
0.00 0.00 1/2 mhash_hmac_deinit [17]
0.00 0.00 1/2 mhash_hmac_init [20]
[12] 0.0 0.00 0.00 2 mhash_init_int [12]
0.00 0.00 4/6 mutils_malloc [5]
0.00 0.00 2/2 mutils_memset [13]
0.00 0.00 2/2 MD5Init [11]
-----------------------------------------------
0.00 0.00 2/2 mhash_init_int [12]
[13] 0.0 0.00 0.00 2 mutils_memset [13]
-----------------------------------------------
0.00 0.00 1/1 main [81]
[14] 0.0 0.00 0.00 1 mhash [14]
0.00 0.00 1/9 mutils_memcpy [3]
0.00 0.00 1/4 MD5Update [9]
-----------------------------------------------
0.00 0.00 1/1 mhash_hmac_deinit [17]
[15] 0.0 0.00 0.00 1 mhash_deinit [15]
0.00 0.00 1/6 mutils_bzero [4]
0.00 0.00 1/2 MD5Final [10]
0.00 0.00 1/5 mutils_free [7]
-----------------------------------------------
0.00 0.00 1/1 main [81]
[16] 0.0 0.00 0.00 1 mhash_get_hash_pblock [16]
-----------------------------------------------
0.00 0.00 1/1 mhash_hmac_end_m [19]
[17] 0.0 0.00 0.00 1 mhash_hmac_deinit [17]
0.00 0.00 4/5 mutils_free [7]
0.00 0.00 2/9 mutils_memcpy [3]
0.00 0.00 2/4 MD5Update [9]
0.00 0.00 2/6 mutils_bzero [4]
0.00 0.00 1/2 mhash_init_int [12]
0.00 0.00 1/2 MD5Final [10]
0.00 0.00 1/1 mhash_deinit [15]
-----------------------------------------------
0.00 0.00 1/1 main [81]
[18] 0.0 0.00 0.00 1 mhash_hmac_end [18]
0.00 0.00 1/1 mhash_hmac_end_m [19]
-----------------------------------------------
0.00 0.00 1/1 mhash_hmac_end [18]
[19] 0.0 0.00 0.00 1 mhash_hmac_end_m [19]
0.00 0.00 1/6 mutils_malloc [5]
0.00 0.00 1/1 mhash_hmac_deinit [17]
-----------------------------------------------
0.00 0.00 1/1 main [81]
[20] 0.0 0.00 0.00 1 mhash_hmac_init [20]
0.00 0.00 2/9 mutils_memcpy [3]
0.00 0.00 1/2 mhash_init_int [12]
0.00 0.00 1/6 mutils_malloc [5]
0.00 0.00 1/6 mutils_bzero [4]
0.00 0.00 1/4 MD5Update [9]
-----------------------------------------------
showing you which functions got executed and how they were called.

C: reducing runtime for optimization

I have below output from gprof for my program:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 30002 0.00 0.00 insert
0.00 0.00 0.00 10124 0.00 0.00 getNode
0.00 0.00 0.00 3000 0.00 0.00 search
0.00 0.00 0.00 1 0.00 0.00 initialize
I have done optimizations and the run time I have is 0.01 secs(this is being calculated on a server where I'm uploading my code) which is the least I am getting at the moment. I am not able to reduce it further, though I want to. Does the 0.01 sec run time of my program has anything to do with the sampling time I see above in gprof output.
Call graph is as below:
gprof -q ./a.out gmon.out
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) no time propagated
index % time self children called name
0.00 0.00 30002/30002 main [10]
[1] 0.0 0.00 0.00 30002 insert [1]
0.00 0.00 10124/10124 getNode [2]
-----------------------------------------------
0.00 0.00 10124/10124 insert [1]
[2] 0.0 0.00 0.00 10124 getNode [2]
-----------------------------------------------
0.00 0.00 3000/3000 main [10]
[3] 0.0 0.00 0.00 3000 search [3]
-----------------------------------------------
0.00 0.00 1/1 main [10]
[4] 0.0 0.00 0.00 1 initialize [4]
-----------------------------------------------
While using `time /bin/sh -c ' ./a.out < inp.in '` on my machine I get below which varies slightly on every run .
real 0m0.024s
user 0m0.016s
sys 0m0.004s
real 0m0.017s
user 0m0.008s
sys 0m0.004s
I am bit confused how to correlate time output and gprof o/p
According to your other question, you got it from 8 seconds down to 0.01 seconds.
That's pretty good.
Now if you want to go further, first do as #Peter suggested in his comment.
Run the code many times inside main() so it runs long enough to get samples.
Then you could try my favorite technique.
It will be much more informative than gprof.
P.S. Don't worry about CPU percent.
All it tells is if your machine is busy and not doing much I/O.
It does not tell you anything about your program.

Excluding a function from gprof results

I want to exclude some functions from the output generated by gprof. In other words, I do not want them to be included when calculating percentage time spent by each function during execution. I read at one place -E option can be used.
However I'm using gprof -E function_to_be_exluded my_program_name, but nothing happens. The manual says it is depreciated and you should use symspecs instead. However, I have wasted half an hour trying to figure out how to achieve it with symspecs, but no luck. Anyone can kindly help me in this.
Exactly, gprof -e -E are deprecated and superseded by usage of newer relevant options that have argument - symspecs. So try using:
gprof --no-time=symspec
The -n option causes "gprof", in its call graph analysis, not to propagate times for
symbols matching symspec.
e.g.
gprof --no-time=name_of_function_you_dont_want_to_profile.
Use this along with your other gprof options (-E -e definitely ruled out)
According to the man:
for display flat profile and exclude function from it you need use -P option:
gprof main gmon.out -Pfunction_name
for display call graph and exclude function from it you need use -Q option:
gprof main gmon.out -Qfunction_name
This options can be repeated and used in the same time:
gprof main gmon.out -Pfunction_name -Qfunction_name -Qother_function_name
If you need exclude function from one report but not exclude any function from other you need use -p or -q options.
Example:
Create program:
#include <stdio.h>
#include <stdlib.h>
void func_a () {printf ("%s ",__FUNCTION__);}
void func_b () {printf ("%s ",__FUNCTION__);}
void func_c () {printf ("%s ",__FUNCTION__);}
int main ()
{
func_a ();
func_b ();
func_c ();
return EXIT_SUCCESS;
}
Compile it:
gcc main.c -pg -o main
And launch:
$ ./main
func_a func_b func_c
Generate profile reports:
If you need print only flat profile you need call:
$ gprof main gmon.out -b -p
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_a
0.00 0.00 0.00 1 0.00 0.00 func_b
0.00 0.00 0.00 1 0.00 0.00 func_c
If you need print flat profile excluding functions func_a and func_c and full call graph you need call:
$ gprof main gmon.out -b -Pfunc_a -Pfunc_c -q
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_b
index % time self children called name
0.00 0.00 1/1 main [9]
[1] 0.0 0.00 0.00 1 func_a [1]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[2] 0.0 0.00 0.00 1 func_b [2]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[3] 0.0 0.00 0.00 1 func_c [3]
-----------------------------------------------
If you need print flat profile excluding functions func_a and func_c and call graph excluding func_b you need call:
$ gprof main gmon.out -b -Pfunc_a -Pfunc_c -Qfunc_b
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 func_b
index % time self children called name
0.00 0.00 1/1 main [9]
[1] 0.0 0.00 0.00 1 func_a [1]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[3] 0.0 0.00 0.00 1 func_c [3]
-----------------------------------------------
Unless I've misunderstood what you're asking...
gprof a.out --no-time=function_name
works for me.

Resources