How do the code coverage options of GCC work?

Consider the following command:
gcc -fprofile-arcs -ftest-coverage main.c
It generates the file main.gcda, which is then used by gcov to generate the coverage analysis.
So how is main.gcda generated? How is the instrumentation done? Can I see the instrumented code?

.gcda is not generated by the compiler; it's generated by your program when you execute it.
.gcno is the file generated at compilation time: it is the 'notes file'. GCC generates a basic block graph notes file (.gcno) for each compilation unit (CU).
So how is main.gcda generated?
At run time the statistics are gathered and stored in memory. An exit callback is registered and is called to write the data to the .gcda file when the program terminates. This means that if you call abort() instead of exit() in your program, no .gcda file will be generated.
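A minimal sketch to observe this (the file name sketch.c and the commands below are illustrative, not from the original answer):

/* sketch.c: the gcov exit callback only runs on normal termination */
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc > 1)
        abort();  /* skips the exit handlers, so no .gcda is written */
    return 0;     /* normal exit runs the gcov callback */
}

Compile with gcc -fprofile-arcs -ftest-coverage sketch.c -o sketch. Running ./sketch produces sketch.gcda; delete it and run ./sketch x instead, and no sketch.gcda appears.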
How is the instrumentation done? Can I see the instrumented code?
You may need to check GCC's implementation to get the details, but basically the instrumentation is done by inserting instructions into the program to count the number of times each piece of code is executed. It doesn't really keep a counter for every instruction, though: GCC builds a program flow graph, finds a spanning tree for that graph, and instruments only the special arcs not on the spanning tree; from those counters the coverage of all code branches can be derived.
You can disassemble the binary to see the instrumented code.
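For example, a hedged way to inspect the instrumentation in a binary built as above:

objdump -d a.out | less

In the disassembly of each function you should find extra load/add/store sequences that bump the coverage counters (the exact instruction patterns vary by target and GCC version).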
And here are some coverage-related files, if you want to look into the GCC source:
toplev.c
coverage.c
profile.c
libgcov.c
gcov.c
gcov-io.c
edit: some known gcov bugs FYI:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49484
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28441
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44779
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=7970

Can I see the instrumented code?
You cannot read the instrumented data files (such as .gcda) directly; they are written in a binary format.
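If you want to inspect them anyway, recent GCC releases ship a gcov-dump utility that decodes the binary .gcno/.gcda files (availability depends on your GCC installation; treat this as a hedged pointer rather than a guarantee):

gcov-dump main.gcda
gcov-dump -l main.gcno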
How does gcov work?
gcov works in four phases:
1. Code instrumentation during compilation
2. Data collection during code execution
3. Data extraction at program exit time
4. Coverage analysis and presentation post-mortem.
To know more about the individual steps you can go through this paper; a minimal command-line sketch of the four phases follows below.
http://ltp.sourceforge.net/documentation/technical_papers/gcov-ols2003.pdf
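Mapping the four phases onto commands, a minimal sketch (file names are illustrative):

gcc -fprofile-arcs -ftest-coverage main.c -o main    # phase 1: instrumentation; also writes main.gcno
./main                                               # phase 2: data collection; phase 3: main.gcda written at exit
gcov main.c                                          # phase 4: analysis; produces the annotated main.c.gcov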

To see which gcov-related code is instrumented into the executable or object file at compile time, you can use the following:
nm executable/objfile
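The original answer attached a screenshot of the steps and output; as a hedged substitute, filtering the symbol table for gcov-related names looks like this (the exact symbols vary by GCC version, so treat the names as illustrative):

nm main.o | grep -i gcov

Typical hits include references to libgcov routines such as __gcov_merge_add.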

Related

What are dump and auxiliary files?

I'm a newcomer to Linux and the gcc commands. I was reading the
gcc documentation, particularly about the -o flag, where it mentions the following:
Though -o names only the primary output, it also affects the naming of
auxiliary and dump outputs. See the examples below. Unless overridden,
both auxiliary outputs and dump outputs are placed in the same
directory as the primary output. In auxiliary outputs, the suffix of
the input file is replaced with that of the ...
They mention it quite a lot following this paragraph but don't explain it. I've skimmed the document and also looked online but haven't found any satisfactory explanation. If someone could provide some explanation, or even link me to some resources where I can learn about these terms, it would be greatly appreciated. Thanks!
-o file
Place the output in file. This applies regardless of the type of output produced, whether it is an executable file, an object file, an assembler file or preprocessed C code.
Since only one output file can be specified, it makes no sense to use -o when compiling more than one input file, unless you want to output an executable file.
If -o is not specified, the default behavior is to produce an executable file named a.out, an object file for source.suffix named source.o, its assembler file in source.s, and all C source code preprocessed on standard output.
source: http://www.linuxcertif.com/man/1/gcc/
Hope it will be useful.
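The quoted manual text covers -o itself; to see auxiliary outputs concretely, here is a minimal sketch (it assumes a GCC recent enough to support -save-temps=obj, and the file names are illustrative):

mkdir -p build
gcc -c main.c -o build/main.o -save-temps=obj
ls build/    # main.o plus the auxiliary outputs main.i (preprocessed) and main.s (assembly)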

Line-level profiling of a C program by GNU

Could someone kindly tell me how I can profile single lines or blocks of code of a C program with the GNU profiler?
I used gprof ./a.out gmon.out, which gives me a flat profile and a call graph. However, I would like to see the lines that are more frequently executed.
Thanks,
This is probably one of those things where you just don't know the term you should have googled, so I'll answer it:
The term you are looking for is "annotation": you want to annotate the source and see the line-by-line hits in the code.
Calling gprof with the -A flag will dump out the samples caught on each line.
See Also:
https://sourceware.org/binutils/docs/gprof/Annotated-Source.html
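A minimal sketch of the whole sequence (compile with -g as well as -pg so gprof can map samples back to source lines; the file names are illustrative):

gcc -pg -g -O0 prog.c -o prog
./prog                      # writes gmon.out
gprof -l -A prog gmon.out   # line-by-line profile with annotated source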
Ok, I'll post this answer so that a newbie like me who searches for it can find it faster :)
Here are the steps: source
gcc -fprofile-arcs -ftest-coverage sourcefile.c
(At the end of compilation, *.gcno files will be produced.)
Run the executable.
Run gcov: gcov sourcefile.c
(A *.gcov file will be produced, which contains all the required info.)
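The *.gcov file is plain text: each source line is prefixed with an execution count ('-' for non-executable lines, '#####' for executable lines that never ran, a number otherwise) and the line number. A hypothetical excerpt (the code shown is illustrative):

        -:    1:#include <stdio.h>
        5:    4:        printf("hello\n");
    #####:    7:        never_called();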

Can the object files output by gcc vary between compilations of the same source with the same options?

Does the gcc output of the object file (C language) vary between compilations? There is no time-specific information, no change in compilation options or the source code, and no change in linked libraries or environment variables either. This is a VxWorks MIPS64 cross-compiler, if that helps. I personally think it shouldn't change, but I observe that sometimes, seemingly at random, the generated instructions change. I don't know the reason. Can anyone throw some light on this?
How is this built? For example, if I built the very same Linux kernel, it would include a counter that is incremented on each build. GCC has options to use profiler information to guide code generation; if the profiling information changes, so will the code.
What did you analyze? The generated assembly, an objdump of object files or the executable? How did you compare the different versions? Are you sure you looked at executable code, not compiler/assembler/linker timestamps?
Did anything change in the environment? New libraries (and header files/declarations/macro definitions!)? New compiler, linker? New kernel (yes, some header files originate with the kernel source and are shipped with it)?
Any changes in environment variables (another user doing the compiling, a different machine, a different hookup to the net that gives a different IP address which makes its way into the build)?
I'd try tracing the build process in detail (run a build and capture the output in a file, and do so again; compare those).
Completely mystified...
I had a similar problem with g++. Pre-4.3 versions produced exactly the same object files each time. With 4.3 (and later?) some of the mangled symbol names are different on each run, even without -g or other recordings. Perhaps they use a time stamp or a random number (I hope not). Obviously some of those symbols make it into the .o symbol table, and you get a difference.
Stripping the object file(s) makes them equal again (wrt. binary comparison).
g++ -c file.C ; strip file.o; cmp file.o origfile.o
Why should it vary? It is the same result always. Try this:
for i in `seq 1000`; do gcc 1.c; md5sum a.out; done | sort | uniq | wc -l
The answer is always 1. Replace 1.c and a.out to suit your needs.
The above counts how many different executables are generated by gcc when compiling the same source 1000 times.
I've found that in at least some environments, the same source may yield a different executable if the source tree for the subsequent build is located in a different directory. Example:
Checkout a pristine copy of your project to dir1. Do a full rebuild from scratch.
Then, with the same user on the same machine, checkout the same exact copy of your source code to dir2 (dir1 != dir2). Do another full rebuild from scratch.
These builds are minutes apart, with no change in the toolchain or any 3rd-party libs or code. A binary comparison shows the source code is identical, yet the executable in dir1 has a different md5sum than the executable in dir2.
If I compare the different executables in BeyondCompare's hex editor, the difference is not just some tiny section that could plausibly be a timestamp.
I do get the same executable if I build in dir1, then rebuild again in dir1. Same if I keep building the same source over and over from dir2.
My only guess is that some sort of absolute paths of the include hierarchy are embedded in the executable.
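One hedged way to test that guess (the binary names are illustrative): compare the printable strings of the two executables and look for directory paths in the diff.

diff <(strings -a dir1/myprog) <(strings -a dir2/myprog)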
My gcc sometimes produces different code for exactly the same input. The output object files differ in exactly one byte.
Sometimes this causes linker errors, because one of the possible object files is invalid. Recompiling another version usually fixes the linker error.
The gcc version is 4.3.4 on SUSE Linux Enterprise.
The gcc parameters are:
cc -std=c++0x -Wall -fno-builtin -march=native -g -I<path1> -I<path2> -I<path3> -o obj/file.o -c file.cpp
If someone experiences the same effect, then please let me know.

lcov: coverage of source for several executions

I've created a simple hello world C++ app.
Compiled it with the gcc --coverage flag.
Executed the executable
Generated coverage by invoking
lcov --directory . --capture --output-file ic.info
Generated an HTML-based report with genhtml:
genhtml -o html/ ic.info
Now the question: no matter how many times I run the executable, I always get the same result, i.e. the same coverage of lines and functions. Should the line coverage increase with every execution? Am I getting something wrong?
If lcov generates coverage only for one execution, how can I generate coverage for all the executions that I've done?
I guess you misunderstand how the coverage results are generated. lcov is not generating the coverage, as stated in your question; it only processes the coverage results, which are generated when running your program (step 3 in your question).
So, when executing the program multiple times (step 3), the line execution counts will increase (not necessarily the coverage). To see this, you can generate multiple coverage reports (execute steps 3, 4 and 5 multiple times); you will see the execution counts of lines in your code increase across the reports generated in step 5.
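A minimal sketch of that check (the per-line hit counts in the report should roughly double, while the coverage percentage stays the same):

./a.out
./a.out                                         # the .gcda counters accumulate across runs
lcov --directory . --capture --output-file ic.info
genhtml -o html/ ic.info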

gprof: How to generate a call graph for functions in a shared library that is linked to the main program

I am working in a Linux environment. I have two 'C' source packages, train and test_train.
The train package, when compiled, generates libtrain.so.
test_train links to libtrain.so and generates the executable train-test.
Now I want to generate a call graph using gprof which shows the calling sequence of functions in the main program as well as those inside libtrain.so.
I am compiling and linking both packages with the -pg option, and the optimization level is -O0.
After I run ./train-test, gmon.out is generated. Then I do:
$ gprof -q ./train-test gmon.out
Here, the output shows the call graph of functions in train-test but not those in libtrain.so.
What could be the problem?
gprof won't work here; you need to use sprof instead. I found these links helpful:
How to use sprof?
http://greg-n-blog.blogspot.com/2010/01/profiling-shared-library-on-linux-using.html
Summary from the 2nd link:
Compile your shared library (libmylib.so) in debug (-g) mode. No -pg.
export LD_PROFILE_OUTPUT=`pwd`
export LD_PROFILE=libmylib.so
rm -f $LD_PROFILE.profile
Execute your program that loads libmylib.so.
sprof PATH-TO-LIB/$LD_PROFILE $LD_PROFILE.profile -p >log
See the log.
I found that in step 2, it needs to be an existing directory; otherwise you get a helpful warning. And in step 3, you might need to specify the library as libmylib.so.X (maybe even .X.Y, not sure); otherwise you get no warning whatsoever.
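Putting the steps and these caveats together, a consolidated sketch (the library and program names are hypothetical):

export LD_PROFILE_OUTPUT=$PWD/profdata    # must be an existing directory
mkdir -p "$LD_PROFILE_OUTPUT"
export LD_PROFILE=libmylib.so.1           # may need the full soname (.so.X)
rm -f "$LD_PROFILE_OUTPUT/$LD_PROFILE.profile"
./myprog                                  # any program that loads libmylib.so.1
sprof path/to/libmylib.so.1 "$LD_PROFILE_OUTPUT/$LD_PROFILE.profile" -p > log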
I'm loading my library from Python and didn't have any luck with sprof. Instead, I used oprofile, which was in the Fedora repositories, at least:
operf --callgraph /path/to/mybinary
Wait for your application to finish, or press Ctrl-C to stop profiling. Now let's generate a profile summary:
opreport --callgraph --symbols
See the documentation to interpret it. It's kind of a mess. In the generated report, each symbol is listed in a block of its own. The block's main symbol is the one that's not indented. The items above it are the functions that call that function, and the ones below it are the things that get called by it. The percentages in the section below are the relative amounts of time spent in those callees.
If you're not on Linux (like me, on Solaris) you're simply out of luck, as there is no sprof there.
If you have the sources of your library, you can solve your problem by building a static library and linking your profiling binary against that one instead.
Another way I managed to trace calls to shared libraries is by using truss. With the option -u [!]lib,...:[:][!]func,... one can get a good picture of the call history of a run. It's not quite the same as profiling, but it can be very useful in some scenarios.
