Clang Coverage Mapping with -fprofile-instr-generate

I'm trying to generate a code coverage report for some individually compiled tests on Ubuntu 18.04 and running into a strange problem. If I compile with clang 5.0.0 and pass it the -fprofile-instr-generate and -fcoverage-mapping flags, it does work: running the compiled test causes it to emit a .profraw file I can process with llvm-cov and turn into a coverage report.

However, the only coverage it seems to track is that of the test harness and any code directly pulled in via #include; code that was merely linked is ignored completely. For example, if a header file is included via #include, it will show coverage for that file but not for the associated .c file where the actual called code lives. From some research it seemed like the solution was to add -fprofile-instr-generate to the linking step as well, but this didn't change the result at all.

It's terrible practice (and unsustainable) to manually #include any files I want to see the coverage of, but I don't see another option that lets me view the coverage of linked code (specifically, the coverage of the function I'm calling in the test harness and anything that function calls). Is this a problem other people have had, and does anyone know how to solve it?

You need to add the flags when compiling the units to be covered. The compiler instruments the object code: put simply, it inserts code that counts how often control flow passes through each region.
This means you need to recompile the unit under test before linking it to your test harness. The test harness itself, by contrast, does not need to be compiled with the flags, because you are most probably not interested in its coverage.
There is no need to #include the unit's source into another source file; as you found, that is too hard to maintain.
The linker, however, needs the flags too.
For the production code you will compile your units without the flags.
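A minimal sketch of such a build, assuming a unit under test unit.c and a harness test.c (the file names are illustrative):
clang -fprofile-instr-generate -fcoverage-mapping -c unit.c -o unit.o
clang -c test.c -o test.o                 # harness: no instrumentation needed
clang -fprofile-instr-generate unit.o test.o -o test
./test                                    # writes default.profraw
llvm-profdata merge -sparse default.profraw -o test.profdata
llvm-cov report ./test -instr-profile=test.profdata
Passing -fprofile-instr-generate at the link step is what pulls in the profiling runtime that writes the .profraw file when the test exits.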

I ran into this problem too. It turns out that I needed the --object option for each binary.
I got a report that covers both the library and the executable with this command:
llvm-cov report --object mylib.so --object my_executable --instr-profile somewhere.profdata

Fixed my own issue! It turns out the test was linking against a .so shared library that contained my target functions, so the problem was solved by doing two things. First, instrumenting the step that compiles the .o objects into the shared library gave visibility into the target functions. Second, running llvm-cov show/report on the .profdata generated by the test, passing it the shared library as the binary instead of the test binary, made it report coverage for the target functions. Lesson learned: always check whether your instrumented test imports shared libraries, and make sure those libraries are compiled with the right flags to instrument them!
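A sketch of that fix as concrete commands (mylib.c, test.c, and the library name are illustrative):
clang -fPIC -fprofile-instr-generate -fcoverage-mapping -c mylib.c -o mylib.o
clang -shared -fprofile-instr-generate mylib.o -o libmylib.so
clang -fprofile-instr-generate -fcoverage-mapping test.c -o test -L. -lmylib
LD_LIBRARY_PATH=. ./test                  # writes default.profraw
llvm-profdata merge -sparse default.profraw -o test.profdata
llvm-cov report --object libmylib.so --object ./test -instr-profile test.profdata
The final command mirrors the --object usage from the answer above, so the report covers both the library and the executable.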

Related

List "never linked against" source file in C project

I would like to know if someone is aware of a trick to retrieve the list of files that have been (or, ideally, will be) used by the linker to produce an executable.
Some kind of solution must exist: a static source analyzer, or a hack such as compiling with some special flags and analyzing the produced executable with another tool, or forcing the linker to output this information.
The goal is to provide a tool that strips useless source files from a list of source files.
The end goal is to ease the build process by allowing the user to give a list of potentially usable source files; my tool would then compile only the ones actually used by the linker instead of everything.
This would allow some unit tests to still be runnable even if others are broken and can't compile, without asking the user to manually list every test's dependencies in the CMake files.
I am targeting Linux for now, but will be interested in doing the same trick on other OSes in the future. So I would like a cross-platform solution, even though I doubt I will get one :)
Thanks for your help
Edit, because I see that it is confusing: what I mean by
allowing the user to give a list of usable source files
is that, in CMake for example, if you use add_executable(name, sources), then sources is treated as the exact set of sources to compile and link.
I want to wrap add_executable so that sources is instead viewed as a set of source files that are usable if necessary.
I'm afraid the idea of detecting never-linked source files is not a fruitful one.
To build a program, CMake will not compile a source file if it is not going to link the resulting object file into the program. I can understand how you might think that this happens, but it doesn't. CMake already does what you would like it to do, and the same is true of every other build automation system going back to their invention in the 1970s. The fundamental purpose of all such systems is to ensure that building a program compiles a source file name.(c|cc|f|m|...) if and only if the object file name.o is going to be linked into the program and is out of date or does not exist. You can always defeat this purpose by egregiously bad coding of the project's build spec (CMakeLists.txt, Makefile, SConstruct, etc.), but with CMake you would need to be really trying to do it, and trying quite expertly.
If you do not want name.c to be compiled and the object file name.o linked into a target program, then you do not tell the build system that name.o or name.c is a prerequisite of the program. Don't tell it what you know is not true. It is elementary competence not to specify redundant prerequisites of a build system target.
The linker will link all its input object files into an output program without question. It does not ask whether or not they are "needed" by the program, because it cannot answer that question: neither the linker nor any possible static analysis tool can know what program you intend to produce when you input some object files for linkage. It can only be assumed that you intend to produce the program that results from the linkage of those object files, assuming the linkage is successful.
If those object files cannot be linked into a program at all, the linker will tell you that, and why. Otherwise, if you have linked object files that you didn't intend to link, you can only discover that for yourself, by noticing the mistake in the build log, or failing that by testing the program and/or inspecting its contents and comparing your observations with your expectations.
Given your choice of object files for linkage, you can instruct the linker to detect any code sections or data sections it extracts from those object files in which no symbols are defined that can be referenced by the program, and to throw away all such unreferenced input sections instead of linking them into the program. This is called linktime "garbage collection". You tell the linker to do it by passing the option -Wl,-gc-sections in the gcc linkage command. See this question to learn how to maximise the collectible garbage. This is what you can do to remove redundant object code from the linkage.
But you can only collect any garbage from a program in this way if the program is dynamically opaque, i.e. not linked with the option -rdynamic: then the global symbols defined in the program's static image are not visible to the OS loader and cannot be referenced from outside its static image by dynamic libraries in the same process. In this case the linker can determine by static analysis that a symbol whose definition is not referenced in the program's static image cannot be referenced at all, since it cannot be referenced dynamically; and if all symbols defined in an input section are statically unreferenced, then it can garbage-collect the section.
If the program has been linked -rdynamic then -Wl,-gc-sections will collect no garbage, and this is quite right, because if the program is not dynamically opaque then it is impossible for static analysis to determine that anything defined in its linkage cannot be referenced.
It's noteworthy that although -rdynamic is not a default linkage option for GCC, it is a default linkage option for CMake projects using the GCC toolchain. So to use linktime garbage collection in CMake projects you would always have to override the -rdynamic default, and obviously it would only be valid to do this if you have determined that it is alright for the program to be dynamically opaque.
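A minimal sketch of such a linkage (file names illustrative); the -ffunction-sections and -fdata-sections compile flags are what make individual functions and data objects separately collectible:
gcc -ffunction-sections -fdata-sections -c file.c other.c
gcc -Wl,--gc-sections -Wl,--print-gc-sections file.o other.o -o prog
--print-gc-sections makes the linker report every input section it discards, which is about as close as the stock toolchain gets to producing a list of unused code.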

Optimization: Faster compilation

Separating a program into header and source files might allow faster compilation if handled by a smart compilation manager, which is what I am working on.
In theory this should work: create a thread for each source file, compile every source file into an object file at once, and then link those object files together.
The build still has to wait for the slowest source file to finish, but that shouldn't be a problem: a simple counter, incremented each time a .o is generated, can be compared against nSources to detect when everything is compiled.
I don't think GCC does this by default; when given several files, it seems to process them one by one.
Is this a valid approach and how could I optimize compilation time even further?
All modern (as in post-2000-ish) make implementations offer this feature. Both GNU make and the various flavours of BSD make will compile source files in parallel jobs when given the -j flag; it just requires that you have a makefile, of course. Ninja also does this by default. It vastly speeds up compilation.
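A minimal sketch, assuming two sources file.c and other.c (recipe lines must be indented with a tab):
CC = gcc
OBJS = file.o other.o

prog: $(OBJS)
	$(CC) -o $@ $(OBJS)

%.o: %.c
	$(CC) -c -o $@ $<
Running make -j2 (or make -j$(nproc) with GNU make on Linux) compiles file.o and other.o in parallel; the final link of prog is serialized automatically because it depends on both objects, which is exactly the "wait for the slowest source file" behaviour described in the question.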

Compile-time test if function is optimized out

I'm writing a small operating system for microcontrollers in C (not C++, so I can't use templates). It makes heavy use of some gcc features, one of the most important being the removal of unused code. The OS doesn't load anything at runtime; the user's program and the OS source are compiled together to form a single binary.
This design allows gcc to include only the OS functions that the program actually uses. So if the program never uses i2c or USB, support for those won't be included in the binary.
The problem is when I want to include optional support for those features without introducing a dependency. For example, a debug console should provide functions to debug i2c if it's being used, but including the debug console shouldn't also pull in i2c if the program isn't using it.
The methods that come to mind to achieve this aren't ideal:
Have the user explicitly enable the modules they need (using #define), and use #if to only include support for them in the debug console if enabled. I don't like this method, because currently the user doesn't have to do this, and I'd prefer to keep it that way.
Have the modules register function pointers with the debug module at startup. This isn't ideal, because it adds some runtime overhead and means the debug code is split up over several files.
Do the same as above, but using weak symbols instead of pointers. I'm still not sure how to actually accomplish this, though (a sketch follows below).
Do a compile-time test in the debug code, like:
if (i2cInit is used) {
    debugShowi2cStatus();
}
The last method seems ideal, but is it possible?
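For reference, the weak-symbol variant (the third method above) might look roughly like this with GCC; debugConsolePoll is a hypothetical entry point, and the other names are taken from the question:
/* In the debug console's source: declare the i2c debug hook as a weak
 * symbol. If nothing that defines debugShowi2cStatus is linked in, the
 * symbol's address is 0 and the call is skipped at runtime. */
extern void debugShowi2cStatus(void) __attribute__((weak));

void debugConsolePoll(void)
{
    if (debugShowi2cStatus)     /* non-null only if the i2c module was linked */
        debugShowi2cStatus();
}
For this to avoid pulling in i2c, debugShowi2cStatus must be defined alongside the i2c code (in the same object file or archive member), so it is only present when the program already uses i2c; note the test is a runtime check of the symbol's address rather than a true compile-time test.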
This seems like an interesting problem. Here's an idea, although it's not perfect:
Two-pass compile.
What you can do is first, compile the program with a flag like FINDING_DEPENDENCIES=1. Surround all the dependency checks with #ifs for this (I'm assuming you're not as concerned about adding extra ifs there.)
Then, when the compile is done (without any optional features), use nm or similar to detect the usage of functions/features in the program (such as i2cInit), and format this information into a .h file.
#ifndef FINDING_DEPENDENCIES
#include "dependency_info.h"
#endif
Now the optional dependencies are known.
This still doesn't seem like a perfect solution, but ultimately it's mostly a chicken-and-egg problem: when compiling, the compiler doesn't know which symbols are going to be gc'd out. You basically need to get this information from the linker stage and feed it back to the compilation stage.
Theoretically, this might not increase build times much, especially if you use a temp file for the generated header and only replace it when it has changed. You'd need to use different object dirs, though.
Also this might help (pre-strip, of course):
How can I view function names and parameters contained in an ELF file?
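A sketch of that feedback step in shell, assuming the first-pass binary is firstpass.elf and dependency_info.h carries one macro per optional feature (the exact names are illustrative):
if nm firstpass.elf | grep -q ' T i2cInit'; then
    echo '#define USES_I2C 1' > dependency_info.h
else
    echo '#define USES_I2C 0' > dependency_info.h
fi
nm prints one line per symbol, with a "T" type for functions defined in the text section, so grepping for the defined symbol tells you whether the feature made it into the first-pass binary.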

Modular programming and compiling a C program in linux

So I have been studying modular programming, where each file of the program is compiled separately. Say we have FILE.c and OTHER.c in the same program. To compile them, we do this at the prompt:
$ gcc FILE.c OTHER.c -c
using the -c flag to compile into object files (FILE.o and OTHER.o), and only then do we link them into an executable using
$ gcc FILE.o OTHER.o -o program
I know I can just compile straight to an executable and skip the middle step, but everywhere I look, people produce the object files first and only then build the executable, which I can't understand at all.
May I know why?
If you are working on a project with several modules, you don't want to recompile all modules when only some of them have been modified; the final linking command, however, is always needed. Build tools such as make are used to keep track of which modules need to be compiled or recompiled.
Doing it in two steps also separates the compiling and linking phases more clearly:
The output of the compiling step is object (.o) files, which are machine code but still missing the external references of each module (i.e. each .c file); for instance, FILE.c might use a function defined in OTHER.c, but the compiler doesn't care about that dependency at this step.
The input of the linking step is the object files, and its output is the executable. The linking step binds the object files together by filling in the blanks (i.e. resolving the dependencies between object files); that's also where you add libraries to your executable.
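For instance, if only FILE.c has changed since the last build, the two-step scheme lets you rebuild with just:
$ gcc -c FILE.c
$ gcc FILE.o OTHER.o -o program
recompiling only the changed module and relinking with the untouched OTHER.o.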
This part of another answer responds to your question:
You might ask why there are separate compilation and linking steps. First, it's probably easier to implement things that way. The compiler does its thing, and the linker does its thing; by keeping the functions separate, the complexity of the program is reduced. Another (more obvious) advantage is that this allows the creation of large programs without having to redo the compilation step every time a file is changed. Instead, using so-called "conditional compilation", it is necessary to compile only those source files that have changed; for the rest, the object files are sufficient input for the linker. Finally, this makes it simple to implement libraries of pre-compiled code: just create object files and link them just like any other object file. (The fact that each file is compiled separately from information contained in other files, incidentally, is called the "separate compilation model".)
It was too long to put in a comment; please give credit to the original answer.

Compile coreutils as shared objects?

I am trying to compile GNU Coreutils as a set of shared libraries instead of a set of executables. I thought that make would let me pass in a flag telling it to do this, but from what I can see I would actually have to modify configure.ac and Makefile.am to make this work. I would prefer not to do that, since it potentially introduces bugs into code that I can currently rely on being bug-free. I tried manually turning the object files into .so's by entering:
make CFLAGS='-fpic'
gcc -shared -o ls.so coreutils/src/ls.o
I am able to create the .so file, but there seem to be a number of flags that I am missing, and I don't see any way to get a list of the flags needed to compile and link the code (even though this information is clearly available somewhere in the build system). The only thing I can think to do is go through all the linker errors manually and try to figure out which flags are missing, but I'm hoping there is a less tedious way of getting what I want.
Not sure exactly what you're trying to do, but related to this is the ./configure --enable-single-binary option, which links all the objects into a single executable.
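As for recovering the exact flags the normal build uses: coreutils builds with Automake, whose silent rules hide the full command lines by default, so one option (a sketch) is to ask make to show them:
make V=1        # verbose build: prints every full compile/link command
make -n src/ls  # dry run: prints the commands for one target without running them
The src/ls target path is an assumption based on coreutils' non-recursive build layout and may differ between versions.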
