GCOV static library coverage for C source code

I want to perform code coverage on a static library. For this I wrote test cases using Boost. In my library I have many functions defined in header files.
For example, in a header file accuracy.h I have the following functions:
static float absf( float x )
{
    return (x >= 0.0f) ? x : -x;
}

static boolean almost_zero( float n, float tol )
{
    return (boolean)(absf( n ) <= tol);
}
I have written test cases for these functions. But the problem is that gcov shows these functions as not covered. If I move the function definitions into the .c file, I get proper coverage results.
I have used -fprofile-arcs -ftest-coverage for performing coverage. Does anyone have any idea about this issue?
Note:
Test cases are executed properly. I have confirmed it by debugging.
I am using MinGW gcc version 4.8.1 (GCC).

Functions in header files are difficult for coverage. It's not just a technical difficulty - it's also a presentation difficulty. These functions are copied every time the header is #included. Does full coverage require that all copies are covered? Or that one instance is covered?
From the user's perspective, both answers may be wrong.
Also, there are likely to be functions lurking in header files that the user does not care about. For instance, ctype.h has a few of these.
That's probably why coverage tools tend to ignore them entirely.
I work on a coverage tool, RapiCover, and our approach is to ignore them by default but provide an option to turn on coverage for headers. The option can be used on a file-by-file basis, and you can also specifically name the functions that you want coverage for. We found that this was the best way to support typical customer requirements.
I suggest you try forcing gcov to believe that the functions are defined in C source code rather than the header. To do this, preprocess your source file (e.g. -E option for GCC) and then filter out the # markers that indicate files and line numbers. Then do gcov on this preprocessed, filtered file. It should see all functions as part of the source code. This trick would also work with RapiCover, though it would not be necessary there.
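To see why this works, it helps to look at what gcc -E actually produces: the output is interleaved with linemarkers recording which file and line each fragment came from, roughly like this (a simplified sketch; the file name test.c and the line numbers are only illustrative):

# 1 "test.c"
# 1 "accuracy.h" 1
static float absf( float x )
{
    return (x >= 0.0f) ? x : -x;
}
# 2 "test.c" 2

Deleting the lines beginning with # removes that file information, so gcov treats every function as if it were defined in the preprocessed .c file itself.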

Test embedded code by replacing static symbols at compile time

Background
I'm building a C application for an embedded Cortex M4 TI-RTOS SYS/BIOS target, however this question should apply to all embedded targets where a single binary is loaded onto some microprocessor.
What I want
I'd like to do some in situ regression tests on the target where I just replace a single function with some test function instead. E.g. a GetAdcMeasurement() function would return predefined values from a read-only array instead of doing the actual measurement and returning that value.
This could of course be done with a mess of #ifndefs, but I'd rather keep the production code as untouched as possible.
My attempt
I figure one way to achieve this would be to have duplicate symbol definitions at the linker stage, and then have the linker prioritise the definitions from the test suite (given some #define).
I've looked into using LD_PRELOAD, but that doesn't really seem to apply here (since I'm using only static objects).
Details
I'm using TI Code Composer, with TI-RTOS & SYS/BIOS on the Sitara AM57xx platform, compiling for the M4 remote processor (denoted IPU1).
Here's the path to the compiler and linker
/opt/ti/ccsv7/tools/compiler/ti-cgt-arm_16.9.6.LTS/bin/armcl
One solution could be to have multiple .c files for each module, one with the production code and one with the test code, and compile and link with one of the two. The globals and function signatures in both .c files must match (at minimum: there may be more symbols, but not fewer).
Another solution, building on the previous one, is to have two libraries, one with the production code and one with the test code, and link with one of the two. You could even link with both libraries, with the test version first, as linkers often resolve symbols in the order they are encountered.
And, as you said, you could work with a bunch of #ifdefs, which would have the advantage of having just one .c file, but would make the code less readable.
I would not go for #ifdefs at the function level, i.e. defining just one function of a .c file for test and keeping the others as is; however, if necessary, it could be a way. And if necessary, you could have one .c file (or two) for each function, but that would negate the module concept.
I think the first approach would be the cleanest.
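As a sketch of the first approach (the file names and the adcdata_t type below are illustrative, borrowed from the question, not a definitive layout):

/* adc.h - the interface shared by both implementations */
#include <stdint.h>
typedef uint16_t adcdata_t;   /* placeholder type for this sketch */
adcdata_t GetAdcMeasurement(void);

/* adc_prod.c - compiled and linked into the production image */
#include "adc.h"

adcdata_t GetAdcMeasurement(void)
{
    /* real hardware access goes here */
    return 0;
}

/* adc_test.c - compiled and linked into the test image instead */
#include "adc.h"

adcdata_t GetAdcMeasurement(void)
{
    return 42;   /* predefined value instead of a real measurement */
}

Only one of adc_prod.c and adc_test.c goes into a given build, so the rest of the code never changes.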
One additional approach (apart from Paul Ogilvie's) would be to have your mocking header also create a define which will replace the original function symbol at the pre-processing stage.
I.e. if your mocking header looks like this:
// mock.h
#ifdef MOCKING_ENABLED
adcdata_t GetAdcMeasurement_mocked(void);
stuff_t GetSomeStuff_mocked(void);
#define GetAdcMeasurement GetAdcMeasurement_mocked
#define GetSomeStuff GetSomeStuff_mocked
#endif
Then whenever you include the file, the preprocessor will replace the calls before it even hits the compiler:
#include "mock.h"
void SomeOtherFunc(void)
{
    // preprocessor will change this symbol into 'GetAdcMeasurement_mocked'
    adcdata_t data = GetAdcMeasurement();
}
The approach might confuse an unsuspecting reader of your code, because they won't necessarily realize that you are calling a different function altogether. Nevertheless, I find this approach to have the least impact on the production code (apart from adding the include, obviously).
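For completeness, the mocked bodies then live in a test-only source file; a minimal sketch, assuming adcdata_t is an integer type and stuff_t a struct (both are just the placeholder names from the example above):

// mock.c - compiled into the test build only
#include "mock.h"

adcdata_t GetAdcMeasurement_mocked(void)
{
    // cycle through canned readings instead of touching the hardware
    static const adcdata_t canned[] = { 100, 200, 300 };
    static unsigned i;
    return canned[i++ % (sizeof canned / sizeof canned[0])];
}

stuff_t GetSomeStuff_mocked(void)
{
    // return a zero-initialised dummy value
    stuff_t s = { 0 };
    return s;
}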
(This is a quick summary of the discussion in the comments; thanks for the answers.)
A function can be redefined if it has the weak attribute; see
https://en.wikipedia.org/wiki/Weak_symbol
On GCC that would be the weak attribute, e.g.
int __attribute__((weak)) power2(int x);
and on the armcl (as in my question) that would be the pragma directive
#pragma weak power2
int power2(int x);
Letting the production code consist of partly weak functions will allow a test framework to replace single functions.
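A minimal sketch of how the two definitions coexist (GCC syntax shown; with armcl the #pragma weak form above plays the same role, and the file names are illustrative):

// production.c - default implementation, marked weak so a test build can replace it
int __attribute__((weak)) power2(int x)
{
    return x * x;
}

// test_stub.c - linked only into the test image; this strong definition
// silently wins over the weak one at link time
int power2(int x)
{
    (void)x;      // ignore the input and return a canned value for the test
    return 4;
}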

C - Header Files versus Functions

What are the pros and cons of shoving everything in one file:
void function(void) {
    code...
}
Versus creating a completely new file for functions:
#include <stdio.h>
#include "header.h"
Is one or the other faster? More lightweight? I am in a situation where speed is necessary and portability is a must.
Might I add this is all based on C.
If you care about speed, you first should write a correct program, care about efficient algorithms (read Introduction to Algorithms), benchmark & profile it (perhaps using gprof and/or oprofile), and focus your efforts mostly on the few percents of source code which are critical to performance.
You'd better define these small critical functions as static inline functions in commonly included header files. The compiler is then able to inline every call to them if it wants to (and it needs access to the function's definition in order to inline it).
In general small inlined functions would often run faster, because there is no call overhead in the compiled machine code; sometimes, it might perhaps go slightly slower, because inlining increases machine code size which is detrimental to CPU cache efficiency (read about locality of reference). Also a header file with many static inline functions needs more time to be compiled.
As a concrete example, my Linux system has a header /usr/include/glib-2.0/glib/gstring.h (from Glib in GTK) containing
/* -- optimize g_string_append_c --- */
#ifdef G_CAN_INLINE
static inline GString*
g_string_append_c_inline (GString *gstring,
                          gchar    c)
{
  if (gstring->len + 1 < gstring->allocated_len)
    {
      gstring->str[gstring->len++] = c;
      gstring->str[gstring->len] = 0;
    }
  else
    g_string_insert_c (gstring, -1, c);
  return gstring;
}
#define g_string_append_c(gstr,c) g_string_append_c_inline (gstr, c)
#endif /* G_CAN_INLINE */
The G_CAN_INLINE preprocessor flag would have been enabled by some previously included header file.
It is a good example of an inline function: it is short (a dozen lines), and its own code runs quickly (excluding the time spent in the call to g_string_insert_c), so it is worth defining as static inline.
It is not worth inlining a function which by itself runs for a significant time. There is no point inlining a matrix multiplication, for example (the call overhead is insignificant compared with the time needed to multiply 100x100 or even 8x8 matrices). So choose carefully the functions you want to inline.
You should trust the compiler and enable its optimizations (in particular when benchmarking or profiling). For GCC, that would mean compiling with gcc -O3 -march=native (and I also recommend -Wall -Wextra to get useful warnings). You might use link-time optimization by compiling and linking with gcc -flto -O3 -march=native.
You need to be clear about the concepts of header files, translation units and separate compilation.
The #include directive does nothing more than insert the content of the included file at the point of inclusion, as if it were all one file, so in that sense placing content into a header file makes no semantic or performance difference compared with "shoving everything in one file".
The point is that this is not how header files should be used or what they are intended for; you will quickly run into linker errors and/or code bloat on anything other than the most trivial programs. A header file should generally contain only declarative code, not definitive code. Take a look inside the standard headers, for example - you will find no function definitions, only declarations (there may be some interfaces defined as macros or, since C99, possibly inline functions, but that is a different issue).
What header files provide is a means to support separate compilation and linking of code in separate translation units. A translation unit is a source file (.c in this case) with all its #include'd and #define'd content expanded by the pre-processor before actual compilation.
When the compiler builds a translation unit, there will be unresolved links to external code declared in headers. These declarations are a promise to the compiler that there is an interface of the form declared that is defined elsewhere and will be resolved by the linker.
The conventional form (although there are few restrictions to stop you from doing unconventional or foolish things) of a multi-module C program source is as follows:
main.c

#include "foobar.h"

int main( void )
{
    int x = foo() ;
    bar( x ) ;
    return 0 ;
}

foobar.h

#if !defined foobar_INCLUDE
#define foobar_INCLUDE

int foo( void ) ;
void bar( int x ) ;

#endif
Note the use of the pre-processor here to prevent multiple declarations when a file is included more than once which can happen in complex code bases with nested includes for example. All your headers should have such "include guards" - some compilers support #pragma once to do the same thing, but it is less portable.
foobar.c

#include "foobar.h"

int foo( void )
{
    int x = 0 ;
    // do something
    return x ;
}

void bar( int x )
{
    // do something
}
Then main.c and foobar.c (and any other modules) are separately compiled and then linked; the linker also resolves references to library interfaces provided by the standard library or any other external libraries. A library in this sense is simply a collection of previously separately compiled object code.
Now that that is perhaps clear, to answer your question, re-presented as the pros and cons of separate compilation and linking, the benefits are:
Code reuse - you can build your own libraries of useful routines that can be reused in many projects without error-prone copy & pasting.
Build time reduction - on a non-trivial application, separate compilation and linking would be managed by a build manager such as make or an IDE such as Eclipse or Visual Studio; these tools perform incremental builds, compiling only those modules for which the source or one of its header dependencies has been modified. This means you are not compiling all the code all the time, so turn-around during debugging and testing is much faster.
Development team scalability - if all your code is in one file, it becomes almost impractical to have multiple developers working on the same project at once. If you want to work with others, either on open-source projects or as a career (the two are not necessarily mutually exclusive, of course), you really cannot consider the all-in-one approach. Not least because your fellow developers will not take you seriously if that is your practice.
Specifically separate compilation and linking has zero impact on performance or code size under normal circumstances. There is possibly an impact on the ability of the compiler to optimise in some cases when it cannot see all of the code at one time, but if your code is carefully partitioned according to the principles of high cohesion and minimal coupling this potential loss of opportunity is probably insignificant. Moreover modern linkers are able to perform some cross-module optimisations such as unused code removal in any case.
It's not a question of which one is "faster". Header files are created when you have a function or functions that you want to use in a lot of other places or in other projects. For example, if you've written a function to calculate the factorial of a number and you want to use that function in other programs (or you find that you'd otherwise have to replicate the same code in other programs), then instead of rewriting the function in the other programs it is more convenient to put it in a header file. Generally, a header file contains functions that are relevant to a certain subject (like math.h, which contains functions for mathematical calculations and not for string processing).
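For instance, the factorial example would conventionally be split like this (a generic sketch: the declaration goes in the header, the single definition in a .c file that is compiled once and linked wherever it is needed):

/* factorial.h - the reusable interface */
#ifndef FACTORIAL_H
#define FACTORIAL_H

unsigned long factorial(unsigned int n);

#endif

/* factorial.c - the one definition, linked into every program that needs it */
#include "factorial.h"

unsigned long factorial(unsigned int n)
{
    unsigned long result = 1;
    while (n > 1)
        result *= n--;
    return result;
}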

How to automatically call all functions in C source code

Have you ever heard of automatic C code generators?
I have to do a kind of strange API functionality survey which includes at least one attempted execution of every function. It may lead to crashes or segmentation faults - no matter. I just need to register every function call.
So I got a long list (several hundred) of functions from the sources using
ctags -x --c-kinds=f *.c
Can I use any tool to generate code that calls every one of them? Thanks a lot.
UPD: thanks for all your answers.
You could also consider customizing the GCC compiler, e.g. with a MELT extension (which would, for example, generate the testing code during some customized compilation). Then you might even define your own #pragma or __attribute__ to parameterize these functions (enabling their auto-testing, giving default arguments for testing, etc.).
However, I'm not sure it is the right approach for unit testing. There are many unit testing frameworks (but I am not very familiar with them).
Maybe something like autoconf could help you with that: as described here. In particular check for AC_CHECK_FUNCS. Autoconf creates small programs to test the existence of registered functions.
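For reference, the probe that AC_CHECK_FUNCS compiles boils down to something like this (a simplified sketch of the generated conftest program, with foo standing in for the function being checked):

/* Declare the function with a deliberately generic prototype and merely
   reference it: the probe only checks that the symbol compiles and links,
   it does not exercise the function's real behaviour. */
char foo ();

int main (void)
{
    return foo ();
}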

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open-source library ("Fast Artificial Neural Network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including some headers I need and I did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design and there is no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
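The pattern itself looks roughly like this (a hypothetical sketch of the technique; the macro and file names are invented for illustration, and the real FANN sources differ in detail):

/* doublefann-style wrapper: select the numeric type for this binary and
   then pull in the shared implementation, which is therefore compiled
   once per configuration */
#define FANN_REAL double
#include "fann_impl.c"

Each such wrapper is compiled into its own object file, which is how one shared implementation produces the double, fixed and float variants.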
They provide makefiles and MSVS projects which should already avoid linking doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c) and so on, so either their files are screwed up or something similar has gone wrong on your side.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al., and possibly just removing those from the project is enough – then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
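For example, a header-resident helper would conventionally look like this (a generic sketch, not taken from the library in question):

/* clamp.h - safe to include from any number of translation units, because
   static inline gives each unit its own private copy of the function */
#ifndef CLAMP_H
#define CLAMP_H

static inline int clamp_int(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif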
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

Using Sparse to check C code

Does anyone have experience with Sparse? I seem unable to find any documentation, so the warnings and errors it produces are unclear to me. I tried checking the mailing list and man page, but there really isn't much in either.
For instance, I use INT_MAX in one of my files. This generates an error (undefined identifier) even though I #include limits.h.
Is there any place where the errors and warnings have been explained?
Sparse isn't intended to be a lint, per se. Sparse is intended to produce a parse tree of arbitrary code so that it can be further analyzed.
In your example, you probably want to define _GNU_SOURCE (which, I believe, turns on __GNUC__), which exposes the bits you need in limits.h.
I would avoid defining __GNUC__ on its own, as several things it activates might behave in an undefined way without all of the other switches that _GNU_SOURCE turns on being defined.
My point isn't to help you squash error after error; it's to reiterate that sparse is mostly used as a library, not as a standalone static analysis tool.
From my copy of the README (not sure if I have the current version):
This means that a user of the library will literally just need to do
struct string_list *filelist = NULL;
char *file;

action(sparse_initialize(argc, argv, filelist));

FOR_EACH_PTR_NOTAG(filelist, file) {
    action(sparse(file));
} END_FOR_EACH_PTR_NOTAG(file);
and he is now done - having a full C parse of the file he opened. The
library doesn't need any more setup, and once done does not impose any
more requirements. The user is free to do whatever he wants with the
parse tree that got built up, and needs not worry about the library ever
again. There is no extra state, there are no parser callbacks, there is
only the parse tree that is described by the header files. The action
function takes a pointer to a symbol_list and does whatever it likes with it.
The library also contains (as an example user) a few clients that do the
preprocessing, parsing and type evaluation and just print out the
results. These clients were done to verify and debug the library, and
also as trivial examples of what you can do with the parse tree once it
is formed, so that users can see how the tree is organized.
The included clients are more 'functional test suites and examples' than anything. It's a very useful tool, but you might consider another usage angle if you want to employ it. I like it because it doesn't use (f)lex/bison, which makes it remarkably easier to hack.
If you look at limits.h you'll see that INT_MAX is defined inside this #if:
/* If we are not using GNU CC we have to define all the symbols ourself.
   Otherwise use gcc's definitions (see below). */
#if !defined __GNUC__ || __GNUC__ < 2
So to get it to work, you should undefine __GNUC__ before including limits.h.
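In source form that can be as simple as the following sketch (passing -U__GNUC__ on the sparse command line should achieve the same thing):

/* Hide the compiler's built-in macro so that limits.h falls back to
   defining INT_MAX and friends itself. */
#undef __GNUC__
#include <limits.h>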
