Test embedded code by replacing static symbols at compile time - c

Background
I'm building a C application for an embedded Cortex M4 TI-RTOS SYS/BIOS target, however this question should apply to all embedded targets where a single binary is loaded onto some microprocessor.
What I want
I'd like to do some in situ regression tests on the target where I just replace a single function with some test function instead. E.g. a GetAdcMeasurement() function would return predefined values from a read-only array instead of doing the actual measurement and returning that value.
This could of course be done with a mess of #ifndefs, but I'd rather keep the production code as untouched as possible.
My attempt
I figure one way to achieve this would be to have duplicate symbol definitions at the linker stage, and then have the linker prioritise the definitions from the test suite (given some #define).
I've looked into using LD_PRELOAD, but that doesn't really seem to apply here (since I'm using only static objects).
Details
I'm using TI Code Composer, with TI-RTOS & SYS/BIOS on the Sitara AM57xx platform, compiling for the M4 remote processor (denoted IPU1).
Here's the path to the compiler and linker
/opt/ti/ccsv7/tools/compiler/ti-cgt-arm_16.9.6.LTS/bin/armcl

One solution could be to have multiple .c files for each module, one with the production code and one with the test code, and compile and link with one of the two. The globals and function signatures in both .c files must be at least the same (at least: the test file may define more symbols, but not fewer).
Another solution, building on the previous one, is to have two libraries, one with the production code and one with the test code, and link with one of the two. You could even link with both libraries, with the test version first, as linkers often resolve symbols in the order they are encountered.
And, as you said, you could work with a bunch of #ifdefs, which would have the advantage of having just one .c file, but would make the code less readable.
I would not go for #ifdefs on the function level, i.e. defining just one function of a .c file for test and keeping the others as is; however, if necessary, it could be a way. And if necessary, you could have one .c file (two) for each function, but that would negate the module concept.
I think the first approach would be the cleanest.
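As a rough sketch of that first approach, the test build could link a file like the one below in place of the production source. The file names adc.h and adc_test.c, the type adcdata_t and the sample values are assumptions made for illustration; only GetAdcMeasurement() comes from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* adc.h (shared by both builds): the interface the application codes against. */
typedef uint16_t adcdata_t;              /* assumed sample type */
adcdata_t GetAdcMeasurement(void);

/* adc_test.c: linked instead of adc.c in the test build. */
static const adcdata_t test_samples[] = { 100u, 200u, 300u };  /* arbitrary values */
static size_t next_sample = 0;

/* Same name and signature as the production function, but it replays
 * the read-only table (wrapping around) instead of touching hardware. */
adcdata_t GetAdcMeasurement(void)
{
    adcdata_t value = test_samples[next_sample];
    next_sample = (next_sample + 1u) % (sizeof test_samples / sizeof test_samples[0]);
    return value;
}
```

The production sources stay untouched; the build system decides whether adc.c or adc_test.c ends up in the binary.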

One additional approach (apart from Paul Ogilvie's) would be to have your mocking header also create a define which will replace the original function symbol at the pre-processing stage.
I.e. if your mocking header looks like this:
// mock.h
#ifdef MOCKING_ENABLED
adcdata_t GetAdcMeasurement_mocked(void);
stuff_t GetSomeStuff_mocked(void);
#define GetAdcMeasurement GetAdcMeasurement_mocked
#define GetSomeStuff GetSomeStuff_mocked
#endif
Then whenever you include the file, the preprocessor will replace the calls before it even hits the compiler:
#include "mock.h"
void SomeOtherFunc(void)
{
// preprocessor will change this symbol into 'GetAdcMeasurement_mocked'
adcdata_t data = GetAdcMeasurement();
}
The approach might confuse the unsuspecting reader of your code, because they won't necessarily realize that you are calling a different function altogether. Nevertheless, I find this approach to have the least impact on the production code (apart from adding the include, obviously).
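To make the ordering visible, here is a single-file sketch of the same idea. The adcdata_t typedef, the canned value 1234 and the caller ReadHalfScale() are hypothetical; in a real project the three parts would live in the production source, in mock.h plus a mock source file, and in the calling file.

```c
#include <stdint.h>

typedef uint16_t adcdata_t;              /* assumed sample type */

/* Production function, compiled before the rename macro is visible. */
adcdata_t GetAdcMeasurement(void)
{
    return 0u;                           /* placeholder for the real hardware read */
}

/* What the mock side would contribute when MOCKING_ENABLED is defined. */
adcdata_t GetAdcMeasurement_mocked(void)
{
    return 1234u;                        /* canned value, chosen arbitrarily */
}
#define GetAdcMeasurement GetAdcMeasurement_mocked

/* A caller: the preprocessor rewrites the call below to the mock. */
adcdata_t ReadHalfScale(void)
{
    return (adcdata_t)(GetAdcMeasurement() / 2u);
}
```

One thing to watch: if the header is included ahead of the real definition in the file that defines GetAdcMeasurement, that definition gets renamed as well and will collide with the mock at link time.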

(This is a quick summary of the discussion in the comments; thanks for the answers.)
A function can be redefined if it has the weak attribute, see
https://en.wikipedia.org/wiki/Weak_symbol
On GCC that would be the weak attribute, e.g.
int __attribute__((weak)) power2(int x);
and on the armcl (as in my question) that would be the pragma directive
#pragma weak power2
int power2(int x);
Letting the production code consist of partly weak functions will allow a test framework to replace single functions.
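A minimal sketch of how this could look with the GCC/Clang attribute syntax (for armcl, the #pragma weak form above would take the place of the attribute). The file names and the caller sum_of_squares() are made up for illustration, and the override is shown only as a comment because it has to live in a separate translation unit.

```c
/* power2.c (production): weak default definition. */
__attribute__((weak)) int power2(int x)
{
    return x * x;
}

/* A caller somewhere in the production code; it is unaware of any override. */
int sum_of_squares(int a, int b)
{
    return power2(a) + power2(b);
}

/*
 * power2_test.c (test build only) would hold an ordinary, non-weak
 * definition, which the linker picks over the weak one:
 *
 *     int power2(int x) { return 42; }
 */
```

When no test file is linked in, the weak definition is used and the production behaviour is unchanged.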

Related

Make unresolved linking dependencies reported at runtime instead of at compilation/program load time for the purposes of unit testing

I have a home-grown unit testing framework for C programs on Linux using GCC. For each file in the project, let's say foobar.c, a matching file foobar-test.c may exist. If that is the case, both files are compiled and statically linked together into a small executable foobar-test which is then run. foobar-test.c is expected to contain main() which calls all the unit test cases defined in foobar-test.c.
Let's say I want to add a new test file barbaz-test.c to exercise sort() inside an existing production file barbaz.c:
// barbaz.c
#include "barbaz.h"
#include "log.h" // declares log() as a linking dependency coming from elsewhere
int func1() { ... res = log(); ...}
int func2() {... res = log(); ...}
int sort() {...}
Besides sort() there are several other functions in the same file which call into log() defined elsewhere in the project.
The functionality of sort() does not depend on log(), so testing it will never reach log(). Neither func1() nor func2() require testing and won't be reachable from the new test case I am about to prepare.
However, the barbaz-test executable cannot be successfully linked until I provide stub implementations of all dependencies coming from barbaz.c. A usual stub looks like this:
// in barbaz-test.c
#include <assert.h>
#include <stdbool.h>
#include "barbaz.h"
#include "log.h"
int log() {
    assert(false && "stub must not be reached");
    return 0;
}
// Actual test case for sort() starts here
...
If barbaz.c is large (which is often the case for legacy code written with no regard to the possibility to test it), it will contain many linking dependencies. I cannot start writing a test case for sort() until I provide stubs for all of them. Additionally, it creates a burden of maintaining these stubs, i.e. updating their prototypes whenever the production counterpart is updated, not forgetting to delete stubs which no longer are required etc.
What I am looking for is an option to have late runtime binding performed for missing symbols, similarly to how it is done in dynamic languages, but for C. If an unresolved symbol is reached during the test execution, that should lead to a failure. Having a proper diagnostic about the reason would be ideal, but a simple NULL pointer dereference would be good enough.
My current solution is to automate the initial generation of the stubs' source code. It works by analyzing the linker's error messages and then looking up declarations for the missing symbols in the headers. It is done in an ad-hoc manner, e.g. it involves "parsing" C code with regular expressions.
Needless to say, it is very fragile: it depends on the specific format of the linker error messages and on uniformly formatted function declarations that the regexps can recognize. It does not solve the future maintenance burden such stubs create either.
Another approach is to collect stubs for the most "popular" linking dependencies into a common object file which is then always linked into the test executables. This leaves a shorter list of "unique" dependencies requiring attention for each new file. This approach breaks down when a slightly specialized version of a common stub function has to be prepared. In such cases linking would fail with "the same symbol defined twice".
I may have stumbled on a solution myself, inspired by this discussion: Why can't ld ignore an unused unresolved symbol?
The linker can for sure determine if certain linking dependencies are not reachable. But it is not allowed to remove them by default because the compiler has put all function symbols into the same ELF section. The linker is not allowed to modify sections, but is allowed to drop whole sections.
A solution would be to add -fdata-sections and -ffunction-sections to compiler flags, and --gc-sections to linker flags.
The former options make the compiler emit one section per function (and per data item). The latter allows the linker to remove unreachable sections.
I do not think these flags can be safely used in a project without doing some benchmarking of the effects first. They affect size/speed of the production code.
man gcc says:
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker create larger object and executable files and are also slower. These options affect code generation. They prevent optimizations by the compiler and assembler using relative locations inside a translation unit since the locations are unknown until link time.
And it goes without saying that the solution only applies to the GCC/GNU Binutils toolchain.

Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file code get copied to the final executable or only the machine code for the specific function is generated. For example, if I call std::sort from the <algorithm> header in C++, is the machine code generated only for the sort() function or for the entire <algorithm> header file.
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code of sort (and functions it calls, if any) to the executable. The other algorithms' code isn't getting added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two separate instantiations of the vector code you use, one per type. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
In fact, the entire file is copied into .cpp file, and it depends on compiler/linker, if it picks up only 'needed' functions, or all of them.
In general, simplified summary:
debug configuration means compiling in all of non-template functions,
release configuration strips all unneeded functions.
Plus it depends on attributes -> a function declared for export will never be stripped.
On the other side, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

Gathering test symbols into an array statically in C/C++

Short version of question
Is it possible to gather specific symbols in C into a single list/array in the executable statically at compile time, without relying on crt initialization? (I frequently support embedded targets, and have limited support for dynamic memory.)
EDIT: I'm 100% ok with this happening at link time and also ok with not having symbols cross library boundaries.
EDIT 2: I'm also OK with compiler specific answers if it's gcc or clang but would prefer cross platform if possible.
Longer version with more background
This has been a pain in my side for a while.
Right now I have a number of built-in self tests that I like to run in order.
I enforce the same calling convention on all functions and am manually gathering all the tests into an array statically.
// ThisLibrary_testlist.h
#define DECLARE_TEST(TESTNAME) void TESTNAME##_test(void * test_args)
DECLARE_TEST(test1);
DECLARE_TEST(test2);
DECLARE_TEST(test3);
// ThisLibrary_some_module.c
#include "ThisLibrary_testlist.h"
DECLARE_TEST(test1)
{
// ... do good stuff here
}
// ThisLibrary_testarray.c
#include "ThisLibrary_testlist.h"
typedef void (*testfunc_t) (void*);
#define LIST_TEST(TESTNAME) TESTNAME##_test
testfunc_t tests[] =
{
&LIST_TEST(test1),
&LIST_TEST(test2)
};
// now it's an array... you know what to do.
So far this has kept me alive but it's getting kind of ridiculous that I have to basically modify the code in 3 separate locations if I want to update a test.
Not to mention the absolute #ifdef nightmare that comes with conditionally compiled tests.
Is there a better way?
With a bit of scripting magic you could do the following: After compiling your source files (but before linking) you search the object files for symbols that match your test name pattern. See man nm for how to obtain symbol names from object files (well, on Unix, that is - no idea about Windows, sorry). Based on the list of symbol names found, you auto-create the file ThisLibrary_testarray.c, putting in all the extern declarations and then the function pointer table. After generation of this file, you compile it and finally link everything.
This way you only have to add new test functions to the source files. No need to maintain the header file ThisLibrary_testlist.h, but you have to make sure the test functions have external linkage, follow the naming pattern - and be sure no other symbol uses the naming pattern :-)
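For illustration, the generated ThisLibrary_testarray.c could look roughly like the sketch below. The two test bodies are stand-ins so the example is self-contained, and test_count and run_all_tests() are hypothetical names that are not part of the question's code.

```c
#include <stddef.h>

typedef void (*testfunc_t)(void *);

/* Stand-ins for tests that would live in other source files. */
void test1_test(void *test_args) { *(int *)test_args += 1; }
void test2_test(void *test_args) { *(int *)test_args += 10; }

/* --- generated part: one declaration and one entry per "*_test" symbol --- */
extern void test1_test(void *test_args);
extern void test2_test(void *test_args);

const testfunc_t tests[] = {
    test1_test,
    test2_test,
};
const size_t test_count = sizeof tests / sizeof tests[0];

/* Hand-written runner: calls every collected test in table order
 * and returns how many were run. */
size_t run_all_tests(void *test_args)
{
    size_t i;
    for (i = 0; i < test_count; ++i) {
        tests[i](test_args);
    }
    return test_count;
}
```

The table is plain constant data, so nothing depends on startup code or dynamic memory.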

How can you share an internal set of functions between translation units without them having external linkage?

Let's say you are writing a library and you have a bunch of utility functions you have written just for yourself. Of course, you wouldn't want these functions to have external linkage, so that they can't clash with anything your library users define (mostly because you are not going to tell the outside world of their existence).
On the other hand, these functions may be used in different translation units, so you want them to be shared internally.
Let's give an example. You have a library that does some stuff and in different source files you may need to copy_file and create_directory, so you would implement them as utility functions.
To make sure the user of your library doesn't accidentally get a linkage error because of having a function with the same name, I can think of the following solutions:
Terrible way: Copy paste the functions to every file that uses them adding static to their declaration.
Not a good way: Write them as macros. I like macros, but this is just not right here.
Give them such a weird name, that the chances of the user producing the same name would be small enough. This might work, but it makes the code using them very ugly.
What I do currently: Write them as static functions in an internal utils.h file and include that file in the source files.
Now the last option works almost fine, except it has one issue: If you don't use one of the functions, at the very least you get a warning about it (that says function declared static but never used). Call me crazy, but I keep my code warning free.
What I resorted to do was something like this:
utils.h:
...
#ifdef USE_COPY_FILE
static int copy_file(/* args */)
{...}
#endif
#ifdef USE_CREATE_DIR
static int create_dir(/* args */)
{...}
#endif
...
file1.c:
#define USE_COPY_FILE
#define USE_CREATE_DIR
#include "utils.h"
/* use both functions */
file2.c:
#define USE_COPY_FILE
#include "utils.h"
/* use only copy_file */
The problem with this method however is that it starts to get ugly as more utilities are introduced. Imagine if you have 10 of such functions, you need to have 7~8 lines of define before the include, if you need 7~8 of these functions!
Of course, another way would be to use DONT_USE_* type of macros that exclude functions, but then again you need a lot of defines for a file that uses few of these utility functions.
Either way, it doesn't look elegant.
My question is, how can you have functions that are internal to your own library, used by multiple translation units, and avoid external linkage?
Marking the functions static inline instead of static will make the warnings go away. It will do nothing about the code bloat of your current solution -- you're putting at least one copy of the function into each TU that uses it, and this will still be the case. Oli says in a comment that the linker might be smart enough to merge them. I'm not saying it isn't, but don't count on it :-)
It might even make the bloat worse, by encouraging the compiler to actually inline calls to the functions so that you get multiple copies per TU. But it's unlikely, GCC mostly ignores that aspect of the inline keyword. It inlines calls or not according to its own rules.
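A small sketch of the static inline variant, assuming two hypothetical helpers util_min() and util_max() in place of copy_file and create_dir (whose arguments the question leaves out). In a real project the first two definitions would sit in the internal utils.h.

```c
/* utils.h (internal): each including file gets its own private copy. */
static inline int util_min(int a, int b) { return a < b ? a : b; }
static inline int util_max(int a, int b) { return a > b ? a : b; }

/* file2.c: uses only util_min; util_max is simply left unused here,
 * and no USE_* defines are needed before the include. */
int clamp_to_limit(int value, int limit)
{
    return util_min(value, limit);
}
```

Neither helper gets external linkage, so a user of the library can define their own util_min without a link conflict.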
That's basically the best you can do portably. There's no way in standard C to define a symbol that's external from the POV of certain TUs (yours), but not from the POV of others (your users'). Standard C doesn't really care what libraries are, or the fact that TUs might be linked in several steps, or the difference between static and dynamic linking. So if you want the functions to be actually shared between your TUs, without any external symbol that could interfere with users of the library, then you need to do something specific to GCC and/or your static library or dll format to remove the symbols once the library is built but before the user links against it.
You can link your library normally, having these functions global, and localize them later.
objcopy can take global symbols and make them local, so they can't be linked with. It can also delete the symbol (the function stays, resolved references to it remain resolved, just the name is gone).
objcopy -L symbol localizes symbol. You can repeat -L multiple times.
objcopy -G symbol keeps symbol global, but localizes all others. You can repeat it also, and it will keep global all those you specified.
And I just found that I'm repeating the answer to this question, which Oli Charlesworth referenced in his comment.

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open-source library ("fast artificial neural network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (as I'm only including some headers I need and I did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design and there is no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
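The macro-based alternative could be sketched as follows. Everything here (NN_DOUBLE, NN_FIXED, nn_type, nn_sum and the file names) is hypothetical and is not taken from the FANN sources; it only illustrates compiling one implementation several times with different -D flags.

```c
/* nn_config.h (hypothetical): pick the numeric type from a compiler flag,
 * e.g. -DNN_DOUBLE or -DNN_FIXED. */
#if defined(NN_DOUBLE)
typedef double nn_type;
#elif defined(NN_FIXED)
typedef int nn_type;
#else
typedef float nn_type;                   /* default when no flag is given */
#endif

/* nn.c (hypothetical): a single implementation, built once per configuration. */
nn_type nn_sum(const nn_type *values, int count)
{
    nn_type total = 0;
    int i;
    for (i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

The trade-off is that the variants now differ only by build flags, so the build system has to produce and name one object file per configuration.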
They provide makefiles and MSVS projects which should already not link doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c) and so on, and either their files are screwed up or something similar has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c, et al.; possibly just removing those from the project is enough, since then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in-depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

Resources