Gathering test symbols into an array statically in C/C++

Short version of question
Is it possible to gather specific symbols in C into a single list/array in the executable, statically at compile time, without relying on CRT initialization? (I frequently support embedded targets and have limited support for dynamic memory.)
EDIT: I'm 100% ok with this happening at link time and also ok with not having symbols cross library boundaries.
EDIT 2: I'm also OK with compiler specific answers if it's gcc or clang but would prefer cross platform if possible.
Longer version with more background
This has been a pain in my side for a while.
Right now I have a number of built-in self tests that I like to run in order.
I enforce the same calling convention on all functions and am manually gathering all the tests into an array statically.
// ThisLibrary_testlist.h
#define DECLARE_TEST(TESTNAME) void TESTNAME##_test(void * test_args)
DECLARE_TEST(test1);
DECLARE_TEST(test2);
DECLARE_TEST(test3);
// ThisLibrary_some_module.c
#include "ThisLibrary_testlist.h"
DECLARE_TEST(test1)
{
// ... do good stuff here
}
// ThisLibrary_testarray.c
#include "ThisLibrary_testlist.h"
typedef void (*testfunc_t) (void*);
#define LIST_TEST(TESTNAME) TESTNAME##_test
testfunc_t tests[] =
{
&LIST_TEST(test1),
&LIST_TEST(test2)
};
// now it's an array... you know what to do.
So far this has kept me alive but it's getting kind of ridiculous that I have to basically modify the code in 3 separate locations if I want to update a test.
Not to mention the absolute #ifdef nightmare that comes with conditionally compiled tests.
Is there a better way?

With a bit of scripting magic you could do the following: after compiling your source files (but before linking), search the object files for symbols that match your test name pattern. See man nm for how to obtain symbol names from object files (on Unix, that is; I have no idea about Windows, sorry). Based on the list of symbol names found, auto-generate the file ThisLibrary_testarray.c, putting in all the extern declarations and then the function pointer table. After generating this file, compile it and finally link everything.
This way you only have to add new test functions to the source files. No need to maintain the header file ThisLibrary_testlist.h, but you have to make sure the test functions have external linkage, follow the naming pattern - and be sure no other symbol uses the naming pattern :-)
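As a rough illustration, the generated file might look like the sketch below. The two test bodies are stand-ins added only so the sketch compiles on its own (in a real build they would live in other object files and be discovered with nm), and run_all_tests and tests_count are hypothetical names, not part of the question's code.

```c
/* ThisLibrary_testarray.c: sketch of what a generator script might emit. */
#include <assert.h>
#include <stddef.h>

typedef void (*testfunc_t)(void *);

/* Stand-ins for tests defined elsewhere; here each one bumps a counter. */
void test1_test(void *test_args) { ++*(int *)test_args; }
void test2_test(void *test_args) { ++*(int *)test_args; }

/* ---- generated part: one extern line per symbol matching *_test ---- */
extern void test1_test(void *test_args);
extern void test2_test(void *test_args);

/* Constant table, so it needs no runtime initialization. */
const testfunc_t tests[] = {
    test1_test,
    test2_test,
};
const size_t tests_count = sizeof tests / sizeof tests[0];

/* Hypothetical runner: calls every collected test in table order. */
void run_all_tests(void *test_args)
{
    assert(tests_count > 0);
    for (size_t i = 0; i < tests_count; ++i) {
        tests[i](test_args);
    }
}
```

Because the table is a constant initialized with function addresses, it is filled in at link time and does not depend on startup code or dynamic memory.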

Related

Make unresolved linking dependencies reported at runtime instead of at compilation/program load time for the purposes of unit testing

I have a home-grown unit testing framework for C programs on Linux using GCC. For each file in the project, let's say foobar.c, a matching file foobar-test.c may exist. If that is the case, both files are compiled and statically linked together into a small executable foobar-test which is then run. foobar-test.c is expected to contain main() which calls all the unit test cases defined in foobar-test.c.
Let's say I want to add a new test file barbaz-test.c to exercise sort() inside an existing production file barbaz.c:
// barbaz.c
#include "barbaz.h"
#include "log.h" // declares log() as a linking dependency coming from elsewhere
int func1() { ... res = log(); ...}
int func2() {... res = log(); ...}
int sort() {...}
Besides sort() there are several other functions in the same file which call into log() defined elsewhere in the project.
The functionality of sort() does not depend on log(), so testing it will never reach log(). Neither func1() nor func2() require testing and won't be reachable from the new test case I am about to prepare.
However, the barbaz-test executable cannot be successfully linked until I provide stub implementations of all dependencies coming from barbaz.c. A usual stub looks like this:
// in barbaz-test.c
#include "barbaz.h"
#include "log.h"
int log() {
assert(false && "stub must not be reached");
return 0;
}
// Actual test case for sort() starts here
...
If barbaz.c is large (which is often the case for legacy code written with no regard to the possibility to test it), it will contain many linking dependencies. I cannot start writing a test case for sort() until I provide stubs for all of them. Additionally, it creates a burden of maintaining these stubs, i.e. updating their prototypes whenever the production counterpart is updated, not forgetting to delete stubs which no longer are required etc.
What I am looking for is an option to have late runtime binding performed for missing symbols, similarly to how it is done in dynamic languages, but for C. If an unresolved symbol is reached during the test execution, that should lead to a failure. Having a proper diagnostic about the reason would be ideal, but a simple NULL pointer dereference would be good enough.
My current solution is to automate the initial generation of the stubs' source code. It works by analyzing the linker error messages and then looking up declarations for the missing symbols in the headers. It is done in an ad-hoc manner, e.g. it involves "parsing" C code with regular expressions.
Needless to say, it is very fragile: it depends on the specific format of the linker error messages and on uniformly formatted function declarations for the regexps to recognize. It does not solve the future maintenance burden such stubs create either.
Another approach is to collect stubs for the most "popular" linking dependencies into a common object file which is then always linked into the test executables. This leaves a shorter list of "unique" dependencies requiring attention for each new file. This approach breaks down when a slightly specialized version of a common stub function has to be prepared. In such cases linking would fail with "the same symbol defined twice".
I may have stumbled on a solution myself, inspired by this discussion: Why can't ld ignore an unused unresolved symbol?
The linker can for sure determine if certain linking dependencies are not reachable. But it is not allowed to remove them by default because the compiler has put all function symbols into the same ELF section. The linker is not allowed to modify sections, but is allowed to drop whole sections.
A solution would be to add -fdata-sections and -ffunction-sections to compiler flags, and --gc-sections to linker flags.
The former options create one section per function (and per data item) during compilation. The latter allows the linker to remove unreachable code.
I do not think these flags can be safely used in a project without doing some benchmarking of the effects first. They affect size/speed of the production code.
man gcc says:
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker create larger object and executable files and are also slower. These options affect code generation. They prevent optimizations by the compiler and assembler using relative locations inside a translation unit since the locations are unknown until link time.
And it goes without saying that the solution only applies to the GCC/GNU Binutils toolchain.
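A different toolchain-specific route, not the one described above, is to declare the missing dependency as a weak symbol. With GCC or Clang on ELF targets, an undefined weak function resolves to a null address at link time, so the link succeeds and a call that is actually reached typically faults. The sketch below assumes that behaviour; log_msg, largest and dependency_is_linked are hypothetical names standing in for log() and sort().

```c
#include <assert.h>
#include <stddef.h>

/* Test-only declaration (assumption): weak, so no definition is required. */
extern int log_msg(const char *msg) __attribute__((weak));

/* Not under test. If reached with no log_msg linked in, the call goes
   through a null address and typically crashes. */
int func1(void)
{
    return log_msg("func1");
}

/* Under test; never touches log_msg. */
int largest(const int *values, size_t count)
{
    assert(values != NULL && count > 0);
    int best = values[0];
    for (size_t i = 1; i < count; ++i) {
        if (values[i] > best) {
            best = values[i];
        }
    }
    return best;
}

/* A weak symbol's address may be compared against NULL for diagnostics. */
int dependency_is_linked(void)
{
    return log_msg != NULL;
}
```

This only works if the test build sees the weak declaration, so the production header would need a test-only variant, and the crash carries no diagnostic beyond the faulting address.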

Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file code get copied to the final executable, or is machine code generated only for the specific function? For example, if I call std::sort from the <algorithm> header in C++, is the machine code generated only for the sort() function or for the entire <algorithm> header file?
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the machine code of sort (and functions it calls, if any) to the executable. The other algorithms' code isn't added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell: templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
In fact, the entire file is copied into .cpp file, and it depends on compiler/linker, if it picks up only 'needed' functions, or all of them.
In general, simplified summary:
debug configuration means compiling in all of non-template functions,
release configuration strips all unneeded functions.
Plus it depends on attributes: a function declared for export will never be stripped.
On the other side, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

Test embedded code by replacing static symbols at compile time

Background
I'm building a C application for an embedded Cortex M4 TI-RTOS SYS/BIOS target, however this question should apply to all embedded targets where a single binary is loaded onto some microprocessor.
What I want
I'd like to do some in situ regression tests on the target where I just replace a single function with some test function instead. E.g. a GetAdcMeasurement() function would return predefined values from a read-only array instead of doing the actual measurement and returning that value.
This could of course be done with a mess of #ifndefs, but I'd rather keep the production code as untouched as possible.
My attempt
I figure one way to achieve this would be to have duplicate symbol definitions at the linker stage, and then have the linker prioritise the definitions from the test suite (given some #define).
I've looked into using LD_PRELOAD, but that doesn't really seem to apply here (since I'm using only static objects).
Details
I'm using TI Code Composer, with TI-RTOS & SYS/BIOS on the Sitara AM57xx platform, compiling for the M4 remote processor (denoted IPU1).
Here's the path to the compiler and linker
/opt/ti/ccsv7/tools/compiler/ti-cgt-arm_16.9.6.LTS/bin/armcl
One solution could be to have multiple .c files for each module, one with the production code and one with the test code, and compile and link with one of the two. The globals and function signatures in the two .c files must match (the test file may define more symbols, but not fewer).
Another solution, building on the previous one, is to have two libraries, one with the production code and one with the test code, and link with one of them. You could even link with both libraries, with the test version first, as linkers often resolve symbols in the order they are encountered.
And, as you said, you could work with a bunch of #ifdefs, which would have the advantage of having just one .c file, but would make the code less readable.
I would not go for #ifdefs on the function level, i.e. defining just one function of a .c file for test and keeping the others as is; however, if necessary, it could be a way. And if necessary, you could have one .c file per function (or rather two, production and test), but that would negate the module concept.
I think the first approach would be the cleanest.
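A minimal sketch of the test-side file for that first approach follows. It assumes adcdata_t is an int and that GetAdcMeasurement() is declared in a shared header; the file name adc_test.c and the sample values are made up for illustration.

```c
/* adc_test.c: linked instead of the production adc.c in test builds. */
#include <assert.h>
#include <stddef.h>

/* Normally these two lines come from the shared header (assumption). */
typedef int adcdata_t;
adcdata_t GetAdcMeasurement(void);

/* Predefined readings kept in read-only memory. */
static const adcdata_t canned_samples[] = { 100, 200, 300 };
static size_t next_sample = 0;

/* Same signature as the production function; returns the canned values
   in order and wraps around at the end of the array. */
adcdata_t GetAdcMeasurement(void)
{
    const size_t count = sizeof canned_samples / sizeof canned_samples[0];
    assert(count > 0);
    adcdata_t value = canned_samples[next_sample];
    next_sample = (next_sample + 1) % count;
    return value;
}
```

The production code stays untouched; only the list of source files handed to the linker differs between the two builds.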
One additional approach (apart from Paul Ogilvie's) would be to have your mocking header also create a define which will replace the original function symbol at the pre-processing stage.
I.e. if your mocking header looks like this:
// mock.h
#ifdef MOCKING_ENABLED
adcdata_t GetAdcMeasurement_mocked(void);
stuff_t GetSomeStuff_mocked(void);
#define GetAdcMeasurement GetAdcMeasurement_mocked
#define GetSomeStuff GetSomeStuff_mocked
#endif
Then whenever you include the file, the preprocessor will replace the calls before it even hits the compiler:
#include "mock.h"
void SomeOtherFunc(void)
{
// preprocessor will change this symbol into 'GetAdcMeasurement_mocked'
adcdata_t data = GetAdcMeasurement();
}
The approach might confuse the unsuspecting reader of your code, because they won't necessarily realize that you are calling a different function altogether. Nevertheless, I find this approach to have the least impact on the production code (apart from adding the include, obviously).
(This is a quick summary of the discussion in the comments, thanks for the answers)
A function can be redefined if it has the weak attribute, see
https://en.wikipedia.org/wiki/Weak_symbol
On GCC that would be the weak attribute, e.g.
int __attribute__((weak)) power2(int x);
and on the armcl (as in my question) that would be the pragma directive
#pragma weak power2
int power2(int x);
Letting the production code consist of partly weak functions will allow a test framework to replace single functions.
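A small sketch of the production side, assuming GCC or Clang attribute syntax: the weak definition is used unless another object file in the link supplies a regular definition of the same name. area_of_square is a hypothetical caller added for illustration.

```c
/* power2.c (production): default implementation marked weak. */
#include <assert.h>

__attribute__((weak)) int power2(int x)
{
    return x * x;
}

/* Hypothetical caller; it binds to whichever power2 the linker selects. */
int area_of_square(int side)
{
    assert(side >= 0);
    return power2(side);
}

/* A test build would add a separate file with a regular definition, e.g.
 *
 *     int power2(int x) { (void)x; return 42; }
 *
 * It cannot live in this file: a translation unit may define a function
 * only once. */
```

Without the override in the link, callers get the weak default, so the production binary behaves as before.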

Removing internal symbols from C static library

I'm working on some embedded code that is shipped as a static library. We would like to remove all internal symbols from the library and keep only the API symbols visible.
Here's an example of what we want to do: imagine that you have a file called internal.c and one called api.c that look like that:
/* internal.c */
int fibonacci(int n)
{
/* Compute the nth Fibonacci number and return it */
}
/* api.c */
#include "internal.h"
#include <stdio.h>
void print_fibonacci(int n)
{
printf("Fibonacci(%d): %d\n", n, fibonacci(n));
}
The user should only have access to the print_fibonacci function while all internal symbols such as the fibonacci function should be resolved before shipping. That means that the user should be able to define his own function called fibonacci without having to worry about conflicts with the library.
We already tried internal linkage by using ld --relocatable, but we can't seem to remove the symbols afterwards using objcopy. Is this feasible at all?
Thanks for the help!
Edit: The user-defined fibonacci function should not replace the library-defined function, they should just be able to coexist. Basically I'm looking for a solution to solve naming conflicts.
Static libraries are essentially a bunch of object files. All object files in a static library are treated as if they were provided individually by the linker. Generally, it is not possible to make the linker treat some symbols as internal, the linker simply does not have enough information to do so.
Here are a couple of strategies to solve these issues:
Construct a separate name space for non-public functions in your library. For instance, your fibonacci function can be placed in an internal name space as libfoo_internal_fibonacci. If you're desperate, you can use macros in your internal header files like this:
#define fibonacci INTERNAL_PREFIX ## fibonacci
This would allow you to change the prefix arbitrarily at compile time. I suggest not doing that, as it makes debugging harder. If you can cope with longer internal names, this would be a good strategy.
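One caveat with the one-line macro: ## in an object-like macro pastes the literal token INTERNAL_PREFIX rather than its value, so the result would be INTERNAL_PREFIXfibonacci. A sketch that expands the prefix first needs an extra level of macros; PASTE, PASTE2 and INTERNAL are hypothetical helper names, and libfoo_internal_ is just the example prefix from the text.

```c
/* internal.h + internal.c sketch: rename an internal function via macros. */
#include <assert.h>

#define INTERNAL_PREFIX libfoo_internal_

/* Two levels, so INTERNAL_PREFIX is expanded before ## pastes the tokens. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)
#define INTERNAL(name) PASTE(INTERNAL_PREFIX, name)

/* From here on, "fibonacci" in library code means libfoo_internal_fibonacci. */
#define fibonacci INTERNAL(fibonacci)

/* Defines the symbol libfoo_internal_fibonacci. */
int fibonacci(int n)
{
    assert(n >= 0);
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Library code keeps writing fibonacci(n), while the object file only exports the prefixed name, which leaves the plain name free for the user.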
Make all internal functions static and merge translation units so that each internal function is only used by one translation unit. This might solve your problem but it makes the resulting programs larger: Most linkers can either take an object as a whole or not take it at all. You might end up with lots of dead code in the program if the linker has to include huge object files if you want to use just a single function.
Turn your library into a shared library and use mapfiles or a different mechanism to specify which symbols are supposed to be exported. This is the best option in my opinion but it's not totally portable and perhaps you really want your library to remain static.

How can you share an internal set of functions between translation units without them having external linkage?

Let's say you are writing a library and you have a bunch of utility functions you have written just for yourself. Of course, you wouldn't want these functions to have external linkage so that they won't get mixed up by your library users (mostly because you are not going to tell the outside world of their existence)
On the other hand, these functions may be used in different translation units, so you want them to be shared internally.
Let's give an example. You have a library that does some stuff and in different source files you may need to copy_file and create_directory, so you would implement them as utility functions.
To make sure the user of your library doesn't accidentally get a linkage error because of having a function with the same name, I can think of the following solutions:
Terrible way: Copy paste the functions to every file that uses them adding static to their declaration.
Not a good way: Write them as macros. I like macros, but this is just not right here.
Give them such a weird name, that the chances of the user producing the same name would be small enough. This might work, but it makes the code using them very ugly.
What I do currently: Write them as static functions in an internal utils.h file and include that file in the source files.
Now the last option works almost fine, except it has one issue: If you don't use one of the functions, at the very least you get a warning about it (that says function declared static but never used). Call me crazy, but I keep my code warning free.
What I resorted to do was something like this:
utils.h:
...
#ifdef USE_COPY_FILE
static int copy_file(/* args */)
{...}
#endif
#ifdef USE_CREATE_DIR
static int create_dir(/* args */)
{...}
#endif
...
file1.c:
#define USE_COPY_FILE
#define USE_CREATE_DIR
#include "utils.h"
/* use both functions */
file2.c:
#define USE_COPY_FILE
#include "utils.h"
/* use only copy_file */
The problem with this method however is that it starts to get ugly as more utilities are introduced. Imagine if you have 10 of such functions, you need to have 7~8 lines of define before the include, if you need 7~8 of these functions!
Of course, another way would be to use DONT_USE_* type of macros that exclude functions, but then again you need a lot of defines for a file that uses few of these utility functions.
Either way, it doesn't look elegant.
My question is, how can you have functions that are internal to your own library, used by multiple translation units, and avoid external linkage?
Marking the functions static inline instead of static will make the warnings go away. It will do nothing about the code bloat of your current solution -- you're putting at least one copy of the function into each TU that uses it, and this will still be the case. Oli says in a comment that the linker might be smart enough to merge them. I'm not saying it isn't, but don't count on it :-)
It might even make the bloat worse, by encouraging the compiler to actually inline calls to the functions so that you get multiple copies per TU. But it's unlikely, GCC mostly ignores that aspect of the inline keyword. It inlines calls or not according to its own rules.
That's basically the best you can do portably. There's no way in standard C to define a symbol that's external from the POV of certain TUs (yours), but not from the POV of others (your users'). Standard C doesn't really care what libraries are, or the fact that TUs might be linked in several steps, or the difference between static and dynamic linking. So if you want the functions to be actually shared between your TUs, without any external symbol that could interfere with users of the library, then you need to do something specific to GCC and/or your static library or dll format to remove the symbols once the library is built but before the user links against it.
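A minimal sketch of the static inline variant, with the header and one user shown together so it compiles as a single file. The helpers util_min and util_bounded_len are hypothetical stand-ins for copy_file and create_dir, whose arguments are not given in the question.

```c
/* utils.h sketch: internal helpers with internal linkage. */
#include <assert.h>
#include <stddef.h>

/* Unused in the "file2.c" part below; with static inline, GCC and Clang
   do not report it as an unused function. */
static inline size_t util_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Length of s, but never more than limit characters. */
static inline size_t util_bounded_len(const char *s, size_t limit)
{
    assert(s != NULL);
    size_t n = 0;
    while (n < limit && s[n] != '\0') {
        ++n;
    }
    return n;
}

/* file2.c sketch: uses only one of the helpers, with no USE_* defines. */
size_t file2_name_length(const char *name)
{
    return util_bounded_len(name, 8);
}
```

Each translation unit that includes the header still gets its own copy of whatever it uses, so this removes the warnings and the USE_* defines, not the duplication.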
You can link your library normally, having these functions global, and localize them later.
objcopy can take global symbols and make them local, so they can't be linked with. It can also delete the symbol (the function stays, resolved references to it remain resolved, just the name is gone).
objcopy -L symbol localizes symbol. You can repeat -L multiple times.
objcopy -G symbol keeps symbol global, but localizes all others. You can repeat it also, and it will keep global all those you specified.
And I just found that I'm repeating the answer to this question, which Oli Charlesworth referenced in his comment.
