Removing internal symbols from C static library - c

I'm working on some embedded code that is shipped as a static library. We would like to remove all internal symbols from the library and keep only the API symbols visible.
Here's an example of what we want to do: imagine that you have a file called internal.c and one called api.c that look like that:
/* internal.c */
int fibonacci(int n)
{
/* Compute the nth Fibonacci number and return it */
}
/* api.c */
#include "internal.h"
#include <stdio.h>
void print_fibonacci(n)
{
printf("Fibonacci(%d): %d\n", n, fibonacci(n));
}
The user should only have access to the print_fibonacci function while all internal symbols such as the fibonacci function should be resolved before shipping. That means that the user should be able to define his own function called fibonacci without having to worry about conflicts with the library.
We already tried internal linkage by using ld --relocatable, but we can't seem to remove the symbols afterwards using objcopy. Is this feasible at all?
Thanks for the help!
Edit: The user-defined fibonacci function should not replace the library-defined function, they should just be able to coexist. Basically I'm looking for a solution to solve naming conflicts.

Static libraries are essentially a bunch of object files. All object files in a static library are treated as if they were provided individually by the linker. Generally, it is not possible to make the linker treat some symbols as internal, the linker simply does not have enough information to do so.
Here are a couple of strategies to solve these issues:
Construct a separate name space for non-public functions in your library. For instance, your fibonacci function can be placed in an internal name space libfoo_internal_fibonacci. If you're desparate, you can use macros in your internal header files like this:
#define fibonacci INTERNAL_PREFIX ## fibonacci
This would allow you to change the prefix arbitrarily on compile time. I suggest to not do that as it makes debugging harder. If you can cope with longer internal names, this would be a good strategy.
Make all internal functions static and merge translation units so that each internal function is only used by one translation unit. This might solve your problem but it makes the resulting programs larger: Most linkers can either take an object as a whole or not take it at all. You might end up with lots of dead code in the program if the linker has to include huge object files if you want to use just a single function.
Turn your library into a shared library and use mapfiles or a different mechanism to specify which symbols are supposed to be exported. This is the best option in my opinion but it's not totally portable and perhaps you really want your library to remain static.

Related

Gathering test symbols into an array statically in C/C++

Short version of question
Is it possible to gather specific symbols in C into a single list/array into the executable statically at compile time, without relying on crt initialization (I frequently support embedded targets, and have limited support on dynamic memory).
EDIT: I'm 100% ok with this happening at link time and also ok with not having symbols cross library boundaries.
EDIT 2: I'm also OK with compiler specific answers if it's gcc or clang but would prefer cross platform if possible.
Longer version with more background
This has been a pain in my side for a while.
Right now I have a number of built-in self tests that I like to run in order.
I enforce the same calling convention on all functions and am manually gathering all the tests into an array statically.
// ThisLibrary_testlist.h
#define DECLARE_TEST(TESTNAME) void TESTNAME##_test(void * test_args)
DECLARE_TEST(test1);
DECLARE_TEST(test2);
DECLARE_TEST(test3);
// ThisLibrary_some_module.c
#include "ThisLibrary_testlist.h"
DECLARE_TEST(test1)
{
// ... do hood stuff here
}
// ThisLibrary_testarray.c
#include "ThisLibrary_testlist.h"
typedef void (*testfunc_t) (void*);
#define LIST_TEST(TESTNAME)
testfunc_t tests[] =
{
&LIST_TEST(test1),
&LIST_TEST(test2)
};
// now it's an array... you know what to do.
So far this has kept me alive but it's getting kind of ridiculous that I have to basically modify the code in 3 separate locations if I want to update a test.
Not to mention the absolute #ifdef nightmare that comes with conditionally compiled tests.
Is there a better way?
With a bit of scripting magic you could do the following: After compiling your source files (but before linking) you search the object files for symbols that match your test name pattern. See man nm how to obtain symbol names from object files (well, on Unix, that is - no idea about windows, sorry). Based on the list of object names found, you auto-create the file ThisLibrary_testarray.c, putting in all the extern declarations and then the function pointer table. After generation of this file, you compile it and finally link everything.
This way you only have to add new test functions to the source files. No need to maintain the header file ThisLibrary_testlist.h, but you have to make sure the test functions have external linkage, follow the naming pattern - and be sure no other symbol uses the naming pattern :-)

Test embedded code by replacing static symbols at compile time

Background
I'm building a C application for an embedded Cortex M4 TI-RTOS SYS/BIOS target, however this question should apply to all embedded targets where a single binary is loaded onto some microprocessor.
What I want
I'd like to do some in situ regression tests on the target where I just replace a single function with some test function instead. E.g. a GetAdcMeasurement() function would return predefined values from a read-only array instead of doing the actual measurement and returning that value.
This could of course be done with a mess of #ifndefs, but I'd rather keep the production code as untouched as possible.
My attempt
I figure one way to achieve this would be to have duplicate symbol definitions at the linker stage, and then have the linker prioritise the definitions from the test suite (given some #define).
I've looked into using LD_PRELOAD, but that doesn't really seem to apply here (since I'm using only static objects).
Details
I'm using TI Code Composer, with TI-RTOS & SYS/BIOS on the Sitara AM57xx platform, compiling for the M4 remote processor (denoted IPU1).
Here's the path to the compiler and linker
/opt/ti/ccsv7/tools/compiler/ti-cgt-arm_16.9.6.LTS/bin/armcl
One solution could be to have multiple .c files for each module, one the production code and one the test code, and compile and link with one of the two. The globals and function signatures in both .c file must be at least the same (at least: there may be more symbols but not less).
Another solution, building on the previous one, is to have two libraries, one with the production code and one with the test code, and link with one of both. You could ieven link with both lubraries, with the test version first, as linkers often resolve symbols in the order they are encountered.
And, as you said, you could work with a bunch of #ifdefs, which would have the advantage of having just one .c file, but making tyhe code less readable.
I would not go for #ifdefs on the function level, i.e. defining just one function of a .c file for test and keeping the others as is; however, if necessary, it could be away. And if necessary, you could have one .c file (two) for each function, but that would negate the module concept.
I think the first approach would be the cleanest.
One additional approach (apart from Paul Ogilvie's) would be to have your mocking header also create a define which will replace the original function symbol at the pre-processing stage.
I.e. if your mocking header looks like this:
// mock.h
#ifdef MOCKING_ENABLED
adcdata_t GetAdcMeasurement_mocked(void);
stuff_t GetSomeStuff_mocked(void);
#define GetAdcMeasurement GetAdcMeasurement_mocked
#define GetSomeStuff GetSomeStuff_mocked
#endif
Then whenever you include the file, the preprocessor will replace the calls before it even hits the compiler:
#include "mock.h"
void SomeOtherFunc(void)
{
// preprocessor will change this symbol into 'GetAdcMeasurement_mocked'
adcdata_t data = GetAdcMeasurement();
}
The approach might confuse the unsuspected reader of your code, because they won't necessarily realize that you are calling a different function altogether. Nevertheless, I find this approach to have the least impact to the production code (apart from adding the include, obviously).
(This is a quick sum up the discussion in the comments, thanks for answers)
A function can be redefined if it has the weak attribute, see
https://en.wikipedia.org/wiki/Weak_symbol
On GCC that would be the weak attribute, e.g.
int __attribute__((weak)) power2(int x);
and on the armcl (as in my question) that would be the pragma directive
#pragma weak power2
int power2(int x);
Letting the production code consist of partly weak functions will allow a test framework to replace single functions.

Included files, all or nothing?

If I #include a file in C, do I get the entire contents of the file linked in, or just the parts I use?
If it has 10 functions in it, and I only use one of the functions, does the code for the other nine functions get included in my executable? This is especially relevant for me right now as I am working on a microcontroller and memory is precious.
Firstly, header files do not get "linked in". #include is basically a textual copy-paste feature. Everything from your include file gets pasted by preprocessor into the final translation unit, which will later be seamlessly processed by the compiler proper. The compiler proper knows nothing about any header files or #include directives.
Secondly, it means that if in your code you declared or defined some function or variable that you do not use, it is completely irrelevant whether it came from a header file through #include or was written directly in source file. There's absolutely no difference.
Thirdly, the question is: what exactly do you have in your header file that you include? Typically, header files do not define objects and functions, they simply declare them. Declarations do not produce any code, regardless whether you use the function or not. Declarations simply tell the compiler that the code (generated from the function definition) already exists elsewhere. Thus, as long as we are talking about typical header files, #include directives and header files by themselves have no effect on final code size.
Fourthly, if your header file is of some unusual kind that contains function (or object) definitions, then see "firstly" and "secondly" above. The compiler proper can see only one translation unit at a time, for which reason a typical strategy for the compiler proper is to completely discard unused entities with internal linkage (i.e. static objects and functions) and keep all entities with external linkage. Entities with external linkage cannot be discarded by compiler proper, since they might be needed in some other translation unit.
Fifthly, at linking stage linker can see the program in its entirety and, for that reason, can discard unused objects and functions, if it is advanced enough for that (and if you allow linker to do it). Meanwhile, inclusion-exclusion precision of a typical run-of-the-mill linker is limited to a single object file. Each object file is atomic to such linker. This means that if you want to be able to exclude unused functions on per-function basis, you might have to adopt "one function per object file" strategy, i.e. write one and only one function per .c file. Of course, this is only possible when you write your own code. If some third-party library you want to use does not adhere to this convention, then you might not be able to exclude individual functions.
If you #include a file in C, the entire contents of that file are added to your source file and compiled by your compiler. A header file, though, usually only has declarations of functions and no definitions (so no actual code is compiled).
The linker, on the other hand, takes all the functions from all the libraries and compiled source code and merges them into the final output file. At this time, the linker will discard any functions that you aren't using.
So, to answer your question: only the functions you use (and indirectly depend on) will be included in your final program file, and this is independent of what files you #include. Happy hacking!
You have to distinguish between different scenarios:
What does the included header file contain? Declarations of external functions only, or also static function definitions?
How are the implementations of the external functions distributed which are declared in that the header file you include? Are they all implemented in one .c file, or distributed across several .c files?
Regarding point 1: Only by #includeing external declarations, no other code will become part of your object file. And, definitions of static functions that are part of the header file, but which are not referenced by your code, may not become part of your object file - this is an optimization that is fairly common. It depends on your compiler, however.
Regarding point 2: Some linkers can only link whole object files, all or nothing. That means, if all the external functions declared in a header file are implemented in one .c file, and, if your code references at least one of these functions, chances are that you will get the whole object file, including all the other functions you don't use. Some linkers, however, can avoid this and remove unused parts when linking object files.
One brute-force approach to deal with non-optimizing linkers is, to put every external function into a .c file of its own. You will, however, have to find a way to deal with the situation that some of these functions refer to a common static function that is part of the original .c file...
Include simply presents the compiler ultimately with what looks like a single file (and if you do save-temps on GCC you will see that exactly a single file is presented to the actual compiler). It is no more complicated than that. So if you have some function prototypes or defines in your .c file then having them come from an include makes no difference whatsoever; the end result is the same.
If the things you include include code, functions, and not just prototypes, then it is the same as if you had those in the .c file itself. Whether or not those show up in the final binary has to do with whether or not you declared them as global or not using static, and then whether or not you optimized, etc. The same goes for variables and structures and other things.
Not all linkers are the same, but a common way to do it is whatever the compiler left in the object goes into the final binary. But if you take those objects and make a library out of them then some/many(?) linkers don’t suck everything into the binary on the portions that are required to resolve the dependencies.

How can you share an internal set of functions between translation units without them having external linkage?

Let's say you are writing a library and you have a bunch of utility functions you have written just for yourself. Of course, you wouldn't want these functions to have external linkage so that they won't get mixed up by your library users (mostly because you are not going to tell the outside world of their existence)
On the other hand, these functions may be used in different translation units, so you want them to be shared internally.
Let's give an example. You have a library that does some stuff and in different source files you may need to copy_file and create_directory, so you would implement them as utility functions.
To make sure the user of your library doesn't accidentally get a linkage error because of having a function with the same name, I can think of the following solutions:
Terrible way: Copy paste the functions to every file that uses them adding static to their declaration.
Not a good way: Write them as macros. I like macros, but this is just not right here.
Give them such a weird name, that the chances of the user producing the same name would be small enough. This might work, but it makes the code using them very ugly.
What I do currently: Write them as static functions in an internal utils.h file and include that file in the source files.
Now the last option works almost fine, except it has one issue: If you don't use one of the functions, at the very least you get a warning about it (that says function declared static but never used). Call me crazy, but I keep my code warning free.
What I resorted to do was something like this:
utils.h:
...
#ifdef USE_COPY_FILE
static int copy_file(/* args */)
{...}
#endif
#ifdef USE_CREATE_DIR
static int create_dir(/* args */)
{...}
#endif
...
file1.c:
#define USE_COPY_FILE
#define USE_CREATE_DIR
#include "utils.h"
/* use both functions */
file2.c
#define USE_COPY_FILE
#include "utils.h
/* use only copy_file */
The problem with this method however is that it starts to get ugly as more utilities are introduced. Imagine if you have 10 of such functions, you need to have 7~8 lines of define before the include, if you need 7~8 of these functions!
Of course, another way would be to use DONT_USE_* type of macros that exclude functions, but then again you need a lot of defines for a file that uses few of these utility functions.
Either way, it doesn't look elegant.
My question is, how can you have functions that are internal to your own library, used by multiple translation units, and avoid external linkage?
Marking the functions static inline instead of static will make the warnings go away. It will do nothing about the code bloat of your current solution -- you're putting at least one copy of the function into each TU that uses it, and this will still be the case. Oli says in a comment that the linker might be smart enough to merge them. I'm not saying it isn't, but don't count on it :-)
It might even make the bloat worse, by encouraging the compiler to actually inline calls to the functions so that you get multiple copies per TU. But it's unlikely, GCC mostly ignores that aspect of the inline keyword. It inlines calls or not according to its own rules.
That's basically the best you can do portably. There's no way in standard C to define a symbol that's external from the POV of certain TUs (yours), but not from the POV of others (your users'). Standard C doesn't really care what libraries are, or the fact that TUs might be linked in several steps, or the difference between static and dynamic linking. So if you want the functions to be actually shared between your TUs, without any external symbol that could interfere with users of the library, then you need to do something specific to GCC and/or your static library or dll format to remove the symbols once the library is built but before the user links against it.
You can link your library normally, having these functions global, and localize them later.
objcopy can take global symbols and make them local, so they can't be linked with. It can also delete the symbol (the function stays, resolved references to it remain resolved, just the name is gone).
objcopy -L symbol localizes symbol. You can repeat -L multiple times.
objcopy -G symbol keeps symbol global, but localizes all others. You can repeat it also, and it will keep global all those you specified.
And I just found that I'm repeating the answer to this question, which Oli Charlesworth referenced in his comment.

Determining the space used by library functions in C

I have a bunch of code I need to analyse that I don't know how to do. I have a pile of code that here and there are using math functions from a header file I have included called math.h that came with my IDE. I am being asked to see how much space is used to include this. Specifically is the compiler including all of the library functions or just the ones we use. There is no object file being created. So I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes because then the code wont complie and I won't know a size and if I comment out all the lines that use math functions it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
The code in the header file is being complied into the object file of the .c you are using if your header has the definitions of the functions and just being referenced to if they are simply the declarations. The linker will then find a definition for each symbol and place it in your executable if you are using dynamic linking the OS will pull in the definition at run time.

Resources