Hiding non-API symbols in library - c

Suppose I have a library foo which consists of the modules foo and util and has the following source tree:
foo/
foo.c
foo.h
util.c
util.h
The public API of the library is defined in foo.h and all global identifiers are properly prefixed with foo_ or util_. The module util is only used by foo. To prevent name clashes with other modules named util I want to create a (static) library in which only identifiers from module foo are visible. How can I do this?
Edit: I have searched the internet quite extensively but surprisingly this seems to be one of those unsolved problems in computer science.

There are probably other possible approaches, but here's one:
You might consider including the file util.c within foo.c and making all the util functions / globals static. i.e.:
#include "util.c"
// ...
This works the same way as including a *.h file: the preprocessor simply pastes the whole source of util.c into foo.c, so everything declared static in util.c is available there.
When I do this, I rename the file to .inc (i.e. util.c => util.inc)...
#include "util.inc"
// ...
...it's an older convention I picked up somewhere, though it might conflict with assembler files, so you'll have to use your own discretion.
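As a rough sketch of what the compiler ends up seeing with this approach, the listing below shows foo.c with the contents of a hypothetical util.inc pasted in place of the #include line. The names util_clamp and foo_scale are invented for illustration; the point is only that the helper is static, so it has internal linkage and should not show up as a global symbol in foo.o.

```c
/* foo.c as seen after preprocessing: a single translation unit (sketch). */

/* ---- begin contents of the hypothetical util.inc ---- */
/* static gives the helper internal linkage: it is not exported from foo.o. */
static int util_clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}
/* ---- end contents of util.inc ---- */

/* Public API function (hypothetical): the only external symbol defined here. */
int foo_scale(int value)
{
    return util_clamp(value * 2, 0, 100);
}
```

Running nm on the resulting object file is a quick way to check that only the foo_ symbols are global.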
EDIT
Another approach might require linker-specific directives. For example, this SO answer details how to use GNU ld to achieve this goal. There are other approaches as well, listed in that same thread.

The following is GCC-specific.
You can mark each utility function with
__attribute__((visibility ("hidden")))
which will prevent it from being linked to from another shared object.
You can apply this to a series of declarations by surrounding them with
#pragma GCC visibility push(hidden)
/* ... */
#pragma GCC visibility pop
or use -fvisibility=hidden when compiling the object, which applies to declarations without an explicit visibility (i.e. those with neither __attribute__((visibility)) nor #pragma GCC visibility).
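A minimal sketch of the attribute in use, assuming GCC or Clang. FOO_INTERNAL, util_add and foo_sum3 are hypothetical names, not part of the question's code. Keep in mind that hidden visibility takes effect when the objects are linked into a shared object; in a plain static archive the symbol can still be referenced by other objects in the same link.

```c
/* Hidden visibility is a GCC/Clang extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#  define FOO_INTERNAL __attribute__((visibility("hidden")))
#else
#  define FOO_INTERNAL
#endif

/* util.h (hypothetical): helper meant to be used inside the library only. */
FOO_INTERNAL int util_add(int a, int b);

/* util.c: the definition inherits the visibility of the declaration above. */
int util_add(int a, int b)
{
    return a + b;
}

/* foo.c: public API function, keeps the default visibility. */
int foo_sum3(int a, int b, int c)
{
    return util_add(util_add(a, b), c);
}
```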

Before each variable and function declaration in util.h, define a macro constant which renames the declared identifier by adding the library prefix foo_, for instance
#define util_x foo_util_x
extern int util_x;
#define util_f foo_util_f
void util_f(void);
...
With these definitions in place, no other parts of the code need to be changed, and all global symbols in the object file util.o will be prefixed with foo_. This means that name collisions are less likely to occur.
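To make the effect concrete, here is a small single-file sketch of the renaming. The function util_get is a hypothetical stand-in for the library's real helpers. The code is written with the short names throughout, but the symbols the compiler emits are foo_util_x and foo_util_get.

```c
/* util.h (sketch): rename each global before declaring it. */
#define util_x foo_util_x
extern int util_x;

#define util_get foo_util_get
int util_get(void);

/* util.c: written with the short names, compiled to the prefixed ones. */
int util_x = 7;

int util_get(void)
{
    return util_x + 1;
}
```

Any file that includes util.h keeps calling util_get(); a file that does not include it would have to use the prefixed name foo_util_get().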

Related

Keep functions private into a lib in C

I recently had to face a fairly complex issue regarding lib management, but I would be very surprised to be the first one.
Let's imagine you are creating a library (static or dynamic) called lib1 in C. Inside lib1 are a few functions that are exposed through an API, and a few other ones which remain private.
Ideally, the private functions would be static. Unfortunately, let's assume one of the source files, called extmod.c, comes from another project, and it would be beneficial to keep it unmodified. Therefore, it becomes impractical to make the functions it defines static.
As a consequence, all the functions defined in extmod are present in lib1's ABI, but not in its API, since the relevant *.h is not shipped. So no one notices.
Unfortunately, at later stage, someone wants to link both lib1 and another lib2 which also includes extmod. It results in a linking error, due to duplicate definitions.
In C++, the answer to this problem would be a simple namespace. In C, we are less lucky.
There are a few solutions to this problem, but I would like to probe if anyone believes to have found an efficient and non-invasive way. By "non-invasive", I mean a method which avoids if possible to modify extmod.c.
Among the possible workarounds, there is the possibility of changing all definitions in extmod.c to use a different prefix, effectively emulating a namespace. Or the possibility of putting the content of extmod.c into extmod.h and making everything static. Both methods extensively modify extmod, though ...
Note that I've looked at this previous answer, but it doesn't address this specific concern.
You could implement your 'different prefix' solution by excluding extmod.c from your build and instead treating it, in a way, as a header file. Use the C preprocessor to effectively modify the file without actually modifying it. For example, if extmod.c contains:
void print_hello()
{
printf("hello!");
}
Exclude this file from your build and add one called ns_extmod.c. The content of this file should look like this:
#define print_hello ns_print_hello
#include "extmod.c"
On compilation, print_hello will be renamed to ns_print_hello by the C pre-processor but the original file will remain intact.
Alternatively, IF AND ONLY IF the functions are not called internally by extmod.c, it might work to use the preprocessor to make them static in the same way:
#define print_hello static print_hello
#include "extmod.c"
This should work for you assuming you have control over the build process.
One way you can do prefixing without actually editing extmod.c is as follows:
Create a new header file extmod_prefix.h as:
#ifndef EXTMOD_PREFIX_H
#define EXTMOD_PREFIX_H
#ifdef LIB1
#define PREFIX lib1_
#else
#ifdef LIB2
#define PREFIX lib2_
#endif
#endif
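/* ## pastes its operands before they are macro-expanded, so go through a
   two-level helper to get PREFIX expanded first. */
#define EXTMOD_CAT_(a, b) a##b
#define EXTMOD_CAT(a, b) EXTMOD_CAT_(a, b)
#define function_in_extmod EXTMOD_CAT(PREFIX, function_in_extmod)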
/* Do this for all the functions in extmod.c */
#endif
Include this file in extmod.h and define LIB1 in lib1's build process and LIB2 in lib2.
This way, all the functions in extmod.c will be prefixed by lib1_ in lib1 and lib2_ in lib2.
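One detail is easy to get wrong with this scheme: the ## operator pastes its operands before they are macro-expanded, so PREFIX has to go through a two-level helper macro to be replaced by lib1_ or lib2_. The following single-file sketch shows that arrangement. EXTMOD_CAT and EXTMOD_CAT_ are hypothetical helper names, and LIB1 is defined in the file only for the demonstration; normally the build would pass it (for example with -DLIB1).

```c
/* Normally supplied by the build system; defined here for the sketch only. */
#ifndef LIB1
#define LIB1
#endif

#ifdef LIB1
#define PREFIX lib1_
#elif defined(LIB2)
#define PREFIX lib2_
#endif

/* Two-level helper: arguments are macro-expanded before ## pastes them. */
#define EXTMOD_CAT_(a, b) a##b
#define EXTMOD_CAT(a, b)  EXTMOD_CAT_(a, b)

#define function_in_extmod EXTMOD_CAT(PREFIX, function_in_extmod)

/* extmod.c, unmodified: this now defines lib1_function_in_extmod. */
int function_in_extmod(int x)
{
    return x + 1;
}
```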
Here's the answer (in the form of a question). The relevant portion:
objcopy --prefix-symbols allows me to prefix all symbols exported by
an object file / static library.

Using enums from statically linked libraries

I've seen that enums do not get exported from libraries in gcc. That is, if I have enum foo in lib1.c and use it to build lib1.a, I cannot use enum foo in myprog.c, which links against the library.
As such, does that mean that if I want to use enum foo, I have to redefine it in myprog.c? Also, is there any way to export the enums for a library so that my program can make use of them?
This is "normal behavior". Enums are compile-time constants, not variables that are put in a binary or exported.
Typically, when using a library, you would include a header file with the definitions of the functions that you will use and the enumerations used in/with this library.
This is what you do:
Create one (or more) header files that contains the declaration for lib1.c that you want other code to be able to use:
lib1.h:
#ifndef LIB1_H_
#define LIB1_H_
enum Foo {
Bar =1
};
void do_something(enum Foo a);
#endif
In the lib1.c source code, you include this header file, use the enum you defined, and implement the do_something() function.
Build lib1.c to produce your library, lib1.a
Anyone who wants to use your lib1.a needs two things:
The library, lib1.a
The lib1.h header file for the library.
Source code that needs functionality from lib1.a includes the same header file, lib1.h, where the enum, functions, and other things are declared, and is linked against lib1.a.
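The arrangement described above can be sketched in one listing, with comments marking where each part would live. The enumerator Baz and the functions lib1_weight and myprog_total are hypothetical additions for illustration. The enum values reach myprog.c through the header at compile time; nothing about them is looked up in lib1.a.

```c
/* lib1.h (sketch): shared by the library and by every program that uses it. */
#ifndef LIB1_H_
#define LIB1_H_

enum Foo {
    Bar = 1,
    Baz = 2   /* hypothetical second enumerator */
};

/* Hypothetical API function that takes the enum. */
int lib1_weight(enum Foo kind);

#endif /* LIB1_H_ */

/* lib1.c: includes lib1.h and implements the function. */
int lib1_weight(enum Foo kind)
{
    switch (kind) {
    case Bar: return 10;
    case Baz: return 20;
    }
    return 0; /* value outside the enumeration */
}

/* myprog.c: includes lib1.h too, so Bar and Baz are known at compile time. */
int myprog_total(void)
{
    return lib1_weight(Bar) + lib1_weight(Baz);
}
```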

Function prototype in header file doesn't match definition, how to catch this?

(I found this question which is similar but not a duplicate:
How to check validity of header file in C programming language )
I have a function implementation, and a non-matching prototype (same name, different types) which is in a header file. The header file is included by a C file that uses the function, but is not included in the file that defines the function.
Here is a minimal test case :
header.h:
void foo(int bar);
File1.c:
#include "header.h"
int main (int argc, char * argv[])
{
int x = 1;
foo(x);
return 0;
}
File2.c:
#include <stdio.h>
typedef struct {
int x;
int y;
} t_struct;
void foo (t_struct *p_bar)
{
printf("%x %x\n", p_bar->x, p_bar->y);
}
I can compile this with VS 2010 with no errors or warnings, but unsurprisingly it segfaults when I run it.
The compiler is fine with it (this I understand)
The linker did not catch it (this I was slightly surprised by)
The static analysis tool (Coverity) did not catch it (this I was very surprised by).
How can I catch these kinds of errors?
[Edit: I realise if I #include "header.h" in file2.c as well, the compiler will complain. But I have an enormous code base and it is not always possible or appropriate to guarantee that all headers where a function is prototyped are included in the implementation files.]
Have the same header file included in both file1.c and file2.c. This will pretty much prevent a conflicting prototype.
Otherwise, such a mistake cannot be detected by the compiler because the source code of the function is not visible to the compiler when it compiles file1.c. Rather, it can only trust the signature that has been given.
At least theoretically, the linker could be able to detect such a mismatch if additional metadata is stored in the object files, but I am not aware if this is practically possible.
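A sketch of the test case arranged that way, shown as one listing with comments marking the file boundaries. It is a variant rather than the original code: foo returns the sum of the two fields instead of printing them, and file1_run stands in for main. Because the file defining foo also sees the prototype from header.h, a mismatching definition would be rejected at compile time as a conflicting declaration.

```c
/* header.h (sketch): the single prototype, included by both .c files. */
#ifndef HEADER_H
#define HEADER_H

typedef struct {
    int x;
    int y;
} t_struct;

int foo(const t_struct *p_bar);

#endif /* HEADER_H */

/* File2.c: includes header.h, so this definition is checked against the
   prototype above. */
int foo(const t_struct *p_bar)
{
    return p_bar->x + p_bar->y;
}

/* File1.c: includes header.h and calls foo through the same prototype. */
int file1_run(void)
{
    t_struct s = { 1, 2 };
    return foo(&s);
}
```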
Use -Werror-implicit-function-declaration, -Wmissing-prototypes, or the equivalent on one of your supported compilers. The compiler will then either error or warn if a declaration does not precede the definition of a global function.
Compiling the programs in some form of strict C99 mode should also generate these messages. GCC, ICC, and Clang all support this feature (not sure about MS's C compiler and its current status, as VS 2005 or 2008 was the latest I've used for C).
You may use the Frama-C static analysis platform available at http://frama-c.com.
On your examples you would get:
$ frama-c 1.c 2.c
[kernel] preprocessing with "gcc -C -E -I. 1.c"
[kernel] preprocessing with "gcc -C -E -I. 2.c"
[kernel] user error: Incompatible declaration for foo:
different type constructors: int vs. t_struct *
First declaration was at header.h:1
Current declaration is at 2.c:8
[kernel] Frama-C aborted: invalid user input.
Hope this helps!
It looks like this is not possible with a C compiler because of the way function names are mapped to symbol names in object files (directly, without considering the actual signature).
But it is possible with C++, because it uses name mangling that depends on the function signature. So in C++, void foo(int) and void foo(t_struct*) will have different names at link time and the linker will raise an error.
Of course, it will not be easy to switch a huge C codebase to C++. But you can use a relatively simple workaround: add a single .cpp file to your project and include all the C files in it (in practice, generate it with a script).
Taking your example and VS2010 I added TestCpp.cpp to project:
#include "stdafx.h"
namespace xxx
{
#include "File1.c"
#include "File2.c"
}
Result is linker error LNK2019:
TestCpp.obj : error LNK2019: unresolved external symbol "void __cdecl xxx::foo(int)" (?foo@xxx@@YAXH@Z) referenced in function "int __cdecl xxx::main(int,char * * const)" (?main@xxx@@YAHHQAPAD@Z)
W:\TestProjects\GenericTest\Debug\GenericTest.exe : fatal error LNK1120: 1 unresolved externals
Of course, this will not be so easy for a huge codebase; there may be other problems leading to compilation errors that cannot be fixed without changing the codebase. You can partially mitigate this by guarding the .cpp file's contents with a conditional #ifdef and using it only for periodic checks rather than for regular builds.
Every (non-static) function defined in every foo.c file should have a prototype in the corresponding foo.h file, and foo.c should have #include "foo.h". (main is the only exception.) foo.h should not contain prototypes for any functions not defined in foo.c.
Every function should be prototyped exactly once.
You can have .h files with no corresponding .c files if they don't contain any prototypes. The only .c file without a corresponding .h file should be the one containing main.
You already know this, and your problem is that you have a huge code base where this rule has not been followed.
So how do you get from here to there? Here's how I'd probably do it.
Step 1 (requires a single pass over your code base):
For each file foo.c, create a file foo.h if it doesn't already exist. Add #include "foo.h" near the top of foo.c. If you have a convention for where .h and .c files should live (either in the same directory or in parallel include and src directories), follow it; if not, try to introduce such a convention.
For each function in foo.c, copy its prototype to foo.h if it's not already there. Use copy-and-paste to ensure that everything stays consistent. (Parameter names are optional in prototypes and mandatory in definitions; I suggest keeping the names in both places.)
Do a full build and fix any problems that show up.
This won't catch all your problems. You could still have multiple prototypes for some functions. But you'll have caught any cases where two headers have inconsistent prototypes for the same function and both headers are included in the same translation unit.
Once everything builds cleanly, you should have a system that's at least as correct as what you started with.
Step 2:
For each file foo.h, delete any prototypes for functions that aren't defined in foo.c.
Do a full build and fix any problems that show up. If bar.c calls a function that's defined in foo.c, then bar.c needs a #include "foo.h".
For both of these steps, the "fix any problems that show up" phase is likely to be long and tedious.
If you can't afford to do all this at once, you can probably do a lot of it incrementally. Start with one or a few .c files, clean up their .h files, and remove any extra prototypes declared elsewhere.
Any time you find a case where a call uses an incorrect prototype, try to figure out the circumstances in which that call is executed, and how it causes your application to misbehave. Create a bug report and add a test to your regression test suite (you have one, right?). You can demonstrate to management that the test now passes because of all the work you've done; you really weren't just messing around.
Automated tools that can parse C are likely to be useful. Ira Baxter has some suggestions. ctags may also be useful. Depending on how your code is formatted, you can probably throw together some tools that don't require a full C parser. For example, you might use grep, sed, or perl to extract a list of function definitions from a foo.c file, then manually edit the list to remove false positives.
It's obvious ("I have a huge code base") that you cannot do this by hand.
What you need is an automated tool that can read your source files as the compiler sees them, collect all function prototypes and definitions, and verify that all definitions/prototypes match. I doubt you'll find such a tool lying around.
Of course, this match must check the signature, and this requires something like the compiler's front end to compare the signatures.
Consider
typedef int T;
void foo(T x);
in one compilation unit, and
typedef float T;
void foo(T x);
in another. You can't just compare the signature "lines" for equality; you need something that can resolve the types when checking.
GCCXML may be able to help, if you are using a GCC dialect of C; it extracts top-level declarations from source files as XML chunks. I don't know if it will resolve typedefs, though. You obviously have to build (considerable) support to collect the definitions in a central place (a database) and compare them. Comparing XML documents for equivalents is at least reasonably straightforward, and pretty easy if they are formatted in a regular way. This is likely your easiest bet.
If that doesn't work, you need something that has a full C front end that you can customize. GCC is famously available, and famously hard to customize. Clang is available, and might be pressed into service for this, but AFAIK only works with GCC dialects.
Our DMS Software Reengineering Toolkit has C front ends (with full preprocessing capability) for many dialects of C (GCC, MS, GreenHills, ...) and builds symbol tables with complete type information. Using DMS you might be able (depending on the real scale of your application) to simply process all the compilation units, and build just the symbol tables for each compilation unit. Checking that symbol table entries "match" (are compatible according to compiler rules including using equivalent typedefs) is built-into the C front ends; all one needs to do is orchestrate the reading, and calling the match logic for all symbol table entries at global scope across the various compilation units.
Whether you do this with GCC/Clang/DMS, it is a fair amount of work to cobble together a custom tool. So you have to decide how critical your need for fewer surprises is, compared to the energy required to build such a custom tool.

How to declare an inline function in C99 multi-file project?

I want to define an inline function in a project, compiled with c99. How can I do it?
When I declare the function in a header file and give the detail in a .c file, the definition isn't recognized by other files. When I put the explicit function in a header file, I have a problem because all .o files who use it have a copy of the definition, so the linker gives me a "multiple definition" error.
What I am trying to do is something like:
header.h
inline void func()
{
do things...
}
lib1.c
#include "header.h"
...
lib2.c
#include "header.h"
with a utility which uses both lib1.o and lib2.o
Unfortunately, not all compilers fully comply with C99 on this point, even if they claim to.
A conforming way to do this is
// header file. an inline definition alone is
// not supposed to generate an external symbol
inline void toto(void) {
// do something
}
// in one .c file, force the creation of an
// external symbol
extern inline void toto(void);
Newer versions of gcc, for example, will work fine with that.
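The same pattern as a compilable sketch, assuming a compiler that follows the C99 inline model. Here toto takes and returns an int purely so that the result can be checked, and call_toto is a hypothetical caller. In a real project the first part would live in the header and the extern declaration in exactly one .c file.

```c
/* header.h (sketch): an inline definition; by itself it emits no external
   symbol. */
inline int toto(int x)
{
    return x + 1;
}

/* toto.c (exactly one .c file): this declaration makes the compiler emit
   the external definition that other translation units link against. */
extern inline int toto(int x);

/* User code in any file that includes header.h. */
int call_toto(int x)
{
    return toto(x);
}
```

Without the extern declaration, a call that the compiler chooses not to inline would refer to an external toto that no object file provides, which typically surfaces as an undefined reference at link time.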
You may get away with it for other compilers (pretenders) by defining something like
#ifdef PRETENDER
# define inlDec static
# define inlIns static
#else
# define inlDec
# define inlIns extern
#endif
// header file. an inline declaration alone is
// not supposed to generate an external symbol
inlDec inline void toto(void) {
// do something
}
// in one .c file, force the creation of an
// external symbol
inlIns inline void toto(void);
Edit:
Compilers with C99 support (usually via the option -std=c99) that I know of:
gcc (versions >= 4.3 IIRC) implements the correct inline model
pcc is also correct
gcc < 4.3 needs a special option to implement the correct model; otherwise it uses its own model, which results in multiply defined symbols if you are not careful
icc just emits symbols in every unit if you don't take special care. But these symbols are "weak" symbols, so they don't generate a conflict. They just blow up your code.
opencc, AFAIR, follows the old gcc-specific model
clang doesn't emit symbols for inline functions at all, unless you have an extern declaration and you use the function pointer in one compilation unit.
tcc just ignores the inline keyword
If used by itself, in C99 inline requires that the function be defined in the same translation unit as it's being used (so, if you use it in lib1.c, it must be defined in lib1.c).
You can also declare a function as static inline (and put the definition in a header file shared between two source files). This avoids the multiple-definition issue, and lets the compiler inline the function in all the translation units where it's used (which it may or may not be able to do if you just define the function in one translation unit).
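A minimal sketch of the static inline variant, shown as one listing with comments marking the file boundaries. The function names are hypothetical. Each .c file that includes the header gets its own private copy of func_square, so linking lib1.o and lib2.o together produces no duplicate symbol.

```c
/* header.h (sketch) */
#ifndef HEADER_H
#define HEADER_H

/* static inline: internal linkage, so every translation unit that includes
   this header has its own copy and the linker never sees a clash. */
static inline int func_square(int x)
{
    return x * x;
}

#endif /* HEADER_H */

/* lib1.c: includes header.h */
int lib1_area(int side)
{
    return func_square(side);
}

/* lib2.c: includes header.h */
int lib2_sum_of_squares(int a, int b)
{
    return func_square(a) + func_square(b);
}
```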
See: http://www.greenend.org.uk/rjk/2003/03/inline.html
I think you don't need to use the inline word when you are defining and declaring the function inside the Header file, the compiler usually takes it as inline by default unless it's too long, in which case it will be smart enough to treat it as a normal function.
I think the multiple definition may be caused by the lack of a Include Guard in the Header file.
You should use something like this in the header:
#ifndef HEADERNAME_H
#define HEADERNAME_H
void func()
{
// do things...
}
#endif

How to structure #includes in C

Say I have a C program which is broken to a set of *.c and *.h files. If code from one file uses functions from another file, where should I include the header file? Inside the *.c file that used the function, or inside the header of that file?
E.g. file foo.c includes foo.h, which contains all declarations for foo.c; same for bar.c and bar.h. Function foo1() inside foo.c calls bar1(), which is declared in bar.h and defined in bar.c. Now the question is, should I include bar.h inside foo.h, or inside foo.c?
What would be a good set of rules-of-thumb for such issues?
You should include bar.h inside foo.c. This way other c files that include foo.h won't carry bar.h unnecessarily. This is my advice for including header files:
Add include directives in the c files - this way the file dependencies are more obvious when reading the code.
Split the foo.h in two separate files, say foo_int.h and foo.h. The first one carries the type and forward declarations needed only by foo.c. The foo.h includes the functions and types needed by external modules. This is something like the private and public section of foo.
Avoid cross references, i.e. foo references bar and bar references foo. This may cause linking problems and is also a sign of bad design.
As others have noted, a header foo.h should declare the information necessary to be able to use the facilities provided by a source file foo.c. This would include the types, enumerations and functions provided by foo.c. (You don't use global variables, do you? If you do, then those are declared in foo.h too.)
The header foo.h should be self-contained and idempotent. Self-contained means that any user can include foo.h and not need to worry about which other headers may be needed (because foo.h includes those headers). Idempotent means that if the header is included more than once, there is no damage done. That is achieved by the classic technique:
#ifndef FOO_H_INCLUDED
#define FOO_H_INCLUDED
...rest of the contents of foo.h...
#endif /* FOO_H_INCLUDED */
The question asked:
File foo.c includes foo.h, which contains all declarations for foo.c; same for bar.c and bar.h. Function foo1() inside foo.c calls bar1(), which is declared in bar.h and defined in bar.c. Now the question is, should I include bar.h inside foo.h, or inside foo.c?
It will depend on whether the services provided by foo.h depend on bar.h or not. If other files using foo.h will need one of the types or enumerations defined by bar.h in order to use the functionality of foo.h, then foo.h should ensure that bar.h is included (by including it). However, if the services of bar.h are only used in foo.c and are not needed by those who use foo.h, then foo.h should not include bar.h
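As an illustration of that rule, here is a sketch in which the interface of foo.h itself needs a type from another header (size_t from <stddef.h>), so foo.h includes it, while a header needed only by the implementation (<string.h>) stays in foo.c. The function foo_count is hypothetical.

```c
/* foo.h (sketch): self-contained and guarded against multiple inclusion. */
#ifndef FOO_H_INCLUDED
#define FOO_H_INCLUDED

#include <stddef.h>   /* the interface itself uses size_t */

/* Counts how many times wanted occurs in text. */
size_t foo_count(const char *text, char wanted);

#endif /* FOO_H_INCLUDED */

/* foo.c: headers needed only by the implementation are included here. */
#include <string.h>

size_t foo_count(const char *text, char wanted)
{
    size_t n = 0;
    size_t len = strlen(text);

    for (size_t i = 0; i < len; i++) {
        if (text[i] == wanted) {
            n++;
        }
    }
    return n;
}
```

A caller only needs to include foo.h; it never has to know that size_t comes from <stddef.h> or that the implementation uses <string.h>.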
I would only include header files in a *.h file that are required by the header file itself. Header files that are needed for a source file, in my opinion, should be included in the source file so that the dependencies are obvious from the source. Header files should be built to handle multiple inclusion, so you could put the include in both if required for clarity.
Using the examples of foo.c and foo.h I've found these guidelines helpful:
Remember that the purpose of foo.h is to facilitate the use of foo.c, so keep it as simple, organized, and self-explanatory as possible. Be liberal with comments that explain how and when to use the features of foo.c -- and when not to use them.
foo.h declares the public features of foo.c: functions, macros, typedefs, and (shudder) global variables.
foo.c should #include "foo.h" -- see discussion, and also Jonathan Leffler's comment below.
If foo.c requires additional headers for it to compile, include them in foo.c.
If external headers are required for foo.h to compile, include them in foo.h
Leverage the preprocessor to prevent foo.h from being included more than once. (See below.)
If for some reason an external header will be required in order for another .c file to use the features in foo.c, include the header in foo.h to save the next developer from unnecessary debugging. If you're averse to this, consider adding a macro that will display instructions at compile time if the required headers haven't been included.
Don't include a .c file within another .c file unless you have a very good reason and document it clearly.
As kgiannakakis noted, it's helpful to separate the public interface from the definitions and declarations needed only within foo.c itself. But rather than creating two files, it's sometimes better to let the preprocessor do this for you:
// foo.c
#define _foo_c_ // Tell foo.h it's being included from foo.c
#include "foo.h"
. . .
// foo.h
#if !defined(_foo_h_) // Prevent multiple inclusion
#define _foo_h_
// This section is used only internally to foo.c
#ifdef _foo_c_
. . .
#endif
// Public interface continues to end of file.
#endif // _foo_h_ // Last-ish line in foo.h
I include the most minimal set of headers possible in the .h file, and include the rest in the .c file. This has the benefit of sometimes reducing compilation times. Given your example, if foo.h doesn't really need bar.h but includes it anyway, and some other file includes foo.h, then that file will be recompiled if bar.h changes, even though it may not actually need or use bar.h.
The .h file should define the public interface (aka the api) to the functions in the .c file.
If the interface of file1 uses the interface of file2 then #include file2.h in file1.h
If the implementation in file1.c makes use of stuff in file2.c then file1.c should #include file2.h.
I must admit though that - because I always #include file1.h in file1.c - I normally wouldn't bother #including file2.h directly in file1.c if it was already #included in file1.h
If you ever find yourself in the situation where two .c files #include each other's .h files, then it is a sign that modularity has broken down and you ought to think about restructuring things a bit.

Resources