Protect against linking multiple versions of a shared library - linker

When you use semantic versioning, a switch from libfoo.so.23 to libfoo.so.24 means that libfoo's API changed in a backwards-incompatible way.
Thus, one must not link against both libfoo.so.23 and libfoo.so.24.
Example:
Status quo:
Executable myexe links against libfoo.so.23 and libbar.so.1
libbar.so.1 itself also links against libfoo.so.23
Erroneous change:
libbar.so.1 upgrades to libfoo.so.24 without incrementing its major version number
As a result, when myexe is invoked, the dynamic linker now loads both libfoo.so.23 and libfoo.so.24 (cf. ldd output), so one set of functions ends up using backwards-incompatible symbols from libfoo.
How to protect against such an error?
That means: how do I tell the linker to error out in such a situation (at build-time and runtime)?
On Linux, the linker just warns about this at build time:
/usr/bin/ld: warning: libfoo.so.24, needed by libbar.so.1,
may conflict with libfoo.so.23
Is it possible to turn this into a link error?
(This warning doesn't show up at runtime.)
Other linkers on other Unices don't even warn about this at build-time.
It's possible to detect such a library version mismatch at runtime with a constructor function that is executed during library loading. For example:
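#include <stdio.h>
#include <stdlib.h>

#define FOO_VERSION 23

int foo_version(void) { return FOO_VERSION; }

static void cons(void) __attribute__((constructor));
static void cons(void)
{
    fprintf(stderr, "constructing foo (%d)\n", foo_version());
    /* foo_version() is an exported, interposable symbol (default -fPIC
       build, no -Bsymbolic): if another libfoo was loaded first, this call
       resolves to that copy's definition, so the two numbers differ. */
    if (foo_version() != FOO_VERSION) {
        fprintf(stderr, "foo.c: FOO_VERSION %d != foo_version() %d\n",
                FOO_VERSION, foo_version());
        exit(1);
    }
}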
But I would prefer a linker error at build-time/runtime. Ideally something that also shows up in ldd (as an error).

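Is it possible to turn this into a link error?
GNU ld and Gold have the following options. When linking through gcc, pass them as -Wl,--fatal-warnings; note that this makes every linker warning fatal, not only the version-conflict one: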
--fatal-warnings
--no-fatal-warnings
Treat all warnings as errors. The default behaviour can be
restored with the option --no-fatal-warnings.
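--fatal-warnings only covers the build-time side. For the runtime side, the question's constructor idea can be made independent of symbol interposition. Below is a minimal sketch, assuming Linux with glibc (dl_iterate_phdr() is a GNU/BSD extension, not POSIX), that counts the loaded objects whose path contains "libfoo.so." and exits if there is more than one. The helper names count_loaded_objects() and libfoo_check_single_copy() are hypothetical, and this is still a runtime check, not a linker error, so it will not show up in ldd.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* dl_iterate_phdr() is a GNU extension */
#endif
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Callback state: the substring to look for and how often it matched. */
struct stem_count {
    const char *stem;
    int count;
};

static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct stem_count *sc = data;
    (void)size;
    /* The main program is reported with an empty name. */
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, sc->stem) != NULL)
        sc->count++;
    return 0;  /* 0 = keep iterating over all loaded objects */
}

/* Hypothetical helper: number of loaded objects whose path contains stem. */
int count_loaded_objects(const char *stem)
{
    struct stem_count sc = { stem, 0 };
    dl_iterate_phdr(count_cb, &sc);
    return sc.count;
}

/* Hypothetical check meant to live inside libfoo. */
void libfoo_check_single_copy(void)
{
    int n = count_loaded_objects("libfoo.so.");
    if (n > 1) {
        fprintf(stderr, "libfoo: %d versions of libfoo are loaded\n", n);
        exit(1);
    }
}
```

Inside libfoo you would mark libfoo_check_single_copy() with __attribute__((constructor)), as in the question's example. Matching on a path substring is a heuristic: any other object whose path happens to contain "libfoo.so." would be counted too.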


Optional dynamic library

Background
Trying to profile an executable, I experimented with the Intel VTune profiler and learned that there is an API library (ITT) that provides utilities to start/stop profiling. Its basic functions are __itt_resume() and __itt_pause(). What intrigues me is that the library is optional, i.e. if the ITT runtime library is not loaded, these functions are basically no-ops.
Optional library?
I want to know (first of all on Linux)
Does a process check that the dynamic library it links to is loaded when it starts, or only when each symbol (or the first symbol) of the library is called at runtime (i.e. lazy initialization)? I think on Windows it's at startup because of the "can't find XXX.dll" messages, but I am not sure about Linux. Also, with the example, I don't get any compilation or execution issues even though the symbol is not defined in some_process.c.
How to implement this on Linux? Looking at the GitHub repo of ITT, amid a lot of macro trickery, I feel like the key is here:
#define ITTNOTIFY_VOID(n) (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)
Basically, it wraps every function call in a check: the call goes through the function pointer only if that pointer is not NULL.
How to implement this in a cross-platform way (Windows, Mac, Linux)?
I ended up with a minimal example that looks like the code linked here, but it does not work as it should. In the linked version, my_api_hello_impl() is not called as it should be. Also, there is no crash when checking the value of the extern symbol api_hello_ptr when the library is not linked.
my_api.c
#include "my_api.h"
#include <stdio.h>
void(*api_hello_ptr)();
void api_hello_impl()
{
    printf("Hello\n");
}
__attribute__((constructor))
static void init()
{
    printf("linked\n");
    api_hello_ptr = api_hello_impl;
}
my_api.h
#pragma once
extern void(*api_hello_ptr)();
inline void api_hello() { if(api_hello_ptr) api_hello_ptr(); }
some_process.c
#include "my_api.h"
int main()
{
    // no-op if not linked at runtime
    api_hello();
}
Makefile
# my_api is not linked to some_process
some_process: some_process.c my_api.h
	$(CC) -o $@ $<
my_api.so: my_api.c my_api.h
	$(CC) -shared -fPIC -o $@ $<
test_linked: some_process my_api.so
	LD_PRELOAD="$(shell pwd)/my_api.so" ./some_process
test_unlinked: some_process my_api.so
	./some_process
.PHONY: test_linked test_unlinked
Output:
$ make test_linked
LD_PRELOAD="/tmp/tmp.EkrQbILrNg/my_api.so" ./some_process
linked
$ make test_unlinked
./some_process
Does a process check that the dynamic library it links to is loaded when it starts
Yes, it does. If a dynamic library is linked, it is a runtime requirement: the system loader will not start executing the program without finding and loading the library first. There are mechanisms for delayed loading, but they are not the norm on Linux; they are done manually or with custom libraries. By default, all dynamically linked objects must be loaded before execution starts. Only the binding of individual function symbols may be deferred until their first call (lazy binding through the PLT, unless LD_BIND_NOW is set or the program was linked with -z now).
Note: I'm assuming we are talking about ELF executables here since we are on Linux.
How to implement this on Linux?
You can do it using macros or wrapper functions plus libdl (link with -ldl; since glibc 2.34 these functions are part of libc itself), with dlopen() and dlsym(). In each of those wrappers, the first thing you do is check whether the library was already loaded and, if not, load it. Then find and call the needed symbol.
Something like this:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
static void *libfoo_handle = NULL;
static int (*libfoo_func_a)(int, int);
// Not thread-safe; use pthread_once() if wrappers may run concurrently.
static void load_libfoo_if_needed(void) {
    if (!libfoo_handle) {
        // Without "/" in the path, this will look in all standard system
        // dynamic library directories (and LD_LIBRARY_PATH).
        libfoo_handle = dlopen("libfoo.so", RTLD_LAZY | RTLD_GLOBAL);
        if (!libfoo_handle) {
            // dlopen() reports errors through dlerror(), not errno.
            fprintf(stderr, "failed to load libfoo.so: %s\n", dlerror());
            exit(1);
        }
        // Optionally use dlsym() here to initialize a set of global
        // function pointers, so that you don't have to do it later.
        void *tmp = dlsym(libfoo_handle, "func_a");
        if (!tmp) {
            fprintf(stderr, "no symbol func_a in libfoo.so: %s\n", dlerror());
            exit(1);
        }
        // POSIX-sanctioned way to store a void * into a function pointer.
        *((void**)&libfoo_func_a) = tmp;
    }
}
int wrapper_libfoo_func_a(int a, int b) {
    load_libfoo_if_needed();
    return libfoo_func_a(a, b);
}
// And so on for every function you need. You could use macros as well.
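The wrapper above exits if libfoo.so is missing. For the ITT-style behaviour the question asks about, where calls silently become no-ops when the library is absent, the same dlopen()/dlsym() idea can simply leave the function pointer NULL. This is a minimal sketch, not how ITT itself is implemented: it reuses the my_api.so and api_hello_impl names from the question, assumes the library can be found by the normal dlopen() search (e.g. via LD_LIBRARY_PATH), and is not thread-safe.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Library and symbol names taken from the question's example. */
#define MY_API_LIB "my_api.so"
#define MY_API_SYM "api_hello_impl"

static void (*api_hello_ptr)(void) = NULL;  /* NULL = not available */
static int api_load_attempted = 0;

/* Try to load the optional library once; failure is not an error. */
static void api_try_load(void)
{
    if (api_load_attempted)
        return;
    api_load_attempted = 1;
    void *h = dlopen(MY_API_LIB, RTLD_NOW | RTLD_LOCAL);
    if (h == NULL)
        return;  /* library absent: every call stays a no-op */
    void *sym = dlsym(h, MY_API_SYM);
    if (sym != NULL)
        *(void **)&api_hello_ptr = sym;
}

/* Returns 1 if the real implementation ran, 0 if the call was a no-op. */
int api_hello(void)
{
    api_try_load();
    if (api_hello_ptr == NULL)
        return 0;
    api_hello_ptr();
    return 1;
}
```

With no my_api.so on the search path, api_hello() does nothing and returns 0; the executable itself never needs to be linked against the library.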
How to implement this in a cross-platform way (Windows, Mac, Linux)?
For macOS, you should have dlopen() and dlsym() just like in Linux.
Not sure exactly how to do this on Windows, but I know there is LoadLibrary() available in different flavors (e.g. one, two, etc.), which should be more or less the equivalent of dlopen(), and GetProcAddress(), which should be the equivalent of dlsym().
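A small portability layer over the two APIs could look like the following sketch. The lib_open()/lib_sym()/lib_close() names are hypothetical; LoadLibraryA(), GetProcAddress() and FreeLibrary() are the Win32 calls, and dlopen()/dlsym()/dlclose() cover Linux and macOS. Error reporting differs between the platforms (GetLastError() vs dlerror()) and is left out here.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;

lib_handle lib_open(const char *name) { return LoadLibraryA(name); }
void *lib_sym(lib_handle h, const char *sym)
{
    return (void *)GetProcAddress(h, sym);  /* FARPROC converted to void * */
}
void lib_close(lib_handle h) { FreeLibrary(h); }
#else
#include <dlfcn.h>  /* Linux and macOS */
typedef void *lib_handle;

lib_handle lib_open(const char *name) { return dlopen(name, RTLD_NOW | RTLD_LOCAL); }
void *lib_sym(lib_handle h, const char *sym) { return dlsym(h, sym); }
void lib_close(lib_handle h) { dlclose(h); }
#endif
```

Both lib_open() variants return NULL when the library cannot be found, so the NULL-pointer checks used in the Linux wrapper work unchanged.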
See also: Loading a library dynamically in Linux or OSX?

Compilation fails with #include "..." but not with #include <...>

I'm currently toying around with the C library NanoVG. The library depends on OpenGL functions and has two header files, nanovg.h and nanovg_gl.h. The latter contains part of the implementation. For convenience, I have placed these two header files in /usr/include/nanovg.
When I try to compile the following code to an object file, gcc does not complain:
// working.c
#include <GL/gl.h>
#include <nanovg/nanovg.h>
#define NANOVG_GL3_IMPLEMENTATION
#include <nanovg/nanovg_gl.h>
(Command: gcc -c working.c -o working.o)
Now, I copy the header files from /usr/include/nanovg/ to the working directory, and replace the code with:
// notworking.c
#include <GL/gl.h>
#include "nanovg.h"
#define NANOVG_GL3_IMPLEMENTATION
#include "nanovg_gl.h"
(Command: gcc -c notworking.c -o notworking.o)
Gcc now complains that some OpenGL functions are not declared:
... (many more similar complaints)
src/nanovg_gl.h: In function ‘glnvg__renderDelete’:
src/nanovg_gl.h:1540:3: warning: implicit declaration of function ‘glDeleteBuffers’; did you mean ‘glSelectBuffer’? [-Wimplicit-function-declaration]
1540 | glDeleteBuffers(1, &gl->fragBuf);
| ^~~~~~~~~~~~~~~
...
Why does one file compile smoothly but not the other?
A bit deeper:
Using the cpp tool, I found that the difference between the two pre-processed files is limited to # directives but I don't see any difference as far as the "C content" goes. Below is a snippet of the pre-processed working.c. If I add the # lines from the pre-processed notworking.c, then gcc no longer compiles the pre-processed working.c and complains about a missing declaration for glDeleteBuffers.
// ...
if (gl ==
// # 1533 "src/nanovg_gl.h" 3 4 // <- uncomment this line and glDeleteBuffers is considered missing by gcc
((void *)0)
// # 1533 "src/nanovg_gl.h" // <- idem
) return;
glnvg__deleteShader(&gl->shader);
if (gl->fragBuf != 0)
glDeleteBuffers(1, &gl->fragBuf); // <- the function that gcc complains about is here
// ...
Edit: Just to make sure that I did not do anything sneaky that might have caused the difference, I followed the following steps which hopefully should be reproducible on another computer:
GCC version: gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0
Copy the version of GL/gl.h that can be found here to the working directory and call it glfoo.h
Copy the headers of nanovg (as found in the repo) to /usr/include/nanovg/ and nanovg/ (relative to working directory).
Save the following as test.c in the working dir:
#include "glfoo.h"
#include <nanovg/nanovg.h>
#define NANOVG_GL3_IMPLEMENTATION
#include <nanovg/nanovg_gl.h>
Run gcc -c test.c -o test.o => compilation works
Replace <...> with "..." on lines 2 and 4 and run the command => compilation fails.
Just tried these exact steps and I was able to reproduce it.
After investigating this a bit, I found the solution. gcc does not apply the same warning level to system headers as it does to "normal" files (mainly because system headers sometimes do weird things that are not backed by the C standard but are "safe" on the platform they come with).
The gcc documentation states (emphasis mine):
-Wsystem-headers:
Print warning messages for constructs found in system header files. Warnings from system headers are normally suppressed, on
the assumption that they usually do not indicate real problems and
would only make the compiler output harder to read. Using this
command-line option tells GCC to emit warnings from system headers as
if they occurred in user code. However, note that using -Wall in
conjunction with this option does not warn about unknown pragmas in
system headers—for that, -Wunknown-pragmas must also be used.
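When you include nanovg via <...>, it is found in /usr/include, which gcc treats as a system header directory, so warnings from it are suppressed. (Whether a header counts as a system header depends on the directory it is found in; with "..." the copy in your working directory is found first, and that one is not a system header.) This is also what the # line markers you found encode: a trailing flag 3, as in # 1533 "src/nanovg_gl.h" 3 4, marks the following text as coming from a system header (flag 4 marks an implicit extern "C" block), so a marker without the 3 switches gcc back to reporting warnings.
So compiling with gcc -c -Wsystem-headers working.c does bring up the warning.
Note that your code does not actually work in either working.c or notworking.c; working.c just hides the warning messages. The proper way to access any GL function beyond what GL 1.1 defines is to use the GL extension mechanism, which means you have to query the GL function pointers at run time. Full GL loader libraries like GLEW and glad can do that for you automatically. Many of these loaders (including GLEW and glad) work by re-#define-ing every GL function name to an internal function pointer, so when you include the header that comes with the loader, every GL function called in your code (and nanovg's) is re-routed to the loader library's function pointers, and your code can actually work (provided you properly initialize the loader at run time before any GL function is called).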
Simply put,
#include <file.h>
includes the file from the paths the compiler searches by default, while
#include "file.h"
first looks in the directory of the file containing the #include, and only then in the default paths.
In your case, switching from <> to "" makes some files missing, which causes that compiler error.

Why including an h file with external vars and funcs results in undefined references

What if I want these externals to be resolved at runtime with dlopen?
I'm trying to understand why including an h file with shared-library external vars and funcs in a C executable program results in undefined/unresolved references when linking.
Why do I have to add the "-lsomelib" flag to the gcc link step if I only want these symbols to be resolved at runtime?
What does the link-time linker need these definitions for? Why can't it wait for the resolution at runtime when using dlopen?
Can anyone help me understand this?
Here is something that may help understanding:
there are 3 types of linking:
static linking (.a): the linker copies the content of the library into your executable at link time, so that you can move the program to other computers with the same architecture and run it.
dynamic linking (.so): the linker resolves the symbols at link time, but does not include the library's code in your executable. When the program is started, the library is loaded, and if the library is not found, the program stops. You need the library on the computer that runs the program.
dynamic loading: you are in charge of loading the library and its functions at runtime, using dlopen() and related functions. This is especially used for plugins.
see also: http://www.ibm.com/developerworks/library/l-dynamic-libraries/ and
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
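To make the dynamic-loading case concrete: in the sketch below the program never references cos() directly, so the link step needs no -lm; the math library is only opened at run time. It assumes glibc, where the math library's soname is libm.so.6 (other systems use other names), and on glibc older than 2.34 the program must still be linked with -ldl because that is where dlopen() lives. cos_via_dlopen() is a hypothetical name.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Looks up cos() at run time; returns -2.0 (never a cosine) on failure. */
double cos_via_dlopen(double x)
{
    void *h = dlopen("libm.so.6", RTLD_NOW);  /* assumes glibc's soname */
    if (h == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -2.0;
    }
    double (*cos_fn)(double);
    *(void **)&cos_fn = dlsym(h, "cos");       /* POSIX conversion idiom */
    if (cos_fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(h);
        return -2.0;
    }
    double r = cos_fn(x);
    dlclose(h);
    return r;
}
```

Because no object file contains an unresolved reference to cos, the static linker has nothing to complain about; the price is that a missing library or symbol is only discovered at run time.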
A header file (e.g. an *.h file referenced by some #include directive) is relevant to the C or C++ compiler. The linker does not know about source files (which are input to the compiler), but only about object files produced by the assembler (in executable and linkable format, i.e. ELF)
A library file (given by -lfoo) is relevant only at link time. The compiler does not know about libraries.
The dynamic linker needs to know which libraries should be linked. At runtime it does symbol resolution (against a fixed & known set of shared libraries). The dynamic linker won't try linking all the possible shared libraries present on your system (because it has too many shared objects, or because it may have several conflicting versions of a given library), it will link only a fixed set of libraries provided inside the executable. Use objdump(1) & readelf(1) & nm(1) to explore ELF object files and executables, and ldd(1) to understand shared libraries dependencies.
Notice that the g++ program is used both for compilation and for linking. It is actually a driver program: it starts cc1plus (the C++ compiler proper) to compile C++ code to an assembly file, as (the assembler) to assemble that file into an object file, and ld (the linker) to link object files and libraries.
Run g++ as g++ -v to understand what it is doing, i.e. which programs it runs.
If you don't link the required libraries, at link time, some references remain unresolved (because some object files contain an external reference and relocation).
(things are slightly more complex with link-time optimization, which we could ignore)
Also read the Program Library HOWTO, Levine's book Linkers and Loaders, and Drepper's paper How To Write Shared Libraries.
If you use dynamic loading at runtime (by using dlopen(3) on some plugin), you need to know the type and signature of the relevant functions (returned by dlsym(3)). A program loading plugins always has its own plugin conventions. For examples, look at the conventions used for Geany plugins & GCC plugins (see also these slides about GCC plugins).
In practice, if you are developing an application that accepts plugins, you will define a set of names and their expected types, signatures, and roles, e.g.
typedef void plugin_start_function_t (const char*);
typedef int plugin_more_function_t (int, double);
then declare e.g. some variables (or fields in a data structure) to point to them with a naming convention
plugin_start_function_t* plustart; // app_plugin_start in plugins
#define NAME_plustart "app_plugin_start"
plugin_more_function_t* plumore; // app_plugin_more in plugins
#define NAME_plumore "app_plugin_more"
Then load the plugin and set these pointers, e.g.
void* plugdlh = dlopen(plugin_path, RTLD_NOW);
if (!plugdlh) {
    fprintf(stderr, "failed to load %s: %s\n", plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
then retrieve the symbols:
// gcc accepts assigning dlsym()'s void * to a function pointer in C;
// in C++ write e.g. reinterpret_cast<plugin_start_function_t*>(dlsym(...)).
plustart = dlsym(plugdlh, NAME_plustart);
if (!plustart) {
    fprintf(stderr, "failed to find %s in %s: %s\n",
            NAME_plustart, plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
plumore = dlsym(plugdlh, NAME_plumore);
if (!plumore) {
    fprintf(stderr, "failed to find %s in %s: %s\n",
            NAME_plumore, plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
Then use appropriately the plustart and plumore function pointers.
In your plugin, you need to code
extern "C" void app_plugin_start(const char*);
extern "C" int app_plugin_more (int, double);
and give a definition to both of them. The plugin should be compiled as position independent code, e.g. with
g++ -Wall -fPIC -O -g -c pluginsrc1.c -o pluginsrc1.pic.o
g++ -Wall -fPIC -O -g -c pluginsrc2.c -o pluginsrc2.pic.o
and linked with
g++ -shared pluginsrc1.pic.o pluginsrc2.pic.o -o yourplugin.so
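For a plugin written in C rather than C++, no extern "C" is needed, because C function names are not mangled. A minimal sketch of what pluginsrc1.c might contain; the function bodies are made up for illustration, and only the names and signatures follow the conventions above.

```c
#include <stdio.h>

/* Entry point looked up by the host as "app_plugin_start". */
void app_plugin_start(const char *arg)
{
    printf("plugin started with \"%s\"\n", arg);
}

/* Entry point looked up by the host as "app_plugin_more". */
int app_plugin_more(int n, double factor)
{
    return (int)(n * factor);
}
```

Built as C, the same steps apply with gcc instead of g++: gcc -Wall -fPIC -O -g -c pluginsrc1.c -o pluginsrc1.pic.o, then gcc -shared pluginsrc1.pic.o -o yourplugin.so.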
You may want to link extra shared libraries to your plugin.
You generally should link your main program (the one loading plugins) with the -rdynamic link flag (because you want some symbols of your main program to be visible to your plugins).
Read also the C++ dlopen mini howto

Two functions from the same library: why does one generate undefined reference while the other doesn't?

I want to replace pthread_mutex_lock with pthread_mutex_trylock in a function, and when I do so, I get the "undefined reference" error message (see below). If I replace lines 411-13 with pthread_mutex_lock(&cmd_queue_lock), I don't get the linker error.
They are both from the same library, which I already include. Why does one generate the linker error and the other doesn't? More importantly, how can I fix it? I tried adding "extern int pthread_mutex_trylock" and changing the order of the .o files in the Makefile, but neither works.
$ nl clientmain.c
12 #include <stdio.h>
...
21 #include <pthread.h>
411 if (pthread_mutex_trylock(&cmd_queue_lock) == EBUSY) {
412 continue;
413 }
$ make
clientmain.o: In function `createHC':
clientmain.c:411: undefined reference to `pthread_mutex_trylock'
collect2: ld returned 1 exit status
make: *** [clientmain] Error 1
Admittedly, I can't find any manual page stating this, but adding -lpthread to your final link step will probably do the job. I found it by searching for the symbol pthread_mutex_trylock in all /usr/lib/lib*.a files; /usr/lib/libpthread.a was the only one defining it. Reverse engineering.
The gcc manual page does say that you can (and should) use the -pthread option to gcc to include POSIX thread support, so that is probably the royal road. This option worked on my system too. Interestingly, the regular /usr/lib/libc.a does offer pthread_mutex_lock but not pthread_mutex_trylock, which explains your confusion. Note that the gcc manual page also says that this option affects preprocessing, so it may be more, and better, than just linking against /usr/lib/libpthread.a.
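For reference, a minimal sketch of the trylock pattern from the question, built with gcc -pthread. cmd_queue_lock is the question's name; try_process_queue() is a hypothetical wrapper. EBUSY comes from <errno.h>, and pthread_mutex_trylock() returns the error number instead of setting errno. On glibc 2.34 and later the pthread functions are part of libc itself, so this particular undefined reference does not occur there.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cmd_queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the lock was acquired (and released again), 0 if busy. */
int try_process_queue(void)
{
    int rc = pthread_mutex_trylock(&cmd_queue_lock);
    if (rc == EBUSY)
        return 0;  /* someone already holds the lock: skip this round */
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_trylock failed: %d\n", rc);
        return 0;
    }
    /* ... work on the command queue while holding the lock ... */
    pthread_mutex_unlock(&cmd_queue_lock);
    return 1;
}
```

For a default (non-recursive) mutex, trylock returns EBUSY whenever the mutex is locked, even by the calling thread itself.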

c compiler error with linking

Here is the error I get from the gcc call:
gcc -o rr4 shells2.c graph1.c rng.c;
Undefined symbols:
"_getdisc", referenced from:
_main in cckR7zjP.o
ld: symbol(s) not found
The "cckR7zjP.o" name changes every time I call the compiler. The code for the function is in graph1.c; its header file is called graph2.h, and I include it in the file with the main function, shells2.c, using:
#include "graph2.h"
The method or function definition is:
int getdisc(int i){ return disc[i];}
which attempts to return the ith member of the array disc created by
static int *disc;
that I already initialized in some other method! I think the problematic call is:
for (iter = 0; iter < n; iter++) {
if (getdisc(iter) == cln)
avgbtwn += get_betweenness(iter);
}
This seems like a linker problem. I checked some other questions, and I think I am linking my function properly (and I am using the same function elsewhere in the code), but I still can't figure this out.
Edit: So I switched the order of the files in the command on Linux to
gcc -o rr4 graph1.c rng.c shells2.c
as per Soren's suggestion, and the function compiled as normal. Does anyone know why?
Furthermore, it seems that putting a trailing line break at the end of graph1.c alleviates the problem.
There used to be an issue in the old GCC 2.x compilers/linkers where the linker couldn't resolve symbols that were not grouped together. Think of it as the linker only looking for symbols that are still needed and dropping symbols that were unused.
To most people, the problem would manifest itself as an issue with the ordering of libraries (specified with -l or as .a files).
I see from the comments that you use a Mac, so it might just be that the Mac version of the compiler/linker still has that problem. In any case, since reordering the source files solved the problem, you certainly have some variation of this bug.
So, possible solutions:
Group all your source files into larger files (a bad solution, but the linker is less likely to fail with this symptom), or
Try to compile all the files to .o first and then link the .o files (a makefile would usually do this, but it may or may not resolve the problem), and possibly combine the .o files into a single .a (man ar), or
Change the order of the source files to have shells2.c last (which worked for you), or
See if upgrading your compiler helps
Sorry for the long laundry list, but this is clearly just a compiler bug that needs a simple workaround.
That's definitely an error with getdisc not being visible to the linker but, if what you say is correct, that shouldn't happen.
The gcc command line you have includes graph1.c, which you assure us contains the function.
Don't worry about the object file name; that's just a temporary name created by the compiler to pass to the linker.
Can you confirm (exact cut and paste) the gcc command line you're using, and show us the function definition with some context around it?
In addition, make sure that graph1.c is being compiled as expected by inserting the following line immediately before the getdisc function:
xyzzy plugh twisty;
If your function is being seen by the compiler, that should cause an error first. If it doesn't, something like #ifdef statements may be causing your code not to be compiled.
By way of testing, the following transcript shows that what you are trying to do works just fine:
pax> cat shells2.c
#include "graph2.h"
int main (void) {
int x = getdisc ();
return x;
}
pax> cat graph2.h
int getdisc (void);
pax> cat graph1.c
int getdisc (void) {
return 42;
}
pax> gcc -o rr4 shells2.c graph1.c
pax> ./rr4
pax> echo $?
42
We have to therefore assume that what you're actually doing is something different, and that's unusually tactful for me :-)
What you're experiencing is what would happen with something like:
pax> gcc -o rr4 shells2.c
/tmp/ccb4ZOpG.o: In function `main':
shells2.c:(.text+0xa): undefined reference to `getdisc'
collect2: ld returned 1 exit status
or if getdisc was not defined correctly in graph1.c.
That last case could be for many reasons including, but not limited to:
misspelling of getdisc.
#ifdef type statements meaning the definition is never seen (though you seem to have discounted that in a comment).
some wag using #define to change getdisc to something else (unlikely, but possible).
