Optional dynamic library - c

Background
Trying to profile an executable, I experimented with the Intel VTune profiler and learned that there is an API library (ITT) that provides utilities to start/stop profiling; its basic functions are __itt_resume() and __itt_pause(). What intrigues me is that the library is optional, i.e. if the ITT runtime library is not loaded, these functions are basically no-ops.
Optional library?
I want to know (first of all on Linux)
Does a process check that the dynamic library it links to is loaded when it starts, or only when each symbol (or the first symbol) of the library is called at runtime (i.e. lazy initialization)? I think on Windows it happens at startup because of the "can't find XXX.dll" messages, but I am not sure about Linux. Also, with the example below, I don't get any compilation or execution issues even though the symbol is not defined in some_process.c.
How to implement this on Linux? Looking at the GitHub repo of ITT, amid a lot of macro trickery, I feel like the key is here:
#define ITTNOTIFY_VOID(n) (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)
Basically it wraps every function call in a check on a function pointer, only calling through the pointer if it is not NULL.
How to implement this in a cross-platform way (Windows, Mac, Linux) ?
I ended up with the minimal example below, but it does not work as it should: api_hello_impl() is not called when the library is preloaded. Also, there is no crash when checking the value of the extern symbol api_hello_ptr when the library is not linked.
my_api.c
#include "my_api.h"
#include <stdio.h>
void (*api_hello_ptr)();

void api_hello_impl()
{
    printf("Hello\n");
}

__attribute__((constructor))
static void init()
{
    printf("linked\n");
    api_hello_ptr = api_hello_impl;
}
my_api.h
#pragma once
extern void(*api_hello_ptr)();
inline void api_hello() { if(api_hello_ptr) api_hello_ptr(); }
some_process.c
#include "my_api.h"
int main()
{
    // no-op if not linked at runtime
    api_hello();
}
Makefile
# my_api is not linked to some_process
some_process: some_process.c my_api.h
	$(CC) -o $@ $<

my_api.so: my_api.c my_api.h
	$(CC) -shared -fPIC -o $@ $<

test_linked: some_process my_api.so
	LD_PRELOAD="$(shell pwd)/my_api.so" ./some_process

test_unlinked: some_process my_api.so
	./some_process
.PHONY: test_linked test_unlinked
Output:
$ make test_linked
LD_PRELOAD="/tmp/tmp.EkrQbILrNg/my_api.so" ./some_process
linked
$ make test_unlinked
./some_process

Does a process check that the dynamic library it links to is loaded when it starts
Yes, it does. If a dynamic library is linked in, it is a runtime requirement, and the system loader will not start execution of a program without finding and loading the library first. There are mechanisms for delayed loading, but they are not the norm on Linux; they are done manually or through helper libraries. By default, all dynamically linked objects need to be loaded before execution starts.
Note: I'm assuming we are talking about ELF executables here since we are on Linux.
How to implement this on Linux?
You can do it using macros or wrapper functions plus libdl (link with -ldl), using dlopen() + dlsym(). Basically, in each one of those wrappers, the first thing you do is check whether the library was already loaded, and if not, load it. Then find and call the needed symbol.
Something like this:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
static void *libfoo_handle = NULL;
static int (*libfoo_func_a)(int, int);

static void load_libfoo_if_needed(void) {
    if (!libfoo_handle) {
        // Without "/" in the path, this will look in all standard system
        // dynamic library directories.
        libfoo_handle = dlopen("libfoo.so", RTLD_LAZY | RTLD_GLOBAL);
        if (!libfoo_handle) {
            // dlopen() does not set errno, so report dlerror() instead.
            fprintf(stderr, "failed to load libfoo.so: %s\n", dlerror());
            _exit(1);
        }
        // Optionally use dlsym() here to initialize a set of global
        // function pointers, so that you don't have to do it later.
        void *tmp = dlsym(libfoo_handle, "func_a");
        if (!tmp) {
            fprintf(stderr, "no symbol func_a in libfoo.so: %s\n", dlerror());
            _exit(1);
        }
        *((void **)&libfoo_func_a) = tmp;
    }
}

int wrapper_libfoo_func_a(int a, int b) {
    load_libfoo_if_needed();
    return libfoo_func_a(a, b);
}
// And so on for every function you need. You could use macros as well.
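As a sketch of the macro variant hinted at above (the name LIBFOO_WRAP is illustrative and error handling is omitted), instead of writing wrapper_libfoo_func_a() by hand, a macro can generate it, and any others, from one line each, expanding to the same "load, look up once, call" pattern:
#define LIBFOO_WRAP(ret, name, params, args)                 \
    ret wrapper_libfoo_##name params {                        \
        static ret (*fn) params;                              \
        load_libfoo_if_needed();                              \
        if (!fn)                                              \
            *(void **)&fn = dlsym(libfoo_handle, #name);      \
        return fn args;                                       \
    }

// Expands to wrapper_libfoo_func_a() and wrapper_libfoo_func_b().
LIBFOO_WRAP(int, func_a, (int a, int b), (a, b))
LIBFOO_WRAP(int, func_b, (int x), (x))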
How to implement this in a cross-platform way (Windows, Mac, Linux)?
For macOS, you should have dlopen() and dlsym() just like in Linux.
Not sure how exactly to do this on Windows, but I know there is LoadLibrary(), available in different flavors (e.g. LoadLibrary() and LoadLibraryEx()), which should be more or less the equivalent of dlopen(), and GetProcAddress(), which should be the equivalent of dlsym().
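A minimal cross-platform sketch along those lines (untested on Windows, in the spirit of the caveat above; the library name foo and the function func_a are placeholders, and on macOS the file would typically be libfoo.dylib rather than libfoo.so):
#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle_t;
#define LIB_OPEN(name)        LoadLibraryA(name)
#define LIB_SYM(handle, sym)  ((void *)GetProcAddress((handle), (sym)))
#define LIB_NAME              "foo.dll"
#else
#include <dlfcn.h>
typedef void *lib_handle_t;
#define LIB_OPEN(name)        dlopen((name), RTLD_LAZY | RTLD_GLOBAL)
#define LIB_SYM(handle, sym)  dlsym((handle), (sym))
#define LIB_NAME              "libfoo.so"
#endif

static lib_handle_t foo_handle;
static int (*foo_func_a)(int, int);

int wrapper_foo_func_a(int a, int b) {
    if (!foo_handle)
        foo_handle = LIB_OPEN(LIB_NAME);
    if (foo_handle && !foo_func_a)
        *(void **)&foo_func_a = LIB_SYM(foo_handle, "func_a");
    // No-op fallback when the library (or symbol) is absent, like ITT.
    return foo_func_a ? foo_func_a(a, b) : 0;
}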
See also: Loading a library dynamically in Linux or OSX?

Related

Constructor in shared object not called when LD_PRELOAD-ing a Go executable

There is strange behavior around Go executables built in Alpine images, where the standard LD_PRELOAD feature does not work correctly.
It looks like constructor functions are not called by the dynamic loader!
I have an example go application (getgoogle.go):
package main
import (
    "fmt"
    "net/http"
)

func main() {
    resp, err := http.Get("http://google.com/")
    if err == nil {
        fmt.Println(resp.StatusCode)
    }
}
And the example shared object code (libldptest.c)
#include <stdio.h>
static void __attribute__((constructor)) StaticConstructor(int argc, char **argv, char **env)
{
    printf(">>> LD_PRELOADED!\n");
}
I am creating a debian based docker image with this Dockerfile (gotest image):
FROM golang
COPY libldptest.c getgoogle.go /
RUN gcc -shared -o /libldptest.so /libldptest.c
RUN go build -gcflags='-N -l' -o /getgoogle /getgoogle.go
ENV LD_PRELOAD=/libldptest.so
Then running the following command:
$docker run -it gotest /getgoogle
>>> LD_PRELOADED!
200
This means the constructor works here.
But when doing the same with an alpine based docker image
FROM golang:1.12-alpine
RUN apk add gcc libc-dev
COPY libldptest.c getgoogle.go /
RUN gcc -shared -o /libldptest.so /libldptest.c
RUN go build -gcflags='-N -l' -o /getgoogle /getgoogle.go
ENV LD_PRELOAD=/libldptest.so
And running the same command as above
$docker run -it gotest /getgoogle
200
$docker run -it gotest ls
>>> LD_PRELOADED!
bin src
Meaning the static constructor was not called when running the Go application! (but it was called when running ls)
Note that I have checked that the dynamic loader adds the library to the process space.
I'd be grateful to understand why it is not working.
Stop ignoring the first comment. If you insist on using Go's internal linker that does not link in a way that's compatible with libc use then you can't use any C code, including LD_PRELOADed C code or even features of the dynamic linker itself. As Florian (from glibc) said in the linked issue, it is not valid with glibc either and "working" only by chance there.
Even if you somehow figure out "mechanically" why your ctor isn't being called, you're still running C code in a corrupted process state and anything could go wrong. Even if you analyze everything and it seems fine, this can change entirely with next dynamic linker/libc update.
If you want to do this, use the external linker option in Go.
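For reference, forcing the external linker looks roughly like this (an illustrative command, not from the original comment):
go build -ldflags='-linkmode external' -o getgoogle getgoogle.go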
There is a principal problem with static constructors in the Go/Alpine environment, as one can see from the comments. In short, from an ABI perspective, the responsibility for calling static constructors lies with the executable, not the loader. A Go executable is not based on the C runtime, and it only calls the static constructors of its dependency shared objects, not those of LD_PRELOAD-ed shared objects. In the case of glibc, the constructors of LD_PRELOAD-ed shared objects are called by the loader as an implementation detail, not by design; on musl libc they are not.
I have made a "hack"-ish workaround to make existing Go apps work with an LD_PRELOAD-ed shared object. It relies on the fact that the library is correctly LD_PRELOAD-ed into the process by musl libc, and that Go calls pthread_create at a very early stage of its initialization.
I override/hook the pthread_create symbol in the LD_PRELOAD-ed shared object and use it to call the constructors.
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <pthread.h>
#include <dlfcn.h>

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg)
{
    int (*pthread_create_original)(pthread_t *, const pthread_attr_t *, void *(*)(void *), void *) = dlsym(RTLD_NEXT, "pthread_create");
    static int already_called = 0;
    if (!already_called)
    {
        already_called = 1;
        // call your constructors here
    }
    return pthread_create_original(thread, attr, start_routine, arg);
}
Caveats: this works with the current Go runtime, but the assumptions this solution is built on are far from future-proof; future Go releases could easily break it.

DLL dependency to static library

I currently build a purely static library MainLib for our customers that contains all symbols, so that they can integrate it into their program. For several reasons, I now need to deliver a DLL version of MainLib that contains part of the symbols, alongside a static library FeatureLib that contains the remaining symbols. One reason is that we want to prevent bad guys from using our software by simply stealing the DLL shipped with our customer's program; this wouldn't work if part of the symbols were integrated into the calling software via a static library. The user of the package shall only be able to use the DLL if he added the symbols of FeatureLib into his application.
For Linux, I can make this work like a charm, i.e. the symbol doFeature() is not contained in libMainLib.so, but I don't succeed in doing the same on Windows.
CMakeLists.txt:
cmake_minimum_required(VERSION 3.0)
project(MainLib)
add_library(FeatureLib STATIC src/FeatureLib.c)
target_include_directories(FeatureLib PUBLIC include
PRIVATE src)
add_library(MainLib SHARED src/MainLib.c)
target_include_directories(MainLib PUBLIC include
PRIVATE src)
# I don't want to include symbols from FeatureLib into shared MainLib
#target_link_libraries(MainLib PRIVATE FeatureLib)
add_executable(MainLibDemo src/demo.c)
target_link_libraries(MainLibDemo MainLib FeatureLib) #resolve symbol doFeature()
FeatureLib.h:
extern int doFeature(int input);
MainLib.h:
extern __declspec(dllexport) int MainLib(int input);
FeatureLib.c:
#include "FeatureLib.h"
int doFeature(int input) {return 4;}
MainLib.c:
#include "FeatureLib.h"
#include "MainLib.h"
__declspec(dllexport) int MainLib(int input)
{
    if (input > 2) {
        return doFeature(input);
    } else {
        return doFeature(0);
    }
}
demo.c:
#include <stdio.h>
#include <stdlib.h>
#include "MainLib.h"
int main(int argc, char **argv)
{
    if (argc > 1)
        return MainLib(atoi(argv[1]));
    else
        return 0;
}
With this, I get the following linker error:
"C:\Daten\tmp\DemoProject\simple\build\ALL_BUILD.vcxproj" (Standardziel) (1) ->
"C:\Daten\tmp\DemoProject\simple\build\MainLib.vcxproj" (Standardziel) (4) ->
(Link Ziel) ->
MainLib.obj : error LNK2019: unresolved external symbol _doFeature referenced in function _MainLib [C:\Daten\tmp\DemoProject\simple\build\MainLib.vcxproj]
C:\Daten\tmp\DemoProject\simple\build\Debug\MainLib.dll : fatal error LNK1120: 1 unresolved externals [C:\Daten\tmp\DemoProject\simple\build\MainLib.vcxproj]
0 Warnung(en)
2 Fehler
Is this even possible with Windows? What do I have to do to make it work, and how can I verify it other than by not linking FeatureLib to MainLibDemo? Any ideas are very welcome.
Kind regards,
Florian
The way you do it under Linux will not work under Windows, because dynamic linking works differently there. Here is one strategy that could work.
In the MainLib.dll code, instead of calling doFeature directly, define a global pointer variable of the proper function-pointer type and call the function through it. This allows MainLib.dll to be built without errors.
Now you need to set this pointer variable. One way would be:
Add an exported function to MainLib.dll that takes pointers to all the functions the DLL needs from the executable.
In the FeatureLib.lib code, add an initialisation function that the application has to call before using your DLL, and which passes pointers to its peers to the DLL.
This is basically the way most programs with plugins give the plugins access to their facilities.
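A minimal sketch of that first strategy (the names MainLib_SetCallbacks and FeatureLib_Init are illustrative, not from the question):
/* In MainLib.c (built as the DLL): call doFeature through a pointer. */
static int (*doFeature_ptr)(int);

__declspec(dllexport) void MainLib_SetCallbacks(int (*do_feature)(int))
{
    doFeature_ptr = do_feature;
}

__declspec(dllexport) int MainLib(int input)
{
    if (!doFeature_ptr)
        return -1;                       /* not initialised by the application */
    return doFeature_ptr(input > 2 ? input : 0);
}

/* In FeatureLib.c (static library linked into the application): */
int doFeature(int input);                                /* real implementation */
__declspec(dllimport) void MainLib_SetCallbacks(int (*do_feature)(int));

void FeatureLib_Init(void)                /* application must call this first */
{
    MainLib_SetCallbacks(doFeature);
}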
Another way would be (warning: I have not tested this specific solution):
Declare the functions in FeatureLib.lib as exported with __declspec(dllexport); this way they will be exported from the executable.
In MainLib.dll, before the pointers are first used, use GetModuleHandle and GetProcAddress to obtain them. This is best done in some initialisation function of the library; otherwise you need to take care to avoid race conditions.
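And a sketch of this second, untested approach (again, names are illustrative):
/* In MainLib.c: resolve doFeature out of the executable itself. */
#include <windows.h>

typedef int (*doFeature_fn)(int);
static doFeature_fn doFeature_ptr;

static doFeature_fn get_doFeature(void)
{
    if (!doFeature_ptr) {
        /* NULL gives the handle of the module that created the process,
           i.e. the executable that linked FeatureLib.lib and exported doFeature. */
        HMODULE exe = GetModuleHandle(NULL);
        doFeature_ptr = (doFeature_fn)GetProcAddress(exe, "doFeature");
    }
    return doFeature_ptr;
}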
Hope this will help. Though I do not think your copy-protection scheme will work: Andrew Henle is right in his comment, it is not hard to extract the needed code from one executable and include it in another.

How to link a project to two different versions of the same C static library?

I am working on a complex C ecosystem where different packages/libraries are developed by different people.
I would like to create a new project named foobar. This project uses two libraries, the library foo and the library bar.
Unfortunately, bar does not require the same version of say that foo requires, so there is a conflict.
If all the packages are in Git with submodules, the foobar project cannot be built when cloned recursively, because two say functions would exist in different translation units. So the submodule strategy doesn't work.
My question is: how is it possible to manage one project that uses two different versions of the same static library (*.a)?
Structure
foobar
|
.----'----. <---- (require)
v v
foo bar
(v1.0) | | (v2.0)
'-> say <-'
The project foobar require the library foo and the library bar, both of these libraries uses the say package: foo requires version 1 and bar requires version 2.
Packages
say
// say.h
void say(char *);
foo
// foo.c
#include "say.h"

void foo(void) {
    say("I am foo");
}
bar
// bar.c
#include "say.h"

void bar(void) {
    say("I am bar");
}
foobar
// main.c
#include <stdlib.h>
#include "foo.h"
#include "bar.h"

int main() {
    foo();
    bar();
    return EXIT_SUCCESS;
}
Linkers typically have a mode in which they perform a partial link, which resolves references that can currently be resolved and produces an object module ready for further linking instead of a finished executable file.
For example, the GNU linker ld has a -r switch that does this. Using this switch, and possibly others, you could link foo.o with one library to make foo.partial.o and separately link bar.o with the other library to make bar.partial.o. Then you could link foo.partial.o and bar.partial.o with each other, the main program, and any other libraries and object modules needed.
This can work for static libraries, where the code for each library is included in the resulting executable or object file, and the references to its symbols are fully resolved. For shared dynamic libraries, there may be problems, since dynamic libraries require references to be resolved at run time, and the linker and executable file format might or might not support the ability to distinguish symbols of the same name in different versions of one library.
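A rough sketch of how that could look with GNU binutils; the library paths are placeholders, and the objcopy step (hiding say so the two copies cannot clash at the final link) is an extra assumption layered on top of the "possibly others" above:
# Partially link foo.o against say v1, keeping the result relocatable.
ld -r -o foo_partial.o foo.o say-v1/libsay.a
# Hide the resolved say symbol inside the partial object.
objcopy --localize-symbol=say foo_partial.o

# Same for bar against say v2.
ld -r -o bar_partial.o bar.o say-v2/libsay.a
objcopy --localize-symbol=say bar_partial.o

# Final link: each partial object carries its own private copy of say.
cc -o foobar main.c foo_partial.o bar_partial.o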

Why does including an .h file with external vars and funcs result in undefined references?

What if I want these externals to be resolved at runtime with dlopen?
I'm trying to understand why including an .h file that declares external vars and funcs from a shared library in a C executable program results in undefined/unresolved symbols at link time.
Why do I have to add the "-lsomelib" flag to the gcc link step if I only want these symbols to be resolved at runtime?
What does the link-time linker need these definitions resolved for? Why can't it wait for the resolution at runtime when using dlopen?
Can anyone help me understand this?
Here is something that may help understanding:
there are 3 types of linking:
static linking (.a): the linker copies the content of the library into your executable at link time, so that you can move the executable to other computers with the same architecture and run it.
dynamic linking (.so): the linker resolves the symbols at link time (during compilation), but does not include the code of the library in your executable. When the program is started, the library is loaded, and if the library is not found, the program does not start. You need the library on the computer that runs the program.
dynamic loading: you are in charge of loading the library functions at runtime, using dlopen() etc. This is used especially for plugins.
see also: http://www.ibm.com/developerworks/library/l-dynamic-libraries/ and
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
A header file (e.g. an *.h file referenced by some #include directive) is relevant to the C or C++ compiler. The linker does not know about source files (which are input to the compiler), but only about object files produced by the assembler (in executable and linkable format, i.e. ELF)
A library file (give by -lfoo) is relevant only at link time. The compiler does not know about libraries.
The dynamic linker needs to know which libraries should be linked. At runtime it does symbol resolution against a fixed, known set of shared libraries. The dynamic linker won't try linking against every shared library present on your system (there are too many shared objects, and there may be several conflicting versions of a given library); it will only link the fixed set of libraries recorded inside the executable. Use objdump(1), readelf(1) & nm(1) to explore ELF object files and executables, and ldd(1) to understand shared library dependencies.
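For instance (an illustrative command, not part of the original answer), the libraries recorded inside an executable can be listed with:
readelf -d ./yourprogram | grep NEEDED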
Notice that the g++ program is used both for compilation and for linking. (actually it is a driver program: it starts some cc1plus -the C++ compiler proper- to compile a C++ code to an assembly file, some as -the assembler- to assemble an assembly file into an object file, and some ld -the linker- to link object files and libraries).
Run g++ as g++ -v to understand what it is doing, i.e. what program[s] is it running.
If you don't link the required libraries, at link time, some references remain unresolved (because some object files contain an external reference and relocation).
(things are slightly more complex with link-time optimization, which we could ignore)
Read also Program Library HowTo, Levine's book linkers and loaders, and Drepper's paper: how to write shared libraries
If you use dynamic loading at runtime (by using dlopen(3) on some plugin), you need to know the type and signature of the relevant functions (whose addresses are returned by dlsym(3)). A program loading plugins always has its own specific plugin conventions. For examples, look at the conventions used for geany plugins & GCC plugins (see also these slides about GCC plugins).
In practice, if you are developing an application that accepts plugins, you will define a set of names, their expected types, signatures, and roles, e.g.
typedef void plugin_start_function_t (const char*);
typedef int plugin_more_function_t (int, double);
then declare e.g. some variables (or fields in a data structure) to point to them with a naming convention
plugin_start_function_t* plustart; // app_plugin_start in plugins
#define NAME_plustart "app_plugin_start"
plugin_more_function_t* plumore; // app_plugin_more in plugins
#define NAME_plumore "app_plugin_more"
Then load the plugin and set these pointers, e.g.
void* plugdlh = dlopen(plugin_path, RTLD_NOW);
if (!plugdlh) {
    fprintf(stderr, "failed to load %s: %s\n", plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
then retrieve the symbols:
plustart = (plugin_start_function_t*) dlsym(plugdlh, NAME_plustart);
if (!plustart) {
    fprintf(stderr, "failed to find %s in %s: %s\n",
            NAME_plustart, plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
plumore = (plugin_more_function_t*) dlsym(plugdlh, NAME_plumore);
if (!plumore) {
    fprintf(stderr, "failed to find %s in %s: %s\n",
            NAME_plumore, plugin_path, dlerror());
    exit(EXIT_FAILURE);
}
Then use appropriately the plustart and plumore function pointers.
In your plugin, you need to code
extern "C" void app_plugin_start(const char*);
extern "C" int app_plugin_more (int, double);
and give a definition to both of them. The plugin should be compiled as position independent code, e.g. with
g++ -Wall -fPIC -O -g -c pluginsrc1.c -o pluginsrc1.pic.o
g++ -Wall -fPIC -O -g -c pluginsrc2.c -o pluginsrc2.pic.o
and linked with
g++ -shared pluginsrc1.pic.o pluginsrc2.pic.o -o yourplugin.so
You may want to link extra shared libraries to your plugin.
You generally should link your main program (the one loading plugins) with the -rdynamic link flag (because you want some symbols of your main program to be visible to your plugins).
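For example, the main program could be linked like this (object file names are illustrative):
g++ -Wall -g -rdynamic main.o app_plugins.o -o yourapp -ldl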
Read also the C++ dlopen mini howto

How to call exported kernel module functions from another module?

I'm writing an API as a kernel module that provides device drivers with various functions. I wrote three functions in mycode.c. I then built and loaded the module, and copied mycode.h into <kernel>/include/linux. In a device driver, I have #include <linux/mycode.h> and call those three functions. But when I build the driver module, I get three linker warnings saying that those functions are undefined.
Notes:
The functions are declared extern in mycode.h
The functions are exported using EXPORT_SYMBOL(func_name) in mycode.c
Running the command nm mycode.ko shows all three functions as being available in the symbol table (capital T next to them, meaning the symbols are found in the text (code) section)
After loading the module, the command grep func_name /proc/kallsyms shows all three functions as being loaded
So clearly the functions are being exported correctly and the kernel knows what and where they are. So why can't the driver see their definitions? Any idea what I am missing?
EDIT: I found some information about this here: http://www.kernel.org/doc/Documentation/kbuild/modules.txt
Sometimes, an external module uses exported symbols from another
external module. kbuild needs to have full knowledge of all symbols
to avoid spitting out warnings about undefined symbols. Three
solutions exist for this situation.
NOTE: The method with a top-level kbuild file is recommended but may
be impractical in certain situations.
Use a top-level kbuild file: If you have two modules, foo.ko and
bar.ko, where foo.ko needs symbols from bar.ko, you can use a
common top-level kbuild file so both modules are compiled in the
same build. Consider the following directory layout:
./foo/ <= contains foo.ko
./bar/ <= contains bar.ko
The top-level kbuild file would then look like:
#./Kbuild (or ./Makefile):
obj-y := foo/ bar/
And executing
$ make -C $KDIR M=$PWD
will then do the expected and compile both modules with full
knowledge of symbols from either module.
Use an extra Module.symvers file: When an external module is built,
a Module.symvers file is generated containing all exported symbols
which are not defined in the kernel. To get access to symbols from
bar.ko, copy the Module.symvers file from the compilation of bar.ko
to the directory where foo.ko is built. During the module build,
kbuild will read the Module.symvers file in the directory of the
external module, and when the build is finished, a new
Module.symvers file is created containing the sum of all symbols
defined and not part of the kernel.
Use "make" variable KBUILD_EXTRA_SYMBOLS If it is impractical to
copy Module.symvers from another module, you can assign a space
separated list of files to KBUILD_EXTRA_SYMBOLS in your build file.
These files will be loaded by modpost during the initialization of
its symbol tables.
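As an illustration of the third option (the path is a placeholder, not from the kernel docs), foo's Kbuild or Makefile could contain:
# Build foo.ko using the symbol information exported when bar.ko was built.
KBUILD_EXTRA_SYMBOLS := /path/to/bar/Module.symvers
obj-m := foo.o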
But with all three of these solutions, in order for any driver to use my API, it would have to either create a new Makefile or have direct access to my Module.symvers file? That seems a bit inconvenient. I was hoping they'd just be able to #include my header file and be good to go. Do no other alternatives exist?
From my research, it seems that those are the only three ways to handle this situation, and I've gotten each of them to work, so I think I'll just pick my favorite out of those.
Minimal QEMU + Buildroot example
I have tested the following in a fully reproducible QEMU + Buildroot environment, so maybe having this working version will help you find out what is wrong with your code.
GitHub upstream is centered on the files:
dep.c
dep2.c
Makefile
dep.c
#include <linux/delay.h> /* usleep_range */
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/module.h>
MODULE_LICENSE("GPL");
int lkmc_dep = 0;
EXPORT_SYMBOL(lkmc_dep);
static struct task_struct *kthread;
static int work_func(void *data)
{
    while (!kthread_should_stop()) {
        printk(KERN_INFO "%d\n", lkmc_dep);
        usleep_range(1000000, 1000001);
    }
    return 0;
}

static int myinit(void)
{
    kthread = kthread_create(work_func, NULL, "mykthread");
    wake_up_process(kthread);
    return 0;
}

static void myexit(void)
{
    kthread_stop(kthread);
}

module_init(myinit)
module_exit(myexit)
dep2.c
#include <linux/delay.h> /* usleep_range */
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/module.h>
MODULE_LICENSE("GPL");
extern int lkmc_dep;
static struct task_struct *kthread;
static int work_func(void *data)
{
    while (!kthread_should_stop()) {
        usleep_range(1000000, 1000001);
        lkmc_dep++;
    }
    return 0;
}

static int myinit(void)
{
    kthread = kthread_create(work_func, NULL, "mykthread");
    wake_up_process(kthread);
    return 0;
}

static void myexit(void)
{
    kthread_stop(kthread);
}

module_init(myinit)
module_exit(myexit)
And now you can do:
insmod dep.ko
insmod dep2.ko
With that Buildroot setup, things are already set up so that depmod records the dependency in /lib/modules/*/modules.dep, so just this is enough to load both:
modprobe dep
Also, if you built your kernel with CONFIG_KALLSYMS_ALL=y, then the exported symbol can be seen with:
grep lkmc_dep /proc/kallsyms
see also: Does kallsyms have all the symbol of kernel functions?
OK: you have one module where the function is defined and another one that wants to import it, right?
You must use EXPORT_SYMBOL(name_of_the_function), e.g. for foo, in the place where the function is defined. So in the .c file where the function foo is defined, put in:
EXPORT_SYMBOL(foo)
Make sure you have the prototype for foo in a common header file (you can have it in separate places for each module and it will work, but you are asking for trouble if the signatures change). So say: void foo(void *arg);
Then the other module that wants it just invokes foo and you are good.
Also: make sure that you load the module containing foo first. If you have cross dependencies, like module2 needing foo from module1 and module1 needing bar from module2, you need to have one register functions with the other. If you want to know more, please ask a separate question.
