Dynamic modules with DLLs on Windows - c

I'm writing an application in C that can be extended at runtime by means of modules / shared objects / DLLs. Those modules may use the API of the existing program but may also provide new functions for use in later loaded modules, so there is the possibility for modules to have dependencies on each other.
My current approach under Linux is to have every module define a depends() function that returns a list of other module names it depends on. That way, I can compile and link every module for itself, load a module with dlopen() and RTLD_LAZY, resolve its dependencies first and then fully load it with RTLD_GLOBAL. This works just fine and does exactly what I want. It also allows me to replace a module with a different version without recompiling all other modules depending on it.
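A minimal sketch of the loading scheme described above, for reference. It assumes that depends() returns a NULL-terminated array of paths that dlopen() can resolve, which is not spelled out here; load_module and the depth limit are illustrative names, and using RTLD_NOW together with RTLD_GLOBAL in the final step is an assumption about what "fully load" means.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed module contract: every module exports
 *     const char **depends(void);
 * returning a NULL-terminated array of module paths. */
typedef const char **(*depends_fn)(void);

/* Load `path` after loading everything it depends on.
 * Returns the globally visible handle, or NULL on failure.
 * Dependencies stay loaded for the lifetime of the process. */
void *load_module(const char *path, int depth)
{
    if (depth > 32) {   /* crude guard against dependency cycles */
        fprintf(stderr, "%s: dependency chain too deep\n", path);
        return NULL;
    }

    /* Step 1: open lazily and locally, only to ask for the dependencies. */
    void *probe = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (!probe) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    /* Step 2: load the dependencies first. */
    depends_fn depends = NULL;
    *(void **)(&depends) = dlsym(probe, "depends");
    if (depends) {
        for (const char **dep = depends(); dep && *dep; ++dep) {
            if (!load_module(*dep, depth + 1)) {
                dlclose(probe);
                return NULL;
            }
        }
    }

    /* Step 3: reopen with RTLD_GLOBAL so that modules loaded later can
     * bind to this module's symbols; RTLD_NOW reports missing symbols
     * immediately instead of at the first call. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!handle)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    dlclose(probe);     /* drop the probe reference */
    return handle;
}
```

A host would call load_module("./modules/foo.so", 0) for each requested module (the path is made up). On older glibc versions the program has to be linked with -ldl.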
The actual problem arises when porting this to Windows. First, I haven't found any way to link a DLL without already providing it with the export symbol tables of all its dependencies. Is there one I've overlooked?
Second, LoadLibraryEx from the Windows API doesn't seem to be able to perform any lazy loading because instead of letting me handle dependencies, it goes ahead and loads all referenced DLLs itself before it even returns. Since I'd like to perform version checking as well before actually loading modules in the future, this is not at all what I want. Is there any way to circumvent this behaviour?
The third odd thing is that I cannot replace a DLL without recompiling all other modules depending on it. It actually does work sometimes, but usually wild things start to happen or the program segfaults.
Is it even possible to write a modular application like that on Windows? Any suggestions or different approaches are highly appreciated!
Update: Just to provide some clarification on how my modules use each other's functions on Linux (which I would like to have on Windows as well): Every module just returns the name of another module it would like to call functions from in the described depends() function and includes its header, then calls the used functions directly in the code without any wrapping. This works because Linux does not require you to have all symbols resolved at link time for shared objects.

You can export all functions manually (using __declspec(dllexport)) and load them using GetProcAddress. In this case you need to know the signature of each function, and you're limited to C functions only, but this is going to work. If you compile both modules, your C functions can also return C++ classes, more on this later. Using GetProcAddress & LoadLibrary makes modules totally independent. Basically, you do the linking manually, but as I understand, this is what you do on Linux, right?
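As a rough sketch of that manual linking, the shim below hides LoadLibrary/GetProcAddress and dlopen/dlsym behind the same three functions, so the host code stays identical on both platforms. The names module_open, module_symbol, module_close, call_mod_add and the exported function mod_add are all hypothetical; on the Windows side the module would declare it as __declspec(dllexport) int mod_add(int a, int b).

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE module_handle;
#else
#include <dlfcn.h>
typedef void *module_handle;
#endif

/* Generic function pointer; callers cast it to the real signature. */
typedef void (*module_fn)(void);

module_handle module_open(const char *path)
{
#ifdef _WIN32
    return LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
#endif
}

module_fn module_symbol(module_handle h, const char *name)
{
#ifdef _WIN32
    return (module_fn)GetProcAddress(h, name);
#else
    void *p = dlsym(h, name);
    module_fn fn = NULL;
    memcpy(&fn, &p, sizeof fn);   /* object-to-function pointer, POSIX style */
    return fn;
#endif
}

void module_close(module_handle h)
{
#ifdef _WIN32
    FreeLibrary(h);
#else
    dlclose(h);
#endif
}

/* Hypothetical export of some module: int mod_add(int, int). */
typedef int (*mod_add_fn)(int, int);

/* Load the module, look up mod_add by name and call it.
 * Returns 0 on success, -1 if the module or the symbol is missing. */
int call_mod_add(const char *path, int a, int b, int *result)
{
    module_handle h = module_open(path);
    if (!h)
        return -1;

    int rc = -1;
    mod_add_fn add = (mod_add_fn)module_symbol(h, "mod_add");
    if (add) {
        *result = add(a, b);
        rc = 0;
    }
    module_close(h);
    return rc;
}
```

Because the only thing shared between host and module is the function name and its signature, neither side needs the other's import library at link time.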
LoadLibrary only loads the libraries a library is dependent on, so make sure your modules don't depend on each other at link time. Either they are really independent or they aren't; if done properly, changing one library will not force a recompile of the other (as you don't link them together).
A good idea is to use something like COM, so make each of your libraries return an interface, instead of individual functions. That way, you can simply load a whole DLL and also link them easily together (passing a DLL -> passing an object). Look up XPCOM and COM, it's actually very easy to do.
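A stripped-down illustration of the interface idea in plain C, without any actual COM machinery. Everything here is hypothetical (struct math_api, the registry, math_module_init, stats_sum3); in a real build math_module_init would be the single exported entry point of one DLL, found via GetProcAddress, and stats_sum3 would live in another DLL that only includes the header declaring struct math_api.

```c
#include <stddef.h>
#include <string.h>

/* Interface a hypothetical "math" module hands out. */
struct math_api {
    int (*add)(int, int);
};

/* Host-side registry mapping module names to interface pointers. */
#define MAX_INTERFACES 16

struct registry {
    const char *names[MAX_INTERFACES];
    const void *ifaces[MAX_INTERFACES];
    size_t count;
};

int registry_add(struct registry *r, const char *name, const void *iface)
{
    if (r->count == MAX_INTERFACES)
        return -1;
    r->names[r->count] = name;
    r->ifaces[r->count] = iface;
    r->count++;
    return 0;
}

const void *registry_find(const struct registry *r, const char *name)
{
    for (size_t i = 0; i < r->count; ++i)
        if (strcmp(r->names[i], name) == 0)
            return r->ifaces[i];
    return NULL;
}

/* --- would live in the math module --- */
static int math_add(int a, int b) { return a + b; }
static const struct math_api math_impl = { math_add };

/* The module's only export: publish the interface to the host. */
int math_module_init(struct registry *r)
{
    return registry_add(r, "math", &math_impl);
}

/* --- would live in a dependent module --- */
/* No link-time reference to the math module: the interface is looked
 * up by name at runtime and called through function pointers. */
int stats_sum3(const struct registry *r, int a, int b, int c, int *out)
{
    const struct math_api *m = registry_find(r, "math");
    if (!m)
        return -1;   /* dependency not loaded */
    *out = m->add(m->add(a, b), c);
    return 0;
}
```

The host decides the load order and can refuse to initialise a module whose dependencies are not in the registry yet, which is roughly what depends() provides on Linux.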

Related

Creating Backwards Compatible Drop In Statically Linked DLL

I have a C-based DLL that I wrote years ago for a project and it exports a set of functions that define an API. Now I need to re-write this DLL's internals but keep the API exactly the same.
The user of the DLL used static linking and they do not want to or are unable to recompile their executable.
I've noticed that the RVAs of the exported functions are different. My understanding is that means the executable won't be able to find the functions unless it is re-linked with the updated lib file.
Is there a way in VS2017 to force an exported function to use a specific RVA? I checked the Microsoft LINK DEF file format and I didn't see an option in there.
Even if it is possible, is fixing the RVAs enough to ensure the old executable will be able to use the updated DLL or are there additional complications that make this a non-starter?
Thanks.
When you statically link an EXE module against a DLL, you do indeed link against the DLL's import library (a .LIB) created alongside the DLL when the DLL was built. This is not the same thing as linking against a static library, which is confusing because those are also .LIB files.
The first thing you should do is figure out if your EXE module has an import entry for said DLL using a tool like Dependency Walker, Dumpbin, pelook or your favorite PE analyzer tool. If there is no DLL import entry, you have probably linked the EXE against a static library as described by @HAL9000's answer. Short of reverse-engineering the EXE, your best bet would be to rebuild the module as suggested if possible.
Otherwise, if you do find an import for said DLL, then yes you can swap out a newly-built DLL provided you have the same export (function) names and/or ordinal values as the original. The loader finds functions by export names and/or ordinal values, not RVAs, which in this case are only an internal detail. Whether the DLL is implicitly loaded (from being statically-linked) during process (EXE) initialization (before the EXE's entry point is called) or explicitly loaded (via code using LoadLibrary, etc.), the whole point of being a DLL is that it is a module designed to be dynamically replaced, and Windows was designed around this concept. The internal RVAs both within the EXE (referencing the DLL) and the DLL itself do not need to match an old DLL's values; this bookkeeping is automatically handled by the Windows loader during a process also known as runtime linking.
In the event the EXE is linked against said DLL and ALSO specifies hard-coded addresses (RVAs) for the DLL's exported functions (a process known as static binding), Windows will still verify that the addresses reflect the correct values in the DLL that is actually loaded, which may be a different, updated DLL. This is done via a timestamp check in the import section for the DLL. If there is a mismatch, the Windows loader tosses out all of the static RVAs and updates them with the current values, incurring a slight performance penalty, but the program will still load. FWIW the bind.exe tool to do this static binding no longer ships with the Visual C++ toolset as the performance gain in modern versions of Windows is minimal. This optimization used to be common practice to speed up load times, especially in OS-supplied system DLLs, but shouldn't affect what you are trying to do one way or the other.
If the user has statically linked in your library, then it is not a DLL, and making a drop-in replacement without relinking is not possible. At least not without some ugly hacks. The old library functions have been copied into the executable, so there is no way around editing the executable. If you can't recompile or relink, then it is probably easier to rewrite the executable from scratch.
Mucking around with addresses of functions in your new DLL, if possible, can't have any effect if the executable doesn't have any code to load a DLL at all.
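To make the point about names versus RVAs concrete, here is a toy model of import resolution. It is not the real PE export directory layout, only a simplified table with made-up names (api_open, api_close), base addresses and RVAs; the two "versions" of the DLL place the functions at different RVAs, yet a lookup by name or ordinal finds them in both.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for one export: name, ordinal, and RVA. */
struct export_entry {
    const char *name;
    uint16_t ordinal;
    uint32_t rva;
};

/* Simplified stand-in for a loaded DLL image. */
struct toy_dll {
    uintptr_t base;                      /* where the image was mapped */
    const struct export_entry *exports;
    size_t count;
};

/* Import by name: name -> RVA -> absolute address. Returns 0 if absent. */
uintptr_t resolve_by_name(const struct toy_dll *dll, const char *name)
{
    for (size_t i = 0; i < dll->count; ++i)
        if (strcmp(dll->exports[i].name, name) == 0)
            return dll->base + dll->exports[i].rva;
    return 0;
}

/* Import by ordinal: ordinal -> RVA -> absolute address. */
uintptr_t resolve_by_ordinal(const struct toy_dll *dll, uint16_t ordinal)
{
    for (size_t i = 0; i < dll->count; ++i)
        if (dll->exports[i].ordinal == ordinal)
            return dll->base + dll->exports[i].rva;
    return 0;
}

/* Old build of the DLL. */
const struct export_entry v1_exports[] = {
    { "api_open",  1, 0x1000 },
    { "api_close", 2, 0x1040 },
};
const struct toy_dll dll_v1 = { 0x10000000u, v1_exports, 2 };

/* Rewritten build: same names and ordinals, different RVAs and order. */
const struct export_entry v2_exports[] = {
    { "api_close", 2, 0x2300 },
    { "api_open",  1, 0x2100 },
};
const struct toy_dll dll_v2 = { 0x20000000u, v2_exports, 2 };
```

An old executable that imports api_open by name gets a valid address from either table, which is why keeping the export names (and ordinals, if they are used) stable is what matters.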

Instrumenting a C library

I have a binary library and a binary executable using that library, both written in C. I know the C API provided by the library, but have the source of neither the library nor the executable. I would like to understand how the executable uses the library (compare my previous question How to know which functions of a library get called by a program).
The proposed solutions did not give satisfactory results. A possibility not mentioned seems to be to implement a wrapper library that imitates the known interface of the binary library I am interested in. My idea is to forward all of the calls to the wrapper to the binary library. This should allow me to log all the calls and passed parameters, in other words to instrument the library.
I succeeded in implementing the wrapper library on Linux as a dynamic link library (*.so), together with my own sample application connecting to the wrapper. The wrapper, in turn, uses the original binary library. Both libraries are used with dlopen and dlsym to access the API. However, I am facing the following practical problem: I do not manage to link the original binary executable to my wrapper library. That is related to the fact that the executable expects the library under a certain name. However, if I name my wrapper library that way, it conflicts with the original library. Surprisingly (to me), simply renaming the .so file of the original and linking the wrapper library against it does not work (the result stops without an error message when the wrapper library calls dlopen, and I do not get more information from the debugger than that it seems to happen in a malloc).
I tried a number of things like using symbolic links to move one of the libraries out of the search path of the run-time linker, to add paths to the LD_LIBRARY_PATH environment variable, different relative locations of the .so-files (and corresponding paths for dlopen), as well as different compiler options, so far without success.
To summarize, I would like
(executable)_orig->(lib.so)->(lib.so)_orig
where (executable)_orig and (lib.so)_orig (both binaries that I cannot influence) are such that
(executable)_orig->(lib.so)_orig
works. I have the sources of (lib.so) and can modify it as I wish. Also, I can modify the Linux host system as I like. The task of (lib.so) is to tell me how (executable)_orig and (lib.so)_orig interact.
I also have
(executable)->(wrapper.so)->(lib.so)_orig
working, which seems to indicate that the issue is related to the naming and loading conventions for the libraries.
This is a separate new question because it deals with the specific practical issue sketched above. Beyond that, some background info on why renaming the file corresponding to (lib.so)_orig to circumvent the issue may fail could also prove useful.

Proxy shared library (sharedlib, shlib, so) for ELF?

On Windows, it's more or less common to create "proxy DLLs" which take place of the original DLL and forward calls to it (after any additional actions as needed). You can read about it here and here for example.
However, shlib munging culture under Linux is quite different. It starts with the fact that LD_PRELOAD is a builtin feature of ld.so under Linux, which simply injects a separate shlib into the process and uses any symbols it defines as overrides. And that "injection" technique seems to define the whole direction of thought - here's a typical ELF hacking tool or this question, where a gentleman seems to have the same use case as me, but starts with asking how he can patch existing binaries.
No, thanks. I don't want to inject into or modify something which is not mine. All I want to do is to make a standalone proxy shlib which will call out to the original. Ideally, there would be a tool which can be fed with the original .so and create C source code which would just redirect to the original's functions, while letting me easily override anything I want. So, where's such a tool? ;-) Thanks.
Using LD_PRELOAD doesn't really involve modifying something which isn't yours, and the injection isn't all that different from normal dynamic library loading. The “typical ELF hacking tool” from the ERESI project is unrelated to LD_PRELOAD. You should not be afraid of it. A good introduction to writing LD_PRELOAD-able “proxies” is here.
That being said, if you want to create a system-wide proxy for some library, you might argue that globally setting LD_PRELOAD (and thus loading your proxy into every binary that ever runs on your system) is undesirable. It is commonly used to override functions from glibc by tools such as libeatmydata or socksify, but if you're overriding a function in a library that is bigger and/or less widespread than glibc, it makes sense to try to find another approach, to really create a proxy for just that one library.
One such approach is to use patchelf --replace-needed or --add-needed to hardcode the full pathname of the original library and then make sure the proxy library is found first by setting LD_LIBRARY_PATH¹. So, the complete procedure is:
1. Create an LD_PRELOAD-able library that overrides some functions of the original one (test that it works using only LD_PRELOAD before proceeding further!).
2. Compile and link this library with the original library so that ldd libwrapper-foo.so includes something like:
    libfoo.so.0 => /usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000deadbeef0000)
3. Hardcode the full path using patchelf:
    patchelf --replace-needed libfoo.so.0 /usr/lib/x86_64-linux-gnu/libfoo.so.0 libwrapper-foo.so
4. Symlink libwrapper-foo.so to libfoo.so.0.
5. Now LD_LIBRARY_PATH=. ldd $(which program-that-uses-libfoo) should include these lines:
    libfoo.so.0 => ./libfoo.so.0 (0x0000dead56780000)
    /usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000dead1234000000)
6. Set LD_LIBRARY_PATH to the full path to the wrapper library in your .bashrc or somewhere.
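For the first step, an overriding function usually looks something like the sketch below. Since libfoo is only a placeholder here, the example wraps puts from libc as a stand-in for a function of the original library; wrapped_puts_calls and the log format are made up for illustration. The wrapper logs the call, then forwards to the next definition of the symbol in the search order via dlsym(RTLD_NEXT, ...).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes RTLD_NEXT only with this */
#endif
#include <dlfcn.h>
#include <stdio.h>

#ifndef RTLD_NEXT
/* Fallback in case libc headers were already included without
 * _GNU_SOURCE; this is the value glibc and musl use. */
#define RTLD_NEXT ((void *)-1L)
#endif

static long puts_calls;        /* number of intercepted calls */

long wrapped_puts_calls(void)
{
    return puts_calls;
}

/* Same name and signature as the function being proxied. */
int puts(const char *s)
{
    static int (*real_puts)(const char *);

    if (!real_puts) {
        /* Find the next definition of "puts" after this object,
         * i.e. the one in the original library. */
        *(void **)(&real_puts) = dlsym(RTLD_NEXT, "puts");
        if (!real_puts)
            return EOF;
    }

    ++puts_calls;
    fprintf(stderr, "[proxy] puts(\"%s\")\n", s);   /* log, then forward */
    return real_puts(s);
}
```

Built as a shared object (for example gcc -shared -fPIC -o libwrapper-foo.so wrapper.c, plus -ldl on older glibc), this can be tried with LD_PRELOAD=./libwrapper-foo.so before going on with the patchelf steps.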
A real-life example of such a proxy library is my wrapper for libpango that enables subpixel positioning for all applications.
¹) It might also be possible to put this proxy library into /usr/local/lib, but ldconfig (the tool that updates shared libraries cache) refuses to use libraries with hardcoded absolute paths.
apitrace is a tool which covers detailed tracing of graphics library (OpenGL, DirectX) calls on a number of platforms. It's probably too detailed and complex to serve as a generic solution, but it at least provides a reference for a similar use case.

check compatibility of a shared library before dynamically loading it

I have a program and bunch of "plug-ins" (shared libraries) that the main program loads on request during the runtime.
The plug-ins can access all the internal global data structures and functions of the program, so keeping a separate version number for every change to the internal data structures is not an option.
I'm looking for a way for the main program to check whether the plug-in it tries to load is supported (i.e. uses the appropriate data structures).
Can you think of a creative way of doing this?
Have a function in the plugin return information about the version of the protocol it supports. (The protocol of a plugin isn't restricted to what it provides; it is also what it requires from the calling program.)
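A minimal sketch of such a version function, under some assumptions: the names (plugin_get_info, plugin_is_compatible, struct host_state, the HOST_ABI_* macros) are invented for illustration, and the compatibility policy (same major version, plugin minor not newer than the host, same structure size) is just one plausible choice. Recording sizeof of a shared structure is a cheap extra check for the case where someone changed a structure but forgot to bump the version.

```c
#include <stddef.h>
#include <stdint.h>

/* Version of the host/plugin protocol, defined in a shared header. */
#define HOST_ABI_MAJOR 3u
#define HOST_ABI_MINOR 1u

/* Hypothetical internal structure that plugins access directly. */
struct host_state {
    int verbosity;
    double scale;
    void *user_data;
};

/* What every plugin reports about the headers it was compiled against. */
struct plugin_info {
    uint32_t abi_major;
    uint32_t abi_minor;
    size_t host_state_size;   /* sizeof(struct host_state) at plugin build */
};

/* Plugin side: the values are frozen when the plugin is compiled. */
const struct plugin_info *plugin_get_info(void)
{
    static const struct plugin_info info = {
        HOST_ABI_MAJOR, HOST_ABI_MINOR, sizeof(struct host_state)
    };
    return &info;
}

/* Host side: returns 1 if the plugin may be used, 0 otherwise. */
int plugin_is_compatible(const struct plugin_info *info)
{
    if (!info)
        return 0;
    if (info->abi_major != HOST_ABI_MAJOR)
        return 0;             /* incompatible protocol generation */
    if (info->abi_minor > HOST_ABI_MINOR)
        return 0;             /* plugin expects features the host lacks */
    return info->host_state_size == sizeof(struct host_state);
}
```

The host would look up plugin_get_info with dlsym right after dlopen and unload the plugin again if plugin_is_compatible returns 0.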
AProgrammer's answer (or simply exporting a global variable with the version number) will work, but bear in mind that no solution is foolproof or safe against malicious plugin files. Loaded modules run in the same memory space as your program, with the same privileges, and unfortunately the dynamic loader will happily run global constructors in the plugin before you are able to query the version or perform any checking yourself. (Grumble anyone have a link to Global Constructors Considered Harmful?)
In any case, if the plugin architecture is your design, I would highly recommend you ban any use of global constructors in the specification for plugins. Of course you can't enforce this at runtime, but at least then you can blame any plugin author who breaks things for violating the contract.

efficiency of utilizing dll in c source code

I have a dll which I'd like to use in a C program.
Do you think it is efficient to have a dll (lots of common functions) and then create a program that will eventually use them, or to have all the source code?
To include the dll, what syntax must be followed?
Do you think it is efficient to have a dll (lots of common functions) and then create a program that will eventually use them, or to have all the source code?
For memory and disk space, it is more efficient to use a shared library (a DLL is the Windows implementation of shared libraries), assuming that at least two programs use this component. If only one program will ever use this component, then there is no memory or disk space savings to be had.
Shared libraries can be slightly slower than statically linking the code; however, this is likely to be incredibly minor, and shared libraries carry a number of benefits that make it more than worthwhile (such as the ability to load and handle symbols dynamically, which allows for plugin-like architectures). That said, there are also some disadvantages (if you are not careful about where your DLLs live, how they are versioned, and who can update them, then you can get into DLL hell).
To include the dll, what syntax must be followed?
This depends. There are two ways that shared libraries can be used. In the first way, you tell the linker to reference the shared library, and the shared library will automatically be loaded on program startup, and you would basically reference the code like normal (include the various headers and just use the name of the symbol when you want to reference it). The second way is to dynamically load the shared library (on Windows this is done via LoadLibrary while it is done on UNIX with dlopen). This second way makes it possible to change the behavior of the program based on the presence or absence of symbols in the shared library and to inspect the available set of symbols. For the second way, you would use GetProcAddress (Windows) or dlsym (UNIX) to obtain a pointer to a function defined in the library, and you would pass around function pointers to reference the functions that were loaded.
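As a small illustration of the second way, in particular of changing behaviour depending on whether a symbol is present, here is a sketch for the UNIX side. pick_parser and fallback_parse are made-up names; the lookup goes through the handle of the running program (dlopen(NULL, ...)), which also covers the libraries loaded at startup. On Windows the same idea would use GetProcAddress on a handle from LoadLibrary.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*parse_fn)(const char *);

/* Used when the requested symbol is not available. */
int fallback_parse(const char *s)
{
    return (int)strtol(s, NULL, 10);
}

/* Return the function called `name` if the running program (or a
 * library loaded at startup) provides it, else the fallback. */
parse_fn pick_parser(const char *name)
{
    parse_fn fn = NULL;
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */

    if (self) {
        *(void **)(&fn) = dlsym(self, name);
        dlclose(self);
    }
    return fn ? fn : fallback_parse;
}
```

For example, pick_parser("atoi") hands back libc's atoi, while asking for a name that nothing exports yields fallback_parse; either way the caller just invokes the returned function pointer.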
You can put your functions into a static library (a .lib), which is merged into your application at link time and is basically the same as putting the .c files in the project.
Or you can use a dll, where the functions are included at run time. The advantage of a dll is that two programs which use the same functions can use the same dll (saving disk space) and you can upgrade the dll without changing the program - neither of these probably matters for you.
The dll is automatically loaded when your program runs; there is nothing special you need to do to include it. (You can also load a dll explicitly in your code - there are sometimes special reasons to do this.)
Edit - if you need to create a stub lib for an existing dll see http://support.microsoft.com/kb/131313

Resources