What are the use cases of dlopen vs standard dynamic linking? - c

According to the doc, dlopen is used in conjunction with dlsym to load a library, and get a pointer to a symbol.
But that's already what the dynamic loader/linker does.
Moreover, both methods are based on ld.so.
There actually seems to be two differences when using dlopen:
The library can be conditionally loaded.
The compiler is not aware of the symbols (types, prototypes...) we're using, and thus does not check for potential errors. It's, by the way, a way to achieve introspection.
But, it does not seem to motivate the use of dlopen over standard loading, except for marginal examples:
Conditional loading is not really interesting, in terms of memory footprint optimization, when the shared library is already used by another program: Loading an already used library is not increasing the memory footprint.
Avoiding the compiler supervision is unsafe and a good way to write bugs... We're also missing the potential compiler optimizations.
So, are there other uses where dlopen is prefered over the standard dynamic linking/loading?

So, are there other uses where dlopen is prefered over the standard dynamic linking/loading?
Typical use-cases for using dlopen are
plugins
selecting optimal implementation for current CPU (Intel math libraries do this)
select implementation of API by different vendors (GLEW and other OpenGL wrappers do this)
delay loading of shared library if it's unlikely to be used (this would speed up startup because library constructors won't run + runtime linker would have slightly less work to do)
Avoiding the compiler supervision is unsafe and a good way to write bugs...
We're also missing the potential compiler optimizations.
That's true but you can have best of both worlds by providing a small wrapper library around delay loaded shared library. On Windows this is done by standard tools (google for "DLL import libraries"), on Linux you can do it by hand or use Implib.so.

I did this in a Windows environment to build a language switch feature. When my app starts it checks the configuration setting which language.dll should be used. From now on all texts are loaded from that dynamically loaded library, which even can be replaced during runtime. I included also a function for formatting ordinals (1st, 2nd, 3rd), which is language specific. The language resources for my native language I included in the executable, so I cannot end up with no texts available at all.
The key is that the executable can decide during runtime which library should be loaded. In my case it was a language switch, or as the commentators said something like a directory scan for plugins.
The lack of monitoring the call signature is definitely a disadvantage. If you really want to do evil things like overriding the prototype type definitions you could do this with standard C type casts.

Related

How to manage features that depend on libraries that may or may not be installed?

I'm writing some new functionality for a graphics program (written mostly in C, with small parts in C++). The new functionality will make use of libgmic. Some users of the program will have libgmic installed, quite a lot will not. The program is monolithic: I'm not writing a plugin, this will be part of the main program. Compiling the program with the right headers is easy, but I need to be able to check at runtime whether the library is installed on the user's system or not in order to enable / disable the particular menu item so that users without the library installed can't invoke this piece of functionality and crash the program. What's the best way of going about this?
You need to load the library at runtime with dlopen (or LoadLibrary on Windows) instead of linking to it, get function pointers with dlsym (GetProcAddress on Windows) and use them instead of function prototypes from the headers. Otherwise your program will simply fail to startup without the library (or crash, in some cases).
Some libraries support such usage well, such as providing types for all the functions you need. With others you’re on your own but that’s still possible.

Are C standard library structures compatible between compilers and library versions on macOS or Linux?

My host application took over the ownership of e.g. a FILE object which came from a dynamic library. Can I call fclose() on this object safely even though my host application and the dynamic library are compiled with different versions of clang / gcc?
Background
On Windows (with different VS runtimes) it would be illegal and I have to first extract the fclose() function from the runtime library which is used by the dynamic library since all runtimes have their own pools and internal structures for file or memory objects.
An illustration for the situation in Windows would look like this:
Does this restriction apply for Linux and macOS as well?
The issue is not whether your application and the dynamic libraries were compiled with different versions of clang and/or gcc. The issue is whether, ultimately, there's one underlying C library that manipulates one kind of FILE * object and has one, compatible implementation of fclose().
Under MacOS and Linux, at least, the answer to all these questions is likely to be "yes". In my experience it's hard to get two different, incompatible C libraries into the mix; you'd have to really work at it.
Addendum: I suppose I should admit, however, that my experience may be getting dated. In my experience, on any Unix-like system, there's exactly one C library, generally /lib/libc.{a,so}. But I gather that "modern" compilers are tending to access their own compiler- and version-specific libraries off in special places, meaning that the scenario you're worried about could be a problem. To me, it seems, this way lies madness, but then again, it seems that more and more of the world seems to be embracing dependency hell, rather than trying to eliminate it.
It is not generally safe to use a library designed for one compiler with code compiled by a different compiler. A compiler may generate code that implements the nominal functions in the standard library using internal routines or interfaces, and those routines or interfaces may be different or missing in the library designed for another compiler.
Nor is it safe to take any pointer to some internal data structure from one library and use it with another library.
If the sources are just compiled with different versions of one compiler (e.g., clang 73 and clang 89), not different compilers (e.g., Apple clang versus GCC), the compiler might offer some guarantee about library compatibility. You would have to check its documentation. Or, if the compiler is intended to use the library provided with the operating system, that could work. Again, you would have to check its documentation.
On Linux, if both your code and the other library dynamically link to the same library (such as libc.so.6), both will get the same version and implementation of that library at runtime. You can check which libraries a given dynamic library links to with ldd.
If you were linking to a library that statically linked in a supporting library, you would need to be careful to pass any structures to or from it against the same version of the library. But this is more likely to come up in practice with libc++ and libstdc++ than with libc.
So, don't statically link your library to another and then pass a data structure that requires client code to separately link to the same library.

Does NtDll really export C runtime functions, and can I use these in my application?

I was looking at the NtDll export table on my Windows 10 computer, and I found that it exports standard C runtime functions, like memcpy, sprintf, strlen, etc.
Does that mean that I can call them dynamically at runtime through LoadLibrary and GetProcAddress? Is this guaranteed to be the case for every Windows version?
If so, it is possible to drop the C runtime library altogether (by just using the CRT functions from NtDll), therefore making my program smaller?
There is absolutely no reason to call these undocumented functions exported by NtDll. Windows exports all of the essential C runtime functions as documented wrappers from the standard system libraries, namely Kernel32. If you absolutely cannot link to the C Runtime Library*, then you should be calling these functions. For memory, you have the basic HeapAlloc and HeapFree (or perhaps VirtualAlloc and VirtualFree), ZeroMemory, FillMemory, MoveMemory, CopyMemory, etc. For string manipulation, the important CRT functions are all there, prefixed with an l: lstrlen, lstrcat, lstrcpy, lstrcmp, etc. The odd man out is wsprintf (and its brother wvsprintf), which not only has a different prefix but also doesn't support floating-point values (Windows itself had no floating-point code in the early days when these functions were first exported and documented.) There are a variety of other helper functions, too, that replicate functionality in the CRT, like IsCharLower, CharLower, CharLowerBuff, etc.
Here is an old knowledge base article that documents some of the Win32 Equivalents for C Run-Time Functions. There are likely other relevant Win32 functions that you would probably need if you were re-implementing the functionality of the CRT, but these are the direct, drop-in replacements.
Some of these are absolutely required by the infrastructure of the operating system, and would be called internally by any CRT implementation. This category includes things like HeapAlloc and HeapFree, which are the responsibility of the operating system. A runtime library only wraps those, providing a nice standard-C interface and some other niceties on top of the nitty-gritty OS-level details. Others, like the string manipulation functions, are just exported wrappers around an internal Windows version of the CRT (except that it's a really old version of the CRT, fixed back at some time in history, save for possibly major security holes that have gotten patched over the years). Still others are almost completely superfluous, or seem so, like ZeroMemory and MoveMemory, but are actually exported so that they can be used from environments where there is no C Runtime Library, like classic Visual Basic (VB 6).
It is also interesting to point out that many of the "simple" C Runtime Library functions are implemented by Microsoft's (and other vendors') compiler as intrinsic functions, with special handling. This means that they can be highly optimized. Basically, the relevant object code is emitted inline, directly in your application's binary, avoiding the need for a potentially expensive function call. Allowing the compiler to generate inlined code for something like strlen, that gets called all the time, will almost undoubtedly lead to better performance than having to pay the cost of a function call to one of the exported Windows APIs. There is no way for the compiler to "inline" lstrlen; it gets called just like any other function. This gets you back to the classic tradeoff between speed and size. Sometimes a smaller binary is faster, but sometimes it's not. Not having to link the CRT will produce a smaller binary, since it uses function calls rather than inline implementations, but probably won't produce faster code in the general case.
* However, you really should be linking to the C Runtime Library bundled with your compiler, for a variety of reasons, not the least of which is security updates that can be distributed to all versions of the operating system via updated versions of the runtime libraries. You have to have a really good reason not to use the CRT, such as if you are trying to build the world's smallest executable. And not having these functions available will only be the first of your hurdles. The CRT handles a lot of stuff for you that you don't normally even have to think about, like getting the process up and running, setting up a standard C or C++ environment, parsing the command line arguments, running static initializers, implementing constructors and destructors (if you're writing C++), supporting structured exception handling (SEH, which is used for C++ exceptions, too) and so on. I have gotten a simple C app to compile without a dependency on the CRT, but it took quite a bit of fiddling, and I certainly wouldn't recommend it for anything remotely serious. Matthew Wilson wrote an article a long time ago about Avoiding the Visual C++ Runtime Library. It is largely out of date, because it focuses on the Visual C++ 6 development environment, but a lot of the big picture stuff is still relevant. Matt Pietrek wrote an article about this in the Microsoft Journal a long while ago, too. The title was "Under the Hood: Reduce EXE and DLL Size with LIBCTINY.LIB". A copy can still be found on MSDN and, in case that becomes inaccessible during one of Microsoft's reorganizations, on the Wayback Machine. (Hat tip to IInspectable and Gertjan Brouwer for digging up the links!)
If your concern is just the need to distribute the C Runtime Library DLL(s) alongside your application, you can consider statically linking to the CRT. This embeds the code into your executable, and eliminates the requirement for the separate DLLs. Again, this bloats your executable, but does make it simpler to deploy without the need for an installer or even a ZIP file. The big caveat of this, naturally, is that you cannot benefit to incremental security updates to the CRT DLLs; you have to recompile and redistribute the application to get those fixes. For toy apps with no other dependencies, I often choose to statically link; otherwise, dynamically linking is still the recommended scenario.
There are some C runtime functions in NtDll. According to Windows Internals these are limited to string manipulation functions. There are other equivalents such as using HeapAlloc instead of malloc, so you may get away with it depending on your requirements.
Although these functions are acknowledged by Microsoft publications and have been used for many years by the kernel programmers, they are not part of the official Windows API and you should not use of them for anything other than toy or demo programs as their presence and function may change.
You may want to read a discussion of the option for doing this for the Rust language here.
Does that mean that I can call them dynamically at runtime through
LoadLibrary and GetProcAddress?
yes. even more - why not use ntdll.lib (or ntdllp.lib) for static binding to ntdll ? and after this you can direct call this functions without any GetProcAddress
Is this guaranteed to be the case for every Windows version?
from nt4 to win10 exist many C runtime functions in ntdll, but it set is different. usual it grow from version to version. but some of then less functional compare msvcrt.dll . say for example printf from ntdll not support floating point format, but in general functional is same
it is possible to drop the C runtime library altogether (by just using
the CRT functions from NtDll), therefore making my program smaller?
yes, this is 100% possible.

How does OpenGL function loading work?

I understand it's necessary to "load" functions for opengl by having special functions locate pointers to the function based on the name of the opengl function. I've never seen something similar to this done before and I'm wondering how this works. Where are the functions actually located? How are they retrieved? Why is it done this way?
Where are the functions actually located?
This is implementation dependent. I'd wager that they're stored in a hash table of function names to function pointers. They're still in the shared library, but usually don't have their symbols exposed.
How are they retrieved?
glXGetProcAddress or wglGetProcAddress, depending on the platform. Most libraries that create OpenGL windows (GLFW, SDL) have their own, cross-platform function that uses the above.
Why is it done this way?
I can think of a few reasons: implementations can change what extensions are available without breaking ABI compatibility, and what functions you have access to may depend on the type or version of context you request (ex. for Mesa, post OpenGL 3.1 extensions are only available in a 3.1+ context, not in anything lower).

What is the C runtime library?

What actually is a C runtime library and what is it used for? I was searching, Googling like a devil, but I couldn't find anything better than Microsoft's: "The Microsoft run-time library provides routines for programming for the Microsoft Windows operating system. These routines automate many common programming tasks that are not provided by the C and C++ languages."
OK, I get that, but for example, what is in libcmt.lib? What does it do? I thought that the C standard library was a part of C compiler. So is libcmt.lib Windows' implementation of C standard library functions to work under win32?
Yes, libcmt is (one of several) implementations of the C standard library provided with Microsoft's compiler. They provide both "debug" and "release" versions of three basic types of libraries: single-threaded (always statically linked), multi-threaded statically linked, and multi-threaded dynamically linked (though, depending on the compiler version you're using, some of those may not be present).
So, in the name "libcmt", "libc" is the (more or less) traditional name for the C library. The "mt" means "multi-threaded". A "debug" version would have a "d" added to the end, giving "libcmtd".
As far as what functions it includes, the C standard (part 7, if you happen to care) defines a set of functions a conforming (hosted) implementation must supply. Most vendors (including Microsoft) add various other functions themselves (for compatibility, to provide capabilities the standard functions don't address, etc.) In most cases, it will also contain quite a few "internal" functions that are used by the compiler but not normally by the end user.
The runtime library is basically a collection of the implementations of those functions in one big file (or a few big files--e.g., on UNIX the floating point functions are traditionally stored separately from the rest). That big file is typically something on the same general order as a zip file, but without any compression, so it's basically just some little files collected together and stored together into one bigger file. The archive will usually contain at least some indexing to make it relatively fast/easy to find and extract the data from the internal files. At least at times, Microsoft has used a library format with an "extended" index the linker can use to find which functions are implemented in which of the sub-files, so it can find and link in the parts it needs faster (but that's purely an optimization, not a requirement).
If you want to get a complete list of the functions in "libcmt" (to use your example) you could open one of the Visual Studio command prompts (under "Visual Studio Tools", normally), switch to the directory where your libraries were installed, and type something like: lib -list libcmt.lib and it'll generate a (long) list of the names of all the object files in that library. Those don't always correspond directly to the names of the functions, but will generally give an idea. If you want to look at a particular object file, you can use lib -extract to extract one of those object files, then use dumpbin /symbols <object file name> to find what function(s) is/are in that particular object file.
At first, we should understand what a Runtime Library is; and think what it could mean by "Microsoft C Runtime Library".
see: http://en.wikipedia.org/wiki/Runtime_library
I have posted most of the article here because it might get updated.
When the source code of a computer program is translated into the respective target language by a compiler, it would cause an extreme enlargement of program code if each command in the program and every call to a built-in function would cause the in-place generation of the complete respective program code in the target language every time. Instead the compiler often uses compiler-specific auxiliary functions in the runtime library that are mostly not accessible to application programmers. Depending on the compiler manufacturer, the runtime library will sometimes also contain the standard library of the respective compiler or be contained in it.
Also some functions that can be performed only (or are more efficient or accurate) at runtime are implemented in the runtime library, e.g. some logic errors, array bounds checking, dynamic type checking, exception handling and possibly debugging functionality. For this reason, some programming bugs are not discovered until the program is tested in a "live" environment with real data, despite sophisticated compile-time checking and pre-release testing. In this case, the end user may encounter a runtime error message.
Usually the runtime library realizes many functions by accessing the operating system. Many programming languages have built-in functions that do not necessarily have to be realized in the compiler, but can be implemented in the runtime library. So the border between runtime library and standard library is up to the compiler manufacturer. Therefore a runtime library is always compiler-specific and platform-specific.
The concept of a runtime library should not be confused with an ordinary program library like that created by an application programmer or delivered by a third party or a dynamic library, meaning a program library linked at run time. For example, the programming language C requires only a minimal runtime library (commonly called crt0) but defines a large standard library (called C standard library) that each implementation has to deliver.
I just asked this myself and was hurting my brain for some hours. Still did not find anything that really makes a point. Everybody that does write something to a topic is not able to actually "teach". If you want to teach someone, take the most basic language a person understands, so he does not need to care about other topics when handling a topic. So I came to a conclusion for myself that seems to fit well in all this chaos.
In the programming language C, every program starts with the main() function.
Other languages might define other functions where the program starts. But a processor does not know the main(). A processor knows only predefined commands, represented by combinations of 0 and 1.
In microprocessor programming, not having an underlying operating system (Microsoft Windows, Linux, MacOS,..), you need to tell the processor explicitly where to start by setting the ProgramCounter (PC) that iterates and jumps (loops, function calls) within the commands known to the processor. You need to know how big the RAM is, you need to set the position of the program stack (local variables), as well as the position of the heap (dynamic variables) and the location of global variables (I guess it was called SSA?) within the RAM.
A single processor can only execute one program at a time.
That's where the operating system comes in. The operating system itself is a program that runs on the processor. A program that allows the execution of custom code. Runs multiple programs at a time by switching between the execution codes of the programs (which are loaded into the RAM). But the operating system IS A PROGRAM, each program is written differently. Simply putting the code of your custom program into RAM will not run it, the operating system does not know it. You need to call functions on the operating system that registers your program, tell the operating system how much memory the program needs, where the entry point into the program is located (the main() function in case of C). And this is what I guess is located within the Runtime Library, and explains why you need a special library for each operating system, cause these are just programs themselves and have different functions to do these things.
This also explains why it is NOT dynamically linked at runtime as .dll files are, even if it is called a RUNTIME Library. The Runtime Library needs to be linked statically, because it is needed at startup of your program. The Runtime Library injects/connects your custom program into/to another program (the operating system) at RUNTIME. This really causes some brain f...
Conclusion:
RUNTIME Library is a fail in naming. There might not have been a .dll (linking at runtime) in the early times and the issue of understanding the difference simply did not exist. But even if this is true, the name is badly chosen.
Better names for the Runtime Library could be: StartupLibrary/OSEntryLibrary/SystemConnectLibrary/OSConnectLibrary
Hope I got it right, up for correction/expansion.
cheers.
C is a language and in its definition, there do not need to be any functions available to you. No IO, no math routines and so on. By convention, there are a set of routines available to you that you can link into your executable, but you don't need to use them. This is, however, such a common thing to do that most linkers don't ask you to link to the C runtime libraries anymore.
There are times when you don't want them - for example, in working with embedded systems, it might be impractical to have malloc, for example. I used to work on embedding PostScript into printers and we had our own set of runtime libraries that were much happier on embedded systems, so we didn't bother with the "standard".
The runtime library is that library that is automatically compiled in for any C program you run. The version of the library you would use depends on your compiler, platform, debugging options, and multithreading options.
A good description of the different choices for runtime libraries:
http://www.davidlenihan.com/2008/01/choosing_the_correct_cc_runtim.html
It includes those functions you don't normally think of as needing a library to call:
malloc
enum, struct
abs, min
assert
Microsoft has a nice list of their runtime library functions:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/crt-alphabetical-function-reference?view=msvc-170
The exact list of functions would vary depending on compiler, so for iOS you would get other functions like dispatch_async() or NSLog().
If you use a tool like Dependency Walker on an executable compiled from C or C++ , you will see that one of the the DLLs it is dependent on is MSVCRT.DLL. This is the Microsoft C Runtime Library. If you further examine MSVCRT.DLL with DW, you will see that this is where all the functions like printf(), puts(0, gets(), atoi() etc. live.
i think Microsoft's definition really mean:
The Microsoft implementation of
standard C run-time library provides...
There are three forms of the C Run-time library provided with the Win32 SDK:
* LIBC.LIB is a statically linked library for single-threaded programs.
* LIBCMT.LIB is a statically linked library that supports multithreaded programs.
* CRTDLL.LIB is an import library for CRTDLL.DLL that also supports multithreaded programs. CRTDLL.DLL itself is part of Windows NT.
Microsoft Visual C++ 32-bit edition contains these three forms as well, however, the CRT in a DLL is named MSVCRT.LIB. The DLL is redistributable. Its name depends on the version of VC++ (ie MSVCRT10.DLL or MSVCRT20.DLL). Note however, that MSVCRT10.DLL is not supported on Win32s, while CRTDLL.LIB is supported on Win32s. MSVCRT20.DLL comes in two versions: one for Windows NT and the other for Win32s.
see: http://support.microsoft.com/?scid=kb%3Ben-us%3B94248&x=12&y=9

Resources