Libcurl and curl_global_init in shared library loaded at runtime - c

I am developing a photo booth application that uses 3 modules to provide printing, capturing, and triggering functionality. The idea is that people can develop modules for it that extend this functionality. These modules are implemented as shared libraries that are loaded at runtime when the user clicks "start".
I am trying to implement a printer module that "prints" to a Facebook image gallery. I want to use libcurl for this. My problem is with the initialization function, curl_global_init(). The libcurl API documentation states that this function is absolutely not thread safe. From the docs:
This function is not thread safe. You must not call it when any other thread in the program (i.e. a thread sharing the same memory) is running. This doesn't just mean no other thread that is using libcurl. Because curl_global_init() calls functions of other libraries that are similarly thread unsafe, it could conflict with any other thread that uses these other libraries.
Elsewhere in the documentation it says:
The global constant situation merits special consideration when the code you are writing to use libcurl is not the main program, but rather a modular piece of a program, e.g. another library. As a module, your code doesn't know about other parts of the program -- it doesn't know whether they use libcurl or not. And its code doesn't necessarily run at the start and end of the whole program.
A module like this must have global constant functions of its own, just like curl_global_init() and curl_global_cleanup(). The module thus has control at the beginning and end of the program and has a place to call the libcurl functions.
...which seems to address the issue. However, this seems to imply that my module's init() and finalize() functions would be called at the program's beginning and end. Since the modules are designed to be swappable at runtime, there is no way I can do this. Even if I could, my application uses GLib, and per its documentation it is never safe to assume that no threads are running:
...Since version 2.32, the GLib threading system is automatically initialized at the start of your program, and all thread-creation functions and synchronization primitives are available right away.
Note that it is not safe to assume that your program has no threads even if you don't call g_thread_new() yourself. GLib and GIO can and will create threads for their own purposes...
My question is: is there any way to safely call curl_global_init() in my application? Can I put the calls to curl_global_init() and curl_global_cleanup() in my module's init() and finalize() functions? Do I need to find another HTTP library?

First, you won't really find any other library without these restrictions, since libcurl inherits them from the third-party libraries it uses (mostly SSL libraries, for example OpenSSL).
That said, the thread-safety situation for curl_global_init() is very unfortunate and something we (in the curl project) really strongly dislike but cannot do much about as long as we use those other libraries. This also means that the exact situation for you depends on exactly which dependency libraries your libcurl is built to use.
You will in most situations be perfectly fine calling curl_global_init() from your module's init() function the way you suggest. I can't guarantee this to be safe with 100% certainty, of course, since there are a few unknowns here that I cannot speak to.
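A minimal sketch of what that could look like, assuming the module exposes init() and finalize() entry points (called printer_module_init and printer_module_finalize here; both names are hypothetical). The backend_global_init and backend_global_cleanup functions are stand-ins so that the sketch builds without libcurl; in a real module they would wrap curl_global_init(CURL_GLOBAL_DEFAULT) and curl_global_cleanup(). The mutex and counter only serialize callers of this module. They do nothing about other threads that happen to be inside OpenSSL or another dependency at the same moment, which is the part that cannot be guaranteed.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins (assumption): a real module would call
 *   curl_global_init(CURL_GLOBAL_DEFAULT)  and  curl_global_cleanup()
 * here. The counters exist only so the behaviour can be observed. */
static int backend_init_calls = 0;
static int backend_cleanup_calls = 0;

static int backend_global_init(void)
{
    backend_init_calls++;
    return 0;                       /* 0 means success in this sketch */
}

static void backend_global_cleanup(void)
{
    backend_cleanup_calls++;
}

/* Serializes init/finalize among users of this module only. */
static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned module_users = 0;

/* Hypothetical module entry point: returns 0 on success. */
int printer_module_init(void)
{
    int rc = 0;

    pthread_mutex_lock(&module_lock);
    if (module_users == 0)
        rc = backend_global_init(); /* first user performs the global init */
    if (rc == 0)
        module_users++;
    pthread_mutex_unlock(&module_lock);
    return rc;
}

/* Hypothetical module exit point: the last user performs the cleanup. */
void printer_module_finalize(void)
{
    pthread_mutex_lock(&module_lock);
    assert(module_users > 0);       /* finalize without a matching init */
    if (--module_users == 0)
        backend_global_cleanup();
    pthread_mutex_unlock(&module_lock);
}
```

With this shape, swapping the module out and back in at runtime repeats the init and cleanup pair, while nested or repeated loads share a single global initialization.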

Related

Isolating thread-unsafe initialization functions

Suppose that I have a C library that requires initialization and cleanup functions that aren’t thread-safe. Specifically, these functions may invoke other thread-unsafe functions in other libraries. I don’t know (in a default build) which libraries these will be.
Now consider the case of writing Java bindings to this library. Java spawns multiple threads before running any Java code. Worse, in the case of (say) an Eclipse plugin, there could be multiple threads running Java code by the time my code receives control. Some of the other threads could be using the aforementioned unsafe functions.
My current plan is to statically link the C library (in my case, libcurl) and all transitive dependencies – in my case, a TLS library (probably mbedTLS) and (on Windows platforms) the CRT. Fortunately, libcurl cleans up everything it has allocated, so problems related to allocating from one heap and freeing it on another should not arise. Because everything is statically linked, and won’t try to load any other shared libraries, I can then initialize libcurl from a static initializer.
Will this even work? Is there a better way?
Edit: The reason that serializing library calls won’t work, and that I believe my solution might work, is that the global state is stored not only in libcurl itself, but also in the libraries libcurl depends on. Some of these libraries (e.g. OpenSSL) might be in use by other code when my code is loaded. So I would need to lock against the entire process.
The reason I believe that isolating the global state would work is that libcurl (and every library it depends on) is thread safe after initialization. I need to make sure that the initialization of libcurl doesn’t create race conditions. Afterwards I am fine.
[Updated and revised]
Your concern seems to be that you will have both direct and indirect bindings to some native library (say, mbedTLS), that this native library requires one-time initialization that is not thread-safe, and that, beyond your ability to detect or control, different threads of the process may concurrently attempt to initialize that library, or may (unsafely) attempt to initialize it more than once. That certainly seems to be a worst-case scenario.
On the other hand, you postulate that you can successfully build a monolithic, dynamically-loadable library containing the native library you want along with the transitive closure of all its dependencies (outside the kernel), so as to ensure that this library does not share state with any other library loaded by the process. You assert that after a non-thread-safe initialization, the combined stack will be thread safe, at least as you intend to use it. You want to know about how to initialize the library.
Java promises that each class will be initialized by exactly one thread, and that afterward its initialized state will be visible to all threads. Although that does not explicitly address the question, it certainly implies that if the initialization of your native libraries is performed entirely as part of the initialization of a class -- e.g. via a static initializer, as you propose -- then the correct initialized state will be visible to all Java threads. That adequately addresses the problem as I understand it.
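If the native initializer can also be reached from C code, and not only from the Java static initializer, the same exactly-once guarantee is available on the native side through pthread_once. The sketch below assumes a POSIX platform. The names ensure_native_init and native_global_init are made up for illustration, and native_global_init is only a stand-in for the real call (curl_global_init in your case).

```c
#include <pthread.h>

static pthread_once_t native_once = PTHREAD_ONCE_INIT;
static int native_init_runs = 0;    /* only here to observe the behaviour */
static int native_init_status = -1; /* result of the one-time initializer */

/* Stand-in (assumption) for the real, non-thread-safe initializer. */
static void native_global_init(void)
{
    native_init_runs++;
    native_init_status = 0;
}

/* Safe to call from any thread, any number of times. The initializer
 * runs exactly once, and its effects are visible to every caller that
 * returns from pthread_once(). Returns 0 on success. */
int ensure_native_init(void)
{
    int rc = pthread_once(&native_once, native_global_init);
    return rc != 0 ? rc : native_init_status;
}
```

As with the static initializer, this only protects against races between callers of your own library. It relies on the statically linked stack not sharing state with anything else in the process.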
I remain dubious that building the monolithic library is necessary, but if you truly have to deal with the worst-case scenario you seem to anticipate then perhaps it is. Inasmuch as you cannot isolate the library from conflicting demands on the kernel, however, it is conceivable that the strategy will not be sufficient. That would be one of the few conceivable good reasons for a library to rely on the kind of shared state you postulate, and your strategy would thwart that particular purpose. I cannot judge how probable such an eventuality might be, but I doubt it's very likely.

How to prevent a dlopened library from using certain libc functions?

I'm writing a Linux/Unix program that has a lot of implementation in plugins that are dlopened by the program on-demand.
I'd like to prevent these plugin libraries from using some libc functions that mess with global state of the host process (such as manipulating signal handlers and suchlike).
What would be the best way to do this?
As far as I know I can't employ the classical LD_PRELOAD trick here since the libs are dlopened.
In practical terms, you can't. Code running from a library runs with the full privileges of the host application. Don't load libraries that you don't trust to not do stupid things.
You could conceivably examine the library before loading it and (for instance) reject libraries which have unexpected dependencies, or which have relocations for functions which they shouldn't be using. (This could be accomplished using ldd or readelf, for instance.) However, this will never be entirely reliable; there are numerous ways that a malicious library could hide its use of various functions.

Use functions from a different program

I'm trying to make Wireshark call a function from a different program. These two programs are independent of each other. Is there any way of linking these two programs so that Wireshark can call a function from inside the second program?
I was thinking of adding the #include to the top of the code of the file which has the required functions. Would this be possible? (I'll be trying it in a while since VS2013 is currently installing.)
Is there any other way of making this possible?
There usually isn't a way for programA to call a function found in a separate executable programB. You have a variety of options — the main ones are:
The normal method is to make the function available in a shared library (DLL), and for both programs to use the DLL to call the function. The DLL might be linked at compile time or loaded at runtime. A similar technique puts the function in a static library that is linked with the executables.
The less common method is to create an RPC (remote procedure call) interface to the function and have both programs (or, at least, programA) use the RPC interface. There are many options for which RPC system to use.
You can also think of more esoteric techniques, such as exposing the function as a web service.
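For the shared-library option with loading at runtime, a sketch on a POSIX system might look like the following. It uses dlopen and dlsym; on Windows the corresponding calls are LoadLibrary and GetProcAddress. The function name call_unary_from_library and the double (*)(double) signature are assumptions made for illustration, not anything taken from Wireshark.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature of the function we want to borrow. */
typedef double (*unary_fn)(double);

/* Load the shared library at `path`, look up `symbol`, call it with
 * `arg`, and store the result in `*out`. Returns 0 on success and -1
 * if the library or the symbol cannot be found. */
int call_unary_from_library(const char *path, const char *symbol,
                            double arg, double *out)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    unary_fn fn = NULL;
    /* POSIX-recommended idiom for turning dlsym's void * into a
     * function pointer without a direct cast. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = fn(arg);
    dlclose(handle);
    return 0;
}
```

On a glibc-based Linux system, for example, call_unary_from_library("libm.so.6", "cos", 0.0, &result) should load the math library and call cos. Older toolchains may need -ldl at link time.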

Testing a C function which uses file descriptors

I am writing some functions which will be called with file descriptor arguments in production code.
During testing, how can I 'inject' something which will let me confirm that the function makes the intended calls to lseek, write and so on?
Since you're on Linux, you can simply define the functions you want to stub inside your test program. The linker resolves the calls to the definitions in your test program, so the versions in the dynamically loaded libc are not used.
I used this successfully on Linux and Solaris with gcc.
Make sure the stubs store the parameters they are invoked with, and do not put assertions inside the stub functions themselves. This makes the stubs more reusable.
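A small sketch of this approach, assuming Linux with a dynamically linked libc. The function under test (append_record) and the stub_log structure are hypothetical names chosen for the example. The stubs only record their arguments; the checks belong in the test itself.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical function under test: append a buffer at the end of fd. */
int append_record(int fd, const void *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_END) == (off_t)-1)
        return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Everything the stubs observed, for the test to inspect afterwards. */
struct stub_log {
    int lseek_calls;
    int lseek_fd;
    off_t lseek_offset;
    int lseek_whence;
    int write_calls;
    int write_fd;
    const void *write_buf;
    size_t write_len;
};

struct stub_log stub_log;

/* Stub: defined in the test program, so calls bind here, not to libc. */
off_t lseek(int fd, off_t offset, int whence)
{
    stub_log.lseek_calls++;
    stub_log.lseek_fd = fd;
    stub_log.lseek_offset = offset;
    stub_log.lseek_whence = whence;
    return 0;                      /* pretend the seek succeeded */
}

/* Stub: records the arguments and pretends everything was written. */
ssize_t write(int fd, const void *buf, size_t count)
{
    stub_log.write_calls++;
    stub_log.write_fd = fd;
    stub_log.write_buf = buf;
    stub_log.write_len = count;
    return (ssize_t)count;
}
```

A test would call append_record() with any descriptor number, since no real file is touched, and then compare the fields of stub_log against the expected calls.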
Depending on your operating system, the best solution is likely to be writing a "shim" library that gets dynamically linked in and intercepts the calls to the standard functions you're looking for, reporting out-of-band to the test harness. The libtrash library is a good example of how this works, and the code is readable; it implements a "trash can" for Linux by intercepting (some) calls to unlink and instead moving the links to a trash-can directory.

Catching a system call just before control enters a shared library

I have wrapped a number of system call functions such as write() and open(), and LD_PRELOAD is used to override the original system calls. I have also defined a few more functions and built all of this as a shared library.
I would like to catch all system calls from different application processes to these shared libraries before they enter the shared library. How can I do that?
Thanks
LD_PRELOAD is not necessarily a good way to interpose system calls, because a) it only allows you to intercept library calls and b) it only allows you to intercept library calls. ;)
A) While system calls are generally wrapped by the shared libc on your system, nothing prevents you from issuing a system call yourself, e.g. by setting up the right register contents and then issuing INT 0x80 on an x86 system. If the program you're interested in does so, you'll never catch those calls with LD_PRELOAD-based libc interposition.
B) While most programs use the shared libc on your system to make system calls, applications are sometimes linked statically, which means the libc code is part of the application and does not come from the shared library. In such cases, LD_PRELOAD does not help either.
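To illustrate point A, here is a sketch of a program writing to a file descriptor without going through libc's write() wrapper. It assumes Linux and uses the generic syscall() function rather than hand-written assembly; raw_write is a hypothetical name. An LD_PRELOAD library that only overrides write() would not see this call. It could override syscall() as well, but a program using inline assembly would escape even that.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                /* for the syscall() declaration */
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write `len` bytes from `buf` to `fd` by system call number, bypassing
 * the write() symbol that an LD_PRELOAD shim would typically wrap.
 * Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```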
A comment already suggested using strace/ltrace. My more general advice would be to have a look at ptrace(), which both of these tools use and which should give you what you want without the need to modify the kernel.
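As a rough idea of the ptrace() route, the sketch below forks a child, traces it, and counts the syscall stops it produces. It assumes Linux and that the environment permits ptrace on child processes; count_syscall_stops is a hypothetical name. Each system call normally produces one stop on entry and one on exit. A real tool would additionally read the registers at each stop (for example with PTRACE_GETREGS on x86) to find out which call was made.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap the child after a tracing error. */
static long abandon_child(pid_t child)
{
    int status;
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}

/* Returns the number of syscall stops seen in the child, or -1 on error. */
long count_syscall_stops(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop until the parent is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        (void)getppid();           /* a system call for the tracer to see */
        _exit(0);
    }

    int status;
    long stops = 0;

    /* Wait for the child's initial SIGSTOP. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Flag syscall stops with bit 0x80 to tell them apart from signals. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        return abandon_child(child);

    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            return abandon_child(child);
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                 /* child is gone, tracing is over */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }
    return stops;
}
```

Because the tracer sits at the kernel boundary, this sees calls made through libc, through raw instructions, and from statically linked programs alike.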
Patch-free User-level Link-time intercepting of system calls and interposing on library functions may do the trick but I have not tested it.
I'm pretty sure the only way you can do this is by modifying the system call table. HIDS systems (such as Samhain) will report this as an intrusion and Linux kernel developers frown upon this, heavily. The implementation details are very specific to the OS (i.e. what works on FreeBSD won't necessarily work on Linux), but the general implementation details are going to be the same. A kernel module might be a better way to go with cleaner, more standardized APIs.
