So here's my problem. I have a Linux program running on a VM which uses OpenCL via dlopen to execute some commands. About halfway through the program's execution it will sleep, and upon resume it can make no assumptions about any state on the GPU (in fact, the drivers may have been reloaded and the physical GPU may have changed). Before sleeping, dlclose is called, and this does unload the OpenCL memory regions (which is a good thing), but the libraries OpenCL uses (cuda and nvidia libraries in this case) are NOT unloaded. Thus when the program resumes and tries to reinitialize everything, things fail.
So what I'm looking for is a method to effectively unlink/unload the shared libraries that OpenCL was using so it can properly "restart" itself. As a note, the process itself can be paused (stopped) during this transition for as long as needed, but it may not be killed.
Also, assume I'm working with root access and have more or less no restrictions on what files I can modify/touch.
For a less short-term, more hands-on solution, you could try forcing the re-initialization sequence on a loaded library, and/or forcing it to unload.
libdl is just an ordinary library you can copy, modify, and link against instead of the system one. One option is disabling RTLD_NODELETE handling. Another is running the DT_FINI/DT_INIT sequence for a given library.
dl_iterate_phdr, if available, can probably be used to do the same without needing to modify libdl.
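For illustration, here is a minimal sketch of using dl_iterate_phdr to see which driver libraries are still mapped after dlclose; matching on the "cuda"/"nvidia" substrings is just an assumption based on the libraries named above:

    #define _GNU_SOURCE
    #include <link.h>
    #include <stdio.h>
    #include <string.h>

    /* Called once per loaded object; report anything that looks like a
     * CUDA/NVIDIA driver library that dlclose left behind. */
    static int report_remnant(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size; (void)data;
        if (strstr(info->dlpi_name, "cuda") || strstr(info->dlpi_name, "nvidia"))
            printf("still mapped: %s (base %p)\n",
                   info->dlpi_name, (void *)info->dlpi_addr);
        return 0; /* returning non-zero would stop the iteration */
    }

    int main(void)
    {
        dl_iterate_phdr(report_remnant, NULL);
        return 0;
    }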
Poking around in libdl and/or using LD_DEBUG (if supported) may shed some light on why it is not being unloaded in the first place.
If they don't unload, this is either because something is keeping a reference to them, because they have the nodelete flag set, or because the implementation of dlclose does not support unloading them for some other reason. Note that there is no requirement that dlclose ever unload anything:
An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation. When the symbol table handle is closed, the implementation may unload the executable object files that were loaded by dlopen() when the symbol table handle was opened and those that were loaded by dlsym() when using the symbol table handle identified by handle.
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlclose.html
The problem you're experiencing is a symptom of why it's an utterly bad idea to have graphics drivers loaded in the userspace process, but fixing this is both a technical and political challenge (lots of people are opposed to fixing it).
The only short-term practical solution is probably either factoring the code that uses these libraries out into a separate process that you can kill and restart, talking to it via some IPC mechanism, or simply having your whole program serialize its state and re-exec itself on resume.
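Here's a minimal sketch of the first option, assuming a hypothetical run_gpu_worker() that dlopen()s OpenCL and does the GPU work entirely in a child process. When the child exits, the kernel tears down its whole address space, driver libraries and all, so the next fork starts from a clean slate:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical: dlopen()s OpenCL and does the GPU work in the child. */
    extern int run_gpu_worker(void);

    int run_gpu_phase(void)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(run_gpu_worker());   /* child exit wipes every mapping */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

In a real program you would talk to the child over a pipe or socket instead of just collecting its exit status, but the clean-teardown property is the same.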
Related
Suppose that I have a C library that requires initialization and cleanup functions that aren’t thread-safe. Specifically, these functions may invoke other thread-unsafe functions in other libraries. I don’t know (in a default build) which libraries these will be.
Now consider the case of writing Java bindings to this library. Java spawns multiple threads before running any Java code. Worse, in the case of (say) an Eclipse plugin, there could be multiple threads running Java code by the time my code receives control. Some of the other threads could be using the aforementioned unsafe functions.
My current plan is to statically link the C library (in my case, libcurl) and all transitive dependencies – in my case, a TLS library (probably mbedTLS) and (on Windows platforms) the CRT. Fortunately, libcurl cleans up everything it has allocated, so problems related to allocating from one heap and freeing it on another should not arise. Because everything is statically linked, and won’t try to load any other shared libraries, I can then initialize libcurl from a static initializer.
Will this even work? Is there a better way?
Edit: The reason that serializing library calls won’t work, and that I believe my solution might work, is that the global state is stored not only in libcurl itself, but also in the libraries libcurl depends on. Some of these libraries (e.g. OpenSSL) might be in use by other code when my code is loaded, so I would need to lock against the entire process.
The reason I believe that isolating the global state would work is that libcurl (and every library it depends on) is thread safe after initialization. I need to make sure that the initialization of libcurl doesn’t create race conditions. Afterwards I am fine.
[Updated and revised]
Your concern seems to be that you will have both direct and indirect bindings to some native library -- say, mbedTLS -- which requires one-time initialization that is not thread-safe, and that, beyond your ability to detect or control, different threads of the process may concurrently attempt to initialize that library, or perhaps may (unsafely) attempt to initialize it more than once. That certainly seems to be a worst-case scenario.
On the other hand, you postulate that you can successfully build a monolithic, dynamically-loadable library containing the native library you want along with the transitive closure of all its dependencies (outside the kernel), so as to ensure that this library does not share state with any other library loaded by the process. You assert that after a non-thread-safe initialization, the combined stack will be thread safe, at least as you intend to use it. You want to know about how to initialize the library.
Java promises that each class will be initialized by exactly one thread, and that afterward its initialized state will be visible to all threads. Although that does not explicitly address the question, it certainly implies that if the initialization of your native libraries is performed entirely as part of the initialization of a class -- e.g. via a static initializer, as you propose -- then the correct initialized state will be visible to all Java threads. That adequately addresses the problem as I understand it.
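As a sketch of what that could look like on the native side (assuming the statically linked libcurl stack you describe; curl_global_init is libcurl's real one-time initializer): JNI_OnLoad runs during the System.loadLibrary() call made from the static initializer, so it inherits the same once-per-class guarantee.

    #include <jni.h>
    #include <curl/curl.h>

    /* Runs while the JVM executes System.loadLibrary() from the static
     * initializer, i.e. under the class-initialization guarantee above. */
    JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved)
    {
        (void)vm; (void)reserved;
        if (curl_global_init(CURL_GLOBAL_ALL) != 0)
            return JNI_ERR;        /* abort loading the native library */
        return JNI_VERSION_1_8;
    }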
I remain dubious that building the monolithic library is necessary, but if you truly have to deal with the worst-case scenario you seem to anticipate then perhaps it is. Inasmuch as you cannot isolate the library from conflicting demands on the kernel, however, it is conceivable that the strategy will not be sufficient. That would be one of the few conceivable good reasons for a library to rely on the kind of shared state you postulate, and your strategy would thwart that particular purpose. I cannot judge how probable such an eventuality might be, but I doubt it's very likely.
I understand that the dlopen/dlclose/dlsym/dlerror APIs are used to interface to a dynamic loading library, but how does this work for a binary already compiled with the -l/-L flags referring to the .so library (shared library, background/load-time linking)? Edit: I am referring to live updating of a shared library without restarting the app.
Cong Ma provided some great pointers. Here is what I could conclude after some more research:
If an app is linked against an .so, and the shared library is replaced with a new copy without stopping the app, the outcome would be 'non-deterministic' or 'unpredictable'.
During the linking/loading phase of the app, the library gets mapped into the address space of the app/process.
Unix-like OSes use copy-on-write (COW); if a private copy of a page has already been made, the app sees no impact from the replacement, though the objective of getting the app to run the new .so code is not achieved either.
It is all about the memory regions the app is accessing: if the addresses the app accesses have not changed, the app won't experience any impact.
The new version of the library may be an incremental change that happens not to impact the app; there can also be compiler magic in relocatable address generation that masks the change.
You may sometimes be lucky enough to fall into one of the above categories and hit no issues while replacing the .so under a running app, but in practice you will encounter SIGSEGV/SIGBUS etc. most of the time, as addresses and cross-references get jumbled up by any substantial change to the library.
dlopen() mitigates the problem by reducing the window during which the library is accessed. If you dlopen() a library and keep the handle open long enough, you expose your app to the same problem, so the idea is to keep the access window small to make the update scenario fail-proof.
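A sketch of that pattern, with purely illustrative library and symbol names ("libwork.so", "work"): open the library only for the duration of one call, then close it again.

    #include <dlfcn.h>
    #include <stdio.h>

    /* Keep the window in which a library replacement could hurt us small:
     * open, resolve, call, close. */
    int call_work(int arg)
    {
        void *h = dlopen("libwork.so", RTLD_NOW);
        if (!h) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return -1;
        }
        int (*work)(int) = (int (*)(int))dlsym(h, "work");
        int rc = work ? work(arg) : -1;
        dlclose(h);
        return rc;
    }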
I'd say it's possible to reload a dynamic library while an application is running. But to do this you'd have to design such "live reloads" into the library from the start.
Conceptually it's not that difficult.
First, you have to provide a mechanism to signal the application to reload the library. This is actually the easy part, at least in the Unix world: rename or unlink the original library and put the new one in its place, then have the application call dlclose() on the old library and dlopen() on the new. Windows is probably more difficult. On Unix-type systems you can even do the unlink/rename while the application is still running.
In fact, on Unix-type systems you do not want to overwrite the existing library at all, as that will cause you serious problems. The existing on-disk library is the backing store for the pages mapped from that library into the process's address space; physically overwriting the shared library's file contents will again result in undefined behavior, e.g. the OS loads a page in which it expects to find the entry point of foo(), but that address is now halfway through bar(). The unlink or rename is safe because the file contents won't be deleted from disk while they're still mapped by the process.
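In code, the safe swap is just a rename() of a fully written temporary file over the old name (file names illustrative); rename(2) is atomic, and running processes keep their mapping of the old, now-unnamed inode:

    #include <stdio.h>

    /* Write the new library under a temporary name first, then atomically
     * replace the old one; existing mappings of the old inode survive. */
    int swap_library(void)
    {
        return rename("libplugin.so.new", "libplugin.so");
    }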
Second, a library can't be unloaded while it's in use (it's probably possible to unload a library while it's being used on some OSes, but I can't see how that could result in anything other than undefined behavior, so "can't" is de facto correct if not de jure...).
So you have to control access to the library when it's being reloaded. This isn't really all that difficult - have a shim library that presents your API to any application, and use a read/write lock in that library. Each call into the library has to be done with a read lock held on the read/write lock, and the library can only be reloaded when a write lock is held on the read/write lock.
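Here's a minimal sketch of such a shim, assuming an illustrative libimpl.so that exports a single function do_work():

    #include <dlfcn.h>
    #include <pthread.h>
    #include <stddef.h>

    static pthread_rwlock_t lib_lock = PTHREAD_RWLOCK_INITIALIZER;
    static void *lib_handle;
    static int (*impl_do_work)(int);

    /* API calls take the lock shared, so they can run concurrently. */
    int do_work(int arg)
    {
        pthread_rwlock_rdlock(&lib_lock);
        int rc = impl_do_work ? impl_do_work(arg) : -1;
        pthread_rwlock_unlock(&lib_lock);
        return rc;
    }

    /* Reloading takes the lock exclusive: no call can be in flight. */
    int reload_library(const char *path)
    {
        pthread_rwlock_wrlock(&lib_lock);
        if (lib_handle)
            dlclose(lib_handle);
        lib_handle = dlopen(path, RTLD_NOW);
        impl_do_work = lib_handle ? (int (*)(int))dlsym(lib_handle, "do_work")
                                  : NULL;
        pthread_rwlock_unlock(&lib_lock);
        return lib_handle ? 0 : -1;
    }

The read/write lock is what guarantees the "not in use" condition: dlclose can only run once every reader has drained out.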
Those are the easy parts.
The hard part is designing and coding your library so that nothing is lost or corrupted whenever it's reloaded. How you do that depends on your code. But if you make any mistake, more undefined behavior awaits.
Since you have to halt access to the library to reload it anyway, and you can't carry state over through the reload process, it's going to be a LOT easier to just bounce the application.
Don't assume. Read about it... Inside stories on shared libraries and dynamic loading. tl;dr: a shared library is loaded (mostly) before the app is executed, rather than "in the background".
Do you mean "live updating of shared library without restarting the app"? I don't think this is possible without hacking your own loader. See answers to this question.
I want to record synchronization operations, such as locks, semaphores, and barriers, of a multithreaded application, so that I can replay the recorded application later on, for the purpose of debugging.
One way is to supply your own lock, semaphore, condition variable, etc. functions that also do logging, but I think that is overkill, because underneath they must all be built on some common synchronization operations.
So my question is: which synchronization operations should I log so that I need minimal modifications to my program? In other words, which functions or macros in glibc, and which system calls, are all these synchronization operations built on? That way I only modify those for logging and replaying.
The best I can think of is debugging with gdb in 'record' mode:
Gdb process record/replay execution log
According to this page: GDB Process Record threading support is underway, but it might not be complete yet.
Less strictly answering your question, may I suggest
Helgrind
DRD
On other platforms, several other threading checkers exist, but I haven't got much experience with them.
In your case, an effective method of "logging" systems calls on Linux may be to use the LD_PRELOAD trick, and over-ride the actual system calls with your own versions of the calls that will log the use of the call and then forward to the actual system call.
A more extensive example is given here in Linux Journal.
As you can see at these links, the basic gist of the "trick" is that you can make the system load your own dynamic library before any other system libraries, such as pthreads, and then mask the calls to those library functions by placing your own versions of them ahead in symbol-resolution order. Inside your overriding function, you can then log the use of the original function, as well as pass on the arguments to the actual call you're attempting to log.
The nice thing about this method is that it will catch pretty much any call you can make, both functions that remain entirely in userland and functions that result in a kernel call.
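For instance, a minimal interposer for pthread_mutex_lock could look like the following sketch; build it as a shared object and point LD_PRELOAD at it:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>
    #include <stdio.h>

    /* Compile with: gcc -shared -fPIC -o liblog.so log.c
     * Run with:     LD_PRELOAD=./liblog.so ./your_app */
    int pthread_mutex_lock(pthread_mutex_t *m)
    {
        static int (*real_lock)(pthread_mutex_t *);
        if (!real_lock)  /* look up the real symbol we are shadowing */
            real_lock = (int (*)(pthread_mutex_t *))
                        dlsym(RTLD_NEXT, "pthread_mutex_lock");
        fprintf(stderr, "lock %p\n", (void *)m);  /* the "logging" part */
        return real_lock(m);
    }

The dlsym(RTLD_NEXT, ...) lookup is what forwards to the real pthreads implementation after your version has had its chance to log the call.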
So GDB record mode doesn't support multithreading, but the RR record/replay system absolutely does: https://rr-project.org/ .
For a commercial solution with fewer technical restrictions, there's also UDB: https://undo.io/solutions/ .
I've worked on debuggers for some years now, and from what I've seen, the GDB record+replay stuff is really not ready for primetime, for this and other reasons (e.g. slowdown and huge memory requirements).
If you can get it to work in your dev environment, record+replay/reversible debugging can be pretty gamechanging for your workflow; I hope you find a way to leverage it.
If my process has loaded a .so library and a new version of the library becomes available, is it possible to switch to the new library without a process restart? Or does the answer depend on things like whether there is a parameter change to one of the existing functions in the library?
I am working in a pretty big system which runs hundreds of processes, each loading tens of libraries. The libraries provide specific functionality and are supplied by separate teams. So when one of the libraries changes (for a bug fix, let's say), the ideal thing would be to publish it under the hood without impacting the running processes. Is that possible?
EDIT Thanks! In my case, when a new library is available, all the running processes have to start using it. It's not an option to let them run with the old version and pick up the new one later. So it looks like the safer option is to just reload the processes.
You cannot upgrade a linked library on the fly with a process running.
You could even try to, but if you succeed (i.e. you don't fail with a "text file busy" error message), you'll still have to restart the process to make it map the new library into memory.
You can use the lsof command to check which libraries are linked in (at runtime or link time):
lsof -p <process_pid> | grep ' mem '
One interesting technique, although it is somewhat prone to failure in the checkpoint restore step, is to do an invisible restart.
Your server process, or whatever it is, saves all its necessary information into disk files, including the file descriptor numbers and current states. Then the server process makes an exec system call to execute itself, replacing the current version of itself. It then reads its state back from the disk files and resumes serving its file descriptors as if nothing happened.
If all goes well, the restart is invisible and the new process is using all of the updated libraries.
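A sketch of the exec step on Linux (save_state_to_disk() is hypothetical, and /proc/self/exe is Linux-specific); file descriptors without FD_CLOEXEC survive the exec, which is what makes resuming them possible:

    #include <unistd.h>

    /* Hypothetical: serialize fd numbers and server state to disk. */
    extern void save_state_to_disk(void);

    void invisible_restart(char **argv)
    {
        save_state_to_disk();
        execv("/proc/self/exe", argv);  /* replace ourselves in place */
        _exit(1);                       /* reached only if exec failed */
    }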
At the very least, you have to make sure that the interface of the library does not change between versions. If that is assured, then I would try looking into dynamically loading the libraries with dlopen/dlsym and seeing if dlclose allows you to reload.
I've never done any of this myself, but that's the path I'd pursue first. If you go this way, could you publish the results?
Linux provides several dynamic loader interfaces, and a process can load dynamic libraries while running. dlopen and dlsym, provided on Linux, may solve your problem.
If you expect libraries to change on a fairly regular basis, and you expect to maintain uptime, I think that your system should be re-engineered so that such libraries actually become loosely coupled components (e.g. services).
Having said that, my answer to the question is yes: under certain circumstances, it is possible to update shared libraries without restarting processes. In most cases I expect it is not possible, for instance when the API of your library changes, when the arrangement of your data segment changes, or when the library maintains internal threads. The list is quite long.
For very small bug fixes to the code, you can still make use of ptrace to write to the process memory space, and from there redo what /lib/ld-linux.so does in terms of dynamic linking. Honestly, it is an extremely complex activity.
Running ldd on your process's binary is one way to find out what it links against. Although it is theoretically possible, it is not advisable to tinker with the running process, though I am sure utilities exist, such as ksplice, that tinker with running Linux kernels.
You can simply upgrade, and the running process will continue with the old version and pick up the new version when it restarts, assuming that your package management system is good and knows what is compatible to install.
You might want to learn about shared library versioning and the ld -h option.
One way to use it is as follows:
You maintain a version counter in your build system. You build the shared library with:
ld ..... -h mylibrary.so.$VERSION
however, you put it in your dev tree's lib as just plain mylibrary.so. (There's also a hack involving putting the entire .so into a .a file).
Now, at runtime, processes using the library look for the fully-versioned name. To roll out a new version, you just add the new version to the picture. Running programs linked against the old version continue to use it. As programs are relinked and tested against the new one, you roll out new executables.
Sometimes you can upgrade an in-use .so, and sometimes you cannot. This depends mostly on how you try to do it, but also on the safety guarantees of the kernel you're running on.
Don't do this:
cat new.so > old.so
...because eventually, your process may try to demand-page something and find that it's not in the correct spot anymore. It's a problem because the addresses of things may change while it's still the same inode; you're just overwriting the bytes in the file.
However, if you:
mv new.so old.so
You'll be OK on most systems, because your running processes can hold onto a now-unnamed inode for the old library, while new invocations of your processes get the new file.
BUT, some kernels don't like to let you mv an in-use .so, perhaps out of caution, perhaps for their own simplicity.
I apologize in advance for the long-winded question but I wanted to make sure I didn't leave out any key points that may alter your response.
I am responsible for maintaining system software written in C, of which we have a few common '.a' libraries. We have what I would call an "Execution Manager" whose main job is to fork and exec a variable list of "test-job" executables and regain control once each test-job process terminates. All executables, including the Execution Manager, are statically linked against the aforementioned libraries. The Execution Manager and the test-job processes it forks use IPC via shared memory. One of the common libraries contains wrapper functions to create and attach the shared memory with a predefined key that never changes.
A few months ago we locked down our software (test-jobs plus execution manager), compiled them statically and released them to have the test-jobs "proofed". Since that time, there have been some requests for changes to be made to the execution manager which would require changes to a select few common library functions. However, management has decided that they do not want to recompile the test-jobs against the new version of the common libraries because that will require them to re-proof the test-job executables they currently have; and they do not want to expend the money to do that.
Since all executables were statically compiled, I would normally say mixing an Execution Manager with test-jobs statically compiled against different versions of the same common library would not be a problem. However, the inclusion of IPC via shared memory is making me wonder if that is still true. My gut says that as long as there were no changes to the shared memory wrapper functions, especially the key, then we should be OK, but I could really use some expert opinions on this.
Thanks for taking the time to read this, much appreciated.
You should be OK, as long as the data structures and protocols that define how the processes talk to each other over the shared memory have not changed. (That is, your little ABI that exists between the two processes).
I'd like to confirm what caf says. You have correctly identified the crucial bits: the key and the wrapper functions used to access the shared memory. These bits don't care whether they're defined in a .o file, whether they're part of a .a file, or where they are within a .a file. At link time, they get pulled into the exe, keeping their original functionality; their "temporary home" in a .a file doesn't affect how they find the shared memory segment, how they determine the relevant offsets, etc. So if these (i.e., the key and the wrapper functions) haven't changed, you should be set.
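To make those crucial bits concrete, here is a sketch of the kind of wrapper in question (the key value and struct layout are illustrative): as long as both executables were built with the same key and the same layout, which copy of the .a they were linked against doesn't matter.

    #include <stddef.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define TESTJOB_SHM_KEY 0x4a0b0001  /* the predefined, unchanging key */

    /* This layout is the de facto ABI between the processes. */
    struct job_status {
        int job_id;
        int state;
    };

    /* Create (if needed) and attach the segment both sides agree on. */
    void *attach_job_status(void)
    {
        int id = shmget(TESTJOB_SHM_KEY, sizeof(struct job_status),
                        IPC_CREAT | 0600);
        if (id < 0)
            return NULL;
        void *p = shmat(id, NULL, 0);
        return p == (void *)-1 ? NULL : p;
    }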