If my process has loaded a .so library and a new version of the library becomes available, is it possible to switch to the new library without restarting the process? Or does the answer depend on things like whether there is a parameter change to one of the existing functions in the library?
I am working in a pretty big system which runs hundreds of processes, each loading tens of libraries. The libraries provide specific functionality and are provided by separate teams. So when one of the libraries changes (for a bug fix, let's say), the ideal thing would be to publish it under the hood without impacting the running processes. Is it possible?
EDIT Thanks! In my case, when a new library is available all the running processes have to start using it. It's not an option to let them run with the old version and pick up the new one later. So it looks like the safer option is to just reload the processes.
You cannot upgrade a linked library on the fly with a process running.
You can try, and you may even succeed (you won't necessarily get a "text file busy" error), but you will still have to restart the process to make it map the new library into memory.
You can use the lsof command to check which libraries are mapped into the process (whether linked at build time or loaded at runtime):
lsof -p <process_pid> | grep ' mem '
One interesting technique, although it is somewhat prone to failure in the checkpoint restore step, is to do an invisible restart.
Your server process or whatever it is, saves all its necessary information into disk files. Including the file descriptor numbers and current states. Then, the server process does an exec system call to execute itself, replacing the current version of itself. Then it reads its state from the disk files and resumes serving its file descriptors as if nothing happened.
If all goes well, the restart is invisible and the new process is using all of the updated libraries.
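A minimal sketch of the re-exec step, assuming the state has already been serialized (the state-file path and the save_state_to helper are illustrative, not from the original answer):

/* Sketch: write state to disk, then replace the current process image
 * with a fresh copy of ourselves.  Open file descriptors survive the
 * exec unless they are marked close-on-exec. */
#include <stdio.h>
#include <unistd.h>

extern void save_state_to(const char *path);    /* hypothetical serializer */

void invisible_restart(char **argv)
{
    save_state_to("/var/run/myserver.state");   /* illustrative path */

    /* Re-exec ourselves; the fresh image will map whatever versions of
     * the shared libraries are current at that moment. */
    execv("/proc/self/exe", argv);

    perror("execv");                            /* only reached on failure */
}

At startup the program would check for the state file and, if present, restore its descriptors and state from it instead of initializing from scratch.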
At the very least, you have to make sure that the interface of the library does not change between versions. If that is assured, then I would try looking into dynamically loading the libraries with dlopen/dlsym and see if dlclose allows you to re-load.
I've never done any of this myself, but that's the path I'd pursue first. If you go this way, could you publish the results?
Linux provides several dynamic loader interfaces, and a process can load dynamic libraries while running. dlopen and dlsym, provided by Linux, may solve your problem.
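A minimal sketch of that approach, assuming the library exposes a function named do_work (the library path and symbol name are illustrative):

/* Sketch: load a library at runtime, resolve a symbol, call it, then
 * close the handle so a newer copy can be picked up on the next load. */
#include <dlfcn.h>
#include <stdio.h>

int call_do_work(const char *libpath)
{
    void *handle = dlopen(libpath, RTLD_NOW);   /* map the library now */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    int (*do_work)(void) = (int (*)(void))dlsym(handle, "do_work");
    if (!do_work) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int rc = do_work();

    /* Closing the handle allows (but does not force) the loader to unmap
     * the library, so a later dlopen can map a newer version. */
    dlclose(handle);
    return rc;
}

Link with -ldl on older glibc. Whether dlclose actually unmaps the old copy is implementation-defined, so this needs to be tested on the target system.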
If you expect libraries to change on a fairly regular basis, and you expect to maintain up-time, I think that your system should be re-engineered so that such libraries actually become loosely coupled components (e.g. services).

Having said that, my answer to the question is yes: under certain circumstances, it is possible to update shared libraries without restarting processes. In most cases I expect it is not possible, for instance when the API of your library changes, when the arrangement of your data segment changes, or when the library maintains internal threads. The list is quite long.

For very small bug fixes to the code, you can still make use of ptrace to write to the process memory space, and from there redo what /lib/ld-linux.so does in terms of dynamic linking. Honestly, it is an extremely complex activity.
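For illustration only, the write-to-memory primitive such a scheme is built on looks roughly like this (the address and replacement word are placeholders; redoing the relocations on top of it is the genuinely hard part):

/* Sketch: attach to a running process and overwrite a single word of
 * its memory.  This is only the lowest-level building block. */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int poke_word(pid_t pid, unsigned long addr, long new_word)
{
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("PTRACE_ATTACH");
        return -1;
    }
    waitpid(pid, NULL, 0);                      /* wait until the target stops */

    long old = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    printf("old word at %#lx: %#lx\n", addr, old);

    ptrace(PTRACE_POKEDATA, pid, (void *)addr, (void *)new_word);

    ptrace(PTRACE_DETACH, pid, NULL, NULL);     /* resume the target */
    return 0;
}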
Running ldd on the binary of your process is one way to find out. Although it is theoretically possible, it is not advisable to tinker with the running process, though I am sure utilities exist, such as ksplice, that tinker with running Linux kernels.
You can simply upgrade, and the running process will continue with the old version and pick up the new version when it restarts, assuming that your package management system is good and knows what is compatible to install.
You might want to learn about shared library versioning and the ld -h option.
One way to use it is as follows:
You maintain a version counter in your build system. You build the shared library with:
ld ..... -h mylibrary.so.$VERSION
however, you put it in your dev tree's lib as just plain mylibrary.so. (There's also a hack involving putting the entire .so into a .a file).
Now, at runtime, processes using the library look for the fully-versioned name. To roll out a new version, you just add the new version to the picture. Running programs linked against the old version continue to use it. As programs are relinked and tested against the new one, you roll out new executables.
Sometimes you can upgrade an in-use .so, and sometimes you cannot. This depends mostly on how you try to do it, but also on the safety guarantees of the kernel you're running on.
Don't do this:
cat new.so > old.so
...because eventually, your process may try to demand page something, and find that it's not in the correct spot anymore. It's a problem because the addresses of things may change, and it's still the same inode; you're just overwriting the bytes in the file.
However, if you:
mv new.so old.so
You'll be OK on most systems, because your running processes can hold onto a now-unnamed inode for the old library, while new invocations of your processes get the new file.
BUT, some kernels don't like to let you mv an in-use .so, perhaps out of caution, perhaps for their own simplicity.
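A small demonstration of why the rename is safe, assuming files named old.so and new.so exist in the current directory (the names are placeholders):

/* Sketch: a mapping made from "old.so" keeps referring to the old inode
 * even after "new.so" is renamed over that name; only new opens of the
 * name see the new file.  Error checking omitted for brevity. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("old.so", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                  /* the mapping outlives the fd */

    rename("new.so", "old.so");                 /* what mv does: an atomic name swap */

    /* p still reads the old file's pages. */
    printf("first byte of the old mapping: 0x%02x\n", (unsigned char)p[0]);

    munmap(p, st.st_size);
    return 0;
}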
I understand that the dlopen/dlclose/dlsym/dlerror APIs are used to interface with a dynamically loaded library, but how does this work for a binary already compiled with the -L flag referring to the .so library (shared library, background/load-time linking)? Edit: I am referring to live updating of a shared library without restarting the app.
Cong Ma provided some great pointers. Here is what I could conclude after some more research:
If an app is linked to an .so, and the shared library is replaced with a new copy of it, without stopping the app, the outcome would be 'non-deterministic' or 'unpredictable'.
During the 'linking/loading' phase of the app, the library gets mapped to the address space of the app/process.
Unix-like OSes use copy-on-write, so if the relevant pages are copied on write, the app would see no impact for the references it holds, though the objective of being able to use the new .so code would not be achieved.

It is all about the memory region that the app is accessing - if the accessed region does not have a change in address, the app won't experience any impact.

The new version of the lib may be incremental in nature, which might not impact the app - and there can be compilation magic for relocatable address generation too.

You may sometimes be lucky enough to fall into the above categories and have no issues while replacing the .so with a running app referring to it, but in practice you will encounter SIGSEGV/SIGBUS etc. most of the time, as the addresses and cross-references get jumbled up with any considerable change in the library.

dlopen() solves the problem because we reduce the window of access to the library. If you dlopen() and keep the handle open for long enough, you are exposing your app to the same problem. So the idea is to reduce the window of access to fail-proof the update scenario.
I'd say it's possible to reload a dynamic library while an application is running. But to do this you'd have to design such "live reloads" into the library from the start.
Conceptually it's not that difficult.
First, you have to provide a mechanism to signal the application to reload the library. This is actually the easy part - at least in the Unix world: rename or unlink the original library and put the new one in its place, then have the application call dlclose() on the old library and dlopen() on the new. Windows is probably more difficult.

On Unix or Unix-type systems, you can even do the unlink/rename while the application is still running. In fact, on Unix-type systems you do not want to overwrite the existing library at all, as that will cause you serious problems. The existing on-disk library is the backing store for the pages mapped from that library into the process's address space - physically overwriting the shared library's file contents will again result in undefined behavior: the OS loads a page where it expects to find the entry point of the function foo(), but that address is now halfway through the bar() function. The unlink or rename is safe because the file contents won't be deleted from disk while they're still mapped by the process.
Second, a library can't be unloaded while it's in use (it's probably possible to unload a library while it's being used on some OSes, but I can't see how that could result in anything other than undefined behavior, so "can't" is de facto correct if not de jure...).
So you have to control access to the library when it's being reloaded. This isn't really all that difficult - have a shim library that presents your API to any application, and use a read/write lock in that library. Each call into the library has to be done with a read lock held on the read/write lock, and the library can only be reloaded when a write lock is held on the read/write lock.
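A rough sketch of such a shim, assuming the real library exposes a single entry point do_work (the function name and library path are illustrative):

/* Sketch: a shim that guards a dlopen'd handle with a read/write lock.
 * Normal calls take the read lock; reloading takes the write lock and
 * therefore waits for in-flight calls to drain. */
#include <dlfcn.h>
#include <pthread.h>

static pthread_rwlock_t g_lock = PTHREAD_RWLOCK_INITIALIZER;
static void *g_handle;                 /* handle to the real library */
static int (*g_do_work)(int);          /* resolved entry point */

int shim_reload(const char *libpath)
{
    pthread_rwlock_wrlock(&g_lock);    /* blocks until readers finish */
    if (g_handle)
        dlclose(g_handle);             /* drop the old version */
    g_handle = dlopen(libpath, RTLD_NOW);
    g_do_work = g_handle ? (int (*)(int))dlsym(g_handle, "do_work") : NULL;
    pthread_rwlock_unlock(&g_lock);
    return g_do_work ? 0 : -1;
}

int shim_do_work(int arg)              /* the API the application sees */
{
    pthread_rwlock_rdlock(&g_lock);
    int rc = g_do_work ? g_do_work(arg) : -1;
    pthread_rwlock_unlock(&g_lock);
    return rc;
}

Any state the real library keeps internally is lost across the reload, which is exactly the hard part discussed below.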
Those are the easy parts.
The hard part is designing and coding your library so that nothing is lost or corrupted whenever it's reloaded. How you do that depends on your code. But if you make any mistake, more undefined behavior awaits.
Since you have to halt access to the library to reload it anyway, and you can't carry state over through the reload process, it's going to be a LOT easier to just bounce the application.
Don't assume. Read about it... Inside stories on shared libraries and dynamic loading. tl;dr: the shared library is loaded (mostly) before the app is executed, rather than "in the background".
Do you mean "live updating of shared library without restarting the app"? I don't think this is possible without hacking your own loader. See answers to this question.
I want to write software that will detect all used/created/modified/deleted files during the execution of a process (and its child processes). The process has not yet run - the user provides a command line which will later be subprocessed via bash, so we can do things before and after execution, and control the environment the command is run in.
I have thought of four methods so far that could be useful:
Parse the command line to identify files and directories mentioned. Assume all files explicitly mentioned are used. Check directories before/after for created/deleted files. MD5 existing files before/after to see if any are modified. This works on all operating systems and environments, but obviously has serious limitations (it doesn't work when the command is "./script.sh").

Run the process via another process like strace (dtruss for OSX, and there are equivalent Windows programs), which listens for system calls. Parse the output file to find files used/modified/deleted/created. Pro: it's more sensitive than the MD5 method and can deal with script.sh. Con: it's very OS specific (dtruss requires root privileges even if the process being run does not - the outputs of the tools are all different). It could also create huge log files if there are a lot of read/write operations, and will certainly slow things down.

Integrate something similar to the above into the kernel. Obviously still OS specific, but at least now we are calling the shots, creating a common output format for all OSes. It wouldn't create huge log files, and could even stop hooking syscalls to, say, read() after the process has requested the first read() on the file. I think this is what the tool inotify is doing, but I'm not familiar with it at all, nor with kernel programming!

Run the process using the LD_PRELOAD trick (called DYLD_INSERT_LIBRARIES on OSX, not sure if it exists on Windows), which basically overrides any call to open() by the process with our own version of open() which logs what we're opening. Same for write, read, etc. It's very simple to do, and very performant since you're essentially teaching the process to log itself (a sketch of such an interposer follows). The downside is that it only works for dynamically linked processes, and I have no idea of the prevalence of dynamically vs. statically linked programs. I don't even know if it is possible, before execution, to tell whether a process is dynamically or statically linked (with the intention of using this method by default, but falling back to a less-performant method if it's not possible).
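To make the idea concrete, a minimal interposer for open() might look like this (glibc-style RTLD_NEXT assumed; the log format is illustrative). It would be built as a shared object, e.g. gcc -shared -fPIC -o libtrace.so trace.c -ldl, and the target run with LD_PRELOAD pointing at it.

/* Sketch: an LD_PRELOAD interposer that logs every open() call and
 * then forwards it to the real implementation. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {             /* a mode argument is only passed with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    fprintf(stderr, "open(%s, %#x)\n", path, flags);
    return real_open(path, flags, mode);
}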
I need help choosing the optimal path to go down. I have already implemented the first method because it was simple and gave me a way to work on the logging backend (http://ac.gt/log), but really I need to upgrade to one of the other methods. Your advice would be invaluable :)
Take a look at the source code of "strace" (and its -f option to trace children). It does basically what you are trying to do. It captures all the system calls of the process (or its children) so you can grep for operations like "open", etc.
The following link provides some examples of implementing your own strace by using the ptrace system call:
https://blog.nelhage.com/2010/08/write-yourself-an-strace-in-70-lines-of-code/
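A stripped-down version of what that article builds, assuming x86-64 Linux (register names and syscall numbers differ on other architectures); it reports each syscall twice, once at entry and once at exit:

/* Sketch: run a command under ptrace and print the number of every
 * system call it makes. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/reg.h>                   /* ORIG_RAX */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);      /* let the parent trace us */
        execvp(argv[1], argv + 1);
        return 127;
    }

    int status;
    waitpid(child, &status, 0);                     /* stopped at the exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to the next syscall stop */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            long nr = ptrace(PTRACE_PEEKUSER, child, sizeof(long) * ORIG_RAX, NULL);
            fprintf(stderr, "syscall %ld\n", nr);   /* e.g. 257 == openat on x86-64 */
        }
    }
    return 0;
}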
So here is my problem. I have a Linux program running on a VM which uses OpenCL via dlopen to execute some commands. About halfway through the program's execution it will sleep, and upon resume it can make no assumptions about any state on the GPU (in fact, the drivers may have been reloaded and the physical GPU may have changed). Before sleeping, dlclose is called and this does unload the OpenCL memory regions (which is a good thing), but the libraries OpenCL uses (the cuda and nvidia libraries in this case) are NOT unloaded. Thus when the program resumes and tries to reinitialize everything again, things fail.
So what I'm looking for is a method to effectively unlink/unload the shared libraries that OpenCL was using so it can properly "restart" itself. As a note, the process itself can be paused (stopped) during this transition for as long as needed, but it may not be killed.
Also, assume I'm working with root access and have more or less no restrictions on what files I can modify/touch.
For a less short-term practical solution, you could probably try forcing re-initialization sequence on a loaded library, and/or force unloading.
libdl is just an ordinary library you can copy, modify, and link against instead of using the system one. One option is disabling RTLD_NODELETE handling. Another is running DT_FINI/DT_INIT sequence for a given library.
dl_iterate_phdr, if available, can probably be used to do the same without the need to modify libdl.
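For reference, walking the currently mapped objects with dl_iterate_phdr looks like this (a glibc extension, so _GNU_SOURCE is needed):

/* Sketch: list every shared object currently mapped into this process. */
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

static int show_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    (void)data;
    printf("%s loaded at base %#lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main program)",
           (unsigned long)info->dlpi_addr);
    return 0;                          /* keep iterating */
}

int main(void)
{
    dl_iterate_phdr(show_object, NULL);
    return 0;
}

Running something like this before and after the dlclose would at least confirm whether the cuda/nvidia libraries are still mapped.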
Poking around in libdl and/or using LD_DEBUG (if supported) may shed some light on why it is not being unloaded in the first place.
If they don't unload, this is either because something is keeping a reference to them, or they have the nodelete flag set, or because the implementation of dlclose does not support unloading them for some other reason. Note that there is no requirement that dlclose ever unload anything:
An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation. When the symbol table handle is closed, the implementation may unload the executable object files that were loaded by dlopen() when the symbol table handle was opened and those that were loaded by dlsym() when using the symbol table handle identified by handle.
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlclose.html
The problem you're experiencing is a symptom of why it's an utterly bad idea to have graphics drivers loaded in the userspace process, but fixing this is both a technical and political challenge (lots of people are opposed to fixing it).
The only short-term practical solution is probably either factoring out the code that uses these libraries as a separate process that you can kill and restart, and talking to it via some IPC mechanism, or simply having your whole program serialize its state and re-exec itself on resume.
The term has several definitions according to Wikipedia, however what I'm really interested in is creating a program that has all its needed dependencies included within the source folder, so the end user doesn't need to install additional libraries in order to install the app. For example, the way Mac apps have all their dependencies within the application bundle itself...

Or is there a feature of autotools that does this? I'm programming in the Linux environment...
Are you talking about the source code of your application, or about your application binary?
The answer I'd give for both the cases depends on what libraries you're using.
If you're using libraries that you can find anywhere, that are somehow standard and/or that are quite big, you shouldn't attach them to your application, just require them both to build and to run your application.
Anyway don't be too concerned about your source code: few people will build your application, and they probably know something about programming and how a Linux system works; it won't be a big deal to require many (also not-so-common) dependencies to build your application.

As for the binary version, it could be a little more problematic, since it will be used by end users who often don't know anything about libraries and programming stuff: you could choose to statically link the smallest and most uncommon libraries into your binary, in order to have fewer dependencies.
You could do it, if you link statically, but it'd be somewhat unusual, and depending on what your program is supposed to do, you might be limiting yourself.
The alternative, if this is not just a one-off project, is to create a Linux Standard Base compatible RPM package and restrict yourself to linking against the libraries and symbols that LSB defines.
Run ldd on your program to discover all dependencies, then copy these to your directory, and add a program-wrapper script that issues
#!/bin/sh
LD_LIBRARY_PATH="${0##*/}:$LD_LIBRARY_PATH" exec "${0##*/}/real-program" "$#";
Duplicating the Mac OS X .app behavior on a plain POSIX system is difficult because it is very hard to guarantee that a process can find its own executable (there are several ways that will almost always work...). Mac OS X provides an OS service for this, but Linux (for instance) does not.
Once you've accomplished that feat, this becomes possible. Though, as others have mentioned, it loses the ability to share resource demands (disk space, RAM space, cache space) with other programs that use the same libraries because you'd be using static copies, or dynamically loading your own copy from the .app-like bundle.
I apologize in advance for the long-winded question but I wanted to make sure I didn't leave out any key points that may alter your response.
I am responsible for maintaining system software written in 'C' of which we have a few common '.a' libraries. We have what I would call an "Execution Manager" whose main job is to fork and exec a variable list of "test-job" executables and return control back to the Execution Manager once the test-job process terminates. All executables, including the execution manager, are statically linked against the aforementioned libraries. The execution manager and the test-job processes it forks use IPC via shared memory. One of the common libraries contains wrapper functions to create and attach the shared memory with a predefined key that never changes.
A few months ago we locked down our software (test-jobs plus execution manager), compiled them statically and released them to have the test-jobs "proofed". Since that time, there have been some requests for changes to be made to the execution manager which would require changes to a select few common library functions. However, management has decided that they do not want to recompile the test-jobs against the new version of the common libraries because that will require them to re-proof the test-job executables they currently have; and they do not want to expend the money to do that.
Since all executables were statically compiled, I would normally say mixing an Execution Manager with test-jobs statically compiled against different versions of the same common library would not be a problem. However, the inclusion of IPC via shared memory is making me wonder if that is still true. My gut says that as long as there were no changes to the shared memory wrapper functions, especially the key, then we should be OK, but I could really use some expert opinions on this.
Thanks for taking the time to read this, much appreciated.
You should be OK, as long as the data structures and protocols that define how the processes talk to each other over the shared memory have not changed. (That is, your little ABI that exists between the two processes).
I'd like to confirm what caf says. You have correctly identified the crucial bits: the key and the wrapper functions used to access the shared memory. These bits don't care whether they're defined in a .o file, whether they're part of a .a file, or where they are within a .a file. At link time, they get pulled into the exe, keeping their original functionality; their "temporary home" in a .a file doesn't affect how they find the shared memory segment, how they determine the relevant offsets, etc. So if these (i.e., the key and the wrapper functions) haven't changed, you should be set.
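To make the point concrete, the pieces that actually have to stay compatible look roughly like this (the key value, permissions, and struct layout here are illustrative, not your actual code):

/* Sketch: a wrapper that creates/attaches a shared memory segment with a
 * fixed key.  As long as the key, the segment size, and the layout of
 * shared_state stay the same, executables built against different copies
 * of the common library will still interoperate. */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_KEY 0x4A0B0001             /* the predefined key that never changes */

struct shared_state {                  /* the de-facto ABI between the processes */
    int  job_status;
    char message[256];
};

void *attach_shared_state(void)
{
    int id = shmget(SHM_KEY, sizeof(struct shared_state), IPC_CREAT | 0666);
    if (id == -1)
        return NULL;

    void *p = shmat(id, NULL, 0);
    return p == (void *)-1 ? NULL : p;
}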