Remap shared library on a child process - linker

The parent forks a child.
The child inherits the parent's shared library libfoo.so.
If I look at the child process's memory map for libfoo.so, the virtual mappings are the same as the parent's (as they should be, since fork() was used).
I would like to remap the child's shared library (libfoo.so) to a different virtual mapping (which is of course different from the parent's).
Do you have an idea, or an off-the-shelf solution for it?
Is it possible?

It's generally impossible. The dynamic loader loads and relocates libraries at program startup, and you shouldn't attempt to call it directly from running code.
Even if you could somehow invoke the dynamic loader, it would want to start clean (meaning it would want to relocate itself, but it's already relocated…). It would also want to load and relocate all libraries, not just one of them.
So, simply speaking, there's no interface for doing this in Glibc and probably every other C library.
Then, there are also different formats of relocation information. On x86 and x86-64 it's RELA, meaning the relocations have explicit addends in a separate table, so it's possible (in principle) to edit the addresses again. But on ARM the relocations are REL, meaning the addends are implicit and are lost (overwritten) once the relocation is done.
You should look at dlopen and co. — maybe they'll fit your purpose.

Related

Is it possible to share a data section (variables) between a twice-`dlopen`ed shared library in the same process? [duplicate]

I can't seem to find an answer after searching for this on the net.
When I use dlopen the first time it seems to take longer than any time after that, including if I run it from multiple instances of a program.
Does dlopen load up the so into memory once and have the OS save it so that any following calls even from another instance of the program point to the same spot in memory?
So basically does 3 instances of a program running a library mean 3 instances of the same .so are loaded into memory, or is there only one instance in memory?
Thanks
Does dlopen load up the so into memory once and have the OS save it so that any following calls even from another instance of the program point to the same spot in memory?
Multiple calls to dlopen from within a single process are guaranteed to not load the library more than once. From the man page:
If the same shared object is loaded again with dlopen(), the same object handle is returned. The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it.
When the first call to dlopen happens, the library is mmaped into the calling process. There are usually at least two separate mmap calls: the .text and .rodata sections (which usually reside in a single RO segment) are mapped read-only, the .data and .bss sections are mapped read-write.
A subsequent dlopen from another process performs the same mmaps. However the OS does not have to load any of the read-only data from disk -- it merely increments reference counts on the pages already loaded for the first dlopen call. That is the sharing in "shared library".
So basically does 3 instances of a program running a library mean 3 instances of the same .so are loaded into memory, or is there only one instance in memory?
Depends on what you call an "instance".
Each process will have its own set of (dynamically allocated) runtime loader structures describing this library, and each set will contain an "instance" of the shared library (which can be loaded at a different address in each process). Each process will also have its own instance of writable data (which uses copy-on-write semantics). But the read-only mappings will all occupy the same physical memory (though they can appear at different addresses in each of the processes).

Loading shared library twice

I'm trying to load a shared library in C twice:
lib1 = dlopen("mylib.so", RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
lib2 = dlopen("mylib.so", RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
What I want is that lib1 and lib2 have separate address spaces so that they can do different things. Currently, the only way I can achieve this is by copying mylib so that the code looks like this:
lib1 = dlopen("mylib.so", RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
lib2 = dlopen("mylib2.so", RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
In a limited scope this works fine for me. However, I have an application which uses the library an arbitrary number of times, which makes copying the library cumbersome.
Is there a better way to have a separate address space for each time the library is loaded?
EDIT:
I want to load the library multiple times as my application is processing a kind of message queue. The items in the message queue refer to the name of a shared library (e.g. mylib) and contain a set of data that shall be processed by the library. I want to process the MQ in a multithreading environment, running each call to the library's method in its own thread.
As long as the MQ contains the call to a library only once, everything is working as expected. However, when I have two items that use the same library, things start to get weird.
You need to use dlmopen to achieve this sort of isolation:
// No need for RTLD_LOCAL, not sure about RTLD_DEEPBIND
lib1 = dlmopen (LM_ID_NEWLM, "mylib.so", RTLD_LAZY | RTLD_DEEPBIND);
The whole idea of dynamically loading code is that you are thus able to share it, in particular with other processes. For that reason, I don't think that it is possible to really load the library twice.
There are ways around this though. One may be to fool the dynamic linker into loading it a second time. Copying the library is one way that you found already. I could imagine hard links working, too.
However, I think it would be better if you worked with the flow here. I see two ways to achieve what I guess is your goal: Forking a separate process or creating a separate init function.
For the separate process, you just fork(), after setting up appropriate IPC mechanisms between parent and child instead of loading the library a second time. Since the fork creates a new process, it receives its own memory space and things remain separate. As IPC I'd suggest using some kind of middleware, like ZeroMQ, dbus or XMLRPC.
Creating a separate init function is the other option. For that, instead of keeping the library's state in globals, you gather it into a structure. Then, in the init function, you create an instance of that structure, set it up, and return its address. All other functions, which previously operated on the global state, now receive the address of that structure as an additional (customarily first) parameter. Instead of loading the library twice, you simply call the init function twice to set up separate environments.
Is there a better way to have a separate address space for each time the library is loaded?
Actually, a virtual address space belongs to a process (so to all threads inside it), not to a shared library (which uses several segments of that virtual address space).
For a process of pid 1234, use pmap(1) (as pmap 1234) or proc(5) (e.g. try cat /proc/1234/maps ...)
You really should avoid dlopen(3)-ing the same shared library "twice" (and this is made difficult on purpose; you could create several symlinks to the same shared object and dlopen each of them, but you should not do this, for example because the static data would be loaded twice, with messy consequences). To prevent this from happening, the dynamic loader uses reference counting techniques...
Read also Drepper's How to Write Shared Libraries
Is there a better way to have a separate address space for each time the library is loaded?
You then need different processes, each having its own virtual address space. You'll use inter-process communication : see pipe(7), fifo(7), socket(7), unix(7), shm_overview(7), sem_overview(7) etc...

What and where exactly is the loader?

I understand every bit of the C compilation process (how the object files are linked to create the executable). But about the loader itself (which starts the program running) I have a few doubts.
Is the loader part of the kernel?
How exactly is the ./firefox or some command like that loaded? I mean you normally type such commands into the terminal which loads the executable I presume. So is the loader a component of the shell?
I think I'm also confused about where the terminal/shell fits into all of this and what its role is.
The format of an executable determines how it will be loaded. For example, executables with "#!" as the first two characters are loaded by the kernel by executing the named interpreter and feeding the file to it as the first argument. If the executable is formatted as a PE, ELF, or MachO binary, then the kernel uses an interpreter for that format that is built into the kernel in order to find the executable code and data and then choose the next step.
In the case of a dynamically linked ELF, the next step is to execute the dynamic loader (usually ld.so) in order to find the libraries, load them, and resolve the symbols. This all happens in userspace. The kernel is more or less unaware of dynamic linking, because it all happens in userspace after the kernel has handed control to the interpreter named in the ELF file.
The corresponding system call is exec. It is part of the kernel and is in charge of tearing down the old address space of the calling process and building a fresh one with everything needed to run the new code. This is part of the kernel because the address space is a kind of sandbox that protects processes from each other, and since it is critical, it is the kernel's responsibility.
The shell is just in charge of interpreting what you type and transforming it into the proper structures (lists or arrays of C strings) to pass to some exec call (after, most of the time, spawning a new process with fork).

How to get address information from library to be shared among all processes?

In Understanding the Linux Kernel, 3rd edition, it says:
Shared libraries are especially convenient on systems that provide file memory mapping, because they reduce the amount of main memory requested for executing a program. When the dynamic linker must link a shared library to a process, it does not copy the object code, but performs only a memory mapping of the relevant portion of the library file into the process's address space. This allows the page frames containing the machine code of the library to be shared among all processes that are using the same code. Clearly, sharing is not possible if the program has been linked statically. (page 817)
I am interested in this and want to write a small C program to verify it: given two pids as input, such as two gedit processes, get the address information of the page frames being shared. Does anyone know how to do it? From that book, I think the bss segment and text segment addresses of two or more gedit processes are the same, is that correct?
It is not the text and bss sections of your gedit (or whatever) that have the same address, but the content of the libc.so shared library - and all other shared libraries used by the two gedit processes.
This, as the quoted text says, allows the shared library to be ONE copy, and this is the main benefit of the shared library in general.
bss is generally not shared - since that is per process data. text sections of two processes running the same executable, in Linux, will share the same code.
Unfortunately, the proof of this would be to look at the physical mapping of pages (the page at address X in process A is at physical address Y, and the page at address X in process B is also at physical address Y) within the processes, and that is, as far as I know, not easily available without grokking about inside the OS kernel.
Look at the contents of /proc/*/maps.

When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

Does it copy the entire binary into memory before it executes? I am interested in this question and want to change it to work some other way. I mean, if the binary is 100M big (seems impossible), could I run it while it is still being copied into memory? Could that be possible?
Or could you tell me how to see the way it runs? Which tools do I need?
The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:
1. The kernel creates a process context (more-or-less, a virtual machine).
2. Into that process context, it defines a virtual memory mapping that maps from RAM addresses to the start of your executable file.
3. Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries.
4. The kernel does a jmp into the startup routine of your program (for a C program, that's crt0's _start, which eventually calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
5. The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable.
As you use malloc/new, the kernel creates read-write pages of RAM (without disc backing files) and adds them to your virtual address space.
If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the ( page file = swap file = swap partition = on-disc virtual memory ). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.
Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.
So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).
Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.
(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same, regardless of the optimizations.
No, it only loads the necessary pages into memory. This is demand paging.
I don't know of a tool which can really show that in real time, but you can have a look at /proc/xxx/maps, where xxx is the PID of your process.
While you ask a valid question, I don't think it's something you need to worry about. First off, a binary of 100M is not impossible. Second, the system loader will load the pages it needs from the ELF (Executable and Linkable Format) into memory, and perform various relocations, etc. that will make it work, if necessary. It will also load all of its requisite shared library dependencies in the same way. However, this is not an incredibly time-consuming process, and one that doesn't really need to be optimized. Arguably, any "optimization" would have a significant overhead to make sure it's not trying to use something that hasn't been loaded in its due course, and would possibly be less efficient.
If you're curious what gets mapped, as fge says, you can check /proc/pid/maps. If you'd like to see how a program loads, you can try running a program with strace, like:
strace ls
It's quite verbose, but it should give you some idea of the mmap() calls, etc.
