I want to clear up a confusion I have regarding shared libraries. When I search online, explanations of static linking say that since the library is included in the executable itself, the executable is larger, which increases the memory footprint of the program.
With a dynamic (shared) library, on the other hand, the library is linked at runtime. But (correct me if I'm wrong) if the library is still loaded into the process at runtime in order to be linked, does dynamic linking actually save any memory?
The library is loaded once into memory by the OS, and is linked to the running process by mapping its memory location into the process's virtual address space. From each process's point of view, it has its own copy of the library, but there is really only one copy in memory.
I know that dynamically linked programs are smaller on disk, but do they use more RAM at run time? If so, why?
The answer is "it depends how you measure it", and also "it depends which platform you're running on".
Static linking uses less runtime RAM, because for dynamic linking the entire shared object needs to be loaded into memory (I'll be qualifying this statement in a second), whilst with static linking only those functions you actually need are loaded.
The above statement isn't 100% accurate. Only the shared object pages that actually contain code you use are loaded. This is still much less efficient than static linking, which packs those functions together into as few pages as possible.
On the other hand, dynamic linking uses much less runtime RAM, as all programs using the same shared object use the same in RAM copy of the code (I'll be qualifying this statement in a second).
The above is a true statement on Unix like systems. On Windows, it is not 100% accurate. On Windows (at least on 32bit Intel, I'm not sure about other platforms), DLLs are not compiled with position independent code. As such, each DLL carries the (virtual memory) load address it needs to be loaded at. If one executable links two DLLs that overlap, the loader will relocate one of the DLLs. This requires patching the actual code of the DLL, which means that this DLL now carries code that is specific to this program's use of it, and cannot be shared. Such collisions, however, should be rare, and are usually avoidable.
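The collision case described above can be sketched as a simple range-overlap check. This is only an illustration under simplifying assumptions: the image names, base addresses and sizes are hypothetical, and a relocated image is simply reported rather than placed at a new address as a real loader would do.

```python
def overlaps(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a+size_a) and [base_b, base_b+size_b) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


def images_needing_relocation(images):
    """Return the names of images that cannot be loaded at their preferred base.

    images is a list of (name, preferred_base, size) tuples in load order.
    Simplification: a colliding image is only reported, not re-placed.
    """
    placed = []      # (base, size) of images loaded at their preferred base
    relocated = []
    for name, base, size in images:
        if any(overlaps(base, size, b, s) for b, s in placed):
            relocated.append(name)
        else:
            placed.append((base, size))
    return relocated


if __name__ == "__main__":
    # Hypothetical DLLs: two of them ask for the same preferred base address.
    dlls = [
        ("a.dll", 0x10000000, 0x20000),
        ("b.dll", 0x10000000, 0x8000),
        ("c.dll", 0x20000000, 0x8000),
    ]
    print(images_needing_relocation(dlls))
```

Only the image that has to move gets its code patched, which is why giving each DLL a distinct preferred base avoids the loss of sharing.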
To illustrate with an example, statically linking glibc will probably cause you to consume more RAM at run time, as this library is, in all likelihood, already loaded in RAM before your program even starts. Statically linking some unique library only your program uses will save run time RAM. The in-between cases are in-between.
Different processes using the same dll/so file can share its read-only memory pages; this includes the code (text) pages.
However, each dll loaded in a given program has to have its own page for writable global or static data. These pages may be 4/16/64k or bigger depending on the OS. If the program is statically linked instead, the static data of all the libraries can share a single page.
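The page-rounding effect can be made concrete with a little arithmetic. The sketch below is a rough model, not a measurement: the per-library data sizes are hypothetical, and it assumes each library's writable data starts on its own page while statically linked data is packed contiguously.

```python
def pages_for(data_bytes, page_size=4096):
    """Number of whole pages needed to hold data_bytes (rounded up)."""
    return -(-data_bytes // page_size)


def writable_data_cost(per_library_bytes, page_size=4096):
    """Compare RAM used for writable data, returned as (dynamic, static) bytes."""
    # Dynamic: each library's writable data occupies its own page(s).
    dynamic = sum(pages_for(n, page_size) for n in per_library_bytes) * page_size
    # Static: the same data is assumed to be packed together.
    static = pages_for(sum(per_library_bytes), page_size) * page_size
    return dynamic, static


if __name__ == "__main__":
    # Hypothetical: ten libraries with 300 bytes of writable globals each.
    print(writable_data_cost([300] * 10))          # 4 KiB pages
    print(writable_data_cost([300] * 10, 65536))   # 64 KiB pages
```

With 4 KiB pages the ten libraries cost ten pages against one for the static build, and the gap grows with the page size.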
Programs running on common operating systems such as Linux, Windows, MacOSX, Android, etc. run as processes, each with its own virtual address space. This relies on virtual memory (implemented by the kernel driving the MMU).
Read a good book like Operating Systems: Three Easy Pieces to understand more.
So programs don't consume RAM directly. RAM is a resource managed by the kernel. When RAM becomes scarce, your system experiences thrashing. Read also about the page cache and about memory overcommitment (a feature that I dislike and that I often disable).
The advantage of using a shared library, when the same library is used by several processes, is that its code segment appears (technically, is paged in) only once in RAM.
However, dynamic linking has a small overhead (even in memory), e.g. to resolve relocations. So if a library is used by only one process, that might consume slightly more RAM than if it was statically linked. In practice you should not bother most of the time, and I recommend using dynamic linking systematically.
And in practice, for huge processes (such as your browser), the data and the heap consume much more RAM than the code.
On Linux, Drepper's paper How To Write Shared Libraries explains a lot of things in detail.
On Linux, you might use proc(5) and pmap(1) to explore virtual address spaces. For example, try cat /proc/$$/maps and cat /proc/self/maps and pmap $$ in a terminal. Use ldd(1) to find out the dynamic library dependencies of a program, e.g. ldd /bin/cat. Use strace(1) to find out what syscalls(2) are used by a process. Those relevant to the virtual address space include mmap(2) and munmap(2), mprotect(2), mlock(2), the old (now obsolete) sbrk(2), and execve(2).
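The same information can be read programmatically. Here is a minimal sketch that lists the shared objects mapped into the current process by parsing /proc/self/maps in the format described in proc(5). The helper names and the sample text are made up for illustration, and the sample is only used when /proc is not available.

```python
import os
from collections import defaultdict


def parse_maps_line(line):
    """Split one maps line into (address_range, perms, path).

    path is an empty string for anonymous mappings.
    """
    fields = line.split(None, 5)
    address_range, perms = fields[0], fields[1]
    path = fields[5].strip() if len(fields) > 5 else ""
    return address_range, perms, path


def shared_object_mappings(maps_text):
    """Group (address_range, perms) by file for every mapped .so file."""
    by_path = defaultdict(list)
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        address_range, perms, path = parse_maps_line(line)
        if ".so" in os.path.basename(path):
            by_path[path].append((address_range, perms))
    return dict(by_path)


# Hypothetical sample, used only as a fallback when /proc is missing.
SAMPLE = """\
55d0c0a00000-55d0c0a02000 r--p 00000000 08:01 1048 /bin/cat
7f1c2a000000-7f1c2a028000 r--p 00000000 08:01 2097 /usr/lib/libc.so.6
7f1c2a028000-7f1c2a1bd000 r-xp 00028000 08:01 2097 /usr/lib/libc.so.6
7f1c2a215000-7f1c2a219000 rw-p 00214000 08:01 2097 /usr/lib/libc.so.6
7ffd5e3f0000-7ffd5e411000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    maps_path = "/proc/self/maps"
    if os.path.exists(maps_path):
        with open(maps_path) as f:
            text = f.read()
    else:
        text = SAMPLE
    for path, regions in shared_object_mappings(text).items():
        for address_range, perms in regions:
            print(perms, address_range, path)
```

The r-xp regions are the code pages that can be shared between processes, while the rw-p regions are the per-process writable data.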
I'm still a noob in C, so I have a question about linking.
We have two programs, "A" and "B", which both link to the dynamically linked library "C".
Now we start programs "A" and "B".
What happens to "C" now? Will it be loaded once for both programs, or twice, once for each program?
And what happens when program B is a Python program that uses the foreign function interface?
It all depends on the operating system, but on Linux or Windows, for example, the shared library will only be loaded once, but it will be mapped twice. Each process using the shared library will have the library mapped, but those mappings all lead to the same single loaded library.
The mapping is done on a per-process basis; it doesn't really matter what the process does or is (whether it's a program you made, a Python interpreter, or something completely different).
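For the Python case, a small ctypes sketch shows that the interpreter is just another process with the library mapped into it. It assumes a Unix-like system, where ctypes.CDLL(None) gives a handle to the running program and the libraries it has already loaded; the function name is made up for illustration.

```python
import ctypes
import os


def pid_via_libc():
    """Call getpid() from the C library already mapped into this process."""
    # Assumption: Unix-like system. CDLL(None) behaves like dlopen(NULL),
    # so no second copy of the C library is loaded.
    libc = ctypes.CDLL(None)
    libc.getpid.restype = ctypes.c_int
    return libc.getpid()


if __name__ == "__main__":
    # Both values come from the same process, so they should match.
    print(pid_via_libc() == os.getpid())
```

The call goes through the same mapped code pages that any C program on the system uses for getpid().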
Searching google for Dynamic linking C gave me the following result (Shared libraries are dynamically loaded)
Shared libraries are loaded into memory by programs when they start. When a shared library is loaded properly, all programs that start later automatically use the already loaded shared library.
In the case of Windows, only the DLL (dynamic link library) code is shared between processes. Each process has its own virtual memory address space, including the data used by a DLL. This means that a DLL normally can't have a static buffer that is shared between processes. The process or DLL code can set up shared memory, and that shared memory can be shared between processes.
MSDN articles:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms681914(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682594(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686958(v=vs.85).aspx
When a shared library is used by two applications, does each application use its own copy of the library at run time? If they use the same instance of the library, what happens to the global variables inside the library?
It depends on the operating system. On most Unix-like systems, shared libraries use position-independent code, so the memory used by the code segment (which holds instructions and read-only variables) can be shared between processes, but each process still has its own data segment (which holds other variables).
For Unix-like operating systems, when you first execute your applications, the page tables of the two processes which map the library address space will point to the same frames in memory where the library is loaded.
However, the page tables which map the data section of the library are handled with a copy-on-write mechanism. As soon as you try to write a global variable, the OS will create a process-specific copy of the page containing the variable and will remap the page table of the process accordingly.
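Copy-on-write can be observed with a private file mapping, which is the same kind of mapping used for a library's data section. This is a minimal sketch using Python's mmap module with ACCESS_COPY on a temporary file. The file contents and function name are made up, and a temporary file stands in for the library.

```python
import mmap
import os
import tempfile


def private_write_demo():
    """Write through a private (copy-on-write) mapping.

    Returns (bytes_seen_in_mapping, bytes_still_in_file).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"counter=0")
        # ACCESS_COPY asks for a private mapping: a write changes this
        # process's copy of the page and is never written back to the file.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_COPY) as view:
            view[8:9] = b"7"
            in_mapping = bytes(view[:])
        with open(path, "rb") as f:
            in_file = f.read()
    finally:
        os.close(fd)
        os.remove(path)
    return in_mapping, in_file


if __name__ == "__main__":
    print(private_write_demo())
```

The mapping sees the modified value while the file, and therefore any other process mapping it, still sees the original.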
Each program creates a new instance of the library in its own memory space. They are not shared and the 2 programs will not see each other's data.
Take a look at how dynamic libraries are loaded: http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries
The same is true for statically linked libraries, except that instead of being loaded at runtime, they are linked into the executable at build time.
I'm building an application that has a huge .so file - well over 2GB in size (stripped).
Are there limits to the size of a shared object file?
I ask because strace shows that the file is refused because it is too big.
My system is currently a 32-bit system, and I also wonder how much this changes if I build for a 64-bit Linux system.
Since a shared library is loaded completely into memory, I would highly recommend moving your resources out into external files. IMHO, 2GB is totally unacceptable for a shared library, and will cause problems on low-memory systems.
UPDATE:
Please ignore my first sentence about loading whole shared libraries into memory. As OP commented, shared libraries are indeed mmap'ed, and symbol pages are loaded on demand.
It depends on your system's memory. A .so file is loaded along with the executable (or by the system itself), so it can fail to load if you are low on memory or the OS has already allocated a lot of it. If you build for a 64-bit system, the file will grow beyond 2 GB in size, because of the additional 64-bit flags and instructions.
Assume we have a very small embedded system consisting only of the Linux kernel and a single statically linked binary run as init. We want the binary to be able to dynamically load external plugins at runtime.
Is this possible on Linux? dlopen only works with shared libraries and dynamic linking, because static binaries don't export any symbols to the outside world, so is there any other way to do it?
You could run the "plugins" as child processes, and communicate over IPC (shared memory, pipes, or so forth).
They would exist in their own process space, so you couldn't directly call functions in them (besides, if they're also statically linked, you won't have any function entry points other than main that you could reach), but you could (e.g.) send a command over a named pipe, or pass data in a shared memory structure.
Note that, the moment you load the second binary, you have lost one of the main benefits of static linking (because now you have two copies of your libc loaded), so you might want to consider just biting the bullet and using dynamic linking. You'll burn a few 100K's in adding the dynamic linking support, but the GNU libc is about 2M, so if you're loading one plug-in, you've gained maybe 1.8M in savings already; and for each additional plug-in you load, you're saving some 2M.
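The estimate above can be written out as a small calculation. The 2M figure comes from the text; the 200K overhead is an assumed stand-in for "a few 100K's", and the model assumes, as the answer does, that every statically linked binary carries a full copy of libc.

```python
LIBC_BYTES = 2 * 1024 * 1024     # "about 2M", taken from the text
DYNLINK_OVERHEAD = 200 * 1024    # assumed value for "a few 100K's"


def ram_saved(plugins):
    """Estimated bytes saved by dynamic linking with `plugins` plug-ins loaded.

    Static: the main binary and every plug-in each carry a libc copy.
    Dynamic: one shared libc plus the one-off dynamic linking overhead.
    """
    static_total = (1 + plugins) * LIBC_BYTES
    dynamic_total = LIBC_BYTES + DYNLINK_OVERHEAD
    return static_total - dynamic_total


if __name__ == "__main__":
    for n in range(4):
        print(n, "plug-in(s):", ram_saved(n) / (1024 * 1024), "MiB saved")
```

With no plug-ins the result is negative (dynamic linking only costs its overhead), with one plug-in it is roughly 1.8M, and each further plug-in adds another 2M.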
dlopen only works with shared libraries and dynamic linking, because static binaries don't export any symbols to the outside world
You can dlopen a shared library from a statically linked binary when using glibc. If you need your plugin to reference symbols from the main executable, you would have to pass pointers to them into the plugin, similar to this.
is there any other way to do it?
You could also write your own module loader. The Linux kernel does this, and so does Xorg.