Linux: Is it possible to make some plugin oriented programming using statically linked binaries? - c

Assume we have a very small embedded system consisting only of the linux kernel and a single statically linked binary run as init. We want the binary to be able to dynamically load external plugins in runtime.
Is it possible on linux? Dlopen only works with shared libraries and dynamic linking cause static binaries don't export any symbols to the outside world, so is there any other way to do it?

You could run the "plugins" as child processes, and communicate over IPC (shared memory, pipes, or so forth).
They would exist in their own process space, so you couldn't directly call functions in them (besides, if they're also statically linked, you won't have any function entry points other than main that you could reach), but you could (e.g.) send a command over a named pipe, or pass data in a shared memory structure.
Note that, the moment you load the second binary, you have lost one of the main benefits of static linking (because now you have two copies of your libc loaded), so you might want to consider just biting the bullet and using dynamic linking. You'll burn a few 100K's in adding the dynamic linking support, but the GNU libc is about 2M, so if you're loading one plug-in, you've gained maybe 1.8M in savings already; and for each additional plug-in you load, you're saving some 2M.

Dlopen only works with shared libraries and dynamic linking cause static binaries don't export any symbols to the outside world
You can dlopen a shared library from a statically linked binary when using glibc. If you need your plugin to reference symbols from the main executable, you would have to pass in pointers to them into the plugin, similar to this.
is there any other way to do it?
You could also write your own module loader. The Linux kernel does this, and so does Xorg.

Related

Two programs links to dynamic linked library

I'm still a noob in C, so I have a question about linking.
We have two programs "A" and "B", which links to the dynamic linked library "C".
Now we start program "A" and "B".
What happened now to "C". Will it be loaded once for both programs, or two times for every program?
And what is, when program B is a Python program, which make use of the foreign function interface?
It all depends on the operating system, but for e.g. Linux or Windows the shared library will only be loaded once, but it will be mapped twice. Each process using the shared library will have the library mapped, but those mappings all lead to the same single loaded library.
The mapping is done on a per-process basis, it doesn't really matter what the process does or is (if it's a program you made, a Python interpreter, or something completely different).
Searching google for Dynamic linking C gave me the following result (Shared libraries are dynamically loaded)
Shared libraries are loaded into memory by programs when they start. When a shared library is loaded properly, all programs that start later automatically use the already loaded shared library.
In the case of Windows, only the DLL (dynamic link library) code is shared between processes. Each process has it's own virtual memory address space, including the data used by a DLL. This means that a DLL normally can't have a static buffer that is shared between processes. The process or DLL code can set up shared memory, and that shared memory can be shared between processes.
MSDN articles:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms681914(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682594(v=vs.85).aspx
http://msdn.microsoft.com/en-s/library/windows/desktop/ms686958(v=vs.85).aspx

Shared library in C with two application

In case of a shared library between two applications, does each application use it's own copy of the library during run time ? If they use the same instance of the library, what happens to the global variables inside the library ?
It depends on the operating system. On most Unix-like systems, shared libraries use position-independent code, so the memory used by the code segment(which holds instructions and read-only variables) can be shared between processes, but each process still has its own data segment(which holds other variables).
For Unix-like operating systems, when you first execute your applications, the page tables of the two processes which map the library address space will point to the same frames in memory where the library is loaded.
However, the page tables which map the data section of the library are handled with Copy on Write mechanism. As soon as you try to write a global variable, the OS will create a process specific copy of the page containing the variable and will remap the page table of the process accordingly.
Each program creates a new instance of the library in its own memory space. They are not shared and the 2 programs will not see each other's data.
Take a look at how dynamic libraries are loaded: http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries
The same is true for statically linked libraries except instead of being loaded at runtime, they are linked at compile time.

linking, loading, and virtual memory

I know these questions have been asked before - but I still can't reconcile everything together into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries have only copy and link the required modules into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC as they are apart of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does a shared library refer to? I've heard it being used as (1) the overall umbrella term, synonymous to library, (2) directly to a dynamic library, (3) using virtual memory to map the same physical memory of a library to multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, stdlibc++, ..) linked dynamically or statically by default? I never need to dlopen()..
static vs dynamic linking
how is this any different than static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static s dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into MM before executing it
Dynamic loading means that only the executable header copied into MM before executing, additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here - and I'm not necessary looking for someone to address each issue. I'm hoping by listing out everything that is confusing me, that someone that understands this will see where a lapse in my understanding is at a broad level, and be able to paint a larger picture about how these things cooperate together..
why 2 types of lib loading
dynamic saves space (you dont have hundreds of copies of the same code in all binaries using foo.lib
dynamic allows foo.lib vendor can ship a new version of the library and existing code takes advantage of it
static makes dependency management easier - in theory a binary can be one file
What is 'shared library'
unix name for dynamic library. Windows calls it DLL
Are standard libraries static or dynamic
depends on platform. On some you can choose on others its chosen for you. For example on windwos there are compiler switchs to say if you want static or dynamic runtimes. Not dont confuse dynamic library usage with dlopen - see later
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. Typically a static library is input to the linker just like any other compile unit. A dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app
Symbol resolution is finalized at load time for a DLL
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not have even existing when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine how , for example , how a scripting language can load and call an arbitrary method
To use a .so on unix you dont need to use dlopen. You can have it loaded for you (Same on windows). To really dynamically load a shared lib / dll you need dlopen or LoadLibrary
Note that statically linked libraries load faster, since there is less disk searching for all the runtime library files. If the libraries are small, and very unusual, probably better to link statically. If there are serious version dependencies / functional differences like MFC, the DLLs need different names.

C executable code independence from shared libraries

I'm reading a book about gcc and the following paragraph puzzles me now:
Furthermore, shared libraries make it possible to update a library with-
out recompiling the programs which use it (provided the interface to the
library does not change).
This only refers to the programs which are not yet linked, right?
I mean, in C isn't executable code completely independent from the compiler? In which case any alteration to the library, whether its interface or implementation is irrelevant to the executable code?
A shared library is not linked until the program is executed, so the library can be upgraded/changed without recompiling (nor relinking).
EG, on Linux, one might have
/bin/myprogram
depending upon
/usr/lib64/mylibrary.so
Replacing mylibrary.so with a different version (as long as the functions/symbols that it exports are the same/compatible) will affect myprogram the next time that myprogram is started. On Linux, this is handled by the system program /lib64/ld-linux-x864-64.so.2 or similar, which the system runs automatically when the program is started.
Contrast with a static library, which is linked at compile-time. Changes to static libraries require the application to be re-linked.
As an added benefit, if two programs share the same shared library, the memory footprint can be smaller, as the kernel can “tell” that it's the same code, and not copy it into RAM twice. With static libraries, this is not the case.
No, this is talking about code that is linked. If you link to a static libary, and change the library, the executable will not pick up the changes because it contains its own copy of the original version of the library.
If you link to a shared library, also known as dynamic linking, the executable does not contain a copy of the library. When the program is run, it loads the current version of the library into memory. This allows you to fix the library, and the fixes will be picked up by all users of the library without needing to be relinked.
Libraries provide interfaces (API) to the outside world. Applications using the libraries (a .DLL for example) bind to the interface (meaning they call functions from the API). The library author is free to modify the library and redistribute a newer version as long as they don't modify the interface.
If the library authors were to modify the interface, they could potentially break all of the applications that depend on that function!

efficiency of utilizing dll in c source code

I have a dll which I'd like to use in a c program,
Do you think is efficient to have a dll (lots of common functions) and then create a program that will eventually use them, or have all the source code?
To include the dll, What syntax must be followed?
Do you think is efficient to have a dll (lots of common functions) and then create a program that will eventually use them,or have all the source code.
For memory and disk space, it is more efficient to use a shared library (a DLL is the Windows implementation of shared libraries), assuming that at least two programs use this component. If only one program will ever use this component, then there is no memory or disk space savings to be had.
Shared libraries can be slightly slower than statically linking the code; however, this is likely to be incredibly minor, and shared libraries carry a number of benefits that make it more than worthwhile (such as the ability to load and handle symbols dynamically, which allows for plugin-like architectures). That said, there are also some disadvantages (if you are not careful about where your DLLs live, how they are versioned, and who can update them, then you can get into DLL hell).
To include the dll, What syntax must be followed?
This depends. There are two ways that shared libraries can be used. In the first way, you tell the linker to reference the shared library, and the shared library will automatically be loaded on program startup, and you would basically reference the code like normal (include the various headers and just use the name of the symbol when you want to reference it). The second way is to dynamically load the shared library (on Windows this is done via LoadLibrary while it is done on UNIX with dlopen). This second way makes it possible to change the behavior of the program based on the presence or absence of symbols in the shared library and to inspect the available set of symbols. For the second way, you would use GetProcAddress (Windows) or dlsym (UNIX) to obtain a pointer to a function defined in the library, and you would pass around function pointers to reference the functions that were loaded.
You can put your functions into either a static library ( a .lib) which is merged into your application at compile time and is basically the same as putting the .c files in the project.
Or you can use a dll where the functions are included at run time. the advantage of a dll is that two programs which use the same functions can use the same dll (saving disk space) and you can upgrade the dll without changing the program - neither of these probably matters for you.
The dll is automatically loaded when your program runs there is nothing special you need to do to include it ( you can load a dll specifically in your code - there are sometimes special reasons to do this)
Edit - if you need to create a stub lib for an existing dll see http://support.microsoft.com/kb/131313

Resources