Loading a SO Library: What Can Linux Do for Me?

I am writing a loader for ELF64 programs. Now I want to load existing .so libraries into memory and link against the symbols they export.
This raises several related problems. First, here is what I know.
A .so library is simply position-independent code (PIC), compiled so that it can be placed at any address and run. It exports several symbols, and from this point on I have real trouble understanding anything.
When an executable is started on Linux, a loading sequence runs that loads every required shared library and links the external symbols right before the application starts.
So here are the questions:
1. Is it true that a .so library is loaded only once, regardless of how many programs request it?
2. Is there a mechanism (a Linux function) I can call to load a .so library at runtime, other than through the loader?
3. Is it possible to obtain the symbols (addresses) needed to invoke functions, and to relocate and bind system calls, of an already loaded library? What API should I use?
4. Can I load a .so library privately? Would that result in conflicts? Is there a scenario where this is actually done?

Is it true that a .so library is loaded only once, regardless of how many programs request it?
The .text section is loaded into physical memory only once, because it is shared among processes. The .data and .bss sections are private to each process: .data is mapped copy-on-write from the SO and .bss is zero-filled, so every process that is dynamically linked to the SO gets its own copy of them.
Is there a mechanism (a Linux function) I can call to load a .so library at runtime, other than through the loader?
The dlopen() function and its relatives (dlsym(), dlclose(), dlerror()). http://linux.die.net/man/3/dlopen
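A minimal sketch of runtime loading with dlopen() and dlsym(), assuming a glibc-based Linux system where the math library's soname is "libm.so.6". The helper name call_cos_at_runtime is hypothetical. On glibc older than 2.34, link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: loads libm at runtime and calls cos() through a
 * function pointer obtained from dlsym(). Returns 0 on success. */
int call_cos_at_runtime(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    double (*cos_fn)(double) = (double (*)(double))dlsym(handle, "cos");
    const char *err = dlerror();
    if (err != NULL || cos_fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "NULL symbol");
        dlclose(handle);
        return -1;
    }

    *result = cos_fn(x);
    /* Drop our reference; the library stays mapped if others still use it. */
    dlclose(handle);
    return 0;
}
```

Checking dlerror() rather than only testing the pointer for NULL is the pattern the man page recommends, since a symbol's value can legitimately be NULL.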
Is it possible to obtain the symbols (addresses) needed to invoke functions, and to relocate and bind system calls, of an already loaded library? What API should I use?
Can I load a .so library privately? Would that result in conflicts? Is there a scenario where this is actually done?
I'm not sure what you mean by "system calls": those refer to services of the operating system kernel, which is not a shared object (it is shared, in a sense, but not in that way). To get symbol addresses and invoke functions within a loaded shared object, you can use the API exposed by the dynamic linking loader, to which dlopen() belongs.
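For a library that is already loaded, a handle can be obtained without loading anything new by passing RTLD_NOLOAD to dlopen(). This is a sketch under the assumption of glibc (RTLD_NOLOAD is an extension, not plain POSIX); lookup_in_loaded_library is a hypothetical name, and the library must be named by its soname (e.g. "libc.so.6").

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the address of `name` inside the already
 * loaded library `lib`, or NULL if the library is not loaded or lacks it. */
void *lookup_in_loaded_library(const char *lib, const char *name)
{
    /* With RTLD_NOLOAD, dlopen() returns NULL unless `lib` is already
     * mapped; on success it only increments the reference count. */
    void *handle = dlopen(lib, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return NULL;

    void *sym = dlsym(handle, name);
    /* Dropping the extra reference is safe: the library was already
     * loaded, so another reference keeps it mapped. */
    dlclose(handle);
    return sym;
}
```

A NULL result for a library name that was never loaded is the expected outcome, which makes this usable as a cheap "is it loaded?" test.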

Related

Linking, loading, and virtual memory

I know these questions have been asked before, but I still can't reconcile everything into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries only copy and link the required modules into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC, as they are a part of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does "shared library" refer to? I've heard it used as (1) the overall umbrella term, synonymous with library, (2) directly a dynamic library, (3) using virtual memory to map the same physical memory of a library into multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, libstdc++, ...) linked dynamically or statically by default? I never need to call dlopen().
static vs dynamic linking
how is this any different from static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or a dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static vs dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into main memory (MM) before executing it
Dynamic loading means that only the executable header is copied into MM before execution, and additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here, and I'm not necessarily looking for someone to address each issue. I'm hoping that by listing everything that confuses me, someone who understands this will see where my understanding lapses at a broad level, and be able to paint a larger picture of how these things work together.
why 2 types of lib loading
dynamic saves space (you don't have hundreds of copies of the same code in all the binaries that use foo.lib)
dynamic means the foo.lib vendor can ship a new version of the library and existing code takes advantage of it
static makes dependency management easier: in theory a binary can be one file
What is 'shared library'
Unix name for a dynamic library; Windows calls it a DLL
Are standard libraries static or dynamic
Depends on the platform. On some you can choose; on others it's chosen for you. For example, on Windows there are compiler switches to say whether you want static or dynamic runtimes, and on Linux gcc links against the shared libc by default. Note: don't confuse dynamic library usage with dlopen (see later).
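To see that a default gcc build on Linux already uses the standard library dynamically without any dlopen() call, a program can list the objects the loader mapped into it with dl_iterate_phdr(). This is a sketch assuming glibc, where the C library's path contains "libc.so"; the struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct scan_state {
    int count;      /* number of loaded objects seen */
    int found_libc; /* set if an object whose path contains "libc.so" was seen */
};

/* Called once per loaded object: main program, ld.so, vdso, libraries. */
static int on_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct scan_state *st = data;
    const char *name = info->dlpi_name;
    int has_name = (name != NULL && name[0] != '\0');

    printf("%p %s\n", (void *)(uintptr_t)info->dlpi_addr,
           has_name ? name : "(main program)");
    st->count++;
    if (has_name && strstr(name, "libc.so") != NULL)
        st->found_libc = 1;
    return 0; /* returning 0 keeps the iteration going */
}

/* Hypothetical helper: prints the load address and path of every object
 * mapped into this process and reports whether libc was among them. */
struct scan_state list_loaded_objects(void)
{
    struct scan_state st = {0, 0};
    dl_iterate_phdr(on_object, &st);
    return st;
}
```

Nothing in this program calls dlopen(); libc shows up because the linker recorded it as a DT_NEEDED dependency and the dynamic loader mapped it at startup.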
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. A static library is input to the linker just like any other object file, while a dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app.
For a DLL, symbol resolution is finalized at load time (on Linux, function symbols may instead be resolved lazily, at their first call).
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not even have existed when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine, for example, how a scripting language can load and call an arbitrary method
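Both use cases reduce to "open a file chosen at run time, then look up an entry point by name". A minimal sketch, assuming a POSIX system with dlopen(); open_plugin_entry is a hypothetical helper, and a real plugin host would fix the entry-point name (for example a hypothetical "plugin_init") as part of its interface.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: opens the library at `path` and returns the address
 * of `entry`, or NULL on failure. On success the handle is stored in
 * *handle_out so the caller can dlclose() it when finished. */
void *open_plugin_entry(const char *path, const char *entry, void **handle_out)
{
    /* RTLD_LOCAL keeps this plugin's symbols from satisfying other
     * libraries loaded later; RTLD_NOW reports missing symbols immediately. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return NULL;
    }

    dlerror(); /* clear stale error state */
    void *fn = dlsym(handle, entry);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "%s: no usable symbol %s: %s\n",
                path, entry, err ? err : "symbol is NULL");
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return fn;
}
```

To try it without building a plugin, "libm.so.6" and "sqrt" can stand in for a plugin path and its entry point; a scripting language does essentially the same thing with a library and function name supplied by the script.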
To use a .so on Unix you don't need dlopen: you can have it loaded for you (the same is true on Windows). To really load a shared lib / DLL dynamically, you need dlopen or LoadLibrary.
Note that statically linked programs load faster, since there is less disk searching for runtime library files. If the libraries are small and very unusual, it is probably better to link statically. If there are serious version dependencies or functional differences (as with MFC), the DLLs need different names.

How to programmatically look up a symbol in a running application

I have an application which I'm trying to debug; however, running it under gdb produces different results, and it would be nice to have it output true symbol information when given an address.
For instance, I have a method which is called periodically, and I can determine the address of the call site. I'd like to print the symbol information for this address at run time. I know I can run "nm" on the executable, but that is outside of the application; I want to be able to do it from within the application itself.
I'm using GCC 4.7.2 on a linux platform.
(edited to explain why I can't use gdb)
Dynamic symbol information can be accessed via the PT_DYNAMIC segment, which is loaded into memory and can be reached by asking dlopen(3) for a handle to the main executable.
Static symbol information can be read only from the actual executable file, or an external file, as it is not listed in the loadable segments.
With just dynamic information, you will not be able to resolve anything that is not exported, which means you will most likely see only library calls unless your executable has its symbol table exported (for example, by linking with -rdynamic), so static information is probably the way to go.
This involves either a lot of parsing or using the BFD library from binutils.
I'd seriously wonder if that was really worth the effort, though. You might get the same information from using the profiling support in gcc.
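If the dynamic symbol table is enough, glibc's dladdr() does this lookup directly. A sketch assuming glibc (Dl_info needs _GNU_SOURCE); describe_address is a hypothetical helper. Functions inside the main executable are visible only if it was linked with -rdynamic, and a call-site address could come from GCC's __builtin_return_address(0).

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats "symbol+0xoffset (file)" for a code address
 * using only the dynamic symbol table. Returns 0 on success, or -1 if the
 * address is not inside any loaded object. */
int describe_address(const void *addr, char *buf, size_t len)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return -1;

    const char *file = info.dli_fname ? info.dli_fname : "?";
    if (info.dli_sname != NULL && info.dli_saddr != NULL) {
        uintptr_t off = (uintptr_t)addr - (uintptr_t)info.dli_saddr;
        snprintf(buf, len, "%s+0x%lx (%s)", info.dli_sname,
                 (unsigned long)off, file);
    } else {
        /* The object was found, but no exported symbol covers the address. */
        snprintf(buf, len, "?? (%s)", file);
    }
    return 0;
}
```

The "??" branch is exactly the limitation described above: non-exported functions have no entry in the dynamic table, so only the containing file can be reported.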

LoadLibrary Calls, Returned Pointers Not Saved

I am fixing up someone else's code and noticed that the person calls LoadLibrary several times, as per below:
LoadLibrary("C:\\Windows\\SysWOW64\\msjint40");
LoadLibrary("C:\\Windows\\SysWOW64\\msjtes40");
LoadLibrary("C:\\Windows\\SysWOW64\\expsrv");
What is the point of this? The returned pointers are not saved! The program later calls a bunch of other DLLs that do use functions from MSJTES40, but not in the context where the libraries are loaded.
The comment says "else preload to optimize", but how does the rest of the program know where the DLLs are?
Thanks for any info.
LoadLibrary brings the specified module into the process's address space. A library is not loaded twice into the same process: a later LoadLibrary call for the same module only increments its reference count and returns the same handle. So these calls act as a preload (and the loaded module may pull in dependencies of its own), which can be viewed as an optimization: the later call to the library (where the return value is used) should complete faster.
See the documentation
If the specified module is a DLL that is not already loaded for the calling process, the system calls the DLL's DllMain function with the DLL_PROCESS_ATTACH value.
Also from the documentation.
Do not make assumptions about the operating system version based on a LoadLibrary call that searches for a DLL. If the application is running in an environment where the DLL is legitimately not present but a malicious version of the DLL is in the search path, the malicious version of the DLL may be loaded.
Assuming a hard-coded DLL location opens your program up to all sorts of mischief!

Linux: Is it possible to make some plugin oriented programming using statically linked binaries?

Assume we have a very small embedded system consisting only of the linux kernel and a single statically linked binary run as init. We want the binary to be able to dynamically load external plugins in runtime.
Is it possible on Linux? dlopen only works with shared libraries and dynamic linking, because static binaries don't export any symbols to the outside world. Is there any other way to do it?
You could run the "plugins" as child processes, and communicate over IPC (shared memory, pipes, or so forth).
They would exist in their own process space, so you couldn't directly call functions in them (besides, if they're also statically linked, you won't have any function entry points other than main that you could reach), but you could (e.g.) send a command over a named pipe, or pass data in a shared memory structure.
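A minimal sketch of the pipe-based approach. To keep it self-contained, the "plugin" here is a fork() of the host that upper-cases one command; a real system would exec() a separately built, statically linked plugin binary. The protocol and all function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical plugin body: reads one command (up to 128 bytes) and
 * writes it back upper-cased. */
static void plugin_main(int in_fd, int out_fd)
{
    char buf[128];
    ssize_t n = read(in_fd, buf, sizeof buf);
    if (n <= 0)
        return;
    for (ssize_t i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    if (write(out_fd, buf, (size_t)n) != n)
        _exit(1);
}

/* Sends `cmd` to the plugin process and stores its NUL-terminated reply
 * in `reply`. Returns the reply length, or -1 on error. */
ssize_t call_plugin(const char *cmd, char *reply, size_t reply_len)
{
    int to_child[2], from_child[2];
    if (reply_len == 0 || pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child ("plugin"): keep only its own pipe ends. */
        close(to_child[1]);
        close(from_child[0]);
        plugin_main(to_child[0], from_child[1]);
        _exit(0);
    }

    /* Parent ("host"): send the command, then close so the child sees EOF. */
    close(to_child[0]);
    close(from_child[1]);
    size_t cmd_len = strlen(cmd);
    ssize_t written = write(to_child[1], cmd, cmd_len);
    close(to_child[1]);

    ssize_t total = 0;
    if (written == (ssize_t)cmd_len) {
        ssize_t n;
        while ((size_t)total < reply_len - 1 &&
               (n = read(from_child[0], reply + total,
                         reply_len - 1 - (size_t)total)) > 0)
            total += n;
    }
    reply[total] = '\0';
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return written == (ssize_t)cmd_len ? total : -1;
}
```

Each side closes the pipe ends it does not use; otherwise the reads would never see end-of-file.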
Note that the moment you load the second binary, you have lost one of the main benefits of static linking (because now you have two copies of libc loaded), so you might want to consider biting the bullet and using dynamic linking. You'll spend a few hundred KB adding dynamic linking support, but GNU libc is about 2 MB, so if you're loading one plug-in you've already saved maybe 1.8 MB, and each additional plug-in you load saves some 2 MB more.
dlopen only works with shared libraries and dynamic linking, because static binaries don't export any symbols to the outside world
You can dlopen a shared library from a statically linked binary when using glibc. If you need your plugin to reference symbols from the main executable, you would have to pass in pointers to them into the plugin, similar to this.
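One way to "pass in pointers" is a versioned table of host functions handed to the plugin's init function. A sketch under stated assumptions: struct host_api, plugin_init, and start_plugin are hypothetical names. In a real build plugin_init would live in the plugin .so (compiled with -fPIC -shared) and the host would fetch it with dlsym(handle, "plugin_init"); it is defined here only so the sketch compiles as one file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical host/plugin contract: the host exposes its services as
 * function pointers, since a static executable exports no symbols. */
struct host_api {
    int version; /* lets host and plugin detect interface mismatches */
    void (*log_message)(const char *msg);
    int (*get_setting)(const char *key);
};

/* Plugin side (normally inside the .so). */
int plugin_init(const struct host_api *api)
{
    if (api->version != 1)
        return -1; /* refuse an interface version we don't understand */
    api->log_message("plugin loaded");
    return api->get_setting("answer");
}

/* Host side: implementations exposed through the table. */
static int log_calls;

static void host_log(const char *msg)
{
    log_calls++;
    printf("[host] %s\n", msg);
}

static int host_get_setting(const char *key)
{
    return strcmp(key, "answer") == 0 ? 42 : 0;
}

/* `init` would normally be the pointer returned by dlsym(). */
int start_plugin(int (*init)(const struct host_api *))
{
    static const struct host_api api = { 1, host_log, host_get_setting };
    return init(&api);
}
```

Putting a version field first lets the host add functions to the table later without silently breaking older plugins.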
is there any other way to do it?
You could also write your own module loader. The Linux kernel does this, and so does Xorg.
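The first step of any hand-written module loader is validating the ELF header before mapping anything. A minimal sketch using the constants from <elf.h>, assuming 64-bit ELF files; check_elf64_header is a hypothetical name, and a real loader would go on to walk the program headers (e_phoff, e_phnum), map the PT_LOAD segments, and apply relocations.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns e_type (ET_EXEC or ET_DYN) if `path` is a
 * 64-bit ELF executable or shared object, or -1 otherwise. */
int check_elf64_header(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    size_t got = fread(&eh, 1, sizeof eh, f);
    fclose(f);
    if (got != sizeof eh)
        return -1;

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) /* "\177ELF" magic */
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    /* ET_DYN covers both shared objects and PIE executables. */
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return -1;
    return eh.e_type;
}
```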

Efficiency of utilizing a DLL in C source code

I have a DLL which I'd like to use in a C program.
Do you think it is more efficient to have a DLL (lots of common functions) and then create a program that will eventually use them, or to have all the source code?
To include the DLL, what syntax must be followed?
Do you think it is more efficient to have a DLL (lots of common functions) and then create a program that will eventually use them, or to have all the source code?
For memory and disk space, it is more efficient to use a shared library (a DLL is the Windows implementation of shared libraries), assuming that at least two programs use this component. If only one program will ever use this component, then there is no memory or disk space savings to be had.
Shared libraries can be slightly slower than statically linking the code; however, this is likely to be incredibly minor, and shared libraries carry a number of benefits that make it more than worthwhile (such as the ability to load and handle symbols dynamically, which allows for plugin-like architectures). That said, there are also some disadvantages (if you are not careful about where your DLLs live, how they are versioned, and who can update them, then you can get into DLL hell).
To include the DLL, what syntax must be followed?
This depends. There are two ways that shared libraries can be used. In the first way, you tell the linker to reference the shared library, and the shared library will automatically be loaded on program startup, and you would basically reference the code like normal (include the various headers and just use the name of the symbol when you want to reference it). The second way is to dynamically load the shared library (on Windows this is done via LoadLibrary while it is done on UNIX with dlopen). This second way makes it possible to change the behavior of the program based on the presence or absence of symbols in the shared library and to inspect the available set of symbols. For the second way, you would use GetProcAddress (Windows) or dlsym (UNIX) to obtain a pointer to a function defined in the library, and you would pass around function pointers to reference the functions that were loaded.
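The "presence or absence of symbols" point can be sketched on glibc with dlsym(RTLD_DEFAULT, ...), which searches the objects already loaded in the process (RTLD_DEFAULT is a GNU extension). The names pick_len_function and fallback_len are hypothetical; strlen simply stands in for an optional function.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*len_fn)(const char *);

/* Built-in fallback used when the optional symbol is missing. */
static size_t fallback_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: uses `name` if some loaded library exports it,
 * otherwise falls back to the built-in version. *used_fallback reports
 * which one was chosen. */
len_fn pick_len_function(const char *name, int *used_fallback)
{
    void *sym = dlsym(RTLD_DEFAULT, name);
    *used_fallback = (sym == NULL);
    return sym != NULL ? (len_fn)sym : fallback_len;
}
```

The same shape works with GetProcAddress on Windows: probe once at startup, store the function pointer, and call through it afterwards.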
You can put your functions into either a static library (a .lib), which is merged into your application at link time and is basically the same as putting the .c files in the project.
Or you can use a DLL, where the functions are included at run time. The advantage of a DLL is that two programs which use the same functions can use the same DLL (saving disk space), and you can upgrade the DLL without changing the program; neither of these probably matters for you.
The DLL is loaded automatically when your program runs; there is nothing special you need to do to include it (you can also load a DLL explicitly in your code, and there are sometimes special reasons to do this).
Edit: if you need to create a stub lib for an existing DLL, see http://support.microsoft.com/kb/131313

Resources