linking, loading, and virtual memory - c

I know these questions have been asked before - but I still can't reconcile everything together into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries have only copy and link the required modules into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC as they are apart of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does a shared library refer to? I've heard it being used as (1) the overall umbrella term, synonymous to library, (2) directly to a dynamic library, (3) using virtual memory to map the same physical memory of a library to multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, stdlibc++, ..) linked dynamically or statically by default? I never need to dlopen()..
static vs dynamic linking
how is this any different than static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static s dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into MM before executing it
Dynamic loading means that only the executable header copied into MM before executing, additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here - and I'm not necessary looking for someone to address each issue. I'm hoping by listing out everything that is confusing me, that someone that understands this will see where a lapse in my understanding is at a broad level, and be able to paint a larger picture about how these things cooperate together..

why 2 types of lib loading
dynamic saves space (you dont have hundreds of copies of the same code in all binaries using foo.lib
dynamic allows foo.lib vendor can ship a new version of the library and existing code takes advantage of it
static makes dependency management easier - in theory a binary can be one file
What is 'shared library'
unix name for dynamic library. Windows calls it DLL
Are standard libraries static or dynamic
depends on platform. On some you can choose on others its chosen for you. For example on windwos there are compiler switchs to say if you want static or dynamic runtimes. Not dont confuse dynamic library usage with dlopen - see later
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. Typically a static library is input to the linker just like any other compile unit. A dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app
Symbol resolution is finalized at load time for a DLL
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not have even existing when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine how , for example , how a scripting language can load and call an arbitrary method
To use a .so on unix you dont need to use dlopen. You can have it loaded for you (Same on windows). To really dynamically load a shared lib / dll you need dlopen or LoadLibrary

Note that statically linked libraries load faster, since there is less disk searching for all the runtime library files. If the libraries are small, and very unusual, probably better to link statically. If there are serious version dependencies / functional differences like MFC, the DLLs need different names.

Related

How do dynamic linkers/loaders resolve symbols?

I'm a CS student, and I'm doing a project on shared libraries and dynamic linking/loading. One of the questions I have to answer is how symbols are resolved with dynamic linking/loading. I've scoured the internet and haven't been able to find anything conclusive. I understand different linkers may resolve symbols differently across different operating systems. I'm just looking for a general, windows-based answer; how are symbols resolved in dynamic linking?
Thank You!
Well, let's stick on Windows. I'll answer in a few words and then vote for moving this question to CS site instead of main SO.
First, dynamic linking can be at program start (prelinked variant) and when program code explicitly requests some library load. While the same DLLs are used there, details differ.
Prelinked variant will work with so called import library that is static but contains special thunks (AKA trampolines) that are replaced with jumps to real code when dynamic loader attaches a real library (DLL file). For linkers that aren't aware of dynamic loading, import library is enough to still provide dynamic linking - that how it appeared in DOS/Windows. The calling code could be ignorant of details if the code is provided by static or dynamic library.
On-the-fly loading is using methods like LoadLibrary to load the library (and activate it) and GetProcAddress to get pointer of real function implementation. In that case, application (or another library) is aware of details of this mechanism.
I hope this brief is enough to provide you with enough words for further delving.

At dynamic linking, does the dynamic loader look at all object files for definitions, or only at those specified by the executable?

So I'm trying to wrap my head around static and dynamic linking. There are many resources on SO and on the web. I think I pretty much get it, but there's still one thing that seems to bother me. Also, please correct me if my overall understanding is wrong.
I think I understand static linking:
The linker unpacks the linked libraries, and actually includes the libraries' object files inside the produced executable. The unresolved-stubs in the application object files are then replaced by actual function-calling code, which calls functions in addresses known at build time.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime.
Then at runtime, the dynamic loader of the OS would look through precompiled libraries stored at standard filesystem locations. It would look in the object-files of the libraries, inside their symbol tables (?) and try to find a matching function definition for each unresolved-stub. It would then load the matching object-files into memory, and replace the stubs to point to the function definitions.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory? Or does it only look in object-files specified somewhere in the application-executable file? Is this the reason why at compile time we must specify all dynamic dependencies of our program? Also, is it true dynamic libraries expose a symbol-table too?
So the part I'm missing is this: where does the OS dynamic loader look
- does it look in the symbol tables for all object-files in the system-libraries directory?
No dynamic linker I'm aware of does this.
Or does it only look in object-files
specified somewhere in the application-executable file?
Nor exactly this, either.
Details vary, but generally, a dynamic linker looks for specific shared libraries by name in various directories. The directories searched may be built into the linker, specified by the operating system, specified in the object being linked, or a combination. The linker does not (generally) examine libraries' symbol tables until after it locates them by name and selects them for linking.
Is this the
reason why at compile time we must specify all dynamic dependencies of
our program?
Yes, though under some circumstances we do not need to specify all dynamic dependencies at compile time. Some dynamic linkers support on-demand dynamic loading as directed by the program itself. This can be used to implement plugin systems, among other purposes.
Also, is it true dynamic libraries expose a symbol-table
too?
Yes. Dynamic libraries have their own symbol tables because
The dynamic linker uses them to do its work, and
Dynamic libraries can have their own dynamic linking requirements, which are not necessarily reflected in the main program's.
In the normal usage, "dynamic linking" is performed by the loader. "Static linking" is performed by the linker.
Generally, linkers can create either executable files or shared libraries. The linker output for both is an instruction stream that tells the loaders how to place the executable or library in memory.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime
That is not [usually] correct. The linker will locate the shared library in which the symbol exists. The executable will have an instruction to find the symbol in that shared library. Linkers generally puke if they cannot find all the symbols that need to be resolved.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory?
This a system specific question. In well designed operating systems, the shared libraries are designated by the system manager. The loader uses the library specified by the system. Poorly designed systems frequently use some kind of search path to find the shared libraries (which created a massive security hole).

C executable code independence from shared libraries

I'm reading a book about gcc and the following paragraph puzzles me now:
Furthermore, shared libraries make it possible to update a library with-
out recompiling the programs which use it (provided the interface to the
library does not change).
This only refers to the programs which are not yet linked, right?
I mean, in C isn't executable code completely independent from the compiler? In which case any alteration to the library, whether its interface or implementation is irrelevant to the executable code?
A shared library is not linked until the program is executed, so the library can be upgraded/changed without recompiling (nor relinking).
EG, on Linux, one might have
/bin/myprogram
depending upon
/usr/lib64/mylibrary.so
Replacing mylibrary.so with a different version (as long as the functions/symbols that it exports are the same/compatible) will affect myprogram the next time that myprogram is started. On Linux, this is handled by the system program /lib64/ld-linux-x864-64.so.2 or similar, which the system runs automatically when the program is started.
Contrast with a static library, which is linked at compile-time. Changes to static libraries require the application to be re-linked.
As an added benefit, if two programs share the same shared library, the memory footprint can be smaller, as the kernel can “tell” that it's the same code, and not copy it into RAM twice. With static libraries, this is not the case.
No, this is talking about code that is linked. If you link to a static libary, and change the library, the executable will not pick up the changes because it contains its own copy of the original version of the library.
If you link to a shared library, also known as dynamic linking, the executable does not contain a copy of the library. When the program is run, it loads the current version of the library into memory. This allows you to fix the library, and the fixes will be picked up by all users of the library without needing to be relinked.
Libraries provide interfaces (API) to the outside world. Applications using the libraries (a .DLL for example) bind to the interface (meaning they call functions from the API). The library author is free to modify the library and redistribute a newer version as long as they don't modify the interface.
If the library authors were to modify the interface, they could potentially break all of the applications that depend on that function!

Program location in the memory and static/shared libraries

When I run a program (in linux) does it all get loaded into the physical memory? If so, is using shared libraries, instead of static libraries, help in terms of caching? In general, when should I use shared libraries and when should I use static libraries? My codes are either written in C or in C++ if that matters.
This article hits covers some decent ground on what you want. This article goes much deeper about the advantages of shared libraries
SO also has covered this topic in depth
Difference between static and shared libraries?
When to use dynamic vs. static libraries
Almost all the above mentioned articles are shared library biased. Wikipedia tries to rescue static libraries :)
From wiki,
There are several advantages to statically linking libraries with an
executable instead of dynamically linking them. The most significant
is that the application can be certain that all its libraries are
present and that they are the correct version. This avoids dependency
problems. Usually, static linking will result in a significant
performance improvement.
Static linking can also allow the application
to be contained in a single executable file, simplifying distribution
and installation.
With static linking, it is enough to include those
parts of the library that are directly and indirectly referenced by
the target executable (or target library).
With dynamic libraries, the
entire library is loaded, as it is not known in advance which
functions will be invoked by applications. Whether this advantage is
significant in practice depends on the structure of the library.
Shared libraries are used mostly when you have functionality that could be used and "shared" across different programs. In that case, you will have a single point where all the programs will get their methods. However, this creates a dependency problem since now your compiled programs are dependent on that specific version of the library.
Static libraries are used mostly when you don't want to have dependency issues and don't want your program to care which X or Y libraries are installed on your target system.
So, which one to use?. for that you should answer the following questions:
Will your program be used on different platforms or Linux distributions? (e.g. Red Hat, Debian, SLES11-SP1)
Do you have replicated code that is being used by different binaries?
Do you envision that in the future other programs could benefit from your work?
I think this is a case by case decision, and it is not a one size fits all kind of answer.

efficiency of utilizing dll in c source code

I have a dll which I'd like to use in a c program,
Do you think is efficient to have a dll (lots of common functions) and then create a program that will eventually use them, or have all the source code?
To include the dll, What syntax must be followed?
Do you think is efficient to have a dll (lots of common functions) and then create a program that will eventually use them,or have all the source code.
For memory and disk space, it is more efficient to use a shared library (a DLL is the Windows implementation of shared libraries), assuming that at least two programs use this component. If only one program will ever use this component, then there is no memory or disk space savings to be had.
Shared libraries can be slightly slower than statically linking the code; however, this is likely to be incredibly minor, and shared libraries carry a number of benefits that make it more than worthwhile (such as the ability to load and handle symbols dynamically, which allows for plugin-like architectures). That said, there are also some disadvantages (if you are not careful about where your DLLs live, how they are versioned, and who can update them, then you can get into DLL hell).
To include the dll, What syntax must be followed?
This depends. There are two ways that shared libraries can be used. In the first way, you tell the linker to reference the shared library, and the shared library will automatically be loaded on program startup, and you would basically reference the code like normal (include the various headers and just use the name of the symbol when you want to reference it). The second way is to dynamically load the shared library (on Windows this is done via LoadLibrary while it is done on UNIX with dlopen). This second way makes it possible to change the behavior of the program based on the presence or absence of symbols in the shared library and to inspect the available set of symbols. For the second way, you would use GetProcAddress (Windows) or dlsym (UNIX) to obtain a pointer to a function defined in the library, and you would pass around function pointers to reference the functions that were loaded.
You can put your functions into either a static library ( a .lib) which is merged into your application at compile time and is basically the same as putting the .c files in the project.
Or you can use a dll where the functions are included at run time. the advantage of a dll is that two programs which use the same functions can use the same dll (saving disk space) and you can upgrade the dll without changing the program - neither of these probably matters for you.
The dll is automatically loaded when your program runs there is nothing special you need to do to include it ( you can load a dll specifically in your code - there are sometimes special reasons to do this)
Edit - if you need to create a stub lib for an existing dll see http://support.microsoft.com/kb/131313

Resources