In the "Operating System Concepts", 9th edition, by Abraham Silberschatz et al., the authors said that:
"Some operating systems support only static linking,
in which system libraries are treated like any other object module
and are combined by **the loader** into the binary program image."
(page 381, the 2nd sentence of the 1st paragraph of section 8.1.5)
I wonder whether the linking (combining) is performed by the linker or by the loader.
Thanks.
(assuming GNU/Linux)
I believe that's a typing mistake.
Static linking is done by the linker: you end up with a binary program image that contains your program's code along with that of the libraries you're linking against; the loader then simply loads your program as a whole.
Using the GNU C Compiler, you can request static linking like this: gcc -static code.c
To check that the result indeed contains no markers for dynamically loaded libraries, run ldd a.out; you'll get a message like this: not a dynamic executable
When dynamically linking against a library, the linker will technically only leave a little marker in the resulting binary image stating that library 'x' needs to be loaded as well for your program to execute.
When the loader reads this binary image, it'll notice the marker and load the library; this action is never done in static linking because the whole thing becomes one large binary image.
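For comparison, if you build the same program without -static, you can see the marker the loader acts on. A minimal sketch (the exact library names, versions, and formatting will differ on your system):

$ readelf -d a.out | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Each NEEDED entry is exactly the kind of marker described above: the loader reads it and maps the named shared library into the process before the program starts.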
Related
I used static linking to produce an executable object file, and when I checked it with readelf I found a section called .rela.plt.
The keyword 'rela' indicates that this section is related to relocation. But since I used static linking and no shared libraries, the output should be a fully linked executable, so why does the file still contain relocation information?
There are two ways run-time relocations can end up in statically-linked programs.
The GNU toolchain supports selecting different function implementations at run time using the IFUNC mechanism. On x86-64, these show up as R_X86_64_IRELATIVE relocations.
Some targets support statically linked position-independent executables (via -static-pie in the GNU toolchain). Since the load address differs from run to run due to address-space layout randomization, any global data object that contains a pointer needs to be relocated at run time. On x86-64, these relocations show up as R_X86_64_RELATIVE.
(There might be other things that need relocations in statically linked programs on more obscure targets.)
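You can inspect the remaining relocations yourself. A hedged sketch for a glibc-based static executable on x86-64 (offsets, counts, and exact formatting will differ):

$ readelf -r a.out

Relocation section '.rela.plt' at offset 0x598 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000403ff0  000000000025 R_X86_64_IRELATIVE                    401a30

The IRELATIVE entries are the IFUNC relocations mentioned above; they are resolved by startup code in the statically linked binary itself, not by a dynamic linker.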
Okay, until this morning I was thoroughly confused between these terms. I guess I have got the difference, hopefully.
Firstly, the confusion was that since the preprocessor already includes the header files into the code which contains the functions, what library functions does the linker link into the object file produced by the assembler/compiler? Part of the confusion primarily arose due to my ignorance about the difference between a header file and a library.
After a bit of googling, and stack-overflowing (is that the term? :p), I gathered that the header file mostly contains the function declarations whereas the actual implementation is in another binary file called the library (I am still not 100% sure about this).
So, suppose in the following program:-
#include <stdio.h>

int main()
{
    printf("whatever");
    return 0;
}
The preprocessor includes the contents of the header file in the code. The compiler/compiler+assembler does its work, and then finally the linker combines this object file with another object file which actually has stored the way printf() works.
Am I correct in my understanding? I may be way off...so could you please help me?
Edit: I have always wondered about the C++ STL. It always confused me as to what it exactly is, a collection of all those headers or what? Now after reading the responses, can I say that STL is an object file/something that resembles an object file?
And also, I wondered where I could read the definitions of functions like pow(), sqrt(), etc. I would open the header files and not find anything. So, is the function definition in the library, in unreadable binary form?
A C source file goes through two main stages, (1) the preprocessor stage where the C source code is processed by the preprocessor utility which looks for preprocessor directives and performs those actions and (2) the compilation stage where the processed C source code is then actually compiled to produce object code files.
The preprocessor is a utility that does text manipulation. It takes as input a file that contains text (usually C source code) that may contain preprocessor directives and outputs a modified version of the file by applying any directives found to the text input to generate a text output.
The file does not have to be C source code because the preprocessor is doing text manipulation. I have seen the C Preprocessor used to extend the make utility by allowing preprocessor directives to be included in a make file. The make file with the C Preprocessor directives is run through the C Preprocessor utility and the resulting output is then fed into make to do the actual build of the make target.
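As a small illustration of this text manipulation, here is a sketch of a macro being expanded (the file name and macro are made up):

$ cat area.c
#define PI 3.14159
double area(double r) { return PI * r * r; }
$ gcc -E area.c
# 1 "area.c"
double area(double r) { return 3.14159 * r * r; }

gcc -E runs only the preprocessor and writes the expanded source (plus some "# line" markers) to standard output; no object code is produced at this stage.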
Libraries and linking
A library is a file that contains object code of various functions. It is a way to package the output from several source files when they are compiled into a single file. Many times a library file is provided along with a header file (include file), typically with a .h file extension. The header file contains the function declarations, global variable declarations, as well as preprocessor directives needed for the library. So to use the library, you include the header file provided using the #include directive and you link with the library file.
A nice feature of a library file is that you are providing the compiled version of your source code and not the source code itself. On the other hand since the library file contains compiled source code, the compiler used to generate the library file must be compatible with the compiler being used to compile your own source code files.
There are two types of libraries commonly used. The first and older type is the static library. The second and more recent is the dynamic library (Dynamic Link Library or DLL in Windows and Shared Library or SO in Linux). The difference between the two is when the functions in the library are bound to the executable that is using the library file.
The linker is a utility that takes the various object files and library files to create the executable file. When an external or global function or variable is used in the C source file, a kind of marker is used to tell the linker that the address of the function or variable needs to be inserted at that point.
The C compiler only knows what is in the source it compiles and does not know what is in other files such as object files or libraries. So the linker's job is to take the various object files and libraries and to make the final connections between parts by replacing the markers with actual connections. So a linker is a utility that "links" together the various components, replacing the marker for a global function or variable in the object files and libraries with a link to the actual object code that was generated for that global function or variable.
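You can see these markers directly with the nm utility. A hedged sketch, assuming the printf example above was compiled with gcc -c (the exact symbol list varies with compiler and options):

$ nm main.o
0000000000000000 T main
                 U printf

The U means "undefined": the object file uses printf but does not define it, so the linker must find a definition elsewhere (here, in the C standard library) and patch in the connection.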
During the linker stage is when the difference between a static library and a dynamic or shared library becomes evident. When a static library is used, the actual object code of the library is included in the application executable. When a dynamic or shared library is used, the object code included in the application executable is code to find the shared library and connect with it when the application is run.
In some cases the same global function name may be defined in several different object files or libraries; the linker will normally just use the first definition it comes across and may warn about the others it finds.
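A minimal sketch of building and linking against each kind of library with the GNU toolchain (the names mylib, libmylib, and prog_* are made up):

$ gcc -c mylib.c                              # compile the library source to object code
$ ar rcs libmylib.a mylib.o                   # package it as a static library
$ gcc main.c -L. -lmylib -o prog_static       # the library's object code is copied into prog_static

$ gcc -fPIC -shared mylib.c -o libmylib.so    # build the same code as a shared library
$ gcc main.c -L. -lmylib -o prog_shared       # prog_shared only records that it needs libmylib.so

With the shared version, the library must also be findable at run time (for example via the system library path, an rpath, or LD_LIBRARY_PATH).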
Summary of compile and link
So the basic process for a compile and link of a C program is:
preprocessor utility generates the C source to be compiled
compiler compiles the C source into object code generating a set of object files
linker links the various object files along with any libraries into executable file
The above is the basic process; however, when using dynamic libraries it can get more complicated, especially if the application being built also produces dynamic libraries of its own. A concrete command sequence is sketched below.
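With GCC you can stop after each of these stages and look at the intermediate results; a minimal sketch (the file names are arbitrary):

$ gcc -E main.c -o main.i    # preprocess only: expanded C source
$ gcc -S main.i -o main.s    # compile only: assembly
$ gcc -c main.s -o main.o    # assemble: relocatable object file
$ gcc main.o -o main         # link with the C runtime and libraries: executable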
The loader
There is also the stage of when the application is actually loaded into memory and execution starts. An operating system provides a utility, the loader, which reads the application executable file and loads it into memory and then starts the application running. The starting point or entry point for the executable is specified in the executable file so after the loader reads the executable file into memory it will then start the executable running by jumping to the entry point memory address.
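On an ELF system you can see this recorded entry point in the executable's header; a sketch (the actual address will differ from build to build):

$ readelf -h ./main | grep Entry
  Entry point address:               0x401050

This is the address the loader jumps to after the executable has been mapped into memory.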
One problem the linker can run into is that sometimes it may come across a marker when it is processing the object code files that requires an actual memory address. However the linker does not know the actual memory address because the address will vary depending on where in memory the application is loaded. So the linker marks that as something for the loader utility to fix when the loader is loading the executable into memory and getting ready to start it running.
With modern CPUs with hardware supported virtual address to physical address mapping or translation, this issue of actual memory address is seldom a problem. Each application is loaded at the same virtual address and the hardware address translation deals with the actual, physical address. However older CPUs or lower cost CPUs such as micro-controllers that are lacking the memory management unit (MMU) hardware support for address translation still need this issue addressed.
Entry points and the C Runtime
A final topic is the C Runtime, the main() function, and the executable entry point.
The C Runtime is object code provided by the compiler manufacturer that contains the entry point for an application that is written in C. The main() function is the entry point provided by the programmer writing the application; however, this is not the entry point that the loader sees. The main() function is called by the C Runtime after the application is started, once the C Runtime code has set up the environment for the application.
The C Runtime is not the Standard C Library. The purpose of the C Runtime is to manage the runtime environment for the application. The purpose of the Standard C Library is to provide a set of useful utility functions so that a programmer doesn't have to create their own.
When the loader loads the application and jumps to the entry point provided by the C Runtime, the C Runtime then performs the various initialization actions needed to provide the proper runtime environment for the application. Once this is done, the C Runtime then calls the main() function so that the code created by the application developer or programmer starts to run. When the main() returns or when the exit() function is called, the C Runtime performs any actions needed to clean up and close out the application.
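Conceptually, the hand-off looks something like the sketch below. This is illustrative only: a real C runtime entry point is written largely in assembly, has a different name and signature, and does much more.

/* Conceptual outline of a C runtime entry point -- not real CRT source. */
extern int main(int argc, char **argv);
extern void exit(int status);

void crt_entry_sketch(int argc, char **argv)
{
    /* ... set up the runtime environment: stack/heap bookkeeping,
       stdio, initialization of global data, etc. ... */
    int status = main(argc, argv);  /* hand control to the programmer's code */
    exit(status);                   /* run atexit handlers, flush stdio, terminate */
}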
This is an extremely common source of confusion. I think the easiest way to understand what's happening is to take a simple example. Forget about libraries for a moment and consider the following:
$ cat main.c
extern int foo( void );
int main( void ) { return foo(); }
$ cat foo.c
int foo( void ) { return 0; }
$ cc -c main.c
$ cc -c foo.c
$ cc main.o foo.o
The declaration extern int foo( void ) is performing exactly the same function as the header file of a library. foo.o is performing the function of the library. If you understand this example, and why neither cc main.c nor cc main.o work, then you understand the difference between header files and libraries.
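To make the last point concrete: cc main.c compiles fine (the extern declaration satisfies the compiler), but the link step fails because nothing defines foo. You will see an error along these lines (exact wording varies by toolchain):

$ cc main.c
/usr/bin/ld: in function `main': undefined reference to `foo'
collect2: error: ld returned 1 exit status

Adding foo.o (or a library containing foo) to the link resolves the reference, just as linking against libc resolves printf.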
Yes, almost correct. Except that the linker does not link only object files, but also libraries; in this case, it's the C standard library (libc) that is linked with your object file. The rest of your assumptions about the compilation stages and the difference between a header and a library appear to be correct.
When linking an application against a dynamic shared library such as in
gcc -o myprog myprog.o -lmylib
I know the linker (ld on my Linux) uses the -l option to store, in the produced myprog ELF executable, the name of the library (mylib in this case) that will be used at load and link time (both happen when the program is started, if we ignore lazy dynamic linking). I am wondering what other jobs ld performs (I am speaking only of the static linking step done at compile time) regarding the dynamic shared library.
ld must check that the undefined symbols exist in the provided dynamic shared libraries
any other stuff ?
Moreover, I would be interested in the references you use (books, online documentation) regarding the ELF format and the dynamic linking and loading processes.
While you hit the most obvious things ld needs to do when linking to ELF shared libraries, there are a few more you missed. I'll re-state the ones you mentioned and add some more:
Ensuring that all undefined symbols are resolved (unless the output is a shared library itself, in which case undefined symbols are valid).
Storing a reference to the library in a DT_NEEDED record of the _DYNAMIC object of the output file.
If the output is not position-independent and references objects (in the sense of data, as opposed to functions) in the shared library, generating a copy relocation to copy the original image of the object into the main program's data segment at load time, and the proper symbol table entry so that references to the object in the shared library itself get resolved to the new copy in the main program, rather than the original copy in the library.
Generating PLT thunks for the destination of each function call in the output that's not resolved at ld-time to a definition in the output.
These are the tasks I can think of that are specific to use of shared libraries, and of course don't include all the work that the linker already does which would be the same as for static linking. One way to think of what ld does with dynamic linking is that it takes object files with a huge repertoire of relocation types (representing anything the compiler or assembler can produce) and resolves all but a small number of them (for static linking, that number would be zero), where all of the remaining relocations fit into a much more limited set of types resolvable by the dynamic linker at load time.
One important step is the creation of a dynamic symbol table, which the runtime linker ld.so can use to link the executable against the library at runtime. It will also write the dynamic relocation table to note which machine code locations need to be changed to point to dynamically linked symbols. To see details:
objdump -T myprog
objdump -R myprog
Also note that the string written to the executable will actually be the SONAME of the library, which might be something like mylib.so.0. This will ensure that even when you install a newer and incompatible mylib.so.1.42 at some later point, the executable will use the compatible ABI version 0 instead. For details:
ldd myprog
Of course, the linker will also link your object files against one another, but since it does that even in the absence of a dynamic shared library, I take it that you are not interested in this part of its operation.
I'm getting errors in the lua plugin that I'm writing that are symptomatic of linking in two copies of the lua runtime, as per this message:
http://lua-users.org/lists/lua-l/2008-01/msg00671.html
Quote:
Which in turn means the equality test for dummynode is failing.
This is the usual symptom, if you've linked two copies of the Lua
core into your application (causing two instances of dummynode to
appear).
A common error is to link C extension modules (shared libraries)
with the static library. The linker command line for extension
modules must not ever contain -llua or anything similar!
The Lua core symbols (lua_insert() and so on) are only to be
exported from the executable which contains the Lua core itself.
All C extension modules loaded afterwards can then access these
symbols. Under ELF systems this is what -Wl,-E is for on the
linker line. MACH-O systems don't need this since all non-static
symbols are exported.
This is exactly the error I'm seeing... what I don't know is what I should be doing instead.
I've added the Lua src directory to the include path of the DLL that is the C component of my Lua plugin, but when I link it I get a pile of errors like:
Creating library file: libmo.dll.a
CMakeFiles/moshared.dir/objects.a(LTools.c.obj): In function `moLTools_dump':
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:38: undefined reference to `lua_gettop'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:47: undefined reference to `lua_type'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:48: undefined reference to `lua_typename'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:49: undefined reference to `lua_tolstring'
So, in summary, I have this situation:
A parent binary that is statically linked to the lua runtime.
A lua library that loads a DLL with C code in it.
The C code in the DLL needs to invoke the Lua C API (e.g. lua_gettop())
How do I link that? Surely the dynamic library can't 'see' the symbols in the parent binary, because the parent binary isn't loading them from a DLL, they're statically linked.
...but if I link the symbols in as part of the plugin, I get the error above.
Help? This seems like a problem that should turn up a lot (dll depends on symbols in parent binary, how do you link it?) but I can't seem to see any useful threads about it.
(before you ask, no, I don't have control over the parent binary and I can't get it to load the Lua symbols from the DLL)
It's probably best to use libtool for this to make your linking easier and more portable. The executable needs to be linked with -export-dynamic to export all the symbols in it, including the Lua symbols from the static library. The module then needs to be linked with -module -shared -avoid-version and, if on Windows, additionally -no-undefined; if on MacOS, additionally -no-undefined -flat_namespace -undefined suppress -bundle; Linux and FreeBSD need no other flags. This will leave the module with undefined symbols that are satisfied in the parent. If any are missing, the module will fail to be dlopened by the parent.
The semantics are slightly different for each environment, so it might take some fiddling. Sometimes order of the flags matters. Again, libtool is recommended since it hides much of the inconsistency.
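For reference, on a plain ELF/Linux system the approach described in the quoted Lua message looks roughly like this (a sketch only; the file names are hypothetical, and since the original question is on Windows, the libtool route above is the more portable choice):

$ cc -o host host.o -L. -llua -lm -Wl,-E     # host executable exports its statically linked Lua symbols
$ cc -shared -fPIC -o plugin.so plugin.c     # plugin is NOT linked with -llua; its lua_* references
                                             # resolve against the host's symbols at dlopen time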
I am reading about libraries in C but I have not yet found an explanation on what an object file is. What's the real difference between any other compiled file and an object file?
I would be glad if someone could explain in human language.
An object file is the real output from the compilation phase. It's mostly machine code, but has info that allows a linker to see what symbols are in it as well as symbols it requires in order to work. (For reference, "symbols" are basically names of global objects, functions, etc.)
A linker takes all these object files and combines them to form one executable (assuming that it can, i.e.: that there aren't any duplicate or undefined symbols). A lot of compilers will do this for you (read: they run the linker on their own) if you don't tell them to "just compile" using command-line options. (-c is a common "just compile; don't link" option.)
An Object file is the compiled file itself. There is no difference between the two.
An executable file is formed by linking the Object files.
An object file contains low-level instructions which can be understood by the CPU; that is why it is also called machine code.
This low level machine code is the binary representation of the instructions which you can also write directly using assembly language and then process the assembly language code (represented in English) into machine language (represented in Hex) using an assembler.
Here's a typical high level flow for this process for code in High Level Language such as C
--> goes through pre-processor
--> to give preprocessed (expanded) code, still in C
--> goes through compiler
--> to give assembly code
--> goes through an assembler
--> to give code in machine language which is stored in OBJECT FILES
--> goes through Linker
--> to get an executable file.
This flow can have some variations for example most compilers can directly generate the machine language code, without going through an assembler. Similarly, they can do the pre-processing for you. Still, it is nice to break up the constituents for a better understanding.
There are 3 kinds of object files.
1. Relocatable object files:
Contain machine code in a form that can be combined with other relocatable object files at link time, in order to form an executable object file.
If you have an a.c source file, to create its object file with GCC you should run:
gcc a.c -c
The full process would be:
preprocessor (cpp) would run over a.c
Its output (still C source) will feed into the compiler (cc1).
Its output (assembly) will feed into the assembler (as)
assembler (as) will produce the relocatable object file.
That relocatable object file contains:
object code, metadata for linking, and debugging information (if -g was used)
it is not directly executable.
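You can confirm what kind of object file you have with the file utility; a sketch (details such as the architecture string will vary):

$ file a.o
a.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The word "relocatable" marks it as this first kind of object file, as opposed to "shared object" or "executable".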
2. Shared object files:
Special type of relocatable object file that can be loaded dynamically, either at load time, or at run time.
Shared libraries are an example of these kinds of objects.
3. Executable object files:
Contain machine code that can be directly loaded into memory (by the loader, e.g execve) and subsequently executed.
The result of running the linker over multiple relocatable object files is an executable object file. The linker merges all the input object files from the command line, from left-to-right, by merging all the same-type input sections (e.g. .data) to the same-type output section. It uses symbol resolution and relocation.
Bonus: Static vs Dynamic Libraries
When linking against a static library the functions that are referenced in the input objects are copied to the final executable.
With dynamic libraries a symbol table is created instead that will enable a dynamic linking with the library's functions/globals. Thus, the result is a partially executable object file, as it depends on the library. If the library doesn't exist, the file can no longer execute.
The linking process can be done as follows:
ld a.o -o myexecutable
The command gcc a.c -o myexecutable will invoke all the commands mentioned at point 1 and at point 3 (cpp -> cc1 -> as -> ld [1]).
[1]: actually collect2, which is a wrapper over ld.
An object file is just what you get when you compile one (or several) source file(s).
It can be either a fully completed executable or library, or intermediate files.
The object files typically contain native code, linker information, debugging symbols and so forth.
Object files contain code that still depends on external functions, symbols, and data before it can run as a program, a bit like old telex machines, which needed another telex machine on the other end before a message could go anywhere.
Just as a processor requires binary machine code to run, an object file contains that machine code, but in an unlinked form. Linking combines the object files into the final executable, so that users don't have to compile the C code themselves; they can simply run the executable (for example an .exe file) once the object files have been linked.