Position-independent executable: What is "main executable binary"? - c

When reading https://en.wikipedia.org/wiki/Address_space_layout_randomization, I encountered a term:
Position-independent executable (PIE) implements a random base address for the main executable binary and has been in place since 2003. It provides the same address randomness to the main executable as being used for the shared libraries.
What does main executable binary mean here? Is it just a tu/source file that contains the main function?

It means the output of the linker when you build your executable, the so-called a.out file in the *nix world. The compiler creates object files and the linker resolves all the dependencies into an a.out file. Some of those dependencies will be external (dynamic link libraries).
The main executable will be the file that the os (possibly the linker) loads initially when you execute it. Subsequent loads would be the dynamic link libraries that are external dependencies created during the build process.

I assume it is the binary with main() function, so the program in strict sense.
Previously, programs were loaded on specific addresses, but dynamic libraries were already loaded at different addresses, so I think the main here it is just to empathize that it is for the program binary and not for the library binaries.

Related

what's .rela.plt section in executable object files?

I use static linking to produce the executable object files and I use readelf to check the file and found there is one section called: .rela.plt
the keyword 'rela' indicates that this is related to relocation. but since I use static linking, not using any shared library, so the output executable file should be a fully linked executable file, so why this file still contain relocation information?
There are two ways run-time relocations can end up in statically-linked programs.
The GNU toolchain supports selecting different function implementations at run time using the IFUNC mechanism. On x86-64, these show up as R_X86_64_IRELATIVE relocations.
Some targets support statically linked position independent executables (via -static-pie in the GNU toolchain). Since the the load address differs from program run to program due to address-space layout randomization, any global data object that contains a pointer needs to be relocated at run time. On x86-64, these relocations show up as R_X86_64_RELATIVE.
(There might be other things that need relocations in statically linked programs on more obscure targets.)

Linking using map file without having object files

I want to create an operating system for embedded device with very limited resources (an ESP8266) that can load ELF files as program or shared object (shared object is in second importance).
I want to know is it possible to link any program for this OS against map file of OS?
for example I implement memcpy in OS and make a header file that declares it as extern, Compile OS and generate map file. then when i want to write a program, include the header to compile it successfully and make linker to peek the address of memcpy from map file of OS.
the OS is place non-independent and its functions are always at a fixed address, but programs are place independent ELF files. it is not necessary to program be loadable for different builds of OS.
This is by no means a complete solution to the problem of running ELFs on a embedded target but for the specific problem of providing known addresses during the linking process, GNU LD allows you to provide addresses for symbols in code defined as extern by adding a PROVIDE statement or a simple assignment to the linker script. LD won't directly read a map file, but you could parse the map file, find the relevant addresses, generate a linker script that has the appropriate symbols provided, and use that linker script in the compilation of the ELF. The documentation for the provide and assignment features can be found at https://sourceware.org/binutils/docs/ld/Assignments.html

Undefined reference to error when .so libraries are involved while building the executable

I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file and I can see the undefined reference errors during the linking stage as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand the dynamic linking.
Also reading from here lead to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so
file on which executable depends it should crash?
What actually is dynamic linking if all external entity references are
resolved while building? Is it some sort of pre-check performed by dynamic linker? Else
dynamic linker can make use of
LD_LIBRARY_PATH to get additional libraries to resolve the undefined
symbols.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
No, linker will exit with "No such file or directory" message.
Imagine it like this:
Your executable stores somewhere a list of shared libraries it needs.
Linker, think of it as a normal program.
Linker opens your executable.
Linker reads this list. For each file.
It tries to find this file in linker paths.
If it finds the file, it "loads" it.
If it can't find the file, it get's errno with No Such file or directory from open() call. And then prints a message that it can't find the library and terminates your executable.
When running the executable, linker dynamically searches for a symbol in shared libraries.
When it can't find a symbol, it prints some message and the executable teerminates.
You can for example set LD_DEBUG=all to inspect what linker is doing. You can also inspect your executable under strace to see all the open calls.
What actually is dynamic linking if all external entity references are resolved while
building?
Dynamic linking is when you run the executable then the linker loads each shared library.
When building, your compiler is kind enough to check for you, that all symbols that you use in your program exist in shared libraries. This is just for safety. You can for example disable this check with ex. --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a comma separated list of paths to search for the shared library. Paths in LD_LIBRARY_PATH are just processed before standard paths. That's all. It doesn't get "additional libraries", it gets additional paths to search for the libraries - libraries stay the same.
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(sec) ((unsigned long long)sec * 1000)
should be present, but is not.
The compiler assumes then, that it is an external function and creates an (undefined) symbol for it, while it should be resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
Yes. If the .so file is not present at run-time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or by extension any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires the executable can crash.
In your case, it seems as if a required macro definition has not been found and the compiler is using implicit declaration rules. You can easily fix this by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC).
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so file
on which executable depends it should crash?
It will crash. The time of crash does depend on the way you call a certain exported function from the .so file.
You might retrieve all exported functions via functions pointers by yourself by using dlopen dlysm and co. In this case the program will crash at first call in case it does not find the exported method.
In case of the executable just calling an exported method from a shared object (part of it's header) the dynamic linker uses the information of the method to be called in it's executable (see second answer) and crashes in case of not finding the lib or a mismatch in symbols.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the actual linking and the dynamic linking. Starting off with the actual linking:
In case of linking a static library, the actual linking will copy all code from the method to be called inside the executable/library using it.
When linking a dynamic library you will not copy code but symbols. The symbols contain offsets or other information pointing to the acual code in the dynamic library. If the executable does invoke a method which is not exported by the dynamic library though, it will already fail at the actual linking part.
Now when starting your executable, the OS will at some point try to load the shared object into memory where the code actually resides in. If it does not find it or also if it is imcotable (i.e.: the executable was linked to a library using different exports), it might still fail at runtime.

Is Dynamic Linker part of Kernel or GCC Library on Linux Systems?

Is Dynamic Linker (aka Program Interpreter, Link Loader) part of Kernel or GCC Library ?
UPDATE (28-08-16):
I have found that the default path for dynamic linker that every binary (i.e linked against a shared library) uses /lib64/ld-linux-x86-64.so.2 is a link to the shared library /lib/x86_64-linux-gnu/ld-2.23.so which is the actual dynamic linker.
And It is part of libc6 (2.23-0ubuntu3) package viz. GNU C Library: Shared libraries in ubuntu for AMD64 architectures.
My actual question was
what would happen to all the applications that are dynamically linked (all,now a days), if this helper program (ld-2.23.so) doesn't exist ?
And answer to that is " no application would run, even the shell program ". I've tried it on virutal machine.
In an ELF executable, this is referred to as the "ELF interpreter". On linux (e.g.) this is /lib64/ld-linux-x86-64.so.2
This is not part of the kernel and [generally] with glibc et. al.
When the kernel executes an ELF executable, it must map the executable into userspace memory. It then looks inside for a special sub-section known as INTERP [which contains a string that is the full path].
The kernel then maps the interpreter into userspace memory and transfers control to it. Then, the interpreter does the necessary linking/loading and starts the program.
Because ELF stands for "extensible linker format", this allows many different sub-sections with the ELF file.
Rather than burdening the kernel with having to know about all the myriad of extensions, the ELF interpreter that is paired with the file knows.
Although usually only one format is used on a given system, there can be several different variants of ELF files on a system, each with its own ELF interpreter.
This would allow [say] a BSD ELF file to be run on a linux system [with other adjustments/support] because the ELF file would point to the BSD ELF interpreter rather than the linux one.
UPDATE:
every process(vlc player, chrome) had the shared library ld.so as part of their address space.
Yes. I assume you're looking at /proc/<pid>/maps. These are mappings (e.g. like using mmap) to the files. That is somewhat different than "loading", which can imply [symbol] linking.
So primarily loader after loading the executable(code & data) onto memory , It loads& maps dynamic linker (.so) to its address space
The best way to understand this is to rephrase what you just said:
So primarily the kernel after mapping the executable(code & data) onto memory, the kernel maps dynamic linker (.so) to the program address space
That is essentially correct. The kernel also maps other things, such as the bss segment and the stack. It then "pushes" argc, argv, and envp [the space for environment variables] onto the stack.
Then, having determined the start address of ld.so [by reading a special section of the file], it sets that as the resume address and starts the thread.
Up until now, it has been the kernel doing things. The kernel does little to no symbol linking.
Now, ld.so takes over ...
which further Loads shared Libraries , map & resolve references to libraries. It then calls entry function (_start)
Because the original executable (e.g. vlc) has been mapped into memory, ld.so can examine it for the list of shared libraries that it needs. It maps these into memory, but does not necessarily link the symbols right away.
Mapping is easy and quick--just an mmap call.
The start address of the executable [not to be confused with the start address of ld.so], is taken from a special section of the ELF executable. Although, the symbol associated with this start address has been traditionally called _start, it could actually be named anything (e.g. __my_start) as it is what is in the section data that determines the start address and not address of the symbol _start
Linking symbol references to symbol definitions is a time consuming process. So, this is deferred until the symbol is actually used. That is, if a program has references to printf, the linker doesn't actually try to link in printf until the first time the program actually calls printf
This is sometimes called "link-on-demand" or "on-demand-linking". See my answer here: Which segments are affected by a copy-on-write? for a more detailed explanation of that and what actually happens when an executable is mapped into userspace.
If you're interested, you could do ldd /usr/bin/vlc to get a list of the shared libraries it uses. If you looked at the output of readelf -a /usr/bin/vlc, you'll see these same shared libraries. Also, you'd get the full path of the ELF interpreter and could do readelf -a <full_path_to_interpreter> and note some of the differences. You could repeat the process for any .so files that vlc wanted.
Combining all that with /proc/<pid>maps et. al. might help with your understanding.

Operating System - Static linking is done by the Linker or Loader?

In the "Operating System Concepts", 9th edition, by Abraham Silberschatz et al., the authors said that:
"Some operating systems support only static linking,
in which system libraries are treated like any other object module
and are combined by **the loader** into the binary program image."
(page 381, the 2nd sentence of the 1st paragraph of section 8.1.5
I wonder that the linking (combining) is performed by the Linker or Loader?
Thanks.
(assuming GNU/Linux)
I believe that's a typing mistake.
Static linking is done by the linker where you'd have a binary program image that contains your program's code and that of the library you're linking against; the loader will simply load your program as a whole.
Using Gnu C Compiler package, you may use static linking like this: gcc -static code.c
To check that the result indeed contains no markers for dynamically loaded libraries:
ldd a.out and you'll get a message like this: not a dynamic executable
When dynamically linking against a library, the linker will technically only leave a little marker in the resulting binary image stating that library 'x' needs to be loaded as well for your program to execute.
When the loader reads this binary image, it'll notice the marker and load the library; this action is never done in static linking because the whole thing becomes one large binary image.

Resources