How many relocatable files was an ELF executable built from? - c

Is there a way I can find how many relocatable files an ELF executable was built from?
And how can I associate a segment with its original relocatable file?
Thanks in advance.

Is there a way I can find how many relocatable files an ELF executable was built from?
No (at least not in general).
And how can I associate a segment with its original relocatable file?
Any trace of the original ET_REL file will be gone by the time the linker is done linking.
In addition, with section reordering, a single .o can be "spread out" all over the final linked binary, and with identical code folding a single segment of the final binary can be associated with multiple .o files.
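As a hedged sketch of the identical-code-folding point (the file and function names below are invented for illustration), consider two object files that happen to contain byte-for-byte identical functions:

/* vec2.c -- hypothetical */
int vec2_zero(int *v)  { v[0] = 0; v[1] = 0; return 0; }

/* pair.c -- hypothetical */
int pair_clear(int *p) { p[0] = 0; p[1] = 0; return 0; }

A linker performing identical code folding (for example gold or lld with --icf=all) may keep only one copy of that machine code, so a single chunk of the final binary then corresponds to both vec2.o and pair.o at once.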
I think http://xyproblem.info is also relevant here.

Related

Difference between an ELF file and a MAP file

The linker can output both the ELF and the MAP file. These files are especially relevant in the embedded systems world because the ELF file is usually used to read out the addresses of variables or functions. Additionally, the ELF file is used by different embedded measurement or analysis tools.
When I open a MAP file, then within it, I can see for each global variable and every external function the following information: allocated address, symbolic name, allocated bytes, memory unit, and memory section.
On the other hand, once I open the ELF file, it is a binary file and not human-readable. However, some of the tools I use are able to parse and interpret it: they can obtain the symbolic name of a variable/function and its address, or even show a function prototype.
From my understanding, the ELF and MAP files basically contain the same information; the difference is just that the first is binary and the latter is a text file.
So what are the actual differences between these two files from the content perspective?
Thank you in advance!
The primary output of the linker (i.e. its main purpose) is to produce the fully linked executable code. That is the ELF (Executable and Linkable Format) file. An ELF file may, as you have observed, contain symbols - these are used for debugging. It may also contain metadata that associates the machine code with the source code from which it was generated. But the bulk of its content (and the part that is not optional) is the executable machine code and the data objects that make up your application.
The MAP file is an optional, information-only, human-readable output that describes the location and size of the code and data objects in your application. The MAP file also includes a summary showing the total size and memory usage of your code.
In an embedded cross-development environment, the symbol information in the ELF file is used when the code is loaded into a source-level symbolic debugger. The debugger takes the binary code/data segments from the ELF file and loads them onto the target (typically using JTAG or another debug/programming hardware tool), and it loads the symbols and source-level debug metadata into the debugger itself. Then, while the real machine code executes on the target, that execution is reflected in the debugger against the original source code, where you can view, step through and breakpoint the code at the source level.
In short, the ELF file is your program. The MAP file is, as its name suggests, a map of your executable - it tells you where things are in the executable.
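As a rough, hedged sketch of that overlap (the file and symbol names are invented), consider what each output would say about a tiny program:

/* main.c -- hypothetical example */
int sample_counter = 7;        /* a MAP file would list sample_counter with its
                                  address, size and section (.data); the ELF
                                  symbol table records the same name, address
                                  and size, and the ELF also holds the bytes    */
char buffer[256];              /* zero-initialised, so it lands in .bss         */

int sample_add(int a, int b)   /* listed under .text in both files; only the
                                  ELF also contains the function's machine code */
{
    return a + b + sample_counter;
}

int main(void)
{
    buffer[0] = (char)sample_add(1, 2);
    return buffer[0];
}

With the GNU toolchain, passing -Wl,-Map=output.map when linking asks the linker to produce the MAP file alongside the ELF, which makes the side-by-side comparison easy.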

In linking, what is the difference between run-time addresses and the addresses in the .text (machine code) section of a relocatable object file?

I am learning about linking right now (self-taught) and I am having some trouble understanding some concepts.
After the preprocessing, compilation, and assembly of a source code file, you get a relocatable object file in ELF format (WLOG). In this ____.o file, there is a .text section that contains the machine code of that individual source file.
Does this machine code correspond to the run-time addresses of the code in the input file? That is, if this machine code were to run (assuming no unresolved external references), would the addresses seen at run time match the addresses encoded in this machine code?
If this is true, is it safe to say that symbol references in this code point to the run-time addresses of their corresponding symbols?
I need to know this so that I can better understand the linking process which happens directly after this process.
Does this machine code correspond to the run-time addresses of the code in the input file?
No.
It can't, because the code in a single .o file doesn't know what other object files will be linked into the main executable. Imagine foo.o saying "I want to be at address 0x123000", and bar.o saying "I want to be at address 0x123004". They clearly can't be both at the same address.
The "final" runtime addresses are determined by the linker, which collects all the different .o files, resolves references between them, and lays out the final executable in memory. (Even this isn't a complete story, as shared libraries and position-independent executables complicate the answer more.)

What's the difference between binary and executable files mentioned in ndisasm's manual?

I want to compile my C file with clang and then disassemble it with ndisasm (for educational purposes). However, ndisasm says in its manual that it only works with binary and not executable files:
ndisasm only disassembles binary files: it has
no understanding of the header information
present in object or executable files. If you
want to disassemble an object file, you should
probably be using objdump(1).
What's the difference, exactly? And what does clang output when I run it with a simple C file, an executable or a binary?
An object file contains machine language code and all sorts of other information. ndisasm wants just the machine code, not the other stuff, so the message is telling you to use the objdump utility, which does understand object and executable file headers and can disassemble the code sections directly. Alternatively, you can extract just the machine code section(s) from the file and then run ndisasm on that.
And what does clang output when I run it with a simple C file, an executable or a binary?
A C compiler is usually able to create a 'raw' binary, which is Just The Code, hold the tomato, because for some (rare!) purposes that can be useful. Think, for instance, of boot sectors (which cannot 'load' an executable the regular way, because the OS that would load them has not started yet) and of programmable ROM chips. An operating system usually does not like to execute 'raw' binary code either, for much the same reasons. An exception is the old flat .com format, which older 32-bit versions of MS Windows could still run.
By default, clang will create an executable. The intermediate files, called object files, are usually deleted after the executable has been linked (glued together with library functions and given an appropriate executable header). To get just a .o object file, use the -c switch.
Note that object files also contain a header. After all, the linker needs to know what the file contains before it can link it to other parts.
For educational purposes, you may want to examine the object file format. Armed with that knowledge it should be possible to write a program that can tell you at what offset in the file the actual code starts. Then you can feed that information into ndisasm.
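For example, on Linux a minimal sketch of such a program could use the structures from <elf.h> to locate the .text section and print its file offset and size (64-bit ELF assumed, error handling mostly omitted):

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;                          /* the ELF header sits at offset 0   */
    fread(&eh, sizeof eh, 1, f);

    Elf64_Shdr *sh = malloc(eh.e_shnum * sizeof *sh);
    fseek(f, (long)eh.e_shoff, SEEK_SET);   /* jump to the section header table  */
    fread(sh, sizeof *sh, eh.e_shnum, f);

    Elf64_Shdr *str = &sh[eh.e_shstrndx];   /* section holding the section names */
    char *names = malloc(str->sh_size);
    fseek(f, (long)str->sh_offset, SEEK_SET);
    fread(names, 1, str->sh_size, f);

    for (int i = 0; i < eh.e_shnum; i++)
        if (strcmp(&names[sh[i].sh_name], ".text") == 0)
            printf(".text starts at file offset 0x%lx, size 0x%lx bytes\n",
                   (unsigned long)sh[i].sh_offset, (unsigned long)sh[i].sh_size);

    fclose(f);
    return 0;
}

With that offset and size in hand, you could carve the raw bytes out of the file (for instance with dd) and feed them to ndisasm with the appropriate -b width.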
In addition to the header, files may contain even more data after the instructions. Again, ndisasm does not know, nor does it care. If your test program contains a string "Hello world!" somewhere at the end, it will happily try to disassemble that as well. It's up to you to recognize this garbage as such and ignore what ndisasm does with it.

What's an object file in C?

I am reading about libraries in C, but I have not yet found an explanation of what an object file is. What is the real difference between any other compiled file and an object file?
I would be glad if someone could explain it in plain language.
An object file is the real output of the compilation phase. It's mostly machine code, but it also carries information that lets a linker see which symbols it defines as well as which symbols it requires in order to work. (For reference, "symbols" are basically the names of global objects, functions, etc.)
A linker takes all these object files and combines them to form one executable (assuming that it can, i.e.: that there aren't any duplicate or undefined symbols). A lot of compilers will do this for you (read: they run the linker on their own) if you don't tell them to "just compile" using command-line options. (-c is a common "just compile; don't link" option.)
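As a small, hedged illustration (the names are invented), here is a translation unit whose object file both provides and requires symbols:

/* stats.c -- hypothetical */
int sample_count = 0;              /* defined here: the .o *provides* this symbol */
double compute_mean(double sum);   /* only declared: the .o *requires* it         */

double record(double sum)
{
    sample_count++;                /* resolved within this object file            */
    return compute_mean(sum);      /* left as an undefined reference that the
                                      linker must satisfy from another object
                                      file or a library                           */
}

Compiling this with the "just compile" option (e.g. gcc -c stats.c) succeeds and produces stats.o; the unresolved compute_mean only becomes an error later, at link time, if nothing defines it.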
An object file is the compiled file itself; there is no difference between the two.
An executable file is formed by linking object files.
An object file contains low-level instructions that can be understood by the CPU. That is why it is also called machine code.
This low-level machine code is the binary representation of the instructions that you could also write directly in assembly language, and then translate from assembly (the human-readable form) into machine language (commonly shown in hex) using an assembler.
Here's a typical high-level flow of this process for code in a high-level language such as C:
--> goes through the pre-processor
--> to give pre-processed (macro-expanded) code, still in C
--> goes through the compiler
--> to give assembly code
--> goes through the assembler
--> to give code in machine language, which is stored in OBJECT FILES
--> goes through the linker
--> to give an executable file.
This flow can have some variations; for example, most compilers can generate the machine code directly, without exposing a separate assembler step, and they run the pre-processor for you. Still, it is useful to break out the constituent steps for a better understanding.
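As a small illustration of those stages (the file name and macro are invented), tracking a single line through the pipeline:

/* hello.c -- hypothetical */
#include <stdio.h>
#define GREETING "hello, world"   /* the pre-processor substitutes this text     */

int main(void)
{
    puts(GREETING);   /* after pre-processing: puts("hello, world");
                         the compiler lowers this to assembly (a call instruction);
                         the assembler encodes it into machine code in hello.o,
                         leaving the address of puts unresolved;
                         the linker finally binds the call to puts from the C
                         library when producing the executable                    */
    return 0;
}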
There are 3 kinds of object files.
1. Relocatable object files:
Contain machine code in a form that can be combined with other relocatable object files at link time, in order to form an executable object file.
If you have an a.c source file, to create its object file with GCC you should run:
gcc a.c -c
The full process would be:
the pre-processor (cpp) runs over a.c;
its output (still C source) feeds into the compiler (cc1);
its output (assembly) feeds into the assembler (as);
the assembler (as) produces the relocatable object file.
That relocatable object file contains:
object code, plus metadata for linking and for debugging (if -g was used);
it is not directly executable.
2. Shared object files:
A special type of relocatable object file that can be loaded dynamically, either at load time or at run time.
Shared libraries are an example of these kinds of objects.
3. Executable object files:
Contain machine code that can be directly loaded into memory (by the loader, e.g. execve) and then executed.
The result of running the linker over multiple relocatable object files is an executable object file. The linker merges all the input object files from the command line, from left to right, by merging all input sections of the same type (e.g. .data) into the output section of the same type. It does this using symbol resolution and relocation.
Bonus: Static vs Dynamic Libraries
When linking against a static library, the functions that are referenced in the input objects are copied into the final executable.
With dynamic libraries, a symbol table is created instead that enables dynamic linking against the library's functions/globals. The result is thus only a partially complete executable object file, because it depends on the library at run time: if the library doesn't exist, the file can no longer execute.
The linking process can be done as follows:
ld a.o -o myexecutable
The command gcc a.c -o myexecutable will invoke all the steps mentioned at point 1 and at point 3 (cpp -> cc1 -> as -> ld [1]).
[1]: actually collect2, which is a wrapper over ld.
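A hedged sketch of the static-versus-dynamic point above, using an invented library and function name:

/* app.c -- hypothetical */
int lib_checksum(const char *s);   /* assumed to live in a library "libcheck"     */

int main(void)
{
    return lib_checksum("payload");
    /* Linked against a static libcheck.a: the machine code of lib_checksum is
       copied into the executable, which then runs with no external dependency.
       Linked against a shared libcheck.so: the executable records only the
       dependency and the symbol name, and the dynamic linker resolves
       lib_checksum at load time; if libcheck.so is missing, the program
       fails to start.                                                            */
}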
An object file is just what you get when you compile one (or several) source file(s).
It can be a fully complete executable or library, or an intermediate file.
Object files typically contain native code, linker information, debugging symbols and so forth.
Object files are code that still depends on other functions, symbols, and data before it can run as a program - much like an old telex machine, which needed another telex machine on the other end before its signals amounted to anything.
In the same way that processors require binary code to run, object files are binary code that has not yet been linked. Linking produces the finished file so that end users do not have to compile the C (or VB, etc.) source themselves; they can simply run the executable once the object files have been linked.

What is the difference between ELF files and bin files?

The final images produced by compilers contain both a BIN file and an ELF (Executable and Linkable Format) file. What is the difference between the two, and in particular what is the utility of the ELF file?
A BIN file is a pure binary file with no memory fix-ups or relocations; more than likely it is expected to be loaded at a specific memory address. Whereas...
ELF files are in the Executable and Linkable Format, which includes symbol and relocation tables; that is, the file can be loaded at any memory address by the kernel, and all the symbols it uses are automatically adjusted to the offset from the memory address where it was loaded. Usually ELF files have a number of sections, such as .data, .text and .bss, to name but a few; it is within those sections that the runtime can work out how to adjust the symbols' memory references dynamically at run time.
A bin file is just the bits and bytes that go into the ROM, or to the particular address from which you will run the program. You can take this data and load it directly as is, but you need to know what the base address is, because that is normally not in there.
An ELF file contains the bin information, but it is surrounded by lots of other information: possible debug info, symbols, and the ability to distinguish code from data within the binary. It allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file, with fill data padding it out to the next block). It tells you how much binary you have and how much .bss data there is that wants to be initialised to zeros (GNU tools have problems creating bin files correctly).
The ELF file format is a standard; ARM publishes its enhancements/variations on the standard. I recommend everyone write an ELF parsing program to understand what is in there. Don't bother with a library; it is quite simple to just use the information and structures in the spec. Doing so helps to work around the GNU problems with creating .bin files, as well as with debugging linker scripts and other things that can mess up your bin or elf output.
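A small sketch of the .data/.bss point above (variable names invented):

/* image.c -- hypothetical */
int calibration = 42;   /* initialised data: its bytes must physically exist in
                           the ELF .data section and in any raw .bin image       */
int scratch[4096];      /* zero-initialised: the ELF merely records "16 KiB of
                           .bss to be zeroed at startup", which costs almost
                           nothing in the file; a raw .bin either omits it
                           (so the startup code must clear that RAM) or pads the
                           image with zero fill, which is the kind of area where
                           the GNU-tools-and-.bin problems mentioned above
                           show up                                               */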
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
The ELF format is generally the default output of compiling.
If you use the GNU toolchain, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or by using the fromELF utility (built into most IDEs, such as ADS):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up, annotated version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building". Hence the GNU Compiler Collection has separate executables: one for the compiler (which technically outputs assembly), another for the assembler (which outputs object code in the ELF format), then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker, which effectively turns an ELF into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed, such as in this case, and Ahmed was clarifying. It's a common problem due to the inexact nature of human language itself.
To avoid confusion: GCC outputs object code (after internally using the assembler) in the ELF format. The linker then simply takes several of them (with an .o extension) and produces a single combined result (by convention "a.out"). All of them, even ".so" files, are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book", where all the files technically use the same standard/format and hence could have had ".docx" as the extension. The bin is then kind of like converting the book into a ".txt" file, while adding as much whitespace as necessary to match the layout of the final book (printed on a single spool), with places for all the pictures to be overlaid.
I just want to correct a point here: the ELF file is produced by the linker, not the compiler.
The compiler's job ends after producing the object files (*.o) from the source code files. The linker then links all the .o files together and produces the ELF.

Resources