Difference in Object Files formats - linker

What is the difference between COFF (Common Object File Format) and ELF (Extendable and Linkable Format), other than they being for distinct platforms??
If I have C Compiler, the Object files that my Linker in the toolchain will receive is of COFF type, on Windows. If its linux, it will be ELF.
What is the thing that makes them two different things? Firstly I thought about the Executable file it(Linker) will generate depending upon the type of Object file it received. Is it so?

What is the thing that makes it two different things?
It's the same difference as that between an XML file and a GIF file -- they have completely different rules about their internal structure. The former must start with <?xml version=..., the latter with a string GIF87a or GIF89a.

Related

Usage differences between. a.out, .ELF, .EXE, and .COFF

Don't get me wrong by looking at the question title - I know what they are (format for portable executable files). But my interest scope is slightly different
MY CONFUSION
I am involved in re-hosting/retargeting applications that are originally from third parties. The problem is that sometimes the formats for object codes are also in .elf, .COFF formats and still says, "Executable and linkable".
I am primarily a Windows user and know that when you compile and assemble your C/C++ code, you get something similar to .o or .obj. that are not executable (well, I never tried to execute them). But when you complete linking static and dynamic libraries and finish building, the executable appears. My understanding is that you can then go about and link that executable or "bash" test it with some form of script if necessary.
However, in Linux (or UNIX-like systems) there are .o files after you compile and assemble the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib. In my quick web search none of the sources mentioned anything about .o files as executables.
QUESTIONS
Therefore my question turns into the followings:
What is the true definitions for portable executables and object code?
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Am I misinterpreting "Linkable"? My interpretation of "Linkable" is something that is compiled object code and can then be "linked" to other static/dynamic link libraries. Is this a stupid thought?
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the executables/object files on a different machine.
Thanks in advance for the right nudges. Meanwhile, I will keep digging and update the question if necessary.
UPDATE
I have managed to dig this out from somewhere :( Seems like a lot to swallow to me.
I am primarily a Windows user and know that when you compile your C/C++ code, you get something similar to .o or .obj. that are not executable
Well, last time I compiled stuff on Windows, the result of the compilation was an .obj file, which is exactly what its name suggests: it's an object file. You're right in that it's not an executable in itself. It contains machine code which doesn't (yet) contain enough information to be directly run on the CPU.
However, in Linux (or UNIX-like systems) there are .o files after you compile the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib.
Living in the 90's, that is :P No modern compilers I am aware of target the a.out format as their default output format for object code. Maybe it's a misleading default of GCC to put the object code into a file called a.out when no explicit output file name is specified, but if you run the file command on a.out, you'll find out that it's an ELF file. The a.out format is ancient and it's kind of "de facto obsolete".
What is the true definitions for portable executables and object code?
You've already got the Wikipedia link to object files, here's the one to "Portable Executable".
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Because the ELF format (and apparently COFF too) has been designed like so. And why not? It's just the very same machine code after all, it seems quite logical to use one file format during all the compilation steps. Just like we don't like when dynamic libraries and stand-alone executables have a different format. (That's why ELF is called ELF - it's an "Executable and Linkable Format".)
Am I misinterpreting "Linkable"?
I don't know. From your question it's not clear to me what you think "linkable" is. In general, it means that it's a file that can be linked against, i. e. a library.
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the object files on a different machine.
I think this one is not related to the executable format used. If you want to debug, you have to generate debugging information no matter what. But if you don't need to debug, then you're free to omit them of course.

What's an object file in C?

I am reading about libraries in C but I have not yet found an explanation on what an object file is. What's the real difference between any other compiled file and an object file?
I would be glad if someone could explain in human language.
An object file is the real output from the compilation phase. It's mostly machine code, but has info that allows a linker to see what symbols are in it as well as symbols it requires in order to work. (For reference, "symbols" are basically names of global objects, functions, etc.)
A linker takes all these object files and combines them to form one executable (assuming that it can, i.e.: that there aren't any duplicate or undefined symbols). A lot of compilers will do this for you (read: they run the linker on their own) if you don't tell them to "just compile" using command-line options. (-c is a common "just compile; don't link" option.)
An Object file is the compiled file itself. There is no difference between the two.
An executable file is formed by linking the Object files.
Object file contains low level instructions which can be understood by the CPU. That is why it is also called machine code.
This low level machine code is the binary representation of the instructions which you can also write directly using assembly language and then process the assembly language code (represented in English) into machine language (represented in Hex) using an assembler.
Here's a typical high level flow for this process for code in High Level Language such as C
--> goes through pre-processor
--> to give optimized code, still in C
--> goes through compiler
--> to give assembly code
--> goes through an assembler
--> to give code in machine language which is stored in OBJECT FILES
--> goes through Linker
--> to get an executable file.
This flow can have some variations for example most compilers can directly generate the machine language code, without going through an assembler. Similarly, they can do the pre-processing for you. Still, it is nice to break up the constituents for a better understanding.
There are 3 kind of object files.
1. Relocatable object files:
Contain machine code in a form that can be combined with other relocatable object files at link time, in order to form an executable object file.
If you have an a.c source file, to create its object file with GCC you should run:
gcc a.c -c
The full process would be:
preprocessor (cpp) would run over a.c
Its output (still source; cpp) will feed into the compiler (cc1).
Its output (assembly) will feed into the assembler (as)
assembler (as) will produce the relocatable object file.
That relocatable object file contains:
object code, and metadata for linking, and debugging (if -g was used)
it is not directly executable.
2. Shared object files:
Special type of relocatable object file that can be loaded dynamically, either at load time, or at run time.
Shared libraries are an example of these kinds of objects.
3. Executable object files:
Contain machine code that can be directly loaded into memory (by the loader, e.g execve) and subsequently executed.
The result of running the linker over multiple relocatable object files is an executable object file. The linker merges all the input object files from the command line, from left-to-right, by merging all the same-type input sections (e.g. .data) to the same-type output section. It uses symbol resolution and relocation.
Bonus: Static vs Dynamic Libraries
When linking against a static library the functions that are referenced in the input objects are copied to the final executable.
With dynamic libraries a symbol table is created instead that will enable a dynamic linking with the library's functions/globals. Thus, the result is a partially executable object file, as it depends on the library. If the library doesn't exist, the file can no longer execute.
The linking process can be done as follows:
ld a.o -o myexecutable
The command: gcc a.c -o myexecutable will invoke all the commands mentioned at point 1 and at point 3 (cpp -> cc1 -> as -> ld1)
1: actually is collect2, which is a wrapper over ld.
An object file is just what you get when you compile one (or several) source file(s).
It can be either a fully completed executable or library, or intermediate files.
The object files typically contain native code, linker information, debugging symbols and so forth.
Object files are codes that are dependent on functions, symbols, and text to run the program. Just like old telex machines, which required teletyping to send signals to other telex machine.
In the same way processor's require binary code to run, object files are like binary code but not linked. Linking creates additional files so that the user does not have to have compile the C language themselves. Users can directly open the exe file once the object file is linked with some compiler like c language , or vb etc.

What is the difference between a .o file and a .lib file?

What is the difference between a .o file and a .lib file?
Conceptually, a compilation unit (the unit of code in a source file/object file) is either linked entirely or not at all. While some implementations, with significant levels of cooperation between the compiler and linker, are able to remove unused code from object files at link time, it doesn't change the issue that including 2 compilation units with conflicting symbol names in a program is an error.
As a practical example, suppose your library has two functions foo and bar and they're in an object file together. If I want to use bar, but my program already has an external symbol named foo, I'm stuck with an error. Even if or how the implementation might be able to resolve this problem for me, the code is still incorrect.
On the other hand, if I have a library file containing two separate object files, one with foo and the other with bar, only the one containing bar will get pulled into my program.
When writing libraries, you should avoid including multiple functions in the same object file unless it's essential that they be used together. Doing so will bloat up applications which link your library (statically) and increase the likelihood of symbol conflicts. Personally I prefer erring on the side of separate files when there's a doubt - it's even useful to put foo_create and foo_free in separate files if the latter is nontrivial so that short one-off programs that don't need to call foo_free can avoid pulling in the code for deep freeing (and possibly even avoid pulling in the implementation of free itself).
A .LIB file is a collection of .OBJ
files concatenated together with an
index. There should be no difference
in how the linker treats either.
Quoted from here:
What is the difference between .LIB and .OBJ files? (Visual Studio C++)
They are actually quite different, specially with older linkers.
The .o (or .obj) files are object files, they contain the output of the compiler generated code. It is still in an intermediate format, for example, most references are still unresolved. Usually there is a one to one mapping between the source file and the object file.
The .a (or .lib) files are archives, also known as library, and are a set of object files.
All operating systems have tools that allow you to add/remove/list object files to library files.
Another difference, specially with older linkers is how the files are dealt with, when linking them. Some linked will place the complete object file into the final binary, regardless of what is actually being used, while they will only extract the useful information out of library files.
Nowadays most linkers are smart enough to remove all stuff that is not being used.

What is the difference between ELF files and bin files?

The final images produced by compliers contain both bin file and extended loader format ELf file ,what is the difference between the two , especially the utility of ELF file.
A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are Executable Linkable Format which consists of a symbol look-ups and relocatable table, that is, it can be loaded at any memory address by the kernel and automatically, all symbols used, are adjusted to the offset from that memory address where it was loaded into. Usually ELF files have a number of sections, such as 'data', 'text', 'bss', to name but a few...it is within those sections where the run-time can calculate where to adjust the symbol's memory references dynamically at run-time.
A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary. Allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). Tells you how much binary you have and how much bss data is there that wants to be initialised to zeros (gnu tools have problems creating bin files correctly).
The elf file format is a standard, arm publishes its enhancements/variations on the standard. I recommend everyone writes an elf parsing program to understand what is in there, dont bother with a library, it is quite simple to just use the information and structures in the spec. Helps to overcome gnu problems in general creating .bin files as well as debugging linker scripts and other things that can help to mess up your bin or elf output.
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as many whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here. ELF file is produced by the Linker, not the compiler.
The Compiler mission ends after producing the object files (*.o) out of the source code files. Linker links all .o files together and produces the ELF.

Object code, linking time in C language

When compiling, C produces object code before linking time.
I wonder if object code is in the form of binary yet?
If so, what happened next in the linking time?
Wikipedia says,
In computer science, an object file is
an organised collection of named
objects, and typically these objects
are sequences of computer instructions
in a machine code format, which may be
directly executed by a computer's CPU.
Object files are typically produced by
a compiler as a result of processing a
source code file. Object files contain
compact code, and are often called
"binaries".
A linker is typically used
to generate an executable or library
by amalgamating parts of object files
together. Object files for embedded
systems typically contain nothing but
machine code but generally, object
files also contain data for use by the
code at runtime: relocation
information, stack unwinding
information, comments, program symbols
(names of variables and functions) for
linking and/or debugging purposes, and
other debugging information.
Another great site has much more detailed info and a useful diagram, here:
Object files as produced by the C compiler essentially contain binary code with holes in each place where an address should go that is yet unknown (addresses of function from other files -- including libraries -- called, addresses of variables from other files that are accessed in this one, ...).
It also contains a table indexed by symbol names ("x" or "_x" for variable x, "f" or "_f" for function f). For each such symbol, there is a status code ("defined here", "not defined here but used", ...) and the addresses of holes in the binary code that need to be filed with each address when it becomes known.
If you are using Unix (or gcc on Windows), you can print the later table with the command "nm file.o".
Yes, object code is usually in binary form. Just try opening it in your favorite text editor.
You can learn what linkers do here or here.

Resources