Difference between the ELF vs MAP file - linker

The linker can output both the ELF and the MAP file. These files are especially relevant in the embedded systems world because the ELF file is usually used to read out the addresses of variables or functions. Additionally, the ELF file is used by different embedded measurement or analysis tools.
When I open a MAP file, then within it, I can see for each global variable and every external function the following information: allocated address, symbolic name, allocated bytes, memory unit, and memory section.
On the other hand, once I open the ELF file, it is a binary file and not human readable. However, some tools I use are able to read it out and interpret it. Those tools can interpret the ELF file, and obtain the information about the symbolic name of a variable/function and its address or even show a function prototype.
From my understanding the ELF and MAP files are basically containing the same information, it is just that the first one is binary and the latter one is the text file.
So what are the actual differences between these two files from the content perspective?
Thank you in advance!

The primary output of the linker (i.e. its main purpose) is to produce the fully linked executable code. That is the ELF (Executable Linkable Format) file. An ELF file may as you have observed contain symbols - these are used for debug. It may also contain meta-data that associates the machine code with the source code from which it was generated. But the bulk of its content (and the part that is not optional) is the executable machine code and data objects that are your application.
The MAP file is an optional information only human readable output that contains information about the location and size of code and data objects in your application. The MAP file includes a summary that shows the total size and memory usage of your code.
In an embedded cross-development environment, the symbol information in the ELF file is used when the code is loaded into a source-level symbolic debugger. The debugger takes the binary code/data segments in the ELF file and loads them onto the target (typically using a JTAG or other debug/programming hardware tool), it loads the symbols and source-level debug meta-data into the debugger, then while the real machine code is executing on the target, that execution is reflected in the debugger in the original source code where you can view, step and break-point the code at the source level.
In short, the ELF file is your program. The MAP file is, as its name suggests, a map of your executable - it tells you where things are in the executable.

Related

Does a C program load everything to memory?

I've been practicing C today and something came to my mind. Whenever C code is ran, does it load all files needed for execution into memory? Like, does the main.c file and it's header files get copied into memory? What happens if you have a complete C program that takes up 1 GB or something large?
A C program is first compiled into a binary executable so header files, sources files, etc do not exist anymore at this point... unless you compiled your binary with debugging informations (-g flag).
This is a huge topic. Generally the executable is mapped into what's called virtual memory which allows to address more space than you have available in your computer's memory (through paging). When you will try to access code segments that are not yet loaded, it will create a page fault and the os will fetch what's missing. Compilers will often reorder functions to avoid executing code from random memory locations so you're, most of the time, executing only a small part of your binary.
If you look into specific domains such as HPC or embedded devices the loading policies will likely be different.
C is not interpreted but compiled language.
This means that the original *.c source file is never loaded at execution time. Instead, the compiler will process it once, to produce an executable file containing machine language.
Therefore, the size of source file doesn't directly matter. It may totally be very large if it contains a lot of different use cases, then producing a tiny executable because only the applicable case will be picked at compilation time. However, most of the time, the executable size remains correlated with its source, but it doesn't necessarily means that this will end up in something huge.
Also, included *.h headers file at top of C source files are not actually « importing » a dependence (such as use, require, or import would in other languages). #include statement is only here to insert the content of a file at a given point, but these files usually contain only function prototypes, variable declarations and some precompiler #define clauses, which form the API of an external resource that is linked later to your program.
These external resources are typically other object modules (when you have multiple *.c files within a same project and you don't need to recompile them all from scratch at each time), static libraries or dynamic libraries. These later ones are DLL files under Windows and *.so files under Unix. In this case, the operating system will automatically load the required libraries when you run your program.

Linking using map file without having object files

I want to create an operating system for embedded device with very limited resources (an ESP8266) that can load ELF files as program or shared object (shared object is in second importance).
I want to know is it possible to link any program for this OS against map file of OS?
for example I implement memcpy in OS and make a header file that declares it as extern, Compile OS and generate map file. then when i want to write a program, include the header to compile it successfully and make linker to peek the address of memcpy from map file of OS.
the OS is place non-independent and its functions are always at a fixed address, but programs are place independent ELF files. it is not necessary to program be loadable for different builds of OS.
This is by no means a complete solution to the problem of running ELFs on a embedded target but for the specific problem of providing known addresses during the linking process, GNU LD allows you to provide addresses for symbols in code defined as extern by adding a PROVIDE statement or a simple assignment to the linker script. LD won't directly read a map file, but you could parse the map file, find the relevant addresses, generate a linker script that has the appropriate symbols provided, and use that linker script in the compilation of the ELF. The documentation for the provide and assignment features can be found at https://sourceware.org/binutils/docs/ld/Assignments.html

What's the difference between binary and executable files mentioned in ndisasm's manual?

I want to compile my C file with clang and then decompile it with with ndisasm (for educational purposes). However, ndisasm says in it's manual that it only works with binary and not executable files:
ndisasm only disassembles binary files: it has
no understanding of the header information
present in object or executable files. If you
want to disassemble an object file, you should
probably be using objdump(1).
What's the difference, exactly? And what does clang output when I run it with a simple C file, an executable or a binary?
An object file contains machine language code, and all sorts of other information. It sounds like ndisasm wants just the machine code, not the other stuff. So the message is telling you to use the objdump utility to extract just the machine code segment(s) from the object file. Then you can presumably run ndisasm on that.
And what does clang output when I run it with a simple C file, an executable or a binary?
A C compiler is usually able to create a 'raw' binary, which is Just The Code, hold the tomato, because for some (rare!) purposes that can be useful. Think, for instance, of boot sectors (which cannot 'load' an executable the regular way because the OS to load them is not yet started) and of programmable RAM chips. An Operating system in itself usually does not like to execute 'raw binary code' - pretty much for the same reasons. An exception is MS Windows, which still can run old format .com binaries.
By default, clang will create an executable. The intermediate files, called object files, are usually deleted after the executable is linked (glued together with library functions and an appropriate executable header). To get just a .o object file, use the -c switch.
Note that Object files also contain a header. After all, the linker needs to know what the file contains before it can link it to other parts.
For educational purposes, you may want to examine the object file format. Armed with that knowledge it should be possible to write a program that can tell you at what offset in the file the actual code starts. Then you can feed that information into ndisasm.
In addition to the header, files may contain even more data after the instructions. Again, ndisasm does not know and nor does it care. If your test program contains a string Hello world! somewhere at the end, it will happily try to disassemble that as well. It's up to you to recognize this garbage as such, and ignore what ndisasm does to it.

What is the difference between ELF files and bin files?

The final images produced by compliers contain both bin file and extended loader format ELf file ,what is the difference between the two , especially the utility of ELF file.
A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are Executable Linkable Format which consists of a symbol look-ups and relocatable table, that is, it can be loaded at any memory address by the kernel and automatically, all symbols used, are adjusted to the offset from that memory address where it was loaded into. Usually ELF files have a number of sections, such as 'data', 'text', 'bss', to name but a few...it is within those sections where the run-time can calculate where to adjust the symbol's memory references dynamically at run-time.
A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary. Allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). Tells you how much binary you have and how much bss data is there that wants to be initialised to zeros (gnu tools have problems creating bin files correctly).
The elf file format is a standard, arm publishes its enhancements/variations on the standard. I recommend everyone writes an elf parsing program to understand what is in there, dont bother with a library, it is quite simple to just use the information and structures in the spec. Helps to overcome gnu problems in general creating .bin files as well as debugging linker scripts and other things that can help to mess up your bin or elf output.
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as many whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here. ELF file is produced by the Linker, not the compiler.
The Compiler mission ends after producing the object files (*.o) out of the source code files. Linker links all .o files together and produces the ELF.

Object code, linking time in C language

When compiling, C produces object code before linking time.
I wonder if object code is in the form of binary yet?
If so, what happened next in the linking time?
Wikipedia says,
In computer science, an object file is
an organised collection of named
objects, and typically these objects
are sequences of computer instructions
in a machine code format, which may be
directly executed by a computer's CPU.
Object files are typically produced by
a compiler as a result of processing a
source code file. Object files contain
compact code, and are often called
"binaries".
A linker is typically used
to generate an executable or library
by amalgamating parts of object files
together. Object files for embedded
systems typically contain nothing but
machine code but generally, object
files also contain data for use by the
code at runtime: relocation
information, stack unwinding
information, comments, program symbols
(names of variables and functions) for
linking and/or debugging purposes, and
other debugging information.
Another great site has much more detailed info and a useful diagram, here:
Object files as produced by the C compiler essentially contain binary code with holes in each place where an address should go that is yet unknown (addresses of function from other files -- including libraries -- called, addresses of variables from other files that are accessed in this one, ...).
It also contains a table indexed by symbol names ("x" or "_x" for variable x, "f" or "_f" for function f). For each such symbol, there is a status code ("defined here", "not defined here but used", ...) and the addresses of holes in the binary code that need to be filed with each address when it becomes known.
If you are using Unix (or gcc on Windows), you can print the later table with the command "nm file.o".
Yes, object code is usually in binary form. Just try opening it in your favorite text editor.
You can learn what linkers do here or here.

Resources