some confuse about elf.h

some confuse about elf.h - c

Does elf.h have anything to do with compiler?
I'm a little confuse about the elf.h. I know the elf.h's most use case is when parse a ELF file. But I can still parse a ELF file byte by byte even don't use elf.h at all, right？ Although I know it' a little ugly.
Does all the ELF file follow the same format(firstly ELF header, then sections, section table, etc.)?

elf.h is a header file that defines the format of ELF executable binary files, including executables, relocatable object files, core files, and shared libraries. ELF doesn't have anything to do with the compiler itself, but compilers must output binaries in a File Format that can be linked/loaded/executed by the operating system. Therefore, most compilers generating binaries in the Executable and Linkable Format (ELF) include elf.h and use it to generate ELF binaries (otherwise the results may be non-standard). For example, GCC includes elf.h for these reasons.
The purpose of elf.h is to well-define the ELF format such that ELF binaries can be deterministically generated and parsed. You can absolutely create your own ELF parser, and it can be a great way to learn more about ELF.
Since ELF is a standard format it is well-defined. In other words, yes ELF binaries should follow the same overall structure, although the contents will clearly differ. It may be helpful to think of ELF as a binary wrapper that encapsulates sections of your code so that programs can be linked/loaded/executed consistently. If such a standard did not exist, it would be very difficult (or near impossible) for a loader to run binaries as we do today. Dynamically linking would pretty much be impossible.

Related

What's the difference between binary and executable files mentioned in ndisasm's manual?

I want to compile my C file with clang and then decompile it with with ndisasm (for educational purposes). However, ndisasm says in it's manual that it only works with binary and not executable files:
ndisasm only disassembles binary files: it has
no understanding of the header information
present in object or executable files. If you
want to disassemble an object file, you should
probably be using objdump(1).
What's the difference, exactly? And what does clang output when I run it with a simple C file, an executable or a binary?

An object file contains machine language code, and all sorts of other information. It sounds like ndisasm wants just the machine code, not the other stuff. So the message is telling you to use the objdump utility to extract just the machine code segment(s) from the object file. Then you can presumably run ndisasm on that.

And what does clang output when I run it with a simple C file, an executable or a binary?
A C compiler is usually able to create a 'raw' binary, which is Just The Code, hold the tomato, because for some (rare!) purposes that can be useful. Think, for instance, of boot sectors (which cannot 'load' an executable the regular way because the OS to load them is not yet started) and of programmable RAM chips. An Operating system in itself usually does not like to execute 'raw binary code' - pretty much for the same reasons. An exception is MS Windows, which still can run old format .com binaries.
By default, clang will create an executable. The intermediate files, called object files, are usually deleted after the executable is linked (glued together with library functions and an appropriate executable header). To get just a .o object file, use the -c switch.
Note that Object files also contain a header. After all, the linker needs to know what the file contains before it can link it to other parts.
For educational purposes, you may want to examine the object file format. Armed with that knowledge it should be possible to write a program that can tell you at what offset in the file the actual code starts. Then you can feed that information into ndisasm.
In addition to the header, files may contain even more data after the instructions. Again, ndisasm does not know and nor does it care. If your test program contains a string Hello world! somewhere at the end, it will happily try to disassemble that as well. It's up to you to recognize this garbage as such, and ignore what ndisasm does to it.

AVR-GCC file output structure

I already searched the web, especially the avr-gcc website.
I want to know the STRUCTURE of the output file, of sourcecode, compiled with avr-gcc.
Example of a standard Microsoft .EXE file:
00h DW Signature word.
"N" is low-order byte.
"E" is high-order byte.
02h DB Version number of the linker.
03h DB Revision number of the linker.
Can someone please tell me the avr-gcc output file structure?
Thank you. -MW
edit:
As Rev1.0 said, it's the Intel-HEX format.

Since you talk about AVR, you probably mean the format of the HEX-file? That is encoded as Intel HEX format.
EDIT Regarding your question from the comment:
I see the Header has the field "DATA". What exactly is in that field? The pure assembly?
Each line of the HEX file is called a "record". There are several types of records. Depending on the record type, the data contents have a different meaning. A "data record" holds the actual firmware/program data. That is lowest level machine code, not assembly. It represents exactly the data that resides in the flash memory after the device has been programmed.

Can someone please tell me the avr-gcc output file structure?
avr-gcc is just a driver program that calls sub-processes on different files and file formats. For example, the compiler proper reads pre-processed input (text) and writes assembly (text).
The GNU assembler reads assembly (*.s, text) as generated by the compiler and writes object files (*.o, ELF32)
The GNU linker / locator reads this object files and resolves references to libraries like libgcc (*.a, ELF32) and produces a final executable (ELF32).
Depending on your loader, you can use ELF directly (for example with avrdude).
If you want something else like Intex HEX or plain binary as an AVR sees it, you'll convert ELF32 to the desired output by means of avr-objcopy from GNU Binutils.
In general, one wants to keep ELF as long as possible because formats like IHEX are "dumb". They don't have additional information like symbol info or debug info, these formats are only used to upload a program to an AVR, but most modern tools understand ELF as well.

Usage differences between. a.out, .ELF, .EXE, and .COFF

Don't get me wrong by looking at the question title - I know what they are (format for portable executable files). But my interest scope is slightly different
MY CONFUSION
I am involved in re-hosting/retargeting applications that are originally from third parties. The problem is that sometimes the formats for object codes are also in .elf, .COFF formats and still says, "Executable and linkable".
I am primarily a Windows user and know that when you compile and assemble your C/C++ code, you get something similar to .o or .obj. that are not executable (well, I never tried to execute them). But when you complete linking static and dynamic libraries and finish building, the executable appears. My understanding is that you can then go about and link that executable or "bash" test it with some form of script if necessary.
However, in Linux (or UNIX-like systems) there are .o files after you compile and assemble the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib. In my quick web search none of the sources mentioned anything about .o files as executables.
QUESTIONS
Therefore my question turns into the followings:
What is the true definitions for portable executables and object code?
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Am I misinterpreting "Linkable"? My interpretation of "Linkable" is something that is compiled object code and can then be "linked" to other static/dynamic link libraries. Is this a stupid thought?
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the executables/object files on a different machine.
Thanks in advance for the right nudges. Meanwhile, I will keep digging and update the question if necessary.
UPDATE
I have managed to dig this out from somewhere :( Seems like a lot to swallow to me.

I am primarily a Windows user and know that when you compile your C/C++ code, you get something similar to .o or .obj. that are not executable
Well, last time I compiled stuff on Windows, the result of the compilation was an .obj file, which is exactly what its name suggests: it's an object file. You're right in that it's not an executable in itself. It contains machine code which doesn't (yet) contain enough information to be directly run on the CPU.
However, in Linux (or UNIX-like systems) there are .o files after you compile the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib.
Living in the 90's, that is :P No modern compilers I am aware of target the a.out format as their default output format for object code. Maybe it's a misleading default of GCC to put the object code into a file called a.out when no explicit output file name is specified, but if you run the file command on a.out, you'll find out that it's an ELF file. The a.out format is ancient and it's kind of "de facto obsolete".
What is the true definitions for portable executables and object code?
You've already got the Wikipedia link to object files, here's the one to "Portable Executable".
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Because the ELF format (and apparently COFF too) has been designed like so. And why not? It's just the very same machine code after all, it seems quite logical to use one file format during all the compilation steps. Just like we don't like when dynamic libraries and stand-alone executables have a different format. (That's why ELF is called ELF - it's an "Executable and Linkable Format".)
Am I misinterpreting "Linkable"?
I don't know. From your question it's not clear to me what you think "linkable" is. In general, it means that it's a file that can be linked against, i. e. a library.
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the object files on a different machine.
I think this one is not related to the executable format used. If you want to debug, you have to generate debugging information no matter what. But if you don't need to debug, then you're free to omit them of course.

Modify ELF file

I have an ELF executable and I would like to know how can I modify its .rodata segment.
Also, more generally, how can I modify an ELF executable?

You can use any hexeditor to do that, if you know precisely which part of ELF you need to modify.
If you want to parse ELFs and do more complex logic you should write some code which will open file or better, mmap it. Then you can read ELF header which gives basic information about ELF and points to other important places in ELF. I suggest reading manual for ELF and <include/elf.h>.
If you are using Linux, you can view where sections lie in memory using readelf or objdump.

What is the difference between ELF files and bin files?

The final images produced by compliers contain both bin file and extended loader format ELf file ,what is the difference between the two , especially the utility of ELF file.

A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are Executable Linkable Format which consists of a symbol look-ups and relocatable table, that is, it can be loaded at any memory address by the kernel and automatically, all symbols used, are adjusted to the offset from that memory address where it was loaded into. Usually ELF files have a number of sections, such as 'data', 'text', 'bss', to name but a few...it is within those sections where the run-time can calculate where to adjust the symbol's memory references dynamically at run-time.

A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary. Allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). Tells you how much binary you have and how much bss data is there that wants to be initialised to zeros (gnu tools have problems creating bin files correctly).
The elf file format is a standard, arm publishes its enhancements/variations on the standard. I recommend everyone writes an elf parsing program to understand what is in there, dont bother with a library, it is quite simple to just use the information and structures in the spec. Helps to overcome gnu problems in general creating .bin files as well as debugging linker scripts and other things that can help to mess up your bin or elf output.

some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]

bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as many whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.

I just want to correct a point here. ELF file is produced by the Linker, not the compiler.
The Compiler mission ends after producing the object files (*.o) out of the source code files. Linker links all .o files together and produces the ELF.