ARM static relocations for ELF format

Is there a good document explaining, step by step, how to apply static ARM relocations in ELF relocatable files?
I've found this http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf but it's still very confusing. I'm not sure what I'm doing wrong; I keep getting segmentation faults over and over again... Please, can someone help?

I would say the best approach is to study an existing linker. Android Bionic's linker, for example, looks easier to study, at around 2000 lines of code.
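As a concrete starting point, here is a minimal sketch of applying three common static ARM relocation types to a section held in memory. It follows the formulas in the AAELF document linked in the question (R_ARM_ABS32: (S + A) | T; R_ARM_REL32, R_ARM_CALL and R_ARM_JUMP24: ((S + A) | T) - P), and assumes little-endian ARM code and REL (not RELA) entries, so the addend A is read from the place being patched. The function and parameter names are hypothetical, only ARM-state branch targets (T = 0) are handled, and a real linker also needs symbol lookup, BL-to-BLX rewriting for Thumb targets, and many more relocation types.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type codes from the ARM ELF ABI (AAELF). <elf.h> on Linux
 * defines the same values as R_ARM_ABS32, R_ARM_REL32, R_ARM_CALL, R_ARM_JUMP24. */
enum {
    ARM_ABS32  = 2,
    ARM_REL32  = 3,
    ARM_CALL   = 28,
    ARM_JUMP24 = 29
};

/* Read/write a 32-bit little-endian word regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Apply one REL-style relocation (hypothetical helper).
 *   sec      : contents of the section being patched
 *   sec_addr : final address of that section in the output (used for P)
 *   offset   : r_offset from the Elf32_Rel entry
 *   type     : ELF32_R_TYPE(r_info)
 *   S        : final address of the referenced symbol
 *   T        : 1 if the symbol is a Thumb function, else 0
 * Returns 0 on success, -1 for unsupported or out-of-range cases. */
int apply_arm_rel(uint8_t *sec, uint32_t sec_addr, uint32_t offset,
                  uint32_t type, uint32_t S, uint32_t T)
{
    assert(sec);
    uint8_t *where = sec + offset;
    uint32_t P = sec_addr + offset;          /* address of the place */
    uint32_t insn = read_le32(where);

    switch (type) {
    case ARM_ABS32:                          /* (S + A) | T, A = word at P */
        write_le32(where, (S + insn) | T);
        return 0;
    case ARM_REL32:                          /* ((S + A) | T) - P */
        write_le32(where, ((S + insn) | T) - P);
        return 0;
    case ARM_CALL:
    case ARM_JUMP24: {
        if (T)                               /* would need a BL -> BLX rewrite */
            return -1;
        /* A is the sign-extended imm24 field multiplied by 4. */
        int32_t A = (int32_t)((insn & 0x00FFFFFFu) << 8) >> 6;
        int32_t X = (int32_t)(S + (uint32_t)A - P);
        if (X < -(1 << 25) || X >= (1 << 25))  /* +/-32 MiB branch range */
            return -1;
        insn = (insn & 0xFF000000u) | (((uint32_t)X >> 2) & 0x00FFFFFFu);
        write_le32(where, insn);
        return 0;
    }
    default:
        return -1;
    }
}
```

Note that assemblers typically emit an unresolved BL as 0xEBFFFFFE (imm24 = -2, so A = -8), which is how the usual "PC is 8 bytes ahead" adjustment ends up in the result without special-casing it.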

Related

What does linking in the compilation process actually do?

As I understand it, the GCC compiler performs four steps when I compile a C program.
Preprocessing - C code (*.c) with macros to C code without macros (*.i)
Compiling - C code (*.c) to Assembly language (*.s)
Assembling - Assembly language (*.s) to Object code (*.o)
Linking - Object code (*.o) to executable (*)
The first three steps make perfect sense to me, but I am still confused as to what linking actually does.
After step three, why can't I run the *.o file? At that point my C code is now object/machine/byte code and can be interpreted by the CPU directly. Yet when I make my *.o file executable and try to run it, I get this error:
bash: ./helloworld.o: cannot execute binary file: Exec format error
Why do I get this error? If I have a tiny C program (for example a hello world program) with only one C file, it would appear to me that linking has no purpose because there's nothing to link. So what does linking in the compilation process actually do?
Thanks in advance for any replies.
If I have a tiny C program (for example a hello world program)
Even your hello world program uses #include <stdio.h>, doesn't it? That means you're using a library: printf (or whichever stdio function you call) is defined in the C library, not in your helloworld.o. The linking step is there to combine your object code with the necessary library code (or, with a shared library, references to it) to create a binary for you.
For a detailed description of what the linking step does (compared with compiling), see this question.
Roughly, linking does the following:
Find all the matching sections in each object file and concatenate them. This way we end up with one large .text, one .data, one .bss, etc.
Resolve all symbols that are used. Many symbols are local, so they can be resolved immediately. Unresolved symbols are searched for in the libraries you asked to link with. When this is done, the result is a symbol table / link map, and every reference to a symbol in the code and data is patched (relocated) with that symbol's final address.
Write out a file that is actually executable. On Linux, executables, shared libraries and object files all happen to be in the ELF format. This is not true on all platforms.
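The first step above (concatenating matching sections) mostly comes down to giving each object's piece an offset inside the merged output section. Here is a minimal sketch in C, assuming a hypothetical InputSection record that a linker would fill in while parsing each .o file; it lays out same-named sections back to back in input order while respecting each piece's alignment. Real linkers follow a linker script for ordering and give .bss no space in the file, both of which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-object section record, as a linker might build it
 * after parsing each input .o file. */
typedef struct {
    const char *name;     /* e.g. ".text", ".data", ".bss" */
    size_t size;          /* bytes in this object's copy of the section */
    size_t align;         /* required alignment, a power of two */
    size_t out_offset;    /* filled in: offset inside the merged section */
} InputSection;

/* Round value up to the next multiple of align (a power of two). */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Lay out every input section called `name` back to back, in input order.
 * Returns the total size of the merged output section. */
size_t layout_section(InputSection *secs, size_t count, const char *name)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(secs[i].name, name) != 0)
            continue;
        cursor = align_up(cursor, secs[i].align);
        secs[i].out_offset = cursor;     /* this object's piece starts here */
        cursor += secs[i].size;
    }
    return cursor;
}
```

A symbol defined at offset off inside object i's .text then lands at text_base + secs[i].out_offset + off (text_base being wherever the merged .text is placed), and that address is what gets written into each reference during symbol resolution.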
The simple answer is that .o files and executables serve different purposes and have different formats.
If you want the complete answer, you will need to read the documentation for your platform's binary format.
On Linux, that documentation is here. It describes the difference between the intermediate (relocatable) format and the final executable format.
Just as an aside, the Linux kernel module loader does use .o (or rather .ko) files directly.
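One concrete way to see the difference on Linux: both files are ELF, but the header's e_type field is ET_REL (1) for a .o file and ET_EXEC (2) or ET_DYN (3) for something the kernel's exec path will load; anything else is rejected, which is where "Exec format error" comes from. (.ko modules are ET_REL as well, which is why the kernel carries its own loader for them.) Below is a minimal sketch that classifies an ELF image from its first bytes, assuming a Linux/glibc host where <elf.h> is available; the function name elf_kind is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return a short description of an ELF image's type, or NULL if the
 * buffer is not ELF. e_type sits right after the 16-byte e_ident in
 * both 32- and 64-bit headers, so the same offset works for either. */
const char *elf_kind(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return NULL;

    unsigned type;
    if (buf[EI_DATA] == ELFDATA2MSB)            /* big-endian file */
        type = (unsigned)buf[EI_NIDENT] << 8 | buf[EI_NIDENT + 1];
    else                                        /* little-endian file */
        type = buf[EI_NIDENT] | (unsigned)buf[EI_NIDENT + 1] << 8;

    switch (type) {
    case ET_REL:  return "relocatable object";
    case ET_EXEC: return "executable";
    case ET_DYN:  return "shared object or PIE executable";
    case ET_CORE: return "core dump";
    default:      return "other";
    }
}
```

Running `readelf -h helloworld.o` shows the same field as "Type: REL (Relocatable file)".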

Is there a way to find the size of an Eclipse-compiled C binary at runtime?

I'm compiling a C program for an embedded application using Eclipse, but I need the code to know (at runtime) exactly where it ends in flash. What is the simplest way of doing this?
Thanks
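You will need to go into the linker command file and create labels that mark the start and end of the .text section in memory; then, in the code, take the difference. If initialized .data is copied out of flash at startup, its load image typically sits after .text in flash and counts toward the end as well.
You're probably going to have to involve the linker to do this.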
Read up on the tools you're using, and see if (and how) to make the linker set the value of symbols owned by the program.
For the GNU linker, the relevant manual section is Section Data Expressions.
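A minimal sketch of the code side, assuming a GNU toolchain. On a Linux host the default GNU ld script already PROVIDEs `__executable_start` and `etext`, so the example links as-is there; on a bare-metal target you would instead define your own marker symbols in the linker command file (the names `__flash_image_start` and `__flash_image_end` in the comment are hypothetical, as is the FLASH memory region) and declare those. Only the symbols' addresses are meaningful: they label positions, they are not variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linker-provided symbols: they have an address but no storage of their own.
 * With the GNU ld default script on Linux these two are PROVIDEd for you.
 * On a bare-metal target, define equivalents in the linker command file,
 * e.g. (hypothetical names, FLASH region assumed to exist):
 *
 *   .text : {
 *       __flash_image_start = .;
 *       *(.text*) *(.rodata*)
 *   } > FLASH
 *   ...
 *   __flash_image_end = LOADADDR(.data) + SIZEOF(.data);
 */
extern const char __executable_start[];
extern const char etext[];

/* Bytes from the start of the image to the end of the text segment. */
size_t text_image_size(void)
{
    return (size_t)((uintptr_t)etext - (uintptr_t)__executable_start);
}
```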

GNU Binutils' Binary File Descriptor library - format example

As in the title. I tried reading BFD's ELF code, but it's not light reading. I also tried to get something out of the documentation, but I would need an example to see how it works. Could anyone point me to an easier example, so I can learn how to define an executable format?
Edit: It looks like I didn't phrase the question properly. I'm not asking "how do I create my own executable format specification?" or "where is good ELF documentation?", but "how can I implement my own executable format using GNU BFD?".
Did you look here http://sourceware.org/binutils/docs-2.21/bfd/index.html and here http://sourceware.org/binutils/binutils-porting-guide.txt?
Also studying the MMO implementation of a BFD backend as mentioned here http://sourceware.org/binutils/docs-2.21/bfd/mmo.html#mmo (source: http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/mmo.c?cvsroot=src) might be less complex than starting with ELF ... ;-)
I agree that BFD documentation is somewhat lacking. Here are some better sources:
ELF Format
System V ABI (section 4)
Here are a couple of readable introductions:
Linux Journal
Dr. Dobbs
And some examples that don't use libbfd:
ELF IO
ELF Toolchain
LibELF
The DOS COM file is the simplest possible format.
Load up to 64 KiB less 256 bytes (the first 256 bytes of the segment hold the PSP) at seg:0100h, set DS, ES, SS = seg, SP = FFFEh (with a zero word stored there) and jump to seg:0100h.
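To make the COM description concrete, here is a minimal sketch that "loads" a COM image into a simulated 64 KiB segment, assuming the usual DOS conventions: the Program Segment Prefix (PSP) occupies offsets 0 to FFh and starts with an INT 20h instruction, and a zero word sits at the top of the stack so that a near RET from the program lands on that INT 20h. The Regs8086 struct and load_com function are hypothetical; no real CPU state is touched, and a real PSP contains much more than two bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_SIZE 0x10000u               /* one real-mode segment: 64 KiB */
#define PSP_SIZE 0x100u                 /* Program Segment Prefix precedes the image */
#define COM_MAX  (SEG_SIZE - PSP_SIZE)  /* 0xFF00 = 65280 bytes */

/* Hypothetical register snapshot for a simulated 8086. */
typedef struct {
    uint16_t cs, ds, es, ss;
    uint16_t ip, sp;
} Regs8086;

/* Load a COM image into `mem` (a SEG_SIZE buffer standing for segment `seg`).
 * Returns 0 on success, -1 if the image is too large. */
int load_com(uint8_t *mem, uint16_t seg, const uint8_t *image, size_t len,
             Regs8086 *r)
{
    if (len > COM_MAX)
        return -1;

    memset(mem, 0, SEG_SIZE);         /* also leaves a zero word at FFFEh */
    mem[0] = 0xCD;                    /* PSP:0000 holds INT 20h (terminate) */
    mem[1] = 0x20;
    memcpy(mem + PSP_SIZE, image, len);   /* code starts at offset 100h */

    r->cs = r->ds = r->es = r->ss = seg;  /* everything in one segment */
    r->ip = PSP_SIZE;
    r->sp = 0xFFFE;                   /* RET pops 0000h and jumps to PSP:0000 */
    return 0;
}
```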

How to extract C source code from .so file?

I am working on previously developed software whose source code was compiled into Linux shared libraries (.so), and the source code itself is not available. Is there any tool that can extract the source code from these shared libraries?
Thanks,
Ravi
There isn't. Once you compile your code there is no trace of it left in the binary, only machine code.
Some may mention decompilers but those don't extract the source, they analyze the executable and produce some source that should have the same effect as the original one did.
You can try disassembling the object code to get the machine code mnemonics.
objdump -D --disassembler-options intel sjt.o to get Intel syntax assembly
objdump -D --disassembler-options att sjt.o or objdump -D sjt.o to get AT&T syntax assembly
But the original source code cannot be recovered. You could try to reverse the process by studying the disassembly and reconstructing it section by section, but it would be extremely painful.
Disclaimer: I work for Hex-Rays SA.
The Hex-Rays decompiler is the only commercially available decompiler I know of that works well with real-life x86 and ARM code. It's true that you don't get the original source, but you get something which is equivalent to it. If you didn't strip your binary, you might even get the function names, or, with some luck, even types and local variables. However, even if you don't have symbol info, you don't have to stick to the first round of decompilation. The Hex-Rays decompiler is interactive - you can rename any variable or function, change variable types, create structure types to represent the structures in the original code, add comments and so on. With a little work you can recover a lot. And quite often what you need is not the whole original file, but some critical algorithm or function - and this Hex-Rays can usually provide to you.
Have a look at the demo videos and the comparison pages. Still think "staring at the assembly" is the same thing?
No. In general, this is impossible. Source is not packaged in compiled objects or libraries.
You cannot. But you can open the .so as an archive in 7-Zip, which shows the type and size of each file inside it separately, and you can replace those files with your own.

How to write a linker

I have written a compiler for C that outputs byte code. The reason for this was to be able to write applications for an embedded platform that runs on multiple platforms.
I have the compiler and the assembler.
I need to write a linker, and am stuck.
The object format is a custom one, designed around the byte code interpreter, so I can't really use any existing linkers.
My biggest hurdle is how to organize the object code to output the linked binary.
Dynamic linking is not necessary, at this time.
I need to get static linking working first.
Ian Lance Taylor, one of the main developers of the gold linker (now part of binutils), wrote a series of blog posts on how linkers work. You can find them here.
http://linker.iecc.com (John R. Levine's Linkers and Loaders) is the only book I know of on this subject.
I second the Linkers and Loaders book. You state that your object format is a custom one. If the format is under your control, you could consider using the ELF format with your bytecode as a new machine architecture, a la x86, SPARC, ARM, etc. The GNU binutils sources are sufficiently malleable to allow you to incorporate your "architecture".
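For the "how do I organize the output" part, a common structure for a static linker is two passes: first assign every object's code a base address (which also fixes every symbol's address), then copy the code into the output image and patch each relocation with the resolved address. Below is a minimal sketch of that in C for a hypothetical custom format; ObjFile, Symbol and Reloc are made-up structures, not the asker's actual format. Each object is a blob of bytecode plus the symbols it defines and the 4-byte slots that need a symbol's absolute address. It does no alignment, no section merging, no duplicate-definition check, and only a linear symbol search.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical custom object format. */
typedef struct { const char *name; uint32_t offset; } Symbol;  /* defined here */
typedef struct { uint32_t offset; const char *target; } Reloc; /* 32-bit abs slot */

typedef struct {
    const uint8_t *code; uint32_t code_size;
    const Symbol *syms;  size_t nsyms;
    const Reloc *relocs; size_t nrelocs;
    uint32_t base;       /* filled in by pass 1 */
} ObjFile;

/* Find the final address of a symbol across all objects, -1 if undefined. */
static int64_t resolve(const ObjFile *objs, size_t nobjs, const char *name)
{
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (strcmp(objs[i].syms[j].name, name) == 0)
                return (int64_t)objs[i].base + objs[i].syms[j].offset;
    return -1;
}

/* Link objects into one image placed at load_addr. Returns a malloc'd
 * buffer (size in *out_size), or NULL on an undefined symbol or
 * allocation failure. */
uint8_t *link_objects(ObjFile *objs, size_t nobjs, uint32_t load_addr,
                      uint32_t *out_size)
{
    /* Pass 1: lay objects out back to back and record each base address. */
    uint32_t total = 0;
    for (size_t i = 0; i < nobjs; i++) {
        objs[i].base = load_addr + total;
        total += objs[i].code_size;
    }

    uint8_t *image = malloc(total ? total : 1);
    if (!image)
        return NULL;

    /* Pass 2: copy code, then patch each relocation with the resolved address. */
    for (size_t i = 0; i < nobjs; i++) {
        uint8_t *dst = image + (objs[i].base - load_addr);
        memcpy(dst, objs[i].code, objs[i].code_size);
        for (size_t k = 0; k < objs[i].nrelocs; k++) {
            int64_t addr = resolve(objs, nobjs, objs[i].relocs[k].target);
            if (addr < 0) {               /* undefined symbol: link error */
                free(image);
                return NULL;
            }
            uint8_t *slot = dst + objs[i].relocs[k].offset;
            for (int b = 0; b < 4; b++)   /* store little-endian */
                slot[b] = (uint8_t)((uint64_t)addr >> (8 * b));
        }
    }
    *out_size = total;
    return image;
}
```

Static libraries fit into this scheme by pulling in only those archive members that define a symbol that is still undefined after the objects given explicitly.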

Resources