When inspecting an object file (e.g. one of those in /usr/lib32 or /usr/lib) with readelf -r <object file>, it seems that the 32-bit variants do not have an addend field, while the 64-bit ones do. My guess is that this is to fix up the address layout for non-32-bit systems, e.g. x86_64 or ARM. Is that correct?
The distinction between RELA (explicit addend) vs. REL (addend stored at modified offset) relocations is mostly historic. It all started with REL to save space but most modern architectures use RELA to speed up linking. Theoretically static/dynamic linkers should support both REL and RELA or any mix thereof (see e.g. discussion here).
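To make the difference concrete, these are the two record layouts as declared in <elf.h> (the 32-bit REL and 64-bit RELA forms are shown here; both widths exist for both kinds, and the field comments are mine):

    /* REL: no addend field; the addend is whatever already sits in the
     * bytes at r_offset in the section being relocated. */
    typedef struct {
        Elf32_Addr r_offset;   /* where to apply the relocation        */
        Elf32_Word r_info;     /* symbol table index + relocation type */
    } Elf32_Rel;

    /* RELA: the addend travels in the relocation record itself. */
    typedef struct {
        Elf64_Addr   r_offset;
        Elf64_Xword  r_info;
        Elf64_Sxword r_addend; /* explicit addend */
    } Elf64_Rela;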
I'm writing an ARMv7E-M Thumb2 binary analysis tool, and decoding the instruction stream manually.
arm-gcc, invoked with the -mcpu=cortex-m4 and -mfloat-abi=hard flags, emitted the following instruction while compiling my C code:
40280: eeb8 7a47 vcvt.f32.u32 s14, s14
I can't find this specific encoding in the ARMv7-M Architecture Reference Manual, though.
The closest I can find is A7.7.226 VCVT, pictured below, but bit 1 of word 0 is set to 1 in the specification, whereas it is 0 in eeb8.
Which instruction and encoding is the compiler selecting for eeb8 7a47? Where can I find the documentation for this specific encoding?
Uhh, well, a different revision of the ARMv7-M ARM has an encoding that matches what the compiler is emitting. I'm not sure yet what the difference between the revisions is, but I'm posting the matching version here and marking it as the answer.
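For anyone double-checking: the bits of eeb8 7a47 do line up with the VCVT (between floating-point and integer) field layout from that revision. Below is a minimal C sketch of the field split, assuming the encoding diagram hw1 = 1110 1110 1 D 111 opc2, hw2 = Vd 101 sz op 1 M 0 Vm (field names follow the ARM ARM; verify against your manual revision):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint16_t hw1 = 0xEEB8, hw2 = 0x7A47;

        unsigned D    = (hw1 >> 6) & 1;
        unsigned opc2 =  hw1       & 7;    /* 000 -> convert from integer  */
        unsigned Vd   = (hw2 >> 12) & 0xF;
        unsigned sz   = (hw2 >> 8)  & 1;   /* 0 -> single precision (.f32) */
        unsigned op   = (hw2 >> 7)  & 1;   /* 0 -> unsigned source (.u32)  */
        unsigned M    = (hw2 >> 5)  & 1;
        unsigned Vm   =  hw2        & 0xF;

        /* For single precision the register number is Vx:x, i.e. (Vx<<1)|x. */
        printf("opc2=%u sz=%u op=%u  d=s%u m=s%u\n",
               opc2, sz, op, (Vd << 1) | D, (Vm << 1) | M);
        return 0;
    }

This prints opc2=0 sz=0 op=0 d=s14 m=s14, i.e. vcvt.f32.u32 s14, s14, which matches the disassembly above.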
I have an issue with an ELF file generated by the GNU linker ld.
The result is that the data section (.data) gets corrupted when the executable is loaded into memory. The corruption to the .data section occurs when the loader performs the relocation on the .eh_frame section using the relocation data (.rela.eh_frame).
What happens is that this relocation causes seven writes that land beyond the end of the .eh_frame section and overwrite the correct contents of the .data section, which is adjacent to the top of the .eh_frame section.
After some investigation, I believe the loader is behaving correctly, but the ELF file it has been given contains an error.
But I could be wrong and wanted to check what I've found so far.
Using readelf on the ELF file, it can be seen that seven of the entries in the .rela.eh_frame section contain offsets that are outside (above) the range given by readelf for the .eh_frame section, i.e. the seven offsets in .rela.eh_frame are greater than the length given for .eh_frame. When these seven offsets are applied during relocation, they corrupt the .data section.
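Here is a rough sketch of the check I'm describing, using the raw structures from <elf.h>. It assumes a 64-bit relocatable object, where r_offset is a section-relative offset and sh_info names the section the relocations apply to, and has minimal error handling:

    #include <elf.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s elf-file\n", argv[0]); return 1; }

        /* Read the whole file into memory. */
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);
        unsigned char *buf = malloc(size);
        fread(buf, 1, (size_t)size, f);
        fclose(f);

        Elf64_Ehdr *eh = (Elf64_Ehdr *)buf;
        Elf64_Shdr *sh = (Elf64_Shdr *)(buf + eh->e_shoff);
        const char *shstr = (const char *)(buf + sh[eh->e_shstrndx].sh_offset);

        for (int i = 0; i < eh->e_shnum; i++) {
            if (sh[i].sh_type != SHT_RELA) continue;
            if (sh[i].sh_info == 0) continue;      /* dynamic relocs: r_offset is a vaddr */
            Elf64_Shdr *target = &sh[sh[i].sh_info]; /* section being relocated */
            Elf64_Rela *rel = (Elf64_Rela *)(buf + sh[i].sh_offset);
            size_t n = sh[i].sh_size / sizeof(Elf64_Rela);
            for (size_t j = 0; j < n; j++)
                if (rel[j].r_offset >= target->sh_size)
                    printf("%s[%zu]: r_offset 0x%lx outside %s (size 0x%lx)\n",
                           shstr + sh[i].sh_name, j,
                           (unsigned long)rel[j].r_offset,
                           shstr + target->sh_name,
                           (unsigned long)target->sh_size);
        }
        free(buf);
        return 0;
    }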
So my questions are:
(1) Is my deduction correct that relocation offsets should not be greater than the length of the section to which they apply, and that the generated ELF file is therefore in error?
(2) What are people's opinions on the best way to proceed in diagnosing the cause of the incorrect ELF file? Are there any options to ld that will help, or any options that will remove/fix the .eh_frame section and its relocation counterpart .rela.eh_frame?
(3) How would I discover what linker script is being used when the ELF file is generated?
(4) Is there a specific forum where I might find a whole pile of linker experts who would be able to help? I appreciate this is a highly technical question and that many people may not have a clue what I'm talking about!
Thanks for any help!
The .eh_frame section is not supposed to have any run-time relocations. All offsets are fixed when the link editor runs (because the object layout is completely known at that point) and the ET_EXEC or ET_DYN object is created. Only ET_REL objects have relocations in that section, and those are never seen by the dynamic linker. So something odd must be going on.
You can ask such questions on the binutils list or the libc-help list (if you use the GNU toolchain).
EDIT It seems that you are using a toolchain configured for ZCX exceptions with a target which expects SJLJ exceptions. AdaCore has some documentation about this:
GNAT User's Guide Supplement for Cross Platforms 19.0w documentation » VxWorks Topics
Zero Cost Exceptions on PowerPC Targets
It doesn't quite say how to switch to the SJLJ-based VxWorks 5 toolchain. It is definitely not just a matter of using the correct linker script; the choice of exception handling style affects code generation, too.
How can I disassemble an executable on my mac using ndisasm and reassemble and link it using nasm and ld?
This is what I tried (I'm running MacOS X btw):
    ndisasm a.out | cut -c 29- > main.asm

This generated clean assembler code with all the processor instructions in main.asm.

    nasm -f macho main.asm

This generated an object file main.o, which I then tried to link:

    ld main.o

... and this is where I'm stuck. I don't know why it generates the following error:

    ld: in section __TEXT,__text reloc 0: R_ABS reloc but no absolute symbol at target address file 'main.o' for inferred architecture i386.
I also tried specifying the architecture (ld -arch x86_64 main.o) but that didn't work either.
My goal is to disassemble any executable, modify it and then reassemble it again.
What am I doing wrong?
There is no reliable way to do this with normal assembler syntax. See How to disassemble, modify and then reassemble a Linux executable?. Section info is typically not faithfully disassembled, so you'd need a special format designed for modifying, reassembling, and relinking.
Also, instruction lengths are a problem when code only works if padded with longer-than-default encodings (e.g. in a table of jump targets for a computed goto). See Where are GNU assembler instruction suffixes like ".s" in x86 "mov.s" documented?, but note that disassemblers don't support disassembling into that format.
ndisasm doesn't understand object file formats, so it disassembles headers as machine code!
For this to have any hope of working, use a disassembler like Agner Fog's objconv, which will output asm source (NASM, MASM, or GAS AT&T) that does assemble. It might not actually work if any of the code depended on a specific longer-than-default encoding.
I'm not sure how faithful objconv is with respect to emitting section .bss, section .rodata and other directives like that to place data where it found it in the object file, but that's what you need.
Re: absolute relocations: make sure you put DEFAULT REL at the top of your file. I forget if objconv does this by default. x86-64 Mach-O only supports PC-relative relocations, so you have to create position-independent code (e.g. using RIP-relative addressing modes).
ndisasm doesn't read the symbol table, so all its operands use absolute addressing. objconv makes up label names for jump targets and static data that doesn't appear in the symbol table.
I'm most interested in extracting the architecture version, i.e. v5, v5T, etc. I've been referencing ELF for the ARM Architecture, Section 4.3.6 Build Attributes, which has been helpful in getting me up to this point. I can find the start of the .ARM.attributes section and can parse the first key parts of the information: format-version, section-length, and vendor-name + null byte, no problem. I get a little lost after that. Below is a snapshot I took using hexdump -vC on an ELF compiled with arm-linux-gnueabi-gcc -march=armv5t -O myprog.c -o myprog for an ARMv5T architecture. The start of the section is at 77f0b.
We can see:
Format-version: A
Section-length: 0x29
Vendor-name: "aeabi"
Obviously, 5T is available in ASCII form at 77f1C, but I'm not sure how to interpret the tag I need to parse to get that value.
Note: Yes, I understand there are tools that I can use to do this, but I need to extract the information in the application I am writing. It already parses the necessary information to make it this far.
Bonus question: Does PowerPC have similar tags? I couldn't find any supporting documentation.
These tags are documented in the Addenda to, and Errata in, the ABI for the ARM Architecture. For example, under The target-related-attributes (section 3.3.5.2), we learn that Tag_CPU_arch has value 6, which immediately follows Tag_CPU_name (5, preceding the 5T) in your dump. Its argument is 3, which again corresponds to ARM v5T, according to the table in the document. The next tag is Tag_ARM_ISA_use (8) with an argument of 1, meaning The user intended that this entity could use ARM instructions (whatever this means), and so on.
Note that the integers are encoded in uleb128 format. This encoding is described in the DWARF standard (in section 7.6 of DWARF 3). Basically, it's base-128, little endian, and you need to keep reading while the MSB is set.
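If it helps, here is a small decoder for that encoding, with the 06 03 pair (Tag_CPU_arch = 6, value 3) from your dump run through it. The helper name read_uleb128 is my own, not from the ABI document:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Decode one unsigned LEB128 value: base-128 groups, least significant
     * first; bit 7 of each byte set means "more bytes follow". */
    static uint64_t read_uleb128(const uint8_t *p, size_t *len) {
        uint64_t result = 0;
        unsigned shift = 0;
        size_t i = 0;
        uint8_t byte;
        do {
            byte = p[i++];
            result |= (uint64_t)(byte & 0x7f) << shift;
            shift += 7;
        } while (byte & 0x80);
        *len = i;
        return result;
    }

    int main(void) {
        /* The pair 06 03 from the dump: Tag_CPU_arch (6) with value 3 (v5T). */
        const uint8_t pair[] = { 0x06, 0x03 };
        size_t n;
        uint64_t tag = read_uleb128(pair, &n);
        uint64_t val = read_uleb128(pair + n, &n);
        printf("tag=%llu value=%llu\n",
               (unsigned long long)tag, (unsigned long long)val);
        return 0;
    }

Note that string-valued tags such as Tag_CPU_name (5) are followed by a NUL-terminated string rather than a uleb128 value, which is why you see "5T" in ASCII right before the Tag_CPU_arch pair.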
I'm currently doing my own objdump implementation in C.
For my -s option, I have to show the full contents of the sections of an ELF file.
It works, but I'm showing more sections than the "real" objdump does.
In fact, the real objdump does not output the .bss, .shstrtab, .symtab and .strtab sections.
I've been looking at the sh_flags value in the Shdr struct, but I can't find any logic to it...
Why does objdump -s <ELF file> not show these sections ?
Objdump is based on libbfd, which abstracts away many complexities of ELF, and was written when objects tended to only have three sections.
As such, objdump is quite deficient. In addition to not showing you (some) existing sections, it may also "synthesize" sections that don't exist at all, and do other weird tricks. This is more of a libbfd fault -- its abstraction layer simply doesn't tell objdump about the "missing" sections.
TL;DR: don't use objdump. Use readelf instead.
Try using sh_size and sh_type instead of sh_flags; see the sketch after the quotation below.
Quoting from the ELF specification:
sh_size: This member gives the section's size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.
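In other words, a filter along these lines should reproduce the behaviour you observed. This is not the exact test GNU objdump uses internally (it goes through libbfd), and should_dump is my own name, but it skips exactly the sections you listed:

    #include <elf.h>
    #include <stdbool.h>

    /* Sketch of a "dump this section?" test for an objdump -s clone:
     * skip sections that take no space in the file (SHT_NOBITS, e.g. .bss),
     * empty sections, and the symbol/string tables that libbfd hides. */
    static bool should_dump(const Elf64_Shdr *sh) {
        if (sh->sh_type == SHT_NOBITS)   /* .bss: has a size, no file contents */
            return false;
        if (sh->sh_size == 0)            /* nothing to print */
            return false;
        if (sh->sh_type == SHT_SYMTAB || /* .symtab */
            sh->sh_type == SHT_STRTAB)   /* .strtab, .shstrtab */
            return false;
        return true;
    }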