I have a tool emitting an ELF file which, as far as I can tell, is compliant with the spec. The readelf output looks fine, but objdump refuses to disassemble anything.
I have simplified the input to a single global var, and "int main(void) { return 0;}" to aid debugging - the tiny section sizes are correct.
In particular, objdump seems unable to find the sections table:
$ arm-none-linux-gnueabi-readelf -S davidm.elf
There are 4 section headers, starting at offset 0x74:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text NULL ff000000 000034 00001c 00 AX 0 0 4
[ 2] .data NULL ff00001c 000050 000004 00 WA 0 0 4
[ 3] .shstrtab NULL 00000000 000114 000017 00 0 0 0
$ arm-none-linux-gnueabi-objdump -h davidm.elf
davidm.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
I also have another ELF, built from the exact same objects, only produced with regular toolchain use:
$ objdump -h kernel.elf
kernel.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001c ff000000 ff000000 00008000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000004 ff00001c ff00001c 0000801c 2**2
CONTENTS, ALLOC, LOAD, DATA
Even after I stripped the .comment and .ARM.attributes sections (in case objdump requires them) from the 'known good' kernel.elf, it still happily lists the sections there, but not in my tool's davidm.elf.
I have confirmed the contents of the sections are identical between the two with readelf -x.
The only thing I can imagine is that the ELF file layout is different and breaks some expectation of BFD, which would explain why readelf (and my tool) process it just fine but objdump has trouble.
Full readelf:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0xff000000
Start of program headers: 84 (bytes into file)
Start of section headers: 116 (bytes into file)
Flags: 0x5000002, has entry point, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 4
Section header string table index: 3
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text NULL ff000000 000034 00001c 00 AX 0 0 4
[ 2] .data NULL ff00001c 000050 000004 00 WA 0 0 4
[ 3] .shstrtab NULL 00000000 000114 000017 00 0 0 0
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000034 0xff000000 0xff000000 0x00020 0x00020 RWE 0x8000
Section to Segment mapping:
Segment Sections...
00 .text .data
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
No version information found in this file.
Could the aggressive packing of the on-disk layout be causing troubles? Am I in violation of some bytestream alignment restrictions BFD expects, documented or otherwise?
Lastly - this file is not intended to be mmap'd into an address space, a loader will memcpy segment data into the desired location, so there is no requirement to play mmap-friendly file-alignment tricks. Keeping the ELF small is more important.
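For a loader like the one described here, the copy step can be sketched as follows. This is a minimal sketch under stated assumptions, not DavidM's loader: load_elf32_segments is a hypothetical name, the whole image is assumed to already be in memory, p_paddr is used as the destination (it equals p_vaddr in this file), and mem stands for the target address range starting at mem_base.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical loader step: copy every PT_LOAD segment of an in-memory
   ELF32 image into mem, a window onto target addresses starting at
   mem_base. Bytes between p_filesz and p_memsz (e.g. .bss) are zeroed.
   Returns 0 on success, -1 if the image is malformed or out of range. */
int load_elf32_segments(const uint8_t *file, size_t file_size,
                        uint8_t *mem, uint32_t mem_base, size_t mem_size)
{
    assert(file != NULL && mem != NULL);
    Elf32_Ehdr eh;
    if (file_size < sizeof eh)
        return -1;
    memcpy(&eh, file, sizeof eh);           /* avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;

    for (uint32_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint64_t)i * sizeof ph;
        if (off + sizeof ph > file_size)
            return -1;
        memcpy(&ph, file + off, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        /* Reject segments whose file data or memory image doesn't fit. */
        if (ph.p_filesz > ph.p_memsz ||
            (uint64_t)ph.p_offset + ph.p_filesz > file_size ||
            ph.p_paddr < mem_base ||
            (uint64_t)(ph.p_paddr - mem_base) + ph.p_memsz > mem_size)
            return -1;
        uint8_t *dst = mem + (ph.p_paddr - mem_base);
        memcpy(dst, file + ph.p_offset, ph.p_filesz);
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return 0;
}
```

Because nothing here relies on mmap, the file offsets can be packed as tightly as the tool likes; only p_offset, p_filesz, p_memsz and the destination address matter.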
Cheers,
DavidM
EDIT: I was asked to upload the file, and/or provide 'objdump -x'. So I've done both:
davidm.elf
$ objdump -x davidm.elf
davidm.elf: file format elf32-littlearm
davidm.elf
architecture: arm, flags 0x00000002:
EXEC_P
start address 0xff000000
Program Header:
LOAD off 0x00000034 vaddr 0xff000000 paddr 0xff000000 align 2**15
filesz 0x00000020 memsz 0x00000020 flags rwx
private flags = 5000002: [Version5 EABI] [has entry point]
Sections:
Idx Name Size VMA LMA File off Algn
SYMBOL TABLE:
no symbols
OK - finally figured it out.
After building and annotating/debugging libbfd (function elf_object_p()) in the context of a little test app, I found why it was not matching any of BFD's supported targets.
I had bad sh_type values in the section headers: every one of them was NULL (SHT_NULL). After emitting STRTAB or PROGBITS as appropriate (and eventually NOBITS, when I get that far), objdump happily walks my image.
Not really surprising, in retrospect - I'm more annoyed I didn't catch this in comparing readelf outputs than anything else :(
Thanks for the help all :)
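A check along these lines would have caught the bug before reaching for objdump. This is a minimal sketch assuming a glibc-style <elf.h>; count_null_sections and pick_sh_type are hypothetical helpers, not part of BFD or any other library. The first flags real sections (index 1 and up) left as SHT_NULL; the second shows the kind of type selection the fix describes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Count section headers, other than the reserved all-zero entry at
   index 0, whose sh_type is SHT_NULL. In the broken davidm.elf every
   real section had this type. */
size_t count_null_sections(const Elf32_Shdr *shdrs, size_t shnum)
{
    assert(shdrs != NULL || shnum == 0);
    size_t bad = 0;
    for (size_t i = 1; i < shnum; i++)
        if (shdrs[i].sh_type == SHT_NULL)
            bad++;
    return bad;
}

/* Hypothetical emitter helper: choose sh_type from what the section
   holds instead of leaving it zero (SHT_NULL). */
Elf32_Word pick_sh_type(int is_string_table, int has_file_data)
{
    if (is_string_table)
        return SHT_STRTAB;
    return has_file_data ? SHT_PROGBITS : SHT_NOBITS;
}
```

For the tiny image above, .text and .data would get SHT_PROGBITS and .shstrtab SHT_STRTAB; a .bss would get SHT_NOBITS.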
Could anyone with a deep understanding of ELF loading please explain why the following ELF file crashes with a segmentation fault (exit status 139)?
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x131a
Start of program headers: 64 (bytes into file)
Start of section headers: 232 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 8
Section header string table index: 7
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] null NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .init PROGBITS 000000000000131a 0000031a
0000000000000001 0000000000000000 AX 0 0 1
[ 2] .text PROGBITS 000000000000131b 0000031b
0000000000000096 0000000000000000 AX 0 0 1
[ 3] .fini PROGBITS 00000000000013b1 000003b1
0000000000000001 0000000000000000 AX 0 0 1
[ 4] .rodata PROGBITS 00000000000013b2 000003b2
0000000000000014 0000000000000000 A 0 0 1
[ 5] .data PROGBITS 00000000000013c6 000003c6
000000000000001e 0000000000000000 A 0 0 1
[ 6] .bss NOBITS 00000000000013e4 000003e4
0000000000000000 0000000000000000 WA 0 0 1
[ 7] strtab STRTAB 00000000000012e8 000002e8
0000000000000032 0000000000000000 AS 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x000000000000031a 0x000000000000131a 0x000000000000131a
0x0000000000000098 0x0000000000000098 R E 0x1000
LOAD 0x00000000000003b2 0x00000000000013b2 0x00000000000013b2
0x0000000000000014 0x0000000000000014 R 0x1000
LOAD 0x00000000000003c6 0x00000000000013c6 0x00000000000013c6
0x000000000000001e 0x000000000000101e RW 0x1000
Section to Segment mapping:
Segment Sections...
00 .init .text .fini
01 .rodata
02 .data .bss
The exact same executable with the following file alignment changes works fine:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x10400
Start of program headers: 64 (bytes into file)
Start of section headers: 232 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 8
Section header string table index: 7
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] null NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .init PROGBITS 0000000000010400 00000400
0000000000000001 0000000000000000 AX 0 0 1
[ 2] .text PROGBITS 0000000000010800 00000800
0000000000000096 0000000000000000 AX 0 0 1
[ 3] .fini PROGBITS 0000000000010c00 00000c00
0000000000000001 0000000000000000 AX 0 0 1
[ 4] .rodata PROGBITS 0000000000011000 00001000
0000000000000014 0000000000000000 A 0 0 1
[ 5] .data PROGBITS 0000000000011400 00001400
000000000000001e 0000000000000000 A 0 0 1
[ 6] .bss NOBITS 0000000000011800 00001800
0000000000000000 0000000000000000 WA 0 0 1
[ 7] strtab STRTAB 00000000000102e8 000002e8
0000000000000032 0000000000000000 AS 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000400 0x0000000000010400 0x0000000000010400
0x0000000000000801 0x0000000000000801 R E 0x1
LOAD 0x0000000000001000 0x0000000000011000 0x0000000000011000
0x0000000000000014 0x0000000000000014 R 0x1
LOAD 0x0000000000001400 0x0000000000011400 0x0000000000011400
0x000000000000001e 0x000000000000101e RW 0x1
Section to Segment mapping:
Segment Sections...
00 .init .text .fini
01 .rodata
02 .data .bss
In both cases it holds that:
sh_addr mod sh_addralign = 0 and
p_vaddr mod PAGESIZE = p_offset mod PAGESIZE (page size obtained with getconf PAGESIZE).
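The congruence condition can be checked mechanically. The ELF specification requires loadable segments to have p_vaddr and p_offset congruent modulo the page size. Comparing p_vaddr mod PAGESIZE directly against p_offset only works while p_offset is smaller than one page; for example, it fails for offset 0x1000 at 0x11000 in the second file even though that segment is fine. A small sketch (load_congruent is a hypothetical helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a PT_LOAD segment's file offset and virtual address are
   congruent modulo the page size, so the kernel can map the segment
   straight from the file. */
bool load_congruent(uint64_t p_offset, uint64_t p_vaddr, uint64_t pagesize)
{
    assert(pagesize != 0);
    return (p_vaddr % pagesize) == (p_offset % pagesize);
}
```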
I appreciate your help - thank you very much in advance.
UPDATE:
I realized that my LOAD segments were overlapping in virtual memory in the first readelf printout that I posted. I have corrected this now, but for the now non-overlapping LOAD segments I still get a segmentation fault when my start virtual memory address for the first page is at 0x0 (same if it is at 0x1000, i.e. one page size higher):
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x31a
Start of program headers: 64 (bytes into file)
Start of section headers: 232 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 8
Section header string table index: 7
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] null NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .init PROGBITS 000000000000031a 0000031a
0000000000000001 0000000000000000 AX 0 0 0
[ 2] .text PROGBITS 000000000000031b 0000031b
0000000000000076 0000000000000000 AX 0 0 0
[ 3] .fini PROGBITS 0000000000000391 00000391
0000000000000001 0000000000000000 AX 0 0 0
[ 4] .rodata PROGBITS 0000000000001392 00000392
0000000000000014 0000000000000000 A 0 0 0
[ 5] .data PROGBITS 00000000000023a6 000003a6
000000000000001e 0000000000000000 A 0 0 0
[ 6] .bss NOBITS 00000000000023c4 000003c4
0000000000000000 0000000000000000 WA 0 0 0
[ 7] strtab STRTAB 00000000000002e8 000002e8
0000000000000032 0000000000000000 AS 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x000000000000031a 0x000000000000031a 0x000000000000031a
0x0000000000000078 0x0000000000000078 R E 0x1000
LOAD 0x0000000000000392 0x0000000000001392 0x0000000000001392
0x0000000000000014 0x0000000000000014 R 0x1000
LOAD 0x00000000000003a6 0x00000000000023a6 0x00000000000023a6
0x000000000000001e 0x0000000000000082 RW 0x1000
Section to Segment mapping:
Segment Sections...
00 .init .text .fini
01 .rodata
02 .data .bss
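The overlap mentioned in the update can be checked by rounding each LOAD segment out to whole pages, since the kernel maps segments with page granularity. A sketch using p_vaddr and p_memsz (struct load_seg and segments_share_pages are hypothetical names). With a 4 KiB page, the three segments of the first dump (starting at 0x131a, 0x13b2 and 0x13c6) all touch the page at 0x1000. The corrected segments just above land on separate pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct load_seg {
    uint64_t vaddr;   /* p_vaddr */
    uint64_t memsz;   /* p_memsz */
};

/* Page-granular range [*lo, *hi) a segment occupies once its start is
   rounded down and its end rounded up to the page size. */
static void page_span(const struct load_seg *s, uint64_t page,
                      uint64_t *lo, uint64_t *hi)
{
    *lo = s->vaddr & ~(page - 1);
    *hi = (s->vaddr + s->memsz + page - 1) & ~(page - 1);
}

/* True if any two segments share at least one page.
   page must be a power of two. */
bool segments_share_pages(const struct load_seg *segs, size_t n, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t ilo, ihi;
        page_span(&segs[i], page, &ilo, &ihi);
        for (size_t j = i + 1; j < n; j++) {
            uint64_t jlo, jhi;
            page_span(&segs[j], page, &jlo, &jhi);
            if (ilo < jhi && jlo < ihi)
                return true;
        }
    }
    return false;
}
```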
When I change the start address to 0x10000 (PAGESIZE * 16), then the segmentation fault disappears. Any ideas why that is?
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1031a
Start of program headers: 64 (bytes into file)
Start of section headers: 232 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 8
Section header string table index: 7
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] null NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .init PROGBITS 000000000001031a 0000031a
0000000000000001 0000000000000000 AX 0 0 0
[ 2] .text PROGBITS 000000000001031b 0000031b
0000000000000076 0000000000000000 AX 0 0 0
[ 3] .fini PROGBITS 0000000000010391 00000391
0000000000000001 0000000000000000 AX 0 0 0
[ 4] .rodata PROGBITS 0000000000011392 00000392
0000000000000014 0000000000000000 A 0 0 0
[ 5] .data PROGBITS 00000000000123a6 000003a6
000000000000001e 0000000000000000 A 0 0 0
[ 6] .bss NOBITS 00000000000123c4 000003c4
0000000000000000 0000000000000000 WA 0 0 0
[ 7] strtab STRTAB 00000000000002e8 000002e8
0000000000000032 0000000000000000 AS 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x000000000000031a 0x000000000001031a 0x000000000001031a
0x0000000000000078 0x0000000000000078 R E 0x1000
LOAD 0x0000000000000392 0x0000000000011392 0x0000000000011392
0x0000000000000014 0x0000000000000014 R 0x1000
LOAD 0x00000000000003a6 0x00000000000123a6 0x00000000000123a6
0x000000000000001e 0x0000000000000082 RW 0x1000
Section to Segment mapping:
Segment Sections...
00 .init .text .fini
01 .rodata
02 .data .bss
UPDATE 2:
Thank you, Employed Russian, for your answer and ideas. I wanted to share the following update on my own research:
After digging a bit more, I ran across the following line in an Oracle document about program loading:
By default, 64–bit SPARC programs are linked with a starting address of 0x100000000. The whole program is located above 4 gigabytes, including its text, data, heap, stack, and shared object dependencies. This helps ensure that 64–bit programs are correct because the program will fault in the least significant 4 gigabytes of its address space if the program truncates any of its pointers. While 64–bit programs are linked above 4 gigabytes, you can still link programs below 4 gigabytes by using a mapfile and the -M option to the link-editor. See /usr/lib/ld/sparcv9/map.below4G.
(Source: https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-34713/index.html)
Now, I am aware that the information in that link is very specific, but I was nonetheless wondering whether there is some more universal truth to this on other platforms, or whether it could at least point me in the right direction.
So I wrote a tiny test program in C and compiled it in two different ways:
gcc test.c - ELF type is ET_DYN / shared object file and no default virtual address offset is used for the LOAD segments:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0005c8 0x0005c8 R 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x0001c5 0x0001c5 R E 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000130 0x000130 R 0x1000
LOAD 0x002df0 0x0000000000003df0 0x0000000000003df0 0x000220 0x000228 RW 0x1000
gcc -static test.c - ELF type is ET_EXEC / executable and default virtual address offset of 0x400000 is used:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x000518 0x000518 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x0936dd 0x0936dd R E 0x1000
LOAD 0x095000 0x0000000000495000 0x0000000000495000 0x02664d 0x02664d R 0x1000
LOAD 0x0bc0c0 0x00000000004bd0c0 0x00000000004bd0c0 0x005170 0x0068c0 RW 0x1000
Any ideas why that is? I suspect it has to do with position-independent code, but I do not understand why an offset is necessary when absolute code is used (as in the -static build above). Thanks.
When I change the start address to 0x10000 (PAGESIZE * 16), then the segmentation fault disappears. Any ideas why that is?
This was mentioned in the comments to this answer:
Why does loading at 0x10000 work but at 0x1000 doesn't? Does this depend on the kernel or the hardware? How do I pick the right number here?
Some code in the kernel doesn't like to use addresses below 0x10000, but I have not found that code.
I've tried to load a binary whose first PT_LOAD has p_vaddr == 0x1000 into a UML kernel (which is easy to debug), and that actually worked, so the specific kernel code that prohibits this may be architecture-dependent.
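One candidate worth checking for the code this answer could not find is the Linux vm.mmap_min_addr sysctl. This is an assumption on my part, not something the answer confirms. Unprivileged processes cannot map pages below this address, and on many distributions it is set to 65536 (0x10000), which would fit the observed behaviour. A sketch that reads the current value; parse_min_addr and read_mmap_min_addr are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal value the kernel exposes in
   /proc/sys/vm/mmap_min_addr. Returns -1 on malformed input. */
long long parse_min_addr(const char *text)
{
    assert(text != NULL);
    char *end;
    long long v = strtoll(text, &end, 10);
    if (end == text || v < 0)
        return -1;
    return v;
}

/* Read the current limit; returns -1 if the file is unavailable
   (e.g. not Linux, or /proc not mounted). */
long long read_mmap_min_addr(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? parse_min_addr(buf) : -1;
}
```

The same value can be inspected from a shell with sysctl vm.mmap_min_addr.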
I want to test my ARM project within QEMU using semihosting. Initially I built for Cortex A7 and A9 processors and had no issues running my code, however now that I switched to CM33 (and a CM33 board), it breaks immediately:
C:\Program Files\qemu>qemu-system-aarch64.exe -nographic -machine musca-a -cpu cortex-m33 -monitor none -serial stdio -kernel app -m 512 -semihosting
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)
R00=00000000 R01=00000000 R02=00000000 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=ffffffe0 R14=fffffff9 R15=00000000
XPSR=40000003 -Z-- A S handler
FPSCR: 00000000
If I understand it right, PC=00000000 indicates reset handler issues. I thought maybe this musca-a board expects the vector table to be somewhere else, but it looks like the table is missing completely:
psykana@psykana-lap:~$ readelf app -S
There are 26 section headers, starting at offset 0xb1520:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .init PROGBITS 00008000 008000 00000c 00 AX 0 0 4
[ 2] .text PROGBITS 00008010 008010 01d5b4 00 AX 0 0 8
[ 3] .fini PROGBITS 000255c4 0255c4 00000c 00 AX 0 0 4
[ 4] .rodata PROGBITS 000255d0 0255d0 003448 00 A 0 0 8
[ 5] .ARM.exidx ARM_EXIDX 00028a18 028a18 000008 00 AL 2 0 4
[ 6] .eh_frame PROGBITS 00028a20 028a20 000004 00 A 0 0 4
[ 7] .init_array INIT_ARRAY 00038a24 028a24 000008 04 WA 0 0 4
[ 8] .fini_array FINI_ARRAY 00038a2c 028a2c 000004 04 WA 0 0 4
[ 9] .data PROGBITS 00038a30 028a30 000ad8 00 WA 0 0 8
[10] .persistent PROGBITS 00039508 029508 000000 00 WA 0 0 1
[11] .bss NOBITS 00039508 029508 0001c4 00 WA 0 0 4
[12] .noinit NOBITS 000396cc 000000 000000 00 WA 0 0 1
[13] .comment PROGBITS 00000000 029508 000049 01 MS 0 0 1
[14] .debug_aranges PROGBITS 00000000 029551 000408 00 0 0 1
[15] .debug_info PROGBITS 00000000 029959 02e397 00 0 0 1
[16] .debug_abbrev PROGBITS 00000000 057cf0 005b3e 00 0 0 1
[17] .debug_line PROGBITS 00000000 05d82e 01629f 00 0 0 1
[18] .debug_frame PROGBITS 00000000 073ad0 004bf4 00 0 0 4
[19] .debug_str PROGBITS 00000000 0786c4 006a87 01 MS 0 0 1
[20] .debug_loc PROGBITS 00000000 07f14b 01f27e 00 0 0 1
[21] .debug_ranges PROGBITS 00000000 09e3c9 009838 00 0 0 1
[22] .ARM.attributes ARM_ATTRIBUTES 00000000 0a7c01 000036 00 0 0 1
[23] .symtab SYMTAB 00000000 0a7c38 006ec0 10 24 1282 4
[24] .strtab STRTAB 00000000 0aeaf8 002927 00 0 0 1
[25] .shstrtab STRTAB 00000000 0b141f 000100 00 0 0 1
I'm building with the following options (modified toolchain file from my previous question):
add_compile_options(
-mcpu=cortex-m33
-specs=rdimon.specs
-O0
-g
-mfpu=fpv5-sp-d16
-mfloat-abi=hard
)
add_link_options(-specs=rdimon.specs -mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -mfloat-abi=hard)
Again, this worked fine for all A processors I've tried, but breaks for CM33. In fact, it breaks for any M core and M core QEMU board.
For the record:
- arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10)
- QEMU emulator version 7.0.0 (v7.0.0-11902-g1d935f4a02-dirty)
- Microsoft Windows [Version 10.0.19044.1645]
- cmake version 3.22.
Your guest code has crashed on startup, which is almost always because of problems with your exception vector table. If you use QEMU's -d options (eg -d cpu,int,guest_errors,unimp,in_asm) this will generally give a bit more detail on what exactly happened.
Looking at your ELF headers, it looks like you've not put a vector table into your binary. QEMU requires this (as does real hardware). The usual way to do this is to have a little assembly source file that lays out the data table with the addresses of the various exception entry points, though there are other ways to do this. (This is one example.)
The reason you don't see this on A-profile CPUs is that A-profile exception handling is completely different: on A-profile reset starts execution at address 0x0, and similarly exceptions are taken by setting the PC to a fixed low address. On M-profile reset works by reading the initial PC and SP values from the vector table, and exception handlers start at addresses also read from the vector table. (That is, on A-profile, the thing at the magic low addresses is code, and on M-profile, it is data, effectively function pointers).
Note also that the behaviour of the QEMU -kernel option is different between A-profile and M-profile: on A-profile it will load the ELF file into memory and honour the ELF entry point (execution will start from there). On M-profile it will load the ELF file but then start the CPU from reset in the hardware-specified manner, ie without setting PC to the ELF entry point. (This variation is essentially for historical/back-compat reasons.) If you want "just load my ELF file and set PC to its ELF entry point" you should use QEMU's generic loader device, which behaves the same way on all targets, and not -kernel, which generally means "I am a Linux kernel, please load me in whatever random target-specific plus combination of do-what-I-mean behaviour seems best". -kernel is generally best avoided if you're trying to load a bare-metal binary rather than an actual Linux kernel.
This similar question about getting a working M-profile binary running on QEMU might also be helpful.
I am on a quest to understand low-level computing. I have noticed that my compiled binaries are a lot bigger than I think they should be. So I tried to build the smallest possible C program without any stdlib code, as follows:
void _start()
{
while(1) {};
}
gcc -nostdlib -o minimal minimal.c
When I disassemble the binary, it shows me exactly what I expect: this exact code in three lines of assembly.
$ objdump -d minimal
minimal: file format elf64-x86-64
Disassembly of section .text:
0000000000001000 <_start>:
1000: 55 push %rbp
1001: 48 89 e5 mov %rsp,%rbp
1004: eb fe jmp 1004 <_start+0x4>
But my actual executable is still 13856 bytes in size. What makes it so large? What else is in that file? Does the OS need more than these 6 bytes of machine code?
Edit #1:
The output of size is:
$ size -A minimal
minimal :
section size addr
.interp 28 680
.note.gnu.build-id 36 708
.gnu.hash 28 744
.dynsym 24 776
.dynstr 1 800
.text 6 4096
.eh_frame_hdr 20 8192
.eh_frame 52 8216
.dynamic 208 16176
.comment 18 0
Total 421
Modern compilers and linkers aren't really optimized for producing ultra-small code on full-scale platforms. Not because the job is difficult, but because there's usually no need to. It isn't necessarily that the compiler or linker adds additional code (although it might), but rather that it won't try hard to pack your data and code into the smallest possible space.
In your case, I note that you're using dynamic linking, even though nothing is actually linked. Using "-static" will shave off about 8kB. "-s" (strip) will get rid of a bit more.
I don't know if it's even possible with gcc to make a truly minimal ELF executable. In your case, that ought to be about 400 bytes, nearly all of which will be the various ELF headers, section table, etc.
I don't know if I'm allowed to link my own website (I'm sure somebody will put me right if not), but I have an article on producing a tiny ELF executable by building it from scratch in binary:
http://kevinboone.me/elfdemo.html
There are many different executable file formats: .com, .exe, .elf, .coff, a.out, etc. Ideally they contain the machine code and other sections (.text (code), .data, .bss, .rodata and possibly others; the names depend on the toolchain), and they may also contain debugging information. Notice how your disassembly showed the label _start? That is a string, stored along with other information that connects it to an address for debugging. The objdump output also shows that you are using an ELF file. You can easily look up the file format and write your own program to parse the file, or use readelf and other tools to see what is in it (at a high level rather than as raw bytes).
On an operating system where, in general (not always, but think PC), programs are loaded into RAM and then run, you first and foremost need a file format that the operating system supports. There is no reason for an OS to support more than one, but it might. It depends on the OS/system design, but the OS may be designed to not only load the code but also load/initialize the data (.data, .bss). When booting, say, an MCU, you need to embed the data in the binary blob, and the application itself copies the data from flash to RAM. Under an OS that isn't necessarily required, but to do it the loader needs a file format that can distinguish the sections, their target locations and their sizes, which means extra bytes in the file to describe all of this.
A binary also includes the bootstrap code that runs before it can enter the C-generated code. That depends on the system and on the C library: many C libraries can be used on one computer, and the bootstrap is generally specific to the C library, not to the target, the operating system, or the compiler. So some percentage of the file is bootstrap code, and when your main program is very tiny, a lot of the file size is overhead.
You can, for example, use strip to make the file smaller by removing symbols and other non-essential items. The file size should shrink, but the objdump disassembly will then have no labels. On x86, a variable-length instruction set that is difficult at best to disassemble, this makes things much harder: the output may not reflect the actual instructions with or without labels, but without labels the GNU disassembler cannot resynchronize at them, which can make the output worse.
If you use clang 10.0 and lld 10.0 and strip out unnecessary sections you can get the size of a 64-bit statically linked executable to under 800 bytes.
$ cat minimal.c
void _start(void)
{
int i = 0;
while (i < 11) {
i++;
}
asm( "int $0x80" :: "a"(1), "b"(i) );
}
$ clang -static -nostdlib -flto -fuse-ld=lld -o minimal minimal.c
$ ls -l minimal
-rwxrwxr-x 1 fpm fpm 1376 Sep 4 17:38 minimal
$ readelf --string-dump .comment minimal
String dump of section '.comment':
[ 0] Linker: LLD 10.0.0
[ 13] clang version 10.0.0 (Fedora 10.0.0-2.fc32)
$ readelf -W --section-headers minimal
There are 9 section headers, starting at offset 0x320:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .note.gnu.build-id NOTE 0000000000200190 000190 000018 00 A 0 0 4
[ 2] .eh_frame_hdr PROGBITS 00000000002001a8 0001a8 000014 00 A 0 0 4
[ 3] .eh_frame PROGBITS 00000000002001c0 0001c0 00003c 00 A 0 0 8
[ 4] .text PROGBITS 0000000000201200 000200 00002a 00 AX 0 0 16
[ 5] .comment PROGBITS 0000000000000000 00022a 000040 01 MS 0 0 1
[ 6] .symtab SYMTAB 0000000000000000 000270 000048 18 8 2 8
[ 7] .shstrtab STRTAB 0000000000000000 0002b8 000055 00 0 0 1
[ 8] .strtab STRTAB 0000000000000000 00030d 000012 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
$ strip -R .eh_frame_hdr -R .eh_frame minimal
$ strip -R .comment -R .note.gnu.build-id minimal
strip: minimal: warning: empty loadable segment detected at vaddr=0x200000, is this intentional?
$ readelf -W --section-headers minimal
There are 3 section headers, starting at offset 0x240:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000201200 000200 00002a 00 AX 0 0 16
[ 2] .shstrtab STRTAB 0000000000000000 00022a 000011 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
$ ll minimal
-rwxrwxr-x 1 fpm fpm 768 Sep 4 17:45 minimal
I have built an AXF (ELF) file using Arm Compiler v6.9 for Cortex-R4. However, when I load it onto the target using the Arm MCU Eclipse J-Link GDB plugins, it fails to load the initialisation data for my segments. If I load the AXF using Segger Ozone and J-Link, it loads the init data correctly.
If I run the arm-none-eabi-gdb.exe on the axf file I get "Warning: Loadable section "my_section" outside of ELF segments" for all my initialised segments.
Looking at the image the initialisation data should be loaded after the image to the addresses specified by the table in Region$$Table$$Base.
We don't have this problem if we link with gcc, as the initialised data is handled differently there.
Any ideas?
I've faced the same issue today and observed the same problem that you described:
"Looking at the image the initialisation data should be loaded after the image to the addresses specified by the table in Region$$Table$$Base."
It seems that although very similar, the ELF file generated by armlink is a bit different than the ELF generated by GCC.
Anyway, I've found a workaround for that.
Checking my main.elf, I noticed that armlink stored the initialization data in the ER_RW section:
arm-none-eabi-readelf.exe -S main.elf
There are 16 section headers, starting at offset 0x122b0:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] ER_RO PROGBITS 20000000 000034 001358 00 AX 0 0 4
[ 2] ER_RW PROGBITS 20002000 00138c 0000cc 00 WA 0 0 4
[ 3] ER_ZI NOBITS 200020cc 001458 0004e8 00 WA 0 0 4
[ 4] .debug_abbrev PROGBITS 00000000 001458 0005c4 00 0 0 1
[ 5] .debug_frame PROGBITS 00000000 001a1c 000dc4 00 0 0 1
...
I noticed that the issue happens because GDB loaded ER_RW at addr=0x20002000 but, in fact, I needed it to be loaded just after the end of the ER_RO section (i.e. at addr=0x20001358).
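The address arithmetic behind this is simple enough to spell out. Here is a sketch using the sh_addr/sh_size values from the readelf listing above. struct region, rw_load_address and merged_size are hypothetical names, and the sketch assumes there is no alignment padding between the two regions (0x20001358 is already 4-byte aligned, so none is needed here). The merged size it computes, 0x1424, matches the size of ER_RO after the workaround.

```c
#include <assert.h>
#include <stdint.h>

struct region {
    uint32_t addr;  /* sh_addr as reported by readelf */
    uint32_t size;  /* sh_size */
};

/* Load address of initialised RW data that is expected to sit packed
   directly after the RO region, as fromelf --bin lays it out. */
uint32_t rw_load_address(struct region ro)
{
    return ro.addr + ro.size;
}

/* Size of the merged RO+RW blob that replaces ER_RO. */
uint32_t merged_size(struct region ro, struct region rw)
{
    assert(ro.size <= UINT32_MAX - rw.size);
    return ro.size + rw.size;
}
```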
The workaround for that is:
1- Use fromelf to dump all sections into a binary file main.bin. Fromelf will append ER_RW just after ER_RO, as it is supposed to be:
fromelf.exe --bin -o main.bin main.elf
2- Use objcopy to replace the contents of the ER_RO section with the data from main.bin.
Note that we can now remove the ER_RW section, since it has already been merged with ER_RO in main.bin:
arm-none-eabi-objcopy.exe main.elf --update-section ER_RO=main.bin --remove-section=ER_RW main.gdb.elf
The new main.gdb.elf file can now be loaded by arm-none-eabi-gdb.exe
This is how it looks:
arm-none-eabi-readelf.exe -S main.gdb.elf
There are 15 section headers, starting at offset 0x11c0c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] ER_RO PROGBITS 20000000 000054 001424 00 AX 0 0 4
[ 2] ER_ZI NOBITS 200020cc 000000 0004e8 00 WA 0 0 4
[ 3] .debug_abbrev PROGBITS 00000000 001478 0005c4 00 0 0 1
...
Happy debugging with GDB!! ;-)
I'm compiling a C file, foo.c:
#include <stdlib.h>
extern void *memcpy_optimized(void* __restrict, void* __restrict, size_t);
void foo() {
[blah blah blah]
memcpy_optimized((void *)a, (void *)b, 123);
}
Then I have the assembly file memcpy_optimized.S:
.text
.fpu neon
.global memcpy_optimized
.type memcpy_optimized, %function
.align 4
memcpy_optimized:
.fnstart
mov ip, r0
cmp r2, #16
blt 4f # Have less than 16 bytes to copy
# First ensure 16 byte alignment for the destination buffer
tst r0, #0xF
beq 2f
tst r0, #1
ldrneb r3, [r1], #1
[blah blah blah]
.fnend
Both files compile fine with: gcc $< -o $@ -c
but when I link the application with both resulting objects, I get the following error:
foo.c:(.text+0x380): undefined reference to `memcpy_optimized(void*, void *, unsigned int)'
Any idea what I'm doing wrong?
readelf -a obj/memcpy_optimized.o
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: ARM
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 436 (bytes into file)
Flags: 0x5000000, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 11
Section header string table index: 8
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000040 0000f0 00 AX 0 0 16
[ 2] .data PROGBITS 00000000 000130 000000 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 000130 000000 00 WA 0 0 1
[ 4] .ARM.extab PROGBITS 00000000 000130 000000 00 A 0 0 1
[ 5] .ARM.exidx ARM_EXIDX 00000000 000130 000008 00 AL 1 0 4
[ 6] .rel.ARM.exidx REL 00000000 00044c 000010 08 9 5 4
[ 7] .ARM.attributes ARM_ATTRIBUTES 00000000 000138 000023 00 0 0 1
[ 8] .shstrtab STRTAB 00000000 00015b 000056 00 0 0 1
[ 9] .symtab SYMTAB 00000000 00036c 0000b0 10 10 9 4
[10] .strtab STRTAB 00000000 00041c 00002f 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rel.ARM.exidx' at offset 0x44c contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 0000012a R_ARM_PREL31 00000000 .text
00000000 00000a00 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
Unwind table index '.ARM.exidx' at offset 0x130 contains 1 entries:
0x0 <memcpy_optimized>: 0x80b0b0b0
Compact model 0
0xb0 finish
0xb0 finish
0xb0 finish
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 NOTYPE LOCAL DEFAULT 1 $a
5: 00000000 0 SECTION LOCAL DEFAULT 4
6: 00000000 0 SECTION LOCAL DEFAULT 5
7: 00000000 0 NOTYPE LOCAL DEFAULT 5 $d
8: 00000000 0 SECTION LOCAL DEFAULT 7
9: 00000000 0 FUNC GLOBAL DEFAULT 1 memcpy_optimized
10: 00000000 0 NOTYPE GLOBAL DEFAULT UND __aeabi_unwind_cpp_pr0
No version information found in this file.
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "7-A"
Tag_CPU_arch: v7
Tag_CPU_arch_profile: Application
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: VFPv3
Tag_Advanced_SIMD_arch: NEONv1
Tag_DIV_use: Not allowed
It seems to me that you compiled foo.c as C++, hence the linking error. What makes me say so is that the linker reported the full prototype of the missing function. A C function's symbol is just the name of the function, whereas a C++ mangled name encodes the function's full prototype, so the mangled reference from foo.c cannot match the plain memcpy_optimized symbol defined in memcpy_optimized.o (entry 9 in its symbol table).
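A common way to declare such a routine so that it can be called from both C and C++ is an `extern "C"` guard. Compiled as C++ without it, the declaration from the question would make g++ reference a mangled symbol, something like `_Z16memcpy_optimizedPvS_j` on ARM EABI, which the assembly file never defines. The sketch assumes GCC or Clang (`__restrict` is a compiler extension), and the C body of `memcpy_optimized` is only a hypothetical stand-in so that the example links on its own.

```c
#include <stddef.h>
#include <string.h>

/* Declaration as it could appear in a shared header. Compiled as C++,
   extern "C" turns off name mangling, so calls reference the plain symbol
   memcpy_optimized that the assembly file defines. Compiled as C,
   __cplusplus is undefined and the guard disappears. */
#ifdef __cplusplus
extern "C" {
#endif
void *memcpy_optimized(void *__restrict dst, void *__restrict src, size_t n);
#ifdef __cplusplus
}
#endif

/* Hypothetical stand-in so this sketch links on any host. In the real
   project the NEON routine in memcpy_optimized.S provides this symbol and
   this definition must not be compiled in. */
void *memcpy_optimized(void *__restrict dst, void *__restrict src, size_t n)
{
    return memcpy(dst, src, n);
}
```

In the real build only the guarded declaration goes into foo.c (or a header it includes); the object built from memcpy_optimized.S supplies the definition.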
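On some platforms, C names are decorated with an initial underscore in object code (for example Mach-O on macOS, 32-bit Windows, and older a.out-based Unix systems). On those, to call memcpy_optimized from C, the assembly file must define the name _memcpy_optimized. ELF targets such as arm-none-linux-gnueabi do not add the underscore, so on this toolchain the plain memcpy_optimized label is already correct.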
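To check which convention a given target uses, GCC and Clang predefine `__USER_LABEL_PREFIX__`, which expands to the prefix added to C names (empty on ELF, `_` on Mach-O). The C sketch below turns it into a string; `c_symbol_name` is a hypothetical helper, and the code assumes a GCC- or Clang-compatible preprocessor.

```c
#include <stdio.h>

/* Two-level stringification so __USER_LABEL_PREFIX__ is expanded first. */
#define STR2(x) #x
#define STR(x) STR2(x)

/* Assumes GCC or Clang, which predefine __USER_LABEL_PREFIX__: empty on ELF
   targets, "_" on targets that decorate C names (e.g. Mach-O). */
static const char user_label_prefix[] = STR(__USER_LABEL_PREFIX__);

/* Write into buf the assembler-level name that the C identifier c_name gets
   on the current target; returns what snprintf returns. */
static int c_symbol_name(char *buf, size_t len, const char *c_name)
{
    return snprintf(buf, len, "%s%s", user_label_prefix, c_name);
}
```

The same macro can be token-pasted onto the label in a .S file that goes through the preprocessor, so one assembly source produces the right symbol name on both kinds of target.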