Crafting an ELF file using linker scripts without zero-initialized blocks between sections - linker

I am trying to craft a linker script that produces an executable bootable by legacy GRUB (using multiboot). I am having difficulty getting the multiboot header into the required location (within the first 8192 bytes). My script looks something like:
SECTIONS
{
.multiboot :
{
__multiboot_header = .;
*(.multiboot)
}
.text 0x00100000 :
{
*(.text*)
*(.rodata)
}
/* ... remainder of script ... */
}
Overall, my objective is to have my custom executable loaded by the bootloader after the first 1MiB of physical memory; the address as part of the .text section declaration seems to have done this as I expected. Reading the ELF header gives the entry point as:
$ readelf -h kernel.elf | grep Entry
Entry point address: 0x100000
However, in doing so I seem to have also increased the file by this much.
$ ls -l file.elf
-rwxr-xr-x 1 user user 1049960 May 13 02:20 file.elf
The area between the ELF header and the .text section is initialized to zeros.
$ hexdump -C file.elf
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 03 00 01 00 00 00 00 00 10 00 34 00 00 00 |............4...|
00000020 00 04 10 00 00 00 00 00 34 00 20 00 02 00 28 00 |........4. ...(.|
00000030 09 00 08 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 f8 00 10 00 f8 f0 4e 00 07 00 00 00 |..........N.....|
00000050 00 00 20 00 51 e5 74 64 00 00 00 00 00 00 00 00 |.. .Q.td........|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 |................|
00000070 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00100000 8b 25 f8 f0 4e 00 50 53 e8 66 00 00 00 fa f4 eb |.%..N.PS.f......|
00100010 fc 55 89 e5 53 83 ec 10 c7 45 f8 00 00 00 00 eb |.U..S....E......|
00100020 38 a1 f4 00 10 00 8b 55 f8 01 d2 01 d0 8b 15 f4 |8......U........|
Also, although readelf -s reports that __multiboot_header has a value of 0x0 (which should be the address of the structure since it was defined at the same point it was mentioned in the linker file, right?):
$ readelf -s kernel.elf | grep multiboot
22: 00000000 0 NOTYPE GLOBAL DEFAULT 1 __multiboot_header
The output of readelf -S seems to conflict:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 1] .multiboot PROGBITS 00000000 1000f8 00000c 00 0 0 1
[ 2] .text PROGBITS 00100000 100000 000096 00 AX 0 0 1
Which implies that the .multiboot section is actually inside the .text section.
If I offset into the file by 0x1000f8 then I can find the structure, however, I am not sure where the offset came from.
tl;dr
1) How can I ensure a specific data structure is within the first 8192 bytes of the output file?
2) How can I specify the load address without inflating the output binary with large gaps of zero-initialized blocks?

The ELF file format follows its own rules; how it stores the information is not directly controlled by the linker script (for example, the file offsets at which sections are stored need not correlate with their addresses in memory at run time). The SECTIONS command in your linker script describes the memory layout, and the ELF file records that layout. You need an ELF-capable loader to load the individual sections to their target addresses.
To get a flat binary file that can be loaded 1:1 into memory, use objcopy (for example, objcopy -O binary myfile.elf myfile.bin). The layout of that file is directly determined by your linker script, and the content of the .multiboot section should then really be at offset 0.
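Whichever output format you end up with, the "first 8192 bytes" requirement can be checked on the file itself rather than inferred from readelf. Here is a minimal sketch in C, assuming the Multiboot 1 header layout (three 32-bit little-endian fields: magic 0x1BADB002, flags, checksum, which sum to zero modulo 2^32; the header is 4-byte aligned and lies wholly within the first 8192 bytes). The function name find_multiboot_header is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_MAGIC  0x1BADB002u  /* Multiboot 1 header magic */
#define MULTIBOOT_SEARCH 8192u        /* header must lie within this many bytes */

/* Read a 32-bit little-endian value (x86 byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: scan a file image for a Multiboot 1 header.
 * Returns the file offset of the header, or -1 if none is found
 * within the first 8192 bytes.
 */
static long find_multiboot_header(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    size_t limit = len < MULTIBOOT_SEARCH ? len : MULTIBOOT_SEARCH;

    /* The header is 32-bit aligned, so step 4 bytes at a time. */
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic    = read_le32(image + off);
        uint32_t flags    = read_le32(image + off + 4);
        uint32_t checksum = read_le32(image + off + 8);

        /* magic + flags + checksum must be 0 modulo 2^32. */
        if (magic == MULTIBOOT_MAGIC &&
            (uint32_t)(magic + flags + checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

Read the output file into a buffer and pass it to find_multiboot_header. For the ELF in the question, where the structure sits at file offset 0x1000f8, this should return -1, which is exactly the problem the bootloader runs into.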

Related

Why does my OS won't boot but writes a space before the cursor in QEMU? (long question) [duplicate]

I've been banging my head against the wall in an attempt to understand why the following assembly is not correctly dumping the contents of 'HELLO_WORLD'.
; Explicitly set 16-bit
[ BITS 16 ]
[ ORG 0x7C00 ]
; Create label for hello world string terminated by null.
HELLO_WORLD db 'hello world', 0
start:
; Move address of HELLO_WORLD into si
mov SI, HELLO_WORLD
call print_string
; Continue until the end of time
jmp $
print_string:
loop:
; Retrieve value stored in address at si
mov al, [SI]
mov ah, 0x0E
cmp al, 0
; Finish execution after hitting null terminator
je return
INT 0x10
; Increment contents of si (address)
inc SI
jmp loop
return:
ret
; boot loader length *must* be 512 bytes.
times 510-($-$$) db 0
dw 0xAA55
In the end, I discovered that if execution jumps over the string (so that it is never treated as code), it works correctly.
jmp start
HELLO_WORLD db 'hello world',0
The part I find the most confusing, looking at the hex dump, HELLO_WORLD is still in the binary (at the beginning - and there appears to be no distinction of its type).
cat nojmp_boot.out
00000000 68 65 6c 6c 6f 20 77 6f 72 6c 64 00 be 00 7c e8 |hello world...|.|
00000010 02 00 eb fe 8a 04 b4 0e 3c 00 74 05 cd 10 46 eb |........<.t...F.|
00000020 f3 c3 eb e8 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200
cat jmpboot.out
00000000 eb 22 68 65 6c 6c 6f 20 77 6f 72 6c 64 00 be 02 |."hello world...|
00000010 7c e8 02 00 eb fe 8a 04 b4 0e 3c 00 74 05 cd 10 ||.........<.t...|
00000020 46 eb f3 c3 eb e8 00 00 00 00 00 00 00 00 00 00 |F...............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200
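Inspecting the first two bytes, we can see 'eb 22' is a short jump (opcode EB) with an 8-bit displacement of 0x22, counted from the end of the instruction, so it lands at offset 0x24 (http://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf).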
My question is:
Why can we not have 'HELLO_WORLD' as a part of the execution of the program? As far as I was concerned, there was no distinction between code and data.
I'm using the following for compilation:
nasm -f bin -o boot.bin boot.asm && if [ $(stat -c "%s" boot.bin) -ne 512 ]; then x; fi && qemu-system-x86_64 boot.bin
Execution starts at the top. If you omit the jmp start then the character h will get interpreted by the CPU as if it were an instruction. Surely you see that such can not be correct?
as far as I was concerned, there was no distinction between code and data?
There's no distinction between code and data when we consider their placement in the binary. But code and data still remain two completely different items. Code is the only one that is meant to be executed by the CPU.
Since you're creating a boot sector the execution begins at the first byte of the generated file. It won't begin at the start label or anywhere else. Since the string "hello world" is at the start of the file these bytes are what get executed first. These bytes are interpreted by the CPU as instructions, not characters, and they get executed as whatever instructions they get decoded as.
Here are the instructions that get executed:
7c00: 68 65 6c push 0x6c65
7c03: 6c ins BYTE PTR es:[di],dx
7c04: 6f outs dx,WORD PTR ds:[si]
7c05: 20 77 6f and BYTE PTR [bx+0x6f],dh
7c08: 72 6c jb 0x7c76
7c0a: 64 00 be 00 7c add BYTE PTR fs:[bp+0x7c00],bh
7c0f: e8 02 00 call 0x7c14
7c12: eb fe jmp 0x7c12
7c14: 8a 04 mov al,BYTE PTR [si]
...
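The listing above contains two relative jumps (jb 0x7c76 and jmp 0x7c12), and the jmpboot.out dump starts with eb 22. The target of such a two-byte jump is not the displacement byte itself: the displacement is signed and is added to the address of the next instruction. A minimal sketch of that arithmetic in C, with a made-up helper name (decode_short_jump) and covering only the EB and 70..7F opcodes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if the two bytes at code[0..1] encode a rel8 jump
 * (JMP short = 0xEB, or a conditional Jcc = 0x70..0x7F), store the
 * destination address in *target and return 1; otherwise return 0.
 * addr is the address of the first byte, e.g. 0x7C00 for a boot sector.
 */
static int decode_short_jump(const uint8_t *code, uint16_t addr,
                             uint16_t *target)
{
    assert(code != NULL && target != NULL);

    uint8_t opcode = code[0];
    if (opcode != 0xEB && (opcode < 0x70 || opcode > 0x7F))
        return 0;

    /* The displacement is a signed byte... */
    int disp = code[1] >= 0x80 ? (int)code[1] - 0x100 : (int)code[1];

    /* ...counted from the address of the next instruction (addr + 2). */
    *target = (uint16_t)(addr + 2 + disp);
    return 1;
}
```

Feeding it the bytes eb fe at 0x7c12 gives 0x7c12 (the jmp $ loop), and 72 6c at 0x7c08 gives 0x7c76, matching the disassembly. The same rule puts the eb 22 at 0x7c00 at 0x7c24.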

U-boot: how to check if tftp command successfully loaded image into ram?

I load a rootfs image into RAM via u-boot tftp and flash it to the device flash storage. This is currently done manually, but now I want to do it automatically via a u-boot script:
tftp ${rootfs_image};
mmc write ${loadaddr} ${blk} ${cnt}
However, when it looks for an image from the tftp server with the u-boot tftp ${rootfs_image} command and it DOESN'T find the image, I don't want to run the mmc write part of the script.
How do I check if the tftp command successfully downloaded the image into RAM?
Using the TFTP protocol does not ensure that the integrity of the transferred data will be preserved - see section Security Consideration in this article.
Assuming your u-boot has the hash command available, or that you can re-compile it with CONFIG_CMD_HASH=y, you could use an SHA-256 hash for verifying that your image was properly transferred:
On a Linux TFTP server:
# create an image for the purpose of the example
echo "Binary Image" > image.bin
# display sha256 hash for image.bin
sha256sum -b image.bin
36949f85f1bff0d5d1dd5fcfdfd725e919b0ee64be24f7f3ccfb53908fd09550 *image.bin
# create a file containing the hash in binary
# credits:
sha256sum -b image.bin | xxd -r -p > image.bin.sha256sum.bin
# display content of binary file
hexdump -C image.bin.sha256sum.bin
00000000 36 94 9f 85 f1 bf f0 d5 d1 dd 5f cf df d7 25 e9 |6........._...%.|
00000010 19 b0 ee 64 be 24 f7 f3 cc fb 53 90 8f d0 95 50 |...d.$....S....P|
00000020
On your u-boot system (using the memory layout available on my Allwinner H5 system here):
# 0x40080000: address where image.bin will be transferred
# 0x40090000: address where image.bin.sha256sum.bin will be transferred
# 0x400A0000: address where the sha256 hash will be computed by u-boot on the 13 bytes of image.bin
# clearing memory
mw.b 0x40080000 0 0x2000
mw.b 0x40090000 0 0x20
mw.b 0x400A0000 0 0x20
md.b 0x40080000 0x20
40080000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
40080010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
md.b 0x40090000 0x20
40090000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
40090010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
md.b 0x400A0000 0x20
400a0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
400a0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
tftp 0x40080000 image.bin
Using ethernet#1c30000 device
TFTP from server 192.168.1.22; our IP address is 192.168.1.2
Filename 'image.bin'.
Load address: 0x40080000
Loading: #
5.9 KiB/s
done
Bytes transferred = 13 (d hex)
tftp 0x40090000 image.bin.sha256sum.bin
Using ethernet#1c30000 device
TFTP from server 192.168.1.22; our IP address is 192.168.1.2
Filename 'image.bin.sha256sum.bin'.
Load address: 0x40090000
Loading: #
15.6 KiB/s
done
Bytes transferred = 32 (20 hex)
md.b 0x40090000 0x20
40090000: 36 94 9f 85 f1 bf f0 d5 d1 dd 5f cf df d7 25 e9 6........._...%.
40090010: 19 b0 ee 64 be 24 f7 f3 cc fb 53 90 8f d0 95 50 ...d.$....S....P
hash sha256 0x40080000 0x0d *0x400A0000
sha256 for 40080000 ... 4008000c ==> 36949f85f1bff0d5d1dd5fcfdfd725e919b0ee64be24f7f3ccfb53908fd09550
cmp.b 0x40090000 0x400A0000 0x20
Total of 32 byte(s) were the same
echo $?
0
If image.bin and/or image.bin.sha256sum.bin had been improperly transferred, it is extremely unlikely that the computed SHA-256 would match the transferred one; using SHA-512 would make this even more unlikely.
The outcome would have been in the case of an incorrect transfer:
echo $?
1
In real life, it would be more practical to transfer an image with a fixed, maximum length, for example padded with zeroes, so that a u-boot script responsible for validating the transferred image could use a fixed length, say 8 KiB, that is 0x2000 bytes.
ls -lgG image.bin
-rw-rw-r-- 1 13 Dec 17 20:34 image.bin
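# extend the file to 8 KiB with zero bytes, keeping the existing content
# (dd with oflag=append but without conv=notrunc would truncate it first)
truncate -s 8K image.bin
ls -lgG image.bin
-rw-rw-r-- 1 8192 Dec 17 21:03 image.bin
hexdump -C image.bin
00000000 42 69 6e 61 72 79 20 49 6d 61 67 65 0a 00 00 00 |Binary Image....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000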
The correct u-boot command to use for computing the hash would be:
hash sha256 0x40080000 0x2000 *0x400A0000
And a new binary file containing the new hash would of course have to be created as well:
sha256sum -b image.bin | xxd -r -p > image.bin.sha256sum.bin
I used two files for the purpose of the example, but you could just append image.bin.sha256sum.bin to image.bin and transfer one single file.
You would have to replace 0x400A0000 by 0x40082000 in the hash and cmp commands.
I hope this helps.
The tftp command returns true if it succeeds. So you could write:
tftp ${rootfs_image} && mmc write ${loadaddr} ${blk} ${cnt}
Now mmc write will only be executed if the tftp command succeeds.

panic: SetUint using value obtained using unexported field

From a byte buffer received from a server, I want to fill in a struct.
The buffer has a fixed-size binary format, as below.
00000000 83 27 48 12 6c 00 00 00 01 02 00 00 01 01 00 02 |.'H.l...........|
00000010 10 01 d2 02 96 49 00 00 00 00 87 d6 12 00 00 00 |.....I..........|
00000020 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 02 01 02 3c 01 01 00 00 00 01 01 01 01 18 10 |....<...........|
00000040 2c 01 90 01 01 6c 07 03 c8 02 01 02 03 9c 0a 0b |,....l..........|
00000050 0c 00 00 00 01 01 00 00 00 00 00 00 00 01 01 01 |................|
00000060 01 01 01 01 01 01 01 01 01 00 01 01 01 00 00 00 |................|
My struct is below.
type HeaderT struct {
magicValue [8]byte
bodyLength [4]byte
bodyVersion [1]byte
...
}
My implementation is below.
func onMessageReceived(client MQTT.Client, message MQTT.Message) {
payload := message.Payload()
fmt.Printf("Received message on topic: %s\nMessage: \n%s\n", message.Topic(), hex.Dump(payload))
header := HeaderT {}
err := binary.Read(bytes.NewBuffer(payload[:]), binary.LittleEndian, &header) // <-- error occurred at this line
...
}
My code panics as below.
panic: reflect: reflect.Value.SetUint using value obtained using unexported field
goroutine 38 [running]:
reflect.flag.mustBeAssignable(0x1a8)
	/usr/local/go/src/reflect/value.go:231 +0x1ee
reflect.Value.SetUint(0x12540e0, 0xc0001a2000, 0x1a8, 0x83)
	/usr/local/go/src/reflect/value.go:1551 +0x2f
encoding/binary.(*decoder).value(0xc000148d88, 0x12540e0, 0xc0001a2000, 0x1a8)
	/usr/local/go/src/encoding/binary/binary.go:548 +0x7c6
encoding/binary.(*decoder).value(0xc000148d88, 0x125cfc0, 0xc0001a2000, 0x1b1)
	/usr/local/go/src/encoding/binary/binary.go:510 +0x104
encoding/binary.(*decoder).value(0xc000148d88, 0x129fa00, 0xc0001a2000, 0x199)
	/usr/local/go/src/encoding/binary/binary.go:523 +0x2c5
encoding/binary.Read(0x12fcf80, 0xc00018a150, 0x1300c60, 0x14d76d0, 0x1248040, 0xc0001a2000, 0x0, 0x0)
	/usr/local/go/src/encoding/binary/binary.go:248 +0x342
main.onMessageReceived(0x13012a0, 0xc000140000, 0x1300c00, 0xc000192000)
The issue is that none of HeaderT's fields are "public".
Notice that all the fields start with a lowercase letter - that means the fields are unreachable to any code outside of your package.
From the spec:
Exported identifiers
An identifier may be exported to permit access to it from another package. An identifier is exported if both:
the first character of the identifier's name is a Unicode upper case letter (Unicode class "Lu"); and
the identifier is declared in the package block or it is a field name or method name.
All other identifiers are not exported.
Try exporting them by capitalizing their names:
type HeaderT struct {
MagicValue [8]byte
BodyLength [4]byte
BodyVersion [1]byte
...
}

How to read binary executable by instructions?

Is there a way to programmatically read a given number of instructions from a binary executable file on the x86 architecture?
If I had a binary of a simple C program hello.c:
#include <stdio.h>
int main(){
printf("Hello world\n");
return 0;
}
Where after compilation using gcc, the disassembled function main looks like this:
000000000000063a <main>:
63a: 55 push %rbp
63b: 48 89 e5 mov %rsp,%rbp
63e: 48 8d 3d 9f 00 00 00 lea 0x9f(%rip),%rdi # 6e4 <_IO_stdin_used+0x4>
645: e8 c6 fe ff ff callq 510 <puts@plt>
64a: b8 00 00 00 00 mov $0x0,%eax
64f: 5d pop %rbp
650: c3 retq
651: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
658: 00 00 00
65b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Is there an easy way in C to read for example first three instructions (meaning the bytes 55, 48, 89, e5, 48, 8d, 3d, 9f, 00, 00, 00) from main? It is not guaranteed that the function looks like this - the first instructions may have all different opcodes and sizes.
The program below prints the first 10 bytes of the main function by taking the address of the function, converting it to a pointer to unsigned char, and printing each byte in hex.
This small snippet doesn't count the instructions. For that you would need an instruction size table (not very difficult, just tedious unless you find one already made; see "What is the size of each asm instruction?") to predict the size of each instruction from its leading bytes. On x86 the length depends on more than the first byte: prefixes, the ModRM and SIB bytes, and the displacement and immediate sizes all contribute.
(Unless, of course, the processor you're targeting has a fixed instruction size, which makes the problem trivial to solve.)
Debuggers have to decode operands as well, but in some cases like step or trace, I suspect they have a table handy to compute the next breakpoint address.
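#include <stdio.h>
int main(){
    printf("Hello world\n");
    /* Treat the machine code of main as raw bytes. Converting a function
       pointer to an object pointer is not strictly portable ISO C, but it
       works on common platforms such as x86 with gcc. */
    const unsigned char *start = (const unsigned char *)&main;
    int i;
    /* Print the first 10 bytes in hex, one per line. */
    for (i=0;i<10;i++)
    {
        printf("%x\n",start[i]);
    }
    return 0;
}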
output:
Hello world
55
89
e5
83
e4
f0
83
ec
20
e8
seems to match the disassembly :)
00401630 <_main>:
401630: 55 push %ebp
401631: 89 e5 mov %esp,%ebp
401633: 83 e4 f0 and $0xfffffff0,%esp
401636: 83 ec 20 sub $0x20,%esp
401639: e8 a2 01 00 00 call 4017e0 <___main>
.globl _start
_start:
bl main
b .
.globl main
main:
add r1,#1
add r2,#1
add r3,#1
add r4,#1
b main
this is intentionally the wrong architecture: the architecture doesn't matter, the file format does. I built this into an elf file, which is a very popular format and is simply a file format. that is what I understood your question to be about: reading a file, not modifying the binary so that the program reads itself from memory at runtime.
the format is popular enough that there are tools that do this, which you appear to know how to run.
Disassembly of section .text:
00001000 <_start>:
1000: eb000000 bl 1008 <main>
1004: eafffffe b 1004 <_start+0x4>
00001008 <main>:
1008: e2811001 add r1, r1, #1
100c: e2822001 add r2, r2, #1
1010: e2833001 add r3, r3, #1
1014: e2844001 add r4, r4, #1
1018: eafffffa b 1008 <main>
if I hexdump the file though
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 28 00 01 00 00 00 00 10 00 00 34 00 00 00 |..(.........4...|
00000020 c0 11 00 00 00 02 00 05 34 00 20 00 01 00 28 00 |........4. ...(.|
00000030 06 00 05 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 1c 10 00 00 1c 10 00 00 05 00 00 00 |................|
00000050 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2 |............. ..|
00001010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11 00 00 |.0...@......A...|
00001020 00 61 65 61 62 69 00 01 07 00 00 00 08 01 00 00 |.aeabi..........|
00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001040 00 00 00 00 00 10 00 00 00 00 00 00 03 00 01 00 |................|
00001050 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
00001060 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
00001070 06 00 00 00 00 10 00 00 00 00 00 00 00 00 01 00 |................|
00001080 18 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
00001090 09 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
000010a0 17 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
000010b0 55 00 00 00 00 10 00 00 00 00 00 00 10 00 01 00 |U...............|
000010c0 23 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |#...............|
000010d0 2f 00 00 00 08 10 00 00 00 00 00 00 10 00 01 00 |/...............|
000010e0 34 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |4...............|
000010f0 3c 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |<...............|
00001100 43 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |C...............|
00001110 48 00 00 00 00 00 08 00 00 00 00 00 10 00 01 00 |H...............|
00001120 4f 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |O...............|
00001130 00 73 6f 2e 6f 00 24 61 00 5f 5f 62 73 73 5f 73 |.so.o.$a.__bss_s|
00001140 74 61 72 74 5f 5f 00 5f 5f 62 73 73 5f 65 6e 64 |tart__.__bss_end|
00001150 5f 5f 00 5f 5f 62 73 73 5f 73 74 61 72 74 00 6d |__.__bss_start.m|
00001160 61 69 6e 00 5f 5f 65 6e 64 5f 5f 00 5f 65 64 61 |ain.__end__._eda|
00001170 74 61 00 5f 65 6e 64 00 5f 73 74 61 63 6b 00 5f |ta._end._stack._|
00001180 5f 64 61 74 61 5f 73 74 61 72 74 00 00 2e 73 79 |_data_start...sy|
00001190 6d 74 61 62 00 2e 73 74 72 74 61 62 00 2e 73 68 |mtab..strtab..sh|
000011a0 73 74 72 74 61 62 00 2e 74 65 78 74 00 2e 41 52 |strtab..text..AR|
000011b0 4d 2e 61 74 74 72 69 62 75 74 65 73 00 00 00 00 |M.attributes....|
000011c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000011e0 00 00 00 00 00 00 00 00 1b 00 00 00 01 00 00 00 |................|
000011f0 06 00 00 00 00 10 00 00 00 10 00 00 1c 00 00 00 |................|
00001200 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00001210 21 00 00 00 03 00 00 70 00 00 00 00 00 00 00 00 |!......p........|
00001220 1c 10 00 00 12 00 00 00 00 00 00 00 00 00 00 00 |................|
00001230 01 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00001240 00 00 00 00 00 00 00 00 30 10 00 00 00 01 00 00 |........0.......|
00001250 04 00 00 00 05 00 00 00 04 00 00 00 10 00 00 00 |................|
00001260 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00001270 30 11 00 00 5c 00 00 00 00 00 00 00 00 00 00 00 |0...\...........|
00001280 01 00 00 00 00 00 00 00 11 00 00 00 03 00 00 00 |................|
00001290 00 00 00 00 00 00 00 00 8c 11 00 00 31 00 00 00 |............1...|
000012a0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000012b0
can google the file format and find a lot of info at wikipedia, with a smidge more at one of the links
useful header information
00 10 00 00 entry
34 00 00 00 phoff
c0 11 00 00 shoff
00 02 00 05 flags
34 00 ehsize
20 00 phentsize
01 00 phnum
28 00 shentsize
06 00 shnum
05 00 shstrndx
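the same fields can be pulled out of the first 52 bytes in code. a minimal sketch in C, assuming a 32-bit little-endian elf like the one dumped above; the struct and function names (elf32_summary, parse_elf32_header) are made up for illustration, the field offsets are the standard Elf32 header ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of the ELF32 file header: the fields listed above. */
struct elf32_summary {
    uint32_t entry, phoff, shoff, flags;
    uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: decode the header of a 32-bit little-endian ELF.
 * Returns 0 on success, -1 if the buffer is too short or is not that
 * kind of file.
 */
static int parse_elf32_header(const uint8_t *buf, size_t len,
                              struct elf32_summary *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 52 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    if (buf[4] != 1 || buf[5] != 1)  /* ELFCLASS32, ELFDATA2LSB */
        return -1;

    out->entry     = le32(buf + 0x18);
    out->phoff     = le32(buf + 0x1c);
    out->shoff     = le32(buf + 0x20);
    out->flags     = le32(buf + 0x24);
    out->ehsize    = le16(buf + 0x28);
    out->phentsize = le16(buf + 0x2a);
    out->phnum     = le16(buf + 0x2c);
    out->shentsize = le16(buf + 0x2e);
    out->shnum     = le16(buf + 0x30);
    out->shstrndx  = le16(buf + 0x32);
    return 0;
}
```

run on the first four lines of the hexdump it gives entry 0x1000, shoff 0x11c0, shentsize 0x28 and shnum 6, which is where the section header walk that follows starts.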
so if I look at the beginning of the sections there are shnum number of them
0x11C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x11E8 1b 00 00 00 01 00 00 00 06 00 00 00 00 10 00 00 00 10 00 00
0x1210 21 00 00 00 03 00 00 70 00 00 00 00 00 00 00 00 1c 10 00 00
0x1238 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 30 10 00 00
0x1260 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 30 11 00 00
0x1288 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 8c 11 00 00
0x1260 strtab type offset 0x1130 which is broken into null terminated strings until you hit a double null
[0] 00
[1] 73 6f 2e 6f 00 so.o
[2] 24 61 00 $a
[3] 5f 5f 62 73 73 5f 73 74 61 72 74 5f 5f 00 __bss_start__
[4] 5f 5f 62 73 73 5f 65 6e 64 5f 5f 00 __bss_end__
[5] 5f 5f 62 73 73 5f 73 74 61 72 74 00 __bss_start
[6] 6d 61 69 6e 00 main
...
main is at address 0x115F in the file which is offset 0x2F in the
strtab.
0x1238 symtab starts at 0x1030, 0x10 or 16 bytes per entry
00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001040 00 00 00 00 00 10 00 00 00 00 00 00 03 00 01 00 |................|
00001050 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
00001060 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
00001070 06 00 00 00 00 10 00 00 00 00 00 00 00 00 01 00 |................|
00001080 18 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
00001090 09 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
000010a0 17 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
000010b0 55 00 00 00 00 10 00 00 00 00 00 00 10 00 01 00 |U...............|
000010c0 23 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |#...............|
000010d0 2f 00 00 00 08 10 00 00 00 00 00 00 10 00 01 00 |/...............|
000010e0 34 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |4...............|
000010f0 3c 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |<...............|
00001100 43 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |C...............|
00001110 48 00 00 00 00 00 08 00 00 00 00 00 10 00 01 00 |H...............|
00001120 4f 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |O...............|
000010d0 2f 00 00 00 is the symbol table entry whose name field holds the 0x2f string table offset
so this is main, from this entry the address 08 10 00 00 or 0x1008 in
the processors memory, unfortunately due to the values I chose it happens to also be the file offset, dont get that confused.
this section is type 00000001 PROGBITS
0x11E8 1b 00 00 00 01 00 00 00 06 00 00 00 00 10 00 00 00 10 00 00
offset 0x1000 in the file 0x1C bytes
here is the program, the machine code.
00001000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2
00001010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11
so starting at memory offset 0x1008 which is 8 bytes after the
entry point (unfortunately I picked a bad address to use) we need to
go 0x8 bytes offset into this data
01 10 81 e2 01 20 82 e2
00001008 <main>:
1008: e2811001 add r1, r1, #1
100c: e2822001 add r2, r2, #1
1010: e2833001 add r3, r3, #1
this is all very file dependent, the cpu couldn't care less about labels, main only means something to the humans, not the cpu.
If I convert the elf into other formats which are perfectly executable:
motorola s record:
S00A0000736F2E7372656338
S1131000000000EBFEFFFFEA011081E2012082E212
S10F1010013083E2014084E2FAFFFFEAB1
S9031000EC
raw binary image
hexdump -C so.bin
00000000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2 |............. ..|
00000010 01 30 83 e2 01 40 84 e2 fa ff ff ea |.0...@......|
0000001c
The instruction bytes of interest are of course there, but the symbol information isnt. It depends on the file format you are interested in as to 1) if you can find "main" and then 2) print out the first few bytes at that address.
Hmm, a bit disturbing, but if you link for 0x2000 gnu ld burns some disk space and puts the offset at 0x2000, but choose 0x20000000 and it burns more disk space but not as much
000100d0 2f 00 00 00 08 00 00 20 00 00 00 00 10 00 01 00
shows the file offset is 0x010010 but the address in target space is 0x20000008
00010010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11 00 00
00010020 00 61 65 61 62 69 00 01 07 00 00 00 08 01
just to demonstrate/enforce the file offset and the target memory space address are two different things.
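the translation between the two is just arithmetic on three section header fields. a minimal sketch in C; the names section_range and vaddr_to_file_offset are made up for illustration. for the 0x20000000 build the section's file offset of 0x10000 is inferred from the dump lines above, it is not printed anywhere directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three section header fields needed for the translation. */
struct section_range {
    uint32_t addr;    /* sh_addr: address in target memory */
    uint32_t offset;  /* sh_offset: position in the file */
    uint32_t size;    /* sh_size: length in bytes */
};

/*
 * Hypothetical helper: translate a target memory address into a file
 * offset. Returns 0 and stores the offset in *file_off if vaddr lies
 * inside the section, -1 otherwise.
 */
static int vaddr_to_file_offset(const struct section_range *s,
                                uint32_t vaddr, uint32_t *file_off)
{
    assert(s != NULL && file_off != NULL);

    if (vaddr < s->addr || vaddr - s->addr >= s->size)
        return -1;

    /* Distance into the section is the same in memory and in the file. */
    *file_off = s->offset + (vaddr - s->addr);
    return 0;
}
```

with the first build (addr 0x1000, offset 0x1000, size 0x1c) main at 0x1008 maps to file offset 0x1008, which is the coincidence warned about earlier. with the second build (addr 0x20000000, offset 0x10000 as inferred) main at 0x20000008 maps to 0x10008.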
this is a very nice format for what you are wanting to do
arm-none-eabi-objcopy -O symbolsrec so.elf so.srec
cat so.srec
$$ so.srec
$a $20000000
_bss_end__ $2001001c
__bss_start__ $2001001c
__bss_end__ $2001001c
_start $20000000
__bss_start $2001001c
main $20000008
__end__ $2001001c
_edata $2001001c
_end $2001001c
_stack $80000
__data_start $2001001c
$$
S0090000736F2E686578A1
S31520000000000000EBFEFFFFEA011081E2012082E200
S31120000010013083E2014084E2FAFFFFEA9F
S70520000000DA

Write header with C program?

In order to boot a Linux kernel on an embedded device I have to tag the kernel with a special header. The program used to tag the kernel is provided by the manufacturer of the device as a 32-bit binary only. This is very annoying, as I have to install hundreds of megabytes of libraries on my 64-bit system only to tag a kernel with a few bytes. This is how the kernel is tagged:
$./mkimage -f kernel.cfg -d zImage_without_header zImage
kernel.cfg:
##########################################################
#ENCINFO.CFG
#
# information and command for encode the Linux zImage
##########################################################
# Magic number for the ImageHeader, use this to seach start of the Image Header
#
MAGIC_NUMBER 0x27051956
#operation system type
OS_TYPE linux
#cpu architecture type
CPU_ARCH arm
#image type
IMAGE_TYPE kernel
#compress type
COMPRESS_TYPE gzip
#
DATALOAD_ADDRESS 0x00008000
#
ENTRY_ADDRESS 0x00008000
#image name string
IMAGE_NAME kernel.img
#model name string
MODEL_NAME DNS-313
# version string
VERSION 1.00b18
# mac address string
MAC_ADDRESS FF-FF-FF-FF-FF-FF
#the beginning offset of writing header
START_OFFSET 0x00
#the end offset of writing header
END_OFFSET 0xFF
#whether overwrite
OVERWRITE n
The mkimage binary is different from the mkimage that is available from e.g. the Debian repository, that one will not work for my device. I have tried to create a 1MB file and tagged it to display the header:
$dd if=/dev/zero bs=1k count=1024 of=zImage_without_header
$./mkimage -f kernel.cfg -d zImage_without_header zImage
output from last command:
Magic Number: 27051956
Image Name: kernel.img
Created: Wed May 2 17:40:43 2012
Image Type: ARM Linux Kernel Image (gzip compressed)
Data Size: 1048576 Bytes = 1024.00 kB = 1.00 MB
Load Address: 0x00008000
Entry Point: 0x00008000
Model Name: DNS-313
Version : 1.00b18
Mac Address: ff:ff:ff:ff:ff:ff
$hexdump -C zImage
output from last command:
00000000 27 05 19 56 [2c 83 53 d5] 4f a1 [55 7b 00 10 00 00] |'..V,.S.O.U{....|
00000010 00 00 80 00 00 00 80 00 [a7 38 ea 1c] 05 02 02 01 |.........8......|
00000020 6b 65 72 6e 65 6c 2e 69 6d 67 00 00 00 00 00 00 |kernel.img......|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 44 4e 53 2d 33 31 33 00 00 00 00 00 00 00 00 00 |DNS-313.........|
00000050 31 2e 30 30 62 31 38 00 00 00 00 00 00 00 00 00 |1.00b18.........|
00000060 ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00100060
The kernels should always be tagged with a header like the one above, as I do not need to change anything. The values enclosed in brackets [] seem to change when the file size does, but I do not know how.
I think that the same thing could be accomplished with a small C program, but I am not sure where to start and how?
Any suggestions or ideas are welcome.
It might be a long shot, but if you do not have access to the "mkimage" source code, you can try disassembling it with objdump and try to figure out what is going on :
$ objdump -d ./mkimage
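As a starting point for the C program, here is a minimal sketch that fills in the first 64 bytes. It rests on an assumption read off the hexdump, not on anything the vendor documents: those bytes look like the U-Boot legacy image header (big-endian fields: magic, header CRC, timestamp, data size, load address, entry point, data CRC, then OS, architecture, type and compression bytes and a 32-byte name). The size at 0x0c (00 10 00 00 = 1048576) and the load address at 0x10 (00 00 80 00) agree with the mkimage output, and the bracketed values sit exactly where the two CRC fields and the size would be. The sketch uses the standard CRC-32 and covers only those 64 bytes. Whether the vendor tool computes its checksums the same way, and how the model, version and MAC fields from 0x40 onwards are treated, is not confirmed here. The names crc32_bytes, put_be32 and build_header are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 64  /* assumed: length of a U-Boot legacy image header */

/* Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: build the assumed 64-byte header for an image of
 * `size` bytes at `data`. The OS/arch/type/compression bytes are copied
 * from the dump in the question (05 02 02 01).
 */
static void build_header(uint8_t hdr[HDR_SIZE], const uint8_t *data,
                         uint32_t size, uint32_t timestamp,
                         uint32_t load, uint32_t entry, const char *name)
{
    assert(hdr != NULL && data != NULL && name != NULL);

    memset(hdr, 0, HDR_SIZE);
    put_be32(hdr + 0x00, 0x27051956u);              /* MAGIC_NUMBER */
    put_be32(hdr + 0x08, timestamp);                /* creation time */
    put_be32(hdr + 0x0c, size);                     /* data size */
    put_be32(hdr + 0x10, load);                     /* DATALOAD_ADDRESS */
    put_be32(hdr + 0x14, entry);                    /* ENTRY_ADDRESS */
    put_be32(hdr + 0x18, crc32_bytes(data, size));  /* data CRC */
    hdr[0x1c] = 5;  /* OS: linux */
    hdr[0x1d] = 2;  /* arch: arm */
    hdr[0x1e] = 2;  /* type: kernel */
    hdr[0x1f] = 1;  /* compression: gzip */
    strncpy((char *)hdr + 0x20, name, 31);          /* IMAGE_NAME, NUL padded */

    /* Header CRC is computed last, with its own field still zero. */
    put_be32(hdr + 0x04, crc32_bytes(hdr, HDR_SIZE));
}
```

To check the assumption, build a header for the same 1 MiB file of zeroes with timestamp 0x4fa1557b (the bytes at 0x08 in the dump) and compare the result byte for byte with the first 64 bytes that the vendor mkimage produced. If the two bracketed CRC fields differ, the vendor tool is doing something else and the disassembly route is the fallback.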

Resources