Where do addresses in S-Record files come from? - linker

I am developing a freestanding application for an ARM Cortex-M microcontroller, and while researching the structure of an S-Record file I found that I misunderstand how addresses are represented in the S-Record format.
I have a variable defined in my source code like so:
uint32_t g_ip_address = IP_ADDRESS(10, 1, 0, 56); // in LE: 0x3800010A
When I run objdump I see that the variable ends up in the .data section at address 0x1ffe01c4:
$ arm-none-eabi-objdump -t application.elf | grep g_ip_address
1ffe01c4 g O .data 00000004 g_ip_address
This makes sense, given that the memory section of my linker script looks like this and .data is going to RAM:
MEMORY
{
FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 0x0200000 /* 2M */
RAM (rwx) : ORIGIN = 0x1FFE0000, LENGTH = 0x00A0000 /* 640K */
}
However, when I look through the srec file, the address for the record is not in the RAM range starting at 0x1FFE0000. It's 0x0005F570, which seems to put it in the FLASH region (spaces added for clarity).
S315 0005F570 00000000 3800010A 000010180000000014
Is there an implicit offset encoded in a different record entry? How does objcopy get this new address? Is this value being encoded into a function in some way (some pre-main initialization of variables, perhaps)?
Ultimately, my goal is to be able to parse the srec file and patch the IP address value to create a new srec file. Is the idiomatic way of doing something like this simply to create a struct that hardcodes some leading magic number sequence that can be detected in the file?

flash.s
.cpu cortex-m0
.thumb
.word 0x00002000
.word reset
.thumb_func
reset:
b reset
.data
.word 0x11223344
.bss
.word 0x00000000
.word 0x00000000
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram AT > rom
.data : { *(.data*) } > ram AT > rom
}
build it
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy --srec-forceS3 so.elf -O srec so.srec
arm-none-eabi-objcopy -O binary so.elf so.bin
cat so.list
08000000 <reset-0x8>:
8000000: 00002000 andeq r2, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: e7fe b.n 8000008 <reset>
Disassembly of section .bss:
20000000 <.bss>:
...
Disassembly of section .data:
20000008 <.data>:
20000008: 11223344 ; <UNDEFINED> instruction: 0x11223344
cat so.srec
S00A0000736F2E7372656338
S30F080000000020000009000008FEE7D2
S3090800000A443322113A
S70508000000F2
arm-none-eabi-readelf -l so.elf
Elf file type is EXEC (Executable file)
Entry point 0x8000000
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000094 0x08000000 0x08000000 0x0000a 0x0000a R E 0x2
LOAD 0x000000 0x20000000 0x0800000a 0x00000 0x00008 RW 0x1
LOAD 0x00009e 0x20000008 0x0800000a 0x00004 0x00004 RW 0x1
Section to Segment mapping:
Segment Sections...
00 .text
01 .bss
02 .data
hexdump -C so.bin
00000000 00 20 00 00 09 00 00 08 fe e7 44 33 22 11 |. ........D3".|
0000000e
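.bss is not normally handled as shown here; usually you extend the linker script with symbols marking its start and end so that your bootstrap can zero that range. For .data you can clearly see what is going on with the standard binutils tools: the section runs at its RAM address (VirtAddr 0x20000008), but AT > rom gives it a load address in flash (PhysAddr 0x0800000a), and objcopy writes that load address into the S-record (S309 0800000A ...). The bootstrap is expected to copy .data from flash to RAM before main, which is very likely why your record carries a flash address rather than 0x1ffe01c4.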
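To read the address field of each record without decoding it by hand, a minimal sketch along the following lines may help. The function name parse_srec_line is made up for this illustration; the record layout (type digit, byte count, big-endian address, data, one's-complement checksum) is the standard S-record layout, and the sample lines are the contents of so.srec above.

```python
def parse_srec_line(line):
    """Split one S-record into (type, address, data); raise on a bad checksum."""
    line = line.strip()
    if not line.startswith("S"):
        raise ValueError("not an S-record")
    rec_type = int(line[1])
    raw = bytes.fromhex(line[2:])            # count, address, data, checksum
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1:                # count covers address + data + checksum
        raise ValueError("byte count mismatch")
    if (~(count + sum(body)) & 0xFF) != checksum:
        raise ValueError("bad checksum")
    # Width of the address field in bytes, per record type.
    addr_len = {0: 2, 1: 2, 2: 3, 3: 4, 5: 2, 6: 3, 7: 4, 8: 3, 9: 2}[rec_type]
    address = int.from_bytes(body[:addr_len], "big")
    return rec_type, address, body[addr_len:]


# Records copied from so.srec in the example above.
records = [
    "S00A0000736F2E7372656338",
    "S30F080000000020000009000008FEE7D2",
    "S3090800000A443322113A",
    "S70508000000F2",
]
for rec in records:
    rec_type, address, data = parse_srec_line(rec)
    print(f"S{rec_type} addr=0x{address:08X} data={data.hex()}")
```

The third record decodes to address 0x0800000A with data 44 33 22 11, which is the PhysAddr that readelf reports for .data, not the 0x20000008 run-time address.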
You have not provided enough of your code (and linker script), nor a minimal example that demonstrates the problem, so this is about as far as this can go.
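For the patching goal in the question, one possible approach is to translate the variable's run-time address into its load address and rewrite the bytes in the record that contains it, then recompute that record's checksum. The sketch below assumes S3 records and a value that lies entirely within one record; vma_to_lma, srec_checksum and patch_s3_record are hypothetical names, and the section addresses must be taken from objdump -h or readelf -l for your own ELF.

```python
def vma_to_lma(vma, section_vma, section_lma):
    """Translate a run-time address to the load address stored in the srec."""
    return vma - section_vma + section_lma


def srec_checksum(payload):
    """One's complement of the low byte of the sum of count, address and data."""
    return ~sum(payload) & 0xFF


def patch_s3_record(line, load_addr, new_bytes):
    """Return the record with new_bytes written at load_addr.

    The record is returned unchanged if it is not an S3 record or does not
    fully contain the target range (a value split across two records is
    not handled in this sketch).
    """
    line = line.strip()
    if not line.startswith("S3"):
        return line
    raw = bytearray(bytes.fromhex(line[2:]))
    start = int.from_bytes(raw[1:5], "big")   # 32-bit load address
    data_len = raw[0] - 5                     # count minus address and checksum
    offset = load_addr - start
    if offset < 0 or offset + len(new_bytes) > data_len:
        return line
    raw[5 + offset:5 + offset + len(new_bytes)] = new_bytes
    raw[-1] = srec_checksum(raw[:-1])
    return "S3" + raw.hex().upper()


# Using the .data word from the example: VMA 0x20000008, LMA 0x0800000a.
target = vma_to_lma(0x20000008, 0x20000008, 0x0800000A)
new_value = (0x55667788).to_bytes(4, "little")
print(patch_s3_record("S3090800000A443322113A", target, new_value))
```

Working from the symbol address and the section's VMA/LMA pair avoids having to plant a magic number in the image, although a marker struct would also work if the ELF is not available at patch time.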

Related

Understanding ELF Binary Size for nostdlib C Program

I'm on Ubuntu 20.04, gcc 9.3.0, ld 2.34.
I have a simple hello world program that does not use glibc or any other library and just uses the write syscall. Despite this, my binary size is roughly 8 KB. I'm unsure why it is that large rather than, say, 1 KB.
C Program:
int
x64_syscall_write(int fd, char const *data, unsigned long int data_size)
{
int result = 0;
__asm__ __volatile__("syscall"
: "=a" (result)
: "a" (1), "D" (fd),
"S" (data), "d" (data_size)
: "r11", "rcx", "memory");
return result;
}
__asm__(".global entry_point\n"
"entry_point:\n"
"xor rbp, rbp\n"
"pop rdi\n"
"mov rsi, rsp\n"
"and rsp, 0xfffffffffffffff0\n"
"call main\n"
"mov rdi, rax\n"
"mov rax, 60\n"
"syscall\n"
"ret");
int
main(int argc, char *argv[])
{
x64_syscall_write(1, "hello\n", 6);
return 0;
}
Built with:
gcc -ffreestanding -static -nostdlib -no-pie -masm=intel \
-fno-unwind-tables -fno-asynchronous-unwind-tables \
-Wl,--gc-sections -fdata-sections -Os \
hello.c -c -o hello.o
# NOTE: I know more could be done here to shave
# off a few more bytes, but I feel this is the bulk of it.
ld -e entry_point hello.o -o hello
hello.o is 1.7 KB.
hello is 8.4 KB.
readelf -Wl hello
Elf file type is EXEC (Executable file)
Entry point 0x40101c
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0001b0 0x0001b0 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x000045 0x000045 R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000007 0x000007 R 0x1000
NOTE 0x000190 0x0000000000400190 0x0000000000400190 0x000020 0x000020 R 0x8
GNU_PROPERTY 0x000190 0x0000000000400190 0x0000000000400190 0x000020 0x000020 R 0x8
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .note.gnu.property
01 .text
02 .rodata
03 .note.gnu.property
04 .note.gnu.property
05
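Here you can see that the linker created 3 LOAD segments: one for the ELF header and other metadata, one for .text, and one for .rodata. Each is aligned to 0x1000, so they sit at file offsets 0x0, 0x1000, and 0x2000, and most of the file is padding between them.
Linking with -z noseparate-code results in a much smaller binary (smaller than hello.o):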
ls -l hello*
-rwxr-xr-x 1 user user 1384 Apr 26 22:24 hello
-rw-r--r-- 1 user user 603 Apr 26 22:22 hello.c
-rw-r--r-- 1 user user 1680 Apr 26 22:22 hello.o
readelf -Wl hello
Elf file type is EXEC (Executable file)
Entry point 0x40015c
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x00018c 0x00018c R E 0x1000
NOTE 0x000120 0x0000000000400120 0x0000000000400120 0x000020 0x000020 R 0x8
GNU_PROPERTY 0x000120 0x0000000000400120 0x0000000000400120 0x000020 0x000020 R 0x8
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .note.gnu.property .text .rodata
01 .note.gnu.property
02 .note.gnu.property
03
You can shrink this further by removing .note.GNU-stack and .note.gnu.property sections:
objcopy -R .note.* hello.o hello1.o
ld -e entry_point hello1.o -o hello1 -z noseparate-code
ls -l hello1*
-rwxr-xr-x 1 user user 1072 Apr 26 22:38 hello1
-rw-r--r-- 1 user user 1440 Apr 26 22:37 hello1.o
readelf -Wl hello1
Elf file type is EXEC (Executable file)
Entry point 0x400094
There is 1 program header, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0000c4 0x0000c4 R E 0x1000
Section to Segment mapping:
Segment Sections...
00 .text .rodata
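The sizes above can be cross-checked with a little arithmetic on the Offset and FileSiz columns of the three readelf listings. This is only a sketch: loaded_extent is a made-up helper, and it measures just the part of the file covered by LOAD segments.

```python
# (file offset, file size) of each LOAD segment, copied from the readelf output.
layouts = {
    "default": [(0x000000, 0x0001B0), (0x001000, 0x000045), (0x002000, 0x000007)],
    "noseparate-code": [(0x000000, 0x00018C)],
    "noseparate-code, notes removed": [(0x000000, 0x0000C4)],
}


def loaded_extent(segments):
    """Smallest file size that can hold every LOAD segment."""
    return max(offset + size for offset, size in segments)


def payload(segments):
    """Bytes of actual segment content, ignoring alignment padding."""
    return sum(size for _, size in segments)


for name, segments in layouts.items():
    extent = loaded_extent(segments)
    print(f"{name}: extent {extent} bytes, "
          f"payload {payload(segments)} bytes, "
          f"padding {extent - payload(segments)} bytes")
```

In the default layout the LOAD segments already span 0x2007 bytes although they hold only about half a kilobyte of content. The gap between these extents and the sizes shown by ls is presumably non-loaded data such as the symbol table, string tables, and section headers.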

gcc/ld script ignores the start address of the .text section and adds a lot of junk to my binary

I am trying to build the following really small C program into a raw binary file:
asm ("call sys_main\n" // Immediately run sys_main at start of code
"__asm_loop_halt:\n"
"jmp __asm_loop_halt\n"); // Then halt
int sys_main() {
short *addr = (short*) 0x08b000; // Address start of EGA-VRAM
*addr = 0x0f41; // Write white 'A' on black to screen
}
Because I am trying to create a raw binary, I have to use a linker script. I build with:
gcc -std=gnu99 -Os -nostdlib -m32 -march=i386 -ffreestanding -Wl,--nmagic,--script=386.ld -o test test.c
and 386.ld contains:
OUTPUT_FORMAT(binary)
SECTIONS
{
.text 0x0500 :
{
*(.text);
}
.data :
{
*(.data);
*(.bss);
*(.rodata);
}
_heap = ALIGN(4);
}
The script is supposed to tell the linker that the code starts running at 0x500 and that it should only create a binary file. However when I disassemble the binary, I get:
00000000 E802000000 call dword 0x7
00000005 EBFE jmp short 0x5
00000007 55 push ebp
00000008 89E5 mov ebp,esp
0000000A 66C70500B0080041 mov word [dword 0x8b000],0xf41
-0F
00000013 5D pop ebp
00000014 C3 ret
00000015 0000 add [eax],al
00000017 001400 add [eax+eax],dl
......
Apparently the linker still took 0x0 as the start address of the code and also added a bunch of seemingly random data after the last meaningful 'ret' instruction, which in total is 4 times as big as the code.
What is this data, why is it there, and what did I do wrong to have my code start at 0x0?
Edit: Thanks to Eugene's tip with the map, I discovered that the bytes after the .text section are .eh_frame, which is responsible for exception handling and can easily be removed by calling gcc with -fno-asynchronous-unwind-tables.

Linker assigns improper LMA to a section (using AT>)

I have a simple asm file with 3 sections:
.code 32
.section sec1
MOV R3, #10
.section sec2
MOV R1, #10
.section sec3
MOV R2, #10
.end
And a linker script:
MEMORY
{
ram : ORIGIN = 0x00200000, LENGTH = 1K
rom : ORIGIN = 0x00100000, LENGTH = 1K
}
SECTIONS
{
.text :
{
*(.glue_7t)
*(.glue_7)
*(.text)
}>rom
.sec1 :
{
*(sec1)
}>rom
.sec2 :
{
_ram_start = .;
*(sec2)
}>ram AT> rom
.sec3 :
{
*(sec3)
}>ram AT> rom
.data :
{
*(.data)
}>ram
.bss :
{
*(.bss)
}>ram
}
I assume that .sec2 should have VMA address set to ram's origin, but the LMA should be the address after .sec1, but objdump gives me:
test2.o: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .sec1 00000004 00100000 00100000 00000034 2**0
CONTENTS, READONLY
1 .sec2 00000004 00200000 00200000 00000038 2**0
CONTENTS, READONLY
2 .sec3 00000004 00200004 00200004 0000003c 2**0
CONTENTS, READONLY
Why is the .sec2 LMA set to ram?
It turns out that the sections in my .s file were not allocatable, which is why the LMA was wrong. If a section is not allocated, its LMA can be the same as its VMA. I found this out while playing with objcopy: the output binary file was always empty. The asm file should look like this:
.code 32
.section sec1, "a"
MOV R3, #10
.section sec2, "a"
MOV R1, #10
.section sec3, "a"
MOV R2, #10
.end
Normally the code would go to the .text section, which is allocatable by default. After adding "a" the linker produces proper LMA addresses.

Can you find the memory for a local character array in a function using objdump?

If I define a local character array within a function and then use objdump to grab the assembly code for that particular function, can I find the memory for that array within the assembly code?
This is a question I have for a homework assignment.
Sure, as long as your array has a non-zero initializer, you should be able to find it. Here's an example I made for ARM:
char function(int i)
{
char arr[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
return arr[i];
}
Build it:
$ clang -O2 -Wall -c -o example.o example.c
Disassemble the output:
$ objdump -d example.o
example.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <function>:
0: e59f1004 ldr r1, [pc, #4] ; c <function+0xc>
4: e7d10000 ldrb r0, [r1, r0]
8: e12fff1e bx lr
c: 00000000 .word 0x00000000
Hmm - notice that .word 0x00000000 at offset 0xc? That's going to be fixed up by the linker to point to the array. Let's go check out the relocation table:
$ objdump -r example.o
example.o: file format elf32-littlearm
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000008 R_ARM_V4BX *ABS*
0000000c R_ARM_ABS32 .rodata.cst8
Aha! The word at 0xc is going to get fixed up with an absolute pointer to the .rodata.cst8 section - that sounds like what we want. Let's take a peek:
$ objdump -s -j .rodata.cst8 example.o
example.o: file format elf32-littlearm
Contents of section .rodata.cst8:
0000 01020304 05060708 ........
And there you have the contents of the array!
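The "c <function+0xc>" annotation on the ldr line can be reproduced by hand: in ARM (A32) state the PC reads as the address of the current instruction plus 8, so a PC-relative load at address 0 with offset 4 reads the word at 0xc. A small sketch, with arm_literal_address as a made-up helper name:

```python
def arm_literal_address(insn_addr, offset):
    """Address read by `ldr rX, [pc, #offset]` in ARM (A32) state."""
    pc = insn_addr + 8           # A32: PC reads as the current instruction + 8
    return (pc & ~3) + offset    # the base is word-aligned


# The ldr at address 0x0 with offset #4 from the disassembly above.
print(hex(arm_literal_address(0x0, 4)))

# The .rodata.cst8 dump, decoded back into the array's initializer values.
array_bytes = list(bytes.fromhex("0102030405060708"))
print(array_bytes)
```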
A local array is allocated on the stack only at run time (when the function is entered), so it is not present in the executable as an array.
An exception would be a static array.

What are these extra bytes in my binary file?

I am in the process of writing a small operating system in C. I have written a bootloader and I'm now trying to get a simple C file (the "kernel") to compile with gcc:
int main(void) { return 0; }
I compile the file with the following command:
gcc kernel.c -o kernel.o -nostdlib -nostartfiles
I use the linker to create the final image using this command:
ld kernel.o -o kernel.bin -T linker.ld --oformat=binary
The contents of the linker.ld file are as follows:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
}
(The bootloader loads the image at address 0x7e00.)
This seems to work quite well - ld produces a 128-byte file containing the following instructions in the first 11 bytes:
00000000 55 push ebp
00000001 48 dec eax
00000002 89 E5 mov ebp, esp
00000004 B8 00 00 00 00 mov eax, 0x00000000
00000009 5D pop ebp
0000000A C3 ret
However, I can't figure out what the other 117 bytes are for. Disassembling them seems to produce a bunch of garbage that doesn't make any sense. The existence of the additional bytes has me wondering if I'm doing something wrong.
Should I be concerned?
These are additional sections, which were not stripped and not discarded. You want your linker.ld file to look like this:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
/DISCARD/ :
{
*(.comment)
*(.eh_frame_hdr)
*(.eh_frame)
}
}
I know what sections to discard from the output of objdump -t kernel.o.
Simple: you're using gcc, and it always puts its initialization code before passing control to your main.
I don't know what is in that startup code, but it is there. As you may see, there is also a 'GNU' comment in your binary; you can print specific sections by using objdump -s -j 'section name'.
