I'm building a simple payload to execute on an ARM64 system that will print a "Hello, world!" string over UART.
hello-world-payload.c:
#include <stdint.h>
typedef uint32_t u32;
int _start() {
const char* txt = "Hello, world!\n";
volatile u32* uart_wfifo = (volatile u32*)0xc81004c0;
volatile u32* uart_status = (volatile u32*)0xc81004cc;
u32 i = 0;
char c = txt[0];
while (c) {
// wait for UART availability
do {} while (! (*uart_status & (1 << 22)) );
// print 1 character
*uart_wfifo = (0x000000ff & c);
c = txt[++i];
}
while (1) {} // wait for watchdog
}
Makefile:
CROSS_COMPILE ?= aarch64-linux-gnu-
CC = $(CROSS_COMPILE)gcc
OBJCOPY = $(CROSS_COMPILE)objcopy
AFLAGS = -nostdlib
CFLAGS = -O0 -nostdlib
LDFLAGS = -Wl,--build-id=none
all: hello-world-payload.bin
%.elf: %.c
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
%.bin: %.elf
$(OBJCOPY) -O binary -S -g --strip-unneeded \
-j .text \
-j .rodata \
$< $@
.PHONY: clean
clean:
rm hello-world-payload.bin
As the cross compiler I use the gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf (AArch64 ELF bare-metal target) toolchain from ARM Developer.
With the code above I get a 159-byte binary that works just fine.
Once I move txt out of the function scope like this:
typedef uint32_t u32;
const char* txt = "Hello, world!\n";
int _start() {
the payload doesn't run anymore. After loading the payload binary into Ghidra, I noticed that the code tries to access txt at DAT_000100a0, while the string is in fact stored at 0x90.
Since txt is const and already initialized, it should belong to the .rodata section, which I confirmed by inspecting the assembly output of ${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.s hello-world-payload.c -S. Here's an excerpt from it:
.arch armv8-a
.file "hello-world-payload.c"
.text
.global txt
.section .rodata
.align 3
.LC0:
.string "Hello, world!\n"
.data
.align 3
.type txt, %object
.size txt, 8
I made sure I didn't forget to include .rodata in Makefile:
%.bin: %.elf
$(OBJCOPY) -O binary -S -g --strip-unneeded \
-j .text \
-j .rodata \
$< $@
The environment this binary runs in imposes some constraints, such as a maximum payload size (approx. 29000 bytes in my case), and as far as I understand the binary must begin with the .text section. My goal is to keep the payload as small as possible, but I also want to access various objects from different functions.
I inspected the ${CROSS_COMPILE}readelf -S output for hello-world-payload.o (${CROSS_COMPILE}gcc -O0 -nostdlib -Wl,--build-id=none -o hello-world-payload.o hello-world-payload.c):
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000400000 00010000 0000000000000090 0000000000000000 AX 0 0 4
[ 2] .rodata PROGBITS 0000000000400090 00010090 000000000000000f 0000000000000000 A 0 0 8
[ 3] .data PROGBITS 00000000004100a0 000100a0 0000000000000008 0000000000000000 WA 0 0 8
[ 4] .comment PROGBITS 0000000000000000 000100a8 000000000000005d 0000000000000001 MS 0 0 1
[ 5] .symtab SYMTAB 0000000000000000 00010108 00000000000001e0 0000000000000018 6 9 8
[ 6] .strtab STRTAB 0000000000000000 000102e8 000000000000006f 0000000000000000 0 0 1
[ 7] .shstrtab STRTAB 0000000000000000 00010357 0000000000000038 0000000000000000
I see there's a .data section so I tried to add it to the objcopy command in my Makefile:
%.bin: %.elf
$(OBJCOPY) -O binary -S -g --strip-unneeded \
-j .text \
-j .rodata \
-j .data \
$< $@
The binary size grows to a whopping 65704 bytes, but even with the .data section Ghidra shows the same DAT_000100a0 reference, with nothing like the "Hello, world!\n" string at that position.
The actual string is at 0x90 as it was before adding the .data section.
It is clear to me that the compiler messes up addresses of .rodata section where the string resides but I don't know how to fix it. Adding .data section didn't help.
Commonly with microcontrollers, the content of the .data section needs to be initialized by the start-up code from a section in non-volatile memory of the same size. Apparently your start-up code does not fulfill this requirement to run a C application.
Contrary to your belief, txt is a separate non-constant variable, because it is a modifiable pointer to the constant text. Your C code specifies that this global variable is initialized with the address of the unnamed string, but no code actually performs that initialization.
You can make the global pointer variable constant, if you change your code to:
const char * const txt = "Hello, world!\n";
Now txt is located in .rodata.
You can avoid the global pointer variable altogether, if you change your code to:
const char txt[] = "Hello, world!\n";
Now txt names the array of characters, which is located in .rodata.
In the first version of your program, txt was an automatic (local) variable on the stack. The code initialized it with the address of the unnamed string after entering the function _start().
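The three declarations discussed in this answer can be put side by side. This is only a minimal sketch for an ordinary hosted build, not the bare-metal payload itself; the names txt_ptr, txt_const_ptr, txt_array and the two helper functions are made up for illustration, and the exact section a toolchain picks can vary with options such as PIE.

```c
#include <stddef.h>

/* Modifiable pointer to constant characters: the pointer object itself
 * is writable initialized data, so it is typically placed in .data and
 * must be filled with the address of the string literal. */
const char *txt_ptr = "Hello, world!\n";

/* Constant pointer to constant characters: the pointer object can live
 * in read-only storage alongside the literal. */
const char *const txt_const_ptr = "Hello, world!\n";

/* Array of constant characters: there is no pointer object at all;
 * the name refers directly to the 15 bytes of the string. */
const char txt_array[] = "Hello, world!\n";

/* Size of the pointer object versus size of the array object. */
size_t ptr_object_size(void)   { return sizeof txt_ptr; }
size_t array_object_size(void) { return sizeof txt_array; }
```

On AArch64 the pointer object is 8 bytes, which matches the 8-byte .data section in the readelf output, while the array is 15 bytes, which matches the 0xf-byte .rodata section.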
I have a big text file that I want to include in a C program. I could just make it a string literal but it's pretty big and that would be cumbersome. So I'm currently linking like this:
$ ld -r -b binary -o /tmp/stuff.o /tmp/stuff.txt
$ clang -o myprogram main.o /tmp/stuff.o
Objdump output:
$ objdump -t /tmp/stuff.o
/tmp/stuff.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000006 g *ABS* 0000000000000000 _binary__tmp_stuff_txt_size
0000000000000006 g .data 0000000000000000 _binary__tmp_stuff_txt_end
0000000000000000 g .data 0000000000000000 _binary__tmp_stuff_txt_start
In the code, I do this (gotten from this question):
extern char _binary__tmp_stuff_txt_start[];
extern char _binary__tmp_stuff_txt_size[];
int f(void) {
size_t size = (size_t)_binary__tmp_stuff_txt_size;
do_stuff(size, _binary__tmp_stuff_txt_start);
}
Everything works great, but when I compile with GCC instead of Clang, it segfaults. Looking at it in GDB, the size variable, initialized with size_t size = (size_t)_binary__tmp_stuff_txt_size;, contains garbage. It seems that GCC passes the -pie flag to ld when linking, but Clang doesn't. I could fix this by just passing -no-pie to GCC, but it seems kind of sad that doing something so simple would prevent using PIE. Is there something I should change to make this work?
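One commonly suggested workaround, which this thread itself does not confirm, is to stop reading the absolute _size symbol and compute the length as the difference between the _end and _start symbols instead. Both are ordinary addresses in .data, so they are shifted by the same amount when a position-independent executable is loaded and their difference stays correct. The sketch below uses a stand-in array (demo_blob, a hypothetical name) so that it builds without the linked-in object; in the real program the arguments would be _binary__tmp_stuff_txt_start and _binary__tmp_stuff_txt_end, the latter needing its own extern declaration.

```c
#include <stddef.h>

/* Length of an embedded blob computed from its two address symbols.
 * Both addresses move by the same load offset in a PIE, so the
 * difference does not depend on where the program is loaded. */
static size_t blob_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}

/* Stand-in for the linked-in data so the sketch builds on its own.
 * Six bytes, like the 6-byte file in the objdump output. */
static const char demo_blob[] = { 'h', 'e', 'l', 'l', 'o', '\n' };

size_t demo_blob_size(void)
{
    return blob_size(demo_blob, demo_blob + sizeof demo_blob);
}
```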
I am trying to build a basic project for ARM with symbols and associated line numbers, so that I can easily debug the project from GDB Multiarch while it is running in QEMU.
I have two files, a C source file and some assembly. In this example, they are very simple:
cmain.c:
int add_numbers(int a, int b) {
return a + b;
}
int cmain() {
int a = 3;
int b = 4;
int c = add_numbers(a, b);
}
main.s:
.section .init
.global _start
_start:
.extern cmain
mov sp, #0x8000
bl cmain
Additionally, here's the linker file, kernel.ld:
SECTIONS {
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
.data : {
*(.data)
*(.bss)
*(.rodata*)
*(.COMMON)
}
/DISCARD/ : {
*(*)
}
}
I then build these projects with debugging symbols using the following shell script. In brief, it assembles and compiles the files into object files, then links them into an ELF and objcopies into an IMG.
rm -r build
mkdir -p build
arm-none-eabi-as -I . main.s -o build/main.o
arm-none-eabi-gcc -ffreestanding -fno-builtin -march=armv7-a -MD -MP -g -c cmain.c -o build/cmain.o
arm-none-eabi-ld build/main.o build/cmain.o -L/usr/lib/gcc/arm-none-eabi/6.3.1/ -lgcc --no-undefined -o build/output.elf -T kernel.ld
arm-none-eabi-objcopy build/output.elf -O binary build/kernel.img --keep-file-symbols
For stepping in GDB, I need the ELF to have line numbers for the C source. (Note that the actual project has many more C files.) The line numbers are present in the C object file, but not in the ELF.
$ arm-none-eabi-nm build/cmain.o --line-numbers
00000000 T add_numbers /home/aaron/Desktop/arm-mcve/cmain.c:1
00000030 T cmain /home/aaron/Desktop/arm-mcve/cmain.c:5
$ arm-none-eabi-nm build/output.elf --line-numbers
00008008 T add_numbers
00008038 T cmain
00008000 T _start
Why is there no line number information in the ELF, and how can I add it so that GDB can step through it?
Your linker script discards the sections with debugging information. Look at the default linker script arm-none-eabi-ld --verbose for some ideas. You will at least need some of the DWARF 2 sections:
.debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) }
.debug_abbrev 0 : { *(.debug_abbrev) }
.debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) }
.debug_frame 0 : { *(.debug_frame) }
.debug_str 0 : { *(.debug_str) }
.debug_loc 0 : { *(.debug_loc) }
.debug_macinfo 0 : { *(.debug_macinfo) }
(Adding all of them should work.)
I want to remove unused functions from the code at build time. To test this, I wrote some code (main.c):
#include <stdio.h>
const char *get1();
int main()
{
puts( get1() );
}
and getall.c:
const char *get1()
{
return "s97symmqdn-1";
}
const char *get2()
{
return "s97symmqdn-2";
}
const char *get3()
{
return "s97symmqdn-3";
}
Makefile
test1 :
rm -f a.out *.o *.a
gcc -ffunction-sections -fdata-sections -c main.c getall.c
ar cr libgetall.a getall.o
gcc -Wl,--gc-sections main.o -L. -lgetall
After running make test1 && objdump --sym a.out | grep get, I find only the following two lines of output:
0000000000000000 l df *ABS* 0000000000000000 getall.c
0000000000400535 g F .text 000000000000000b get1
I guess get2 and get3 were removed. But when I open a.out in vim, I find that s97symmqdn-1, s97symmqdn-2 and s97symmqdn-3 all still exist.
Are the functions get2 and get3 really removed? How can I remove the strings s97symmqdn-2 and s97symmqdn-3? Thank you for your reply.
My system is CentOS 7 and the gcc version is 4.8.5.
The compilation options -ffunction-sections -fdata-sections and linkage option --gc-sections
are working correctly in your example. Your static library is superfluous, so it can
be simplified to:
$ gcc -ffunction-sections -fdata-sections -c main.c getall.c
$ gcc -Wl,--gc-sections main.o getall.o -Wl,-Map=mapfile
in which I'm also asking for the linker's mapfile.
The unused functions get2 and get3 are absent from the executable:
$ nm a.out | grep get
0000000000000657 T get1
and the mapfile shows that the unused function-sections .text.get2 and .text.get3 in which get2 and get3 are
respectively defined were discarded in the linkage:
mapfile (1)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
...
Nevertheless, as you found, all three of the string literals "s97symmqdn-(1|2|3)"
are in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
s97symmqdn-2
s97symmqdn-3
That is because -fdata-sections applies just to the same data objects that
__attribute__ ((__section__("name"))) applies to [1], i.e. to the definitions
of variables that have static storage duration. It is not applied to anonymous string literals like your
"s97symmqdn-(1|2|3)". They are all just placed in the .rodata section as usual,
and there we find them:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06ed 73393773 796d6d71 646e2d31 00733937 s97symmqdn-1.s97
06fd 73796d6d 71646e2d 32007339 3773796d symmqdn-2.s97sym
070d 6d71646e 2d3300 mqdn-3.
--gc-sections does not allow the linker to discard .rodata from the program
because it is not an unused section: it contains "s97symmqdn-1", referenced
in the program by get1, as well as the unreferenced strings "s97symmqdn-2"
and "s97symmqdn-3".
Fix
To get these three string literals separated into distinct data sections, you
need to assign them to distinct named objects, e.g.
getall.c (2)
const char *get1()
{
static const char s[] = "s97symmqdn-1";
return s;
}
const char *get2()
{
static const char s[] = "s97symmqdn-2";
return s;
}
const char *get3()
{
static const char s[] = "s97symmqdn-3";
return s;
}
If we recompile and relink with that change, we see:
mapfile (2)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
.rodata.s.1797
0x0000000000000000 0xd getall.o
.rodata.s.1800
0x0000000000000000 0xd getall.o
...
Now there are two new discarded data-sections, which contain
the two string literals we don't need, as we can see in the object file:
$ objdump -s -j .rodata.s.1797 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1797:
0000 73393773 796d6d71 646e2d32 00 s97symmqdn-2.
and:
$ objdump -s -j .rodata.s.1800 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1800:
0000 73393773 796d6d71 646e2d33 00 s97symmqdn-3.
Only the referenced string "s97symmqdn-1" now appears anywhere in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
and it is the only string in the program's .rodata:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06f0 73393773 796d6d71 646e2d31 00 s97symmqdn-1.
[1] Likewise, -ffunction-sections has the same effect as qualifying the
definition of every function foo with __attribute__ ((__section__(".text.foo"))).
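As a rough illustration of the footnote, the following sketch places one function in its own section by hand, which is approximately what -ffunction-sections does automatically for every function. It assumes GCC or Clang targeting ELF; only get2 from the example is shown, and the section name simply follows the .text.<function> pattern seen in the mapfile.

```c
/* Hand-written equivalent of what -ffunction-sections would do for
 * get2: the function gets a section of its own, which the linker can
 * then discard under --gc-sections if nothing references it. */
__attribute__((__section__(".text.get2")))
const char *get2(void)
{
    /* Named static array rather than a bare literal, as in the fix above,
     * so that -fdata-sections can also give the string its own section. */
    static const char s[] = "s97symmqdn-2";
    return s;
}
```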
I have been following a tutorial called "The little book about OS development". I can write a single character to the framebuffer, but not a second one after it. Here are the files:
kmain.c
#include "io.h"
#define FB_BLACK 0
#define FB_BLUE 1
#define FB_GREEN 2
#define FB_CYAN 3
#define FB_RED 4
#define FB_MAGENTA 5
#define FB_BROWN 6
#define FB_LIGHTGREY 7
#define FB_DARKGREY 8
#define FB_LIGHTBLUE 9
#define FB_LIGHTGREEN 10
#define FB_LIGHTCYAN 11
#define FB_LIGHTRED 12
#define FB_LIGHTMAGENTA 13
#define FB_LIGHTBROWN 14
#define FB_WHITE 15
#define FB_COMMAND_PORT 0x3D4
#define FB_DATA_PORT 0x3D5
#define FB_HIGH_BYTE_COMMAND 14
#define FB_LOW_BYTE_COMMAND 15
char *fb = (char*)0x000B8000;
void fb_move_cursor(unsigned short pos) {
outb(FB_COMMAND_PORT, FB_HIGH_BYTE_COMMAND);
outb(FB_DATA_PORT, ((pos >> 8) & 0x00FF));
outb(FB_COMMAND_PORT, FB_LOW_BYTE_COMMAND);
outb(FB_DATA_PORT, pos & 0x00FF);
}
void fb_write_cell(unsigned int i, char c, unsigned char fg, unsigned char bg)
{
fb[i] = c;
fb[i + 1] = ((fg & 0x0F) << 4) | (bg & 0x0F);
}
void kmain(void) {
fb_write_cell(0, 'H', FB_WHITE, FB_BLACK);
fb_move_cursor(2);
fb_write_cell(1, 'i', FB_WHITE, FB_BLACK);
}
io.h
#ifndef INCLUDE_IOH
#define INCLUDE_IOH
void outb(unsigned short port, unsigned char data);
#endif
io.s
global outb
global hang
;Sends a byte to an io port
; [esp + 8] data byte
; [esp + 4] the io port
outb:
mov al, [esp + 8]
mov dx, [esp + 4]
out dx, al
ret
hang:
jmp hang; so that program can hang
loader.s
global loader
MAGIC_NUMBER equ 0x1BADB002 ;Multiboot constant
FLAGS equ 0x0 ;Multiboot flags
CHKSUM equ -MAGIC_NUMBER ;Multiboot checksum. Valid if CHKSUM + FLAGS + MAGIC_NUMBER == 0
KERNEL_STACK_SIZE equ 4096
section .bss
align 4
kernel_stack:
resb KERNEL_STACK_SIZE
section .text
align 4
dd MAGIC_NUMBER
dd FLAGS
dd CHKSUM
loader:
mov esp,kernel_stack+KERNEL_STACK_SIZE
extern kmain
call kmain
hang:
jmp hang
link.ld
ENTRY(loader) /* the name of the entry label */
SECTIONS {
. = 0x00100000; /* the code should be loaded at 1 MB */
.text ALIGN (0x1000) : /* align at 4 KB */
{
*(.text) /* all text sections from all files */
}
.rodata ALIGN (0x1000) : /* align at 4 KB */
{
*(.rodata*) /* all read-only data sections from all files */
}
.data ALIGN (0x1000) : /* align at 4 KB */
{
*(.data) /* all data sections from all files */
}
.bss ALIGN (0x1000) : /* align at 4 KB */
{
*(COMMON) /* all COMMON sections from all files */
*(.bss) /* all bss sections from all files */
}
}
Makefile
OBJECTS = loader.o kmain.o io.o
CC = gcc
CFLAGS = -m32 -nostdlib -nostdinc -fno-builtin -fno-stack-protector \
-nostartfiles -nodefaultlibs -Wall -Wextra -Werror -c
LDFLAGS = -T link.ld -melf_i386
AS = nasm
ASFLAGS = -f elf
all: kernel.elf
kernel.elf: $(OBJECTS)
ld $(LDFLAGS) $(OBJECTS) -o kernel.elf
os.iso: kernel.elf
cp kernel.elf iso/boot/kernel.elf
genisoimage -R \
-b boot/grub/stage2_eltorito \
-no-emul-boot \
-boot-load-size 4 \
-A os \
-input-charset utf8 \
-quiet \
-boot-info-table \
-o os.iso \
iso
run: os.iso
bochs -f bochsrc.txt -q
%.o: %.c
$(CC) $(CFLAGS) $< -o $@
%.o: %.s
$(AS) $(ASFLAGS) $< -o $@
clean:
rm -rf *.o kernel.elf os.iso
You can download the folder the project is in here
I expect to see "Hi" in the upper left corner, but instead I see 2 weird characters. Printing the first character and moving the cursor works fine, but when I attempt to print the second character, it messes up.
Edit: I realized that the first character is an incorrectly colored H.
@Margaret Bloom's comment solved my issue. The issue was that each character cell is 16 bits (one byte for the character, one for the color), so I needed to increment i by 2, but I was incrementing it by 1, causing the cells to overlap like this:
Intended: char1Data char1Color char2Data char2Color
Problem: char1Data char2Data char2Color
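A minimal sketch of the fix, assuming the caller passes a cell index rather than a byte offset. The name fb_write_cell_at and the extra fb parameter are hypothetical; they are there so the function can be tried on an ordinary buffer, whereas the kernel would pass (char *)0x000B8000. The attribute byte is packed exactly as in the question's code; only the indexing changes.

```c
/* Each text-mode cell occupies 2 bytes: the character, then its attribute. */
#define FB_CELL_BYTES 2u

/* Write character c with colors fg/bg into the given cell of fb. */
void fb_write_cell_at(char *fb, unsigned int cell, char c,
                      unsigned char fg, unsigned char bg)
{
    unsigned int i = cell * FB_CELL_BYTES;   /* cell index -> byte offset */

    fb[i] = c;
    /* Attribute packing kept unchanged from the question. */
    fb[i + 1] = (char)(((fg & 0x0F) << 4) | (bg & 0x0F));
}
```

With this version, kmain would write 'H' to cell 0 and 'i' to cell 1, and the second write no longer lands on the first cell's color byte.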
Wikipedia mentions that "the bss section typically includes all uninitialized variables declared at file scope." Given the following file:
int uninit;
int main() {
uninit = 1;
return 0;
}
When I compile this to an executable I see the bss segment filled properly:
$ gcc prog1.c -o prog1
$ size prog1
text data bss dec hex filename
1115 552 8 1675 68b prog1
However if I compile it as an object file I don't see the bss segment (I'd expect it to be 4):
$ gcc -c prog1.c
$ size prog1.o
text data bss dec hex filename
72 0 0 72 48 prog1.o
Is there something obvious I am missing?
I am using gcc version 4.8.1.
If we use readelf -s to look at the symbol table, we'll see:
$ readelf -s prog1.o
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS bss.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM uninit <<<<
9: 0000000000000000 16 FUNC GLOBAL DEFAULT 1 main
We see that your uninit symbol ("variable") is, at this stage, a "common" symbol. It has not yet been "assigned" to the BSS.
See this question for more information on "common" symbols: What does "COM" means in the Ndx column of the .symtab section?
Once your final executable is linked together, it will be put in the BSS as you expected.
You can bypass this behavior by passing the -fno-common flag to GCC:
$ gcc -fno-common -c bss.c
$ size bss.o
text data bss dec hex filename
72 0 4 76 4c bss.o
Alternatively, you could mark uninit as static. This way, the compiler knows that no other .o file can refer to it, so it will not be a "common" symbol. Instead, it is placed into the BSS immediately, as you expected:
$ cat bss.c
static int uninit;
int main() {
uninit = 1;
return 0;
}
$ gcc -c bss.c
$ size bss.o
text data bss dec hex filename
72 0 4 76 4c bss.o
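The reason a declaration like int uninit; can end up as a "common" symbol is that, in C, a file-scope declaration without an initializer is only a tentative definition. The sketch below shows this within a single file; the helper names read_uninit, bump_private and private_counter are made up for illustration. As far as I know, GCC 10 and later default to -fno-common, so a current compiler may put uninit into .bss of the object file without any extra flag.

```c
/* Two tentative definitions of the same object in one file are legal C;
 * they merge into a single zero-initialized int. With -fcommon the
 * compiler leaves the final placement to the linker (COM symbol). */
int uninit;
int uninit;

/* static gives internal linkage: no other object file can refer to it,
 * so the compiler can place it in .bss right away. */
static int private_counter;

int read_uninit(void)  { return uninit; }
int bump_private(void) { return ++private_counter; }
```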