Master Boot Record using GNU Assembly: extra bytes in flat binary output - gnu-assembler

I am trying to build the following simple MBR:
.code16
.globl _start
.text
_start:
end:
jmp end
# Don't bother with 0xAA55 yet
I run the following commands:
> as --32 -o boot.o boot.s
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00
However, the resulting binary file is more than 129 MB, which seems strange to me. What is going on in this build process?
Running objdump over boot.o gives me:
> objdump -s boot.o
boot.o: file format elf32-i386
Contents of section .text:
0000 ebfe ..
Contents of section .note.gnu.property:
0000 04000000 18000000 05000000 474e5500 ............GNU.
0010 020001c0 04000000 00000000 010001c0 ................
0020 04000000 01000000
Manually removing the section .note.gnu.property before calling ld seems to solve the problem. However, I don't know why this section appears by default... Running the following build commands seems to solve the problem too:
> as --32 -o boot.o boot.s -mx86-used-note=no
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00

ld links all of your sections into the flat binary output unless you tell it not to (with a linker script, for example).
The extra bytes come from the .note.gnu.property section that as adds. It can record things like the x86 ISA level in use (for example x86-64-v3, the Haswell feature level with AVX2+FMA+BMI2). You don't want that in your flat binary, especially not at the default high address, far from where -Ttext tells ld to put your .text section. Because the output is a flat binary, the gap between the two sections is padded with zeros, which results in a huge file.
Using as -mx86-used-note=no omits that section from the .o in the first place, leaving only the sections you define in your asm source. From the GAS manual's i386 options:
-mx86-used-note=no
-mx86-used-note=yes
These options control whether the assembler should generate GNU_PROPERTY_X86_ISA_1_USED and GNU_PROPERTY_X86_FEATURE_2_USED GNU
property notes. The default can be controlled by the
--enable-x86-used-note configure option.
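The size blow-up described above can be reproduced with simple arithmetic. The following is only a sketch: the helper name flat_binary_size is made up for illustration, and the address used for the note section is an assumption (somewhere near the usual i386 default base of 0x08048000), not a value taken from the actual link.

```python
def flat_binary_size(sections):
    """Return the size of a flat binary covering every (address, size) pair.

    A flat binary has no headers, so the file has to span from the lowest
    load address to the end of the highest section; gaps are zero-filled.
    """
    start = min(addr for addr, _ in sections)
    end = max(addr + size for addr, size in sections)
    return end - start


# .text is the 2 bytes "eb fe", placed at 0x7c00 by -Ttext.
text_only = [(0x7C00, 2)]

# ASSUMPTION: the 0x28-byte .note.gnu.property section lands near the
# default i386 base address; the real address depends on the linker.
with_note = text_only + [(0x08048000, 0x28)]

print(flat_binary_size(text_only))  # 2
print(flat_binary_size(with_note))  # 134480936
```

With only .text present the output is the expected two bytes. As soon as a second section sits far away, the zero padding dominates the file size.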

Passing the -mx86-used-note=no flag to as removes the note section.
See the i386 options in the GAS manual: https://sourceware.org/binutils/docs/as/i386_002dOptions.html

Related

Is it possible to create a basic bare-metal Assembly bootup/startup program using only GNU LD command-line options

Is it possible to create a basic bare-metal Assembly bootup/startup program using only GNU LD command-line options in lieu of a customary -T scriptfile for a Cortex-M4 target?
I have reviewed the GNU LD documentation and searched various locations including this site; however, I have not found any information suggesting that the exclusive use of command-line options for the GNU linker is possible or not possible.
My attempt to manage the object file layout without a customary vendor-provided *.ld script file is purely academic. This is not homework. I'm not requesting any help with writing the startup assembly code. I'm merely looking for a definitive answer or a pointer to further resources.
$ arm-none-eabi-ld bootup.o -o bootup @bootup.ld.cli.file
Sample bootup.ld.cli.file content
--entry 0x0
--Ttext=0x0
--section-start .isr_vector=0x0
--section-start _start=0x4
--section-start .MyCode=0x8c
--Tdata=0x20000000
--Tbss=0x20000000
-M=bootup.map
--print-gc-sections
You have your answer right there: -Ttext=number, -Tdata=number and so on are not GNU linker script items, they are GNU ld command-line options. Note the at sign on your command line; it tells ld to read further options from the named file.
A GNU linker script looks more like this (although most are significantly more complicated, even if they don't need to be).
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Note that the GNU linker is a bit funny when you use the -Ttext=address approach. Sometimes it inserts gaps: you might have a few Kbytes of program, and instead of placing it linearly at the address as it should, the linker puts some of it, pads some dead space, then puts some more. I never figured out why, but for extremely limited targets, with all other factors held constant, the linker script (as opposed to the command line) does not put the gap in the output.
EDIT:
so.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
b hang
.thumb_func
hang: b .
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl dummy
dummy:
bx lr
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
blinker02.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<100;ra++) dummy(ra);
return(0);
}
Makefile
ARMGNU = arm-none-eabi

AOPS = --warn -mcpu=cortex-m0
COPS = -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0

all : blinker02.bin sols.bin socl.bin

clean:
	rm -f *.bin
	rm -f *.o
	rm -f *.elf
	rm -f *.list

so.o : so.s
	$(ARMGNU)-as $(AOPS) so.s -o so.o

flash.o : flash.s
	$(ARMGNU)-as $(AOPS) flash.s -o flash.o

blinker02.o : blinker02.c
	$(ARMGNU)-gcc $(COPS) -mthumb -c blinker02.c -o blinker02.o

blinker02.bin : flash.ld flash.o blinker02.o
	$(ARMGNU)-ld -o blinker02.elf -T flash.ld flash.o blinker02.o
	$(ARMGNU)-objdump -D blinker02.elf > blinker02.list
	$(ARMGNU)-objcopy blinker02.elf blinker02.bin -O binary

sols.bin : so.o
	$(ARMGNU)-ld -o sols.elf -T flash.ld so.o
	$(ARMGNU)-objdump -D sols.elf > sols.list
	$(ARMGNU)-objcopy sols.elf sols.bin -O binary

socl.bin : so.o
	$(ARMGNU)-ld -o socl.elf -Ttext=0x08000000 -Tbss=0x20000000 so.o
	$(ARMGNU)-objdump -D socl.elf > socl.list
	$(ARMGNU)-objcopy socl.elf socl.bin -O binary
The only difference between the list files of the linker-script build (sols) and the command-line build (socl) is the file name:
diff sols.list socl.list
2c2
< sols.elf: file format elf32-littlearm
---
> socl.elf: file format elf32-littlearm
I'm not going to bother demonstrating the difference you may see down the road.
For assembly only, you don't need to worry about -nostartfiles and the other gcc command-line options. With C objects you do. By not allowing the linker to use the bootstrap code that came with the as-built/configured toolchain (or, let's say, the C library), you have to provide one yourself.
If you don't complicate the linker script to the point that specific object files are called out, then the ordering of objects on the command line matters. If you swap flash.o and blinker02.o on the ld command line in the Makefile, the binary won't work.
You can set entry points all you want, but those are strictly for the loader. If this is bare metal, which it appears to be, then the entry point is useless; the hardware boots how it boots. In this case, with a Cortex-M, address zero holds the value to load into the stack pointer and address four holds the address of the reset handler (the reset vector). That address has its lsbit set since this is a Thumb-only machine. Let the tools do that for you by using the GNU-assembler-specific .thumb_func directive, which indicates that the next label is a branch destination address.
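A quick way to sanity-check those first two words in a generated .bin is to decode them on the host. This is a minimal sketch, not part of the original build: read_vector_table is a hypothetical helper, and the sample image is built by hand to resemble what so.s would produce when linked at 0x08000000 (six vector words, so reset lands at offset 0x18).

```python
import struct


def read_vector_table(image):
    """Decode the first two little-endian words of a Cortex-M flat binary."""
    initial_sp, reset = struct.unpack_from("<II", image, 0)
    return {
        "initial_sp": initial_sp,            # word at address 0
        "reset_vector": reset,               # word at address 4
        "thumb_bit_set": bool(reset & 1),    # lsbit must be 1 on Cortex-M
        "reset_code_address": reset & ~1,    # where the code actually lives
    }


# ASSUMPTION: hand-built stand-in for the start of socl.bin / sols.bin.
image = struct.pack("<II", 0x20001000, 0x08000019)
info = read_vector_table(image)

print(hex(info["initial_sp"]))          # 0x20001000
print(info["thumb_bit_set"])            # True
print(hex(info["reset_code_address"]))  # 0x8000018
```

If thumb_bit_set comes back False for a real image, the label was most likely not marked with .thumb_func.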
I sprinkled cortex-m0 about for two reasons: one, that is what I took this code from, and two, the original Thumb from ARMv4T and ARMv5T (or, as called out in the newer ARM docs, "all thumb variants") is the most portable ARM instruction set across the ARM cores. With your Cortex-M4 you can get rid of that, or perhaps make it -m3 or -m4 to pull in the ARMv7-M Thumb-2 extensions.
So the short answer is that
arm-none-eabi-ld -o so.elf -Ttext=0x08000000 -Tbss=0x20000000 so.o
is more than adequate for making working binaries, ASSUMING you don't need a .data section.
.data requires a lot more stuff: a linker script, a more complicated bootstrap, etc. That, or you do a copy-and-jump thing. Compile the REAL program to run from SRAM only (a different entry point, full-sized ARM style, but at the RAM base address). Then write an ad hoc tool that takes that binary and turns it into, say, .word 0xabcdef entries in a second program that copies the whole REAL program from flash to RAM and then branches to it. That copy-and-jump program is now flash only, with no .data or .bss really needed, so it can be linked from the command line, and so can the REAL RAM-only program. And I probably lost you already on that one.
Likewise, when using the command line you cannot, or should not, assume that .bss is zeroed; your bootstrap has to do that too. Now if you have .bss and no .data, then sure, you could blindly zero all of the RAM on boot before you branch to your C program's entry point. (I use notmain() both because at least one old compiler added unnecessary garbage to the binary if it saw a main() function, and to emphasize the point that normally there is nothing magic about the function named main().)
Linker scripts are toolchain specific, so there is no reason to expect GNU linker scripts to port to Keil or to ARM's own tools (yes, I know ARM owns Keil now; I was referring to RVCT or whatever it is called now). So that is the first .data/.bss problem.
Ideally you want your tools to do the work. They know how big .data and .bss are, so just let them tell you. How you let them tell you is by crafting the linker script right (at least with ld), and that is tricky. The script creates variables, if you will, that can define things like the start address of .bss, the end address of .bss, and maybe even some math to subtract them and get the length; likewise for .data. Then, in the bootstrap assembly language, you can zero out the .bss memory using the start address and length, or the start address and end address. For .data you need two addresses, where you put it in flash (more linker script foo) and where it wants to go in RAM, plus the length; then the bootstrap copies it.
So basically, if you write this code
unsigned int x=5;
unsigned int y;
and you link using only command-line options, there is no reason whatsoever to expect x to be 5 or y to be 0 when the first C function that uses those variables is entered. If you assume that x will be 5, then your program will fail.
If you do this instead
unsigned int x;
unsigned int y;
void myfun ( void )
{
x=5;
y=0;
}
now those assignments are instructions in .text and not values in .data, so the code will always work, command line or not, simple linker script or complicated, etc.
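To make the .data/.bss point concrete, here is a host-side simulation of what a bootstrap has to do before the first C function runs. It is only a sketch under stated assumptions: bootstrap and read_u32 are hypothetical helpers, the offsets are invented, and on a real target this work is done in startup assembly using symbols from the linker script, not in Python.

```python
import struct


def bootstrap(flash, ram, data_load, data_start, data_len, bss_start, bss_len):
    """Copy .data initial values from flash to RAM, then zero .bss."""
    ram[data_start:data_start + data_len] = flash[data_load:data_load + data_len]
    ram[bss_start:bss_start + bss_len] = bytes(bss_len)


def read_u32(mem, offset):
    """Read one little-endian 32-bit word from a bytearray."""
    return struct.unpack_from("<I", mem, offset)[0]


# ASSUMPTION: x (initial value 5) is stored at flash offset 0x100 and
# lives at RAM offset 0; y (.bss) lives at RAM offset 4.
flash = bytearray(0x200)
struct.pack_into("<I", flash, 0x100, 5)

# RAM after power-up holds arbitrary contents; 0xAA stands in for that.
ram = bytearray(b"\xaa" * 16)

x_before = read_u32(ram, 0)   # garbage, not 5
y_before = read_u32(ram, 4)   # garbage, not 0

bootstrap(flash, ram, data_load=0x100, data_start=0, data_len=4,
          bss_start=4, bss_len=4)

x_after = read_u32(ram, 0)    # 5
y_after = read_u32(ram, 4)    # 0
```

Without the bootstrap call, x and y hold whatever happened to be in RAM, which is the failure mode described above.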

Output relocatable section data from linker script

Using commands like BYTE or LONG, it is possible to include explicit bytes of data in an output section from a linker script. The linked page also describes that those commands can be used to output the value of symbols.
I would have expected that if you perform partial linking (i.e., using the -r option of ld), relocation records would be emitted for the symbols that are output in this way. However, it seems that the linker just outputs the currently known value [1] of the symbol.
Here is a MWE to clarify what I mean.
test.c:
int foo = 1, bar = 2;
test.ld:
SECTIONS {
.data : {
*(.data)
LONG(foo)
LONG(bar)
}
}
Then run the following:
$ gcc -c test.c
$ ld -T test.ld -r -o test.elf test.o
$ readelf -r test.elf
There are no relocations in this file.
$ readelf -x .data test.elf
Hex dump of section '.data':
0x00000000 01000000 02000000 00000000 04000000 ................
As you can see, no relocations are created and the values that are outputted are the currently known values of foo and bar.
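For reference, the dump can be decoded with a few lines of Python. This is only a sketch and assumes the section consists of four little-endian 32-bit words, in the order laid out by test.ld (the two initialized ints, then LONG(foo) and LONG(bar)).

```python
import struct

# The 16 bytes shown by "readelf -x .data test.elf".
dump = bytes.fromhex("01000000 02000000 00000000 04000000")

# Two initialized ints from test.o, then the two LONG() words.
foo_init, bar_init, long_foo, long_bar = struct.unpack("<4I", dump)

print(foo_init, bar_init)  # 1 2  (initial values of foo and bar)
print(long_foo, long_bar)  # 0 4  (section offsets of foo and bar)
```

The last two words match the offsets of foo and bar within .data, which is what readelf -s reports as the symbol values in the input object.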
Could this be a bug? If not, is there any way to force the linker to output relocation records for symbols added to an output section?
[1] I'm not sure if this is the correct term. What I mean is the value that you see when you run readelf -s on the input object file.

LD is producing 2000 lines of assembly for a 3 line C file. How can I get it to only produce the assembly needed?

I'm currently working through a document titled "Building a Simple OS -- from scratch". It teaches x86 instructions only in 32-bit. At one point the author lists this C function:
int my_function() {
return 0xbaba;
}
and says that it compiles into this assembly:
00000000 55 push ebp
00000001 89E5 mov ebp, esp
00000003 B8BABA0000 mov eax, 0xbaba
00000008 5D pop ebp
00000009 C3 ret
I have the code for my_function() in a file called basic.c and I'm using the following bash instructions (on Mac OS X Yosemite w/ Xcode installed):
gcc -ffreestanding -m32 -c basic.c -o basic.o
ld -arch i386 -no_pie -e _my_function -static -o basic.bin -image_base 0x0 basic.o
These are successful, but when I run
ndisasm -b 32 basic.bin > basic.dis
I get a file with over 2000 lines of assembly, most of which are
00000FDA 0000 add [eax],al
How can I get it to compile to just the five simple lines listed by the author?
You should be looking at the .o file, not the linked file (or using a different tool to disassemble just the desired function in the linked file). Per the manual:
NDISASM does not have any understanding of object file formats, like objdump, and it will not understand DOS .EXE files like debug will. It just disassembles.
ld in the OS X / Xcode toolchain produces a Mach-O binary. This includes various metadata in addition to the machine code for the function. ndisasm isn't aware of the file structure and is attempting to disassemble the metadata as code (which it isn't).
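One way to tell whether a file is a raw code blob or a Mach-O container before feeding it to ndisasm is to look at its first four bytes. The sketch below is an illustration only: looks_like_macho is a hypothetical helper, it checks just the two little-endian thin Mach-O magic numbers (32-bit and 64-bit), and the sample inputs are hand-built rather than real linker output.

```python
import struct

# MH_MAGIC (32-bit) and MH_MAGIC_64 (64-bit) from the Mach-O format.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF}


def looks_like_macho(data):
    """Return True if data starts with a little-endian Mach-O magic number."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack_from("<I", data, 0)
    return magic in MACHO_MAGICS


# The five instructions from the question, as raw machine code.
raw_code = bytes.fromhex("5589e5b8baba00005dc3")

# ASSUMPTION: a fake "linked" file, just a magic number plus zero padding
# in front of the code, standing in for real Mach-O headers.
fake_linked = struct.pack("<I", 0xFEEDFACE) + bytes(28) + raw_code

print(looks_like_macho(raw_code))     # False
print(looks_like_macho(fake_linked))  # True
```

The zero padding is also why the listing is dominated by "add [eax],al": in 32-bit mode the byte pair 00 00 decodes as exactly that instruction.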

How to make objdump show assembly of sections only appeared in source code?

I would like to produce an assembly listing like the one in the answer to the question "Using GCC to produce readable assembly?"
for this simple test code, test.c:
void main(){
int i;
for(i=0;i<10;i++){
printf("%d\n",i);
}
}
gcc command : gcc -g test.c -o test.o
objdump command: objdump -d -M intel -S test.o
But the listing I get starts with the .init section
080482bc<_init>: and ends with the .fini section 080484cc<_fini>,
neither of which I want to see.
Why is this happening, and how can I avoid showing sections that are not in the source file?
Right now you're creating an executable file and not an object file. The executable file of course contains a lot of extra sections.
If you want to create an object file, use the -c flag to GCC.
You can specify sections using -j option.
So objdump -d executable -j .text -j .plt will only show disassembly from .text and .plt sections.

From bootsector to C++ kernel

I decided to write a simple asm bootloader and a C++ kernel. I have read a lot of tutorials, but I can't compile an assembly file that looks like this:
[BITS 32]
[global start]
[extern _k_main]
start:
call _k_main
cli
hlt
(I would like to call the k_main function from a C file.)
Compile/assemble/linking errors:
nasm -f bin -o kernelstart.asm -o kernelstart.bin:
error: bin file cannot contain external references
okay, then i tried create a .o file:
nasm -f aout -o kernelstart.asm -o kernelstart.o (That's right)
ld -i -e _main -Ttext 0x1000 kernel.o kernelstart.o main.o
error: File format not recognized
Could someone please give me a working example or explain how to build this? :/
(I have been going through tutorials and help pages for two days but cannot find the right answer.)
I don't have a direct answer on where your error comes from. However, I do see a lot of things going wrong so I'll write these here:
nasm
nasm -f aout -o kernelstart.asm -o kernelstart
Does that even work? That should be something like
nasm -f aout -o kernelstart kernelstart.asm
ld
ld -i -e _main -Ttext 0x1000 kernel.o kernelstart.o main.o
Since you said you wanted to make a bootloader and a kernel, I'm assuming your goal here is to make ld output something that can be put in the MBR. If that's the case, here are some things to keep in mind:
You didn't specify the output format. If you want to make an MBR image, add --oformat=binary to the command line options. This makes sure a flat binary file is generated.
You set the entry point to _main. I'm not sure where that symbol is defined, but I guess you want your entry point to be start because that's where you call your kernel.
You link your text section starting at 0x1000. If you want to put your image in the MBR to be loaded by the BIOS, it should be linked at 0x7c00.
As a side note: it seems you're trying to link your bootloader and kernel together in one image. Just remember that the MBR can only hold 512 bytes (well, actually 510 bytes, since the last 2 must contain a magic value), so you won't be able to write much of a kernel there. What you should do is create a separate kernel image and load it from your bootloader.
I hope these points will help you in solving your problem.
Also, you'll find a lot of useful information at OSDev. Here is a tutorial on writing a real mode "kernel" that only uses the MBR. The tutorial contains working code.
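The 510-plus-2 layout mentioned above can also be produced after the fact from a flat binary. This is a rough sketch, not part of any tutorial: make_boot_sector is a hypothetical helper, and the two-byte sample is just an endless-loop jump used as placeholder code.

```python
def make_boot_sector(code):
    """Pad flat binary code to 510 bytes and append the boot signature."""
    if len(code) > 510:
        raise ValueError("code does not fit in a 512-byte boot sector")
    # The signature is the word 0xAA55, stored little-endian as 55 AA.
    return code + bytes(510 - len(code)) + b"\x55\xaa"


# Placeholder payload: "jmp $" (eb fe), an endless loop.
sector = make_boot_sector(bytes.fromhex("ebfe"))

print(len(sector))           # 512
print(sector[510:].hex())    # 55aa
```

Code that exceeds 510 bytes raises an error instead of being truncated silently, which is usually the first sign that the kernel needs to live in its own image.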

Resources