I have been implementing just for fun a simple operating system for x86 architecture from scratch. I implemented the assembly code for the bootloader that loads the kernel from disk and enters in 32-bit mode. The kernel code that is loaded is written in C, so in order to be executed the idea is to generate the raw binary from the C code.
Firstly, I used these commands:
$gcc -ffreestanding -c kernel.c -o kernel.o -m32
$ld -o kernel.bin -Ttext 0x1000 kernel.o --oformat binary -m elf_i386
However, it didn't generate any binary giving back these errors:
kernel.o: In function 'main':
kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'
Just for clarity sake, the kernel.c code is:
/* kernel.c */
void main ()
{
char *video_memory = (char *) 0xb8000 ;
*video_memory = 'X';
}
Then I followed this tutorial: http://wiki.osdev.org/GCC_Cross-Compiler
to implement my own cross-compiler for my own target. It worked for my purpose, however disassembling with the command ndisasm I obtained this code:
00000000 55 push ebp
00000001 89E5 mov ebp,esp
00000003 83EC10 sub esp,byte +0x10
00000006 C745FC00800B00 mov dword [ebp-0x4],0xb8000
0000000D 8B45FC mov eax,[ebp-0x4]
00000010 C60058 mov byte [eax],0x58
00000013 90 nop
00000014 C9 leave
00000015 C3 ret
00000016 0000 add [eax],al
00000018 1400 adc al,0x0
0000001A 0000 add [eax],al
0000001C 0000 add [eax],al
0000001E 0000 add [eax],al
00000020 017A52 add [edx+0x52],edi
00000023 0001 add [ecx],al
00000025 7C08 jl 0x2f
00000027 011B add [ebx],ebx
00000029 0C04 or al,0x4
0000002B 0488 add al,0x88
0000002D 0100 add [eax],eax
0000002F 001C00 add [eax+eax],bl
00000032 0000 add [eax],al
00000034 1C00 sbb al,0x0
00000036 0000 add [eax],al
00000038 C8FFFFFF enter 0xffff,0xff
0000003C 16 push ss
0000003D 0000 add [eax],al
0000003F 0000 add [eax],al
00000041 41 inc ecx
00000042 0E push cs
00000043 088502420D05 or [ebp+0x50d4202],al
00000049 52 push edx
0000004A C50C04 lds ecx,[esp+eax]
0000004D 0400 add al,0x0
0000004F 00 db 0x00
As you can see, the first 9 rows (except for the NOP that I don't know why it is inserted) are the assembly translation of my main function. From 10 row to the end, there's a lot code that I don't know why it is here.
In the end, I have two questions:
1) Why is it produced that code?
2) Is there a way to produce the raw machine code from C without that useless stuff?
A few hints first:
avoid naming your starting routine main. It is confusing (both for the reader and perhaps for the compiler; when you don't pass -ffreestanding to gcc it is handling main very specifically). Use something else like start or begin_of_my_kernel ...
compile with gcc -v to understand what your particular compiler is doing.
you probably should ask your compiler for some optimizations and all warnings, so pass -O -Wall at least to gcc
you may want to look into the produced assembler code, so use gcc -S -O -Wall -fverbose-asm kernel.c to get the kernel.s assembler file and glance into it
as commented by Michael Petch you might want to pass -fno-exceptions
your probably need some linker script and/or some hand-written assembler for crt0
you should read something about linkers & loaders
kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'
This smells like something related to position-independent-code. My guess: try compiling with an explicit -fno-pic or -fno-pie
(on some Linux distributions, their gcc might be configured with some -fpic enabled by default)
PS. Don't forget to add -m32 to gcc if you want x86 32 bits binaries.
Related
I have the following program:
void main1() {
((void(*)(void)) (0xabcdefabcdef)) ();
}
I create it with the following commands:
clang -fno-stack-protector -c -static -nostdlib -fpic -fpie -O0 -fno-asynchronous-unwind-tables main.c -o shellcode.o
ld shellcode.o -o shellcode -S -static -dylib -e main1 -order_file order.txt
gobjcopy -O binary --only-section=.text shellcode shellcode.output
The assembly looks like the following:
//
// ram
// ram: 00000000-00000011
//
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_00000000()
undefined AL:1 <RETURN>
FUN_00000000
00000000 55 PUSH RBP
00000001 48 89 e5 MOV RBP,RSP
00000004 48 b8 ef MOV RAX,0xabcdefabcdef
cd ab ef
cd ab 00 00
0000000e ff d0 CALL RAX
00000010 5d POP RBP
00000011 c3 RET
How do I get clang to remove the PUSH RBP, MOV RBP,RSP and POP RBP instructions as they are unnecessary?
I can do this if I write the program in assembly with the following lines:
.globl start
start:
movq $0xabcdefabcdef, %rax
call *%rax
ret
and with the following build commands:
clang -static -nostdlib main.S -o crashme.o
gobjcopy -O binary --only-section=.text crashme.o crashme.output
and the resulting assembly:
//
// ram
// ram: 00000000-0000000c
//
assume DF = 0x0 (Default)
00000000 48 b8 ef MOV RAX,0xabcdefabcdef
cd ab ef
cd ab 00 00
0000000a ff d0 CALL RAX
0000000c c3 RET
but I would much rather write C code instead of assembly.
You forgot to enable optimization. Any optimization level like -O3 enables -fomit-frame-pointer.
It will also optimize the tailcall into a jmp instead of call/ret though. If you need to avoid that for some reason, maybe you can use -fomit-frame-pointer at the default -O0.
For shellcode you might want -Os to optimize for code size. Or even clang's -Oz; that will have a side-effect of avoiding some 0 bytes in the machine code by using push imm8 / pop reg to put small constants in registers, instead of mov reg, imm32.
I've seen in some posts/videos/files that they are zero-padded to look bigger than they are, or match "same file size" criteria some file system utilities have for moving files, mostly they are either prank programs, or malware.
But I often wondered, what would happen if the file corrupted, and would "load" the next set of "instructions" that are in the big zero-padded space at the end of the file?
Would anything happen? What's the instruction set for 0x0?
The decoding of 0 bytes completely depends on the CPU architecture. On many architectures, instruction are fixed length (for example 32-bit), so the relevant thing would be 00 00 00 00 (using hexdump notation).
On most Linux distros, clang/llvm comes with support for multiple target architectures built-in (clang -target and llvm-objdump), unlike gcc / gas / binutils, so I was able to use that to check for some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the supported list. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)
On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination. The first byte is the opcode, the 2nd byte is the ModR/M that specifies the operands.
This usually segfaults right away (if eax/rax isn't a valid pointer), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call), although in those cases the following bytes aren't always all zero. e.g. data, or ELF metadata.)
x86 64-bit mode: ndisasm -b64 /dev/zero | head:
address machine code disassembly
00000000 0000 add [rax],al
x86 32-bit mode (-b32):
00000000 0000 add [eax],al
x86 16-bit mode: (-b16):
00000000 0000 add [bx+si],al
AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of all-zero and shows ...)
addr machine code disassembly
0: 00000000 andeq r0, r0, r0
ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero
0: 0000 movs r0, r0
2: 0000 movs r0, r0
AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero
0: 00000000 .inst 0x00000000 ; undefined
MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o
zero.o: file format ELF32-mips
Disassembly of section .text:
0: 00 00 00 00 nop
PowerPC 32 and 64-bit: -target powerpc and -target powerpc64. IDK if any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or if it's still an illegal instruction on modern IBM POWER chips.
zero.o: file format ELF32-ppc (or ELF64-ppc64)
Disassembly of section .text:
0: 00 00 00 00 <unknown>
IBM S390: clang -c -target systemz zero.S
zero.o: file format ELF64-s390
Disassembly of section .text:
0: 00 00 <unknown>
2: 00 00 <unknown>
I am planning to use C to write a small kernel and I really don't want it to bloat with unnecessary instructions.
I have two C files which are called main.c and hello.c. I compile and link them using the following GCC command:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o
I am dumping .text section using following OBJDUMP command:
objdump -w -j .text -D -mi386 -Maddr16,data16,intel main.o
and get the following dump:
00001000 <main>:
1000: 67 66 8d 4c 24 04 lea ecx,[esp+0x4]
1006: 66 83 e4 f0 and esp,0xfffffff0
100a: 67 66 ff 71 fc push DWORD PTR [ecx-0x4]
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 51 push ecx
1016: 66 83 ec 04 sub esp,0x4
101a: 66 e8 10 00 00 00 call 1030 <hello>
1020: 90 nop
1021: 66 83 c4 04 add esp,0x4
1025: 66 59 pop ecx
1027: 66 5d pop ebp
1029: 67 66 8d 61 fc lea esp,[ecx-0x4]
102e: 66 c3 ret
00001030 <hello>:
1030: 66 55 push ebp
1032: 66 89 e5 mov ebp,esp
1035: 90 nop
1036: 66 5d pop ebp
1038: 66 c3 ret
My questions are: Why are machine codes at the following lines being generated?
I can see that subtraction and addition completes each other, but why are they generated? I don't have any variable to be allocated on stack. I'd appreciate a source to read about usage of ECX.
1016: 66 83 ec 04 sub esp,0x4
1021: 66 83 c4 04 add esp,0x4
main.c
extern void hello();
void main(){
hello();
}
hello.c
void hello(){}
lscript.ld
SECTIONS{
.text 0x1000 : {*(.text)}
}
As I mentioned in my comments:
The first few lines (plus the push ecx) are to ensure the stack is aligned on a 16-byte boundary which is required by the Linux System V i386 ABI. The pop ecx and lea before the ret in main is to undo that alignment work.
#RossRidge has provided a link to another Stackoverflow answer that details this quite well.
In this case you seem to be doing real mode development. GCC isn't well suited for this but it can work and I will assume you know what you are doing. I mention some of the pitfalls of using -m16 in this Stackoverflow answer. I put this warning in that answer regarding real mode development with GCC:
There are so many pitfalls in doing this that I recommend against it.
If you remain undeterred and wish to continue forward you can do a few things to minimize the code. The 16-byte alignment of the stack at the point a function call is made is part of the more recent Linux System V i386 ABIs. Since you are generating code for a non-Linux environment you can change the stack alignment to 4 using compiler option -mpreferred-stack-boundary=2 . The GCC manual says:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
If we add that to your GCC command we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2:
00001000 <main>:
1000: 66 55 push ebp
1002: 66 89 e5 mov ebp,esp
1005: 66 e8 04 00 00 00 call 100f <hello>
100b: 66 5d pop ebp
100d: 66 c3 ret
0000100f <hello>:
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 5d pop ebp
1016: 66 c3 ret
Now all the extra alignment code to get it on a 16-byte boundary has disappeared. We are left with typical function frame pointer prologue and epilogue code. This is often in the form of push ebp and mov ebp,esp pop ebp. we can remove these with the -fomit-frame-pointer define in the GCC manual as:
The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.
If we add that option we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2 -fomit-frame-pointer:
00001000 <main>:
1000: 66 e8 02 00 00 00 call 1008 <hello>
1006: 66 c3 ret
00001008 <hello>:
1008: 66 c3 ret
You can then optimize for size with -Os. The GCC manual says this:
-Os
Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
This has a side effect that main will be placed into a section called .text.startup. If we display both with objdump -w -j .text -j .text.startup -D -mi386 -Maddr16,data16,intel main.o we get:
Disassembly of section .text:
00001000 <hello>:
1000: 66 c3 ret
Disassembly of section .text.startup:
00001002 <main>:
1002: e9 fb ff jmp 1000 <hello>
If you have functions in separate objects you can alter the calling convention so the first 3 Integer class parameters are passed in registers rather than the stack. The Linux kernel uses this method as well. Information on this can be found in the GCC documentation:
regparm (number)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to number if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
I wrote a Stackoverflow answer with code that uses __attribute__((regparm(3))) that may be a useful source of further information.
Other Suggestions
I recommend you consider compiling each object individually rather than altogether. This is also advantageous since it can be more easily be done in a Makefile later on.
If we look at your command line with the extra options mentioned above you'd have:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o \
-mpreferred-stack-boundary=2 -fomit-frame-pointer -Os
I recommend you do it this way:
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer main.c -o main.o
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer hello.c -o hello.o
The -c option (I added it to the beginning) forces the compiler to just generate the object file from the source and not to perform linking. You will also notice the -T lscript.ld has been removed. We have created .o files above. We can now use GCC to link all of them together:
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m16 \
-Tlscript.ld main.o hello.o -o main.elf
The -ffreestanding will force the linker to not use the C runtime, the -Wl,--build-id=none will tell the compiler not to generate some noise in the executable for build notes. In order for this to really work you'll need a slightly more complex linker script that places the .text.startup before .text. This script also adds the .data section, the .rodata and .bss sections. The DISCARD option removes exception handling data and other unneeded information.
ENTRY(main)
SECTIONS{
.text 0x1000 : SUBALIGN(4) {
*(.text.startup);
*(.text);
}
.data : SUBALIGN(4) {
*(.data);
*(.rodata);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
*(.note.gnu.build-id);
}
}
If we look at a complete OBJDUMP with objdump -w -D -mi386 -Maddr16,data16,intel main.elf we would see:
Disassembly of section .text:
00001000 <main>:
1000: e9 01 00 jmp 1004 <hello>
1003: 90 nop
00001004 <hello>:
1004: 66 c3 ret
If you want to convert main.elf to a binary file that you can place in a disk image and read it (ie. via BIOS interrupt 0x13), you can create it this way:
objcopy -O binary main.elf main.bin
If you dump main.bin with NDISASM using ndisasm -b16 -o 0x1000 main.bin you'd see:
00001000 E90100 jmp word 0x1004
00001003 90 nop
00001004 66C3 o32 ret
Cross Compiler
I can't stress this enough but you should consider using a GCC cross compiler. The OSDev Wiki has information on building one. It also has this to say about why:
Why do I need a Cross Compiler?
You need to use a cross-compiler unless you are developing on your own operating system. The compiler must know the correct target platform (CPU, operating system), otherwise you will run into trouble. If you use the compiler that comes with your system, then the compiler won't know it is compiling something else entirely. Some tutorials suggest using your system compiler and passing a lot of problematic options to the compiler. This will certainly give you a lot of problems in the future and the solution is build a cross-compiler.
I am trying to build the following really small C program into a raw binary file:
asm ("call sys_main\n" // Immediately run sys_main at start of code
"__asm_loop_halt:\n"
"jmp __asm_loop_halt\n"); // Then halt
int sys_main() {
short *addr = (short*) 0x08b000; // Address start of EGA-VRAM
*addr = 0x0f41; // Write white 'A' on black to screen
}
Because I am trying to create a raw binary I have to use a linker script gcc -std=gnu99 -Os -nostdlib -m32 -march=i386 -ffreestanding -Wl,--nmagic,--script=386.ld -o test test.c:
OUTPUT_FORMAT(binary)
SECTIONS
{
.text 0x0500 :
{
*(.text);
}
.data :
{
*(.data);
*(.bss);
*(.rodata);
}
_heap = ALIGN(4);
}
The script is supposed to tell the linker that the code starts running at 0x500 and that it should only create a binary file. However when I disassemble the binary, I get:
00000000 E802000000 call dword 0x7
00000005 EBFE jmp short 0x5
00000007 55 push ebp
00000008 89E5 mov ebp,esp
0000000A 66C70500B0080041 mov word [dword 0x8b000],0xf41
-0F
00000013 5D pop ebp
00000014 C3 ret
00000015 0000 add [eax],al
00000017 001400 add [eax+eax],dl
......
Appearantly the linker still took 0x0 as the start address of the code and also added a bunch of random data behind the last senseful 'ret' instruction, that is in total 4 times as big as the code.
What is this data, why is it there and what did I do wrong to have my code start at 0x0?
Edit: Thanks to Eugene's tip with the map I discovered that the bytes behind the .text section are .eh_frame responsible for exception handling which can easily removed by calling gcc with -fno-asynchronous-unwind-tables.
I am in the process of writing a small operating system in C. I have written a bootloader and I'm now trying to get a simple C file (the "kernel") to compile with gcc:
int main(void) { return 0; }
I compile the file with the following command:
gcc kernel.c -o kernel.o -nostdlib -nostartfiles
I use the linker to create the final image using this command:
ld kernel.o -o kernel.bin -T linker.ld --oformat=binary
The contents of the linker.ld file are as follows:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
}
(The bootloader loads the image at address 0x7e00.)
This seems to work quite well - ld produces a 128-byte file containing the following instructions in the first 11 bytes:
00000000 55 push ebp
00000001 48 dec eax
00000002 89 E5 mov ebp, esp
00000004 B8 00 00 00 00 mov eax, 0x00000000
00000009 5D pop ebp
0000000A C3 ret
However, I can't figure out what the other 117 bytes are for. Disassembling them seems to produce a bunch of garbage that doesn't make any sense. The existence of the additional bytes has me wondering if I'm doing something wrong.
Should I be concerned?
These are additional sections, which were not stripped and not discarded. You want your linker.ld file to look like this:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
/DISCARD/ :
{
*(.comment)
*(.eh_frame_hdr)
*(.eh_frame)
}
}
I know what sections to discard from the output of objdump -t kernel.o.
Simple, you're using gcc, and it always put its initialization code before passing control to your main.
What's on that start up code I don't know, but they are there. As you may see there's also an comment 'GNU' on your binary, you can't print specific sectors by using objdump -s -j 'section name'.