Invalid Op Code Exception when implementing IRQs - c

I have been trying to loosely follow this tutorial on basic kernel dev. Currently, the target architecture is i386.
The implementation of IRQs is causing me issues ; my interrupt handler reports a cascade of Invalid Op Code exceptions whenever I try to pass registers (defined as a struct) as an argument to a function. Here is the code for the interrupt handler which raises the exception:
void interrupt_handler(registers_t all_registers) {
// Printing exception's name
kprint("interrupt_handler.c (l. 53) : Exception raised was:", 0xB0);
kprint(exception_messages[(int) all_registers.int_no], 0xB0);
kprint("\n", 0xB0);
// Celling test_handle to display the value of some registers
// INVALID OP CODE ================>
test_handle(all_registers); // works as expected if this line is commented out
}
void test_handle(registers_t all_registers) {
kprint("interrupt_handler.c (l. 78) : Register DS contains", 0xD0);
kprint("to be implemented", 0xD0);
}
The structure registers_t is defined as follows (copied from the tutorial):
typedef struct {
u32int ds; /* Data segment selector */
u32int edi, esi, ebp, esp, ebx, edx, ecx, eax; /* Pushed by pusha. */
u32int int_no, err_code; /* Interrupt number and error code (if applicable) */
u32int eip, cs, eflags, useresp, ss; /* Pushed by the processor automatically */
} __attribute__((packed)) registers_t;
Trying function calling with other struct, I found that the number of variables in the struct matters ; any struct that has between 5 and 16 u32int triggers the exception. For instance, the following structure, when initialized and passed empty to test_handle, does not raise exceptions:
// Same as registers_t with less arguments
typedef struct {
u32int ds;
u32int edi, esi;
} __attribute__((packed)) test_t;
Disassembling the .o file reveals that the generated code uses the mov instruction to pass test_t structures and movsd to pass registers_t. So my suspicion is that the compilation process is at fault, since the compiler generates unrecognized instructions.
Here are the relevant excerpts of my Makefile:
C_FLAGS=-ffreestanding -nostartfiles -nodefaultlibs -fno-builtin -Wall -Wextra -fno-exceptions -m32 -target i386-pc-elf -fno-rtti
# Compiling C code
%.o: %.c
clang $(C_FLAGS) -c $< -o $#
# Linking
kernel/kernel.bin: $(O_FILES)
ld -o $# -Ttext 0x1000 $^ --oformat binary -m elf_i386
Is there anything wrong about the compiling process? Or does the problem stem from elsewhere?

#Ross Ridge figured it out (thanks to him!). The details below are what I learned from the OSDev wiki
The Streaming SIMD Extension (SSE) expands the set of instructions recognized by the CPU with some 70 additional instructions and adds some more registers. SSE needs to be enabled before its instructions and registers can be used. The compiler generates machine code which can include SSE instructions and therefore, SSE needs to be enabled.
In the code above, the passing of struct to the function was compiled into machine code which involved the xmm0 register, which is part of the SSE.
The assembly code to enable SSE is given below (adapted from the OSDev wiki). I added it to my bootloader, right after entering the 32-bit protected mode and before entering the kernel. That fixed the problem!
mov eax, cr0 ; cr0 cannot be manipulated directly, manipulate eax instead
and ax, 0xFFFB ; clear coprocessor emulation CR0.EM
or ax, 0x2 ; set coprocessor monitoring CR0.MP
mov cr0, eax
mov eax, cr4 ; cr4 too cannot be manipulated directly
or ax, 3 << 9 ; set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
mov cr4, eax

Related

Interfacing C functions with asm code at booting level?

I am following a tutorial about creating a 32bit operating system. I have managed to switch to the 32bit protected mode happily but could not get further.
Below, boot_sect.asm is the main assembly code. After switching to protected mode inside boot_sect.asm, an external C function (kmain) is called via kernel_entry.asm (calls the c function) from boot_sect.asm. The location of the C function in the ram is 0x1000000.
The kmain() is supposed to print an x on the screen. However, this does not happen. Indeed, to diagnose the code, I put an infinite while loop inside the kmain(), but the program did not hang there; instead it directly ended. I guess the files are not linked properly or there is a memory overlap problem. I could not get any further, and I am stuck at this point.
I would appreciate any idea or help to get over this step.
boot_sect.asm (the main asm code)
; A boot sector that boots a C kernel in 32-bit protected mode
[bits 16]
[org 0x7c00]
KERNEL_OFFSET equ 0x100000 ; This is the memory offset to where kernel will be loaded
mov [BOOT_DRIVE], dl
mov bp, 0x9000 ; stack origin.
mov sp, bp
mov bx, MSG_REAL_MODE ;print that we are starting
call print_string ; booting from 16-bit real mode
call load_kernel ; Load our kernel
call switch_to_pm ; Switch to protected mode, from which
; we will not return
jmp $
; Include our useful , hard-earned routines
%include "print/print_string.asm"
%include "disk/disk_load.asm"
%include "pm/gdt.asm"
%include "pm/print_string_pm.asm"
%include "pm/switch_to_pm.asm"
[bits 16]
; load_kernel
load_kernel:
mov bx, MSG_LOAD_KERNEL ; Print a message to say we are loading the kernel
call print_string
mov bx, KERNEL_OFFSET ; Set-up parameters for our disk_load routine , so
mov dh, 15 ; that we load the first 15 sectors (excluding
mov dl, [BOOT_DRIVE] ; the boot sector) from the boot disk (i.e. our
call disk_load ; kernel code) to address KERNEL_OFFSET
ret
[bits 32]
BEGIN_PM:
mov ebx, MSG_PROT_MODE ; Use 32-bit print routine to
call print_string_pm ; announce its in protected mode
;the program manages to come here perfectly.
;here the kmain() function is called via kernel_entry.asm
;but after this call, the program gets "lost", it ends somehow.
call KERNEL_OFFSET
;it cant even get here, it ends before getting here. Probably the kmain() is not located at KERNEL_OFFSET due to a problem.
jmp $ ;Hang.
; Global variables
BOOT_DRIVE db 0
MSG_REAL_MODE db "Started in 16-bit Real Mode", 0
MSG_PROT_MODE db "Successfully landed in 32-bit Protected Mode", 0
MSG_LOAD_KERNEL db "Loading kernel into memory.", 0
; Bootsector padding
times 510-($-$$) db 0
dw 0xaa55
kernel.c (c code)
void kmain()
{
char* screen = (char*)0xb8000;
screen[0] = 'X';
}
kernel_entry.asm (calls the c code)
[bits 32]
[extern _kmain]
call _kmain
jmp $
Finally, to obtain the os-image , I used the following commands
kernel.c and kernel_entry.asm are converted to object files (nasm and gcc commands), then they are linked to create kernel.bin file (by using ld command and the address 0x100000 of the kmain()).
Then kernel.bin and boot_sect.bin are combined using type command to obtain an os image.
gcc -m32 -ffreestanding -c kernel.c -o kernel.o
nasm kernel_entry.asm -f elf -o kernel_entry.o
ld -m i386pe -T NUL -o kernel.bin -Ttext-segment 0x100000 kernel_entry.o kernel.o
nasm boot_sect.asm -f bin -o boot_sect.bin
type boot_sect.bin kernel.bin > os-image

How to specify Win32 as output when invoking GCC using MinGW on Windows

How to specify Win32 as output when invoking GCC using MinGW on Windows.
Below I've posted my source code. My objective is to interface assembly with C code and produce an executable.
I start assembling the add.asm to Win32 using the following NASM command:
nasm -f win32 add.asm
Then it should be possible to invoke GCC using both C and object files?
gcc -o add add.obj call_asm.c
However, this results in an a linkage error:
C:\Users\nze\AppData\Local\Temp\cckUvRyC.o:call_asm.c:(.text+0x1e): undefined reference to `add'
collect2.exe: error: ld returned 1 exit status
If I instead compile to ELF using
nasm -f elf add.asm
the command (this time using the ELF file add.o)
gcc -o add add.o call_asm.c
works perfectly.
How can I tell GCC that my object files are in Win32 format, so that it should compile call_asm.c to Win32 before linking? (I guess this is the core of the problem, please comment whether I'm correct).
call_add.c:
#include <stdio.h>
extern int add(int a, int b);
int main()
{
printf("%d", add(7, 6));
}
add.asm:
BITS 32
global _add
_add:
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ebx, [ebp+12]
add eax, ebx
mov esp, ebp
pop ebp
ret
The problem isn't what you assume it is. GCC is generating "win32" format (more commonly know as PECOFF) object files. The problem is that your assembly code doesn't define a section, and this results in NASM not defining the symbol _add in the generated object file.
If you add a SECTION directive your code links and runs without error:
BITS 32
SECTION .text
global _add
_add:
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ebx, [ebp+12]
add eax, ebx
mov esp, ebp
pop ebp
ret
Telling NASM to generate and ELF object file changes its behaviour, for whatever reason, and causes it to define the _add symbol in the ELF object file.
Just add this before the label:
.globl _add
To get that symbol to export in a .DLL you should add this at the end of the file:
.section .drectve
.ascii " -export:\"add\""
Note that the leading underscore is left out.

Waste in memory allocation for local variables

This my program:
void test_function(int a, int b, int c, int d){
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1, 2, 3, 4);
}
I compile this program with the debug option:
gcc -g my_program.c
I use gdb and I disassemble the test_function with intel syntax:
(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x08048344 <test_function+0>: push ebp
0x08048345 <test_function+1>: mov ebp,esp
0x08048347 <test_function+3>: sub esp,0x28
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>: leave
0x08048356 <test_function+18>: ret
End of assembler dump.
And I disassemble the main:
(gdb) disassemble main
Dump of assembler code for function main:
0x08048357 <main+0>: push ebp
0x08048358 <main+1>: mov ebp,esp
0x0804835a <main+3>: sub esp,0x18
0x0804835d <main+6>: and esp,0xfffffff0
0x08048360 <main+9>: mov eax,0x0
0x08048365 <main+14>: sub esp,eax
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
0x08048386 <main+47>: call 0x8048344 <test_function>
0x0804838b <main+52>: leave
0x0804838c <main+53>: ret
End of assembler dump.
I place a breakpoint at this adresse: 0x08048355 (leave instruction for the test_function) and I run the program.
I watch the stack like this:
(gdb) x/16w $esp
0xbffff7d0: 0x00000041 0x08049548 0xbffff7e8 0x08048249
0xbffff7e0: 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x00007a69
0xbffff7f0: 0xb7fd6ff4 0xbffff8ac 0xbffff818 0x0804838b
0xbffff800: 0x00000001 0x00000002 0x00000003 0x00000004
0x0804838b is the return adress, 0xbffff818 is the saved frame pointer (main ebp) and flag variable is stocked 12 bytes further. Why 12?
I don't understand this instruction:
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
Why we don't stock the content's variable 0x00007a69 in ebp-4 instead of 0xbffff8ac?
Same question for buffer. Why 40?
We don't waste the memory? 0xb7fd6ff4 0xbffff8ac and 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x08049548 0xbffff7e8 0x08048249 are not used?
This the output for the command gcc -Q -v -g my_program.c:
Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1)
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/cc1 -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 notesearch.c -dumpbase notesearch.c -auxbase notesearch -g -version -o /tmp/ccGT0kTf.s
GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1) (i486-linux-gnu)
compiled by GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129473
options passed: -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6
-auxbase -g
options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
-fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
-fbranch-count-reg -fcommon -fgnu-linker -fargument-alias
-fzero-initialized-in-bss -fident -fmath-errno -ftrapping-math -m80387
-mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
-maccumulate-outgoing-args -mcpu=pentiumpro -march=i486
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/include
/usr/include
End of search list.
gnu_dev_major gnu_dev_minor gnu_dev_makedev stat lstat fstat mknod fatal ec_malloc dump main print_notes find_user_note search_note
Execution times (seconds)
preprocessing : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
lexical analysis : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
parser : 0.02 (100%) usr 0.01 (25%) sys 0.00 ( 0%) wall
TOTAL : 0.02 0.04 0.00
as -V -Qy -o /tmp/ccugTYeu.o /tmp/ccGT0kTf.s
GNU assembler version 2.17.50 (i486-linux-gnu) using BFD version 2.17.50 20070103 Ubuntu
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crt1.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crti.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtbegin.o -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6 -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../.. /tmp/ccugTYeu.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtend.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crtn.o
NOTE: I read the book "The art of exploitation" and I use the VM provides with the book.
The compiler is trying to maintain 16 byte alignment on the stack. This also applies to 32-bit code these days (not just 64-bit). The idea is that at the point before executing a CALL instruction the stack must be aligned to a 16-byte boundary.
Because you compiled with no optimizations there are some extraneous instructions.
0x0804835a <main+3>: sub esp,0x18 ; Allocate local stack space
0x0804835d <main+6>: and esp,0xfffffff0 ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>: mov eax,0x0 ; Extraneous, not needed
0x08048365 <main+14>: sub esp,eax ; Extraneous, not needed
ESP is now 16-byte aligned after the last instruction above. We move the parameters for the call starting at the top of the stack at ESP. That is done with:
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
The CALL then pushes a 4 byte return address on the stack. We then reach these instructions after the call:
0x08048344 <test_function+0>: push ebp ; 4 bytes pushed on stack
0x08048345 <test_function+1>: mov ebp,esp ; Setup stackframe
This pushes another 4 bytes on the stack. With the 4 bytes from the return address we are now misaligned by 8 bytes. To reach 16-byte alignment again we will need to waste an additional 8 bytes on the stack. That is why in this statement there is an additional 8 bytes allocated:
0x08048347 <test_function+3>: sub esp,0x28
0x08 bytes already on stack because of return address(4-bytes) and EBP(4 bytes)
0x08 bytes of padding needed to align stack back to 16-byte alignment
0x20 bytes needed for local variable allocation = 32 bytes.
32/16 is evenly divisible by 16 so alignment maintained
The second and third number above added together is the value 0x28 computed by the compiler and used in sub esp,0x28.
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
So why [ebp-12] in this instruction? The first 8 bytes [ebp-8] through [ebp-1] are the alignment bytes used to get the stack 16-byte aligned. The local data will then appear on the stack after that. In this case [ebp-12] through [ebp-9] are the 4 bytes for the 32-bit integer flag.
Then we have this for updating buffer[0] with the character 'A':
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
The oddity then would be why a 10 byte array of characters would appear from [ebp+40](beginning of array) to [ebp+13] which is 28 bytes. The best guess I can make is that compiler felt that it could treat the 10 byte character array as a 128-bit (16-byte) vector. This would force the compiler to align the buffer on a 16 byte boundary, and pad the array out to 16 bytes (128-bits). From the perspective of the compiler, your code seems to be acting much like it was defined as:
#include <xmmintrin.h>
void test_function(int a, int b, int c, int d){
int flag;
union {
char buffer[10];
__m128 m128buffer; ; 16-byte variable that needs to be 16-bytes aligned
} bufu;
flag = 31337;
bufu.buffer[0] = 'A';
}
The output on GodBolt for GCC 4.9.0 generating 32-bit code with SSE2 enabled appears as follows:
test_function:
push ebp #
mov ebp, esp #,
sub esp, 40 #,same as: sub esp,0x28
mov DWORD PTR [ebp-12], 31337 # flag,
mov BYTE PTR [ebp-40], 65 # bufu.buffer,
leave
ret
This looks very similar to your disassembly in GDB.
If you compiled with optimizations (such as -O1, -O2, -O3), the optimizer could have simplified test_function because it is a leaf function in your example. A leaf function is one that doesn't call another function. Certain shortcuts could have been applied by the compiler.
As for why the character array seems to be aligned to a 16-byte boundary and padded to be 16 bytes? That probably can't be answered with certainty until we know what GCC compiler you are using (gcc --version will tell you). It would also be useful to know your OS and OS version. Even better would be to add the output from this command to your question gcc -Q -v -g my_program.c
Unless you're trying to improve gcc's code itself, understanding why un-optimized code is as bad as it is will mostly be a waste of time. Look at output from -O3 if you want to see what a compiler does with your code, or from -Og if you want to see a more literal translation of your source into asm. Write functions that take input in args and produce output in globals or return values, so the optimized asm isn't just ret.
You shouldn't expect anything efficient from gcc -O0. It makes the most braindead literal translation of your source.
I can't reproduce that asm output with any gcc or clang version on http://gcc.godbolt.org/. (gcc 4.4.7 to gcc 5.3.0, clang 3.0 to clang 3.7.1). (Note that godbolt use g++, but you can use -x c to treat the input as C, instead of compiling it as C++. This can sometimes change the asm output, even when you don't use any features C99 / C11 has but C++ doesn't. (e.g. C99 variable-length arrays).
Some versions of gcc default to emitting extra code unless I use -fno-stack-protector.
I thought at first that the extra space reserved by test_function was to copy its args down into its stack frame, but at least modern gcc doesn't do this. (64bit gcc does store its args into memory when they arrive in registers, but that's different. 32bit gcc will increment an arg in place on the stack, without copying it.)
The ABI does allow the called function to clobber its args on the stack, so a caller that wanted to make repeated function calls with the same args would have to keep storing them between calls.
clang 3.7.1 with -O0 does copy its args down into locals, but that still only reserves 32 (0x20) bytes.
This is about the best answer you're going to get unless you tell us which version of gcc you're using...

JMP not working

Okay, so I've been trying to make a two-step bootloader in assembly/C but I haven't been able to get the JMP working. At first I thought the read was failing, but, after the following test I ruled that out:
__asm__ __volatile__(
"xorw %ax, %ax;"
"movw %ax, %ds;"
"movw %ax, %es;"
"movb $0x02, %ah;"
"movb $0x01, %al;"
"movw $0x7E00, %bx;"
"movw $0x0003, %cx;"
"xorb %dh, %dh;"
"int $0x13;"
"movb 0x7E00, %al;"
"movb $0x0e, %ah;"
"int $0x10;"
//"jmp 0x7E00"
);
This printed 'f' as expected (the first byte of the sector is 0x66 which is the ASCII code for 'f') proving that the read is successful and the jmp is the problem. This is my code:
__asm__(".code16\n");
__asm__(".code16gcc\n");
__asm__("jmpl $0x0000, $main\n");
void main(){
__asm__ __volatile__(
"xorw %ax, %ax;"
"movw %ax, %ds;"
"movw %ax, %es;"
"movb $0x02, %ah;"
"movb $0x01, %al;"
"movw $0x7E00, %bx;"
"movw $0x0003, %cx;"
"xorb %dh, %dh;"
"int $0x13;"
"jmp $0x200;"
);
}
When run, my program simply hangs, this means that the program is probably jumping to the wrong location in memory. By the way, I am obviously running this in real mode under VMWare player. I am compiling this with the following commands:
gcc -c -0s -march=i686 -ffreestanding -Wall -Werror boot.c -o boot.o
ld -static -Ttest.ld -nostdlib --nmagic -o boot.elf boot.o --no-check-sections
objcopy -0 binary boot.elf boot.bin
and this is test.ld:
ENTRY(main);
SECTIONS
{
. = 0x7C00;
.text : AT(0x7C00)
{
*(.test);
}
.sig : AT(0x7DFE)
{
SHORT(0xAA55);
}
}
Note: I have confirmed this is not a problem with the inline asm - I have tried a pure assembly implementation too with the same results - the only reason I am using C is because I plan on expanding this a bit and I am much more comfortable with C loops and functions...
EDIT: I've uploaded the first three sectors of my floppy drive here
EDIT 2: I have been unable to get my boot loader working using any of the suggestions and, based on advice from #RossRidge I have written an assembly version of the same program and a simple assembly program to echo input. Sadly these aren't working either..
Bootloader:
org 0x7c00
xor ax, ax
mov ds, ax
mov es, ax
mov ah, 0x02
mov al, 0x01
mov bx, 0x7E00
mov cx, 0x0003
xor dh, dh
int 0x13
jmp 0x7E00
Program in sector 3:
xor ax, ax
int 0x16
mov ah, 0xe
int 0x10
These are both compiled with: nasm Linux/boot.S -o Linux/asm.bin and behave the same as their C counterparts..
I am pretty sure your assembler is generating the wrong jump offset, it's probably interpreting the 0x7e00 as relative in the current text section, but that's mapped at 0x7c00 by the linker script so your jump probably goes to 0x7c00+0x7e00=0xfa00 instead. As an ugly workaround you could try jmp 0x200 instead or use an indirect jump to hide it from your tools such as mov $0x7e00, %ax; jmp *%ax. Alternatively jmp .text+0x200 could also work.
Also note that writing 16 bit asm in a 32 bit C compiler will generate wrong code, which is also hinted by the first byte of the loaded sector being 0x66 which is the operand size override prefix. Since the compiler assumes 32 bit mode, it will generate your 16 bit code with prefixes, which when run in 16 bit mode will switch back to 32 bits. In general, it's not a good idea to abuse inline asm for such purpose, you should use a separate asm file and make sure you tell your assembler that the code is intended for 16 bit real mode.
PS: learn to use a debugger.
Update: The following code when assembled using nasm boot.asm -o boot.bin works fine in qemu and bochs:
org 0x7c00
xor ax, ax
mov ds, ax
mov es, ax
mov ah, 0x02
mov al, 0x01
mov bx, 0x7E00
mov cx, 0x0003
xor dh, dh
int 0x13
jmp 0x7E00
times 510-($-$$) db 0
dw 0xaa55
; second sector
times 512 db 0
; Program in sector 3:
xor ax, ax
int 0x16
mov ah, 0xe
int 0x10
jmp $
times 1536-($-$$) db 0
You can download an image here (limited offer ;)).
I have written many boot loaders.
There needs to be two(2) separate executables,
1) the boot loader
2) the main code
so the loaded code (the main program) can be updated without doing anything to the boot loader.
The boot loader, when done needs to jump to a known location.
The main program needs to place a jump instruction at the known location, that jumps to a function in the librt library. (usually start())
start() will set I/O, heap, etc and call main()
Both the linker command file for the boot loader and the linker command file for the main program must agreed on the address of the known location
The known location is usually the only thing in a unique section of both linker command files.
The boot loader must be smart enough to place all the parts of the main program in the right areas in memory.
It is best if the main program is in motorola S1 (or similar) format rather than a raw .elf or .coff format; otherwise the boot loader can get very big very quickly.

function arguments loading to registers on x64

I have this little C code
void decode(int *xp,int *yp,int *zp)
{
int a,b,c;
a=*yp;
b=*zp;
c=*xp;
*yp=c;
*zp=a;
*xp=b;
}
Then I compiled it to object file using gcc -c -O1 decode.c, and then dumped the object with objdump -M intel -d decode.o and the equivalent assembly code for this is
mov ecx,DWORD PTR [rsi]
mov eax,DWORD PTR [rdx]
mov r8d,DWORD PTR [rdi]
mov DWORD PTR [rsi],r8d
mov DWORD PTR [rdx],ecx
mov DWORD PTR [rdi],eax
ret
And I noticed that it doesnt use stack at all.But firstly values still need to be loaded to the registers. So my question is how do the arguments get loaded into the registers? does the compiler automatically loads the arguments to the registers behind the scenes? or something else happens? because there is no instructions that would load the arguments into the registers.
And a little off topic. When you increase optimization for compiling the relationship between original source code and machine code decreases,imposing dificulties to relate the machine code back to the source code. By default if you dont specify the optimization flag to the GCC it doesnt optimize the code. So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable. But when I applied Level 1 optimization the code appeared understandable and related to the source. But why?
The arguments are loaded to registers in the caller. Example:
int a;
int b;
int f(int, int);
int g(void) {
return f(a, b);
}
Look at the code generated for g:
$ gcc -O1 -S t.c
$ cat t.s
…
movl b(%rip), %esi
movl a(%rip), %edi
call f
Second question:
So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable.
This happens because unoptimized code is stupid. It is a straight translation of an intermediate representation in which each variable is stored in the stack even if it doesn't need to, each conversion is represented by an explicit operation even if one isn't needed, and so on. -O1 is the best level of optimization for reading the generated assembly. It is also possible to disable the frame pointer, which keeps the overhead to the minimum for simple functions.

Resources