Assembly - Error: junk '40' after expression - c

I'm using an i686-elf-as gcc cross compiler and it's failing to compile an assembly file.
The file is used alongside grub to boot my own operating system but when i try defining any globals or enter the _irq part it spits out tons of errors which are mainly
boot.s:78: Error: no such instruction: `irq4'
boot.s:81: Error: junk `0' after expression
boot.s:82: Error: junk `36' after expression
How would I stop this from happening?
Below is the entire boot.s file
# Declare constants used for creating a multiboot header.
.set ALIGN, 1<<0
.set MEMINFO, 1<<1
.set FLAGS, ALIGN | MEMINFO
.set MAGIC, 0x1BADB002
.set CHECKSUM, -(MAGIC + FLAGS)
.section .multiboot
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM
.section .bootstrap_stack, "aw", #nobits
stack_bottom:
.skip 16384 # 16 KiB
stack_top:
.section .text
.global _start
.type _start, #function
_start:
movl $stack_top, %esp
call kernel_main
cli
hlt
.Lhang:
jmp .Lhang
.global _irq0
.global _irq1
.global _irq2
.global _irq3
.global _irq4
.global _irq5
.global _irq6
.global _irq7
.global _irq8
.global _irq9
.global _irq10
.global _irq11
.global _irq12
.global _irq13
.global _irq14
.global _irq15
_irq0:
cli
push byte 0
push byte 32
jmp irq_common_stub
_irq1:
cli
push byte 0
push byte 33
jmp irq_common_stub
_irq2:
cli
push byte 0
push byte 34
jmp irq_common_stub
_irq3:
cli
push byte 0
push byte 35
jmp irq_common_stub
_irq4:
cli
push byte 0
push byte 36
jmp irq_common_stub
_irq5:
cli
push byte 0
push byte 37
jmp irq_common_stub
_irq6:
cli
push byte 0
push byte 38
jmp irq_common_stub
_irq7:
cli
push byte 0
push byte 39
jmp irq_common_stub
_irq8:
cli
push byte 0
push byte 40
jmp irq_common_stub
_irq9:
cli
push byte 0
push byte 41
jmp irq_common_stub
_irq10:
cli
push byte 0
push byte 42
jmp irq_common_stub
_irq11:
cli
push byte 0
push byte 43
jmp irq_common_stub
_irq12:
cli
push byte 0
push byte 44
jmp irq_common_stub
_irq13:
cli
push byte 0
push byte 45
jmp irq_common_stub
_irq14:
cli
push byte 0
push byte 46
jmp irq_common_stub
_irq15:
cli
push byte 0
push byte 47
jmp irq_common_stub
extern _irq_handler
irq_common_stub:
pusha
push %ds
push %es
push %fs
push %gs
mov %ax, 0x10
mov %ds, %ax
mov %es, %ax
mov %fs, %ax
mov %gs, %ax
mov %eax, %esp
push %eax
mov %eax, _irq_handler
call eax
pop %eax
pop %gs
pop %fs
pop %es
pop %ds
popa
add %esp, 8
iret
.size _start, . - _start

You're mixing Intel and AT&T syntax assembly language. GNU as uses AT&T syntax traditionally. Intel syntax is that used by assemblers such as NASM, MASM, YASM, and historical assemblers designed for the x86 platform.
movl $stack_top, %esp is a perfectly valid example of AT&T syntax assembly language. push byte 35 is a perfectly valid example of Intel syntax assembly language. The two syntaxes, however, are incompatible, and cannot be combined.
I recommend looking up an assembly language tutorial that uses as on Linux, and learning how to use assembly language in the first place before jumping into something as complex and headache-inducing as systems development. ;)
http://asm.sourceforge.net/ -- Perhaps this tutorial/resource site could be of use to you. Good luck!

Related

QEMU call goes to wrong address

I've been working on a small osdev project. So far i've gotten to running C code with A20, GDT, protected mode (32-bit) and disk loading, but function calls are not working. I've confirmed the actual binary has no problems (ndisasm -b 32 lizard.bin):
... irrelevant bootloader code ...
00000200 8D4C2404 lea ecx,[esp+0x4]
00000204 83E4F0 and esp,byte -0x10
00000207 FF71FC push dword [ecx-0x4]
0000020A 55 push ebp
0000020B 89E5 mov ebp,esp
0000020D 51 push ecx
0000020E 83EC14 sub esp,byte +0x14
00000211 C745F400000000 mov dword [ebp-0xc],0x0
00000218 83EC0C sub esp,byte +0xc
0000021B 8D45F4 lea eax,[ebp-0xc]
0000021E 50 push eax
0000021F E82F000000 call 0x253
00000224 83C410 add esp,byte +0x10
00000227 8945F4 mov [ebp-0xc],eax
0000022A FA cli
0000022B F4 hlt
0000022C 83EC0C sub esp,byte +0xc
0000022F 8D45F4 lea eax,[ebp-0xc]
00000232 50 push eax
00000233 E81B000000 call 0x253
00000238 83C410 add esp,byte +0x10
0000023B 8945F4 mov [ebp-0xc],eax
0000023E 83EC0C sub esp,byte +0xc
00000241 8D45F4 lea eax,[ebp-0xc]
00000244 50 push eax
00000245 E809000000 call 0x253
0000024A 83C410 add esp,byte +0x10
0000024D 8945F4 mov [ebp-0xc],eax
00000250 90 nop
00000251 EBFD jmp short 0x250
00000253 55 push ebp
00000254 89E5 mov ebp,esp
00000256 83EC10 sub esp,byte +0x10
00000259 FA cli
0000025A F4 hlt
0000025B C745FC01000000 mov dword [ebp-0x4],0x1
00000262 8B55FC mov edx,[ebp-0x4]
00000265 89D0 mov eax,edx
00000267 C1E002 shl eax,byte 0x2
0000026A 01D0 add eax,edx
0000026C 8945FC mov [ebp-0x4],eax
0000026F 8B55FC mov edx,[ebp-0x4]
00000272 89D0 mov eax,edx
00000274 C1E003 shl eax,byte 0x3
00000277 29D0 sub eax,edx
00000279 8945FC mov [ebp-0x4],eax
0000027C 836DFC06 sub dword [ebp-0x4],byte +0x6
00000280 8B55FC mov edx,[ebp-0x4]
00000283 89D0 mov eax,edx
00000285 C1E003 shl eax,byte 0x3
00000288 01D0 add eax,edx
0000028A 8945FC mov [ebp-0x4],eax
0000028D 8B4508 mov eax,[ebp+0x8]
00000290 8B55FC mov edx,[ebp-0x4]
00000293 8910 mov [eax],edx
00000295 8B45FC mov eax,[ebp-0x4]
00000298 C9 leave
00000299 C3 ret
The cli & hlt pairs are for debugging with qemu, qemu has not halted on them. As you can see the 3 call instructions are perfectly normal. However running qemu and running info registers produces:
QEMU 6.2.0 monitor - type 'help' for more information
(qemu) info registers
... irrelevant ...
EIP=00007e50 ... irrelevant ...
... irrelevant ...
As you can see, eip is 7e50, the infinite loop! This should not have happened, because there are cli and hlt instructions after the function call (not triggered) and the function (not triggered). If I use gdb, putting a breakpoint on 7e00, the memory address of the kernel, after that continuing and using si sees gdb go into a call to the function, only to have the next instruction be in the infinite loop!
Finally ill provide the files.
Makefile:
PRINTDIRECTORY = --no-print-directory
BOOTLOADER-PARTFILE = int/parts/boot.prt
BOOTLOADER-OBJECTFILE = int/boot.o
BOOTLOADER-SOURCEFILE = src/boot.s
KERNEL-PARTFILE = int/parts/detailed-boot.prt
KERNEL-OBJECTFILE = int/detailed-boot.o
KERNEL-SOURCEFILE = src/detailed-boot.c
GCC = ~/opt/cross/bin/i686-elf-gcc
LD = ~/opt/cross/bin/i686-elf-ld
VM = qemu-system-i386
SYSFILE = lizard.bin
full:
make bootloader $(PRINTDIRECTORY)
make kernel $(PRINTDIRECTORY)
truncate -s 32768 ./int/parts/detailed-boot.prt
make join $(PRINTDIRECTORY)
bootloader:
as -o $(BOOTLOADER-OBJECTFILE) $(BOOTLOADER-SOURCEFILE)
ld -o $(BOOTLOADER-PARTFILE) --oformat binary -e init $(BOOTLOADER-OBJECTFILE) -Ttext 0x7c00
kernel:
$(GCC) -ffunction-sections -ffreestanding $(KERNEL-SOURCEFILE) -o $(KERNEL-OBJECTFILE) -nostdlib -Wall -Wextra -O0
$(LD) -o $(KERNEL-PARTFILE) -Ttext 0x7e00 --oformat binary $(KERNEL-OBJECTFILE) -e main --script=LDfile -O 0 -Ttext-segment 0x7e00
join:
cat $(BOOTLOADER-PARTFILE) $(KERNEL-PARTFILE) > $(SYSFILE)
run:
$(VM) $(SYSFILE)
debug:
$(VM) $(SYSFILE) -gdb tcp:localhost:6000 -S
LDfile:
ENTRY(main)
SECTIONS {
. = 0x7e00;
.text . : { *(.text) }
.data . : { *(.data) }
.bss . : { *(.bss ) }
}
src/detailed-boot.c:
//#include "stdc/stdbool.h"
//#include "stdc/stdio.h"
asm(".code32");
int a(int *d);
int main() {
int c = 0;
c = a(&c);
asm("cli");
asm("hlt");
c = a(&c);
c = a(&c);
while(1);
}
int a(int *d) {
asm("cli");
asm("hlt");
int b = 1;
b *= 5;
b *= 7;
b -= 6;
b *= 9;
*d = b;
return b;
}
//#include "stdc/stdio.c"
src/boot.s:
.code16 # 16 bit mode
.global init # make label init global
init:
call enableA20
reset:
mov $0x00, %ah # 0 = reset drive
mov $0x80, %dl # boot disk
int $0x13
jc reset
load:
mov $0x42, %ah # 42 = extended read
mov $0x8000, %si
xor %bx, %bx
movl $0x00007e00, %ds:4 (%si,1)
movl $0x00400010, %ds:0 (%si,1)
mov %cs, %ds:6 (%si,1)
movl $0x00000001, %ds:8 (%si,1) # start sector in lba
movl $0x00000000, %ds:12(%si,1) # start sector in lba
int $0x13
# 1. Disable interrupts
cli
# 2. Load GDT
lgdt (gdt_descriptor)
# set 32 bit mode
mov %cr0, %eax
or $1, %eax
mov %eax, %cr0
# Far jmp
jmp %cs:(code32)
checkA20:
push %ds
xor %ax, %ax
mov %ax, %ds
movw $0xAA55, %ax
movw $0x7DFE, %bx
movw (%bx), %bx
cmpw %ax, %bx
jnz checkA20_enabled
checkA20_disabled:
xor %ax, %ax
jmp checkA20_done
checkA20_enabled:
xor %ax, %ax
inc %ax
checkA20_done:
pop %ds
ret
enableA20:
call checkA20
jnz enableA20_enabled
enableA20_int15:
mov $0x2403, %ax # A20 gate support
int $0x15
jb enableA20_keyboardController # INT 15 aint supported
cmp $0, %ah
jnz enableA20_keyboardController # INT 15 aint supported
mov $0x2402, %ax # A20 status
int $0x15
jb enableA20_keyboardController # couldnt get status
cmp $0, %ah
jnz enableA20_keyboardController # couldnt get status
cmp $1, %al
jz enableA20_enabled # A20 is activated
mov $0x2401, %ax # A20 activation
int $0x15
jb enableA20_keyboardController # couldnt activate
cmp $0, %ah
jnz enableA20_keyboardController # couldnt activate
enableA20_keyboardController:
call checkA20
jnz enableA20_enabled
cli
call enableA20_wait
mov $0xAD, %al
out %al, $0x64
call enableA20_wait
mov $0xD0, %al
out %al, $64
call enableA20_wait2
in $0x60, %al
push %eax
call enableA20_wait
mov $0xD1, %al
out %al, $0x64
call enableA20_wait
pop %eax
or $2, %al
out %al, $0x60
call enableA20_wait
mov $0xAE, %al
out %al, $0x64
call enableA20_wait
sti
enableA20_fastA20:
call checkA20
jnz enableA20_enabled
in $0x92, %al
test $2, %al
jnz enableA20_postFastA20
or $2, %al
and $0xFE, %al
out %al, $92
enableA20_postFastA20:
call checkA20
jnz enableA20_enabled
cli
hlt
enableA20_enabled:
ret
enableA20_wait:
in $0x64, %al
test $2, %al
jnz enableA20_wait
ret
enableA20_wait2:
in $0x64, %al
test $1, %al
jnz enableA20_wait2
ret
setGDT: ret
# NOTE limit is the length
# NOTE base is the start
# NOTE base + limit = last address
gdt_start:
gdt_null:
# null descriptor
.quad 0
gdt_data:
.word 0x01c8 # limit: bits 0-15
.word 0x0000 # base: bits 0-15
.byte 0x00 # base: bits 16-23
# segment presence: yes (+0x80)
# descriptor priviledge level: ring 0 (+0x00)
# descriptor type: code/data (+0x10)
# executable: no (+0x00)
# direction bit: grows up (+0x00)
# writable bit: writable (+0x02)
# accesed bit [best left 0, cpu will deal with it]: no (+0x00)
.byte 0x80 + 0x10 + 0x02
# granularity flag: limit scaled by 4kib (+0x80)
# size flag: 32 bit pm (+0x40)
# long mode flag: 32pm/16pm/data (+0x00)
# reserved: reserved (+0x00)
.byte 0x80 + 0x40 # flags: granularity # 4-7 limit: bits 16-19 # 0-3
.byte 0x00 # base: bits 24-31
gdt_code:
.word 0x0100 # limit: bits 0-15
.word 0x8000 # base: bits 0-15
.byte 0x1c # base: bits 16-23
# segment presence: yes (+0x80)
# descriptor priviledge level: ring 0 (+0x00)
# descriptor type: code/data (+0x10)
# executable: yes (+0x08)
# conforming bit [0: only ring 0 can execute this]: no (+0x00)
# readable bit: yes (0x02)
# accessed bit [best left 0, cpu will deal with it]: no (0x00)
.byte 0x80 + 0x10 + 0x08 + 0x02
# granularity flag: limit scaled by 4kib (+0x80)
# size flag: 32 bit pm (+0x40)
# long mode flag: 32pm/16pm/data (+0x00)
# reserved: reserved (+0x00)
.byte 0x80 + 0x40 + 0x00 # flags: granularity # 4-7 limit: bits 16-19 # 0-3
.byte 0x00 # base: bits 24-31
gdt_end:
gdt_descriptor:
.word gdt_end - gdt_start - 1
.long gdt_start
.code32
code32:
mov %ds, %ax
mov %ax, %ds
# mov %ax, %ss
mov %ax, %es
mov %ax, %fs
mov %ax, %gs
movl $0x4000, %ebp
mov %esp, %ebp
push $0x7e00
ret
.fill 500-(.-init)
.quad 1
.word 1
.word 0xaa55
kernel:
I know that this is not a minimum scenario, I apologize.
I'll end this off by giving a link to the github repo: https://github.com/saltq144/lizard
and the cross compiler tutorial i followed: https://wiki.osdev.org/GCC_Cross-Compiler
To address some comments: I have not configured the IDT, an NMI would cause a triple fault or jump to garbage, not the loop. Trying to modify SS caused a triple fault from my limited testing. And I do agree that .code32 in the .c file is pointless, but the cross compiler is i686 so 64-bit code shouldn't be an issue, however i'll look into it.
Note:
Using inline assembly, I am able to insert two nop instructions to allow function calls to work. This is not an ideal solution, but it will have to work until this issue is fully resolved. Compiler optimizations may break this, but they haven't yet.
When you switch to protected mode (at the jmp %cs:(code32)) CS is loaded with "protected mode compatible" information from the GDT.
At the start of code32: all other segment registers contain real mode values. You copy the real mode value that is not compatible with protected mode from DS (at mov %ds, %ax) into all data segment registers. This real mode value from DS is probably 0x0000. In protected mode that refers to the "null descriptor".
This is why you can't do mov %ax, %ss - the CPU will not allow you to use "null descriptor" for the stack segment (it will give you a general protection fault instead). Because you don't load SS with protected mode compatible values, it's left using old values from real mode - an unknown base address, a 64 KiB segment limit, and a 16-bit default stack pointer size.
The consequence of all this is... as soon as you do any normal memory access (e.g. the push dword [ecx-0x4] which uses DS as an implied segment register like [ds: ecx-0x4]) you will get a general protection fault because DS is set to the null descriptor. Because you haven't set up a protected mode IDT the CPU will just use the values real mode was using for its IVT, causing CPU to think unknown trash (that would've been for "interrupt 0x1A" in real mode and not "interupt 0x0D" due to IDT entries being twice as big) is the IDT entry for the general protection fault handler. There's no easy way to predict what happens after that (maybe the unknown trash isn't a valid IDT entry for protected mode and it causes a double fault, maybe it is a "valid enough" IDT entry and you start executing garbage at an unknown address).

20 bytes are reserved on the stack for no apparent reason when C code is compiled into machine code

I'm following section 5.1.3 at OS development by Nick Blundell. I'm investigating how the following C code is compiled into machine code:
void caller_fun(){
callee_fun(0xdede);
}
int callee_fun(int arg){
return arg;
}
My final dis-assembled machine code by ndisasm is this:
00000000 55 push ebp
00000001 89E5 mov ebp,esp
00000003 83EC08 sub esp,byte +0x8
00000006 83EC0C sub esp,byte +0xc
00000009 68DEDE0000 push dword 0xdede
0000000E E806000000 call dword 0x19
00000013 83C410 add esp,byte +0x10
00000016 90 nop
00000017 C9 leave
00000018 C3 ret
00000019 55 push ebp
0000001A 89E5 mov ebp,esp
0000001C 8B4508 mov eax,[ebp+0x8]
0000001F 5D pop ebp
00000020 C3 ret
Studying on how the stack pointer and base pointer work, I made the following diagram which shows stack situation when the opcode at offset 0x1C is being run by the processor:
Stack situation when processor is
running `mov eax,[ebp+0x8]` at offset 0x1C
+---------------------------------+
| 4 bytes for |
| `push ebp` at offset 0x00 |
+---------------------------------+
| 20 (8+12) bytes for |
| `sub esp,byte +0x8` |
| and `sub esp,byte +0xc` |
| at offsets 0x03 and 0x06 |
+---------------------------------+
| 4 bytes for `push dword 0xdede` |
| at offset 0x09 |
+---------------------------------+
| 4 bytes for instruction pointer |
| by `call dword 0x19` |
| at offset 0x0E |
+---------------------------------+
| 4 bytes for `push ebp` |
| at offset 0x19 |
+---------------------------------+ --> ebp & esp are both here by
`mov ebp,esp`
at offset 0x1A
Now, I have questions which I couldn't figure out by researching and studying:
Is my diagram of stack situation correct?
Why are 20 bytes pushed into the stack by sub esp,byte +0x8 and sub esp,byte +0xc at offsets 0x03 and 0x06?
Even if 20 bytes of stack memory is needed, why is it not assigned by a single instruction like sub esp,byte +0x14, i.e. 0x14=0x8+0xc
I'm compiling the C code with this make-file:
all: call_fun.o call_fun.bin call_fun.dis
call_fun.o: call_fun.c
gcc -ffreestanding -c call_fun.c -o call_fun.o
call_fun.bin: call_fun.o
ld -o call_fun.bin -Ttext 0x0 --oformat binary call_fun.o
call_fun.dis: call_fun.bin
ndisasm -b 32 call_fun.bin > call_fun.dis
Without the optimizations, stack is going to be used to preserve and restore base pointers. In x86_64 calling conventions (https://en.wikipedia.org/wiki/X86_calling_conventions) stack must be aligned by 16 byte boundaries when calling functions, so this is most likely what happens in your case. At least, this is what happens in my case, when I compile your code on my system. Here is the ASM for this:
callee_fun(int): # #callee_fun(int)
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
popq %rbp
retq
caller_fun(): # #caller_fun()
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $57054, %edi # imm = 0xDEDE
callq callee_fun(int)
movl %eax, -4(%rbp) # 4-byte Spill
addq $16, %rsp
popq %rbp
retq
It is worth noting, that when optimizations are turned on, there is no stack usage or modifications at all:
callee_fun(int): # #callee_fun(int)
movl %edi, %eax
retq
caller_fun(): # #caller_fun()
retq
Last, but not least, when playing with ASM listing, do not disassemble object file or executable file. Instead, direct your compiler to generate assembly listing. This will give you much more context.
If you are using gcc, a good command to do this would be
gcc -fverbose-asm -S -O

How to use lea instruction in a subroutine using GAS

I'm trying to convert a NASM code to GAS. I can't make the lea instruction work.
Here's my original code and this completely works:
section .bss
arr resb 10
section .text
global _start:
_start:
push arr
call getInput
...
getInput:
mov esi, 0
mov ebp, [esp+4]
loop:
...
mov eax, 3
mov ebx, 0
lea ecx, [ebp+esi]
mov edx, 2
int 80h
...
And here's the GAS counterpart I'm trying to write:
.data
arr: .space 10
.text
.globl _start
_start:
push arr
call getInput
...
getInput:
movl $0, %esi
movl 4(%esp,1), %ebp
loop:
...
movl $3, %eax
movl $0, %ebx
leal (%ebp,%esi), %ecx
movl $2, %edx
int $0x80
I've been searching for hours on how to properly do it but I can't find a tutorial on this matter. It produces a segmentation fault when I run it. Please help me.
P.S. I use these commands to compile and link (thanks to someone here who answered my previous question):
as --32 -o sample.o sample.s
ld -m elf_i386 -o sample sample.o

Error: Illegal Instruction of shellcode (at&t) for helloworld.

I am trying to learn how to write shellcode. After searching around, I wrote my own shellcode for hello world. I think the logic is correct, but somehow when I compile the wrapper with the shellcode, it always gives me "illegal instruction".
Could anybody help me to check what is wrong with this code:
Shellcode
.section .data
.section .text
.globl _start
jmp dummy
_start:
# write(1, message, 13)
mov $4, %al # system call 4 is write
mov $1, %bl # file handle 1 is stdout
popl %ecx
mov $12, %dl # number of bytes to write
int $0x80 # invoke operating system code
# exit(0)
xor %eax, %eax
mov $1, %al # system call 1 is exit
xor %ebx, %ebx # we want return code 0
int $0x80 # invoke operating system code
dummy:
call _start
.string "Hello, World"
After running objdump:
file format elf32-i386
Disassembly of section .text:
00000000 <_start-0x2>:
0: eb 11 jmp 13 <dummy>
00000002 <_start>:
2: b0 04 mov $0x4,%al
4: b3 01 mov $0x1,%bl
6: 59 pop %ecx
7: b2 0c mov $0xc,%dl
9: cd 80 int $0x80
b: 31 c0 xor %eax,%eax
d: b0 01 mov $0x1,%al
f: 31 db xor %ebx,%ebx
11: cd 80 int $0x80
00000013 <dummy>:
13: e8 fc ff ff ff call 14 <dummy+0x1>
18: 48 dec %eax
19: 65 gs
1a: 6c insb (%dx),%es:(%edi)
1b: 6c insb (%dx),%es:(%edi)
1c: 6f outsl %ds:(%esi),(%dx)
1d: 2c 20 sub $0x20,%al
1f: 57 push %edi
20: 6f outsl %ds:(%esi),(%dx)
21: 72 6c jb 8f <dummy+0x7c>
23: 64 fs
...
The C Wrapper I used
char code[] = "\xeb\x11"
"\xb0\x04"
"\xb3\x01"
"\x59"
"\xb2\x0c"
"\xcd\x80"
"\x31\xc0"
"\xb0\x01"
"\x31\xdb"
"\xcd\x80"
"\xe8\xfc\xff\xff\xff"
"\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64";
void main() {
int (*func)();
func = (int(*)()) code;
(int) (*func)();
}
There are several problems with your code.
First and foremost, you're only setting the low byte of all of the parameter registers (namely al, bl, and dl). You need to set the full 32 bits. When you execute the way it is now, whatever is left in the remaining 24 bits gets passed to the kernel.
Also, in your C code, the call is not correct:
"\xe8\xfc\xff\xff\xff"
That's essentially call $+1 which is the second byte of the call instruction, which is why you're getting the illegal instruction.
I'm not sure how you arrived at the byte in your code variable, but you need to re-assemble.
Tested with gcc 4.7.2 on Fedora 17, with gcc -m32. (Sorry, I only use Intel syntax)
char code[] __attribute__((section(".text"))) =
"\xeb\x17" // jmp $+19
"\xB8\x04\x00\x00\x00" // mov eax, 4 ; (sys_write)
"\x31\xDB" // xor ebx, ebx
"\x43" // inc ebx
"\x59" // pop ecx ; (addr of string pushed by call below)
"\x31\xD2" // xor edx, edx
"\xb2\x0c" // mov dl, 0Ch ; (length of string)
"\xcd\x80" // int 80h
"\x31\xc0" // xor eax, eax
"\xb0\x01" // mov al, 1 ; (sys_exit)
"\x31\xdb" // xor ebx, ebx
"\xcd\x80" // int 80h
"\xe8\xe4\xff\xff\xff" // call $-23
"\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x00"; // "Hello, World"
void main() {
int (*func)();
func = (int(*)()) code;
(int) (*func)();
}
Note that there are certainly ways to make the code smaller, but that is of course left as an exercise for the reader.
If you're going to play around with hand-tweaked assembly like this, be prepared to debug, debug, debug. Learn how to use GDB now, or you will be forever helpless. Set a breakpoint on the beginning of the assembly (b code) and step through it. You'll quickly see what went wrong.

Understand the assembly code generated by a simple C program

I am trying to understand the assembly level code for a simple C program by inspecting it with gdb's disassembler.
Following is the C code:
#include <stdio.h>
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
Following is the disassembly code for both main and function
gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>: push %ebp
0x08048429 <main+1>: mov %esp,%ebp
0x0804842b <main+3>: and $0xfffffff0,%esp
0x0804842e <main+6>: sub $0x10,%esp
0x08048431 <main+9>: movl $0x3,0x8(%esp)
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function>
0x0804844d <main+37>: leave
0x0804844e <main+38>: ret
End of assembler dump.
(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>: push %ebp
0x08048405 <function+1>: mov %esp,%ebp
0x08048407 <function+3>: sub $0x28,%esp
0x0804840a <function+6>: mov %gs:0x14,%eax
0x08048410 <function+12>: mov %eax,-0xc(%ebp)
0x08048413 <function+15>: xor %eax,%eax
0x08048415 <function+17>: mov -0xc(%ebp),%eax
0x08048418 <function+20>: xor %gs:0x14,%eax
0x0804841f <function+27>: je 0x8048426 <function+34>
0x08048421 <function+29>: call 0x8048340 <__stack_chk_fail#plt>
0x08048426 <function+34>: leave
0x08048427 <function+35>: ret
End of assembler dump.
I am seeking answers for following things :
how the addressing is working , I mean (main+0) , (main+1), (main+3)
In the main, why is $0xfffffff0,%esp being used
In the function, why is %gs:0x14,%eax , %eax,-0xc(%ebp) being used.
If someone can explain , step by step happening, that will be greatly appreciated.
The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on, is because each instruction takes up a variable number of bytes. For example:
main+0: push %ebp
is a one-byte instruction so the next instruction is at main+1. On the other hand,
main+3: and $0xfffffff0,%esp
is a three-byte instruction so the next instruction after that is at main+6.
And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation for that is as follows.
Instruction length depends not only on the opcode (such as movl) but also the addressing modes for the operands as well (the things the opcode are operating on). I haven't checked specifically for your code but I suspect the
movl $0x1,(%esp)
instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:
movl $0x2,0x4(%esp)
requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4.
In fact, here's a debug session showing what I mean:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700 MOV WORD PTR [DI],0007
0B52:0104 C745020800 MOV WORD PTR [DI+02],0008
0B52:0109 C745000700 MOV WORD PTR [DI+00],0007
-q
c:\pax> _
You can see that the second instruction with an offset is actually different to the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and actually has a different encoding c745 instead of c705.
You can also see that you can encode the first and third instruction in two different ways but they basically do the same thing.
The and $0xfffffff0,%esp instruction is a way to force esp to be on a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors will be more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.
After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16 byte boundary.
The gs: prefix simply means to use the gs segment register to access memory rather than the default.
The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) and then put the value of eax into that memory location.
Re the explanation of the code. Your function function is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the afore-mentioned %gs:14 memory location.
It loads the value from that location (probably something like 0xdeadbeef) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted.
Its job, in this case, is nothing. So all you see is the function administration stuff.
Stack set-up occurs between function+0 and function+12. Everything after that is setting up the return code in eax and tearing down the stack frame, including the corruption check.
Similarly, main consist of stack frame set-up, pushing the parameters for function, calling function, tearing down the stack frame and exiting.
Comments have been inserted into the code below:
0x08048428 <main+0>: push %ebp ; save previous value.
0x08048429 <main+1>: mov %esp,%ebp ; create new stack frame.
0x0804842b <main+3>: and $0xfffffff0,%esp ; align to boundary.
0x0804842e <main+6>: sub $0x10,%esp ; make space on stack.
0x08048431 <main+9>: movl $0x3,0x8(%esp) ; push values for function.
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function> ; and call it.
0x0804844d <main+37>: leave ; tear down frame.
0x0804844e <main+38>: ret ; and exit.
0x08048404 <func+0>: push %ebp ; save previous value.
0x08048405 <func+1>: mov %esp,%ebp ; create new stack frame.
0x08048407 <func+3>: sub $0x28,%esp ; make space on stack.
0x0804840a <func+6>: mov %gs:0x14,%eax ; get sentinel value.
0x08048410 <func+12>: mov %eax,-0xc(%ebp) ; put on stack.
0x08048413 <func+15>: xor %eax,%eax ; set return code 0.
0x08048415 <func+17>: mov -0xc(%ebp),%eax ; get sentinel from stack.
0x08048418 <func+20>: xor %gs:0x14,%eax ; compare with actual.
0x0804841f <func+27>: je <func+34> ; jump if okay.
0x08048421 <func+29>: call <_stk_chk_fl> ; otherwise corrupted stack.
0x08048426 <func+34>: leave ; tear down frame.
0x08048427 <func+35>: ret ; and exit.
I think the reason for the %gs:0x14 may be evident from above but, just in case, I'll elaborate here.
It uses this value (a sentinel) to put in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:
char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");
then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.
If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value which is detected in the code with the je instruction.
The relevant bit is paraphrased here:
mov %gs:0x14,%eax ; get sentinel value.
mov %eax,-0xc(%ebp) ; put on stack.
;; Weave your function
;; magic here.
mov -0xc(%ebp),%eax ; get sentinel back from stack.
xor %gs:0x14,%eax ; compare with original value.
je stack_ok ; zero/equal means no corruption.
call stack_bad ; otherwise corrupted stack.
stack_ok: leave ; tear down frame.
Pax has produced a definitive answer. However, for completeness, I thought I'd add a note on getting GCC itself to show you the assembly it generates.
The -S option to GCC tells it to stop compilation and write the assembly to a file. Normally, it either passes that file to the assembler or for some targets writes the object file directly itself.
For the sample code in the question:
#include <stdio.h>
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
the command gcc -S q3654898.c creates a file named q3654898.s:
.file "q3654898.c"
.text
.globl _function
.def _function; .scl 2; .type 32; .endef
_function:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
leave
ret
.def ___main; .scl 2; .type 32; .endef
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call _function
leave
ret
One thing that is evident is that my GCC (gcc (GCC) 3.4.5 (mingw-vista special r3)) doesn't include the stack check code by default. I imagine that there is a command line option, or that if I ever got around to nudging my MinGW install up to a more current GCC that it could.
Edit: Nudged to do so by Pax, here's another way to get GCC to do more of the work.
C:\Documents and Settings\Ross\My Documents\testing>gcc -Wa,-al q3654898.c
q3654898.c: In function `main':
q3654898.c:8: warning: return type of 'main' is not `int'
GAS LISTING C:\DOCUME~1\Ross\LOCALS~1\Temp/ccLg8pWC.s page 1
1 .file "q3654898.c"
2 .text
3 .globl _function
4 .def _function; .scl 2; .type
32; .endef
5 _function:
6 0000 55 pushl %ebp
7 0001 89E5 movl %esp, %ebp
8 0003 83EC28 subl $40, %esp
9 0006 C9 leave
10 0007 C3 ret
11 .def ___main; .scl 2; .type
32; .endef
12 .globl _main
13 .def _main; .scl 2; .type 32;
.endef
14 _main:
15 0008 55 pushl %ebp
16 0009 89E5 movl %esp, %ebp
17 000b 83EC18 subl $24, %esp
18 000e 83E4F0 andl $-16, %esp
19 0011 B8000000 movl $0, %eax
19 00
20 0016 83C00F addl $15, %eax
21 0019 83C00F addl $15, %eax
22 001c C1E804 shrl $4, %eax
23 001f C1E004 sall $4, %eax
24 0022 8945FC movl %eax, -4(%ebp)
25 0025 8B45FC movl -4(%ebp), %eax
26 0028 E8000000 call __alloca
26 00
27 002d E8000000 call ___main
27 00
28 0032 C7442408 movl $3, 8(%esp)
28 03000000
29 003a C7442404 movl $2, 4(%esp)
29 02000000
30 0042 C7042401 movl $1, (%esp)
30 000000
31 0049 E8B2FFFF call _function
31 FF
32 004e C9 leave
33 004f C3 ret
C:\Documents and Settings\Ross\My Documents\testing>
Here we see an output listing produced by the assembler. (Its name is GAS, because it is Gnu's version of the classic *nix assembler as. There's humor there somewhere.)
Each line has most of the following fields: a line number, an address in the current section, bytes stored at that address, and the source text from the assembly source file.
The addresses are offsets into that portion of each section provided by this module. This particular module only has content in the .text section which stores executable code. You will typically find mention of sections named .data and .bss as well. Lots of other names are used and some have special purposes. Read the manual for the linker if you really want to know.
It will be better to try the -fno-stack-protector flag with gcc to disable the canary and see your results.
I'd like to add that for simple stuff, GCC's assembly output is often easier to read if you turn on a little optimization. Here's the sample code again...
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
/* corrected calling convention of main() */
int main() {
function(1,2,3);
return 0;
}
this is what I get without optimization (OSX 10.6, gcc 4.2.1+Apple patches)
.globl _function
_function:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $36, %esp
call L4
"L00000000001$pb":
L4:
popl %ebx
leal L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl (%eax), %edx
movl %edx, -12(%ebp)
xorl %edx, %edx
leal L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl -12(%ebp), %edx
xorl (%eax), %edx
je L3
call ___stack_chk_fail
L3:
addl $36, %esp
popl %ebx
leave
ret
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call _function
movl $0, %eax
leave
ret
Whew, one heck of a mouthful! But look what happens with -O on the command line...
.text
.globl _function
_function:
pushl %ebp
movl %esp, %ebp
leave
ret
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
leave
ret
Of course, you do run the risk of your code being rendered completely unrecognizable, especially at higher optimization levels and with more complicated stuff. Even here, we see that the call to function has been discarded as pointless. But I find that not having to read through dozens of unnecessary stack spills is generally more than worth a little extra scratching my head over the control flow.

Resources