How to use lea instruction in a subroutine using GAS - arrays

I'm trying to convert some NASM code to GAS, but I can't make the lea instruction work.
Here's my original code, which works completely:
section .bss
arr resb 10
section .text
global _start:
_start:
push arr
call getInput
...
getInput:
mov esi, 0
mov ebp, [esp+4]
loop:
...
mov eax, 3
mov ebx, 0
lea ecx, [ebp+esi]
mov edx, 2
int 80h
...
And here's the GAS counterpart I'm trying to write:
.data
arr: .space 10
.text
.globl _start
_start:
push arr
call getInput
...
getInput:
movl $0, %esi
movl 4(%esp,1), %ebp
loop:
...
movl $3, %eax
movl $0, %ebx
leal (%ebp,%esi), %ecx
movl $2, %edx
int $0x80
I've been searching for hours on how to properly do it but I can't find a tutorial on this matter. It produces a segmentation fault when I run it. Please help me.
P.S. I use these commands to compile and link (thanks to someone here who answered my previous question):
as --32 -o sample.o sample.s
ld -m elf_i386 -o sample sample.o

Related

Return of syscall with wrong file descriptor is not negative [duplicate]

I'm having trouble finding good documentation for writing 64-bit assembly on macOS.
The 64-bit SysV ABI says the following in section A.2.1 and this SO post quotes it:
A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
Returning from the syscall, register %rax contains the result of the
system-call. A value in the range between -4095 and -1 indicates an error,
it is -errno.
Those two sentences are ok on Linux but are wrong on macOS Sierra with the following code:
global _start
extern _exit
section .text
_start:
; Align stack to 16 bytes for libc
and rsp, 0xFFFFFFFFFFFFFFF0
; Call write
mov rdx, 12 ; size
mov rsi, hello ; buf
mov edi, 1 ; fd
mov rax, 0x2000004 ; write ; replace to mov rax, 0x1 on linux
syscall
jc .err ; Jumps on error on macOS, but why?
jnc .ok
.err:
mov rdi, -1
call _exit ; exit(-1)
.ok:
; Expect rdx to be 12, but it isn't on macOS!
mov rdi, rdx
call _exit ; exit(rdx)
; String for write
section .data
hello:
.str db `Hello world\n`
.len equ $-hello.str
Compile with NASM:
; MacOS: nasm -f macho64 syscall.asm && ld syscall.o -lc -macosx_version_min 10.12 -e _start -o syscall
; Linux: nasm -f elf64 syscall.asm -o syscall.o && ld syscall.o -lc -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o syscall
Run on macOS:
./syscall # Return value 0
./syscall >&- # Return value 255 (-1)
I found out that:
A syscall returns errno and sets the carry flag on error, instead of returning -errno in rax
rdx register is clobbered by syscall
On Linux, everything works as expected
Why is rdx clobbered? Why doesn't a syscall return -errno? Where can I find the real documentation?
The only place I found where someone talks about the carry flag for syscall errors is here
I used this:
# as hello.asm -o hello.o
# ld hello.o -macosx_version_min 10.13 -e _main -o hello -lSystem
.section __DATA,__data
str:
.asciz "Hello world!\n"
.section __TEXT,__text
.globl _main
_main:
movl $0x2000004, %eax # preparing system call 4
movl $1, %edi # STDOUT file descriptor is 1
movq str@GOTPCREL(%rip), %rsi # The value to print
movq $13, %rdx # the size of the value to print
syscall
movl %eax, %edi
movl $0x2000001, %eax # exit (return value of the call to write())
syscall
and was able to capture the return value in eax. Here the return value is the number of bytes actually written by the write system call. And yes, because macOS is a BSD variant, it is the carry flag that tells you whether the syscall failed (errno is just a variable with external linkage).
# hello_asm.s
# as hello_asm.s -o hello_asm.o
# ld hello_asm.o -e _main -o hello_asm
.section __DATA,__data
str:
.asciz "Hello world!\n"
good:
.asciz "OK\n"
.section __TEXT,__text
.globl _main
_main:
movl $0x2000004, %eax # preparing system call 4
movl $5, %edi # deliberately invalid file descriptor (5 is not open)
movq str@GOTPCREL(%rip), %rsi # The value to print
movq $13, %rdx # the size of the value to print
syscall
jc err
movl $0x2000004, %eax # preparing system call 4
movl $1, %edi # STDOUT file descriptor is 1
movq good@GOTPCREL(%rip), %rsi # The value to print
movq $3, %rdx # the size of the value to print
syscall
movl $0, %edi
movl $0x2000001, %eax # exit 0
syscall
err:
movl $1, %edi
movl $0x2000001, %eax # exit 1
syscall
This exits with error code 1 because descriptor 5 was used; if you use descriptor 1 instead, it works, printing the second message and exiting with 0.
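The two error conventions discussed above can be modelled in a few lines of C. This is only a sketch: the type syscall_result and the functions decode_linux and decode_bsd are hypothetical names invented for the illustration, not part of any system header. It assumes the raw value of rax and the carry flag have already been captured by the caller.

```c
#include <stdbool.h>

/* Hypothetical result type, used only for this illustration. */
typedef struct {
    bool failed;   /* true if the system call reported an error */
    long value;    /* errno on failure, otherwise the return value */
} syscall_result;

/* Linux x86-64 convention: a raw rax in [-4095, -1] means -errno. */
syscall_result decode_linux(long rax)
{
    syscall_result r;
    r.failed = (rax >= -4095 && rax <= -1);
    r.value = r.failed ? -rax : rax;
    return r;
}

/* macOS/BSD convention: the carry flag signals failure and
   rax then holds the (positive) errno value. */
syscall_result decode_bsd(long rax, bool carry)
{
    syscall_result r;
    r.failed = carry;
    r.value = rax;
    return r;
}
```

For a write to a closed descriptor, decode_linux(-9) and decode_bsd(9, true) both describe the same failure (errno 9, EBADF); only the encoding differs.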
I don't know why rdx gets clobbered, but I can confirm that it does seem to get zeroed across the "write" system call. I examined the state of every register:
global _start
section .text
_start:
mov rax, 0xDEADBEEF; 0xDEADBEEF = 3735928559; 3735928559 mod 256 = 239
mov rbx, 0xDEADBEEF
mov rcx, 0xDEADBEEF
mov rdx, 0xDEADBEEF
mov rsi, 0xDEADBEEF
mov rdi, 0xDEADBEEF
mov rsp, 0xDEADBEEF
mov rbp, 0xDEADBEEF
mov r8, 0xDEADBEEF
mov r9, 0xDEADBEEF
mov r10, 0xDEADBEEF
mov r11, 0xDEADBEEF
mov r12, 0xDEADBEEF
mov r13, 0xDEADBEEF
mov r14, 0xDEADBEEF
mov r15, 0xDEADBEEF
mov rdx, len2 ; size
mov rsi, msg2 ; buf
mov rdi, 1 ; fd
mov rax, 0x2000004 ; write
syscall
mov rdi, rsi ; CHANGE THIS TO EXAMINE DIFFERENT REGISTERS
mov rax, 0x2000001 ; exit
syscall
section .data
msg_pad db `aaaa\n` ; to make the buffer not to be page-aligned
msg2 db `bbbbbb\n` ; because then it's easier to notice whether
len2 equ $-msg2 ; clobbered or not
nasm -f macho64 syscall.asm && ld syscall.o -e _start -static && ./a.out; echo "status: $?"
The results I got:
clobber list of a "write" syscall
rax clobbered
rbx not clobbered
rcx clobbered
rdx clobbered <- This is the unexpected case?!
rsi not clobbered
rdi not clobbered
rsp not clobbered
rbp not clobbered
r8 not clobbered
r9 not clobbered
r10 not clobbered
r11 clobbered
r12 not clobbered
r13 not clobbered
r14 not clobbered
r15 not clobbered
It would be interesting to know whether other syscalls zero rdx too; I didn't have the energy to attempt a thorough investigation. But maybe, just to be safe, one should add rdx to the clobber list of all macOS syscalls from now on.

Unexpected behaviour in simple pointer arithmetics in kernel space C code [duplicate]

I am currently following this workbook on building an operating system.
My intention is to write a 64-bit kernel. I have got as far as loading the "kernel" code and writing individual characters to the frame buffer while in text mode.
My problem appears when I add a level of indirection to writing a single character to the frame buffer by wrapping the code in a function. It would appear that the char value passed into the function is being corrupted in some way.
I have three files:
bootloader.asm
; bootloader.asm
[org 0x7c00]
KERNEL_OFFSET equ 0x1000
mov bp, 0x9000
mov sp, bp
; load the kernel from boot disk
mov bx, KERNEL_OFFSET
mov dl, dl ; boot drive is set to dl
mov ah, 0x02 ; bios read sector
mov al, 15 ; read 15 sectors
mov ch, 0x00 ; cylinder 0
mov cl, 0x02 ; read from 2nd sector
mov dh, 0x00 ; select head 0
int 0x13
; THERE COULD BE ERRORS HERE BUT FOR NOW ASSUME IT WORKS
; switch to protected mode
cli
lgdt [gdt.descriptor]
mov eax, cr0
or eax, 1
mov cr0, eax
jmp CODE_SEGMENT:start_protected_mode
[bits 32]
start_protected_mode:
mov ax, DATA_SEGMENT
mov ds, ax
mov ss, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ebp, 0x90000
mov esp, ebp
call KERNEL_OFFSET
jmp $
[bits 16]
gdt: ; Super Simple Global Descriptor Table
.start:
.null:
dd 0x0
dd 0x0
.code:
dw 0xffff
dw 0x0
db 0x0
db 10011010b
db 11001111b
db 0x0
.data:
dw 0xffff
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
.end:
.descriptor:
dw .end - .start
dd .start
CODE_SEGMENT equ gdt.code - gdt.start
DATA_SEGMENT equ gdt.data - gdt.start
times 510-($-$$) db 0
dw 0xaa55
bootkernel.asm
[bits 32]
[extern main]
[global _start]
_start:
call main
jmp $
kernel.c
// LEGACY MODE VIDEO DRIVER
#define FRAME_BUFFER_ADDRESS 0xb8002
#define GREY_ON_BLACK 0x07
#define WHITE_ON_BLACK 0x0f
void write_memory(unsigned long address, unsigned int index, unsigned char value)
{
unsigned char * memory = (unsigned char *) address;
memory[index] = value;
}
unsigned int frame_buffer_offset(unsigned int col, unsigned int row)
{
return 2 * ((row * 80u) + col);
}
void write_frame_buffer_cell(unsigned char c, unsigned char a, unsigned int col, unsigned int row)
{
unsigned int offset = frame_buffer_offset(col, row);
write_memory(FRAME_BUFFER_ADDRESS, offset, c);
write_memory(FRAME_BUFFER_ADDRESS, offset + 1, a);
}
void main()
{
unsigned int offset = frame_buffer_offset(0, 1);
write_memory(FRAME_BUFFER_ADDRESS, offset, 'A');
write_memory(FRAME_BUFFER_ADDRESS, offset + 1, GREY_ON_BLACK);
write_frame_buffer_cell('B', GREY_ON_BLACK, 0, 1);
}
The .text section is linked to start from 0x1000 which is where the bootloader expects the kernel to start.
The linker.ld script is
SECTIONS
{
. = 0x1000;
.text : { *(.text) } /* Kernel is expected at 0x1000 */
}
The Make file that puts this all together is:
bootloader.bin: bootloader.asm
nasm -f bin bootloader.asm -o bootloader.bin
bootkernel.o: bootkernel.asm
nasm -f elf64 bootkernel.asm -o bootkernel.o
kernel.o: kernel.c
gcc-6 -Wextra -Wall -ffreestanding -c kernel.c -o kernel.o
kernel.bin: bootkernel.o kernel.o linker.ld
ld -o kernel.bin -T linker.ld bootkernel.o kernel.o --oformat binary
os-image: bootloader.bin kernel.bin
cat bootloader.bin kernel.bin > os-image
qemu: os-image
qemu-system-x86_64 -d guest_errors -fda os-image -boot a
I've taken a screen shot of the output that I am getting. I expect 'A' to appear in the 0th column of the 1st row and for 'B' to appear on the 1st column of the 0th row. For some reason I am getting another character.
Output of gcc-6 -S kernel.c
.file "kernel.c"
.text
.globl write_memory
.type write_memory, @function
write_memory:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movq %rdi, -24(%rbp)
movl %esi, -28(%rbp)
movl %edx, %eax
movb %al, -32(%rbp)
movq -24(%rbp), %rax
movq %rax, -8(%rbp)
movl -28(%rbp), %edx
movq -8(%rbp), %rax
addq %rax, %rdx
movzbl -32(%rbp), %eax
movb %al, (%rdx)
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size write_memory, .-write_memory
.globl frame_buffer_offset
.type frame_buffer_offset, @function
frame_buffer_offset:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -8(%rbp), %edx
movl %edx, %eax
sall $2, %eax
addl %edx, %eax
sall $4, %eax
movl %eax, %edx
movl -4(%rbp), %eax
addl %edx, %eax
addl %eax, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size frame_buffer_offset, .-frame_buffer_offset
.globl write_frame_buffer_cell
.type write_frame_buffer_cell, @function
write_frame_buffer_cell:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %esi, %eax
movl %edx, -28(%rbp)
movl %ecx, -32(%rbp)
movb %dil, -20(%rbp)
movb %al, -24(%rbp)
movl -32(%rbp), %edx
movl -28(%rbp), %eax
movl %edx, %esi
movl %eax, %edi
call frame_buffer_offset
movl %eax, -4(%rbp)
movzbl -20(%rbp), %edx
movl -4(%rbp), %eax
movl %eax, %esi
movl $753666, %edi
call write_memory
movzbl -24(%rbp), %eax
movl -4(%rbp), %edx
leal 1(%rdx), %ecx
movl %eax, %edx
movl %ecx, %esi
movl $753666, %edi
call write_memory
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size write_frame_buffer_cell, .-write_frame_buffer_cell
.globl main
.type main, @function
main:
.LFB3:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $1, %esi
movl $0, %edi
call frame_buffer_offset
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
movl $65, %edx
movl %eax, %esi
movl $753666, %edi
call write_memory
movl -4(%rbp), %eax
addl $1, %eax
movl $7, %edx
movl %eax, %esi
movl $753666, %edi
call write_memory
movl $0, %ecx
movl $1, %edx
movl $7, %esi
movl $66, %edi
call write_frame_buffer_cell
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE3:
.size main, .-main
.ident "GCC: (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901"
.section .note.GNU-stack,"",@progbits
I can reproduce your exact output if the code is modified to be:
unsigned int offset = frame_buffer_offset(0, 1);
write_memory(FRAME_BUFFER_ADDRESS, offset, 'A');
write_memory(FRAME_BUFFER_ADDRESS, offset + 1, GREY_ON_BLACK);
write_frame_buffer_cell('B', GREY_ON_BLACK, 1, 0);
The difference being in the last line ('B', GREY_ON_BLACK, 1, 0);. Originally you had ('B', GREY_ON_BLACK, 0, 1); . This is in line with what you described you were trying to do when you said:
I've taken a screen shot of the output that I am getting. I expect 'A' to appear in the 0th column of the 1st row and for 'B' to appear on the 1st column of the 0th row.
I gather you may have posted the wrong code in this question. This is the output I get:
It seems you are new to OS development. Your bootloader code only places the CPU into 32-bit protected mode, but to run a 64-bit kernel you need to be in 64-bit longmode. If you are just getting started I'd suggest falling back to writing a 32-bit kernel for purposes of learning at this early stage. At the bottom I have a 64-bit long mode section with a link to a longmode tutorial that could be used to modify your bootloader to run 64-bit code.
Primary Issue Causing Unusual Behaviour
You are experiencing an issue primarily related to the fact that you are generating 64-bit code with GCC but you are running it in 32-bit protected mode according to your bootloader code. 64-bit code generation running in 32-bit protected mode may appear to execute, but it will do it incorrectly. In simple OSes where you are simply displaying to the video display you may often see unexpected output as a side effect. Your program could triple fault the machine, but you got unlucky that the side effect seemed to display something on the video display. You may have been under the false impression that things were working as they should when they really weren't.
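One concrete way the mismatch shows up: in 64-bit mode the bytes 0x40 to 0x4F are REX prefixes, but in 32-bit protected mode the same bytes are one-byte inc/dec instructions. As far as I can tell from the standard encodings, movq %rsp, %rbp assembles to 48 89 e5, which a CPU in 32-bit mode would run as dec eax followed by mov ebp, esp. The sketch below only tabulates that reinterpretation; legacy_meaning is a hypothetical helper name, not a real disassembler API.

```c
#include <stddef.h>

/* Returns how 32-bit mode decodes a byte that 64-bit mode treats as a
   REX prefix (0x40..0x4F), or NULL for any other byte. */
const char *legacy_meaning(unsigned char byte)
{
    static const char *const names[16] = {
        "inc eax", "inc ecx", "inc edx", "inc ebx",
        "inc esp", "inc ebp", "inc esi", "inc edi",
        "dec eax", "dec ecx", "dec edx", "dec ebx",
        "dec esp", "dec ebp", "dec esi", "dec edi"
    };

    if (byte < 0x40 || byte > 0x4F)
        return NULL;
    return names[byte - 0x40];
}
```

Every 64-bit instruction that carries a REX.W prefix (0x48) therefore decrements eax as a side effect when run in 32-bit mode, which is one way values can end up silently corrupted.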
This question is somewhat similar to another Stackoverflow question. After the original poster of that question made available a complete example it became clear that it was his issue. Part of my answer to him to resolve the issue was as follows:
Likely Cause of Undefined Behavior
After all the code and the make file were made available in EDIT 2 it became clear that one significant problem was that most of the code was compiled and linked to 64-bit objects and executables. That code won't work in 32-bit protected mode.
In the make file make these adjustments:
When compiling with GCC you need to add -m32 option
When assembling with GNU Assembler (as) targeting 32-bit objects you need to use --32
When linking with LD you need to add the -melf_i386 option
When assembling with NASM targeting 32-bit objects you need to change -f elf64 to -f elf32
With that in mind you can alter your Makefile to generate 32-bit code. It could look like:
bootloader.bin: bootloader.asm
nasm -f bin bootloader.asm -o bootloader.bin
bootkernel.o: bootkernel.asm
nasm -f elf32 bootkernel.asm -o bootkernel.o
kernel.o: kernel.c
gcc-6 -m32 -Wextra -Wall -ffreestanding -c kernel.c -o kernel.o
kernel.bin: bootkernel.o kernel.o linker.ld
ld -melf_i386 -o kernel.bin -T linker.ld bootkernel.o kernel.o --oformat binary
os-image: bootloader.bin kernel.bin
cat bootloader.bin kernel.bin > os-image
qemu: os-image
qemu-system-x86_64 -d guest_errors -fda os-image -boot a
I gather when you started having issues with your code you ended up trying 0xb8002 as the address for your video memory. It should be 0xb8000. You'll need to modify:
#define FRAME_BUFFER_ADDRESS 0xb8002
To be:
#define FRAME_BUFFER_ADDRESS 0xb8000
Making all these changes should resolve your issues. This is what the output I got looked like after the changes mentioned above:
Other observations
In write_memory you use:
unsigned char * memory = (unsigned char *) address;
Since you are using 0xb8000 that is memory mapped to the video display you should mark it as volatile since a compiler could optimize things away not knowing that there is a side effect to writing to that memory (namely displaying characters on a display). You might wish to use:
volatile unsigned char * memory = (unsigned char *) address;
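Putting the volatile suggestion together with the offset calculation from the question, a cell writer could look roughly like the sketch below. The names cell_offset, put_cell and COLUMNS are made up for this illustration; the frame buffer base is passed in as a parameter (it would be 0xb8000 in the kernel) so that the same code can also be exercised against an ordinary array.

```c
enum { COLUMNS = 80 };  /* assumed width of the text-mode screen */

/* Byte offset of a character cell: two bytes per cell, 80 cells per row. */
unsigned int cell_offset(unsigned int col, unsigned int row)
{
    return 2u * (row * COLUMNS + col);
}

/* Writes a character and its attribute byte through a volatile pointer,
   so the compiler cannot optimize the stores away. */
void put_cell(volatile unsigned char *fb, unsigned char c, unsigned char attr,
              unsigned int col, unsigned int row)
{
    unsigned int offset = cell_offset(col, row);

    fb[offset] = c;         /* character byte */
    fb[offset + 1] = attr;  /* attribute byte */
}
```

In the kernel this would be called as put_cell((volatile unsigned char *) 0xb8000, 'B', 0x07, 1, 0); note that column comes before row, which is the ordering mix-up discussed earlier in this answer.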
In your bootloader.asm you really should explicitly enable the A20 line. You can find information about doing that in this OSDev Wiki article. The status of the A20 line at the point a bootloader starts executing may vary between emulators. Failure to enable it could cause issues if you try to access memory in an odd-numbered megabyte (such as 0x100000 to 0x1fffff, 0x300000 to 0x3fffff, etc.). Accesses to an odd-numbered megabyte region will actually read data from the even-numbered region just below it. This is usually not the behaviour you want.
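The wrap-around can be worked through as plain arithmetic: with A20 disabled, address bit 20 is effectively forced to zero. The function name a20_disabled_address below is a hypothetical one chosen for this sketch, and the model ignores everything else about the memory system.

```c
#include <stdint.h>

/* Models the physical address actually accessed when the A20 line is
   disabled: bit 20 of the requested address is forced to zero. */
uint32_t a20_disabled_address(uint32_t addr)
{
    return addr & ~(UINT32_C(1) << 20);
}
```

Under this model 0x100000 maps to 0x000000 and 0x300000 maps to 0x200000, while addresses in even-numbered megabytes (such as the video memory at 0xb8000) are unaffected.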
64-bit long mode
If you want to run 64-bit code you will need to place the processor into 64-bit long mode. This is a bit more involved than entering 32-bit protected mode. Information on 64-bit longmode can be found in the OSDev wiki. Once properly in 64-bit longmode you can use 64-bit instructions generated by GCC.

How do I use arrays in x86 assembly code to replace letters of a word?

Hey everyone, I am working on an assignment involving arrays in assembly. I need to have the user enter a word, then clear the screen. After that, a second player tries to guess the word. I have done all that, but I also have to display a hint every time the second player tries to guess. For example, if I entered the word hello, the program displays h!l!o when the second player tries to guess. I have tried it but can't get it to work. Any help would be much appreciated, thank you.
.data
chose:
.ascii "Enter the Secret Word\n"
chose_length:
.int 22
lets_play_response:
.ascii "Try to Guess the Word Entered\n"
l_p_response_length:
.int 30
wrong_guess:
.ascii "Incorrect Guess, Try Again\n"
wrong_guess_length:
.int 27
correct:
.ascii "Correct Guess, Good Job\n"
correct_length:
.int 24
Screen_Clearer:
.ascii "\x1B[H\x1B[2J"
Screen_Clearer_length:
.int 11
letter:
.space 15
guess:
.space 15
.text
.global _start
_start:
mov $chose, %ecx
mov chose_length, %edx
mov $4, %eax
mov $1, %ebx
int $0x80
mov $letter, %ecx
mov $15, %edx
mov $3, %eax
mov $0, %ebx
int $0x80
call Screen_Clear
mov $lets_play_response, %ecx
mov l_p_response_length, %edx
mov $4, %eax
mov $1, %ebx
int $0x80
# Method to Print Word With Every Second Letter Replaced With !
# This is the area with the problems everything else works
mov $0, %edi
Loop:
cmp $4, %edi
jg End
mov $33, letter (%edi)
add $1, %edi
jmp Loop
End:
mov $letter, %ecx
mov $4, %eax
mov $1, %ebx
ret
# End of Method
call GuessLoop
mov $1, %eax
int $0x80
GuessLoop:
mov $guess, %ecx
mov $15, %edx
mov $3, %eax
mov $0, %ebx
int $0x80
mov guess, %ecx
mov letter, %edx
cmp %ecx, %edx
jne Incorrect
je Correct
Incorrect:
mov $wrong_guess, %ecx
mov wrong_guess_length, %edx
mov $4, %eax
mov $1, %ebx
int $0x80
jmp GuessLoop
Correct:
mov $correct, %ecx
mov correct_length, %edx
mov $4, %eax
mov $1, %ebx
int $0x80
ret
# Method That Clears the Screen #
Screen_Clear:
mov $Screen_Clearer, %ecx
mov Screen_Clearer_length, %edx
mov $4, %eax
mov $1, %ebx
int $0x80
ret
# End of Method to Clear Screen
If you are going to use assembly, you will need to learn about addressing modes; a web search for "addressing modes" is a good starting point.
In this sample, I use the [Base + Index] mode. You will need one more variable to hold your hint string. It is not AT&T syntax, but it will give you the idea:
%define sys_exit 1
%define sys_write 4
%define sys_read 3
%define stdin 0
%define stdout 1
SECTION .bss
hint resb 15
letter resb 15
leter_len equ $ - letter
SECTION .text
global _start
_start:
mov ecx, letter
mov edx, leter_len
mov ebx, stdin
mov eax, sys_read
int 80H
mov esi, hint
mov edi, letter
xor ecx, ecx
dec eax
.MakeHint:
mov dl, byte [edi + ecx] ; get byte from pointer + index
cmp dl, 10 ; is it linefeed
je .ShowIt
mov byte[esi + ecx], dl ; move byte into hint buffer
inc ecx ; increase index
cmp ecx, eax ; at the end?
je .ShowIt
mov byte[esi + ecx], 33 ; move ! to next index
inc ecx ; increase index
cmp ecx, eax ; at end?
jne .MakeHint
.ShowIt:
mov ecx, hint
mov edx, leter_len
mov ebx, stdout
mov eax, sys_write
int 80H
mov eax, sys_exit
xor ebx, ebx
int 80h
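For comparison, the same base-plus-index idea can be sketched in C, where word[i] and hint[i] play the role of [edi + ecx] and [esi + ecx]. The function name make_hint is hypothetical, and the caller is assumed to supply a hint buffer at least as large as the word plus a terminating NUL.

```c
#include <stddef.h>

/* Copies word into hint, replacing every second character with '!'.
   Stops at the end of the string or at a linefeed, like the loop above. */
void make_hint(const char *word, char *hint)
{
    size_t i;

    for (i = 0; word[i] != '\0' && word[i] != '\n'; i++)
        hint[i] = (i % 2 == 0) ? word[i] : '!';
    hint[i] = '\0';
}
```

With the input "hello\n" this fills hint with h!l!o, the hint format described in the question.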

executing assembly within a function in c++

long getesp() {
__asm__("movl %esp,%eax");
}
void main() {
printf("%08X\n",getesp()+4);
}
Why does esp point to the value from before the stack frame is set up, and does it make any difference compared with the code below?
void main() {
__asm__("movl %esp,%eax");
}
After I ran gcc -S file.c:
getesp:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
#APP
# 4 "xxt.c" 1
movl %esp,%eax
# 0 "" 2
#NO_APP
leave
ret
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
call getesp
addl $4, %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
getesp has a pushl that modifies esp; the modified esp is copied into ebp, and after the further subl the inline assembly copies esp into eax.
Fetching the stack pointer through a function call and fetching it directly inside main give different results; in this specific case they differ by 12 bytes. The call instruction pushes the return address (eip; for a near call, which is the normal case for Linux/Unix user programs, only eip is pushed), then getesp pushes ebp, and after that it subtracts 4 from the stack pointer. Since eip and ebp are 4 bytes each, the total difference is 4 + 4 + 4 = 12 bytes, which is what we see in the function-call version.
Without the function call there is no push of eip and no other esp manipulation, so we get the esp value as it is after main's own setup.
I am not comfortable with AT&T syntax, so here is the same code in Intel syntax, with an Intel-syntax asm dump below. Note that between the sub esp, 20 line in main and the __asm__ inside main there is no push or other esp modification, so the __asm__ inside main stores into a the esp value that was set by the sub esp, 20 line, whereas the value we get by calling getesp is that value (the one you are expecting) minus 12, as described above.
The C Code
#include <stdio.h>
int a;
long getesp() {
__asm__("mov a, esp");
}
int main(void)
{
__asm__("mov a,esp");
printf("%08X\n",a);
getesp ();
printf("%08X\n",a);
}
The output is in my case for the specific run:
BF855D00
BF855CF4
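The 12-byte gap between those two values can be checked with a small simulation of the three stack adjustments described above. The function name esp_inside_getesp is made up for this sketch, and it only models the arithmetic for this particular build of getesp, not real stack behaviour.

```c
#include <stdint.h>

/* Given esp in main just before "call getesp", returns the esp value
   that the inline asm inside getesp would observe. */
uint32_t esp_inside_getesp(uint32_t esp_in_main)
{
    uint32_t esp = esp_in_main;

    esp -= 4;  /* call getesp pushes the return address (eip) */
    esp -= 4;  /* push ebp */
    esp -= 4;  /* sub esp, 4 */
    return esp;
}
```

Starting from the first printed value, 0xBF855D00, the function yields 0xBF855CF4, which matches the second printed value.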
The intel syntax dump is:
getesp:
push ebp
mov ebp, esp
sub esp, 4
#APP
# 7 "xt.c" 1
mov a, esp
# 0 "" 2
#NO_APP
leave
ret
main:
lea ecx, [esp+4]
and esp, -16
push DWORD PTR [ecx-4]
push ebp
mov ebp, esp
push ecx
sub esp, 20
#APP
# 12 "xt.c" 1
mov a,esp
# 0 "" 2
#NO_APP
mov eax, DWORD PTR a
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:.LC0
call printf
call getesp
mov eax, DWORD PTR a
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:.LC0
call printf
add esp, 20
pop ecx
pop ebp
lea esp, [ecx-4]
ret
I hope this helps.

Understand the assembly code generated by a simple C program

I am trying to understand the assembly level code for a simple C program by inspecting it with gdb's disassembler.
Following is the C code:
#include <stdio.h>
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
Following is the disassembly code for both main and function
(gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>: push %ebp
0x08048429 <main+1>: mov %esp,%ebp
0x0804842b <main+3>: and $0xfffffff0,%esp
0x0804842e <main+6>: sub $0x10,%esp
0x08048431 <main+9>: movl $0x3,0x8(%esp)
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function>
0x0804844d <main+37>: leave
0x0804844e <main+38>: ret
End of assembler dump.
(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>: push %ebp
0x08048405 <function+1>: mov %esp,%ebp
0x08048407 <function+3>: sub $0x28,%esp
0x0804840a <function+6>: mov %gs:0x14,%eax
0x08048410 <function+12>: mov %eax,-0xc(%ebp)
0x08048413 <function+15>: xor %eax,%eax
0x08048415 <function+17>: mov -0xc(%ebp),%eax
0x08048418 <function+20>: xor %gs:0x14,%eax
0x0804841f <function+27>: je 0x8048426 <function+34>
0x08048421 <function+29>: call 0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>: leave
0x08048427 <function+35>: ret
End of assembler dump.
I am seeking answers to the following:
How does the addressing work, i.e. (main+0), (main+1), (main+3)?
In main, why is and $0xfffffff0,%esp being used?
In function, why are mov %gs:0x14,%eax and mov %eax,-0xc(%ebp) being used?
If someone could explain what is happening step by step, that would be greatly appreciated.
The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on, is because each instruction takes up a variable number of bytes. For example:
main+0: push %ebp
is a one-byte instruction so the next instruction is at main+1. On the other hand,
main+3: and $0xfffffff0,%esp
is a three-byte instruction so the next instruction after that is at main+6.
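The main+N labels are just running sums of the instruction lengths. The sketch below recomputes them; the function name instruction_offsets is hypothetical, and the lengths are the ones implied by the address differences in the disassembly of main (1, 2, 3, 3, 8, 8, 7, 5, 1, 1).

```c
#include <stddef.h>

/* Fills offsets[i] with the start offset of instruction i, given the
   length in bytes of each instruction in order. */
void instruction_offsets(const unsigned char *lengths, size_t count,
                         unsigned int *offsets)
{
    unsigned int next = 0;

    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;   /* this instruction starts where the last ended */
        next += lengths[i];
    }
}
```

Feeding in those ten lengths gives the offsets 0, 1, 3, 6, 9, 17, 25, 32, 37 and 38, the same sequence gdb prints for main.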
And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation for that is as follows.
Instruction length depends not only on the opcode (such as movl) but also the addressing modes for the operands as well (the things the opcode are operating on). I haven't checked specifically for your code but I suspect the
movl $0x1,(%esp)
instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:
movl $0x2,0x4(%esp)
requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4.
In fact, here's a debug session showing what I mean:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700 MOV WORD PTR [DI],0007
0B52:0104 C745020800 MOV WORD PTR [DI+02],0008
0B52:0109 C745000700 MOV WORD PTR [DI+00],0007
-q
c:\pax> _
You can see that the second instruction with an offset is actually different to the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and actually has a different encoding c745 instead of c705.
You can also see that you can encode the first and third instruction in two different ways but they basically do the same thing.
The and $0xfffffff0,%esp instruction is a way to force esp to be on a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors will be more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.
After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16 byte boundary.
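The effect of that and can be reproduced in C as a sanity check. The function name align_down_16 is an illustrative choice, and the sample address used in the note below is arbitrary.

```c
#include <stdint.h>

/* Clears the low four bits, rounding the value down to a multiple of 16,
   just as "and $0xfffffff0,%esp" does to the stack pointer. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & UINT32_C(0xfffffff0);
}
```

For instance, 0xbffff7dc becomes 0xbffff7d0, while a value that is already aligned, such as 0xbffff7d0, is left unchanged.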
The gs: prefix simply means to use the gs segment register to access memory rather than the default.
The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) and then put the value of eax into that memory location.
Re the explanation of the code: your function function is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the aforementioned %gs:0x14 memory location.
It loads the value from that location (probably something like 0xdeadbeef) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted.
Its job, in this case, is nothing. So all you see is the function administration stuff.
Stack set-up occurs between function+0 and function+12. Everything after that is setting up the return code in eax and tearing down the stack frame, including the corruption check.
Similarly, main consists of stack frame set-up, pushing the parameters for function, calling function, tearing down the stack frame and exiting.
Comments have been inserted into the code below:
0x08048428 <main+0>: push %ebp ; save previous value.
0x08048429 <main+1>: mov %esp,%ebp ; create new stack frame.
0x0804842b <main+3>: and $0xfffffff0,%esp ; align to boundary.
0x0804842e <main+6>: sub $0x10,%esp ; make space on stack.
0x08048431 <main+9>: movl $0x3,0x8(%esp) ; push values for function.
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function> ; and call it.
0x0804844d <main+37>: leave ; tear down frame.
0x0804844e <main+38>: ret ; and exit.
0x08048404 <func+0>: push %ebp ; save previous value.
0x08048405 <func+1>: mov %esp,%ebp ; create new stack frame.
0x08048407 <func+3>: sub $0x28,%esp ; make space on stack.
0x0804840a <func+6>: mov %gs:0x14,%eax ; get sentinel value.
0x08048410 <func+12>: mov %eax,-0xc(%ebp) ; put on stack.
0x08048413 <func+15>: xor %eax,%eax ; set return code 0.
0x08048415 <func+17>: mov -0xc(%ebp),%eax ; get sentinel from stack.
0x08048418 <func+20>: xor %gs:0x14,%eax ; compare with actual.
0x0804841f <func+27>: je <func+34> ; jump if okay.
0x08048421 <func+29>: call <_stk_chk_fl> ; otherwise corrupted stack.
0x08048426 <func+34>: leave ; tear down frame.
0x08048427 <func+35>: ret ; and exit.
I think the reason for the %gs:0x14 may be evident from above but, just in case, I'll elaborate here.
It uses this value (a sentinel) to put in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:
char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");
then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.
If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value which is detected in the code with the je instruction.
The relevant bit is paraphrased here:
mov %gs:0x14,%eax ; get sentinel value.
mov %eax,-0xc(%ebp) ; put on stack.
;; Weave your function
;; magic here.
mov -0xc(%ebp),%eax ; get sentinel back from stack.
xor %gs:0x14,%eax ; compare with original value.
je stack_ok ; zero/equal means no corruption.
call stack_bad ; otherwise corrupted stack.
stack_ok: leave ; tear down frame.
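The sentinel check paraphrased above can be modelled in portable C by treating a byte array as a simulated stack frame: a small buffer followed directly by the sentinel. This is only an illustration of the xor comparison; frame_enter, frame_intact, BUFFER_SIZE and FRAME_SIZE are hypothetical names, and a real compiler's frame layout and guard value are assumptions here.

```c
#include <stdint.h>
#include <string.h>

enum { BUFFER_SIZE = 8, FRAME_SIZE = BUFFER_SIZE + 4 };

/* Function entry: store the guard value right after the buffer region. */
void frame_enter(unsigned char *frame, uint32_t guard)
{
    memcpy(frame + BUFFER_SIZE, &guard, sizeof guard);
}

/* Function exit: returns 1 if the stored sentinel still matches the
   guard (the xor is zero), 0 if it has been overwritten. */
int frame_intact(const unsigned char *frame, uint32_t guard)
{
    uint32_t stored;

    memcpy(&stored, frame + BUFFER_SIZE, sizeof stored);
    return (stored ^ guard) == 0;
}
```

Writing exactly BUFFER_SIZE bytes into the frame leaves frame_intact returning 1; writing even one byte more overwrites part of the sentinel and makes it return 0, which corresponds to the path that calls the failure function.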
Pax has produced a definitive answer. However, for completeness, I thought I'd add a note on getting GCC itself to show you the assembly it generates.
The -S option to GCC tells it to stop compilation and write the assembly to a file. Normally, it either passes that file to the assembler or for some targets writes the object file directly itself.
For the sample code in the question:
#include <stdio.h>
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
the command gcc -S q3654898.c creates a file named q3654898.s:
.file "q3654898.c"
.text
.globl _function
.def _function; .scl 2; .type 32; .endef
_function:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
leave
ret
.def ___main; .scl 2; .type 32; .endef
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call _function
leave
ret
One thing that is evident is that my GCC (gcc (GCC) 3.4.5 (mingw-vista special r3)) doesn't include the stack check code by default. I imagine that there is a command line option, or that if I ever got around to nudging my MinGW install up to a more current GCC that it could.
Edit: Nudged to do so by Pax, here's another way to get GCC to do more of the work.
C:\Documents and Settings\Ross\My Documents\testing>gcc -Wa,-al q3654898.c
q3654898.c: In function `main':
q3654898.c:8: warning: return type of 'main' is not `int'
GAS LISTING C:\DOCUME~1\Ross\LOCALS~1\Temp/ccLg8pWC.s page 1
1 .file "q3654898.c"
2 .text
3 .globl _function
4 .def _function; .scl 2; .type
32; .endef
5 _function:
6 0000 55 pushl %ebp
7 0001 89E5 movl %esp, %ebp
8 0003 83EC28 subl $40, %esp
9 0006 C9 leave
10 0007 C3 ret
11 .def ___main; .scl 2; .type
32; .endef
12 .globl _main
13 .def _main; .scl 2; .type 32;
.endef
14 _main:
15 0008 55 pushl %ebp
16 0009 89E5 movl %esp, %ebp
17 000b 83EC18 subl $24, %esp
18 000e 83E4F0 andl $-16, %esp
19 0011 B8000000 movl $0, %eax
19 00
20 0016 83C00F addl $15, %eax
21 0019 83C00F addl $15, %eax
22 001c C1E804 shrl $4, %eax
23 001f C1E004 sall $4, %eax
24 0022 8945FC movl %eax, -4(%ebp)
25 0025 8B45FC movl -4(%ebp), %eax
26 0028 E8000000 call __alloca
26 00
27 002d E8000000 call ___main
27 00
28 0032 C7442408 movl $3, 8(%esp)
28 03000000
29 003a C7442404 movl $2, 4(%esp)
29 02000000
30 0042 C7042401 movl $1, (%esp)
30 000000
31 0049 E8B2FFFF call _function
31 FF
32 004e C9 leave
33 004f C3 ret
C:\Documents and Settings\Ross\My Documents\testing>
Here we see an output listing produced by the assembler. (Its name is GAS, because it is GNU's version of the classic *nix assembler as. There's humor there somewhere.)
Each line has most of the following fields: a line number, an address in the current section, bytes stored at that address, and the source text from the assembly source file.
The addresses are offsets into that portion of each section provided by this module. This particular module only has content in the .text section which stores executable code. You will typically find mention of sections named .data and .bss as well. Lots of other names are used and some have special purposes. Read the manual for the linker if you really want to know.
It will be better to try the -fno-stack-protector flag with gcc to disable the canary and see your results.
I'd like to add that for simple stuff, GCC's assembly output is often easier to read if you turn on a little optimization. Here's the sample code again...
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
/* corrected calling convention of main() */
int main() {
function(1,2,3);
return 0;
}
this is what I get without optimization (OSX 10.6, gcc 4.2.1+Apple patches)
.globl _function
_function:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $36, %esp
call L4
"L00000000001$pb":
L4:
popl %ebx
leal L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl (%eax), %edx
movl %edx, -12(%ebp)
xorl %edx, %edx
leal L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl -12(%ebp), %edx
xorl (%eax), %edx
je L3
call ___stack_chk_fail
L3:
addl $36, %esp
popl %ebx
leave
ret
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call _function
movl $0, %eax
leave
ret
Whew, one heck of a mouthful! But look what happens with -O on the command line...
.text
.globl _function
_function:
pushl %ebp
movl %esp, %ebp
leave
ret
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
leave
ret
Of course, you do run the risk of your code being rendered completely unrecognizable, especially at higher optimization levels and with more complicated stuff. Even here, we see that the call to function has been discarded as pointless. But I find that not having to read through dozens of unnecessary stack spills is generally more than worth a little extra scratching my head over the control flow.

Resources