x86-64 ELF initial stack layout when calling glibc - c

Basically, I read through parts of http://www.nasm.us/links/unix64abi and at page 29, it shows the initial process stack of a C program.
My question is: I'm trying to interface with glibc from x86-64 nasm and based on what the above shows, argc should be at rsp. So the following code should print argc:
[SECTION .data]
PrintStr: db "You just entered %d arguments.", 10, 0
[SECTION .bss]
[SECTION .text]
extern printf
global main
main:
mov rax, 0 ; Required for functions taking in variable no. of args
mov rdi, PrintStr
mov rsi, [rsp]
call printf
ret
But it doesn't. Can someone enlighten me if I have made any mistakes in my code or tell me what the actual stack structure is?
Thanks!
UPDATE: I just randomly tried some offsets and changing the "mov rsi, [rsp]" to "mov rsi, [rsp+28]" did the trick.
But this means that the stack structure shown is wrong. Does anyone know what the initial stack layout is for an x86-64 elf? An equivalent of http://asm.sourceforge.net/articles/startup.html would be really nice.
UPDATE 2:
I left out how I build this code. I do it by:
nasm -f elf64 -g <filename>
gcc <filename>.o -o <outputfile>

The initial stack layout contains argc at the stack pointer, followed by the array char *argv[], not a pointer to it like main receives. Therefore, to call main, you need to do something like:
pop %rdi
mov %rsp,%rsi
call main
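In reality there is usually a wrapper function that calls main, rather than the startup code doing it directly. With glibc, _start passes argc and argv to __libc_start_main, which eventually calls main like any other C function: argc arrives in edi and argv in rsi, and [rsp] holds the return address rather than argc. That is why reading [rsp] inside main does not print the argument count.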
If you want to simply print argv[0], you could do something like:
pop %rdi
pop %rdi
call puts
xor %edi,%edi
jmp exit

Related

map exe decompilation back to C language

I'm pretty new to assembly and am trying my best to learn it. I'm taking a course on it, and they showed a very basic Hello World example, which I decompiled.
Original C file:
#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}
This was decompiled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
Decompilation (assembly):
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
I'm having trouble mapping this decompilation output back to the original C file. Can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It aligns the stack pointer (ESP) to a 16-byte boundary, then reserves another 16 bytes of space for outgoing function args.
call 401e80 <___main>
This function call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.
That's the end of the preamble; the part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earlier sub esp, 16, this is like a push instruction. In this 32-bit calling convention, function args are passed on the stack, not in registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
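This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call. GCC normally makes this substitution only when the string literal ends in a newline, because puts appends one itself.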
mov eax, 0x0
The return value of main (0) is loaded into the eax register.
leave
ret
Restore the previous EBP value, and tear down the stack frame, then exit the function. ret pops a return address off the stack, which can only work when ESP is pointing at the return address.

Add two numbers in assembly

I'm just getting started with assembly and I wanted to create a simple program that adds two numbers and prints the result
This is what I have so far:
.globl main
.type main, #function
main:
movl $14, %eax
movl $10, %ebx
add %eax, %ebx
call printf
From my understanding here is what's happening line by line
Line 1: I'm creating a label main that can be accessed by the linker
Line 2: I'm specifying the type of label main to a function
Line 3: I begin my definition of main
Line 4: I store the numeric value 14 into the general register eax
Line 5: I store the numeric value 10 into the general register ebx
Line 6: I add the values at eax and ebx and store the result in ebx
Line 7: I call the function printf(here's where I get confused)
How do I specify what value at which register gets printed?
Also, how do I complete this program? Currently when run, the program results in a segmentation fault.
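SECTION .data
extern printf
global main
fmt:
db "%d", 10, 0 ; format string "%d\n", NUL-terminated
SECTION .text
main:
mov eax, 14
mov ebx, 10
add eax, ebx ; eax = 24
push eax ; second argument: the value to print
push fmt ; first argument: address of the format string
call printf
mov eax, 1 ; sys_exit
int 0x80 ; the exit status is whatever is in ebx (10 here)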
Unfortunately I don't know which compiler/assembler you are using, and I'm not familiar with AT&T syntax, so I have given you a working example in Intel-style x86 for NASM.
$ nasm -f elf32 test.s -o test.o
$ gcc test.o -m32 -o test
$ ./test
24
In order to use printf you need to push its arguments onto the stack. I do this here in reverse order (last argument first):
push eax
push fmt
EAX contains the result of add eax, ebx and the label 'fmt' is an array of chars: "%d\n\0" (%d format, newline, null terminator).
After calling printf you need to actually exit your program with the exit system call, otherwise (at least for me) the program will segfault AFTER printf even though it worked and you won't see the result.
So these two lines:
mov eax, 1
int 0x80
perform the sys_exit system call: they place the system call number for exit on 32-bit x86 (1) into EAX and then invoke interrupt 0x80, which exits the program cleanly.

itoa implementation crashing?

I am trying to implement itoa in assembly (the Netwide Assembler). I have verified that my approach is valid by inspecting the register values with a debugger. The problem is that the application crashes when it is about to exit, and I am afraid my program is corrupting the stack somehow. I am linking against the C standard library through GCC so that I can use the printf function. I noticed that printf modified the registers, which caused unexpected behaviour (extensive iterations over values I did not recognize); I solved this by storing the value of EAX inside EBX (not modified by printf) and then restoring the value after the function call. This is how I have been able to confirm, by single-stepping through the algorithm, that the program now behaves as intended and that it crashes only as it is about to terminate.
Here is the code:
global _main
extern _printf
section .data
_str: db "%d", 0
section .text
_main:
mov eax, 1234
mov ebx, 10
call _itoa
_terminate:
ret
_itoa:
test eax, eax
jz _terminate
xor edx, edx
div ebx
add edx, 30h
push eax
push edx
push _str
call _printf
add esp, 8
pop eax
jmp _itoa
And here is the stackdump:
Exception: STATUS_ACCESS_VIOLATION at eip=00402005
eax=00000000 ebx=00000000 ecx=20000038 edx=61185C40 esi=612A3A7C edi=0022CD84
ebp=0022ACF8 esp=0022AC20 program=C:\Cygwin\home\Benjamin\nasm\itoa.exe, pid 3556, thread main
cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
Stack trace:
Frame Function Args
0022ACF8 00402005 (00000000, 0022CD84, 61007120, 00000000)
End of stack trace
EDIT: Please note that the stackdump really is not that relevant anymore as the program no longer crashes, it just displays an incorrect value.
I'm not familiar with your platform, but I would expect you need to restore the stack by popping off the pushed values after calling printf().
Since printf() doesn't know how many arguments will be passed, it can't restore the stack. Your code pushes arguments that are never popped off. So when your procedure returns, it takes its return address from the data that was pushed on the stack, which is not going to point to valid code. And that would be your access violation.

Segfault while calling C function (printf) from Assembly

I am using NASM on linux to write a basic assembly program that calls a function from the C libraries (printf). Unfortunately, I am incurring a segmentation fault while doing so. Commenting out the call to printf allows the program to run without error.
; Build using these commands:
; nasm -f elf64 -g -F stabs <filename>.asm
; gcc <filename>.o -o <filename>
;
SECTION .bss ; Section containing uninitialized data
SECTION .data ; Section containing initialized data
text db "hello world",10 ;
SECTION .text ; Section containing code
global main
extern printf
;-------------
;MAIN PROGRAM BEGINS HERE
;-------------
main:
push rbp
mov rbp,rsp
push rbx
push rsi
push rdi ;preserve registers
****************
;code i wish to execute
push text ;pushing address of text on to the stack
;x86-64 uses registers for first 6 args, thus should have been:
;mov rdi,text (place address of text in rdi)
;mov rax,0 (al = number of vector registers used by the variadic call, none here)
call printf ;calling printf from c-libraries
add rsp,8 ;reseting the stack to pre "push text"
**************
pop rdi ;preserve registers
pop rsi
pop rbx
mov rsp,rbp
pop rbp
ret
x86-64 does not use the stack for the first six integer or pointer arguments. You need to load them into the proper registers, in this order:
rdi, rsi, rdx, rcx, r8, r9
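The trick I use to remember the first two is to imagine the function is memcpy implemented as rep movsb: the destination goes in rdi and the source in rsi.
You're calling a varargs function: printf expects a variable number of arguments, and you have to account for that when setting up the call. In the x86-64 System V convention this means setting al to the number of vector registers used for arguments (0 here) before the call. See here: http://www.csee.umbc.edu/portal/help/nasm/sample.shtml#printf1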

Calling _printf in assembly loop, only outputting once

I'm learning assembly and I have a very basic loop here
segment .data
msg: db '%d',10,0
segment .text
global _asm_main
extern _printf
_asm_main:
push DWORD 5 ; Should loop 5 times
call dump_stack
add esp,4
ret
dump_stack:
push ebp
mov ebp, esp
mov ecx, 0
loop_start:
cmp ecx,[ebp+8] ;compare to the first param of dump_stack, (5)
jnle loop_end
push ecx ;push the value of my loop onto the stack
push DWORD msg ;push the msg (%d) should just print the value of my loop
call _printf
add esp, 8 ;clear the stack
inc ecx ;increment ecx
jmp loop_start ; go back to my loop start
loop_end:
mov esp, ebp
pop ebp
ret
My output looks something like this
program.exe
0
Just 0, then a newline. I tried to verify the loop was executing by moving my printf to the loop_end part, and it came out with ecx as 6, which is correct. So the loop is executing but printf is not... What am I doing wrong?
(Also, the function is called dump stack because it was initially supposed to dump the details of the stack, but that didn't work because of the same reason here)
And I am compiling with nasm -f win32 program.asm -o program.o
Then I have a cpp file that includes windows.h, and I compiled it with gcc -c include
and finally I linked them with gcc -o program program.o include.o
and I run program.exe
My guess is that printf() modifies ecx, so that it becomes greater than [ebp+8] and your loop body executes only once. Under the usual 32-bit cdecl convention, eax, ecx, and edx are caller-saved, meaning a called function may freely modify them without restoring them. If that's the case, you need to read up on the calling conventions used by your compiler and either save and restore those volatile registers around the call yourself or keep the counter in a callee-saved register such as ebx. Note that there may be several different calling conventions. Use the debugger!
