This question already has answers here:
Why is GCC pushing an extra return address on the stack?
(2 answers)
What's the purpose of stack pointer alignment in the prologue of main()
(2 answers)
Responsibility of stack alignment in 32-bit x86 assembly
(1 answer)
Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address
(3 answers)
Closed 2 months ago.
I tested this C code on Compiler Explorer:
#include <stdio.h>
int main(void)
{
printf("Hello world!\n");
}
I selected the compiler x86-64 gcc 12.2 with the parameters -std=c17 -O3 -m32.
In 64-bit mode (no -m32 option) the output looks OK and optimized (online example):
.LC0:
.string "Hello world!"
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
call puts
xor eax, eax
add rsp, 8
ret
In 32-bit mode, the output is bloated by a lot of non-sense (online example):
.LC0:
.string "Hello world!"
main:
lea ecx, [esp+4] # OK, so ecx = esp + 4, but main has no parameters...why?
and esp, -16
push DWORD PTR [ecx-4] # push [ecx - 4], which is the return address...why?
push ebp
mov ebp, esp
push ecx # saves the stack address above the return address...what is going on?
sub esp, 16 # what is this for?
push OFFSET FLAT:.LC0
call puts
mov ecx, DWORD PTR [ebp-4] # restores that address
add esp, 16 # cleans that add esp, 16??
xor eax, eax
leave
lea esp, [ecx-4] # restores stack as it was on function entry
ret
I expected something along the lines of:
.LC0:
.string "Hello world!"
main:
sub esp, 12 # GCC wants 16-byte aligned stack
push OFFSET FLAT:.LC0
call puts
xor eax, eax
add esp, 12
ret
Considering I used the -O3 option, why does the output look like that?
Related
This question already has answers here:
What registers are preserved through a linux x86-64 function call
(3 answers)
How to restore x86-64 register saving conventions
(1 answer)
Why do gcc and clang generate mov reg,-1
(1 answer)
Closed 1 year ago.
I'd like to order the unordered array displayed in the source code. I don't know what I'm missing out here. Ok, I added some more details, hope this helps :)
The program
section .data
fmt: db "%d",0x0a,0
arr: dd 1000,100,10,1
section .text
global main
extern qsort
extern printf
main: push rbp
mov rbp, rsp
mov rdi, arr
mov rsi, 4
mov rdx, 4
mov rcx, cmp
call qsort
xor rbx, rbx
.L1: lea rdi, [fmt]
mov esi, [arr+rbx*4]
xor rax, rax
call printf
inc rbx
cmp rbx, 4
jnz .L1
xor rax, rax
leave
ret
cmp: movsxd rax, dword [rdi]
movsxd rbx, dword [rsi]
sub rax, rbx
ret
The program prints
1
100
10
1000
I have been trying to compile this C program to assembly but it hasn't been working fine.
I am reading
Dennis Yurichev Reverse Engineering for Beginner but I am not getting the same output. Its a simple hello world statement. I am trying to get the 32 bit output
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
Here is what the book says the output should be
main proc near
var_10 = dword ptr -10h
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
mov eax, offset aHelloWorld ; "hello, world\n"
mov [esp+10h+var_10], eax
call _printf
mov eax, 0
leave
retn
main endp
Here are the steps;
Compile the print statement as a 32bit (I am currently running a 64bit pc)
gcc -m32 hello_world.c -o hello_world
Use gdb to disassemble
gdb file
set disassembly-flavor intel
set architecture i386:intel
disassemble main
And i get;
lea ecx,[esp+0x4]
and esp,0xfffffff0
push DWORD PTR [ecx-0x4]
push ebp
mov ebp,esp
push ebx
push ecx
call 0x565561d5 <__x86.get_pc_thunk.ax>
add eax,0x2e53
sub esp,0xc
lea edx,[eax-0x1ff8]
push edx
mov ebx,eax
call 0x56556030 <puts#plt>
add esp,0x10
mov eax,0x0
lea esp,[ebp-0x8]
pop ecx
pop ebx
pop ebp
lea esp,[ecx-0x4]
ret
I have also used
objdump -D -M i386,intel hello_world> hello_world.txt
ndisasm -b32 hello_world > hello_world.txt
But none of those are working either. I just cant figure out what's wrong. I need some help. Looking at you Peter Cordes ^^
The output from the book looks like MSVC, not GCC. GCC will definitely not ever emit main proc because that's MASM syntax, not valid GAS syntax. And it won't do stuff like var_10 = dword ptr -10h.
(And even if it did, you wouldn't see assemble-time constant definitions in disassembly, only in the compiler's asm output which is what the book suggested you look at. gcc -S -masm=intel output. How to remove "noise" from GCC/clang assembly output?)
So there are lots of differences because you're using a different compiler. Even modern versions of MSVC (on the Godbolt compiler explorer) make somewhat different asm, for example not bothering to align ESP by 16, perhaps because more modern Windows versions, or CRT startup code, already does that?
Also, your GCC is making PIE executables by default, so use -fno-pie -no-pie. 32-bit PIE sucks for efficiency and for ease of understanding. See How do i get rid of call __x86.get_pc_thunk.ax. (Also 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables, mostly focused on 64-bit code)
The extra clunky stack-alignment in main's prologue is something that GCC8 optimized for functions that don't also need alloca. But it seems even current GCC10 emits the full un-optimized version when you don't enable optimization :(.
Why is gcc generating an extra return address? and Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address
Optimizing printf to puts: see How to get the gcc compiler to not optimize a standard library function call like printf? and -O2 optimizes printf("%s\n", str) to puts(str). gcc -fno-builtin-printf would be one way to make that not happen, or just get used to it. GCC does a few optimizations even at -O0 that other compilers only do at higher optimization levels.
MSVC 19.10 compiles your function like this (on the Godbolt compiler explorer) with optimization disabled (the default, no compiler options).
_main PROC
push ebp
mov ebp, esp
push OFFSET $SG4501
call _printf
add esp, 4
xor eax, eax
pop ebp
ret 0
_main ENDP
_DATA SEGMENT
$SG4501 DB 'hello, world', 0aH, 00H
GCC10.2 still uses an over-complicated stack alignment dance in the prologue.
.LC0:
.string "hello, world"
main:
lea ecx, [esp+4]
and esp, -16
push DWORD PTR [ecx-4]
push ebp
mov ebp, esp
push ecx
sub esp, 4
# end of function prologue, I think.
sub esp, 12 # make sure arg will be 16-byte aligned
push OFFSET FLAT:.LC0 # push a pointer
call puts
add esp, 16 # pop the arg-passing space
mov eax, 0 # return 0
mov ecx, DWORD PTR [ebp-4] # undo stack alignment.
leave
lea esp, [ecx-4]
ret
Yes, this is super inefficient. If you called your function anything other than main, it would already assume ESP was aligned by 16 on function entry:
# GCC10.2 -m32 -O0
.LC0:
.string "hello, world"
foo:
push ebp
mov ebp, esp
sub esp, 8 # reach a 16-byte boundary, assuming ESP%16 = 12 on entry
#
sub esp, 12
push OFFSET FLAT:.LC0
call puts
add esp, 16
mov eax, 0
leave
ret
So it still doesn't combine the two sub instructions, but you did tell it not to optimize so braindead code is expected. See Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? for example.
My GCC will very eagerly swap a call to printf to puts! I did not manage to find the command line options that would make the compiler to not do this. I.e. the program has the same external behaviour but the machine code is that of
#include <stdio.h>
int main(void)
{
puts("hello, world");
}
Thus, you'll have really hard time trying to get the exact same assembly as in the book, as the assembly from that book has a call to printf instead of puts!
First of all you compile not decompile.
You get a lots of noise as you compile without the optimizations. If you compile with optimizations you will get much smaller code almost identical with the one you have (to prevent change from printf to puts you need to remove the '\n' https://godbolt.org/z/cs4qe9):
.LC0:
.string "hello, world"
main:
lea ecx, [esp+4]
and esp, -16
push DWORD PTR [ecx-4]
push ebp
mov ebp, esp
push ecx
sub esp, 16
push OFFSET FLAT:.LC0
call puts
mov ecx, DWORD PTR [ebp-4]
add esp, 16
xor eax, eax
leave
lea esp, [ecx-4]
ret
https://godbolt.org/z/xMqo33
This question already has answers here:
What is array to pointer decay?
(11 answers)
What kind of C11 data type is an array according to the AMD64 ABI
(1 answer)
What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
(4 answers)
Closed 2 years ago.
So I have this C file where I am getting two strings of input from stdin and I am trying to pass them to my assembly code and have my assembly code determine if the first string would come after the second string in a dictionary and if so then return a 1.
#include <stdio.h>
extern int stringcheck(char *s1, char *s2);
int main(){
char s1[30];
fgets(s1, 31, stdin);
char s2[31];
fgets(s2, 31, stdin);
int ret = stringcheck(s1, s2);
if(ret == 1){
printf("True\n");
}
if(ret == 0){
printf("False\n");
}
return 0;
}
In this assembly code I tried to load my strings into appropriate registers and then use cmpsb in a loop and have the program jump to 'second' if the carry flag is set to 1. What happens though is that regardless of the string entered the program will always jump to the 'second' block and return 9 so I'm assuming I messed something up with how I prepared the strings to be checked.
global stringcheck
section .text
stringcheck:
; save state of registers and setup stack frame
push rbp
mov rbp, rsp
push rax
push rbx
push rdx
lea rsi, [rbp+60] ; move first arg into rsi
lea rdi, [rbp+30] ; move second arg into rdi
mov rcx, 30 ; set up incrementer
cld
top: cmpsb
jc second
loop top
jmp first
first: pop rdx
pop rbx
pop rax
mov rsp, rbp
pop rbp
mov rax, 1
ret
second: pop rdx
pop rbx
pop rax
mov rsp, rbp
pop rbp
mov rax, 0
ret
This question already has an answer here:
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
Closed 3 years ago.
I used https://godbolt.org/ with "x86-64 gcc 9.1" to assemble the following C code to understand why passing a pointer to a local variable as a function argument works. Now I have difficulties to understand some steps.
I commented on the lines I have difficulties with.
void printStr(char* cpStr) {
printf("str: %s", cpStr);
}
int main(void) {
char str[] = "abc";
printStr(str);
return 0;
}
.LC0:
.string "str: %s"
printStr:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when using it just for the pointer to str[0] which is 4 bytes long?
mov QWORD PTR [rbp-8], rdi ; why copy rdi to the stack...
mov rax, QWORD PTR [rbp-8] ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when "abc" is just 4 bytes long?
mov DWORD PTR [rbp-4], 6513249
lea rax, [rbp-4] ; pointer to str[0] copied into rax (*)
mov rdi, rax ; why copy the pointer to str[0] to rdi?
call printStr
mov eax, 0
leave
ret
Thanks to the help of Jester I could solve my confusion. The following code is compiled with the "-O1" flag of GCC (for me the best optimization level to understand what's going on):
.LC0:
.string "str: %s"
printStr:
sub rsp, 8
; now the call to printf gets prepared, rdi = first argument, rsi = second argument
mov rsi, rdi ; move str[0] to rsi
mov edi, OFFSET FLAT:.LC0 ; move address of static string literal "str: %s" to edi
mov eax, 0 ; set eax to the number of vector registers used, because printf is a varargs function
call printf
add rsp, 8
ret
main:
sub rsp, 24
mov DWORD PTR [rsp+12], 6513249 ; create string "abc" on the stack
lea rdi, [rsp+12] ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
call printStr
mov eax, 0
add rsp, 24
ret
As Jester said, the 16 bytes were allocated for alignment. There is a good post on Stack Overflow which explains this here.
Edit:
There is a post on Stack Overflow which explains why al is zeroed before a call to a varargs function here.
I need to get my char pointer and after that i print it on the screen to see if it works ok.
I read the text from a file and put in my char pointer (sizeof(char*) + filesize) + 1.. in the end i put the '\0'.
If i printf my char* its fine
Here is my asm code
; void processdata(char* filecontent);
section .text
global processdata
extern printf
section .data
FORMAT: db '%c', 10, 0 ; to break the line 10, 0
processdata:
push ebp
mov ebp, esp
mov ebx, [ebp]
push ebx
push FORMAT
call printf
add esp, 8
when i run it i just see trash in my variable.
As violet said:
; void processdata(char* filecontent);
section .text
[GLOBAL processdata] ;global processdata
extern printf
section .data
FORMAT: db '%c', 0
EQUAL: db "is equal", 10, 0
processdata:
lea esi, [esp]
mov ebx, FORMAT
oook:
mov eax, [esi]
push eax
push ebx
call printf
inc esi
cmp esi, 0x0
jnz oook
Thanks
quote demonofnight
But if i need to increment, can i do that?
Using your original function arg of char* and your %c format, something like this:
lea esi, [esp+4]
mov ebx, FORMAT
oook:
mov eax, [esi]
push eax
push ebx
call printf
inc esi
cmp [esi], 0x0
jnz oook
[edit: ok sry, i hacked that quickly into some shenzi winOS inline __asm block]
here's a complete thing done in linux and nasm:
; ----------------------------------------------------------------------------
; blah.asm
; ----------------------------------------------------------------------------
extern printf
SECTION .data ;local variables
fmt: db "next char:%c", 10, 0 ;printf format, "\n",'0'
SECTION .text ;hack in some codes XD
global foo
foo: ;foo entry
push ebp ;set up stack frame
mov ebp,esp
mov esi, [ebp+8] ;load the input char array to the source index
oook: ;loop while we have some chars
push dword [esi] ;push next char addy in string to stack
push dword fmt ;push stored format string to stack
call printf
add esp, 8 ;restore stack pointer
inc esi ;iterate to next char
cmp byte [esi], 0 ;test for null terminator byte
jnz oook
mov esp, ebp ;restore stack frame
pop ebp
mov eax,0 ;return 0
ret ;done
blah.c (that invokes the .asm foo) :
/*-----------------------------------------------
blah.c
invokes some asm foo
------------------------------------------------*/
#include <stdio.h>
void foo(char*);
int main() {
char sz[]={"oook\n"};
foo(sz);
return 0;
}
&here's the commandline stuff:
$ nasm -f elf blah.asm -o blah.o
$ gcc -o blah blah.c blah.o
$ ./blah
next char:o
next char:o
next char:o
next char:k
next char:
$
$ nasm -v
NASM version 2.09.08 compiled on Apr 30 2011
$ uname -a
Linux violet-313 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 17:34:21 UTC 2012 i686 i686 i386 GNU/Linux
Hope it works for you ;)
On linux x86 the first function parameter is in ecx. The printf format for printing a pointer is "%p".
So something along
...
FORMAT: db '%p', 10, 0 ; to break the line 10, 0
processdata:
mov eax, [esp+4]
push eax
push format
call printf
add esp, 8
ret
should work, assuming the rest of your code is correct and your are using gcc's calling convention.
This is assuming that the pointer you want to print is on the stack.
The reason for the crash is probably that you push 12 bytes on the stack, but correct the stack pointer by only 8.