Can we push local variable to stack in assembly - c

I made a simple program which will just push a number and display it on the screen but
don't know what is going wrong
section .data
value db 10
section .text
global main
extern printf
main:
push 10 //can we push value directly on stack?
call printf
add esp,4
ret
Getting Segmentation fault for above.
section .data
value db 10
section .text
global main
extern printf
main:
push [value]
call printf
add esp,4
ret
In second version will be pushing value pointed to by value variable on to stack
But getting "operation size not specified"

Yes, you can push any DWORD value (in 32-bit assembler) onto the stack.
The problem in the first code fragment is that printf expects the first argument to be a format string (in C, you'd write printf("%d\n", 10);). So something like
section .data
fmt db "%d", 10, 0
...
push 10
push fmt
call printf
add esp, 8
will work.
In the second code fragment, instead of push [value] you should write push dword [value], but that's not correct if your value variable is a single byte. Either declare it as a DWORD (dd), or perform
movsx eax, byte [value] ; if it's a signed integer; movzx for unsigned
push eax
And one more thing. When calling printf (or any of the C library functions), beware of stack alignment. Some platforms require that stack is 16-byte aligned at the time of a function call (this is necessary for correct execution of optimized CPU instructions like SSE). So, to make the stack aligned:
push ebp
mov ebp, esp
sub esp, 8 ; reserve 8 bytes for parameters
and esp, -16 ; align the stack (the reserved space can increase)
mov dword [esp], fmt ; put parameters into stack
mov dword [esp+4], 10
call printf
mov esp, ebp ; restore stack
pop ebp

Related

map exe decompilation back to C language

Im pretty new to assembly, and am trying my best to learn it. Im taking a course to learn it and they mentioned a very remedial Hello World example, that I decomplied.
original c file:
#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}
This was decompiled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
decompliation (assembly):
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
Im having issues mapping this output from the decompliation, to the original C file can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It aligns the stack pointer (ESP) by 16 bytes then reserves another 16 bytes of space for outgoing function args.
call 401e80, <___main>
This function call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.
That's the end of the pre-amble; the part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earliersub esp, 16, this is like apush` instruction. In this 32-bit calling convention, function args are passed on the stack, not registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call.
mov eax, 0x0
The return value of main is loaded into the eax register
leave
ret
Restore the previous EBP value, and tear down the stack frame, then exit the function. ret pops a return address off the stack, which can only work when ESP is pointing at the return address.

How is the stack unwound and stack frames identified

Let's say I have this simple program in C.
int my_func(int a, int b, int c) //0x4000
{
int d = 0;
int e = 0;
return e+d;
}
int main()
{
my_func(1,2,3); // 0x5000
return 0;
}
Ignoring the fact that it is essentially all dead code which can be completely optimized away. We'll say that my_func() lives at address 0x4000 and it is being called at address 0x5000.
From my understanding, a c compiler (I understand they can operate differently by vendor) may:
push c to the stack
push b to the stack
push a to the stack
push 0x5000 to the stack (return address)
call 0x4000
Then I'm assuming to access a it uses sp (stack pointer) + 1. b is sp+2 and c is sp+3.
Since d and e are on the stack, I'm guessing our stack would now look like this?
c
b
a
0x5000
d
e
When we get to the end of the function.
Does it then pop e and d off the stack?
Then... push e+d? Or save it to a register to be used after return?
Return to 0x5000 because it's the top of the stack?
Then pop the return address (0x5000) and a, b and c?
I'm guessing this is why old c required all the variables to be declared at the top of a function so that the compiler could count the number of pops it needed to perform at the end of the function?
I understand that it could have stored 0x5000 in a register, but a C program is able to go multiple levels deep into many functions and there are only so many registers...
Thanks!
In default calling convention for C, caller frees function argument after return from function. But function itself manages its own variables on stack. For example here is your code in assembly without any optimization:
my_func:
push ebp // +
mov ebp, esp // These 2 lines prepare function stack
sub esp, 16 // reserve memory for local variables
mov DWORD PTR [ebp-4], 0
mov DWORD PTR [ebp-8], 0
mov edx, DWORD PTR [ebp-8]
mov eax, DWORD PTR [ebp-4]
add eax, edx // <--return value in eax
leave // return esp to what it was at start of function
ret // return to caller
main:
push ebp
mov ebp, esp
push 3
push 2
push 1
call my_func
add esp, 12 // <- return esp to what it was before pushing arguments
mov eax, 0
leave
ret
As you see, there is a add esp, 12 in main for returning esp as it was before pushing arguments. In my_func there is a pair like this:
push ebp
mov ebp, esp
sub esp, 16 // <--- size of stack
...
leave
ret
This pair set is used for reserving some memory as stack. leave reverses the effect of push ebp/move ebp,esp. And function used ebp for accessing its arguments and stack-allocated variables. Return value is always in eax.
A quick allocated stack size note:
As you see, in function, there is a add esp, 16 instruction even though you only keep 2 variable of type int on stack which has a total size of 8 bytes. It is because stack size is aligned to specific boundaries (At least with default compile options). If you add 2 more int variables to my_func, this instruction is still add esp, 16, because total stack is still in 16 byte alignment. But if you add a 3rd variable of int, this instruction becomes add esp, 32. This alignment can be configured by -mpreferred-stack-boundary option in GCC.
By the way, all of these are for 32-bit compilation of code.In contrast, you normally never pass argument via stack pushing in 64-bit and you pass them through registers. As mentioned in comment, in 64-bit arguments are only passed through stack starting 5th argument(on microsoft x64 calling convention).
Update:
From default calling convention, In mean cdecl which is normally used when you compile your code for x86, without any compiler options or specific function attributes. If you change function call to stdcall as an example, all these will change.

Causes and benefits of this improvement on gcc version >= 4.9.0 vs gcc version < 4.9?

I have recently exploited a dangerous program and found something interesting about the difference between versions of gcc on x86-64 architecture.
Note:
Wrongful usage of gets is not the issue here.
If we replace gets with any other functions, the problem doesn't change.
This is the source code I use:
#include <stdio.h>
int main()
{
char buf[16];
gets(buf);
return 0;
}
I use gcc.godbolt.org to disassemble the program with flag -m32 -fno-stack-protector -z execstack -g.
At the disassembled code, when gcc with version >= 4.9.0:
lea ecx, [esp+4] # begin of main
and esp, -16
push DWORD PTR [ecx-4] # push esp
push ebp
mov ebp, esp
/* between these comment is not related to the question
push ecx
sub esp, 20
sub esp, 12
lea eax, [ebp-24]
push eax
call gets
add esp, 16
mov eax, 0
*/
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret # end of main
But gcc with version < 4.9.0 just:
push ebp # begin of main
mov ebp, esp
/* between these comment is not related to the question
and esp, -16
sub esp, 32
lea eax, [esp+16]
mov DWORD PTR [esp], eax
call gets
mov eax, 0
*/
leave
ret # end of main
My question is: What is the causes of this difference on the disassembled code and its benefits? Does it have a name for this technique?
I can't say for sure without the actual values in:
and esp, 0xXX # XX is a number
but this looks a lot like extra code to align the stack to a larger value than the ABI requires.
Edit: The value is -16, which is 32-bit 0xFFFFFFF0 or 64-bit 0xFFFFFFFFFFFFFFF0 so this is indeed stack alignment to 16 bytes, likely meant for use of SSE instructions. As mentioned in comments, there is more code in the >= 4.9.0 version because it also aligns the frame pointer instead of only the stack pointer.
The i386 ABI, used for 32-bit programs, imposes that a process, immediately after loaded, has to have the stack aligned on 32-bit values:
%esp Performing its usual job, the stack pointer holds the address of the
bottom of the stack, which is guaranteed to be word aligned.
confront this with the x86_64 ABI1 used for 64-bit programs:
%rsp The stack pointer holds the address of the byte with lowest address which
is part of the stack. It is guaranteed to be 16-byte aligned at process entry
The opportunity gave by the new AMD's 64-bit technology to rewrite the old i386 ABI allow a number of optimizations that were lacking due to backward compatibility, among these a bigger (stricter?) stack alignment.
I won't dwell on the benefits of stack alignment but it suffices to say that if a 4-byte alignment was good, so is a 16-byte one.
So much that it is worth spending some instructions aligning the stack.
That's what GCC 4.9.0+ does, it aligns the stack at 16-bytes.
That explains the and esp, -16 but not the other instructions.
Aligning the stack with and esp, -16 is the fastest way to do it when the compiler only knows that the stack is 4-byte aligned (since esp MOD 16 can be 0, 4, 8 or 12).
However it is a destructive method, the compiler loses the original esp value.
But now it comes the chicken or the egg problem: if we save the original esp on the stack before aligning the stack, we lose it because we don't know how far the stack pointer is lowered by the alignment. If we save it after the alignment, well, we can't. We lost it in the alignment.
So the only possible solution is to save it in a register, align the stack and then save said register on the stack.
;Save the stack pointer in ECX, actually is ESP+4 but still does
lea ecx, [esp+4] #ECX = ESP+4
;Align the stack
and esp, -16 #This lowers ESP by 0, 4, 8 or 12
;IGNORE THIS FOR NOW
push DWORD PTR [ecx-4]
;Usual prolog
push ebp
mov ebp, esp
;Save the original ESP (before alignment), actually is ESP+4 but OK
push ecx
GCC saves esp+4 in ecx, I don't know why2 but this values still does the trick.
The only mystery left is the push DWORD PTR [ecx-4].
But it turns out to be a simple mystery: for debugging purposes GCC pushes the return addresses just before the old frame pointer (before push ebp), this is where 32-bit tools expect it to be.
Since ecx=esp_o+4, where esp_o is the original stack pointer pre-alignment, [ecx-4] = [esp_o] = return address.
Note that now the stack is at 12 bytes modulo 16, thus the local variable area must be of size 16*k+4 to have the stack aligned at 16-byte again.
In your example k is 1 and the area is of 20 bytes in size.
The subsequent sub esp, 12 is to align the stack for the gets function (the requirement is to have the stack aligned at the function call).
Finally, the code
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret
The first instruction is copy-paste error.
One could check it out or simply reason that
if it were there the [ebp-4] would be below the stack pointer (and there is no red zone for the i386 ABI).
The rest is just undoing what's is done in the prolog:
;Get the original stack pointer
mov ecx, DWORD PTR [ebp-4] ;ecx = esp_o+4
;Standard epilog
leave ;mov esp, ebp / pop ebp
;The stack pointer points to the copied return address
;Restore the original stack pointer
lea esp, [ecx-4] ;esp = esp_o
ret
GCC has to first get the original stack pointer (+4) saved on the stack, then restore the old frame pointer (ebp) and finally, restore the original stack pointer.
The return address is on the top of the stack when lea esp, [ecx-4] is executed, so in theory GCC could just return but it has to restore the original esp because main is not the first function to be executed in a C program, so it cannot leave the stack unbalanced.
1 This is not the latest version but the text quoted went unchanged in the successive editions.
2 This has been discussed here on SO but I can't remember if in some comment or in an answer.

How can I change the value of a single byte using NASM?

Using NASM, I need to change a character in a string at a given index and print the string in its new form. Here is a simplified version of my code:
;test_code.asm
global main
extern printf
output_str: db "----------"
index: dq 7
main:
push rbp
mov rdi, output_str
mov rax, index
mov byte[rdi + rax], 'x'
xor rax, rax
call printf
pop rbp
ret
I then compile using:
nasm -felf64 test_code.asm && gcc test_code.o -lm
and get a seg fault. Would someone please point out the flaw here? I can't seem to find it myself.
your string is in the .text section of the executable, which is read only by default. Either you allocate a buffer on the stack, copy the string and you modify it there, or you put the string in the .data section (which is read/write) using the section directive. In this last case, notice that the character replacement will be persistent, i.e. even later in the program the string will remain modified;
if you want to print that string with printf it has to be NUL-terminated. Add a ,0 to the end of the db line;
that mov rax, index is wrong - index is the address of the qword you wrote above, while you actually want to copy in rax the datum wrote there; you probably want mov rax, [index].
So, something like
;test_code.asm
global main
extern printf
section .data
output_str:
db "----------",0
section .text
index:
dq 7
main:
push rbp
mov rdi, output_str
mov rax, [index]
mov byte[rdi + rax], 'x'
xor rax, rax
call printf
pop rbp
ret

x86-64 Arrays Inputting and Printing

I'm trying to input values into an array in x86-64 Intel assembly, but I can't quite figure it out.
I'm creating an array in segement .bss. Then I try to pass the address of the array along to another module by using r15. Inside that module I prompt the user for a number that I then insert into the array. But it doesn't work.
I'm trying to do the following
segment .bss
dataArray resq 15 ; Array that will be manipulated
segment .text
mov rdi, dataArray ; Store memory address of array so the next module can use it.
call inputqarray ; Calling inputqarray module
Inside of inputqarary I have:
mov r15, rdi ; Move the memory address of the array into r15 for safe keeping
push qword 0 ; Make space on the stack for the value we are reading
mov rsi, rsp ; Set the second argument to point to the new locaiton on the stack
mov rax, 0 ; No SSE input
mov rdi, oneFloat ; "%f", 0
call scanf ; Call C Standard Library scanf function
call getchar ; Clean the input stream
pop qword [r15]
I then try to output the value entered by the use by doing
push qword 0
mov rax, 1
mov rdi, oneFloat
movsd xmm0, [dataArray]
call printf
pop rax
Unfortunately, all I get for output is 0.00000
Output is 0 because you are using the wrong format specifier. It should be "%lf"
Next, no need to push and pop in your procedure. Since your going to pass the address of the data array to scanf, and that will be in rsi, just pass it in rsi; one less move.
You declared your array as 15 QWORDS, is that correct - 120 bytes? or did you mean resb 15?
This works and should get you on your way:
extern printf, scanf, exit
global main
section .rodata
fmtFloatIn db "%lf", 0
fmtFloatOut db `%lf\n`, 0
section .bss
dataArray resb 15
section .text
main:
sub rsp, 8 ; stack pointer 16 byte aligned
mov rsi, dataArray
call inputqarray
movsd xmm0, [dataArray]
mov rdi, fmtFloatOut
mov rax, 1
call printf
call exit
inputqarray:
sub rsp, 8 ; stack pointer 16 byte aligned
; pointer to buffer is in rsi
mov rdi, fmtFloatIn
mov rax, 0
call scanf
add rsp, 8
ret
Since you are passing params in rdi to the C functions, this is not on Windows.

Resources