I'm trying to understand this simple C program:
int square(int num) {
return num * num;
}
When it's in Assembly code:
square(int):
push rbp ;push rbp register onto stack
mov rbp, rsp ;copy contents of rsp register into rbp register
mov DWORD PTR [rbp-4], edi ;not sure what happens here
mov eax, DWORD PTR [rbp-4] ;not sure what happens here
imul eax, DWORD PTR [rbp-4] ;multiply eax and DWORD PTR [rbp-4] (?)
pop rbp ;pop original register out of stack
ret ;return
What is happening in the 3rd and 4th line?
Why did two more registers (edi and eax) have to be used instead of rsp?
What is actually happening with DWORD PTR [rbp-4]?
mov DWORD PTR [rbp-4], edi ;not sure what happens here
The x86_64 System V ABI passes function arguments via registers - the first integer argument is passed in the rdi/edi register. So this line copies the argument num to a local (offset -4 bytes from the frame pointer value stored in rbp).
mov eax, DWORD PTR [rbp-4] ;not sure what happens here
This copies the value in the local to the eax register.
imul eax, DWORD PTR [rbp-4] ;multiply eax and DWORD PTR [rbp-4] (?)
And this multiplies the value in eax by the local, and stores the result to eax (which also happens to be the register in which the function return value is stored).
As others pointed out in the comments, compiling with optimization would likely eliminate the local, and write directly from edi to eax.
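To make the three instructions concrete, here is a minimal C sketch of what the unoptimized code does, one statement per instruction. The names square_unoptimized, local, and eax are hypothetical, chosen only to mirror the stack slot and the register in the listing, and volatile is an assumption used to discourage the compiler from removing the stack slot.

```c
/* Sketch: mirrors the unoptimized assembly, one statement per instruction. */
int square_unoptimized(int num)   /* num arrives in edi */
{
    volatile int local = num;     /* mov DWORD PTR [rbp-4], edi  */
    int eax = local;              /* mov eax, DWORD PTR [rbp-4]  */
    eax *= local;                 /* imul eax, DWORD PTR [rbp-4] */
    return eax;                   /* the return value is left in eax */
}
```

With optimization the slot disappears and the whole body typically reduces to a copy from edi to eax plus one multiply.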
I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in Visual Studio, and when I run it, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?
To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes additional debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each).
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an asm{} block is MASM-like, not C statements.)
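The stack imbalance can be checked with simple arithmetic. The sketch below is only bookkeeping, not inline asm; esp_delta is a hypothetical helper name, and the byte sizes are the operand sizes seen in the MSVC output above.

```c
/* Hypothetical helper: net change to ESP after `pushes` pushes of
   `push_bytes` each and `pops` pops of `pop_bytes` each.
   A push decreases ESP; a pop increases it. */
int esp_delta(int pushes, int push_bytes, int pops, int pop_bytes)
{
    return pops * pop_bytes - pushes * push_bytes;
}
```

Two dword pushes followed by two word pops give esp_delta(2, 4, 2, 2), which is -4: ESP ends 4 bytes below where it started, so the later pop ebx / pop ebp / ret all read the wrong slots. With dword ptr on both sides, esp_delta(2, 4, 2, 4) is 0 and the stack is balanced.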
I have the following disassembly of a main function in which a user input is stored using scanf function (at address 0x0000089c). Due to the comparison that is made, I suppose that the user input is stored into the rsp register but I cannot figure out why, as rsp doesn't seem to be pushed on the stack (at least, not near the call to the scanf function).
Here is the disassembly:
0x00000850 sub rsp, 0x18
0x00000854 mov rax, qword fs:[0x28]
0x0000085d mov qword [canary], rax
0x00000862 xor eax, eax
0x00000864 call fcn.00000a3c
0x00000869 lea rsi, str.Insert_input:
0x00000870 mov edi, 1
0x00000875 xor eax, eax
0x00000877 mov dword [rsp], 0
0x0000087e mov dword [var_4h], 0
0x00000886 call sym.imp.__printf_chk
0x0000088b lea rdx, [var_4h]
0x00000890 lea rdi, str.u__u ; "%u %u" ;const char *format
0x00000897 xor eax, eax
0x00000899 mov rsi, rsp
0x0000089c call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x000008a1 mov eax, dword [rsp]
0x000008a4 cmp eax, 0x1336
0x000008a9 jg 0x867
On x86_64, parameters are passed in registers, so your call to scanf has 3 parameters stored in 3 registers:
rdi pointer to the string "%u %u", the format to parse (two unsigned integers)
rsi should be an unsigned *, pointer to where to put the first parsed integer
rdx pointer to where to put the second parsed integer.
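If you look just before the call, rsi is set to rsp (the stack pointer) while rdx is set to point at var_4h. That name looks like a disassembler-generated label for a stack local (most likely at rsp+4) rather than a global, though the listing does not show its definition.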
The stack is used to hold local variables, and in this case rsp points at a block of 0x18 "free" bytes (allocated by the first instruction in your listing), which is enough space for 6 integers. The one at offset 0 from rsp is what rsi points to, and it is the value read by the mov instruction immediately after the call.
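As a rough illustration of the calling pattern, here is a minimal C sketch that passes the addresses of two unsigned slots to a scanf-family function. It is not the original source: read_two and its parameter names are hypothetical, and sscanf on a string is used instead of scanf (an assumption, so the example needs no keyboard input).

```c
#include <stdio.h>

/* Hypothetical reconstruction: parse two unsigned integers into two
   caller-provided slots, the way the scanf call in the listing fills
   the locations its pointer arguments refer to. */
int read_two(const char *input, unsigned *first, unsigned *second)
{
    /* Returns the number of items successfully converted (2 on success). */
    return sscanf(input, "%u %u", first, second);
}
```

The function never receives the values themselves, only the addresses to write to, which is why the disassembly loads pointers into the argument registers and reads the result back from memory after the call.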
Here is a basic program I wrote on the godbolt compiler, and it's as simple as:
#include<stdio.h>
void main()
{
int a = 10;
int *p = &a;
printf("%d", *p);
}
The results after compilation I get:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 10
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
What I understand so far: pushing rbp, setting up the stack frame by reserving a 16-byte block, how a value is moved from a register to a stack location and vice versa, and that LEA's job is to compute an address.
Problem:
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
lea loads the address rbp-12 into rax,
then the next line stores that address from rax into [rbp-8],
but the line after that loads the value at [rbp-8] back into rax. This seems redundant. Then the value of rax is moved into eax again. I don't understand why so much work is needed. Why couldn't I have done
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov eax, QWORD PTR [rbp-8]
and be done with it? In the original, the address rbp-12 is put in rax, then rax is stored to [rbp-8], then [rbp-8] is loaded back into rax, and then rax is moved into eax? Couldn't we have just copied [rbp-8] directly to eax? I guess not, but my question is why.
I know pointers involve dereferencing, and I understand how LEA grabs the address rbp-12, but I completely lost track of where the code switches from handling addresses to fetching the values stored at them. I also didn't understand any of the asm lines after that.
You're seeing very un-optimized code. Here's my line-by-line interpretation:
.LC0:
.string "%d" ; Format string for printf
main:
push rbp ; Save original base pointer
mov rbp, rsp ; Set base pointer to beginning of stack frame
sub rsp, 16 ; Allocate space for stack frame
mov DWORD PTR [rbp-12], 10 ; Initialize variable 'a'
lea rax, [rbp-12] ; Load effective address of 'a'
mov QWORD PTR [rbp-8], rax ; Store address of 'a' in 'p'
mov rax, QWORD PTR [rbp-8] ; Load 'p' into rax (even though it's already there - heh!)
mov eax, DWORD PTR [rax] ; Load 32-bit value of '*p' into eax
mov esi, eax ; Load value to print into esi
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
mov eax, 0 ; Zero al: for a variadic call like printf, the x86-64 System V ABI passes the number of vector registers used in al
call printf ; Make the printf call
nop ; No-op (not sure why)
leave ; Remove the stack frame
ret ; Return
Compilers, when not optimizing, generate code like this as they parse the code you gave them. It's doing a lot of unnecessary stuff, but it is quicker to generate and makes using a debugger easier.
Compare this with the optimized code (-O2):
.LC0:
.string "%d" ; Format string for printf
main:
mov esi, 10 ; Don't need those variables -- just a 10 to pass to printf!
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
xor eax, eax ; xor-zeroing has a shorter encoding than mov eax, 0 and is at least as fast
jmp printf ; Just jmp to printf -- it will handle the return
The optimizer found that the variables weren't necessary, so no stack frame is created. Nothing is left but the printf call! And that's done as a jmp since nothing else need be done here when the printf is complete.
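The two loads that confused the asker can be written out in C, one statement per memory access in the unoptimized listing. This is a sketch for illustration only; load_through_pointer and the locals named rax and eax are hypothetical and merely mirror the registers.

```c
/* Sketch: each statement corresponds to one step of the unoptimized listing. */
int load_through_pointer(void)
{
    int a = 10;        /* mov DWORD PTR [rbp-12], 10                            */
    int *p = &a;       /* lea rax, [rbp-12] ; mov QWORD PTR [rbp-8], rax        */
    int *rax = p;      /* mov rax, QWORD PTR [rbp-8]  : load the 8-byte pointer */
    int eax = *rax;    /* mov eax, DWORD PTR [rax]    : load the 4-byte int it points to */
    return eax;        /* this is the value handed to printf via esi            */
}
```

Copying [rbp-8] straight into eax would give (half of) the address stored in p, not the 10 stored in a. Dereferencing needs a second load that uses the pointer as the address, which is what mov eax, DWORD PTR [rax] does.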
I am trying to understand how a variable sized static array works internally:
The following is a fixed size static array in C and its assembly equivalent:
int main()
{
int arr[2] = {3};
}
================
main:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], 0
mov DWORD PTR [rbp-8], 3
mov eax, 0
pop rbp
ret
However a variable sized array is shown below
int main()
{
int varSize ;
int Arr[varSize];
}
=================
main:
push rbp
mov rbp, rsp
sub rsp, 32
mov rax, rsp
mov rcx, rax
mov eax, DWORD PTR [rbp-4]
movsx rdx, eax
sub rdx, 1
mov QWORD PTR [rbp-16], rdx
movsx rdx, eax
mov r8, rdx
mov r9d, 0
movsx rdx, eax
mov rsi, rdx
mov edi, 0
cdqe
lea rdx, [0+rax*4]
mov eax, 16
sub rax, 1
add rax, rdx
mov edi, 16
mov edx, 0
div rdi
imul rax, rax, 16
sub rsp, rax
mov rax, rsp
add rax, 3
shr rax, 2
sal rax, 2
mov QWORD PTR [rbp-24], rax
mov rsp, rcx
mov eax, 0
leave
ret
I am seeing a whole lot of assembly instructions if I declare a variable sized array. Can someone explain how this flexibility of variable size is achieved?
Same mechanism as alloca() - allocate memory by decreasing the stack pointer, with the assumption that the stack is big enough and/or the OS will grow it as needed.
There might be a bit of an issue when the requested size is over a memory page and the stack is near its end. Normally, the OS grows the stack by setting up a guard page at the stack top and watching for faults in that area, but that assumes that the stack grows more or less sequentially (by pushes and function calls). If the decreased stack pointer overshoots the guard page, it might end up pointing at a bogus location. I'm not sure what the compiler does about that possibility.
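Most of the long listing is the computation of how many bytes to subtract from rsp. A minimal C sketch of that arithmetic follows, assuming a 4-byte int as in the listing (lea rdx, [0+rax*4]); vla_stack_bytes is a hypothetical name, and the real compiler output also aligns the resulting array address afterwards.

```c
#include <stddef.h>

/* Sketch of the size computation in the listing: bytes subtracted from rsp
   for an int array of n elements, rounded up to a multiple of 16. */
size_t vla_stack_bytes(size_t n)
{
    size_t raw = n * sizeof(int);       /* lea rdx, [0+rax*4]              */
    return (raw + 16 - 1) / 16 * 16;    /* add 15, div by 16, imul by 16   */
}
```

For example, 5 ints need 20 bytes, which rounds up to 32. That rounded value is what sub rsp, rax removes from the stack pointer, and restoring rsp at the end of the scope (mov rsp, rcx) frees it again.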
I'm writing a program in assembly to read a disk through ports (0x1f0-0x1f7) and I'm mixing it with C. I have a function in assembly that I call from my C main function. The function has 1 parameter, the sectors to read:
Kernel.c
extern int _readd(int nmrsector);
(...)
int sector = 257;
int error = _readd(sector);
if(error == 0) PrintString("Error"); //It is declared on my screen.h file
disk.asm
global _readd
_readd:
push eax
push ebx
push ecx
push edx
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, eax
cmp ecx, 256
jg short _fail
jne short _good
_fail:
xor eax, eax
leave
ret
_good:
xor eax, eax
mov eax, 12
leave
ret
It crashes when I run it in VirtualBox. Any ideas?
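If you save CPU registers when you enter a function, you need to restore them when you are finished. Your PUSHes need to be matched with POPs. As written, leave sets esp back to ebp and pops the saved ebp, so ret then pops the saved edx as the return address.
Also, if you use a stack frame to access local variables and parameters, set up the frame (push ebp ; mov ebp, esp) before everything else, so you can refer to them more easily. Here [ebp+8] doesn't refer to the parameter, because you alter the stack before setting up the frame: with four registers pushed first, [ebp+8] holds the saved ecx and the parameter is at [ebp+24].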
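The offset of the parameter can be worked out with a little arithmetic. The sketch below assumes a 32-bit stack with 4-byte pushes and a return address pushed by call; first_arg_offset is a hypothetical helper, not part of the asker's code.

```c
/* Hypothetical helper: offset from ebp of the first stack argument when
   `regs_before_frame` 4-byte registers are pushed before `push ebp`. */
int first_arg_offset(int regs_before_frame)
{
    const int saved_ebp = 4;     /* the saved ebp sits at [ebp]        */
    const int return_addr = 4;   /* pushed by the call instruction     */
    return saved_ebp + return_addr + 4 * regs_before_frame;
}
```

With the frame set up first, first_arg_offset(0) gives the usual 8. With eax, ebx, ecx and edx pushed before push ebp, first_arg_offset(4) gives 24, so mov eax, [ebp+8] reads a saved register instead of the sector count.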