This question already has answers here:
Calling C functions from x86 assembly language
(2 answers)
Closed 2 years ago.
I'm doing an assembly x86 program that resolves the determinant of a 2x2 matrix. Well, i've done the driver in C and then I made a function that will be processed in the assembly x86 code.
That's the function in C: int det(int,int,int,int) atribute((_cdecl));
My question is: how do I take these four ints (that i passed like as parameters in that function) in x86? how can I operate with them in the x86 code?
I think that it could be with the stack, but I don't know how.
What you are basically asking is: where are the parameters located, right?
cdecl passes all arguments on the stack, in reverse order. Look up the specs of that calling convention.
So for your function:
int det(int p1, int p2, int p3, int p4);
Your stack will look like this at the beginning of your function:
p4
p3
p2
p1
return address
esp always points to the return address. esp+4 points to p1, esp+8 points to p2 etc..
Please note:
As you are using the cdecl calling convention, you must preserve all registers, except eax, ecx and edx. When you execute ret, all other registers must be the same as they were when you entered the function.
The return value goes in eax.
Here's an example, using NASM syntax. Note that I didn't include a stack frame for the function, because it is a short leaf function, so it is a waste:
; UNTESTED
; int det(int a11, int a12, int a21, int a22)
_det:
mov ecx, [esp + 4]
mov edx, [esp + 12]
imul ecx, edx
mov eax, [esp + 8]
mov edx, [esp + 16]
imul eax, edx
sub ecx, eax
mov eax, ecx
ret
Related
This question already has answers here:
Return value from writing an unused parameter when falling off the end of a non-void function
(1 answer)
Checking return value of a function without return statement
(3 answers)
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
Closed 1 year ago.
I think I have found a problem with the way functions are handled by the gcc compiler.
I don't know if it's a mistake or a never distraction on something I've let slip over the years.
In practice, by declaring a function and defining the latter having a return value, the compiler stores the value of the first variable allocated in the range of the function in the EAX register, and then stores it, in turn, within a variable. Example:
#include<stdio.h>
int add(int a, int b)
{
int c = a + b;
;there isn't return
}
int main(void)
{
int res = add(3, 2);
return 0;
}
This is the output:
5
This is the x86-64 Assembly with intel syntax:
Function add:
push rbp
mov rbp, rsp
mov DWORD PTR[rbp-0x14], edi ;store first
mov DWORD PTR[rbp-0x18], esi ;store second
mov edx, DWORD PTR[rbp-0x14]
mov eax, DWORD PTR[rbp-0x18]
add eax, esx
mov DWORD PTR[rbp-0x4], eax
nop
pop rbp
ret
Function main:
push rbp
mov rbp, rsp
sub rsp, 0x10
mov esi, 0x2 ;first parameter
mov edi, 0x3 ;second parameter
call 0x1129 <add>
;WHAT??? eax = a + b, why store it?
mov DWORD PTR[rbp-0x4], eax
mov eax, 0x0
leave
ret
As you can see, it saves me the sum of the parameters a and b in the variable c, but then it saves me in the variable res the eax register containing their sum, as if a function returned values.
Is this done because the function was defined with a return value?
What you've done is trigger undefined behavior by failing to return a value from a function and then attempting to use that return value.
This is documented in section 6.9.1p12 of the C standard:
If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
One of the ways that undefined behavior can manifest itself is by the program appearing to work properly, as you've seen. However, there's no guarantee that it will continue to work if for example, you added some unrelated code or compiled with different optimization settings.
eax is the register used to return a value, in this case because it is supposed to return int. So the caller gets whatever happens to be in that registers. However you should have gotten at least a warning, that there is no return statement.
Because your function is pretty small and the compiler decided to use the eax register for it's calculation, it appears to work.
If you switch on optimization or provide a more complex function, the result will be quite different.
there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODE FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy#12", so its indeed finding the proper symbol
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I dont:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy#12
__MyMemcpy#12:
push ebp
mov ebp,esp ; Function prologue generate by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END
I am completing an assignment related to c programming and assembly language. Here is the simple c program :
int multiply(int a, int b) {
int k = 4;
int c,d, e;
c = a*b ;
d = a*b + k*c;
return d;
}
And it's optimised assembly is
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
_multiply PROC
mov eax, DWORD PTR _a$[esp-4]
imul eax, DWORD PTR _b$[esp-4]
lea eax, DWORD PTR [eax+eax*4]
ret 0
_multiply ENDP
I want to know the value of eax register after this line of code in assembly
lea eax, DWORD PTR [eax+eax*4]
I know when add integers in assembly, it stores result in the destination. and when we multiply it stores in eax. so if I call the function multiply( 3 , 8 ), the value of eax register after that line should be 120. Am I correct?
lea is "load effective address".
Instruction sets can have some quite complex multi-register address calculation modes that are generally used just for reading and writing data to memory, but lea allows the programmer to get the address that would be accessed by the instruction.
Effectively, it performs the calculation inside the bracket, returns that value - it doesn't access the memory (which is what bracket usually implies).
In this case it is being used as a quick way to multiply by 5, because the rest of the function has been optimised away!
this is my first question, because I couldn't find anything related to this topic.
Recently, while making a class for my C game engine project I've found something interesting:
struct Stack *S1 = new(Stack);
struct Stack *S2 = new(Stack);
S1->bPush(S1, 1, 2); //at this point
bPush is a function pointer in the structure.
So I wondered, what does operator -> in that case, and I've discovered:
mov r8b,2 ; a char, written to a low point of register r8
mov dl,1 ; also a char, but to d this time
mov rcx,qword ptr [S1] ; this is the 1st parameter of function
mov rax,qword ptr [S1] ; !Why cannot I use this one?
call qword ptr [rax+1A0h] ; pointer call
so I assume -> writes an object pointer to rcx, and I'd like to use it in functions (methods they shall be). So the question is, how can I do something alike
push rcx
// do other call vars
pop rcx
mov qword ptr [this], rcx
before it starts writing other variables of the function. Something with preprocessor?
It looks like you'd have an easier time (and get asm that's the same or more efficient) if you wrote in C++ so you could use language built-in support for virtual functions, and for running constructors on initialization. Not to mention not having to manually run destructors. You wouldn't need your struct Class hack.
I'd like to implicitly pass *this pointer, because as shown in second asm part it does the same thing twice, yes, it is what I'm looking for, bPush is a part of a struct and it cannot be called from outside, but I have to pass the pointer S1, which it already has.
You get inefficient asm because you disabled optimization.
MSVC -O2 or -Ox doesn't reload the static pointer twice. It does waste a mov instruction copying between registers, but if you want better asm use a better compiler (like gcc or clang).
The oldest MSVC on the Godbolt compiler explorer is CL19.0 from MSVC 2015, which compiles this source
struct Stack {
int stuff[4];
void (*bPush)(struct Stack*, unsigned char value, unsigned char length);
};
struct Stack *const S1 = new(Stack);
int foo(){
S1->bPush(S1, 1, 2);
//S1->bPush(S1, 1, 2);
return 0; // prevent tailcall optimization
}
into this asm (Godbolt)
# MSVC 2015 -O2
int foo(void) PROC ; foo, COMDAT
$LN4:
sub rsp, 40 ; 00000028H
mov rax, QWORD PTR Stack * __ptr64 __ptr64 S1
mov r8b, 2
mov dl, 1
mov rcx, rax ;; copy RAX to the arg-passing register
call QWORD PTR [rax+16]
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
int foo(void) ENDP ; foo
(I compiled in C++ mode so I could write S1 = new(Stack) without having to copy your github code, and write it at global scope with a non-constant initializer.)
Clang7.0 -O3 loads into RCX straight away:
# clang -O3
foo():
sub rsp, 40
mov rcx, qword ptr [rip + S1]
mov dl, 1
mov r8b, 2
call qword ptr [rcx + 16] # uses the arg-passing register
xor eax, eax
add rsp, 40
ret
Strangely, clang only decides to use low-byte registers when targeting the Windows ABI with __attribute__((ms_abi)). It uses mov esi, 1 to avoid false dependencies when targeting its default Linux calling convention, not mov sil, 1.
Or if you are using optimization, then it's because even older MSVC is even worse. In that case you probably can't do anything in the C source to fix it, although you might try using a struct Stack *p = S1 local variable to hand-hold the compiler into loading it into a register once and reusing it from there.)
I'm picking up ASM language and trying out the IMUL function on Ubuntu Eclipse C++, but for some reason I just cant seem to get the desired output from my code.
Required:
Multiply the negative elements of an integer array int_array by a specified integer inum
Here's my code for the above:
C code:
#include <stdio.h>
extern void multiply_function();
// Variables
int iaver, inum;
int int_ar[10] = {1,2,3,4,-9,6,7,8,9,10};
int main()
{
inum = 2;
multiply_function();
for(int i=0; i<10; i++){
printf("%d ",int_ar[i]);
}
}
ASM code:
extern int_ar
extern inum
global multiply_function
multiply_function:
enter 0,0
mov ecx, 10
mov eax, inum
multiply_loop:
cmp [int_ar +ecx*4-4], dword 0
jg .ifpositive
mov ebx, [int_ar +ecx*4-4]
imul ebx
cdq
mov [int_ar +ecx*4-4], eax
loop multiply_loop
leave
ret
.ifpositive:
loop multiply_loop
leave
ret
The Problem
For an array of: {1,2,3,4,-9,6,7,8,9,10} and inum, I get the output {1,2,3,4,-1210688460,6,7,8,9,10} which hints at some sort of overflow occurring.
Is there something I'm missing or understood wrong about how the IMUL function in assembly language for x86 works?
Expected Output
The output I expected is {1,2,3,4,-18,6,7,8,9,10}
My Thought Process
My thought process for the above task:
1) Find which array elements in array are negative, for each positive element found, do nothing and continue loop to next element
cmp [int_ar +ecx*4-4], dword 0
jg .ifpositive
.ifpositive:
loop multiply_loop
leave
ret
2) Upon finding the negative element, move its value into register EBX which will serve as SRC in the IMUL SRC function. Then extend register EAX to EAX-EDX where the result is stored in:
mov ebx, [int_ar +ecx*4-4]
imul ebx
cdq
3) Move the result into the negative element of the array by using MOV:
mov [int_ar +ecx*4-4], eax
4) Loop through to the next array element and repeat the above 1)-3)
Reason for Incorrect Values
If we look past the inefficiencies and unneeded code and deal with the real issue it comes down to this instruction:
mov eax, inum
What is inum? You created and initialized a global variable in C called inum with:
int iaver, inum;
[snip]
inum = 2;
inum as a variable is essentially a label to a memory location containing an int (32-bit value). In your assembly code you need to treat inum as a pointer to a value, not the value itself. In your assembly code you need to change:
mov eax, inum
to:
mov eax, [inum]
What your version does is moves the address of inum into EAX. Your code ended up multiplying the address of the variable by the negative numbers in your array. That cause the erroneous values you see. the square brackets around inum tell the assembler you want to treat inum as a memory operand, and that you want to move the 32-bit value at inuminto EAX.
Calling Convention
You appear to be creating a 32-bit program and running it on 32-bit Ubuntu. I can infer the possibility of a 32-bit Linux by the erroneous value of -1210688460 being returned. -1210688460 = 0xB7D65C34 divide by -9 and you get 804A06C. Programs on 32-bit Linux are usually loaded starting at 0x8048000
Whether running on 32-bit Linux or 64-bit Linux, assembly code linked with 32-bit C/C++ programs need to abide by the CDECL calling convention:
cdecl
The cdecl (which stands for C declaration) is a calling convention that originates from the C programming language and is used by many C compilers for the x86 architecture.1 In cdecl, subroutine arguments are passed on the stack. Integer values and memory addresses are returned in the EAX register, floating point values in the ST0 x87 register. Registers EAX, ECX, and EDX are caller-saved, and the rest are callee-saved. The x87 floating point registers ST0 to ST7 must be empty (popped or freed) when calling a new function, and ST1 to ST7 must be empty on exiting a function. ST0 must also be empty when not used for returning a value.
Your code clobbers EAX, EBX, ECX, and EDX. You are free to destroy the contents of EAX, ECX, and EDX but you must preserve EBX. If you don't you can cause problems for the C code calling the function. After you do the enter 0,0 instruction you can push ebx and just before each leave instruction you can do pop ebx
If you were to use -O1, -O2, or -O3 GCC compiler options to enable optimizations your program may not work as expected or crash altogether.