I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in visual studio and when I run this, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?
To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes addition debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each)
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an asm{} block is MASM-like, not C statements.)
Related
there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODE FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy#12", so its indeed finding the proper symbol
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I dont:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy#12
__MyMemcpy#12:
push ebp
mov ebp,esp ; Function prologue generate by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END
I'm trying to understand this simple C program:
int square(int num) {
return num * num;
}
When it's in Assembly code:
square(int):
push rbp ;push rbp register onto stack
mov rbp, rsp ;move contents of rbp register into rsp register
mov DWORD PTR [rbp-4], edi ;not sure what happens here
mov eax, DWORD PTR [rbp-4] ;not sure what happens here
imul eax, DWORD PTR [rbp-4] ;multiply eax and DWORD PTR [rbp-4] (?)
pop rbp ;pop original register out of stack
ret ;return
What is happening in the 3rd and 4th line?
Why did two more register (edi and eax) have to be used instead of rsp?
What is actually happening with DWORD PTR [rbp-4]?
mov DWORD PTR [rbp-4], edi ;not sure what happens here
The x86_64 System V ABI passes function arguments via registers - the first integer argument is passed in the rdi/edi register. So this line copies the argument num to a local (offset -4 bytes from the frame pointer value stored in rbp).
mov eax, DWORD PTR [rbp-4] ;not sure what happens here
This copies the value in the local to the eax register.
imul eax, DWORD PTR [rbp-4] ;multiply eax and DWORD PTR [rbp-4] (?)
And this multiplies the value in eax by the local, and stores the result to eax (which also happens to be the register in which the function return value is stored).
As others pointed out in the comments, compiling with optimization would likely eliminate the local, and write directly from edi to eax.
I would like to know what's really happening calling & and * in C.
Is that it costs a lot of resources? Should I call & each time I wanna get an adress of a same given variable or keep it in memory i.e in a cache variable. Same for * i.e when I wanna get a pointer value ?
Example
void bar(char *str)
{
check_one(*str)
check_two(*str)
//... Could be replaced by
char c = *str;
check_one(c);
check_two(c);
}
I would like to know what's really happening calling & and * in C.
There's no such thing as "calling" & or *. They are the address operator, or the dereference operator, and instruct the compiler to work with the address of an object, or with the object that a pointer points to, respectively.
And C is not C++, so there's no references; I think you just misused that word in your question's title.
In most cases, that's basically two ways to look at the same thing.
Usually, you'll use & when you actually want the address of an object. Since the compiler needs to handle objects in memory with their address anyway, there's no overhead.
For the specific implications of using the operators, you'll have to look at the assembler your compiler generates.
Example: consider this trivial code, disassembled via godbolt.org:
#include <stdio.h>
#include <stdlib.h>
void check_one(char c)
{
if(c == 'x')
exit(0);
}
void check_two(char c)
{
if(c == 'X')
exit(1);
}
void foo(char *str)
{
check_one(*str);
check_two(*str);
}
void bar(char *str)
{
char c = *str;
check_one(c);
check_two(c);
}
int main()
{
char msg[] = "something";
foo(msg);
bar(msg);
}
The compiler output can far wildly depending on the vendor and optimization settings.
clang 3.8 using -O2
check_one(char): # #check_one(char)
movzx eax, dil
cmp eax, 120
je .LBB0_2
ret
.LBB0_2:
push rax
xor edi, edi
call exit
check_two(char): # #check_two(char)
movzx eax, dil
cmp eax, 88
je .LBB1_2
ret
.LBB1_2:
push rax
mov edi, 1
call exit
foo(char*): # #foo(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB2_3
movzx eax, al
cmp eax, 120
je .LBB2_2
pop rax
ret
.LBB2_3:
mov edi, 1
call exit
.LBB2_2:
xor edi, edi
call exit
bar(char*): # #bar(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB3_3
movzx eax, al
cmp eax, 120
je .LBB3_2
pop rax
ret
.LBB3_3:
mov edi, 1
call exit
.LBB3_2:
xor edi, edi
call exit
main: # #main
xor eax, eax
ret
Notice that foo and bar are identical. Do other compilers do something similar? Well...
gcc x64 5.4 using -O2
check_one(char):
cmp dil, 120
je .L6
rep ret
.L6:
push rax
xor edi, edi
call exit
check_two(char):
cmp dil, 88
je .L11
rep ret
.L11:
push rax
mov edi, 1
call exit
bar(char*):
sub rsp, 8
movzx eax, BYTE PTR [rdi]
cmp al, 120
je .L16
cmp al, 88
je .L17
add rsp, 8
ret
.L16:
xor edi, edi
call exit
.L17:
mov edi, 1
call exit
foo(char*):
jmp bar(char*)
main:
sub rsp, 24
movabs rax, 7956005065853857651
mov QWORD PTR [rsp], rax
mov rdi, rsp
mov eax, 103
mov WORD PTR [rsp+8], ax
call bar(char*)
mov rdi, rsp
call bar(char*)
xor eax, eax
add rsp, 24
ret
Well, if there were any doubt foo and bar are equivalent, a least by the compiler, I think this:
foo(char*):
jmp bar(char*)
is a strong argument they indeed are.
In C, there's no runtime cost associated with either the unary & or * operators; both are evaluated at compile time. So there's no difference in runtime between
check_one(*str)
check_two(*str)
and
char c = *str;
check_one( c );
check_two( c );
ignoring the overhead of the assignment.
That's not necessarily true in C++, since you can overload those operators.
tldr;
If you are programming in C, then the & operator is used to obtain the address of a variable and * is used to get the value of that variable, given it's address.
This is also the reason why in C, when you pass a string to a function, you must state the length of the string otherwise, if someone unfamiliar with your logic sees the function signature, they could not tell if the function is called as bar(&some_char) or bar(some_cstr).
To conclude, if you have a variable x of type someType, then &x will result in someType* addressOfX and *addressOfX will result in giving the value of x. Functions in C only take pointers as parameters, i.e. you cannot create a function where the parameter type is &x or &&x
Also your examples can be rewritten as:
check_one(str[0])
check_two(str[0])
AFAIK, in x86 and x64 your variables are stored in memory (if not stated with register keyword) and accessed by pointers.
const int foo = 5 equal to foo dd 5 and check_one(*foo) equal to push dword [foo]; call check_one.
If you create additional variable c, then it looks like:
c resd 1
...
mov eax, [foo]
mov dword [c], eax ; Variable foo just copied to c
push dword [c]
call check_one
And nothing changed, except additional copying and memory allocation.
I think that compiler's optimizer deals with it and makes both cases as fast as it is possible. So you can use more readable variant.
I compiled this C code:
void foo() {
int i = 0;
i = 0;
i = 0;
}
and I got this:
push ebp
mov ebp,esp
push ecx
mov dword ptr ss:[ebp-4],0
mov dword ptr ss:[ebp-4],0
mov dword ptr ss:[ebp-4],0
mov esp,ebp
pop ebp
retn
My question is why is there push ecx? and how come there is no sub esp,4 or something to make space on the stack? No compiler options used.
Either way will make 4 bytes of space available on the stack, and the push saves a couple of bytes over the sub. Maybe the compiler writer decided to optimize this case by pushing a register.
Compiling this simple function with MSVC2008, in Debug mode:
int __cdecl sum(int a, int b)
{
return a + b;
}
I get the following disassembly listing:
int __cdecl sum(int a, int b)
{
004113B0 push ebp
004113B1 mov ebp,esp
004113B3 sub esp,0C0h
004113B9 push ebx
004113BA push esi
004113BB push edi
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
return a + b;
004113CE mov eax,dword ptr [a]
004113D1 add eax,dword ptr [b]
}
004113D4 pop edi
004113D5 pop esi
004113D6 pop ebx
004113D7 mov esp,ebp
004113D9 pop ebp
004113DA ret
There are some parts of the prolog I don't understand:
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
Why is this required?
EDIT:
After removing the /RTC compiler option, as was suggested, most of this code indeed went away. What remained is:
int __cdecl sum(int a, int b)
{
00411270 push ebp
00411271 mov ebp,esp
00411273 sub esp,40h
00411276 push ebx
00411277 push esi
00411278 push edi
return a + b;
00411279 mov eax,dword ptr [a]
0041127C add eax,dword ptr [b]
}
Now, why is the: sub esp, 40h needed? It's as if place is being allocated for local variables, though there aren't any. Why is the compiler doing this? Is there another flag involved?
This code is emitted due to the /RTC compile option. It initializes all local variables in your function to a bit pattern that is highly likely to generate an access violation or to cause unusual output values. That helps you find out when you forgot to initialize a variable.
The extra space in the stack frame you see allocated is there to support the Edit + Continue feature. This space will be used when you edit the function while debugging and add more local variables. Change the /ZI option to /Zi to disable it.
and in any case of buffer overflow (if you would overwrite local variables) you will end up in a field of "int 3" opcodes:
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
...
that can be catched by the debugger, so you can fix your code