cl.exe produces weird assembly code - c

I compiled this C code:
void foo() {
int i = 0;
i = 0;
i = 0;
}
and I got this:
push ebp
mov ebp,esp
push ecx
mov dword ptr ss:[ebp-4],0
mov dword ptr ss:[ebp-4],0
mov dword ptr ss:[ebp-4],0
mov esp,ebp
pop ebp
retn
My question is: why is there a push ecx? And why is there no sub esp,4 or something similar to make space on the stack? No compiler options were used.

Either way will make 4 bytes of space available on the stack, and the push saves a couple of bytes over the sub. Maybe the compiler writer decided to optimize this case by pushing a register.

Related

how to save the value of ESP during a function call

I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in visual studio and when I run this, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?
To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes additional debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each)
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an __asm {} block is MASM-like, not C statements.)

Stack cleanup not working (__stdcall MASM function)

there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODEL FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) on the stack and then cleaning them up. I have confirmed by looking at the symbols that the function's symbol looks like "memcpy@12", so it is indeed finding the proper symbol.
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy;
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I don't:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap, this can cause problems. If the buffers don't overlap, then what you are doing should work. You can make that assumption explicit by marking the pointers __restrict. __restrict is an MSVC C/C++ extension that acts as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation, since your assembly code is unsafe for it. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From the perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue, though: the Microsoft Windows STDCALL calling convention requires the called function (the callee) to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI, and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This requires adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
#include <cstring>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that does this very operation (REP MOVSB): it copies ECX bytes from the address in ESI to the address in EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI, you could list them with the USES directive. You can also reference the argument locations on the stack by name, and you can have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean up the stack appropriately and, in the case of STDCALL, return by removing the specified number of bytes from the stack (i.e. ret 0ch in this case, since there are three 4-byte arguments).
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy@12
__MyMemcpy@12:
push ebp
mov ebp,esp ; Function prologue generated by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't switch to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END

Assembly and callstack

I'm trying to get an understanding of assembly, but unfortunately I'm having trouble understanding how the following C code translates to assembly:
void test_function(int a, int b, int c, int d) {
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1,2,3,4);
}
The assembly of main() looks like this:
push ebp
mov ebp, esp
sub esp,0x18
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
mov DWORD PTR [esp+12], 0x4
mov DWORD PTR [esp+8], 0x3
mov DWORD PTR [esp+4], 0x2
mov DWORD PTR [esp], 0x1
call <test_function>
The assembly for test_function(...) looks like this:
push ebp
mov ebp, esp
sub esp,0x28
mov DWORD PTR [ebp-12], 0x7a69 ;this is 31337 in hexadecimal
mov BYTE PTR [ebp-40], 0x41 ;this is the 'A' in ASCII
leave
ret
What is hard for me to understand is:
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
Why do we AND esp with 0xfffffff0?
And why do we move 0 into eax and then subtract the contents of eax from esp?
Second:
Through sub esp,0x28 we are allocating 40 bytes of RAM. Why 40? The integer and the 10 chars of the array are altogether only 14 bytes, aren't they?
And why are we moving 0x7a69 to the position [ebp-12] and not to [ebp]? By operating mov ebp, esp I set ebp to the current ESP. Now ESP is pointing to the end of the stack. The last value I pushed on the stack was the ebp by operating push ebp. So EBP (= esp) points behind the saved ebp. So why couldn't I move 0x7a69 to [ebp] just directly behind the saved EBP?
And why is the 'A' moved to [ebp-40]?
This seems to be some standard compiler-generated assembler.
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
The and makes esp a multiple of 16, i.e. aligns it on a 16-byte boundary. Because the stack grows downward, this is essentially a subtraction, not an addition.
The mov and sub that follow reserve space for the local variables. In main there are no local variables, so their total is 0x0. Because test_function has local variables, it subtracts 0x28 from esp. The compiler has probably also aligned this size to some multiple. Lastly, [ebp-40] is the location in the reserved stack space that the compiler has assigned to buffer.

Mixing C and Assembly

I'm writing a program in assembly to read a disk through ports (0x1f0-0x1f7) and I'm mixing it with C. I have a function in assembly that I call from my C main function. The assembly function has 1 parameter, the sectors to read:
Kernel.c
extern int _readd(int nmrsector);
(...)
int sector = 257;
int error = _readd(sector);
if(error == 0) PrintString("Error"); //It is declared on my screen.h file
disk.asm
global _readd
_readd:
push eax
push ebx
push ecx
push edx
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, eax
cmp ecx, 256
jg short _fail
jne short _good
_fail:
xor eax, eax
leave
ret
_good:
xor eax, eax
mov eax, 12
leave
ret
It crashes when I run it in VirtualBox. Any ideas?
If you save CPU registers when you enter a function, you need to restore them when you are finished. Your PUSHs need to be matched with POPs.
Also, if you use a stack frame to access local variables and parameters, setup the frame (push ebp ; mov ebp, esp) before everything, so you can more easily refer to them. Here [ebp+8] doesn't refer to a parameter, because you alter the stack before setting up the frame.

Understanding the C function call prolog with __cdecl on windows

Compiling this simple function with MSVC2008, in Debug mode:
int __cdecl sum(int a, int b)
{
return a + b;
}
I get the following disassembly listing:
int __cdecl sum(int a, int b)
{
004113B0 push ebp
004113B1 mov ebp,esp
004113B3 sub esp,0C0h
004113B9 push ebx
004113BA push esi
004113BB push edi
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
return a + b;
004113CE mov eax,dword ptr [a]
004113D1 add eax,dword ptr [b]
}
004113D4 pop edi
004113D5 pop esi
004113D6 pop ebx
004113D7 mov esp,ebp
004113D9 pop ebp
004113DA ret
There are some parts of the prolog I don't understand:
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
Why is this required?
EDIT:
After removing the /RTC compiler option, as was suggested, most of this code indeed went away. What remained is:
int __cdecl sum(int a, int b)
{
00411270 push ebp
00411271 mov ebp,esp
00411273 sub esp,40h
00411276 push ebx
00411277 push esi
00411278 push edi
return a + b;
00411279 mov eax,dword ptr [a]
0041127C add eax,dword ptr [b]
}
Now, why is the: sub esp, 40h needed? It's as if place is being allocated for local variables, though there aren't any. Why is the compiler doing this? Is there another flag involved?
This code is emitted due to the /RTC compile option. It initializes all local variables in your function to a bit pattern that is highly likely to generate an access violation or to cause unusual output values. That helps you find out when you forgot to initialize a variable.
The extra space in the stack frame you see allocated is there to support the Edit + Continue feature. This space will be used when you edit the function while debugging and add more local variables. Change the /ZI option to /Zi to disable it.
Also, in case of a buffer overflow (if you overwrite local variables), execution will end up in a field of "int 3" opcodes:
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
...
which can be caught by the debugger, so you can fix your code.

Resources