Assembler program crashes after call to printf

I am writing a simple function to print a float value from stack. This function is generated, so it is not optimized. Program crashes at printf call.
;input: float32 as dword ptr ebp+8
printfloat32:
push ebp
mov ebp, esp
sub esp, 16
;local ptr variable k at dword ptr ebp-4
mov dword ptr ebp-4, lpcstr4 ;which is "%f"
movss xmm0, dword ptr ebp+8
cvtss2sd xmm0, xmm0
sub esp, 8
movsd qword ptr esp, xmm0
push dword ptr ebp-4
call printstr
add esp, 12
mov esp, ebp
pop ebp
ret
printstr is printf. Here is the full generated code: https://pastebin.com/g0Wff0JY

Looking at the image, I see what could be a potential issue, but I don't know fasm syntax:
call [printstr] ;syntax used for the first call
...
call printstr ;syntax used for the second call that fails
If printstr is a memory based pointer to the function, then the second call syntax may be trying to make a call to where the pointer is stored, rather than calling the actual function by using the value in memory as a pointer to function.
In the case of recent versions of Visual Studio, the default printf and scanf are effectively inlined into C/C++ code, with fairly complex syntax. Rather than deal with this, there are callable legacy versions available that require this includelib statement:
includelib legacy_stdio_definitions.lib ;for scanf, printf, ...
I converted the code from the question to masm syntax, changed printstr to printf, and tested a 32 bit build using Visual Studio 2015 on Windows 7 Pro 64 bit (the build is 32 bit so it ran in 32 bit mode). I didn't have any issues with this code. I stepped through the code using the debugger and didn't see any issues with how values were stored on the stack. I suspect the issue is with the second call to printstr without the brackets, which I corrected as part of the conversion to masm syntax.
.data
varf real4 123.75
lpcstr4 db "%f",00ah,0 ;added new line
.code
extern printf:near ;instead of printstr
printfloat32 proc
push ebp
mov ebp,esp
sub esp,16
mov dword ptr [ebp-4], offset lpcstr4
movss xmm0,dword ptr [ebp+8]
cvtss2sd xmm0,xmm0
sub esp,8
movsd qword ptr [esp],xmm0
push dword ptr [ebp-4]
call printf ;was printstr
add esp,12
mov esp,ebp
pop ebp
ret
printfloat32 endp
main proc
push varf ;test printfloat32 function
call printfloat32
add esp,4
xor eax,eax
ret
main endp
end
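For comparison, here is a minimal C sketch of what the assembly does before the call: printf's %f conversion expects a double, so a float argument is widened first (the cvtss2sd step) and takes 8 bytes on the stack. The helper name format_float32 is hypothetical, and it writes to a buffer with snprintf instead of calling printf so the result is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): formats a 32-bit float
   the way the assembly passes it to printf. */
int format_float32(char *out, size_t cap, float value)
{
    assert(out != NULL && cap > 0);
    /* Variadic functions receive float arguments as double; this explicit
       cast mirrors the cvtss2sd instruction. */
    double widened = (double)value;
    return snprintf(out, cap, "%f\n", widened);
}
```

Called with 123.75f, the buffer holds 123.750000 followed by a newline, which matches the format string used in the masm version.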
Using printstr as a pointer to printf. Masm doesn't need brackets, since it knows printstr is a dd (pointer to printf).
.code
extern printf:near
printstr dd printf ;masm doesn't need brackets
printfloat32 proc
; ...
call printstr ;masm doesn't need brackets
; ...
printfloat32 endp
If printstr was external to this source file, the masm syntax would be
extrn printstr:ptr ; or extern printstr:dword
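The same distinction exists in C, which may make it easier to see. In this sketch printstr is a pointer variable that holds the address of printf (mirroring printstr dd printf). Calling through it makes the compiler load the pointer and then call the address it contains, which is what call [printstr] does in fasm. The wrapper name print_value is an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* printstr is a variable holding printf's address, like "printstr dd printf". */
static int (*printstr)(const char *format, ...) = printf;

/* Hypothetical wrapper: prints one value through the pointer. */
int print_value(double value)
{
    assert(printstr != NULL);
    /* Indirect call: the pointer is read from memory, then called. */
    return printstr("%f\n", value);
}
```

A direct call to the address of the variable itself, which is what call printstr without brackets would mean in fasm, has no equivalent in well-formed C; the compiler always dereferences the pointer for you.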

Related

how to save the value of ESP during a function call

I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in visual studio and when I run this, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?
To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes additional debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each).
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
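The stack arithmetic behind this can be sketched in C. The function esp_delta is a hypothetical helper that only models how far ESP moves; it assumes the stack grows downward, so pushes subtract and pops add.

```c
#include <assert.h>

/* Hypothetical model: net change in ESP after a run of pushes and pops.
   Sizes are in bytes; a negative result means ESP ended up lower. */
int esp_delta(int pushes, int push_bytes, int pops, int pop_bytes)
{
    assert(pushes >= 0 && pops >= 0);
    return pops * pop_bytes - pushes * push_bytes;
}
```

Two 4-byte pushes followed by two 2-byte pops give a delta of -4, so the pop ebx and pop ebp that follow read the wrong slots and ret finds the saved EBP where the return address should be. With matching 4-byte pops the delta is 0.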
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an asm{} block is MASM-like, not C statements.)

Why is the first actual parameter printed as output in C?

#include <stdio.h>
int add(int a, int b)
{
if (a > b)
return a * b;
}
int main(void)
{
printf("%d", add(3, 7));
return 0;
}
Output:
3
In the above code, I am calling the function inside the printf call. In the function, the if condition is not true, so its body won't execute. Then why am I getting 3 as output? I tried changing the first parameter to some other value, but it always prints that first parameter when the if condition is not satisfied.
What happens here is called undefined behaviour.
When (a <= b), you don't return any value (and your compiler probably told you so). But if you use the return value of the function anyway, even if the function doesn't return anything, that value is garbage. In your case it is 3, but with another compiler or with other compiler flags it could be something else.
If your compiler didn't warn you, add the corresponding compiler flags. If your compiler is gcc or clang, use the -Wall compiler flag.
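A minimal sketch of a version with defined behaviour, assuming you want some fallback result when the condition is false. The name add_defined and the fallback value 0 are assumptions; pick whatever result makes sense for your program.

```c
/* Hypothetical fix: every path through the function returns a value. */
int add_defined(int a, int b)
{
    if (a > b)
        return a * b;
    return 0; /* assumed fallback; the question does not say what is wanted */
}
```

With this version add_defined(3, 7) returns 0 and add_defined(7, 3) returns 21, regardless of compiler or flags.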
Jabberwocky is right: this is undefined behavior. You should turn your compiler warnings on and listen to them.
However, I think it can still be interesting to see what the compiler was thinking. And we have a tool to do just that: Godbolt Compiler Explorer.
We can plug your C program into Godbolt and see what assembly instructions it outputs. Here's the direct Godbolt link, and here's the assembly that it produces.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
Again, to be perfectly clear, what you've done is undefined behavior. With different compiler flags or a different compiler version or even just a compiler that happens to feel like doing things differently on a particular day, you will get different behavior. What I'm studying here is the assembly output by gcc 12.2 on Godbolt with optimizations disabled, and I am not representing this as standard or well-defined behavior.
This engine is using the System V AMD64 calling convention, common on Linux machines. In System V, the first two integer or pointer arguments are passed in the rdi and rsi registers, and integer values are returned in rax. Since everything we work with here is either an int or a char*, this is good enough for us. Note that the compiler seems to have been smart enough to figure out that it only needs edi, esi, and eax, the lower 32-bit halves of these registers, so I'll start using edi, esi, and eax from this point on.
Our main function works fine. It does everything we'd expect. Our two function calls are here.
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
To call add, we put 3 in the edi register and 7 in the esi register and then we make the call. We get the return value back from add in eax, and we move it to esi (since it will be the second argument to printf). We put the address of the static memory containing "%d" in edi (the first argument), and then we call printf. This is all normal. main knows that add was declared to return an integer, so it has the right to assume that, after calling add, there will be something useful in eax.
Now let's look at add.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
The rbp and rsp shenanigans are standard function call fare and aren't specific to add. First, we load our two arguments onto the call stack as local variables. Now here's where the undefined behavior comes in. Remember that I said eax is the return value of our function. Whatever happens to be in eax when the function returns is the returned value.
We want to compare a and b. To do that, we need a to be in a register (lots of assembly instructions require their left-hand argument to be a register, while the right-hand can be a register, reference, immediate, or just about anything). So we load a into eax. Then we compare the value in eax to the value b on the call stack. If a > b, then the jle does nothing. We go down to the next two lines, which are the inside of your if statement. They correctly set eax and return a value.
However, if a <= b, then the jle instruction jumps to the end of the function without doing anything else to eax. Since the last thing in eax happened to be a (because we happened to use eax as our comparison register in cmp), that's what gets returned from our function.
But this really is just random. It's what the compiler happened to have put in that register previously. If I turn optimizations up (with -O3), then gcc inlines the whole function call and ends up printing out 0 rather than a. I don't know exactly what sequence of optimizations led to this conclusion, but since they started out by hinging on undefined behavior, the compiler is free to make what assumptions it chooses.

Stack cleanup not working (__stdcall MASM function)

There's something weird going on here. Visual Studio is telling me the ESP value was not properly saved, but I cannot see any mistakes in the code (32-bit, Windows, __stdcall).
MASM code:
.MODEL FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) on the stack and then cleaning them up. I have confirmed by looking at the symbols that the function's symbol is "memcpy@12", so it is indeed finding the proper symbol.
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I don't:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap, this can cause problems. If the buffers don't overlap, then what you are doing should work. You can state this requirement by marking the pointers __restrict. __restrict is an MSVC C/C++ extension that acts as a hint to the compiler that the argument doesn't overlap with another. It does not make the assembly code safe for overlapping buffers, but it may allow the compiler to warn about that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From the perspective of the STDCALL calling convention, your code is correct as it pertains to stack usage. There is a serious issue, though: the Microsoft Windows STDCALL calling convention requires the callee to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI, and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <cstring> // for strlen
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that does this very operation (REP MOVSB): it copies ECX bytes from the address in ESI to the address in EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
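As a reference for what the routine should do, here is a rough C equivalent of the byte copy. The name copy_bytes is hypothetical. One difference worth noting: a LOOP-based copy entered with ECX equal to 0 decrements ECX to 0FFFFFFFFh and keeps going, while REP MOVSB with ECX equal to 0 copies nothing, which is the behaviour this sketch follows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the REP MOVSB version: forward byte copy.
   Like the assembly, it is only correct for non-overlapping buffers. */
void *copy_bytes(void *dest, const void *src, size_t size)
{
    assert(size == 0 || (dest != NULL && src != NULL));
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (size--)      /* a size of 0 skips the loop entirely */
        *d++ = *s++;
    return dest;
}
```

Passing the length plus one (or sizeof on an array) also copies the terminating zero, which the strlen-based call in the test program leaves out.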
If you were to use the power of PROC to save the registers ESI and EDI, you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean up the stack appropriately and, in the case of STDCALL, return by removing the specified number of bytes from the stack (i.e. ret 0ch in this case, since there are three 4-byte arguments).
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy@12
__MyMemcpy@12:
push ebp
mov ebp,esp ; Function prologue generated by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't switch to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END

Mixing C and Assembly

I'm writing a program in assembly to read a disk through ports (0x1f0-0x1f7) and I'm mixing it with C. I have a function in assembly that I call from my C main function. The assembly function has 1 parameter: the sectors to read:
Kernel.c
extern int _readd(int nmrsector);
(...)
int sector = 257;
int error = _readd(sector);
if(error == 0) PrintString("Error"); //It is declared on my screen.h file
disk.asm
global _readd
_readd:
push eax
push ebx
push ecx
push edx
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, eax
cmp ecx, 256
jg short _fail
jne short _good
_fail:
xor eax, eax
leave
ret
_good:
xor eax, eax
mov eax, 12
leave
ret
It crashes when I run it with VirtualBox. Any ideas?
If you save CPU registers when you enter a function, you need to restore them when you are finished. Your PUSHs need to be matched with POPs.
Also, if you use a stack frame to access local variables and parameters, setup the frame (push ebp ; mov ebp, esp) before everything, so you can more easily refer to them. Here [ebp+8] doesn't refer to a parameter, because you alter the stack before setting up the frame.
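To see where the parameter actually ends up, the offset arithmetic can be sketched in C. This assumes 32-bit code with 4-byte pushes; param_offset_from_ebp is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: offset of the first stack parameter from EBP when
   `pushes_before_frame` registers are pushed before `push ebp`.
   Layout above EBP: saved EBP (4), the extra pushes, return address (4). */
int param_offset_from_ebp(int pushes_before_frame)
{
    assert(pushes_before_frame >= 0);
    return 4 + 4 * pushes_before_frame + 4;
}
```

In the posted code four registers are pushed before push ebp, so the first parameter is at [ebp+24] and [ebp+8] is the saved ecx. With the frame set up first, the helper gives the usual [ebp+8].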

Difference between ebp based addressing and esp addressing

I have written some code to learn about the call stack. I've done this with some inline assembly for passing parameters on stack. I've compiled it with gcc 4.1.2(on CentOS5.4) and it works well, then I compile it with gcc 4.8.4(on Ubuntu14.04.3) and run the program but it always crashes.
I discovered that there are differences in how the variables are referenced. The local variable is addressed using the EBP register in gcc 4.1.2 (CentOS 5.4), while it is addressed using the ESP register in gcc 4.8.4 (Ubuntu 14.04.3). This seems to be why it crashes.
My question is, how can I control whether gcc uses EBP or ESP? Also, what is the difference between them?
Here is the C code:
double fun(double d) {
return d;
}
int main(void) {
double a = 1.6;
double (*myfun)() = fun;
asm volatile("subl $8, %esp\n"
"fstpl (%esp)\n");
myfun();
asm volatile("addl $8, %esp\n");
return 0;
}
Here is the assembly in gcc 4.1.2, and it works
int main(void) {
**......**
double a = 1.6;
0x080483bf <+17>: fldl 0x80484d0
0x080483c5 <+23>: fstpl -0x18(%ebp)
double (*myfun) () = fun;
0x080483c8 <+26>: movl $0x8048384,-0xc(%ebp)
asm volatile("subl $8, %esp\n"
"fstpl (%esp)\n");
0x080483cf <+33>: sub $0x8,%esp
0x080483d2 <+36>: fstpl (%esp)
myfun();
0x080483d5 <+39>: mov -0xc(%ebp),%eax
0x080483d8 <+42>: call *%eax
0x080483da <+44>: fstp %st(0)
asm volatile("addl $8, %esp\n");
0x080483dc <+46>: add $0x8,%esp
**......**
here is the assembly in gcc 4.8.4. This is what crashes:
int main(void) {
**......**
double a = 1.6;
0x0804840d <+9>: fldl 0x80484d0
0x08048413 <+15>: fstpl 0x8(%esp)
double (*myfun)() = fun;
0x08048417 <+19>: movl $0x80483ed,0x4(%esp)
asm volatile("subl $8,%esp\n"
"fstpl (%esp)\n");
0x0804841f <+27>: sub $0x8,%esp
0x08048422 <+30>: fstpl (%esp)
myfun();
0x08048425 <+33>: mov 0x4(%esp),%eax
0x08048429 <+37>: call *%eax
0x0804842b <+39>: fstp %st(0)
asm volatile("addl $8,%esp\n");
0x0804842d <+41>: add $0x8,%esp
**......**
There's no real difference between using esp and ebp, except that esp changes with push, pop, call, ret, which sometimes makes it difficult to know where a certain local variable or parameter is located in the stack. That's why ebp gets loaded with esp, so that there is a stable reference point to refer to the function arguments and the local variables.
For a function like this:
int foo( int arg ) {
int a, b, c, d;
....
}
the following assembly is usually generated:
# using Intel syntax, where `mov eax, ebx` puts the value in `ebx` into `eax`
.intel_syntax noprefix
foo:
push ebp # preserve
mov ebp, esp # remember stack
sub esp, 16 # allocate local variables a, b, c, d
...
mov esp, ebp # de-allocate the 16 bytes
pop ebp # restore ebp
ret
Calling this method (foo(0)) would generate something like this:
pushd 0 # the value for arg; esp becomes esp-4
call foo
add esp, 4 # free the 4 bytes of the argument 'arg'.
Immediately after the call instruction has executed, right before the first instruction of the foo method is executed, [esp] will hold the return address, and [esp+4] the 0 value for arg.
In method foo, if we wanted to load arg into eax (at the ...)
we could use:
mov eax, [ebp + 4 + 4]
because [ebp + 0] holds the previous value of ebp (from the push ebp),
and [ebp + 4] (the original value of esp), holds the return address.
But we could also reference the parameter using esp:
mov eax, [esp + 16 + 4 + 4]
We add 16 because of the sub esp, 16, then 4 because of the push ebp, and another 4 to skip the return address, to arrive at arg.
Similarly accessing the four local variables can be done in two ways:
mov eax, [ebp - 4]
mov eax, [ebp - 8]
mov eax, [ebp - 12]
mov eax, [ebp - 16]
or
mov eax, [esp + 12]
mov eax, [esp + 8]
mov eax, [esp + 4]
mov eax, [esp + 0]
But whenever esp changes, these instructions must change as well. So, in the end, it does not matter whether esp or ebp is used. It might be more efficient to use esp, since you don't need the push ebp; mov ebp, esp; ... mov esp, ebp; pop ebp sequence.
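The conversion between the two addressing styles is a single addition, sketched here in C. The helper name esp_offset is hypothetical; bytes_below_ebp is how far esp currently sits below ebp.

```c
#include <assert.h>

/* Hypothetical helper: turn an ebp-relative offset into an esp-relative one.
   bytes_below_ebp = ebp - esp at the point of the access. */
int esp_offset(int ebp_offset, int bytes_below_ebp)
{
    assert(bytes_below_ebp >= 0);
    return ebp_offset + bytes_below_ebp;
}
```

For foo, with 16 bytes of locals, arg at [ebp + 8] becomes [esp + 24] and the local at [ebp - 4] becomes [esp + 12]. If something subtracts another 8 bytes from esp, every esp-relative offset grows by 8 while the ebp-relative ones stay the same.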
UPDATE
As far as I can tell, there's no way to guarantee your inline assembly will work: gcc 4.8.4 on Ubuntu optimizes out the use of ebp and references everything with esp. It doesn't know that your inline assembly modifies esp, so when it tries to call myfun(), it fetches it from [esp + 4], but it should have fetched it from [esp + 4 + 8].
Here is a workaround: don't use local variables (or parameters) in the function where you use inline assembly that does stack manipulation. To bypass the problem of casting double fun(double) to double fn() call the function directly in assembly:
void my_call() {
asm volatile("subl $8, %esp\n"
"fstpl (%esp)\n"
"call fun\n"
"addl $8, %esp\n");
}
int main(void) {
my_call();
return 0;
}
You could also place the my_call function in a separate .s (or .S) file:
.text
.global my_call
my_call:
subl $8, %esp
fstpl (%esp)
call fun
addl $8, %esp
ret
and in C:
extern double my_call();
You could also pass fun as an argument:
extern double my_call( double (*myfun)() );
...
my_call( fun );
and
.text
.global my_call
my_call:
sub $8, %esp
fstpl (%esp)
call *12(%esp)
add $8, %esp
ret
Most compilers create EBP-based stack frames. Or, at least they used to. This is the method that most people are taught that utilizes using EBP as a fixed base frame pointer.
Some compilers create ESP-based stack frames. The reason is simple. It frees up EBP to be used for other uses, and removes the overhead of setting up and restoring the stack frame. It is clearly much harder to visualize, since the stack pointer can be constantly changing.
The problem you are having might be because you are calling APIs that use the stdcall calling convention, which end up trashing your stack, unintentionally, when they return to the caller. EBP must be preserved by the callee in both cdecl and stdcall functions. However, stdcall routines clean up the stack themselves, with ret 4 for example, thus shrinking it. The caller must compensate for this and reallocate space on the stack after the call returns.
GCC has the option -fomit-frame-pointer which will turn off EBP-based frames. It's on by default at most optimization levels. You can use -O2 -fno-omit-frame-pointer to optimize normally except for still setting up EBP as a frame pointer.
If you want to learn about the stack and parameter passing conventions (ABI), I suggest you look at the assembly generated by the compiler. You can do this interactively on this site: http://gcc.godbolt.org/#
Try various argument types, variadic functions, passing and returning floats, doubles, structures of different sizes...
Messing with the stack using inline assembly is too difficult and unpredictable. It is likely to fail in so many ways, you will not learn anything useful.
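If the goal is only to call fun through a pointer, a plain C version avoids the problem, because the compiler then knows about the argument and adjusts the stack itself. This is a sketch rather than a replacement for the experiment in the question; call_through_pointer is a hypothetical name.

```c
/* Same function as in the question. */
double fun(double d)
{
    return d;
}

/* Hypothetical wrapper: the pointer type includes the parameter, so the
   compiler passes `a` according to the platform's calling convention. */
double call_through_pointer(double a)
{
    double (*myfun)(double) = fun;
    return myfun(a);
}
```

Compiling this with and without -fomit-frame-pointer and reading the output shows both addressing styles without any hand-written stack manipulation.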
ebp is normally used for frame pointers. The first instructions for functions using frame pointers are
push ebp ;save ebp
mov ebp,esp ;ebp = esp
sub esp,... ;allocate space for local variables
Then parameters and local variables are at +/- offsets from ebp.
Most compilers have an option to not use frame pointers, in which case esp is used as the base pointer. If non-frame-pointer code uses ebp as a generic register, it still needs to be saved.
