Poking the stack

Poking the stack - c

I'm trying to understand how exactly the stack works, so I'll recreate here a small example with some questions.
Pretend that I have a small code in ASM that does the following thing:
(all this is x86, intel syntax, Linux)
push ebp
mov ebp, esp
sub esp, 16
mov eax, 0xdeadbeef <-- let's call this 'local variable a'
mov [ebp - 16], eax
mov eax, 0xabcdabcd <-- let's call this 'local variable b'
mov [ebp - 12], eax
mov eax, 0xcacafafa <-- let's call this 'local variable c'
mov [ebp - 8], eax
mov eax, 0xdada1111 <-- let's call this 'local variable d'
mov [ebp - 4], eax
call 0x10101010 <------- pretend that is the real address of function_B
mov esp, ebp
pop ebp
ret
Now pretend that I have a C function called function_B which get's called from the asm code and it looks like this:
asmlinkage void function_B(void){
//some code here...
}
How would I access a, b, c and d from function_B with inline ASM code?
Would this work? Should I be doing it differently?
uint32_t val_a;
__asm__ __volatile__(
".intel_syntax noprefix;"
"mov %0, dword [ebp + 4 + 4 + 0];"
".att_syntax;"
: "=r" (val_a)
);
uint32_t val_b;
__asm__ __volatile__(
".intel_syntax noprefix;"
"mov %0, dword [ebp + 4 + 4 + 4];"
".att_syntax;"
: "=r" (val_b)
);
Also, how would I access a, b, c and d if my function looked like this:
asmlinkage void function_B(unsigned int val){
//some code here...
}

What about this:
void function_B(uint32_t fake){
uint32_t* poke = &fake;
uint32_t a = poke[0];
uint32_t b = poke[1];
uint32_t c = poke[2];
uint32_t d = poke[3];
}
What will give you the values of a, b, c and d with your current asm code (which is not pushing any arguments to function_B). If you also want to pass real arguments to function_B you can still add the fake one after the last real argument:
void function_B(unsigned int val,uint32_t fake)
It is actually a little safer to do it like this (then with inline asm) since it has a better chance to survive optimizations (like stack frame omission). But it's still a hack of course, and nothing I would rely too much upon.

Related

how to save the value of ESP during a function call

I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in visual studio and when I run this, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?

To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes addition debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each)
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an asm{} block is MASM-like, not C statements.)

Strange register allocation in GCC's inline asm [duplicate]

This question already has answers here:
When to use earlyclobber constraint in extended GCC inline assembly?
(2 answers)
Why we need Clobbered registers list in Inline Assembly?
(1 answer)
Closed 12 months ago.
This post was edited and submitted for review 12 months ago and failed to reopen the post:
Original close reason(s) were not resolved
In short, GCC is allocating registers wrongly for this code.
typedef intptr_t Int;
Int add(Int *x, Int n, Int *y, Int m) {
Int r0, r1, r2;
asm (
R"(xor %k0, %k0
xor %k1, %k1
L0%=:
mov %2, [%[y] + %0 * 8]
neg %1
adc [%[x] + %0 * 8], %2
sbb %1, %1
.
.
.)"
: "=r"(r0), "=r"(r1), "=r"(r2)
: [x]"r"(x), [n]"r"(n), [y]"r"(y), [m]"r"(m)
: "memory"
);
printf("%ld %ld\n", n, r0);
return r0;
}
/*
add:
push rbx
xor ebx, ebx
xor edx, edx ; killing `y`
L017:
mov rcx, [rdx + rbx * 8] ; segfault
neg rdx
adc [rdi + rbx * 8], rcx
sbb rdx, rdx
*/
More weird thing is removing the printf line makes the code compile normally.
The code is a basic implementation of bignum addition. The code itself could also have a bug, but that's apart from this problem.
link to Godbolt
I read the answers in this question, and now I understand that the & is necessary to tell the input is reused after being consumed. I also read the manual where it states,
GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs.
But I still don't get why GCC thinks it's okay to overwrite y before it's "consumed", when there is no &. You can see in the code above that GCC is zeroing y before any value is ever read from it.
full code
#include <stdio.h>
#include <stdint.h>
#define asm __asm__ volatile
typedef intptr_t Int;
Int add(Int *x, Int n, Int *y, Int m) {
Int r0, r1, r2;
asm (
R"(xor %k0, %k0
xor %k1, %k1
L0%=:
mov %2, [%[y] + %0 * 8]
neg %1
adc [%[x] + %0 * 8], %2
sbb %1, %1
inc %0
cmp %0, %[m]
jl L0%=
neg %1
jz end%=
carry%=:
inc %0
add qword ptr [%[x] + %0 * 8 - 8], 1
jc carry%=
cmp %0, %[n]
cmovl %0, %[n]
end%=:)"
: "=&r"(r0), "=&r"(r1), "=&r"(r2)
: [x]"r"(x), [n]"r"(n), [y]"r"(y), [m]"r"(m)
: "memory"
);
printf("%ld %ld\n", n, r0);
return r0;
}

Stack cleanup not working (__stdcall MASM function)

there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODE FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy#12", so its indeed finding the proper symbol
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I dont:
__MyMemcpy(dest,src,size)

Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy#12
__MyMemcpy#12:
push ebp
mov ebp,esp ; Function prologue generate by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.

The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END

How safe is this swap w/o pushing registers?

I'm very new to Assembly and the code below is supposed to swap two integers via two different functions: first using swap_c and then using swap_asm.
However, I doubt, whether I need to push (I mean save) each value of registers before assembly code and pop them later (just before returning to main). In other words, will the CPU get mad at me if I return different register content (not the crucial ones like ebp or esp; but, just eax, ebx, ecx & edx) after running swap_asm function? Is it better to uncomment the lines in the assembly part?
This code runs OK for me and I managed to reduce the 27 lines of assembled C code down to 7 Assembly lines.
p.s.: System is Windows 10, VS-2013 Express.
main.c part
#include <stdio.h>
extern void swap_asm(int *x, int *y);
void swap_c(int *a, int *b) {
int t = *a;
*a = *b;
*b = t;
}
int main(int argc, char *argv[]) {
int x = 3, y = 5;
printf("before swap => x = %d y = %d\n\n", x, y);
swap_c(&x, &y);
printf("after swap_c => x = %d y = %d\n\n", x, y);
swap_asm(&x, &y);
printf("after swap_asm => x = %d y = %d\n\n", x, y);
getchar();
return (0);
}
assembly.asm part
.686
.model flat, c
.stack 100h
.data
.code
swap_asm proc
; push eax
; push ebx
; push ecx
; push edx
mov eax, [esp] + 4 ; get address of "x" stored in stack into eax
mov ebx, [esp] + 8 ; get address of "y" stored in stack into ebx
mov ecx, [eax] ; get value of "x" from address stored in [eax] into ecx
mov edx, [ebx] ; get value of "y" from address stored in [ebx] into edx
mov [eax], edx ; store value in edx into address stored in [eax]
mov [ebx], ecx ; store value in ecx into address stored in [ebx]
; pop edx
; pop ecx
; pop ebx
; pop eax
ret
swap_asm endp
end

Generally, this depends on the calling convention of the system you are working on. The calling convention specifies how to call functions. Generally, it says where to put the arguments and what registers must be preserved by the called function.
On i386 Windows with the cdecl calling convention (which is the one you probably use), you can freely overwrite the eax, ecx, and edx registers. The ebx register must be preserved. While your code appears to work, it mysteriously fails when a function starts to depend on ebx being preserved, so better save and restore it.

Understanding the C function call prolog with __cdecl on windows

Compiling this simple function with MSVC2008, in Debug mode:
int __cdecl sum(int a, int b)
{
return a + b;
}
I get the following disassembly listing:
int __cdecl sum(int a, int b)
{
004113B0 push ebp
004113B1 mov ebp,esp
004113B3 sub esp,0C0h
004113B9 push ebx
004113BA push esi
004113BB push edi
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
return a + b;
004113CE mov eax,dword ptr [a]
004113D1 add eax,dword ptr [b]
}
004113D4 pop edi
004113D5 pop esi
004113D6 pop ebx
004113D7 mov esp,ebp
004113D9 pop ebp
004113DA ret
There are some parts of the prolog I don't understand:
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
Why is this required?
EDIT:
After removing the /RTC compiler option, as was suggested, most of this code indeed went away. What remained is:
int __cdecl sum(int a, int b)
{
00411270 push ebp
00411271 mov ebp,esp
00411273 sub esp,40h
00411276 push ebx
00411277 push esi
00411278 push edi
return a + b;
00411279 mov eax,dword ptr [a]
0041127C add eax,dword ptr [b]
}
Now, why is the: sub esp, 40h needed? It's as if place is being allocated for local variables, though there aren't any. Why is the compiler doing this? Is there another flag involved?

This code is emitted due to the /RTC compile option. It initializes all local variables in your function to a bit pattern that is highly likely to generate an access violation or to cause unusual output values. That helps you find out when you forgot to initialize a variable.
The extra space in the stack frame you see allocated is there to support the Edit + Continue feature. This space will be used when you edit the function while debugging and add more local variables. Change the /ZI option to /Zi to disable it.

and in any case of buffer overflow (if you would overwrite local variables) you will end up in a field of "int 3" opcodes:
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
...
that can be catched by the debugger, so you can fix your code