I am writing an assembly function to be called from C that will call the sidt machine instruction with a memory address passed to the C function. From my analysis of the code produced by MSVC10, I have constructed the following code (YASM syntax):
SECTION .data
SECTION .text
GLOBAL _sidtLoad
_sidtLoad:
push ebp
mov ebp,esp
sub esp,0C0h
push ebx
push esi
push edi
lea edi,[ebp-0C0h]
mov ecx,30h
mov eax,0CCCCCCCCh
sidt [ebp+8]
pop edi
pop esi
pop ebx
add esp,0C0h
cmp ebp,esp
mov esp,ebp
pop ebp
ret
Here is the C function signature:
void sidtLoad (void* mem);
As far as I can see everything should work, I have even checked the memory address passed to the function and have seen that it matches the address stored at [ebp+8] (the bytes are reversed, which I presume is a result of endianness and should be handled by the machine). I have tried other arguments for the sidt instruction, such as [ebp+12], [ebp+4], [ebp-8] etc but no luck.
P.S I am writing this in an external assembly module in order to get around the lack of inline assembly when targeting x64 using MSVC10. However, this particular assembly function/program is being run as x86, just so I can get to grips with the whole process.
P.P.S I am not using the __sidt intrinsic as the Windows Driver Kit (WDK) doesn't seem to support it. Or at least I can't get it to work with it!
Your code will write the IDT into the memory address ebp+8, which is not what you want. Instead, you need something like:
mov eax, [ebp+8]
sidt [eax]
compile:
int sidLoad_fake(void * mem) {
return (int)mem;
}
to assembly, and then look at where it pulls the value from to put in the return register.
I think that the first few arguments on x86_64 are passed in registers.
Related
I dont think function parameter names are treated like variables and they doesnt get stored in memory. But I dont get how we can use these parameters in functions as variables if they dont have any place in memory. Can anyone explain me whats going on with function parameters and if they have place or not in memory
All variables are either allocated somewhere or optimized away in case the compiler found them unnecessary. Function parameters are variables and are almost certainly stored either in a CPU register or on the stack, if they are used by the program.
The only time when they might not get allocated is when the function is inlined - when the whole function call is optimized away and the function code is instead injected in the caller-side machine code. In such cases the original variables used by the caller might be used instead.
Function parameter names however are not stored anywhere in the final executable, just like any other identifier isn't stored there either. Names of variables, functions etc only exist in the source code, for the benefit of the programmer alone.
Although your title asks about “function parameter names,” it appears your question is about function parameters, which are different.
Commonly, arguments are passed to functions by putting them in processor registers or on the hardware stack. Each computing platform has some specification of which arguments should be passed where. For example, the first few small arguments (such as int values) may be passed in certain processor registers, while more or larger arguments may be put on the stack.
To the called function, these are parameters. The called function uses them from the processor registers or the stack.
I'm writing this answer assuming you know what the stack and CPU registers are. If you don't, I'd suggest you look them up before seeing this answer.
I dont think function parameter names are treated like variables and they doesnt get stored in memory.
At the assembly level, function parameter names don't really exist. But for function parameters, it depends on the assembly generated based on the compiler's level of optimization. Consider this simple function:
int foo(int a, int b)
{
return a + b;
}
Using Compiler Explorer, I checked the generated disassembly of x64 GCC 10.2. On -O0, it looks like this:
foo:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
pop rbp
ret
These two lines:
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
interestingly show that the arguments are passed to edi and esi for a and b respectively, and then moved into the stack, presumably in case the registers need to be used elsewhere in the function. The rest of the function uses the space in the stack as opposed to the registers:
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
(In case you didn't know, eax/rax generally holds the value for functions and edx in this case just serves as a general-purpose register, so these three lines are besaically eax = a; edx = b; eax += edx).
Ok, so that makes sense. The arguments are passed to registers and copied to the stack, where they are used for the rest of the function. What about -O1?
foo:
lea eax, [rdi+rsi]
ret
Now that is a lot shorter. Here, eax gets the value of rdi + rsi and the function ends. All the copying to the stack is completely skipped and the registers are used directly. So yes, in this case, the memory is never used.
EDIT
After writing this answer, I went and checked the generated assembly with the -m32 option and noticed that arguments were always pushed to the stack before the function was called. Assembly generated from -O0 looks like this:
foo:
push ebp
mov ebp, esp
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add eax, edx
pop ebp
ret
Here, since the arguments are passed to the stack before the function is called, they don't have be copied from the registers to the stack (because they're already there). So the function is shorter, and amount of registers used is reduced. However, on higher levels of optimization, the function ends up becoming longer because of this:
foo:
mov eax, DWORD PTR [esp+8]
add eax, DWORD PTR [esp+4]
ret
So with -m32 set, parameters are always placed in memory.
there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODE FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy#12", so its indeed finding the proper symbol
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I dont:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy#12
__MyMemcpy#12:
push ebp
mov ebp,esp ; Function prologue generate by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END
I have been trying to compile this C program to assembly but it hasn't been working fine.
I am reading
Dennis Yurichev Reverse Engineering for Beginner but I am not getting the same output. Its a simple hello world statement. I am trying to get the 32 bit output
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
Here is what the book says the output should be
main proc near
var_10 = dword ptr -10h
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
mov eax, offset aHelloWorld ; "hello, world\n"
mov [esp+10h+var_10], eax
call _printf
mov eax, 0
leave
retn
main endp
Here are the steps;
Compile the print statement as a 32bit (I am currently running a 64bit pc)
gcc -m32 hello_world.c -o hello_world
Use gdb to disassemble
gdb file
set disassembly-flavor intel
set architecture i386:intel
disassemble main
And i get;
lea ecx,[esp+0x4]
and esp,0xfffffff0
push DWORD PTR [ecx-0x4]
push ebp
mov ebp,esp
push ebx
push ecx
call 0x565561d5 <__x86.get_pc_thunk.ax>
add eax,0x2e53
sub esp,0xc
lea edx,[eax-0x1ff8]
push edx
mov ebx,eax
call 0x56556030 <puts#plt>
add esp,0x10
mov eax,0x0
lea esp,[ebp-0x8]
pop ecx
pop ebx
pop ebp
lea esp,[ecx-0x4]
ret
I have also used
objdump -D -M i386,intel hello_world> hello_world.txt
ndisasm -b32 hello_world > hello_world.txt
But none of those are working either. I just cant figure out what's wrong. I need some help. Looking at you Peter Cordes ^^
The output from the book looks like MSVC, not GCC. GCC will definitely not ever emit main proc because that's MASM syntax, not valid GAS syntax. And it won't do stuff like var_10 = dword ptr -10h.
(And even if it did, you wouldn't see assemble-time constant definitions in disassembly, only in the compiler's asm output which is what the book suggested you look at. gcc -S -masm=intel output. How to remove "noise" from GCC/clang assembly output?)
So there are lots of differences because you're using a different compiler. Even modern versions of MSVC (on the Godbolt compiler explorer) make somewhat different asm, for example not bothering to align ESP by 16, perhaps because more modern Windows versions, or CRT startup code, already does that?
Also, your GCC is making PIE executables by default, so use -fno-pie -no-pie. 32-bit PIE sucks for efficiency and for ease of understanding. See How do i get rid of call __x86.get_pc_thunk.ax. (Also 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables, mostly focused on 64-bit code)
The extra clunky stack-alignment in main's prologue is something that GCC8 optimized for functions that don't also need alloca. But it seems even current GCC10 emits the full un-optimized version when you don't enable optimization :(.
Why is gcc generating an extra return address? and Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address
Optimizing printf to puts: see How to get the gcc compiler to not optimize a standard library function call like printf? and -O2 optimizes printf("%s\n", str) to puts(str). gcc -fno-builtin-printf would be one way to make that not happen, or just get used to it. GCC does a few optimizations even at -O0 that other compilers only do at higher optimization levels.
MSVC 19.10 compiles your function like this (on the Godbolt compiler explorer) with optimization disabled (the default, no compiler options).
_main PROC
push ebp
mov ebp, esp
push OFFSET $SG4501
call _printf
add esp, 4
xor eax, eax
pop ebp
ret 0
_main ENDP
_DATA SEGMENT
$SG4501 DB 'hello, world', 0aH, 00H
GCC10.2 still uses an over-complicated stack alignment dance in the prologue.
.LC0:
.string "hello, world"
main:
lea ecx, [esp+4]
and esp, -16
push DWORD PTR [ecx-4]
push ebp
mov ebp, esp
push ecx
sub esp, 4
# end of function prologue, I think.
sub esp, 12 # make sure arg will be 16-byte aligned
push OFFSET FLAT:.LC0 # push a pointer
call puts
add esp, 16 # pop the arg-passing space
mov eax, 0 # return 0
mov ecx, DWORD PTR [ebp-4] # undo stack alignment.
leave
lea esp, [ecx-4]
ret
Yes, this is super inefficient. If you called your function anything other than main, it would already assume ESP was aligned by 16 on function entry:
# GCC10.2 -m32 -O0
.LC0:
.string "hello, world"
foo:
push ebp
mov ebp, esp
sub esp, 8 # reach a 16-byte boundary, assuming ESP%16 = 12 on entry
#
sub esp, 12
push OFFSET FLAT:.LC0
call puts
add esp, 16
mov eax, 0
leave
ret
So it still doesn't combine the two sub instructions, but you did tell it not to optimize so braindead code is expected. See Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? for example.
My GCC will very eagerly swap a call to printf to puts! I did not manage to find the command line options that would make the compiler to not do this. I.e. the program has the same external behaviour but the machine code is that of
#include <stdio.h>
int main(void)
{
puts("hello, world");
}
Thus, you'll have really hard time trying to get the exact same assembly as in the book, as the assembly from that book has a call to printf instead of puts!
First of all you compile not decompile.
You get a lots of noise as you compile without the optimizations. If you compile with optimizations you will get much smaller code almost identical with the one you have (to prevent change from printf to puts you need to remove the '\n' https://godbolt.org/z/cs4qe9):
.LC0:
.string "hello, world"
main:
lea ecx, [esp+4]
and esp, -16
push DWORD PTR [ecx-4]
push ebp
mov ebp, esp
push ecx
sub esp, 16
push OFFSET FLAT:.LC0
call puts
mov ecx, DWORD PTR [ebp-4]
add esp, 16
xor eax, eax
leave
lea esp, [ecx-4]
ret
https://godbolt.org/z/xMqo33
I want to allocate on the heap a stack frame for each function call, but for that I need to modify the stack base pointer for each function.
Is there any way to get that pointer and modify it in C under Linux?
The only (practical) way to manually modify ebp is inline assembly or just write a function in assembly. If you have an assembly function you could do something like this (I am not really experienced with x86 assembly fyi)
; void function(void* heapPtr)
function:
push ebp
mov reg, esp
mov esp, [esp+4]
mov ebp, esp
...
mov esp, reg
pop ebp
ret
Have a look at the function alloca(). Its use is not recommended, but most compilers has an implementation for it.
I compiled the code below with the VC++ 2010 compiler:
__declspec(dllexport)
unsigned int __cdecl __mm_getcsr(void) { return _mm_getcsr(); }
and the generated code was:
push ECX
stmxcsr [ESP]
mov EAX, [ESP]
pop ECX
retn
Why is there a push ECX/pop ECX instruction pair?
The compiler is making room on the stack to store the MXCSR. It could have equally well done this:
sub esp,4
stmxcsr [ESP]
mov EAX, [ESP]
add esp,4
retn
But "push ecx" is probably shorter or faster.
The push here is used to allocate 4 bytes of temporary space. [ESP] would normally point to the pushed return address, which we cannot overwrite.
ECX will be overwritten here, however, ECX is a probably a volatile register in the ABI you're targeting, so functions don't have to preserve ECX.
The reason a push/pop is used here is a space (and possibly speed) optimization.
It creates an top-of-stack entry that ESP now refers to as the target for the stmxcsr instruction. Then the result is stored in EAX for the return.