Variadic Function 64-Bit Windows - c

I am trying to get the parameters of a 64-Bit __fastcall function, but I am having a couple of issues/questions.
1) I checked the registers in the debugger and when I have 3 32-bit parameters and a void function, the second one goes into RDX, the third one into R8 and the first one I cannot see at all and assume is on the stack.
I did not check every possible combination but this goes against what MSDN's documentation on 64-bit __fastcall says. ...Or am I missing something?
-- Regarding 1: I just realized that the documentation seems to say a 32-bit value passed in a 64-bit register is not zero-extended, so I probably just missed the first parameter because of the garbage data that was in the upper half of RCX.
Because VS does not support 64-bit inline assembly or any useful intrinsics (at least none that I can find), I wrote shellcode to get all of the parameters from RCX, RDX, R8, R9, and XMM0-3.
The issue is that to prepare the shellcode I have to allocate memory, copy the code into it, and then either set RIP to my shellcode or call it, all of which disturbs the thread's context. Is there any way to do this cleanly?
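If the aim is only to read the arguments of a variadic function, the standard <stdarg.h> macros may be enough, since the compiler then follows the platform's calling convention itself and no shellcode is needed. A minimal sketch, assuming all arguments are int and are preceded by an explicit count (sum_ints is a hypothetical name, not something from the question):

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);      /* the count must describe the real argument list */
    va_start(args, count);   /* start reading after the last named parameter */
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);   /* the compiler locates each argument */
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) returns 60. This gives access to the argument values only, not to the raw register contents.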

Related

GCC C and ARM Assembly Stack Cleanup

If I call an ARM assembly function from C, sometimes I need to pass in many arguments. If they do not fit in registers r0, r1, r2, r3 it is generally expected that 5-th, 6-th ... x-th arguments are pushed onto stack so that ARM assembly can read them from it.
So in the ARM function I receive some arguments that are on the stack. After finishing the assembly function I can either remove these arguments from stack or leave them there and expect that the C program will deal with them later.
If we are talking about GCC C and ARM assembly who is usually responsible for cleaning up the stack?
The function that made the call (A)
Or the function that was called (B)
I understand that when developing we could agree on either convention. But what is generally used as the default in this particular case (ARM assembly and GCC C)?
And how would a low-level piece of code generally describe which behavior it implements? It seems that there should be some kind of standard description for this. If there isn't one, you pretty much have to try both and see which one does not crash.
If anyone is interested in what the code could look like:
arm_function:
stmfd sp, {r4-r12, lr} @ Save registers other than the argument registers r0-r3 (no writeback), SP->PASSED ARGUMENTS
ldmfd sp, {r4-r6} @ Load 3 arguments that were passed through the stack, SP->PASSED ARGUMENTS
sub sp, sp, #40 @ Adjust the stack pointer so it points to saved registers, STACK POINTER->SAVED REGISTERS->PASSED ARGUMENTS
@ The main function body.
ldmfd sp!, {r4-r12, lr} @ Load saved registers, STACK POINTER->PASSED ARGUMENTS
add sp, sp, #12 @ Increment stack pointer to remove passed arguments, SP->NOTHING
@ If the last code line were not there, the caller would need to remove the arguments from the stack.
UPDATE:
It seems that for C/C++ choice A is pretty standard. Compilers usually use calling conventions like cdecl that work much like the code in the answers below. More information can be found in this link about calling conventions. Changing the C/C++ calling convention for a function does not seem to be common or easy. With an older C standard I could not manage to change it, so using A looks like a decent default choice.
The current ARM procedure call standard is AAPCS.
The language-specific ABI can be found here. Relevant will be the document about C, but others should be similar (why reinvent the wheel?).
A good start for reading might be page 14 in the AAPCS.
It basically requires the caller to clean up the stack, as this is the simplest way: push the additional arguments onto the stack, call the function, and after it returns simply adjust the stack pointer by adding an offset (the number of bytes pushed onto the stack; this is always a multiple of 4, the "natural" 32-bit ARM word size).
But if you use gcc, you can avoid handling the stack yourself by using inline assembler. It provides features to pass C variables (etc.) to the assembler code, and will also automatically load a parameter into a register if required. Just have a look at the gcc documentation. It is a bit hard to figure out in detail, but I prefer this to having raw assembler stubs somewhere.
OK, I added this as there might be problems understanding the principle:
caller:
...
push {r5} @ argument which does not fit into r0..r3 anymore
bl callee
add sp, sp, #4 @ adjust SP
callee:
push {r5-r7, lr} @ temp. variables, return address
sub sp, sp, #8 @ local variables
@ processing
add sp, sp, #8 @ restore previous stack frame
pop {r5-r7, pc} @ restore temp. variables and return (replaces bx)
You can verify this by just disassembling some sample C functions. Note that the pre- and postamble may vary if no temp registers are used or the function does not call another function (no need to stack lr in that case).
Also, the caller might have to stack r0..r3 before the call. But that is a matter of compiler optimizations.
Disassembly can be done with gdb and objdump for example.
I use -mabi=aapcs for gcc invocation; not sure if gcc would otherwise use a different standard. Note that all object files have to use the same standard.
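A sample file for that kind of disassembly experiment could look like the sketch below. The names sum6, call_sum6 and check_sum6 are hypothetical. With six int arguments, AAPCS places the first four in r0-r3 and the remaining two on the stack, so the code around the call in call_sum6 shows who sets up and releases that stack space. The noinline attribute is a GCC extension, used here on the assumption that the call should stay visible at higher optimization levels.

```c
#include <assert.h>

/* Hypothetical callee: e and f do not fit in r0-r3 under AAPCS. */
__attribute__((noinline))
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical caller: disassemble this function to see how the two
 * stack arguments are handled around the call. */
int call_sum6(void)
{
    return sum6(1, 2, 3, 4, 5, 6);
}

/* Sanity check, kept separate so it does not clutter the listing above. */
void check_sum6(void)
{
    assert(call_sum6() == 21);
}
```

Compiling with gcc -c and inspecting the object file with objdump -d shows the generated call sequence.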
Edit:
Just had a peek at the AAPCS, and it states that the SP needs only 4-byte alignment. I might have confused this with the Cortex-M interrupt handling system, which (for whatever reason, possibly for the M7, which has 64-bit buses) aligns the SP to 8 bytes by default (a software-configurable option).
However, SP must be 8-byte aligned at a public interface. OK, the standard is actually more complicated than I remembered. That's why I prefer to let gcc take care of this stuff.
If the caller allocates space on the stack (for argument passing), the stack cleanup is also done within the caller. How does that happen, you may ask? For ARM, @Olaf has covered it completely; on x86 it usually looks like this:
sub esp, 8 ; make some room
... ; move arguments on stack
call func
add esp, 8 ; clean the stack
or
push eax ; push the arguments
push ebx ; or pusha, then after call, popa
call func
add esp, 8 ; assuming registers are 4 bytes each
How the interaction between caller and callee takes place in a system is explained in the ABI (Application Binary Interface). You may find it useful.

Passing Parameters in 64 bit Assembly Function from C language. Which Register Receive These Parameter?

I want to pass a parameter to an assembly function from C.
On a UNIX-like system, the first six parameters go into rdi, rsi, rdx, rcx, r8, and r9.
On Windows, the first four parameters go into rcx, rdx, r8, and r9.
Now, my question is: at the BIOS or DOS programming level, which registers receive these parameters? If there are more than 6 parameters, how do I write the assembly to handle them?
If I understand the first part of your question: using C in 16-bit mode is not really supported (since 16-bit mode uses segmentation to get past 16 bits of addressing).
As for the second part, that depends on the compiler, but IIRC both Windows and Unix pass additional arguments on the stack (see http://en.wikipedia.org/wiki/X86_calling_conventions for more on argument passing).
64-bit UEFI uses the Windows convention.
The BIOS and DOS APIs are defined in assembly language.
Traditionally in 16-bit and 32-bit x86 all the arguments are stored on the stack.

Displaying PSW content

I'm a beginner with asm. I've been researching my question for a while, but the answers were unsatisfactory. I'm wondering how to display the PSW content on standard output. Also, how do I display the Instruction Pointer value? I would be very grateful if you could give me a hint (or better, a sketch of code). It may be MASM or 8086 as well (actually I don't know what the difference is :) )
The instruction pointer is not directly accessible on the x86 family. However, it is quite straightforward to retrieve its value, though it will never be exact, since the pointer has already moved on by the time you read it.
Since a subroutine call places the return address on the stack, you just need to copy it from there and voilà! You have the address of the instruction following the call instruction:
proc getInstructionPointer
push bp
mov bp,sp
mov ax,[word ptr ss:bp + 2]
mov sp,bp
pop bp
ret
endp getInstructionPointer
The PSW on the x86 is called the Flags register. There are two operations that explicitly reference it: pushf and popf. As you might have guessed, you can simply push the Flags onto the stack and load it to any general purpose register you like:
pushf
pop ax
Displaying these values consists of converting them to ASCII and writing them to the screen. There are several ways of doing this; search for "string output assembly" and I bet you will find the answer.
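The conversion step can be sketched in C rather than assembly. This is only an illustration under assumptions: the 16-bit Flags value is taken as already copied into a variable (for example via pushf / pop ax), and the function names are hypothetical. The bit positions mentioned in the note below (CF = bit 0, ZF = bit 6, SF = bit 7) follow the standard x86 Flags layout.

```c
#include <assert.h>
#include <stdint.h>

/* Writes the 16 bits of `flags` as '0'/'1' characters, most significant
 * bit first, followed by a terminating NUL. `out` needs 17 bytes. */
void flags_to_binary(uint16_t flags, char out[17])
{
    assert(out != 0);
    for (int bit = 15; bit >= 0; --bit) {
        out[15 - bit] = ((flags >> bit) & 1u) ? '1' : '0';
    }
    out[16] = '\0';
}

/* Returns 1 if the given bit of the Flags value is set, else 0. */
int flag_is_set(uint16_t flags, unsigned bit)
{
    assert(bit < 16);
    return (int)((flags >> bit) & 1u);
}
```

For a Flags value of 0x0246, flag_is_set(flags, 6) reports that ZF is set and flag_is_set(flags, 0) reports that CF is clear. The string produced by flags_to_binary can be printed with any output routine, such as puts in C.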
To dispel a minor confusion: 8086 is the CPU itself, whereas MASM is the assembler. The syntax is assembler-specific; MASM assembly is x86 assembly. TASM assembly is x86 assembly as well, just like NASM assembly.
When one says "x86 Assembly", they are referring to any of these (or others), talking about the instruction set, not the dialect.
Note that the above examples are 16-bit, intended for the 8086, and won't work on the 80386+ in 32-bit mode.

Safely hooking C functions into assembly

I'm trying to put some C hooks into some code that someone else wrote in asm. I don't know much about x86 asm and I want to make sure I'm doing this safely.
So far I've got this much:
EXTERN _C_func
CALL _C_func
And that seems to work, but I'm sure that's dangerous as-is. At the very least it could be destroying EAX, and possibly more.
So I think all I need to know is: Which registers does C destroy and how do I ensure the registers are being saved properly? Is there anything else I can do to make sure I can safely insert hooks into arbitrary locations?
I'm using Visual Studio for the C functions, if that helps.
Thank you.
UPDATE:
I looked up the "paranoid" method several people suggested and it looks like this:
pushfd
pushad
call _C_func
popad
popfd
AD = (A)ll general registers
FD = (F)lags
(I'm not sure what the 'd' stands for, but it means 32-bit registers instead of 16-bit.)
But the most efficient method is illustrated in Kragen's answer below. (Assuming you don't need to preserve the flags. If you do, add the FD instructions.)
Also, be careful about allocating lots of stack variables in your C function as it could overrun the asm subroutine's stack space.
I think that about does it, thanks for your help everyone!
If the calling convention is cdecl and the signature of C_func is simply:
void C_func(void);
Then that is perfectly "safe", however the registers EAX, ECX and EDX are "available for use inside the function", and so may be overwritten.
To protect against this you can save the registers that you care about and restore them afterwards:
push eax
push ecx
push edx
call _C_func
pop edx
pop ecx
pop eax
By convention the registers EBX, ESI, EDI, and EBP shouldn't be modified by the callee.
I believe that the flags may be modified by the callee - again if you care about preserving the value of flags then you should save them.
See the Wikipedia page on x86 calling conventions.
You need to be aware of calling conventions. See this article for a discussion on calling conventions.
If you are paranoid about registers, you can always push all registers onto the stack before calling the function (functions shouldn't return without cleaning up, except for those registers that carry return information).
This will depend on whether you are using an x86 or x86_64 platform ... the calling conventions for each, as well as the caller-save and callee-save registers (and even the available register set) are a bit different.
That being said, caller-save registers need to be pushed on the stack before you call your C-function. Callee-save registers you don't need to worry about if you are calling a C-function, but you will need to pay attention to if you call an assembly routine from a C-function.
For a x86 32-bit platform, the caller-save registers are:
EAX
EDX
ECX
Also keep in mind that the stack-pointer registers, ESP and EBP, will be changed as well with the cdecl calling convention, but a C function will have restored those values by the time the call returns, assuming you have not done something with those registers that a C function would not expect.
The callee-save registers are:
EBX
ESI
EDI
Finally, the return value from your C-function is in the EAX register, and the C-function arguments are pushed onto the stack before calling the function. The order in which that is done will depend on the calling convention, but for cdecl, that will be in right-to-left order, meaning the left-most argument to the C-function is pushed last on the stack before calling the function.
Should you decide to simply save all the general-purpose registers on the stack before you call your C function, and then pop them back off the stack afterwards, you can do that on x86 using the PUSHAD and POPAD instructions. These instructions are not usable on x86_64 though. Also keep in mind that if you have a return value in EAX, you will need to do something with it (like saving it in memory) before you execute POPAD.
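On the C side, the hook itself can be kept small. A minimal sketch, assuming the cdecl convention and the void C_func(void) signature discussed above; hook_call_count, hook_scratch and hook_calls are hypothetical names. The buffer has static storage rather than being a local variable, so the hook uses very little of the assembly routine's stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: static storage keeps it off the caller's stack. */
static unsigned long hook_call_count;
static unsigned char hook_scratch[4096];

/* The hook called from assembly via `call _C_func` (32-bit MSVC
 * prefixes cdecl names with an underscore). */
void C_func(void)
{
    size_t slot = hook_call_count % sizeof hook_scratch;

    assert(slot < sizeof hook_scratch);
    hook_scratch[slot] = 1;   /* placeholder for the real work */
    ++hook_call_count;
}

/* Accessor so other C code can inspect how often the hook ran. */
unsigned long hook_calls(void)
{
    return hook_call_count;
}
```

Static state like this is neither reentrant nor thread-safe, so the trade-off only makes sense if the hook is called from a single context.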

How many machine instructions are needed for a function call in C?

I'd like to know how many instructions are needed for a function call in a C program compiled with gcc for x86 platforms from start to finish.
Write some code.
Compile it.
Look at the disassembly.
Count the instructions.
The answer will vary as you vary the number and type of parameters, calling conventions etc.
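A minimal test file for that experiment might look like this. The function names are hypothetical, and the noinline attribute is a GCC extension assumed here so that the call is not optimized away. Compiling with gcc -S produces the assembly listing to count from.

```c
#include <assert.h>

/* Hypothetical callee, kept out of line so a real call is emitted. */
__attribute__((noinline))
int add3(int a, int b, int c)
{
    return a + b + c;
}

/* Hypothetical caller: the instructions that set up the arguments,
 * the call itself, and any stack adjustment appear in this function. */
int call_add3(int x)
{
    return add3(x, x + 1, x + 2);
}

/* Sanity check, kept separate so it does not add to the code being counted. */
void check_add3(void)
{
    assert(call_add3(1) == 6);
}
```

Changing the number or types of the parameters of add3 and recompiling shows how the count varies.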
That is a tricky question to answer, and the result may vary.
First of all, the caller needs to pass the parameters. How this is done depends on their types, but in most cases you will have one push instruction per parameter.
Then, in the called procedure, the first instructions allocate space for local variables. This is usually done in 3 operations:
PUSH EBP
MOV EBP, ESP
SUB ESP, xxx
You will have the assembly code of the function after that.
Following the code but before the return, the ebp and esp will be restored:
MOV ESP, EBP
POP EBP
Lastly, you will have a ret instruction that, depending on the calling convention, either deallocates the parameters from the stack or leaves that to the caller. You can tell which by whether the RET has a nonzero number as its operand or has no operand (equivalent to 0), respectively. In the latter case you will have POP instructions (or an ADD to ESP) in the caller after the CALL instruction.
I would expect at least one
CALL Function
unless it is inlined, of course.
If you use -mno-accumulate-outgoing-args and -Os (or -mpreferred-stack-boundary=2, or 3 on 64-bit), then the overhead is exactly one push per word-sized argument, one call, and one add to adjust the stack pointer after return.
Without -mno-accumulate-outgoing-args and with default 16-byte stack alignment, gcc generates code that's roughly the same speed but roughly five times larger for function calls, for no good reason.

Resources