I know this is a strange question but is there a way to use pointers to return to a certain point in the code? What I'm trying to do is mimic the behavior of a jal instruction in MIPS.
For instance, if I have a function fun1() that returns 1, and another function fun2() that returns 2, and main() as defined here:
1 main() {
2 int v = fun1(); // v = 1
3 if (v == 2) return 2;
4 v = fun2(); // v = 2
5 }
Could I jump back to line 3 after fun2() is called in line 4 by keeping a pointer to the return address of the call to fun1 on line 2?
With the GNU C extension to take the address of a goto label (https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html), yes, you could emulate a jal-like function call using a C variable as the "link register" like asm $ra. But that's separate from the asm the compiler emits - you'd have to make the "function calls" using goto to do exactly what you describe.
No, you can't hack into the asm the compiler emits and capture the return address. That's not something you can safely do even with GNU C inline asm.
Also, IIRC, the standard MIPS calling convention doesn't require functions to return by restoring the return address into RA. In theory they could return with jr $t9 or whatever after copying the return address to that register instead of $ra. The caller can't assume anything about RA on return from a jal. In practice, though, RA will hold the value JAL put there. I can't think of any reason the compiler would do something different. Except maybe C++ exception unwinding? But that would end in a catch, not in the normal return path.
So sure, if you were writing the caller in asm, you could take advantage of $ra if you wanted, although normally it would be no more efficient than a normal relative-branch bne $v0, $s0 instruction at the bottom of a while loop containing a jal fun2 to keep calling until you got a different return value.
#include <stdio.h>
int sum(int n);
int main()
{
printf("sum is %d", sum(5));
return 0;
}
//recursive function
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n
int sumN=sumNm1+n;
}
Here I didn't understand how this code works when n==1 becomes true.
How this code backtracks itself afterwards..?
The code needs a return statement in the case where n is not 1:
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n-1
int sumN=sumNm1+n;
return sumN;
}
or more simply:
int sum(int n)
{
if(n==1)
{
return 1;
}
return n + sum(n-1);
}
How this code backtracks itself afterwards..?
When a function is called, the program saves information about how to get back to the calling context. When a return statement is executed, the program uses the saved information to return to the calling context.
This is usually implemented via a hardware stack, a region of memory set aside for this purpose. There is a stack pointer that points to the active portion of the stack memory. When main calls sum(5), a return address into main is pushed onto the stack, and the stack pointer is adjusted to point to memory that is then used for the local variables in sum. When sum calls sum(n-1), which is sum(4), a return address into sum is pushed onto the stack, and the stack pointer is adjusted again. This continues for sum(3), sum(2), and sum(1). For sum(1), the function returns 1, and the saved return address is used to go back to the previous execution of sum, for sum(2), and the stack pointer is adjusted in the reverse direction. Then the returned value 1 is added to its n, and 3 is returned. The saved address is used to go back to the previous execution, and the stack pointer is again adjusted in the reverse direction. This continues until the original sum(5) is executing again. It returns 15 and uses the saved address to go back to main.
How this code backtracks itself afterwards..?
It is not certain to work.
Any success is due to undefined behavior (UB).
The biggest mistake is not compiling with warnings well enabled.
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n
int sumN=sumNm1+n; // !! What, no warning?
} // !! What, no warning?
A compiler with warnings well enabled generates warnings something like the below.
warning: unused variable 'sumN' [-Wunused-variable]
warning: control reaches end of non-void function [-Wreturn-type]
Save time and enable all compiler warnings. You get faster feedback on code troubles than by posting on SO.
int sumN=sumNm1+n;
return sumN; // Add
}
As pointed out in the comments, the problem is that you don't return the value you compute from within the function (undefined behavior). You calculate it correctly (but in a clumsy way, using 2 unneeded variables). If you add a return sumN; statement at the end of the function, things should be fine.
Also, the type chosen for the return value is not the best one. You should choose:
An unsigned type (as we are talking about natural numbers), otherwise half of its interval would be simply wasted (on negative values which won't be used)
One that's as large as possible (uint64_t). Note that this only allows larger values to be computed, but does not eliminate the possibility of an overflow, so you should also be careful when choosing the input type (uint32_t)
More details on recursion: [Wikipedia]: Recursion (it also contains an example very close to yours: factorial).
Example:
main00.c:
#include <stdint.h>
#include <stdio.h>
#if defined(_WIN32)
# define PC064U_FMT "%llu"
# define PC064UX_FMT "0x%016llX"
#else
# define PC064U_FMT "%lu"
# define PC064UX_FMT "0x%016lX"
#endif
uint64_t sum(uint32_t n) // Just 3 lines of code
{
if (n < 2)
return n;
return n + sum(n - 1);
}
uint64_t sum_gauss(uint32_t n)
{
if (n == (uint32_t)-1)
return (uint64_t)(n - 1) / 2 * n + n;
return n % 2 ? (uint64_t)(n + 1) / 2 * n : (uint64_t)n / 2 * (n + 1);
}
uint64_t sum_acc(uint32_t n, uint64_t acc)
{
if (n == 0)
return acc;
return sum_acc(n - 1, acc + n);
}
int main()
{
uint32_t numbers[] = { 0, 1, 2, 3, 5, 10, 254, 255, 1000, 100000, (uint32_t)-2, (uint32_t)-1 };
for (size_t i = 0; i < sizeof(numbers) / sizeof(numbers[0]); ++i) {
uint64_t res = sum_gauss(numbers[i]);
printf("\nsum_gauss(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
res = sum_acc(numbers[i], 0);
printf(" sum_acc(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
res = sum(numbers[i]);
printf(" sum(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
}
printf("\nDone.\n\n");
return 0;
}
Notes:
I added Gauss's formula (sum_gauss) to calculate the same thing using just simple arithmetic operations (and thus is waaay faster)
Another thing about recursion: although it's a nice technique (very useful for learning), it's not so practical, because each function call eats up stack, and if a function calls itself many times, the stack will eventually run out (stack overflow). This can be worked around with an optimization that relies on an accumulator, provided the compiler performs it (check [Wikipedia]: Tail call or [SO]: What is tail call optimization?). I added sum_acc to illustrate this
I didn't consider it necessary to also add the iterative variant (as it would only be a simple for loop)
Output:
(qaic-env) [cfati#cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q074798666]> ~/sopr.sh
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
[064bit prompt]> ls
main00.c vs2022
[064bit prompt]> gcc -O2 -o exe main00.c
[064bit prompt]> ./exe
sum_gauss(0): 0 (0x0000000000000000)
sum_acc(0): 0 (0x0000000000000000)
sum(0): 0 (0x0000000000000000)
sum_gauss(1): 1 (0x0000000000000001)
sum_acc(1): 1 (0x0000000000000001)
sum(1): 1 (0x0000000000000001)
sum_gauss(2): 3 (0x0000000000000003)
sum_acc(2): 3 (0x0000000000000003)
sum(2): 3 (0x0000000000000003)
sum_gauss(3): 6 (0x0000000000000006)
sum_acc(3): 6 (0x0000000000000006)
sum(3): 6 (0x0000000000000006)
sum_gauss(5): 15 (0x000000000000000F)
sum_acc(5): 15 (0x000000000000000F)
sum(5): 15 (0x000000000000000F)
sum_gauss(10): 55 (0x0000000000000037)
sum_acc(10): 55 (0x0000000000000037)
sum(10): 55 (0x0000000000000037)
sum_gauss(254): 32385 (0x0000000000007E81)
sum_acc(254): 32385 (0x0000000000007E81)
sum(254): 32385 (0x0000000000007E81)
sum_gauss(255): 32640 (0x0000000000007F80)
sum_acc(255): 32640 (0x0000000000007F80)
sum(255): 32640 (0x0000000000007F80)
sum_gauss(1000): 500500 (0x000000000007A314)
sum_acc(1000): 500500 (0x000000000007A314)
sum(1000): 500500 (0x000000000007A314)
sum_gauss(100000): 5000050000 (0x000000012A06B550)
sum_acc(100000): 5000050000 (0x000000012A06B550)
sum(100000): 5000050000 (0x000000012A06B550)
sum_gauss(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum_acc(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum_gauss(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
sum_acc(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
sum(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
Done.
As seen in the image above, the simple implementation (sum) failed for a certain (big) input value, while the other 2 passed. I'm not sure why it didn't also fail on Linux (WSL); most likely one of the optimizations (from -O2) turned the recursion into a loop (tail call optimization), or the stack is larger there.
If I understand your question correctly, you're more interested in how recursion actually works, than in the error produced by the missing return statement (see any of the other answers).
So here's my personal guide to understanding recursive functions.
If you know about mathematical induction, this might help you understand how recursion works (at least it did for me). You prove a base case, assume the statement holds for some fixed value, and prove it for the following number. In programming we do a very similar thing.
Firstly, identify your base cases, i.e. some input to the function that you know what the output is. In your example this is
if(n==1)
{
return 1;
}
Now, we need to find a way to compute the value for any given input from "smaller" inputs; in this case sum(n) = sum(n-1) +n.
How does backtracking work after the base case has been reached?
To understand this, picture the function call sum(2).
We first find that 2 does not match our base case, so we recursively call the function with sum(2-1). You can imagine this recursive call as the function called with sum(2) halting until sum(1) has returned a result. Now sum(1) is the "active" function, and we find that it matches our base case, so we return 1. This is now returned to where sum(2) has waited for the result, and this function now can compute 2 + sum(1), because we got the result from the recursive call.
This goes on like this for every recursive call that is made.
Interested in a bit more low-level explanation?
In assembly (MIPS), your code would look something like this:
sum:
addi $t1, $0, 1 # store '1' in $t1
beq $a0, $t1, base # IF n == 1: GOTO base
# ELSE:
# prepare for recursive call (this sketch grows the stack upward; real MIPS code conventionally grows it downward):
sw $a0, 4($sp) # save n just above the current top of the stack
sw $ra, 8($sp) # save the return address above it
addi $sp, $sp, 8 # advance stack pointer
addi $a0, $a0, -1 # n = n-1
jal sum # call sum with reduced n
# this is executed AFTER the recursive call
addi $sp, $sp, -8 # reset stack pointer
lw $ra, 8($sp) # load where to exit the function
lw $a0, 4($sp) # load the original n this instance of sum got
add $v0, $a0, $v0 # add our n to the result of sum(n-1)
jr $ra # return to wherever sum() was called
base: # this is only executed when base case is reached
add $v0, $0, $t1 # write '1' as the return value of the base case
jr $ra # return to caller
Any time the recursive function is called, we temporarily store the argument the current function got ($a0) and the return address into the calling function ($ra) on the stack. The stack is basically LIFO storage, and we can access the top of it using the stack pointer $sp. So when we enter recursion, we make room on the stack for whatever we need to keep there by advancing the stack pointer (addi $sp, $sp, 8), and store our data in that space.
When this is done, we manipulate the argument we got (the first function argument is passed in $a0 in MIPS, so we need to overwrite the argument we got). We write n-1 as the argument for our recursive call and proceed to 'jump and link' (jal) to the beginning of the function. This jumps to the provided label (the start of our function) and saves the address to return to in $ra, so we can come back here after the function call. For every recursive call we make, the stack grows, because we store our data there, so we need to remember to reset it later on.
Once a function call gets the argument 1, the program jumps to base, where we can simply write 1 into the designated return register ($v0) and jump back to the line of code we were called from.
This is the line after the one where we used jal to jump to the beginning. Since the called function provided the result of the base case in $v0, we can simply add our argument to $v0 and return. However, we first need to recover the argument and the return address from the stack. We also decrement the stack pointer, so that it is in the exact position where it was when the function was called. Therefore all recursive calls work together to compute the overall result; every individual call has its own storage on the stack, but it also makes sure to tidy up before exiting, so that all the other calls can access their respective data.
The takeaway is: when calling a function recursively, execution jumps back to the beginning of the function with altered arguments. However, every individual function call handles its own set of variables (temporarily stored on the stack). After a recursive call returns a value, the next innermost recursive call becomes active, reloads all its variables and computes the next result.
If this program were implemented correctly, it would work like this: When n is 1, the function returns 1. When n is 2, the function calls itself for n is 1, so it gets 1, and then adds n (i.e., 2) to it. When n is 3, the function calls itself for n is 2, so it gets 3, and then adds n (i.e., 3) to it. And so on.
My programming language compiles to C, and I want to implement tail recursion optimization. The question here is how to pass control to another function without "returning" from the current function.
It is quite easy if the control is passed to the same function:
void f() {
__begin:
/* do something here... */
goto __begin; // "call" itself
}
As you can see, there is no return value and there are no parameters; those are passed in a separate stack addressed by a global variable.
Another option is to use inline assembly:
#ifdef __clang__
#define tail_call(func_name) asm("jmp " func_name " + 8");
#else
#define tail_call(func_name) asm("jmp " func_name " + 4");
#endif
void f() {
__begin:
/* do something here... */
tail_call(f); // "call" itself
}
This is similar to goto, with one difference: goto passes control to the first statement in a function, skipping the "entry code" generated by the compiler, whereas jmp takes a function address as its argument, so you need to add 4 or 8 bytes to skip the entry code.
Both of the above will work, but only if the callee and the caller use the same amount of stack for local variables, which is allocated by the entry code of the callee.
I was thinking of doing leave manually with inline assembly, then replacing the return address on the stack, then doing a legal function call like f(). But my attempts all crashed. You need to modify BP and SP somehow.
So again, how to implement this for x64? (Again, assuming functions have no arguments and return void.) A portable way without inline assembly is better, but assembly is acceptable. Maybe longjmp can be used?
Maybe you can even push the callee address on the stack, replacing the original return address and just ret?
Do not try to do this yourself. A good C compiler can perform tail-call elimination in many cases and will do so. In contrast, a hack using inline assembly has a good chance of going wrong in a way that is difficult to debug.
For example, see this snippet on godbolt.org. To duplicate it here:
The C code I used was:
#include <stdio.h>
int foo(int n, int o)
{
if (n == 0) return o;
puts("***\n");
return foo(n - 1, o + 1);
}
This compiles to:
.LC0:
.string "***\n"
foo:
test edi, edi
je .L4
push r12
mov r12d, edi
push rbp
mov ebp, esi
push rbx
mov ebx, edi
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
lea eax, [r12+rbp]
pop rbx
pop rbp
pop r12
ret
.L4:
mov eax, esi
ret
Notice that the tail call has been eliminated. The only call is to puts.
Since you don't need arguments and return values, how about combining all functions into one and using labels instead of function names?
f:
__begin:
...
CALL(h); // a macro implementing traditional call
...
if (condition_ret)
RETURN; // a macro implementing traditional return
...
goto g; // tail recurse to g
The tricky part here is the RETURN and CALL macros. To return you should keep yet another stack, a stack of setjmp buffers, so when you return you call longjmp(ret_stack.pop()), and when you call you do ret_stack.push(setjmp(f)). This is a poetic rendition, of course; you'll need to fill out the details.
gcc can offer some optimization here with computed goto, which is more lightweight than longjmp. Also, people who write VMs have similar problems, and seemingly have asm-based solutions for those even on MSVC; see the example here.
And finally, even if such an approach saves memory, it may confuse the compiler and so cause performance anomalies. You are probably better off generating code for some portable assembler-like language, LLVM maybe? Not sure; it should be something that has computed goto.
The venerable approach to this problem is to use trampolines. Essentially, every compiled function returns a function pointer (and maybe an arg count). The top level is a tight loop that, starting with your main, simply calls the returned function pointer ad infinitum. You could use a function that longjmps to escape the loop, i.e., to terminate the program.
See this SO Q&A. Or Google "recursion tco trampoline."
For another approach, see Cheney on the MTA, where the stack just grows until it's full, which triggers a GC. This works once the program is converted to continuation passing style (CPS) since in that style, functions never return; so, after the GC, the stack is all garbage, and can be reused.
I will suggest a hack. The x86 call instruction, which is used by the compiler to translate your function calls, pushes the return address on the stack and then performs a jump.
What you can do is a bit of a stack manipulation, using some inline assembly and possibly some macros to save yourself a bit of headache. You basically have to overwrite the return address on the stack, which you can do immediately in the function called. You can have a wrapper function which overwrites the return address and calls your function - the control flow will then return to the wrapper which then moves to wherever you pointed it to.
I am trying to get a call stack backtrace in my assert/exception handler. I can't include "execinfo.h", and therefore can't use int backtrace(void **buffer, int size);.
Also, I tried to use __builtin_return_address(), but according to http://codingrelic.geekhold.com/2009/05/pre-mortem-backtracing.html:
... on some architectures, including my beloved MIPS, only __builtin_return_address(0) works. MIPS has no frame pointer, making it difficult to walk back up the stack. Frame 0 can use the return address register directly.
How can I reproduce full call stack backtrace?
I have successfully used the method described here, to get a call trace from stack on MIPS32.
You can then print out the call stack:
void *retaddrs[16];
int n, i;
n = get_call_stack_no_fp (retaddrs, 16);
printf ("CALL STACK: ");
for (i = 0; i < n; i++) {
printf ("0x%08X ", (uintptr_t)retaddrs[i]);
}
printf ("\r\n");
... and if you have the ELF file, then use the addr2line to convert the return addresses to function names:
addr2line -a -f -p -e xxxxxxx.elf addr addr ...
There are of course many gotchas, when using a method like this, including interrupts and exception handlers or results of code optimization. But nevertheless, it might be helpful sometimes.
I have successfully used the method suggested by @Erki A and described here.
Here is a short summary of the method:
The problem:
get a call stack without a frame pointer.
Solution main idea:
conclude from the assembly code what the debugger understood from debug info.
The information we need:
1. Where the return address is kept.
2. By what amount the stack pointer is decremented.
To reproduce the whole stack trace one needs to:
1. Get the current $sp and $ra
2. Scan towards the beginning of the function and look for the "addiu sp,sp,spofft" command (spofft<0)
3. Reproduce the previous $sp (sp - spofft)
4. Scan forward and look for "sw r31,raofft(sp)"
5. The previous return address is stored at [sp + raofft]
Above I described one iteration. You stop when the $ra is 0.
How to get the first $ra?
__builtin_return_address(0)
How to get the first $sp?
register unsigned sp asm("29");
asm("" : "=r" (sp));
Note: since most of my files were compiled with microMIPS optimization, I had to deal with the microMIPS ISA.
A lot of issues arose when I tried to analyze code compiled for microMIPS (remember that the goal at each step is to reproduce the previous ra and the previous sp), which makes things a bit more complicated:
1. The ra ($31) register contains an unaligned return address. You may find more information in the linked questions. The unaligned ra tells you that you are running over a different ISA (the microMIPS ISA).
2. There are functions that do not move the sp. You can find more information [here][3]. (If a "leaf" function only modifies the temporary registers and returns to a return statement in its caller's code, then there is no need for $ra to be changed, and there is no need for a stack frame for that function.)
3. There are functions that do not store the ra.
4. microMIPS instructions can be either 16-bit or 32-bit: walk over the commands using unsigned short*.
5. There are functions that perform "addiu sp, sp, spofft" more than once.
6. The microMIPS ISA has a couple of variations of the same command, for example: addiu, addiusp.
I decided to ignore some of these issues, which is why it works for 95% of the cases.
Today, I played around with incrementing function pointers in assembly code to create alternate entry points to a function:
.386
.MODEL FLAT, C
.DATA
INCLUDELIB MSVCRT
EXTRN puts:PROC
HLO DB "Hello!", 0
WLD DB "World!", 0
.CODE
dentry PROC
push offset HLO
call puts
add esp, 4
push offset WLD
call puts
add esp, 4
ret
dentry ENDP
main PROC
lea edx, offset dentry
call edx
lea edx, offset dentry
add edx, 13
call edx
ret
main ENDP
END
(I know, technically this code is invalid since it calls puts without the CRT being initialized, but it works without any assembly or runtime errors, at least on MSVC 2010 SP1.)
Note that in the second call to dentry I took the address of the function in the edx register, as before, but this time I incremented it by 13 bytes before calling the function.
The output of this program is therefore:
C:\Temp>dblentry
Hello!
World!
World!
C:\Temp>
The first output of "Hello!\nWorld!" is from the call to the very beginning of the function, whereas the second output is from the call starting at the "push offset WLD" instruction.
I'm wondering if this kind of thing exists in languages that are meant to be a step up from assembler like C, Pascal or FORTRAN. I know C doesn't let you increment function pointers but is there some other way to achieve this kind of thing?
AFAIK you can only write functions with multiple entry-points in asm.
You can put labels on all the entry points, so you can use normal direct calls instead of hard-coding the offsets from the first function-name.
This makes it easy to call them normally from C or any other language.
The earlier entry points work like functions that fall-through into the body of another function, if you're worried about confusing tools (or humans) that don't allow function bodies to overlap.
You might do this if the early entry-points do a tiny bit of extra stuff, and then fall through into the main function. It's mainly going to be a code-size saving technique (which might improve I-cache / uop-cache hit rate).
Compilers tend to duplicate code between functions instead of sharing large chunks of common implementation between slightly different functions.
However, you can probably accomplish it with only one extra jmp with something like:
int bigfunc(int x); /* declared first so foo and bar can call it */
int foo(int a) { return bigfunc(a + 1); }
int bar(int a) { return bigfunc(a + 2); }
int bigfunc(int x) { /* a lot of code */ }
See a real example on the Godbolt compiler explorer
foo and bar tailcall bigfunc, which is slightly worse than having bar fall-through into bigfunc. (Having foo jump over bar into bigfunc is still good, esp. if bar isn't that trivial.)
Jumping into the middle of a function isn't in general safe, because non-trivial functions usually need to save/restore some regs. So the prologue pushes them, and the epilogue pops them. If you jump into the middle, then the pops in the epilogue will unbalance the stack (i.e. pop the return address into a register, and return to a garbage address).
See also Does a function with instructions before the entry-point label cause problems for anything (linking)?
You can use the longjmp function: http://www.cplusplus.com/reference/csetjmp/longjmp/
It's a fairly horrible function, but it'll do what you seek.
I know manipulating the stack is a big deal, but I think it would be a great lesson for me.
I searched the internet and found out about calling conventions. I know how they work and why. I want to simulate a "callee clean-up" convention, maybe stdcall or fastcall, it doesn't matter; the important thing is who cleans up the stack, because then I will have less work to do :)
For example:
I have a function in C
double __fastcall Add(int a, int b) {
return a + b;
}
This will be the callee,
and I have a pointer to this function of type void*:
void* p = reinterpret_cast<void*>(Add);
And I have a function Caller
void Call(void* p, int a, int b) {
//some code about push and pop arg
//some code about save p into register
//some code about move 'p' into CPU to call this function manually
//some code about take of from stack/maybe register the output from function
}
And that's it. It helps to use a "callee clean-up" calling convention, because then I don't need
//some code about cleans-up this mess
I don't know how to do it. I know it can be done in assembler, but I'm afraid of it, and I have never touched that language. I would be grateful for a way to simulate such a call in C, but if anyone can do it with asm I will be happy too :)
As for what I want to do with it:
once I know how to call a function manually, I will be able to call any type of function with several parameters (if I know their number and sizes),
so I will be able to call any function in any language, as long as that function uses the right calling convention.
I'm using Windows x64 and MinGW.
First of all: C is intended to hide calling conventions and everything that is specific to how your code is executed from the programmer and provide an abstract layer above it.
The only condition when you need to (as you say) "manually" call a function is when you do it from asm.
C as a language has no direct control over the stack or the program counter.
To cite from GCC manual for fastcall for x86:
On the Intel 386, the `fastcall' attribute causes the compiler to
pass the first two arguments in the registers ECX and EDX.
Subsequent arguments are passed on the stack. The called function
will pop the arguments off the stack. If the number of arguments
is variable all arguments are pushed on the stack.
Also as far as I remember return values are passed in EAX.
So in order to call a function in this way you need to provide the arguments in ECX, EDX and then invoke the call instruction on the function address
int __fastcall Add(int a, int b) {
return a + b;
}
Please note I have changed the return type to int, because I do not remember how doubles are passed back.
int a, b;
// set a,b to something
void* p = reinterpret_cast<void*>(Add);
int return_val;
asm (
"call *%3" // indirect call through the register holding p
: "=a" (return_val) // return value is passed in eax
: "c" (a) // pass a in ecx
, "d" (b) // pass b in edx
, "r" (p) // pass p in any free register
);
By calling convention it is up to the callee to clean up any used stack space. In this case we didn't use any, but if we did then your compiler will translate your function Add in such a way that it cleans up the stack automatically.
The code above is actually a hack in such a way that I use the GCC extended asm syntax to automatically put our variables into the appropriate registers. It will generate sufficient code around this asm call to make sure data is consistent.
If you wish to use a stack based calling convention then cdecl is the standard one
int __cdecl Add(int a, int b) {
return a + b;
}
Then we need to push the arguments to the stack prior to calling
asm (
"push %2\n" // cdecl pushes arguments right to left: b first
"push %1\n" // then a
"call *%3\n" // indirect call through the register holding p
"add $8, %%esp" // with cdecl the caller removes the arguments from the stack
: "=a" (return_val) // return value is passed in eax
: "r" (a) // pass a in any register
, "r" (b) // pass b in any register
, "r" (p) // pass p in any register
);
One thing that I have not mentioned is that this asm call does not save any of our in-use registers, so I do not recommend putting this in a function that does anything else. In 32 bit x86 there is an instruction pushad that will push all general purpose registers to the stack and an equivalent (popad) to restore them. An equivalent for x86_64 is unavailable though. Normally when you compile C code the compiler will know which registers are in use and will save them in order for the callee not to overwrite them. Here it does not. If your callee uses registers that are in use by the caller - they will be overwritten!