MASM and C jump to function - c

I have a pointer to a __stdcall function in C. In both x86 and x64 assembly, I'd like to have an asm function that I can use to jump to that function.
For example, take the Windows API function MessageBoxW:
void *fn = GetProcAddress(GetModuleHandle("user32.dll"), "MessageBoxW");
Then in C I'll have a call to the ASM, like
void foo()
{
    MessageBoxW_asmstub(NULL, L"test", L"test", 0);
}
Assume fn is global. In assembly I'd like a function that just forwards to MessageBoxW rather than calling it. In other words, I want MessageBoxW to clean up the arguments passed to MessageBoxW_asmstub and then return directly to foo:
jump (fn) ?
I don't know how to do this.

Assuming that MessageBoxW_asmstub is declared to the C compiler with the correct calling convention (i.e. __stdcall for x86; for x64 there is thankfully only one calling convention), then, as the comment from Ross Ridge said, this is as simple as jumping to the target function, which will then return directly to the caller. Since fn is a pointer to the target rather than the target itself, the address has to be loaded from memory first. x86 does have a memory-indirect form of jmp (in MASM, jmp qword ptr [fn] on x64), but a separate load into a register works just as well. You can use any volatile register in the calling convention for this, e.g. for x64 you might use something along the lines of:
extern fn:qword
MessageBoxW_asmstub:
mov rax, fn
jmp rax
BTW, if you use a debugger to step through calls to delay-loaded DLL imports, you'll probably see a similar pattern used in the linker-generated stub functions.
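For comparison, the same forwarding can be written in plain C and left to the compiler. This is only a sketch: target_fn, forward_stub, add_two and the two-int signature are made up for illustration and are not the Win32 prototype. With optimization enabled, compilers commonly turn a call in tail position like this into an indirect jump, but nothing in the C standard guarantees it, so the asm stub remains the way to be sure.

```c
#include <stddef.h>

/* Hypothetical signature, standing in for the real API prototype. */
typedef int (*target_fn_t)(int a, int b);

/* Global pointer to the real function, filled in at run time. */
target_fn_t target_fn = NULL;

/* Forwarding stub: same signature as the target, call in tail position. */
int forward_stub(int a, int b)
{
    return target_fn(a, b);
}

/* Example target, used only to exercise the stub. */
int add_two(int a, int b)
{
    return a + b;
}
```

On 32-bit Windows, both the pointer type and the stub would need the same __stdcall annotation for the argument cleanup to match.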

Related

Linux Kernel should I use asmlinkage for a function that implements a system call?

I am trying to implement a new syscall in linux kernel, so I wrote:
asmlinkage int my_func(void) {
return my_func_internal();
}
My question: should I define my_func_internal as asmlinkage or not?
In other words, should I write A or B?
A) asmlinkage int my_func_internal(void) {return 1;}
B) int my_func_internal(void) {return 1;}
I would like some explanation too
Note: I have added my_func to syscalls.h should I add the internal one too (probably the answer is no)
It doesn't matter (for correctness) what calling convention you use for functions that aren't called directly by hand-written asm. (Which syscall implementation functions might be on some architectures, that's why they should be asmlinkage.) As long as all callers can see a prototype that matches the definition, it will work.
If asmlinkage is a different calling convention from the default one (e.g. on i386, asmlinkage means to use stack args, overriding the -mregparm=3 build option that makes internal functions use register args), the compiler will have to emit a definition for my_func that handles the difference if it calls a function that isn't asmlinkage. Or simply inline my_func_internal() into it.
If they use the same calling convention, and the compiler chooses not to inline, it could just do an optimized tailcall to my_func_internal, e.g. on x86 jmp my_func_internal. So there's a possible efficiency advantage to using the same calling convention if there's a possibility of an optimized tailcall. Otherwise don't; asmlinkage makes the calling convention less efficient on i386.
(IIRC, asmlinkage has no effect on x86-64 and most other modern ISAs with register-args calling conventions; the default calling convention on x86-64 already passes args in registers, so the kernel doesn't need to override it with -mregparm=3 like it does on i386.)
In your example where there are no args, there's no difference.
BTW, the usual naming convention for the function name is sys_foo to implement a system-call called foo. i.e. the function that will get called when user-space passes __NR_foo as the call number.
Note: I have added my_func to syscalls.h should I add the internal one too (probably the answer is no)
Of course not, unless my_func_internal implements a different system call that you want user-space to be able to call directly.

Register usage in ARM assembly function which is called by a C function

The C function call convention for ARM says:
Caller will pass the first 4 parameters in r0-r3.
Caller will pass any extra parameters on stack.
Caller will get the return value from r0.
I am handcrafting an assembly function called by C. The prototype is equivalent to this:
void s(void);
Suppose a C function c() calls s().
Since s() has no parameters and no return value, I believe the compiler will not touch r0-r3 when generating the calling sequence for c() to call s().
Suppose s() will use r0-r12 to complete its function. It is also possible that c() will use those registers.
I am not sure if I have to explicitly save and restore all the registers touched in s(), say r0-r12. Such memory operation will cost some time.
Or at least I don't have to do that for r0-r3?
From Procedure Call Standard for the Arm Architecture, section 6.1.1 (page 19):
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6)
So yes, since r0-r3 are scratch registers, you do not need to save those before using them in s(), but you have to save and restore any other register.
Assuming that the compiler is compliant with the ARM ABI, then declaring s() like this:
extern void s(void);
should suffice, and the compiler should not emit code that relies on previous values of r0-r3 in the c() function after the call to s() (i.e. c() should save r0-r3 if needed before calling s() and restore them after), since that would break the ABI compliance.
Generally when mixing C and asm, you can never make any assumptions about what registers the C code uses, save for those guaranteed to get stacked by the calling convention. Stack all other registers before using them and then pop them later. All of this depends on what assumptions the compiler makes and doesn't make internally upon calling your assembler function.
Some good info here: Mixing C, C++, and Assembly Language

Does a C function without any argument and return value require a stack to execute?

Does below function need any stack for execution?
int a;
void func(void)
{
a = 10;
}
As long as a C compiler can see the definition of func, it can[1] implement func without using any stack space. For example, where it sees a call to func, it can implement that by emitting an instruction or two to move 10 into a. That would achieve the same result as calling func as a subroutine, so the C rules permit a C implementation to implement a call to func in that way, and it does not use any stack space.
Generally, if the compiler could not see the definition of func, as when compiling another source file that calls func but does not define it, the compiler would have to issue a call instruction or something similar, and that would, at the least, push the return address onto the stack.
Additionally, if the routine being called were more complicated, the compiler might choose not to implement it inline or might not be able to do so. (For example, if func contained calls to itself, it is generally not possible for the compiler to implement it with inline code in all situations; the compiler will need to implement it with actual subroutine call instructions, which do use stack space.)
Footnote
1 Whether any particular compiler will implement func without using stack space is another matter, dependent on the compiler, the switches used to compile, and other factors.
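A minimal sketch of the first case, where the definition is visible at the call site. The name caller is hypothetical and not part of the question. Whether the call actually disappears depends on the compiler and its options, as the footnote says; comparing the generated assembly at -O0 and -O2 is the way to find out for a given toolchain.

```c
int a;

/* The definition is visible in this translation unit. */
static void func(void)
{
    a = 10;
}

/* An optimizing compiler may replace this call with a plain store to a,
   in which case no return address is pushed for it. */
void caller(void)
{
    func();
}
```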

Can symAdd be used to overwrite an existing symbol in sysSymTbl?

A function foo is already present in the system symbol table in vxworks. Now I want to know if I can replace symbol foo with similar function foo1 which takes same arguments and returns the same type as foo, using symAdd routine?
You shouldn't be able to do this using the system's own tools. If some function g() invokes foo(), then g() is statically linked to foo(). This means that the body of g() contains a machine instruction like call <hardcoded_address_of_foo()>, so you can't replace foo() with foo1() without altering the machine instructions of either every caller of foo() or foo() itself.
However, altering the machine instructions of a function on the fly is quite possible. So if you know your platform well, you can do the function replacement trick at a lower level. I successfully did this for VxWorks 5.5 compiled for the MIPS architecture.
What worked for me (everything is done in shell):
1. Get the addresses of foo() and foo1() using lkup.
2. Work out how the machine instruction jump <address_of_foo1> is encoded in hex.
3. Use m() to modify the RAM occupied by the body of foo(): replace the first instruction of foo() with the calculated one and the second instruction with a nop. The nop is needed because on my platform every jump takes effect only after one more instruction has executed (the delay slot instruction).
As a result foo() is corrupted, but now it performs instant jump to foo1() while keeping all information intact (stack, registers etc), so it looks like foo1() is always called instead.
This is suitable for debugging purposes only: it lets you keep debugging by patching code on the fly without writing a new image to the device. This approach should never be used in production.
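The encoding step can be sketched in C. The answer does not say which jump instruction was used, so this assumes the MIPS32 j instruction (opcode 2 in the top six bits, word address of the target in the low 26 bits). The function names and the addresses in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoding of nop on MIPS32 (sll $zero, $zero, 0). */
#define MIPS_NOP UINT32_C(0x00000000)

/* j keeps the top 4 address bits of the delay slot (patch_addr + 4),
   so the target must lie in the same 256 MB region. */
bool mips_j_reachable(uint32_t patch_addr, uint32_t target)
{
    return ((patch_addr + 4) & UINT32_C(0xF0000000)) ==
           (target & UINT32_C(0xF0000000));
}

/* Build the 32-bit instruction word for "j target". */
uint32_t mips_j_encode(uint32_t target)
{
    return (UINT32_C(0x02) << 26) | ((target >> 2) & UINT32_C(0x03FFFFFF));
}
```

For a hypothetical foo1() at 0x80101234 this yields 0x0804048D, which would be written over the first instruction of foo(), followed by MIPS_NOP over the second.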

GCC Inline-Assembly Error: "Operand size mismatch for 'int'"

First, if somebody knows a function of the Standard C Library that prints a string without looking for a binary zero, but instead takes the number of characters to print, please tell me!
Otherwise, I have this problem:
void printStringWithLength(char *str_ptr, int n_chars){
asm("mov 4, %rax");//Function number (write)
asm("mov 1, %rbx");//File descriptor (stdout)
asm("mov $str_ptr, %rcx");
asm("mov $n_chars, %rdx");
asm("int 0x80");
return;
}
GCC reports the following error for the "int" instruction:
"Error: operand size mismatch for 'int'"
Can somebody tell me the issue?
There are a number of issues with your code. Let me go over them step by step.
First of all, the int $0x80 system call interface is for 32 bit code only. You should not use it in 64 bit code as it only accepts 32 bit arguments. In 64 bit code, use the syscall interface. The system calls are similar but some numbers are different.
Second, in AT&T assembly syntax, immediates must be prefixed with a dollar sign. So it's mov $4, %rax, not mov 4, %rax. The latter would attempt to move the content of address 4 to rax which is clearly not what you want.
Third, you can't just refer to the names of automatic variables in inline assembly. You have to tell the compiler what variables you want to use using extended assembly if you need any. For example, in your code, you could do:
asm volatile("mov $1, %%eax; mov $1, %%edi; mov %0, %%rsi; mov %1, %%edx; syscall"
:: "r"(str_ptr), "r"(n_chars) : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory");
Fourth, gcc is an optimizing compiler. By default it assumes that inline assembly statements are like pure functions, that the outputs are a pure function of the explicit inputs. If the output(s) are unused, the asm statement can be optimized away, or hoisted out of loops if run with the same inputs.
But a system call like write has a side-effect you need the compiler to keep, so it's not pure. You need the asm statement to run the same number of times and in the same order as the C abstract machine would. asm volatile will make this happen. (An asm statement with no outputs is implicitly volatile, but it's good practice to make it explicit when the side effect is the main purpose of the asm statement. Plus, we do want to use an output operand to tell the compiler that RAX is modified, as well as being an input, which we couldn't do with a clobber.)
You do always need to accurately describe your asm's inputs, outputs, and clobbers to the compiler using Extended inline assembly syntax. Otherwise you'll step on the compiler's toes (it assumes registers are unchanged unless they're outputs or clobbers). (Related: How can I indicate that the memory *pointed* to by an inline ASM argument may be used? shows that a pointer input operand alone does not imply that the pointed-to memory is also an input. Use a dummy "m" input or a "memory" clobber to force all reachable memory to be in sync.)
You should simplify your code by not writing your own mov instructions to put data into registers but rather letting the compiler do this. For example, your assembly becomes:
ssize_t retval;
asm volatile ("syscall" // note only 1 instruction in the template
: "=a"(retval) // RAX gets the return value
: "a"(SYS_write), "D"(STDOUT_FILENO), "S"(str_ptr), "d"(n_chars)
: "memory", "rcx", "r11" // syscall destroys RCX and R11
);
where SYS_write is defined in <sys/syscall.h> and STDOUT_FILENO in <unistd.h>. I am not going to explain all the details of extended inline assembly to you. Using inline assembly in general is usually a bad idea. Read the documentation if you are interested. (https://stackoverflow.com/tags/inline-assembly/info)
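Put together as a complete function, the constraint-based version might look like the sketch below. It assumes x86-64 Linux and GCC or Clang; raw_write is a hypothetical name. Note that the raw syscall instruction returns a negative errno value on failure rather than -1 with errno set, which is one more reason to prefer the libc wrapper.

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>

/* Assumes x86-64 Linux; other targets use different instructions and registers. */
static inline ssize_t raw_write(int fd, const void *buf, size_t len)
{
    ssize_t ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                       /* RAX: return value */
                      : "a"((long)SYS_write),           /* RAX: call number  */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");        /* destroyed by syscall */
    return ret;   /* bytes written, or a negative errno value */
}
```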
Fifth, you should avoid using inline assembly when you can. For example, to do system calls, use the syscall function from unistd.h:
syscall(SYS_write, STDOUT_FILENO, str_ptr, (size_t)n_chars);
This does the right thing. But it doesn't inline into your code, so use wrapper macros from MUSL for example if you want to really inline a syscall instead of calling a libc function.
Sixth, always check if the system call you want to call is already available in the C standard library. In this case, it is, so you should just write
write(STDOUT_FILENO, str_ptr, n_chars);
and avoid all of this altogether.
Seventh, if you prefer to use stdio, use fwrite instead:
fwrite(str_ptr, 1, n_chars, stdout);
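As a small sketch of the stdio route with the return value checked (print_n is a hypothetical helper name): fwrite returns the number of items actually written, so a short count signals an error.

```c
#include <stdio.h>

/* Writes exactly n bytes of buf to out; embedded zero bytes are written too.
   Returns 0 on success, -1 if fewer than n bytes were written. */
int print_n(FILE *out, const char *buf, size_t n)
{
    return fwrite(buf, 1, n, out) == n ? 0 : -1;
}
```

printf("%.*s", n_chars, str_ptr) also limits the output to n_chars characters, but unlike fwrite it still stops early at a zero byte.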
There are so many things wrong with your code (and so little reason to use inline asm for it) that it's not worth trying to actually correct all of them. Instead, use the write(2) system call the normal way, via the POSIX function / libc wrapper as documented in the man page, or use ISO C <stdio.h> fwrite(3).
#include <unistd.h>
static inline
void printStringWithLength(const char *str_ptr, int n_chars){
write(1, str_ptr, n_chars);
// TODO: check error return value
}
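One way to fill in the TODO is sketched below. write(2) may write fewer bytes than requested or be interrupted by a signal, so the usual pattern is a loop. write_all is a hypothetical helper name, and retrying on EINTR is a common convention rather than something the answer prescribes.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 once all n bytes are written, -1 on error (errno set by write). */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t written = write(fd, buf, n);
        if (written < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            return -1;
        }
        buf += written;     /* skip past what was written so far */
        n -= (size_t)written;
    }
    return 0;
}
```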
Why your code doesn't assemble:
In AT&T syntax, immediates always need a $ decorator. Your code will assemble if you use asm("int $0x80").
The assembler is complaining about 0x80, which is a memory reference to the absolute address 0x80. There is no form of int that takes the interrupt vector as anything other than an immediate. I'm not sure exactly why it complains about the size, since memory references don't have an implied size in AT&T syntax.
That will get it to assemble, at which point you'll get linker errors:
In function `printStringWithLength':
5 : <source>:5: undefined reference to `str_ptr'
6 : <source>:6: undefined reference to `n_chars'
collect2: error: ld returned 1 exit status
(from the Godbolt compiler explorer)
mov $str_ptr, %rcx
means to mov-immediate the address of the symbol str_ptr into %rcx. In AT&T syntax, you don't have to declare external symbols before using them, so unknown names are assumed to be global / static labels. If you had a global variable called str_ptr, that instruction would reference its address (which is a link-time constant, so can be used as an immediate).
As others have said, this is completely the wrong way to go about things with GNU C inline asm. See the inline-assembly tag wiki for more links to guides.
Also, you're using the wrong ABI. int $0x80 is the x86 32-bit system call ABI, so it doesn't work with 64-bit pointers. See: What are the calling conventions for UNIX & Linux system calls on x86-64
See also the x86 tag wiki.

Resources