C passes value instead of address to assembly function (x64) - c

I need to pass the address, not the value, of a variable from C to an assembly function, and I have no idea why I end up with the value instead of the address.
C code:
long n = 1,ret = 0;
fun(&n, &ret);
//the rest is omitted
Assembly code:
.globl fun
fun:
pushq %rbp
movq %rsp, %rbp
movq 16(%rbp), %rax #my n address
movq 24(%rbp), %rbx #my ret address
cmpq $0, %rax
//the rest is omitted
When I inspect %rax and %rbx with gdb, I can see that the registers hold the values rather than the addresses:
Breakpoint 1, fun () at cw.s:6
6 movq 16(%rbp), %rax #my n address
(gdb) s
7 movq 24(%rbp), %rbx #my ret address
(gdb) s
9 cmpq $0, %rax
(gdb) p $rax
$1 = 1
(gdb) p $rbx
$2 = 0
I don't really see what's wrong with my code. I'm sure that &n makes C pass the address instead of the value. I am following the solution provided here, but with no luck.
Calling a C function in assembly
Update:
I'm running LXLE (it's a fork of Ubuntu) on AMD x86_64. The compiler used is gcc (Ubuntu 4.8.2-19ubuntu1) and GNU assembler (GNU Binutils for Ubuntu) 2.24. My makefile:
cw: cw.c cw.o
gcc cw.o cw.c -o cw
cw.o: cw.s
as -gstabs -o cw.o cw.s

What architecture are you on? What compiler generated the code for fun? Did you write it yourself?
The code uses the r* registers and your question mentions "x64", so I assume this is an amd64/x86-64/x64 architecture. You're reading from the stack (the loads you've commented as "my n/ret address"), so I assume you expect the function arguments to be there, but I'm not aware of any ABI on that CPU family that passes the first arguments to a function on the stack.
If you wrote it yourself, you need to read up on the calling convention of the ABI your operating system and compiler use, because unless you're on a very obscure operating system, it will not pass the first few function arguments on the stack. Most likely you're just reading whatever is on the stack at those offsets, which happens to match where your compiler put the values in the calling function.
If you're on Linux or most other Unix-like systems that use the SysV ABI, the first two arguments to a function will be in the rdi and rsi registers. If you're on Windows, they will be in rcx and rdx. This assumes that your arguments are ints, longs, or pointers. If the arguments are structs, floating-point values, or similar, other rules apply.

Related

Why can't I get the value of asm registers in C?

I'm trying to read the values of the registers rdi, rsi, rdx, rcx, and r8 from C, but I'm getting the wrong values. I can't tell whether my code reads those registers or tells the compiler to write to them. If it's the latter, how can I achieve what I want (putting the values of the registers into C variables)?
When this code compiles (with gcc -S test.c)
#include <stdio.h>
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
it outputs the following assembly code (before the function call):
movl $1, %edi
movl $2, %esi
movl $3, %edx
movl $4, %ecx
movl $5, %r8d
callq _beautiful_function
When I compile and execute it outputs this:
0
0
4294967296
140732705630496
140732705630520
(some undefined values)
What did I do wrong, and how could I do this?
Your code didn't work because Specifying Registers for Local Variables explicitly tells you not to do what you did:
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm).
Other than when invoking the Extended asm, the contents of the specified register are not guaranteed. For this reason, the following uses are explicitly not supported. If they appear to work, it is only happenstance, and may stop working as intended due to (seemingly) unrelated changes in surrounding code, or even minor changes in the optimization of a future version of gcc:
Passing parameters to or from Basic asm
Passing parameters to or from Extended asm without using input or output operands.
Passing parameters to or from routines written in assembler (or other languages) using non-standard calling conventions.
To put the value of registers in variables, you can use Extended asm, like this:
long rdi, rsi, rdx, rcx;
register long r8 asm("r8");
asm("" : "=D"(rdi), "=S"(rsi), "=d"(rdx), "=c"(rcx), "=r"(r8));
But note that even this might not do what you want: the compiler is within its rights to copy the function's parameters elsewhere and reuse the registers for something different before your Extended asm runs, or even to not pass the parameters at all if you never read them through the normal C variables. (And indeed, even what I posted doesn't work when optimizations are enabled.) You should strongly consider just writing your whole function in assembly instead of inline assembly inside of a C function if you want to do what you're doing.
Even if you had a valid way of doing this (which this isn't), it probably only makes sense at the top of a function which isn't inlined. So you'd probably need __attribute__((noinline, noclone)). (noclone is a GCC attribute that clang will warn about not recognizing; it means not to make an alternate version of the function with fewer actual args, to be called in the case where some of them are known constants that can get propagated into the clone.)
register ... asm local vars aren't guaranteed to do anything except when used as operands to Extended Asm statements. GCC does sometimes still read the named register if you leave it uninitialized, but clang doesn't. (And it looks like you're on a Mac, where the gcc command is actually clang, because so many build scripts use gcc instead of cc.)
So even without optimization, the stand-alone non-inlined version of your beautiful_function is just reading uninitialized stack space when it reads your rdi C variable in const long save_rdi = rdi;. (GCC does happen to do what you wanted here, even at -Os, which optimizes but chooses not to inline your function. See clang and GCC (targeting Linux) on Godbolt, with asm + program output.)
Using an asm statement to make register asm do something
(This does what you say you want (reading registers), but because of other optimizations, still doesn't produce 1 2 3 4 5 with clang when the caller can see the definition. Only with actual GCC. There might be a clang option to disable some relevant IPA / IPO optimization, but I didn't find one.)
You can use an asm volatile() statement with an empty template string to tell the compiler that the values in those registers are now the values of those C variables. (The register ... asm declarations force it to pick the right register for the right variable)
#include <stdlib.h>
#include <stdio.h>
__attribute__((noinline,noclone))
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
// "activate" the register-asm locals:
// associate register values with C vars here, at this point
asm volatile("nop # asm statement here" // can be empty, nop is just because Godbolt filters asm comments
: "=r"(rdi), "=r"(rsi), "=r"(rdx), "=r"(rcx), "=r"(r8) );
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
This makes asm in your beautiful_function that does capture the incoming values of your registers. (It doesn't inline, and the compiler happens not to have used any instructions before the asm statement that steps on any of those registers. The latter is not guaranteed in general.)
On Godbolt with clang -O3 and gcc -O3
gcc -O3 does actually work, printing what you expect. clang still prints garbage, because the caller sees that the args are unused, and decides not to set those registers. (If you'd hidden the definition from the caller, e.g. in another file without LTO, that wouldn't happen.)
(With GCC, the noinline,noclone attributes are enough to disable this inter-procedural optimization, but not with clang. Not even compiling with -fPIC makes that possible. I guess the idea is that symbol-interposition to provide an alternate definition of beautiful_function that does use its args would violate the one definition rule in C. So if clang can see a definition for a function, it assumes that's how the function works, even if it isn't allowed to actually inline it.)
With clang:
main:
pushq %rax # align the stack
# arg-passing optimized away
callq beautiful_function@PLT
# indirect through the PLT because I compiled for Linux with -fPIC,
# and the function isn't "static"
xorl %eax, %eax
popq %rcx
retq
But the actual definition for beautiful_function does exactly what you want:
# clang -O3
beautiful_function:
pushq %r14
pushq %rbx
nop # asm statement here
movq %rdi, %r9 # copying all 5 register outputs to different regs
movq %rsi, %r10
movq %rdx, %r11
movq %rcx, %rbx
movq %r8, %r14
leaq .L.str(%rip), %rdi
xorl %eax, %eax
movq %r9, %rsi # then copying them to printf args
movq %r10, %rdx
movq %r11, %rcx
movq %rbx, %r8
movq %r14, %r9
popq %rbx
popq %r14
jmp printf@PLT # TAILCALL
GCC wastes fewer instructions, just for example starting with movq %r8, %r9 to move your r8 C var as the 6th arg to printf. Then movq %rcx, %r8 to set up the 5th arg, overwriting one of the output registers before it's read all of them. Something clang was over-cautious about. However, clang does still push/pop %r12 around the asm statement; I don't understand why. It ends by tailcalling printf, so it wasn't for alignment.
Related:
How to specify a specific register to assign the result of a C expression in inline assembly in GCC? - the opposite problem: materialize a C variable value in a specific register at a certain point.
Reading a register value into a C variable - the previous canonical Q&A which uses the now-unsupported register ... asm("regname") method like you were trying to. Or with a register-asm global variable, which hurts efficiency of all your code by leaving it otherwise untouched.
I forgot I'd answered that Q&A, making basically the same points as this. And some other points, e.g. that this doesn't work on registers like the stack pointer.

Why does C not push a pointer on the stack when calling an assembly function?

I am currently trying to get some experience with calling assembly functions from C. Therefore, I created a little program which calculates the sum of all array elements.
The C Code looks like this:
#include <stdio.h>
#include <stdint.h>
extern int32_t arrsum(int32_t* arr,int32_t length);
int main()
{
int32_t test[] = {1,2,3};
int32_t length = 3;
int32_t sum = arrsum(test,length);
printf("Sum of arr: %d\n",sum);
return 0;
}
And the assembly function looks like this:
.text
.global arrsum
arrsum:
pushq %rbp
movq %rsp, %rbp
pushq %rdi
pushq %rcx
movq 24(%rbp),%rcx
#movq 16(%rbp),%rdi
xorq %rax,%rax
start_loop:
addl (%rdi),%eax
addq $4,%rdi
loop start_loop
popq %rcx
popq %rdi
movq %rbp , %rsp
popq %rbp
ret
I assumed that C obeys the calling convention and pushes all arguments on the stack. And indeed, at position 24(%rbp) I am able to find the length of the array. I expected to find the pointer to the array at 16(%rbp), but instead I just found 0x0. After some debugging I found that C didn't push the pointer at all but instead moved the whole pointer into the %rdi register.
Why does this happen? I couldn't find any information about this behavior.
The calling convention the C compiler will use depends on your system, metadata you pass to the compiler and flags. It sounds like your compiler is using the System V AMD64 calling convention detailed here: https://en.m.wikipedia.org/wiki/X86_calling_conventions (implying that you're using a Unix-like OS on a 64 bit x86 chip). Basically, in this convention most arguments go into registers because it's faster and the 64 bit x86 systems have enough registers to make this work (usually).
I assumed that C obeys the calling convention and pushes all arguments on the stack.
There is no "the" calling convention. Passing arguments via the stack is only one possible calling convention (of many). This strategy is commonly used on 32-bit systems, but even there, it is not the only way that parameters are passed.
Most 64-bit calling conventions pass the first 4–6 arguments in registers, which is generally more efficient than passing them on the stack.
Exactly which calling convention is at play here is system-dependent; your question doesn't give much of a clue whether you're using Windows or *nix, but I'm guessing that you're using *nix since the parameter is being passed in the rdi register. In that case, the compiler would be following the System V AMD64 ABI.
In the System V AMD64 calling convention, the first six integer-sized arguments (which can also be pointers) are passed in the registers RDI, RSI, RDX, RCX, R8, and R9, in that order. Each register is dedicated to a parameter, thus parameter 1 always goes into RDI, parameter 2 always goes into RSI, and so on. Floating-point parameters are instead passed via the vector registers, XMM0-XMM7. Additional parameters are passed on the stack in reverse order.
More information about this and other common calling conventions is available in the x86 tag wiki.

c & gcc : Stack growth and alignment - for a 64 bit machine

I have the following program. I wonder why it outputs -4 on the following 64-bit machine. Which of my assumptions is wrong?
[Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux]
With this machine and the gcc compiler, b should be pushed first and a second by default.
The stack grows downwards, so b should have the higher address and a the lower one, and the result should be positive. But I got -4. Can anybody explain this?
The arguments are two chars, occupying 2 bytes in the stack frame, but I see a difference of 4 whereas I expect 1. Even if somebody says this is because of alignment, I wonder why a structure with 2 chars is not aligned to 4 bytes.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void CompareAddress(char a, char b)
{
printf("Differs=%ld\n", (intptr_t )&b - (intptr_t )&a);
}
int main()
{
CompareAddress('a','b');
return 0;
}
/* Differs= -4 */
Here's my guess:
On Linux in x64, the calling convention states that the first few parameters are passed by register.
So in your case, both a and b are passed in registers rather than on the stack. However, since you take their addresses, the compiler will store them somewhere on the stack after the function is called. (Not necessarily in downward order.)
It's also possible that the function is just outright inlined.
In either case, the compiler makes temporary stack space to store the variables. Those can be in any order and subject to optimizations. So they may not be in any particular order that you might expect.
The best way to answer this sort of question (about the behaviour of a specific compiler on a specific platform) is to look at the assembler. You can get gcc to dump its assembler by passing the -S flag (and the -fverbose-asm flag is nice too). Running
gcc -S -fverbose-asm file.c
gives a file.s that looks a little like (I've removed all the irrelevant bits, and the bits in parenthesis are my notes):
CompareAddress:
# ("allocate" memory on the stack for local variables)
subq $16, %rsp
# (put a and b onto the stack)
movl %edi, %edx # a, tmp62
movl %esi, %eax # b, tmp63
movb %dl, -4(%rbp) # tmp62, a
movb %al, -8(%rbp) # tmp63, b
# (get their addresses)
leaq -8(%rbp), %rdx #, b.0
leaq -4(%rbp), %rax #, a.1
subq %rax, %rdx # a.1, D.4597 (&b - &a)
# (set up the parameters for the printf call)
movl $.LC0, %eax #, D.4598
movq %rdx, %rsi # D.4597,
movq %rax, %rdi # D.4598,
movl $0, %eax #,
call printf #
main:
# (put 'a' and 'b' into the registers for the function call)
movl $98, %esi #,
movl $97, %edi #,
call CompareAddress
(This question explains nicely what [re]bp and [re]sp are.)
The difference is negative because a ends up at a higher address than b: the compiler stores a at -4(%rbp) and b at -8(%rbp), the same order you would get by pushing a and then b onto a stack that grows downward.
The difference is -4 rather than -1 because the compiler has decided that aligning the arguments to 4-byte boundaries is "better", probably because a 32-bit/64-bit CPU deals with 4 bytes at a time better than it handles single bytes.
(Also, looking at the assembler shows the effect that -mpreferred-stack-boundary has: it essentially means that memory on the stack is allocated in different sized chunks.)
I think the answer the program gives you is correct. GCC's default preferred-stack-boundary is 4; you can pass -mpreferred-stack-boundary=num to GCC to change the stack boundary, and the program may then give a different answer depending on the value you set.

Is syscall an instruction on x86_64?

I wanted to check the code for performing system calls in glibc. I found something like this:
ENTRY (syscall)
movq %rdi, %rax /* Syscall number -> rax. */
movq %rsi, %rdi /* shift arg1 - arg5. */
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
movq %r9, %r8
movq 8(%rsp),%r9 /* arg6 is on the stack. */
syscall /* Do the system call. */
cmpq $-4095, %rax /* Check %rax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */
L(pseudo_end):
ret /* Return to caller. */
Now my questions are:
Is syscall (before the cmpq instruction) an instruction?
If it is an instruction, what is the meaning of ENTRY (syscall)? The same name for an ENTRY (I don't know what an ENTRY is) and instruction?
What is L(pseudo_end)?
syscall is an instruction in x86-64, and is used as part of the ABI for making system calls. (The 32-bit ABI uses int 80h or sysenter, and is also available in 64-bit mode, but using the 32-bit ABI from 64-bit code is a bad idea, especially for calls with pointer arguments.)
But there is also a C library function named syscall(2), a generic wrapper for the system-call ABI. Your code shows the dump of that function, including its decoding of the return value into errno-setting. ENTRY(syscall) just means that the function starts there.
L() and ENTRY() are CPP macros.
L(pseudo_end) is just a Label that can be a jump target. Maybe the code at SYSCALL_ERROR_LABEL jumps back to there, although it would be more efficient for that block of code to just ret, so maybe it's a relic from a former version, or used for something else.
Yes, syscall is an instruction on x86-64. There is a similar instruction sysenter on i686.
ENTRY(syscall) would be a macro. Probably expands to the symbol definition, you have to grep for that.

Calling main from assembly

I'm writing a small library intended to be used in place of libc in a small application. I've read the source of the major libc alternatives, but I am unable to get the parameter passing to work for the x86_64 architecture on Linux.
The library does not require any initialization step between _start and main. Since libc and its alternatives do use an initialization step, and my assembly knowledge is limited, I suspect the parameter reordering is causing me trouble.
This is what I've got, which contains assembly inspired from various implementations:
.text
.global _start
_start:
/* Mark the outmost frame by clearing the frame pointer. */
xorl %ebp, %ebp
/* Pop the argument count of the stack and place it
* in the first parameter-passing register. */
popq %rdi
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
/* Align the stack at a 16-byte boundary. */
andq $~15, %rsp
/* Invoke main (defined by the host program). */
call main
/* Request process termination by the kernel. This
* is x86 assembly but it works for now. */
mov %ebx, %eax
mov %eax, 1
int $80
And the entry point is the ordinary main signature: int main(int argc, char* argv[]). Environment variables etc. are not required for this particular project.
The AMD64 ABI says rdi should be used for the first parameter, and rsi for the second.
How do I correctly setup the stack and pass the parameters to main on Linux x86_64? Thanks!
References:
http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/sysdeps/x86_64/elf/start.S?view=markup
http://git.uclibc.org/uClibc/tree/libc/sysdeps/linux/x86_64/crt1.S
I think you got
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
wrong. It should be
movq %rsp, %rsi # move argv to rsi, the second parameter in x86_64 abi
main is called by crt0.o; see also this question
The kernel is setting up the initial stack and process environment after execve as specified in the ABI document (architecture specific); the crt0 (and related) code is in charge of calling main.
