I've run into some strange behaviour when trying to obtain the current stack pointer in C (using inline ASM). The code looks like:
#include <stdio.h>
class os {
public:
static void* current_stack_pointer();
};
void* os::current_stack_pointer() {
register void *esp __asm__ ("rsp");
return esp;
}
int main() {
printf("%p\n", os::current_stack_pointer());
}
If I compile the code using the standard gcc options:
$ g++ test.cc -o test
It generates the following assembly:
__ZN2os21current_stack_pointerEv:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rdi,0xf8(%rbp)
0000000000000008 movq 0xe0(%rbp),%rax
000000000000000c movq %rax,%rsp
000000000000000f movq %rsp,%rax
0000000000000012 movq %rax,0xe8(%rbp)
0000000000000016 movq 0xe8(%rbp),%rax
000000000000001a movq %rax,0xf0(%rbp)
000000000000001e movq 0xf0(%rbp),%rax
0000000000000022 popq %rbp
If I run the resulting binary it crashes with a SIGILL (Illegal Instruction). However if I add a little optimisation to the compile:
$ g++ -O1 test.cc -o test
The generated assembly is much simpler:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rsp,%rax
0000000000000007 popq %rbp
0000000000000008 ret
And the code runs fine. So to the question; is there a more stable to get hold of the stack pointer from C code on Mac OS X? The same code has no problems on Linux.
The problem with attempting to fetch the stack pointer through a function call is that the stack pointer inside the called function is pointing at a value that will be completely different after the function returns, and therefore you're capturing the address of a location that will be invalid after the call. You're also making the assumption that there was no function prologue added by the compiler on that platform (i.e., both your functions currently have a prologue where the compiler setups up the current activation record on the stack for the function, which will change the value of RSP that you are attempting to capture). At the very least, provided that there was no function prologue added by the compiler, you will need to subtract the size of a pointer on the platform you're using in order to actually get the "true" address to where the stack will be pointing after the return from the function call. This is because the assembly command call pushes the return address for the instruction pointer onto the stack, and ret in the callee will pop that value off the stack. Thus inside the callee, there will at the very least be a return-address instruction that the stack-pointer will be pointing to, and that location won't be valid after the function call. Finally, on certain platforms (unfortunately not x86), you can use the __attributes__((naked)) tag to create a function with no prologue in gcc. Using the inline keyword to avoid a prologue is not completely reliable since it does not force the compiler to inline the function ... under certain low-optimization levels, inlining will not occur, and you'll end up with a prologue again, and the stack-pointer will not be pointing to the correct location if you decide to take it's address in those cases.
If you must have the value of the stack pointer, then the only reliable method will be to use assembly, follow the rules of your platform's ABI, compile to an object file using an assembler, and then link that object file with the rest of the object files in your executable. You can then expose the assembler function to the rest of your code by including a function declaration in a header file. So your code could look like (assuming you're using gcc to compile your assembly):
//get_stack_pointer.h
extern "C" void* get_stack_ptr();
//get_stack_pointer.S
.section .text
.global get_stack_ptr
get_stack_ptr:
movq %rsp, %rax
addq $8, %rax
ret
Rather than using a register variable with a constraint, you should just write some explicit inline assembler to fetch %esp:
static void *getsp(void)
{
void *sp;
__asm__ __volatile__ ("movq %%rsp,%0"
: "=r" (sp)
: /* No input */);
return sp;
}
You can also convert this to a macro using gcc statement expressions:
#define GETSP() ({void *sp;__asm__ __volatile__("movl %%esp,%0":"=r"(sp):);sp;})
A multi arch version was what I needed recently:
/**
* helps to check the architecture macros:
* `echo | gcc -E -dM - | less`
*
* this is arm, x64 and i386 (linux | apple) compatible
* #return address where the stack starts
*/
void *get_sp(void) {
void *sp;
__asm__ __volatile__(
#ifdef __x86_64__
"movq %%rsp,%0"
#elif __i386__
"movl %%esp,%0"
#elif __arm__
// sp is an alias for r13
"mov %%sp,%0"
#endif
: "=r" (sp)
: /* no input */
);
return sp;
}
I do not have a reference for that, but GCC is known to occasionally (often) misbehave in the presence of inline assembly if compilation is not optimized at all. So you should always add the -O1 flag.
As a side-note, what you are trying to do is not very robust in the presence of an optimizing compiler, because the compiler may inline the call to current_stack_pointer() and the returned value may thus be an approximation of the current stack pointer value (not even a lower bound).
Related
I'm trying to get the values of the assembly registers rdi, rsi, rdx, rcx, r8, but I'm getting the wrong value, so I don't know if what I'm doing is taking those values or telling the compiler to write on these registers, and if that's the case how could I achieve what I'm trying to do (Put the value of assembly registers in C variables)?
When this code compiles (with gcc -S test.c)
#include <stdio.h>
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
it outputs the following assembly code (before the function call):
movl $1, %edi
movl $2, %esi
movl $3, %edx
movl $4, %ecx
movl $5, %r8d
callq _beautiful_function
When I compile and execute it outputs this:
0
0
4294967296
140732705630496
140732705630520
(some undefined values)
What did I do wrong ? and how could I do this?
Your code didn't work because Specifying Registers for Local Variables explicitly tells you not to do what you did:
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm).
Other than when invoking the Extended asm, the contents of the specified register are not guaranteed. For this reason, the following uses are explicitly not supported. If they appear to work, it is only happenstance, and may stop working as intended due to (seemingly) unrelated changes in surrounding code, or even minor changes in the optimization of a future version of gcc:
Passing parameters to or from Basic asm
Passing parameters to or from Extended asm without using input or output operands.
Passing parameters to or from routines written in assembler (or other languages) using non-standard calling conventions.
To put the value of registers in variables, you can use Extended asm, like this:
long rdi, rsi, rdx, rcx;
register long r8 asm("r8");
asm("" : "=D"(rdi), "=S"(rsi), "=d"(rdx), "=c"(rcx), "=r"(r8));
But note that even this might not do what you want: the compiler is within its rights to copy the function's parameters elsewhere and reuse the registers for something different before your Extended asm runs, or even to not pass the parameters at all if you never read them through the normal C variables. (And indeed, even what I posted doesn't work when optimizations are enabled.) You should strongly consider just writing your whole function in assembly instead of inline assembly inside of a C function if you want to do what you're doing.
Even if you had a valid way of doing this (which this isn't), it probably only makes sense at the top of a function which isn't inlined. So you'd probably need __attribute__((noinline, noclone)). (noclone is a GCC attribute that clang will warn about not recognizing; it means not to make an alternate version of the function with fewer actual args, to be called in the case where some of them are known constants that can get propagated into the clone.)
register ... asm local vars aren't guaranteed to do anything except when used as operands to Extended Asm statements. GCC does sometimes still read the named register if you leave it uninitialized, but clang doesn't. (And it looks like you're on a Mac, where the gcc command is actually clang, because so many build scripts use gcc instead of cc.)
So even without optimization, the stand-alone non-inlined version of your beautiful_function is just reading uninitialized stack space when it reads your rdi C variable in const long save_rdi = rdi;. (GCC does happen to do what you wanted here, even at -Os - optimizes but chooses not to inline your function. See clang and GCC (targeting Linux) on Godbolt, with asm + program output.).
Using an asm statement to make register asm do something
(This does what you say you want (reading registers), but because of other optimizations, still doesn't produce 1 2 3 4 5 with clang when the caller can see the definition. Only with actual GCC. There might be a clang option to disable some relevant IPA / IPO optimization, but I didn't find one.)
You can use an asm volatile() statement with an empty template string to tell the compiler that the values in those registers are now the values of those C variables. (The register ... asm declarations force it to pick the right register for the right variable)
#include <stdlib.h>
#include <stdio.h>
__attribute__((noinline,noclone))
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
// "activate" the register-asm locals:
// associate register values with C vars here, at this point
asm volatile("nop # asm statement here" // can be empty, nop is just because Godbolt filters asm comments
: "=r"(rdi), "=r"(rsi), "=r"(rdx), "=r"(rcx), "=r"(r8) );
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
This makes asm in your beautiful_function that does capture the incoming values of your registers. (It doesn't inline, and the compiler happens not to have used any instructions before the asm statement that steps on any of those registers. The latter is not guaranteed in general.)
On Godbolt with clang -O3 and gcc -O3
gcc -O3 does actually work, printing what you expect. clang still prints garbage, because the caller sees that the args are unused, and decides not to set those registers. (If you'd hidden the definition from the caller, e.g. in another file without LTO, that wouldn't happen.)
(With GCC, noninline,noclone attributes are enough to disable this inter-procedural optimization, but not with clang. Not even compiling with -fPIC makes that possible. I guess the idea is that symbol-interposition to provide an alternate definition of beautiful_function that does use its args would violate the one definition rule in C. So if clang can see a definition for a function, it assumes that's how the function works, even if it isn't allowed to actually inline it.)
With clang:
main:
pushq %rax # align the stack
# arg-passing optimized away
callq beautiful_function#PLT
# indirect through the PLT because I compiled for Linux with -fPIC,
# and the function isn't "static"
xorl %eax, %eax
popq %rcx
retq
But the actual definition for beautiful_function does exactly what you want:
# clang -O3
beautiful_function:
pushq %r14
pushq %rbx
nop # asm statement here
movq %rdi, %r9 # copying all 5 register outputs to different regs
movq %rsi, %r10
movq %rdx, %r11
movq %rcx, %rbx
movq %r8, %r14
leaq .L.str(%rip), %rdi
xorl %eax, %eax
movq %r9, %rsi # then copying them to printf args
movq %r10, %rdx
movq %r11, %rcx
movq %rbx, %r8
movq %r14, %r9
popq %rbx
popq %r14
jmp printf#PLT # TAILCALL
GCC wastes fewer instructions, just for example starting with movq %r8, %r9 to move your r8 C var as the 6th arg to printf. Then movq %rcx, %r8 to set up the 5th arg, overwriting one of the output registers before it's read all of them. Something clang was over-cautious about. However, clang does still push/pop %r12 around the asm statement; I don't understand why. It ends by tailcalling printf, so it wasn't for alignment.
Related:
How to specify a specific register to assign the result of a C expression in inline assembly in GCC? - the opposite problem: materialize a C variable value in a specific register at a certain point.
Reading a register value into a C variable - the previous canonical Q&A which uses the now-unsupported register ... asm("regname") method like you were trying to. Or with a register-asm global variable, which hurts efficiency of all your code by leaving it otherwise untouched.
I forgot I'd answered that Q&A, making basically the same points as this. And some other points, e.g. that this doesn't work on registers like the stack pointer.
I'm trying to exit a program with assembly instructions, but when I compile with gcc it says that mov is a bad instruction, even when I use movl which I don't even know what it is. Is it even possible to exit a program with assembly instructions?
int main(void)
{
__asm__("movl %rax, $60\n\t"
"movl %rdi, $0\n\t"
"syscall\n");
}
// cc main.c -o main && ./main
You need movq for 64 bit. Also, your operations are not in the correct order.
The following compiles:
int main(void)
{
__asm__("movq $60, %rax\n\t"
"movq $0, %rdi\n\t"
"syscall\n");
}
Note that for any other system call (which doesn't terminate the whole program), it's necessary to tell the compiler which registers are clobbered, and usually to use a "memory" clobber to make sure memory is in sync with C values before a system call reads or writes memory.
Also, to pass operands, you'll need Extended asm syntax. See How to invoke a system call via sysenter in inline assembly? for an example my_write wrapper. (Which has only "syscall" inside the asm template; we ask the compiler to put the call number and args in the right registers instead of writing mov)
I need to pass address instead of value of my field from C to assembly function, and I have no idea why I end up with value instead of address.
C code:
long n = 1,ret = 0;
fun(&n, &ret);
//the rest is omitted
Assembly code:
.globl fun
fun:
pushq %rbp
movq %rsp, %rbp
movq 16(%rbp), %rax #my n address
movq 24(%rbp), %rbx #my ret address
cmpq $0, %rax
//the rest is omitted
When I peek values of %rax and %rbx with gdb I can see that I have values in my registers:
Breakpoint 1, fun () at cw.s:6
6 movq 16(%rbp), %rax #my n address
(gdb) s
7 movq 24(%rbp), %rbx #my ret address
(gdb) s
9 cmpq $0, %rax
(gdb) p $rax
$1 = 1
(gdb) p $rbx
$2 = 0
I don't really see whats wrong with my code. I'm sure that &n makes C pass address instead of value. I am following the solution provided here, but with no luck.
Calling a C function in assembly
Update:
I'm running LXLE (it's a fork of Ubuntu) on AMD x86_64. The compiler used is gcc (Ubuntu 4.8.2-19ubuntu1) and GNU assembler (GNU Binutils for Ubuntu) 2.24. My makefile:
cw: cw.c cw.o
gcc cw.o cw.c -o cw
cw.o: cw.s
as -gstabs -o cw.o cw.s
What architecture are you on? What compiler generated the code for fun? Did you write it yourself?
The code is using the r* registers and your question mentions "x64", so I would assume it's some amd64/x86-64/x64 architecture. You're reading things from the stack (which you've commented as "my n/ret address") which I would assume that you expect the function arguments to be there but I'm not aware of any ABI on that CPU family that passes the first arguments to a function on the stack.
If you wrote it yourself, you need to read up on the calling conventions of the ABI your operating system/compiler uses, because unless you're on a very obscure operating system it will not pass (the first few) function arguments on the stack. Most likely you're just reading random values from the stack that just happen to match where your compiler happened to put the values in the calling function.
If you're on Linux or most other unix-like system that use the SysV ABI the first two arguments to a function will be in the rdi, rsi registers. If you're on Windows, that will be rcx, rdx. This is assuming that your arguments are int/long/pointers. If the arguments are structs, floating point or such, other rules apply.
I am trying to analyze Linux assembly stack initialization/cleanup during function call/return using this code snippet below. The uninitialized variables are intended.
#define MAX 16
typedef struct _CONTEXT {
int arr[MAX];
int a;
int b;
int c;
};
void init(CONTEXT* ctx)
{
memset(ctx->arr, 0, sizeof(ctx->arr[0]));
ctx->a = 1;
}
void process(CONTEXT* ctx)
{
int trash;
int i;
for (i = 0; i < MAX; i++)
{
trash = ctx->arr[i];
}
}
int main(int argc, char *argv[])
{
CONTEXT ctx;
init(&ctx);
process(&ctx);
return 0;
}
As I learned from the school, and from this lecture slide,
an assembly of a function's stack initialization(style-1) should look like this:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movq %rdi, -8(%rbp)
...
leave
ret
But when I compile the code snippet above with gcc, the function main and init has the same stack initialization routine style-1 including subq instruction, to allocate stack variables' memory space,
but the function process does not have that kind of stack initialization.
I got this assembly code(style-2):
pushq %rbp
movq %rsp, %rbp
movq %rdi, -24(%rbp)
...
popq %rbp
ret
So the questions are:
What's the policy of compiler decision of creating different function stack initialization during compile time? I didn't put any __cdecl or such in this code, but 2 different stack initialization is found.
How can I find out the allocated stack memory address and size when a style-2 function is initialized?
What's the purpose of movq %rdi, -8(%rbp)?
Are there more stack initialization styles beside style-1 and style-2 in Linux? (without mentioning __cdecl or __stdcall things explicitly)
The way functions are compiled are highly compiler specific and not OS specific. I'm sure that the code generated by GCC under 32-bit Windows is similar to the code generated by GCC under 32-bit Linux while the code generated by Sun's C compiler under 32-bit Linux will look different from the code generated by GCC. This is also the case for stack initialization! Therefore there can be MANY stack initialization styles depending on the compiler used, the compiler settings, the compiler version, internal compiler states and so on.
You are obviously running 64-bit code. Unlike 32-bit Windows (where __cdecl and __stdcall exist) in 64-bit Windows and in Linux there is only one calling convention: In 32-bit Linux this is equal to __cdecl in Windows; 64-bit Linux and 64-bit Windows use two different register-based calling conventions. This means: You cannot change the calling convention for Linux and 64-bit Windows programs because only one is supported.
The purpose of movq %rdi, -8(%rbp) is to store the argument (in the rdi register) on the stack; movq %rdi, -24(%rbp) does the same but it is writing to some area of the stack that may be overwritten by signal handlers - not a good idea! However this is no problem if the value is not read back from the stack!
Obviously the "style-2" function does not require any stack memory.
While playing around with GCC's inline assembler feature, I tried to make a function which immediately exited the process, akin to _Exit from the C standard library.
Here is the relevant piece of source code:
void immediate_exit(int code)
{
#if defined(__x86_64__)
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "" (code) :
);
//Skip other architectures here, I'll fix these later.
#else
# error "Architecture not supported."
#endif
}
This works fine for debug builds (with -O0), but as soon as I turn optimisation on at any level, I get the following error:
immediate_exit.c: Assembler messages:
immediate_exit.c:4: Error: unsupported for `mov'
So I looked at the assembler output for both builds (I've removed .cfi* directives and other things for clarity, I can add that in again if it's a problem). The debug build:
immediate_exit:
.LFB0:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
mov -4(%rbp), %rdi
mov $231, %rax
syscall
popq %rbp
ret
And the optimised version:
immediate_exit:
.LFB0:
mov %edi, %rdi
mov $231, %rax
syscall
ret
So the optimised version is trying to put a 32-bit register edi into a 64-bit register, rdi, rather than loading it from rbp, which I presume is what is causing the error.
Now, I can fix this by specifying 'm' as a register constraint for code, which causes GCC to load from rbp regardless of optimisation level. However, I'd rather not do that, because I think the compiler and its authors has a much better idea about where to put stuff than I do.
So (finally!) my question is: how do I persuade GCC to use rdi rather than edi for the assembly output?
Overall, you're much better off using constraints to get values into the right registers rather than explicit moves:
#include <asm/unistd.h>
asm volatile("syscall"
: // no outputs. Other syscalls need an "=a"(retval) to tell the compiler RAX is modified, whether you actually use the retval or not.
: "D" ((uint64_t)code), "a" ((uint64_t)__NR_exit_group) // 231
: "rcx", "r11" // syscall itself clobbers these. exit can't fail and return; mostly here as an example for other syscalls
, "memory" // make sure any stores, e.g. to mmapped files, are done before this
);
__builtin_unreachable(); // tell the compiler execution doesn't come out the bottom of the asm statement. Maybe have the same effect as a "memory" clobber of making sure not to delay stores which could potentially be to mmapped files or shared memory.
That lets compiler hoist the moves earlier in the code if useful, or even avoid the move altogether if the value can be arranged to already be in the correct register...
For example code will be in EDI if this function doesn't inline; the Linux system-calling convention was chosen to be as close as possible to the x86-64 System V function-calling convention, except for using R10 instead of RCX because the syscall instruction itself overwrites it with saved-RIP, and R11 with saved-RFLAGS.
(Unnecessarily casting (uint64_t)code would force the compiler to redo zero-extension with a mov %edi, %edi in that case, though. The call number does need to be zero-extended to 64-bit, which will almost certainly happen for free even if you didn't manually cast it (since the compiler will use a mov $231, %eax), but it doesn't hurt to be explicit about something that is required. The exit_group system call takes a 32-bit int arg, so the kernel is guaranteed to ignore high garbage in RDI.)
Cast your variable into the appropriate length type.
#include <stdint.h>
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "g" ((uint64_t)code)
);
or better have your operand type straight away of the right size:
void immediate_exit(uint64_t code) { ...