Bad Instruction - Inline Assembly Language in C Code

I'm trying to exit a program with assembly instructions, but when I compile with gcc it says that mov is a bad instruction, even when I use movl (which I don't even know the meaning of). Is it even possible to exit a program with assembly instructions?
int main(void)
{
__asm__("movl %rax, $60\n\t"
"movl %rdi, $0\n\t"
"syscall\n");
}
// cc main.c -o main && ./main

You need movq for 64-bit registers. Also, your operands are in the wrong order: AT&T syntax, which GCC uses by default, puts the source first and the destination last.
The following compiles:
int main(void)
{
__asm__("movq $60, %rax\n\t"
"movq $0, %rdi\n\t"
"syscall\n");
}
Note that for any other system call (which doesn't terminate the whole program), it's necessary to tell the compiler which registers are clobbered, and usually to use a "memory" clobber to make sure memory is in sync with C values before a system call reads or writes memory.
Also, to pass operands, you'll need Extended asm syntax. See How to invoke a system call via sysenter in inline assembly? for an example my_write wrapper. (Which has only "syscall" inside the asm template; we ask the compiler to put the call number and args in the right registers instead of writing mov)
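A minimal sketch of such a wrapper, assuming Linux on x86-64 (my_write is a hypothetical name here, not a library function). As described above, the template is only "syscall"; the "a", "D", "S" and "d" constraints ask the compiler to place the call number and arguments in RAX, RDI, RSI and RDX, and the clobber list covers RCX, R11 and memory:

```c
#include <stddef.h>       /* size_t */
#include <sys/syscall.h>  /* SYS_write */

/* Hypothetical wrapper, Linux x86-64 only. The asm template contains just
 * "syscall"; the constraints put every value in the register the kernel expects. */
static long my_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                          /* RAX out: bytes written or -errno */
                     : "a"((long)SYS_write),              /* RAX in: system call number */
                       "D"((long)fd), "S"(buf), "d"(len)  /* RDI, RSI, RDX: arguments */
                     : "rcx", "r11",                      /* syscall overwrites RCX and R11 */
                       "memory");                         /* the kernel reads *buf */
    return ret;
}
```

On success it returns the number of bytes written; on failure the raw system call returns a negative errno value (for example -9, i.e. -EBADF, for a bad file descriptor) instead of setting errno.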

Related

Why can't I get the value of asm registers in C?

I'm trying to get the values of the assembly registers rdi, rsi, rdx, rcx, r8, but I'm getting the wrong values, so I don't know whether what I'm doing reads those values or tells the compiler to write to these registers. If it's the latter, how could I achieve what I'm trying to do (put the values of assembly registers into C variables)?
When this code compiles (with gcc -S test.c)
#include <stdio.h>
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
it outputs the following assembly code (before the function call):
movl $1, %edi
movl $2, %esi
movl $3, %edx
movl $4, %ecx
movl $5, %r8d
callq _beautiful_function
When I compile and run it, it outputs this:
0
0
4294967296
140732705630496
140732705630520
(some undefined values)
What did I do wrong? And how could I do this?
Your code didn't work because the GCC manual section Specifying Registers for Local Variables explicitly tells you not to do what you did:
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm).
Other than when invoking the Extended asm, the contents of the specified register are not guaranteed. For this reason, the following uses are explicitly not supported. If they appear to work, it is only happenstance, and may stop working as intended due to (seemingly) unrelated changes in surrounding code, or even minor changes in the optimization of a future version of gcc:
Passing parameters to or from Basic asm
Passing parameters to or from Extended asm without using input or output operands.
Passing parameters to or from routines written in assembler (or other languages) using non-standard calling conventions.
To put the value of registers in variables, you can use Extended asm, like this:
long rdi, rsi, rdx, rcx;
register long r8 asm("r8");
asm("" : "=D"(rdi), "=S"(rsi), "=d"(rdx), "=c"(rcx), "=r"(r8));
But note that even this might not do what you want: the compiler is within its rights to copy the function's parameters elsewhere and reuse the registers for something different before your Extended asm runs, or even to not pass the parameters at all if you never read them through the normal C variables. (And indeed, even what I posted doesn't work when optimizations are enabled.) You should strongly consider just writing your whole function in assembly instead of inline assembly inside of a C function if you want to do what you're doing.
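As a rough sketch of the write-it-in-assembly approach, assuming x86-64 Linux (System V ABI, ELF output) and a hypothetical function named save_args: because it is called as an ordinary function, the calling convention guarantees that the first five arguments arrive in RDI, RSI, RDX, RCX and R8 and the output pointer in R9, so the assembly only has to store them. This version is ELF-only; on Mac OS X the symbol would need a leading underscore and different section directives.

```c
/* Hypothetical function implemented entirely in assembly below.
 * x86-64 System V: a..e arrive in RDI, RSI, RDX, RCX, R8; out arrives in R9. */
void save_args(long a, long b, long c, long d, long e, long *out);

/* Top-level (basic) asm: no operand substitution, so registers use a single %. */
__asm__(
    ".pushsection .text\n"
    ".globl save_args\n"
    ".type save_args, @function\n"
    "save_args:\n"
    "    movq %rdi,  0(%r9)\n"   /* out[0] = a */
    "    movq %rsi,  8(%r9)\n"   /* out[1] = b */
    "    movq %rdx, 16(%r9)\n"   /* out[2] = c */
    "    movq %rcx, 24(%r9)\n"   /* out[3] = d */
    "    movq %r8,  32(%r9)\n"   /* out[4] = e */
    "    ret\n"
    ".popsection\n");
```

Calling save_args(1, 2, 3, 4, 5, regs) with long regs[5] fills regs with 1 through 5, because the caller itself has to set up those registers for the call; no compiler optimization inside a C function body can get in the way.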
Even if you had a valid way of doing this (which this isn't), it probably only makes sense at the top of a function which isn't inlined. So you'd probably need __attribute__((noinline, noclone)). (noclone is a GCC attribute that clang will warn about not recognizing; it means not to make an alternate version of the function with fewer actual args, to be called in the case where some of them are known constants that can get propagated into the clone.)
register ... asm local vars aren't guaranteed to do anything except when used as operands to Extended Asm statements. GCC does sometimes still read the named register if you leave it uninitialized, but clang doesn't. (And it looks like you're on a Mac, where the gcc command is actually clang, because so many build scripts use gcc instead of cc.)
So even without optimization, the stand-alone non-inlined version of your beautiful_function just reads uninitialized stack space when it reads your rdi C variable in const long save_rdi = rdi;. (GCC does happen to do what you wanted here, even at -Os, which optimizes but chooses not to inline your function. See clang and GCC (targeting Linux) on Godbolt, with asm + program output.)
Using an asm statement to make register asm do something
(This does what you say you want (reading registers), but because of other optimizations, still doesn't produce 1 2 3 4 5 with clang when the caller can see the definition. Only with actual GCC. There might be a clang option to disable some relevant IPA / IPO optimization, but I didn't find one.)
You can use an asm volatile() statement with an empty template string to tell the compiler that the values in those registers are now the values of those C variables. (The register ... asm declarations force it to pick the right register for the right variable)
#include <stdlib.h>
#include <stdio.h>
__attribute__((noinline,noclone))
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
// "activate" the register-asm locals:
// associate register values with C vars here, at this point
asm volatile("nop # asm statement here" // can be empty, nop is just because Godbolt filters asm comments
: "=r"(rdi), "=r"(rsi), "=r"(rdx), "=r"(rcx), "=r"(r8) );
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
This makes asm in your beautiful_function that does capture the incoming values of your registers. (It doesn't inline, and the compiler happens not to have used any instructions before the asm statement that steps on any of those registers. The latter is not guaranteed in general.)
On Godbolt with clang -O3 and gcc -O3
gcc -O3 does actually work, printing what you expect. clang still prints garbage, because the caller sees that the args are unused, and decides not to set those registers. (If you'd hidden the definition from the caller, e.g. in another file without LTO, that wouldn't happen.)
(With GCC, the noinline,noclone attributes are enough to disable this inter-procedural optimization, but not with clang. Not even compiling with -fPIC prevents it. I guess the idea is that symbol interposition to provide an alternate definition of beautiful_function that does use its args would violate the one definition rule in C. So if clang can see a definition for a function, it assumes that's how the function works, even if it isn't allowed to actually inline it.)
With clang:
main:
pushq %rax # align the stack
# arg-passing optimized away
callq beautiful_function@PLT
# indirect through the PLT because I compiled for Linux with -fPIC,
# and the function isn't "static"
xorl %eax, %eax
popq %rcx
retq
But the actual definition for beautiful_function does exactly what you want:
# clang -O3
beautiful_function:
pushq %r14
pushq %rbx
nop # asm statement here
movq %rdi, %r9 # copying all 5 register outputs to different regs
movq %rsi, %r10
movq %rdx, %r11
movq %rcx, %rbx
movq %r8, %r14
leaq .L.str(%rip), %rdi
xorl %eax, %eax
movq %r9, %rsi # then copying them to printf args
movq %r10, %rdx
movq %r11, %rcx
movq %rbx, %r8
movq %r14, %r9
popq %rbx
popq %r14
jmp printf@PLT # TAILCALL
GCC wastes fewer instructions, for example starting with movq %r8, %r9 to move your r8 C var into the 6th arg register for printf, then movq %rcx, %r8 to set up the 5th arg, overwriting one of the output registers before it has read all of them (something clang was over-cautious about). However, GCC does still push/pop %r12 around the asm statement; I don't understand why. It ends by tailcalling printf, so it wasn't for alignment.
Related:
How to specify a specific register to assign the result of a C expression in inline assembly in GCC? - the opposite problem: materialize a C variable value in a specific register at a certain point.
Reading a register value into a C variable - the previous canonical Q&A, which uses the now-unsupported register ... asm("regname") method like you were trying to. Or with a register-asm global variable, which hurts the efficiency of all your code because the compiler can no longer use that register for anything else.
I forgot I'd answered that Q&A, making basically the same points as this. And some other points, e.g. that this doesn't work on registers like the stack pointer.

x86 add and addl operands are adding wrong?

I'm working with xv6, which implements the original UNIX on x86 machines. I wrote very simple inline assembly in a C program:
register int ecx asm ("%ecx");
printf(1, "%d\n", ecx);
__asm__("movl 16(%esp), %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("add $0, %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("movl %ecx, 16(%esp)\t\n");
I usually get a value like 434 printed by the second print statement. However, after the add command it prints 2. If I use the addl command instead, it also prints 2. I am using the latest stable version of xv6. So, I don't really suspect it to be the problem. Is there any other way I can add two numbers in inline assembly?
Essentially I need to increment 16(%esp) by 4.
Edited code to:
__asm__("addl $8, 16(%esp)\t\n");
1) In your example you're not incrementing ecx by 4; you're incrementing it by 0.
__asm__("addl $4, %ecx");
2) You should be able to chain multiple instructions into one asm statement:
__asm__("movl 16(%esp), %ecx\n\t"
"addl $4, %ecx\n\t"
"movl %ecx, 16(%esp)");
3) The register keyword is only a hint, and the compiler may still put your variable wherever it wants. The GCC documentation also warns that function calls may clobber various registers: ECX is call-clobbered in the i386 calling convention, so printf(), being an ordinary C function, is free to overwrite it without restoring it. General-purpose registers like ECX are used for parameter passing, return values and temporaries all the time.
Untested corrections:
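int reg; // No "register" keyword: GCC picks whichever register is best for %0.
/*
 * volatile tells GCC this asm has side effects beyond its listed outputs,
 * so it must not be deleted or hoisted out of loops; it does not disable
 * other optimizations. In Extended asm, literal register names need a
 * doubled %% (e.g. %%esp) because a single % introduces an operand.
 */
asm volatile ("movl 16(%%esp), %0\n\t"
              "addl $4, %0\n\t"
              "movl %0, 16(%%esp)"
              : "=r" (reg)   // "=r": write-only output operand in a register
              :              // no inputs
              : "memory");   // the asm stores to memory at 16(%esp)
printf(1, "Result: %d\n", reg); // xv6's printf takes a file descriptor first
The GCC manual page has more details.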
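Hard-coding 16(%esp) depends on how the compiler happened to lay out the stack frame. A sketch of the alternative, assuming the value you want to bump can be reached through a C pointer (add4 is a hypothetical helper name): a "+m" constraint lets GCC supply the memory operand itself, so the asm only contains the addl.

```c
/* Hypothetical helper: add 4 to the int at *p with a single x86 addl.
 * Works for both 32-bit and 64-bit x86. */
static void add4(int *p)
{
    __asm__("addl $4, %0"
            : "+m"(*p)   /* read-write memory operand; GCC chooses the address */
            :            /* no inputs */
            : "cc");     /* addl modifies the flags */
}
```

For example, with int x = 10, calling add4(&x) leaves x equal to 14. No volatile is needed because the only effect of the asm is its declared output.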

"unsupported for mov" GCC inline assembler

While playing around with GCC's inline assembler feature, I tried to make a function which immediately exited the process, akin to _Exit from the C standard library.
Here is the relevant piece of source code:
void immediate_exit(int code)
{
#if defined(__x86_64__)
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "" (code) :
);
//Skip other architectures here, I'll fix these later.
#else
# error "Architecture not supported."
#endif
}
This works fine for debug builds (with -O0), but as soon as I turn optimisation on at any level, I get the following error:
immediate_exit.c: Assembler messages:
immediate_exit.c:4: Error: unsupported for `mov'
So I looked at the assembler output for both builds (I've removed .cfi* directives and other things for clarity, I can add that in again if it's a problem). The debug build:
immediate_exit:
.LFB0:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
mov -4(%rbp), %rdi
mov $231, %rax
syscall
popq %rbp
ret
And the optimised version:
immediate_exit:
.LFB0:
mov %edi, %rdi
mov $231, %rax
syscall
ret
So the optimised version is trying to put a 32-bit register edi into a 64-bit register, rdi, rather than loading it from rbp, which I presume is what is causing the error.
Now, I can fix this by specifying 'm' as the operand constraint for code, which causes GCC to load from rbp regardless of optimisation level. However, I'd rather not do that, because I think the compiler and its authors have a much better idea about where to put stuff than I do.
So (finally!) my question is: how do I persuade GCC to use rdi rather than edi for the assembly output?
Overall, you're much better off using constraints to get values into the right registers rather than explicit moves:
#include <stdint.h>     // uint64_t
#include <asm/unistd.h> // __NR_exit_group
__attribute__((noreturn)) void immediate_exit(int code)
{
    asm volatile("syscall"
        : // no outputs. Other syscalls need an "=a"(retval) to tell the compiler RAX is modified, whether you actually use the retval or not.
        : "D" ((uint64_t)code), "a" ((uint64_t)__NR_exit_group) // 231
        : "rcx", "r11" // syscall itself clobbers these. exit can't fail and return; mostly here as an example for other syscalls
        , "memory" // make sure any stores, e.g. to mmapped files, are done before this
    );
    __builtin_unreachable(); // tell the compiler execution doesn't come out the bottom of the asm statement. This may have the same effect as a "memory" clobber of making sure not to delay stores which could potentially be to mmapped files or shared memory.
}
That lets the compiler hoist the moves earlier in the code if useful, or even avoid the move altogether if the value can be arranged to already be in the correct register...
For example, code will already be in EDI if this function doesn't inline; the Linux system-calling convention was chosen to be as close as possible to the x86-64 System V function-calling convention, except for using R10 instead of RCX because the syscall instruction itself overwrites RCX with the saved RIP, and R11 with the saved RFLAGS.
(Unnecessarily casting (uint64_t)code would force the compiler to redo zero-extension with a mov %edi, %edi in that case, though. The call number does need to be zero-extended to 64-bit, which will almost certainly happen for free even if you didn't manually cast it (since the compiler will use a mov $231, %eax), but it doesn't hurt to be explicit about something that is required. The exit_group system call takes a 32-bit int arg, so the kernel is guaranteed to ignore high garbage in RDI.)
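One way to check the constraint-based exit above is to run it in a child process and read the status back in the parent. This is a sketch assuming Linux on x86-64 and POSIX fork/waitpid; exit_status_of_child is a hypothetical test helper. It passes the int directly with "D"(code), relying on the point above that exit_group ignores any high garbage in RDI.

```c
#include <sys/syscall.h>  /* SYS_exit_group */
#include <sys/types.h>    /* pid_t */
#include <sys/wait.h>     /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>       /* fork */

/* Constraint-based exit as in the answer above. Linux x86-64 only. */
__attribute__((noreturn)) static void immediate_exit(int code)
{
    __asm__ volatile("syscall"
                     :
                     : "D"(code), "a"((long)SYS_exit_group)
                     : "rcx", "r11", "memory");
    __builtin_unreachable();
}

/* Hypothetical helper: runs immediate_exit(code) in a child process and
 * returns the exit status the parent observes, or -1 on failure. */
static int exit_status_of_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        immediate_exit(code);   /* never returns; stdio buffers are NOT flushed */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the child leaves through the raw system call, anything still sitting in its stdio buffers is lost, which is exactly the _Exit-like behaviour the question asked for.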
Cast your variable into the appropriate length type.
#include <stdint.h>
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "g" ((uint64_t)code)
);
or, better, give your parameter the right size in the first place:
void immediate_exit(uint64_t code) { ...

use printf function in inline asm on gcc

I want to test the inline asm capability of gcc.
So I typed and compiled the following code on Ubuntu 12.04 64-bit,
but it shows "segmentation fault" on screen when it runs.
I don't have any idea what causes the problem.
#include <stdio.h>
char Format[]="Hello world %d\n";
int main()
{
asm
(
"movl $3,4(%esp);"
"movl $Format,(%esp);"
"call printf;"
);
return 0;
}
Thank you for helping a programming newbie.
I use Code::Blocks as my IDE to write this code. I had tried to use 64-bit registers such as %rdx, but the build log shows "Error: bad register name `%rdx'" when compiling the code. I think this means the gcc invoked by Code::Blocks is a 32-bit version, hence it can't recognize those registers.
I modified the code to reserve stack space:
#include <stdio.h>
char Format[]="Hello world %d\n";
int main()
{
asm
(
"subl $8,%esp;" //I don't know $4, $8, $12, $16, $20 which is correct
//but I had tried them all but results are still ''segmentation fault."
"movl $3,4(%esp);"
"movl $Format,(%esp);"
"call printf;"
"movl %ebp,%esp;"
);
return 0;
}
and even used -m32 as a compiler option, but it still shows "segmentation fault".
Thanks again to whoever helps.
The System V ABI for x86-64 mandates that the first six integer/pointer arguments to a function go in registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The stack is used to pass further arguments. It also requires that when calling functions with a variable number of arguments (like printf), %al (the low byte of %rax) be set to an upper bound on the number of vector (XMM) registers used to pass floating-point arguments. The right sequence to call printf() in your case is:
xorl %eax, %eax
movl $Format, %edi
movl $3, %esi
call printf
%rax should be set to 0 since no floating-point arguments are being passed. This code also relies on the fact that, in a non-PIE executable, the virtual address of initialised data usually lies somewhere in the first 4 GiB, so the shorter 32-bit instructions can be used. Of course printf will still examine the full contents of %rdi to determine where the format string is located in memory.
Your code uses the 32-bit calling convention and should theoretically work if cross-compiled as 32-bit with -m32, but you should first reserve stack space for the arguments using something like subl $20, %esp and restore it after the call with addl $20, %esp; otherwise you are either overwriting the stack of main() or ret will pick the wrong return address. Here is fully working (tested) C/asm code that compiles and runs in 32-bit mode:
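#include <stdio.h>
char Format[] = "Hello world, %d\n";
int main (void)
{
    asm volatile
    (
        // Make stack space for arguments to printf
        "subl $8, %%esp\n"
        "movl $3, 4(%%esp)\n"
        "movl $Format, (%%esp)\n"
        "call printf\n"
        // Clean-up the stack
        "addl $8, %%esp\n"
        : // no outputs
        : // no inputs
        : "eax", "ecx", "edx", "memory", "cc" // printf may clobber the call-clobbered registers
    );
    return 0;
}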
$ gcc -m32 -o test.x test.c
$ ./test.x
Hello world, 3
Remark: I use \n instead of ; at the end of each assembly line only to improve the readability of the compiler assembly output - it is irrelevant to the correctness of the code.
First, look at a normal C program and see what asm it produces (you can get it with gcc -S).
Then, identify the part of ASM which is needed for the printf call and reproduce it in your original program.
What you have here is a calling convention error.

Get the Stack Pointer in C on Mac OS X Lion

I've run into some strange behaviour when trying to obtain the current stack pointer in C (using inline ASM). The code looks like:
#include <stdio.h>
class os {
public:
static void* current_stack_pointer();
};
void* os::current_stack_pointer() {
register void *esp __asm__ ("rsp");
return esp;
}
int main() {
printf("%p\n", os::current_stack_pointer());
}
If I compile the code using the standard gcc options:
$ g++ test.cc -o test
It generates the following assembly:
__ZN2os21current_stack_pointerEv:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rdi,0xf8(%rbp)
0000000000000008 movq 0xe0(%rbp),%rax
000000000000000c movq %rax,%rsp
000000000000000f movq %rsp,%rax
0000000000000012 movq %rax,0xe8(%rbp)
0000000000000016 movq 0xe8(%rbp),%rax
000000000000001a movq %rax,0xf0(%rbp)
000000000000001e movq 0xf0(%rbp),%rax
0000000000000022 popq %rbp
If I run the resulting binary it crashes with a SIGILL (Illegal Instruction). However if I add a little optimisation to the compile:
$ g++ -O1 test.cc -o test
The generated assembly is much simpler:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rsp,%rax
0000000000000007 popq %rbp
0000000000000008 ret
And the code runs fine. So, to the question: is there a more stable way to get hold of the stack pointer from C code on Mac OS X? The same code has no problems on Linux.
The problem with attempting to fetch the stack pointer through a function call is that the stack pointer inside the called function points somewhere completely different from where it will point after the function returns, so you're capturing the address of a location that will be invalid after the call. You're also assuming that the compiler added no function prologue on that platform (both of your functions currently have a prologue where the compiler sets up the current activation record on the stack, which changes the value of RSP that you are trying to capture).
Even if there were no prologue, you would need to add the size of a pointer on your platform to get the "true" value the stack pointer will have after the return from the function call. This is because call pushes the return address onto the stack, and ret in the callee pops it off again. Inside the callee, the stack pointer therefore points at least at a return address, and that location won't be valid after the call.
Finally, on certain platforms (historically not x86, although newer GCC and clang versions accept it there), you can use __attribute__((naked)) to create a function with no prologue in gcc. Using the inline keyword to avoid a prologue is not completely reliable, since it does not force the compiler to inline the function: at low optimization levels inlining may not happen, you'll end up with a prologue again, and the stack pointer will not point to the correct location if you take its address in those cases.
If you must have the value of the stack pointer, then the only reliable method will be to use assembly, follow the rules of your platform's ABI, compile to an object file using an assembler, and then link that object file with the rest of the object files in your executable. You can then expose the assembler function to the rest of your code by including a function declaration in a header file. So your code could look like (assuming you're using gcc to compile your assembly):
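//get_stack_pointer.h
extern "C" void* get_stack_ptr();
//get_stack_pointer.S
/* Mach-O (Mac OS X) prefixes C symbol names with an underscore; ELF (Linux) does not. */
#ifdef __APPLE__
#define SYM(name) _##name
#else
#define SYM(name) name
#endif
    .text
    .globl SYM(get_stack_ptr)
SYM(get_stack_ptr):
    movq %rsp, %rax    /* RSP points at our return address */
    addq $8, %rax      /* value RSP will have once ret pops it */
    ret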
Rather than using a register variable with a constraint, you should just write some explicit inline assembler to fetch the stack pointer (%rsp on x86-64):
static void *getsp(void)
{
void *sp;
__asm__ __volatile__ ("movq %%rsp,%0"
: "=r" (sp)
: /* No input */);
return sp;
}
You can also convert this to a macro using GCC statement expressions:
#define GETSP() ({ void *sp; __asm__ __volatile__("movq %%rsp,%0" : "=r"(sp)); sp; })
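To see the effect described in the first answer, here is a sketch assuming x86-64 and the GCC/clang noinline attribute, with hypothetical helper names. getsp is forced out of line, so the value it reads is the callee's RSP, which lies below every local of its caller; sp_distance measures how far below one such local it is.

```c
#include <stdint.h>  /* uintptr_t */

/* x86-64 only. noinline is deliberate: it makes this a real call, so the
 * value read here is the callee's RSP, not the caller's. */
__attribute__((noinline)) static void *getsp(void)
{
    void *sp;
    __asm__ __volatile__("movq %%rsp, %0" : "=r"(sp));
    return sp;
}

/* Hypothetical helper: bytes between one of the caller's locals and the RSP
 * seen inside getsp(). Positive because the stack grows downward. */
__attribute__((noinline)) static long sp_distance(void)
{
    volatile char marker = 0;   /* lives in this function's own frame */
    void *sp = getsp();
    return (long)((uintptr_t)&marker - (uintptr_t)sp);
}
```

On a typical build the distance is a small positive number of bytes (at least 8, the return address pushed by call); the exact value depends on the compiler and optimization level.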
A multi arch version was what I needed recently:
/**
 * helps to check the architecture macros:
 * `echo | gcc -E -dM - | less`
 *
 * this is arm, x64 and i386 (linux | apple) compatible
 * @return the current value of the stack pointer
 */
void *get_sp(void) {
    void *sp;
    __asm__ __volatile__(
#if defined(__x86_64__)
        "movq %%rsp,%0"
#elif defined(__i386__)
        "movl %%esp,%0"
#elif defined(__arm__)
        // ARM syntax is destination-first; sp is an alias for r13
        "mov %0, sp"
#else
#error "get_sp: unsupported architecture"
#endif
        : "=r" (sp)
        : /* no input */
    );
    return sp;
}
I do not have a reference for that, but GCC is known to occasionally (often) misbehave in the presence of inline assembly if compilation is not optimized at all. So you should always add the -O1 flag.
As a side-note, what you are trying to do is not very robust in the presence of an optimizing compiler, because the compiler may inline the call to current_stack_pointer() and the returned value may thus be an approximation of the current stack pointer value (not even a lower bound).

Resources