I want to assign to an array element using inline assembly in AT&T syntax. I want to achieve something like the following. Note that rsp here is the %rsp register.
long saved_sp[N];
long new_sp[N];
void some_function( unsigned int tid, ... )
{
// These two lines should be in assembly
saved_sp[tid] = rsp;
rsp = new_sp[tid];
......
}
I'm sure I don't need to warn you...
__asm__ __volatile__ (
"movq %%rsp, (%0, %2, 8)\n\t"
"movq (%1, %2, 8), %%rsp\n\t"
: : "r" (saved_sp), "r" (new_sp), "r" ((long) tid));
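A "memory" clobber should probably be added as well. The asm stores into saved_sp[] through a register operand, so without the clobber the compiler does not know that memory was written. Wherever you go after this, remember that the frame pointer "%rbp" will be invalidated.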
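Only the save half is easy to try in isolation, because loading a new rsp pulls the stack out from under the compiled code. Here is a minimal sketch that assumes x86-64 with GCC or Clang; N = 4 and the name save_sp are hypothetical. It expresses the store as a memory output operand. That tells the compiler exactly which element changes, so it does not need to rely on a "memory" clobber:

```c
#define N 4            /* hypothetical array size for this sketch */
long saved_sp[N];

/* Save the current stack pointer into saved_sp[tid]. The "=m" output
   operand names the exact array element the asm writes, so no blanket
   "memory" clobber is needed for this store. (Only the save half is
   shown; loading a new rsp is not something compiled code survives.) */
static void save_sp(unsigned int tid)
{
    __asm__ volatile ("movq %%rsp, %0" : "=m" (saved_sp[tid]));
}
```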
I'm trying to save a variable in an ARM register using inline assembly:
unsigned int lma_offset = 0x1234; // typically calculated, hardcoded for example
__asm volatile ("MOV R10, %[input]"
: [input] "=r" (lma_offset)
);
This changes lma_offset to 0xd100 in my case instead of setting the register. What am I doing wrong?
PS: when I declare lma_offset as const, I get a compiler error because lma_offset is used as an output. So obviously something is wrong, but I still can't find the correct syntax for this.
For future reference, according to Eric's comment:
const unsigned int lma_offset = 0x10000;
__asm__ volatile ("MOV R10, %[input]"
: // no C variable outputs
: [input] "r" (lma_offset)
: "r10" // tell the compiler r10 is modified
);
Putting the operand after the second colon (leaving the output list empty) and replacing "=r" with "r" indeed solves the problem.
It would also be possible to ask the compiler to have that constant already in R10 for an asm statement, by using a register local variable to force the "r" input to pick r10. (Then we can omit the redundant mov r10, r10).
register unsigned r10 __asm__("r10") = lma_offset; // picks r10 for "r" constraints, no other *guaranteed* effects
__asm__ volatile ("# inline asm went here" // empty template, actually just a comment you can see if looking at the compiler's asm output
: // no C variable outputs
: [input] "r" (r10) // the register variable, so this operand lands in r10
: // no clobbers needed
);
To copy a register into a C output variable instead, it would look like this:
unsigned int lma_offset = 0x0;
__asm__ volatile ("MOV %[output], R11"
: [output] "=r" (lma_offset)
// no clobbers needed; reading a register doesn't step on the compiler's toes
);
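The ARM code above can't run on a typical x86-64 PC, so here is a rough x86-64 translation of the same idea. It is an assumption, not the original poster's code, and roundtrip_r10 is a hypothetical name. A value going into a register is an input ("r", after the second colon) plus a clobber. A value coming out of a register is an output ("=r"). Doing both in one asm statement keeps the compiler from reusing r10 in between:

```c
#include <stdint.h>

/* Write a C value into r10 and read it back, x86-64 only.
   Because r10 is listed as a clobber, the compiler will not place
   [in] or [out] in r10 itself. */
static uint64_t roundtrip_r10(uint64_t value)
{
    uint64_t out;
    __asm__ volatile (
        "movq %[in], %%r10\n\t"   /* value is an INPUT: "r", not "=r" */
        "movq %%r10, %[out]"      /* result is an OUTPUT: "=r" */
        : [out] "=r" (out)
        : [in] "r" (value)
        : "r10");                 /* r10 is modified, so declare it */
    return out;
}
```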
I recently dabbled in low-level programming and want to make a function somesyscall that accepts (CType rax, CType rbx, CType rcx, CType rdx). struct CType looks like:
/*
TYPES:
0 int
1 string
2 bool
*/
typedef struct {
void* val;
int typev;
} CType;
The function is a bit messy, but in theory it should work:
#include <errno.h>
#include <stdbool.h>
#include "ctypes.h"
//define functions to set registers
#define seteax(val) asm("mov %0, %%rax" :: "g" (val) : "%rax")
#define setebx(val) asm("mov %0, %%rbx" :: "g" (val) : "%rbx")
#define setecx(val) asm("mov %0, %%rcx" :: "g" (val) : "%rcx")
#define setedx(val) asm("mov %0, %%rdx" :: "g" (val) : "%rdx")
///////////////////////////////////
#define setregister(value, register) \
switch (value.typev) { \
case 0: { \
register(*((double*)value.val)); \
break; \
} \
case 1: { \
register(*((char**)value.val)); \
break; \
} \
case 2: { \
register(*((bool*)value.val)); \
break; \
} \
}
static inline long int somesyscall(CType a0, CType a1, CType a2, CType a3) {
//set the registers
setregister(a0, seteax);
setregister(a1, setebx);
setregister(a2, setecx);
setregister(a3, setedx);
///////////////////
asm("int $0x80"); //interrupt
//fetch back the rax
long int raxret;
asm("mov %%rax, %0" : "=r" (raxret));
return raxret;
}
When I run it with:
#include "syscall_unix.h"
int main() {
CType rax;
rax.val = 39;
rax.typev = 0;
CType rbx;
rbx.val = 0;
rbx.typev = 0;
CType rcx;
rcx.val = 0;
rcx.typev = 0;
CType rdx;
rdx.val = 0;
rdx.typev = 0;
printf("%ld", somesyscall(rax, rbx, rcx, rdx));
}
and compile (and run the binary) with
clang test.c
./a.out
I get a segfault. However, everything looks correct to me. Am I doing anything wrong here?
After macro expansion you will have something like
long int raxret;
asm("mov %0, %%rax" :: "g" (a0) : "%rax");
asm("mov %0, %%rbx" :: "g" (a1) : "%rbx");
asm("mov %0, %%rcx" :: "g" (a2) : "%rcx");
asm("mov %0, %%rdx" :: "g" (a3) : "%rdx");
asm("int $0x80");
asm("mov %%rax, %0" : "=r" (raxret));
This doesn't work because you haven't told the compiler that it's not allowed to reuse rax, rbx, rcx, and rdx for something else during the sequence of asm statements. For instance, the register allocator might decide to copy a2 from the stack to rax and then use rax as the input operand for the mov %0, %%rcx instruction -- clobbering the value you put in rax.
(asm statements with no outputs are implicitly volatile, so the first 5 can't reorder relative to each other, but the final one can move anywhere. For example, it could be moved after later code, to where the compiler finds it convenient to generate raxret in a register of its choice. RAX might no longer hold the system call return value at that point. You need to tell the compiler that the output comes from the asm statement that actually produces it, without assuming any registers survive between asm statements.)
There are two different ways to tell the compiler not to do that:
Put only the int instruction in an asm, and express all of the requirements for what goes in what register with constraint letters:
asm volatile ("int $0x80"
: "=a" (raxret) // outputs
: "a" (a0), "b" (a1), "c" (a2), "d" (a3) // pure inputs
: "memory", "r8", "r9", "r10", "r11" // clobbers
// 32-bit int 0x80 system calls in 64-bit code zero R8..R11;
// remove the r8..r11 clobbers for 32-bit mode.
// For native "syscall", clobber "rcx", "r11" instead
// (and use the native 64-bit syscall ABI).
);
This is possible for this simple example but not always possible in general, because there aren't constraint letters for every single register, especially not on CPUs other than x86.
Put only the int instruction in an asm, and express the requirements for what goes in what register with explicit register variables:
register long rax asm("rax") = a0;
register long rbx asm("rbx") = a1;
register long rcx asm("rcx") = a2;
register long rdx asm("rdx") = a3;
// Note that int $0x80 only looks at the low 32 bits of input regs
// so `uint32_t` would be more appropriate than long
// but really you should just use "syscall" in 64-bit code.
asm volatile ("int $0x80"
: "+r" (rax) // read-write: in=call num, out=retval
: "r" (rbx), "r" (rcx), "r" (rdx) // read-only inputs
: "memory", "r8", "r9", "r10", "r11"
);
return rax;
This will work regardless of which registers you need to use. It's also probably more compatible with the macros you're trying to use to erase types.
Incidentally, if this is 64-bit x86/Linux, you should be using syscall rather than int $0x80. The arguments then belong in the syscall ABI's argument registers, not in rbx, rcx, rdx etc. Those registers are rdi, rsi, rdx, r10, r8, r9, in that order. This matches the function-call ABI except that r10 replaces rcx, because the syscall instruction itself overwrites rcx. The system call number still goes in rax, though. (Use call numbers from #include <asm/unistd.h> or <sys/syscall.h>, which will be appropriate for the native ABI of the mode you're compiling for. That is another reason not to use int $0x80 in 64-bit mode.)
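Putting those pieces together, here is a minimal sketch that assumes x86-64 Linux with GCC or Clang; raw_syscall4 is a hypothetical name. It mixes both techniques from above. Constraint letters handle rax, rdi, rsi and rdx, and an explicit register variable handles r10, which has no constraint letter of its own:

```c
#include <sys/syscall.h>   /* SYS_* numbers for the native x86-64 ABI */
#include <unistd.h>

/* Native 64-bit system call with up to 4 arguments.
   Returns the raw kernel result (negative errno on failure). */
static long raw_syscall4(long n, long a1, long a2, long a3, long a4)
{
    /* No constraint letter exists for r10, so pin it with a
       register-asm local; it is only guaranteed to be in r10
       when used as an operand of the asm statement below. */
    register long r10 __asm__("r10") = a4;
    long ret;
    __asm__ volatile ("syscall"
        : "=a" (ret)                                  /* return value */
        : "a" (n), "D" (a1), "S" (a2), "d" (a3), "r" (r10)
        : "rcx", "r11", "memory");  /* syscall clobbers rcx and r11 */
    return ret;
}
```

rt_sigprocmask is a handy check that r10 really is being passed. The kernel rejects any fourth argument other than the size of its sigset (8 bytes on x86-64) with -EINVAL.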
Also, the asm statement for the system-call instruction should have a "memory" clobber and be declared volatile; almost all system calls access memory somehow.
(As a micro-optimization, I suppose you could have a list of system calls that don't read memory, write memory, or modify the virtual address space, and avoid the memory clobber for them. It would be a pretty short list and I'm not sure it would be worth the trouble. Or use the syntax shown in How can I indicate that the memory *pointed* to by an inline ASM argument may be used? to tell GCC which memory might be read or written, instead of a "memory" clobber, if you write wrappers for specific syscalls.
Some of the no-pointer cases include getpid where it would be a lot faster to call into the VDSO to avoid a round trip to kernel mode and back, like glibc does for the appropriate syscalls. That also applies to clock_gettime which does take pointers.)
Incidentally, beware of the actual kernel interfaces not matching up with the interfaces presented by the C library's wrappers. This is generally documented in the NOTES section of the man page, e.g. for brk(2) and getpriority(2).
Extended asm gives the following description regarding the "+" modifier:
Operands using the ‘+’ constraint modifier count as two operands (that
is, both as input and output) towards the total maximum of 30 operands
per asm statement.
So I assume that an output operand with the "+" modifier doesn't need to be listed again in the input section, but the documentation doesn't specify how to determine the index of its implicit input. I wrote the following example on Godbolt:
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
void asm_add(uint64_t o1, uint64_t o2, uint64_t o3){
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
printf("o2 = %" PRIu64 "\n", o2);
printf("o3 = %" PRIu64 "\n", o3);
}
int main(void){
asm_add(20, 30, 40);
}
Which printed
o2 = 50
o3 = 60
Is the template using +
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
exactly the same as
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1), "0" (o2), "1" (o3):
"cc"
);
where all inputs are specified explicitly? So in the first example the "implicit" inputs are appended.
By using "+r" (o2), you are saying that this parameter needs to contain o2 on entry to the asm block, and will contain an updated value on exit.
In other words, %0 describes both input and output. The fact that you can (apparently?) reference indices greater than the number of parameters is an undocumented quirk. Don't depend upon it.
You might also consider using symbolic names, which (I find) are easier to read, especially as the number of asm lines goes up. Names are particularly useful when you are first creating the asm and there's the potential for adding/removing parameters. Having to renumber everything is painful and error prone:
__asm__ volatile (
"addq %[o1], %[o2]\n\
addq %[o1], %[o3]":
[o2] "+r" (o2), [o3] "+r" (o3):
[o1] "r" (o1):
"cc"
);
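To make the comparison in the question concrete, here is a sketch assuming x86-64; the function names are hypothetical. It puts the named "+r" form next to the documented way to spell out the inputs explicitly. That documented form uses write-only "=r" outputs with matching-digit inputs "0" and "1", rather than "+r" combined with matching inputs. The first add writes a before o1 is read again by the second add. So a is also marked early-clobber (&), which stops the compiler from placing it in o1's register:

```c
#include <stdint.h>

/* Named "+r" operands: each is read and written, so no separate
   input operand for o2/o3 is needed and nothing has to be renumbered. */
static void add_plus(uint64_t o1, uint64_t *o2, uint64_t *o3)
{
    uint64_t a = *o2, b = *o3;
    __asm__ ("addq %[o1], %[a]\n\t"
             "addq %[o1], %[b]"
             : [a] "+&r" (a), [b] "+r" (b)   /* & : a is written before o1's last read */
             : [o1] "r" (o1)
             : "cc");
    *o2 = a;
    *o3 = b;
}

/* Explicit equivalent: "=r" outputs, with inputs "0" and "1" tied to
   them so they start out holding the old values. Only %0..%2 are
   referenced in the template; %3 and %4 share registers with %0, %1. */
static void add_matching(uint64_t o1, uint64_t *o2, uint64_t *o3)
{
    uint64_t a, b;
    __asm__ ("addq %2, %0\n\t"
             "addq %2, %1"
             : "=&r" (a), "=r" (b)
             : "r" (o1), "0" (*o2), "1" (*o3)
             : "cc");
    *o2 = a;
    *o3 = b;
}
```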
Lastly, consider not using inline asm for anything beyond educational purposes. And even then, inline asm is the hardest possible way to learn asm.
I'm having trouble solving a school exercise: I'm supposed to change a char array in C using inline assembly. In this case, I need to change "ahoy" to "aXoy", but I'm getting a segmentation fault. This is my code:
#include <stdio.h>
int main() {
char string[] = "ahoy";
__asm__ volatile (
"mov %0, %%eax;"
"movb $'X', 1(%%eax);"
: "=m"(string) : "0"(string) : "memory", "eax");
printf("%s\n", string);
return 0;
}
With "mov %0, %%eax;" I'm trying to store the address of the array in a register.
Then with "movb $'X', 1(%%eax);" I want to store the byte 'X' at the location pointed to by %%eax, offset by 1 byte (one char).
I have string as both an output and an input, and "memory" and "eax" in the clobber list since I'm modifying both. What is wrong with my code?
Use gcc -S instead of -c to look at the compiler's assembly output and you should see what's wrong. The "m" constraint produces a memory reference expression for accessing the object associated with it, not a register containing its address. So it will expand to something like (%ecx), not %ecx, and the first mov will load 4 bytes from string, rather than the address of string, into eax.
One way to fix this would be to use a register constraint "r"(string) instead.
Alternatively you could replace the mov with lea: lea %0, %%eax.
There are other issues with your code too, like the unnecessary temporary/clobbered register eax, but they shouldn't keep it from working.
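Both fixes might look like this on x86-64 (a sketch; the function names are hypothetical). In 64-bit code the address must stay in a full 64-bit register, so the pointer is used directly instead of being copied into eax. The 'X' is passed as an "i" immediate operand rather than as a character literal in the template:

```c
/* Variant 1: pass the array's address in a register ("r") and add a
   "memory" clobber, because the asm writes through that pointer. */
static void set_x_via_reg(char *s)
{
    __asm__ volatile ("movb %1, 1(%0)"
                      :
                      : "r" (s), "i" ('X')
                      : "memory");
}

/* Variant 2: let the compiler form the address itself. "=m" names the
   exact byte being written, so no scratch register or "memory" clobber. */
static void set_x_via_mem(char *s)
{
    __asm__ ("movb %1, %0" : "=m" (s[1]) : "i" ('X'));
}
```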
According to GCC's Extended ASM and Assembler Template, to keep instructions consecutive, they must be in the same ASM block. I'm having trouble understanding what provides the scheduling or timings of reads and writes to the operands in a block with multiple statements.
As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (it's a requirement). So three instructions need to be encoded into a single ASM block to ensure the instructions stay consecutive (re: the assembler template discussed in the first paragraph):
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not the CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.
Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX; (2) after the CPUID; but (3) before the POP %EBX?
In your question you present some code that pushes and pops ebx. The idea of saving ebx in case you compile with gcc using -fPIC (position-independent code) is correct. In that situation it is up to our function not to clobber ebx upon return. Unfortunately, the way you have defined the constraints, you explicitly use ebx. Generally the compiler will complain (error: inconsistent operand constraints in an 'asm') if you are using PIC code and you specify =b as an output constraint. It is unusual that it doesn't produce that error for you.
To get around this problem you can let the assembler template choose a register for you. Instead of pushing and popping, we simply exchange %ebx with an unused register chosen by the compiler and restore it by exchanging it back afterwards. We don't want the compiler to pick one of our input registers for the exchange. So we specify the early clobber modifier, ending up with a constraint of =&r (instead of =b in the OP's code). More on modifiers can be found in the GCC documentation on constraint modifier characters. Your code (for 32 bit) would look something like:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t"
"cpuid\n\t"
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
If you intend to compile for X86_64 (64 bit) you'll need to save the entire contents of %rbx. The code above will not quite work. You'd have to use something like:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t"
"cpuid\n\t"
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
You could code this up using conditional compilation to deal with both X86_64 and i386:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
#if defined(__i386__)
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t"
"cpuid\n\t"
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#elif defined(__x86_64__)
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t"
"cpuid\n\t"
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
GCC has a __cpuid macro defined in cpuid.h. It defines the macro so that it only saves the ebx or rbx register when required. The GCC 4.8.1 version of that macro definition in cpuid.h shows how they handle cpuid.
The astute reader may ask what stops the compiler from choosing ebx or rbx as the scratch register for the exchange. The compiler knows about ebx and rbx in the context of PIC and will not allow them to be used as a scratch register. This is based on my personal observations over the years and on reviewing the assembler (.s) files generated from C code. I can't say for certain how older versions of gcc handled it, so it could be a problem there.
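For reference, here is a self-contained x86-64 version of the exchange trick, as a sketch assuming GCC or Clang; cpuid_xchg is a hypothetical name. Even if the compiler did pick rbx as the scratch register, the result would still be correct. In that case both xchg instructions become no-ops, and the compiler already treats that register as an output:

```c
#include <stdint.h>

/* Run CPUID for (leaf, subleaf) without losing the caller's rbx:
   swap rbx with a compiler-chosen scratch register, run cpuid, swap back.
   "=&r" keeps the scratch register out of eax/ecx, which are still
   needed as inputs when the first xchg writes the scratch register. */
static void cpuid_xchg(uint32_t leaf, uint32_t subleaf,
                       uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    uint64_t bx;   /* ends up holding CPUID's ebx (zero-extended) */
    __asm__ volatile (
        "xchgq %%rbx, %q1\n\t"
        "cpuid\n\t"
        "xchgq %%rbx, %q1"
        : "=a" (*a), "=&r" (bx), "=c" (*c), "=d" (*d)
        : "a" (leaf), "c" (subleaf));
    *b = (uint32_t) bx;
}
```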
I think you understand, but to be clear, the "consecutive" rule means that this:
asm ("a");
asm ("b");
asm ("c");
... might get other instructions interposed, so if that's not desirable then it must be rewritten like this:
asm ("a\n"
"b\n"
"c");
... and now it will be inserted as a whole.
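A typical case where splitting would silently break things is a carry chain. The adcq below consumes the carry flag set by the addq, and the compiler is free to put flag-clobbering instructions between two separate asm statements. Here is a sketch of a 128-bit add, assuming x86-64; add128 is a hypothetical name:

```c
#include <stdint.h>

/* (*hi:*lo) += (bhi:blo). Both instructions must be in ONE asm statement
   because adcq depends on the carry flag produced by addq. [lo] is
   early-clobber (&) since it is written before [bhi] is read. */
static void add128(uint64_t *lo, uint64_t *hi, uint64_t blo, uint64_t bhi)
{
    __asm__ ("addq %[blo], %[lo]\n\t"
             "adcq %[bhi], %[hi]"
             : [lo] "+&r" (*lo), [hi] "+r" (*hi)
             : [blo] "r" (blo), [bhi] "r" (bhi)
             : "cc");
}
```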
As for the cpuid snippet, we have two problems:
The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.
One possible solution would be this:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
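"push %%ebx;"
"cpuid;"
"mov %%ebx, %%ecx;"
"pop %%ebx"
: "=c"(__EBX), "=a"(__EAX), "=d"(__EDX) // eax/edx are overwritten too; a register used as an input can't also be listed as a clobber
: "a"(__FUNC), "c"(__SUBFUNC)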
);
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
There's no need to mark ebx as clobbered as you're putting it back how you found it.
(I don't do much Intel programming, so I may have some of the assembler-specific details off there, but this is how asm works.)