Following this manual, I wanted to create the simplest possible inline AVR assembly snippet: copy the values of two variables into two other variables.
uint8_t a, b, c, d;
a = 42;
b = 11;
asm(
"mov %0, %2\n\t"
"mov %1, %3\n\t"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b)
);
I would expect it to be equivalent to:
uint8_t a, b, c, d;
a = 42;
b = 11;
c = a;
d = b;
However, after running it, both c and d are equal to 42. If I change the asm snippet to:
asm(
"mov %0, %3\n\t"
"mov %1, %2\n\t"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b)
);
then c is equal to 11 and d is equal to 42, as expected. Similarly, changing both source operands to %2 yields 42 twice, and setting both of them to %3 yields 11 twice.
Why does the first version not work as intended?
I would expect it to be equivalent to:
uint8_t a, b, c, d;
a = 42;
b = 11;
c = a;
d = b;
No, it's not [1]. The reason is that in the C code, one assignment follows the other, whereas for inline asm the compiler treats the "code" as if it all happens at once. The compiler does not analyze the asm string template in any way; it is just a string on which it performs replacements of the %-operands. In
asm ("mov %0, %3" "\n\t"
"mov %1, %2"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b));
the lifetime of a and b ends at the asm, and the lifetime of c and d begins there. Therefore, it's totally fine for the compiler to use the same register for, say, c and a. In that case the output of the first mov overwrites the input of the second mov before it has been read. This is the classic early-clobber situation, and you have to tell the compiler about it by means of the early-clobber modifier &:
asm ("mov %0, %3" "\n\t"
"mov %1, %2"
: "=&r" (c), "=r" (d)
: "r" (a), "r" (b));
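The same hazard can be reproduced off-target. The following is only a sketch, assuming an x86 host and GCC or Clang (AT&T syntax) rather than AVR; the helper name swap_copy and the plain-C fallback for other targets are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: stores b in *c and a in *d, mirroring the two-mov
   template above, but using x86 movb instead of the AVR mov. */
static inline void swap_copy(uint8_t a, uint8_t b, uint8_t *c, uint8_t *d)
{
    uint8_t out_c, out_d;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movb %3, %0\n\t"   /* out_c = b: %0 is written while %2 is still needed */
             "movb %2, %1"       /* out_d = a */
             : "=&q" (out_c),    /* & keeps %0 out of every input register */
               "=q" (out_d)      /* written last, so no & is needed here */
             : "q" (a), "q" (b));
#else
    /* Assumption: on other targets fall back to plain C. */
    out_c = b;
    out_d = a;
#endif
    *c = out_c;
    *d = out_d;
}
```

Calling swap_copy(42, 11, &c, &d) should leave c holding 11 and d holding 42. Without the & on the first output, the compiler would be allowed to place out_c in the register that holds a.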
However, the code generated for this early-clobber version is sub-optimal, because it's actually fine if the compiler uses the same register for c and b, and the same register for d and a. This means you don't need any explicit asm code at all, and everything can be described by means of the constraints:
asm (""
: "=r" (c), "=r" (d)
: "1" (a), "0" (b));
[1] Apart from that, your asm code tries to implement c = b and d = a, not c = a and d = b.
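Because the empty-template variant contains no instructions, it is not tied to AVR. A minimal sketch, assuming GCC or Clang, with a hypothetical wrapper name cross_assign, shows the matching constraints doing all the work:

```c
#include <stdint.h>

/* Hypothetical wrapper around the empty asm: the digit constraints tie each
   input to the register of the numbered output, so no mov is emitted by us. */
static inline void cross_assign(uint8_t a, uint8_t b, uint8_t *c, uint8_t *d)
{
    uint8_t out_c, out_d;
    __asm__ (""
             : "=r" (out_c), "=r" (out_d)
             : "1" (a),   /* a must live where operand 1 (out_d) lives */
               "0" (b));  /* b must live where operand 0 (out_c) lives */
    *c = out_c;           /* ends up equal to b */
    *d = out_d;           /* ends up equal to a */
}
```

Swapping the two digits to "0" (a), "1" (b) would presumably give the c = a, d = b behaviour that the question originally asked for.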
Extended asm gives the following description regarding the "+" modifier:
Operands using the ‘+’ constraint modifier count as two operands (that
is, both as input and output) towards the total maximum of 30 operands
per asm statement.
So I assume that it is not necessary to list an output operand with the "+" modifier in the input section again, but the documentation does not specify how to determine the index of the implied input. I wrote the following example (Godbolt):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
void asm_add(uint64_t o1, uint64_t o2, uint64_t o3){
    __asm__ volatile (
        "addq %2, %3\n\
        addq %2, %4":
        "+r" (o2), "+r" (o3):
        "r" (o1):
        "cc"
    );
    printf("o2 = %" PRIu64 "\n", o2);
    printf("o3 = %" PRIu64 "\n", o3);
}
int main(void){
    asm_add(20, 30, 40);
}
Which printed
o2 = 50
o3 = 60
Is the template using +
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
exactly the same as
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1), "0" (o2), "1" (o3):
"cc"
);
where all inputs are specified explicitly? In other words, are the "implicit" inputs of the first example simply appended after the explicit operands?
By using "+r" (o2), you are saying that this parameter needs to contain o2 on entry to the asm block, and will contain an updated value on exit.
In other words, %0 describes both the input and the output (and likewise %1 for o3). The fact that you can (apparently?) reference indices greater than the number of listed operands, such as %3 and %4 here, is an undocumented quirk. Don't depend upon it.
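A sketch of the same addition that uses only the listed operand numbers might look as follows. It assumes x86-64 with GCC or Clang; the function name add_to_both and the plain-C fallback are illustrative, not part of the question.

```c
#include <stdint.h>

/* Hypothetical rewrite of asm_add that refers to the in/out operands by
   their own indices (%0, %1) instead of the implied %3 and %4. */
static inline void add_to_both(uint64_t o1, uint64_t *o2, uint64_t *o3)
{
    uint64_t x = *o2, y = *o3;
#if defined(__x86_64__)
    __asm__ ("addq %2, %0\n\t"   /* x += o1 */
             "addq %2, %1"       /* y += o1 */
             : "+&r" (x),        /* & because %2 is read again after %0 is written */
               "+r" (y)
             : "r" (o1)
             : "cc");
#else
    /* Assumption: on other targets fall back to plain C. */
    x += o1;
    y += o1;
#endif
    *o2 = x;
    *o3 = y;
}
```

With o2 = 30 and o3 = 40, add_to_both(20, &o2, &o3) reproduces the 50 and 60 printed in the question.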
You might also consider using symbolic names, which (I find) are easier to read, especially as the number of asm lines goes up. Names are particularly useful when you are first creating the asm and there's the potential for adding/removing parameters. Having to renumber everything is painful and error prone:
__asm__ volatile (
"addq %[o1], %[o2]\n\
addq %[o1], %[o3]":
[o2] "+r" (o2), [o3] "+r" (o3):
[o1] "r" (o1):
"cc"
);
Lastly, consider not using inline asm for anything beyond educational purposes. And even then, inline asm is the hardest possible way to learn asm.
I want to write something like this:
#include <stdint.h>
inline uint64_t with_rsp(uint64_t x, uint64_t y) {
    uint64_t z, w;
    uint64_t rsp;
    asm ("mov %%rsp, %[rsp]\t\n"
         "mov $0x13, %%rsp\t\n"
         "mov %[x], %%rdx\t\n"
         "mulx %[y], %[z], %[w]\t\n"
         "mov %[rsp], %%rsp\t\n"
         : [z] "=&r" (z), [w] "=&r" (w)
         : [x] "r" (x), [y] "r" (y), [rsp] "m" (rsp)
         : "rdx"
    );
    return z + w;
}
inline uint64_t with_rbp(uint64_t x, uint64_t y) {
    uint64_t z, w;
    uint64_t rbp;
    asm ("mov %%rbp, %[rbp]\t\n"
         "mov $0x13, %%rbp\t\n"
         "mov %[x], %%rdx\t\n"
         "mulx %[y], %[z], %[w]\t\n"
         "mov %[rbp], %%rbp\t\n"
         : [z] "=&r" (z), [w] "=&r" (w)
         : [x] "r" (x), [y] "r" (y), [rbp] "m" (rbp)
         : "rdx"
    );
    return z + w;
}
int main() {
    uint64_t x = 15, y = 3, zw;
    if (inline_asm_uses_rbp()) {
        zw = with_rsp(x, y);
    } else {
        zw = with_rbp(x, y);
    }
    return zw;
}
Ideally, the if statement should compile away at compile-time (but I don't think I can do this with preprocessor macros, because those get evaluated before the code is assembled). So I'm fine with needing some sort of jump to get it to work, though I'd prefer to not need that.
The reason I need this is that I have some inline assembly that needs to be able to use 15 registers, plus some memory locations on the stack, and gcc is choosing rsp-based offsets in some locations where the function is inlined, and it's choosing rbp-based offsets in other locations. (A separate assembly module isn't a good match for this because I'd like to avoid the overhead of a function call.)
I'm not exactly sure how to title this, but I want to emulate this code:
asm("movl %%fs:0x30, %0" : "=r" (peb) : : );
but I want to specify the offset through a C variable. I tried:
int mee = 48;
asm("movl %%fs:%1, %0"
: "=r" (peb)
: "r" (mee)
:
);
The error is: bad memory operand '%eax'
For what you have written, the compiler expands the first operand of the movl to %fs:%eax, because it chooses %eax as the register holding the value of mee. In AT&T syntax, what follows the segment prefix must be a memory reference, that is, a constant displacement such as 0x30 and/or a register in parentheses; a bare register is not one. This is the reason for the error "bad memory operand". It worked in the %fs:0x30 case because 0x30 is a constant displacement.
To use the value in the register as the offset, put the register in parentheses so that it is dereferenced, i.e. %%fs:(%1):
int mee = 48;
asm("movl %%fs:(%1), %0" : "=r" (peb) : "r" (mee) :);
See also this guide, which contains some possibly useful examples of memory access (and more) in inline assembly.
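The parenthesised form can be tried in isolation. The sketch below is an assumption-laden illustration: it drops the %fs: segment override so that it reads ordinary memory and can run anywhere on x86 with GCC or Clang, and the helper name load_u32_at and the memcpy fallback are made up.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loads a 32-bit value from base + offset using the
   register-indirect form (%reg), i.e. the register holds the address. */
static inline uint32_t load_u32_at(const void *base, int offset)
{
    const unsigned char *p = (const unsigned char *)base + offset;
    uint32_t v;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl (%1), %0"   /* (%1): use the register's value as an address */
             : "=r" (v)
             : "r" (p)
             : "memory");      /* the asm reads memory the compiler cannot see */
#else
    /* Assumption: on other targets fall back to plain C. */
    memcpy(&v, p, sizeof v);
#endif
    return v;
}
```

For an array uint32_t t[4] = {10, 20, 30, 40}, load_u32_at(t, 8) reads the third element. Writing %1 without the parentheses would again be rejected, since a bare register is not a memory operand.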
I want to translate this function:
iowrite32(mem1, value1);
into assembly code.
mem1 is defined as:
int * mem1;
in order to use ioremap.
I've written this code:
asm volatile(
"mov %[whr],%[wht]"
: [whr] "=r" (mem1)
: [wht] "r" (value)
);
Then I've realized I don't want to move value to mem1, but to the ADDRESS stored in mem1.
How do I write it in assembly?
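You might want to take a look at the m constraint. With "=m" (*mem1), the operand is the memory that mem1 points to, so the store goes to the address held in mem1: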
asm volatile(
"mov %[wht], %[whr];"
: [whr] "=m" (*mem1)
: [wht] "r" (value)
);
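As a rough, runnable illustration of the "=m" approach, here is a sketch that assumes x86 and GCC or Clang (the question does not say which architecture it targets). The name store32 and the plain-C fallback are hypothetical, and this is not the kernel's iowrite32 implementation.

```c
#include <stdint.h>

/* Hypothetical helper: writes value to the location addr points to.
   The "=m" operand is *addr, so the compiler supplies the addressing mode. */
static inline void store32(volatile uint32_t *addr, uint32_t value)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("movl %[val], %[dst]"
                      : [dst] "=m" (*addr)   /* memory pointed to, not the pointer */
                      : [val] "r" (value));
#else
    /* Assumption: on other targets fall back to plain C. */
    *addr = value;
#endif
}
```

On a load/store architecture the mnemonic would have to be a store instruction rather than mov, but the constraint part stays the same.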
I have to write inline assembly code that executes a custom instruction that I integrated into my hardware.
Depending on what hardware is present on the actual chip, the instruction behaves differently. My assembly looks as follows:
asm volatile (
" instr_generic %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);
This instr_generic could then execute either an addition or a subtraction, for example, depending on what is on the hardware.
Now, instead of instr_generic, I want to write cust_add or cust_sub, and this should then be replaced with instr_generic. In other words, it should look like this:
#define cust_add instr_generic
...
asm volatile (
" cust_add %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);
But I guess I can't use the preprocessor in this context to replace text inside inline assembly, is that right? Is there another way to do that easily?
...
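/* assumes the #define cust_add instr_generic from the question is in effect */
#define CUST_STR(x) #x /* stringify; arg has already been macro-expanded here */
#define cust(arg) \
asm volatile ( \
" " CUST_STR(arg) " %1, %2, %0 \n\t" \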
: "=r" (c) \
: "r" (a), "r" (b) \
: "%g0" \
)
...
cust(cust_add);
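One detail worth checking with such a macro is that the # operator stringifies its argument as spelled, without expanding it first. A small sketch in plain C, with the hypothetical helper names STR_RAW, STR and mnemonic_is_expanded, shows the difference:

```c
#include <string.h>

#define cust_add instr_generic   /* alias taken from the question */

#define STR_RAW(x) #x            /* stringifies the spelling of x as written */
#define STR(x) STR_RAW(x)        /* x is macro-expanded before STR_RAW sees it */

static const char raw_name[] = STR_RAW(cust_add);   /* "cust_add" */
static const char expanded_name[] = STR(cust_add);  /* "instr_generic" */

/* Returns 1 when the two-level macro produced the real mnemonic. */
static int mnemonic_is_expanded(void)
{
    return strcmp(expanded_name, "instr_generic") == 0;
}
```

So a macro that applies # directly to its parameter would hand the assembler the alias name, while routing the parameter through one extra macro level yields instr_generic.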
I would either do an if-then-else over the different solutions based on runtime detection of the processor or, to squeeze out a little more speed, use a function pointer to the functions containing the different solutions: if a is detected then funptr = a_solution, else if b is detected then funptr = b_solution, and so on. Do that once, then use funptr for the duration of the program.
As already mentioned, the custom instruction has to be selected at compile time, not at runtime. If you want to change the instruction at runtime, a third option is self-modifying code that inserts the proper instruction at runtime.
Can't you just use string concatenation? Or is there some reason you can't do it that way?
#define cust_add "instr_generic"
...
asm volatile (
cust_add " %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);
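To try the concatenation idea without the custom hardware, one can substitute a real mnemonic. The sketch below assumes x86 with GCC or Clang and uses addl purely as a stand-in for instr_generic; CUST_ADD, cust_add_u32 and the plain-C fallback are hypothetical.

```c
#include <stdint.h>

/* Stand-in: on the real hardware this would be "instr_generic". */
#define CUST_ADD "addl"

/* Hypothetical wrapper: adjacent string literals are concatenated by the
   compiler, so the template below becomes "addl %1, %0". */
static inline uint32_t cust_add_u32(uint32_t a, uint32_t b)
{
    uint32_t c = a;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (CUST_ADD " %1, %0"
             : "+r" (c)
             : "r" (b)
             : "cc");
#else
    /* Assumption: on other targets fall back to plain C. */
    c += b;
#endif
    return c;
}
```

The macro has to expand to a string literal for this to work; an unquoted identifier would not be concatenated with the rest of the template.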