The GCC Extended Asm documentation gives the following description of the "+" modifier:
Operands using the ‘+’ constraint modifier count as two operands (that
is, both as input and output) towards the total maximum of 30 operands
per asm statement.
So I assume it is not necessary to list an output operand that uses the "+" modifier again in the input section, but the documentation does not specify how the index of the implicit input is determined. I wrote the following example (on Godbolt):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
void asm_add(uint64_t o1, uint64_t o2, uint64_t o3){
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
printf("o2 = %" PRIu64 "\n", o2);
printf("o3 = %" PRIu64 "\n", o3);
}
int main(void){
asm_add(20, 30, 40);
}
Which printed
o2 = 50
o3 = 60
Is the template using +
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
exactly the same as
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"=r" (o2), "=r" (o3):
"r" (o1), "0" (o2), "1" (o3):
"cc"
);
where all inputs are specified explicitly? That is, are the "implicit" inputs in the first example simply appended after the explicit ones?
By using "+r" (o2), you are saying that this parameter needs to contain o2 on entry to the asm block, and will contain an updated value on exit.
In other words, %0 describes both the input and the output of o2, and %1 does the same for o3; there is no separate input index to look up. The fact that you can (apparently?) reference indices greater than the number of declared operands is an undocumented quirk. Don't depend on it.
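A minimal sketch of the documented numbering, assuming an x86-64 target and GCC or Clang. The function name add_to_both and the plain-C fallback are illustrative and not part of the question. Operands are numbered in declaration order, outputs first, so %0 is the first in/out operand, %1 the second, and %2 the input:

```c
#include <stdint.h>

/* Adds o1 to both *o2 and *o3.
   Operand numbering follows declaration order: %0 = a, %1 = b, %2 = o1. */
static void add_to_both(uint64_t o1, uint64_t *o2, uint64_t *o3)
{
#if defined(__x86_64__)
    uint64_t a = *o2, b = *o3;
    __asm__ volatile (
        "addq %2, %0\n\t"        /* a += o1 */
        "addq %2, %1"            /* b += o1 */
        : "+&r" (a), "+r" (b)    /* a is written before o1 is last read, hence & */
        : "r" (o1)
        : "cc");
    *o2 = a;
    *o3 = b;
#else
    /* Assumption: plain-C fallback so the sketch still builds elsewhere. */
    *o2 += o1;
    *o3 += o1;
#endif
}
```

Called with 20, 30 and 40 as in the question, this leaves 50 and 60 in the two in/out variables. The & on the first operand is a precaution added here: it stops the compiler from placing o1 in the same register as a when both happen to hold the same value.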
You might also consider using symbolic names, which (I find) are easier to read, especially as the number of asm lines goes up. Names are particularly useful when you are first writing the asm and may still add or remove operands. Having to renumber everything is painful and error-prone:
__asm__ volatile (
"addq %[o1], %[o2]\n\
addq %[o1], %[o3]":
[o2] "+r" (o2), [o3] "+r" (o3):
[o1] "r" (o1):
"cc"
);
Lastly, consider not using inline asm for anything beyond educational purposes. And even then, inline asm is the hardest possible way to learn asm.
Related
As a (beginners) exercise I want to implement a swap operation in C using GNU asm.
I'm confused about the constraints. My code is:
int c = 1;
int b = 2;
asm ("xchg %2,%3"
: "=r" (c), "=r" (b)
: "<X>" (c), "<Y>" (b)
);
return c+10*b;
Where <X> and <Y> are replaced by ("0", "0"), ("0", "r"), ("r", "r") and ("r", "0").
("0", "0") does not compile. (Why?)
("0", "r") returns 12, the expected result.
("r", "r") and ("r", "0") return 21, as if nothing happened. (Why?)
So what is wrong in those cases? Why does it fail?
You can use : "+r" (c), "+r" (b) to declare read/write operands (see footnote 1 and https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html). In/out operands go in the first (output) group of constraints.
And of course you should use xchg %0, %1 so that the asm template only references operands you actually declared.
asm ("xchg %0,%1"
: "+r" (c), "+r" (b)
);
Footnote 1: Fun fact: GCC internally implements read/write operands by inventing matching input constraints for these outputs. So if you didn't fix the %2,%3 in the template, it would still compile. But you wouldn't have any guarantee about the order in which GCC creates the matching constraints.
To find out what happened with your wrong constraints, make your asm template print the unused operand names as an asm comment, e.g. xchg %2,%3 # inputs= %0,%1 so you can look at the compiler's asm output (gcc -S or on https://godbolt.org), and see which registers it picked. If it picked opposite constraints you'd see no swap.
Or with "r", if it picked 2 registers that are different from the output operands, then you're stepping on the compiler's toes by modifying registers it expected to remain unchanged (because you told it those were inputs, which could be in separate registers from the outputs). So no, "r" can't be safe.
If you did want to use matching constraints manually, you'd of course use "0"(c), "1"(b) to force it to pick the same register for c as an input that it picked for c as the output #0, and same for b as input as output #1.
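As a hedged sketch of the manual matching-constraint form just described, assuming an x86 target and GCC or Clang (the wrapper name swap_xchg and the plain-C fallback are illustrative):

```c
/* Swaps *c and *b with xchg, using explicit matching constraints. */
static void swap_xchg(int *c, int *b)
{
#if defined(__x86_64__) || defined(__i386__)
    int x = *c, y = *b;
    __asm__ ("xchg %0, %1"
             : "=r" (x), "=r" (y)
             : "0" (x), "1" (y));   /* input x shares a register with %0, y with %1 */
    *c = x;
    *b = y;
#else
    /* Assumption: plain-C fallback so the sketch still builds elsewhere. */
    int tmp = *c;
    *c = *b;
    *b = tmp;
#endif
}
```

Starting from c = 1 and b = 2, the expression c + 10*b evaluates to 12 afterwards, the result the question expected.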
Of course there's no need for a runtime instruction at all to just tell the compiler that C variables have each other's values.
asm ("nop # c input in %2, output in %0. b input in %3, output in %1"
: "=r" (c), "=r" (b)
: "1"(c), "0" (b) // opposite matching constraints
);
Even the nop is unnecessary, but https://godbolt.org/ by default filters comments so it's convenient to put a comment on a NOP.
Not exactly sure how to title this, but
I want to emulate this code:
asm("movl %%fs:0x30, %0" : "=r" (peb) : : );
but I want to specify the offset through a C variable.
I tried:
int mee = 48;
asm("movl %%fs:%1, %0"
: "=r" (peb)
: "r" (mee)
:
);
The error is: bad memory operand '%eax'
For what you have written, the compiler translates the source operand to %fs:%eax because it chooses %eax as the register holding the value of mee. The addressing mode I think you're trying to use is segment:offset, and written this way the offset must be a constant rather than a bare register. This is the reason for the error "bad memory operand". It worked in the %fs:0x30 case because 0x30 is a constant displacement.
To use the register %eax as the offset, dereference it with %%fs:(%1), which addresses the memory the register points to:
int mee = 48;
asm("movl %%fs:(%1), %0" : "=r" (peb) : "r" (mee) :);
See also this guide, which contains some possibly useful examples of memory access (and more) in inline assembly.
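A 64-bit variant of the same idea, offered only as a sketch: it assumes x86-64 Linux, where %fs is the thread-pointer segment, and uses movq where the answer uses movl. The function name read_fs_word and its success flag are hypothetical.

```c
#include <stdint.h>

/* Reads one 64-bit word at %fs:offset.
   Returns 1 on success, 0 if the target is not covered by this sketch. */
static int read_fs_word(uintptr_t offset, uint64_t *out)
{
#if defined(__x86_64__) && defined(__linux__)
    uint64_t value;
    __asm__ volatile ("movq %%fs:(%1), %0"
                      : "=r" (value)
                      : "r" (offset));   /* offset is in a register, so it needs (...) */
    *out = value;
    return 1;
#else
    /* Assumption: other targets are simply reported as unsupported. */
    (void)offset;
    (void)out;
    return 0;
#endif
}
```

On x86-64 Linux the word at %fs:0 is the thread pointer itself, so read_fs_word(0, &v) should yield a non-zero value there. Offsets such as 0x30 from the question are specific to 32-bit Windows and are not meaningful here.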
Following this manual, I wanted to create the simplest possible inline AVR assembly snippet: copy the values of two variables into two other variables.
uint8_t a, b, c, d;
a = 42;
b = 11;
asm(
"mov %0, %2\n\t"
"mov %1, %3\n\t"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b)
);
I would expect it to be equivalent to:
uint8_t a, b, c, d;
a = 42;
b = 11;
c = a;
d = b;
However, after running it, both c and d are equal to 42. If I change the asm snippet to:
asm(
"mov %0, %3\n\t"
"mov %1, %2\n\t"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b)
);
c is equal to 11 and d is equal to 42 as expected. Similarly, changing both source operands to %2 yields two 42 and setting both of them to %3 yields two 11.
Why does the first version not work as intended?
I would expect it to be equivalent to:
uint8_t a, b, c, d;
a = 42;
b = 11;
c = a;
d = b;
No, it's not (see footnote 1). The reason is that in the C code, one assignment follows the other, whereas with inline asm, the compiler treats the "code" as if it happens all at once. The compiler does not analyze the code in the asm string template in any way; it's just a string on which it performs replacements of %-operands. In
asm ("mov %0, %3" "\n\t"
"mov %1, %2"
: "=r" (c), "=r" (d)
: "r" (a), "r" (b));
the lifetime of a and b ends at the asm, and the lifetime of c and d begins. Therefore, it's totally fine for the compiler to use the same register for, say, c and a. This means the output of the 1st move overwrites the input of the 2nd move. This is the classic early-clobber situation, and you have to tell the compiler about it by means of the early-clobber modifier &:
asm ("mov %0, %3" "\n\t"
"mov %1, %2"
: "=&r" (c), "=r" (d)
: "r" (a), "r" (b));
However, the code that's generated is sub-optimal because it's actually fine if the compiler uses the same register for c and b, and the same register for d and a. This means you don't need any explicit asm code at all, and everything can be described by means of the constraints:
asm (""
: "=r" (c), "=r" (d)
: "1" (a), "0" (b));
Footnote 1: Apart from that, your asm code tries to implement c = b and d = a, not c = a and d = b.
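The AVR snippet cannot be run on a desktop machine, so here is a rough x86-64 port of the early-clobber version, assuming GCC or Clang. The function name cross_copy and the plain-C fallback are illustrative. Note that AT&T syntax on x86 puts the source first, the opposite of AVR's mov.

```c
#include <stdint.h>

/* Computes *c = b and *d = a with two moves (AT&T order: source, destination). */
static void cross_copy(uint8_t a, uint8_t b, uint8_t *c, uint8_t *d)
{
#if defined(__x86_64__)
    uint8_t x, y;
    __asm__ ("movb %3, %0\n\t"       /* x = b, written while a is still needed */
             "movb %2, %1"           /* y = a */
             : "=&r" (x), "=r" (y)   /* & keeps x out of the input registers */
             : "r" (a), "r" (b));
    *c = x;
    *d = y;
#else
    /* Assumption: plain-C fallback so the sketch still builds elsewhere. */
    *c = b;
    *d = a;
#endif
}
```

Without the & on the first output, the compiler would be free to give x the same register as a, and the second move would then copy b instead of a.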
I want to assign to an array element using inline assembly in AT&T syntax. I want to achieve something like the following. Note that rsp here is the %rsp register.
long saved_sp[N];
long new_sp[N];
void some_function( unsigned int tid, ... )
{
// These two lines should be in assembly
saved_sp[tid] = rsp;
rsp = new_sp[tid];
......
}
I'm sure I don't need to warn you...
__asm__ __volatile__ (
"movq %%rsp, (%0, %2, 8)\n\t"
"movq (%1, %2, 8), %%rsp\n\t"
: : "r" (saved_sp), "r" (new_sp), "r" ((long) tid));
A "memory" clobber should be added as well, because the asm stores to saved_sp and loads from new_sp without the compiler knowing about it. Wherever you go after this, remember that the frame pointer "%rbp" will be invalidated.
I have to write inline assembly code that executes a custom instruction that I integrated into my hardware.
Depending on what hardware is present on the actual chip, the instruction behaves differently. My assembly looks as follows:
asm volatile (
" instr_generic %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);
This instr_generic could now execute either an addition or subtraction for example, depending on what is on the hardware.
Now, instead of instr_generic, I want to write cust_add or cust_sub, which should then be replaced with instr_generic. In other words, it should look like this:
#define cust_add instr_generic
...
asm volatile (
" cust_add %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);
But I guess I can't use the preprocessor in this context to replace text inside inline assembly, is that right? Is there another way to do that easily?
...
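#define STR_(x) #x
#define STR(x) STR_(x) /* expands x (e.g. cust_add) before stringifying it */
#define cust(arg) \
asm volatile ( \
" " STR(arg) " %1, %2, %0 \n\t" \
: "=r" (c) \
: "r" (a), "r" (b) \
: "%g0" \
)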
...
cust(cust_add);
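One detail matters when a macro builds the template string: the # operator stringifies its argument without macro-expanding it first. The sketch below, using the hypothetical helper names STR_RAW, STR and template_uses, shows the difference when cust_add is defined as an alias for instr_generic as in the question:

```c
#include <string.h>

#define cust_add instr_generic     /* alias from the question */

#define STR_RAW(x) #x              /* stringifies x without expanding it */
#define STR(x)     STR_RAW(x)      /* expands x first, then stringifies */

/* Template strings as the assembler would receive them. */
static const char *const raw_template      = STR_RAW(cust_add) " %1, %2, %0";
static const char *const expanded_template = STR(cust_add) " %1, %2, %0";

/* Returns 1 if the template starts with the given mnemonic. */
static int template_uses(const char *tmpl, const char *mnemonic)
{
    return strncmp(tmpl, mnemonic, strlen(mnemonic)) == 0;
}
```

Here raw_template begins with cust_add, which the assembler would not recognize, while expanded_template begins with instr_generic. The extra level of indirection is what lets the alias take effect.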
I would either do an if-then-else over the different solutions based on runtime detection of the processor or, to squeeze out a little speed, use a pointer to functions containing the different solutions: if a is detected then funptr = a_solution, else if b is detected then funptr = b_solution, etc. Do that once, then use funptr for the duration of the program.
As already mentioned, the custom instruction is fixed at compile time, not at runtime. If you want to change the instruction at runtime, a third option is self-modifying code that inserts the proper instruction at runtime.
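The function-pointer approach could look roughly like the following sketch. All names here (hw_variant, add_solution, sub_solution, select_cust) are hypothetical, and the plain C bodies stand in for the functions that would contain the actual inline asm:

```c
/* Hypothetical hardware variants; a real program would probe the chip. */
enum hw_variant { HW_ADD, HW_SUB };

/* Stand-ins for the functions that would wrap the custom instruction. */
static int add_solution(int a, int b) { return a + b; }
static int sub_solution(int a, int b) { return a - b; }

typedef int (*cust_fn)(int, int);

/* Picks the implementation once; callers keep the returned pointer. */
static cust_fn select_cust(enum hw_variant hw)
{
    return (hw == HW_ADD) ? add_solution : sub_solution;
}
```

A program would call select_cust once at startup, store the result in funptr, and call through it from then on, so the detection cost is paid only once.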
Can't you just use string concatenation? Or is there some reason you can't do it that way?
#define cust_add "instr_generic"
...
asm volatile (
cust_add " %1, %2, %0 \n\t"
: "=r" (c)
: "r" (a), "r" (b)
: "%g0"
);