What is the use of matching constraints in inline assembly - c

From the following link,
https://www.ibm.com/developerworks/library/l-ia/index.html
a single variable may serve as both the input and the output operand.
I wrote the following code:
#include <stdio.h>
int main()
{
int num = 1;
asm volatile ("incl %0"
:"=a"(num)
:"0"(num));
printf("num:%d\n", num);
return 0;
}
The above code increments the value of num.
What is the use of matching constraints? If I don't use a matching constraint, the code does not work as expected:
asm volatile ("incl %0"
:"=a"(num));

Why and when should we use matching constraints?
That's not the question you asked; you asked why you need an input at all, which should be fairly obvious when you know what the syntax actually means. (That "=r"(var) is a pure output, independent of any previous value the C variable had, like var = 123; would be). So "=r" with an inc instruction is like var = stale_garbage + 1;
But anyway, as I commented, the interesting question is "why do matching constraints exist when you can just use "+r"(var) for a read/write operand, instead of the more complicated matching-constraint syntax?"
They're rarely useful; usually you can use the same variable for input and output, especially if you have your asm inside a C wrapper function. But if you don't want to use the same C var for input and output, but still need them to pick the same register or memory, then you want a matching constraint. Wrapping a system call is one use-case; you might want to use a different C variable for the call number vs. the return value. (Except you could just use "=a" and "a" instead of a matching constraint; the compiler doesn't have a choice.) Or an output var of a narrower or different type than the input var could be another use-case.
IIRC, x87 is another use-case; I seem to recall "+t" not working.
I think that "+r" RMW constraints are internally implemented as an output with a "hidden" matching constraint. But while %1 normally errors in an asm template that only has one operand, if that operand is an in/out "+something" then GCC doesn't reject %1 as being too high an operand number. And if you look at the asm to see which register or memory it actually chose for that out-of-bounds operand number, it does match the in/out operand.
So "+r" is basically syntactic sugar for matching constraints. I'm not sure if it was new at some point, and before GCC version x.y you had to use matching constraints? It's not rare to see tutorial examples that use matching constraints with the same var for both input and output that would be simpler to read with "+" RMW constraints.
Basics:
With constraints like "a" and "=a" you don't need a matching constraint; the compiler only has 1 choice anyway. Where it's useful is "=r" where the compiler could pick any register, and you need it to pick the same register for an input operand.
If you just used "=r" and a separate "r" input, you'd be telling the compiler that it can use this as a copy-and-whatever operation, leaving the original input unmodified and producing the output in a new register. Or overwriting the input if it wants to. That would be appropriate for lea 1(%[srcreg]), %[dstreg] but not inc %0. The latter would assume that %0 and %1 are the same register, therefore you need to do something to make sure that's true!
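The three cases described above can be sketched side by side. This is only an illustration: the helper names inc_matching, inc_rmw and inc_lea are made up here, and the plain-C fallback for non-x86 targets is an addition so the snippet builds anywhere.

```c
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_X86_ASM 1   /* hypothetical macro, used only in this sketch */
#else
#define HAVE_X86_ASM 0
#endif

/* Matching constraint: separate C variables for input and output,
 * but "0"(in) forces the input into whatever register was chosen for %0. */
static int inc_matching(int in)
{
    int out;
#if HAVE_X86_ASM
    __asm__ ("incl %0" : "=r"(out) : "0"(in) : "cc");
#else
    out = in + 1;              /* portable fallback, not inline asm */
#endif
    return out;
}

/* Read/write operand: one C variable is both input and output. */
static int inc_rmw(int x)
{
#if HAVE_X86_ASM
    __asm__ ("incl %0" : "+r"(x) : : "cc");
#else
    x += 1;
#endif
    return x;
}

/* lea reads one register and writes another, so separate "=r" and "r"
 * operands are fine; the compiler may or may not pick the same register. */
static int inc_lea(int in)
{
    int out;
#if HAVE_X86_ASM
    __asm__ ("leal 1(%1), %0" : "=r"(out) : "r"(in));
#else
    out = in + 1;
#endif
    return out;
}
```

All three should return their argument plus one; only the first needs a matching constraint, because inc modifies its single operand in place while the C code wants the result in a different variable.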

This code:
asm volatile ("incl %0"
:"=a"(num));
Doesn't work because in order to increase the value in a register (by 1 in this case) an original value needs to be read from the register; 1 added to it; and the value written back to the register. =a only says that the value in the register EAX will be moved to num when finished, but the compiler won't load the register EAX with the original value of num. The code above will just add 1 to whatever happens to be in EAX (could be anything) and put that in num when the inline assembly is finished.
asm volatile ("incl %0"
:"=a"(num)
:"0"(num));
On the other hand this says that num is both used as an input (so the value of num is moved to EAX) and that it also outputs a value in EAX so the compiler will move the value in EAX to num when the inline assembly is finished.
It could have been rewritten to use an input/output constraint as well (this does the same thing):
asm volatile ("incl %0"
:"+a"(num));
There is no need for volatile here either since all of the side effects are captured in the constraints. Adding volatile unnecessarily can lead to less efficient code generation but the code will still work. I would have written it this way:
asm ("incl %0"
:"+a"(num));

Related

How use INT %0 in inline asm with the interrupt number coming from a C variable?

I want to call the BIOS from inline asm in my C code. I tried asm("int %%al"::"a" (interrupt)); but gcc reports Error: operand size mismatch for 'int'. How can I make this code work?
The int instruction must take its vector as an immediate; it has no form that takes the number from a register. See the instruction description; note that the second form is INT imm8 and there is nothing like INT r8 or INT r/m8 that would allow a register or memory operand.
If interrupt can be evaluated as a compile-time constant then you may be able to do
asm volatile("int %0" : : "i" (interrupt));
Note that in order for the interrupt to do something useful, you probably have to load various values into registers beforehand, and retrieve the values returned. Those will need to be done as part of the same asm block, requiring more operands and constraints. You cannot put something like asm("mov $0x1a, %%ah"); in a preceding block; the compiler need not preserve register contents between blocks.
If you truly don't know the interrupt number until runtime, your options are either to assemble all 256 possible int instructions and jump to the right one, or else use self-modifying code.

Can I tell the compiler that I need to earlyclobber a memory operand?

Consider this program, which can be compiled as either 32-bit or 64-bit:
#include <stdio.h>
static int f(int x, int y) {
__asm__(
"shrl $4, %0\n\t"
"movl %1, %%edx\n\t"
"addl %%edx, %0"
: "+r"(x) // needs "+&r" to work as intended
: "r"(y)
: "edx"
);
return x;
}
int main(void) {
printf("0x%08X\n", f(0x10000000, 0x10000000));
}
At -O1 or higher, it gives the wrong answer (0x02000000 instead of 0x11000000), because x gets written before y gets read, but the constraint for x doesn't have the & to specify earlyclobber, so the compiler put them in the same register. If I change +r to +&r, then it gives the right answer again, as expected.
Now consider this program:
#include <stdio.h>
static int f(int x, int y) {
__asm__(
"shrl $4, %0\n\t"
"movl %1, %%edx\n\t"
"addl %%edx, %0"
: "+m"(x) // Is this safe without "+&m"? Compilers reject that
: "m"(y)
: "edx"
);
return x;
}
int main(void) {
printf("0x%08X\n", f(0x10000000, 0x10000000));
}
Other than using m constraints instead of r constraints, it's exactly the same. Now it happens to give the right answer even without the &. However, I understand relying on this to be a bad idea, since I'm still writing to x before I read from y without telling the compiler I'm doing so. But when I change +m to +&m, my program no longer compiles: GCC tells me error: input operand constraint contains '&', and Clang tells me invalid output constraint '+&m' in asm. Why doesn't this work?
I can think of two possibilities:
1. It's always safe to earlyclobber things in memory, so the & is rejected as redundant
2. It's never safe to earlyclobber things in memory, so the & is rejected as unsatisfiable
Is one of those the case? If the latter, what's the best workaround? Or is something else going on here?
I think "+m" and "=m" are safe without an explicit &.
From the docs, my emphasis added:
&
Means (in a particular alternative) that this operand is an
earlyclobber operand, which is written before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is read by the instruction or as part of any memory
address.
Over-interpreting this could be problematic, but given the fact that it seems safe in practice, and there are good reasons why that should be the case, I think the following interpretation of the docs (i.e. guaranteed behaviour for GCC) is reasonable:
"Memory address" is talking about the addressing mode itself, e.g. something like 16(%rdx), that GCC invents and substitutes in for %1 if you have a "m"(foo) memory operand for example. It's not talking about early-clobbering pointed-to memory, only registers that might be read as part of the addressing mode.
It means GCC needs to avoid picking the same register in any addressing mode as it picked for an early-clobber register operand. This lets you safely use "m" operands (and +m or =m) in the same statement as an "=&r" operand, just like you can use "r" operands. It's the register output operand that needs to be flagged with &, not the potential readers.
The fact that it explicitly says in a register implies that this is only a concern at all for register operands, not memory.
In the C abstract machine, every object has a memory address (except register int foo).
I think compilers will always pick that address for "m" / "+m" operands, not some invented temporary. For example, I think it's safe / supported to lea that memory operand and store the address somewhere, if it would be safe to do tmp = &foo; in C.
You can think of "earlyclobber" as "don't pick the same location as any input operand". Since different objects have different addresses, that already happens for free for memory.
Unless you specified the same object for separate input and output operands, of course. In the register case for "=&r"(foo) and "r"(foo) you would get separate registers for the input and result. But not for memory, even if you use an early-clobber "=&m"(foo) operand, which does compile even though "+&m" doesn't.
Random facts, experiments on Godbolt:
"m"(y+1) doesn't work as an input: "memory input 1 is not directly addressable". But it works for a register. Memory source operands may have to be objects that exist in the C abstract machine.
"+&m"(x) doesn't compile: error: input operand constraint contains '&'
"=&m"(x) compiles cleanly. However, a "0"(x) matching constraint for it gets a warning: warning: matching constraint does not allow a register. https://godbolt.org/z/4kKNq4.
+ operands appear to be internally implemented as separate output and input operands with a matching constraint to make sure they pick the same location. (More evidence: if you use just one "+r" operand, you can reference %1 in the asm template without a warning, and it's the same register as %0.)
It appears that "=&m"(x) and "m"(x) will always pick the same memory anyway, even without a matching constraint. (For the same reason that it's not the same memory as any other object, which is why "+&m"(x) is redundant.)
If the lifetimes of two C objects overlap, their addresses will be distinct. So I think this works just like passing pointers to locals to a non-inline function, as far as the optimizer is concerned. It can't invent aliasing between them. e.g.
int x = 1;
{
int tmp = x; // dead after this call.
foo(&x, &tmp);
}
For example, the above code can't pass the same address for both operands of foo (e.g. by optimizing away tmp). Same for an inline-asm statement with "=m"(x) and "m"(tmp) operands. No early-clobber needed.
A lot of this reasoning is extrapolated from how one would reasonably expect it to work, but that is consistent with how it appears to work in practice and with the wording in the docs. I mention this as a caution against applying the same reasoning without any support from the docs for other cases.
Re: point 2: Even if early-clobber were necessary, it would always be satisfiable for memory. Every object has its own address. It's the programmer's fault if you pass overlapping union members as memory inputs and outputs. The compiler won't create that situation if it wasn't present in the source. e.g. it won't elide a temporary variable if it would mean that a memory input overlaps a memory output. (Or at all).
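To make the two variants from the question concrete, here is a hedged sketch of both side by side. The names f_reg and f_mem are invented for this example, and the non-x86 fallback is an addition so the snippet compiles on other targets. The memory version relies on the interpretation argued above, not on an explicit guarantee in the manual.

```c
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_X86_ASM 1   /* hypothetical macro, used only in this sketch */
#else
#define HAVE_X86_ASM 0
#endif

/* Register operands: x is written before y is read, so the output
 * needs "&" to stop the compiler from sharing one register for both. */
static int f_reg(int x, int y)
{
#if HAVE_X86_ASM
    __asm__ ("shrl $4, %0\n\t"
             "movl %1, %%edx\n\t"
             "addl %%edx, %0"
             : "+&r"(x)
             : "r"(y)
             : "edx", "cc");
#else
    x = (int)((unsigned)x >> 4) + y;   /* portable fallback */
#endif
    return x;
}

/* Memory operands: x and y are distinct objects with distinct addresses,
 * so no "&" is written (and "+&m" is rejected by the compilers anyway). */
static int f_mem(int x, int y)
{
#if HAVE_X86_ASM
    __asm__ ("shrl $4, %0\n\t"
             "movl %1, %%edx\n\t"
             "addl %%edx, %0"
             : "+m"(x)
             : "m"(y)
             : "edx", "cc");
#else
    x = (int)((unsigned)x >> 4) + y;
#endif
    return x;
}
```

With the inputs from the question, both functions should produce 0x11000000.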

Clobber list for rep_movsl

I'm trying out the examples of inline assembly in: http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html
But something is confusing me about clobbering:
About behavior of clobber
Clobbering essentially tells GCC to not trust the values in the specified register/memories.
"Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after....It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew."
Does this paragraph mean clobbering will disable common sub-expression elimination?
There's some inconsistency in the tutorial about the clobber list:
For registers specified in the input/output lists, there's no need to put them in the clobber list, as GCC already knows about them. However, in the example about rep_movsl (or rep_stosl):
asm ("cld\n\t"
"rep\n\t"
"stosl"
: /* no output registers */
: "c" (count), "a" (fill_value), "D" (dest)
: "%ecx", "%edi" );
although "c" and "D" are already in the input operands, they are listed as clobbered again.
I tried a simple snippet in C:
#include<stdio.h>
int main()
{
int a[] = {2, 4, 6};
int b[3];
int n = 3;
int v = 12;
asm ("cld\n\t"
"rep\n\t"
"movsl"
:
: "S" (a), "D" (b), "c" (n)
: );
// : "%ecx", "%esi", "%edi" );
printf("%d\n", b[1]);
}
If I use the commented clobber list, GCC will complain:
a.c:8:3: error: can't find a register in class ‘CREG’ while reloading ‘asm’
a.c:8:3: error: ‘asm’ operand has impossible constraints
If I use empty clobber list, it will compile and the output is 4.
The document you are quoting appears to be significantly inaccurate. Here's what asm operand constraints actually mean to GCC:
Input: The assembly operation reads from this operand. GCC assumes that all reads happen simultaneously at the very beginning of the assembly operation.
Output: The assembly operation writes to this operand; after it completes, the associated variable will have a meaningful value. (There is no way to tell GCC what that value is.) GCC assumes that all writes happen simultaneously at the very end of the assembly operation.
Clobber: The assembly operation destroys any meaningful value in this operand. Like writes, all clobbers are assumed to happen simultaneously at the end of the operation.
Earlyclobber: Same as clobber except that it happens at the beginning of the operation.
Furthermore, the current (GCC 4.7) manual includes this critical paragraph:
You may not write a clobber description in a way that overlaps with an input or output operand. For example, you may not have an operand describing a register class with one member if you mention that register in the clobber list. Variables declared to live in specific registers (see Explicit Reg Vars), and used as asm input or output operands must have no part mentioned in the clobber description. There is no way for you to specify that an input operand is modified without also specifying it as an output operand. Note that if all the output operands you specify are for this purpose (and hence unused), you will then also need to specify volatile for the asm construct, as described below, to prevent GCC from deleting the asm statement as unused.
This is why attempting to both input and clobber certain registers is failing for you.
Now, inserting rep movsl is kind of silly nowadays -- just use memcpy and let GCC replace that with an optimal instruction sequence for you -- but nonetheless the correct way to write your example is
#include <stdio.h>
int main()
{
int a[] = {2, 4, 6};
int b[3];
int n = 3;
int v = 12;
int *ap = a, *bp = b;
asm volatile ("rep movsl" : "+S" (ap), "+D" (bp), "+c" (n) : : "memory");
printf("%d\n", b[1]);
}
You need the ap and bp intermediate variables because an array is not a modifiable lvalue, so it can't appear in the output constraints. The "+" modifier tells GCC that the operand is both an input and an output. The 'volatile' is necessary because all of the output operands are unused after the asm, so GCC would otherwise cheerfully delete it (on the theory that it was only there for what it did to the output operands). Putting "memory" in the clobber list is how you tell GCC that the operation modified memory. And finally, a micro-optimization: GCC never ever issues 'std', so you need not 'cld' (this is actually guaranteed by the x86 ABI).
Most of the changes I made would not affect whether a tiny test program like this behaves correctly; however, they are all essential in a full-size program to prevent subtle optimization errors. For instance, if you left out the "memory" clobber, GCC would be within its rights to hoist the load of b[1] above the asm!
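The same pattern can be wrapped in a small function. The following is a sketch under a few assumptions: the name copy_u32 is hypothetical, the count is a size_t so that the whole count register is defined on 64-bit targets as well, and the loop fallback for non-x86 targets is an addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 32-bit words from src to dst (regions must not overlap).
 * Assumes the direction flag is clear, as the ABI requires at call boundaries. */
static void copy_u32(uint32_t *dst, const uint32_t *src, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    /* All three registers are modified by rep movsl, so each is a
     * read/write operand rather than an input plus a clobber. */
    __asm__ volatile ("rep movsl"
                      : "+D"(dst), "+S"(src), "+c"(count)
                      :
                      : "memory");
#else
    while (count--)            /* portable fallback, not inline asm */
        *dst++ = *src++;
#endif
}
```

Because the parameters are local copies, letting the asm advance them is harmless, and the intermediate pointer variables from the example above are not needed.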

inline asm unknown

static inline void *__memset(void *s, char c, size_t n) {
int d0, d1;
asm volatile (
"rep; stosb;"
: "=&c" (d0), "=&D" (d1)
: "0" (n), "a" (c), "1" (s)
: "memory");
return s;
}
What are "d0" and "d1" used for? Could you please explain all of the code completely? Thank you!
You need to understand gcc extended inline asm format:
The first part is the actual assembly. In this case there are only 2 instructions
The second part specifies output constraints and the third part specifies input constraints. The fourth part specifies the assembly will clobber the memory
Output
"=&c" associates d0 with the ecx register and marks it write-only. The & marks it as earlyclobber: the register may be written before the asm has finished reading its inputs
"=&D" means the same thing, for the edi register
Input
"0" (n) associates n with the first mentioned register. In your case, with ecx
"a" (c) associates c with eax
"1" (s) associates s with edi
Assembly
So there you have it. Repeat this ecx times (n times): store al (the low byte of c) at the address in edi (s), then increment edi.
So then, why the unused d0 and d1 ? I'm not sure. I too think they are useless in this case and the whole output section could be left empty BUT I don't think it's possible to specify "writable" and "early-clobbered" in the input constraints. So I think d0 and d1 are there to make & possible.
I would try writing it like this:
asm volatile (
"rep\n"
"stosb\n"
:
: "c" (n), "a" (c), "D" (s)
: "%ecx", "%edi", "memory"
);
What are "d0" and "d1" used for?
In effect, it says that the final values of %ecx, %edi (assuming 32-bit) are stored in d0, d1 respectively. This serves a couple of purposes:
It lets the compiler know that, as outputs, these registers are effectively clobbered. By assigning them to temporary variables, an optimizing compiler also knows that there is no need to actually perform the 'store' operation.
The "=&" specifies these as early-clobber operands. They may be written to before all the inputs are consumed. So if the compiler is free to choose an input register, it shouldn't alias these two.
This isn't technically necessary for %ecx, since it's explicitly named as an input: "0" (n) - the 'rep' count in this case. I'm not sure it's necessary for %edi either, since it can't be updated before the input "1" (s) is consumed, and the instruction executed. And again, as it's explicitly named as an input, the compiler isn't free to choose another register. In short, "=&" doesn't hurt here, but it doesn't do anything.
As "a" (c) specifies an input-only register %eax set to (c), the compiler may assume that %eax still holds this value after the 'asm' - which is indeed the case with "rep; stosb;".
"memory" specifies that memory can be modified in a way unknown to the compiler - which is true in this case, it's setting (n) bytes starting at (s) to the value (c) - assuming the direction flag is cleared, which it should be. This does have the effect of forcing a reload of values, as the compiler can't assume that registers reflect the memory values they're supposed to anymore. It doesn't hurt, and it may be necessary to make it safe for a general case memset, but it's often overkill.
Edit: Input operands may not overlap clobber operands. It doesn't make sense to specify something as input-only and clobbered. I don't think the compiler allows this, and it wouldn't be wise to use an ambiguous specification even if it did. From the manual:
You may not write a clobber description in a way that overlaps with an input or output operand. For example, you may not have an operand describing a register class with one member if you mention that register in the clobber list.
Reviewing some old answers, I thought I would add a link to the excellent Lockless GCC inline ASM tutorial. The article builds on prior sections, unlike the gcc manual which is best described as a 'reference', and not really suited to any sort of structured learning.
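As an alternative to the dummy outputs d0 and d1, the modified registers can be declared as read/write operands on local copies. This is only a sketch: the name memset_sketch is made up, size_t is used for the count so the full count register is defined on 64-bit targets, and the byte loop for non-x86 targets is an addition.

```c
#include <stddef.h>

/* Fill n bytes at s with the low byte of c and return s.
 * Assumes the direction flag is clear, as the ABI requires at call boundaries. */
static void *memset_sketch(void *s, int c, size_t n)
{
    void *dst = s;             /* local copy that the asm may advance */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(n)   /* both are read and modified */
                      : "a"(c)               /* al is read, never written */
                      : "memory");           /* the buffer itself changes */
#else
    unsigned char *p = dst;    /* portable fallback, not inline asm */
    while (n--)
        *p++ = (unsigned char)c;
#endif
    return s;
}
```

This avoids listing any register as both an operand and a clobber, which the manual forbids.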

What is r() and double percent %% in GCC inline assembly language?

Example:
int main(void)
{
int x = 10, y;
asm ("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y) /* y is output operand */
:"r"(x) /* x is input operand */
:"%eax"); /* %eax is clobbered register */
}
what is r(y)?
also why %% is used before eax? Generally single % is used right?
Okay, this is gcc inline assembler, which is very powerful but difficult to understand.
First off, the % char is a special char. It lets you define register and number placeholders (more on this later). Unfortunately the % is also used as part of a register name (such as %EAX), so in gcc inline assembler you have to use two percent chars if you want to name a register.
%0, %1 and %2 (etc.) are placeholders for the input and output operands. These are defined in the lists that follow the assembler string.
In your example %0 becomes a placeholder for y, and %1 becomes a placeholder for x. The compiler will make sure the variables will be in the registers for input operands before the asm-code gets executed, and it will make sure the output operand will get written to the variable specified in the output operand list.
Now you should get an idea what "=r"(y) is: It is an output operand that reserves a register for the variable y and assigns it to the placeholder %0 (because it is the first operand listed after the inline assembler string).
There are lots of other placeholder types. m lets you specify a memory location, and if I'm not mistaken i can be used for numeric constants. You'll find them all listed in the gcc documentation.
Then there is the clobber list. This list is important! It lists all registers, flags, memory locations etc. that get modified in your assembler code (such as the EAX in your example). If you get this wrong the optimizer will not know what has been modified and it is very likely that you end up with code that doesn't work.
Your example is by the way almost pointless. It just loads the value X into a register and assigns this register to EAX. Afterwards EAX gets stored into another register which will then later become your y variable. So all it does is a simple assignment:
y = x;
A last thing: If you have worked with Intel-style assembler before: You have to read the arguments backwards. For all instructions the source operand is the one following the instruction itself, and the target operand is the one on the right of the comma. Compared to Intel syntax this is exactly the other way around.
Try this tutorial. It covers everything you ask: for example, try section 6 - it explains constraints quite well, and what the "=" sign is for. Even the concept of clobbered registers is covered (section 5.3).
The lines with "r" or "=r" are operand constraints. The "=" means output operand. Essentially, this:
:"=r"(y)
:"r"(x)
means that %0 (ie: the first operand) corresponds to y and is for output, and %1 (the second operand) corresponds to x.
A single % is normally used in AT&T syntax assembly, but for inline assembly the single % is used for operand references (eg: %0, %1) while a double % is used for literal register references. Think of it like the way you have to use a double % in a printf format if you want a literal % in the output.
A clobbered register is a register whose value will be modified by the assembly code. As you can see from the code, eax is written to. You need to tell gcc about this so that it knows that the compiled code can't keep anything it needs for later in eax when it's about to invoke this assembly.
I can't answer all of this, but a clobbered register is one that will get used somewhere in the computation in a way that will destroy its current value. So if the caller wants to use the current value later, it needs to save it somehow.
In asm directives like this, when you write the assembly you figure out which registers are going to be clobbered by it; you then tell the compiler this (as shown in your example), and the compiler does what it has to do to preserve the current value of that register if necessary. The compiler knows a lot about how values in registers and elsewhere will be used for later computations, but it usually can't analyse embedded assembly. So you do the analysis yourself and the compiler uses the clobbering information to safely incorporate the assembly into its optimisation choices.
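Pulling the pieces together, the example from the question can be wrapped in a function with each part labelled. The name copy_via_eax is hypothetical, and the plain assignment used on non-x86 targets is an addition so the sketch compiles elsewhere.

```c
/* Returns x, routed through %eax purely to illustrate the syntax. */
static int copy_via_eax(int x)
{
    int y;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"   /* %1 is the placeholder for x; %%eax is a literal register */
             "movl %%eax, %0"       /* %0 is the placeholder for y */
             : "=r"(y)              /* operand 0: output, any general register */
             : "r"(x)               /* operand 1: input, any general register */
             : "eax");              /* clobber: eax is overwritten by the asm */
#else
    y = x;                          /* portable fallback, not inline asm */
#endif
    return y;
}
```

Because eax is in the clobber list, the compiler will not choose it for either %0 or %1, and it will not keep any live value there across the asm statement.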

Resources