So I have some code
#define umul_ppmm(w1, w0, u, v) \
asm ("mulq %3" \
: "=a" (w0), "=d" (w1) \
: "0" ((uint64_t)(u)), "rm" ((uint64_t)(v)))
I'm trying to debug it and understand how it works.
Currently, I am looking at this PDF for reference on mulq.
My understanding so far is that it is multiplying two 64 bit numbers together which would be w0 and u. Then it stores the result of that multiplication in w0 and w1.
My main questions are:
According to This GCC assembly guide on Simple Constraints 'a' and 'd' in "=a" and "=d" are address and data registers respectively. How does that play in here and what exactly does that mean?
What does "0" mean in this case? That same guide says that "An operand that matches the specified operand number is allowed." What would be the matching operand here?
How does v come into play? If at all?
Printing out the variables before and after the function call results in
w1 w0 u v
2097147 549755813889 17179869183 4611684961865433149
4294966311 17179869183 13835060159816138691 4611684961865433149
The mulq instruction implicitly produces its result in the a and d registers (normally known as rax and rdx): the low 64 bits of the product go to rax and the high 64 bits to rdx.
Operands are indexed from zero. "0" thus means the same place as operand 0, which is w0. mulq implicitly uses rax as one of its input operands, hence the matching constraint. It could equally have been written out as "a" again.
v is the operand %3 which is the only explicit operand referenced in the mulq instruction. The code multiplies u and v so of course it "comes into play".
You printed the variables wrong: on the second line you swapped w0 and u. u and v are input operands, so they are unchanged.
u*v = w1*2^64 + w0, that is, 17179869183 * 4611684961865433149 = 4294966311 * 2^64 + 13835060159816138691.
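If you want to check that identity yourself, here is a minimal sketch. It assumes GCC or Clang; the wrapper name mul_64x64 is made up for illustration, and the non-x86-64 branch is only an assumed fallback using unsigned __int128 so the snippet still builds elsewhere. The macro is the one from the question, spelled __asm__ so it also compiles with -std=c99.

```c
#include <stdint.h>

#if defined(__x86_64__)
/* Same macro as in the question: rax holds u on entry ("0" ties it to
 * operand 0), and mulq leaves the 128-bit product in rdx:rax. */
#define umul_ppmm(w1, w0, u, v)                         \
    __asm__("mulq %3"                                   \
            : "=a"(w0), "=d"(w1)                        \
            : "0"((uint64_t)(u)), "rm"((uint64_t)(v)))
#else
/* Assumed fallback for other 64-bit targets, not part of the original code. */
#define umul_ppmm(w1, w0, u, v)                                            \
    do {                                                                   \
        unsigned __int128 p_ = (unsigned __int128)(uint64_t)(u) * (uint64_t)(v); \
        (w0) = (uint64_t)p_;                                               \
        (w1) = (uint64_t)(p_ >> 64);                                       \
    } while (0)
#endif

/* Hypothetical wrapper: stores the high and low halves of u * v. */
static void mul_64x64(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo) {
    uint64_t w1, w0;
    umul_ppmm(w1, w0, u, v);
    *hi = w1;
    *lo = w0;
}
```

Calling mul_64x64 with the u and v from the question should give back the w1 and w0 values quoted above.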
Consider this program, which can be compiled as either 32-bit or 64-bit:
#include <stdio.h>
static int f(int x, int y) {
__asm__(
"shrl $4, %0\n\t"
"movl %1, %%edx\n\t"
"addl %%edx, %0"
: "+r"(x) // needs "+&r" to work as intended
: "r"(y)
: "edx"
);
return x;
}
int main(void) {
printf("0x%08X\n", f(0x10000000, 0x10000000));
}
At -O1 or higher, it gives the wrong answer (0x02000000 instead of 0x11000000), because x gets written before y gets read, but the constraint for x doesn't have the & to specify earlyclobber, so the compiler put them in the same register. If I change +r to +&r, then it gives the right answer again, as expected.
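For reference, the corrected register version described above might look like the sketch below. The function name shift_then_add is made up here, and the #else branch is an assumed plain-C equivalent so the snippet also builds on non-x86 targets.

```c
static int shift_then_add(int x, int y) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__("shrl $4, %0\n\t"      /* x is written here ...               */
            "movl %1, %%edx\n\t"   /* ... before y is read here           */
            "addl %%edx, %0"
            : "+&r"(x)             /* & keeps x out of y's register       */
            : "r"(y)
            : "edx", "cc");
#else
    /* Assumed portable equivalent of the three instructions above. */
    x = (int)((unsigned)x >> 4) + y;
#endif
    return x;
}
```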
Now consider this program:
#include <stdio.h>
static int f(int x, int y) {
__asm__(
"shrl $4, %0\n\t"
"movl %1, %%edx\n\t"
"addl %%edx, %0"
: "+m"(x) // Is this safe without "+&m"? Compilers reject that
: "m"(y)
: "edx"
);
return x;
}
int main(void) {
printf("0x%08X\n", f(0x10000000, 0x10000000));
}
Other than using m constraints instead of r constraints, it's exactly the same. Now it happens to give the right answer even without the &. However, I understand relying on this to be a bad idea, since I'm still writing to x before I read from y without telling the compiler I'm doing so. But when I change +m to +&m, my program no longer compiles: GCC tells me error: input operand constraint contains '&', and Clang tells me invalid output constraint '+&m' in asm. Why doesn't this work?
I can think of two possibilities:
It's always safe to earlyclobber things in memory, so the & is rejected as redundant
It's never safe to earlyclobber things in memory, so the & is rejected as unsatisfiable
Is one of those the case? If the latter, what's the best workaround? Or is something else going on here?
I think "+m" and "=m" are safe without an explicit &.
From the docs, my emphasis added:
&
Means (in a particular alternative) that this operand is an
earlyclobber operand, which is written before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is read by the instruction or as part of any memory
address.
Over-interpreting this could be problematic, but given the fact that it seems safe in practice, and there are good reasons why that should be the case, I think the following interpretation of the docs (i.e. guaranteed behaviour for GCC) is reasonable:
"Memory address" is talking about the addressing mode itself, e.g. something like 16(%rdx), that GCC invents and substitutes in for %1 if you have a "m"(foo) memory operand for example. It's not talking about early-clobbering pointed-to memory, only registers that might be read as part of the addressing mode.
It means GCC needs to avoid picking the same register in any addressing mode as it picked for an early-clobber register operand. This lets you safely use "m" operands (and +m or =m) in the same statement as an "=&r" operand, just like you can use "r" operands. It's the register output operand that needs to be flagged with &, not the potential readers.
The fact that it explicitly says in a register implies that this is only a concern at all for register operands, not memory.
In the C abstract machine, every object has a memory address (except register int foo).
I think compilers will always pick that address for "m" / "+m" operands, not some invented temporary. For example, I think it's safe / supported to lea that memory operand and store the address somewhere, if it would be safe to do tmp = &foo; in C.
You can think of "earlyclobber" as "don't pick the same location as any input operand". Since different objects have different addresses, that already happens for free for memory.
Unless you specified the same object for separate input and output operands, of course. In the register case for "=&r"(foo) and "r"(foo) you would get separate registers for the input and result. But not for memory, even if you use an early-clobber "=&m"(foo) operand, which does compile even though "+&m" doesn't.
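To illustrate the case where the & does matter alongside memory operands, here is a minimal sketch, assuming x86 and GCC or Clang. The name add_from_memory is made up, and the #else branch is an assumed plain-C equivalent. The register output is written by the first instruction while the second memory input has not been read yet, so it is the "=&r" operand that carries the early-clobber flag, not the "m" inputs.

```c
static int add_from_memory(const int *a, const int *b) {
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0\n\t"   /* writes result before %2 has been read     */
            "addl %2, %0"
            : "=&r"(result)     /* must not share a register with whatever
                                   addressing mode the compiler picks for %2 */
            : "m"(*a), "m"(*b)
            : "cc");
#else
    /* Assumed portable equivalent. */
    result = *a + *b;
#endif
    return result;
}
```

Without the &, the compiler would be allowed to reuse the register that holds b as the output register, and the addl would then read through a corrupted pointer.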
Random facts, experiments on Godbolt:
"m"(y+1) doesn't work as an input: "memory input 1 is not directly addressable". But it works for a register. Memory source operands may have to be objects that exist in the C abstract machine.
"+&m"(x) doesn't compile: error: input operand constraint contains '&'
"=&m"(x) compiles cleanly. However, a "0"(x) matching constraint for it gets a warning: warning: matching constraint does not allow a register. https://godbolt.org/z/4kKNq4.
+ operands appear to be internally implemented as separate output and input operands with a matching constraint to make sure they pick the same location. (More evidence: if you use just one "+r" operand, you can reference %1 in the asm template without a warning, and it's the same register as %0.)
It appears that "=&m"(x) and "m"(x) will always pick the same memory anyway, even without a matching constraint. (For the same reason that it's not the same memory as any other object, which is why "+&m"(x) is redundant.)
If the lifetimes of two C objects overlap, their addresses will be distinct. So I think this works just like passing pointers to locals to a non-inline function, as far as the optimizer is concerned. It can't invent aliasing between them. e.g.
int x = 1;
{
int tmp = x; // dead after this call.
foo(&x, &tmp);
}
For example, the above code can't pass the same address for both arguments of foo (e.g. by optimizing away tmp). The same goes for an inline-asm statement with "=m"(x) and "m"(tmp) operands. No early-clobber needed.
A lot of this reasoning is extrapolated from how one would reasonably expect it to work, but that is consistent with how it appears to work in practice and with the wording in the docs. I mention this as a caution against applying the same reasoning without any support from the docs for other cases.
Re: point 2: Even if early-clobber were necessary, it would always be satisfiable for memory. Every object has its own address. It's the programmer's fault if you pass overlapping union members as memory inputs and outputs. The compiler won't create that situation if it wasn't present in the source. e.g. it won't elide a temporary variable if it would mean that a memory input overlaps a memory output. (Or at all).
From the following link,
https://www.ibm.com/developerworks/library/l-ia/index.html
a single variable may serve as both the input and the output operand.
I wrote the following code:
#include <stdio.h>
int main()
{
int num = 1;
asm volatile ("incl %0"
:"=a"(num)
:"0"(num));
printf("num:%d\n", num);
return 0;
}
The above code increments the value of num.
What is the use of matching constraints? If I don't use a matching constraint, the code does not work as expected:
asm volatile ("incl %0"
:"=a"(num));
Why and when should we use matching constraints?
That's not the question you asked; you asked why you need an input at all, which should be fairly obvious when you know what the syntax actually means. (That "=r"(var) is a pure output, independent of any previous value the C variable had, like var = 123; would be). So "=r" with an inc instruction is like var = stale_garbage + 1;
But anyway, as I commented, the interesting question is "why do matching constraints exist when you can just use "+r"(var) for a read/write operand, instead of the more complicated matching-constraint syntax?"
They're rarely useful; usually you can use the same variable for input and output, especially if you have your asm inside a C wrapper function. But if you don't want to use the same C var for input and output, yet still need them to pick the same register or memory, then you want a matching constraint. Wrapping a system call might be one use-case; you might want to use a different C variable for the call number vs. the return value. (Except you could just use "=a" and "a" instead of a matching constraint; the compiler doesn't have a choice.) An output var of a narrower or different type than the input var could be another use-case.
IIRC, x87 is another use-case; I seem to recall "+t" not working.
I think that "+r" RMW constraints are internally implemented as an output with a "hidden" matching constraint. But while %1 normally errors in an asm template that only has one operand, if that operand is an in/out "+something" then GCC doesn't reject %1 as being too high an operand number. And if you look at the asm to see which register or memory it actually chose for that out-of-bounds operand number, it does match the in/out operand.
So "+r" is basically syntactic sugar for matching constraints. I'm not sure if it was new at some point, and before GCC version x.y you had to use matching constraints? It's not rare to see tutorial examples that use matching constraints with the same var for both input and output that would be simpler to read with "+" RMW constraints.
Basics:
With constraints like "a" and "=a" you don't need a matching constraint; the compiler only has 1 choice anyway. Where it's useful is "=r" where the compiler could pick any register, and you need it to pick the same register for an input operand.
If you just used "=r" and a separate "r" input, you'd be telling the compiler that it can use this as a copy-and-whatever operation, leaving the original input unmodified and producing the output in a new register. Or overwriting the input if it wants to. That would be appropriate for lea 1(%[srcreg]), %[dstreg] but not inc %0. The latter would assume that %0 and %1 are the same register, therefore you need to do something to make sure that's true!
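The three situations described above can be sketched side by side. This assumes x86 with GCC or Clang; the function names are made up for illustration, and the #else branch is an assumed plain-C stand-in for other targets.

```c
#if defined(__i386__) || defined(__x86_64__)
/* Copy-and-modify: separate "=r" and "r" are fine, the input is left alone. */
static int plus_one_lea(int in) {
    int out;
    __asm__("leal 1(%1), %0" : "=r"(out) : "r"(in));
    return out;
}

/* Read-modify-write on one C variable: "+r" is the simplest spelling. */
static int plus_one_rmw(int x) {
    __asm__("incl %0" : "+r"(x) : : "cc");
    return x;
}

/* Different C variables for input and output, forced into the same
 * register by the matching constraint "0". */
static int plus_one_matching(int in) {
    int out;
    __asm__("incl %0" : "=r"(out) : "0"(in) : "cc");
    return out;
}
#else
/* Assumed portable stand-ins so the snippet builds elsewhere. */
static int plus_one_lea(int in)      { return in + 1; }
static int plus_one_rmw(int x)       { return x + 1; }
static int plus_one_matching(int in) { return in + 1; }
#endif
```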
This code:
asm volatile ("incl %0"
:"=a"(num));
Doesn't work because in order to increase the value in a register (by 1 in this case) an original value needs to be read from the register; 1 added to it; and the value written back to the register. =a only says that the output of the register EAX will be moved to num when finished but the compiler won't load the register EAX with the original value of num. The code above will just add 1 to whatever happens to be in EAX (could be anything) and puts that in num when the inline assembly is finished.
asm volatile ("incl %0"
:"=a"(num)
:"0"(num));
On the other hand this says that num is both used as an input (so the value of num is moved to EAX) and that it also outputs a value in EAX so the compiler will move the value in EAX to num when the inline assembly is finished.
It could have been rewritten to use an input/output constraint as well (this does the same thing):
asm volatile ("incl %0"
:"+a"(num));
There is no need for volatile here either since all of the side effects are captured in the constraints. Adding volatile unnecessarily can lead to less efficient code generation but the code will still work. I would have written it this way:
asm ("incl %0"
:"+a"(num));
We know that if I set %eax as an input, I cannot also include it in the list of clobbered registers. So my question is: is it legal to modify the value of %eax in the assembly code without declaring it in the clobber list?
__asm__ __volatile__("inc %0" :: "a"(num) : "%eax"); // illegal
__asm__ __volatile__("inc %0" :: "a"(num)); // Can I modify %eax?
No, if the assembly code changes an input register, it must also be listed as an output register, or as an in-out register, by using "+" instead of "=" in the constraint.
Examples:
__asm__ __volatile__("..." : "+r"(num));
__asm__ __volatile__("..." : "=a"(dummy) : "a"(num));
__asm__ __volatile__("..." : "=r"(dummy) : "0"(num));
The first example specifies num as both an input and an output. This will overwrite the prior value of num, which may be undesirable if the operand is clobbered rather than being set to a useful value.
The second example explicitly uses eax for both input and output. It directs the output to a dummy variable so as to not corrupt the actual input variable num.
The third example uses a dummy output, to avoid modifying num, and also avoids specifying eax explicitly, allowing the compiler to choose which register to use. The "0" constraint tells the compiler to use the same register as operand 0 for this input operand.
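As a concrete instance of the third pattern, here is a minimal sketch, assuming x86 with GCC or Clang. The function name low_bit_via_shift and the variable names are made up, and the #else branch is an assumed plain-C equivalent. The asm destroys its input register by shifting it, so that register is declared as a dummy output and the real input is tied to it with a matching constraint.

```c
static int low_bit_via_shift(unsigned num) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned scratch;        /* dummy output: receives the shifted value   */
    unsigned char carry;     /* the bit that was shifted out               */
    __asm__("shrl $1, %1\n\t"    /* modifies the register that held num    */
            "setc %0"
            : "=q"(carry), "=r"(scratch)
            : "1"(num)           /* same register as operand 1 (scratch)   */
            : "cc");
    (void)scratch;               /* intentionally unused                   */
    return carry;
#else
    /* Assumed portable equivalent. */
    return (int)(num & 1u);
#endif
}
```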
static inline void *__memset(void *s, char c, size_t n) {
int d0, d1;
asm volatile (
"rep; stosb;"
: "=&c" (d0), "=&D" (d1)
: "0" (n), "a" (c), "1" (s)
: "memory");
return s;
}
What are "d0" and "d1" used for? Could you please explain all of the code? Thank you!
You need to understand gcc extended inline asm format:
The first part is the actual assembly. In this case it is a single stosb string instruction with a rep prefix
The second part specifies output constraints and the third part specifies input constraints. The fourth part specifies the assembly will clobber the memory
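Output
"=&c" associates d0 with the ecx register and marks it write-only. & marks it as early-clobber: it may be written before the asm has finished reading its inputs, so the compiler must not put an unrelated input in the same register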
"=&D" means the same thing, for the edi register
Input
"0" (n) associates n with the first mentioned register. In your case, with ecx
"a" (c) associates c with eax
"1" (s) associates s with edi
Assembly
So there you have it. Repeat ecx times (n times): store al, the low byte of eax (c), at the address in edi (s), then increment edi.
So then, why the unused d0 and d1 ? I'm not sure. I too think they are useless in this case and the whole output section could be left empty BUT I don't think it's possible to specify "writable" and "early-clobbered" in the input constraints. So I think d0 and d1 are there to make & possible.
I would try writing it like this:
asm volatile (
"rep\n"
"stosb\n"
:
: "c" (n), "a" (c), "D" (s)
: "%ecx", "%edi", "memory"
);
What are "d0" and "d1" used for?
In effect, it says that the final values of %ecx, %edi (assuming 32-bit) are stored in d0, d1 respectively. This serves a couple of purposes:
It lets the compiler know that, as outputs, these registers are effectively clobbered. By assigning them to temporary variables, an optimizing compiler also knows that there is no need to actually perform the 'store' operation.
The "=&" specifies these as early-clobber operands. They may be written to before all the inputs are consumed. So if the compiler is free to choose an input register, it shouldn't alias these two.
This isn't technically necessary for %ecx, since it's explicitly named as an input: "0" (n) - the 'rep' count in this case. I'm not sure it's necessary for %edi either, since it can't be updated before the input "1" (s) is consumed, and the instruction executed. And again, as it's explicitly named as an input, the compiler isn't free to choose another register. In short, "=&" doesn't hurt here, but it doesn't do anything.
As "a" (c) specifies an input-only register %eax set to (c), the compiler may assume that %eax still holds this value after the 'asm' - which is indeed the case with "rep; stosb;".
"memory" specifies that memory can be modified in a way unknown to the compiler - which is true in this case, it's setting (n) bytes starting at (s) to the value (c) - assuming the direction flag is cleared, which it should be. This does have the effect of forcing a reload of values, as the compiler can't assume that registers reflect the memory values they're supposed to anymore. It doesn't hurt, and it may be necessary to make it safe for a general-case memset, but it's often overkill.
Edit: Input operands may not overlap clobber operands. It doesn't make sense to specify something as input-only and clobbered. I don't think the compiler allows this, and it wouldn't be wise to use an ambiguous specification even if it did. From the manual:
You may not write a clobber description in a way that overlaps with an input or output operand. For example, you may not have an operand describing a register class with one member if you mention that register in the clobber list.
Reviewing some old answers, I thought I would add a link to the excellent Lockless GCC inline ASM tutorial. The article builds on prior sections, unlike the gcc manual which is best described as a 'reference', and not really suited to any sort of structured learning.
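Since inputs may not overlap clobbers, one way to express "this register is an input and gets modified" is an in/out "+" operand. A minimal sketch of the same routine written that way follows, assuming x86 with GCC or Clang and a clear direction flag. The function name memset_rep_stosb is made up, and the #else branch is an assumed byte loop for other targets. It is not the original author's code.

```c
#include <stddef.h>

static void *memset_rep_stosb(void *s, int c, size_t n) {
#if defined(__i386__) || defined(__x86_64__)
    void *dst = s;   /* working copy, so s is still intact for the return  */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)   /* both are read and modified  */
                     : "a"(c)               /* only al is actually stored  */
                     : "memory");           /* the buffer is written       */
#else
    /* Assumed portable equivalent. */
    unsigned char *p = s;
    while (n--) *p++ = (unsigned char)c;
#endif
    return s;
}
```

With "+c" and "+D" there is no need for the d0 and d1 dummies, and nothing appears both as an input and in the clobber list.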
Example:
int main(void)
{
int x = 10, y;
asm ("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y) /* y is output operand */
:"r"(x) /* x is input operand */
:"%eax"); /* %eax is clobbered register */
}
what is r(y)?
also why %% is used before eax? Generally single % is used right?
Okay, this is gcc inline assembler, which is very powerful but difficult to understand.
First off, the % char is a special char. It lets you define register and number placeholders (more on this later). Unfortunately the % is also used as part of a register name (such as %EAX), so in gcc inline assembler you have to use two percent chars if you want to name a register.
%0, %1 and %2 (etc.) are placeholders for the input and output operands. These are defined in the lists that follow the assembler string.
In your example %0 becomes a placeholder for y, and %1 becomes a placeholder for x. The compiler will make sure the variables will be in the registers for input operands before the asm-code gets executed, and it will make sure the output operand will get written to the variable specified in the output operand list.
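Now you should get an idea what "=r"(y) is: It is an output operand that reserves a register for the variable y and assigns it to the placeholder %0 (because it is the first operand listed after the inline assembler string). The = marks it as write-only. Likewise, "r"(x) is an input operand held in a register and assigned to the placeholder %1.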
There are lots of other placeholder types. m lets you specify a memory location, and if I'm not mistaken i can be used for numeric constants. You'll find them all listed in the gcc documentation.
Then there is the clobber list. This list is important! It lists all registers, flags, memory locations etc. that get modified in your assembler code (such as the EAX in your example). If you get this wrong the optimizer will not know what has been modified and it is very likely that you end up with code that doesn't work.
Your example is by the way almost pointless. It just loads the value X into a register and assigns this register to EAX. Afterwards EAX gets stored into another register which will then later become your y variable. So all it does is a simple assignment:
y = x;
A last thing: If you have worked with Intel-style assembler before: You have to read the arguments backwards. For all instructions the source operand is the one following the instruction itself, and the target operand is the one on the right of the comma. Compared to Intel syntax this is exactly the other way around.
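To make the operand order and the placeholders concrete, here is a minimal sketch of the same assignment, assuming x86 with GCC or Clang. The function names are made up, and the #else branch is an assumed plain-C equivalent. The first function is the example from the question wrapped in a function; the second drops the detour through EAX, so no clobber is needed.

```c
#if defined(__i386__) || defined(__x86_64__)
/* As in the question: source operand first, destination second (AT&T). */
static int copy_via_eax(int x) {
    int y;
    __asm__("movl %1, %%eax\n\t"   /* %1 is x, %%eax is the literal register */
            "movl %%eax, %0"       /* %0 is y                                */
            : "=r"(y)
            : "r"(x)
            : "eax");              /* eax is overwritten, so declare it      */
    return y;
}

/* Same effect with a single move and an empty clobber list. */
static int copy_direct(int x) {
    int y;
    __asm__("movl %1, %0" : "=r"(y) : "r"(x));
    return y;
}
#else
/* Assumed portable stand-ins. */
static int copy_via_eax(int x) { return x; }
static int copy_direct(int x)  { return x; }
#endif
```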
Try this tutorial. It covers everything you ask: for example, try section 6 - it explains constraints quite well, and what the "=" sign is for. Even the concept of clobbered registers is covered (section 5.3).
The lines with "r" or "=r" are operand constraints. The "=" means output operand. Essentially, this:
:"=r"(y)
:"r"(x)
means that %0 (ie: the first operand) corresponds to y and is for output, and %1 (the second operand) corresponds to x.
A single % is normally used in AT&T syntax assembly, but for inline assembly the single % is used for operand references (eg: %0, %1) while a double % is used for literal register references. Think of it like the way you have to use a double % in a printf format if you want a literal % in the output.
A clobbered register is a register whose value will be modified by the assembly code. As you can see from the code, eax is written to. You need to tell gcc about this so that it knows that the compiled code can't keep anything it needs for later in eax when it's about to invoke this assembly.
I can't answer all of this, but a clobbered register is one that will get used somewhere in the computation in a way that will destroy its current value. So if the caller wants to use the current value later, it needs to save it somehow.
In asm directives like this, when you write the assembly you figure out which registers are going to be clobbered by it; you then tell the compiler this (as shown in your example), and the compiler does what it has to do to preserve the current value of that register if necessary. The compiler knows a lot about how values in registers and elsewhere will be used for later computations, but it usually can't analyse embedded assembly. So you do the analysis yourself and the compiler uses the clobbering information to safely incorporate the assembly into its optimisation choices.