I know there are some other questions similar to this, but I'm still having trouble understanding the () part of it. Could someone spell this syntax out for me? Thanks.
cmp %eax,0x80498d4(,%ebx,4)
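cmp is the comparison assembly instruction. It compares its two operands by subtracting the first (source) operand from the second (destination) operand, in the order AT&T syntax lists them. It discards the result and sets the status flags in the EFLAGS register; the same subtraction serves both signed and unsigned comparisons. Those flags can then be used for conditional branching, conditional moves, etc.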
First argument: %eax (the value in the %eax register)
Second argument: 0x80498d4(,%ebx,4). This is read as offset(base, index, scale). The operand is the 32-bit value stored in memory at the address 0x80498d4 + base + (value in %ebx) * 4 (the scaling factor).
Note: the base register is omitted here, so it contributes nothing to the address (effectively 0). If 0x80498d4 is the address of an array of 4-byte integers, the instruction compares %eax with element number %ebx of that array.
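A small C sketch of the same arithmetic and comparison, assuming GCC-style inline asm on x86-64 (with a plain C fallback elsewhere). effective_address and equals_element are hypothetical helpers. Because a fixed absolute displacement like 0x80498d4 isn't usable in position-independent code, the sketch passes the array address in a base register instead; the scaled-index part (index, scale 4) works the same way.

```c
#include <stdint.h>
#include <stddef.h>

/* Address computed by disp(base, index, scale) in AT&T syntax.
   An omitted base, as in 0x80498d4(,%ebx,4), simply contributes 0. */
uintptr_t effective_address(uintptr_t disp, uintptr_t base,
                            uintptr_t index, unsigned scale)
{
    return disp + base + index * scale;
}

/* Compares value with table[index] the way cmp %eax, disp(,%ebx,4) does,
   then turns ZF into 0/1 with sete. table is a hypothetical array. */
int equals_element(const int32_t *table, size_t index, int32_t value)
{
#if defined(__x86_64__)
    unsigned char eq;
    __asm__ ("cmpl %2, (%1,%3,4)\n\t"   /* flags from table[index] - value */
             "sete %0"                   /* eq = 1 if ZF is set */
             : "=q" (eq)
             : "r" (table), "r" (value), "r" (index)
             : "cc", "memory");          /* reads memory the compiler can't see */
    return eq;
#else
    return table[index] == value;
#endif
}
```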
You can take a look at http://docs.oracle.com/cd/E19120-01/open.solaris/817-5477/ennby/index.html for more information on the (AT&T-style) syntax of x86 assembly instructions.
The overall problem I am trying to solve is to call printf while fetching its format string and arguments from a raw buffer. So far, the solution that seems to work best is to use inline assembly as a way to pass the variadic arguments, which have mixed types, to the function.
Currently we have chars and ints working flawlessly, and floats/doubles working until we need to pass them on the stack (passing them through xmm0-xmm7 works flawlessly for us). The goal here is to push these floating-point values onto the stack once xmm0-xmm7 have all been used, so that the subsequent call to printf can use them. For chars and ints we simply push them onto the stack with the push instruction, and printf picks them up just fine, but since that instruction doesn't work for floating-point values, we have to 'push' them onto the stack manually with the method below. I realize this is very likely the wrong way to handle it, but we haven't found a way around it.
Currently our solution for passing more than eight floating-point values requires us to know the stack offset of each argument being passed to our printf call. The offsets go up in 8-byte increments: the 9th argument is loaded into (%rsp), the 10th into 0x8(%rsp), the 11th into 0x10(%rsp), the 12th into 0x18(%rsp), and so on for the remaining arguments.
My goal with this "variable offset" is to just reduce the amount of repeated code that handles the incremented offset. Currently it just checks which argument is being processed, and jumps to the hardcoded constant offset. But this has led to a lot of duplicated code, which I was hoping to clean up.
Below is a small snippet of what we are doing currently to move one of the arguments into its appropriate place for the call to printf to access the argument.
double myDouble = 1.23;
asm volatile (
    "movsd %0, 0x8(%%rsp)"  /* The 0x8 is the offset we are hoping to pass in */
    :: "x" (myDouble)        /* movsd needs an XMM register operand; memory-to-memory is not encodable */
);
I am looking for a way to store this offset (0x8, 0x10, 0x18, ...) in a variable that can be incremented by eight as I process the arguments, though I now fear that this will break once we start mixing in more values of different types that are pushed onto the stack.
Any guidance would be greatly appreciated!
That's not possible with a constant displacement: the displacement is encoded into the instruction itself, so it has to be known at compile time and can't be a variable. Instead, use a different addressing mode, a store through a base register plus an index register:
#include <stdint.h>

/* Stores value at %rsp + offset (offset in bytes). */
void foo(int64_t offset, double value)
{
    asm volatile (
        "movsd %0, (%%rsp,%1)" :: "x" (value), "r" (offset)
        : "memory"
    );
}
You could also let the CPU do the multiplication by 8 by using a scaled-index addressing mode; then you pass an argument index (0, 1, 2, ...) instead of a byte offset:
#include <stdint.h>

/* Stores value at %rsp + index * 8. */
void foo(int64_t index, double value)
{
    asm volatile (
        "movsd %0, (%%rsp,%1,8)" :: "x" (value), "r" (index)
        : "memory"
    );
}
Or, to emulate push, you could use sub $8, %%rsp / movsd %0, (%%rsp), but you can't modify the stack pointer from inline asm without breaking the compiler-generated code around it.
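On the worry about mixing types: as far as I understand the System V AMD64 calling convention, integer and pointer arguments take the next free register from rdi, rsi, rdx, rcx, r8, r9 (the format string itself is the first of these), doubles take the next free xmm0-xmm7, and every argument that doesn't get a register takes the next 8-byte stack slot in argument order, whatever its type. So one byte offset, incremented by 8 for every spilled argument, should work for 8-byte scalars. The sketch below models only ints/pointers and doubles (no long double, structs or vector types); arg_kind, arg_slot and assign_slots are hypothetical names. For variadic calls such as printf, the ABI also expects %al to hold an upper bound on the number of vector registers used.

```c
#include <stddef.h>

/* Hypothetical argument kinds: INTEGER-class and SSE-class scalars only. */
enum arg_kind { ARG_INT, ARG_DOUBLE };

struct arg_slot {
    int in_register;      /* 1 if passed in a register, 0 if on the stack */
    int reg;              /* GPR index 0-5 or XMM index 0-7; -1 if on the stack */
    size_t stack_offset;  /* byte offset from %rsp just before the call */
};

/* Assigns each argument a register or a stack slot, in argument order. */
void assign_slots(const enum arg_kind *kinds, size_t n, struct arg_slot *out)
{
    int next_gpr = 0, next_xmm = 0;
    size_t next_stack = 0;  /* shared by all types: +8 per spilled argument */

    for (size_t i = 0; i < n; i++) {
        struct arg_slot s = { 0, -1, 0 };
        if (kinds[i] == ARG_INT && next_gpr < 6) {
            s.in_register = 1;
            s.reg = next_gpr++;
        } else if (kinds[i] == ARG_DOUBLE && next_xmm < 8) {
            s.in_register = 1;
            s.reg = next_xmm++;
        } else {
            s.stack_offset = next_stack;
            next_stack += 8;
        }
        out[i] = s;
    }
}
```

With a format string followed by nine doubles and six ints, the 9th double lands at (%rsp) and the sixth int, which no longer fits in a GPR, lands at 0x8(%rsp).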
I have a little theoretical curiosity. The == operator in C returns 1 in case of equality and 0 otherwise. My knowledge of assembly is very limited. However, I was wondering whether it would be possible, theoretically, to implement a new operator that returns ~0 in case of equality and 0 otherwise, on one condition: it must produce the same number of assembly instructions as the == operator. It's really just a theoretical curiosity; I have no practical uses in mind.
EDIT
My question targets x86 CPUs; however, I am very curious to know whether there are architectures that do this natively.
SECOND EDIT
As Sneftel has pointed out, nothing similar to the SETcc instructions [1] – but able to convert flag register bits into 0/~0 values (instead of the classical 0/1) – exists. So the answer to my question seems to be no.
THIRD EDIT
A little note. I am not trying to represent a logical true as ~0; I am trying to understand whether a logical true can also, optionally, be represented as ~0 when needed, without further effort, in a language that normally represents true as 1. That is why I hypothesized a new operator that “returns” numbers, not booleans (the natural logical true “returned” by == remains represented as 1); otherwise I would have asked whether == could be redesigned to “return” ~0 instead of 1. You can think of this new operator as half-belonging to the family of bitwise operators, which “return” numbers, not booleans (and by booleans I don't mean boolean data types; I mean the number pair 0/1, which is how C represents the result of a logical operation).
I know that all of this might sound futile, but I did warn you: it is a theoretical question.
However, here my question seems to be addressed explicitly:
Some languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.
So it seems there is no direct instruction, since using SETNE and then decrementing means adding one more instruction.
EDIT: as other people are pointing out, there are some flavors of "conditionally assign 0/1" out there. Kind of undermines my point :) Apparently, the 0/1 boolean type admits a slightly deeper optimization than a 0/~0 boolean.
The "operator returns a value" notion is a high level one, it's not preserved down to the assembly level. That 1/0 may only exist as a bit in the flags register, or not even that.
In other words, assigning the C-defined value of the equality operator to an int sized variable is not a primitive on the assembly level. If you write x = (a == b), the compiler might implement it as
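mov ecx, 1      ; value for TRUE (x86 CMOV has no immediate operand form)
xor eax, eax    ; x = 0 (FALSE); done before CMP because XOR clobbers the flags
cmp a, b        ; set the Z flag
cmovz eax, ecx  ; if equal, x = TRUE
Or it can be done with conditional jumps. As you can see, using ~0 as the value for TRUE takes the same number of instructions, just with a different operand (mov ecx, -1).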
None of the architectures that I'm familiar with implement equality comparison as "assign a 1 or 0 to a general purpose register".
There is no assembly implementation of a C operator. For instance, there is no x86 instruction which compares two arguments and results in a 0 or 1, only one which compares two arguments and puts the result in a bit in the flag register. And that's not usually what happens when you use ==.
Example:
void blah(void);

void foo(int a, int b) {
    if (a == b) { blah(); }
}
produces the following assembly, more or less:
foo(int, int):
cmp %edi, %esi
je .L12
rep ret
.L12:
jmp blah()
Note that nothing in there involves a 0/1 value. If you want that, you have to really ask for it:
int bar(int a, int b) {
return a == b;
}
which becomes:
bar(int, int):
xor %eax, %eax
cmp %edi, %esi
sete %al
ret
I suspect the existence of the SETcc instructions is what prompted your question, since they convert flag register bits into 0/1 values. There is no corresponding instruction which converts them into 0/~0: GCC instead does a clever little DEC to map them. But in general, the result of == exists only as an abstract and optimizer-determined difference in machine state between the two.
Incidentally, I would not be surprised at all if some x86 implementations chose to fuse SETcc and a following DEC into a single micro-op; I know this is done with other common instruction pairs. There is no simple relationship between a stream of instructions and a number of cycles.
For just one extra cycle, you can simply negate the output: 0/1 becomes 0/-1, and -1 is ~0.
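In C, both routes to a 0/~0 value mentioned in this thread can be written directly: negating the 0/1 result, or (as in the quoted Intel manual text) testing the opposite condition and then decrementing. This is a sketch with hypothetical function names; compilers typically turn each into a SETcc plus one extra instruction, but the exact sequence is up to the compiler.

```c
/* 0/1 comparison result negated: equal -> -1 (all bits set), else 0. */
int eq_mask_neg(int a, int b)
{
    return -(a == b);
}

/* Opposite condition, then decrement: (a != b) is 0 when equal, so 0 - 1 = -1. */
int eq_mask_dec(int a, int b)
{
    return (a != b) - 1;
}
```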
Internally in 8086, the comparison operations only exist in the flags. Getting the value of the flags into a variable takes extra code. It is pretty much the same code whether you want true as 1 or -1. Generally a compiler doesn't actually generate the value 0 or 1 when evaluating an if statement, but uses the Jcc instructions directly on the flags generated by comparison operations. https://pdos.csail.mit.edu/6.828/2006/readings/i386/Jcc.htm
With 80386, SETcc was added, which only ever sets 0 or 1 as the answer, so that is the preferred arrangement if the code insists on storing the answer. https://pdos.csail.mit.edu/6.828/2006/readings/i386/SETcc.htm
And newer instruction-set extensions add many compare instructions that write their results to registers instead of flags. Flags have come to be seen as a bottleneck that causes pipeline stalls in modern processors, and code optimisation tends to avoid depending on them.
Of course there are all sorts of tricks you can do to get 0, 1, or -1 given a particular set of values to compare. Needless to say the compiler has been optimised to generate 1 for true when applying these tricks, and wherever possible, it doesn't actually store the value at all, but just reorganises your code to avoid it.
SIMD vector comparisons do produce vectors of 0 / -1 results. This is the case on x86 MMX/SSE/AVX, ARM NEON, PowerPC Altivec, etc. (They're 2's complement machines, so I like to write -1 instead of ~0 to represent the elements of all-zero / all-one bits).
e.g. pcmpeqd xmm0, xmm1 replaces each element of xmm0 with xmm0[i] == xmm1[i] ? -1 : 0;
This lets you use them as AND masks, because SIMD code can't branch separately on each vector element without unpacking to scalar and back. It has to be branchless (see How to use if condition in intrinsics).
For example, to blend 2 vectors based on a condition without SSE4.1 pblendvb / blendvps, you'd compare and then AND / ANDNOT / OR, as in this code from Substitute a byte with another one:
__m128i mask = _mm_cmpeq_epi8(inp, val);     // movdqa xmm1, xmm0 / PCMPEQB xmm1, xmm2
// zero elements in the original where there was a match (that we want to replace)
inp = _mm_andnot_si128(mask, inp);           // inp &= ~mask; // movdqa xmm4, xmm1 / PANDN xmm4, xmm0 (PANDN dst, src: dst = ~dst & src)
// zero elements where we keep the original
__m128i tmp = _mm_and_si128(newvals, mask);  // newvals & mask; // PAND xmm3, xmm1
inp = _mm_or_si128(inp, tmp);                // POR xmm4, xmm3
But if you want to count matches, you can subtract the compare result: total -= -1 adds 1 for each match and avoids having to negate the vector elements (see How to count character occurrences using SIMD).
Or to conditionally add something, instead of actually blending, just do total += (x & mask), because 0 is the identity element for operations like ADD (and some others like XOR and OR).
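A sketch of the counting trick with SSE2 intrinsics: each compare produces 0 or -1 per byte, and subtracting that from a vector of byte counters adds 1 wherever there was a match. The function name is hypothetical, and it assumes n is a multiple of 16 and at most 255 * 16 so no 8-bit counter can overflow; a real version would flush the counters periodically and handle a tail. Without SSE2 it falls back to a plain loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Counts bytes equal to c in buf[0..n). Assumes n % 16 == 0 and n <= 4080. */
size_t count_byte(const unsigned char *buf, size_t n, unsigned char c)
{
#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8((char)c);
    __m128i total = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i mask = _mm_cmpeq_epi8(v, needle);  /* 0 or -1 in each byte */
        total = _mm_sub_epi8(total, mask);         /* total -= -1 for each match */
    }
    /* Horizontal sum: PSADBW against zero adds each group of 8 bytes
       into the low 16 bits of its 64-bit lane. */
    __m128i sums = _mm_sad_epu8(total, _mm_setzero_si128());
    return (size_t)_mm_cvtsi128_si32(sums) + (size_t)_mm_extract_epi16(sums, 4);
#else
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (buf[i] == c);
    return count;
#endif
}
```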
See How to access a char array and change lower case letters to upper case, and vice versa and Convert a String In C++ To Upper Case for examples in C with intrinsics and x86 asm.
All of this has nothing to do with C operators and implicit conversion from boolean to integer.
In C and C++, operators return a boolean true/false condition, which in asm for most machines for scalar code (not auto-vectorized) maps to a bit in a flag register.
Converting that to an integer in a register is a totally separate thing.
But fun fact: MIPS doesn't have a flags register: it has some compare-and-branch instructions for simple conditions like reg == reg or reg != reg (beq and bne). And branch on less-than-zero (branch on the sign bit of one register): bltz $reg, target.
(And there's an architectural $zero register that always reads as zero, so you can use it to implement branch if reg != 0 or reg == 0.)
For more complex conditions, you use slt (set on less-than) or sltu (set on less-than-unsigned) to compare into an integer register. Like slt $t4, $t1, $t0 implements t4 = t1 < t0, producing a 0 or 1. Then you can branch on that being 0 or not, or combine multiple conditions with boolean AND / OR before branching on that. If one of your inputs is an actual bool that's already 0 or 1, it can be optimized into this without an slt.
Incomplete instruction listing of classic MIPS instructions (not including pseudo-instructions like blt, which assemble to slt into $at + bne): http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
But MIPS32r6 / MIPS64r6 changed this: instructions generating truth values now generate all zeroes or all ones instead of just clearing/setting the 0-bit, according to https://en.wikipedia.org/wiki/MIPS_architecture#MIPS32/MIPS64_Release_6. MIPS32/64 r6 is not binary compatible with previous MIPS ISAs, it also rearranged some opcodes. And because of this change, not even asm source compatible! But it's a definite change for the better.
Fun fact, there is an undocumented 8086 SALC instruction (set AL from carry) that's still supported in 16/32-bit mode by modern Intel (and AMD?) CPUs.
It's basically like sbb al,al without setting flags: AL = CF ? -1 : 0. http://os2museum.com/wp/undocumented-8086-opcodes.
Subtract-with-borrow with the same input twice does x-x - CF on x86, where CF is a borrow for subtraction. And x-x is of course always zero. (On some other ISAs, like ARM, the carry flag meaning is opposite for subtraction, C set means "no borrow".)
In general, you can do sbb edx,edx (or any register you want) to convert CF into a 0 / -1 integer. But this only works for CF; the carry flag is special and there's nothing equivalent for other flags.
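A sketch of that idiom with GCC inline asm, assuming x86 or x86-64 (with a plain C fallback elsewhere): cmp sets CF when the unsigned subtraction a - b borrows, i.e. when a < b, and sbb of a register with itself turns CF into 0 or -1. below_mask is a hypothetical name.

```c
#include <stdint.h>

/* Returns -1 (all bits set) if a < b as unsigned integers, else 0. */
int32_t below_mask(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
    int32_t r;
    __asm__ ("cmpl %2, %1\n\t"  /* computes a - b; CF = borrow = (a < b) */
             "sbbl %0, %0"       /* r = r - r - CF = -CF */
             : "=r" (r)
             : "r" (a), "r" (b)
             : "cc");
    return r;
#else
    return -(int32_t)(a < b);
#endif
}
```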
Some AMD CPUs even recognize sbb same,same as independent of the old value of the register, only dependent on CF, like xor-zeroing. On other CPUs it still has the same architectural effect, but with a microarchitectural false dependency on the old value of EDX.
static inline void *__memset(void *s, char c, size_t n) {
int d0, d1;
asm volatile (
"rep; stosb;"
: "=&c" (d0), "=&D" (d1)
: "0" (n), "a" (c), "1" (s)
: "memory");
return s;
}
What are "d0" and "d1" used for? Could you please explain all of the code? Thank you!
You need to understand GCC's extended inline asm format:
The first part is the actual assembly. In this case it's a single rep stosb, written as two statements (rep; stosb).
The second part specifies output constraints and the third part specifies input constraints. The fourth part specifies that the assembly will clobber memory.
Output
"=&c" associates d0 with the ecx register and marks it as write-only ("="). The & marks it as early-clobber: it may be written before all the input operands have been consumed.
"=&D" means the same thing, for the edi register
Input
"0" (n) associates n with the same location as operand 0 (d0). In your case, that's ecx
"a" (c) associates c with eax
"1" (s) associates s with the same location as operand 1 (d1), i.e. edi
Assembly
So there you have it: repeat ecx times (n times): store al (the low byte of c) into the byte at edi (s), then increment edi.
So then, why the unused d0 and d1? I'm not sure. I too think they look useless in this case and that the whole output section could be left empty, BUT I don't think it's possible to specify "writable" and "early-clobbered" in the input constraints. So I think d0 and d1 are there to make & possible.
I would try writing it like this:
asm volatile (
"rep\n"
"stosb\n"
:
: "c" (n), "a" (c), "D" (s)
: "%ecx", "%edi", "memory"
);
What are "d0" and "d1" used for?
In effect, it says that the final values of %ecx, %edi (assuming 32-bit) are stored in d0, d1 respectively. This serves a couple of purposes:
It lets the compiler know that, as outputs, these registers are effectively clobbered. By assigning them to temporary variables, an optimizing compiler also knows that there is no need to actually perform the 'store' operation.
The "=&" specifies these as early-clobber operands: they may be written before all the inputs are consumed. So if the compiler is free to choose a register for an input, it won't pick either of these two.
This isn't technically necessary for %ecx, since it's explicitly named as an input: "0" (n) - the 'rep' count in this case. I'm not sure it's necessary for %edi either, since it can't be updated before the input "1" (s) is consumed, and the instruction executed. And again, as it's explicitly named as an input, the compiler isn't free to choose another register. In short, "=&" doesn't hurt here, but it doesn't do anything.
As "a" (c) specifies an input-only register %eax set to (c), the compiler may assume that %eax still holds this value after the 'asm', which is indeed the case with "rep; stosb;".
"memory" specifies that memory can be modified in a way unknown to the compiler, which is true in this case: it's setting (n) bytes starting at (s) to the value (c), assuming the direction flag is cleared, which it should be. This does force values to be reloaded, as the compiler can no longer assume that registers still reflect the memory values they were loaded from. It doesn't hurt, and it may be necessary to make it safe for a general-case memset, but it's often overkill.
Edit: Input operands may not overlap clobber operands. It doesn't make sense to specify something as input-only and clobbered. I don't think the compiler allows this, and it wouldn't be wise to use an ambiguous specification even if it did. From the manual:
You may not write a clobber description in a way that overlaps with an input or output operand. For example, you may not have an operand describing a register class with one member if you mention that register in the clobber list.
Reviewing some old answers, I thought I would add a link to the excellent Lockless GCC inline ASM tutorial. The article builds on prior sections, unlike the gcc manual which is best described as a 'reference', and not really suited to any sort of structured learning.
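For comparison, a sketch of the same function using read-write ("+") operands instead of dummy outputs, assuming GCC-style inline asm on x86 or x86-64 (with a plain C loop elsewhere). "+c" and "+D" tell the compiler that rep stosb both reads and modifies those registers, which is the job d0 and d1 do in the original, and nothing is listed as both an input and a clobber. Like the original, it assumes the direction flag is clear, which the usual x86 ABIs guarantee at function calls. memset_stosb is a hypothetical name.

```c
#include <stddef.h>

static inline void *memset_stosb(void *s, int c, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *dst = s;  /* copy: rep stosb advances the destination register */
    __asm__ volatile ("rep stosb"
                      : "+c" (n), "+D" (dst)  /* count and pointer are both modified */
                      : "a" (c)               /* AL holds the fill byte */
                      : "memory");            /* writes memory the compiler can't see */
#else
    unsigned char *p = s;
    while (n--)
        *p++ = (unsigned char)c;
#endif
    return s;
}
```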
I'm trying to call a function from within ASM. I know how to call it, but I'm having trouble finding out how to get the return value of this function. An example follows:
C code:
int dummy() {
return 5;
}
(N)ASM code:
dummyFunction:
call dummy
;grab return into eax
inc eax ; eax should be 6 now
ret
Any ideas?
The return value is in eax. If you've called a C function from asm, you can read the return value from eax. If you're trying to return from an asm function to C, store the intended return value in eax.
Things get a little bit more complicated for returning floating point values, long long values, or structures, so ask if you need that and someone (maybe me) will help you.
Although the answers are sufficient to answer the OP's question, here's an extract covering most cases from DJGPP's manpage for completeness:
Return Value
Integers (of any size up to 32 bits) and pointers are returned in the %eax register.
Floating point values are returned in the 387 top-of-stack register, st(0).
Return values of type long long int are returned in %edx:%eax (the most significant word in %edx and the least significant in %eax).
Returning a structure is complicated and rarely useful; try to avoid it. (Note that this is different from returning a pointer to a structure.)
If your function returns void (e.g. no value), the contents of these registers are not used.
It depends on the platform and the calling convention, but on 32-bit x86 the return value is usually already in eax if it's an integer type or pointer, and in the x87 floating-point register st(0) if it's a floating-point type. On x86-64 (System V and Windows), integers come back in rax/eax and floating-point values in xmm0 instead.
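To tie this back to the question, a sketch of the NASM routine rewritten as GCC top-level asm in AT&T syntax, assuming x86-64 ELF (Linux and similar), with a C fallback elsewhere. After call dummy, the int result is already in %eax, so incl %eax is all it takes. The subq/addq pair is an addition not in the question: it keeps %rsp 16-byte aligned at the call, as the x86-64 System V ABI requires.

```c
/* The C function from the question. */
int dummy(void)
{
    return 5;
}

int dummyFunction(void);  /* implemented in assembly below (or in C as a fallback) */

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    ".globl dummyFunction\n"
    ".type dummyFunction, @function\n"
    "dummyFunction:\n"
    "    subq $8, %rsp\n"   /* realign the stack for the call */
    "    call dummy\n"      /* int return value arrives in %eax */
    "    incl %eax\n"       /* %eax = 6 */
    "    addq $8, %rsp\n"
    "    ret\n"
    ".size dummyFunction, .-dummyFunction\n"
    ".popsection\n");
#else
int dummyFunction(void)
{
    return dummy() + 1;
}
#endif
```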
Example:
int main(void)
{
int x = 10, y;
asm ("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y) /* y is output operand */
:"r"(x) /* x is input operand */
:"%eax"); /* %eax is clobbered register */
}
What is r(y)?
Also, why is %% used before eax? Normally a single % is used, right?
Okay, this is GCC inline assembler, which is very powerful but difficult to understand.
First off, the % char is a special char. It lets you define register and number placeholders (more on this later). Unfortunately, % is also used as part of a register name (such as %EAX), so in GCC inline assembler you have to use two percent chars if you want to name a register.
%0, %1, %2 (etc.) are placeholders for the input and output operands. These are defined in the lists that follow the assembler string.
In your example %0 becomes a placeholder for y, and %1 becomes a placeholder for x. The compiler will make sure the variables will be in the registers for input operands before the asm-code gets executed, and it will make sure the output operand will get written to the variable specified in the output operand list.
Now you should get an idea what r(y) is: "=r"(y) is an output operand (the = marks it as output) that reserves a register for the variable y and assigns it to the placeholder %0, because it is the first operand listed after the inline assembler string. Likewise, "r"(x) is an input operand in a register, assigned to %1.
There are lots of other constraint types. m lets you specify a memory location, and i is used for immediate integer constants. You'll find them all listed in the GCC documentation.
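A sketch that uses the operand kinds described above, assuming GCC-style inline asm on x86 or x86-64 (plain C elsewhere): %0 is the output y, %1 the register input x, and %2 an "i" (immediate) constant, which GCC prints as $42. add_const is a hypothetical name.

```c
/* Returns x + 42, computed in inline asm. */
int add_const(int x)
{
    int y;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %1, %0\n\t"   /* y = x    (%0 = y, %1 = x) */
             "addl %2, %0"        /* y += 42  (%2 = $42) */
             : "=r" (y)           /* output: any register, write-only */
             : "r" (x), "i" (42)  /* inputs: a register and an immediate */
             : "cc");             /* addl modifies the flags */
#else
    y = x + 42;
#endif
    return y;
}
```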
Then there is the clobber list. This list is important! It lists all registers, flags, memory locations, etc. that get modified in your assembler code (such as EAX in your example). If you get this wrong, the optimizer will not know what has been modified, and you are very likely to end up with code that doesn't work.
Your example is, by the way, almost pointless. It just copies x (already loaded into a register) into EAX. Afterwards EAX gets copied into another register, which then becomes your y variable. So all it does is a simple assignment:
y = x;
A last thing: if you have worked with Intel-style assembler before, you have to read the operands backwards. In AT&T syntax the source operand is the one right after the instruction name, and the destination operand is the one after the comma. Compared to Intel syntax, this is exactly the other way around.
Try this tutorial. It covers everything you ask: for example, try section 6 - it explains constraints quite well, and what the "=" sign is for. Even the concept of clobbered registers is covered (section 5.3).
The lines with "r" or "=r" are operand constraints. The "=" means output operand. Essentially, this:
:"=r"(y)
:"r"(x)
means that %0 (ie: the first operand) corresponds to y and is for output, and %1 (the second operand) corresponds to x.
A single % is normally used in AT&T syntax assembly, but for inline assembly the single % is used for operand references (eg: %0, %1) while a double % is used for literal register references. Think of it like the way you have to use a double % in a printf format if you want a literal % in the output.
A clobbered register is a register whose value will be modified by the assembly code. As you can see from the code, eax is written to. You need to tell gcc about this so that it knows that the compiled code can't keep anything it needs for later in eax when it's about to invoke this assembly.
I can't answer all of this, but a clobbered register is one that will get used somewhere in the computation in a way that will destroy its current value. So if the caller wants to use the current value later, it needs to save it somehow.
In asm directives like this, when you write the assembly you figure out which registers are going to be clobbered by it; you then tell the compiler this (as shown in your example), and the compiler does what it has to do to preserve the current value of that register if necessary. The compiler knows a lot about how values in registers and elsewhere will be used for later computations, but it usually can't analyse embedded assembly. So you do the analysis yourself and the compiler uses the clobbering information to safely incorporate the assembly into its optimisation choices.