What does equals sign g "=g" in GCC inline assembly mean / do? - c

I'm not sure what this inline assembly does:
asm ("mov %%esp, %0" : "=g" (esp));
especially the : "=g" (esp) part.

"=g" (esp) defines an output for the inline assembly. The g tells the compiler that it can use any general register, or memory, to store the result. The (esp) means that the result will be stored in the c variable named esp. mov %%esp, %0 is the assembly command, which simply moves the stack pointer into the 0th operand (the output). Therefore, this assembly simply stores the stack pointer in the variable named esp.

If you want the gory details, read the GCC documentation on Extended Asm.
The short answer is that this moves the x86 stack pointer (%esp register) into the C variable named "esp". The "=g" tells the compiler what sorts of operands it can substitute for the %0 in the assembly code. (In this case, it is a "general operand", which means pretty much any register or memory reference is allowed.)
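As a minimal sketch of the same idea wrapped in a complete function, assuming GCC or Clang on x86: the helper name read_stack_pointer is hypothetical, and the 64-bit branch uses %rsp because the 32-bit %esp cannot be moved into a 64-bit operand.

```c
#include <stdint.h>

/* Hypothetical helper: returns the stack pointer at the point of the asm. */
static inline uintptr_t read_stack_pointer(void)
{
    uintptr_t sp;
#if defined(__x86_64__)
    /* "=g": the compiler may pick a register or a memory slot for %0. */
    __asm__ volatile ("mov %%rsp, %0" : "=g"(sp));
#elif defined(__i386__)
    __asm__ volatile ("mov %%esp, %0" : "=g"(sp));
#else
#error "this sketch is x86-only"
#endif
    return sp;
}
```

Compiling with gcc -S shows which operand the compiler substituted for %0.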

Related

Is output always determined by the %eax register in inline assembly in C?

I was reading tutorials regarding inline assembly within C, and they tried a simple variable assignment with
int a=10, b;
asm ("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
which made sense to me (move input into eax then move eax to output). But when I removed the movl %%eax, %0 line (which is supposed to move the proper value to the output), the variable b was still assigned the proper value from the inline assembly.
My main question is how does the output 'know' to read from this %eax register?
An inline-assembly statement is not a function call.
The "return in EAX" thing is for functions; it's part of the calling convention that lets compilers make code that can interact with other code even when they're compiled separately. A calling convention is defined as part of an ABI doc.
As well as defining how to return (e.g. small non-FP objects in EAX, floating point in XMM0 or ST0), they also define where callers put args, and which registers you can use without saving/restoring (call-clobbered) and which you can't (call-preserved). See https://en.wikipedia.org/wiki/Calling_convention in general, and https://www.agner.org/optimize/calling_conventions.pdf for more about x86 calling conventions.
This rigid set of rules doesn't apply to inline asm because it doesn't have to: the compiler necessarily sees the asm statement as part of the surrounding C code, and forcing a fixed convention on it would defeat the whole point of inline. Instead, in GNU C inline asm you write operands / constraints that describe the asm to the compiler, effectively creating a custom calling convention for each asm statement. (With parts of that convention left up to the compiler's choice for "=r" outputs. Use "=a" if you want to force it to pick AL/AX/EAX/RAX.)
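A small sketch of the difference between "=a" and "=r", assuming GCC or Clang on x86. Both function names are hypothetical. The first template hard-codes EAX, so the constraint has to say so; the second lets the compiler choose the register and substitutes it for %0.

```c
#if !(defined(__x86_64__) || defined(__i386__))
#error "this sketch is x86-only"
#endif

/* The template writes EAX directly, so the output must be pinned with "=a". */
static inline int fixed_register_output(void)
{
    int v;
    __asm__ ("movl $42, %%eax" : "=a"(v));
    return v;
}

/* The template writes %0, so any general-purpose register will do ("=r"). */
static inline int compiler_chosen_output(void)
{
    int v;
    __asm__ ("movl $42, %0" : "=r"(v));
    return v;
}
```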
If you want to write asm that returns in EAX without having to tell the compiler about it, write a stand-alone function. (e.g. in a .s file, or an asm("") statement as the body of an __attribute__((naked)) C function. Either way you have to write the ret yourself and get args via the calling convention, too.)
Falling off the end of a non-void function after running an asm statement that leaves a value in EAX may appear to work with optimization disabled, but it's totally unsafe and will break as soon as you enable optimization and the compiler inlines it.
My main question is how does the output 'know' to read from this %eax register?
It probably just happened to pick EAX for the "=r" output when you compiled with optimization disabled. EAX is always GCC's first choice for evaluating expressions. Look at the compiler-generated asm output (gcc -S -fverbose-asm) to see what asm it generated around your asm, and which register it substituted into your asm template. You probably have mov %eax, %eax ; mov %eax, %eax.
Using mov as the first or last instruction of an asm template almost always means you're doing it wrong and should have used better constraints to tell the compiler where to put or where to find your data.
e.g. asm("" : "=r"(b) : "0"(a)) will make the compiler put the input into the same register as it's expecting the output operand. So that copies a value. (And forces the compiler to materialize it in a register, and forget anything it knows about the current value, defeating constant-propagation and value range optimizations, as well as stopping the compiler from optimizing away that temporary entirely.)
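Wrapped in a function, the matching-constraint copy might look like the sketch below. The name copy_through_register is made up for illustration; the template is empty, so the only work done is whatever the compiler emits to satisfy the constraints.

```c
/* Hypothetical helper: copies a to b purely through operand constraints. */
static inline int copy_through_register(int a)
{
    int b;
    /* "0" ties the input to the same register the compiler picked for %0. */
    __asm__ ("" : "=r"(b) : "0"(a));
    return b;
}
```

Because the asm is opaque to the optimizer, a call such as copy_through_register(10) is not constant-folded, which is exactly the cost described above.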
Why does issuing empty asm commands swap variables? describes that happening by chance, same as your case with the compiler picking the same reg for input and output "r" operands. It also illustrates using asm comments inside the asm template to print out what the compiler chose for any %0 or %1 operands you don't otherwise reference explicitly.
See also segmentation fault(core dumped) error while using inline assembly for more about the basics of using input and output constraints.
Also related: What happens to registers when you manipulate them using asm code in C++? for another example and writeup of how compilers handle register in GNU C inline asm statements.

GCC Inline-Assembly Error: "Operand size mismatch for 'int'"

first, if somebody knows a function of the Standard C Library, that prints
a string without looking for a binary zero, but requires the number of characters to draw, please tell me!
Otherwise, I have this problem:
void printStringWithLength(char *str_ptr, int n_chars){
asm("mov 4, %rax");//Function number (write)
asm("mov 1, %rbx");//File descriptor (stdout)
asm("mov $str_ptr, %rcx");
asm("mov $n_chars, %rdx");
asm("int 0x80");
return;
}
GCC tells the following error to the "int" instruction:
"Error: operand size mismatch for 'int'"
Can somebody tell me the issue?
There are a number of issues with your code. Let me go over them step by step.
First of all, the int $0x80 system call interface is for 32 bit code only. You should not use it in 64 bit code as it only accepts 32 bit arguments. In 64 bit code, use the syscall interface. The system calls are similar but some numbers are different.
Second, in AT&T assembly syntax, immediates must be prefixed with a dollar sign. So it's mov $4, %rax, not mov 4, %rax. The latter would attempt to move the content of address 4 to rax which is clearly not what you want.
Third, you can't just refer to the names of automatic variables in inline assembly. You have to tell the compiler what variables you want to use using extended assembly if you need any. For example, in your code, you could do:
asm volatile("mov $1, %%eax; mov $1, %%edi; mov %0, %%rsi; mov %1, %%edx; syscall" // write is system call 1 on x86-64
:: "r"(str_ptr), "r"(n_chars) : "rdi", "rsi", "rdx", "rax", "rcx", "r11", "memory");
Fourth, gcc is an optimizing compiler. By default it assumes that inline assembly statements are like pure functions, that the outputs are a pure function of the explicit inputs. If the output(s) are unused, the asm statement can be optimized away, or hoisted out of loops if run with the same inputs.
But a system call like write has a side-effect you need the compiler to keep, so it's not pure. You need the asm statement to run the same number of times and in the same order as the C abstract machine would. asm volatile will make this happen. (An asm statement with no outputs is implicitly volatile, but it's good practice to make it explicit when the side effect is the main purpose of the asm statement. Plus, we do want to use an output operand to tell the compiler that RAX is modified, as well as being an input, which we couldn't do with a clobber.)
You do always need to accurately describe your asm's inputs, outputs, and clobbers to the compiler using Extended inline assembly syntax. Otherwise you'll step on the compiler's toes (it assumes registers are unchanged unless they're outputs or clobbers). (Related: How can I indicate that the memory *pointed* to by an inline ASM argument may be used? shows that a pointer input operand alone does not imply that the pointed-to memory is also an input. Use a dummy "m" input or a "memory" clobber to force all reachable memory to be in sync.)
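To illustrate the dummy "m" input mentioned above, here is a minimal sketch assuming GCC or Clang on x86. The function name load_via_asm is hypothetical. The template only uses the pointer in %1, while the extra "m"(*p) operand is never referenced and exists only to tell the compiler that the pointed-to int is read.

```c
#if !(defined(__x86_64__) || defined(__i386__))
#error "this sketch is x86-only"
#endif

/* Hypothetical helper: loads *p inside the asm template. */
static inline int load_via_asm(const int *p)
{
    int v;
    __asm__ ("movl (%1), %0"
             : "=r"(v)
             : "r"(p),      /* the pointer value itself */
               "m"(*p));    /* dummy operand: the memory behind it is an input */
    return v;
}
```

Without the dummy operand (or a "memory" clobber), the compiler would be free to assume the asm does not depend on a store to *p made just before it.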
You should simplify your code by not writing your own mov instructions to put data into registers but rather letting the compiler do this. For example, your assembly becomes:
ssize_t retval;
asm volatile ("syscall" // note only 1 instruction in the template
: "=a"(retval) // RAX gets the return value
: "a"(SYS_write), "D"(STDOUT_FILENO), "S"(str_ptr), "d"((size_t)n_chars) // widen the count to fill all of RDX
: "memory", "rcx", "r11" // syscall destroys RCX and R11
);
where SYS_write is defined in <sys/syscall.h> and STDOUT_FILENO in <unistd.h>. I am not going to explain all the details of extended inline assembly to you. Using inline assembly in general is usually a bad idea. Read the documentation if you are interested. (https://stackoverflow.com/tags/inline-assembly/info)
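Put together as a complete function, the constraint-only version could look like this sketch. It assumes Linux on x86-64; the wrapper name raw_write and its negative-errno return convention are choices made here for illustration, mirroring what the raw system call itself returns.

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* STDOUT_FILENO, for callers */

#if !(defined(__linux__) && defined(__x86_64__))
#error "this sketch assumes Linux on x86-64"
#endif

/* Hypothetical wrapper: returns the byte count, or a negative errno value. */
static inline ssize_t raw_write(int fd, const void *buf, size_t len)
{
    ssize_t ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                       /* RAX: return value */
                      : "a"((long)SYS_write),           /* RAX: call number  */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");        /* destroyed by syscall */
    return ret;
}
```

A call such as raw_write(STDOUT_FILENO, str_ptr, n_chars) then replaces the whole original function body.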
Fifth, you should avoid using inline assembly when you can. For example, to do system calls, use the syscall function from unistd.h:
syscall(SYS_write, STDOUT_FILENO, str_ptr, (size_t)n_chars);
This does the right thing. But it doesn't inline into your code, so use wrapper macros from MUSL for example if you want to really inline a syscall instead of calling a libc function.
Sixth, always check if the system call you want to call is already available in the C standard library. In this case, it is, so you should just write
write(STDOUT_FILENO, str_ptr, n_chars);
and avoid all of this altogether.
Seventh, if you prefer to use stdio, use fwrite instead:
fwrite(str_ptr, 1, n_chars, stdout);
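For the stdio route, a minimal sketch with the return value checked might look like this. The helper name print_counted is hypothetical; fwrite returns the number of items written, so anything other than n signals a short write.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes exactly n bytes of s, with no need for a '\0'. */
static bool print_counted(FILE *out, const char *s, size_t n)
{
    return fwrite(s, 1, n, out) == n;
}
```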
There are so many things wrong with your code (and so little reason to use inline asm for it) that it's not worth trying to actually correct all of them. Instead, use the write(2) system call the normal way, via the POSIX function / libc wrapper as documented in the man page, or use ISO C <stdio.h> fwrite(3).
#include <unistd.h>
static inline
void printStringWithLength(const char *str_ptr, int n_chars){
write(1, str_ptr, n_chars);
// TODO: check error return value
}
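One way to fill in the error-checking TODO is sketched below. The name write_all and its 0 / -1 return convention are assumptions made for illustration; the loop handles the two cases write(2) documents, a short write and an interruption by a signal.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 once all n bytes are written, -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any byte was written */
            return -1;             /* errno describes the failure */
        }
        buf += w;                  /* short write: advance and try the rest */
        n -= (size_t)w;
    }
    return 0;
}
```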
Why your code doesn't assemble:
In AT&T syntax, immediates always need a $ decorator. Your code will assemble if you use asm("int $0x80").
The assembler is complaining about 0x80, which is a memory reference to the absolute address 0x80. There is no form of int that takes the interrupt vector as anything other than an immediate. I'm not sure exactly why it complains about the size, since memory references don't have an implied size in AT&T syntax.
That will get it to assemble, at which point you'll get linker errors:
In function `printStringWithLength':
5 : <source>:5: undefined reference to `str_ptr'
6 : <source>:6: undefined reference to `n_chars'
collect2: error: ld returned 1 exit status
(from the Godbolt compiler explorer)
mov $str_ptr, %rcx
means to mov-immediate the address of the symbol str_ptr into %rcx. In AT&T syntax, you don't have to declare external symbols before using them, so unknown names are assumed to be global / static labels. If you had a global variable called str_ptr, that instruction would reference its address (which is a link-time constant, so can be used as an immediate).
As others have said, this is completely the wrong way to go about things with GNU C inline asm. See the inline-assembly tag wiki for more links to guides.
Also, you're using the wrong ABI. int $0x80 is the x86 32-bit system call ABI, so it doesn't work with 64-bit pointers. See: What are the calling conventions for UNIX & Linux system calls on x86-64
See also the x86 tag wiki.

How to specify clobbered bottom of the x87 FPU stack with extended gcc assembly?

In a codebase of ours I found this snippet for fast, towards-negative-infinity[1] rounding on x87:
inline int my_int(double x)
{
int r;
#ifdef _GCC_
asm ("fldl %1\n"
"fistpl %0\n"
:"=m"(r)
:"m"(x));
#else
// ...
#endif
return r;
}
I'm not extremely familiar with GCC extended assembly syntax, but from what I gather from the documentation:
r must be a memory location, where I'm writing back stuff;
x must be a memory location too, whence the data comes from.
there's no clobber specification, so the compiler can rest assured that at the end of the snippet the registers are as it left them.
Now, to come to my question: it's true that in the end the FPU stack is balanced, but what if all the 8 locations were already in use and I'm overflowing it? How can the compiler know that it cannot trust ST(7) to be where it left it? Should some clobber be added?
Edit I tried to specify st(7) in the clobber list and it seems to affect the codegen, now I'll wait for some confirmation of this fact.
As a side note: looking at the implementation of the barebones lrint both in glibc and in MinGW I see something like
__asm__ __volatile__ ("fistpl %0"
: "=m" (retval)
: "t" (x)
: "st");
where we are asking for the input to be placed directly in ST(0) (which avoids that potentially useless fldl); what is that "st" clobber? The docs seems to mention only t (i.e. the top of the stack).
[1] Yes, it depends on the current rounding mode, which in our application should always be "towards negative infinity".
looking at the implementation of the barebones lrint both in glibc and in MinGW I see something like
__asm__ __volatile__ ("fistpl %0"
: "=m" (retval)
: "t" (x)
: "st");
where we are asking for the input to be placed directly in ST(0) (which avoids that potentially useless fldl)
This is actually the correct way to represent the code you want as inline assembly.
To get the best possible code generated, you want to make use of the inputs and outputs. Rather than hard-coding the necessary load/store instructions, let the compiler generate them. Not only does this introduce the possibility of eliding potentially unnecessary instructions, it also means that the compiler can better schedule these instructions when they are required (that is, it can interleave the instruction within a prior sequence of code, often minimizing its cost).
what is that "st" clobber? The docs seems to mention only t (i.e. the top of the stack).
The "st" clobber refers to the st(0) register, i.e., the top of the x87 FPU stack. What Intel/MASM notation calls st(0), AT&T/GAS notation generally refers to as simply st. And, as per GCC's documentation for clobbers, the items in the clobber list are "either register names or the special clobbers" ("cc" (condition codes/flags) and "memory"). So this just means that the inline assembly clobbers (overwrites) the st(0) register. The reason why this clobber is necessary is that the fistpl instruction pops the top of the stack, thus clobbering the original contents of st(0).
The only thing that concerns me regarding this code is the following paragraph from the documentation:
Clobber descriptions may not in any way overlap with an input or output operand. For example, you may not have an operand describing a register class with one member when listing that register in the clobber list. Variables declared to live in specific registers (see Explicit Register Variables) and used as asm input or output operands must have no part mentioned in the clobber description. In particular, there is no way to specify that input operands get modified without also specifying them as output operands.
When the compiler selects which registers to use to represent input and output operands, it does not use any of the clobbered registers. As a result, clobbered registers are available for any use in the assembler code.
As you already know, the t constraint means the top of the x87 FPU stack. The problem is, this is the same as the st register, and the documentation very clearly said that we could not have a clobber that specifies the same register as one of the input/output operands. Furthermore, since the documentation states that the compiler is forbidden to use any of the clobbered registers to represent input/output operands, this inline assembly makes an impossible request—load this value at the top of the x87 FPU stack without putting it in st!
Now, I would assume that the authors of glibc know what they are doing and are more familiar with the compiler's implementation of inline assembly than you or I, so this code is probably legal and legitimate.
Actually, it seems that the unusual case of the x87's stack-like registers forces an exception to the normal interactions between clobbers and operands. The official documentation says:
On x86 targets, there are several rules on the usage of stack-like registers in the operands of an asm. These rules apply only to the operands that are stack-like registers:
Given a set of input registers that die in an asm, it is necessary to know which are implicitly popped by the asm, and which must be explicitly popped by GCC.
An input register that is implicitly popped by the asm must be explicitly clobbered, unless it is constrained to match an output operand.
That fits our case exactly.
Further confirmation is provided by an example appearing in the official documentation (bottom of the linked section):
This asm takes two inputs, which are popped by the fyl2xp1 opcode, and replaces them with one output. The st(1) clobber is necessary for the compiler to know that fyl2xp1 pops both inputs.
asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");
Here, the clobber st(1) is the same as the input constraint u, which seems to violate the above-quoted documentation regarding clobbers, but is used and justified for precisely the same reason that "st" is used as the clobber in your original code, because fistpl pops the input.
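Applied to the function from the question, the "t" input plus "st" clobber form could be sketched as follows, assuming GCC or Clang on x86 with x87 support enabled. The name x87_round_to_int is hypothetical, and the result depends on whatever rounding mode the x87 control word currently holds.

```c
#if !(defined(__x86_64__) || defined(__i386__))
#error "this sketch is x86-only"
#endif

/* Hypothetical helper: converts x using the current x87 rounding mode. */
static inline int x87_round_to_int(double x)
{
    int r;
    __asm__ volatile ("fistpl %0"
                      : "=m"(r)    /* result is stored straight to memory  */
                      : "t"(x)     /* compiler loads x into st(0) for us   */
                      : "st");     /* fistpl pops st(0), so it is clobbered */
    return r;
}
```

With the default round-to-nearest-even mode this gives 2 for 2.5 and 4 for 3.5; the question's application would set the mode to round down first.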
All of that said, and now that you know how to correctly write the code in inline assembly, I have to echo previous commenters who suggested that the best solution would be not to use inline assembly at all. Just call lrint, which not only has the exact semantics that you want, but can also be better optimized by the compiler under certain circumstances (e.g., transforming it into a single cvtsd2si instruction when the target architecture supports SSE).

Accessing CPU Registers from C

Recently I have been playing around with inline assembly in C, and was wondering if I could directly access a register from a variable
Something like this:
volatile uint64_t* flags = RFLAGS;
Where RFLAGS is the CPUs flags register. Obviously, the above code doesn't compile, but I was wondering if there was a similar way to achieve the desired result.
Compiling for Ubuntu x86_64 with gcc
You can obtain the value of the flags register via inline asm, but this operation is not useful because you have no control over sequencing of the access with respect to other operations. In particular, what you likely want is for the flags resulting from some particular arithmetic operation to be available at the beginning of your asm block, but there is no way in which to express that constraint to the compiler. For example, suppose you wrote:
z = x + y;
__asm__ ( "pushf ; pop %0" : "=r"(flags) );
You might expect the flags resulting from the addition to be available. However, the compiler may have chosen to:
reorder the arithmetic after the asm, since neither has a result that depends on the other.
adjust the stack pointer with add/sub in between, clobbering the flags.
use lea instead of add to implement the addition, producing no flags.
omit the addition entirely based on determination that the result is not used.
etc.
The same principle applies for accessing any register that might be modified by the code the compiler generates. There is a syntax (in GCC/"GNU C") for accessing registers not subject to this issue; it looks like:
register int var __asm__("regname");
where regname is replaced by the name of the register. This is largely useless on most targets, but it can allow you to control register usage for input/output constraints to asm, and some targets have special values kept permanently in general purpose registers (the thread-local-storage pointer is the most common) which could be useful in some situations.
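A minimal sketch of using such a register variable to steer an asm operand, assuming GCC or Clang on x86-64. The function name and the choice of RCX are arbitrary and only for illustration.

```c
#if !defined(__x86_64__)
#error "this sketch assumes x86-64"
#endif

/* Hypothetical helper: forces the accumulator operand into RCX. */
static inline long add_using_rcx(long a, long b)
{
    register long acc __asm__("rcx") = a;   /* any "r" operand for acc is RCX */
    __asm__ ("add %1, %0"
             : "+r"(acc)                    /* read and written by the asm */
             : "r"(b)
             : "cc");                       /* add modifies the flags */
    return acc;
}
```

The same pattern is what makes it possible to place system call arguments in registers that have no dedicated constraint letter.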
Yes, certainly. You can PUSHF, PUSHFD or PUSHFQ the flags and then pop them into another register. For example:
unsigned int flags;
__asm{
pushfd
pop edx
mov flags, edx
}
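For gcc under Ubuntu using AT&T syntax, you may find the following more immediately usable:
unsigned long flags; // pushf pushes the full 64-bit RFLAGS on x86-64
// caveat: push/pop writes below RSP, which can overwrite the x86-64 red zone
__asm__ volatile("pushf\n\t"
"pop %0"
: "=r"(flags));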
From there you can view them at your leisure!
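If the goal is the flags produced by one particular instruction, a more reliable approach is to put that instruction inside the asm and ask for the flag as an output. The sketch below assumes a GCC or Clang version that supports flag output operands ("=@ccc" for the carry flag) on x86; the function name is hypothetical.

```c
#include <stdint.h>

#ifndef __GCC_ASM_FLAG_OUTPUTS__
#error "this sketch needs flag output operand support"
#endif

/* Hypothetical helper: adds x and y, returns 1 if the addition carried out. */
static inline int add_with_carry_out(uint32_t x, uint32_t y, uint32_t *sum)
{
    uint32_t s = x;
    int carry;
    __asm__ ("addl %2, %0"
             : "+r"(s), "=@ccc"(carry)   /* carry = CF after the add */
             : "r"(y));
    *sum = s;
    return carry;
}
```

Because the add and the flag read are one asm statement, the compiler cannot reorder, replace, or drop the arithmetic in between.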

What is this code trying to do?

I'm trying to understand how the following code is working:
#define M32toX128(x128,m32) __asm__ \
("movddup %1, %0\n\t" \
"movsldup %0, %0" \
: "=&x"(x128) : "m"(m32) )
I have only basic assembly knowledge. Searching around and from the context of the program that is using it, I have understood that it is duplicating a 32-bit variable and storing the result in a 128-bit variable.
My questions are:
What do %0 and %1 refer to?
What are the colons (:) doing?
What is the actual assembly code that is executed? I mean after replacing %ns, "=&x"(x128)...
gcc inline assembly is a complicated beast, too complicated to describe here in detail.
As a quick overview, the general form of the asm block is: "template" : outputs : inputs : clobbers. You can refer to the outputs and inputs (collectively known as operands) in the template by using % followed by a zero-based index. Thus %0 refers to x128 and %1 refers to m32. For each operand, you can specify a constraint that tells the compiler how to allocate said operand. The =&x means, allocate the x128 as an early-clobber output in any available xmm register, and the m means use a memory address for the operand. See the manual for the mind-boggling details.
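The layout described above can be seen in a tiny annotated sketch, assuming GCC or Clang on x86. The function scaled_add is made up for illustration and is unrelated to the macro in the question.

```c
#if !(defined(__x86_64__) || defined(__i386__))
#error "this sketch is x86-only"
#endif

/* Hypothetical helper: computes base + index * 4 with a single lea. */
static inline long scaled_add(long base, long index)
{
    long result;
    __asm__ ("lea (%1,%2,4), %0"      /* template: %0 = %1 + %2 * 4   */
             : "=r"(result)           /* outputs:  %0                 */
             : "r"(base), "r"(index)  /* inputs:   %1 and %2          */
             );                       /* no clobbers: lea leaves flags alone */
    return result;
}
```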
The actual assembly generated will depend on the operand choices the compiler uses. You can see that if you ask for an assembly listing using the -S option. Assuming m32 is a local variable, the code may look like:
movddup x(%esp), %xmmN
movsldup %xmmN, %xmmN
Note that gcc inline assembly is of course specific to both gcc and the target architecture, so it's usually simpler to use the equivalent compiler intrinsics.
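As a sketch of the intrinsics route, the broadcast that the macro performs can be written with a single SSE2 intrinsic. This assumes the 128-bit value is treated as four integer lanes, which the question does not state; the function name broadcast_u32 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: copies one 32-bit value into all four lanes. */
static inline __m128i broadcast_u32(uint32_t v)
{
    return _mm_set1_epi32((int)v);
}
```

Unlike the movddup in the macro, which loads 64 bits from the address of m32, this reads only the 32-bit value itself and leaves instruction selection to the compiler.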
This code will be passed from GCC to the assembler stage. Some part of the macro will be replaced in the process. Here is the documentation: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
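%0 and %1 will be replaced with the operands (a register or a memory reference) that the compiler picks for the expressions you passed to the C macro: %0 for x128 and %1 for m32.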
The : is used to separate parts of the macro. The first mandatory part is the template. The second one is for output operands. See http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s5 for the full.
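In this case, you have an output (=) that must live in an SSE (xmm) register (x), which is 128 bits wide, and that is marked early-clobber (&): it may be written before the inputs have been consumed, so the compiler must not share its register with an input. m means the input is a memory operand.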

Resources