C, Inline Assembly - manual function call [duplicate] - c

I don't have experience in assembly, but this is what I've been working on. I would like input if I'm missing any fundamental aspects to passing parameters and calling a function via pointer in assembly.
For instance I'm wondering if I supposed to restore ecx, edx, esi, edi. I read they are general purpose registers, but I couldn't find if they need to be restored? Is there any kind of cleanup I am supposed to do after a call?
This is the code I have now, and it does work:
#include "stdio.h"
void foo(int a, int b, int c, int d)
{
printf("values = %d and %d and %d and %d\r\n", a, b, c, d);
}
int main()
{
int a=3,b=6,c=9,d=12;
__asm__(
"mov %3, %%ecx;"
"mov %2, %%edx;"
"mov %1, %%esi;"
"mov %0, %%edi;"
"call %4;"
:
: "g"(a), "g"(b), "g"(c), "g"(d), "a"(foo)
);
}

The original question was Is this assembly function call safe/complete?. The answer to that is: no. While it may appear to work in this simple example (especially if optimizations are disabled), you are violating rules that will eventually lead to failures (ones that are really hard to track down).
I'd like to address the (obvious) followup question of how to make it safe, but without feedback from the OP on the actual intent, I can't really do that.
So, I'll do the best I can with what we have and try to describe the things that make it unsafe and some of the things you can do about it.
Let's start by simplifying that asm:
__asm__(
"mov %0, %%edi;"
:
: "g"(a)
);
Even with this single statement, this code is already unsafe. Why? Because we are changing the value of a register (edi) without letting the compiler know.
How can the compiler not know you ask? After all, it's right there in the asm! The answer comes from this line in the gcc docs:
GCC does not parse the assembler instructions themselves and does not
know what they mean or even whether they are valid assembler input.
In that case, how do you let gcc know what's going on? The answer lies in using the constraints (the stuff after the colons) to describe the impact of the asm.
Perhaps the simplest way to fix this code would be like this:
__asm__(
"mov %0, %%edi;"
:
: "g"(a)
: "edi"
);
This adds edi to the clobber list. In brief, this tells gcc that the value of edi is going to be changed by the code, and that gcc shouldn't assume any particular value will be in it when the asm exits.
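As a small illustration of the clobber list, here is a sketch that assumes x86 (32- or 64-bit) and GCC or Clang. The function name copy_through_edi is made up for this example. It overwrites edi on purpose and says so in the clobber list, so the compiler will not keep a live value in edi across the statement and will not assign edi to either operand.

```c
/* Hypothetical helper; assumes x86 and GCC/Clang extended asm. */
int copy_through_edi(int a)
{
    int out;
    __asm__ (
        "mov %1, %%edi\n\t"   /* overwrite edi with the input */
        "mov %%edi, %0"       /* copy edi to wherever the compiler placed 'out' */
        : "=r"(out)           /* output: any general-purpose register */
        : "g"(a)              /* input: register, memory, or immediate */
        : "edi"               /* clobber: this code modifies edi */
    );
    return out;
}
```

The two mov instructions are only there to make the clobber visible; the paragraphs that follow explain why letting the compiler load the register is usually the better choice.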
Now, while that's the easiest, it's not necessarily the best way. Consider this code:
__asm__(
""
:
: "D"(a)
);
This uses a machine constraint to tell gcc to put the value of the variable a into the edi register for you. Doing it this way, gcc will load the register for you at a 'convenient' time, perhaps by always keeping a in edi.
There is one (significant) caveat to this code: By putting the parameter after the 2nd colon, we are declaring it to be an input. Input parameters are required to be read-only (ie they must have the same value on exiting the asm).
In your case, the call statement means that we won't be able to guarantee that edi won't be changed, so this doesn't quite work. There are a few ways to deal with this. The easiest is to move the constraint up after the first colon, making it an output, and specify "+D" to indicate that the value is read+write. But then the contents of a are going to be pretty much undefined after the asm (printf could set it to anything). If destroying a is unacceptable, there's always something like this:
int junk;
__asm__ volatile (
""
: "=D" (junk)
: "0"(a)
);
This tells gcc that on starting the asm, it should put the value of the variable a into the same place as output constraint #0 (ie edi). It also says that on output, edi won't be a anymore, it will contain the variable junk.
Edit: Since the 'junk' variable isn't actually going to be used, we need to add the volatile qualifier. Volatile was implicit when there weren't any output parameters.
One other point on that line: You end it with a semi-colon. This is legal and will work as expected. However, if you ever want to use the -S command line option to see exactly what code got generated (and if you want to get good with inline asm, you will), you will find that produces difficult-to-read code. I'd recommend using \n\t instead of a semi-colon.
All that and we're still on the first line...
Obviously the same would apply to the other three mov statements.
Which brings us to the call statement.
Both Michael and I have listed a number of reasons doing call in inline asm is difficult.
Handling all the registers that may be clobbered by the function call's ABI.
Handling red-zone.
Handling alignment.
Memory clobber.
If the goal here is 'learning,' then feel free to experiment. But I don't know that I would ever feel comfortable doing this in production code. Even when it looks like it works, I'd never feel confident there wasn't some weird case I'd missed. That's aside from my normal concerns about using inline asm at all.
I know, that's a lot of information. Probably more than you were looking for as an introduction to gcc's asm command, but you've picked a challenging place to start.
If you haven't done so already, spend time looking over all the docs in gcc's Assembly Language interface. There's a lot of good information there along with examples to try to explain how it all works.
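If the underlying goal is only to call a function through a pointer, plain C already does that, and the compiler then takes care of argument registers, call-clobbered registers, the red zone, and stack alignment by itself. A minimal sketch, assuming that is acceptable for the use case; sum4 and call_via_pointer are hypothetical names introduced here (the question's foo printed its arguments instead of returning a value):

```c
/* Hypothetical stand-in for the question's foo(); it returns a value so the
   result is easy to check. */
int sum4(int a, int b, int c, int d)
{
    return a + b + c + d;
}

/* Calls any function with this signature through a pointer. The compiler
   emits the indirect call and follows the platform ABI for us. */
int call_via_pointer(int (*fn)(int, int, int, int), int a, int b, int c, int d)
{
    return fn(a, b, c, d);
}
```

For example, call_via_pointer(sum4, 3, 6, 9, 12) passes the same four values the question used.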

I read they are general purpose registers, but I couldn't find if they
need to be restored?
I am not an expert in the field, but from my reading of the x86-64 ABI (Figure 3.4), the registers %rdi, %rsi, %rdx, and %rcx are not preserved across function calls, so apparently they do not need to be restored.
As David Wohlferd commented, you should be careful either way: the compiler is not aware of the "custom" function call, so you may get in its way, particularly because it does not know which registers were modified.

Related

Is output always determined by the %eax register in inline assembly in C?

I was reading tutorials regarding inline assembly within C, and they tried a simple variable assignment with
int a=10, b;
asm ("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
which made sense to me (move input into eax then move eax to output). But when I removed the movl %%eax, %0 line (which is supposed to move the proper value to the output), the variable b was still assigned the proper value from the inline assembly.
My main question is how does the output 'know' to read from this %eax register?
An inline-assembly statement is not a function call.
The "return in EAX" thing is for functions; it's part of the calling convention that lets compilers make code that can interact with other code even when they're compiled separately. A calling convention is defined as part of an ABI doc.
As well as defining how to return (e.g. small non-FP objects in EAX, floating point in XMM0 or ST0), they also define where callers put args, which registers you can use without saving/restoring (call-clobbered), and which you can't (call-preserved). See https://en.wikipedia.org/wiki/Calling_convention in general, and https://www.agner.org/optimize/calling_conventions.pdf for more about x86 calling conventions.
This rigid set of rules doesn't apply to inline asm because it doesn't have to: the compiler necessarily sees the asm statement as part of the surrounding C code, and forcing a fixed convention on it would defeat the whole point of inline. Instead, in GNU C inline asm you write operands / constraints that describe the asm to the compiler, effectively creating a custom calling convention for each asm statement. (With parts of that convention left up to the compiler's choice for "=r" outputs. Use "=a" if you want to force it to pick AL/AX/EAX/RAX.)
If you want to write asm that returns in EAX without having to tell the compiler about it, write a stand-alone function. (e.g. in a .s file, or an asm("") statement as the body of an __attribute__((naked)) C function. Either way you have to write the ret yourself and get args via the calling convention, too.)
Falling off the end of a non-void function after running an asm statement that leaves a value in EAX may appear to work with optimization disabled, but it's totally unsafe and will break as soon as you enable optimization and the compiler inlines it.
My main question is how does the output 'know' to read from this %eax register?
It probably just happened to pick EAX for the "=r" output when you compiled with optimization disabled. EAX is always GCC's first choice for evaluating expressions. Look at the compiler-generated asm output (gcc -S -fverbose-asm) to see what asm it generated around your asm, and which register it substituted into your asm template. You probably have mov %eax, %eax ; mov %eax, %eax.
Using mov as the first or last instruction of an asm template almost always means you're doing it wrong and should have used better constraints to tell the compiler where to put or where to find your data.
e.g. asm("" : "=r"(b) : "0"(a)) will make the compiler put the input into the same register as it's expecting the output operand. So that copies a value. (And forces the compiler to materialize it in a register, and forget anything it knows about the current value, defeating constant-propagation and value range optimizations, as well as stopping the compiler from optimizing away that temporary entirely.)
Why does issuing empty asm commands swap variables? describes that happening by chance, same as your case with the compiler picking the same reg for input and output "r" operands. It also illustrates using asm comments inside the asm template to print out what the compiler chose for any %0 or %1 operands you don't otherwise reference explicitly.
See also segmentation fault(core dumped) error while using inline assembly for more about the basics of using input and output constraints.
Also related: What happens to registers when you manipulate them using asm code in C++? for another example and writeup of how compilers handle register in GNU C inline asm statements.
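The empty-template copy mentioned above can be sketched as a complete function. This assumes only GCC or Clang extended asm, since the template contains no instructions; the name opaque_copy is hypothetical.

```c
/* Copies 'a' through an empty asm statement. The "0" matching constraint
   places the input in the same register as output operand 0, so no
   instruction is needed; the compiler just treats the result as unknown. */
int opaque_copy(int a)
{
    int b;
    __asm__ ("" : "=r"(b) : "0"(a));
    return b;
}
```

Because the output is tied to the input explicitly, this does not depend on which register the compiler happens to pick, unlike the accidental case in the question.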

Why can't local variable be used in GNU C basic inline asm statements?

Why cannot I use local variables from main to be used in basic asm inline? It is only allowed in extended asm, but why so?
(I know local variables are on the stack after return address (and therefore cannot be used once the function return), but that should not be the reason to not use them)
An example of basic asm:
int a = 10; //global a
int b = 20; //global b
int result;
int main() {
asm ( "pusha\n\t"
"movl a, %eax\n\t"
"movl b, %ebx\n\t"
"imull %ebx, %eax\n\t"
"movl %eax, result\n\t"
"popa");
printf("the answer is %d\n", result);
return 0;
}
example of extended:
int main (void) {
int data1 = 10; //local var - could be used in extended
int data2 = 20;
int result;
asm ( "imull %%edx, %%ecx\n\t"
"movl %%ecx, %%eax"
: "=a"(result)
: "d"(data1), "c"(data2));
printf("The result is %d\n",result);
return 0;
}
Compiled with:
gcc -m32 somefile.c
platform:
uname -a:
Linux 5.0.0-32-generic #34-Ubuntu SMP Wed Oct 2 02:06:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
You can use local variables in extended assembly, but you need to tell the extended assembly construct about them. Consider:
#include <stdio.h>
int main (void)
{
int data1 = 10;
int data2 = 20;
int result;
__asm__(
" movl %[mydata1], %[myresult]\n"
" imull %[mydata2], %[myresult]\n"
: [myresult] "=&r" (result)
: [mydata1] "r" (data1), [mydata2] "r" (data2));
printf("The result is %d\n",result);
return 0;
}
In this [myresult] "=&r" (result) says to select a register (r) that will be used as an output (=) value for the lvalue result, and that register will be referred to in the assembly as %[myresult] and must be different from the input registers (&). (You can use the same text in both places, result instead of myresult; I just made it different for illustration.)
Similarly [mydata1] "r" (data1) says to put the value of expression data1 into a register, and it will be referred to in the assembly as %[mydata1].
I modified the code in the assembly so that it only modifies the output register. Your original code modifies %ecx but does not tell the compiler it is doing that. You could have told the compiler that by putting "ecx" after a third :, which is where the list of “clobbered” registers goes. However, since my code lets the compiler assign a register, I would not have a specific register to list in the clobbered register. There may be a way to tell the compiler that one of the input registers will be modified but is not needed for output, but I do not know. (Documentation is here.) For this task, a better solution is to tell the compiler to use the same register for one of the inputs as the output:
__asm__(
" imull %[mydata1], %[myresult]\n"
: [myresult] "=r" (result)
: [mydata1] "r" (data1), [mydata2] "0" (data2));
In this, the 0 with data2 says to make it the same as operand 0. The operands are numbered in the order they appear, starting with 0 for the first output operand and continuing into the input operands. So, when the assembly code starts, %[myresult] will refer to some register that the value of data2 has been placed in, and the compiler will expect the new value of result to be in that register when the assembly is done.
When doing this, you have to match the constraint with how a thing will be used in assembly. For the r constraint, the compiler supplies some text that can be used in assembly language where a general processor register is accepted. Others include m for a memory reference, and i for an immediate operand.
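To make the constraint letters concrete, here is a short sketch assuming x86 and GCC or Clang. The function names add_five and add_from_memory are made up for illustration. The first uses an "i" operand, which is substituted into the template as an immediate; the second uses an "m" operand, which is substituted as a memory reference.

```c
/* "i": the constant is substituted as an immediate ($5 in AT&T syntax). */
int add_five(int x)
{
    __asm__ (
        "addl %[imm], %[val]"
        : [val] "+r"(x)        /* read-write, in a general-purpose register */
        : [imm] "i"(5)         /* compile-time constant */
        : "cc"                 /* addl modifies the flags */
    );
    return x;
}

/* "m": the compiler substitutes an addressing mode that refers to *p. */
int add_from_memory(int acc, const int *p)
{
    __asm__ (
        "addl %[src], %[val]"
        : [val] "+r"(acc)
        : [src] "m"(*p)
        : "cc"
    );
    return acc;
}
```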
There is little distinction between "Basic asm" and "Extended asm"; "basic asm" is just a special case where the __asm__ statement has no lists of outputs, inputs, or clobbers. The compiler does not do % substitution in the assembly string for Basic asm. If you want inputs or outputs you have to specify them, and then it's what people call "extended asm".
In practice, it may be possible to access external (or even file-scope static) objects from "basic asm". This is because these objects will (respectively may) have symbol names at the assembly level. However, to perform such access you need to be careful of whether it is position-independent (if your code will be linked into libraries or PIE executables) and meets other ABI constraints that might be imposed at linking time, and there are various considerations for compatibility with link-time optimization and other transformations the compiler may perform. In short, it's a bad idea because you can't tell the compiler that a basic asm statement modified memory. There's no way to make it safe.
A "memory" clobber (Extended asm) can make it safe to access static-storage variables by name from the asm template.
The use-case for basic asm is things that modify the machine state only, like asm("cli") in a kernel to disable interrupts, without reading or writing any C variables. (Even then, you'd often use a "memory" clobber to make sure the compiler had finished earlier memory operations before changing machine state.)
Local (automatic storage, not static ones) variables fundamentally never have symbol names, because they don't exist in a single instance; there's one object per live instance of the block they're declared in, at runtime. As such, the only possible way to access them is via input/output constraints.
Users coming from MSVC-land may find this surprising since MSVC's inline assembly scheme papers over the issue by transforming local variable references in their version of inline asm into stack-pointer-relative accesses, among other things. The version of inline asm it offers however is not compatible with an optimizing compiler, and little to no optimization can happen in functions using that type of inline asm. GCC and the larger compiler world that grew alongside C out of unix do not do anything similar.
You can't safely use globals in Basic Asm statements either; it happens to work with optimization disabled but it's not safe and you're abusing the syntax.
There's very little reason to ever use Basic Asm. Even for machine-state control like asm("cli") to disable interrupts, you'd often want a "memory" clobber to order it wrt. loads / stores to globals. In fact, GCC's https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended page recommends never using Basic Asm because it differs between compilers, and GCC might change to treating it as clobbering everything instead of nothing (because of existing buggy code that makes wrong assumptions). This would make a Basic Asm statement that uses push/pop even more inefficient if the compiler is also generating stores and reloads around it.
Basically the only use-case for Basic Asm is writing the body of an __attribute__((naked)) function, where data inputs/outputs / interaction with other code follows the ABI's calling convention, instead of whatever custom convention the constraints / clobbers describe for a truly inline block of code.
The design of GNU C inline asm is that it's text that you inject into the compiler's normal asm output (which is then fed to the assembler, as). Extended asm makes the string a template that it can substitute operands into. And the constraints describe how the asm fits into the data-flow of the program logic, as well as registers it clobbers.
Instead of parsing the string, there is syntax that you need to use to describe exactly what it does. Parsing the template for var names would only solve part of the language-design problem that operands need to solve, and would make the compiler's code more complicated. (It would have to know more about every instruction to know whether memory, register, or immediate was allowed, and stuff like that. Normally its machine-description files only need to know how to go from logical operation to asm, not the other direction.)
Your Basic asm block is broken because you modify C variables without telling the compiler about it. This could break with optimization enabled (maybe only with more complex surrounding code, but happening to work is not the same thing as actually being safe). This is why merely testing GNU C inline asm code is not even close to sufficient for it to be future-proof against new compilers and changes in surrounding code. There is no implicit "memory" clobber. (Basic asm is the same as Extended asm except for not doing % substitution on the string literal. So you don't need %% to get a literal % in the asm output. It's implicitly volatile like Extended asm with no outputs.)
Also note that if you were targeting i386 MacOS, you'd need _result in your asm. result only happens to work because the asm symbol name exactly matches the C variable name. Using Extended asm constraints would make it portable between GNU/Linux (no leading underscore) vs. other platforms that do use a leading _.
Your Extended asm is broken because you modify an input ("c") (without telling the compiler that register is also an output, e.g. an output operand using the same register).
It's also inefficient: if a mov is the first or last instruction of your template, you're almost always doing it wrong and should have used better constraints.
Instead, you can do:
asm ("imull %%edx, %%ecx\n\t"
: "=c"(result)
: "d"(data1), "c"(data2));
Or better, use "+r"(data2) and "r"(data1) operands to give the compiler free choice when doing register allocation instead of potentially forcing the compiler to emit unnecessary mov instructions. (See @Eric's answer using named operands and "=r" and a matching "0" constraint; that's equivalent to "+r" but lets you use different C names for the input and output.)
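A sketch of the "+r" variant, assuming x86 and GCC or Clang. The wrapper name mul_inline and the operand names acc and src are hypothetical; the compiler picks both registers.

```c
/* Multiplies data2 by data1 with a single imull. */
int mul_inline(int data1, int data2)
{
    __asm__ (
        "imull %[src], %[acc]"
        : [acc] "+r"(data2)    /* read-write: starts as data2, ends as the product */
        : [src] "r"(data1)     /* read-only input */
        : "cc"                 /* imull modifies the flags */
    );
    return data2;
}
```

Because the template is a single instruction that reads its inputs before writing its output, no early-clobber is needed here.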
Look at the asm output of the compiler to see how code-gen happened around your asm statement, if you want to make sure it was efficient.
Since local vars don't have a symbol / label in the asm text (instead they live in registers or at some offset from the stack or frame pointer, i.e. automatic storage), it can't work to use symbol names for them in asm.
Even for global vars, you want the compiler to be able to optimize around your inline asm as much as possible, so you want to give the compiler the option of using a copy of a global var that's already in a register, instead of getting the value in memory in sync with a store just so your asm can reload that.
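For a global, that means passing it as an operand instead of naming its symbol in the template. A minimal sketch, assuming x86 and GCC or Clang; hit_count and record_hit are hypothetical names. The "+rm" constraint lets the compiler choose between a register copy and the memory location.

```c
int hit_count = 0;   /* hypothetical global counter */

/* Increments the global through an operand, so the compiler knows exactly
   which object is read and written. */
void record_hit(void)
{
    __asm__ (
        "incl %[cnt]"
        : [cnt] "+rm"(hit_count)   /* read-write, register or memory */
        :
        : "cc"                     /* incl modifies the flags */
    );
}
```

This also avoids any dependence on how the platform spells the symbol name at the assembly level.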
Having the compiler try to parse your asm and figure out which C local var names are inputs and outputs would have been possible. (But would be a complication.)
But if you want it to be efficient, you need to figure out when x in the asm can be a register like EAX, instead of doing something braindead like always storing x into memory before the asm statement, and then replacing x with 8(%rsp) or whatever. If you want to give the asm statement control over where inputs can be, you need constraints in some form. Doing it on a per-operand basis makes total sense, and means the inline-asm handling doesn't have to know that bts can take an immediate or register source but not memory, and other machine-specific details like that. (Remember: GCC is a portable compiler; baking a huge amount of per-machine info into the inline-asm parser would be bad.)
(MSVC forces all C vars in _asm{} blocks to be memory. It's impossible to use to efficiently wrap a single instruction because the input has to bounce through memory, even if you wrap it in a function so you can use the officially-supported hack of leaving a value in EAX and falling off the end of a non-void function. See What is the difference between 'asm', '__asm' and '__asm__'? In practice, MSVC's implementation was apparently pretty brittle and hard to maintain, so much so that they removed it for x86-64, and it was documented as not supported in functions with register args even in 32-bit mode! That's not the fault of the syntax design, though, just the actual implementation.)
Clang does support -fasm-blocks for _asm { ... } MSVC-style syntax where it parses the asm and you use C var names. It probably forces inputs and outputs into memory but I haven't checked.
Also note that GCC's inline asm syntax with constraints is designed around the same system of constraints that GCC-internals machine-description files use to describe the ISA to the compiler. (The .md files in the GCC source that tell the compiler about an instruction to add numbers that takes inputs in "r" registers, and has the text string for the mnemonic. Notice the "r" and "m" in some examples in https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html).
The design model of asm in GNU C is that it's a black box for the optimizer; you must fully describe the effects of the code (to the optimizer) using constraints. If you clobber a register, you have to tell the compiler. If you have an input operand that you want to destroy, you need to use a dummy output operand with a matching constraint, or a "+r" operand to update the corresponding C variable's value.
If you read or write memory pointed-to by a register input, you have to tell the compiler. How can I indicate that the memory *pointed* to by an inline ASM argument may be used?
If you use the stack, you have to tell the compiler (but you can't, so instead you have to avoid stepping on the red-zone :/ Using base pointer register in C++ inline asm) See also the inline-assembly tag wiki
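As a sketch of the pointed-to-memory rule, assuming x86 and GCC or Clang: the pointer goes in a register, and a dummy "m" input tells the compiler that the asm also reads the object behind it. The name load_first is hypothetical.

```c
/* Loads *p with an explicit instruction. */
int load_first(const int *p)
{
    int v;
    __asm__ (
        "movl (%[ptr]), %[out]"
        : [out] "=r"(v)
        : [ptr] "r"(p),        /* the address, in a register */
          "m"(*p)              /* dummy input: the asm reads *p, so any pending
                                  store to *p must be done before the asm */
    );
    return v;
}
```

Without the dummy operand (or a "memory" clobber), the compiler would be free to delay or drop an earlier store to *p, because a pointer value in a register does not by itself imply a memory read.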
GCC's design makes it possible for the compiler to give you an input in a register, and use the same register for a different output. (Use an early-clobber constraint if that's not ok; GCC's syntax is designed to efficiently wrap a single instruction that reads all its inputs before writing any of its outputs.)
If GCC could only infer all of these things from C var names appearing in asm source, I don't think that level of control would be possible. (At least not plausible.) And there'd probably be surprising effects all over the place, not to mention missed optimizations. You only ever use inline asm when you want maximum control over things, so the last thing you want is the compiler using a lot of complex opaque logic to figure out what to do.
(Inline asm is complex enough in its current design, and not used much compared to plain C, so a design that requires very complex compiler support would probably end up with a lot of compiler bugs.)
GNU C inline asm isn't designed for low-performance low-effort. If you want easy, just write in pure C or use intrinsics and let the compiler do its job. (And file missed-optimization bug reports if it makes sub-optimal code.)
This is because asm is a defined language which is common for all compilers on the same processor family. After using the __asm__ keyword, you can reliably use any good manual for the processor to then start writing useful code.
But it does not have a defined interface for C, and let's be honest, if you don't interface your assembler with your C code then why is it there?
Examples of useful very simple asm: generate a debug interrupt; set the floating point register mode (exceptions/accuracy);
Each compiler writer has invented their own mechanism to interface to C. For example in one old compiler you had to declare the variables you want to share as named registers in the C code. In GCC and clang they allow you to use their quite messy 2-step system to reference an input or output index, then associate that index with a local variable.
This mechanism is the "extension" to the asm standard.
Of course, the asm is not really a standard. Change processor and your asm code is trash. When we talk in general about sticking to the c/c++ standards and not using extensions, we don't talk about asm, because you are already breaking every portability rule there is.
Then, on top of that, if you are going to call C functions, or your asm declares functions that are callable by C then you will have to match to the calling conventions of your compiler. These rules are implicit. They constrain the way you write your asm, but it will still be legal asm, by some criteria.
But if you were just writing your own asm functions, and calling them from asm, you may not be constrained so much by the c/c++ conventions: make up your own register argument rules; return values in any register you want; make stack frames, or don't; preserve the stack frame through exceptions - who cares?
Note that you might still be constrained by the platform's relocatable code conventions (these are not "C" conventions, but are often described using C syntax), but this is still one way that you can write a chunk of "portable" asm functions, then call them using "extended" embedded asm.

Inserting "marker" instructions into assembly without GCC reordering them

For purposes of doing performance analysis it is useful to be able to
tell which line of C code goes with which line of generated assembly
code. This can be very difficult once a sufficient number of
optimization passes get involved, and I devised the following scheme
to make it easier (though it has a lot of caveats). I figured I would
use in-line assembly to insert an instruction that is effectively a
nop, but that the compiler would rarely or never generate itself. Then
when I looked at the generated code I could infer that assembly code
that appears between the inserted marker instructions probably comes
from C code that lies between the in-line assembly statements.
I came up with these candidates:
// Force insertion of a instruction that will only clobber
// flags and that the compiler hardly ever uses itself. Lie and say
// that it alters memory to try to prevent the compiler from moving
// around. Mark it volatile so the compiler can't remove it entirely.
#define ASSEMBLY_MARKER_0() \
__asm__ volatile ("cld" : /* no outputs */ : /* no inputs */ : "memory", "cc")
#define ASSEMBLY_MARKER_1() \
__asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc")
Then I decided to test whether the compiler would move instructions
across these boundaries. clang appears to do exactly what I want, but
GCC appears to not be deterred either by the memory clobbering or the
fact that this snippet is volatile. It reorders instructions anyway!
Is there any way to prevent this?
I know there are a lot of caveats to this method even if I get it to
work -- I may heavily influence generated code around the markers. But
I maintain that it would still be useful for finding things like
accidental implicit conversions between integer widths, and other
"wait that should never be necessary..." type problems.
You can see the difference between GCC and clang here: https://godbolt.org/z/ZtUPc9
C code:
int f(int x)
{
__asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc");
int j = x << 3;
__asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc");
return j;
}
GCC:
xorl %eax,0
xorl %eax,0
lea eax, [0+rdi*8]
ret
Clang:
xor dword ptr [0], eax
lea eax, [8*rdi]
xor dword ptr [0], eax
ret
Edits to answer questions in comments:
Why not nops? Because gcc inserts those itself often. The point is to stick out.
Why not move code into its own function? If you're doing this analysis on C++ template code for example, there may be many layers of inlining that occur before producing the function that actually goes in the executable, and the code may be very different if you turn off the inlining (e.g. the code may have been written with the assumption that constant folding, dead code elimination etc. would get rid of trivial things).
Then I decided to test whether the compiler would move instructions across these boundaries. clang appears to do exactly what I want, but GCC appears to not be deterred either by the memory clobbering or the fact that this snippet is volatile. It reorders instructions anyway! Is there any way to prevent this?
Not really. Such barriers only stop the compiler from reordering volatile operations (volatile accesses, asm volatile) and memory accesses across them. On x86, the "cc" (condition code) clobber similarly means that parts of a def-use chain on the condition codes cannot be moved across. They do nothing to stop unrelated instructions from being moved across.
Sometimes it can be helpful to add options -save-temps -fverbose-asm to better understand assembly code and its relation to C. New versions of GCC dump C/C++ code alongside assembly code (dumped as *.s). When you inspect assembly (as opposed to disassembly) it's sufficient to inject asm comments to show where the inline asm is injected, there is no need to add actual instructions. Legibility of assembly might be improved by disabling debug-info (-g0).
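A sketch of comment-only markers, assuming x86 with GCC or Clang, where # starts an assembler comment. The "+r" operands are an addition of mine, not something the answer above proposes: they make each marker nominally read and rewrite a value, so the computation on that value is tied to the markers by data dependence. Anything not threaded through an operand can still move. The name shift_marked is hypothetical.

```c
/* Same computation as f() in the question, with comment markers. */
int shift_marked(int x)
{
    /* "+r"(x): the asm nominally rewrites x, so code that uses x
       cannot be hoisted above this marker. */
    __asm__ volatile ("# MARKER begin shift" : "+r"(x) : : "memory");
    int j = x << 3;
    /* "+r"(j): j must be fully computed before this marker. */
    __asm__ volatile ("# MARKER end shift" : "+r"(j) : : "memory");
    return j;
}
```

The markers emit no instructions; they only show up as comment lines when you inspect the compiler's assembly output, for example with gcc -S.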
To better understand the code, you can also disable passes that usually result in great amount of instruction reordering like instruction scheduling (-fno-schedule-insns,
-fno-schedule-insns2) but that has a big performance impact, of course.

GCC Inline-Assembly Error: "Operand size mismatch for 'int'"

first, if somebody knows a function of the Standard C Library, that prints
a string without looking for a binary zero, but requires the number of characters to draw, please tell me!
Otherwise, I have this problem:
void printStringWithLength(char *str_ptr, int n_chars){
asm("mov 4, %rax");//Function number (write)
asm("mov 1, %rbx");//File descriptor (stdout)
asm("mov $str_ptr, %rcx");
asm("mov $n_chars, %rdx");
asm("int 0x80");
return;
}
GCC tells the following error to the "int" instruction:
"Error: operand size mismatch for 'int'"
Can somebody tell me the issue?
There are a number of issues with your code. Let me go over them step by step.
First of all, the int $0x80 system call interface is for 32 bit code only. You should not use it in 64 bit code as it only accepts 32 bit arguments. In 64 bit code, use the syscall interface. The system calls are similar but some numbers are different.
Second, in AT&T assembly syntax, immediates must be prefixed with a dollar sign. So it's mov $4, %rax, not mov 4, %rax. The latter would attempt to move the content of address 4 to rax which is clearly not what you want.
Third, you can't just refer to the names of automatic variables in inline assembly. You have to tell the compiler what variables you want to use using extended assembly if you need any. For example, in your code, you could do:
asm volatile("mov $1, %%eax; mov $1, %%edi; mov %0, %%rsi; mov %1, %%edx; syscall"
:: "r"(str_ptr), "r"(n_chars) : "rdi", "rsi", "rdx", "rax", "rcx", "r11", "memory");
Fourth, gcc is an optimizing compiler. By default it assumes that inline assembly statements are like pure functions, that the outputs are a pure function of the explicit inputs. If the output(s) are unused, the asm statement can be optimized away, or hoisted out of loops if run with the same inputs.
But a system call like write has a side-effect you need the compiler to keep, so it's not pure. You need the asm statement to run the same number of times and in the same order as the C abstract machine would. asm volatile will make this happen. (An asm statement with no outputs is implicitly volatile, but it's good practice to make it explicit when the side effect is the main purpose of the asm statement. Plus, we do want to use an output operand to tell the compiler that RAX is modified, as well as being an input, which we couldn't do with a clobber.)
You do always need to accurately describe your asm's inputs, outputs, and clobbers to the compiler using Extended inline assembly syntax. Otherwise you'll step on the compiler's toes (it assumes registers are unchanged unless they're outputs or clobbers). (Related: How can I indicate that the memory *pointed* to by an inline ASM argument may be used? shows that a pointer input operand alone does not imply that the pointed-to memory is also an input. Use a dummy "m" input or a "memory" clobber to force all reachable memory to be in sync.)
You should simplify your code by not writing your own mov instructions to put data into registers but rather letting the compiler do this. For example, your assembly becomes:
ssize_t retval;
asm volatile ("syscall" // note only 1 instruction in the template
: "=a"(retval) // RAX gets the return value
: "a"(SYS_write), "D"(STDOUT_FILENO), "S"(str_ptr), "d"(n_chars)
: "memory", "rcx", "r11" // syscall destroys RCX and R11
);
where SYS_write is defined in <sys/syscall.h> and STDOUT_FILENO in <unistd.h>. I am not going to explain all the details of extended inline assembly to you. Using inline assembly in general is usually a bad idea. Read the documentation if you are interested. (https://stackoverflow.com/tags/inline-assembly/info)
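For reference, here is how the single-instruction version might look as a complete function. This is only a sketch under the assumption of x86-64 Linux with GCC or Clang; the name raw_write is hypothetical, and on any other target it simply falls back to the libc write wrapper.

```c
#include <stddef.h>
#include <sys/types.h>     /* ssize_t */
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* STDOUT_FILENO, write */

/* Hypothetical helper: write n bytes from buf to fd.
   On the inline-asm path the raw kernel result is returned:
   a byte count, or a negative errno value on failure. */
ssize_t raw_write(int fd, const void *buf, size_t n)
{
#if defined(__x86_64__) && defined(__linux__)
    ssize_t ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                       /* RAX: return value */
                     : "a"((long)SYS_write),           /* RAX: call number */
                       "D"((long)fd), "S"(buf), "d"(n) /* RDI, RSI, RDX */
                     : "rcx", "r11", "memory");        /* syscall destroys RCX and R11 */
    return ret;
#else
    return write(fd, buf, n);   /* fallback: returns -1 and sets errno on failure */
#endif
}
```

A call such as raw_write(STDOUT_FILENO, str_ptr, n_chars) would then replace the body of printStringWithLength.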
Fifth, you should avoid using inline assembly when you can. For example, to do system calls, use the syscall function from unistd.h:
syscall(SYS_write, STDOUT_FILENO, str_ptr, (size_t)n_chars);
This does the right thing. But it doesn't inline into your code, so use wrapper macros from MUSL for example if you want to really inline a syscall instead of calling a libc function.
Sixth, always check if the system call you want to call is already available in the C standard library. In this case, it is, so you should just write
write(STDOUT_FILENO, str_ptr, n_chars);
and avoid all of this altogether.
Seventh, if you prefer to use stdio, use fwrite instead:
fwrite(str_ptr, 1, n_chars, stdout);
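This also answers the first part of the question: fwrite takes an explicit length and never looks for a terminating zero byte. A minimal sketch, with print_n as a made-up helper name:

```c
#include <stdio.h>

/* Hypothetical helper: print exactly n bytes of buf to stream,
   whether or not buf is NUL-terminated.
   Returns 0 on success, -1 if fewer than n bytes were accepted. */
int print_n(FILE *stream, const char *buf, size_t n)
{
    return fwrite(buf, 1, n, stream) == n ? 0 : -1;
}
```

Note that printf("%.*s", n, str) is not quite equivalent: it prints at most n characters but still stops early at a zero byte.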
There are so many things wrong with your code (and so little reason to use inline asm for it) that it's not worth trying to actually correct all of them. Instead, use the write(2) system call the normal way, via the POSIX function / libc wrapper as documented in the man page, or use ISO C <stdio.h> fwrite(3).
#include <unistd.h>
static inline
void printStringWithLength(const char *str_ptr, int n_chars){
write(1, str_ptr, n_chars);
// TODO: check error return value
}
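One possible way to fill in that TODO, as a sketch rather than the definitive approach: write(2) may write fewer bytes than requested, so a loop is needed if you want all-or-error behaviour. The name write_all is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all n bytes are written.
   Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t written = write(fd, buf, n);
        if (written < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing anything: retry */
            return -1;
        }
        buf += written;          /* skip past the bytes already written */
        n -= (size_t)written;
    }
    return 0;
}
```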
Why your code doesn't assemble:
In AT&T syntax, immediates always need a $ decorator. Your code will assemble if you use asm("int $0x80").
The assembler is complaining about 0x80, which is a memory reference to the absolute address 0x80. There is no form of int that takes the interrupt vector as anything other than an immediate. I'm not sure exactly why it complains about the size, since memory references don't have an implied size in AT&T syntax.
That will get it to assemble, at which point you'll get linker errors:
In function `printStringWithLength':
5 : <source>:5: undefined reference to `str_ptr'
6 : <source>:6: undefined reference to `n_chars'
collect2: error: ld returned 1 exit status
(from the Godbolt compiler explorer)
mov $str_ptr, %rcx
means to mov-immediate the address of the symbol str_ptr into %rcx. In AT&T syntax, you don't have to declare external symbols before using them, so unknown names are assumed to be global / static labels. If you had a global variable called str_ptr, that instruction would reference its address (which is a link-time constant, so can be used as an immediate).
As others have said, this is completely the wrong way to go about things with GNU C inline asm. See the inline-assembly tag wiki for more links to guides.
Also, you're using the wrong ABI. int $0x80 is the x86 32-bit system call ABI, so it doesn't work with 64-bit pointers. See: What are the calling conventions for UNIX & Linux system calls on x86-64
See also the x86 tag wiki.

Is this assembly function call safe/complete?

I don't have experience in assembly, but this is what I've been working on. I would like input if I'm missing any fundamental aspects to passing parameters and calling a function via pointer in assembly.
For instance, I'm wondering if I'm supposed to restore ecx, edx, esi, edi. I read they are general purpose registers, but I couldn't find out whether they need to be restored. Is there any kind of cleanup I am supposed to do after a call?
This is the code I have now, and it does work:
#include "stdio.h"
void foo(int a, int b, int c, int d)
{
printf("values = %d and %d and %d and %d\r\n", a, b, c, d);
}
int main()
{
int a=3,b=6,c=9,d=12;
__asm__(
"mov %3, %%ecx;"
"mov %2, %%edx;"
"mov %1, %%esi;"
"mov %0, %%edi;"
"call %4;"
:
: "g"(a), "g"(b), "g"(c), "g"(d), "a"(foo)
);
}
The original question was Is this assembly function call safe/complete?. The answer to that is: no. While it may appear to work in this simple example (especially if optimizations are disabled), you are violating rules that will eventually lead to failures (ones that are really hard to track down).
I'd like to address the (obvious) followup question of how to make it safe, but without feedback from the OP on the actual intent, I can't really do that.
So, I'll do the best I can with what we have and try to describe the things that make it unsafe and some of the things you can do about it.
Let's start by simplifying that asm:
__asm__(
"mov %0, %%edi;"
:
: "g"(a)
);
Even with this single statement, this code is already unsafe. Why? Because we are changing the value of a register (edi) without letting the compiler know.
How can the compiler not know you ask? After all, it's right there in the asm! The answer comes from this line in the gcc docs:
GCC does not parse the assembler instructions themselves and does not
know what they mean or even whether they are valid assembler input.
In that case, how do you let gcc know what's going on? The answer lies in using the constraints (the stuff after the colons) to describe the impact of the asm.
Perhaps the simplest way to fix this code would be like this:
__asm__(
"mov %0, %%edi;"
:
: "g"(a)
: "edi"
);
This adds edi to the clobber list. In brief, this tells gcc that the value of edi is going to be changed by the code, and that gcc shouldn't assume any particular value will be in it when the asm exits.
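To see a clobber in a function you can actually compile and call, here is a small sketch, assuming an x86 or x86-64 target with GCC or Clang (other targets take the plain C path). The function name copy_through_edi is made up; the asm merely routes a value through edi so there is something to observe.

```c
/* Sketch: use edi as a scratch register inside the asm and declare it
   as clobbered so the compiler does not keep anything live in it. */
int copy_through_edi(int a)
{
#if defined(__i386__) || defined(__x86_64__)
    int result;
    __asm__("mov %1, %%edi\n\t"   /* scratch use of edi inside the asm */
            "mov %%edi, %0"
            : "=r"(result)        /* output: any general register */
            : "g"(a)              /* input: register, memory or immediate */
            : "edi");             /* clobber: edi is overwritten */
    return result;
#else
    return a;                     /* non-x86 fallback, no asm involved */
#endif
}
```

Without the "edi" clobber this might still appear to work in a small test, which is exactly the kind of hidden breakage described above.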
Now, while that's the easiest, it's not necessarily the best way. Consider this code:
__asm__(
""
:
: "D"(a)
);
This uses a machine constraint to tell gcc to put the value of the variable a into the edi register for you. Doing it this way, gcc will load the register for you at a 'convenient' time, perhaps by always keeping a in edi.
There is one (significant) caveat to this code: By putting the parameter after the 2nd colon, we are declaring it to be an input. Input parameters are required to be read-only (ie they must have the same value on exiting the asm).
In your case, the call statement means that we won't be able to guarantee that edi won't be changed, so this doesn't quite work. There are a few ways to deal with this. The easiest is to move the constraint up after the first colon, making it an output, and specify "+D" to indicate that the value is read+write. But then the contents of a are going to be pretty much undefined after the asm (printf could set it to anything). If destroying a is unacceptable, there's always something like this:
int junk;
__asm__ volatile (
""
: "=D" (junk)
: "0"(a)
);
This tells gcc that on starting the asm, it should put the value of the variable a into the same place as output constraint #0 (ie edi). It also says that on output, edi won't be a anymore, it will contain the variable junk.
Edit: Since the 'junk' variable isn't actually going to be used, we need to add the volatile qualifier. Volatile was implicit when there weren't any output parameters.
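A variation on the same idea where the output is actually used (so volatile isn't needed) may make the matching constraint easier to follow. This is a sketch assuming x86 or x86-64 with GCC or Clang; double_in_edi is a hypothetical name.

```c
/* Sketch: the input arrives in edi via the matching constraint "0",
   the asm modifies edi, and the output describes the new contents. */
int double_in_edi(int a)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    __asm__("add %%edi, %%edi"
            : "=D"(out)   /* on exit, edi holds out */
            : "0"(a)      /* on entry, a is in the same place as operand 0 */
            : "cc");      /* add changes the flags */
    return out;
#else
    return a + a;         /* non-x86 fallback, no asm involved */
#endif
}
```

Here the variable a itself is left untouched; only the register that held a copy of it is described as changed.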
One other point on that line: You end it with a semi-colon. This is legal and will work as expected. However, if you ever want to use the -S command line option to see exactly what code got generated (and if you want to get good with inline asm, you will), you will find that produces difficult-to-read code. I'd recommend using \n\t instead of a semi-colon.
All that and we're still on the first line...
Obviously the same would apply to the other three mov statements.
Which brings us to the call statement.
Both Michael and I have listed a number of reasons why doing a call in inline asm is difficult:
Handling all the registers that may be clobbered by the function call's ABI.
Handling red-zone.
Handling alignment.
Memory clobber.
If the goal here is 'learning,' then feel free to experiment. But I don't know that I would ever feel comfortable doing this in production code. Even when it looks like it works, I'd never feel confident there wasn't some weird case I'd missed. That's aside from my normal concerns about using inline asm at all.
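For comparison, calling through a function pointer in plain C leaves argument registers, stack alignment, the red zone and clobbers to the compiler. A minimal sketch; foo_checked and call_via_pointer are made-up names, and foo_checked returns printf's result only so the call can be checked.

```c
#include <stdio.h>

/* Same shape as foo in the question, but returns the number of
   characters printed so the caller can verify the call happened. */
int foo_checked(int a, int b, int c, int d)
{
    return printf("values = %d and %d and %d and %d\n", a, b, c, d);
}

/* Hypothetical helper: an indirect call written in C. The compiler
   generates the argument setup and the call itself. */
int call_via_pointer(int (*fn)(int, int, int, int),
                     int a, int b, int c, int d)
{
    return fn(a, b, c, d);
}
```

Calling call_via_pointer(foo_checked, 3, 6, 9, 12) does what the inline asm in the question is trying to do, with none of the bookkeeping.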
I know, that's a lot of information. Probably more than you were looking for as an introduction to gcc's asm command, but you've picked a challenging place to start.
If you haven't done so already, spend time looking over all the docs in gcc's Assembly Language interface. There's a lot of good information there along with examples to try to explain how it all works.
I read they are general purpose registers, but I couldn't find if they
need to be restored?
I am not an expert in the field, but from my reading of the x86-64 ABI (Figure 3.4), the registers %rdi, %rsi, %rdx, and %rcx are not preserved across function calls, so apparently they do not need to be restored.
As commented by David Wohlferd, you should be careful: either way, the compiler is not aware of the "custom" function call, so you may get in its way, particularly because it may not know that registers were modified.

Resources