I'm working with xv6, which reimplements the original (Sixth Edition) UNIX on x86 machines. I wrote some very simple inline assembly in a C program:
register int ecx asm ("%ecx");
printf(1, "%d\n", ecx);
__asm__("movl 16(%esp), %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("add $0, %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("movl %ecx, 16(%esp)\t\n");
The second print statement usually prints a value like 434. However, after the add instruction it prints 2, and it also prints 2 if I use addl instead. I am using the latest stable version of xv6, so I don't really suspect it to be the problem. Is there any other way I can add two numbers in inline assembly?
Essentially I need to increment 16(%esp) by 4.
Edited code to:
__asm__("addl $8, 16(%esp)\t\n");
1) In your example you're not incrementing ecx by 4, you're incrementing it by 0.
__asm__("addl $4, %ecx");
2) You should be able to chain multiple instructions into one asm statement:
__asm__("movl 16(%esp), %ecx\n\t"
"addl $4, %ecx\n\t"
"movl %ecx, 16(%esp)");
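Hard-coding 16(%esp) ties the asm to one particular stack layout. A minimal sketch of the same chained load/add/store written against a C object instead, assuming GCC or Clang on x86 (the name add4_chained is hypothetical, and the #else branch is only a portable fallback so the sketch compiles elsewhere):

```c
/* Hypothetical helper: add 4 to *p with a chained load/add/store. */
static void add4_chained(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    int tmp;
    __asm__ volatile("movl %1, %0\n\t"   /* load *p into a scratch register */
                     "addl $4, %0\n\t"   /* add 4 there */
                     "movl %0, %1"       /* store the result back to *p */
                     : "=&r"(tmp),       /* early-clobber: written before %1 is last used */
                       "+m"(*p)          /* *p is read and written in memory */
                     :
                     : "cc");
#else
    *p += 4;  /* portable fallback */
#endif
}
```

Calling add4_chained(&v) with v == 430 leaves 434 in v. A single addl $4, %0 with a "+m" operand would do the same job; the three-instruction form just mirrors the chain above.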
3) The register keyword is a hint, and the compiler may still decide to put your variable wherever it wants. (GCC only guarantees that a local register variable like register int ecx asm("%ecx") is in that register when it is used as an operand of an extended asm statement.) Also, the GCC documentation warns that functions may clobber various registers. printf(), being an ordinary C function, is allowed to use ECX without preserving its value: ECX is call-clobbered in the i386 calling convention, and the compiler may use it for all sorts of optimizations inside that call. It is a general-purpose register on the 80x86, and those are used for parameter passing, return values and temporaries all the time.
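If a value really has to be in ECX, the dependable way is to request that register through a constraint on the asm statement that uses it, rather than hoping a register variable survives a call like printf(). A minimal sketch assuming x86 and GCC/Clang; add4_in_ecx is a hypothetical name:

```c
static int add4_in_ecx(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+c": x is loaded into ECX, modified, and read back, for this statement only. */
    __asm__("addl $4, %0" : "+c"(x) : : "cc");
#else
    x += 4;  /* portable fallback */
#endif
    return x;
}
```

Whatever happens to ECX before or after this statement (including inside printf), GCC is responsible for moving x into and out of ECX around it.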
Untested corrections:
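int reg; // No register asm("ecx") here, so GCC can pick the best available register.
/*
 * volatile tells GCC this asm has side effects beyond its outputs (it stores to
 * the stack), so it must not be deleted or moved out of loops. It is not a
 * full barrier against all optimizations.
 * In extended asm, literal register names need %% (e.g. %%esp).
 * Caution: 16(%%esp) depends on the compiler's stack layout.
 */
asm volatile ("movl 16(%%esp), %0\n\t"
              "addl $4, %0\n\t"
              "movl %0, 16(%%esp)"
              : "=r" (reg)   // "=r": a write-only output in a register GCC chooses
              :              // no inputs
              : "memory");   // the asm writes memory the compiler doesn't know about
printf(1, "Result: %d\n", reg); // xv6's printf takes a file descriptor first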
The GCC manual's Extended Asm page has more details.
I'm using i686 gcc on Windows. When I built the code with separate asm statements, it worked. However, when I try to combine it into one statement, it doesn't build and gives me an error: unsupported size for integer register.
Here's my code:
u8 lstatus;
u8 lsectors_read;
u8 data_buffer;
void operate(u8 opcode, u8 sector_size, u8 track, u8 sector, u8 head, u8 drive, u8* buffer, u8* status, u8* sectors_read)
{
asm volatile("mov %3, %%ah;\n"
"mov %4, %%al;\n"
"mov %5, %%ch;\n"
"mov %6, %%cl;\n"
"mov %7, %%dh;\n"
"mov %8, %%dl;\n"
"int $0x13;\n"
"mov %%ah, %0;\n"
"mov %%al, %1;\n"
"mov %%es:(%%bx), %2;\n"
: "=r"(lstatus), "=r"(lsectors_read), "=r"(buffer)
: "r"(opcode), "r"(sector_size), "r"(track), "r"(sector), "r"(head), "r"(drive)
:);
status = &lstatus;
sectors_read = &lsectors_read;
buffer = &data_buffer;
}
The error message is a little misleading. It seems to be happening because GCC ran out of 8-bit registers.
Interestingly, it compiles without error messages if you just edit the template to remove references to the last 2 operands (https://godbolt.org/z/oujNP7), even without dropping them from the list of input constraints! (Trimming down your asm statement is a useful debugging technique to figure out which part of it GCC doesn't like, without caring for now if the asm will do anything useful.)
Removing 2 earlier operands instead (and renumbering the rest) shows that "r"(head), "r"(drive) weren't specifically a problem, just the combination of everything.
It looks like GCC is avoiding high-8 registers like AH as inputs, and x86-16 only has 4 low-8 registers but you have 6 u8 inputs. So I think GCC means it ran out of byte registers that it was willing to use.
(The 3 outputs aren't declared early-clobber so they're allowed to overlap the inputs.)
You could maybe work around this by using "rm" to give GCC the option of picking a memory input. (The x86-specific constraints like "Q" that are allowed to pick a high-8 register wouldn't help unless you require it to pick the correct one to get the compiler to emit a mov for you.) That would probably let your code compile, but the result would be totally broken.
You re-introduced basically the same bugs as before: not telling the compiler which registers you write, so for example your mov %4, %%al will overwrite one of the registers GCC picked as an input, before you actually read that operand.
Declaring clobbers on all the registers you use would leave not enough registers to hold all the input variables. (Unless you allow memory source operands.) That could work but is very inefficient: if your asm template string starts or ends with mov, you're almost always doing it wrong.
Also, there are other serious bugs, apart from how you're using inline asm. You don't supply an input pointer to your buffer. int $0x13 doesn't allocate a new buffer for you; it needs a pointer in ES:BX (which it dereferences but leaves unmodified). GCC requires that ES=DS=SS, so you already have to have set up segmentation properly before calling into your C code; it isn't something you have to do on every call.
Plus, even in C terms outside the inline asm, your function doesn't make sense. status = &lstatus; modifies the value of a function arg; it doesn't dereference it to modify a pointed-to output variable. The variables written by those assignments die at the end of the function. But the global temporaries do still have to be updated, because they're global and some other function could see their values. Perhaps you meant something like *status = lstatus; with different types for your vars?
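The pointer mistake can be checked in plain C, without any asm. A small illustrative sketch; report_wrong and report_right are hypothetical names, and lstatus here just stands in for the global that the asm would fill:

```c
typedef unsigned char u8;

static u8 lstatus = 0x42;  /* stands in for the global written by the asm */

/* Wrong: only the local copy of the pointer changes; the caller sees nothing. */
static void report_wrong(u8 *status)
{
    status = &lstatus;
    (void)status;
}

/* Right: dereference the pointer to write into the caller's variable. */
static void report_right(u8 *status)
{
    *status = lstatus;
}
```

After report_wrong(&s) the caller's s is unchanged; after report_right(&s) it holds 0x42.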
If that C problem isn't obvious (at least once it's pointed out), you need some more practice with C before you're ready to try mixing C and asm, which requires you to understand both very well in order to correctly describe your asm to the compiler with accurate constraints.
A good and correct way to implement this is shown in @fuz's answer to your previous question. If you want to understand how the constraints can replace your mov instructions, compile it and look at the compiler-generated instructions. See https://stackoverflow.com/tags/inline-assembly/info for links to guides and docs. For example, here is @fuz's version without the ES setup (because GCC needs you to have done that already before calling any C):
typedef unsigned char u8;
typedef unsigned short u16;
// Note the different signature, and using the output args correctly.
void read(u8 sector_size, u8 track, u8 sector, u8 head, u8 drive,
u8 *buffer, u8 *status, u8 *sectors_read)
{
u16 result;
asm volatile("int $0x13"
: "=a"(result)
: "a"(0x200|sector_size), "b"(buffer),
"c"(track<<8|sector), "d"(head<<8|drive)
: "memory" ); // memory clobber was missing from @fuz's version
*status = result >> 8;
*sectors_read = result >> 0;
}
Compiles as follows, with GCC 10.1 -O2 -m16 on Godbolt:
read:
pushl %ebx
movzbl 12(%esp), %ecx
movzbl 16(%esp), %edx
movzbl 24(%esp), %ebx # load some stack args
sall $8, %ecx
movzbl 8(%esp), %eax
orl %edx, %ecx # shift and merge into CL,CH instead of writing partial regs
movzbl 20(%esp), %edx
orb $2, %ah
sall $8, %edx
orl %ebx, %edx
movl 28(%esp), %ebx # the pointer arg
int $0x13 # from the inline asm statement
movl 32(%esp), %edx # load output pointer arg
movl %eax, %ecx
shrw $8, %cx
movb %cl, (%edx)
movl 36(%esp), %edx
movb %al, (%edx)
popl %ebx
ret
It might be possible to use register u8 track asm("ch") or something to get the compiler to just write partial regs instead of shift/OR.
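The shift/OR instructions in that listing come from evaluating the constraint expressions, which pack two byte-sized BIOS arguments into one 16-bit register (for example AH=2, the read-sectors function, and AL=count in AX). A plain-C sketch of the same packing and of splitting AX afterwards; pack_hi_lo and unpack_ax are hypothetical helpers:

```c
#include <stdint.h>

/* High byte goes to the "h" half (AH/CH/DH), low byte to the "l" half (AL/CL/DL). */
static uint16_t pack_hi_lo(uint8_t hi, uint8_t lo)
{
    return (uint16_t)((unsigned)hi << 8 | lo);
}

/* Split AX after int $0x13: AH = status, AL = sectors read. */
static void unpack_ax(uint16_t ax, uint8_t *status, uint8_t *sectors_read)
{
    *status = (uint8_t)(ax >> 8);
    *sectors_read = (uint8_t)ax;
}
```

For example, pack_hi_lo(0x02, 5) gives the 0x0205 that "a"(0x200|sector_size) computes for a 5-sector read.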
If you don't want to understand how constraints work, don't use GNU C inline asm. You could instead write stand-alone functions that you call from C, which accept args according to the calling convention the compiler uses (e.g. gcc -mregparm=3, or just everything on the stack with the traditional inefficient calling convention.)
You could do a better job than GCC's code-gen above, but note that inline asm can be optimized into the surrounding code when the function inlines, avoiding some of the copying to memory that passing args via the stack requires.
I have a C function that modifies ECX (or any other register):
int proc(int n) {
int ret;
asm volatile ("movl %1, %%ecx\n\t" // mov (n) to ecx
"addl $10, %%ecx\n\t" // add (10) to ecx (n)
"movl %%ecx, %0" /* ret = n + 10 */
: "=r" (ret) : "r" (n) : "ecx");
return ret;
}
Now I want to call this function from another function that moves a value into ECX before calling proc:
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx"); /// mov (55) to ecx
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx"); // ecx is modified in proc function and the value of ecx is not 55 anymore even with "ecx" clobber
asm volatile ("addl %%ecx, %0" : "=r" (ret));
return ret;
}
In this function, 55 is moved into ECX and then proc is called (which modifies ECX). In this situation, proc must push ECX first and pop it at the end, but that doesn't happen!
This is the assembly output at the -O3 optimization level:
proc:
movl %edi, %ecx
addl $10, %ecx
movl %ecx, %eax
ret
main_proc:
movl $55, %ecx
call proc
addl %ecx, %eax
ret
Why doesn't GCC use push and pop for the ECX register? I used an "ecx" clobber too!
You are using inline asm completely wrong. Your input/output constraints need to fully describe the inputs / outputs of each asm statement. To get data between asm statements, you have to hold it in C variables between them.
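A minimal sketch of proc whose constraints fully describe its input and output, with the caller written in plain C so the compiler sees an ordinary call and knows which registers it may destroy. Assumes x86 and GCC/Clang; proc_fixed and main_proc_fixed are hypothetical names, and the 55 that the question kept in ECX is held in a C variable instead:

```c
static int proc_fixed(int n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": n is both read and written, in a register of GCC's choice. */
    __asm__("addl $10, %0" : "+r"(n) : : "cc");
#else
    n += 10;  /* portable fallback */
#endif
    return n;
}

/* The 55 lives in a C variable, not in "whatever ECX happens to hold". */
static int main_proc_fixed(int n)
{
    int extra = 55;
    return proc_fixed(n) + extra;
}
```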
Also, call isn't safe inside inline asm in general, and specifically in x86-64 code for the System V ABI it steps on the red-zone where gcc might have been keeping things. There's no way to declare a clobber on that. You could use sub $128, %rsp first to skip past the red zone, or you could make calls from pure C like a normal person so the compiler knows about it. (Remember that call pushes a return address.) Your inline asm doesn't even make sense; your proc takes an arg but you didn't do anything in the caller to pass one.
The compiler-generated code in proc could have also destroyed any other call-clobbered registers, so you at least need to declare clobbers on those registers. Or hand-write the whole function in asm so you know what to put in clobbers.
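For the hand-written route, a minimal sketch of a stand-alone function in top-level asm, assuming x86-64 System V on an ELF target such as Linux (n arrives in EDI, the result goes back in EAX). add_ten is a hypothetical name; because C calls it like any other function, the compiler already assumes all call-clobbered registers are destroyed, so no clobber list is involved:

```c
int add_ten(int n);  /* prototype for the hand-written function below */

#if defined(__x86_64__) && defined(__ELF__)
/* Top-level (basic) asm: no operands, so registers take a single %. */
__asm__(".pushsection .text\n"
        ".globl add_ten\n"
        ".type add_ten, @function\n"
        "add_ten:\n\t"
        "leal 10(%rdi), %eax\n\t"  /* n arrives in EDI; EAX = n + 10 is the return value */
        "ret\n"
        ".size add_ten, .-add_ten\n"
        ".popsection\n");
#else
int add_ten(int n) { return n + 10; }  /* portable fallback */
#endif
```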
Why doesn't GCC use push and pop for the ECX register? I used an "ecx" clobber too!
An ecx clobber tells GCC that this asm statement destroys whatever GCC had in ECX previously. Using an ECX clobber in two separate inline-asm statements doesn't declare any kind of data dependency between them.
It's not equivalent to declaring a register-asm local variable like register int foo asm("ecx"); and using it as a "+r" (foo) operand to the first and last asm statements. (Or, more simply, using a "+c" constraint to make an ordinary variable pick ECX.)
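A minimal sketch, assuming x86 and GCC/Clang, of carrying a value from one asm statement to the next through a C variable. The "=c" output and "c" input pin the value to ECX, which is the data dependency that two "ecx" clobbers cannot express; if GCC needs ECX in between, it is responsible for getting c back into ECX. add55_via_ecx is a hypothetical name:

```c
static int add55_via_ecx(int n)
{
#if defined(__i386__) || defined(__x86_64__)
    int c;
    __asm__("movl $55, %0" : "=c"(c));                 /* produces c in ECX */
    __asm__("addl %1, %0" : "+r"(n) : "c"(c) : "cc");  /* consumes c from ECX */
#else
    n += 55;  /* portable fallback */
#endif
    return n;
}
```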
From GCC's point of view, your source means only what the constraints + clobbers tell it.
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx");
// ^^ black box that destroys ECX and produces no outputs
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx");
// ^^ black box that can take `n` in any register, and can produce `ret` in any reg. And destroys ECX.
asm volatile ("addl %%ecx, %0" : "=r" (ret));
// ^^ black box with no inputs that can produce a new value for `ret` in any register
return ret;
}
I suspect you wanted the last asm statement to use "+r"(ret), to read/write the C variable ret instead of telling GCC that it was output-only, because your asm uses it as an input as well as an output (as the destination of an add).
It might be interesting to add comments like # %%0 = %0 %%1 = %1 inside your 2nd asm statement to see which registers the "=r" and "r" constraints picked. On the Godbolt compiler explorer:
# gcc9.2 -O3
main_proc:
movl $55, %ecx
call proc # %0 = %edi %1 = %edi
addl %ecx, %eax # "=r" happened to pick EAX,
# which happens to still hold the return value from proc
ret
That accident of picking EAX as the add destination might not happen after this function inlines into something else, or if GCC happens to put some compiler-generated instructions between the asm statements. (asm volatile is a barrier to compile-time reordering, but not a strong one; the only thing it definitely does is stop the statement from being optimized away entirely.)
Remember that inline asm templates are purely text substitution; asking the compiler to fill in an operand into a comment is no different from anywhere else in the template string. (Godbolt strips comment lines by default so sometimes it's handy to tack them onto other instructions, or onto a nop).
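A minimal sketch of the comment trick, assuming x86 and GCC/Clang (add_one_annotated is a hypothetical name). Compiling with gcc -O2 -S and searching the .s file for the comment shows which register the "+r" constraint picked:

```c
static int add_one_annotated(int n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "%%0" prints a literal "%0"; "%0" is replaced by the register GCC chose. */
    __asm__("addl $1, %0    # %%0 = %0" : "+r"(n) : : "cc");
#else
    n += 1;  /* portable fallback */
#endif
    return n;
}
```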
As you can see, this is 64-bit code (n arrives in EDI as per the x86-64 SysV calling convention, like how you built your code), so push %ecx wouldn't be encodable; push %rcx would be.
Of course if GCC actually wanted to keep a value around past an asm statement with an "ecx" clobber, it would have just used mov %ecx, %edx or whatever other call-clobbered register that wasn't in the clobber list.
I'm trying to use cmpxchg with inline assembly from C. This is my code:
static inline int
cas(volatile void* addr, int expected, int newval) {
int ret;
asm volatile("movl %2 , %%eax\n\t"
"lock; cmpxchg %0, %3\n\t"
"pushfl\n\t"
"popl %1\n\t"
"and $0x0040, %1\n\t"
: "+m" (*(int*)addr), "=r" (ret)
: "r" (expected), "r" (newval)
: "%eax"
);
return ret;
}
This is my first time using inline assembly and I'm not sure what could be causing this problem.
I tried "cmpxchgl" as well, but still nothing. I also tried removing the lock.
I get "operand size mismatch".
I think maybe it has something to do with the cast I apply to addr, but I'm unsure. I'm exchanging an int for an int, so I don't really understand why there would be a size mismatch.
This is using AT&T style.
Thanks
As @prl points out, you reversed the operands, putting them in Intel order (see Intel's manual entry for cmpxchg). Any time your inline asm doesn't assemble, you should look at the asm the compiler was feeding to the assembler to see what happened to your template. In your case, simply remove the static inline so the compiler will make a stand-alone definition; then you get (on the Godbolt compiler explorer):
# gcc -S output for the original, with cmpxchg operands backwards
movl %edx , %eax
lock; cmpxchg (%ecx), %ebx # error on this line from the assembler
pushfl
popl %edx
and $0x0040, %edx
Sometimes that will clue your eye / brain in cases where staring at %3 and %0 didn't, especially after you check the instruction-set reference manual entry for cmpxchg and see that the memory operand is the destination (Intel-syntax first operand, AT&T syntax last operand).
This makes sense because the explicit register operand is only ever a source, while EAX and the memory operand are both read and then one or the other is written depending on the success of the compare. (And semantically you use cmpxchg as a conditional store to a memory destination.)
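To make those semantics concrete, here is a non-atomic C model of cmpxchg src, mem in AT&T operand order. It is an illustration only (cmpxchg_model is a hypothetical name, and the real instruction does all of this atomically when prefixed with lock):

```c
#include <stdbool.h>

/* Non-atomic model of "cmpxchg src, mem"; returns the resulting ZF. */
static bool cmpxchg_model(int *mem, int *eax, int src)
{
    if (*mem == *eax) {
        *mem = src;      /* success: the memory destination is written */
        return true;     /* ZF = 1 */
    }
    *eax = *mem;         /* failure: EAX receives the current memory value */
    return false;        /* ZF = 0 */
}
```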
You're discarding the load result from the cas-failure case. I can't think of any use-cases for cmpxchg where doing a separate load of the atomic value would be incorrect, rather than just inefficient, but the usual semantics for a CAS function is that oldval is taken by reference and updated on failure. (At least that's how C++11 std::atomic and C11 stdatomic do it with bool atomic_compare_exchange_weak( volatile A *obj, C* expected, C desired );.)
(The weak/strong distinction allows better code-gen for CAS retry loops on targets that use LL/SC, where spurious failure is possible due to an interrupt or the location being rewritten with the same value. x86's lock cmpxchg is "strong".)
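With C11 <stdatomic.h>, the by-reference expected argument is what makes retry loops cheap. A minimal sketch; cas_c11 and fetch_add_via_cas are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true on success; on failure *expected is updated to the current value. */
static bool cas_c11(atomic_int *obj, int *expected, int desired)
{
    return atomic_compare_exchange_strong(obj, expected, desired);
}

/* Retry loop: a failed weak CAS refreshes old, so no separate reload is needed. */
static int fetch_add_via_cas(atomic_int *obj, int delta)
{
    int old = atomic_load(obj);
    while (!atomic_compare_exchange_weak(obj, &old, old + delta))
        ;  /* old now holds the latest value; just retry */
    return old;
}
```

The weak form fits the loop because a spurious failure only costs one more iteration.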
Actually, GCC's legacy __sync builtins provide 2 separate CAS functions: one that returns the old value, and one that returns a bool. Both take the old and new values by value. So it's not the same API that C++11 uses, but apparently it isn't so horrible that nobody used it.
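For comparison, a minimal sketch of the two legacy builtins (the wrapper names are hypothetical; __sync_val_compare_and_swap and __sync_bool_compare_and_swap are the GCC builtins):

```c
#include <stdbool.h>

/* Returns the value *p held before the operation; swaps only if it equaled oldv. */
static int demo_sync_val(int *p, int oldv, int newv)
{
    return __sync_val_compare_and_swap(p, oldv, newv);
}

/* Returns true if the swap happened. */
static bool demo_sync_bool(int *p, int oldv, int newv)
{
    return __sync_bool_compare_and_swap(p, oldv, newv);
}
```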
Your overcomplicated code isn't portable to x86-64. From your use of popl, I assume you developed it on x86-32. You don't need pushf/pop to get ZF as an integer; that's what setcc is for. The question "cmpxchg example for 64 bit integer" has a 32-bit example that works that way (to show what they want a 64-bit version of).
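A minimal sketch of the setcc approach, assuming x86 (32- or 64-bit) and GCC/Clang. cas_setz is a hypothetical name; the "=q" constraint asks for a register with an addressable low byte, which matters on i386, and the #else branch is just a portable fallback:

```c
#include <stdbool.h>

/* Returns true if *addr held expected and now holds newval. */
static inline bool cas_setz(volatile int *addr, int expected, int newval)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char ok;
    __asm__ __volatile__("lock; cmpxchgl %3, %1\n\t"  /* AT&T: memory destination last */
                         "sete %0"                    /* ZF -> 0 or 1, no pushf/pop */
                         : "=q"(ok), "+m"(*addr), "+a"(expected)
                         : "r"(newval)
                         : "memory", "cc");
    return ok;
#else
    return __sync_bool_compare_and_swap(addr, expected, newval);
#endif
}
```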
Or even better, use GCC6 flag-return syntax so using this in a loop can compile to a cmpxchg / jne loop instead of cmpxchg / setz %al / test %al,%al / jnz.
We can fix all of those problems and improve the register allocation as well. (If the first or last instruction of an inline-asm statement is mov, you're probably using constraints inefficiently.)
Of course, by far the best thing for real usage would be to use C11 stdatomic or a GCC builtin. See https://gcc.gnu.org/wiki/DontUseInlineAsm: avoid inline asm in cases where the compiler can emit just as good (or better) asm from code it "understands", because inline asm constrains the compiler. It's also difficult to write correctly and efficiently, and to maintain.
Portable to i386 and x86-64, AT&T or Intel syntax, and works for any integer type width of register width or smaller:
// Note: oldVal by reference
static inline char CAS_flagout(int *ptr, int *poldVal, int newVal)
{
char ret;
__asm__ __volatile__ (
" lock; cmpxchg {%[newval], %[mem] | %[mem], %[newval]}\n"
: "=@ccz" (ret), [mem] "+m" (*ptr), "+a" (*poldVal)
: [newval]"r" (newVal)
: "memory"); // barrier for compiler reordering around this
return ret; // ZF result, 1 on success else 0
}
// spinning read-only is much better (with _mm_pause in the retry loop)
// not hammering on the cache line with lock cmpxchg.
// This is over-simplified so the asm is super-simple.
void cas_retry(int *lock) {
int oldval = 0;
while(!CAS_flagout(lock, &oldval, 1)) oldval = 0;
}
The { foo,bar | bar,foo } syntax selects between asm dialect alternatives; for x86 it's {AT&T | Intel}. %[newval] refers to a named operand; it's an alternative to numbering your operands as %0, %1 and so on. The "=@ccz" constraint takes the Z condition code as the output value, like a setz.
Compiles on Godbolt to this asm for 32-bit x86 with AT&T output:
cas_retry:
pushl %ebx
movl 8(%esp), %edx # load the pointer arg.
movl $1, %ecx
xorl %ebx, %ebx
.L2:
movl %ebx, %eax # xor %eax,%eax would save a lot of insns
lock; cmpxchg %ecx, (%edx)
jne .L2
popl %ebx
ret
gcc is dumb and stores a 0 in one reg before copying it to eax, instead of re-zeroing eax inside the loop. This is why it needs to save/restore EBX at all. It's the same asm we get from avoiding inline-asm, though (from x86 spinlock using cmpxchg):
// also omits _mm_pause and read-only retry, see the linked question
void spin_lock_oversimplified(int *p) {
while(!__sync_bool_compare_and_swap(p, 0, 1));
}
Someone should teach gcc that Intel CPUs can materialize a 0 more cheaply with xor-zeroing than they can copy it with mov, especially on Sandybridge (xor-zeroing elimination but no mov-elimination).
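The comments on cas_retry recommend spinning read-only, with a pause, instead of retrying lock cmpxchg in a tight loop. A minimal test-and-test-and-set sketch using C11 atomics, assuming GCC or Clang; spin_lock, spin_unlock and the CPU_RELAX macro are hypothetical names, and _mm_pause is only used on x86:

```c
#include <stdatomic.h>
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>   /* _mm_pause */
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

/* 0 = unlocked, 1 = locked. */
static void spin_lock(atomic_int *lock)
{
    for (;;) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
        /* Wait read-only until the lock looks free, without locked RMW traffic. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            CPU_RELAX();
    }
}

static void spin_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```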
You have the operand order for the cmpxchg instruction reversed. AT&T syntax needs the memory destination last:
"lock; cmpxchg %3, %0\n\t"
Or you could compile that instruction with its original order using -masm=intel, but the rest of your code is AT&T syntax and ordering so that's not the right answer.
As far as why it says "operand size mismatch", I can only say that that appears to be an assembler bug, in that it uses the wrong message.
While playing around with GCC's inline assembler feature, I tried to make a function which immediately exited the process, akin to _Exit from the C standard library.
Here is the relevant piece of source code:
void immediate_exit(int code)
{
#if defined(__x86_64__)
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "" (code) :
);
//Skip other architectures here, I'll fix these later.
#else
# error "Architecture not supported."
#endif
}
This works fine for debug builds (with -O0), but as soon as I turn optimisation on at any level, I get the following error:
immediate_exit.c: Assembler messages:
immediate_exit.c:4: Error: unsupported for `mov'
So I looked at the assembler output for both builds (I've removed .cfi* directives and other things for clarity, I can add that in again if it's a problem). The debug build:
immediate_exit:
.LFB0:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
mov -4(%rbp), %rdi
mov $231, %rax
syscall
popq %rbp
ret
And the optimised version:
immediate_exit:
.LFB0:
mov %edi, %rdi
mov $231, %rax
syscall
ret
So the optimised version is trying to put a 32-bit register edi into a 64-bit register, rdi, rather than loading it from rbp, which I presume is what is causing the error.
Now, I can fix this by specifying 'm' as the constraint for code, which causes GCC to load it from memory (relative to rbp) regardless of optimisation level. However, I'd rather not do that, because I think the compiler and its authors have a much better idea about where to put stuff than I do.
So (finally!) my question is: how do I persuade GCC to use rdi rather than edi for the assembly output?
Overall, you're much better off using constraints to get values into the right registers rather than explicit moves:
#include <asm/unistd.h> // __NR_exit_group
#include <stdint.h>     // uint64_t
asm volatile("syscall"
    : // no outputs. Other syscalls need an "=a"(retval) to tell the compiler RAX is modified, whether you actually use the retval or not.
    : "D" ((uint64_t)code), "a" ((uint64_t)__NR_exit_group) // 231
    : "rcx", "r11" // syscall itself clobbers these. exit can't fail and return; mostly here as an example for other syscalls
    , "memory" // make sure any stores, e.g. to mmapped files, are done before this
    );
__builtin_unreachable(); // tell the compiler execution doesn't come out the bottom of the asm statement. This may also have the same effect as a "memory" clobber of making sure not to delay stores which could potentially be to mmapped files or shared memory.
That lets the compiler hoist the moves earlier in the code if useful, or even avoid the move altogether if the value can be arranged to already be in the correct register.
For example, code will already be in EDI if this function doesn't inline; the Linux system-call convention was chosen to be as close as possible to the x86-64 System V function-calling convention, except for using R10 instead of RCX, because the syscall instruction itself overwrites RCX with the saved RIP (and R11 with the saved RFLAGS).
(Unnecessarily casting (uint64_t)code would force the compiler to redo zero-extension with a mov %edi, %edi in that case, though. The call number does need to be zero-extended to 64-bit, which will almost certainly happen for free even if you didn't manually cast it (since the compiler will use a mov $231, %eax), but it doesn't hurt to be explicit about something that is required. The exit_group system call takes a 32-bit int arg, so the kernel is guaranteed to ignore high garbage in RDI.)
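A syscall that returns needs its result declared, as the comment on the "=a" output says. A minimal sketch for getpid, assuming x86-64 Linux and GCC/Clang; raw_getpid is a hypothetical name. RAX carries the call number in and the result out, so a single "+a" operand covers both:

```c
#include <unistd.h>
#if defined(__x86_64__) && defined(__linux__)
#include <sys/syscall.h>   /* SYS_getpid */
#endif

static long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret = SYS_getpid;                 /* call number goes in RAX */
    __asm__ volatile("syscall"
                     : "+a"(ret)           /* RAX: number in, result out */
                     :
                     : "rcx", "r11", "memory");  /* syscall overwrites RCX and R11 */
    return ret;
#else
    return (long)getpid();  /* portable fallback */
#endif
}
```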
Cast your variable to a type of the appropriate width:
#include <stdint.h>
asm (
//Load exit code into %rdi
"mov %0, %%rdi\n\t"
//Load system call number (group_exit)
"mov $231, %%rax\n\t"
//Linux syscall, 64-bit version.
"syscall\n\t"
//No output operands, single unrestricted input register, no clobbered registers because we're about to exit.
:: "g" ((uint64_t)code)
);
or better have your operand type straight away of the right size:
void immediate_exit(uint64_t code) { ...
I have the following code:
#include <stdio.h>
void main(){
int x=0, y=0,i=100;
for (;i<1000; i++,x+=32){
if (x == 25*32) {
y+=32;
asm volatile("pushl %%eax\n\t"
"movl $0, %%eax\n\t"
"popl %%eax\n\t"
:"=a"(x)
:"a"(0)
);
}
printf("%d %d\n", x, y);
}
}
Basically, what I want to do is set the variable x to 0. The code actually does set x to 0, but I don't quite understand what the assembly does or what's going on. Can somebody explain it? (Just for the sake of learning assembly and C.)
Here is what your asm construct says:
In "=a"(x), you tell the compiler that the assembly will write (=) to the %eax (a) register, and you want the compiler to assign that result to x ((x)).
In "a"(0), you tell the compiler you want it to put a 0 ((0)) in %eax (a), and the assembly will read it.
Then push %%eax saves %eax on the stack, movl $0, %%eax puts 0 in %eax, and popl %%eax restores the saved value to %eax.
So, what happens is:
The compiler puts 0 in %eax.
Your instructions save 0 on the stack, move 0 into %eax, and restore 0 from the stack.
The compiler uses the 0 in %eax as the value of x.
So, this works, but it is inefficient. You can get the same effect with this:
asm volatile("movl $0, %[MyName]"
: [MyName] "=r" (x)
);
What this says is:
There are no inputs (because the second “:” is absent).
As before, the = tells the compiler these instructions will write a result.
The r says the result will be written to a register, but the compiler gets to pick the register.
The [MyName] tells the compiler to change %[MyName], where it appears in the assembly code, to the name of the register the compiler picks.
As before, the (x) says to use the value in the register after the assembly code as the new value of x.
Finally, the instruction movl $0, %[MyName] says to move 0 to the register named by %[MyName].
Because the compiler gets to pick the register, you do not have to save and restore it in the assembly language. The compiler is responsible for making sure it does not need that register for anything else.
Being able to name operands as I have done with [MyName] was added in GCC 3.1, so very old versions don't support it. If yours doesn't, you can do this instead:
asm volatile("movl $0, %0"
: "=r" (x)
);
Without names, each operand gets a number, starting at 0, in the order the operands appear: outputs first, then inputs. Since we had only one operand, it was %0.
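With more than one operand, the numbering runs across the output list first, then the input list. A minimal sketch showing the same addition with named and numbered operands, assuming x86 and GCC/Clang; add_named and add_numbered are hypothetical names:

```c
static int add_named(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %[addend], %[sum]"
            : [sum] "+r"(a)       /* first output: would be %0 */
            : [addend] "r"(b)     /* first input: would be %1 */
            : "cc");
#else
    a += b;  /* portable fallback */
#endif
    return a;
}

static int add_numbered(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");  /* outputs first, then inputs */
#else
    a += b;  /* portable fallback */
#endif
    return a;
}
```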