'asm' operand has impossible constraints - c

I'm trying to compile Xen on Arch Linux and getting the following error:
src/stacks.c:342:5: error: 'asm' operand has impossible constraints
asm volatile(
^
Here is the code for the function that causes the error:
void
run_thread(void (*func)(void*), void *data)
{
ASSERT32FLAT();
if (! CONFIG_THREADS || ! ThreadControl)
goto fail;
struct thread_info *thread;
thread = memalign_tmphigh(THREADSTACKSIZE, THREADSTACKSIZE);
if (!thread)
goto fail;
dprintf(DEBUG_thread, "/%08x\\ Start thread\n", (u32)thread);
thread->stackpos = (void*)thread + THREADSTACKSIZE;
struct thread_info *cur = getCurThread();
hlist_add_after(&thread->node, &cur->node);
asm volatile(
// Start thread
" pushl $1f\n" // store return pc
" pushl %%ebp\n" // backup %ebp
" movl %%esp, (%%edx)\n" // cur->stackpos = %esp
" movl (%%ebx), %%esp\n" // %esp = thread->stackpos
" calll *%%ecx\n" // Call func
// End thread
" movl %%ebx, %%eax\n" // %eax = thread
" movl 4(%%ebx), %%ebx\n" // %ebx = thread->node.next
" movl (%5), %%esp\n" // %esp = MainThread.stackpos
" calll %4\n" // call __end_thread(thread)
" movl -4(%%ebx), %%esp\n" // %esp = next->stackpos
" popl %%ebp\n" // restore %ebp
" retl\n" // restore pc
"1:\n"
: "+a"(data), "+c"(func), "+b"(thread), "+d"(cur)
: "m"(*(u8*)__end_thread), "m"(MainThread)
: "esi", "edi", "cc", "memory");
return;
fail:
func(data);
}
I'm not sure what's going on. Can someone with assembly knowledge look at it and tell me if there is some obvious problem here?
Update:
You can fix this error by doing two things:
1. Add COMMONCFLAGS += $(call cc-option,$(CC),-fstack-check=no,) to the SeaBIOS Makefile (if you build Xen from the git AUR package, the location should be xen/src/xen-4.5.1/tools/firmware/seabios-dir-remote/Makefile).
2. In stacks.c, change movl (%5), %%esp to movl %5, %%esp.

The immediate cause is probably that you don't have -fomit-frame-pointer enabled, either directly or indirectly through optimization switches. Thus, the compiler runs out of registers since eax, ebx, ecx and edx are used for arguments, esi and edi are clobbers and ebp is the frame pointer. The solution is therefore to make sure this option is enabled.
Apparently this code is part of SeaBIOS (thanks to Michael Petch for finding it). __end_thread there is simply a function, as opposed to the function pointer one would expect from the presence of that casting magic. As such, I think the point of this construct is to work around any possible name mangling. Unfortunately, it sacrifices a register for that purpose. If you know your environment does not mangle function names, you can use this simpler version, which doesn't need an extra register and should compile fine in debug builds with a frame pointer too:
asm volatile(
// Start thread
" pushl $1f\n" // store return pc
" pushl %%ebp\n" // backup %ebp
" movl %%esp, (%%edx)\n" // cur->stackpos = %esp
" movl (%%ebx), %%esp\n" // %esp = thread->stackpos
" calll *%%ecx\n" // Call func
// End thread
" movl %%ebx, %%eax\n" // %eax = thread
" movl 4(%%ebx), %%ebx\n" // %ebx = thread->node.next
" movl (%4), %%esp\n" // %esp = MainThread.stackpos
" call __end_thread\n" // call __end_thread(thread)
" movl -4(%%ebx), %%esp\n" // %esp = next->stackpos
" popl %%ebp\n" // restore %ebp
" retl\n" // restore pc
"1:\n"
: "+a"(data), "+c"(func), "+b"(thread), "+d"(cur)
: "m"(MainThread)
: "esi", "edi", "cc", "memory");

I'd completely rewrite the asm statement. The basic problem is that the statement either clobbers or uses as an input/output operand every register except EBP. When optimization is disabled and -fno-omit-frame-pointer is used, there isn't a register left to hold the result of evaluating the expression (u8*)__end_thread. That is a good thing, because if EBP were available the compiler would generate calll (%ebp), which isn't what is actually wanted here.
Instead of trying to assign all the registers and clobbering any that aren't used, the following asm statement makes every register except EBP an output operand. This gives the compiler much more freedom to assign input registers.
int dummy;
asm volatile("push $1f\n\t"
"push %%ebp\n\t"
"mov %%esp, %[cur_stackpos]\n\t"
"mov %[thread_stackpos], %%esp\n\t"
"call *%[func]\n\t"
"mov %p[mainthread_stackpos], %%esp\n\t"
"mov %[thread], %%eax\n\t"
"call %c[end_thread]\n\t"
"mov 4(%[thread]),%%eax\n\t"
"mov -4(%%eax),%%esp\n\t"
"pop %%ebp\n\t"
"pop %%eax\n\t"
"jmp *%%eax\n\t"
"1:\n"
:
[data] "+a" (data),
"=b" (dummy), "=c" (dummy), "=d" (dummy),
"=S" (dummy), "=D" (dummy)
:
[func] "r" (func),
[cur_stackpos] "m" (cur->stackpos),
[thread_stackpos] "rm" (thread->stackpos),
[mainthread_stackpos] "i" (&MainThread.stackpos),
[thread] "bSD" (thread),
[end_thread] "i" (__end_thread)
:
"memory", "cc");
I've used "i" constraints and operand modifiers for the [mainthread_stackpos] and [end_thread] operands to ensure that these operands are simple labels. The compiler can't put them in registers or on the stack. This is a bit of paranoia, using an "m" constraint without operand modifiers will also work. At least until the compiler does something unexpected like it did with *(u8*)__end_thread. Speaking of which, I've replaced it with just __end_thread as the cast and dereference seems to be pointless.
I've also replaced the ret instruction with pop %eax; jmp *%eax, as this should be faster. The ret will always be mispredicted because the address won't be in the return stack buffer, but there's at least a chance that jmp *%eax will be predicted. It either jumps to the next instruction or to the 1: label in switch_stacks.

Related

C Assembly : Return value from %eax beyond jump instruction error: expected ‘)’ before ‘:’ token

In the following C function:
#1
int check()
{
__asm__ __volatile__ (
<snip some activity that has a jump to not_supported>
"movl $1, %eax \n\t" \
"jmp done \n\t" \
"not_supported:\n\t" \
"movl $0, %eax \n\t" \
"done:\n\t"
);
}
Return value is stored in the eax register
This compiles fine on
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
But it complains elsewhere because -Werror is enforced:
error: no return statement in function returning non-void [-Werror=return-type]
So, to make it acceptable to gcc with -Werror, I added a stack variable as the return value to #1:
int check()
{
int ret_value =0;
__asm__ __volatile__ (
<snip some activity that has a jump to not_supported>
"movl $1, %0 \n\t" : "=a"(ret_value) :: \
"jmp done \n\t" \
"not_supported:\n\t" \
"movl $0, %0 \n\t" : "=a"(ret_value) :: \
"done:\n\t"
);
return ret_value;
}
gcc doesn't compile this even without -Werror:
: error: expected ‘)’ before ‘:’ token
"movl $0, %0 \n\t" : "=r"(ret_value) :: \
This complains at the first : in the movl instruction
I tried the "=r" register operand constraint for the output as well, but it still doesn't compile. I also tried explicitly listing "eax" as a clobbered register, but that doesn't help either.
It seems gcc complains about ret_value being modified after the jmp.
The other thing I tried was #1 with an extra mov from eax to ret_value, which logically didn't make sense to me.
(What I mean is adding a movl instruction after done: that moves the value of %eax to %0, which is ret_value.)
And that didn't compile either.
Is there anything I am missing here?
@margaretBloom's suggestion helped. I removed the multiple output operand lists and kept just one at the end. This made it compile, but here is what the disassembled output looks like:
0x000055555555519f <+53>: jmp 0x5555555551a6 <dummy_funct+60>
0x00005555555551a1 <+55>: mov $0x0,%eax
0x00005555555551a6 <+60>: mov %edx,-0x4(%rbp)
0x00005555555551a9 <+63>: mov -0x4(%rbp),%eax
0x00005555555551ac <+66>: pop %rbp
0x00005555555551ad <+67>: retq
Here the compiler writes the stack variable's contents to eax, overwriting the real output that was originally stored in eax.
After further exploration and bug fixes, the following works:
"movl $1, %%eax \n\t" \
"movl %%eax, %0 \n\t" \
"jmp done \n\t" \
"not_supported:\n\t" \
"movl $0, %%eax \n\t" \
"movl %%eax, %0 \n\t" \
"done:\n\t" \
: "=r" (ret_value)
:
:"%eax", "%ebx", "%ecx"
);
The trick was to record the result in eax and then copy it to the return value explicitly, along with appropriate clobbers. The other change was to give the instructions operand placeholders (%0) rather than hard-coding the output register.
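For comparison, here is a minimal sketch of the same branch-and-return shape with a single operand list after the whole template. The function check_nonzero and its test on x are stand-ins for the snipped feature check, not the original code. It assumes GCC or Clang on x86 and uses numeric local labels (1:, 2:), which avoid duplicate-symbol errors if the function is inlined more than once:

```c
/* Hypothetical stand-in for check(): returns 1 if x is nonzero, else 0. */
int check_nonzero(int x)
{
    int ret_value;
    __asm__ __volatile__ (
        "testl %1, %1\n\t"      /* sets ZF if x == 0 */
        "jz    1f\n\t"          /* forward jump to local label 1 */
        "movl  $1, %0\n\t"      /* supported */
        "jmp   2f\n"
        "1:\n\t"                /* not supported */
        "movl  $0, %0\n"
        "2:\n\t"                /* done */
        : "=r" (ret_value)      /* the only output list, after the template */
        : "r" (x)
        : "cc");                /* testl changes the flags */
    return ret_value;
}
```

Because the compiler picks the register for %0 and knows it holds ret_value, no explicit copy out of eax and no register clobbers are needed.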

GCC doesn't push registers around my inline asm function call even though I have clobbers

I have a function (C) that modifies "ecx" (or any other registers)
int proc(int n) {
int ret;
asm volatile ("movl %1, %%ecx\n\t" // mov (n) to ecx
"addl $10, %%ecx\n\t" // add (10) to ecx (n)
"movl %%ecx, %0" /* ret = n + 10 */
: "=r" (ret) : "r" (n) : "ecx");
return ret;
}
Now I want to call this function from another function, which moves a value into "ecx" before calling "proc":
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx"); /// mov (55) to ecx
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx"); // ecx is modified in proc function and the value of ecx is not 55 anymore even with "ecx" clobber
asm volatile ("addl %%ecx, %0" : "=r" (ret));
return ret;
}
In this function, 55 is moved into the "ecx" register and then "proc" is called (which modifies "ecx"). In this situation, "proc" must push "ecx" first and pop it at the end, but that doesn't happen!
This is the assembly output at the -O3 optimization level:
proc:
movl %edi, %ecx
addl $10, %ecx
movl %ecx, %eax
ret
main_proc:
movl $55, %ecx
call proc
addl %ecx, %eax
ret
why GCC is not going to use (push) and (pop) for "ecx" register ?? i used "ecx" clobber too !!!!!
You are using inline asm completely wrong. Your input/output constraints need to fully describe the inputs / outputs of each asm statement. To get data between asm statements, you have to hold it in C variables between them.
Also, call isn't safe inside inline asm in general, and specifically in x86-64 code for the System V ABI it steps on the red-zone where gcc might have been keeping things. There's no way to declare a clobber on that. You could use sub $128, %rsp first to skip past the red zone, or you could make calls from pure C like a normal person so the compiler knows about it. (Remember that call pushes a return address.) Your inline asm doesn't even make sense; your proc takes an arg but you didn't do anything in the caller to pass one.
The compiler-generated code in proc could have also destroyed any other call-clobbered registers, so you at least need to declare clobbers on those registers. Or hand-write the whole function in asm so you know what to put in clobbers.
why GCC is not going to use (push) and (pop) for "ecx" register ?? i used "ecx" clobber too !!!!!
An ecx clobber tells GCC that this asm statement destroys whatever GCC had in ECX previously. Using an ECX clobber in two separate inline-asm statements doesn't declare any kind of data dependency between them.
It's not equivalent to declaring a register-asm local variable like
register int foo asm("ecx"); that you use as a "+r" (foo) operand to the first and last asm statement. (Or more simply that you use with a "+c" constraint to make an ordinary variable pick ECX).
From GCC's point of view, your source means only what the constraints + clobbers tell it.
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx");
// ^^ black box that destroys ECX and produces no outputs
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx");
// ^^ black box that can take `n` in any register, and can produce `ret` in any reg. And destroys ECX.
asm volatile ("addl %%ecx, %0" : "=r" (ret));
// ^^ black box with no inputs that can produce a new value for `ret` in any register
return ret;
}
I suspect you wanted the last asm statement to be "+r"(ret) to read/write the C variable ret instead of telling GCC that it was output-only. Because your asm uses it as an input as well as output as the destination of an add.
It might be interesting to add comments like # %%0 = %0 %%1 = %1 inside your 2nd asm statement to see which registers the "=r" and "r" constraints picked. On the Godbolt compiler explorer:
# gcc9.2 -O3
main_proc:
movl $55, %ecx
call proc # %0 = %edi %1 = %edi
addl %ecx, %eax # "=r" happened to pick EAX,
# which happens to still hold the return value from proc
ret
That accident of picking EAX as the add destination might not happen after this function inlines into something else, or if GCC happens to put some compiler-generated instructions between the asm statements. (asm volatile is a barrier to compile-time reordering, but not a strong one. The only thing it definitely stops is optimizing the statement away entirely.)
Remember that inline asm templates are purely text substitution; asking the compiler to fill in an operand into a comment is no different from anywhere else in the template string. (Godbolt strips comment lines by default so sometimes it's handy to tack them onto other instructions, or onto a nop).
As you can see, this is 64-bit code (n arrives in EDI as per the x86-64 SysV calling convention, like how you built your code), so push %ecx wouldn't be encodeable. push %rcx would be.
Of course if GCC actually wanted to keep a value around past an asm statement with an "ecx" clobber, it would have just used mov %ecx, %edx or whatever other call-clobbered register that wasn't in the clobber list.

inline asm code organization

I have just written a few small inline asm routines to query the timestamp counter on x86 so that I can profile small portions of code. I would like to put those routines in a header so that I can reuse them in many different source files. My question is whether I should organize them as macros or make them inline functions.
My doubt with inline is that the compiler will not necessarily inline the function, and since this is a performance-sensitive call I would rather skip the function call overhead. With macros, on the other hand, type safety goes away, and I strictly need a 32-bit int for this. I could add the requirement in comments, but I try to avoid macros because of their many caveats. Here is the code:
inline void rdtsc(uint64_t* cycles)
{
uint32_t cycles_high, cycles_low;
asm volatile (
".att_syntax\n"
"CPUID\n\t" //Serialize
"RDTSC\n\t" //Read clock and cpuid
"mov %%edx, %0 \n\t"
"mov %%eax, %1 \n\t"
: "=r" (cycles_high), "=r" (cycles_low)
:: "%edx", "%eax");
*cycles = ((uint64_t) cycles_high << 32) | cycles_low;
}
Any suggestions on this are welcome. I am just trying to figure out what the preferred style would be for this kind of situation.
Since you will be measuring performance of portions of code, not necessarily always entire functions, you should not try to inline your performance counter.
It doesn't matter whether there's call overhead or not. What matters is that the measurement is consistent, which means you want the call overhead to be present either ALWAYS or NEVER.
The former is much easier to achieve than the latter.
Let every portion of your code have the same call overhead.
If you really need to serialize before reading the TSC, you could use the LFENCE instruction instead which doesn't alter registers.
If you decide to continue to use CPUID for serialization, you ought to set EAX first (probably to 0, since you're not really concerned about the output) and note that this instruction trashes the EAX, EBX, ECX and EDX registers, so your routine MUST account for this fact.
In all, I'd be inclined to write it like this:
#include <stdint.h>
#include <stdio.h>
inline uint64_t rdtsc() {
uint32_t high, low;
asm volatile (
".att_syntax\n\t"
"LFENCE\n\t"
"RDTSC\n\t"
"movl %%eax, %0\n\t"
"movl %%edx, %1\n\t"
: "=rm" (low), "=rm" (high)
:: "%edx", "%eax");
return ((uint64_t) high << 32) | low;
}
int main() {
uint64_t x, y;
x = rdtsc();
printf("%lu\n", x);
y = rdtsc();
printf("%lu\n", y);
printf("%lu\n", y-x);
}
update:
It's been proposed by @Jester and by @DavidWohlferd that one can eliminate the register allocations by assigning high and low directly to the edx and eax registers.
That version would look like this:
inline uint64_t rdtsc() {
uint32_t high, low;
asm volatile (
".att_syntax\n\t"
"LFENCE\n\t"
"RDTSC\n\t"
: "=a" (low), "=d" (high)
:: );
return ((uint64_t) high << 32) | low;
}
The resulting code (using gcc 4.8.3 on a 64-bit machine running Linux) using optimization -O2 and including up to the call to printf, is this:
#APP
# 20 "rdtsc.c" 1
.att_syntax
LFENCE
RDTSC
# 0 "" 2
#NO_APP
movq %rdx, %rbx
movl %eax, %eax
movl $.LC0, %edi
salq $32, %rbx
orq %rax, %rbx
xorl %eax, %eax
movq %rbx, %rsi
call printf
The version I originally posted results in this:
#APP
# 7 "rdtsc.c" 1
.att_syntax
LFENCE
RDTSC
movl %eax, %ecx
movl %edx, %ebx
# 0 "" 2
#NO_APP
movl %ecx, %ecx
salq $32, %rbx
movl $.LC0, %edi
orq %rcx, %rbx
xorl %eax, %eax
movq %rbx, %rsi
call printf
That version of the code is one instruction longer.

Copy content of C variable into a register (GCC)

Since I'm very new to GCC, I'm facing a problem in inline assembly code. The problem is that I'm not able to figure out how to copy the contents of a C variable (which is of type UINT32) into the register eax. I have tried the below code:
__asm__
(
// If the LSB of src is a 0, use ~src. Otherwise, use src.
"mov $src1, %eax;"
"and $1,%eax;"
"dec %eax;"
"xor $src2,%eax;"
// Find the number of zeros before the most significant one.
"mov $0x3F,%ecx;"
"bsr %eax, %eax;"
"cmove %ecx, %eax;"
"xor $0x1F,%eax;"
);
However mov $src1, %eax; doesn't work.
Could someone suggest a solution to this?
I guess what you are looking for is extended assembly e.g.:
int a=10, b;
asm ("movl %1, %%eax;"   /* eax = a */
"movl %%eax, %0;"        /* b = eax */
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
In the example above, we made the value of b equal to that of a using assembly instructions and eax register:
int a = 10, b;
b = a;
Please see the inline comments.
note:
mov $4, %eax // AT&T notation
mov eax, 4 // Intel notation
A good read about inline assembly in GCC environment.

syscall from within GCC inline assembly [duplicate]

Is it possible to write a single character using a syscall from within an inline assembly block? If so, how? It should look something like this:
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" movl $80, %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
$80 is 'P' in ASCII, but that prints nothing.
Any suggestions much appreciated!
You can use architecture-specific constraints to place the arguments directly in specific registers, without needing the movl instructions in your inline assembly. You can then use the & operator to get the address of the character:
#include <sys/syscall.h>
void sys_putc(char c) {
// write(int fd, const void *buf, size_t count);
int ret;
asm volatile("int $0x80"
: "=a"(ret) // outputs
: "a"(SYS_write), "b"(1), "c"(&c), "d"(1) // inputs
: "memory"); // clobbers
}
int main(void) {
sys_putc('P');
sys_putc('\n');
}
(Editor's note: the "memory" clobber is needed, or some other way of telling the compiler that the memory pointed to by &c is read. See: How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
(In this case, "=a"(ret) is needed to indicate that the syscall clobbers EAX. We can't list EAX as a clobber because we need an input operand to use that register. The "a" constraint is like "r" but can only pick AL/AX/EAX/RAX.)
$ cc -m32 sys_putc.c && ./a.out
P
You could also return the number of bytes written that the syscall returns, and use "0" as a constraint to indicate EAX again:
int sys_putc(char c) {
int ret;
asm volatile("int $0x80" : "=a"(ret) : "0"(SYS_write), "b"(1), "c"(&c), "d"(1) : "memory");
return ret;
}
Note that on error, the system call return value will be a -errno code like -EBADF (bad file descriptor) or -EFAULT (bad pointer).
The normal libc system call wrapper functions check for a return value of unsigned eax > -4096UL and set errno + return -1.
Also note that compiling with -m32 is required: the 64-bit syscall ABI uses different call numbers (and registers), but this asm is hard-coding the slow way of invoking the 32-bit ABI, int $0x80.
Compiling in 64-bit mode will get sys/syscall.h to define SYS_write with 64-bit call numbers, which would break this code. So would 64-bit stack addresses, even if you used the right numbers. See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" In short: don't do that.
IIRC, two things are wrong in your example.
First, you're writing to stdin with mov $0, %ebx.
Second, write takes a pointer as its second argument, so to write a single character you need that character stored somewhere in memory; you can't put the value directly in %ecx.
For example:
.data
char: .byte 80
.text
mov $char, %ecx
I've only done pure asm on Linux, never inline asm with gcc. You can't drop data into the middle of the assembly, so I'm not sure how you'd get the pointer using inline assembly.
EDIT: I think I just remembered how to do it. You could push 'P' onto the stack and use %esp:
pushw $80
movl %%esp, %%ecx
... int $0x80 ...
addl $2, %%esp
Something like
char p = 'P';
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" leal p , %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
Add: note that I've used lea to Load the Effective Address of the char into ecx register; for the value of ebx I tried $0 and $1 and it seems to work anyway ...
To avoid the use of an external char:
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" subl $4, %%esp \n\t"
" movl $80, (%%esp)\n\t"
" movl %%esp, %%ecx \n\t"
" movl $1, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
" addl $4, %%esp\n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
N.B.: it works because of the endianness of intel processors! :D
