Is it possible to write a single character using a syscall from within an inline assembly block? If so, how? It should look something like this:
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" movl $80, %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
$80 is 'P' in ASCII, but this prints nothing.
Any suggestions are much appreciated!
You can use architecture-specific constraints to place the arguments directly in specific registers, without needing the movl instructions in your inline assembly. Furthermore, you can then use the & operator to get the address of the character:
#include <sys/syscall.h>
void sys_putc(char c) {
// write(int fd, const void *buf, size_t count);
int ret;
asm volatile("int $0x80"
: "=a"(ret) // outputs
: "a"(SYS_write), "b"(1), "c"(&c), "d"(1) // inputs
: "memory"); // clobbers
}
int main(void) {
sys_putc('P');
sys_putc('\n');
}
(Editor's note: the "memory" clobber is needed, or some other way of telling the compiler that the memory pointed to by &c is read. See: How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
(In this case, =a(ret) is needed to indicate that the syscall clobbers EAX. We can't list EAX as a clobber because we need an input operand to use that register. The "a" constraint is like "r" but can only pick AL/AX/EAX/RAX. )
$ cc -m32 sys_putc.c && ./a.out
P
You could also return the number of bytes written that the syscall returns, and use "0" as a constraint to indicate EAX again:
int sys_putc(char c) {
int ret;
asm volatile("int $0x80" : "=a"(ret) : "0"(SYS_write), "b"(1), "c"(&c), "d"(1) : "memory");
return ret;
}
Note that on error, the system call return value will be a -errno code like -EBADF (bad file descriptor) or -EFAULT (bad pointer).
The normal libc system call wrapper functions check for a return value of unsigned eax > -4096UL and set errno + return -1.
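The return-value convention described above can be sketched in plain C. The helper name syscall_result is hypothetical and not part of any libc; it only mirrors the check that the wrappers are described as doing.

```c
#include <errno.h>

/* Hypothetical helper: turn a raw kernel return value into the libc
 * convention. Values in the top 4095 of the unsigned range encode -errno. */
static long syscall_result(unsigned long raw)
{
    if (raw > -4096UL) {        /* raw is one of -4095 .. -1 */
        errno = (int)-raw;      /* recover the positive errno code */
        return -1;
    }
    return (long)raw;           /* success: pass the value through */
}
```

With this, a raw return of -EBADF becomes -1 with errno set to EBADF, while a byte count such as 1 is passed through unchanged.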
Also note that compiling with -m32 is required: the 64-bit syscall ABI uses different call numbers (and registers), but this asm is hard-coding the slow way of invoking the 32-bit ABI, int $0x80.
Compiling in 64-bit mode will get sys/syscall.h to define SYS_write with 64-bit call numbers, which would break this code. So would 64-bit stack addresses, even if you used the right numbers. See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" In short: don't do that.
IIRC, two things are wrong in your example.
First, you're writing to stdin (file descriptor 0) with movl $0, %%ebx; stdout is file descriptor 1.
Second, write takes a pointer as its second argument, so to write a single character you need that character stored somewhere in memory; you can't put the value directly in %ecx.
ex:
.data
char: .byte 80
.text
mov $char, %ecx
I've only done pure asm on Linux, never inline asm with gcc. You can't drop data into the middle of the assembly, so I'm not sure how you'd get the pointer using inline assembly.
EDIT: I think I just remembered how to do it. You could push 'P' onto the stack and use %esp:
pushw $80
movl %%esp, %%ecx
... int $0x80 ...
addl $2, %%esp
Something like
char p = 'P';
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" leal p , %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
Add: note that I've used lea to Load the Effective Address of the char into ecx register; for the value of ebx I tried $0 and $1 and it seems to work anyway ...
To avoid the use of an external char:
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" subl $4, %%esp \n\t"
" movl $80, (%%esp)\n\t"
" movl %%esp, %%ecx \n\t"
" movl $1, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
" addl $4, %%esp\n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
N.B.: it works because of the endianness of intel processors! :D
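The endianness remark can be checked from C without any assembly. This is only an illustrative sketch; first_byte_of is a made-up helper name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the byte stored at the lowest address of a
 * 32-bit value, which is the byte write() sees when given a pointer to it
 * and a count of 1. */
static char first_byte_of(uint32_t value)
{
    char byte;
    memcpy(&byte, &value, 1);   /* copy only the lowest-addressed byte */
    return byte;
}
```

On a little-endian machine such as x86, first_byte_of(80) is 'P', which is why storing the 32-bit value 80 at (%esp) and writing one byte works. On a big-endian machine the first byte would be 0.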
I'm trying to compile xen on Arch linux and getting following error:
src/stacks.c:342:5: error: 'asm' operand has impossible constraints
asm volatile(
^
Here is code for the method that is causing error:
void
run_thread(void (*func)(void*), void *data)
{
ASSERT32FLAT();
if (! CONFIG_THREADS || ! ThreadControl)
goto fail;
struct thread_info *thread;
thread = memalign_tmphigh(THREADSTACKSIZE, THREADSTACKSIZE);
if (!thread)
goto fail;
dprintf(DEBUG_thread, "/%08x\\ Start thread\n", (u32)thread);
thread->stackpos = (void*)thread + THREADSTACKSIZE;
struct thread_info *cur = getCurThread();
hlist_add_after(&thread->node, &cur->node);
asm volatile(
// Start thread
" pushl $1f\n" // store return pc
" pushl %%ebp\n" // backup %ebp
" movl %%esp, (%%edx)\n" // cur->stackpos = %esp
" movl (%%ebx), %%esp\n" // %esp = thread->stackpos
" calll *%%ecx\n" // Call func
// End thread
" movl %%ebx, %%eax\n" // %eax = thread
" movl 4(%%ebx), %%ebx\n" // %ebx = thread->node.next
" movl (%5), %%esp\n" // %esp = MainThread.stackpos
" calll %4\n" // call __end_thread(thread)
" movl -4(%%ebx), %%esp\n" // %esp = next->stackpos
" popl %%ebp\n" // restore %ebp
" retl\n" // restore pc
"1:\n"
: "+a"(data), "+c"(func), "+b"(thread), "+d"(cur)
: "m"(*(u8*)__end_thread), "m"(MainThread)
: "esi", "edi", "cc", "memory");
return;
fail:
func(data);
}
I'm not sure what's going on. Can someone with assembly knowledge take a look and tell me if there is some obvious problem here?
Update:
You can fix this error by doing two things:
1. Add COMMONCFLAGS += $(call cc-option,$(CC),-fstack-check=no,) to the SeaBIOS makefile (if you build xen from the git AUR package, the location should be xen/src/xen-4.5.1/tools/firmware/seabios-dir-remote/Makefile)
2. Go to stacks.c and change movl (%5), %%esp to movl %5, %%esp
The immediate cause is probably that you don't have -fomit-frame-pointer enabled, either directly or indirectly through optimization switches. Thus, the compiler runs out of registers since eax, ebx, ecx and edx are used for arguments, esi and edi are clobbers and ebp is the frame pointer. The solution is therefore to make sure this option is enabled.
Apparently this code is part of SeaBIOS (thanks to Michael Petch for finding it). __end_thread there is simply a function, as opposed to the function pointer one would expect from the presence of that casting magic. As such, I think the point of this construct is to work around any eventual name mangling. Unfortunately, it sacrifices a register for that purpose. If you know that your environment does not mangle function names, you can use this simpler version, which doesn't need an extra register and should compile fine in debug builds with a frame pointer too:
asm volatile(
// Start thread
" pushl $1f\n" // store return pc
" pushl %%ebp\n" // backup %ebp
" movl %%esp, (%%edx)\n" // cur->stackpos = %esp
" movl (%%ebx), %%esp\n" // %esp = thread->stackpos
" calll *%%ecx\n" // Call func
// End thread
" movl %%ebx, %%eax\n" // %eax = thread
" movl 4(%%ebx), %%ebx\n" // %ebx = thread->node.next
" movl (%4), %%esp\n" // %esp = MainThread.stackpos
" call __end_thread\n" // call __end_thread(thread)
" movl -4(%%ebx), %%esp\n" // %esp = next->stackpos
" popl %%ebp\n" // restore %ebp
" retl\n" // restore pc
"1:\n"
: "+a"(data), "+c"(func), "+b"(thread), "+d"(cur)
: "m"(MainThread)
: "esi", "edi", "cc", "memory");
I'd completely rewrite the asm statement. The basic problem is that the statement either clobbers or uses as an input/output operand every register except EBP. When optimization is disabled and -fno-omit-frame-pointer is used, there isn't a register left to store the result of evaluating the expression (u8*)__end_thread. That is a good thing, because if EBP were available the compiler would generate calll (%ebp), which isn't what is actually wanted here.
Instead of trying to assign all the registers and clobbering any that aren't used, the following asm statement makes every register except EBP an output operand. This gives the compiler much more freedom to assign input registers.
int dummy;
asm volatile("push $1f\n\t"
"push %%ebp\n\t"
"mov %%esp, %[cur_stackpos]\n\t"
"mov %[thread_stackpos], %%esp\n\t"
"call *%[func]\n\t"
"mov %p[mainthread_stackpos], %%esp\n\t"
"mov %[thread], %%eax\n\t"
"call %c[end_thread]\n\t"
"mov 4(%[thread]),%%eax\n\t"
"mov -4(%%eax),%%esp\n\t"
"pop %%ebp\n\t"
"pop %%eax\n\t"
"jmp *%%eax\n\t"
"1:\n"
:
[data] "+a" (data),
"=b" (dummy), "=c" (dummy), "=d" (dummy),
"=S" (dummy), "=D" (dummy)
:
[func] "r" (func),
[cur_stackpos] "m" (cur->stackpos),
[thread_stackpos] "rm" (thread->stackpos),
[mainthread_stackpos] "i" (&MainThread.stackpos),
[thread] "bSD" (thread),
[end_thread] "i" (__end_thread)
:
"memory", "cc");
I've used "i" constraints and operand modifiers for the [mainthread_stackpos] and [end_thread] operands to ensure that these operands are simple labels. The compiler can't put them in registers or on the stack. This is a bit of paranoia: using an "m" constraint without operand modifiers will also work, at least until the compiler does something unexpected like it did with *(u8*)__end_thread. Speaking of which, I've replaced it with just __end_thread, as the cast and dereference seem to be pointless.
I've also replaced the ret instruction with pop %eax; jmp *%eax as this should be faster. The ret will always be mispredicted because the address won't be in the return stack buffer, but there's at least a chance that jmp *%eax will be predicted. It either jumps to the next instruction or to the 1: label in switch_stacks.
I have just written a few small inline asm routines to query the timestamp counter on x86 so that I can profile small portions of code. I would like to put those routines in a header so that I can reuse them in many different source files, so my question is whether I should write them as macros or as inline functions. My doubt with inline is that the compiler will not necessarily inline the function, and since this is a performance-sensitive call I would rather skip the function call overhead. On the other hand, with macros type safety goes away, and I strictly need a 32-bit int for this. I could state that requirement in comments, but I try to avoid macros because of their many caveats. Here is the code:
inline void rdtsc(uint64_t* cycles)
{
uint32_t cycles_high, cycles_low;
asm volatile (
".att_syntax\n"
"CPUID\n\t" //Serialize
"RDTSC\n\t" //Read clock and cpuid
"mov %%edx, %0 \n\t"
"mov %%eax, %1 \n\t"
: "=r" (cycles_high), "=r" (cycles_low)
:: "%edx", "%eax");
*cycles = ((uint64_t) cycles_high << 32) | cycles_low;
}
Any suggestions on this are welcome. I am just trying to figure out what the preferred style would be for this kind of situation.
Since you will be measuring the performance of portions of code, not necessarily always entire functions, you should not try to inline your performance counter.
It doesn't matter whether there's call overhead or not. What matters is that the measurement is consistent, which means you want the call overhead to be present either ALWAYS or NEVER.
The former is much easier to achieve than the latter.
Let every portion of your code have the same call overhead.
If you really need to serialize before reading the TSC, you could use the LFENCE instruction instead which doesn't alter registers.
If you decide to continue to use CPUID for serialization, you ought to set EAX first (probably to 0, since you're not really concerned about the output) and note that this instruction trashes the EAX, EBX, ECX and EDX registers, so your routine MUST account for this fact.
In all, I'd be inclined to write it like this:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
static inline uint64_t rdtsc(void) {
    uint32_t high, low;
    asm volatile (
        ".att_syntax\n\t"
        "LFENCE\n\t"
        "RDTSC\n\t"
        "movl %%eax, %0\n\t"
        "movl %%edx, %1\n\t"
        : "=rm" (low), "=rm" (high)
        :: "%edx", "%eax");
    return ((uint64_t) high << 32) | low;
}
int main(void) {
    uint64_t x, y;
    x = rdtsc();
    printf("%" PRIu64 "\n", x);
    y = rdtsc();
    printf("%" PRIu64 "\n", y);
    printf("%" PRIu64 "\n", y-x);
}
update:
It's been proposed by @Jester and by @DavidWohlferd that one can eliminate the register allocations by assigning high and low directly to the edx and eax registers.
That version would look like this:
static inline uint64_t rdtsc(void) {
uint32_t high, low;
asm volatile (
".att_syntax\n\t"
"LFENCE\n\t"
"RDTSC\n\t"
: "=a" (low), "=d" (high)
:: );
return ((uint64_t) high << 32) | low;
}
The resulting code (using gcc 4.8.3 on a 64-bit machine running Linux) using optimization -O2 and including up to the call to printf, is this:
#APP
# 20 "rdtsc.c" 1
.att_syntax
LFENCE
RDTSC
# 0 "" 2
#NO_APP
movq %rdx, %rbx
movl %eax, %eax
movl $.LC0, %edi
salq $32, %rbx
orq %rax, %rbx
xorl %eax, %eax
movq %rbx, %rsi
call printf
The version I originally posted results in this:
#APP
# 7 "rdtsc.c" 1
.att_syntax
LFENCE
RDTSC
movl %eax, %ecx
movl %edx, %ebx
# 0 "" 2
#NO_APP
movl %ecx, %ecx
salq $32, %rbx
movl $.LC0, %edi
orq %rcx, %rbx
xorl %eax, %eax
movq %rbx, %rsi
call printf
That version of the code is one instruction longer.
I have this function in C:
int write(int fd, char *buffer, int size)
{
int ret;
__asm__("mov $4, %%eax;"
"mov %0, %%ebx;"
"mov %1, %%ecx;"
"mov %2, %%edx;"
"int $0x80"
: "=r"(ret)
: "g"(fd), "g"(buffer), "g"(size)
: "eax", "ebx", "ecx", "edx");
if (ret < 0) {
return -1;
} else {
return 0;
}
}
Which translates to this code in ASM:
push %ebp
mov %esp,%ebp
push %esi
push %ebx
mov $0x4,%eax
mov %esi,%ebx
mov 0x8(%ebp),%ecx
mov 0xc(%ebp),%edx
int $0x80
mov %esi,%eax
sar $0x1f,%eax
pop %ebx
pop %esi
pop %ebp
ret
As fd, *buffer and size are function parameters, they are in 0x8(%ebp), 0xc(%ebp) and 0x10(%ebp), respectively. Why does GCC identify the position of fd in %esi, and the other two variables shifted in the stack? How can I get this function to run (get the variables in the registers properly)?
This is probably the calling convention on your architecture. If you want to constrain the other parameters to registers, you should use the register constraints directly: e.g. "a" stands for the eax register.
Also the $4 at the beginning looks wrong to me.
Something along the lines of
__asm__(
"int $0x80"
: "=b"(ret)
: "c"(fd), "d"(buffer), "a"(size)
);
should do, if these are really the registers that your syscall uses.
But on the whole, I think you shouldn't do this yourself. Your OS certainly has something like syscall that provides that functionality to you.
Arguments to inline assembler are numbered from zero starting with the outputs, so %0 is ret, %1 is fd, %2 is buffer and %3 is size.
Since I'm very new to GCC, I'm facing a problem in inline assembly code. The problem is that I'm not able to figure out how to copy the contents of a C variable (which is of type UINT32) into the register eax. I have tried the below code:
__asm__
(
// If the LSB of src is a 0, use ~src. Otherwise, use src.
"mov $src1, %eax;"
"and $1,%eax;"
"dec %eax;"
"xor $src2,%eax;"
// Find the number of zeros before the most significant one.
"mov $0x3F,%ecx;"
"bsr %eax, %eax;"
"cmove %ecx, %eax;"
"xor $0x1F,%eax;"
);
However mov $src1, %eax; doesn't work.
Could someone suggest a solution to this?
I guess what you are looking for is extended assembly e.g.:
int a=10, b;
asm ("movl %1, %%eax;" /* eax = a */
     "movl %%eax, %0;" /* b = eax */
     :"=r"(b) /* output */
     :"r"(a) /* input */
     :"%eax" /* clobbered register */
);
In the example above, we made the value of b equal to that of a using assembly instructions and eax register:
int a = 10, b;
b = a;
Please see the inline comments.
note:
mov $4, %eax // AT&T notation
mov eax, 4 // Intel notation
A good read about inline assembly in GCC environment.
I remember back in the day with the old borland DOS compiler you could do something like this:
asm {
mov ax,ex
etc etc...
}
Is there a semi-platform independent way to do this now? I have a need to make a BIOS call, so if there was a way to do this without asm code, that would be equally useful to me.
Using GCC
__asm__("movl %edx, %eax\n\t"
"addl $2, %eax\n\t");
Using VC++
__asm {
mov eax, edx
add eax, 2
}
In GCC, there's more to it than that. In the instruction, you have to tell the compiler what changed, so that its optimizer doesn't screw up. I'm no expert, but sometimes it looks something like this:
asm ("lock; xaddl %0,%2" : "=r" (result) : "0" (1), "m" (*atom) : "memory");
It's a good idea to write some sample code in C, then ask GCC to produce an assembly listing, then modify that code.
A good start would be reading this article, which talks about inline assembly in C/C++:
http://www.codeproject.com/KB/cpp/edujini_inline_asm.aspx
Example from the article:
#include <stdio.h>
int main() {
/* Add 10 and 20 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"addl %ebx, %eax;"
);
/* Subtract 20 from 10 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"subl %ebx, %eax;"
);
/* Multiply 10 and 20 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"imull %ebx, %eax;"
);
return 0 ;
}
For Microsoft compilers, inline assembly is supported only for x86. For other targets you have to define the whole function in a separate assembly source file, pass it to an assembler and link the resulting object module.
You're highly unlikely to be able to call into the BIOS under a protected-mode operating system and should use whatever facilities are available on that system. Even if you're in kernel mode it's probably unsafe - the BIOS may not be correctly synchronized with respect to OS state if you do so.
Use the asm or __asm__ keyword (support differs between compilers).
also you can write fortran codes with fortran function
asm("syscall");
fortran("Print *,"J");