Is there a way to insert assembly code into C? - c

I remember back in the day with the old Borland DOS compiler you could do something like this:
asm {
mov ax,ex
etc etc...
}
Is there a semi-platform independent way to do this now? I have a need to make a BIOS call, so if there was a way to do this without asm code, that would be equally useful to me.

Using GCC
__asm__("movl %edx, %eax\n\t"
"addl $2, %eax\n\t");
Using VC++
__asm {
mov eax, edx
add eax, 2
}

In GCC, there's more to it than that. In the asm statement itself, you have to tell the compiler which registers and memory your code reads and changes, so that its optimizer doesn't break the surrounding code. I'm no expert, but sometimes it looks something like this:
asm ("lock; xaddl %0,%2" : "=r" (result) : "0" (1), "m" (*atom) : "memory");
It's a good idea to write some sample code in C, then ask GCC to produce an assembly listing, then modify that code.
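As a minimal sketch of what "telling the compiler what changed" can look like, the hypothetical helper below (fetch_and_add is not from the original answer) wraps lock xadd and marks both operands as read-write with "+" constraints. It assumes an x86 target and a GCC-compatible compiler, and falls back to a compiler builtin on other targets.

```c
/* Atomically adds `delta` to *counter and returns the value that
   *counter held before the addition. */
int fetch_and_add(int *counter, int delta)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(delta),    /* %0: register, read and written */
                           "+m"(*counter)  /* %1: memory, read and written */
                         :                 /* no input-only operands */
                         : "memory", "cc");
    return delta;   /* xadd leaves the old memory value in the register */
#else
    /* Assumption: the GCC/Clang __atomic builtins are available. */
    return __atomic_fetch_add(counter, delta, __ATOMIC_SEQ_CST);
#endif
}
```

Calling fetch_and_add(&counter, 3) on a counter holding 5 should return 5 and leave 8 in the counter.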

A good start would be reading this article, which talks about inline assembly in C/C++:
http://www.codeproject.com/KB/cpp/edujini_inline_asm.aspx
Example from the article:
#include <stdio.h>
int main() {
/* Add 10 and 20 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"addl %ebx, %eax;"
);
/* Subtract 20 from 10 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"subl %ebx, %eax;"
);
/* Multiply 10 and 20 and store result into register %eax */
__asm__ ( "movl $10, %eax;"
"movl $20, %ebx;"
"imull %ebx, %eax;"
);
return 0 ;
}
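The article's example leaves its results in %eax, where C code cannot see them, and it never tells GCC that %eax and %ebx were modified. A hedged sketch of the same three operations with their operands declared follows. The function names asm_add, asm_sub and asm_mul are made up for illustration, and the code assumes x86 with a GCC-compatible compiler (plain C is used on other targets).

```c
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Returns a + b. "+r" marks `a` as both read and written. */
int asm_add(int a, int b)
{
#if HAVE_X86_ASM
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a = a + b;
#endif
    return a;
}

/* Returns a - b (AT&T syntax: subl src, dst computes dst - src). */
int asm_sub(int a, int b)
{
#if HAVE_X86_ASM
    __asm__("subl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a = a - b;
#endif
    return a;
}

/* Returns a * b. */
int asm_mul(int a, int b)
{
#if HAVE_X86_ASM
    __asm__("imull %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a = a * b;
#endif
    return a;
}
```

Because no register is named, GCC is free to pick any general-purpose registers, and the results come back as ordinary C values.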

For Microsoft compilers, inline assembly is supported only for x86. For other targets you have to define the whole function in a separate assembly source file, pass it to an assembler and link the resulting object module.
You're highly unlikely to be able to call into the BIOS under a protected-mode operating system and should use whatever facilities are available on that system. Even if you're in kernel mode it's probably unsafe - the BIOS may not be correctly synchronized with respect to OS state if you do so.

Use the asm or __asm__ keyword (the spelling differs between compilers).
You can also write Fortran code with the fortran function, where a compiler provides one:
asm("syscall");
fortran("Print *,"J");

Related

GCC doesn't push registers around my inline asm function call even though I have clobbers

I have a function (C) that modifies "ecx" (or any other registers)
int proc(int n) {
int ret;
asm volatile ("movl %1, %%ecx\n\t" // mov (n) to ecx
"addl $10, %%ecx\n\t" // add (10) to ecx (n)
"movl %%ecx, %0" /* ret = n + 10 */
: "=r" (ret) : "r" (n) : "ecx");
return ret;
}
Now I want to call this function from another function that moves a value into "ecx" before calling "proc":
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx"); /// mov (55) to ecx
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx"); // ecx is modified in proc function and the value of ecx is not 55 anymore even with "ecx" clobber
asm volatile ("addl %%ecx, %0" : "=r" (ret));
return ret;
}
In this function, 55 is moved into the "ecx" register and then "proc" is called, which modifies "ecx". In this situation "proc" must push "ecx" first and pop it at the end, but that does not happen!
This is the assembly output at the -O3 optimization level:
proc:
movl %edi, %ecx
addl $10, %ecx
movl %ecx, %eax
ret
main_proc:
movl $55, %ecx
call proc
addl %ecx, %eax
ret
Why doesn't GCC use push and pop for the "ecx" register? I used an "ecx" clobber too!
You are using inline asm completely wrong. Your input/output constraints need to fully describe the inputs / outputs of each asm statement. To get data between asm statements, you have to hold it in C variables between them.
Also, call isn't safe inside inline asm in general, and specifically in x86-64 code for the System V ABI it steps on the red-zone where gcc might have been keeping things. There's no way to declare a clobber on that. You could use sub $128, %rsp first to skip past the red zone, or you could make calls from pure C like a normal person so the compiler knows about it. (Remember that call pushes a return address.) Your inline asm doesn't even make sense; your proc takes an arg but you didn't do anything in the caller to pass one.
The compiler-generated code in proc could have also destroyed any other call-clobbered registers, so you at least need to declare clobbers on those registers. Or hand-write the whole function in asm so you know what to put in clobbers.
Why doesn't GCC use push and pop for the "ecx" register? I used an "ecx" clobber too!
An ecx clobber tells GCC that this asm statement destroys whatever GCC had in ECX previously. Using an ECX clobber in two separate inline-asm statements doesn't declare any kind of data dependency between them.
It's not equivalent to declaring a register-asm local variable like register int foo asm("ecx"); that you use as a "+r" (foo) operand to the first and last asm statement. (Or, more simply, an ordinary variable that you use with a "+c" constraint, which makes it pick ECX.)
From GCC's point of view, your source means only what the constraints + clobbers tell it.
int main_proc(int n) {
asm volatile ("movl $55, %%ecx" ::: "ecx");
// ^^ black box that destroys ECX and produces no outputs
int ret;
asm volatile ("call proc" : "=r" (ret) : "r" (n) : "ecx");
// ^^ black box that can take `n` in any register, and can produce `ret` in any reg. And destroys ECX.
asm volatile ("addl %%ecx, %0" : "=r" (ret));
// ^^ black box with no inputs that can produce a new value for `ret` in any register
return ret;
}
I suspect you wanted the last asm statement to be "+r"(ret) to read/write the C variable ret instead of telling GCC that it was output-only. Because your asm uses it as an input as well as output as the destination of an add.
It might be interesting to add comments like # %%0 = %0 %%1 = %1 inside your 2nd asm statement to see which registers the "=r" and "r" constraints picked. On the Godbolt compiler explorer:
# gcc9.2 -O3
main_proc:
movl $55, %ecx
call proc # %0 = %edi %1 = %edi
addl %ecx, %eax # "=r" happened to pick EAX,
# which happens to still hold the return value from proc
ret
That accident of picking EAX as the add destination might not happen after this function inlines into something else, or if GCC happens to put some compiler-generated instructions between the asm statements. (asm volatile is a barrier to compile-time reordering, but not a strong one. The only thing it definitely stops is the statement being optimized away entirely.)
Remember that inline asm templates are purely text substitution; asking the compiler to fill in an operand into a comment is no different from anywhere else in the template string. (Godbolt strips comment lines by default so sometimes it's handy to tack them onto other instructions, or onto a nop).
As you can see, this is 64-bit code (n arrives in EDI as per the x86-64 SysV calling convention, like how you built your code), so push %ecx wouldn't be encodeable. push %rcx would be.
Of course if GCC actually wanted to keep a value around past an asm statement with an "ecx" clobber, it would have just used mov %ecx, %edx or whatever other call-clobbered register that wasn't in the clobber list.
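To make the advice concrete, here is one possible rewrite of main_proc, offered as a sketch rather than the only fix. The call to proc is made from plain C, the value 55 lives in a C variable (k, a name introduced here), and the "c" constraint asks GCC to place it in ECX for the one instruction that needs it. It assumes x86 and a GCC-compatible compiler, with a plain C fallback elsewhere.

```c
/* Same result as the question's proc, without any inline asm. */
int proc(int n)
{
    return n + 10;
}

int main_proc(int n)
{
    int k = 55;          /* the value the question tried to park in ECX */
    int ret = proc(n);   /* ordinary call: the compiler handles the ABI */

#if defined(__i386__) || defined(__x86_64__)
    /* "+r": ret is read and written; "c": k must be in ECX for this asm. */
    __asm__("addl %1, %0" : "+r"(ret) : "c"(k) : "cc");
#else
    ret += k;
#endif
    return ret;
}
```

With these definitions main_proc(1) should return 66: proc(1) yields 11, and 55 is added to it. GCC keeps k alive across the call however it sees fit, because the data dependency is now visible to it.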

How to write multiple assembly statements within asm() without "\t\n" separating each line using GCC?

I've seen some textbooks write multiple assembly statements within asm() as:
asm("
movl $4, %eax
movl $2, %ebx
addl %eax, %ebx
...
");
However, my compiler (GCC) doesn't recognize this syntax. Instead, I must rely on "\t\n" separating each line or using multiple asm():
asm(
"movl $4, %eax\t\n"
"movl $2, %ebx\t\n"
"addl %eax, %ebx\t\n"
...);
or
asm("movl $4, %eax");
asm("movl $2, %ebx");
asm("addl %eax, %ebx");
...
How do I enable the "clean" syntax with no "\t\n" or repeated asm()?
GCC
Your inline assembly is ill-advised, since you alter registers without informing the compiler. You should use GCC's extended inline assembler with proper input and output constraints. Inline assembler should be a last resort, and you should understand exactly what you are doing. GCC's inline assembly is very unforgiving, as code that seems to work may not even be correct.
With that being said, ending each string with \n\t makes the generated assembler code look cleaner. You can see this by compiling with the -S option to generate the corresponding assembly code. You do have the option of using a ; (semicolon). This will separate each instruction but will output all of the instructions on the same assembler line. And yes, this matters: looking at the -S output is a good way to see how the compiler substituted operands into your asm template and put its own code around yours.
Another option is to use the C line continuation character \ (backslash). Although the following will generate excessive white space in the generated assembly code, it will compile and assemble as expected:
int main()
{
__asm__("movl $4, %%eax; \
movl $2, %%ebx; \
addl %%eax, %%ebx"
::: "eax", "ebx"); /* a clobber list makes this extended asm, so registers need %% */
}
Although this is a way of doing it, I'm not suggesting that this is good form. I have a preference for the form you use in your second example using \n\t without line continuation characters.
Regarding splitting up multiple instructions into separate ASM statements:
asm("movl $4, %eax");
asm("movl $2, %ebx"); // unsafe, no operands specifying connections
asm("addl %eax, %ebx");
This is problematic. The compiler can reorder these relative to one another since they are basic assembler with no dependencies. It is possible for a compiler to generate this code:
movl $4, %eax
addl %eax, %ebx
movl $2, %ebx
This of course would not generate the result you expect. When you place all the instructions in a single ASM statement they will be generated in the order you specify.
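A small sketch of the single-statement form with an output operand, so the result is also visible to C. The function name sum_4_and_2 is hypothetical, and the code assumes x86 with a GCC-compatible compiler (plain C is used on other targets). Adjacent string literals are concatenated by the compiler, and \n\t puts each instruction on its own line in the -S output.

```c
/* Computes 4 + 2 with two instructions inside ONE asm statement,
   so the compiler cannot reorder or separate them. */
int sum_4_and_2(void)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $4, %0\n\t"   /* result = 4 */
            "addl $2, %0"       /* result += 2 */
            : "=r"(result)      /* %0: write-only output, any register */
            :                   /* no inputs */
            : "cc");            /* addl changes the flags */
#else
    result = 4 + 2;
#endif
    return result;
}
```

The function should return 6. Since no fixed register is named, no register clobbers are needed.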
MSVC/C++
32-bit Microsoft C and C++ compilers support a language extension that allows you to place multi-line inline assembly between __asm { and }. Using this mechanism you don't place the inline assembly in a C string, don't need line continuation, and don't need to end each instruction with a ; (semicolon).
An example of this would be:
__asm {
mov eax, 4
mov ebx, 2
add ebx, eax
}
You can also just do...
int main()
{
__asm__(
"movl $4, %eax;"
"movl $2, %ebx;"
"addl %eax, %ebx;"
);
}

x86 add and addl operands are adding wrong?

I am working with xv6, which implements the original UNIX on x86 machines. I wrote some very simple inline assembly in a C program:
register int ecx asm ("%ecx");
printf(1, "%d\n", ecx);
__asm__("movl 16(%esp), %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("add $0, %ecx\t\n");
printf(1, "%d\n", ecx);
__asm__("movl %ecx, 16(%esp)\t\n");
I usually get a value like 434 printed by the second print statement. However, after the add command it prints 2. If I use the addl command instead, it also prints 2. I am using the latest stable version of xv6. So, I don't really suspect it to be the problem. Is there any other way I can add two numbers in inline assembly?
Essentially I need to increment 16(%esp) by 4.
Edited code to:
__asm__("addl $8, 16(%esp)\t\n");
1) In your example you're not incrementing ecx by 4, you're incrementing it by 0.
__asm__("addl $4, %ecx");
2) You should be able to chain multiple commands into one asm call
__asm__("movl 16(%esp), %ecx\n\t"
"addl $4, %ecx\n\t"
"movl %ecx, 16(%esp)");
3) The register keyword is a hint, and the compiler may still decide to put your variable wherever it wants. Also, the GCC documentation warns that function calls may clobber various registers. printf(), being a C function, may very well use the ecx register without preserving its value. It could preserve it, but it may not; the compiler could be using that register for all sorts of optimizations inside that call. It is a general-purpose register on the 80x86, and those are used for parameter passing and return values all the time.
Untested corrections:
int reg; // No register is named here, so GCC can pick the best available one.
/*
* volatile tells GCC that this inline assembly has side effects, so the
* optimizer must not delete it or hoist it out of loops.
*/
asm volatile ("movl 16(%%esp), %0\n\t" // literal registers need %% once operands are used
"addl $4, %0\n\t"
"movl %0, 16(%%esp)"
: "=r" (reg) // "=r": write-only output in any general-purpose register
: // no inputs
: "memory", "cc"); // the asm writes to the stack and changes the flags
printf("Result: %d\n", reg);
The GCC manual page has more details.
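If the goal is simply to add 4 to a value in memory, one alternative (a sketch, not the original poster's code) is to hand GCC the C object through a "+m" operand instead of hard-coding 16(%esp), so that the compiler supplies the address. The helper name add4 is hypothetical, and the code assumes x86 with a GCC-compatible compiler, falling back to plain C elsewhere.

```c
/* Adds 4 to the int that p points to, directly in memory. */
void add4(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl $4, %0"
            : "+m"(*p)   /* %0: memory operand, read and written */
            :            /* no inputs */
            : "cc");     /* addl changes the flags */
#else
    *p += 4;
#endif
}
```

For example, with int x = 434; a call to add4(&x) should leave 438 in x. No register is touched explicitly, so nothing has to survive a call to printf.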

Inline assembly, getting into interrupt

Good day.
I faced a problem that I couldn't solve for several days. The error appears when I try to compile this function in C language.
void GetInInterrupt(UChar Interrupt)
{
//asm volatile(".intel_syntax noprefix");
asm volatile
(
"movb %0, %%al\n"
"movb %%al, 1(point)\n"
"point:\n"
"int $0\n"
: /*output*/ : "r" (Interrupt) /*input*/ : /*clobbered*/
);
//asm volatile(".att_syntax noprefix");
}
The message I get from gas is the following:
Error: junk '(point)' after expression
As far as I can understand, the pointer in the second line is faulty, but unfortunately I can't solve it on my own.
Thank you for help.
If you can use C++, then this one:
template <int N> static inline void GetInInterrupt (void)
{
__asm__ ("int %0\n" : : "N"(N)); /* two colons: N is an input, not an output */
}
will do. If I use that template like:
GetInInterrupt<123>();
GetInInterrupt<3>();
GetInInterrupt<23>();
GetInInterrupt<0>();
that creates the following object code:
0: cd 7b int $0x7b
2: cc int3
3: cd 17 int $0x17
5: cd 00 int $0x0
which is pretty much optimal (even for the int3 case, which is the breakpoint op). It'll also create a compile-time warning if the operand is out of the 0..255 range, due to the N constraint allowing only that.
Edit: plain old C-style macros work as well, of course:
#define GetInInterrupt(arg) __asm__("int %0\n" : : "N"((arg)) : "cc", "memory")
creates the same code as the C++ templated function. Due to the way int behaves, it's a good idea to tell the compiler (via the "cc", "memory" clobbers) about the barrier semantics, to make sure it doesn't try to reorder instructions around the inline assembly.
The limitation of both is, obviously, that the interrupt number must be a compile-time constant. If you absolutely don't want that, then a switch() statement, generated e.g. with the help of BOOST_PP_REPEAT() and covering all 256 cases, is a better option than self-modifying code, i.e. like:
#include <boost/preprocessor/repetition/repeat.hpp>
#define GET_INTO_INT(a, INT, d) case INT: GetInInterrupt<INT>(); break;
void GetInInterrupt(int interruptNumber)
{
switch(interruptNumber) {
BOOST_PP_REPEAT(256, GET_INTO_INT, 0)
default:
runtime_error("interrupt Number %d out of range", interruptNumber);
}
}
This can be done in plain C (if you change the templated function invocation for a plain __asm__ of course) - because the boost preprocessor library does not depend on a C++ compiler ... and gcc 4.7.2 creates the following code for this:
GetInInterrupt:
.LFB0:
cmpl $255, %edi
jbe .L262
movl %edi, %esi
xorl %eax, %eax
movl $.LC0, %edi
jmp runtime_error
.p2align 4,,10
.p2align 3
.L262:
movl %edi, %edi
jmp *.L259(,%rdi,8)
.section .rodata
.align 8
.align 4
.L259:
.quad .L3
.quad .L4
[ ... ]
.quad .L258
.text
.L257:
#APP
# 17 "tccc.c" 1
int $254
# 0 "" 2
#NO_APP
ret
[ ... accordingly for the other vectors ... ]
Beware though if you do the above ... the compiler (gcc up to and including 4.8) is not intelligent enough to optimize the switch() away, i.e. even if you say static __inline__ ... it'll create the full jump table version of GetInInterrupt(3) instead of just an inlined int3 as would the simpler implementations.
Below shows how you could write to a location in the code. It assumes that the code is writable in the first place, which is typically not the case in mainstream OSes, since that would hide some nasty bugs.
void GetInInterrupt(UChar Interrupt)
{
//asm volatile(".intel_syntax noprefix");
asm volatile
(
"movb %0, point+1\n"
"point:\n"
"int $0\n"
: /*output*/ : "r" (Interrupt) /*input*/ : /*clobbered */
);
//asm volatile(".att_syntax noprefix");
}
I also simplified the code to avoid using two registers, and instead just use the register that Interrupt is already in. If the compiler moans about it, you may find that "a" instead of "r" solves the problem.

syscall from within GCC inline assembly [duplicate]

Is it possible to write a single character using a syscall from within an inline assembly block? If so, how? It should look something like this:
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" movl $80, %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
$80 is 'P' in ASCII, but this prints nothing.
any suggestions much appreciated!
You can use architecture-specific constraints to place the arguments directly in specific registers, without needing the movl instructions in your inline assembly. You can then use the & operator to get the address of the character:
#include <sys/syscall.h>
void sys_putc(char c) {
// write(int fd, const void *buf, size_t count);
int ret;
asm volatile("int $0x80"
: "=a"(ret) // outputs
: "a"(SYS_write), "b"(1), "c"(&c), "d"(1) // inputs
: "memory"); // clobbers
}
int main(void) {
sys_putc('P');
sys_putc('\n');
}
(Editor's note: the "memory" clobber is needed, or some other way of telling the compiler that the memory pointed to by &c is read. See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?")
(In this case, "=a"(ret) is needed to indicate that the syscall clobbers EAX. We can't list EAX as a clobber because we need an input operand to use that register. The "a" constraint is like "r" but can only pick AL/AX/EAX/RAX.)
$ cc -m32 sys_putc.c && ./a.out
P
You could also return the number of bytes written that the syscall returns, and use "0" as a constraint to indicate EAX again:
int sys_putc(char c) {
int ret;
asm volatile("int $0x80" : "=a"(ret) : "0"(SYS_write), "b"(1), "c"(&c), "d"(1) : "memory");
return ret;
}
Note that on error, the system call return value will be a -errno code like -EBADF (bad file descriptor) or -EFAULT (bad pointer).
The normal libc system call wrapper functions check for a return value of unsigned eax > -4096UL and set errno + return -1.
Also note that compiling with -m32 is required: the 64-bit syscall ABI uses different call numbers (and registers), but this asm is hard-coding the slow way of invoking the 32-bit ABI, int $0x80.
Compiling in 64-bit mode will get sys/syscall.h to define SYS_write with 64-bit call numbers, which would break this code. So would 64-bit stack addresses, even if you used the right numbers. See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" (short answer: don't do that).
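For comparison, a rough sketch of the native 64-bit path, assuming x86-64 Linux: the syscall instruction takes the call number in RAX and the first three arguments in RDI, RSI and RDX, and it overwrites RCX and R11. The wrapper name put_char_fd is hypothetical, and on any other target the sketch simply calls write() from libc.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Writes one character to fd. Returns 1 on success. On failure the asm
   path returns a negative errno value and the fallback returns -1. */
long put_char_fd(int fd, char c)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ __volatile__("syscall"
                         : "=a"(ret)               /* RAX: return value */
                         : "a"((long)SYS_write),   /* RAX: call number */
                           "D"((long)fd),          /* RDI: 1st argument */
                           "S"(&c),                /* RSI: 2nd argument */
                           "d"(1L)                 /* RDX: 3rd argument */
                         : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption: a POSIX write() is available on this target. */
    return (long)write(fd, &c, 1);
#endif
}
```

A call such as put_char_fd(1, 'P') writes the character to standard output. This version is built without -m32, because SYS_write then has the 64-bit call number that the syscall instruction expects.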
IIRC, two things are wrong in your example.
Firstly, you're writing to stdin with mov $0, %ebx
Second, write takes a pointer as its second argument, so to write a single character you need that character stored somewhere in memory. You can't put the value directly in %ecx.
ex:
.data
char: .byte 80
.text
mov $char, %ecx
I've only done pure asm on Linux, never inline asm with gcc. You can't drop data into the middle of the assembly, so I'm not sure how you'd get the pointer using inline assembly.
EDIT: I think I just remembered how to do it. You could push 'P' onto the stack and use %esp:
pushw $80
movl %%esp, %%ecx
... int $0x80 ...
addl $2, %%esp
Something like
char p = 'P';
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" leal p , %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
Add: note that I've used lea to Load the Effective Address of the char into ecx register; for the value of ebx I tried $0 and $1 and it seems to work anyway ...
To avoid the use of an external char:
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" subl $4, %%esp \n\t"
" movl $80, (%%esp)\n\t"
" movl %%esp, %%ecx \n\t"
" movl $1, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
" addl $4, %%esp\n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
N.B.: it works because Intel processors are little-endian: the 32-bit store puts the byte 80 at the lowest address, which is the one passed to write. :D
