GCC Assembly Inline: Function Body with Only Inlined Assembly Code - c

I am trying to reuse some assembly code in my C project. Suppose I have a sequence of instructions, and I would like to organize them as a function:
void foo() {
__asm__ (
"mov %eax, %ebx\n"
"push %eax\n"
...
);
}
However, one obstacle is that in the compiled assembly for foo, the compiler also generates prologue instructions in addition to the inlined assembly, so the whole function becomes something like:
foo:
push %ebp <---- routine code generated by the compiler
mov %esp, %ebp <---- routine code generated by the compiler
mov %eax, %ebx
push %eax
Given my usage scenario, this routine code breaks the original semantics of the inlined assembly.
So here is my question: is there any way to prevent the compiler from generating the function prologue and epilogue, so that only the inlined assembly code is emitted?

You mention that you use gcc for compiling.
In this case you can compile with the -O2 optimization level. The optimizer then omits the stack frame setup where it can, so if your inline assembly is simple, no prologue or epilogue is inserted. This is not guaranteed in every case, because optimizations change between versions (my gcc does it at -O2).
Another option is to put the entire function (including the foo: label) inside an assembly block:
__asm__ (
"foo:\n"
"mov ..."
);
With this option you need to know the platform's name mangling rules, if any. You will also have to add .globl foo before the function label if you want the function to be non-static.
Lastly, you can look at the GCC __attribute__ ((naked)) function attribute. As MichaelPetch pointed out, it was not available for the x86 target at the time of writing; GCC added x86 support for it in version 8.
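The second option can be sketched as follows. This is only an illustration under stated assumptions: it targets x86-64 Linux and the System V calling convention (first integer argument in EDI, result in EAX), which differs from the 32-bit code in the question, and the name add_one is hypothetical. Other targets get a plain C fallback.

```c
/* Prototype so that C callers know the signature. */
int add_one(int x);

#if defined(__x86_64__) && defined(__linux__)
/* File-scope asm: the compiler adds no prologue or epilogue of its own.
   Registers are written with a single % here, unlike in extended asm. */
__asm__(
    ".pushsection .text\n"
    ".globl add_one\n"               /* omit this for a file-local symbol */
    ".type add_one, @function\n"
    "add_one:\n"
    "    leal 1(%rdi), %eax\n"       /* eax = x + 1 */
    "    ret\n"
    ".size add_one, .-add_one\n"
    ".popsection\n"
);
#else
/* Fallback for targets this sketch does not cover. */
int add_one(int x) { return x + 1; }
#endif
```

From C it is called like any other function, e.g. add_one(41).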

The whole point of inline asm code is to interface with the C compiler's scheduler and register allocator in a sane way, by giving you a way to specify how to hook up the assembly code to the compiler's constraint solving machinery. That's why it rarely makes sense to have inline asm code with specific registers in it; you instead want to use constraints to allocate some registers and have the compiler tell you what they are.
If you really want to write stand-alone asm code that communicates with the rest of your program through the system ABI, write that code in a separate .s (or .S) file that you include in your project, rather than trying to use inline asm code.
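As a minimal sketch of the constraint-based style described above, the following lets the compiler pick the registers instead of naming them. The function name add_ints is hypothetical, the asm path assumes an x86 target, and other targets fall back to plain C.

```c
/* Adds b to a without naming any register in the asm template. */
static inline int add_ints(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %1, %0"
            : "+r"(a)        /* %0: compiler-chosen register, read and written */
            : "r"(b)         /* %1: compiler-chosen register, read only */
            : "cc");         /* the add changes the flags */
    return a;
#else
    return a + b;
#endif
}
```

Because the operands are described by constraints, the compiler remains free to inline the function and to allocate whichever registers suit the surrounding code.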

Related

How does inline (x86) assembly affect the program flow?

I'm trying to understand how such snippets are invoked during run time:
__asm{
PUSH ES
MOV CX,0
//... More x86 assembly
};
Won't tweaking the registers corrupt the program flow execution?
For example: If CX above holds some value, wouldn't this mean that this register value will no longer be valid?
Does the compiler take care of these dependencies, or does the snippet execute under special circumstances?
On which compilers is the use of inline assembly not transparent?
GCC
In GCC you have to specify the affected registers explicitly to prevent corruption of the execution flow:
asm [volatile] ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])
While the compiler is aware of changes to entries listed in the output
operands, the inline asm code may modify more than just the
outputs.[...] calculations may require additional registers, [...]
list them in the clobber list.
Use the "memory" clobber argument if your code reads or writes items other than those already listed:
The "memory" clobber tells the compiler that the assembly code
performs memory reads or writes to items other than those listed in
the input and output operands
Reference: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
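To make the two kinds of clobber concrete, here is a small sketch, assuming an x86 target (with a plain C fallback elsewhere). The function names triple and zero_int are hypothetical and chosen only for illustration.

```c
/* Returns 3 * x, using ECX as a scratch register inside the asm. */
int triple(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    int result;
    __asm__("movl %1, %%ecx\n\t"      /* ecx = x  */
            "addl %%ecx, %%ecx\n\t"   /* ecx = 2x */
            "addl %1, %%ecx\n\t"      /* ecx = 3x */
            "movl %%ecx, %0"          /* result = ecx */
            : "=r"(result)
            : "r"(x)
            : "ecx", "cc");           /* ECX and the flags are overwritten */
    return result;
#else
    return 3 * x;
#endif
}

/* Stores 0 through p; *p is not an operand, hence the "memory" clobber. */
void zero_int(int *p)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("movl $0, (%0)"
                     :
                     : "r"(p)
                     : "memory");
#else
    *p = 0;
#endif
}
```

Listing "ecx" keeps the compiler from placing an operand or a live value in that register across the statement; "memory" keeps it from assuming values cached in registers still match memory afterwards.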
MSVC
In MSVC on the other hand, you don't need to preserve the general purpose registers:
When using __asm to write assembly language in C/C++ functions, you
don't need to preserve the EAX, EBX, ECX, EDX, ESI, or EDI registers. [...]
You should preserve other registers you use (such as DS, SS, SP, BP,
and flags registers) for the scope of the __asm block. You should
preserve the ESP and EBP registers unless you have some reason to
change them.
Reference: https://msdn.microsoft.com/en-us/library/k1a8ss06.aspx
EDITS: changed "should" to "have to" for GCC and added a note about the "memory" clobber argument, following Olaf's suggestions.
Additional information can be passed to an inline assembly statement. One item is the "clobber list", which tells the C/C++ compiler which registers the block of assembly code modifies.
Note that the way to specify this information depends on the compiler (it is completely different in Microsoft Visual C++, GCC, etc.).
For GCC, see for instance:
https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss5.3

Is it possible to convert C to asm without linking libc on Linux?

The test platform is 32-bit Linux (a solution for 32-bit Windows is also welcome).
Here is a C code snippet:
int a = 0;
printf("%d\n", a);
And if I use gcc to generate assembly code
gcc -S test.c
Then I will get:
movl $0, 28(%esp)
movl 28(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
This assembly code needs to be linked against libc to work (because of the call to printf).
My question is:
Is it possible to automatically convert C to asm that uses only explicit system calls, without libc?
Like this:
pop ecx
add ecx,host_msg-host_reloc
mov eax,4
mov ebx,1
mov edx,host_msg_len
int 80h
mov eax,1
xor ebx,ebx
int 80h
Directly call the int 80h software interrupt.
Is it possible? If so, is there any tool on this issue?
Thank you!
Not from that source code. A call to printf() cannot be converted by the compiler to a call to the write system call - the printf() library function contains a significant amount of logic which is not present in the system call (such as processing the format string and converting integer and floating-point numbers to strings).
It is possible to generate system calls directly, but only by using inline assembly. For instance, to generate a call to _exit(0) (not quite the same as exit()!), you would write:
#include <asm/unistd.h>
...
int retval;
asm volatile("int $0x80" : "=a" (retval) : "a" (__NR_exit_group), "b" (0) : "memory");
For more information on GCC inline assembly, particularly on the constraints I'm using here to map variables to registers, please read the GCC Inline Assembly HOWTO. It's rather old, but still perfectly relevant.
Note that doing this is not recommended. The exact calling conventions for system calls (e.g., which registers are used for the call number and arguments, how errors are returned, etc.) differ between architectures, between operating systems, and even between 32-bit and 64-bit x86. Code written this way is very difficult to maintain.
You can certainly compile C code to assembly without linking to libc, but you can't use the C library functions. Libc's entire purpose IS to provide the interface from C library functions to Linux system calls (or Windows, or whatever system you're on). So, if you didn't want to use libc, you would have to write your own wrappers to the system calls.
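A hand-written wrapper of that kind might look like the sketch below. It assumes x86-64 Linux and the `syscall` instruction rather than the 32-bit int 0x80 path discussed in the question; raw_write is a hypothetical name, and <sys/syscall.h> is used only for the SYS_write number. Other targets fall back to libc's write().

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__linux__)
#include <sys/syscall.h>    /* only for the SYS_write call number */

/* write(2) issued directly with the x86-64 syscall instruction. */
long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                 /* result or -errno in RAX */
                     : "a"((long)SYS_write),     /* call number in RAX */
                       "D"((long)fd),            /* argument 1 in RDI */
                       "S"(buf),                 /* argument 2 in RSI */
                       "d"(len)                  /* argument 3 in RDX */
                     : "rcx", "r11", "memory");  /* kernel overwrites RCX, R11 */
    return ret;
}
#else
#include <unistd.h>

/* Fallback for other targets: go through libc after all. */
long raw_write(int fd, const void *buf, size_t len)
{
    return (long)write(fd, buf, len);
}
#endif
```

For example, raw_write(1, "ok\n", 3) writes three bytes to standard output. Unlike the libc wrapper, the raw path reports failure as a negative errno value in the return value instead of setting errno.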
If you compile C code that uses no function from the C library (no printf, malloc, etc.) in GCC's freestanding mode (i.e. with the -ffreestanding flag), you will need either to call an assembler function (from some other object file or library) or to use asm statements, because you cannot do any kind of input or output without making a system call.
Read also the Assembly HowTo, the x86 calling conventions, and the ABI relevant to your kernel (probably the x86-64 ABI). Make sure you understand what system calls are, starting with syscalls(2), and what the VDSO is (int 80 is not the best way to make syscalls these days; SYSENTER is often better). Study the source code of some libc, in particular musl libc, whose source code is very readable.
On Windows (which is not free software and which I don't know) the question could be much more difficult: I am not sure that the system call level is exactly and completely documented.
libffi enables you to call arbitrary functions from C. You could also cast function pointers obtained from dlsym(3), or consider JIT techniques (e.g. libjit, GNU lightning, asmjit, etc.).

Arbitrary code execution using existing code only

Let's say I want to execute an arbitrary mov instruction. I can write the following function (using GCC inline assembly):
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
And I can make functions like this one that will work on every possible register.
I mean -
void movl_value_to_ebx() { asm volatile("movl %0, %%ebx"::"m"(function_parameter):"%ebx"); }
void movl_value_to_ecx() { asm volatile("movl %0, %%ecx"::"m"(function_parameter):"%ecx"); }
...
In a similar way I can write functions that move memory at arbitrary addresses into specific registers, and specific registers to arbitrary addresses in memory (mov eax, [memory_address] and mov [memory_address], eax).
Now, I can perform these basic instructions whenever I want, so I can create other instructions. For example, to move a register to another register:
function_parameter = 0x028FC;
mov_eax_to_memory(); // parameter is a pointer to some temporary memory address
mov_memory_to_ebx(); // same parameter
So I can parse an assembly instruction and decide what functions to use based on it, like this:
if (sourceRegister == ECX) mov_ecx_to_memory();
if (sourceRegister == EAX) mov_eax_to_memory();
...
if (destRegister == EBX) mov_memory_to_ebx();
if (destRegister == EDX) mov_memory_to_edx();
...
If this works, it lets you execute arbitrary mov instructions.
Another option is to build a list of functions to call, then loop through the list and call each one. It may take more tricks to build equivalent instructions this way.
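The "list of functions" idea can be sketched in plain C as below. As a deliberate simplification, the registers are simulated in a struct rather than being the real CPU registers, so this only illustrates the dispatch loop. The struct and the run_steps and copy_eax_to_ebx helpers are hypothetical.

```c
#include <stddef.h>

/* Simulated register file; real CPU registers are not touched. */
struct regs { unsigned eax, ebx, ecx, edx, scratch; };

typedef void (*step_fn)(struct regs *);

static void mov_eax_to_memory(struct regs *r) { r->scratch = r->eax; }
static void mov_memory_to_ebx(struct regs *r) { r->ebx = r->scratch; }

/* Runs each step in order, like walking a list of pre-built functions. */
void run_steps(struct regs *r, const step_fn *steps, size_t count)
{
    for (size_t i = 0; i < count; i++)
        steps[i](r);
}

/* "mov ebx, eax" expressed as two pre-built steps; returns the new ebx. */
unsigned copy_eax_to_ebx(unsigned eax_value)
{
    struct regs r = { eax_value, 0, 0, 0, 0 };
    const step_fn program[] = { mov_eax_to_memory, mov_memory_to_ebx };

    run_steps(&r, program, sizeof program / sizeof program[0]);
    return r.ebx;
}
```

A parser would fill the program array at run time instead of using a fixed initializer.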
So my question is this: is it possible to do such things for all (or some) of the possible opcodes? It would probably require writing a lot of functions, but is it possible to make a parser that builds code from given assembly instructions and then executes it, or is that impossible?
EDIT: You cannot change memory protections or write to executable memory locations.
It is really unclear to me why you're asking this question. First of all, this function...
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
...uses GCC inline assembly, but the function itself is not inline, meaning that there will be prologue and epilogue code wrapping it, which will probably affect your intended result. You may instead want to use GCC inline assembly functions (as opposed to functions that contain GCC inline assembly), which may get you closer to what you want, but there are still problems with that.
OK, so supposing you write a GCC inline assembly function for every possible x86 opcode (at least the ones that the GCC assembler knows about). Now supposing you want to invoke those functions in arbitrary order to accomplish whatever you might wish to accomplish (taking into account which opcodes are legal to execute at ring 3 (or in whatever ring you're coding for)). Your example shows you using C statements to encode logic for determining whether to call an inline assembly function or not. Guess what: Those C statements are using processor registers (perhaps even EAX!) to accomplish their tasks. Whatever you wanted to do by calling these arbitrary inline assembly functions is being stomped on by the compiler-emitted assembly code for the logic (if (...), etc). And vice-versa: Your inline assembly function arbitrary instructions are stomping on the registers that the compiler-emitted instructions expect to not be stomped-on. The result is not likely to run without crashing.
If you want to write code in assembly, I suggest you simply write it in assembly & use the GCC assembler to assemble it. Alternatively, you can write whole C-callable assembly functions within an asm() statement, and call them from your C code, if you like. But the C-callable assembly functions you write need to operate within the rules of the calling convention (ABI) you're using: If your assembly functions use a callee-saved register, your function will need to save the original value in that register (generally on the stack), and then restore it before returning to the caller.
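Here is a sketch of a C-callable function written entirely inside an asm() statement that follows the callee-saved rule. It assumes x86-64 Linux with the System V ABI, where RBX is callee-saved, the first integer argument arrives in EDI, and the result is returned in EAX; sum_to_n is a hypothetical name, and other targets use a plain C fallback.

```c
int sum_to_n(int n);   /* returns n + (n-1) + ... + 1, or 0 if n <= 0 */

#if defined(__x86_64__) && defined(__linux__)
__asm__(
    ".pushsection .text\n"
    ".globl sum_to_n\n"
    ".type sum_to_n, @function\n"
    "sum_to_n:\n"
    "    pushq %rbx\n"              /* RBX is callee-saved: save it first */
    "    xorl  %eax, %eax\n"        /* eax = 0 (running total) */
    "    movl  %edi, %ebx\n"        /* ebx = n (first argument) */
    "1:  testl %ebx, %ebx\n"
    "    jle   2f\n"                /* stop once ebx <= 0 */
    "    addl  %ebx, %eax\n"
    "    decl  %ebx\n"
    "    jmp   1b\n"
    "2:  popq  %rbx\n"              /* restore the caller's RBX */
    "    ret\n"
    ".size sum_to_n, .-sum_to_n\n"
    ".popsection\n"
);
#else
int sum_to_n(int n)
{
    int s = 0;
    while (n > 0)
        s += n--;
    return s;
}
#endif
```

Without the push/pop pair, a caller that keeps a live value in RBX across the call would see it silently changed.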
...OK, based on your comment: "Because if it's working it can be a way to execute code if you can't write it to memory. (the OS may prevent it)"...
Of course you can execute arbitrary instructions (as long as they're legal for whatever ring you're running in). How else would JIT work? You just need to call the OS system call(s) for setting the permissions of the memory page(s) in which your instructions reside... change them to "executable" and then call 'em!
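The JIT-style sequence above (write the instructions, change the page permissions, call them) can be sketched like this. It assumes a POSIX system with mmap/mprotect and an x86 CPU; the six bytes encode "mov $42, %eax; ret", and jit_return_42 is a hypothetical name. On other platforms, or if the OS refuses the permission change, it returns -1.

```c
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Returns 42 on success, -1 if the platform refuses or is unsupported. */
int jit_return_42(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(MAP_ANONYMOUS)
    /* Machine code for: mov $42, %eax ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* 1. Get a writable page and copy the instructions into it. */
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, code, sizeof code);

    /* 2. Flip the page from writable to executable. */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, 4096);
        return -1;
    }

    /* 3. Call it like any other function. */
    int (*fn)(void) = (int (*)(void))(uintptr_t)page;
    int result = fn();

    munmap(page, 4096);
    return result;
#else
    return -1;
#endif
}
```

This is exactly the step the question's EDIT rules out, so it applies only where the OS allows the mprotect call.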

Pushing a pointer into the eax and ebx registers in GCC

I need to push a pointer into the eax and another into the ebx register. I first solved this with:
register int eax asm("eax");
register int ebx asm("ebx");
int main()
{
eax = ptr1;
ebx = ptr2;
}
Which worked like a charm. However, when I added this to my other code, I got some strange errors about gcc being unable to find a register to spill in class AREG, in a totally unrelated part of the code. I googled, and it turns out to actually be a bug in gcc. So I need another way to put two pointers into the eax and ebx registers. Any ideas?
Edit:
Since people have been asking what I am trying to accomplish here, I thought I'd explain a bit.
I need to change eax and ebx for some assembly code I'm trying to run in my program. I need to execute this assembly code and give it a pointer to the parameter via the eax and ebx registers. I execute the assembly code by putting a pointer to it in ebx and then calling ebx. When I declare the register variables locally instead of globally, the assembly code crashes. If I declare them globally, I get this weird error at the end of a random function. When I remove that function, the same error appears at another random function, until I run out of functions; then it works, but I'm missing the rest of the code :P
If you have (inline) assembly code that requires specific parameters in EAX/EBX, the way to do this in gcc is to use the following:
__asm__("transmogrify %0, %1\n" : "+a"(val_for_eax), "+b"(val_for_ebx));
This uses what gcc calls inline assembly constraints which tell the compiler that the assembly code - whatever it is - expects val_for_eax/val_for_ebx in EAX/EBX (that's the a/b part) as well as that it will return potentially modified versions of these variables (that's the +) in these registers as well.
Beyond that, the actual code within the asm() statement doesn't matter to the compiler - it'll only need/want to know where the parameters %0 and %1 live. The above example will, due to a transmogrify instruction not existing in the current x86 instruction set, fail when the assembler runs; just substitute it with something valid.
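With a real instruction substituted for transmogrify, the same constraint pattern looks like the sketch below. It assumes an x86 target (plain C elsewhere); swap_via_eax_ebx is a hypothetical name, and xchgl is used only because it visibly reads and writes both registers.

```c
/* Swaps *x and *y; the asm sees the two values in EAX and EBX. */
void swap_via_eax_ebx(int *x, int *y)
{
    int a = *x, b = *y;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("xchgl %0, %1"
            : "+a"(a),      /* %0 is forced into EAX, read and written */
              "+b"(b));     /* %1 is forced into EBX, read and written */
#else
    int t = a;
    a = b;
    b = t;
#endif
    *x = a;
    *y = b;
}
```

The compiler loads a and b into EAX and EBX just before the statement and reads the results back from them afterwards; outside the statement both registers remain available to it.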
The explanation of why gcc behaves this way, and of exactly what you can tell it to do, is in the GCC manual, at:
Extended Assembly - Assembler Instructions with C operands
Constraints for asm operands, in particular the Intel/386 section of the Machine-specific Constraints list for what to say if you need to pass/retrieve a value in a specific register, and the Modifiers section about the meaning of things like the + (to both pass and return a value; there are other such "modifiers" to the constraints)
You can specify a specific register for a variable, but due to the way gcc works and the way inline assembly is implemented in gcc, doing so does not mean (!) the register is from then on reserved (out of scope) for gcc to use for its own purposes. That can only be achieved through constraints, for a specific, single asm() block: the constraints tell gcc what to write into those registers before the placement of the actual assembly code, and what to read from them afterwards.
Since the eax register is needed all over the place in a valid program on your architecture, your strategy can't work with global variables that are bound to specific registers. Don't do that; reserving a register globally is not a good idea.
Place the variables that are bound to registers in the particular function, as close as possible to their use.
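A function-local register variable might look like this sketch, which assumes GCC or a compatible compiler on an x86 target (plain C elsewhere); double_in_ebx is a hypothetical name. The binding to EBX is only relied upon where v is used as an operand of the asm statement.

```c
/* Doubles x, with the working value bound to EBX inside this function only. */
int double_in_ebx(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    register int v __asm__("ebx") = x;   /* local, declared next to its use */

    __asm__("addl %0, %0"                /* expands to: addl %ebx, %ebx */
            : "+r"(v)
            :
            : "cc");
    return v;
#else
    return x + x;
#endif
}
```

Because the variable is local, the compiler is free to use EBX for anything else in the rest of the program.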

How to integrate assembly code when I am designing a compiler in C?

I am designing a compiler in C, but for certain problems, such as big integers, I have to write assembly code. How can I integrate assembly code into C?
I am writing my code in Dev-C++, which I believe uses GCC, on Windows.
Please give instructions for Linux too.
using asm
Good article : GCC-Inline-Assembly-HOWTO
Use the 'asm' instruction, e.g.
asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
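For the big-integer case mentioned in the question, a slightly larger sketch with GCC extended asm is a 128-bit addition built from add and add-with-carry. It assumes an x86-64 target (portable C elsewhere); struct u128 and add128 are hypothetical names.

```c
#include <stdint.h>

struct u128 { uint64_t lo, hi; };

/* Adds two 128-bit values held as pairs of 64-bit words. */
struct u128 add128(struct u128 a, struct u128 b)
{
#if defined(__x86_64__)
    __asm__("addq %2, %0\n\t"      /* low words; sets the carry flag */
            "adcq %3, %1"          /* high words plus the carry */
            : "+&r"(a.lo),         /* written before %3 is read, hence & */
              "+r"(a.hi)
            : "r"(b.lo), "r"(b.hi)
            : "cc");
    return a;
#else
    struct u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
    return r;
#endif
}
```

The carry flag links the two instructions, which is the part that is awkward to express in portable C and the usual reason for dropping to assembly here.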
Don't you compile the runtime with your own compiler?
Note that another option is to use an external assembler (like AS). It is less optimal, but the principle is portable (though assembler syntaxes vary wildly).
Our own little compiler (which is GCC linking compatible) used AS for most of its assembler, and only acquired its own internal assembler after 8 years or so.
P.S. If you implement an internal assembler, have a look at NASM; its tables of assembler instructions and their addressing modes are really clean and can often be converted (and used for regular updates for new instructions).

Resources