Why we need Clobbered registers list in Inline Assembly? - c

In my guide book it says:
In inline assembly, Clobbered registers list is used to tell the
compiler which registers we are using (So it can empty them before
that).
Which I totally don't understand. Why does the compiler need to know this? What's the problem with leaving those registers as they are? Or did they mean instead that it backs them up and restores them after the assembly code?
I hope someone can provide an example, as I've spent hours reading about the clobbered registers list with no clear answer to this problem.

The problems you'd see from failing to tell the compiler about registers you modify would be exactly the same as if you wrote a function in asm that modified some call-preserved registers¹. See more explanation and a partial example in Why should certain registers be saved? What could go wrong if not?
In GNU inline-asm, all registers are assumed preserved, except for ones the compiler picks for "=r" / "+r" or other output operands. The compiler might be keeping a loop counter in any register, or anything else that it's going to read later and expect it to still have the value it put there before the instructions from the asm template. (With optimization disabled, the compiler won't keep variables in registers across statements, but it will when you use -O1 or higher.)
Same for all memory except for locations that are part of an "=m" or "+m" memory output operand. (Unless you use a "memory" clobber.) See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details.
Footnote 1:
Unlike for a function, you should not save/restore any registers with your own instructions inside the asm template. Just tell the compiler about it so it can save/restore at the start/end of the whole function after inlining, and avoid having any values it needs in them. In fact, in ABIs with a red-zone (like x86-64 System V) using push/pop inside the asm would be destructive: Using base pointer register in C++ inline asm
The design philosophy of GNU C inline asm is that it uses the same syntax as the compiler internal machine-description files. The standard use-case is for wrapping a single instruction, which is why you need early-clobber declarations if the asm code in the template string doesn't read all its inputs before it writes some registers.
The template is a black box to the compiler; it's up to you to accurately describe it to the optimizing compiler. Any mistake is effectively undefined behaviour, and leaves room for the compiler to mess up other variables in the surrounding code, potentially even in functions that call this one if you modify a call-preserved register that the compiler wasn't otherwise using.
That makes it impossible to verify correctness just by testing. You can't distinguish "correct" from "happens to work with this surrounding code and set of compiler options". This is one reason why you should avoid inline asm unless the benefits outweigh the downsides and risk of bugs. https://gcc.gnu.org/wiki/DontUseInlineAsm
GCC just does a string substitution into the template string, very much like printf, and sends the whole result (including the compiler-generated instructions for the pure C code) to the assembler as a single file. Have a look on https://godbolt.org/ sometime; even if you have invalid instructions in the inline asm, the compiler itself doesn't notice. Only when you actually assemble will there be a problem. ("binary" mode on the compiler-explorer site.)
See also https://stackoverflow.com/tags/inline-assembly/info for more links to guides.

Reasons a C Compiler Ignores register Declaration

What are the reasons behind a C compiler ignoring register declarations? I understand that this declaration is essentially meaningless for modern compilers since they store values in registers when appropriate. I'm taking a Computer Architecture class so it's important that I understand why this is the case for older compilers.
"A register declaration advises the compiler that the variable in
question will be heavily used. The idea is that register variables
are to be placed in machine registers, which may result in smaller and
faster programs. But compilers are free to ignore the advice." ~ The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie
Thank you!
Historical C compilers might only be "smart" (complex) enough to look at one C statement at a time, like modern TinyCC. Or parse a whole function to find out how many variables there are, then come back and still only do code-gen one statement at a time. For examples of how simplistic and naive old compilers were, see Why do C to Z80 compilers produce poor code? - some of the examples shown could have been optimized for the special simple case, but weren't. (Despite Z80 and 6502 being quite poor C compiler targets because (unlike PDP-11) "a pointer" isn't something you can just keep in a register.)
With optimization enabled, modern compilers do have enough RAM (and compile-time) available to use more complex algorithms to map out register allocation for the whole function and make good decisions anyway, after inlining. (e.g. transform the program logic into an SSA form.) See also https://en.wikipedia.org/wiki/Register_allocation
The register keyword becomes pointless; the compiler can already notice when the address of a variable isn't taken (something the register keyword disallows). Or when it can optimize away the address-taking and keep a variable in a register anyway.
TL:DR: Modern compilers no longer need hand-holding to fully apply the as-if rule in the ways the register keyword hinted at.
They basically always keep everything in registers except when forced to spill it back to memory, unless you disable optimization for fully consistent debugging. (So you can change variables with a debugger when stopped at a breakpoint, or even jump between statements.)
Fun fact: Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? shows an example where register float makes modern GCC and clang create more efficient asm with optimization disabled.

How can I write in (GNU) C a proxy function to interface two different calling conventions?

I'm writing an interpreter/compiler hybrid where the calling convention passes parameters on the CPU stack. Functions are simply pointers to machine code (like C function pointers) potentially generated at runtime. I need a proxy function to interface with the custom calling convention. I want to write as much as possible of this function in C, although necessarily some parts will have to be written in assembly. I will refer to this proxy function as apply.
I don't fully understand the semantics of GCC inline assembly and I would like to know if the following tentative implementation of a 1-ary apply function is correct, or where it goes wrong. In particular, I wonder about the integrity of the stack between the many __asm__ blocks: how does the compiler (GCC and clang in my case) interpret the stack pointer register being clobbered, and what are the consequences of that in the generated code? Does the compiler understand that I want to "own" the stack? Is the memory clobber necessary?
Through experimentation I found that clang with -fomit-frame-pointer correctly disables this optimization for a function when it sees the rsp register in a clobber list, since rsp is obviously no longer a reliable way of addressing local variables on the stack. This is not true of GCC, which as a consequence generates buggy code (this seems like a bug in GCC). So I guess this answers some of my questions. I can live with -fno-omit-frame-pointer, but it seems as if GCC doesn't consider the various implications of rsp being clobbered.
This is written for x86-64, although I am interested in eventually porting it to other architectures. We assume all registers are preserved across calls in the custom calling convention.
#define push(x) \
    __asm__ volatile ("pushq %0;" : : "g" (x) : "rsp", "memory")
#define pop(n) \
    __asm__ volatile ("addq %0, %%rsp;" : : "g" ((n) * 8) : "rsp", "memory")
#define call(f) \
    __asm__ volatile ("callq *%0;" : : "g" (f) : "cc", "memory")

void apply(void* f, void* x) {
    push(x);
    call(f);
    pop(1);
}
I think the -mno-red-zone flag is technically necessary to use the stack in the way I want. Is this correct?
The previous code assumes all registers are preserved across calls. But if there's a set of registers which aren't preserved, how should I reflect this in the code? I get the feeling that adding them to the call clobber list won't produce correct results because the registers may be pushed onto the top of the stack, shadowing the pushed x. If instead they are saved on a previously reserved area of the call frame, it may work. Is this the case? Can I rely on this behaviour? (Is it silly of me to hope so?)
Another option would be to manually preserve and restore these registers but I have a strong feeling this will only give the illusion of safety and break at some point.
I need a proxy function to interface with the custom calling convention. I want to write as much as possible of this function in C, although necessarily some parts will have to be written in assembly.
I'm sorry, this simply will not work. You must write the entire proxy function in assembly language.
More concretely -- I don't know about clang, but GCC assumes at a very basic level that nobody touches the stack pointer in inline assembly, ever. That doesn't mean it will error out -- it means it will blithely mis-optimize on the assumption that you didn't do that, even though you told it you did. This is not something that is likely ever to change; it's baked into the register allocator and all umpteen CPU back ends.
Now, the good news is, you may be able to persuade libffi to do what you want. It's got proxy functions which someone else has written in assembly language for you; if it fits your use case, it'll save you quite a bit of trouble.

__fastcall vs register syntax?

Currently I have a small function which gets called very very very often (looped multiple times), taking one argument. Thus, it's a good case for a __fastcall.
I wonder though.
Is there a difference between these two syntaxes:
void __fastcall func(CTarget *pCt);
and
void func(register CTarget *pCt);
After all, those two syntaxes basically tell the compiler to pass the argument in registers right?
Thanks!
__fastcall defines a particular convention.
It was first added by Microsoft to define a convention in which the first two arguments that fit in the ECX and EDX registers are placed in them (on x86; on x86-64 the keyword is ignored, though the convention used there already makes even heavier use of registers anyway).
Some other compilers also have a __fastcall or fastcall. GCC's is much like Microsoft's. Borland uses EAX, EDX & ECX.
Watcom recognises the keyword for compatibility, but ignores it and uses EAX, EDX, EBX & ECX regardless. Indeed, it was the belief that this convention was behind Watcom beating Microsoft on several benchmarks a long time ago that led to the invention of __fastcall in the first place. (So MS could produce a similar effect, while the default would remain compatible with older code.)
-mregparm can also be used with some compilers to change the number of registers used (some builds of the Linux kernel are compiled with the Intel compiler or GCC with -mregparm=3, so as to get a similar result to that of __fastcall on Borland).
It's worth noting that, the state of the art having moved on in many regards (the caching that happens in CPUs being particularly relevant), __fastcall may in fact be slower than some other conventions in some cases.
None of the above is standard.
Meanwhile, register is a standard keyword originally defined as "please put this in a register if possible" but more generally meaning "The address of this automatic variable or parameter will never be used. Please make use of this in optimising, in whatever way you can". This may mean en-registering the value, it may be ignored, or it may be used in some other compiler optimisation (e.g. the fact that the address cannot be taken means certain types of aliasing error can't happen with certain optimisations).
As a rule, it's largely ignored because compilers can tell if you took an address or not and just use that information (or indeed have a memory location, copy into a register for a bunch of work, then copy back before the address is used). Conversely, it may be ignored in function signatures just to allow conventions to remain conventions (especially if exported, then it would either have to be ignored, or have to be considered part of the signature; as a rule, it's ignored by most compilers).
And all of this becomes irrelevant if the compiler decides to inline, as there is then no real "argument passing" at all.
register is enforced, so it can serve as an assertion that you won't take the address; any attempt to do so is then a compile error.
Visual Studio 2012 Microsoft documentation regarding the register keyword:
The compiler does not accept user requests for register variables; instead, it makes its own register choices when global register-allocation optimization (/Oe option) is on. However, all other semantics associated with the register keyword are honored.
Visual Studio 2012 Microsoft documentation regarding the __fastcall keyword:
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. The following list shows the implementation of this calling convention.
You can still have a look at the assembler code created by the compiler to check what actually happens.
register is essentially meaningless in modern C/C++. Compilers ignore it, putting whichever variables in registers they want (and note that a given variable will often be in a register some of the time, and in the stack some of the time, during the function's execution). It has some minor utility in hinting non-aliasing, but using restrict (or a given compiler's equivalent to restrict) is a better way to achieve that.
__fastcall does improve performance slightly, though not as much as you'd expect. If you have a small function which is called often, the number one thing to do to improve performance is to inline it.
In short, it depends on your architecture and your compiler.
The main difference between these two syntaxes is that register is standardized and __fastcall isn't; and only __fastcall actually names a calling convention, while register is merely a hint to the compiler.
The default calling convention in C on 32-bit x86 is cdecl, where parameters are pushed onto the stack in reverse order and the return value is stored in the EAX register. EAX, ECX and EDX are caller-saved, so the callee is free to use them.
The register keyword, by contrast, does not change the calling convention at all; it only asks the compiler to keep the variable in a register if possible, and modern compilers mostly ignore it.
The __fastcall keyword isn't standardized, so its meaning depends entirely on your compiler. With cl (Visual Studio) it passes the first two arguments that fit into ECX and EDX, and it is ignored on the x86-64 and ARM architectures, whose default conventions already pass arguments in registers. With GCC on 32-bit x86, the first two arguments likewise go into ECX and EDX.
But keep in mind that compilers are able by themselves to optimize your code and greatly improve its speed, and I bet that for your function there is a better way to optimize it.
In particular, don't disable optimization just to observe the effect of these keywords; with optimization enabled they usually make little or no measurable difference.

Unable to understand following macro [duplicate]

This question already has answers here:
memory barrier and atomic_t on linux
(2 answers)
Closed 9 years ago.
I found the macro below while going through the kernel source code, and I am unable to understand what it does.
#define barrier() __asm__ __volatile__("":::"memory")
Can someone please clarify this?
This is a compiler memory barrier used to prevent the compiler from reordering instructions, if we look at the Wikipedia article on Memory ordering it says:
These barriers prevent a compiler from reordering instructions, they do not prevent reordering by CPU
The GNU inline assembler statement
asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering read and write commands around it.
You can find details of how this works in the Clobber List section of the GCC-Inline-Assembly-HOWTO and I quote:
[...]If our instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm. [...]
It's GCC inline assembly. However, there is no actual assembly in it (the first, empty string); only the specified side effects are relevant.
That is the "memory" clobber. It tells the compiler that the assembly accesses memory (and not just registers) and so the compiler must not reorder its own memory accesses across it to prevent reading old values or overwriting new values.
Thus it acts, as the macro name tells, as a compiler level memory barrier on the language level. It is not sufficient to prevent hardware based memory access reordering which would be necessary when DMA or other processors in a SMP machine would be involved.
The __volatile__ makes sure that the inline assembly is not optimized away or reordered with respect to other volatile statements. It is not strictly necessary since gcc assumes inline assembly without output to be volatile.
That is the implementation. Other memory barrier primitives and their documentation can be found in Documentation/memory-barriers.txt in the Linux kernel sources.
This macro doesn't "do" anything on a language level of C, but it does prevent the compiler from reordering code around this barrier.
If you have platform knowledge on how your generated code behaves in concurrent execution contexts, then you may be able to produce a correct program as long as you can prevent the compiler from changing the order of your instructions. This barrier is a building block in such platform-specific, concurrent code.
As an example, you might want to write some kind of lock-free queue, and you're relying on the fact that your architecture (x86?) already comes with a strongly ordered memory model, so your naive stores and loads imply sufficient synchronization, provided the emitted code follows the source code order. Pairing the platform guarantees with this compiler barrier allows you to end up with correct machine code (although it's of course undefined behaviour from the perspective of C).

Arbitrary code execution using existing code only

Let's say I want to execute an arbitrary mov instruction. I can write the following function (using GCC inline assembly):
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
And I can make functions like this one that will work on every possible register.
I mean -
void movl_value_to_ebx() { asm volatile("movl %0, %%ebx"::"m"(function_parameter):"%ebx"); }
void movl_value_to_ecx() { asm volatile("movl %0, %%ecx"::"m"(function_parameter):"%ecx"); }
...
In a similar way I can write functions that will move memory in arbitrary addresses into specific registers, and specific registers to arbitrary addresses in memory. (mov eax, [memory_address] and mov [memory_address],eax)
Now, I can perform these basic instructions whenever I want, so I can create other instructions. For example, to move a register to another register:
function_parameter = 0x028FC;
mov_eax_to_memory(); // parameter is a pointer to some temporary memory address
mov_memory_to_ebx(); // same parameter
So I can parse an assembly instruction and decide what functions to use based on it, like this:
if (sourceRegister == ECX) mov_ecx_to_memory();
if (sourceRegister == EAX) mov_eax_to_memory();
...
if (destRegister == EBX) mov_memory_to_ebx();
if (destRegister == EDX) mov_memory_to_edx();
...
If this works, it allows you to execute arbitrary mov instructions.
Another option is to make a list of functions to call and then loop through the list and call each function. Maybe it requires more tricks for making equivalent instructions like these.
So my question is this: Is it possible to do this for all (or some) of the possible opcodes? It probably requires writing a lot of functions, but is it possible to make a parser that will somehow build code based on given assembly instructions and then execute it, or is that impossible?
EDIT: You cannot change memory protections or write to executable memory locations.
It is really unclear to me why you're asking this question. First of all, this function...
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
...uses GCC inline assembly, but the function itself is not inline, meaning that there will be prologue & epilogue code wrapping it, which will probably affect your intended result. You may instead want to use GCC inline assembly functions (as opposed to functions that contain GCC inline assembly), which may get you closer to what you want, but there are still problems with that.....
OK, so supposing you write a GCC inline assembly function for every possible x86 opcode (at least the ones that the GCC assembler knows about). Now supposing you want to invoke those functions in arbitrary order to accomplish whatever you might wish to accomplish (taking into account which opcodes are legal to execute at ring 3 (or in whatever ring you're coding for)). Your example shows you using C statements to encode logic for determining whether to call an inline assembly function or not. Guess what: Those C statements are using processor registers (perhaps even EAX!) to accomplish their tasks. Whatever you wanted to do by calling these arbitrary inline assembly functions is being stomped on by the compiler-emitted assembly code for the logic (if (...), etc). And vice-versa: Your inline assembly function arbitrary instructions are stomping on the registers that the compiler-emitted instructions expect to not be stomped-on. The result is not likely to run without crashing.
If you want to write code in assembly, I suggest you simply write it in assembly & use the GCC assembler to assemble it. Alternatively, you can write whole C-callable assembly functions within an asm() statement, and call them from your C code, if you like. But the C-callable assembly functions you write need to operate within the rules of the calling convention (ABI) you're using: If your assembly functions use a callee-saved register, your function will need to save the original value in that register (generally on the stack), and then restore it before returning to the caller.
...OK, based on your comment Because if it's working it can be a way to execute code if you can't write it to memory. (the OS may prevent it)....
Of course you can execute arbitrary instructions (as long as they're legal for whatever ring you're running in). How else would JIT work? You just need to call the OS system call(s) for setting the permissions of the memory page(s) in which your instructions reside... change them to "executable" and then call 'em!