I currently have a small function, taking one argument, which gets called very, very often (looped multiple times). Thus, it seems like a good case for __fastcall.
I wonder though.
Is there a difference between these two syntaxes:
void __fastcall func(CTarget *pCt);
void func(register CTarget *pCt);
After all, those two syntaxes basically tell the compiler to pass the argument in registers right?

__fastcall defines a particular convention.
It was first added by Microsoft to define a convention in which the first two arguments that fit in the ECX and EDX registers are placed in them. (That applies to x86; on x86-64 the keyword is ignored, though the convention used there already makes even heavier use of registers anyway.)
Some other compilers also have a __fastcall or fastcall. GCC's is much the same as Microsoft's. Borland uses EAX, EDX & ECX.
Watcom recognises the keyword for compatibility, but ignores it and uses EAX, EDX, EBX & ECX regardless. Indeed, it was the belief that this convention was behind Watcom beating Microsoft on several benchmarks a long time ago that led to the invention of __fastcall in the first place. (So MS could produce a similar effect, while the default would remain compatible with older code).
-mregparm can also be used with some compilers to change the number of registers used (some builds of the Linux kernel are on Intel or GCC but with -mregparm=3, so as to give a result similar to that of __fastcall on Borland).
It's worth noting that, the state of the art having moved on in many regards (the caching that happens in CPUs being particularly relevant), __fastcall may in fact be slower than some other conventions in some cases.
None of the above is standard.
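Because none of these keywords is standard, code that wants a register-based convention where one exists usually hides the spelling behind a macro. The following is only a minimal sketch: FASTCALL is a hypothetical macro name and CTarget is a stand-in struct, neither taken from the question. On targets where the keyword is ignored or unnecessary (x86-64, for instance) the macro expands to nothing and the default convention is used.

```c
/* FASTCALL is a hypothetical wrapper macro; CTarget is a stand-in type. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define FASTCALL __fastcall                 /* MSVC, 32-bit x86 */
#elif defined(__GNUC__) && defined(__i386__)
#  define FASTCALL __attribute__((fastcall))  /* GCC/Clang, 32-bit x86 */
#else
#  define FASTCALL                            /* elsewhere: default convention */
#endif

typedef struct CTarget {
    int hits;
} CTarget;

/* The convention must appear on the declaration so every caller agrees on it. */
void FASTCALL func(CTarget *pCt);

void FASTCALL func(CTarget *pCt)
{
    pCt->hits += 1;
}
```

Callers simply write func(&target); the convention is part of the function's type, so nothing changes at the call site.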
Meanwhile, register is a standard keyword originally defined as "please put this in a register if possible" but more generally meaning "The address of this automatic variable or parameter will never be used. Please make use of this in optimising, in whatever way you can". This may mean en-registering the value, it may be ignored, or it may be used in some other compiler optimisation (e.g. the fact that the address cannot be taken means certain types of aliasing error can't happen with certain optimisations).
As a rule, it's largely ignored because compilers can tell if you took an address or not and just use that information (or indeed have a memory location, copy into a register for a bunch of work, then copy back before the address is used). Conversely, it may be ignored in function signatures just to allow conventions to remain conventions (especially if exported, then it would either have to be ignored, or have to be considered part of the signature; as a rule, it's ignored by most compilers).
And all of this becomes irrelevant if the compiler decides to inline, as there is then no real "argument passing" at all.
register is enforced, so it can serve as an assertion that you won't take the address; any attempt to do so is then a compile error.
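As a small illustration of that "assertion" use in C, the sketch below marks two locals as register. The function name sum_ints is made up for the example. The commented-out line is the kind of statement a C compiler is required to diagnose; everything else compiles and behaves exactly as it would without the keyword.

```c
#include <stddef.h>

/* Sums n ints. `register` here only promises that the addresses of
   `total` and `i` are never taken; it does not force a CPU register. */
long sum_ints(const int *values, size_t n)
{
    register long total = 0;

    for (register size_t i = 0; i < n; ++i)
        total += values[i];

    /* long *p = &total;   <- rejected in C: address of a register variable */

    return total;
}
```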

Visual Studio 2012 Microsoft documentation regarding the register keyword:
The compiler does not accept user requests for register variables; instead, it makes its own register choices when global register-allocation optimization (/Oe option) is on. However, all other semantics associated with the register keyword are honored.
Visual Studio 2012 Microsoft documentation regarding the __fastcall keyword:
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. The following list shows the implementation of this calling convention.
You can still have a look at the assembler code created by the compiler to check what actually happens.

register is essentially meaningless in modern C/C++. Compilers ignore it, putting whichever variables in registers they want (and note that a given variable will often be in a register some of the time, and on the stack some of the time, during the function's execution). It has some minor utility in hinting non-aliasing, but using restrict (or a given compiler's equivalent to restrict) is a better way to achieve that.
__fastcall does improve performance slightly, though not as much as you'd expect. If you have a small function which is called often, the number one thing to do to improve performance is to inline it.
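A rough sketch of the inlining suggestion, assuming a stand-in CTarget struct and a made-up count_hits driver (neither is from the question). With the definition visible to the caller and marked static inline, an optimizing compiler is free to fold the body into the loop, at which point no argument is passed at all. Whether it actually does so is up to the compiler.

```c
/* CTarget is a stand-in type; the real one is not shown in the question. */
typedef struct CTarget {
    int hits;
} CTarget;

/* Small, hot function: defined where callers can see it so it can be inlined. */
static inline void func(CTarget *pCt)
{
    pCt->hits += 1;
}

/* Hypothetical hot loop that calls func() many times. */
int count_hits(int iterations)
{
    CTarget target = { 0 };

    for (int i = 0; i < iterations; ++i)
        func(&target);

    return target.hits;
}
```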

In short, it depends on your architecture and your compiler.
The main difference between these two syntaxes is that register is standardized and __fastcall isn't, but they are both calling conventions.
The default calling convention in C is the cdecl, where parameters are pushed into the stack in reverse order, and return value is stored on EAX register. Every data register can be used in the function, before the call they are caller-saved.
There is another convention, the fastcall, which is indicated by the register keyword. It passes arguments into EAX, ECX and EDX registers (the remaining args are pushed into the stack).
And the __fastcall keyword isn't standardized; it totally depends on your compiler. With cl (Visual Studio), it seems to store the first four arguments of your function in registers, except on x86-64 and ARM archs. With gcc, the first two arguments are stored in registers, regardless of the arch.
But keep in mind that compilers are able by themselves to optimize your code to greatly improve its speed. And I bet that for your function there is a better way to optimize your code.
But you need to disable optimisation to use these keywords (volatile as well), which is something I totally do not recommend.


Why we need Clobbered registers list in Inline Assembly?

In my guide book it says:
In inline assembly, Clobbered registers list is used to tell the
compiler which registers we are using (So it can empty them before
Which I totally don't understand: why does the compiler need to know this? What's the problem with leaving those registers as they are? Did they mean instead that the compiler backs them up and restores them after the assembly code?
I hope someone can provide an example, as I spent hours reading about the clobbered registers list without finding a clear answer to this problem.
The problems you'd see from failing to tell the compiler about registers you modify would be exactly the same as if you wrote a function in asm that modified some call-preserved registers (see footnote 1). See more explanation and a partial example in "Why should certain registers be saved? What could go wrong if not?"
In GNU inline-asm, all registers are assumed preserved, except for ones the compiler picks for "=r" / "+r" or other output operands. The compiler might be keeping a loop counter in any register, or anything else that it's going to read later and expect it to still have the value it put there before the instructions from the asm template. (With optimization disabled, the compiler won't keep variables in registers across statements, but it will when you use -O1 or higher.)
Same for all memory except for locations that are part of an "=m" or "+m" memory output operand. (Unless you use a "memory" clobber.) See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" for more details.
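To make the clobber list concrete, here is a minimal sketch for GCC or Clang on x86 (AT&T syntax); the function name add_via_scratch is made up for the example. The template deliberately uses ECX as scratch space, so "ecx" is listed as clobbered. That tells the compiler not to keep any live value in ECX across the statement and not to pick ECX for the "r" operands. Remove the clobber and the code may still appear to work, which is exactly the trap described above. On other compilers or architectures the sketch falls back to plain C.

```c
/* Adds two numbers, using ECX as a scratch register inside the template. */
static inline unsigned add_via_scratch(unsigned a, unsigned b)
{
    unsigned result;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %1, %%ecx\n\t"   /* ecx = a                        */
             "addl %2, %%ecx\n\t"   /* ecx += b (also writes flags)   */
             "movl %%ecx, %0"       /* result = ecx                   */
             : "=r"(result)         /* output: compiler picks a reg   */
             : "r"(a), "r"(b)       /* inputs: compiler picks regs    */
             : "ecx", "cc");        /* we overwrite ECX and the flags */
#else
    result = a + b;                 /* portable fallback */
#endif

    return result;
}
```

The output is written only after both inputs have been read, so no early-clobber modifier is needed in this particular template.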
Footnote 1:
Unlike for a function, you should not save/restore any registers with your own instructions inside the asm template. Just tell the compiler about it so it can save/restore at the start/end of the whole function after inlining, and avoid having any values it needs in them. In fact, in ABIs with a red-zone (like x86-64 System V) using push/pop inside the asm would be destructive; see "Using base pointer register in C++ inline asm".
The design philosophy of GNU C inline asm is that it uses the same syntax as the compiler internal machine-description files. The standard use-case is for wrapping a single instruction, which is why you need early-clobber declarations if the asm code in the template string doesn't read all its inputs before it writes some registers.
The template is a black box to the compiler; it's up to you to accurately describe it to the optimizing compiler. Any mistake is effectively undefined behaviour, and leaves room for the compiler to mess up other variables in the surrounding code, potentially even in functions that call this one if you modify a call-preserved register that the compiler wasn't otherwise using.
That makes it impossible to verify correctness just by testing. You can't distinguish "correct" from "happens to work with this surrounding code and set of compiler options". This is one reason why you should avoid inline asm unless the benefits outweigh the downsides and risk of bugs. https://gcc.gnu.org/wiki/DontUseInlineAsm
GCC just does a string substitution into the template string, very much like printf, and sends the whole result (including the compiler-generated instructions for the pure C code) to the assembler as a single file. Have a look at https://godbolt.org/ sometime; even if you have invalid instructions in the inline asm, the compiler itself doesn't notice. Only when you actually assemble will there be a problem. ("binary" mode on the compiler-explorer site.)
See also https://stackoverflow.com/tags/inline-assembly/info for more links to guides.

Why calling conventions aren't used in all C programs

I am new to programming, and while reading Charles Petzold's book Programming Windows I stumbled upon WINAPI (I was actually surprised by the presence of another word before a function's name besides the return type). I found that it is a calling convention, and to the best of my understanding that is the way a function pushes variables on the stack and gets the return value. I wondered why we do not use them in every C program. Are they exclusive to OS programming?
Calling conventions are typically tied to compiler, architecture and (when it comes to using system runtime libraries) the OS; they're not part of the C standard at all. In most cases, there's only one calling convention for a given architecture/compiler/OS combo, so you don't need to think about it; it just uses the only convention that OS supports.
The one place where it has mattered a lot in recent history was on 32-bit x86 systems, particularly on Windows. x86 had very few general-purpose registers, fewer than a typical function might need for its arguments, and using them for argument passing often meant pushing whatever they used to contain to the stack. So there were a lot of trade-offs involved in calling conventions (it's faster to pass arguments in registers, but only if the caller can spare the registers or at least is not forced into excessive spilling to the stack), and Windows went with "we'll use 'em all in different scenarios".
In modern usage on x86-64 (which is far less register starved) and on non-x86 architectures (which usually had enough registers), most compilers/OSes stick with a single common calling convention, so again, you don't need to pay attention. It's a curiosity, not something you need to personally pay attention to unless you're hand-writing whole functions in assembly.
We DO use calling conventions in all C programs, but they are typically defaulted in the compiler settings and so do not have to be expressed explicitly in code, unless actually necessary (library interactions, etc).
Calling conventions are not part of the C language itself, but are handled by compiler vendors as extensions to the language.
Most C compilers typically default to the __cdecl calling convention, but this can be changed by the compiler user if needed. Once upon a time, Windows APIs used __pascal, but for a very long time now __stdcall has been used instead. Hence the existence of the WINAPI preprocessor macro, so Microsoft could switch between them without requiring most existing code to be rewritten.

Shall I use register class variables in modern C programs?

In C++, the keyword register was removed in its latest standard ISO/IEC 14882:2017 (C++17).
But also in C, I see more and more coders who tend not to use, or do not like to declare, an object with the register storage-class specifier, because its purported benefit is said to be almost nil, as in @user253751's answer:
register does not cause the compiler to store a value in a register. register does absolutely nothing. Only extremely old compilers used register to know which variables to store in registers. New compilers do it automatically. Even 20-year-old compilers do it automatically.
Is the use of register class variables and with that the use of the keyword register deprecated?
Shall I use register class variables in my modern programs? Or is this behavior redundant and deprecated?
There is no benefit to using register. Modern compilers substantially ignore it — they can handle register allocation better than you can. The only thing it prevents is taking the address of the variable, which is not a significant benefit.
None of my own code uses register any more. The code I work on loses register when I get to work on a file — but it takes time to get through 17,000+ files (and I only change a file when I have an external reason to change it — but it can be a flimsy reason).
As @JonathanLeffler stated, it is ignored in most cases.
Some compilers have a special extension syntax if you want to keep a variable in a particular register.
gcc: a global or local variable can be placed in a particular register. This option is not available for all platforms; I know that the AVR and ARM ports implement it.
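/* GNU extension: x lives permanently in register "10" on the target. */
register int x asm ("10");
/* bar() and bar1() are assumed to be declared elsewhere. */
int foo(int y)
{
    x = bar(x);
    x = bar1(x);
    return x*x;
}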
More information: https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Explicit-Register-Variables.html#Explicit-Register-Variables
But to be honest, I have never used it in my programming life (30+ years).
It's effectively deprecated and offers no real benefit.
C is a product of the early 1970s, and the register keyword served as a hint to the compiler that a) this particular object was going to be used a lot, so b) you might want to store it somewhere other than main memory - IOW, a register or some other "fast" memory.
It may have made a difference then - now, it's pretty much ignored. The only measurable effect is that it prevents you from taking the address of that object.
First of all, this feature is NOT deprecated, because "register" in this context (global or local register variables) is a GNU extension, which is not deprecated.
In your example, R10 (or the register that GCC internally assigns REGNO(reg) = 10), is a global register. "global" here means, that all code in your application must agree on that usage. This is usually not the case for code from libraries like libc, libm or libgcc because they are not compiled with -ffixed-10. Moreover, global registers might conflict with the ABI. avr-gcc for example might pass values in R10. In avr-gcc, R2...R9 are not used by the ABI and not by code from libgcc (except for 64-bit double).
In some hard real-time app with avr-gcc I used global regs in a (premature) optimization, just to notice that the performance gain was minuscule.
Local register variables, however, are very handy when it comes to integrating non-ABI functions, for example assembly functions that don't comply with the GCC ABI, without the need for assembly wrappers.
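A hedged sketch of that local-register-variable pattern, written for GCC or Clang on x86-64 rather than AVR so it can be tried on a desktop. Everything here is hypothetical: the function name, and the pretend routine that takes its argument in RCX and returns its result in RDX. A single leaq instruction stands in for the routine's body; a real wrapper would call the routine and list whatever else it clobbers. On other targets the sketch falls back to plain C.

```c
/* Pretend an assembly routine expects its argument in RCX and leaves its
   result in RDX, which matches no standard C calling convention. */
static inline unsigned long call_non_abi_increment(unsigned long value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Local register variables pin the asm operands to specific registers. */
    register unsigned long arg __asm__("rcx") = value;
    register unsigned long ret __asm__("rdx");

    __asm__ ("leaq 1(%1), %0"   /* stand-in body: ret = arg + 1 */
             : "=r"(ret)        /* lands in RDX */
             : "r"(arg));       /* read from RCX */

    return ret;
#else
    return value + 1;           /* portable fallback */
#endif
}
```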

What is the calling convention that clang uses?

What is the default call convention that the clang compiler uses? I noticed that when I return a local pointer, the reference is not lost
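#include <stdio.h>
char *retx(void) {
    char buf[4] = "buf";
    return buf;          /* returns the address of a local: undefined behaviour */
}
int main(void) {
    char *p1 = retx();
    puts(p1);            /* reads the array after its lifetime has ended */
    return 0;
}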
This is Undefined Behaviour. It might happen to work, or it might not, depending on what the compiler happened to choose when compiling for some specific target. It's literally undefined, not "guaranteed to break"; that's the entire point. Compilers can just completely ignore the possibility of UB when generating code, not using extra instructions to make sure UB breaks. (If you want that, compile with -fsanitize=undefined).
Understanding exactly what happened requires looking at the asm, not just trying running it.
warning: address of stack memory associated with local variable 'buf' returned [-Wreturn-stack-address]
return buf;
Clang prints this warning even without -Wall enabled. Exactly because it's not legal C, regardless of what asm calling convention you're targeting.
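For contrast, two well-defined ways of getting the same string back to the caller are sketched below. The names retx_into and retx_static are made up for this example. The first has the caller supply the storage; the second returns a pointer to an object with static storage duration, which outlives the call.

```c
#include <stddef.h>
#include <stdio.h>

/* Caller owns the storage, so the pointer stays valid after the call returns. */
char *retx_into(char *dst, size_t dst_size)
{
    if (dst == NULL || dst_size == 0)
        return NULL;
    snprintf(dst, dst_size, "%s", "buf");   /* always NUL-terminates */
    return dst;
}

/* Static storage lasts for the whole program, so returning its address is fine. */
const char *retx_static(void)
{
    static const char buf[4] = "buf";
    return buf;
}
```

A caller would write char out[4]; puts(retx_into(out, sizeof out)); or simply puts(retx_static());.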
Clang uses the C calling convention of the target it's compiling for (see footnote 1). Different OSes on the same ISA can have different conventions, although outside of x86 most ISAs only have one major calling convention. x86 has been around so long that the original calling conventions (stack args with no register args) were inefficient, so various 32-bit conventions evolved. And Microsoft chose a different 64-bit convention from everyone else. So there's x86-64 System V, Windows x64, i386 System V for 32-bit x86, AArch64's standard convention, PowerPC's standard convention, etc.
I have tested with clang several times and every time I displayed the string
The "decision" / "luck" of whether it "works" or not is made at compile time, not runtime. Compiling / running the same source multiple times with the same compiler tells you nothing.
Look at the generated asm to find out where char buf[4] ends up.
My guess: maybe you're on Windows x64. Happening to work is more plausible there than most calling conventions, where you'd expect buf[4] to end up below the stack pointer in main, so the call to puts, and puts itself, would be very likely to overwrite it.
If you're on Windows x64 compiling with optimization disabled, retx()'s local char buf[4] might be placed in the shadow space it owns. The caller then calls puts() with the same stack alignment, so retx's shadow space becomes puts's shadow space.
And if puts happens not to write its shadow space, then the data in memory that retx stored is still there. e.g. maybe puts is a wrapper function that in turn calls another function, without initializing a bunch of locals for itself first. But not a tailcall, so it allocates new shadow space.
(But that's not what clang 8.0 does in practice with optimization disabled. It looks like buf[4] will be placed below RSP and get stepped on there, using __attribute__((ms_abi)) to get Windows x64 code-gen from Linux clang: https://godbolt.org/z/2VszYg)
But it's also possible in stack-args conventions where padding is left to align the stack pointer by 16 before a call. (e.g. modern i386 System V on Linux for 32-bit x86). puts() has an arg but retx() doesn't, so maybe buf[4] ended up in memory that the caller "allocates" as padding before pushing a pointer arg for puts.
Of course that would be unsafe because the data would be temporarily below the stack pointer, in a calling convention with no red-zone. (Only a few ABIs / calling conventions have red zones: memory below the stack pointer that's guaranteed not to be clobbered asynchronously by signal handlers, exception handlers, or debuggers calling functions in the target process.)
I wondered if enabling optimization would make it inline and happen to work. But no, I tested that for Windows x64: https://godbolt.org/z/k3xGe4. clang and MSVC both optimize away any stores of "buf\0" into memory. Instead they just pass puts a pointer to some uninitialized stack memory.
Code that breaks with optimization enabled is almost always UB.
Footnote 1: Except for x86-64 System V, where clang uses an extra undocumented "feature" of the calling convention: narrow integer types as function args in registers are assumed to be sign-extended to 32 bits. gcc and clang both do this when calling, but ICC does not, so calling clang functions from ICC-compiled code can cause breakage. See "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?"
Annex L of the C11 Draft N1570 recognizes some situations (i.e. "non-critical Undefined Behavior") where the Standard imposes no particular behavioral requirements but implementations that define __STDC_ANALYZABLE__ with a non-zero value should offer some guarantees, and other situations ("critical Undefined Behavior") where it would be common for implementations not to guarantee anything. Attempts to access objects past their lifetime would fall into the latter category.
While nothing would prevent an implementation from offering behavioral guarantees beyond what the Standard requires, even for Critical Undefined Behavior, and some tasks would require that implementations do so (e.g. many embedded systems tasks require that programs dereference pointers to addresses whose targets do not satisfy the definition for "objects"), accessing automatic variables past their lifetime is a behavior about which few implementations would offer any guarantees beyond perhaps guaranteeing that reading an arbitrary RAM address will have no side-effects beyond yielding an Unspecified value.
Even implementations that guaranteed how automatic objects will be laid out on the stack seldom guaranteed that the storage that held them wouldn't be overwritten between the time a function returned and the next action by the caller. Unless interrupts were disabled, interrupt handling could overwrite any storage that had been used by automatic objects that were no longer in a live stack frame.
While many implementations can be configured to offer useful guarantees about the behavior of actions for which the Standard imposes no requirements, I can't think of any implementations that can be configured to offer sufficient guarantees to make the above code usable.

From Compiler to assembler

I have a question regarding the assembler. I was thinking of how the C function that takes multiple parameters as an argument is transformed into assembly. So my question is, is there a subroutine in assembly that takes arguments as a parameter to operate?
The code might look something like this:
Call label1, R16.
Where R16 is the subroutine input parameter.
If that's not the case, then that means that EACH time the C function is called, it gets assembled into a subroutine with the parameters related to the specific call being substituted automatically in it. That basically means that whenever a C function is called, the compiler transforms it into an inline function, which I'm sure is not the case either :D
So which is right?
Thanks a lot! :)
The compiler uses a "calling convention" which can be specific to that one compiler for that one target architecture (x86, arm, mips, pdp-11, etc). For architectures with "plenty" of general purpose registers, the calling convention often starts with passing parameters in registers and then uses the stack. For architectures without a lot of registers, the stack is primarily, if not completely, used for parameter passing and the return.
The calling convention is a set of rules, such that if everyone follows the rules you can compile functions into objects and link them with other objects, and they will be able to call each other's functions or call themselves.
So it is a bit of a hybrid of what you were assuming. The code built for that function is in some respects custom to that function as the number and type of parameters dictate what registers or how much stack is consumed and how. At the same time all functions conform to the same formula so they look more alike than different.
On ARM, for example, you might have three integers being passed to a function. For all the ARM calling conventions I have seen, the first parameter in the C function comes in r0, the second in r1, and the third in r2. (Although the convention could vary across compilers, it often doesn't; in the case of ARM, MIPS and some others, they try to dictate the convention for everyone rather than leaving it to the compiler folks.) If the first parameter were a 64-bit integer, though, then r0 and r1 are used for that first parameter, r2 gets the second and r3 the third. After r3 you use the stack; the ordering of parameters on the stack is also dictated by the convention. So when a caller's and a callee's code are compiled using the same C prototype, both sides know exactly where to find the parameters and construct the assembly language to do that.
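The register assignments described above can be annotated on ordinary C functions. This is only an illustrative sketch with made-up function names; the comments restate the 32-bit ARM layout given in this answer, and other targets will place the same parameters elsewhere. The C source is identical either way, which is the point: the prototype alone tells both caller and callee where each argument lives.

```c
#include <stdint.h>

/* Three 32-bit integers: on 32-bit ARM, a arrives in r0, b in r1, c in r2. */
int32_t sum3(int32_t a, int32_t b, int32_t c)
{
    return a + b + c;
}

/* The first parameter is 64 bits wide: it occupies r0 and r1,
   so b moves to r2 and c to r3. */
int64_t sum3_wide(int64_t a, int32_t b, int32_t c)
{
    return a + b + c;
}

/* The caller is compiled against the same prototypes, so it loads the
   registers (or stack slots) the callees expect before each call. */
int64_t caller(void)
{
    return sum3(1, 2, 3) + sum3_wide(10, 20, 30);
}
```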
There might be some minimal options in some instruction sets, but in general that is not the case.
Some assemblers have macros though that mimic procedural calls (usually with only a few registrable basetypes).
And no: only in the case of inline functions is new code generated with the parameters substituted.
A compiler doesn't generate code for a procedure by textual substitution of parameters, but by putting all relevant parameters in registers or on the stack in a fixed regime called the "calling convention".
The code that calculates and loads the parameters (in registers or on the stack) is generated for each invocation, while the procedure/function itself remains unmodified and loads the parameters from where it knows it can find them.
