Using register in inline assembler - c

I'm optimizing C code for OpenRISC and I want to manually reserve registers for some computed values. The pseudocode looks like this:
external loop
compute eight values (heavy calculations)
internal loop
use values computed above
When I looked at the GCC ABI for OpenRISC I saw two groups of registers: callee-saved and temporary. Which registers should I use to store these eight values? I mean, which registers can I put on the clobber list in inline asm?
I need to hardcode registers, because we run executables on a custom OpenRISC.

The answer is: whatever you like.
If you use callee-save registers then the compiler will save them for you (as long as you do mark them as clobbered).
If you use temporary registers (a.k.a. caller-save) then the compiler will be forced to save them if you make a function call. Beware that the compiler also prefers to use these for other variables, so if you use up the caller-save ones it'll have to use callee-save for other things, so it might end up being much the same difference.
At the end of the day, if you are doing heavy calculations then saving a few registers to stack before you start is not going to be a big deal.
There are some registers that contain important values (such as the stack pointer) that you must not overwrite. Others, such as the GOT pointer, are less important, and the compiler will restore the value when you're done (just be sure you don't need it during the process).
Really though, you don't need to work it out for yourself: the compiler can select registers for you:
int a, b, c;
asm volatile ("whatever" : "=&r" (a), "=&r" (b), "=&r" (c));
The variables are not needed, but they must have registers assigned, so they effectively reserve a register for whatever you want. The & indicates an "early-clobber", which means that they can't share the same register as an input register (not that my example shows any).
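To make the early-clobber point concrete, here is a minimal sketch that adds an input operand to the idea above. It assumes GCC or clang extended asm and the generic "r" (any general-purpose register) constraint. The function name with_scratch_registers is made up for illustration, and the template is left empty so the sketch builds on any target.

```c
/* Hypothetical helper, not from the answer above: asks the compiler for two
 * scratch registers that the asm body would be free to overwrite. */
static int with_scratch_registers(int input)
{
    int scratch0, scratch1;   /* dummies: only their registers matter */

    __asm__ volatile (
        ""   /* real instructions would refer to %0, %1 and %2 here */
        : "=&r" (scratch0), "=&r" (scratch1)   /* early-clobber outputs */
        : "r" (input)                          /* gets a register distinct from %0 and %1 */
    );

    (void)scratch0;   /* the dummy values are never read */
    (void)scratch1;
    return input;
}
```

Because of the &, the compiler will not place input in the same register as either scratch output, so the asm body could overwrite the scratch registers before it has finished reading the input.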

Related

How does Google's `DoNotOptimize()` function enforce statement ordering

I'm trying to understand exactly how Google's DoNotOptimize() is supposed to work.
For completeness, here is its definition (for clang, and non-const data):
template <class Tp>
inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp& value) {
asm volatile("" : "+r,m"(value) : : "memory");
}
As I understand we can use this in code like this:
start_time = time();
bench_output = run_bench(bench_inputs);
result = time() - start_time;
To ensure that the benchmark stays in the critical section:
start_time = time();
DoNotOptimize(bench_inputs);
bench_output = run_bench(bench_inputs);
DoNotOptimize(bench_output);
result = time() - start_time;
Specifically what I don't understand is why this guarantees (does it?) that run_bench() is not moved above start_time = time().
(Someone asked exactly this in this comment, however I don't understand the answer).
As I understand, the above DoNotOptimize() does several things:
It forces value to the stack, as it is passed by C++ reference. You can't have a pointer to a register, so it must be in memory.
Because value is now on the stack, subsequently clobbering memory (as done in the asm constraints) will force the compiler to assume that value is both read and written by the call to DoNotOptimize(value).
(it's not clear to me if the +r,m constraint is relevant. As far as I know this says that the pointer itself may be stored in a register or in memory, but the pointer value itself may be read and/or written.)
And this is where things get fuzzy for me.
If start_time is also stack allocated, the memory clobbering in DoNotOptimize() will mean that the compiler must assume that DoNotOptimize() potentially reads start_time. Therefore the order of the statements can only be:
start_time = time(); // on the stack
DoNotOptimize(bench_inputs); // reads start_time, writes bench_inputs
bench_output = run_bench(bench_inputs)
But if start_time is not stored in memory, but instead in a register, then clobbering memory will not clobber start_time, right? In that case the desired ordering of start_time = time() and DoNotOptimize(bench_inputs) is lost and the compiler is free to do:
DoNotOptimize(bench_inputs); // only writes bench_inputs
bench_output = run_bench(bench_inputs)
start_time = time(); // in a register
Obviously I've misunderstood something. Can anyone help explain? Thanks :)
I'm wondering if this is because reordering optimisations happen prior to register allocation, and thus everything is assumed to be stack allocated at that time. But if that were the case, then DoNotOptimize() would be redundant, as ClobberMemory() would be sufficient.
Summary: DoNotOptimize is ordered wrt. time() by the "memory" clobbers, as if it were another function call to an opaque function that could modify any global state.
DoNotOptimize is ordered wrt. the computation of output from input by the data dependency of the calculation on the input, and the output on the calculation, as Chandler Carruth explained in the Q&A you linked. The "memory" clobber is irrelevant for this part.
"memory" clobber is like a non-inline function call
DoNotOptimize's asm statement contains a "memory" clobber. As far as the optimizer is concerned, that's equivalent to an opaque function call: it has to be assumed to read and write every globally-reachable object (see footnote 1). (Even ones this compilation unit might not know about.)
Since time() itself doesn't have an inline definition in any header, it can't reorder with DoNotOptimize at compile time for the same reason that a compiler can't reorder calls to foo() and bar() when it can't see the definitions of those functions. Same reason compilers don't need any special logic to stop them from reordering puts("hi"); puts("mom");.
(A hypothetical time() that could inline and only contained an asm statement would have to use asm volatile to make sure repeated calls didn't just use the first one's output. asm volatile statements can't reorder with each other or accesses to volatile variables, so that would be ok too, for a different reason.)
Footnote 1: Globally reachable = any object that might be pointed-to by any hypothetical global variable. i.e. anything except local variables within this function, or memory freshly allocated with new, if escape analysis can prove that nothing outside this function could have pointers to them.
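As a rough C illustration of the "opaque function call" view, here is a sketch of a bare "memory" clobber acting on a global. It assumes GCC or clang; clobber_memory, shared_flag and store_twice are names invented for this example.

```c
/* Hypothetical C analogue of a ClobberMemory() helper: an empty asm
 * statement that claims to read and write all reachable memory. */
static inline void clobber_memory(void)
{
    __asm__ volatile ("" : : : "memory");
}

int shared_flag;   /* a global, so it is globally reachable */

int store_twice(void)
{
    shared_flag = 1;     /* must really be stored: the asm might read it */
    clobber_memory();
    shared_flag = 2;     /* cannot be merged with the store above */
    return shared_flag;
}
```

Without the clobber, a compiler would be free to drop the first store entirely, since nothing could observe it.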
How the asm statement works
I think you're pretty seriously misunderstanding how the asm works. "+r,m" tells the compiler to materialize the value in a register (or memory if it wants), and then use the value there at the end of the (empty) asm template as the new value of that C++ object.
So it forces the compiler to actually materialize (produce) the value somewhere, which means it has to be computed. And it means the compiler has to forget what it previously knew about the value (e.g. that it was a compile time constant 5, or non-negative, or anything) because the "+" modifier declares a read/write operand.
The point of DoNotOptimize on the input is to defeat constant-propagation that would let the benchmark optimize away.
And on the output to make sure a final result is actually materialized in a register (or memory) instead of optimizing away all the computation leading to an unused result. (This is where being asm volatile is relevant; defeating constant-propagation still works with non-volatile asm.)
So the computation you want to benchmark has to happen between the two DoNotOptimize() statements, and separately those two statements can't reorder with time().
The compiler has to assume that the asm statement modifies the value like val ^= random for all it knows, along with changing the value in memory of any/every other object except for private locals that weren't operands, so e.g. the "memory" clobber doesn't stop the compiler from keeping a local loop counter in a register. (It doesn't special case an empty asm template string here; programs don't contain asm statements like this by accident so nobody wants them optimized away.)
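The same idea can be sketched in plain C. This assumes GCC or clang; the macro DO_NOT_OPTIMIZE and the functions sum_to and benchmark_body are hypothetical names for illustration, not part of Google Benchmark.

```c
/* Hypothetical C macro mirroring the C++ DoNotOptimize from the question:
 * value is a read/write operand, placed in a register or in memory. */
#define DO_NOT_OPTIMIZE(value) \
    __asm__ volatile ("" : "+r,m" (value) : : "memory")

static long sum_to(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++)
        total += i;
    return total;
}

long benchmark_body(long n)
{
    DO_NOT_OPTIMIZE(n);          /* n is no longer a known constant */
    long result = sum_to(n);     /* depends on the "new" n */
    DO_NOT_OPTIMIZE(result);     /* result has to be materialized here */
    return result;
}
```

The computation sits between the two asm statements purely through data dependencies: it needs the n that comes out of the first statement, and the second statement needs its result.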
Misconceptions about the reference arg and picking "m"
I only got part way into the details of your attempt to reason about the "+r,m" operand and the reference function-arg before deciding it would probably be better to just explain from scratch. The correct reason isn't that complicated. But a couple things are worth specifically correcting:
The C++ function containing the asm statement can inline, letting the by-reference function arg optimize away. (It's even declared inline __attribute__((always_inline)) to force inlining even with optimization disabled, although in that case the reference variable won't optimize away.)
The net result is as if the asm statement were used directly on the C++ variable passed to DoNotOptimize. e.g. DoNotOptimize(foo) is like asm volatile("" : "+r,m"(foo) :: "memory")
The compiler can always pick register if it wants to, e.g. choosing to load a variable's value into a register before an asm statement. (And if the C++ semantics demand updating the variable's value in memory, also emitting a store instruction after the asm statement.)
For example, we can see that GCC does choose to do that. (I guess I could have used incl %0 as the example, but I just chose nop as a way to show what the compiler picked for the operand location as an alternative to # %0 pure comment, so the Godbolt compiler explorer wouldn't filter it out.)
void foo(int *p)
{
asm volatile("nop # operand picked %0" : "+r,m" (p[4]) );
}
# GCC 11.2 -O2
foo(int*):
movl 16(%rdi), %eax
nop # operand picked %eax
movl %eax, 16(%rdi)
ret
vs. clang choosing to leave the value in memory, so every instruction in the asm template would be accessing memory instead of a register. (If there were any instructions).
# clang 12.0.1 -O2 -fPIE
foo(int*): # #foo(int*)
nop # operand picked 16(%rdi)
retq
Fun fact: "r,m" is an attempt to work around a clang missed-optimization bug that makes it always pick memory for "rm" constraints, even if the value was already in a register. Spilling it first, even if it has to invent a temporary location for the value of an expression as an input.

How to select register variables in C?

Please note: I was originally going to title this question "When to use registers in C?", however it seems like someone already beat me to the punch. However, the way that question was asked when compared to the title is a bit misleading, and I believe this question is unique and not a dupe of it.
Whereas that question really ought to have been titled "Are register variables really faster?", I actually want to know when one should be using registers. It's obvious to me that they are in fact faster, but your CPU only has so many registers on chip, and so you are limited in what you can store in them.
So I ask: How do I select which variables should be qualified with register? Variables that are used with a certain frequency? Variables of a particular size or type? Variables that are used in compute-bound problems? Something else?
I look at it like this: to every product owner or stakeholder, every single bug or feature is "top priority" and critical. But if you really analyse their needs, you will see that some features are indeed more "top priority" than others. With code, you want it to run as fast as possible, and so every variable is a candidate for optimization/performance tuning. But I would imagine that if you really analyse a program (or the C compiler for that matter, let's assume gcc), there's a way to determine which variables are best suited for use with register.
First, let me tell you, don't be fooled by the presence of register in your C source code.
Your compiler is absolutely free to ignore it (and most of the time, it does). With a modern compiler, using register is most probably useless.
Usually, any "standard" compiler will have its own algorithm to decide which variables to allocate to registers (or not). Most of the time, it is surprisingly good at this. Leave it to the compiler.
FWIW, there is only one thing to remember: you cannot take the address of a register variable. That is the only reason (if we can consider it a "reason") to use register. Maybe.
There are times when register is useful ... when used with extended assembly.
#define STACK_POINTER "esp"
char **environ;
void _start(void){
register long *sp __asm__( STACK_POINTER );
long argc = *sp;
char **argv = (char **)(sp + 1);
environ = (char **)(sp + argc + 1);
exit(main(argc, argv, environ) );
__builtin_unreachable(); //or for(;;); to shut up compiler about returning
}
In Linux (possibly others), ELF binaries pass _start args on the stack even if the architecture passes some number of arguments in registers. Since there is no standardized way of accessing the stack pointer, you either have to write _start entirely in assembly or use extensions like this to access a specific named register through a variable. You can also assign a value:
register long syscall_num __asm__( "eax" ) = __NR_open;
...
This is useful for the opposite situation where functions normally pass arguments on the stack, but a special function (syscall etc..) requires parameters to be passed in certain registers.
These two mechanisms will use registers if the compiler supports that syntax, but no, using register int i; or similar has no guarantees.
The register keyword is useful in defining extensions to calling ABIs, such as creating functions that can pass multiple parameters; or to globally allocate a register for local memory manager etc.
register int g_imag __asm__ ("r14");
int complex_multiply(int real, int imag, int r2, int i2)
{
g_imag = real*i2 + imag*r2; // put to "global" variable
return real*r2 - imag*i2;
}
Only use callee-saved registers for this, so that functions using r14 for other purposes first spill its contents to the stack. The downside is the reduction of available registers in other functions and, as others have pointed out, this only works when it works.
As you know, registers are fast-access memory used by the CPU. They are mainly limited by their size: some are 8 bits, some are 256 bits.
I remember it's around 10 times faster compared to DRAM, like 5 to 50 ns (not sure).
Whether to use it depends on the field you are working in.
For example, when working with embedded electronics it's useful to use register: when you often use a global const, setting it as register will make your program "faster".
But keep in mind that it's a matter of optimization. Unless you are working with heavily demanding algorithms or embedded electronics, there is no need to dive into it. As said above, compilers can detect often-used variables and try to store them in registers automatically.
But if you really want to use it, keep in mind that you should pick variables or consts that are very often used and aren't bigger than the register size!

How does including assembly inline with C code work?

I've seen code for Arduino and other hardware that have assembly inline with C, something along the lines of:
asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
__asm__("movb %bh, (%eax)"); /* moves the byte from bh to the memory pointed to by eax */
How does this actually work? I realize every compiler is different, but what are the common reasons this is done, and how could someone take advantage of this?
The inline assembler code goes right into the complete assembled code untouched and in one piece. You do this when you really need absolutely full control over your instruction sequence, or maybe when you can't afford to let an optimizer have its way with your code. Maybe you need every clock tick. Maybe you need every single branch of your code to take the exact same number of clock ticks, and you pad with NOPs to make this happen.
In any case, there are lots of reasons why someone may want to do this, but you really need to know what you're doing. These chunks of code will be pretty opaque to your compiler, and it's likely you won't get any warnings if you're doing something bad.
Usually the compiler will just insert the assembler instructions right into its generated assembler output. And it will do this with no regard for the consequences.
For example, in this code the optimiser is performing copy propagation, whereby it sees that y=x, then z=y. So it replaces z=y with z=x, hoping that this will allow it to perform further optimisations. However, it doesn't spot that I've messed with the value of x in the meantime.
char x=6;
char y,z;
y=x; // y becomes 6
_asm
rrncf x, 1 // x becomes 3. Optimiser doesn't see this happen!
_endasm
z=y; // z should become 6, but actually gets
// the value of x, which is 3
To get around this, you can essentially tell the optimiser not to perform this optimisation for this variable.
volatile char x=6; // Tell the compiler that this variable could change
// all by itself, and any time, and therefore don't
// optimise with it.
char y,z;
y=x; // y becomes 6
_asm
rrncf x, 1 // x becomes 3. Optimiser doesn't see this happen!
_endasm
z=y; // z correctly gets the value of y, which is 6
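With GCC-style extended asm there is another way to avoid this problem: declare x as a read/write operand, so the optimiser knows the asm changes it. The sketch below assumes GCC or clang. The function name copy_before_halving is invented for illustration, and the shift instruction is x86-only, so other targets fall back to plain C.

```c
/* Hypothetical example: y copies x, then the asm halves x. */
unsigned copy_before_halving(unsigned x, unsigned *halved)
{
    unsigned y = x;                      /* y takes the original value */

#if defined(__x86_64__) || defined(__i386__)
    /* "+r" marks x as both read and written by the asm, so the compiler
     * cannot assume x still equals y afterwards. */
    __asm__ ("shrl $1, %0" : "+r" (x) : : "cc");
#else
    x >>= 1;                             /* plain C fallback on other targets */
#endif

    *halved = x;
    return y;                            /* still the original value */
}
```

Here no volatile is needed, because the operand list already tells the compiler what the asm modifies.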
Historically, C compilers generated assembly code, which would then be translated to machine code by an assembler. Inline assembly arises as a simple feature — in the intermediate assembly code, at that point, inject some user-picked code. Some compilers directly generate machine code, in which case they contain an assembler or call an external assembler to generate the machine code for the inline assembly snippets.
The most common use for assembly code is to use specialized processor instructions that the compiler isn't able to generate. For example, disabling interrupts for a critical section, controlling processor features (cache, MMU, MPU, power management, querying CPU capabilities, …), accessing coprocessors and hardware peripherals (e.g. inb/outb instructions on x86), etc. You'll rarely find asm("movl %ecx, %eax"), because that affects general-purpose registers that the C code around it is also using, but something like asm("mcr p15, 0, r0, c7, c10, 5") has its use (data memory barrier on ARM). The OSDev wiki has several examples with code snippets.
Assembly code is also useful to implement features that break C's flow control model. A common example is context switching between threads (whether cooperative or preemptive, whether in the same address space or not) requiring assembly code to save and restore register values.
Assembly code is also useful to hand-optimize small bits of code for memory or speed. As compilers are getting smarter, this is rarely relevant at the application level nowadays, but it's still relevant in much of the embedded world.
There are two ways to combine assembly with C: with inline assembly, or by linking assembly modules with C modules. Linking is arguably cleaner but not always applicable: sometimes you need that one instruction in the middle of a function (e.g. for register saving on a context switch, a function call would clobber the registers), or you don't want to pay the cost of a function call.
Most C compilers support inline assembly, but the syntax varies. It is typically introduced by the keyword asm, _asm, __asm or __asm__. In addition to the assembly code itself, the inline assembly construct may contain additional code that allows you to pass values between assembly and C (for example, requesting that the value of a local variable is copied to a register on entry), or to declare that the assembly code clobbers or preserves certain registers.
asm("") and __asm__ are both valid usage. Basically, you can use __asm__ if the keyword asm conflicts with something in your program. If you have more than one instruction, you can write one per line in double quotes, and suffix each with '\n' and '\t'. This is because gcc sends each instruction as a string to as (GAS), and by using the newline/tab you can send correctly formatted lines to the assembler. The code snippet in your question is basic inline assembly.
In basic inline assembly, there are only instructions. In extended assembly, you can also specify the operands: input registers, output registers and a list of clobbered registers. It is not mandatory to specify the registers to use; you can leave that to GCC, which probably fits into GCC's optimization scheme better. An example of a multi-instruction asm statement is:
__asm__ ("movl %eax, %ebx\n\t"
"movl $56, %esi\n\t"
"movl %ecx, label(%edx,%ebx,4)\n\t"
"movb %ah, (%ebx)");
Notice the '\n\t' at the end of each line except the last, and that each line is enclosed in quotes. This is because gcc sends each instruction to as as a string, as I mentioned before. The newline/tab combination is required so that the lines are fed to as in the correct format.
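For comparison, here is a small sketch of an extended asm statement that actually uses operands, with the '\n\t' separator between its two instructions. It assumes GCC or clang. The function name add_with_asm is hypothetical, and since the instructions are x86-only, other targets use a plain C fallback.

```c
/* Hypothetical example: adds two ints using extended asm operands. */
int add_with_asm(int a, int b)
{
    int sum;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "movl %1, %0\n\t"     /* sum = a */
        "addl %2, %0"         /* sum += b */
        : "=&r" (sum)         /* output; early-clobber, written before b is read */
        : "r" (a), "r" (b)    /* inputs, registers chosen by the compiler */
        : "cc"                /* addl changes the condition flags */
    );
#else
    sum = a + b;              /* plain C fallback on other targets */
#endif

    return sum;
}
```

The compiler substitutes whichever registers it picked for %0, %1 and %2, so no register names are hard-coded in the template.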

Concept of register variables(datatype:register) in C language?

I just want to get an idea about how register variables are handled in C program executables, i.e. in which location (or register) exactly they get stored in the case of an embedded system and on an x86 machine (a C program executable on a desktop PC)?
What about this view? (correct me if am wrong)
Suppose we have declared/initialized a variable inside a function with the 'int' datatype. Normally it will go to the stack segment, and it will exist there only at run time, when the caller calls the callee containing the local variable. If we declare the same local variable as 'register int', it will also go to the stack segment. But at run time, the processor moves that local variable from the stack to one of its general purpose registers (because of extra code the compiler inserts due to the 'register' keyword) and accesses it quickly from there.
That is, the only difference between them is in run-time access, and there is no difference in how they are loaded into memory.
__Kanu
The register keyword in C (rarely ever seen anymore) is only a hint to the compiler that it may be useful to keep a variable in a register for faster access.
The compiler is free to ignore the hint, and optimize as it sees best.
Since modern compilers are much better than humans at understanding usage and speed, the register keyword is usually ignored by modern compilers, and in some cases, may actually slow down execution speed.
From K&R C:
A register variable advises the
compiler that the variable in question
will be heavily used. The idea is that
register variables are to be placed in
machine registers, which may result in
smaller & faster programs. But
compilers are free to ignore this
advice.
It is not possible to take the address of a register variable, regardless of whether the variable is actually placed in a register.
Hence,
register int x;
int *y = &x; // is illegal
So, you must weigh in the cons of not being able to get the address of the register variable.
In addition to crypto's answer (that has my vote) just see the name register for the keyword as a historical misnomer. It has not much to do with registers as you learn it in class e.g for the von Neumann processor model, but is just a hint to the compiler that this variable doesn't need an address.
On modern machines an addressless variable can be realized by different means (e.g an immediate assembler operator) or optimized away completely. Tagging a variable as register can be a useful optimization hint for the compiler and also a useful discipline for the programmer.
When a compiler takes its internal code and the backend turns it into machine/assembler code for the target processor, it keeps track of the registers it is generating instructions for as it creates the code. When it needs to allocate a register to load or keep track of a variable, and there is an unused working register, it marks that register as used and generates the instructions using it. But if all the working registers have something in them, then it will usually evict the contents of one of those registers somewhere, often to RAM, for example global memory or the stack if that variable had a home. The compiler may or may not be smart about that decision and may evict a variable that is highly used. By using the register keyword, depending on the compiler, you may be able to influence that decision: it may choose to keep the register keyword variables in registers and evict non-register keyword variables to memory as needed.
which location(or register) it exactly get stored in case of an embedded system and in a X86 machine(C program executable in a desktop PC)?
You don't know without opening up the assembly output, which will be liable to shift based on compiler choices. It's a good idea to check the assembly just for educational purposes.
If you need to read and write particular registers that precisely, you should write inline assembly or link in an assembly module.
Typically when using a standard C compiler for x86/amd64 (gcc, icc, cl), you can reasonably assume that the compiler will optimize sufficiently well for most purposes.
If, however, you are using a non-standard compiler, e.g., one cooked up for a new embedded system, it is a good idea to consider hand optimization. If the architecture is new, it might also be a good idea to consider hand optimization.

"register" keyword in C?

What does the register keyword do in C language? I have read that it is used for optimizing but is not clearly defined in any standard. Is it still relevant and if so, when would you use it?
It's a hint to the compiler that the variable will be heavily used and that you recommend it be kept in a processor register if possible.
Most modern compilers do that automatically, and are better at picking them than us humans.
I'm surprised that nobody mentioned that you cannot take the address of a register variable, even if the compiler decides to keep the variable in memory rather than in a register.
So by using register you win nothing (the compiler will decide for itself where to put the variable anyway) and lose the & operator. There is no reason to use it.
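A tiny sketch of what that trade looks like in practice, assuming any C99 compiler. The function name sum_first is made up for the example.

```c
/* Hypothetical example: register is only a hint, but it does forbid &. */
int sum_first(int n)
{
    register int total = 0;               /* hint only; may still live in memory */

    for (register int i = 1; i <= n; i++)
        total += i;

    /* int *p = &total;    <- would be rejected: address of a register variable */
    return total;
}
```

Uncommenting the pointer line makes the compiler reject the code no matter where total ends up being stored.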
It tells the compiler to try to use a CPU register, instead of RAM, to store the variable. Registers are in the CPU and much faster to access than RAM. But it's only a suggestion to the compiler, and it may not follow through.
I know this question is about C, but the same question for C++ was closed as an exact duplicate of this question. This answer therefore may not apply to C.
The latest draft of the C++11 standard, N3485, says this in 7.1.1/3:
A register specifier is a hint to the implementation that the variable so declared will be heavily used. [ note: The hint can be ignored and in most implementations it will be ignored if the address of the variable is taken. This use is deprecated ... —end note ]
In C++ (but not in C), the standard does not state that you can't take the address of a variable declared register; however, because a variable stored in a CPU register throughout its lifetime does not have a memory location associated with it, attempting to take its address would be invalid, and the compiler will ignore the register keyword to allow taking the address.
I have read that it is used for optimizing but is not clearly defined in any standard.
In fact it is clearly defined by the C standard. Quoting the N1570 draft section 6.7.1 paragraph 6 (other versions have the same wording):
A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.
The unary & operator may not be applied to an object defined with register, and register may not be used in an external declaration.
There are a few other (fairly obscure) rules that are specific to register-qualified objects:
Defining an array object with register has undefined behavior.
Correction: It's legal to define an array object with register, but you can't do anything useful with such an object (indexing into an array requires taking the address of its initial element).
The _Alignas specifier (new in C11) may not be applied to such an object.
If the parameter name passed to the va_start macro is register-qualified, the behavior is undefined.
There may be a few others; download a draft of the standard and search for "register" if you're interested.
As the name implies, the original meaning of register was to require an object to be stored in a CPU register. But with improvements in optimizing compilers, this has become less useful. Modern versions of the C standard don't refer to CPU registers, because they no longer (need to) assume that there is such a thing (there are architectures that don't use registers). The common wisdom is that applying register to an object declaration is more likely to worsen the generated code, because it interferes with the compiler's own register allocation. There might still be a few cases where it's useful (say, if you really do know how often a variable will be accessed, and your knowledge is better than what a modern optimizing compiler can figure out).
The main tangible effect of register is that it prevents any attempt to take an object's address. This isn't particularly useful as an optimization hint, since it can be applied only to local variables, and an optimizing compiler can see for itself that such an object's address isn't taken.
It hasn't been relevant for at least 15 years as optimizers make better decisions about this than you can. Even when it was relevant, it made a lot more sense on a CPU architecture with a lot of registers, like SPARC or M68000 than it did on Intel with its paucity of registers, most of which are reserved by the compiler for its own purposes.
Actually, register tells the compiler that the variable does not alias with anything else in the program (not even char's).
That can be exploited by modern compilers in a variety of situations, and can help the compiler quite a bit in complex code - in simple code the compilers can figure this out on their own.
Otherwise, it serves no purpose and is not used for register allocation. It does not usually incur performance degradation to specify it, as long as your compiler is modern enough.
Storytime!
C, as a language, is an abstraction of a computer. It allows you to do things, in terms of what a computer does, that is manipulate memory, do math, print things, etc.
But C is only an abstraction. And ultimately, what it's abstracting from you is Assembly language. Assembly is the language that a CPU reads, and if you use it, you do things in terms of the CPU. What does a CPU do? Basically, it reads from memory, does math, and writes to memory. The CPU doesn't just do math on numbers in memory. First, you have to move a number from memory to memory inside the CPU called a register. Once you're done doing whatever you need to do to this number, you can move it back to normal system memory. Why use system memory at all? Registers are limited in number. You only get about a hundred bytes in modern processors, and older popular processors were even more fantastically limited (the 6502 had 3 8-bit registers for your free use). So, your average math operation looks like:
load first number from memory
load second number from memory
add the two
store answer into memory
A lot of that is... not math. Those load and store operations can take up to half your processing time. C, being an abstraction of computers, freed the programmer from the worry of using and juggling registers, and since the number and type vary between computers, C places the responsibility of register allocation solely on the compiler. With one exception.
When you declare a variable register, you are telling the compiler "Yo, I intend for this variable to be used a lot and/or be short lived. If I were you, I'd try to keep it in a register." When the C standard says compilers don't have to actually do anything, that's because the C standard doesn't know what computer you're compiling for, and it might be like the 6502 above, where all 3 registers are needed just to operate, and there's no spare register to keep your number. However, when it says you can't take the address, that's because registers don't have addresses. They're the processor's hands. Since the compiler doesn't have to give you an address, and since it can't have an address at all ever, several optimizations are now open to the compiler. It could, say, keep the number in a register always. It doesn't have to worry about where it's stored in computer memory (beyond needing to get it back again). It could even pun it into another variable, give it to another processor, give it a changing location, etc.
tl;dr: Short-lived variables that do lots of math. Don't declare too many at once.
You are messing with the compiler's sophisticated graph-coloring algorithm, which is used for register allocation. Well, mostly. It acts as a hint to the compiler, that's true. But it is not ignored in its entirety, since you are not allowed to take the address of a register variable (remember, the compiler, now at your mercy, will try to act differently). Which in a way is telling you not to use it.
The keyword was used long, long ago, when there were so few registers that you could count them all on your fingers.
But, as I said, deprecated doesn't mean you cannot use it.
Just a little demo (without any real-world purpose) for comparison: when removing the register keywords before each variable, this piece of code takes 3.41 seconds on my i7 (GCC), with register the same code completes in 0.7 seconds.
#include <stdio.h>
int main(int argc, char** argv) {
    register int numIterations = 20000;
    register int i;
    unsigned long val = 0;
    for (i = 0; i < numIterations + 1; i++)
    {
        register int j;
        for (j = 0; j < i; j++)
        {
            val = j + i;
        }
    }
    /* val is unsigned long, so it needs %lu rather than %d */
    printf("%lu\n", val);
    return 0;
}
I have tested the register keyword under QNX 6.5.0 using the following code:
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>
int main(int argc, char *argv[]) {
    uint64_t cps, cycle1, cycle2, ncycles;
    double sec;
    register int a=0, b = 1, c = 3, i;
    cycle1 = ClockCycles();
    for(i = 0; i < 100000000; i++)
        a = ((a + b + c) * c) / 2;
    cycle2 = ClockCycles();
    ncycles = cycle2 - cycle1;
    /* uint64_t values are printed with PRIu64 from <inttypes.h> */
    printf("%" PRIu64 " cycles elapsed\n", ncycles);
    cps = SYSPAGE_ENTRY(qtime) -> cycles_per_sec;
    printf("This system has %" PRIu64 " cycles per second\n", cps);
    sec = (double)ncycles/cps;
    printf("The cycles in seconds is %f\n", sec);
    return EXIT_SUCCESS;
}
I got the following results:
-> 807679611 cycles elapsed
-> This system has 3300830000 cycles per second
-> The cycles in seconds is ~0.244600
And now without register int:
int a=0, b = 1, c = 3, i;
I got:
-> 1421694077 cycles elapsed
-> This system has 3300830000 cycles per second
-> The cycles in seconds is ~0.430700
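For readers without QNX, the same measurement can be roughly approximated with the standard clock() function. This is only a sketch under stated assumptions: run_loop is a hypothetical helper, clock() reports processor time at a much coarser resolution than ClockCycles(), and the arithmetic is switched to unsigned here so that overflow wraps around instead of being undefined.

```c
#include <time.h>

/* Hypothetical helper mirroring the loop above.
   Unsigned arithmetic is an assumption of this sketch, not part of the original test. */
unsigned run_loop(unsigned long iterations, double *seconds)
{
    register unsigned a = 0, b = 1, c = 3;
    unsigned long i;
    clock_t start = clock();

    for (i = 0; i < iterations; i++)
        a = ((a + b + c) * c) / 2;

    /* elapsed processor time, converted from clock ticks to seconds */
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* returning the result means the computation cannot be discarded as dead code */
    return a;
}
```

Building it once with and once without the register qualifier, at the same optimisation level, gives a comparison similar to the one above. With optimisation enabled the two builds will most likely be identical.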
During the seventies, at the very beginning of the C language, the register keyword was introduced in order to allow the programmer to give hints to the compiler, telling it that the variable would be used very often, and that it would be wise to keep its value in one of the processor’s internal registers.
Nowadays, optimizers are much better than programmers at determining which variables are most worth keeping in registers, and the optimizer does not always take the programmer’s hint into account.
So many people wrongly recommend not to use the register keyword.
Let’s see why!
The register keyword has an associated side effect: you can not reference (get the address of) a register type variable.
People who advise others not to use register wrongly take this as an additional argument.
However, the simple fact of knowing that you can not take the address of a register variable, allows the compiler (and its optimizer) to know that the value of this variable can not be modified indirectly through a pointer.
When at a certain point of the instruction stream, a register variable has its value assigned in a processor’s register, and the register has not been used since to get the value of another variable, the compiler knows that it does not need to re-load the value of the variable in that register.
This avoids expensive, useless memory accesses.
Do your own tests and you will get significant performance improvements in your most inner loops.
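A rough sketch of the aliasing argument above; sum_through_pointer and sum_in_register are hypothetical names. In the first function the running total lives behind a pointer that might alias the data being read, so the compiler has to assume each store could change a later load. In the second, the total is a register variable, so no pointer to it can exist. Note that a modern optimiser reaches the same conclusion for any plain local whose address is never taken, so measure before relying on the keyword.

```c
#include <stddef.h>

/* The running total is written through a pointer. Because 'total' could
   point into 'data', each store may change the value of a later load. */
void sum_through_pointer(int *total, const int *data, size_t n)
{
    size_t i;
    *total = 0;
    for (i = 0; i < n; i++)
        *total += data[i];
}

/* The running total is a register variable: no pointer to it can exist,
   so the compiler is free to keep it in a register for the whole loop. */
int sum_in_register(const int *data, size_t n)
{
    register int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

The aliasing is not hypothetical: for an array d holding {1, 2, 3, 4}, calling sum_through_pointer(&d[0], d, 4) overwrites d[0] before it is read and leaves 9 there instead of 10, which is exactly the behaviour the compiler has to preserve.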
Register would notify the compiler that the coder believed this variable would be written/read enough to justify its storage in one of the few registers available for variable use. Reading/writing from registers is usually faster and can require a smaller op-code set.
Nowadays, this isn't very useful, as most compilers' optimizers are better than you at determining whether a register should be used for that variable, and for how long.
gcc 9.3 asm output, without using optimisation flags (everything in this answer refers to standard compilation without optimisation flags):
#include <stdio.h>
int main(void) {
int i = 3;
i++;
printf("%d", i);
return 0;
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
#include <stdio.h>
int main(void) {
register int i = 3;
i++;
printf("%d", i);
return 0;
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
push rbx
sub rsp, 8
mov ebx, 3
add ebx, 1
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
add rsp, 8
pop rbx
pop rbp
ret
This forces ebx to be used for the calculation, which means it has to be pushed to the stack and restored at the end of the function, because it is callee-saved. register produces more lines of code plus 1 memory write and 1 memory read (although realistically this could have been optimised to 0 reads/writes if the calculation had been done in esi, which is what happens using C++'s const register). Not using register causes 2 writes and 1 read (although store-to-load forwarding will occur on the read). This is because the value has to be present and updated directly on the stack so that the correct value can be read through its address (a pointer). register doesn't have this requirement, because a register variable cannot be pointed to. const and register are basically the opposite of volatile: using volatile will override the const optimisations at file and block scope and the register optimisations at block scope. const register and register will produce identical outputs because const does nothing in C at block scope, so only the register optimisations apply.
On clang, register is ignored but const optimisations still occur.
On supported C compilers it tries to optimize the code so that variable's value is held in an actual processor register.
Microsoft's Visual C++ compiler ignores the register keyword when global register-allocation optimization (the /Oe compiler flag) is enabled.
See register Keyword on MSDN.
The register keyword tells the compiler to store a particular variable in a CPU register so that it can be accessed quickly. From a programmer's point of view, the register keyword is used for variables that are heavily used in a program, so that the compiler can speed up the code. It still depends on the compiler whether the variable is kept in a CPU register or in main memory.
register tells the compiler to optimize the code by storing that particular variable in a register rather than in memory. It is a request to the compiler, and the compiler may or may not honour it.
You can use this facility in cases where some of your variables are accessed very frequently.
For example, a loop counter.
One more thing: if you declare a variable as register, then you can't take its address, as it is not required to be stored in memory. It may be allocated in a CPU register instead.
The register keyword is a request to the compiler that the specified variable is to be stored in a register of the processor instead of memory as a way to gain speed, mostly because it will be heavily used. The compiler may ignore the request.

Resources