I'm a non-native English speaker reading C99, and a sentence (in 6.7.1) confuses me:
'(A declaration of an identifier for an object with storage-class specifier register suggests that access to the object be as fast as possible.) The extent to which such suggestions are effective is implementation-defined.'
How should I parse the second sentence:
The extent to, which such suggestions are effective, is implementation-defined.
The extent, to which such suggestions are effective, is implementation-defined.
Which one is better?
Does that mean an implementation has full power to decide how to deal with register, even to the point of terminating translation?
Thanks.
Don't mix up register variables with registers of the CPU. These are not the same.
The purpose of register is to allow for optimizations. Taking the address of such a variable is forbidden, and as a consequence such a variable can never alias; the compiler always knows its latest value. Thus it can easily realize such a variable as a CPU register or an assembler immediate, for example.
As a particular subcase, there are const-qualified register variables that can't be altered by the program directly, nor behind the scenes by some other code that accesses them through a pointer. Only such variables can easily be guaranteed to be constant throughout the whole program execution.
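A minimal sketch of that idea (hypothetical names; assuming a compiler that takes the hint): since the address of step can never be taken and it is never modified, the compiler is free to keep it in a CPU register or fold it into an immediate.
int sum_evens(int n)
{
    const register int step = 2;   /* never aliased, never modified */
    int total = 0;
    for (int i = 0; i < n; i += step)
        total += i;
    return total;
}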
No -- the extent to which the suggestions are effective is implementation-defined, but the effect of the code as a whole is not.
To put it slightly differently, when/if you put register into your code (where allowed), the compiler can completely ignore it -- and I feel obliged to add that nearly all reasonably modern compilers will.
Nonetheless, the mere presence of register won't prevent the code from compiling, unless you try to apply it somewhere it's not allowed (e.g., to a global variable).
Bottom line: don't use it, but don't worry about removing it from code that already has it either. It's a waste of time and effort, but with reasonably modern compilers a completely harmless one.
register is an old keyword. Basically, it is a hint to the optimizer telling it that it would be better to use a register for storing this variable. The phrase that you are quoting means that the compiler can treat this hint in any way it wants. In these days of optimizing compilers, its meaning and value are, 99% of the time, nothing.
Related
The C register keyword gives the compiler a hint to prefer storing a variable in a register rather than, say, on the stack. A compiler may ignore it if it likes. I understand that it's mostly-useless these days when you compile with optimization turned on, but is it entirely useless?
More specifically: For any combination of { gcc, clang, msvc } x { -Og, -O, -O2, -O3 }: Is register ignored when deciding whether to actually assign a register? And if not, are there cases in which it's useful enough to bother using it?
Notes:
I am not asking whether the keyword has any effect; of course it does - it prevents you from taking the address of that variable; and if you don't optimize at all, it will make the difference between register assignment and memory assignment for your variable.
Answers for just one compiler / some of the above combinations are very welcome.
For GCC, register has had no effect on code generation at all, not even a hint, at all optimization levels, for all supported CPU architectures, for over a decade.
The reasons for this are largely historical. GCC 2.95 and older had two register allocators, one ("stupid") used when not optimizing, and one ("local, global, reload") used when optimizing. The "stupid" allocator did try to honor register, but the "local, global, reload" allocator completely ignored it. (I don't know what the original rationale for that design decision was; you'd have to ask Richard Kenner.) In version 3.0, the "stupid" allocator was scrapped in favor of adding a fast-and-sloppy mode to "local, global, reload". Nobody bothered to write the code to make that mode pay attention to register, so it doesn't.
As of this writing, the GCC devs are in the process of replacing "local, global, reload" with a new allocator called "IRA and LRA", but it, too, completely ignores register.
However, the (C-only) rule that you cannot take the address of a register variable is still enforced, and the keyword is used by the explicit register variable extension, which allows you to dedicate a specific register to a variable; this can be useful in programs that use a lot of inline assembly.
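As a hedged illustration of that extension (GCC-specific; assuming an x86-64 Linux target, where the register name "rax" and syscall number 39 are only for demonstration), a local register variable pins an inline-asm operand to a specific register:
static long my_getpid(void)
{
    register long rax asm("rax") = 39;   /* assumed: SYS_getpid on x86-64 Linux */
    asm volatile ("syscall"
                  : "+r"(rax)            /* result comes back in rax            */
                  :
                  : "rcx", "r11", "memory");
    return rax;
}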
C Standard says:
(c11, 6.7.1p6) "A declaration of an identifier for an object with storage-class specifier register suggests that access to the object be as fast as possible. The extent to which such suggestions are effective is implementation-defined."
These suggestions being implementation-defined means the implementation must define the choices being made ((c11, J.3.8 Hints) "The extent to which suggestions made by using the register storage-class specifier are effective (6.7.1)").
Here is what the documentation of some popular C compilers says.
For gcc (source):
The register specifier affects code generation only in these ways:
When used as part of the register variable extension, see Explicit Register Variables.
When -O0 is in use, the compiler allocates distinct stack memory for all variables that do not have the register storage-class specifier; if register is specified, the variable may have a shorter lifespan than the code would indicate and may never be placed in memory.
On some rare x86 targets, setjmp doesn't save the registers in all circumstances. In those cases, GCC doesn't allocate any variables in registers unless they are marked register.
For IAR compiler for ARM (source):
Honoring the register keyword (6.7.1)
User requests for register variables are not honored
If an object of automatic storage duration never has its address taken, a compiler may safely assume that its value can only be observed or modified by code which uses it directly. In the absence of the register keyword, a compiler which generates code in single-pass fashion would have no way of knowing, given...
void test(int *somePointer)
{
    int i;
    for (i = 0; i < 10; i += 2)
    {
        somePointer[i] = -1;
        somePointer[i+1] = 2;
    }
}
whether the write to somePointer[i] might possibly write i. If it could, then the compiler would be required to reload i when evaluating somePointer[i+1]. Applying the register keyword to i would allow even a single-pass compiler to avoid the reload because it would be entitled to assume that no valid pointer could hold a value derived from the address of i, and consequently there was no way that writing somePointer[i] could affect i.
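A sketch of the variant being described, based on the function above: with register applied to i, even a single-pass compiler may avoid the reload, since no valid pointer can refer to i.
void test(int *somePointer)
{
    register int i;                /* address may never be taken             */
    for (i = 0; i < 10; i += 2)
    {
        somePointer[i] = -1;       /* cannot possibly modify i ...           */
        somePointer[i+1] = 2;      /* ... so i need not be reloaded here     */
    }
}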
The register keyword could be useful, even today, if it were not interpreted as imposing a constraint that the address of an object cannot be exposed to anything, but rather as inviting compilers to assume that no pointer derived from an object's address will be used outside the immediate context where the address was taken, and that within contexts where the address is taken the object will be accessed only through pointers derived from that address. Unfortunately, the only situations in which the Standard permits the register qualifier are those where an object's address is neither used nor exported to outside code in any fashion--i.e. cases which a multi-pass compiler could identify by itself without needing the qualifier.
The register keyword could be useful, even with modern compilers, if it invited compilers to--at their leisure--treat any context where a register-qualified object's address is taken (e.g. get_integer(&i);) using the pattern:
{
int temp = i;
get_integer(&temp);
i = temp;
}
and to assume that a register-qualified object of external scope will only be accessed in contexts which either explicitly access the object or take its address. At present, compilers have very limited ability to cache global variables in registers across pointer accesses involving the same types; the register keyword could help with that except that compilers are required to treat as constraint violations all the situations where it could be useful.
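A hedged sketch of the caching problem described above (hypothetical names): because the store through p might alias total, a compiler must generally reload total from memory on every iteration; the kind of hint described here would let it live in a register for the whole loop.
int total;                          /* a global the loop would like to cache           */

void accumulate(int *p, int n)
{
    for (int k = 0; k < n; k++)
    {
        total += 1;                 /* must be re-read after the store below           */
        p[k] = total;               /* as far as the compiler knows, p might point at total */
    }
}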
I have been advised many times that accesses to volatile objects can't be optimised away; however, it seems to me that this section, present in the C89, C99 and C11 standards, advises otherwise:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
If I understand correctly, this sentence is stating that an actual implementation can optimise away part of an expression, providing these two requirements are met:
"its value is not used", and
"that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)"...
It seems to me that many people are confusing the meaning of "including" with the meaning of "excluding".
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
As I understand it, because of 5.1.2.3p4, either the compiler can optimise away both calls to functions and volatile accesses, or it can optimise away neither.
I think the "including any" applies to the "needed side-effects" , whereas you seem to be reading it as applying to "part of an expression".
So the intent was to say:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced.
Examples of needed side-effects include:
Needed side-effects caused by a function which this expression calls
Accesses to volatile variables
Now, the term needed side-effect is not defined by the Standard. Paragraph 4 is not attempting to define it either -- it's trying (and not succeeding very well) to provide examples.
I think the only sensible interpretation is to treat it as meaning observable behaviour which is defined by 5.1.2.3/6. So it would have been a lot simpler to write:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no observable behaviour would be caused.
Your questions in the edit are answered by 5.1.2.3/6, sometimes known as the as-if rule, which I'll quote here:
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behaviour of the program.
Answering the specific questions in the edit:
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
Timing isn't a side-effect. A "needed" side-effect presumably here means one that causes observable behaviour.
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
Yes, these can be optimized out because they do not cause observable behaviour.
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
No, because accesses to volatile objects are defined as observable behaviour.
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
Because volatile accesses are defined as observable behaviour and empty functions aren't.
I believe this:
(including any caused by calling a function or accessing a volatile object)
is intended to be read as
(including:
any side-effects caused by calling a function; or
accessing a volatile variable)
This reading makes sense because accessing a volatile variable is a side-effect.
Consider a while loop in ANSI C whose only purpose is to delay execution:
unsigned long counter = DELAY_COUNT;
while(counter--);
I've seen this used a lot to enforce delays on embedded systems, where eg. there is no sleep function and timers or interrupts are limited.
My reading of the ANSI C standard is that this can be completely removed by a conforming compiler. It has none of the side effects described in 5.1.2.3:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
...and this section also says:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Does this imply that the loop could be optimised out? Even if counter were volatile?
Notes:
This is not quite the same as "Are compilers allowed to eliminate infinite loops?", because that question is about infinite loops, where the issue is whether a program is allowed to terminate at all. In this case, the program will certainly proceed past this line at some point, optimisation or not.
I know what GCC does (removes the loop for -O1 or higher, unless counter is volatile), but I want to know what the standard dictates.
C standard compliance follows the "as-if" rule, by which the compiler can generate any code that behaves "as if" it was running your actual instructions on the abstract machine. Since not performing any operations has the same observable behaviour "as if" you did perform the loop, it's entirely permissible to not generate code for it.
In other words, the time something takes to compute on a real machine is not part of the "observable" behaviour of your program, it is merely a phenomenon of a particular implementation.
The situation is different for volatile variables, since accessing a volatile counts as an "observable" effect.
Does this imply that the loop could be optimised out?
Yes.
Even if counter were volatile?
No. It would read and write a volatile variable, which has observable behavior, so it must occur.
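A minimal sketch of the volatile variant (DELAY_COUNT reused from the question; it stands in for whatever constant the project uses): each read and write of counter is observable behaviour, so the loop has to be emitted.
volatile unsigned long counter = DELAY_COUNT;   /* volatile: every access must happen */
while (counter--)
    ;                                           /* cannot be optimised away           */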
If the counter is volatile, the compiler cannot legally optimize out the delay loop. Otherwise it can.
Delay loops like this are bad because the time they burn depends on how the compiler generates code for them. Using different optimization options you can achieve different delays, which is hardly what one wants from a delay loop.
For this reason such delay loops should be implemented in assembly language, where the programmer controls the code fully. This typically applies in embedded systems with simple CPUs.
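A hedged sketch of that approach using GCC-style inline assembly (the cycle count per iteration still depends on the CPU, so the timing would need to be calibrated per target):
static void delay_loops(unsigned long n)
{
    while (n--)
        __asm__ volatile ("nop");   /* the volatile asm cannot be removed by the optimizer */
}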
The standard dictates the behavior you see. If you create a dependency tree for DELAY_COUNT, you see that it has a modify-without-use property, which means it can be eliminated. That is the non-volatile case. In the volatile case the compiler cannot use the dependency tree to remove this variable, and as such the delay remains (since volatile means that hardware can change the memory-mapped value, or in some cases means "I really need this, don't throw it away"). In the case you're looking at, labelling it volatile tells the compiler: please don't throw this away, it's here for a reason.
While reading a site, I read that you cannot declare a global variable as register. Why is that?
source:
http://publib.boulder.ibm.com/infocenter/lnxpcomp/v8v101/index.jsp?topic=/com.ibm.xlcpp8l.doc/language/ref/regdef.htm
In theory, you could allocate a processor register to a global scope variable - that register would simply have to remain allocated to that variable for the whole life of the program.
However, C compilers don't generally get to see the entire program during the compile phase - the C standard was written so that each translation unit (roughly corresponding to each .c file) could be compiled independently of the others (with the compiled objects later linked into a program). This is why global scope register variables aren't allowed - when the compiler is compiling b.c, it has no way to know that there was a global variable allocated to a register in a.c (and that therefore functions in b.c must preserve the value in that register).
Actually, GCC allows this. A declaration in global scope in the form:
register int foo asm ("r12");
Allocates the register "r12" (on x86_64) for the global "foo". This has a number of limitations, and the corresponding manual page is probably the best reference for all the hassle that global register variables entail:
http://gcc.gnu.org/onlinedocs/gcc/Explicit-Reg-Vars.html
Because it would be senseless. Global variables exist for the whole time the application is running. There is surely no processor register that is free for such a long time ;)
The register keyword has a different meaning than its name seems to indicate; nowadays it has little to do with a register of the processing environment. (Although it probably was once chosen for that.) The only text that constrains the use of a variable that is declared with register is this:
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
So it implements a restriction on automatic variables (those that you declare in a function) such that it is an error to take the address of such a variable. The idea then is that the compiler may represent this variable in whatever way it pleases, as a register or as an immediate assembler value, etc. You as a programmer promise that you won't take its address. Usually this doesn't make much sense for global variables (they have an address anyhow).
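A minimal sketch of that restriction (hypothetical names): taking the address of a register-declared variable is a constraint violation, while taking the address of an ordinary automatic variable is fine.
void f(void)
{
    register int r = 0;
    int n = 0;
    int *p;

    p = &n;        /* OK: n is an ordinary automatic variable                        */
    /* p = &r; */  /* error: cannot take the address of a variable declared register */
    (void)p;
    (void)r;
}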
To summarize:
No, the register keyword is not ignored.
Yes, it can only be used for stack variables if you want to be standard-conformant.
Originally, register variables were meant to be stored in processor registers, but global variables have to be stored in the data or the BSS section to be accessible from every function. Today, compilers don't interpret the register storage class strictly, so it remains largely for compatibility reasons.
The register keyword is used in C/C++ as a request to the compiler to use processor registers for variables. A register is a kind of storage used by the CPU that is very fast to access, because it isn't located in memory (RAM). The use of registers is limited by the architecture and by the size of the registers themselves (meaning that some can only be used as memory pointers, others to hold special debug values, and so on).
The calling conventions used by C/C++ don't use the general registers (EAX, EBX and so on on the 80x86 architecture) to pass parameters (though the returned value is stored in EAX), so declaring a variable register can make the code faster.
If you ask to make it global, you ask to reserve that register for all the code in all your source files. This is impossible, so the compiler gives you an error, or simply makes it an ordinary variable stored in memory.
Some compilers provide a means of dedicating a register permanently to a variable. The register keyword, however, is insufficient for that. A compiler's decision to allocate the local variables of a routine in registers generally does not require coordination with anything in other source modules. While some development systems do perform register optimization between routines, it's far more common to simply define the calling convention so that all routines are allowed to freely alter certain registers (the caller is responsible for saving the contents if they're needed after the function call) but must not alter others (the called routine is responsible for saving and restoring the contents if those registers are needed in the function). Thus, a linker doesn't need to concern itself with register usage.
Such an approach is fine for local register variables, but useless for global ones. For global register variables to be useful, the programmer must generally tell the compiler which register is to be used for what variable, and make sure that such reservations are known to the compiler when compiling all modules--even those that don't use the register otherwise. This can be useful in embedded systems, especially with variables that are used by interrupts, but there's usually a very limited number (e.g. 2 or so) of such variables allowed in a system.
So do we all agree now? Do we all see that making a global variable a register variable would be a really, really bad idea? If the original C definition did not forbid it, it was probably because nobody thought anyone would actually implement it that way -- as they should not have, especially back in the CISC days.
Besides: modern optimizing compilers do a better job of deciding when to keep variables in registers than humans can do. If yours can't do it, then you really, REALLY need to get a better compiler.
Because register variables live in registers, while globals live in memory. It's a contradiction in terms.
Consider the following:
volatile uint32_t i;
How do I know if gcc did or did not treat i as volatile? It would be declared as such because no nearby code is going to modify it, and modification of it is likely due to some interrupt.
I am not the world's worst assembly programmer, but I play one on TV. Can someone help me to understand how it would differ?
If you take the following stupid code:
#include <stdio.h>
#include <inttypes.h>
volatile uint32_t i;
int main(void)
{
if (i == 64738)
return 0;
else
return 1;
}
If I compile it to object format and disassemble it via objdump, then do the same after removing 'volatile', there is no difference (according to diff). Is the volatile declaration just too close to where it's checked or modified, or should I just always use some atomic type when declaring something volatile? Do any optimization flags influence this?
Note, my stupid sample does not fully match my question, I realize this. I'm only trying to find out if gcc did or did not treat the variable as volatile, so I'm studying small dumps to try to find the difference.
Many compilers in some situations don't treat volatile the way they should. See this paper if you deal much with volatiles, to avoid nasty surprises: Volatiles are Miscompiled, and What to Do about It. It also contains a pretty good description of volatile, backed by quotations from the standard.
To be 100% sure, and for such a simple example, check out the assembly output.
Try setting the variable outside a loop and reading it inside the loop. In a non-volatile case, the compiler might (or might not) shove it into a register or make it a compile time constant or something before the loop, since it "knows" it's not going to change, whereas if it's volatile it will read it from the variable space every time through the loop.
Basically, when you declare something as volatile, you're telling the compiler not to make certain optimizations. If it decided not to make those optimizations, you can't tell whether it skipped them because the variable was declared volatile, or because it needed those registers for something else, or because it didn't notice that it could turn it into a compile-time constant.
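A hedged sketch of that experiment (hypothetical names): with volatile, the generated code must re-read flag from memory on every iteration; without it, an optimizing compiler may hoist a single load out of the loop or turn the loop into an unconditional one.
#include <stdint.h>

volatile uint32_t flag;          /* try rebuilding with and without 'volatile' */

void wait_for_flag(void)
{
    while (flag == 0)
        ;                        /* volatile forces a fresh load of flag each time */
}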
As far as I know, volatile affects the optimizer. For example, if your code looked like this:
int foo() {
int x = 0;
while (x);
return 42;
}
The "while" loop would be optimized out of the binary.
But if you define 'x' as being volatile (i.e., volatile int x;), then the compiler will leave the loop alone.
Your little sample is inadequate to show anything. The difference between a volatile variable and one that isn't is that each load or store in the code has to generate precisely one load or store in the executable for a volatile variable, whereas the compiler is free to optimize away loads or stores of non-volatile variables. If you're getting one load of i in your sample, that's what I'd expect for volatile and non-volatile.
To show a difference, you're going to have to have redundant loads and/or stores. Try something like
#include <stdio.h>

int main(void)
{
    int i = 5;   /* change to 'volatile int i = 5;' to compare */
    int j = i + 2;
    i = 5;
    i = 5;
    printf("%d %d\n", i, j);
}
changing i between non-volatile and volatile. You may have to enable some level of optimization to see the difference.
The code there has three stores and two loads of i, which can be optimized away to one store and probably one load if i is not volatile. If i is declared volatile, all stores and loads should show up in the object code in order, no matter what the optimization. If they don't, you've got a compiler bug.
It should always treat it as volatile.
The reason the code is the same is that volatile just instructs the compiler to load the variable from memory each time it accesses it. Even with optimization on, the compiler still needs to load i from memory once in the code you've written, because it can't infer the value of i at compile time. If you access it repeatedly, you'll see a difference.
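A hedged sketch of what "accessing it repeatedly" could look like, reusing the volatile uint32_t i from the question: with volatile, the compiler must emit two separate loads of i; without it, one load can be reused.
#include <stdint.h>

volatile uint32_t i;

uint32_t read_twice(void)
{
    return i + i;   /* volatile: two loads; non-volatile: one load, value reused */
}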
Any modern compiler has multiple stages. One of the fairly easy yet interesting questions is whether the declaration of the variable itself was parsed correctly. This is easy because the C++ name mangling should differ depending on the volatile-ness. Hence, if you compile twice, once with volatile defined away, the symbol tables should differ slightly.
Read the standard before you misquote or downvote. Here's a quote from n2798:
7.1.6.1 The cv-qualifiers
7 Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C.
The keyword volatile acts as a hint, much like the register keyword. However, volatile asks the compiler to keep its optimizations on that object at bay. This way, it won't keep a copy of the variable in a register or a cache (to optimize speed of access) but will fetch it from memory every time you request it.
Since there is so much confusion, here is some more. The C99 standard does in fact say that a volatile-qualified object must be looked up every time it is read, and so on, as others have noted. But there is also another section that says that what constitutes a volatile access is implementation-defined. So a compiler that knows the hardware inside out will know, for example, when you have an automatic volatile-qualified variable whose address is never taken, that it will not be placed in a sensitive region of memory, and it will almost certainly ignore the hint and optimize it away.
This keyword finds use in setjmp and longjmp style error handling. The only thing you have to bear in mind is this: you supply the volatile keyword when you think the variable may change. That is, you could take an ordinary object and manage with a few casts.
Another thing to keep in mind is that the definition of what constitutes a volatile access is left by the standard to the implementation.
If you really want different assembly, compile with optimization enabled.