Consider a while loop in ANSI C whose only purpose is to delay execution:
unsigned long counter = DELAY_COUNT;
while(counter--);
I've seen this used a lot to enforce delays on embedded systems where, for example, there is no sleep function and timers or interrupts are limited.
My reading of the ANSI C standard is that this can be completely removed by a conforming compiler. It has none of the side effects described in 5.1.2.3:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
...and this section also says:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Does this imply that the loop could be optimised out? Even if counter were volatile?
Notes:
Note that this is not quite the same as "Are compilers allowed to eliminate infinite loops?", because that question refers to infinite loops, where the issue is whether the program is allowed to terminate at all. In this case, the program will certainly proceed past this line at some point, optimisation or not.
I know what GCC does (removes the loop for -O1 or higher, unless counter is volatile), but I want to know what the standard dictates.
C standard compliance follows the "as-if" rule, by which the compiler can generate any code that behaves "as if" it were running your actual instructions on the abstract machine. Since performing no operations at all has the same observable behaviour "as if" you did perform the loop, it's entirely permissible to generate no code for it.
In other words, the time something takes to compute on a real machine is not part of the "observable" behaviour of your program, it is merely a phenomenon of a particular implementation.
The situation is different for volatile variables, since accessing a volatile counts as an "observable" effect.
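For illustration, a minimal sketch of the volatile variant that a conforming compiler must keep (the DELAY_COUNT value here is a hypothetical calibration constant):

#define DELAY_COUNT 100000UL   /* hypothetical calibration value */

void delay(void)
{
    /* Every read and write of counter is an access to a volatile
       object, i.e. observable behaviour, so the loop must be emitted. */
    volatile unsigned long counter = DELAY_COUNT;
    while (counter--)
        ;
}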
Does this imply that the loop could be optimised out?
Yes.
Even if counter were volatile?
No. It would read and write a volatile variable, which has observable behavior, so it must occur.
If the counter is volatile, the compiler cannot legally optimize out the delay loop. Otherwise it can.
Delay loops like this are bad because the time they burn depends on how the compiler generates code for them. Using different optimization options you can achieve different delays, which is hardly what one wants from a delay loop.
For this reason such delay loops should be implemented in assembly language, where the programmer controls the code fully. This typically applies in embedded systems with simple CPUs.
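One common compromise is a loop around an inline-asm nop; a minimal sketch assuming a GCC-compatible compiler (__asm__ volatile is a GCC extension, not ANSI C), with the caveat that the loop overhead itself still depends on code generation:

void delay_cycles(unsigned long n)
{
    while (n--)
        __asm__ volatile ("nop");   /* asm volatile statements are not deleted */
}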
The standard dictates the behavior you see. If you build a dependency tree for DELAY_COUNT, you find it has a modify-without-use property, which means it can be eliminated; that covers the non-volatile case. In the volatile case the compiler cannot use the dependency tree to remove the variable, so the delay remains: volatile means that hardware may change the memory-mapped value, or in some cases simply "I really need this, don't throw it away". Labelling the counter volatile tells the compiler: please don't throw this away, it's here for a reason.
Related
I can use volatile for something like the following, where the value might be modified by an external function/signal/etc:
volatile int exit = 0;
while (!exit)
{
    /* something */
}
And the compiler/assembly will not cache the value. On the other hand, with the restrict keyword, I can tell the compiler that a variable has no aliases / is only referenced through this pointer inside the current scope, and the compiler can try to optimize it:
void update_res(int *a, int *b, int *restrict c) {
    *a += *c;
    *b += *c;
}
Is that a correct understanding of the two, that they are basically opposites of each other? volatile says the variable can be modified outside the current scope and restrict says it cannot? What would be an example of the assembly instructions it would emit for the most basic example using these two keywords?
They're not exact opposites of each other. But yes, volatile gives a hard constraint to the optimizer to not optimize away accesses to an object, while restrict is a promise / guarantee to the optimizer about aliasing, so in a broad sense they act in opposite directions in terms of freedom for the optimizer. (And of course usually only matter in optimized builds.)
restrict is totally optional, only allowing extra performance. volatile sig_atomic_t can be "needed" for communication between a signal handler and the main program, and volatile for device drivers. For any other use, _Atomic is usually a better choice; it has a similar effect, especially with current compilers, which purposely don't optimize atomics. Other than that, volatile is not needed for correctness of normal code. Neither volatile nor _Atomic is needed for correctness of single-threaded code without signal handlers, regardless of how complex the series of function calls is, or any number of globals holding pointers to other variables. The as-if rule already requires compilers to make asm that gives observable results equivalent to stepping through the C abstract machine one line at a time. (Memory contents are not an observable result; that's why data races on non-atomic objects are undefined behaviour.)
volatile means that every C variable read (lvalue to rvalue conversion) and write (assignment) must become an asm load and store. In practice yes that means it's safe for things that change asynchronously, like MMIO device addresses, or as a bad way to roll your own _Atomic int with memory_order_relaxed. (When to use volatile with multi threading? - basically never in C11 / C++11.)
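A minimal sketch of that load/store rule (function names are illustrative):

int sum_twice(int *p)
{
    return *p + *p;   /* compiler may load *p once and reuse the value */
}

int sum_twice_volatile(volatile int *p)
{
    return *p + *p;   /* must emit two separate loads */
}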
volatile says the variable can be modified outside the current scope
It depends what you mean by that. volatile is far stronger than that: it makes it safe for the variable to be modified asynchronously while execution is inside the current scope.
It's already safe for a function called from this scope to modify a global exit var; if a function doesn't get inlined, compilers generally have to assume that every global var could have been modified, same for everything possibly reachable from global pointers (escape analysis), or from calling functions in this translation unit that modify file-scoped static variables.
And like I said, you can use it for multi-threading, but don't. C11 _Atomic is standardized and can be used to write code that compiles to the same asm, but with more guarantees about exactly what is and isn't implied. (Especially ordering wrt. other operations.)
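For instance, a relaxed _Atomic spin-wait, a sketch assuming C11 <stdatomic.h> (names are illustrative):

#include <stdatomic.h>

_Atomic int flag;

void wait_for_flag(void)
{
    /* A relaxed atomic load typically compiles to the same asm as a
       volatile read, but with defined inter-thread semantics. */
    while (!atomic_load_explicit(&flag, memory_order_relaxed))
        ;
}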
They have no equivalent in hand-written asm, because there's no optimizer between the asm source and the machine code.
In C compiler output, you won't notice a difference if you compile with optimization disabled. (Well maybe a minor difference in expressions that read the same volatile multiple times.)
Compiling with optimization disabled makes bad uninteresting asm, where every object is treated much like volatile to enable consistent debugging. As Multithreading program stuck in optimized mode but runs normally in -O0 shows, the optimizations allowed by making variables plain non-volatile only get done with optimization enabled. See also this Q&A about the same issue on single-core microcontrollers with interrupts.
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
How to remove "noise" from GCC/clang assembly output? - pretty sure I linked you this multiple times already.
What would be an example of the assembly instructions it would emit for the most basic example using these two keywords?
Try it yourself on https://godbolt.org/ with gcc10 -O3. You already have a useful test-case for restrict; it should let the compiler load *c once.
Or if you search at all: Ciro Santilli already analyzed the exact function you're asking about back in 2015, in an answer with over 150 upvotes. I found it as the 3rd hit when searching site:stackoverflow.com optimize restrict.
Realistic usage of the C99 'restrict' keyword? shows your exact case, including asm output with/without restrict, and analysis / discussion of that asm.
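For contrast, a sketch of why the compiler must reload *c when restrict is absent (assuming the caller might pass aliasing pointers; the function name is illustrative):

void update_res_may_alias(int *a, int *b, int *c)
{
    *a += *c;   /* if a == c, this store changes *c ... */
    *b += *c;   /* ... so *c must be loaded again here */
}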
The C99 standard, 5.1.2.3 §2, says:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression in general includes both value computations and initiation of side effects. Value computation for an lvalue expression includes determining the identity of the designated object.
I guess that in a lot of cases the compiler can't inline (and possibly eliminate) the functions doing I/O, since they live in a different translation unit. And the parameters of functions doing I/O are often pointers, which further hinders the optimizer.
But link-time-optimization gives the compiler "more to chew on".
And even though the paragraph I quoted says that "modifying an object" (that's standard-speak for memory) is a side effect, stores to memory are not automatically treated as side effects once the optimizer kicks in. Here's an example from John Regehr's Nine Ways to Break Your Systems Software Using Volatile, where the store to message is reordered relative to the store to the volatile ready variable.
volatile int ready;
int message[100];

void foo(int i) {
    message[i/10] = 42;   /* plain store: may be moved past the line below */
    ready = 1;            /* volatile store: must happen, but is no fence */
}
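If that ordering matters, one hedged fix (my sketch, not part of the quoted example) is a C11 release store for ready:

#include <stdatomic.h>

atomic_int ready;
int message[100];

void foo(int i)
{
    message[i/10] = 42;
    /* Release store: the message store cannot be reordered after it. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}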
How does a C compiler determine whether a statement operates on a file? In a freestanding embedded environment, I declare registers as volatile, which keeps the compiler from optimizing I/O accesses away or reordering them.
Is that the only way to tell the compiler that we're doing I/O? Or does the C standard dictate that certain calls in the standard library do I/O and must therefore receive special treatment? But then, what if someone created their own system call wrapper for, say, read?
As C has no statement dedicated to IO, only function calls can modify files. So if the compiler sees no function call in a sequence of statements, it knows that this sequence has not modified any file.
If only functions from the standard library are called, and if the environment is hosted, the compiler could know what they do and use that to guess what will happen.
But what is really important is that the compiler only needs to respect side effects. When it does not know, it is perfectly allowed to assume that a function call could involve side effects and to act accordingly. That does not violate the standard if no side effects are actually involved; it merely misses a possible optimization.
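A sketch of that principle (maybe_does_io is a hypothetical function whose body lives in another translation unit):

extern void maybe_does_io(void);   /* body unknown to this translation unit */

int g;

void f(void)
{
    g = 1;             /* must be stored before the call: */
    maybe_does_io();   /* the compiler must assume this call could read
                          or write g, or modify a file */
    g = 2;
}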
I have been advised many times that accesses to volatile objects can't be optimised away, however it seems to me as though this section, present in the C89, C99 and C11 standards advises otherwise:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
If I understand correctly, this sentence is stating that an actual implementation can optimise away part of an expression, providing these two requirements are met:
"its value is not used", and
"that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)"...
It seems to me that many people are confusing the meaning for "including" with the meaning for "excluding".
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
As I understand it, either the compiler can optimise away both calls to functions and volatile accesses, or it can optimise away neither, because of 5.1.2.3p4.
I think the "including any" applies to the "needed side-effects" , whereas you seem to be reading it as applying to "part of an expression".
So the intent was to say:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced.
Examples of needed side-effects include:
Needed side-effects caused by a function which this expression calls
Accesses to volatile variables
Now, the term needed side-effect is not defined by the Standard. Section /4 is not attempting to define it either -- it's trying (and not succeeding very well) to provide examples.
I think the only sensible interpretation is to treat it as meaning observable behaviour which is defined by 5.1.2.3/6. So it would have been a lot simpler to write:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no observable behaviour would be caused.
Your questions in the edit are answered by 5.1.2.3/6, sometimes known as the as-if rule, which I'll quote here:
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behaviour of the program.
Answering the specific questions in the edit:
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
Timing isn't a side-effect. A "needed" side-effect presumably here means one that causes observable behaviour.
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
Yes, these can be optimized out because they do not cause observable behaviour.
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
No, because accesses to volatile objects are defined as observable behaviour.
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
Because volatile accesses are defined as observable behaviour and empty functions aren't.
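To make the contrast concrete, a minimal sketch (do_nothing and port are illustrative names):

void do_nothing(void) { }

volatile int port;   /* imagine a memory-mapped device register */

void f(void)
{
    do_nothing();   /* no observable behaviour: may be eliminated */
    port = 0;       /* volatile access is observable: must be emitted */
}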
I believe this:
(including any caused by calling a function or accessing a volatile object)
is intended to be read as
(including:
any side-effects caused by calling a function; or
accessing a volatile variable)
This reading makes sense because accessing a volatile variable is a side-effect.
I'm not a native English speaker, and while reading C99 a sentence (in 6.7.1) confuses me:
'(A declaration of an identifier for an object with storage-class specifier register suggests that access to the object be as fast as possible.) The extent to which such suggestions are effective is implementation-defined.'
How should I parse the second sentence:
The extent to, which such suggestions are effective, is implementation-defined.
The extent, to which such suggestions are effective, is implementation-defined.
which one is better?
Does that mean an implementation has full power to decide how to deal with register, even to the point of terminating translation?
Thanks.
Don't mix up register variables with registers of the CPU. These are not the same.
register exists to allow for optimizations. Taking the address of such a variable is forbidden, and as a consequence such a variable can never alias, so the compiler always knows its latest value. It can therefore easily realize such a variable as a CPU register or an assembler immediate, for example.
As a particular subcase, there are const-qualified register variables that can't be altered by the program directly, nor behind the scenes by other code accessing them through a pointer. Only such variables can easily be guaranteed to be constant throughout the whole program execution.
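A minimal sketch of that constraint (names are illustrative):

void f(void)
{
    register int r = 0;
    /* int *p = &r; */   /* constraint violation: the address of a
                            register variable may not be taken */
    r++;                 /* otherwise usable like any variable */
}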
No -- the extent to which the suggestions are effective is implementation-defined, but the effect of the code as a whole is not.
To put it slightly differently, when/if you put register into your code (where allowed), the compiler can completely ignore it -- and I feel obliged to add that nearly all reasonably modern compilers will.
Nonetheless, the mere presence of register won't prevent the code from compiling, unless you try to apply it somewhere it's not allowed (e.g., to a global variable).
Bottom line: don't use it, but don't worry about removing it from code that already has it either. It's a waste of time and effort, but with reasonably modern compilers a completely harmless one.
register is an old keyword. Basically, it is a hint telling the optimizer that it is better to keep this variable in a register. The phrase you mention means that the compiler can treat this hint in any way it wants. In today's world of optimizing compilers, its meaning and value are, 99% of the time, nothing.
I saw this in a part that never gets called in a coworker's code:
volatile unsigned char vol_flag = 0;
// ...
while(!vol_flag);
vol_flag is declared in the header file, but is never changed. Am I correct that this will lead to the program hanging in an infinite loop? Is there a way out of it?
Usually a code like this indicates that vol_flag is expected to be changed externally at some point. Here, externally may mean a different thread, an interrupt handler, a piece of hardware (in case of memory mapped IO) etc. This loop effectively waits for the external event which changes the flag.
The volatile keyword is a way for the programmer to express the fact that it is not safe to assume what is apparent from the code: namely that the flag is not changed in the loop. Thus, it prevents the compiler from making optimizations which could compromise the intentions behind the code. Instead, the compiler is forced to make a memory reference to fetch the value of the flag.
Note that (unlike in Java) volatile in C/C++ does not establish a happens-before relationship and does not guarantee any ordering or visibility of memory references across the volatile access. Moreover, it does not ensure atomicity of variable references. Thus, it is not a tool for communication between threads. See this for details.
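For completeness, a hedged sketch of the single-threaded, signal-handler use such a flag is meant for (the choice of SIGINT and the registration via signal() are illustrative):

#include <signal.h>

volatile sig_atomic_t vol_flag = 0;

static void on_sigint(int sig)
{
    (void)sig;
    vol_flag = 1;   /* writing a volatile sig_atomic_t is async-signal-safe */
}

int main(void)
{
    signal(SIGINT, on_sigint);
    while (!vol_flag)
        ;           /* volatile read each iteration, so the loop stays */
    return 0;
}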