Is restricted the opposite of volatile? - c

I can use volatile for something like the following, where the value might be modified by an external function/signal/etc:
volatile int exit = 0;
while (!exit)
{
/* something */
}
And the compiler/assembly will not cache the value. On the other hand, with the restrict keyword, I can tell the compiler that a variable has no aliases / only referenced once inside the current scope, and the compiler can try and optimize it:
void update_res (int *a , int *b, int * restrict c ) {
* a += * c;
* b += * c;
}
Is that a correct understanding of the two, that they are basically opposites of each other? volatile says the variable can be modified outside the current scope and restrict says it cannot? What would be an example of the assembly instructions it would emit for the most basic example using these two keywords?

They're not exact opposites of each other. But yes, volatile gives a hard constraint to the optimizer to not optimize away accesses to an object, while restrict is a promise / guarantee to the optimizer about aliasing, so in a broad sense they act in opposite directions in terms of freedom for the optimizer. (And of course usually only matter in optimized builds.)
restrict is totally optional, only allowing extra performance. volatile sig_atomic_t can be "needed" for communication between a signal handler and the main program, or for device drivers. For any other use, _Atomic is usually a better choice. Other than that, volatile is also not needed for correctness of normal code. (_Atomic has a similar effect, especially with current compilers which purposely don't optimize atomics.) Neither volatile nor _Atomic are needed for correctness of single-threaded code without signal handlers, regardless of how complex the series of function calls is, or any amount of globals holding pointers to other variables. The as-if rule already requires compilers to make asm that gives observable results equivalent to stepping through the C abstract machine 1 line at a time. (Memory contents is not an observable result; that's why data races on non-atomic objects are undefined behaviour.)
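As a minimal sketch of the signal-handler case mentioned above: the names got_signal, on_signal and signal_flag_demo are made up for illustration, and raise() merely stands in for a signal that would normally arrive from outside the program.

```c
#include <signal.h>

/* Flag shared between the handler and normal code. volatile sig_atomic_t
   is the type the C standard blesses for this purpose. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: it only sets the flag, nothing else. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, sends ourselves SIGINT, then polls the flag.
   Returns the flag value (1), or -1 if the handler could not be installed. */
int signal_flag_demo(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);            /* the handler has run by the time raise() returns */
    while (!got_signal) {
        /* every test of got_signal is a real load because it is volatile */
    }
    return got_signal;
}
```

Calling signal_flag_demo() from main should return 1. In a real program the loop would be doing useful work and the signal would come from elsewhere.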
volatile means that every C variable read (lvalue to rvalue conversion) and write (assignment) must become an asm load and store. In practice yes that means it's safe for things that change asynchronously, like MMIO device addresses, or as a bad way to roll your own _Atomic int with memory_order_relaxed. (When to use volatile with multi threading? - basically never in C11 / C++11.)
volatile says the variable can be modified outside the current scope
It depends what you mean by that. Volatile is far stronger than that, and makes it safe for it to be modified asynchronously while inside the current scope.
It's already safe for a function called from this scope to modify a global exit var; if a function doesn't get inlined, compilers generally have to assume that every global var could have been modified, same for everything possibly reachable from global pointers (escape analysis), or from calling functions in this translation unit that modify file-scoped static variables.
And like I said, you can use it for multi-threading, but don't. C11 _Atomic is standardized and can be used to write code that compiles to the same asm, but with more guarantees about exactly what is and isn't implied. (Especially ordering wrt. other operations.)
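A rough sketch of what the _Atomic version of the question's exit flag could look like. The names exit_requested, request_exit and run_until_exit are assumptions for illustration (the question's variable name exit would clash with exit() from <stdlib.h>), and the loop sets the flag itself only as a stand-in for another thread doing so.

```c
#include <stdatomic.h>

/* Zero-initialized because it has static storage duration. */
static atomic_int exit_requested;

/* Would normally be called from another thread. */
void request_exit(void)
{
    atomic_store_explicit(&exit_requested, 1, memory_order_relaxed);
}

/* Spins until the flag is set and returns how many iterations ran.
   After max_iterations the loop requests the exit itself, which is only
   a single-threaded stand-in for a second thread. */
int run_until_exit(int max_iterations)
{
    int iterations = 0;
    while (!atomic_load_explicit(&exit_requested, memory_order_relaxed)) {
        if (++iterations >= max_iterations)
            request_exit();
    }
    return iterations;
}
```

memory_order_relaxed is enough here because only the flag itself is communicated. If the flag were meant to publish other data, release/acquire ordering would be the thing to look at.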
They have no equivalent in hand-written asm because there's no optimizer between the source and machine code asm.
In C compiler output, you won't notice a difference if you compile with optimization disabled. (Well maybe a minor difference in expressions that read the same volatile multiple times.)
Compiling with optimization disabled makes bad uninteresting asm, where every object is treated much like volatile to enable consistent debugging. As Multithreading program stuck in optimized mode but runs normally in -O0 shows, the optimizations allowed by making variables plain non-volatile only get done with optimization enabled. See also this Q&A about the same issue on single-core microcontrollers with interrupts.
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
How to remove "noise" from GCC/clang assembly output? - pretty sure I linked you this multiple times already.
What would be an example of the assembly instructions it would emit for the most basic example using these two keywords?
Try it yourself on https://godbolt.org/ with gcc10 -O3. You already have a useful test-case for restrict; it should let the compiler load *c once.
Or if you search at all, Ciro Santilli has already analyzed the exact function you're asking about back in 2015, in an answer with over 150 upvotes. I found it by searching on site:stackoverflow.com optimize restrict, as the 3rd hit.
Realistic usage of the C99 'restrict' keyword? shows your exact case, including asm output with/without restrict, and analysis / discussion of that asm.
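For pasting into a compiler explorer, here is a small sketch of the two variants side by side. update_res is the function from the question; update_plain is a name made up here for the same body without restrict.

```c
/* No restrict: the store through a might modify *c (a and c may alias),
   so the compiler has to load *c again for the second statement. */
void update_plain(int *a, int *b, int *c)
{
    *a += *c;
    *b += *c;
}

/* restrict on c: the caller promises that *c is not modified through
   any other pointer in here, so one load of *c can serve both statements. */
void update_res(int *a, int *b, int *restrict c)
{
    *a += *c;
    *b += *c;
}
```

Both functions give the same result when the arguments don't alias. Calling update_res with a or b pointing at the same object as c breaks the promise and is undefined behaviour, while update_plain is well-defined in that case.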

Related

Is it good practice to make a global variable always volatile?

I know what volatile means. My question is: if a variable is global, is it good practice to make it volatile even if I don't use it to interface with hardware?
Header:
typedef struct
{
int Value;
}Var_;
extern volatile Var_ myVariable;
Source:
volatile Var_ myVariable;
No. If you’re writing multi-threaded code, you want to use atomic variables, not volatile. For example, many concurrent structures need to be kept consistent, not modified one word at a time.
If no other thread, process or hardware is modifying the variable, you should not use either atomics or volatile. It will just complicate the program, run slower, and disable certain APIs for no reason.
The volatile keyword has historically been used for a few different things (such as telling the compiler not to optimize away a delay loop), but its purpose in C11 is narrow: to specify that a value in memory will change by some means that doesn’t follow the rules of atomics. You need it to write some kinds of device drivers, but it’s discouraged even in other low-level code such as OS kernels.
No, it is not good practice. volatile informs the C implementation (largely the compiler) that an object may be changed by something outside of the C implementation or that accesses to the object within the C implementation may have desired effects outside the C implementation. As long as your global object is only used and modified inside your own program, it has no volatile effects, and declaring it with volatile causes the compiler to suppress optimization and to generate unnecessary accesses to it within your program.

Is it useless to use the `register` keyword with modern compilers, when optimizing?

The C register keyword gives the compiler a hint to prefer storing a variable in a register rather than, say, on the stack. A compiler may ignore it if it likes. I understand that it's mostly-useless these days when you compile with optimization turned on, but is it entirely useless?
More specifically: For any combination of { gcc, clang, msvc } x { -Og, -O, -O2, -O3 }: Is register ignored when deciding whether to actually assign a register? And if not, are there cases in which it's useful enough to bother using it?
Notes:
I am not asking whether the keyword has any effect; of course it does: it prevents you from taking the address of that variable, and if you don't optimize at all, it will make the difference between register assignment or memory assignment for your variable.
Answers for just one compiler / some of the above combinations are very welcome.
For GCC, register has had no effect on code generation at all, not even a hint, at all optimization levels, for all supported CPU architectures, for over a decade.
The reasons for this are largely historical. GCC 2.95 and older had two register allocators, one ("stupid") used when not optimizing, and one ("local, global, reload") used when optimizing. The "stupid" allocator did try to honor register, but the "local, global, reload" allocator completely ignored it. (I don't know what the original rationale for that design decision was; you'd have to ask Richard Kenner.) In version 3.0, the "stupid" allocator was scrapped in favor of adding a fast-and-sloppy mode to "local, global, reload". Nobody bothered to write the code to make that mode pay attention to register, so it doesn't.
As of this writing, the GCC devs are in the process of replacing "local, global, reload" with a new allocator called "IRA and LRA", but it, too, completely ignores register.
However, the (C-only) rule that you cannot take the address of a register variable is still enforced, and the keyword is used by the explicit register variable extension, which allows you to dedicate a specific register to a variable; this can be useful in programs that use a lot of inline assembly.
C Standard says:
(c11, 6.7.1p6) "A declaration of an identifier for an object with storage-class specifier register suggests that access to the object be as fast as possible. The extent to which such suggestions are effective is implementation-defined."
These suggestions being implementation-defined means the implementation must define the choices being made ((c11, J.3.8 Hints) "The extent to which suggestions made by using the register storage-class specifier are effective (6.7.1)").
Here is what the documentation of some popular C compilers says.
For gcc (source):
The register specifier affects code generation only in these ways:
When used as part of the register variable extension, see Explicit Register Variables.
When -O0 is in use, the compiler allocates distinct stack memory for all variables that do not have the register storage-class
specifier; if register is specified, the variable may have a
shorter lifespan than the code would indicate and may never be
placed in memory.
On some rare x86 targets, setjmp doesn’t save the registers in all circumstances. In those cases, GCC doesn’t allocate any
variables in registers unless they are marked register.
For IAR compiler for ARM (source):
Honoring the register keyword (6.7.1)
User requests for register variables are not honored
If an object of automatic storage duration never has its address taken, a compiler may safely assume that its value can only be observed or modified by code which uses it directly. In the absence of the register keyword, a compiler which generates code in single-pass fashion would have no way of knowing, given...
void test(int *somePointer)
{
int i;
for (i=0; i<10; i+=2)
{
somePointer[i] = -1;
somePointer[i+1] = 2;
}
}
whether the write to somePointer[i] might possibly write i. If it could, then the compiler would be required to reload i when evaluating somePointer[i+1]. Applying the register keyword to i would allow even a single-pass compiler to avoid the reload because it would be entitled to assume that no valid pointer could hold a value derived from the address of i, and consequently there was no way that writing somePointer[i] could affect i.
The register keyword could be useful, even today, if it were not interpreted as imposing a constraint that the address of an object cannot be exposed to anything, but rather as inviting compilers to assume that no pointer derived from an object's address will be used outside the immediate context where the address was taken, and that within contexts where the address is taken the object will be accessed only using pointers derived from that address. Unfortunately, the only situations where the Standard permits register qualifiers are those where an object's address is neither used nor exported to outside code in any fashion, i.e. cases which a multi-pass compiler could identify by itself without need for the qualifier.
The register keyword could be useful, even with modern compilers, if it invited compilers to--at their leisure--treat any context where a register-qualified object's address is taken (e.g. get_integer(&i);) using the pattern:
{
int temp = i;
get_integer(&temp);
i = temp;
}
and to assume that a register-qualified object of external scope will only be accessed in contexts which either explicitly access the object or take its address. At present, compilers have very limited ability to cache global variables in registers across pointer accesses involving the same types; the register keyword could help with that except that compilers are required to treat as constraint violations all the situations where it could be useful.

Volatile and its harmful implications

I am an embedded developer and use the volatile keyword when working with I/O ports. My project manager suggested that using volatile is harmful and has a lot of drawbacks, but I find volatile useful in most cases in embedded programming. As far as I know, volatile is considered harmful in kernel code because it makes the behaviour of the code unpredictable. Are there any drawbacks to using volatile in embedded systems as well?
No, volatile is not harmful. In any situation. Ever. There is no possible well-formed piece of code that will break with the addition of volatile to an object (and pointers to that object). However, volatile is often poorly understood. The reason the kernel docs state that volatile is to be considered harmful is that people kept using it for synchronization between kernel threads in broken ways. In particular, they used volatile integer variables as though access to them was guaranteed to be atomic, which it isn't.
volatile is also not useless, and particularly if you go bare-metal, you will need it. But, like any other tool, it is important to understand the semantics of volatile before using it.
What volatile is
Access to volatile objects is, in the standard, considered a side-effect in the same way as incrementing or decrementing by ++ and --. In particular, this means that 5.1.2.3 (3), which says
(...) An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)
does not apply. The compiler has to chuck out everything it thinks it knows about the value of a volatile variable at every sequence point. (like other side-effects, when access to volatile objects happens is governed by sequence points)
The effect of this is largely the prohibition of certain optimizations. Take, for example, the code
int i;
void foo(void) {
i = 0;
while(i == 0) {
// do stuff that does not touch i
}
}
The compiler is allowed to make this into an infinite loop that never checks i again because it can deduce that the value of i is not changed in the loop, and thus that i == 0 will never be false. This holds true even if there is another thread or an interrupt handler that could conceivably change i. The compiler does not know about them, and it does not care. It is explicitly allowed to not care.
Contrast this with
int volatile i;
void foo(void) {
i = 0;
while(i == 0) { // Note: This is still broken, only a little less so.
// do stuff that does not touch i
}
}
Now the compiler has to assume that i can change at any time and cannot do this optimization. This means, of course, that if you deal with interrupt handlers and threads, volatile objects are necessary for synchronisation. They are not, however, sufficient.
What volatile isn't
What volatile does not guarantee is atomic access. This should make intuitive sense if you're used to embedded programming. Consider, if you will, the following piece of code for an 8-bit AVR MCU:
uint32_t volatile i;
ISR(TIMER0_OVF_vect) {
++i;
}
void some_function_in_the_main_loop(void) {
for(;;) {
do_something_with(i); // This is thoroughly broken.
}
}
The reason this code is broken is that access to i is not atomic -- cannot be atomic on an 8-bit MCU. In this simple case, for example, the following might happen:
i is 0x0000ffff
do_something_with(i) is about to be called
the high two bytes of i are copied into the parameter slot for this call
at this point, timer 0 overflows and the main loop is interrupted
the ISR changes i. The lower two bytes of i overflow and are now 0. i is now 0x00010000.
the main loop continues, and the lower two bytes of i are copied into the parameter slot
do_something_with is called with 0 as its parameter.
Similar things can happen on PCs and other platforms. If anything, more opportunities it can fail open up with a more complex architecture.
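The sequence of steps above can be replayed on any host machine with a small simulation. This is only a sketch: fake_isr and torn_read are made-up names, and the "interrupt" is an ordinary function call placed exactly between the two halves of the read.

```c
#include <stdint.h>

/* Starts at the value from the walkthrough above. */
static volatile uint32_t counter = 0x0000ffffu;

/* Stand-in for the timer ISR: increments the 32-bit counter. */
static void fake_isr(void)
{
    ++counter;
}

/* Simulates a non-atomic 32-bit read that is interrupted halfway. */
uint32_t torn_read(void)
{
    uint32_t high = counter & 0xffff0000u;  /* high two bytes: 0x0000 */
    fake_isr();                             /* counter becomes 0x00010000 */
    uint32_t low = counter & 0x0000ffffu;   /* low two bytes: 0x0000 */
    return high | low;                      /* 0, a value counter never held */
}
```

The first call to torn_read() returns 0 even though counter only ever held 0x0000ffff and 0x00010000. volatile made both halves real loads, but it did nothing to keep them together.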
Takeaway
So no, using volatile is not bad, and you will (often) have to do it in bare-metal code. However, when you do use it, you have to keep in mind that it is not a magic wand, and that you will still have to make sure you don't trip over yourself. In embedded code, there's often a platform-specific way to handle the problem of atomicity; in the case of AVR, for example, the usual crowbar method is to disable interrupts for the duration, as in
uint32_t x;
ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
x = i;
}
do_something_with(x);
...where the ATOMIC_BLOCK macro calls cli() (disable interrupts) before and sei() (enable interrupts) afterwards if they were enabled beforehand.
With C11, which is the first C standard that explicitly acknowledges the existence of multithreading, a new family of atomic types and memory fencing operations have been introduced that can be used for inter-thread synchronisation and in many cases make use of volatile unnecessary. If you can use those, do it, but it'll likely be some time before they reach all common embedded toolchains. With them, the loop above could be fixed like this:
atomic_int i;
void foo(void) {
atomic_store(&i, 0);
while(atomic_load(&i) == 0) {
// do stuff that does not touch i
}
}
...in its most basic form. The precise semantics of the more relaxed memory order semantics go way beyond the scope of a SO answer, so I'll stick with the default sequentially consistent stuff here.
If you're interested in it, Gil Hamilton provided a link in the comments to an explanation of a lock-free stack implementation using C11 atomics, although I don't feel it's a terribly good write-up of the memory order semantics themselves. The C11 model does, however, appear to closely mirror the C++11 memory model, of which a useful presentation exists here. If I find a link to a C11-specific write-up, I will put it here later.
volatile is only useful when the so qualified object can change asynchronously. Such changes can happen
if the object is in fact a hardware I/O register or similar that has changes external to your program
if the object might be changed by a signal handler
if the object is changed between calls to setjmp and longjmp
In all these cases you must declare your object volatile, otherwise your program will not work correctly. (And you might notice that objects shared between different threads are not in the list.)
In all other cases you shouldn't, because you may be missing optimization opportunities. On the other hand, qualifying an object volatile that doesn't fall under the points above will not make your code incorrect.
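The setjmp/longjmp case from the list is probably the least familiar one, so here is a minimal sketch of it. The names env and value_after_longjmp are invented for the example.

```c
#include <setjmp.h>

static jmp_buf env;

/* Modifies a local between setjmp and longjmp, then reads it afterwards. */
int value_after_longjmp(void)
{
    /* Without volatile, the value of attempts would be indeterminate
       after the longjmp, because it is changed after setjmp was called. */
    volatile int attempts = 0;

    if (setjmp(env) == 0) {
        attempts = 1;       /* changed between setjmp and longjmp */
        longjmp(env, 1);    /* jumps back; setjmp now returns 1 */
    }
    return attempts;        /* reliably 1 thanks to volatile */
}
```

With attempts declared as a plain int, an optimizing compiler would be allowed to keep it in a register that longjmp restores to its old contents.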
Not using volatile where necessary and appropriate is far more likely to be harmful! The solution to any perceived problems with volatile is not to ban its use, because there are a number of cases where it is necessary for safe and correct semantics. Rather the solution is to understand its purpose and its behaviour.
It is essential for any data that may be changed outside of the knowledge of the compiler, such as I/O and dual-ported or DMA memory. It is also necessary for access to memory shared between execution contexts such as threads and interrupt handlers, and this is perhaps where the confusion lies: volatile ensures an explicit read of such memory, but it does not enforce atomicity or mutual exclusion. Additional mechanisms are required for that. That does not preclude volatile; it is merely one part of the solution to shared memory access.
See the following articles on the use of volatile (and send them to your project manager too!):
Place volatile accurately by Dan Saks.
Introduction to the volatile keyword by Nigel Jones
Guidelines for handling volatile variables by Colin Walls
Combining C's volatile and const keywords - Michael Barr
Volatile tells the compiler not to optimize anything that has to do with the volatile variable.
Why the "volatile" type class should not be used? - Best article in Kernel doc
https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
volatile is a keyword in C which tells the compiler not to perform any kind of optimization on that variable.
Let me give you a simple example:
int temp;
for ( i=0 ;i <5 ; i++ )
{
temp = 5;
}
What the compiler may do to optimize the code:
int temp;
temp = 5; /* assigned temp variable before the loop. */
for ( i=0 ;i <5 ; i++ )
{
}
But if we add the volatile keyword, the compiler will not perform any such optimization on the temp variable.
volatile int temp;
for ( i=0 ;i <5 ; i++ )
{
temp = 5;
}
"Volatile Considered Harmful" ---> I don't consider volatile as harmful. You use volatile where you don't want any kind of optimization from compiler end.
For example, suppose this piece of code is used by a thermometer company, and temp is a variable that holds the temperature of the atmosphere, which can change at any time. If we do not use volatile, the compiler may optimize the repeated reads away, and the reported temperature will always be the same.
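To make the thermometer idea a bit more concrete, here is a hedged sketch of reading a sensor value through a pointer to volatile. read_temperature and its parameters are hypothetical, and on real hardware the pointer would come from the device's datasheet address rather than from an ordinary variable.

```c
#include <stdint.h>

/* Samples the register `samples` times and returns the last value seen.
   Because *temp_reg is volatile, each iteration performs a real load,
   so a value that changes between iterations is actually observed. */
int32_t read_temperature(volatile const int32_t *temp_reg, int samples)
{
    int32_t last = 0;
    for (int k = 0; k < samples; k++) {
        last = *temp_reg;   /* one load per iteration, never hoisted */
    }
    return last;
}
```

For testing on a PC it can be pointed at a normal variable, e.g. int32_t fake_reg = 23; read_temperature(&fake_reg, 5); which returns 23. Without the volatile qualifier, the compiler would be free to do a single load and skip the rest of the loop.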

Can an ANSI C compiler remove a delay loop?

Consider a while loop in ANSI C whose only purpose is to delay execution:
unsigned long counter = DELAY_COUNT;
while(counter--);
I've seen this used a lot to enforce delays on embedded systems, where eg. there is no sleep function and timers or interrupts are limited.
My reading of the ANSI C standard is that this can be completely removed by a conforming compiler. It has none of the side effects described in 5.1.2.3:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
...and this section also says:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Does this imply that the loop could be optimised out? Even if counter were volatile?
Notes:
This is not quite the same as Are compilers allowed to eliminate infinite loops?, because that refers to infinite loops, and questions arise about when a program is allowed to terminate at all. In this case, the program will certainly proceed past this line at some point, optimisation or not.
I know what GCC does (removes the loop for -O1 or higher, unless counter is volatile), but I want to know what the standard dictates.
C standard compliance follows the "as-if" rule, by which the compiler can generate any code that behaves "as if" it was running your actual instructions on the abstract machine. Since not performing any operations has the same observable behaviour "as if" you did perform the loop, it's entirely permissible to not generate code for it.
In other words, the time something takes to compute on a real machine is not part of the "observable" behaviour of your program, it is merely a phenomenon of a particular implementation.
The situation is different for volatile variables, since accessing a volatile counts as an "observable" effect.
Does this imply that the loop could be optimised out?
Yes.
Even if counter were volatile?
No. It would read and write a volatile variable, which has observable behavior, so it must occur.
If the counter is volatile, the compiler cannot legally optimize out the delay loop. Otherwise it can.
Delay loops like this are bad because the time they burn depends on how the compiler generates code for them. Using different optimization options you can achieve different delays, which is hardly what one wants from a delay loop.
For this reason such delay loops should be implemented in assembly language, where the programmer controls the code fully. This typically applies in embedded systems with simple CPUs.
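If a C delay loop is used anyway, the volatile variant discussed above could look roughly like this. busy_wait is a made-up name, and the returned iteration count is there only so the behaviour can be checked; it says nothing about how long the loop takes.

```c
/* Counts down a volatile counter. Each counter-- is a volatile read and
   a volatile write, so the compiler may not remove the loop. */
unsigned long busy_wait(unsigned long count)
{
    volatile unsigned long counter = count;
    unsigned long iterations = 0;

    while (counter--) {
        ++iterations;   /* bookkeeping only, not needed for the delay */
    }
    return iterations;  /* equals count */
}
```

The loop is guaranteed to execute, but the time per iteration still depends on the compiler, the optimization level and the CPU, so the delay has to be calibrated for each build.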
The standard dictates the behavior you see. If you create a dependency tree for DELAY_COUNT, you see that it has a modify-without-use property, which means it can be eliminated. This refers to the non-volatile case. In the volatile case, the compiler cannot use the dependency tree to attempt to remove this variable, and so the delay remains (since volatile means that hardware can change the memory-mapped value, or in some cases means "I really need this, don't throw it away"). In the case you're looking at, labeling it volatile tells the compiler: please don't throw this away, it's here for a reason.

How do I know if gcc agrees that something is volatile?

Consider the following:
volatile uint32_t i;
How do I know if gcc did or did not treat i as volatile? It would be declared as such because no nearby code is going to modify it, and modification of it is likely due to some interrupt.
I am not the world's worst assembly programmer, but I play one on TV. Can someone help me to understand how it would differ?
If you take the following stupid code:
#include <stdio.h>
#include <inttypes.h>
volatile uint32_t i;
int main(void)
{
if (i == 64738)
return 0;
else
return 1;
}
Compile it to object format and disassemble it via objdump, then do the same after removing 'volatile', there is no difference (according to diff). Is the volatile declaration just too close to where it's checked or modified, or should I just always use some atomic type when declaring something volatile? Do some optimization flags influence this?
Note, my stupid sample does not fully match my question, I realize this. I'm only trying to find out if gcc did or did not treat the variable as volatile, so I'm studying small dumps to try to find the difference.
Many compilers in some situations don't treat volatile the way they should. If you deal much with volatiles, see this paper to avoid nasty surprises: Volatiles are Miscompiled, and What to Do about It. It also contains a pretty good description of volatile, backed by quotations from the standard.
To be 100% sure, and for such a simple example, check the assembly output.
Try setting the variable outside a loop and reading it inside the loop. In a non-volatile case, the compiler might (or might not) shove it into a register or make it a compile time constant or something before the loop, since it "knows" it's not going to change, whereas if it's volatile it will read it from the variable space every time through the loop.
Basically, when you declare something as volatile, you're telling the compiler not to make certain optimizations. If it decided not to make those optimizations, you don't know that it didn't do them because it was declared volatile, or just that it decided it needed those registers for something else, or it didn't notice that it could turn it into a compile time constant.
As far as I know, volatile restricts the optimizer. For example, if your code looked like this:
int foo() {
int x = 0;
while (x);
return 42;
}
The "while" loop would be optimized out of the binary.
But if you define 'x' as being volatile (ie, volatile int x;), then the compiler will leave the loop alone.
Your little sample is inadequate to show anything. The difference between a volatile variable and one that isn't is that each load or store in the code has to generate precisely one load or store in the executable for a volatile variable, whereas the compiler is free to optimize away loads or stores of non-volatile variables. If you're getting one load of i in your sample, that's what I'd expect for volatile and non-volatile.
To show a difference, you're going to have to have redundant loads and/or stores. Try something like
int i = 5;
int j = i + 2;
i = 5;
i = 5;
printf("%d %d\n", i, j);
changing i between non-volatile and volatile. You may have to enable some level of optimization to see the difference.
The code there has three stores and two loads of i, which can be optimized away to one store and probably one load if i is not volatile. If i is declared volatile, all stores and loads should show up in the object code in order, no matter what the optimization. If they don't, you've got a compiler bug.
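For comparing the two cases in one compile, the snippet can be wrapped into a pair of functions. The function names are invented here, and the sum is returned instead of printed so the file needs no headers.

```c
/* Plain int: the optimizer may fold all of this into "return 12". */
int redundant_plain(void)
{
    int i = 5;
    int j = i + 2;
    i = 5;
    i = 5;
    return i + j;
}

/* volatile int: three stores and two loads of i should all appear
   in the generated code, in this order, at any optimization level. */
int redundant_volatile(void)
{
    volatile int i = 5;   /* store 1 */
    int j = i + 2;        /* load 1  */
    i = 5;                /* store 2 */
    i = 5;                /* store 3 */
    return i + j;         /* load 2  */
}
```

Both functions return 12. The difference shows up only in the assembly, e.g. with gcc -O2 -S, where the plain version typically collapses to a constant and the volatile one keeps its memory accesses.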
It should always treat it as volatile.
The reason the code is the same is that volatile just instructs the compiler to load the variable from memory each time it accesses it. Even with optimization on, the compiler still needs to load i from memory once in the code you've written, because it can't infer the value of i at compile time. If you access it repeatedly, you'll see a difference.
Any modern compiler has multiple stages. One of the fairly easy yet interesting questions is whether the declaration of the variable itself was parsed correctly. This is easy because the C++ name mangling should differ depending on the volatile-ness. Hence, if you compile twice, once with volatile defined away, the symbol tables should differ slightly.
Read the standard before you misquote or downvote. Here's a quote from n2798:
7.1.6.1 The cv-qualifiers
7 Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C.
The keyword volatile acts as a hint. Much like the register keyword. However, volatile asks the compiler to keep all its optimizations at bay. This way, it won't keep a copy of the variable in a register or a cache (to optimize speed of access) but rather fetch it from the memory everytime you request for it.
Since there is so much confusion, some more detail. The C99 standard does in fact say that a volatile-qualified object must be looked up every time it is read and so on, as others have noted. But there is also another section that says that what constitutes a volatile access is implementation-defined. So a compiler which knows the hardware inside out will know, for example, when you have an automatic volatile-qualified variable whose address is never taken, that it will not be put in a sensitive region of memory, and will almost certainly ignore the hint and optimize it away.
This keyword finds usage in setjmp and longjmp type of error handling. The only thing you have to bear in mind is that: You supply the volatile keyword when you think the variable may change. That is, you could take an ordinary object and manage with a few casts.
Another thing to keep in mind is the definition of what constitutes a volatile access is left by standard to the implementation.
If you really wanted different assembly compile with optimization
