compiler execution sequence - c

I am programming a microcontroller in C language and I’ve faced the following situation: To enable/disable the internal clock generator, I need to first write a protection key to a specific register and after make the enable/disable operation, writing in another register.
The user manual specify the following about this operation:
snip from user manual
As you can see in the highlighted part of the text, the piece of code that make this two operation (writing protection key and enable the oscillator) is very sensitive, because any operation should be executed between this two register access. Therefore, I start to concern about any situation that could led to a non sequential execution of this two operations.
I know that it is common to temporary disable interruption while executing sensitive piece of codes, but I was wonder if there is any compiler optimization that could insert another operation between this two register access. So, in C programming, there is any compiler directive to ensure that it will not happen?
Actually, I even do not know if make sense to think that the compiler will mix the sequence of instructions written in C language. I remember to have already heard that it could happens sometimes, in the speed optimization process. If I am wrong, sorry about and thanks in advance for the clarification.

The process of toggling bits in C involves a read-modify-write sequence of instructions.
Some unwanted operations may happen between those instructions and the usual approach is to either disable the interrupts or to use bit-banding.
What bit-banding is, it allows to you toggle bits using one instruction(without going into details how it happens).
If your microcontroller allows you to use bit-banding, then that should fix all your problems.
If you can't use bit-banding, then you have to sit and think what interrupts may cause problems to you and temporarily disable them.
To get back to the core of the question - you must declare all of your registers volatile for two main reasons:
Unlike ordinary memory that is only modified by the program itself, the value stored in those memory mapped registers can change at any time, what the compiler will do is just optimize away all reads and writes to the register. Volatile prevents this.By declaring a variable volatile, you're effectively asking the compiler to be as inefficient as possible when it comes to reading or writing that variable. Specifically, the compiler should generate code to perform each and every read from a volatile variable as well as each and every write to a volatile variable even if you write it twice in a row or read it and ignore the result. Not a single read or write can be skipped. In other words, no compiler optimizations are allowed with respect to volatile variables.
In a nutshell, the order of accesses of volatile variables A and B in the object code must be the same as the order of those accesses in the source code. The compiler is not allowed to reorder volatile variable accesses for any reason. (Consider what might go wrong if the referenced memory locations were hardware registers like in this case). Volatile guarantees you that everything happens in the sequence you specify in your code and no reorderings are being made.
To sum it up - use bit-banding and declare the registers volatile.

Related

embedded C - using "volatile" to assert consistency

Consider the following code:
// In the interrupt handler file:
volatile uint32_t gSampleIndex = 0; // declared 'extern'
void HandleSomeIrq()
{
gSampleIndex++;
}
// In some other file
void Process()
{
uint32_t localSampleIndex = gSampleIndex; // will this be optimized away?
PrevSample = RawSamples[(localSampleIndex + 0) % NUM_RAW_SAMPLE_BUFFERS];
CurrentSample = RawSamples[(localSampleIndex + 1) % NUM_RAW_SAMPLE_BUFFERS];
NextSample = RawSamples[(localSampleIndex + 2) % NUM_RAW_SAMPLE_BUFFERS];
}
My intention is that PrevSample, CurrentSample and NextSample are consistent, even if gSampleIndex is updated during the call to Process().
Will the assignment to the localSampleIndex do the trick, or is there any chance it will be optimized away even though gSampleIndex is volatile?
In principle, volatile is not enough to guarantee that Process only sees consistent values of gSampleIndex. In practice, however, you should not run into any issues if uinit32_t is directly supported by the hardware. The proper solution would be to use atomic accesses.
The problem
Suppose that you are running on a 16-bit architecture, so that the instruction
localSampleIndex = gSampleIndex;
gets compiled into two instructions (loading the upper half, loading the lower half). Then the interrupt might be called between the two instructions, and you'll get half of the old value combined with half of the new value.
The solution
The solution is to access gSampleCounter using atomic operations only. I know of three ways of doing that.
C11 atomics
In C11 (supported since GCC 4.9), you declare your variable as atomic:
#include <stdatomic.h>
atomic_uint gSampleIndex;
You then take care to only ever access the variable using the documented atomic interfaces. In the IRQ handler:
atomic_fetch_add(&gSampleIndex, 1);
and in the Process function:
localSampleIndex = atomic_load(gSampleIndex);
Do not bother with the _explicit variants of the atomic functions unless you're trying to get your program to scale across large numbers of cores.
GCC atomics
Even if your compiler does not support C11 yet, it probably has some support for atomic operations. For example, in GCC you can say:
volatile int gSampleIndex;
...
__atomic_add_fetch(&gSampleIndex, 1, __ATOMIC_SEQ_CST);
...
__atomic_load(&gSampleIndex, &localSampleIndex, __ATOMIC_SEQ_CST);
As above, do not bother with weak consistency unless you're trying to achieve good scaling behaviour.
Implementing atomic operations yourself
Since you're not trying to protect against concurrent access from multiple cores, just race conditions with an interrupt handler, it is possible to implement a consistency protocol using standard C primitives only. Dekker's algorithm is the oldest known such protocol.
In your function you access volatile variable just once (and it's the only volatile one in that function) so you don't need to worry about code reorganization that compiler may do (and volatile prevents). What standard says for these optimizations at §5.1.2.3 is:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Note last sentence: "...no needed side effects are produced (...accessing a volatile object)".
Simply volatile will prevent any optimization compiler may do around that code. Just to mention few: no instruction reordering respect other volatile variables. no expression removing, no caching, no value propagation across functions.
BTW I doubt any compiler may break your code (with or without volatile). Maybe local stack variable will be elided but value will be stored in a registry (for sure it won't repeatedly access a memory location). What you need volatile for is value visibility.
EDIT
I think some clarification is needed.
Let me safely assume you know what you're doing (you're working with interrupt handlers so this shouldn't be your first C program): CPU word matches your variable type and memory is properly aligned.
Let me also assume your interrupt is not reentrant (some magic cli/sti stuff or whatever your CPU uses for this) unless you're planning some hard-time debugging and tuning.
If these assumptions are satisfied then you don't need atomic operations. Why? Because localSampleIndex = gSampleIndex is atomic (because it's properly aligned, word size matches and it's volatile), with ++gSampleIndex there isn't any race condition (HandleSomeIrq won't be called again while it's still in execution). More than useless they're wrong.
One may think: "OK, I may not need atomic but why I can't use them? Even if such assumption are satisfied this is an *extra* and it'll achieve same goal" . No, it doesn't. Atomic has not same semantic of volatile variables (and seldom volatile is/should be used outside memory mapped I/O and signal handling). Volatile (usually) is useless with atomic (unless a specific architecture says it is) but it has a great difference: visibility. When you update gSampleIndex in HandleSomeIrq standard guarantees that value will be immediately visible to all threads (and devices). with atomic_uint standard guarantees it'll be visible in a reasonable amount of time.
To make it short and clear: volatile and atomic are not the same thing. Atomic operations are useful for concurrency, volatile are useful for lower level stuff (interrupts, devices). If you're still thinking "hey they do *exactly* what I need" please read few useful links picked from comments: cache coherency and a nice reading about atomics.
To summarize:
In your case you may use an atomic variable with a lock (to have both atomic access and value visibility) but no one on this earth would put a lock inside an interrupt handler (unless absolutely definitely doubtless unquestionably needed, and from code you posted it's not your case).

When do I need to use volatile in ISRs?

I am making embedded firmware where everything after initialization happens in ISRs. I have variables that are shared between them, and I am wondering in which cases they need to be volatile. I never block, waiting for a change in another ISR.
When can I be certain that actual memory is read or written to when not using volatile? Once every ISR?
Addendum:
This is for ARM Cortex-M0, but this isn't really a question about ISRs as much as it is about compiler optimization, and as such, the platform shouldn't really be important.
The question is entirely answerable, and the answer is simple:
Without volatile you (simplistically) can't assume that actual memory access will ever happen - the compiler is free to conclude that results are either entirely unused (if that is apparent in what it can see), or that they may be safely cached in a register, or computed out of order (as long as visible dependencies are maintained).
You need volatile to tell the compiler that the side effects of access may be important to something the optimizer is unable to analyze, such as an interrupt context or the context of a different "thread".
In effect volatile is how you say to the compiler "I know something you don't, so don't try to be clever here"
Beware that volatile does not guarantee atomicity of read-modify-write, or of even read or write alone where the data type (or its misalignment) requires muti-step access. In those cases, you risk not just getting a stale value, but an entirely erroneous one.
it is already mentioned that actual write to memory/cache is not exactly predictable when using non volatile variables.
but it is also worth mentioning about another aspect where the volatile variable might get cached and might require a forced cache flush to write in to the actual memory ( depends on whether a write-through or a write-back policy is used).
consider another case where the volatile variable is not cached ( placed in non-cacheable area)
but due to the presence of write buffers and bus bridges sometimes it is not predictable when the real write happens to the intended register and it requires a dummy read to ensure that write actually happened to the real register/memory. This is particularly helpful to avoid race conditions in interrupt clearing/masking.
even though compilers are not supposed to be clever around volatile variables.it is free to do some optimizations with respect to volatile sequence points ( optimization across sequence points not permitted, but optimization between sequence points are permitted)
The variables that need volatile are:
1) Share data between ISR and program data or other threads.
Preferable these are flags that indicate block access to various data structures.
// main() code;
disable_interrupts();
if (flag == 0) {
flag = 1;
enable_interrupts();
Manipulate_data();
flag = 0;
} else {
enable_interrupts();
Cope_with_data_unavailable();
}
2) Memory mapped hardware registers.
This "memory" can change at any time due to hardware conditions and the complier needs to know that their values are not necessarily consistent. Without volatile, a naive comapiler would only sample fred once resulting in a potential endless loop.
volatile int *fred = 0x1234; // Hardware reg at address 0x1234;
while (*fred);

Using bit fields for AVR ports

I'd like to be able to use something like this to make access to my ports clearer:
typedef struct {
unsigned rfid_en: 1;
unsigned lcd_en: 1;
unsigned lcd_rs: 1;
unsigned lcd_color: 3;
unsigned unused: 2;
} portc_t;
extern volatile portc_t *portc;
But is it safe? It works for me, but...
1) Is there a chance of race conditions?
2) Does gcc generate read-modify-write cycles for code that modifies a single field?
3) Is there a safe way to update multiple fields?
4) Is the bit packing and order guaranteed? (I don't care about portability in this case, so gcc-specific options to make it Do What I Mean are fine.)
Handling race conditions must be done by operating system level calls (which will indeed use read-modify-writes), GCC won't do that.
Idem., and no GCC does not generate read-modify-write instructions for volatile. However, a CPU will normally do the write atomically (simply because it's one instruction). This holds true if the bit-field stays within an int for example, but this is CPU/implementation dependent; I mean some may guarantee this up to 8-byte value, while other only up to 4-byte values. So under that condition, bits can't be mixed up (i.e. a few written from one thread, and others from another thread won't occur).
The only way to set multiple fields at the same time, is to set these values in an intermediate variable, and then assign this variable to the volatile.
The C standard specifies that bits are packed together (it seems that there might be exceptions when you start mixing types, but I've never seen that; everyone always uses unsigned ...).
Note: Defining something volatile does not cause a compiler to generate read-modify-writes. What volatile does is telling the compiler that an assignment to that pointer/address must always be made, and may not be optimised away.
Here's another post about the same subject matter. I found there to be quite a few other places where you can find more details.
The keyword volatile has nothing to do with race conditions, or what thread is accessing code. The keyword tells the compiler not to cache the value in registers. It tells the compiler to generate code so that every access goes to the location allocated to the variable, because each access may see a different value. This is the case with memory mapped peripherals. This doesn't help if your MPU has it's own cache. There are usually special instructions or un-cached areas of the memory map to ensure the location, and not a cached copy, is read.
As for being thread safe, just remember that even a memory access may not be thread safe is it is done in two instructions. E.g. in 8051 assembler, you have to get a 16 bit value one byte at a time. The instruction sequence can be interrupted by an IRQ or another thread and the second byte read or written, potentially corrupted.

What optimization could the compiler perform in absence of the "volatile" keyword in the following code?

Following is some code to flash LED's on a Pandaboard when running the Barrelfish operating system. My question is that why don't the LED's flash if the 'volatile' keyword is removed from the definitions of gpio_oe and gpio_dataout.
static volatile uint32_t *gpio_oe = (uint32_t *)(GPIO_BASE + 0x0134);
static volatile uint32_t *gpio_dataout = (uint32_t *)(GPIO_BASE + 0x013C);
void led_flash
{
// Enable output
*gpio_oe &= (~(1 << 8));
// Toggle LED on and off till eternity
while(true)
{
*gpio_dataout ^= (1 << 8); // Set means LED on; Clear means LED off
time_delay(); // To give blinking effect
}
}
I know that volatile needs to be used if the value of a variable can change spontaneously through a source outside the program. But I can't see such a case here. What optimization does the compiler perform that renders the whole while loop for flashing LED's meaningless? And what is the logic behind such optimization, ie. a legit case where such an optimization would make sense?
You also need volatile to force a memory write and order in which generated code would access volatile variables. With regular variables the compiler may decide that the writes are unnecessary and either throw them away or only keep the last one.
Moved from the comments: The compiler may write nothing at all if it sees no reads of the variable, it may even remove the variable.
As far as the compiler can tell, the values *gpio_oe and *gpio_dataout are written but never read. For normal data memory such an access pattern is entirely redundant so can be optimised out. Similarly for locations that are read but never written.
For memory mapped I/O however access to the "memory" location has side effects that the compiler is not aware of. Declaring the location volatile tells the compiler that the location must be explicitly accessed exactly as described by the code.
As well as memory mapped I/O, a similar issue occurs with memory shared between separate threads (RTOS tasks or interrupt handlers for example) since the language is similarly unaware of these contexts.
Embedded.com covers the subject in a number of articles:
Introduction to the volatile keyword - Nigel Jones
Place volatile accurately - Dan Saks
Combining C's volatile and const keywords - Michael Barr
You are trying to access hardware registers directly so you want every access to go to the memory bus and not stay in registers like a normal variable. Volatile will tell the compiler to force all uses of that variable to go to or come from the memory bus. You still have the problem of data caching but that is a separate topic.
EDIT:
What would happen is in your infinite loop the compiler could optimize that variable to be in a register meaning never go to the memory bus meaning never change the gpio meaning the led would not blink. This should be easy to see if you remove the volatile, compile then disassemble (or compile to asm, I find it much easier to read by disassembling the binary).
volatile prevents the compiler from optimizing reads and writes to variable, without it the compiler assumes the value never changes and could replace a loop in which a flag is read with one call or remove a write if the variable is not used later, See this question:
Why is volatile needed in C?

Concurrent access to struct member

I'm using 32-bit microcontroller (STR91x). I'm concurrently accessing (from ISR and main loop) struct member of type enum. Access is limited to writing to that enum field in the ISR and checking in the main loop. Enum's underlying type is not larger than integer (32-bit).
I would like to make sure that I'm not missing anything and I can safely do it.
Provided that 32 bit reads and writes are atomic, which is almost certainly the case (you might want to make sure that your enum's word-aligned) then that which you've described will be just fine.
As paxdiablo & David Knell said, generally speaking this is fine. Even if your bus is < 32 bits, chances are the instruction's multiple bus cycles won't be interrupted, and you'll always read valid data.
What you stated, and what we all know, but it bears repeating, is that this is fine for a single-writer, N-reader situation. If you had more than one writer, all bets are off unless you have a construct to protect the data.
If you want to make sure, find the compiler switch that generates an assembly listing and examine the assembly for the write in the ISR and the read in the main loop. Even if you are not familiar with ARM assembly, I'm sure you could quickly and easily be able to discern whether or not the reads and writes are atomic.
ARM supports 32-bit aligned reads that are atomic as far as interrupts are concerned. However, make sure your compiler doesn't try to cache the value in a register! Either mark it as a volatile, or use an explicit memory barrier - on GCC this can be done like so:
int tmp = yourvariable;
__sync_synchronize(yourvariable);
Note, however, that current versions of GCC person a full memory barrier for __sync_synchronize, rather than just for the one variable, so volatile is probably better for your needs.
Further, note that your variable will be aligned automatically unless you are doing something Weird (ie, explicitly specifying the location of the struct in memory, or requesting a packed struct). Unaligned variables on ARM cannot be read atomically, so make sure it's aligned, or disable interrupts while reading.
Well, it depends entirely on your hardware but I'd be surprised if an ISR could be interrupted by the main thread.
So probably the only thing you have to watch out for is if the main thread could be interrupted halfway through a read (so it may get part of the old value and part of the new).
It should be a simple matter of consulting the specs to ensure that interrupts are only processed between instructions (this is likely since the alternative would be very complex) and that your 32-bit load is a single instruction.
An aligned 32 bit access will generally be atomic (unless it were a particularly ludicrous compiler!).
However the rock-solid solution (and one generally applicable to non-32 bit targets too) is to simply disable the interrupt temporarily while accessing the data outside of the interrupt. The most robust way to do this is through an access function to statically scoped data rather than making the data global where you then have no single point of access and therefore no way of enforcing an atomic access mechanism when needed.

Resources