CPU Architecture dependent?

CPU Architecture dependent? - c

I would like to determine if the below code would allow the CPU to fetch the unsafe_variable twice? Let's assume the compiler will not reorder or optimise the code because of the volatile and _ReadWriteBarrier (on VS). Mutex cannot be used here and I only care about the case of the potential double fetch.
I am not an expert in term of CPU design but what I am concerned about regarding a potential double fetch would be: Speculative execution (Performance optimization technique including branch prediction and prefetch techniques), Register and memory location renaming and the use of reorder buffer and store buffers within one or two CPUs? Please let me know if a double fetch here would be at all possible.
int function(void* Data) {
size_t _varSize = ((volatile DATA *)Data)->unsafe_variable;
// unsafe_variable is in some kind of shared memory and can change at any time
_ReadWriteBarrier();
// this does not prevent against CPU optimisations (MemoryBarrier would)
if (_varSize > x * y) { return FALSE;}
size_t size = _varSize - t * q;
function_xy(size);
return TRUE;
}

Accessing volatile memory location counts as a side-effect in the C standard C11 5.1.2.3/2:
"Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects"
and the compiler is not allowed to generate code that causes additional side-effects. It is explicitly not allowed to generate code that causes additional access of volatile variables (C11 5.1.2.3/6).
But then you are using VC++ so all bets are off. It barely follows any standard. I would advise to use a strictly conforming C compiler instead, if possible.

Nothing in the C language specifies memory system behavior at this level nor anything about threads, so there is no certain answer.
To be certain you would need exact details of the CPU, OS, and compiler.
However, I doubt any modern architecture would fetch twice from memory to serve any of the purposes you mention, provided computation is not interrputed. Any reference after the first would be to cache.
However, if there is a context switch after the unsafe_variable fetch is begun but before _varSize can be used, it's imaginable that the fetch might be restarted when this thread continues.

Related

Is assigning a pointer in C program considered atomic on x86-64

https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html#Atomic-Types says - In practice, you can assume that int is atomic. You can also assume that pointer types are atomic; that is very convenient. Both of these assumptions are true on all of the machines that the GNU C Library supports and on all POSIX systems we know of.
My question is whether pointer assignment can be considered atomic on x86_64 architecture for a C program compiled with gcc m64 flag. OS is 64bit Linux and CPU is Intel(R) Xeon(R) CPU D-1548. One thread will be setting a pointer and another thread accessing the pointer. There is only one writer thread and one reader thread. Reader should either be getting the previous value of the pointer or the latest value and no garbage value in between.
If it is not considered atomic, please let me know how can I use the gcc atomic builtins or maybe memory barrier like __sync_synchronize to achieve the same without using locks. Interested only in C solution and not C++. Thanks!

Bear in mind that atomicity alone is not enough for communicating between threads. Nothing prevents the compiler and CPU from reordering previous/subsequent load and store instructions with that "atomic" store. In old days people used volatile to prevent that reordering but that was never intended for use with threads and doesn't provide means to specify less or more restrictive memory order (see "Relationship with volatile" in there).
You should use C11 atomics because they guarantee both atomicity and memory order.

For almost all architectures, pointer load and store are atomic. A once notable exception was 8086/80286 where pointers could be seg:offset; there was an l[des]s instruction which could make an atomic load; but no corresponding atomic store.
The integrity of the pointer is only a small concern; your bigger issue revolves around synchronization: the pointer was at value Y, you set it to X; how will you know when nobody is using the (old) Y value?
A somewhat related problem is that you may have stored things at X, which the other thread expects to find. Without synchronization, other might see the new pointer value, however what it points to might not be up to date yet.

A plain global char *ptr should not be considered atomic. It might work sometimes, especially with optimization disabled, but you can get the compiler to make safe and efficient optimized asm by using modern language features to tell it you want atomicity.
Use C11 stdatomic.h or GNU C __atomic builtins. And see Why is integer assignment on a naturally aligned variable atomic on x86? - yes the underlying asm operations are atomic "for free", but you need to control the compiler's code-gen to get sane behaviour for multithreading.
See also LWN: Who's afraid of a big bad optimizing compiler? - weird effects of using plain vars include several really bad well-known things, but also more obscure stuff like invented loads, reading a variable more than once if the compiler decides to optimize away a local tmp and load the shared var twice, instead of loading it into a register. Using asm("" ::: "memory") compiler barriers may not be sufficient to defeat that depending on where you put them.
So use proper atomic stores and loads that tell the compiler what you want: You should generally use atomic loads to read them, too.
#include <stdatomic.h> // C11 way
_Atomic char *c11_shared_var; // all access to this is atomic, functions needed only if you want weaker ordering
void foo(){
atomic_store_explicit(&c11_shared_var, newval, memory_order_relaxed);
}
char *plain_shared_var; // GNU C
// This is a plain C var. Only specific accesses to it are atomic; be careful!
void foo() {
__atomic_store_n(&plain_shared_var, newval, __ATOMIC_RELAXED);
}
Using __atomic_store_n on a plain var is the functionality that C++20 atomic_ref exposes. If multiple threads access a variable for the entire time that it needs to exist, you might as well just use C11 stdatomic because every access needs to be atomic (not optimized into a register or whatever). When you want to let the compiler load once and reuse that value, do char *tmp = c11_shared_var; (or atomic_load_explicit if you only want acquire instead of seq_cst; cheaper on a few non-x86 ISAs).
Besides lack of tearing (atomicity of asm load or store), the other key parts of _Atomic foo * are:
The compiler will assume that other threads may have changed memory contents (like volatile effectively implies), otherwise the assumption of no data-race UB will let the compiler hoist loads out of loops. Without this, dead-store elimination might only do one store at the end of a loop, not updating the value multiple times.
The read side of the problem is usually what bites people in practice, see Multithreading program stuck in optimized mode but runs normally in -O0 - e.g. while(!flag){} becomes if(!flag) infinite_loop; with optimization enabled.
Ordering wrt. other code. e.g. you can use memory_order_release to make sure that other threads that see the pointer update also see all changes to the pointed-to data. (On x86 that's as simple as compile-time ordering, no extra barriers needed for acquire/release, only for seq_cst. Avoid seq_cst if you can; mfence or locked operations are slow.)
Guarantee that the store will compile to a single asm instruction. You'd be depending on this. It does happen in practice with sane compilers, although it's conceivable that a compiler might decide to use rep movsb to copy a few contiguous pointers, and that some machine somewhere might have a microcoded implementation that does some stores narrower than 8 bytes.
(This failure mode is highly unlikely; the Linux kernel relies on volatile load/store compiling to a single instruction with GCC / clang for its hand-rolled intrinsics. But if you just used asm("" ::: "memory") to make sure a store happened on a non-volatile variable, there's a chance.)
Also, something like ptr++ will compile to an atomic RMW operation like lock add qword [mem], 4, rather than separate load and store like volatile would. (See Can num++ be atomic for 'int num'? for more about atomic RMWs). Avoid that if you don't need it, it's slower. e.g. atomic_store_explicit(&ptr, ptr + 1, mo_release); - seq_cst loads are cheap on x86-64 but seq_cst stores aren't.
Also note that memory barriers can't create atomicity (lack of tearing), they can only create ordering wrt other ops.
In practice x86-64 ABIs do have alignof(void*) = 8 so all pointer objects should be naturally aligned (except in a __attribute__((packed)) struct which violates the ABI, so you can use __atomic_store_n on them. It should compile to what you want (plain store, no overhead), and meet the asm requirements to be atomic.
See also When to use volatile with multi threading? - you can roll your own atomics with volatile and asm memory barriers, but don't. The Linux kernel does that, but it's a lot of effort for basically no gain, especially for a user-space program.
Side note: an often repeated misconception is that volatile or _Atomic are needed to avoid reading stale values from cache. This is not the case.
All machines that run C11 threads across multiple cores have coherent caches, not needing explicit flush instructions in the reader or writer. Just ordinary load or store instructions, like x86 mov. The key is to not let the compiler keep values of shared variable in CPU registers (which are thread-private). It normally can do this optimization because of the assumption of no data-race Undefined Behaviour. Registers are very much not the same thing as L1d CPU cache; managing what's in registers vs. memory is done by the compiler, while hardware keeps cache in sync. See When to use volatile with multi threading? for more details about why coherent caches is sufficient to make volatile work like memory_order_relaxed.
See Multithreading program stuck in optimized mode but runs normally in -O0 for an example.

"Atomic" is treated as this quantum state where something can be both atomic and not atomic at the same time because "it's possible" that "some machines" "somewhere" "might not" write "a certain value" atomically. Maybe.
That is not the case. Atomicity has a very specific meaning, and it solves a very specific problem: threads being pre-empted by the OS to schedule another thread in its place on that core. And you cannot stop a thread from executing mid-assembly instruction.
What that means is that any single assembly instruction is "atomic" by definition. And since you have registry moving instructions, any register-sized copy is atomic by definition. That means a 32-bit integer on a 32-bit CPU, and a 64-bit integer on a 64-bit CPU are all atomic -- and of course that includes pointers (ignore all the people who will tell you "some architectures" have pointers of "different size" than registers, that hasn't been the case since 386).
You should however be careful not to hit variable caching problems (ie one thread writing a pointer, and another trying to read it but getting an old value from the cache), use volatile as needed to prevent this.

Volatile function argument in C (STM32F4 example) [duplicate]

Why is volatile needed in C? What is it used for? What will it do?

volatile tells the compiler not to optimize anything that has to do with the volatile variable.
There are at least three common reasons to use it, all involving situations where the value of the variable can change without action from the visible code:
When you interface with hardware that changes the value itself
when there's another thread running that also uses the variable
when there's a signal handler that might change the value of the variable.
Let's say you have a little piece of hardware that is mapped into RAM somewhere and that has two addresses: a command port and a data port:
typedef struct
{
int command;
int data;
int isBusy;
} MyHardwareGadget;
Now you want to send some command:
void SendCommand (MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isbusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}
Looks easy, but it can fail because the compiler is free to change the order in which data and commands are written. This would cause our little gadget to issue commands with the previous data-value. Also take a look at the wait while busy loop. That one will be optimized out. The compiler will try to be clever, read the value of isBusy just once and then go into an infinite loop. That's not what you want.
The way to get around this is to declare the pointer gadget as volatile. This way the compiler is forced to do what you wrote. It can't remove the memory assignments, it can't cache variables in registers and it can't change the order of assignments either
This is the correct version:
void SendCommand (volatile MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isBusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}

volatile in C actually came into existence for the purpose of not caching the values of the variable automatically. It will tell the compiler not to cache the value of this variable. So it will generate code to take the value of the given volatile variable from the main memory every time it encounters it. This mechanism is used because at any time the value can be modified by the OS or any interrupt. So using volatile will help us accessing the value afresh every time.

Another use for volatile is signal handlers. If you have code like this:
int quit = 0;
while (!quit)
{
/* very small loop which is completely visible to the compiler */
}
The compiler is allowed to notice the loop body does not touch the quit variable and convert the loop to a while (true) loop. Even if the quit variable is set on the signal handler for SIGINT and SIGTERM; the compiler has no way to know that.
However, if the quit variable is declared volatile, the compiler is forced to load it every time, because it can be modified elsewhere. This is exactly what you want in this situation.

volatile tells the compiler that your variable may be changed by other means, than the code that is accessing it. e.g., it may be a I/O-mapped memory location. If this is not specified in such cases, some variable accesses can be optimised, e.g., its contents can be held in a register, and the memory location not read back in again.

See this article by Andrei Alexandrescu, "volatile - Multithreaded Programmer's Best Friend"
The volatile keyword was
devised to prevent compiler
optimizations that might render code
incorrect in the presence of certain
asynchronous events. For example, if
you declare a primitive variable as
volatile, the compiler is not
permitted to cache it in a register --
a common optimization that would be
disastrous if that variable were
shared among multiple threads. So the
general rule is, if you have variables
of primitive type that must be shared
among multiple threads, declare those
variables volatile. But you can
actually do a lot more with this
keyword: you can use it to catch code
that is not thread safe, and you can
do so at compile time. This article
shows how it is done; the solution
involves a simple smart pointer that
also makes it easy to serialize
critical sections of code.
The article applies to both C and C++.
Also see the article "C++ and the Perils of Double-Checked Locking" by Scott Meyers and Andrei Alexandrescu:
So when dealing with some memory locations (e.g. memory mapped ports or memory referenced by ISRs [ Interrupt Service Routines ] ), some optimizations must be suspended. volatile exists for specifying special treatment for such locations, specifically: (1) the content of a volatile variable is "unstable" (can change by means unknown to the compiler), (2) all writes to volatile data are "observable" so they must be executed religiously, and (3) all operations on volatile data are executed in the sequence in which they appear in the source code. The first two rules ensure proper reading and writing. The last one allows implementation of I/O protocols that mix input and output. This is informally what C and C++'s volatile guarantees.

My simple explanation is:
In some scenarios, based on the logic or code, the compiler will do optimisation of variables which it thinks do not change. The volatile keyword prevents a variable being optimised.
For example:
bool usb_interface_flag = 0;
while(usb_interface_flag == 0)
{
// execute logic for the scenario where the USB isn't connected
}
From the above code, the compiler may think usb_interface_flag is defined as 0, and that in the while loop it will be zero forever. After optimisation, the compiler will treat it as while(true) all the time, resulting in an infinite loop.
To avoid these kinds of scenarios, we declare the flag as volatile, we are telling to compiler that this value may be changed by an external interface or other module of program, i.e., please don't optimise it. That's the use case for volatile.

A marginal use for volatile is the following. Say you want to compute the numerical derivative of a function f :
double der_f(double x)
{
static const double h = 1e-3;
return (f(x + h) - f(x)) / h;
}
The problem is that x+h-x is generally not equal to h due to roundoff errors. Think about it : when you substract very close numbers, you lose a lot of significant digits which can ruin the computation of the derivative (think 1.00001 - 1). A possible workaround could be
double der_f2(double x)
{
static const double h = 1e-3;
double hh = x + h - x;
return (f(x + hh) - f(x)) / hh;
}
but depending on your platform and compiler switches, the second line of that function may be wiped out by a aggressively optimizing compiler. So you write instead
volatile double hh = x + h;
hh -= x;
to force the compiler to read the memory location containing hh, forfeiting an eventual optimization opportunity.

There are two uses. These are specially used more often in embedded development.
Compiler will not optimise the functions that uses variables that are defined with volatile keyword
Volatile is used to access exact memory locations in RAM, ROM, etc... This is used more often to control memory-mapped devices, access CPU registers and locate specific memory locations.
See examples with assembly listing.
Re: Usage of C "volatile" Keyword in Embedded Development

I'll mention another scenario where volatiles are important.
Suppose you memory-map a file for faster I/O and that file can change behind the scenes (e.g. the file is not on your local hard drive, but is instead served over the network by another computer).
If you access the memory-mapped file's data through pointers to non-volatile objects (at the source code level), then the code generated by the compiler can fetch the same data multiple times without you being aware of it.
If that data happens to change, your program may become using two or more different versions of the data and get into an inconsistent state. This can lead not only to logically incorrect behavior of the program but also to exploitable security holes in it if it processes untrusted files or files from untrusted locations.
If you care about security, and you should, this is an important scenario to consider.

Volatile is also useful, when you want to force the compiler not to optimize a specific code sequence (e.g. for writing a micro-benchmark).

volatile means the storage is likely to change at anytime and be changed but something outside the control of the user program. This means that if you reference the variable, the program should always check the physical address (ie a mapped input fifo), and not use it in a cached way.

In the language designed by Dennis Ritchie, every access to any object, other than automatic objects whose address had not been taken, would behave as though it computed the address of the object and then read or wrote the storage at that address. This made the language very powerful, but severely limited optimization opportunities.
While it might have been possible to add a qualifier that would invite a compiler to assume that a particular object wouldn't be changed in weird ways, such an assumption would be appropriate for the vast majority of objects in C programs, and it would have been impractical to add a qualifier to all the objects for which such assumption would be appropriate. On the other hand, some programs need to use some objects for which such an assumption would not hold. To resolve this issue, the Standard says that compilers may assume that objects which are not declared volatile will not have their value observed or changed in ways that are outside the compiler's control, or would be outside a reasonable compiler's understanding.
Because various platforms may have different ways in which objects could be observed or modified outside a compiler's control, it is appropriate that quality compilers for those platforms should differ in their exact handling of volatile semantics. Unfortunately, because the Standard failed to suggest that quality compilers intended for low-level programming on a platform should handle volatile in a way that will recognize any and all relevant effects of a particular read/write operation on that platform, many compilers fall short of doing so in ways that make it harder to process things like background I/O in a way which is efficient but can't be broken by compiler "optimizations".

In my opinion, you should not expect too much from volatile. To illustrate, look at the example in Nils Pipenbrinck's highly-voted answer.
I would say, his example is not suitable for volatile. volatile is only used to:
prevent the compiler from making useful and desirable optimizations. It is nothing about the thread safe, atomic access or even memory order.
In that example:
void SendCommand (volatile MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isbusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}
the gadget->data = data before gadget->command = command only is only guaranteed in compiled code by compiler. At running time, the processor still possibly reorders the data and command assignment, regarding to the processor architecture. The hardware could get the wrong data(suppose gadget is mapped to hardware I/O). The memory barrier is needed between data and command assignment.

In simple terms, it tells the compiler not to do any optimisation on a particular variable. Variables which are mapped to device register are modified indirectly by the device. In this case, volatile must be used.

The Wiki say everything about volatile:
volatile (computer programming)
And the Linux kernel's doc also make a excellent notation about volatile:
Why the "volatile" type class should not be used

A volatile can be changed from outside the compiled code (for example, a program may map a volatile variable to a memory mapped register.) The compiler won't apply certain optimizations to code that handles a volatile variable - for example, it won't load it into a register without writing it to memory. This is important when dealing with hardware registers.

As rightly suggested by many here, the volatile keyword's popular use is to skip the optimisation of the volatile variable.
The best advantage that comes to mind, and worth mentioning after reading about volatile is -- to prevent rolling back of the variable in case of a longjmp. A non-local jump.
What does this mean?
It simply means that the last value will be retained after you do stack unwinding, to return to some previous stack frame; typically in case of some erroneous scenario.
Since it'd be out of scope of this question, I am not going into details of setjmp/longjmp here, but it's worth reading about it; and how the volatility feature can be used to retain the last value.

it does not allows compiler to automatic changing values of variables. a volatile variable is for dynamic use.

embedded C - using "volatile" to assert consistency

Consider the following code:
// In the interrupt handler file:
volatile uint32_t gSampleIndex = 0; // declared 'extern'
void HandleSomeIrq()
{
gSampleIndex++;
}
// In some other file
void Process()
{
uint32_t localSampleIndex = gSampleIndex; // will this be optimized away?
PrevSample = RawSamples[(localSampleIndex + 0) % NUM_RAW_SAMPLE_BUFFERS];
CurrentSample = RawSamples[(localSampleIndex + 1) % NUM_RAW_SAMPLE_BUFFERS];
NextSample = RawSamples[(localSampleIndex + 2) % NUM_RAW_SAMPLE_BUFFERS];
}
My intention is that PrevSample, CurrentSample and NextSample are consistent, even if gSampleIndex is updated during the call to Process().
Will the assignment to the localSampleIndex do the trick, or is there any chance it will be optimized away even though gSampleIndex is volatile?

In principle, volatile is not enough to guarantee that Process only sees consistent values of gSampleIndex. In practice, however, you should not run into any issues if uinit32_t is directly supported by the hardware. The proper solution would be to use atomic accesses.
The problem
Suppose that you are running on a 16-bit architecture, so that the instruction
localSampleIndex = gSampleIndex;
gets compiled into two instructions (loading the upper half, loading the lower half). Then the interrupt might be called between the two instructions, and you'll get half of the old value combined with half of the new value.
The solution
The solution is to access gSampleCounter using atomic operations only. I know of three ways of doing that.
C11 atomics
In C11 (supported since GCC 4.9), you declare your variable as atomic:
#include <stdatomic.h>
atomic_uint gSampleIndex;
You then take care to only ever access the variable using the documented atomic interfaces. In the IRQ handler:
atomic_fetch_add(&gSampleIndex, 1);
and in the Process function:
localSampleIndex = atomic_load(gSampleIndex);
Do not bother with the _explicit variants of the atomic functions unless you're trying to get your program to scale across large numbers of cores.
GCC atomics
Even if your compiler does not support C11 yet, it probably has some support for atomic operations. For example, in GCC you can say:
volatile int gSampleIndex;
...
__atomic_add_fetch(&gSampleIndex, 1, __ATOMIC_SEQ_CST);
...
__atomic_load(&gSampleIndex, &localSampleIndex, __ATOMIC_SEQ_CST);
As above, do not bother with weak consistency unless you're trying to achieve good scaling behaviour.
Implementing atomic operations yourself
Since you're not trying to protect against concurrent access from multiple cores, just race conditions with an interrupt handler, it is possible to implement a consistency protocol using standard C primitives only. Dekker's algorithm is the oldest known such protocol.

In your function you access volatile variable just once (and it's the only volatile one in that function) so you don't need to worry about code reorganization that compiler may do (and volatile prevents). What standard says for these optimizations at §5.1.2.3 is:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Note last sentence: "...no needed side effects are produced (...accessing a volatile object)".
Simply volatile will prevent any optimization compiler may do around that code. Just to mention few: no instruction reordering respect other volatile variables. no expression removing, no caching, no value propagation across functions.
BTW I doubt any compiler may break your code (with or without volatile). Maybe local stack variable will be elided but value will be stored in a registry (for sure it won't repeatedly access a memory location). What you need volatile for is value visibility.
EDIT
I think some clarification is needed.
Let me safely assume you know what you're doing (you're working with interrupt handlers so this shouldn't be your first C program): CPU word matches your variable type and memory is properly aligned.
Let me also assume your interrupt is not reentrant (some magic cli/sti stuff or whatever your CPU uses for this) unless you're planning some hard-time debugging and tuning.
If these assumptions are satisfied then you don't need atomic operations. Why? Because localSampleIndex = gSampleIndex is atomic (because it's properly aligned, word size matches and it's volatile), with ++gSampleIndex there isn't any race condition (HandleSomeIrq won't be called again while it's still in execution). More than useless they're wrong.
One may think: "OK, I may not need atomic but why I can't use them? Even if such assumption are satisfied this is an *extra* and it'll achieve same goal" . No, it doesn't. Atomic has not same semantic of volatile variables (and seldom volatile is/should be used outside memory mapped I/O and signal handling). Volatile (usually) is useless with atomic (unless a specific architecture says it is) but it has a great difference: visibility. When you update gSampleIndex in HandleSomeIrq standard guarantees that value will be immediately visible to all threads (and devices). with atomic_uint standard guarantees it'll be visible in a reasonable amount of time.
To make it short and clear: volatile and atomic are not the same thing. Atomic operations are useful for concurrency, volatile are useful for lower level stuff (interrupts, devices). If you're still thinking "hey they do *exactly* what I need" please read few useful links picked from comments: cache coherency and a nice reading about atomics.
To summarize:
In your case you may use an atomic variable with a lock (to have both atomic access and value visibility) but no one on this earth would put a lock inside an interrupt handler (unless absolutely definitely doubtless unquestionably needed, and from code you posted it's not your case).

When do I need to use volatile in ISRs?

I am making embedded firmware where everything after initialization happens in ISRs. I have variables that are shared between them, and I am wondering in which cases they need to be volatile. I never block, waiting for a change in another ISR.
When can I be certain that actual memory is read or written to when not using volatile? Once every ISR?
Addendum:
This is for ARM Cortex-M0, but this isn't really a question about ISRs as much as it is about compiler optimization, and as such, the platform shouldn't really be important.

The question is entirely answerable, and the answer is simple:
Without volatile you (simplistically) can't assume that actual memory access will ever happen - the compiler is free to conclude that results are either entirely unused (if that is apparent in what it can see), or that they may be safely cached in a register, or computed out of order (as long as visible dependencies are maintained).
You need volatile to tell the compiler that the side effects of access may be important to something the optimizer is unable to analyze, such as an interrupt context or the context of a different "thread".
In effect volatile is how you say to the compiler "I know something you don't, so don't try to be clever here"
Beware that volatile does not guarantee atomicity of read-modify-write, or of even read or write alone where the data type (or its misalignment) requires muti-step access. In those cases, you risk not just getting a stale value, but an entirely erroneous one.

it is already mentioned that actual write to memory/cache is not exactly predictable when using non volatile variables.
but it is also worth mentioning about another aspect where the volatile variable might get cached and might require a forced cache flush to write in to the actual memory ( depends on whether a write-through or a write-back policy is used).
consider another case where the volatile variable is not cached ( placed in non-cacheable area)
but due to the presence of write buffers and bus bridges sometimes it is not predictable when the real write happens to the intended register and it requires a dummy read to ensure that write actually happened to the real register/memory. This is particularly helpful to avoid race conditions in interrupt clearing/masking.
even though compilers are not supposed to be clever around volatile variables.it is free to do some optimizations with respect to volatile sequence points ( optimization across sequence points not permitted, but optimization between sequence points are permitted)

The variables that need volatile are:
1) Share data between ISR and program data or other threads.
Preferable these are flags that indicate block access to various data structures.
// main() code;
disable_interrupts();
if (flag == 0) {
flag = 1;
enable_interrupts();
Manipulate_data();
flag = 0;
} else {
enable_interrupts();
Cope_with_data_unavailable();
}
2) Memory mapped hardware registers.
This "memory" can change at any time due to hardware conditions and the complier needs to know that their values are not necessarily consistent. Without volatile, a naive comapiler would only sample fred once resulting in a potential endless loop.
volatile int *fred = 0x1234; // Hardware reg at address 0x1234;
while (*fred);

Why is volatile needed in C?

Why is volatile needed in C? What is it used for? What will it do?

volatile tells the compiler not to optimize anything that has to do with the volatile variable.
There are at least three common reasons to use it, all involving situations where the value of the variable can change without action from the visible code:
When you interface with hardware that changes the value itself
when there's another thread running that also uses the variable
when there's a signal handler that might change the value of the variable.
Let's say you have a little piece of hardware that is mapped into RAM somewhere and that has two addresses: a command port and a data port:
typedef struct
{
int command;
int data;
int isBusy;
} MyHardwareGadget;
Now you want to send some command:
void SendCommand (MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isbusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}
Looks easy, but it can fail because the compiler is free to change the order in which data and commands are written. This would cause our little gadget to issue commands with the previous data-value. Also take a look at the wait while busy loop. That one will be optimized out. The compiler will try to be clever, read the value of isBusy just once and then go into an infinite loop. That's not what you want.
The way to get around this is to declare the pointer gadget as volatile. This way the compiler is forced to do what you wrote. It can't remove the memory assignments, it can't cache variables in registers and it can't change the order of assignments either
This is the correct version:
void SendCommand (volatile MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isBusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}

volatile in C actually came into existence for the purpose of not caching the values of the variable automatically. It will tell the compiler not to cache the value of this variable. So it will generate code to take the value of the given volatile variable from the main memory every time it encounters it. This mechanism is used because at any time the value can be modified by the OS or any interrupt. So using volatile will help us accessing the value afresh every time.

Another use for volatile is signal handlers. If you have code like this:
int quit = 0;
while (!quit)
{
/* very small loop which is completely visible to the compiler */
}
The compiler is allowed to notice the loop body does not touch the quit variable and convert the loop to a while (true) loop. Even if the quit variable is set on the signal handler for SIGINT and SIGTERM; the compiler has no way to know that.
However, if the quit variable is declared volatile, the compiler is forced to load it every time, because it can be modified elsewhere. This is exactly what you want in this situation.

volatile tells the compiler that your variable may be changed by other means, than the code that is accessing it. e.g., it may be a I/O-mapped memory location. If this is not specified in such cases, some variable accesses can be optimised, e.g., its contents can be held in a register, and the memory location not read back in again.

My simple explanation is:
In some scenarios, based on the logic or code, the compiler will do optimisation of variables which it thinks do not change. The volatile keyword prevents a variable being optimised.
For example:
bool usb_interface_flag = 0;
while(usb_interface_flag == 0)
{
// execute logic for the scenario where the USB isn't connected
}
From the above code, the compiler may think usb_interface_flag is defined as 0, and that in the while loop it will be zero forever. After optimisation, the compiler will treat it as while(true) all the time, resulting in an infinite loop.
To avoid these kinds of scenarios, we declare the flag as volatile, we are telling to compiler that this value may be changed by an external interface or other module of program, i.e., please don't optimise it. That's the use case for volatile.

A marginal use for volatile is the following. Say you want to compute the numerical derivative of a function f :
double der_f(double x)
{
static const double h = 1e-3;
return (f(x + h) - f(x)) / h;
}
The problem is that x+h-x is generally not equal to h due to roundoff errors. Think about it : when you substract very close numbers, you lose a lot of significant digits which can ruin the computation of the derivative (think 1.00001 - 1). A possible workaround could be
double der_f2(double x)
{
static const double h = 1e-3;
double hh = x + h - x;
return (f(x + hh) - f(x)) / hh;
}
but depending on your platform and compiler switches, the second line of that function may be wiped out by a aggressively optimizing compiler. So you write instead
volatile double hh = x + h;
hh -= x;
to force the compiler to read the memory location containing hh, forfeiting an eventual optimization opportunity.

There are two uses. These are specially used more often in embedded development.
Compiler will not optimise the functions that uses variables that are defined with volatile keyword
Volatile is used to access exact memory locations in RAM, ROM, etc... This is used more often to control memory-mapped devices, access CPU registers and locate specific memory locations.
See examples with assembly listing.
Re: Usage of C "volatile" Keyword in Embedded Development

I'll mention another scenario where volatiles are important.
Suppose you memory-map a file for faster I/O and that file can change behind the scenes (e.g. the file is not on your local hard drive, but is instead served over the network by another computer).
If you access the memory-mapped file's data through pointers to non-volatile objects (at the source code level), then the code generated by the compiler can fetch the same data multiple times without you being aware of it.
If that data happens to change, your program may become using two or more different versions of the data and get into an inconsistent state. This can lead not only to logically incorrect behavior of the program but also to exploitable security holes in it if it processes untrusted files or files from untrusted locations.
If you care about security, and you should, this is an important scenario to consider.

Volatile is also useful, when you want to force the compiler not to optimize a specific code sequence (e.g. for writing a micro-benchmark).

volatile means the storage is likely to change at anytime and be changed but something outside the control of the user program. This means that if you reference the variable, the program should always check the physical address (ie a mapped input fifo), and not use it in a cached way.

In the language designed by Dennis Ritchie, every access to any object, other than automatic objects whose address had not been taken, would behave as though it computed the address of the object and then read or wrote the storage at that address. This made the language very powerful, but severely limited optimization opportunities.
While it might have been possible to add a qualifier that would invite a compiler to assume that a particular object wouldn't be changed in weird ways, such an assumption would be appropriate for the vast majority of objects in C programs, and it would have been impractical to add a qualifier to all the objects for which such assumption would be appropriate. On the other hand, some programs need to use some objects for which such an assumption would not hold. To resolve this issue, the Standard says that compilers may assume that objects which are not declared volatile will not have their value observed or changed in ways that are outside the compiler's control, or would be outside a reasonable compiler's understanding.
Because various platforms may have different ways in which objects could be observed or modified outside a compiler's control, it is appropriate that quality compilers for those platforms should differ in their exact handling of volatile semantics. Unfortunately, because the Standard failed to suggest that quality compilers intended for low-level programming on a platform should handle volatile in a way that will recognize any and all relevant effects of a particular read/write operation on that platform, many compilers fall short of doing so in ways that make it harder to process things like background I/O in a way which is efficient but can't be broken by compiler "optimizations".

In my opinion, you should not expect too much from volatile. To illustrate, look at the example in Nils Pipenbrinck's highly-voted answer.
I would say, his example is not suitable for volatile. volatile is only used to:
prevent the compiler from making useful and desirable optimizations. It is nothing about the thread safe, atomic access or even memory order.
In that example:
void SendCommand (volatile MyHardwareGadget * gadget, int command, int data)
{
// wait while the gadget is busy:
while (gadget->isbusy)
{
// do nothing here.
}
// set data first:
gadget->data = data;
// writing the command starts the action:
gadget->command = command;
}
the gadget->data = data before gadget->command = command only is only guaranteed in compiled code by compiler. At running time, the processor still possibly reorders the data and command assignment, regarding to the processor architecture. The hardware could get the wrong data(suppose gadget is mapped to hardware I/O). The memory barrier is needed between data and command assignment.

In simple terms, it tells the compiler not to do any optimisation on a particular variable. Variables which are mapped to device register are modified indirectly by the device. In this case, volatile must be used.

The Wiki say everything about volatile:
volatile (computer programming)
And the Linux kernel's doc also make a excellent notation about volatile:
Why the "volatile" type class should not be used

A volatile can be changed from outside the compiled code (for example, a program may map a volatile variable to a memory mapped register.) The compiler won't apply certain optimizations to code that handles a volatile variable - for example, it won't load it into a register without writing it to memory. This is important when dealing with hardware registers.

As rightly suggested by many here, the volatile keyword's popular use is to skip the optimisation of the volatile variable.
The best advantage that comes to mind, and worth mentioning after reading about volatile is -- to prevent rolling back of the variable in case of a longjmp. A non-local jump.
What does this mean?
It simply means that the last value will be retained after you do stack unwinding, to return to some previous stack frame; typically in case of some erroneous scenario.
Since it'd be out of scope of this question, I am not going into details of setjmp/longjmp here, but it's worth reading about it; and how the volatility feature can be used to retain the last value.

it does not allows compiler to automatic changing values of variables. a volatile variable is for dynamic use.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight