What do they mean by "scope of the program" in the C language?

When a variable is declared with the volatile keyword, its value may change at any moment from outside the scope of the program. What does that mean? Will it change outside the scope of the main function, or outside the scope of a globally declared function? And what does this mean for embedded systems, where two or more events may occur simultaneously?

volatile was originally intended for things like reading from a memory-mapped hardware device: each time you read from something like a memory address mapped to a serial port, it might have a new value, even if nothing in your program wrote to it. volatile makes it clear that the data there may change at any time, so it must be reread on each access, rather than allowing the compiler to optimize it down to a single read when it knows your program never changes the value. Similar cases can occur even without hardware involvement; asynchronous kernel callbacks may write back into user-mode memory in the same way, so reading the value afresh each time is sometimes necessary.
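For instance, a minimal sketch (the register address and layout here are hypothetical, not taken from any real device):

#include <stdint.h>

/* Hypothetical address of a memory-mapped serial-port status register. */
#define SERIAL_STATUS ((volatile uint8_t *)0x40001000u)

uint8_t read_status(void)
{
    /* The volatile qualifier forces a real load from the device on
       every call; the compiler may not reuse a previously read value. */
    return *SERIAL_STATUS;
}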

An optimizing compiler assumes there is only the context of a single thread of execution. Another context means anything the compiler can't see happening at the same time: hardware actions, interrupt handlers, or other threads or processes. Where your code accesses a global (program- or file-level) variable, the optimizer won't assume another context might change or read it unless you tell it so by using the volatile qualifier.
Take the case of a memory-mapped hardware register that you read in a while loop, waiting for it to change. Without volatile, the compiler sees only your while loop reading the register; if you allow it to optimize the code, it will optimize away the repeated reads and never see a change in the register. That is exactly what we normally want an optimizing compiler to do with variables that don't change inside a loop.
A similar thing happens with memory-mapped hardware registers you write to. If your program never reads them back, the compiler could optimize the writes away. Again, this is what you want an optimizing compiler to do when you are not dealing with a memory location used by hardware or another context.
Interrupt handlers and forked threads are treated the same way as hardware: the optimizer doesn't assume they are running at the same time, so it will happily optimize away a load or store to a shared memory location unless you use volatile.
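As a rough illustration of the write case (again with a hypothetical register address):

#include <stdint.h>

/* Hypothetical address of a memory-mapped LED control register. */
#define LED_CONTROL ((volatile uint8_t *)0x40002000u)

void pulse_led(void)
{
    *LED_CONTROL = 1;   /* without volatile, this store could be removed */
    *LED_CONTROL = 0;   /* as a dead write overwritten by the next one   */
}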

Related

Does the correct use of "volatile" still (always) result in a program with undefined interaction with that data?

Online examples of correct use of the volatile keyword appear to be like so:
void Foo(volatile SomethingExternal *x, int data_update)
{
    while (x->busy)
        ;                      /* spin until the device reports not-busy */
    x->data = data_update;     /* then hand it the new data */
}
But it seems that if the data that x points to is genuinely volatile, then a context switch may occur between exiting the while loop and writing to the data; so if it's important that the busy flag is false when we access the data, isn't this code unsafe?
This is not quite true. There are constructs which, by design, are correct when implemented with volatile operations. From the standard as quoted in [this answer]:
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
This gives us guarantees that all volatile data will be read and written as requested, without reordering with respect to the current thread.
As an example of a structure which is correct even with context-switching, the low-level acquisition of a mutex can be implemented using Dekker's Algorithm. This algorithm does not require an atomic compare-and-swap operation, but it does require the use of volatile-qualified memory. Since volatile operations of one thread are not reordered as seen by anyone (including external threads), the algorithm's correctness holds (the proof requires that operations not be reordered). Likewise, because volatile reads always read from actual memory and not from a cached value, the algorithm can make progress when the lock is made available.
It is left as an exercise for the reader to show that this algorithm can be used to construct, for example, a safe locking idiom.
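A minimal sketch of Dekker's algorithm with volatile-qualified flags, under the model described above (names are illustrative, and on real multi-core hardware additional memory barriers would be needed):

volatile int wants[2] = {0, 0};   /* wants[i]: thread i wants the lock */
volatile int turn = 0;            /* whose turn it is to back off      */

void dekker_lock(int id)          /* id is 0 or 1 */
{
    int other = 1 - id;
    wants[id] = 1;
    while (wants[other]) {
        if (turn != id) {
            wants[id] = 0;        /* back off ...            */
            while (turn != id)
                ;                 /* ... until it's our turn */
            wants[id] = 1;        /* and try again           */
        }
    }
}

void dekker_unlock(int id)
{
    turn = 1 - id;
    wants[id] = 0;
}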
Another example of (safe) use of volatile variables is the code given in your question, when executed on a single-threaded processor without context switches (e.g. a microcontroller with interrupts disabled) with x pointing into the memory mapping of an external device. This assumes that the code is actually correct for the device's intended use (i.e. as soon as busy is deasserted, a single write to the data register will initiate whatever task is required of it).
Volatile reads ensure that your program makes progress when the device is no longer busy (liveness), because the compiler cannot simply coalesce the loop into a single memory read followed by an infinite loop taken if the device was busy.
In the example you link to, the model is of some device that is accessed with volatile objects. There is no other thread or process interacting with the device: Once the device finishes its task and becomes not busy, it remains not busy until you give it a new command. No other thread or process will make it busy; you own the device and have exclusive access. The memory needs to be marked volatile so that the compiler will perform an actual read when the C code checks x->busy and will perform an actual write when the C code writes x->data.
You are correct that a context switch could occur between testing x->busy and writing x->data. This would be a bug if there were another process or thread that were accessing the device. But that is not what this code is for.

Is it true that "volatile" in a userspace program tends to indicate a bug?

While googling about "volatile" and its user-space usage, I found emails between Theodore Ts'o and Linus Torvalds. According to these great masters, use of "volatile" in userspace is probably a bug? Check the discussion here.
They give some explanations, but I really couldn't understand them. Could anyone explain in simple language why they said so? Are we not supposed to use volatile in user space?
volatile tells the compiler that every read and write has an observable side effect; thus, the compiler can't make any assumptions about two reads or two writes in a row having the same effect.
For instance, normally, the following code:
int a = *x;
int b = *x;
if (a == b)
    printf("Hi!\n");
Could be optimized into:
printf("Hi!\n");
What volatile does is tell the compiler that those values might be coming from somewhere outside of the program's control, so it has to actually read those values and perform the comparison.
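With a volatile-qualified pointer, both loads must actually be performed (a minimal sketch):

#include <stdio.h>

void check(volatile int *x)
{
    int a = *x;            /* first real load from memory  */
    int b = *x;            /* second real load; may differ */
    if (a == b)            /* the comparison must now actually happen */
        printf("Hi!\n");
}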
A lot of people have made the mistake of thinking that they could use volatile to build lock-free data structures, which would allow multiple threads to share values, and they could observe the effects of those values in other threads.
However, volatile says nothing about how different threads interact. It can be applied to values that may be cached with different contents on different cores, or to values that cannot be written atomically in a single operation, so if you try to write multi-threaded or multi-core code using volatile, you can run into a lot of problems.
Instead, you need to use locks or some other standard concurrency mechanism to communicate between threads, or use memory barriers, or use C11/C++11 atomic types and atomic operations. Locks ensure that an entire region of code has exclusive access to a variable, which works even when a value is too large, too small, or misaligned to be written atomically in a single operation. Memory barriers and the atomic types and operations provide guarantees about how they interact with the CPU, ensuring that caches are synchronized and that reads and writes happen in particular orders.
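For illustration, a minimal sketch using C11 atomics (threads.h is optional in C11 and absent from some toolchains):

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int counter;   /* zero-initialized at file scope */

int worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(&counter, 1);   /* atomic read-modify-write */
    return 0;
}

int main(void)
{
    thrd_t t1, t2;
    thrd_create(&t1, worker, NULL);
    thrd_create(&t2, worker, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    printf("%d\n", atomic_load(&counter));   /* always 200000 */
    return 0;
}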
Basically, volatile winds up being useful mostly when you're interfacing with a single hardware register, which can vary outside the program's control but may not require any special atomic operations to access. It can also be used in signal handlers: because a thread can be interrupted, the handler run, and control then returned within the same thread, you need a volatile value if you want to communicate a flag to the interrupted code.
But if you're doing any kind of synchronization between threads, you should be using locks or some other concurrency primitives provided by a standard library, or really know what you're doing with regard to memory ordering and use memory barriers or atomic operations.
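The signal-handler case might look like this minimal sketch (using the volatile sig_atomic_t type the standard provides for exactly this purpose):

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;   /* the only thing the handler does */
}

int main(void)
{
    signal(SIGINT, handler);
    while (!got_signal) {
        /* do work; the volatile read forces a fresh load each iteration */
    }
    puts("caught SIGINT, exiting");
    return 0;
}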

How to allow two threads to share a global variable in WinAPI?

I have two threads that are created using CreateThread(), and I have a global variable that one thread writes to, and the other thread reads from.
Now based on my understanding, the compiler and/or the CPU can do all sorts of optimizations, which could mean for example that when I write a value to the variable, the value can be written in some cache and not written directly to memory (and hence the other thread will not be able to see it).
I have read that I can wrap the code that accesses the variable in a critical section, but the documentation says that a critical section only enforces mutual exclusion; it does not say anything about forcing writes to go directly to memory and reads to come directly from memory.
Note that I do not wish to use the volatile keyword; I want to know how this is done in pure WinAPI (as I might use a language other than C at a later time).
MSDN explicitly states that critical sections are memory barriers. https://msdn.microsoft.com/en-us/library/windows/desktop/ms686355(v=vs.85).aspx
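For illustration, a minimal sketch of sharing a variable through a critical section (the thread bodies are simplified placeholders):

#include <windows.h>
#include <stdio.h>

CRITICAL_SECTION g_cs;
int g_shared = 0;   /* protected by g_cs */

DWORD WINAPI writer(LPVOID arg)
{
    (void)arg;
    EnterCriticalSection(&g_cs);
    g_shared = 42;                 /* published by the implicit barrier */
    LeaveCriticalSection(&g_cs);
    return 0;
}

DWORD WINAPI reader(LPVOID arg)
{
    (void)arg;
    EnterCriticalSection(&g_cs);
    printf("%d\n", g_shared);      /* sees whatever write was last committed */
    LeaveCriticalSection(&g_cs);
    return 0;
}

int main(void)
{
    HANDLE h[2];
    InitializeCriticalSection(&g_cs);
    h[0] = CreateThread(NULL, 0, writer, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, reader, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);
    CloseHandle(h[0]);
    CloseHandle(h[1]);
    DeleteCriticalSection(&g_cs);
    return 0;
}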

C volatile, and issues with hardware caching

I've read similar answers on this site, and elsewhere, but am still confused in a few circumstances.
I'm aware of what the standard actually guarantees us, I understand the intended use of the keyword, and I'm well aware of the difference between compiler caching and L1/L2/etc. caching; it's more for curiosity's sake that I want to understand the other cases.
Say I have a variable declared volatile in C. Four scenarios:
Signal handlers, single-threaded (as intended): This is the problem the keyword was meant to solve. My process gets a signal callback from the OS, and I modify some volatile variable outside the normal execution of my process. Since it was declared volatile, the normal process won't keep this value in a CPU register and will always do a load from memory. When the signal handler writes to the volatile variable, since the handler shares the same address space as the normal process, the main process is guaranteed to load the correct, updated value even if the variable was previously cached in hardware (i.e. L1, L2). Perfect, everyone is happy.
DMA transfers, single-threaded: Say the volatile variable is mapped to a region of memory to which a DMA write is taking place. As before, the compiler won't keep the volatile variable in a CPU register and will always do a load from memory; however, if that variable exists in the hardware cache, then the load request will never reach main memory. If the DMA controller updates main memory behind our backs, we'll never get the up-to-date value. In a preemptive OS, we are saved by the fact that eventually we'll probably be context-switched out, and the next time our process resumes, the cache will be cold and we'll actually have to reload from main memory - so we'll get the correct behavior... eventually (our own process could potentially evict that cache line too - but again, we might waste valuable cycles before that happens). Is there standardized HW or OS support that notifies the hardware caches when main memory is updated via the DMA controller? Or do we have to explicitly flush the cache to guarantee we aren't reading a stale value? (Is this even possible in the architectures listed?)
Memory-mapped registers, single-threaded: Same as #2, except the volatile variable is mapped to a memory-mapped register (or an explicit IO port). I would imagine this is a more difficult problem than #2, since at least the DMA controller will signal the CPU when it's done transferring, which gives the OS or HW a chance to do something.
Multithreaded: If I have a volatile variable, is there any guarantee of cache coherency between multiple threads running on separate physical cores? Sure, the compiler is still issuing load instructions from memory, but if the value is cached in one core's cache, is there any guarantee the same value exists in the other cores' caches? (I would imagine it's not an issue at all for hyperthreaded threads on different logical cores of the same physical core, since they share physical cache memory.) My overwhelming intuition says no, but I thought I'd list the case here anyway.
If possible, differentiate between x64 and ARMv6/7/8 architectures, and kernel vs user land solutions.
For #2 and #3: no, there's no standardized way this would work.
Normally when doing DMA transfers one would flush the cache in a platform-dependent manner. There are usually quite straightforward instructions for doing that (since nowadays the caches are integrated into the CPU).
When accessing memory-mapped registers on the other hand, often the behavior is dependent on the order of writes. For example, suppose you have a UART port and write characters to it — you'll need to make sure that there is an actual write to the port each time you write to it from C.
While it might work to flush the cache between each write, that's not what one normally does. The normal way (for ARM at least) is to set up the MMU so that writes to certain regions of the address space happen uncached and in the correct sequence.
This approach can also be used for memory used for DMA transfers; one could for example set up dedicated regions for use as DMA buffers and set up the MMU so that reads and writes to that region happen uncached.
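To make the UART example concrete, a minimal sketch (the register address is hypothetical; a real driver would also poll a status register before each write):

#include <stdint.h>

/* Hypothetical address of a UART data register mapped uncached by the MMU. */
#define UART_DR ((volatile uint32_t *)0x4000C000u)

void uart_puts(const char *s)
{
    while (*s)
        *UART_DR = (uint32_t)*s++;   /* volatile: each character really reaches the port */
}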
On the other hand, the language guarantees that all memory (well, what you get from declaring variables or allocating memory) will behave in certain ways. It should make no difference whether multiple threads or signals are involved. Note that the C90 and C99 standards don't mention threads (C11 does), but implementations are supposed to work this way: they have to make sure that the CPUs and caches are used in a manner consistent with these guarantees (as a consequence, an OS might not be able to schedule different threads on different cores if that consistency can't be achieved). Consequently, you should not need to flush caches in order to share data between threads, but you do need to synchronize threads and, of course, use volatile-qualified data. The same is true for signal handlers, even if the implementation happens to schedule them on a different core.

Usage of Volatile in case of Memory mapped Devices?

The following link says that "Access to device registers is always uncached":
http://techpubs.sgi.com/library/dynaweb_docs/hdwr/SGI_Developer/books/DevDrvrO2_PG/sgi_html/ch01.html
My question is: do we ever need volatile when accessing device registers which are memory-mapped?
The confusion here comes from two mechanisms which have similarities in their goals, but quite distinct mechanisms and levels of implementation.
The link refers to memory mapped I/O regions being configured as ineligible for hardware caching in fast intermediate memory that is used to speed operations compared to accessing slower main memory banks. This is traditionally nearly transparent to software (exceptions being things like modifying code on a machine with distinct instruction and data caches).
In contrast, volatile is used to prohibit an optimizing compiler from performing "software" caching of values by strategically holding them in registers, delaying calculating them until needed, or perhaps never calculating them if un-needed. The basic effect is to inform the compiler that the value may be produced or consumed by a mechanism invisible to its analysis - be that either hardware beyond the present processor core, or a distinct thread or context of execution.
This question is a more processor-specific version of Why is volatile needed in C?
This is one of the two situations where volatile is mandatory (and it would be nice if compilers could know that).
Any memory location which can change either without your code initiating it (i.e. a memory-mapped device register) or without your thread initiating it (i.e. it is changed by another thread or by an interrupt handler) absolutely must be declared volatile to prevent the compiler optimizing away memory-fetch operations.
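For instance, the interrupt-handler case might look like this minimal sketch (timer_isr is a hypothetical handler; how it gets registered is platform-specific):

volatile unsigned char data_ready = 0;   /* written by the ISR, read by main */

void timer_isr(void)
{
    data_ready = 1;
}

int main(void)
{
    for (;;) {
        if (data_ready) {   /* without volatile, this load could be hoisted out of the loop */
            data_ready = 0;
            /* handle the event */
        }
    }
}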
