I have two threads that are created using CreateThread(), and I have a global variable that one thread writes to, and the other thread reads from.
Now, based on my understanding, the compiler and/or the CPU can perform all sorts of optimizations, which could mean, for example, that when I write a value to the variable, the value is written to some cache and not directly to memory (and hence the other thread will not be able to see it).
I have read that I can wrap the code that accesses the variable in a critical section, but the documentation says that a critical section only enforces mutual exclusion; it does not say anything about forcing writes to go directly to memory and reads to come directly from memory.
Note that I do not wish to use the volatile keyword; I want to know how this is done in pure WinAPI (as I may use a language other than C at a later time).
MSDN explicitly states that critical sections are memory barriers. https://msdn.microsoft.com/en-us/library/windows/desktop/ms686355(v=vs.85).aspx
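For illustration, a minimal sketch in pure WinAPI (the names g_cs, g_shared_value, and the thread procedures are mine; the critical section must be initialized before the threads start):

#include <windows.h>

static CRITICAL_SECTION g_cs;   /* call InitializeCriticalSection(&g_cs) in main
                                   before any CreateThread() */
static int g_shared_value;

DWORD WINAPI WriterThread(LPVOID param)
{
    (void)param;
    EnterCriticalSection(&g_cs);   /* acts as a memory barrier per MSDN */
    g_shared_value = 42;
    LeaveCriticalSection(&g_cs);   /* publishes the write to other threads */
    return 0;
}

DWORD WINAPI ReaderThread(LPVOID param)
{
    (void)param;
    EnterCriticalSection(&g_cs);
    int v = g_shared_value;        /* sees the writer's value if the writer's
                                      section completed first */
    LeaveCriticalSection(&g_cs);
    return v ? 1 : 0;
}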
Related
I am curious about whether volatile is necessary for the resources used in a critical section. Consider two threads executing on two CPUs, competing for a shared resource. I know I need a locking mechanism to make sure only one thread performs operations on that shared resource. Below is the pseudocode that will be executed on those two threads.
take_lock();
// Read shared resource.
read_shared_resource();
// Write something to shared resource.
write_shared_resource();
release_lock();
I am wondering if I need to make that shared resource volatile to make sure that when one thread is reading the shared resource, it won't just get the value from registers but will actually read from the shared resource. Or maybe I should use accessor functions that access the shared resource with some memory barrier operations, instead of making the shared resource itself volatile?
Making sure that only one thread accesses a shared resource at a time is only part of what an adequate locking mechanism will do. Among other things, such a mechanism will also ensure that all writes to shared objects performed by thread Ti before it releases lock L are visible to all other threads Tj after they subsequently acquire lock L. And it ensures that in terms of the C semantics of the program, notwithstanding any questions of compiler optimization, register usage, CPU instruction reordering, or similar.
When such a locking mechanism is used, volatile does not provide any additional benefit for making threads' writes to shared objects be visible to each other. When such a locking mechanism is not used, volatile does not provide a complete substitute.
C's built-in (since C11) mutexes provide a suitable locking mechanism, at least when using C's built-in threads. So do pthreads mutexes, Sys V and POSIX semaphores, and various other, similar synchronization objects available in various environments, each with respect to corresponding multithreading systems. These semantics are pretty consistent across C-like multithreading implementations, extending at least as far as Java. The semantic requirements for C's built-in multithreading are described in section 5.1.2.4 of the current (C17) language spec.
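As a concrete version of the question's pseudocode, here is a minimal sketch using C11's built-in mutexes (assuming mtx_init(&lock, mtx_plain) was called once during setup):

#include <threads.h>

static mtx_t lock;           /* initialized elsewhere with mtx_init(&lock, mtx_plain) */
static int shared_resource;  /* note: not volatile */

void worker(void)
{
    mtx_lock(&lock);          /* acquire: makes prior holders' writes visible */
    int v = shared_resource;  /* read shared resource */
    shared_resource = v + 1;  /* write something to shared resource */
    mtx_unlock(&lock);        /* release: publishes our write to the next locker */
}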
volatile is for indicating that an object might be accessed outside the scope of the C semantics of the program. That may happen to produce properties that interact with multithreaded execution in a way that is taken to be desirable, but that is not the purpose or intended use of volatile. If it were, or if volatile were sufficient for such purposes, then we would not also need _Atomic objects and operations.
The previous remarks focus on language-level semantics, and that is sufficient to answer the question. However, inasmuch as the question asks specifically about accessing variables' values from registers, I observe that compilers don't actually have to do anything much multithreading-specific in that area as long as acquiring and releasing locks requires calling functions.
In particular, if an execution E of function f writes to an object o that is visible to other functions or other executions of f, then the C implementation must ensure that that write is actually performed on memory before E evaluates any subsequent function call (such as is needed to release a lock). This is necessary because the value written must be visible to the execution of the called function, regardless of any other threads.
Similarly, if E uses the value of o after return from a function call (such as is needed to acquire a lock) then it must load that value from memory to ensure that it sees the effect of any write that the function may have performed.
The only thing special to multithreading in this regard is that the implementation must ensure that interprocedural analysis optimizations or similar do not subvert the needed memory reads and writes around the lock and unlock functions. In practice, this rarely requires special attention.
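To illustrate, using the question's names (take_lock and release_lock here are hypothetical opaque functions defined in another translation unit):

void take_lock(void);      /* opaque: the compiler cannot see their bodies */
void release_lock(void);

int shared;                /* visible to other functions and threads */

void update(void)
{
    take_lock();
    int v = shared;        /* must be loaded from memory here: take_lock()
                              could itself have read or written `shared` */
    shared = v + 1;        /* must be stored to memory before the next call,
                              because release_lock() could legally read it */
    release_lock();
}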
The answer is no; volatile is not necessary (assuming, of course, that the critical-section functions you are using were implemented correctly and that you are using them correctly). Any proper critical-section API's implementation will include the memory barriers necessary to handle flushing registers, etc., and therefore avoids any need for the volatile keyword.
volatile is normally used to inform the compiler that the data might be changed by something else (an interrupt handler, DMA, another CPU, ...), to prevent unexpected optimizations by the compiler.
So in your case you may or may not need it:
If the thread does not poll the shared resource in a loop waiting for its value to change, you do not really need volatile.
If you have a wait such as while (shareVal == 0) in the source code, you need to tell the compiler explicitly with the volatile qualifier, as sketched below.
For the two-CPU case, there is also a possible cache issue, where a CPU reads the value only from its cache memory. Please consider configuring the memory attributes properly for the shared resource.
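A minimal sketch of that second case:

volatile int shareVal;       /* written by another thread, CPU, or interrupt */

void wait_for_change(void)
{
    while (shareVal == 0)    /* volatile forces a fresh read on each iteration;
                                without it the compiler may hoist the load and
                                spin forever on a stale value */
        ;
}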
Online examples of correct use of the volatile keyword look like this:
void Foo (volatile SomethingExternal * x, int data_update)
{
    while (x->busy);        /* spin until the device deasserts its busy flag */
    x->data = data_update;  /* then write to its data register */
}
But it seems that if the data that x points to is genuinely volatile, then a context switch may occur between exiting the while loop and writing to the data. So if it's important that the busy flag is false when we access the data, isn't this code unsafe?
This is not quite true. There are constructs which, by design, are correct when implemented with volatile operations. From the standard as quoted in [this answer]:
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
This gives us guarantees that all volatile data will be read and written as requested, without reordering with respect to the current thread.
As an example of a structure which is correct even with context-switching, the low-level acquisition of a mutex can be implemented using Dekker's Algorithm. This algorithm does not require an atomic compare-and-swap operation, but it does require the use of volatile-qualified memory. Since volatile operations of one thread are not reordered as seen by anyone (including external threads), the algorithm's correctness holds (the proof requires that operations not be reordered). Likewise, because volatile reads always read from actual memory and not from a cached value, the algorithm can make progress when the lock is made available.
It is left as an exercise for the reader to show that this algorithm can be used to construct, for example, a safe locking idiom.
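For reference, a minimal sketch of Dekker's algorithm for two threads (indices 0 and 1), relying only on the volatile properties described above:

volatile int wants_to_enter[2] = { 0, 0 };
volatile int turn = 0;

void lock(int self)
{
    int other = 1 - self;
    wants_to_enter[self] = 1;
    while (wants_to_enter[other]) {
        if (turn != self) {
            wants_to_enter[self] = 0;   /* back off */
            while (turn != self)
                ;                       /* busy-wait until it is our turn */
            wants_to_enter[self] = 1;   /* try again */
        }
    }
}

void unlock(int self)
{
    turn = 1 - self;                    /* hand the turn to the other thread */
    wants_to_enter[self] = 0;
}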
Another example of (safe) use of volatile variables is the code given in your question, when executed on a single-threaded processor without context switches (e.g. a microcontroller with interrupts disabled) with x pointing into the memory mapping of an external device. This assumes that the code is actually correct for the device's intended use (i.e. as soon as busy is deasserted, a single write to the data register will initiate whatever task is required of it).
Volatile reads ensure that your program makes progress when the device is no longer busy (liveness), because the compiler cannot simply coalesce the loop into a single memory read followed by an infinite loop taken if the device was busy.
In the example you link to, the model is of some device that is accessed with volatile objects. There is no other thread or process interacting with the device: Once the device finishes its task and becomes not busy, it remains not busy until you give it a new command. No other thread or process will make it busy; you own the device and have exclusive access. The memory needs to be marked volatile so that the compiler will perform an actual read when the C code checks x->busy and will perform an actual write when the C code writes x->data.
You are correct that a context switch could occur between testing x->busy and writing x->data. This would be a bug if there were another process or thread that were accessing the device. But that is not what this code is for.
When googling about "volatile" and its user-space usage, I found an email exchange between Theodore Ts'o and Linus Torvalds. According to these great masters, use of "volatile" in user space is probably a bug?? Check the discussion here.
They give some explanations, but I really couldn't understand them. Could anyone explain in simple language why they said so? Are we not supposed to use volatile in user space?
volatile tells the compiler that every read and write has an observable side effect; thus, the compiler can't make any assumptions about two reads or two writes in a row having the same effect.
For instance, normally, the following code:
int a = *x;
int b = *x;
if (a == b)
    printf("Hi!\n");
Could be optimized into:
printf("Hi!\n");
What volatile does is tell the compiler that those values might be coming from somewhere outside of the program's control, so it has to actually read those values and perform the comparison.
A lot of people have made the mistake of thinking that they could use volatile to build lock-free data structures, which would allow multiple threads to share values, and they could observe the effects of those values in other threads.
However, volatile says nothing about how different threads interact, and could be applied to values that could be cached with different values on different cores, or could be applied to values that can't be atomically written in a single operation, and so if you try to write multi-threaded or multi-core code using volatile, you can run into a lot of problems.
Instead, you need to either use locks or some other standard concurrency mechanism to communicate between threads, or use memory barriers, or use C11/C++11 atomic types and atomic operations. Locks ensure that an entire region of code has exclusive access to a variable, which works even if the value is too large, too small, or not suitably aligned to be written atomically in a single operation, while memory barriers and the atomic types and operations provide guarantees about how they interact with the CPU to ensure that caches are synchronized and that reads and writes happen in particular orders.
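For instance, a minimal sketch with C11 atomic types (no volatile and no lock needed for a simple shared counter):

#include <stdatomic.h>

atomic_int counter;     /* atomic: safe to access from multiple threads */

void increment(void)
{
    atomic_fetch_add(&counter, 1);   /* atomic read-modify-write */
}

int read_counter(void)
{
    return atomic_load(&counter);    /* atomic read with the needed ordering */
}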
Basically, volatile winds up being useful mostly when you're interfacing with a single hardware register, which can vary outside the program's control but may not require any special atomic operations to access. Or it can be used in signal handlers: because a thread can be interrupted, the handler run, and control then returned within the same thread, you need a volatile value if you want to communicate a flag to the interrupted code.
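The signal-handler case is the one pattern the C standard itself blesses, via volatile sig_atomic_t; a minimal sketch:

#include <signal.h>

static volatile sig_atomic_t got_signal;   /* flag shared with the handler */

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;        /* tell the interrupted code the signal arrived */
}

int main(void)
{
    signal(SIGINT, handler);
    while (!got_signal)    /* volatile: reread the flag on each iteration */
        ;                  /* ... normal work would go here ... */
    return 0;
}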
But if you're doing any kind of synchronization between threads, you should be using locks or other concurrency primitives provided by a standard library, or really know what you're doing with regard to memory ordering and use memory barriers or atomic operations.
When a variable is declared with the volatile keyword, its value may change at any moment from outside the scope of the program. What does that mean? Does it change outside the scope of the main function, or outside the scope of a globally declared function? And what is the perspective in embedded systems, where two or more events can occur simultaneously?
volatile was originally intended for stuff like reading from a memory mapped hardware device; each time you read from something like a memory address mapped to a serial port it might have a new value, even if nothing in your program wrote to it. volatile makes it clear that the data there may change at any time, so it should be reread each time, rather than allowing the compiler to optimize it to a single read when it knows your program never changes it. Similar cases can occur even without hardware interference; asynchronous kernel callbacks may write back into user mode memory in a similar way, so reading the value afresh each time is sometimes necessary.
An optimizing compiler assumes there is only the context of a single thread of execution. Another context means anything the compiler can't see happening at the same time: hardware actions, interrupt handlers, or other threads or processes. Where your code accesses a global (program- or file-level) variable, the optimizer won't assume another context might change or read it unless you tell it to by using the volatile qualifier.
Take the case of a memory-mapped hardware register that you read in a while loop, waiting for it to change. Without volatile, the compiler only sees your while loop reading the register; if you allow it to optimize the code, it will optimize away the repeated reads and never see a change in the register. That is exactly what we normally want an optimizing compiler to do with variables that don't change inside a loop.
A similar thing happens to memory mapped hardware registers you write to. If your program never reads from them the compiler could optimize away the write. Again this is what you want an optimizing compiler to do when you are not dealing with a memory location that is used by hardware or another context.
Interrupt handlers and forked threads are treated the same way as hardware. The optimizer doesn't assume they are running at the same time, so it will not preserve a load or store to a shared memory location unless you use volatile.
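A minimal sketch of the hardware-register cases described above (the addresses and bit mask are hypothetical):

#include <stdint.h>

#define STATUS_REG (*(volatile uint32_t *)0x40000000u)  /* hypothetical address */
#define DATA_REG   (*(volatile uint32_t *)0x40000004u)  /* hypothetical address */
#define READY_BIT  0x1u

void send_word(uint32_t w)
{
    while ((STATUS_REG & READY_BIT) == 0)  /* volatile: reread the register;
                                              otherwise the loop collapses to
                                              a single read */
        ;
    DATA_REG = w;                          /* volatile: the store is kept even
                                              though the program never reads it */
}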
I am programming two processes that communicate by posting messages to each other in a segment of shared memory. Although the messages are not accessed atomically, synchronization is achieved by protecting the messages with shared atomic objects accessed with store-releases and load-acquires.
My problem is about security. The processes do not trust each other. Upon receiving a message, a process makes no assumption about the message being well formed; it first copies the message from shared memory to private memory, then performs some validation on this private copy and, if valid, proceeds to handle this same private copy. Making this private copy is crucial, as it prevents a TOC/TOU attack in which the other process would modify the message between validation and use.
My question is the following: does the standard guarantee that a clever C compiler will never decide that it can read the original instead of the copy? Imagine the following scenario, in which the message is a simple integer:
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
...
handle(private);
}
If the compiler runs out of registers and temporarily needs to spill private, could it decide, instead of spilling it to the stack, that it can simply discard its value and reload it from *pshared later, provided that an alias analysis ensures that this thread has not changed *pshared?
My guess is that such a compiler optimization would not preserve the semantics of the source program, and would therefore be illegal: pshared does not point to an object that is provably reachable from this thread only (such as an object allocated on the stack whose address has not leaked), therefore the compiler cannot rule out that some other thread might concurrently modify *pshared. By contrast, the compiler may eliminate redundant loads, because one of the possible behaviors is that no other thread runs between the redundant loads, so the current thread must be prepared to deal with that particular behavior.
Could anyone confirm or refute this guess, and possibly provide references to the relevant parts of the standard?
(By the way: I assume that the message type has no trap representations, so that loads are always defined.)
UPDATE
Several posters have commented on the need for synchronization, which I did not intend to get into, since I believe that I already have this covered. But since people are pointing that out, it is only fair that I provide more details.
I am implementing a low-level asynchronous communication system between two entities that do not trust each other. I run tests with processes, but will eventually move to virtual machines on top of a hypervisor. I have two basic ingredients at my disposal: shared memory and a notification mechanism (typically, injecting an IRQ into the other virtual machine).
I have implemented a generic circular buffer structure with which the communicating entities can produce messages, then send the aforementioned notifications to let each other know when there is something to consume. Each entity maintains its own private state that tracks what it has produced/consumed, and there is a shared state in shared memory composed of message slots and atomic integers tracking the bounds of the regions holding pending messages. The protocol unambiguously identifies which message slots are to be exclusively accessed by which entity at any time. When it needs to produce a message, an entity writes a message (non atomically) to the appropriate slot, then performs an atomic store-release to the appropriate atomic integer to transfer the ownership of the slot to the other entity, then waits until memory writes have completed, then sends a notification to wake up the other entity. Upon receiving a notification, the other entity is expected to perform an atomic load-acquire on the appropriate atomic integer, determine how many pending messages there are, then consume them.
The load of *pshared in my code snippet is just an example of what consuming a trivial (int) message looks like. In a realistic setting, the message would be a structure. Consuming a message does not need any particular atomicity or synchronization, since, as specified by the protocol, it only happens when the consuming entity has synchronized with the other one and knows that it owns the message slot. As long as both entities follow the protocol, everything works flawlessly.
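To make the protocol concrete, here is a heavily simplified sketch; the layout, names, and slot size are illustrative only, and the real structure tracks the bounds of the pending-message regions as described:

#include <stdatomic.h>
#include <string.h>

#define SLOT_COUNT 16
#define SLOT_SIZE  64

struct shared_ring {
    char slot[SLOT_COUNT][SLOT_SIZE];  /* message slots, written non-atomically */
    _Atomic unsigned head;             /* advanced by the producer */
    _Atomic unsigned tail;             /* advanced by the consumer */
};

/* Producer: fill the slot, then transfer ownership with a store-release. */
void produce(struct shared_ring *r, const char *msg, size_t len)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    memcpy(r->slot[h % SLOT_COUNT], msg, len);
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    /* then notify the other entity (e.g. inject an IRQ) */
}

/* Consumer: the load-acquire pairs with the producer's store-release, then
   the message is copied to private memory before validation and handling. */
int consume(struct shared_ring *r, char *private_buf)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t == h)
        return 0;                              /* nothing pending */
    memcpy(private_buf, r->slot[t % SLOT_COUNT], SLOT_SIZE);
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return 1;   /* caller validates private_buf, never the shared slot */
}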
Now, I do not want the entities to have to trust each other. Their implementation must be robust against a malicious entity that disregards the protocol and writes all over the shared memory segment at any time. If this were to happen, the only thing the malicious entity should be able to achieve is to disrupt the communication. Think of a typical server, which must be prepared to handle ill-formed requests from a malicious client without letting such misbehavior cause buffer overflows or out-of-bounds accesses.
So, while the protocol relies on synchronization for normal operation, the entities must be prepared for the contents of shared memory to change at any time. All I need is a way to make sure that, after an entity makes a private copy of a message, it validates and uses that same copy, and never accesses the original any more.
I have an implementation that copies the message using a volatile read, thus making it clear to the compiler that the shared memory does not have ordinary memory semantics. I believe that this is sufficient; I wonder whether it is necessary.
You should inform the compiler that the shared memory can change at any moment by using the volatile modifier.
volatile int *pshared;
...
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
...
handle(private);
}
As *pshared is declared to be volatile, the compiler can no longer assume that *pshared and private keep same value.
Per your edit, it is now clear that we all agree a volatile modifier on the shared memory is sufficient to guarantee that the compiler will honour the temporality of all accesses to that shared memory.
Anyway, draft N1256 for C99 is explicit about it in 5.1.2.3 Program execution (emphasis mine):
2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

5 The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
That suggests that even if pshared is not qualified as volatile, private must be loaded from *pshared before the evaluation of is_valid, and since the abstract machine has no reason to change it before the evaluation of handle, a conforming implementation should not change it. At most it could remove the call to handle if it contained no side effects, which is unlikely to happen.
Anyway, this is only an academic discussion, because I cannot imagine a real use case where shared memory would not need the volatile modifier. If you do not use it, the compiler is free to believe that the previous value is still valid, so on a second access you would still get the first value. So even if the answer to this question is that it is not necessary, you still have to use volatile int *pshared;.
It's hard to answer your question as posted. Note that you must use a synchronization object to prevent concurrent accesses, unless you are only reading units which are atomic on the platform.
I am assuming that you intend to ask about (pseudocode):
lock_shared_area();
int private = *pshared;
unlock_shared_area();
if (is_valid(private))
and that the other process also uses the same lock. (If not, it would be good to update your question to be a bit more specific about your synchronization).
This code guarantees to read *pshared at most once. Using the name private means to read the variable private, not the object *pshared. The compiler "knows" that the call to unlock the area acts as a memory fence and it won't reorder operations past the fence.
Since C doesn't have any concept of interprocess communication, there is nothing you can do to inform the compiler that there is another process that might be modifying the memory.
Thus, I believe there is no way to prevent a sufficiently clever, malicious, but conforming build system from invoking the "as if" rule to allow it to do the Wrong Thing.
To get something that is 'guaranteed' to work, you need to work with whatever guarantees are given by your specific compiler and/or the shared-memory library you're using.