ensure reading a struct as a whole - c

I have a shared struct variable between 2 threads:
typedef struct {
    long a;
    long b;
    long c;
} myStruct;
myStruct A;
All 3 fields of A are initialized to zero. Then the 1st thread will update them:
A.a = 1;
A.b = 2;
A.c = 3;
And the 2nd thread will read from it. What I want to ensure is that the 2nd thread reads A as a whole: either the old value {0, 0, 0} or the new value {1, 2, 3}, not some corrupted mix like {1, 2, 0}.
The struct doesn't fit in 64 bits, so I cannot use gcc's builtin atomics, and I don't want to use a mutex either, so I came up with 2 guarding flags:
struct {
long a;
long b;
long c;
volatile int beginCount, endCount;
} A;
then the 1st thread will:
A.beginCount++;
A.a = 1;
A.b = 2;
A.c = 3;
A.endCount++;
and the 2nd will loop until it gets a consistent struct:
int begin, end;
myStruct tmp;
do {
begin = A.beginCount;
end = A.endCount;
tmp = A;
} while (!(begin == A.beginCount && end == A.endCount && A.beginCount == A.endCount));
// now tmp will be either {0,0,0} or {1,2,3}
Are those 2 guarding flags enough? If not, please point out the specific combination of thread scheduling that could break it.
Edit 1: the reason I don't want to use a mutex is that the 1st thread has high priority and should not wait for anything. If the 1st thread wants to write while the 2nd is reading, the 1st thread still writes anyway, and the 2nd thread has to redo the read until it gets a consistent value. We can't do that with a mutex, at least not in any way I'm aware of.
Edit 2: about the environment: this code runs on a multiprocessor system, and I dedicate 1 entire CPU core to each thread.
Edit 3: I know that synchronization without a mutex or atomics is very tricky. I've listed all the combinations I could think of and could not find one that breaks the code. So please don't just tell me that it won't work; I would really appreciate it if you pointed out when it will break.

I don't want to use mutex either
On a uniprocessor system, if the first thread gets preempted while writing, the reading thread will spend its time slice spinning needlessly. You do want a mutex in such a case.
Neither Linux futexes nor Windows critical sections context-switch in the uncontended case, and on multiprocessor systems they spin for a while before yielding.
Why reimplement the exact same mechanism?

There is absolutely no portable way to do what you want. Some very high-end systems have transactional memory that may be able to accomplish what you want, but the normal pattern for using transactional memory anyway is to write your code with locks and rely on the lock implementation to use transactions.
Simply use a mutex to protect both reads and writes. There is no other way to make your code correct, but lots of ways to make it "seem correct to testing" until it violates an invariant and crashes a few months later or gets run on a slightly different environment/cpu and starts crashing every time you run it.

My first advice is that you really should implement it using a mutex (be sure each thread holds the mutex for as little time as possible) and see if you actually run into any problems. Most likely you will find out that using a mutex works just fine, and that nothing more is required. Doing it that way has the advantage of being portable to any hardware, simple to understand, and easy to debug.
That said, if you insist on not using a mutex, then the only other option is to use atomic variables. Since atomic variables are word-sized, you won't be able to make an entire struct atomic, but you can fake it by instantiating an array of structs instead (the necessary size of the array will depend on how often you intend to update the struct) and then using atomic integers as indices into the "currently valid for reading" and "okay to write into for writing" structs in the array. Reading the current value of the struct out of the array is simple enough -- you just need to read from the "currently valid for reading" index in the array, which is guaranteed not to be written to -- but writing a new value is more elaborate; you need to atomically increment the "okay to write into for writing" index (and wrap it around if necessary to avoid indexing off the end of the array, and check for an overflow condition if, after doing this, the okay-for-writing index equals the read-from index). Then write your new struct into the slot specified by the okay-for-writing index. Then you've got to do an atomic compare-and-set operation to set the read-from index equal to the okay-for-writing index; if the compare-and-set operation fails, you need to restart the whole operation because another thread beat you to the update. Repeat the whole set() process again until the compare-and-set operation succeeds.
(If this all sounds dubious and error-prone, that's because it is. It can be implemented correctly, but it's very easy to instead implement it almost-correctly, and end up with code that works 99.9999% of the time and then does something regrettable and non-reproducible the other 0.0001% of the time. Consider yourself warned :))
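For the single-writer case in the question, a simpler relative of this scheme is a sequence counter (often called a "seqlock"), which is roughly the question's two counters collapsed into one and accessed through C11 atomics. The following is only a minimal sketch under stated assumptions: exactly one writer thread, a C11 compiler with <stdatomic.h>, and hypothetical names (Triple, triple_write, triple_read) that do not come from the original posts. The writer never waits; the reader retries until it sees the same even sequence number before and after its copy.

```c
#include <stdatomic.h>

typedef struct { long a, b, c; } Triple;   /* hypothetical name for the payload */

/* Shared state. seq is even while the data is stable and odd while a write
   is in progress. Static atomics start out zero-initialized. */
static atomic_uint seq;
static atomic_long shared_a, shared_b, shared_c;

/* Writer side. Assumption: exactly ONE thread ever calls this.
   It never blocks and never retries. */
void triple_write(Triple v)
{
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);
    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);  /* odd: write in progress */
    atomic_thread_fence(memory_order_release);                 /* mark is ordered before the data stores */
    atomic_store_explicit(&shared_a, v.a, memory_order_relaxed);
    atomic_store_explicit(&shared_b, v.b, memory_order_relaxed);
    atomic_store_explicit(&shared_c, v.c, memory_order_relaxed);
    atomic_store_explicit(&seq, s + 2, memory_order_release);  /* even again: publish */
}

/* Reader side. Copies the fields, then retries if a write was in progress
   or completed while the copy was being made. */
Triple triple_read(void)
{
    Triple v;
    unsigned before, after;
    do {
        before = atomic_load_explicit(&seq, memory_order_acquire);
        v.a = atomic_load_explicit(&shared_a, memory_order_relaxed);
        v.b = atomic_load_explicit(&shared_b, memory_order_relaxed);
        v.c = atomic_load_explicit(&shared_c, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);             /* data loads are ordered before the re-check */
        after = atomic_load_explicit(&seq, memory_order_relaxed);
    } while ((before & 1u) || before != after);
    return v;
}
```

The fields are atomic_long rather than plain long so that a reader overlapping a write is not a data race in the C11 sense; the torn copy is simply discarded by the retry loop. With more than one writer this sketch is not sufficient, and a mutex on the writer side (or the mutex-only solution above) is the safer choice.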


How Do I Enforce Write-only Behavior to a Page in Windows?

I'm reading the documentation for the Win32 VirtualAlloc function, and in the protections listed, there is no PAGE_WRITEONLY option that would pair with the PAGE_READONLY option, naturally. Is there any way to obtain such support by the operating system, or will I have to implement this in software somehow, or can I use processor features that may be available for implementing such things in hardware from user code? A software implementation is undesirable for obvious performance reasons.
Now this also introduces an obvious problem: the memory cannot be read, effectively making the writes an expensive NOP sequence, so the question is whether or not I can make a page have different protections from different contexts so that from one context, the page is write-only, but from another context, the page is read-only.
Security is only one small consideration, but in principle, it is for the sake of ensuring consistency of implementation with design which has security as a benefit (no unwanted reading of what should only be written from one context and vice versa). If you only need to write to something (which is obvious in the case of e.g. the output of a procedure, a hardware send buffer [or software model thereof in RAM], etc.), then it is worthwhile to ensure it is only written, and if you only need to read something, then it is worthwhile to ensure it is only read.
Reading your comments, I think you are looking for a lock system where only one thread can write to or read from the memory at a time. Is that correct?
You may be looking for the cmpxchg instruction, which Windows exposes through the functions InterlockedCompareExchange, InterlockedCompareExchange64 and InterlockedCompareExchange128. It compares two 32/64/128-bit values and copies a new value to the location if they are equal. You can compare it to the following C code:
if (a == b)
    a = c;
The difference between this C example and the cmpxchg instruction is that cmpxchg is a single instruction, while the C example consists of multiple instructions. This means cmpxchg cannot be interrupted, whereas the C example can. If the C example is interrupted after the 'if' statement and before the 'set' instruction, another thread can get CPU time and change variable 'a'. This cannot happen with cmpxchg.
This still causes problems if the system has multiple cores. To fix this, the lock prefix is used, which synchronizes the operation across all CPUs. The Windows API functions mentioned above already use it, so you don't need to worry about this.
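For readers not on Windows, the same compare-and-swap idea can be sketched with C11's <stdatomic.h>. This is a portable counterpart to the Interlocked functions, not the Win32 API itself, and swap_if_equal is a hypothetical helper name chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: store `desired` into *target only if *target currently
   equals `expected`. The comparison and the store happen as one indivisible
   step. Returns true when the store happened. */
bool swap_if_equal(atomic_long *target, long expected, long desired)
{
    /* On failure the call overwrites `expected` with the value it actually
       found; here that only touches our local copy. */
    return atomic_compare_exchange_strong(target, &expected, desired);
}
```

Called as swap_if_equal(&a, b, c), it plays the role of the two-line "if (a == b) a = c;" fragment, except that no other thread can change a between the test and the assignment.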
For every piece of memory you want to lock, you create an integer. You use the InterlockedCompareExchange to set this variable to '1', but only if it equals '0'. If the function returns that it didn't equal '0', you wait by calling sleep, and retry until it does. Every thread needs to set this variable to '0' when it's done using it.
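#include <windows.h>
#include <stdio.h>
LONG volatile lock;
DWORD WINAPI newThread(LPVOID param);
int main()
{
    //init the lock
    lock = (LONG)0;
    for (int i = 0; i < 100; i++)
        CreateThread(0, 0, newThread, (LPVOID)(INT_PTR)i, 0, 0);
    getchar(); // added: keep the process alive while the threads run
    return 0;
}
DWORD WINAPI newThread(LPVOID param)
{
    int var = (int)(INT_PTR)param;
    //Request lock: set it to 1, but only if it is currently 0
    while (InterlockedCompareExchange(&lock, 1, 0) != 0)
        Sleep(1); // lock is taken, wait and retry
    printf("Thread %lx (%d) got the lock, waiting %d ms before releasing the lock.\n", GetCurrentThreadId(), var, var * 100);
    //Do whatever you want to do
    Sleep(var * 100);
    printf("Lock released.\n");
    InterlockedExchange(&lock, 0); // release the lock
    return 0;
}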

Self-written Mutex for 2+ Threads

I have written the following code, and so far in all my tests it seems as if I have written a working Mutex for my 4 Threads, but I would like to get someone else's opinion on the validity of my solution.
typedef struct Mutex{
    int turn;
    int * waiting;
    int num_processes;
} Mutex;
void enterLock(Mutex * lock, int id){
    int i;
    for(i = 0; i < lock->num_processes; i++){
        lock->waiting[id] = 1;
        if (i != id && lock->waiting[i])
            i = -1;
        lock->waiting[id] = 0;
    }
    printf("ID %d Entered\n",id);
}
void leaveLock(Mutex * lock, int id){
    printf("ID %d Left\n",id);
    lock->waiting[id] = 0;
}
void foo(Mutex * lock, int id){
    enterLock(lock, id);
    // do stuff now that i have access
    leaveLock(lock, id);
}
I feel compelled to write an answer here because the question is a good one, and it could help others understand the general problem with mutual exclusion. In your case, you went a long way to hide this problem, but you can't avoid it. It boils down to this:
01 /* pseudo-code */
02 if (! mutex.isLocked())
03 mutex.lock();
You always have to expect a thread switch between lines 02 and 03. So it is possible for two threads to both find the mutex unlocked and then be interrupted, only to resume later and each lock the mutex individually. You will then have two threads in the critical section at the same time.
What you definitely need to implement reliable mutual exclusion is therefore an atomic operation that tests a condition and at the same time sets a value without any chance to be interrupted meanwhile.
01 /* pseudo-code */
02 while (! test_and_lock(mutex));
As long as this test_and_lock function cannot be interrupted, your implementation is safe. Until C11, C didn't provide anything like this, so implementations of pthreads needed to use e.g. assembly or special compiler intrinsics. With C11, there is finally a "standard" way to write atomic operations like this, but I can't give an example here, because I don't have experience doing that. For general use, the pthreads library will give you what you need.
edit: of course, this is still simplified -- in a multi-processor scenario, you need to ensure that even memory accesses are mutually exclusive.
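As an illustration of the C11 route mentioned above, here is a minimal sketch of test_and_lock built on atomic_flag from <stdatomic.h>. The names SpinLock, spin_try_lock, spin_lock and spin_unlock are hypothetical, and the sketch busy-waits without any back-off, so it is meant to show the mechanism rather than to replace a pthreads mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag locked; } SpinLock;   /* hypothetical type name */

/* One attempt, equivalent to test_and_lock in the pseudo-code: atomically set
   the flag and report whether it was clear before. Returns true if we now
   own the lock. */
bool spin_try_lock(SpinLock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until an attempt succeeds. */
void spin_lock(SpinLock *l)
{
    while (!spin_try_lock(l)) {
        /* busy-wait */
    }
}

/* Clear the flag; the release ordering makes writes done inside the critical
   section visible to the next thread that acquires the lock. */
void spin_unlock(SpinLock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock object would be declared as SpinLock l = { ATOMIC_FLAG_INIT }; and the critical section placed between spin_lock(&l) and spin_unlock(&l). The acquire and release orderings also cover the multi-processor concern raised in the edit above.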
The problem I see in your code:
The idea behind a mutex is to provide mutual exclusion, meaning that when thread_a is in the critical section, thread_b must wait for thread_a if it also wants to enter.
This waiting should be implemented in the enterLock function. But what you have is a for loop that might end well before thread_a has left the critical section, so thread_b could also enter. Hence you don't have mutual exclusion.
Way to fix it:
Take a look, for example, at Peterson's algorithm or Dekker's (more complicated). What they do is called busy waiting, which is basically a while loop that says:
while(i can't enter) { do nothing and wait...}
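A minimal sketch of Peterson's algorithm for exactly two threads (ids 0 and 1) could look like the following. It assumes a C11 compiler and uses the default, sequentially consistent atomic operations, because with plain int variables the compiler and CPU are free to reorder the accesses and break the algorithm. The names wants, turn, peterson_lock and peterson_unlock are chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Peterson's algorithm for exactly two threads with ids 0 and 1. */
static atomic_bool wants[2];   /* wants[i] is true while thread i wants the lock or holds it */
static atomic_int  turn;       /* id of the thread that is allowed to go first on a tie */

void peterson_lock(int id)
{
    int other = 1 - id;
    atomic_store(&wants[id], true);   /* announce interest */
    atomic_store(&turn, other);       /* let the other thread go first on a tie */
    while (atomic_load(&wants[other]) && atomic_load(&turn) == other) {
        /* busy-wait: the other thread is interested and has priority */
    }
}

void peterson_unlock(int id)
{
    atomic_store(&wants[id], false);  /* leave the critical section */
}
```

Unlike the for loop in the question, the interest flag stays set for the whole critical section and is only cleared in peterson_unlock, which is what lets the other thread see that the lock is taken.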
You are totally ignoring the topic of memory models. Unless you are on a machine with a sequentially consistent memory model (which none of today's PC CPUs are), your code is incorrect, as a store executed by one thread is not necessarily immediately visible to other CPUs. However, exactly this seems to be an assumption in your code.
Bottom line: use the existing synchronization primitives provided by the OS or a runtime library, such as the POSIX or Win32 API, and don't try to be smart and implement this yourself. Unless you have years of experience in parallel programming as well as in-depth knowledge of CPU architecture, chances are quite good that you will end up with an incorrect implementation. And debugging parallel programs can be hell...
After enterLock() returns, the state of the Mutex object is the same as before the function was called. Hence it will not prevent a second thread from entering the same Mutex object before the first one has released it by calling leaveLock(). There is no mutual exclusion.

Race condition and mutex

I have 2 questions regarding threads: one is about race conditions and the other is about mutexes.
So the first question:
I've read about race conditions on the Wikipedia page:
And in the example of a race condition between 2 threads, this is shown:
So far I believed that threads work in parallel to each other, but judging from this picture it seems that I misunderstood how the computer carries out actions.
In this picture only 1 action is done at a time, and although the threads get switched from time to time and the other thread gets to do some actions, this is still 1 action at a time done by the computer. Is it really like this? Is there no "real" parallel computing, just 1 action at a time at a very fast rate, which gives the illusion of parallel computing?
This leads me to my second question, about mutexes.
I've read that if threads read/write the same memory we need some sort of synchronization mechanism. I've read that the normal data types won't do and we need a mutex.
Let's take for example the following code :
#include <stdio.h>
#include <stdbool.h>
#include <windows.h>
#include <process.h>
bool lock = false;
void increment(void*);
void decrement(void*);
int main()
{
    int n = 5;
    HANDLE hIncrement = (HANDLE)_beginthread(increment, 0, (void*)&n);
    HANDLE hDecrement = (HANDLE)_beginthread(decrement, 0, (void*)&n);
    WaitForSingleObject(hIncrement, 1000 * 500);
    WaitForSingleObject(hDecrement, 1000 * 500);
    return 0;
}
void increment(void *p)
{
    int *n = p;
    for(int i = 0; i < 10; i++)
    {
        while (lock)
            ; // spin until the lock looks free
        lock = true;
        (*n)++;
        lock = false;
    }
}
void decrement(void *p)
{
    int *n = p;
    for(int i = 0; i < 10; i++)
    {
        while (lock)
            ; // spin until the lock looks free
        lock = true;
        (*n)--;
        lock = false;
    }
}
In my example here, I use bool lock as my synchronization mechanism to avoid a race condition between the 2 threads over the memory pointed to by n.
What I did here obviously won't work: although I avoided a race condition over the memory pointed to by n, a new race condition over the bool lock variable itself may occur.
Let's consider the following sequence of events (A = increment thread, B = decrement thread) :
A gets out of the while loop since lock is false
A gets to set lock to true
B waits in the while loop because lock is set to true
A increments the value pointed to by n
A sets lock to false
A gets to the while loop
A gets out of the while loop since lock is false
B gets out of the while loop since lock is false
A sets lock to true
B sets lock to true
and from here we get unexpected behavior from 2 unsynchronized threads, because the bool lock is not race-condition proof.
OK, so far this is my understanding, and the solution to the problem above is that we need a mutex.
I'm fine with that: a data type that is magically race-condition proof.
I just don't understand why this can't happen with a mutex type when it can with every other type. That is my problem: I want to understand why a mutex works and how.
About your first question: whether there are actually several different threads running at once, or whether it is just implemented as fast switching, is a matter of your hardware. Typical PCs these days have several cores (often with more than one hardware thread each), so you have to assume that things actually DO happen at the same time.
But even if you have only a single-core system, things are not quite so easy. This is because the compiler is usually allowed to re-order instructions in order to optimize code. It can also e.g. choose to cache a variable in a CPU register instead of loading it from memory every time you access it, and it also doesn't have to write it back to memory every time you write to that variable. The compiler is allowed to do that as long as the result is the same AS IF it had run your original code in its original order - as long as nobody else is looking closely at what's actually going on, such as a different thread.
And once you actually do have different cores, consider that they all have their own CPU registers and even their own cache. Even if a thread on one core wrote to a certain variable, as long as that core doesn't write its cache back to the shared memory a different core won't see that change.
In short, you have to be very careful in making any assumptions about what happens when two threads access variables at the same time, especially in C/C++. The interactions can be so surprising that I'd say, to stay on the safe side, you should make sure that there are no race conditions in your code, e.g. by always using mutexes for accessing memory that is shared between threads.
Which is where we can neatly segue into the second question: What's so special about mutexes, and how can they work if all basic data types are not thread-safe?
The thing about mutexes is that they are implemented with a lot of knowledge about the system for which they are being used (hardware and operating system), and with either the direct help or a deep knowledge of the compiler itself.
The C language does not give you direct access to all the capabilities of your hardware and operating system, because platforms can be very different from each other. Instead, C focuses on providing a level of abstraction that allows you to compile the same code for many different platforms. The different "basic" data types are just something that the C standard came up with as a set of data types which can in some way be supported on almost any platform - but the actual hardware that your program will be compiled for is usually not limited to those types and operations.
In other words, not everything that you can do with your PC can be expressed in terms of C's ints, bytes, assignments, arithmetic operators and so on. For example, PCs often calculate with 80-bit floating point types which are usually not mapped directly to a C floating point type at all. More to the point of our topic, there are also CPU instructions that influence how multiple CPU cores will work together. Additionally, if you know the CPU, you often know a few things about the behaviour of the basic types that the C standard doesn't guarantee (for example, whether loads and stores to 32-bit integers are atomic). With that extra knowledge, it becomes possible to implement mutexes for that particular platform, and it will often require code that is e.g. written directly in assembly language, because the necessary features are not available in plain C.
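Since C11, some of these hardware features are reachable from standard C through <stdatomic.h>. As a rough sketch of what changes compared with the bool lock in the question: the "check that it is false" and "set it to true" steps are merged into a single atomic_exchange, so two threads can never both see false. The names lock_taken, acquire and release are hypothetical, and the function signatures are simplified to take int * instead of void *; a real mutex additionally puts waiting threads to sleep instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool lock_taken;   /* replaces the plain "bool lock" from the question */

static void acquire(void)
{
    /* atomic_exchange writes true and returns the previous value as one
       indivisible step. Only the thread that gets "false" back owns the lock;
       everyone else keeps spinning. */
    while (atomic_exchange_explicit(&lock_taken, true, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void release(void)
{
    atomic_store_explicit(&lock_taken, false, memory_order_release);
}

void increment(int *n)
{
    for (int i = 0; i < 10; i++) {
        acquire();
        (*n)++;          /* protected by the lock */
        release();
    }
}

void decrement(int *n)
{
    for (int i = 0; i < 10; i++) {
        acquire();
        (*n)--;          /* protected by the lock */
        release();
    }
}
```

The acquire and release memory orders also tell the compiler and the CPU not to move the access to *n outside the locked region, which addresses the reordering and caching concerns described above.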

