Is there any synchronization problem in this program - c

Consider the following program.
(i) What would be the output in Line A and Line B? Justify your answer.
(ii) Do you think there is a synchronization problem in updating the variable value?
Justify your answer.
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>

int value = 100;
void *thread_prog(void *param);

int main(int argc, char *argv[])
{
    pthread_t tid;
    pthread_create(&tid, NULL, thread_prog, NULL);
    pthread_join(tid, NULL);
    value = value + 100;
    printf("Parent value = %d\n", value); // Line A
}

void *thread_prog(void *param)
{
    value = value + 100;
    printf("Child value = %d\n", value); // Line B
    pthread_exit(0);
}
The output will be 200 at Line B and 300 at Line A: the child thread runs first and increments value to 200, and the parent then increments it to 300.
I don't think there is a synchronization problem, because the pthread_join(tid, NULL) call ensures the parent does not touch value until the child has terminated.

In the posted code, the control flow is obvious, so there are no issues with before-after ordering relationships. But proper synchronization of multithreaded code requires more than just establishing proper before-after relationships.
There are two additional aspects that need to be addressed to ensure this code has no synchronization issues.
Are the changes made in the child thread visible in the main thread?
Do the semantics of the C abstract machine preclude a C compiler from assuming that the contents of the variable value do not change while the main thread is running?
This answer only addresses the first concern.
It's not sufficient to merely establish a guaranteed before-after relationship in multithreaded code to ensure a change to a variable is seen in its entirety by another thread. The Wikipedia entry on "memory barrier" provides a good explanation:
A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
In other words, a change made to a variable running on, for example, CPU 1 may not be "seen" by another thread running on, say, CPU 7 even though the code runs on CPU 7 after the code that changed the variable ran on CPU 1.
There needs to be some sort of platform-specific implementation of a guarantee that such changes are propagated through the actual hardware and visible.
And POSIX's threading model specifies those exact guarantees.
Per POSIX 4.12 Memory Synchronization:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:
...
pthread_create()
pthread_join()
...
The use of pthread_create() and pthread_join() not only establishes the before-after ordering relationship needed for proper synchronization, but per the POSIX standard they also establish the visibility requirements.
So yes, the posted code is properly synchronized in both the guaranteed before-after ordering and the visibility aspects.
Note, though, this answer does not address the question of whether or not the posted code is properly synchronized per the semantics of the C abstract machine. I'll defer to experts with better understanding of the C standard and the interpretation of its abstract machine in that regard.

You assume correctly that there is no synchronization problem: pthread_join() suspends the calling thread until the target thread terminates. If the target thread has already terminated, it returns immediately. In either case, the target thread has executed and terminated before Line A is printed.
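To make the role of pthread_join() concrete, here is a hypothetical variation of the posted program (not the original) in which the join is moved after the parent's update. Under POSIX this version has a data race on value, so its output is undefined:

#include <pthread.h>
#include <stdio.h>

int value = 100;

void *thread_prog(void *param)
{
    value = value + 100;                 /* may run concurrently with main */
    printf("Child value = %d\n", value);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, thread_prog, NULL);
    value = value + 100;                 /* races with the child's update */
    pthread_join(tid, NULL);             /* too late to order the two writes */
    printf("Parent value = %d\n", value);
    return 0;
}

In the posted program, by contrast, pthread_join() separates the two updates, so each thread's access to value is both ordered and visible.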

Related

multithreading: which variables need mutex protection when communication via a condition variable?

I have a question on the interplay between condition variables and associated mutex locks (it arose from a simplified example I was presenting in a lecture, confusing myself in the process). Two threads are exchanging data (let's say an int n indicating an array size, and a double *d which points to the array) via shared variables in the memory of their process. I use an additional int flag (initially flag = 0) to indicate when the data (n and d) is ready (flag = 1), a pthread_mutex_t mtx, and a condition variable pthread_cond_t cnd.
This part is from the receiver thread which waits until flag becomes 1 under the protection of the mutex lock, but afterward processes n and d without protection:
while (1) {
    pthread_mutex_lock(&mtx);
    while (!flag) {
        pthread_cond_wait(&cnd, &mtx);
    }
    pthread_mutex_unlock(&mtx);
    // use n and d
}
This part is from the sender thread which sets n and d beforehand without protection by the mutex lock, but sets flag while the mutex is locked:
n = 10;
d = malloc(n * sizeof(double)); // d is a double *, so sizeof(double), not sizeof(float)
pthread_mutex_lock(&mtx);
flag = 1;
pthread_cond_signal(&cnd);
pthread_mutex_unlock(&mtx);
It is clear that you need the mutex in the sender since otherwise you have a "lost wakeup call" problem (see https://stackoverflow.com/a/4544494/3852630).
My question is different: I'm not sure which variables have to be set (in the sender thread) or read out (in the receiver thread) inside the region protected by the mutex lock, and which variables don't need to be protected by the mutex lock. Is it sufficient to protect flag on both sides, or do n and d also need protection?
Memory visibility (see the rule below) between sender and receiver should be guaranteed by the call to pthread_cond_signal(), so the necessary pairwise memory barriers should be there (in combination with pthread_cond_wait()).
I'm aware that this is an unusual case. Typically my applications modify a task list in the sender and pop tasks from the list in the receiver, and the associated mutex lock protects the list operations on both sides. However, I'm not sure what would be necessary in the case above. Could the danger be that the compiler (which is not aware of the concurrent access to the variables) optimizes away the write and/or read access to the variables? Are there other problems if n and d are not protected by the mutex?
Thanks for your help!
David R. Butenhof: Programming with POSIX Threads, p. 89: "Whatever memory values a thread can see when it signals ... a condition variable can also be seen by any thread that is awakened by that signal ...".
At the low level of memory ordering, the rules aren't specified in terms of "the region protected by a mutex", but in terms of synchronization. Whenever two different threads access the same non-atomic object, then unless they are both reads, there must be a synchronization operation in between, to ensure that one of the accesses definitely happens before the other.
The way to achieve synchronization is to have one thread perform a release operation (such as unlocking a mutex) after accessing the shared variables, and have the other thread perform an acquire operation (such as locking a mutex) before accessing them, in such a way that program logic guarantees the acquire must have happened after the release.
And here, you have that. The sender thread does perform a mutex unlock after accessing n and d (last line of the sender code). And the receiver does perform a mutex lock before accessing them (inside pthread_cond_wait). The setting and testing of flag ensures that, when we exit the while (!flag) loop, the most recent lock of the mutex by the receiver did happen after the unlock by the sender. So synchronization is achieved.
The compiler and CPU must not perform any optimization that would defeat this, so in particular they can't optimize away the accesses to n and d, or reorder them around the synchronizing operations. This is usually ensured by treating the release/acquire operations as barriers. Any accesses to potentially shared objects that occur in program order before a release barrier must actually be performed and flushed to coherent memory/cache prior to anything that comes after the barrier (in this case, before any other thread may see the mutex as unlocked). If special CPU instructions are needed to ensure global visibility, the compiler must emit them. And vice versa for acquire barriers: any access that occurs in program order after an acquire barrier must not be reordered before it.
To say it another way, the compiler treats the release barrier as an operation that may potentially read all of memory; so all variables must be written out before that point, so that the actual contents of memory at that point will match what an abstract machine would have. Likewise, an acquire barrier is treated as an operation that may potentially write all of memory, and all variables must be reloaded from memory afterwards. The only exception would be local variables for which the compiler can prove that no other thread could legally know their address; those can be safely kept in registers or otherwise reordered.
It's true that, after the synchronizing lock operation, the receiver happened to unlock the mutex again, but that isn't relevant here. That unlock doesn't synchronize with anything in this particular program, and it has no impact on its execution. Likewise, for purposes of synchronizing access to n and d, it didn't matter whether the sender locked the mutex before or after accessing them. (Though it was important that the sender locked the mutex before writing to flag; that's how we ensure that any earlier reads of flag by the receiver really did happen before the write, instead of racing with it.)
The principle that "accesses to shared variables should be inside a critical region protected by a mutex" is just a higher-level abstraction that is one way to ensure that accesses by different threads always have a synchronizing unlock-lock pair in between them. And in cases where the variables could be accessed over and over again, you normally would want a lock before, and an unlock after, every such access, which is equivalent to the "critical section" principle. But this principle is not itself the fundamental rule.
That said, in real life you probably do want to follow this principle as much as possible, since it will make it easier to write correct code and avoid subtle bugs, and likewise make it easier for other programmers to verify that your code is correct. So even though it is not strictly necessary in this program for the accesses to n,d to be "protected" by the mutex, it would probably be wise to do so anyway, unless there is a significant and measurable benefit (performance or otherwise) to be gained by avoiding it.
The condition variable doesn't play a role in the race avoidance here, except insofar as the pthread_cond_wait locked the mutex. Functionally, it is equivalent to having the receiver simply do a tight spin loop of "lock mutex; test flag; unlock mutex", but without wasting all those CPU cycles.
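For illustration, here is a sketch of that functionally equivalent spin loop, reusing the question's flag and mtx (correctly synchronized, but wasteful compared to the condition-variable version):

for (;;) {
    pthread_mutex_lock(&mtx);
    int ready = flag;              /* read the flag under the mutex */
    pthread_mutex_unlock(&mtx);
    if (ready)
        break;                     /* the sender has published n and d */
    /* burns CPU here; pthread_cond_wait() sleeps instead */
}
/* use n and d */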
And I think that the quote from Butenhof is mistaken, or at best misleading. My understanding is that pthread_cond_signal by itself is not guaranteed to be a barrier of any kind, and in fact has no memory ordering effect whatsoever. POSIX doesn't directly address memory ordering, but this is the case for the standard C equivalent cnd_signal. There would be no point in having pthread_cond_signal ensure global visibility unless you could make use of it by assuming that all those accesses were visible by the time the corresponding pthread_cond_wait returns. But since pthread_cond_wait can wake up spuriously, you cannot ever safely make such an assumption.
In this program, the necessary release barrier in the sender comes not from pthread_cond_signal, but rather from the subsequent pthread_mutex_unlock.
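To make that concrete, here is a hypothetical reordering of the sender that would still synchronize n and d correctly, because the release operation is the unlock (signaling outside the mutex is permitted, though it can make wakeup scheduling less predictable):

n = 10;
d = malloc(n * sizeof *d);
pthread_mutex_lock(&mtx);
flag = 1;
pthread_mutex_unlock(&mtx);    /* the release barrier happens here */
pthread_cond_signal(&cnd);     /* no memory-ordering role of its own */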
Is it sufficient to protect flag on both sides, or do n and d also need protection?
In principle, if you use your mutex, CV, and flag in such a way that the writer does not modify n, d, or *d after setting flag under mutex protection, and the reader cannot access n, d, or *d until after it observes the modification of flag (under protection of the same mutex), then you can rely on the reader observing the writer's last-written values of n, d, and *d. This is more or less a hand-rolled semaphore.
In practice, you should use whichever system synchronization objects you have chosen (mutexes, semaphores, etc.) to protect all the shared data. Doing so is easier to reason about and less prone to spawn bugs. Often, it's simpler, too.
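For completeness, a sketch of that recommended variant, with all the shared data read and written inside the critical section (same declarations as in the question):

Sender:
    pthread_mutex_lock(&mtx);
    n = 10;
    d = malloc(n * sizeof *d);
    flag = 1;
    pthread_cond_signal(&cnd);
    pthread_mutex_unlock(&mtx);

Receiver:
    pthread_mutex_lock(&mtx);
    while (!flag) {
        pthread_cond_wait(&cnd, &mtx);
    }
    int local_n = n;           /* copy the shared data out under the mutex */
    double *local_d = d;
    pthread_mutex_unlock(&mtx);
    /* use local_n and local_d */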

How to use sched_yield() properly?

For an assignment, I need to use sched_yield() to synchronize threads. I understand a mutex lock/conditional variables would be much more effective, but I am not allowed to use those.
The only functions we are allowed to use are sched_yield(), pthread_create(), and pthread_join(). We cannot use mutexes, locks, semaphores, or any type of shared variable.
I know sched_yield() is supposed to make the calling thread relinquish the CPU so another thread can run. So it should move the calling thread to the back of the run queue.
The code below is supposed to print 'abc' in order and then the newline after all three threads have executed. I put sched_yield() in loops in functions b() and c() because it wasn't working as I expected, but I'm pretty sure all that does is delay the printing because the loops run so many times, not because sched_yield() is working.
The server it needs to run on has 16 CPUs. I saw somewhere that sched_yield() may immediately assign the thread to a new CPU.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sched.h>

void* a(void*);
void* b(void*);
void* c(void*);

int main( void ){
    pthread_t a_id, b_id, c_id;
    pthread_create(&a_id, NULL, a, NULL);
    pthread_create(&b_id, NULL, b, NULL);
    pthread_create(&c_id, NULL, c, NULL);
    pthread_join(a_id, NULL);
    pthread_join(b_id, NULL);
    pthread_join(c_id, NULL);
    printf("\n");
    return 0;
}

void* a(void* ret){
    printf("a");
    return ret;
}

void* b(void* ret){
    for(int i = 0; i < 10; i++){
        sched_yield();
    }
    printf("b");
    return ret;
}

void* c(void* ret){
    for(int i = 0; i < 100; i++){
        sched_yield();
    }
    printf("c");
    return ret;
}
There are 4 cases:
a) the scheduler doesn't use multiplexing (e.g. doesn't use "round robin" but uses "highest priority thread that can run does run", or "earliest deadline first", or ...) and sched_yield() does nothing.
b) the scheduler does use multiplexing in theory, but you have more CPUs than threads so the multiplexing doesn't actually happen, and sched_yield() does nothing. Note: With 16 CPUs and 2 threads, this is likely what you'd get for "default scheduling policy" on an OS like Linux - the sched_yield() just does a "Hrm, no other thread I could use this CPU for, so I guess the calling thread can keep using the same CPU!".
c) the scheduler does use multiplexing and there's more threads than CPUs, but to improve performance (avoid task switches) the scheduler designer decided that sched_yield() does nothing.
d) sched_yield() does cause a task switch (yielding the CPU to some other task), but that is not enough to do any kind of synchronization on its own (e.g. you'd need an atomic variable or something for the actual synchronization - maybe like "while (atomic_variable_not_set_by_other_thread) { sched_yield(); }"). Note that with an atomic variable (introduced in C11) it'd work without sched_yield() - the sched_yield() (if it does anything) merely makes busy waiting less awful/wasteful.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
That would be because sched_yield() is not well suited to the task. As I wrote in comments, sched_yield() is about scheduling, not synchronization. There is a relationship between the two, in the sense that synchronization events affect which threads are eligible to run, but that goes in the wrong direction for your needs.
You are probably looking at the problem from the wrong end. You need each of your threads to wait to execute until it is their turn, and for them to do that, they need some mechanism to convey information among them about whose turn it is. There are several alternatives for that, but if "only sched_yield()" is taken to mean that no library functions other than sched_yield() may be used for that purpose then a shared variable seems the expected choice. The starting point should therefore be how you could use a shared variable to make the threads take turns in the appropriate order.
Flawed starting point
Here is a naive approach that might spring immediately to mind:
/* FLAWED */
void *b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        // nothing?
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That is, the thread executes a busy loop, monitoring the shared variable to await it taking a value signifying that the thread should proceed. When it has done its work, the thread modifies the variable to indicate that the next thread may proceed. But there are several problems with that, among them:
1. Supposing that at least one other thread writes to the object designated by *whose_turn, the program contains a data race, and therefore its behavior is undefined. As a practical matter, a thread that once entered the body of the loop in that function might loop infinitely, notwithstanding any action by other threads.
2. Without making additional assumptions about thread scheduling, such as a fairness policy, it is not safe to assume that the thread that will make the needed modification to the shared variable will be scheduled in bounded time.
3. While a thread is executing the loop in that function, it prevents any other thread from executing on the same core, yet it cannot make progress until some other thread takes action. To the extent that we can assume preemptive thread scheduling, this is an efficiency issue and contributory to (2). However, if we assume neither preemptive thread scheduling nor the threads being scheduled each on a separate core then this is an invitation to deadlock.
Possible improvements
The conventional and most appropriate way to do that in a pthreads program is with the use of a mutex and condition variable. Properly implemented, that resolves the data race (issue 1) and it ensures that other threads get a chance to run (issue 3). If that leaves no other threads eligible to run besides the one that will modify the shared variable then it also addresses issue 2, to the extent that the scheduler is assumed to grant any CPU to the process at all.
But you are forbidden to do that, so what else is available? Well, you could make the shared variable _Atomic. That would resolve the data race, and in practice it would likely be sufficient for the wanted thread ordering. In principle, however, it does not resolve issue 3, and as a practical matter, it does not use sched_yield(). Also, all that busy-looping is wasteful.
But wait! You have a clue in that you are told to use sched_yield(). What could that do for you? Suppose you insert a call to sched_yield() in the body of the busy loop:
/* (A bit) better */
void* b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        sched_yield();
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That resolves issues 2 and 3, explicitly affording the possibility for other threads to run and putting the calling thread at the tail of the scheduler's thread list. Formally, it does not resolve issue 1 because sched_yield() has no documented effect on memory ordering, but in practice, I don't think it can be implemented without a (full) memory barrier. If you are allowed to use atomic objects then combining an atomic shared variable with sched_yield() would tick all three boxes. Even then, however, there would still be a bunch of wasteful busy-looping.
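If atomics were allowed, the combined version might look like the following sketch, where whose_turn now points to an _Atomic char shared through the thread argument (an assumption; the assignment may forbid this too):

#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

void *b(void *data)
{
    _Atomic char *whose_turn = data;    /* hypothetical shared turn marker */
    while (atomic_load(whose_turn) != 'b') {
        sched_yield();                  /* let other threads run meanwhile */
    }
    printf("b");
    atomic_store(whose_turn, 'c');      /* pass the turn to thread c */
    return NULL;
}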
Final remarks
Note well that pthread_join() is a synchronization function; thus, as I understand the task, you may not use it to ensure that the main thread's output is printed last.
Note also that I have not spoken to how the main() function would need to be modified to support the approach I have suggested. Changes would be needed for that, and they are left as an exercise.

Does pthread_mutex_lock contain a memory fence instruction? [duplicate]

This question already has answers here:
Does guarding a variable with a pthread mutex guarantee it's also not cached? (3 answers)
Do the pthread_mutex_lock and pthread_mutex_unlock functions call memory fence/barrier instructions? Or do the lower-level instructions like compare_and_swap implicitly have memory barriers?
Do pthread_mutex_lock and pthread_mutex_unlock functions call memory fence/barrier instructions?
They do, as well as thread creation.
Note, however, there are two types of memory barriers: compiler and hardware.
Compiler barriers only prevent the compiler from reordering reads and writes and speculating variable values, but don't prevent the CPU from reordering.
The hardware barriers prevent the CPU from reordering reads and writes. Full memory fence is usually the slowest instruction, most of the time you only need operations with acquire and release semantics (to implement spinlocks and mutexes).
With multi-threading you need both barriers most of the time.
Any function whose definition is not available in this translation unit (and is not intrinsic) is a compiler memory barrier. pthread_mutex_lock, pthread_mutex_unlock, pthread_create also issue a hardware memory barrier to prevent the CPU from reordering reads and writes.
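To illustrate what acquire/release semantics buy you, here is a portable C11 sketch; this is an analogy for what a mutex implementation does internally, not actual pthreads source:

#include <stdatomic.h>

int payload;                   /* ordinary shared data */
atomic_int ready;              /* guard variable, initially 0 */

void producer(void)
{
    payload = 42;              /* plain write */
    /* release: makes the write to payload visible to an acquire reader */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

void consumer(void)
{
    /* acquire: once ready reads as 1, the write to payload is visible */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                      /* spin until the producer publishes */
    /* payload is guaranteed to be 42 here */
}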
From Programming with POSIX Threads by David R. Butenhof:
Pthreads provides a few basic rules about memory visibility. You can count on all implementations of the standard to follow these rules:
Whatever memory values a thread can see when it calls pthread_create can also be seen by the new thread when it starts. Any data written to memory after the call to pthread_create may not necessarily be seen by the new thread, even if the write occurs before the thread starts.
Whatever memory values a thread can see when it unlocks a mutex, either directly or by waiting on a condition variable, can also be seen by any thread that later locks the same mutex. Again, data written after the mutex is unlocked may not necessarily be seen by the thread that locks the mutex, even if the write occurs before the lock.
Whatever memory values a thread can see when it terminates, either by cancellation, returning from its start function, or by calling pthread_exit, can also be seen by the thread that joins with the terminated thread by calling pthread_join. And, of course, data written after the thread terminates may not necessarily be seen by the thread that joins, even if the write occurs before the join.
Whatever memory values a thread can see when it signals or broadcasts a condition variable can also be seen by any thread that is awakened by that signal or broadcast. And, one more time, data written after the signal or broadcast may not necessarily be seen by the thread that wakes up, even if the write occurs before it awakens.
Also see C++ and Beyond 2012: Herb Sutter - atomic<> Weapons for more details.
Please take a look at section 4.12 of the POSIX specification.
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.
Then a list of functions is given which synchronize memory, plus a few additional notes.
If that requires memory barrier instructions on some architecture, then those must be used.
About compare_and_swap: that isn't in POSIX; check the documentation for whatever you are using. For instance, IBM defines a compare_and_swap function for AIX 5.3 which doesn't have full memory barrier semantics. The documentation notes:
If compare_and_swap is used as a locking primitive, insert an isync at the start of any critical sections.
From this documentation we can guess that IBM's compare_and_swap has release semantics, since the documentation does not require a barrier at the end of the critical section: the acquiring processor needs to issue an isync to make sure it is not reading stale data, but the publishing processor doesn't have to do anything.
At the instruction level, some processors have compare and swap with certain synchronizing guarantees, and some don't.
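The pattern the AIX note describes can be sketched with portable C11 atomics standing in for the machine primitives: a CAS without acquire semantics needs an explicit acquire fence at the start of the critical section, which is the role isync plays on POWER (a sketch, not IBM's actual API):

#include <stdatomic.h>

atomic_int lock;                        /* 0 = free, 1 = held */

void acquire_lock(void)
{
    int expected = 0;
    /* relaxed CAS: by itself it orders nothing, like the AIX primitive */
    while (!atomic_compare_exchange_weak_explicit(
               &lock, &expected, 1,
               memory_order_relaxed, memory_order_relaxed)) {
        expected = 0;                   /* CAS overwrites expected on failure */
    }
    /* the "isync": keeps critical-section reads from moving above the CAS */
    atomic_thread_fence(memory_order_acquire);
}

void release_lock(void)
{
    /* release: critical-section writes become visible before the unlock */
    atomic_store_explicit(&lock, 0, memory_order_release);
}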

pthread_create(3) and memory synchronization guarantee in SMP architectures

I am looking at section 4.11 of The Open Group Base Specifications Issue 7 (IEEE Std 1003.1, 2013 Edition), which spells out the memory synchronization rules. This is the most specific text by the POSIX standard I have managed to come by for detailing the POSIX/C memory model.
Here's a quote
4.11 Memory Synchronization

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:

fork()
pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_timedwait()
sem_trywait()
sem_wait()
semctl()
semop()
wait()
waitpid()
(exceptions to the requirement omitted).
Basically, paraphrasing the above document, the rule is that when applications read or modify a memory location while another thread or process may modify it, they should make sure to synchronize the thread execution and memory with respect to other threads by calling one of the listed functions. Among them, pthread_create(3) is mentioned to provide that memory synchronization.
I understand that this basically means there needs to be some sort of memory barrier implied by each of the functions (although the standard seems not to use that concept). So for example, on returning from pthread_create(), we are guaranteed that the memory modifications made by that thread before the call appear to other threads (possibly running on a different CPU/core) after they also synchronize memory. But what about the newly created thread - is there an implied memory barrier before the thread starts running the thread function so that it unfailingly sees the memory modifications synchronized by pthread_create()? Is this specified by the standard? Or should we provide memory synchronization explicitly to be able to trust correctness of any data we read according to the POSIX standard?
Special case (which would as a special case answer the above question): does a context switch provide memory synchronization, that is, when the execution of a process or thread is started or resumed, is the memory synchronized with respect to any memory synchronization by other threads of execution?
Example:
Thread #1 creates a constant object allocated from the heap. Thread #1 creates a new thread #2 that reads the data from the object. If we can assume the new thread #2 starts with memory synchronized, then everything is fine. However, if the CPU core running the new thread has a copy of previously allocated but since discarded data in its cache memory instead of the new value, then it might have a wrong view of the state and the application may function incorrectly.
More concretely...
Previously in the program (this is the value in CPU #0's cache memory):
    int i = 0;
Thread T0 running on CPU #0:
    pthread_mutex_lock(...);
    int tmp = i;
    pthread_mutex_unlock(...);
Thread T1 running on CPU #1:
    i = 42;
    pthread_create(...);
Newly created thread T2 running on CPU #0:
    printf("i=%d\n", i); /* First step in the thread function */
Without a memory barrier - without synchronizing thread T2's memory - it could happen that the output would be
    i=0
(the previously cached, unsynchronized value).
Update:
A lot of applications using the POSIX thread library would not be thread-safe if this implementation craziness were allowed.
is there an implied memory barrier before the thread starts running the thread function so that it unfailingly sees the memory modifications synchronized by pthread_create()?
Yes. Otherwise there would be no point in pthread_create acting as memory synchronization (a barrier).
(This is, AFAIK, not explicitly stated by POSIX, nor does POSIX define a standard memory model, so you'll have to decide whether you trust your implementation to do the only sane thing it possibly could - ensure synchronization before the new thread is run. I would not worry particularly about it.)
Special case (which would as a special case answer the above question): does a context switch provide memory synchronization, that is, when the execution of a process or thread is started or resumed, is the memory synchronized with respect to any memory synchronization by other threads of execution?
No, a context switch does not act as a barrier.
Thread #1 creates a constant object allocated from heap. Thread #1 creates a new thread #2 that reads the data from the object. If we can assume the new thread #2 starts with memory synchronized then everything is fine. However, if the CPU core running the new thread has copy of previously allocated but since discarded data in its cache memory instead of the new value, then it might have wrong view of the state and the application may function incorrectly.
Since pthread_create must perform memory synchronization, this cannot happen. Any old memory that resides in a CPU cache on another core must be invalidated. (Luckily, the commonly used platforms are cache-coherent, so the hardware takes care of that.)
Now, if you change your object after you've created your second thread, you need memory synchronization again so that all parties can see the changes and avoid race conditions. pthread mutexes are commonly used to achieve that.
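A minimal sketch of that mutex pattern, with illustrative names (not from the question):

#include <pthread.h>

pthread_mutex_t obj_mtx = PTHREAD_MUTEX_INITIALIZER;
int obj_state;                 /* shared object, mutable after creation */

void writer_update(int v)      /* e.g. called by thread #1 */
{
    pthread_mutex_lock(&obj_mtx);
    obj_state = v;
    pthread_mutex_unlock(&obj_mtx);   /* publishes the change */
}

int reader_fetch(void)         /* e.g. called by thread #2 */
{
    pthread_mutex_lock(&obj_mtx);     /* synchronizes with the unlock */
    int v = obj_state;
    pthread_mutex_unlock(&obj_mtx);
    return v;
}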
Cache-coherent architectures guarantee, from the architectural design point of view, that even separate CPUs (ccNUMA - cache-coherent Non-Uniform Memory Architecture) with independent memory channels will not incur the incoherency you are describing in the example when accessing a memory location.
This comes with an important penalty, but the application will function correctly.
Thread #1 runs on CPU0 and holds the object memory in its L1 cache. When thread #2 on CPU1 reads the same memory address (or, more exactly, the same cache line - look up false sharing for more info), it incurs a cache miss, and the up-to-date cache line is obtained from CPU0 before being loaded.
You've turned the guarantee pthread_create provides into an incoherent one. The only thing the pthread_create function could possibly do is establish a "happens before" relationship between the thread that calls it and the newly-created thread.
There is no way it could establish such a relationship with existing threads. Consider two threads, one calls pthread_create, the other accesses a shared variable. What guarantee could you possibly have? "If the thread called pthread_create first, then the other thread is guaranteed to see the latest value of the variable". But that "If" renders the guarantee meaningless and useless.
Creating thread:
    i = 1;
    pthread_create (...)
Created thread:
    if (i == 1)
        ...
Now, this is a coherent guarantee -- the created thread must see i as 1 since that "happened before" the thread was created. Our code made it possible for the standard to enforce a logical "happens before" relationship, and the standard did so to assure us that our code works as we expect.
Now, let's try to do that with an unrelated thread:
Creating thread:
    i = 1;
    pthread_create (...)
Unrelated thread:
    if (i == 1)
        ...
What guarantee could we possibly have, even if the standard wanted to provide one? With no synchronization between the threads, we haven't tried to make a logical happens-before relationship. So the standard can't honor it -- there's nothing to honor. There is no particular behavior that is "right", so there is no way the standard can promise us the right behavior.
The same applies to the other functions. For example, the guarantee for pthread_mutex_lock means that a thread that acquires a mutex sees all changes made by, or seen by, any threads that have unlocked the mutex. We logically expect our thread to get the mutex "after" any threads that got the mutex "before", and the standard promises to honor that expectation so our code works.

POSIX threads and global variables in C on Linux

If I have two threads and one global variable (one thread constantly loops to read the variable; the other constantly loops to write to it), could anything happen that shouldn't (e.g. exceptions, errors)? If so, what is a way to prevent it? I was reading about mutex locks, and that they allow exclusive access to a variable for one thread. Does this mean that only that thread can read and write to it and no other?
Would anything happen that shouldn't?
It depends in part on the type of the variable. If the variable is, say, a string (a long array of characters), then if the writer and the reader access it at the same time, it is completely undefined what the reader will see.
This is why mutexes and other coordinating mechanisms are provided by pthreads.
Does this mean that only that thread can read and write to it and no other?
Mutexes ensure that at most one thread that is using the mutex can have permission to proceed. All other threads using the same mutex will be held up until the first thread releases the mutex. Therefore, if the code is written properly, at any time, only one thread will be able to access the variable. If the code is not written properly, then:
one thread might access the variable without checking that it has permission to do so
one thread might acquire the mutex and never release it
one thread might destroy the mutex without notifying the other
None of these is desirable behaviour, but the mere existence of a mutex does not prevent any of these happening.
Nevertheless, your code could reasonably use a mutex carefully and then the access to the global variable would be properly controlled. While it has permission via the mutex, either thread could modify the variable, or just read the variable. Either will be safe from interference by the other thread.
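For example, a minimal sketch of that careful use, with illustrative names:

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;
int g_value;                   /* the shared global */

void *writer(void *arg)
{
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&g_mtx);
        g_value = i;           /* exclusive access while the mutex is held */
        pthread_mutex_unlock(&g_mtx);
    }
    return NULL;
}

void *reader(void *arg)
{
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&g_mtx);
        int v = g_value;       /* a consistent snapshot */
        pthread_mutex_unlock(&g_mtx);
        printf("%d\n", v);
    }
    return NULL;
}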
Does this mean that only that thread can read and write to it and no other?
It means that only one thread can read or write to the global variable at a time.
The two threads will not race amongst themselves to access the global variable, nor will they access it at the same time at any given point.
In short, access to the global variable is synchronized.
First: in C/C++, an unsynchronized read/write of a variable does not generate any exceptions or system errors, BUT it can generate application-level errors -- mostly because you are unlikely to fully understand how the memory is accessed, and whether the access is atomic, unless you look at the generated assembler. A multi-core CPU is likely to create hard-to-debug race conditions when you access shared memory without synchronization.
Hence
Second: you should always use synchronization -- such as mutex locks -- when dealing with shared memory. A mutex lock is cheap, so it will not really impact performance if done right. Rule of thumb: hold the lock for as short a time as possible, such as just for the duration of reading/incrementing/writing the shared memory.
However, from your description it sounds like one of your threads does nothing BUT wait for the shared memory to change state before doing something -- that is a bad multi-threaded design which costs unnecessary CPU cycles, so
Third: look at using semaphores (sem_init/sem_wait/sem_post) for synchronization between your threads if you are trying to send a "message" from one thread to the other.
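A sketch of that semaphore approach, with illustrative names and error checking omitted:

#include <semaphore.h>

sem_t data_ready;              /* initialize once: sem_init(&data_ready, 0, 0) */
int shared_value;

void *sender(void *arg)
{
    shared_value = 42;         /* write the data first */
    sem_post(&data_ready);     /* wake the waiting thread */
    return NULL;
}

void *receiver(void *arg)
{
    sem_wait(&data_ready);     /* sleeps instead of burning CPU */
    /* safe to read: sem_post()/sem_wait() synchronize memory per POSIX */
    return NULL;
}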
As others have already said, when communicating between threads through "normal" objects you have to take care of race conditions. Besides mutexes and other lock structures, which are relatively heavyweight, the new C standard (C11) provides atomic types and operations that are guaranteed to be race-free. Most modern processors provide instructions for such types, and many modern compilers (in particular gcc on Linux) already provide the proper interfaces for such operations.
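A minimal C11 sketch of that approach: a single _Atomic global can be written by one thread and read by another without a mutex and without a data race:

#include <stdatomic.h>

_Atomic int counter;           /* the shared global */

void writer_step(void)
{
    atomic_fetch_add(&counter, 1);   /* race-free read-modify-write */
}

int reader_step(void)
{
    return atomic_load(&counter);    /* race-free read */
}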
If the threads truly are only one producer and only one consumer, then (barring compiler bugs):
1) marking the variable as volatile, and
2) making sure that it is correctly aligned, so as to avoid interleaved fetches and stores,
will allow you to do this without locking.
