I have a bug in a multi-process program. The program receives input and instantly produces output; no network is involved, and it has no time references.
What makes the cause of this bug hard to track down is that it only happens sometimes.
If I run it over and over, it produces both correct and incorrect output, with no discernible order or pattern.
What can cause such non-deterministic behavior? Are there tools that can help? There is a possibility that uninitialized variables are in play. How do I find those?
EDIT: Problem solved, thanks to everyone who suggested
Race Condition.
I hadn't thought of it, mainly because I was sure my design prevented this. The problem was that I used 'wait' instead of 'waitpid', so sometimes, when some process was lucky enough to finish before the one I was expecting, the correct order of things went wild.
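For anyone who hits the same thing, a minimal sketch of the difference, assuming POSIX (the two children here are just placeholders):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int status;
    pid_t slow = fork();
    if (slow == 0) { sleep(2); _exit(0); }   /* child that finishes late */
    pid_t fast = fork();
    if (fast == 0) { _exit(0); }             /* child that finishes first */

    /* wait(&status) would return whichever child exits first - here 'fast' -
       even if you were expecting 'slow'. waitpid() names the child: */
    waitpid(slow, &status, 0);               /* blocks until 'slow' exits */
    waitpid(fast, &status, 0);
    return 0;
}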
You say it's a "multi-process" program - could you be more specific? It may very well be a race condition in how you're handling the multiple processes.
If you could tell us more about how the processes interact, we might be able to come up with some possibilities. Note that although Artem's suggestion of using a debugger is fine in and of itself, you need to be aware that introducing a debugger may very well change the situation completely - particularly when it comes to race conditions. Personally I'm a fan of logging a lot, but even that can change the timing subtly.
The scheduler!
Basically, when you have multiple processes, they can run in any bizarre order they want. If those processes share a resource that they both read and write (whether a file, memory, or an I/O device of some sort), operations are going to get interleaved in all sorts of weird orders. As a simple example, suppose you have two threads (threads, so they share memory) and both are trying to increment a global variable, x:
y = x + 1;
x = y;
Now run those processes, but with the code interleaved like this:
Assume x = 1
P1:
y = x + 1
So now in P1, the local stack variable y is 2. Then the scheduler comes in and starts P2:
P2:
y = x + 1
x = y
x was still 1 coming into this, so 1 has been added to it and now x = 2
Then P1 finishes
P1:
x = y
and x is still 2! We incremented x twice but it only went up by one. And because we can't know when this will happen, it's referred to as non-deterministic behavior.
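Here is that lost update as a runnable pthread sketch (a hypothetical toy program; compile with -pthread and run it a few times - the final value usually comes out below 2000000):

#include <pthread.h>
#include <stdio.h>

long x = 0;                      /* shared, unsynchronized */

void *incr(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        long y = x + 1;          /* read x */
        x = y;                   /* write x - another thread may have written in between */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, incr, NULL);
    pthread_create(&b, NULL, incr, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("x = %ld (expected 2000000)\n", x);
    return 0;
}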
The good news is, you've stumbled upon one of the hardest problems in systems programming, as well as the primary battle cry of many of the functional-language folks.
You're most likely looking at a race condition, i.e. an unpredictable and therefore hard to reproduce and debug interaction between improperly synchronized threads or processes.
The non-determinism in this case stems from process/thread and memory access scheduling. This is unpredictable because it is influenced by a large number of external factors, including network traffic and user input which constantly cause interrupts and lead to different actual sequences of execution in the program's threads each time it's run.
It could be a lot of things: memory leaks, critical-section access, unclosed resources, unclosed connections, etc. There is really only one tool that can help you - a debugger. Alternatively, examine your algorithm to find the bug, or, if you manage to pinpoint the problematic part, paste a snippet here and we will try to help you.
Start with the basics... make sure that all your variables have a default value and that all dynamic memory is zeroed before you use it (i.e. use calloc rather than malloc). There should be a compiler option to flag uninitialized variables (unless you're using some obscure compiler).
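For instance (a sketch; n is a placeholder count). With gcc, -Wuninitialized / -Wmaybe-uninitialized catch some cases at compile time, and Valgrind's memcheck flags reads of uninitialized memory at run time:

#include <stdlib.h>

void example(size_t n) {
    int *a = malloc(n * sizeof *a);    /* contents are indeterminate */
    int *b = calloc(n, sizeof *b);     /* contents are guaranteed all-zero */
    free(a);
    free(b);
}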
If this is C++ (I know it's supposed to be a 'c' forum), there are times where object creation and initialization lag behind variable assignment, and that can bite you. For example, if you have a scope that is used concurrently by multiple threads (as in a singleton or a global var), this can cause issues:
if (!foo)
    foo = new Foo();
If multiple threads access the above, the first thread finds foo == null, starts the object creation and assignment, and then yields. Another thread comes in, finds foo != null, skips the section, and starts to use a foo that may not be fully constructed yet.
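In C (since this is a C forum), the conventional fix is to hand the one-time construction to the library, e.g. pthread_once - a sketch, with Foo and its initialization as placeholders:

#include <pthread.h>
#include <stdlib.h>

struct Foo { int x; };                        /* placeholder type */

static struct Foo *foo;
static pthread_once_t foo_once = PTHREAD_ONCE_INIT;

static void init_foo(void) {
    foo = calloc(1, sizeof *foo);             /* runs exactly once */
}

struct Foo *get_foo(void) {
    pthread_once(&foo_once, init_foo);        /* later callers wait until init has finished */
    return foo;                               /* always fully initialized here */
}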
We'd need to see specifics about your code to give a more accurate answer, but to be concise: when you have a program that coordinates multiple processes or multiple threads, the variability of when the threads execute can add indeterminacy to your application. Essentially, the scheduling the OS does can cause processes and threads to execute out of order, and depending on your environment and code this can produce wildly different results. You can search Google for out-of-order execution with multithreading; it's a large topic.
By "multi-process" do you mean multi-threaded? If we had two threads that do this routine
int i = 1;
while (1)
{
    printf("%d", i++);
    if (i > 4) i = 1;
}
Normally we'd expect the output to be something like
112233441122334411223344
But actually we'd be seeing something like
11232344112233441231423
This is because each thread gets to use the CPU at different rates. (There's a whole lot of complexity behind scheduling, and I'm not the one to explain the technical details.) Suffice to say, from the average person's point of view the scheduling is pretty much random.
This is an example of the race condition mentioned in the other comments.
Related
There are 2 threads: one only reads the signal, the other only sets the signal.
Is it necessary to create a mutex for the signal, and why?
UPDATE
All I care about is whether it will crash if two threads read/set it at the same time.
You will probably want to use atomic variables for this, though a mutex would work as well.
The problem is that there is no guarantee that data will stay in sync between threads, but using atomic variables ensures that as soon as one thread updates that variable, other threads immediately read its updated value.
A problem could occur if one thread updates the variable in cache, and a second thread reads the variable from memory. That second thread would read an out-of-date value for the variable, if the cache had not yet been flushed to memory. Atomic variables ensure that the value of the variable is consistent across threads.
If you are not concerned with timely variable updates, you may be able to get away with a single volatile variable.
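In C11 the atomic-variable version looks roughly like this (a sketch using <stdatomic.h>; function names are placeholders):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int signal_flag;                 /* shared flag, zero-initialized */

void set_signal(void)  { atomic_store(&signal_flag, 1); }          /* writer thread */
bool read_signal(void) { return atomic_load(&signal_flag) != 0; }  /* reader thread */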
It depends. If writes are atomic then you don't need a mutual exclusion lock. If writes are not atomic, then you do need a lock.
There is also the issue of the compiler caching variables in CPU registers, which may cause the copy in main memory not to get updated on every write. Some languages have ways of telling the compiler not to cache a variable like that (the volatile keyword in Java), or to tell the compiler to sync any cached values with main memory (the synchronized keyword in Java). But mutexes in general don't solve this problem.
If all you need is synchronization between threads (one thread must complete something before the other can begin something else) then mutual exclusion should not be necessary.
Mutual exclusion is only necessary when threads are sharing some resource where the resource could be corrupted if they both run through the critical section at roughly the same time. Think of two people sharing a bank account and are at two different ATM's at the same time.
Depending on your language/threading library, you may use the same mechanism for synchronization as for mutual exclusion - either a semaphore or a monitor. So if you are using Pthreads, someone here could post an example of synchronization and another of mutual exclusion; if it's Java, there would be a different example. Perhaps you can tell us what language/library you're using.
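For what it's worth, with Pthreads and POSIX semaphores, the synchronization case ("B may only start after A has finished") looks roughly like this sketch (names are placeholders; the semaphore must first be created with sem_init(&done, 0, 0)):

#include <pthread.h>
#include <semaphore.h>

sem_t done;                          /* initialized elsewhere: sem_init(&done, 0, 0) */

void *thread_a(void *arg) {
    (void)arg;
    /* ... do the work B depends on ... */
    sem_post(&done);                 /* signal: the work is finished */
    return NULL;
}

void *thread_b(void *arg) {
    (void)arg;
    sem_wait(&done);                 /* block until A has posted */
    /* ... safe to proceed ... */
    return NULL;
}

Mutual exclusion uses the same primitive differently: initialize the semaphore to 1, and have each thread sem_wait before the critical section and sem_post after it.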
If, as you've said in your edit, you only want to guard against a crash, then you don't need to do much of anything (at least as a rule). If you get a collision between threads, about the worst that will happen is that the data will be corrupted - e.g., the reader might get a value that's been partially updated and doesn't correspond to any value the writing thread ever wrote. The classic example is a multi-byte number incremented across a carry boundary: say the old value was 0x3fffff and it was being incremented. It's possible the reading thread could see 0x3f0000, where the lower 16 bits have been incremented (wrapping to zero), but the carry into the upper 16 bits hasn't happened (yet).
On a modern machine, an increment on that small of a data item will normally be atomic, but there will be some size (and alignment) where it's not -- typically if part of the variable is in one cache line, and part in another, it'll no longer be atomic. The exact size and alignment for that varies somewhat, but the basic idea remains the same -- it's mostly just a matter of the number having enough digits for it to happen.
Of course, if you're not careful, something like that could cause your code to deadlock or something on that order -- it's impossible to guess what might happen without knowing anything about how you plan to use the data.
I was wondering: if I have an int variable that I want synced across all my threads, couldn't I reserve one bit of it to indicate whether the value is being updated?
To avoid the write being executed in chunks - which would mean threads could access a half-written value (incorrect), or even worse overwrite it (totally wrong) - I want the threads to first be informed that the variable is being written to. I could simply use an atomic operation to write the new value so that the other threads don't interfere, but this idea does not seem that dumb and I would like to try the basic tools first.
What if I make one operation that is small enough to happen in one chunk - like changing a single bit (which still writes the whole byte(s), but not the whole value, right?) - and let that bit indicate whether the variable is being written to? Would that even work, or would the whole int be written?
I mean, even if the whole int were written, this could still have a chance of working - if the bit indicating that the value is changing were written first.
Any thoughts on this?
EDIT: I feel like I did not specify what I am actually planning to do, and why I thought of this in the first place.
I am trying to implement a timeout function, similar to setTimeout in JavaScript. It is pretty straightforward for a timeout you never want to cancel: you create a new thread, tell it to sleep for a given amount of time, then give it a function to execute, possibly with some data. Piece of cake. I finished writing it in maybe half an hour, while being totally new to C.
The hard part comes when you want a timeout that might be canceled later. You do exactly the same as for a non-cancelable timeout, but when the thread wakes up and the scheduler runs it, the thread must check whether a value in the memory it was given at start says 'you should stop executing'. That value could be modified by another thread, but it would only be modified once, at least in the best case. I will worry about other solutions when it comes to modifying the value from multiple threads at the same time. The base assumption for now is that only the main thread, or one other thread, can modify the value, and it will happen only once. That it happens only once can be arranged with another variable, which might be written multiple times but always with the same value (initially 0, meaning not-yet-canceled; when it must be canceled it changes to 1, so there is no worry about the value being fragmented into multiple write operations with only a chunk of it updated when another thread reads it).
Given this assumption, I think the text I wrote at the beginning of this post should be clearer. In a nutshell: no need to worry about the value being written multiple times - only once, but by any thread - and the value must be readable by any other thread, or it must be indicated that it cannot be read yet.
Now that I think of it, since the value itself will only ever be 0 or 1, the trick of knowing whether it has already been canceled should work too, shouldn't it? Since the 0 or 1 will always be written in one operation, there is no need to worry about it being fragmented and read incorrectly. Please correct me if I'm wrong.
On the other hand, what if the value is written from the end rather than the beginning? If that's not possible then there is no need to worry and the post is resolved, but I would like to know every danger that might come with sidestepping atomic operations like this, in this specific context. If the value is written from the end, and a thread accesses the variable to decide whether to continue executing, it will conclude that it should continue, while the expected behaviour is to stop. The chance of this should be minimal, but it exists, which makes it dangerous, and I want this to be 100% predictable.
Another edit, to explain the steps I imagine the program taking.
The main thread spawns a new thread, a 'cancelable timeout'. It passes a function to execute along with its data, a time to sleep, and a memory address pointing to a value. After the thread wakes up after the given time, it must check the value to see if it should execute the function it was given: 0 means continue, 1 means stop and exit. The value (the thread's 'state': canceled or not) can be manipulated by either the main thread or any other thread whose job is to cancel the first one.
Sample code:
struct Timeout {
    void (*function)(void* data);
    void* data;
    int milliseconds;
    int** base;
    int cancelID;
};

DWORD WINAPI CTimeout(struct Timeout* data) {
    Sleep(data->milliseconds);
    /* Index the state array; pointer arithmetic on int* already scales
       by sizeof(int), so no manual multiplication is needed. */
    if ((*data->base)[data->cancelID] == 0) {
        data->function(data->data);
    }
    free(data);
    return 0;
}
Where CTimeout is the function given to the newly spawned thread. Please note that I wrote some of this code on the go and haven't tested it; ignore any potential errors.
Timeout.base is a pointer to a pointer to an array of ints, since many timeouts can exist at the same time. Timeout.cancelID is the ID of the current thread in the list of timeouts; used as an index into the base array, it selects this timeout's state. If the value is 0, the thread should execute its function; otherwise it should clean up the data it was given and return nicely. The reason base is a pointer to a pointer is that the array of timeout states can be resized at any time, so there is no way to pass its initial location; if the array moves, accessing the old memory, which no longer belongs to us, might cause a segmentation fault (if not, please correct me).
Base can be accessed from the main thread, or from other threads if necessary, and the state of our thread can be changed to cancel its execution.
If any thread wants to change the state (the state of the timeout we spawned at the beginning and want to cancel), it should change the value in the base array. I think this is pretty straightforward so far.
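For reference, this is the atomic version I am trying to avoid for now - a sketch, assuming the state array were declared as _Atomic int:

#include <stdatomic.h>

/* Cancelling side: an indivisible, whole-value write of the state. */
void cancel_timeout(_Atomic int **base, int cancelID) {
    atomic_store(&(*base)[cancelID], 1);
}

/* And the check in CTimeout would become:
   if (atomic_load(&(*data->base)[data->cancelID]) == 0) ... */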
There would be a huge problem if the values for continuing and stopping were bigger than one byte. The write to memory could take multiple operations, and accessing the memory too early would produce unexpected results, which is not what I am fond of. But, as I mentioned earlier, what if the value is very small - 0 or 1? Would it matter at all when the value is accessed? We only care about 1 byte; even 2, 4, or 8 bytes wouldn't make any difference in this case, would they? In the end there is no worry about receiving an invalid value, since we don't care about the full 32-bit value, just 1 bit, no matter how many bytes we read.
Maybe it isn't entirely clear what I mean. Write/read operations do not work on single bits, but on byte(s). That is, whatever the number of bytes being written/read, we shouldn't have to worry about reading in the middle of a write: what we care about is only one chunk of what is being written, the first or the last byte(s), and the rest is useless to us, so it need not all be synced at the moment we access the value. The real problem starts when the write begins at the end - the part that is useless to us. If we read the value at that moment, we receive what we shouldn't: a not-canceled state instead of canceled. If the first byte (given little endian) were written first, we would read a valid value even in the middle of a write.
Perhaps I am mangling and mistaking everything - I am not a pro, you know, and perhaps I have been reading trashy articles. If I am wrong about anything at all, please correct me.
Except for some specialised embedded environments with dedicated hardware, there is no such thing as "one operation, which is small enough to happen in one chunk, like changing a single bit". Keep in mind that you do not want to simply overwrite the special bit with "1" (or "0"), because even if you could do that, it might coincide with some other thread doing the same. What you in fact need to do is check whether it is already 1 and, ONLY if it is NOT, write a 1 yourself - and KNOW that you did not overwrite an existing 1 (or that writing your 1 failed because a 1 was already there).
This is called a critical section, and the problem can only be solved with the OS's help, since the OS is what knows about, and can hold off, the other parallel threads. This is the reason the OS-supported synchronisation methods exist.
There is no easy way around this.
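For completeness: C11 exposes exactly this check-and-set-in-one-step operation as atomic_flag (a sketch):

#include <stdatomic.h>

static atomic_flag canceled = ATOMIC_FLAG_INIT;

/* Returns 1 only for the single caller that actually set the flag;
   every later caller learns it was already set. */
int try_cancel(void) {
    return !atomic_flag_test_and_set(&canceled);
}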
Is there a problem with multiple threads using the same integer memory location between pthreads in a C program without any synchronization utilities?
To simplify the issue,
Only one thread will write to the integer
Multiple threads will read the integer
This pseudo-C illustrates what I am thinking
void thread_main(int *a) {
    //wait for something to finish
    //dereference 'a', make decision based on its value
}

int main() {
    int value = 0;
    for (int i = 0; i < 10; i++)
        pthread_create(NULL, NULL, thread_main, &value);

    // do something
    value = 1;
}
I assume it is safe, since an integer occupies one processor word, and reading/writing to a word should be the most atomic of operations, right?
Your pseudo-code is NOT safe.
Although accessing a word-sized integer is indeed atomic - meaning you'll never see an intermediate value, only "before write" or "after write" - this isn't enough for your outlined algorithm.
You are relying on the relative order of the write to a and making some other change that wakes the thread. This is not an atomic operation and is not guaranteed on modern processors.
You need some sort of memory fence to prevent write reordering. Otherwise it's not guaranteed that other threads EVER see the new value.
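In C11 terms, the fence is usually expressed as release/acquire ordering on the flag - a sketch with placeholder names:

#include <stdatomic.h>
#include <stdio.h>

static int payload;                   /* the data being published */
static atomic_int ready;              /* the flag; zero-initialized */

void writer(void) {
    payload = 42;
    /* release: the payload write cannot be reordered after this store */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

void reader(void) {
    /* acquire: if we observe ready == 1, we also observe payload == 42 */
    if (atomic_load_explicit(&ready, memory_order_acquire))
        printf("%d\n", payload);
}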
Unlike Java, where you explicitly start a thread, POSIX threads start executing immediately.
So there is no guarantee that the value you set to 1 in the main function (assuming that is what you refer to in your pseudocode) will be set before or after the threads try to access it.
So while it is safe to read the integer concurrently, you need to do some synchronization if you write to the value for it to be used by the threads.
Otherwise there is no guarantee what value they will read (in order to act depending on the value, as you note).
You should not make assumptions in multithreading, e.g. that there is some processing in each thread before it accesses the value, etc.
There are no guarantees.
I wouldn't count on it. The compiler may emit code that assumes it knows the value of 'value' at any given time, keeping it in a CPU register without re-loading it from memory.
EDIT:
Ben is correct (and I'm an idiot for saying he wasn't): there is a possibility that the CPU will re-order the instructions and execute them down multiple pipelines at the same time. This means that value = 1 could possibly get set before the pipeline performing "the work" has finished. In my defense (not a full idiot?), I have never seen this happen in real life: we have an extensive thread library, we run exhaustive long-term tests, and this pattern is used throughout. I would have seen it if it were happening, but none of our tests ever crash or produce the wrong answer. But... Ben is correct: the possibility exists. It is probably happening all the time in our code, but the re-ordering is not setting flags early enough for the consumers of the data protected by the flags to use the data before it's finished. I will be changing our code to include barriers, because there is no guarantee that this will continue to work in the wild. I believe the correct solution is similar to this:
Threads that read the value:
...
if (value)
{
__sync_synchronize(); // don't pipeline any of the work until after checking value
DoSomething();
}
...
The thread that sets the value:
...
DoStuff();
__sync_synchronize(); // Don't pipeline "setting value" until after finishing stuff
value = 1; // Stuff Done
...
That being said, I found this to be a simple explanation of barriers.
COMPILER BARRIER
Memory barriers affect the CPU. Compiler barriers affect the compiler. Volatile will not keep the compiler from re-ordering code. Here for more info.
I believe you can use this code to keep gcc from rearranging the code during compile time:
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")
So maybe this is what should really be done?
#define GENERAL_BARRIER() do { COMPILER_BARRIER(); __sync_synchronize(); } while(0)
Threads that read the value:
...
if (value)
{
GENERAL_BARRIER(); // don't pipeline any of the work until after checking value
DoSomething();
}
...
The thread that sets the value:
...
DoStuff();
GENERAL_BARRIER(); // Don't pipeline "setting value" until after finishing stuff
value = 1; // Stuff Done
...
Using GENERAL_BARRIER() keeps gcc from re-ordering the code and also keeps the CPU from re-ordering it. Now, I wonder whether gcc already refuses to re-order code across its memory barrier builtin, __sync_synchronize(), which would make the use of COMPILER_BARRIER redundant.
X86
As Ben points out, different architectures have different rules regarding how they rearrange code in the execution pipelines. Intel seems to be fairly conservative. So the barriers might not be required nearly as much on Intel. Not a good reason to avoid the barriers though, since that could change.
ORIGINAL POST:
We do this all the time. It's perfectly safe (not for all situations, but for a lot of them). Our application runs on thousands of servers in a huge farm with 16 instances per server, and we don't have race conditions. You are correct to wonder why people use mutexes to protect already-atomic operations. In many situations the lock is a waste of time. Reading and writing 32-bit integers is atomic on most architectures. Don't try that with 32-bit bit-fields, though!
Processor write re-ordering is not going to affect one thread reading a global value set by another thread. In fact, the result using locks is the same as the result without locks. If you win the race and check the value before it's changed... well, that's the same as winning the race to lock the value so no one else can change it while you read it. Functionally the same.
The volatile keyword tells the compiler not to keep a value in a register, but to keep referring to the original memory location. This should have no effect unless you are optimizing code. We have found that the compiler is pretty smart about this and have not yet run into a situation where volatile changed anything; the compiler seems to be pretty good at picking candidates for register optimization. I suspect that the const keyword might encourage register optimization on a variable.
The compiler might re-order code in a function if it knows the end result will not be different. I have not seen the compiler do this with global variables, because the compiler has no idea how changing the order of a global variable will affect code outside of the immediate function.
If a function is acting up, you can control the optimization level at the function level using __attribute__.
Now, that said, if you use that flag as a gateway to allow only one thread of a group to perform some work, that won't work. Example: Thread A and Thread B both could read the flag. Thread A gets scheduled out. Thread B sets the flag to 1 and starts working. Thread A wakes up, also sets the flag to 1, and starts working. Oops! To avoid locks and still do something like that, you need to look into atomic operations, specifically the gcc atomic builtins like __sync_bool_compare_and_swap(&value, old, new). This sets value = new only if value is currently old. So in the previous example, if value = 1, only one thread (A or B) could execute __sync_bool_compare_and_swap(&value, 1, 2) and change value from 1 to 2; the losing thread's call would fail. __sync_bool_compare_and_swap returns the success of the operation.
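A minimal sketch of that gateway pattern (do_work is a placeholder):

void do_work(void);                   /* hypothetical */

static int flag = 1;                  /* 1 = work available, 2 = work claimed */

void maybe_do_work(void) {
    /* Only the thread whose compare-and-swap succeeds does the work. */
    if (__sync_bool_compare_and_swap(&flag, 1, 2))
        do_work();
}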
Deep down, there is a "lock" when you use the atomic builtins, but it is a hardware instruction and very fast when compared to using mutexes.
That said, use mutexes when you have to change a lot of values at the same time. Atomic operations (as of today) only work when all the data that has to change atomically fits into a contiguous 8, 16, 32, 64 or 128 bits.
Suppose the first thing you do in the thread function is sleep for a second. Then the value afterwards will definitely be 1.
In any case you should at least declare the shared variable volatile. However, you should in all cases prefer some other form of thread IPC or synchronisation; in this case it looks like a condition variable is what you actually need.
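A condition-variable sketch, assuming Pthreads (the flag plus a mutex, so the reader can sleep instead of polling):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int value;

void set_value(void) {
    pthread_mutex_lock(&lock);
    value = 1;
    pthread_cond_signal(&cond);       /* wake a waiting thread */
    pthread_mutex_unlock(&lock);
}

void wait_for_value(void) {
    pthread_mutex_lock(&lock);
    while (value == 0)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}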
Hm, I guess it is safe, but why don't you just declare a function that returns the value to the other threads, since they will only read it?
Because the simple idea of passing pointers to separate threads is already a safety fail, in my humble opinion. What I'm telling you is: why give out a (modifiable, publicly accessible) integer address when you only need the value?
Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would allow only one instance of the function to run at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem in computing: concurrency. And the particular case where two or more threads can enter a shared zone at the same time, with different possible outcomes, is called a race condition (a notion that also extends to/from electronics and other areas).
The hard thing about threading is that computing is such a deterministic affair, but threading adds a degree of uncertainty, which varies per platform/OS.
A single-threaded program can guarantee that it does all its tasks in the same order, always; but with multiple threads, the order depends on how fast each one can complete a task and on other applications wanting to use the CPU, so the underlying hardware affects the results.
There's no single "sure-fire way to do threading"; rather, there are techniques, tools and libraries for individual cases.
Locking in
The best-known technique is using semaphores (or locks), and the best-known semaphore is the mutex, which allows only one thread at a time to access a shared space by raising a sort of "flag" once a thread has entered.
if (locked == NO)
{
locked = YES;
// Do ya' thing
locked = NO;
}
The code above, although it looks like it could work, does not guard against cases where both threads pass the if () before either sets the variable (which threads can easily do). That is why there is hardware support for this kind of operation, guaranteeing that only one thread can execute it: the test-and-set operation, which checks the variable and, if it is clear, sets it, in one indivisible step. (Here's the x86 instruction from the instruction set.)
In the same vein of locks and semaphores, there is also the read-write lock, which allows multiple readers but only one writer, especially useful for things with low volatility. And there are many other variations, some that limit access to N threads, and whatnot. (A sketch of one follows below.)
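The read-write lock in Pthreads, as a sketch (shared is a placeholder):

#include <pthread.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static int shared;

int read_shared(void) {
    pthread_rwlock_rdlock(&rw);       /* many readers may hold this at once */
    int v = shared;
    pthread_rwlock_unlock(&rw);
    return v;
}

void write_shared(int v) {
    pthread_rwlock_wrlock(&rw);       /* a writer gets exclusive access */
    shared = v;
    pthread_rwlock_unlock(&rw);
}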
But overall, locks are lame, since they basically force serialisation of the multi-threading: threads get stuck waiting to acquire a lock (or just test it and leave). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution, in terms of threading, is to minimise the amount of shared space the threads need to use - possibly eliminating it completely. Maybe use rwlocks when volatility is low, have "try and leave" kinds of threads that check whether the lock is free and go away if it isn't, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting queued all over the place, have a set of threads waiting for tasks in a "pool", with queues of things to do - ideally, tasks that don't overlap each other.
Now, the thread pattern doesn't solve the problems discussed before, but it changes the paradigm to make it easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you just switch the focus to "tasks that need to be executed" and the matter of which thread is doing it, becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions) - try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
A calling convention defines how the stack and registers are used to implement function calls. Because each thread has its own stack and registers, synchronising threads and calling conventions are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
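For example, with Pthreads (a sketch; the body is a placeholder):

#include <pthread.h>

static pthread_mutex_t fn_lock = PTHREAD_MUTEX_INITIALIZER;

void serialized_function(void) {
    pthread_mutex_lock(&fn_lock);     /* a second caller blocks here until the first returns */
    /* ... statements that must not run concurrently ... */
    pthread_mutex_unlock(&fn_lock);
}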
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.
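The C11 equivalent of the Pthreads sketch above uses <threads.h> (a sketch; toolchain support for this header varies):

#include <threads.h>

static mtx_t fn_lock;                 /* initialize once: mtx_init(&fn_lock, mtx_plain) */

void serialized_function(void) {
    mtx_lock(&fn_lock);
    /* ... critical section ... */
    mtx_unlock(&fn_lock);
}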