A thread only reads and a thread only modifies. Does this variable also need a mutex in Linux C? [duplicate] - c

There are two threads: one only reads the signal, the other only sets it.
Is it necessary to create a mutex for the signal, and why?
UPDATE
All I care about is whether it will crash if the two threads read and set it at the same time.

You will probably want to use atomic variables for this, though a mutex would work as well.
The problem is that there is no guarantee that data will stay in sync between threads, but using atomic variables ensures that as soon as one thread updates that variable, other threads immediately read its updated value.
A problem could occur if one thread updates the variable in cache, and a second thread reads the variable from memory. That second thread would read an out-of-date value for the variable, if the cache had not yet been flushed to memory. Atomic variables ensure that the value of the variable is consistent across threads.
If you are not concerned with timely variable updates, you may be able to get away with a single volatile variable.
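For illustration, here is a minimal C11 sketch of the atomic approach (the flag name and the two helper functions are invented for this example):

#include <stdatomic.h>

atomic_int signal_flag = 0;            // the shared flag: one writer, one reader

void set_signal(void)                  // called only by the writing thread
{
    atomic_store(&signal_flag, 1);     // atomic write: a reader never sees a torn value
}

int read_signal(void)                  // called only by the reading thread
{
    return atomic_load(&signal_flag);  // atomic read of the current value
}

With the default sequentially consistent ordering this is the simplest correct form; weaker orderings exist but are easier to get wrong.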

It depends. If writes are atomic then you don't need a mutual exclusion lock. If writes are not atomic, then you do need a lock.
There is also the issue of compilers keeping variables in CPU registers, which may cause the copy in main memory not to be updated on every write. Some languages have ways of telling the compiler not to cache a variable like that (the volatile keyword in Java), or to tell the compiler to sync any cached values with main memory (the synchronized keyword in Java). But mutexes in general don't solve this problem.

If all you need is synchronization between threads (one thread must complete something before the other can begin something else) then mutual exclusion should not be necessary.
Mutual exclusion is only necessary when threads are sharing some resource where the resource could be corrupted if they both run through the critical section at roughly the same time. Think of two people sharing a bank account and are at two different ATM's at the same time.
Depending on your language/threading library, you may use the same mechanism for synchronization as for mutual exclusion: either a semaphore or a monitor. So, if you are using Pthreads, someone here could post an example of synchronization and another for mutual exclusion; if it's Java, there would be another example. Perhaps you can tell us what language/library you're using.
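Since the question is tagged Linux C, here is a minimal Pthreads mutual-exclusion sketch (the bank-account names are invented, following the ATM analogy above):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long balance = 0;                     // the shared resource (the "bank account")

void deposit(long amount)
{
    pthread_mutex_lock(&lock);        // enter the critical section
    balance += amount;                // only one thread at a time executes this
    pthread_mutex_unlock(&lock);      // leave the critical section
}

Both ATMs calling deposit() concurrently will serialize on the lock, so the balance can never be corrupted by an interleaved read-modify-write.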

If, as you've said in your edit, you only want to assure against a crash, then you don't need to do much of anything (at least as a rule). If you get a collision between threads, about the worst that will happen is that the data will be corrupted -- e.g., the reader might get a value that's been partially updated and doesn't correspond to any value the writing thread ever wrote. The classic example is a multi-byte number being incremented across a carry: say the old value was 0x003FFFFF. The reading thread could see 0x003F0000, where the lower 16 bits have already wrapped around to zero but the carry into the upper 16 bits hasn't happened (yet).
On a modern machine, an increment on that small of a data item will normally be atomic, but there will be some size (and alignment) where it's not -- typically if part of the variable is in one cache line, and part in another, it'll no longer be atomic. The exact size and alignment for that varies somewhat, but the basic idea remains the same -- it's mostly just a matter of the number having enough digits for it to happen.
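A contrived sketch of such tearing, pretending the two halves of the counter are written separately (illustrative only -- a real aligned 32-bit increment is a single instruction on modern hardware):

#include <stdint.h>

uint16_t lo = 0xFFFF, hi = 0x003F;    // together they encode 0x003FFFFF

void torn_increment(void)
{
    lo = (uint16_t)(lo + 1);          // lo wraps around to 0x0000 ...
    // a reader that looks right here sees 0x003F0000,
    // a value the writer never intended to exist
    if (lo == 0)
        hi++;                         // ... before the carry reaches hi (0x0040)
}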
Of course, if you're not careful, something like that could cause your code to deadlock or something on that order -- it's impossible to guess what might happen without knowing anything about how you plan to use the data.

Related

Is it safe to read and write to an array at different positions from multiple threads in C with pthreads?

Let's suppose that there are two threads, A and B. There is also a shared array: float X[100].
Thread A writes to the array one element at a time in order, every 10 steps it updates a shared variable index (in a safe way) that indicates the current index, and it also sends a signal to thread B.
As soon as thread B receives the signal, it reads index in a safe way, and then proceeds to read the elements of X up to position index.
Is it safe to do this? Does thread A really update the array, or just a copy in cache?
Every sane way of one thread sending a signal to another provides the assurance that anything written by a thread before sending a signal is guaranteed to be visible to a thread after it receives that signal. So as long as you sent the signal through some means that provided this guarantee, which they pretty much all do, you are safe.
Note that attempting to use a condition variable without a predicate protected by a mutex is not a sane way of one thread sending a signal to another! Among other things, it doesn't guarantee that the thread that you think received the signal actually received the signal. You do need to make sure the thread that does the reads in fact received the very signal sent by the thread that does the writes.
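A sane Pthreads version of that signal, with the predicate protected by a mutex, might look like this (the function names are invented for the sketch):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
int ready_index = 0;                  // the predicate, protected by m

void publish(int idx)                 // thread A: announce progress
{
    pthread_mutex_lock(&m);
    ready_index = idx;                // write the predicate under the mutex
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

int await_progress(int last_seen)     // thread B: wait for new progress
{
    pthread_mutex_lock(&m);
    while (ready_index == last_seen)  // the loop guards against spurious wakeups
        pthread_cond_wait(&c, &m);
    int idx = ready_index;
    pthread_mutex_unlock(&m);
    return idx;                       // X[0..idx-1] is now safe to read
}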
Is it safe to do this?
Provided your data modification is protected by critical sections, locks or whatever, this kind of access is perfectly safe as far as the hardware is concerned.
Does thread A really update the array, or just a copy in cache?
Just a copy in cache. Most caches today are write-back: they write data back to memory only when a modified line is evicted from the cache. This largely improves memory bandwidth, especially in a multicore context.
BUT everything happens as if the memory had been updated.
Shared-memory processors generally implement a cache coherency protocol (except some processors for real-time applications). The basic idea of these protocols is that a state is associated with every cache line, describing the status of that line in the caches of the different processors.
These states indicate, for instance, whether the line is only present in the current cache, shared by several caches, in sync with memory, invalid, and so on. See for instance this description of the popular MESI cache coherence protocol.
So what happens, when a cache line is written and is also present in another processor?
Thanks to the state, the cache knows that one or more other processors also have a copy of the line, and it will send an invalidate signal. The line will be invalidated in the other caches, and when they want to read or write it they have to reload its content. Actually, this reload will be served by the cache that holds the valid copy, to limit memory accesses.
This way, while data is only written in the cache, the behavior is similar to a situation where the data had been written to memory.
BUT, even though the hardware ensures the correctness of the transfers functionally, one must take the existence of caches into account to avoid performance degradation.
Assume cache A is updating a line and cache B is reading it. Whenever cache A writes, the line in cache B is invalidated, and whenever cache B wants to read it, if the line has been invalidated, it must fetch it from cache A. This can lead to many transfers of the line between the caches and make the memory system inefficient.
So concerning your example, updating the index every 10 elements is probably not a good idea, and you should use information about the caches to improve the exchanges between sender and receiver.
For instance, if you are on a Pentium with 64-byte cache lines, you should declare X as
_Alignas(64) float X[100];
This way the starting address of X will be a multiple of 64 and will fall on a cache-line boundary. The _Alignas specifier has existed since C11, and by including stdalign.h you can also write alignas(64). Before C11, most compilers provided extensions (such as gcc's __attribute__((aligned(64)))) for aligned placement.
And of course, you should have thread B read data only once a full 64-byte line (16 floats) has been written.
This way, when thread B accesses the data, the cache line is no longer being modified by thread A, and only one initial transfer between caches A and B will take place. This reduction in the number of transfers between the caches may have a significant impact on performance, depending on your program.
If you're using a variable that tracks readiness to read the index, the variable is protected by a mutex, and the signalling is done via a pthread condition variable that thread B waits on under the mutex, then yes.
If you're using POSIX signals, then I believe you need a synchronization mechanism on top of that. Writing to an atomic variable with memory_order_release in thread A, and reading it with memory_order_acquire in thread B should guarantee in the most lightweight fashion that writes in A preceding the write to the atomic should be visible in B after it has read the atomic.
For best performance, the array sharing should also be done in such a way that the shared parts of the array do not cross cache-line boundaries (or else your performance might degrade due to false sharing).
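Putting the alignment and release/acquire points together, a lightweight C11 sketch (compute() is a hypothetical stand-in for the real work; the 64-byte line size is an assumption, as above):

#include <stdalign.h>
#include <stdatomic.h>

float compute(int i);                 // hypothetical producer of values

alignas(64) float X[100];             // X starts on a 64-byte line boundary
atomic_int write_index = 0;           // how many elements are published

void producer(void)                   // thread A
{
    for (int i = 0; i < 100; i++) {
        X[i] = compute(i);
        if ((i + 1) % 16 == 0)        // publish only whole 64-byte lines
            atomic_store_explicit(&write_index, i + 1, memory_order_release);
    }
    atomic_store_explicit(&write_index, 100, memory_order_release); // the tail
}

void consumer(void)                   // thread B
{
    int n = atomic_load_explicit(&write_index, memory_order_acquire);
    // the release/acquire pair guarantees that X[0..n-1] is visible here
}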

Is it possible to make 2 lines of code always occur in order in a multithreaded program without locks?

atomic_compare_exchange_strong_explicit(mem, old, new, <mem_order>, <mem_order>);
ftruncate(fd, <size>);
All I want is for these two lines of code to always occur without any interference (WITHOUT USING LOCKS): immediately after that CAS, ftruncate(2) should be called. I read a short description of memory orders; although I don't understand them much, they seemed to make this possible. Is there a way to do this?
Your title asks for the things to occur in order. That's easy, and C basically does that automatically with mo_seq_cst; all visible side-effects of CAS will appear before any from ftruncate.
(Not strictly required by the ISO C standard, but in practice real implementations implement seq-cst with a full barrier, except AArch64 where STLR doesn't stall to drain the store buffer unless/until there's a LDAR while the seq-cst store is still in the store buffer. But a system call is definitely going to also include a full barrier.)
Within the thread doing the operation, the atomic is Sequenced Before the system call.
What kind of interference are you worried about? Some other thread changing the size of the file? You can't prevent that race condition.
There's no way to combine some operation on memory + a system call into a single atomic transaction. You would need to use a hypothetical system call that atomically does what you want. (Presumably it would have to do locking inside the kernel to make a file operation and a memory modification appear as one atomic transaction.) e.g. the Linux futex system call atomically does a couple things, but of course there's nothing like this for any other operations.
Or you need locking. (Or to suspend all other threads of your process somehow.)
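For reference, the locking fallback would look something like this sketch (mem, old, new and fd are the question's placeholders; the mutex and function wrapper are added here):

#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

pthread_mutex_t resize_lock = PTHREAD_MUTEX_INITIALIZER;

// Every thread performing the CAS + ftruncate pair must take the same lock;
// that makes the pair atomic with respect to those threads (and only them).
void cas_then_truncate(_Atomic int *mem, int old, int new, int fd, off_t size)
{
    pthread_mutex_lock(&resize_lock);
    if (atomic_compare_exchange_strong(mem, &old, new))
        ftruncate(fd, size);          // no cooperating thread can intervene here
    pthread_mutex_unlock(&resize_lock);
}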

Usage of an atomic integer for shared data

I was studying OS and synchronization, and I got an idea about dealing with this shared data without synchronizing, but I am not sure if it will work. Here is the code.
Now, the race condition is obviously the increment and decrement of shared data. But what if the integer variable were atomic? I think I read something about this when I was just a beginner in CS, so the question might not be perfect. As far as I remember, it involved blocking something to prevent the increment and decrement happening at the same time. Now, I am a bit confused, because if atomic variables really worked this way, there would be no need for synchronization methods for simple code like this.
Note: the code has been removed since it just shifted people's focus, and the answer provides enough info.
As it stands, the code is indeed not safe to call concurrently, so there must be some kind of synchronization that prevents this.
Now, concerning the idea of making num_processes atomic: that could work. It wouldn't be a simple substitution though; comparing to the max and incrementing must be done atomically, not in two steps, otherwise you still have a race condition. In particular, the following interleaving must be prevented:
Thread A checks if the limit is reached, which it isn't.
Thread B checks if the limit is reached, which it isn't.
Thread B increments the PID counter.
Thread A increments the PID counter.
Each step in and of itself is atomic, but obviously that didn't prevent a PID overflow. Instead, the code must check that the counter is not at the limit and increment it in one atomic step. This is a common task (compare and increment), so you should easily find existing code examples; a sketch follows below.
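A compare-and-increment sketch with C11 atomics (the helper name and max parameter are invented; num_processes is the question's counter):

#include <stdatomic.h>
#include <stdbool.h>

atomic_int num_processes;             // the shared counter from the question

bool try_add_process(int max)
{
    int cur = atomic_load(&num_processes);
    while (cur < max) {
        // Succeeds only if nobody changed the counter in between;
        // on failure, cur is reloaded and the limit check repeats.
        if (atomic_compare_exchange_weak(&num_processes, &cur, cur + 1))
            return true;
    }
    return false;                     // limit reached
}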
However, I'm pretty sure this isn't all the code involved, and some other code (e.g. in get_processID() or the code that releases a PID) could still require a lock around the whole thing.
For your code, synchronization is not necessary at all, because here num_processes is incremented and decremented by only one process, i.e. the parent process. Also, num_processes is not a shared variable here. To create a shared variable you first have to learn about the shmget() and shmat() functions in UNIX.
A race condition arises when two or more processes want to access shared memory. An operation is atomic if it executes entirely (i.e. with no switching) or not at all. For example:
Consider the increment operator on shared data. This operator is not atomic, because at the instruction level the increment is performed in several steps:
1. Load the value of the variable into a register.
2. Add one to the loaded value, leaving the result in a temporary register.
3. Store the result back in the memory location of the variable being incremented.
As you can see, the operation is done in three steps, so if there is a switch to another process before all three steps complete, it leads to undesired results. For more, you can read about race conditions at http://tutorials.jenkov.com/java-concurrency/race-conditions-and-critical-sections.html. As shown above, the individual load, add, and store instructions are atomic, since each is performed entirely or not at all (barring a power or system failure). So to make the increment operation atomic we need some synchronization, using either semaphores or monitors; these are software synchronization techniques. I think this topic should be clear now.
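For comparison, C11 lets the hardware perform all three steps as one indivisible operation:

#include <stdatomic.h>

atomic_int counter;                   // shared data

void increment(void)
{
    // Load, add and store happen as one indivisible step:
    // no process or thread switch can fall between them.
    atomic_fetch_add(&counter, 1);
}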

pointer shared between two threads without mutex [duplicate]

Is there a problem with multiple threads using the same integer memory location between pthreads in a C program without any synchronization utilities?
To simplify the issue,
Only one thread will write to the integer
Multiple threads will read the integer
This pseudo-C illustrates what I am thinking
#include <pthread.h>

void *thread_main(void *arg) {
    int *a = arg;
    // wait for something to finish
    // dereference 'a', make a decision based on its value
    return NULL;
}

int main(void) {
    int value = 0;
    pthread_t tid[10];
    for (int i = 0; i < 10; i++)
        pthread_create(&tid[i], NULL, thread_main, &value);
    // do something
    value = 1;
}
I assume it is safe, since an integer occupies one processor word, and reading/writing to a word should be the most atomic of operations, right?
Your pseudo-code is NOT safe.
Although accessing a word-sized integer is indeed atomic, meaning that you'll never see an intermediate value, but either "before write" or "after write", this isn't enough for your outlined algorithm.
You are relying on the relative order of the write to a and making some other change that wakes the thread. This is not an atomic operation and is not guaranteed on modern processors.
You need some sort of memory fence to prevent write reordering. Otherwise it's not guaranteed that other threads EVER see the new value.
Unlike Java, where you explicitly start a thread, POSIX threads start executing immediately.
So there is no guarantee that the assignment of 1 in the main function (assuming that is what you refer to in your pseudocode) will execute before or after the threads try to access the value.
So while it is safe to read the integer concurrently, you need some synchronization if you write to the value that the threads will use.
Otherwise there is no guarantee what value they will read (in order to act depending on the value, as you note).
You should not be making assumptions about multithreading, e.g. that there is some processing in each thread before accessing the value, etc.
There are no guarantees.
I wouldn't count on it. The compiler may emit code that assumes it knows what the value of 'value' is at any given time in a CPU register without re-loading it from memory.
EDIT:
Ben is correct (and I'm an idiot for saying he wasn't) that there is the possibility that the CPU will re-order the instructions and execute them down multiple pipelines at the same time. This means that value = 1 could possibly get set before the pipeline performing "the work" finished. In my defense (not a full idiot?) I have never seen this happen in real life; we have an extensive thread library, we run exhaustive long-term tests, and this pattern is used throughout. I would have seen it if it were happening, but none of our tests ever crash or produce the wrong answer. But... Ben is correct, the possibility exists. It is probably happening all the time in our code, but the re-ordering is not setting flags early enough that the consumers of the data protected by the flags can use the data before it's finished. I will be changing our code to include barriers, because there is no guarantee that this will continue to work in the wild. I believe the correct solution is similar to this:
Threads that read the value:
...
if (value)
{
__sync_synchronize(); // don't pipeline any of the work until after checking value
DoSomething();
}
...
The thread that sets the value:
...
DoStuff();
__sync_synchronize(); // Don't pipeline "setting value" until after finishing stuff
value = 1; // Stuff Done
...
That being said, I found this to be a simple explanation of barriers.
COMPILER BARRIER
Memory barriers affect the CPU. Compiler barriers affect the compiler. volatile will not keep the compiler from re-ordering code. See here for more info.
I believe you can use this code to keep gcc from rearranging the code during compile time:
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")
So maybe this is what should really be done?
#define GENERAL_BARRIER() do { COMPILER_BARRIER(); __sync_synchronize(); } while(0)
Threads that read the value:
...
if (value)
{
GENERAL_BARRIER(); // don't pipeline any of the work until after checking value
DoSomething();
}
...
The thread that sets the value:
...
DoStuff();
GENERAL_BARRIER(); // Don't pipeline "setting value" until after finishing stuff
value = 1; // Stuff Done
...
Using GENERAL_BARRIER() keeps gcc from re-ordering the code and also keeps the CPU from re-ordering the code. Now, I wonder whether gcc already avoids re-ordering code across its memory barrier builtin, __sync_synchronize(), which would make the use of COMPILER_BARRIER redundant.
X86
As Ben points out, different architectures have different rules regarding how they rearrange code in the execution pipelines. Intel seems to be fairly conservative. So the barriers might not be required nearly as much on Intel. Not a good reason to avoid the barriers though, since that could change.
ORIGINAL POST:
We do this all the time. It's perfectly safe (not for all situations, but a lot). Our application runs on thousands of servers in a huge farm with 16 instances per server, and we don't have race conditions. You are correct to wonder why people use mutexes to protect already atomic operations. In many situations the lock is a waste of time. Reading and writing 32-bit integers is atomic on most architectures. Don't try that with 32-bit bit-fields though!
Processor write re-ordering is not going to affect one thread reading a global value set by another thread. In fact, the result using locks is the same as the result without locks. If you win the race and check the value before it's changed... well, that's the same as winning the race to lock the value so no one else can change it while you read it. Functionally the same.
The volatile keyword tells the compiler not to store a value in a register, but to keep referring to the original memory location. This should have no effect unless you are optimizing code. We have found that the compiler is pretty smart about this and have not run into a situation yet where volatile changed anything. The compiler seems to be pretty good at coming up with candidates for register optimization. I suspect that the const keyword might encourage register optimization on a variable.
The compiler might re-order code in a function if it knows the end result will not be different. I have not seen the compiler do this with global variables, because the compiler has no idea how changing the order of a global variable will affect code outside of the immediate function.
If a function is acting up, you can control the optimization level at the function level using __attribute__.
Now, that said, if you use that flag as a gateway to allow only one thread of a group to perform some work, that won't work. Example: Thread A and Thread B both could read the flag. Thread A gets scheduled out. Thread B sets the flag to 1 and starts working. Thread A wakes up and sets the flag to 1 and starts working. Oops! To avoid locks and still do something like that, you need to look into atomic operations, specifically the gcc atomic builtins like __sync_bool_compare_and_swap(&value, old, new). This allows you to set value = new only if value is currently old. In the previous example, if value = 1, only one thread (A or B) could execute __sync_bool_compare_and_swap(&value, 1, 2) and change value from 1 to 2. The losing thread would fail. __sync_bool_compare_and_swap returns the success of the operation.
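Sketched as code (using the flag values 1 and 2 from the example above; DoTheWork() is a placeholder):

// Both threads race to flip the flag from 1 to 2; exactly one wins.
if (__sync_bool_compare_and_swap(&value, 1, 2)) {
    DoTheWork();                      // the winner does the work
} else {
    // lost the race: another thread already claimed it
}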
Deep down, there is a "lock" when you use the atomic builtins, but it is a hardware instruction and very fast when compared to using mutexes.
That said, use mutexes when you have to change a lot of values at the same time. Atomic operations (as of today) only work when all the data that has to change atomically fits into a contiguous 8, 16, 32, 64 or 128 bits.
Assume the first thing you do in the thread function is sleep for a second. Then the value will definitely be 1 after that.
At the very least you should declare the shared variable volatile. However, you should in all cases prefer some other form of thread IPC or synchronisation; in this case it looks like a condition variable is what you actually need.
Hm, I guess it is safe, but why don't you just declare a function that returns the value to the other threads, as they will only read it?
Because the simple idea of passing pointers to separate threads is already a security fail, in my humble opinion. What I'm telling you is: why give out a (modifiable, publicly accessible) integer address when you only need the value?
