Access control for one Publisher and multiple Subscribers - C

I have a block of shared memory that multiple processes access.
To this block of memory, I have one process that writes/updates information (which I'm calling a Publisher), and I have more than one process that is reading this data (which I'm calling Subscribers).
Because I don't want the Subscribers to read in the middle of a write/update from the Publisher, I believe I need to implement access control that guarantees the data currently in shared memory is fully updated before the Subscribers take it (no reading in the middle of a write).
This is the behavior I'm trying to design:
Publisher may modify shared memory, but only when no other Subscriber is currently reading from the memory.
Any Subscriber may read from shared memory, so long as the Publisher is not currently modifying it.
Subscribers may not modify shared memory, only read; therefore, Subscribers are allowed to read concurrently (assuming the Publisher is not modifying the shared memory).
The first solution I thought of is a simple mutex, or semaphore of size 1. This would mean that every time the Subscribers want to fetch new information, they would need to wait for the memory to be updated by the Publisher. However, this has the unintended consequences of Subscribers having to wait for other Subscribers, and the possibility that the Publisher gets delayed or locked out of the ability to publish new data if enough Subscribers exist on the system.
The second solution I thought of was to look into shm, where I found SHM_LOCK and SHM_UNLOCK. These seem useful for enforcing the Publisher and Subscriber roles, but they only reinforce what each process can do, not when it can do it.
Alternatively, I have the reverse situation elsewhere, where the Subscribers from above become Publishers, each of which may or may not set a block of shared memory to a specific value. (They are not guaranteed to write to the block of memory, but the value is guaranteed to be the same across Publishers if they do write.) The Publisher from above becomes a Subscriber.
Addendum:
Each Publisher and Subscriber is an individual process.
'Shared memory' in my question represents multiple different caches of memory, not a single unit. I do not want all shared memory locked out from Subscriber(s) when my Publisher(s) issue an update to just one of N data units.
The Publisher (from the first part) is a daemon. My logic is that I want the daemon to be doing a timely action, putting data somewhere; I don't want the daemon disturbed to any great extent by Subscribers.
My questions:
Is there a control scheme that can properly encode the logic above?
(Publisher sets and removes access, Subscribers read when accessible.)
In this context, are there better methods of publishing information to multiple processes? Or is shared memory the way to go in this situation?

What you need is referred to as a read-write lock.
These are natively supported by pthreads via the pthread_rwlock_* functions in pthread.h. Normally pthreads are used between threads of a single process.
In the case of multiple processes you could implement a read-write lock with semaphores. A little more reading and research should make it easy enough to figure out the rest on your own.
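Rather than building one from semaphores, a pthread_rwlock_t can also be shared between processes by placing it inside the shared segment and initializing it with the PTHREAD_PROCESS_SHARED attribute. A minimal sketch of the Publisher/Subscriber setup from the question (the segment name /pubsub_demo is made up, and error handling is omitted):

#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Layout of the shared segment: the lock lives alongside the data. */
struct shared_block {
    pthread_rwlock_t lock;
    char data[4096];
};

/* Publisher side: create the segment and a process-shared rwlock in it. */
struct shared_block *publisher_setup(void)
{
    int fd = shm_open("/pubsub_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct shared_block));
    struct shared_block *blk = mmap(NULL, sizeof *blk,
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    /* Required so the lock works across processes, not just threads. */
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&blk->lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return blk;
}

/* Publisher: exclusive access while updating. */
void publish(struct shared_block *blk, const char *msg)
{
    pthread_rwlock_wrlock(&blk->lock);
    strncpy(blk->data, msg, sizeof blk->data - 1);
    pthread_rwlock_unlock(&blk->lock);
}

/* Subscriber: any number of readers may hold the lock concurrently. */
void subscribe(struct shared_block *blk, char *out, size_t n)
{
    pthread_rwlock_rdlock(&blk->lock);
    strncpy(out, blk->data, n);
    pthread_rwlock_unlock(&blk->lock);
}

Subscribers would shm_open the same name (without O_CREAT) and mmap it. Per the addendum, each of the N data units can carry its own lock, so an update to one unit never blocks readers of the others.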

Normally you need two mutexes for that (or, more exactly, two condition variables, which can share the same mutex). The reason is that guarding the access with a single complex condition is prone to a problem where readers continuously overlap and block writers from ever acquiring access. With two condition variables you can give priority to the queue of writers and disallow new readers from taking the resource while a writer is waiting to acquire it. Here I'm assuming the number of writers is far smaller than the number of readers; otherwise you can hit the opposite problem and starve the readers because writers keep overlapping.
The most flexible approach is probably to let writers and readers act in sequence (readers among themselves can still run in parallel), using a flip-flop and preparing the switch as soon as a worker on the other side is waiting for access.
Anyway, as suggested in the other answers, take a look at read-write locks.
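For illustration, here is a hedged sketch of that two-condition-variable scheme: a writer-priority read-write lock for threads (names are made up; for multiple processes the mutex and conditions would additionally need the PTHREAD_PROCESS_SHARED attribute):

#include <pthread.h>

typedef struct {
    pthread_mutex_t m;          /* one mutex shared by both conditions */
    pthread_cond_t  readers_ok; /* signaled when readers may proceed */
    pthread_cond_t  writer_ok;  /* signaled when a writer may proceed */
    int active_readers;
    int active_writer;          /* 0 or 1 */
    int waiting_writers;
} rwlock_t; /* initialize with PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, zeros */

void rw_rdlock(rwlock_t *rw)
{
    pthread_mutex_lock(&rw->m);
    /* New readers yield to any waiting writer: this is the priority rule. */
    while (rw->active_writer || rw->waiting_writers > 0)
        pthread_cond_wait(&rw->readers_ok, &rw->m);
    rw->active_readers++;
    pthread_mutex_unlock(&rw->m);
}

void rw_rdunlock(rwlock_t *rw)
{
    pthread_mutex_lock(&rw->m);
    if (--rw->active_readers == 0)
        pthread_cond_signal(&rw->writer_ok);     /* last reader lets a writer in */
    pthread_mutex_unlock(&rw->m);
}

void rw_wrlock(rwlock_t *rw)
{
    pthread_mutex_lock(&rw->m);
    rw->waiting_writers++;      /* visible to rw_rdlock: blocks new readers */
    while (rw->active_writer || rw->active_readers > 0)
        pthread_cond_wait(&rw->writer_ok, &rw->m);
    rw->waiting_writers--;
    rw->active_writer = 1;
    pthread_mutex_unlock(&rw->m);
}

void rw_wrunlock(rwlock_t *rw)
{
    pthread_mutex_lock(&rw->m);
    rw->active_writer = 0;
    if (rw->waiting_writers > 0)
        pthread_cond_signal(&rw->writer_ok);      /* hand off to the next writer */
    else
        pthread_cond_broadcast(&rw->readers_ok);  /* otherwise release all readers */
    pthread_mutex_unlock(&rw->m);
}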

Do mutexes only function correctly if all relevant threads attempt to acquire the locks they should be acquiring, prior to utilizing a resource?

I'm just learning about locks for the first time, prior to taking an OS class. I originally thought that locks would literally "lock some resource", where you would need to specify the resource (perhaps by a pointer to its address in memory). But after reading through a couple of really basic spin-lock implementations (say, the version in the unix-like teaching OS "xv6"):
http://pages.cs.wisc.edu/~skobov/cs537/P3/xv6/kernel/spinlock.h
http://pages.cs.wisc.edu/~skobov/cs537/P3/xv6/kernel/spinlock.c
As well as this previous Stack Overflow question: (What part of memory does a mutex lock? (pthreads))
I think I had it all wrong.
It seems to me instead that a lock is effectively just a boolean-like flag variable that temporarily (or indefinitely) blocks execution of some code that would use a resource, but only where another thread actually also attempts to acquire the lock; blocking that second thread has the side effect that it cannot use the resource until the first thread releases the lock. So now I'm wondering: if a poorly designed thread uses no mutexes and simply attempts to use a resource that a well-designed thread holds a lock on, can the poorly designed thread access the resource regardless? I'm now thinking of the mutex as a flag a thread is supposed to check, but has the opportunity to ignore.
If that's the case, then why do we implement locks as sophisticated boolean variables such that all threads must use the locks as opposed to a lock that instead prevents access to a memory region?
Since I'm relatively new to all this, I appreciate any reasonable terminology edit recommendations if I'm stating my question incorrectly, as well as an answer!
Thank you very much!
--edit, Thank you all for the prompt and helpful responses!
If that's the case, then why do we implement locks as sophisticated boolean variables such that all threads must use the locks as opposed to a lock that instead prevents access to a memory region?
A lot of reasons:
What if the thing you're controlling access to isn't a memory region? What if it's a file or a network connection?
How would the compiler know when it was going to access a region of protected memory? Would the compiler have to assume that any memory access anywhere might synchronize with other threads? That would make many optimizations impossible, including storing possibly shared variables in registers, which is pretty critical.
Would hardware have to support locking memory on any granularity? How would it know what memory is associated with an object? Consider a linked list. Would you have to lock every bit of memory associated with that linked list and every object in it? When you add or remove an object from the list, do you have to change what memory is protected? Won't that be both expensive and extremely difficult to use?
How would it know when to release a lock? Say you access some area of memory that needs protection and then later you access some other area of memory. How would the implementation know whether other threads could be allowed to access that area in-between those two accesses? The implementation would need to know whether the code accessing that region was or wasn't relying on a consistent view of the shared state over those two accesses. How could it know that? Get it wrong by keeping the lock and concurrency suffers. Get it wrong by releasing the lock in-between the two accesses, and the code can behave unpredictably.
And so on.
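To make the "flag a thread can choose to ignore" point concrete, here is a minimal sketch. The mutex only protects balance against threads that also take it; nothing stops the rogue thread, and the program contains a data race (undefined behavior), which is exactly why all threads must cooperate:

#include <pthread.h>
#include <stdio.h>

long balance = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Well-behaved thread: takes the lock before touching balance. */
void *polite(void *arg)
{
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        balance++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Rogue thread: compiles and runs fine while ignoring the lock,
   producing a data race. The mutex cannot stop it. */
void *rogue(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        balance++;              /* no lock acquired */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, polite, NULL);
    pthread_create(&b, NULL, rogue, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("balance = %ld (usually not 2000000)\n", balance);
    return 0;
}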

POSIX shared memory - method for automatic client notification

I am investigating POSIX shared memory for IPC in place of a POSIX message queue. I plan to make a shared memory area large enough to hold 50 messages of 750 bytes each. The messages will be sent at random intervals from several cores (servers) to one core (client) that receives the messages and takes action based on the message content.
I have three questions about POSIX shared memory:
(1) is there a method for automatic client notification when new data are available, like the methods available with POSIX pipes and message queues?
(2) What problems would arise using shared memory without a lock where the data are write-once, read-once?
(3) I have read that shared memory is the fastest IPC method because it has the highest bandwidth and data become available in both server and client cores immediately. However, with message queues and pipes the server cores can send the messages and continue with their work without waiting for a lock. Does the need for a lock slow the performance of shared memory relative to message queues and pipes in the type of scenario described above?
(1) There is no automatic mechanism to notify threads/processes that data was written to a memory location. You'd have to use some other mechanism for notifications, such as a semaphore or a pipe (a semaphore-based sketch follows this answer).
(2) You have a multiple-producer/single-consumer (MPSC) setup. Implementing a lockless MPSC queue is not trivial. You would have to pay careful attention to doing atomic compare-and-swap (CAS) operations in the right order with correct memory ordering, and you should know how to avoid false cache-line sharing. See https://en.cppreference.com/w/c/atomic for the atomic operations support in C11 and read up on memory barriers. Another good read is the paper on the Disruptor at http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf.
(3) Your data size (50 * 750 = 37,500 bytes) is small. Chances are that it all fits in cache and you'll have no bandwidth issues accessing it. Lock vs. pipe vs. message queue: none of these is free at times of contention or when the queue is full or empty.
One benefit of lockless queues is that they can work entirely in user-space. This is a huge benefit when extremely low latency is desired.
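Expanding on (1): one common pattern is to pair the shared segment with a POSIX semaphore that counts ready messages; each server posts it after writing, and the client blocks on it. A hedged sketch (the name /mq_items is made up, and the shared-memory ring holding the 750-byte messages is elided):

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Both sides open the same named semaphore, initially 0. */
    sem_t *items = sem_open("/mq_items", O_CREAT, 0600, 0);

    if (argc > 1) {             /* "server" role: producer */
        /* ... copy one 750-byte message into the shared-memory ring ... */
        sem_post(items);        /* wake the client: one more message ready */
    } else {                    /* "client" role: consumer */
        sem_wait(items);        /* sleep in the kernel until a server posts */
        /* ... copy one message out of the shared-memory ring ... */
        puts("got a message");
    }
    sem_close(items);
    return 0;
}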

Is it safe to read and write to an array at different positions from multiple threads in C with pthreads?

Let's suppose that there are two threads, A and B. There is also a shared array: float X[100].
Thread A writes to the array one element at a time in order, every 10 steps it updates a shared variable index (in a safe way) that indicates the current index, and it also sends a signal to thread B.
As soon as thread B receives the signal, it reads index in a safe way, and then proceeds to read the elements of X up to position index.
Is it safe to do this? Does thread A really update the array, or just a copy in cache?
Every sane way of one thread sending a signal to another provides the assurance that anything written by a thread before sending a signal is guaranteed to be visible to a thread after it receives that signal. So as long as you sent the signal through some means that provided this guarantee, which they pretty much all do, you are safe.
Note that attempting to use a condition variable without a predicate protected by a mutex is not a sane way of one thread sending a signal to another! Among other things, it doesn't guarantee that the thread that you think received the signal actually received the signal. You do need to make sure the thread that does the reads in fact received the very signal sent by the thread that does the writes.
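For reference, a minimal sketch of the sane pattern applied to the X/index example: the shared index is the predicate, it is updated under the mutex, and the waiter re-checks it in a loop (which also handles spurious wakeups):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int shared_index = 0;       /* the predicate thread B checks */

/* Thread A: call after writing 10 more elements of X. */
void signal_progress(int new_index)
{
    pthread_mutex_lock(&m);
    shared_index = new_index;      /* update the predicate under the mutex */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

/* Thread B: wait until more data than last_seen is available. */
int wait_for_progress(int last_seen)
{
    pthread_mutex_lock(&m);
    while (shared_index <= last_seen)   /* loop guards against spurious wakeups */
        pthread_cond_wait(&c, &m);
    int n = shared_index;
    pthread_mutex_unlock(&m);
    return n;                      /* safe to read X[0..n) now */
}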
Is it safe to do this?
Provided your data modifications are rendered safe by critical sections, locks, or whatever other protection, this kind of access is perfectly safe as far as the hardware is concerned.
Does thread A really update the array, or just a copy in cache?
Just a copy in cache. Most caches are presently write-back: data is written back to memory only when a modified line is evicted from the cache. This largely improves memory bandwidth, especially in a multicore context.
BUT everything happens as if the memory had been updated.
For shared memory processors, there are generally cache coherency protocols (except in some processors for real time applications). The basic idea of these protocols is that a state is associated with every cache line.
The state describes information about the line in the caches of the different processors: for instance, whether the line is only present in the current cache, shared by several caches, in sync with memory, invalid, and so on. See for instance this description of the popular MESI cache coherence protocol.
So what happens when a cache line is written while it is also present in another processor's cache?
Thanks to the state, the cache knows that one or more other processors also have a copy of the line, and it will send an invalidate signal. The line will be invalidated in the other caches, and when they want to read or write it they have to reload its content. Actually, this reload will be served by the cache that holds the valid copy, to limit memory accesses.
This way, although data is only written in the cache, the behavior is similar to a situation where the data had been written to memory.
BUT, while the hardware will functionally ensure the correctness of the transfers, one must take the existence of the caches into account to avoid performance degradation.
Assume cache A is updating a line and cache B is reading it. Whenever cache A writes, the line in cache B is invalidated, and whenever cache B wants to read it, if the line has been invalidated, it must fetch it from cache A. This can lead to many transfers of the line between the caches and make the memory system inefficient.
So concerning your example, 10 is probably not a good batch size, and you should use information about the caches to improve the exchanges between sender and receiver.
For instance, if you are on a Pentium with 64-byte cache lines, you should declare X as
_Alignas(64) float X[100];
This way the starting address of X will be a multiple of 64, fitting cache-line boundaries. The _Alignas qualifier has existed since C11, and by including stdalign.h you can also use the equivalent alignas(64). Before C11, most compilers provided extensions for aligned placement.
And of course, you should make thread B read data only when a full 64-byte line (16 floats) has been written.
This way, when thread B accesses the data, the cache line will no longer be modified by thread A, and only one initial transfer between caches A and B will take place. This reduction in the number of transfers between the caches may have a significant impact on performance depending on your program.
If you're using a variable that tracks readiness to read the index, the variable is protected by a mutex, and the signalling is done via a pthread condition variable that thread B waits on under the mutex, then yes.
If you're using POSIX signals, then I believe you need a synchronization mechanism on top of that. Alternatively, writing to an atomic variable with memory_order_release in thread A and reading it with memory_order_acquire in thread B guarantees, in the most lightweight fashion, that writes in A preceding the atomic store are visible in B after it reads the atomic (see the sketch below).
For best performance, the array sharing should also be done in such a way that the shared parts of the array do not cross cache-line boundaries (or else your performance might degrade due to false sharing).
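Combining this answer's release/acquire suggestion with the cache-line advice above, a minimal C11 sketch might look like this (the name index_ready and the batch size of 16 floats, one 64-byte line, are illustrative):

#include <stdatomic.h>

/* Shared between threads A and B: the array starts on a cache-line
   boundary, and the index is published with release/acquire ordering. */
_Alignas(64) float X[100];
atomic_int index_ready = 0;

/* Thread A: fill one whole cache line (16 floats), then publish. */
void producer_step(int i, const float *vals)
{
    for (int k = 0; k < 16; k++)
        X[i + k] = vals[k];
    /* Release: all stores to X above become visible to any thread that
       later reads index_ready with acquire ordering. */
    atomic_store_explicit(&index_ready, i + 16, memory_order_release);
}

/* Thread B: read the published prefix of X. */
int consumer_step(float *out)
{
    /* Acquire: pairs with the release store in producer_step. */
    int n = atomic_load_explicit(&index_ready, memory_order_acquire);
    for (int k = 0; k < n; k++)
        out[k] = X[k];
    return n;
}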

FreeRTOS locks and tasks

When should I use locks with FreeRTOS on something like a Cortex-M3? The need seems clear on multicore systems, but what about a single core?
For example, I have two tasks. In the first I increment a variable named counter. The second task saves the current value of that variable when some event occurs. Should I use locks here? Could the first task corrupt the value that the second task is saving?
Yes, you should use locks to protect access to the shared resource.
You can't be sure that the sequence generated by the compiler to read the shared variable is atomic, so it might be unsafe.
If you want to be a bit more hard-core, possibly gaining performance, there are various ways to ensure the variable is actually accessed atomically, for example the C11 stdatomic facilities or the architecture's atomic instructions.
You need locks to synchronize concurrent access to shared objects. The easiest scenario, using a FreeRTOS mutex, would be like:
#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t xLock;   /* the lock; create once with xSemaphoreCreateMutex() */

/* task 1: */
xSemaphoreTake(xLock, portMAX_DELAY);
counter += 1;
xSemaphoreGive(xLock);

/* task 2: */
xSemaphoreTake(xLock, portMAX_DELAY);
do_something(counter);
xSemaphoreGive(xLock);
In your specific example where there is one reader and one writer (so not in the "general" case, and definitely not in the case where there are multiple writers), I would suggest a lock is not needed if the variable being written/read is the natural word size of the architecture, and is needed if the variable is not the natural word size of the architecture.
In your case the word size is 32 bits, so if the variable is a uint32_t then it will be updated atomically, and one writer and multiple readers is safe. If on the other hand the variable were a uint64_t then it would be updated (written to) in two separate accesses, and you must ensure the reader does not access the variable in between the two updates, as to do so would be to read a corrupted (half-updated) value.
In FreeRTOS a simple way of doing this would be to use a basic critical section thus:
taskENTER_CRITICAL();
My64BitVariable++;
taskEXIT_CRITICAL();
The best method, though, depends on the frequency of the event. If the event is "not too fast", then why not send the value from one task to another using a queue (sketched below), in which case FreeRTOS takes care of all the concurrency issues for you. Better still (faster and using less RAM), depending on what the receiving task is doing, have the writing task send the value to the receiving task directly using a direct-to-task notification.
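As an illustration of the queue approach, a hedged sketch (task names, priorities, and the 10 ms period are made up; xQueueOverwrite requires a queue of length one and always keeps the latest value):

#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

/* A one-deep queue carrying the counter between the two tasks;
   FreeRTOS handles all the locking internally. */
static QueueHandle_t xCounterQueue;

void vWriterTask(void *pvParameters)
{
    uint32_t counter = 0;
    for (;;) {
        counter++;
        xQueueOverwrite(xCounterQueue, &counter);  /* reader sees latest value */
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void vReaderTask(void *pvParameters)
{
    uint32_t snapshot;
    for (;;) {
        /* Block until a value is available, then act on it. */
        if (xQueueReceive(xCounterQueue, &snapshot, portMAX_DELAY) == pdPASS) {
            /* ... handle the event using snapshot ... */
        }
    }
}

void vStart(void)   /* call before vTaskStartScheduler() */
{
    xCounterQueue = xQueueCreate(1, sizeof(uint32_t));
    xTaskCreate(vWriterTask, "writer", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    xTaskCreate(vReaderTask, "reader", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
}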

having database in memory - C

I am programming a server daemon from which users can query data in C. The data can also be modified from clients.
I thought about keeping the data in memory.
For every new connection I do a fork().
The first thing I realized is that this will generate a copy of the db every time a connection takes place, which is a waste of memory.
Second problem I have is that I don't know how to modify the database in the parent process.
What concepts are there to solve these problems?
Shared memory and multi-threading are two ways of sharing memory between multiple execution units. Check out POSIX Threads for multi-threading, and don't forget to use mutexes and/or semaphores to lock the memory areas against writing while someone is reading.
All this is part of the bigger problem of concurrency. There are multiple books and entire university courses about the problems of concurrency so maybe you need to sit down and study it a bit if you find yourself lost. It's very easy to introduce deadlocks and race conditions into concurrent C programs if you are not careful.
What concepts are there to solve these problems?
Just a few observations:
fork() only clones the memory of the process as it is at the time of the call. If you haven't opened or loaded your database at that stage, it won't be cloned into the child processes.
Shared memory - that is, memory mapped with mmap() and MAP_SHARED - will be shared between processes and will not be duplicated (see the sketch at the end of this answer).
The general term for communicating between processes is Interprocess communication of which there are several types and varieties, depending on your needs.
Aside: on modern Linux systems, fork() implements copy-on-write copying of process memory. You won't actually end up with two copies of a process in memory - you'll end up with one copy that believes it has been copied twice. If you write to any of the memory, it will then be copied. This is an efficiency saving that exploits the fact that most processes alter only a small fraction of their memory as they run, so even if you went for the copy-the-whole-database approach, you might find the memory usage less than you expect - although of course that wouldn't fix your synchronisation problems!
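As a minimal sketch of the MAP_SHARED approach from the observations above (a hypothetical toy "database"; error handling omitted, and MAP_ANONYMOUS may require _DEFAULT_SOURCE on some systems):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* One shared, anonymous mapping created before fork(), so the parent
   and every child see the same pages rather than copies. */
struct db {
    /* A process-shared lock would normally live here too (see the
       read-write lock discussion above); omitted to keep this short. */
    char value[64];
};

int main(void)
{
    struct db *db = mmap(NULL, sizeof *db, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    strcpy(db->value, "initial");

    if (fork() == 0) {          /* child: would handle one client */
        strcpy(db->value, "updated by child");
        return 0;
    }

    wait(NULL);                 /* the parent sees the child's write */
    printf("parent reads: %s\n", db->value);
    munmap(db, sizeof *db);
    return 0;
}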
