Why does `aio_write` disallow concurrent buffer access?

The Linux man-page for aio_write says
The buffer area being written out must not be accessed during the operation or undefined results may occur.
The emphasis on "accessed" is mine; strictly interpreted, that covers not only stores to the buffer but also loads from it.
The man-page on Mac OS X says
Modifications of the Asynchronous I/O Control Block structure or the buffer contents after the request has been enqueued, but before the request has completed, are not allowed.
This sounds slightly more reasonable; the buffer can be read from, but not modified. The consequences of a violation are still left vague, though.
Thinking about how this might be implemented in the OS, I can't see why a read access would ever be a problem, and the only problem I can imagine from a concurrent write is that the data actually written could be some arbitrary mix of the initial buffer contents and the concurrent stores to the buffer.
Undefined behaviour opens up a lot of possibilities, however, and with that in mind we could get SIGSEGV on access (the underlying page was locked to prevent concurrent access?), or reads could return garbage data (the file system does in-place encryption or compression?), or the file could be left with permanently unreadable blocks (block checksummed, then modified concurrently, then written out?). Undefined behaviour does not even exclude crashing the storage device firmware, or the OS.
My question is, what could actually, reasonably happen, given the systems and hardware we have? I assume the language is left intentionally vague to not constrain future implementations.

Linux, BSD (macOS is a BSD flavour), and POSIX say different things.
POSIX says:
For any system action that changes the process memory space while an
asynchronous I/O is outstanding to the address range being changed,
the result of that action is undefined.
The Linux manual seems more restrictive; there are two possibilities:
It is a matter of interpretation: the author may have been thinking of write accesses but simply wrote "accessed".
It may really mean any access, because the implementation is free to use a mechanism that forbids any access (severe locking or protection during the I/O).
BSD also says:
If the request is successfully enqueued, the value of iocb->aio_offset
can be modified during the request as context, so this value must not be
referenced after the request is enqueued
thus explicitly forbids some read accesses (to the control structure).
As Martin said in a comment: I don't know why anyone would want to access the structs/buffers before I/O completion is notified. But it is also too restrictive: OK, that is clear for write accesses, but one can imagine a (not common) scenario where you want read access to the buffer during the I/O (writing out a framebuffer's contents while also displaying it, or the like).
Whatever, if you violate the restrictions anything bad may happen, so don't violate them.

My question is, what could actually, reasonably happen, given the systems and hardware we have? I assume the language is left intentionally vague to not constrain future implementations.
Take a close look at the API:
int aio_write(struct aiocb *aiocbp);
Notice that it does not take a pointer to const? The warning is quite clear: once you pass the aiocbp parameter to aio_write(), the data belongs to the AIO code until the operation is complete. You could read the data, but what can you reasonably expect its state to be? According to the API and the spec, you can't expect anything at all. Even observed behavior could appear to be totally random. In addition, AIO may lock the cache lines for that block for performance (consistency) reasons, so any reads from another core could interfere with the performance of the entire system.
In the absence of lock/unlock semantics, anytime you pass non-const data off to another thread of execution, you cannot reasonably expect to consistently read anything from that data block, until whatever API you are using has signaled completion of whatever work you expect it to perform. This is true whether their documentation says so or not.
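For illustration, here is a minimal sketch of honouring that contract with POSIX AIO: the buffer and the control block are handed to aio_write() and then left untouched until aio_error() reports completion (the file name and buffer contents are made up):

    /* Sketch: submit an asynchronous write and leave both the buffer and
     * the aiocb alone until the operation has completed. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[4096] = "payload";
        struct aiocb cb;

        int fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return 1;

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_offset = 0;

        if (aio_write(&cb) != 0) return 1;

        /* Do not touch buf or cb here; wait for completion instead. */
        while (aio_error(&cb) == EINPROGRESS) {
            const struct aiocb *list[1] = { &cb };
            aio_suspend(list, 1, NULL);          /* block until done */
        }

        ssize_t n = aio_return(&cb);             /* buf may be reused now */
        printf("wrote %zd bytes\n", n);
        close(fd);
        return 0;
    }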

Related

Is it possible to get 2 lines of code to always occur in order in a multithreaded program without locks?

atomic_compare_exchange_strong_explicit(mem, old, new, <mem_order>, <mem_order>);
ftruncate(fd, <size>);
All I want is that these two lines of code always occur without any interference (WITHOUT USING LOCKS): immediately after that CAS, ftruncate(2) should be called. I have read a short description of memory orders, although I don't understand them well, but they seemed to make this possible. Is there any way to do this?
Your title asks for the things to occur in order. That's easy, and C basically does that automatically with memory_order_seq_cst; all visible side effects of the CAS will appear before any from ftruncate.
(Not strictly required by the ISO C standard, but in practice real implementations implement seq-cst with a full barrier, except AArch64 where STLR doesn't stall to drain the store buffer unless/until there's a LDAR while the seq-cst store is still in the store buffer. But a system call is definitely going to also include a full barrier.)
Within the thread doing the operation, the atomic is Sequenced Before the system call.
What kind of interference are you worried about? Some other thread changing the size of the file? You can't prevent that race condition.
There's no way to combine some operation on memory + a system call into a single atomic transaction. You would need to use a hypothetical system call that atomically does what you want. (Presumably it would have to do locking inside the kernel to make a file operation and a memory modification appear as one atomic transaction.) e.g. the Linux futex system call atomically does a couple things, but of course there's nothing like this for any other operations.
Or you need locking. (Or to suspend all other threads of your process somehow.)
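As a sketch (the function and the state variable here are hypothetical), this is what that ordering does and does not give you:

    /* Within the calling thread, the CAS is sequenced before ftruncate(),
     * so its effects are visible first; but the pair is NOT one atomic
     * transaction: another thread can act between the two calls. */
    #include <stdatomic.h>
    #include <unistd.h>

    int resize_if_owner(atomic_int *state, int fd, off_t new_size)
    {
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(state, &expected, 1,
                memory_order_seq_cst, memory_order_seq_cst)) {
            /* Another thread may still observe or modify the file between
             * the CAS above and the ftruncate() below; only a lock (or a
             * hypothetical combined syscall) could make them appear atomic. */
            return ftruncate(fd, new_size);
        }
        return -1;   /* we did not win the CAS */
    }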

In C, how do I make sure that a memory load is performed only once?

I am programming two processes that communicate by posting messages to each other in a segment of shared memory. Although the messages are not accessed atomically, synchronization is achieved by protecting the messages with shared atomic objects accessed with store-releases and load-acquires.
My problem is about security. The processes do not trust each other. Upon receiving a message, a process makes no assumption about the message being well formed; it first copies the message from shared memory to private memory, then performs some validation on this private copy and, if valid, proceeds to handle this same private copy. Making this private copy is crucial, as it prevents a TOC/TOU attack in which the other process would modify the message between validation and use.
My question is the following: does the standard guarantee that a clever C compiler will never decide that it can read the original instead of the copy? Imagine the following scenario, in which the message is a simple integer:
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
...
handle(private);
}
If the compiler runs out of registers and temporarily needs to spill private, could it decide, instead of spilling it to the stack, that it can simply discard its value and reload it from *pshared later, provided that an alias analysis ensures that this thread has not changed *pshared?
My guess is that such a compiler optimization would not preserve the semantics of the source program, and would therefore be illegal: pshared does not point to an object that is provably reachable from this thread only (such as an object allocated on the stack whose address has not leaked), therefore the compiler cannot rule out that some other thread might concurrently modify *pshared. By contrast, the compiler may eliminate redundant loads, because one of the possible behaviors is that no other thread runs between the redundant loads, therefore the current thread must be ready to deal with this particular behavior.
Could anyone confirm or refute that guess, and possibly provide references to the relevant parts of the standard?
(By the way: I assume that the message type has no trap representations, so that loads are always defined.)
UPDATE
Several posters have commented on the need for synchronization, which I did not intend to get into, since I believe that I already have this covered. But since people are pointing that out, it is only fair that I provide more details.
I am implementing a low-level asynchronous communication system between two entities that do not trust each other. I run tests with processes, but will eventually move to virtual machines on top of a hypervisor. I have two basic ingredients at my disposal: shared memory and a notification mechanism (typically, injecting an IRQ into the other virtual machine).
I have implemented a generic circular buffer structure with which the communicating entities can produce messages, then send the aforementioned notifications to let each other know when there is something to consume. Each entity maintains its own private state that tracks what it has produced/consumed, and there is a shared state in shared memory composed of message slots and atomic integers tracking the bounds of the regions holding pending messages. The protocol unambiguously identifies which message slots are to be exclusively accessed by which entity at any time. When it needs to produce a message, an entity writes a message (non atomically) to the appropriate slot, then performs an atomic store-release to the appropriate atomic integer to transfer the ownership of the slot to the other entity, then waits until memory writes have completed, then sends a notification to wake up the other entity. Upon receiving a notification, the other entity is expected to perform an atomic load-acquire on the appropriate atomic integer, determine how many pending messages there are, then consume them.
The load of *pshared in my code snippet is just an example of what consuming a trivial (int) message looks like. In a realistic setting, the message would be a structure. Consuming a message does not need any particular atomicity or synchronization, since, as specified by the protocol, it only happens when the consuming entity has synchronized with the other one and knows that it owns the message slot. As long as both entites follow the protocol, everything works flawlessly.
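For concreteness, a minimal sketch of the ownership-transfer step just described (the slot layout and names are made up for illustration, not taken from the actual implementation):

    #include <stdatomic.h>

    #define NSLOTS 16
    struct slot { int payload; };

    struct ring {
        struct slot slots[NSLOTS];
        _Atomic unsigned head;        /* index of the next slot to publish */
    };

    /* Producer: fill the slot non-atomically, then publish with a store-release. */
    void produce(struct ring *r, unsigned idx, int value)
    {
        r->slots[idx % NSLOTS].payload = value;
        atomic_store_explicit(&r->head, idx + 1, memory_order_release);
    }

    /* Consumer: a load-acquire on head tells it how far it may safely read. */
    unsigned published(const struct ring *r)
    {
        return atomic_load_explicit(&r->head, memory_order_acquire);
    }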
Now, I do not want the entities to have to trust each other. Their implementation must be robust against a malicious entity that would disregard the protocol and write all over the shared memory segment at any time. If this were to happen, the only thing the malicious entity should be able to achieve is to disrupt the communication. Think of a typical server, which must be prepared to handle ill-formed requests by a malicious client without letting such misbehavior cause buffer overflows or out-of-bound accesses.
So, while the protocol relies on synchronization for normal operation, the entities must be prepared for the contents of shared memory to change at any time. All I need is a way to make sure that, after an entity makes a private copy of a message, it validates and uses that same copy, and never accesses the original any more.
I have an implementation that copies the message using a volatile read, thus making it clear to the compiler that the shared memory does not have ordinary memory semantics. I believe that this is sufficient; I wonder whether it is necessary.
You should inform the compiler that the shared memory can change at any moment by using the volatile qualifier.
volatile int *pshared;
...
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
...
handle(private);
}
As *pshared is declared to be volatile, the compiler can no longer assume that *pshared and private keep the same value.
Per your edit, it is now clear that we all agree that a volatile qualifier on the shared memory is sufficient to guarantee that the compiler will honour the ordering of all accesses to that shared memory.
Anyway, draft N1256 for C99 is explicit about it in 5.1.2.3 Program execution (emphasis mine)
2 Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment. Evaluation of an expression may produce side effects. At
certain specified points in the execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place.
5 The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses are
complete and subsequent accesses have not yet occurred
— At program termination, all data written into files shall be identical to the result that
execution of the program according to the abstract semantics would have produced.
That suggests that even if pshared is not qualified as volatile, the value of private must have been loaded from *pshared before the evaluation of is_valid, and since the abstract machine has no reason to change it before the evaluation of handle, a conforming implementation should not change it. At most it could remove the call to handle if it contained no side effects, which is unlikely to happen.
Anyway, this is only an academic discussion, because I cannot imagine a real use case where shared memory would not need the volatile qualifier. If you do not use it, the compiler is free to believe that the previous value is still valid, so on a second access you would still get the first value. So even if the answer to this question is that it is not necessary, you still have to use volatile int *pshared;.
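To make the same point concrete for a structure-sized message, here is a minimal sketch (the message type and helper are made up): the whole message is copied out of shared memory through a volatile-qualified pointer, and only the private copy is then validated and used, so the compiler cannot legally go back and re-read the shared original:

    #include <stddef.h>

    struct msg { int type; int len; char payload[56]; };   /* hypothetical */

    static void copy_message(volatile const struct msg *pshared, struct msg *priv)
    {
        /* memcpy() discards volatile, so copy byte by byte through the
         * volatile pointer; every byte is read from shared memory once. */
        const volatile unsigned char *src = (const volatile unsigned char *)pshared;
        unsigned char *dst = (unsigned char *)priv;
        for (size_t i = 0; i < sizeof *priv; i++)
            dst[i] = src[i];
    }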
It's hard to answer your question as posted. Note that you must use a synchronization object to prevent concurrent accesses, unless you are only reading units which are atomic on the platform.
I am assuming that you intend to ask about (pseudocode):
lock_shared_area();
int private = *pshared;
unlock_shared_area();
if (is_valid(private))
and that the other process also uses the same lock. (If not, it would be good to update your question to be a bit more specific about your synchronization).
This code guarantees to read *pshared at most once. Using the name private means to read the variable private, not the object *pshared. The compiler "knows" that the call to unlock the area acts as a memory fence and it won't reorder operations past the fence.
Since the C standard doesn't have any concept of interprocess communication, there is nothing you can do to inform the compiler that there is another process that might be modifying the memory.
Thus, I believe there is no way to prevent a sufficiently clever, malicious, but conforming build system from invoking the "as if" rule to allow it to do the Wrong Thing.
To get something that is 'guaranteed' to work, you need to work with whatever guarantees are given by your specific compiler and/or the shared memory library you're using.

Is write/fwrite guaranteed to be sequential?

Is data written via write (or fwrite) guaranteed to be persisted to disk in a sequential manner, in particular with respect to fault tolerance? If the system should fail during the write, will it behave as though the first bytes were written first and writing stopped mid-stream (as opposed to random blocks being written)?
Also, are sequential calls to write/fwrite guaranteed to be sequential? According to POSIX I find only that a call to read is guaranteed to consider a previous write.
I'm asking as I'm creating a fault tolerant data store that persists to disks. My logical order of writing is such that faults won't ruin the data, but if the logical order isn't being obeyed I have a problem.
Note: I'm not asking if persistence is guaranteed. Only that if my calls to write do eventually persist they obey the order in which I actually write.
The POSIX docs for write() state that "If the O_DSYNC bit has been set, write I/O operations on the file descriptor shall complete as defined by synchronized I/O data integrity completion". Presumably, if the O_DSYNC bit isn't set, then the synchronization of I/O data integrity completion is unspecified. POSIX also says that "This volume of POSIX.1-2008 is also silent about any effects of application-level caching (such as that done by stdio)", so I think there is no guarantee for fwrite().
I am not an expert, but I might know enough to point you in the right direction:
The most disastrous case is if you lose power, so that is the only one worth considering.
Start with a file with X bytes of meaningful content, and a header that indicates it.
Write Y bytes of meaningful content somewhere that won't invalidate X.
Call fsync (slow!).
Update the header (probably has to be less than your disk's block size).
I don't know if changing the length of a file is safe. I don't know how much depends on the filesystem mount mode, except that any "safe" mode is probably completely unusable for systems that need even a slight level of performance.
Keep in mind that on some systems the fsync call lies and just returns without actually making anything durable. You can tell because it returns quickly. For this reason, you need to make pretty large transactions (i.e. much larger than application-level transactions).
Keep in mind that the kind of people who solve this problem in the real world get paid high 6 figures at least. The best answer for the rest of us is either "just send the data to postgres and let it deal with it." or "accept that we might have to lose data and revert to an hourly backup."
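A minimal sketch of the append-then-commit pattern outlined above (the file layout, a fixed-size header holding the count of valid bytes followed by a data region, is an assumption made for illustration):

    #include <stdint.h>
    #include <unistd.h>

    #define HDR_SIZE 512   /* assumed to fit in a single disk block */

    int append_committed(int fd, uint64_t old_len, const void *buf, size_t n)
    {
        /* 1. Write the new data past the region the header currently declares valid. */
        if (pwrite(fd, buf, n, (off_t)(HDR_SIZE + old_len)) != (ssize_t)n)
            return -1;

        /* 2. Make the data durable before the header points at it. */
        if (fsync(fd) != 0)
            return -1;

        /* 3. Only now publish the new length in the header (a sub-block write). */
        uint64_t new_len = old_len + n;
        if (pwrite(fd, &new_len, sizeof new_len, 0) != (ssize_t)sizeof new_len)
            return -1;
        return fsync(fd);
    }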
No, in general, as far as POSIX and reality are concerned, filesystems do not provide such a guarantee. The order of persistence (in which the disk makes data permanent on the platter) is not dictated by the order in which the syscalls were made, the position within the file, or the order of sectors on disk. Filesystems retain the data to be written in memory for several seconds, hoarding as much as possible, and later send it to disk in batches, in whatever order they see fit. And regardless of how the kernel sends it to disk, the disk itself can reorder writes on its own due to NCQ.
Filesystems have means of ensuring some ordering for safety. Barriers were used in the past, and explicit flushes and FUA requests nowadays. There is a good article on LWN about that. But those are used by filesystems, not applications.
I strongly suggest reading the article about Application Level Consistency. Not sure how relevant to you but it shows many behaviors that developers wrongly assumed in the past.
The answer from o11c describes a good way to do it.
Yes, so long as we are not talking about adding in the complexity of multithreading. It will be in the same order on disk, for whatever makes it to disk. The data is buffered in memory and dumped to disk when the buffer fills up or when you close the file.

Will WriteFile() be atomic if the process is terminated but the system continues running?

If my process is terminated at a random moment but the operating system continues to run properly, will Windows guarantee that individual calls to WriteFile are atomic (a.k.a. all-or-nothing)?
Or can I get partial/torn writes?
Note: I am specifically NOT asking for advice on how to practice defensive coding.
This is strictly a question about the behavior of the Microsoft Windows operating system itself.
To be 100% perfectly crystal clear, we can and explicitly do trust the user code to behave sanely. There is no undefined behavior or anything of the sort. All process terminations are assumed to occur through a well-defined behavior such as unhandled exceptions or calls to TerminateProcess, not memory corruption, etc.
Also, specifically note that there are no C++ destructors to worry about here; this is C.
I hope that puts all the secondary concerns about the user code to rest.
WriteFile is certainly not atomic in the case of your process being terminated while it is executing; it is not even atomic if your process is not being killed.
Also, "all or nothing written" is not even a proper definition of an atomic write. All could be written, but intermingled with an independent write from another process. If writes are guaranteed to be atomic, there must be a guarantee (read as: lock) that this doesn't happen.
Apart from the fact that implementing proper atomicity would be considerable extra trouble with very little to gain for the average everyday user, you can also guess that WriteFile is not atomic from:
The absence of mention in the API documentation. You can bet that this would be prominently mentioned, as it is a really big, distinguishing feature.
The presence of the lpNumberOfBytesWritten parameter. A write might still fail (e.g. disk full) but if the function was guaranteed to be atomic, you would know that it either succeeded or failed, and you already know how many bytes you were going to write, so returning that number is unnecessary.
The presence of TxF. Although TxF does a lot more than just making single writes atomic, it is reasonable to assume that Microsoft wouldn't have spent considerable time and money implementing such a beast if "normal" filesystem operations already more or less worked like that anyway.
No other mainstream operating system that I know of gives such a guarantee. Linux does give a sort of atomicity guarantee on writev (but not on write), insofar as your writes will not be intermingled with writes from other processes. But that is not at all the same thing as guaranteeing atomicity in the presence of process termination.
However, overlapped writes on a handle opened with FILE_FLAG_NO_BUFFERING are technically atomic in respect of process termination (but not in respect of failure, such as disk full or in any other respect!). Saying so is admittedly a bit of a sophistry on an implementation detail, not an actual guarantee given by the operating system, but from a certain point of view it's certainly correct to say so.
A process that is performing an unbuffered, overlapped I/O operation cannot be terminated. That is because the OS is doing DMA transfers into that process's address space, and terminating the process would mean reclaiming those physical pages mid-transfer. The OS will therefore refuse to terminate a process while such an I/O operation is running.
You can verify this by firing off a couple of big unbuffered overlapped requests (a few GB) and try killing your process in Task Manager. It will only be killed when the I/O is complete (so, after some seconds). That comes as a big surprise when you see it happen for the first time and don't expect it!
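If you want to reproduce that observation, a rough sketch along those lines (error handling omitted, file name made up) might look like this:

    #include <windows.h>

    int main(void)
    {
        /* One large unbuffered, overlapped write; VirtualAlloc gives the
         * sector alignment FILE_FLAG_NO_BUFFERING requires. */
        const DWORD size = 1u << 30;   /* 1 GiB */
        void *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                                 PAGE_READWRITE);

        HANDLE h = CreateFileA("big.bin", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                               FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                               NULL);

        OVERLAPPED ov = {0};
        WriteFile(h, buf, size, NULL, &ov);      /* returns immediately */

        /* Try killing the process in Task Manager now; termination is
         * deferred until the transfer has finished. */
        DWORD written = 0;
        GetOverlappedResult(h, &ov, &written, TRUE);
        CloseHandle(h);
        return 0;
    }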

Reading Critical Section Data using pthreads

I have a multi-threaded application, I'm using pthreads with the pthread_mutex_lock function. The only data I need to protect is in one data structure. Is it safe if I apply the lock only when I write to the data structure? Or should I apply the lock whenever I read or write?
I found a question similar to this, but it was for Windows; from that question it would seem that the answer to mine is that it is OK. Just want to make sure, though.
EDIT
follow up: So I want to pass in a command line argument and only read from it (from different threads). Do I still have to use pthread_mutex_lock?
You could use a pthread_rwlock_t to allow "one writer OR N readers" concurrency. But if you stick with the general pthread_mutex_lock, it needs to be acquired for ANY access to the shared data structure it's protecting, so you're cutting things down to "one reader-or-writer" concurrency.
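A minimal sketch of the rwlock approach (the data structure here is a stand-in for whatever you are actually protecting):

    #include <pthread.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static struct { int value; /* ... */ } shared_data;

    int read_value(void)
    {
        pthread_rwlock_rdlock(&lock);    /* many readers may hold this at once */
        int v = shared_data.value;
        pthread_rwlock_unlock(&lock);
        return v;
    }

    void write_value(int v)
    {
        pthread_rwlock_wrlock(&lock);    /* exclusive: excludes readers and writers */
        shared_data.value = v;
        pthread_rwlock_unlock(&lock);
    }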
It is necessary to apply the lock when you read as well unless you can guarantee atomic writes (at which point you don't even need to lock on write). The problem arises from writes that take more than 1 cycle.
Imagine you write 8 bytes as two 4-byte writes. If the other thread kicks in after the value has been half written, the read will see invalid data. It's very uncommon for this to happen, but when it does it's a hell of a bug to track down.
Yes, you need to be locked for reads as well as writes.
Compilers and CPUs do not necessarily write to a field in a structure atomically. In addition, your code may not write atomically, and the structure may at certain points be internally inconsistent.
If all you need to share is a single integer value, you might choose to use atomic integers. GCC has atomic built-ins you can use. This is not as portable as using pthreads locks.
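For the single-integer case, a minimal sketch using C11 atomics (the GCC __atomic built-ins are equivalent where <stdatomic.h> is unavailable):

    #include <stdatomic.h>

    static atomic_int shared_flag;

    void writer(void) { atomic_store(&shared_flag, 42); }   /* no mutex needed */
    int  reader(void) { return atomic_load(&shared_flag); }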

Resources