Atomic writes to a file descriptor - c

I'm reading about pipes (pipe(7)) in Linux and came across the following:
POSIX.1 says that write(2)s of less than PIPE_BUF bytes must be
atomic: the output data is written to the pipe as a contiguous
sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the
kernel may interleave the data with data written by other processes.
POSIX.1 requires PIPE_BUF to be at least 512 bytes. (On Linux,
PIPE_BUF is 4096 bytes.)
This is not quite clear to me. Does POSIX require that all writes of less than PIPE_BUF bytes are atomic? Or is this true only for pipes created with pipe(int[2], int)?

The quoted behavior is pipe-specific, but it applies to all pipes, no matter how they were created (e.g. by pipe, mkfifo + open, etc.).
From the POSIX description of write:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
[...]
Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
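For illustration, here is a minimal sketch (mine, not part of the answer): if each record fits in PIPE_BUF and is emitted with a single write(), the record reaches the pipe or FIFO contiguously, even with several writer processes.
/* Sketch (assumption, not from the original answer): writing fixed-size
 * binary records to a pipe or FIFO.  Because sizeof(struct event) <= PIPE_BUF,
 * each record is written contiguously, even when several writer processes
 * share the same pipe. */
#include <limits.h>   /* PIPE_BUF */
#include <unistd.h>

struct event {
    int source;
    int value;
};

_Static_assert(sizeof(struct event) <= PIPE_BUF,
               "record must fit in PIPE_BUF to stay atomic");

/* fd is the write end of a pipe, or a FIFO opened with open("path", O_WRONLY) */
int send_event(int fd, const struct event *ev)
{
    ssize_t n = write(fd, ev, sizeof *ev);   /* one record, one write() */
    return n == (ssize_t)sizeof *ev ? 0 : -1;
}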

Related

Pipe - communicating with multiple forked child processes

I am writing a parent process that needs to count events from a group of child processes.
I am going to use pipe() to achieve this.
Can I open a single pipe on the parent, and then fork 4 child processes that will use that same pipe to communicate with the parent, or must I create 4 different pipes? (1 for each child process)
It is important to state that the parent never communicates with the child processes. All it does is: Count and sum up the rate at which the child processes raise events.
Also: in case I can use a shared pipe, what would the atomicity of the messages be? Do I have to keep them one byte long, or can I assume that two 4-byte messages will not get their bytes interleaved?
You can use a single pipe.
You don't need to limit yourself to single-byte events.
man 7 pipe on Linux states:
PIPE_BUF
POSIX.1 says that write(2)s of less than PIPE_BUF bytes must be
atomic: the output data is written to the pipe as a contiguous
sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the
kernel may interleave the data with data written by other processes.
POSIX.1 requires PIPE_BUF to be at least 512 bytes. (On Linux,
PIPE_BUF is 4096 bytes.)
(Related: The description of write in POSIX.)
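A minimal sketch of the single-pipe approach (the loop counts and message format are my own assumptions, matching the 4-byte messages mentioned in the question): four children each write int-sized event counts into one shared pipe, and the parent simply sums what it reads. Since sizeof(int) is far below PIPE_BUF, the messages cannot be interleaved.
/* Sketch: one pipe, four writer children, one counting parent. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); exit(1); }

    for (int c = 0; c < 4; c++) {
        pid_t pid = fork();
        if (pid == 0) {                    /* child: event producer */
            close(fds[0]);
            for (int i = 0; i < 1000; i++) {
                int events = 1;            /* one event per "tick" */
                write(fds[1], &events, sizeof events);   /* 4 bytes: atomic */
            }
            _exit(0);
        }
    }

    close(fds[1]);                         /* parent keeps only the read end */
    long total = 0;
    int events;
    while (read(fds[0], &events, sizeof events) == (ssize_t)sizeof events)
        total += events;                   /* EOF once all children exit */

    while (wait(NULL) > 0)
        ;
    printf("total events: %ld\n", total);
    return 0;
}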
Another option is to use a datagram Unix socket pair created with socketpair instead of a pipe. In this case each write creates a separate datagram and each read returns exactly one datagram. This way messages can be larger than PIPE_BUF and still be atomic.
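A sketch of that socketpair variant (the message text and buffer sizes are illustrative): with SOCK_DGRAM the kernel preserves message boundaries, so each read() returns exactly one message, up to the socket's buffer limits.
/* Sketch: datagram socketpair instead of a pipe. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1) {
        perror("socketpair");
        return 1;
    }

    if (fork() == 0) {                     /* child: writer */
        close(sv[0]);
        const char *msg = "a whole message sent as one datagram";
        write(sv[1], msg, strlen(msg));    /* one write == one datagram */
        _exit(0);
    }

    close(sv[1]);
    char buf[65536];
    ssize_t n = read(sv[0], buf, sizeof buf);   /* returns exactly one datagram */
    if (n > 0)
        printf("got %zd bytes: %.*s\n", n, (int)n, buf);

    wait(NULL);
    return 0;
}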
Try using a named pipe, as in this example: https://www.geeksforgeeks.org/named-pipe-fifo-example-c-program/ Run the reader and then the writer and it should do the work. You should also think about message queues, which are an asynchronous way of communicating between processes.

Is a write operation in unix atomic? [duplicate]

This question already has answers here:
Atomicity of `write(2)` to a local filesystem
I was reading APUE (Advanced Programming in the UNIX Environment), and came across this question when I saw §3.11:
if (lseek(fd, 0L, 2) < 0)          /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100)    /* and write */
    err_sys("write error");
APUE says:
This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. [...] The problem here is that our logical operation of "position to the end of file and write" requires two separate function calls (as we've shown it). Any operation that requires more than one function call cannot be atomic, as there is always the possibility that the kernel might temporarily suspend the process between the two function calls.
It just says the kernel may switch processes between the lseek and write calls. I want to know whether it can also switch in the middle of a write operation. In other words, is write atomic? If thread A writes "aaaaa" and thread B writes "bbbbb", could the result be "aabbbbbaaa"?
What's more, after that APUE says pread and pwrite are atomic operations. Does that mean these functions use a mutex or lock internally to be atomic?
To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:
Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)
The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be performed at the end of the file:
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
With respect to pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
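A small sketch of that pread/pwrite property (the record size and slot layout are my own illustration): the offset is part of the call, so no other thread's lseek can move the write somewhere else, and the shared file offset is left untouched.
/* Sketch: positional write into a file of fixed-size records. */
#define _XOPEN_SOURCE 700   /* for pwrite on strict compilers */
#include <unistd.h>

/* Write a 64-byte record into slot `slot` of a record file. */
int write_slot(int fd, long slot, const char record[64])
{
    off_t where = (off_t)slot * 64;
    ssize_t n = pwrite(fd, record, 64, where);  /* seek + write in one call */
    return n == 64 ? 0 : -1;                    /* file offset is unchanged */
}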
The Posix sequencing guarantee is as follows (from the Description of the write() and pwrite() functions):
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
Any subsequent successful write() to the same byte position in the file shall overwrite that file data.
As mentioned in the Rationale, this wording does guarantee that two simultaneous write calls (even in different, unrelated processes) will not interleave data, because if data were interleaved during a write which will eventually succeed, the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.
It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, makes it impossible for the kernel itself to provide guarantees about write which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as Posix itself mentions (at the end of the paragraph quoted from the Rationale):
This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.
The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.
Since writing to regular files is typically accomplished via kernel buffers, and actually synching the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantees don't have to be very long-lasting. But they must be taken inside the filesystem, because nothing in the Posix wording limits the guarantees to simultaneous writes within a single threaded process.
Within a multithreaded process, Posix does require read(), write(), pread() and pwrite() to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.
In Linux there are blocking and non-blocking system calls. write is an example of a blocking system call, which means the executing thread is blocked until the write completes. So once the user process has called write, it cannot execute anything else until the system call is complete. So from the user thread's perspective it behaves atomically (although at the kernel level a lot can happen, and the kernel's execution of the system call can be interrupted many times).

Are pipe reads atomic on Linux (multiple writers, one reader)?

I have multiple processes (and multiple threads within some processes) writing to a single named pipe. The pipe is opened with O_WRONLY for each writer.
I have another process reading from this pipe, blocking with select. The pipe is opened with O_RDONLY | O_NONBLOCK in the reader.
When select in the reader wakes up, will read return at most one chunk of data available, or could it return multiple chunks? If the former, then I expect after I read the first chunk, select will immediately wake up until I finish reading the remaining chunks.
Or could read return less than one of the chunks written by a writer?
I'm writing and reading strings, and they're all less than PIPE_BUF, so I know the writes are atomic. I can easily append a delimiter to check for multiple strings, but I'm just curious how it works on Linux.
read will return all the data available in the pipe, up to the size requested; it doesn't matter how many writes were used to produce that data. When there is more data in the pipe than requested, the number of bytes returned equals the size requested, and select will again return immediately, indicating that there is still data to read.
You will have to delimit each chunk you write to the pipe and separate the chunks after reading.
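If you do delimit the messages, the reader side can look something like this sketch (the function name and buffer sizes are mine): accumulate whatever read() returns and split it on a newline, keeping any trailing partial message for the next pass.
/* Sketch: split newline-delimited strings out of whatever read() returns. */
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void drain_pipe(int fd)
{
    static char pending[2 * PIPE_BUF];
    static size_t used = 0;

    ssize_t n = read(fd, pending + used, sizeof pending - used);
    if (n <= 0)
        return;                      /* EOF, EAGAIN, or error: nothing new */
    used += (size_t)n;

    char *start = pending;
    char *nl;
    while ((nl = memchr(start, '\n', used - (size_t)(start - pending)))) {
        *nl = '\0';
        printf("message: %s\n", start);   /* one complete string */
        start = nl + 1;
    }
    /* keep any partial trailing message for the next call */
    used -= (size_t)(start - pending);
    memmove(pending, start, used);
}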

What is the size of a pipe? [duplicate]

Possible Duplicate:
Pipe buffer size is 4k or 64k?
In Linux, which header file specifies the size available for writes on a pipe?
I capture latency of my main application per configurable cycle and write that data to a pipe. A separate reporting process reads off that pipe. Typically, the main application exchanges about 10,000 messages per second. So, given a cycle of one second, the main application collects 10k latency data points for each message exchange and then writes them off to a pipe on a second's boundary. I have following questions in this scenario
Is there a way to specify the size of the pipe at creation, so I can ensure there is adequate write space in the pipe?
Are writes to a pipe expensive? How is a pipe implemented? Do writes to a pipe go against some mmap'd file or an in-memory buffer?
Is there a way to specify the size of a pipe at creation? Maybe. Since Linux 2.6.35 you can use fcntl(2) with the F_SETPIPE_SZ operation to set the pipe buffer up to the limit in /proc/sys/fs/pipe-max-size. On earlier versions, no, but I suppose you could use the socket mechanism instead. It would be slower for most purposes, but you can specify the amount of buffering up to wmem_max (see socket(7)), and you have certain other controls over kernel memory allocation.
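A sketch of that fcntl approach (assuming Linux >= 2.6.35; the 1 MiB figure is just an example, and unprivileged requests above /proc/sys/fs/pipe-max-size will fail):
/* Sketch: grow a pipe's buffer with F_SETPIPE_SZ. */
#define _GNU_SOURCE          /* F_SETPIPE_SZ, F_GETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    printf("default pipe size: %d\n", fcntl(fds[0], F_GETPIPE_SZ));

    if (fcntl(fds[1], F_SETPIPE_SZ, 1 << 20) == -1)   /* ask for 1 MiB */
        perror("F_SETPIPE_SZ");

    printf("new pipe size:     %d\n", fcntl(fds[0], F_GETPIPE_SZ));
    return 0;
}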
Are writes to a pipe expensive? No. But write(2) is a kernel call, so pipe I/O should be buffered if possible.
How are pipes implemented? With kernel code that transfers data in and out of the system buffer cache.
Do writes to a pipe go against some mmap'd file or an in-memory buffer? It's an in-memory buffer.

What happens if a write system call is called on the same file by 2 different processes simultaneously?

Does the OS handle it correctly?
Or will I have to call flock()?
Although the OS won't crash, and the filesystem won't be corrupted, calls to write() are NOT guaranteed to be atomic, unless the file descriptor in question is a pipe and the amount of data to be written is PIPE_BUF bytes or less. The relevant part of the standard:
An attempt to write to a pipe or FIFO has several major characteristics:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
[...]
As such, in principle, you must lock with simultaneous writers, or your written data may get mixed up and out of order (even within the same write) or you may have multiple writes overwriting each other. However, there is an exception - if you pass O_APPEND, your writes will be effectively atomic:
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
Although this is not necessarily atomic with respect to non-O_APPEND writes, or simultaneous reads, if all writers use O_APPEND, and you synchronize somehow before doing a read, you should be okay.
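As a sketch of that O_APPEND pattern (the path, format, and helper name are my own): every process opens the file with O_APPEND and emits each record with a single write(), so records never overwrite each other, even without flock.
/* Sketch: append one record to a shared log file. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int log_line(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    char line[256];
    int len = snprintf(line, sizeof line, "[pid %ld] %s\n", (long)getpid(), msg);
    if (len < 0 || (size_t)len >= sizeof line)
        len = (int)(sizeof line - 1);          /* truncated: keep what fits */

    /* One write() per record: O_APPEND moves the offset to EOF and writes
     * with no intervening modification by other O_APPEND writers. */
    ssize_t n = write(fd, line, (size_t)len);
    close(fd);
    return n == len ? 0 : -1;
}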
write (and writev, too) guarantee atomicity.
Which means if two threads or processes write simultaneously, you do not have a guarantee which one writes first. But you do have the guarantee that anything that is in one syscall will not be intermingled with data from the other one.
Insofar it will always work correctly, but not necessarily in the way you expect (if you assume that process A comes before process B).
Of course the kernel will handle it correctly, for the kernel’s idea of correctness — which is by definition correct.
If you have a set of coöperating flockers, then you can use the kernel to queue everyone up. But remember that flock has nothing to do with I/O: it will not stop someone else from writing the file. It will at most only interfere with other flockers.
Yes of course it will work correctly. It won't crash the OS or the process.
Whether it makes any sense depends on the way the application(s) are written and what the file's purpose is.
If the file is opened by all processes as append-only, each process (notionally) does an atomic seek-to-end before each write; these are guaranteed not to overwrite each other's data (but of course, the order is nondeterministic).
In any case, if you use a library which potentially splits a single logical write into several write syscalls, expect trouble.
write(), writev(), read(), readv() can generate partial writes/reads where the amount of data transferred is smaller than what was requested.
Quoting the Linux man page for writev():
Note that it is not an error for a successful call to transfer fewer bytes than requested.
Quoting the POSIX man page:
If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written.
AFAIU, O_APPEND does not help in this regard because it does not prevent partial writes: it only ensures that whatever data is written is appended at the end of the file.
See this bug report from the Linux kernel:
A process is writing a messages to the file. [...] the writes [...] can be split in two. [...] So if the signal arrives [...] the write is interrupted. [...] this is perfectly valid behavior as far as spec (POSIX, SUS,...) is concerned
Writes to FIFOs and pipes of PIPE_BUF bytes or fewer, however, are guaranteed to be atomic.
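A common sketch for dealing with those partial writes (the helper name is mine, not from the answer): retry until everything has been transferred. Note that this restores the "all bytes written" property, but it does not make the combined record atomic; if the loop ends up issuing more than one write(), another O_APPEND writer can still slip in between them.
/* Sketch: write the whole buffer, retrying on partial writes and EINTR. */
#include <errno.h>
#include <unistd.h>

ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before writing: retry */
            return -1;               /* real error */
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)count;
}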
