Is writing to stdout using printf thread-safe on Linux? What about using the lower-level write command?
It's not specified by the C standard -- it depends on your implementation of the C standard library. In fact, the C standard doesn't even mention threads at all, since certain systems (e.g. embedded systems) don't have multithreading.
In the GNU implementation (glibc), most of the higher-level functions in stdio that deal with FILE* objects are thread-safe. The ones that aren't usually have unlocked in their names (e.g. getc_unlocked(3)). However, the thread safety is at a per-function call level: if you make multiple calls to printf(3), for example, each of those calls is guaranteed to output atomically, but other threads might print things out between your calls to printf(). If you want to ensure that a sequence of I/O calls gets output atomically, you can surround them with a pair of flockfile(3)/funlockfile(3) calls to lock the FILE handle. Note that these functions are reentrant, so you can safely call printf() in between them, and that won't result in deadlock even thought printf() itself makes a call to flockfile().
The low-level I/O calls such as write(2) should be thread-safe, but I'm not 100% sure of that - write() makes a system call into the kernel to perform I/O. How exactly this happens depends on what kernel you're using. It might be the sysenter instruction, or the int (interrupt) instruction on older systems. Once inside the kernel, it's up to the kernel to make sure that the I/O is thread-safe. In a test I just did with the Darwin Kernel Version 8.11.1, write(2) appears to be thread-safe.
Whether you'd call it "thread-safe" depends on your definition of thread-safe. POSIX requires stdio functions to use locking, so your program will not crash, corrupt the FILE object states, etc. if you use printf simultaneously from multiple threads. However, all stdio operations are formally specified in terms of repeated calls to fgetc and fputc, so there is no larger-scale atomicity guaranteed. That is to say, if threads 1 and 2 try to print "Hello\n" and "Goodbye\n" at the same time, there's no guarantee that the output will be either "Hello\nGoodbye\n" or "Goodbye\nHello\n". It could just as well be "HGelolodboy\ne\n". In practice, most implementations will acquire a single lock for the entire higher-level write call simply because it's more efficient, but your program should not assume so. There may be corner cases where this is not done; for instance an implementation could probably entirely omit locking on unbuffered streams.
Edit: The above text about atomicity is incorrect. POSIX guarantees all stdio operations are atomic, but the guarantee is hidden in the documentation for flockfile: http://pubs.opengroup.org/onlinepubs/9699919799/functions/flockfile.html
All functions that reference ( FILE *) objects shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these ( FILE *) objects.
You can use the flockfile, ftrylockfile, and funlockfile functions yourself to achieve larger-than-single-function-call atomic writes.
They are both thread-safe to the point that your application won't crash if multiple threads call them on the same file descriptor. However, without some application-level locking, whatever is written could be interleaved.
C got a new standard since this question was asked (and last answered).
C11 now comes with multithreading support and addresses multithreaded behavior of streams:
§7.21.2 Streams
¶7 Each stream has an associated lock that is used to prevent data races when multiple threads of execution access a stream, and to restrict the interleaving of stream operations performed by multiple threads. Only one thread may hold this lock at a time. The lock is reentrant: a single thread may hold the lock multiple times at a given time.
¶8 All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.
So, an implementation with C11 threads must guarantee that using printf is thread-safe.
Whether atomicity (as in no interleaving1) is guaranteed, wasn't that clear to me at a first glance, because the standard spoke of restricting interleaving, as opposed to preventing, which it mandated for data races.
I lean towards it being guaranteed. The standard speaks of restricting interleaving, as some interleaving that doesn't change the outcome is still allowed to happen; e.g. fwrite some bytes, fseek back some more and fwrite till the original offset, so that both fwrites are back-to-back. The implementation is free to reorder these 2 fwrites and merge them into a single write.
1: See the strike-through text in R..'s answer for an example.
It's thread-safe; printf should be reentrant, and you won't cause any strangeness or corruption in your program.
You can't guarantee that your output from one thread won't start half way through the output from another thread. If you care about that you need to develop your own locked output code to prevent multiple access.
Related
This question already has answers here:
Atomicity of `write(2)` to a local filesystem
(4 answers)
Closed 4 years ago.
I was reading the APUE(Advanced Programming in the UNIX Environment), and come across this question when I see $3.11:
if (lseek(fd, 0L, 2) < 0) /* position to EOF */
err_sys("lseek error");
if (write(fd, buf, 100) != 100) /* and write */
err_sys("write error")
APUE says:
This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. .......The problem here is that our logical operation of ‘‘position to the end of file and
write’’ requires two separate function calls (as we’ve shown it). Any operation that requires more than one function call cannot be atomic, as there is always the possibility that the kernel might temporarily suspend the process between the two function calls.
It just says cpu will switch between function calls between lseek and write, I want to know if it will also switch in half write operation? Or rather, is write atomic? If threadA writes "aaaaa", threadB writes "bbbbb", will the result be "aabbbbbaaa"?
What's more,after that APUE says pread and pwrite are all atomic operations, does that mean these functions use mutex or lock internally to be atomic?
To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:
Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)
The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be performed at the end of the file:
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
With respect to pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
The Posix sequencing guarantee is as follows (from the Description of the write() and pwrite() functions):
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
Any subsequent successful write() to the same byte position in the file shall overwrite that file data.
As mentioned in the Rationale, this wording does guarantee that two simultaneous write calls (even in different unrelated processes) will not interleave data, because if data were interleaved during a write which will eventually succeed the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.
It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, make it impossible for the kernel itself to provide guarantees about write which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as is mentioned as well by Posix (at the end of the paragraph quoted from the Rationale):
This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.
The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.
Since writing to regular files is typically accomplished via kernel buffers and actually synching the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantee don't have to be very long-lasting. But they must be done inside the filesystem because nothing in the Posix wording limits the guarantees to simultaneous writes within a single threaded process.
Within a multithreaded process, Posix does require read(), write(), pread() and pwrite() to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.
In Linux there are blocking and non-blocking system calls. The write is an example of blocking system call, which means the execution thread will be blocked until the write completes. So once the user process called write, it can not execute anything else until the system call is complete. So from user thread perspective it will behave like atomic [although at kernel level lot many things can happen and kernel execution of system call can be interrupted many times].
It seems that glibc's implementation of fprintf() is thread-safe, but is that so for Microsoft's CRT, as well?
By thread-safe, I don't mean just crashing, but also that if multiple threads (in the same process) call fprintf(), the texts will not be mixed.
That is, for example, if thread A calls fprintf(stdout, "aaaa"); and thread B calls fprintf(stdout, "bbbb"); it's guaranteed not to mix to become aabbaabb.
Is there such a guarantee?
Yes. In the multithreaded runtime libraries, every stream has an associated lock. This lock is acquired at the beginning of any call to a printf function and not released until just before that printf function returns.
This behavior is required by C11 (there was no concept of "threads" in standard C until C11). C11 §7.21.2/7-8 states:
Each stream has an associated lock that is used to prevent data races when multiple
threads of execution access a stream, and to restrict the interleaving of stream operations performed by multiple threads. Only one thread may hold this lock at a time. The lock is reentrant: a single thread may hold the lock multiple times at a given time.
All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.
Visual C++ does not fully support C11, but it does conform to this requirement. A couple of other Visual C++-specific comments:
As long as you are not defining _CRT_DISABLE_PERFCRIT_LOCKS (which only works with the statically-linked runtime libraries, libcmt.lib and friends) or using the _nolock-suffixed functions, then most operations on a single stream are atomic.
If you require atomicity across multiple operations on a stream, you can acquire the lock for a file yourself by acquiring and releasing the stream lock yourself using _lock_file and _unlock_file.
I'm writing a server web.
Each connection is served by a separate thread, so I don't know in advance the number of threads.
There are also a group of text files (don't know the number, too), and each thread can read/write on each file.
A file can be written by just one thread a time, but different threads can write on different files at the same time.
If a file is read by one or more threads (reads can be concurrent), no thread can write on THAT file.
Now, I noticed this (Thread safe multi-file writing) solution, but I'd like also to use functions as fgets(), for example.
So, can I flock() a file, and then use a fgets() or another stdio read/write library function?
First of all, use fcntl, not flock. The latter is a non-standard, deprecated BSD function and does not work with NFS and possibly other filesystems. fcntl locking on the other hand is POSIX standard and is intended to work everywhere.
Now if you want to use file-level reader-writer locking mixed with stdio, it will work, but you have to take some care to ensure that buffering does not break your assumptions about locks. The method I'm about to explain is not the only one, but I believe it's the clearest/simplest:
When you want to operate on one of your files with stdio, obtaining the correct type of lock (read or write, aka shared of exclusive) should be the first thing you do after fopen. Use fileno to get the file descriptor number and apply the lock to it. After that, perform your entire read or write operation. Do not make any attempt to unlock the file; instead, call fclose to close the file and let it be implicitly unlocked when it's closed. Otherwise you may release the lock while unbuffered data is still unwritten, or later read data that was buffered before the lock was released, that's no longer valid after the lock is released.
Does the OS handle it correctly?
Or will I have to call flock()?
Although the OS won't crash, and the filesystem won't be corrupted, calls to write() are NOT guarenteed to be atomic, unless the file descriptor in question is a pipe, and the amount of data to be written is PIPE_MAX bytes or less. The relevant part of the standard:
An attempt to write to a pipe or FIFO has several major characteristics:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
[...]
As such, in principle, you must lock with simultaneous writers, or your written data may get mixed up and out of order (even within the same write) or you may have multiple writes overwriting each other. However, there is an exception - if you pass O_APPEND, your writes will be effectively atomic:
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
Although this is not necessarily atomic with respect to non-O_APPEND writes, or simultaneous reads, if all writers use O_APPEND, and you synchronize somehow before doing a read, you should be okay.
write (and writev, too) guarantee atomicity.
Which means if two threads or processes write simultaneously, you do not have a guarantee which one writes first. But you do have the guarantee that anything that is in one syscall will not be intermingled with data from the other one.
Insofar it will always work correctly, but not necessarily in the way you expect (if you assume that process A comes before process B).
Of course the kernel will handle it correctly, for the kernel’s idea of correctness — which is by definition correct.
If you have a set of coöperating flockers, then you can use the kernel to queue everyone up. But remember that flock has nothing to do with I/O: it will not stop someone else from writing the file. It will at most only interfere with other flockers.
Yes of course it will work correctly. It won't crash the OS or the process.
Whether it makes any sense, depends on the way the application(s) are written an what the file's purpose is.
If the file is opened by all processes as append-only, each process (notionally) does an atomic seek-to-end before each write; these are guaranteed not to overwrite each others' data (but of course, the order is nondeterministic).
In any case, if you use a library which potentially splits a single logical write into several write syscalls, expect trouble.
write(), writev(), read(), readv() can generate partial writes/reads where the amount of data transferred is smaller than what was requested.
Quoting the Linux man page for writev():
Note that is not an error for a successful call to transfer fewer bytes than requested
Quoting the POSIX man page:
If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written.
AFAIU, O_APPEND does not help in this regard because it does not prevent partial writes: it only ensures that whatever data is written is appended at the end of the file.
See this bug report from the Linux kernel:
A process is writing a messages to the file. [...] the writes [...] can be split in two. [...] So if the signal arrives [...] the write is interrupted. [...] this is perfectly valid behavior as far as spec (POSIX, SUS,...) is concerned
FIFOs and PIPE writes smaller than PIPE_MAX however are guaranteed to be atomic.
Is there a problem with using pread on the same file descriptor from 2 or more different threads at the same time?
pread itself is thread-safe, since it is not on the list of unsafe functions. So it is safe to call it.
The real question is: what happens if you read from the same file concurrently (not necessarily from two threads, but also from two processes).
Regarding this, the specification says:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Note that it doesn't mention ordinary files. This bit relates only to read anyway, because pread cannot be used on unseekable files.
I/O is intended to be atomic to ordinary files and pipes and FIFOs.
But this is from the non-normative section, so your OS might do it differently. E.g., if you read from two threads and there is a concurrent write, you might get different pieces of the write in your two read buffers. But this kind of problem is not specific to multithreading.
Also nice to know that in some cases
read() shall block the calling thread
Not the process, just the thread. And
A thread that has blocked shall not prevent any unblocked thread [...] from eventually making forward progress
As we are using same fd, we have to bind a lock otherwise there will be mix of data from the two pread on the file descriptor.
Hence yes there is a problem in doing this
http://linux.die.net/man/2/pread
I'm not 100% sure but I think that the file descriptor structure itself isn't thread safe, so two concurrent changes to it would corrupt it. You need some kind of locking.