Is fprintf thread-safe? The glibc manual seems to say it is, but my application, which writes to a file using a single call to fprintf(), seems to be intermingling partial writes from different processes.
edit: To clarify, the program in question is a lighttpd plugin, and the server is running with multiple worker threads.
Looking at the file, some of the writes are intermingled.
edit 2: It seems the problem I'm seeing might be due to lighttpd's "worker threads" actually being separate processes: http://redmine.lighttpd.net/wiki/lighttpd/Docs:MultiProcessor
Problems
By running 2 or more processes on the same socket you will have a better concurrency, but will have a few drawbacks that you have to be aware of:
mod_accesslog might create broken access logs, as the same file is opened twice and is NOT synchronized.
mod_status will have n separate counters, one set for each process.
mod_rrdtool will fail as it receives the same timestamp twice.
mod_uploadprogress will not show correct status.
You're confusing two concepts - writing from multiple threads and writing from multiple processes.
Inside a process it's possible to ensure that one invocation of fprintf completes before the next is allowed access to the output buffer, but once your app pumps that output to a file you're at the mercy of the OS. Without some kind of OS-based locking mechanism you can't ensure that an entirely different application doesn't write to your log file.
Sounds to me like you need to read up on file locking. The problem you have is that multiple processes (i.e. not threads) are writing to the same file simultaneously, and there is no reliable way to ensure the writes will be atomic. This can result in processes overwriting each other's writes, mixed output, and altogether non-deterministic behaviour.
This has nothing to do with thread safety, which is relevant only within a single multithreaded process.
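As a rough illustration of the file locking mentioned above, here is a minimal sketch of one way cooperating processes could serialize their appends. The helper name `log_line` is hypothetical, and `flock(2)` is a BSD/Linux call rather than part of POSIX, so its availability is an assumption. The lock is advisory: it only helps if every writer uses the same helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: append one complete record to a log file that
 * several processes share. Returns 0 on success, -1 on error. */
int log_line(const char *path, const char *line)
{
    /* O_APPEND makes every write() land at the current end of file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* Advisory exclusive lock: other processes calling flock() on the
     * same file wait here until we unlock. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    const char *p = line;
    size_t len = strlen(line);
    int rc = 0;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    flock(fd, LOCK_UN);
    close(fd);
    return rc;
}
```

Opening and closing the file on every call keeps the sketch short; a long-running server would more likely keep the descriptor open and only take the lock around each record.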
The current C++ standard says nothing useful about concurrency, nor does the 1990 C standard. (I haven't read the 1999 C standard, so can't comment on it; the upcoming C++0x standard does say things, but I don't know exactly what offhand.)
This means the standard neither requires fprintf() to be thread-safe nor forbids it; it depends on the implementation. I'd read exactly what the glibc documentation says about it, and compare it to exactly what you're doing.
Related
What would happen if you call read (or write, or both) in two different threads on the same file descriptor (let's say we are interested in both a local file and a socket file descriptor), without explicitly using a synchronization mechanism?
read and write are syscalls, so on a single-core CPU it's probably unlikely that two reads would be executed "at the same time". But with multiple cores...
What will the Linux kernel do?
And let's be a bit more general: is the behavior always the same for other kernels (like the BSDs)?
Edit: According to the close documentation, we should be sure that the file descriptor isn't being used by a syscall in another thread. So it seems that explicit synchronization would be required before closing a file descriptor (and so also around read/write if threads that may call them are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Depending on the age of the OS, though, they are not necessarily signal-safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads, though, each byte is delivered to only one reader, and concurrent writes land in an undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex or semaphore. It removes any dependence on kernel behaviour, so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489-line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning).
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
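A minimal sketch of that idea with POSIX threads follows. The names `fd_lock` and `send_message` are made up for illustration, and a single global mutex assumes only one shared descriptor; real code would probably keep one mutex per descriptor.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* One lock guarding the shared descriptor (assumption: a single fd). */
static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: write a whole message while holding the mutex,
 * so messages from different threads cannot interleave.
 * Returns 0 on success, -1 on error. */
int send_message(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    pthread_mutex_lock(&fd_lock);
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may be a short write */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted, retry */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&fd_lock);
    return rc;
}
```

Holding the lock across the retry loop is what keeps a message contiguous even when write() returns after a partial write.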
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to potentially avoid undefined behavior with multi-threading is to assume that you are doing memory operations. E.g. updating a linked list or changing a variable, etc.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
I wrote a C program that contains many printf() calls which output log information to stdout. Now I want to run the program as multiple simultaneous processes with different arguments, and redirect the output from stdout to a log file using >.
But because the processes run at the same time, their log output overlaps, which can be confusing for later analysis.
One solution is this: since the processes will exit at different times, modify the C program so that each piece of log information is first written to a temporary file. When the program is about to exit, it reads the temporary file and writes the content to stdout. This requires a lot of modification.
My idea is: I would like all the printf() output in the C program to be buffered, and written to stdout (and so to the redirection) only when the process exits.
Is this possible or not?
Thanks!
This is not really possible unless you are sure that the output is reasonably bounded (e.g. the total output is less than a few megabytes). Otherwise, use a logging mechanism which sends to some central logger (like syslog).
On Linux and most Posix systems, the simplest way to do logging would be to use syslog(3) which is designed for logging (and is able to deal with different processes). I think this is the preferable approach.
With GNU libc, you could consider using open_memstream(3) -to write to memory, and here you need to be sure the total output is bounded- and use atexit(3) to have the memory stream written at the exit of the program into some file; you probably want to use some locking mechanism like flock(2) etc...
As commented by J.Holetzeck the simplest way is to redirect output into different files (perhaps using freopen(3), or simply in the invoking shell), and later merge these files.
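The open_memstream(3) plus atexit(3) idea could look roughly like the sketch below. The names `log_stream`, `log_init` and `log_flush_to` are hypothetical, the total output is assumed to fit in memory, and the flock(2) step suggested above is left out to keep it short.

```c
#define _POSIX_C_SOURCE 200809L   /* for open_memstream() */
#include <stdio.h>
#include <stdlib.h>

static FILE  *log_stream;  /* log with fprintf(log_stream, ...) */
static char  *log_buf;     /* filled in by the memory stream */
static size_t log_len;

/* Copy everything logged so far to `out` and release the buffer. */
void log_flush_to(FILE *out)
{
    if (log_stream == NULL)
        return;
    fclose(log_stream);            /* finalizes log_buf and log_len */
    log_stream = NULL;
    fwrite(log_buf, 1, log_len, out);
    fflush(out);
    free(log_buf);
    log_buf = NULL;
    log_len = 0;
}

/* Registered with atexit(): dump the buffered log to stdout. */
static void flush_log_at_exit(void)
{
    log_flush_to(stdout);
}

/* Returns 0 on success, -1 on failure. */
int log_init(void)
{
    log_stream = open_memstream(&log_buf, &log_len);
    if (log_stream == NULL)
        return -1;
    return atexit(flush_log_at_exit) == 0 ? 0 : -1;
}
```

Note that atexit handlers run on a normal exit() or return from main, not when the process is killed by a signal or calls _exit(), so a crash would lose the buffered log.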
I'm guessing you use Linux, or some Posix system. For Windows, I have no idea.
My question is quite simple. Is reading and writing from and to a serial port under Linux thread-safe? Can I read and write at the same time from different threads? Is it even possible to do 2 writes simultaneously? I'm not planning on doing so but this might be interesting for others. I just have one thread that reads and another one that writes.
There is little to find about this topic.
In more detail: I am using write() and read() on a file descriptor that I obtained from open(), and I am doing so simultaneously.
Thanks all!
Roel
There are two aspects to this:
What the C implementation does.
What the kernel does.
Concerning the kernel, I'm pretty sure that it will either support this or return an appropriate error; otherwise this would be too easy to exploit. The C implementation of read() is just a syscall wrapper (see "What happens after read is called for a Linux socket"), so this doesn't change anything. However, I still don't see any guarantees documented there, so this is not reliable.
If you really want two threads, I'd suggest that you stay with stdio functions (fopen/fread/fwrite/fclose), because here you can leverage the fact that the glibc synchronizes these calls with a mutex internally.
However, if you are doing a blocking read in one thread, the other thread could be blocked waiting to write something. This could be a deadlock. A solution for that is to use select() to detect when there is some data ready to be read or buffer space to be written. This is done in a single thread, and while the initial code is a bit larger, in the end this approach is easier and cleaner, even more so if multiple streams are involved.
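The select() approach can be sketched as follows. The helper `fd_ready` and the `READY_*` constants are invented for this example; for a serial port the read and write descriptors would normally be the same fd.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

enum { READY_READ = 1, READY_WRITE = 2 };

/* Hypothetical helper: wait up to timeout_ms for rfd to become readable
 * or wfd to become writable. Returns a bitmask of READY_* flags,
 * 0 on timeout, or -1 on error. */
int fd_ready(int rfd, int wfd, int timeout_ms)
{
    fd_set rfds, wfds;
    struct timeval tv;
    int maxfd = rfd > wfd ? rfd : wfd;
    int mask = 0;

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_SET(rfd, &rfds);
    FD_SET(wfd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() rewrites the sets to hold only the ready descriptors. */
    if (select(maxfd + 1, &rfds, &wfds, NULL, &tv) < 0)
        return -1;

    if (FD_ISSET(rfd, &rfds))
        mask |= READY_READ;
    if (FD_ISSET(wfd, &wfds))
        mask |= READY_WRITE;
    return mask;
}
```

A single-threaded loop would call this, then read() only when READY_READ is set and write() only when READY_WRITE is set, so neither call blocks the other.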
Is there a problem with using pread on the same file descriptor from 2 or more different threads at the same time?
pread itself is thread-safe, since it is not on the list of unsafe functions. So it is safe to call it.
The real question is: what happens if you read from the same file concurrently (not necessarily from two threads, but also from two processes).
Regarding this, the specification says:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Note that it doesn't mention ordinary files. This bit relates only to read anyway, because pread cannot be used on unseekable files.
I/O is intended to be atomic to ordinary files and pipes and FIFOs.
But this is from the non-normative section, so your OS might do it differently. E.g., if you read from two threads and there is a concurrent write, you might get different pieces of the write in your two read buffers. But this kind of problem is not specific to multithreading.
Also nice to know that in some cases
read() shall block the calling thread
Not the process, just the thread. And
A thread that has blocked shall not prevent any unblocked thread [...] from eventually making forward progress
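To make the pread case concrete, here is a small sketch in which two threads read different ranges of one descriptor. The struct `read_job` and the function names are hypothetical. It relies on the fact that pread takes an explicit offset and leaves the descriptor's file offset unchanged, so the two calls do not share a cursor.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical job description: one 16-byte read at a given offset. */
struct read_job {
    int     fd;
    off_t   offset;
    char    buf[16];
    ssize_t got;       /* result of pread(), -1 on error */
};

static void *read_at(void *arg)
{
    struct read_job *job = arg;
    /* pread() does not move the shared file offset. */
    job->got = pread(job->fd, job->buf, sizeof job->buf, job->offset);
    return NULL;
}

/* Run both reads concurrently on the same fd. Returns 0 if both
 * threads could be started, -1 otherwise. */
int read_two_ranges(int fd, struct read_job *a, struct read_job *b)
{
    pthread_t ta, tb;

    a->fd = fd;
    b->fd = fd;
    if (pthread_create(&ta, NULL, read_at, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, read_at, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

The same two reads done with lseek() followed by read() would race on the shared offset, which is the situation where a lock becomes necessary.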
As we are using the same fd, we have to take a lock; otherwise the data from the two pread calls on the file descriptor will get mixed.
So yes, there is a problem in doing this.
http://linux.die.net/man/2/pread
I'm not 100% sure but I think that the file descriptor structure itself isn't thread safe, so two concurrent changes to it would corrupt it. You need some kind of locking.
Is writing to stdout using printf thread-safe on Linux? What about using the lower-level write command?
It's not specified by the C standard -- it depends on your implementation of the C standard library. In fact, the C standard doesn't even mention threads at all, since certain systems (e.g. embedded systems) don't have multithreading.
In the GNU implementation (glibc), most of the higher-level functions in stdio that deal with FILE* objects are thread-safe. The ones that aren't usually have unlocked in their names (e.g. getc_unlocked(3)). However, the thread safety is at a per-function-call level: if you make multiple calls to printf(3), for example, each of those calls is guaranteed to output atomically, but other threads might print things out between your calls to printf(). If you want to ensure that a sequence of I/O calls gets output atomically, you can surround them with a pair of flockfile(3)/funlockfile(3) calls to lock the FILE handle. Note that these functions are reentrant, so you can safely call printf() in between them, and that won't result in deadlock even though printf() itself makes a call to flockfile().
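A minimal sketch of the flockfile(3)/funlockfile(3) pattern described above; the helper name `print_record` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for flockfile()/funlockfile() */
#include <stdio.h>

/* Hypothetical helper: emit one record using several stdio calls
 * without letting other threads print in between them. */
void print_record(FILE *out, const char *key, int value)
{
    flockfile(out);               /* take the stream's lock */
    fprintf(out, "%s", key);      /* each call re-locks recursively */
    fprintf(out, " = ");
    fprintf(out, "%d\n", value);
    funlockfile(out);             /* let other threads write again */
}
```

This only serializes threads within one process that share the same FILE object; it does nothing for other processes writing to the same underlying file.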
The low-level I/O calls such as write(2) should be thread-safe, but I'm not 100% sure of that - write() makes a system call into the kernel to perform I/O. How exactly this happens depends on what kernel you're using. It might be the sysenter instruction, or the int (interrupt) instruction on older systems. Once inside the kernel, it's up to the kernel to make sure that the I/O is thread-safe. In a test I just did with the Darwin Kernel Version 8.11.1, write(2) appears to be thread-safe.
Whether you'd call it "thread-safe" depends on your definition of thread-safe. POSIX requires stdio functions to use locking, so your program will not crash, corrupt the FILE object states, etc. if you use printf simultaneously from multiple threads. However, all stdio operations are formally specified in terms of repeated calls to fgetc and fputc, so there is no larger-scale atomicity guaranteed. That is to say, if threads 1 and 2 try to print "Hello\n" and "Goodbye\n" at the same time, there's no guarantee that the output will be either "Hello\nGoodbye\n" or "Goodbye\nHello\n". It could just as well be "HGelolodboy\ne\n". In practice, most implementations will acquire a single lock for the entire higher-level write call simply because it's more efficient, but your program should not assume so. There may be corner cases where this is not done; for instance an implementation could probably entirely omit locking on unbuffered streams.
Edit: The above text about atomicity is incorrect. POSIX guarantees all stdio operations are atomic, but the guarantee is hidden in the documentation for flockfile: http://pubs.opengroup.org/onlinepubs/9699919799/functions/flockfile.html
All functions that reference ( FILE *) objects shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these ( FILE *) objects.
You can use the flockfile, ftrylockfile, and funlockfile functions yourself to achieve larger-than-single-function-call atomic writes.
They are both thread-safe to the point that your application won't crash if multiple threads call them on the same file descriptor. However, without some application-level locking, whatever is written could be interleaved.
C got a new standard since this question was asked (and last answered).
C11 now comes with multithreading support and addresses multithreaded behavior of streams:
§7.21.2 Streams
¶7 Each stream has an associated lock that is used to prevent data races when multiple threads of execution access a stream, and to restrict the interleaving of stream operations performed by multiple threads. Only one thread may hold this lock at a time. The lock is reentrant: a single thread may hold the lock multiple times at a given time.
¶8 All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.
So, an implementation with C11 threads must guarantee that using printf is thread-safe.
Whether atomicity (as in no interleaving [1]) is guaranteed wasn't clear to me at first glance, because the standard speaks of restricting interleaving, as opposed to preventing it, which is what it mandates for data races.
I lean towards it being guaranteed. The standard speaks of restricting interleaving because some interleaving that doesn't change the outcome is still allowed to happen; e.g. fwrite some bytes, fseek back some more and fwrite up to the original offset, so that both fwrites are back-to-back. The implementation is free to reorder these 2 fwrites and merge them into a single write.
[1]: See the strike-through text in R..'s answer for an example.
It's thread-safe; printf should be reentrant, and you won't cause any strangeness or corruption in your program.
You can't guarantee that your output from one thread won't start half way through the output from another thread. If you care about that you need to develop your own locked output code to prevent multiple access.