My question is quite simple. Is reading and writing from and to a serial port under Linux thread-safe? Can I read and write at the same time from different threads? Is it even possible to do 2 writes simultaneously? I'm not planning on doing so but this might be interesting for others. I just have one thread that reads and another one that writes.
There is little to find about this topic.
More on detail—I am using write() and read() on a file descriptor that I obtained by open(); and I am doing so simultaneously.
Thanks all!
Roel
There are two aspects to this:
What the C implementation does.
What the kernel does.
Concerning the kernel, I'm pretty sure that it will either support this or raise an according error, otherwise this would be too easy to exploit. The C implementation of read() is just a syscall wrapper (See what happens after read is called for a Linux socket), so this doesn't change anything. However, I still don't see any guarantees documented there, so this is not reliable.
If you really want two threads, I'd suggest that you stay with stdio functions (fopen/fread/fwrite/fclose), because here you can leverage the fact that the glibc synchronizes these calls with a mutex internally.
However, if you are doing a blocking read in one thread, the other thread could be blocked waiting to write something. This could be a deadlock. A solution for that is to use select() to detect when there is some data ready to be read or buffer space to be written. This is done in a single thread though, but while the initial code is a bit larger, in the end this approach is easier and cleaner, even more so if multiple streams are involved.
Related
When a program is doing I/O, my understanding is that the thread will briefly sleep and then resume (e.g. when writing to a file). My question is that when we do printing using printf(), does a C program thread sleep in any way ?
Since you've specifically asked for printf(), I'm going to assume that you mean in the most generic way where it will fill a reasonably sized buffer and invoke the system call write(2) to stdout and that the stdout happens to point to your terminal.
In most operating systems, when you invoke certain system calls the calling thread/process is removed from CPU runnable list and placed in a separate waiting list. This is true for all I/O calls like read/write/etc. Being temporarily removed from processing due to I/O is not the same as being put to sleep via a timer.
For example, in Linux there's uninterruptible sleep state of a thread/process specifically meant for I/O waiting, while interruptible sleep state for those thread/process that are waiting on timers and events. Though, from a dumb user's perspective they both seem to be same, their implementation behind the scenes are significantly different.
To answer your question, a call to printf() isn't exactly sleeping but waiting for the buffer to be flushed to device rather than actually being in sleep. Even then there are a few more quirks which you can read about it in signal(7) and even more about various process/thread states from Marek's blog.
Hope this helps.
Much of the point of stdio.h is that it buffers I/O: a call to printf will often simply put text into a memory buffer (owned by the library by default) and perform zero system calls, thus offering no opportunity to yield the CPU. Even when something like write(2) is called, the thread may continue running: the kernel can copy the data into kernel memory (from which it will be transferred to the disk later, e.g. by DMA) and return immediately.
Of course, even on a single-core system, most operating systems frequently interrupt the running thread in order to share it. So another thread can still run at any time, even if no blocking calls are made.
Some Unix code I am working on depends on being able to poll over a small number of pipes. poll is a POSIX system call that (much like the older select) allows the process to wait until one or more file descriptors is "ready" for reading or writing, which means one can proceed to do so without blocking. This is useful to implement event loops where waiting is clearly separated from the rest of the communication.
Is it possible to do the same for Windows pipe handles - wait for one or more of them to become "ready" for reading/writing?
Existing SO advice on the matter, such as answers to this question, recommend the use of completion ports. However as far as I can tell, completion ports require initiating reading/writing beforehand, and then waiting for (or being notified of) the completion of those operations. This approach does not fit the architecture of the code, which strongly separates the polling code from the reading/writing code, the latter calling into a library that uses the regular ReadFile and WriteFile on the underlying handle.
If there is no direct equivalent to poll, could one abuse completion ports to provide something similar? In other words, is it possible to create IO completion events that announce "you can now call ReadFile (WriteFile) on this handle without it blocking" and wait for them using WaitForMultipleObjects or GetQueuedCompletionStatus?
What would happen if you call read (or write, or both) in two different thread, on the same file descriptor (lets says we are interested about a local file, and a it's a socket file descriptor), without using explicitly a synchronization mechanism?
Read and Write are syscall, so, on a single core CPU, it's probably unlucky that two read would be executed "at the same time". But with multiple cores...
What the linux kernel will do?
And let's be a bit more general : is the behavior always the same for other kernels (like BSDs) ?
Edit : According to the close documentation, we should be sure that the file descriptor isn't used by a syscall in an other thread. So it seams that explicit synchronization would be required before closing a file descriptor (and so, also around read/write if thread that may call it are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though depending on the age they are not necessarily signal safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads each byte may be only read once though and writes will go in any undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex semaphore. It removes any dependence on kernel behaviour so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489 line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning)
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to potentially avoid undefined behavior with multi-threading is to assume that you are doing memory operations. E.g. updating a linked list or changing a variable, etc.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
Is there a problem with using pread on the same file descriptor from 2 or more different threads at the same time?
pread itself is thread-safe, since it is not on the list of unsafe functions. So it is safe to call it.
The real question is: what happens if you read from the same file concurrently (not necessarily from two threads, but also from two processes).
Regarding this, the specification says:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Note that it doesn't mention ordinary files. This bit relates only to read anyway, because pread cannot be used on unseekable files.
I/O is intended to be atomic to ordinary files and pipes and FIFOs.
But this is from the non-normative section, so your OS might do it differently. E.g., if you read from two threads and there is a concurrent write, you might get different pieces of the write in your two read buffers. But this kind of problem is not specific to multithreading.
Also nice to know that in some cases
read() shall block the calling thread
Not the process, just the thread. And
A thread that has blocked shall not prevent any unblocked thread [...] from eventually making forward progress
As we are using same fd, we have to bind a lock otherwise there will be mix of data from the two pread on the file descriptor.
Hence yes there is a problem in doing this
http://linux.die.net/man/2/pread
I'm not 100% sure but I think that the file descriptor structure itself isn't thread safe, so two concurrent changes to it would corrupt it. You need some kind of locking.
Is fprintf thread-safe? The glibc manual seems to say it is, but my application, which writes to a file using single call to fprintf() seems to be intermingling partial writes from different processes.
edit: To clarify, the program in question is a lighttpd plugin, and the server is running with multiple worker threads.
Looking at the file, some of the writes are intermingled.
edit 2: It seems the problem I'm seeing might be due to lighttpd's "worker threads" actually being separate processes: http://redmine.lighttpd.net/wiki/lighttpd/Docs:MultiProcessor
Problems
By running 2 or more processes on the
same socket you will have a better
concurrency, but will have a few
drawbacks that you have to be aware
of:
mod_accesslog might create broken access logs, as the same file is opened twice and is NOT synchronized.
mod_status will have n separate counters, one set for each
process.
mod_rrdtool will fail as it receives the same timestamp twice.
mod_uploadprogress will not show correct status.
You're confusing two concepts - writing from multiple threads and writing from multiple processes.
Inside a process its possible to ensure that one invocation of fprintf is completed before the next is allowed access to the output buffer, but once your app pumps that output to a file you're at the mercy of the OS. Without some kind of OS based locking mechanism you cant ensure that an entirely different application doesnt write to your log file.
Sounds to me like you need to read on file locking. The problem you have is that multiple processes (i.e. not threads) are writing to the same file simultaneously and there is no reliable way to insure the writes will be atomic. This can result in files overwriting each other's writes, mixed output, and altogether non-deterministic behaviour.
This has nothing to do with Thread Safety, as this is relevant only in single-process multithreading programs.
The current C++ standard says nothing useful about concurrency, nor does the 1990 C standard. (I haven't read the 1999 C standard, so can't comment on it; the upcoming C++0x standard does say things, but I don't know exactly what offhand.)
This means that fprintf() itself is likely neither thread-safe nor otherwise, and that it would depend on the implementation. I'd read exactly what the glibc documentation says about it, and compare it to exactly what you're doing.