It's often said that one shouldn't use C standard I/O functions (like fprintf(), fscanf()) when working with sockets.
I can't understand why. If the reason were just their buffered nature, couldn't one simply flush the output buffer after each write?
Why does everyone use the UNIX I/O functions instead? Are there any situations in which using the standard C functions is appropriate and correct?
You can certainly use stdio with sockets. You can even write a program that uses nothing but stdin and stdout, run it from inetd (which provides a socket on STDIN_FILENO and STDOUT_FILENO), and it works even though it doesn't contain any socket code at all.
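For illustration, here is a minimal sketch (error handling omitted) of wrapping an already-connected socket descriptor with fdopen() so it can be used with fprintf() and fgets(); sockfd and the request text are hypothetical and would come from your own socket()/connect() code:

#include <stdio.h>
#include <unistd.h>

/* sockfd is assumed to be a connected TCP socket from socket()/connect(). */
void chat(int sockfd)
{
    FILE *in  = fdopen(sockfd, "r");          /* read side of the socket */
    FILE *out = fdopen(dup(sockfd), "w");     /* separate FILE for writing */
    char line[512];

    fprintf(out, "HELO example\r\n");
    fflush(out);                              /* push the request onto the wire */

    if (fgets(line, sizeof line, in) != NULL) /* blocks until a line arrives */
        printf("server said: %s", line);

    fclose(in);
    fclose(out);                              /* also closes the underlying fds */
}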
What you can't do is mix buffered I/O with select or poll because there is no fselect or fpoll working on FILE *'s and you can't even implement one yourself because there's no standard way of querying a FILE * to find out whether its input buffer is empty.
As soon as you need to handle multiple connections, stdio is not good enough.
It's totally fine when you have a simple scenario with one socket in blocking mode and a text-based application protocol.
It quickly becomes a huge pain with more than one socket or with non-blocking sockets, with any sort of binary encoding, and with any real performance requirements.
I don't know of any direct objection; most likely this will work fine.
At the same time, I can imagine a platform where fprintf() and fscanf() have their own buffers sitting above the file-descriptor layer, and you may not be able to flush those buffers.
It is hard to speak for every possible platform, which means it is better to avoid this with sockets.
At the end of the day, the application should solve the application's problem; it should not be a compiler/library test.
It's because sockets (TCP sockets, for example) are readable and writable as if they were files or pipes, but this is just an abstraction. The inner workings of a network connection are much more complicated than a local file or pipe.
To start with, reading a file is always "fast": either you get the data or you hit end-of-file. On the other hand, if you expect 500 bytes from a TCP connection and the peer sends only 499 (and the connection is not closed), you may wait forever. Writing is the same: it blocks once the TCP output buffer fills up.
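To make the short-read problem concrete, here is a typical full-read loop over a plain read(); a hypothetical sketch, not tied to any particular protocol:

#include <errno.h>
#include <unistd.h>

/* Read exactly len bytes; returns -1 on error, or the bytes read so far on EOF. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n == 0)                    /* peer closed the connection */
            break;
        if (n < 0) {
            if (errno == EINTR)        /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}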
Even the most basic program needs to handle timeouts and disconnection, and all of these things interact with FILE's own buffered I/O; not even textbook examples could be expected to work well.
When writing directly to a device in /dev, I open a file descriptor and perform a UNIX write() followed by a read(). Can I have multiple threads doing this write()/read() sequence on the same file descriptor, and not get jumbled data if two threads enter the write() function at the same time?
References to standard documentation would be immensely helpful; I've not been able to find anything, though. Someone has mentioned that such operations are atomic in the kernel, but I am sceptical.
Also, to clarify this is a file in /dev, so any insight as to how far the "file pointer" concept applies here is helpful as well.
File pointers (FILE *fp, for example) are a layer in user-side code sitting above the function calls (such as write()). Access to fp is controlled by locks in a threaded environment (you can't have two threads modifying the same structure at the same time).
Inside the kernel, I'd expect there to be a lock on the file descriptor (and/or 'open file description') to prevent it being used from two threads at once.
You can look up the POSIX specifications for read() and getchar_unlocked() to find out more about locking and so on, at least for a POSIX-compliant implementation.
Note that POSIX still uses C99. Therefore, it is not cognizant of the C11 thread facilities. The C11 standard does not have read() et al (file I/O using file descriptors), so it says nothing about such system calls. Neither does it provide a getchar_unlocked() or any of its relatives.
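As an example of the POSIX stream-locking facilities mentioned above, you can take a FILE's lock explicitly with flockfile() and use the *_unlocked functions inside it; a small sketch of reading one line without other threads interleaving characters on the same stream:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Read one line from fp while holding the FILE's internal lock. */
size_t read_line_locked(FILE *fp, char *buf, size_t size)
{
    size_t i = 0;
    int c;

    flockfile(fp);                       /* take the stream lock */
    while (i + 1 < size && (c = getc_unlocked(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);                     /* release the stream lock */

    buf[i] = '\0';
    return i;
}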
Caveat: I have not been in kernels for a while, but this is the way it used to work.
For disk files:
Can you open the file in append mode, writing blocks of size <= BLKSIZE?
Small enough block sizes guarantee atomic writes in POSIX environments (actually, the limit may be greater than BLKSIZE; I'm too lazy to hunt around for the alternate symbol).
Append mode guarantees a seek to the end of the file before each write, for devices that support seeking. Combined with atomic writes, you should be golden (see the sketch at the end of this answer).
Each buffer must stand by itself, under the assumption that some "foreign" data may follow it.
For ttys:
Append mode makes no sense here. Otherwise it's as before, but paying attention to line endings becomes even more important. And this very much does not apply to reads. Codes that ttys treat as control sequences can also trip you up if a sequence, or the mode it enables, is split across blocks.
For other devices:
Can get tricky here. Depends on the device.
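As a hypothetical illustration of the disk-file advice: each record is written with a single write() on a descriptor opened with O_APPEND, so the seek-to-end and the write happen together with respect to other writers (the path and message are placeholders):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* One write() per record; O_APPEND positions at end-of-file for each call. */
void log_line(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return;
    write(fd, msg, strlen(msg));   /* keep each record in a single write() */
    close(fd);
}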
I'm going to assume that you are referring to a generic character device (e.g. a tty) since you were not specific. As far as I know, each fd-type operation (e.g. read()/write()) maps directly into a call into the driver.
Therefore, the driver will receive each write()'s data chunk as a whole and not see the next one's data until it is done (e.g. data is queued to be transmitted).
However, if the driver does not consume the entire chunk of data at once (i.e. write() returns less than the specified number of bytes), then there is no guarantee that the thread will be able to write the remainder before another thread does a different write().
Also, as Johnathan Leffler noted, if you use standard I/O with process-level buffering, all bets are off.
Bottom line: if you are using direct fd writes, each write will map directly to one driver function call. From there, it's up to the driver whether the write is atomic.
Edit: wlformyd brings up the question of locking between multiple threads on multiple processors. To my knowledge, there is no locking on a FD and, in fact, that would be ineffective as multiple FDs could be used to access the same device.
I believe it is up to the driver itself to do locking to prevent contention on its internal queues and/or hardware. In that sense, on a multi-processor system, the kernel doesn't prevent multiple simultaneous calls into a driver's write routine. However, a properly written driver does the locking needed to prevent mixing of output between two write calls.
man 7 pipe documents writing to the pipe very well.
The part important to me is that a write will only ever be partially completed if O_NONBLOCK is set and the write length is greater than PIPE_BUF.
However, nothing is said about the read end.
I am sending structures representing events through my pipe in blocking mode at the write end.
At the read end, I am processing those events (and other things) in an update loop in non-blocking mode.
Since my struct is smaller than PIPE_BUF, will read() ALWAYS read a whole number of structs? Or do I need to handle the possibility of only part of my struct being read?
Common sense tells me that read behavior will mirror the documented write behavior, but I would be happier if this were specified.
I'm working on Linux (kernel 3.8, x86_64), but it is important that my code be portable across different UNIX flavors and CPU architectures.
Thanks.
Chris.
The comments are right: read is not atomic. The whole point of atomicity of write is to allow multiple writers without corruption from interleaving data. Multiple readers are much less useful, but even if they were useful, supporting atomic reads would require maintaining packet boundaries in pipes, and pipes have no such boundaries.
Reads from a pipe are not atomic.
From the standard page for read()
The standard developers considered adding atomicity requirements to a pipe or FIFO, but recognized that due to the nature of pipes and FIFOs there could be no guarantee of atomicity of reads of {PIPE_BUF} or any other size that would be an aid to applications portability.
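So the read end has to tolerate partial reads. A minimal sketch of accumulating bytes until a whole event struct is available; struct event and the per-pipe state here are hypothetical stand-ins for the questioner's types:

#include <errno.h>
#include <string.h>
#include <unistd.h>

struct event { int type; int payload; };    /* hypothetical event record */

/* Per-pipe accumulation state. */
static char   pending[sizeof(struct event)];
static size_t have;

/* Returns 1 when a complete event has been assembled, 0 if not yet, -1 on error. */
int poll_event(int pipe_fd, struct event *out)
{
    ssize_t n = read(pipe_fd, pending + have, sizeof pending - have);
    if (n > 0)
        have += (size_t)n;
    else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;                           /* real error */

    if (have == sizeof pending) {            /* whole struct arrived */
        memcpy(out, pending, sizeof *out);
        have = 0;
        return 1;
    }
    return 0;                                /* keep the partial bytes for later */
}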
I'm duplicating a "master" pipe with tee() to write to multiple sockets using splice(). Naturally these pipes will get emptied at different rates depending on how much I can splice() to the destination sockets. So when I next go to add data to the "master" pipe and then tee() it again, I may have a situation where I can write 64KB to the pipe but only tee 4KB to one of the "slave" pipes. I'm guessing then that if I splice() all of the "master" pipe to the socket, I will never be able to tee() the remaining 60KB to that slave pipe. Is that true? I guess I can keep track of a tee_offset (starting at 0) which I set to the start of the "unteed" data and then don't splice() past it. So in this case I would set tee_offset to 4096 and not splice more than that until I'm able to tee it to all to the other pipes. Am I on the right track here? Any tips/warnings for me?
If I understand correctly, you've got some realtime source of data that you want to multiplex to multiple sockets. You've got a single "source" pipe hooked up to whatever's producing your data, and you've got a "destination" pipe for each socket over which you wish to send the data. What you're doing is using tee() to copy data from the source pipe to each of the destination pipes and splice() to copy it from the destination pipes to the sockets themselves.
The fundamental issue you're going to hit here is if one of the sockets simply can't keep up - if you're producing data faster than you can send it, then you're going to have a problem. This isn't related to your use of pipes, it's just a fundamental issue. So, you'll want to pick a strategy to cope in this case - I suggest handling this even if you don't expect it to be common as these things often come up to bite you later. Your basic choices are to either close the offending socket, or to skip data until it's cleared its output buffer - the latter choice might be more suitable for audio/video streaming, for example.
The issue which is related to your use of pipes, however, is that on Linux the size of a pipe's buffer is somewhat inflexible. It defaults to 64K since Linux 2.6.11 (the tee() call was added in 2.6.17) - see the pipe manpage. Since 2.6.35 this value can be changed via the F_SETPIPE_SZ option to fcntl() (see the fcntl manpage), up to the limit specified by /proc/sys/fs/pipe-max-size, but the buffering is still more awkward to change on demand than a dynamically allocated scheme in user-space would be. This means that your ability to cope with slow sockets will be somewhat limited - whether this is acceptable depends on the rate at which you expect to receive and be able to send data.
Assuming this buffering strategy is acceptable, you're correct in your assumption that you'll need to track how much data each destination pipe has consumed from the source, and it's only safe to discard data which all destination pipes have consumed. This is somewhat complicated by the fact that tee() doesn't have the concept of an offset - you can only copy from the start of the pipe. The consequence of this is that you can only copy at the speed of the slowest socket, since you can't use tee() to copy to a destination pipe until some of the data has been consumed from the source, and you can't do this until all the sockets have the data you're about to consume.
How you handle this depends on the importance of your data. If you really need the speed of tee() and splice(), and you're confident that a slow socket will be an extremely rare event, you could do something like this (I've assumed you're using non-blocking IO and a single thread, but something similar would also work with multiple threads):
Make sure all pipes are non-blocking (use fcntl(d, F_SETFL, O_NONBLOCK) to make each file descriptor non-blocking).
Initialise a read_counter variable for each destination pipe to zero.
Use something like epoll() to wait until there's something in the source pipe.
Loop over all destination pipes where read_counter is zero, calling tee() to transfer data to each one. Make sure you pass SPLICE_F_NONBLOCK in the flags.
Increment read_counter for each destination pipe by the amount transferred by tee(). Keep track of the lowest resultant value.
Find the lowest resultant value of read_counter - if this is non-zero, then discard that amount of data from the source pipe (using a splice() call with a destination opened on /dev/null, for example). After discarding data, subtract the amount discarded from read_counter on all the pipes (since this was the lowest value then this cannot result in any of them becoming negative).
Repeat from step 3.
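A rough sketch of steps 4 to 6, assuming src_fd is the read end of the source pipe, dest[] holds the write ends of the destination pipes, and devnull is a descriptor opened on /dev/null (all the names and the NDEST count are hypothetical):

#define _GNU_SOURCE            /* for tee() and splice() */
#include <fcntl.h>
#include <unistd.h>

#define NDEST 4                /* hypothetical number of destination pipes */

static size_t read_counter[NDEST];      /* bytes tee()d to each destination */

/* One pass of steps 4-6 over all destination pipes. */
static void distribute(int src_fd, const int dest[NDEST], int devnull)
{
    /* Step 4: tee into every destination pipe that still has nothing. */
    for (int i = 0; i < NDEST; i++) {
        if (read_counter[i] == 0) {
            ssize_t n = tee(src_fd, dest[i], 65536, SPLICE_F_NONBLOCK);
            if (n > 0)
                read_counter[i] += (size_t)n;       /* step 5 */
            /* n < 0 with EAGAIN: that pipe is full -- a slow socket */
        }
    }

    /* Step 6: find the lowest read_counter value... */
    size_t lowest = read_counter[0];
    for (int i = 1; i < NDEST; i++)
        if (read_counter[i] < lowest)
            lowest = read_counter[i];

    /* ...and discard that many bytes from the source pipe via /dev/null. */
    while (lowest > 0) {
        ssize_t n = splice(src_fd, NULL, devnull, NULL, lowest,
                           SPLICE_F_NONBLOCK);
        if (n <= 0)
            break;
        for (int i = 0; i < NDEST; i++)
            read_counter[i] -= (size_t)n;
        lowest -= (size_t)n;
    }
}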
Note: one thing that's tripped me up in the past is that SPLICE_F_NONBLOCK affects whether the tee() and splice() operations on the pipes are non-blocking, and the O_NONBLOCK you set with fcntl() affects whether the interactions with other calls (e.g. read() and write()) are non-blocking. If you want everything to be non-blocking, set both. Also remember to make your sockets non-blocking or the splice() calls to transfer data to them might block (unless that's what you want, if you're using a threaded approach).
As you can see, this strategy has a major problem - as soon as one socket blocks up, everything halts - the destination pipe for that socket will fill up, and then the source pipe will become stagnant. So, if you reach the stage where tee() returns EAGAIN in step 4, you'll want to either close that socket, or at least "disconnect" it (i.e. take it out of your loop) so that you don't write anything else to it until its output buffer is empty. Which you choose depends on whether your data stream can recover from having bits of it skipped.
If you want to cope with network latency more gracefully then you're going to need to do more buffering, and this is going to involve either user-space buffers (which rather negates the advantages of tee() and splice()) or perhaps a disk-based buffer. The disk-based buffering will almost certainly be significantly slower than user-space buffering, and hence not appropriate given that presumably you want a lot of speed since you've chosen tee() and splice() in the first place, but I mention it for completeness.
One thing that's worth noting if you end up inserting data from user-space at any point is the vmsplice() call which can perform "gather output" from user-space into a pipe, in a similar way to the writev() call. This might be useful if you're doing enough buffering that you've split your data among multiple different allocated buffers (for example if you're using a pool allocator approach).
Finally, you could imagine swapping sockets between the "fast" scheme of using tee() and splice() and, if they fail to keep up, moving them on to a slower user-space buffering. This is going to complicate your implementation, but if you're handling large numbers of connections and only a very small proportion of them are slow then you're still reducing the amount of copying to user-space that's involved somewhat. However, this would only ever be a short-term measure to cope with transient network issues - as I said originally, you've got a fundamental problem if your sockets are slower than your source. You'd eventually hit some buffering limit and need to skip data or close connections.
Overall, I would carefully consider why you need the speed of tee() and splice() and whether, for your use-case, simple user-space buffering in memory or on disk would be more appropriate. If you're confident that the speeds will always be high, however, and limited buffering is acceptable, then the approach I outlined above should work.
Also, one thing I should mention is that this will make your code extremely Linux-specific - I'm not aware of these calls being supported in other Unix variants. The sendfile() call is more restricted than splice(), but might be rather more portable. If you really want things to be portable, stick to user-space buffering.
Let me know if there's anything I've covered which you'd like more detail on.
I'm working on a little program that needs to pipe binary streams very closely (unbuffered). It has to rely on select() multiplexing and is never allowed to "hold existing input unless more input has arrived, because it's not worth it yet".
It's possible using system calls, but then again, I would like to use stdio for convenience (string formatting is involved, too).
Can I safely use select() on a stream's underlying file descriptor as long as I'm using unbuffered stdio? If not, how can I determine a FILE stream that will not block from a set?
Is there any call that transfers all input from libc to the application, besides the char-by-char functions (getchar() and friends)?
While I'm not entirely clear on whether it's sanctioned by the standards, using select() on fileno(f) should in practice work when f is unbuffered. Keep in mind, however, that unbuffered stdio can perform pathologically badly, and that you are not allowed to change the buffering except as the very first operation, before you use the stream at all.
If your only concern is being able to do formatted output, the newly-standardized-in-POSIX-2008 dprintf (and vdprintf) function might be a better solution to your problem.
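For instance, dprintf() formats straight to a file descriptor, so there is no stdio buffer involved at all; a minimal sketch with a hypothetical socket descriptor and message format:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Send a formatted status line directly to the socket descriptor. */
void send_status(int sockfd, int code, const char *text)
{
    dprintf(sockfd, "STATUS %d %s\r\n", code, text);
}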
When I need buffered I/O on a blocking file descriptor, I use stdio. But if I turn the file descriptor into non-blocking mode, then according to the manual, stdio buffering is unusable. After some research I see that BIO can be usable for buffering non-blocking I/O.
But maybe there are other alternatives?
I need this to avoid using threads in a multi-connection environment.
I think what you are talking about is the Reactor Pattern. This is a pretty standard way of processing lots of network connections without threads, and is very common in multiplayer game server engines. Another implementation (in Python) is Twisted Matrix.
The basic algorithm is:
have a buffer for each socket
check which sockets are ready to read (select(), poll(), or just iterate)
for each socket:
call recv() and accumulate the contents into the socket's buffer until recv returns 0 or an error with EWOULDBLOCK
call application level data handler for the socket with the contents of the buffer
clear the socket's buffer
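A condensed sketch of that loop in C, assuming a conns[] array with one fd and one buffer per connection and a hypothetical handle_data() callback supplied by the application:

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAXCONN 64
#define BUFSZ   4096

struct conn { int fd; char buf[BUFSZ]; size_t len; };
static struct conn conns[MAXCONN];          /* fd == -1 marks an unused slot */

void handle_data(struct conn *c);           /* hypothetical application handler */

/* One iteration of the reactor loop over all non-blocking sockets. */
static void reactor_tick(void)
{
    fd_set rset;
    int maxfd = -1;

    FD_ZERO(&rset);
    for (int i = 0; i < MAXCONN; i++) {
        if (conns[i].fd < 0) continue;
        FD_SET(conns[i].fd, &rset);
        if (conns[i].fd > maxfd) maxfd = conns[i].fd;
    }
    if (maxfd < 0 || select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
        return;

    for (int i = 0; i < MAXCONN; i++) {
        struct conn *c = &conns[i];
        if (c->fd < 0 || !FD_ISSET(c->fd, &rset)) continue;

        /* Accumulate until the socket would block, closes, or the buffer fills. */
        while (c->len < BUFSZ) {
            ssize_t n = recv(c->fd, c->buf + c->len, BUFSZ - c->len, 0);
            if (n > 0) { c->len += (size_t)n; continue; }
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break;
            close(c->fd); c->fd = -1; break;        /* EOF or error */
        }
        handle_data(c);                     /* application-level data handler */
        c->len = 0;                         /* clear the socket's buffer */
    }
}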
I see the question has been edited now, and is at least more understandable than before.
Anyway, isn't this a contradiction?
You make I/O non-blocking because you want to be able to read small amounts quickly, typically sacrificing throughput for latency.
You make it buffered because you don't care that much about latency, but want to make efficient use of the I/O subsystem by trading latency for throughput.
Doing them both at the same time seems like a contradiction, and is hard to imagine.
What are the semantics you're after? If you do this:
int fd;
char buf[1024];
ssize_t got;
fd = setup_non_blocking_io(...);
got = read(fd, buf, sizeof buf);
What behavior do you expect if there are 3 bytes available? Blocking/buffered I/O might block until it is able to read enough to satisfy your request; non-blocking I/O would return the 3 available bytes immediately.
Of course, if you have some protocol on top, that defines some kind of message structure so that you can know that "this I/O is incomplete, I can't parse it until I have more data", you can buffer it yourself at that level, and not pass data on upwards until a full message has been received.
Depending on the protocol, it is certainly possible that you will need to buffer your reads for a non-blocking network node (client or server).
Typically, these buffers provide multiple indexes (offsets) that record both the position of the last byte processed and the position of the last byte read (the latter being either the same as or greater than the processed offset). They also (should) provide richer semantics, such as compacting the buffer, transparent buffer-size management, etc.
In Java (at least) the non-blocking network io (NIO) packages also provide a set of data structures (ByteBuffer, etc.) that are geared towards providing a general data structure.
Either such data structures exist for C, or you must roll your own. Once you have one, simply read as much data as is available and let the buffer manage issues such as overflow (e.g. reading bytes across message frame boundaries), and use the marker offset to mark off the bytes that you have processed.
As Android pointed out, you will (very likely) need to create matched buffers for each open connection.
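For C, a hypothetical minimal version of such a per-connection buffer, with separate read and processed offsets and a compact step:

#include <string.h>

struct netbuf {
    char   data[8192];
    size_t read_off;        /* bytes received so far */
    size_t proc_off;        /* bytes already consumed by the parser */
};

/* Drop everything that has been processed and slide the rest to the front. */
void netbuf_compact(struct netbuf *b)
{
    size_t remaining = b->read_off - b->proc_off;
    memmove(b->data, b->data + b->proc_off, remaining);
    b->read_off = remaining;
    b->proc_off = 0;
}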
You could create a struct with a buffer for each open file descriptor, then accumulate into these buffers until recv() returns 0 or you have enough data in the buffer to process.
If I understand your question correctly, you can't buffer because with non-blocking I/O you're either writing into the same buffer from multiple connections (if it's global) or only writing small pieces of data (if it's local).
In any case, your program has to be able to identify where the data is coming from (possibly by file descriptor) and buffer it accordingly.
Threading is also an option; it's not as scary as many make it out to be.
Ryan Dahl's evcom library does exactly what you want.
I use it in my job and it works great. Be aware, though, that it doesn't (yet, but coming soon) have async DNS resolving. Ryan suggests udns by Michael Tokarev for that. I'm trying to adopt udns instead of blocking getaddrinfo() now.