Determining that reader has closed when using tee() function in Linux

Determining that reader has closed when using tee() function in Linux - c

When using the tee() system call to move data from one pipe to another, it returns 0 if the writer on the input pipe closes, but how can one discern whether the reader on the output pipe has closed?

For future generations, it looks like the answer is that tee() will return -1 signalling an error, with errno set to EPIPE, when the reader of the pipe has closed it, even though this isn't documented in the man pages. Correspondingly, a SIGPIPE will be generated, so be sure to properly handle that if you value your program continuing to execute.

Related

How can I understand the behavior of pipe, with varying data flow?

My problem is a bit hard to explain properly as I do not understand fully the behavior behind it.
I have been working on pipe and pipelines in C, and I noticed some behavior that is a bit mysterious to me.
Let's take a few example: Let's try to pipe yes with head. (yes | head). Even though I coded the behavior in a custom program, I don't understand how the pipe knows when to stop piping ? It seems two underlying phenomenons are causing this (maybe), the SIGPIPE and/or the internal size a pipe can take. How does the pipe stop piping, is it when it's full ? But the size of a pipe is way superior to 10 "yes\n" no ? And SIGPIPE only works when the end read/write is closed no ?
Also let's take another example, for example cat and ls: cat | ls or even cat | cat | ls.
It seems the stdin of the pipe is waiting for input, but how does it know when to stop, i.e. after one input ? What are the mechanism that permits this behavior?
Also can anyone provide me with others examples of these very specific behavior if there are any in pipes and pipelines so I can get an good overview of theses mechanism ?
In my own implementation, I managed to replicate that behavior using waitpid. However how does the child process itself know when to stop ? Is it command specific ?

The write operation will block when the pipe buffer is full, the read operation will block when the buffer is empty.
When the write end of the pipe is closed, the reading process will get an EOF indication after reading all data from the buffer. Many programs will terminate in this case.
When the read end of the pipe is closed, the writing process will get a SIGPIPE. This will also terminate most programs.
When you run cat | ls, STDOUT of cat is connected to STDIN of ls, but ls does not read from STDIN. On the system where I checked this, ls simply ignores STDIN and the file descriptor will be closed when ls terminates.
You will see the output of ls, and cat will be waiting for input.
cat will not write anything to STDOUT before it has read enough data from STDIN, so it will not notice that the other end of the pipe has been closed.
cat will terminate when it detects EOF on STDIN which can be done by pressing CTRL+D or by redirecting STDIN from /dev/null, or when it gets SIGPIPE after trying to write to the pipe which will happen when you (type something and) press ENTER.
You can see the behavior with strace.
cat terminates after EOF on input which is shown as read(0, ...) returning 0.
strace cat < /dev/null | ls
cat killed by SIGPIPE.
strace cat < /dev/zero | ls

How does the pipe stop piping
The pipe stops piping when either end is closed.
If the input(write) end of the pipe is closed, then any data in the pipe is held until it is read from the output end. Once the buffer is emptied, anyone subsequently reading from the output end will get an EOF.
If the output(read) end of the pipe is closed, any data in the pipe will be discarded. Anyone subsequently writing to the input end will get a SIGPIPE/EPIPE. Note that a process merely holding open the input but not actively writing to it will not be signalled.
So when you type cat | ls you get a cat program with stdout connected to the input of the pipe and ls with stdin connected to the output. ls runs and outputs some stuff (to its stdout, which is still the terminal) and never reads from stdin. Once done it exits and closes the output of the pipe. Meanwhile cat is waiting for input from its stdin (the terminal). When it gets it (you type a line), it writes it to stdout, gets a SIGPIPE/EPIPE and exits (discarding the data as there's noone to write it to.) This closes the input of the pipe, so the pipe goes away now that both ends have been closed.
Now lets look at what happens with cat | cat | ls. You now have two pipes and two cat programs. As before ls runs and exits, closing the output of the second pipe. Now you type a line and the first cat reads it and copies it to the first pipe (still fully open) where the second cat reads it and copies it to the second pipe (which has its output closed), so it (the second cat) gets a SIGPIPE/EPIPE and exits (which closes the output of the first pipe). At this point the first cat is still waiting for input, so if you type a second line, it copies that to the now closed first pipe and gets a SIGPIPE/EPIPE and exits

How does the pipe stop piping, is it when it's full ?
A pipe has several states:
if you obtain the pipe through a call to pipe(2) (an unnamed pipe) both file descriptors are already open, so this doesn't apply to it (you start in point 2. below). When you open a named pipe, your open(2) call (depending if you have open with O_READ, O_WRITE, or O_RDWR. The pipe has two sides, the writer and the reader side. When you open it, you attach to the sides, depending on how do you open it. Well, up to here, the pipe blocks any open(2) call, until both sides have at least one process tied to them. So, if you open a pipe and read(2) from it, then your open will be blocked, until other process has opened it to read.
once both extremes have it open, the readers (the process issuing a read(2) call) block when the pipe is empty, and the writers (the processes issuing a write(2) call) block whenever the write call cannot be satisfied due to fillin completely the pipe. Old implementations of pipes used the filesystem to hold the data, and the data was stored only in the direct addressed disk blocks. This meant (as there are 10 such blocks in an inode) that you normally had space in the pipe to hold 10 blocks, after that, the writers are blocked. Later, pipes were implemented using the socket infrastructure in BSD systems, which allowed you to control the buffer size with ioctl(2) calls. Today, IMHO, pipes use a common implementation, that is separate from sockets also.
When the processes close the pipe continues to work as said in point 2. above, until the number of readers/writers collapses to zero. At that point, the pipe starts giving End Of File condition to all readers (this means read(2) syscall will return 0 bytes, without blocking) and error (cannot write to pipe) to writers. In addition, the kernel sends a signal (which normally aborts the writer processes) SIGPIPE to every process that has the pipe open for writing. If you have not ignored that signal or you have not installed a signal handler for it, your process will die. In this state, it's impossible to reopen the pipe again, until all processes have closed it.
A common error is when you pipe() or you open a pipe with O_RDWR, and the other process closes its file descriptor, and you don't get anything indicating about the other's close call..... this is due to the thing that both sides of the pipe are still open (by the same process) so it will not receive anything because it can still write to the pipe.
Any other kind of misbehaviour could be explained if you had posted any code, but you didn't, so IMHO, thi answer is still incomplete, but the number of different scenarios is difficult to enumerate, so I'll be pendant of any update to your question with some faulty (or needed of explanation) code.

popen()ed pipe closed from other extreme kills my program

I have a pipe which I opened with FILE *telnet = popen("telnet server", "w". If telnet exits after a while because server is not found, the pipe is closed from the other extreme.
Then I would expect some error, either in fprintf(telnet, ...) or fflush(telnet) calls, but instead, my program suddenly dies at fflush(telnet) without reporting the error. Is this normal behaviour? Why is it?

Converting (expanded) comments into an answer.
If you write to a pipe when there's no process at the other end of the pipe to read the data, you get a SIGPIPE signal to let you know, and the default behaviour for SIGPIPE is to exit (no core dump, but exit with prejudice).
If you examine the exit status in the shell, you should see $? is 141 (128 + SIGPIPE, which is normally 13).
If you don't mind that the process exits, you need do nothing. Alternatively, you can set the signal handler for SIGPIPE to SIG_IGN, in which case your writing operation should fail with an error, rather than terminating the process. Or you can set up more elaborate signal handling.
Note that one of the reasons you need to be careful to close unused file descriptors from pipes is that if the current process is writing to a pipe but also has the read end of the pipe open, it won't get SIGPIPE — but it might get blocked because it can't write more information to the pipe until some process reads from the pipe, but the only process that can read from the pipe is the one that's trying to write to it.

What happens when I try to read from a pipe without writing to it?

With this operation I expected to get an error because I'm reading from nothing, but in fact the program seems to keep attempting to read until someone writes to it. If there are no writes it will be stuck in an indefinite loop trying to read and will not proceed.
What exactly happens behind the scene here, do the function kept looping, or is it waiting for a signal, or is something else going on? Is it still taking CPU resources?
Also, is it possible to make the program return an error code/print out something when trying to read without any writes? I don't really need to do it, just wondering if it's possible.

This is normal behavior. If nothing is available to read, the reading process will block until there is. It will not consume CPU time while blocking; the OS will put it to sleep until another process writes to the pipe.
Keep in mind that pipes were designed to be somewhat transparent; a simple filter-type program should not have to care whether the input is a file or a pipe. If every program that wanted to be able to read from a pipe (think grep) had to include special handling to wait until the writer was ready, it would be very tedious for those programmers. This behavior means that reading from pipes doesn't require doing anything special.
If you don't want to block if no data is available, you can set the O_NONBLOCK status flag on the file descriptor, either when you open(2) it, or with fcntl(fd, F_SETFL, ...). In this case, when no data is available, read(2) will return -1 and set errno to EAGAIN or EWOULDBLOCK. This means, of course, that every time you read from the file descriptor, you have to write code to handle such a case.
You can also use select(2) or poll(2) to wait until data is available, optionally with a timeout.
It is also possible to arrange it so that a signal arriving during the blocking will cause read(2) to return -1 and set errno to EINTR. This depends on system call restarting semantics and is a little bit complicated.

How to check if non-blocking anonymous pipe has data without removing it

I have a POSIX thread which reads from the non-blocking anonymous pipe (marked with O_NONBLOCK flag). When thread is stopping (because of errors for example) I want to check if there is something left in pipe (in its internal buffer). If pipe has data - run new thread with the same read descriptor (it is shared between threads) so the new thread can continue reading from pipe. If pipe is empty - close pipe and do nothing.
So I need to check if pipe is empty without removing data from pipe (as regular read will do). Is there any way to do it?
P.S. I think setting count = 0 in read(int fd, void *buf, size_t count); may help but the documentation sais that it is some kind of undefined behavior:
If count is zero, read() may detect the errors described below. In
the absence of any errors, or if read() does not check for errors, a
read() with a count of 0 returns zero and has no other effects.

I believe you want poll or select, called with a zero timeout.
Short description from the select() docs:
select() and pselect() allow a program to monitor multiple file
descriptors, waiting until one or more of the file descriptors become
"ready" for some class of I/O operation (e.g., input possible).
...and the poll() docs:
poll() performs a similar task to select(2): it waits for one of a
set of file descriptors to become ready to perform I/O.

What's the purpose of send(2) receiving a SIGPIPE? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why is writing a closed TCP socket worse than reading one?
Why doesn't an erroneous return value suffice?
What can I do in a signal handler that I can't do by testing the return value for EPIPE?

Back in the old days almost every signal caused a Unix program to terminate. Because inter-process communication by pipes is fundamental in Unix, SIGPIPE was intended to terminate programs which didn't handle write(2)/read(2) errors.
Suppose you have two processes communicating through a pipe. If one of them dies, one of the ends of the pipe isn't active anymore. SIGPIPE is intended to kill the other process as well.
As an example, consider:
cat myfile | grep find_something
If cat is killed in the middle of reading the file, grep simply doesn't have what to do anymore and is killed by a SIGPIPE signal. If no signal was sent and grep didn't check the return value of read, grep would misbehave in some way.

As with many other things, my guess is that it was just a design choice someone made that eventually made it into the POSIX standards and has remained till date. That someone may have thought that trying to send data over a closed socket is a Bad Thing™ and that your program needs to be notified immediately, and since nobody ever checks error codes, what better way to notify you than to send a signal?