popen()ed pipe closed from the other end kills my program - c

I have a pipe which I opened with FILE *telnet = popen("telnet server", "w");. If telnet exits after a while because the server is not found, the pipe is closed from the other end.
I would then expect some error, either from the fprintf(telnet, ...) or fflush(telnet) calls, but instead my program suddenly dies at fflush(telnet) without reporting the error. Is this normal behaviour? Why does it happen?

Converting (expanded) comments into an answer.
If you write to a pipe when there's no process at the other end of the pipe to read the data, you get a SIGPIPE signal to let you know, and the default behaviour for SIGPIPE is to exit (no core dump, but exit with prejudice).
If you examine the exit status in the shell, you should see $? is 141 (128 + SIGPIPE, which is normally 13).
If you don't mind that the process exits, you need do nothing. Alternatively, you can set the signal handler for SIGPIPE to SIG_IGN, in which case your writing operation should fail with an error, rather than terminating the process. Or you can set up more elaborate signal handling.
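A minimal sketch of the SIG_IGN approach, reusing the popen() call from the question ("some command" is just a placeholder); with SIGPIPE ignored, the broken pipe shows up as an error from the stdio calls instead of killing the process:

#include <errno.h>
#include <signal.h>
#include <stdio.h>

int main(void)
{
    signal(SIGPIPE, SIG_IGN);       /* don't die on a broken pipe */

    FILE *telnet = popen("telnet server", "w");
    if (telnet == NULL)
        return 1;

    fprintf(telnet, "some command\n");
    if (fflush(telnet) == EOF && errno == EPIPE)
        fprintf(stderr, "telnet went away: broken pipe\n");

    pclose(telnet);                 /* also collects telnet's exit status */
    return 0;
}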
Note that one of the reasons you need to be careful to close unused file descriptors from pipes is that if the current process is writing to a pipe but also holds the read end of that pipe open, it won't get SIGPIPE. Instead it might get blocked: it can't write more information to the pipe until some process reads from the pipe, but the only process that can read from the pipe is the very one trying to write to it.
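A small sketch demonstrating that hazard without actually hanging; O_NONBLOCK makes the write report a full pipe instead of blocking forever on ourselves:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    char buf[4096] = {0};
    ssize_t n, total = 0;

    pipe(fd);                        /* read end stays open in this process */
    fcntl(fd[1], F_SETFL, O_NONBLOCK);

    while ((n = write(fd[1], buf, sizeof buf)) > 0)
        total += n;

    /* errno is EAGAIN, not EPIPE: the pipe is merely full, and a blocking
       write here would wait forever for a read we can never perform. */
    printf("pipe filled after %zd bytes, errno %d\n", total, errno);
    return 0;
}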

Related

How can I understand the behavior of pipe, with varying data flow?

My problem is a bit hard to explain properly, as I do not fully understand the behaviour behind it.
I have been working on pipes and pipelines in C, and I noticed some behaviour that is a bit mysterious to me.
Let's take an example: piping yes into head (yes | head). Even though I coded this behaviour in a custom program, I don't understand how the pipe knows when to stop piping. It seems two underlying phenomena could be causing this (maybe): SIGPIPE and/or the pipe's internal buffer size. How does the pipe stop piping; is it when it's full? But the pipe's capacity is much larger than 10 "yes\n" lines, no? And SIGPIPE only applies when the read/write end is closed, no?
Also, let's take another example, say cat and ls: cat | ls, or even cat | cat | ls.
It seems the stdin of the pipe is waiting for input, but how does it know when to stop, i.e. after one input? What are the mechanisms that permit this behaviour?
Also, can anyone provide other examples of this very specific behaviour, if there are any in pipes and pipelines, so I can get a good overview of these mechanisms?
In my own implementation, I managed to replicate that behaviour using waitpid. However, how does the child process itself know when to stop? Is it command specific?
The write operation will block when the pipe buffer is full; the read operation will block when the buffer is empty.
When the write end of the pipe is closed, the reading process will get an EOF indication after reading all data from the buffer. Many programs will terminate in this case.
When the read end of the pipe is closed, the writing process will get a SIGPIPE. This will also terminate most programs.
When you run cat | ls, STDOUT of cat is connected to STDIN of ls, but ls does not read from STDIN. On the system where I checked this, ls simply ignores STDIN and the file descriptor will be closed when ls terminates.
You will see the output of ls, and cat will be waiting for input.
cat will not write anything to STDOUT before it has read enough data from STDIN, so it will not notice that the other end of the pipe has been closed.
cat will terminate when it detects EOF on STDIN which can be done by pressing CTRL+D or by redirecting STDIN from /dev/null, or when it gets SIGPIPE after trying to write to the pipe which will happen when you (type something and) press ENTER.
You can see the behavior with strace.
cat terminates after EOF on input, which is shown as read(0, ...) returning 0:
strace cat < /dev/null | ls
cat killed by SIGPIPE:
strace cat < /dev/zero | ls
How does the pipe stop piping
The pipe stops piping when either end is closed.
If the input (write) end of the pipe is closed, then any data in the pipe is held until it is read from the output end. Once the buffer is emptied, anyone subsequently reading from the output end will get an EOF.
If the output (read) end of the pipe is closed, any data in the pipe will be discarded. Anyone subsequently writing to the input end will get a SIGPIPE/EPIPE. Note that a process merely holding the input end open, but not actively writing to it, will not be signalled.
So when you type cat | ls, you get a cat program with stdout connected to the input of the pipe, and ls with stdin connected to the output. ls runs and outputs some stuff (to its stdout, which is still the terminal) and never reads from stdin. Once done, it exits and closes the output of the pipe. Meanwhile cat is waiting for input from its stdin (the terminal). When it gets it (you type a line), it writes it to stdout, gets a SIGPIPE/EPIPE, and exits (discarding the data, as there's no one to deliver it to). This closes the input of the pipe, so the pipe goes away now that both ends have been closed.
Now let's look at what happens with cat | cat | ls. You now have two pipes and two cat programs. As before, ls runs and exits, closing the output of the second pipe. Now you type a line, and the first cat reads it and copies it to the first pipe (still fully open), where the second cat reads it and copies it to the second pipe (which has its output closed), so it (the second cat) gets a SIGPIPE/EPIPE and exits (which closes the output of the first pipe). At this point the first cat is still waiting for input, so if you type a second line, it copies that to the now-closed first pipe, gets a SIGPIPE/EPIPE, and exits.
How does the pipe stop piping, is it when it's full ?
A pipe has several states:
1. If you obtain the pipe through a call to pipe(2) (an unnamed pipe), both file descriptors are already open, so this state doesn't apply (you start at point 2 below). When you open a named pipe, your open(2) call attaches you to one of the pipe's two sides, the writer side or the reader side, depending on whether you open with O_RDONLY, O_WRONLY, or O_RDWR. Up to this point, the pipe blocks any open(2) call until both sides have at least one process attached to them. So if you open a pipe for reading, your open will block until some other process has opened it for writing.
2. Once both ends are open, readers (processes issuing a read(2) call) block when the pipe is empty, and writers (processes issuing a write(2) call) block whenever the write call cannot be satisfied because the pipe is completely full. Old implementations of pipes used the filesystem to hold the data, and the data was stored only in the direct-addressed disk blocks. This meant (as there are 10 such blocks in an inode) that you normally had space in the pipe to hold 10 blocks; after that, writers blocked. Later, pipes were implemented using the socket infrastructure in BSD systems, which allowed you to control the buffer size with ioctl(2) calls. Today, IMHO, pipes use a common implementation that is also separate from sockets.
3. As processes close their descriptors, the pipe continues to work as described in point 2 above, until the number of readers or writers drops to zero. At that point, the pipe starts giving an end-of-file condition to all readers (meaning the read(2) syscall returns 0 bytes without blocking) and an error (cannot write to pipe) to writers. In addition, the kernel sends a SIGPIPE signal (which normally aborts the process) to every process that attempts to write to the pipe once no readers remain. If you have not ignored that signal or installed a signal handler for it, your process will die. In this state, it's impossible to reopen the pipe again until all processes have closed it.
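A tiny sketch of that third state: once the last read descriptor is closed, a write fails with EPIPE (SIGPIPE is ignored here so the errno is visible):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];

    pipe(fd);
    signal(SIGPIPE, SIG_IGN);        /* otherwise the write would kill us */
    close(fd[0]);                    /* reader count drops to zero */

    if (write(fd[1], "x", 1) == -1 && errno == EPIPE)
        puts("write failed with EPIPE");
    return 0;
}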
A common error is when you use pipe(2), or open a named pipe with O_RDWR, and the other process closes its file descriptor: you get no indication at all of the other side's close call. This is because both sides of the pipe are still open (held by your own process), so no EOF or SIGPIPE can be delivered; as far as the kernel is concerned, the pipe still has a reader and a writer.
Any other kind of misbehaviour could be explained if you had posted some code, but you didn't, so IMHO this answer is still incomplete. The number of different scenarios is difficult to enumerate, so I'll keep an eye out for any update to your question with some faulty (or in-need-of-explanation) code.

Will a process writing to a pipe block if the pipe is full?

I'm currently diving into the Win32 API and writing myself a wrapper class for CreateProcess and CreatePipe. I was just wondering what will happen if a process that I opened writes too much output for the pipe buffer to hold. Will the process wait until I read from the other end of the pipe? The Remark of the CreatePipe function suggests so:
When a process uses WriteFile to write to an anonymous pipe, the write operation is not completed until all bytes are written. If the pipe buffer is full before all bytes are written, WriteFile does not return until another process or thread uses ReadFile to make more buffer space available.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365152%28v=vs.85%29.aspx
Let's assume I open a process with CreateProcess, then use WaitForSingleObject to wait until the process exits. Will the process ever exit if it exceeds the buffer size of its standard output pipe?
WaitForSingleObject on a process with redirected output is indeed a deadlock. You need to keep the output pipe drained in order to let the child process run to completion.
Generally you would use overlapped I/O on the pipe and then WaitForMultipleObjects on the handle pair [1] (process handle, pipe read event handle) in a loop until the process handle becomes signaled.
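For the simple single-pipe case, a synchronous sketch also works; hRead is assumed to be the parent's read handle for the child's redirected stdout and pi the PROCESS_INFORMATION from CreateProcess. Drain first, wait afterwards:

#include <windows.h>
#include <stdio.h>

void drain_then_wait(HANDLE hRead, PROCESS_INFORMATION pi)
{
    char buf[4096];
    DWORD n;

    /* Keep reading until the child closes its end of the pipe
       (ReadFile then fails with ERROR_BROKEN_PIPE). */
    while (ReadFile(hRead, buf, sizeof buf, &n, NULL) && n > 0)
        fwrite(buf, 1, n, stdout);

    /* Only now is it safe to block on the process handle. */
    WaitForSingleObject(pi.hProcess, INFINITE);
}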
Raymond Chen wrote about the scenario when the input is also piped:
Be careful when redirecting both a process's stdin and stdout to pipes, for you can easily deadlock
[1] As Hans commented, there can be more than one output stream. stdout and stderr are typical; even more are possible through handle inheritance. Drain all the pipes coming out of the process.

Why should you close a pipe in linux?

When using a pipe for process-process communication, what is the purpose of closing one end of the pipe?
For example: How to send a simple string between two programs using pipes?
Notice that one side of the pipe is closed in the child and parent processes. Why is this required?
If you connect two processes - parent and child - using a pipe, you create the pipe before the fork.
The fork makes both processes have access to both ends of the pipe. This is not desirable.
The reading side is supposed to learn that the writer has finished when it notices an EOF condition. This can only happen if all writing sides are closed. So it is best if the reader closes its writing FD as soon as possible.
The writer should close its reading FD, if only to avoid holding too many FDs open and hitting a limit on open FDs. Besides, if the then-only reader dies, the writer gets notified of this by a SIGPIPE or at least an EPIPE error (depending on how signals are set up). If there are several readers, the writer cannot detect that "the real one" went away; it goes on writing and eventually gets stuck as the writing FD blocks, waiting for the "unused" reader to read something.
So here, in detail, is what happens:
The parent process calls pipe() and gets 2 file descriptors: let's call them rd and wr.
The parent process calls fork(). Now both processes have an rd and a wr.
Suppose the child process is supposed to be the reader.
Then
the parent should close its reading end (so as not to waste FDs, and to allow proper detection of a dying reader), and
the child must close its writing end (so that the EOF condition can be detected).
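A minimal sketch of those steps, with the parent as the writer and the child as the reader (error handling mostly omitted):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];                       /* fd[0] = read end (rd), fd[1] = write end (wr) */
    char buf[64];
    ssize_t n;

    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    switch (fork()) {
    case -1:
        perror("fork"); exit(1);
    case 0:                          /* child: the reader */
        close(fd[1]);                /* must close wr, or it never sees EOF */
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, n);
        _exit(0);
    default:                         /* parent: the writer */
        close(fd[0]);                /* close unused rd */
        write(fd[1], "hello\n", 6);
        close(fd[1]);                /* child now sees EOF and exits */
        wait(NULL);
    }
    return 0;
}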
The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them pretty soon you'll run out of FDs and can't open anything anymore: not pipes, not files, not sockets, ...
Another reason why it can be important to close the pipe is when the closing itself has a meaning to the application. For example, a common use of pipes is to send the errno from a child process to the parent when using fork and exec to launch an external program:
The parent creates the pipe, calls fork to create a child process, closes its writing end, and tries to read from the pipe.
The child process attempts to use exec to run a different program:
If exec fails, for example because the program does not exist, the child writes errno to the pipe, and the parent reads it and knows what went wrong, and can tell the user.
If exec is successful, the pipe is closed without anything being written (the child marks its write end close-on-exec). The read function in the parent returns 0, indicating the pipe was closed, and the parent knows the program was successfully started.
If the parent did not close its writing end of the pipe before trying to read from it, this would not work, because the read function would never return when exec is successful.
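A sketch of this pattern, assuming Linux's pipe2() for the close-on-exec flag (on other systems, fcntl(fd[1], F_SETFD, FD_CLOEXEC) achieves the same); "nonexistent-program" is a deliberate placeholder so the failure path runs:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    int err;
    pid_t pid;

    if (pipe2(fd, O_CLOEXEC) == -1) { perror("pipe2"); exit(1); }

    pid = fork();
    if (pid == 0) {                  /* child */
        close(fd[0]);
        execlp("nonexistent-program", "nonexistent-program", (char *)NULL);
        err = errno;                 /* only reached if exec failed */
        write(fd[1], &err, sizeof err);
        _exit(127);
    }

    close(fd[1]);                    /* parent must close its write end */
    if (read(fd[0], &err, sizeof err) == 0)
        puts("exec succeeded");      /* write end closed by exec, nothing written */
    else
        fprintf(stderr, "exec failed: %s\n", strerror(err));
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}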
Closing unused pipe file descriptors is more than a matter of ensuring that a process doesn't exhaust its limited set of file descriptors; it is essential to the correct use of pipes. We now consider why the unused file descriptors for both the read and write ends of the pipe must be closed.
The process reading from the pipe closes its write descriptor for the pipe, so that, when the other process completes its output and closes its write descriptor, the reader sees end-of-file (once it has read any outstanding data in the pipe).
If the reading process doesn't close the write end of the pipe, then after the other process closes its write descriptor, the reader won't see end-of-file, even after it has read all data from the pipe. Instead, a read() would block waiting for data, because the kernel knows that there is still at least one write descriptor open for the pipe. That this descriptor is held open by the reading process itself is irrelevant; in theory, that process could still write to the pipe, even if it is blocked trying to read.
For example, the read() might be interrupted by a signal handler that writes data to the pipe.
The writing process closes its read descriptor for the pipe for a different reason.
When a process tries to write to a pipe for which no process has an open read descriptor, the kernel sends the SIGPIPE signal to the writing process. By default, this signal kills a process. A process can instead arrange to catch or ignore this signal, in which case the write() on the pipe fails with the error EPIPE (broken pipe). Receiving the SIGPIPE signal or getting the EPIPE error is a useful indication of the status of the pipe, and this is why unused read descriptors for the pipe should be closed.
If the writing process doesn't close the read end of the pipe, then even after the other process closes the read end of the pipe, the writing process will fill the pipe, and a further attempt to write will block indefinitely.
One final reason for closing unused file descriptors is that only after all file descriptors are closed is the pipe destroyed and its resources released for reuse by other processes. At that point, any unread data in the pipe is lost.
~ Michael Kerrisk, The Linux Programming Interface

how to communicate with a program from another external program

I'm trying to write to stdin and read from stdout (and stderr) of an external program, without changing its code.
I've tried using named pipes, but stdout doesn't show anything until the program is terminated, and stdin only works for the first input (after that, cin is null).
I've tried using /proc/[pid]/fd, but that only writes to and reads from the terminal, not the program.
I've tried writing a character device file for this, and it worked, but only for one program at a time (this needs to work for multiple programs at a time).
At this point, to my knowledge, I could extend the driver that worked to multiplex the I/O across multiple programs, but I don't think that's the "right" solution.
The main purpose of this is to view a feed of a program through a web interface. I'm sure there has to be some way to do this. Is there anything I haven't tried that's been done before?
The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
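A minimal sketch of steps 1 through 5, wiring up stdin and stdout only (stderr works the same way); sort is used here as a stand-in for the external application:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int in[2], out[2];               /* in: parent -> child, out: child -> parent */
    char buf[128];
    ssize_t n;
    pid_t pid;

    pipe(in);
    pipe(out);

    pid = fork();
    if (pid == 0) {                  /* child */
        close(in[1]);                /* close the parent's ends */
        close(out[0]);
        dup2(in[0], STDIN_FILENO);   /* pipes become fds 0 and 1 */
        dup2(out[1], STDOUT_FILENO);
        close(in[0]);                /* close the now-duplicated originals */
        close(out[1]);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);                  /* only reached if exec fails */
    }

    close(in[0]);                    /* close the child's ends */
    close(out[1]);

    write(in[1], "b\na\n", 4);
    close(in[1]);                    /* child's stdin sees EOF */

    while ((n = read(out[0], buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);
    close(out[0]);
    waitpid(pid, NULL, 0);
    return 0;
}

Note this write-everything-then-read pattern only avoids the deadlock mentioned in the last step because the input is tiny; for real use, multiplex the streams with select(2) or separate threads as described above.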
Even if you do all this correctly, you may still not see your program's output right away. This is typically due to stdout buffering. Normally, when stdout is going to a terminal, it's line-buffered: it gets flushed after every newline. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it's fully buffered, and it only gets written to when the program has output a full buffer's worth of data (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks—see this question and this question for how to do that.

what if tail fails while reading from a pipe

distinguish stdout from stderr on pipe
So, related to the link above, I have a child that is executing tail, and the parent is reading its output via a pipe.
dup2(pipefd[1], STDOUT_FILENO);
dup2(pipefd[1], STDERR_FILENO);
My question is: if tail somehow fails, what happens to the pipe from which I am reading? Do I get anything on stderr? Does tail terminate itself, or might it hang around as defunct?
The kernel will send a SIGPIPE signal to the other process on the pipe when tail has terminated. The default action for this signal (if a handler is not installed) is to terminate the process.
If you don't want to deal with signals, you can ignore SIGPIPE in the parent (so it doesn't terminate when tail has terminated), and instead check whether the value of errno is EPIPE after each read. Additionally, you'll have to call wait or waitpid from the parent to reap the zombie child.
You don't get EPIPE when reading; only write will return EPIPE. You'll get EOF, indicated by read returning 0, and since you redirected stderr to the pipe as well, you'll get the error message too (before the EOF).
The process will become a zombie, and you can use wait/waitpid to get the exit status, which will be non-zero if there was an error.
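For instance, a sketch of the parent side under that scheme, assuming pipefd is the pipe from the question's dup2 setup and child_pid the pid returned by fork():

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* pipefd[0] is the read end set up before the fork; child_pid is the
   pid returned by fork(). Both names are assumed from the question. */
void read_and_reap(int pipefd[2], pid_t child_pid)
{
    char buf[4096];
    ssize_t n;
    int status;

    close(pipefd[1]);                /* parent only reads */
    while ((n = read(pipefd[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);   /* tail's stdout and stderr, interleaved */
    /* n == 0 means EOF: tail exited and its end of the pipe was closed */

    waitpid(child_pid, &status, 0);  /* reap the zombie */
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
        fprintf(stderr, "tail exited with status %d\n", WEXITSTATUS(status));
}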
If tail fails, any read on the read end of the pipe will return EOF. If tail fails, it has already terminated; the definition of "fail" being that it terminated with a non-zero exit status. It will remain in the process table (i.e., "defunct") until the parent waits for it.
But why are you having tail use the same pipe for both stderr and stdout? Why not just make two pipes? It seems that this would eliminate the problem of distinguishing between the two output streams.
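A sketch of that two-pipe variant ("somefile" is just a placeholder argument for tail):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int out[2], err[2];

    pipe(out);
    pipe(err);

    if (fork() == 0) {               /* child */
        dup2(out[1], STDOUT_FILENO); /* a separate pipe per stream */
        dup2(err[1], STDERR_FILENO);
        close(out[0]); close(out[1]);
        close(err[0]); close(err[1]);
        execlp("tail", "tail", "-f", "somefile", (char *)NULL);
        _exit(127);
    }

    close(out[1]);                   /* parent keeps only the read ends */
    close(err[1]);
    /* read(out[0], ...) now yields tail's stdout and read(err[0], ...)
       its stderr: no ambiguity about which stream a message came from */
    return 0;
}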
