When using a pipe for process-process communication, what is the purpose of closing one end of the pipe?
For example: How to send a simple string between two programs using pipes?
Notice that one side of the pipe is closed in the child and parent processes. Why is this required?
If you connect two processes, parent and child, using a pipe, you create the pipe before the fork.
The fork gives both processes access to both ends of the pipe. This is not desirable.
The reading side is supposed to learn that the writer has finished when it notices an EOF condition. This can only happen once all writing FDs are closed, so it is best if the reader closes its own writing FD as soon as possible.
The writer should close its reading FD, if only to avoid holding too many FDs open and hitting whatever limit on open FDs may exist. Besides, if the then sole reader dies, the writer gets notified of this by a SIGPIPE signal or at least an EPIPE error (depending on how signal handling is set up). If there are several readers, the writer cannot detect that "the real one" went away; it goes on writing and eventually gets stuck when the writing FD blocks, in the vain hope that the "unused" reader will read something.
So here is, in detail, what happens:
The parent process calls pipe() and gets 2 file descriptors: let's call them rd and wr.
The parent process calls fork(). Now both processes have both rd and wr.
Suppose the child process is supposed to be the reader.
Then
the parent should close its reading end (so as not to waste FDs and to allow proper detection of a dying reader) and
the child must close its writing end (so that the EOF condition can be detected).
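A minimal sketch of this pattern in C, with the child as the reader (error handling abbreviated):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];                     /* fds[0] = rd, fds[1] = wr */
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(1); }

        if (pid == 0) {                 /* child: the reader */
            close(fds[1]);              /* must close wr, or it never sees EOF */
            char buf[128];
            ssize_t n;
            while ((n = read(fds[0], buf, sizeof buf)) > 0)
                write(STDOUT_FILENO, buf, n);
            close(fds[0]);
            _exit(0);
        }

        close(fds[0]);                  /* parent: the writer; close rd ASAP */
        const char *msg = "hello through the pipe\n";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);                  /* delivers EOF to the child */
        wait(NULL);
        return 0;
    }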
The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them, pretty soon you'll run out of FDs and won't be able to open anything anymore: not pipes, not files, not sockets, ...
Another reason why it can be important to close the pipe is when the closing itself has a meaning to the application. For example, a common use of pipes is to send the errno from a child process to the parent when using fork and exec to launch an external program:
The parent creates the pipe, calls fork to create a child process, closes its writing end, and tries to read from the pipe.
The child process attempts to use exec to run a different program:
If exec fails, for example because the program does not exist, the child writes errno to the pipe; the parent reads it, knows what went wrong, and can tell the user.
If exec succeeds, the write end of the pipe (marked close-on-exec beforehand with FD_CLOEXEC) is closed without anything being written. The read function in the parent returns 0, indicating the pipe was closed, so the parent knows the program was started successfully.
If the parent did not close its own writing end of the pipe before trying to read from the pipe, this would not work: the read function would never return when exec succeeds, because the parent itself would still hold an open write descriptor.
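A minimal sketch of this technique, assuming the write end is marked close-on-exec; the failing program name is a deliberate placeholder:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }
        /* A successful exec will close the write end for us. */
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(1); }

        if (pid == 0) {                       /* child */
            close(fds[0]);                    /* unused read end */
            execlp("no-such-program", "no-such-program", (char *)NULL);
            int err = errno;                  /* exec failed: report why */
            write(fds[1], &err, sizeof err);
            _exit(127);
        }

        close(fds[1]);                        /* parent: close wr, or read blocks forever */
        int err;
        ssize_t n = read(fds[0], &err, sizeof err);
        close(fds[0]);
        if (n == 0)
            printf("exec succeeded\n");       /* EOF: write end closed by exec */
        else
            fprintf(stderr, "exec failed: %s\n", strerror(err));
        return 0;
    }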
Closing unused pipe file descriptors is more than a matter of ensuring that a process doesn't exhaust its limited set of file descriptors: it is essential to the correct use of pipes. We now consider why the unused file descriptors for both the read and write ends of the pipe must be closed.
The process reading from the pipe closes its write descriptor for the pipe, so that, when the other process completes its output and closes its write descriptor, the reader sees end-of-file (once it has read any outstanding data in the pipe).
If the reading process doesn't close the write end of the pipe, then after the other process closes its write descriptor, the reader won't see end-of-file, even after it has read all data from the pipe. Instead, a read() would block waiting for data, because the kernel knows that there is still at least one write descriptor open for the pipe. That this descriptor is held open by the reading process itself is irrelevant; in theory, that process could still write to the pipe, even while it is blocked trying to read.
For example, the read() might be interrupted by a signal handler that writes data to the pipe.
The writing process closes its read descriptor for the pipe for a different reason.
When a process tries to write to a pipe for which no process has an open read descriptor, the kernel sends the SIGPIPE signal to the writing process. By default, this signal kills a process. A process can instead arrange to catch or ignore this signal, in which case the write() on the pipe fails with the error EPIPE (broken pipe). Receiving the SIGPIPE signal or getting the EPIPE error is a useful indication of the status of the pipe, and this is why unused read descriptors for the pipe should be closed.
If the writing process doesn't close the read end of the pipe, then, even after the other process closes its read descriptor, the writer gets neither SIGPIPE nor EPIPE: it will fill the pipe, and a further attempt to write will block indefinitely.
One final reason for closing unused file descriptors is that it is only after all file descriptors are closed that the pipe is destroyed and its resources released for reuse by other processes. At that point, any unread data in the pipe is lost.
~ Michael Kerrisk, The Linux Programming Interface
Related
What happens when the child process gets killed while the parent is blocked on read() from a pipe? How should I handle this scenario in the parent process?
For clarification, the parent process has two threads. Let's say thread1 was reading from the pipe when thread2 killed the child.
Will read() return -1?
Will appreciate any help here.
Pipe behavior has nothing to do with process relationships. The same rules apply regardless of whether the reader is the parent, child, sibling, or some other distant relation of the writer. Or even if the reader and writer are the same process.
The short answer is that death of a writing process is just an EOF from the reader's point of view, not an error, and this doesn't depend on whether the writing process voluntarily called _exit() or was killed by a signal.
The whole cause and effect chain goes like this:
1. Process X dies -> all of process X's file descriptors are closed.
2. One of process X's file descriptors was the write end of a pipe.
3. A pipe write file descriptor is closed -> was it the last one?
3a. There are other write file descriptors on the same pipe (e.g. inherited by fork and still open in another process) -> nothing happens. Stop.
3b. There are no more write file descriptors for this pipe -> the pipe has hit EOF.
4. Pipe hits EOF -> readers notice.
4a. All read file descriptors for the pipe become readable, waking up any process that was blocking on select or poll or read or another similar syscall.
4b. If there is any leftover data in the pipe buffer (written before the last write file descriptor was closed), that data is returned to the reader(s).
4c. Repeat 4b until the pipe buffer is empty.
4d. Finally, read() returns 0, indicating EOF.
The exit status of a child process is returned to the parent by the wait family of syscalls, and you have to check that if you want to know when your child processes have been killed by a signal.
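A small demonstration of this chain in C, assuming a child that writes a little and is then killed by SIGKILL (the sleep is a crude stand-in for real synchronization):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        pid_t pid = fork();
        if (pid == 0) {                 /* child: write a little, then hang */
            close(fds[0]);
            write(fds[1], "partial", 7);
            pause();                    /* wait to be killed */
            _exit(0);
        }

        close(fds[1]);                  /* parent keeps only the read end */
        sleep(1);                       /* let the child write first */
        kill(pid, SIGKILL);             /* step 1: process X dies */

        char buf[64];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)   /* 4b: leftover data */
            printf("read %zd bytes\n", n);
        printf("read returned %zd (0 = EOF, not -1)\n", n);  /* 4d */

        int status;
        wait(&status);                  /* the exit status reveals the signal */
        if (WIFSIGNALED(status))
            printf("child was killed by signal %d\n", WTERMSIG(status));
        return 0;
    }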
I do not know why the parent process needs to close both file descriptors of a pipe before calling wait().
I have a C program which does:
Parent creates child_a, which executes ls -l using execvp and writes to the pipe (after closing the read end of the pipe).
Parent creates another child (without closing any file descriptor for the pipe), called child_b, which executes wc by reading from the pipe (after closing the write end of the pipe).
Parent waits for both children to complete by calling wait() twice.
I noticed that the program blocks if the parent does not close both file descriptors of the pipe before calling the wait() syscall. Also, after reading a few questions already posted online, it looks like this is the general rule and needs to be done. But I could not find the reason why.
Why does wait() not return if the parent does not close the file descriptors of the pipe?
I was thinking that, in the worst case, if the parent does not close the file descriptors of the pipe, the only consequence would be that the pipe keeps existing (which is a waste of resources). But I never thought this would block the execution of the child processes (as can be seen from wait() not returning).
Also remember, the parent is not using the pipe at all. It is child_a writing into the pipe and child_b reading from the pipe.
If the parent process doesn't close the write end of the pipe, the child processes never get EOF (zero bytes read), because there's a process that might (but won't) write to the pipe. The reading child must also close the write end of the pipe for the same reason: if it doesn't, there's a process (itself) that might (but won't) write to the pipe, so its read won't return EOF.
If you duplicate one end of a pipe to standard output or standard error, you should close both ends of that pipe. It is a common mistake not to have enough calls to close() in multiprocess code using pipes. Occasionally, you get away with being sloppy, but the details vary by case and usually you don't.
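Here is a sketch of the scenario from the question with the necessary close() calls in place (ls -l piped into wc; error handling abbreviated): the parent closes both pipe FDs before waiting.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        if (fork() == 0) {              /* child_a: ls -l writes to the pipe */
            close(fds[0]);
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);              /* close the original after dup2 */
            execlp("ls", "ls", "-l", (char *)NULL);
            _exit(127);
        }

        if (fork() == 0) {              /* child_b: wc reads from the pipe */
            close(fds[1]);
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
            execlp("wc", "wc", (char *)NULL);
            _exit(127);
        }

        /* Parent: close BOTH ends before waiting, or wc never sees EOF. */
        close(fds[0]);
        close(fds[1]);
        wait(NULL);
        wait(NULL);
        return 0;
    }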
I'm currently diving into the Win32 API and writing myself a wrapper class for CreateProcess and CreatePipe. I was just wondering what will happen if a process that I opened writes too much output for the pipe buffer to hold. Will the process wait until I read from the other end of the pipe? The Remark of the CreatePipe function suggests so:
When a process uses WriteFile to write to an anonymous pipe, the write operation is not completed until all bytes are written. If the pipe buffer is full before all bytes are written, WriteFile does not return until another process or thread uses ReadFile to make more buffer space available.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365152%28v=vs.85%29.aspx
Let's assume I open a process with CreateProcess, then use WaitForSingleObject to wait until the process exits. Will the process ever exit if it exceeds the buffer size of its standard output pipe?
WaitForSingleObject on a process with redirected output is indeed a deadlock. You need to keep the output pipe drained in order to let the child process run to completion.
Generally you would use overlapped I/O on the pipe and then WaitForMultipleObjects on the handle pair [1] (process handle, pipe read event handle) in a loop until the process handle becomes signaled.
Raymond Chen wrote about the scenario when the input is also piped:
Be careful when redirecting both a process's stdin and stdout to pipes, for you can easily deadlock
[1] As Hans commented, there can be more than one output stream. stdout and stderr are typical; even more are possible through handle inheritance. Drain all the pipes coming out of the process.
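For the simple single-stream case, a synchronous drain-then-wait sketch looks roughly like this (error handling abbreviated; the cmd /c dir child is just a placeholder):

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SECURITY_ATTRIBUTES sa = { sizeof sa, NULL, TRUE };   /* inheritable */
        HANDLE hRead, hWrite;
        CreatePipe(&hRead, &hWrite, &sa, 0);
        /* The parent's read handle must NOT be inherited by the child. */
        SetHandleInformation(hRead, HANDLE_FLAG_INHERIT, 0);

        STARTUPINFOA si = { sizeof si };
        si.dwFlags = STARTF_USESTDHANDLES;
        si.hStdOutput = hWrite;
        si.hStdError  = hWrite;
        si.hStdInput  = GetStdHandle(STD_INPUT_HANDLE);
        PROCESS_INFORMATION pi;
        char cmd[] = "cmd /c dir";                 /* placeholder child */
        CreateProcessA(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi);

        /* Close the parent's copy of the write end, or ReadFile never
           reports a broken pipe after the child exits. */
        CloseHandle(hWrite);

        char buf[4096];
        DWORD n;
        /* Drain the pipe FIRST; ReadFile fails with ERROR_BROKEN_PIPE at EOF. */
        while (ReadFile(hRead, buf, sizeof buf, &n, NULL) && n > 0)
            fwrite(buf, 1, n, stdout);

        WaitForSingleObject(pi.hProcess, INFINITE);   /* safe: pipe is empty */
        CloseHandle(hRead);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return 0;
    }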
I am reading about pipes in UNIX for inter-process communication between 2 processes. I have the following question:
Is it really necessary to close the unused end of the pipe? For example, if my parent process is writing data into the pipe and the child is reading from the pipe, is it really necessary to close the read end of the pipe in the parent process and close the write end in the child process? Are there any side effects if I don't close those ends? Why do we need to close them?
Here's the problem if you don't. In your example, the parent creates a pipe for writing to the child. It then forks the child but does not close its own read descriptor. This means that there are still two read descriptors on the pipe.
If the child had the only one and it closed it (for example, by exiting), the parent would get a SIGPIPE signal or, if that signal is ignored or handled, an error (EPIPE) on writing to the pipe.
However, there is a second read descriptor on the pipe (the parent's). Now, if the child exits, the pipe will remain open. The parent can continue to write to the pipe until it fills and then the next write will block (or return without writing if non-blocking).
Thus, by not closing the parent's read descriptor, the parent cannot detect that the child has closed its descriptor.
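A small sketch of that detection, assuming the parent ignores SIGPIPE so that write() reports EPIPE instead of the signal killing the process:

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        signal(SIGPIPE, SIG_IGN);       /* turn the signal into an EPIPE error */

        pid_t pid = fork();
        if (pid == 0) {                 /* child: the only reader, exits at once */
            close(fds[1]);
            _exit(0);
        }

        close(fds[0]);                  /* parent must close rd for this to work */
        waitpid(pid, NULL, 0);          /* child gone: no read descriptors left */

        if (write(fds[1], "x", 1) == -1 && errno == EPIPE)
            printf("writer detected the dead reader (EPIPE)\n");
        close(fds[1]);
        return 0;
    }

Comment out the parent's close(fds[0]) and the write succeeds silently into the pipe buffer instead, which is exactly the failure mode described above.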
According to the man page for getdtablesize:
Each process has a fixed size descriptor table, which is guaranteed to have at least 20 slots.
Each pipe uses two entries in the descriptor table. Closing the unneeded end of the pipe frees up one of those descriptors. So, if you were unfortunate enough to be on a system where each process is limited to 20 descriptors, you would be highly motivated to free up unneeded file descriptors.
Pipes are intended to be used as unidirectional communication channels. Closing the unused ends is good practice that avoids confusion in the message flow: the write descriptor should be closed in the reader, and the read descriptor in the writer.
Refer here
Quoting from the above reference:
[...] Each pipe provides one-way communication; information flows from one process to another.
For this reason, the parent and child process should close the unused ends of the pipe.
There is actually another, more important reason for closing unused ends of the pipe.
The process reading from the pipe blocks when making the read system call unless:
The pipe contains enough data to fill the reader's buffer, or
End-of-file is indicated. End-of-file is indicated on the pipe when every file descriptor to the write end of the pipe is closed. Any process reading from the pipe and forgetting to close the write end of the pipe will never be notified of the "end-of-file". [...]
I have the following scenario:
I create a pipe.
Fork a child process.
The child closes the read end of the pipe explicitly, writes into the write end of the pipe, and exits without closing anything (exit should close all open file/pipe descriptors on behalf of the child, I presume).
The parent closes the write end of the pipe explicitly and reads from the read end of the pipe using fgets until fgets returns NULL, i.e. until it has read everything.
Now my question is: why does the parent need to close the read end of the pipe explicitly once it's done reading? Isn't it wise for the system to delete the pipe altogether once all the data has been read from the read end?
I didn't close the read end explicitly in the parent, and sooner or later I hit a "Too many file descriptors" error while opening more pipes. My assumption was that the system automatically deletes a pipe once its write end is closed and the data has been completely read from the read end, because you can't read the same data from a pipe twice!
So, what's the rationale behind the system not deleting the pipe once the data has been completely read and the write end closed?
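For reference, a minimal sketch of the scenario being described (error handling abbreviated):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        if (fork() == 0) {              /* child: the writer */
            close(fds[0]);              /* explicit close of the read end */
            const char *lines = "line one\nline two\n";
            write(fds[1], lines, strlen(lines));
            _exit(0);                   /* exit closes fds[1] on our behalf */
        }

        close(fds[1]);                  /* parent: explicit close of the write end */
        FILE *in = fdopen(fds[0], "r");
        char buf[128];
        while (fgets(buf, sizeof buf, in) != NULL)   /* NULL once EOF is seen */
            fputs(buf, stdout);
        fclose(in);                     /* still required: frees the descriptor */
        wait(NULL);
        return 0;
    }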
You're correct that the system will close the write end of the pipe once the child exits. However, there could be another write end of that pipe open, if the child forks or passes a duplicate of the write end to another process.
It is still true that the system would be able to tell when all the descriptors at one end of a pipe have been closed (either explicitly or because the owning process exited). But it still wouldn't make sense for it to automatically close those on the other end of the pipe, as that would lead to confusion when the parent process tries to close the descriptor on its end of the pipe; either:
the fd has been closed by the system, in which case there is an error as it tries to close an already closed fd; or
the fd has been reused, which is even worse as it is now closing a completely unrelated fd.
From the point of view of the system, it might well have discarded the pipe once all the descriptors at one end have been closed, so you don't need to worry about inefficiency there. What matters more is that the user space process should have a consistent experience, which means not closing the descriptor unless it is specifically requested.
File descriptors are not closed by the system until the process exits. This is true for pipes, as well as any other file descriptor.
There's a big difference between a pipe (or any other file) with no data in it and a closed file descriptor.
When a file descriptor is closed, the system can reuse its number for a new file descriptor. Then, when you read, you get something else. So after you've closed a file descriptor, you must no longer use it.
Now imagine that, once there is no more data, the system would automatically close the file descriptor. This would make the number available for reuse, and a subsequent unrelated open might get it. Now the reader, who doesn't yet know that there's no more data, will read from what it thinks is the pipe but will actually read from a completely different file.