Confusion regarding usage of dup() - c

When we use dup to redirect STDOUT to a pipe we do:
close(1); dup(fd[1]);
close(fd[0]);
close(fd[1]);
execlp("ls", "ls", "-al", (char *) NULL);
but we are closing both ends of the pipe. Then how can STDOUT be written to the pipe?

You're not closing both ends of the pipe in the example code. You're closing fd[0] and fd[1]. Initially, closing those would have been enough to close both ends of the pipe, but it's not after you duplicated fd[1]. You'd have to close the duplicated fd as well to close all your references to the pipe. That would be silly though: you're keeping an end open precisely so that ls can write to it.
Perhaps your confusion is about what close() is closing? It closes the fd, the reference to one of the ends of the pipe. It doesn't close the pipe itself, and there is no call that does (shutdown() applies to sockets, not pipes). The pipe is torn down automatically once every fd referring to it has been closed. So, because the duplicated fd is still open, the process can write to the pipe (which isn't closed, because only two of the three references to it were closed).

Because once file descriptor number 1 (i.e. standard output) is closed, that number is available for further dup or open calls.
You should check the result of your close and dup syscalls.
Of course, closing both ends of a pipe is nonsense unless you do something useful first (i.e. reading or writing on the appropriate ends).
See open(2), dup(2), pipe(2), close(2) man pages. Read the Advanced Linux Programming book.
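To make the point about the remaining reference concrete, here is a minimal sketch with the result of each call checked. The function name child_stdout_via_pipe is made up for illustration, and the child writes to descriptor 1 directly instead of exec'ing ls, so the result does not depend on the contents of a directory.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child redirects its stdout into a pipe using the
 * close(1)/dup() idiom, then writes msg to descriptor 1. The parent collects
 * whatever arrives. Returns the number of bytes read, or -1 on error. */
ssize_t child_stdout_via_pipe(const char *msg, char *buf, size_t cap)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: free descriptor 1, then let dup() pick the lowest free one. */
        if (close(STDOUT_FILENO) == -1)
            _exit(1);
        if (dup(fd[1]) != STDOUT_FILENO)  /* anything but 1: idiom failed */
            _exit(1);
        /* Descriptor 1 is now a third reference to the pipe. Dropping the
         * two original references leaves that one intact. */
        close(fd[0]);
        close(fd[1]);
        /* An exec'd program such as ls would write to descriptor 1 too. */
        if (write(STDOUT_FILENO, msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }

    /* Parent: close the write end so read() can report end-of-file. */
    close(fd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fd[0], buf + total, cap - total)) > 0)
        total += (size_t)n;
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Calling child_stdout_via_pipe("hello\n", buf, sizeof buf - 1) should hand the six bytes back to the parent, even though the child closed both fd[0] and fd[1] before writing.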

Related

How does this example use of dup work?

I've been wanting to create a child process that forks twice to create two child processes, with the output of one sent to the other.
I found an example here, but I'm confused by the way dup is used and how it works.
i.e.
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
Output is then piped into a second forked process and its pipes are connected like this:
close(0);
dup(fd[0]);
close(fd[0]);
close(fd[1]);
The main relevant lines are these — they form a standard idiom (but it is easier to replace the first two lines with dup2(fd[1], 1)):
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
The dup() function duplicates its argument file descriptor to the lowest-numbered unused file descriptor. The close() closes descriptor 1, and descriptor 0 is still open, so the dup() makes standard output (descriptor 1) refer to the write end of the pipe, fd[1]. The other two close calls correctly close both original descriptors for the pipe. The process shouldn't be reading from the read end of the pipe, fd[0], and standard output is writing to the write end of the pipe, so the original fd[1] descriptor isn't needed any more (and could lead to problems if it was not closed).
So, this is a standard sequence for connecting the write end of a pipe to the standard output of a process. The second sequence is similar, but connects the read end of the pipe to standard input (instead of the write end to standard output).
Generally, when you connect one end of a pipe to either standard input or standard output, the process should close both ends of the original pipe.
I note that there was no error checking, though it is unlikely that anything would go wrong — unless the process had been started with either standard output or standard input closed, contrary to all reasonable expectations.
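As a rough illustration of both sequences with dup2() and some error checking, the sketch below wires two children together. The names produce, consume and pipeline_line_count are invented for this example; the two stage functions stand in for the programs a real pipeline would exec (something like ls and wc -l).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stage 1: stands in for a program such as ls. */
static void produce(void)
{
    const char text[] = "one\ntwo\nthree\n";
    if (write(STDOUT_FILENO, text, sizeof text - 1) == -1)
        _exit(1);
}

/* Hypothetical stage 2: stands in for wc -l; returns the newline count. */
static int consume(void)
{
    char buf[256];
    ssize_t n;
    int lines = 0;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    return lines;
}

/* Runs produce | consume in two children. Returns the line count reported
 * by the consumer through its exit status, or -1 on error. */
int pipeline_line_count(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t writer = fork();
    if (writer == 0) {
        if (dup2(fd[1], STDOUT_FILENO) == -1)  /* replaces close(1); dup(fd[1]); */
            _exit(127);
        close(fd[0]);
        close(fd[1]);
        produce();
        _exit(0);
    }

    pid_t reader = -1;
    if (writer > 0)
        reader = fork();
    if (reader == 0) {
        if (dup2(fd[0], STDIN_FILENO) == -1)   /* replaces close(0); dup(fd[0]); */
            _exit(127);
        close(fd[0]);
        close(fd[1]);
        _exit(consume());
    }

    /* The parent holds both ends too; the reader only sees end-of-file
     * once every copy of the write end is closed. */
    close(fd[0]);
    close(fd[1]);

    int status = 0;
    if (writer > 0)
        waitpid(writer, NULL, 0);
    if (reader == -1 || waitpid(reader, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the three-line text used here, pipeline_line_count() should return 3. If the parent forgot its own close(fd[1]), the consumer would block forever waiting for end-of-file.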

Understanding the Unix dup2 system call?

I'm playing around with the dup2() function to try and get a better grasp of it.
From looking at the manual, it takes two parameters. First is the existing file descriptor and second parameter is the copied file descriptor.
I decided to try and redirect stdout to my write end of the pipe.
Judging by the manual I thought the code should be...
if ((dup2(STDOUT_FILENO, fd[1])) <= 0)
{
printf("error on dup \n");
}
write(STDOUT_FILENO, "Hi \n", 5);
I assumed stdout would now be duplicated to fd[1], so that writing to stdout would write to the write end of the pipe. However, this still prints to the screen. So I assumed the arguments should be fd[1] followed by stdout. Does that mean stdout is now a copy of fd[1], and that's why it's working?
Lastly, if I wanted to write back to the screen, how would I do this in the same process?
The prototype for dup2 is: int dup2(int oldfd, int newfd);
So your code:
dup2(STDOUT_FILENO, fd[1])
copies the stream associated with STDOUT_FILENO (which normally will be 1) to the descriptor in fd[1]. Let's assume you have put the descriptor value 4 in fd[1]; then at the end, 1 and 4 will both point to the "standard output stream", which is usually the terminal tty/pty.
After the call (if successful), fd[1] no longer refers to a pipe. Sounds like you are confusing dup/dup2 functionality with pipe. pipe() creates a descriptor pair with a read and write end. If you then fork, you can connect two processes with the pipe, and after that, a child process with a pipe can dup its pipe to STDIN_FILENO or STDOUT_FILENO such that standard library routines will read/write from those descriptors thinking they are reading/writing to the terminal.
The only thing that makes 0, 1, 2 special is that they are initially opened to a terminal, and that library routines refer to them by number (or by macro: STDIN_FILENO, etc.). The dup calls basically make a second descriptor slot refer to the same underlying open file as the original slot, incrementing its reference count.
Sounds like what you want to do is pass fd[1] in the first argument, and dup it to STDOUT_FILENO in order to connect your pipe to a standard stream.
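For the last part of the question (getting back to the screen in the same process), one common approach is to save a copy of the original stdout with dup() before redirecting and to dup2() it back afterwards. The sketch below assumes that approach; the function name write_via_redirected_stdout is made up, and it uses write() rather than printf() to sidestep stdio buffering.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: temporarily points stdout at a pipe, writes msg
 * through descriptor 1, restores the original stdout, then reads the
 * message back. msg must fit in the pipe buffer, since nobody reads
 * while we write. Returns bytes read, or -1 on error. */
ssize_t write_via_redirected_stdout(const char *msg, char *buf, size_t cap)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int saved = dup(STDOUT_FILENO);          /* remember the screen */
    if (saved == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    /* oldfd first, newfd second: descriptor 1 becomes a copy of fd[1]. */
    if (dup2(fd[1], STDOUT_FILENO) == -1) {
        close(saved);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);

    /* This goes into the pipe, not to the terminal. */
    ssize_t written = write(STDOUT_FILENO, msg, strlen(msg));

    /* Put the original stdout back. This also closes the last write end. */
    int restored = dup2(saved, STDOUT_FILENO);
    close(saved);

    ssize_t n = -1;
    if (written != -1 && restored != -1)
        n = read(fd[0], buf, cap);
    close(fd[0]);
    return n;
}
```

If you print with printf() instead of write(), call fflush(stdout) before each switch, otherwise buffered text can end up on the wrong side of the redirection.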

Why should you close a pipe in linux?

When using a pipe for process-process communication, what is the purpose of closing one end of the pipe?
For example: How to send a simple string between two programs using pipes?
Notice that one side of the pipe is closed in the child and parent processes. Why is this required?
If you connect two processes - parent and child - using a pipe, you create the pipe before the fork.
The fork gives both processes access to both ends of the pipe. This is not desirable.
The reading side is supposed to learn that the writer has finished when it sees an EOF condition. This can only happen once all writing ends are closed. So it is best if the reader closes its own writing FD ASAP.
The writer should close its reading FD so as not to keep too many FDs open and risk hitting a limit on open FDs. Besides, if the only remaining reader then dies, the writer is notified by a SIGPIPE or at least an EPIPE error (depending on how signals are set up). If there are several readers, the writer cannot detect that "the real one" went away; it goes on writing and gets stuck, as the write blocks in the hope that the "unused" reader will read something.
So here is what happens in detail:
The parent process calls pipe() and gets 2 file descriptors: let's call them rd and wr.
The parent process calls fork(). Now both processes have a rd and a wr.
Suppose the child process is supposed to be the reader.
Then
the parent should close its reading end (to avoid wasting FDs and to detect a dying reader properly) and
the child must close its writing end (so that the EOF condition can be detected).
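The steps above can be sketched roughly as follows, with the parent as writer and the child as reader. The function name child_reads_until_eof is invented for the example, and passing the byte count back through the exit status is just a convenient trick for a short message.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent writes msg, the child reads until
 * end-of-file and reports the byte count as its exit status (so keep msg
 * well under 255 bytes). Returns that count, or -1 on error. */
int child_reads_until_eof(const char *msg)
{
    int fd[2];                       /* fd[0] is rd, fd[1] is wr */
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        close(fd[1]);                /* child: without this, read() never returns 0 */
        char buf[64];
        ssize_t n;
        int total = 0;
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            total += (int)n;
        close(fd[0]);
        _exit(n == 0 ? total : 255); /* n == 0 means EOF was seen */
    }

    close(fd[0]);                    /* parent: does not read */
    ssize_t w = write(fd[1], msg, strlen(msg));
    close(fd[1]);                    /* last write end closed: child sees EOF */

    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

If the close(fd[1]) in the child is removed, the child's read loop never ends and the waitpid() in the parent hangs.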
The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them, pretty soon you'll run out of FDs and can't open anything anymore: not pipes, not files, not sockets, ...
Another reason why it can be important to close the pipe is when the closing itself has a meaning to the application. For example, a common use of pipes is to send the errno from a child process to the parent when using fork and exec to launch an external program:
The parent creates the pipe, calls fork to create a child process, closes its writing end, and tries to read from the pipe.
The child process attempts to use exec to run a different program:
If exec fails, for example because the program does not exist, the child writes errno to the pipe, and the parent reads it and knows what went wrong, and can tell the user.
If exec is successful, the pipe is closed without anything being written (provided the child's write end is marked close-on-exec). The read function in the parent returns 0, indicating the pipe was closed, so the parent knows the program was successfully started.
If the parent did not close its writing end of the pipe before trying to read from the pipe this would not work because the read function would never return when exec is successful.
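A minimal sketch of this errno-reporting idiom might look like the following. The function name spawn_and_report is hypothetical, and the sketch assumes the write end is marked FD_CLOEXEC with fcntl(), which is what makes a successful exec close it. For simplicity it also waits for the child in every case.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs path with no arguments. Returns 0 if exec
 * succeeded, the child's errno value if exec failed, or -1 on other errors. */
int spawn_and_report(const char *path)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    /* Assumption of this sketch: the write end is close-on-exec, so a
     * successful exec closes it without the child writing anything. */
    if (fcntl(fd[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        close(fd[0]);
        execl(path, path, (char *)NULL);
        int err = errno;             /* only reached if exec failed */
        ssize_t w = write(fd[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fd[1]);                    /* without this, read() would never return 0 */
    int err = 0;
    ssize_t n = read(fd[0], &err, sizeof err);
    close(fd[0]);
    waitpid(pid, NULL, 0);           /* sketch only: blocks until the program exits */

    if (n == 0)
        return 0;                    /* EOF with no data: exec succeeded */
    return n == (ssize_t)sizeof err ? err : -1;
}
```

For a path that does not exist, the parent should get ENOENT back. A real launcher would return the pid instead of waiting, so that the caller decides when to reap the child.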
Closing unused pipe file descriptors is more than a matter of ensuring that a process doesn't exhaust its limited set of file descriptors: it is essential to the correct use of pipes. We now consider why the unused file descriptors for both the read and write ends of the pipe must be closed.
The process reading from the pipe closes its write descriptor for the pipe, so that, when the other process completes its output and closes its write descriptor, the reader sees end-of-file (once it has read any outstanding data in the pipe).
If the reading process doesn't close the write end of the pipe, then after the other process closes its write descriptor, the reader won't see end-of-file, even after it has read all data from the pipe. Instead, a read() would block waiting for data, because the kernel knows that there is still at least one write descriptor open for the pipe. That this descriptor is held open by the reading process itself is irrelevant; in theory, that process could still write to the pipe, even if it is blocked trying to read.
For example, the read() might be interrupted by a signal handler that writes data to the pipe.
The writing process closes its read descriptor for the pipe for a different reason.
When a process tries to write to a pipe for which no process has an open read descriptor, the kernel sends the SIGPIPE signal to the writing process. By default, this signal kills a process. A process can instead arrange to catch or ignore this signal, in which case the write() on the pipe fails with the error EPIPE (broken pipe). Receiving the SIGPIPE signal or getting the EPIPE error is a useful indication about the status of the pipe, and this is why unused read descriptors for the pipe should be closed.
If the writing process doesn't close the read end of the pipe, then even after the other process closes the read end of the pipe, the writing process will fill the pipe, and a further attempt to write will block indefinitely.
One final reason for closing unused file descriptors is that it is only after all file descriptors are closed that the pipe is destroyed and its resources released for reuse by other processes. At this point, any unread data in the pipe is lost.
~ Michael Kerrisk, The Linux Programming Interface
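The EPIPE behaviour described in the quote can be observed in a single process. The sketch below is only an illustration; the function name write_with_no_reader is made up, and it ignores SIGPIPE for the duration of the call so that the failed write is reported as an error instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes to a pipe whose only read descriptor has been
 * closed, with SIGPIPE ignored. Returns the errno from write(), 0 if the
 * write unexpectedly succeeded, or -1 on setup failure. */
int write_with_no_reader(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    /* Ignore SIGPIPE so the failure shows up as an error, not a kill. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    if (old == SIG_ERR) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    close(fd[0]);                    /* no read descriptor left anywhere */

    int result = 0;
    if (write(fd[1], "x", 1) == -1)
        result = errno;              /* expected: EPIPE */

    close(fd[1]);
    signal(SIGPIPE, old);            /* restore the previous disposition */
    return result;
}
```

If the close(fd[0]) line is removed, the same write succeeds, because the writer itself still counts as a potential reader. That is the situation the quote warns about.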

Why make stdin, stdout and stderr to a single fd?

I saw this code snippet from APUE
dup2(fd,0);
dup2(fd,1);
dup2(fd, 2);
if (fd > 2)
close(fd);
In my understanding, it makes stdin, stdout and stderr all point to fd. It says that lots of programs contain this code. Why? What is its purpose?
I'm going to add to the comments and answer here because even though they're correct, I would still have a hard time understanding exactly when and why this sequence of calls is needed.
This sequence of function calls is typically used when a process will run as a daemon. In that case, among other things, the daemon doesn't want to have the standard I/O file descriptors attached to the terminal (or other resources). To 'detach' those descriptors, something like the following might occur:
int fd;
fd = open("/dev/null", O_RDWR); // missing from APUE exercise 3.4 example
if (fd != -1)
{
    dup2(fd, 0); // stdin
    dup2(fd, 1); // stdout
    dup2(fd, 2); // stderr
    if (fd > 2) close(fd);
}
What this does is bind /dev/null to each of the standard I/O descriptors and close the temporary descriptor used to open /dev/null in the first place (as long as that open didn't end up using one of the descriptor numbers normally used for standard I/O for some reason).
Now the daemon has valid stdin/stdout/stderr descriptors, but they aren't referring to a file or device that might interfere with another process.
This is mostly used in daemon programs, because a daemon is not connected to a terminal or tty. We therefore need to keep its errors and printed output in one file, and these statements are used for that purpose. File descriptors 0, 1 and 2 are already allocated to the standard streams (stdin, stdout and stderr).
The dup2 function differs from the dup function.
With dup2 there is no need to close a file descriptor that is already in use.
If the file descriptor given as the second argument is already in use,
dup2 itself closes it, without a separate close() call, and makes it a duplicate of the first argument's fd.
After that, the second fd refers to the same file as the first fd and behaves like it.
For example, dup2(fd, 1) makes stdout a copy of fd,
so anything printed to stdout goes to whatever fd refers to.
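The implicit close performed by dup2() can be checked with two pipes. This is only a sketch under that setup, and the function name dup2_closes_target is invented for it.

```c
#include <unistd.h>

/* Hypothetical helper: shows that dup2() closes its second argument first.
 * Returns 1 if both observations hold, 0 if not, -1 on setup failure. */
int dup2_closes_target(void)
{
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int result = -1;
    /* b[1] is open (the write end of pipe b). No close(b[1]) is needed:
     * dup2 closes it and makes it a copy of a[1]. */
    if (dup2(a[1], b[1]) != -1) {
        /* Observation 1: writing to b[1] now lands in pipe a. */
        char c = 0;
        int redirected = write(b[1], "x", 1) == 1
                      && read(a[0], &c, 1) == 1
                      && c == 'x';

        /* Observation 2: pipe b lost its only write end, so its read end
         * reports end-of-file (read returns 0) instead of blocking. */
        int eof_on_b = read(b[0], &c, 1) == 0;

        result = redirected && eof_on_b;
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return result;
}
```

The same mechanism is at work in dup2(fd, 1): whatever descriptor 1 referred to before is closed, and descriptor 1 then refers to the same file as fd.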

Do we need to close the read end of a pipe explicitly whose write end has already been closed?

I have this following scenario.
I create a pipe.
Forked a child process.
Child closes read end of the pipe explicitly and writes into the write end of the pipe and exits without closing anything ( exit should close all open file/pipe descriptors on behalf of the child, I presume).
Parent closes the write end of the pipe explicitly and reads from the read end of the pipe using fgets until fgets returns NULL. ie it reads completely.
Now my question is, why does the parent need to close the read end of the pipe explicitly once it's done reading? Wouldn't it be wise for the system to delete the pipe altogether once all the data has been read from the read end?
I didn't close the read end explicitly in the parent, and sooner or later I get a "Too many file descriptors" error while opening more pipes. My assumption was that the system automatically deletes a pipe once its write end is closed and the data has been completely read from the read end, because you can't read from a pipe twice!
So, what's the rationale behind the system not deleting the pipe once the data has been completely read and the write end closed?
You're correct that the system will close the write end of the pipe once the child exits. However there could be another write end of that pipe open, if the child forks or passes a duplicate of the write end to another process.
It is still true that the system would be able to tell when all the descriptors at one end of a pipe have been closed (either explicitly or because the owning process exited). It still doesn't make sense to close those on the other end of the pipe, as that would lead to confusion when the parent process tries to close the descriptor on its end of the pipe; either:
the fd has been closed by the system, in which case there is an error as it tries to close an already closed fd; or
the fd has been reused, which is even worse as it is now closing a completely unrelated fd.
From the point of view of the system, it might well have discarded the pipe once all the descriptors at one end have been closed, so you don't need to worry about inefficiency there. What matters more is that the user space process should have a consistent experience, which means not closing the descriptor unless it is specifically requested.
File descriptors are not closed by the system, until the process exits. This is true for pipes, as well as any other file descriptor.
There's a big difference between a pipe (or any other file) with no data in it and a closed file descriptor.
When a file descriptor is closed, the system can reuse its number for a new file descriptor. Then, when you read, you get something else. So after you've closed a file descriptor, you must no longer use it.
Now imagine that once there's no more data, the system would automatically close the file descriptor. This would make the number available for reuse, and a subsequent unrelated open may get it. Now the reader, who doesn't know yet that there's no more data, will read from what it thinks is the pipe, but will actually read from another file.
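The number reuse described above is easy to observe. The following sketch (with the invented name fd_number_is_reused) closes the read end of a pipe and then opens an unrelated file; it assumes /dev/null exists and that no other thread opens a file in between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor number is handed out again
 * right after being closed, 0 if not, -1 on setup failure. */
int fd_number_is_reused(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int old = fd[0];
    close(fd[0]);                    /* the number in old is free again */

    /* An unrelated open() gets the lowest free number, which is old here
     * (assuming no other thread opens a file in between). */
    int other = open("/dev/null", O_RDONLY);
    if (other == -1) {
        close(fd[1]);
        return -1;
    }

    /* A stale read(old, ...) would now read /dev/null, not the pipe. */
    int reused = (other == old);

    close(other);
    close(fd[1]);
    return reused;
}
```

This is why a descriptor is only ever closed at the process's request: code that still holds the old number would otherwise silently operate on a different file.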

Resources