Understanding the Unix dup2 system call?

I'm playing around with the dup2() function to try and get a better grasp of it.
From looking at the manual, it takes two parameters: the first is the existing file descriptor and the second is the copied file descriptor.
I decided to try and redirect stdout to the write end of my pipe.
Judging by the manual, I thought the code should be...
if (dup2(STDOUT_FILENO, fd[1]) == -1)
{
printf("error on dup \n");
}
write(STDOUT_FILENO, "Hi \n", 4);
My reasoning was that stdout would now be duplicated to fd[1], so writing to stdout should write to the write end of the pipe. However, this still prints to the screen. So I assumed the order should be fd[1] followed by stdout. Does that mean stdout is now a copy of fd[1], and that's why it works?
Lastly, if I wanted to write back to the screen, how would I do this in the same process?

The prototype for dup2 is: int dup2(int oldfd, int newfd);
So your code:
dup2(STDOUT_FILENO, fd[1])
copies the stream associated with STDOUT_FILENO (which is 1) to the descriptor in fd[1]. Let's assume you have put the descriptor value 4 in fd[1]; then at the end, both 1 and 4 will point to the "standard output stream", which is usually the terminal tty/pty.
After the call (if successful), fd[1] no longer refers to a pipe. It sounds like you are confusing dup/dup2 functionality with pipe. pipe() creates a descriptor pair with a read end and a write end. If you then fork, you can connect two processes with the pipe, and after that, a child process can dup its end of the pipe onto STDIN_FILENO or STDOUT_FILENO so that standard library routines read/write through those descriptors as if they were reading/writing the terminal.
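To see what the call in the question actually does, here is a small sketch (the function name wrong_order_demo is made up for illustration). It performs dup2(STDOUT_FILENO, fd[1]) and then checks two things: fd[1] now refers to the same open file as standard output (compared via the fstat device and inode numbers), and, because the pipe's only write end was silently closed, a read on fd[0] returns 0 (EOF) instead of blocking. It assumes standard output is open when it runs.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo of dup2(STDOUT_FILENO, fd[1]), the order used in the
   question. Returns 1 if fd[1] ends up sharing stdout's open file and the
   pipe reports EOF, 0 if not, -1 on error. */
int wrong_order_demo(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    /* oldfd = 1, newfd = fd[1]: fd[1] (the pipe's write end) is closed
       and reopened as a copy of descriptor 1. */
    if (dup2(STDOUT_FILENO, fd[1]) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    struct stat out_st, fd1_st;
    int same = fstat(STDOUT_FILENO, &out_st) == 0 &&
               fstat(fd[1], &fd1_st) == 0 &&
               out_st.st_dev == fd1_st.st_dev &&
               out_st.st_ino == fd1_st.st_ino;

    /* No write end of the pipe is open any more, so read() returns 0
       (EOF) immediately instead of blocking. */
    char c;
    ssize_t n = read(fd[0], &c, 1);

    close(fd[0]);
    close(fd[1]);   /* drops only the extra copy; descriptor 1 stays open */
    return same && n == 0;
}
```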
The only thing that makes 0, 1, 2 special is that they are initially opened to a terminal, and that library routines refer to them by number (or by macros such as STDIN_FILENO). The dup calls basically increment the reference count of the underlying open file and make the new descriptor slot point to the same open file as the original slot.
Sounds like what you want to do is pass fd[1] in the first argument, and dup it to STDOUT_FILENO in order to connect your pipe to a standard stream.
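A sketch of the corrected direction, which also covers the last part of the question (getting back to the screen in the same process): save a copy of descriptor 1 with dup() before redirecting, then dup2() the saved copy back onto STDOUT_FILENO. The helper name write_while_redirected is hypothetical, and it assumes the message is small enough to fit in the pipe's buffer, since nothing reads from the pipe until stdout has been restored.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to STDOUT_FILENO while descriptor 1 is
   redirected into a pipe, restores the original stdout, then copies what
   arrived in the pipe into buf (NUL-terminated). msg must be small enough
   to fit in the pipe buffer. Returns bytes captured, or -1 on error. */
ssize_t write_while_redirected(const char *msg, char *buf, size_t bufsize)
{
    int fd[2];
    if (bufsize == 0 || pipe(fd) == -1)
        return -1;

    int saved = dup(STDOUT_FILENO);            /* remember the screen */
    if (saved == -1 || dup2(fd[1], STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);             /* descriptor 1 now holds the write end */

    ssize_t w = write(STDOUT_FILENO, msg, strlen(msg));   /* into the pipe */

    /* Back to the screen: descriptor 1 becomes a copy of the saved one.
       This also drops the last write end of the pipe. */
    dup2(saved, STDOUT_FILENO);
    close(saved);

    ssize_t n = (w == -1) ? -1 : read(fd[0], buf, bufsize - 1);
    buf[n > 0 ? n : 0] = '\0';
    close(fd[0]);
    return n;
}
```

The raw write() is deliberate: with printf the text could sit in stdio's buffer and be flushed only after stdout had been restored, so it would land on the screen instead of in the pipe. If you use stdio, call fflush(stdout) before each dup2().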

Related

How does this example use of dup work?

I want to create a process that forks twice to create two child processes, with the output of one sent to the other.
I found an example, but I'm confused by the way dup is used and how it works.
Specifically:
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
Output is then piped into a second forked process, and its pipes are connected like this:
close(0);
dup(fd[0]);
close(fd[0]);
close(fd[1]);
The main relevant lines are these — they form a standard idiom (but it is easier to replace the first two lines with dup2(fd[1], 1)):
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
The dup() function duplicates its argument file descriptor to the lowest-numbered unopen file descriptor. The close() call closes descriptor 1 and, since descriptor 0 is still open, dup() returns 1, making standard output refer to the write side of the pipe, fd[1]. The other two close calls correctly close both original ends of the pipe. The process shouldn't be reading from the read end of the pipe fd[0], and standard output is now writing to the write end of the pipe, so the other descriptor isn't needed any more (and could lead to problems if it were not closed).
So, this is a standard sequence for connecting the write end of a pipe to the standard output of a process. The second sequence is similar, but connects the read end of the pipe to standard input (instead of the write end to standard output).
Generally, when you connect one end of a pipe to either standard input or standard output, the process should close both ends of the original pipe.
I note that there was no error checking, though it is unlikely that anything would go wrong — unless the process had been started with either standard output or standard input closed, contrary to all reasonable expectations.
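The same idiom with error checking added, as a sketch. A forked child frees slot 1, checks that dup() really returned 1 (it would not if descriptor 0 had also been closed, the case mentioned above), closes both original pipe descriptors, and writes through standard output; the parent reads the text from the pipe. The function name child_stdout_via_dup and the message "hello\n" are illustrative only.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child connects its stdout to the pipe with
   close(1); dup(fd[1]); and writes "hello\n"; the parent reads it into
   buf (NUL-terminated). Returns bytes read, or -1 on error. */
ssize_t child_stdout_via_dup(char *buf, size_t bufsize)
{
    int fd[2];
    if (bufsize == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                        /* child */
        close(1);                          /* slot 1 is now free */
        if (dup(fd[1]) != 1)               /* lowest free slot must be 1 */
            _exit(1);
        close(fd[0]);                      /* not reading */
        close(fd[1]);                      /* descriptor 1 keeps it open */
        static const char msg[] = "hello\n";
        if (write(1, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1))
            _exit(1);
        _exit(0);
    }

    close(fd[1]);                          /* parent only reads */
    ssize_t total = 0, n;
    while ((size_t)total < bufsize - 1 &&
           (n = read(fd[0], buf + total, bufsize - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fd[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return total;
}
```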

Clarification on how pipe() and dup2() work in C

I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
close(fd[1]);
dup2(fd[0], 0);
/* call to execve() here */
} else { /* child code */
close(fd[0]);
dup2(fd[1], 1);
}
I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:
Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just copy STDIN to point at the read end of the pipe and STDOUT to point to the write end of the pipe. What I'm guessing is happening is that STDIN is doing the blocking after dup2(fd[0], 0) is executed. Is this correct?
From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: A read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guess work. Should I be adding explicit calls to close the remaining two ends, like this?
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
close(fd[1]);
dup2(fd[0], 0);
/* call to execve() here */
close(fd[0]);
} else { /* child code */
close(fd[0]);
dup2(fd[1], 1);
close(fd[1]);
}
This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?
Nothing in the code you've provided blocks. fork, dup2, and close all return immediately. The code does not pause execution anywhere in the lines you've posted. If you're observing any waiting or hanging, it's elsewhere in your code (e.g. in a call to waitpid, select or read).
Each process has its own file descriptor table. The file objects are global between all processes (and a file in the file system may be open multiple times, with different file objects representing it), but the file descriptors are per-process, a way for each process to reference the file objects. So a file descriptor like "1" or "2" only has meaning in your process: "file number 1" and "file number 2" probably mean something different to another process. But it's possible for processes to reference the same file object (although each might have a different number for it).
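That is also why there are two sets of flags: file descriptor flags such as FD_CLOEXEC (set with fcntl F_SETFD), which belong to a single descriptor and are not shared even with its duplicates, and file status flags such as O_NONBLOCK (set with fcntl F_SETFL), which belong to the file object and are therefore shared by every descriptor referring to it, even across processes.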
Unless you do something weird like freopen on stdin/stdout/stderr (rare), they're just wrappers around fds 0, 1, 2. When you want to write raw bytes, call write with the file descriptor number; if you want to write formatted strings, call fprintf with stdout or stderr. They go to the same place (though stdio buffers its output, so call fflush before mixing the two).
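No implicit closing is done while your process runs; you're just getting away with it (the kernel closes whatever is left when the process exits, and exec closes only descriptors marked FD_CLOEXEC). Yes, you should close file descriptors when you're done with them. Technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure: if pipe() happened to return 0 (for example because stdin was closed), dup2(fd[0], 0) would be a no-op and closing fd[0] would close your new standard input.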
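The guard generalizes to a small "move this descriptor" helper, sketched below under the hypothetical name move_fd. It skips the close() when the descriptor is already in place, which is exactly the case the fd[0] != 0 check protects against.

```c
#include <unistd.h>

/* Hypothetical helper: make `to` refer to what `from` refers to, then
   close `from`. When from == to the descriptor is already in place, and
   closing it would undo the redirection, so nothing is done.
   Returns 0 on success, -1 on error. */
int move_fd(int from, int to)
{
    if (from == to)
        return 0;
    if (dup2(from, to) == -1)
        return -1;
    return close(from);
}
```

In the question's parent branch this would be move_fd(fd[0], 0) after close(fd[1]), and in the child move_fd(fd[1], 1) after close(fd[0]).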
Nope, there's nothing written to disk. It's a memory backed file, which means that the buffer doesn't get stored anywhere. When you write to a "regular" file on the disk, the written data is stored by the kernel in a buffer, and then passed on to the disk as soon as possible to commit. When you write to a pipe, it goes to a kernel-managed buffer just the same, but it won't normally go to disk. It just sits there until it's read by the reading end of the pipe, at which point the kernel discards it rather than saving it.
The pipe has a read end and a write end: written data always goes at the end of the buffer, and data that's read is taken from the head of the buffer and removed. So there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first at the other end. If the tap at the far end is closed (the process is not reading), then once the buffer fills up you can't push (write) more data into your end of the pipe. If no data is being written and the pipe empties, you have to wait when reading until more data comes through.
First of all, you usually call execve or one of its sister calls in the child process, not in the parent. Remember that fork() returns the child's PID to the parent, so the parent can keep track of (and wait for) the child it spawned, while the child just gets 0.
Underneath, a pipe is really a buffer handled by the operating system in such a way that a write to it blocks if the buffer is full and a read from it blocks if there is nothing to read. This is where the blocking you experience comes from.
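The buffer is finite, which is why a writer blocks when nobody reads. A sketch that makes this visible without hanging: set O_NONBLOCK on the write end with fcntl(), so that a write which would block fails with EAGAIN instead, and count how many bytes fit. The function name pipe_capacity is made up; the exact number is platform-dependent (pipe(7) documents 65,536 bytes as the default on a typical current Linux system).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fills a fresh pipe one byte at a time with no
   reader active and returns how many bytes fit, or -1 on error. */
long pipe_capacity(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    /* O_NONBLOCK is a file status flag on the write end's open file. */
    int flags = fcntl(fd[1], F_GETFL);
    if (flags == -1 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    long total = 0;
    for (;;) {
        ssize_t n = write(fd[1], "x", 1);
        if (n == 1) {
            total++;
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;          /* full: a blocking write would wait here */
        } else {
            total = -1;     /* unexpected error */
            break;
        }
    }
    close(fd[0]);
    close(fd[1]);
    return total;
}
```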
In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being awoken intermittently, even for smallish amounts of data, say in the order of tens of kilobytes. Now in many cases the reading process gets its input in a single shot.

Breaking down shell scripts; What happens under the hood?

So, I was given this one line script:
echo test | cat | grep test
Could you please explain to me how exactly that would work given the following system calls: pipe(), fork(), exec() and dup2()?
I am looking for a general overview here, and mainly the sequence of operations.
What I know so far is that the shell will fork using fork() and the script's code will replace the shell's one by using the exec(). But what about pipe and dup2? How do they fall in place?
Thanks in advance.
First consider a simpler example, such as:
echo test | cat
What we want is to execute echo in a separate process, arranging for its standard output to be diverted into the standard input of the process executing cat. Ideally this diversion, once setup, would require no further intervention by the shell — the shell would just calmly wait for both processes to exit.
The mechanism to achieve that is called the "pipe". It is an interprocess communication device implemented in the kernel and exported to user space. Once created by a Unix program, a pipe has the appearance of a pair of file descriptors with the peculiar property that, if you write into one of them, you can read the same data from the other. This is not very useful within the same process, but keep in mind that file descriptors, including but not limited to pipes, are inherited across fork() and even across exec(). This makes the pipe an easy to set up and reasonably efficient IPC mechanism.
The shell creates the pipe, and now owns a set of file descriptors belonging to the pipe, one for reading and one for writing. These file descriptors are inherited by both forked subprocesses. Now, if only echo were writing to the pipe's write-end descriptor instead of to its actual standard output, and if only cat were reading from the pipe's read-end descriptor instead of from its standard input, everything would work. But they don't, and this is where dup2 comes into play.
dup2 duplicates a file descriptor as another file descriptor, automatically closing the new descriptor beforehand. Its prototype is int dup2(int oldfd, int newfd), so the descriptor being replaced comes second. For example, dup2(15, 1) will close file descriptor 1 (by convention used for the standard output) and reopen it as a copy of file descriptor 15, meaning that writing to the standard output will in fact be equivalent to writing to file descriptor 15. The same applies to reading: dup2(8, 0) will make reading from file descriptor 0 (the standard input) equivalent to reading from file descriptor 8. If we then close the original file descriptor, the open file (or pipe) will have been effectively moved from the original descriptor to the new one, much like sci-fi teleports that work by first duplicating a piece of matter at a remote location and then disintegrating the original.
If you're still following the theory, the order of operations performed by the shell should now be clear:
The shell creates a pipe and then forks two processes, both of which inherit the pipe file descriptors, r and w.
In the subprocess about to execute echo, the shell calls dup2(w, 1); close(w); close(r) before exec in order to redirect the standard output to the write end of the pipe and drop the pipe descriptors echo doesn't otherwise need.
In the subprocess about to execute cat, the shell calls dup2(r, 0); close(r); close(w) in order to redirect the standard input to the read end of the pipe. Closing w here matters: if cat kept a copy of the write end, it would never see EOF on its own input.
After forking, the main shell process must itself close both ends of the pipe. One reason is to free up resources associated with the pipe once the subprocesses exit. The other is to allow cat to actually terminate: a pipe's reader receives EOF only after all copies of the write end of the pipe are closed. In the steps above, we closed the child's redundant copy of the write end (file descriptor 15 in the earlier example) right after duplicating it to 1. But the shell still holds its own copy of that descriptor under the same number, and only the shell can close it. Failing to do so leaves cat's standard input never reporting EOF, and the cat process hangs as a consequence.
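The EOF rule can be checked in a single process, as a sketch: an extra dup() of the write end stands in for the copy the shell forgot to close (the function name is hypothetical). The read end is switched to O_NONBLOCK so that "no data, but a writer still exists" shows up as -1 with EAGAIN instead of hanging the way cat would.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the reader sees "no data yet" while any
   write end is open and EOF only after the last one is closed,
   0 otherwise, -1 on error. */
int eof_needs_all_writers_closed(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int extra = dup(fd[1]);          /* stands in for the shell's copy */
    int flags = fcntl(fd[0], F_GETFL);
    if (extra == -1 || flags == -1 ||
        fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        if (extra != -1)
            close(extra);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    char c;
    close(fd[1]);                    /* one writer closed, one left */
    ssize_t before = read(fd[0], &c, 1);
    int no_eof_yet = before == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);

    close(extra);                    /* the last writer is gone */
    ssize_t after = read(fd[0], &c, 1);   /* 0 means EOF */

    close(fd[0]);
    return no_eof_yet && after == 0;
}
```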
This mechanism is easily generalized to three or more processes connected by pipes. In the case of three processes, the pipes need to arrange that echo's output writes to cat's input, and cat's output writes to grep's input. This requires two calls to pipe(), three calls to fork(), four calls to dup2() (one each for echo and grep, two for cat), close() calls in each child for every pipe descriptor it inherited, three calls to exec(), and four additional calls to close() in the shell (two for each pipe).
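A sketch of the three-process case for echo test | cat | grep test, assuming echo, cat and grep can be found on PATH. The helper spawn is hypothetical, not a library function: in each child it dup2()s the chosen pipe ends onto descriptors 0 and 1, closes all four original pipe descriptors, and calls execvp(). The parent then closes its own copies and waits for all three children.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, wire in_fd to stdin and out_fd to stdout
   (-1 means "leave as is"), close every pipe fd in close_fds, exec argv.
   Returns the child's pid to the parent, or -1 if fork failed. */
static pid_t spawn(char *const argv[], int in_fd, int out_fd,
                   const int *close_fds, int nclose)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                       /* parent, or -1 on error */
    if (in_fd != -1 && dup2(in_fd, STDIN_FILENO) == -1)
        _exit(127);
    if (out_fd != -1 && dup2(out_fd, STDOUT_FILENO) == -1)
        _exit(127);
    for (int i = 0; i < nclose; i++)      /* drop all original pipe fds */
        close(close_fds[i]);
    execvp(argv[0], argv);
    _exit(127);                           /* only reached if exec failed */
}

/* Returns grep's exit status (0 when "test" matched), or -1 on error. */
int run_echo_cat_grep(void)
{
    int a[2], b[2];                       /* a: echo -> cat, b: cat -> grep */
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int all[4] = { a[0], a[1], b[0], b[1] };

    char *echo_argv[] = { "echo", "test", NULL };
    char *cat_argv[]  = { "cat", NULL };
    char *grep_argv[] = { "grep", "test", NULL };

    pid_t p1 = spawn(echo_argv, -1,   a[1], all, 4);
    pid_t p2 = spawn(cat_argv,  a[0], b[1], all, 4);
    pid_t p3 = spawn(grep_argv, b[0], -1,   all, 4);

    /* The shell must close its own copies, or cat and grep never see EOF. */
    for (int i = 0; i < 4; i++)
        close(all[i]);

    int status = -1, st;
    if (p1 > 0)
        waitpid(p1, &st, 0);
    if (p2 > 0)
        waitpid(p2, &st, 0);
    if (p3 > 0 && waitpid(p3, &st, 0) == p3 && WIFEXITED(st))
        status = WEXITSTATUS(st);
    return (p1 > 0 && p2 > 0) ? status : -1;
}
```

If the parent's closing loop is removed, cat and grep keep waiting for an EOF that never comes, and the final waitpid() hangs.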

Why make stdin, stdout and stderr to a single fd?

I saw this code snippet from APUE
dup2(fd,0);
dup2(fd,1);
dup2(fd, 2);
if (fd > 2)
close(fd);
In my understanding, it makes stdin, stdout and stderr all point to fd. The book says that lots of programs contain this code. Why? What is its purpose?
I'm going to add to the comments and answer here because, even though they're correct, I would still have a hard time understanding exactly when and why this sequence of calls is needed.
This sequence of function calls is typically used when a process will run as a daemon. In that case, among other things, the daemon doesn't want to have the standard I/O file descriptors attached to the terminal (or other resources). To 'detach' those descriptors, something like the following might occur:
int fd;
fd = open("/dev/null",O_RDWR); // missing from APUE exercise 3.4 example
if (fd != -1)
{
dup2 (fd, 0); // stdin
dup2 (fd, 1); // stdout
dup2 (fd, 2); // stderr
if (fd > 2) close (fd);
}
What this does is bind /dev/null to each of the standard I/O descriptors and close the temporary descriptor used to open /dev/null in the first place (as long as that open didn't end up returning one of the descriptors normally used for standard I/O, which happens when one of them was already closed, since open returns the lowest free descriptor).
Now the daemon has valid stdin/stdout/stderr descriptors, but they aren't referring to a file or device that might interfere with another process.
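A sketch of the same sequence with error checking, plus a check that runs it in a forked child whose standard input starts out closed. In that case open() returns 0, dup2(0, 0) is a harmless no-op, and the fd > 2 guard keeps the descriptor from being closed. The function names are hypothetical; the check compares fstat() results for descriptors 0, 1 and 2 against stat("/dev/null").

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The APUE-style sequence with error checks. Returns 0 on success. */
int redirect_stdio_to_devnull(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd == -1)
        return -1;
    for (int target = 0; target <= 2; target++) {
        if (dup2(fd, target) == -1) {     /* dup2(fd, fd) is a no-op */
            if (fd > 2)
                close(fd);
            return -1;
        }
    }
    if (fd > 2)            /* fd may itself be 0, 1 or 2 */
        close(fd);
    return 0;
}

/* Does descriptor fd refer to the same file as /dev/null? */
static int is_devnull(int fd)
{
    struct stat a, b;
    return fstat(fd, &a) == 0 && stat("/dev/null", &b) == 0 &&
           a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Runs the redirection in a child with stdin closed beforehand.
   Returns 1 if 0, 1 and 2 all ended up on /dev/null, 0 if not, -1 on error. */
int devnull_check_in_child(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        close(STDIN_FILENO);              /* so open() hands back 0 */
        int ok = redirect_stdio_to_devnull() == 0 &&
                 is_devnull(0) && is_devnull(1) && is_devnull(2);
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```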
This is mostly used in daemon programs, because a daemon is not connected to a terminal or tty, so its error messages and printed output have to be sent to one file instead. That is why these statements are used. File descriptors 0, 1 and 2 are already allocated to the standard streams stdin, stdout and stderr.
The dup2 function is different from the dup function.
With dup2 we don't need to close the target file descriptor ourselves.
If the descriptor given as the second argument is already in use,
dup2 closes it without a separate close() call and makes it a duplicate of the first argument.
After that, the second descriptor refers to the same open file as the first and does the first fd's work.
For example, dup2(fd, 1) means standard output now refers to whatever fd refers to:
everything the program prints to stdout goes to fd's file.

Confusion regarding usage of dup()

When we use dup to redirect STDOUT to a pipe we do:
close(1); dup(fd[1]);
close(fd[0]);
close(fd[1]);
execlp("ls", "ls", "-al", (char *) NULL);
But we are closing both ends of the pipe. Then how can STDOUT be written to the pipe?
You're not closing both ends of the pipe in the example code. You're closing fd[0] and fd[1]. Initially, closing those would have been enough to close both ends of the pipe, but not after you duplicated fd[1]. You'd have to close the duplicated fd too to drop all your references to the pipe. That would be silly though: you're keeping that end open precisely so that ls can write to it.
Perhaps your confusion is about what close() is closing? It closes the fd, a reference to one of the ends of the pipe. It doesn't close the pipe itself, and there is no call that does (shutdown() only applies to sockets). The kernel destroys the pipe once every fd referring to it, in any process, has been closed. So, because the duplicated fd is still open, the process can write to the pipe (which still exists, because only two of its three references were closed).
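To make the reference counting concrete, here is a sketch (function name hypothetical) that uses the read end instead of standard output: the read end is duplicated and the original closed, yet data still flows through the pipe. Only after the last read descriptor is closed does a write fail, with EPIPE. SIGPIPE is ignored for that write, because by default it would kill the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the pipe keeps working while a duplicate
   read end survives and writes fail with EPIPE once it is closed. */
int pipe_refcount_demo(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int rd = dup(fd[0]);               /* second reference to the read end */
    if (rd == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[0]);                      /* the pipe is still readable via rd */

    char c = 0;
    int flows = write(fd[1], "x", 1) == 1 && read(rd, &c, 1) == 1 && c == 'x';

    close(rd);                         /* last read end is gone */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    int broken = write(fd[1], "x", 1) == -1 && errno == EPIPE;
    if (old != SIG_ERR)
        signal(SIGPIPE, old);          /* restore the previous disposition */

    close(fd[1]);
    return flows && broken;
}
```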
Because once file descriptor number 1 (i.e. standard output) is closed, that number is available for the next dup or open call.
You should check the result of your close and dup syscalls.
Of course, closing both ends of a pipe is nonsense unless you do something useful first (i.e. reading or writing on the appropriate ends).
See open(2), dup(2), pipe(2), close(2) man pages. Read the Advanced Linux Programming book.

Resources