How does this example use of dup work? - c

I've been wanting to create a process that forks twice to create two child processes, with the output of one sent to the other.
I found an example here, but I'm confused by the way dup is used and how it works, i.e.:
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
Output is then piped into a second forked process, and its pipes are connected like this:
close(0);
dup(fd[0]);
close(fd[0]);
close(fd[1]);

The main relevant lines are these — they form a standard idiom (but it is easier to replace the first two lines with dup2(fd[1], 1)):
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
The dup() function duplicates its argument to the lowest-numbered file descriptor that is not currently open. Since close(1) has just freed descriptor 1 and descriptor 0 is still open, dup(fd[1]) returns 1, so standard output now refers to the write end of the pipe. The other two close calls correctly close both original pipe descriptors. The process shouldn't be reading from the read end of the pipe, fd[0], and standard output is now writing to the write end, so fd[1] isn't needed any more (and leaving it open could lead to problems).
So, this is a standard sequence for connecting the write end of a pipe to the standard output of a process. The second sequence is similar, but connects the read end of the pipe to standard input (instead of the write end to standard output).
Generally, when you connect one end of a pipe to either standard input or standard output, the process should close both ends of the original pipe.
I note that there was no error checking, though it is unlikely that anything would go wrong — unless the process had been started with either standard output or standard input closed, contrary to all reasonable expectations.
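For illustration, here is a minimal sketch of the first sequence with the missing checks added. The helper name capture_child_stdout is hypothetical (it is not part of the original example), and the sketch assumes the process starts with descriptor 0 open, as discussed above. The child runs the close(1)/dup() idiom and writes to its standard output; the parent reads the bytes back from the pipe.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child runs the close(1)/dup() idiom and writes msg
   to its standard output; the parent reads it back from the pipe.
   Returns the number of bytes captured, or -1 on failure. */
ssize_t capture_child_stdout(const char *msg, char *buf, size_t cap)
{
    int fd[2];
    if (cap == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        close(1);                /* free descriptor 1 */
        if (dup(fd[1]) != 1)     /* lowest free number should now be 1 */
            _exit(2);
        close(fd[0]);            /* the child never reads from the pipe */
        close(fd[1]);            /* descriptor 1 is the only copy it needs */
        if (write(1, msg, strlen(msg)) < 0)
            _exit(3);
        _exit(0);
    }

    close(fd[1]);                /* parent keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fd[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Checking that dup() really returned 1 is what catches the unlikely case where standard input was closed at startup.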

Related

Communicating across child processes with a pipe

I have been tasked with creating my own shell in C. I am to use fork(), pipe(), exec(), and wait() to achieve this. I have a good start, but the more I research pipes, the more confused I get. Every example of piping to a child process looks like this:
I completely understand this. I have implemented it before. My problem is with how simple the example is. In creating a shell, I need two children to communicate with each other through a pipe in order to run a command like "cat file | grep hello". I can imagine a few ways of doing this. This was my first idea:
This doesn't seem to work. It could just be that my code is flawed, but I suspect my understanding of pipes and file descriptors is insufficient. I figured that since pipe was called in main() and fd[] is a file variable, this strategy should work. The Linux manual states "At the time of fork() both memory spaces have the same content." Surely my child processes can access the pipe through the same file descriptors.
Is there a flaw in my understanding? I could try to make the processes run concurrently like so:
But I'm not sure why this would behave differently.
Question: If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?
Most examples online show that each process needs to close the end of the pipe that it is not using. However, occasionally I see an example that closed both ends of the pipe in both processes:
close(fd[1]);
dup2(fd[0], STDIN_FILENO);
close(fd[0]);
As best I can tell, dup2 duplicates the file descriptor, making 2 open file descriptors to the same file. If I don't close BOTH, then execvp() continues to expect input and never exits. This means that when I am done with the reading, I should close(stdin).
Question: With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?
If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?
No.
By default, writing to a pipe is a blocking action. That is, writing to a pipe will block execution of the calling process until there is enough room in the pipe to write the requested data.
The responsibility is on the reading side to drain the pipe to make room, or close their side of the pipe to signal they no longer wish to receive data.
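A small sketch of that point, assuming the message is short enough to fit well inside the pipe's buffer (otherwise the write would block, as described above). The helper name pipe_roundtrip is made up for this illustration. It writes to a pipe before anyone reads, then reads afterwards and finds the data intact.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a pipe while no reader is active,
   then read it back. Returns the number of bytes read, or -1 on failure. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fd[2];
    size_t len = strlen(msg);
    if (cap == 0 || pipe(fd) == -1)
        return -1;

    /* Nobody is reading yet; the kernel buffers the bytes. */
    ssize_t written = write(fd[1], msg, len);
    close(fd[1]);                        /* done writing */

    ssize_t n = -1;
    if (written == (ssize_t)len)
        n = read(fd[0], out, cap - 1);   /* the data is still there */
    close(fd[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```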
With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?
Each process involved will have its own copy of the file descriptors.
As such, the parent process should close both ends of the pipe (after both forks), since it has no reason to hold onto those file descriptors. Failing to do so could result in the parent process running out of file descriptors (ulimit -n).
Your understanding of dup2 appears to be correct.
close(fd[1]);
dup2(fd[0], STDIN_FILENO);
close(fd[0]);
Both ends of the pipe are closed because after dup2 the file descriptor usually associated with stdin now refers to the same file description that the file descriptor for the read end of the pipe does.
stdin is of course closed when the replacement process image (exec*) exits.
Your second example of forking two processes, where they run concurrently, is the correct understanding.
In your typical shell, piped commands run concurrently. Otherwise, as stated earlier, the writer may fill the pipe and block before completing its task.
Generally, the parent waits for both processes to finish.
Here's a toy example. Run as ./program FILE STRING to emulate cat FILE | grep STRING.
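#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s FILE STRING\n", argv[0]);
        return 1;
    }
    int fds[2];
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }
    pid_t left = fork();
    if (0 == left) {
        /* Left child: stdout becomes the write end of the pipe. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execlp("cat", "cat", argv[1], (char *)NULL);
        _exit(1); /* reached only if exec fails */
    }
    pid_t right = fork();
    if (0 == right) {
        /* Right child: stdin becomes the read end of the pipe. */
        close(fds[1]);
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        execlp("grep", "grep", argv[2], (char *)NULL);
        _exit(1); /* reached only if exec fails */
    }
    /* Parent: close both ends so grep sees EOF once cat exits. */
    close(fds[0]);
    close(fds[1]);
    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
    return 0;
}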

Why parent process has to close all file descriptors of a pipe before calling wait( )?

I do not know why the parent process needs to close both the file descriptors of a pipe before calling wait()?
I have a C program which does:
Parent creates child_a, which executes ls -l using execvp, and writes to the pipe (after closing read end of pipe).
Parent creates another child (without closing any file descriptor for the pipe), called child_b, which executes 'wc' by reading from the pipe (after closing the write end of the pipe).
Parent waits for both children to complete by calling wait() twice.
I noticed that program is blocked if parent does not close both file descriptors of the pipe before calling the wait() syscall. Also after reading few questions already posted online it looks like this is the general rule and needs to be done. But I could not find the reason why this has to be done?
Why does wait() not return if the parent does not close the file descriptors of the pipe?
I was thinking that, in the worst case, if the parent does not close the file descriptor of pipe, then the only consequence would be that the pipe would keep existing (which is a waste of resource). But I never thought this would block the execution of child process (as can be seen because wait() does not return).
Also remember, parent is not using the pipe at all. It is child_a writing in the pipe, and child_b reading from the pipe.
If the parent process doesn't close the write ends of the pipes, the child processes never get EOF (zero bytes read) because there's a process that might (but won't) write to the pipe. The child process must also close the write end of the pipe for the same reason — if it doesn't, there's a process (itself) that might (but won't) write to the pipe, so the read won't return EOF.
If you duplicate one end of a pipe to standard output or standard error, you should close both ends of that pipe. It is a common mistake not to have enough calls to close() in multiprocess code using pipes. Occasionally, you get away with being sloppy, but the details vary by case and usually you don't.
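The EOF rule can be observed inside a single process. In this sketch the extra descriptor made with dup() stands in for the copy of the write end that a forgetful parent leaves open; the function name is invented for the example, and the read end is made non-blocking only so the demonstration doesn't hang.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if read() reported "no data yet" while a
   spare write descriptor was still open and EOF once it was closed,
   0 if not, and -1 if the setup failed. */
int eof_needs_all_writers_closed(void)
{
    int fd[2];
    char c;
    if (pipe(fd) == -1)
        return -1;

    int spare = dup(fd[1]);      /* stands in for the parent's forgotten copy */
    int flags = fcntl(fd[0], F_GETFL);
    if (spare == -1 || flags == -1 ||
        fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        if (spare != -1)
            close(spare);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    close(fd[1]);                /* the "real" writer is finished */
    ssize_t before = read(fd[0], &c, 1);
    int would_block = (before == -1 &&
                       (errno == EAGAIN || errno == EWOULDBLOCK));

    close(spare);                /* now no write descriptor remains */
    ssize_t after = read(fd[0], &c, 1);   /* 0 means EOF */
    close(fd[0]);
    return would_block && after == 0;
}
```

With a blocking read end (the default), the first read() would simply wait forever, which is the hang described in the question.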

Understanding the Unix dup2 system call?

I'm playing around with the dup2() function to try and get a better grasp of it.
From looking at the manual, it takes two parameters. First is the existing file descriptor and second parameter is the copied file descriptor.
I decided to try and redirect stdout to my write end of the pipe.
Judging by the manual I thought the code should be...
if ((dup2(STDOUT_FILENO, fd[1])) <= 0)
{
    printf("error on dup \n");
}
write(STDOUT_FILENO, "Hi \n", 5);
I expected stdout to now be duplicated to fd[1], so that writing to stdout would write to the write end of the pipe. However, this still prints to the screen. So I assumed the arguments should be fd[1] followed by stdout. Does that mean stdout is now a copy of fd[1], and that's why it works?
Lastly, if I wanted to write back to the screen, how would I do this in the same process?
The prototype for dup2 is: int dup2(int oldfd, int newfd);
So your code:
dup2(STDOUT_FILENO, fd[1])
copies the stream associated with STDOUT_FILENO (which normally will be 1) to the descriptor in fd[1]. Let's assume you have put the descriptor value 4 in fd[1]; then at the end, 1 and 4 will both point to the "standard output stream", which is usually the terminal tty/pty.
After the call (if successful), fd[1] no longer refers to a pipe. Sounds like you are confusing dup/dup2 functionality with pipe. pipe() creates a descriptor pair with a read and write end. If you then fork, you can connect two processes with the pipe, and after that, a child process with a pipe can dup its pipe to STDIN_FILENO or STDOUT_FILENO such that standard library routines will read/write from those descriptors thinking they are reading/writing to the terminal.
The only thing that makes 0, 1, 2 special is that they are initially opened to a terminal, and that library routines refer to them by number (or macro STDIN_FILENO, etc.). The dup calls basically increment the reference count for a particular descriptor and link the underlying descriptor slot to the original slot.
Sounds like what you want to do is pass fd[1] in the first argument, and dup it to STDOUT_FILENO in order to connect your pipe to a standard stream.
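As for writing back to the screen in the same process, one common approach (a sketch, not the only way) is to save a copy of the original standard output with dup() before redirecting, then dup2() it back afterwards. The helper below is hypothetical and exists only to show the argument order: the descriptor you want to copy comes first, the number you want it to occupy comes second.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: temporarily point STDOUT_FILENO at a pipe, write msg
   through it, restore the original stdout, then read msg back from the pipe.
   Returns the number of bytes read back, or -1 on failure. */
ssize_t write_through_redirected_stdout(const char *msg, char *out, size_t cap)
{
    int fd[2];
    if (cap == 0 || pipe(fd) == -1)
        return -1;

    int saved = dup(STDOUT_FILENO);              /* remember the real stdout */
    if (saved == -1 || dup2(fd[1], STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);                /* descriptor 1 is the write end now */

    /* This goes into the pipe, not to the screen. */
    ssize_t written = write(STDOUT_FILENO, msg, strlen(msg));

    /* Put the original stdout back; this also closes the last write end. */
    int restored = dup2(saved, STDOUT_FILENO);
    close(saved);

    ssize_t n = -1;
    if (written >= 0 && restored != -1)
        n = read(fd[0], out, cap - 1);
    close(fd[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

If you mix this with printf, call fflush(stdout) before each dup2 so buffered output ends up on the side you intended.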

Clarification on how pipe() and dup2() work in C

I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);
    /* call to execve() here */
} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
}
I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:
Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just copy STDIN to point at the read end of the pipe and STDOUT to point to the write end of the pipe. What I'm guessing is happening is that STDIN is doing the blocking after dup2(fd[0], 0) is executed. Is this correct?
From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: A read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guess work. Should I be adding explicit calls to close the remaining two ends, like this?
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);
    /* call to execve() here */
    close(fd[0]);
} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
    close(fd[1]);
}
This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?
Nothing in the code you've provided blocks. fork, dup2, and close all operate immediately. The code does not pause execution anywhere in the lines you've printed. If you're observing any waiting or hanging, it's elsewhere in your code (eg. in a call to waitpid or select or read).
Each process has its own file descriptor table. The file objects are global between all processes (and a file in the file system may be open multiple times, with different file objects representing it), but the file descriptors are per-process, a way for each process to reference the file objects. So a file descriptor like "1" or "2" only has meaning in your process -- "file number 1" and "file number 2" probably mean something different to another process. But it's possible for processes to reference the same file object (although each might have a different number for it).
So, technically, that's why there are two sets of flags you can set on file descriptors: the file descriptor flags that aren't shared between processes (FD_CLOEXEC), and the file object flags (such as O_NONBLOCK) that get shared even between processes.
Unless you do something weird like freopen on stdin/stdout/stderr (rare), they're just synonyms for fds 0, 1, 2. When you want to write raw bytes, call write with the file descriptor number; if you want to write pretty strings, call fprintf with stdout or stderr -- they go to the same place.
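The difference between the two kinds of flags can be seen without forking, because dup() creates a second descriptor for the same file object, much as fork() does across processes. This is only a sketch; the function name is invented for the illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if O_NONBLOCK set through one descriptor is
   visible through its duplicate while FD_CLOEXEC is not, 0 if not,
   and -1 if the setup failed. */
int flags_shared_vs_private(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int copy = dup(fd[0]);       /* second descriptor, same file object */
    int fl = fcntl(fd[0], F_GETFL);
    if (copy == -1 || fl == -1 ||
        fcntl(fd[0], F_SETFL, fl | O_NONBLOCK) == -1 ||   /* file object flag */
        fcntl(fd[0], F_SETFD, FD_CLOEXEC) == -1) {        /* descriptor flag */
        if (copy != -1)
            close(copy);
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    int shared = (fcntl(copy, F_GETFL) & O_NONBLOCK) != 0;   /* follows the object */
    int private_flag = (fcntl(copy, F_GETFD) & FD_CLOEXEC) == 0; /* stays per descriptor */

    close(copy);
    close(fd[0]);
    close(fd[1]);
    return shared && private_flag;
}
```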
No implicit closing is done, you're just getting away with it. Yes, you should close file descriptors when you're done with them -- technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure!
Nope, there's nothing written to disk. It's a memory backed file, which means that the buffer doesn't get stored anywhere. When you write to a "regular" file on the disk, the written data is stored by the kernel in a buffer, and then passed on to the disk as soon as possible to commit. When you write to a pipe, it goes to a kernel-managed buffer just the same, but it won't normally go to disk. It just sits there until it's read by the reading end of the pipe, at which point the kernel discards it rather than saving it.
The pipe has a read and write end, so written data always goes at the end of the buffer, and data that's read out gets taken from the head of the buffer then removed. So, there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first from the other end. If the tap at the far end is closed (process not reading) then you can't push (write) more data into your end of the pipe. If the data isn't being written and the pipe empties, you have to wait when reading until more data comes through.
First of all you usually call execve or one of its sister calls in the child process, not in the parent. Remember that a parent knows who its child is, but not vice-versa.
Underneath a pipe is really a buffer handled by the operating system in such a way that it is guaranteed that an attempt to write to it blocks if the buffer is full and that a read to it blocks if there is nothing to read. This is where the blocking you experience comes from.
In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being awoken intermittently, even for smallish amounts of data, say in the order of tens of kilobytes. Now in many cases the reading process gets its input in a single shot.
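To see that the buffer really is finite, here is a rough probe. It puts the write end into non-blocking mode so that a full pipe makes write() fail with EAGAIN instead of blocking, then counts how many bytes went in. The function name is hypothetical, and the figure it returns is system-dependent, so treat it as an observation rather than a guarantee.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical probe: write to a pipe nobody reads from until the kernel
   refuses more data. Returns the number of bytes accepted, or -1 on
   setup failure. */
long fill_pipe_until_full(void)
{
    int fd[2];
    char chunk[1024];
    long total = 0;

    if (pipe(fd) == -1)
        return -1;
    int fl = fcntl(fd[1], F_GETFL);
    if (fl == -1 || fcntl(fd[1], F_SETFL, fl | O_NONBLOCK) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd[1], chunk, sizeof chunk);
        if (n <= 0)
            break;               /* typically -1 with EAGAIN: buffer is full */
        total += n;
    }

    close(fd[0]);
    close(fd[1]);
    return total;
}
```

With a blocking descriptor, the write() that hits the limit is the one that would put the process to sleep until a reader drains the pipe.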

Confusion regarding usage of dup()

When we use dup to redirect STDOUT to a pipe we do:
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
execlp("ls", "ls", "-al", (char *) NULL);
But we are closing both ends of the pipe. Then how can STDOUT be written to the pipe?
You're not closing both ends of the pipe in the example code. You're closing fd[0] and fd[1]. Initially, closing those would have been enough to close both ends of the pipe, but it's not after you duplicated fd[1]. You'd have to close the duplicated fd as well to close all your references to the pipe. That would be silly though: you're keeping one end open precisely so that ls can write to it.
Perhaps your confusion is about what close() is closing? It closes the fd, the reference to one of the ends of the pipe. It doesn't close the pipe itself. (shutdown() does something like that for sockets, but it doesn't apply to pipes; an end of a pipe is closed automatically when every fd referring to it has been closed.) So, because the duplicated fd is still open, the process can write to the pipe (which isn't closed, because only two of the three references to it were closed).
Because once file descriptor number 1 (i.e. standard output) is closed, that number is available for further dup or open calls.
You should check the result of your close and dup syscalls.
Of course, closing both ends of a pipe is nonsense unless you do something useful first (i.e. reading or writing on the appropriate ends).
See open(2), dup(2), pipe(2), close(2) man pages. Read the Advanced Linux Programming book.

Resources