Communicating across child processes with a pipe - c

I have been tasked with creating my own shell in C. I am to use fork(), pipe(), exec(), and wait() to achieve this. I have a good start, but the more I research pipes, the more confused I get. Every example of piping to a child process looks like this:
I completely understand this. I have implemented it before. My problem is with how simple the example is. In creating a shell, I need two children to communicate with each other through a pipe in order to run a command like "cat file | grep hello". I can imagine a few ways of doing this. This was my first idea:
This doesn't seem to work. It could just be that my code is flawed, but I suspect my understanding of pipes and file descriptors is insufficient. I figured that since pipe() was called in main() and fd[] is a file-scope variable, this strategy should work. The Linux manual states "At the time of fork() both memory spaces have the same content." Surely my child processes can access the pipe through the same file descriptors.
Is there a flaw in my understanding? I could try to make the processes run concurrently like so:
But I'm not sure why this would behave differently.
Question: If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?
Most examples online show that each process needs to close the end of the pipe that it is not using. However, occasionally I see an example that closed both ends of the pipe in both processes:
close(fd[1]);
dup2(fd[0], STDIN_FILENO);
close(fd[0]);
As best I can tell, dup2 duplicates the file descriptor, making 2 open file descriptors to the same file. If I don't close BOTH, then execvp() continues to expect input and never exits. This means that when I am done with the reading, I should close(stdin).
Question: With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?

If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?
No.
By default, writing to a pipe is a blocking action. That is, writing to a pipe will block execution of the calling process until there is enough room in the pipe to write the requested data.
The responsibility is on the reading side to drain the pipe to make room, or close their side of the pipe to signal they no longer wish to receive data.
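As a minimal sketch of the point above, the following writes to a pipe before anyone reads from it and only reads afterwards. The function name write_then_read_later is made up for this illustration, and it assumes the message fits in the pipe buffer (typically 64 KiB on Linux); a larger write would block at that point because no reader is draining the pipe yet.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes msg into a fresh pipe, closes the write end,
 * and only then reads. Returns the number of bytes read into out, or -1 on
 * error. Assumes msg fits in the pipe buffer; otherwise write() would block
 * here because nothing is reading yet. */
ssize_t write_then_read_later(const char *msg, char *out, size_t cap) {
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    size_t len = strlen(msg);
    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);   /* writer is done; the bytes stay queued in the kernel */

    ssize_t n = read(fd[0], out, cap);   /* the late reader still gets them */
    close(fd[0]);
    return n;
}
```

The data is held by the kernel for as long as some descriptor for the read end stays open, so the reader does not have to exist at the moment of the write.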
With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?
Each process involved will have its own copy of the file descriptors.
As such, the parent process should close both ends of the pipe (after both forks), since it has no reason to hold onto those file descriptors. Failing to do so could result in the parent process running out of file descriptors (ulimit -n).
Your understanding of dup2 appears to be correct.
close(fd[1]);
dup2(fd[0], STDIN_FILENO);
close(fd[0]);
Both ends of the pipe are closed because after dup2 the file descriptor usually associated with stdin now refers to the same file description that the file descriptor for the read end of the pipe does.
stdin is, of course, closed when the replacement process image (exec*) exits.
Your second example of forking two processes, where they run concurrently, is the correct understanding.
In your typical shell, piped commands run concurrently. Otherwise, as stated earlier, the writer may fill the pipe and block before completing its task.
Generally, the parent waits for both processes to finish.
Here's a toy example. Run as ./program FILE STRING to emulate cat FILE | grep STRING.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int fds[2];

    pipe(fds);

    int left = fork();

    if (0 == left) {
        close(fds[0]);
        dup2(fds[1], fileno(stdout));
        close(fds[1]);
        execlp("cat", "cat", argv[1], NULL);
        return 1;
    }

    int right = fork();

    if (0 == right) {
        close(fds[1]);
        dup2(fds[0], fileno(stdin));
        close(fds[0]);
        execlp("grep", "grep", argv[2], NULL);
        return 1;
    }

    close(fds[0]);
    close(fds[1]);

    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
}

Related

How does this example use of dup work?

I've been wanting to create a child process that forks off twice to create two child processes. With the output of one, sent to the other.
I found an example, but I'm confused by the way dup is used and how it works.
i.e.
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
Output is then piped into a second forked process, and its pipe ends are connected like this:
close(0);
dup(fd[0]);
close(fd[0]);
close(fd[1]);
The main relevant lines are these — they form a standard idiom (but it is easier to replace the first two lines with dup2(fd[1], 1)):
close(1);
dup(fd[1]);
close(fd[0]);
close(fd[1]);
The dup() function duplicates its argument file descriptor to the lowest-numbered unopen file descriptor. The close() closes descriptor 1, and descriptor 0 is still open, so the dup() makes standard output 1 refer to the write side of the pipe fd[1]. The other two close calls correctly close both ends of the pipe. The process shouldn't be reading from the read end of the pipe fd[0] and standard output is writing to the write end of the pipe so the other descriptor isn't needed any more (and could lead to problems if it was not closed).
So, this is a standard sequence for connecting the write end of a pipe to the standard output of a process. The second sequence is similar, but connects the read end of the pipe to standard input (instead of the write end to standard output).
Generally, when you connect one end of a pipe to either standard input or standard output, the process should close both ends of the original pipe.
I note that there was no error checking, though it is unlikely that anything would go wrong — unless the process had been started with either standard output or standard input closed, contrary to all reasonable expectations.
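For comparison, here is a rough sketch of the same idiom written with dup2() and with the error checks the original example lacked. The helper names connect_pipe_end and read_child_stdout are invented for this illustration, and the child simply writes a fixed string instead of exec'ing a program.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: makes `target` (e.g. STDOUT_FILENO) refer to the pipe
 * end `keep`, then closes both original pipe descriptors.
 * Returns 0 on success, -1 on error. */
int connect_pipe_end(int keep, int other, int target) {
    if (dup2(keep, target) == -1) {
        perror("dup2");
        return -1;
    }
    if (keep != target && close(keep) == -1) {
        perror("close");
        return -1;
    }
    if (close(other) == -1) {
        perror("close");
        return -1;
    }
    return 0;
}

/* Demo: the child's standard output becomes the write end of the pipe and
 * the parent reads what the child wrote. Returns bytes read, or -1. */
ssize_t read_child_stdout(char *buf, size_t cap) {
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        static const char msg[] = "hello\n";
        if (connect_pipe_end(fd[1], fd[0], STDOUT_FILENO) == -1)
            _exit(1);
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1))
            _exit(1);
        _exit(0);
    }

    close(fd[1]);                       /* parent only reads */
    ssize_t n = read(fd[0], buf, cap);
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Swapping the roles (keep = fd[0], other = fd[1], target = STDIN_FILENO) gives the second sequence, which connects the read end to standard input.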

Why parent process has to close all file descriptors of a pipe before calling wait( )?

I do not know why the parent process needs to close both file descriptors of a pipe before calling wait().
I have a C program which does:
Parent creates child_a, which executes ls -l using execvp, and writes to the pipe (after closing read end of pipe).
Parent creates another child (without closing any file descriptor for pipe), called child_b, which executes 'wc' by reading from pipe.(after closing write end of pipe).
Parent waits for both children to complete by calling wait() twice.
I noticed that the program blocks if the parent does not close both file descriptors of the pipe before calling the wait() syscall. After reading a few questions already posted online, it looks like this is the general rule and needs to be done, but I could not find the reason why.
Why does wait() not return if the parent does not close the file descriptors of the pipe?
I was thinking that, in the worst case, if the parent does not close the file descriptor of pipe, then the only consequence would be that the pipe would keep existing (which is a waste of resource). But I never thought this would block the execution of child process (as can be seen because wait() does not return).
Also remember, parent is not using the pipe at all. It is child_a writing in the pipe, and child_b reading from the pipe.
If the parent process doesn't close the write ends of the pipes, the child processes never get EOF (zero bytes read) because there's a process that might (but won't) write to the pipe. The child process must also close the write end of the pipe for the same reason — if it doesn't, there's a process (itself) that might (but won't) write to the pipe, so the read won't return EOF.
If you duplicate one end of a pipe to standard output or standard error, you should close both ends of that pipe. It is a common mistake not to have enough calls to close() in multiprocess code using pipes. Occasionally, you get away with being sloppy, but the details vary by case and usually you don't.
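The EOF rule can be observed inside a single process. The sketch below is only an illustration (the function name probe_empty_pipe is made up): it puts the read end in non-blocking mode so that the "would wait forever" case shows up as EAGAIN instead of a hang, and uses dup() to stand in for a forgotten copy of the write end in the parent.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if reading an empty pipe reports EOF,
 * 1 if it reports "no data yet" (EAGAIN), and -1 on an unexpected error. */
int probe_empty_pipe(int leave_extra_writer_open) {
    int fd[2];
    char c;

    if (pipe(fd) == -1)
        return -1;

    /* Non-blocking read end, so a read that would block returns EAGAIN */
    int flags = fcntl(fd[0], F_GETFL);
    if (flags == -1 || fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    int extra = dup(fd[1]);   /* stands in for the parent's unclosed copy */
    if (extra == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);
    if (!leave_extra_writer_open)
        close(extra);

    ssize_t n = read(fd[0], &c, 1);
    int result;
    if (n == 0)
        result = 0;                                   /* EOF: no writers left */
    else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 1;                                   /* a writer still exists */
    else
        result = -1;

    if (leave_extra_writer_open)
        close(extra);
    close(fd[0]);
    return result;
}
```

With a blocking descriptor, the case that returns 1 here is exactly the one where wc would sit in read() forever and the parent's wait() would never return.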

2 children write to pipe, parent reads from pipe in C

I've searched and searched to no avail, so I've finally come to ask for some help.
My assignment involves (in C running on RedHat Linux) creating two child processes, each of which writes characters to a pipe for an arbitrary number of iterations. This pipe is shared with the parent process that created them. The parent reads from the pipe while the children still have characters to write to it. Once the children have exited and the pipe is empty, the parent (main) can then terminate.
Here is a quick & dirty example of the logic of the code I have written.
main()
{
    //create pipe

    fork(); //childA
    //writes to pipe

    fork(); //childB
    //writes to pipe

    //parent reading
    while(condition) {
        //read from pipe + print chars to terminal
    }
}
Now, my question is regarding the condition of the while loop.
I need to read when the children are blocked from writing due to a full pipe, but I cannot figure out what type of condition would allow me to do this. Any help would be absolutely amazing.
Is it a requirement that the pipe needs to be full when you read? Or is it simply a requirement that you keep reading from both children, even if one of the pipes is full so the child is blocked on writing?
I don't know of any standard way to tell if a pipe is full. You could try using the FIONREAD ioctl to determine how much data there is to read on the pipe, and compare that against the pipe's capacity (note that PIPE_BUF is the limit for atomic writes, not the capacity of the pipe), but that may not work properly if the pipe is not yet full and the child is doing a write that does not fit in the remaining space; that call may block without filling up the pipe. I would not rely on this technique working.
The usual way to keep reading from two file descriptors, regardless of which one is ready, is to use the select system call. This allows you to wait until one or the other file descriptor has data available. This means that the parent process won't block trying to read from one child which doesn't have any data available, while the other child is blocking because its buffer is full.
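A minimal sketch of that select() loop, assuming two separate pipes (one per child). The names drain_both and demo_select are invented for this example, and the demo fills the pipes from the same process so that it stays small.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd_a and fd_b until both report EOF,
 * servicing whichever is ready. Returns total bytes read, or -1 on error.
 * The caller still owns (and closes) both descriptors. */
long drain_both(int fd_a, int fd_b) {
    int fds[2] = {fd_a, fd_b};
    int open_count = 2;
    long total = 0;
    char buf[4096];

    while (open_count > 0) {
        fd_set readable;
        int maxfd = -1;

        FD_ZERO(&readable);
        for (int i = 0; i < 2; i++) {
            if (fds[i] != -1) {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }

        /* Blocks until at least one descriptor has data or has hit EOF */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) == -1)
            return -1;

        for (int i = 0; i < 2; i++) {
            if (fds[i] == -1 || !FD_ISSET(fds[i], &readable))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n == -1)
                return -1;
            if (n == 0) {          /* EOF on this pipe: stop watching it */
                fds[i] = -1;
                open_count--;
            } else {
                total += n;
            }
        }
    }
    return total;
}

/* Demo: two pipes with 3 and 2 bytes queued, write ends already closed */
long demo_select(void) {
    int a[2], b[2];
    if (pipe(a) == -1 || pipe(b) == -1)
        return -1;
    if (write(a[1], "abc", 3) != 3 || write(b[1], "de", 2) != 2)
        return -1;
    close(a[1]);
    close(b[1]);

    long total = drain_both(a[0], b[0]);
    close(a[0]);
    close(b[0]);
    return total;
}
```

In the real assignment the write ends would live in the two children; the parent would close its copies of them before entering the loop.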
edit: After re-reading your question, it sounds like there is actually only one pipe, that both children are writing to. Is that correct? Again, the question comes up about whether you are required to wait until the children are blocking. If the parent simply reads from the pipe, it will block until one of the children has written to the pipe; you don't need to do anything special.
If your assignment requires you to wait until the pipe is actually full, I'd be interested in seeing the exact wording, because I'm not sure why you would want to do that.
edit 2: In response to your questions in the comment:
Does the code for my parent process need to follow the code for my two child processes within the program?
No, there is no requirement about the order of the code for the parent or child processes. In order to distinguish the code the parent runs from the code the children run, you check the return value of fork(). If the return value is 0, you are in the child process. If it is not, you are in the parent; if the return value is -1, then there was an error, if it's positive, it is the PID of the child process.
So, you can write code like this:
int pid = fork();
if (pid) {
    // in the parent, check if pid is -1 for errors
} else {
    // in the child
}
Or:
int pid = fork();
if (pid == 0) {
    // in the child, do whatever you need to do and...
    exit(0);
}
// in the parent; since the child calls exit() above, control will never
// reach here in the child. Or you could do an execl() in the child, which
// replaces the current program with another one, so again, control will
// never reach here within the child process.
How can I keep reading from the pipe until the two children terminate AND the pipe is empty?
Just keep reading until read returns 0. read on the read end of the pipe will not return 0 until all processes have closed the write end of the pipe, and all data has been read out of the pipe. If something still has it open, but there is no data in the pipe, read will block until data is written to the pipe.
One gotcha is to remember to close the write end of the pipe in the parent process before trying to read; otherwise, when the parent tries to do a read after the children have finished, it will block waiting for it to close its own pipe.
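Putting those two points together, a sketch of the whole arrangement might look like the following. The function name read_from_two_writers and the 'A'/'B' marker characters are assumptions made for the example; it counts the bytes instead of printing them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: two children each write `iterations` single characters
 * to one shared pipe; the parent reads until EOF.
 * Returns the total number of bytes read, or -1 on error. */
long read_from_two_writers(int iterations) {
    int fd[2];
    const char marks[2] = {'A', 'B'};

    if (pipe(fd) == -1)
        return -1;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0) {
            close(fd[0]);                       /* children only write */
            for (int k = 0; k < iterations; k++)
                if (write(fd[1], &marks[i], 1) != 1)
                    _exit(1);
            close(fd[1]);
            _exit(0);
        }
    }

    /* The gotcha: without this close, read() below never returns 0 */
    close(fd[1]);

    long total = 0;
    char buf[256];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        total += n;                 /* a real program would print buf here */
    close(fd[0]);

    while (wait(NULL) > 0)
        ;                           /* reap both children */

    return n == -1 ? -1 : total;
}
```

The loop needs no explicit condition about the children: read() returning 0 already means that every write end is closed and the pipe is empty.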

Do I have to make a new pipe for every pair of processes in C?

If I have 4 processes that I want to pipe:
process1 | process2 | process3 | process4
do I have to make 3 individual pipes like this
int pipe1[2];
int pipe2[2];
int pipe3[2];
or can I somehow recycle pipe names like in this pseudocode:
int pipe1[2]; // we use ONLY two pipe names: pipe1
int pipe2[2]; // and pipe2
pipe(pipe1); // getting 2 file descriptors here
pipe(pipe2); // and 2 here
for process=1 to 4
    if (process==3)      // getting 2 new file descriptors for
        pipe(pipe1);     // process3|process4 (reusing pipe1)
    fork()               // forking here
    if (child 1) then
        use pipe1
    if (child 2) then
        use pipe1
        use pipe2
    if (child 3) then
        use pipe2
        use pipe1        // the pipe1 that we re-pipe()ed
    if (child 4) then
        use pipe1        // the pipe1 that we re-pipe()ed
Would this work? I am not sure if repiping pipe1 will have an impact on the previous forked processes that used pipe1.
Short answer:
no, "repiping" pipe1 will not have an impact on the previous forked processes that used pipe1, but you're better off declaring 3 pipes and pipe()'ing before fork()'ing.
Long answer:
To understand why, let's first see what happens when you create a "pipe", and then what happens when you "fork" a process.
When you call pipe(), it "creates a pipe (an object that allows unidirectional data flow) and allocates a pair of file descriptors. The first descriptor connects to the read end of the pipe; the second connects to the write end." (This is from the man pipe page)
These file descriptors are stored into the int array you passed into it.
When you call fork(), "The new process (child process) shall be an exact copy of the calling process" (This is from the man fork() page)
In other words, the parent process will create a child process, and that child process will have its own copy of the data.
So when child 3 calls pipe(pipe1), it will be creating a new pipe, and storing the new file descriptors in its own copy of the pipe1 variable, without modifying any other process's pipe1.
Even though you can get away with only declaring two pipe variables and just calling pipe() in child 3, it's not very easy to read, and others (including yourself) will be confused later on when they have to look at your code.
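A small sketch of that claim: the child below calls pipe() on its copy of pipe1 and exits, and the parent then checks that its own pipe1 still holds the same descriptors and that they still work. The function name is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if the parent's pipe1 is untouched after a
 * child re-pipes its own copy, 0 if it changed, and -1 on error. */
int parent_copy_survives_child_repipe(void) {
    int pipe1[2];
    if (pipe(pipe1) == -1)
        return -1;
    int saved[2] = {pipe1[0], pipe1[1]};

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Overwrites only the child's copy of the pipe1 array */
        _exit(pipe(pipe1) == -1 ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The parent's array is unchanged and its descriptors still work */
    char c = 0;
    int ok = pipe1[0] == saved[0] && pipe1[1] == saved[1]
             && write(pipe1[1], "x", 1) == 1
             && read(pipe1[0], &c, 1) == 1 && c == 'x';

    close(pipe1[0]);
    close(pipe1[1]);
    return ok ? 1 : 0;
}
```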
For more on fork()'s and pipe()'s, take a look at http://beej.us/guide/bgipc/output/html/multipage/index.html
The way I've done it in the past, and the way I would do it again, was to not reuse pipes and end up with N-1 pipes. It also depends on whether you want more than two processes running and communicating at the same time; if so, you'd obviously have problems with reusing 2 pipes.
You need one pipe, and thus one call to pipe(), for each | character in your command.
You do not need to use three separate int [2] arrays to store the pipe file descriptors, though. The system does not care what variable you store the pipe file descriptors in - they are just ints.
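One way this could look is a loop that calls pipe() once per | but stores the result in the same int[2] each time, carrying only the previous read end forward. This is a sketch under assumptions: run_pipeline is a made-up name, each command is passed as a NULL-terminated argv array, and error paths are kept short instead of cleaning up every descriptor.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs cmds[0] | cmds[1] | ... | cmds[n-1].
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline(char *const *cmds[], int n) {
    int prev_read = -1;   /* read end of the previous pipe, if any */
    pid_t last = -1;

    for (int i = 0; i < n; i++) {
        int fd[2] = {-1, -1};   /* same variable reused for every pipe */

        if (i < n - 1 && pipe(fd) == -1) {
            perror("pipe");
            return -1;
        }

        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            if (prev_read != -1) {          /* not the first command */
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (fd[1] != -1) {              /* not the last command */
                dup2(fd[1], STDOUT_FILENO);
                close(fd[1]);
                close(fd[0]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror("execvp");
            _exit(127);
        }

        /* Parent: close what it no longer needs, keep the new read end */
        if (prev_read != -1)
            close(prev_read);
        if (fd[1] != -1)
            close(fd[1]);
        prev_read = fd[0];
        last = pid;
    }

    int status = 0, last_status = -1;
    pid_t w;
    while ((w = wait(&status)) > 0) {
        if (w == last && WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    }
    return last_status;
}
```

For the four-process example in the question, this makes three pipe() calls, and the parent holds at most three pipe descriptors at any time.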

Using execl to execute a daemon

I'm writing a program in C on Linux which includes a module that
allows a shell command to be executed on a remote machine. The
easiest way to actually execute the command would of course be to
simply use the system() function, or use popen and then grab the
output. However, I chose to use a more low-level approach due to other design requirements which are not relevant to the current
problem.
So basically, I set up a pipe and fork, and then call execl. This all
works perfectly, except for one annoying exception. It doesn't work
properly if the shell command to be executed is a daemon. In that
case, it just hangs. I can't figure out why. My understanding is
that when a daemon starts, it typically forks and then the parent exits. Since my application has an open pipe to the parent, the call
to read() should fail when the parent exits. But instead
the application just hangs.
Here is some bare bones code that reproduces the problem:
int main(int argc, char** argv)
{
    // Create a pipe and fork
    //
    int fd[2];
    int p = pipe(fd);
    pid_t pid = fork();
    if (pid > 0)
    {
        // Read from the pipe and output the result
        //
        close(fd[1]);
        char buf[1024] = { 0 };
        read(fd[0], buf, sizeof(buf));
        printf("%s\n", buf);

        // Wait for child to terminate
        int status;
        wait(&status);
    }
    else if (pid == 0)
    {
        // Redirect stdout and stderr to the pipe and execute the shell
        // command
        //
        dup2(fd[1], STDOUT_FILENO);
        dup2(fd[1], STDERR_FILENO);
        close(fd[0]);
        execl("/bin/sh", "sh", "-c", argv[1], 0);
    }
}
The code works fine if you use it with a normal shell command. But if
you try to run a daemon, it just hangs instead of returning to the
prompt as it should.
The most probable solution is adding close(fd[1]); above the execl().
The reason why your program hangs is that the read() function waits for the daemon to write something to its stdout/stderr. If the daemon (including the child process of your program, and also the child process' forked children who keep their stdout/stderr) doesn't write anything and there is at least one process holding the writable end of the pipe open, read() will never return. But which is that process, which is holding the writable end of the pipe open? It is most probably the child of your program's child, the long-running daemon process. Although it may have called close(0); and close(1); when daemonizing itself, most probably it hasn't called close(fd[1]);, so the writable end of the pipe is still open.
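A sketch of the child side with that extra close() added, wrapped in a made-up helper named capture_command that reads until EOF instead of doing a single read(). This only removes the stray copy of fd[1]; a daemon that keeps writing to its inherited stdout or stderr would still hold the pipe open, so treat this as an illustration of the suggestion above and not as a complete fix.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `cmd` through /bin/sh with stdout and stderr
 * redirected into a pipe and reads up to cap bytes until EOF.
 * Returns the number of bytes stored in buf, or -1 on error.
 * The buffer is not NUL-terminated by this function. */
ssize_t capture_command(const char *cmd, char *buf, size_t cap) {
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fd[1], STDOUT_FILENO);
        dup2(fd[1], STDERR_FILENO);
        close(fd[0]);
        close(fd[1]);   /* the extra close: no stray write end survives exec */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    close(fd[1]);       /* parent must not hold the write end either */

    size_t used = 0;
    ssize_t n = 0;
    while (used < cap && (n = read(fd[0], buf + used, cap - used)) > 0)
        used += (size_t)n;
    close(fd[0]);

    waitpid(pid, NULL, 0);
    return n == -1 ? -1 : (ssize_t)used;
}
```

For example, with the command string `sleep 1 >/dev/null 2>&1 & echo started`, the background sleep has redirected its own stdout and stderr, so the only remaining write ends belong to the shell, and the read loop sees EOF as soon as the shell exits.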
Your problem is probably here:
// Wait for child to terminate
int status;
wait(&status);
As the child process is a daemon, it won't terminate anytime soon.
Also your "read()" is likely to hang. You are going to have to decide how long you wait before abandoning any attempt to display output.
As the child process is a daemon, it won't terminate anytime soon.
Are you sure? Of course I would agree that a daemon won't terminate anytime soon - but when a daemon starts up it forks so the child can disassociate itself with the terminal, and then the parent exits. Since the wait() system call is waiting on the parent daemon process, it should exit.
Regardless, the same problem occurs without the call to wait().
Also, why doesn't the read() get an EOF? The read() is reading from an open pipe that is connected with the parent daemon process. So when the parent daemon process exits, the read() should return immediately with an EOF.
I think you should receive the SIGPIPE signal when waiting on the read to complete, since the other end of the pipe is closed. Did you do anything unusual with signal handling? I suggest you run your code under strace.

Resources