I've been trying to implement shell-like functionality with pipes in an application and I'm following this example. I will reproduce the code here for future reference in case the original is removed:
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h> // needed for wait()
/**
* Executes the command "cat scores | grep Villanova | cut -b 1-10".
* This quick-and-dirty version does no error checking.
*
* @author Jim Glenn
* @version 0.1 10/4/2004
*/
int main(int argc, char **argv)
{
    int status;
    int i;
    // arguments for commands; your parser would be responsible for
    // setting up arrays like these
    char *cat_args[] = {"cat", "scores", NULL};
    char *grep_args[] = {"grep", "Villanova", NULL};
    char *cut_args[] = {"cut", "-b", "1-10", NULL};
    // make 2 pipes (cat to grep and grep to cut); each has 2 fds
    int pipes[4];
    pipe(pipes); // sets up 1st pipe
    pipe(pipes + 2); // sets up 2nd pipe
    // we now have 4 fds:
    // pipes[0] = read end of cat->grep pipe (read by grep)
    // pipes[1] = write end of cat->grep pipe (written by cat)
    // pipes[2] = read end of grep->cut pipe (read by cut)
    // pipes[3] = write end of grep->cut pipe (written by grep)
    // Note that the code in each if is basically identical, so you
    // could set up a loop to handle it. The differences are in the
    // indices into pipes used for the dup2 system call
    // and that the 1st and last only deal with the end of one pipe.
    // fork the first child (to execute cat)
    if (fork() == 0)
    {
        // replace cat's stdout with write part of 1st pipe
        dup2(pipes[1], 1);
        // close all pipes (very important!); end we're using was safely copied
        close(pipes[0]);
        close(pipes[1]);
        close(pipes[2]);
        close(pipes[3]);
        execvp(*cat_args, cat_args);
    }
    else
    {
        // fork second child (to execute grep)
        if (fork() == 0)
        {
            // replace grep's stdin with read end of 1st pipe
            dup2(pipes[0], 0);
            // replace grep's stdout with write end of 2nd pipe
            dup2(pipes[3], 1);
            // close all ends of pipes
            close(pipes[0]);
            close(pipes[1]);
            close(pipes[2]);
            close(pipes[3]);
            execvp(*grep_args, grep_args);
        }
        else
        {
            // fork third child (to execute cut)
            if (fork() == 0)
            {
                // replace cut's stdin with read end of 2nd pipe
                dup2(pipes[2], 0);
                // close all ends of pipes
                close(pipes[0]);
                close(pipes[1]);
                close(pipes[2]);
                close(pipes[3]);
                execvp(*cut_args, cut_args);
            }
        }
    }
    // only the parent gets here and waits for 3 children to finish
    close(pipes[0]);
    close(pipes[1]);
    close(pipes[2]);
    close(pipes[3]);
    for (i = 0; i < 3; i++)
        wait(&status);
}
I have trouble understanding why the pipes are being closed just before calling execvp and before reading or writing any data. I believe it has something to do with passing EOF flags to processes so that they can stop reading/writing; however, I don't see how that helps before any actual data is pushed into the pipe. I'd appreciate a clear explanation. Thanks.
I have trouble understanding why the pipes are being closed just before calling execvp and reading or writing any data.
The pipes are not being closed. Rather, some file descriptors associated with the pipe ends are being closed. Each child process is duping pipe-end file descriptors onto one or both of its standard streams, then closing all pipe-end file descriptors that it is not actually going to use, which is all of the ones stored in the pipes array. Each pipe itself remains open and usable as long as each end is open in at least one process, and each child process holds at least one end of one pipe open. Those are closed when the child processes terminate (or at least under the control of the child processes, post execvp()).
One reason to perform such closures is for tidiness and resource management. There is a limit on how many file descriptors a process may have open at once, so it is wise to avoid leaving unneeded file descriptors open.
But also, functionally, a process reading from one of the pipes will not detect end of file until all open file descriptors associated with the write end of the pipe, in any process, are closed. That's what EOF on a pipe means, and it makes sense because as long as the write end is open anywhere, it is possible that more data will be written to it.
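To see that concretely, here is a minimal sketch (POSIX calls only, no error checking) in which the parent's read loop only terminates because every descriptor for the write end, the child's and the parent's own copy, has been closed:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    int fd[2];
    pipe(fd);
    if (fork() == 0) {              // child: writes a little data and exits
        close(fd[0]);
        write(fd[1], "data\n", 5);
        close(fd[1]);               // child's copy of the write end
        _exit(0);
    }
    close(fd[1]);                   // parent's copy of the write end; without this
                                    // line the loop below would never see end of file
    char buf[64];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);
    // read() returned 0: every write-end descriptor is now closed, so this is EOF
    close(fd[0]);
    wait(NULL);
    return 0;
}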
Related
I'm writing a C program where I fork(), exec(), and wait(). I'd like to take the output of the program I exec'ed to write it to file or buffer.
For example, if I exec ls I want to write file1 file2 etc to buffer/file. I don't think there is a way to read stdout, so does that mean I have to use a pipe? Is there a general procedure here that I haven't been able to find?
For sending the output to another file (I'm leaving out error checking to focus on the important details):
if (fork() == 0)
{
    // child
    int fd = open(file, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    dup2(fd, 1);   // make stdout go to file
    dup2(fd, 2);   // make stderr go to file - you may choose to not do this
                   // or perhaps send stderr to another file
    close(fd);     // fd no longer needed - the dup'ed handles are sufficient
    exec(...);
}
For sending the output to a pipe so you can then read the output into a buffer:
int pipefd[2];
pipe(pipefd);
if (fork() == 0)
{
    close(pipefd[0]);    // close reading end in the child
    dup2(pipefd[1], 1);  // send stdout to the pipe
    dup2(pipefd[1], 2);  // send stderr to the pipe
    close(pipefd[1]);    // this descriptor is no longer needed
    exec(...);
}
else
{
    // parent
    char buffer[1024];
    close(pipefd[1]);    // close the write end of the pipe in the parent
    while (read(pipefd[0], buffer, sizeof(buffer)) > 0)
    {
    }
}
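If the goal is to capture everything the child wrote, the parent's loop above might instead accumulate the bytes and then reap the child, roughly like this (a sketch that assumes the output fits in one fixed-size buffer and that <sys/wait.h> is included for wait()):
char out[65536];
size_t total = 0;
ssize_t n;
while ((n = read(pipefd[0], out + total, sizeof(out) - total - 1)) > 0)
    total += n;              // append each chunk after the previous one
out[total] = '\0';           // out now holds everything the child wrote
close(pipefd[0]);
wait(NULL);                  // reap the child to avoid leaving a zombie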
You need to decide exactly what you want to do - and preferably explain it a bit more clearly.
Option 1: File
If you know which file you want the output of the executed command to go to, then:
Ensure that the parent and child agree on the name (parent decides name before forking).
Parent forks - you have two processes.
Child reorganizes things so that file descriptor 1 (standard output) goes to the file.
Usually, you can leave standard error alone; you might redirect standard input from /dev/null.
Child then execs relevant command; said command runs and any standard output goes to the file (this is the basic shell I/O redirection).
Executed process then terminates.
Meanwhile, the parent process can adopt one of two main strategies:
Open the file for reading, and keep reading until it reaches an EOF. It then needs to double check whether the child died (so there won't be any more data to read), or hang around waiting for more input from the child.
Wait for the child to die and then open the file for reading.
The advantage of the first is that the parent can do some of its work while the child is also running; the advantage of the second is that you don't have to diddle with the I/O system (repeatedly reading past EOF).
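For example, a sketch of the second strategy (wait for the child, then open and read the file back), using a hypothetical file name, ls as an example command, and no error checking:
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    const char *name = "output.txt";        // name agreed on before forking
    pid_t pid = fork();
    if (pid == 0) {                         // child: redirect stdout, then exec
        int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd, 1);
        close(fd);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);                         // only reached if exec fails
    }
    waitpid(pid, NULL, 0);                  // strategy 2: wait for the child first
    int fd = open(name, O_RDONLY);          // then read back what it wrote
    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);
    close(fd);
    return 0;
}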
Option 2: Pipe
If you want the parent to read the output from the child, arrange for the child to pipe its output back to the parent.
Use popen() to do this the easy way. It will run the process and send the output to your parent process. Note that the parent must be active while the child is generating output, since pipes have a limited buffer size (traditionally 4-5 KB, 64 KB by default on modern Linux): if the child generates more data than that while the parent is not reading, the child will block until the parent reads. If the parent is waiting for the child to die, you have a deadlock.
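For instance, a minimal popen() sketch (no error checking; "ls" is just an example command), reading the command's output line by line:
#include <stdio.h>
int main(void) {
    FILE *fp = popen("ls", "r");        // run the command, read its stdout
    char line[256];
    while (fgets(line, sizeof(line), fp) != NULL)
        fputs(line, stdout);            // or append to a buffer / write to a file
    pclose(fp);                         // waits for the command and reaps it
    return 0;
}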
Use pipe() etc to do this the hard way. Parent calls pipe(), then forks. The child sorts out the plumbing so that the write end of the pipe is its standard output, and ensures that all other file descriptors relating to the pipe are closed. This might well use the dup2() system call. It then executes the required process, which sends its standard output down the pipe.
Meanwhile, the parent also closes the unwanted ends of the pipe, and then starts reading. When it gets EOF on the pipe, it knows the child has finished and closed the pipe; it can close its end of the pipe too.
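Put together, the hard way might look roughly like this (a sketch with no error checking, running ls as an example child):
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    int pfd[2];
    pipe(pfd);
    pid_t pid = fork();
    if (pid == 0) {                     // child: write end becomes standard output
        dup2(pfd[1], 1);
        close(pfd[0]);                  // close both pipe descriptors;
        close(pfd[1]);                  // the dup'ed copy on fd 1 remains open
        execlp("ls", "ls", (char *)NULL);
        _exit(127);                     // only reached if exec fails
    }
    close(pfd[1]);                      // parent keeps only the read end
    char buf[512];
    ssize_t n;
    while ((n = read(pfd[0], buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);      // EOF here means the child closed its end
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}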
Since it looks like you're going to be using this in a Linux/Cygwin environment, you want to use popen. It's like opening a file, only you get the executed program's stdout, so you can use your normal fscanf, fread, etc.
After forking, use dup2(2) to duplicate the file's FD into stdout's FD, then exec.
You could also run the command through the shell (/bin/sh) and pass it a command line that includes the redirection:
char cmd[512];
snprintf(cmd, sizeof(cmd), "/bin/ls > %s", filepath);
execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
For those such as myself who like a complete example with includes, here's this fantastic answer with a runnable example (still without error handling, left as an exercise):
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
if (fork() == 0) { // child
int fd = open("test.txt", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
dup2(fd, 1); // make stdout go to file
dup2(fd, 2); // make stderr go to file - you may choose to not do this
// or perhaps send stderr to another file
close(fd); // fd no longer needed - the dup'ed handles are sufficient
execlp("ls", "ls", NULL);
}
else {
while (wait(NULL) > 0) {} // wait for each child process
}
return 0;
}
A pipe connects the stdout of one process to the stdin of another: https://superuser.com/a/277327
Here is a simple program to take input from stdin and print it:
#include <stdio.h>
int main(void) {
    char str[100];
    if (fgets(str, sizeof(str), stdin) != NULL)  // gets() is unsafe (removed in C11), so use fgets()
        fputs(str, stdout);
    return 0;
}
I can use a unix pipe to pass the input from another process:
echo "hi" | ./a.out
My question is, what is the difference between the simple code above and using the pipe() system call? Does the system call essentially do the same job without writing to the terminal? More on Pipes: https://tldp.org/LDP/lpg/node11.html
The pipe() system call gives you a pair of file descriptors (one for reading and one for writing) for a channel (a pipe) through which bytes can be streamed from one process to another. This is an example where a parent process creates a pipe and its child writes to it so the parent can read from it:
#include <unistd.h>
int main() {
    int fd[2];
    pipe(fd);
    int pid = fork();
    if (pid == 0) { // Child:
        close(fd[0]); // Close reading descriptor as it's not needed
        write(fd[1], "Hello", 5);
        close(fd[1]); // Done writing: close the write end
    } else { // Parent:
        char buf[5];
        close(fd[1]); // Close writing descriptor as it's not needed
        read(fd[0], buf, 5); // Read the data sent by the child through the pipe
        write(1, buf, 5); // Print the data that's been read to stdout
        close(fd[0]);
    }
}
When a shell encounters the pipe (|) operator, it does use the pipe() system call, but it also does some additional work to redirect the left operand's stdout and the right operand's stdin to the pipe. Here's a simplified example of what the shell would do for the command echo "hi" | ./a.out (keep in mind that when a file descriptor is duplicated with dup(), the copy is placed at the lowest-numbered free slot in the process's file descriptor table):
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
    int fd[2];
    pipe(fd);
    int pid_echo = fork();
    if (pid_echo == 0) {
        // Close reading descriptor as it's not needed
        close(fd[0]);
        // Close standard output
        close(1);
        // Replace standard output with the pipe by duplicating its writing descriptor
        dup(fd[1]);
        // Execute echo;
        // now when echo prints to stdout it will actually print to the pipe
        // because now file descriptor 1 belongs to the pipe
        execlp("echo", "echo", "hi", (char*)NULL);
        exit(-1);
    }
    int pid_aout = fork();
    if (pid_aout == 0) {
        // Close the write end; this child only reads from the pipe
        close(fd[1]);
        // Close standard input
        close(0);
        // Replace standard input with the pipe by duplicating its reading descriptor
        dup(fd[0]);
        // Execute a.out;
        // now when a.out reads from stdin it will actually read from the pipe
        // because now file descriptor 0 belongs to the pipe
        execl("./a.out", "./a.out", (char*)NULL);
        exit(-1);
    }
    // The shell (the parent) closes its own copies of both pipe ends;
    // otherwise ./a.out would never see EOF on its new stdin
    close(fd[0]);
    close(fd[1]);
    // Wait for both commands to finish
    waitpid(pid_echo, NULL, 0);
    waitpid(pid_aout, NULL, 0);
}
A pipe is an inter-process communication mechanism that leverages I/O redirection. However, pipes are not involved in all I/O redirection.
Since child processes may inherit file descriptors from their parent process, a parent process may change what files the child's standard streams point to, unbeknownst to the child process. This is I/O redirection.
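A minimal illustration of that (hypothetical file name, date as an example command, no error checking): after the fork, the new process redirects its own standard output to a file before the exec, so the exec'ed program writes to stdout as usual and never knows where its output actually goes.
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    if (fork() == 0) {
        int fd = open("redirected.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd, 1);                    // the child's stdout now points at the file
        close(fd);
        execlp("date", "date", (char *)NULL);   // date just prints to stdout;
                                                // it has no idea it's writing a file
        _exit(127);
    }
    wait(NULL);
    return 0;
}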
Well. I kinda understand how pipes work and why dup/dup2 is used before an exec in any child process.
But I need help with the 'close(int fd)' thing.
To make it clear I would like to ask you for any pseudocode or any C code example which does the following:
Parent gets a fd from a file using open().
Parent creates a child which execs another program that reads data from the fd obtained with open() before and writes its output to a pipe. (So the parent should wait for it to end before continuing.)
The same parent then creates another child which is going to exec and read from the pipe created before and write its output to stdout.
Is it even possible to do this with only one pipe?
The tricky thing here for me is not creating the pipe and redirecting channels with dup2 and stuff, it is knowing where and when to close() all the fd channels.
If you could explain to me how to do a thing like that, and when and where to close the channels, with an example, I think I would definitely understand it all.
Thanks a lot guys.
Below is a complete example. WhozCraig already pointed out that there's no need for the parent to wait for child 1 to end before continuing, since child 2 has to read the pipe until EOF. (On the contrary, the parent must not wait, because the pipe buffer might not be large enough to hold all the file data.) Of course only one pipe is needed, with child 1 writing to one end and child 2 reading from the other. And for that, no dup is needed.
When and where does the parent have to close the pipe channels?
The parent may close the pipe ends as soon as it doesn't need them any longer, provided that any child which needs a pipe end already has it open. In our case, the parent may close (its descriptor of) the write end after child 1 has been forked, and the read end after child 2 has been forked, since the children inherit the pipe descriptors, which remain usable until the children close them at exit.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    if (argc <= 1) return 1;
    int fd = open(argv[1], O_RDONLY);   // Parent gets a fd from a file.
    if (fd < 0) return perror(argv[1]), 1;
    int pipefd[2];
    if (pipe(pipefd) < 0) return perror("pipe"), 1;
    char in[8], out[8];
    sprintf(in, "FD:%d", fd);           // reads data from the open() func fd
    sprintf(out, "FD:%d", pipefd[1]);   // writes the output to a pipe
    switch (fork())
    {   // Parent creates a child which execs to another program, e. g. "socat"
        case -1: return perror("fork 1"), 1;
        case  0: execlp("socat", "socat", "-u", in, out, NULL);
                 return perror("execlp 1"), 1;
    }
    close(pipefd[1]);                   // parent may close write end, since child has it open
    sprintf(in, "FD:%d", pipefd[0]);    // read from the pipe created before
    sprintf(out, "FD:%d", 1);           // write the output to stdout
    switch (fork())
    {   // Same parent then creates another child which is going to exec
        case -1: return perror("fork 2"), 1;
        case  0: execlp("socat", "socat", "-u", in, out, NULL);
                 return perror("execlp 2"), 1;
    }
    close(pipefd[0]);                   // parent may close read end, since child has it open
}
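For example (assuming socat is installed and the program is compiled to pipedemo, a name chosen only for illustration), running ./pipedemo /etc/hostname makes child 1 copy the file into the pipe and child 2 copy the pipe to standard output, so the file's contents end up on the terminal, much like cat /etc/hostname.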
This is the code I found for my own shell. It works fine, but the thing I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
char* cmndtkn[256];
char buffer[256];
char* path=NULL;
char pwd[128];
int main(){
//setting path variable
char *env;
env=getenv("PATH");
putenv(env);
system("clear");
printf("\t MY OWN SHELL !!!!!!!!!!\n ");
printf("_______________________________________\n\n");
while(1){
fflush(stdin);
getcwd(pwd,128);
printf("[MOSH~%s]$",pwd);
fgets(buffer,sizeof(buffer),stdin);
buffer[sizeof(buffer)-1] = '\0';
//tokenize the input command line
char* tkn = strtok(buffer," \t\n");
int i=0;
int indictr=0;
// loop for every part of the command
while(tkn!=NULL)
{
if(strcoll(tkn,"exit")==0 ){
exit(0);
}
else if(strcoll(buffer,"cd")==0){
path = buffer;
chdir(path+=3);
}
else if(strcoll(tkn,"|")==0){
indictr=i;
}
cmndtkn[i++] = tkn;
tkn = strtok(NULL," \t\n");
}cmndtkn[i]='\0';
// execute when command has pipe. when | command is found indictr is greater than 0.
if(indictr>0){
char* leftcmnd[indictr+1];
char* rightcmnd[i-indictr];
int a,b;
for(b=0;b<indictr;b++)
leftcmnd[b]=cmndtkn[b];
leftcmnd[indictr]=NULL;
for(a=0;a<i-indictr-1;a++)
rightcmnd[a]=cmndtkn[a+indictr+1];
rightcmnd[i-indictr]=NULL;
if(!fork())
{
fflush(stdout);
int pfds[2];
pipe(pfds);
if(!fork()){
close(1);
dup(pfds[1]);
close(pfds[0]);
execvp(leftcmnd[0],leftcmnd);
}
else{
close(0);
dup(pfds[0]);
close(pfds[1]);
execvp(rightcmnd[0],rightcmnd);
}
}else
wait(NULL);
//command not include pipe
}else{
if(!fork()){
fflush(stdout);
execvp(cmndtkn[0],cmndtkn);
}else
wait(NULL);
}
}
}
What is the purpose of the calls to close() with parameters of 0 and 1, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code may have been more clearly expressed with dup2() instead.
dup2(pfds[1], 1); /* alias pfds[1] to fd 1 (stdout) */
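For instance, the inner fork in the question's code could be written with dup2() roughly like this (a sketch using the question's pfds, leftcmnd and rightcmnd variables; it also closes both pipe descriptors in each branch, which is good practice even though the original only closed one):
if (!fork()) {
    // left command: its stdout becomes the write end of the pipe
    dup2(pfds[1], 1);
    close(pfds[0]);
    close(pfds[1]);
    execvp(leftcmnd[0], leftcmnd);
} else {
    // right command: its stdin becomes the read end of the pipe
    dup2(pfds[0], 0);
    close(pfds[0]);
    close(pfds[1]);
    execvp(rightcmnd[0], rightcmnd);
}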
Judging from your question about how ls | sort works, your question is not limited to why the dup() system call is being made. Your question is actually about how pipes in Unix work, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that writing data to the writable descriptor allows that data to be read from the readable descriptor. The pipe() call returns this pair in an array, where the first array element is readable and the second array element is writable.
In Unix, a fork() followed by some kind of exec() is the only way to produce a new process (there are other library calls, such as system() or popen() that create processes, but they call fork() and do an exec() under the hood). A fork() produces a child process. The child process sees the return value of 0 from the call, while the parent sees a non-zero return value that is either the PID of the child process, or a -1 indicating that an error has occurred.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process. The parent does not see the modification occur, as the parent has the original copy. However, a duplicated pair of file descriptors that form a pipe can be used to allow a child process and its parent to communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls that close the pipe file descriptors are used to make sure that the ls process is the only process to have an open reference to the writable fd, and the sort process is the only process to have an open reference to the readable fd. That step is important because after ls exits, it will close the writable end of the fd, and the sort process will expect to see an EOF as a result. However, this will not occur if some other process still has the writable fd open.
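A complete sketch of ls | sort along those lines (no error checking):
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    int pfd[2];
    pipe(pfd);
    if (fork() == 0) {                  // child 1: ls writes into the pipe
        dup2(pfd[1], 1);
        close(pfd[0]);
        close(pfd[1]);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {                  // child 2: sort reads from the pipe
        dup2(pfd[0], 0);
        close(pfd[0]);
        close(pfd[1]);                  // crucial: otherwise sort never sees EOF
        execlp("sort", "sort", (char *)NULL);
        _exit(127);
    }
    close(pfd[0]);                      // the parent must close both ends as well
    close(pfd[1]);
    while (wait(NULL) > 0) {}           // reap both children
    return 0;
}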
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork section, the process closes stdout, then calls dup on pfds[1], which, according to:
http://linux.die.net/man/2/dup
Creates a duplicate of the specified file descriptor at the lowest available position, which will be 1, since it was just closed (and stdin hasn't been closed yet). This means everything sent to stdout will really go to pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork section is for the new child, which will send data to stdout (file descriptor 1); the parent (the else block) closes stdin, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the pipe descriptors in pfds that it is not using, since there are two open descriptors for each pipe end now that the process has forked. Each process then execs the left/right command, but the new stdin and stdout mappings remain in place for the exec'ed programs.
Forking twice is explained here: Why fork() twice