The following code is the pipe implementation given in beej's guide:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
int pfds[2];
pipe(pfds);
if (!fork()) {
close(1); /* close normal stdout */
dup(pfds[1]); /* make stdout same as pfds[1] */
close(pfds[0]); /* we don't need this */
execlp("ls", "ls", NULL);
} else {
close(0); /* close normal stdin */
dup(pfds[0]); /* make stdin same as pfds[0] */
close(pfds[1]); /* we don't need this */
execlp("wc", "wc", "-l", NULL);
}
return 0;
}
I wanted to ask:
Is it possible that close(0) is executed before dup(pfds[1])? If yes, then in that case the program will not behave as expected.
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
And what would change if these lines were not there?
Is it possible that close(0) is executed before dup(pfds[1])? If yes,
then in that case the program will not behave as expected.
Yes, it is possible to have the parent successfully complete close(0) before the child calls dup(pfds[1]). However, this is not a problem. When you fork a new process, the new process gets an entire copy of the parent's memory address space, including open file descriptors (except those marked with the O_CLOEXEC flag - see fcntl(2)). So, essentially each process has its own private copy of the file descriptors and is isolated and free to do whatever it wants with that copy.
Thus, when the parent calls close(0), it is only closing its copy of file descriptor 0 (stdin); it does not affect the child in any way, which still has a reference to stdin and can use it if needed (even though in this example it won't).
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
Best practices mandate that you should close file descriptors that you don't use - this is the case for close(pfds[0]). Unused open file descriptors eat up space and resources, why keep it open if you're not going to use it?
close(pfds[1]) is a little more subtle though. Pipes report end of file only when there is no more data in the pipe buffer and there are no active writers, i.e., no live processes that have the pipe open for writing. If you do not close pfds[1] in the parent, the program will hang forever because wc(1) will never see the end of input, since there is a process (wc(1) itself) that has the pipe opened for writing and as such could (but won't) write more data.
Tl;DR: close(pfds[0]) is just good practice but not mandatory; close(pfds[1]) is absolutely necessary to ensure program correctness.
Question 1:
Yes it is entirely possible that "close(0);" (in the parent) is executed before "dup(pfds[1]);" (in the child). But since this happens in different processes, the child will still have fd 0 open.
Question 2:
It is good bookkeeping practice to close the end of the pipe that a process is not going to use. That way, you can avoid bugs further down the road in more complex programs. In the above scenario, the child process should ever only read from the pipe. If you close the write end in the child, eny attempt to write to it will cause an error, otherwise you might have a bug that is hard to detect.
Related
I'm writing a C program where I fork(), exec(), and wait(). I'd like to take the output of the program I exec'ed to write it to file or buffer.
For example, if I exec ls I want to write file1 file2 etc to buffer/file. I don't think there is a way to read stdout, so does that mean I have to use a pipe? Is there a general procedure here that I haven't been able to find?
For sending the output to another file (I'm leaving out error checking to focus on the important details):
if (fork() == 0)
{
// child
int fd = open(file, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
dup2(fd, 1); // make stdout go to file
dup2(fd, 2); // make stderr go to file - you may choose to not do this
// or perhaps send stderr to another file
close(fd); // fd no longer needed - the dup'ed handles are sufficient
exec(...);
}
For sending the output to a pipe so you can then read the output into a buffer:
int pipefd[2];
pipe(pipefd);
if (fork() == 0)
{
close(pipefd[0]); // close reading end in the child
dup2(pipefd[1], 1); // send stdout to the pipe
dup2(pipefd[1], 2); // send stderr to the pipe
close(pipefd[1]); // this descriptor is no longer needed
exec(...);
}
else
{
// parent
char buffer[1024];
close(pipefd[1]); // close the write end of the pipe in the parent
while (read(pipefd[0], buffer, sizeof(buffer)) != 0)
{
}
}
You need to decide exactly what you want to do - and preferably explain it a bit more clearly.
Option 1: File
If you know which file you want the output of the executed command to go to, then:
Ensure that the parent and child agree on the name (parent decides name before forking).
Parent forks - you have two processes.
Child reorganizes things so that file descriptor 1 (standard output) goes to the file.
Usually, you can leave standard error alone; you might redirect standard input from /dev/null.
Child then execs relevant command; said command runs and any standard output goes to the file (this is the basic shell I/O redirection).
Executed process then terminates.
Meanwhile, the parent process can adopt one of two main strategies:
Open the file for reading, and keep reading until it reaches an EOF. It then needs to double check whether the child died (so there won't be any more data to read), or hang around waiting for more input from the child.
Wait for the child to die and then open the file for reading.
The advantage of the first is that the parent can do some of its work while the child is also running; the advantage of the second is that you don't have to diddle with the I/O system (repeatedly reading past EOF).
Option 2: Pipe
If you want the parent to read the output from the child, arrange for the child to pipe its output back to the parent.
Use popen() to do this the easy way. It will run the process and send the output to your parent process. Note that the parent must be active while the child is generating the output since pipes have a small buffer size (often 4-5 KB) and if the child generates more data than that while the parent is not reading, the child will block until the parent reads. If the parent is waiting for the child to die, you have a deadlock.
Use pipe() etc to do this the hard way. Parent calls pipe(), then forks. The child sorts out the plumbing so that the write end of the pipe is its standard output, and ensures that all other file descriptors relating to the pipe are closed. This might well use the dup2() system call. It then executes the required process, which sends its standard output down the pipe.
Meanwhile, the parent also closes the unwanted ends of the pipe, and then starts reading. When it gets EOF on the pipe, it knows the child has finished and closed the pipe; it can close its end of the pipe too.
Since you look like you're going to be using this in a linux/cygwin environment, you want to use popen. It's like opening a file, only you'll get the executing programs stdout, so you can use your normal fscanf, fread etc.
After forking, use dup2(2) to duplicate the file's FD into stdout's FD, then exec.
You could also use the linux sh command and pass it a command that includes the redirection:
string cmd = "/bin/ls > " + filepath;
execl("/bin/sh", "sh", "-c", cmd.c_str(), 0);
For those such as myself who like a complete example with includes, here's this fantastic answer with a runnable example (still without error handling, left as an exercise):
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
if (fork() == 0) { // child
int fd = open("test.txt", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
dup2(fd, 1); // make stdout go to file
dup2(fd, 2); // make stderr go to file - you may choose to not do this
// or perhaps send stderr to another file
close(fd); // fd no longer needed - the dup'ed handles are sufficient
execlp("ls", "ls", NULL);
}
else {
while (wait(NULL) > 0) {} // wait for each child process
}
return 0;
}
I'm trying to understand how pipe() function works and I have the following program example
int main(void)
{
int fd[2], nbytes;
pid_t childpid;
char string[] = "Hello, world!\n";
char readbuffer[80];
pipe(fd);
if((childpid = fork()) == -1)
{
perror("fork");
exit(1);
}
if(childpid == 0)
{
/* Child process closes up input side of pipe */
close(fd[0]);
/* Send "string" through the output side of pipe */
write(fd[1], string, (strlen(string)+1));
exit(0);
}
else
{
/* Parent process closes up output side of pipe */
close(fd[1]);
/* Read in a string from the pipe */
nbytes = read(fd[0], readbuffer, sizeof(readbuffer));
printf("Received string: %s", readbuffer);
}
return(0);
}
My first question is what benefits do we get from closing the file descriptor using close(fd[0]) and close(fd[1]) in child and parent processes. Second, we use write in child and read in parent, but what if parent process reaches read before child reaches write and tries to read from pipe which has nothing in it ? Thanks!
Daniel Jour gave you 99% of the answer already, in a very succinct and easy to understand manner:
Closing: Because it's good practice to close what you don't need. For the second question: These are potentially blocking functions. So reading from an empty pipe will just block the reader process until something gets written into the pipe.
I'll try to elaborate.
Closing:
When a process is forked, its open files are duplicated.
Each process has a limit on how many files descriptors it's allowed to have open. As stated in the documentation: each side of the pipe is a single fd, meaning a pipe requires two file descriptors and in your example, each process is only using one.
By closing the file descriptor you don't use, you're releasing resources that are in limited supply and which you might need further on down the road.
e.g., if you were writing a server, that extra fd means you can handle one more client.
Also, although releasing resources on exit is "optional", it's good practice. Resources that weren't properly released should be handled by the OS...
...but the OS was also written by us programmers, and we do make mistakes. So it only makes sense that the one who claimed a resource and knows about it will be kind enough to release the resource.
Race conditions (read before write):
POSIX defines a few behaviors that make read, write and pipes a good choice for thread and process concurrency synchronization. You can read more about it on the Rational section for write, but here's a quick rundown:
By default, pipes (and sockets) are created in what is known as "blocking mode".
This means that the application will hang until the IO operation is performed.
Also, IO operations are atomic, meaning that:
You will never be reading and writing at the same time. A read operation will wait until a write operation completes before reading from the pipe (and vice-versa)
if two threads call read in the same time, each will get a serial (not parallel) response, reading sequentially from the pipe (or socket) - this make pipes great tools for concurrency handling.
In other words, when your application calls:
read(fd[0], readbuffer, sizeof(readbuffer));
Your application will wait forever for some data to be available and for the read operation to complete (which it will once 80 (sizeof(readbuffer)) bytes were read, or if the EOF status changed during a read).
I have the following code taken from the “Pipes” section of Beej’s Guide to Unix IPC.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
int pfds[2];
pipe(pfds);
if (!fork()) {
close(1); /* close normal stdout */
dup(pfds[1]); /* make stdout same as pfds[1] */
close(pfds[0]); /* we don't need this */
execlp("ls", "ls", NULL);
} else {
close(0); /* close normal stdin */
dup(pfds[0]); /* make stdin same as pfds[0] */
close(pfds[1]); /* we don't need this */
execlp("wc", "wc", "-l", NULL);
}
return 0;
}
This code allows the user to see how many files are in a specific directory. How can I edit this code to implement the longer pipeline cat /etc/passwd | cut –f1 –d: | sort? Does anyone have any idea how to do this because I am completely stuck. Any help would be appreciated.
Feels like homework, so I'll just give you some pointers:
The longer pipeline has two pipes, so you'll need to call pipe() twice. (I'd also check pipe's return value whilst I was at it.)
There are three processes, which means two forks. Again, check fork()'s return value properly: it's tri-state: parent, child or failure, and your program should test all three cases.
If you call pipe() twice up front, think carefully about which file descriptors (i.e. which ends of pipes) are which in each process, and hence which ones to close before invoking execlp(). I'd draw a picture.
I'd prefer dup2() to dup(), since you're explicitly setting the target file descriptor, and so it makes sense to specify it in the call. Also avoids silly bugs.
dup and execlp can fail, so I'd check their return values too...
You need (depending on the length of the command list) some pipes. But: at maximum you need not more than two pipe-pair-fds for a process in the middle, for the first and the last you need one pipe-pair-fds. Be really sure to close the pipe-fds which are not needed - if not, the child processes might not get an EOF and never finish.
And (as user3392484 stated): check all system calls for error conditions and report them to the caller. This will make life much easier.
I implemented something like this during the last days, maybe you want to have a look there: pipexec.c.
Kind regards - Andreas
Given the following code:
int main(int argc, char *argv[])
{
int pipefd[2];
pid_t cpid;
char buf;
if (argc != 2) {
fprintf(stderr, "Usage: %s \n", argv[0]);
exit(EXIT_FAILURE);
}
if (pipe(pipefd) == -1) {
perror("pipe");
exit(EXIT_FAILURE);
}
cpid = fork();
if (cpid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (cpid == 0) { /* Child reads from pipe */
close(pipefd[1]); /* Close unused write end */
while (read(pipefd[0], &buf, 1) > 0)
write(STDOUT_FILENO, &buf, 1);
write(STDOUT_FILENO, "\n", 1);
close(pipefd[0]);
_exit(EXIT_SUCCESS);
} else { /* Parent writes argv[1] to pipe */
close(pipefd[0]); /* Close unused read end */
write(pipefd[1], argv[1], strlen(argv[1]));
close(pipefd[1]); /* Reader will see EOF */
wait(NULL); /* Wait for child */
exit(EXIT_SUCCESS);
}
return 0;
}
Whenever the child process wants to read from the pipe, it must first close the pipe's side from writing. When I remove that line close(pipefd[1]); from the child process's if,
I'm basically saying that "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
If so, what would happen when the pipe is open for both reading & writing? No mutual exclusion?
Whenever the child process wants to read from the pipe, it must first close the pipe's side from writing.
If the process — parent or child — is not going to use the write end of a pipe, it should close that file descriptor. Similarly for the read end of a pipe. The system will assume that a write could occur while any process has the write end open, even if the only such process is the one that is currently trying to read from the pipe, and the system will not report EOF, therefore. Further, if you overfill a pipe and there is still a process with the read end open (even if that process is the one trying to write), then the write will hang, waiting for the reader to make space for the write to complete.
When I remove that line close(pipefd[1]); from the child's process IF, I'm basically saying that "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
No; you're saying that the child can write to the pipe as well as the parent. Any process with the write file descriptor for the pipe can write to the pipe.
If so, what would happen when the pipe is open for both reading and writing — no mutual exclusion?
There isn't any mutual exclusion ever. Any process with the pipe write descriptor open can write to the pipe at any time; the kernel ensures that two concurrent write operations are in fact serialized. Any process with the pipe read descriptor open can read from the pipe at any time; the kernel ensures that two concurrent read operations get different data bytes.
You make sure a pipe is used unidirectionally by ensuring that only one process has it open for writing and only one process has it open for reading. However, that is a programming decision. You could have N processes with the write end open and M processes with the read end open (and, perish the thought, there could be processes in common between the set of N and set of M processes), and they'd all be able to work surprisingly sanely. But you'd not readily be able to predict where a packet of data would be read after it was written.
fork() duplicates the file handles, so you will have two handles for each end of the pipe.
Now, consider this. If the parent doesn't close the unused end of the pipe, there will still be two handles for it. If the child dies, the handle on the child side goes away, but there's still the open handle held by the parent -- thus, there will never be a "broken pipe" or "EOF" arriving because the pipe is still perfectly valid. There's just nobody putting data into it anymore.
Same for the other direction, of course.
Yes, the parent/child could still use the handle to write into their own pipe; I don't remember a use-case for this, though, and it still gives you synchronization problems.
When the pipe is created it is having two ends the read end and write end. These are entries in the User File descriptor table.
Similarly there will be two entries in the File table with 1 as reference count for both the read end and the write end.
Now when you fork, a child is created that is the file descriptors are duplicated and thus the reference count of both the ends in the file table becomes 2.
Now "When I remove that line close(pipefd[1])" -> In this case even if the parent has completed writing, your while loop below this line will block for ever for the read to return 0(ie EOF). This happens since even if the parent has completed writing and closed the write end of the pipe, the reference count of the write end in the File table is still 1 (Initially it was 2) and so the read function still is waiting for some data to arrive which will never happen.
Now if you have not written "close(pipefd[0]);" in the parent, this current code may not show any problem, since you are writing once in the parent.
But if you write more than once then ideally you would have wanted to get an error (if the child is no longer reading),but since the read end in the parent is not closed, you will not be getting the error (Even if the child is no more there to read).
So the problem of not closing the unused ends become evident when we are continuously reading/writing data. This may not be evident if we are just reading/writing data once.
Like if instead of the read loop in the child, you are using only once the line below, where you are getting all the data in one go, and not caring to check for EOF, your program will work even if you are not writing "close(pipefd[1]);" in the child.
read(pipefd[0], buf, sizeof(buf));//buf is a character array sufficiently large
man page for pipe() for SunOS :-
Read calls on an empty pipe (no buffered data) with only one
end (all write file descriptors closed) return an EOF (end
of file).
A SIGPIPE signal is generated if a write on a pipe with only
one end is attempted.
This seems to be a fairly common thing to do, and I've managed to teach myself everything that I need to make it work, except that I now have a single problem, which is defying my troubleshooting.
int nonBlockingPOpen(char *const argv[]){
int inpipe;
pid_t pid;
/* open both ends of pipe nonblockingly */
pid = fork();
switch(pid){
case 0: /*child*/
sleep(1); /*child should open after parent has open for reading*/
/*redirect stdout to opened pipe*/
int outpipe = open("./fifo", O_WRONLY);
/*SHOULD BLOCK UNTIL MAIN PROCESS OPENS FOR WRITING*/
dup2(outpipe, 1);
fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
printf("HELLO WORLD I AM A CHILD PROCESS\n");
/*This seems to be written to the pipe immediately, blocking or not.*/
execvp(*argv, argv);
/*All output from this program, which outputs "one" sleeps for 1 second
*outputs "two" sleeps for a second, etc, is captured only after the
*exec'd program exits!
*/
break;
default: /*parent*/
inpipe = open("./fifo", O_RDONLY | O_NONBLOCK);
sleep(2);
/*no need to do anything special here*/
break;
}
return inpipe;
}
Why won't the child process write its stdout to the pipe each time a line is generated? Is there something I'm missing in the way execvp or dup2 work? I'm aware that my approach to all this is a bit strange, but I can't find another way to capture output of closed-source binaries programatically.
I would guess you only get the exec'd program's output after it exits because it does not flush after each message. If so, there is nothing you can do from the outside.
I am not quite sure how this is supposed to relate to the choice between blocking and nonblocking I/O in your question. A non-blocking write may fail completely or partially: instead of blocking the program until room is available in the pipe, the call returns immediately and says that it was not able to write everything it should have. Non-blocking I/O neither makes the buffer larger nor forces output to be flushed, and it may be badly supported by some programs.
You cannot force the binary-only program that you are exec'ing to flush. If you thought that non-blocking I/O was a solution to that problem, sorry, but I'm afraid it is quite orthogonal.
EDIT: Well, if the exec'd program only uses the buffering provided by libc (does not implement its own) and is dynamically linked, you could force it to flush by linking it against a modified libc that flushes every write. This would be a desperate measure. to try only if everything else failed.
When a process is started (via execvp() in your example), the behaviour of standard output depends on whether the output device is a terminal or not. If it is not (and a FIFO is not a terminal), then the output will be fully buffered, rather than line buffered. There is nothing you can do about that; the (Standard) C library does that.
If you really want to make it work line buffered, then you will have to provide the program with a pseudo-terminal as its standard output. That gets into interesting realms - pseudo-terminals or ptys are not all that easy to handle. For the POSIX functions, see:
grantpt() - grant access to the slave pseudo-terminal device
posix_openpt() - open a pseudo-terminal device
ptsname() - get name of the slave pseudo-terminal device
unlockpt() - unlock a pseudo-terminal master/slave pair
Why won't the child process write its stdout to the pipe each time a line is generated?
How do you know that? You do not even try to read the output from the fifo.
N.B. by the file name I presume that you are using the fifo. Or is it a plain file?
And the minor bug in the child: after dup2(), you need to close(outpipe).
fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
Depending on what program you exec(), you might either lose some output or cause the program to fail since write to stdout now might fail with EWOULDBLOCK.
IIRC fifos has the same buffer size as pipes. Per POSIX minimum is 512 bytes, commonly 4K or 8K.
You probably want to explain why you need that at all. Non-blocking IO has different semantics compared to blocking IO and unless your child process expects that you will run into various problems.
printf("HELLO WORLD I AM A CHILD PROCESS\n");
stdout is buffered, I would have after that fflush(stdout). (Can't find documentation whether exec() on its own would flush stdout or not.)
Is there something I'm missing in the way execvp or dup2 work? I'm aware that my approach to all this is a bit strange, but I can't find another way to capture output of closed-source binaries programatically.
I wouldn't toy with non-blocking IO - and leave it as it is in blocking mode.
And I would use pipe() instead of the fifo. Linux's man pipe has a convenient example with the fork().
Otherwise, that is a pretty normal practice.
The sleep()s do not guarantee that the parent will open the pipe first - as Dummy00001 says, you should be using a pipe() pipe, not a named pipe. You should also check for execvp() and fork() failing, and you shouldn't be setting the child side to non-blocking - that's a decision for the child process to make.
int nonBlockingPOpen(char *const argv[])
{
int childpipe[2];
pid_t pid;
pipe(childpipe);
pid = fork();
if (pid == 0)
{
/*child*/
/*redirect stdout to opened pipe*/
dup2(childpipe[1], 1);
/* close leftover pipe file descriptors */
close(childpipe[0]);
close(childpipe[1]);
execvp(*argv, argv);
/* Only reached if execvp fails */
perror("execvp");
exit(1);
}
/*parent*/
/* Close leftover pipe file descriptor */
close(childpipe[1]);
/* Check for fork() failing */
if (pid < 0)
{
close(childpipe[0]);
return -1;
}
/* Set file descriptor non-blocking */
fcntl(childpipe[0], F_SETFL, fcntl(childpipe[0], F_GETFL) | O_NONBLOCK);
return childpipe[0];
}