Before stating my question: I have read several related questions on Stack Overflow, such as "pipe & dup functions in UNIX" and several others, but they didn't clear up my confusion.
First, the code, which is an example code from 'Beginning Linux Programming', 4th edition, Chapter 13:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    int data_processed;
    int file_pipes[2];
    const char some_data[] = "123";
    pid_t fork_result;

    if (pipe(file_pipes) == 0)
    {
        fork_result = fork();
        if (fork_result == (pid_t)-1)
        {
            fprintf(stderr, "Fork failure");
            exit(EXIT_FAILURE);
        }
        if (fork_result == (pid_t)0) // Child process
        {
            close(0);
            dup(file_pipes[0]);
            close(file_pipes[0]); // LINE A
            close(file_pipes[1]); // LINE B
            execlp("od", "od", "-c", (char *)0);
            exit(EXIT_FAILURE);
        }
        else // parent process
        {
            close(file_pipes[0]); // LINE C
            data_processed = write(file_pipes[1], some_data,
                                   strlen(some_data));
            close(file_pipes[1]); // LINE D
            printf("%d - wrote %d bytes\n", (int)getpid(), data_processed);
        }
    }
    exit(EXIT_SUCCESS);
}
The execution result is:
momo#xue5:~/TestCode/IPC_Pipe$ ./a.out
10187 - wrote 3 bytes
momo#xue5:~/TestCode/IPC_Pipe$ 0000000 1 2 3
0000003
momo#xue5:~/TestCode/IPC_Pipe$
If you comment out LINE A, LINE C, and LINE D, the result is the same as above.
I understand this result: the child gets the data from its parent through its own stdin, which is connected to the pipe, and sends the 'od -c' output to its stdout.
However, if you comment out LINE B, the result is:
momo#xue5:~/TestCode/IPC_Pipe$ ./a.out
10436 - wrote 3 bytes
momo#xue5:~/TestCode/IPC_Pipe$
No 'od -c' result!
Is the 'od -c' started by execlp() not executed, or is its output not directed to stdout? One possibility is that the read() in 'od' blocks, because the child's write file descriptor file_pipes[1] is still open if you comment out LINE B. But commenting out LINE D, which leaves the parent's write file descriptor file_pipes[1] open, still produces the 'od -c' output.
And why do we need to close the pipe before execlp()? execlp() will replace the process image, including the stack, .data, heap, and .text, with the new image from 'od'. Does that mean that, even if you don't close file_pipes[0] and file_pipes[1] in the child as at LINE A and LINE B, file_pipes[0] and file_pipes[1] will still be 'destroyed' by execlp()? From the program's result, that is not the case. But where am I wrong?
Thanks so much for your time and efforts here~~
Is closing a pipe necessary when followed by execlp()?
It's not strictly necessary because it depends on how the pipe is used. But in general, yes it should be closed if the pipe end is not needed by the process.
why do we need to close the pipe before execlp()? execlp() will replace the process image
Because file descriptors (by default) remain open across exec calls. From the man page: "By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed; see the description of FD_CLOEXEC in fcntl(2)."
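As an aside, a descriptor can also be marked close-on-exec so the kernel closes it automatically during the exec. Here is a minimal sketch (not from the book, just illustrating the FD_CLOEXEC flag the man page mentions; the helper name is made up):

#include <fcntl.h>

/* Hypothetical helper: mark fd close-on-exec so it does not survive execlp() */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);   /* read the current fd flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

In the book's child, calling a helper like this on file_pipes[1] before execlp() would have the same net effect as the close() at LINE B: the write end is gone by the time od runs.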
However, if you comment out LINE B, ... No 'od -c' result!
This is because the od process reads from stdin until it gets an EOF. If the process itself does not close file_pipes[1] then it will not see an EOF as the write end of the pipe would not be fully closed by all processes that had it opened.
If you comment out LINE A, LINE C, and LINE D, the result is the same as above
This is because the file descriptors at A and C are read ends of the pipe and no one will be blocked waiting for them to be closed (as described above). The file descriptor at D is a write end, and not closing it would indeed cause problems. However, even though the code does not explicitly call close on that file descriptor, it will still be closed when the process exits.
And why do we need to close the pipe before execlp()? execlp() will replace the process image, including the stack, .data, heap, and .text, with the new image from 'od'.
Yes, the exec-family functions, including execlp(), replace the process image of the calling process with a copy of the specified program. But the process's table of open file descriptors is not part of the process image -- it is maintained by the kernel, and it survives the exec.
Does that mean that, even if you don't close file_pipes[0] and file_pipes[1] in the child as at LINE A and LINE B, file_pipes[0] and file_pipes[1] will still be 'destroyed' by execlp()?
The variable file_pipes is destroyed by execlp(), but that's just the program's internal storage for the file descriptors. The descriptors are just integer indexes into a table maintained for the process by the kernel. Losing track of the file descriptor values does not cause the associated files to be closed. In fact, that's a form of resource leakage.
From the program's result, that is not the case. But where am I wrong?
As described above.
Additionally, when a process exits, all its open file descriptors are closed, but the underlying open file description in the kernel, to which the file descriptors refer, is closed only when no open file descriptors referring to it remain. Additional open file descriptors may be held by other processes, as a result of inheriting them across a fork().
Now as to the specific question of what happens when the child process does not close file_pipes[1] before execing od, you might get a clue by checking the process list via the ps command. You will see the child od process still running (maybe several, if you have tested several times). Why?
Well, how does od know when to exit? It processes its entire input, so it must exit when it reaches the end of its input(s). But end of input on a pipe doesn't mean that no more data is available right now, because more data might later be written to the write end of the pipe. End of input on a pipe happens when the write end is closed. And if the child does not close file_pipes[1] before it execs, then that descriptor will likely remain open indefinitely, because after the exec the child no longer knows that it owns it.
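For illustration, here is a sketch of the child branch rewritten with dup2() and some error checking. It is not the book's code; it assumes the same headers and the same file_pipes variable as the example above. The key point is that both LINE A and LINE B run before the exec:

/* Child branch only -- a sketch, assuming file_pipes[] was set up by pipe() before fork() */
if (dup2(file_pipes[0], STDIN_FILENO) == -1) {  /* stdin now reads from the pipe */
    perror("dup2");
    exit(EXIT_FAILURE);
}
close(file_pipes[0]);  /* LINE A: drop the extra reference to the read end */
close(file_pipes[1]);  /* LINE B: child keeps no write end, so od can see EOF */
execlp("od", "od", "-c", (char *)0);
perror("execlp");      /* reached only if the exec itself fails */
exit(EXIT_FAILURE);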
Related
I'm trying to implement a Linux pipe chain in C. For example:
grep file | ls | wc
So, there is code that splits the arguments into tokens with the pipe as the separator, and sends each part to the following function with an integer specifying whether it precedes a pipe or not:
int control_flow(char** args, int precedes){
    int stdin_copy = dup(0);
    int stdout_copy = dup(1);

    // if the command and its args precedes a pipe
    if (precedes){
        int fd[2];
        if (pipe(fd) == -1){
            fprintf(stderr, "pipe failed\n");
        }
        if (dup2(fd[1], 1) != 1)
            perror("dup2 error 1 to p_in\n");  // 1 points to pipe's input
        status = turtle_execute(args);         // executes the argument list, output should go into the pipe
        // Code stops running here
        if (dup2(fd[0], 0) != 0)
            perror("dup2 error 0 to p_out\n"); // 0 points to pipe's output, any process that reads next will read from the pipe
        if (dup2(stdout_copy, 1) != 1)
            perror("dup2 error 1 to stdout_copy\n"); // 1 points back to stdout
    }
    // if the command does not precede a pipe
    else{
        status = turtle_execute(args);         // input to this is coming from pipe
        if (dup2(stdin_copy, 0) != 0)          // 0 points back to stdin
            perror("dup2 error 1 to stdin_copy");
    }
    return 0;
}
My code stops running after the first command executes. I suspect it is necessary to fork a process before using this pipe; why is that? If so, how do I do that in my code without changing what I intend to do?
Edit:
This is roughly what turtle_execute does:
turtle_execute(args){
    if (args[0] is cd or ls or pwd or echo)
        // Implement by calling necessary syscalls
    else
        // Do fork and exec the process
}
So wherever I have used exec, I have first used fork, so the process getting replaced shouldn't be a problem.
The exec system call replaces the current process with the program you are executing. So your process naturally stops working after the turtle_execute, since it was replaced with the new process.
To execute a new process you normally fork to create a copy of the current process and then execute in the copy.
When you are in the shell, normally each command you type is forked and executed. Try typing exec followed by a command into a shell and you will find that the shell terminates once that command has finished executing, since it does not fork in that case.
Edit
I suggest you have a look at the example on the pipe(2) man page (http://man7.org/linux/man-pages/man2/pipe.2.html#EXAMPLE). It shows the usual way of using a pipe:
Calling pipe to create the pipe
Calling fork to fork the process
Depending on whether it is the child or the parent, close one end of the pipe and use the other
I think your problem might be that you make the writing end of your pipe the stdout before forking, causing both the parent and the child to have an open writing end. That could prevent an EOF from being sent, since one writing end is still open.
I can only guess what happens in most of turtle_execute, but if you fork, exec on one process, and wait for it on the other without consuming data from the pipe, it might fill the pipe to the point where writing blocks. You should always consume data from the pipe while you write to it. It is a pipe after all and not a water tank. For more information have a look at the pipe(7) man page under the 'Pipe capacity' section.
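To make that concrete, here is a minimal sketch of the fork-per-command wiring the pipe(2) example describes, for a hypothetical cmd1 | cmd2 (the command arrays are stand-ins, not turtle_execute):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* Hypothetical commands standing in for the tokenized input. */
    char *cmd1[] = { "ls", "-l", NULL };
    char *cmd2[] = { "wc", "-l", NULL };
    int fd[2];

    if (pipe(fd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }

    if (fork() == 0) {                 /* left side of the pipe  */
        dup2(fd[1], STDOUT_FILENO);    /* stdout -> write end    */
        close(fd[0]);                  /* unused read end        */
        close(fd[1]);                  /* original fd now redundant */
        execvp(cmd1[0], cmd1);
        perror("execvp"); _exit(127);
    }
    if (fork() == 0) {                 /* right side of the pipe */
        dup2(fd[0], STDIN_FILENO);     /* stdin <- read end      */
        close(fd[0]);
        close(fd[1]);                  /* crucial: otherwise wc never sees EOF */
        execvp(cmd2[0], cmd2);
        perror("execvp"); _exit(127);
    }
    close(fd[0]);                      /* parent keeps no pipe ends */
    close(fd[1]);
    while (wait(NULL) > 0)             /* reap both children        */
        ;
    return 0;
}

Note that the parent keeps neither pipe end open; that is what lets the right-hand command see EOF when the left-hand command exits.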
From the below program,
/*****************************************************************************
 MODULE: popen.c
 *****************************************************************************/
#include <stdio.h>
#include <stdlib.h>    /* for exit() */

int main(void)
{
    FILE *pipein_fp, *pipeout_fp;
    char readbuf[80];

    /* Create one way pipe line with call to popen() */
    if ((pipein_fp = popen("ls", "r")) == NULL)
    {
        perror("popen");
        exit(1);
    }

    /* Create one way pipe line with call to popen() */
    if ((pipeout_fp = popen("sort", "w")) == NULL)
    {
        perror("popen");
        exit(1);
    }

    /* Processing loop */
    while (fgets(readbuf, 80, pipein_fp))
        fputs(readbuf, pipeout_fp);

    /* Close the pipes */
    pclose(pipein_fp);
    pclose(pipeout_fp);

    return(0);
}
popen.c is compiled to my_program
Here is my understanding of the file descriptors created/involved after executing my_program: popen() forks & execs from the my_program process, but the child processes do not inherit the pipe file descriptors of my_program.
So, after exec:
1) a write file descriptor is created only for ls
2) a read file descriptor is created only for sort
3) read and write file descriptors are created in my_program, because ls writes to my_program & sort reads from my_program
As shown above, are these the only file descriptors involved/created?
Note: 'in' & 'out' are just naming conventions used here
Child processes from a fork() have exactly the same set of open file descriptors as the parent process.
The popen() call uses pipe() to create two file descriptors; it then executes fork(). The parent process arranges that one end of the pipe is closed and the other converted to a file stream (FILE *). The child process closes the other end of the pipe and arranges for the one end to become standard input ("w") or standard output ("r") for the process it executes (using dup() or dup2() for the task).
You use popen() twice; you end up with 2 open descriptors in the parent, and transiently there's a third.
When you say, 'the parent process arranges that one end of the pipe is closed', do you mean read file descriptor (stdin)?
Depending on the mode argument to popen(), one of the two ends of the pipe in the parent is closed immediately; the other is closed by pclose(). The file descriptor is 'never' the one for standard input or standard output — you have to go through extraordinary gyrations to make it so that is one of the standard I/O channels.
Do all processes started via popen() make sure they dup() so that they use stdout & stdin?
Each pipe has a read end and a write end. Take popen("ls", "r"); your program reads from the ls process. It (popen()) creates a pipe and forks. In the child, the write end of the pipe is connected to stdout (dup2() or perhaps dup()), and the read end of the pipe is closed, before the command is executed. In the parent, the read end of the pipe is 'converted to' or 'attached to' a stream (fdopen(), more or less) and the write end of the pipe is closed. In the parent process, the pipe is never connected to either stdout or stdin.
In the child process, either standard input or standard output is connected to the pipe, depending on the mode argument to popen().
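For illustration, here is a rough sketch of what popen(cmd, "r") does internally. This is not the real libc implementation, and it omits the bookkeeping pclose() needs (recording the child's PID so it can waitpid() later); the function name is made up:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified, hypothetical popen(cmd, "r"): the parent reads the child's stdout. */
FILE *my_popen_r(const char *cmd)
{
    int fd[2];
    if (pipe(fd) == -1)
        return NULL;

    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        close(fd[0]);                     /* child does not read from the pipe    */
        dup2(fd[1], STDOUT_FILENO);       /* child's stdout becomes the write end */
        close(fd[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* only reached if exec fails */
    }
    close(fd[1]);                         /* parent does not write to the pipe */
    return fdopen(fd[0], "r");            /* parent reads the command's output */
}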
This is the code I found for my own shell. It works fine, but the thing I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>   /* for wait() */

char* cmndtkn[256];
char buffer[256];
char* path = NULL;
char pwd[128];

int main(){
    // setting path variable
    char *env;
    env = getenv("PATH");
    putenv(env);
    system("clear");
    printf("\t MY OWN SHELL !!!!!!!!!!\n ");
    printf("_______________________________________\n\n");

    while(1){
        fflush(stdin);
        getcwd(pwd, 128);
        printf("[MOSH~%s]$", pwd);
        fgets(buffer, sizeof(buffer), stdin);
        buffer[sizeof(buffer)-1] = '\0';

        // tokenize the input command line
        char* tkn = strtok(buffer, " \t\n");
        int i = 0;
        int indictr = 0;

        // loop for every part of the command
        while(tkn != NULL)
        {
            if(strcoll(tkn, "exit") == 0){
                exit(0);
            }
            else if(strcoll(buffer, "cd") == 0){
                path = buffer;
                chdir(path += 3);
            }
            else if(strcoll(tkn, "|") == 0){
                indictr = i;
            }
            cmndtkn[i++] = tkn;
            tkn = strtok(NULL, " \t\n");
        }
        cmndtkn[i] = '\0';

        // execute when command has pipe. when | command is found indictr is greater than 0.
        if(indictr > 0){
            char* leftcmnd[indictr+1];
            char* rightcmnd[i-indictr];
            int a, b;
            for(b = 0; b < indictr; b++)
                leftcmnd[b] = cmndtkn[b];
            leftcmnd[indictr] = NULL;
            for(a = 0; a < i-indictr-1; a++)
                rightcmnd[a] = cmndtkn[a+indictr+1];
            rightcmnd[i-indictr] = NULL;

            if(!fork())
            {
                fflush(stdout);
                int pfds[2];
                pipe(pfds);
                if(!fork()){
                    close(1);
                    dup(pfds[1]);
                    close(pfds[0]);
                    execvp(leftcmnd[0], leftcmnd);
                }
                else{
                    close(0);
                    dup(pfds[0]);
                    close(pfds[1]);
                    execvp(rightcmnd[0], rightcmnd);
                }
            }else
                wait(NULL);

        // command not include pipe
        }else{
            if(!fork()){
                fflush(stdout);
                execvp(cmndtkn[0], cmndtkn);
            }else
                wait(NULL);
        }
    }
}
What is the purpose of the calls to close() with parameters of 0 and 1, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code may have been more clearly expressed with dup2() instead.
dup2(fd[1], 1); /* alias fd[1] to 1 */
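In other words, assuming the same pfds array as in the question, the following two fragments do the same job (the first relies on fd 1 being the lowest free descriptor at that point):

/* close-then-dup idiom: dup() reuses the lowest free descriptor, which is now 1 */
close(1);
dup(pfds[1]);

/* clearer equivalent: dup2() closes fd 1 (if open) and duplicates pfds[1] onto it */
dup2(pfds[1], 1);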
From your question about how ls | sort works, your question is not limited to why the dup() system call is being made. Your question is actually how pipes in Unix work, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that writing data on the writable descriptor allows that data to be read from the readable descriptor. The pipe() call returns this pair in an array, where the first array element is readable, and second array element is writable.
In Unix, a fork() followed by some kind of exec() is the only way to produce a new process (there are other library calls, such as system() or popen() that create processes, but they call fork() and do an exec() under the hood). A fork() produces a child process. The child process sees the return value of 0 from the call, while the parent sees a non-zero return value that is either the PID of the child process, or a -1 indicating that an error has occurred.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process; the parent does not see the modification occur, as the parent has the original copy. However, a duplicated pair of file descriptors that form a pipe can be used to allow a child process and its parent to communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls that close the pipe file descriptors are used to make sure that the ls process is the only process with an open reference to the writable fd, and the sort process is the only process with an open reference to the readable fd. That step is important because after ls exits, it will close the writable fd, and the sort process will expect to see an EOF as a result. However, this will not occur if some other process still has the writable fd open.
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork section, the process closes stdout then calls dup on pfds[1] which according to:
http://linux.die.net/man/2/dup
Creates a duplicate of the specified file descriptor at the lowest available position, which will be 1, since it was just closed (and stdin hasn't been closed yet). This means everything sent to stdout will really go to pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork section is for the new child, which will send data to stdout (file descriptor 1); the parent (the else block) closes stdin, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the file descriptor in pfds it's not using, as there are two open handles to the file now that the process has forked. Each process now execs to left/right-cmnd, but the new stdin and stdout mappings remain for the new processes.
Forking twice is explained here: Why fork() twice
I am writing some C code that involves the use of pipes. To make a child process use my pipe instead of STDOUT for output, I used the following lines:
close(STDOUT);
dup2(leftup[1], STDOUT);
However, it seems to go into some sort of infinite loop or hang on those lines. When I get rid of close, it hangs on dup2.
Curiously, the same idea works in the immediately preceding line for STDIN:
close(STDIN);
dup2(leftdown[0], STDIN);
What could be causing this behavior?
Edit: Just to be clear...
#define STDIN 0
#define STDOUT 1
Edit 2: Here is a stripped-down example:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

#define STDIN 0
#define STDOUT 1

int main(){
    pid_t child1 = 0;

    int leftdown[2];
    if (pipe(leftdown) != 0)
        printf("ERROR");

    int leftup[2];
    if (pipe(leftup) != 0)
        printf("ERROR");

    printf("MADE PIPES");

    child1 = fork();
    if (child1 == 0){
        close(STDOUT);
        printf("TEST 1");
        dup2(leftup[1], STDOUT);
        printf("TEST 2");
        exit(0);
    }

    return(0);
}
The "TEST 1" line is never reached. The only output is "MADE PIPES".
At a minimum, you should ensure that the dup2 function returns the new file descriptor rather than -1.
There's always a possibility that it will give you an error (for example, if the pipe() call failed previously). In addition, be absolutely certain that you're using the right indexes (0 and 1) - I've been bitten by that before and it depends on whether you're in the parent or child process.
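For example, here is a minimal sketch of the kind of checking meant, reusing the question's STDOUT define and leftup pipe (and the headers already included there):

int leftup[2];
if (pipe(leftup) == -1) {              /* pipe() reports failure with -1 */
    perror("pipe");
    exit(EXIT_FAILURE);
}
if (dup2(leftup[1], STDOUT) == -1) {   /* dup2() also returns -1 on error */
    perror("dup2");
    exit(EXIT_FAILURE);
}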
Based on your edit, I'm not the least bit surprised that MADE PIPES is the last thing printed.
When you try to print TEST 1, you have already closed the STDOUT descriptor so that will go nowhere.
When you try to print TEST 2, you have duped the STDOUT descriptor so that will go to the parent but your parent doesn't read it.
If you change your forking code to:
child1 = fork();
if (child1 == 0){
    int count;

    close(STDOUT);
    count = printf("TEST 1\n");
    dup2(leftup[1], STDOUT);
    printf("TEST 2 (%d)\n", count);
    exit(0);
} else {
    char buff[80];

    read(leftup[0], buff, 80);
    printf("%s\n", buff);
    sleep(2);
}
you'll see that the TEST 2 (-1) line is output by the parent because it read it via the pipe. The -1 in there is the return code from the printf you attempted in the child after you closed the STDOUT descriptor (but before you duped it), meaning that it failed.
From ISO C11 7.20.6.3 The printf function:
The printf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred.
Multiple things to mention:
When you use fork, it creates an almost complete copy of the parent process. That includes the buffer set up for the stdout standard output stream as well. The stdout stream holds the data until the buffer is full or it is explicitly asked to flush. Because of this, you now have "MADE PIPES" sitting in the buffer. When you close the STDOUT fd and then use printf, it does nothing but transfer your "TEST 1" and "TEST 2" into the stdout buffer, and it doesn't cause any error or crash (there is enough buffer space). Thus, even after duplicating the pipe fd onto STDOUT, printf hasn't even touched the pipe's write end because of the buffered output. Most important, use only one set of APIs, i.e. either the *NIX system calls or the standard C library functions. Make sure you understand the libraries well, as they often play tricks for some sort of optimization.
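A minimal sketch of that point, based on the question's code (the leftup pipe, child1, and the STDOUT define are assumed from above): flush stdio before the fork and before exiting the child, so nothing is left stuck in the buffer:

printf("MADE PIPES");
fflush(stdout);                /* push it out now, or the child inherits it in its buffer */

child1 = fork();
if (child1 == 0){
    dup2(leftup[1], STDOUT);   /* atomically closes fd 1 and redirects it to the pipe */
    printf("TEST 2\n");
    fflush(stdout);            /* make sure it reaches the pipe before exit() */
    exit(0);
}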
Now, another thing to mention: make sure that you close the appropriate ends of the pipe in the appropriate process. That is, if pipe-1 is used to communicate from parent to child, make sure that you close the read end in the parent and the write end in the child. Otherwise your program may hang. Because of the reference counts associated with file descriptors, you might think that closing the read end in the child means the pipe's read end is closed; but as long as the parent has not also closed its read end, there is an extra reference to the read end, and that end of the pipe is never really closed.
There are many other things about your coding style that you should get a hold of; the sooner you learn them, the more time you will save. :)
Error checking is absolutely important, use at least assert to ensure that your assumptions are correct.
If you use printf statements to log errors or as a method of debugging while you are changing the terminal FDs (STDOUT / STDIN / STDERR), it is better to open a log file with the *NIX open() call and write your errors / log entries to it.
Lastly, the strace utility will be a great help. It lets you trace the system calls executed while your code runs. It is very straightforward and simple, and you can even attach it to a running process, provided you have the right permissions.
Recently I started suspecting that I use the ends of the pipes wrongly:
From the man pages:
pipe() creates a pipe.. ..pipefd[0] refers to the read end of the
pipe. pipefd[1] refers to the write end of the pipe.
So in my mind I had it like this:
           .----------------------------.
          /                              /\
          |  pipedfd[0]     pipedfd[1]  |  |
process1 --->                           |  | -----> process2
          |  input          output      |  |
          \______________________________\/
However the code that I have here and works suggests otherwise:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pipedfd[2];
    char buf[30];

    pipe(pipedfd);

    printf("writing to file descriptor #%d\n", pipedfd[1]);
    write(pipedfd[1], "test", 5);

    printf("reading from file descriptor #%d\n", pipedfd[0]);
    read(pipedfd[0], buf, 5);
    printf("read \"%s\"\n", buf);

    return 0;
}
Namely it writes to the output(?) of the pipe and reads from the input(?) of the pipe?
In a nutshell, swap the numbers 0 and 1 in your diagram and you get what I'll describe below.
From the Mac OS X man page:
The pipe() function creates a pipe (an object that allows unidirectional data flow) and allocates a pair of file descriptors. The first descriptor connects to the read end of the pipe; the second connects to the write end.
Data written to fildes[1] appears on (i.e., can be read from) fildes[0]. This allows the output of one program to be sent to another program: the source's standard output is set up to be the write end of the pipe; the sink's standard input is set up to be the read end of the pipe. The pipe itself persists until all of its associated descriptors are closed.
I'll describe how it's often used, that might clear it up. Imagine you have a process and want to spawn a child, to which you want to send commands.
First, you call pipe and get the two file descriptors.
Then you call fork to create the child.
In the child, you close the writing file descriptor (fd[1]) and leave the reading one open.
In the parent, you do the reverse: you close the reading (fd[0]) file descriptor and leave the writing one open.
Now the parent can write into "his" part of the pipe (fd[1]) and the child can read on the other (fd[0]).
The closing is not strictly necessary but is usually done. If you need two-way communication, you either need a second set of file descriptors plus a second call to pipe, or you use a two-way channel like Unix domain sockets or a named pipe.
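Here is a minimal, self-contained sketch of that pattern (the parent writes a short message, the child reads it; names like msg are just for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }

    if (fork() == 0) {                    /* child: reads from fd[0] */
        char buf[64];
        close(fd[1]);                     /* child will not write    */
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        close(fd[0]);
        _exit(0);
    }

    close(fd[0]);                         /* parent will not read    */
    const char *msg = "hello";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);                         /* child's read() sees EOF after this */
    wait(NULL);
    return 0;
}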
The Linux man page for pipe disambiguates this as follows:
Data written to the write end of the pipe is buffered by the kernel
until it is read from the read end of the pipe.
That is, you read from fd[0] and write to fd[1]