How many file descriptors are created/involved after running this program?

Given the program below:
/*****************************************************************************
MODULE: popen.c
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>   /* exit() */
int main(void)
{
    FILE *pipein_fp, *pipeout_fp;
    char readbuf[80];
    /* Create one way pipe line with call to popen(): read the output of ls */
    if ((pipein_fp = popen("ls", "r")) == NULL)
    {
        perror("popen");
        exit(1);
    }
    /* Create one way pipe line with call to popen(): write to the input of sort */
    if ((pipeout_fp = popen("sort", "w")) == NULL)
    {
        perror("popen");
        exit(1);
    }
    /* Processing loop: copy each line from ls to sort */
    while (fgets(readbuf, sizeof(readbuf), pipein_fp))
        fputs(readbuf, pipeout_fp);
    /* Close the pipes (pclose() also waits for each command to finish) */
    pclose(pipein_fp);
    pclose(pipeout_fp);
    return 0;
}
popen.c is compiled to my_program
Here is my understanding of the file descriptors created/involved after executing my_program: popen() forks and execs from the my_program process, but the child processes do not inherit the pipe file descriptors of my_program.
So, after exec:
1) a write file descriptor is created only for ls
2) a read file descriptor is created only for sort
3) read and write file descriptors are created in my_program, because ls writes to my_program and sort reads from my_program
As shown above, are these the only file descriptors involved/created?
Note: 'in' & 'out' are just naming conventions used here

Child processes from a fork() have exactly the same set of open file descriptors as the parent process.
The popen() call uses pipe() to create two file descriptors; it then executes fork(). The parent process arranges that one end of the pipe is closed and the other converted to a file stream (FILE *). The child process closes the other end of the pipe and arranges for the one end to become standard input ("w") or standard output ("r") for the process it executes (using dup() or dup2() for the task).
You use popen() twice; you end up with 2 open descriptors in the parent, and transiently there's a third.
When you say, 'the parent process arranges that one end of the pipe is closed', do you mean read file descriptor (stdin)?
Depending on the mode argument to popen(), one of the two ends of the pipe in the parent is closed immediately; the other is closed by pclose(). The file descriptor is 'never' the one for standard input or standard output — you have to go through extraordinary gyrations to make it so that is one of the standard I/O channels.
Do all processes started through popen() use dup() to make sure they use stdout and stdin?
Each pipe has a read end and a write end. Take popen("ls", "r"); your program reads from the ls process. It (popen()) creates a pipe and forks. In the child, the write end of the pipe is connected to stdout (dup2() or perhaps dup()), and the read end of the pipe is closed, before the command is executed. In the parent, the read end of the pipe is 'converted to' or 'attached to' a stream (fdopen(), more or less) and the write end of the pipe is closed. In the parent process, the pipe is never connected to either stdout or stdin.
In the child process, either standard input or standard output is connected to the pipe, depending on the mode argument to popen().

Related

Is closing a pipe necessary when followed by execlp()?

Before stating my question, I have read several related questions on stack overflow, such as pipe & dup functions in UNIX, and several others,but didn't clarify my confusion.
First, the code, which is an example code from 'Beginning Linux Programming', 4th edition, Chapter 13:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main()
{
    int data_processed;
    int file_pipes[2];
    const char some_data[] = "123";
    pid_t fork_result;
    if (pipe(file_pipes) == 0)
    {
        fork_result = fork();
        if (fork_result == (pid_t)-1)
        {
            fprintf(stderr, "Fork failure");
            exit(EXIT_FAILURE);
        }
        if (fork_result == (pid_t)0) // Child process
        {
            close(0);
            dup(file_pipes[0]);
            close(file_pipes[0]); // LINE A
            close(file_pipes[1]); // LINE B
            execlp("od", "od", "-c", (char *)0);
            exit(EXIT_FAILURE);
        }
        else // parent process
        {
            close(file_pipes[0]); // LINE C
            data_processed = write(file_pipes[1], some_data,
                                   strlen(some_data));
            close(file_pipes[1]); // LINE D
            printf("%d - wrote %d bytes\n", (int)getpid(), data_processed);
        }
    }
    exit(EXIT_SUCCESS);
}
The execution result is:
momo#xue5:~/TestCode/IPC_Pipe$ ./a.out
10187 - wrote 3 bytes
momo#xue5:~/TestCode/IPC_Pipe$ 0000000 1 2 3
0000003
momo#xue5:~/TestCode/IPC_Pipe$
If you commented LINE A, LINE C, and LINE D, the result is the same as above.
I understand the result, the child get the data from its parent through its own stdin which is connected to pipe, and send 'od -c' result to its stdout.
However, if you commented LINE B, the result would be:
momo#xue5:~/TestCode/IPC_Pipe$ ./a.out
10436 - wrote 3 bytes
momo#xue5:~/TestCode/IPC_Pipe$
No 'od -c' result!
Is the 'od -c' started by execlp() not executed, or is its output not directed to stdout? One possibility is that the read() in 'od' is blocked, because the child's write file descriptor file_pipes[1] is still open if you comment out LINE B. But commenting out LINE D, which leaves the parent's write file descriptor file_pipes[1] open, still produces the 'od -c' output.
And, why we need to close pipe before execlp()? execlp() will replace the process image, including stack, .data, .heap, .text with new image from 'od'. Does that mean, even if you don't close file_pipes[0] and file_pipes[1] in child as LINE A and B, file_pipes[0] and file_pipes[1] will still be 'destroyed' by execlp()? From the result by code, it is not. But where am I wrong?
Thanks so much for your time and efforts here~~
Is closing a pipe necessary when followed by execlp()?
It's not strictly necessary because it depends on how the pipe is used. But in general, yes it should be closed if the pipe end is not needed by the process.
why we need to close pipe before execlp()? execlp() will replace the process image
Because file descriptors (by default) remain open across exec calls. From the man page: "By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed; see the description of FD_CLOEXEC in fcntl(2)."
However, if you commented LINE B,...No 'od -c' result!
This is because the od process reads from stdin until it gets an EOF. If the process itself does not close file_pipes[1] then it will not see an EOF as the write end of the pipe would not be fully closed by all processes that had it opened.
If you commented LINE A, LINE C, and LINE D, the result is the same as above
This is because the file descriptors at A and C are read ends of the pipe and no one will be blocked waiting for it to be closed (as described above). The file descriptor at D is a write end and not closing it would indeed cause problems. However, even though the code does not explicitly call close on that file descriptor, it will still be closed because the process exits.
And, why we need to close pipe before execlp()? execlp() will replace the process image, including stack, .data, .heap, .text with new image from 'od'.
Yes, the exec-family functions, including execlp(), replace the process image of the calling process with a copy of the specified program. But the process's table of open file descriptors is not part of the process image -- it is maintained by the kernel, and it survives the exec.
Does that mean, even if you don't close file_pipes[0] and file_pipes[1] in child as LINE A and B, file_pipes[0] and file_pipes[1] will still be 'destroyed' by execlp()?
The variable file_pipes is destroyed by execlp(), but that's just the program's internal storage for the file descriptors. The descriptors are just integer indexes into a table maintained for the process by the kernel. Losing track of the file descriptor values does not cause the associated files to be closed. In fact, that's a form of resource leakage.
From the result by code, it is not. But where am I wrong?
As described above.
Additionally, when a process exits, all its open file descriptors are closed, but the underlying open file description in the kernel, to which the file descriptors refer, is closed only when no open file descriptors referring to it remain. Additional open file descriptors may be held by other processes, as a result of inheriting them across a fork().
Now as to the specific question of what happens when the child process does not close file_pipes[1] before execing od, you might get a clue by checking the process list via the ps command. You will see the child od process still running (maybe several, if you have tested several times). Why?
Well, how does od know when to exit? It processes its entire input, so it must exit when it reaches the end of its input(s). But on a pipe, the fact that no data is available right now does not mean end of input, because more data might later be written to the write end of the pipe. End of input on a pipe happens only when every write end is closed. And if the child does not close file_pipes[1] before it execs, then that write end will likely remain open indefinitely, because after the exec the child no longer knows that it owns it.

Using pipe() after child calls exec()

My end goal is to have a parent process pass lines of text to the child, and then the child process will print the text to stdout. The child is to run "permanently" in the background while the parent gets user input and passes it to the child. I prefer the child in a separate program. Differentiating between child and parent through if statements is messy as fudge.
I was looking into pipes but I'm unsure if it's even possible for pipes to communicate between a parent/child after the child has called exec() to a different program.
Is this possible? If so, is there any example you can point me to? If not, what method of IPC can I use in that case?
The standard scenario is to have the program executed as a child be agnostic of the pipe and just use stdin / stdout. You achieve this by dup2()ing the respective end of the pipe as fd 0 or 1 (or both with two pipes for bidirectional communication), corresponding to STDIN_FILENO and STDOUT_FILENO. After this, exec your child program.
Of course, there are alternatives like e.g. "named pipes" if you need stdin / stdout for a different purpose in the child.
Still, if you write both parts yourself, you might want to think about simpler solutions:
Differentiating between child and parent through if statements is messy as fudge.
You have to do this anyway, at least for wiring up the pipes and calling exec(). Just create separate code files and call functions like parent_main() and child_main() as appropriate (whatever you like to call them).
After fork(), the child has copies of all the parent's file descriptors, and by default they stay open across the exec*() functions. So if you create a pipe before fork(), you have access to the read and write fds in both processes.
Usual method is:
create a pipe (a read fd, a write fd)
fork()
in parent:
close read fd (you will not read from parent here)
write data into write fd
wait/die
in child:
close write fd (you will not write from child)
read data from read fd
wait/die
I can't see why you want to exec a new process. If it's really needed, you have to use standard stdin/stdout (see the other answer) or have a program that accepts an fd (file descriptor, an integer) as a parameter so that it knows which one is the pipe. That doesn't seem very nice to me.
Actually, it is much easier to do it with popen(), which manages the communication channels automatically:
FILE *popen(const char *command, const char *mode);
The popen() function shall execute the command specified by the string
command. It shall create a pipe between the calling program and the
executed command, and shall return a pointer to a stream that can be
used to either read from or write to the pipe.
The environment of the executed command shall be as if a child process
were created within the popen() call using the fork() function, and
the child invoked the sh utility using the call:
execl(shell path, "sh", "-c", command, (char *)0);
where shell path is an unspecified pathname for the sh utility.
The popen() function shall ensure that any streams from previous
popen() calls that remain open in the parent process are closed in the
new child process.
The mode argument to popen() is a string that specifies I/O mode:
If mode is r, when the child process is started, its file descriptor
STDOUT_FILENO shall be the writable end of the pipe, and the file
descriptor fileno(stream) in the calling process, where stream is the
stream pointer returned by popen(), shall be the readable end of the
pipe.
If mode is w, when the child process is started its file descriptor
STDIN_FILENO shall be the readable end of the pipe, and the file
descriptor fileno(stream) in the calling process, where stream is the
stream pointer returned by popen(), shall be the writable end of the
pipe.
If mode is any other value, the result is undefined.
After popen(), both the parent and the child process shall be capable
of executing independently before either terminates.
Pipe streams are byte-oriented.
RETURN VALUE
Upon successful completion, popen() shall return a pointer to an open
stream that can be used to read or write to the pipe. Otherwise, it
shall return a null pointer and may set errno to indicate the error.
For example, suppose you want to send "Hello World" to the child:
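#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    /* ./child is the executable of the other file: change the name */
    FILE *toChild = popen("./child", "w");
    if (toChild == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }
    if (fputs("Hello World\n", toChild) == EOF)
        perror("fputs");
    pclose(toChild);   /* closes the pipe (child sees EOF) and waits for it */
    return 0;
}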
and the child:
#include <stdio.h>
int main(void)
{
    char p[100];
    int n;
    do {
        n = scanf("%99s", p);   /* one whitespace-delimited word, no overflow */
        if (n > 0) {
            printf("INPUT MESSAGE: \"%s\"\n", p);
        }
        else {
            printf("%d, No matching characters\n", n);
        }
    } while (n > 0);
    return 0;
}
You can also use scanf("%ms", &p) with char* p and then free(p) if you are on a pure POSIX system (not on OSX).

Can not understand the pipe() in my own shell

This is the code I found for my own shell. It works fine, but the thing I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>   /* wait() */
char* cmndtkn[256];
char buffer[256];
char pwd[128];
int main(){
    system("clear");
    printf("\t MY OWN SHELL !!!!!!!!!!\n ");
    printf("_______________________________________\n\n");
    while(1){
        getcwd(pwd,128);
        printf("[MOSH~%s]$",pwd);
        fflush(stdout);                     /* show the prompt before reading */
        if(fgets(buffer,sizeof(buffer),stdin)==NULL)
            exit(0);                        /* EOF (Ctrl-D) ends the shell */
        //tokenize the input command line
        char* tkn = strtok(buffer," \t\n");
        int i=0;
        int indictr=0;
        // loop for every part of the command
        while(tkn!=NULL && i<255)
        {
            if(strcmp(tkn,"|")==0){
                indictr=i;                  /* position of the | token */
            }
            cmndtkn[i++] = tkn;
            tkn = strtok(NULL," \t\n");
        }
        cmndtkn[i]=NULL;
        if(i==0)
            continue;                       /* empty line */
        if(strcmp(cmndtkn[0],"exit")==0)
            exit(0);
        if(strcmp(cmndtkn[0],"cd")==0){     /* built-in: must run in the shell itself */
            if(i>1 && chdir(cmndtkn[1])!=0)
                perror("cd");
            continue;
        }
        // execute when command has pipe. when | command is found indictr is greater than 0.
        if(indictr>0){
            char* leftcmnd[indictr+1];
            char* rightcmnd[i-indictr];
            int a,b;
            for(b=0;b<indictr;b++)
                leftcmnd[b]=cmndtkn[b];
            leftcmnd[indictr]=NULL;
            for(a=0;a<i-indictr-1;a++)
                rightcmnd[a]=cmndtkn[a+indictr+1];
            rightcmnd[i-indictr-1]=NULL;    /* last valid index of rightcmnd */
            if(!fork())
            {
                int pfds[2];
                pipe(pfds);
                if(!fork()){
                    close(1);               /* free descriptor 1 ... */
                    dup(pfds[1]);           /* ... so dup() returns 1: stdout = write end */
                    close(pfds[0]);
                    execvp(leftcmnd[0],leftcmnd);
                    perror(leftcmnd[0]);    /* only reached if exec failed */
                    exit(1);
                }
                else{
                    close(0);               /* free descriptor 0 ... */
                    dup(pfds[0]);           /* ... so dup() returns 0: stdin = read end */
                    close(pfds[1]);
                    execvp(rightcmnd[0],rightcmnd);
                    perror(rightcmnd[0]);
                    exit(1);
                }
            }else
                wait(NULL);
        //command not include pipe
        }else{
            if(!fork()){
                execvp(cmndtkn[0],cmndtkn);
                perror(cmndtkn[0]);
                exit(1);
            }else
                wait(NULL);
        }
    }
}
What do the calls to close() with arguments 0 and 1 do, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code may have been more clearly expressed with dup2() instead.
dup2(fd[1], 1); /* alias fd[1] to 1 */
From your question about how ls | sort works, your question is not limited to why the dup() system call is being made. Your question is actually how pipes in Unix work, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that writing data on the writable descriptor allows that data to be read from the readable descriptor. The pipe() call returns this pair in an array, where the first array element is readable, and the second array element is writable.
In Unix, a fork() followed by some kind of exec() is the only way to produce a new process (there are other library calls, such as system() or popen() that create processes, but they call fork() and do an exec() under the hood). A fork() produces a child process. The child process sees the return value of 0 from the call, while the parent sees a non-zero return value that is either the PID of the child process, or a -1 indicating that an error has occurred.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process; the parent does not see the modification, as it has the original copy. However, a duplicated pair of file descriptors that form a pipe can be used to allow a child process and its parent to communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls that close the pipe file descriptors are used to make sure that the ls process is the only process with an open reference to the writable fd, and the sort process is the only process with an open reference to the readable fd. That step is important because after ls exits, its writable end of the pipe is closed, and the sort process expects to see an EOF as a result. However, this will not occur if some other process still has the writable fd open.
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork() section, the process closes stdout and then calls dup() on pfds[1], which, according to
http://linux.die.net/man/2/dup
creates a duplicate of the specified file descriptor at the lowest available position. That will be 1, since it was just closed (and stdin hasn't been closed yet). This means everything sent to stdout will really go to pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork() section is for the new child, which will send data to stdout (file descriptor 1); the parent (the else block) closes stdin, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the file descriptor in pfds it's not using, as there are two open handles to the file now that the process has forked. Each process now execs to left/right-cmnd, but the new stdin and stdout mappings remain for the new processes.
Forking twice is explained here: Why fork() twice

Is dup2() necessary for execl

Is it necessary to replace stdin with a pipe end when using pipes?
I have an application that:-
Creates a pipe,
Forks a child process, and then
execl() a new process image within new child process,
But I'm running into two conceptual issues.
Is it necessary to use dup() or dup2() to replace stdin? It would obviously be easier to just use the fd from the pipe. (I need a little insight into this.)
If you can just use the fd from the pipe, how do you pass an integer fd using execl() when execl takes char * arguments?
I'm having trouble figuring out exactly what remains open after execl() is performed, and how to access that information from the newly execl'd process.
It depends on the commands you're running. However, many Unix commands read from standard input and write to standard output, so if the pipes are not set up so that the write end is the output of one command and the read end is the input of the next command, nothing happens (or, more accurately, programs read from places where the input isn't coming from, or write to places which will not be read from, or hang waiting for you to type the input at the terminal, or otherwise do not work as intended).
If your pipe is on file descriptors 3 and 4, the commands you execute must know to read from 3 and write to 4. You could handle that with shell, but it is moderately grotesque overkill to do so compared with using dup2().
No; you're not obliged to use dup2(), but it is generally easier to do so. You could close standard output and then use plain dup() instead of dup2().
If you use dup2() for a pipe, don't forget to close both of the original file descriptors.
You are probably trying to feed data to a subprocess that already exists on the system, but on the off chance that you are also writing the child process yourself, then no, you don't need to use dup() and stdin.
execl() keeps all open file descriptors from the parent process open so you could:
int fd[2];
pipe(fd);
if (fork() == 0)
{
    char tmp[20];
    close(fd[1]);                                    /* child only reads */
    snprintf(tmp, sizeof(tmp), "%d", fd[0]);         /* fd number as text */
    execl("./client", "client", tmp, (char *)NULL);  /* argv[0], argv[1] = fd */
    exit(1);
}
and in the code for client:
#include <stdlib.h>   /* strtol() */
int main(int argc, char** argv)
{
    int fd = (argc > 1) ? (int)strtol(argv[1], NULL, 10) : -1;  /* strtol, not strtod */
    /* Read from fd */
}

Which end of a pipe is for input and which for output?

Recently I started suspecting that I use the ends of the pipes wrongly:
From the man pages:
pipe() creates a pipe.. ..pipefd[0] refers to the read end of the
pipe. pipefd[1] refers to the write end of the pipe.
So in my mind I had it like this:
.---------------------------.
/ /\
| pipedfd[0] pipedfd[1]| |
process1 ---> | | -----> process2
| input output| |
\____________________________\/
However the code that I have here and works suggests otherwise:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
    int pipedfd[2];
    char buf[30];
    pipe(pipedfd);
    printf("writing to file descriptor #%d\n", pipedfd[1]);
    write(pipedfd[1], "test", 5);
    printf("reading from file descriptor #%d\n", pipedfd[0]);
    read(pipedfd[0], buf, 5);
    printf("read \"%s\"\n", buf);
    return 0;
}
Namely it writes to the output(?) of the pipe and reads from the input(?) of the pipe?
In a nutshell, swap the numbers 0 and 1 in your diagram and you got what I'll describe below.
From the Mac OS X man page:
The pipe() function creates a pipe (an object that allows unidirectional data flow) and allocates a pair of file descriptors. The first descriptor connects to the read end of the pipe; the second connects to the write end.
Data written to fildes[1] appears on (i.e., can be read from) fildes[0]. This allows the output of one program to be sent to another program: the source's standard output is set up to be the write end of the pipe; the sink's standard input is set up to be the read end of the pipe. The pipe itself persists until all of its associated descriptors are closed.
I'll describe how it's often used, that might clear it up. Imagine you have a process and want to spawn a child, to which you want to send commands.
First, you call pipe and get the two file descriptors.
Then you call fork to create the child.
In the child, you close the writing file descriptor (fd[1]) and leave the reading one open.
In the parent, you do the reverse: you close the reading (fd[0]) file descriptor and leave the writing one open.
Now the parent can write into "his" part of the pipe (fd[1]) and the child can read on the other (fd[0]).
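The closing is not strictly necessary but is usually done (a reader that keeps its own copy of the write end open will never see EOF). If you need two-way communication, you either need a second set of file descriptors plus a second call to pipe(), or a two-way channel such as a pair of Unix domain sockets; a named pipe, like an ordinary pipe, carries data in one direction only.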
The Linux man page for pipe disambiguates this as follows:
Data written to the write end of the pipe is buffered by the kernel
until it is read from the read end of the pipe.
That is, you read from fd[0] and write to fd[1]

Resources