In the following code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
printf("before child\n");
pid_t pid = fork();
if(pid == 0)
{
exit(0);
}
int status;
wait(&status);
if(4 != printf("abc\n"))
perror("printing to stdout");
return 0;
}
Produces the output:
before child
abc
The call to exit() in the child should close all file descriptors, including the stdout fd.
Then how can the parent process still write to stdout after it has been closed?
Think of file descriptors as pointers to reference-counted file objects.
When you fork, the child process gets new references to the same streams as the parent process: the parent's and the child's descriptors point to the same stream objects.
When your child process exits, all of the file descriptors of the child process are closed. But since the parent also has file descriptors to the stream objects, the streams don't go away.
Files and streams are only torn down once no one refers to them anymore. In this case, the parent process refers to them.
(For additional fun, check out the dup family of functions, which duplicate file descriptors in a similar way. With them, you can have two file descriptors for the same file in a single process.)
Related
Suppose I have a process p that uses a file descriptor, for example an unnamed pipe, to communicate with its parent process p1.
Suppose p calls fork() to create a child process c, which right after the fork() calls one of the exec family functions.
By default, the parent's file descriptors are duplicated in the child, even across exec. So c should be able to communicate with p1, since its parent p has opened a file descriptor to p1.
How can I get that file descriptor in the C source code of c child process if the variable corresponding to that file descriptor is defined only in p (and p1)?
Just to give an example of what I mean, the following is the code for p and p1:
//p1 process
int fd[2];
pipe(fd);
pid_t p_pid, c_pid;
p_pid = fork();
if(p_pid == 0) // p process
{
/* p closes the read end of the pipe */
close(fd[0]);
c_pid = fork();
if (c_pid == 0) // c child process
{
exec("execution/child/process"...);
}
else
{
...// do p process parenting stuff
}
}
else
{
/* Parent p1 closes the write end of the pipe */
close(fd[1]);
}
Now "execution/child/process" has its own source code, and I cannot use the variable fd to communicate with p1 since it is not defined there, but the file descriptor should exist. So how do I reference it and use it?
By default, the parent's file descriptors are duplicated in the child, even across exec. So c should be able to communicate with p1, since its parent p has opened a file descriptor to p1.
Yes. The main proviso is that the file descriptor is not set close-on-exec.
How can I get that file descriptor in the C source code of c child process if the variable corresponding to that file descriptor is defined only in p (and p1)?
You can dup2() the file descriptor onto a well-known number, such as stdin's (0), stdout's (1), or stderr's (2), or some other number that the parent and child code agree upon.
You can pass the file descriptor number to the child as an argument.
You can write the number to a file from which the child subsequently reads it.
As a special case of the previous, you can set up a pipe from the parent to the child's stdin, and send the number to the child over the pipe.
Those are not the only possibilities, but I think they cover all of the easiest and best ones. Note well that the first is the only one that does not depend on the cooperation of the child.
So in this code snippet, a process forks a child process. The child calculates a random number r and runs the Linux command head -<r> on a file with an exec function, which replaces the child's program. To send the result back to the parent process, the child first closes the stdout file descriptor, then duplicates the write end of a pipe p shared with the parent, and then closes both original ends of p. After execlp, the parent process can read the output of head from the pipe p.
How is this possible?
if (pid == 0)
{
/* code of child */
srand(time(NULL));
nr=atoi(argv[(i*2)+2]);
r=mia_random(nr); //calc random value
close(1); //closing standard output???
dup(p[1]); //duplicating write end of inherited pipe from parent
close(p[0]);//closing read end of inherited pipe
close(p[1]);//closing write end of inherited pipe
//creating a variable to hold an argument for `head`
sprintf(option, "-%d", r);
//calling head on a file given as argument in main
execlp("head", "head", option, argv[(i*2)+1], (char *)0);
/* must not be here anymore*/
/* using perror to check for errors since stdout is closed? or connected to the pipe?*/
perror("Problem executing head by child process");
exit(-1);
}
Why wasn't the output of head written to stderr instead? How come it was written to the duplicate of p[1]?
POSIX requires every call that creates a new file descriptor (open(), dup(), pipe(), and so on) to use the lowest-numbered descriptor that is not currently open.
Effectively, that means that if fd 0 and 1 are open and p[1] != 1, then
close(1);
dup(p[1]);
in a single-threaded process is equivalent to
dup2(p[1],1);
or in other words, if the dup call in this context succeeds, it will return file descriptor 1.
This is the code I found for my own shell. It works fine, but what I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>
char* cmndtkn[256];
char buffer[256];
char* path=NULL;
char pwd[128];
int main(){
//no PATH setup is needed: execvp() searches the inherited PATH itself
system("clear");
printf("\t MY OWN SHELL !!!!!!!!!!\n ");
printf("_______________________________________\n\n");
while(1){
getcwd(pwd,128);
printf("[MOSH~%s]$",pwd);
fflush(stdout); //show the prompt before fgets() blocks
if(fgets(buffer,sizeof(buffer),stdin)==NULL) //EOF, e.g. Ctrl-D
exit(0);
//tokenize the input command line
char* tkn = strtok(buffer," \t\n");
int i=0;
int indictr=0;
// loop for every part of the command
while(tkn!=NULL)
{
if(strcoll(tkn,"exit")==0 ){
exit(0);
}
else if(strcoll(buffer,"cd")==0){
path = buffer;
chdir(path+=3);
}
else if(strcoll(tkn,"|")==0){
indictr=i;
}
cmndtkn[i++] = tkn;
tkn = strtok(NULL," \t\n");
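}
cmndtkn[i]=NULL; //execvp() expects a NULL-terminated argument list
if(i==0) continue; //empty input line: nothing to run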
// execute when command has pipe. when | command is found indictr is greater than 0.
if(indictr>0){
char* leftcmnd[indictr+1];
char* rightcmnd[i-indictr];
int a,b;
for(b=0;b<indictr;b++)
leftcmnd[b]=cmndtkn[b];
leftcmnd[indictr]=NULL;
for(a=0;a<i-indictr-1;a++)
rightcmnd[a]=cmndtkn[a+indictr+1];
rightcmnd[i-indictr-1]=NULL; //last valid index of rightcmnd[i-indictr]
if(!fork())
{
fflush(stdout);
int pfds[2];
pipe(pfds);
if(!fork()){
close(1);
dup(pfds[1]);
close(pfds[0]);
execvp(leftcmnd[0],leftcmnd);
perror(leftcmnd[0]); //only reached if execvp() fails
exit(1);
}
else{
close(0);
dup(pfds[0]);
close(pfds[1]);
execvp(rightcmnd[0],rightcmnd);
perror(rightcmnd[0]); //only reached if execvp() fails
exit(1);
}
}else
wait(NULL);
//command not include pipe
}else{
if(!fork()){
fflush(stdout);
execvp(cmndtkn[0],cmndtkn);
perror(cmndtkn[0]); //only reached if execvp() fails
exit(1);
}else
wait(NULL);
}
}
}
What is the purpose of the calls to close() with arguments 0 and 1, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code could have been expressed more clearly with dup2() instead:
dup2(fd[1], 1); /* alias fd[1] to 1 */
Judging from your question about how ls | sort works, you are asking about more than why the dup() system call is being made. Your question is really how pipes work in Unix, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that data written to the writable descriptor can be read from the readable descriptor. The pipe() call returns this pair in an array, where the first element is readable and the second element is writable.
In Unix, new processes are traditionally created with fork(), usually followed by some kind of exec() to run a different program (library calls such as system() or popen() also create processes, but they use fork() and exec(), or an equivalent, under the hood). A fork() produces a child process. The child process sees a return value of 0 from the call, while the parent sees a non-zero return value: either the PID of the child process, or -1 indicating that an error occurred and no child was created.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process; the parent does not see the modification, as it still has the original copy. However, a pipe created before the fork is present as a duplicated pair of file descriptors in both processes, so it can be used to let a child process and its parent communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls on the pipe file descriptors make sure that the ls process is the only process with an open reference to the writable fd, and the sort process is the only process with an open reference to the readable fd. That step is important because when ls exits, its writable end of the pipe is closed, and the sort process expects to see an EOF as a result. However, that EOF will not occur if some other process still has the writable fd open.
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork() section, the process closes stdout and then calls dup() on pfds[1], which, according to the dup(2) man page (http://linux.die.net/man/2/dup), creates a duplicate of the specified file descriptor at the lowest available number. That number will be 1, since it was just closed (and stdin hasn't been closed yet). This means everything sent to stdout will really go to pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork() section is for the new child, which will send data to stdout (file descriptor 1); the parent (the else block) closes stdin, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the pfds descriptor it isn't using, since after the fork there are two open handles to each end of the pipe. Each process then execs leftcmnd or rightcmnd respectively, and the new stdin and stdout mappings remain in place for the new programs.
Forking twice is explained here: Why fork() twice
Assuming I have a parent process that forks a child process, writes to the child, and then waits to read something from the child, can I implement this with one pipe? It would look something like:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void){
pid_t pid1;
int pipefd[2];
char data[]="some data";
char rec[20];
if(pipe(pipefd) == -1){
printf("Failed to pipe\n");
exit(0);
}
pid1 = fork();
if(pid1<0){
printf("Fork failed\n");
exit(0);
}else if(pid1==0){
close(pipefd[1]);
read(pipefd[0],rec,sizeof(rec));
close(pipefd[0]);
//do some work and then write back to pipe
write(pipefd[1],data,sizeof(data));
}else{
close(pipefd[0]);
write(pipefd[1],data,sizeof(data));
close(pipefd[1]);
//ignoring using select() for the moment.
read(pipefd[0],rec,sizeof(rec));
}
return 0;
}
When trying to learn more about this, the man pages state that pipes are unidirectional. Does this mean that when you create a pipe to communicate between a parent and child, the process that writes to the pipe can no longer read from it, and the process that reads from the pipe can no longer write to it? Does this mean you need two pipes to allow back and forth communication? Something like:
Pipe1:
P----read----->C
P<---write-----C
Pipe2:
P----write---->C
P<---read------C
No. Pipes by definition are one-way. The problem is that, without any synchronization, both processes will be reading from the same file descriptor. If you use semaphores, however, you could do something like this:
S := semaphore initialized to 0.
P writes to pipe
P tries down on S (it blocks)
P reads from pipe
C reads from pipe
C writes to pipe
C does up on S (P wakes up and continues)
The other way, which is easier, is to use two pipes.
The POSIX specification of pipe() states: "It is unspecified whether fildes[0] is also open for writing and whether fildes[1] is also open for reading."
That being said, the easiest way would be to use two pipes.
Another way would be to send a file descriptor number, name, or path to the child process through the pipe. The child process can then, instead of writing to fildes[1], write to the file descriptor/name/path it received.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(void)
{
int data_processed;
int file_pipes[2];
const char some_data[] = "123";
char buffer[BUFSIZ + 1];
pid_t fork_result;
memset(buffer, '\0', sizeof(buffer));
if (pipe(file_pipes) == 0) {
fork_result = fork();
if (fork_result == -1) {
fprintf(stderr, "Fork failure");
exit(EXIT_FAILURE);
}
// We've made sure the fork worked, so if fork_result equals zero, we're in the child process.
if (fork_result == 0) {
data_processed = read(file_pipes[0], buffer, BUFSIZ);
printf("Read %d bytes: %s\n", data_processed, buffer);
exit(EXIT_SUCCESS);
}
// Otherwise, we must be the parent process.
else {
data_processed = write(file_pipes[1], some_data,
strlen(some_data));
printf("Wrote %d bytes\n", data_processed);
}
}
exit(EXIT_SUCCESS);
}
Based on my understanding, the child process created by fork doesn't share variables with its parent process. So why, here, can the parent write to one file descriptor while the child process gets the data by reading from another file descriptor? Is this because they are somehow controlled internally by the pipe function?
File descriptors, including pipes, are duplicated on fork -- the child process ends up with the same file descriptor table, including stdin/out/err and the pipes, as the parent had immediately before the fork.
Based on my understanding, the child process created by fork doesn't share variables with its parent process.
This isn't entirely true -- changes to variables are not shared with the parent, but the values that the parent had immediately prior to the fork are all visible to the child afterwards.
In any case, pipes exist within the operating system, not within the process. As such, data written to one end of the pipe becomes visible to any other process holding a FD for the other end. (If more than one process tries to read the data, the first process to try to read() data gets it, and any other processes miss out.)
The variables are not shared e.g. if you write file_pipes[0] = 999 in the child, it will not be reflected in the parent. The file descriptors are shared (FD number x in the child refers to the same thing as FD number x in the parent). This is why (for example) you can redirect the output of a shell script which executes other commands (because they share the same standard output file descriptor).
You're right - ordinary variables aren't shared between the parent and the child.
However, pipes are not variables. They're a pseudo-file specifically designed to connect two independent processes together. When you write to a pipe, you're not changing a variable in the current process - you're sending data off to the operating system and asking it to make that data available to the next process to read from the pipe.
It's just like when you write to a real, on-disk file - except that the data isn't written to disk, it's just made available at the other end of the pipe.