So I have 2 questions about pipes in C:
1:
When I fork a process after creating a pipe, so that the parent writes to the pipe and the child reads from it, how is it synchronized? The parent always seems to send its data before the child attempts to read it, even though they run concurrently.
Why do I never hit the case where the child starts reading before the parent has tried to send its data?
2:
This is about the pipe's file descriptors. Referring to the code below, how can the parent process close the pipe's read (output) file descriptor while the child hasn't accessed it yet, assuming the parent starts first?
Any help will be appreciated, thank you!
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define BUFFER_SIZE 256

int main(int argc, char *argv[])
{
    pid_t worker_pid;
    int descriptor[2];
    char bufferR[BUFFER_SIZE], bufferW[BUFFER_SIZE];

    /***************************** create pipe ********************/
    puts("pipe creation");
    if (pipe(descriptor) != 0)
    {
        fprintf(stderr, "error in the pipe");
        exit(1);
    }

    /***************************** create processes ********************/
    puts("now fork processes");
    worker_pid = fork();      // fork process
    if (worker_pid == -1)     // error forking processes
    {
        fprintf(stderr, "error in fork");
        exit(2);
    }

    /*********************** communicate processes ********************/
    if (worker_pid == 0)      // in the child process : the reader
    {
        close(descriptor[1]); // close the write file descriptor, ok because the parent finished writing?
        read(descriptor[0], bufferR, BUFFER_SIZE);
        printf("i'm the child process and i read : %s \n", bufferR);
    }
    if (worker_pid != 0)      // in the parent process : the writer
    {
        // is there any case where the child attempts to read before the parent writes?
        close(descriptor[0]); // close the child's read file descriptor before it reads?
        sprintf(bufferW, "i'm the parent process my id is %d , and i wrote this to my child", getpid());
        write(descriptor[1], bufferW, BUFFER_SIZE);
        wait(NULL);
    }
    return 0;
}
I expected that for question 1 there would be some cases where the output is:
i'm the child process and i read :
because the parent hasn't written its message yet.
For question 2 I expected an error saying:
invalid file descriptor in the child process, because the parent already closed it (assuming the parent always runs first)
but the actual output is always:
i'm the child process and i read: i'm the parent process my id is 7589, and i wrote this to my child
When I fork a process after creating a pipe, so that the parent writes to the pipe and the child reads from it, how is it synchronized? The parent always seems to send its data before the child attempts to read it, even though they run concurrently. Why do I never hit the case where the child starts reading before the parent has tried to send its data?
Typically, it doesn't need to be synchronized, and in fact it can itself serve as a synchronization mechanism. If you perform blocking reads (the default) then they will not complete until the corresponding data have been sent, regardless of the relative order of the initial read and write calls.
The two processes do, however, need to implement an appropriate mechanism to demarcate and recognize message boundaries, so that the reader can recognize and respond appropriately to short reads. That may be as simple as ending each message with a newline, and reading messages with fgets() (through a stream obtained via fdopen()).
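As a rough illustration of that second point, here is a minimal sketch of a reader, assuming the writer terminates every message with a newline (the function name and buffer size are my own, not from the question):
/* Minimal sketch of a newline-delimited reader over a pipe.
 * Assumes the writer sends complete lines; readfd is the read end
 * of the pipe inherited across fork(). */
#include <stdio.h>
#include <stdlib.h>

static void read_messages(int readfd)
{
    FILE *in = fdopen(readfd, "r");    /* wrap the descriptor in a stream */
    if (in == NULL) {
        perror("fdopen");
        exit(1);
    }

    char line[256];
    while (fgets(line, sizeof line, in) != NULL) {
        /* each fgets() call returns at most one newline-terminated message */
        printf("child got: %s", line);
    }
    fclose(in);                        /* also closes readfd */
}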
This is about the pipe's file descriptors. Referring to the code below, how can the parent process close the pipe's read (output) file descriptor while the child hasn't accessed it yet, assuming the parent starts first?
Not an issue. Once the child is forked, it has access to the open file descriptions it inherited from its parent, through the same file descriptor numbers the parent can use, but the child's descriptors are counted separately from the parent's when determining how many references each open file description has. The situation is similar to what results from the dup() syscall. Only when every process has closed all of its file descriptors for a given open file description is that open file description invalidated.
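To make that concrete, here is a minimal sketch (not the code from the question) in which the parent closes its copy of the read end right after the fork; the child's copy keeps working because it refers to the same open file description:
/* Sketch: the parent closing its read end does not invalidate the
 * child's copy.  Both descriptors refer to the same open file
 * description; only when every copy is closed is it released. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) != 0)
        return 1;

    if (fork() == 0) {                  /* child: reader */
        char buf[64] = {0};
        close(fd[1]);                   /* child's own write-end copy */
        read(fd[0], buf, sizeof buf - 1);
        printf("child read: %s\n", buf);
        return 0;
    }

    close(fd[0]);                       /* parent closes ITS read end ... */
    write(fd[1], "still readable", 14); /* ... child's read end still works */
    close(fd[1]);
    wait(NULL);
    return 0;
}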
This is done internally in the kernel. When you try to read from a file descriptor (be it a pipe or a socket) and the "remote side" has not sent any data, your process stalls (the call to read does not return at all) until the other side has pushed something into the kernel's internal buffers (in your example, written to the pipe).
Here you can see the internal implementation of pipes in Linux:
https://github.com/torvalds/linux/blob/master/fs/pipe.c#L272
Look for the variable do_wakeup.
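To see that blocking behaviour in isolation, here is a small sketch (not the question's code) where the child calls read() well before the parent writes; the read simply sleeps inside the kernel until the parent's write wakes it up:
/* Sketch: the child's read() blocks until the parent writes, even though
 * the child almost certainly reaches read() first.  The 2-second sleep is
 * only there to make the ordering obvious. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    char buf[32] = {0};

    if (pipe(fd) != 0)
        return 1;

    if (fork() == 0) {                  /* child: reader */
        close(fd[1]);
        read(fd[0], buf, sizeof buf - 1);   /* blocks here for ~2 seconds */
        printf("child finally read: %s\n", buf);
        return 0;
    }

    close(fd[0]);
    sleep(2);                           /* parent deliberately writes late */
    write(fd[1], "late data", 9);
    close(fd[1]);
    wait(NULL);
    return 0;
}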
Related
I have to implement a program in which a process sends the data it has received from its parent process to its child process, waits until the child sends the processed data back, and then returns the processed data to its parent (so e.g. in the case of 4 processes the data flow would look like this: P1->P2->P3->P4->P3->P2->P1). For interprocess communication I need to use pipes. Here's the approach I planned to take:
./child
// Assert argv contains 2 pipe descriptors - for reading
// from parent and for writing to parent, both of type char[]
// I'm not handling system errors currently
int main(int argc, char *argv[]) {
    int read_dsc, write_dsc;
    read_dsc = atoi(argv[1]);
    write_dsc = atoi(argv[2]);
    char data[DATA_SIZE];
    read (read_dsc, data, DATA_SIZE - 1);
    close (read_dsc);

    // Process data...
    (...)

    // Pass processed data further
    int pipeRead[2];  // Child process will read from this pipe
    int pipeWrite[2]; // Child process will write into this pipe
    pipe(pipeRead);
    pipe(pipeWrite);
    switch(fork()) {
    case 0:
        close (pipeRead[1]);
        close (pipeWrite[0]);
        char pipeReadDsc[DSC_SIZE];
        char pipeWriteDsc[DSC_SIZE];
        sprintf (pipeReadDsc, "%d", pipeRead[0]);
        sprintf (pipeWriteDsc, "%d", pipeWrite[1]);
        execl ("./child", "child", pipeReadDsc, pipeWriteDsc, (char *) 0);
    default:
        close(pipeRead[0]);
        close(pipeWrite[1]);
        wait(0);
        read (pipeWrite[0], data, DATA_SIZE - 1);
        close (pipeWrite[0]);
        // Pass data to parent process
        write (write_dsc, data, DATA_SIZE - 1);
        close (write_dsc);
    }
}
A high-level description of my solution: make two pipes, one for writing to the child process and one for reading from it. Wait until the child process finishes, then read from the read pipe and pass the data on to the parent.
The problem is I don't know whether this approach is correct. I've read somewhere that not closing unused pipes is an error, as it clutters the OS's file descriptors, and that there shouldn't be many open pipes at once. Here, however, we're keeping an unclosed pipe for reading from a child, and potentially, if there are n processes, there are n open pipes while process number n processes its data (all parent processes are waiting for the data to come back). I can't see any other way to solve this problem, though...
So - is my solution correct? If it isn't, how should I approach this problem?
Yes, your solution is correct. But there are problems in your code:
case 0 is the child; you would benefit from redirecting the pipe ends onto standard input and output (use dup or dup2), as passing descriptor ids to the child is unusual (see the sketch after this list).
default is the parent, so you need to write to the child before reading from it.
"not closing unused pipes is an error": it is not an error as such, but it can cause problems (detecting the end of a communication would become difficult or impossible). It seems that you correctly close all the unneeded pipe ends in your code, so that is fine. In general the number of open pipes is not really an issue, just like open files...
I am creating a TCP service that forks a new process each time a client connects. Before the fork I set up a pipe so the child can send statistics gathered during the connection back to the parent. The parent closes the writing end and the child closes the reading end, and the parent maintains an array of reading-end file descriptors, one per child.
I am not sure what to do with these file descriptors when the child finishes with the connection and exits. Does the child need to notify the parent via the pipe that it is about to exit so the parent can close the pipe? Or can the parent detect the broken pipe automatically after the child exits and close it?
The code in the parent program is running a loop with select() detecting activity on the listening socket and on the read ends of the children's pipes. Each child may send multiple messages to the parent as it runs.
In general, what should the parent process do with a pipe file descriptor when a child exits?
First pass: before it was clear that there was a loop using select() and that children sent multiple messages.
If the parent process maintains an array of file descriptors, it needs to also associate each file descriptor with a child process. If the children send a single small statistics message before they die, then when the main program waits for dead children, it knows which child died, so it can then close the file descriptor for the child that it just spotted dying (after making sure the pipe is empty by doing one or more final reads).
An alternative mechanism uses select() or poll() or a related function that reports when a read operation on a file descriptor would not hang. When it detects EOF (zero bytes read) from a pipe, it knows the child died. However, this is probably fiddlier to deal with.
It isn't entirely clear from your question whether there's a single message from the child process as it exits, or whether there is a 'stream of consciousness' statistics reports as the child is working. If there's a single message (that's smaller than the pipe buffer size), then life is easy. If there's a stream of messages or the message is longer than the pipe buffer size, you have to think more carefully about coordination — you can't detect messages only when the child dies.
Second pass: after the extra information became available.
If you're already using select() in a loop, then when a child dies, you will get a 'pipe ready for reading' indication from select() and you will get 0 bytes from read(), which indicates EOF on that pipe. You should then close that pipe (and wait for one or more children with waitpid(), probably using WNOHANG — there should be at least one corpse to be collected — so you don't have zombies kicking around for protracted times).
A strict answer to your last question is: when the only child with the write end of a pipe dies, the parent should close the read end of that pipe to release resources for later reuse.
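For completeness, a minimal sketch of that handling inside the select() loop; child_fds[] and the helper name are assumptions of mine, not part of your program:
/* Sketch of handling a readable child pipe inside the existing select()
 * loop: 0 bytes from read() means the child closed its write end (EOF),
 * so close our read end and reap the child.  child_fds[] is assumed
 * bookkeeping, not taken from the original program. */
#include <sys/wait.h>
#include <unistd.h>

void handle_ready_child(int *child_fds, int i)
{
    char buf[512];
    ssize_t n = read(child_fds[i], buf, sizeof buf);

    if (n > 0) {
        /* normal statistics message: process buf[0..n) */
    } else {                            /* n == 0 (EOF) or n < 0 (error) */
        close(child_fds[i]);
        child_fds[i] = -1;              /* stop select()ing on it */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                           /* reap any finished children */
    }
}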
In your case, the parent process should close the writing end of the pipe right after the fork. Then it can read the statistics data until EOF (end of file) and then close the reading end of the pipe.
Broken pipe happens when you write to a pipe but no fd is open for reading from that pipe, so it doesn't apply to your case. Since your parent is reading from the pipe, it will read EOF when the child exits (provided you have closed the write end in the parent process correctly; otherwise the read will just block, since the kernel assumes there may still be something to read in the future). Then you can safely close the read fd in the parent process.
In general
If the parent writes and the child reads, you do need to worry about broken pipe, which happens when the child closes the read fd and the parent gets SIGPIPE as it keeps writing to the pipe. SIGPIPE terminates the process by default, so you may want to set up a signal handler to make it do whatever you want (if you don't want the process to simply terminate); see the sketch below.
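A minimal sketch of that, assuming you just want write() to fail with EPIPE instead of killing the process (the helper names are mine):
/* Sketch: ignore SIGPIPE so a write() to a pipe with no readers fails
 * with errno == EPIPE instead of terminating the process. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void setup_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;            /* or install your own handler */
    sigaction(SIGPIPE, &sa, NULL);
}

static void safe_write(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) < 0 && errno == EPIPE)
        fprintf(stderr, "reader is gone, stopping\n");
}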
Let's look at the cases of a parent having one child and a parent having several children separately.
Parent having one child: when the child process exits and the parent is waiting on the read end, the parent will also exit. Here's the code:
//fd[0] //read end
//fd[1] //write end
#include <unistd.h>
#include <stdio.h>
#include <errno.h>    // For errno
#include <stdlib.h>   // exit()
#include <sys/wait.h> // waitpid()

void DumpAndExit(char *str)
{
    perror(str);
    printf("errno: %d\n", errno);
    exit(0);
}

int main()
{
    int fd[2], pid = -1;

    if (pipe(fd) < 0)
        DumpAndExit("pipe");

    if ((pid = fork()) < 0) {       // note the parentheses around the assignment
        DumpAndExit("fork");
    }
    else if (pid == 0) {            // Child
        close(fd[0]);               // Close read end
        printf("In Child \n");
        sleep(2);
        exit(0);
    } else {                        // Parent
        close(fd[1]);               // Close write end
        waitpid(-1, NULL, 0);       // Parent will wait for child
        printf("Parent waiting\n");
        char buf[4] = {};
        read(fd[0], buf, sizeof(buf)); // reads from pipe; returns 0 (EOF) here
        printf("Read from child: %s", buf);
    }
}
# ./a.out
In child
Parent waiting
Read from child:
#
In very simple words:
Every process has a PCB (struct task_struct) which holds all of the process's information; in the case of fork() it will also hold the child's context, i.e. pointers to the child's PCB.
Since the pipe, i.e. int fd[2], is created on the parent's stack, it is then duplicated onto the child's stack. When the child exits, its PCB is cleared, the parent's PCB is updated, and the parent knows there is no one connected at the other end of the pipe.
This is code I found for my own shell. It works fine, but the part I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h> // for wait()

char* cmndtkn[256];
char buffer[256];
char* path=NULL;
char pwd[128];

int main(){
    //setting path variable
    char *env;
    env=getenv("PATH");
    putenv(env);
    system("clear");
    printf("\t MY OWN SHELL !!!!!!!!!!\n ");
    printf("_______________________________________\n\n");

    while(1){
        fflush(stdin);
        getcwd(pwd,128);
        printf("[MOSH~%s]$",pwd);
        fgets(buffer,sizeof(buffer),stdin);
        buffer[sizeof(buffer)-1] = '\0';

        //tokenize the input command line
        char* tkn = strtok(buffer," \t\n");
        int i=0;
        int indictr=0;

        // loop for every part of the command
        while(tkn!=NULL)
        {
            if(strcoll(tkn,"exit")==0 ){
                exit(0);
            }
            else if(strcoll(buffer,"cd")==0){
                path = buffer;
                chdir(path+=3);
            }
            else if(strcoll(tkn,"|")==0){
                indictr=i;
            }
            cmndtkn[i++] = tkn;
            tkn = strtok(NULL," \t\n");
        }
        cmndtkn[i]='\0';

        // execute when command has pipe. when | command is found indictr is greater than 0.
        if(indictr>0){
            char* leftcmnd[indictr+1];
            char* rightcmnd[i-indictr];
            int a,b;
            for(b=0;b<indictr;b++)
                leftcmnd[b]=cmndtkn[b];
            leftcmnd[indictr]=NULL;
            for(a=0;a<i-indictr-1;a++)
                rightcmnd[a]=cmndtkn[a+indictr+1];
            rightcmnd[i-indictr]=NULL;

            if(!fork())
            {
                fflush(stdout);
                int pfds[2];
                pipe(pfds);
                if(!fork()){
                    close(1);
                    dup(pfds[1]);
                    close(pfds[0]);
                    execvp(leftcmnd[0],leftcmnd);
                }
                else{
                    close(0);
                    dup(pfds[0]);
                    close(pfds[1]);
                    execvp(rightcmnd[0],rightcmnd);
                }
            }else
                wait(NULL);

        //command not include pipe
        }else{
            if(!fork()){
                fflush(stdout);
                execvp(cmndtkn[0],cmndtkn);
            }else
                wait(NULL);
        }
    }
}
What is the purpose of the calls to close() with parameters of 0 and 1, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code may have been more clearly expressed with dup2() instead.
dup2(fd[1], 1); /* alias fd[1] to 1 */
From your question about how ls | sort works, your question is not limited to why the dup() system call is being made. Your question is actually how pipes in Unix work, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that writing data on the writable descriptor allows that data to be read from the readable descriptor. The pipe() call returns this pair in an array, where the first array element is readable and the second array element is writable.
In Unix, a fork() followed by some kind of exec() is the only way to produce a new process (there are other library calls, such as system() or popen() that create processes, but they call fork() and do an exec() under the hood). A fork() produces a child process. The child process sees the return value of 0 from the call, while the parent sees a non-zero return value that is either the PID of the child process, or a -1 indicating that an error has occurred.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process. The parent does not see the modification occur, as the parent has the original copy. However, a duplicated pair of file descriptors that form a pipe can be used to allow a child process and its parent to communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls that close the pipe file descriptors are used to make sure that the ls process is the only process to have an open reference to the writable fd, and the sort process is the only process to have an open reference to the readable fd. That step is important because after ls exits, it will close the writable end of the pipe, and the sort process will expect to see an EOF as a result. However, this will not occur if some other process still has the writable fd open.
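Putting all of that together, here is a rough sketch (not the shell from the question) of how ls | sort can be wired up with one pipe, two forks, dup2(), and the close() calls described above:
/* Sketch of how a shell might wire up "ls | sort": one pipe, two forks,
 * dup2() onto stdin/stdout, and every process closes both pipe ends it
 * does not need so that sort sees EOF when ls exits. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                  /* first child: ls, the writer */
        dup2(fd[1], STDOUT_FILENO);     /* stdout now goes into the pipe */
        close(fd[0]); close(fd[1]);
        execlp("ls", "ls", (char *)0);
        _exit(127);
    }
    if (fork() == 0) {                  /* second child: sort, the reader */
        dup2(fd[0], STDIN_FILENO);      /* stdin now comes from the pipe */
        close(fd[0]); close(fd[1]);
        execlp("sort", "sort", (char *)0);
        _exit(127);
    }

    close(fd[0]); close(fd[1]);         /* parent keeps neither end open */
    wait(NULL); wait(NULL);
    return 0;
}
Note that the parent closes both ends too; if it kept fd[1] open, sort would never see EOF even after ls exited.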
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork section, the process closes stdout then calls dup on pfds[1] which according to:
http://linux.die.net/man/2/dup
Creates a duplicate of the specified file descriptor at the lowest available number, which will be 1, since it was just closed (and 0, stdin, is still open). This means everything sent to stdout will really go into pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork section is for the new child, which will send data to stdout (file descriptor 1); the parent (the else block) closes stdin, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the file descriptor in pfds it's not using, as there are two open handles to the file now that the process has forked. Each process now execs to left/right-cmnd, but the new stdin and stdout mappings remain for the new processes.
Forking twice is explained here: Why fork() twice
Given the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    int pipefd[2];
    pid_t cpid;
    char buf;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <string>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    cpid = fork();
    if (cpid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (cpid == 0) {            /* Child reads from pipe */
        close(pipefd[1]);       /* Close unused write end */

        while (read(pipefd[0], &buf, 1) > 0)
            write(STDOUT_FILENO, &buf, 1);

        write(STDOUT_FILENO, "\n", 1);
        close(pipefd[0]);
        _exit(EXIT_SUCCESS);

    } else {                    /* Parent writes argv[1] to pipe */
        close(pipefd[0]);       /* Close unused read end */
        write(pipefd[1], argv[1], strlen(argv[1]));
        close(pipefd[1]);       /* Reader will see EOF */
        wait(NULL);             /* Wait for child */
        exit(EXIT_SUCCESS);
    }

    return 0;
}
Whenever the child process wants to read from the pipe, it must first close the pipe's write side. When I remove that line, close(pipefd[1]);, from the child process's if block,
am I basically saying "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
If so, what would happen when the pipe is open for both reading and writing? No mutual exclusion?
Whenever the child process wants to read from the pipe, it must first close the pipe's write side.
If the process — parent or child — is not going to use the write end of a pipe, it should close that file descriptor. Similarly for the read end of a pipe. The system will assume that a write could occur while any process has the write end open, even if the only such process is the one that is currently trying to read from the pipe, and therefore the system will not report EOF. Further, if you overfill a pipe and there is still a process with the read end open (even if that process is the one trying to write), then the write will hang, waiting for the reader to make space for the write to complete.
When I remove that line, close(pipefd[1]);, from the child process's if block, am I basically saying "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
No; you're saying that the child can write to the pipe as well as the parent. Any process with the write file descriptor for the pipe can write to the pipe.
If so, what would happen when the pipe is open for both reading and writing — no mutual exclusion?
There isn't any mutual exclusion ever. Any process with the pipe write descriptor open can write to the pipe at any time; the kernel ensures that two concurrent write operations are in fact serialized. Any process with the pipe read descriptor open can read from the pipe at any time; the kernel ensures that two concurrent read operations get different data bytes.
You make sure a pipe is used unidirectionally by ensuring that only one process has it open for writing and only one process has it open for reading. However, that is a programming decision. You could have N processes with the write end open and M processes with the read end open (and, perish the thought, there could be processes in common between the set of N and set of M processes), and they'd all be able to work surprisingly sanely. But you'd not readily be able to predict where a packet of data would be read after it was written.
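To see why the child's close(pipefd[1]) matters in practice, here is a cut-down sketch of the same pattern; with the marked line commented out, the child's read loop never sees EOF and blocks forever once the parent is done:
/* Sketch of why the child closes pipefd[1]: if it keeps its copy of the
 * write end open, read() can never return 0 (EOF) after the parent is
 * done, and the loop below blocks forever on the last read. */
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pipefd[2];
    char c;

    if (pipe(pipefd) != 0)
        return 1;

    if (fork() == 0) {                   /* child: reader */
        /* close(pipefd[1]);  <-- without this line ... */
        while (read(pipefd[0], &c, 1) > 0)   /* ... this never returns 0 */
            write(STDOUT_FILENO, &c, 1);
        return 0;
    }

    close(pipefd[0]);
    write(pipefd[1], "hi\n", 3);
    close(pipefd[1]);                    /* parent's copy closed, but the */
    wait(NULL);                          /* child's copy keeps the pipe open */
    return 0;
}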
fork() duplicates the file handles, so you will have two handles for each end of the pipe.
Now, consider this. If the parent doesn't close the unused end of the pipe, there will still be two handles for it. If the child dies, the handle on the child side goes away, but there's still the open handle held by the parent -- thus, there will never be a "broken pipe" or "EOF" arriving because the pipe is still perfectly valid. There's just nobody putting data into it anymore.
Same for the other direction, of course.
Yes, the parent/child could still use the handle to write into their own pipe; I don't remember a use-case for this, though, and it still gives you synchronization problems.
When the pipe is created, it has two ends: the read end and the write end. These are entries in the user file descriptor table.
Similarly, there will be two entries in the file table, with a reference count of 1 for each of the read end and the write end.
Now when you fork, a child is created; that is, the file descriptors are duplicated, and thus the reference count of both ends in the file table becomes 2.
Now, "when I remove that line close(pipefd[1])": in this case, even after the parent has finished writing, your while loop below that line will block forever waiting for read to return 0 (i.e. EOF). This happens because even though the parent has finished writing and closed the write end of the pipe, the reference count of the write end in the file table is still 1 (initially it was 2), so the read call keeps waiting for data that will never arrive.
Now, if you had not written close(pipefd[0]); in the parent, the current code might not show any problem, since you write only once in the parent.
But if you write more than once, then ideally you would want to get an error (if the child is no longer reading); yet since the read end in the parent is not closed, you will not get that error (even if the child is no longer there to read).
So the problem of not closing the unused ends becomes evident when we are continuously reading and writing data. It may not be evident if we are just reading or writing data once.
For example, if instead of the read loop in the child you use only the line below once, getting all the data in one go and not caring to check for EOF, your program will work even if you do not write close(pipefd[1]); in the child.
read(pipefd[0], buf, sizeof(buf)); // buf is a character array sufficiently large
The man page for pipe() on SunOS says:
Read calls on an empty pipe (no buffered data) with only one end (all write file descriptors closed) return an EOF (end of file).
A SIGPIPE signal is generated if a write on a pipe with only one end is attempted.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
    int data_processed;
    int file_pipes[2];
    const char some_data[] = "123";
    char buffer[BUFSIZ + 1];
    pid_t fork_result;

    memset(buffer, '\0', sizeof(buffer));

    if (pipe(file_pipes) == 0) {
        fork_result = fork();
        if (fork_result == -1) {
            fprintf(stderr, "Fork failure");
            exit(EXIT_FAILURE);
        }

        // We've made sure the fork worked, so if fork_result equals zero, we're in the child process.
        if (fork_result == 0) {
            data_processed = read(file_pipes[0], buffer, BUFSIZ);
            printf("Read %d bytes: %s\n", data_processed, buffer);
            exit(EXIT_SUCCESS);
        }
        // Otherwise, we must be the parent process.
        else {
            data_processed = write(file_pipes[1], some_data, strlen(some_data));
            printf("Wrote %d bytes\n", data_processed);
        }
    }
    exit(EXIT_SUCCESS);
}
Based on my understanding, the child process created by fork doesn't share variables with its parent process. Then why can the parent write to one file descriptor here and the child get the data by reading from the other file descriptor? Is this because they are somehow controlled internally by the pipe function?
File descriptors, including pipes, are duplicated on fork -- the child process ends up with the same file descriptor table, including stdin/out/err and the pipes, as the parent had immediately before the fork.
Based on my understanding, the child process created by fork doesn't share variables with its parent process.
This isn't entirely true -- changes to variables are not shared with the parent, but the values that the parent had immediately prior to the fork are all visible to the child afterwards.
In any case, pipes exist within the operating system, not within the process. As such, data written to one end of the pipe becomes visible to any other process holding a FD for the other end. (If more than one process tries to read the data, the first process to try to read() data gets it, and any other processes miss out.)
The variables are not shared e.g. if you write file_pipes[0] = 999 in the child, it will not be reflected in the parent. The file descriptors are shared (FD number x in the child refers to the same thing as FD number x in the parent). This is why (for example) you can redirect the output of a shell script which executes other commands (because they share the same standard output file descriptor).
You're right - ordinary variables aren't shared between the parent and the child.
However, pipes are not variables. They're a pseudo-file specifically designed to connect two independent processes together. When you write to a pipe, you're not changing a variable in the current process - you're sending data off to the operating system and asking it to make that data available to the next process to read from the pipe.
It's just like when you write to a real, on-disk file - except that the data isn't written to disk, it's just made available at the other end of the pipe.