understanding pipe() function - c

I'm trying to understand how pipe() function works and I have the following program example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd[2], nbytes;
    pid_t childpid;
    char string[] = "Hello, world!\n";
    char readbuffer[80];

    pipe(fd);

    if((childpid = fork()) == -1)
    {
        perror("fork");
        exit(1);
    }

    if(childpid == 0)
    {
        /* Child process closes up input side of pipe */
        close(fd[0]);

        /* Send "string" through the output side of pipe */
        write(fd[1], string, (strlen(string)+1));
        exit(0);
    }
    else
    {
        /* Parent process closes up output side of pipe */
        close(fd[1]);

        /* Read in a string from the pipe */
        nbytes = read(fd[0], readbuffer, sizeof(readbuffer));
        printf("Received string: %s", readbuffer);
    }

    return(0);
}
My first question is what benefits do we get from closing the file descriptor using close(fd[0]) and close(fd[1]) in child and parent processes. Second, we use write in child and read in parent, but what if parent process reaches read before child reaches write and tries to read from pipe which has nothing in it ? Thanks!

Daniel Jour gave you 99% of the answer already, in a very succinct and easy to understand manner:
Closing: Because it's good practice to close what you don't need. For the second question: These are potentially blocking functions. So reading from an empty pipe will just block the reader process until something gets written into the pipe.
I'll try to elaborate.
Closing:
When a process is forked, its open files are duplicated.
Each process has a limit on how many file descriptors it may have open. As the documentation states, each end of the pipe is a single fd, so a pipe requires two file descriptors, and in your example each process uses only one.
By closing the file descriptor you don't use, you're releasing resources that are in limited supply and which you might need further on down the road.
e.g., if you were writing a server, that extra fd means you can handle one more client.
Also, although releasing resources on exit is "optional", it's good practice. Resources that weren't properly released should be handled by the OS...
...but the OS was also written by us programmers, and we do make mistakes. So it only makes sense that the one who claimed a resource and knows about it will be kind enough to release the resource.
Race conditions (read before write):
POSIX defines a few behaviors that make read, write and pipes a good choice for thread and process concurrency synchronization. You can read more about it in the Rationale section for write, but here's a quick rundown:
By default, pipes (and sockets) are created in what is known as "blocking mode".
This means that the application will hang until the IO operation is performed.
Also, IO operations on a pipe are serialized by the kernel, meaning that:
A read and a write never interleave inside the pipe. A read on an empty pipe waits for a write to deliver data, and a write to a full pipe waits for a read to make room. POSIX additionally guarantees that a write of at most PIPE_BUF bytes is atomic.
If two threads call read at the same time, each gets a serial (not parallel) response, reading sequentially from the pipe (or socket). This makes pipes great tools for concurrency handling.
In other words, when your application calls:
read(fd[0], readbuffer, sizeof(readbuffer));
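Your application will wait until some data is available or until end-of-file is reached. The read returns as soon as any data arrives, delivering at most 80 (sizeof(readbuffer)) bytes; it does not wait for the buffer to fill. If every write end of the pipe has been closed and the pipe is empty, read returns 0 (EOF).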
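To make the blocking behaviour concrete, here is a minimal sketch. The helper name read_after_delay and the one-second delay are my own assumptions for illustration, not part of the question's program. The child deliberately writes late, and the parent's read() simply waits and then reports however many bytes arrived, not the 80 it asked for.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): the child writes msg after a
 * short delay while the parent reads with an 80-byte buffer.
 * Returns the byte count reported by read(), or -1 on error. */
ssize_t read_after_delay(const char *msg)
{
    int fd[2];
    char readbuffer[80];

    if (pipe(fd) == -1)
        return -1;

    pid_t childpid = fork();
    if (childpid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (childpid == 0) {
        close(fd[0]);               /* child only writes */
        sleep(1);                   /* give the parent time to block in read() */
        if (write(fd[1], msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }

    close(fd[1]);                   /* parent only reads */
    /* Blocks while the pipe is empty, then returns what is available. */
    ssize_t nbytes = read(fd[0], readbuffer, sizeof readbuffer);
    close(fd[0]);
    waitpid(childpid, NULL, 0);
    return nbytes;
}
```

Calling read_after_delay("Hello, world!\n") should return 14 after roughly one second, even though the buffer has room for 80 bytes.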

Related

The proper use of close in C

I am confused as to how to properly use close to close pipes in C. I am fairly new to C so I apologize if this is too elementary but I cannot find any explanations elsewhere.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    int fd[2];
    pipe(fd);
    if(fork() == 0) {
        close(0);
        dup(fd[0]);
        close(fd[0]);
        close(fd[1]);
    } else {
        close(fd[0]);
        write(fd[1], "hi", 2);
        close(fd[1]);
    }
    wait((int *) 0);
    exit(0);
}
My first question is: In the above code, the child process will close the write side of fd. If we first reach close(fd[1]), then the parent process reach write(fd[1], "hi", 2), wouldn't fd[1] already been closed?
int main()
{
    char *receive;
    int[] fd;
    pipe(fd);
    if(fork() == 0) {
        while(read(fd[0], receive, 2) != 0){
            printf("got u!\n");
        }
    } else {
        for(int i = 0; i < 2; i++){
            write(fd[1], 'hi', 2);
        }
        close(fd[1]);
    }
    wait((int *) 0);
    exit(0);
}
The second question is: In the above code, would it be possible for us to reach close(fd[1]) in the parent process before the child process finish receiving all the contents? If yes, then what is the correct way to communicate between parent and child. My understanding here is that if we do not close fd[1] in the parent, then read will keep being blocked, and the program won't exit either.
First of all note that, after fork(), the file descriptors fd would also get copied over to the child process. So basically, a pipe acts like a file with each process having its own references to the read and write end of the pipe. Essentially there are 2 read and 2 write file descriptors, one for each process.
My first question is: In the above code, the child process will close
the write side of fd. If we first reach close(fd[1]), then the parent
process reach write(fd[1], "hi", 2), wouldn't fd[1] already been
closed?
Answer: No. The fd[1] in parent process is the parent's write end. The child has forsaken its right to write on the pipe by closing its fd[1], which does not stop the parent from writing into it.
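If you want to convince yourself, the following sketch forces the ordering you are worried about. The helper name write_after_child_closed and the second "ready" pipe are assumptions I added purely to make the child's close() happen first. The parent's write still succeeds because it uses its own copy of fd[1].

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes the parent wrote AFTER
 * the child had already closed its copy of fd[1], or -1 on error. */
ssize_t write_after_child_closed(void)
{
    int fd[2], ready[2];
    char c;

    if (pipe(fd) == -1)
        return -1;
    if (pipe(ready) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]); close(fd[1]);
        close(ready[0]); close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        char buf[2];
        close(ready[0]);
        close(fd[1]);                       /* child gives up ITS write end */
        if (write(ready[1], "x", 1) != 1)   /* tell the parent the close happened */
            _exit(1);
        _exit(read(fd[0], buf, sizeof buf) == 2 ? 0 : 1);
    }

    close(ready[1]);
    close(fd[0]);                           /* parent only writes */
    ssize_t n = -1;
    if (read(ready[0], &c, 1) == 1)         /* wait until the child closed its fd[1] */
        n = write(fd[1], "hi", 2);          /* the parent's own fd[1] is still open */
    close(ready[0]);
    close(fd[1]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The function should return 2: closing a descriptor in one process never closes the other process's copy.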
Before answering the second question, I fixed your code to actually run it and produce some results.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    char receive[10];
    int fd[2];
    pipe(fd);
    if(fork() == 0) {
        close(fd[1]);                 /* Close UNUSED write end */
        while(read(fd[0], receive, 2) > 0){
            printf("got u!\n");
            receive[2] = '\0';
            printf("%s\n", receive);
        }
        close(fd[0]);                 /* Close read end after reading */
    } else {
        close(fd[0]);                 /* Close UNUSED read end */
        for(int i = 0; i < 2; i++){
            write(fd[1], "hi", 2);
        }
        close(fd[1]);                 /* Close write end after writing */
        wait((int *) 0);
    }
    exit(0);
}
Result:
got u!
hi
got u!
hi
Note: Each read() stores its two bytes in the same receive array, so the second hi overwrites the first; both show up in the output only because we print inside the loop. Use a 2D char array if you need to retain both messages.
The second question is: In the above code, would it be possible for us
to reach close(fd[1]) in the parent process before the child process
finish receiving all the contents?
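Answer: Yes. By default, write() on a pipe returns as soon as the data has been copied into the pipe buffer; it blocks only when the buffer is full. The parent can therefore write everything and close fd[1] before the child has read anything. The data stays in the pipe until the child reads it.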
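As a rough illustration of "until the pipe buffer is full", this sketch counts how many bytes a pipe accepts with no reader draining it. The helper name pipe_capacity is hypothetical, and I switch the write end to O_NONBLOCK only so that the full pipe reports EAGAIN instead of blocking the demo. The actual number is system dependent.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes fit into an undrained pipe,
 * or -1 on error. */
long pipe_capacity(void)
{
    int fd[2];
    char byte = 'x';
    long total = 0;

    if (pipe(fd) == -1)
        return -1;

    /* Non-blocking mode: a full pipe makes write() fail with EAGAIN. */
    int flags = fcntl(fd[1], F_GETFL);
    if (flags == -1 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    while (write(fd[1], &byte, 1) == 1)
        total++;
    int err = errno;                /* reason the last write() failed */

    close(fd[0]);
    close(fd[1]);
    return (err == EAGAIN || err == EWOULDBLOCK) ? total : -1;
}
```

In blocking mode (the default), the write that hits this limit would simply sleep until a reader makes room.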
If yes, then what is the correct
way to communicate between parent and child. My understanding here is
that if we do not close fd[1] in the parent, then read will keep being
blocked, and the program won't exit either.
If we close fd[1] in the parent, the parent's write end goes away. However, if the child has not closed its own fd[1], it will block on read(), because the pipe does not report EOF until all the write ends are closed. The child would be left waiting for data that only it could write, while it is blocked reading from the pipe!
Now what happens if the parent does not close its unused read end? If the pipe had only one read descriptor (say, the child's), then once the child closes it, the parent receives SIGPIPE when it tries to write to the pipe again (or, if that signal is ignored or handled, write() fails with EPIPE), because there are no readers.
However, in this situation the parent also has a read descriptor open, so it can keep writing until the buffer fills up, at which point the next write call blocks forever because nobody is left to drain the pipe.
This probably won't make much sense now, but if you write a program that passes values through a pipe again and again, not closing unused ends will often cause frustrating bugs.
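The "no readers" case can be reproduced in a single process. This is only a sketch: the helper name write_with_no_reader is made up, and I ignore SIGPIPE for the duration of the call so that the failure shows up as an errno value instead of killing the program.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes to a pipe whose read end is closed.
 * Returns the errno of the failed write (expected: EPIPE), 0 if the write
 * unexpectedly succeeded, or -1 if the setup failed. */
int write_with_no_reader(void)
{
    int fd[2];

    if (pipe(fd) == -1)
        return -1;

    /* By default SIGPIPE terminates the process; ignore it for this demo. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);
    if (old_handler == SIG_ERR) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    close(fd[0]);                   /* no read end is left open anywhere */

    int err = 0;
    if (write(fd[1], "x", 1) == -1)
        err = errno;

    close(fd[1]);
    signal(SIGPIPE, old_handler);   /* restore the previous disposition */
    return err;
}
```

If the read end were still open in any process (including the writer itself), the same write would succeed and the writer would never learn that its peer is gone.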
what is the correct way to communicate between parent and child[?]
The parent creates the pipe before forking. After the fork, parent and child each close the pipe end they are not using (pipes should be considered unidirectional; create two if you want bidirectional communication). The processes each have their own copy of each pipe-end file descriptor, so these closures do not affect the other process's ability to use the pipe. Each process then uses the end it holds open appropriately for its directionality -- writing to the write end or reading from the read end.
When the writer finishes writing everything it intends ever to write to the pipe, it closes its end. This is important, and sometimes essential, because the reader will not perceive end-of-file on the read end of the pipe as long as any process has the write end open. This is also one reason why it is important for each process to close the end it is not using, because if the reader also has the write end open then it can block indefinitely trying to read from the pipe, regardless of what any other process does.
Of course, the reader should also close the read end when it is done with it (or terminate, letting the system handle that). Failing to do so constitutes excess resource consumption, but whether that is a serious problem depends on the circumstances.
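Putting those steps together, a minimal sketch might look like the following. The function name collect_from_child and the three "hi\n" messages are placeholders of mine; the point is the order of operations: pipe, fork, close the unused ends, write and close, read until read() returns 0.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes three short messages, the parent
 * reads until EOF. Stores a NUL-terminated copy in out (capacity cap) and
 * returns the number of bytes received, or -1 on error. */
ssize_t collect_from_child(char *out, size_t cap)
{
    int fd[2];

    if (cap == 0 || pipe(fd) == -1)     /* create the pipe BEFORE forking */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: the writer */
        close(fd[0]);                   /* unused read end */
        for (int i = 0; i < 3; i++)
            if (write(fd[1], "hi\n", 3) != 3)
                _exit(1);
        close(fd[1]);                   /* done writing: lets the reader see EOF */
        _exit(0);
    }

    close(fd[1]);                       /* parent: unused write end; without
                                           this the loop below never ends */
    size_t total = 0;
    for (;;) {
        if (total == cap - 1)
            break;                      /* buffer full */
        ssize_t n = read(fd[0], out + total, cap - 1 - total);
        if (n <= 0)
            break;                      /* 0 means EOF: all write ends closed */
        total += (size_t)n;
    }
    out[total] = '\0';

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

With a 64-byte buffer this should return 9 and leave "hi\nhi\nhi\n" in out, regardless of how the two processes are scheduled.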

C fifo keeps blocked

I'm currently studying multithreading with C, but there is something I don't quite understand about our named pipe exercise.
We are expected to implement a file search system in which one process finds files and adds them to a buffer, and a second process takes filenames from the threads of the first one, finds the search query inside each file, and returns the position to the first process via a pipe. I did nearly all of it, but I'm confused about how to do the communication between the two processes.
Here is my code that does the communication:
main.c
void *controller_thread(void *arg) {
    pthread_mutex_lock(&index_mutex);
    int index = t_index++; /*Get an index to thread*/
    pthread_mutex_unlock(&index_mutex);
    char sendPipe[10];
    char recvPipe[10];
    int fdsend, fdrecv;
    sprintf(sendPipe, "contrl%d", (index+1));
    sprintf(recvPipe, "minion%d", (index+1));
    mkfifo(sendPipe, 0666);
    execlp("minion", "minion", sendPipe, recvPipe, (char*) NULL);
    if((fdsend = open(sendPipe, O_WRONLY|O_CREAT)) < 0)
        perror("Error opening pipe");
    if((fdrecv = open(recvPipe, O_RDONLY)) < 0)
        perror("Error opening pipe");
    while(1) {
        char *fileName = pop(); /*Counting semaphore from buffer*/
        if(notFile(fileName))
            break;
        write(fdsend, fileName, strlen(fileName));
        write(fdsend, search, strlen(search));
        char place[10];
        while(1) {
            read(fdrecv, place, 10);
            if(notPlace(place)) /*Only checks if all numeric*/
                break;
            printf("Minion %d searching %s in %s, found at %s\n", index,
                   search, fileName, place);
        }
    }
}
From the online resources I found, I think this is the way to handle the fifo inside the main. I tried to write a test minion just to make sure it works, so here it is
minion.c
int main(int argc, char **argv) {
    char *recvPipe = argv[1];
    char *sendPipe = argv[2];
    char fileName[100];
    int fdsend, fdrecv;
    return 0;
    fdrecv = open(recvPipe, O_RDONLY);
    mkfifo(sendPipe, 0666);
    fdsend = open(sendPipe, O_WRONLY|O_CREAT);
    while(1) {
        read(fdrecv, fileName, 100);
        write(fdsend, "12345", 6);
        write(fds, "xxx", 4);
    }
    return 0;
}
When I run it this way, the threads block and print no response. If I add O_NONBLOCK to the open mode, it prints "Error opening pipe: No such device or address", so I know that somehow I couldn't open recvPipe inside minion, but I don't know what the mistake is.
Among the problems with your code is an apparent misunderstanding about the usage of execlp(). On success, this function does not return, so the code following it will never be executed. One ordinarily fork()s first, then performs the execlp() in the child process, being certain to make the child terminate if the execlp() fails. The parent process may need to eventually wait for the forked child, as well.
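A minimal sketch of that fork-then-exec pattern follows. The helper name run_program and the exit code 127 for "exec failed" are conventions I am assuming here, not something taken from your code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs prog (looked up via PATH) in a child process.
 * Returns the child's exit status, 127 if execlp() failed in the child,
 * or -1 if fork()/waitpid() failed or the child was killed by a signal. */
int run_program(const char *prog)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        execlp(prog, prog, (char *)NULL);
        /* Reached only if execlp() failed: terminate the child right away
         * so it does not continue running the parent's code. */
        _exit(127);
    }

    /* Parent: execution continues here, unlike after a bare execlp(). */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In your program, the controller thread would make such a call (passing the FIFO names as extra arguments) and then go on to open the FIFOs in the parent.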
Additionally, it is strange, and probably undesirable, that each process passes the O_CREAT flag when it attempts to open the write end of a FIFO. It should be unnecessary, because each one has just created the FIFO with mkfifo(). Even in the event that mkfifo() fails or that some other process removes it before it can be opened, you do not want to open with O_CREAT because that will get you a regular file, not a FIFO.
Once you fix the execlp() issue, you will also have a race condition. The parent process relies on the child to create one of the FIFOs, but does not wait for that process to do so. You will not get the desired behavior if the parent reaches its open attempt before the child completes its mkfifo().
I suggest having the parent create both FIFOs before creating the child process. The child and parent must cooperate by opening both ends of one FIFO before proceeding to open both ends of the other. One's open for reading will block until the other opens the same FIFO for writing.
Or you could use ordinary (anonymous) pipes (see pipe()) instead of FIFOs. These are created open on both ends, and they are more natural for communication between processes related by inheritance.
In any event, be sure to check the return values of your function calls. Almost all of these functions can fail, and it is much better to detect and handle that up front than to sort out the tangle that may form when you assume incorrectly that every call succeeded.
FIFOs need some synchronization at open time. By default, open() calls are blocking: an open for reading blocks until some other process opens the same FIFO for writing, and vice versa (this lets the peers synchronize before communicating). You can use O_NONBLOCK to open for reading while no writer has the FIFO open, but the converse does not hold: opening for writing with O_NONBLOCK while there is no reader fails with ENXIO ("No such device or address"), since letting a process write when there is no reader is considered nonsense.
See the Linux fifo(7) manual entry, for example.
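Both behaviours can be sketched in a few lines. The helper names fifo_roundtrip and nonblocking_writer_errno and the "12345" payload are assumptions for illustration; the caller supplies a path that does not exist yet.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent creates the FIFO BEFORE forking, so there
 * is no race on mkfifo(). Returns the bytes read (expected 5), -1 on error. */
ssize_t fifo_roundtrip(const char *path, char *out, size_t cap)
{
    if (mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        if (wfd == -1 || write(wfd, "12345", 5) != 5)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);         /* blocks until a writer opens */
    if (rfd != -1) {
        n = read(rfd, out, cap);
        close(rfd);
    } else {
        kill(pid, SIGKILL);                 /* do not leave the child stuck in open() */
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return n;
}

/* Hypothetical helper: opens a FIFO for writing in non-blocking mode while
 * nobody has it open for reading. Returns the resulting errno (expected
 * ENXIO), 0 if the open unexpectedly succeeded, -1 if mkfifo() failed. */
int nonblocking_writer_errno(const char *path)
{
    if (mkfifo(path, 0600) == -1)
        return -1;

    int err = 0;
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd == -1)
        err = errno;
    else
        close(wfd);
    unlink(path);
    return err;
}
```

Note that neither open() passes O_CREAT; the FIFO is created only by mkfifo().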

Confusion about posix pipe with respect to kernel fd table

I'm trying to understand how pipes work. From my understanding, a kernel has a file descriptor table where each element points to things like files and pipes etc. So a process can write to or read from a pipe when the correct file descriptor is specified.
In the example I've found below, an array of file descriptors is declared and a pipe is created using it. The program then forks so that there's a child copy. This is where I get confused: the child closes fd[0] so that it cannot receive information from the parent? It writes some data to fd[1]. The parent then closes fd[1] and reads from fd[0]. This seems wrong to me; is the parent reading from the wrong place?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd[2], nbytes;
    pid_t childpid;
    char string[] = "Hello, world!\n";
    char readbuffer[80];

    pipe(fd);

    if((childpid = fork()) == -1)
    {
        perror("fork");
        exit(1);
    }

    if(childpid == 0)
    {
        /* Child process closes up input side of pipe */
        close(fd[0]);

        /* Send "string" through the output side of pipe */
        write(fd[1], string, (strlen(string)+1));
        exit(0);
    }
    else
    {
        /* Parent process closes up output side of pipe */
        close(fd[1]);

        /* Read in a string from the pipe */
        nbytes = read(fd[0], readbuffer, sizeof(readbuffer));
        printf("Received string: %s", readbuffer);
    }

    return(0);
}
Am I wrong and actually both fd elements reference the same point in the kernel's table? Intuitively I thought it would be creating two pipes. If they are the same position in the table what is the structure of a pipe where it can interpret these different read and writes?
Apologies if this is being too vague, I'm having real trouble wrapping my head around it. Any help would be appreciated. Thanks in advance!
When you fork a new process, the child has an exact copy of the open file descriptors. How this is implemented can be considered "magic" or whatever as we don't really need to know how, only that it does work. They share them and if both tried reading from stdin (for example) you'd get unpredictable results because they're both reading from the same place. Only when all processes have closed a file descriptor does it truly get closed.
So in the case of your pipe, the child and parent can each close the end of the pipe they're not going to use without worrying that the end they do care about will close unexpectedly. If one of them opens another file, it may re-use the file descriptor number of the recently closed one.
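Two small single-process experiments may help here. The function names are hypothetical. The first shows that fd[0] and fd[1] are two ends of one pipe (what goes into fd[1] comes out of fd[0]), and the second shows a closed descriptor number being handed out again.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes "ping" into fd[1] and reads it back from
 * fd[0] in the same process. Returns the bytes read (expected 4) or -1. */
ssize_t same_process_roundtrip(char *out, size_t cap)
{
    int fd[2];
    ssize_t n = -1;

    if (pipe(fd) == -1)
        return -1;
    if (write(fd[1], "ping", 4) == 4)       /* data goes into the one pipe... */
        n = read(fd[0], out, cap);          /* ...and comes out the other end */
    close(fd[0]);
    close(fd[1]);
    return n;
}

/* Hypothetical helper: returns 1 if the descriptor number freed by close()
 * is reused by the next dup(), 0 if not, -1 on error. */
int closed_fd_number_is_reused(void)
{
    int fd[2];

    if (pipe(fd) == -1)
        return -1;

    int freed = fd[0];
    close(fd[0]);
    int copy = dup(fd[1]);      /* dup() returns the lowest free number */
    int reused = (copy == freed);

    if (copy != -1)
        close(copy);
    close(fd[1]);
    return reused;
}
```

So the array holds two different descriptors that refer to opposite ends of a single kernel object; the parent reading from fd[0] is reading exactly what the child wrote to fd[1].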

beej guide pipe example explanation

The following code is the pipe implementation given in beej's guide:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pfds[2];

    pipe(pfds);

    if (!fork()) {
        close(1);       /* close normal stdout */
        dup(pfds[1]);   /* make stdout same as pfds[1] */
        close(pfds[0]); /* we don't need this */
        execlp("ls", "ls", NULL);
    } else {
        close(0);       /* close normal stdin */
        dup(pfds[0]);   /* make stdin same as pfds[0] */
        close(pfds[1]); /* we don't need this */
        execlp("wc", "wc", "-l", NULL);
    }

    return 0;
}
I wanted to ask:
Is it possible that close(0) is executed before dup(pfds[1])? If yes, then in that case the program will not behave as expected.
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
And what would change if these lines were not there?
Is it possible that close(0) is executed before dup(pfds[1])? If yes,
then in that case the program will not behave as expected.
Yes, it is possible to have the parent successfully complete close(0) before the child calls dup(pfds[1]). However, this is not a problem. When you fork a new process, the new process gets a copy of the parent's memory address space and a copy of all of its open file descriptors. (Descriptors marked close-on-exec, via FD_CLOEXEC or O_CLOEXEC, see fcntl(2), are still inherited across fork(); they are closed only when the process later calls one of the exec functions.) So, essentially each process has its own private copy of the file descriptors and is isolated and free to do whatever it wants with that copy.
Thus, when the parent calls close(0), it is only closing its copy of file descriptor 0 (stdin); it does not affect the child in any way, which still has a reference to stdin and can use it if needed (even though in this example it won't).
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
Best practices mandate that you should close file descriptors that you don't use - this is the case for close(pfds[0]). Unused open file descriptors eat up space and resources; why keep them open if you're not going to use them?
close(pfds[1]) is a little more subtle though. Pipes report end of file only when there is no more data in the pipe buffer and there are no active writers, i.e., no live processes that have the pipe open for writing. If you do not close pfds[1] in the parent, the program will hang forever because wc(1) will never see the end of input, since there is a process (wc(1) itself) that has the pipe opened for writing and as such could (but won't) write more data.
TL;DR: close(pfds[0]) is just good practice but not mandatory; close(pfds[1]) is absolutely necessary to ensure program correctness.
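Here is a hedged variation on the guide's example that keeps the parent in charge so the effect of close(pfds[1]) is visible. Two things are my own substitutions: the function name capture_echo with "echo hello" as the child command, and dup2() in place of the guide's close() followed by dup() (it does the redirection in one step).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "echo hello" with its stdout redirected into a
 * pipe and collects the output. Stores a NUL-terminated copy in out and
 * returns the number of bytes captured, or -1 on error. */
ssize_t capture_echo(char *out, size_t cap)
{
    int pfds[2];

    if (cap == 0 || pipe(pfds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pfds[0]);
        close(pfds[1]);
        return -1;
    }

    if (pid == 0) {
        close(pfds[0]);                         /* child never reads */
        if (pfds[1] != STDOUT_FILENO) {
            if (dup2(pfds[1], STDOUT_FILENO) == -1)
                _exit(126);
            close(pfds[1]);                     /* stdout now refers to the pipe */
        }
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);                             /* only reached if exec failed */
    }

    close(pfds[1]);     /* mandatory: otherwise the loop below never sees EOF */

    size_t total = 0;
    for (;;) {
        if (total == cap - 1)
            break;
        ssize_t n = read(pfds[0], out + total, cap - 1 - total);
        if (n <= 0)
            break;
        total += (size_t)n;
    }
    out[total] = '\0';

    close(pfds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

If you comment out the parent's close(pfds[1]), the read loop hangs after receiving "hello\n", which is the same hang wc would experience in the original program.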
Question 1:
Yes it is entirely possible that "close(0);" (in the parent) is executed before "dup(pfds[1]);" (in the child). But since this happens in different processes, the child will still have fd 0 open.
Question 2:
It is good bookkeeping practice to close the end of the pipe that a process is not going to use. That way, you can avoid bugs further down the road in more complex programs. In the above scenario, the parent process should only ever read from the pipe. If you close the write end in the parent, any attempt to write to it will cause an error; otherwise you might have a bug that is hard to detect.

What happens if a child process won't close the pipe from writing, while reading?

Given the following code:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int pipefd[2];
    pid_t cpid;
    char buf;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <string>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    cpid = fork();
    if (cpid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (cpid == 0) {            /* Child reads from pipe */
        close(pipefd[1]);       /* Close unused write end */
        while (read(pipefd[0], &buf, 1) > 0)
            write(STDOUT_FILENO, &buf, 1);
        write(STDOUT_FILENO, "\n", 1);
        close(pipefd[0]);
        _exit(EXIT_SUCCESS);
    } else {                    /* Parent writes argv[1] to pipe */
        close(pipefd[0]);       /* Close unused read end */
        write(pipefd[1], argv[1], strlen(argv[1]));
        close(pipefd[1]);       /* Reader will see EOF */
        wait(NULL);             /* Wait for child */
        exit(EXIT_SUCCESS);
    }
    return 0;
}
Whenever the child process wants to read from the pipe, it must first close the pipe's side from writing. When I remove that line close(pipefd[1]); from the child process's if,
I'm basically saying that "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
If so, what would happen when the pipe is open for both reading & writing? No mutual exclusion?
Whenever the child process wants to read from the pipe, it must first close the pipe's side from writing.
If the process — parent or child — is not going to use the write end of a pipe, it should close that file descriptor. Similarly for the read end of a pipe. The system will assume that a write could occur while any process has the write end open, even if the only such process is the one that is currently trying to read from the pipe, and the system will not report EOF, therefore. Further, if you overfill a pipe and there is still a process with the read end open (even if that process is the one trying to write), then the write will hang, waiting for the reader to make space for the write to complete.
When I remove that line close(pipefd[1]); from the child's process IF, I'm basically saying that "okay, the child can read from the pipe, but I'm allowing the parent to write to the pipe at the same time"?
No; you're saying that the child can write to the pipe as well as the parent. Any process with the write file descriptor for the pipe can write to the pipe.
If so, what would happen when the pipe is open for both reading and writing — no mutual exclusion?
There isn't any mutual exclusion ever. Any process with the pipe write descriptor open can write to the pipe at any time; the kernel ensures that two concurrent write operations are in fact serialized. Any process with the pipe read descriptor open can read from the pipe at any time; the kernel ensures that two concurrent read operations get different data bytes.
You make sure a pipe is used unidirectionally by ensuring that only one process has it open for writing and only one process has it open for reading. However, that is a programming decision. You could have N processes with the write end open and M processes with the read end open (and, perish the thought, there could be processes in common between the set of N and set of M processes), and they'd all be able to work surprisingly sanely. But you'd not readily be able to predict where a packet of data would be read after it was written.
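To illustrate the N-writers case, here is a sketch with two children writing fixed-size records into one pipe. The function name count_intact_records and the sizes are arbitrary choices of mine. It relies on the POSIX guarantee that a write of at most PIPE_BUF bytes is atomic, so records from different writers may alternate in any order but never mix within a record.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 8           /* far below PIPE_BUF, so each write is atomic */
#define RECORDS_PER_WRITER 100

/* Hypothetical helper: two children write records of 'A's and 'B's into one
 * pipe; the parent counts records that arrived unmixed.
 * Returns that count (expected 200) or -1 on error. */
int count_intact_records(void)
{
    int fd[2];
    pid_t pids[2];

    if (pipe(fd) == -1)
        return -1;

    for (int w = 0; w < 2; w++) {
        pids[w] = fork();
        if (pids[w] == -1) {
            close(fd[0]);
            close(fd[1]);
            return -1;
        }
        if (pids[w] == 0) {
            char rec[RECORD_SIZE];
            close(fd[0]);
            memset(rec, 'A' + w, sizeof rec);
            for (int i = 0; i < RECORDS_PER_WRITER; i++)
                if (write(fd[1], rec, sizeof rec) != (ssize_t)sizeof rec)
                    _exit(1);
            _exit(0);
        }
    }

    close(fd[1]);       /* parent only reads; EOF comes once both writers exit */

    int intact = 0;
    char rec[RECORD_SIZE];
    for (;;) {
        size_t got = 0;
        ssize_t n;
        while (got < sizeof rec &&
               (n = read(fd[0], rec + got, sizeof rec - got)) > 0)
            got += (size_t)n;
        if (got < sizeof rec)
            break;                      /* EOF (or error) */

        int uniform = 1;
        for (size_t i = 1; i < sizeof rec; i++)
            if (rec[i] != rec[0])
                uniform = 0;
        intact += uniform;
    }

    close(fd[0]);
    waitpid(pids[0], NULL, 0);
    waitpid(pids[1], NULL, 0);
    return intact;
}
```

What you cannot predict is the order in which the A and B records arrive, which is the "where would a packet be read" problem mentioned above.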
fork() duplicates the file handles, so you will have two handles for each end of the pipe.
Now, consider this. If the parent doesn't close the unused end of the pipe, there will still be two handles for it. If the child dies, the handle on the child side goes away, but there's still the open handle held by the parent -- thus, there will never be a "broken pipe" or "EOF" arriving because the pipe is still perfectly valid. There's just nobody putting data into it anymore.
Same for the other direction, of course.
Yes, the parent/child could still use the handle to write into their own pipe; I don't remember a use-case for this, though, and it still gives you synchronization problems.
When the pipe is created, it has two ends: the read end and the write end. These are entries in the user file descriptor table.
Similarly, there will be two entries in the file table, one for the read end and one for the write end, each with a reference count of 1.
Now when you fork, a child is created, that is, the file descriptors are duplicated, and thus the reference count of both ends in the file table becomes 2.
Now "When I remove that line close(pipefd[1])" -> In this case, even after the parent has finished writing, the while loop below this line will block forever waiting for read to return 0 (i.e. EOF). This happens because, even though the parent has finished writing and closed its write end of the pipe, the reference count of the write end in the file table is still 1 (initially it was 2), so read keeps waiting for data that will never arrive.
Now if you had not written "close(pipefd[0]);" in the parent, the current code might not show any problem, since you write only once in the parent.
But if you write more than once, then ideally you would want to get an error (if the child is no longer reading); since the read end in the parent is not closed, however, you will not get that error (even if the child is no longer there to read).
So the problem of not closing the unused ends becomes evident when we read/write data continuously. It may not be evident if we read/write data only once.
For example, if instead of the read loop in the child you call the line below only once, getting all the data in one go and not checking for EOF, your program will work even if you do not write "close(pipefd[1]);" in the child.
read(pipefd[0], buf, sizeof(buf));//buf is a character array sufficiently large
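The reference-count argument can be checked without forking at all. In this sketch the function name is hypothetical, and I put the read end into non-blocking mode purely so that "would block forever" shows up as EAGAIN instead of hanging the demo.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if reading an empty pipe reports EAGAIN
 * while a write end is still open and 0 bytes (EOF) once it is closed;
 * returns 0 otherwise, -1 on setup failure. */
int eof_requires_all_writers_closed(void)
{
    int fd[2];
    char c;

    if (pipe(fd) == -1)
        return -1;

    int flags = fcntl(fd[0], F_GETFL);
    if (flags == -1 || fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    /* A write end is still open: empty means "no data yet", not EOF.
     * A blocking read would sleep here indefinitely. */
    ssize_t before = read(fd[0], &c, 1);
    int before_errno = errno;

    close(fd[1]);               /* last write reference goes away */

    /* Now the same read reports end of file. */
    ssize_t after = read(fd[0], &c, 1);

    close(fd[0]);
    return before == -1 &&
           (before_errno == EAGAIN || before_errno == EWOULDBLOCK) &&
           after == 0;
}
```

The child that forgets close(pipefd[1]) is in the "before" state forever: its own descriptor keeps the write side's reference count above zero.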
From the SunOS man page for pipe():
Read calls on an empty pipe (no buffered data) with only one
end (all write file descriptors closed) return an EOF (end
of file).
A SIGPIPE signal is generated if a write on a pipe with only
one end is attempted.

Resources