Processes hang on read - c

The following code reads messages from other processes through a pipe. All processes correctly print out all the messages, but then they will never proceed past the while loop. Tried debugging in Eclipse, after reading reading all the messages, it will just stop at that while loop.
The index is a number assigned to each process. The first process would have index == 0.
The message itself is simply the index of the process sending the message.
while((n = read(fd[index][0], &mymsg, sizeof(int))) == sizeof(int))
printf("process%d has received a message from process%d\n", index, mymsg);
Any ideas why this would happen?
Here is how each process writes to another:
// Write to other process
if(write(fd[index2][1], &index, sizeof(int)) != sizeof(int))
sys_error(2);
This is done five times. fd is a table of read-and-write ends for each process.

The call to read() is blocking until more data shows up. From the man page for pipe
If a process attempts to read from an empty pipe, then read(2) will
block until data is available. If a process attempts to write to a
full pipe (see below), then write(2) blocks until sufficient data has
been read from the pipe to allow the write to complete. Nonblocking
I/O is possible by using the fcntl(2) F_SETFL operation to enable the
O_NONBLOCK open file status flag.
After you open each file descriptor before you enter that while loop do this to each one:
fcntl(fd, F_SETFL, O_NONBLOCK);
However, you really should read up on blocking vs. non-blocking I/O, including reading the man pages for pipe, read, fcntl, etc.

Related

Closing socket file descriptor after EPIPE

After ignoring SIGPIPE with a call to signal(), there is a possibility that a write() call will fail with errno set equal to EPIPE. Do I still have to close the file descriptor like in the example below?
if (write(sock_fd, buf, len) < 0) {
if (errno == EPIPE) {
close(sock_fd);
}
}
Remember that when you create a pipe, using the system call of the same name, you get two file descriptors: one for reading and one for writing. You get EPIPE from a write operation, on the write fd, when the read fd has been closed. The write fd is still open. If you try to write to it again, you'll get EPIPE again.
(Often, when this happens, the pipe was set up by a shell or some other parent process, and the read fd was never available to your program, but that doesn't matter to the kernel. It was open in some process(es) and now it isn't.)
Since it's still open, you do need to close it. However, exiting automatically closes all fds that are still open. So if the very next thing you'd do after closing the pipe is to exit, then you don't need to bother closing it first. Since it is a pipe, and you already got EPIPE, there can't be any delayed write errors that close might report.
You always have to close file descriptors. No ifs, no buts.

How to check if non-blocking anonymous pipe has data without removing it

I have a POSIX thread which reads from the non-blocking anonymous pipe (marked with O_NONBLOCK flag). When thread is stopping (because of errors for example) I want to check if there is something left in pipe (in its internal buffer). If pipe has data - run new thread with the same read descriptor (it is shared between threads) so the new thread can continue reading from pipe. If pipe is empty - close pipe and do nothing.
So I need to check if pipe is empty without removing data from pipe (as regular read will do). Is there any way to do it?
P.S. I think setting count = 0 in read(int fd, void *buf, size_t count); may help but the documentation sais that it is some kind of undefined behavior:
If count is zero, read() may detect the errors described below. In
the absence of any errors, or if read() does not check for errors, a
read() with a count of 0 returns zero and has no other effects.
I believe you want poll or select, called with a zero timeout.
Short description from the select() docs:
select() and pselect() allow a program to monitor multiple file
descriptors, waiting until one or more of the file descriptors become
"ready" for some class of I/O operation (e.g., input possible).
...and the poll() docs:
poll() performs a similar task to select(2): it waits for one of a
set of file descriptors to become ready to perform I/O.

Clarification on how pipe() and dup2() work in C

I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
close(fd[1]);
dup2(fd[0], 0);
/* call to execve() here */
} else { /* child code */
close(fd[0]);
dup2(fd[1], 1);
}
I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:
Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just copy STDIN to point at the at the read end of the pipe and STDOUT to point to the write end of the pipe. What I'm guessing is happening is that STDIN is doing the blocking after dup2(fd[0], 0) is executed. Is this correct?
From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: A read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guess work. Should I be adding explicit calls to close the remaining two ends, like this?
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
close(fd[1]);
dup2(fd[0], 0);
/* call to execve() here */
close(fd[0]);
} else { /* child code */
close(fd[0]);
dup2(fd[1], 1);
close(fd[1]);
}
This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?
Nothing in the code you've provided blocks. fork, dup2, and close all operate immediately. The code does not pause execution anywhere in the lines you've printed. If you're observing any waiting or hanging, it's elsewhere in your code (eg. in a call to waitpid or select or read).
Each process has its own file descriptor table. The files objects are global between all processes (and a file in the file system may be open multiple times, with different file objects representing it), but the file descriptors are per-process, a way for each process to reference the file objects. So a file descriptor like "1" or "2" only has meaning in your process -- "file number 1" and "file number 2" probably mean something different to another process. But it's possible for processes to reference the same file object (although each might have a different number for it).
So, technically, that's why there are two sets of flags you can set on file descriptors, the file descriptor flags that aren't shared between processes (F_CLOEXEC), and the file object flags (such as O_NONBLOCK) that get shared even between processes.
Unless you do something weird like freopen on stdin/stdout/stderr (rare) they're just synonyms for fds 0,1,2. When you want to write raw bytes, call write with the file descriptor number; if you want to write pretty strings, call fprintf with stdin/stdout/stderr -- they go to the same place.
No implicit closing is done, you're just getting away with it. Yes, you should close file descriptors when you're done with them -- technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure!
Nope, there's nothing written to disk. It's a memory backed file, which means that the buffer doesn't get stored anywhere. When you write to a "regular" file on the disk, the written data is stored by the kernel in a buffer, and then passed on to the disk as soon as possible to commit. When you write to a pipe, it goes to a kernel-managed buffer just the same, but it won't normally go to disk. It just sits there until it's read by the reading end of the pipe, at which point the kernel discards it rather than saving it.
The pipe has a read and write end, so written data always goes at the end of the buffer, and data that's read out gets taken from the head of the buffer then removed. So, there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first from the other end. If the tap at the far end is closed (process not reading) then you can't push (write) more data into your end of the pipe. If the data isn't being written and the pipe empties, you have to wait when reading until more data comes through.
First of all you usually call execve or one of its sister calls in the child process, not in the parent. Remember that a parent knows who its child is, but not vice-versa.
Underneath a pipe is really a buffer handled by the operating system in such a way that it is guaranteed that an attempt to write to it blocks if the buffer is full and that a read to it blocks if there is nothing to read. This is where the blocking you experience comes from.
In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being awoken intermittently, even for smallish amounts of data, say in the order of tens of kilobytes. Now in many cases the reading process gets its input in a single shot.

Why should you close a pipe in linux?

When using a pipe for process-process communication, what is the purpose of closing one end of the pipe?
For example: How to send a simple string between two programs using pipes?
Notice that one side of the pipe is closed in the child and parent processes. Why is this required?
If you connect two processes - parent and child - using a pipe, you create the pipe before the fork.
The fork makes the both processes have access to both ends of the pipe. This is not desirable.
The reading side is supposed to learn that the writer has finished if it notices an EOF condition. This can only happen if all writing sides are closed. So it is best if it closes its writing FD ASAP.
The writer should close its reading FD just in order not to have too many FDs open and thus reaching a maybe existing limit of open FDs. Besides, if the then only reader dies, the writer gets notified about this by getting a SIGPIPE or at least an EPIPE error (depending on how signals are defined). If there are several readers, the writer cannot detect that "the real one" went away, goes on writing and gets stuck as the writing FD blocks in the hope, the "unused" reader will read something.
So here in detail what happens:
parent process calls pipe() and gets 2 file descriptors: let's call it rd and wr.
parent process calls fork(). Now both processes have a rd and a wr.
Suppose the child process is supposed to be the reader.
Then
the parent should close its reading end (for not wasting FDs and for proper detection of dying reader) and
the child must close its writing end (in order to be possible to detect the EOF condition).
The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them pretty soon you'll run out of FDs and can't open anything anymore: not pipes, not files, not sockets, ...
Another reason why it can be important to close the pipe is when the closing itself has a meaning to the application. For example, a common use of pipes is to send the errno from a child process to the parent when using fork and exec to launch an external program:
The parent creates the pipe, calls fork to create a child process, closes its writing end, and tries to read from the pipe.
The child process attempts to use exec to run a different program:
If exec fails, for example because the program does not exist, the child writes errno to the pipe, and the parent reads it and knows what went wrong, and can tell the user.
If exec is successful the pipe is closed without anything being written. The read function in the parent returns 0 indicating the pipe was closed and knows the program was successfully started.
If the parent did not close its writing end of the pipe before trying to read from the pipe this would not work because the read function would never return when exec is successful.
Closing unused pipe file descriptor is more than a matter of ensuring that a process doesn't exhaust its limited set of file descriptor-it is essential to the correct use of pipes. We now consider why the unused file descriptors for both the read and write ends of the pipe must be closed.
The process reading from the pipe closes its write descriptor for the pipe, so that, when the other process completes its output and closes its write descriptor, the read sees end-of-file (once it has ready any outstanding data in the pipe).
If the reading process doesn't close the write end of the pipe, then after the other process closes its write descriptor, the reader won't see end-of-file, even after it has read all data from the pipe. Instead, a read() would block waiting for data, because the kernel knows that there is still at least one write descriptor open for the pipe.That this descriptor is held open by the reading process itself is irrelevant; In theory, that process could still write to the pipe, even if it is blocked trying to read.
For example, the read() might be interrupted by a signal handler that writes data to the pipe.
The writing process closes its read descriptor for the pipe for a different reason.
When a process tries to write to a pipe for which no process has an open read descriptor, the kernel sends the SIGPIPE signal to the writing process. By default, this signal kills a process. A process can instead arrange to catch or ignore this signal, in which case the write() on the pipe fails with the error EPIPE (broken pipe). Receiving the SIGPIPE signal or getting the EPIPE error is useful indication about the status of the pipe, and this is why unused read descriptors for the pipe should be closed.
If the writing process doesn't close the read end of the pipe, then even after the other process closes the read end of the pipe, the writing process will fill the pipe, and a further attempt to write will block indefinitely.
One final reason for closing unused file descriptor is that only after it all file descriptor are closed that the pipe is destroyed and its resources released for reuse by other processes. At this point, any unread data in the pipe is lost.
~ Micheal Kerrisk , the Linux programming interface

Why make stdin, stdout and stderr to a single fd?

I saw this code snippet from APUE
dup2(fd,0);
dup2(fd,1);
dup2(fd, 2);
if (fd > 2)
close(fd);
In my understanding, it makes stdin, stdout and stderr all point to fd. It says that lots program contain this code, why? What's it functionality?
I'm going to add to the comments and answer here because even though they're correct, I would still have a hard time understanding exactly when and why this sequence of calls were needed.
This sequence of function calls is typically used when a process will run as a daemon. In that case, among other things, the daemon doesn't want to have the standard I/O file descriptors attached to the terminal (or other resources). To 'detach' those descriptors, something like the following might occur:
int fd;
fd = open("/dev/null",O_RDWR); // missing from APUE exercise 3.4 example
if (fd != -1)
{
dup2 (fd, 0); // stdin
dup2 (fd, 1); // stdout
dup2 (fd, 2); // stderr
if (fd > 2) close (fd);
}
What this does is bind /dev/null' to each of the standard I/O descriptors and closes the temporary descriptor used to open/dev/null` in the first place (as long as that open didn't end up using one of the descriptors usually used for the standard I/O descriptors for some reason).
Now the daemon has valid stdin/stdout/stderr descriptors, but they aren't referring to a file or device that might interfere with another process.
This is mostly used in daemon programs because the daemon not connected with the terminal or tty. so for that we need maintain the error or printed statements in one file. for that only we were using this statements. In our system File descriptor 0,1,2 is already allocated for the standard buffers like stdin,etc...
Dup2 function is something different from dup function.
In dup2 function we no need to close already using file descriptor.
In this dup2 function itself if the second argument file descriptors is already using means
without close() function dup2 is closed the second argument fd and allocated a dup of first argument fd.
Then first argument fd is connected to second fd and do the first fd works
For example dup2(fd,1) means the file descriptor works are copied to the stdout.
fd is contains any the statements is print the stdout.

Resources