Clarification on how pipe() and dup2() work in C

I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);
    /* call to execve() here */
} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
}
I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:
Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just point STDIN at the read end of the pipe and STDOUT at the write end of the pipe. What I'm guessing is happening is that STDIN is doing the blocking after dup2(fd[0], 0) is executed. Is this correct?
From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: A read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guess work. Should I be adding explicit calls to close the remaining two ends, like this?
int fd[2];
pipe(fd);
if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);
    /* call to execve() here */
    close(fd[0]);
} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
    close(fd[1]);
}
This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?

Nothing in the code you've provided blocks. fork, dup2, and close all return immediately. The code does not pause execution anywhere in the lines you've posted. If you're observing any waiting or hanging, it's elsewhere in your code (e.g. in a call to waitpid, select, or read).
Each process has its own file descriptor table. The file objects are global between all processes (and a file in the file system may be open multiple times, with different file objects representing it), but the file descriptors are per-process, a way for each process to reference the file objects. So a file descriptor like "1" or "2" only has meaning in your process -- "file number 1" and "file number 2" probably mean something different to another process. But it's possible for processes to reference the same file object (although each might have a different number for it).
So, technically, that's why there are two sets of flags you can set: the file descriptor flags that aren't shared between processes (such as FD_CLOEXEC, the close-on-exec flag), and the file object flags (such as O_NONBLOCK) that are shared even between processes.
Unless you do something weird like freopen on stdin/stdout/stderr (rare), they're just synonyms for fds 0, 1, and 2. When you want to write raw bytes, call write with the file descriptor number; if you want to write formatted strings, call fprintf with stdout or stderr -- they go to the same place.
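For instance (a minimal sketch), fd 1 and the stdio stream stdout normally refer to the same open file, so raw write() calls and fprintf() calls end up in the same place; just remember that stdio buffers internally:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    fprintf(stdout, "via stdio\n");  /* buffered, destined for fd 1 */
    fflush(stdout);                  /* push the stdio buffer out first */
    write(1, "via write()\n", 12);   /* raw system call on the same fd */
    return 0;
}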
No implicit closing is done; you're just getting away with it. Yes, you should close file descriptors when you're done with them -- technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure (if fd[0] happened to be 0, closing it would close the standard input you just set up)!
Nope, nothing is written to disk. A pipe is a memory-backed file, which means its buffer never needs to be stored anywhere permanent. When you write to a "regular" file on disk, the written data is stored by the kernel in a buffer and then committed to the disk as soon as possible. When you write to a pipe, it goes into a kernel-managed buffer just the same, but it normally never reaches a disk: it just sits there until it's read from the reading end of the pipe, at which point the kernel discards it rather than saving it.
The pipe has a read and write end, so written data always goes at the end of the buffer, and data that's read out gets taken from the head of the buffer then removed. So, there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first from the other end. If the tap at the far end is closed (process not reading) then you can't push (write) more data into your end of the pipe. If the data isn't being written and the pipe empties, you have to wait when reading until more data comes through.
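A small self-contained sketch of that FIFO behavior (error handling mostly elided for brevity): the reader blocks in read() until the writer puts data in, and bytes come out in the order they were written.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    char buf[32];

    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {                  /* child: the writer */
        close(fd[0]);
        sleep(1);                       /* reader blocks in read() meanwhile */
        write(fd[1], "first", 5);
        write(fd[1], "second", 6);
        close(fd[1]);
        _exit(0);
    }

    close(fd[1]);                       /* parent: the reader */
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof buf - 1)) > 0) {
        buf[n] = '\0';
        printf("got: %s\n", buf);       /* "first" always precedes "second" */
    }
    close(fd[0]);                       /* read() returned 0: writer closed */
    wait(NULL);
    return 0;
}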

First of all you usually call execve or one of its sister calls in the child process, not in the parent. Remember that a parent knows who its child is, but not vice-versa.
Underneath, a pipe is really a buffer handled by the operating system in such a way that an attempt to write to it is guaranteed to block if the buffer is full, and a read from it is guaranteed to block if there is nothing to read. This is where the blocking you experience comes from.
In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being awoken intermittently, even for smallish amounts of data, say in the order of tens of kilobytes. Now in many cases the reading process gets its input in a single shot.

Related

Diff/compare two files by file descriptor (fd) instead of file name

Is there any way in Linux, using C, to generate a diff/patch of two files stored in memory, using a common format (i.e. unified diff, as with the command-line diff utility)?
I'm working on a system where I generate two text files in memory, and no external storage is available, or desired. I need to create a line-by-line diff of the two files, and since they are mmap'ed, they don't have file names, preventing me from simply calling system("diff file1.txt file2.txt").
I have file descriptors (fds) available for use, and that's my only entry point to the data. Is there any way to generate a diff/patch by comparing the two open files? If the implementation is MIT/BSD licensed (i.e. non-GPL), so much the better.
Thank you.
On Linux you can use the /dev/fd/ pseudo filesystem (a symbolic link to /proc/self/fd). Use snprintf() to construct the path for each file descriptor, e.g. snprintf(path1, PATH_MAX, "/dev/fd/%d", fd1); (ditto for fd2), and run diff on the two paths.
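As a rough sketch of the idea (diff_fds is a made-up helper name, and this blunt version shells out with system()); note that fd1 and fd2 must be readable, positioned at the start of the data, and not marked close-on-exec, or the child won't inherit them:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int diff_fds(int fd1, int fd2)
{
    char path1[PATH_MAX], path2[PATH_MAX], cmd[2 * PATH_MAX + 32];

    snprintf(path1, sizeof path1, "/dev/fd/%d", fd1);
    snprintf(path2, sizeof path2, "/dev/fd/%d", fd2);
    snprintf(cmd, sizeof cmd, "diff -u %s %s", path1, path2);

    /* system() forks a shell; the child inherits fd1 and fd2, so the
       /dev/fd entries are valid inside it.  The return value is the
       shell's wait status; use WEXITSTATUS() to get diff's exit code. */
    return system(cmd);
}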
Considering the requirements, the best option would be to implement your own in-memory diff -au. You could perhaps adapt the relevant parts of OpenBSD's diff to your needs.
Here's an outline of how you can use the /usr/bin/diff command via pipes to obtain the unified diff between two strings stored in memory:
Create three pipes: I1, I2, and O.
Fork a child process.
In the child process:
Move the read ends of pipes I1 and I2 to descriptors 3 and 4, and the write end of pipe O to descriptor 1.
Close the other ends of those pipes in the child process. Open descriptor 0 for reading from /dev/null, and descriptor 2 for writing to /dev/null.
Execute execl("/usr/bin/diff", "diff", "-au", "/proc/self/fd/3", "/proc/self/fd/4", (char *)NULL);
This executes the diff binary in the child process. It will read the inputs from the two pipes, I1 and I2, and output the differences to pipe O.
The parent process closes the read ends of the I1 and I2 pipes, and the write end of the O pipe.
The parent process writes the comparison data to the write ends of I1 and I2 pipes, and reads the differences from the read end of the O pipe.
Note that the parent process must use select() or poll() or a similar method (preferably with nonblocking descriptors) to avoid deadlock. (Deadlock occurs if both parent and child try to read at the same time, or write at the same time.) Typically, the parent process must avoid blocking at all costs, because that is likely to lead to a deadlock.
When the input data has been completely written, the parent process must close the respective write end of the pipe, so that the child process detects the end-of-input. (Unless an error occurs, the write ends must be closed before the child process closes its end of the O pipe.)
When the parent process notices that no more data is available in the O pipe (read() returning 0), either it has already closed the write ends of the I1 and I2 pipes, or there was an error. If there is no error, the data transfer is complete, and the child process can be reaped.
The parent process reaps the child using e.g. waitpid(). Note that if there were any differences, diff returns with exit status 1.
You can use a fourth pipe to receive the standard error stream from the child process; diff does not normally output anything to standard error.
You can use a fifth pipe, whose write end is marked close-on-exec (the FD_CLOEXEC flag, set with fcntl()) in the child, to detect execl() errors. FD_CLOEXEC means the descriptor is closed when another binary is executed, so the parent process can detect the successful start of the diff command by seeing end-of-data on the read end (read() returning 0). If the execl() fails, the child can e.g. write the errno value (as a decimal number, or as a raw int) to this pipe, so that the parent process can read the exact cause of the failure.
In all, the complete method (that both records standard error, and detects exec errors) uses 10 descriptors. This should not be an issue in a normal application, but may be important -- for example, consider an internet-facing server with descriptors used by incoming connections.
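Here is a condensed, hedged sketch of the core of this method. It omits the stderr and exec-error pipes and, to stay short, writes both inputs before reading the output, which is only safe while each input fits in the kernel pipe buffer (64 KiB on Linux); real code should multiplex with select()/poll() as described above. diff_buffers is a made-up helper name:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int diff_buffers(const char *a, size_t alen, const char *b, size_t blen)
{
    int i1[2], i2[2], o[2];

    if (pipe(i1) || pipe(i2) || pipe(o))
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: drop the ends the parent uses, then move the remaining
           ends onto the fixed descriptors 1, 3 and 4.  (A fully robust
           version would guard against pathological descriptor numbers.) */
        close(i1[1]); close(i2[1]); close(o[0]);
        dup2(o[1], 1);  close(o[1]);
        if (i1[0] != 3) { dup2(i1[0], 3); close(i1[0]); }
        if (i2[0] != 4) { dup2(i2[0], 4); close(i2[0]); }
        execl("/usr/bin/diff", "diff", "-au",
              "/proc/self/fd/3", "/proc/self/fd/4", (char *)NULL);
        _exit(127);                           /* exec failed */
    }

    /* Parent: close the ends the child uses. */
    close(i1[0]); close(i2[0]); close(o[1]);

    write(i1[1], a, alen);  close(i1[1]);     /* EOF for input 1 */
    write(i2[1], b, blen);  close(i2[1]);     /* EOF for input 2 */

    char buf[4096];
    ssize_t n;
    while ((n = read(o[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);    /* or collect into memory */
    close(o[0]);

    int status;
    waitpid(pid, &status, 0);
    /* diff exits 0 for "same", 1 for "different", 2 for trouble. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}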

Is it really necessary to close the unused end of the pipe in a process

I am reading about pipes in UNIX for inter-process communication between 2 processes. I have the following question:
Is it really necessary to close the unused end of the pipe? For example, if my parent process is writing data into the pipe and the child is reading from the pipe, is it really necessary to close the read end of the pipe in the parent process and close the write end in the child process? Are there any side effects if I don't close those ends? Why do we need to close those ends?
Here's the problem if you don't. In your example, the parent creates a pipe for writing to the child. It then forks the child but does not close its own read descriptor. This means that there are still two read descriptors on the pipe.
If the child had the only one and it closed it (for example, by exiting), the parent would get a SIGPIPE signal or, if that signal is ignored, an EPIPE error on writing to the pipe.
However, there is a second read descriptor on the pipe (the parent's). Now, if the child exits, the pipe will remain open. The parent can continue to write to the pipe until it fills and then the next write will block (or return without writing if non-blocking).
Thus, by not closing its own read descriptor, the parent loses the ability to detect that the child has closed its end.
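A minimal sketch of that failure mode: because the parent below keeps its own copy of the read end open, the write() succeeds even after the only real reader has exited; had the parent closed fd[0] right after fork(), the write would instead fail with EPIPE:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    pipe(fd);
    signal(SIGPIPE, SIG_IGN);          /* report EPIPE instead of dying */

    if (fork() == 0) {                 /* child: the intended reader */
        close(fd[1]);
        close(fd[0]);                  /* reader gives up immediately */
        _exit(0);
    }
    wait(NULL);                        /* child is gone... */

    /* ...but fd[0] is still open here, so this write succeeds even
       though nobody will ever read the data. */
    ssize_t n = write(fd[1], "hello", 5);
    printf("write returned %zd\n", n); /* prints 5, not an error */
    return 0;
}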
According to the man page for getdtablesize
Each process has a fixed size descriptor table, which is guaranteed to
have at least 20 slots.
Each pipe uses two entries in the descriptor table. Closing the unneeded end of the pipe frees up one of those descriptors. So, if you were unfortunate enough to be on a system where each process is limited to 20 descriptors, you would be highly motivated to free up unneeded file descriptors.
Pipes are intended to be used as unidirectional communication channels. Closing the unused ends is good practice and avoids confusion about message flow: the writer's descriptor should be closed in the reader, and vice versa.
Quoting from a reference on pipe communication:
[...] Each pipe provides one-way communication; information flows from one process to another.
For this reason, the parent and child process should close the unused ends of the pipe.
There is actually another, more important reason for closing unused ends of the pipe.
The process reading from the pipe blocks when making the read system call unless:
the pipe contains data to read, or
end-of-file has been signalled. End-of-file is signalled on a pipe when every file descriptor to the write end of the pipe has been closed. Any process reading from the pipe while forgetting to close its own copy of the write end will never be notified of the "end-of-file" [...]

Breaking down shell scripts; What happens under the hood?

So, I was given this one-line script:
echo test | cat | grep test
Could you please explain to me how exactly that would work given the following system calls: pipe(), fork(), exec() and dup2()?
I am looking for a general overview here, mainly the sequence of operations.
What I know so far is that the shell will fork using fork(), and each command's code will replace the shell's via exec(). But what about pipe and dup2? How do they fall into place?
Thanks in advance.
First consider a simpler example, such as:
echo test | cat
What we want is to execute echo in a separate process, arranging for its standard output to be diverted into the standard input of the process executing cat. Ideally this diversion, once setup, would require no further intervention by the shell — the shell would just calmly wait for both processes to exit.
The mechanism to achieve that is called the "pipe". It is an interprocess communication device implemented in the kernel and exported to user space. Once created by a Unix program, a pipe has the appearance of a pair of file descriptors with the peculiar property that, if you write into one of them, you can read the same data from the other. This is not very useful within the same process, but keep in mind that file descriptors, including but not limited to pipes, are inherited across fork() and even across exec(). This makes the pipe an easy-to-set-up and reasonably efficient IPC mechanism.
The shell creates the pipe, and now owns a pair of file descriptors belonging to the pipe, one for reading and one for writing. These file descriptors are inherited by both forked subprocesses. If only echo were writing to the pipe's write-end descriptor instead of to its actual standard output, and if cat were reading from the pipe's read-end descriptor instead of from its standard input, everything would work. But they don't, and this is where dup2 comes into play.
dup2 duplicates a file descriptor as another file descriptor, automatically closing the target descriptor beforehand. For example, dup2(15, 1) will close file descriptor 1 (by convention used for the standard output) and reopen it as a copy of file descriptor 15 -- meaning that writing to the standard output will in fact be equivalent to writing to file descriptor 15. The same applies to reading: dup2(8, 0) will make reading from file descriptor 0 (the standard input) equivalent to reading from file descriptor 8. If we proceed to close the original file descriptor, the open file (or pipe) will have effectively been moved from the original descriptor to the new one, much like sci-fi teleports that work by first duplicating a piece of matter at a remote location and then disintegrating the original.
If you're still following the theory, the order of operations performed by the shell should now be clear:
The shell creates a pipe and then forks two processes, both of which will inherit the pipe file descriptors, r and w.
In the subprocess about to execute echo, the shell calls dup2(w, 1); close(w) before exec in order to redirect the standard output to the write end of the pipe.
In the subprocess about to execute cat, the shell calls dup2(r, 0); close(r) in order to redirect the standard input to the read end of the pipe.
After forking, the main shell process must itself close both ends of the pipe. One reason is to free the resources associated with the pipe once the subprocesses exit. The other is to allow cat to actually terminate -- a pipe's reader receives EOF only after all copies of the write end of the pipe are closed. In the steps above, we did close the child's redundant copy of the write end, w, right after duplicating it onto descriptor 1. But w also exists in the parent, because it was inherited at fork time, and can only be closed by the parent. Failing to do that leaves cat's standard input never reporting EOF, and cat hanging as a consequence.
This mechanism generalizes easily to three or more processes connected by pipes. In the case of three processes, the pipes need to arrange that echo's output goes to cat's input, and cat's output goes to grep's input. This requires two calls to pipe(), three calls to fork(), four calls to dup2() followed by close() in the children (one pair for echo, one for grep, and two for cat), three calls to exec(), and four additional calls to close() in the shell (two for each pipe).
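To make the two-process case concrete, here is a hedged sketch of echo test | cat using exactly the sequence above (error handling mostly elided):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];                          /* fd[0] = r, fd[1] = w */
    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {                  /* will become echo */
        dup2(fd[1], 1);                 /* stdout -> write end */
        close(fd[0]); close(fd[1]);
        execlp("echo", "echo", "test", (char *)NULL);
        _exit(127);
    }

    if (fork() == 0) {                  /* will become cat */
        dup2(fd[0], 0);                 /* stdin <- read end */
        close(fd[0]); close(fd[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }

    /* The shell must close its own copies, or cat never sees EOF. */
    close(fd[0]); close(fd[1]);
    while (wait(NULL) > 0)              /* reap both children */
        ;
    return 0;
}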

how to communicate to a program with another external program

I'm trying to write to stdin and read from stdout (and stderr) of an external program, without changing its code.
I've tried using named pipes, but stdout doesn't show anything until the program terminates, and stdin only works for the first input (then cin is null).
I've tried using /proc/[pid]/fd, but that only writes to and reads from the terminal, not the program.
I've tried writing a character device file for this, and it worked, but only for one program at a time (this needs to work for multiple programs at a time).
At this point, to my knowledge, I could extend the driver that worked to multiplex the I/O across multiple programs, but I don't think that's the "right" solution.
The main purpose of this is to view a feed of a program through a web interface. I'm sure there has to be some way to do this. Is there anything I haven't tried that's been done before?
The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
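A sketch of steps 1 through 5 above, under stated assumptions: spawn_filter is a made-up helper that launches an external program with all three standard streams connected to pipes; the select()/poll() loop of the final step, and most error handling, are left out:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* Returns the child pid; *in, *out, *err receive the parent's ends. */
pid_t spawn_filter(const char *path, int *in, int *out, int *err)
{
    int inp[2], outp[2], errp[2];
    if (pipe(inp) || pipe(outp) || pipe(errp)) return -1;

    pid_t pid = fork();
    if (pid == 0) {
        dup2(inp[0], 0);                /* child's stdin  = read end  */
        dup2(outp[1], 1);               /* child's stdout = write end */
        dup2(errp[1], 2);               /* child's stderr = write end */
        close(inp[0]);  close(inp[1]);  /* drop all original pipe fds */
        close(outp[0]); close(outp[1]);
        close(errp[0]); close(errp[1]);
        execlp(path, path, (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    close(inp[0]); close(outp[1]); close(errp[1]);  /* child's ends */
    *in = inp[1];  *out = outp[0];  *err = errp[0];
    return pid;
}

Calling it looks like int in, out, err; pid_t pid = spawn_filter("sort", &in, &out, &err); after which the parent writes to in and select()s on out and err.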
Even if you do this all correctly, you may still not see your program's output right away. This is typically due to buffering of stdout. Normally, when stdout is going to a terminal, it's line-buffered: it gets flushed after every newline. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it's fully buffered, and it only gets written to when the program has produced a full buffer's worth of data (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks; see the existing questions on disabling stdio buffering for how to do that.

Do we need to close the read end of a pipe explicitly whose write end has already been closed?

I have this following scenario.
I create a pipe.
Forked a child process.
Child closes the read end of the pipe explicitly, writes into the write end of the pipe, and exits without closing anything (exit should close all open file/pipe descriptors on behalf of the child, I presume).
Parent closes the write end of the pipe explicitly and reads from the read end of the pipe using fgets until fgets returns NULL, i.e. it reads completely.
Now my question is, why does the parent need to close the read end of the pipe explicitly once its done reading? Isn't it wise for the system to delete the pipe altogether once complete data has been read from the read-end?
I didn't close the read end explicitly in the parent, and I got a "Too many file descriptors" error sooner or later while opening more pipes. My assumption was that the system automatically deletes a pipe once its write end is closed and the data has been completely read from the read end. After all, you can't read from a pipe twice!
So, whats the rationale behind the system not deleting the pipe once data has been completely read and write end closed?
You're correct that the system will close the write end of the pipe once the child exits. However there could be another write end of that pipe open, if the child forks or passes a duplicate of the write end to another process.
It is still true that the system would be able to tell when all the descriptors at one end of a pipe have been closed (either explicitly or because the owning process exited). It still doesn't make sense to close those on the other end of the pipe, as that would lead to confusion when the parent process tries to close the descriptor on its end of the pipe; either:
the fd has been closed by the system, in which case there is an error as it tries to close an already closed fd; or
the fd has been reused, which is even worse as it is now closing a completely unrelated fd.
From the point of view of the system, it might well have discarded the pipe once all the descriptors at one end have been closed, so you don't need to worry about inefficiency there. What matters more is that the user space process should have a consistent experience, which means not closing the descriptor unless it is specifically requested.
File descriptors are not closed by the system until the process exits. This is true for pipes, as well as any other file descriptor.
There's a big difference between a pipe (or any other file) with no data in it and a closed file descriptor.
When a file descriptor is closed, the system can reuse its number for a new file descriptor. Then, when you read, you get something else. So after you've closed a file descriptor, you must no longer use it.
Now imagine that once there's no more data, the system would automatically close the file descriptor. This would make the number available for reuse, and a subsequent unrelated open may get it. Now the reader, who doesn't know yet that there's no more data, will read from what it thinks is the pipe, but will actually read from another file.
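A tiny sketch of that reuse hazard: the kernel hands out the lowest free descriptor number, so a freshly closed number comes straight back for an unrelated file:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    pipe(fd);
    printf("pipe read end: %d\n", fd[0]);   /* e.g. 3 */
    close(fd[0]);
    int other = open("/dev/null", O_RDONLY);
    printf("unrelated file: %d\n", other);  /* also 3: number was reused */
    return 0;
}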
