dup2 / dup - Why would I need to duplicate a file descriptor? - c

I'm trying to understand the use of dup2 and dup.
From the man page:
DESCRIPTION
dup and dup2 create a copy of the file descriptor oldfd. After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however. dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
RETURN VALUE
dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
Why would I need that system call? What is the use of duplicating the file descriptor? If I have the file descriptor, why would I want to make a copy of it? I'd appreciate it if you could explain and give me an example where dup2 / dup is needed.

The dup system call duplicates an existing file descriptor, returning a new one that
refers to the same underlying I/O object.
Dup allows shells to implement commands like this:
ls existing-file non-existing-file > tmp1 2>&1
The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1 (i.e. stderr and stdout refer to the same open file).
Now both the error message from calling ls on the non-existing file and the normal output of ls on the existing file show up in tmp1.
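As a rough sketch of what a shell might do in the child for "> tmp1 2>&1" (the function names redirect_stdout_stderr and demo_merged_output are made up for illustration and are not taken from any real shell):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: what a shell's child might do for "> path 2>&1". */
int redirect_stdout_stderr(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* "> path": make fd 1 refer to the file. */
    if (dup2(fd, STDOUT_FILENO) < 0)
        return -1;
    /* "2>&1": make fd 2 a duplicate of fd 1. */
    if (dup2(STDOUT_FILENO, STDERR_FILENO) < 0)
        return -1;
    if (fd > STDERR_FILENO)
        close(fd); /* the extra descriptor is no longer needed */
    return 0;
}

/* Forks a child that writes one line to each stream after redirecting.
 * Returns the child's exit status (0 on success) or -1 on error. */
int demo_merged_output(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        const char *out = "to stdout\n";
        const char *err = "to stderr\n";
        if (redirect_stdout_stderr(path) < 0)
            _exit(1);
        if (write(STDOUT_FILENO, out, strlen(out)) < 0)
            _exit(2);
        if (write(STDERR_FILENO, err, strlen(err)) < 0)
            _exit(2);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because descriptors 1 and 2 share one file offset after the second dup2, the stderr line lands after the stdout line instead of overwriting it.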
The following example code runs the program wc with standard input connected
to the read end of a pipe.
int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;
pipe(p);
if(fork() == 0) {
    close(0);      // child closes stdin, freeing descriptor 0
    dup(p[0]);     // lowest free descriptor is 0, so the read end of the pipe becomes stdin
    close(p[0]);
    close(p[1]);
    execv("/bin/wc", argv); // the xv6 original calls exec(); execv() is the POSIX counterpart
} else {
    write(p[1], "hello world\n", 12);
    close(p[0]);
    close(p[1]);
}
The child dups the read end onto file descriptor 0, closes the file descriptors in p, and execs wc. When wc reads from its standard input, it reads from the pipe.
This is how shell pipelines are wired up using dup. That is just one use of dup; you can then use pipes to build something else. That's the beauty of system calls: you build one thing on top of another using tools that are already there, and those tools were in turn built from something simpler, and so on.
In the end, system calls are the most basic tools the kernel gives you.
Cheers :)
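The same pattern can also be written with dup2, which names the target descriptor explicitly instead of relying on 0 being the lowest free slot. A minimal sketch, assuming a short message; the function name bytes_seen_on_child_stdin is hypothetical, and the child simply counts bytes instead of exec'ing wc so the example stays self-contained:

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg through a pipe to a child whose stdin is the pipe's read end.
 * Returns the number of bytes the child read from fd 0, or -1 on error.
 * The count travels back in the exit status, so keep msg under 255 bytes. */
int bytes_seen_on_child_stdin(const char *msg)
{
    assert(msg != NULL);
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: put the read end on fd 0, then drop both pipe fds. */
        if (dup2(p[0], STDIN_FILENO) < 0)
            _exit(255);
        if (p[0] != STDIN_FILENO)
            close(p[0]);
        close(p[1]);
        char buf[64];
        ssize_t n, total = 0;
        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
            total += n;
        _exit(n < 0 ? 255 : (int)total);
    }
    /* Parent: write the message, then close the write end so the child sees EOF. */
    close(p[0]);
    ssize_t w = write(p[1], msg, strlen(msg));
    close(p[1]);
    int status;
    if (waitpid(pid, &status, 0) < 0 || w < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling bytes_seen_on_child_stdin("hello world\n") should report the 12 bytes that wc -c would count.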

Another reason for duplicating a file descriptor is using it with fdopen. fclose closes the file descriptor that was passed to fdopen, so if you don't want the original file descriptor to be closed, you have to duplicate it with dup first.
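A small sketch of that idea, assuming the caller wants to keep using fd afterwards (write_via_stream is a made-up name):

```c
#define _POSIX_C_SOURCE 200809L /* for fdopen() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Writes text to fd through a temporary stdio stream, leaving fd itself open.
 * Returns 0 on success, -1 on error. */
int write_via_stream(int fd, const char *text)
{
    assert(text != NULL);
    int copy = dup(fd);              /* the stream will own this copy */
    if (copy < 0)
        return -1;
    FILE *fp = fdopen(copy, "w");
    if (fp == NULL) {
        close(copy);
        return -1;
    }
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)           /* flushes and closes copy, not fd */
        rc = -1;
    return rc;
}
```

After the call, fd is still valid; only the duplicate was closed by fclose.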

dup is used to be able to redirect the output from a process.
For example, if you want to save the output of a process to a file, you duplicate standard output (fd 1) to keep a copy, redirect fd 1 to the file, then fork and execute the process; when the process finishes, you restore fd 1 from the saved copy.
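The save, redirect, restore sequence might look like the sketch below. The function name write_with_stdout_redirected is hypothetical, and a plain write stands in for the fork and exec step to keep the example short:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to fd 1 while fd 1 temporarily refers to path, then restores
 * the original stdout. Returns 0 on success, -1 on error. */
int write_with_stdout_redirected(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    int saved = dup(STDOUT_FILENO);             /* 1. remember the original stdout */
    if (saved < 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    int rc = 0;
    if (dup2(fd, STDOUT_FILENO) < 0)            /* 2. fd 1 now refers to the file */
        rc = -1;
    close(fd);
    if (rc == 0 && write(STDOUT_FILENO, msg, strlen(msg)) < 0)
        rc = -1;
    if (dup2(saved, STDOUT_FILENO) < 0)         /* 3. put the original stdout back */
        rc = -1;
    close(saved);
    return rc;
}
```

If the redirected code uses stdio (printf and friends) rather than write, call fflush(stdout) before each dup2 so buffered data ends up on the intended side.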

Some points about dup/dup2 are worth noting.
dup/dup2: technically, the purpose is to let a single process refer to one file table entry through several handles. (When forking, the descriptors are duplicated in the child process by default, and the file table entries are shared as well.)
That means that, using dup/dup2, we can have more than one file descriptor, possibly with different attributes, for a single open file table entry.
(Currently, though, FD_CLOEXEC seems to be the only attribute of a file descriptor.)
http://www.gnu.org/software/libc/manual/html_node/Descriptor-Flags.html
dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0);
dup2(fildes, fildes2); is equivalent to
close(fildes2);
fcntl(fildes, F_DUPFD, fildes2);
The differences (for the last one): apart from some errno values that differ between dup2 and fcntl,
close followed by fcntl can be subject to race conditions because two function calls are involved, whereas dup2 closes and reuses the descriptor in one step.
Details can be checked from
http://pubs.opengroup.org/onlinepubs/009695399/functions/dup.html
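To illustrate both points (fcntl with F_DUPFD behaving like dup, and descriptor flags being per descriptor rather than shared), here is a small sketch. The helper name dup_and_report_flags is invented for this example:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Marks fd close-on-exec, duplicates it with fcntl(F_DUPFD), and stores
 * the descriptor flags of the original and of the copy.
 * Returns the new descriptor, or -1 on error. */
int dup_and_report_flags(int fd, int *orig_flags, int *copy_flags)
{
    assert(orig_flags != NULL && copy_flags != NULL);
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
        return -1;
    int copy = fcntl(fd, F_DUPFD, 0);   /* same effect as dup(fd) */
    if (copy < 0)
        return -1;
    *orig_flags = fcntl(fd, F_GETFD);   /* FD_CLOEXEC is set here */
    *copy_flags = fcntl(copy, F_GETFD); /* the duplicate starts with it cleared */
    return copy;
}
```

The two descriptors share the file offset and status flags, but FD_CLOEXEC set on the original does not carry over to the copy.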
An example of use:
An interesting example is the implementation of job control in a shell, where dup/dup2 is used; see the link below.
http://www.gnu.org/software/libc/manual/html_node/Launching-Jobs.html#Launching-Jobs

Related

What are the rules of closing file descriptors after calling dup/dup2?

I feel like this is a topic I've taken for granted. In the past I literally just closed as many file descriptors "because I was told to". Most of the time this worked, but occasionally I ran into some unpredictable behaviour.
Thus, I'd like to ask: what is the rule for closing file descriptors after calling dup / dup2?
Let's say I want to perform cat < in > out.
fd[IN] = open("in", O_RDONLY);
saved_stdin = dup(STDIN_FILENO);
dup2(fd[IN], STDIN_FILENO);
close(fd[IN]);
fd[OUT] = open("out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
saved_stdout = dup(STDOUT_FILENO);
dup2(fd[OUT], STDOUT_FILENO);
close(fd[OUT]);
// Later on when I want to restore stdin and stdout
dup2(saved_stdin, STDIN_FILENO);
close(saved_stdin);
dup2(saved_stdout, STDOUT_FILENO);
close(saved_stdout);
Is this correct or should I be closing more file descriptors?
The rule is indeed quite simple. For both dup() variants, the following holds:
- The source fd remains open and has to be closed once it is no longer needed.
- The target file descriptor,
  - when using dup(), is always a previously unused one;
  - when using dup2(), is implicitly closed (if it was open) and replaced by a copy of the source fd.
- The new target fd has to be closed when it is no longer needed.
Source fd refers to the file descriptor to be duplicated, while target fd is the new file descriptor.
int new_fd = dup(source_fd);
dup2(source_fd, new_fd);
So yes, your code does the necessary closes, and no unneeded ones.
The figures come from CSAPP (System-Level I/O):
Figure 2: Before Redirect IO
dup2(4,1);
Figure 1: After Redirect IO
Notice that after the call to dup2, the refcnt of the file that fd 1 originally referred to has dropped to 0.
The description of close in the Linux manual says:
if the file descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.
close decreases the refcnt of the open file, and using dup to create a new fd increases it. When we call close, the file is not closed immediately; only the reference count is decremented. The file is closed (and, if it has been unlinked, deleted) when the refcnt reaches 0.
So it's really like reference counting in memory management.
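The reference-counting behaviour can be observed with a pipe. The sketch below (pipe_survives_first_close is a hypothetical name) closes one of two descriptors for the write end and checks that the pipe only reports end-of-file once the second one is closed too:

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if the pipe stays writable through a duplicate after the
 * original write descriptor is closed, and end-of-file only shows up once
 * the duplicate is closed as well; 0 otherwise; -1 on setup failure. */
int pipe_survives_first_close(void)
{
    int p[2];
    char c = 0;
    if (pipe(p) < 0)
        return -1;
    int copy = dup(p[1]);   /* two descriptors, one open file */
    if (copy < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    assert(copy != p[1]);
    close(p[1]);            /* reference count drops from 2 to 1 */
    int ok = write(copy, "x", 1) == 1 && read(p[0], &c, 1) == 1 && c == 'x';
    close(copy);            /* last reference gone: the write side really closes */
    ok = ok && read(p[0], &c, 1) == 0;  /* so the reader now sees end-of-file */
    close(p[0]);
    return ok;
}
```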

how to use the system call dup?

I am trying to understand how the system call dup() works. I am asking this question because I am writing a shell in C and I need to redirect the STDOUT to a file. Is this the right way to do it?
If for example I have the following code:
remember = dup(STDOUT_FILENO);
fileDescriptor = open("file.txt",O_RDONLY);
then everything that writes to the stdout will now write to the opened file?
As soon as the following line is executed:
remember = dup(STDOUT_FILENO);
STDOUT_FILENO is removed from the table of file descriptors leaving the first spot empty. When a new file is opened, the earliest empty file descriptor will be appointed to this new opened file, so in this case 1.
Nope. You just duplicate the file descriptor for stdout.
With the code you have so far, you could now do a write to remember, and the output would go to console, too:
const char *str = "this now goes to console, too!";
write(remember, str, strlen(str));
If you want to redirect console output, you still have to do this:
dup2(fileDescriptor, STDOUT_FILENO);
This will close STDOUT_FILENO (but you have a duplicate in remember to restore it, if need be) and overwrite it with fileDescriptor; from now on, console output goes to the file...
If you never intend to restore output to the console, you can omit the first call to dup entirely...
Edit (in response to your edit):
STDOUT_FILENO is removed from the table of file descriptors leaving the first spot empty. When a new file is opened, the earliest empty file descriptor will be appointed to this new opened file, so in this case 1.
This applies for close(STDOUT_FILENO)!
So, back to the case where you do not want to restore: you could also do this:
close(STDOUT_FILENO);
fileDescriptor = open("file.txt", O_WRONLY | O_CREAT, 0644);
// fileDescriptor will be 1 now
By the way: you must open your file with write access enabled (O_WRONLY or O_RDWR), as you want to write to that file (redirect output to it)!
And you need the O_CREAT flag in case the file does not exist yet; O_CREAT also requires a third argument giving the permission bits. If you want to clear an existing file first, add O_TRUNC; if you want to append to it instead, add the O_APPEND flag (see open).
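The "lowest free descriptor" rule that the close-then-open trick relies on can be checked without touching stdout at all. A minimal sketch (open_reuses_lowest_free_slot is an invented name, and /dev/null is just a convenient file to open):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the slot that was just closed,
 * 0 if it does not, and -1 if a setup step fails. */
int open_reuses_lowest_free_slot(void)
{
    int a = open("/dev/null", O_WRONLY);    /* lowest free descriptor */
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_WRONLY);    /* next free one */
    if (b < 0) {
        close(a);
        return -1;
    }
    assert(b > a);
    close(a);                               /* a is the lowest free slot again */
    int c = open("/dev/null", O_WRONLY);
    if (c < 0) {
        close(b);
        return -1;
    }
    int reused = (c == a);
    close(b);
    close(c);
    return reused;
}
```

This only holds when no other thread opens or closes descriptors in between, which is one reason dup2 is usually preferred over close followed by open or dup.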
No, dup is used to duplicate an existing descriptor; it returns a duplicate whose value is the lowest possible among the free descriptors (some docs say "The new descriptor returned by the call is the lowest numbered descriptor currently not in use by the process."), so:
fileDescriptor = open("file.txt",O_RDONLY); // get new desc.
close(STDIN_FILENO); // close 0
dup(fileDescriptor); // dup new desc to 0 (lowest free desc).
// here fileDescriptor and 0 are aliases to the same opened file
close(fileDescriptor); // free unused desc.

Redirecting STDIN/OUT/ERR

I'm trying to create a Linux daemon in C and found some sample code on this page.
I understand all the code except where it tries to redirect STDIN, STDOUT and STDERR (to /dev/null). I also found a number of questions on here related to why these should be redirected (which I understand).
Specifically the section of code my question relates to is:
/* Route I/O connections */
/* Open STDIN */
i = open("/dev/null", O_RDWR);
/* STDOUT */
dup(i);
/* STDERR */
dup(i);
Reading the man page for dup() it implies that dup() simply duplicates a file descriptor.
So I don't understand how this does the redirect. Is the compiler taking hints from the comments in the line above, is it missing some code, is it plain wrong, or am I missing something?
It's important to understand the previous bit of the example code you link to:
/* close all descriptors */
for (i = getdtablesize(); i >= 0; --i)
{
close(i);
}
This closes all open file descriptors including STDIN, STDOUT and STDERR.
As the manpage for open() states
The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process
So the subsequent call to open() in the example code will redirect file descriptor 0 which is STDIN, to /dev/null.
The subsequent calls to dup() will duplicate the file descriptor using the next lowest numbers. STDOUT is 1, and STDERR is 2.
The manpage for dup() states:
The dup() system call creates a copy of the file descriptor oldfd, using the lowest-numbered unused descriptor for the new descriptor
From the man page of dup:
The dup(oldfd) system call creates a copy of the file descriptor oldfd,
using the lowest-numbered unused descriptor for the new descriptor.
If you see the referenced code, he is first closing all the open file descriptors:
for (i = getdtablesize(); i >= 0; --i)
{
close(i);
}
After that, the call to open("/dev/null", O_RDWR) returns the lowest available descriptor, which is 0 (stdin). Calling dup(i) then copies it to the next lowest free descriptor, 1 (stdout), and calling it again copies it to descriptor 2 (stderr). In this way, the stdin, stdout, and stderr of the daemon process are all pointing to /dev/null.
Every process gets three open file descriptors which are the stdin, stdout, and stderr (these descriptors usually have the values 0, 1, and 2 respectively). When you call printf(), for example, it writes to the file pointed to by the stdout descriptor. By pointing this descriptor to another file (such as /dev/null), any output from this process will get redirected to that file. Same logic applies for stdin and stderr.
On the shell, when you run something like ls > ls.out, the shell does the same. It fork()s a new process, opens ls.out for writing, and calls dup (or dup2) to copy the file descriptor of ls.out to this process' stdout.
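The daemon sample relies on every descriptor having been closed first. A variant that does not depend on that, sketched here with dup2 and explicit targets (both function names are hypothetical; the check runs in a child so the caller's own stdio is untouched):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Points descriptors 0, 1 and 2 at /dev/null. Returns 0 or -1. */
int redirect_stdio_to_devnull(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    for (int target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        if (dup2(fd, target) < 0) {
            if (fd > STDERR_FILENO)
                close(fd);
            return -1;
        }
    }
    if (fd > STDERR_FILENO)
        close(fd);      /* keep only 0, 1 and 2 */
    return 0;
}

/* Forks, redirects in the child, and verifies with fstat that all three
 * descriptors refer to the /dev/null device. Returns 0 if they do. */
int check_redirect_in_child(void)
{
    struct stat null_st;
    if (stat("/dev/null", &null_st) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (redirect_stdio_to_devnull() < 0)
            _exit(1);
        for (int fd = 0; fd <= 2; fd++) {
            struct stat st;
            if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode) ||
                st.st_rdev != null_st.st_rdev)
                _exit(2);
        }
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```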

where stdin / stdout created

In C (ANSI), we say that input is taken by (s/v/f)scanf and stored in stdin, and we talk about stdout in the same way.
I wonder where they reside in Linux (Unix), i.e. under which folder.
Or are they (stdin / stdout) arbitrary (that is, no such things exist)?
They are streams created for your process by the operating system. There is no named file object associated with them, and so they do not have a representation within the file system, although as unwind points out, they may be accessed via a pseudo file system if your UNIX variant supports such a thing.
stdin is a FILE * referring to the stdio (standard io) structure that is tied to the file descriptor 0. File descriptors are what Unix-like systems, such as Linux, use to talk with applications about particular file-like things. (Actually, I'm pretty sure that Windows does this as well).
File descriptor 0 may refer to any type of file, but to make sense it must be one that read can be called on (for example a regular file, a stream socket, a character device opened for reading, or the read side of a pipe, as opposed to, say, a directory).
Processes in Unix-like systems inherit their open file descriptors from their parent process. So to run a program with stdin set to something besides the parent's stdin you would do:
int new_stdin = open("new_stdin_file", O_RDONLY);
pid_t fk = fork();
if (!fk) { // in the child
dup2(new_stdin, 0);
close(new_stdin);
execl("program_name", "program_name", NULL);
_exit(127); // should not have gotten here, and calling exit (without _ ) can have
// side effects because it runs atexit registered functions, and we
// don't want that here
} else if (fk < 0) {
// in parent with error from fork
} else {
// in parent with no error so fk = pid of child
}
close(new_stdin); // we don't need this anymore
dup2 duplicates the first file descriptor argument as the second (closing the second before doing so if it were open for the current process).
fork creates a duplicate of the current process. execl is one of the exec family of functions, which use the execve system call to replace the current program with another program. The combination of fork and exec is how programs are generally run (even when hidden within other functions).
In the above example we could have run the new program with stdin set to the read end of a pipe, a tty (serial port / TeleTYpe), or several other things. Some of these have names present in the filesystem and others do not (like some pipes and sockets, though some do have names in the filesystem).
Linux makes /proc/self/fd/0 a symbolic link to the file opened as 0 in the current process. For an arbitrary process, the same link is /proc/%i/fd/0 (in printf syntax, with the pid, or process ID, substituted for %i). These symbolic links are often usable to find the real file in the filesystem (using the readlink system call), but if the file does not actually exist in the filesystem the link data (what would usually be a file name) instead is just a string that tells a little bit about the file.
I should point out here that a file that stdin (fd 0) refers to, even if it is in the filesystem, may not have just one name. It may have more than one hard link, so it would have more than one name -- and each of these would be just as much its name as any other hard link. Additionally it may have no name at all if all of its hard links have been unlinked since it was opened, though its data would still live on the disk until all open file descriptors for it are closed.
If you don't actually need to know where it is in the filesystem, but just want some data about it you can use the fstat system call. This is like the stat system call and command line utility, except for already open files.
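As a rough illustration of what fstat can tell you about an already open descriptor, here is a sketch that classifies it by file type (fd_kind and its return strings are made up for this example):

```c
#define _POSIX_C_SOURCE 200809L /* for S_ISSOCK under strict -std= modes */
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns a short description of what kind of file fd refers to. */
const char *fd_kind(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return "not open";
    if (S_ISREG(st.st_mode))
        return "regular file";
    if (S_ISFIFO(st.st_mode))
        return "pipe or FIFO";
    if (S_ISCHR(st.st_mode))
        return "character device";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISBLK(st.st_mode))
        return "block device";
    return "other";
}
```

Calling fd_kind(STDIN_FILENO) would typically give "character device" on a terminal, "regular file" under ./myprog < somefile, and "pipe or FIFO" at the receiving end of a shell pipeline.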
Everything I said here about stdin (fd 0) should be applicable to stdout (fd 1) and stderr (fd 2) except that they will both be writable rather than readable.
If you want to know more about any of the functions I mentioned be sure to look them up in the man pages by typing:
man fork
on the command line. Most functions I mentioned are in section 2 of the man pages, but one or two may be in section one, so man 2 fork will work too, and may be useful when a command line tool has the same name as a function.
In Linux, you can generally find stdin through the /proc file system in /proc/self/fd/0, and stdout is /proc/self/fd/1.
stdin is standard input - for example, keyboard input.
stdout is standard output - for example, monitor.
If you run:
./myprog < /etc/passwd
then stdin exists in the filesystem as /etc/passwd. If you just run
./myprog
interactively on a terminal, then stdin exists in the filesystem as whatever your terminal device is (probably /dev/pts/5 or something).
If you run
cat /etc/passwd | ./myprog
then stdin is an anonymous pipe and has no instantiation in the filesystem, but Linux allows you to get at it via /proc/12345/fd/0 where 12345 is the pid of myprog.

Behavior of a pipe after a fork()

When reading about pipes in Advanced Programming in the UNIX Environment, I noticed that after a fork the parent can close() the read end of a pipe and it doesn't close the read end for the child. When a process forks, do its file descriptors get retained?
What I mean by this is that before the fork the pipe read file descriptor had a retain count of 1, and after the fork 2. When the parent closed its read side the fd went to 1 and is kept open for the child. Is this essentially what is happening? Does this behavior also occur for regular file descriptors?
As one can read on the man page about fork():
The child process shall have its own copy of the parent's file
descriptors. Each of the child's file
descriptors shall refer to the same
open file description with the
corresponding file descriptor of the
parent.
So yes, the child has an exact copy of the parent's file descriptors, and that applies to all of them, including open files.
The answer is yes, and yes (the same applies to all file descriptors, including things like sockets).
In a fork() call, the child gets its own separate copy of each file descriptor, and each copy acts as if it had been created by dup(). A close() only closes the specific file descriptor that was passed, so for example if you do n2 = dup(n); close(n);, the file (pipe, socket, device...) that n was referring to remains open. The same applies to file descriptors duplicated by a fork().
Yes, a fork duplicates all open file descriptors.
So for a typical pipe, a 2 slot array (int fd[2]), fd[0] is the same for the parent and child, and so is fd[1].
You can create a pipe without forking at all, and read/write to yourself by using fd[0] and fd[1] in one process.
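For instance, a single process can send itself a short message through a pipe. This is only a sketch; pipe_round_trip is a hypothetical name, and the message must be small enough to fit in the pipe buffer, since nobody else is reading:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes msg into a pipe and reads it back in the same process.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    assert(msg != NULL && out != NULL);
    int fd[2];
    if (pipe(fd) < 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t n = -1;
    /* A write larger than the pipe buffer would block forever here,
     * because this process is also the only reader. */
    if (len <= out_size && write(fd[1], msg, len) == (ssize_t)len)
        n = read(fd[0], out, out_size);
    close(fd[0]);
    close(fd[1]);
    return n;
}
```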

Resources