Can a file descriptor be duplicated multiple times? - c

I've been looking for quite a while and cannot find the answer to my question.
I'm trying to reproduce a shell in C, with full redirections. In order to do this, I wanted to open the file before executing my command.
For example, for ls > file1 > file2, I use dup2(file1_fd, 1) and then dup2(file2_fd, 1) before executing ls to fill the files, but it seems standard output can only point to one file at a time, so only file2 gets filled, because it was the last one duplicated onto descriptor 1.
Is there a way to redirect standard output to multiple files?
Is there something I am missing? Thanks!

Is there a way to redirect standard output to multiple files?
Multiple file descriptors cannot be merged into one file descriptor. You need to write to each file descriptor separately, which is exactly what the tee utility does for you.

What you are asking for is the exact reason why the tee command exists (you can take a look at its source code here).
You cannot make file descriptor 1 refer to several files by calling dup2() repeatedly. As you already saw, each call overwrites the previous duplication. Therefore you cannot redirect the output of a program to multiple files directly using dup2().
In order to do this, you really need to keep the descriptors separate: open both files, launch the command using popen(), and then read from the pipe and write to both files.
Here is a very simple example of how you could do it:
#include <stdio.h>
#include <stdlib.h>

#define N 4096

int main(int argc, const char *argv[]) {
    FILE *fp1, *fp2, *pipe;

    fp1 = fopen("out1.txt", "w");
    if (fp1 == NULL) {
        perror("fopen out1 failed");
        return 1;
    }

    fp2 = fopen("out2.txt", "w");
    if (fp2 == NULL) {
        perror("fopen out2 failed");
        return 1;
    }

    // Run `ls -l` just as an example.
    pipe = popen("ls -l", "r");
    if (pipe == NULL) {
        perror("popen failed");
        return 1;
    }

    size_t nread, nwrote;
    char buf[N];

    while ((nread = fread(buf, 1, N, pipe))) {
        nwrote = 0;
        while (nwrote < nread)
            nwrote += fwrite(buf + nwrote, 1, nread - nwrote, fp1);

        nwrote = 0;
        while (nwrote < nread)
            nwrote += fwrite(buf + nwrote, 1, nread - nwrote, fp2);
    }

    pclose(pipe);
    fclose(fp2);
    fclose(fp1);
    return 0;
}
The above code only gives a rough idea of how the whole thing works; it doesn't check for errors on fread, fwrite, etc. You should of course check for errors in your final program.
It's also easy to see how this could be extended to support an arbitrary number of output files (just using an array of FILE *).

Standard output is no different from any other open file; its only special characteristic is that it is file descriptor 1 (so only one file descriptor with index 1 can exist in your process). You can dup(2) file descriptor 1 to get, let's say, file descriptor 6. That's the mission of dup(): to get another file descriptor (with a different number) than the one you use as source, but for the same underlying open file. Dupped descriptors can be used interchangeably for output. Note that not all flags are shared: the close-on-exec flag is per-descriptor, while status flags such as non-blocking or append belong to the shared open file description. The descriptors also share the file offset, so every write() you do through one of them advances the position seen by the others.
But the idea of redirection is not that. A Unix convention says that every program receives three descriptors already open from its parent process. So, to use forking, you first need to devise a notation expressing that a program will receive more than one (already opened) output stream, so that you can redirect any of them properly before calling the program. The same applies to joining streams. There, the problem is harder still, as you need to express how the data flows are merged into one, and that makes merging problem-dependent.
File dup()ping is not a way to make one file descriptor write to two files... it is the reverse: a way to make two different file descriptors reference the same file.
The only way to do what you want is to issue a separate write(2) call on every file descriptor you want the data to go to.
As another answer has noted, the tee(1) command lets you fork the flow of data in a pipe, but not through file descriptors alone: tee(1) just opens a file and write(2)s all of its input there, in addition to write(2)ing it to stdout.
There's no provision for forking data flows in the shell, just as there's no provision for joining (in parallel) data flows on input. I think this is an idea abandoned in Steve Bourne's shell design, and you'll probably arrive at the same point.
BTW, do study the shell's general dup2() operator, which is n>&m, but again consider that, for the redirected program, 2>&3 2>&4 2>&5 2>&6 means you have pre-opened 7 file descriptors, 0...6, in which stderr is an alias of descriptors 3 to 6 (so any data written to any of those descriptors will appear in what was stderr); or you can use 2<file_a 3<file_b 4<file_c, meaning your program will be executed with file descriptor 2 (stderr) redirected from file_a, and file descriptors 3 and 4 already open on file_b and file_c. Probably some notation should be designed (and how to devise it doesn't come easily to mind right now) to allow piping (with the pipe(2) system call) between different processes launched to do some task, but you would need to build a general graph to achieve full generality.

Related

How to reserve a file descriptor?

I'm writing a curses-based program. In order to make it simpler for me to find errors in this program, I would like to produce debug output. Due to the program already displaying a user interface on the terminal, I cannot put debugging output there.
Instead, I plan to write debugging output to file descriptor 3 unconditionally. You can invoke the program as program 3>/dev/ttyX with /dev/ttyX being a different teletype to see the debugging output. When file descriptor 3 is not opened, write calls fail with EBADF, which I ignore like all errors when writing debugging output.
A problem occurs when I open another file and no debugging output has been requested (i.e. file descriptor 3 has not been opened). In this case, the newly opened file might receive file descriptor 3, causing debugging output to randomly corrupt a file I just opened. This is a bad thing. How can I avoid this? Is there a portable way to mark a file descriptor as “reserved” or such?
Here are a couple of ideas I had and their problems:
I could open /dev/null or a temporary file to file descriptor 3 (e.g. by means of dup2()) before opening any other file. This works, but I'm not sure I can assume it always succeeds, as opening /dev/null may fail.
I could test whether file descriptor 3 is open and skip the debugging output if it isn't. This is problematic when I attempt to restart the program by calling exec, as a different file descriptor might have been opened (and not closed) prior to the exec call. I could intentionally close file descriptor 3 before calling exec when it has not been opened for debugging, but this feels really ugly.
Why use fd 3? Why not use fd 2 (stderr)? It already has a well-defined "I am logging of some sort" meaning, is always open (not strictly true, but sufficiently true in practice), and you can redirect it before starting your binary to get the logs where you want them.
Another option would be to log messages to syslog, using the LOG_DEBUG level. This entails calling syslog() instead of a normal write function, but that's simply making the logging more explicit.
A simple way of checking if stderr has been redirected or is still pointing at the terminal is by using the isatty function (example code below):
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (isatty(2)) {
        printf("stderr is not redirected.\n");
    } else {
        printf("stderr seems to be redirected.\n");
    }
}
In the very beginning of your program, open /dev/null and then assign it to file descriptor 3:
int fd = open ("/dev/null", O_WRONLY);
dup2(fd, 3);
This way, file descriptor 3 won't be taken.
Then, if needed, reuse dup2() to assign file descriptor 3 to your debugging output.
You claim you can't guarantee you can open /dev/null successfully, which is a little strange, but let's run with it. You should be able to use socketpair() to get a pair of FDs. You can then set the write end of the pair non-blocking, and dup2 it. You claim you are already ignoring errors on writes to this FD, so the data going in the bit-bucket won't bother you. You can of course close the other end of the socketpair.
Don't focus on a specific file descriptor value - you can't control it in a portable manner anyway. If you can control it at all. But you can use an environment variable to control debug output to a file:
int debugFD = getDebugFD();

...

int getDebugFD()
{
    const char *debugFile = getenv( "DEBUG_FILE" );

    if ( NULL == debugFile )
    {
        return( -1 );
    }

    int fd = open( debugFile, O_CREAT | O_APPEND | O_WRONLY, 0644 );

    // error checking can be here

    return( fd );
}
Now you can write your debug output to debugFD. I assume you know enough to make sure debugFD is visible where you need it, and also how to make sure it's initialized before trying to use it.
If you don't set a DEBUG_FILE environment variable, you get an invalid file descriptor and your debug writes fail, presumably silently.

Redirecting of stdout in bash vs writing to file in c with fprintf (speed)

I am wondering which option is basically quicker.
What interests me most is the mechanism of redirection. I suspect the file is opened at the start of the program (./program > file) and closed at the end. So every time the program outputs something, it is simply written to the file, as straightforward as that sounds. Is that so? Then I would guess both options are comparable in speed.
Or maybe it is more complicated process since the operating system has to perform more operations?
There is not much difference between those options (except that hard-coding the file as the only destination reduces your program's flexibility).
To compare the two approaches, let's check what stands behind the magical entity FILE*:
So in both cases we have a FILE* object, a file descriptor fd (a gateway to the OS kernel), and in-kernel infrastructure that provides access to files or user terminals. The two setups should behave the same (unless libc has some special initializer for stdout, or the kernel specially handles files with fd = 1).
How does bash redirection work compared with fopen()?
When bash redirects to a file:
fork() // new process is created
fd = open("file", ...) // open new file
close(1) // get rid of fd=1 pointing to /dev/pts device
dup2(fd, 1) // make fd=1 point to opened file
close(fd) // get rid of redundant fd
execve("a") // now "a" will have file as its stdout
// in a
stdout = fdopen(1, ...)
When you open the file on your own:
fork() // new process is created
execve("a") // now "a" will have file as its stdout
stdout = fdopen(1, ...)
my_file = fopen("file", ...) // which internally does:
fd = open("file", ...)
my_file = fdopen(fd, ...)
So as you can see, the main bash difference is twiddling with file descriptors.
Yes, you are right. The speed will be identical. The only difference in the two cases is which program opens and closes the file. When you redirect it using shell, it is the shell that opens the file and makes the handle available as stdout to the program. When the program opens the file, well, the program opens the file. After that, the handle is a file handle in both the cases, so there should be absolutely no difference in speed.
As a side remark, the program which writes to stdout can be used in more general ways. You can for example say
./program | ssh remotehost bash -c "cat > file"
which will cause the output of the program to be written to file on remotehost. Of course in this case there is no comparison like one you are making in the question.
stdout is a FILE handle, and fprintf writes to a FILE handle, so the speed will be very similar in both cases. In fact, printf("Some string") is equivalent to fprintf(stdout, "Some string"). I will say no more :)

auto delete file on linux

I am trying to make a file be deleted when a program ends. I remember that I could put the unlink() before the first close() and not need to reopen the file.
What I expect: The file is erased after the program ends.
What is happening: The file is erased as soon as the call to unlink() happens.
My sample program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_CREAT | O_WRONLY, 0644);
    int x = 1;

    write(fd, "1234\n", 5);
    close(fd);

    fd = open(argv[1], O_RDONLY);
    unlink(argv[1]);

    while (x <= 3)
    {
        int k;
        scanf(" %d", &k);
        x++;
    }

    close(fd);
    return 0;
}
Is there a way that I can open() the file, interact with it, and have it deleted from the hard disk on close()? I'm using Fedora Linux 18.
I need to know the name of the file I opened in this way because it will be used by another application.
Unlinking a file simply detaches the file name from the underlying inode, making it impossible to open the file using that file name afterwards.
If any process has the file still open, they can happily read and write it, as those operations operate on the inode and not the file name. Also, if there are hardlinks (other file names referring to the same inode) left, those other file names can be used to open the file just fine. See e.g. the Wikipedia article on inodes for further details.
Edited to add:
In Linux, you can leverage the /proc pseudofilesystem. If your application (with process ID PID) has file descriptor FD open, with the file name already unlinked, it can still let another application work on it by telling the other application to work on /proc/PID/fd/FD. It is a pseudo-file, meaning it looks like a (non-functioning!) symlink, but it is not -- it's just useful Linux kernel magic: as long as the other application just opens it normally (open()/fopen() etc., no lstat()/readlink() stuff), they will get access as if they were opening a normal file.
As a real-world example, open two terminals, and in one write
bash -c 'exec 3<>foobar ; echo $$ ; rm foobar ; echo "Initial contents" >&3 ; cat >&3'
The first line it outputs is the PID, and FD is 3 here. Anything you type (after pressing Enter) will be appended to a file that was briefly named foobar, but no longer exists. (You can easily verify that.)
In a second terminal, type
cat /proc/PID/fd/3
to see what that file contains.
It sounds like what you really want is tmpfile():
The tmpfile() function opens a unique temporary file in binary
read/write (w+b) mode. The file will be automatically deleted when it
is closed or the program terminates.
The file is unlinked, so it won't show up in ls, but it still exists: there is an inode, and in some cases you could even re-link it. The file won't be removed from the disk until all file descriptors pointing to it are closed.
You can still read from and write to the fd while it is open, even after it has been unlinked.

Reopen a file descriptor with another access?

Assume the OS is Linux. Suppose I opened a file for writing and got a file descriptor fdw. Is it possible to get another file descriptor fdr, with read-only access to the file, without calling open again? The reason I don't want to call open is that the underlying file may have been moved or even unlinked in the file system by other processes, so reusing the same file name is not reliable against such actions. So my question is: is there any way to open a file descriptor with different access rights, given only a file descriptor? dup or dup2 doesn't change the access rights, I think.
Yes! The trick is to access the deleted file via /proc/self/fd/n. It’s a linux-only trick, as far as I know.
Run this program:
#define _GNU_SOURCE   /* for asprintf() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    FILE* out_file;
    FILE* in_file;
    char* dev_fd_path;
    char buffer[128];

    /* Write “hi!” to test.txt */
    out_file = fopen("test.txt", "w");
    fputs("hi!\n", out_file);
    fflush(out_file);

    /* Delete the file */
    unlink("test.txt");

    /* Verify that the file is gone */
    system("ls test.txt");

    /* Reopen the filehandle in read-mode from /proc */
    asprintf(&dev_fd_path, "/proc/self/fd/%d", fileno(out_file));
    in_file = fopen(dev_fd_path, "r");
    if (!in_file) {
        perror("in_file is NULL");
        exit(1);
    }

    printf("%s", fgets(buffer, sizeof(buffer), in_file));
    return 0;
}
It writes some text to a file, deletes it, but keeps the file descriptor open, and then reopens it via a different route. Files aren’t actually deleted until the last process holding the last file descriptor closes it, and until then, you can get at the file contents via /proc.
Thanks to my old boss Anatoly for teaching me this trick when I deleted some important files that were fortunately still being appended to by another process!
No, the fcntl call will not let you change the read/write bits on an open file descriptor, and the only way to get a new file descriptor from an existing one is duplication. The calls dup/dup2/dup3 (and fcntl with F_DUPFD) do not allow you to change the file access mode.
NOTE: this is true for Linux, but not true for other Unixes in general. In HP-UX, for example, [see (1) and (2)] you are able to change the read/write bits with fcntl using F_SETFL on an open file descriptor. Since file descriptors created by dup share the same status flags, however, changing the access mode for one will necessarily change it for the other.

dup2 / dup - Why would I need to duplicate a file descriptor?

I'm trying to understand the use of dup2 and dup.
From the man page:
DESCRIPTION
dup and dup2 create a copy of the file descriptor oldfd. After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however. dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
RETURN VALUE
dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
Why would I need that system call? What is the use of duplicating the file descriptor? If I have the file descriptor, why would I want to make a copy of it? I'd appreciate it if you could explain and give me an example where dup2 / dup is needed.
The dup system call duplicates an existing file descriptor, returning a new one that
refers to the same underlying I/O object.
Dup allows shells to implement commands like this:
ls existing-file non-existing-file > tmp1 2>&1
The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1 (i.e., stderr and stdout refer to the same open file).
Now the error message from calling ls on the non-existing file and the correct ls output for the existing file both show up in the tmp1 file.
The following example code runs the program wc with standard input connected
to the read end of a pipe.
#define STDIN  0   /* index of the pipe's read end; also the stdin fd number */
#define STDOUT 1   /* index of the pipe's write end; also the stdout fd number */

int p[2];
char *argv[2];

argv[0] = "wc";
argv[1] = 0;

pipe(p);
if (fork() == 0) {
    close(STDIN);       // child closes stdin
    dup(p[STDIN]);      // lowest free fd is 0, so the pipe's read end becomes stdin
    close(p[STDIN]);
    close(p[STDOUT]);
    execv("/bin/wc", argv);
} else {
    write(p[STDOUT], "hello world\n", 12);
    close(p[STDIN]);
    close(p[STDOUT]);
}
The child dups the read end onto file descriptor 0, closes the file descriptors in p, and execs wc. When wc reads from its standard input, it reads from the pipe.
This is how pipes are implemented using dup. That is one use of dup; now you use pipe to build something else. That's the beauty of system calls: you build one thing after another using tools that are already there, and those tools were in turn built on something else, and so on.
In the end, system calls are the most basic tools you get from the kernel.
Cheers :)
Another reason for duplicating a file descriptor is using it with fdopen. fclose closes the file descriptor that was passed to fdopen, so if you don't want the original file descriptor to be closed, you have to duplicate it with dup first.
dup is also used to redirect the output of a process and restore it afterwards.
For example, if you want to save the output of a process, you duplicate the output (fd = 1), redirect fd 1 to a file, then fork and execute the process; when the process finishes, you restore the saved duplicate back onto fd 1.
Some points related to dup/dup2 may be worth noting:
dup/dup2 - Technically, the purpose is to share one file table entry inside a single process through different handles. (If we fork, the descriptor is duplicated by default in the child process, and the file table entry is shared as well.)
That means we can have more than one file descriptor, with possibly different attributes, for one single open file table entry, using the dup/dup2 functions.
(Though it seems FD_CLOEXEC is currently the only per-descriptor attribute.)
http://www.gnu.org/software/libc/manual/html_node/Descriptor-Flags.html
dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0);
dup2(fildes, fildes2); is equivalent to
close(fildes2);
fcntl(fildes, F_DUPFD, fildes2);
The differences (for the last pair) are: apart from some differing errno values between dup2 and fcntl,
close followed by fcntl may raise race conditions, since two function calls are involved.
Details can be checked from
http://pubs.opengroup.org/onlinepubs/009695399/functions/dup.html
An example of use:
An interesting example of dup/dup2, used while implementing job control in a shell, can be seen at the link below:
http://www.gnu.org/software/libc/manual/html_node/Launching-Jobs.html#Launching-Jobs
