Open stdout of new process as file descriptor - c

Somewhere I've seen a method which lets you open the output of a program like a file. I vaguely remember that it involved opening a pseudo-filename which contains a pipe | symbol, but I was unable to find it on the internet. I think it looked something like this:
int fd = open("| some_program", O_RDONLY);
read(fd, buffer, 100); // read 100 bytes of some_program output
This would:
Start some_program as a new process
Create a new pipe attached to the stdout of some_program
Return the other end of the pipe as a file descriptor
Does this method exist somewhere (maybe only in a specific language, not C) or did I just dream it up?

Related

Can a file descriptor be duplicated multiple times?

I've been looking for quite a while and cannot find the answer to my question.
I'm trying to reproduce a shell in C, with full redirections. In order to do this, I wanted to open the file before executing my command.
For example, in ls > file1 > file2, I use dup2(file1_fd, 1) and dup2(file2_fd, 1) and then I execute ls to fill the files, but it seems a standard output can only be open once so only file2 will be filled, because it was the last one to be duplicated.
Is there a way to redirect standard output to multiple file?
Is there something I am missing? Thanks!
Is there a way to redirect standard output to multiple files?
Many file descriptors cannot be made one file descriptor. You need to write into each file descriptor separately. This is what tee utility does for you.
What you are asking for is the exact reason why the tee command exists (you can take a look at its source code here).
You cannot duplicate a file descriptor using dup2() multiple times. As you already saw, the last one overwrites any previous duplication. Therefore you cannot redirect the output of a program to multiple files directly using dup2().
In order to do this, you really need multiple descriptors, and therefore you would have to open both files, launch the command using popen() and then read from the pipe and write to both files.
Here is a very simple example of how you could do it:
#include <stdio.h>
#include <stdlib.h>
#define N 4096
int main(int argc, const char *argv[]) {
FILE *fp1, *fp2, *pipe;
fp1 = fopen("out1.txt", "w");
if (fp1 == NULL) {
perror("fopen out1 failed");
return 1;
}
fp2 = fopen("out2.txt", "w");
if (fp2 == NULL) {
perror("fopen out2 failed");
return 1;
}
// Run `ls -l` just as an example.
pipe = popen("ls -l", "r");
if (pipe == NULL) {
perror("popen failed");
return 1;
}
size_t nread, nwrote;
char buf[N];
while ((nread = fread(buf, 1, N, pipe))) {
nwrote = 0;
while (nwrote < nread)
nwrote += fwrite(buf + nwrote, 1, nread - nwrote, fp1);
nwrote = 0;
while (nwrote < nread)
nwrote += fwrite(buf + nwrote, 1, nread - nwrote, fp2);
}
pclose(pipe);
fclose(fp2);
fclose(fp1);
return 0;
}
The above code is only to give a rough estimate on how the whole thing works, it doesn't check for some errors on fread, fwrite, etc: you should of course check for errors in your final program.
It's also easy to see how this could be extended to support an arbitrary number of output files (just using an array of FILE *).
Standard output is not different from any other open file, the only special characteristic is for it to be file descriptor 1 (so only one file descriptor with index 1 can be in your process) You can dup(2) file descriptor 1 to get, let´s say file descriptor 6. That's the mission of dup() just to get another file descriptor (with a different number) than the one you use as source, but for the same source. Dupped descriptors allow you to use any of the dupped descriptors indifferently to output, or to change open flags like close on exec flag or non block or append flag (not all are shared, I'm not sure which ones can be changed without affecting the others in a dup). They share the file pointer, so every write() you attempt to any of the file descriptors will be updated in the others.
But the idea of redirection is not that. A convention in unix says that every program will receive three descriptors already open from its parent process. So to use forking, first you need to consider how to write notation to express that a program will receive (already opened) more than one output stream (so you can redirect any of them properly, before calling the program) The same also applies for joining streams. Here, the problem is more complex, as you'll need to express how the data flows might be merged into one, and this makes the merging problem, problem dependant.
File dup()ping is not a way to make a file descriptor to write in two files... but the reverse, it is a way to make two different file descriptors to reference the same file.
The only way to do what you want is to duplicate write(2) calls on every file descriptor you are going to use.
As some answer has commented, tee(1) command allows you to fork the flow of data in a pipe, but not with file descriptors, tee(1) just opens a file, and write(2)s there all the input, in addition to write(2)`ing it to stdout also.
There's no provision to fork data flows in the shell, as there's no provision to join (in paralell) dataflows on input. I think this is some abandoned idea in the shell design by Steve Bourne, and you'll probably get to the same point.
BTW, just study the possibility of using the general dup2() operator, which is <n>&m>, but again, consider that, for the redirecting program, 2>&3 2>&4 2>&5 2>&6 mean that you have pre-opened 7 file descriptors, 0...6 in which stderr is an alias of descpritors 3 to 6 (so any data written to any of those descriptors will appear into what was stderr) or you can use 2<file_a 3<file_b 4<file_c meaning your program will be executed with file descriptor 2 (stderr) redirected from file_a, and file descriptors 3 and 4 already open from files file_b and file_c. Probably, some notation should be designed (and it doesn't come easily to my mind now, how to devise it) to allow for piping (with the pipe(2) system call) between different processes that have been launched to do some task, but you need to build a general graph to allow for generality.

What are the rules of closing file descriptors after calling dup/dup2?

I feel like this is a topic I've taken for granted. In the past I literally just closed as many file descriptors "because I was told to". Most of the time this worked, but occasionally I ran into some unpredictable behaviour.
Thus, I'd like to ask - what the rule for closing file descriptors after calling dup / dup2?
Let's say I want to perform cat < in > out.
fd[IN] = open("in", O_RDONLY);
saved_stdin = dup(STDIN_FILENO);
dup2(fd[IN], STDIN_FILENO);
close(fd[IN])
fd[OUT] = open("out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
saved_stdout = dup(STDOUT_FILENO);
dup2(fd[OUT], STDOUT_FILENO);
close(fd[OUT])
// Later on when I want to restore stdin and stdout
dup2(saved_stdin, STDIN_FILENO);
close(saved_stdin);
dup2(saved_stdout, STDINOUT_FILENO);
close(saved_stdout);
Is this correct or should I be closing more file descriptors?
The rule is indeed quite simple. For both dup() variants, it is true, that:
The source fd remains open and will have to be closed once it is no longer needed.
The target file descriptor,
when using dup(), is always an unused one
when using dup2(), is implicitly closed and replaced by a copy of the source fd.
The new target fd has to be closed, when it is no longer needed.
Source fd refers to the file descriptor to be duplicated, while target fd is the new file descriptor.
int new_fd = dup(source_fd);
dup2(source_fd, new_fd);
So yes, your code does the necessary closes, and no unneeded ones.
The figures is come from CSAPP System-Level:
Figure 2: Before Redirect IO
dup2(4,1);
Figure 1: After Redirect IO
Notice the refcnt of fd 1 has changed to 0, after call dup2.
According to the description of close in linux manual. It said:
if the file descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.
close be used to decrease the refcnt of opened file. we use dup to create a new fd will increase the refcnt of opened file. when we call close function, it did't close the file immediately, it only decrease the reference count of file. the file will be close/delete when the refcnt is 0.
So it's really like the Reference counting for memory management.

how to use the system call dup?

I am trying to understand how the system call dup() works. I am asking this question because I am writing a shell in C and I need to redirect the STDOUT to a file. Is this the right way to do it?
If for example I have the following code:
remember = dup(STDOUT_FILENO);
fileDescriptor = open("file.txt",O_RDONLY);
then everything that writes to the stdout will now write to the opened file?
As soon as the following line is executed:
remember = dup(STDOUT_FILENO);
STDOUT_FILENO is removed from the table of file descriptors leaving the first spot empty. When a new file is opened, the earliest empty file descriptor will be appointed to this new opened file, so in this case 1.
Nope. You just duplicate the file descriptor for stdout.
With the code you have so far, you could now do a write to remember, and the output would go to console, too:
char str = "this now goes to console, too!";
write(remember, str, strlen(str));
If you want to redirect console output, you yet have to do this:
dup2(fileDescriptor, STDOUT_FILENO);
This will close STDOUT_FILENO (but you have a duplicate in remember to restore it, if need be) and overwrite it with fileDescriptor – and from now on, console output goes to file...
If you don't ever consider to restore outputting to console, you can ommit the first call to dup entirely...
Edit (in response to your edit):
STDOUT_FILENO is removed from the table of file descriptors leaving the first spot empty. When a new file is opened, the earliest empty file descriptor will be appointed to this new opened file, so in this case 1.
This applies for close(STDOUT_FILENO)!
So back to if not wanting to restore: You could then do, too:
close(STDOUT_FILENO);
fileDescriptor = open("file.txt",O_WRONLY | O_CREAT);
// fileDescriptor will be 1 now
By the way: You must open your file with write access enabled (O_WRONLY or O_RDWR), as you want to write to that file (redirect output to)!
And you need the O_CREAT flag for the case the file does not exist yet. If you do not want to clear the file, but append to, add the O_APPEND flag (see open).
No dup is used to duplicate an existing descriptor and returning a duplicate whose value is the less possible among free descriptors (some docs says ``The new descriptor returned by the call is the lowest
numbered descriptor currently not in use by the process.''), so:
fileDescriptor = open("file.txt",O_RDONLY); // get new desc.
close(STDIN_FILENO); // close 0
dup(fileDescriptor); // dup new desc to 0 (less possible free desc).
// here fileDescriptor and 0 are aliases to the same opened file
close(fileDescriptor); // free unused desc.

Redirecting stdout to socket

I am trying to redirect stdout to a socket. I do something like this:
dup2(new_fd, STDOUT_FILENO);
After doing so all stdio functions writing to the stdout fail. I have tried to reopen stdout this way:
fclose(stdout);
stdout = fdopen(STDOUT_FILENO, "wb");
But printf and other functions still don't work.
EDIT:
I am affraid that I misunderstood the problem at the first place. After some more debugging I've figured out that this is a real issue:
printf("Test"); // We get Broken pipe here
// Reconnect new_fd
dup2(new_fd, STDERR_FILENO);
printf("Test"); // This also returns Broken pipe despite that stdout is fine now
Thanks.
1: on dup2(src, dst)
A number of operating systems track open files through the use of file descriptors. dup2 internally duplicates a file descriptor from the src to dst, closing dst if its already open.
What your first statement is doing is making every write to STDOUT_FILENO to go to the object represented by new_fd. I say object because it could be a socket as well as a file.
I don't see anything wrong with your first line of code, but I don't know how new_fd is defined.
2: on reopening stdout
When you close a file descriptor, the OS removes it from its table. However, when you open a file descriptor, the OS sets the smallest available file descriptor as the returned value. Thus, to reopen stdout, all you need to do is reopen the device. I believe the device changes depending on the OS. For example, on my Mac the device is /dev/tty.
Therefore, to reopen the stdout, you want to do the following:
close(1);
open("/dev/tty", O_WRONLY);
I've solved the problem by clearing a stdio's error indicator after fixing stdout:
clearerr(stdout);
Thanks for your help.

dup2 / dup - Why would I need to duplicate a file descriptor?

I'm trying to understand the use of dup2 and dup.
From the man page:
DESCRIPTION
dup and dup2 create a copy of the file descriptor oldfd. After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however. dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
RETURN VALUE
dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
Why would I need that system call? What is the use of duplicating the file descriptor? If I have the file descriptor, why would I want to make a copy of it? I'd appreciate it if you could explain and give me an example where dup2 / dup is needed.
The dup system call duplicates an existing file descriptor, returning a new one that
refers to the same underlying I/O object.
Dup allows shells to implement commands like this:
ls existing-file non-existing-file > tmp1 2>&1
The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1. (i.e stderr & stdout point to same fd).
Now the error message for calling ls on non-existing file and the correct output of ls on existing file show up in tmp1 file.
The following example code runs the program wc with standard input connected
to the read end of a pipe.
int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;
pipe(p);
if(fork() == 0) {
close(STDIN); //CHILD CLOSING stdin
dup(p[STDIN]); // copies the fd of read end of pipe into its fd i.e 0 (STDIN)
close(p[STDIN]);
close(p[STDOUT]);
exec("/bin/wc", argv);
} else {
write(p[STDOUT], "hello world\n", 12);
close(p[STDIN]);
close(p[STDOUT]);
}
The child dups the read end onto file descriptor 0, closes the file de
scriptors in p, and execs wc. When wc reads from its standard input, it reads from the
pipe.
This is how pipes are implemented using dup, well that one use of dup now you use pipe to build something else, that's the beauty of system calls,you build one thing after another using tools which are already there , these tool were inturn built using something else so on ..
At the end system calls are the most basic tools you get in kernel
Cheers :)
Another reason for duplicating a file descriptor is using it with fdopen. fclose closes the file descriptor that was passed to fdopen, so if you don't want the original file descriptor to be closed, you have to duplicate it with dup first.
dup is used to be able to redirect the output from a process.
For example, if you want to save the output from a process, you duplicate the output (fd=1), you redirect the duplicated fd to a file, then fork and execute the process, and when the process finishes, you redirect again the saved fd to output.
Some points related to dup/dup2 can be noted please
dup/dup2 - Technically the purpose is to share one File table Entry inside a single process by different handles. ( If we are forking the descriptor is duplicated by default in the child process and the file table entry is also shared).
That means we can have more than one file descriptor having possibly different attributes for one single open file table entry using dup/dup2 function.
(Though seems currently only FD_CLOEXEC flag is the only attribute for a file descriptor).
http://www.gnu.org/software/libc/manual/html_node/Descriptor-Flags.html
dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0);
dup2(fildes, fildes2); is equivalent to
close(fildes2);
fcntl(fildes, F_DUPFD, fildes2);
Differences are (for the last)- Apart from some errno value beteen dup2 and fcntl
close followed by fcntl may raise race conditions since two function calls are involved.
Details can be checked from
http://pubs.opengroup.org/onlinepubs/009695399/functions/dup.html
An Example of use -
One interesting example while implementing job control in a shell, where the use of dup/dup2 can be seen ..in the link below
http://www.gnu.org/software/libc/manual/html_node/Launching-Jobs.html#Launching-Jobs

Resources