How to clone a file descriptor? (not just duplicate it) - c

I want to clone a file descriptor, so that changing it with fcntl() does not change the original file descriptor.
Reopening the same path does not work in my case, because the file descriptor may point to a pipe or socket.
Background:
I want to read from an inherited file descriptor without blocking. But when I enable the O_NONBLOCK flag, the parent's file descriptor also becomes non-blocking, and if the parent or anything else using the same file description sets it back to blocking, every file descriptor referring to that file description becomes blocking, in all processes using it. A dup() call does not help either: a call to fcntl() on one descriptor changes both. The parent breaks when the file descriptor is non-blocking, and the child breaks when it is blocking.
I can't use recv() because it only works on sockets; the file descriptor can be a socket, but it can also be a regular file, a pipe or a FIFO.
I could try to set the file back to blocking before the child exits, but that may not work if the child exits in an unplanned way. I can't change the parent.

What you ask for cannot be done. Things like file position and certain I/O modes, including O_NONBLOCK, are properties of the open file description “beneath” the integer file descriptor.
Functions that newly acquire a file or file-like object (open, pipe, socket, etc.) allocate a new and distinct file description. However, as you’ve discovered, dup and fork and friends give you distinct descriptors which refer to a shared underlying description.

Related

Closing and reopening piped file descriptors for writing in c

I have a question about what happens if I close a file descriptor after writing to it (e.g. fd[1] after calling pipe(fd)) and then open it again to write. Will the data be overwritten, with all the previous data gone, or will it keep writing from the point where the first write stopped?
I used the system call open() with the file descriptor and no other arguments.
If you close either of the file descriptors for a pipe, it can never be reopened. There is no name by which to reopen it. Even with /dev/fd file systems, once you close the file descriptor, the corresponding entry in the file system is removed — you're snookered.
Don't close a pipe if you might need to use it again.
Consider whether to make a duplicate of the pipe before closing; you can then either use the duplicate directly or duplicate the duplicate back to the original (pipe) file descriptor, but that's cheating; you didn't actually close all the references to the pipe's file descriptor. (Note that the process(es) at the other end of the pipe won't get an EOF indication because of the close — there's still an open file descriptor referring to the pipe.)

Processes and a shared file descriptor

I have an application that creates multiple instances (processes) of itself and these processes have a shared data structure. In that struct there is a file descriptor used for logging data to file. There is a check in the logging function that checks to see if the file descriptor is -1 and if it is then it opens the file and sets the value of the shared file descriptor.
Other processes/threads do the same check, but by then the fd is != -1, so the file does not get opened. They then go on to write to the file. The write fails most of the time and returns -1. When the write did not fail, I checked the path of the fd using readlink: it was some file other than the log file.
I am assuming this is because, even though the file descriptor value was always 11 (even in subsequent runs), that value refers to a different file in each process. So it is just whatever file that process has at index 11? So the log file is not even regarded as open in these processes, and even if they did open the file, the fd would be different.
So my first question: is this correct? My second question: how do I re-implement this, given that multiple processes need to write to this log file? Would each process need to open the file, or is there a more efficient way? Do I need to close the file so that other processes can open and write to it?
EDIT:
The software is an open source software called filebench.
The file can be seen here.
The log method is filebench_log. Line 204 is the first check I mentioned, where the file is opened. The write happens at line 293. The fd value is 11 in all processes. It is actually shared through all processes and set up mostly here. The file is only opened once (verified via print statements).
The shared data struct that holds the fd is called filebench_shm, and the fd is filebench_shm->shm_log_fd.
EDIT 2:
The error message that I get is Bad file descriptor. Errno is 9.
EDIT 3:
So it seems that each process has a different index table for the fds. Wiki:
On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier.
So the issue that I am having is that for two processes with process IDs 101, 102 the file descriptor 11 is not the same for the two processes:
/proc/101/fd/11
/proc/102/fd/11
I have a shared data structure between these processes.. is there another way I can share an open file between them other than an fd since that doesn't work?
It seems that it would be simplest to open the file before spawning the new processes. This avoids all the coordination complexity regarding opening the file by centralizing it to one time and place.
I originally wrote this as a solution:
Create a shared memory segment.
Put the file descriptor variable in the segment.
Put a mutex semaphore in the segment.
Each process accesses the file descriptor in the segment. If it is not open, lock the semaphore, check again whether it is open, and if not, open the file. Release the mutex.
That way all processes share the same file descriptor.
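But this assumes that the underlying open file description is also in the shared memory, which it is not: it lives in the kernel, and the integer fd only means something in the process that opened it (and in children forked after it was opened).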
Instead, use the open then fork method mentioned in the other answer, or have each process open the file and use flock to serialize access when needed.

File descriptors before fork()

I know that if I call the open function before the fork(), the IO pointer is shared between the processes.
If one of these processes closes the file by calling close(fd), will the other processes still be able to read/write the file, or will the file be closed for everyone?
Yes, they can. Each process has its own copy of the file descriptor (among other things), so one process closing it won't affect the copy in the other process.
From fork() manual:
The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
From close() manual:
If fd is the last file descriptor referring to the underlying open file description (see open(2)), the resources associated with the open file description are freed; if the descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.
So if you call close(fd), it closes only that process's reference, and any other process holding a descriptor that refers to the same open file description can continue to operate on it.
Whenever a child process is created, it gets a copy of the parent's file descriptor table. Each open file description has a reference count: the number of file descriptors, in any process, that refer to it. So if a file is open in the parent process and a child process is created, the reference count is incremented, since the file is now open in the child as well; when either process closes its descriptor, the count is decremented. The open file description is released only when the reference count reaches zero.

dup2 / dup - Why would I need to duplicate a file descriptor?

I'm trying to understand the use of dup2 and dup.
From the man page:
DESCRIPTION
dup and dup2 create a copy of the file descriptor oldfd. After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however. dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
RETURN VALUE
dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
Why would I need that system call? What is the use of duplicating the file descriptor? If I have the file descriptor, why would I want to make a copy of it? I'd appreciate it if you could explain and give me an example where dup2 / dup is needed.
The dup system call duplicates an existing file descriptor, returning a new one that refers to the same underlying I/O object.
dup allows shells to implement commands like this:
ls existing-file non-existing-file > tmp1 2>&1
The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1 (i.e. stderr and stdout refer to the same open file).
Now both the error message from calling ls on the non-existing file and the normal output of ls on the existing file end up in tmp1.
The following example code runs the program wc with standard input connected to the read end of a pipe.
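#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int p[2];
    char *argv[] = { "wc", NULL };

    if (pipe(p) == -1) { perror("pipe"); return 1; }
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }
    if (pid == 0) {
        close(STDIN_FILENO);   // child closes stdin, freeing descriptor 0
        dup(p[0]);             // lowest free descriptor is 0, so the read end becomes stdin
        close(p[0]);
        close(p[1]);
        execvp("wc", argv);    // search PATH for wc
        perror("execvp");      // only reached if exec fails
        _exit(127);
    }
    write(p[1], "hello world\n", 12);
    close(p[0]);
    close(p[1]);               // wc sees EOF once every write end is closed
    wait(NULL);
    return 0;
}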
The child dups the read end onto file descriptor 0, closes the file descriptors in p, and execs wc. When wc reads from its standard input, it reads from the pipe.
This is how shell pipelines are implemented using dup, and that's just one use of it; you then use the pipe to build something else. That's the beauty of system calls: you build one thing after another using tools that are already there, and those tools were in turn built from something else, and so on.
In the end, system calls are the most basic tools the kernel gives you.
Cheers :)
Another reason for duplicating a file descriptor is using it with fdopen. fclose closes the file descriptor that was passed to fdopen, so if you don't want the original file descriptor to be closed, you have to duplicate it with dup first.
dup is used to redirect the output of a process.
For example, if you want to save the output of a process to a file, you duplicate the standard output (fd 1) to keep a copy of it, redirect fd 1 to the file, then fork and execute the process; when the process finishes, you restore fd 1 from the saved copy.
Some points to note about dup/dup2:
dup/dup2: technically, the purpose is to let different descriptors (handles) within a single process share one file table entry. (If we fork, the descriptors are duplicated in the child by default and the file table entry is also shared.)
That means dup/dup2 let us have more than one file descriptor, each possibly with different descriptor attributes, for one single open file table entry.
(Currently, though, FD_CLOEXEC seems to be the only per-descriptor attribute.)
http://www.gnu.org/software/libc/manual/html_node/Descriptor-Flags.html
dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0);
dup2(fildes, fildes2); is equivalent to
close(fildes2);
fcntl(fildes, F_DUPFD, fildes2);
Differences (for the latter): apart from some differing errno values between dup2 and fcntl,
close followed by fcntl may introduce race conditions, since two function calls are involved.
Details can be checked from
http://pubs.opengroup.org/onlinepubs/009695399/functions/dup.html
An example of use:
One interesting example is implementing job control in a shell; the use of dup/dup2 can be seen at the link below.
http://www.gnu.org/software/libc/manual/html_node/Launching-Jobs.html#Launching-Jobs

Behavior of a pipe after a fork()

When reading about pipes in Advanced Programming in the UNIX Environment, I noticed that after a fork the parent can close() the read end of a pipe without closing the read end for the child. When a process forks, do its file descriptors get retained?
What I mean by this is that before the fork the pipe read file descriptor had a retain count of 1, and after the fork 2. When the parent closed its read side the fd went to 1 and is kept open for the child. Is this essentially what is happening? Does this behavior also occur for regular file descriptors?
As one can read on the man page about fork():
The child process shall have its own copy of the parent's file descriptors. Each of the child's file descriptors shall refer to the same open file description with the corresponding file descriptor of the parent.
So yes, the child has an exact copy of the parent's file descriptors, and that applies to all of them, including open files.
The answer is yes, and yes (the same applies to all file descriptors, including things like sockets).
In a fork() call, the child gets its own separate copy of each file descriptor, each of which acts as if it had been created by dup(). A close() only closes the specific file descriptor that was passed, so, for example, if you do n2 = dup(n); close(n);, the file (pipe, socket, device...) that n was referring to remains open. The same applies to file descriptors duplicated by a fork().
Yes, a fork duplicates all open file descriptors.
So for a typical pipe, a two-slot array (int fd[2]), fd[0] refers to the same read end in the parent and the child, and likewise fd[1] refers to the same write end.
You can create a pipe without forking at all, and read/write to yourself by using fd[0] and fd[1] in one process.

Resources