Background
I have multiple threads in the same process that all place fcntl(2) locks on a given file. These locks must block one another, so to get blocking between threads of the same process I must use open file description locks (OFD locks; see fcntl(2)). The man page documents:
Open file description locks placed via the same open file
description (i.e., via the same file descriptor, or via a
duplicate of the file descriptor created by fork(2), dup(2),
fcntl() F_DUPFD, and so on) are always compatible: if a new lock
is placed on an already locked region, then the existing lock is
converted to the new lock type. (Such conversions may result in
splitting, shrinking, or coalescing with an existing lock as
discussed above.)
On the other hand, open file description locks may conflict with
each other when they are acquired via different open file
descriptions. Thus, the threads in a multithreaded program can
use open file description locks to synchronize access to a file
region by having each thread perform its own open(2) on the file
and applying locks via the resulting file descriptor.
Thus, when a thread starts up, it must open its own descriptor via open(2). Note that the "main thread" already has the file open, and threads come and go throughout the process's lifetime.
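The behaviour quoted above can be checked with a minimal sketch. It assumes Linux 3.15 or later with a glibc that exposes F_OFD_SETLK; the helper names ofd_write_lock and ofd_lock_conflicts and the temporary file path are made up for illustration.

```c
#define _GNU_SOURCE           /* exposes F_OFD_SETLK in glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37        /* Linux value; assumption used only if the headers hide it */
#endif

/* Try to take a non-blocking OFD write lock on the whole file. */
static int ofd_write_lock(int fd)
{
    struct flock fl = {0};    /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "up to end of file" */
    return fcntl(fd, F_OFD_SETLK, &fl);
}

/* Returns 1 if the second lock attempt conflicts, 0 if it succeeds,
 * -1 on a setup error.
 * use_dup != 0: the second descriptor is a dup(2) of the first.
 * use_dup == 0: the second descriptor comes from a separate open(2). */
int ofd_lock_conflicts(int use_dup)
{
    char path[] = "/tmp/ofd_demo_XXXXXX";   /* hypothetical scratch file */
    int fd1 = mkstemp(path);
    if (fd1 == -1)
        return -1;
    int fd2 = use_dup ? dup(fd1) : open(path, O_RDWR);
    unlink(path);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    int result = -1;
    if (ofd_write_lock(fd1) == 0) {
        if (ofd_write_lock(fd2) == 0)
            result = 0;                       /* same description: compatible */
        else if (errno == EAGAIN || errno == EACCES)
            result = 1;                       /* different description: conflict */
    }
    close(fd2);
    close(fd1);
    return result;
}
```

Calling ofd_lock_conflicts(1) exercises the dup(2) case and ofd_lock_conflicts(0) the separate open(2) case. A thread that wants to block rather than fail would use F_OFD_SETLKW instead.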
Question
So I was wondering: is there any way I can reuse an existing file descriptor to open a separate descriptor to the same file, without dup(2)?
In other words, if I have file descriptor A but do not know the filename, can I open a descriptor B that refers to the same file as A?
My first instinct is as follows, where fd is the original file descriptor and fd2 is the "deep cloned" descriptor.
char buff[500];
snprintf(buff, sizeof buff, "/proc/%d/fd/%d", (int)getpid(), fd);
int fd2 = open(buff, O_RDWR); /* Linux-specific; returns -1 on failure */
However, it feels dirty. I was hoping there is a system call to do this.
Related
I've been reading through libev's source code and stumbled upon this comment:
a) epoll silently removes fds from the fd set. as nothing tells us that an fd has been removed otherwise, we have to continually "rearm" fds that we suspect might have changed (same problem with kqueue, but much less costly there).
I've been doing some tests with epoll (using the syscalls directly) on a modern Linux kernel and I couldn't reproduce it. I didn't see any problem with "silently disappearing fds". Could someone elaborate on this and tell me if it's still an issue?
The comment is rather vague, but I guess it just means that if the descriptor is closed elsewhere, it is silently removed from the set. From the Linux man pages, epoll(7):
Q6 Will closing a file descriptor cause it to be
removed from all epoll sets automatically?
A6 Yes, but be aware of the following point. A file
descriptor is a reference to an open file
description (see open(2)). Whenever a descriptor
is duplicated via dup(2), dup2(2), fcntl(2)
F_DUPFD, or fork(2), a new file descriptor refer‐
ring to the same open file description is cre‐
ated. An open file description continues to
exist until all file descriptors referring to it
have been closed. A file descriptor is removed
from an epoll set only after all the file
descriptors referring to the underlying open file
description have been closed (or before if the
descriptor is explicitly removed using
epoll_ctl(2) EPOLL_CTL_DEL). This means that
even after a file descriptor that is part of an
epoll set has been closed, events may be reported
for that file descriptor if other file descrip‐
tors referring to the same underlying file
description remain open.
So you have a socket with fd 42. It gets closed and is consequently removed from the epoll set, but the kernel doesn't notify libev about this through epoll_wait. Now epoll_modify is called again with fd = 42. epoll_modify cannot tell whether this file descriptor 42 is the same one that was already in the epoll set, or a different open file description that happens to have reused the number 42.
One could also argue that the comments are just ranting and the design of the libev API is at fault here.
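The A6 behaviour quoted from epoll(7) can be observed with a small sketch. It is only an illustration on Linux, not libev's code; the function name events_after_close and the 100 ms timeout are arbitrary choices.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with epoll, makes it readable, closes
 * the registered descriptor, and returns what epoll_wait() reports.
 * keep_dup != 0: a dup(2) keeps the open file description alive.
 * Returns the number of events (0 or 1), or -1 on a setup error. */
int events_after_close(int keep_dup)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int ep = epoll_create1(0);
    if (ep == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = p[0];

    int n = -1;
    int dupfd = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {          /* read end is now readable */
        if (keep_dup)
            dupfd = dup(p[0]);
        close(p[0]);                          /* no notification is delivered */
        struct epoll_event out;
        n = epoll_wait(ep, &out, 1, 100);     /* wait up to 100 ms */
    } else {
        close(p[0]);
    }

    if (dupfd != -1)
        close(dupfd);
    close(p[1]);
    close(ep);
    return n;
}
```

Without the duplicate, the registration disappears together with the open file description and epoll_wait() times out. With the duplicate, an event is still reported and its data.fd carries the already-closed descriptor number, which is the ambiguity the comment complains about.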
If a process uses open(2) (or similar) to obtain more than one
descriptor for the same file, these descriptors are treated
independently by flock(). An attempt to lock the file using one of
these file descriptors may be denied by a lock that the calling
process has already placed via another descriptor.
If flock() treats the descriptors independently, why would locking the file using one of the file descriptors be denied by a lock placed via another descriptor? What does "independent" mean here?
Also, if I unlock one of the descriptors, would the other descriptors unlock as well?
"Treated independently by flock()" means that flock() will not "ask" one descriptor when attempting to modify the other. However, it doesn't mean they are truly independent. If flock() tries to lock one while the other is already locked, the attempt may block.
Think of it as a two-level mechanism. flock() looks at only one descriptor at a time, but when the lock attempt is made, the system goes down to the deeper level and actually tries to lock the file, and that is where the conflict occurs.
Also, if I unlock one of the descriptors, would the other descriptors unlock as well?
I'm not sure. The quote below states that this is indeed the case if a file has multiple descriptors created by fork(2) or dup(2). However, nothing says so in the second paragraph, which covers multiple open(2) calls, and that leads me to believe that it is just not a good thing to do :)
From here:
Locks created by flock() are associated with an open file description
(see open(2)). This means that duplicate file descriptors (created
by, for example, fork(2) or dup(2)) refer to the same lock, and this
lock may be modified or released using any of these file descriptors.
Furthermore, the lock is released either by an explicit LOCK_UN
operation on any of these duplicate file descriptors, or when all
such file descriptors have been closed.
If a process uses open(2) (or similar) to obtain more than one file
descriptor for the same file, these file descriptors are treated
independently by flock(). An attempt to lock the file using one of
these file descriptors may be denied by a lock that the calling
process has already placed via another file descriptor.
Suppose your process has two file descriptors, fd1 and fd2, obtained from separate open() calls on the same file. If you lock the file via fd1 and then lock it again via fd1, the two requests won't interfere with each other because they go through the same open file description; the second request just converts the existing lock.
However, if the second lock is requested via fd2 instead of fd1, it conflicts with the first and will block (or fail with EWOULDBLOCK if LOCK_NB is set), despite the fact that the same process is doing the locking.
This is the sense in which the file descriptors are independent of each other: the locking system doesn't check which process owns the conflicting lock; it is sufficient that the lock was placed via a different open file description.
When you unlock one descriptor, you don't change the locks held via descriptors that came from other open() calls.
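A minimal sketch of this, assuming Linux and a local filesystem; the function name flock_conflicts and the temporary file path are made up for the example.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Takes an exclusive flock() via fd1, then tries a non-blocking exclusive
 * flock() via fd2.
 * use_dup != 0: fd2 is a dup(2) of fd1 (same open file description).
 * use_dup == 0: fd2 comes from a second open(2) of the same file.
 * Returns 1 if the second attempt is denied, 0 if it succeeds, -1 on error. */
int flock_conflicts(int use_dup)
{
    char path[] = "/tmp/flock_demo_XXXXXX";  /* hypothetical scratch file */
    int fd1 = mkstemp(path);
    if (fd1 == -1)
        return -1;
    int fd2 = use_dup ? dup(fd1) : open(path, O_RDWR);
    unlink(path);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    int result = -1;
    if (flock(fd1, LOCK_EX) == 0) {
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            result = 0;                       /* treated as the same lock */
        else if (errno == EWOULDBLOCK)
            result = 1;                       /* denied by our own lock */
    }
    close(fd2);
    close(fd1);
    return result;
}
```

The second open() is denied even though the same process holds the first lock, while the dup()ed descriptor is not, because it refers to the lock that is already held.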
While trying to learn socket programming, I saw the following code:
int sock;
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
I browsed through the man page and found that socket() returns a file descriptor. I have tried searching the internet and other similar questions here, but I couldn't understand what a file descriptor really is. It would be great if someone could explain file descriptors in simple language.
There are two related objects: file descriptor and file description. People often confuse these two and think they are the same.
A file descriptor is an integer in your application that refers to the file description in the kernel.
A file description is the structure in the kernel that maintains the state of an open file (its current position, blocking/non-blocking mode, etc.). In Linux, a file description is a struct file.
POSIX open():
The open() function shall establish the connection between a file and a file descriptor. It shall create an open file description that refers to a file and a file descriptor that refers to that open file description. The file descriptor is used by other I/O functions to refer to that file. The path argument points to a pathname naming the file.
The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process. The open file description is new, and therefore the file descriptor shall not share it with any other process in the system.
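The difference between the two objects shows up in the file offset, which lives in the open file description. The following is a small sketch under that assumption; the function name offset_seen_by_second_fd and the temporary file path are invented for illustration.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes 5 bytes through fd1 and returns the current offset as seen
 * through fd2.
 * use_dup != 0: fd2 is a dup(2) of fd1 and shares its description.
 * use_dup == 0: fd2 comes from a separate open(2) with its own description.
 * Returns -1 on error. */
long offset_seen_by_second_fd(int use_dup)
{
    char path[] = "/tmp/offset_demo_XXXXXX"; /* hypothetical scratch file */
    int fd1 = mkstemp(path);
    if (fd1 == -1)
        return -1;
    int fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
    unlink(path);

    long result = -1;
    if (fd2 != -1 && write(fd1, "hello", 5) == 5)
        result = (long)lseek(fd2, 0, SEEK_CUR);  /* query, don't move */

    if (fd2 != -1)
        close(fd2);
    close(fd1);
    return result;
}
```

With a duplicate, the write through fd1 moves the offset that fd2 sees as well; with a second open(), fd2 still sits at offset 0.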
In Unix/Linux operating systems, a file descriptor is an abstract indicator (handle) used to access a file or other I/O (input/output) resource, such as a pipe or network socket.
Normally a file descriptor is an index into a per-process file descriptor table maintained by the kernel, which in turn indexes into a system-wide table of files opened by all processes, called the file table.
This table records the "mode" with which the file or other resource has been opened, for example:
reading
writing
appending
and possibly other modes.
The file table in turn indexes into a third table, called the inode table, that describes the actual underlying files.
File descriptors are nothing but mappings to a file. You can also say they are pointers to a file that the process is using.
FDs are just integer values that act as pointers to process resources.
Whenever a process starts, an entry for it appears under the /proc/<pid> directory. This is where the data related to the process is exposed. Also, by convention a process starts with three file descriptors already open (usually inherited from its parent) for the three data streams referred to as stdin, stdout and stderr.
The Linux kernel always allocates the lowest-numbered unused descriptor, so these data streams are mapped to the numbers 0, 1 and 2.
Let's say that in your code you open a file to read from or write to. This means the process needs access to a resource, and a mapping/pointer has to be created for this new resource.
To do this, the kernel automatically creates an FD as soon as the file is opened by your code.
If you run ls -l /proc/<pid>/fd/ you will see an additional FD there with number 3 (or a higher number if the program already holds other descriptors).
I think of file descriptors as (indirect, higher-level) pointers to opaque file objects maintained by the kernel.
Normally, when you deal with objects maintained by a library, you pass to the library pointers to objects that you're not supposed to dereference and manipulate yourself.
For kernel objects, it's not just that you're not supposed to manipulate them yourself -- you literally can't because they live in a different address space that's not at all accessible to you. And because they live in a different address space, pointers wouldn't be a meaningful way of referring to them.
You need a token or handle which the kernel would internally resolve to a pointer that's meaningful in the kernel address space. File descriptors are such tokens in integer form.
For the kernel:
your_process_id + your_file_descriptor => kernels_file_object_pointer
(or an EBADF error if a given filedescriptor may not be resolved to a file object pointer for the given process)
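Both halves of that mapping can be poked at from user space. This is a sketch assuming a single-threaded program and a readable /dev/null; the function names are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if using a closed descriptor number fails with EBADF,
 * 0 if it does not, -1 on a setup error. */
int closed_fd_gives_ebadf(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;
    close(fd);                    /* the number no longer maps to anything */
    errno = 0;
    return fcntl(fd, F_GETFD) == -1 && errno == EBADF;
}

/* Returns 1 if a freshly freed descriptor number is handed out again by
 * the next open(), 0 if not, -1 on a setup error. Assumes no other thread
 * opens or closes descriptors in between. */
int lowest_free_number_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first == -1)
        return -1;
    close(first);
    int second = open("/dev/null", O_RDONLY);
    if (second == -1)
        return -1;
    close(second);
    return second == first;
}
```

The second function also shows why a bare descriptor number says nothing about which kernel object it currently refers to: the same integer can be mapped to a different object after a close() and a new open().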
I've been reading for hours but can't understand the difference between the two locks. The only thing I understand is that an fcntl() lock is granular, in that it can lock specific bytes, and that only fcntl() supports NFS locking.
It's said that the difference is in their semantics, in how they behave when duplicated by dup() or across fork(), but I can't understand what the difference is in practice.
My scenario is that I'm writing to a log file in a fork()-based server, where every forked process writes to the same file when something happens. Why would I want to use flock() and why would I want to use fcntl() locks?
I have tried to figure out the differences based on the available documentation and drew the following conclusions (please correct me if I am wrong):
With fcntl() (POSIX):
you create a lock record on the file at filesystem level including process id.
If the process dies or closes any filedescriptor to this file, the lock record gets removed by the system.
A request for an exclusive lock shall fail if the file descriptor was not opened with write access.
simply: fcntl() locks work as a Process <--> File relationship, ignoring file descriptors
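The "closes any file descriptor" point is the classic pitfall of these locks, and it can be sketched as follows. This is only an illustration assuming Linux and a local filesystem; the function names and the temporary file path are made up, and a forked child is used to observe the lock because a process never conflicts with its own fcntl() locks.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that asks, via F_GETLK, whether a write lock on the whole
 * file would be blocked. Returns 1 if locked, 0 if not, -1 on error. */
static int locked_as_seen_by_child(const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(2);
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) == -1)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}

/* Places an fcntl() write lock via fd1, then opens and closes an unrelated
 * second descriptor for the same file. Returns 1 if the lock is still held
 * afterwards, 0 if it was dropped, -1 on a setup error. */
int posix_lock_survives_unrelated_close(void)
{
    char path[] = "/tmp/posix_lock_demo_XXXXXX"; /* hypothetical scratch file */
    int fd1 = mkstemp(path);
    if (fd1 == -1)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;

    int result = -1;
    if (fcntl(fd1, F_SETLK, &fl) == 0 && locked_as_seen_by_child(path) == 1) {
        int fd2 = open(path, O_RDONLY);
        if (fd2 != -1) {
            close(fd2);           /* drops the lock that was placed via fd1 */
            result = locked_as_seen_by_child(path);
        }
    }
    close(fd1);
    unlink(path);
    return result;
}
```

The lock is gone after close(fd2) even though fd1 is still open, which is what makes these locks awkward when a library opens and closes the same file behind your back.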
flock() (BSD) is different (on Linux, since kernel 2.0, flock() is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl()):
flock() creates locks on the system's "open file descriptions", which are generated by open() calls.
a file descriptor (FD) is a reference to an "open file description". FDs generated by dup() or fork() refer to the same "open file description".
a process may generate multiple "open file descriptions" for one file by open()ing the file multiple times
flock() places its locks via an FD on an "open file description"
therefore flock() may be used to synchronize file access among processes as well as threads (in one or more processes).
see flock(2) and especially open(2) man pages for details on "Open file descriptions".
In your scenario you probably want to use fcntl()-based locks, because your forked processes will open() the log file on their own and do not expect to inherit a file descriptor with a lock possibly already placed on it.
If you need synchronisation among multiple threads, possibly in more than one process, you should use flock()-based locks if your system supports them without emulation by fcntl(). Then every thread needs to open() the file rather than using dup()ed or fork()ed handles.
Edit 2022: An excellent write-up and additional thoughts here: https://lwn.net/Articles/586904/
My program does the following in chronological order
The program is started with root permissions.
Among other tasks, a file readable only with root permissions is open()ed.
Root privileges are dropped.
Child processes are spawned with clone() and the CLONE_FILES | CLONE_FS | CLONE_IO flags set, which means that while they use separate regions of virtual memory, they share the same file descriptor table (and other IO stuff).
All child processes execve() their own programs (the FD_CLOEXEC flag is not used).
The original program terminates.
Now I want every spawned program to read the contents of the aforementioned file, but after they all have read the file, I want it to be closed (for security reasons).
One possible solution I'm considering now is having a step 3a where the fd of the file is dup()licated once for every child process, and each child gets its own fd (as an argv). Then every child program would simply close() their fd, so that after all fds pointing to the file are close()d the "actual file" is closed.
But does it work that way? And is it safe to do this (i.e. is the file really closed)? If not, is there another/better method?
While using dup() as I suggested above is probably just fine, I've now --a day after asking this SO question-- realized that there is a nicer way to do this, at least from the point of view of thread safety.
All dup()licated file descriptors share the same file position indicator, which of course means you run into trouble when multiple threads/processes might simultaneously try to change the file position during read operations (even if your own code does so in a thread-safe way, the same doesn't necessarily go for libraries you depend on).
So wait, why not just call open() multiple times (once for every child) on the needed file before dropping root? From the manual of open():
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
Could be used like this:
int fds[CHILD_C];
for (int i = 0; i < CHILD_C; i++) {
    fds[i] = open("/foo/bar", O_RDONLY);
    if (fds[i] == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }
}
drop_privileges();
// etc
Then every child gets a reference to one of those fds through argv and does something like:
FILE *stream = fdopen(atoi(argv[FD_STRING_I]), "r")
read whatever needed from the stream
fclose(stream) (this also closes the underlying file descriptor)
Disclaimer: According to a bunch of tests I've run this is indeed safe and sound. I have however only tested open()ing with O_RDONLY. Using O_RDWR or O_WRONLY may or may not be safe.