dup() followed by close() from multiple threads or processes - c

My program does the following in chronological order
1. The program is started with root permissions.
2. Among other tasks, a file only readable with root permissions is open()ed.
3. Root privileges are dropped.
4. Child processes are spawned with clone() and the CLONE_FILES | CLONE_FS | CLONE_IO flags set, which means that while they use separate regions of virtual memory, they share the same file descriptor table (and other IO stuff).
5. All child processes execve() their own programs (the FD_CLOEXEC flag is not used).
6. The original program terminates.
Now I want every spawned program to read the contents of the aforementioned file, but after they all have read the file, I want it to be closed (for security reasons).
One possible solution I'm considering now is having a step 3a where the fd of the file is dup()licated once for every child process, and each child gets its own fd (as an argv). Then every child program would simply close() their fd, so that after all fds pointing to the file are close()d the "actual file" is closed.
But does it work that way? And is it safe to do this (i.e. is the file really closed)? If not, is there another/better method?

While using dup() as I suggested above is probably just fine, I've now --a day after asking this SO question-- realized that there is a nicer way to do this, at least from the point of view of thread safety.
All dup()licated file descriptors point to the same file position indicator, which of course means you run into trouble when multiple threads/processes might simultaneously try to change the file position during read operations (even if your own code does so in a thread-safe way, the same doesn't necessarily go for libraries you depend on).
So wait, why not just call open() multiple times (once for every child) on the needed file before dropping root? From the manual of open():
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
Could be used like this:
int fds[CHILD_C];
for (int i = 0; i < CHILD_C; i++) {
    fds[i] = open("/foo/bar", O_RDONLY);
    // check for errors here
}
drop_privileges();
// etc
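For completeness, here is roughly how the parent might hand each fd to its child as an argv string. This is only a sketch with hypothetical names (child_prog and its path), and it uses fork()/execv() to keep things short, whereas the question uses clone():
// needs <stdio.h> and <unistd.h>
for (int i = 0; i < CHILD_C; i++) {
    pid_t pid = fork(); // the question uses clone(); fork() keeps the sketch short
    if (pid == 0) { // child
        char fdstr[16];
        snprintf(fdstr, sizeof fdstr, "%d", fds[i]); // pass the fd number as text
        char *child_argv[] = { "child_prog", fdstr, NULL };
        execv("/usr/local/bin/child_prog", child_argv);
        _exit(127); // only reached if execv() fails
    }
    // the parent's remaining references are released when the original program exits
}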
Then every child gets a reference to one of those fds through argv and does something like:
FILE *stream = fdopen(atoi(argv[FD_STRING_I]), "r");
read whatever is needed from the stream
fclose(stream); (this also closes the underlying file descriptor)
Disclaimer: According to a bunch of tests I've run this is indeed safe and sound. I have however only tested open()ing with O_RDONLY. Using O_RDWR or O_WRONLY may or may not be safe.

Related

Why should I close all file descriptors after calling fork() and prior to calling exec...()? And how would I do it?

I've seen a lot of C code that tries to close all file descriptors between calling fork() and calling exec...(). Why is this commonly done and what is the best way to do it in my own code, as I've seen so many different implementations already?
When calling fork(), your operating system creates a new process by simply cloning your existing process. The new process will be pretty much identical to the process it was cloned from, except for its process ID and any properties that are documented to be replaced or reset by the fork() call.
When calling any form of exec...(), the process image of the calling process is replaced by a new process image but other than that the process state is preserved. One consequence is that open file descriptors in the process file descriptor table prior to calling exec...() are still present in that table after calling it, so the new process code inherits access to them. I guess this has probably been done so that STDIN, STDOUT, and STDERR are automatically inherited by child processes.
However, keep in mind that in POSIX C file descriptors are not only used to access actual files; they are also used for all kinds of system and network sockets, pipes, shared memory identifiers, and so on. If you don't close these prior to calling exec...(), your new child process will get access to all of them, even to those resources it could not gain access to on its own, as it doesn't even have the required access rights. Think about a root process creating a non-root child process, yet this child would have access to all open file descriptors of the root parent process, including open files that should only be writable by root or protected server sockets below port 1024.
So unless you want a child process to inherit access to currently open file descriptors, as may explicitly be desired e.g. to capture STDOUT of a process or feed data via STDIN to that process, you are required to close them prior to calling exec...(). Not only because of security (which sometimes may play no role at all) but also because otherwise the child process will have fewer free file descriptors available (and think of a long chain of processes, each opening files and then spawning a sub-process... there will be fewer and fewer free file descriptors available).
One way to do that is to always open files using the flag O_CLOEXEC, which ensures that this file descriptor is automatically closed when exec...() is ever called. One problem with that solution is that you cannot control how external libraries may open files, so you cannot rely that all code will always set this flag.
Another problem is that this solution only works for file descriptors created with open(). You cannot pass that flag when creating sockets, pipes, etc. This is a known problem and some systems work around it by offering the non-standard accept4(), pipe2(), and dup3() calls, and the SOCK_CLOEXEC flag for sockets; however, these are not yet POSIX standard and it's unknown whether they will become standard (this is planned, but until a new standard has been released we cannot know for sure, and it will take years until all systems have adopted them).
What you can do is later set the FD_CLOEXEC flag using fcntl() on the file descriptor; however, note that this isn't safe in a multi-threaded environment. Just consider the following code:
int so = socket(...);
fcntl(so, F_SETFD, FD_CLOEXEC);
If another thread calls fork() in between the first and the second line, which is of course possible, the flag has not been set yet and thus this file descriptor won't get closed.
So the only way that is really safe is to explicitly close them and this is not as easy as it may seem!
I've seen a lot of code that does stupid things like this:
for (int i = STDERR_FILENO + 1; i < 256; i++) close(i);
But just because some POSIX systems have a default limit of 256 doesn't mean that this limit cannot be raised. Also, on some systems the default limit is higher to begin with.
Using FD_SETSIZE instead of 256 is equally wrong: just because the select() API has a hard limit by default on most systems doesn't mean that a process cannot have more open file descriptors than this limit (after all, you don't have to use select() with them; you can use the poll() API as a replacement, and poll() has no upper limit on file descriptor numbers).
It is always correct to use OPEN_MAX instead of 256, as that really is the absolute maximum of file descriptors a process can have. The downside is that OPEN_MAX can theoretically be huge and doesn't reflect the real current runtime limit of a process.
To avoid having to close too many non-existing file descriptors, you can use this code instead:
int fdlimit = (int)sysconf(_SC_OPEN_MAX);
for (int i = STDERR_FILENO + 1; i < fdlimit; i++) close(i);
sysconf(_SC_OPEN_MAX) is documented to update correctly if the open file limit (RLIMIT_NOFILE) has been raised using setrlimit(). The resource limits (rlimits) are the effective limits for a running process and for files they will always have to be between _POSIX_OPEN_MAX (documented as the minimum number of file descriptors a process is always allowed to open, must be at least 20) and OPEN_MAX (must be at least _POSIX_OPEN_MAX and sets the upper limit).
While closing all possible descriptors in a loop is technically correct and will work as desired, it may try to close several thousand file descriptors, most of which will often not exist. Even if the close() call for a non-existing file descriptor is fast (which is not guaranteed by any standard), it may take a while on weaker systems (think of embedded devices, think of small single-board computers), which may be a problem.
So several systems have developed more efficient ways to solve this issue. Famous examples are closefrom() and fdwalk(), which BSD and Solaris systems support. Unfortunately The Open Group voted against adding closefrom() to the standard (quote): "it is not possible to standardize an interface that closes arbitrary file descriptors above a certain value while still guaranteeing a conforming environment." (Source) This is of course nonsense, as they make the rules themselves: if they defined that certain file descriptors may always be silently omitted from closing when the environment or system requires it, or when the code itself requests it, this would break no existing implementation of that function and would still offer the desired functionality for the rest of us. Without these functions people will use a loop and do exactly what The Open Group tries to avoid here, so not adding it only makes the situation even worse.
On some platforms you are basically out of luck, e.g. macOS, which is fully POSIX conformant. If you don't want to close all file descriptors in a loop on macOS, your only option is to not use fork()/exec...() but instead posix_spawn(). posix_spawn() is a newer API for platforms that don't support process forking; it can be implemented purely in user space on top of fork()/exec...() for those platforms that do support forking, and can otherwise use some other API a platform offers for starting child processes. On macOS there exists a non-standard flag POSIX_SPAWN_CLOEXEC_DEFAULT, which will treat all file descriptors as if the CLOEXEC flag had been set on them, except for those for which you explicitly specified file actions.
On Linux you can get a list of open file descriptors by looking at the path /proc/{PID}/fd/, with {PID} being the process ID of your process (getpid()), provided that the proc file system has been mounted at /proc at all (a lot of Linux tools rely on that, so not mounting it would break many other things as well). Basically you can limit yourself to closing all descriptors listed under this path.
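A rough sketch of that approach (Linux-specific, assuming /proc is mounted; error handling kept minimal):
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

// Close every descriptor listed in /proc/self/fd except stdin/stdout/stderr.
// Note: the descriptor used to read the directory itself must be skipped.
static void close_open_fds(void) {
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return; // fall back to a plain close() loop here if you like
    int dir_fd = dirfd(d);
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        int fd = atoi(e->d_name); // "." and ".." yield 0, which is skipped below
        if (fd > STDERR_FILENO && fd != dir_fd)
            close(fd);
    }
    closedir(d);
}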
True story: Once upon a time I wrote a simple little C program that opened a file, and I noticed that the file descriptor returned by open was 4. "That's funny," I thought. "Standard input, output, and error are always file descriptors 0, 1, and 2, so the first file descriptor you open is usually 3."
So I wrote another little C program that started reading from file descriptor 3 (without opening it, that is, but rather, assuming that 3 was a pre-opened fd, just like 0, 1, and 2). It quickly became apparent that, on the Unix system I was using, file descriptor 3 was pre-opened on the system password file. This was evidently a bug in the login program, which was exec'ing my login shell with fd 3 still open on the password file, and the stray fd was in turn being inherited by programs I ran from my shell.
Naturally the next thing I tried was a simple little C program to write to the pre-opened file descriptor 3, to see if I could modify the password file and give myself root access. This, however, didn't work; the stray fd 3 was opened on the password file in read-only mode.
But at any rate, this helps to explain why you shouldn't leave file descriptors open when you exec a child process.
[Footnote: I said "true story", and it mostly is, but for the sake of the narrative I did change one detail. In fact, the buggy version of /bin/login was leaving fd 3 opened on the groups file, /etc/group, not the password file.]

What is a file descriptor?

While trying to learn socket programming, I saw the following code:
int sock;
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
I browsed through the man page and found that socket returns a file descriptor. I have tried searching the internet and other similar questions here, but I couldn't understand what a file descriptor really is. It would be great if someone could explain file descriptors in simple language.
There are two related objects: file descriptor and file description. People often confuse these two and think they are the same.
File descriptor is an integer in your application that refers to the file description in the kernel.
File description is the structure in the kernel that maintains the state of an open file (its current position, blocking/non-blocking mode, etc.). In Linux, the file description is struct file.
POSIX open():
The open() function shall establish the connection between a file and a file descriptor. It shall create an open file description that refers to a file and a file descriptor that refers to that open file description. The file descriptor is used by other I/O functions to refer to that file. The path argument points to a pathname naming the file.
The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process. The open file description is new, and therefore the file descriptor shall not share it with any other process in the system.
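To make the distinction concrete, here is a small sketch (the path /etc/hostname is just an example): two descriptors obtained via dup() share one open file description and therefore one file offset, while two separate open() calls create two independent descriptions:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int a = open("/etc/hostname", O_RDONLY); // open file description #1
    int b = dup(a);                          // same description as 'a'
    int c = open("/etc/hostname", O_RDONLY); // open file description #2

    char buf[4];
    read(a, buf, 4); // advances the offset shared by 'a' and 'b'...
    printf("offset via b: %ld\n", (long)lseek(b, 0, SEEK_CUR)); // ...prints 4 (if the file has >= 4 bytes)
    printf("offset via c: %ld\n", (long)lseek(c, 0, SEEK_CUR)); // independent description: prints 0

    close(a); close(b); close(c);
    return 0;
}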
In Unix/Linux operating systems, a file descriptor is an abstract indicator (handle) used to access a file or other I/O (input/output) resource, such as a pipe or network socket.
Normally, file descriptors index into a per-process file descriptor table maintained by the kernel, which in turn indexes
into a system-wide table of files opened by all processes, called the file table.
This table records the "mode" with which the file or other resource has been opened: reading, writing, appending, and possibly other modes.
It also indexes into a third table called the inode table that describes the actual underlying files.
File Descriptors are nothing but mappings to a file. You can also say these are pointers to a file that the process is using.
FDs are just integer values which act as pointers to process resources.
Whenever a process starts, an entry of the running process is added to the /proc/<pid> directory. This is the place where all of the data related to the process is kept. Also, on process start the kernel allocates 3 file-descriptors to the process for communication with the 3 data streams referred to as stdin, stdout and stderr.
The Linux kernel uses an algorithm that always creates an FD with the lowest available integer value, so these data streams are mapped to the numbers 0, 1 and 2.
Let's say that in your code you opened a file to read from or to write to. This means the process needs access to a resource and has to create a mapping/pointer for this new resource.
To do this, the kernel automatically creates an FD as soon as the file is opened by your code.
If you run ls -l /proc/<pid>/fd/ you will see an additional FD created there with id 4 (it can be some other number if the program has used other resources).
I think of file descriptors as (indirect, higher-level) pointers to opaque file objects maintained by the kernel.
Normally, when you deal with objects maintained by a library, you pass to the library pointers to objects that you're not supposed to dereference and manipulate yourself.
For kernel objects, it's not just that you're not supposed to manipulate them yourself -- you literally can't, because they live in a different address space that's not at all accessible to you. And because they live in a different address space, pointers wouldn't be a meaningful way of referring to them.
You need a token or handle which the kernel would internally resolve to a pointer that's meaningful in the kernel address space. File descriptors are such tokens in integer form.
For the kernel:
your_process_id + your_file_descriptor => kernels_file_object_pointer
(or an EBADF error if a given file descriptor cannot be resolved to a file object pointer for the given process)
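For instance, handing the kernel an integer that doesn't resolve to anything in your process's table simply fails with EBADF; a tiny illustration:
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void) {
    struct stat sb;
    if (fstat(12345, &sb) == -1)                  // 12345 is (almost certainly) not an open fd here
        printf("fstat: %s\n", strerror(errno));   // typically prints "Bad file descriptor" (EBADF)
    return 0;
}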

Passing file descriptor from one program to another on the same host using UNIX sockets

I have two programs, let's say prog1 and prog2. I am opening a file with prog1 and doing some operations on it. Now, without closing the file in prog1, I am sending its file descriptor to prog2 using UNIX sockets, which then does some operations on it.
Though I get the same descriptor number I passed from prog1, doing an fstat() on the fd received in prog2 throws an error saying "Bad file descriptor". I have opened the file in prog1 with the correct permissions, that is, read and write for all, and still I get an error.
Why is this happening? If my way of passing a file descriptor is wrong, then please suggest a correct one.
I believe this site has what you're looking for:
http://www.lst.de/~okir/blackhats/node121.html
There's also information in Linux's man 7 unix on using SCM_RIGHTS and other features of Unix sockets.
Fix for broken link: http://web.archive.org/web/20131016032959/http://www.lst.de/~okir/blackhats/node121.html
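In case the archived page disappears as well, here is a minimal sketch of the SCM_RIGHTS mechanism described in man 7 unix. The names send_fd/recv_fd are just illustrative and error handling is trimmed:
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

// Send an open descriptor 'fd' over the connected UNIX-domain socket 'sock'.
static int send_fd(int sock, int fd) {
    char byte = 0; // at least one byte of real data must accompany the control message
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == -1 ? -1 : 0;
}

// Receive a descriptor; the kernel installs a new fd in this process's table.
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) == -1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd; // refers to the same open file description as the sender's fd
}
Note that the integer recv_fd() returns is generally a different number from the one that was sent: what is transferred is a reference to the same open file description, which the kernel installs under a new descriptor in the receiving process, so simply sending the raw integer value (as the question seems to do) cannot work.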
This is normal. Each program has its own file descriptors.
EDIT: Well, it seems that you can pass a file descriptor using a local (UNIX-domain) socket.
You can see them in /proc/PID/fd; they are often symlinks to your files. What you can do with a UNIX socket is allow a write to a file from one program to another with sendmsg/recvmsg. See this question for more detail.
But there are many better ways to write concurrently to a file. You can use a FIFO, shared memory, or even just pass your offset position between your two programs.
A file descriptor is a small int value that lets you access a file. It's an index into a file descriptor table, a data structure in the kernel that's associated with each individual process. A process cannot do anything meaningful with a file descriptor from another process, since it has no access to any other process's file descriptor table.
This is for basic security reasons. If one process were able to perform operations on an open file belonging to another process, chaos would ensue. Also, a file descriptor just doesn't contain enough information to do what you're trying to do; one process's file descriptor 0 (stdin) might refer to a completely different file than another process's file descriptor 0. And even if they happen to be the same file, each process needs to maintain its own information about the state of that open file (how much it's read/written, etc.).
If you'll describe what you're trying to accomplish, perhaps we can help.
EDIT :
You want to pass data from one program to another. The most straightforward way to do this is to create a pipe (man 2 pipe). Note that the second process will have to be a child of the first.
An alternative might be to create a file that the second process can open and read (not trying to share a file descriptor), or you might use sockets.
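To illustrate the pipe suggestion above, a minimal sketch in which the parent writes a message and the child it forked reads it:
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int pfd[2];
    if (pipe(pfd) == -1)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {        // child: reads from the pipe
        close(pfd[1]);     // close unused write end
        char buf[64];
        ssize_t n = read(pfd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        close(pfd[0]);
        _exit(0);
    }
    // parent: writes into the pipe
    close(pfd[0]);         // close unused read end
    const char *text = "hello from parent";
    write(pfd[1], text, strlen(text));
    close(pfd[1]);
    wait(NULL);            // reap the child
    return 0;
}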

How to use a file as a mutex in Linux and C?

I have different processes concurrently accessing a named pipe in Linux and I want to make this access mutually exclusive.
I know it is possible to achieve that using a mutex placed in a shared memory area, but since this is a sort of homework assignment I have some restrictions.
Thus, what I thought about is to use locking primitives on files to achieve mutual exclusion; I made some attempts but I can't make it work.
This is what I tried:
flock(lock_file, LOCK_EX)
// critic section
flock(lock_file, LOCK_UN)
Different processes will use different file descriptors, but they refer to the same file.
Is it possible to achieve something like that? Can you provide an example?
The standard lock-file technique uses options such as O_EXCL on the open() call to try and create the file. You store the PID of the process using the lock, so you can determine whether the process still exists (using kill() to test). You have to worry about concurrency - a lot.
Steps:
Determine name of lock file based on name of FIFO
Open lock file if it exists
Check whether process using it exists
If other process exists, it has control (exit with error, or wait for it to exit)
If other process is absent, remove lock file
At this point, lock file did not exist when last checked.
Try to create it with open() and O_EXCL amongst the other options.
If that works, your process created the file - you have permission to go ahead.
Write your PID to the file; close it.
Open the FIFO - use it.
When done (atexit()?) remove the lock file.
Worry about what happens if you open the lock file and read no PID...is it that another process just created it and hasn't yet written its PID into it, or did it die before doing so? Probably best to back off - close the file and try again (possibly after a randomized nanosleep()). If you get the empty file multiple times (say 3 in a row) assume that the process is dead and remove the lock file.
You could consider having the process that owns the file maintain an advisory lock on the file while it has the FIFO open. If the lock is absent, the process has died. There is still a TOCTOU (time of check, time of use) window of vulnerability between opening the file and applying the lock.
Take a good look at the open() man page on your system to see whether there are any other options to help you. Sometimes, processes use directories (mkdir()) instead of files because even root can't create a second instance of a given directory name, but then you have issues with how to know the PID of the process with the resource open, etc.
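A hedged sketch of the create-with-O_EXCL core of those steps (the function names and path are illustrative; the staleness check described above is omitted for brevity):
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Try to take the lock by creating the lock file atomically; returns 0 on
// success, -1 if the file already exists (someone else holds the lock) or on error.
static int take_lock(const char *lockfile) {
    int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1; // errno == EEXIST means the lock is already held
    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    write(fd, buf, len); // record our PID so other processes can check liveness
    close(fd);
    return 0;
}

static void release_lock(const char *lockfile) {
    unlink(lockfile);
}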
I'd definitely recommend using an actual mutex (as has been suggested in the comments); for example, the pthread library provides an implementation. But if you want to do it yourself using a file for educational purposes, I'd suggest taking a look at this answer I posted a while ago which describes a method for doing so in Python. Translated to C, it should look something like this (Warning: untested code, use at your own risk; also my C is rusty):
// each instance of the process should have a different filename here
char *process_lockfile = "/path/to/hostname.pid.lock";
// all processes should have the same filename here
char *global_lockfile = "/path/to/lockfile";
// create the file if necessary (only once, at the beginning of each process)
FILE *f = fopen(process_lockfile, "w");
fprintf(f, "\n"); // or maybe write the hostname and pid
fclose(f);
// now, each time you have to lock the file:
int lock_acquired = 0;
while (!lock_acquired) {
    int r = link(process_lockfile, global_lockfile);
    if (r == 0) {
        lock_acquired = 1;
    } else {
        struct stat buf;
        stat(process_lockfile, &buf);
        lock_acquired = (buf.st_nlink == 2);
    }
}
// do your writing
unlink(global_lockfile);
lock_acquired = 0;
Your example is as good as you're going to get using flock (2) (which is after all, merely an "advisory" lock (which is to say not a lock at all, really)). The man page for it on my Mac OS X system has a couple of possibly important provisos:
Locks are on files, not file descriptors. That is, file descriptors duplicated through dup(2) or fork(2) do not result in multiple instances of a lock, but rather multiple references to a single lock. If a process holding a lock on a file forks and the child explicitly unlocks the file, the parent will lose its lock.
and
Processes blocked awaiting a lock may be awakened by signals.
both of which suggest ways it could fail.
// would have been a comment, but I wanted to quote the man page at some length
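With those caveats in mind, the pattern from the question written out as compilable code looks roughly like this (a sketch; the lock-file path is arbitrary, and flock() comes from <sys/file.h>, not POSIX):
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int lock_fd = open("/tmp/fifo.lock", O_RDWR | O_CREAT, 0644); // every process opens the same file
if (lock_fd == -1) {
    /* handle the error */
}
if (flock(lock_fd, LOCK_EX) == 0) { // blocks until the exclusive lock is granted
    /* critical section: access the named pipe here */
    flock(lock_fd, LOCK_UN); // or just close(lock_fd), which also releases the lock
}
close(lock_fd);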

execve() and sharing file descriptors

I read in the man pages of execve that if a process (A) calls execve, the already opened file descriptors are copied to the new process (B).
Two possibilities arise here:
1) Does it mean that a new file descriptor table is created for process B, the entries of which are copied from the older file descriptor table of process A?
2) Or does process B get the file descriptor table of process A, since after execve process A will cease to exist and the already opened files could then only be closed from process B?
Which one is correct?
execve does not create a new process. It replaces the calling process's program image, memory space, etc. with new ones based on an executable file from the filesystem. The file descriptor table is modified by closing any descriptors with close-on-exec flag set; the rest remain open and in the same state they were in (current position, locks, etc.) prior to execve.
You're probably confusing this with what happens on fork since execve is usually preceded by fork. When a process forks, the child process has a new file descriptor table referring to the same open file descriptions as the parent process's file descriptor table.
Which one is correct?
#2
Though what you ask is more of an OS implementation detail: it is rarely if ever important to applications, is completely transparent to them, and depends on the OS.
It is normally said that the new process inherits file descriptors. Except those having FD_CLOEXEC flag set, obviously.
Even in case of #1, if we presume that for some short time both processes A and B are in memory (not really, that's fork() territory), copying the fd table would be OK. Since process A is terminated (by the exec()), all its file descriptors are close()d, and that has no effect on the already-copied file descriptors in process B. File descriptors are like pointers to the corresponding kernel structure containing the actual information about what the file descriptor refers to. Copying the fd table doesn't make copies of the underlying structures; it copies only the pointers. The kernel structure contains a reference counter (required to implement fork()) which is incremented when a copy is made, so the kernel knows how many processes are using it. Calling close() on a file descriptor first of all decrements the reference counter, and only if the counter goes to zero (no more processes are using the structure) does the OS actually close the underlying file/socket/pipe/etc. (But obviously, even if inside the kernel two processes were present simultaneously for some short time, user-space applications can't see that, since the new process after exec() also inherits the PID of the original process.)

Resources