C get all open file descriptors

I want to implement behavior in my C program so that if a SIGINT happens, I close all open file descriptors. Is there a simple way to get a list of them?

I'd use brute force: for (i = 0; i < fd_max; ++i) close(i); It's quick and pretty portable.

Keep track of all of your open file descriptors and close them individually.
In the general case, a library you're using might have an open file, and closing it will cause that library to misbehave.
In fact, the same problem could exist in your own code, because if you close file descriptors indiscriminately but another part of your program still remembers the file descriptor and tries to use it, it will get an unexpected error or (if other files have been opened since) operate on the wrong file. It is much better for the component responsible for opening a file to also be responsible for closing it.
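A minimal sketch of the "keep track of your own descriptors" idea. The names tracked_open, tracked_close_all and MAX_TRACKED are hypothetical, the table has a fixed size, and nothing here is thread-safe; a real program would need to adapt it (and take extra care if it is called from a signal handler).

```c
#include <fcntl.h>
#include <unistd.h>

#define MAX_TRACKED 64   /* hypothetical fixed capacity for this sketch */

static int tracked[MAX_TRACKED];
static int tracked_count = 0;

/* Open an existing file and remember its descriptor.
   Returns -1 on failure or when the table is full.
   (No mode argument is passed, so do not use O_CREAT here.) */
int tracked_open(const char *path, int flags)
{
    if (tracked_count == MAX_TRACKED)
        return -1;
    int fd = open(path, flags);
    if (fd >= 0)
        tracked[tracked_count++] = fd;
    return fd;
}

/* Close only the descriptors recorded by tracked_open().
   Returns how many were closed successfully. */
int tracked_close_all(void)
{
    int closed = 0;
    while (tracked_count > 0) {
        if (close(tracked[--tracked_count]) == 0)
            closed++;
    }
    return closed;
}
```

Descriptors opened by libraries never enter the table, so they are left alone.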

You could read the contents of /proc/<pid>/fd, if available.
But be aware of the potential race that can occur if your application closes some descriptors or opens new ones between reading /proc/<pid>/fd and closing what you read.
So, in conclusion, I recommend Kevin Reid's approach to this.

My solution for POSIX systems:
All opened fds get the lowest value possible.
Write a wrapper function around open(2).
Your new function opens (and returns) the requested fd and passes its value to a function called define_if_is_the_highest_fd_and_store_it().
You should have an int highest_fd_saved accessible only through a singleton function (there is only one descriptor table) named save_fd(); its initial value is 3, because stderr is 2.
Install your handler function for SIGINT. Inside it, loop over the range [3, return_fd()] and close each descriptor.
I think that's it...
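The steps above might look roughly like this. It is only a sketch: open_recorded, close_recorded_range and install_sigint_cleanup are hypothetical names, and the counter starts at 2 (stderr) rather than 3 so that an empty range closes nothing. Note that the loop closes every number in the range, including descriptors that other code opened.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict -std modes */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Highest descriptor handed out by open_recorded() so far. */
static volatile sig_atomic_t highest_fd = STDERR_FILENO;

/* Wrapper around open(2) that remembers the highest descriptor seen. */
int open_recorded(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd > highest_fd)
        highest_fd = fd;
    return fd;
}

/* Close everything in [3, highest_fd]; close() is async-signal-safe. */
void close_recorded_range(void)
{
    for (int fd = STDERR_FILENO + 1; fd <= highest_fd; fd++)
        close(fd);
    highest_fd = STDERR_FILENO;
}

static void on_sigint(int signo)
{
    (void)signo;
    close_recorded_range();
}

/* Install the SIGINT handler; returns 0 on success, -1 on error. */
int install_sigint_cleanup(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}
```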

Related

Why should I close all file descriptors after calling fork() and prior to calling exec...()? And how would I do it?

I've seen a lot of C code that tries to close all file descriptors between calling fork() and calling exec...(). Why is this commonly done and what is the best way to do it in my own code, as I've seen so many different implementations already?
When calling fork(), your operating system creates a new process by simply cloning your existing process. The new process will be pretty much identical to the process it was cloned from, except for its process ID and any properties that are documented to be replaced or reset by the fork() call.
When calling any form of exec...(), the process image of the calling process is replaced by a new process image but other than that the process state is preserved. One consequence is that open file descriptors in the process file descriptor table prior to calling exec...() are still present in that table after calling it, so the new process code inherits access to them. I guess this has probably been done so that STDIN, STDOUT, and STDERR are automatically inherited by child processes.
However, keep in mind that in POSIX C file descriptors are not only used to access actual files, they are also used for all kinds of system and network sockets, pipes, shared memory identifiers, and so on. If you don't close these prior to calling exec...(), your new child process will get access to all of them, even to those resources it could not gain access to on its own because it doesn't have the required access rights. Think about a root process creating a non-root child process, yet this child would have access to all open file descriptors of the root parent process, including open files that should only be writable by root or protected server sockets below port 1024.
So unless you want a child process to inherit access to currently open file descriptors, as may explicitly be desired e.g. to capture STDOUT of a process or feed data via STDIN to that process, you are required to close them prior to calling exec...(). Not only because of security (which sometimes may play no role at all) but also because otherwise the child process will have fewer free file descriptors available (and think of a long chain of processes, each opening files and then spawning a sub-process... there will be fewer and fewer free file descriptors available).
One way to do that is to always open files using the flag O_CLOEXEC, which ensures that this file descriptor is automatically closed when exec...() is ever called. One problem with that solution is that you cannot control how external libraries may open files, so you cannot rely on all code always setting this flag.
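As a small illustration of the O_CLOEXEC approach, the sketch below opens a file with the flag set atomically and then checks it with fcntl(F_GETFD). The helper names open_cloexec and has_cloexec are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* O_CLOEXEC is POSIX.1-2008 */
#include <fcntl.h>
#include <unistd.h>

/* Open read-only with close-on-exec set at creation time.
   Returns the descriptor, or -1 on failure. */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Because the flag is applied by open() itself, there is no window in which another thread could fork() and leak the descriptor.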
Another problem is that this solution only works for file descriptors created with open(). You cannot pass that flag when creating sockets, pipes, etc. This is a known problem and some systems are working around that by offering the non-standard accept4(), pipe2(), dup3(), and the SOCK_CLOEXEC flag for sockets; however, these are not yet POSIX standard and it's unknown if they will become standard (this is planned, but until a new standard has been released we cannot know for sure, and it will take years until all systems have adopted them).
What you can do is set the flag FD_CLOEXEC later on, using fcntl() on the file descriptor; note, however, that this isn't safe in a multithreaded environment. Just consider the following code:
int so = socket(...);
fcntl(so, F_SETFD, FD_CLOEXEC);
If another thread calls fork() in between the first and the second line, which is of course possible, the flag has not been set yet and thus this file descriptor won't get closed.
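For completeness, here is a sketch of setting the flag after the fact without clobbering other descriptor flags. The name set_cloexec is hypothetical, and the race described above still applies: the helper only makes the fcntl() step itself correct.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Add FD_CLOEXEC to an existing descriptor, preserving any other
   descriptor flags. Returns -1 on error.
   NOT atomic with respect to a fork() in another thread. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```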
So the only way that is really safe is to explicitly close them and this is not as easy as it may seem!
I've seen a lot of code that does stupid things like this:
for (int i = STDERR_FILENO + 1; i < 256; i++) close(i);
But just because some POSIX systems have a default limit of 256 doesn't mean that this limit cannot be raised. Also, on some systems the default limit is higher to begin with.
Using FD_SETSIZE instead of 256 is equally wrong as just because the select() API has a hard limit by default on most systems doesn't mean that a process cannot have more open file descriptors than this limit (after all you don't have to use select() with them, you can use poll() API as a replacement and poll() has no upper limit on file descriptor numbers).
Using OPEN_MAX instead of 256 is always correct, as that really is the absolute maximum number of file descriptors a process can have. The downside is that OPEN_MAX can theoretically be huge and doesn't reflect the real current runtime limit of a process.
To avoid having to close too many non-existing file descriptors, you can use this code instead:
int fdlimit = (int)sysconf(_SC_OPEN_MAX);
for (int i = STDERR_FILENO + 1; i < fdlimit; i++) close(i);
sysconf(_SC_OPEN_MAX) is documented to update correctly if the open file limit (RLIMIT_NOFILE) has been raised using setrlimit(). The resource limits (rlimits) are the effective limits for a running process and for files they will always have to be between _POSIX_OPEN_MAX (documented as the minimum number of file descriptors a process is always allowed to open, must be at least 20) and OPEN_MAX (must be at least _POSIX_OPEN_MAX and sets the upper limit).
While closing all possible descriptors in a loop is technically correct and will work as desired, it may try to close several thousand file descriptors, most of which will often not exist. Even if the close() call for a non-existing file descriptor is fast (which is not guaranteed by any standard), it may take a while on weaker systems (think of embedded devices, think of small single-board computers), which may be a problem.
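One way to package the sysconf() loop is sketched below. The function name close_fds_from and its cap parameter are assumptions of this example, not part of any standard API: the cap bounds the loop when the reported limit is indeterminate (sysconf() returns -1) or very large, at the cost of possibly missing descriptors numbered above it.

```c
#include <unistd.h>

/* Close every descriptor in [lowfd, limit), where limit is the current
   sysconf(_SC_OPEN_MAX) value clamped to `cap`.
   Returns the number of descriptors that were actually open and closed. */
int close_fds_from(int lowfd, long cap)
{
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit < 0 || limit > cap)
        limit = cap;            /* indeterminate or huge: fall back to cap */

    int closed = 0;
    for (long fd = lowfd; fd < limit; fd++) {
        if (close((int)fd) == 0)
            closed++;
    }
    return closed;
}
```

A call such as close_fds_from(STDERR_FILENO + 1, 65536) between fork() and exec...() would mirror the loop shown earlier.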
So several systems have developed more efficient ways to solve this issue. Famous examples are closefrom() and fdwalk() which BSD and Solaris systems support. Unfortunately The Open Group voted against adding closefrom() to the standard (quote): "it is not possible to standardize an interface that closes arbitrary file descriptors above a certain value while still guaranteeing a conforming environment." (Source) This is of course nonsense, as they make the rules themselves and if they define that certain file descriptors can always be silently omitted from closing if the environment or system requires or the code itself requests that, then this would break no existing implementation of that function and still offer the desired functionality for the rest of us. Without these functions people will use a loop and do exactly what The Open Group tries to avoid here, so not adding it only makes the situation even worse.
On some platforms you are basically out of luck, e.g. macOS, which is fully POSIX conformant. If you don't want to close all file descriptors in a loop on macOS, your only option is to not use fork()/exec...() but posix_spawn() instead. posix_spawn() is a newer API for platforms that don't support process forking; it can be implemented purely in user space on top of fork()/exec...() on platforms that do support forking, and can otherwise use some other API a platform offers for starting child processes. On macOS there is a non-standard flag, POSIX_SPAWN_CLOEXEC_DEFAULT, which treats all file descriptors as if the CLOEXEC flag had been set on them, except for those for which you explicitly specified file actions.
On Linux you can get a list of file descriptors by looking at the path /proc/{PID}/fd/ with {PID} being the process ID of your process (getpid()), that is, if the proc file system has been mounted at all and it has been mounted to /proc (but a lot of Linux tools rely on that, not doing so would break many other things as well). Basically you can limit yourself to close all descriptors listed under this path.
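A rough sketch of the /proc approach, assuming Linux with procfs mounted at /proc. It uses /proc/self/fd, which refers to the calling process, instead of building the path from getpid(). The function name close_listed_fds is hypothetical. Closing descriptors while iterating the directory is commonly done on Linux, but treat that as an assumption rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L   /* for dirfd() */
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every descriptor >= lowfd that is listed in /proc/self/fd.
   Returns the number closed, or -1 if the directory cannot be opened. */
int close_listed_fds(int lowfd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir);      /* the descriptor used for the listing itself */
    int closed = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;           /* not a number: "." and ".." */
        if (fd < lowfd || fd == self)
            continue;
        if (close((int)fd) == 0)
            closed++;
    }
    closedir(dir);
    return closed;
}
```

When the function returns -1, falling back to a plain close() loop is a reasonable choice.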
True story: Once upon a time I wrote a simple little C program that opened a file, and I noticed that the file descriptor returned by open was 4. "That's funny," I thought. "Standard input, output, and error are always file descriptors 0, 1, and 2, so the first file descriptor you open is usually 3."
So I wrote another little C program that started reading from file descriptor 3 (without opening it, that is, but rather, assuming that 3 was a pre-opened fd, just like 0, 1, and 2). It quickly became apparent that, on the Unix system I was using, file descriptor 3 was pre-opened on the system password file. This was evidently a bug in the login program, which was exec'ing my login shell with fd 3 still open on the password file, and the stray fd was in turn being inherited by programs I ran from my shell.
Naturally the next thing I tried was a simple little C program to write to the pre-opened file descriptor 3, to see if I could modify the password file and give myself root access. This, however, didn't work; the stray fd 3 was opened on the password file in read-only mode.
But at any rate, this helps to explain why you shouldn't leave file descriptors open when you exec a child process.
[Footnote: I said "true story", and it mostly is, but for the sake of the narrative I did change one detail. In fact, the buggy version of /bin/login was leaving fd 3 opened on the groups file, /etc/group, not the password file.]

Stdout redirecting (to a file for instance) with a static library in C

I know already how to implement methods regarding usual freopen(), popen() or similar stdout/stdin/stderr -based redirecting mechanisms, but I wondered how should I apply the said mechanism to static (own) libraries in C? Say, I want to use a library to capture any program with printf() commands or so into a file (for instance) without letting it appear on the console - are there some things I need to acknowledge before applying simple fd dups and just calling the library in the main program? Even piping seems to be complex seeing as execing here is risky...
thanks in advance.
There's an old-timers' trick to force the entire process, regardless of what library the code comes from, to have one of the standard IO ports connected to a different filehandle. You simply close the filehandle in question, then open a new one. If you close(1), then open("some_file", O_WRONLY | O_CREAT | O_TRUNC, 0644), then ALL calls that would result in a write to stdout will go to some_file from that point forward.
This works because open() always uses the first file descriptor that isn't currently in use. Presuming that you haven't closed stdin (fd=0), the call to open will get a file descriptor of 1.
There are some caveats. FILE outputs that haven't flushed their buffers will have undefined behavior, but you probably won't be doing this in the middle of execution. Set it up as your process starts and you'll be golden.
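A sketch of the redirection, with one deliberate change from the trick described above: it uses dup2() instead of close(1) followed by open(), so it does not depend on fd 1 being the lowest free number. The function name redirect_stdout_to is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Point file descriptor 1 at `path`, so every later write to stdout,
   from any library in the process, lands in that file.
   Returns 0 on success, -1 on failure. */
int redirect_stdout_to(const char *path)
{
    fflush(stdout);             /* drain buffered output before switching */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd != STDOUT_FILENO)
        close(fd);              /* fd 1 now refers to the file */
    return 0;
}
```

Calling it once at startup, before the library produces any output, avoids the buffering caveat mentioned above.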

C - Proper way to close files when using both open() and fdopen()

So I'm building a Unix minishell in C, and am implementing input, output, and err redirection, and have come across a problem with files. I open my files in a loop where I find redirection operators, and use open(), which returns an fd. I then assign the child's fd accordingly, and call an execute function.
When my shell is just going out and finding programs, and executing them with execvp(), I don't have much of a problem. The only problem is knowing whether or not I need to call close() on the file descriptors before prompting for the next command line. I'm worried about having an fd leak, but don't exactly understand how it works.
My real problem arises when using builtin commands. I have a builtin command called "read" that takes one argument, an environment variable name (could be one that doesn't yet exist). Read then prompts for a value, and assigns that value to the variable. Here's an example:
% read TESTVAR
test value test value test value
% echo ${TESTVAR}
test value test value test value
Well, let's say that I try something like this:
% echo here's another test value > f1
% read TESTVAR < f1
% echo ${TESTVAR}
here's another test value
This works great. Keep in mind that read executes inside the parent process; I don't call read with execvp since it's a builtin. Read uses fgets, which requires a stream variable, not an fd. So after poking around on the IRC forums a bit I was told to use fdopen to get the stream from the file descriptor. So before calling fgets, I call:
rdStream = fdopen(inFD, "r");
then call
if (fgets(buffer, envValLen, rdStream) != buffer)
{
    if (inFD) fclose(rdStream);
    return -1;
}
if (inFD) fclose(rdStream);
As you can see, at the moment I'm closing the stream with fclose(), unless it is equal to stdin (which is 0). Is this necessary? Do I need to close the stream? Or just the file descriptor? Or both? I'm quite confused about which I should close, since they both refer to the same file in different ways. At the moment I'm not closing the fd, however I think that I definitely should. I would just like somebody to help make sure my shell isn't leaking any files, as I want it to be able to execute several thousand commands in a single session without leaking memory.
Thanks, if you guys want me to post anymore code just ask.
The standard says:
The fclose() function shall perform the equivalent of a close() on the
file descriptor that is associated with the stream pointed to by
stream.
So calling fclose is enough; it will also close the descriptor.
FILE is a buffering object from the standard C library. When you call fclose (a standard C function) it will eventually call close (a Unix system function), but only after making sure the C library buffers are flushed. So, I would say, if you use fopen and fwrite then you should use fclose, and not just close, otherwise you risk losing some data.
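To make the ownership rule concrete, here is a small sketch: once fdopen() succeeds, the stream owns the descriptor, and a single fclose() releases both. The helper name read_line_from_fd is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() under strict -std modes */
#include <stdio.h>
#include <unistd.h>

/* Read one line from fd into buf through a stdio stream.
   On success or read failure the descriptor is closed by fclose();
   the caller must not close(fd) again afterwards.
   If fdopen() itself fails, fd is left open for the caller.
   Returns 0 if a line was read, -1 otherwise. */
int read_line_from_fd(int fd, char *buf, int len)
{
    FILE *stream = fdopen(fd, "r");
    if (stream == NULL)
        return -1;

    int ok = (fgets(buf, len, stream) != NULL);
    fclose(stream);             /* closes the underlying fd as well */
    return ok ? 0 : -1;
}
```

Calling close() on the descriptor after this function returns fails, because fclose() has already closed it.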

Preventing reuse of file descriptors

Is there any way in Linux (or more generally in a POSIX OS) to guarantee that during the execution of a program, no file descriptors will be reused, even if a file is closed and another opened? My understanding is that this situation would usually lead to the file descriptor for the closed file being reassigned to the newly opened file.
I'm working on an I/O tracing project and it would make life simpler if I could assume that after an open()/fopen() call, all subsequent I/O to that file descriptor is to the same file.
I'll take either a compile-time or run-time solution.
If it is not possible, I could do my own accounting when I process the trace file (noting the location of all open and close calls), but I'd prefer to squash the problem during execution of the traced program.
Note that POSIX requires:
The open() function shall return a file descriptor for the named file
that is the lowest file descriptor not currently open for that
process.
So in the strictest sense, your request will change the program's environment to be no longer POSIX compliant.
That said, I think your best bet is to use the LD_PRELOAD trick to intercept calls to close and ignore them.
You'd have to write a shared object (.so) that contains a close(2) replacement that opens /dev/null on old FDs, and then use $LD_PRELOAD to load it into the process space before starting the application.
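The core of such a replacement could look like the sketch below. This is not the LD_PRELOAD shim itself, only a hypothetical helper (retire_fd) that a preloaded close() wrapper might call: it swaps the descriptor for /dev/null so the number stays occupied and cannot be handed out again.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* "Close" fd by replacing it with /dev/null, keeping the number in use.
   Returns 0 on success, -1 on failure. */
int retire_fd(int fd)
{
    int null_fd = open("/dev/null", O_RDONLY);
    if (null_fd < 0)
        return -1;
    if (null_fd == fd)
        return 0;               /* fd was not open; it now is /dev/null */

    /* dup2() closes the old file behind fd and installs /dev/null. */
    int rc = dup2(null_fd, fd);
    close(null_fd);
    return rc < 0 ? -1 : 0;
}
```

Every retired descriptor still counts against the process limit, which is the exhaustion risk mentioned in this thread.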
You must already be ptrace-ing the application to intercept its file opening and closing operations.
It would appear trivial to prevent FD re-use by "injecting" dup2(X, Y); close(X); calls into the application, and adjusting Y to be anything you want.
However, the application itself could be using dup2 to force a re-use of previously closed FD, and may not work if you prevent that, so I think you'll just have to deal with this in post-processing step.
Also, it's quite easy to write an app that will run out of FDs if you disallow re-use.

What does select(2) do if you close(2) a file descriptor in a separate thread?

What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?
From some cursory testing, it doesn't return right away. I suspect the outcome is either that (a) it still continues to wait for data, but if you actually tried to read from it you'd get EBADF (possibly -- there's a potential race) or (b) that it pretends as though the file descriptor were never passed in. If the latter case is true, passing in a single fd with no timeout would cause a deadlock if it were closed.
From some additional investigation, it appears that both dwc and bothie are right.
bothie's answer to the question boils down to: it's undefined behavior. That doesn't necessarily mean that it's unpredictable, but that different OSes do it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not, based on this post to the linux-kernel mailing list from 2001.
The argument on the linux-kernel mailing list is essentially that it is undefined (and broken) behavior to rely upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since there is a select(2) call also with a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer. You will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in an EBADF, assuming the fd hasn't been recycled. (A concern that MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
So thank you all for the help.
I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".
Having said that, doing that is very bad practice anyway, as you'd always have potential race conditions: another file descriptor with the same number could be opened by yet another thread immediately after the second thread closed it, and then the selecting thread would end up waiting on the wrong one.
As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
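The reuse is easy to observe in a single thread. The sketch below (reopen_after_close is a hypothetical helper) closes a descriptor and immediately opens another file. Assuming no lower-numbered descriptor was freed in the meantime and no other thread opens anything, the lowest-free rule hands back the same number for a different open file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close fd, then open `path` read-only.
   Returns the new descriptor, or -1 on failure. */
int reopen_after_close(int fd, const char *path)
{
    close(fd);
    return open(path, O_RDONLY);
}
```

A thread still blocked in select() on the old number would then be watching whatever the new descriptor refers to.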
The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. The main use is for server applications, which open a bunch of file descriptors and then wait for anything to do on them (accept new connections, read requests or send the responses). Those file descriptors will be opened in non-blocking I/O mode so that the server process won't hang in a syscall at any time.
This additionally means there is no need for separate threads, because all the work that could be done in the thread can be done prior to the select call as well. And if the work takes long, then it can be interrupted: select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.
Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why shall it close the file descriptor?
The POSIX standard doesn't provide any hints about what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect that the result will be very different between different operating systems and even between versions of the same OS.
Regards, Bodo
It's a little confusing what you're asking...
select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere then there's no reason for select() to wake up.
If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.
