How to know the next available file descriptor in C?

Calling a function like open in C returns the next available file descriptor and uses it up. Is there a way to simply ask my system what the next free fd will be instead, i.e. without consuming it?

By the time you ask what the "next" one is, another component (library, thread, etc) may immediately grab it and use it, so it will no longer be free.
The information on what the next unused descriptor is, is completely worthless, so it is not available.
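To illustrate the point above, here is a minimal sketch of the closest thing to "asking": briefly take the lowest free descriptor and release it again. The helper name peek_lowest_free_fd is hypothetical, and the use of /dev/null is an assumption. The returned number is only a snapshot; any other thread or library can claim it right after the close().

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the descriptor number that was the lowest
 * free one at the moment of the call, or -1 on error. The value is stale
 * as soon as the function returns. */
int peek_lowest_free_fd(void)
{
    /* open() returns the lowest descriptor not currently open. */
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;
    close(fd); /* release it again; anyone may now take this number */
    return fd;
}
```

In a single-threaded program the next open() will normally return the same number, but nothing guarantees that once other threads or libraries are involved.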

Related

Closing open directories on exit()

I would like to know if there is a way to access a list of all open directories from the current process? I have a function that opens many directories recursively but exits the program as soon as something is wrong. Of course, I would like to close all directories before calling exit() without having to keep track of everything I open. Is this even possible?
Thanks!
I have a function that opens many directories recursively but exits the program as soon as something is wrong.
Of course, I would like to close all directories before calling exit() without having to keep track of everything I open.
I think your very approach is wrong. What is the point of opening the directories if you don't keep a handle on them?
You should keep a reference to the opened directory as long as you need it and discard it as soon as you can.
Keep in mind that the number of open file descriptors is normally limited, e.g. to 1024.
You do not need to do this as exit() will (eventually) exit the process, which will close all open file descriptors whether for directories or real files.
However, you absolutely do need to worry about valgrind and friends reporting this, as this means fds are leaking in your program. But the solution is not to hunt around for open directories, but rather to simply ensure each opendir is matched by a closedir. That's what valgrind is prompting you to do.
When you exit(), file handles are close()d. This is good for one-time tools, but not good practice in the long run.
You should instead walk back up the recursion, close()ing as you go. Replace, for example:
exit(1);
with:
close(current_fd);
return NULL;
Change your recursive call to:
if (thisfunc(...) == NULL) {
close(current_fd);
return NULL;
}
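For directories opened with opendir(), the same idea can be sketched as follows. This is only an illustration, not the asker's code: the function walk, its entry-counting behavior, and the 4096-byte path buffer are all assumptions. Every error path calls closedir() before returning, so nothing is left open when the error reaches the top of the recursion.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: counts the entries below `path`.
 * Returns -1 on any error, after closing the directory it opened. */
long walk(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        char child[4096]; /* assumed maximum path length */
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        struct stat st;
        if (n < 0 || (size_t)n >= sizeof child || lstat(child, &st) == -1) {
            closedir(dir); /* close before propagating the error */
            return -1;
        }

        ++count;
        if (S_ISDIR(st.st_mode)) {
            long sub = walk(child);
            if (sub == -1) {
                closedir(dir); /* unwind: each level closes its own handle */
                return -1;
            }
            count += sub;
        }
    }
    closedir(dir);
    return count;
}
```

The caller at the top level can then decide whether to exit, with every opendir() already matched by a closedir().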

Program restart self on update

I checked everywhere so I am hopefully not repeating a question.
I want to add a portable update feature to some C code I am writing. The program may not be in any specific location, and I would prefer to keep it to a single binary (No dynamic library loading)
Then after the update is complete, I want the program to be able to restart (not a loop, actually reload from the HDD)
Is there any way to do this in C on Linux?
If you know where the program is saved on disk, then you can exec() the program:
char *args[] = { "/opt/somewhere/bin/program", 0 };
execv(args[0], args);
fprintf(stderr, "Failed to reexecute %s\n", args[0]);
exit(1);
If you don't know where the program is on disk, either use execvp() to search for it on $PATH, or find out. On Linux, use the /proc file system — and /proc/self/exe specifically; it is a symlink to the executable, so you would need to use readlink() to get the value. Beware: readlink() does not null terminate the string it reads.
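A minimal sketch of the /proc/self/exe lookup, assuming Linux with /proc mounted. The helper name self_exe_path is hypothetical. It adds the terminating null byte that readlink() omits, and treats a completely filled buffer as possible truncation.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: stores the path of the running executable in buf.
 * Returns 0 on success, -1 on error or if the path may have been truncated. */
int self_exe_path(char *buf, size_t size)
{
    if (size < 2)
        return -1;

    /* Leave room for the terminator; readlink() does not write one. */
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n == -1 || (size_t)n >= size - 1)
        return -1; /* a full buffer may mean the path was cut short */

    buf[n] = '\0';
    return 0;
}
```

The resulting path can be passed as the first argument of execv(), together with the argument vector saved from main().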
If you want, you can arrange to pass an argument which indicates to the new process that it is being restarted after update; the bare minimum argument list I provided can be as complex as you need (a list of the files currently open for edit, perhaps, or any other appropriate information and options).
Also, don't forget to clean up before reexecuting — cleanly close any open files, for example. Remember, open file descriptors are inherited by the executed process (unless you mark them for closure on exec with FD_CLOEXEC or O_CLOEXEC), but the new process won't know what they're for unless you tell it (in the argument list) so it won't be able to use them. They'll just be cluttering up the process without helping in the least.
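Marking an already open descriptor for closure on exec can be sketched like this. The helper name set_cloexec is hypothetical; fcntl() with F_GETFD and F_SETFD is the standard interface. For descriptors you open yourself, passing O_CLOEXEC to open() avoids the separate step.

```c
#include <fcntl.h>

/* Hypothetical helper: sets FD_CLOEXEC on fd so that a later exec*()
 * closes it automatically. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    /* Preserve any other descriptor flags that are already set. */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```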
Yes, you need to call the proper exec() function. There might be some complications, it can be troublesome to find the absolute path name. You need to:
Store the current directory in main().
Store the argc and (all) argv[] values from main().
Since calling exec() replaces the current process, that should be all you need to do in order to restart yourself. You might also need to take care to close any opened files, since they might otherwise be "inherited" back to yourself, which is seldom what you want.

C get all open file descriptors

I want to implement behavior in my C program so that if a SIGINT happens, I close all open file descriptors. Is there a simple way to get a list of them?
I'd use brute force: for (i = 0; i < fd_max; ++i) close (i);. Quick and pretty portable.
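A slightly fuller sketch of the brute-force loop. The names fd_limit and close_fd_range are hypothetical, and the fallback of 1024 is an assumption for systems where sysconf() cannot report a limit. Note that on some systems the limit is very large, which makes the loop slow.

```c
#include <unistd.h>

/* Hypothetical helper: upper bound for descriptor numbers in this process. */
long fd_limit(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    return max > 0 ? max : 1024; /* assumed fallback if the limit is unknown */
}

/* Hypothetical helper: closes every descriptor in [first_fd, limit).
 * Returns how many descriptors were actually open and got closed. */
int close_fd_range(int first_fd, long limit)
{
    int closed = 0;
    for (long fd = first_fd; fd < limit; ++fd) {
        if (close((int)fd) == 0) /* EBADF for unused numbers is ignored */
            ++closed;
    }
    return closed;
}
```

Calling close_fd_range(3, fd_limit()) leaves stdin, stdout, and stderr alone and closes everything else.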
Keep track of all of your open file descriptors and close them individually.
In the general case, a library you're using might have an open file, and closing it will cause that library to misbehave.
In fact, the same problem could exist in your own code, because if you close file descriptors indiscriminately but another part of your program still remembers the file descriptor and tries to use it, it will get an unexpected error or (if other files have been opened since) operate on the wrong file. It is much better for the component responsible for opening a file to also be responsible for closing it.
You could read the contents of /proc/<pid>/fd, if available.
But be aware of the potential race: your application may close some descriptors or open new ones between the moment you read /proc/<pid>/fd and the moment you close what you read.
So, in conclusion, I recommend Kevin Reid's approach.
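Reading the directory can be sketched as below, assuming Linux with /proc mounted and using /proc/self/fd for the current process. The helper name list_open_fds is hypothetical. The descriptor that opendir() itself uses is skipped, and the race described above still applies to whatever the caller does with the list.

```c
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical helper: stores up to `cap` open descriptor numbers in `out`.
 * Returns the number stored, or -1 if /proc/self/fd cannot be opened. */
int list_open_fds(int *out, int cap)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir); /* the descriptor used for this listing */
    int n = 0;
    struct dirent *ent;
    while (n < cap && (ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue; /* skips "." and ".." */
        if (fd == self)
            continue;
        out[n++] = (int)fd;
    }
    closedir(dir);
    return n;
}
```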
My solution for POSIX systems:
Every newly opened fd gets the lowest value available.
Make a wrapper function around open(2).
Your new function opens (and returns) the requested fd and passes its value to a function called define_if_is_the_higtest_fd_and_store_it().
You should have an int hightest_fd_saved accessible only through a singleton function (there is only one descriptor table) named save_fd(). Its initial value is 3, because stderr is 2.
Set your handler function for SIGINT. Inside it, loop over [3, return_fd()] and close each descriptor.
I think that's it...
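The steps above could look roughly like this. It is a sketch, not the answerer's code: the names tracked_open, close_tracked, and install_sigint_cleanup are hypothetical, the wrapper ignores the optional mode argument of open(), and the exit status 130 is an assumption. Descriptors opened by libraries or by other calls such as socket() are only covered if their numbers fall below the tracked maximum.

```c
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Highest descriptor seen so far; 2 means "nothing beyond stderr". */
static volatile sig_atomic_t highest_fd = 2;

/* Hypothetical wrapper around open(2) that remembers the highest fd. */
int tracked_open(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd > highest_fd)
        highest_fd = fd;
    return fd;
}

/* Closes everything from 3 up to the highest fd recorded. */
void close_tracked(void)
{
    for (int fd = 3; fd <= highest_fd; ++fd)
        close(fd); /* close() is async-signal-safe */
    highest_fd = 2;
}

static void on_sigint(int signo)
{
    (void)signo;
    close_tracked();
    _exit(130); /* assumed exit status: 128 + SIGINT */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_sigint_cleanup(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}
```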

Preventing reuse of file descriptors

Is there any way in Linux (or more generally in a POSIX OS) to guarantee that during the execution of a program, no file descriptors will be reused, even if a file is closed and another opened? My understanding is that this situation would usually lead to the file descriptor for the closed file being reassigned to the newly opened file.
I'm working on an I/O tracing project and it would make life simpler if I could assume that after an open()/fopen() call, all subsequent I/O to that file descriptor is to the same file.
I'll take either a compile-time or run-time solution.
If it is not possible, I could do my own accounting when I process the trace file (noting the location of all open and close calls), but I'd prefer to squash the problem during execution of the traced program.
Note that POSIX requires:
The open() function shall return a file descriptor for the named file
that is the lowest file descriptor not currently open for that
process.
So in the strictest sense, your request will change the program's environment to be no longer POSIX compliant.
That said, I think your best bet is to use the LD_PRELOAD trick to intercept calls to close and ignore them.
You'd have to write a shared object (.so) that contains a close(2) replacement which opens /dev/null on old FDs, and then use $LD_PRELOAD to load it into the process space before starting the application.
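The core of such a close() replacement might look like the sketch below. The function name retire_fd is hypothetical, and the LD_PRELOAD interposition itself (exporting a symbol named close from the shared object) is not shown. Instead of freeing the number, the descriptor is redirected to /dev/null with dup2(), so later open() calls cannot be handed the same number.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: instead of closing fd, point it at /dev/null so the
 * number stays occupied. Returns 0 on success, -1 on error. */
int retire_fd(int fd)
{
    int null_fd = open("/dev/null", O_RDWR);
    if (null_fd == -1)
        return -1;
    if (null_fd == fd)
        return 0; /* fd was not open; it is now parked on /dev/null */

    /* dup2() atomically closes the old file and reuses the same number. */
    int rc = dup2(null_fd, fd);
    close(null_fd);
    return rc == -1 ? -1 : 0;
}
```

As noted in the thread, a program that opens many files will eventually run out of descriptors under this scheme.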
You must already be ptraceing the application to intercept its file opening and closing operations.
It would appear trivial to prevent FD re-use by "injecting" dup2(X, Y); close(X); calls into the application, and adjusting Y to be anything you want.
However, the application itself could be using dup2 to force a re-use of previously closed FD, and may not work if you prevent that, so I think you'll just have to deal with this in post-processing step.
Also, it's quite easy to write an app that will run out of FDs if you disallow re-use.

What does select(2) do if you close(2) a file descriptor in a separate thread?

What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?
From some cursory testing, it does not return right away. I suspect the outcome is either that (a) it still continues to wait for data, but if you actually tried to read from it you'd get EBADF (possibly -- there's a potential race) or (b) that it pretends as though the file descriptor were never passed in. If the latter case is true, passing in a single fd with no timeout would cause a deadlock if it were closed.
From some additional investigation, it appears that both dwc and bothie are right.
bothie's answer to the question boils down to: it's undefined behavior. That doesn't necessarily mean that it's unpredictable, but that different OSes do it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not, based on this post to the linux-kernel mailing list from 2001.
The argument on the linux-kernel mailing list is essentially that it is undefined (and broken) behavior to rely upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since there is a select(2) call also with a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer. You will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in an EBADF, assuming the fd hasn't been recycled. (A concern that MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
So thank you all for the help.
I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".
Having said that, doing that is very bad practice anyway, as you'd always have potential race conditions: another file descriptor with the same number could be opened by yet another thread immediately after the second thread closed it, and then the selecting thread would end up waiting on the wrong one.
As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. The main use is for server applications, which open a bunch of file descriptors and then wait for something to do on them (accept new connections, read requests, or send responses). Those file descriptors are opened in non-blocking I/O mode so that the server process never hangs in a syscall.
This additionally means there is no need for separate threads, because all the work that could be done in a thread can be done before the select call as well. And if the work takes long, then it can be interrupted: select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.
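The zero-timeout check described above can be sketched as follows. The helper names readable_now and service_fd are hypothetical. A timeout of {0,0} makes select() return immediately, so a long computation can call this between slices of work.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable right now, 0 if not,
 * -1 on error. Never blocks. */
int readable_now(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1; /* fd_set cannot represent this descriptor */

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv = {0, 0}; /* zero timeout: poll and return */
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc == -1)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical helper: reads from fd only if data is pending.
 * Returns 1 if read() was called (its result is stored in *nread),
 * 0 if nothing was pending, -1 if the readiness check failed. */
int service_fd(int fd, char *buf, size_t len, ssize_t *nread)
{
    int ready = readable_now(fd);
    if (ready <= 0)
        return ready;
    *nread = read(fd, buf, len);
    return 1;
}
```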
Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why shall it close the file descriptor?
The POSIX standard doesn't give any hint about what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect the result to differ between operating systems and even between versions of the same OS.
Regards, Bodo
It's a little confusing what you're asking...
Select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere then there's no reason for select() to wake up.
If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.

Resources