File size lookup in C

I was wondering if there was any significant performance increase in using sys/stat.h versus fseek() and ftell()?

Choosing between fstat() and the fseek()/ftell() combination, there isn't going to be much difference. The single function call should be slightly quicker than the double function call, but the difference won't be great.
Choosing between stat() and the combination isn't a very fair comparison. For the combination calls, the hard work was done when the file was opened, so the inode information is readily available. The stat() call has to parse the file path and then report what it finds. It should almost always be slower - unless you recently opened the file anyway so the kernel has most of the information cached. Even so, the pathname lookup required by stat() is likely to make it slower than the combination.
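To make the already-open case concrete, here is a minimal sketch of reading a stream's size with fstat() on its descriptor (the helper name is mine; error handling kept short):

#include <stdio.h>
#include <sys/stat.h>

/* Size of an already-open stream: the kernel already has the inode
   information at hand, so no pathname lookup is needed. */
long stream_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) == -1)
        return -1;  /* fstat failed */
    return (long)st.st_size;
}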

If you're not sure, try it!
I just coded this test. I generated 10,000 files of 2KB each, and iterated over all of them, asking for their file size.
Results on my machine, measured with the "time" command and averaged over 10 runs:
fseek/ftell version: 0.22 secs
stat version: 0.06 secs
So, the winner (at least on my machine): stat!
Here's the test code:
#include <stdio.h>
#include <sys/stat.h>

#if 0
size_t getFileSize(const char *filename)
{
    struct stat st;
    if (stat(filename, &st) != 0)
        return 0;  /* stat failed */
    return st.st_size;
}
#else
size_t getFileSize(const char *filename)
{
    FILE *fd = fopen(filename, "rb");
    if (!fd) {
        printf("ERROR on file %s\n", filename);
        return 0;
    }
    fseek(fd, 0, SEEK_END);
    size_t size = ftell(fd);
    fclose(fd);
    return size;
}
#endif

int main()
{
    char buf[256];
    int i;
    for (i = 0; i < 10000; ++i) {
        sprintf(buf, "file_%d", i);
        if (getFileSize(buf) != 2048)
            printf("WRONG!\n");
    }
    return 0;
}

Logically, one would assume that fseek(), when asked to seek to the end of the file, uses stat to know how far to seek, or rather, where the end of the file is.
This would make fseek slower than using the facilities directly, and it also requires you to fopen the file in the first place.
Still, any performance difference is likely to be negligible, and if you need to open the file for some reason anyway, fseek/ftell likely improves the readability of your code significantly.

You mainly want sys/stat.h to query the stats of a file: for example, to tell whether a path is a file or a directory.
However, if you want to do manipulations with the file's contents, then you'll probably want ftell() and fseek(), since those operate on the file stream itself.
So in terms of performance, it's really about what you need.
Hope it helps :) Cheers!

Depending on the circumstances, stat() can be hundreds of times faster than seek()/tell(). I am currently toying around with sshfs/FUSE, and getting the file size of a few thousand files with seek()/tell() takes well over a minute, while doing it with stat() takes a second. So the difference is pretty huge when working over sshfs/FUSE.

Related

Same program is 10 times slower on Windows

I wrote a simple C program that copies 10 million bytes from a file and writes them in reverse order to another file. (This is done one byte at a time; I know it's not efficient, but it's just for testing.) I don't understand why it takes 2.5 seconds on Linux while it takes more than 20 seconds on Windows. I run the same program, changing only the paths.
I use Windows 10 and Arch Linux; the files are on an NTFS partition.
code on windows
#include <stdio.h>
#include <time.h>

void get_nth_byte(FILE *fp, int nth_index, unsigned char *output) {
    fseek(fp, nth_index, SEEK_SET);
    fread(output, sizeof(unsigned char), 1, fp);
}

int main() {
    clock_t begin = clock();

    FILE *input  = fopen("C:\\Users\\piero\\Desktop\\input.txt", "rb");
    FILE *output = fopen("C:\\Users\\piero\\Desktop\\output.txt", "wb");
    unsigned char byte;
    for (int i = 10000000; i > 0; i--) {
        get_nth_byte(input, i, &byte);
        fwrite(&byte, sizeof(unsigned char), 1, output);
    }

    clock_t end = clock();
    double result = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f", result);
    return 0;
}
code on linux
#include <stdio.h>
#include <time.h>

void get_nth_byte(FILE *fp, int nth_index, unsigned char *output) {
    fseek(fp, nth_index, SEEK_SET);
    fread(output, sizeof(unsigned char), 1, fp);
}

int main() {
    clock_t begin = clock();

    FILE *input  = fopen("/run/media/piero/Windows/Users/piero/Desktop/input.txt", "rb");
    FILE *output = fopen("/run/media/piero/Windows/Users/piero/Desktop/output.txt", "wb");
    unsigned char byte;
    for (int i = 10000000; i > 0; i--) {
        get_nth_byte(input, i, &byte);
        fwrite(&byte, sizeof(unsigned char), 1, output);
    }

    clock_t end = clock();
    double result = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f", result);
    return 0;
}
output on linux : 2.224549
output on windows : 25.349647
UPDATE
I solved the problem by using Cygwin rather than MinGW; now it takes about 4.3 seconds.
This is a great demonstration of how it's not the code we write that runs, it's the executable that the compiler makes from the code that runs.
It is possible that your Windows C compiler is not as advanced as your Linux C compiler, and is not optimizing your code as well as it could, or it's possible that the libraries that the Windows compiler is linking to for fread() and fwrite() are slower than the equivalent libraries in the Linux system.
If I had to guess, the Linux C compiler probably noticed that it would be more efficient to read more than one byte at a time, and that it could do so without affecting the semantics of your program, while the Windows compiler either didn't infer the same, or wasn't able to optimize in the same way due to some underlying proprietary filesystem detail that only Microsoft engineers understand.
I can't say for sure without a peek at the disassembled binaries.
One of the strengths of Unix/Linux is that files are designed to be treated as streams of bytes, with it being maximally easy and efficient to seek to the n'th byte using fseek or lseek.
Non-Unix operating systems, such as Windows, tend to have to work much harder to implement those seek operations. In the worst case, they may actually need to read through the file, counting characters as they go.
Your code opens both files in binary mode, and this should reduce the need for the fseek implementation to perform any expensive emulations. In text mode, a 10x performance penalty for heavy fseek use wouldn't surprise me. I'm much more surprised you're seeing it in binary mode.
[Disclaimer: strictly speaking, in text mode fseek is not defined as seeking to an arbitrary byte offset at all, but rather, only to a position defined by the number returned by a previous call to ftell. If an implementation takes advantage of that freedom, it can reduce the performance penalty for text-mode fseek operations, also, but it then means that code like yours, that constructs positions to seek to on the assumption that they're pure byte offsets, may not work at all.]
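If the slowdown really is in the C runtime's buffering, one cheap experiment is to hand both streams a larger buffer with setvbuf() before the first read or write; whether it helps depends on how that runtime implements fseek(). A sketch (the 1 MiB size is an arbitrary choice of mine):

#include <stdio.h>

/* setvbuf() must be called after fopen() but before any other
   operation on the stream; _IOFBF requests full buffering. */
static char in_buf[1 << 20], out_buf[1 << 20];

void tune_buffers(FILE *input, FILE *output)
{
    setvbuf(input, in_buf, _IOFBF, sizeof in_buf);
    setvbuf(output, out_buf, _IOFBF, sizeof out_buf);
}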

Is using fseek() a recommended method for computing the size of a file?

In C, we can find the size of a file using the fseek() function, like this:
if (fseek(fp, 0L, SEEK_END) != 0)
{
    // Handle repositioning error
}
So, I have a question: is using fseek() and ftell() a recommended method for computing the size of a file?
If you're on Linux or some other UNIX-like system, what you want is the stat function:
struct stat statbuf;
int rval;

rval = stat(path_to_file, &statbuf);
if (rval == -1) {
    perror("stat failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;

rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
    perror("_stati64 failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() approach works sometimes.
if (fseek(fp, 0L, SEEK_END) == 0) {
    printf("Size: %ld\n", ftell(fp));
}
Problems:
If the file size exceeds LONG_MAX, the return value of long int ftell(FILE *stream) is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information." C11dr §7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. @Evert: this most often applies to platforms earlier than today's, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution to finding file size is a non-portable, platform-specific one; see @dbush's answer for a good example.
Note: if code attempts to allocate memory based on the file size, the file size can easily exceed the memory available.
Due to these issues, I do not recommend this approach. Typically the problem should be re-worked so it does not need the file size up front, instead growing the data as more input is processed; a minimal sketch of that follows the disclaimer below.
Language-lawyer disclaimer: note that C spec footnotes are informative and so not necessarily normative.
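To illustrate the grow-as-you-read approach just mentioned, a minimal sketch (the doubling strategy and the name slurp are my own choices):

#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream without asking for its size up front,
   doubling the buffer as more input arrives. Returns NULL on
   failure; *out_len receives the number of bytes read. */
char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {               /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}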
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number of chars read is the size of the file. This is a tedious way of getting the size, as you have to read the whole file, but if the operating system doesn't allow positioning the file pointer arbitrarily, it is the only way.
Or you can move the pointer to the end-of-file position. This is the lseek(2) you showed in the question, but be careful: you have to make the system call twice, because the call that seeks to the end returns the size (the resulting offset), and a second call is needed to restore the file pointer to its original place. A sketch follows below.
Or you can use the stat(2) system call, which will tell you all the administrative information of the file: the owner, group, permissions, size, number of blocks the file occupies on disk, the device the file belongs to, the number of directory entries pointing to it, etc. This gets you all of that with a single syscall.
Other methods you mention (like the ftell(3) stdio library call) will also work, with the same issue of requiring two calls to set and then retrieve/restore the file pointer, but they have the problem of involving libraries you are probably not using for anything else. It would be convoluted to get a FILE * (e.g. with fdopen(3)) from an int file descriptor just to be able to call ftell(3) on it (twice), and then fclose(3) it again.
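Here is a sketch of the seek-based method just described, saving and restoring the current offset (the helper name is mine; assumes a seekable descriptor):

#include <sys/types.h>
#include <unistd.h>

/* File size via lseek(2): seek to the end (the return value is
   the resulting offset, i.e. the size), then put the file
   pointer back where it was. */
off_t fd_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);
    if (here == (off_t)-1)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    lseek(fd, here, SEEK_SET);  /* restore the original position */
    return size;
}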

How to increase the limit of "maximum open files" in C on Mac OS X

The default limit for the maximum number of open files on Mac OS X is 256 (ulimit -n), and my application needs about 400 file handles.
I tried to change the limit with setrlimit(), but even though the function executes correctly, I'm still limited to 256.
Here is the test program I use:
#include <stdio.h>
#include <sys/resource.h>
main()
{
    struct rlimit rlp;
    FILE *fp[10000];
    int i;

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("before %d %d\n", rlp.rlim_cur, rlp.rlim_max);

    rlp.rlim_cur = 10000;
    setrlimit(RLIMIT_NOFILE, &rlp);

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("after %d %d\n", rlp.rlim_cur, rlp.rlim_max);

    for (i = 0; i < 10000; i++) {
        fp[i] = fopen("a.out", "r");
        if (fp[i] == 0) { printf("failed after %d\n", i); break; }
    }
}
and the output is:
before 256 -1
after 10000 -1
failed after 253
I cannot ask the people who use my application to poke inside a /etc file or something. I need the application to do it by itself.
rlp.rlim_cur = 10000;
Two things.
1st. LOL. Apparently you have found a bug in Mac OS X's stdio. If I fix your program up (add error handling, etc.) and also replace fopen() with the open() syscall, I can easily reach the limit of 10000 (which is 240 fds below my 10.6.3 OPEN_MAX limit of 10240).
2nd. RTFM: man setrlimit. The max-open-files case has to be treated specially with respect to OPEN_MAX; see the sketch below.
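In practice that means clamping the requested soft limit; a hedged sketch (OPEN_MAX comes from <limits.h> on Mac OS X, and the helper name is mine):

#include <limits.h>        /* OPEN_MAX on Mac OS X */
#include <sys/resource.h>

/* On Mac OS X, setrlimit(RLIMIT_NOFILE, ...) rejects rlim_cur
   values above OPEN_MAX, so clamp the request before applying. */
int raise_nofile_limit(rlim_t want)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = want < OPEN_MAX ? want : OPEN_MAX;
    return setrlimit(RLIMIT_NOFILE, &rl);
}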
etresoft found the answer on the Apple discussion board:
The whole problem here is your printf() function. When you call printf(), you are initializing internal data structures to a certain size. Then, you call setrlimit() to try to adjust those sizes. That function fails because you have already been using those internal structures with your printf(). If you use two rlimit structures (one for before and one for after), and don't print them until after calling setrlimit, you will find that you can change the limits of the current process even in a command line program. The maximum value is 10240.
For some reason (perhaps binary compatibility), you have to define _DARWIN_UNLIMITED_STREAMS before including <stdio.h>:
#define _DARWIN_UNLIMITED_STREAMS
#include <stdio.h>
#include <sys/resource.h>
main()
{
    struct rlimit rlp;
    FILE *fp[10000];
    int i;

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("before %d %d\n", rlp.rlim_cur, rlp.rlim_max);

    rlp.rlim_cur = 10000;
    setrlimit(RLIMIT_NOFILE, &rlp);

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("after %d %d\n", rlp.rlim_cur, rlp.rlim_max);

    for (i = 0; i < 10000; i++) {
        fp[i] = fopen("a.out", "r");
        if (fp[i] == 0) { printf("failed after %d\n", i); break; }
    }
}
prints
before 256 -1
after 10000 -1
failed after 9997
This feature appears to have been introduced in Mac OS X 10.6.
This may be a hard limitation of your libc. Some versions of solaris have a similar limitation because they store the fd as an unsigned char in the FILE struct. If this is the case for your libc as well, you may not be able to do what you want.
As far as I know, things like setrlimit only affect how many files you can open with open (fopen is almost certainly implemented in terms of open). So if this limitation is at the libc level, you will need an alternate solution.
Of course you could always not use fopen and instead use the open system call available on just about every variant of unix.
The downside is that you have to use write and read instead of fwrite and fread, which don't do things like buffering (that's all done in your libc, not by the OS itself). So it could end up being a performance bottleneck.
Can you describe the scenario that requires 400 files open simultaneously? I am not saying that there is no case where that is needed. But if you describe your use case more clearly, then perhaps we can recommend a better solution.
I know it sounds like a silly question, but do you really need 400 files open at the same time?
By the way, are you running this code as root?
Mac OS doesn't let us change the limit as easily as many other Unix-based operating systems do. We have to create two files:
/Library/LaunchDaemons/limit.maxfiles.plist
/Library/LaunchDaemons/limit.maxproc.plist
describing the max proc and max files limits. The ownership of the files needs to be changed to 'root:wheel'.
This alone doesn't solve the problem; by default, recent versions of macOS use 'csrutil', which we need to disable. To do that, reboot the Mac in recovery mode and disable csrutil from the terminal there.
Now we can change the max open file handle limit from the terminal itself (even in normal boot mode). This method is explained in detail at the following link: http://blog.dekstroza.io/ulimit-shenanigans-on-osx-el-capitan/
It works for OS X El Capitan and macOS Sierra.

How to use /dev/random or urandom in C?

I want to use /dev/random or /dev/urandom in C. How can I do it? I don't know how to handle them in C; if someone knows, please tell me how. Thank you.
In general, it's a better idea to avoid opening files to get random data, because of how many points of failure there are in the procedure.
On recent Linux distributions, the getrandom system call can be used to get crypto-secure random numbers, and it cannot fail if GRND_RANDOM is not specified as a flag and the read amount is at most 256 bytes.
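A minimal sketch of that call (assuming glibc 2.25 or newer, which exposes it in <sys/random.h>):

#include <sys/random.h>   /* getrandom(), glibc 2.25+ */
#include <stdio.h>

int main(void)
{
    unsigned char buf[32];
    /* Without GRND_RANDOM, and for requests of at most 256 bytes,
       getrandom() does not fail once the entropy pool is ready. */
    if (getrandom(buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        return 1;
    for (size_t i = 0; i < sizeof buf; i++)
        printf("%02x", buf[i]);
    putchar('\n');
    return 0;
}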
As of October 2017, OpenBSD, Darwin and Linux (with -lbsd) now all have an implementation of arc4random that is crypto-secure and that cannot fail. That makes it a very attractive option:
char myRandomData[50];
arc4random_buf(myRandomData, sizeof myRandomData); // done!
Otherwise, you can use the random devices as if they were files. You read from them and you get random data. I'm using open/read here, but fopen/fread would work just as well.
int randomData = open("/dev/urandom", O_RDONLY);
if (randomData < 0)
{
    // something went wrong
}
else
{
    char myRandomData[50];
    ssize_t result = read(randomData, myRandomData, sizeof myRandomData);
    if (result < 0)
    {
        // something went wrong
    }
}
You may read many more random bytes before closing the file descriptor. /dev/urandom never blocks and always fills in as many bytes as you've requested, unless the system call is interrupted by a signal. It is considered cryptographically secure and should be your go-to random device.
/dev/random is more finicky. On most platforms, it can return fewer bytes than you've asked for and it can block if not enough bytes are available. This makes the error handling story more complex:
int randomData = open("/dev/random", O_RDONLY);
if (randomData < 0)
{
    // something went wrong
}
else
{
    char myRandomData[50];
    size_t randomDataLen = 0;
    while (randomDataLen < sizeof myRandomData)
    {
        ssize_t result = read(randomData, myRandomData + randomDataLen, (sizeof myRandomData) - randomDataLen);
        if (result < 0)
        {
            // something went wrong
        }
        randomDataLen += result;
    }
    close(randomData);
}
There are other accurate answers above. I needed to use a FILE* stream, though. Here's what I did...
int byte_count = 64;
char data[64];
FILE *fp = fopen("/dev/urandom", "r");
if (fp != NULL)
{
    fread(&data, 1, byte_count, fp);
    fclose(fp);
}
Just open the file for reading and then read data. In C++11 you may wish to use std::random_device which provides cross-platform access to such devices.
Zneak is 100% correct. It's also very common to read a buffer of random numbers that is slightly larger than what you'll need on startup. You can then populate an array in memory, or write them to your own file for later reuse.
A typical implementation of the above:
#include <stdint.h>  /* for int64_t */

typedef struct prandom {
    struct prandom *prev;
    int64_t number;
    struct prandom *next;
} prandom_t;
This becomes more or less like a tape that just advances which can be magically replenished by another thread as needed. There are a lot of services that provide large file dumps of nothing but random numbers that are generated with much stronger generators such as:
Radioactive decay
Optical behavior (photons hitting a semi transparent mirror)
Atmospheric noise (not as strong as the above)
Farms of intoxicated monkeys typing on keyboards and moving mice (kidding)
Don't use 'pre-packaged' entropy for cryptographic seeds, in case that doesn't go without saying. Those sets are fine for simulations, not fine at all for generating keys and such.
Quality concerns aside, if you need a lot of numbers for something like a Monte Carlo simulation, it's much better to have them available in a way that will not cause read() to block.
However, remember, the randomness of a number is as deterministic as the complexity involved in generating it. /dev/random and /dev/urandom are convenient, but not as strong as using a HRNG (or downloading a large dump from a HRNG). Also worth noting that /dev/random refills via entropy, so it can block for quite a while depending on circumstances.
zneak's answer covers it simply, however the reality is more complicated than that. For example, you need to consider whether /dev/{u}random really is the random number device in the first place. Such a scenario may occur if your machine has been compromised and the devices replaced with symlinks to /dev/zero or a sparse file. If this happens, the random stream is now completely predictable.
The simplest way (at least on Linux and FreeBSD) is to perform an ioctl call on the device that will only succeed if the device is a random generator:
int data;
int result = ioctl(fd, RNDGETENTCNT, &data);
// Upon success data now contains amount of entropy available in bits
If this is performed before the first read of the random device, then there's a fair bet that you've got the random device. So @zneak's answer can be extended to:
int randomData = open("/dev/random", O_RDONLY);
int entropy;
int result = ioctl(randomData, RNDGETENTCNT, &entropy);
if (result == -1) {
    // Error - /dev/random isn't actually a random device
    return;
}
if (entropy < sizeof(int) * 8) {
    // Error - there's not enough bits of entropy in the random device to fill the buffer
    return;
}

int myRandomInteger;
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomInteger)
{
    ssize_t result = read(randomData, ((char*)&myRandomInteger) + randomDataLen, (sizeof myRandomInteger) - randomDataLen);
    if (result < 0)
    {
        // error, unable to read /dev/random
    }
    randomDataLen += result;
}
close(randomData);
The Insane Coding blog covered this and other pitfalls not so long ago; I strongly recommend reading the entire article. I have to give credit to them, since that's where this solution was pulled from.
Edited to add (2014-07-25)...
Coincidentally, I read last night that as part of the LibReSSL effort, Linux appears to be getting a getrandom() syscall. As of the time of writing, there's no word on when it will be available in a general kernel release. However, this would be the preferred interface for getting cryptographically secure random data, as it removes all the pitfalls that access via files brings. See also the LibReSSL possible implementation.

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
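Putting that recipe together, a sketch with the suggested st_dev/st_ino cross-check (the helper name is mine; error handling kept brief):

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/* Resolve a file descriptor to a pathname via /proc/self/fd,
   then verify the name still refers to the same inode. */
int fd_to_path(int fd, char *out, size_t outlen)
{
    char link[64];
    struct stat fd_st, path_st;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, outlen - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';                    /* readlink doesn't NUL-terminate */
    if (fstat(fd, &fd_st) == -1 || stat(out, &path_st) == -1)
        return -1;
    if (fd_st.st_dev != path_st.st_dev || fd_st.st_ino != path_st.st_ino)
        return -1;                    /* moved or deleted since opening */
    return 0;
}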
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH    Get the path of the file descriptor fildes. The argument must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];

if (fcntl(fd, F_GETPATH, filePath) != -1)
{
    // do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
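As a starting point, here's a bare-bones sketch of the walk described above (my own names; symlinks and mount points are not handled, and only the first match is reported). You'd seed it with the result of fstat() on your descriptor and start at "/" or at the root of the relevant device:

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>

/* Recursively search `dir` for an entry whose device and inode
   match `target`; prints the first full path found. Returns 1
   on a hit, 0 otherwise. */
static int find_inode(const char *dir, const struct stat *target)
{
    DIR *dp = opendir(dir);
    if (!dp)
        return 0;
    struct dirent *de;
    int found = 0;
    while (!found && (de = readdir(dp)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        struct stat st;
        if (lstat(path, &st) == -1)
            continue;
        if (st.st_dev == target->st_dev && st.st_ino == target->st_ino) {
            printf("%s\n", path);
            found = 1;
        } else if (S_ISDIR(st.st_mode) && st.st_dev == target->st_dev) {
            found = find_inode(path, target);   /* stay on one device */
        }
    }
    closedir(dp);
    return found;
}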
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions, but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem, so it should be possible to get at it from your program.
You can use fstat() to get the file's inode via struct stat. Then, using readdir(), you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory; otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
There is no official API to do this on OpenBSD, though with some very convoluted workarounds it is still possible with the following code. Note that you need to link with -lkvm and -lc. The code that uses FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
    string path;
    char errbuf[_POSIX2_LINE_MAX];
    static kvm_t *kd = nullptr;
    kinfo_file *kif = nullptr;
    int cntp = 0;
    kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf);
    if (!kd) return "";
    if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
        for (int i = 0; i < cntp; i++) {
            if (kif[i].fd_fd == fd) {
                FTS *file_system = nullptr;
                FTSENT *child = nullptr;
                FTSENT *parent = nullptr;
                vector<char *> root;
                char buffer[2];
                strcpy(buffer, "/");
                root.push_back(buffer);
                file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
                if (file_system) {
                    while ((parent = fts_read(file_system))) {
                        child = fts_children(file_system, 0);
                        while (child && child->fts_link) {
                            child = child->fts_link;
                            if (!S_ISSOCK(child->fts_statp->st_mode)) {
                                if (child->fts_statp->st_dev == kif[i].va_fsid) {
                                    if (child->fts_statp->st_ino == kif[i].va_fileid) {
                                        path = child->fts_path + string(child->fts_name);
                                        goto finish;
                                    }
                                }
                            }
                        }
                    }
                    finish:
                    fts_close(file_system);
                }
            }
        }
    }
    kvm_close(kd);
    return path;
}

int main(int argc, char **argv) {
    if (argc == 3) {
        printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
                                  (int)strtoul(argv[2], nullptr, 10)).c_str());
    } else {
        printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
    }
    return 0;
}
If the function fails to find the file (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience (when moving the file to the trash), the new location of the file is returned instead, provided that location wasn't already searched by FTS. It will be slower for filesystems that contain more files.
The deeper the search goes into the directory tree of your entire filesystem without finding the file, the more likely you are to hit a race condition, though still very unlikely given how performant this is.
I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C; most of the code logic will be the same. If I have time, I'll try to rewrite this in C, hopefully soon. Like the macOS solution, this one gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the early exit from the loop and return a vector if you don't care about being cross-platform and want to get all the hard links.
DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution in the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id, rather than being limited to the calling one, while also potentially getting all hard links instead of a random one, see this answer. That should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straightforward and to-the-point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned there has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process; however, it should work on most Unix-likes out of the box, with all the same concerns as the former solution regarding hard links and race conditions, although it performs slightly faster due to fewer branches and loops:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
    string path;
    FTS *file_system = nullptr;
    FTSENT *child = nullptr;
    FTSENT *parent = nullptr;
    vector<char *> root;
    char buffer[2];
    strcpy(buffer, "/");
    root.push_back(buffer);
    file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
    if (file_system) {
        while ((parent = fts_read(file_system))) {
            child = fts_children(file_system, 0);
            while (child && child->fts_link) {
                child = child->fts_link;
                struct stat info = { 0 };
                if (!S_ISSOCK(child->fts_statp->st_mode)) {
                    if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
                        if (child->fts_statp->st_dev == info.st_dev) {
                            if (child->fts_statp->st_ino == info.st_ino) {
                                path = child->fts_path + string(child->fts_name);
                                goto finish;
                            }
                        }
                    }
                }
            }
        }
        finish:
        fts_close(file_system);
    }
    return path;
}
An even quicker solution, also limited to the calling process but somewhat more performant: wrap all your calls to fopen() and open() with a helper function that stores, in roughly the C equivalent of a std::unordered_map, the file descriptor paired with the absolute-path version of what was passed to your fopen()/open() wrappers (plus the Windows-only equivalents which won't work on UWP, like _wopen_s() and all that nonsense needed to support UTF-8). The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (the *W version for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't nearly as commonly used on Windows), and realpath() / GetFullPathNameW() will convert a file opened via a relative path into an absolute one. With the file descriptor and absolute path stored in a C equivalent of a std::unordered_map (which you will likely have to write yourself using malloc()'d and eventually free()'d int and C-string arrays), this will again be faster than any solution that dynamically searches your filesystem. But it has a different and unappealing limitation: it will not notice files that were moved around on your filesystem (though at least you can test existence with your own code to check whether the file was deleted), nor files that were replaced since you opened them and stored the path, so it can potentially give you outdated results. A toy sketch follows, though due to files changing location I do not recommend this solution.
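A toy version of that bookkeeping (a fixed-size table instead of a hash map; the names open_tracked and fd_path are mine):

#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Toy fd -> absolute-path table; real code would also handle
   O_CREAT's mode argument, dup(), close(), and threads. */
static char *fd_paths[1024];

int open_tracked(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd >= 0 && fd < 1024) {
        char resolved[PATH_MAX];
        /* realpath() resolves symlinks and relative paths */
        if (realpath(path, resolved)) {
            free(fd_paths[fd]);
            fd_paths[fd] = strdup(resolved);
        }
    }
    return fd;
}

const char *fd_path(int fd)
{
    return (fd >= 0 && fd < 1024) ? fd_paths[fd] : NULL;
}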
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.
