Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
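For example, a minimal sketch of that approach might look like this (the helper name fd_to_path is made up; error handling is kept short):

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/* Resolve fd to a path via /proc/self/fd, then verify the name still
 * refers to the same object as the descriptor. Returns 0 on success. */
int fd_to_path(int fd, char *buf, size_t bufsiz)
{
    char link[64];
    struct stat fd_st, path_st;
    ssize_t len;

    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    len = readlink(link, buf, bufsiz - 1);
    if (len == -1)
        return -1;
    buf[len] = '\0';            /* readlink() does not NUL-terminate */

    /* Compare device and inode of the name vs. the descriptor. */
    if (fstat(fd, &fd_st) == -1 || stat(buf, &path_st) == -1)
        return -1;
    if (fd_st.st_dev != path_st.st_dev || fd_st.st_ino != path_st.st_ino)
        return -1;
    return 0;
}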
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have an F_GETPATH command for fcntl:
F_GETPATH    Get the path of the file descriptor Fildes. The argument
             must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
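For example, a hedged sketch (note that the returned FILE_NAME_INFO name is UTF-16, is not NUL-terminated, and does not include the drive letter; the function name here is made up):

#include <windows.h>
#include <stdio.h>

/* Sketch only: print the name associated with an open HANDLE. */
int print_name_from_handle(HANDLE h)
{
    char buf[sizeof(FILE_NAME_INFO) + MAX_PATH * sizeof(WCHAR)];
    FILE_NAME_INFO *info = (FILE_NAME_INFO *)buf;

    if (!GetFileInformationByHandleEx(h, FileNameInfo, info, sizeof(buf)))
        return -1;

    /* FileNameLength is in bytes; the string is UTF-16 and not NUL-terminated. */
    wprintf(L"%.*ls\n", (int)(info->FileNameLength / sizeof(WCHAR)), info->FileName);
    return 0;
}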
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
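For illustration, here is a rough sketch of what lsof effectively does on Linux for a given process: walk /proc/<pid>/fd and resolve each symlink (the helper name list_fds is made up):

#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* List the open descriptors of a process and where their /proc links point. */
void list_fds(pid_t pid)
{
    char dirpath[64], linkpath[128], target[4096];
    struct dirent *entry;
    DIR *dir;
    ssize_t len;

    snprintf(dirpath, sizeof(dirpath), "/proc/%d/fd", (int)pid);
    dir = opendir(dirpath);
    if (!dir)
        return;

    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                      /* skip "." and ".." */
        snprintf(linkpath, sizeof(linkpath), "%s/%s", dirpath, entry->d_name);
        len = readlink(linkpath, target, sizeof(target) - 1);
        if (len == -1)
            continue;
        target[len] = '\0';
        printf("fd %s -> %s\n", entry->d_name, target);
    }
    closedir(dir);
}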
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
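A rough sketch of that idea in C, assuming you already know which single directory to search (otherwise you would have to recurse over the whole filesystem; the helper name is made up):

#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Look for a directory entry whose inode matches the descriptor's inode.
 * NOTE: d_ino is only meaningful if dirpath lives on the same device as fd;
 * for robustness you could stat() the candidate and compare st_dev too. */
int find_name_by_fd(int fd, const char *dirpath, char *out, size_t outsiz)
{
    struct stat st;
    struct dirent *entry;
    DIR *dir;

    if (fstat(fd, &st) == -1)
        return -1;

    dir = opendir(dirpath);
    if (!dir)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_ino == st.st_ino) {   /* same inode: candidate match */
            snprintf(out, outsiz, "%s/%s", dirpath, entry->d_name);
            closedir(dir);
            return 0;
        }
    }
    closedir(dir);
    return -1;
}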
There is no official API to do this on OpenBSD, though with some very convoluted workarounds it is still possible with the following code; note that you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/");
root.push_back(buffer); root.push_back(nullptr); // fts_open() expects a NULL-terminated array
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file (for example, because it no longer exists), it will return an empty string. If the file was moved (in my experience, when moving it to the trash), the new location is returned instead, provided that location had not already been passed by the FTS traversal. The search will be slower on filesystems that contain more files.
The deeper the search goes into your filesystem's directory tree without finding the file, the more likely you are to hit a race condition, though it is still very unlikely given how fast the traversal is. I'm aware my OpenBSD solution is C++ and not C; feel free to change it to C, as most of the logic will stay the same. If I have time I'll try to rewrite this in C soon.
Like the macOS solution, this one returns a single hard link, essentially at random (citation needed), for portability with Windows and other platforms that can only report one hard link. You could remove the goto that ends the traversal and return a vector instead, if you don't care about being cross-platform and want to collect all the hard links.
DragonFly BSD and NetBSD can use the same solution (the exact same code) as the macOS solution in the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id rather than being limited to the calling one, while also potentially getting all hard links instead of a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and the other solutions that are more straightforward and to the point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned there has since been resolved for newer OS versions.
Here's a more generic solution. It can only retrieve the path of a file descriptor opened by the calling process, but it should work on most Unix-likes out of the box, with all the same caveats as the solution above regarding hard links and race conditions, while performing slightly faster because there is less work per directory entry:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
  string path;
  // Resolve the target descriptor once, up front, rather than per directory entry.
  struct stat info = { 0 };
  if (fstat(fd, &info) == -1 || S_ISSOCK(info.st_mode)) return path;
  vector<char *> root; char buffer[2]; strcpy(buffer, "/");
  root.push_back(buffer); root.push_back(nullptr); // fts_open() expects a NULL-terminated array
  FTS *file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
  if (file_system) {
    FTSENT *parent = nullptr;
    while ((parent = fts_read(file_system))) {
      FTSENT *child = fts_children(file_system, 0);
      while (child && child->fts_link) {
        child = child->fts_link;
        if (!S_ISSOCK(child->fts_statp->st_mode) &&
            child->fts_statp->st_dev == info.st_dev &&
            child->fts_statp->st_ino == info.st_ino) {
          path = child->fts_path + string(child->fts_name);
          goto finish;
        }
      }
    }
    finish:
    fts_close(file_system);
  }
  return path;
}
An even quicker solution, also limited to the calling process but somewhat more performant: wrap all your calls to fopen() and open() with a helper function that stores, in whatever C equivalent of a std::unordered_map you have, the file descriptor paired with the absolute-path version of whatever was passed to your fopen()/open() wrappers (and the Windows-only equivalents, which won't work on UWP, like _wopen_s() and all that nonsense needed to support UTF-8). The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (the *W variant for wide-character support) on Windows. realpath() will also resolve symbolic links (which aren't nearly as commonly used on Windows), and both realpath() and GetFullPathNameW() will convert a relative path of an existing file to an absolute path.
With the file descriptor and absolute path stored in a C equivalent of a std::unordered_map (which you will likely have to write yourself with malloc()'d, and eventually free()'d, int and C-string arrays), this will again be faster than any solution that dynamically searches your filesystem. But it has a different and unappealing limitation: it will not notice files that were moved around on your filesystem (though you can at least check with your own code whether the file was deleted), nor will it notice if the file was replaced after you opened it and stored the path, so it can potentially give you outdated results. Let me know if you would like to see a fuller code example of this, though because files can change location I do not recommend this solution.
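Purely as an illustration of the idea, a minimal sketch might look like this (made-up names, a flat array indexed by descriptor instead of a hash map, and no handling of close()/dup()):

#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

static char *fd_paths[1024];          /* indexed by file descriptor */

/* Wrapper around open() that remembers the absolute path it was given. */
int open_tracked(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);
    if (fd >= 0 && fd < 1024) {
        char resolved[PATH_MAX];
        if (realpath(path, resolved)) {   /* absolute, symlink-free path */
            free(fd_paths[fd]);
            fd_paths[fd] = strdup(resolved);
        }
    }
    return fd;
}

/* Returns the path recorded at open time, or NULL if unknown. */
const char *fd_path_lookup(int fd)
{
    return (fd >= 0 && fd < 1024) ? fd_paths[fd] : NULL;
}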
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.
What gets me really confused is that some programmers use "filename" for what I call the path, e.g. /Users/example/Desktop/file.txt. What I call the file name would just be file.txt, but apparently some people refer to the whole path /Users/example/Desktop/file.txt as the filename. That being said, are the macros FILENAME_MAX and PATH_MAX the same?
FILENAME_MAX is defined in stdio.h or thereabouts, so it's cross-platform, but am I giving myself extra work by doing this?
#ifdef __linux__
# include <linux/limits.h>
#elif defined(_WIN32)
# include <windows.h>
# define PATH_MAX MAX_PATH
#else
# include <sys/syslimits.h>
#endif
When just #include <stdio.h> and FILENAME_MAX would be enough to create path buffers large enough to hold any file path?
tl;dr Don't trust either of them.
A "path" is an absolute path like /home/you/foo.txt or a relative path like you/foo.txt or even foo.txt.
A "filename" is poorly defined. It could be just the "basename" foo.txt or it could be a synonym for "path". The C standard seems to use "filename" to mean "path". For example, fopen takes a filename which is really a path.
FILENAME_MAX is standard C, PATH_MAX is POSIX, MAX_PATH is Windows.
FILENAME_MAX is a C standard constant...
which expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened.
PATH_MAX is not part of the C standard. It is a POSIX constant...
Maximum number of bytes in a pathname, including the terminating null character.
If the path is too long, POSIX functions should give an ENAMETOOLONG error, but not all compilers enforce this.
MAX_PATH is a Windows API constant...
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string"
For example, C standard fopen uses FILENAME_MAX. POSIX open uses PATH_MAX.
But probably don't trust them.
FILENAME_MAX is not safe to use to allocate memory because it might be INT_MAX. The GCC documentation warns...
Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on GNU/Hurd systems.
Usage Note: Don’t use FILENAME_MAX as the size of an array in which to store a file name! You can’t possibly make an array that big! Use dynamic allocation (see Memory Allocation) instead.
PATH_MAX may not be defined.
Because they are hard coded by the operating system, neither can be trusted to be a true representation of what the file system can handle. For example, my Mac defines both PATH_MAX and FILENAME_MAX as 1024. A FAT32 filesystem has a limit of 255 characters. If I mount a FAT32 filesystem on my Mac, PATH_MAX and FILENAME_MAX will be incorrect.
Evan Klitzke describes the problems with PATH_MAX and by extension FILENAME_MAX.
If a function requires you to allocate a buffer to store a path, there is often an alternative which will allocate memory for you, or which will take a max size. If you're left with no choice, 1024 or 4096 are good choices.
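If you do end up allocating yourself, size the buffer from the strings you actually have rather than from a constant. A minimal sketch (the helper name join_path is made up):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "dir/name" in a buffer sized for the actual strings,
 * instead of trusting PATH_MAX or FILENAME_MAX. Caller frees. */
char *join_path(const char *dir, const char *name)
{
    size_t len = strlen(dir) + 1 + strlen(name) + 1; /* dir + '/' + name + NUL */
    char *path = malloc(len);
    if (!path)
        return NULL;
    snprintf(path, len, "%s/%s", dir, name);
    return path;
}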
It is often (quite arbitrarily) set so PATH_MAX equals 4096 and NAME_MAX (FILENAME_MAX?) would be 255, as mentioned here.
In practice this should be considered legacy, since some path names can end up being longer than PATH_MAX / FILENAME_MAX because of the different ways path name lengths, path name concatenation, and path name expansion can be handled (not to mention differences in string encodings).
In addition, sometimes the defined PATH_MAX value has little or nothing to do with actual limits (if any) imposed by the OS.
For a long time (maybe still), Windows systems defined PATH_MAX as 260, while the OS supported path names up to 32Kb or more (there are some details here).
These aren't comparable at all. The *nix constant PATH_MAX is the largest path you can pass to a system call on *nix operating systems, but longer paths can exist because system calls accept relative paths. The Windows constant MAX_PATH was the largest possible absolute path that could occur on Windows; that limit has since been lifted, but you need to opt in to using longer paths.
PATH_MAX is normally a fixed constant (on POSIX systems) that specifies the maximum length of the path parameter you pass to the kernel in a system call (e.g. the open(2) syscall).
But the maximum path length a file can have is not limited by this constant, as you can test with this simple shell script:
$ i=0
> while [ "$i" -lt 2000 ]
> do mkdir "$i"
> cd "$i"
> i=`expr "$i" + 1`
> done
$ pwd
/home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999
Now, if you try:
$ cd `pwd`
-bash: cd: /home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999: File name too long
Showing you that the problem is when bash tries to do the chdir(2) system call. (there's a little routine below that shows you how to open a file whose name is longer than the PATH_MAX constant)
You will end up in a directory whose full path has many more characters in it than the PATH_MAX constant.
That doesn't mean you cannot access that file from the root directory, but you have to do it either by:
changing your current directory (as the script does), or
using openat(2) and friends, with an already opened directory closer to the target as the starting point.
The absolute number of path elements a file can have to the root node is limited only by the number of inodes in the filesystem, and the total capacity of it.
Edit
When just #include <stdio.h> and FILENAME_MAX would be enough to create path buffers large enough to hold any file path?
The only way to support any path length when opening a file on a POSIX operating system, and to be portable at the same time, is to divide your path into short enough chunks and do n-1 chdir(2) calls down to the place where your files actually live. Then call open(2) with the last chunk, and afterwards return to the directory you were in, if possible, with the fchdir(2) system call (if your system has it). If you have the openat(2) system call, you can instead open each directory along the way, closing the previous one (you only use it to open the next), until you are close enough to the final directory to open the file with a single openat(2) call.
Below is a (work in progress) myfopen() call that tries to open a file whose path is longer than the PATH_MAX limit from <limits.h>:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "myfopen.h"
FILE *my_fopen(const char *filename, const char *mode)
{
if (strlen(filename) > PATH_MAX) {
char *work_name = strdup(filename);
if (!work_name) {
return NULL; /* cannot malloc */
}
char *to_free = work_name; /* to free at end */
int dir_fd = -1;
while(strlen(work_name) >= PATH_MAX) {
char *p = work_name + PATH_MAX - 1;
while(*p != '/') p--;
/* p points to the previous '/' to the PATH_MAX limit */
*p++ = '\0';
if (dir_fd < 0) {
dir_fd = open(work_name, 0);
if (dir_fd < 0) {
ERR("open: %s: %s\n", work_name,
strerror(errno));
free(to_free);
return NULL;
}
} else {
int aux_fd = openat(dir_fd, work_name, 0);
close(dir_fd);
if (aux_fd < 0) {
ERR("openat: %s: %s\n", work_name,
strerror(errno));
free(to_free);
return NULL;
}
dir_fd = aux_fd;
}
work_name = p;
}
/* strlen(work_name) < PATH_MAX,
* work_name points to the last chunk and
* dir_fd is the directory to base the real fopen
*/
int fd = openat(
dir_fd,
work_name,
O_RDONLY); /* this needs to be
* adjusted, based on what is
* specified in string mode */
close(dir_fd);
if (fd < 0) {
fprintf(stderr, "openat: %s: %s\n",
work_name, strerror(errno));
free(to_free);
return NULL;
}
free(to_free);
return fdopen(fd, mode);
}
return fopen(filename, mode);
}
(you will need to work out the appropriate set of bit masks to pass to the final openat() call, in order to reconcile the different ways the open mode is specified for the fopen() vs. open() calls)
The basic problem with the maximum file name length (and the reason the kernel designers don't support an unbounded buffer to hold the full name of the file) is a security one. If you could pass a 20GB-long filename into kernel space, you would probably run out of kernel memory, and that cannot be permitted (it would be a serious weakness if a malicious user could lock up the whole kernel just by passing a very long filename).
In the normal case I have never had to deal with paths longer than 1024 bytes, except in demonstrations of this specific problem, so IMHO you should accept that limit and try to use short paths (shorter than 1024 is a good limit).
On the other hand, you are mentioning several constants here:
FILENAME_MAX is used by the stdio system, but its value is to be used only with stdio routines. Its documentation states that:
This macro constant expands to an integral expression corresponding to the size needed for an array of char elements to hold the longest file name string allowed by the library. Or, if the library imposes no such restriction, it is set to the recommended size for character arrays intended to hold a file name.
This means that it's a safe length you can rely on in order to call fopen(3) and pass it a working filename.
PATH_MAX is a value that is probably safe to use on your system. If you try to open files (with the open(2) system call), or to erase (unlink(2)), rename, etc., you will probably run into trouble with paths longer than this.
The limitation on the file path length comes from the system call subsystem, not from the filesystem involved in the operation, so normally the limit applies to all files in the system and not to a specific mounted filesystem (which can impose further limits of its own, or not).
In my opinion, the limits are well thought, and using the values published will be ok for the majority of cases. And no tool in the operating system allows you to open a file with a path longer than PATH_MAX.
This question already has answers here: Using a variable file name in C to read from multiple files with similar names?
Using Visual Studio 2015, how would I open and read all the files in a directory?
The Input Parameters for the program are
Number of Sensors (N): Determines the number of input files
File Location: A local directory/folder where the files are located. Each file will be named: sensor_0.txt, sensor_1.txt, ... sensor_(n - 1).txt
I can open and read individual files in the directory by hard coding them using fopen, but since the number of input files is not constant I don't know how I would read all of the files in the directory regardless of how many input files there are.
I was thinking that I would need to generate the file names, since the only thing changing in them is the sensor number, but I couldn't get that to work because fopen requires a const char * file name.
I have searched for solutions and found the DIR type in the dirent.h header file, but that doesn't work with the Visual Studio compiler, and a package needs to be installed in order to use that header.
I am in an intro to programming class, so I feel like installing outside programs would be the wrong approach to solving this issue, but I could be wrong. I have also looked into functions like FindFirstFile and FindNextFile, but those also seem too advanced for me.
Any help would be really would be appreciated. Thank you in advance.
If you're writing a Windows-specific application (rather than something that needs to be portable to other operating systems) then look into the FindFirstFile, FindNextFile, and FindClose APIs.
Here's a sample of how to use these APIs (based somewhat on the samples from the above links):
#include <windows.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
WIN32_FIND_DATA FindFileData;
HANDLE hFind;
if (argc != 2) {
printf("Usage: %s [target_file]\n", argv[0]);
return 1;
}
printf("Target file is %s\n", argv[1]);
hFind = FindFirstFile(argv[1], &FindFileData);
if (hFind == INVALID_HANDLE_VALUE) {
printf("FindFirstFile failed, error %d\n", GetLastError());
return 1;
}
do {
printf("File name = %s\n", FileFindData.cFileName);
} while (FindNextFile(hFind, &FindFileData));
FindClose(hFind);
return 0;
}
Disclaimer: I haven't had a Windows dev environment in years, so I have no way to compile and verify this sample. It should get you pointed in the right direction, though.
You can just do it by hardcoding the base name and iterating with an index to generate the specific name, something like this
for (int i = 0; ; ++i)
{
    char filepath[MAX_PATH];
    FILE *file;
    // In principle, you should check the return value to ensure
    // it didn't truncate the name
    snprintf(filepath, sizeof(filepath), "sensor_%d.txt", i);
    // Try to open the file; if it fails it's probably because
    // the file does not exist, but that's not the only possible
    // reason.
    file = fopen(filepath, "r"); // Or "rb", depends ...
    if (file == NULL)
        break; // Cannot open this, probably there are no more files.
    // Process the file here, then close it
    fclose(file);
}
A better way would be to pass the name to another function, so you can later change the name generation method by looking at the directory instead of assuming it.
NOTE 1: The Secure C Runtime in the MSVC compiler will probably complain about fopen() and snprintf() and suggest the "safe" variants (fopen_s(), snprintf_s()) or something like that; I don't remember exactly. But this is standard C (as per C11), so it should compile with any C compiler.
NOTE 2: You should also use the full path unless the files are in the CWD. Something like (assuming the files are in drive "C:")
snprintf(filepath, sizeof(filepath), "C:\\full\\path\\sensor_%d.txt", i);
I've been dealing with a problem for a few weeks now updating 20 year code that needs to be system independent (work on both Linux and Windows). It involves Time-of-Check, Time-of-Use (TOCTOU) issues. I made a thread here, but it didn't go very far, and after ruminating on it for a while and searching deeper into the problem, I think I understand my question a bit better. Maybe I can ask it a bit better too...
From what I've read, the code needs to check if the file exists, if it is accessible, open the file, do some operations and finally close the file. It seems the best way to do this is a call to lstat(), a call to fopen(), a call to fstat() (to rule out the TOCTOU), and then the operations and closing the file.
However, I've been led to believe that lstat() and fstat() are POSIX-defined, not C-Standard-defined, ruling out their use for a system-agnostic program, much in the same way open() shouldn't be used for cross-compatibility. How would you implement this?
If you look at my first post, you can see the developer from 20 years ago used the C preprocessor to cut the code into cross-compatible parts, but even if I did that, I wouldn't know what to replace lstat() or fstat() with (their windows counterparts).
Edit: Added abbreviated code to this post; if something is unclear please go to the original post.
#ifdef WIN32
struct _stat buf;
#else
struct stat buf;
#endif //WIN32
FILE *fp;
char data[2560];
// Make sure file exists and is readable
#ifdef WIN32
if (_access(file.c_str(), R_OK) == -1) {
#else
if (access(file.c_str(), R_OK) == -1) {
#endif //WIN32
char message[2560];
sprintf(message, "File '%s' Not Found or Not Readable", file.c_str());
throw message;
}
// Get the file status information
#ifdef WIN32
if (_stat(file.c_str(), &buf) != 0) {
#else
if (stat(file.c_str(), &buf) != 0) {
#endif //WIN32
char message[2560];
sprintf(message, "File '%s' No Status Available", file.c_str());
throw message;
}
// Open the file for reading
fp = fopen(file.c_str(), "r");
if (fp == NULL) {
char message[2560];
sprintf(message, "File '%s' Cound Not be Opened", file.c_str());
throw message;
}
// Read the file
MvString s, ss;
while (fgets(data, sizeof(data), fp) != (char *)0) {
s = data;
s.trimBoth();
if (s.compare( 0, 5, "GROUP" ) == 0) {
//size_t t = s.find_last_of( ":" );
size_t t = s.find( ":" );
if (t != string::npos) {
ss = s.substr( t+1 ).c_str();
ss.trimBoth();
ss = ss.substr( 1, ss.length() - 3 ).c_str();
group_list.push_back( ss );
}
}
}
// Close the file
fclose(fp);
}
The reliable way to check whether the file exists and can be opened is to try opening it. If it was opened, all was OK. If it was not opened, you can think about spending time to analyze what went wrong.
The access() function formally asks a different question from what you think; it asks 'can the real user ID or the real group ID access the file', but the program will use the effective user ID or the effective group ID to access the file. If your program is not running SUID or SGID, and was not launched from a program running SUID or SGID — and that's the normal case — then there's no difference. But the question is different.
The use of stat() or lstat() doesn't seem helpful. In particular, lstat() only tells you whether you start at a symlink, but the code doesn't care about that.
Both the access() and the stat() calls provide you with TOCTOU windows of vulnerability; the file could be removed after they reported it was present, or created after they reported it was absent.
You should simply call fopen() and see whether it works; the code will be simpler and more resistant to TOCTOU problems. You might need to consider whether to use open() with all its extra controls (O_EXCL, etc), and then convert the file descriptor to a file pointer (fdopen()).
All of this applies to the Unix side.
The details will be different, but on the Windows side, you will still be best off trying to open the file and reacting appropriately to failure.
In both systems, make sure the options provided to the open function are appropriate.
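A small sketch of that advice: open first, then fstat() the descriptor you actually got, so the check and the use refer to the same open file (the helper name is made up):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Open a file for reading; optionally fill *st from the open descriptor.
 * Returns NULL (with a message on stderr) if anything fails. */
FILE *open_and_stat(const char *path, struct stat *st)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        fprintf(stderr, "File '%s' could not be opened: %s\n", path, strerror(errno));
        return NULL;
    }
    if (st != NULL && fstat(fileno(fp), st) != 0) {
        fprintf(stderr, "File '%s': no status available: %s\n", path, strerror(errno));
        fclose(fp);
        return NULL;
    }
    return fp;
}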
Under Linux, I have two file paths A and B:
const char* A = ...;
const char* B = ...;
I now want to determine, should I open(2) them both...
int fda = open(A, ...);
int fdb = open(B, ...);
...will I get two filehandles open to the same file in the filesystem?
To determine this I thought of stat(2):
struct stat
{
dev_t st_dev;
ino_t st_ino;
...
}
Something like (pseudo-code):
bool IsSameFile(const char* sA, const char* sB)
{
stat A = stat(sA);
stat B = stat(sB);
return A.st_dev == B.st_dev && A.st_ino == B.st_ino;
}
Are there any cases where A and B are the same file but IsSameFile would return false?
Are there any cases where A and B are different files but IsSameFile would return true?
Is there a better way to do what I'm trying to do?
Your program will work fine in all the cases, because st_ino is the inode number of the file on your system. Since an inode number is unique within a filesystem (which is why you also compare st_dev), your program will correctly identify whether the two files opened are the same or not.
You can also check st_mode (using lstat(), since stat() follows symbolic links) to find out whether a path is itself a symbolic link.
It depends on why exactly you want to avoid opening the same file twice. Your solution is usually the correct one, but there are some situations where files should be considered the same if they have the same absolute path but not if they are links to the same inode. In that case you need to convert the paths to absolute paths and compare them ... see Getting absolute path of a file
You also need to decide whether you consider a symlink to a file equivalent to the file or another symlink to it. For inode equivalence, that determines whether to use stat or lstat. For path equivalence, it determines whether you can use realpath or if you need to get the absolute path without following symlinks.
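For reference, a concrete version of the IsSameFile pseudo-code from the question, using stat() so symlinks are followed (swap in lstat() if you want two distinct symlinks to compare as different):

#include <stdbool.h>
#include <sys/stat.h>

bool IsSameFile(const char *sA, const char *sB)
{
    struct stat a, b;
    if (stat(sA, &a) != 0 || stat(sB, &b) != 0)
        return false;                 /* couldn't stat one of them */
    /* Same device and same inode means the same underlying file. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}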
How can I get the path where the binary that is executing resides in a C program?
I'm looking for something similar to __FILE__ in ruby/perl/PHP (but of course, the __FILE__ macro in C is determined at compile time).
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
Totally non-portable Linux solution:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    char buffer[BUFSIZ];
    ssize_t len = readlink("/proc/self/exe", buffer, sizeof(buffer) - 1);
    if (len != -1)
    {
        buffer[len] = '\0';   /* readlink() does not NUL-terminate */
        printf("%s\n", buffer);
    }
}
This uses the "/proc/self" trick, which points to the process that is running. That way it saves faffing about looking up the PID. Error handling left as an exercise to the wary.
The non-portable Windows solution:
WCHAR path[MAX_PATH];
GetModuleFileNameW(NULL, path, ARRAYSIZE(path));
Here's an example that might be helpful for Linux systems:
/*
* getexename - Get the filename of the currently running executable
*
* The getexename() function copies an absolute filename of the currently
* running executable to the array pointed to by buf, which is of length size.
*
* If the filename would require a buffer longer than size elements, NULL is
* returned, and errno is set to ERANGE; an application should check for this
* error, and allocate a larger buffer if necessary.
*
* Return value:
* NULL on failure, with errno set accordingly, and buf on success. The
* contents of the array pointed to by buf is undefined on error.
*
* Notes:
* This function is tested on Linux only. It relies on information supplied by
* the /proc file system.
* The returned filename points to the final executable loaded by the execve()
* system call. In the case of scripts, the filename points to the script
* handler, not to the script.
* The filename returned points to the actual executable and not a symlink.
*
*/
char* getexename(char* buf, size_t size)
{
char linkname[64]; /* /proc/<pid>/exe */
pid_t pid;
int ret;
/* Get our PID and build the name of the link in /proc */
pid = getpid();
if (snprintf(linkname, sizeof(linkname), "/proc/%i/exe", pid) < 0)
{
/* This should only happen on large word systems. I'm not sure
what the proper response is here.
Since it really is an assert-like condition, aborting the
program seems to be in order. */
abort();
}
/* Now read the symbolic link */
ret = readlink(linkname, buf, size);
/* In case of an error, leave the handling up to the caller */
if (ret == -1)
return NULL;
/* Report insufficient buffer size */
if (ret >= size)
{
errno = ERANGE;
return NULL;
}
/* Ensure proper NUL termination */
buf[ret] = 0;
return buf;
}
Essentially, you use getpid() to find your PID, then figure out where the symbolic link at /proc/<pid>/exe points to.
A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that calls the function with its original arguments.
#!/bin/sh
$0.exe "$@"
The redirection through a shell script means that the real program gets an argv[0] that's actually useful instead of one that may live in the $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a problem that was language-independent.
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
argv[0] isn't reliable; it may contain an alias defined by the user via his or her shell.
Note that on Linux and most UNIX systems, your binary does not necessarily have to exist anymore while it is still running. Also, the binary could have been replaced. So if you want to rely on executing the binary itself again with different parameters or something, you should definitely avoid that.
It would be easier to give advice if you told us why you need the path to the binary itself.
Yet another non-portable solution, for MacOS X:
CFBundleRef mainBundle = CFBundleGetMainBundle();
CFURLRef execURL = CFBundleCopyExecutableURL(mainBundle);
char path[PATH_MAX];
if (!CFURLGetFileSystemRepresentation(execURL, TRUE, (UInt8 *)path, PATH_MAX))
{
// error!
}
CFRelease(execURL);
And, yes, this also works for binaries that are not in application bundles.
Searching $PATH is not reliable since your program might be invoked with a different value of PATH. e.g.
$ /usr/bin/env | grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
$ PATH=/tmp /usr/bin/env | grep PATH
PATH=/tmp
Note that if I run a program like this, argv[0] is worse than useless:
#include <unistd.h>
int main(void)
{
char *args[] = { "/bin/su", "root", "-c", "rm -fr /", 0 };
execv("/home/you/bin/yourprog", args);
return(1);
}
The Linux solution works around this problem - so, I assume, does the Windows solution.