thread safe path comparison or canonicalization function for Unix? - c

I'd seen some ancient code that simplified Unix paths for comparison by doing something like the following pseudocode:
strip off the last path component to get a directory only part: /foo/bar -> /foo
getcwd and remember original path
chdir /foo
getcwd and remember to return to caller
chdir old original path
Is there a standard Unix system function that does all this without the thread unsafe current directory manipulation?
A mutex could make that code sequence less thread unsafe, but isn't really ideal (you'd have to know that all other code using getcwd or other functions dependent on the process cwd including system and vendor code protects with this same mutex).

What about realpath(3)?
Since it returns its result in a buffer you supply, thread-safety should not be an issue.

Try realpath() or canonicalize_file_name()
If your system supports it (and it probably does), I suggest calling realpath(pathname, NULL); this will malloc the buffer for the canonicalized filename and pass it back as the return value. You'd have to be sure to free() the pointer. The alternative, passing in an output buffer, runs the risk of buffer overruns.
canonicalize_file_name() is a GNU extension that is equivalent to realpath(pathname, NULL).

There is no "canonical" path in a Unix directory. It may be possible for a file/directory to have multiple hard links/mount points.
The closest thing to the identity of a file/directory is its inode.
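If identity is what you actually care about, a minimal sketch of the inode comparison could look like this. The function name same_file is hypothetical; the idea is simply that two names refer to the same file when both the device and the inode number match.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both names refer to the same file
 * (same device and inode), 0 if not, -1 if either stat() call fails.
 * stat() follows symbolic links, so a link and its target compare equal. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Unlike a string comparison of canonical paths, this also treats two hard links to the same file as equal.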

Oh dear, doing the action you mention couldn't possibly be thread safe, because it actually chdir's, which is going to confuse any other threads. I'll have to look up the string-manipulation portion of what you want, but it can't possibly also strip softlinks or do anything else that requires asking the operating system for file information without being a little thread-unsafe.
Try this to convert relative file paths, then compare them as strings:
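#include <stdio.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/param.h>

/* Note: F_GETPATH exists on macOS and some BSDs; it is not available on Linux. */
int main( int argc, char **argv )
{
    char buffer[MAXPATHLEN+1];
    if( argc <= 1 ) return 0;
    DIR *d = opendir( argv[1] );
    if( !d ) return 1;
    int dfd = dirfd(d);
    if( dfd == -1 ) { closedir( d ); return 1; }  /* dirfd() returns -1 on error; 0 is a valid descriptor */
    int result = fcntl( dfd, F_GETPATH, buffer );
    closedir( d );                                /* also closes dfd */
    if( result == -1 ) return 1;
    fprintf( stdout, "path='%s'\n", buffer );
    return 0;
}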

Is there any difference between FILENAME_MAX and PATH_MAX in C?

What gets me really confused is some programmers refer to the filename as what I refer to the path, eg. /Users/example/Desktop/file.txt. What I call the file name would just be file.txt, but apparently some people refer to the path /Users/example/Desktop/file.txt as the filename. This gets me really confused. That being said, are macros FILENAME_MAX and PATH_MAX the same?
FILENAME_MAX is defined in stdio.h or something so it's cross platform, but am I giving myself extra work doing this?
#ifdef __linux__
# include <linux/limits.h>
#elif defined(_WIN32)
# include <windows.h>
# define PATH_MAX MAX_PATH
#else
# include <sys/syslimits.h>
#endif
When just #include <stdio.h> and using FILENAME_MAX is enough? To create path buffers large enough to hold any file path?
tl;dr Don't trust either of them.
A "path" is an absolute path like /home/you/foo.txt or a relative path like you/foo.txt or even foo.txt.
A "filename" is poorly defined. It could be just the "basename" foo.txt or it could be a synonym for "path". The C standard seems to use "filename" to mean "path". For example, fopen takes a filename which is really a path.
FILENAME_MAX is standard C, PATH_MAX is POSIX, MAX_PATH is Windows.
FILENAME_MAX is a C standard constant...
which expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened.
PATH_MAX is not part of the C standard. It is a POSIX constant...
Maximum number of bytes in a pathname, including the terminating null character.
If the path is too long, POSIX functions should give an ENAMETOOLONG error, but not all implementations enforce this.
MAX_PATH is a Windows API constant...
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string"
For example, C standard fopen uses FILENAME_MAX. POSIX open uses PATH_MAX.
But probably don't trust them.
FILENAME_MAX is not safe to use to allocate memory because it might be INT_MAX. The GNU C Library documentation warns...
Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on GNU/Hurd systems.
Usage Note: Don’t use FILENAME_MAX as the size of an array in which to
store a file name! You can’t possibly make an array that big! Use
dynamic allocation (see Memory Allocation) instead.
PATH_MAX may not be defined.
Because they are hard coded by the operating system, neither can be trusted to be a true representation of what the file system can handle. For example, my Mac defines both PATH_MAX and FILENAME_MAX as 1024. A FAT32 filesystem has a limit of 255 characters. If I mount a FAT32 filesystem on my Mac, PATH_MAX and FILENAME_MAX will be incorrect.
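For what it's worth, POSIX does offer a way to ask about the limits that apply to a particular directory at run time, via pathconf(). A minimal sketch, assuming a hypothetical wrapper named dir_limit:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper around pathconf(). `name` is _PC_NAME_MAX or
 * _PC_PATH_MAX. Returns the limit for `dir`, or -1. On -1, *unlimited is
 * set to 1 if the system reports no fixed limit, 0 if the call failed. */
long dir_limit(const char *dir, int name, int *unlimited)
{
    errno = 0;                      /* pathconf() leaves errno alone when there is no limit */
    long value = pathconf(dir, name);
    *unlimited = (value == -1 && errno == 0);
    return value;
}
```

For example, dir_limit("/mnt/usb", _PC_NAME_MAX, &u) asks about the filesystem mounted there rather than trusting a compile-time constant (the mount point is only an example). How accurate the answer is still depends on the OS and filesystem driver.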
Evan Klitzke describes the problems with PATH_MAX and by extension FILENAME_MAX.
If a function requires you to allocate a buffer to store a path, there is often an alternative which will allocate memory for you, or which will take a max size. If you're left with no choice, 1024 or 4096 are good choices.
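When the function only takes a buffer and a size, one common pattern is to start small and grow until it fits, instead of guessing a maximum. Here is a sketch with getcwd(), which reports ERANGE when the buffer is too small; the name current_dir and the starting size of 256 are arbitrary choices for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns the current directory in a malloc'd string
 * (caller frees), or NULL on failure. No PATH_MAX-sized buffer is assumed. */
char *current_dir(void)
{
    size_t size = 256;              /* arbitrary starting size */

    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;
        if (getcwd(buf, size) != NULL)
            return buf;             /* it fit */

        int saved = errno;          /* keep getcwd()'s errno across free() */
        free(buf);
        if (saved != ERANGE) {
            errno = saved;
            return NULL;            /* a real error, not just a small buffer */
        }
        size *= 2;                  /* too small: double and retry */
    }
}
```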
It is often (quite arbitrarily) set so PATH_MAX equals 4096 and NAME_MAX (FILENAME_MAX?) would be 255, as mentioned here.
This should be considered legacy in practice, because some file names can end up longer than PATH_MAX / FILENAME_MAX depending on how path name lengths are computed and how path name concatenation and expansion are performed (not to mention differences in string encodings).
In addition, sometimes the defined PATH_MAX value has little or nothing to do with actual limits (if any) imposed by the OS.
For a long time (maybe still), Windows systems defined PATH_MAX as 260, while the OS supported path names up to 32Kb or more (there are some details here).
These aren't comparable at all. The *nix constant PATH_MAX is the largest path you can pass to a system call on *nix operating systems, but longer paths can exist because system calls take relative paths, while the Windows constant MAX_PATH was the largest possible absolute path that could occur on Windows. In fact the MAX_PATH limit has since been lifted in the Windows API, but you need to opt in to using longer paths.
PATH_MAX is normally a fixed constant (for POSIX systems) that specifies the maximum length of the path parameter you pass to the kernel in a system call (e.g. the open(2) syscall).
But the maximum path length a file can have is not affected by this constant, as you can test with this simple shell script:
$ i=0
> while [ "$i" -lt 2000 ]
> do mkdir "$i"
> cd "$i"
> i=`expr "$i" + 1`
> done
$ pwd
/home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999
Now, if you try:
$ cd `pwd`
-bash: cd: /home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999: File name too long
Showing you that the problem is when bash tries to do the chdir(2) system call. (there's a little routine below that shows you how to open a file whose name is longer than the PATH_MAX constant)
You will end up in a directory that has many more characters in its name than the PATH_MAX constant.
That doesn't mean you cannot access that file from the root directory, but you have to do it by either
changing your current directory (as the script does)
using openat(2) and friends, with a closer, already opened directory as the starting point.
The absolute number of path elements a file can have to the root node is limited only by the number of inodes in the filesystem, and the total capacity of it.
Edit
When just #include <stdio.h> and using FILENAME_MAX is enough? To create path buffers large enough to hold any file path?
The only way to support any file length when opening a file in a POSIX operating system, and be portable at the same time, is to divide your path into short enough chunks and do n-1 chdir(2) calls to reach the place where your file lives. Then call open(2) with the last chunk, and return to the directory you were in, if that's possible, with the fchdir(2) system call (if your system has it). If you have the openat(2) system call, you can instead open the directories one after another, closing the previous one each time (as you only use it to open the next directory), until you get close enough to the final directory to open the file with openat(2).
Below is a (work in progress) my_fopen() call that tries to open a file whose name is longer than the PATH_MAX limit from <limits.h>:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "myfopen.h"
FILE *my_fopen(const char *filename, const char *mode)
{
    if (strlen(filename) >= PATH_MAX) {
        char *work_name = strdup(filename);
        if (!work_name) {
            return NULL; /* cannot malloc */
        }
        char *to_free = work_name; /* to free at end */
        int dir_fd = -1;
        while (strlen(work_name) >= PATH_MAX) {
            char *p = work_name + PATH_MAX - 1;
            while (p > work_name && *p != '/') p--;
            if (p == work_name) {
                /* no usable '/' before the PATH_MAX limit */
                if (dir_fd >= 0) close(dir_fd);
                free(to_free);
                errno = ENAMETOOLONG;
                return NULL;
            }
            /* p points to the last '/' before the PATH_MAX limit */
            *p++ = '\0';
            if (dir_fd < 0) {
                dir_fd = open(work_name, O_RDONLY);
                if (dir_fd < 0) {
                    /* ERR() is expected to come from myfopen.h */
                    ERR("open: %s: %s\n", work_name,
                        strerror(errno));
                    free(to_free);
                    return NULL;
                }
            } else {
                int aux_fd = openat(dir_fd, work_name, O_RDONLY);
                close(dir_fd);
                if (aux_fd < 0) {
                    ERR("openat: %s: %s\n", work_name,
                        strerror(errno));
                    free(to_free);
                    return NULL;
                }
                dir_fd = aux_fd;
            }
            work_name = p;
        }
        /* strlen(work_name) < PATH_MAX,
         * work_name points to the last chunk and
         * dir_fd is the directory to base the real fopen
         */
        int fd = openat(
            dir_fd,
            work_name,
            O_RDONLY); /* this needs to be
                        * adjusted, based on what is
                        * specified in string mode */
        close(dir_fd);
        if (fd < 0) {
            fprintf(stderr, "openat: %s: %s\n",
                work_name, strerror(errno));
            free(to_free);
            return NULL;
        }
        free(to_free);
        return fdopen(fd, mode);
    }
    return fopen(filename, mode);
}
(you will need to work the appropiate set of bit masks to pass to the final openat() call, in order to comply with the different ways of specifying the open mode of the fopen() vs. open() calls)
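As a sketch of that missing piece, the usual mapping from an fopen() mode string to open() flags could be written as below. The helper name mode_to_flags is invented here, and it only looks at the first letter and at whether a '+' is present, so extensions such as "x" or "e" are not handled.

```c
#include <fcntl.h>
#include <string.h>

/* Hypothetical helper: translate an fopen() mode string ("r", "w+", "ab",
 * ...) into open() flags. Returns -1 for a mode it does not recognise.
 * The 'b' modifier is ignored, as it has no effect on POSIX systems. */
int mode_to_flags(const char *mode)
{
    int plus = strchr(mode, '+') != NULL;   /* "+" means read and write */

    switch (mode[0]) {
    case 'r': return plus ? O_RDWR : O_RDONLY;
    case 'w': return (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
    case 'a': return (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
    default:  return -1;
    }
}
```

When the flags include O_CREAT, the final openat() call also needs a permission argument (0666 is what fopen() conventionally uses, subject to the umask).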
The basic problem with the maximum file name length (that is, why the kernel designers don't support an unbounded buffer to hold the full name of the file) is a security one. If you allowed a user to pass a 20 GB filename into kernel space, you would probably run out of kernel memory, and that cannot be permitted (it would be a serious weakness in the system, as a malicious user could block the whole kernel just by passing a very long filename).
In the normal case, I have never had to deal with file names longer than 1024 bytes, except for demonstrations of this specific problem, so IMHO you should accept that limit and try to use short filenames (shorter than 1024 is a good limit).
On the other hand, you are mentioning many constants here:
FILENAME_MAX is used by stdio system, but its value is to be used only with stdio routines. Its documentation states that:
This macro constant expands to an integral expression corresponding to the size needed for an array of char elements to hold the longest file name string allowed by the library. Or, if the library imposes no such restriction, it is set to the recommended size for character arrays intended to hold a file name.
This means that it's a safe length to rely on when building a filename to pass to fopen(3).
PATH_MAX is probably a safe value to use on your system. If you try to open files (with the open(2) system call), erase them (unlink(2)), rename them, etc., you'll probably run into trouble with filenames longer than this.
The limitation on the file path length comes from a limitation imposed by the system call subsystem, and not from the filesystem involved in the operation, so normally the limitation will apply to all files in the system, and not to a specific mounted filesystem (which can be further limited, or not).
In my opinion, the limits are well thought out, and using the published values will be OK for the majority of cases. And no tool in the operating system allows you to open a file with a path longer than PATH_MAX.

Custom shell glob problem

I have to write a shell program in C that doesn't use the system() function. One of the features is that we have to be able to use wildcards. I can't seem to find a good example of how to use glob or the fnmatch functions that I have been running into, so I have been messing around and so far I have a somewhat working glob feature (depending on how I have arranged my code).
If I have a glob variable declared as a global then the function partially works. However, any command afterwards produces an error. Example:
ls *.c
produce correct results
ls -l //no glob required
null passed through
so I tried making it a local variable. This is my code right now:
int runCommand(commandStruct * command1) {
if(!globbing)
execvp(command1->cmd_path, command1->argv);
else{
glob_t globbuf;
printf("globChar: %s\n", globChar);
glob(globChar, GLOB_DOOFFS, NULL, &globbuf);
//printf("globbuf.gl_pathv[0]: %s\n", &globbuf.gl_pathv[0]);
execvp(command1->cmd_path, &globbuf.gl_pathv[0]);
//globfree(&globbuf);
globbing = 0;
}
return 1;
}
When doing this with the globbuf as a local, it produces a null for globbuf.gl_pathv[0]. Can't seem to figure out why. Anyone with a knowledge of how glob works know what might be the cause? Can post more code if necessary but this is where the problem lies.
this works for me:
...
glob_t glob_buffer;
const char * pattern = "/tmp/*";
int i;
int match_count;
glob( pattern , 0 , NULL , &glob_buffer );
match_count = glob_buffer.gl_pathc;
printf("Number of matches: %d \n", match_count);
for (i=0; i < match_count; i++)
printf("match[%d] = %s \n",i,glob_buffer.gl_pathv[i]);
globfree( &glob_buffer );
...
Observe that the execvp function expects the argument list to end with a NULL pointer, i.e. I think it will be the easiest to create your own char ** argv copy with all the elements from the glob_buffer.gl_pathv[] and a NULL pointer at the end.
You are asking for GLOB_DOOFFS but you did not specify any number in globbuf.gl_offs saying how many slots to reserve.
Presumably as a global variable it gets initialized to 0.
Also this: &globbuf.gl_pathv[0] can simply be globbuf.gl_pathv.
And don't forget to run globfree(&globbuf).
I suggest running your program under valgrind because it probably has a number of memory leaks, and/or access to uninitialized memory.
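Putting the GLOB_DOOFFS and gl_offs points together, here is a hedged sketch of how the argument vector for execvp() might be built. The helper name glob_argv is made up; it reserves one leading slot for the command name and lets glob() supply the terminating NULL.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: expand `pattern` into an argv-style vector in
 * out->gl_pathv, with `cmd` in slot 0. Returns 0 on success, otherwise
 * glob()'s error code (e.g. GLOB_NOMATCH). On success the caller should
 * call globfree(out) once the vector is no longer needed. */
int glob_argv(const char *cmd, const char *pattern, glob_t *out)
{
    out->gl_offs = 1;                   /* must be set BEFORE calling glob() */

    int rc = glob(pattern, GLOB_DOOFFS, NULL, out);
    if (rc != 0)
        return rc;

    out->gl_pathv[0] = (char *)cmd;     /* fill the reserved slot */
    return 0;
}
```

After a successful call, execvp(cmd, out.gl_pathv) sees cmd followed by the matches and a NULL terminator. Reserving more slots (for options such as -l) works the same way with a larger gl_offs.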
If you don't have to use * style wildcards I've always found it simpler to use opendir(), readdir() and strcasestr(). opendir() opens a directory (can be ".") like a file, readdir() reads an entry from it, returns NULL at the end. So use it like
struct dirent *de = NULL;
DIR *dirp = opendir(".");
if (dirp != NULL) {
    while ((de = readdir(dirp)) != NULL) {
        if (strcasestr(de->d_name, ".jpg") != NULL) {
            // do something with your JPEG
        }
    }
    closedir(dirp);
}
Just remember to closedir() what you opendir(). A struct dirent has the d_type field if you want to use it, most files are type DT_REG (not dirs, pipes, symlinks, sockets, etc.).
It doesn't make a list like glob does, the directory is the list, you just use criteria to control what you select from it.
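If you do want * style wildcards with this readdir() approach, fnmatch() can do the matching on each entry name. A small sketch follows; the function count_matches is hypothetical and only counts, but the loop body is where you would act on each match.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: count the entries in `dir` whose names match the
 * shell pattern `pattern`. Returns -1 if the directory cannot be opened.
 * FNM_PERIOD makes a leading '.' require an explicit '.' in the pattern,
 * so "*" skips ".", ".." and hidden files, as a shell would. */
int count_matches(const char *dir, const char *pattern)
{
    DIR *dirp = opendir(dir);
    if (dirp == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dirp)) != NULL) {
        if (fnmatch(pattern, de->d_name, FNM_PERIOD) == 0)
            count++;                /* de->d_name matched the pattern */
    }

    closedir(dirp);
    return count;
}
```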

Transfer files in C

How do I transfer files from one folder to another, where both folders are present in oracle home directory?
int main(int argc, char *argv[]){
char *home, *tmp2;
home = getenv("ORACLE_HOME");
temp2 = getenv("ORACLE_HOME");
strcat (home,"A");
strcat (tmp2,"B");
//transfer files from home to tmp2
}
strcat doesn't seem to work. Here, I see tmp2 pointer doesn't get updated correctly.
Edit: OS is a UNIX based machine. Code edited.
I require a binary file which does this copying, with the intention that the real code cannot be viewed. Hence I didn't consider using shell script as an option. The files in A are encrypted and then copied to B, decrypted in B and run. As the files are in perl, I intend to use system command to run them in the same C code.
Using the system(3) function is probably a good idea, since you get the convenience of a shell interpreter to expand filenames (via *). The example below avoids the hassle of computing the exact buffer length needed for the command by using a fixed-length buffer and ensuring it cannot overflow:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFSZ 0xFFF
int main(void)
{
    char *ohome = getenv("ORACLE_HOME"), cmd[BUFSZ];
    const char *fmt = "/bin/mv %s/%s/* %s/%s";
    int written, ret;
    if (ohome == NULL) {
        /* ERROR: ORACLE_HOME is not set. */
        return 1;
    }
    written = snprintf(cmd, BUFSZ, fmt, ohome, "A", ohome, "B");
    if ((written < 0) || (written >= BUFSZ)) {
        /* ERROR: print error or ORACLE_HOME env var too long for BUFSZ. */
        return 1;
    }
    if ((ret = system(cmd)) == 0) {
        /* OK, move succeeded. */
    }
    return 0;
}
As commenter Paul Kuliniewicz points out, unexpected results may ensue if your ORACLE_HOME contains spaces or other special characters which may be interpreted by the subshell in the "system" command. Using one of the execl or execv family will let you build the arguments without worrying about the shell interpreter doing its own interpretation, but at the expense of losing wildcard expansion.
First of all as pointed out before, this "security" of yours is completely useless. It is trivial to intercept the files being copied (there are plenty of tools to monitor file system changes and such), but that is another story.
This is how you could do it, for the first part. To do the actual copying, you'd have to either use system() or read the whole file and then write it again, which is kind of long for this kind of quick copy.
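#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
    const char *env = getenv("ORACLE_HOME");
    char *home, *tmp2;
    if (env == NULL) return 1;                  /* ORACLE_HOME is not set */
    /* room for the value, the appended name, and the terminating NUL */
    home = malloc(strlen(env)+strlen("A")+1);
    tmp2 = malloc(strlen(env)+strlen("B")+1);
    if (home == NULL || tmp2 == NULL) return 1;
    strcpy(home, env); strcat(home, "A");
    strcpy(tmp2, env); strcat(tmp2, "B");
    /* ... use home and tmp2 here ... */
    free(home); free(tmp2);
    return 0;
}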
By the way, if you could stand just moving the file, it would be much easier, you could just do:
rename(home,tmp2);
Not related to what you are asking, but a comment on your code:
You probably won't be able to strcat to the results of a getenv, because getenv might (in some environments) return a pointer to read-only memory. Instead, make a new buffer and strcpy the results of the getenv into it, and then strcat the rest of the file name.
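A sketch of that advice, using snprintf() into a caller-supplied buffer rather than strcpy()/strcat(), might look like this. The helper name env_path is invented for illustration, and it joins the two parts with a '/'.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write "<value of var>/<leaf>" into out.
 * Returns 0 on success, -1 if the variable is unset or the result does
 * not fit in `size` bytes. The string returned by getenv() is only read,
 * never modified. */
int env_path(const char *var, const char *leaf, char *out, size_t size)
{
    const char *base = getenv(var);
    if (base == NULL)
        return -1;

    int n = snprintf(out, size, "%s/%s", base, leaf);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;   /* n >= size means truncated */
}
```

In the question's terms this would be called once with "A" and once with "B", each with its own buffer, so the two results no longer alias the same getenv() pointer.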
The quick-n-dirty way to do the transferring is to use the cp shell command to do the copying, but invoke it using the system command instead of using a shell script.
Or, have your C program create a shell script to do the copying, run the shell script, and then delete it.
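If you would rather not go through system() at all, copying one file by hand is a read/write loop. The following is only a sketch: copy_file is a hypothetical name, the destination is created with mode 0600 as an arbitrary choice, and it copies contents only (no permissions, ownership, or timestamps).

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy the contents of src to dst.
 * Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    ssize_t n = 0;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* write() may write less than asked */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                n = -1;                 /* remember the failure */
                break;
            }
            off += w;
        }
        if (n < 0)
            break;
    }

    close(in);
    if (close(out) != 0)
        n = -1;                         /* delayed write errors show up here */
    return n < 0 ? -1 : 0;
}
```

To copy every file in a directory, you would call this from an opendir()/readdir() loop over the source folder.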

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
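A rough, Linux-only sketch of the readlink-then-verify procedure described above could look like this. The function name fd_to_path is hypothetical, and it assumes /proc is mounted.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): store the path for `fd` in buf.
 * Returns 0 only if the name still refers to the same file as the
 * descriptor; -1 for pipes, sockets, deleted or renamed files, or errors. */
int fd_to_path(int fd, char *buf, size_t size)
{
    char link[64];
    struct stat by_fd, by_name;
    ssize_t len;

    if (size == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    len = readlink(link, buf, size - 1);    /* readlink() does not NUL-terminate */
    if (len < 0)
        return -1;
    buf[len] = '\0';

    if (buf[0] != '/')                      /* e.g. "pipe:[1538488]" */
        return -1;

    /* Verify: the name must lead back to the same device and inode. */
    if (fstat(fd, &by_fd) != 0 || stat(buf, &by_name) != 0)
        return -1;
    return (by_fd.st_dev == by_name.st_dev &&
            by_fd.st_ino == by_name.st_ino) ? 0 : -1;
}
```

A name that was silently truncated because buf is too small will normally fail the stat() comparison, so it is reported as -1 rather than returned as a wrong path.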
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argu-
ment must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
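For the case where you do know the directory, a sketch of that inode search might look like the following. The helper name fd_name_in_dir is made up, and it calls lstat() on every entry (rather than trusting d_ino) so that the device number can be compared too.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: look in `dir` for an entry that is the same file
 * as `fd` and copy its name (not the full path) into out.
 * Returns 0 if found, -1 if not found or on error. */
int fd_name_in_dir(int fd, const char *dir, char *out, size_t size)
{
    struct stat target, st;
    struct dirent *de;
    char path[4096];                /* arbitrary working size for this sketch */
    int found = -1;

    if (fstat(fd, &target) != 0)
        return -1;
    DIR *dirp = opendir(dir);
    if (dirp == NULL)
        return -1;

    while (found != 0 && (de = readdir(dirp)) != NULL) {
        int n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* name too long for the working buffer */
        if (lstat(path, &st) == 0 &&
            st.st_dev == target.st_dev && st.st_ino == target.st_ino) {
            n = snprintf(out, size, "%s", de->d_name);
            if (n >= 0 && (size_t)n < size)
                found = 0;          /* first matching name wins */
        }
    }

    closedir(dirp);
    return found;
}
```

If the file has several hard links in the same directory, this returns whichever one readdir() happens to report first.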
Nasty?
There is no official API to do this on OpenBSD, though with some very convoluted workarounds, it is still possible with the following code, note you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search has to go in the directory tree of your filesystem without finding the file, the more likely you are to hit a race condition, though that is still very unlikely given how fast this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C; most of the code logic will be the same. If I have time I'll try to rewrite this in C.
Like macOS, this solution gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the goto finish in the inner while loop and return a vector if you don't care about being cross-platform and want to get all the hard links.
DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id, and not be limited to just the calling one, while also potentially getting all hard links rather than a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straightforward. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process, however it should work for most Unix-likes out-of-the-box, with all the same concerns as the former solution in regards to hard links and race conditions, although performs slightly faster due to less if-then, for-loops, etc:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution, also limited to the calling process, is to wrap all your calls to fopen() and open() in a helper function that records each file descriptor together with the absolute path of what was passed to the wrapper (and likewise for the Windows-only equivalents such as _wopen_s(), which won't work on UWP). Store the pairs in whatever C equivalent of a std::unordered_map you have, which you will likely have to write yourself using malloc()'d and eventually free()'d int and C-string arrays. The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (the W variant for Unicode support) on Windows. realpath() will resolve symbolic links (which aren't nearly as commonly used on Windows), and both realpath() and GetFullPathNameW() will convert a relative path to an absolute one.
This will again be faster than any solution that does a dynamic search of your filesystem, but it has a different and unappealing limitation: it will not notice files that were moved around on your filesystem. You can at least check whether the file was deleted by testing for existence yourself, but it also won't notice whether the file was replaced since you opened it and stored the path, so it can give you outdated results. Let me know if you would like to see a code example of this, though due to files changing location I do not recommend this solution.
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

Path to binary in C

How can I get the path where the binary that is executing resides in a C program?
I'm looking for something similar to __FILE__ in ruby/perl/PHP (but of course, the __FILE__ macro in C is determined at compile time).
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
Totally non-portable Linux solution:
#include <stdio.h>
#include <unistd.h>
int main()
{
    char buffer[BUFSIZ] = {0};  /* zero-filled, because readlink() does not NUL-terminate */
    readlink("/proc/self/exe", buffer, BUFSIZ - 1);
    printf("%s\n", buffer);
}
This uses the "/proc/self" trick, which points to the process that is running. That way it saves faffing about looking up the PID. Error handling left as an exercise to the wary.
The non-portable Windows solution:
WCHAR path[MAX_PATH];
GetModuleFileNameW(NULL, path, ARRAYSIZE(path));
Here's an example that might be helpful for Linux systems:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
/*
* getexename - Get the filename of the currently running executable
*
* The getexename() function copies an absolute filename of the currently
* running executable to the array pointed to by buf, which is of length size.
*
* If the filename would require a buffer longer than size elements, NULL is
* returned, and errno is set to ERANGE; an application should check for this
* error, and allocate a larger buffer if necessary.
*
* Return value:
* NULL on failure, with errno set accordingly, and buf on success. The
* contents of the array pointed to by buf is undefined on error.
*
* Notes:
* This function is tested on Linux only. It relies on information supplied by
* the /proc file system.
* The returned filename points to the final executable loaded by the execve()
* system call. In the case of scripts, the filename points to the script
* handler, not to the script.
* The filename returned points to the actual exectuable and not a symlink.
*
*/
char* getexename(char* buf, size_t size)
{
    char linkname[64]; /* /proc/<pid>/exe */
    pid_t pid;
    ssize_t ret;
    /* Get our PID and build the name of the link in /proc */
    pid = getpid();
    if (snprintf(linkname, sizeof(linkname), "/proc/%i/exe", (int)pid) < 0)
    {
        /* This should only happen on large word systems. I'm not sure
           what the proper response is here.
           Since it really is an assert-like condition, aborting the
           program seems to be in order. */
        abort();
    }
    /* Now read the symbolic link */
    ret = readlink(linkname, buf, size);
    /* In case of an error, leave the handling up to the caller */
    if (ret == -1)
        return NULL;
    /* Report insufficient buffer size */
    if ((size_t)ret >= size)
    {
        errno = ERANGE;
        return NULL;
    }
    /* Ensure proper NUL termination */
    buf[ret] = 0;
    return buf;
}
Essentially, you use getpid() to find your PID, then figure out where the symbolic link at /proc/<pid>/exe points to.
A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that calls the real binary with its original arguments.
#!/bin/sh
"$0.exe" "$@"
The redirection through a shell script means that the real program gets an argv[0] that's actually useful instead of one that may live in the $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a problem that was language-independent.
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
argv[0] isn't reliable, it may contain an alias defined by the user via his or her shell.
Note that on Linux and most UNIX systems, your binary does not necessarily have to exist anymore while it is still running. Also, the binary could have been replaced. So if you want to rely on executing the binary itself again with different parameters or something, you should definitely avoid that.
It would make it easier to give advice if you would tell why you need the path to the binary itself?
Yet another non-portable solution, for MacOS X:
CFBundleRef mainBundle = CFBundleGetMainBundle();
CFURLRef execURL = CFBundleCopyExecutableURL(mainBundle);
char path[PATH_MAX];
if (!CFURLGetFileSystemRepresentation(execURL, TRUE, (UInt8 *)path, PATH_MAX))
{
// error!
}
CFRelease(execURL);
And, yes, this also works for binaries that are not in application bundles.
Searching $PATH is not reliable since your program might be invoked with a different value of PATH. e.g.
$ /usr/bin/env | grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
$ PATH=/tmp /usr/bin/env | grep PATH
PATH=/tmp
Note that if I run a program like this, argv[0] is worse than useless:
#include <unistd.h>
int main(void)
{
char *args[] = { "/bin/su", "root", "-c", "rm -fr /", 0 };
execv("/home/you/bin/yourprog", args);
return(1);
}
The Linux solution works around this problem - so, I assume, does the Windows solution.
