Programmatically retrieving the absolute path of an OS X command-line app - c

On Linux, an application can easily get its absolute path by querying /proc/self/exe. On FreeBSD, it's more involved, since you have to build up a sysctl call:
int mib[4];
mib[0] = CTL_KERN;
mib[1] = KERN_PROC;
mib[2] = KERN_PROC_PATHNAME;
mib[3] = -1;
char buf[1024];
size_t cb = sizeof(buf);
sysctl(mib, 4, buf, &cb, NULL, 0);
but it's still completely doable. Yet I cannot find a way to determine this on OS X for a command-line application. If you're running from within an app bundle, you can determine it by running [[NSBundle mainBundle] bundlePath], but because command-line applications are not in bundles, this doesn't help.
(Note: consulting argv[0] is not a reasonable answer, since, if launched from a symlink, argv[0] will be that symlink--not the ultimate path to the executable called. argv[0] can also lie if a dumb application uses an exec() call and forget to initialize argv properly, which I have seen in the wild.)

The function _NSGetExecutablePath will return a full path to the executable (GUI or not). The path may contain symbolic links, "..", etc. but the realpath function can be used to clean those up if needed. See man 3 dyld for more information.
char path[1024];
uint32_t size = sizeof(path);
if (_NSGetExecutablePath(path, &size) == 0)
printf("executable path is %s\n", path);
else
printf("buffer too small; need size %u\n", size);
The secret to this function is that the Darwin kernel puts the executable path on the process stack immediately after the envp array when it creates the process. The dynamic link editor dyld grabs this on initialization and keeps a pointer to it. This function uses that pointer.

I believe there is much more elegant solution, which actually works for any PID, and also returns the absolute path directly:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <libproc.h>
int main (int argc, char* argv[])
{
int ret;
pid_t pid;
char pathbuf[PROC_PIDPATHINFO_MAXSIZE];
pid = getpid();
ret = proc_pidpath (pid, pathbuf, sizeof(pathbuf));
if ( ret <= 0 ) {
fprintf(stderr, "PID %d: proc_pidpath ();\n", pid);
fprintf(stderr, " %s\n", strerror(errno));
} else {
printf("proc %d: %s\n", pid, pathbuf);
}
return 0;
}

Looks like the answer is that you can't do it:
I'm trying to achieve something like
lsof's functionality and gather a
whole bunch of statistics and info
about running processes. If lsof
weren't so slow, I'd be happy sticking
with it.
If you reimplement lsof, you will find
that it's slow because it's doing a
lot of work.
I guess that's not really because lsof
is user-mode, it's more that it has to
scan through a task's address space
looking for things backed by an
external pager. Is there any quicker
way of doing this when I'm in the
kernel?
No. lsof is not stupid; it's doing
what it has to do. If you just want a
subset of its functionality, you might
want to consider starting with the
lsof source (which is available) and
trimming it down to meet your
requirements.
Out of curiosity, is p_textvp used at
all? It looks like it's set to the
parent's p_textvp in kern_fork (and
then getting released??) but it's not
getting touched in any of kern_exec's
routines.
p_textvp is not used. In Darwin, the
proc is not the root of the address
space; the task is. There is no
concept of "the vnode" for a task's
address space, as it is not
necessarily initially populated by
mapping one.
If exec were to populate p_textvp, it
would pander to the assumption that
all processes are backed by a vnode.
Then programmers would assume that it
was possible to get a path to the
vnode, and from there it is a short
jump to the assumption that the
current path to the vnode is the path
from which it was launched, and that
text processing on the string might
lead to the application bundle name...
all of which would be impossible to
guarantee without substantial penalty.
—Mike Smith, Darwin Drivers
mailing list

This is late, but [[NSBundle mainBundle] executablePath] works just fine for non-bundled, command-line programs.

There is no guaranteed way I think.
If argv[0] is a symlink then you could use readlink().
If command is executed through the $PATH then one could
try some of: search(getenv("PATH")), getenv("_"), dladdr()

Why not simply realpath(argv[0], actualpath);? True, realpath has some limits (documented in the manual page) but it handles symbolic links fine. Tested on FreeBSD and Linux
% ls -l foobar
lrwxr-xr-x 1 bortzmeyer bortzmeyer 22 Apr 29 07:39 foobar -> /tmp/get-real-name-exe
% ./foobar
My real path: /tmp/get-real-name-exe
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <libgen.h>
#include <string.h>
#include <sys/stat.h>
int
main(argc, argv)
int argc;
char **argv;
{
char actualpath[PATH_MAX + 1];
if (argc > 1) {
fprintf(stderr, "Usage: %s\n", argv[0]);
exit(1);
}
realpath(argv[0], actualpath);
fprintf(stdout, "My real path: %s\n", actualpath);
exit(0);
}
If the program is launched via PATH, see pixelbeat's solution.

http://developer.apple.com/documentation/Carbon/Reference/Process_Manager/Reference/reference.html#//apple_ref/c/func/GetProcessBundleLocation
GetProcessBundleLocation seems to work.

Related

How to use relative paths in general C program?

I have created a C program which I'd like to publish to Homebrew, so it needs to be able to be run from any working directory (via a symlink in the user's PATH).
The program uses relative paths (relative to the code) for file writing, and I'm not sure how to set it up so that the relative paths will work, regardless of what location the program is run from. (using absolute paths doesn't seem to be an option, since it needs to run on anyone's system).
Please could someone provide some general guidance on how to approach this?
Many thanks in advance.
Under Linux, the file /proc/self/exe is a symbolic link to the actual executable running. Hence you could use that to ascertain the directory where the executable is, and work from there.
For example, this program (executable file /home/pax/testprog):
#include <stdio.h>
#include <unistd.h>
int main(void) {
// Buffer should be more robust in real program but read anyway.
char buff[1000];
ssize_t sz = readlink("/proc/self/exe", buff, sizeof(buff));
// Find final directory separator (from '/home/pax/testprog').
while (sz > 0 && buff[sz-1] != '/')
--sz;
// Truncate string to directory name only, and print.
buff[sz] = '\0';
printf("[%s]\n", buff);
}
generates the following output:
[/home/pax/]
For MacOs, it's the same theory, but you may not necessarily have procfs available to you. On that system however, there's another way to get the path:
int proc_pidpath(
int pid, // pid of the process to know more about
void * buffer, // buffer to fill with the abs path
uint32_t buffersize // size of the buffer
);
So, in order to use this, replace:
ssize_t sz = readlink("/proc/self/exe", buff, sizeof(buff));
with:
ssize_t sz = proc_pidpath(getpid(), buff, sizeof(buff));
though you'll probably also need the libproc.h header (getpid is already included as part of unistd.h).

Why does symlink() always print symlink failed?

My code always prints symlink failed even when it creates the symlink, why does this happen?
I am writting all the core utils I use myself as I want the experince and don't like the implementations that exist, I am working on ln and honestly may just do soft links and skip hard links. Right now the program works, but always prints my error and I can't figure out why.
#include <stdio.h>
#include <unistd.h>
int main(int argc, const char *argv[])
{
short i;
for (i = 1; i < argc; i++) {
if (symlink(argv[1], argv[2]) == -1)
printf("symlink failed");
else
symlink(argv[1], argv[2]);
}
}
You're looping over each argument to the program, but attempting to create a symlink from argv[2] to argv[1] on every iteration. The first one might succeed, but any further attempts will always fail because the link already exists.
You'll want to think carefully about how ln should behave when passed more than two arguments. The behavior of ln -s is more complex than simply calling symlink(); notably, it behaves differently when the last argument is a directory.

Reading a directory file in C

I'm trying to write a small program to show me the internal representation of a directory in linux (debian, specifically). The idea was a small C program using open(".", O_RDONLY), but this seems to give no output. The program is the following:
#include <stdio.h>
#include <fcntl.h>
int main(int argc, char** argv)
{
int fd = open(argv[1],O_RDONLY,0 );
char buf;
printf("%i\n",fd);
while(read(fd, &buf, 1) > 0)
printf("%x ", buf);
putchar('\n');
}
When I run it on regular files it works as expected, but on a directory such as ".", it gives no output. The value of fd is 3 (as expected) but the call to read returns -1.
Why isn't this working, and how could I achieve to read the internal representation?
Thanks!
For handling directories, you need to use opendir/readdir/closedir. Read the corresponding man pages for more infos.
To check whether a filename corresponds to a directory, you first need to call stat for the filename and check whether it's a directory (S_ISDIR(myStatStruc.st_mode)).
Directories are a filesystem specific representation and are part of the file system. On extfs, they are a table of string/inode pairs, unlike files which have blocks of data(that you read using your code above).
To read directory-specific information in C, you need to use dirent.h .
Look at this page for more information
http://pubs.opengroup.org/onlinepubs/7908799/xsh/dirent.h.html
On POSIX systems, the system call "stat" would give you all the information about an inode on the filesystem(file/directory/etc.)

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argu-
ment must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
There is no official API to do this on OpenBSD, though with some very convoluted workarounds, it is still possible with the following code, note you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to have a race condition, though still very unlikely due to how performant this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C and most of the code logic will be the same. If I have time I'll try to rewrite this in C hopefully soon. Like macOS, this solution gets a hardlink at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the break in the while loop and return a vector if you want don't care about being cross-platform and want to get all the hard links. DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened any process, by plugging in a process id, and not be limited to just the calling one, while also getting all hard links potentially, and not being limited to a random one, see this answer. It should be a lot more performant that traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straight-forward and to-the-point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process, however it should work for most Unix-likes out-of-the-box, with all the same concerns as the former solution in regards to hard links and race conditions, although performs slightly faster due to less if-then, for-loops, etc:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution which is also limited to the calling process, but should be somewhat more performant, you could wrap all your calls to fopen() and open() with a helper function which stores basically whatever C equivalent there is to an std::unordered_map, and pair up the file descriptor with the absolute path version of what is passed to your fopen()/open() wrappers (and the Windows-only equivalents which won't work on UWP like _wopen_s() and all that nonsense to support UTF-8), which can be done with realpath() on Unix-likes, or GetFullPathNameW() (*W for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't near as commonly used on Windows), and realpath() / GetFullPathNameW() will convert your existing file you opened from a relative path, if it is one, to an absolute path. With the file descriptor and absolute path stored an a C equivalent to a std::unordered_map (which you likely will have to write yourself using malloc()'d and eventually free()'d int and c-string arrays), this will again, be faster than any other solution that does a dynamic search of your filesystem, but it has a different and unappealing limitation, which is it will not make note of files which were moved around on your filesystem, however at least you can check whether the file was deleted using your own code to test existence, it also won't make note of the file in whether it was replaced since the time you opened it and stored the path to the descriptor in memory, thus giving you outdated results potentially. Let me know if you would like to see a code example of this, though due to files changing location I do not recommend this solution.
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

Path to binary in C

How can I get the path where the binary that is executing resides in a C program?
I'm looking for something similar to __FILE__ in ruby/perl/PHP (but of course, the __FILE__ macro in C is determined at compile time).
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
Totally non-portable Linux solution:
#include <stdio.h>
#include <unistd.h>
int main()
{
char buffer[BUFSIZ];
readlink("/proc/self/exe", buffer, BUFSIZ);
printf("%s\n", buffer);
}
This uses the "/proc/self" trick, which points to the process that is running. That way it saves faffing about looking up the PID. Error handling left as an exercise to the wary.
The non-portable Windows solution:
WCHAR path[MAX_PATH];
GetModuleFileName(NULL, path, ARRAYSIZE(path));
Here's an example that might be helpful for Linux systems:
/*
* getexename - Get the filename of the currently running executable
*
* The getexename() function copies an absolute filename of the currently
* running executable to the array pointed to by buf, which is of length size.
*
* If the filename would require a buffer longer than size elements, NULL is
* returned, and errno is set to ERANGE; an application should check for this
* error, and allocate a larger buffer if necessary.
*
* Return value:
* NULL on failure, with errno set accordingly, and buf on success. The
* contents of the array pointed to by buf is undefined on error.
*
* Notes:
* This function is tested on Linux only. It relies on information supplied by
* the /proc file system.
* The returned filename points to the final executable loaded by the execve()
* system call. In the case of scripts, the filename points to the script
* handler, not to the script.
* The filename returned points to the actual exectuable and not a symlink.
*
*/
char* getexename(char* buf, size_t size)
{
char linkname[64]; /* /proc/<pid>/exe */
pid_t pid;
int ret;
/* Get our PID and build the name of the link in /proc */
pid = getpid();
if (snprintf(linkname, sizeof(linkname), "/proc/%i/exe", pid) < 0)
{
/* This should only happen on large word systems. I'm not sure
what the proper response is here.
Since it really is an assert-like condition, aborting the
program seems to be in order. */
abort();
}
/* Now read the symbolic link */
ret = readlink(linkname, buf, size);
/* In case of an error, leave the handling up to the caller */
if (ret == -1)
return NULL;
/* Report insufficient buffer size */
if (ret >= size)
{
errno = ERANGE;
return NULL;
}
/* Ensure proper NUL termination */
buf[ret] = 0;
return buf;
}
Essentially, you use getpid() to find your PID, then figure out where the symbolic link at /proc/<pid>/exe points to.
A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that calls the function with its original arguments.
#!/bin/sh
$0.exe "$#"
The redirection through a shell script means that the real program gets an argv[0] that's actually useful instead of one that may live in the $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a problem that was language-independent.
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
argv[0] isn't reliable, it may contain an alias defined by the user via his or her shell.
Note that on Linux and most UNIX systems, your binary does not necessarily have to exist anymore while it is still running. Also, the binary could have been replaced. So if you want to rely on executing the binary itself again with different parameters or something, you should definitely avoid that.
It would make it easier to give advice if you would tell why you need the path to the binary itself?
Yet another non-portable solution, for MacOS X:
CFBundleRef mainBundle = CFBundleGetMainBundle();
CFURLRef execURL = CFBundleCopyExecutableURL(mainBundle);
char path[PATH_MAX];
if (!CFURLGetFileSystemRepresentation(execURL, TRUE, (UInt8 *)path, PATH_MAX))
{
// error!
}
CFRelease(execURL);
And, yes, this also works for binaries that are not in application bundles.
Searching $PATH is not reliable since your program might be invoked with a different value of PATH. e.g.
$ /usr/bin/env | grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
$ PATH=/tmp /usr/bin/env | grep PATH
PATH=/tmp
Note that if I run a program like this, argv[0] is worse than useless:
#include <unistd.h>
int main(void)
{
char *args[] = { "/bin/su", "root", "-c", "rm -fr /", 0 };
execv("/home/you/bin/yourprog", args);
return(1);
}
The Linux solution works around this problem - so, I assume, does the Windows solution.

Resources