Get the calling application name in runtime in C linux [duplicate] - c

It seems to me that Linux has it easy with /proc/self/exe. But I'd like to know if there is a convenient way to find the current application's directory in C/C++ with cross-platform interfaces. I've seen some projects mucking around with argv[0], but it doesn't seem entirely reliable.
If you ever had to support, say, Mac OS X, which doesn't have /proc/, what would you have done? Use #ifdefs to isolate the platform-specific code (NSBundle, for example)? Or try to deduce the executable's path from argv[0], $PATH and whatnot, risking finding bugs in edge cases?

Some OS-specific interfaces:
Mac OS X: _NSGetExecutablePath() (man 3 dyld)
Linux: readlink /proc/self/exe
Solaris: getexecname()
FreeBSD: sysctl CTL_KERN KERN_PROC KERN_PROC_PATHNAME -1
FreeBSD if it has procfs: readlink /proc/curproc/file (FreeBSD doesn't have procfs by default)
NetBSD: readlink /proc/curproc/exe
DragonFly BSD: readlink /proc/curproc/file
Windows: GetModuleFileName() with hModule = NULL
There are also third party libraries that can be used to get this information, such as whereami as mentioned in prideout's answer, or if you are using Qt, QCoreApplication::applicationFilePath() as mentioned in the comments.
The portable (but less reliable) method is to use argv[0]. Although it could be set to anything by the calling program, by convention it is set to either a path name of the executable or a name that was found using $PATH.
Some shells, including bash and ksh, set the environment variable "_" to the full path of the executable before it is executed. In that case you can use getenv("_") to get it. However this is unreliable because not all shells do this, and it could be set to anything or be left over from a parent process which did not change it before executing your program.

The use of /proc/self/exe is non-portable and unreliable. On my Ubuntu 12.04 system, you must be root to read/follow the symlink. This will make the Boost example and probably the whereami() solutions posted fail.
This post is very long but discusses the actual issues and presents code which actually works along with validation against a test suite.
The best way to find your program is to retrace the same steps the system uses. This is done by using argv[0] resolved against file system root, pwd, path environment and considering symlinks, and pathname canonicalization. This is from memory but I have done this in the past successfully and tested it in a variety of different situations. It is not guaranteed to work, but if it doesn't you probably have much bigger problems and it is more reliable overall than any of the other methods discussed. There are situations on a Unix compatible system in which proper handling of argv[0] will not get you to your program but then you are executing in a certifiably broken environment. It is also fairly portable to all Unix derived systems since around 1970 and even some non-Unix derived systems as it basically relies on libc() standard functionality and standard command line functionality. It should work on Linux (all versions), Android, Chrome OS, Minix, original Bell Labs Unix, FreeBSD, NetBSD, OpenBSD, BSD x.x, SunOS, Solaris, SYSV, HP-UX, Concentrix, SCO, Darwin, AIX, OS X, NeXTSTEP, etc. And with a little modification probably VMS, VM/CMS, DOS/Windows, ReactOS, OS/2, etc. If a program was launched directly from a GUI environment, it should have set argv[0] to an absolute path.
Understand that almost every shell on every Unix compatible operating system that has ever been released basically finds programs the same way and sets up the operating environment almost the same way (with some optional extras). And any other program that launches a program is expected to create the same environment (argv, environment strings, etc.) for that program as if it were run from a shell, with some optional extras. A program or user can setup an environment that deviates from this convention for other subordinate programs that it launches but if it does, this is a bug and the program has no reasonable expectation that the subordinate program or its subordinates will function correctly.
Possible values of argv[0] include:
/path/to/executable — absolute path
../bin/executable — relative to pwd
bin/executable — relative to pwd
./foo — relative to pwd
executable — basename, find in path
bin//executable — relative to pwd, non-canonical
src/../bin/executable — relative to pwd, non-canonical, backtracking
bin/./echoargc — relative to pwd, non-canonical
Values you should not see:
~/bin/executable — rewritten before your program runs.
~user/bin/executable — rewritten before your program runs
alias — rewritten before your program runs
$shellvariable — rewritten before your program runs
*foo* — wildcard, rewritten before your program runs, not very useful
?foo? — wildcard, rewritten before your program runs, not very useful
In addition, these may contain non-canonical path names and multiple layers of symbolic links. In some cases, there may be multiple hard links to the same program. For example, /bin/ls, /bin/ps, /bin/chmod, /bin/rm, etc. may be hard links to /bin/busybox.
To find yourself, follow the steps below:
Save pwd, PATH, and argv[0] on entry to your program (or initialization of your library) as they may change later.
Optional: particularly for non-Unix systems, separate out but don't discard the pathname host/user/drive prefix part, if present; the part which often precedes a colon or follows an initial "//".
If argv[0] is an absolute path, use that as a starting point. An absolute path probably starts with "/" but on some non-Unix systems it might start with "" or a drive letter or name prefix followed by a colon.
Else if argv[0] is a relative path (contains "/" or "" but doesn't start with it, such as "../../bin/foo", then combine pwd+"/"+argv[0] (use present working directory from when program started, not current).
Else if argv[0] is a plain basename (no slashes), then combine it with each entry in PATH environment variable in turn and try those and use the first one which succeeds.
Optional: Else try the very platform specific /proc/self/exe, /proc/curproc/file (BSD), and (char *)getauxval(AT_EXECFN), and dlgetname(...) if present. You might even try these before argv[0]-based methods, if they are available and you don't encounter permission issues. In the somewhat unlikely event (when you consider all versions of all systems) that they are present and don't fail, they might be more authoritative.
Optional: check for a path name passed in using a command line parameter.
Optional: check for a pathname in the environment explicitly passed in by your wrapper script, if any.
Optional: As a last resort try environment variable "_". It might point to a different program entirely, such as the users shell.
Resolve symlinks, there may be multiple layers. There is the possibility of infinite loops, though if they exist your program probably won't get invoked.
Canonicalize filename by resolving substrings like "/foo/../bar/" to "/bar/". Note this may potentially change the meaning if you cross a network mount point, so canonization is not always a good thing. On a network server, ".." in symlink may be used to traverse a path to another file in the server context instead of on the client. In this case, you probably want the client context so canonicalization is ok. Also convert patterns like "/./" to "/" and "//" to "/".
In shell, readlink --canonicalize will resolve multiple symlinks and canonicalize name. Chase may do similar but isn't installed. realpath() or canonicalize_file_name(), if present, may help.
If realpath() doesn't exist at compile time, you might borrow a copy from a permissively licensed library distribution, and compile it in yourself rather than reinventing the wheel. Fix the potential buffer overflow (pass in sizeof output buffer, think strncpy() vs strcpy()) if you will be using a buffer less than PATH_MAX. It may be easier just to use a renamed private copy rather than testing if it exists. Permissive license copy from android/darwin/bsd:
https://android.googlesource.com/platform/bionic/+/f077784/libc/upstream-freebsd/lib/libc/stdlib/realpath.c
Be aware that multiple attempts may be successful or partially successful and they might not all point to the same executable, so consider verifying your executable; however, you may not have read permission — if you can't read it, don't treat that as a failure. Or verify something in proximity to your executable such as the "../lib/" directory you are trying to find. You may have multiple versions, packaged and locally compiled versions, local and network versions, and local and USB-drive portable versions, etc. and there is a small possibility that you might get two incompatible results from different methods of locating. And "_" may simply point to the wrong program.
A program using execve can deliberately set argv[0] to be incompatible with the actual path used to load the program and corrupt PATH, "_", pwd, etc. though there isn't generally much reason to do so; but this could have security implications if you have vulnerable code that ignores the fact that your execution environment can be changed in variety of ways including, but not limited, to this one (chroot, fuse filesystem, hard links, etc.) It is possible for shell commands to set PATH but fail to export it.
You don't necessarily need to code for non-Unix systems but it would be a good idea to be aware of some of the peculiarities so you can write the code in such a way that it isn't as hard for someone to port later. Be aware that some systems (DEC VMS, DOS, URLs, etc.) might have drive names or other prefixes which end with a colon such as "C:", "sys$drive:[foo]bar", and "file:///foo/bar/baz". Old DEC VMS systems use "[" and "]" to enclose the directory portion of the path though this may have changed if your program is compiled in a POSIX environment. Some systems, such as VMS, may have a file version (separated by a semicolon at the end). Some systems use two consecutive slashes as in "//drive/path/to/file" or "user#host:/path/to/file" (scp command) or "file://hostname/path/to/file" (URL). In some cases (DOS and Windows), PATH might have different separator characters — ";" vs ":" and "" vs "/" for a path separator. In csh/tsh there is "path" (delimited with spaces) and "PATH" delimited with colons but your program should receive PATH so you don't need to worry about path. DOS and some other systems can have relative paths that start with a drive prefix. C:foo.exe refers to foo.exe in the current directory on drive C, so you do need to lookup current directory on C: and use that for pwd.
An example of symlinks and wrappers on my system:
/usr/bin/google-chrome is symlink to
/etc/alternatives/google-chrome which is symlink to
/usr/bin/google-chrome-stable which is symlink to
/opt/google/chrome/google-chrome which is a bash script which runs
/opt/google/chome/chrome
Note that user bill posted a link above to a program at HP that handles the three basic cases of argv[0]. It needs some changes, though:
It will be necessary to rewrite all the strcat() and strcpy() to use strncat() and strncpy(). Even though the variables are declared of length PATHMAX, an input value of length PATHMAX-1 plus the length of concatenated strings is > PATHMAX and an input value of length PATHMAX would be unterminated.
It needs to be rewritten as a library function, rather than just to print out results.
It fails to canonicalize names (use the realpath code I linked to above)
It fails to resolve symbolic links (use the realpath code)
So, if you combine both the HP code and the realpath code and fix both to be resistant to buffer overflows, then you should have something which can properly interpret argv[0].
The following illustrates actual values of argv[0] for various ways of invoking the same program on Ubuntu 12.04. And yes, the program was accidentally named echoargc instead of echoargv. This was done using a script for clean copying but doing it manually in shell gets same results (except aliases don't work in script unless you explicitly enable them).
cat ~/src/echoargc.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
main(int argc, char **argv)
{
printf(" argv[0]=\"%s\"\n", argv[0]);
sleep(1); /* in case run from desktop */
}
tcc -o ~/bin/echoargc ~/src/echoargc.c
cd ~
/home/whitis/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
echoargc
argv[0]="echoargc"
bin/echoargc
argv[0]="bin/echoargc"
bin//echoargc
argv[0]="bin//echoargc"
bin/./echoargc
argv[0]="bin/./echoargc"
src/../bin/echoargc
argv[0]="src/../bin/echoargc"
cd ~/bin
*echo*
argv[0]="echoargc"
e?hoargc
argv[0]="echoargc"
./echoargc
argv[0]="./echoargc"
cd ~/src
../bin/echoargc
argv[0]="../bin/echoargc"
cd ~/junk
~/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
~whitis/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
alias echoit=~/bin/echoargc
echoit
argv[0]="/home/whitis/bin/echoargc"
echoarg=~/bin/echoargc
$echoarg
argv[0]="/home/whitis/bin/echoargc"
ln -s ~/bin/echoargc junk1
./junk1
argv[0]="./junk1"
ln -s /home/whitis/bin/echoargc junk2
./junk2
argv[0]="./junk2"
ln -s junk1 junk3
./junk3
argv[0]="./junk3"
gnome-desktop-item-edit --create-new ~/Desktop
# interactive, create desktop link, then click on it
argv[0]="/home/whitis/bin/echoargc"
# interactive, right click on gnome application menu, pick edit menus
# add menu item for echoargc, then run it from gnome menu
argv[0]="/home/whitis/bin/echoargc"
cat ./testargcscript 2>&1 | sed -e 's/^/ /g'
#!/bin/bash
# echoargc is in ~/bin/echoargc
# bin is in path
shopt -s expand_aliases
set -v
cat ~/src/echoargc.c
tcc -o ~/bin/echoargc ~/src/echoargc.c
cd ~
/home/whitis/bin/echoargc
echoargc
bin/echoargc
bin//echoargc
bin/./echoargc
src/../bin/echoargc
cd ~/bin
*echo*
e?hoargc
./echoargc
cd ~/src
../bin/echoargc
cd ~/junk
~/bin/echoargc
~whitis/bin/echoargc
alias echoit=~/bin/echoargc
echoit
echoarg=~/bin/echoargc
$echoarg
ln -s ~/bin/echoargc junk1
./junk1
ln -s /home/whitis/bin/echoargc junk2
./junk2
ln -s junk1 junk3
./junk3
These examples illustrate that the techniques described in this post should work in a wide range of circumstances and why some of the steps are necessary.
EDIT: Now, the program that prints argv[0] has been updated to actually find itself.
// Copyright 2015 by Mark Whitis. License=MIT style
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#include <assert.h>
#include <string.h>
#include <errno.h>
// "look deep into yourself, Clarice" -- Hanibal Lector
char findyourself_save_pwd[PATH_MAX];
char findyourself_save_argv0[PATH_MAX];
char findyourself_save_path[PATH_MAX];
char findyourself_path_separator='/';
char findyourself_path_separator_as_string[2]="/";
char findyourself_path_list_separator[8]=":"; // could be ":; "
char findyourself_debug=0;
int findyourself_initialized=0;
void findyourself_init(char *argv0)
{
getcwd(findyourself_save_pwd, sizeof(findyourself_save_pwd));
strncpy(findyourself_save_argv0, argv0, sizeof(findyourself_save_argv0));
findyourself_save_argv0[sizeof(findyourself_save_argv0)-1]=0;
strncpy(findyourself_save_path, getenv("PATH"), sizeof(findyourself_save_path));
findyourself_save_path[sizeof(findyourself_save_path)-1]=0;
findyourself_initialized=1;
}
int find_yourself(char *result, size_t size_of_result)
{
char newpath[PATH_MAX+256];
char newpath2[PATH_MAX+256];
assert(findyourself_initialized);
result[0]=0;
if(findyourself_save_argv0[0]==findyourself_path_separator) {
if(findyourself_debug) printf(" absolute path\n");
realpath(findyourself_save_argv0, newpath);
if(findyourself_debug) printf(" newpath=\"%s\"\n", newpath);
if(!access(newpath, F_OK)) {
strncpy(result, newpath, size_of_result);
result[size_of_result-1]=0;
return(0);
} else {
perror("access failed 1");
}
} else if( strchr(findyourself_save_argv0, findyourself_path_separator )) {
if(findyourself_debug) printf(" relative path to pwd\n");
strncpy(newpath2, findyourself_save_pwd, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
strncat(newpath2, findyourself_path_separator_as_string, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
strncat(newpath2, findyourself_save_argv0, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
realpath(newpath2, newpath);
if(findyourself_debug) printf(" newpath=\"%s\"\n", newpath);
if(!access(newpath, F_OK)) {
strncpy(result, newpath, size_of_result);
result[size_of_result-1]=0;
return(0);
} else {
perror("access failed 2");
}
} else {
if(findyourself_debug) printf(" searching $PATH\n");
char *saveptr;
char *pathitem;
for(pathitem=strtok_r(findyourself_save_path, findyourself_path_list_separator, &saveptr); pathitem; pathitem=strtok_r(NULL, findyourself_path_list_separator, &saveptr) ) {
if(findyourself_debug>=2) printf("pathitem=\"%s\"\n", pathitem);
strncpy(newpath2, pathitem, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
strncat(newpath2, findyourself_path_separator_as_string, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
strncat(newpath2, findyourself_save_argv0, sizeof(newpath2));
newpath2[sizeof(newpath2)-1]=0;
realpath(newpath2, newpath);
if(findyourself_debug) printf(" newpath=\"%s\"\n", newpath);
if(!access(newpath, F_OK)) {
strncpy(result, newpath, size_of_result);
result[size_of_result-1]=0;
return(0);
}
} // end for
perror("access failed 3");
} // end else
// if we get here, we have tried all three methods on argv[0] and still haven't succeeded. Include fallback methods here.
return(1);
}
main(int argc, char **argv)
{
findyourself_init(argv[0]);
char newpath[PATH_MAX];
printf(" argv[0]=\"%s\"\n", argv[0]);
realpath(argv[0], newpath);
if(strcmp(argv[0],newpath)) { printf(" realpath=\"%s\"\n", newpath); }
find_yourself(newpath, sizeof(newpath));
if(1 || strcmp(argv[0],newpath)) { printf(" findyourself=\"%s\"\n", newpath); }
sleep(1); /* in case run from desktop */
}
And here is the output which demonstrates that in every one of the previous tests it actually did find itself.
tcc -o ~/bin/echoargc ~/src/echoargc.c
cd ~
/home/whitis/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
echoargc
argv[0]="echoargc"
realpath="/home/whitis/echoargc"
findyourself="/home/whitis/bin/echoargc"
bin/echoargc
argv[0]="bin/echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
bin//echoargc
argv[0]="bin//echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
bin/./echoargc
argv[0]="bin/./echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
src/../bin/echoargc
argv[0]="src/../bin/echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
cd ~/bin
*echo*
argv[0]="echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
e?hoargc
argv[0]="echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
./echoargc
argv[0]="./echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
cd ~/src
../bin/echoargc
argv[0]="../bin/echoargc"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
cd ~/junk
~/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
~whitis/bin/echoargc
argv[0]="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
alias echoit=~/bin/echoargc
echoit
argv[0]="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
echoarg=~/bin/echoargc
$echoarg
argv[0]="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
rm junk1 junk2 junk3
ln -s ~/bin/echoargc junk1
./junk1
argv[0]="./junk1"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
ln -s /home/whitis/bin/echoargc junk2
./junk2
argv[0]="./junk2"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
ln -s junk1 junk3
./junk3
argv[0]="./junk3"
realpath="/home/whitis/bin/echoargc"
findyourself="/home/whitis/bin/echoargc"
The two GUI launches described above also correctly find the program.
There is one potential pitfall. The access() function drops permissions if the program is setuid before testing. If there is a situation where the program can be found as an elevated user but not as a regular user, then there might be a situation where these tests would fail, although it is unlikely the program could actually be executed under those circumstances. One could use euidaccess() instead. It is possible, however, that it might find an inaccessable program earlier on path than the actual user could.

The whereami library by Gregory Pakosz implements this for a variety of platforms, using the APIs mentioned in mark4o's post. This is most interesting if you "just" need a solution that works for a portable project and are not interested in the peculiarities of the various platforms.
At the time of writing, supported platforms are:
Windows
Linux
Mac
iOS
Android
QNX Neutrino
FreeBSD
NetBSD
DragonFly BSD
SunOS
The library consists of whereami.c and whereami.h and is licensed under MIT and WTFPL2. Drop the files into your project, include the header and use it:
#include "whereami.h"
int main() {
int length = wai_getExecutablePath(NULL, 0, NULL);
char* path = (char*)malloc(length + 1);
wai_getExecutablePath(path, length, &dirname_length);
path[length] = '\0';
printf("My path: %s", path);
free(path);
return 0;
}

An alternative on Linux to using either /proc/self/exe or argv[0] is using the information passed by the ELF interpreter, made available by glibc as such:
#include <stdio.h>
#include <sys/auxv.h>
int main(int argc, char **argv)
{
printf("%s\n", (char *)getauxval(AT_EXECFN));
return(0);
}
Note that getauxval is a glibc extension, and to be robust you should check so that it doesn't return NULL (indicating that the ELF interpreter hasn't provided the AT_EXECFN parameter), but I don't think this is ever actually a problem on Linux.

Making this work reliably across platforms requires using #ifdef statements.
The below code finds the executable's path in Windows, Linux, MacOS, Solaris or FreeBSD (although FreeBSD is untested). It uses
Boost 1.55.0 (or later) to simplify the code, but it's easy enough to remove if you want. Just use defines like _MSC_VER and __linux as the OS and compiler require.
#include <string>
#include <boost/predef/os.h>
#if (BOOST_OS_WINDOWS)
# include <stdlib.h>
#elif (BOOST_OS_SOLARIS)
# include <stdlib.h>
# include <limits.h>
#elif (BOOST_OS_LINUX)
# include <unistd.h>
# include <limits.h>
#elif (BOOST_OS_MACOS)
# include <mach-o/dyld.h>
#elif (BOOST_OS_BSD_FREE)
# include <sys/types.h>
# include <sys/sysctl.h>
#endif
/*
* Returns the full path to the currently running executable,
* or an empty string in case of failure.
*/
std::string getExecutablePath() {
#if (BOOST_OS_WINDOWS)
char *exePath;
if (_get_pgmptr(&exePath) != 0)
exePath = "";
#elif (BOOST_OS_SOLARIS)
char exePath[PATH_MAX];
if (realpath(getexecname(), exePath) == NULL)
exePath[0] = '\0';
#elif (BOOST_OS_LINUX)
char exePath[PATH_MAX];
ssize_t len = ::readlink("/proc/self/exe", exePath, sizeof(exePath));
if (len == -1 || len == sizeof(exePath))
len = 0;
exePath[len] = '\0';
#elif (BOOST_OS_MACOS)
char exePath[PATH_MAX];
uint32_t len = sizeof(exePath);
if (_NSGetExecutablePath(exePath, &len) != 0) {
exePath[0] = '\0'; // buffer too small (!)
} else {
// resolve symlinks, ., .. if possible
char *canonicalPath = realpath(exePath, NULL);
if (canonicalPath != NULL) {
strncpy(exePath,canonicalPath,len);
free(canonicalPath);
}
}
#elif (BOOST_OS_BSD_FREE)
char exePath[2048];
int mib[4]; mib[0] = CTL_KERN; mib[1] = KERN_PROC; mib[2] = KERN_PROC_PATHNAME; mib[3] = -1;
size_t len = sizeof(exePath);
if (sysctl(mib, 4, exePath, &len, NULL, 0) != 0)
exePath[0] = '\0';
#endif
return std::string(exePath);
}
The above version returns full paths including the executable name. If instead you want the path without the executable name, #include boost/filesystem.hpp> and change the return statement to:
return strlen(exePath)>0 ? boost::filesystem::path(exePath).remove_filename().make_preferred().string() : std::string();

If you ever had to support, say, Mac
OS X, which doesn't have /proc/, what
would you have done? Use #ifdefs to
isolate the platform-specific code
(NSBundle, for example)?
Yes, isolating platform-specific code with #ifdefs is the conventional way this is done.
Another approach would be to have a have clean #ifdef-less header which contains function declarations and put the implementations in platform specific source files.
For example, check out how POCO (Portable Components) C++ library does something similar for their Environment class.

Depending on the version of QNX Neutrino, there are different ways to find the full path and name of the executable file that was used to start the running process. I denote the process identifier as <PID>. Try the following:
If the file /proc/self/exefile exists, then its contents are the requested information.
If the file /proc/<PID>/exefile exists, then its contents are the requested information.
If the file /proc/self/as exists, then:
open() the file.
Allocate a buffer of, at least, sizeof(procfs_debuginfo) + _POSIX_PATH_MAX.
Give that buffer as input to devctl(fd, DCMD_PROC_MAPDEBUG_BASE,....
Cast the buffer to a procfs_debuginfo*.
The requested information is at the path field of the procfs_debuginfo structure. Warning: For some reason, sometimes, QNX omits the first slash / of the file path. Prepend that / when needed.
Clean up (close the file, free the buffer, etc.).
Try the procedure in 3. with the file /proc/<PID>/as.
Try dladdr(dlsym(RTLD_DEFAULT, "main"), &dlinfo) where dlinfo is a Dl_info structure whose dli_fname might contain the requested information.
I hope this helps.

In addition to mark4o's answer, FreeBSD also has
const char* getprogname(void)
It should also be available in macOS. It's available in GNU/Linux through libbsd.

AFAIK, no such way. And there is also an ambiguity: what would you like to get as the answer if the same executable has multiple hard-links "pointing" to it? (Hard-links don't actually "point", they are the same file, just at another place in the file system hierarchy.)
Once execve() successfully executes a new binary, all information about the arguments to the original program is lost.

Well, of course, this doesn't apply to all projects.
Still, QCoreApplication::applicationFilePath() never failed me in 6 years of C++/Qt development.
Of course, one should read documentation thoroughly before attempting to use it:
Warning: On Linux, this function will try to get the path from the
/proc file system. If that fails, it assumes that argv[0] contains the
absolute file name of the executable. The function also assumes that
the current directory has not been changed by the application.
To be honest, I think that #ifdef and other solutions like that shouldn't be used in modern code at all.
I'm sure that smaller cross-platform libraries also exist. Let them encapsulate all that platform-specific stuff inside.

If you're writing GPLed code and using GNU autotools, then a portable way that takes care of the details on many OSes (including Windows and macOS) is gnulib's relocatable-prog module.

You can use argv[0] and analyze the PATH environment variable.
Look at : A sample of a program that can find itself

Just my two cents. You can find the current application's directory in C/C++ with cross-platform interfaces by using this code.
void getExecutablePath(char ** path, unsigned int * pathLength)
{
// Early exit when invalid out-parameters are passed
if (!checkStringOutParameter(path, pathLength))
{
return;
}
#if defined SYSTEM_LINUX
// Preallocate PATH_MAX (e.g., 4096) characters and hope the executable path isn't longer (including null byte)
char exePath[PATH_MAX];
// Return written bytes, indicating if memory was sufficient
int len = readlink("/proc/self/exe", exePath, PATH_MAX);
if (len <= 0 || len == PATH_MAX) // memory not sufficient or general error occured
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
copyToStringOutParameter(exePath, len, path, pathLength);
#elif defined SYSTEM_WINDOWS
// Preallocate MAX_PATH (e.g., 4095) characters and hope the executable path isn't longer (including null byte)
char exePath[MAX_PATH];
// Return written bytes, indicating if memory was sufficient
unsigned int len = GetModuleFileNameA(GetModuleHandleA(0x0), exePath, MAX_PATH);
if (len == 0) // memory not sufficient or general error occured
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
copyToStringOutParameter(exePath, len, path, pathLength);
#elif defined SYSTEM_SOLARIS
// Preallocate PATH_MAX (e.g., 4096) characters and hope the executable path isn't longer (including null byte)
char exePath[PATH_MAX];
// Convert executable path to canonical path, return null pointer on error
if (realpath(getexecname(), exePath) == 0x0)
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
unsigned int len = strlen(exePath);
copyToStringOutParameter(exePath, len, path, pathLength);
#elif defined SYSTEM_DARWIN
// Preallocate PATH_MAX (e.g., 4096) characters and hope the executable path isn't longer (including null byte)
char exePath[PATH_MAX];
unsigned int len = (unsigned int)PATH_MAX;
// Obtain executable path to canonical path, return zero on success
if (_NSGetExecutablePath(exePath, &len) == 0)
{
// Convert executable path to canonical path, return null pointer on error
char * realPath = realpath(exePath, 0x0);
if (realPath == 0x0)
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
unsigned int len = strlen(realPath);
copyToStringOutParameter(realPath, len, path, pathLength);
free(realPath);
}
else // len is initialized with the required number of bytes (including zero byte)
{
char * intermediatePath = (char *)malloc(sizeof(char) * len);
// Convert executable path to canonical path, return null pointer on error
if (_NSGetExecutablePath(intermediatePath, &len) != 0)
{
free(intermediatePath);
invalidateStringOutParameter(path, pathLength);
return;
}
char * realPath = realpath(intermediatePath, 0x0);
free(intermediatePath);
// Check if conversion to canonical path succeeded
if (realPath == 0x0)
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
unsigned int len = strlen(realPath);
copyToStringOutParameter(realPath, len, path, pathLength);
free(realPath);
}
#elif defined SYSTEM_FREEBSD
// Preallocate characters and hope the executable path isn't longer (including null byte)
char exePath[2048];
unsigned int len = 2048;
int mib[] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };
// Obtain executable path by syscall
if (sysctl(mib, 4, exePath, &len, 0x0, 0) != 0)
{
invalidateStringOutParameter(path, pathLength);
return;
}
// Copy contents to caller, create caller ownership
copyToStringOutParameter(exePath, len, path, pathLength);
#else
// If no OS could be detected ... degrade gracefully
invalidateStringOutParameter(path, pathLength);
#endif
}
You can take a look in detail here.

But I'd like to know if there is a convenient way to find the current application's directory in C/C++ with cross-platform interfaces.
You cannot do that (at least on Linux)
Since an executable could, during execution of a process running it, rename(2) its file path to a different directory (of the same file system). See also syscalls(2) and inode(7).
On Linux, an executable could even (in principle) remove(3) itself by calling unlink(2). The Linux kernel should then keep the file allocated till no process references it anymore. With proc(5) you could do weird things (e.g. rename(2) that /proc/self/exe file, etc...)
In other words, on Linux, the notion of a "current application's directory" does not make any sense.
Read also Advanced Linux Programming and Operating Systems: Three Easy Pieces for more.
Look also on OSDEV for several open source operating systems (including FreeBSD or GNU Hurd). Several of them provide an interface (API) close to POSIX ones.
Consider using (with permission) cross-platform C++ frameworks like Qt or POCO, perhaps contributing to them by porting them to your favorite OS.

Related

Is there any difference between FILENAME_MAX and PATH_MAX in C?

What gets me really confused is some programmers refer to the filename as what I refer to the path, eg. /Users/example/Desktop/file.txt. What I call the file name would just be file.txt, but apparently some people refer to the path /Users/example/Desktop/file.txt as the filename. This gets me really confused. That being said, are macros FILENAME_MAX and PATH_MAX the same?
FILENAME_MAX is defined in stdio.h or something so it's cross platform, but am I giving my self extra work doing this?
#ifdef __linux__
# include <linux/limits.h>
#elif defined(_WIN32)
# include <windows.h>
# define PATH_MAX MAX_PATH
#else
# include <sys/syslimits.h>
#endif
When just #include <stdio.h> and using FILENAME_MAX is enough? To create path buffers large enough to hold any file path?
tl;dr Don't trust either of them.
A "path" is an absolute path like /home/you/foo.txt or a relative path like you/foo.txt or even foo.txt.
A "filename" is poorly defined. It could be just the "basename" foo.txt or it could be a synonym for "path". The C standard seems to use "filename" to mean "path". For example, fopen takes a filename which is really a path.
FILENAME_MAX is standard C, PATH_MAX is POSIX, MAX_PATH is Windows.
FILENAME_MAX is a C standard constant...
which expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened.
PATH_MAX is not part of the C standard. It is a POSIX constant...
Maximum number of bytes in a pathname, including the terminating null character.
If the path is too long, POSIX functions should give an ENAMETOOLONG error, but not all compilers enforce this.
MAX_PATH is a Windows API constant...
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string"
For example, C standard fopen uses FILENAME_MAX. POSIX open uses PATH_MAX.
But probably don't trust them.
FILENAME_MAX is not safe to use to allocate memory because it might be INT_MAX. The GCC documentation warns...
Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on GNU/Hurd systems.
Usage Note: Don’t use FILENAME_MAX as the size of an array in which to
store a file name! You can’t possibly make an array that big! Use
dynamic allocation (see Memory Allocation) instead.
PATH_MAX may not be defined.
Because they are hard coded by the operating system, neither can be trusted to be a true representation of what the file system can handle. For example, my Mac defines both PATH_MAX and FILENAME_MAX as 1024. A FAT32 filesystem has a limit of 255 characters. If I mount a FAT32 filesystem on my Mac, PATH_MAX and FILENAME_MAX will be incorrect.
Evan Klitzke describes the problems with PATH_MAX and by extension FILENAME_MAX.
If a function requires you to allocate a buffer to store a path, there is often an alternative which will allocate memory for you, or which will take a max size. If you're left with no choice, 1024 or 4096 are good choices.
It is often (quite arbitrarily) set so PATH_MAX equals 4096 and NAME_MAX (FILENAME_MAX?) would be 255, as mentioned here.
This should be considered legacy in practice due to possible some file names could end up being longer than PATH_MAX / FILENAME_MAX due to different ways path name lengths, path name concatenation and path name expansion could be performed (not to mention differences in string encodings).
In addition, sometimes the defined PATH_MAX value has little or nothing to do with actual limits (if any) imposed by the OS.
For a long time (maybe still), Windows systems defined PATH_MAX as 260, while the OS supported path names up to 32Kb or more (there are some details here).
These aren't comparable at all. The *nix constant PATH_MAX is the largest path you can pass to a system call on *nix operating systems but longer paths can exist because system calls take relative paths, while the Windows constant MAX_PATH was the largest possible absolute path that could occur on Windows. In fact MAX_PATH has been revoked by the ABI but you need to opt-in to using longer paths.
PATH_MAX is normally a fixed constant (for posix systems) that specify the maximum length of the path parameter you pass to the kernel in a system call (e.g. open(2) syscall)
But the maximum path length a file can have, is not affected by this constant, as you can test with this simple shell scritp:
$ i=0
> while [ "$i" -lt 2000 ]
> do mkdir "$i"
> cd "$i"
> i=`expr "$i" + 1`
> done
$ pwd
/home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999
Now, if you try:
$ cd `pwd`
-bash: cd: /home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999: File name too long
Showing you that the problem is when bash tries to do the chdir(2) system call. (there's a little routine below that shows you how to open a file whose name is longer than the PATH_MAX constant)
You will end in a directory that has many more characters in it's name than the PATH_MAX constant.
That doesn't mean you cannot access that file from the root directory, but you have to do it
changing your curren directory (as the script does)
using the openat(2) and friends to use a closer opend directory as start point to access it.
The absolute number of path elements a file can have to the root node is limited only by the number of inodes in the filesystem, and the total capacity of it.
Edit
When just #include <stdio.h> and using FILENAME_MAX is enough? To create path buffers large enough to hold any file path?
The only way to support any file length, for opening a file in a POSIX operating system and be portable at the same time, is to divide your path in short enough chunks of path, and do n-1 chdir(2) to the actual place you have your files on. Then call your open(2) system call with the last chunk, and then return to the directory you where, if that's possible, with the fchdir(2) system call (if your system has it). In case you have the openat(2) system call, you can just open the directories (with the opendir() call) closing the previous (as you only use it to open the next dir) and to get close enough to the final directory to be able to open it with an openat(2) system call.
Below is a (work in progress) myfopen() call that tries to open a file longer thatn the PATH_MAX limit, from <limits.h>:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "myfopen.h"
FILE *my_fopen(const char *filename, const char *mode)
{
if (strlen(filename) > PATH_MAX) {
char *work_name = strdup(filename);
if (!work_name) {
return NULL; /* cannot malloc */
}
char *to_free = work_name; /* to free at end */
int dir_fd = -1;
while(strlen(work_name) >= PATH_MAX) {
char *p = work_name + PATH_MAX - 1;
while(*p != '/') p--;
/* p points to the previous '/' to the PATH_MAX limit */
*p++ = '\0';
if (dir_fd < 0) {
dir_fd = open(work_name, 0);
if (dir_fd < 0) {
ERR("open: %s: %s\n", work_name,
strerror(errno));
free(to_free);
return NULL;
}
} else {
int aux_fd = openat(dir_fd, work_name, 0);
close(dir_fd);
if (aux_fd < 0) {
ERR("openat: %s: %s\n", work_name,
strerror(errno));
free(to_free);
return NULL;
}
dir_fd = aux_fd;
}
work_name = p;
}
/* strlen(work_name) < PATH_MAX,
* work_name points to the last chunk and
* dir_fd is the directory to base the real fopen
*/
int fd = openat(
dir_fd,
work_name,
O_RDONLY); /* this needs to be
* adjusted, based on what is
* specified in string mode */
close(dir_fd);
if (fd < 0) {
fprintf(stderr, "openat: %s: %s\n",
work_name, strerror(errno));
free(to_free);
return NULL;
}
free(to_free);
return fdopen(fd, mode);
}
return fopen(filename, mode);
}
(you will need to work the appropiate set of bit masks to pass to the final openat() call, in order to comply with the different ways of specifying the open mode of the fopen() vs. open() calls)
The basic problem with the maximum file name length (which is, why the kernel designers don't support an unbounded buffer to hold the full name of the file) is a security based one. If you allow a user to pass a 20Gb long filename into kernel space, probably you'll run out kernel memory space and that cannot be permitted (there should be a strong weakness in the system, as a malicious user could block the whole kernel, just passing a very long filename)
In the normal case, I have never had to deal with files longer than 1024 bytes, except for demostrations of this specific problem, so IMHO you should accept that limit, and procure to use short filenames (shorter than 1024 is a good limit)
On other side, you are mentioning many constants here:
FILENAME_MAX is used by stdio system, but its value is to be used only with stdio routines. Its documentation states that:
This macro constant expands to an integral expression corresponding to the size needed for an array of char elements to hold the longest file name string allowed by the library. Or, if the library imposes no such restriction, it is set to the recommended size for character arrays intended to hold a file name.
This means that it's a secure length you can tie to in order to be able to call fopen(3) and pass it a working filename.
PATH_MAX is a secure value that you can use on your system probably. If you try to open files (with the open(2) system call) or to erase (unlink(2)), rename, etc. Probably you'll run in trouble if you try to do so with filenames longer than this.
The limitation on the file path length comes from a limitation imposed by the system call subsystem, and not from the filesystem involved on the operation, so normally the limitation will be for all files in the system, and not for a specific mounted filesystems (which can be further limited, or not)
In my opinion, the limits are well thought, and using the values published will be ok for the majority of cases. And no tool in the operating system allows you to open a file with a path longer than PATH_MAX.

How to append "ifconfig" in a file .txt in C language?

I'm trying to get my own ip addres with C.
The idea is to get the output of "ifconfig", put it in a .txt file and extract the inet and the inet6 values.
I stack trying to write the ifconfig output in the .txt file:
#include <stdio.h>
#include <stdlib.h>
#include <string>
int main ()
{
char command[1000];
strcpy(command, "ifconfig");
system(command);
FILE *fp = fopen("textFile.txt", "ab+");
//... how to write the output of 'system(command)' in the textFile.txt?
fclose(fp);
//... how to scraping a file .text in C ???
return 0;
}
Thank you for any help and suggestion,
Stefano
You actually want to use system calls to accomplish this - rather than running the ifconfig command.
There's a similar question here for Linux: using C code to get same info as ifconfig
(Since ifconfig is a Linux command, I'm assuming you're asking about Linux).
The general gist was to use the ioctl() system call.
Otherwise, you'll be forking your process to split it into two, creating a pipe from the output of the child to the input of the parent, and calling exec on the child in order to replace it with "ifconfig". Then, you'll have to parse the string if you want to get anything useful out of it.
In Linux, use man 7 netdevice describes the interface you can use, as Anish Goyal already answered.
This is very simple and robust, and uses very little resources (since it is just a few syscalls). Unfortunately, it is Linux specific, and makes the code nonportable.
It is possible to do this portably. I describe the portable option here, because the Linux-specific one is rather trivial. Although the approach is quite complicated for just obtaining the local host IP addresses, the pattern is surprisingly often useful, because it can hide system-specific quirks, while easily allowing system administrators to customize the behaviour.
The idea for a portable solution is that you use a small helper program or shell script to obtain the information, and have it output the information in some easy-to-parse format to your main program. If your application is named yourapp, it is common to install such helpers in /usr/lib/yourapp/, say /usr/lib/yourapp/local-ip-addresses.
Personally, I'd recommend using a shell script (so that system admins can trivially edit the helpers if they need to customize the behaviour), and an output format where each interface is on its own line, fields separated by spaces, perhaps
inet interface-name ipv4-address [ hostname ]*
inet6 interface-name ipv6-address [ hostname ]*
i.e. first token specifies the address family, second token the interface name, third the address, optionally followed by the hostnames or aliases corresponding to that address.
As to the helper program/shell script itself, there are two basic approaches:
One-shot
For example, parsing LANG=C LC_ALL=C ip address output in Linux.
The program/script will exit after the addresses have been printed.
Continuous
The program/script will print the ip address information, but instead of exiting, it will run as long as the pipe stays open, providing updates if interfaces are taken down or come up.
In Linux, a program/script could use DBUS or NetworkManager to wait for such events; it would not need to poll (that is, repeatedly check the command output).
A shell script has the extra benefit that it can support multiple distributions, even operating systems (across POSIX systems at least), at the same time. Such scripts often have a similar outline:
#!/bin/sh
export LANG=C LC_ALL=C
case "$(uname -s)" in
Linux)
if [ -x /bin/ip ]; then
/bin/ip -o address | awk \
'/^[0-9]*:/ {
addr = $4
sub(/\/.*$/, "", addr)
printf "%s %s %s\n", $3, $2, addr
}'
exit 0
elif [ -x /sbin/ifconfig ]; then
/sbin/ifconfig | awk \
'BEGIN {
RS = "[\t\v\f\r ]*\n"
FS = "[\t\v\f ]+"
}
/^[0-9A-Za-z]/ {
iface = $1
}
/^[\t\v\f ]/ {
if (length(iface) > 0)
for (i = 1; i < NF-1; i++)
if ($i == "inet") {
addr = $(i+1)
sub(/^addr:/, "", addr)
printf "inet %s %s\n", iface, addr
} else
if ($i == "inet6") {
addr = $(i+2)
sub(/\/.*$/, "", addr)
printf "inet6 %s %s\n", iface, addr
}
}'
exit 0
fi
;;
# Other systems?
esac
printf 'Cannot determine local IP addresses!\n'
exit 1
The script sets the locale to C/POSIX, so that the output of external commands will be in the default locale (in English and so on). uname -s provides the kernel name (Linux for Linux), and further checks can be done using e.g. the [ shell command.
I only implemented the scriptlet for Linux, because that's the machine I'm on right now. (Both ip and ifconfig alternatives work on my machine, and provide the same output -- although you cannot expect to get the interfaces in any specific order.)
The situations where a sysadmin might need to edit this particular helper script includes as-yet-unsupported systems, systems with new core tools, and systems that have interfaces that should be excluded from normal interface lists (say, those connected to a internal sub-network that are reserved for privileged purposes like DNS, file servers, backups, LDAP, and so on).
In the C application, you simply execute the external program or shell script using popen("/usr/lib/yourapp/local-ip-addresses", "r"), which provides you a FILE * that you can read as if it was a file (except you cannot seek or rewind it). After you have read everything from the pipe, pclose() the handle:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
/* ... */
FILE *in;
char *line_ptr = NULL;
size_t line_size = 0;
ssize_t line_len;
int status;
in = popen("/usr/lib/myapp/local-ip-addresses", "r");
if (!in) {
fprintf(stderr, "Cannot determine local IP addresses: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
while (1) {
line_len = getline(&line_ptr, &line_size, in);
if (line_len < 1)
break;
/* Parse line_ptr */
}
free(line_ptr);
line_ptr = NULL;
line_size = 0;
if (ferror(in) || !feof(in)) {
pclose(in);
fprintf(stderr, "Read error while obtaining local IP addresses.\n");
exit(EXIT_FAILURE);
}
status = pclose(in);
if (!WIFEXITED(status) || WEXITSTATUS(status)) {
fprintf(stderr, "Helper utility /usr/lib/myapp/local-ip-addresses failed to obtain local IP addresses.\n");
exit(EXIT_FAILURE);
}
I omitted the parsing code, because there are so many alternatives -- strtok(), sscanf(), or even a custom function that splits the line into tokens (and populates an array of pointers) -- and my own preferred option is the least popular one (last one, a custom function).
While the code needed for just this task is almost not worth the while, applications that use this approach tend to use several of such helpers. Then, of course, it makes sense to choose the piped data format in a way that allows easy parsing, but supports all use cases. The amortized cost is then much easier to accept, too.

Transfer files in C

How do I transfer files from one folder to another, where both folders are present in oracle home directory?
int main(int argc, char *argv[]){
char *home, *tmp2;
home = getenv("ORACLE_HOME");
temp2 = getenv("ORACLE_HOME");
strcat (home,"A");
strcat (tmp2,"B");
//transfer files from home to tmp2
}
strcat doesn't seem to work. Here, I see tmp2 pointer doesn't get updated correctly.
Edit: OS is a UNIX based machine. Code edited.
I require a binary file which does this copying, with the intention that the real code cannot be viewed. Hence I didn't consider using shell script as an option. The files in A are encrypted and then copied to B, decrypted in B and run. As the files are in perl, I intend to use system command to run them in the same C code.
Using the system(3) command is probably a good idea since you get the convenience of a shell interpreter to expand filenames (via *) but avoids the hassle of computing the exact length of buffer needed to print the command by using a fixed length buffer and ensuring it cannot overflow:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFSZ 0xFFF
int main(void)
{
char * ohome = getenv("ORACLE_HOME"), cmd[BUFSZ];
char * fmt="/bin/mv %s/%s/* %s/%s";
int written = snprintf(cmd, BUFSZ, fmt, ohome, "A", ohome, "B"), ret;
if ((written < 0) || (written >= (BUFSZ-1))) {
/* ERROR: print error or ORACLE_HOME env var too long for BUFSZ. */
}
if ((ret = system(cmd)) == 0) {
/* OK, move succeeded. */
}
return 0;
}
As commenter Paul Kuliniewicz points out, unexpected results may ensue if your ORACLE_HOME contains spaces or other special characters which may be interpreted by the subshell in the "system" command. Using one of the execl or execv family will let you build the arguments without worrying about the shell interpreter doing it's own interpretation but at the expense of using wildcards.
First of all as pointed out before, this "security" of yours is completely useless. It is trivial to intercept the files being copied (there are plenty of tools to monitor file system changes and such), but that is another story.
This is how you could do it, for the first part. To do the actual copying, you'd have to either use system() or read the whole file and then write it again, which is kind of long for this kind of quick copy.
int main(int argc, char *argv[]){
char *home, *tmp2;
home = strdup(getenv("ORACLE_HOME"));
tmp2 = strdup(getenv("ORACLE_HOME"));
home = realloc(home, strlen(home)+strlen("A")+1);
tmp2 = realloc(tmp2, strlen(tmp2)+strlen("B")+1);
strcat (home,"A");
strcat (tmp2,"B");
}
By the way, if you could stand just moving the file, it would be much easier, you could just do:
rename(home,tmp2);
Not realted to what you are asking, but a comment on your code:
You probably won't be able to strcat to the results of a getenv, because getenv might (in some environments) return a pointer to read-only memory. Instead, make a new buffer and strcpy the results of the getenv into it, and then strcat the rest of the file name.
The quick-n-dirty way to do the transferring is to use the cp shell command to do the copying, but invoke it using the system command instead of using a shell script.
Or, have your C program create a shell script to do the copying, run the shell script, and then delete it.

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argu-
ment must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
There is no official API to do this on OpenBSD, though with some very convoluted workarounds, it is still possible with the following code, note you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to have a race condition, though still very unlikely due to how performant this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C and most of the code logic will be the same. If I have time I'll try to rewrite this in C hopefully soon. Like macOS, this solution gets a hardlink at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the break in the while loop and return a vector if you want don't care about being cross-platform and want to get all the hard links. DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened any process, by plugging in a process id, and not be limited to just the calling one, while also getting all hard links potentially, and not being limited to a random one, see this answer. It should be a lot more performant that traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straight-forward and to-the-point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process, however it should work for most Unix-likes out-of-the-box, with all the same concerns as the former solution in regards to hard links and race conditions, although performs slightly faster due to less if-then, for-loops, etc:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution which is also limited to the calling process, but should be somewhat more performant, you could wrap all your calls to fopen() and open() with a helper function which stores basically whatever C equivalent there is to an std::unordered_map, and pair up the file descriptor with the absolute path version of what is passed to your fopen()/open() wrappers (and the Windows-only equivalents which won't work on UWP like _wopen_s() and all that nonsense to support UTF-8), which can be done with realpath() on Unix-likes, or GetFullPathNameW() (*W for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't near as commonly used on Windows), and realpath() / GetFullPathNameW() will convert your existing file you opened from a relative path, if it is one, to an absolute path. With the file descriptor and absolute path stored an a C equivalent to a std::unordered_map (which you likely will have to write yourself using malloc()'d and eventually free()'d int and c-string arrays), this will again, be faster than any other solution that does a dynamic search of your filesystem, but it has a different and unappealing limitation, which is it will not make note of files which were moved around on your filesystem, however at least you can check whether the file was deleted using your own code to test existence, it also won't make note of the file in whether it was replaced since the time you opened it and stored the path to the descriptor in memory, thus giving you outdated results potentially. Let me know if you would like to see a code example of this, though due to files changing location I do not recommend this solution.
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

Path to binary in C

How can I get the path where the binary that is executing resides in a C program?
I'm looking for something similar to __FILE__ in ruby/perl/PHP (but of course, the __FILE__ macro in C is determined at compile time).
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
Totally non-portable Linux solution:
#include <stdio.h>
#include <unistd.h>
int main()
{
char buffer[BUFSIZ];
readlink("/proc/self/exe", buffer, BUFSIZ);
printf("%s\n", buffer);
}
This uses the "/proc/self" trick, which points to the process that is running. That way it saves faffing about looking up the PID. Error handling left as an exercise to the wary.
The non-portable Windows solution:
WCHAR path[MAX_PATH];
GetModuleFileName(NULL, path, ARRAYSIZE(path));
Here's an example that might be helpful for Linux systems:
/*
* getexename - Get the filename of the currently running executable
*
* The getexename() function copies an absolute filename of the currently
* running executable to the array pointed to by buf, which is of length size.
*
* If the filename would require a buffer longer than size elements, NULL is
* returned, and errno is set to ERANGE; an application should check for this
* error, and allocate a larger buffer if necessary.
*
* Return value:
* NULL on failure, with errno set accordingly, and buf on success. The
* contents of the array pointed to by buf is undefined on error.
*
* Notes:
* This function is tested on Linux only. It relies on information supplied by
* the /proc file system.
* The returned filename points to the final executable loaded by the execve()
* system call. In the case of scripts, the filename points to the script
* handler, not to the script.
* The filename returned points to the actual exectuable and not a symlink.
*
*/
char* getexename(char* buf, size_t size)
{
char linkname[64]; /* /proc/<pid>/exe */
pid_t pid;
int ret;
/* Get our PID and build the name of the link in /proc */
pid = getpid();
if (snprintf(linkname, sizeof(linkname), "/proc/%i/exe", pid) < 0)
{
/* This should only happen on large word systems. I'm not sure
what the proper response is here.
Since it really is an assert-like condition, aborting the
program seems to be in order. */
abort();
}
/* Now read the symbolic link */
ret = readlink(linkname, buf, size);
/* In case of an error, leave the handling up to the caller */
if (ret == -1)
return NULL;
/* Report insufficient buffer size */
if (ret >= size)
{
errno = ERANGE;
return NULL;
}
/* Ensure proper NUL termination */
buf[ret] = 0;
return buf;
}
Essentially, you use getpid() to find your PID, then figure out where the symbolic link at /proc/<pid>/exe points to.
A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that calls the function with its original arguments.
#!/bin/sh
$0.exe "$#"
The redirection through a shell script means that the real program gets an argv[0] that's actually useful instead of one that may live in the $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a problem that was language-independent.
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
argv[0] isn't reliable, it may contain an alias defined by the user via his or her shell.
Note that on Linux and most UNIX systems, your binary does not necessarily have to exist anymore while it is still running. Also, the binary could have been replaced. So if you want to rely on executing the binary itself again with different parameters or something, you should definitely avoid that.
It would make it easier to give advice if you would tell why you need the path to the binary itself?
Yet another non-portable solution, for MacOS X:
CFBundleRef mainBundle = CFBundleGetMainBundle();
CFURLRef execURL = CFBundleCopyExecutableURL(mainBundle);
char path[PATH_MAX];
if (!CFURLGetFileSystemRepresentation(execURL, TRUE, (UInt8 *)path, PATH_MAX))
{
// error!
}
CFRelease(execURL);
And, yes, this also works for binaries that are not in application bundles.
Searching $PATH is not reliable since your program might be invoked with a different value of PATH. e.g.
$ /usr/bin/env | grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
$ PATH=/tmp /usr/bin/env | grep PATH
PATH=/tmp
Note that if I run a program like this, argv[0] is worse than useless:
#include <unistd.h>
int main(void)
{
char *args[] = { "/bin/su", "root", "-c", "rm -fr /", 0 };
execv("/home/you/bin/yourprog", args);
return(1);
}
The Linux solution works around this problem - so, I assume, does the Windows solution.

Resources