POSIX function to search PATH for an executable?

Is there a POSIX function that searches PATH for an executable according to the POSIX spec's description of the PATH environment variable and returns the absolute path to the executable?
If not, is there a simple, safe, standard, and reliable way to search PATH?
Edit:
glibc's execvpe() function does its own PATH search, so I'm guessing there isn't a specific PATH search function defined by the standard.
Edit 2: I don't want to copy someone else's code or implement the PATH search myself for a few reasons:
DRY
More code I have to test and maintain
Possible licensing issues
POSIX says, "If PATH is unset or is set to null, the path search is implementation-defined." I would like the behavior in these cases to be consistent with whatever the system does, but I can't do this if there's not a standard function I can call.

Is there a POSIX function that searches PATH for an executable according to the POSIX spec's description of the PATH environment variable and returns the absolute path to the executable?
No.
If not, is there a simple, safe, standard, and reliable way to search PATH?
Yes and no. Yes, there is a standard for the format of PATH, from which the correctness and reliability of implementations follow.
No, there is no standard function that does this. Copying code is your best bet.
If PATH is unset or is set to null, the path search is implementation-defined.
That means you can't always portably replicate what execvp does, but searching /bin:/usr/bin is a pretty safe bet. Alternatively, just raise an error in this case.
(I admit that it would have been nice if POSIX had had this function, but it just isn't there.)

The command-line tool which will do that; see its man page and its source.

What about doing something like:
FILE *f = popen("command -v somecommand", "r");
and then read its output? This would result in behavior that matches the system's handling of empty/unset PATH, and it may be simpler than manually searching PATH.
There are some drawbacks to this approach:
If somecommand comes from the user, it may have to be sanitized to prevent code injection attacks. That would add to the complexity.
Reliably reading the stream while handling all possible error cases is not trivial. More complexity.
If somecommand is a shell special built-in (e.g., set), it'll return bogus results. This case should be detectable, but then what? More complexity.

Related

How to deal with bash shell expansions and fopen()

I am writing a program to make copies of files in Linux. The program takes two arguments:
a char *source which is the path to the source file that needs copying
a char *dest which is the path to the destination where a copy of source will be made
I am using the fopen(3) function to open files.
So far I have noticed fopen() does not recognize common bash shell expansions such as:
~ for the current user's home directory
To handle this I could use switch/if statements, check whether the source path (char *source) contains any of these characters, and act accordingly. This solution would mean I would have to construct an absolute path (a path from /, the root directory, for example /usr/dir2/books/maths.pdf). Ideally I would like a solution that works on any path from whatever working directory the user is in.
My question is: Is there a better way to handle this? How could I handle the paths in a portable and efficient manner?
So far I have noticed fopen() does not recognize common bash shell expansions
Indeed it does not. Shell expansions such as tilde substitution and globbing are performed by the shell. If you want them to be performed by your program too, then you need to implement that yourself. On the flip side, you do not need to worry about quoting characters that would otherwise be significant to the shell, because you don't get word splitting etc. except if and as you implement it.
Is there a better way to handle this? How could I handle the paths in a portable and efficient manner?
A pretty good way to handle it would be to let somebody else handle it. In particular, if the filenames in question are specified as command-line arguments, then expansions will be handled (or not) by the shell, under control of the user. In that case, users will not expect your program to perform additional expansions. If the file names come from another source (config file, file chooser) then it is reasonable to expect the full path or possibly a correct relative path to be given.
If you nevertheless want to handle bash-style expansion of a bare tilde in particular, then that's not so hard. It applies only to tildes appearing as the first character of a file name, and it expands those to the value of environment variable HOME. You can read environment variables with getenv(), and other than that you just need some relatively simple string manipulation.
More general tilde expansion, where ~username appearing at the beginning of a filename is expanded to the home directory of user username, would require you to consult the user (a.k.a. password) database. On a POSIX system, a natural way to do that would be via the getpwnam() function.
Note well that different shells exhibit differences in the expansions they perform and the conditions under which they perform them. You cannot emulate all shells at the same time, so leaving expansions an external concern is both the most portable and the most efficient option.
Consider using the POSIX wordexp() function (or maybe the POSIX glob() instead) to convert an input string to a list of files, based on shell-style expansion.
See:
man wordexp — https://man7.org/linux/man-pages/man3/wordexp.3.html
man glob — https://man7.org/linux/man-pages/man3/glob.3.html
Some implementations fork a shell to perform the expansion, so it may not be very efficient, but it does the job.

Determine which binary will run via execlp in advance

Edit #1
The "possible duplicates" so far are not duplicates. They test for the existence of $FILE in $PATH rather than providing the full path to the first valid result, and the top answer uses bash commands, not pure C.
Original Question
Of all the exec family functions, there are a few which do $PATH lookups rather than requiring an absolute path to the binary to execute.
From man exec:
The execlp(), execvp(), and execvpe() functions duplicate the actions of the shell in searching for an executable file if the specified filename does not contain a slash (/) character. The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable. If this variable isn't defined, the path list defaults to the current directory followed by the list of directories returned by confstr(_CS_PATH). (This confstr(3) call typically returns the value "/bin:/usr/bin".)
Is there a simple, straightforward way to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable and append the binary name to each one? I would like to use a "de facto standard" approach to estimating the binary to be run, rather than rewriting a task that has likely already been implemented many times over.
I realize that this won't be a guarantee, since someone could potentially invalidate this check via a buggy script, TOCTOU attacks, etc. I just need a decent approximation for testing purposes.
Thank you.
Is there a simple, straightforward way, to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable
No, you need to iterate through $PATH (i.e. getenv("PATH") in C code). Some (non-standard) libraries provide a way to do that, but it is so simple that you should not bother. You could use strchr(3) to find the next occurrence of the colon :, so coding that loop is really simple. As Jonathan Leffler commented, there are subtleties (e.g. permissions, dangling symbolic links, some other process adding a new executable to a directory mentioned in your $PATH), but most programs ignore them.
And what really matters is the value of PATH just before running execvp. In practice, it is the value of PATH when your program started (because outside processes cannot change it). You just need to be sure that your program doesn't change PATH, which is very likely (the corner case, and a difficult one, would be some other thread of the same process changing the PATH environment variable with putenv(3) or setenv(3)).
In practice the PATH won't change (unless you have some code explicitly changing it). Even if you use proprietary libraries and don't have time to check their source code, you can expect PATH to stay the same in practice during execution of your process.
If you need something more precise, and assuming you use the execlp()/execvp() family on program names which are compile-time constants (or at least constant after your program's initialization, e.g. after reading some configuration files), you could do what many shells do: "cache" the result of searching PATH in some hash table, and call execve on that. Still, you cannot avoid the issue of some other process adding or removing files in directories mentioned in your PATH; but most programs don't care (and are written with the implicit assumption that this doesn't happen, or that your program is notified of it: look at the rehash builtin of zsh as an example).
But you always need to check for failure of the exec functions (including execlp(3) and execve(2)) and of fork. They can fail for many reasons, even if PATH has not changed and the directories and files mentioned in it have not been changed.

Safe cross-platform function to get normalized path

I'd like to have a standard function that will convert relative paths into absolute ones, and if possible I'd like to make it as cross-platform as possible (so I'd like to avoid calling external library functions). This is intended so it's possible to prevent path exploitations.
I am aware that such a function wouldn't be able to detect symbolic links, but I'm ok with that for my application.
I could roll my own code, but there might be some problems with e.g. how a platform handles encoding or variations of the "../" pattern.
Is there something like that already implemented?
There's not a single, universal function you can call, since there's no such function in the C or C++ standard libraries. On Windows, you can use GetFullPathName. On Linux, Mac OS X, and other Unix-based systems, you can use the realpath(3) function, which as a bonus also resolves symbolic links along the way.
Beware: Any solution to this is only reliable in a single-threaded program. If you're using multiple threads, another can go out and change the working directory out from under you unexpectedly, changing the path name resolution.
I think the closest you're going to get to platform independence is the POSIX libraries. In particular you'll wanna check out unistd.h, which unfortunately I don't believe has a 'normalized' path concept. If I remember correctly, the standard itself doesn't even know much about directories, much less relative ones.
To get better than that I think you'll need to roll your own path goodies.

following symbolic links in C

I'm looking to write a C program which, given the name of symbolic link, will print the name of the file or directory the link points to. Any suggestions on how to start?
The readlink() function that has been mentioned is part of the answer. However, you should be aware of its horrid interface (it does not null terminate the response string!).
You might also want to look at the realpath() function, the use of which was discussed in SO 1563186. You could also look at the code for 'linkpath' at the IIUG Software Archive. It analyzes the security of all the directories encountered as a symbolic link is resolved - it uses readlink() and lstat() and stat(); one of the checks when testing the program was to ensure that realpath() resolved the name to the same file.
Make sure that you have an environment which supports POSIX functions, include unistd.h and then use the readlink function.
Depending on the platform, stat() or fstat() are probably the first things to try out. If you're on Linux or Cygwin, then the stat program will give you a reasonable idea of what to expect from the system API call (it pretty much gives you a text dump of it).
The system call you want is readlink(). Takes a path to the link, returns the string (not always a valid path in the filesystem!) stored in the link. Check the man page ("man 2 readlink") for details.
Note there is some ambiguity to your question. You might be asking for how to tell the "real" path in the filesystem, which is a little more complicated.

How to get the absolute path of a file programmatically without realpath() under Linux?

I know it is possible to get the absolute path of a file with the realpath() function. However, according to the BUGS section of the man page, there are some problems in its implementation. The details are as follows:
BUGS
Avoid using this function. It is broken by design since (unless using the non-standard resolved_path == NULL feature) it is impossible to determine a suitable size for the output buffer, resolved_path. According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX need not be a defined constant, and may have to be obtained using pathconf(3). And asking pathconf(3) does not really help, since on the one hand POSIX warns that the result of pathconf(3) may be huge and unsuitable for mallocing memory. And on the other hand pathconf(3) may return -1 to signify that PATH_MAX is not bounded.
The libc4 and libc5 implementation contains a buffer overflow (fixed in libc-5.4.13). Thus, set-user-ID programs like mount(8) need a private version.
So, the question is what is the best practice to get the absolute path of a file?
I know this question is old, but I don't see any answers that address the core issue: The man page OP referenced is wrong and outdated, for at least two reasons.
One is that POSIX 2008 added/mandated support for the NULL argument option, whereby realpath allocates the string for you. Programs using this feature will be portable to all relevant versions of GNU/Linux, probably most other modern systems, and anything conforming to POSIX 2008.
The second reason the man page is wrong is the admonition against PATH_MAX. This is purely GNU religious ideology against "arbitrary limits". In the real world, not having a pathname length limit would add all sorts of avenues for abuse/DoS, would add lots of failure cases to tasks that otherwise could not fail, and would break more interfaces than just realpath.
If you care about maximum portability, it's probably best to use a mix of both methods. See the POSIX documentation for details:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html
I would use a fixed-size, caller-provided buffer if PATH_MAX is defined, and otherwise pass NULL. This seems to cover all cases, but you might also want to check older versions of POSIX to see if they have any guidelines for what to do if PATH_MAX is not defined.
Use getcwd() and readlink(), which let you specify a buffer size, to reimplement realpath(). Note that you have to resolve symbolic links, "." and ".." from left to right to do it correctly.
From the shell, I can get a full path using readlink -f $FILE. There's a readlink() function in glibc, maybe that'll help you.
# man 2 readlink
