Using '__progname' instead of argv[0] - c

In the C / Unix environment I work in, I see some developers using __progname instead of argv[0] for usage messages. Is there some advantage to this? What's the difference between __progname and argv[0]. Is it portable?

__progname isn't standard and therefore not portable, prefer argv[0]. I suppose __progname could lookup a string resource to get the name which isn't dependent on the filename you ran it as. But argv[0] will give you the name they actually ran it as which I would find more useful.

Using __progname allows you to alter the contents of the argv[] array while still maintaining the program name. Some of the common tools such as getopt() modify argv[] as they process the arguments.
For portability, you can strcopy argv[0] into your own progname buffer when your program starts.

There is also a GNU extension for this, so that one can access the program invocation name from outside of main() without saving it manually. One might be better off doing it manually, however; thus making it portable as opposed to relying on the GNU extension. Nevertheless, I here provide an excerpt from the available documentation.
From the on-line GNU C Library manual (accessed today):
"Many programs that don't read input from the terminal are designed to exit if any system call fails. By convention, the error message from such a program should start with the program's name, sans directories. You can find that name in the variable program_invocation_short_name; the full file name is stored the variable program_invocation_name.
Variable: char * program_invocation_name
This variable's value is the name that was used to invoke the program running in the current process. It is the same as argv[0]. Note that this is not necessarily a useful file name; often it contains no directory names.
Variable: char * program_invocation_short_name
This variable's value is the name that was used to invoke the program running in the current process, with directory names removed. (That is to say, it is the same as program_invocation_name minus everything up to the last slash, if any.)
The library initialization code sets up both of these variables before calling main.
Portability Note: These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of argv[0] in main, and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from main."

I see at least two potential problems with argv[0].
First, argv[0] or argv itself may be NULL if execve() caller was evil or careless enough. Calling execve("foobar", NULL, NULL) is usually an easy and fun way to prove an over confident programmer his code is not sig11-proof.
It must also be noted that argv will not be defined outside of main() while __progname is usually defined as a global variable you can use from within your usage() function or even before main() is called (like non standard GCC constructors).

It's a BSDism, and definitely not portable.

__progname is just argv[0], and examples in other replies here show the weaknesses of using it. Although not portable either, I'm using readlink on /proc/self/exe (Linux, Android), and reading the contents of /proc/self/exefile (QNX).

If your program was run using, for instance, a symbolic link, argv[0] will contain the name of that link.
I'm guessing that __progname will contain the name of the actual program file.
In any case, argv[0] is defined by the C standard. __progname is not.

Related

fopen() function with a dynamic location in C

I just want to learn that how can I open a file with fopen() function from a dynamic location. I mean, for example it will be a system file and in another computer, this file can be in another location. So if I will set my location in my code not dynamically, my program will not work in another computer. So how Can I set the location dynamically for my program will find this file wherever it is?
You can (and often should) pass program arguments to your main, thru the conventional int argc, char**argv formal arguments of your main. See also this.
(I am focusing on Linux, but you could adapt my answer to other OSes and platforms)
So you would use some convention to pass that file path (not a location, that word usually refers to memory addresses) to your program (often thru the command line starting your program). See also this answer.
You could use (at least on Linux) getopt_long(3) to parse program arguments. But there are other ways, and you can process the arguments of main explicitly.
You could also use some environment variable to pass that information. You'll query it with getenv(3). Read also environ(7).
Many programs have configuration files (whose path is wired into the program but often can be given by program arguments or by environment variables) and are parsing them to find relevant file paths.
And you could even consider some other inter-process communication to pass a file path to your program. After all, a file path is just some string (with restrictions and interpretations explained in path_resolution(7)). There are many ways to pass some data to a program.
Read also about globbing, notably glob(7). On Unix, the shell is expanding the program arguments. You may want to use functions like glob(3) or wordexp(3) on something obtained elsewhere (e.g. in some configuration file) to get similar expansion.
BTW, be sure, when using fopen, to check against its failure. You'll probably use perror like here.
Look also into the source code of several free software projects (perhaps on github) for inspiration.
I would suggest you to use the environment variables, In a PC set your file location as environment variable. then read the environment variable value in your program, then open the file. This idea works both in linux and windows however you have adopt the code based on the OS to read the environment variables.
Besides specifying file location at runtime through command line arguments, environment variables or configuration files, you can implement a PATH-like logic:
Possible locations for your file are set in an environment variable:
export MY_FILE_PATH=/usr/bin:/bin:/opt/bin:$HOME/bin
Your program reads that environment variable, parses its contents and checks existence of file in each specified path, with fopen() return status.

How to hide the system call passed in the system() function from htop

Consider this C snippet:
snprintf(buf, sizeof(buf), "<LONG PROCESS WITH PARAMETERS HAVING SENSITIVE INFO>";
system(buf);
Now on compiling and executing this, the "sensitive" parameters of the process can be seen on programs like htop
And I don't want that.
I would like to know if there's a way to hide everything passed in system() such that htop will only show the name of the compiled executable (i.e htop just displays a.out all the time)
In all the Unix-like systems I've used, including many Linux variants, it's possible for a program to overwrite it's command-line arguments "from inside". So in C we might use, for example, strcnpy() just to blank the values of argv[1], argv[2], etc. Of course, you need to have processed or copied these arguments first, and you need to be careful not to overwrite memory outside the specific limits of each argv.
I don't think anything about Unix guarantees the portability or continued applicability of this approach, but I have been using it for at least twenty years. It conceals the command from casual uses of ps, etc., and also from /proc/NN/cmdline, but it won't stop the shell storing the command line somewhere (e.g., in a shell history file). So it only prevents casual snooping.
A better approach is not to get into the situation in the first place -- have the program take its input from files (which could be encrypted), or environment variables, or use certificates. Or almost anything, in fact, except the command line.

Determine which binary will run via execlp in advance

Edit #1
The "Possible duplicates" so far are not duplicates. They test for the existence of $FILE in $PATH, rather than providing the full path to the first valid result; and the top answer uses bash command line commands, not pure c.
Original Question
Of all the exec family functions, there are a few which do $PATH lookups rather than requiring an absolute path to the binary to execute.
From man exec:
The execlp(), execvp(), and execvpe() functions duplicate the actions
of the shell in searching for an executable file if the specified
filename does not contain a slash
(/) character. The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable. If
this variable isn't defined, the path
list defaults to the current directory followed by the list of directories returned by confstr(_CS_PATH). (This confstr(3)
call typically returns the value
"/bin:/usr/bin".)
Is there a simple, straightforward way, to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable, and appending the binary name to the end of the path? I would like to use a "de facto standard" approach to estimating the binary to be run, rather than re-writing a task that has likely already been implemented several times over in the past.
I realize that this won't be a guarantee, since someone could potentially invalidate this check via a buggy script, TOCTOU attacks, etc. I just need a decent approximation for testing purposes.
Thank you.
Is there a simple, straightforward way, to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable
No, you need to iterate thru $PATH (i.e. getenv("PATH") in C code). Some (non standard) libraries provide a way to do that, but it is really so simple that you should not bother. You could use strchr(3) to find the "next" occurrence of colon :, so coding that loop is really simple. As Jonathan Leffler commented, they are subtleties (e.g. permissions, hanging symbolic links, some other process adding some new executable to a directory mentionned in your $PATH) but most programs ignore them.
And what is really relevant is the PATH value before running execvp. In practice, it is the value of PATH when starting your program (because outside processes cannot change it). You just need to be sure that your program don't change PATH which is very likely (the corner case, and difficult one, would be some other thread -of the same process- changing the PATH environment variable with putenv(3) or setenv(3)).
In practice the PATH won't change (unless you have some code explicitly changing it). Even if you use proprietary libraries and don't have time to check their source code, you can expect PATH to stay the same in practice during execution of your process.
If you need some more precise thing, and assuming you use execp functions on program names which are compile time constants, or at least constant after your program initialization reading some configuration files, you could do what many shells are doing: "caching" the result of searching the PATH into some hash table, and using execve on that. Still, you cannot avoid the issue of some other process adding or removing files into directories mentioned in your PATH; but most programs don't care (and are written with the implicit hypothesis that this don't happen, or is notified to your program: look at the rehash builtin of zsh as an example).
But you always need to test against failure of exec (including execlp(3) & execve(2)) and fork functions. They could fail for many reasons, even if the PATH has not changed and directories and files mentioned in it have not been changed.

how to catch calls with LD_PRELOAD when unknown programs may be calling execve without passing environment

I know how to intercept system calls with LD_PRELOAD, that occur in compiled programs I may not have source for. For example, if I want to know about the calls to int fsync(int) of some unknown program foobar, I compile a wrapper
int fsync(int)
for
(int (*) (int))dlsym(RTLD_NEXT,"fsync");
into a shared library and then I can set the environment variable LD_PRELOAD to that and run foobar. Assuming that foobar is dynamically linked, which most programs are, I will know about the calls to fsync.
But now suppose there is another unknown program foobar1 and in the source of that program was a statement like this:
execve("foobar", NULL, NULL)
that is, the environment was not passed. Now the whole LD_PRELOAD scheme breaks down?
I checked by compiling the statemet above into foobar1, when that is run, the calls from foobar are not reported.
While one can safely assume most modern programs are dynamically linked, one cannot at all assume how they may or may not be using execve?
So then, the whole LD_PRELOAD scheme, which everybody says is such a great thing, is not really working unless you have the source to the programs concerned, in which case you can check the calls to execve and edit them if necessary. But in that case, there is no need for LD_PRELOAD, if you have sources to everything. LD_PRELOAD is specifically, supposed to be, useful when you don't have sources to the programs you are inspecting.
Where am I wrong here - how can people say, that LD_PRELOAD is useful for inspecting what unknown programs are doing??
I guess I could also write a wrapper for execve. In the wrapper, I add to the original envp argument, one more string: "LD_PRELOAD=my library" . This "seems" to work, I checked on simple examples.
I am not sure if I should be posting an "answer" which may very easily exceed my level of C experience.
Can somebody more experienced than me comment if this is really going to work in the long run?

How argv[0] works

I know that argv[0] represents the executable file name, but I don't understand how it is implemented — how it gets the filename and options at the source code level. At first I thought it was dependent on built-in functions in linux, but then found out that windows also supports it, leading me to believe that it may be done by the compiler?
It's actually part of the C99 standard, hence the same implementation across compilers and operating systems. From 5.1.2.2.1 Program startup (page 12):
If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
Edit: Following up on Waleed Khan's comment, you can retrieve these values via:
Linux - /proc/self/cmdline
OSX - _NSGetArgc / _NSGetArgv or [NSProcessInfo arguments]
Windows - GetCommandLine() with CommandLineToArgvW()
when the binary is executed, glibc calls the function __libc_start_main, which passes the ball to the system call execve where argv/argc are pushed to the stack.
the kernel parses the stack to populate argv for you.. so if you're interested in modifying or understanding the parsing part, you should look into the kernel execve code, if you follow it in lxr you'll get to this line, which I believe is what you are looking for:
http://lxr.linux.no/linux+v3.0/fs/exec.c#L1541
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/exec.c#l1376
Search for sys_execve() ,read the kernel code,you can find it.

Resources