Why first arg to execve() must be path to executable - c

I understand that execve() and family require the first argument of its argument array to be the same as the executable that is also pointed to by its first argument. That is, in this:
execve(prog, args, env);
args[0] will usually be the same as prog. But I can't seem to find information as to why this is.
I also understand that executables (er, at least shell scripts) always have their calling path as the first argument when running, but I would think that the shell would do the work to put it there, and execve() would just call the executable using the path given in its first argument ("prog" from above), then passing the argument array ("args" from above) as one would on the command line.... i.e., I don't call scripts on the command line with a duplicate executable path in the args list....
/bin/ls /bin/ls /home/john
Can someone explain?

There is no requirement that the first of the arguments bear any relation to the name of the executable:
#include <unistd.h>

int main(void)
{
    char *args[3] = { "rip van winkle", "30", 0 };
    execv("/bin/sleep", args);
    return 1;
}
Try it - on a Mac (after three tests):
make x; ./x & sleep 1; ps
The output on the third run was:
MiniMac JL: make x; ./x & sleep 1; ps
make: `x' is up to date.
[3] 5557
PID TTY TIME CMD
5532 ttys000 0:00.04 -bash
5549 ttys000 0:00.00 rip van winkle 30
5553 ttys000 0:00.00 rip van winkle 30
5557 ttys000 0:00.00 rip van winkle 30
MiniMac JL:
EBM comments:
Yeah, and this makes it even more weird. In my test bash script (the target of the execve), I don't see the value of what execve has in arg[0] anywhere -- not in the environment, and not as $0.
Revising the experiment - a script called 'bash.script':
#!/bin/bash
echo "bash script at sleep (0: $0; *: $*)"
sleep 30
And a revised program:
#include <unistd.h>

int main(void)
{
    char *args[3] = { "rip van winkle", "30", 0 };
    execv("./bash.script", args);
    return 1;
}
This yields the ps output:
bash script at sleep (0: ./bash.script; *: 30)
PID TTY TIME CMD
7804 ttys000 0:00.11 -bash
7829 ttys000 0:00.00 /bin/bash ./bash.script 30
7832 ttys000 0:00.00 sleep 30
There are two possibilities as I see it:
The kernel juggles the command line when executing the script via the shebang ('#!/bin/bash') line, or
Bash itself dinks with its argument list.
How to establish the difference? I suppose copying the shell to an alternative name, and then using that alternative name in the shebang would tell us something:
$ cp /bin/bash jiminy.cricket
$ sed "s%/bin/bash%$PWD/jiminy.cricket%" bash.script > tmp
$ mv tmp bash.script
$ chmod +x bash.script
$ ./x & sleep 1; ps
[1] 7851
bash script at sleep (0: ./bash.script; *: 30)
PID TTY TIME CMD
7804 ttys000 0:00.12 -bash
7851 ttys000 0:00.01 /Users/jleffler/tmp/soq/jiminy.cricket ./bash.script 30
7854 ttys000 0:00.00 sleep 30
$
This, I think, indicates that the kernel rewrites argv[0] when the shebang mechanism is used.
Addressing the comment by nategoose:
MiniMac JL: pwd
/Users/jleffler/tmp/soq
MiniMac JL: cat al.c
#include <stdio.h>
int main(int argc, char **argv)
{
while (*argv)
puts(*argv++);
return 0;
}
MiniMac JL: make al.c
cc al.c -o al
MiniMac JL: ./al a b 'c d' e
./al
a
b
c d
e
MiniMac JL: cat bash.script
#!/Users/jleffler/tmp/soq/al
echo "bash script at sleep (0: $0; *: $*)"
sleep 30
MiniMac JL: ./x
/Users/jleffler/tmp/soq/al
./bash.script
30
MiniMac JL:
That shows that it is the shebang '#!/path/to/program' mechanism, rather than any program such as Bash, that adjusts the value of argv[0]. When a binary is executed directly, argv[0] is passed through unchanged. When a script is executed via the shebang, the kernel rebuilds the argument list: argv[0] is the interpreter named on the shebang line; if there is an argument after the interpreter, it becomes argv[1]; the next argument is the name of the script file; and any remaining arguments from the execv() or equivalent call follow. The argv[0] supplied by the caller is dropped.
MiniMac JL: cat bash.script
#!/Users/jleffler/tmp/soq/al -arg0
#!/bin/bash
#!/Users/jleffler/tmp/soq/jiminy.cricket
echo "bash script at sleep (0: $0; *: $*)"
sleep 30
MiniMac JL: ./x
/Users/jleffler/tmp/soq/al
-arg0
./bash.script
30
MiniMac JL:

According to this, the first argument being the program name is a custom.
by custom, the first element should be
the name of the executed program (for
example, the last component of path)
That said, these values could be different. For example, if the program was launched through a symbolic link, the name of the program might differ from the name of the link used to launch it.
And, you are right. The shell would normally do the work of setting up the first argument. In this case however, the use of execve circumvents the shell altogether - which is why you need to set it up yourself.

It allows you to specify the exact path to the executable to be loaded, but also allows for a "beautified" name to be presented in tools such as ps or top.
execl("/bin/ls", "ls", "/home/john", (char *)0);

That allows a program to have many names and to work slightly differently depending on the name it was called by.
Imagine a trivial program, e.g. print0.c compiled into print0:
#include <stdio.h>
int main(int argc, char **argv)
{
printf("%s\n",argv[0]);
return 0;
}
Running it as ./print0 would print ./print0. Make a symbolic link to it, e.g. print1, and run it as ./print1: it would print ./print1.
That was with a symlink, but with the exec*() functions you can tell the program its name explicitly.
It is an artifact from *NIX, but nice to have nevertheless.

Related

Why doesn't the execve command in C on macOS allow the 'which' command to work?

Why does the execve command in C on macOS not allow the 'which' command to work? It works on non-Mac devices.
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
int fd;
char cmd[] = "/bin/cat";
char cmd1[] = "/usr/bin/which";
char *s[]={"which","ls",NULL};
if (execve(cmd1, s, NULL) == -1)
perror("oops ur wrong!!");
}
Expected output
 clang-7 -pthread -lm -o main main.c
 ./main
/bin/ls

but on a Mac, it returns nothing.
macOS
The code works. It doesn't work well, but it does work.
Given the null PATH in the environment (because you've used execve() and provided NULL as the environment), /usr/bin/which can't find ls — it has nowhere to look for it because PATH is not set.
On my machine (a MacBook Pro running macOS Big Sur 11.7.1 — it's a work machine and the company IT is behind the times), /usr/bin/which is a universal binary with two architectures. If I run /usr/bin/which ozymandias on the command line, there is no output (I don't have a command ozymandias anywhere), but the exit status is 1 (failure). That's an odd implementation — not reporting an error — but it works within its limits.
You can see this effect with:
$ (unset PATH; /usr/bin/which ls)
$ echo $?
1
$
If you use execv() instead of execve() and remove the , NULL from the argument list, the output is /bin/ls and the exit status is 0.
Linux
Just for comparison, on a RHEL 7.4 machine, I get different results:
$ which -a which
which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
/usr/bin/alias
/usr/bin/which
/usr/bin/which
$ file /usr/bin/which
/usr/bin/which: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=317ba624d2914607bf9246993446803a977fbc18, stripped
$ /usr/bin/which which
/usr/bin/which
$ (unset PATH; /usr/bin/which which)
/usr/bin/which: no which in ((null))
$ /usr/bin/which ozymandias
/usr/bin/which: no ozymandias in (/work2/jleffler/bin:/u/jleffler/bin:/usr/perl/v5.34.0/bin:/usr/gcc/v12.2.0/bin:/usr/local/bin:/usr/bin:/usr/sbin)
$ /usr/bin/which --help
Usage: /usr/bin/which [options] [--] COMMAND [...]
Write the full path of COMMAND(s) to standard output.
--version, -[vV] Print version and exit successfully.
--help, Print this help and exit successfully.
--skip-dot Skip directories in PATH that start with a dot.
--skip-tilde Skip directories in PATH that start with a tilde.
--show-dot Don't expand a dot to current directory in output.
--show-tilde Output a tilde for HOME directory for non-root.
--tty-only Stop processing options on the right if not on tty.
--all, -a Print all matches in PATH, not just the first
--read-alias, -i Read list of aliases from stdin.
--skip-alias Ignore option --read-alias; don't read stdin.
--read-functions Read shell functions from stdin.
--skip-functions Ignore option --read-functions; don't read stdin.
Recommended use is to write the output of (alias; declare -f) to standard
input, so that which can show aliases and shell functions. See which(1) for
examples.
If the options --read-alias and/or --read-functions are specified then the
output can be a full alias or function definition, optionally followed by
the full path of each command used inside of those.
Report bugs to <which-bugs@gnu.org>.
$
PATH sanitized — radically shortened.
The which command reports an error when it can't find the command. It is a standalone executable on this Linux machine, and the which alias feeds it the aliases so it can report on them. The -a option reports on all the things that could be known as which (the second which in which -a which).
I found that passing envp (the third parameter of main, which holds the environment, including PATH) as the last argument to execve() made it work:
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main(int argc, char *argv[], char *envp[])
{
int fd;
char cmd1[] = "/usr/bin/which";
char *s[] = {"which", "ls", NULL};
if (execve(cmd1, s, envp) == -1)
perror("oops ur wrong!!");
}
thanks anyways

C cannot assume another user's permissions using setresuid

I wrote the following code expecting to spawn a /bin/sh from another user.
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
int main(int argc, char **argv, char **envp)
{
setresgid(getegid(), getegid(), getegid());
setresuid(geteuid(), geteuid(), geteuid());
execve("/bin/sh", argv, envp);
return 0;
}
I then changed the owner to match my target user and changed the permissions (too permissive, I know):
chown usertarget:globalgroup ./shell
chmod 777 ./shell
chmod +s ./shell
ls -lah shell
Everything looks fine to me. However, it keeps opening a shell as my current user, not the target one.
I already tried hardcoding the user ID of my target user and a few other things (the setuid function, ...) but nothing seems to work.
Does anyone have an idea, or anything that could help me investigate this problem?
EDIT #1
baseuser@machine:/tmp/tata$ ls -lah shell2
-rwsrwsrwx 1 targetuser globalgroup 7.2K Aug 18 18:21 shell2
baseuser@machine:/tmp/tata$ id
uid=1507(baseuser) gid=1314(globalgroup) groups=1314(globalgroup),100(users)
baseuser@machine:/tmp/tata$ ls -lah shell2
-rwsrwsrwx 1 targetuser globalgroup 7.2K Aug 18 18:21 shell2
baseuser@machine:/tmp/tata$ ./shell2
====== WELCOME USER ======
baseuser@machine:/tmp/tata$ id -a
uid=1507(baseuser) gid=1314(globalgroup) groups=1314(globalgroup),100(users)
baseuser@machine:/tmp/tata$
As it turns out, the partition was mounted with the nosuid option, so the set-user-ID bit on the executable was ignored. This can be checked with the mount command.

Bash reopen tty on simple program

#include <stdio.h>
#include <stdlib.h>
int main()
{
char buf[512];
fgets(buf, 512, stdin);
system("/bin/sh");
}
Compile with cc main.c
I would like a one-line command that makes this program run ls without it waiting for user input.
# This does not work - it prints nothing
(echo ; echo ls) | ./a.out
# This does work if you type ls manually
(echo ; cat) | ./a.out
I'm wondering:
Why doesn't the first example work?
What command would make the program run ls, without changing the source?
My question is shell and OS-agnostic but I would like it to work at least on bash 4.
Edit:
While testing out the answers, I found out that this works.
(python -c "print ''" ; echo ls) | ./a.out
Using strace:
$ (python -c "print ''" ; echo ls) | strace ./a.out
...
read(0, "\n", 4096)
...
This also works:
(echo ; sleep 0.1; echo ls) | ./a.out
It seems like the buffering is ignored. Is this due to the race condition?
strace shows what's going on:
$ ( echo; echo ls; ) | strace ./foo
[...]
read(0, "\nls\n", 4096) = 4
[...]
clone(child_stack=NULL, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7ffdefc88b9c) = 9680
In other words, your program reads a whole 4096 byte buffer that includes both lines before it runs your shell. It's fine interactively, because by the time the read happens, there's only one line in the pipe buffer, so over-reading is not possible.
You instead need to stop reading after the first \n, and the only way to do that is to read byte by byte until you hit it. I don't know libc well enough to know if this kind of functionality is supported, but here it is with read directly:
#include <unistd.h>
#include <stdlib.h>
int main()
{
char buf[1];
while((read(0, buf, 1)) == 1 && buf[0] != '\n');
system("/bin/sh");
}

Why does using pipes with `who` cause mom not to like me?

In a program I'm writing, I fork() and execl() to determine who mom likes. I noticed that if I set up pipes to write to who's stdin, it produces no output. If I don't set up pipes to write to stdin, then who produces output as normal. (Yes, I know, writing to who's stdin is pointless; it was residual code from executing other processes that made me discover this.)
Investigating this, I wrote this simple program (edit: for a simpler example, just run: true | who mom likes):
$ cat t.c:
#include <unistd.h>
#include <assert.h>
int main()
{
int stdin_pipe[2];
assert( pipe(stdin_pipe) == 0);
assert( dup2(stdin_pipe[0], STDIN_FILENO) != -1);
assert( close(stdin_pipe[0]) == 0);
assert( close(stdin_pipe[1]) == 0);
execl("/usr/bin/who", "/usr/bin/who", "mom", "likes", (char*)NULL);
return 0;
}
Compiling and running results in no output, which is what surprised me initially:
$ cc t.c
$ ./a.out
$
However, if I compile with -DNDEBUG (to remove the piping work in the assert()s) and run, it works:
$ cc -DNDEBUG t.c
$ ./a.out
batman pts/0 2014-08-15 12:57 (:0)
$
As soon as I call dup2(stdin_pipe[0], STDIN_FILENO), who stops producing output. The only explanation I could come up with is that dup2 affects the tty, and who uses the tty to determine who I am (given that the -m flag prints "only hostname and user associated with stdin"). My main question is:
Why can't who mom likes/who am i/who -m determine who I am when I give it a pipe for stdin? What mechanism is it using to determine its information, and why does using a pipe ruin this mechanism? I know it's using stdin somehow, but I don't understand exactly how or exactly why stdin being a pipe matters.
Let's look at the source code for GNU coreutils who:
if (my_line_only)
{
ttyname_b = ttyname (STDIN_FILENO);
if (!ttyname_b)
return;
if (STRNCMP_LIT (ttyname_b, DEV_DIR_WITH_TRAILING_SLASH) == 0)
ttyname_b += DEV_DIR_LEN; /* Discard /dev/ prefix. */
}
When -m (my_line_only) is used, who finds the tty device connected to stdin, and then proceeds to find the entry for that tty in utmp.
When stdin is not a terminal, there is no name to look up in utmp, so it exits without printing anything.

execve("/bin/sh", 0, 0); in a pipe

I have the following example program:
#include <stdio.h>
int
main(int argc, char ** argv){
char buf[100];
printf("Please enter your name: ");
fflush(stdout);
gets(buf);
printf("Hello \"%s\"\n", buf);
execve("/bin/sh", 0, 0);
}
When I run it without any pipe, it works as it should and returns an sh prompt:
bash$ ./a.out
Please enter your name: warning: this program uses gets() which is unsafe.
testName
Hello "testName"
$ exit
bash$
But this does not work in a pipe. I think I know why that is, but I cannot figure out a solution. Example run below.
bash$ echo -e "testName\npwd" | ./a.out
Please enter your name: warning: this program uses gets() which is unsafe.
Hello "testName"
bash$
I figure this has something to do with the fact that gets empties stdin in such a way that /bin/sh receives an EOF and promptly quits without an error message.
But how do I get around this (without modifying the program, if possible, and not removing gets, if not) so that I get a prompt even though I supply input through a pipe?
P.S. I am running this on a FreeBSD (4.8) machine D.S.
You can run your program without any modifications like this:
(echo -e 'testName\n'; cat ) | ./a.out
This way you ensure that your program's standard input doesn't end after what echo outputs. Instead, cat continues to supply input to your program. The source of that subsequent input is your terminal since this is where cat reads from.
Here's an example session:
bash-3.2$ cc stdin_shell.c
bash-3.2$ (echo -e 'testName\n'; cat ) | ./a.out
Please enter your name: warning: this program uses gets(), which is unsafe.
Hello "testName"
pwd
/home/user/stackoverflow/stdin_shell_question
ls -l
total 32
-rwxr-xr-x 1 user group 9024 Dec 14 18:53 a.out
-rw-r--r-- 1 user group 216 Dec 14 18:52 stdin_shell.c
ps -p $$
PID TTY TIME CMD
93759 ttys000 0:00.01 (sh)
exit
bash-3.2$
Note that because shell's standard input is not connected to a terminal, sh thinks it is not executed interactively and hence does not display the prompt. You can type your commands normally, though.
Using execve("/bin/sh", 0, 0); is cruel and unusual punishment for the shell. It gives it no arguments or environment at all - not even its own program name, nor even such mandatory environment variables as PATH or HOME.
Not 100% sure of this (the precise shell being used and the OS might change these answers a bit; note that FreeBSD's /bin/sh is an Almquist-shell derivative, not GNU bash), but
sh may be detecting that its input is not a tty.
or
Your version of sh might also go into non-interactive mode when it is called as plain sh, since it expects login to prepend a - to argv[0] for a login shell. Calling execve("/bin/sh", args, NULL) with char *args[] = { "-sh", NULL } might convince it that it's being run as a login shell.
