openat() recognized /dev/stdout as a directory

Let's have a look at this demonstration program:
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void) {
    int fd = openat(AT_FDCWD, "/dev/stdout", O_RDONLY|O_PATH|O_DIRECTORY);
    if (fd == -1) {
        perror("openat");
        return 1;
    }
    return 0;
}
I ran ltrace on this program; here is the important part.
When compiled and run on a 3.10.0-693 Linux kernel, the output is:
__libc_start_main([ "./a.out" ] <unfinished ...>
openat(0xffffff9c, 0x400670, 0x210000, 0x4005e0) = 3
+++ exited (status 0) +++
When compiled and run on a 4.14.15-1 Linux kernel, the output is:
__libc_start_main(0x40059d, 1, 0x7ffcaccd9e98, 0x4005e0 <unfinished ...>
openat(0xffffff9c, 0x400670, 0x210000, 0x4005e0) = -1
perror("openat"openat: Not a directory
) = <void>
+++ exited (status 1) +++
So it seems that openat() recognized /dev/stdout as a directory on the 3.10.0-693 Linux kernel. From a logical point of view this is quite unexpected.
I could not find this documented anywhere, and it feels a bit like an edge case. Can anyone explain what exactly is going on here?
Edit: running strace on the program on kernel 3.10.0-693 produces:
mprotect(0x7f568f480000, 4096, PROT_READ) = 0
munmap(0x7f568f473000, 47716) = 0
openat(AT_FDCWD, "/dev/stdout", O_RDONLY|O_PATH|O_DIRECTORY) = 3
exit_group(0) = ?
+++ exited with 0 +++
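To check what the path resolves to independently of how the kernel handles O_DIRECTORY, one option is to stat() it and inspect the file type. The following is only a minimal diagnostic sketch; the helper name resolves_to_directory is hypothetical and not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path resolves to a directory, 0 if it resolves to
 * something else, and -1 if stat() fails. stat() follows symbolic
 * links, so the result describes the final target of the path. */
static int resolves_to_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Calling resolves_to_directory("/dev/stdout") on both kernels would show whether the two systems disagree about the target itself or only about the O_PATH|O_DIRECTORY check.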

Related

How to intercept execve() via exec* wrapper calls using LD_PRELOAD?

I am trying to intercept execve() via execl(). Here's my wrapper call (built as a shared library, libexec.so).
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
static int (*real_exec)(const char*, char *const [], char *const []) = 0;
static void __attribute__((constructor)) init(void) {
    real_exec = (int (*)(const char*, char *const [], char *const []))dlsym(RTLD_NEXT, "execve");
}
int execve(const char* arg, char *const argv[], char *const envp[]) {
    printf("In wrapped execve\n");
    return (*real_exec)(arg, argv, envp);
}
and here's my program that execs.
//run.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main() {
    int pid;
    pid = fork();
    if (pid == 0) {
        execl("/usr/bin/date", "date", NULL);
    } else {
        wait(NULL);
    }
    return 0;
}
AFAIK, all other exec* calls are wrappers around the execve() system call. I also validated the program above by running it under strace.
> strace -f -e execve ./run
execve("./run", ["./run"], 0x7ffcefad6a28 /* 61 vars */) = 0
strace: Process 1491914 attached
[pid 1491914] execve("/usr/bin/date", ["date"], 0x7fff28393cc8 /* 61 vars */) = 0
Tuesday 16 February 2021 12:52:16 PM UTC
[pid 1491914] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1491914, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
But when I run the program the following way,
> LD_PRELOAD=/home/user/libexec.so ./run
the call is not intercepted, i.e. I don't see "In wrapped execve" printed.
What am I missing here? If I instead call execve() directly in run.c, it works.
Secondly, does LD_PRELOAD also apply to child processes? Are the calls made by the children and descendants also intercepted?
Short answer: with libexec.so as implemented above, only execve() calls made directly by the main program are intercepted. The call that execl() makes to execve() inside libc is not routed to the libexec wrapper.
Long answer:
With LD_PRELOAD=libexec.so, the following calling tree is established: "main" -> "libexec.so" -> "libc.so". Calls from "main" are intercepted by libexec, but calls inside "libc.so" (e.g., execl -> execve) are not intercepted (by default).
The solution is to add a wrapper for execl() to libexec.so, following the same structure as the execve() wrapper already implemented.
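A rough sketch of such an execl() wrapper follows. Because execl() is variadic, it cannot simply forward its arguments to the real execl(); this version collects them into an array and calls execve() instead. The MAX_EXEC_ARGS cap and the message text are assumptions made for the sketch, and it logs to stderr rather than stdout so the message is not lost in an unflushed stdio buffer when the exec succeeds.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

extern char **environ;

#define MAX_EXEC_ARGS 256 /* arbitrary cap chosen for this sketch */

/* Interposed execl(): gathers the variadic arguments into an argv[]
 * array and forwards to execve(). When this lives in libexec.so, the
 * execve() call binds to the execve() wrapper defined there, so that
 * wrapper's message is printed as well. */
int execl(const char *path, const char *arg, ...)
{
    char *argv[MAX_EXEC_ARGS];
    char *next;
    int n = 0;
    va_list ap;

    argv[n++] = (char *)arg;
    va_start(ap, arg);
    do {
        if (n >= MAX_EXEC_ARGS) { /* no room left for another entry */
            va_end(ap);
            errno = E2BIG;
            return -1;
        }
        next = va_arg(ap, char *);
        argv[n++] = next; /* the final NULL terminates argv[] */
    } while (next != NULL);
    va_end(ap);

    fprintf(stderr, "In wrapped execl\n");
    return execve(path, argv, environ);
}
```

The other members of the family (execlp, execle, execv, execvp, and so on) would need their own wrappers in the same way if the program under test uses them.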

can /proc/self/exe be mmap'ed?

Can a process read /proc/self/exe using mmap? This program fails to mmap the file:
$ cat e.c
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
int main()
{
int f=open("/proc/self/exe",O_RDONLY);
char*p=mmap(NULL,0,PROT_READ,0,f,0);
return 0;
}
$ cc e.c -o e
$ strace ./e
[snip]
open("/proc/self/exe", O_RDONLY) = 3
mmap(NULL, 0, PROT_READ, MAP_FILE, 3, 0) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
+++ exited with 0 +++
You are making 2 mistakes here:
Mapped size must be > 0. Zero-size mappings are invalid.
You have to specify whether you want to create a shared (MAP_SHARED) or a private (MAP_PRIVATE) mapping.
The following should work for example:
char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, f, 0);
If you wish to map the full executable, you will have to do a stat() on it first, to retrieve the correct file size and then use that as the second parameter to mmap().
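Mapping the whole file could look roughly like this. It is a sketch, not the only way to do it: the helper name map_whole_file is hypothetical, it uses fstat() on the already open descriptor instead of stat() on the path, and it picks MAP_PRIVATE (MAP_SHARED would work equally well for a read-only mapping).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file at path read-only. On success returns the mapping
 * and stores its length in *len; on any failure returns MAP_FAILED. */
static void *map_whole_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return MAP_FAILED;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd); /* zero-size mappings are invalid */
        return MAP_FAILED;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p != MAP_FAILED)
        *len = (size_t)st.st_size;
    return p;
}
```

A caller would pass "/proc/self/exe" as the path and release the mapping with munmap(p, len) when done.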

Overriding execve() with LD_PRELOAD only works sometimes

I want to override the execve() syscall by using LD_PRELOAD and can't figure out why it sometimes works and sometimes doesn't.
Consider this very simple code overriding execve() (I'll keep it complete so you can try it if you like):
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>
typedef int (*execve_func_t)(const char* filename, char* const argv[], char* const envp[]);
static execve_func_t old_execve = NULL;
int execve(const char* filename, char* const argv[], char* const envp[]) {
    printf("Running hook\n");
    old_execve = dlsym(RTLD_NEXT, "execve");
    return old_execve(filename, argv, envp);
}
(compile with: gcc -std=c99 -o exec.so -shared exec.c -Wall -Wfatal-errors -fPIC -g -ldl)
and this very simple test program:
#define _GNU_SOURCE
#include <unistd.h>
#include <dlfcn.h>
int main() {
    char* args[] = {"ls", "/usr", NULL};
    char* envp[] = {"LD_PRELOAD=/path/to/exec.so", NULL};
    execve("/usr/bin/ls", args, envp);
    return 0;
}
Now, when I do export LD_PRELOAD=/path/to/exec.so in my shell, I would expect any binary I run to first execute the hook. That is already not the case, which confuses me. (Edit: OK, this part is clear now. The issue below is still unsolved.)
» strace -f -e trace=execve ./test
execve("./test", ["./test"], [/* 58 vars */]) = 0
Running hook
execve("/usr/bin/ls", ["ls", "/usr"], [/* 1 var */]) = 0
arm-none-eabi avr bin games include lib lib32 lib64 libexec local python sbin share src usr x86_64-pc-linux-gnu
+++ exited with 0 +++
As you see, the hook is only run for the second execve, not for the first.
Still unclear:
What confuses me even more however is that in some cases, the code is not preloaded ever, not even for the child processes; for example, when running ls /usr with Python's subprocess module, this happens:
» strace -f -e trace=execve /usr/bin/python -c "import subprocess; subprocess.Popen(['ls', '/usr'])"
execve("/usr/bin/python", ["/usr/bin/python", "-c", "import subprocess; subprocess.Po"...], [/* 58 vars */]) = 0
strace: Process 8350 attached
[pid 8350] execve("/usr/local/sbin/ls", ["ls", "/usr"], [/* 58 vars */]) = -1 ENOENT (No such file or directory)
[pid 8350] execve("/usr/local/bin/ls", ["ls", "/usr"], [/* 58 vars */]) = -1 ENOENT (No such file or directory)
[pid 8350] execve("/usr/bin/ls", ["ls", "/usr"], [/* 58 vars */]) = 0
arm-none-eabi avr bin games include lib lib32 lib64 libexec local python sbin share src usr x86_64-pc-linux-gnu
[pid 8350] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8350, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
How is that possible? It's the exact same syscall with the exact same environment for the calling process, but it does something different. I'd be happy about any pointers about this.
OK, the solution is actually quite simple: Python calls execv(), not execve(), and whatever the hook prints to standard output is captured by the code that launched the process rather than printed to the terminal. That's why it appears not to work (while it actually does).
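Taking the two points of this answer at face value, one way to make the hook visible in this situation is to interpose execv() as well and to log to stderr instead of stdout. The sketch below is an assumption about how that could be done, not the code the author ended up with; hook_log is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Writes the message straight to file descriptor 2 with write(), which
 * bypasses stdio buffering and is unaffected by a captured stdout. */
static void hook_log(const char *msg)
{
    ssize_t ignored = write(STDERR_FILENO, msg, strlen(msg));
    (void)ignored; /* nothing useful to do if logging fails */
}

/* Interposed execv(): logs, then forwards to execve() with the current
 * environment, which is what execv() is specified to use. */
int execv(const char *filename, char *const argv[])
{
    hook_log("Running execv hook\n");
    return execve(filename, argv, environ);
}
```

If this is added to exec.c, the execve() call binds to the execve() hook already defined there, so that hook runs too before the real execve() is reached.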

Unix C - Portable WEXITSTATUS

I'm trying to get the exit code of a subprocess. On Linux and FreeBSD I can go like so:
[0] [ishpeck#kiyoshi /tmp]$ uname
FreeBSD
[0] [ishpeck#kiyoshi /tmp]$ cat tinker.c
#include <stdio.h>
#include <sys/wait.h>
int main(void)
{
FILE *proc = popen("ls", "r");
printf("Exit code: %d\n", WEXITSTATUS(pclose(proc)));
return 0;
}
[0] [ishpeck#kiyoshi /tmp]$ gcc tinker.c -o tinker
[0] [ishpeck#kiyoshi /tmp]$ ./tinker
Exit code: 0
[0] [ishpeck#kiyoshi /tmp]$ grep WEXITSTATUS /usr/include/sys/wait.h
#define WEXITSTATUS(x) (_W_INT(x) >> 8)
However, on OpenBSD, I get complaints from the compiler...
[0] [ishpeck#ishberk-00 /tmp]$ uname
OpenBSD
[0] [ishpeck#ishberk-00 /tmp]$ cat tinker.c
#include <stdio.h>
#include <sys/wait.h>
int main(void)
{
FILE *proc = popen("ls", "r");
printf("Exit code: %d\n", WEXITSTATUS(pclose(proc)));
return 0;
}
[0] [ishpeck#ishberk-00 /tmp]$ gcc tinker.c -o tinker
tinker.c: In function 'main':
tinker.c:7: error: lvalue required as unary '&' operand
[1] [ishpeck#ishberk-00 /tmp]$ grep WEXITSTATUS /usr/include/sys/wait.h
#define WEXITSTATUS(x) (int)(((unsigned)_W_INT(x) >> 8) & 0xff)
I don't really care how it's done, I just need the exit code.
This leads me to believe that I would also have this problem on Mac:
http://web.archiveorange.com/archive/v/8XiUWJBLMIKYSCRJnZK5#F4GgyRGRAgSCEG1
Is there a more portable way to use the WEXITSTATUS macro? Or is there a more portable alternative?
OpenBSD's implementation of WEXITSTATUS uses the address-of operator (unary &) on its argument, effectively requiring that its argument have storage. You are calling it with the return value of a function, which doesn't have storage, so the compiler complains.
It is unclear whether OpenBSD's WEXITSTATUS is POSIX-compliant, but the problem can be easily worked around by assigning the return value of pclose() to a variable:
int status = pclose(proc);
printf("Exit code: %d\n", WEXITSTATUS(status));
As a detail that some people arriving here might overlook, code built for BSD needs this header:
#include <sys/wait.h>
I too was compiling for both Linux and BSD. WEXITSTATUS worked without that header when compiling for Linux with gcc (I don't know why), but failed when compiling for BSD with clang.
If your application died or was otherwise killed, the return status is bogus. You need to check the status to see if the exit value is even valid. See the man page for waitpid.
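if (WIFEXITED(status)) {
    /* normal exit: WEXITSTATUS(status) is valid */
    printf("Exit code: %d\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
    /* killed by a signal: use WTERMSIG(status) instead */
    printf("Killed by signal %d\n", WTERMSIG(status));
} else {
    /* neither exited nor signaled: should not happen here */
    printf("Unexpected status 0x%x\n", status);
}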
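Putting the answers together, a portable version of the original program might look like the sketch below: the status is stored in a variable before the macros are applied, and WIFEXITED() is checked first. The function name run_and_get_exit_code and the choice to return -1 for every non-normal outcome are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Runs cmd through popen(), discards its output, and returns its exit
 * code (0..255). Returns -1 if the command could not be run or did not
 * exit normally (for example, because a signal killed it). */
static int run_and_get_exit_code(const char *cmd)
{
    char buf[256];
    int status;
    FILE *proc = popen(cmd, "r");

    if (proc == NULL)
        return -1;
    while (fgets(buf, sizeof buf, proc) != NULL) {
        /* drain the pipe so the child is never blocked on a full pipe */
    }
    status = pclose(proc); /* a variable has storage, so the macros can take its address */
    if (status == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

For the question's case, printf("Exit code: %d\n", run_and_get_exit_code("ls")); would replace the single-line WEXITSTATUS(pclose(proc)) call.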

Print custom text into strace. Strace comments

We use strace a lot. We would like to output some text into strace to mark places the code has reached. The way I have seen people do it so far is to stat a non-existent file, where the filename is just the text they want to see in the strace output. It's pretty fast, but I'm sure there is a better way. I worry that a lot of code, and maybe kernel locks, might be hit even though the mount point is bogus. Any ideas?
write() to an out-of-range file descriptor shows up well in strace output, and should be much faster - the range check happens early, and it doesn't need to look at the data at all. (You need to pass the length of the data to be written, rather than just a 0-terminated string, but gcc will normally optimise strlen() of a constant string to a constant.)
$ cat hw.c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#define STRACE_TRACE(str) write(-1, str, strlen(str))
int main(void)
{
STRACE_TRACE("before");
printf("Hello world\n");
STRACE_TRACE("after");
return 0;
}
$ gcc -Wall -o hw hw.c
$ strace ./hw
...
write(-1, "before"..., 6) = -1 EBADF (Bad file descriptor)
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 150), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77da000
write(1, "Hello world\n"..., 12Hello world
) = 12
write(-1, "after"..., 5) = -1 EBADF (Bad file descriptor)
exit_group(0) = ?
$
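If relying on the compiler to fold strlen() is a concern, a variant can take the length from sizeof on the string literal, which is a compile-time constant by definition. This is only a sketch of that idea; the names STRACE_MARK and strace_mark are hypothetical, and the macro only works with string literals, not with char pointers.

```c
#include <errno.h>
#include <unistd.h>

/* Thin wrapper so the result of write() is returned rather than dropped.
 * Writing to descriptor -1 always fails with EBADF, but the call and
 * its buffer still show up in strace output. */
static ssize_t strace_mark(const char *text, size_t len)
{
    return write(-1, text, len);
}

/* sizeof on a string literal counts the trailing '\0', hence the -1. */
#define STRACE_MARK(lit) strace_mark((lit), sizeof(lit) - 1)
```

Usage is the same as in the program above, for example STRACE_MARK("before");. Note that the failed write() sets errno to EBADF, so code that inspects errno afterwards should save and restore it around the marker.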
