IsBadReadPtr analogue on Unix - c

Is there a function analogous to IsBadReadPtr in Unix? At least some functionalities of IsBadReadPtr?
I want to write a procedure that reacts when something bad happens to a process (like SIGSEGV) and recovers some information. But I want to check the pointers to make sure that the data is not corrupt and that they can be accessed safely. Otherwise the crash-handling procedure will itself crash, becoming useless.
Any suggestions?

The usual way to do this on POSIX systems is to use the write() system call. If the memory cannot be read, it will fail with -1 and set errno to EFAULT rather than raising a signal:
#include <fcntl.h>
#include <unistd.h>

int nullfd = open("/dev/random", O_WRONLY);
if (write(nullfd, pointer, size) < 0)
{
    /* Not OK: errno will be EFAULT if the memory cannot be read */
}
close(nullfd);
(/dev/random is a good device to use for this on Linux, because it can be written by any user and will actually try to read the memory given. On OSes without /dev/random or where it isn't writeable, try /dev/null). Another alternative would be an anonymous pipe, but if you want to test a large region you'll need to regularly clear the reading end of the pipe.

How can you do it?
You try to do it and then handle the error.
To do this, first you set up a sigsetjmp and a SIGSEGV signal handler. Then attempt to use the pointer. If it was a bad pointer then the SIGSEGV handler is called and you can jump to safety and report the error to the user.
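A minimal sketch of that approach (the helper names are illustrative, not from the answer); note that longjmp-ing out of a SIGSEGV handler is a well-known trick but is not strictly guaranteed by the standards:

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void segv_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);           /* jump back to the probe */
}

/* Returns 1 if the byte at p could be read, 0 if the access faulted. */
static int probe_read(const volatile char *p)
{
    struct sigaction sa, old_sa;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigaction(SIGSEGV, &sa, &old_sa);

    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*p;                       /* may raise SIGSEGV */
        ok = 1;
    } else {
        ok = 0;                         /* we got here via siglongjmp */
    }

    sigaction(SIGSEGV, &old_sa, NULL);  /* restore the previous handler */
    return ok;
}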

You can never tell "whether a pointer can be accessed safely", on Windows or on Unix. But for some similar information on some Unix platforms, check out cat /proc/self/maps.
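A rough sketch of that idea on Linux, checking an address against the mappings listed in /proc/self/maps (the helper name is illustrative; the line format follows proc(5)):

#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr lies inside a mapping with read permission, else 0. */
static int addr_is_readable(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int readable = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        char perms[5];
        /* Each line starts with "start-end perms ..." */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            readable = (perms[0] == 'r');
            break;
        }
    }
    fclose(f);
    return readable;
}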

I ran into the same issue trying to read a 'pixel' from a framebuffer while running Ubuntu inside VirtualBox. There seemed to be no secure way to check access without crashing or actually hanging gdb. The suggestion made by StasM pointed me towards the following working 'local' solution using fork.
#include <stdbool.h>
#include <sys/wait.h>
#include <unistd.h>

void *some_address;
int pid = fork();
if (pid == 0)
{
    /* Child: try to read a byte; a bad pointer only crashes the child. */
    volatile char c = *(volatile char *)some_address;
    (void)c;
    _exit(123);
}
bool access_ok = true;
int status;
int result = waitpid(pid, &status, 0);
if (result == -1 || WIFEXITED(status) == 0 || WEXITSTATUS(status) != 123)
{
    access_ok = false;
}

Related

How to make a process unable to exit by hook sys_exit_group and sys_kill

I'm working with an Android 8.1 Pixel 2 XL phone.
I have hooked the sys_call_table and have replaced the syscalls with my own functions using the kernel module.
I want to make an application unable to quit.
I'm trying to invalidate an application's sys_exit_group and sys_kill.
What should I do in my own functions?
I want to debug an application, but it turns on anti-debugging, so I want to hook the system calls.
I have tried returning directly, but it didn't work: the system calls sys_kill again, and this time I can't get the application's uid from its pid.
asmlinkage long my_sys_kill(pid_t pid, int sig)
{
    char buff[MAX_PATH] = {0};
    kuid_t uid = current->cred->uid;
    int target_uid = get_uid_from_pid(pid);
    if (target_uid == targetuid)
    {
        printk(KERN_DEBUG "#Tsingxing: kill hooked uid is %d pid is %d, tragetuid is %d, packagename: %s\n", uid.val, pid, target_uid, buff);
        return 0;
    }
    printk(KERN_DEBUG "#Tsingxing: kill called uid is %d, pid is %d, traget_uid is %d\n", uid.val, pid, target_uid);
    return origin_sys_kill(pid, sig);
}

asmlinkage long my_sys_exit_group(int error_code)
{
    char buff[MAX_PATH] = {0};
    kuid_t uid = current->cred->uid;
    long tgid = current->tgid;
    long pid = current->pid;
    int target_uid = get_uid_from_pid(pid);
    if (uid.val == targetuid || target_uid == targetuid)
    {
        printk(KERN_DEBUG "#Tsingxing: exit group hooked, pid is %ld\n", pid);
        return 0;
    }
    return origin_sys_exit_group(error_code);
}
I have solved this problem. I mixed up sys_call_table and compat_sys_call_table: the target application is using compat_sys_call_table, but I was using the __NR_xxx numbers. I solved the problem using the __NR_compat_xxx numbers instead. Just return directly in compat_sys_call_exit_group.
At a very high level, this can't work. When an application calls _Exit (possibly/likely at the end of exit), it has no path to any further code to be run. These functions are normally even marked _Noreturn, meaning that the compiler does not leave the registers/calling stack frame in a meaningful state where resumption of execution could occur. Even if it did, the program itself at the source level is not prepared to continue execution.
If the function somehow returned, the next step would be runaway wrong code execution, likely leading to arbitrary code execution under the control of an attacker if the application had been processing untrusted input of any kind.
In practice, the libc-side implementation of the exit and _Exit functions likely hardens against kernel bugs (yes, what you're asking for is a bug) whereby SYS_exit_group fails to exit. I haven't verified other implementations lately, but I know mine in musl does this, because it's cheap and the alternative is very dangerous.

waitpid() for non-child processes [duplicate]

For child processes, the wait() and waitpid() functions can be used to suspend execution of the current process until a child has exited. But these functions cannot be used for non-child processes.
Is there another function, which can wait for exit of any process ?
Nothing equivalent to wait(). The usual practice is to poll using kill(pid, 0), looking for a return value of -1 with errno set to ESRCH to indicate that the process is gone.
Update: since Linux kernel 5.3 there is a pidfd_open syscall, which creates an fd for a given pid that can be polled to get a notification when the pid has exited.
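For example, a minimal pidfd-based waiter might look like this (a sketch assuming Linux 5.3+ and kernel headers that define SYS_pidfd_open; older glibc versions have no wrapper, hence the raw syscall()):

#define _GNU_SOURCE
#include <poll.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Blocks until the process with the given pid exits. Returns 0 on success. */
int wait_for_exit(pid_t pid)
{
    int pidfd = syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0) {
        perror("pidfd_open");
        return -1;
    }

    /* The pidfd becomes readable when the process terminates. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    if (poll(&pfd, 1, -1) < 0) {
        perror("poll");
        close(pidfd);
        return -1;
    }
    close(pidfd);
    return 0;
}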
On BSDs and OS X, you can use kqueue with EVFILT_PROC+NOTE_EXIT to do exactly that. No polling required. Unfortunately there's no Linux equivalent.
So far I've found three ways to do this on Linux:
Polling: you check for the existence of the process every so often, either by using kill or by testing for the existence of /proc/$pid, as in most of the other answers
Use the ptrace system call to attach to the process like a debugger so you get notified when it exits, as in a3nm's answer
Use the netlink interface to listen for PROC_EVENT_EXIT messages - this way the kernel tells your program every time a process exits and you just wait for the right process ID. I've only seen this described in one place on the internet.
Shameless plug: I'm working on a program (open source of course; GPLv2) that does any of the three.
You could also create a socket or a FIFO and read on them. The FIFO is especially simple: Connect the standard output of your child with the FIFO and read. The read will block until the child exits (for any reason) or until it emits some data. So you'll need a little loop to discard the unwanted text data.
If you have access to the source of the child, open the FIFO for writing when it starts and then simply forget about it. The OS will close the open file descriptor when the child terminates, and your waiting "parent" process will wake up.
Now this might be a process which you didn't start or own. In that case, you can replace the binary executable with a script that starts the real binary but also adds monitoring as explained above.
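A rough sketch of the FIFO idea from the watcher's side (the path is illustrative): open() blocks until some process opens the FIFO for writing, and read() returns 0 once the last writer has closed it, i.e. once the monitored process has exited.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/watch.fifo";   /* illustrative path */
    char buf[256];
    ssize_t n;

    mkfifo(path, 0600);                      /* ignore EEXIST for brevity */

    /* Blocks until some process opens the FIFO for writing. */
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Discard any data; read() returns 0 when the last writer closes. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;

    printf("writer(s) have exited (read returned %zd)\n", n);
    close(fd);
    return 0;
}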
Here is a way to wait for any process (not necessarily a child) in linux to exit (or get killed) without polling:
Using inotify to wait for /proc/'pid' to be deleted would be the perfect solution, but unfortunately inotify does not work with pseudo file systems like /proc.
However we can use it with the executable file of the process.
While the process still exists, this file is being held open.
So we can use inotify with IN_CLOSE_NOWRITE to block until the file is closed.
Of course it can be closed for other reasons (e.g. if another process with the same executable exits) so we have to filter those events by other means.
We can use kill(pid, 0), but that can't guarantee that it is still the same process. If we are really paranoid about this, we can do something else.
Here is a way that should be 100% safe against pid-reuse trouble: we open the pseudo directory /proc/'pid', and keep it open until we are done. If a new process is created in the meantime with the same pid, the directory file descriptor that we hold will still refer to the original one (or become invalid, if the old process ceases to exist), but will NEVER refer to the new process with the reused pid. Then we can check whether the original process still exists by checking, for example, whether the file "cmdline" exists in the directory with openat(). When a process exits or is killed, those pseudo files cease to exist too, so openat() will fail.
Here is some example code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

// return -1 on error, or 0 if everything went well
int wait_for_pid(int pid)
{
    char path[32];
    int in_fd = inotify_init();
    if (in_fd < 0)
        return -1;
    sprintf(path, "/proc/%i/exe", pid);
    if (inotify_add_watch(in_fd, path, IN_CLOSE_NOWRITE) < 0) {
        close(in_fd);
        return -1;
    }
    sprintf(path, "/proc/%i", pid);
    int dir_fd = open(path, O_RDONLY);
    if (dir_fd < 0) {
        close(in_fd);
        return -1;
    }
    int res = 0;
    while (1) {
        struct inotify_event event;
        if (read(in_fd, &event, sizeof(event)) < 0) {
            res = -1;
            break;
        }
        // If /proc/<pid>/fd can no longer be opened, the process is gone.
        int f = openat(dir_fd, "fd", O_RDONLY);
        if (f < 0) break;
        close(f);
    }
    close(dir_fd);
    close(in_fd);
    return res;
}
You could attach to the process with ptrace(2). From the shell, strace -p PID >/dev/null 2>&1 seems to work. This avoids busy-waiting, though it will slow down the traced process, and will not work on all processes (only yours, which is a bit better than only child processes).
None I am aware of. Apart from the solution from chaos, you can use semaphores if you can change the program you want to wait for.
The library functions are sem_open(3), sem_init(3), sem_wait(3), ...
sem_wait(3) performs a wait, so you don't have to do busy waiting as in chaos's solution. Of course, using semaphores makes your programs more complex and it may not be worth the trouble.
Maybe it could be possible to wait for /proc/[pid] or /proc/[pid]/[something] to disappear?
There are poll() and other file event waiting functions, maybe that could help?
Simply poll fields 2 and 22 of /proc/[PID]/stat.
Field 2 contains the name of the executable and field 22 contains the process start time.
If either of them changes, some other process has taken the same (freed) PID, so the method is very reliable.
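A rough sketch of reading those two fields (the helper name is illustrative; per proc(5), field 2 is the command name in parentheses and field 22 is the start time, so the name is parsed up to the last ')' because it may itself contain spaces):

#include <stdio.h>
#include <string.h>

/* Reads the executable name and start time of a pid from /proc/[pid]/stat.
 * Returns 0 on success, -1 on failure (e.g. the process no longer exists). */
int read_stat(int pid, char *comm, size_t comm_len, unsigned long long *starttime)
{
    char path[64], line[1024];
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Field 2 (comm) is enclosed in parentheses and may contain spaces. */
    char *open_paren = strchr(line, '(');
    char *close_paren = strrchr(line, ')');
    if (!open_paren || !close_paren || close_paren < open_paren)
        return -1;
    size_t len = (size_t)(close_paren - open_paren) - 1;
    if (len >= comm_len)
        len = comm_len - 1;
    memcpy(comm, open_paren + 1, len);
    comm[len] = '\0';

    /* Fields 3..22 follow the closing parenthesis; starttime is the 20th of them. */
    char *p = close_paren + 2;          /* skip ") " */
    for (int field = 3; field < 22; field++)
        p = strchr(p, ' ') + 1;         /* sketch: assumes well-formed input */
    sscanf(p, "%llu", starttime);
    return 0;
}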
You can use eBPF to achieve this.
The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination, either an exit or a fatal signal. It catches processes of all users, processes in containers, as well as processes that become zombies. This works by tracing the kernel sched_process_exit() function using dynamic tracing, and will need updating to match any changes to this function. Since this uses BPF, only the root user can use this tool.
You can refer to this tool for a related implementation.
You can get more information about this tool from the links below:
Github repo: tools/exitsnoop: Trace process termination (exit and fatal signals). Examples.
Linux Extended BPF (eBPF) Tracing Tools
ubuntu manpages: exitsnoop-bpfcc
You can first install this tool and use it to see if it meets your needs, and then refer to its implementation for coding, or use some of the libraries it provides to implement your own functions.
exitsnoop examples:
Trace all process termination
# exitsnoop
Trace all process termination, and include timestamps:
# exitsnoop -t
Exclude successful exits, only include non-zero exit codes and fatal signals:
# exitsnoop -x
Trace PID 181 only:
# exitsnoop -p 181
Label each output line with 'EXIT':
# exitsnoop --label EXIT
Another option
Wait for a (non-child) process' exit using Linux's PROC_EVENTS
Reference project:
https://github.com/stormc/waitforpid
As mentioned in the project:
Wait for a (non-child) process' exit using Linux's PROC_EVENTS. Thanks to the CAP_NET_ADMIN POSIX capability permitted to the waitforpid binary, it does not need to be set suid root. You need a Linux kernel having CONFIG_PROC_EVENTS enabled.
Appreciate @Hongli's answer for macOS with kqueue. I implemented it in Swift:
/// Wait any pids, including non-child pid. Block until all pids exit.
/// - Parameters:
/// - timeout: wait until interval, nil means no timeout
/// - Throws: WaitOtherPidError
/// - Returns: isTimeout
func waitOtherPids(_ pids: [Int32], timeout: TimeInterval? = nil) throws -> Bool {
    // create a kqueue
    let kq = kqueue()
    if kq == -1 {
        throw WaitOtherPidError.createKqueueFailed(String(cString: strerror(errno)!))
    }

    // input
    // multiple changes are in OR relation; kevent will return if any of them matches
    var changes: [Darwin.kevent] = pids.map({ pid in
        Darwin.kevent.init(ident: UInt(pid), filter: Int16(EVFILT_PROC), flags: UInt16(EV_ADD | EV_ENABLE), fflags: NOTE_EXIT, data: 0, udata: nil)
    })

    let timeoutDeadline = timeout.map({ Date(timeIntervalSinceNow: $0) })
    let remainTimeout: () -> timespec? = {
        if let deadline = timeoutDeadline {
            let d = max(deadline.timeIntervalSinceNow, 0)
            let fractionalPart = d - TimeInterval(Int(d))
            return timespec(tv_sec: Int(d), tv_nsec: Int(fractionalPart * 1000 * 1000 * 1000))
        } else {
            return nil
        }
    }

    // output
    var events = changes.map { _ in Darwin.kevent.init() }

    while !changes.isEmpty {
        // watch changes (synchronous call)
        let numOfEvent: Int32
        if var timeout = remainTimeout() {
            numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), &timeout)
        } else {
            numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), nil)
        }
        if numOfEvent < 0 {
            throw WaitOtherPidError.keventFailed(String(cString: strerror(errno)!))
        }
        if numOfEvent == 0 {
            // timeout. Return directly.
            return true
        }

        // handle the result
        let realEvents = events[0..<Int(numOfEvent)]
        let handledPids = Set(realEvents.map({ $0.ident }))
        changes = changes.filter({ c in
            !handledPids.contains(c.ident)
        })
        for event in realEvents {
            if Int32(event.flags) & EV_ERROR > 0 { // see 'man kevent'
                let errorCode = event.data
                if errorCode == ESRCH {
                    // "The specified process to attach to does not exist" -- ignored
                } else {
                    print("[Error] kevent result failed with code \(errorCode), pid \(event.ident)")
                }
            } else {
                // successful event: the pid has exited
            }
        }
    }
    return false
}

enum WaitOtherPidError: Error {
    case createKqueueFailed(String)
    case keventFailed(String)
}
PR_SET_PDEATHSIG can be used to wait for parent process termination
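Note that this covers only the special case where the process you want to wait for is your own parent: the child asks the kernel to send it a signal when the parent dies. A sketch (the signal choice and the getppid() fallback are illustrative):

#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

static volatile sig_atomic_t parent_died = 0;

static void on_parent_death(int sig)
{
    (void)sig;
    parent_died = 1;
}

int main(void)
{
    signal(SIGUSR1, on_parent_death);

    /* Ask the kernel to deliver SIGUSR1 to us when our parent terminates. */
    prctl(PR_SET_PDEATHSIG, SIGUSR1);

    /* If the parent already exited before the prctl() call, catch that here. */
    if (getppid() == 1)
        parent_died = 1;

    while (!parent_died)
        pause();

    printf("parent has terminated\n");
    return 0;
}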

C-program does not return from wait-statement

I have to migrate a C program from OpenVMS to Linux, and am now having difficulties with a program generating subprocesses. A subprocess is generated (fork works fine), but execve fails (which is correct, as the wrong program name is given).
But to reset the number of active subprocesses, I afterwards call a wait() which does not return. When I look at the process via ps, I see that there are no more subprocesses, but wait() does not return ECHILD as I had thought.
while (jobs_to_be_done)
{
    if (running_process_cnt < max_process_cnt)
    {
        if ((pid = vfork()) == 0)
        {
            params[0] = param1 ;
            params[1] = NULL ;
            if ((cstatus = execv(command, params)) == -1)
            {
                perror("Child - Exec failed") ; // this happens
                exit(EXIT_FAILURE) ;
            }
        }
        else if (pid < 0)
        {
            printf("\nMain - Child process failed") ;
        }
        else
        {
            running_process_cnt++ ;
        }
    }
    else // no more free process slot, wait
    {
        if ((pid = wait(&cstatus)) == -1) // does not return from this statement
        {
            if (errno != ECHILD)
            {
                perror("Main: Wait failed") ;
            }
            anz_sub = 0 ;
        }
        else
        {
            ...
        }
    }
}
Is there anything that has to be done to tell wait() that there are no more subprocesses?
With OpenVMS the program works fine.
Thanks a lot in advance for your help
I don't recommend using vfork these days on Linux, since fork(2) is efficient enough, thanks to lazy copy-on-write techniques in the Linux kernel.
You should check the result of fork. Unless it is failing, a process has been created, and wait (or waitpid(2), perhaps with WNOHANG if you don't want to really wait, but just find out about already ended child processes ...) should not fail (even if the exec function in the child has failed, the fork did succeed).
You might also carefully use the SIGCHLD signal, see signal(7). A defensive way of using signals is to set some volatile sigatomic_t flag in signal handlers, and test and clear these flags inside your loop. Recall that only async signal safe functions (and there are quite few of them) can be called -even indirectly- inside a signal handler. Read also about POSIX signals.
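A small sketch of that flag pattern (the helper names are illustrative, not from the question's code): the handler only records the event, and the main loop reaps finished children with waitpid(-1, ..., WNOHANG).

#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;   /* only set a flag; do the real work in the main loop */
}

static void install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);
}

/* Called from inside the main loop: test and clear the flag, then reap
   every child that has finished, decrementing the running counter. */
static void reap_children(int *running_process_cnt)
{
    if (child_exited) {
        child_exited = 0;
        int status;
        while (waitpid(-1, &status, WNOHANG) > 0)
            (*running_process_cnt)--;
    }
}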
Take time to read Advanced Linux Programming to get a wider picture in your mind. Don't try to mimic OpenVMS on POSIX, but think in a POSIX or Linux way!
You probably may want to always waitpid in your loop, perhaps (sometimes or always) with WNOHANG. So waitpid should not be only called in the else part of your if (running_process_cnt < max_process_cnt) but probably in every iteration of your loop.
You might want to compile with all warnings & debug info (gcc -Wall -Wextra -g) then use the gdb debugger. You could also strace(1) your program (probably with -f)
You might want to learn about memory overcommitment. I dislike this feature and usually disable it (e.g. by running echo 0 > /proc/sys/vm/overcommit_memory as root). See also proc(5) -which is very useful to know about...
From man vfork:
The child must not return from the current function or call exit(3), but may call _exit(2)
You must not call exit() when the call to execv (after vfork) fails - you must use _exit() instead. It is quite possible that this alone is causing the problem you see with wait not returning.
I suggest you use fork instead of vfork. It's much easier and safer to use.
If that alone doesn't solve the problem, you need to do some debugging or reduce the code down until you find the cause. For example the following should run without hanging:
#include <sys/wait.h>

int main(int argc, char **argv)
{
    pid_t pid;
    int cstatus;

    pid = wait(&cstatus);
    return 0;
}
If you can verify that this program doesn't hang, then it must be some aspect of your program that is causing a hang. I suggest putting in print statements just before and after the call to wait.

Is it possible to fork/exec and guarantee one starts before the other?

Pretty much as the title says. I have a snippet of code that looks like this:
pid_t p;
p = fork();
if (p == 0) {
    childfn();
} else if (p > 0) {
    parentfn();
} else {
    // error
}
I want to ensure that either the parent or the child executes (but not returns from) their respective functions before the other.
Something like a call to sleep() would probably work, but is not guaranteed by any standard, and would just be exploiting an implementation detail of the OS's scheduler...is this possible? Would vfork work?
edit: Both of the functions find their way down to a system() call, one of which will not return until the other is started. So to re-iterate: I need to ensure that either the parent or the child only calls its respective function (but does not return from it, because they won't return, which is what all of the mutex-based solutions below rely on) before the other. Any ideas? Sorry for the lack of clarity.
edit2: Having one process call sched_yield and sleep, I seem to be getting pretty reliable results. vfork does provide the semantics I am looking for, but comes with too many restrictions on what I can do in the child process (I can pretty much only call exec). So, I have found some work-arounds that are good enough, but no real solution. vfork is probably the closest thing to what I was looking for, but all the solutions presented below would work more or less.
This problem would normally be solved by a mutex or a semaphore. For example:
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Get a page of shared memory
int pagesize = getpagesize();
void *mem = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED)
{
    perror("mmap");
    return 1;
}

// Put the semaphore at the start of the shared page. The rest of the page
// is unused.
sem_t *sem = mem;
sem_init(sem, 1, 1);

pid_t p = fork();
if (p == 0) {
    sem_wait(sem);
    childfn();
    sem_post(sem);
} else if (p > 0) {
    sem_wait(sem);
    parentfn();
    sem_post(sem);
    int status;
    wait(&status);
    sem_destroy(sem);
} else {
    // error
}

// Clean up
munmap(mem, pagesize);
You could also use a mutex in a shared memory region, but you need to make sure to create it with non-default attributes, with the process-shared attribute set to shared (via pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)), in order for it to work.
This ensures that only one of childfn or parentfn will execute at any given time, but they could run in either order. If you need to have a particular one run first, start the semaphore off with a count of 0 instead of 1, and have the function that needs to run first not wait for the semaphore (but still post to it when finished); see the sketch below. You might also be able to use a condition variable, which has different semantics.
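For example, a sketch of that ordering variant, reusing sem, childfn and parentfn from the code above, where the parent's function must run (and finish) first:

sem_init(sem, 1, 0);           // start at 0: whoever waits will block until a post

pid_t p = fork();
if (p == 0) {
    sem_wait(sem);             // child blocks until the parent has posted
    childfn();
    _exit(0);
} else if (p > 0) {
    parentfn();                // parent runs first, without waiting
    sem_post(sem);             // ...then releases the child
    int status;
    wait(&status);
    sem_destroy(sem);
} else {
    // error
}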
A mutex should be able to solve this problem. Lock the mutex before the call to fork and have the 1st function execute as normal, while the second tries to claim the mutex. The 1st should unlock the mutex when it is done, and the second will wait until it is free.
EDIT: Mutex must be in a shared memory segment for the two processes
Safest way is to use a (named) pipe or socket. One side writes to it, the other reads. The reader cannot read what has not been written yet.
Use a semphore to ensure that one starts before the other.
You could use an atomic variable. Set it to zero before you fork/thread/exec, have the first process set it to one just before (or better, after) it enters the function, and have the second wait while(flag == 0).
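A sketch of that idea for the fork case; note that the flag has to live in memory shared by both processes (here an anonymous MAP_SHARED mapping), because ordinary variables are copied at fork time. The childfn/parentfn names are taken from the question; everything else is illustrative:

#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <unistd.h>

// Before forking: put the flag in shared memory (error handling omitted).
atomic_int *started = mmap(NULL, sizeof(atomic_int), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
atomic_store(started, 0);

pid_t p = fork();
if (p == 0) {
    atomic_store(started, 1);        // mark that the child is about to enter
    childfn();
} else if (p > 0) {
    while (atomic_load(started) == 0)
        sched_yield();               // spin until the child has signalled
    parentfn();
}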

snprintf in signal handler creates segmentation fault if started with valgrind

This very simple C program gives me a segmentation fault when running it with valgrind.
It runs fine when started normally.
It crashes when you send the USR1 signal to the process.
The problem seems to be the way snprintf handles the formatting of the float value, because it works fine if you use a string (%s) or int (%d) format parameter.
P.S. I know you shouldn't call any printf-family function inside a signal handler, but still, why does it only crash with valgrind?
#include <stdio.h>
#include <signal.h>

void sig_usr1(int sig) {
    char buf[128];
    snprintf(buf, sizeof(buf), "%f", 1.0);
}

int main(int argc, char **argv) {
    (void) signal(SIGUSR1, sig_usr1);
    while(1);
}
As cnicutar notes, valgrind may have an effect on anything timing related and signal handlers would certainly qualify.
I don't think snprintf is safe to use in a signal handler, so it might be working in the non-valgrind case solely by accident; then valgrind comes in, changes the timing, and you get the flaming death that you were risking without valgrind.
I found a list of functions that are safe in signal handlers (according to POSIX.1-2003 ) here:
http://linux.die.net/man/2/signal
Yes, the linux.die.net man pages are a bit out of date but the list here (thanks to RedX for finding this one):
https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers
doesn't mention snprintf either, except in the context of OpenBSD where it says:
... asynchronous-safe in OpenBSD but "probably not on other systems," including snprintf(), ...
so the implication is that snprintf is not, in general, safe in a signal handler.
And, thanks to Nemo, we have an authoritative list of functions that are safe for use in signal handlers:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03
Start at that link and search down for _Exit and you'll see the list; then you'll see that snprintf is not on the list.
Also, I remember using write() in a signal handler because fprintf wasn't safe for a signal handler but that was a long time ago.
I don't have a copy of the relevant standard so I can't back this up with anything really authoritative but I thought I'd mention it anyway.
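For reference, write(2) is on the async-signal-safe list, so a handler that only needs to emit a fixed message can do something like this sketch:

#include <signal.h>
#include <unistd.h>

static void sig_usr1(int sig)
{
    (void)sig;
    static const char msg[] = "got SIGUSR1\n";
    /* write() is async-signal-safe; snprintf()/fprintf() are not. */
    (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
}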
From manual: http://www.network-theory.co.uk/docs/valgrind/valgrind_27.html and http://www.network-theory.co.uk/docs/valgrind/valgrind_24.html
Valgrind's signal simulation is not as robust as it could be. Basic POSIX-compliant sigaction and sigprocmask functionality is supplied, but it's conceivable that things could go badly awry if you do weird things with signals. Workaround: don't. Programs that do non-POSIX signal tricks are in any case inherently unportable, so should be avoided if possible.
So, snprintf in signal handler is not a POSIX-allowed signal trick and valgrind has a right to brick your programs.
Why snprintf is not signal-safe?
The glibc manual says: http://www.gnu.org/software/hello/manual/libc/Nonreentrancy.html
If a function uses and modifies an object that you supply, then it is potentially non-reentrant; two calls can interfere if they use the same object.
This case arises when you do I/O using streams. Suppose that the signal handler prints a message with fprintf. Suppose that the program was in the middle of an fprintf call using the same stream when the signal was delivered. Both the signal handler's message and the program's data could be corrupted, because both calls operate on the same data structure—the stream itself.
However, if you know that the stream that the handler uses cannot possibly be used by the program at a time when signals can arrive, then you are safe. It is no problem if the program uses some other stream.
You can say that the sprintf-family functions work on strings, not streams. But internally, glibc's snprintf does work on a special stream:
ftp://sources.redhat.com/pub/glibc/snapshots/glibc-latest.tar.bz2/glibc-20090518/libio/vsnprintf.c
int
_IO_vsnprintf (string, maxlen, format, args)
{
_IO_strnfile sf; // <<-- FILE*-like descriptor
The %f output code in glibc also has a malloc call inside it:
ftp://sources.redhat.com/pub/glibc/snapshots/glibc-latest.tar.bz2/glibc-20090518/stdio-common/printf_fp.c
/* Allocate buffer for output.  We need two more because while rounding
   it is possible that we need two more characters in front of all the
   other output.  If the amount of memory we have to allocate is too
   large use `malloc' instead of `alloca'.  */
size_t wbuffer_to_alloc = (2 + (size_t) chars_needed) * sizeof (wchar_t);
buffer_malloced = ! __libc_use_alloca (chars_needed * 2 * sizeof (wchar_t));
if (__builtin_expect (buffer_malloced, 0))
  {
    wbuffer = (wchar_t *) malloc (wbuffer_to_alloc);
    if (wbuffer == NULL)
      /* Signal an error to the caller.  */
      return -1;
  }
else
  wbuffer = (wchar_t *) alloca (wbuffer_to_alloc);
Valgrind slightly changes the timings in your program.
Have a look at the FAQ.
My program crashes normally, but doesn't under Valgrind, or vice versa. What's happening?
When a program runs under Valgrind, its environment is slightly different to when it runs natively. Most of the time this doesn't make any difference, but it can, particularly if your program is buggy.
This is a valgrind bug. It calls your signal handler with a stack that is not 16-byte aligned as required by the ABI. On x86_64, floating point arguments are passed in XMM registers which can only be stored at addresses that are 16-byte aligned. You can work around the problem by compiling for 32-bit (gcc -m32).
