I'm writing a program that spawns child processes. For security reasons, I want to limit what these processes can do. I know of security measures from outside the program such as chroot or ulimit, but I want to do something more than that. I want to limit the system calls done by the child process (for example preventing calls to open(), fork() and such things). Is there any way to do that? Optimally, the blocked system calls should return with an error but if that's not possible, then killing the process is also good.
I guess it can be done wuth ptrace() but from the man page I don't really understand how to use it for this purpose.
It sounds like SECCOMP_FILTER, added in kernel version 3.5, is what you're after. The libseccomp library provides an easy-to-use API for this functionality.
By the way, chroot() and setrlimit() are both system calls that can be called within your program - you'd probably want to use one or both of these in addition to seccomp filtering.
If you want to do it the ptrace way, you have some options (and some are really simple). First of all, I recommend you to follow the tutorial explained here. With it you can learn how to know what system calls are being called, and also the basic ptrace knowledge (don't worry, it's a very short tutorial). The options (that I know) you have are the following:
The easiest one would be to kill the child, that is this exact code here.
Secondly you could make the child fail, just by changing the registers with PTRACE_SETREGS, putting wrong values in them, and you can also change the return value of the system call if you want (again, with PTRACE_SETREGS).
Finally you could skip the system call. But for that you should know the address after the system call call, make the intruction register point there and set it (again, with PTRACE_SETREGS).
Related
I need to know is it possible to intercept user executed command in loadable kernel module. I know that system calls can be intercepted such as open(). But what i need to do is intercepts user entered command/ process and add some validations. for example, if user enters cp command, before executing the command i need to perform some validations against it. If we cannot do this in LKM, what are the alternative approaches?
You need to look up how many actual system calls there are for execvp() and friends (probably 1, maybe 2 — it could be more, but probably isn't), and then intercept those system calls. You might need to worry about posix_spawn() and friends too. They're the only ways that new processes can be run. There isn't any other way to intercept them.
You could try using an LKM or a systemtap plugin(which compiles to an LKM). The kernel functions that you should hook are execve and execveat. In case you are doing this for programming fun and want to write hooking code by yourself, you might want to look at kprobes and know that you can get kernel function addresses from /proc/kallsyms.
Of course, recompiling the kernel with your own hooking code is another option if that is a possibility.
In both the cases above you probably want to intercept the execve calls made by a specific uid; if so, you should filter calls from that uid.
A userspace approach might be to try writing a seccomp filter. Here is a tutorial on how to go about writing one.
In the man pages I've been reading, it seems popen, system, etc. tend to call fork(). In turn, fork() copies the process's entire memory state. This seems really heavy, especially when in many situations a child from a call to fork() uses little if any of the memory allocated for the parent.
So, my question is, can I get fork() like behavior without duplicating the whole memory state of the parent process? Or is there something I am missing, such that fork() is not as heavy as it appears (like, maybe calls tend to be optimized to avoid unnecessary memory duplication)?
fork(2) is, as all syscalls, a primitive operation (but some C libraries use clone(2) for it), from the point of view of user-space application. It is mostly a single machine instruction SYSCALL or SYSENTER to switch from user-mode to kernel-mode, then the (recent version of) Linux kernel is doing quite significant processing.
It is in practice quite efficient (e.g. less than a millisecond, and sometimes even less than a tenth of it) because the kernel is extensively using lazy copy-on-write techniques to share pages between parent & child processes. The actual copying would happen later, on page faults, when overwriting a shared page.
And forking has a huge advantage, since the starting of some other program is delegated to execve(2): it is conceptually simple: the only difference between the parent & child processes is the result of fork
BTW on POSIX systems such as Linux, fork(2) or the suitable clone(2) equivalent is the only way to create a process (there are some few weird exceptions that you should generally ignore: the kernel is making some processes like /sbin/init etc...), since vfork(2) is obsolete.
The problem is that to run the main function of a standardly linked executable, you need to call execve, and exec replaces the whole process image and so you need a new address space, which is what fork is for.
You can get around this by having your calee expose its main functionality in a shared library (but then it must not be called main), and then you can load the function with the main functionality without having to fork (provided there are no symbol conflicts).
That would be a more efficient alternative to system (basically with the efficiency of a function call).
Now popen involves pipes and to use pipes you need to have the pipe ends in different schedulable units. Threads, which use the same address space, can be used here as a lighter alternative to separate processes.
As you alluded to fork() is a bit of a mad syscall that has kind of stuck around for historical reasons. There's a great article about its flaws here, and also this post goes into some details and potential workarounds.
Although on Linux fork() is optimised to use copy-on-write for the memory, it's still not "free" because:
It still has to do some memory-related admin (new page tables, etc.)
If you're using RAII (e.g. in C++ or possibly Rust) then all the objects that are copied will be cleaned up twice. That might even lead to logic errors (e.g. deleting temporary files twice).
It's likely that the parent process will keep running, probably modifying lots of its memory, and then it will have to be copied.
The alternatives appear to be:
vfork()
clone()
posix_spawn()
vfork() was created for the common use case of doing fork() and then execve() to run a program. execve() replaces all of the memory of the current process with a new set, so there's no point copying the parent process's memory if your just about to obliterate it.
So vfork() doesn't do that. Instead it runs in the same memory space as the parent process and pauses it until it gets to execve(). The Linux man page for vfork() says that doing just about anything except vfork() then execve() is undefined behaviour.
posix_spawn() is basically a nice wrapper around vfork() and then execve().
clone() is similar to fork() but allows you to exactly specify what is copied (file descriptors, memory, etc.). It has a load of options, including one (CLONE_VM) which lets the child process run in the same address space as the parent, which is pretty wild! I guess that is the lightest weight way to make a new process because it doesn't involve any copying of memory at all!
But in practice I think in most situations you should either:
Use threads, or
Use posix_spawn().
(Note, I am just researching this now; I'm not an expert so I might have got some things wrong.)
I'm not sure if the title accurately describes what I want to do but here's the rub:
We have a large and hairy codebase (not-invented-here courtesy of Elbonian Code Slaves) which currently compiles as one big binary which internally creates several pthreads for various specific tasks, communicating through IPC messages.
It's not ideal for a number of reasons, and several of the threads would be better as independent autonomous processes as they are all individual specific "workers" rather than multiple instances of the same piece of code.
I feel a bit like I'm missing some trick, is our only option to split off the various thread code and compile each as a standalone executable invoked using system() or exec() from the main blob of code? It feels clunky somehow.
If you want to take a part of your program that currently runs as a thread, and instead run it as a separate process launched by your main program, then you have two main options:
Instead of calling pthread_create(), fork() and in the child process call the thread-start function directly (do not use any of the exec-family functions).
Compile the code that the the thread executes as a separate executable. Launch that executable at need by the standard fork / exec sequence. (Or you could use system() instead of fork/exec, but don't. Doing so needlessly brings the shell into it, and also gives you much less control.)
The former has the disadvantage that each process image contains a lot of code that it will never use, since each is a complete copy of everything. Inasmuch as under Linux fork() uses copy-on-write, however, that's mostly an address-space issue, not a resource-wastage issue.
The latter has the disadvantage that the main program needs to be able to find the child programs on the file system. That's not necessarily a hard problem, mind you, but it is substantially different from already having the needed code at hand. If there is any way that any of the child programs would be independently useful, however, then breaking them out as separate programs makes a fair amount of sense.
Do note, by the way, that I do not in general accept your premise that it is inappropriate to implement specific for-purpose workers as threads. If you want to break out such tasks, however, then the above are your available alternatives.
Edited to add:
As #EOF pointed out, if you intend that after the revamp your main process will still be multi-threaded (that is, if you intend to convert only some threads to child processes) then you need to be aware of a significant restriction placed by POSIX:
If a multi-threaded process calls fork(), [...] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
On the other hand, I'm pretty sure the relevant definition of "multi-threaded" is that the process has multiple live threads at the time fork() is called. It should not present a problem if the child processes are all forked off before any additional threads are created, or after all but one thread is joined.
I'm trying to run an application in C, but the only way I could find that is reasonably easy to use works like this:
system("command here");
It works, of course, but it's really slow (especially when repeating this a lot). I'm just wondering if there is a way of running a program without having to interact with a shell, something like python's subprocess module.
I have heard of execl, and I would use that (forking it first, of course), but I'm wondering if there is a simpler way that wouldn't require forking first.
EDIT: I also want to be able to know the return code of the program
As I'm sure you already know, system already employs the fork/exec strategy. I understand you want to circumvent the shell and are looking for a simple approach, I'm just saying you could just as easily write a function to wrap the fork/exec pattern as is done in system. Indeed it would probably be most straightforward to just do that. An alternative as Gabe mentioned in the comments is posix_spawn.
A faster (but apparently discouraged) alternative is vfork() / exec, but this is generally discouraged and is obsolete in the latest POSIX standards.
4.3BSD; POSIX.1-2001 (but marked OBSOLETE). POSIX.1-2008 removes the
specification of vfork().
It's meant to be immediately followed by an exec or _exit. Otherwise all kinds of weird bugs can arise since the virtual memory pages and page tables aren't duplicated (child uses same data/heap/stack segments). The parent/calling process blocks until the child execs or _exits. Regular fork's modern implementations have copy-on-write semantics which approach the speed of vfork, without the potential bugs incurred by vfork's memory sharing semantics.
If you want even further control over memory-sharing semantics and process inheritance, and the consequent potential speed-up (and are on Linux), look into clone() (wrapper for system-call sys_clone()) which is what some process-creating system calls delegate their work to. Be sure to carefully comb over all of the various flags.
You can use waitpid to get the exit status of the process.
If neither system() nor popen() provides the mechanism you need, then the easy way to do it is with fork() and execv() (or, perhaps, execl(), but the argument list must be fixed at compile time, not variable, to use it). Really! It is not hard to do fork() and exec(), and any alternative will encapsulate that processing.
The Python subprocess module is simply hiding fork() and exec() for you behind a convenient interface. That's probably appropriate for a high-level language like Python. C is a lower-level language and doesn't really need the complexity.
The hard way to do it is with posix_spawn(). You have to create arguments to describe all the actions you want done in the child between the fork() and the exec(), which is far harder to set up than it is to simply do the fork(), make the changes, and then use exec() after all. This (posix_spawn()) is what you get when you design the code to spawn a child process without visibly using fork() and exec() and ensure that it can handle almost any reasonable circumstance.
You'll need to consider whether you need to use wait() or waitpid() or a variant to determine when the child is complete. You may need to consider whether to handle the SIGCHLD signal (which will notify you when a child dies).
I have my own POSIX application which starts a child process. I want the parent process to be notified with the names of all files the child process reads or writes, as well as the file names of any child processes the child spawns, and any dynamic libraries it loads. Similarly, I need to monitor all child processes spawned by child processes, etc.
How is this done?
I have two ideas for this.
Method 1 - The "real way".
I think you want ptrace. But it isn't going to be easy to use.
Essentially this call is for writing a debugger. Note that PTRACE_SYSCALL steps until the next syscall. At which point you might be able to use more ptrace calls to peek at the process's memory to observe if it's, say, a call to open().
Method 2 - The lazy, hackish way.
You could use the LD_PRELOAD environment variable. That is, write a shared library with your own implementation of the calls you want to hook (say, open(), dlopen()), adding your own code and dispatching to the normal libc version. Then you point the LD_PRELOAD environment variable at this shared library so the dynamic linker will load it at process start.
One downside to this approach is that if a process knows it's being observed this way, it can reset the environment variable and execute itself again, and evade detection. Another I can think of is that as a security feature this environment variable is not honored if you're root.