setuid equivalent for non-root users - c

Does Linux have some C interface similar to setuid, which allows a program to switch to a different user using e.g. the username/password? The problem with setuid is that it can only be used by superusers.
I am running a simple web service which requires jobs to be executed as the logged in user. So the main process runs as root, and after the user logs in it forks and calls setuid to switch to the appropriate uid. However, I am not quite comfortable with the main proc running as root. I would rather have it run as another user, and have some mechanism to switch to another user similar to su (but without starting a new process).

First, setuid() can most definitely be used by non-superusers. Technically, all you need in Linux is the CAP_SETUID (and/or CAP_SETGID) capability to switch to any user. Second, setuid() and setgid() can change the process identity between the real (user who executed the process), effective (owner of the setuid/setgid binary), and saved identities.
However, none of that is really relevant to your situation.
There exists a relatively straightforward, yet extremely robust solution: have a setuid root helper, forked and executed by your service daemon before it creates any threads, and use a Unix domain socket pair to communicate between the helper and the service. When user binaries are to be executed, the service passes both its credentials and the pipe endpoint file descriptors to the helper. The helper will check everything securely, and if all is in order, it will fork and execute the desired user binary, with the specified pipe endpoints connected to standard input, standard output, and standard error.
The procedure for the service to start the helper, as early as possible, is as follows:
1. Create a Unix domain socket pair, used for privileged communications between the service and the helper.
2. Fork.
3. In the child, close all excess file descriptors, keeping only one end of the socket pair. Redirect standard input, output, and error to /dev/null.
4. In the parent, close the child end of the socket pair.
5. In the child, execute the privileged helper binary.
6. The parent sends a simple message, possibly one without any data at all, but with an ancillary message containing its credentials.
7. The helper program waits for the initial message from the service.
8. When it receives it, it checks the credentials. If the credentials do not pass muster, it quits immediately.
The credentials in the ancillary message define the originating process' UID, GID, and PID. Although the process needs to fill in these, the kernel verifies they are true. The helper of course verifies that UID and GID are as expected (correspond to the account the service ought to be running as), but the trick is to get the statistics on the file the /proc/PID/exe symlink points to. That is the genuine executable of the process that sent the credentials. You should verify it is the same as the installed system service daemon (owned by root:root, in the system binary directory).
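As a rough illustration of the credential-passing step, here is a minimal sketch assuming Linux and glibc. The function names (enable_passcred, send_credentials, recv_credentials) are made up for this example, and it sends one data byte rather than an empty message, since an empty send on a stream socket would not carry the ancillary data. The helper would follow recv_credentials() with the UID/GID and /proc/PID/exe checks described above.

```c
#define _GNU_SOURCE             /* struct ucred, SCM_CREDENTIALS, SO_PASSCRED */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for one struct ucred. */
union cred_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(struct ucred))];
};

/* Helper side: ask the kernel to deliver sender credentials on this socket. */
int enable_passcred(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof on);
}

/* Service side: send one byte with our own credentials attached.
   The kernel rejects the message if the values are not ours. */
int send_credentials(int sock)
{
    struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union cred_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_CREDENTIALS;
    c->cmsg_len = CMSG_LEN(sizeof cred);
    memcpy(CMSG_DATA(c), &cred, sizeof cred);

    return sendmsg(sock, &msg, MSG_NOSIGNAL) == 1 ? 0 : -1;
}

/* Helper side: receive one byte and the kernel-verified credentials. */
int recv_credentials(int sock, struct ucred *out)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union cred_ctl ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS &&
            c->cmsg_len == CMSG_LEN(sizeof *out)) {
            memcpy(out, CMSG_DATA(c), sizeof *out);
            return 0;
        }
    }
    return -1;   /* no credentials attached: treat as a failed check */
}
```

The helper must call enable_passcred() on its end before the service sends, otherwise the credentials are not delivered.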
There is a very simple attack that may defeat the security up to this point. A nefarious user may create their own program, that forks and executes the helper binary correctly, sends the initial message with its true credentials -- but replaces itself with the correct system binary before the helper has a chance to check what the credentials actually refer to!
That attack is trivially defeated by three further steps:
9. The helper program generates a (cryptographically secure) pseudorandom number, say 1024 bits, and sends it back to the parent.
10. The parent sends the number back, but again adds its credentials in an ancillary message.
11. The helper program verifies that the UID, GID, and PID have not changed, and that /proc/PID/exe still points to the correct service daemon binary. (I'd just repeat the full checks.)
At step 8, the helper has already ascertained the other end of the socket is executing the binary it ought to be executing. Sending it a random cookie it has to send back, means the other end cannot have "stuffed" the socket with the messages beforehand. Of course this assumes the attacker cannot guess the pseudorandom number beforehand. If you want to be careful, you can read a suitable cookie from /dev/random, but remember it is a limited resource (may block if there is not enough randomness available to the kernel). I'd personally just read say 1024 bits (128 bytes) from /dev/urandom, and use that.
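Generating and checking the cookie could look roughly like this. It is only a sketch: read_cookie and cookie_matches are names invented here, and the constant-time comparison is my own precaution, not something the scheme strictly requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with len bytes from /dev/urandom. Returns 0 on success, -1 on error. */
int read_cookie(unsigned char *buf, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    size_t have = 0;
    while (have < len) {
        ssize_t n = read(fd, buf + have, len - have);
        if (n > 0) {
            have += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else {
            close(fd);
            return -1;              /* real error or unexpected end of input */
        }
    }
    close(fd);
    return 0;
}

/* Compare two cookies without stopping at the first differing byte.
   Returns 1 if they are identical, 0 otherwise. */
int cookie_matches(const unsigned char *a, const unsigned char *b, size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

With a 128-byte buffer this gives the 1024 bits mentioned above.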
At this point, the helper has ascertained the other end of the socket pair is your service daemon, and the helper can trust the control messages as far as it can trust the service daemon. (I'm assuming this is the only mechanism the service daemon will spawn user processes; otherwise you'd need to re-pass the credentials in every further message, and re-check them every time in the helper.)
Whenever the service daemon wishes to execute a user binary, it
Creates the necessary pipes (one for feeding standard input to the user binary, one to get back the standard output from the user binary)
Sends a message to the helper containing
Identity to run the binary as; either user (and group) names, or UID and GID(s)
Path to the binary
Command-line parameters given to the binary
An ancillary message containing the file descriptors for the user binary endpoints of the data pipes
Whenever the helper gets such a message, it forks. In the child, it replaces standard input and output with the file descriptors in the ancillary message, changes identity with setresgid() and setresuid() and/or initgroups(), changes the working directory to somewhere appropriate, and executes the user binary. The parent helper process closes the file descriptors in the ancillary message, and waits for the next message.
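The fork-and-switch step in the helper might be sketched as follows. spawn_as and its parameters are hypothetical names; the exit codes 126 and 127 are just a shell-like convention I picked, and passing NULL as the user name skips initgroups(), which you would only do for testing without privileges.

```c
#define _GNU_SOURCE             /* setresuid(), setresgid(), initgroups() */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Fork and run path as the given identity, with in_fd and out_fd as
   standard input and output. Returns the child's pid in the parent,
   or -1 if fork() failed. The child never returns. */
pid_t spawn_as(const char *user, uid_t uid, gid_t gid,
               int in_fd, int out_fd,
               const char *path, char *const argv[])
{
    pid_t child = fork();
    if (child != 0)
        return child;           /* parent, or fork() error */

    /* Child: wire up the data pipes first. */
    if (dup2(in_fd, STDIN_FILENO) == -1 || dup2(out_fd, STDOUT_FILENO) == -1)
        _exit(126);
    if (in_fd > STDERR_FILENO)
        close(in_fd);
    if (out_fd > STDERR_FILENO && out_fd != in_fd)
        close(out_fd);

    /* Order matters: supplementary groups, then GID, then UID.
       After the UID is dropped, the groups can no longer be changed. */
    if (user && initgroups(user, gid) == -1)
        _exit(126);
    if (setresgid(gid, gid, gid) == -1)
        _exit(126);
    if (setresuid(uid, uid, uid) == -1)
        _exit(126);

    if (chdir("/") == -1)       /* "somewhere appropriate"; "/" is an assumption */
        _exit(126);

    execv(path, argv);
    _exit(127);                 /* exec failed */
}
```

After spawn_as() returns in the helper, the helper closes its copies of in_fd and out_fd and goes back to waiting for the next message.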
If the helper exits when there is going to be no more input from the socket, then it will automatically exit when the service exits.
I could provide some example code, if there is sufficient interest. There's lots of details to get right, so the code is a bit tedious to write. However, correctly written, it is more secure than e.g. Apache SuEXEC.

No, there is no way to change UID using only a username and password. (The concept of a "password" is not recognized by the kernel in any fashion -- it only exists in userspace.) To switch from one non-root UID to another, you must become root as an intermediate step, typically by exec()-uting a setuid binary.
Another option in your situation may be to have the main server run as an unprivileged user, and have it communicate with a back-end process running as root.

Related

Protect /dev/shm file

I'm working on an application which is using a shared memory via shm_open(). It perform mmap() from a file within /dev/shm and is based on producer/consumer approach.
Is there any mechanism for my shared memory to be protected and accessible only by this application? I know it is possible to use encryption but does linux (or the programming language) provide any services so that the file is only accessible by my application?
If you use fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0);, then the shared memory object cannot be opened by any other process (without changing the access mode first). If it succeeds (fd != -1), and you immediately unlink the object via int rc = shm_unlink(name); successfully (rc == 0), only processes that can access the current process itself can access the object.
There is a small time window between the two operations when another process with sufficient privileges might have changed the mode and opened the object. To check, use fcntl(fd, F_SETLEASE, F_WRLCK) to obtain a write lease on the object. It will succeed only if this is the only process with access to the object.
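Creating and immediately unlinking the object could be sketched like this. create_private_shm is a made-up name, the lease check is left out to keep it short, and on older glibc versions you may need to link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a shared memory object nobody else can open by name.
   Returns an open descriptor, or -1 on error. */
int create_private_shm(const char *name, size_t size)
{
    /* Mode 0: no permission bits at all, and O_EXCL makes sure
       we are the one who created it. */
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0);
    if (fd == -1)
        return -1;

    /* Remove the name right away; the object lives on through fd. */
    if (shm_unlink(name) == -1) {
        close(fd);
        return -1;
    }

    /* A new object has length zero, so give it its real size. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can be passed to mmap() with MAP_SHARED as usual; the name is only needed for the brief moment between the two calls.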
Have the first instance of the application bind to a previously-agreed Unix domain stream socket, named or abstract, and listen for incoming connections on it. (For security reasons, it is important to use fcntl(sockfd, F_SETFD, FD_CLOEXEC) to avoid leaking the socket to a child process in case it exec()s a new binary.)
If the socket has been already bound, the bind will fail; so connect to that socket instead. When the first instance accepts a new connection, or the second instance connects to it, both must use int rc = getsockopt(connfd, SOL_SOCKET, SO_PEERCRED, &creds, &credslen); with struct ucred creds; socklen_t credslen = sizeof creds;, to obtain the credentials of the other side.
You can then check that the uid of the other side matches getuid() and geteuid(), and verify using e.g. stat() that the path "/proc/PID/exe" (where PID is the pid of the other side) refers to the same inode on the same filesystem as "/proc/self/exe". If they do, both sides are executing the same binary. (Note that you can also use POSIX realtime signals, via sigqueue(), passing one data token (an int, a void pointer, or a uintptr_t/intptr_t, which happen to match unsigned long/long on Linux) between them. This is useful, for example, if one wants to notify the other that it is about to exit, and that the other one should bind to and listen for incoming connections on the Unix domain stream socket.)
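The peer check described above fits in one small function. This is a sketch with an invented name (peer_runs_same_binary), assuming /proc is mounted; as noted further down, the result only holds for the moment it is taken.

```c
#define _GNU_SOURCE             /* struct ucred, SO_PEERCRED */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process at the other end of connfd has our uid and
   runs the same executable as we do, 0 if not, -1 on error. */
int peer_runs_same_binary(int connfd)
{
    struct ucred creds;
    socklen_t credslen = sizeof creds;

    if (getsockopt(connfd, SOL_SOCKET, SO_PEERCRED, &creds, &credslen) == -1 ||
        credslen != sizeof creds)
        return -1;

    /* Same user on both sides? */
    if (creds.uid != getuid() || creds.uid != geteuid())
        return 0;

    /* Same executable? Compare device and inode of both exe links. */
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/exe", (long)creds.pid);

    struct stat peer, self;
    if (stat(path, &peer) == -1 || stat("/proc/self/exe", &self) == -1)
        return -1;

    return peer.st_dev == self.st_dev && peer.st_ino == self.st_ino;
}
```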
Then, the initial process can pass a copy of the shared object description (via descriptor fd) to the second process, using an SCM_RIGHTS ancillary message, with for example the actual size of the shared object as data (recommend a size_t for this). If you want to pass other stuff, use a structure.
The first (often, but not necessarily only) message the second process receives will contain the ancillary data with a new file descriptor referring to the shared object. Note that because this is a Unix domain stream socket, message boundaries are not preserved, and if you did not receive the full data payload, you need to use a loop to read the rest of the data.
Both sides can then close the Unix domain socket. The second side can then mmap() the shared object.
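Passing the descriptor with the object size as payload might look like the following sketch. send_fd and recv_fd are hypothetical names, and EINTR handling is omitted for brevity.

```c
#define _GNU_SOURCE             /* MSG_CMSG_CLOEXEC */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd, with the object size as the data payload. */
int send_fd(int sock, int fd, size_t size)
{
    struct iovec iov = { .iov_base = &size, .iov_len = sizeof size };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof fd);
    memcpy(CMSG_DATA(c), &fd, sizeof fd);

    return sendmsg(sock, &msg, MSG_NOSIGNAL) == (ssize_t)sizeof size ? 0 : -1;
}

/* Receive a descriptor and the size payload. Returns the new
   descriptor (already close-on-exec), or -1 on error. */
int recv_fd(int sock, size_t *size)
{
    char *p = (char *)size;
    struct iovec iov = { .iov_base = p, .iov_len = sizeof *size };
    union fd_ctl ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    ssize_t n = recvmsg(sock, &msg, MSG_CMSG_CLOEXEC);
    if (n <= 0)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS &&
            c->cmsg_len == CMSG_LEN(sizeof fd))
            memcpy(&fd, CMSG_DATA(c), sizeof fd);

    /* Stream socket: the payload may arrive in pieces, so read the rest. */
    size_t have = (size_t)n;
    while (have < sizeof *size) {
        n = read(sock, p + have, sizeof *size - have);
        if (n <= 0) {
            if (fd != -1)
                close(fd);
            return -1;
        }
        have += (size_t)n;
    }
    return fd;
}
```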
If there is never more than this exact pair of processes sharing data, then both sides can close the descriptor, making it impossible for anyone except superuser or the kernel to access the shared descriptor. The kernel will keep an internal reference as long as the mapping exists; it is equivalent to the process having the descriptor still open, except that the process itself cannot access or share the descriptor anymore, only the shared memory itself.
Because the shared object has been unlinked already, no cleanup is necessary. The shared object will vanish as soon as the last process with an open descriptor or existing mmap closes it, unmaps it, or exits.
The Unix security model that Linux implements does not have strong boundaries between processes running as the same uid. In particular, they can examine each other's /proc/PID/ pseudodirectories, including their open file descriptors listed under /proc/PID/fd/.
Because of this, security-sensitive applications usually run as a dedicated user. The aforementioned scheme works well even when the second party is a process running as the human user, and the first party as the dedicated application uid. If you use a named Unix domain stream socket, you do need to ensure its access mode is suitable (you can use chmod(), chgrp(), et al. after binding to the socket, to change the named Unix domain stream socket access mode). Abstract Unix domain stream sockets do not have a filesystem-visible node, and any process can connect to such a bound socket.
When a privilege boundary is involved between the application (running as its own dedicated uid) and the agent (running as a user uid), it is important to make sure that both sides are who they claim to be across the entire exchange. The credentials are valid only at that point in time, and a known attack method is to have the valid agent execute a nefarious binary just after having connected to the socket, so that the other side still sees the original credentials, but the next communications are in control of a nefarious process.
To avoid this, make sure the socket descriptor is not shared across an exec (using CLOEXEC descriptor flag), and optionally check the peer credentials more than once, for example initially and finally.
Why is this "complicated"? Because proper security has to be baked in, it cannot be added on top afterwards, or taken invisibly care of for you: it must be a part of the approach. Changes in the approach must be reflected in the security implementation, or you have no security.
In real life, after you implement this (for the same-executable-binary one, and the privileged-service-or-application and user-agent one), you'll find that it isn't as complicated as it sounds: each step has its purpose, and can be tweaked if the approach changes. In particular, it isn't much C code at all.
If one wants or needs "something easier", then one just has to pick something other than security-sensitive code.

How do I detect if I'm a first instance or send an IPC message to previous instance of same app?

My app is supposed to run long term (usually idling). If I try to open a second copy of the app (or trigger the app via a global hotkey), I'd like my existing instance to receive some kind of IPC message and bring itself to the front. How do I do this on Linux? The problem I've been running into is that if I hold a global lock, it isn't automatically freed when the instance closes (usually I unlock it, but an app can crash). If I try to use mkfifo, I have no idea whether I'm the first instance or not, and every solution I can think of seems to require a lot of code, which is usually a sign to me that I might be doing something wrong.
There are many IPC primitives, all possible to use.
A simple one is using a named pipe: If the pipe doesn't exist then the program creates it and starts as usual. Then it polls the pipe at regular intervals to see if something can be received on the pipe, in which case the program receives it (and discards it) and puts itself to the "front".
If, on the other hand, the named pipe exists, then the program sends a simple dummy message through it, and exits.
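A rough sketch of the named-pipe idea follows. The function name claim_or_notify is invented here, and instead of only testing whether the pipe exists, this variant relies on open() with O_WRONLY | O_NONBLOCK failing with ENXIO when no process has the FIFO open for reading, which also covers a FIFO left behind by a crashed instance. Two instances starting at the very same moment can still both decide they are first, so treat it as an illustration rather than a race-free design.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if we are the first instance (*rfd is then the read end
   to poll), 0 if a running instance was notified, -1 on error. */
int claim_or_notify(const char *path, int *rfd)
{
    /* Create the FIFO if needed; an existing one is fine. */
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    /* Opening for writing only succeeds if somebody is reading. */
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd != -1) {
        ssize_t n = write(wfd, "!", 1);     /* the dummy message */
        close(wfd);
        return n == 1 ? 0 : -1;
    }
    if (errno != ENXIO)
        return -1;

    /* Nobody is listening: we are the first instance, or the previous
       one died and left the FIFO behind. Become the reader. */
    *rfd = open(path, O_RDONLY | O_NONBLOCK);
    return *rfd == -1 ? -1 : 1;
}
```

The first instance then polls the returned descriptor at regular intervals (or with poll()), discards whatever it reads, and raises its window.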
I would use a flag file, e.g. /run/service-name/pid with the PID of the first running instance. A new instance would check this file; if it does not exist, create it; if it does, send a SIGUSR1 to the PID in the file.
@Some programmer dude's answer above provides a bit more flexibility.

How to drop privilege temporarily from root?

I am developing a daemon running as root, but need to call an API as the user. I checked the API code, and it uses getuid() to get the user.
If the root user drops privileges with setuid(), they can't be restored to root. If I call seteuid(), the API will still act as user uid=0.
I think forking before accessing the API and calling setuid in the child process should work, but even with COW, it will be expensive if the API is called many times. Is it possible to solve the problem without using a process pool?
Yes! Create a single process to call the API with the appropriate UID and communicate with the rest of the program through a pipe, a UNIX domain socket, or shared memory [1].
I mean, fork only once and keep the privileged user running another process. Then create communication between the two if needed and as needed. Also, you might want to consider using dbus since it also integrates perfectly with systemd and on modern linux you want your daemon to interact nicely with both.
Note: I am by no means an expert on the subject, but this is a simple idea that seems clear to me. You don't need to create a process for every call to the API. This is a good example of the XY problem: the real problem that you want to solve has nothing to do with avoiding multiple fork() calls, because doing that is the wrong solution in the first place. You only need to fork() once, drop privileges and stay there without privileges, communicating with the parent process if/as needed.
[1] Any IPC mechanism that works for you.
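To make the fork-once idea concrete, here is a minimal sketch using a Unix domain socket pair. start_worker is a made-up name, and api_call() is only a stand-in for the real API (here it just reports what getuid() returns). A real worker would also set its supplementary groups before dropping the uid.

```c
#define _GNU_SOURCE             /* setresuid(), setresgid(), SOCK_CLOEXEC */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the API that looks at getuid(). */
static uid_t api_call(void)
{
    return getuid();
}

/* Fork one long-lived worker that permanently drops to uid/gid and
   answers one-byte requests with the result of api_call().
   Returns the worker's pid and stores the parent's socket in *sock. */
pid_t start_worker(uid_t uid, gid_t gid, int *sock)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid > 0) {              /* parent keeps its privileges */
        close(sv[1]);
        *sock = sv[0];
        return pid;
    }

    /* Worker: drop privileges for good, group first, then user. */
    close(sv[0]);
    if (setresgid(gid, gid, gid) == -1 || setresuid(uid, uid, uid) == -1)
        _exit(1);

    char request;
    while (read(sv[1], &request, 1) == 1) {
        uid_t seen = api_call();
        if (write(sv[1], &seen, sizeof seen) != (ssize_t)sizeof seen)
            break;
    }
    _exit(0);                   /* parent closed its end: we are done */
}
```

The daemon writes one byte per request and reads the reply; the fork() cost is paid once, no matter how many API calls follow.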
Just call seteuid(2) to do the appropriate unprivileged work. seteuid(2) lets you switch the effective user id between the real user id (the one that launched the suid program, or root in your case) and the saved set-user-ID (the one the suid program belongs to), so there is no problem regaining the privileged user id afterwards (as the saved user id is root, you can switch back to it again and again).
If you change uids with setuid(2) as root (or in a setuid-root program), you change all of them (effective, saved and real uids), and there's no way back.
Look at the next example:
File pru49015.c:
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <unistd.h>   /* getuid(), geteuid(), seteuid() */

int main(int argc, char **argv)
{
    int opt;
    uid_t suid = getuid();  /* real uid: the id we return to at the end */
    uid_t uid = 0;

    while ((opt = getopt(argc, argv, "i:")) != -1) {
        switch (opt) {
        case 'i': uid = (uid_t)atoi(optarg); break;
        }
    }

    /* execute this program with root privileges, like setuid root, for example */
    printf("real uid=%d; effective uid=%d\n", (int)getuid(), (int)geteuid());
    if (seteuid(uid) == -1)   /* change to the non-privileged id configured */
        perror("seteuid");
    printf("real uid=%d; effective uid=%d\n", (int)getuid(), (int)geteuid());
    if (seteuid(suid) == -1)  /* return back to the uid we started from */
        perror("seteuid");
    printf("real uid=%d; effective uid=%d\n", (int)getuid(), (int)geteuid());
    return 0;
}
You will get an output like this:
$ pru49015 -i 37
real uid=502; effective uid=0
real uid=502; effective uid=37
real uid=502; effective uid=502
when used as a setuid-root program
If you use it as root, you'll get the following output:
$ sudo pru$$ -i 37
real uid=0; effective uid=0
real uid=0; effective uid=37
real uid=0; effective uid=0
The mechanism is that a setuid program may switch its effective user id between the user who ran it (the real user id) and the user the program is setuid to (kept as the saved set-user-ID) as many times as it wants.
You can store the former effective uid in the saved UID of the process:
uid_t real = getuid();
uid_t privileged = geteuid();
setresuid(real, real, privileged);
do_API_call(); // API's getuid() call now returns real
setresuid(real, privileged, -1); // allowed, since saved==privileged
There's a corresponding setresgid to use saved GIDs, too.
Note that this answer is specific to Linux (as per question tags). A similar call exists on HP-UX and some BSD systems, but I haven't checked that the semantics are identical.
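For what it's worth, here is the same setresuid() idea wrapped in a function with error checking, as a sketch only. call_as_user and record_uids are names I made up; record_uids stands in for the API call and simply records what getuid() and geteuid() return. Aborting when the privileges cannot be restored is my own choice, since continuing in an unknown state seems worse.

```c
#define _GNU_SOURCE             /* getresuid(), setresuid() */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the API: stores real and effective uid
   into the two-element array passed as arg. */
void record_uids(void *arg)
{
    uid_t *out = arg;
    out[0] = getuid();
    out[1] = geteuid();
}

/* Run fn(arg) with real and effective uid set to user, keeping the
   current effective uid in the saved uid so it can be restored. */
int call_as_user(uid_t user, void (*fn)(void *), void *arg)
{
    uid_t real, effective, saved;
    if (getresuid(&real, &effective, &saved) == -1)
        return -1;

    /* Park the privileged uid in the saved set-user-ID. */
    if (setresuid(user, user, effective) == -1)
        return -1;

    fn(arg);                    /* getuid() inside fn now returns user */

    /* First regain the effective uid (allowed: it equals the saved
       uid), then put real and saved back as they were. */
    if (setresuid((uid_t)-1, effective, (uid_t)-1) == -1 ||
        setresuid(real, effective, saved) == -1)
        abort();
    return 0;
}
```

Keep in mind that on Linux with glibc the change applies to every thread of the process while fn runs.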
Actually, on further reading setreuid() should be sufficient (and POSIX-conformant). setuid() says:
If the effective UID of the caller is root (more precisely: if the caller has the CAP_SETUID capability), the real UID and saved set-user-ID are also set.
and
If the user is root or the program is set-user-ID-root, special care must be taken. The setuid() function checks the effective user ID of the caller and if it is the superuser, all process-related user ID's are set to uid. After this has occurred, it is impossible for the program to regain root privileges.
but there is no such statement for setreuid().
From here:
Normally when a process is executed, the effective, real, and saved user and group IDs are all set to the real user and group ID of the process's parent, respectively. However, when the setuid bit is set on an executable, the effective and saved user IDs are set to the user ID that owns the file.

SIGHUP from a terminal

Experts,
I have a client that connects over ssh to a server (it gets a tty allocated). I have a process A that is running on the server. Now, whenever the client disconnects, I need A to know about the tty that vanishes.
I was thinking since SSHD knows the session dying (after timeout or a simple exit), it can generate a signal to process A.
Is there any other way that A can get information about the tty that vanishes like listening on SIGHUP for the tty? I am writing the code in C on Linux.
Appreciate your help.
POSIX.1 provides a facility, utmpx, which lists the currently logged in users, their terminals, and other information. In Linux, that is the same as utmp; see man 5 utmp for further information.
OpenSSH does maintain utmp records.
Here is a simple example, that lists all users currently logged in from remote machines, the terminal they are using, and the initial process group the user owns:
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <utmpx.h>

int main(void)
{
    struct utmpx *entry;

    setutxent();
    while ((entry = getutxent()))
        if (entry->ut_type == USER_PROCESS && entry->ut_host[0] != '\0')
            printf("%s is logged in on /dev/%s from %s and owns process group %d\n",
                   entry->ut_user, entry->ut_line, entry->ut_host,
                   (int)getpgid(entry->ut_pid));
    return 0;
}
In your case, I would expect process A to maintain a list of remotely connected users, and periodically do a similar loop as above to update the status of known entries and to add new entries; and remove entries that are no longer seen.
New entries then match a "login" event, entries that are no longer seen a "logout" event (and deleted after the loop), and all other events are "still logged in" users.
The loop above is quite lightweight in terms of CPU time used and I/O used. The utmp records (/var/run/utmp in most Linux machines) are in binary form, and if frequently accessed, usually in page cache. Entries are relatively small, and even on servers with a lot of users the file read is well under a megabyte in size. Still, I wouldn't do it in a tight loop.
Personally, I would use inotify to wait for CLOSE_WRITE events on the UTMPX_FILE file (/var/run/utmp on most Linux machines), and reread the records after each event. That way the service would block on the read() on the inotify file descriptor most of the time (not wasting any CPU time), and pretty much immediately react to any login/logout events.
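The inotify part could be sketched as below. utmp_watch_open and utmp_watch_wait are names invented for this example; the path is a parameter so you can pass /var/run/utmp (or UTMPX_FILE where your libc defines it). After each wakeup, process A would rerun the getutxent() loop and diff the result against its list.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Start watching path for "closed after writing" events.
   Returns an inotify descriptor, or -1 on error. */
int utmp_watch_open(const char *path)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd == -1)
        return -1;
    if (inotify_add_watch(fd, path, IN_CLOSE_WRITE) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the watched file has been written and closed.
   Returns the number of such events read, or -1 on error. */
int utmp_watch_wait(int fd)
{
    char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
        __attribute__((aligned(__alignof__(struct inotify_event))));

    ssize_t n = read(fd, buf, sizeof buf);  /* sleeps here most of the time */
    if (n <= 0) {
        perror("inotify read");
        return -1;
    }

    int events = 0;
    for (char *p = buf; p < buf + n; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if (ev->mask & IN_CLOSE_WRITE)
            events++;
        p += sizeof *ev + ev->len;          /* step to the next event */
    }
    return events;
}
```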
You face two problems, both difficult. The succinct answer is "you can't"; the longer answer is "you can't without making significant modifications".
A signal relays very little information other than the fact that it occurred. If you use sigaction() and SA_SIGINFO, you can find the process ID of the process that sent the signal, but under your scheme, that would be sshd, which isn't dreadfully helpful. Thus, it will be hard (nigh on impossible) to get the information about which terminal via the signal. Obviously, other schemes can be defined, but you'd have to write the information to a file, or something similar.
You'd have to modify sshd to record the information about which terminal it allocates (or is allocated) to its child processes, and then arrange for it to send that information to your Process A when a child terminates. That would be tricky, at best.
These two factors alone make it rather difficult. If you still want to do it, then the way I'd try is by getting sshd to run a special process of your devising, which in turn forks and the child runs the process that sshd would otherwise run. The parent (a) records which terminal the child is connected to, and (b) waits for the child to terminate. When it does, it writes the terminal information to somewhere that Process A will find it, and exits. You still have to revise sshd, and you have to devise a mechanism whereby the parent process knows what to run as the child process (but that's probably not very hard; you leave the argument list unchanged, but simply have sshd exec your monitor process instead of whatever is specified as argv[0]…the parent uses argv[0] as the file argument to execvp()).
This scheme minimizes the changes to sshd (but does still require a non-standard version). And you have to write the parent code carefully, and it has to cooperate with Process A. All decidedly non-trivial.

Determine if a process is running?

Is there an easy way to determine if a certain process is running?
I need to know if an instance of my program is running in the background, and if not fork and create the background process.
Normally the race-free way of doing this is:
Open a lock file / pid file for writing (but do not truncate it)
Attempt to take an exclusive lock on it (using fcntl or flock) without blocking
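If that fails with EAGAIN (flock reports it as EWOULDBLOCK, which is the same value on Linux; fcntl may report EACCES instead), then the other process is already running.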
The file descriptor should now be inherited by the daemon and left open for its lifetime
The advantage of doing this over simply storing a PID, is that if somebody reuses the PID, you won't get a false positive.
The biggest problem with storing the pid in the file is that a low-numbered pid used by a system start up daemon can get reused on a subsequent reboot by a different daemon. I have seen this happen.
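A minimal sketch of the lock-file steps, using flock(). The name acquire_instance_lock and the return-value convention are made up for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to become the single running instance.
   Returns the locked descriptor (>= 0) on success,
   -1 if another instance already holds the lock,
   -2 on any other error. */
int acquire_instance_lock(const char *path)
{
    /* Open for writing, creating if needed, but do not truncate:
       a running instance may have stored something in the file. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -2;

    /* Exclusive lock, without blocking. */
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int already_running = (errno == EWOULDBLOCK);
        close(fd);
        return already_running ? -1 : -2;
    }

    /* Keep fd open for the lifetime of the daemon. The lock belongs to
       the open file description, so it survives fork() and is released
       by the kernel when the last copy is closed, even after a crash. */
    return fd;
}
```

Call it before forking into the background and simply never close the returned descriptor.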
This is usually done using pidfiles: a file in /var/run/[name].pid containing only the process ID returned by fork().
if pidfile exists:
    exit()
else:
    create pidfile
    pid = start_background()
    pidfile.write(pid)
On shutdown: remove pidfile
Linux software, by and large, does not care about the exclusivity of programs, only the resources they use. "Caring" is most often provided by the implementation (e.g. the infrastructure of the distro).
For instance, suppose you want to run a program, but an earlier copy has locked up or turned zombie and you have no way to kill it, or it is running as a different user performing some other function. Why should the program care whether another copy of itself is running? Having it do so only seems like an unnecessary restriction.
If it's a process that opens a socket (like a TCP port), have the program fail if it can't open the socket. If it needs exclusive access to a file, have it fail if it can't get it. Support a PID file, but don't make it mandatory.
You'll see this methodology all over GNU software, which is part of what makes it so versatile.
