Ensure that UID/GID check in system call is executed in RCU-critical section - c

Task
I have a small kernel module I wrote for my Raspberry Pi 2 which implements an additional system call for generating power consumption metrics. I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it. Otherwise, the call just skips the bulk of its body and returns success.
Background Work
I've read into the issue at length, and I've found a similar question on SO, but there are numerous problems with it, from my perspective (noted below).
Question
The linked question notes that struct task_struct contains a pointer element to struct cred, as defined in linux/sched.h and linux/cred.h. The latter of the two headers doesn't exist on my system(s), and the former doesn't show any declaration of a pointer to a struct cred element. Does this make sense?
Silly mistake. This is present in its entirety in the kernel headers (i.e. /usr/src/linux-headers-$(uname -r)/include/linux/cred.h); I was searching in the gcc-build headers in /usr/include/linux.
Even if the above worked, it doesn't mention whether I would be getting the real, effective, or saved UID for the process. Is it even possible to get each of these three values from within the system call?
cred.h already contains all of these.
Is there a safe way in the kernel module to quickly determine which groups the user belongs to without parsing /etc/group?
cred.h already contains all of these.
Update
So, the only valid question remaining is the following:
Note, that iterating through processes and reading process's
credentials should be done under RCU-critical section.
... how do I ensure my check is run in this critical section? Are there any working examples of how to accomplish this? I've found some existing kernel documentation that instructs readers to wrap the relevant code with rcu_read_lock() and rcu_read_unlock(). Do I just need to wrap any read operations against the struct cred and/or struct task_struct data structures?

First, adding a new system call is rarely the right way to do things. It's best to do things via the existing mechanisms because you'll benefit from already-existing tools on both sides: existing utility functions in the kernel, existing libc and high-level language support in userland. Files are a central concept in Linux (like other Unix systems) and most data is exchanged via files, either device files or special filesystems such as proc and sysfs.
I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it.
You can't do this in the kernel. Not only is it wrong from a design point of view, but it isn't even possible. The kernel knows nothing about user names. The only knowledge about users in the kernel is that some privileged actions are reserved to user 0 in the root namespace (don't forget that last part! And if that's new to you it's a sign that you shouldn't be doing advanced things like adding system calls). (Many actions actually look for a capability rather than being root.)
What you want to use is sysfs. Read the kernel documentation and look for non-ancient online tutorials or existing kernel code (code that uses sysfs is typically pretty clean nowadays). With sysfs, you expose information through files under /sys. Access control is up to userland — have a sane default in the kernel and do things like calling chgrp, chmod or setfacl in the boot scripts. That's one of the many wheels that you don't need to reinvent on the user side when using the existing mechanisms.
The sysfs show method automatically takes a lock around the file, so only one kernel thread can be executing it at a time. That's one of the many wheels that you don't need to reinvent on the kernel side when using the existing mechanisms.
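As a rough illustration of the sysfs route, here is a minimal sketch of a module that exposes one read-only value. The names power_metrics and power_mw are made up for this example, and sysfs_emit() assumes a 5.10 or newer kernel (older kernels would use scnprintf(buf, PAGE_SIZE, ...) instead). It builds only against a kernel tree, not as a userland program.

```c
#include <linux/init.h>
#include <linux/kobject.h>
#include <linux/module.h>
#include <linux/sysfs.h>

/* Hypothetical metric; a real module would update it elsewhere. */
static unsigned long power_mw;
static struct kobject *power_kobj;

/* Called when userland reads /sys/kernel/power_metrics/power_mw. */
static ssize_t power_mw_show(struct kobject *kobj,
			     struct kobj_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%lu\n", power_mw);
}

/* __ATTR_RO wires the attribute to power_mw_show with mode 0444. */
static struct kobj_attribute power_mw_attr = __ATTR_RO(power_mw);

static int __init power_init(void)
{
	int ret;

	power_kobj = kobject_create_and_add("power_metrics", kernel_kobj);
	if (!power_kobj)
		return -ENOMEM;

	ret = sysfs_create_file(power_kobj, &power_mw_attr.attr);
	if (ret)
		kobject_put(power_kobj);
	return ret;
}

static void __exit power_exit(void)
{
	/* Dropping the last reference also removes the sysfs entries. */
	kobject_put(power_kobj);
}

module_init(power_init);
module_exit(power_exit);
MODULE_LICENSE("GPL");
```

Restricting who may read the file is then a matter of chgrp/chmod on /sys/kernel/power_metrics/power_mw from a boot script, as described above.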

The linked question concerns a fundamentally different issue. To quote:
Please note that the uid that I want to get is NOT of the current process.
Clearly, a thread which is not the currently executing thread can in principle exit at any point or change credentials. Measures need to be taken to ensure the stability of whatever we are fiddling with. RCU is often the right answer. The answer provided there is somewhat wrong in the sense that there are other ways as well.
Meanwhile, if you want to operate on the thread executing the very code, you know it won't exit (because it is executing your code as opposed to an exit path). That leaves the question of the stability of the credentials. Good news: they are also guaranteed to be there and can be accessed with no preparation whatsoever. This can be easily verified by checking the code doing credential switching.
We are left with the question what primitives can be used to do the access. To that end one can use make_kuid, uid_eq and similar primitives.
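To make the two cases concrete, here is a small sketch, assuming a kernel with kuid_t (3.5 or later). ALLOWED_UID and both function names are invented for the example, and 1000 is only a guess at the numeric UID of user "pi". This is a kernel fragment and builds only inside a module.

```c
#include <linux/cred.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/uidgid.h>

#define ALLOWED_UID 1000	/* assumption: numeric UID of user "pi" */

/*
 * Credentials of the calling task: no lock is needed, because only the
 * task itself can change its own credentials.
 * current_uid() / current_euid() / current_suid() give the real,
 * effective and saved UID respectively.
 */
static bool caller_is_allowed(void)
{
	kuid_t euid = current_euid();
	kuid_t allowed = make_kuid(&init_user_ns, ALLOWED_UID);

	return uid_eq(euid, GLOBAL_ROOT_UID) || uid_eq(euid, allowed);
}

/*
 * Credentials of some other task: the cred pointer may be replaced at
 * any time, so dereference it only inside an RCU read-side section.
 * The caller must already hold a valid reference to @task.
 */
static kuid_t euid_of(struct task_struct *task)
{
	kuid_t euid;

	rcu_read_lock();
	euid = __task_cred(task)->euid;
	rcu_read_unlock();

	return euid;
}
```

For the system call in the question only the first function is needed; the second shows what the RCU advice in the linked question is about.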
The real question is why is this a syscall as opposed to just a /proc file.
See this blogpost for somewhat elaborated description of credential handling: http://codingtragedy.blogspot.com/2015/04/weird-stuff-thread-credentials-in-linux.html

Related

How to use Readlink

How do I use Readlink for fetching the values.
The answer is:
Don't do it
At least not in the way you're proposing.
You specified a solution here without specifying what you really want to do [and why?]. That is, what are your needs/requirements? Assuming you get it, what do you want to do with the filename? You posted a bare fragment of your userspace application but didn't post any of your kernel code.
As a long time kernel programmer, I can tell you that this won't work, can't work, and is a terrible hack. There is a vast difference in methods to use inside the kernel vs. userspace.
/proc is strictly for userspace applications to snoop on kernel data. The /proc filesystem drivers assume userspace, so they always do copy_to_user. Data will be written to user address space, and not kernel address space, so this will never work from within the kernel.
Even if you could use /proc from within the kernel, it is a genuinely awful way to do it.
You can get the equivalent data, but it's a bit more complicated than that. If you're intercepting the read syscall inside the kernel, you [already] have access to the current task struct and the fd number used in the call. From this, you can locate the struct for the given open file, and get whatever you want, directly, without involving /proc at all. Use this as a starting point.
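As a starting point, here is a sketch of going from an fd number to the path of the open file without touching /proc. The function name log_path_of_fd is made up. It assumes it runs in process context on behalf of the task that owns the fd, since it may sleep in kmalloc(). It is a kernel fragment, not a standalone program.

```c
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/limits.h>
#include <linux/printk.h>
#include <linux/slab.h>

/* Log the path behind @fd in the current task's file table. */
static void log_path_of_fd(unsigned int fd)
{
	struct file *file;
	char *buf, *path;

	file = fget(fd);	/* takes a reference, NULL if fd is not open */
	if (!file)
		return;

	buf = kmalloc(PATH_MAX, GFP_KERNEL);
	if (buf) {
		/* d_path() builds the name at the end of buf and returns
		 * a pointer into it, or an ERR_PTR on failure. */
		path = d_path(&file->f_path, buf, PATH_MAX);
		if (!IS_ERR(path))
			pr_info("fd %u -> %s\n", fd, path);
		kfree(buf);
	}

	fput(file);		/* drop the reference taken by fget() */
}
```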
Note that doing this will necessitate that you read kernel documentation, sources for filesystem drivers, syscalls, etc. How to lock data structures and lists with the various locking methods (e.g. RCU, rw locks, spinlocks). Also, per-cpu variables. kernel thread preemptions. How to properly traverse the necessary filesystem related lists and structs to get the information you want. All this, without causing lockups, panics, segfaults, deadlocks, UB based on stale or inconsistent/dynamically changing data.
You'll need to study all this to become familiar with the way the kernel does things internally, and understand it, before you try doing something like this. If you had, you would have read the source code for the /proc drivers and already known why things were failing.
As a suggestion, forget anything that you've learned about how a userspace application does things. It won't apply here. Internally, the kernel is organized in a completely different way than what you've been used to.
You have no need to use readlink inside the kernel in this instance. That's the way a userspace application would have to do it, but in the kernel it's like driving 100 miles out of your way to get data you already have nearby, and, as I mentioned previously, won't even work.

Is this is a good way to intercept system calls?

I am writing a tool. A part of that tool will be its ability to log the parameters of the system calls. Alright I can use ptrace for that purpose, but ptrace is pretty slow. A faster method that came to my mind was to modify the glibc. But this is getting difficult, as gcc magically inserts its own built in functions as system call wrappers than using the code defined in glibc. Using -fno-builtin is also not helping there.
So I came up with this idea of writing a shared library, which includes every system call wrapper, such as mmap and then perform the logging before calling the actual system call wrapper function. For example pseudo code of what my mmap would look like is given below.
void *mmap(...)
{
void *ret;
log_parameters(...);
ret = call_original_mmap(...);
...
return ret;
}
Then I can use LD_PRELOAD to load this library first. Do you think this idea will work, or am I missing something?
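For reference, the wrapper idea above is usually written with dlsym(RTLD_NEXT, ...) to find the original function. The following is only a sketch under some assumptions: Linux with glibc, a dynamically linked target, and a made-up counter named mmap_calls. Logging goes through snprintf() and write() so that the wrapper does not allocate memory itself.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

/* Number of calls seen by the wrapper (illustrative only). */
static unsigned long mmap_calls;

/* Same signature as the libc wrapper, so the dynamic linker resolves
 * calls to mmap() here first. */
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
	static mmap_fn real_mmap;
	char line[128];
	int n;

	if (!real_mmap)
		real_mmap = (mmap_fn)dlsym(RTLD_NEXT, "mmap");
	if (!real_mmap)
		return MAP_FAILED;

	mmap_calls++;

	/* Log the parameters; logging is best effort. */
	n = snprintf(line, sizeof(line), "mmap(len=%zu, prot=%d, flags=%d, fd=%d)\n",
		     length, prot, flags, fd);
	if (n > 0) {
		size_t len = (size_t)n < sizeof(line) ? (size_t)n : sizeof(line) - 1;

		if (write(STDERR_FILENO, line, len) < 0) {
			/* ignore logging failures */
		}
	}

	return real_mmap(addr, length, prot, flags, fd, offset);
}
```

Built with something like gcc -shared -fPIC wrap.c -o libwrap.so -ldl and run as LD_PRELOAD=./libwrap.so ./app. It only sees calls that go through the dynamic symbol mmap, so statically linked programs, direct syscall instructions and libc's own internal mappings are not logged.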
No method that you can possibly dream up in user-space will work seamlessly with any application. Fortunately for you, there is already support for doing exactly what you want to do in the kernel. Kprobes and Kretprobes allow you to examine the state of the machine just preceding and following a system call.
Documentation here: https://www.kernel.org/doc/Documentation/kprobes.txt
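A minimal kprobe module looks roughly like the sketch below. The symbol name __x64_sys_mmap is an assumption that fits x86-64 kernels with syscall wrappers; other architectures and older kernels use different names, so check /proc/kallsyms. Decoding the syscall arguments from regs is architecture-specific and is left out here. This builds only as a kernel module.

```c
#include <linux/init.h>
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/sched.h>

/* Runs just before the probed function executes. */
static int pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("%s entered by %s (pid %d)\n",
		p->symbol_name, current->comm, current->pid);
	return 0;	/* let the probed function run normally */
}

static struct kprobe kp = {
	.symbol_name = "__x64_sys_mmap",	/* assumption: x86-64 name */
	.pre_handler = pre_handler,
};

static int __init probe_init(void)
{
	return register_kprobe(&kp);
}

static void __exit probe_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");
```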
As others have mentioned, if the binary is statically linked, the dynamic linker will skip over any attempts to intercept functions using libdl. Instead, you should consider launching the process yourself and detouring the entry point to the function you wish to intercept.
This means launching the process yourself, intercepting its execution, and rewriting its memory to place a jump instruction at the beginning of a function's definition in memory to a new function that you control.
If you want to intercept the actual system calls and can't use ptrace, you will either have to find the execution site for each system call and rewrite it, or you may need to overwrite the system call table in memory and filter out everything except the process you want to control.
All system calls from user-space go through an interrupt handler to switch to kernel mode. If you find this handler, you can probably add something there.
EDIT I found this http://cateee.net/lkddb/web-lkddb/AUDITSYSCALL.html. Linux kernels: 2.6.6–2.6.39, 3.0–3.4 have support for system call auditing. This is a kernel module that has to be enabled. Maybe you can look at the source for this module if it's not too confusing.
If the code you are developing is process-related, sometimes you can develop alternative implementations without breaking the existing code. This is helpful if you are rewriting an important system call and would like a fully functional system with which to debug it.
For your case, you are rewriting the mmap() algorithm to take advantage of an exciting new feature (or enhancing it with a new feature). Unless you get everything right on the first try, it would not be easy to debug the system: a nonfunctioning mmap() system call is certain to result in a nonfunctioning system. As always, there is hope.
Often, it is safe to keep the remaining algorithm in place and construct your replacement on the side. You can achieve this by using the user id (UID) as a conditional with which to decide which algorithm to use:
/* Kernels before 2.6.29 spelled this test: current->uid != 7777 */
if (from_kuid(&init_user_ns, current_uid()) != 7777) {
/* old algorithm .. */
} else {
/* new algorithm .. */
}
All users except UID 7777 will use the old algorithm. You can create a special user, with UID 7777, for testing the new algorithm. This makes it much easier to test critical process-related code.

Where can I find system call source code?

In Linux where can I find the source code for all system calls given that I have the source tree? Also if I were to want to look up the source code and assembly for a particular system call is there something that I can type in terminal like my_system_call?
You'll need the Linux kernel sources in order to see the actual source of the system calls. Manual pages, if installed on your local system, only contain the documentation of the calls and not their source itself.
Unfortunately for you, system calls aren't stored in just one particular location in the whole kernel tree. This is because various system calls can refer to different parts of the system (process management, filesystem management, etc.) and therefore it would be infeasible to store them apart from the part of the tree related to that particular part of the system.
The best thing you can do is look for the SYSCALL_DEFINE[0-6] macro. It is used (obviously) to define the given block of code as a system call. For example, fs/ioctl.c has the following code :
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
/* do freaky ioctl stuff */
}
Such a definition means that the ioctl syscall is declared and takes three arguments. The number next to the SYSCALL_DEFINE means the number of arguments. For example, in the case of getpid(void), declared in kernel/timer.c (kernel/sys.c in newer kernels), we have the following code:
SYSCALL_DEFINE0(getpid)
{
return task_tgid_vnr(current);
}
Hope that clears things up a little.
From an application's point of view, a system call is an elementary and atomic operation done by the kernel.
The Assembly Howto explains what is happening, in terms of machine instruction.
Of course, the kernel is doing a lot of things when handling a syscall.
Actually, you could almost believe that the entire kernel code is devoted to handling system calls (this is not entirely true, but almost; from an application's point of view, the kernel is only visible through system calls). The other answer by Daniel Kamil Kozar explains which kernel function starts the handling of a given system call (but very often, many other parts of the kernel indirectly participate in system calls; for example, the scheduler participates indirectly in implementing fork because it manages the child process created by a successful fork syscall).
I know it's old, but I was searching for the source for _system_call() too and found this tidbit
Actual code for system_call entry point can be found in /usr/src/linux/kernel/sys_call.S Actual code for many of the system calls can be found in /usr/src/linux/kernel/sys.c, and the rest are found elsewhere. find is your friend.
I assume this is dated, because I don't even have that file. However, grep found ENTRY(system_call) in arch/x86/kernel/entry_64.S and seems to be the thing that calls the individual system calls. I'm not up on my intel-syntax x86 asm right now, so you'll have to look and see if this is what you wanted.

Share a variable between C and Labview?

What is the best way to permit C code to regularly access the instantaneous value of an integer generated from a separate Labview program?
I have time-critical C code that controls a scientific experiment and records data once every 20ms. I also have some labview code that operates a different instrument and outputs an integer value every 100ms. I want my C code to be able to record the value from labview. What is the best way to do this?
One idea is to have Labview write the integer to file in a loop, and have the C code read the value of the file in a loop. (I could add a second thread to my C code if necessary.) Labview can also link to C dll's. So I might be able to write a DLL in C that somehow facilitates sharing between the two programs. Is that advisable? How would I do that?
I have a similar application here and use TCP sockets with the TCP_NODELAY option set (it disables the Nagle algorithm, which buffers small packets before sending them). Sockets should allow for a 100 ms update rate without problems, although the actual network delay will always remain an unknown variable. For my application this does not matter as long as it stays under a certain limit (this is also checked for by sending a timestamp with each packet and big red dialog boxes if the timestamp delta becomes too large :]). Does it matter for your application? I.e., is it important that whenever the LV instrument acquires a new sample its value has to make it to the C app within x ms?
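On the C side, setting the option is one setsockopt() call; in the socket headers it is spelled TCP_NODELAY. The sketch below assumes POSIX sockets (on Windows the same option exists in Winsock, with different headers and closesocket() instead of close()), and the function name open_nodelay_socket is invented for the example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket with Nagle's algorithm disabled.
 * Returns the descriptor, or -1 on failure. */
int open_nodelay_socket(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
		close(fd);
		return -1;
	}

	return fd;	/* ready for connect() or bind()/listen() */
}
```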
You might get the dll approach working, but it's not as straightforward as sockets and it will make the two applications more dependent on each other. Variable access will be pretty much instantaneous though. I see at least two possibilities:
put your entire C app in a dll (might seem a weird approach at first but it works), and have LV load it and call methods on it. E.g. to start your app LV calls the dll's Start() method, then in the loop where LV acquires its samples it calls the dll's NewSampleValue() method or so. Also means your app cannot run standalone unless you write a separate host process for it.
look into shared process memory, and have the C app and another dll share common memory. LV will load that dll and call a method on it to write a value to the shared memory, then the C app can read it after polling a flag (which needs a lock!).
it might also be possible to have the C app call the LV program using dll/activeX/? calls but I don't know how that system works..
I would definitely stay away from the file approach: disk I/O can be a real bottleneck and it also has the locking problem which is messy to solve with files. C app cannot read the file while LV is writing it and vice-versa which might introduce extra delays.
On a sidenote, you can see that each of the approaches above either use a push or pull model (the TCP one can be implemented in both ways), this might affect your final decision of which way to go.. Push = LV signals the C app directly, pull = C app has to poll a flag or ask LV for the value.
I'm an employee at National Instruments and I wanted to make sure you didn't miss the Network Variable API that is provided with LabWindows/CVI, the National Instruments C development environment. The Network Variable API will allow you to easily communicate with the LabVIEW program over Shared Variables (http://zone.ni.com/devzone/cda/tut/p/id/4679). While reading these links, note that a Network Variable and a Shared Variable are the same thing - the different names are unfortunate...
The nice thing about the Network Variable API is that it allows easy interoperability with LabVIEW, it provides a strongly typed communication mechanism, and it provides a callback model for notification when the Network/Shared variable's properties (such as value) change.
You can obtain this API by installing LabWindows/CVI, but it is not necessary to use the LabWindows/CVI environment. The header file is available at C:\Program Files\National Instruments\CVI2010\include\cvinetv.h, and the .lib file located at C:\Program Files\National Instruments\CVI2010\extlib\msvc\cvinetv.lib can be linked in with whatever C development tools you are using.
I followed up on one of @stijn's ideas:
have the C app and another dll share common memory. LV will load that dll and call a method on it to write a value to the shared memory, then the C app can read it after polling a flag (which needs a lock!).
I wrote the InterProcess library, available here: http://github.com/samuellab/InterProcess
InterProcess is a compact general library that sets up windows shared memory using CreateFileMapping() and MapViewOfFile(). It allows the user to seamlessly store values of any type (int, char, your struct.. whatever) in an arbitrary number of named fields. It also implements Mutex objects to avoid collisions and race conditions, and it abstracts away all of this in a clean and simple interface. Tested on Windows XP. Should work with any modern Windows.
For interfacing between my existing C code and labview, I wrote a small wrapper DLL that sits on top of InterProcess and exposes only the specific functions that my C code or labview need to access. In this way, all of the shared memory is completely abstracted away.
Hopefully someone else will find this code useful.

How to access and change a variable of kernel space from user space

Hi,
I have posted this query previously and I am repeating it. I want to modify igmpv3 (Linux)
which is inbuilt in kernel2.6.-- such that it reads a value from a file and appropriately decides the reserved (res 1) value inside the igmpv3 packet which is sent by a host.
I want to add more to above question by saying that this is more a generic question of changing variable
of kernel space from user space.
Thanks in advance for your help.
Regards,
Bhavin
From the perspective of a user land program, you should think of the driver as a "black box" with well defined interfaces instead of code with variables you can change. Using this mental model, there are four ways (i.e. interfaces) to communicate control information to the driver that you should consider:
Command line options. You can pass parameters to a kernel module which are then available to it during initialization.
IOCTLs. This is the traditional way of passing control information to a driver, but this mechanism is a little more cumbersome to use than sysfs.
proc, the process information pseudo-file system. proc creates files in the /proc directory which user land programs can read and sometimes write. In the past, this interface was appropriated to also communicate with drivers. Although proc looks similar to sysfs, newer drivers (Linux 2.6) should use sysfs instead, as the intent of proc is to report on the status of processes.
sysfs is a pseudo-file system used to export information about drivers and devices. See the documentation in the kernel (Documentation/filesystems/sysfs.txt) for more details and code samples. For your particular case, pay attention to the "store" method.
Depending on when you need to communicate with the driver (i.e. initialization or run time), you should add either a new command line option or a new sysfs entry to change how the driver treats the value of reserved fields in the packet.
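For the command line option route, the kernel side is only a few lines. This is a sketch: the parameter name res1 is made up, and where the variable is actually consulted when the packet is built is up to your modified code.

```c
#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob: value to put in the reserved field of the packet. */
static int res1;

/* Mode 0644 also creates a file under /sys/module/<modname>/parameters/
 * that root can write while the code is running. */
module_param(res1, int, 0644);
MODULE_PARM_DESC(res1, "Value for the reserved field (illustrative)");
```

For a loadable module the value is given as res1=1 on the insmod or modprobe command line. For code built into the kernel it is given as <modname>.res1=1 on the kernel command line.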
With regard to filp_open, the function's comment is
/**
* This is the helper to open a file from kernelspace if you really
* have to. But in generally you should not do this, so please move
* along, nothing to see here..
*/
meaning there are better ways than this to do what you want. Also see this SO question for more information on why drivers generally should not open files.
You normally can't. Only structures exposed in /proc and /sys or via a module parameter can be modified from userspace.
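Once a value is exposed that way, changing it from user space is ordinary file I/O. Here is a small sketch of two helpers (the names write_int_attr and read_int_attr are invented here) that write and read an integer through such a file.

```c
#include <stdio.h>

/* Write @value to the attribute file at @path.
 * Returns 0 on success, -1 on failure. */
int write_int_attr(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	/* The kernel's store method runs when the data is flushed, so a
	 * rejected value shows up as an error from fclose(). */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/* Read an integer from the attribute file at @path into @value.
 * Returns 0 on success, -1 on failure. */
int read_int_attr(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%d", value) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

With a module parameter the call would look like write_int_attr("/sys/module/mymod/parameters/res1", 1), where mymod and res1 are placeholder names.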

Resources