I need to call a sockets "setopt" routine from kernel space - c

I need to make changes to a TCP socket (although I'd like to keep it generic) from kernel space. A USB driver will receive a message and need to make this change to a given socket struct.
Directly calling the function requires userspace memory and the workarounds are not something I can place in production. So a straightforward call seems unlikely. Another solution I saw was to replicate the portions that expect userspace memory, but this solution is not something I want to put in production either.
I am considering writing a userspace program to talk to the driver via procfs and make the call. The driver would place data in the file to instruct the userspace app to do various things.
Is there a better way to accomplish this?

Depending on the option, you can call sock_setsockopt or tcp_setsockopt from within kernel space. There are also functions for IPv4 and IPv6 options.
Or, if you just want to manipulate particular options, you can view how these functions are implemented, and then use the same implementation for your particular options. For example, if you wanted to adjust TCP_CONGESTION directly, tcp_setsockopt does this (after validating the passed in values):
lock_sock(sk);
err = tcp_set_congestion_control(sk, name, true, true,
ns_capable(sock_net(sk)->user_ns,
CAP_NET_ADMIN));
release_sock(sk);

Related

Calling system calls from the kernel code

I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from the user space the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
I would like to invoke this call from the kernel space. I got some basic idea from here How do I use a Linux System call from a Linux Kernel Module
Here are the steps I took to achieve this:
To make sure that the virtual address of the kernel remains valid, I have used set_fs(), get_fs() and get_fd().
Since sys_perf_event_open() is defined in /include/linux/syscalls.h I have included that in the code.
Eventually, the code for calling the systems call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’ ". Why is this popping up when the header file defining it is included already? Does it have to something with the way one should call system calls from within the kernel code?
In general (not specific to Linux) the work done for systems calls can be split into 3 categories:
switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters that's needed because the calling conventions used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
the final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code); then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from kernel will corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess it is your architecture) and starting from kernel versions v4.17 it is now a hard requirement not to use such interface (but for a few exceptions). It was possible to invoke system calls directly prior to this version but now the error you are seeing pops up (that's why there are plenty of tutorials on the web using sys_*). The proposed alternative in the Linux documentation is to define a wrapper between the syscall and the actual syscall's code that can be called within the kernel as any other function:
int perf_event_open_wrapper(...) {
// actual perf_event_open() code
}
SYSCALL_DEFINE5(perf_event_open, ...) {
return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
Which kernel version are we talking about?
Anyhow, you could either get the address of the sys_call_table by looking at the System map file, or if it is exported, you can look up the symbol (Have a look at kallsyms.h), once you have the address to the syscall table, you may treat it as a void pointer array (void **), and find your desired functions indexed. i.e sys_call_table[__NR_open] would be open's address, so you could store it in a void pointer and then call it.
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to the userland, and should not be really used from inside the kernel, thus such practice should be avoided.
calling system calls from kernel code
(I am mostly answering to that title; to summarize: it is forbidden to even think of that)
I don't understand your actual problem (I feel you need to explain it more in your question which is unclear and lacks a lot of useful motivation and context). But a general advice -following the Unix philosophy- is to minimize the size and vulnerability area of your kernel or kernel module code, and to deport, as much as convenient, such code in user-land, in particular with the help of systemd, as soon as your kernel code requires some system calls. Your question is by itself a violation of most Unix and Linux cultural norms.
Have you considered to use efficient kernel to user-land communication, in particular netlink(7) with socket(7). Perhaps you also
want some driver specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is exactly fit for your (unexplained) needs. And eventd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain and I tend to believe it is a symptom of a major misunderstanding of operating system kernels in general. Please take time to read Operating Systems: Three Easy Pieces to understand OS philosophy.

How to use Readlink

How do I use Readlink for fetching the values.
The answer is:
Don't do it
At least not in the way you're proposing.
You specified a solution here without specifying what you really want to do [and why?]. That is, what are your needs/requirements? Assuming you get it, what do you want to do with the filename? You posted a bare fragment of your userspace application but didn't post any of your kernel code.
As a long time kernel programmer, I can tell you that this won't work, can't work, and is a terrible hack. There is a vast difference in methods to use inside the kernel vs. userspace.
/proc is strictly for userspace applications to snoop on kernel data. The /proc filesystem drivers assume userspace, so they always do copy_to_user. Data will be written to user address space, and not kernel address space, so this will never work from within the kernel.
Even if you could use /proc from within the kernel, it is a genuinely awful way to do it.
You can get the equivalent data, but it's a bit more complicated than that. If you're intercepting the read syscall inside the kernel, you [already] have access to the current task struct and the fd number used in the call. From this, you can locate the struct for the given open file, and get whatever you want, directly, without involving /proc at all. Use this as a starting point.
Note that doing this will necessitate that you read kernel documentation, sources for filesystem drivers, syscalls, etc. How to lock data structures and lists with the various locking methods (e.g. RCU, rw locks, spinlocks). Also, per-cpu variables. kernel thread preemptions. How to properly traverse the necessary filesystem related lists and structs to get the information you want. All this, without causing lockups, panics, segfaults, deadlocks, UB based on stale or inconsistent/dynamically changing data.
You'll need to study all this to become familiar with the way the kernel does things internally, and understand it, before you try doing something like this. If you had, you would have read the source code for the /proc drivers and already known why things were failing.
As a suggestion, forget anything that you've learned about how a userspace application does things. It won't apply here. Internally, the kernel is organized in a completely different way than what you've been used to.
You have no need to use readlink inside the kernel in this instance. That's the way a userspace application would have to do it, but in the kernel it's like driving 100 miles out of your way to get data you already have nearby, and, as I mentioned previously, won't even work.

How can I emulate a memory I/O device for unit testing on linux?

How can I emulate a memory I/O device for unit testing on Linux?
I'm writing a unit test for some source code for embedded deployment.
The code is accessing a specific address space to communicate with a chip.
I would like to unit test(UT) this code on Linux.
The unit test must be able to run without human intervention.
I need to run the UT as a normal user.
The code must being tested must be exactly the source code being run on the target system.
Any ideas of where I could go for inspiration on how to solve this?
Can an ordinary user somehow tell the MMU that a particular memory allocation must be done at a specific address.
Or that a data block must be in a particular memory areas?
As I understand it:
sigsegv can't be used; since after the return from the handler the same mem access code will be called again and fail again. ( or by accident the memory area might actually have valid data in it, just not what I would like)
Thanks
Henry
First, make the address to be read an injected dependency of the code, instead of a hard-coded dependency. Now you don't have to worry about the location under test conditions, it can be anything you like.
Then, you may also need to inject a function to read/write from/to the magic address as a dependency, depending what you're testing. Now you don't have to worry about how it's going to trick the code being tested into thinking it's performing I/O. You can stub/mock/whatever the hardware I/O behavior.
It's quite difficult to test low-level code under the conditions you describe, whilst also keeping it super-efficient in non-test mode, because you don't want to introduce too many levels of indirection.
"Exactly the source code" can hide a multitude of sins, though, depending how you interpret it. For example, your "dependency injection" could be via a macro, so that the unit source is "the same", but you've completely changed what it does with a sneaky -D compiler option.
AFAIK you need to create a block device (I am not sure whether character device will work). Create a kernel module that maps that memory range to itself.
create read/write function, so whenever that memory range is touched, those read/write functions are called.
register those read/write function with the kernel, so that whenever there is read/write to those addresses, kernel is invoked and read/write functionality is performed by kernel on behalf of user.

How can i call a function that written in kernel module, from the user program?

sample driver created and loaded successfully, in that a user defined function is written, it does some actions. i need to write a user program that calls the user defined function in the driver module.
need help in following cases.
How can i get access to the driver code from a user program ?.
How can i call a function that written in kernel module, from the user program ?.
thanks.
You can make your driver to react on writes (or if necessary, ioctl) to a /dev/xxx file or a /proc/xxx file. Also, you can create a new syscall, but that's more of a toy as the module would only work on custom built kernels.
Edit: try http://www.faqs.org/docs/kernel/x571.html (on character device drivers.)
It depends on what your function does, but in general:
If you want to store and show properties in the form of values (e.g. current brightness of a backlight), the standard way of doing would be using sysfs: http://kernel.org/doc/Documentation/filesystems/sysfs.txt
If you want to write/read values from a device (real or virtual), export memory or IO regions of a device to user space, or more generally control a device (e.g. setting resolution of a camera and capturing frames), you would use a character or block devices with the read/write/mmap and ioctl functions: http://luv.asn.au/overheads/chrdev-talk.html
Finally if your function just controls something from the kernel, then sysfs or procfs should be the way to go. I am not sure why people are still using procfs nowadays, except for misc devices maybe.
So in general, you need to export your kernel functions to user space through files, by defining hooks that will be called when the file is opened, read, written (to copy data from/to user space), mmap'ed (to share memory areas without copying) or when an ioctl is invoked (to perform more general control).
VDSO:
http://en.wikipedia.org/wiki/VDSO
Kernel mode Linux:
http://www.yl.is.s.u-tokyo.ac.jp/~tosh/kml/
for Qn.1:
read/write/ioctl see file_operations
for Qn.2:
1) system calls
2) driver - ioctl

how to acess and change variable of kernel space from user space

i,
I have posted query previously and i am repeating same I want to modify igmpv3 (Linux)
which is inbuilt in kernel2.6.-- such that it reads a value from a file and appropriately decides reserved(res 1) value inside the igmpv3 paket which is sent by a host.
I want to add more to above question by saying that this is more a generic question of changing variable
of kernel space from user space.
Thanks in advance for your help.
Regards,
Bhavin
From the perspective of a user land program, you should think of the driver as a "black box" with well defined interfaces instead of code with variables you can change. Using this mental model, there are four ways (i.e. interfaces) to communicate control information to the driver that you should consider:
Command line options. You can pass parameters to a kernel module which are then available to it during initialization.
IOCTLs. This is the traditional way of passing control information to a driver, but this mechanism is a little more cumbersome to use than sysfs.
proc the process information pseudo-file system. proc creates files in the /proc directory which user land programs can read and sometimes write. In the past, this interface was appropriated to also communicate with drivers. Although proc looks similarly to sysfs, newer drivers (Linux 2.6) should use sysfs instead as the intent of the proc is to report on the status of processes.
sysfs is a pseudo-file system used to export information about drivers and devices. See the documentation in the kernel (Documentation/filesystems/sysfs.txt) for more details and code samples. For your particular case, pay attention to the "store" method.
Depending on when you need to communicate with the driver (i.e. initialization or run time), you should add either a new command line option or a new sysfs entry to change how the driver treats the value of reserved fields in the packet.
With regard to filp_open, the function's comment is
/**
* This is the helper to open a file from kernelspace if you really
* have to. But in generally you should not do this, so please move
* along, nothing to see here..
*/
meaning there are better ways than this to do what you want. Also see this SO question for more information on why drivers generally should not open files.
You normally can't. Only structures exposed in /proc and /sys or via a module parameter can be modified from userspace.

Resources