i,
I have posted query previously and i am repeating same I want to modify igmpv3 (Linux)
which is inbuilt in kernel2.6.-- such that it reads a value from a file and appropriately decides reserved(res 1) value inside the igmpv3 paket which is sent by a host.
I want to add more to above question by saying that this is more a generic question of changing variable
of kernel space from user space.
Thanks in advance for your help.
Regards,
Bhavin
From the perspective of a user land program, you should think of the driver as a "black box" with well defined interfaces instead of code with variables you can change. Using this mental model, there are four ways (i.e. interfaces) to communicate control information to the driver that you should consider:
Command line options. You can pass parameters to a kernel module which are then available to it during initialization.
IOCTLs. This is the traditional way of passing control information to a driver, but this mechanism is a little more cumbersome to use than sysfs.
proc the process information pseudo-file system. proc creates files in the /proc directory which user land programs can read and sometimes write. In the past, this interface was appropriated to also communicate with drivers. Although proc looks similarly to sysfs, newer drivers (Linux 2.6) should use sysfs instead as the intent of the proc is to report on the status of processes.
sysfs is a pseudo-file system used to export information about drivers and devices. See the documentation in the kernel (Documentation/filesystems/sysfs.txt) for more details and code samples. For your particular case, pay attention to the "store" method.
Depending on when you need to communicate with the driver (i.e. initialization or run time), you should add either a new command line option or a new sysfs entry to change how the driver treats the value of reserved fields in the packet.
With regard to filp_open, the function's comment is
/**
* This is the helper to open a file from kernelspace if you really
* have to. But in generally you should not do this, so please move
* along, nothing to see here..
*/
meaning there are better ways than this to do what you want. Also see this SO question for more information on why drivers generally should not open files.
You normally can't. Only structures exposed in /proc and /sys or via a module parameter can be modified from userspace.
Related
I have looked into similar questions on this site (listed at the end) but still feel like missing a couple points, hopefully someone can help here:
Is there a hook into the proc file system that connects the /proc/iomem inode to a function that dumps the information? I wasn't able to find where in proc fs this function lives. I did a grep under the linux source tree fs/proc for iomem, got nothing. So maybe it is a more of a procfs question... The answer to this question might help me to dig up the answer to the next question..
The /proc/iomem has more entries than the BIOS E820 information I extracted from either dmesg or /sys/firmware/memmap (these two are actually consistent with each other). For example, /sys/firmware/memmap does not seem to have pci memory mapped regions. Drivers' init code calls the request_mem_region() and add more info to the map, so somewhere there should be a global variable (root of all resources ?) that remembers this graph?
The questions on stackoverflow I have looked into:
How is /proc/io* populated?
Expose information to /proc/iomem
Content of /proc/iomem
struct resource iomem_resource is what you're looking for, and it is defined and initialized in kernel/resource.c (via proc_create_seq_data()). In the same file, the instance struct seq_operations resource_op defines what happens when you, for example cat the file from userland.
iomem_resource is a globally exported symbol, and is used throughout the kernel, drivers included, to request resources. You can find instances scattered across the kernel of devm_/request_resource() which take either iomem_resource or its sibling ioport_resource based on either fixed settings, or based on configurations. Examples of methods that take configurations are a) device trees which is prevalent in embedded settings, and b) E820 or UEFI, which can be found more on x86.
Starting with b) which was asked in the question, the file arch/x86/kernel/e820.c shows examples of how reserved memory gets inserted into /proc/iomem via insert_resource().
This excellent link has more details on the dynamics of requesting memory map details from the BIOS.
Another alternative sequence (which relies on CONFIG_OF) for how a device driver requests the needed resources is:
The Open Firmware API is traversing the device tree, and finds a matching driver. For example via a struct of_device_id.
The driver defines a struct platform_device which contains both the struct of_device_id and a probe function. This probing function is thus called.
Inside the probe function, a call to platform_get_resource() is made which reads the reg property from the device tree. This property defines the physical memory map for a specific device.
A call to devm_request_mem_region() is made (which is just a call to request_region()) to actually allocate the resources and add it to /proc/iomem.
I need to make changes to a TCP socket (although I'd like to keep it generic) from kernel space. A USB driver will receive a message and need to make this change to a given socket struct.
Directly calling the function requires userspace memory and the workarounds are not something I can place in production. So a straightforward call seems unlikely. Another solution I saw was to replicate the portions that expect userspace memory, but this solution is not something I want to put in production either.
I am considering writing a userspace program to talk to the driver via procfs and make the call. The driver would place data in the file to instruct the userspace app to do various things.
Is there a better way to accomplish this?
Depending on the option, you can call sock_setsockopt or tcp_setsockopt from within kernel space. There are also functions for IPv4 and IPv6 options.
Or, if you just want to manipulate particular options, you can view how these functions are implemented, and then use the same implementation for your particular options. For example, if you wanted to adjust TCP_CONGESTION directly, tcp_setsockopt does this (after validating the passed in values):
lock_sock(sk);
err = tcp_set_congestion_control(sk, name, true, true,
ns_capable(sock_net(sk)->user_ns,
CAP_NET_ADMIN));
release_sock(sk);
Task
I have a small kernel module I wrote for my RaspBerry Pi 2 which implements an additional system call for generating power consumption metrics. I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it. Otherwise, the call just skips the bulk of its body and returns success.
Background Work
I've read into the issue at length, and I've found a similar question on SO, but there are numerous problems with it, from my perspective (noted below).
Question
The linked question notes that struct task_struct contains a pointer element to struct cred, as defined in linux/sched.h and linux/cred.h. The latter of the two headers doesn't exist on my system(s), and the former doesn't show any declaration of a pointer to a struct cred element. Does this make sense?
Silly mistake. This is present in its entirety in the kernel headers (ie: /usr/src/linux-headers-$(uname -r)/include/linux/cred.h), I was searching in gcc-build headers in /usr/include/linux.
Even if the above worked, it doesn't mention if I would be getting the the real, effective, or saved UID for the process. Is it even possible to get each of these three values from within the system call?
cred.h already contains all of these.
Is there a safe way in the kernel module to quickly determine which groups the user belongs to without parsing /etc/group?
cred.h already contains all of these.
Update
So, the only valid question remaining is the following:
Note, that iterating through processes and reading process's
credentials should be done under RCU-critical section.
... how do I ensure my check is run in this critical section? Are there any working examples of how to accomplish this? I've found some existing kernel documentation that instructs readers to wrap the relevant code with rcu_read_lock() and rcu_read_unlock(). Do I just need to wrap an read operations against the struct cred and/or struct task_struct data structures?
First, adding a new system call is rarely the right way to do things. It's best to do things via the existing mechanisms because you'll benefit from already-existing tools on both sides: existing utility functions in the kernel, existing libc and high-level language support in userland. Files are a central concept in Linux (like other Unix systems) and most data is exchanged via files, either device files or special filesystems such as proc and sysfs.
I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it.
You can't do this in the kernel. Not only is it wrong from a design point of view, but it isn't even possible. The kernel knows nothing about user names. The only knowledge about users in the kernel in that some privileged actions are reserved to user 0 in the root namespace (don't forget that last part! And if that's new to you it's a sign that you shouldn't be doing advanced things like adding system calls). (Many actions actually look for a capability rather than being root.)
What you want to use is sysfs. Read the kernel documentation and look for non-ancient online tutorials or existing kernel code (code that uses sysfs is typically pretty clean nowadays). With sysfs, you expose information through files under /sys. Access control is up to userland — have a sane default in the kernel and do things like calling chgrp, chmod or setfacl in the boot scripts. That's one of the many wheels that you don't need to reinvent on the user side when using the existing mechanisms.
The sysfs show method automatically takes a lock around the file, so only one kernel thread can be executing it at a time. That's one of the many wheels that you don't need to reinvent on the kernel side when using the existing mechanisms.
The linked question concerns a fundamentally different issue. To quote:
Please note that the uid that I want to get is NOT of the current process.
Clearly, a thread which is not the currently executing thread can in principle exit at any point or change credentials. Measures need to be taken to ensure the stability of whatever we are fiddling with. RCU is often the right answer. The answer provided there is somewhat wrong in the sense that there are other ways as well.
Meanwhile, if you want to operate on the thread executing the very code, you can know it wont exit (because it is executing your code as opposed to an exit path). A question arises what about the stability of credentials -- good news, they are also guaranteed to be there and can be accessed with no preparation whatsoever. This can be easily verified by checking the code doing credential switching.
We are left with the question what primitives can be used to do the access. To that end one can use make_kuid, uid_eq and similar primitives.
The real question is why is this a syscall as opposed to just a /proc file.
See this blogpost for somewhat elaborated description of credential handling: http://codingtragedy.blogspot.com/2015/04/weird-stuff-thread-credentials-in-linux.html
How do I use Readlink for fetching the values.
The answer is:
Don't do it
At least not in the way you're proposing.
You specified a solution here without specifying what you really want to do [and why?]. That is, what are your needs/requirements? Assuming you get it, what do you want to do with the filename? You posted a bare fragment of your userspace application but didn't post any of your kernel code.
As a long time kernel programmer, I can tell you that this won't work, can't work, and is a terrible hack. There is a vast difference in methods to use inside the kernel vs. userspace.
/proc is strictly for userspace applications to snoop on kernel data. The /proc filesystem drivers assume userspace, so they always do copy_to_user. Data will be written to user address space, and not kernel address space, so this will never work from within the kernel.
Even if you could use /proc from within the kernel, it is a genuinely awful way to do it.
You can get the equivalent data, but it's a bit more complicated than that. If you're intercepting the read syscall inside the kernel, you [already] have access to the current task struct and the fd number used in the call. From this, you can locate the struct for the given open file, and get whatever you want, directly, without involving /proc at all. Use this as a starting point.
Note that doing this will necessitate that you read kernel documentation, sources for filesystem drivers, syscalls, etc. How to lock data structures and lists with the various locking methods (e.g. RCU, rw locks, spinlocks). Also, per-cpu variables. kernel thread preemptions. How to properly traverse the necessary filesystem related lists and structs to get the information you want. All this, without causing lockups, panics, segfaults, deadlocks, UB based on stale or inconsistent/dynamically changing data.
You'll need to study all this to become familiar with the way the kernel does things internally, and understand it, before you try doing something like this. If you had, you would have read the source code for the /proc drivers and already known why things were failing.
As a suggestion, forget anything that you've learned about how a userspace application does things. It won't apply here. Internally, the kernel is organized in a completely different way than what you've been used to.
You have no need to use readlink inside the kernel in this instance. That's the way a userspace application would have to do it, but in the kernel it's like driving 100 miles out of your way to get data you already have nearby, and, as I mentioned previously, won't even work.
sample driver created and loaded successfully, in that a user defined function is written, it does some actions. i need to write a user program that calls the user defined function in the driver module.
need help in following cases.
How can i get access to the driver code from a user program ?.
How can i call a function that written in kernel module, from the user program ?.
thanks.
You can make your driver to react on writes (or if necessary, ioctl) to a /dev/xxx file or a /proc/xxx file. Also, you can create a new syscall, but that's more of a toy as the module would only work on custom built kernels.
Edit: try http://www.faqs.org/docs/kernel/x571.html (on character device drivers.)
It depends on what your function does, but in general:
If you want to store and show properties in the form of values (e.g. current brightness of a backlight), the standard way of doing would be using sysfs: http://kernel.org/doc/Documentation/filesystems/sysfs.txt
If you want to write/read values from a device (real or virtual), export memory or IO regions of a device to user space, or more generally control a device (e.g. setting resolution of a camera and capturing frames), you would use a character or block devices with the read/write/mmap and ioctl functions: http://luv.asn.au/overheads/chrdev-talk.html
Finally if your function just controls something from the kernel, then sysfs or procfs should be the way to go. I am not sure why people are still using procfs nowadays, except for misc devices maybe.
So in general, you need to export your kernel functions to user space through files, by defining hooks that will be called when the file is opened, read, written (to copy data from/to user space), mmap'ed (to share memory areas without copying) or when an ioctl is invoked (to perform more general control).
VDSO:
http://en.wikipedia.org/wiki/VDSO
Kernel mode Linux:
http://www.yl.is.s.u-tokyo.ac.jp/~tosh/kml/
for Qn.1:
read/write/ioctl see file_operations
for Qn.2:
1) system calls
2) driver - ioctl