How can I get the PID of the user process which triggered my kernel module's file_operations.read routine (i.e., which process is reading /dev/mydev)?
When your read function is executing, it's doing so in the context of the process that issued the system call. You should thus be able to use current, i.e. current->pid.
These days, we have some helper functions defined in sched.h. In the case of pid, you can use:
pid = task_pid_nr(current);
to get the current task's pid.
Here is the comment taken from include/linux/sched.h as of v3.8:
/*
 * the helpers to get the task's different pids as they are seen
 * from various namespaces
 *
 * task_xid_nr()     : global id, i.e. the id seen from the init namespace;
 * task_xid_vnr()    : virtual id, i.e. the id seen from the pid namespace of
 *                     current.
 * task_xid_nr_ns()  : id seen from the ns specified;
 *
 * set_task_vxid()   : assigns a virtual id to a task;
 *
 * see also pid_nr() etc in include/linux/pid.h
 */
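On a kernel 2.6.39 ARM build, if current->pid does not work, it can be done with:
pid_nr(get_task_pid(current, PIDTYPE_PID))
Note that get_task_pid() takes a reference on the returned struct pid, so keep the pointer and release it with put_pid() once you have the number; the one-liner above leaks that reference. PIDTYPE_PID can be replaced by PIDTYPE_PGID or PIDTYPE_SID. The header source is at include/linux/pid.h, as Yasushi pointed out.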
Which of these approaches works depends on which header files the code includes.
Related
I'm running LTP (Linux Test Project) on an embedded device. The device gets stuck in the following while loop in test case setfsgid03, because getgrgid() always returns NULL when it is called by nobody.
It works fine when it is called by root on the embedded device, and it works fine on an x86 Linux host when it is called by nobody.
Is it caused by some configuration of Linux on the device?
The relevant code snippet is below:
gid = 1;
while (!getgrgid(gid))
    gid++;
getgrgid() reads the entries from /etc/group or, with glibc, more generally from the sources specified in /etc/nsswitch.conf. If /etc/group does not exist, or it has no groups other than GID 0, this code will loop at least until gid wraps around (or overflows, if it is a signed type). If the only other entry is nobody at GID -2 (65534 as a 16-bit value), it will also take ages to reach that GID.
All in all, the code is utterly bad. I'd just ensure that there is an entry in /etc/group with, say, GID 2; the proper way to find a defined non-root GID would be to call getgrent_r() repeatedly until the returned record has gr_gid != 0, and to fail if NULL is returned before such a record is found.
I will give a detailed explanation of the program, leading to the issue regarding netlink socket communication.
The last paragraph asks the actual question I need an answer to, so you might want to start by peeking at it first.
Disclaimer before I start:
- I searched before asking here and did not find a complete solution / alternative for my issue.
- I know how to initialize a module and insert it into the kernel.
- I know how to handle communication between a module and user space without netlink sockets, i.e. by assigning struct file_operations function pointers that are invoked whenever a user attempts to read/write etc., and answering the user with copy_to_user / copy_from_user.
- This topic refers to Linux, the Mint 17 distribution.
- Language is C
Okay, so I am building a system with 3 components:
1. user.c : user application (user types commands here)
2. storage.c : storage device ('virtual' disk-on-key)
3. device.ko : kernel module (used as proxy between 1. and 2.)
The purpose of this system is to let a user:
- Copy files to the virtual disk-on-key device (2), like an "upload" from a local directory that belongs to the user.
- Save files from the virtual device to a local directory, like a "download" from the device storage to the user's directory.
Design:
Assuming programs (1) and (2) are compiled and running, and (3) has been successfully inserted with the bash command 'sudo insmod device.ko', the flow should be as follows (a simulation, of course):
Step 1 (in user.c) -> user types 'download file.txt'
Step 2 (in device.ko) -> the device recognizes that the user has tried to 'write' to it (the user is actually just passing the string "download file.txt") and invokes the 'write' implementation we set in struct file_operations earlier, in module_init().
The device (kernel module) now passes the data (a string with a command) to the storage.c application, expecting an answer that will later be returned to the user.c application.
Step 3 (in storage.c) -> now, let's say this program runs a busy-wait loop of recvmsg(); that is how a request from the module is recognized: the storage device sees that the module has sent a request (a string with a command / data). The storage program then runs some function 'X' that sends the requested data using sendmsg() somewhere inside it.
Now, here comes the issue.
Usually, in all of the examples I've looked at on the web, the communication between the kernel module and user space (the storage.c program in our case) using netlink is triggered by user space and not vice versa. That is, sendmsg() in user space invokes the 'request(struct sk_buff *skb)' method, which is set in module_init() as follows:
struct netlink_kernel_cfg cfg = {
    .input = request, // when storage.c sends something, it invokes the request function
};
So when storage.c performs something like:
sendmsg(sock_fd, &msg, 0); // send a msg to the module
the module invokes and runs:
static void request(struct sk_buff *skb) {
    /* msg_size, netlink_holder, pid, skb_out, result and sock_netlink are
       assumed to be declared at file scope elsewhere in the module */
    char *msg = "Hello from kernel";
    msg_size = strlen(msg) + 1; // + 1 so the terminating NUL is sent too
    netlink_holder = (struct nlmsghdr *)skb->data;
    printk(KERN_INFO "Netlink received msg payload:%s\n", (char *)nlmsg_data(netlink_holder));
    pid = netlink_holder->nlmsg_pid; // pid of sending process
    skb_out = nlmsg_new(msg_size, 0);
    if (!skb_out) {
        printk(KERN_ERR "Failed to allocate new skb\n");
        return;
    }
    netlink_holder = nlmsg_put(skb_out, 0, 0, NLMSG_DONE, msg_size, 0); // add a new netlink message to an skb. more info: http://elixir.free-electrons.com/linux/v3.2/source/include/net/netlink.h#L491
    NETLINK_CB(skb_out).dst_group = 0; // not in multicast group
    memcpy(nlmsg_data(netlink_holder), msg, msg_size); // copy the string, including its NUL
    result = nlmsg_unicast(sock_netlink, skb_out, pid); // send data to storage. more info: http://elixir.free-electrons.com/linux/latest/source/include/net/netlink.h#L598
    if (result < 0)
        printk(KERN_INFO "Error while sending back to user\n");
}
and from all that big chunk, the only thing that im interesting in is actually doing this:
result=nlmsg_unicast(sock_netlink,skb_out,pid); // send data to storage.
BUT I can't use nlmsg_unicast() without having the struct sk_buff*, which is provided automatically only when there's an invocation from storage.c!
To sum up everything:
How do I send a message from device.ko (the kernel module) to user space without having to wait for a request to invoke it, i.e. without relying on the struct sk_buff parameter provided to the 'request()' method shown earlier?
Hope this sums up the point.
Thanks.
The only catch is that the user-space program has to contact the kernel module first, so that the module learns the pid (netlink port ID) of your user program.
Once you have the pid, you can construct skb_out yourself at any time and send it with netlink_unicast or nlmsg_unicast; the incoming skb is only needed to learn that pid.
The pid is always needed: store it in a static variable and let your user-space program connect to device.ko once, to establish a long-lived link.
This question was asked in 2017, so I believe the OP has already found the answer :D
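A user-space sketch of storage.c's side of that scheme, under stated assumptions: the protocol number NETLINK_USER (31 here) is hypothetical and must match whatever device.ko passes to netlink_kernel_create(), and build_register_msg / register_with_module are illustrative names. The program sends one message at startup so that request() can record nlmsg_pid in a static variable; after that, the module can build an skb with nlmsg_new()/nlmsg_put() and nlmsg_unicast() it to that pid from its write() handler, without waiting for another request.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NETLINK_USER 31   /* hypothetical: must match the unit used by device.ko */
#define MAX_PAYLOAD 256   /* hypothetical upper bound for one message */

/* Build one netlink message carrying text (NUL included) in buf.
 * buf must be suitably aligned for struct nlmsghdr.
 * Returns the header, or NULL if buflen is too small. */
static struct nlmsghdr *build_register_msg(void *buf, size_t buflen, const char *text)
{
    size_t payload = strlen(text) + 1;
    if (NLMSG_SPACE(payload) > buflen)
        return NULL;
    struct nlmsghdr *nlh = buf;
    memset(nlh, 0, NLMSG_SPACE(payload));
    nlh->nlmsg_len = NLMSG_LENGTH(payload);
    nlh->nlmsg_pid = (__u32)getpid();   /* request() stores this pid */
    memcpy(NLMSG_DATA(nlh), text, payload);
    return nlh;
}

/* Open a netlink socket bound to our pid and send one "register" message
 * to the kernel (nl_pid 0). Keep the returned fd open and block in
 * recv()/recvmsg() on it to receive whatever the module sends later.
 * Returns the socket fd, or -1 on error. */
static int register_with_module(void)
{
    union {
        struct nlmsghdr hdr;                 /* forces correct alignment */
        char raw[NLMSG_SPACE(MAX_PAYLOAD)];
    } buf;
    struct sockaddr_nl self = { .nl_family = AF_NETLINK, .nl_pid = (__u32)getpid() };
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK, .nl_pid = 0 };

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_USER);
    if (fd < 0)
        return -1;
    struct nlmsghdr *nlh = build_register_msg(&buf, sizeof buf, "register");
    if (bind(fd, (struct sockaddr *)&self, sizeof self) < 0 || nlh == NULL ||
        sendto(fd, nlh, nlh->nlmsg_len, 0, (struct sockaddr *)&kernel, sizeof kernel) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Blocking in recv() on the returned socket also replaces the busy-wait loop described in Step 3.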
So I am supposed to achieve the following behavior.
In this project you are asked to add a new field to the task descriptor. The name and type of the field are: int casper;
If casper=0 : The process is visible to all, i.e. it is listed in the /proc file system and it can be seen using “ps”, “pstree”, “top”, ...
If casper=1 : The process is visible only to processes which have the same user id, i.e. for all other processes, it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
If casper=2 : The process is visible only to processes which are in the same group, i.e. for all other processes, it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
If casper=3 : The process is invisible for all, i.e. it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
I have already modified the task_struct definition and its default value for the init process, and added the necessary code to the fork system call.
I did some research but couldn't find an obvious way to do this. I assume that /proc is generated per request, so I can get the task_struct of the process that made the request and populate /proc accordingly. Am I on the right track?
Yes, you are. /proc is a virtual filesystem generated by the kernel upon request. Check the following article for more details: Linux VFS
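The visibility rule itself can be written as a pure function; below is a user-space sketch of it. It assumes the comparison uses the owner's and viewer's user and group IDs (the assignment does not say whether real or effective IDs are meant), that casper=3 hides the process even from its owner as specified, and that unknown values behave like 0. casper_visible is a hypothetical name. In the kernel, a check like this would presumably be applied both where /proc lists PIDs (proc_pid_readdir() in fs/proc/base.c) and where it looks up /proc/<pid> directly (proc_pid_lookup()), with current's credentials as the viewer.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of the casper rule: may a viewer with (viewer_uid,
 * viewer_gid) see a process owned by (owner_uid, owner_gid)? */
static bool casper_visible(int casper, uid_t owner_uid, gid_t owner_gid,
                           uid_t viewer_uid, gid_t viewer_gid)
{
    switch (casper) {
    case 0:
        return true;                      /* visible to all */
    case 1:
        return viewer_uid == owner_uid;   /* same user id only */
    case 2:
        return viewer_gid == owner_gid;   /* same group only */
    case 3:
        return false;                     /* invisible to all */
    default:
        return true;                      /* assumption: unknown values behave like 0 */
    }
}
```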
I'm having some difficulties with Linux kernel namespaces today, specifically correlating PIDs inside a unique PID namespace with those in the global PID namespace.
I need to be able to do one of the following:
a) Kill a process from the global scope using a PID assigned by the namespace
OR
b) Translate a namespace specific PID to a global PID, so I can kill the PID from the global scope
OR
c) Enable a process within a PID namespace to report to me its global PID, so I can kill the PID from the global scope
There is some discussion here of the process structures that contain the PID information in namespace scenarios. I'm not sure how, or if, I can access these structures from a userland application, or whether I need to add support via a kernel hack.
Why?
I have an application which currently uses network namespaces. I am adding support for PID namespaces. Here is how it currently works:
Before the introduction of PID namespaces:
The main application currently launches a bash console in another network namespace. It then uses that bash console to start programs and has those programs report their current PID. When the main application wants to kill a subprocess in that network namespace, it just tells the OS to kill the PID reported back.
With PID namespaces (broken state):
The main application currently launches a bash console in another network and PID namespace. It then uses that bash console to start programs and has those programs report their current PID. However, the PID reported back is not valid in the global PID namespace (it may be 10, when the PID in the global namespace is 56000). As a result, the main application cannot kill the subprocess in that network + PID namespace.
As always, any guidance is appreciated
One approach could be to search the pid queue for the process descriptor that matches the target PID: if the reporting shell is in the same namespace, it can make a system call to get the other process's descriptor, and loop over /proc/<pid> to find that process descriptor.
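From user space, option (b) can be sketched with the NSpid field of /proc/<pid>/status, which lists a process's PID in each nested PID namespace, outermost first. This assumes Linux 4.1 or later (older kernels lack NSpid) and that the main application reads a /proc mounted for the global PID namespace. parse_nspid and find_global_pid are hypothetical helpers; because several namespaces can contain the same small PID, the search also matches the namespace inode of /proc/<pid>/ns/pid, which you would take from the bash console you started.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse the NSpid line of a /proc/<pid>/status text into out[0..max-1].
 * out[0] is the PID in the namespace of the /proc mount being read, the
 * last entry is the PID in the process's own (innermost) namespace.
 * Returns the number of PIDs stored; 0 if there is no NSpid line. */
static int parse_nspid(const char *status, pid_t *out, int max)
{
    const char *p;
    if (strncmp(status, "NSpid:", 6) == 0)
        p = status + 6;
    else if ((p = strstr(status, "\nNSpid:")) != NULL)
        p += 7;
    else
        return 0;

    int n = 0;
    while (n < max) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p < '0' || *p > '9')
            break;                      /* end of the NSpid line */
        char *end;
        out[n++] = (pid_t)strtol(p, &end, 10);
        p = end;
    }
    return n;
}

/* Scan /proc for the process whose innermost PID is ns_pid and whose PID
 * namespace has inode ns_ino. Returns its PID as seen by this /proc
 * (usable with kill()), or -1 if none is found. */
static pid_t find_global_pid(pid_t ns_pid, ino_t ns_ino)
{
    DIR *dir = opendir("/proc");
    if (dir == NULL)
        return -1;
    pid_t found = -1;
    struct dirent *ent;
    while (found == -1 && (ent = readdir(dir)) != NULL) {
        char *end;
        long pid = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;                   /* not a /proc/<pid> directory */
        char path[64];
        struct stat st;
        snprintf(path, sizeof path, "/proc/%ld/ns/pid", pid);
        if (stat(path, &st) != 0 || st.st_ino != ns_ino)
            continue;                   /* other namespace, or no permission */
        snprintf(path, sizeof path, "/proc/%ld/status", pid);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                   /* process may have exited meanwhile */
        char buf[8192];
        size_t len = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[len] = '\0';
        pid_t ids[32];
        int n = parse_nspid(buf, ids, 32);
        if (n > 0 && ids[n - 1] == ns_pid)
            found = (pid_t)pid;
    }
    closedir(dir);
    return found;
}
```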
You may also want to take a look here: http://lkml.indiana.edu/hypermail/linux/kernel/0707.0/1701.html
Especially this part:
/*
* the helpers to get the pid's id seen from different namespaces
*
* pid_nr() : global id, i.e. the id seen from the init namespace;
* pid_vnr() : virtual id, i.e. the id seen from the namespace this pid
* belongs to. this only makes sence when called in the
* context of the task that belongs to the same namespace;
* pid_nr_ns() : id seen from the ns specified.
*
* see also task_xid_nr() etc in include/linux/sched.h
*/
static inline pid_t pid_nr(struct pid *pid)
{
    pid_t nr = 0;
    if (pid)
-        nr = pid->nr;
+        nr = pid->numbers[0].nr;
    return nr;
}
Hope it helps!
Best Regards!
I am using following C function to create multiple network namespaces from a single process instance:
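#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

void create_namespace(const char *ns_name)
{
    char ns_path[100];
    snprintf(ns_path, sizeof(ns_path), "%s/%s", "/var/run/netns", ns_name);
    close(open(ns_path, O_RDONLY|O_CREAT|O_EXCL, 0));
    unshare(CLONE_NEWNET);
    mount("/proc/self/ns/net", ns_path, "none", MS_BIND, NULL);
}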
After my process creates all the namespaces and I add a tap interface to any one of the network namespaces (with the ip link set tap1 netns ns1 command), I actually see this interface in all of the namespaces (presumably, this is actually a single namespace that goes under different names).
But if I create multiple namespaces using multiple processes, everything works just fine.
What could be wrong here? Do I have to pass any additional flags to the unshare() to get this working from a single process instance? Is there a limitation that a single process instance can't create multiple network namespaces? Or is there a problem with mount() call, because /proc/self/ns/net is actually mounted multiple times?
Update:
It seems that the unshare() function creates multiple network namespaces correctly, but all the mount points in /var/run/netns/ actually refer to the first network namespace that was mounted in that directory.
Update2:
It seems that the best approach is to fork() another process and execute the create_namespace() function from there. Anyway, I would be glad to hear of a better solution that does not involve a fork() call, or at least to get a confirmation that it is impossible to create and manage multiple network namespaces from a single process.
Update3:
I am able to create multiple namespaces with unshare() by using the following code:
int main() {
    create_namespace("a");
    system("ip tuntap add mode tap tapa");
    system("ifconfig -a"); // shows lo and tapa interfaces
    create_namespace("b");
    system("ip tuntap add mode tap tapb");
    system("ifconfig -a"); // shows lo and tapb interfaces, but not tapa, so this is the second namespace created
}
But after the process terminates, when I execute ip netns exec a ifconfig -a and ip netns exec b ifconfig -a, both commands seem to be executed in namespace a. So the actual problem is storing the references to the namespaces (or calling mount() the right way, but I am not sure if that is possible).
Network namespaces are, by design, created with a call to clone, and a process's namespace can be changed afterwards with unshare. Note that even if you do create a new network namespace with unshare, you in fact just modify the network stack of your running process. unshare is unable to modify the network stack of other processes, so you won't be able to create another one with unshare alone.
In order to work, a new network namespace needs a new network stack, and so it needs a new process. That's all.
Good news is that it can be made very lightweight with clone, see:
"Clone() differs from the traditional fork() system call in UNIX, in that it allows the parent and child processes to selectively share or duplicate resources."
You can choose to duplicate only the network stack (and share the memory space, the table of file descriptors and the table of signal handlers). Your new network process can then be made more like a thread than a real fork.
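A minimal sketch of such a thread-like clone(), assuming glibc's clone() wrapper and the privilege to create network namespaces (CAP_SYS_ADMIN). Only CLONE_NEWNET gives the child something new; CLONE_VM, CLONE_FILES and CLONE_SIGHAND share memory, descriptors and signal handlers with the parent. netns_child and run_in_new_netns are illustrative names, and the 256 KiB stack is an arbitrary size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Hypothetical workload: runs inside the new network namespace, where only
 * a loopback device (initially down) exists. Open sockets etc. here. */
static int netns_child(void *arg)
{
    (void)arg;
    return 0;
}

/* Start netns_child() with its own network stack but sharing the caller's
 * memory, file descriptor table and signal handlers.
 * Returns the child's exit status, or -errno. */
static int run_in_new_netns(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -ENOMEM;

    int flags = CLONE_NEWNET | CLONE_VM | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    /* the stack grows down on the usual architectures: pass its top */
    pid_t pid = clone(netns_child, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid < 0) {
        int err = -errno;
        free(stack);
        return err;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    int err = (w < 0) ? -errno : 0;   /* capture errno before free() */
    free(stack);                       /* safe: the child has exited */
    if (err != 0)
        return err;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Without the privilege, clone() fails (typically with EPERM) and the sketch returns that error instead of running the child.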
You can manipulate them from C code or with Linux kernel and/or LXC tools.
For instance, to add a device to new network namespace, it's as simple as:
echo $PID > /sys/class/net/ethX/new_ns_pid
See this page for more info about CLI available.
On the C side, one can take a look at the lxc-unshare implementation. Despite its name, it uses clone, as you can see (lxc_clone is here). One can also look at the LTP implementation, where the author has chosen to use fork directly.
EDIT: There is a trick that you can use to make them persistent, but you will still need to fork, even temporarily.
Take a look at this code from iproute2 (I have removed error checking for clarity):
snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
/* Create the base netns directory if it doesn't exist */
mkdir(NETNS_RUN_DIR, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);
/* Create the filesystem state */
fd = open(netns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
[...]
close(fd);
unshare(CLONE_NEWNET);
/* Bind the netns last so I can watch for it */
mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL);
If you execute this code in a forked process, you'll be able to create new network namespaces at will. To delete one, simply unmount and delete the bind mount:
umount2(netns_path, MNT_DETACH);
if (unlink(netns_path) < 0) [...]
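Wrapped in a fork, the iproute2 steps above might look like the following sketch. create_persistent_netns and delete_persistent_netns are hypothetical helpers; the path argument stands in for NETNS_RUN_DIR/name, error reporting is reduced to -1, and root (CAP_SYS_ADMIN) is assumed. Because unshare() runs only in the short-lived child, the caller's own network namespace never changes, and each bind mount refers to a different child's namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a network namespace that outlives the process by bind-mounting it
 * onto path, doing the unshare() in a child so the caller is unaffected.
 * Returns 0 on success, -1 on failure (path is removed again on failure). */
static int create_persistent_netns(const char *path)
{
    int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0);   /* mount point */
    if (fd < 0)
        return -1;
    close(fd);

    pid_t pid = fork();
    if (pid == 0) {
        /* child: only this process switches namespace */
        if (unshare(CLONE_NEWNET) < 0)
            _exit(1);
        if (mount("/proc/self/ns/net", path, "none", MS_BIND, NULL) < 0)
            _exit(2);
        _exit(0);
    }

    int status = 0;
    if (pid < 0 || waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        unlink(path);
        return -1;
    }
    return 0;
}

/* Remove a namespace created above (the umount2/unlink step). */
static int delete_persistent_netns(const char *path)
{
    if (umount2(path, MNT_DETACH) < 0)
        return -1;
    return unlink(path);
}
```

If path lies under /var/run/netns, ip netns exec should then find each namespace under its own name.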
EDIT2: Another (dirty) trick would simply be to execute the "ip netns add .." command with system().
You only have to bind mount /proc/*/ns/* if you need to access these namespaces from another process, or need a handle to be able to switch back and forth between the two. It is not needed in order to use multiple namespaces from a single process.
unshare does create a new namespace.
clone and fork by default do not create any new namespaces.
There is one "current" namespace of each kind assigned to a process. It can be changed by unshare or setns. The set of namespaces is (by default) inherited by child processes.
Whenever you do open(/proc/N/ns/net), it creates an inode for this file, and all subsequent open()s will return a file that is bound to the same namespace. Details are lost in the depths of the kernel dentry cache.
Also, each process has only one /proc/self/ns/net file entry, and a bind mount does not create new instances of this proc file. Opening those mounted files is exactly the same as opening the /proc/self/ns/net file directly (which will keep pointing to the namespace it pointed to when you first opened it).
It seems that "/proc/*/ns" is half-baked like this.
So, if you only need 2 namespaces, you can:
open /proc/1/ns/net
unshare
open /proc/self/ns/net
and switch between the two.
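Those four steps can be sketched as follows, assuming CAP_SYS_ADMIN and following the answer's choice of /proc/1/ns/net as the handle to the original namespace (which is only the caller's namespace if it shares it with PID 1). open_two_netns and switch_netns are hypothetical names. Note that setns() changes the network namespace of the calling thread only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open handles to two network namespaces: *main_fd for the namespace of
 * PID 1, *second_fd for a new one created with unshare(). On success the
 * caller is left in the new namespace. Returns 0 or -errno. */
static int open_two_netns(int *main_fd, int *second_fd)
{
    int a = open("/proc/1/ns/net", O_RDONLY | O_CLOEXEC);
    if (a < 0)
        return -errno;

    if (unshare(CLONE_NEWNET) < 0) {            /* needs CAP_SYS_ADMIN */
        int err = -errno;
        close(a);
        return err;
    }

    int b = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (b < 0) {
        int err = -errno;
        setns(a, CLONE_NEWNET);                 /* best effort: go back */
        close(a);
        return err;
    }

    *main_fd = a;
    *second_fd = b;
    return 0;
}

/* Move the calling thread into the network namespace behind fd. */
static int switch_netns(int fd)
{
    return setns(fd, CLONE_NEWNET) < 0 ? -errno : 0;
}
```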
For more than 2 you might have to clone(). There seems to be no way to create more than one /proc/N/ns/net file per process.
However, if you do not need to switch between namespaces at runtime, or to share them with other processes, you can use many namespaces like this:
open sockets and run processes for main namespace.
unshare
open sockets and run processes for 2nd namespace (netlink, tcp, etc)
unshare
...
unshare
open sockets and run processes for Nth namespace (netlink, tcp, etc)
Open sockets keep reference to their network namespace, so they will not be collected until sockets are closed.
You can also use netlink to move interfaces between namespaces, by sending a netlink command in the source namespace and specifying the destination namespace either by PID or by namespace FD (the latter you don't have).
You need to switch process namespace before accessing /proc entries that depend on that namespace. Once "proc" file is open, it keeps reference to the namespace.