How to create multiple network namespace from a single process instance - c

I am using following C function to create multiple network namespaces from a single process instance:
void create_namespace(const char *ns_name)
{
char ns_path[100];
snprintf(ns_path, 100, "%s/%s", "/var/run/netns", ns_name);
close(open(ns_path, O_RDONLY|O_CREAT|O_EXCL, 0));
unshare(CLONE_NEWNET);
mount("/proc/self/ns/net", ns_path, "none", MS_BIND , NULL);
}
After my process creates all the namspaces and I add a tap interface to any of the one network namespace (with ip link set tap1 netns ns1 command), then I actually see this interface in all of the namespaces (presumably, this is actually a single namespace that goes under different names).
But, if I create multiple namespaces by using multiple processes, then everything is working just fine.
What could be wrong here? Do I have to pass any additional flags to the unshare() to get this working from a single process instance? Is there a limitation that a single process instance can't create multiple network namespaces? Or is there a problem with mount() call, because /proc/self/ns/net is actually mounted multiple times?
Update:
It seems that unshare() function creates multiple network namespaces correctly, but all the mount points in /var/run/netns/ actually reference to the first network namespace that was mounted in that direcotry.
Update2:
It seems that the best approach is to fork() another process and execute create_namespace() function from there. Anyway, I would be glad to hear a better solution that does not involve fork() call or at least get a confirmation that would prove that it is impossible to create and manage multiple network namespaces from a single process.
Update3:
I am able to create multiple namespaces with unshare() by using the following code:
int main() {
create_namespace("a");
system("ip tuntap add mode tap tapa");
system("ifconfig -a");//shows lo and tapA interface
create_namespace("b");
system("ip tuntap add mode tap tapb");
system("ifconfig -a");//show lo and tapB interface, but does not show tapA. So this is second namespace created.
}
But after the process terminates and I execute ip netns exec a ifconfig -a and ip netns exec b ifconfig -a it seems that both commands were suddenly executed in namespace a. So the actual problem is storing the references to the namespaces (or calling mount() the right way. But I am not sure, if this is possible).

Network Namespaces are, by design, created with a call to clone, and it can be modified after by unshare. Take note that even if you do create a new network namespace with unshare, in fact you just modify network stack of your running process. unshare is unable to modify network stack of other processes, so you won't be able to create another one only with unshare.
In order to work, a new network namespace needs a new network stack, and so it needs a new process. That's all.
Good news is that it can be made very lightweight with clone, see:
Clone() differs from the traditional fork() system call in UNIX, in
that it allows the parent and child processes to selectively share or
duplicate resources.
You are able to divert only on this network stack (and avoid memory space, table of file descriptors and table of signal handlers). Your new network process can be made more like a thread than a real fork.
You can manipulate them with C code or with Linux Kernel and/or LXC tools.
For instance, to add a device to new network namespace, it's as simple as:
echo $PID > /sys/class/net/ethX/new_ns_pid
See this page for more info about CLI available.
On the C-side, one can take a look at lxc-unshare implementation. Despite its name it uses clone, as you can see (lxc_clone is here). One can also look at LTP implementation, where the author has chosen to use fork directly.
EDIT: There is a trick that you can use to make them persistent, but you will still need to fork, even temporarily.
Take a look at this code of ipsource2 (I have removed error checking for clarity):
snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
/* Create the base netns directory if it doesn't exist */
mkdir(NETNS_RUN_DIR, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);
/* Create the filesystem state */
fd = open(netns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
[...]
close(fd);
unshare(CLONE_NEWNET);
/* Bind the netns last so I can watch for it */
mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL)
If you execute this code in a forked process, you'll be able to create new network namespace at will. In order to delete them, you can simply umount and delete this bind:
umount2(netns_path, MNT_DETACH);
if (unlink(netns_path) < 0) [...]
EDIT2: Another (dirty) trick would be simply to execute "ip netns add .." cli with system.

You only have to bind mount /proc/*/ns/* if you need to access these namespaces from another process, or need to get handle to be able to switch back and forth between the two. It is not needed to use multiple namespaces from a single process.
unshare does create new namespace.
clone and fork by default do not create any new namespaces.
there is one "current" namespace of each kind assigned to a process. It can be changed by unshare or setns. Set of namespaces (by default) is inherited by child processes.
Whenever you do open(/proc/N/ns/net), it creates inode for this file,
and all subsequent open()s will return file that is bound to the
same namespace. Details are lost in the depths of kernel dentry cache.
Also, each process has only one /proc/self/ns/net file entry, and
bind mount does not create new instances of this proc file.
Opening those mounted files are exactly the same as opening
/proc/self/ns/net file directly (which will keep pointing to the
namespace it pointed to when you first opened it).
It seems that "/proc/*/ns" is half-baked like this.
So, if you only need 2 namespaces, you can:
open /proc/1/ns/net
unshare
open /proc/self/ns/net
and switch between the two.
For more that 2 you might have to clone(). There seems to be no way to create more than one /proc/N/ns/net file per process.
However, if you do not need to switch between namespaces at runtime, or to share them with other processes, you can use many namespaces like this:
open sockets and run processes for main namespace.
unshare
open sockets and run processes for 2nd namespace (netlink, tcp, etc)
unshare
...
unshare
open sockets and run processes for Nth namespace (netlink, tcp, etc)
Open sockets keep reference to their network namespace, so they will not be collected until sockets are closed.
You can also use netlink to move interfaces between namespaces, by sending netlink command on source namespace, and specifying dst namespace either by PID or namespace FD (the later you don't have).
You need to switch process namespace before accessing /proc entries that depend on that namespace. Once "proc" file is open, it keeps reference to the namespace.

Related

posix_spawn pipe dmesg to python script

I've got several USB to 422 adapters in my test system. I've used FTProg to give each adapter a specific name: Sensor1, Sensor2, etc. They will all be plugged in at power on. I don't want to hard code each adapter to a specific ttyUSBx. I want the drivers to figure out which tty it needs to use. I'm developing in C for a linux system. My first thought was to something like this in my startup code.
system("dmesg | find_usb.py");
The python script would find the devices since each one has a unique Product Description. Then using the usb tree to associate each device with its ttyUSBx. The script would then create /tmp/USBDevs which would just be a simple device:tty pairing that would be easy for the C code to search.
I've been told...DoN't UsE sYsTeM...use posix_spawn(). But I'm having problems getting the output of dmesg piped to my python script. This isn't working
char *my_args[] = {"dmesg", "|", "find_usb.py", NULL};
pid_t pid;
int status;
status = posix_spawn(&pid, "/bin/dmesg", NULL, NULL, my_args, NULL);
if(status == 0){
if(waitpid(pid, &status, 0) != -1);{
printf("posix_spawn exited: %i", status);
}
I've been trying to figure out how to do this with posix_spawn_file_actions(), but I'm not allowed to hit the peak of the 'Ballmer Curve' at work.
Thanks in advance
Instead of using /dev/ttyUSB* devices, write udev rules to generate named symlinks to the devices. For a brief how-to, see here. Basically, you'll have an udev rule for each device, ending with say SYMLINK+=Sensor-name, and in your program, use /dev/Sensor-name for each sensor. (I do recommend using Sensor- prefix, noting the initial Capital letter, as all device names are currently lowercase. This avoids any clashes with existing devices.)
These symlinks will then only exist when the matching device is plugged in, and will point to the correct device (/dev/ttyUSB* in this case). When the device is removed, udev automagically deletes the symlink also. Just make sure your udev rule identifies the device precisely (not just vendor:device, but serial number also). I'd expect the rule to look something like
SUBSYSTEM=="tty", ATTRS{idVendor}=="VVVV", ATTRS{idProduct}=="PPPP", ATTRS{serial}=="SSSSSSSS", SYMLINK+="Sensor-name"
where VVVV is the USB Vendor ID (four hexadecimal digits), PPPP is the USB Product ID (four hexadecimal digits), and SSSSSSSS is the serial number string. You can see these values using e.g. udevadm info -a -n /dev/ttyUSB* when the device is plugged in.
If you still insist on parsing dmesg output, using your own script is a good idea.
You could use FILE *handle = popen("dmesg | find_usb.py", "r"); and read from handle like it was a file. When complete, close the handle using int exitstatus = pclose(handle);. See man popen and man pclose for the details, and man 2 wait for the WIFEXITED(), WEXITSTATUS(), WIFSIGNALED(), WTERMSIG() macros you'll need to use to examine exitstatus (although in your case, I suppose you can just ignore any errors).
If you do want to use posix_spawn() (or roughly equivalently, fork() and execvp()), you'd need to set up at least one pipe (to read the output of the spawned command) – two if you spawn/fork+exec both dmesg and your Python script –, and that gets a bit more complicated. See man pipe for details on that. Personally, I would rewrite the Python script so that it executes dmesg itself internally, and only outputs the device name(s). With posix_spawn(), you'd init a posix_file_actions_t, with three actions: _adddup2() to duplicate the write end of the pipe to STDOUT_FILENO, and two _addclose()s to close both ends of the pipe. However, I myself prefer to use fork() and exec() instead, somewhat similar to the example by Glärbo in this answer.

How does the Linux Kernel know which file descriptor to write input events to?

I would like to know the mechanism in which the Linux Kernel knows which file descriptor (e.g. /dev/input/eventX) to write the input to. For example, I know that when the user clicks the mouse, an interrupt occurs, which gets handled by the driver and propagated to the Linux input core via input_event (drivers/input/input.c), which eventually gets written to the appropriate file in /dev/input/. Specifically, I want to know which source files I need to go through to see how the kernel knows which file to write to based on the information given about the input event. My goal is to see if I can determine the file descriptors corresponding to specific input event codes before the kernel writes them to the /dev/input/eventX character files.
You may go through two files:
drivers/input/input.c
drivers/input/evdev.c
In evdev.c, evdev_init() will call input_register_handler() to initialize input_handler_list.
Then in an input device driver, after initialize input_dev, it will call:
input_register_device(input_dev)
-> get device kobj path, like /devices/soc/78ba000.i2c/i2c-6/6-0038/input/input2
-> input_attach_handler()
-> handler->connect(handler, dev, id);
-> evdev_connect()
In evdev_connect(), it will do below:
1. dynamic allocate a minor for a new evdev.
2. dev_set_name(&evdev->dev, "event%d", dev_no);
3. call input_register_handle() to connect input_dev and evdev->handle.
4. create a cdev, and call device_add().
After this, you will find input node /dev/input/eventX, X is value of dev_no.

Close device/socket in VxWorks

Is there a way to close the device/socket in VxWorks programmatically?
Meaning say I have the devices /tyco/0, /tyco/1 and /tyco/2 and I want to close/shutdown /tyco/1 and /tyco/2.
I would like to do something like remove("/tyco/1"). Something that would prevent even an open("/tyco/1") call later on in the code or from an outside source from opening the socket.
All devices available to VxWorks are part of the device list. The device list is accessible using the iosLib.
I've used the following code a lot to remove devices to generate errors in order to test my programs:
DEV_HDR *pDevice;
pDevice = iosDevFind("/xyz", NULL);
if (pDevice != NULL)
{
iosDevDelete(pDevice);
}
This works for all devices listed by the devs command which in your case will also work for "/tyco". I doubt that you can inhibit open calls to "/tyco/1" and "/tyco/2" but allow calls to "/tyco/0" using that method since it works on "devices".
If "/tyco/0" is your serial interface to the VxWorks shell then the method from above will work. Because removing a device from the device list will cause all following open calls to that device to fail but will not close already opened devices...

Is /proc directory generated dynamically per request?

So I am supposed to achieve following behavior.
This project you are asked to add a new field to the task descriptor. The name and type of the field is: int casper;
If casper=0 : The process is visible to all, i.e. it is listed in the /proc file system and it can be seen using “ps”, “pstree”, “top”, ...
If casper=1 : The process is visible only to processes which have the same user id, i.e. for all other processes, it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
If casper=2 : The process is visible only to processes which are in the same group, i.e. for all other processes, it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
If casper=3 : The process is invisible for all, i.e. it is NOT listed in the /proc file system and it can NOT be seen using “ps”, “pstree”, “top”, ...
I have already modified task_struct definition and its default value for init process and added necessary stuff to fork sys call
I did some research but couldnt find an obvious way to do this. So I assumed that /proc is created per request so I can get the task_struct of the process that requested it and populate the /proc according to that. Am I on the right track?
Yes, it's. /proc is a vritual filesystem generated by the kernel upon request. Check the following article for more details: Linux VFS

Getting user process pid when writing Linux Kernel Module

How can I get the PID of the user process which triggered my Kernel module's file_operation.read routine (i.e., which process is reading /dev/mydev) ?
When your read function is executing, it's doing so in the context of the process that issued the system call. You should thus pe able to use current, i.e. current->pid.
These days, we have some helper functions defined in sched.h. In the case of pid, you can use:
pid = task_pid_nr(current);
to get the current task's pid.
here is the comment taken from include/linux/sched.h as of v3.8.
the helpers to get the task's different pids as they are seen
from various namespaces
task_xid_nr() : global id, i.e. the id seen from the init namespace;
task_xid_vnr() : virtual id, i.e. the id seen from the pid namespace of current.
task_xid_nr_ns() : id seen from the ns specified;
set_task_vxid() : assigns a virtual id to a task;
see also pid_nr() etc in include/linux/pid.h
On a kernel 2.6.39 arm build, if current->pid does not work then it may be done by:
pid_nr(get_task_pid(current, PIDTYPE_PID))
The PIDTYPE_PID can be substituted by PIDTYPE_PGID or PIDTYPE_SID. The header source is at include/linux/pid.h as Yasushi pointed out.
Which of the approaches work depends on what header files the code uses.

Resources