Killing a process from the global scope using its kernel namespace PID - c

Having some difficulties with Linux kernel namespaces today, specifically correlating PIDs inside of a unique PID namespace to those within the global PID namespace
I need to be able to do one of the following:
a) Kill a process from the global scope using a PID assigned by the namespace
OR
b) Translate a namespace specific PID to a global PID, so I can kill the PID from the global scope
OR
c) Enable a process within a PID namespace to report to me its global PID, so I can kill the PID from the global scope
There is some discussion on the process structures which contain the PID information in namespace scenarios here. I'm not sure how / if I can access these structures from a userland application, or if I need to add in support via a kernel hack.
Why?
I have an application which currently uses network namespaces. I am adding support for PID namespaces. Here is how it currently works:
Before the introduction of PID namespaces:
The main application currently launches a bash console in another network namespace. It then uses that bash console to start programs and has those programs report their current PID. When the main application wants to kill a subprocess in that network namespace, it just tells the OS to kill the PID reported back.
With PID namespaces (broken state):
The main application currently launches a bash console in another network and PID namespace. It then uses that bash console to start programs and has those programs report their current PID. However, the current PID reported back is not valid in the global PID namespace (it may be 10, when the PID in the global namespace is 56000). As a result, the main application cannot kill the subprocess in that network + PID namespace
As always, any guidance is appreciated

An approach could be search in the pid queue the process descriptor that matches the target pid, if the reporting shell is on the same workspace, it can make a system call to get the other process 'process descriptor' and make some kind of for loop to find the process descripton in /proc/< pid>
you may also want to take a look here: http://lkml.indiana.edu/hypermail/linux/kernel/0707.0/1701.html
Specially, this part:
/*
* the helpers to get the pid's id seen from different namespaces
*
* pid_nr() : global id, i.e. the id seen from the init namespace;
* pid_vnr() : virtual id, i.e. the id seen from the namespace this pid
* belongs to. this only makes sence when called in the
* context of the task that belongs to the same namespace;
* pid_nr_ns() : id seen from the ns specified.
*
* see also task_xid_nr() etc in include/linux/sched.h
*/
static inline pid_t pid_nr(struct pid *pid)
{
pid_t nr = 0;
if (pid)
- nr = pid->nr;
+ nr = pid->numbers[0].nr;
return nr;
}
Hope it helps!
Best Regards!

Related

New Process Creation in Operating System (Unix)

I am sorry if my question seems silly. I have a query regarding new process creation in operating system. Consider the following simple C code:
//hello.c
#include <stdio.h>
int main()
{
printf("Hello World\n");
return 0;
}
When is Compiled with gcc.
gcc hello.c
now executing the executable a.out
./a.out
Now I don't understand how in this case New Process is created, who calls the fork() and exec system calls and which process is duplicated to have a.out as child process? In this example, parent process explicitly calls fork system call to create child process but in above hello.c code there is no fork call.
Typically, the parent process issues the fork() system call, creating a duplicate process having (mostly) the same properties as the original process (different process ID, for example). From there, the child process issues one of the exec family system calls to replace its own process image with a new one. This is explained quite well on the Unix SE.
In your case, the shell is the parent process, and the "new program" you are running is the child process which eventually calls exec.
Your shell will create new process for your program and it will do exec. It means parent process for your program is your shell.
echo $$ This command will give shell process ID.
Program :
printf("PPID : %d, PID : %d\n", getppid(), getpid() );
When you are run the program you will get below output.
PPID : 19172, PID : 26388
Note :
Parent process and shell process ID's are same.

(kernel simple programming) I want to know why pids are displayed like this [duplicate]

This question already has answers here:
How does Linux determine the next PID?
(4 answers)
Closed 6 years ago.
I am practicing making device driver programming.
my goal was to get current task's pid, and parent task's pid.
furthermore, I get grand(parent's parent) task's pid.
this is source code.
struct task_struct * current_task = current;
struct task_struct * parent_task = current_task->parent;
printk("current pid : %d, current_task->pid);
printk("parent task pid : %d\n", parent_task->pid);
printk("grand pid : %d\n", parent_task->parent->pid);
this is result..
(I ran this 3 times.)
current pid : 2850
parent task pid : 2846
grand parent task : 1692
current pid : 2872
parent task pid : 2864
grand parent task : 1692
current pid : 2891
parent task pid : 2887
grand parent task : 1692
grand parent task's pid is always 1692.
current's one and parent's one are increasing.
I am just wondering why...
It will be helpful if someone answer me..
Thanks for reading.
you have not specified whether you are using IDE for writing and executing the code.
In the case if you are using it
grand parent task is IDE (which is constant till you close and reopen it.
parent task id - when you execute the program, the program is called by a process which is created by the IDE and each time you run the program it is called by new processes created by IDE
(when you program terminates, the calling process created by your IDE is also terminates).
current pid is your process id
Process ID's are assigned in increment order by os.

Creating a daemon in Linux

In Linux I want to add a daemon that cannot be stopped and which monitors filesystem changes.
If any changes are detected, it should write the path to the console where it was started plus a newline.
I already have the filesystem changing code almost ready but I cannot figure out how to create a daemon.
My code is from here: http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
What to do after the fork?
int main (int argc, char **argv) {
pid_t pID = fork();
if (pID == 0) { // child
// Code only executed by child process
sIdentifier = "Child Process: ";
}
else if (pID < 0) {
cerr << "Failed to fork" << endl;
exit(1);
// Throw exception
}
else // parent
{
// Code only executed by parent process
sIdentifier = "Parent Process:";
}
return 0;
}
In Linux i want to add a daemon that cannot be stopped and which monitors filesystem changes. If any changes would be detected it should write the path to the console where it was started + a newline.
Daemons work in the background and (usually...) don't belong to a TTY that's why you can't use stdout/stderr in the way you probably want.
Usually a syslog daemon (syslogd) is used for logging messages to files (debug, error,...).
Besides that, there are a few required steps to daemonize a process.
If I remember correctly these steps are:
fork off the parent process & let it terminate if forking was successful. -> Because the parent process has terminated, the child process now runs in the background.
setsid - Create a new session. The calling process becomes the leader of the new session and the process group leader of the new process group. The process is now detached from its controlling terminal (CTTY).
Catch signals - Ignore and/or handle signals.
fork again & let the parent process terminate to ensure that you get rid of the session leading process. (Only session leaders may get a TTY again.)
chdir - Change the working directory of the daemon.
umask - Change the file mode mask according to the needs of the daemon.
close - Close all open file descriptors that may be inherited from the parent process.
To give you a starting point: Look at this skeleton code that shows the basic steps. This code can now also be forked on GitHub: Basic skeleton of a linux daemon
/*
* daemonize.c
* This example daemonizes a process, writes a few log messages,
* sleeps 20 seconds and terminates afterwards.
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <syslog.h>
static void skeleton_daemon()
{
pid_t pid;
/* Fork off the parent process */
pid = fork();
/* An error occurred */
if (pid < 0)
exit(EXIT_FAILURE);
/* Success: Let the parent terminate */
if (pid > 0)
exit(EXIT_SUCCESS);
/* On success: The child process becomes session leader */
if (setsid() < 0)
exit(EXIT_FAILURE);
/* Catch, ignore and handle signals */
//TODO: Implement a working signal handler */
signal(SIGCHLD, SIG_IGN);
signal(SIGHUP, SIG_IGN);
/* Fork off for the second time*/
pid = fork();
/* An error occurred */
if (pid < 0)
exit(EXIT_FAILURE);
/* Success: Let the parent terminate */
if (pid > 0)
exit(EXIT_SUCCESS);
/* Set new file permissions */
umask(0);
/* Change the working directory to the root directory */
/* or another appropriated directory */
chdir("/");
/* Close all open file descriptors */
int x;
for (x = sysconf(_SC_OPEN_MAX); x>=0; x--)
{
close (x);
}
/* Open the log file */
openlog ("firstdaemon", LOG_PID, LOG_DAEMON);
}
int main()
{
skeleton_daemon();
while (1)
{
//TODO: Insert daemon code here.
syslog (LOG_NOTICE, "First daemon started.");
sleep (20);
break;
}
syslog (LOG_NOTICE, "First daemon terminated.");
closelog();
return EXIT_SUCCESS;
}
Compile the code: gcc -o firstdaemon daemonize.c
Start the daemon: ./firstdaemon
Check if everything is working properly: ps -xj | grep firstdaemon
The output should be similar to this one:
+------+------+------+------+-----+-------+------+------+------+-----+
| PPID | PID | PGID | SID | TTY | TPGID | STAT | UID | TIME | CMD |
+------+------+------+------+-----+-------+------+------+------+-----+
| 1 | 3387 | 3386 | 3386 | ? | -1 | S | 1000 | 0:00 | ./ |
+------+------+------+------+-----+-------+------+------+------+-----+
What you should see here is:
The daemon has no controlling terminal (TTY = ?)
The parent process ID (PPID) is 1 (The init process)
The PID != SID which means that our process is NOT the session leader
(because of the second fork())
Because PID != SID our process can't take control of a TTY again
Reading the syslog:
Locate your syslog file. Mine is here: /var/log/syslog
Do a: grep firstdaemon /var/log/syslog
The output should be similar to this one:
firstdaemon[3387]: First daemon started.
firstdaemon[3387]: First daemon terminated.
A note:
In reality you would also want to implement a signal handler and set up the logging properly (Files, log levels...).
Further reading:
Linux-UNIX-Programmierung - German
Unix Daemon Server Programming
man 7 daemon describes how to create daemon in great detail. My answer is just excerpt from this manual.
There are at least two types of daemons:
traditional SysV daemons (old-style),
systemd daemons (new-style).
SysV Daemons
If you are interested in traditional SysV daemon, you should implement the following steps:
Close all open file descriptors except standard input, output, and error (i.e. the first three file descriptors 0, 1, 2). This ensures that no accidentally passed file descriptor stays around in the daemon process. On Linux, this is best implemented by iterating through /proc/self/fd, with a fallback of iterating from file descriptor 3 to the value returned by getrlimit() for RLIMIT_NOFILE.
Reset all signal handlers to their default. This is best done by iterating through the available signals up to the limit of _NSIG and resetting them to SIG_DFL.
Reset the signal mask using sigprocmask().
Sanitize the environment block, removing or resetting environment variables that might negatively impact daemon runtime.
Call fork(), to create a background process.
In the child, call setsid() to detach from any terminal and create an independent session.
In the child, call fork() again, to ensure that the daemon can never re-acquire a terminal again.
Call exit() in the first child, so that only the second child (the actual daemon process) stays around. This ensures that the daemon process is re-parented to init/PID 1, as all daemons should be.
In the daemon process, connect /dev/null to standard input, output, and error.
In the daemon process, reset the umask to 0, so that the file modes passed to open(), mkdir() and suchlike directly control the access mode of the created files and directories.
In the daemon process, change the current directory to the root directory (/), in order to avoid that the daemon involuntarily blocks mount points from being unmounted.
In the daemon process, write the daemon PID (as returned by getpid()) to a PID file, for example /run/foobar.pid (for a hypothetical daemon "foobar") to ensure that the daemon cannot be started more than once. This must be implemented in race-free fashion so that the PID file is only updated when it is verified at the same time that the PID previously stored in the PID file no longer exists or belongs to a foreign process.
In the daemon process, drop privileges, if possible and applicable.
From the daemon process, notify the original process started that initialization is complete. This can be implemented via an unnamed pipe or similar communication channel that is created before the first fork() and hence available in both the original and the daemon process.
Call exit() in the original process. The process that invoked the daemon must be able to rely on that this exit() happens after initialization is complete and all external communication channels are established and accessible.
Note this warning:
The BSD daemon() function should not be used, as it implements only a subset of these steps.
A daemon that needs to provide compatibility with SysV systems should implement the scheme pointed out above. However, it is recommended to make this behavior optional and configurable via a command line argument to ease debugging as well as to simplify integration into systems using systemd.
Note that daemon() is not POSIX compliant.
New-Style Daemons
For new-style daemons the following steps are recommended:
If SIGTERM is received, shut down the daemon and exit cleanly.
If SIGHUP is received, reload the configuration files, if this applies.
Provide a correct exit code from the main daemon process, as this is used by the init system to detect service errors and problems. It is recommended to follow the exit code scheme as defined in the LSB recommendations for SysV init scripts.
If possible and applicable, expose the daemon's control interface via the D-Bus IPC system and grab a bus name as last step of initialization.
For integration in systemd, provide a .service unit file that carries information about starting, stopping and otherwise maintaining the daemon. See systemd.service(5) for details.
As much as possible, rely on the init system's functionality to limit the access of the daemon to files, services and other resources, i.e. in the case of systemd, rely on systemd's resource limit control instead of implementing your own, rely on systemd's privilege dropping code instead of implementing it in the daemon, and similar. See systemd.exec(5) for the available controls.
If D-Bus is used, make your daemon bus-activatable by supplying a D-Bus service activation configuration file. This has multiple advantages: your daemon may be started lazily on-demand; it may be started in parallel to other daemons requiring it — which maximizes parallelization and boot-up speed; your daemon can be restarted on failure without losing any bus requests, as the bus queues requests for activatable services. See below for details.
If your daemon provides services to other local processes or remote clients via a socket, it should be made socket-activatable following the scheme pointed out below. Like D-Bus activation, this enables on-demand starting of services as well as it allows improved parallelization of service start-up. Also, for state-less protocols (such as syslog, DNS), a daemon implementing socket-based activation can be restarted without losing a single request. See below for details.
If applicable, a daemon should notify the init system about startup completion or status updates via the sd_notify(3) interface.
Instead of using the syslog() call to log directly to the system syslog service, a new-style daemon may choose to simply log to standard error via fprintf(), which is then forwarded to syslog by the init system. If log levels are necessary, these can be encoded by prefixing individual log lines with strings like "<4>" (for log level 4 "WARNING" in the syslog priority scheme), following a similar style as the Linux kernel's printk() level system. For details, see sd-daemon(3) and systemd.exec(5).
To learn more read whole man 7 daemon.
You cannot create a process in linux that cannot be killed. The root user (uid=0) can send a signal to a process, and there are two signals which cannot be caught, SIGKILL=9, SIGSTOP=19. And other signals (when uncaught) can also result in process termination.
You may want a more general daemonize function, where you can specify a name for your program/daemon, and a path to run your program (perhaps "/" or "/tmp"). You may also want to provide file(s) for stderr and stdout (and possibly a control path using stdin).
Here are the necessary includes:
#include <stdio.h> //printf(3)
#include <stdlib.h> //exit(3)
#include <unistd.h> //fork(3), chdir(3), sysconf(3)
#include <signal.h> //signal(3)
#include <sys/stat.h> //umask(3)
#include <syslog.h> //syslog(3), openlog(3), closelog(3)
And here is a more general function,
int
daemonize(char* name, char* path, char* outfile, char* errfile, char* infile )
{
if(!path) { path="/"; }
if(!name) { name="medaemon"; }
if(!infile) { infile="/dev/null"; }
if(!outfile) { outfile="/dev/null"; }
if(!errfile) { errfile="/dev/null"; }
//printf("%s %s %s %s\n",name,path,outfile,infile);
pid_t child;
//fork, detach from process group leader
if( (child=fork())<0 ) { //failed fork
fprintf(stderr,"error: failed fork\n");
exit(EXIT_FAILURE);
}
if (child>0) { //parent
exit(EXIT_SUCCESS);
}
if( setsid()<0 ) { //failed to become session leader
fprintf(stderr,"error: failed setsid\n");
exit(EXIT_FAILURE);
}
//catch/ignore signals
signal(SIGCHLD,SIG_IGN);
signal(SIGHUP,SIG_IGN);
//fork second time
if ( (child=fork())<0) { //failed fork
fprintf(stderr,"error: failed fork\n");
exit(EXIT_FAILURE);
}
if( child>0 ) { //parent
exit(EXIT_SUCCESS);
}
//new file permissions
umask(0);
//change to path directory
chdir(path);
//Close all open file descriptors
int fd;
for( fd=sysconf(_SC_OPEN_MAX); fd>0; --fd )
{
close(fd);
}
//reopen stdin, stdout, stderr
stdin=fopen(infile,"r"); //fd=0
stdout=fopen(outfile,"w+"); //fd=1
stderr=fopen(errfile,"w+"); //fd=2
//open syslog
openlog(name,LOG_PID,LOG_DAEMON);
return(0);
}
Here is a sample program, which becomes a daemon, hangs around, and then leaves.
int
main()
{
int res;
int ttl=120;
int delay=5;
if( (res=daemonize("mydaemon","/tmp",NULL,NULL,NULL)) != 0 ) {
fprintf(stderr,"error: daemonize failed\n");
exit(EXIT_FAILURE);
}
while( ttl>0 ) {
//daemon code here
syslog(LOG_NOTICE,"daemon ttl %d",ttl);
sleep(delay);
ttl-=delay;
}
syslog(LOG_NOTICE,"daemon ttl expired");
closelog();
return(EXIT_SUCCESS);
}
Note that SIG_IGN indicates to catch and ignore the signal. You could build a signal handler that can log signal receipt, and set flags (such as a flag to indicate graceful shutdown).
Try using the daemon function:
#include <unistd.h>
int daemon(int nochdir, int noclose);
From the man page:
The daemon() function is for programs wishing to detach themselves
from the controlling terminal and run in the background as system
daemons.
If nochdir is zero, daemon() changes the calling process's current
working directory to the root directory ("/"); otherwise, the current
working directory is left unchanged.
If noclose is zero, daemon() redirects standard input, standard
output and standard error to /dev/null; otherwise, no changes are
made to these file descriptors.
I can stop at the first requirement "A daemon which cannot be stopped ..."
Not possible my friend; however, you can achieve the same with a much better tool, a kernel module.
http://www.infoq.com/articles/inotify-linux-file-system-event-monitoring
All daemons can be stopped. Some are more easily stopped than others. Even a daemon pair with the partner in hold down, respawning the partner if lost, can be stopped. You just have to work a little harder at it.
If your app is one of:
{
".sh": "bash",
".py": "python",
".rb": "ruby",
".coffee" : "coffee",
".php": "php",
".pl" : "perl",
".js" : "node"
}
and you don't mind a NodeJS dependency then install NodeJS and then:
npm install -g pm2
pm2 start yourapp.yourext --name "fred" # where .yourext is one of the above
pm2 start yourapp.yourext -i 0 --name "fred" # run your app on all cores
pm2 list
To keep all apps running on reboot (and daemonise pm2):
pm2 startup
pm2 save
Now you can:
service pm2 stop|restart|start|status
(also easily allows you to watch for code changes in your app directory and auto restart the app process when a code change happens)
Daemon Template
I wrote a daemon template following the new-style daemon: link
You can find the entire template code on GitHub: here
Main.cpp
// This function will be called when the daemon receive a SIGHUP signal.
void reload() {
LOG_INFO("Reload function called.");
}
int main(int argc, char **argv) {
// The Daemon class is a singleton to avoid be instantiate more than once
Daemon& daemon = Daemon::instance();
// Set the reload function to be called in case of receiving a SIGHUP signal
daemon.setReloadFunction(reload);
// Daemon main loop
int count = 0;
while(daemon.IsRunning()) {
LOG_DEBUG("Count: ", count++);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
LOG_INFO("The daemon process ended gracefully.");
}
Daemon.hpp
class Daemon {
public:
static Daemon& instance() {
static Daemon instance;
return instance;
}
void setReloadFunction(std::function<void()> func);
bool IsRunning();
private:
std::function<void()> m_reloadFunc;
bool m_isRunning;
bool m_reload;
Daemon();
Daemon(Daemon const&) = delete;
void operator=(Daemon const&) = delete;
void Reload();
static void signalHandler(int signal);
};
Daemon.cpp
Daemon::Daemon() {
m_isRunning = true;
m_reload = false;
signal(SIGINT, Daemon::signalHandler);
signal(SIGTERM, Daemon::signalHandler);
signal(SIGHUP, Daemon::signalHandler);
}
void Daemon::setReloadFunction(std::function<void()> func) {
m_reloadFunc = func;
}
bool Daemon::IsRunning() {
if (m_reload) {
m_reload = false;
m_reloadFunc();
}
return m_isRunning;
}
void Daemon::signalHandler(int signal) {
LOG_INFO("Interrup signal number [", signal,"] recived.");
switch(signal) {
case SIGINT:
case SIGTERM: {
Daemon::instance().m_isRunning = false;
break;
}
case SIGHUP: {
Daemon::instance().m_reload = true;
break;
}
}
}
daemon-template.service
[Unit]
Description=Simple daemon template
After=network.taget
[Service]
Type=simple
ExecStart=/usr/bin/daemon-template --conf_file /etc/daemon-template/daemon-tenplate.conf
ExecReload=/bin/kill -HUP $MAINPID
User=root
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=daemon-template
[Install]
WantedBy=multi-user.target
By calling fork() you've created a child process. If the fork is successful (fork returned a non-zero PID) execution will continue from this point from within the child process. In this case we want to gracefully exit the parent process and then continue our work in the child process.
Maybe this will help:
http://www.netzmafia.de/skripten/unix/linux-daemon-howto.html
A daemon is just a process in the background. If you want to start your program when the OS boots, on linux, you add your start command to /etc/rc.d/rc.local (run after all other scripts) or /etc/startup.sh
On windows, you make a service, register the service, and then set it to start automatically at boot in administration -> services panel.

Getting user process pid when writing Linux Kernel Module

How can I get the PID of the user process which triggered my Kernel module's file_operation.read routine (i.e., which process is reading /dev/mydev) ?
When your read function is executing, it's doing so in the context of the process that issued the system call. You should thus pe able to use current, i.e. current->pid.
These days, we have some helper functions defined in sched.h. In the case of pid, you can use:
pid = task_pid_nr(current);
to get the current task's pid.
here is the comment taken from include/linux/sched.h as of v3.8.
the helpers to get the task's different pids as they are seen
from various namespaces
task_xid_nr() : global id, i.e. the id seen from the init namespace;
task_xid_vnr() : virtual id, i.e. the id seen from the pid namespace of current.
task_xid_nr_ns() : id seen from the ns specified;
set_task_vxid() : assigns a virtual id to a task;
see also pid_nr() etc in include/linux/pid.h
On a kernel 2.6.39 arm build, if current->pid does not work then it may be done by:
pid_nr(get_task_pid(current, PIDTYPE_PID))
The PIDTYPE_PID can be substituted by PIDTYPE_PGID or PIDTYPE_SID. The header source is at include/linux/pid.h as Yasushi pointed out.
Which of the approaches work depends on what header files the code uses.

How to create multiple network namespace from a single process instance

I am using following C function to create multiple network namespaces from a single process instance:
void create_namespace(const char *ns_name)
{
char ns_path[100];
snprintf(ns_path, 100, "%s/%s", "/var/run/netns", ns_name);
close(open(ns_path, O_RDONLY|O_CREAT|O_EXCL, 0));
unshare(CLONE_NEWNET);
mount("/proc/self/ns/net", ns_path, "none", MS_BIND , NULL);
}
After my process creates all the namspaces and I add a tap interface to any of the one network namespace (with ip link set tap1 netns ns1 command), then I actually see this interface in all of the namespaces (presumably, this is actually a single namespace that goes under different names).
But, if I create multiple namespaces by using multiple processes, then everything is working just fine.
What could be wrong here? Do I have to pass any additional flags to the unshare() to get this working from a single process instance? Is there a limitation that a single process instance can't create multiple network namespaces? Or is there a problem with mount() call, because /proc/self/ns/net is actually mounted multiple times?
Update:
It seems that unshare() function creates multiple network namespaces correctly, but all the mount points in /var/run/netns/ actually reference to the first network namespace that was mounted in that direcotry.
Update2:
It seems that the best approach is to fork() another process and execute create_namespace() function from there. Anyway, I would be glad to hear a better solution that does not involve fork() call or at least get a confirmation that would prove that it is impossible to create and manage multiple network namespaces from a single process.
Update3:
I am able to create multiple namespaces with unshare() by using the following code:
int main() {
create_namespace("a");
system("ip tuntap add mode tap tapa");
system("ifconfig -a");//shows lo and tapA interface
create_namespace("b");
system("ip tuntap add mode tap tapb");
system("ifconfig -a");//show lo and tapB interface, but does not show tapA. So this is second namespace created.
}
But after the process terminates and I execute ip netns exec a ifconfig -a and ip netns exec b ifconfig -a it seems that both commands were suddenly executed in namespace a. So the actual problem is storing the references to the namespaces (or calling mount() the right way. But I am not sure, if this is possible).
Network Namespaces are, by design, created with a call to clone, and it can be modified after by unshare. Take note that even if you do create a new network namespace with unshare, in fact you just modify network stack of your running process. unshare is unable to modify network stack of other processes, so you won't be able to create another one only with unshare.
In order to work, a new network namespace needs a new network stack, and so it needs a new process. That's all.
Good news is that it can be made very lightweight with clone, see:
Clone() differs from the traditional fork() system call in UNIX, in
that it allows the parent and child processes to selectively share or
duplicate resources.
You are able to divert only on this network stack (and avoid memory space, table of file descriptors and table of signal handlers). Your new network process can be made more like a thread than a real fork.
You can manipulate them with C code or with Linux Kernel and/or LXC tools.
For instance, to add a device to new network namespace, it's as simple as:
echo $PID > /sys/class/net/ethX/new_ns_pid
See this page for more info about CLI available.
On the C-side, one can take a look at lxc-unshare implementation. Despite its name it uses clone, as you can see (lxc_clone is here). One can also look at LTP implementation, where the author has chosen to use fork directly.
EDIT: There is a trick that you can use to make them persistent, but you will still need to fork, even temporarily.
Take a look at this code of ipsource2 (I have removed error checking for clarity):
snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
/* Create the base netns directory if it doesn't exist */
mkdir(NETNS_RUN_DIR, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);
/* Create the filesystem state */
fd = open(netns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
[...]
close(fd);
unshare(CLONE_NEWNET);
/* Bind the netns last so I can watch for it */
mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL)
If you execute this code in a forked process, you'll be able to create new network namespace at will. In order to delete them, you can simply umount and delete this bind:
umount2(netns_path, MNT_DETACH);
if (unlink(netns_path) < 0) [...]
EDIT2: Another (dirty) trick would be simply to execute "ip netns add .." cli with system.
You only have to bind mount /proc/*/ns/* if you need to access these namespaces from another process, or need to get handle to be able to switch back and forth between the two. It is not needed to use multiple namespaces from a single process.
unshare does create new namespace.
clone and fork by default do not create any new namespaces.
there is one "current" namespace of each kind assigned to a process. It can be changed by unshare or setns. Set of namespaces (by default) is inherited by child processes.
Whenever you do open(/proc/N/ns/net), it creates inode for this file,
and all subsequent open()s will return file that is bound to the
same namespace. Details are lost in the depths of kernel dentry cache.
Also, each process has only one /proc/self/ns/net file entry, and
bind mount does not create new instances of this proc file.
Opening those mounted files are exactly the same as opening
/proc/self/ns/net file directly (which will keep pointing to the
namespace it pointed to when you first opened it).
It seems that "/proc/*/ns" is half-baked like this.
So, if you only need 2 namespaces, you can:
open /proc/1/ns/net
unshare
open /proc/self/ns/net
and switch between the two.
For more that 2 you might have to clone(). There seems to be no way to create more than one /proc/N/ns/net file per process.
However, if you do not need to switch between namespaces at runtime, or to share them with other processes, you can use many namespaces like this:
open sockets and run processes for main namespace.
unshare
open sockets and run processes for 2nd namespace (netlink, tcp, etc)
unshare
...
unshare
open sockets and run processes for Nth namespace (netlink, tcp, etc)
Open sockets keep reference to their network namespace, so they will not be collected until sockets are closed.
You can also use netlink to move interfaces between namespaces, by sending netlink command on source namespace, and specifying dst namespace either by PID or namespace FD (the later you don't have).
You need to switch process namespace before accessing /proc entries that depend on that namespace. Once "proc" file is open, it keeps reference to the namespace.

Resources