System becomes unresponsive when my kernel module tries accesses task->mm->pgd - c

I want to print the value of tsk->mm->pgd for all tasks (task_struct). I have a branch to judge if the mm or pgd is NULL. But, if the program access the value of pgd, I can't control the system. The operating system becomes unresponsive.
The following testFunc is defined in a kernel module:
void testFunc(void)
{
struct task_struct *p=&init_task;
printk(KERN_INFO "testFunc\n");
pid_t pid = current->pid;
printk(KERN_INFO "current PID - %d, - pgd %px\n", pid, current->mm->pgd);
for_each_process(p)
{
printk(KERN_INFO "pid - %d\tpname - %s\n", p->pid, p->comm);
if(p->mm && p->mm->pgd)
printk(KERN_INFO "%px\n", p->mm->pgd);
}
}
I want to know why this happens.
And is there any method to make it work?

Looking at your code, you are not taking any lock of any kind, so both the task and the mm you are working with could vanish at any moment and you'd be operating on invalid/dangling pointers.
To avoid this, there are different lock/unlock pairs of functions protecting different parts of task_struct that you can use. The following code should be what you are looking for:
#include <linux/sched/task.h> // struct task_struct, {get,put}_task_struct()
#include <linux/sched/signal.h> // for_each_process()
#include <linux/sched/mm.h> // get_task_mm(), mmput()
#include <linux/mm.h>
/* ... */
void testFunc(void)
{
char comm[TASK_COMM_LEN];
struct task_struct *p;
struct mm_struct *mm;
for_each_process(p) {
get_task_struct(p);
get_task_comm(comm, p);
mm = get_task_mm(p);
if (mm) {
pr_info("PID %d (\"%s\") pgd = %px\n", p->pid, comm, mm->pgd);
mmput(mm);
} else {
pr_info("PID %d (\"%s\") is an anonymous process\n", p->pid, comm);
}
put_task_struct(p);
}
}
See also this Linux kernel documentation page explaining the difference between task->mm and task->active_mm, and what it means for a task to have task->mm == NULL.

Related

C program causes memory leak?

I use that very simple C program to execute a system call to php each second, in order to run a php script that sends pending push notification in my database to APNS (Apple notification service).
Anyway, this program causes a memory overflow after about 10 hours, so I reduced sleep time between thread creation from 1s to 10000us, and I could see in real time with htop that memory were increasing without never lower. Here is the program :
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
typedef struct {
char* script_path ;
} arg_for_script ;
static void *start_instance(void *_args)
{
int id = abs(pthread_self());
arg_for_script* args = _args ;
printf("[SERVICE] start php script on thread %d\n",id);
fflush(stdout);
char cmd[200] ;
sprintf(cmd, "php -f %s %d", args->script_path, id );
system(cmd);
printf("[SERVICE] end of script on thread %d\n", id);
fflush(stdout);
pthread_exit(NULL);
}
int main(int argc, char* argv[])
{
if(argc < 2)
{
fprintf(stderr, "[SERVICE] Path of php notification script must be filled\n");
fflush(stderr);
return EXIT_FAILURE;
}
arg_for_script args ;
args.script_path = argv[1];
pthread_attr_t tattr ;
struct sched_param param;
param.sched_priority = 1 ;
pthread_attr_init(&tattr);
pthread_attr_setinheritsched(&tattr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&tattr, SCHED_FIFO);
pthread_attr_setschedparam(&tattr, &param);
while(1) {
pthread_t thrd;
// if(pthread_create(&thrd, &tattr, start_instance, (void *)&args) == -1) {
if(pthread_create(&thrd, NULL, start_instance, (void *)&args) == -1)
{
fprintf(stderr, "[SERVICE] Unable to create thread\n");
fflush(stderr);
return EXIT_FAILURE;
}
usleep( 10000);
}
// pthread_attr_destroy(&tattr);
return EXIT_SUCCESS ;
}
Here, I don't dynamically allocate any RAM with malloc. Why would this program increases memory usage ? What pointer should I free here ?
You aren't calling pthread_join() nor use pthread_detach(), so the resources allocated for the thread aren't freed. Namely each thread has it's own stack, which is probably what causes the rising memory consumption.
Some remarks about your implementation: Since you plan on executing a PHP script with system() and don't actually need to work on shared variables or file descriptors, it's better to use fork() and one of the variants of exec(). This will spawn a new process without the intermediate step of creating a thread. It's also not recommended to use system() because it often allows to exploit the program when the input isn't properly sanitized. In this case it might be fine, if you only call it manually.

What is the purpose of uint64_t ps_pledge varibale in OpenBSD kernel code?

I am playing with OpenBSD kernel code, especially this file sys/kern/sched_bsd.c.
void
schedcpu(void *arg)
{
......
......
LIST_FOREACH(p, &allproc, p_list) {
/*
* Increment sleep time (if sleeping). We ignore overflow.
*/
if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
p->p_slptime++;
p->p_pctcpu = (p->p_pctcpu * ccpu) >> FSHIFT;
/*
* If the process has slept the entire second,
* stop recalculating its priority until it wakes up.
*/
if (p->p_slptime > 1)
continue;
SCHED_LOCK(s);
/*
* p_pctcpu is only for diagnostic tools such as ps.
*/
....
....
LIST_FOREACH(TYPE *var, LIST_HEAD *head, LIST_ENTRY NAME);
The macro LIST_FOREACH traverses the list referenced by head in the forward direction, assigning each element in turn to var.
Now, here, p will contain the address of struct proc structure of eery process which is in the file
sys/sys/proc.h
Now, again, this structure contains another struct process *p_p structure, which denotes the properties of every process like its pid, flags, threads etc.
struct proc {
TAILQ_ENTRY(proc) p_runq;
LIST_ENTRY(proc) p_list; /* List of all threads. */
struct process *p_p; /* The process of this thread. */
TAILQ_ENTRY(proc) p_thr_link; /* Threads in a process linkage. */
TAILQ_ENTRY(proc) p_fut_link; /* Threads in a futex linkage. */
struct futex *p_futex; /* Current sleeping futex. */
/* substructures: */
struct filedesc *p_fd; /* copy of p_p->ps_fd */
struct vmspace *p_vmspace; /* copy of p_p->ps_vmspace */
#define p_rlimit p_p->ps_limit->pl_rlimit
....
....
Now, stucture struct process contains uint64_t ps_plegde.
struct process {
/*
* ps_mainproc is the original thread in the process.
* It's only still special for the handling of p_xstat and
* some signal and ptrace behaviors that need to be fixed.
*/
struct proc *ps_mainproc;
struct ucred *ps_ucred; /* Process owner's identity. */
....
....
u_short ps_acflag; /* Accounting flags. */
uint64_t ps_pledge;
uint64_t ps_execpledge;
....
....
Now, I wrote some modification in void schedcpu() function code.
void
schedcpu(void *arg)
{
pid_t pid;
uint64_t pledge_bit;
....
....
LIST_FOREACH(p, &allproc, p_list) {
pid=p->p_p->pid;
pledge_bit=p->p_p->ps_pledge;
if (pledge_bit) {
printf("pid: %10d pledge_bit: %10llu pledge_xbit:%10llx\n",pid,pledge_bit,pledge_bit);
}
/*
* Increment sleep time (if sleeping). We ignore overflow.
*/
if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
p->p_slptime++;
p->p_pctcpu = (p->p_pctcpu * ccpu) >> FS
....
....
Here, Kernel log
pid: 37846 pledge_bit: 393359 pledge_xbit: 6008f
pid: 96037 pledge_bit: 393544 pledge_xbit: 60148
pid: 86032 pledge_bit: 264297 pledge_xbit: 40869
pid: 72264 pledge_bit: 393480 pledge_xbit: 60108
pid: 40102 pledge_bit: 8 pledge_xbit: 8
pid: 841 pledge_bit: 2148162527 pledge_xbit: 800a5bdf
pid: 49970 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 68505 pledge_bit: 40 pledge_xbit: 28
pid: 46106 pledge_bit: 72 pledge_xbit: 48
pid: 77690 pledge_bit: 537161 pledge_xbit: 83249
pid: 44005 pledge_bit: 262152 pledge_xbit: 40008
pid: 82731 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 71609 pledge_bit: 262472 pledge_xbit: 40148
pid: 54330 pledge_bit: 662063 pledge_xbit: a1a2f
pid: 77764 pledge_bit: 1052776 pledge_xbit: 101068
pid: 699 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 84265 pledge_bit: 1052776 pledge_xbit: 101068
....
....
Now, Is that possible to know which process pledge what permissions, from looking at pledge_bit (decimal or hex values) that I got from above output?
I took pledge hex value of dhclient process i.e 0x8009588f, then, I wrote a sample hello world program with pledge("STDIO",NULL); and again I looked at dmesg and got same pledge_bit for hello world i.e 0x8009588f.
Then, this time I looked at dhclient source code and found out that, dhclient code pledged pledge("stdio inet dns route proc", NULL).
But, then, how is it possible to get same pledge hex bit for different pledge parameters?
I have got my answer after reading source code again and again.
So, I just want to contribute my learning so that future developers won't face any problem regarding this issue or confusion.
First understanding:
I wrote hello world like this,
void
main() {
pledge("STDIO", NULL); /* wrong use of pledge call */
printf("Hello world\n");
}
Correction of above code:
void
main() {
if (pledge("stdio", NULL) == -1) {
printf("Error\n");
}
printf("Hello world\n");
}
I forgot to check the return value from pledge().
Second understanding:
dhclient.c code contains pledge() call as:
int
main(int argc, char *argv[])
{
struct ieee80211_nwid nwid;
struct ifreq ifr;
struct stat sb;
const char *tail_path = "/etc/resolv.conf.tail";
....
....
fork_privchld(ifi, socket_fd[0], socket_fd[1]);
....
....
if ((cmd_opts & OPT_FOREGROUND) == 0) {
if (pledge("stdio inet dns route proc", NULL) == -1)
fatal("pledge");
} else {
if (pledge("stdio inet dns route", NULL) == -1)
fatal("pledge");
}
....
....
void
fork_privchld(struct interface_info *ifi, int fd, int fd2)
{
struct pollfd pfd[1];
struct imsgbuf *priv_ibuf;
ssize_t n;
int ioctlfd, routefd, nfds, rslt;
switch (fork()) {
case -1:
fatal("fork");
break;
case 0:
break;
default:
return;
}
....
....
}
Now, I wrote sample hello world code which contains the same parameters as dhclient:
void
main() {
if(pledge("stdio inet proc route dns", NULL) == -1) {
printf("Error\n");
}
while(1) {}
}
Now, I tested this above sample hello world code and got pledge_bit as 0x101068 in hex, which is correct.
But, as I told you earlier in my question, I became confused, when I saw different pledge_bit for dhclient. Because, as you guys know that both have same parameters in their pledge().
Then, how is it possible?
Now, here is the catch,
After continuously looking at dhclient source code, I found one function named as fork_privchld().
This function called before pledging in dhclient, so, it's like there is no pledge for this function as it is called before pledging.
So, just for verification again I wrote sample hello world code, but this time without any pledge() syscall.
void
main() {
printf("hello\n");
}
And, guess what, I got the same pledge_bit as 0x8009588f.
So, after this verification, it is verified that fork_privchld() function doesn't have any pledge bit set because as it is called before pledging in dhclient.
This function creates a [priv] child process for dhclient.
# ps aux|grep dhclient
root 26416 0.0 0.1 608 544 ?? Is 8:36AM 0:00.00 dhclient: em0 [priv] (dhclient)
_dhcp 33480 0.0 0.1 744 716 ?? Isp 8:36AM 0:00.00 dhclient: em0 (dhclient)
And, I don't know why I was only looking at the first process i.e [priv] (dhclient). This process is the one which is created by fork_privchld() function.
That's why this process has 0x8009588f pledge bit (due to called before pledging, so, no pledging at this point).
And, when I checked the second process i.e _dhcp, then I got my expected pledge_bit i.e 0x101068 (due to pledging).
So, I hope things will clear after reading this.
Note: Please, feel free to update me, if anything I forgot or missed.

Linux driver code with wait queues hanging system

I have written a sample linux device driver code which will create two kernel threads and each will increment a single global variable. I have used wait-queues to perform the task of incrementing the variable, and each thread will wait on the wait queue until a timer expires and each thread is woken up at random.
But problem is when I inserted this module, the whole system is just freezing up, and I have to restart the machine. This is happening every time I inserted the module. I tried debugging the kthread code to see if I am entering dead-lock situation by mistake but I am unable to figure out anything wrong with the code.
Can anyone please tell me what I am doing wrong in the code to get the hang-up situation?
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/semaphore.h>
#include <linux/wait.h>
#include <linux/timer.h>
#include <linux/sched.h>
#include <linux/kthread.h>
spinlock_t my_si_lock;
pid_t kthread_pid1;
pid_t kthread_pid2 ;
static DECLARE_WAIT_QUEUE_HEAD(wqueue);
static struct timer_list my_timer;
int kthread_num;
/* the timer callback */
void my_timer_callback( unsigned long data ){
printk(KERN_INFO "my_timer_callback called (%ld).\n", jiffies );
if (waitqueue_active(&wqueue)) {
wake_up_interruptible(&wqueue);
}
}
/*Routine for the first thread */
static int kthread_routine_1(void *kthread_num)
{
//int num=(int)(*(int*)kthread_num);
int *num=(int *)kthread_num;
char kthread_name[15];
unsigned long flags;
DECLARE_WAITQUEUE(wait, current);
printk(KERN_INFO "Inside daemon_routine() %ld\n",current->pid);
allow_signal(SIGKILL);
allow_signal(SIGTERM);
do{
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&wqueue, &wait);
spin_lock_irqsave(&my_si_lock, flags);
printk(KERN_INFO "kernel_daemon [%d] incrementing the shared data=%d\n",current->pid,(*num)++);
spin_unlock_irqrestore(&my_si_lock, flags);
remove_wait_queue(&wqueue, &wait);
if (kthread_should_stop()) {
break;
}
}while(!signal_pending(current));
set_current_state(TASK_RUNNING);
return 0;
}
/*Routine for the second thread */
static int kthread_routine_2(void *kthread_num)
{
//int num=(int)(*(int*)kthread_num);
int *num=(int *)kthread_num;
char kthread_name[15];
unsigned long flags;
DECLARE_WAITQUEUE(wait, current);
printk(KERN_INFO "Inside daemon_routine() %ld\n",current->pid);
allow_signal(SIGKILL);
allow_signal(SIGTERM);
do{
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&wqueue, &wait);
spin_lock_irqsave(&my_si_lock, flags);
printk(KERN_INFO "kernel_daemon [%d] incrementing the shared data=%d\n",current->pid,(*num)++);
spin_unlock_irqrestore(&my_si_lock, flags);
remove_wait_queue(&wqueue, &wait);
if (kthread_should_stop()) {
break;
}
}while(!signal_pending(current));
set_current_state(TASK_RUNNING);
return 0;
}
static int __init signalexample_module_init(void)
{
int ret;
spin_lock_init(&my_si_lock);
init_waitqueue_head(&wqueue);
kthread_num=1;
printk(KERN_INFO "starting the first kernel thread with id ");
kthread_pid1 = kthread_run(kthread_routine_1,&kthread_num,"first_kthread");
printk(KERN_INFO "%ld \n",(long)kthread_pid1);
if(kthread_pid1< 0 ){
printk(KERN_ALERT "Kernel thread [1] creation failed\n");
return -1;
}
printk(KERN_INFO "starting the second kernel thread with id");
kthread_pid2 = kthread_run(kthread_routine_2,&kthread_num,"second_kthread");
printk(KERN_INFO "%ld \n",(long)kthread_pid2);
if(kthread_pid2 < 0 ){
printk(KERN_ALERT "Kernel thread [2] creation failed\n");
return -1;
}
setup_timer( &my_timer, my_timer_callback, 0 );
ret = mod_timer( &my_timer, jiffies + msecs_to_jiffies(2000) );
if (ret) {
printk("Error in mod_timer\n");
return -EINVAL;
}
return 0;
}
static void __exit signalexample_module_exit(void)
{
del_timer(&my_timer);
}
module_init(signalexample_module_init);
module_exit(signalexample_module_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Demonstrates use of kthread");
You need a call to schedule() in both of your thread functions:
/* In kernel thread function... */
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&wqueue, &wait);
schedule(); /* Add this call here */
spin_lock_irqsave(&my_si_lock, flags);
/* etc... */
Calling set_current_state(TASK_INTERRUPTIBLE) sets the state in the current process' task structure, which allows the scheduler to move the process off of the run queue once it sleeps. But then you have to tell the scheduler, "Okay, I've set a new state. Reschedule me now." You're missing this second step, so the changed flag won't take effect until the next time the scheduler decides to suspend your thread, and there's no way to know how soon that will happen, or which line of your code it's executing when it happens (except in the locked code - that shouldn't be interrupted).
I'm not really sure why it's causing your whole system to lock up, because your system's state is pretty unpredictable. Since the kernel threads weren't waiting for the timer to expire before grabbing locks and looping, I have no idea when you could expect the scheduler to actually take action on the new task struct states, and a lot of things could be happening in the meantime. Your threads are repeatedly calling add_wait_queue(&wqueue, &wait); and remove_wait_queue(&wqueue, &wait);, so who knows what state the wait queue is in by the time your timer callback fires. In fact, since the kernel threads are spinning, this code has a race condition:
if (waitqueue_active(&wqueue)) {
wake_up_interruptible(&wqueue);
}
It's possible that you have active tasks on the waitqueue when the if statement is executed, only to have them emptied out by the time wake_up_interruptible(&wqueue); is called.
By the way, I'm assuming your current goal of incrementing a global variable is just an exercise to learn waitqueues and sleep states. If you ever want to actually implement a shared counter, look at atomic operations instead, and you'll be able to dump the spinlock.
If you decide to keep the spinlock, you should switch to using the DEFINE_SPINLOCK() macro instead.
Also, as I mentioned in my comment, you should change your two kthread_pid variables to be of task_struct * type. You also need a call to kthread_stop(kthread_pid); in your __exit routine for each of the threads you start. kthread_should_stop() will never return true if you don't ever tell them to stop.

How to read a counter from a linux C program to a bash test script?

I have a large C/C++ program on a Suse linux system. We do automated testing of it with a bash script, which sends input to the program, and reads the output. It's mainly "black-box" testing, but some tests need to know a few internal details to determine if a test has passed.
One test in particular needs to know how times the program runs a certain function (which parses a particular response message). When that function runs it issues a log and increments a counter variable. The automated test currently determines the number of invocations by grepping in the log file for the log message, and counting the number of occurrences before and after the test. This isn't ideal, because the logs (syslog-ng) aren't guaranteed, and they're frequently turned off by configuration, because they're basically debug logs.
I'm looking for a better alternative. I can change the program to enhance the testability, but it shouldn't be heavy impact to normal operation. My first thought was, I could just read the counter after each test. Something like this:
gdb --pid=$PID --batch -ex "p numServerResponseX"
That's slow when it runs, but it's good because the program doesn't need to be changed at all. With a little work, I could probably write a ptrace command to do this a little more efficiently.
But I'm wondering if there isn't a simpler way to do this. Could I write the counter to shared memory (with shm_open / mmap), and then read /dev/shm in the bash script? Is there some simpler way I could setup the counter to make it easy to read, without making it slow to increment?
Edit:
Details: The test setup is like this:
testScript <-> sipp <-> programUnderTest <-> externalServer
The bash testScript injects sip messages with sipp, and it generally determines success or failure based on the completion code from sipp. But in certain tests it needs to know the number of responses the program received from the external server. The function "processServerResponseX" processes certain responses from the external server. During the testing there isn't much traffic running, so the function is only invoked perhaps 20 times over 10 seconds. When each test ends and we want to check the counter, there should be essentially no traffic. However during normal operation, it might be invoked hundreds of times a second. The function is roughly:
unsigned long int numServerResponseX;
int processServerResponseX(DMsg_t * dMsg, AppId id)
{
if (DEBUG_ENABLED)
{
syslog(priority, "%s received %d", __func__, (int) id);
}
myMutex->getLock();
numServerResponseX++;
doLockedStuff(dMsg, id);
myMutex->releaseLock();
return doOtherStuff(dMsg, id);
}
The script currently does:
grep processServerResponseX /var/log/logfile | wc -l
and compares the value before and after. My goal is to have this work even if DEBUG_ENABLED is false, and not have it be too slow. The program is multi-threaded, and it runs on an i86_64 smp machine, so adding any long blocking function would not be a good solution.
I would have that certain function "(which parses a particular response message)" write (probably using fopen then fprintf then fclose) some textual data somewhere.
That destination could be a FIFO (see fifo(7) ...) or a temporary file in a tmpfs file system (which is a RAM file system), maybe /run/
If your C++ program is big and complex enough, you could consider adding some probing facilities (some means for an external program to query about the internal state of your C++ program) e.g. a dedicated web service (using libonion in a separate thread), or some interface to systemd, or to D-bus, or some remote procedure call service like ONC/RPC, JSON-RPC, etc etc...
You might be interested by POCOlib. Perhaps its logging framework should interest you.
As you mentioned, you might use Posix shared memory & semaphores (see shm_overview(7) and sem_overview(7) ...).
Perhaps the Linux specific eventfd(2) is what you need.... (you could code a tiny C program to be invoked by your testing bash scripts....)
You could also try to change the command line (I forgot how to do that, maybe libproc or write to /proc/self/cmdline see proc(5)...). Then ps would show it.
I personally do usually use the methods Basile Starynkevitch outlined for this, but I wanted to bring up an alternative method using realtime signals.
I am not claiming this is the best solution, but it is simple to implement and has very little overhead. The main downside is that the size of the request and response are both limited to one int (or technically, anything representable by an int or by a void *).
Basically, you use a simple helper program to send a signal to the application. The signal has a payload of one int your application can examine, and based on it, the application responds by sending the same signal back to the originator, with an int of its own as payload.
If you don't need any locking, you can use a simple realtime signal handler. When it catches a signal, it examines the siginfo_t structure. If sent via sigqueue(), the request is in the si_value member of the siginfo_t structure. The handler answers to the originating process (si_pid member of the structure) using sigqueue(), with the response. This only requires about sixty lines of code to be added to your application. Here is an example application, app1.c:
#define _POSIX_C_SOURCE 200112L
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <stdio.h>
#define INFO_SIGNAL (SIGRTMAX-1)
/* This is the counter we're interested in */
static int counter = 0;
static void responder(int signum, siginfo_t *info,
void *context __attribute__((unused)))
{
if (info && info->si_code == SI_QUEUE) {
union sigval value;
int response, saved_errno;
/* We need to save errno, to avoid interfering with
* the interrupted thread. */
saved_errno = errno;
/* Incoming signal value (int) determines
* what we respond back with. */
switch (info->si_value.sival_int) {
case 0: /* Request loop counter */
response = *(volatile int *)&counter;
break;
/* Other codes? */
default: /* Respond with -1. */
response = -1;
}
/* Respond back to signaler. */
value.sival_ptr = (void *)0L;
value.sival_int = response;
sigqueue(info->si_pid, signum, value);
/* Restore errno. This way the interrupted thread
* will not notice any change in errno. */
errno = saved_errno;
}
}
static int install_responder(const int signum)
{
struct sigaction act;
sigemptyset(&act.sa_mask);
act.sa_sigaction = responder;
act.sa_flags = SA_SIGINFO;
if (sigaction(signum, &act, NULL))
return errno;
else
return 0;
}
int main(void)
{
if (install_responder(INFO_SIGNAL)) {
fprintf(stderr, "Cannot install responder signal handler: %s.\n",
strerror(errno));
return 1;
}
fprintf(stderr, "PID = %d\n", (int)getpid());
fflush(stderr);
/* The application follows.
* This one just loops at 100 Hz, printing a dot
* about once per second or so. */
while (1) {
struct timespec t;
counter++;
if (!(counter % 100)) {
putchar('.');
fflush(stdout);
}
t.tv_sec = 0;
t.tv_nsec = 10000000; /* 10ms */
nanosleep(&t, NULL);
/* Note: Since we ignore the remainder
* from the nanosleep call, we
* may sleep much shorter periods
* when a signal is delivered. */
}
return 0;
}
The above responder responds to query 0 with the counter value, and with -1 to everything else. You can add other queries simply by adding a suitable case statement in responder().
Note that locking primitives (except for sem_post()) are not async-signal safe, and thus should not be used in a signal handler. So, the above code cannot implement any locking.
Signal delivery can interrupt a thread in a blocking call. In the above application, the nanosleep() call is usually interrupted by the signal delivery, causing the sleep to be cut short. (Similarly, read() and write() calls may return -1 with errno == EINTR, if they were interrupted by signal delivery.)
If that is a problem, or you are not sure if all your code handles errno == EINTR correctly, or your counters need locking, you can use separate thread dedicated for the signal handling instead.
The dedicated thread will sleep unless a signal is delivered, and only requires a very small stack, so it really does not consume any significant resources at run time.
The target signal is blocked in all threads, with the dedicated thread waiting in sigwaitinfo(). If it catches any signals, it processes them just like above -- except that since this is a thread and not a signal handler per se, you can freely use any locking etc., and do not need to limit yourself to async-signal safe functions.
This threaded approach is slightly longer, adding almost a hundred lines of code to your application. (The differences are contained in the responder() and install_responder() functions; even the code added to main() is exactly the same as in app1.c.)
Here is app2.c:
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <time.h>
#include <stdio.h>
#define INFO_SIGNAL (SIGRTMAX-1)
/* This is the counter we're interested in */
static int counter = 0;
static void *responder(void *payload)
{
const int signum = (long)payload;
union sigval response;
sigset_t sigset;
siginfo_t info;
int result;
/* We wait on only one signal. */
sigemptyset(&sigset);
if (sigaddset(&sigset, signum))
return NULL;
/* Wait forever. This thread is automatically killed, when the
* main thread exits. */
while (1) {
result = sigwaitinfo(&sigset, &info);
if (result != signum) {
if (result != -1 || errno != EINTR)
return NULL;
/* A signal was delivered using *this* thread. */
continue;
}
/* We only respond to sigqueue()'d signals. */
if (info.si_code != SI_QUEUE)
continue;
/* Clear response. We don't leak stack data! */
memset(&response, 0, sizeof response);
/* Question? */
switch (info.si_value.sival_int) {
case 0: /* Counter */
response.sival_int = *(volatile int *)(&counter);
break;
default: /* Unknown; respond with -1. */
response.sival_int = -1;
}
/* Respond. */
sigqueue(info.si_pid, signum, response);
}
}
static int install_responder(const int signum)
{
pthread_t worker_id;
pthread_attr_t attrs;
sigset_t mask;
int retval;
/* Mask contains only signum. */
sigemptyset(&mask);
if (sigaddset(&mask, signum))
return errno;
/* Block signum, in all threads. */
if (sigprocmask(SIG_BLOCK, &mask, NULL))
return errno;
/* Start responder() thread with a small stack. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 32768);
retval = pthread_create(&worker_id, &attrs, responder,
(void *)(long)signum);
pthread_attr_destroy(&attrs);
return errno = retval;
}
int main(void)
{
if (install_responder(INFO_SIGNAL)) {
fprintf(stderr, "Cannot install responder signal handler: %s.\n",
strerror(errno));
return 1;
}
fprintf(stderr, "PID = %d\n", (int)getpid());
fflush(stderr);
while (1) {
struct timespec t;
counter++;
if (!(counter % 100)) {
putchar('.');
fflush(stdout);
}
t.tv_sec = 0;
t.tv_nsec = 10000000; /* 10ms */
nanosleep(&t, NULL);
}
return 0;
}
For both app1.c and app2.c the application itself is the same.
The only modifications needed to the application are making sure all the necessary header files get #included, adding responder() and install_responder(), and a call to install_responder() as early as possible in main().
(app1.c and app2.c only differ in responder() and install_responder(); and in that app2.c needs pthreads.)
Both app1.c and app2.c use the signal SIGRTMAX-1, which should be unused in most applications.
app2.c approach, also has a useful side-effect you might wish to use in general: if you use other signals in your application, but don't want them to interrupt blocking I/O calls et cetera -- perhaps you have a library that was written by a third party, and does not handle EINTR correctly, but you do need to use signals in your application --, you can simply block the signals after the install_responder() call in your application. The only thread, then, where the signals are not blocked is the responder thread, and the kernel will use tat to deliver the signals. Therefore, the only thread that will ever get interrupted by the signal delivery is the responder thread, more specifically sigwaitinfo() in responder(), and it ignores any interruptions. If you use for example async I/O or timers, or this is a heavy math or data processing application, this might be useful.
Both application implementations can be queried using a very simple query program, query.c:
#define _POSIX_C_SOURCE 200112L
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <stdio.h>
int query(const pid_t process, const int signum,
const int question, int *const response)
{
sigset_t prevmask, waitset;
struct timespec timeout;
union sigval value;
siginfo_t info;
int result;
/* Value sent to the target process. */
value.sival_int = question;
/* Waitset contains only signum. */
sigemptyset(&waitset);
if (sigaddset(&waitset, signum))
return errno = EINVAL;
/* Block signum; save old mask into prevmask. */
if (sigprocmask(SIG_BLOCK, &waitset, &prevmask))
return errno;
/* Send the signal. */
if (sigqueue(process, signum, value)) {
const int saved_errno = errno;
sigprocmask(signum, &prevmask, NULL);
return errno = saved_errno;
}
while (1) {
/* Wait for a response within five seconds. */
timeout.tv_sec = 5;
timeout.tv_nsec = 0L;
/* Set si_code to an uninteresting value,
* just to be safe. */
info.si_code = SI_KERNEL;
result = sigtimedwait(&waitset, &info, &timeout);
if (result == -1) {
/* Some other signal delivered? */
if (errno == EINTR)
continue;
/* No response; fail. */
sigprocmask(SIG_SETMASK, &prevmask, NULL);
return errno = ETIMEDOUT;
}
/* Was this an interesting signal? */
if (result == signum && info.si_code == SI_QUEUE) {
if (response)
*response = info.si_value.sival_int;
/* Return success. */
sigprocmask(SIG_SETMASK, &prevmask, NULL);
return errno = 0;
}
}
}
int main(int argc, char *argv[])
{
pid_t pid;
int signum, question, response;
long value;
char dummy;
if (argc < 3 || argc > 4 ||
!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s PID SIGNAL [ QUERY ]\n", argv[0]);
fprintf(stderr, "\n");
return 1;
}
if (sscanf(argv[1], " %ld %c", &value, &dummy) != 1) {
fprintf(stderr, "%s: Invalid process ID.\n", argv[1]);
return 1;
}
pid = (pid_t)value;
if (pid < (pid_t)1 || value != (long)pid) {
fprintf(stderr, "%s: Invalid process ID.\n", argv[1]);
return 1;
}
if (sscanf(argv[2], "SIGRTMIN %ld %c", &value, &dummy) == 1)
signum = SIGRTMIN + (int)value;
else
if (sscanf(argv[2], "SIGRTMAX %ld %c", &value, &dummy) == 1)
signum = SIGRTMAX + (int)value;
else
if (sscanf(argv[2], " %ld %c", &value, &dummy) == 1)
signum = value;
else {
fprintf(stderr, "%s: Unknown signal.\n", argv[2]);
return 1;
}
if (signum < SIGRTMIN || signum > SIGRTMAX) {
fprintf(stderr, "%s: Not a realtime signal.\n", argv[2]);
return 1;
}
/* Clear the query union. */
if (argc > 3) {
if (sscanf(argv[3], " %d %c", &question, &dummy) != 1) {
fprintf(stderr, "%s: Invalid query.\n", argv[3]);
return 1;
}
} else
question = 0;
if (query(pid, signum, question, &response)) {
switch (errno) {
case EINVAL:
fprintf(stderr, "%s: Invalid signal.\n", argv[2]);
return 1;
case EPERM:
fprintf(stderr, "Signaling that process was not permitted.\n");
return 1;
case ESRCH:
fprintf(stderr, "No such process.\n");
return 1;
case ETIMEDOUT:
fprintf(stderr, "No response.\n");
return 1;
default:
fprintf(stderr, "Failed: %s.\n", strerror(errno));
return 1;
}
}
printf("%d\n", response);
return 0;
}
Note that I did not hardcode the signal number here; use SIGRTMAX-1 on the command line for app1.c and app2.c. (You can change it. query.c does understand SIGRTMIN+n too. You must use a realtime signal, SIGRTMIN+0 to SIGRTMAX-0, inclusive.)
You can compile all three programs using
gcc -Wall -O3 app1.c -o app1
gcc -Wall -O3 app2.c -lpthread -o app2
gcc -Wall -O3 query.c -o query
Both ./app1 and ./app2 print their PIDs, so you don't need to look for it. (You can find the PID using e.g. ps -o pid= -C app1 or ps -o pid= -C app2, though.)
If you run ./app1 or ./app2 in one shell (or both in separate shells), you can see them outputting the dots at about once per second. The counter increases every 1/100th of a second. (Press Ctrl+C to stop.)
If you run ./query PID SIGRTMAX-1 in another shell in the same directory on the same machine, you can see the counter value.
An example run on my machine:
A$ ./app1
PID = 28519
...........
B$ ./query 28519 SIGRTMAX-1
11387
C$ ./app2
PID = 28522
...
B$ ./query 28522 SIGRTMAX -1
371
As mentioned, the downside of this mechanism is that the response is limited to one int (or technically an int or a void *). There are ways around that, however, by also using some of the methods Basile Starynkevich outlined. Typically, the signal is then just a notification for the application that it should update the state stored in a file, shared memory segment, or wherever. I recommend using the dedicated thread approach for that, as it has very little overheads, and minimal impact on the application itself.
Any questions?
A hard-coded systemtap solution could look like:
% cat FOO.stp
global counts
probe process("/path/to/your/binary").function("CertainFunction") { counts[pid()] <<< 1 }
probe process("/path/to/your/binary").end { println ("pid %d count %sd", pid(), #count(counts[pid()]))
delete counts[pid()] }
# stap FOO.stp
pid 42323 count 112
pid 2123 count 0
... etc, until interrupted
Thanks for the responses. There is lots of good information in the other answers. However, here's what I did. First I tweaked the program to add a counter in a shm file:
struct StatsCounter {
char counterName[8];
unsigned long int counter;
};
StatsCounter * stats;
void initStatsCounter()
{
int fd = shm_open("TestStats", O_RDWR|O_CREAT, 0);
if (fd == -1)
{
syslog(priority, "%s:: Initialization Failed", __func__);
stats = (StatsCounter *) malloc(sizeof(StatsCounter));
}
else
{
// For now, just one StatsCounter is used, but it could become an array.
ftruncate(fd, sizeof(StatsCounter));
stats = (StatsCounter *) mmap(NULL, sizeof(StatsCounter),
PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
}
// Initialize names. Pad them to 7 chars (save room for \0).
snprintf(stats[0].counterName, sizeof(stats[0].counterName), "nRespX ");
stats[0].counter = 0;
}
And changed processServerResponseX to increment stats[0].counter in the locked section. Then I changed the script to parse the shm file with "hexdump":
hexdump /dev/shm/TestStats -e ' 1/8 "%s " 1/8 "%d\n"'
This will then show something like this:
nRespX 23
This way I can extend this later if I want to also look at response Y, ...
Not sure if there are mutual exclusion problems with hexdump if it accessed the file while it was being changed. But in my case, I don't think it matters, because the script only calls it before and after the test, it should not be in the middle of an update.

Execute/invoke user-space program, and get its pid, from a kernel module

I checked out Kernel APIs, Part 1: Invoking user - space applications from the kernel, and Executing a user-space function from the kernel space - Stack Overflow - and here is a small kernel module, callmodule.c, demonstrating that:
// http://people.ee.ethz.ch/~arkeller/linux/code/usermodehelper.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>
static int __init callmodule_init(void)
{
int ret = 0;
char userprog[] = "/path/to/mytest";
char *argv[] = {userprog, "2", NULL };
char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
printk("callmodule: init %s\n", userprog);
/* last parameter: 1 -> wait until execution has finished, 0 go ahead without waiting*/
/* returns 0 if usermode process was started successfully, errorvalue otherwise*/
/* no possiblity to get return value of usermode process*/
ret = call_usermodehelper(userprog, argv, envp, UMH_WAIT_EXEC);
if (ret != 0)
printk("error in call to usermodehelper: %i\n", ret);
else
printk("everything all right\n");
return 0;
}
static void __exit callmodule_exit(void)
{
printk("callmodule: exit\n");
}
module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
... with Makefile:
obj-m += callmodule.o
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
When I run this via sudo insmod ./callmodule.ko && sudo rmmod callmodule, I get in /var/log/syslog:
Feb 10 00:42:45 mypc kernel: [71455.260355] callmodule: init /path/to/mytest
Feb 10 00:42:45 mypc kernel: [71455.261218] everything all right
Feb 10 00:42:45 mypc kernel: [71455.286131] callmodule: exit
... which apparently means all went fine. (Using Linux 2.6.38-16-generic #67-Ubuntu SMP)
My question is - how can I get the PID of the process instantiated from a kernel module? Is there a similar process other than call_usermodehelper, that will allow me to instantiate a user-space process in kernel space, and obtain its pid?
Note that it may not be possible to use call_usermodehelper and get the instantiated process PID:
Re: call_usermodehelper's pid ? — Linux Kernel Newbies
I want to create a user space process from within
a kernel module, and be able to kill it, send signals
to it, etc...
can I know its pid ?
No, you can't. But since inside the implementation the pid is known,
patch that makes it available would not be too hard (note, that errors
are always negative in kernel and pids are positive, limited to 2**16).
You would have to modify all callers that expect 0 on success though.
I poked around the sources a bit, and it seems ultimately there is a call chain: call_usermodehelper -> call_usermodehelper_setup -> __call_usermodehelper, which looks like:
static void __call_usermodehelper(struct work_struct *work)
{
struct subprocess_info *sub_info =
container_of(work, struct subprocess_info, work);
// ...
if (wait == UMH_WAIT_PROC)
pid = kernel_thread(wait_for_helper, sub_info,
CLONE_FS | CLONE_FILES | SIGCHLD);
else
pid = kernel_thread(____call_usermodehelper, sub_info,
CLONE_VFORK | SIGCHLD);
...
... so a PID of a kernel thread is used, but it is saved nowhere; also, neither work_struct nor subprocess_info have a pid field (task_struct does, but nothing here seems to use task_struct). Recording this pid would require changing the kernel sources - and as I'd like to avoid that, this is the reason why I'm also interested in approaches other than call_usermodehelper ...
A tentative answer from my understanding of the implementation in kmod.c.
If you look at the code of call_usermodehelper you will see it calls call_usermodehelper_setup and then call_usermodehelper_exec.
call_usermodehelper_setup takes as parameter an init function that will be executed just before the do_execve. I believe the value of current when the init function gets executed will get you the task_struct of the user process.
So to get the pid you will need to:
Copy the implementation of call_usermodehelper in your code.
Define an init function that you will pass as parameter to call_usermodehelper_setup.
In the init function retrieves the current task_struct and in turn the PID.
Well, this was tedious... Below is a rather hacky way to achieve this, at least on my platform, as callmodule.c (same Makefile as above can be used). As I cannot believe that this is the way this should be done, more proper answers are still welcome (hopefully, also, with a code example I could test). But at least, it does the job as a kernel module only - without the need to patch the kernel itself - for the 2.6.38 version, which was quite important to me.
Basically, I copied all functions (renamed with a "B" suffix), until the point where the PID is available. Then I use a copy of subprocess_info with an extra field to save it (although that is not strictly necessary: in order not to mess with function signatures in respect to return value, I have to save the pid as a global variable anyway; left it as an exercise). Now, when I run sudo insmod ./callmodule.ko && sudo rmmod callmodule, in /var/log/syslog I get:
Feb 10 18:53:02 mypc kernel: [ 2942.891886] callmodule: > init /path/to/mytest
Feb 10 18:53:02 mypc kernel: [ 2942.891912] callmodule: symbol # 0xc1065b60 is wait_for_helper+0x0/0xb0
Feb 10 18:53:02 mypc kernel: [ 2942.891923] callmodule: symbol # 0xc1065ed0 is ____call_usermodehelper+0x0/0x90
Feb 10 18:53:02 mypc kernel: [ 2942.891932] callmodule:a: pid 0
Feb 10 18:53:02 mypc kernel: [ 2942.891937] callmodule:b: pid 0
Feb 10 18:53:02 mypc kernel: [ 2942.893491] callmodule: : pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894474] callmodule:c: pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894483] callmodule: everything all right; pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894494] callmodule: pid task a: ec401940 c: mytest p: [23306] s: runnable
Feb 10 18:53:02 mypc kernel: [ 2942.894502] callmodule: parent task a: f40aa5e0 c: kworker/u:1 p: [14] s: stopped
Feb 10 18:53:02 mypc kernel: [ 2942.894510] callmodule: - mytest [23306]
Feb 10 18:53:02 mypc kernel: [ 2942.918500] callmodule: < exit
One of the nasty problems here, is that once you start copying functions, at a certain time you come to a point, where kernel functions are used which are not exported, like in this case wait_for_helper. What I did was basically look in /proc/kallsyms (remember sudo!) to get absolute addresses for e.g. wait_for_helper, then hardcoded those in the kernel module as function pointers - seems to work. Another problem is that functions in kernel source refer to enum umh_wait, which cannot be used as argument from the module (those need to simply be converted to use int instead).
So the module starts the user-space process, gets the PID (noting that "What the kernel calls PIDs are actually kernel-level thread ids (often called TIDs) ... What's considered a PID in the POSIX sense of "process", on the other hand, is called a "thread group ID" or "TGID" in the kernel."), gets the corresponding task_struct and its parent, and tries to list all children of the parent and of the spawned process itself. So I can see that kworker/u:1 is typically the parent, and it has no other children than mytest - and since mytest is very simple (in my case, just a single write to disk file), it spawns no threads of its own, so it has no children either.
I encountered a couple of Oopses which required a reboot - I think they are solved now, but just in case, caveat emptor.
Here is the callmodule.c code (with some notes/links at end):
// callmodule.c with pid, url: https://stackoverflow.com/questions/21668727/
#include <linux/module.h>
#include <linux/slab.h> //kzalloc
#include <linux/syscalls.h> // SIGCHLD, ... sys_wait4, ...
#include <linux/kallsyms.h> // kallsyms_lookup, print_symbol
// global variable - to avoid intervening too much in the return of call_usermodehelperB:
static int callmodule_pid;
// >>>>>>>>>>>>>>>>>>>>>>
// modified kernel functions - taken from
// http://lxr.missinglinkelectronics.com/linux+v2.6.38/+save=include/linux/kmod.h
// http://lxr.linux.no/linux+v2.6.38/+save=kernel/kmod.c
// define a modified struct (with extra pid field) here:
struct subprocess_infoB {
struct work_struct work;
struct completion *complete;
char *path;
char **argv;
char **envp;
int wait; //enum umh_wait wait;
int retval;
int (*init)(struct subprocess_info *info);
void (*cleanup)(struct subprocess_info *info);
void *data;
pid_t pid;
};
// forward declare:
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask);
static inline int
call_usermodehelper_fnsB(char *path, char **argv, char **envp,
int wait, //enum umh_wait wait,
int (*init)(struct subprocess_info *info),
void (*cleanup)(struct subprocess_info *), void *data)
{
struct subprocess_info *info;
struct subprocess_infoB *infoB;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
int ret;
populate_rootfs_wait(); // is in linux-headers-2.6.38-16-generic/include/linux/kmod.h
infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
info = (struct subprocess_info *) infoB;
if (info == NULL)
return -ENOMEM;
call_usermodehelper_setfns(info, init, cleanup, data);
printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);
// this must be called first, before infoB->pid is populated (by __call_usermodehelperB):
ret = call_usermodehelper_exec(info, wait);
// assign global pid here, so rest of the code has it:
callmodule_pid = infoB->pid;
printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);
return ret;
}
static inline int
call_usermodehelperB(char *path, char **argv, char **envp, int wait) //enum umh_wait wait)
{
return call_usermodehelper_fnsB(path, argv, envp, wait,
NULL, NULL, NULL);
}
/* This is run by khelper thread */
static void __call_usermodehelperB(struct work_struct *work)
{
struct subprocess_infoB *sub_infoB =
container_of(work, struct subprocess_infoB, work);
int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
pid_t pid;
struct subprocess_info *sub_info;
// hack - declare function pointers, to use for wait_for_helper/____call_usermodehelper
int (*ptrwait_for_helper)(void *data);
int (*ptr____call_usermodehelper)(void *data);
// assign function pointers to verbatim addresses as obtained from /proc/kallsyms
ptrwait_for_helper = (void *)0xc1065b60;
ptr____call_usermodehelper = (void *)0xc1065ed0;
sub_info = (struct subprocess_info *)sub_infoB;
/* CLONE_VFORK: wait until the usermode helper has execve'd
* successfully We need the data structures to stay around
* until that is done. */
if (wait == UMH_WAIT_PROC)
pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
CLONE_FS | CLONE_FILES | SIGCHLD);
else
pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
CLONE_VFORK | SIGCHLD);
printk(KBUILD_MODNAME ": : pid %d\n", pid);
// grab and save the pid here:
sub_infoB->pid = pid;
switch (wait) {
case UMH_NO_WAIT:
call_usermodehelper_freeinfo(sub_info);
break;
case UMH_WAIT_PROC:
if (pid > 0)
break;
/* FALLTHROUGH */
case UMH_WAIT_EXEC:
if (pid < 0)
sub_info->retval = pid;
complete(sub_info->complete);
}
}
/**
* call_usermodehelper_setup - prepare to call a usermode helper
*/
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask)
{
struct subprocess_infoB *sub_infoB;
sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
if (!sub_infoB)
goto out;
INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
sub_infoB->path = path;
sub_infoB->argv = argv;
sub_infoB->envp = envp;
out:
return sub_infoB;
}
// <<<<<<<<<<<<<<<<<<<<<<
static int __init callmodule_init(void)
{
int ret = 0;
char userprog[] = "/path/to/mytest";
char *argv[] = {userprog, "2", NULL };
char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
struct task_struct *p;
struct task_struct *par;
struct task_struct *pc;
struct list_head *children_list_head;
struct list_head *cchildren_list_head;
char *state_str;
printk(KBUILD_MODNAME ": > init %s\n", userprog);
/* last parameter: 1 -> wait until execution has finished, 0 go ahead without waiting*/
/* returns 0 if usermode process was started successfully, errorvalue otherwise*/
/* no possiblity to get return value of usermode process*/
// note - only one argument allowed for print_symbol
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC);
if (ret != 0)
printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
else
printk(KBUILD_MODNAME ": everything all right; pid %d\n", callmodule_pid);
// find the task:
// note: sometimes p may end up being NULL here, causing kernel oops -
// just exit prematurely in that case
rcu_read_lock();
p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
rcu_read_unlock();
if (p == NULL) {
printk(KBUILD_MODNAME ": p is NULL - exiting\n");
return 0;
}
// p->comm should be the command/program name (as per userprog)
// (out here that task is typically in runnable state)
state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
p, p->comm, p->pid, state_str);
// find parent task:
// parent task could typically be: c: kworker/u:1 p: [14] s: stopped
par = p->parent;
if (par == NULL) {
printk(KBUILD_MODNAME ": par is NULL - exiting\n");
return 0;
}
state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
par, par->comm, par->pid, state_str);
// iterate through parent's (and our task's) child processes:
rcu_read_lock(); // read_lock(&tasklist_lock);
list_for_each(children_list_head, &par->children){
p = list_entry(children_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - %s [%d] \n", p->comm, p->pid);
// note: trying to print "%p",p here results with oops/segfault:
// printk(KBUILD_MODNAME ": - %s [%d] %p\n", p->comm, p->pid, p);
if (p->pid == callmodule_pid) {
list_for_each(cchildren_list_head, &p->children){
pc = list_entry(cchildren_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
}
}
}
rcu_read_unlock(); //~ read_unlock(&tasklist_lock);
return 0;
}
static void __exit callmodule_exit(void)
{
printk(KBUILD_MODNAME ": < exit\n");
}
module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
/*
NOTES:
// assign function pointers to verbatim addresses as obtained from /proc/kallsyms:
// ( cast to void* to avoid "warning: assignment makes pointer from integer without a cast",
// see also https://stackoverflow.com/questions/3941793/what-is-guaranteed-about-the-size-of-a-function-pointer )
// $ sudo grep 'wait_for_helper\|____call_usermodehelper' /proc/kallsyms
// c1065b60 t wait_for_helper
// c1065ed0 t ____call_usermodehelper
// protos:
// static int wait_for_helper(void *data)
// static int ____call_usermodehelper(void *data)
// see also:
// http://www.linuxforu.com/2012/02/function-pointers-and-callbacks-in-c-an-odyssey/
// from include/linux/kmod.h:
//~ enum umh_wait {
//~ UMH_NO_WAIT = -1, /* don't wait at all * /
//~ UMH_WAIT_EXEC = 0, /* wait for the exec, but not the process * /
//~ UMH_WAIT_PROC = 1, /* wait for the process to complete * /
//~ };
// however, note:
// /usr/src/linux-headers-2.6.38-16-generic/include/linux/kmod.h:
// #define UMH_NO_WAIT 0 ; UMH_WAIT_EXEC 1 ; UMH_WAIT_PROC 2 ; UMH_KILLABLE 4 !
// those defines end up here, regardless of the enum definition above
// (NB: 0,1,2,4 enumeration starts from kmod.h?v=3.4 on lxr.free-electrons.com !)
// also, note, in "generic" include/, prototypes of call_usermodehelper(_fns)
// use int wait, and not enum umh_wait wait ...
// seems these cannot be used from a module, nonetheless:
//~ extern int wait_for_helper(void *data);
//~ extern int ____call_usermodehelper(void *data);
// we probably would have to (via http://www.linuxconsulting.ro/pidwatcher/)
// edit /usr/src/linux/kernel/ksyms.c and add:
//EXPORT_SYMBOL(wait_for_helper);
// but that is kernel re-compilation...
// https://stackoverflow.com/questions/19360298/triggering-user-space-with-kernel
// You should not be using PIDs to identify processes within the kernel. The process can exit and a different process re-use that PID. Instead, you should be using a pointer to the task_struct for the process (rather than storing current->pid at registration time, just store current)
# reports task name from the pid (pid_task(find_get_pid(..)):
http://tuxthink.blogspot.dk/2012/07/module-to-find-task-from-its-pid.html
// find the task:
//~ rcu_read_lock();
// uprobes uses this - but find_task_by_pid is not exported for modules:
//~ p = find_task_by_pid(callmodule_pid);
//~ if (p)
//~ get_task_struct(p);
//~ rcu_read_unlock();
// see: [http://www.gossamer-threads.com/lists/linux/kernel/1260996 find_task_by_pid() problem | Linux | Kernel]
// https://stackoverflow.com/questions/18408766/make-a-system-call-to-get-list-of-processes
// this macro loops through *all* processes; our callmodule_pid should be listed by it
//~ for_each_process(p)
//~ pr_info("%s [%d]\n", p->comm, p->pid);
// [https://lists.debian.org/debian-devel/2008/05/msg00034.html Re: problems for making kernel module]
// note - WARNING: "tasklist_lock" ... undefined; because tasklist_lock removed in 2.6.1*:
// "tasklist_lock protects the kernel internal task list. Modules have no business looking at it";
// https://stackoverflow.com/questions/13002444/list-all-threads-within-the-current-process
// "all methods that loop over the task lists need to be wrapped in rcu_read_lock(); / rcu_read_unlock(); to be correct."
// https://stackoverflow.com/questions/19208487/kernel-module-that-iterates-over-all-tasks-using-depth-first-tree
// https://stackoverflow.com/questions/5728592/how-can-i-get-the-children-process-list-in-kernel-code
// https://stackoverflow.com/questions/1446239/traversing-task-struct-children-in-linux-kernel
// https://stackoverflow.com/questions/8207160/kernel-how-to-iterate-the-children-of-the-current-process
// https://stackoverflow.com/questions/10262017/linux-kernel-list-list-head-init-vs-init-list-head
// https://stackoverflow.com/questions/16230524/explain-list-for-each-entry-and-list-for-each-entry-safe "list_entry is just an alias for container_of"
*/

Resources