Efficient way to find task_struct by pid - c

Is there an efficient way of finding the task_struct for a specified PID, without iterating through the task_struct list?

What's wrong with using one of the following?
extern struct task_struct *find_task_by_vpid(pid_t nr);
extern struct task_struct *find_task_by_pid_ns(pid_t nr,
struct pid_namespace *ns);

If you want to find the task_struct from a module, find_task_by_vpid(pid_t nr) etc. are not going to work since these functions are not exported.
In a module, you can use the following function instead:
pid_task(find_vpid(pid), PIDTYPE_PID);

There is a better way to get the task_struct from a module.
Prefer the kernel's wrapper/helper routines: they are designed to handle details a driver programmer might miss, such as error handling and condition checks.
/* pid is a struct pid *; this returns a struct task_struct * (or NULL) */
taskp = get_pid_task(pid, PIDTYPE_PID);
To get the struct pid * for a numeric PID (a pid_t), use:
find_get_pid(pid_no);
You don't need to call rcu_read_lock() and rcu_read_unlock() yourself around these APIs: get_pid_task() takes the RCU read lock internally around pid_task() and takes a reference on the task with get_task_struct(), so the returned pointer stays valid after the function returns. That's why I recommend using these wrappers. Both functions take a reference, which you release with put_task_struct() and put_pid() when you are done.
Here are get_pid_task() and find_get_pid() from the kernel source:
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{
struct task_struct *result;
rcu_read_lock();
result = pid_task(pid, type);
if (result)
get_task_struct(result);
rcu_read_unlock();
return result;
}
EXPORT_SYMBOL_GPL(get_pid_task);
struct pid *find_get_pid(pid_t nr)
{
struct pid *pid;
rcu_read_lock();
pid = get_pid(find_vpid(nr));
rcu_read_unlock();
return pid;
}
EXPORT_SYMBOL_GPL(find_get_pid);
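In a kernel module, you can combine the two wrappers. Keep the intermediate struct pid so that its reference can be dropped:
struct pid *pid_struct = find_get_pid(PID);
taskp = get_pid_task(pid_struct, PIDTYPE_PID);
put_pid(pid_struct); /* drop the reference taken by find_get_pid() */
Call put_task_struct(taskp) once you are done with a non-NULL taskp.
PS: for more information on these APIs, look at kernel/pid.c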

No one mentioned that pid_task(), and the pointer you obtain from it, must be used inside an RCU read-side critical section, because it uses an RCU-protected data structure. Otherwise there can be a use-after-free bug.
There are lots of cases of using pid_task() in Linux kernel sources (e.g. in posix_timer_event()).
For example:
rcu_read_lock();
/* search through the global namespace */
task = pid_task(find_pid_ns(pid_num, &init_pid_ns), PIDTYPE_PID);
if (task)
printk(KERN_INFO "1. pid: %d, state: %#lx\n",
pid_num, task->state); /* valid task dereference */
rcu_read_unlock(); /* after it returns - task pointer becomes invalid! */
if (task)
printk(KERN_INFO "2. pid: %d, state: %#lx\n",
pid_num, task->state); /* may be successful,
* but is buggy (task dereference is INVALID!) */
Find out more about RCU API from Kernel.org
P.S. also you can just use the special API functions like find_task_by_pid_ns() and find_task_by_vpid() under the rcu_read_lock().
The first one is for searching through the particular namespace:
task = find_task_by_pid_ns(pid_num, &init_pid_ns); /* e.g. init namespace */
The second one is for searching through the namespace of current task.

Related

Why does printing the pointer to structure of type pthread, gives us thread ID?

The structure of pthread is as follows. It is taken from https://stuff.mit.edu/afs/sipb/project/pthreads/include/pthread.h
struct pthread {
struct machdep_pthread machdep_data;
enum pthread_state state;
pthread_attr_t attr;
/* Signal interface */
sigset_t sigmask;
sigset_t sigpending;
/* Time until timeout */
struct timespec wakeup_time;
/* Cleanup handlers Link List */
struct pthread_cleanup *cleanup;
/* Join queue for waiting threads */
struct pthread_queue join_queue;
/* Queue thread is waiting on, (mutexes, cond. etc.) */
struct pthread_queue *queue;
/*
* Thread implementations are just multiple queue type implemenations,
* Below are the various link lists currently necessary
* It is possible for a thread to be on multiple, or even all the
* queues at once, much care must be taken during queue manipulation.
*
* The pthread structure must be locked before you can even look at
* the link lists.
*/
struct pthread *pll; /* ALL threads, in any state */
/* struct pthread *rll; Current run queue, before resced */
struct pthread *sll; /* For sleeping threads */
struct pthread *next; /* Standard for mutexes, etc ... */
/* struct pthread *fd_next; For kernel fd operations */
int fd; /* Used when thread waiting on fd */
semaphore lock;
/* Data that doesn't need to be locked */
void *ret;
int error;
const void **specific_data;
};
typedef struct pthread * pthread_t;
Now let's see the following code to print the ID of the thread:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void* calls(void* ptr)
{
// using pthread_self() get current thread id
printf("In function \nthread id = %ld\n", pthread_self());
pthread_exit(NULL);
return NULL;
}
int main()
{
pthread_t thread; // declare thread
pthread_create(&thread, NULL, calls, NULL);
printf("In main \nthread id = %ld\n", thread);
pthread_join(thread, NULL);
return 0;
}
Output in my system is:
In main
thread id = 140289852200704
In function
thread id = 140289852200704
From the pthread.h file (above), pthread is a structure, and thread in the code is a pointer to the structure pthread (since pthread_t is typedef struct pthread *). Why does printing this pointer give us the thread ID?
From the pthread.h file (above), pthread is a structure, and thread in the
code is a pointer to the structure pthread (since pthread_t is typedef
struct pthread *).
To be clear: in that implementation, pthread_t is a pointer-to-structure type. I imagine that's very common for pthreads implementations, but do be careful to avoid mistaking the details of a particular implementation for a general characteristic of the specifications or of all implementations. For example, it could just as well be an integer index in some other implementation, among various other possibilities.
Why does printing this pointer give us the thread
ID?
Because it is the thread ID. And because you're lucky that the undefined behavior arising from printing a pointer with a %ld formatting directive manifested the same way in both places.
You probably have done yourself a disservice by looking under the covers at the definition of your implementation's pthread_t. You don't need to know those details to use pthreads, and in fact they do not help you in the slightest. The type is meant to be treated as opaque.
All you really need to understand to answer the question is that the value written into variable thread by pthread_create() is the created thread's ID, and the value returned by pthread_self() is the calling thread's thread ID. Naturally, each mechanism for obtaining a thread ID yields the same ID for the same thread.
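If you want to check that claim without relying on the representation of pthread_t, the portable comparison is pthread_equal(). The following is a minimal sketch; the struct id_check and the functions compare_ids() and ids_match() are names made up for this illustration. The mutex is there only so the new thread does not read the ID before pthread_create() has stored it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: shared state between the creator and the new thread. */
struct id_check {
    pthread_mutex_t lock;
    pthread_t created;   /* ID written by pthread_create() */
    int match;           /* set to 1 if both IDs compare equal */
};

static void *compare_ids(void *arg)
{
    struct id_check *c = arg;

    /* Blocks until the creator has stored the ID and released the lock. */
    pthread_mutex_lock(&c->lock);
    c->match = pthread_equal(c->created, pthread_self()) != 0;
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Returns 1 if the two IDs are equal, 0 if not, -1 on error. */
int ids_match(void)
{
    struct id_check c;
    int rc;

    c.match = 0;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    pthread_mutex_lock(&c.lock);
    rc = pthread_create(&c.created, NULL, compare_ids, &c);
    pthread_mutex_unlock(&c.lock);
    if (rc != 0) {
        pthread_mutex_destroy(&c.lock);
        return -1;
    }

    rc = pthread_join(c.created, NULL);
    pthread_mutex_destroy(&c.lock);
    return rc == 0 ? c.match : -1;
}
```

Calling ids_match() from any thread should report that the ID obtained through pthread_create() and the one returned by pthread_self() refer to the same thread, without ever printing or casting a pthread_t.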

pthread doesn't let you get the tid?

In Linux, the tid is retrieved with a syscall: gettid().
The pthread object stores the tid in struct pthread:
struct pthread {
...
/* This descriptor's link on the `stack_used' or `__stack_user' list. */
list_t list;
/* Thread ID - which is also a 'is this thread descriptor (and
therefore stack) used' flag. */
pid_t tid;
/* Ununsed. */
pid_t pid_ununsed;
...
};
I believe this structure is stored at the base of the stack? Anyway, every pthread has access to it. For instance, in the pthread_create implementation, it grabs the struct pthread:
struct pthread *self = THREAD_SELF;
So my question is: why is there no pthread_gettid_np() call? Perhaps gettid() is so fast somehow it's negligible? Or maybe the call exists and I just can't find it anywhere?
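For reference, this is roughly how the tid is usually read from user space on Linux. It is a sketch, not a replacement for the missing pthread call: current_tid() is a name chosen here, and it goes through the raw system call because, as far as I know, the gettid() wrapper only appeared in glibc 2.30.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread, obtained with the raw syscall. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread of a process the value equals getpid(); in any other thread it differs, which is what distinguishes the tid from the pid.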

Get Process Info (Current proc, parent proc, oldest child proc)

I have an exercise about adding a system call in the Linux kernel, but I'm struggling to implement it. Below is the description:
The main part of this assignment is to implement a new system call that lets the user determine information about a process, its parent, and its oldest child process. The information about the processes is represented through the following struct:
struct procinfos{
long studentID;
struct proc_info proc;
struct proc_info parent_proc;
struct proc_info oldest_child_proc;
};
Where proc_info is defined as follows:
struct proc_info{
pid_t pid;
char name[16];
};
procinfos contains information of three processes:
proc, the current process or process with PID
parent_proc, the parent of the first process
oldest_child_proc, the oldest child process of the first process
The processes' information is stored in the struct proc_info and contains:
pid, the pid of the process
name, the name of the program which is executed
The prototype of our system call is described below:
To invoke the get_proc_info system call, the user must provide the PID of the process, or −1 for the current process. If the system call finds the process with the given PID, it will get the process' information, put it in the output parameter *info, and return 0. However, if the system call cannot find such a process, it will return EINVAL.
#include <linux/kernel.h>
#include <unistd.h>
/* proc_info must be defined before procinfos, which embeds it by value */
struct proc_info{
pid_t pid;
char name[16];
};
struct procinfos{
long studentID;
struct proc_info proc;
struct proc_info parent_proc;
struct proc_info oldest_child_proc;
};
asmlinkage long sys_get_proc_info(pid_t pid, struct procinfos *info){
// TODO: implement the system call
}
HINT:
To find the current process: look at arch/x86/include/asm/current.h or for simple use macro current (current -> pid).
To find info about each process, look at include/linux/sched.h.
After the trimming process, the time to build the kernel is reduced to about 10 minutes, but that is still a long time to compile. To make the development of the system call as fast as possible, you can use a kernel module to test the system call in advance (Appendix B).
How to implement this system call?
Since I can't help you much with this, let me give you some hints on how to do it:
You can get most of the information about each process by looking at the task_struct data type in include/linux/sched.h. "current" is also a pointer to task_struct, and you can extract useful information from a task_struct. For example:
current->pid gives you the ID of the process
current->parent gives you the parent of the process; parent is also a pointer to task_struct
current->comm gives you the name of the process, as a char array
current->children gives you the list of children of the process. It is a list_head, implemented as a doubly linked list, so you cannot index it directly. The kernel's helpers for walking such lists are in include/linux/list.h (scripts/kconfig/list.h has a smaller copy of the same idea)
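The list_head hint is the least obvious one, so here is a user-space sketch of the pattern. Everything in it is a simplified stand-in written for this illustration (struct mini_task, mini_task_init(), mini_task_add_child() and oldest_child() are hypothetical names); in the kernel you would use list_empty() and list_first_entry() from include/linux/list.h on the real task_struct instead. It assumes children are appended to the tail of the parent's list in creation order, which is my understanding of what fork does, so the first entry is the oldest child.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel's list primitives (simplified). */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of task_struct: only the fields used here. */
struct mini_task {
    int pid;
    struct list_head children;  /* head of this task's child list */
    struct list_head sibling;   /* this task's link in its parent's list */
};

void mini_task_init(struct mini_task *t, int pid)
{
    t->pid = pid;
    t->children.next = t->children.prev = &t->children;  /* empty list */
    t->sibling.next = t->sibling.prev = &t->sibling;
}

/* Append the child at the tail, so the head's next is always the oldest. */
void mini_task_add_child(struct mini_task *parent, struct mini_task *child)
{
    struct list_head *head = &parent->children;
    struct list_head *entry = &child->sibling;

    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Returns the oldest child, or NULL if the task has no children. */
struct mini_task *oldest_child(struct mini_task *parent)
{
    if (parent->children.next == &parent->children)
        return NULL;  /* list is empty */
    /* The list links sibling members, not children members. */
    return container_of(parent->children.next, struct mini_task, sibling);
}
```

The detail that trips people up is in the last line: the parent's children head is linked to each child's sibling member, so the container_of step (or list_first_entry in the kernel) must name sibling, not children.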

Passing a struct by value to another function initializes it to zero

I am trying to create a thread library and my thread is a struct type. I have to follow a certain interface, and in that I need to pass the thread by value. For example, to join on a thread my code is as follows:
int thread_join(thread_t thread, void **status1)
{
printf("Joining thread\n");
long int thId = thread.id;
printf("Thread id: %ld\n", thId);
gtthread_t * thrd = getThreadFromID(thId);
while(thrd->status != EXIT)
{
}
status1 = &(thrd->ret_value);
return 0;
}
And I am passing a struct of type thread_t to this function. My problem is that when I print the thread's ID in the calling function, it is displayed properly, but when I check it in the thread_join function it is displayed as 0. The caller function is as follows:
void* caller(void* arg)
{
thread_t th;
thread_create(&th, some_function, NULL);
thread_join(th, NULL);
while(1);
}
Thread create initializes the ID of the thread to a non-zero value and starts the function associated with it.
My thread structure (and other relevant structure is):
typedef enum
{
RUNNING,
WAITING,
CANCEL,
EXIT
} stat;
//Thread
typedef struct
{
ucontext_t t_ctxt;
long int id;
stat status;
void * ret_value;
int isMain;
} thread_t;
int thread_create(thread_t *thread, void *(*start_routine)(void *), void *arg)
{
thread = (thread_t *)malloc(sizeof(thread_t));
thread->id = ++count;
thread->status = RUNNING;
thread->ret_value = NULL;
thread->isMain = 0;
if(getcontext(&(thread->t_ctxt)) == -1)
handle_error("getcontext");
thread->t_ctxt.uc_stack.ss_sp = malloc(SIGSTKSZ);
thread->t_ctxt.uc_stack.ss_size = SIGSTKSZ;
thread->t_ctxt.uc_link = &sched_ctxt;
makecontext(&thread->t_ctxt, (void (*)(void))wrap_func, 2, (void (*)(void))start_routine, arg);
enqueue(gQ, thread);
printf("Thread id: %ld\n", thread->id);
swapcontext(&(curr_thread->t_ctxt),&sched_ctxt);
return 0;
}
Why does this happen? After all, I am passing by value and this should create a copy of the thread with the same values. Thanks.
EDIT:
Basically I am having a queue of threads and there is a scheduler which round-robins. I can post that code here too but I'm sure that's needless and that code works fine.
EDIT2:
I am making a header file from this code and including that header in another file to test it. All my thread_t variables are static. The caller is a function which includes my header file.
What is this line:
thread = (thread_t *)malloc(sizeof(thread_t));
for?
You pass in to thread_create() an address which refers to a struct thread_t defined in caller() as an auto variable.
With that line, you overwrite the pointer passed in to thread_create() with newly allocated memory, initialise that memory, and forget its address on return.
The code never writes to the memory referenced by the address passed in! Besides this, it is a memory leak.
To fix this simply remove the line of code quoted above.
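The difference can be shown in isolation. This is a cut-down sketch, not the asker's library: demo_thread_t, create_buggy(), create_fixed() and next_id are hypothetical names, and the struct keeps only the id field.

```c
#include <stdlib.h>

/* Hypothetical, cut-down thread record; the real thread_t has more fields. */
typedef struct {
    long id;
} demo_thread_t;

static long next_id = 0;

/* Buggy: rebinds the local parameter, so the caller's struct is untouched. */
int create_buggy(demo_thread_t *thread)
{
    thread = malloc(sizeof *thread);  /* now points at fresh heap memory */
    if (thread == NULL)
        return -1;
    thread->id = ++next_id;           /* written to the heap copy only */
    free(thread);                     /* the original omits this and leaks */
    return 0;
}

/* Fixed: writes through the pointer the caller passed in. */
int create_fixed(demo_thread_t *thread)
{
    thread->id = ++next_id;
    return 0;
}
```

After create_buggy(&th) the caller's th.id is still whatever it was before the call, which matches the 0 seen in thread_join(); after create_fixed(&th) it holds the new ID.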
You have no mutex guard on thread id getter. Presumably, there is no guard on setter. What can be happening is that the variable is not visible in the other thread yet. And, without a critical section, it may never become visible.
Each variable which is accessed for both read and write from different threads has to be accessed in a critical section (pthread_mutex_lock / unlock).
Another possibility is that you are setting the thread id inside the running thread and you are accessing the variable even before it is set. If you attempt to join immediately after starting a thread it is possible, that the other thread hasn't been run at all yet and the variable is not set.
side note: do yourself a favor and use calloc:)
In caller function,
thread_create(&th, some_function, NULL);
should be
gtthread_create(&th, some_function, NULL);

how does current->pid work for linux?

Do I need to include a library?
Can anyone please elaborate in it?
I know current->pid is used to get the process ID of the task it is called from.
But when I try to printk something with current->pid:
printk("My current process id/pid is %d\n", current->pid);
...and is giving me an error
error: dereferencing pointer to incomplete type
You're looking for #include <linux/sched.h>. That's where task_struct is declared.
Your code should work. You are probably missing some header.
current is a per-cpu variable defined in linux/arch/x86/include/asm/current.h (all the code is for the case of x86):
DECLARE_PER_CPU(struct task_struct *, current_task);
static __always_inline struct task_struct *get_current(void)
{
return percpu_read_stable(current_task);
}
#define current get_current()
current points to the task running on a CPU at a given moment. Its type is struct task_struct * (a pointer), and struct task_struct is defined in linux/include/linux/sched.h:
struct task_struct {
...
pid_t pid; // process identifier
pid_t tgid; // process thread group id
...
};
You can browse the code for these files in the Linux Cross Reference:
current.h
sched.h
I think you're looking for the getpid() system call. I don't know what current is though.
