The Linux kernel uses struct pid to represent a PID in kernel space. The C code is below.
struct pid
{
atomic_t count;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
int level;
struct upid numbers[1];
};
I cannot really understand why the member tasks can represent "the lists of tasks". Since task_struct is the kernel's internal representation of a task, and one task uses only one task_struct, why can more than one task share a struct pid?
Because more than one task can be part of the same process. Consider, for example, a multi-threaded process using a 1-to-1 threading library like NPTL. It has a single process ID, is a single process, but consists of multiple entities scheduled by the kernel.
This might be a dumb question; I'm very sorry if that's the case. But I'm struggling to take advantage of the multiple cores in my quad-core MacBook to perform multiple computations at the same time. This is not for any particular project, just a general question, since I want to learn this for when I eventually do need to do this kind of thing.
I am aware of threads, but they seem to run on the same core, so I don't seem to gain any performance using them for compute-bound operations (they are very useful for socket-based stuff, though!).
I'm also aware of processes, which can be created with fork, but I'm not sure they are guaranteed to use more CPU, or whether, like threads, they just help with IO-bound operations.
Finally, I'm aware of CUDA, which allows parallelism on the GPU (and I think OpenCL and compute shaders also allow my code to run on the CPU in parallel), but I'm currently looking for something that will let me take advantage of the multiple CPU cores that my computer has.
In Python, I'm aware of the multiprocessing module, which seems to provide an API very similar to threads, but there I do seem to gain an edge by running multiple functions performing computations in parallel. I'm looking into how I could get this same advantage in C, but I don't seem to be able to.
Any help pointing me in the right direction would be very much appreciated.
Note: I'm trying to achieve true parallelism, not concurrency.
Note 2: I'm only aware of threads and of using multiple processes in C. With threads I don't seem to be able to get the performance boost I want, and I'm not very familiar with processes, so I'm still not sure whether running multiple processes is guaranteed to give me the advantage I'm looking for.
Here is a simple program to heat up your CPU (100% utilization of all available cores).
Hint: the thread start function never returns; exit the program with [CTRL + C].
#include <pthread.h>
void* func(void *arg)
{
while (1);
}
int main()
{
#define NUM_THREADS 4 //use the number of cores (if known)
pthread_t threads[NUM_THREADS];
for (int i=0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, func, NULL);
for (int i=0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
return 0;
}
Compilation:
gcc -pthread -o thread_test thread_test.c
If I start ./thread_test, all cores are at 100%.
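Where the core count is not known in advance, it can usually be queried at run time. The following is a minimal sketch, assuming a system such as Linux or macOS where sysconf() accepts _SC_NPROCESSORS_ONLN (a common extension, not something POSIX requires). The helper name online_cores is hypothetical.

```c
#include <unistd.h>

/* Number of cores currently online; falls back to 1 if the query fails.
 * _SC_NPROCESSORS_ONLN is a widely available extension (assumption: the
 * target system provides it). */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The result could replace the fixed NUM_THREADS when sizing the thread array, for example by allocating online_cores() pthread_t slots with malloc().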
A word on fork and pthread_create:
fork creates a new process (the current process image is copied and executed in parallel), while pthread_create creates a new thread, sometimes called a lightweight process.
Both processes and threads run in 'parallel' with the parent.
Whether to use a child process or a thread depends on the situation: a child process has its own address space and can replace its process image (via the exec family), while threads share the address space of the process that created them.
There are of course many more differences; for those I recommend studying the following pages:
man fork
man pthreads
I am aware of threads, but they seem to run on the same core, so I don't seem to gain any performance using them for compute-bound operations (they are very useful for socket-based stuff, though!).
No, they don't. As long as your threads don't block, you'll see all of them running. Just try this (beware that it consumes all your CPU time): it starts 16 threads, each counting in a busy loop for 60 s. You will see all of them running and bringing your cores to their knees (it only runs for a minute, then everything ends):
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 16 /* had 16 cores, so I used this. Put here as many
* threads as cores you have. */
struct thread_data {
pthread_t thread_id; /* the thread id */
struct timespec end_time; /* time to get out of the tunnel */
int id; /* the array position of the thread */
unsigned long result; /* number of times looped */
};
void *thread_body(void *data)
{
struct thread_data *p = data;
p->result = 0UL;
clock_gettime(CLOCK_REALTIME, &p->end_time);
p->end_time.tv_sec += 60; /* 60 s. */
struct timespec now;
do {
/* just get the time */
clock_gettime(CLOCK_REALTIME, &now);
p->result++;
/* if you call printf() you will see them slowing, as there's a
* common buffer that forces all thread to serialize their outputs
*/
/* check if we are over */
} while ( now.tv_sec < p->end_time.tv_sec
|| (now.tv_sec == p->end_time.tv_sec /* compare nanoseconds only */
&& now.tv_nsec < p->end_time.tv_nsec)); /* within the same second */
return p;
} /* thread_body */
int main()
{
struct thread_data thrd_info[N];
for (int i = 0; i < N; i++) {
struct thread_data *d = &thrd_info[i];
d->id = i;
d->result = 0;
printf("Starting thread %d\n", d->id);
int res = pthread_create(&d->thread_id,
NULL, thread_body, d);
if (res != 0) { /* returns an error number on failure; errno is not set */
fprintf(stderr, "pthread_create failed: error %d\n", res);
exit(EXIT_FAILURE);
}
printf("Thread %d started\n", d->id);
}
printf("All threads created, waiting for all to finish\n");
for (int i = 0; i < N; i++) {
struct thread_data *joined;
int res = pthread_join(thrd_info[i].thread_id,
(void **)&joined);
if (res != 0) { /* returns an error number on failure; errno is not set */
fprintf(stderr, "pthread_join failed: error %d\n", res);
exit(EXIT_FAILURE);
}
printf("PTHREAD %d ended, with value %lu\n",
joined->id, joined->result);
}
} /* main */
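The deadline check in the loop above has to compare two struct timespec values. A small helper makes the ordering explicit: the nanosecond field only matters when the second fields are equal. This is a sketch; timespec_before is a hypothetical name, not part of the original program.

```c
#include <stdbool.h>
#include <time.h>

/* True when *a is strictly earlier than *b.
 * tv_nsec acts only as a tie-breaker when tv_sec is equal. */
bool timespec_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}
```

With such a helper the loop condition could read while (timespec_before(&now, &p->end_time)). For measuring an interval, CLOCK_MONOTONIC may be a safer choice than CLOCK_REALTIME, since the latter can jump when the system time is adjusted.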
Linux and all other multithreaded systems work the same way: they create a new execution unit, and the available processors are handed out to the runnable execution units as needed. If two execution units do not share a virtual address space, they are separate processes (not exactly so, but this captures the main difference between a process and a thread). Threads are normally encapsulated inside processes: they share the process ID and the virtual memory (in Linux, each thread also has its own kernel-level ID, while getpid() returns the shared thread group ID). Processes each run in a separate virtual address space, so they can only share things through system resources (files, shared memory, sockets, pipes, etc.).
The problem with your test case (you don't show it, so I have to guess) is probably that each thread runs a loop in which it tries to print something. If so, each thread probably spends most of its time blocked on I/O (in printf()).
Stdio FILEs share one buffer among all threads that print to the same FILE, and the kernel serializes write(2) system calls to the same file descriptor. So if most of the time in the loop is spent blocked in a write, the kernel (and stdio) end up serializing all the print calls, making it appear that only one thread is working at a time (all the other threads are blocked behind the one doing the I/O). The busy loop above makes all the threads run in parallel and shows you the CPU being saturated.
Parallelism in C can also be achieved with the fork() function. fork() does not create a thread: it creates a child process that starts as a copy of the calling process and continues from the point where fork() returned, and the kernel can schedule parent and child on different cores. Because each process has its own address space, the two do not share ordinary variables, so they cannot corrupt each other's memory.
To exchange data between the processes, use an explicit mechanism such as a pipe, a file, or shared memory. The wait() function does not synchronize access to shared resources: it blocks the calling process until one of its children terminates and reports that child's exit status.
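As an illustration of compute-bound work split across processes, here is a minimal sketch that sums 1..n in several forked children and collects the partial sums through a pipe. The function name parallel_sum and the limit of 64 workers are assumptions made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* arbitrary cap for this sketch */

/* Sum the integers 1..n using `workers` child processes.
 * Returns the total, or -1 on any error. */
long long parallel_sum(long long n, int workers)
{
    if (n < 0 || workers < 1 || workers > MAX_WORKERS)
        return -1;

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pids[MAX_WORKERS];
    long long chunk = n / workers;
    int started = 0;

    for (int w = 0; w < workers; w++) {
        long long lo = (long long)w * chunk + 1;
        long long hi = (w == workers - 1) ? n : lo + chunk - 1;
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {                    /* child: compute one slice */
            close(fds[0]);
            long long part = 0;
            for (long long i = lo; i <= hi; i++)
                part += i;
            /* an 8-byte write to a pipe is atomic (smaller than PIPE_BUF) */
            ssize_t wr = write(fds[1], &part, sizeof part);
            _exit(wr == (ssize_t)sizeof part ? 0 : 1);
        }
        pids[started++] = pid;
    }
    close(fds[1]);                         /* parent only reads */

    int ok = (started == workers);
    long long total = 0;
    for (int i = 0; i < started; i++) {    /* one partial sum per child */
        long long part;
        if (read(fds[0], &part, sizeof part) == (ssize_t)sizeof part)
            total += part;
        else
            ok = 0;
    }
    close(fds[0]);

    for (int i = 0; i < started; i++) {    /* reap every child */
        int status;
        if (waitpid(pids[i], &status, 0) != pids[i] ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    return ok ? total : -1;
}
```

Each child works on its own copy of the variables, which is why the partial results travel back through the pipe instead of through a shared variable.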
I see that linux/pid.h in the kernel defines the following type:
enum pid_type
{
PIDTYPE_PID,
PIDTYPE_TGID,
PIDTYPE_PGID,
PIDTYPE_SID,
PIDTYPE_MAX,
};
and the struct pid type uses it when keeping track of the tasks associated with the PID:
struct pid
{
atomic_t count;
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
struct rcu_head rcu;
struct upid numbers[1];
};
But what does each list refer to? It's my understanding that PIDTYPE_PID refers to tasks which use this as their PID (the "thread ID" from the kernel's perspective) and PIDTYPE_TGID to tasks which use this as their TGID, i.e. the thread group ID, which denotes a group of threads that share the same userspace PID. What are PIDTYPE_PGID and PIDTYPE_SID?
SID = session ID,
PGID = process group ID as described here:
https://www.win.tue.nl/~aeb/linux/lk/lk-10.html
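From userspace, the sharing of these two IDs can be observed with fork(): each child gets its own PID but inherits the parent's process group and session. A minimal sketch follows; the function name children_sharing_group is hypothetical, and it assumes the children do not call setpgid() or setsid().

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children one after another and count how many of them have
 * a different PID but the same PGID and SID as the parent.
 * Returns -1 if fork() fails. */
int children_sharing_group(int n)
{
    pid_t parent_pid  = getpid();
    pid_t parent_pgid = getpgrp();
    pid_t parent_sid  = getsid(0);
    int matches = 0;

    for (int i = 0; i < n; i++) {
        pid_t child = fork();
        if (child < 0)
            return -1;
        if (child == 0) {
            /* child: new PID, inherited process group and session */
            int same = getpid() != parent_pid &&
                       getpgrp() == parent_pgid &&
                       getsid(0) == parent_sid;
            _exit(same ? 0 : 1);
        }
        int status;
        if (waitpid(child, &status, 0) == child &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            matches++;
    }
    return matches;
}
```

Several tasks sharing one PGID or SID in this way is the situation in which a per-type task list can hold more than one entry.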
I have an exercise about adding a system call in the Linux kernel, but I'm struggling to implement it. Below is the description:
The main part of this assignment is to implement a new system call that lets the user obtain information about a process, its parent, and its oldest child. The information about these processes is represented through the following struct:
struct procinfos{
long studentID;
struct proc_info proc;
struct proc_info parent_proc;
struct proc_info oldest_child_proc;
};
Where proc_info is defined as follows:
struct proc_info{
pid_t pid;
char name[16];
};
procinfos contains information of three processes:
proc, the current process or process with PID
parent_proc, the parent of the first process
oldest_child_proc, the oldest child process of the first process
The processes' information is stored in the struct proc_info and contains:
pid, the pid of the process
name, the name of the program which is executed
The prototype of our system call is described below:
To invoke the get_proc_info system call, the user must provide the PID of the process, or −1 for the current process. If the system call finds the process with the given PID, it will get the process's information, put it in the output parameter *info, and return 0. However, if the system call cannot find such a process, it will return EINVAL.
#include <linux/kernel.h>
#include <unistd.h>
/* proc_info must be defined before procinfos, which embeds it by value */
struct proc_info{
pid_t pid;
char name[16];
};
struct procinfos{
long studentID;
struct proc_info proc;
struct proc_info parent_proc;
struct proc_info oldest_child_proc;
};
asmlinkage long sys_get_proc_info(pid_t pid, struct procinfos *info){
// TODO: implement the system call
}
HINT:
To find the current process: look at arch/x86/include/asm/current.h, or simply use the macro current (current->pid).
To find info about each process, look at include/linux/sched.h.
After the trimming process, the time to build the kernel is reduced to about 10 minutes, but that is still a long time to compile. To make development of the system call as fast as possible, you can use a kernel module to test the system call in advance (Appendix B).
How do I implement this system call?
I can't help you much with this, but let me give you some hints on how to do it:
You can get most of the information about each process from the task_struct data type in include/linux/sched.h. "current" is a pointer to a task_struct, and you can extract useful information from it. For example:
current->pid gives you the ID of the process
current->parent gives you the parent of the process; parent is also a pointer to a task_struct
current->comm gives you the name of the process, as a char array
current->children gives you the list of children of the process. It is a list_head, the kernel's doubly linked list type, so you cannot read the entries directly; you have to walk it with the list helper macros. Look at include/linux/list.h in the kernel tree (scripts/kconfig/list.h contains a similar copy).
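To show how a list_head embedded in a structure leads back to the enclosing structure, here is a userspace sketch with stand-in types. struct mock_task and the mock_* functions are hypothetical and are not the kernel's definitions; the sketch assumes, as in the kernel, that children are linked through a sibling member and appended at the tail, so the first entry is the oldest child.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for illustration only, NOT the kernel definitions. */
struct list_head { struct list_head *next, *prev; };

struct mock_task {
    int pid;
    char comm[16];
    struct list_head children;  /* head of this task's list of children */
    struct list_head sibling;   /* link inside the parent's children list */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

void mock_task_init(struct mock_task *t, int pid, const char *name)
{
    t->pid = pid;
    strncpy(t->comm, name, sizeof t->comm - 1);
    t->comm[sizeof t->comm - 1] = '\0';
    list_init(&t->children);
    list_init(&t->sibling);
}

/* New children go to the tail, so the list stays ordered oldest first. */
void mock_add_child(struct mock_task *parent, struct mock_task *child)
{
    list_add_tail(&child->sibling, &parent->children);
}

/* Oldest child is the first list entry; NULL when there are no children. */
struct mock_task *mock_oldest_child(struct mock_task *parent)
{
    if (parent->children.next == &parent->children)
        return NULL;
    return container_of(parent->children.next, struct mock_task, sibling);
}
```

In real kernel code the equivalent steps would use the helpers from include/linux/list.h (list_empty(), list_first_entry()), would need appropriate locking while touching another task's lists, and would copy the result to the user buffer with copy_to_user().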
I am trying to implement kernel level threads in xv6.
My main problem at the moment is to understand how the CPU gets its information about the current process and how to modify it to point to the current thread instead.
I know it is somehow linked to this line:
extern struct proc *proc asm("%gs:4");
in proc.h, but I do not fully understand how and why it works.
I found out that %gs points to the line struct cpu *cpu; in struct cpu (defined in proc.h), and that right below that line (4 bytes after the cpu pointer)
the current process of the CPU is stored:
struct proc *proc; // The currently-running process.
So, in order to add thread support, one should either alter this line to point to the new thread struct instead of the process struct, or alternatively add the thread pointer below the "proc" line and make the following changes:
add the following declaration in proc.h: extern struct thread *thread asm("%gs:8");
in vm.c, in the function "seginit(void)", change the line
c->gdt[SEG_KCPU] = SEG(STA_W, &c->cpu, 8, 0); to c->gdt[SEG_KCPU] = SEG(STA_W, &c->cpu, 12, 0); in order to allocate space for the extra thread pointer.
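The offsets 4 and 8 and the change from 8 to 12 follow from pointers being 4 bytes wide on 32-bit x86. The sketch below models the three per-CPU slots with uint32_t so the offsets come out the same on any host; struct cpu_local is a hypothetical stand-in, not the real xv6 struct cpu.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-CPU slots addressed through %gs in xv6.
 * uint32_t models a 4-byte pointer (assumption: 32-bit x86 target). */
struct cpu_local {
    uint32_t cpu;     /* %gs:0  struct cpu *cpu                      */
    uint32_t proc;    /* %gs:4  struct proc *proc                    */
    uint32_t thread;  /* %gs:8  proposed struct thread *thread       */
};

/* Bytes the SEG_KCPU segment has to cover once the thread slot exists. */
size_t cpu_local_segment_size(void)
{
    return sizeof(struct cpu_local);
}
```

With only cpu and proc the covered region is 8 bytes; adding the third 4-byte slot gives 12, which matches the new third argument passed to SEG().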
Is there an efficient way of finding the task_struct for a specified PID, without iterating through the task_struct list?
What's wrong with using one of the following?
extern struct task_struct *find_task_by_vpid(pid_t nr);
extern struct task_struct *find_task_by_pid_ns(pid_t nr,
struct pid_namespace *ns);
If you want to find the task_struct from a module, find_task_by_vpid(pid_t nr) and friends are not going to work, since these functions are not exported.
In a module, you can use the following function instead:
pid_task(find_vpid(pid), PIDTYPE_PID);
There is a better way to get the task_struct instance from a module.
Always try to use wrapper functions and helper routines, because they are designed so that if the driver programmer misses something (for example error handling or condition checks), the kernel takes care of it.
/* Use below API and you will get a pointer of (struct task_struct *) */
taskp = get_pid_task(pid, PIDTYPE_PID);
Here pid is a struct pid *. To obtain it from a numeric PID of type pid_t, use the API below:
find_get_pid(pid_no);
You don't need to call rcu_read_lock() and rcu_read_unlock() around these APIs, because get_pid_task() internally calls rcu_read_lock() and rcu_read_unlock() around pid_task() and handles concurrency properly. It also takes a reference on the task with get_task_struct(), so release it with put_task_struct() when you are done. That's why I said above to always use this kind of wrapper.
Snippets of the get_pid_task() and find_get_pid() functions are below:
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{
struct task_struct *result;
rcu_read_lock();
result = pid_task(pid, type);
if (result)
get_task_struct(result);
rcu_read_unlock();
return result;
}
EXPORT_SYMBOL_GPL(get_pid_task);
struct pid *find_get_pid(pid_t nr)
{
struct pid *pid;
rcu_read_lock();
pid = get_pid(find_vpid(nr));
rcu_read_unlock();
return pid;
}
EXPORT_SYMBOL_GPL(find_get_pid);
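In a kernel module, you can also combine the wrapper functions as follows:
taskp = get_pid_task(find_get_pid(PID),PIDTYPE_PID);
Note that find_get_pid() takes a reference on the struct pid (via get_pid(), as the snippet above shows). The nested call discards that pointer, so the reference can never be dropped; keeping the result of find_get_pid() in a variable lets you call put_pid() on it afterwards.
PS: for more information on these APIs, you can look at kernel/pid.c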
No one mentioned that the pid_task() function, and the pointer you obtain from it, should only be used inside an RCU read-side critical section (because it uses an RCU-protected data structure). Otherwise there can be a use-after-free bug.
There are lots of uses of pid_task() in the Linux kernel sources (e.g. in posix_timer_event()).
For example:
rcu_read_lock();
/* search through the global namespace */
task = pid_task(find_pid_ns(pid_num, &init_pid_ns), PIDTYPE_PID);
if (task)
printk(KERN_INFO "1. pid: %d, state: %#lx\n",
pid_num, task->state); /* valid task dereference */
rcu_read_unlock(); /* after it returns - task pointer becomes invalid! */
if (task)
printk(KERN_INFO "2. pid: %d, state: %#lx\n",
pid_num, task->state); /* may be successful,
* but is buggy (task dereference is INVALID!) */
Find out more about the RCU API from Kernel.org
P.S. You can also just use the special API functions find_task_by_pid_ns() and find_task_by_vpid() under rcu_read_lock().
The first one is for searching through the particular namespace:
task = find_task_by_pid_ns(pid_num, &init_pid_ns); /* e.g. init namespace */
The second one is for searching through the namespace of current task.