What is a Kernel thread?


I have just started writing device drivers and am new to threading. I have gone through many documents to get an idea about threads, but I still have some doubts.
What is a kernel thread?
How does it differ from a user thread?
What is the relationship between the two kinds of thread?
How can I implement kernel threads?
Where can I see the output of the implementation?
Can anyone help me?
Thanks.

A kernel thread is a task_struct with no userspace components.
Besides the lack of userspace, it has different ancestors (the kthreadd kernel thread instead of the init process) and is created by a kernel-only API rather than by the clone calls that underlie the fork/exec system calls.
Any two kernel threads have kthreadd as a parent. Apart from that, kernel threads enjoy the same "independence" from one another as userspace processes.
Use the kthread_run function/macro from the kthread.h header. You will most probably have to write a kernel module in order to call this function, so you should take a look at Linux Device Drivers.
If you are referring to the text output of your implementation (via printk calls), you can see this output in the kernel log using the dmesg command.

A kernel thread is a kernel task running only in kernel mode; it usually has not been created by fork() or clone() system calls. An example is kworker or kswapd.
You probably should not implement kernel threads if you don't know what they are.
Google gives many pages about kernel threads, e.g. Frey's page.

user threads & stack:
Each thread has its own stack, so it can use its own local variables. Threads share global variables, which live in the .data or .bss sections of a Linux executable.
Because threads share global variables, we use synchronization mechanisms such as a mutex whenever a multithreaded application accesses or modifies them. Local variables live on each thread's individual stack, so they need no synchronization.
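To make the point about shared globals and per-thread locals concrete, here is a minimal sketch using POSIX threads. The names (shared_counter, counter_lock, run_counter_demo) and the thread and iteration counts are made up for illustration.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared global (lives in .bss): every thread sees the same variable. */
static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_worker(void *arg)
{
    (void)arg;
    /* 'i' is a local on this thread's own stack, so it needs no locking. */
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;               /* shared state: guarded by the mutex */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter
 * (or -1 if not every thread could be created). */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    shared_counter = 0;
    while (started < NUM_THREADS &&
           pthread_create(&threads[started], NULL, increment_worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == NUM_THREADS ? shared_counter : -1;
}
```

Without the lock/unlock pair the increments from different threads can interleave and the final count usually comes out too low.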
Kernel threads
Kernel threads emerged from the need to run kernel code in process context. Kernel threads are the basis of the workqueue mechanism. Essentially, a kernel thread is a thread that runs only in kernel mode and has no user address space or other user attributes.
To create a kernel thread, use kthread_create(). The new thread does not run until it is woken with wake_up_process(); kthread_run() combines both steps:
#include <linux/kthread.h>
struct task_struct *kthread_create(int (*threadfn)(void *data),
                                   void *data, const char namefmt[], ...);
kernel threads & stack:
Kernel threads are used for the kernel's background and post-processing work, for example the pdflush threads and the workqueue threads.
Kernel threads are basically processes without a user address space (internally the kernel creates them through the same machinery as clone(), with the appropriate flags), which means they cannot switch to user space. Kernel threads are schedulable and preemptible like normal processes.
Kernel threads have their own stacks, which they use for their local data.
More about kernel stacks:-
https://www.kernel.org/doc/Documentation/x86/kernel-stacks

Since you're comparing kernel threads with user[land] threads, I assume you mean something like the following.
The normal way of implementing threads nowadays is to do it in the kernel, so those can be considered "normal" threads. It's however also possible to do it in userland, using signals such as SIGALRM, whose handler will save the current process state (registers, mostly) and change them to another one previously saved. Several OSes used this as a way to implement threads before they got proper kernel thread support. They can be faster, since you don't have to go into kernel mode, but in practice they've faded away.
There are also cooperative userland threads, where one thread runs until it calls a special function (usually called yield), which then switches to another thread in a similar way as with SIGALRM above. The advantage here is that the program is in total control, which can be useful when you have timing concerns (a game, for example). You also don't have to care much about thread safety. The big disadvantage is that only one thread can run at a time, and therefore this method is also uncommon now that processors have multiple cores.
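As a rough illustration of the cooperative scheme, here is a sketch built on the glibc ucontext functions (getcontext, makecontext, swapcontext). This is only one possible way to do it, not necessarily how the older userland thread libraries worked; task_yield, task_body and run_cooperative_demo are hypothetical names.

```c
#include <ucontext.h>

#define TASK_STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, task_ctx;
static char task_stack[TASK_STACK_SIZE];
static int order;   /* records the sequence of steps as decimal digits */

static void record(int step) { order = order * 10 + step; }

/* "yield": save the task's registers and switch back to the main context. */
static void task_yield(void) { swapcontext(&task_ctx, &main_ctx); }

static void task_body(void)
{
    record(2);
    task_yield();       /* give up the CPU voluntarily */
    record(4);
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Runs one task cooperatively; returns 12345 if the steps interleaved
 * as expected, or -1 on error. */
int run_cooperative_demo(void)
{
    order = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = task_stack;   /* the task gets its own stack */
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    record(1);
    swapcontext(&main_ctx, &task_ctx);      /* run the task until it yields */
    record(3);
    swapcontext(&main_ctx, &task_ctx);      /* resume it until it returns */
    record(5);
    return order;
}
```

Everything runs on one kernel-scheduled thread, which is why no locking is needed and also why only one of these "threads" can make progress at a time.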
Kernel threads are implemented in the kernel. Perhaps you meant how to use them? The most common way is to call pthread_create.

Related

Are locked pages inherited by pthreads?

I have a little paging problem on my realtime system, and wanted to know how exactly linux should behave in my particular case.
Among various other things, my application spawns 2 threads using pthread_create(), which operate on a set of shared buffers.
The first thread, let's call it A, reads data from a device, performs some calculations on it, and writes the results into one of the buffers.
Once that buffer is full, thread B will read all the results and send them to a PC via ethernet, while thread A writes into the next buffer.
I have noticed that each time thread A starts writing into a previously unused buffer, I miss some interrupts and lose data (there is an ID in the header of each packet, and if that increments by more than one, I have missed interrupts).
So if I use n buffers, I get exactly n bursts of missed interrupts at the start of my data acquisition (therefore the problem is definitely caused by paging).
To fix this, I used mlock() and memset() on all of the buffers to make sure they are actually paged in.
This fixed my problem, but I was wondering where in my code the correct place to do this would be. In my main application, or in one or both of the threads? (Currently I do it in both threads.)
According to the libc documentation (section 3.4.2 "Locked Memory Details"), memory locks are not inherited by child processes created using fork().
So what about pthreads? Do they behave the same way, or would they inherit those locks from my main process?
Some background information about my system, even though I don't think it matters in this particular case:
It is an embedded system powered by a SoC with a dual-core Cortex-A9 running Linux 4.1.22 with PREEMPT_RT.
The interrupt frequency is 4kHz
The thread priorities (as shown in htop) are -99 for the interrupt, -98 for thread A (both of which are higher than the standard priority of -51 for all other interrupts) and -2 for thread B
EDIT:
I have done some additional tests, calling my page-locking function from different threads (and in main).
If I lock the pages in main() and then try to lock them again in one of the threads, I would expect to see a large number of page faults for main() but no page faults for the thread itself (because the pages should already be locked). However, htop tells a different story: I see a large number of page faults (MINFLT column) for each and every thread that locks those pages.
To me, that would suggest that pthreads actually do have the same limitation as child processes spawned using fork(). And if this is the case, locking them in both threads (but not in main) would be the correct procedure.

Threads share the same memory management context. If a page is resident for one thread, it's resident for all threads in the same process.
The implication of this is that memory locking is per-process, not per-thread.
You are probably still seeing minor faults on the first write because a fault is used to mark the page dirty. You can avoid this by also writing to each page after locking.
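A minimal sketch of that advice, assuming the buffers are static arrays (the buffer count, size and the function name are invented for the example). Since locking is per-process, calling this once from main() before pthread_create() should be enough.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define NUM_BUFFERS 4
#define BUFFER_SIZE 4096

static unsigned char buffers[NUM_BUFFERS][BUFFER_SIZE];

/* Locks every buffer into RAM and then writes to each page, so the first
 * real write from a worker thread does not have to take a fault.
 * Returns 0 on success, -1 if any mlock() call failed (for example
 * because RLIMIT_MEMLOCK is too low); the buffers are touched either way. */
int lock_and_prefault_buffers(void)
{
    int rc = 0;

    for (size_t i = 0; i < NUM_BUFFERS; i++) {
        if (mlock(buffers[i], BUFFER_SIZE) != 0)
            rc = -1;
        memset(buffers[i], 0, BUFFER_SIZE);   /* write after locking */
    }
    return rc;
}
```

If the whole process should stay resident, mlockall(MCL_CURRENT | MCL_FUTURE) is the broader alternative.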

In a linux kernel mode, how can I detect a process?

I need to create a user-mode process. This process has to be detected in kernel mode to be sent to a FIFO queue (SCHED_FIFO) in the Linux kernel.
I have been investigating and if you use the function void scheduler_tick(void), which is located in core.c (I think scheduler_tick is called by the system each tick of the clock of the cpu), I can capture the process.
My question is if this is correct, or if there is any better way.
Scheduler_tick code: http://lxr.free-electrons.com/ident?i=scheduler_tick
The work is based on a multilevel queue, into which a series of different processes will be introduced (payment processes, cancellation processes, reservation processes and event processes). These processes have different priorities in the system.
Therefore, when I create a process, for example a payment process, I need to detect it, because I need to know its type and therefore its priority.
Hence the idea of using the function void scheduler_tick to detect the process.
I don't know if I explained that well...
Thank you very much.

Creating a user process is not a kernel concern.
All the user processes that are created are forked from the init process or from its children.

You don't need to do that in the kernel. Actually, you have to keep that outside of the kernel.
What you need to do is either use chrt in your init scripts or use sched_setscheduler from your init program or daemon monitor.
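For the sched_setscheduler route, something along these lines could work from user space. The helper names are made up, and the call normally needs root or CAP_SYS_NICE, so expect -1 with errno set to EPERM otherwise.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamps a requested priority into the range the system allows for SCHED_FIFO. */
int clamp_fifo_priority(int wanted)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (wanted < lo)
        return lo;
    if (wanted > hi)
        return hi;
    return wanted;
}

/* Moves process 'pid' (0 means the caller) to SCHED_FIFO at the given
 * priority. Returns 0 on success, -1 on failure with errno set. */
int set_fifo_priority(pid_t pid, int wanted)
{
    struct sched_param param = { .sched_priority = clamp_fifo_priority(wanted) };

    return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

A launcher could call set_fifo_priority(child_pid, prio) right after fork(), choosing prio by process type (payment, cancellation, and so on). The shell equivalent is chrt -f PRIORITY COMMAND.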

Create a user level thread or kernel level thread using `pthread_create`?

Question: How can one create a user level thread or kernel level thread using pthread_create?
Note: I checked the documentation of pthread_create in this link and I didn't find any parameter that tells the OS to create either a user-level thread or a kernel-level thread. So if there is no such parameter, is a thread created using pthread_create user level or kernel level by default?
Any information or hint would be great.
Thanks.

pthread_create simply creates a thread. Not "a kernel-level thread" or "a user-level thread". The latter are descriptions you could use talking about implementation of threads, but as far as POSIX threads are concerned, there is no practical way to implement threads without each thread having some corresponding scheduling/state object belonging to the kernel. This is because each thread has independent signal mask, pending signals, etc. and can be independently blocked in various operations that allow other threads to make forward progress while they are blocked. So in some sense, you could say pthread_create creates "kernel level threads". That's certainly the mechanism in all major real-world implementations.
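On Linux you can see this for yourself: a thread made by pthread_create gets its own kernel thread ID while sharing the process ID. A small sketch, assuming Linux and the raw gettid system call; the function names are invented:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for syscall() under strict -std modes */
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: stores the kernel thread ID of the calling thread. */
static void *report_tid(void *arg)
{
    *(pid_t *)arg = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if a thread created with pthread_create has a kernel TID
 * different from the creating thread's, 0 if not, -1 on error. */
int thread_has_own_kernel_tid(void)
{
    pthread_t thread;
    pid_t child_tid = 0;
    pid_t my_tid = (pid_t)syscall(SYS_gettid);

    if (pthread_create(&thread, NULL, report_tid, &child_tid) != 0)
        return -1;
    pthread_join(thread, NULL);
    return child_tid > 0 && child_tid != my_tid;
}
```

The same IDs show up under /proc/PID/task/ and in ps -L, which is the kernel's view of each pthread.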

Linux Scheduling: OS vs "virtual"

How does one implement a multithreaded, single-process model in C on Linux (Fedora), where a single scheduler on a "main" core watches for I/O availability (e.g. TCP/IP, UDP) and one "execution thread" per core (started at init) parses the data and then writes a small update to shared memory? (It is my understanding that pthreads share data within a single process.)
I believe my options are:
Pthreads or the Linux OS scheduler
I have a naive model in mind consisting of starting a certain number of these execution threads and a single scheduler thread.
What is the best solution, given that I can use this sort of model?

Completing Benoit's answer, in order to communicate between your master and your worker threads, you could use a condition variable. The workers do something like:
    while (true)
    {
        pthread_mutex_lock(&workQueueMutex);
        while (workQueue.empty())
            pthread_cond_wait(&workQueueCond, &workQueueMutex);
        /* if we get here then (a) we have work (b) we hold workQueueMutex */
        work = pop(workQueue);
        pthread_mutex_unlock(&workQueueMutex);
        /* do work */
    }
and the master:
    /* I/O received */
    pthread_mutex_lock(&workQueueMutex);
    push(workQueue, work);
    pthread_cond_signal(&workQueueCond);
    pthread_mutex_unlock(&workQueueMutex);
This would wake up one idle worker to process the request immediately. If no worker is available, the work stays in the queue and is processed later.
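For reference, here is a compilable sketch of the same pattern in plain C. The fixed-size ring buffer, the "closed" flag used for shutdown, and the integer work items are all assumptions made to keep it short; a real queue would also block the master when the buffer is full.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 16
#define NUM_WORKERS 3

static int  items[QUEUE_CAPACITY];      /* ring buffer of work items */
static int  head, count;
static bool closed;                     /* set once no more work will come */
static long total;                      /* sum of all processed items */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond  = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    long local_sum = 0;

    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        while (count == 0 && !closed)
            pthread_cond_wait(&queue_cond, &queue_mutex);
        if (count == 0) {               /* closed and fully drained */
            total += local_sum;
            pthread_mutex_unlock(&queue_mutex);
            return NULL;
        }
        int item = items[head];
        head = (head + 1) % QUEUE_CAPACITY;
        count--;
        pthread_mutex_unlock(&queue_mutex);

        local_sum += item;              /* "do work" outside the lock */
    }
}

/* Master: queues the items 1..n_items, then shuts the workers down.
 * Returns the sum the workers computed, or -1 on error. */
long run_queue_demo(int n_items)
{
    pthread_t workers[NUM_WORKERS];
    int started = 0;

    if (n_items < 0 || n_items > QUEUE_CAPACITY)
        return -1;                      /* sketch has no "queue full" wait */
    head = count = 0;
    closed = false;
    total = 0;

    while (started < NUM_WORKERS &&
           pthread_create(&workers[started], NULL, worker, NULL) == 0)
        started++;

    for (int i = 1; i <= n_items; i++) {
        pthread_mutex_lock(&queue_mutex);
        items[(head + count) % QUEUE_CAPACITY] = i;
        count++;
        pthread_cond_signal(&queue_cond);       /* wake one idle worker */
        pthread_mutex_unlock(&queue_mutex);
    }

    pthread_mutex_lock(&queue_mutex);
    closed = true;
    pthread_cond_broadcast(&queue_cond);        /* wake everyone to exit */
    pthread_mutex_unlock(&queue_mutex);

    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    return started > 0 ? total : -1;
}
```

The inner while loop around pthread_cond_wait matters: a worker can wake up spuriously or find that another worker already took the item, so it has to re-check the condition while holding the mutex.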

Modifying the Linux scheduler is quite tough work. I would just forget about it. Pthreads is usually preferred. If I understand correctly, you want to have one core dedicated to the control plane, and a pool of other cores dedicated to data plane processing? Then create a pool of threads from your master thread and set up core affinity for these slave threads with pthread_setaffinity_np(...).
Indeed threads of a process share the same address-space, and global variables are accessible by any threads of that process.
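A sketch of the affinity part, assuming glibc on Linux (pthread_setaffinity_np is a GNU extension). Here the worker pins itself and then reads the mask back; the helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* cpu_set_t macros and pthread_setaffinity_np */
#endif
#include <pthread.h>
#include <sched.h>

/* Returns the lowest CPU number this process may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/* Worker: pins itself to *arg, then reads the mask back to confirm.
 * On any failure it overwrites *arg with -1. */
static void *pinned_worker(void *arg)
{
    int *cpu = arg;
    cpu_set_t wanted, actual;

    CPU_ZERO(&wanted);
    CPU_SET(*cpu, &wanted);
    if (pthread_setaffinity_np(pthread_self(), sizeof wanted, &wanted) != 0 ||
        pthread_getaffinity_np(pthread_self(), sizeof actual, &actual) != 0 ||
        CPU_COUNT(&actual) != 1 || !CPU_ISSET(*cpu, &actual))
        *cpu = -1;
    /* ... the data-plane work for this core would go here ... */
    return NULL;
}

/* Starts one worker pinned to 'cpu'; returns 'cpu' on success, -1 on failure. */
int run_pinned_worker(int cpu)
{
    pthread_t thread;
    int result = cpu;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    if (pthread_create(&thread, NULL, pinned_worker, &result) != 0)
        return -1;
    pthread_join(thread, NULL);
    return result;
}
```

The master could equally call pthread_setaffinity_np on the pthread_t it gets back from pthread_create; pinning from inside the worker just guarantees the mask is in place before any work starts.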

It looks to me that you have a version of the producer-consumer problem with a single consumer aggregating the results of n producers. This is a pretty standard problem, so I definitely think that pthread is more than enough for you. You don't need to go and mess around with the scheduler.
As one of the answers states, a thread-safe queue like the one described here works nicely for this sort of issue. Your original idea of spawning a bunch of threads is a good one. You seem to be worried that the ability of the threads to share global state will cause you problems. I don't think that this is an issue if you keep shared state to a minimum and use sane locking discipline. Sharing state is fine as long as you do so responsibly.
Finally, unless you really know what you're doing, I would advise against manually messing with thread affinity. Just spawn the threads and let the scheduler handle when and on what core a thread runs. The thing to optimize is the number of threads you use. One for each core may not actually be the fastest approach if other threads are running.

Generally speaking, this is more or less exactly what the POSIX select and Linux-specific epoll functions are for.
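To give an idea of what that looks like, here is a tiny epoll sketch. A pipe stands in for the TCP/UDP sockets so the example is self-contained; epoll_pipe_demo is an invented name.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with epoll, writes 4 bytes to the
 * write end, waits for readiness and reads them back.
 * Returns the number of bytes read, or -1 on error. */
int epoll_pipe_demo(void)
{
    int fds[2];
    int ep = -1;
    int got = -1;
    struct epoll_event ev = { 0 }, ready;
    char buf[16];

    if (pipe(fds) != 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        goto out;

    ev.events = EPOLLIN;                /* notify when data can be read */
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0)
        goto out;

    if (write(fds[1], "ping", 4) != 4)  /* simulate incoming I/O */
        goto out;

    /* Block for up to one second until a registered descriptor is ready. */
    if (epoll_wait(ep, &ready, 1, 1000) == 1 && ready.data.fd == fds[0])
        got = (int)read(fds[0], buf, sizeof buf);
out:
    if (ep >= 0)
        close(ep);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

In the model from the question, the "scheduler" thread would sit in epoll_wait over all its sockets and hand each ready descriptor (or the data read from it) to a worker.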

Why are threads called lightweight processes?

A thread is "lightweight" because most of the overhead has already been accomplished through the creation of its process.
I found this in one of the tutorials.
Can somebody elaborate what it exactly means?

The claim that threads are "lightweight" is - depending on the platform - not necessarily reliable.
An operating system thread has to support the execution of native code, e.g. written in C. So it has to provide a decent-sized stack, usually measured in megabytes. So if you started 1000 threads (perhaps in an attempt to support 1000 simultaneous connections to your server) with 1 MB of stack each, your process would have about 1 GB set aside for stacks before you even start to do any real work.
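The stack size is adjustable per thread through the pthread attributes, which is one way to check or reduce that cost. A sketch with made-up helper names; the usable minimum depends on the platform:

```c
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size a default-attribute thread would get, or 0 on error. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

static void *noop(void *arg) { return arg; }

/* Creates and joins one thread with the requested stack size.
 * Returns 0 on success, -1 if the size was rejected or creation failed. */
int run_thread_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t thread;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&thread, &attr, noop, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    pthread_join(thread, NULL);
    return 0;
}
```

For example, run_thread_with_stack(256 * 1024) asks for a 256 KiB stack instead of the default.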
This is a real problem in highly scalable servers, so they don't use threads as if they were lightweight at all. They treat them as heavyweight resources. They might instead create a limited number of threads in a pool, and let them take work items from a queue.
As this means that the threads are long-lived and small in number, it might be better to use processes instead. That way you get address space isolation and there isn't really an issue with running out of resources.
In summary: be wary of "marketing" claims made on behalf of threads. Parallel processing is great (increasingly it's going to be essential), but threads are only one way of achieving it.

Process creation is "expensive" because it has to set up a complete new virtual memory space for the process, with its own address space. "Expensive" means it takes a lot of CPU time.
Threads don't need to do this, just change a few pointers around, so it's much "cheaper" than creating a process. The reason threads don't need this is because they run in the address space, and virtual memory of the parent process.
Every process must have at least one thread. So if you think about it, creating a process means creating the process AND creating a thread. Obviously, creating only a thread will take less time and work by the computer.
In addition, threads are "lightweight" because threads can interact without the need of inter-process communication. Switching between threads is "cheaper" than switching between processes (again, just moving some pointers around). And inter-process communication requires more expensive communication than threads.

Threads within a process share the same virtual memory space but each has a separate stack, and possibly "thread-local storage" if implemented. They are lightweight because a context switch is simply a case of switching the stack pointer and program counter and restoring other registers, whereas a process context switch involves switching the MMU context as well.
Moreover, communication between threads within a process is lightweight because they share an address space.
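The shared address space is easy to demonstrate: a write made by a thread is visible to its creator, while a write made by a forked child is not. A small sketch (function names invented for the example):

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value;

static void *set_value(void *arg)
{
    (void)arg;
    shared_value = 42;          /* same address space as the creator */
    return NULL;
}

/* Returns what the creator sees after a thread wrote the global (42). */
int value_after_thread(void)
{
    pthread_t thread;

    shared_value = 0;
    if (pthread_create(&thread, NULL, set_value, NULL) != 0)
        return -1;
    pthread_join(thread, NULL);
    return shared_value;
}

/* Returns what the parent sees after a forked child wrote the global (0). */
int value_after_fork(void)
{
    pid_t pid;

    shared_value = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 42;      /* changes only the child's private copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return shared_value;
}
```

To get the child's result back to the parent you would need a pipe, shared memory or some other IPC mechanism, which is the extra cost the answers above refer to.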

process:
process id
environment
working directory
registers
stack
heap
file descriptors
shared libraries
interprocess communication mechanisms (pipes, semaphores, queues, shared memory, etc.)
OS-specific resources
thread:
stack
registers
attributes (for the scheduler, like priority, policy, etc.)
thread-specific data
OS-specific resources

A process contains one or more threads, and a thread can do anything a process can do. Threads within a process also share the same address space, so communication between them is cheap: they use the same code section, data section and OS resources. All of these features make a thread a "lightweight process".

Simply because threads share a common memory space: memory allocated by the main thread is shared with all the other threads.
A child process, by contrast, needs its own separate memory space.
