What's the difference between threads (and processes) in kernel mode and ones in user mode? - c

My question:
1) In the book Modern Operating Systems, it says that threads and processes can be in kernel mode or user mode, but it does not say clearly what the difference between them is.
2) Why does the switch for kernel-mode threads and processes cost more than the switch for user-mode threads and processes?
3) I am now learning Linux. How would I create threads and processes in kernel mode and in user mode, respectively, on a Linux system?
4) The book Modern Operating Systems says it is possible for a process to be in user mode while the threads created in that user-mode process are in kernel mode. How would this be possible?

There are some terminology problems due more to historical accident than anything else here.
"Thread" usually refers to thread-of-control within a process, and may (does in this case) mean "a task with its own stack, but which shares access to everything not on that stack with other threads in the same protection domain".
"Process" tends to refer to a self-contained "protection domain" which may (and does in this case) have the ability to have multiple threads within it. Given two processes P1 and P2, the only way for P1 to affect P2 (or vice versa) is through some particular defined "communications channel" such as a file, pipe, or socket; via "inter-process" signals like Unix/Linux signals; and so on.
Since threads don't have this kind of barrier between each other, one thread can easily interfere with (corrupt the data used by) another thread.
All of this is independent of user vs kernel, with one exception: in "the kernel"—note that there is an implicit assumption here that there is just one kernel—you have access to the entire machine state at all times, and full privileges to do anything. Hence you can deliberately (or in some cases accidentally) disregard or turn off hardware protection and mess with data "belonging to" someone else.
That mostly covers several possibly-confused items in Q1. As for Q2, the answer to the question as asked is "it doesn't". In general, because threads do not involve (as much) protection, it's cheaper to switch from one thread to another: you do not have to tell the hardware (in whatever fashion) that it should no longer allow various kinds of access, since threads T1 and T2 have "the same" access. Switching between processes, however, such as from P1 to P2, means crossing a protection barrier, which has some penalty (the actual penalty varies widely with hardware, and to some extent with the skills of the OS writers).
It's also worth noting that crossing from user to kernel mode, and vice versa, is also crossing a protection domain, which again has some kind of cost.
In Linux, there are a number of ways for user processes to create what amount to threads, including both "POSIX threads" (pthreads) and the clone call (details for clone, which is extremely flexible, are beyond the scope of this answer). If you want to write portable code, you should probably stick with pthreads.
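For example, a minimal user-mode thread with pthreads might look like this (a sketch; the worker function and its message are illustrative):

    /* Minimal pthreads sketch: create one user-mode thread and wait for it.
     * Compile with: gcc demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        printf("hello from thread, arg=%s\n", (const char *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        /* pthread_create returns an errno-style value, 0 on success */
        if (pthread_create(&tid, NULL, worker, "demo") != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
        pthread_join(tid, NULL);  /* block until the thread finishes */
        return 0;
    }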
Within the Linux kernel, threads are done completely differently, and you will need Linux kernel documentation.
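For a taste of the kernel side, here is a hedged sketch of a trivial module using the kthread API (kthread_run/kthread_stop are the real API; the module itself is illustrative, so check the kernel documentation before building on it):

    /* Illustrative kernel module: spawn one kernel thread that ticks
     * once a second until the module is unloaded. */
    #include <linux/module.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/err.h>

    static struct task_struct *worker;

    static int worker_fn(void *data)
    {
        /* Runs entirely in kernel mode, scheduled like any other thread. */
        while (!kthread_should_stop()) {
            pr_info("kernel worker tick\n");
            msleep(1000);
        }
        return 0;
    }

    static int __init demo_init(void)
    {
        worker = kthread_run(worker_fn, NULL, "demo_worker");
        return IS_ERR(worker) ? PTR_ERR(worker) : 0;
    }

    static void __exit demo_exit(void)
    {
        kthread_stop(worker);  /* makes kthread_should_stop() return true */
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");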
I can't properly answer Q4 since I don't have the book and am not sure what they are referring to here. My guess is that they mean that whenever any user process-or-thread makes a "system call" (requests some service from the OS), this crosses that user/kernel protection barrier, and it is then up to the kernel to verify that the user code has appropriate privileges for that operation, and then to do that operation. The part of the kernel that does this is running with kernel-level protections and thus needs to be more careful.
Some hardware (mostly obsolete these days) has (or had) more than just two levels of hardware-provided protection. On these systems, "user processes" had the least direct privilege, but above those you would find "executive mode", "system mode", and (most privileged) "kernel" or "nucleus" mode. These were intended to lower the cost of crossing the various protection barriers. Code running in "executive" did not have full access to everything in the machine, so it could, for instance, just assume that a user-provided address was valid, and try to use it. If that address was in fact invalid, the exception would rise to the next higher level. With only two levels—"user", unprivileged; and "kernel", completely-privileged—kernel code must be written very carefully. However, it's possible to provide "virtual machines" at low cost these days, which pretty much obsoletes the need for multiple hardware levels of protection. One simply writes a true kernel, then lets it run other things in what they "think" is "kernel mode". This is what VMware and other "hypervisor" systems do.

User-mode threads are scheduled in user mode by something in the process, and the process itself is the only thing handled by the kernel scheduler.
That means your process gets a certain amount of grunt from the CPU and you have to share it amongst all your user mode threads.
Simple case: you have two processes, one with a single thread and one with a hundred threads.
With a simplistic kernel scheduling policy, the thread in the single-thread process gets 50% of the CPU and each thread in the hundred-thread process gets 0.5%.
With kernel-mode threads, the kernel itself manages your threads and schedules them independently. Using the same simplistic scheduler, each thread would get just a touch under 1% of the CPU grunt (101 threads sharing 100% of the CPU).
In terms of why kernel mode switching is more expensive, it probably has to do with the fact that you need to switch to kernel mode to do it. User mode threads do all their stuff in user mode (obviously) so there's no involving the kernel in a thread switch operation.
Under Linux, you create threads (and processes) with the clone call, similar to fork but with much finer control over things.
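A hedged sketch of clone in action (the flag set and the shared variable are chosen for illustration; see the clone(2) man page for the full story). With CLONE_VM the child shares the parent's address space, which is what makes it thread-like rather than process-like:

    /* Illustrative use of the glibc clone() wrapper on Linux. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int child_fn(void *arg)
    {
        /* CLONE_VM means this really is the parent's memory. */
        printf("child sees shared variable = %d\n", *(int *)arg);
        return 0;
    }

    int main(void)
    {
        static int shared = 42;
        char *stack = malloc(1024 * 1024);
        if (!stack) return 1;

        /* The child's stack grows downward on x86, so pass the top. */
        pid_t pid = clone(child_fn, stack + 1024 * 1024,
                          CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD,
                          &shared);
        if (pid == -1) { perror("clone"); return 1; }

        waitpid(pid, NULL, 0);  /* without CLONE_THREAD the child is waitable */
        free(stack);
        return 0;
    }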
Your final point is a little obscure. I can't be certain, but it's probably talking about user and kernel mode in the sense that one thread could be executing user code while another is doing some system call in the kernel (which requires switching to kernel or supervisor mode).
That's not the same as the distinction when talking about the threading support (user or kernel mode support for threading). Without having a copy of the book to hand, I couldn't say definitively, but that'd be my best guess.

Related

Are processes "sandboxed" by hardware?

Can a process access all of the RAM or does the CPU give the process a specific part which the kernel decides, and the process (running in user space) can't change? In other words - is a process sandboxed by hardware, or can it do anything, but is monitored by the OS?
EDIT
I'm told in the comments that this is too broad, so let's assume x86/x64. I'll also add that the question arose while reading something I understood to say that processes can access all RAM - which seems to conflict with what I've read about security in OSes.
If you count MS-DOS as an "operating system", then processes can do anything (and aren't monitored). Even Windows 95 doesn't have real memory protection, and a buggy process can crash the machine by scribbling over the wrong memory.
If you only count modern OSes with privilege separation (Unix/Linux, Windows NT and derivatives), then processes are sandboxed.
AFAIK, there aren't really systems where there's monitoring of any kind other than "fault if you try to do something". The kernel sets the boundaries, and the user-space process gets a fault if it tries to go outside them.
If you're imagining that maybe the kernel looks at what an unprivileged process does, and adapts accordingly, then no, that's not what happens.
See
https://en.wikipedia.org/wiki/Memory_protection: Usually achieved by giving each process its own virtual address space (virtual memory). This is hardware-supported: every address your code uses is translated to a physical address by a fast translation cache (TLB), which caches the translation tables set up by the OS (aka page tables).
A process can't directly modify its own page tables: it has to ask the kernel to map more physical memory into its address space (e.g. as part of malloc()). So the kernel has a chance to verify that the request is OK before doing it. (There is a short sketch of this after the links.)
Also, a process can ask the kernel to copy data to/from files (or other things) into its memory space. (write/read system calls).
https://en.wikipedia.org/wiki/User_space: normal processes run in user-mode, which is a mode provided by the hardware where privileged instructions will trap to the kernel.
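To make that concrete, here is a small illustrative program (assuming Linux; nothing here comes from the question itself): it asks the kernel for a page via mmap(2), and the commented-out last write shows the kind of access that would fault, because no translation for that address exists any more:

    /* A process asks the kernel for memory; touching unmapped addresses faults. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Ask the kernel to map one anonymous page into our address space. */
        char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        page[0] = 'x';  /* fine: the kernel granted this page */
        printf("wrote to mapped page: %c\n", page[0]);

        munmap(page, 4096);
        /* page[0] = 'y';  would now raise SIGSEGV: the mapping is gone */
        return 0;
    }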

What happens when we set different processor affinities for a process and its threads in Linux?

What happens when we set different processor affinities for a process and its threads in Linux?
I am trying to start a process pinned to one core (say core 1) that has two threads, one of which needs to run on another core (say core 0).
When I tried setting the thread's affinity differently from the process's, the program ran, but I want to know the hidden impacts of this approach.
Threads and processes are largely the same thing. Whether you call pthread_setaffinity... or use the sched_setaffinity syscall, they both affect the current thread's affinity mask. This may be your "process" thread, or a thread you created.
However, note that a new thread created by pthread_create inherits a copy of its creator's CPU affinity mask [1].
That means that setting the affinity and then creating a thread is not the same as creating a thread and then setting the affinity. In the first case, both threads will compete over the same processor (which is most probably not what you want), and in the second case they can be bound to different processors.
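A hedged sketch of the second ordering (create, then bind; the core numbers are illustrative). The main thread is pinned to core 1, the new thread inherits that mask, and is then moved to core 0 with the GNU extension pthread_setaffinity_np:

    /* Pin the process to core 1, then move a new thread to core 0. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* Note: there is a benign race here; the thread may report its CPU
         * before the main thread has rebound it. A real program would
         * synchronize, or set an affinity attribute before creation. */
        printf("worker running on CPU %d\n", sched_getcpu());
        return NULL;
    }

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);
        sched_setaffinity(0, sizeof set, &set);  /* bind calling thread to core 1 */

        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL); /* inherits the core-1 mask */

        CPU_ZERO(&set);
        CPU_SET(0, &set);
        pthread_setaffinity_np(tid, sizeof set, &set);  /* rebind to core 0 */

        pthread_join(tid, NULL);
        return 0;
    }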
Also note that while binding threads to a dedicated processor (core) may have advantages in some situations, it may just as well be a very stupid thing to do. Playing with the affinity mask means you limit the scheduler in what it can do to make your program run. If the core you bound your thread to isn't available, your thread will not run, end of story.
This is reasoning very similar to disabling swap to make the system "faster" (some users still do that!). By doing so they usually gain nothing; all they do is limit what the memory manager can do by removing one option for providing a free page once it runs out of unused physical RAM. Usually this means something more or less valuable from the buffer cache is purged when instead some private page that hadn't been used in hours could have been swapped out.
Usually people use affinity because they have this idea that the scheduler will make threads bounce between processor cores all the time and this is bad. Processor migration indeed is not cheap, but the scheduler has a mechanism which makes sure it does not happen before a certain minimum amount of time (there is a /proc thingie for that too). After a longer amount of time, all advantages of staying at the old core (TLB, cache) are usually gone anyway, so running on a different core which is readily available is actually better than waiting for a particular core to maybe, eventually become available.
NUMA architectures may be a different topic, but I'd assume (though I don't know for sure) that the scheduler is smart enough not to silently migrate a thread to a different node. In general, however, I'd recommend not to play with affinity at all.
Affinity is a common first-line approach to limiting jitter in HPC. Typically, Linux processes, threads, and the like are constrained to a small but sufficient set of CPUs, and the application is constrained to the remainder of the CPUs.
Affinity is very useful with device drivers. Consider, for example, an InfiniBand adapter being used by an application. The adapter will perform best if the driver thread(s) are constrained to CPUs on the same NUMA node as the adapter (or the closest one, if none). Linux doesn't know about the application's threads, so it can't even consider any such affinity for performance.

How does the N:1 threading model work?

In continuation of a previous question, this is an additional query on the N:1 threading model.
It is taught that, before designing an application, the selection of a threading model needs to be considered.
In the N:1 threading model, a single kernel thread is available to work on behalf of each user process. The OS scheduler gives a CPU time slice to this kernel thread.
In user space, the programmer would use either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Since the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and each thread is considered for processor-time assignment by the scheduler. So that means every user thread will get a kernel thread.
My question:
So how can the N:1 threading model exist? It would be the 1:1 threading model. Please clarify.
In user space, the programmer would use either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Since the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and each thread is considered for processor-time assignment by the scheduler. So that means every user thread will get a kernel thread.
That's how 1-to-1 threading works.
This doesn't have to be the case. A platform can implement pthread_create, CreateThread, or whatever other "create a thread" function it offers, in whatever way it wants.
My question:
So how can the N:1 threading model exist? It would be the 1:1 threading model.
Please clarify.
Precisely as you explained in the beginning of your question -- when the programmer creates a thread, instead of creating a thread the kernel is aware of, it creates a thread that the userland scheduler is aware of, still using a single kernel thread for the entire process.
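A minimal sketch of that idea, hand-rolling two cooperatively scheduled user-land threads on one kernel thread with the old (deprecated but still present on Linux) ucontext API; this is purely illustrative, not how any particular pthreads implementation works:

    /* Two "green" threads switched entirely in user mode: the kernel
     * sees a single thread throughout. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, a_ctx, b_ctx;
    static char a_stack[64 * 1024], b_stack[64 * 1024];

    static void worker_a(void)
    {
        puts("A runs");
        swapcontext(&a_ctx, &b_ctx);   /* yield to B: no syscall involved */
        puts("A resumes");
        swapcontext(&a_ctx, &main_ctx);
    }

    static void worker_b(void)
    {
        puts("B runs");
        swapcontext(&b_ctx, &a_ctx);   /* yield back to A */
    }

    int main(void)
    {
        /* Each user-land thread gets its own stack; the kernel is unaware. */
        getcontext(&a_ctx);
        a_ctx.uc_stack.ss_sp = a_stack;
        a_ctx.uc_stack.ss_size = sizeof a_stack;
        a_ctx.uc_link = &main_ctx;
        makecontext(&a_ctx, worker_a, 0);

        getcontext(&b_ctx);
        b_ctx.uc_stack.ss_sp = b_stack;
        b_ctx.uc_stack.ss_size = sizeof b_stack;
        b_ctx.uc_link = &main_ctx;
        makecontext(&b_ctx, worker_b, 0);

        swapcontext(&main_ctx, &a_ctx);  /* start the user-land "scheduler" */
        puts("back in main");
        return 0;
    }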
Short answer: there is more than Windows and Linux.
Slightly longer answer (EDITED):
Many programming languages and frameworks introduce multithreading to the programmer. At the same time, they aim to be portable, i.e., it is not known whether a given target platform supports threads at all. Here, the best way is to implement N:1 threading, either in general or at least for the backends without threading support.
The classic example is Java: the language supports multithreading, while JVMs exist even for very simple embedded platforms that do not support threads. However, there are JVMs (actually, most of them) that use kernel threads (e.g., AFAIK, the JVM by Sun/Oracle).
Another reason a language/platform might not want to hand threading control over to the operating system entirely is special implementation features such as reactor models or global language locks. Here, the objective is to use information about special execution patterns in the user runtime system (which does the local scheduling) that the OS scheduler has no access to.
Does [1:1 threading] add more space occupancy on the user process's virtual address space because of these kernel threads?
Well, in theory, execution flow (processes, threads, etc.) and address space are independent concepts. One can find all kinds of mappings between processes (here used as a general term) and memory spaces: 1:1, n:1, 1:n, n:n. However, the classic approach of threading is that several threads of a process share the memory space of the task (which is the owner of the memory space). And thus, there is usually no difference between user threads and kernel threads regarding the memory space. (One exception is, e.g., the Erlang VM: there, user threads exist with isolated memory spaces.)

Why does the OS require/maintain kernel-land threads?

Below are three threading models that I came across.
Based on the three architectures below, it was new to me that there also exists something called a kernel thread, apart from the user threads introduced as part of POSIX.1c.
(figure: the 1:1 model)
(figure: the N:1 model)
(figure: the hybrid model)
I have been through many questions on SO about kernel threads. This looks like the most relevant link for clarification.
At the process level: for every user process loaded by the Linux loader (say), the kernel does not allocate a corresponding kernel process to execute the machine instructions that the user process comes up with. The user process only requests kernel-mode execution when it requires a facility from the kernel [like malloc()/fork()]. Scheduling of user processes is done by the OS scheduler, which assigns a CPU core.
For example, a user process does not require kernel execution mode to execute an instruction like
a = a + 2; // a is a local variable in a user-level C function
My question:
1)
So, what is the purpose of a kernel-level thread? Why does the OS need to maintain a kernel thread (additionally) for a corresponding user thread of a user-level process? Does a user-mode programmer have any control over choosing any of the above three threading models for a given user process through programming?
After I understand the answer to the first question, a relevant supplementary is:
2)
Is it the kernel thread that actually gets scheduled by the OS scheduler, and not the user thread?
I think the use of the term kernel thread is a bit misleading in these figures. I know the figures from a book about operating system design, and if I remember correctly, they refer to the way work is scheduled by the operating system.
In the figures, each process has at least one kernel thread assigned that is scheduled by the kernel.
The N:1 model shows multiple user-land threads that are not known to the kernel at all, because the latter schedules only the process (or, as it's called in the figure, a single kernel thread). So for the kernel, each process is a kernel thread. When the process is assigned a slice of processor time, it itself runs multiple threads by scheduling them at its own discretion.
In the 1-1 model, the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. So instead of scheduling a whole process, the kernel switches between threads inside of processes.
The hybrid model combines both principles, where lightweight processes are actually threads known to the kernel and which are scheduled for execution by it. Additionally, they implement threads the kernel is not aware of and assign processor time in user-land.
And now, to confuse things completely: there are actually real kernel threads in Linux. But as far as I understand the concept, these threads are used for kernel-space operations only, e.g. when kernel modules need to do things in parallel.
So, what is the purpose of a kernel-level thread?
To provide a vehicle for the assignment of a set of resources provided by the OS. The set always includes CPU code execution on a core. Others may include disk, NIC, keyboard, mouse, and timers, as may be requested by syscalls from the thread. The kernel manages access to those resources as they become available and arbitrates between resource conflicts, e.g. a request for keyboard input when none is available will remove CPU execution from the thread until keyboard input becomes available.
Why do we need a kernel thread (additionally) for a corresponding user thread of a user-level process?
Without a kernel-level thread, the user thread would not be able to obtain execution - it would be dead code/stack. Note that with Linux, the concept of threads/processes can get somewhat muddied, but nevertheless, the fundamental unit of execution is a thread. A process is a higher-level construct whose code must be run by at least one thread, (eg. the one raised by the OS loader to run code at the process entry point when it is first loaded).
Does a user-mode programmer have any control over choosing any of the above three threading models for a given user process through programming?
No, not without a syscall, which means leaving user mode.
Is it the kernel thread that actually gets scheduled by the OS scheduler, and not the user thread?
Yes - it is the only thing that gets to be given execution when it can use it, have execution removed when it cannot, and be subject to preemptive removal of CPU if the OS scheduler requires it for something else.

Notifying user-mode as soon as a packet arrives

(This is for a low-latency system.)
Assuming I have some code which transfers received UDP packets to a region of shared memory, how can I then notify the application (in user mode) that it is now time to read the shared memory? I do not want the application continuously polling, eating up CPU cycles.
Is it possible to insert some code in the network stack which can call my application code immediately after it has written to the shared memory?
EDIT I added a C tag, but the application would be in C++
One way to signal an event from one Unix process to another is with POSIX semaphores. You would use sem_open to initialize and open a named semaphore that you can use cross-process.
See How can I get multiple calls to sem_open working in C?.
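A hedged sketch of the consumer side (the name "/pkt_ready" is illustrative): the producer, whatever writes the packets into shared memory, would sem_open the same name and call sem_post each time a packet lands.

    /* Consumer blocks in sem_wait instead of polling. Link with -pthread. */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
        /* Create (or open) the named semaphore with an initial count of 0. */
        sem_t *sem = sem_open("/pkt_ready", O_CREAT, 0600, 0);
        if (sem == SEM_FAILED) { perror("sem_open"); return 1; }

        sem_wait(sem);  /* sleeps until the producer calls sem_post() */
        puts("packet ready: read the shared memory now");

        sem_close(sem);
        return 0;
    }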
The lowest-latency method to signal an event between processes on the same host is to spin-wait looking for a (shared) memory location to change; this avoids a system call. You expressly said you do not want the application polling; however, in a multi-threaded application running on a multi-core system it may not be a bad tradeoff if you really care about latency.
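A minimal sketch of that spin-wait using C11 atomics; for demonstration the "producer" is a thread in the same process, whereas in your case the flag would live in the shared-memory region:

    /* Busy-wait on a flag: no syscall on the wake-up path, but burns a core. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static _Atomic int ready = 0;  /* in real use: a flag in shared memory */

    static void *producer(void *arg)
    {
        sleep(1);  /* stand-in for "packet arrived" */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, producer, NULL);

        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;  /* spin: lowest latency, 100% CPU while waiting */

        puts("flag set: read the shared region now");
        pthread_join(tid, NULL);
        return 0;
    }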
Unless you are planning to use a real-time OS, there is no "immediate" protocol. CPU resources are handed out in quanta of a few milliseconds, and it usually takes some time for your user thread to learn that it can continue.
Considering all of the above, any form of IPC would do: local sockets, signals, pipes, event descriptors, etc. The practical difference in performance would be marginal.
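Of those, the Linux "event descriptor" (eventfd) is perhaps the least known; a hedged sketch, where a fork()ed child stands in for whatever signals the packet's arrival:

    /* Reader blocks in read(2) on an eventfd until the writer posts. */
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    int main(void)
    {
        int efd = eventfd(0, 0);  /* counter starts at 0 */
        if (efd == -1) { perror("eventfd"); return 1; }

        if (fork() == 0) {        /* child: pretend a packet arrived */
            uint64_t one = 1;
            sleep(1);
            write(efd, &one, sizeof one);  /* adds 1 to the counter */
            _exit(0);
        }

        uint64_t count;
        read(efd, &count, sizeof count);  /* blocks until counter > 0 */
        printf("woke up, %llu event(s)\n", (unsigned long long)count);
        return 0;
    }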
Furthermore, the use of shared memory can lead to unnecessary complications in maintenance/debugging, but that's the designer's choice.
