How does N<->1 threading model work? - c

In continuation to question, This is an additional query on N-1 threading model.
It is taught that, before designing an application, selection of threading model need to be taken care.
In N-1 threading model, a single kernel thread is available to work on behalf of each user process. OS scheduler gives a single CPU time slice to this kernel thread.
In user space, programmer would use either POSIX pthread or Windows CreateThread() to spawn multiple threads within a user process. As the programmer used POSIX pthread or Windows CreateThread() the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. SO, that means every user thread will get a kernel thread.
My question:
So, How does N-1 threading model looks possible to exist? It would be 1-1 threading model. Please clarify.

In user space, programmer would use either POSIX pthread or Windows CreateThread() to spawn multiple threads within a user process. As the programmer used POSIX pthread or Windows CreateThread() the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. SO, that means every user thread will get a kernel thread.
That's how 1-to-1 threading works.
This doesn't have to be the case. A platform can implement pthread_create, CreateThread, or whatever other "create a thread" function it offers that does whatever it wants.
My question:
So, How does N-1 threading model looks possible to exist? It would be 1-1 threading model.
Please clarify.
Precisely as you explained in the beginning of your question -- when the programmer creates a thread, instead of creating a thread the kernel is aware of, it creates a thread that the userland scheduler is aware of, still using a single kernel thread for the entire process.

Short answer: there is more than Windows and Linux.
Slightly longer answer (EDITED):
Many programming languages and frameworks introduce multithreading to the programmer. At the same time, they aim to be portable, i.e., it is not known, whether any target plattform does support threads at all. Here, the best way is to implement a N:1 threading, either in general, are at least for the backends without threading support.
The classic example is Java: the language supports multithreading, while JVMs exist even for very simple embedded plattforms, that do not support threads. However, there are JVMs (actually, most of them) that use kernel threads (e.g. AFIK, the JVM by Sun/Oracle).
Another reason that a language/plattform does not want to transfer the threading control completely to the operating system are sometimes special implementation features as reactor modells or global language locks. Here, the objective is to use information on execution special patterns in the user runtime system (which does the local scheduling) that the OS scheduling has no access to.
Does [1:1 threading] add more space occupancy on User process virtual
address space because of these kernel threads?
Well, in theory, execution flow (processes, threads, etc.) and address space are independent concepts. One can find all kinds of mapping between processes (here used as a general term) and memory spaces: 1:1, n:1, 1:n, n:n. However, the classic approach of threading is that several threads of a process share the memory space of the task (that is the owner of the memory space). And thus, there is usually no difference between user threads and kernel threads regarding the memory space. (One exception is, e.g., the Erlang-VM: here, there exist user threads with isolated memory spaces).

Related

User-level threads context switching: How to detect when a thread is blocking in C?

As the title suggests, is there a way in C to detect when a user-level thread running on top of a kernel-level thread e.g., pthread has blocked (or about to block) for I/O?
My use case is as follows: I need to execute tasks in a multithreaded environment (on top of kernel threads e.g., pthreads). The tasks are basically user functions that can be synchronized and may use blocking operations within. I need to hide latency in my implementation. So, I am exploring the idea of implementing the tasks as user-level threads for better control of their execution context such that, when a task blocks or synchronizes, I context-switch to other ready tasks (i.e., implementing my own scheduler for the user-level threads). Consequently, almost the full use of the OS’s time quantum per kernel thread can be achieved.
There used to be code that did this, for example GNU pth. It's generally been abandoned because it just doesn't work very well and we have much better options now. You have two choices:
1) If you have OS help, you can use the OS mechanisms. Windows provides OS help for this, IOCP dispatching uses it.
2) If you have no OS help, then you have to convert all blocking operations into non-blocking ones that call your dispatcher rather than blocking. So, for example, if someone calls socket, you intercept that call and set the socket non-blocking. When they call read, you intercept that call and if they get a "would block" indication, you arrange to resume when the operation might succeed and schedule another thread.
You can look at GNU pth to see how you might make option 2 work. But be warned, GNU pth is full of reported bugs that have never been fixed since it was abandoned. It will give you an idea of how to implement things like mutexes and sleeps in a cooperative user-space threading environment. But don't actually use the code.

Why does OS require/maintain kernel-land threads?

Below are three threading models that i came across.
Based on these below 3 architectures, It is new for me to understand that, there also exist something called kernel thread, apart from user thread which is introduced as part of POSIX.1C
This is 1-1 model
This is N-1 model.
This is Hybrid model.
I have been through many questions on SO for kernel threads. This looks more relevant link for clarification.
At process level, For every user process that is loaded by Linux loader(say), Kernel does not allocate corresponding kernel process for executing machine instructions that a user process has come up with. User process only request for kernel mode execution, when it require a facility from kernel module[like malloc()/fork()]. Scheduling of user process is done by OS scheduler and assign a CPU core.
For example, User process does not require kernel execution mode to execute an instruction
a=a+2;//a is my local variable in a user level C function
My question:
1)
So, What is the purpose of kernel level thread? Why does OS need to maintain a kernel thread(additionally) for corresponding user thread of a User level process? Does User mode programmer have any control on choosing any of the above three threading models for a given User process through programming?
After i understand the answer to first question, one relevant supplementary is,
2)
Does kernel thread actually get scheduled by OS scheduler but not user thread?
I think the use of the word kernel thread is a bit misleading in these figures. I know the figures from a book about operating system (design) and if I remember correctly, they refer to the way how work is scheduled by the operating system.
In the figures, each process has at least one kernel thread assigned that is scheduled by the kernel.
The N-1 model shows multiple user-land threads that are not known to the kernel at all because the latter schedules the process (or how it's called in the figure, a single kernel thread) only. So for the kernel, each process is a kernel thread. When the process is assigned a slice of processor time, it itself runs multiple threads by scheduling them at its own discretion.
In the 1-1 model, the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. So instead of scheduling a whole process, the kernel switches between threads inside of processes.
The hybrid model combines both principles, where lightweight processes are actually threads known to the kernel and which are scheduled for execution by it. Additionally, they implement threads the kernel is not aware of and assign processor time in user-land.
And now to be completely confused, there is actually a real kernel thread in Linux. But as far as I understand the concept, these threads are used for kernel-space operations only, e.g. when kernel modules need to do things in parallel.
So, What is the purpose of kernel level thread?
To provide a vehicle for assignment of a set of resources provided by the OS. The set always incudes CPU code execution on a core. Others may include disk, NIC, KB, mouse, timers, as may be requested by syscalls from the thread. The kernel manages access to those resources as they become available and arbitrates between resource conflicts, eg. a request for KB input when none is available will remove CPU execution from the thread until KB input becomes available.
Why do we need a kernel thread(additionally) for corresponding user
thread of a User level process?
Without a kernel-level thread, the user thread would not be able to obtain execution - it would be dead code/stack. Note that with Linux, the concept of threads/processes can get somewhat muddied, but nevertheless, the fundamental unit of execution is a thread. A process is a higher-level construct whose code must be run by at least one thread, (eg. the one raised by the OS loader to run code at the process entry point when it is first loaded).
Does User mode programmer have any control on choosing any of the
above three threading models for a given User process through
programming?
No, not without a syscall, which means leaving user mode.
Does kernel thread actually get scheduled by OS scheduler but not user
thread
Yes - it is the only thing that gets to be given execution when it can use it, have execution removed when it cannot, and be subject to preemptive removal of CPU if the OS scheduler requires it for something else.

How to portably share a variable between threads/processes?

I have a server that spawns a new process or thread for every incoming request and I need to read and write a variable defined in this server from both threads and processes. Since the server program needs to work both on UNIX and Windows I need to share the variable in a portable way, but how do I do it?
I need to use the standard C library or the native syscalls, so please don’t suggest third party libraries.
shared memory is operating system specific. On Linux, consider reading shm_overview(7) and (since with shared memory you always need some way to synchronize) sem_overview(7).
Of course you need to find out the similar (but probably not equivalent) Windows function calls.
Notice that threads are not the same as processes. Threads by definition share a common single address space. With threads, the main issue is then mostly synchronization, often using mutexes (e.g. pthread_mutex_lock etc...). On Linux, read a pthread tutorial & pthreads(7)
Recall that several libraries (glib, QtCore, Poco, ...) provide useful abstractions above operating system specific functionalities, but you seem to want avoiding them.
At last, I am not at all sure that sharing a variable like you ask is the best way to achieve your goals (I would definitely consider some message passing approach with an event loop: pipe(7) & poll(2), perhaps with a textual protocol à la JSON).

OpenMp and Shared Memory definition

According to the OpenMP web site OpenMp is "the de-facto standard for parallel programming on shared memory systems" According to Wikipedia "Using memory for communication inside a single program, for example among its multiple threads, is generally not referred to as shared memory."
What is wrong here ? Is it the "generally" term ?
OpenMp is really just creating threads "sharing memory" through a single same virtual adress space, isn't it ?
Moreover, I guess OpenMP is able to run on NUMA architectures where all the memory can be addressed by all the processors, but with some increased memory access time when threads sharing data, are assigned to cores accessing to different memories at different access time. Is this true ?
I'm redacting a full-fledged answer here to try to answer further questions asked as comments to lucas1024's answer.
On the meaning of "shared memory"
On the one hand, you have the software-oriented (i.e. OS-oriented) meaning of shared-memory: a way to enable different processes to access the same chunk of memory (i.e. to relax the usual OS constraint that a given process should not be able to tamper with other processes' memory). As stated in the wikipedia page, the POSIX shared memory API is one implementation of such a facility. In this acception, it does not make much sense to speak of threads (an OS might well provide shared memory without even providing threads).
On the other hand, you have the hardware-oriented meaning of "shared-memory": an hardware configuration where all CPUs have access to the same piece of RAM.
On the meaning of "thread"
Now we have to disambiguate another term: "thread". An OS might provide a way to have multiple concurrent execution flows within a process. POSIX threads are an implementation of such a feature.
However, the OpenMP specification has its own definitions:
thread: An execution entity with a stack and associated static memory, called
threadprivate memory.
OpenMP thread: A thread that is managed by the OpenMP runtime system.
Such definitions fit nicely with the definition of e.g. POSIX threads, and most OpenMP implementations indeed use POSIX threads to create OpenMP threads. But you might imagine OpenMP implementations on top of OSes which do not provide POSIX threads or equivalent features. Such OpenMP implementations would have to internally manage execution flows, which is difficult enough but entirely doable. Alternatively, they might map OpenMP threads to OS processes and use some kind of "shared memory" feature (in the OS sense) to enable them sharing memory (though I don't know of any OpenMP implementation doing this).
In the end, the only constraint you have for an OpenMP implementation is that all CPUs should have a way to share access to the same central memory. That is to say OpenMP programs should run on "shared memory" systems in the hardware sense. However, OpenMP threads do not necessarily have to be POSIX threads of the same OS process.
A "shared memory system" is simply a system where multiple cores or CPUs are accessing a single pool of memory through a local bus. So the OpenMP site is correct.
Communicating between threads in a program is not done using "shared memory" - instead the term typically refers to communication between processes on the same machine through memory. So the Wikipedia entry is not in contradiction and it, in fact, points out the difference in terminology between hardware and software.

what's the difference between the threads(and process) in kernel-mode and ones in user-mode?

my question:
1)In book modern operating system, it says the threads and processes can be in kernel mode or user mode, but it does not say clearly what's the difference between them .
2)Why the switch for the kernel-mode threads and process costs more than the switch for user-mode threads and process?
3) now, I am learning Linux,I want to know how would I create threads and processes in Kernel mode and user mode respectively IN LINUX SYSTEM?
4)In book modern operating system, it says that it is possible that process would be in user- mode, but the threads which are created in the user-mode process can be in kernel mode. How would this be possible?
There are some terminology problems due more to historical accident than anything else here.
"Thread" usually refers to thread-of-control within a process, and may (does in this case) mean "a task with its own stack, but which shares access to everything not on that stack with other threads in the same protection domain".
"Process" tends to refer to a self-contained "protection domain" which may (and does in this case) have the ability to have multiple threads within it. Given two processes P1 and P2, the only way for P1 to affect P2 (or vice versa) is through some particular defined "communications channel" such as a file, pipe, or socket; via "inter-process" signals like Unix/Linux signals; and so on.
Since threads don't have this kind of barrier between each other, one thread can easily interfere with (corrupt the data used by) another thread.
All of this is independent of user vs kernel, with one exception: in "the kernel"—note that there is an implicit assumption here that there is just one kernel—you have access to the entire machine state at all times, and full privileges to do anything. Hence you can deliberately (or in some cases accidentally) disregard or turn off hardware protection and mess with data "belonging to" someone else.
That mostly covers several possibly-confused items in Q1. As for Q2, the answer to the question as asked is "it doesn't". In general, because threads do not involve (as much) protection, it's cheaper to switch from one thread to another: you do not have to tell the hardware (in whatever fashion) that it should no longer allow various kinds of access, since threads T1 and T2 have "the same" access. Switching between processes, however, as with P1 and P2, you "cross a protection barrier", which has some penalty (the actual penalty varies widely with hardware, and to some extent the skills of the OS writers).
It's also worth noting that crossing from user to kernel mode, and vice versa, is also crossing a protection domain, which again has some kind of cost.
In Linux, there are a number of ways for user processes to create what amount to threads, including both "POSIX threads" (pthreads) and the clone call (details for clone, which is extremely flexible, are beyond the scope of this answer). If you want to write portable code, you should probably stick with pthreads.
Within the Linux kernel, threads are done completely differently, and you will need Linux kernel documentation.
I can't properly answer Q4 since I don't have the book and am not sure what they are referring to here. My guess is that they mean that whenever any user process-or-thread makes a "system call" (requests some service from the OS), this crosses that user/kernel protection barrier, and it is then up to the kernel to verify that the user code has appropriate privileges for that operation, and then to do that operation. The part of the kernel that does this is running with kernel-level protections and thus needs to be more careful.
Some hardware (mostly obsolete these days) has (or had) more than just two levels of hardware-provided protection. On these systems, "user processes" had the least direct privilege, but above those you would find "executive mode", "system mode", and (most privileged) "kernel" or "nucleus" mode. These were intended to lower the cost of crossing the various protection barriers. Code running in "executive" did not have full access to everything in the machine, so it could, for instance, just assume that a user-provided address was valid, and try to use it. If that address was in fact invalid, the exception would rise to the next higher level. With only two levels—"user", unprivileged; and "kernel", completely-privileged—kernel code must be written very carefully. However, it's possible to provide "virtual machines" at low cost these days, which pretty much obsoletes the need for multiple hardware levels of protection. One simply writes a true kernel, then lets it run other things in what they "think" is "kernel mode". This is what VMware and other "hypervisor" systems do.
User-mode threads are scheduled in user mode by something in the process, and the process itself is the only thing handled by the kernel scheduler.
That means your process gets a certain amount of grunt from the CPU and you have to share it amongst all your user mode threads.
Simple case, you have two processes, one with a single thread and one with a hundred threads.
With a simplistic kernel scheduling policy, the thread in the single-thread process gets 50% of the CPU and each thread in the hundred-thread process gets 0.5% each.
With kernel mode threads, the kernel itself manages your threads and schedules them independently. Using the same simplistic scheduler, each thread would get just a touch under 1% of the CPU grunt (101 threads to share the 100% of CPU).
In terms of why kernel mode switching is more expensive, it probably has to do with the fact that you need to switch to kernel mode to do it. User mode threads do all their stuff in user mode (obviously) so there's no involving the kernel in a thread switch operation.
Under Linux, you create threads (and processes) with the clone call, similar to fork but with much finer control over things.
Your final point is a little obtuse. I can't be certain but it's probably talking about user and kernel mode in the sense that one could be executing user code and another could be doing some system call in the kernel (which requires switching to kernel or supervisor mode).
That's not the same as the distinction when talking about the threading support (user or kernel mode support for threading). Without having a copy of the book to hand, I couldn't say definitively, but that'd be my best guess.

Resources