Threads on a multiprocessor system


Does the clone() function take advantage of a multiprocessor system?
I mean, if I create many threads inside a main process using the clone() function, will these created threads be executed on different processors simultaneously?
thanks

You don't generally have control over processors from user space (with some exceptions). It is the kernel's decision how to distribute all the separate execution contexts onto the available hardware processors. It will probably try to be nice and smart (i.e. a sleeping thread is woken up on the same CPU where it went to sleep, and if there is no contention, separate threads should spread out over the available CPUs), but a real operating system running many processes is a complex thing and the choice of processor depends on many factors.
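To see where the kernel actually places threads, here is a minimal sketch, assuming Linux with glibc. It uses pthread_create (which is built on clone() on Linux) and the glibc function sched_getcpu(). The names run_and_report, record_cpu and MAX_THREADS are made up for this example.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */

/* One slot per thread: the CPU number the thread observed while running. */
static int observed_cpu[MAX_THREADS];

static void *record_cpu(void *arg)
{
    int *slot = arg;
    /* Burn a little CPU time so the threads overlap in time. */
    volatile unsigned long spin = 0;
    for (unsigned long i = 0; i < 20000000UL; i++)
        spin += i;
    *slot = sched_getcpu();   /* CPU the thread is on right now, -1 on error */
    return NULL;
}

/* Starts n threads, waits for them and prints the CPU each one reported.
 * Returns n if all threads ran, -1 otherwise. */
int run_and_report(int n)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (n < 1 || n > MAX_THREADS)
        return -1;

    for (int i = 0; i < n; i++) {
        observed_cpu[i] = -1;
        if (pthread_create(&tid[i], NULL, record_cpu, &observed_cpu[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        printf("thread %d finished on CPU %d\n", i, observed_cpu[i]);
    }
    return started == n ? n : -1;
}
```

Calling run_and_report(4) from main() on an idle multi-core machine will usually print several different CPU numbers, but nothing guarantees it: the placement is entirely the scheduler's choice and can differ from run to run.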

Related

linux c: what's the common use case of "sched_setaffinity" function? I don't find it useful

The operating system is able to decide how to arrange different processes/threads onto different CPU cores, and the OS scheduler does this work well. So when do we really need to call functions like sched_setaffinity() for a process, or pthread_setaffinity_np() for a pthread?
It doesn't seem able to raise performance dramatically; if it could, I suppose we would need to rewrite the Linux process scheduler, right?
I just wish to know when I need to call these functions in my applications.
Thanks.

It's very helpful in some computationally intensive real-time processes related to DSP (Digital Signal Processing).
Say a real-time DSP process, PROCESS0, is running on core CPU0. If the scheduler preempts it on CPU0 and later resumes it on another CPU, that migration of a real-time process is overhead. Hence affinity: we tell the kernel that PROCESS0 should run on CPU0.
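A minimal sketch of that pinning, assuming Linux with glibc. sched_setaffinity() and the CPU_* macros are the real API. The helper names pin_to_cpu and first_allowed_cpu are made up for illustration. Passing pid 0 means "the calling thread".

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restricts the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Returns the lowest-numbered CPU the calling thread may currently
 * run on, or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

The DSP process from the example would call pin_to_cpu(0) early in main(). On a machine where CPU 0 is not available to the process (containers, cpusets), the call fails, which is why the sketch also shows how to query the allowed set first.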

Fork : concurrency or parallelism

I was recently doing some experiments with the fork function and I was concerned by a "simple" (short) question:
Does fork use concurrency or parallelism (if there is more than one core) mechanisms?
Or, is it the OS which makes the best choice?
Thanks for your answer.
nb: Damn I fork bombed myself again!
Edit:
Concurrency: all operations run on one core. Interrupts are received in order to switch from one process to another (sequential computation).
Parallelism: the operations run on two cores (or more).

fork() duplicates the current process, creating another independent process. End of story.
How the kernel chooses to schedule these processes is a different, very broad question. In general the kernel will try to use all available resources (cores) to run as many tasks as possible. When there are more runnable tasks than cores, it has to start making decisions about who gets to run, and for how long.

The fork function creates a separate process. It is up to the operating system how it handles different processes.
Of course, if only one core is available, the OS has no other choice but to run all processes interleaved.
If more cores are available, every sane OS will distribute the processes to the different cores, so every core runs at least one process.
However, even then, more processes can be active than there are cores. So even then, it is up to the OS to decide which processes can be run parallel (by distributing to cores) and which have to be run interleaved (on a single core).

In fact, fork() is a system call (a.k.a. system service) which creates a new process from the current process (read the return code to see who you are, the parent or the child).
In the UNIX world, processes share CPU time. It works like this:
1) a process is running
2) the clock generates an interrupt, calling the kernel and pausing the process
3) the kernel takes the list of available processes and decides to resume one (this is called scheduling)
4) go to point 1)
When there are multiple processor cores, the kernel is able to dispatch processes across them.
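For the "read the return code" part, a minimal sketch using only POSIX calls. The function name fork_and_wait is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once. The child exits immediately with child_status; the parent
 * waits for it and returns the exit status it collected, or -1 on failure. */
int fork_and_wait(int child_status)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed, no child was created */
    if (pid == 0)
        _exit(child_status);    /* return value 0: we are the child */

    /* return value > 0: we are the parent and pid is the child's PID */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Nothing in this code says which core the child runs on. Between fork() and waitpid() there are simply two runnable processes, and the kernel schedules them as described above.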

Well, you can do something. Write a complex program say, O(n^3), so that it takes a good amount of time to compute. fork() four times (if you have quad-core). Now open any graphical CPU monitor. Anything cool?
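A rough sketch of that experiment, assuming a POSIX system. cubic_work and spawn_workers are names invented here, and the work itself is meaningless busy-looping.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Deliberately slow O(n^3) loop; volatile keeps the compiler from
 * optimizing the work away. */
static unsigned long cubic_work(unsigned n)
{
    volatile unsigned long acc = 0;

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (unsigned k = 0; k < n; k++)
                acc += (unsigned long)(i ^ j ^ k);
    return acc;
}

/* Forks `workers` children that each run cubic_work(n), then waits for
 * all of them. Returns how many children exited cleanly. */
int spawn_workers(int workers, unsigned n)
{
    int started = 0;

    for (int w = 0; w < workers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* could not create more children */
        if (pid == 0) {
            cubic_work(n);      /* child: burn CPU, then leave */
            _exit(0);
        }
        started++;
    }

    int ok = 0;
    for (int w = 0; w < started; w++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Call spawn_workers(4, n) from main() with n large enough to keep the children busy for a few seconds (the right value depends on the machine) and watch the CPU monitor. On a quad-core box you would expect to see four cores loaded, though the scheduler is free to do otherwise.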

What happens when we set different processor affinity to process and its thread in linux?

What happens when we set a different processor affinity for a process and its thread in Linux?
I am trying to start a process bound to one core (say 1) that has two threads, one of which needs to run on another core (say 0).
When I set the thread's affinity to differ from the process's, the program ran, but I want to know the hidden impacts of this approach.

Threads and processes are largely the same thing. Whether you call pthread_setaffinity... or use the sched_setaffinity syscall, they both affect the current thread's affinity mask. This may be your "process" thread, or a thread you created.
However, note that a new thread created by pthread_create inherits a copy of its creator's CPU affinity mask [1].
That means that setting the affinity and creating a thread is not the same as creating a thread and setting the affinity. In the first case, both threads will compete over the same processor (which is most probably not what you want) and in the second case they will be bound to different processors.
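The inheritance can be checked with a small sketch, assuming Linux with glibc. pthread_setaffinity_np() and pthread_getaffinity_np() are the real non-portable calls. cpus_inherited_after_pinning and count_allowed_cpus are names made up for this example.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Thread body: stores how many CPUs this thread is allowed to run on. */
static void *count_allowed_cpus(void *arg)
{
    int *count = arg;
    cpu_set_t set;

    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0)
        *count = CPU_COUNT(&set);
    return NULL;
}

/* Pins the caller to one CPU, THEN creates a thread, and returns the
 * number of CPUs the new thread may use (-1 on failure). The caller's
 * original mask is restored before returning. */
int cpus_inherited_after_pinning(void)
{
    cpu_set_t original, one;
    int cpu = -1;

    if (pthread_getaffinity_np(pthread_self(), sizeof(original), &original) != 0)
        return -1;
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &original)) { cpu = i; break; }
    if (cpu < 0)
        return -1;

    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) != 0)
        return -1;

    int count = -1;
    pthread_t t;
    int rc = pthread_create(&t, NULL, count_allowed_cpus, &count);
    if (rc == 0)
        pthread_join(t, NULL);

    /* Undo the pinning so the caller is not left restricted. */
    pthread_setaffinity_np(pthread_self(), sizeof(original), &original);
    return rc == 0 ? count : -1;
}
```

The result should be 1: the new thread received a copy of the creator's single-CPU mask. To get the second case (threads bound to different processors), create the thread first and then call pthread_setaffinity_np() separately for each thread.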
Also note that while binding threads to a dedicated processor (core) may have advantages in some situations, it may just as well be a very stupid thing to do. Playing with the affinity mask means you limit the scheduler in what it can do to make your program run. If the core you bound your thread to isn't available, your thread will not run, end of story.
This is a very similar reasoning/strategy as disabling swap to make the system "faster" (some users still do that!). By doing so they usually gain nothing, all they do is limit what the memory manager can do by removing one option of providing a free page once it runs out of unused physical RAM. Usually this means something more or less valuable from the buffer cache is purged when instead some private page that wasn't used in hours could have been swapped out.
Usually people use affinity because they have this idea that the scheduler will make threads bounce between processor cores all the time and this is bad. Processor migration indeed is not cheap, but the scheduler has a mechanism which makes sure it does not happen before a certain minimum amount of time (there is a /proc thingie for that too). After a longer amount of time, all advantages of staying at the old core (TLB, cache) are usually gone anyway, so running on a different core which is readily available is actually better than waiting for a particular core to maybe, eventually become available.
NUMA architectures may be a different topic, but I'd assume (though I don't know for sure) that the scheduler is smart enough not to silently migrate a thread to a different node. In general, however, I'd recommend not to play with affinity at all.

Affinity is a common first-line approach to limiting jitter in HPC. Typically the Linux system processes and threads are constrained to a small but sufficient set of CPUs, and the application is constrained to the remainder of the CPUs.
Affinity is very useful with device drivers. Consider for example an Infiniband adapter being used by an application. The adapter will perform best if the driver thread(s) are constrained to CPUs on the same (or closest if none) NUMA node as the adapter. Linux doesn't know the application thread, so it can't even consider any affinity for performance.

How does N<->1 threading model work?

In continuation of an earlier question, this is an additional query on the N-1 threading model.
It is taught that, before designing an application, care needs to be taken in selecting the threading model.
In the N-1 threading model, a single kernel thread is available to work on behalf of each user process. The OS scheduler gives a single CPU time slice to this kernel thread.
In user space, programmer would use either POSIX pthread or Windows CreateThread() to spawn multiple threads within a user process. As the programmer used POSIX pthread or Windows CreateThread() the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. SO, that means every user thread will get a kernel thread.
My question:
So, How does N-1 threading model looks possible to exist? It would be 1-1 threading model. Please clarify.

In user space, programmer would use either POSIX pthread or Windows CreateThread() to spawn multiple threads within a user process. As the programmer used POSIX pthread or Windows CreateThread() the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. SO, that means every user thread will get a kernel thread.
That's how 1-to-1 threading works.
This doesn't have to be the case. A platform can implement pthread_create, CreateThread, or whatever other "create a thread" function it offers that does whatever it wants.
My question:
So, How does N-1 threading model looks possible to exist? It would be 1-1 threading model.
Please clarify.
Precisely as you explained in the beginning of your question -- when the programmer creates a thread, instead of creating a thread the kernel is aware of, it creates a thread that the userland scheduler is aware of, still using a single kernel thread for the entire process.
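To make the idea concrete, here is a toy N:1 sketch, assuming a system that still provides the ucontext API (glibc on Linux does). It is not how any real threading library is implemented. All names except getcontext/makecontext/swapcontext are invented for the example. Three "user threads" are multiplexed by a round-robin scheduler onto the single kernel thread that calls run_user_threads().

```c
#include <stdio.h>
#include <ucontext.h>

#define NUM_UTHREADS 3
#define STACK_SIZE   (64 * 1024)

static ucontext_t scheduler_ctx;                 /* the userland scheduler */
static ucontext_t uthread_ctx[NUM_UTHREADS];     /* one context per user thread */
static char uthread_stack[NUM_UTHREADS][STACK_SIZE];
static int finished[NUM_UTHREADS];
static int current;                              /* index of the running user thread */
static int steps_run;

/* Cooperative yield: save this user thread, resume the scheduler. */
static void uthread_yield(void)
{
    swapcontext(&uthread_ctx[current], &scheduler_ctx);
}

static void uthread_body(void)
{
    int id = current;

    for (int step = 0; step < 2; step++) {
        printf("user thread %d, step %d\n", id, step);
        steps_run++;
        uthread_yield();
    }
    finished[id] = 1;
    /* Returning activates uc_link, i.e. control goes back to the scheduler. */
}

/* Runs all user threads to completion on the calling kernel thread.
 * Returns the total number of steps executed. */
int run_user_threads(void)
{
    steps_run = 0;
    for (int i = 0; i < NUM_UTHREADS; i++) {
        finished[i] = 0;
        getcontext(&uthread_ctx[i]);
        uthread_ctx[i].uc_stack.ss_sp = uthread_stack[i];
        uthread_ctx[i].uc_stack.ss_size = STACK_SIZE;
        uthread_ctx[i].uc_link = &scheduler_ctx;
        makecontext(&uthread_ctx[i], uthread_body, 0);
    }

    int remaining = NUM_UTHREADS;
    while (remaining > 0) {
        /* Round-robin: the kernel never sees these switches. */
        for (current = 0; current < NUM_UTHREADS; current++) {
            if (finished[current])
                continue;
            swapcontext(&scheduler_ctx, &uthread_ctx[current]);
            if (finished[current])
                remaining--;
        }
    }
    return steps_run;
}
```

The kernel schedules exactly one thread here, so the three user threads can never run on two cores at once, and a blocking system call in any of them would stall all of them. That is the usual trade-off of the N-1 model.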

Short answer: there is more than Windows and Linux.
Slightly longer answer (EDITED):
Many programming languages and frameworks introduce multithreading to the programmer. At the same time, they aim to be portable, i.e., it is not known whether a given target platform supports threads at all. Here, the best way is to implement N:1 threading, either in general or at least for the backends without threading support.
The classic example is Java: the language supports multithreading, while JVMs exist even for very simple embedded platforms that do not support threads. However, there are JVMs (actually, most of them) that use kernel threads (e.g., AFAIK, the JVM by Sun/Oracle).
Another reason a language/platform may not want to hand threading control completely to the operating system is special implementation features such as reactor models or global language locks. Here, the objective is to use information about special execution patterns in the user runtime system (which does the local scheduling) that the OS scheduler has no access to.
Does [1:1 threading] add more space occupancy on User process virtual
address space because of these kernel threads?
Well, in theory, execution flow (processes, threads, etc.) and address space are independent concepts. One can find all kinds of mapping between processes (here used as a general term) and memory spaces: 1:1, n:1, 1:n, n:n. However, the classic approach of threading is that several threads of a process share the memory space of the task (that is the owner of the memory space). And thus, there is usually no difference between user threads and kernel threads regarding the memory space. (One exception is, e.g., the Erlang-VM: here, there exist user threads with isolated memory spaces).

Why does OS require/maintain kernel-land threads?

Below are three threading models that I came across.
Based on these three architectures, it is new to me that there is also something called a kernel thread, apart from the user thread introduced as part of POSIX.1c.
[Figure: 1-1 model]
[Figure: N-1 model]
[Figure: Hybrid model]
I have been through many questions on SO for kernel threads. This looks more relevant link for clarification.
At the process level, for every user process loaded by the Linux loader (say), the kernel does not allocate a corresponding kernel process to execute the machine instructions of that user process. A user process requests kernel-mode execution only when it requires a facility from the kernel [like malloc()/fork()]. The OS scheduler schedules the user process and assigns it a CPU core.
For example, User process does not require kernel execution mode to execute an instruction
a=a+2;//a is my local variable in a user level C function
My question:
1)
So, What is the purpose of kernel level thread? Why does OS need to maintain a kernel thread(additionally) for corresponding user thread of a User level process? Does User mode programmer have any control on choosing any of the above three threading models for a given User process through programming?
After i understand the answer to first question, one relevant supplementary is,
2)
Does kernel thread actually get scheduled by OS scheduler but not user thread?

I think the use of the word kernel thread is a bit misleading in these figures. I know the figures from a book about operating system (design) and if I remember correctly, they refer to the way how work is scheduled by the operating system.
In the figures, each process has at least one kernel thread assigned that is scheduled by the kernel.
The N-1 model shows multiple user-land threads that are not known to the kernel at all because the latter schedules the process (or how it's called in the figure, a single kernel thread) only. So for the kernel, each process is a kernel thread. When the process is assigned a slice of processor time, it itself runs multiple threads by scheduling them at its own discretion.
In the 1-1 model, the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. So instead of scheduling a whole process, the kernel switches between threads inside of processes.
The hybrid model combines both principles, where lightweight processes are actually threads known to the kernel and which are scheduled for execution by it. Additionally, they implement threads the kernel is not aware of and assign processor time in user-land.
And now, to confuse things completely: there are also real kernel threads in Linux. But as far as I understand the concept, these threads are used for kernel-space operations only, e.g. when kernel modules need to do things in parallel.
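On Linux with glibc, pthreads follow the 1-1 model described above, and that can be observed from user space: every pthread has its own kernel thread ID. A minimal sketch, assuming Linux; thread_has_own_kernel_id and report_tid are names made up here, and the raw gettid system call is used through syscall().

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: stores the kernel's ID for this thread. */
static void *report_tid(void *arg)
{
    pid_t *out = arg;

    *out = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if a newly created pthread got a kernel thread ID different
 * from the caller's, 0 if not, -1 if the thread could not be created. */
int thread_has_own_kernel_id(void)
{
    pid_t caller_tid = (pid_t)syscall(SYS_gettid);
    pid_t new_tid = 0;
    pthread_t t;

    if (pthread_create(&t, NULL, report_tid, &new_tid) != 0)
        return -1;
    pthread_join(t, NULL);
    return new_tid > 0 && new_tid != caller_tid;
}
```

Both threads still report the same getpid(), since they belong to one process, but the scheduler works with the per-thread IDs. In an N-1 implementation the same test would see a single kernel ID no matter how many user threads exist.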

So, What is the purpose of kernel level thread?
To provide a vehicle for assignment of a set of resources provided by the OS. The set always includes CPU code execution on a core. Others may include disk, NIC, KB, mouse, timers, as may be requested by syscalls from the thread. The kernel manages access to those resources as they become available and arbitrates between resource conflicts, e.g. a request for KB input when none is available will remove CPU execution from the thread until KB input becomes available.
Why do we need a kernel thread(additionally) for corresponding user
thread of a User level process?
Without a kernel-level thread, the user thread would not be able to obtain execution - it would be dead code/stack. Note that with Linux, the concept of threads/processes can get somewhat muddied, but nevertheless, the fundamental unit of execution is a thread. A process is a higher-level construct whose code must be run by at least one thread, (eg. the one raised by the OS loader to run code at the process entry point when it is first loaded).
Does User mode programmer have any control on choosing any of the
above three threading models for a given User process through
programming?
No, not without a syscall, which means leaving user mode.
Does kernel thread actually get scheduled by OS scheduler but not user
thread
Yes - it is the only thing that gets to be given execution when it can use it, have execution removed when it cannot, and be subject to preemptive removal of CPU if the OS scheduler requires it for something else.