TASK_UNINTERRUPTIBLE and process threads in Linux kernel development using C

I have a running process which has created multiple user mode threads. If the kernel changes the state of the process to TASK_UNINTERRUPTIBLE (or TASK_INTERRUPTIBLE) do the threads created by the process automatically get suspended?
This is not a homework question. I'm reading an operating systems book which describes how a semaphore is implemented. In their implementation the semaphore struct maintains a linked list of processes currently waiting for the semaphore. From what I've learned so far, such a semaphore could only be used to synchronize processes, not threads. Correct? The threads in the linked list are put into a TASK_INTERRUPTIBLE state until the semaphore is available, at which point one process is woken up by setting its state to TASK_RUNNING.

In Linux each thread is a separate task running within a process scope. See /proc/self/task/. They are even created with the same kernel function as a new process. Threads in Linux originated as "lightweight processes".
Each task has a unique task id (tid), similar to the process id (pid) and indeed the master thread (the one executing main()) has the same tid as the process pid.
The only functional difference in Linux between threads and processes is that all threads (tasks) of a process share its resources, apart from a few per-task items:
scheduling parameters and state (which includes TASK_UNINTERRUPTIBLE and TASK_INTERRUPTIBLE)
stack
task id
The main() thread is the one that identifies the process.
So TASK_INTERRUPTIBLE can be applied to each thread individually.
As such, semaphores are perfectly valid to use for synchronising threads. In this case, if one thread blocks on a semaphore, it's just that one thread that is suspended.
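The point that only the blocked thread sleeps can be illustrated with a small user-space sketch. It uses a POSIX semaphore (sem_t) rather than the kernel-internal semaphore from the book, and the names gate, waiter and semaphore_blocks_only_waiter are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <time.h>

static sem_t gate;             /* initial value 0, so sem_wait() blocks */
static atomic_int waiter_woke; /* set once the waiter gets past sem_wait() */

static void *waiter(void *arg)
{
    (void)arg;
    sem_wait(&gate);           /* only this thread (task) sleeps here */
    atomic_store(&waiter_woke, 1);
    return NULL;
}

/* Returns 1 if the caller kept running while the waiter was blocked
 * and the waiter resumed after sem_post(); -1 on setup failure. */
int semaphore_blocks_only_waiter(void)
{
    pthread_t t;
    struct timespec pause_100ms = { 0, 100 * 1000 * 1000 };
    int caller_ran_while_blocked;

    atomic_store(&waiter_woke, 0);
    if (sem_init(&gate, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, waiter, NULL) != 0) {
        sem_destroy(&gate);
        return -1;
    }

    nanosleep(&pause_100ms, NULL);   /* the caller is still schedulable */
    caller_ran_while_blocked = (atomic_load(&waiter_woke) == 0);

    sem_post(&gate);                 /* wake the single blocked thread */
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    sem_destroy(&gate);
    return caller_ran_while_blocked && atomic_load(&waiter_woke) == 1;
}
```

The calling thread sleeps, wakes up and posts the semaphore while the waiter is still blocked, which would not be possible if blocking one thread suspended the whole process.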

Related

What happens when multi thread program creates new processes?

I am a bit confused and would like to know in detail what happens if a C program with more than one thread creates new processes. Does the behaviour depend on which thread creates the new processes, or on how many threads create them?
With pthreads, only the calling thread is forked in the new process when fork is called.
From the Linux man page:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent
is replicated in the child, including the states of mutexes,
condition variables, and other pthreads objects; the use of
pthread_atfork(3) may be helpful for dealing with problems that this
can cause.
There are however some versions of fork on Solaris that duplicate all threads.
From the Solaris man page:
A call to forkall() or forkallx() replicates in the child process
all of the threads (see
thr_create(3C) and pthread_create(3C)) in the parent process. A call to fork1() or forkx()
replicates only the calling thread in the child process.
A call to fork() is identical to a call to fork1(); only the calling
thread is replicated in the child process. This is the POSIX-specified
behavior for fork().
In releases of Solaris prior to Solaris 10, the behavior of fork()
depended on whether or not the application was linked with the
POSIX threads library. When linked with -lthread (Solaris Threads)
but not linked with -lpthread (POSIX Threads), fork() was the same
as forkall(). When linked with -lpthread, whether or not also
linked with -lthread, fork() was the same as fork1().
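The Linux behaviour quoted from the man page can be checked with a rough sketch like the one below. It is Linux-specific: it assumes /proc is mounted and reads the "Threads:" field of /proc/self/status. The helper names count_own_threads, idle and thread_counts_across_fork are invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static sem_t release_idle;   /* keeps the extra thread alive across fork() */

/* Reads the "Threads:" field of /proc/self/status (Linux-specific). */
static int count_own_threads(void)
{
    char buf[4096];
    int fd = open("/proc/self/status", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    const char *field = strstr(buf, "Threads:");
    return field ? atoi(field + strlen("Threads:")) : -1;
}

static void *idle(void *arg)
{
    (void)arg;
    sem_wait(&release_idle);   /* sleep until the parent is done */
    return NULL;
}

/* Stores the thread count seen by the parent and by the forked child.
 * Returns 0 on success, -1 on failure. */
int thread_counts_across_fork(int *in_parent, int *in_child)
{
    pthread_t t;
    int status, rc = -1;

    if (sem_init(&release_idle, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, idle, NULL) != 0)
        return -1;
    *in_parent = count_own_threads();        /* caller plus the idle thread */

    pid_t pid = fork();
    if (pid == 0)
        _exit(count_own_threads() & 0xff);   /* child reports via exit status */

    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        *in_child = WEXITSTATUS(status);
        rc = 0;
    }
    sem_post(&release_idle);                 /* let the idle thread finish */
    int joined = pthread_join(t, NULL);
    assert(joined == 0);
    (void)joined;
    sem_destroy(&release_idle);
    return rc;
}
```

When called from an otherwise single-threaded program, the parent should see two threads and the child only one, the thread that called fork().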

Relation between Thread ID and Process ID

I am having some confusion between process ID and thread ID. I have gone through several web posts, including one on Stack Overflow, which says:
starting a new process gives you a new PID and a new TGID, while starting a new thread gives you a new PID while maintaining the same TGID.
So when I run a program, why don't all the threads created by the program have different PIDs?
I know that in programming we usually say that main is a thread and execution starts from main, so if I create multiple threads from main, all the threads will have the same PID, which is equal to main's PID.
So what I wanted to ask is as below:
1) When we run a program it will run as a process or a thread?
2) Is there any difference between main thread creating threads and Process creating threads?
3) Is there any difference between thread and process in linux? Since I read somewhere that linux doesn't differentiate between Thread and Process.
Simplifying a bit:
1) The PID is the process ID and the TID is the thread ID. For the first thread of a process (the one created by fork()), PID = TID. Additional threads created within the process with clone() get their own TID, which differs from the PID. Because IDs are handed out in increasing order, the TID is usually larger than the PID, but this is not guaranteed once the ID counter wraps around.
2) No, there is no difference, except that when the main thread returns from main() the whole process exits and all other threads are terminated with it.
3) Yes, the thread is what actually gets scheduled. Technically, the process is only a memory mapping of the different segments of code (text, bss, stack, heap and the OS).
This confusion comes from the Linux concept of tasks.
In Linux there is little difference between a task and a thread though.
Every process is a self contained VM running at least one task.
Each task is an independent execution unit within a process scope.
The main task of a process gives its task id (TID) to the process as its process id (PID).
Every new thread that you spawn within a process creates a new task within it. In order to identify them individually in the kernel, they get assigned their own individual task id (TID).
All tasks within a process share the same thread group id (TGID), which is the value getpid() returns.
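A minimal sketch of how this looks from user space on Linux with NPTL: getpid() reports the shared process id, while the gettid system call reports the per-task id. The struct ids type and the record_ids and compare_ids helpers are illustrative names, and the raw syscall(SYS_gettid) form is used because older glibc versions do not provide a gettid() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;   /* process id (TGID), shared by every thread */
    pid_t tid;   /* task id, unique per thread */
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Fills in the ids seen by the calling thread and by a newly created thread.
 * Returns 0 on success, -1 if the thread could not be created. */
int compare_ids(struct ids *caller_ids, struct ids *thread_ids)
{
    pthread_t t;

    record_ids(caller_ids);
    if (pthread_create(&t, NULL, record_ids, thread_ids) != 0)
        return -1;
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return 0;
}
```

Both structures end up with the same pid but different tid values; when the caller is the main thread, its tid also equals the pid.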
I found the answer here on Stack Overflow. It states that if the program runs on a Linux system whose libc is libuClibc-0.9.30.1.so (1), which is an older libc, each thread created has a different PID, as shown below:
root@OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153
I then ran the same program on a Linux system that uses the libc from Ubuntu, libc6 (2), which is a newer libc. There, each thread created has the same PID as the process:
$ ./test
main thread pid is 2609
child thread pid is 2609
The libc (1) uses the linuxthreads implementation of pthreads.
The libc (2) uses the NPTL ("Native POSIX Thread Library") implementation of pthreads.
According to the linuxthreads FAQ (in J.3 answer):
each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread
So in the old libc, which uses the linuxthreads implementation, each thread has its own distinct PID.
In the new libc, which uses the NPTL implementation, all threads have the same PID as the main process.
NPTL was developed by the Red Hat team. According to the Red Hat NPTL document, one of the problems solved by the NPTL implementation is:
(Chapter: Problems with the Existing Implementation, page5)
Each thread having a different process ID causes compatibility problems with other POSIX thread implementations. This is in part a moot point since signals can't be used very well but is still noticeable
And that explains this issue.
I am using the new libc version that contains the NPTL ("Native posix thread library") implementation of pthread.
The post you have shown describes the older Linux threading implementation (linuxthreads), I suppose, in which each thread was created as a separate process.
In the NPTL implementation, threads are not created as separate processes. Instead, they are parallel streams of execution of the same code within one process; the state that differs between them is kept in a per-thread descriptor identified by the TID.
A process that creates multiple threads is called a multi-threaded process: all of its threads have the same PID but different TIDs. The original thread of the process is called the main thread.
You get the same process ID from every thread because all threads share the program's data, which belongs to the one process, so asking for the process ID returns the same value everywhere.

Does message queue support Multi-thread?

I have 3 questions about thread and process communication.
Can the Linux functions msgget(), msgsnd(), and msgrcv() be invoked by multiple threads in one process? These functions, called from different threads, would all access (read/write) one process's message queue. Are all race conditions supposed to be taken care of by the system? If not, is there any good method to let threads send a message to the main thread (process)?
Can semop() function be used to synchronize threads in one process?
There is a shared memory region which has the following entities accessing it:
process
several threads in one process.
Do I have to use semaphore of inter-process level and a semaphore of threads level at the same time? Any simple way to handle this?
A lot of questions. :) Thanks.
Can the Linux function msgget(), msgsnd(), and msgrcv() be invoked by multiple threads in one process?
Yes. You do not need to worry about race conditions with these calls: the kernel serialises access to the queue, so each msgsnd() and msgrcv() call is atomic with respect to the others.
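As a hedged illustration, the sketch below has several threads call msgsnd() on the same System V queue with no extra locking, and the calling thread then drains the queue with msgrcv(). The struct note layout, the SENDERS and PER_SENDER constants and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#define SENDERS     4
#define PER_SENDER 25

struct note {
    long mtype;   /* required first member, must be > 0 */
    int  value;   /* payload */
};

static int queue_id;

static void *sender(void *arg)
{
    struct note n = { .mtype = 1, .value = 1 };
    (void)arg;
    for (int i = 0; i < PER_SENDER; i++)
        if (msgsnd(queue_id, &n, sizeof n.value, 0) != 0)
            break;   /* give up on error; the total will come up short */
    return NULL;
}

/* Returns the number of messages received, or -1 if no queue could be created. */
int messages_received_from_threads(void)
{
    pthread_t threads[SENDERS];
    struct note n;
    int received = 0;

    queue_id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (queue_id < 0)
        return -1;

    for (int i = 0; i < SENDERS; i++) {
        int rc = pthread_create(&threads[i], NULL, sender, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SENDERS; i++)
        pthread_join(threads[i], NULL);

    /* Drain without blocking: each message arrives whole, none is lost. */
    while (msgrcv(queue_id, &n, sizeof n.value, 0, IPC_NOWAIT) >= 0)
        received++;

    msgctl(queue_id, IPC_RMID, NULL);   /* remove the queue again */
    return received;
}
```

The receiver should see exactly SENDERS * PER_SENDER messages even though the senders ran concurrently without a mutex.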
can semop() function be used to synchronize threads in one process?
Yes, read more in the documentation
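A rough sketch of using semop() as a lock between threads of one process follows. It assumes Linux, where the program has to define union semun itself; the WORKERS and INCREMENTS constants and the helper names are hypothetical. A pthread mutex would normally be the cheaper choice for threads, so this only shows that the System V calls do work.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

#define WORKERS       4
#define INCREMENTS 5000

/* On Linux the caller has to define this union for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

static int sem_id;
static long shared_counter;

static void sem_adjust(short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
    while (semop(sem_id, &op, 1) == -1 && errno == EINTR)
        ;   /* retry if interrupted by a signal */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        sem_adjust(-1);      /* acquire: blocks only this thread */
        shared_counter++;
        sem_adjust(+1);      /* release */
    }
    return NULL;
}

/* Returns the final counter value, or -1 if the semaphore could not be set up. */
long count_with_semop(void)
{
    pthread_t threads[WORKERS];
    union semun init = { .val = 1 };   /* binary semaphore, initially free */

    shared_counter = 0;
    sem_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (sem_id < 0)
        return -1;
    if (semctl(sem_id, 0, SETVAL, init) != 0) {
        semctl(sem_id, 0, IPC_RMID);
        return -1;
    }

    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(threads[i], NULL);

    semctl(sem_id, 0, IPC_RMID);       /* remove the semaphore set */
    return shared_counter;
}
```

With the semaphore held around every increment, the final count equals WORKERS * INCREMENTS.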
Do I have to use semaphore of inter-process level and a semaphore of threads level?
Any resource that is shared among threads or processes is subject to race conditions when more than one of them tries to access it at the same time, so you need to synchronize access to such a shared resource. An inter-process semaphore also works between the threads of one process, so a single inter-process semaphore is sufficient.

Zero Threaded Process?

Does a process have to have at least one thread in it? Is it possible for a process to be void of any threads, or does this not make sense?
A process usually has at least one thread. Wikipedia has the definition:
a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.
The MSDN backs this up:
A processor executes threads, not processes, so each application has at least one process, and a process always has at least one thread of execution, known as the primary thread.
Though it does go on to say:
A process can have zero or more single-threaded apartments and zero or one multithreaded apartment.
Which implies that the number of single-threaded apartments and the number of multithreaded apartments could both be zero. However, the process wouldn't do much :)
In Unix-like operating systems, it's possible to have a zombie process, where an entry still exists in the process table even though there are no (longer) any threads.
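The zombie case can be observed on Linux with a sketch along these lines. It assumes /proc is mounted and that the state letter follows the closing parenthesis in /proc/<pid>/stat; process_state and state_of_exited_unreaped_child are names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the state letter from /proc/<pid>/stat, or '?' on error. */
static char process_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format: "pid (comm) S ..."; the state follows the last ')'. */
    char *close_paren = strrchr(buf, ')');
    if (!close_paren || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}

/* Forks a child that exits at once and reports its state before it is reaped. */
char state_of_exited_unreaped_child(void)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    char state = '?';

    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);                      /* child: nothing left to execute */

    /* Poll for up to about two seconds until the child has exited. */
    for (int i = 0; i < 200 && state != 'Z'; i++) {
        state = process_state(pid);
        if (state != 'Z')
            nanosleep(&tick, NULL);
    }

    pid_t reaped = waitpid(pid, NULL, 0);   /* removes the table entry */
    assert(reaped == pid);
    (void)reaped;
    return state;
}
```

Until waitpid() is called, the child still has a process table entry in state Z even though it no longer executes anything.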
You can choose not to use an explicit threading library, or you can use an operating system that has no concept of threads (and so doesn't call it a thread), but for most modern programming all programs have at least one thread of execution (generally referred to as a main thread or UI thread or similar). If that exits, so does the process.
Thought experiment: what would a process with zero threads of execution do?
In theory, I don't see why not. But it would be impossible with the popular operating systems.
A process typically consists of a few different parts:
Threads
Memory space
File descriptors
Environment (root directory, current directory, etc.)
Privileges (UID, etc.)
Et cetera
In theory, a process could exist with no threads as an RPC server. Other processes would make RPC calls which spawn threads in the server process, and then the threads disappear when the function returns. I don't know of any operating systems that work this way.
On most OSs, the process exits either when the last thread exits, or when the main thread exits.
Note: This ignores the "useless" cases such as zombie processes, which have no threads but don't do anything.
"main" itself is thread. Its a thread that gets executed. So, every process runs on at least one thread.

backgrounding a threaded application with fork()

So I have an application which uses threads. Now when the program first starts up, I want it to go through setting up database connections and whatnot before it backgrounds itself so that whatever/whoever starts the program can know if there was an error starting up.
I did some looking around and have found some resources that say 'do not mix fork and threads', while others say that forking in linux will only duplicate the main thread and leave the others alone.
In the case of the latter (where it just duplicates the main thread), how then do the threads access file level (global) variables? Will the threads not be able to access the variables that are now in the forked process's address space?
Ultimately the goal is to have the application background itself after threads have been created. If this is not possible, I can put the fork before the thread creation, just would like to do it as late as possible.
Note: at the time of the fork, the threads will be doing a sleep() loop until the main thread puts data into a shared variable for them to process. So if the sleep gets interrupted, they won't be harmed.
There is no way to duplicate threads as part of the fork, and the parent's threads will all terminate when the parent exits, so even if they could access the child's memory, it wouldn't help you. You need to either create your threads after forking, or use pthread_atfork to register handlers that will recreate them in the child process. I would recommend just waiting until after forking to create your threads since it's a lot simpler and more efficient.
Why is it that you want to delay forking as long as possible? If you want to maintain connection to a terminal or something until initialization is finished, you can just have the parent process wait to terminate until the child process (with its threads) is done initializing and ready to be "in the background". Various synchronization tools could be used to accomplish this. One simple one would be opening a pipe through which the child sends its output back to the parent to display; the parent could simply exit when it receives EOF on this pipe.
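The pipe approach described above might look roughly like the sketch below. The function name read_startup_report and the "ready" message are invented for the example, the child's real initialisation is only indicated by a comment, and the final waitpid() is there only so the demonstration cleans up after itself.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that reports its startup over a pipe. The parent copies the
 * report into out (NUL-terminated) and returns once it sees EOF.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t read_startup_report(char *out, size_t cap)
{
    int fds[2];
    assert(cap > 0);

    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the future background process. */
        const char *msg = "ready\n";
        close(fds[0]);
        /* ... hypothetical initialisation, then thread creation ... */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);   /* closing the last write end gives the parent EOF */
        /* A real daemon would keep running here; the demo just exits. */
        _exit(0);
    }

    /* Parent: close its own write end, otherwise EOF never arrives. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < cap - 1 &&
           (n = read(fds[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    waitpid(pid, NULL, 0);   /* demo only: a real parent would simply exit */
    return (ssize_t)used;
}
```

Because the threads are created in the child after the fork, none of them is lost, and whoever started the program still sees startup errors before the parent exits.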
Forking a process creates two different processes and threads in one process will not be able to access memory in the second process. If you want different processes to access the same memory, you want something called shared memory.
When a thread in a process calls fork(), a new process is created by copying, among other things, (1) the full address space of the process and (2) the (one) thread that called fork. If there are other threads in the process, they don't get copied. This will almost certainly lead to bugs in your program. Hence the advice not to mix threads and forks.
If you want to create a background process with many threads, you must fork it before spawning any other thread. Then, the two processes behave normally, like any two isolated processes: threads within one process share the same memory, but your background threads and your foreground process won't share any memory (by default).