I am having some confusion between process ID and thread ID. I have gone through several web posts, including one on Stack Overflow, which says:
starting a new process gives you a new PID and a new TGID, while starting a new thread gives you a new PID while maintaining the same TGID.
So when I run a program, why don't all the threads created by the program have different PIDs?
I know that in programming we usually say that main is a thread and execution starts from main. So if I create multiple threads from main, all the threads will have the same PID, which is equal to main's PID.
So what I wanted to ask is as below:
1) When we run a program it will run as a process or a thread?
2) Is there any difference between main thread creating threads and Process creating threads?
3) Is there any difference between a thread and a process in Linux? I read somewhere that Linux doesn't differentiate between threads and processes.
Simplifying a bit:
The PID is the process ID and the TID is the thread ID. For the first thread of a process, the one created by fork(), PID = TID. If you create more threads within the process with clone(), each gets its own TID that differs from the PID. The PID is usually smaller than those TIDs because IDs are handed out in increasing order, but that is not guaranteed once the ID counter wraps around.
No, there is no difference, except that if main() returns, the whole process exits and all other threads are killed with it.
Yes, the thread is what actually gets scheduled. Technically, the process is only a memory mapping of the different segments of code (text, bss, stack, heap and the OS).
This confusion comes from the Linux concept of tasks.
In Linux there is little difference between a task and a thread though.
Every process is a self-contained VM running at least one task.
Each task is an independent execution unit within a process scope.
The main task of a process gives its task ID (TID) to the process as its process ID (PID).
Every new thread that you spawn within a process creates a new task within it. In order to identify them individually in the kernel, they get assigned their own individual task ID (TID).
All tasks within a process share the same thread group ID (TGID).
I got the answer here on Stack Overflow. It states that if we run a program on a Linux system that uses the libc libuClibc-0.9.30.1.so (1), basically an older version of libc, then a created thread will have a different PID, as shown below:
root#OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153
When I ran the same program on a Linux system that uses the libc from Ubuntu, libc6 (2), i.e. a newer version of libc, the created thread had the same PID as the process:
$ ./test
main thread pid is 2609
child thread pid is 2609
The libc (1) uses the linuxthreads implementation of pthread.
And the libc (2) uses the NPTL ("Native POSIX Thread Library") implementation of pthread.
According to the linuxthreads FAQ (in J.3 answer):
each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread
So in the old libc, which uses the linuxthreads implementation, each thread has its own distinct PID.
In the new libc version, which uses the NPTL implementation, all threads have the same PID as the main process.
NPTL was developed by a Red Hat team. According to the Red Hat NPTL document, one of the problems solved by the NPTL implementation is:
(Chapter: Problems with the Existing Implementation, page 5)
Each thread having a different process ID causes compatibility problems with other POSIX thread implementations. This is in part a moot point since signals can't be used very well but is still noticeable
And that explains this issue.
I am using the new libc version that contains the NPTL ("Native POSIX Thread Library") implementation of pthread.
The post you linked describes the LinuxThreads implementation, which I suppose is the older Linux implementation, where each thread was created as a separate process.
In the POSIX model, threads are not created as separate processes. They are separate streams of execution within the same process; the state that differs between them is kept in a thread descriptor, which also holds the TID.
A process that creates multiple threads is called a multi-threaded process: all of its threads have the same PID but different TIDs. The thread that starts the others can be called the main thread.
You get the same process ID because all threads share your program's data, which belongs to one process, so asking for the process ID from any thread returns the same value.
Related
I am a bit confused. I would like to know in detail what happens if a C program with more than one thread creates new processes. Does the behaviour depend on which thread creates the new processes, or on how many threads create them?
With pthreads, only the calling thread is forked in the new process when fork is called.
From the Linux man page:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent
is replicated in the child, including the states of mutexes,
condition variables, and other pthreads objects; the use of
pthread_atfork(3) may be helpful for dealing with problems that this
can cause.
There are however some versions of fork on Solaris that duplicate all threads.
From the Solaris man page:
A call to forkall() or forkallx() replicates in the child process
all of the threads (see thr_create(3C) and pthread_create(3C)) in
the parent process. A call to fork1() or forkx() replicates only
the calling thread in the child process.
A call to fork() is identical to a call to fork1(); only the calling
thread is replicated in the child process. This is the POSIX-specified
behavior for fork().
In releases of Solaris prior to Solaris 10, the behavior of fork()
depended on whether or not the application was linked with the
POSIX threads library. When linked with -lthread (Solaris Threads)
but not linked with -lpthread (POSIX Threads), fork() was the same
as forkall(). When linked with -lpthread, whether or not also
linked with -lthread, fork() was the same as fork1().
What is the recycle strategy for Linux thread IDs?
A Linux process ID is not reused immediately; reuse happens only after PID allocation reaches the maximum and wraps around.
When I use pthread_self() to get the thread ID, I get TIDs like 1028 and 1034. I guess it is the internal "serial number" of threads in a process. So I guess it would make sense for thread IDs to follow a recycle strategy like the one used for PIDs.
But I am not quite sure whether that is true of the Linux pthread implementation.
A threaded Linux process has
an OS pid shared by all threads within the process - use getpid
each thread within the process has its own OS thread id - use gettid
a pthreads thread id used internally by pthreads to identify threads when making various pthread related calls - use pthread_self and similar.
It can't be determined from your question whether you are trying to implement a "recycle strategy", or why you think you need to do so.
Edit
As an idle curiosity you can look through the Linux pthread code, but technically you have no reason to care. The POSIX spec basically just says the thread ID is guaranteed to be unique within a process and is free to be reused after a thread dies.
Although implementations may have thread IDs that are unique in a system, applications should only assume that thread IDs are usable and unique within a single process. The effect of calling any of the functions defined in this volume of IEEE Std 1003.1-2001 and passing as an argument the thread ID of a thread from another process is unspecified. A conforming implementation is free to reuse a thread ID after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.
In Linux, threads are implemented as processes (with shared memory and other resources),
so the kernel thread IDs (the ones you get through gettid()) are really process IDs.
This is also indicated by the fact that the ID of the first thread of a process and that process's ID are one and the same.
Now, I don't know exactly what PID allocation algorithm the Linux kernel uses, but I believe it makes some effort to avoid rapid PID reuse (I think I have read about that somewhere but can't remember where).
Note that those are the kernel thread IDs (returned by the syscall gettid()), which are a different thing from the pthread_t values returned by the library function pthread_self().
While both can be used to uniquely identify threads, the former is Linux-specific, so if your code needs to be portable you had better avoid it (or guard it with #ifdefs).
I have a running process which has created multiple user mode threads. If the kernel changes the state of the process to TASK_UNINTERRUPTIBLE (or TASK_INTERRUPTIBLE) do the threads created by the process automatically get suspended?
This is not a homework question. I'm reading an operating systems book which describes how a semaphore is implemented. In their implementation the semaphore struct maintains a linked list of processes currently waiting for the semaphore. From what I've learned so far, such a semaphore could only be used to synchronize processes, not threads. Correct? The threads in the linked list are put into a TASK_INTERRUPTIBLE state until the semaphore is available, at which point one process is woken up by setting its state to TASK_RUNNING.
In Linux each thread is a separate task running within a process scope. See /proc/self/task/. They are even created with the same kernel function as a new process. Threads in Linux originated as "lightweight processes".
Each task has a unique task id (tid), similar to the process id (pid) and indeed the master thread (the one executing main()) has the same tid as the process pid.
The only functional difference in Linux between threads and processes is that all threads (tasks) share all process resources apart from
scheduling parameters (includes TASK_UNINTERRUPTIBLE, TASK_INTERRUPTIBLE)
stack
task id
the main() thread identifies the process
So TASK_INTERRUPTIBLE can be applied to each thread individually.
As such, semaphores are perfectly valid to use for synchronising threads. In this case, if one thread blocks on a semaphore, it's just that one thread.
I have three questions which are causing me a lot of doubts:
If one thread in a program calls fork(), does the new process
duplicate all threads, or is the new process single-threaded?
If a thread invokes exec(), will the program specified in the parameter
to exec() replace the entire process including ALL the threads?
Are system calls preemptive? For example whether a process can be scheduled in middle of a system call?
For exec, from man execve:
All threads other than the calling thread are destroyed during an execve().
From man fork:
The child process is created with a single thread — the one that called fork().
W.r.t. #3: Yes, you can invoke a system call that directly or indirectly makes another thread ready to run. And if that thread has a greater priority than the current and the system is designed to schedule it right then, it can do so.
Does a process have to have at least one thread in it? Is it possible for a process to be void of any threads, or does this not make sense?
A process usually has at least one thread. Wikipedia has the definition:
a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.
The MSDN backs this up:
A processor executes threads, not processes, so each application has at least one process, and a process always has at least one thread of execution, known as the primary thread.
Though it does go on to say:
A process can have zero or more single-threaded apartments and zero or one multithreaded apartment.
Which implies that both the number of single-threaded apartments and the number of multithreaded apartments could be zero. However, the process wouldn't do much :)
In Unix-like operating systems, it's possible to have a zombie process, where an entry still exists in the process table even though there are no (longer) any threads.
You can choose not to use an explicit threading library, or an operating system that has no concept of threads (and so doesn't call it a thread), but for most modern programming all programs have at least one thread of execution (generally referred to as a main thread or UI thread or similar). If that exits, so does the process.
Thought experiment: what would a process with zero threads of execution do?
In theory, I don't see why not. But it would be impossible with the popular operating systems.
A process typically consists of a few different parts:
Threads
Memory space
File descriptors
Environment (root directory, current directory, etc.)
Privileges (UID, etc.)
Et cetera
In theory, a process could exist with no threads as an RPC server. Other processes would make RPC calls which spawn threads in the server process, and then the threads disappear when the function returns. I don't know of any operating systems that work this way.
On most OSs, the process exits either when the last thread exits, or when the main thread exits.
Note: This ignores the "useless" cases such as zombie processes, which have no threads but don't do anything.
"main" itself is a thread. It's a thread that gets executed. So, every process runs on at least one thread.