Linux thread id recycle strategy - c

What is the recycling strategy for Linux thread IDs?
A Linux process ID is not reused immediately; it is reused only after new PIDs reach the maximum limit and the counter wraps around.
When I use pthread_self() to get the thread ID, I get TIDs like 1028 and 1034. I guess this is the internal "serial number" of the threads in a process, so I guess it would be more appropriate for thread IDs to follow a recycling strategy like the one used for PIDs.
But I am not sure whether this is true of the Linux pthread implementation.

A threaded Linux process has:
an OS PID shared by all threads within the process (use getpid);
an OS thread ID for each thread within the process (use gettid);
a pthreads thread ID, used internally by pthreads to identify threads when making various pthread-related calls (use pthread_self and similar).
It can't be determined from your question whether you are trying to implement a "recycle strategy", or why you think you need to do so.
Edit
As an idle curiosity you can look through the linux pthread code but technically you have no reason to care. The POSIX spec basically just says the thread id is guaranteed to be unique within a process and is free to be reused after a thread dies.
Although implementations may have thread IDs that are unique in a system, applications should only assume that thread IDs are usable and unique within a single process. The effect of calling any of the functions defined in this volume of IEEE Std 1003.1-2001 and passing as an argument the thread ID of a thread from another process is unspecified. A conforming implementation is free to reuse a thread ID after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.

In Linux, threads are implemented as processes (with shared memory and other resources),
so the kernel thread IDs (the ones you get through gettid()) are really process IDs.
This is also indicated by the fact that the ID of the first thread of a process and that process's ID are one and the same.
I don't know exactly which PID-allocation algorithm the Linux kernel employs, but I believe it makes some effort to avoid rapid PID reuse (I think I have read about that somewhere but can't remember where).
Note that those are the kernel thread IDs (returned by the gettid() syscall), which are a different thing from the pthread_t (returned by the library function pthread_self()).
While both can be used to uniquely identify threads, the former is Linux-specific, so if your code needs to be portable you had better avoid it (or use #ifdefs).

Related

Create a user level thread or kernel level thread using `pthread_create`?

Question: How can one create a user-level thread or a kernel-level thread using pthread_create?
Note: I checked the documentation of pthread_create in this link and didn't find any parameter that tells the OS to create either a user-level thread or a kernel-level thread. So if there is no such parameter, is a thread created with pthread_create user-level or kernel-level by default?
Any information or hint would be great.
Thanks.
pthread_create simply creates a thread. Not "a kernel-level thread" or "a user-level thread". The latter are descriptions you could use talking about implementation of threads, but as far as POSIX threads are concerned, there is no practical way to implement threads without each thread having some corresponding scheduling/state object belonging to the kernel. This is because each thread has independent signal mask, pending signals, etc. and can be independently blocked in various operations that allow other threads to make forward progress while they are blocked. So in some sense, you could say pthread_create creates "kernel level threads". That's certainly the mechanism in all major real-world implementations.

Relation between Thread ID and Process ID

I am confused about the difference between process ID and thread ID. I have gone through several web posts, including ones here on Stack Overflow, which say:
starting a new process gives you a new PID and a new TGID, while starting a new thread gives you a new PID while maintaining the same TGID.
So when I run a program, why don't all the threads created by the program have different PIDs?
I know that in programming we usually say that main is a thread and execution starts from main, so if I create multiple threads from main, all the threads will have the same PID, which is equal to main's PID.
So what I wanted to ask is:
1) When we run a program, does it run as a process or as a thread?
2) Is there any difference between the main thread creating threads and a process creating threads?
3) Is there any difference between a thread and a process in Linux? I read somewhere that Linux doesn't differentiate between threads and processes.
Simplifying a bit:
1) The PID is the process ID and the TID is the thread ID. For the first thread, the one created by fork(), PID = TID. If you create more threads within the process, with the clone() system call, they share the PID but each gets its own TID. The PID is usually smaller than those TIDs, though this is not guaranteed, because IDs wrap around once the maximum is reached.
2) No, there is no difference, except that if main returns (which ends the process) or the process is killed, all other threads are terminated as well.
3) Yes, the thread is what actually gets scheduled. Technically, the process is only a memory mapping of the different segments (text, bss, stack, heap and the OS).
This confusion comes from the Linux concept of tasks.
In Linux there is little difference between a task and a thread, though.
Every process is a self-contained VM running at least one task.
Each task is an independent execution unit within a process scope.
The main task of a process gives its task ID (TID) to the process as its process ID (PID).
Every new thread that you spawn within a process creates a new task within it. In order to identify them individually in the kernel, they get assigned their own individual task ID (TID).
All tasks within a process share the same thread group ID (TGID).
I got the answer here on Stack Overflow. It states that if we run a program on a Linux system whose libc is libuClibc-0.9.30.1.so (1), basically an older version of libc, then a created thread will have a different PID, as shown below:
root@OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153
I also tried to run this program on a Linux system with the libc from Ubuntu, libc6 (2), i.e. a newer version of libc. There, a created thread has the same PID as the process:
$ ./test
main thread pid is 2609
child thread pid is 2609
libc (1) uses the LinuxThreads implementation of pthreads,
and libc (2) uses the NPTL ("Native POSIX Thread Library") implementation of pthreads.
According to the LinuxThreads FAQ (in the answer to J.3):
each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread
So in the old libc, which uses the LinuxThreads implementation, each thread has its own distinct PID.
In the new libc version, which uses the NPTL implementation, all threads have the same PID as the main process.
NPTL was developed by a Red Hat team. According to the Red Hat NPTL document, one of the problems solved by the NPTL implementation is
(chapter "Problems with the Existing Implementation", page 5):
Each thread having a different process ID causes compatibility problems with other POSIX thread implementations. This is in part a moot point since signals can't be used very well but is still noticeable
And that explains this issue.
I am using the new libc version that contains the NPTL ("Native POSIX Thread Library") implementation of pthreads.
The post you have shown describes a Linux threading implementation which, I suppose, is the older one, in which each thread was created as a separate process.
In the POSIX implementation of threads, threads are not created as separate processes; rather, they are separate streams of parallel execution of the code. The components that differ between those streams are recorded in a thread descriptor, which stores the TID.
A process that creates multiple threads can be called a multi-threaded process; all of its threads have the same PID but different TIDs. The initial thread of the process, the one that creates the others, can be called the main thread.
You get the same process ID because all threads share your program's data, which is your process, so when you ask for the process ID you get the same value.

Group together two or more threads

I have a multi-threaded application in which each thread has a helper thread that helps the first one accomplish a task. When a thread terminates (most likely by calling exit), I would like its helper thread to terminate as well.
I know that it is possible to use exit_group, but this system call kills all threads in the same group as the calling thread. For example, if my application has 10 threads (and therefore 10 additional helper threads), I would like only the thread and its associated helper thread to terminate, while the other threads keep running.
My application runs exclusively on Linux.
How can I get this behavior?
Reading around about multithreading, I got a bit confused about the concepts of thread group and process group in Linux. Do these terms refer to the same thing?
Precisely, the process group (and perhaps the thread group) is the pid retrieved by one of the following calls :
pid_t getpgid(pid_t pid);
pid_t getpgrp(void); /* POSIX.1 version */
pid_t getpgrp(pid_t pid); /* BSD version */
You are a bit adrift here. Forget exit_group: these days it is what exit does on Linux anyway, and it is not what you are looking for. Similarly, the various get-pid calls aren't really what you want either.
The simplest (and usually best) way to handle this is have each primary thread signal its helper thread to shut down and then pthread_join it - or not if it is detached.
So something like:
(a) primary work thread knows - however it knows - its work is done.
(b) signals helper thread via a shared switch or similar mechanism
(c) helper thread periodically checks flag, cleans up and calls pthread_exit
(d) primary worker thread calls pthread_join (or not) on dead helper thread
(e) primary worker cleans up and calls pthread_exit on itself.
There are a lot of variations on that but that's the basic idea. Beyond that you get into things like pthread_cancel and areas you may want to avoid if you don't absolutely require them (and the potential headaches).

Where does the forked process start from if a call of fork in a thread occurs?

I'm going to write a program in which the main thread creates a new thread and then the new thread creates a child process. Since I have a hard time keeping track of the new thread and the forked process, I'd like an answer from someone knowledgeable.
My questions are:
1. Does a process created in a thread start executing code after pthread_create?
2. If not, where does the forked process start from if fork is called in a thread?
Thank you for reading my question.
Some of this is a bit OS-dependent, as different systems have different POSIX thread implementations and this can expose internals.
POSIX offers pthread_atfork as a somewhat blunt instrument for dealing with some of the issues, but it still looks pretty messy to me.
If your system uses a one-to-one map between "user land thread" and "kernel thread" using clone or rfork to achieve proper user-space sharing of data between threads, then fork will merely duplicate the (single) thread that calls it. However, if your system has a many-to-many style mapping (so that one user process is handling multiple threads, at least before they enter into blocking syscalls), fork may internally duplicate multiple threads. POSIX says it should look like it only duplicated one thread, so that's not supposed to be visible, but I'm not sure how well all systems implement this.
There's some general advice at http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them (Linux-centric, obviously, but still useful).
Is there some particular reason you want to fork inside a thread but not exec? In general, if you just want to run more code in parallel, you just spin off yet another thread (i.e., once you choose to run any threads, you do everything in threads, except if you have to fork for exec; if the exec fails, just _exit).

Zero Threaded Process?

Does a process have to have at least one thread in it? Is it possible for a process to be void of any threads, or does this not make sense?
A process usually has at least one thread. Wikipedia has the definition:
a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.
The MSDN backs this up:
A processor executes threads, not processes, so each application has at least one process, and a process always has at least one thread of execution, known as the primary thread.
Though it does go on to say:
A process can have zero or more single-threaded apartments and zero or one multithreaded apartment.
Which implies that both the number of single-threaded apartments and the number of multithreaded apartments could be zero. However, the process wouldn't do much :)
In Unix-like operating systems, it's possible to have a zombie process, where an entry still exists in the process table even though there are no longer any threads.
You can choose not to use an explicit threading library, or an operating system that has no concept of threads (and so doesn't call it a thread), but for most modern programming all programs have at least one thread of execution (generally referred to as a main thread or UI thread or similar). If that exits, so does the process.
Thought experiment: what would a process with zero threads of execution do?
In theory, I don't see why not. But it would be impossible with the popular operating systems.
A process typically consists of a few different parts:
Threads
Memory space
File descriptors
Environment (root directory, current directory, etc.)
Privileges (UID, etc.)
Et cetera
In theory, a process could exist with no threads as an RPC server. Other processes would make RPC calls which spawn threads in the server process, and then the threads disappear when the function returns. I don't know of any operating systems that work this way.
On most OSs, the process exits either when the last thread exits, or when the main thread exits.
Note: This ignores the "useless" cases such as zombie processes, which have no threads but don't do anything.
"main" itself is a thread. It's a thread that gets executed. So every process runs on at least one thread.
