pthread doesn't let you get the tid? - c

In Linux, the tid is retrieved with a syscall: gettid().
The pthread object stores the tid in struct pthread:
struct pthread {
...
/* This descriptor's link on the `stack_used' or `__stack_user' list. */
list_t list;
/* Thread ID - which is also a 'is this thread descriptor (and
therefore stack) used' flag. */
pid_t tid;
/* Ununsed. */
pid_t pid_ununsed;
...
};
I believe this structure is stored at the base of the stack? Anyway, every pthread has access to it. For instance, in the pthread_create implementation, it grabs the struct pthread:
struct pthread *self = THREAD_SELF;
So my question is: why is there no pthread_gettid_np() call? Perhaps gettid() is so fast somehow it's negligible? Or maybe the call exists and I just can't find it anywhere?

Related

Why does printing the pointer to structure of type pthread, gives us thread ID?

The structure of pthread is as follows. It is taken from https://stuff.mit.edu/afs/sipb/project/pthreads/include/pthread.h
struct pthread {
struct machdep_pthread machdep_data;
enum pthread_state state;
pthread_attr_t attr;
/* Signal interface */
sigset_t sigmask;
sigset_t sigpending;
/* Time until timeout */
struct timespec wakeup_time;
/* Cleanup handlers Link List */
struct pthread_cleanup *cleanup;
/* Join queue for waiting threads */
struct pthread_queue join_queue;
/* Queue thread is waiting on, (mutexes, cond. etc.) */
struct pthread_queue *queue;
/*
* Thread implementations are just multiple queue type implemenations,
* Below are the various link lists currently necessary
* It is possible for a thread to be on multiple, or even all the
* queues at once, much care must be taken during queue manipulation.
*
* The pthread structure must be locked before you can even look at
* the link lists.
*/
struct pthread *pll; /* ALL threads, in any state */
/* struct pthread *rll; Current run queue, before resced */
struct pthread *sll; /* For sleeping threads */
struct pthread *next; /* Standard for mutexes, etc ... */
/* struct pthread *fd_next; For kernel fd operations */
int fd; /* Used when thread waiting on fd */
semaphore lock;
/* Data that doesn't need to be locked */
void *ret;
int error;
const void **specific_data;
};
typedef struct pthread * pthread_t;
Now let's see the following code to print the ID of the thread:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void* calls(void* ptr)
{
// using pthread_self() get current thread id
printf("In function \nthread id = %ld\n", pthread_self());
pthread_exit(NULL);
return NULL;
}
int main()
{
pthread_t thread; // declare thread
pthread_create(&thread, NULL, calls, NULL);
printf("In main \nthread id = %ld\n", thread);
pthread_join(thread, NULL);
return 0;
}
Output in my system is:
In main
thread id = 140289852200704
In function
thread id = 140289852200704
From the pthread.h file (above), pthread is a structure, and thread in the code is a pointer to the structure pthread (since pthread_t is a typedef for struct pthread *). Why does printing this pointer give us the thread ID?
From pthread.h file (above), pthread is a structure, thread in the
code is a pointer to the structure pthread (since pthread_t is typedef
struct pthread*).
To be clear: in that implementation, pthread_t is a pointer-to-structure type. I imagine that's very common for pthreads implementations, but do be careful to avoid mistaking the details of a particular implementation for a general characteristic of the specifications or of all implementations. For example, it could just as well be an integer index in some other implementation, among various other possibilities.
Why does printing this pointer give us the thread
ID?
Because it is the thread ID. And because you're lucky that the undefined behavior arising from printing it with a %ld formatting directive manifested the same way in both places.
You probably have done yourself a disservice by looking under the covers at the definition of your implementation's pthread_t. You don't need to know those details to use pthreads, and in fact they do not help you in the slightest. The type is meant to be treated as opaque.
All you really need to understand to answer the question is that the value written into variable thread by pthread_create() is the created thread's ID, and the value returned by pthread_self() is the calling thread's thread ID. Naturally, each mechanism for obtaining a thread ID yields the same ID for the same thread.

Passing a struct by value to another function initializes it to zero

I am trying to create a thread library, and my thread is a struct type. I have to follow a certain interface, and that interface requires me to pass the thread by value. For example, my code to join on a thread is as follows:
int thread_join(thread_t thread, void **status1)
{
printf("Joining thread\n");
long int thId = thread.id;
printf("Thread id: %ld\n", thId);
gtthread_t * thrd = getThreadFromID(thId);
while(thrd->status != EXIT)
{
}
status1 = &(thrd->ret_value);
return 0;
}
And I am passing a struct of type thread_t to this function. My problem is that when I look at the thread's ID in the calling function, it's displayed properly, but when I check it in the thread_join function it's displayed as 0. The caller function is as follows:
void* caller(void* arg)
{
thread_t th;
thread_create(&th, some_function, NULL);
thread_join(th, NULL);
while(1);
}
thread_create initializes the ID of the thread to a non-zero value and starts the function associated with it.
My thread structure (and other relevant code) is:
typedef enum
{
RUNNING,
WAITING,
CANCEL,
EXIT
} stat;
//Thread
typedef struct
{
ucontext_t t_ctxt;
long int id;
stat status;
void * ret_value;
int isMain;
} thread_t;
int thread_create(thread_t *thread, void *(*start_routine)(void *), void *arg)
{
thread = (thread_t *)malloc(sizeof(thread_t));
thread->id = ++count;
thread->status = RUNNING;
thread->ret_value = NULL;
thread->isMain = 0;
if(getcontext(&(thread->t_ctxt)) == -1)
handle_error("getcontext");
thread->t_ctxt.uc_stack.ss_sp = malloc(SIGSTKSZ);
thread->t_ctxt.uc_stack.ss_size = SIGSTKSZ;
thread->t_ctxt.uc_link = &sched_ctxt;
makecontext(&thread->t_ctxt, (void (*)(void))wrap_func, 2, (void (*)(void))start_routine, arg);
enqueue(gQ, thread);
printf("Thread id: %ld\n", thread->id);
swapcontext(&(curr_thread->t_ctxt),&sched_ctxt);
return 0;
}
Why does this happen? After all, I am passing by value and this should create a copy of the thread with the same values. Thanks.
EDIT:
Basically I am having a queue of threads and there is a scheduler which round-robins. I can post that code here too but I'm sure that's needless and that code works fine.
EDIT2:
I am making a header file from this code and including that header in another file to test it. All my thread_t variables are static. The caller is a function which includes my header file.
What is this line:
thread = (thread_t *)malloc(sizeof(thread_t));
for?
You pass thread_create() an address that refers to a thread_t defined in caller() as an automatic variable.
By doing what you do, you allocate new memory, assign its address to thread_create()'s local copy of the pointer, initialise that memory, and forget the address on return.
The code never writes to the memory referenced by the address passed in! Besides this, it is a memory leak.
To fix this, simply remove the line of code quoted above.
You have no mutex guard on the thread id getter. Presumably, there is no guard on the setter either. What may be happening is that the variable is not visible in the other thread yet. And, without a critical section, it may never become visible.
Each variable that is both read and written from different threads has to be accessed in a critical section (pthread_mutex_lock / unlock).
Another possibility is that you are setting the thread id inside the running thread and you are accessing the variable before it is set. If you attempt to join immediately after starting a thread, it is possible that the other thread hasn't run at all yet and the variable is not set.
Side note: do yourself a favor and use calloc :)
In caller function,
thread_create(&th, some_function, NULL);
should be
gtthread_create(&th, some_function, NULL);

How to get thread id of a pthread in linux c program?

In a Linux C program, how do I print the thread id of a thread created by the pthread library? For example, something like the way we can get the pid of a process with getpid().
What? The person asked for Linux specific, and the equivalent of getpid(). Not BSD or Apple. The answer is gettid() and returns an integral type. You will have to call it using syscall(), like this:
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
....
pid_t x = syscall(__NR_gettid);
While this may not be portable to non-Linux systems, the thread ID is directly comparable and very fast to acquire. It can be printed (such as for logs) like a normal integer.
The pthread_self() function gives the thread ID of the current thread.
pthread_t pthread_self(void);
The pthread_self() function returns the Pthread handle of the calling thread. The pthread_self() function does NOT return the integral thread ID of the calling thread. You must use pthread_getthreadid_np() to return an integral identifier for the thread. (pthread_getthreadid_np() is a non-portable extension, as the _np suffix indicates; glibc on Linux does not provide it.)
NOTE:
pthread_id_np_t tid;
tid = pthread_getthreadid_np();
is significantly faster than these calls, but provides the same behavior.
pthread_id_np_t tid;
pthread_t self;
self = pthread_self();
pthread_getunique_np(&self, &tid);
As noted in other answers, pthreads does not define a platform-independent way to retrieve an integral thread ID.
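On Linux systems, you can get the thread ID like this (the gettid() wrapper requires glibc 2.30 or later; on older versions use syscall(SYS_gettid) instead):
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
pid_t tid = gettid();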
On many BSD-based platforms, this answer https://stackoverflow.com/a/21206357/316487 gives a non-portable way.
However, if the reason you think you need a thread ID is to know whether you're running on the same or different thread to another thread you control, you might find some utility in this approach
static pthread_t threadA;
// On thread A...
threadA = pthread_self();
// On thread B...
pthread_t threadB = pthread_self();
if (pthread_equal(threadA, threadB)) printf("Thread B is same as thread A.\n");
else printf("Thread B is NOT same as thread A.\n");
If you just need to know whether you're on the main thread, there are additional ways, documented in answers to the question "How can I tell if pthread_self is the main (first) thread in the process?".
pid_t tid = syscall(SYS_gettid);
Linux provides this system call so that you can get the ID of a thread.
You can use pthread_self()
The parent learns the thread id once pthread_create() has executed successfully, but if we want to access the thread id from inside the running thread, we have to use the function pthread_self().
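This single line gives you the pid, the pthread ID, and the kernel thread ID (spid). The casts make the arguments match the format specifiers; casting pthread_self() to unsigned long assumes pthread_t is an integer type, as it is in glibc.
printf("before calling pthread_create getpid: %d getpthread_self: %lu tid: %ld\n", (int)getpid(), (unsigned long)pthread_self(), (long)syscall(SYS_gettid));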
I think that not only is the question unclear, but most people are also not aware of the difference. Consider the following passage:
POSIX thread IDs are not the same as the thread IDs returned by the
Linux specific gettid() system call. POSIX thread IDs are assigned
and maintained by the threading implementation. The thread ID returned
by gettid() is a number (similar to a process ID) that is assigned by
the kernel. Although each POSIX thread has a unique kernel thread ID
in the Linux NPTL threading implementation, an application generally
doesn’t need to know about the kernel IDs (and won’t be portable if it
depends on knowing them).
Excerpted from: The Linux Programming Interface: A Linux and UNIX System Programming Handbook, Michael Kerrisk
IMHO, there is only one portable way: pass each thread a structure containing a variable that holds an ascending number, e.g. 1, 2, 3, ... That way you can keep track of the threads' IDs yourself. To compare pthread_t values, however, the function int pthread_equal(tid1, tid2) should be used.
if (pthread_equal(tid1, tid2)) printf("Thread 2 is same as thread 1.\n");
else printf("Thread 2 is NOT same as thread 1.\n");
pthread_getthreadid_np wasn't available on my Mac OS X system. pthread_t is an opaque type. Don't beat your head over it. Just assign it to a void * and call it good (this works where pthread_t is a pointer type, as on macOS). If you need to printf it, use %p.
There is also another way of getting the thread id. When creating threads with the
int pthread_create(pthread_t * thread, const pthread_attr_t * attr, void * (*start_routine)(void *), void *arg);
function call, the first parameter, pthread_t * thread, points to the variable that receives the new thread's id (in glibc, pthread_t is an unsigned long int defined in bits/pthreadtypes.h). The last argument, void *arg, is the argument that is passed to the start_routine function the thread runs.
You can create a structure to hold multiple arguments and pass a pointer to that structure.
typedef struct thread_info {
pthread_t thread;
//...
} thread_info;
//...
tinfo = malloc(sizeof(thread_info) * NUMBER_OF_THREADS);
//...
pthread_create (&tinfo[i].thread, NULL, handler, (void*)&tinfo[i]);
//...
void *handler(void *targs) {
thread_info *tinfo = targs;
// here you get the thread id with tinfo->thread
}
The answer differs from one OS to another. I found a helper here.
You can try this:
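#include <unistd.h>
#include <sys/syscall.h>
#if defined(__FreeBSD__)
#include <sys/thr.h> /* thr_self() */
#elif defined(__NetBSD__)
#include <lwp.h> /* _lwp_self() */
#endif
int get_thread_id() {
#if defined(__linux__)
return (int)syscall(SYS_gettid);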
#elif defined(__FreeBSD__)
long tid;
thr_self(&tid);
return (int)tid;
#elif defined(__NetBSD__)
return _lwp_self();
#elif defined(__OpenBSD__)
return getthrid();
#else
return getpid();
#endif
}
A platform-independent way (since C++11) is:
#include <thread>
std::this_thread::get_id();
You can also print the thread id this way, and it does the same thing. For example:
for(int i=0;i < total; i++)
{
pthread_join(pth[i],NULL);
cout << "SUM of thread id " << pth[i] << " is " << args[i].sum << endl;
}
This program sets up an array of pthread_t and calculates a sum on each thread, so it prints the sum of each thread together with its thread id.

pthread_t is initialised for thread it is defined in?

I am using pthread_t to print out the ID of a thread that I manually create in C. However, when I print it before I create my new thread (passing it by reference as a parameter), it prints a different value (presumably that of the thread my main function is executing on). I would have expected it to default to 0 or be uninitialised. Any ideas?
Thanks,
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
struct thread_info { /* Used as argument to thread_start() */
pthread_t thread_id;/* ID returned by pthread_create() */
};
static void *thread_1_start(void *arg) {
struct thread_info *myInfo = arg;
printf("Started thread id: %d\n", myInfo->thread_id);
pthread_exit(0);
}
int main() {
struct thread_info tinfo;
int s;
printf("Main thread id: %d\n", tinfo.thread_id);
s = pthread_create(&tinfo.thread_id,
NULL, // was address of attr, error as this was not initialised.
&thread_1_start,
&tinfo);
pthread_join(tinfo.thread_id,NULL);
}
Actual output:
Main thread id: 244580352
Started thread id: 245325824
Expected output:
Main thread id: // 0 or undefined
Started thread id: 245325824
The problem is that you are not initialising the tinfo structure.
In C, local (automatic) variables, unlike global and static variables, are not initialised for you.
So, if you do something like:
int c;
printf("%d", c);
You should not expect a meaningful value, since it depends on whatever happens to be in that memory location at that moment.
You need to initialize the tinfo variable, either with memset or by assigning tinfo.thread_id = 0 explicitly.
There is no thread-specific logic that initializes tinfo; it is just a regular C struct. It holds whatever data was at that memory address when the variable came into existence. You need to initialize it explicitly.
You can initialize the value to zero by:
struct thread_info tinfo = { 0 };
Declare struct thread_info tinfo; global and see what happens.
There are a number of important things you need to know.
First, pthread_t is opaque. You can't reliably print it with printf because nowhere in the POSIX standard is pthread_t specified as being an int, a struct or whatever. By definition you can't print it and get a meaningful output.
Second, if a thread needs to know its pthread_t ID it can call pthread_self(). You don't need to tell the thread what its ID is externally like you're trying to do.
But never mind that! The printed output is close to what you're expecting because you have a race between the thread printing its output and pthread_create() assigning the pthread_t to thread_info.thread_id, and because pthread_t is actually an integer type on Linux (so it's likely that the values are allocated sequentially, and you're just getting an old value).

Efficient way to find task_struct by pid

Is there an efficient way of finding the task_struct for a specified PID, without iterating through the task_struct list?
What's wrong with using one of the following?
extern struct task_struct *find_task_by_vpid(pid_t nr);
extern struct task_struct *find_task_by_pid_ns(pid_t nr,
struct pid_namespace *ns);
If you want to find the task_struct from a module, find_task_by_vpid(pid_t nr) etc. are not going to work since these functions are not exported.
In a module, you can use the following function instead:
pid_task(find_vpid(pid), PIDTYPE_PID);
There is a better way to get the instance of task_struct from a module.
Always try to use wrapper functions / helper routines, because they are designed so that if the driver programmer misses something, the kernel takes care of it for them: for example error handling, condition checks, etc.
/* Use below API and you will get a pointer of (struct task_struct *) */
taskp = get_pid_task(pid, PIDTYPE_PID);
To obtain the struct pid * that get_pid_task() expects from a numeric pid_t, use the API below:
find_get_pid(pid_no);
You don't need to call rcu_read_lock() and rcu_read_unlock() yourself when using these APIs, because get_pid_task() takes rcu_read_lock() around its call to pid_task() and takes a reference on the task with get_task_struct() before unlocking, so concurrency is handled properly. That's why I said above to always use this kind of wrapper. Remember to drop that reference with put_task_struct() when you are done with the task.
Snippets of the get_pid_task() and find_get_pid() functions are below:
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{
struct task_struct *result;
rcu_read_lock();
result = pid_task(pid, type);
if (result)
get_task_struct(result);
rcu_read_unlock();
return result;
}
EXPORT_SYMBOL_GPL(get_pid_task);
struct pid *find_get_pid(pid_t nr)
{
struct pid *pid;
rcu_read_lock();
pid = get_pid(find_vpid(nr));
rcu_read_unlock();
return pid;
}
EXPORT_SYMBOL_GPL(find_get_pid);
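In a kernel module, you can also use the wrapper functions in the following way:
taskp = get_pid_task(find_get_pid(PID),PIDTYPE_PID);
Note that find_get_pid() also takes a reference on the struct pid, which this one-liner never releases; keeping the returned pointer in a variable lets you call put_pid() on it afterwards.
PS: for more information on these APIs you can look at kernel/pid.c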
No one mentioned that the pid_task() function, and the pointer you obtain from it, should only be used inside an RCU read-side critical section (because it uses an RCU-protected data structure). Otherwise there can be a use-after-free bug.
There are lots of uses of pid_task() in the Linux kernel sources (e.g. in posix_timer_event()).
For example:
rcu_read_lock();
/* search through the global namespace */
task = pid_task(find_pid_ns(pid_num, &init_pid_ns), PIDTYPE_PID);
if (task)
printk(KERN_INFO "1. pid: %d, state: %#lx\n",
pid_num, task->state); /* valid task dereference */
rcu_read_unlock(); /* after it returns - task pointer becomes invalid! */
if (task)
printk(KERN_INFO "2. pid: %d, state: %#lx\n",
pid_num, task->state); /* may be successful,
* but is buggy (task dereference is INVALID!) */
Find out more about RCU API from Kernel.org
P.S. also you can just use the special API functions like find_task_by_pid_ns() and find_task_by_vpid() under the rcu_read_lock().
The first one is for searching through the particular namespace:
task = find_task_by_pid_ns(pid_num, &init_pid_ns); /* e.g. init namespace */
The second one is for searching through the namespace of current task.
