Get thread identifier C - c

I am trying to get the identifier of a thread, but it always returns some random numbers.
What isn't good here ?
// C program to demonstrate working of pthread_self()
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void* calls(void* ptr)
{
// using pthread_self() get current thread id
printf("In function \nthread id = %d\n", pthread_self());
pthread_exit(NULL);
return NULL;
}
int main()
{
pthread_t thread; // declare thread
pthread_create(&thread, NULL, calls, NULL);
printf("In main \nthread id = %d\n", thread);
pthread_join(thread, NULL);
return 0;
}

Maybe it would be good time to read the documentation. For Linux, it is said that
NOTES
POSIX.1 allows an implementation wide freedom in choosing the type used to represent a thread ID; for example, representation using either an arithmetic type or a structure is permitted. Therefore, variables of
type pthread_t can't portably be compared using the C equality operator (==); use pthread_equal(3) instead.
Thread identifiers should be considered opaque: any attempt to use a thread ID other than in pthreads calls is nonportable and can lead to unspecified results.
Thread IDs are guaranteed to be unique only within a process. A thread ID may be reused after a terminated thread has been joined, or a detached thread has terminated.
Which means that for example printing the value returned by pthread_self is not sensible.

In Linux/GLIBC the pthread_t type is actually the address of the "Thread Control Block" (TCB) of the thread (located at the bottom of its stack). This is unique. But as pointed out in a previous answer from #AnttiHaapala, no supposition can be done as the type should be considered opaque. So, printing it with a "%d" specifier is definitely not portable.
In Linux/GLIBC environment, you also have another unique identifier for a thread. This is its task identifier in the kernel obtained with a call to gettid(). It returns a pid_t typed value (signed integer suitable for "%d" format specifier) as discussed in this post.

You can use gettid(). Under Linux, it's available with glibc 2.30 or higher. If you have an older version, you can write your own trivial syscall wrapper.
#include <syscall.h>
static pid_t my_gettid(void)
{
return syscall(SYS_gettid);
}
#define gettid() my_gettid()

Related

Binary Semaphore program has extremely unpredictable output

My program uses an enum as a semaphore. There are two possible values/states(Because it's a binary semaphore). The program compiles fine. signal() and wait() look logical. Why is program behavior so unpredictable? Even the integer printf is buggy. Here's the code:
#include <stdio.h>
#include <pthread.h>
typedef enum {IN_USE,NOT_IN_USE} binary_semaphore;
binary_semaphore s=NOT_IN_USE;
struct parameters{
int thread_num;
};
void wait(){
while(s==IN_USE);
s=IN_USE;
}
void signal(){
s=NOT_IN_USE;
}
void resource(void *params){
//assuming parameter is a parameters struct.
struct parameters *p=(struct parameters*)params;
wait();
printf("Resource is being used by thread %d\n",(*p).thread_num);
signal();
}
int main(void){
pthread_t threads[4];
struct parameters ps[4]={{1},{2},{3},{4}};
register int counter=0;
while(counter++<4){
pthread_create(&threads[counter],NULL,resource,(void*)&ps[counter]);
}
return 0;
}
What's wrong with my code?
Some of the outputs(Yes, they're different every time):-
(NOTHING)
Resource is being used by thread 32514
Resource is being used by thread 0
Resource is being used by thread 0
Resource is being used by thread 32602
Resource is being used by thread -24547608
Is it a garbage value issue?
What's wrong with my code?
Multiple things, largely discussed in comments already. The most significant one is that your code is rife with data races. That is one of the things that semaphores are often used to protect against, but in this case it is your semaphores themselves that are racy. Undefined behavior results.
Additional issues include
A pthreads thread function must return void *, but yours returns void
You overrun the bounds of your main()'s ps and threads arrays
You do not join your threads before the program exits.
You define functions with the same names as a C standard library function (signal()) and a standard POSIX function (wait()).
Nevertheless, if your C implementation supports the atomics option then you can use that to implement a working semaphore not too different from your original code:
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
// This is used only for defining the enum constants
enum sem_val { NOT_IN_USE, IN_USE };
// It is vital that sem be *atomic*.
// With `stdatomic.h` included, "_Atomic int" could also be spelled "atomic_int".
_Atomic int sem = ATOMIC_VAR_INIT(NOT_IN_USE);
struct parameters{
int thread_num;
};
void my_sem_wait() {
int expected_state = NOT_IN_USE;
// See discussion below
while (!atomic_compare_exchange_strong(&sem, &expected_state, IN_USE)) {
// Reset expected_state
expected_state = NOT_IN_USE;
}
}
void my_sem_signal() {
// This assignment is performed atomically because sem has atomic type
sem = NOT_IN_USE;
}
void *resource(void *params) {
//assuming parameter is a parameters struct.
struct parameters *p = params;
my_sem_wait();
printf("Resource is being used by thread %d\n", p->thread_num);
my_sem_signal();
return NULL;
}
int main(void) {
pthread_t threads[4];
struct parameters ps[4] = {{1},{2},{3},{4}};
for (int counter = 0; counter < 4; counter++) {
pthread_create(&threads[counter], NULL, resource, &ps[counter]);
}
// It is important to join the threads if you care that they run to completion
for (int counter = 0; counter < 4; counter++) {
pthread_join(threads[counter], NULL);
}
return 0;
}
Most of that is pretty straightforward, but the my_sem_wait() function bears a little more explanation. It uses an atomic compare-and-swap to make sure that threads proceed only if they change the value of the semaphore from NOT_IN_USE to IN_USE, with the comparison and conditional assignment being performed as a single atomic unit. Specifically, this ...
atomic_compare_exchange_strong(&sem, &expected_state, IN_USE)
... says "Atomically, compare the value of sem to the value of expected_state and if they compare equal then assign value IN_USE to to sem." The function additionally sets the value of expected_state to the one read from sem if they differ, and returns the result of the equality comparison that was performed (equivalently: returns 1 if the specified value was assigned to sem and 0 if not).
It is essential that the comparison and swap be performed as an atomic unit. Individual atomic reads and writes would ensure that there is no data race, but they would not ensure correct program behavior, because two threads waiting on the semaphore could both see it available at the same time, each before the other had a chance to mark it unavailable. Both would then proceed. The atomic compare and swap prevents one thread reading the value of the semaphore between another's read and update of that value.
Do note, however, that unlike a pthreads mutex or a POSIX semaphore, this semaphore waits busily. That means threads waiting to acquire the semaphore consume CPU while they do, rather than going to sleep as threads waiting on a pthreads mutex do. That may be ok if semaphore access is usually uncontended or if threads never hold it locked very long, but under other circumstances it can make your program much more resource-hungry than it needs to be.
You are encountering race conditions as well as undefined behavior. When you use a regular int as a semaphore in multithreaded applications, there's no guarantee that a process will read a variable's value and modify it before another process is able to read it, which is why you must use a concurrency library designed for your operating system. Furthermore, the function pointer you passed to pthread_create is not the right type, which is undefined behavior.
I replaced your "semaphore" enum with a pointer to pthread_mutex_t which is initialized on the stack of main(), and each thread gets a pointer to it as a member of their struct parameter.
I also changed the definition of void resource(void* params) to void* resource(void *params) as that is a prototype that matches what pthread_create expects as its third parameter.
Your wait() and signal() functions were able to be replaced one-to-one with pthread_mutex_lock() and pthread_mutex_unlock() from pthread.h which you have already included.
#include <stdio.h>
#include <pthread.h>
struct parameters{
int thread_num;
pthread_mutex_t *mutex; //mutex could be global if you prefer
};
void* resource(void *params){ //pthread_create expects a pointer to a function that takes a void* and returns a void*
//assuming parameter is a parameters struct.
struct parameters *p = (struct parameters*)params;
pthread_mutex_lock(p->mutex);
printf("Resource is being used by thread %d\n", p->thread_num);
pthread_mutex_unlock(p->mutex);
return NULL;
}
int main(void){
pthread_t threads[4];
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);
struct parameters ps[4]={{1, &mutex},{2, &mutex},{3, &mutex},{4, &mutex}};
for(int counter = 0; counter < 4; ++counter)
pthread_create(&threads[counter], NULL, resource, &ps[counter]);
//Threads should be joined
for(int counter = 0; counter < 4; ++counter)
pthread_join(threads[counter], NULL);
}
This will eliminate the stochasticity that you are experiencing.

How to get thread id of a pthread in linux c program?

In a Linux C program, how do I print the thread id of a thread created by the pthread library? For example like how we can get pid of a process by getpid().
What? The person asked for Linux specific, and the equivalent of getpid(). Not BSD or Apple. The answer is gettid() and returns an integral type. You will have to call it using syscall(), like this:
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
....
pid_t x = syscall(__NR_gettid);
While this may not be portable to non-linux systems, the threadid is directly comparable and very fast to acquire. It can be printed (such as for LOGs) like a normal integer.
pthread_self() function will give the thread id of current thread.
pthread_t pthread_self(void);
The pthread_self() function returns the Pthread handle of the calling thread. The pthread_self() function does NOT return the integral thread of the calling thread. You must use pthread_getthreadid_np() to return an integral identifier for the thread.
NOTE:
pthread_id_np_t tid;
tid = pthread_getthreadid_np();
is significantly faster than these calls, but provides the same behavior.
pthread_id_np_t tid;
pthread_t self;
self = pthread_self();
pthread_getunique_np(&self, &tid);
As noted in other answers, pthreads does not define a platform-independent way to retrieve an integral thread ID.
On Linux systems, you can get thread ID thus:
#include <sys/types.h>
pid_t tid = gettid();
On many BSD-based platforms, this answer https://stackoverflow.com/a/21206357/316487 gives a non-portable way.
However, if the reason you think you need a thread ID is to know whether you're running on the same or different thread to another thread you control, you might find some utility in this approach
static pthread_t threadA;
// On thread A...
threadA = pthread_self();
// On thread B...
pthread_t threadB = pthread_self();
if (pthread_equal(threadA, threadB)) printf("Thread B is same as thread A.\n");
else printf("Thread B is NOT same as thread A.\n");
If you just need to know if you're on the main thread, there are additional ways, documented in answers to this question how can I tell if pthread_self is the main (first) thread in the process?.
pid_t tid = syscall(SYS_gettid);
Linux provides such system call to allow you get id of a thread.
You can use pthread_self()
The parent gets to know the thread id after the pthread_create() is executed sucessfully, but while executing the thread if we want to access the thread id we have to use the function pthread_self().
This single line gives you pid , each threadid and spid.
printf("before calling pthread_create getpid: %d getpthread_self: %lu tid:%lu\n",getpid(), pthread_self(), syscall(SYS_gettid));
I think not only is the question not clear but most people also are not cognizant of the difference. Examine the following saying,
POSIX thread IDs are not the same as the thread IDs returned by the
Linux specific gettid() system call. POSIX thread IDs are assigned
and maintained by the threading implementation. The thread ID returned
by gettid() is a number (similar to a process ID) that is assigned by
the kernel. Although each POSIX thread has a unique kernel thread ID
in the Linux NPTL threading implementation, an application generally
doesn’t need to know about the kernel IDs (and won’t be portable if it
depends on knowing them).
Excerpted from: The Linux Programming Interface: A Linux and UNIX System Programming Handbook, Michael Kerrisk
IMHO, there is only one portable way that pass a structure in which define a variable holding numbers in an ascending manner e.g. 1,2,3... to per thread. By doing this, threads' id can be kept track. Nonetheless, int pthread_equal(tid1, tid2) function should be used.
if (pthread_equal(tid1, tid2)) printf("Thread 2 is same as thread 1.\n");
else printf("Thread 2 is NOT same as thread 1.\n");
pthread_getthreadid_np wasn't on my Mac os x. pthread_t is an opaque type. Don't beat your head over it. Just assign it to void* and call it good. If you need to printf use %p.
There is also another way of getting thread id. While creating threads with
int pthread_create(pthread_t * thread, const pthread_attr_t * attr, void * (*start_routine)(void *), void *arg);
function call; the first parameter pthread_t * thread is actually a thread id (that is an unsigned long int defined in bits/pthreadtypes.h). Also, the last argument void *arg is the argument that is passed to void * (*start_routine) function to be threaded.
You can create a structure to pass multiple arguments and send a pointer to a structure.
typedef struct thread_info {
pthread_t thread;
//...
} thread_info;
//...
tinfo = malloc(sizeof(thread_info) * NUMBER_OF_THREADS);
//...
pthread_create (&tinfo[i].thread, NULL, handler, (void*)&tinfo[i]);
//...
void *handler(void *targs) {
thread_info *tinfo = targs;
// here you get the thread id with tinfo->thread
}
For different OS there is different answer. I find a helper here.
You can try this:
#include <unistd.h>
#include <sys/syscall.h>
int get_thread_id() {
#if defined(__linux__)
return syscall(SYS_gettid);
#elif defined(__FreeBSD__)
long tid;
thr_self(&tid);
return (int)tid;
#elif defined(__NetBSD__)
return _lwp_self();
#elif defined(__OpenBSD__)
return getthrid();
#else
return getpid();
#endif
}
Platform-independent way (starting from c++11) is:
#include <thread>
std::this_thread::get_id();
You can also write in this manner and it does the same. For eg:
for(int i=0;i < total; i++)
{
pthread_join(pth[i],NULL);
cout << "SUM of thread id " << pth[i] << " is " << args[i].sum << endl;
}
This program sets up an array of pthread_t and calculate sum on each. So it is printing the sum of each thread with thread id.

why using pthread_exit?

I'm trying to figure out the usage of pthread_exit using this example code:
void* PrintVar(void* arg)
{
int * a = (int *) arg; // we can access memory of a!!!
printf( "%d\n", *a);
}
int main(int argc, char*argv[])
{
int a, rc;
a = 10;
pthread_t thr;
pthread_create( &thr, NULL, PrintVar, &a );
//why do I need it here?//
pthread_exit(&rc); /* process continues until last
threads termintates */
there are two things I'm not quite sure about :
when we are using pthread_create - I'm passing 'a' parameter's address,
but is this paramter being "saved" under "arg" of the PrintVar function?
for example if I was using : PrintVar(void *blabla) , and wanted to pass 2 parameters from main function : int a = 10, int b= 20 .. how can I do that?
Why the pthread_exit needed? it means - wait for proccess to end - but what scenario can I get if I won't use that line?
thanks alot!
when we are using pthread_create - I'm passing 'a' parameter's address, but is this paramter being "saved" under "arg" of the PrintVar function?
The "original" a (the one defined in main) is not being copied, you are only passing around a pointer to it.
for example if I was using : PrintVar(void *blabla) , and wanted to pass 2 parameters from main function : int a = 10, int b= 20 .. how can I do that?
Put those two values in a struct and pass a pointer to such struct as argument to pthread_create (PrintVar, thus, will receive such a pointer and will be able to retrieve the two values).
and my second question is why the pthread_exit needed? it means - wait for proccess to end - but what scenario can I get if I won't use that line?
pthread_exit terminates the current thread without terminating the process if other threads are still running; returning from main, instead, is equivalent to calling exit which, as far as the standard is concerned, should "terminate the program" (thus implicitly killing all the threads).
Now, being the C standard thread-agnostic (until C11) and support for threading in the various Unixes a relatively recent addition, depending from libc/kernel/whatever version exit may or may not kill just the current thread or all the threads.
Still, in current versions of libc, exit (and thus return from main) should terminate the process (and thus all its threads), actually using the syscall exit_group on Linux.
Notice that a similar discussion applies for the Windows CRT.
The detached attribute merely determines the behavior of the system when the thread terminates; it does not
prevent the thread from being terminated if the process terminates using exit(3) (or equivalently, if the
main thread returns).

pthread_create timing of writeback

In the call pthread_create(&id, NULL, &start_routine, arg), is the thread id guaranteed to be written to id before start_routine starts running? The manpages are clear that the start_routine may but will not necessarily begin executing before the call to pthread_create returns, but they are silent on when the thread id gets written back to the passed thread argument.
My specific case is that I have a wrapper around pthread_create:
int mk_thread(pthread_t *id) {
pthread_t tid;
pthread_create(&tid,NULL,ThreadStart,NULL);
if (id == NULL) {
pthread_detach(tid);
} else {
*id=lid;
}
}
which can obviously run the start routine before writing back. I changed it to
int mk_thread(pthread_t *id) {
pthread_t tid,tidPtr=id?id:&tid;
pthread_create(tidPtr,NULL,ThreadStart,NULL);
if (id == NULL) {
pthread_detach(tid);
}
}
This rewrite is much more stable in practice, but is it actually a fix or just a smaller window for the race condition?
The thread id is definitely written before pthread_create returns. If you think about it, it would be impossible for pthread_create to work any other way. It could not delegate writing the thread id to the new thread, because the pthread_t variable might be out of scope by the time the new thread runs.
The relevant text is:
Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread.
(From http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_create.html) Note that it says "on successful completion" of the function, not "at an indeterminate time after successful completion".
The more interesting question, and I'm unclear on this one, is whether pthread_create must have finished writing the thread id to its destination before the new thread start function begins, i.e. whether the new thread can immediately see its own thread id, e.g. if it's to be stored in a global variable. I suspect the answer is no.
Edit: Upon rereading your question, it seems like you might really have been asking about this latter, more interesting question. In any case, there's no reason for the new thread's start function to use the thread-id written out by pthread_create. Your new thread can (and should) just use pthread_self to get its own thread id.
I believe that nothing in the spec requires pthread_create to assign its output parameter pthread_t *thread before code in start_routine begins to execute.
As a matter of practicality, the following program succeeds on many pthreads implementations (freebsd8 i386 and debian gnu/linux amd64) but fails on one of interest to me (debian/kfreebsd9 amd64):
#include <pthread.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
pthread_t th;
void *asserter(void* unused) {
pthread_t self = pthread_self(), th_=th;
printf("th=%jd self=%jd\n", (intmax_t)th_, (intmax_t)self);
assert(pthread_equal(th_, self));
}
int main() {
int i;
for(i=0; i<1000; i++) {
pthread_create(&th, NULL, asserter, NULL);
pthread_join(th, NULL);
}
return 0;
}
that said, I am not sure I understand how this detail of behavior is relevant to the two code alternatives you offer in the original question. Though it occurs to me that if pthread_create writes other values to *thread during its execution, and you're using the value of *id in the other thread, it could be relevant. The standard does not specify that no other 'intermediate' values are written to *thread during successful execution of pthread_create.

How does POSIX Threads work in linux?

I thought pthread uses clone to spawn one new thread in linux. But if so, all of the threads should have their seperate pid. Otherwise, if they have the same pid, the global variables in the libc seem to be shared. However, as I ran the following program, I got the same pid but the different address of errno.
extern errno;
void*
f(void *arg)
{
printf("%u,%p\n", getpid(), &errno);
fflush(stdin);
return NULL;
}
int
main(int argc, char **argv)
{
pthread_t tid;
pthread_create(&tid, NULL, f, NULL);
printf("%u,%p\n", getpid(), &errno);
fflush(stdin);
pthread_join(tid, NULL);
return 0;
}
Then, why?
I'm not sure exactly how clone() is used when pthread_create() is called. That said, looking at the clone() man page, it looks like there is a flag called CLONE_THREAD which:
If CLONE_THREAD is set, the child is
placed in the same thread group as the
calling process. To make the remainder
of the discussion of CLONE_THREAD more
readable, the term "thread" is used to
refer to the processes within a thread
group.
Thread groups were a feature added in
Linux 2.4 to support the POSIX threads
notion of a set of threads that share
a single PID. Internally, this shared
PID is the so-called thread group
identifier (TGID) for the thread
group. Since Linux 2.4, calls to
getpid(2) return the TGID of the
caller.
It then goes on to talk about a gettid() function for getting the unique ID of an individual thread within a process. Modifying your code:
#include <stdio.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
int errno;
void*
f(void *arg)
{
printf("%u,%p, %u\n", getpid(), &errno, syscall(SYS_gettid));
fflush(stdin);
return NULL;
}
int
main(int argc, char **argv)
{
pthread_t tid;
pthread_create(&tid, NULL, f, NULL);
printf("%u,%p, %u\n", getpid(), &errno, syscall(SYS_gettid));
fflush(stdin);
pthread_join(tid, NULL);
return 0;
}
(make sure to use "-lpthread"!) we can see that the individual thread id is indeed unique, while the pid remains the same.
rascher#coltrane:~$ ./a.out
4109,0x804a034, 4109
4109,0x804a034, 4110
Global variables: your mistake is that errno is not a global variable but a macro that expands to an lvalue of type int. In practice, it expands to (*__errno_location()) or similar.
getpid is a library function that returns the process id in the POSIX sense of process, not the bogus Linux per-clone pid. Nowadays Linux has the minimal kernel-level functionality necessary to make near-POSIX-compliance possible with respect to threads, but most of it still depends on ugly hacks at the userspace libc level.

Resources