#include <pthread.h>
#include <unistd.h>
void *thread_routine(void *arg)
{
sleep(5);
pthread_detach(pthread_self());
sleep(5);
return NULL;
}
int main()
{
pthread_t t;
pthread_create(&t, 0, thread_routine, 0);
pthread_join(t, NULL);
}
Will pthread_join(t) return immediately once pthread_detach(pthread_self()) succeeds?
The behavior is undefined, and thus obviously to be avoided at all costs.
(As far as I can tell the behavior is implicitly undefined. There are several kindred instances of explicitly undefined behavior in the spec, but this exact scenario is not mentioned.)
For the curious, on an NPTL Linux system near me, both the pthread_detach() and the pthread_join() return 0, and, moreover, the latter blocks and successfully gets the value returned by the thread. On an OS X system near me, by contrast, the pthread_detach() succeeds, and the pthread_join() immediately fails with ESRCH.
Your code is buggy. By the time you call pthread_join, the thread may already have terminated. Since it's detached, the pthread_t is no longer valid. So your code may pass an invalid pthread_t to pthread_join, which can cause unpredictable behavior.
To avoid these kinds of problems one specific thing should control the lifetime of a thread. That can be the thread itself, if it's detached, in which case no thread should try to join it. It can also be the thread that joins it, in which case the thread should not be detached.
I've been testing thread cancellation with code like this.
I run it on my ARM machine: sometimes it works fine, sometimes it segfaults, and sometimes it hangs right after printing "created".
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
void *t(void *ptr)
{
printf("in t\n");
sleep(0.3);
return NULL;
}
int main() {
pthread_t p;
pthread_create(&p, NULL, t, NULL);
printf("created\n");
pthread_detach(p);
pthread_cancel(p);
printf("canceled\n");
return 0;
}
I have no idea which part is causing the problem (hang or segfault).
I answered this same question 18 years ago. It is not safe to call pthread_cancel on a detached thread.
Your code has a race condition so its behavior is undefined. If the thread manages to terminate before you call pthread_cancel, you are passing an invalid parameter to pthread_cancel and that is undefined behavior.
If the code in main is managing the lifetime of the thread, do not detach it because otherwise there is no way to ensure the thread ID remains valid. If the code in main is not managing the lifetime of the thread, do not call pthread_cancel in it. You will never find a safe way to split the difference.
You should think of pthread_detach as rendering the thread ID invalid (or "semantically closed" as Glenn Burkhardt put it) and not use it again.
As Some programmer dude points out, your sleep rounds to zero (sleep() takes an unsigned int, so 0.3 is truncated to 0), which makes the race condition more likely to occur.
#include <pthread.h>
#include <unistd.h>
static void *tfunc(void *data)
{
return NULL;
}
int main(int argc, char **argv)
{
pthread_t t;
pthread_create(&t, NULL, tfunc, NULL);
sleep(1);
pthread_detach(t);
return 0;
}
See the MWE.
It works fine, but I am unsure if this is actually defined behavior. The man page of pthread_detach says nothing about calling it on exited threads.
Yes, I know about creating threads with the detached attribute, but I am specifically curious about this situation. The pthread_join documentation mentions this case, and I assume pthread_detach works just as well, but I haven't found any official statement.
This code is perfectly legal, and does not invoke undefined behavior:
#include <pthread.h>
#include <unistd.h>
static void *tfunc(void *data)
{
return NULL;
}
int main(int argc, char **argv)
{
pthread_t t;
pthread_create(&t, NULL, tfunc, NULL);
sleep(1);
pthread_detach(t);
return 0;
}
It's not really clearly stated, but the POSIX documentation for pthread_detach() is worded in such a way that it must be defined and correct to call pthread_detach() on a terminated thread:
The pthread_detach() function shall indicate to the implementation
that storage for the thread thread can be reclaimed when that thread
terminates. If thread has not terminated, pthread_detach() shall not
cause it to terminate.
The behavior is undefined if the value specified by the thread
argument to pthread_detach() does not refer to a joinable thread.
First, note the statement "If thread has not terminated". That implies that it must be safe to call pthread_detach() when the thread has terminated.
Second, note "The behavior is undefined if ... does not refer to a joinable thread." In your posted code, the thread you created is clearly joinable - you didn't create it with a detached attribute, so you could call pthread_join() to retrieve its returned value. So it's not undefined behavior.
Remember, there's no guaranteed way to ensure from thread A that thread B is still running when either pthread_join() or pthread_detach() is called. So either call has to be safe to call (once!) from any thread on any other thread.
Also, from the Rationale section of the POSIX documentation:
RATIONALE
The pthread_join() or pthread_detach() functions should eventually
be called for every thread that is created so that storage associated
with the thread may be reclaimed.
It has been suggested that a "detach" function is not necessary; the
detachstate thread creation attribute is sufficient, since a thread
need never be dynamically detached. However, need arises in at least
two cases:
1. In a cancellation handler for a pthread_join() it is nearly
essential to have a pthread_detach() function in order to detach the
thread on which pthread_join() was waiting. Without it, it would be
necessary to have the handler do another pthread_join() to attempt
to detach the thread, which would both delay the cancellation
processing for an unbounded period and introduce a new call to
pthread_join(), which might itself need a cancellation handler. A
dynamic detach is nearly essential in this case.
2. In order to detach the "initial thread" (as may be desirable in
processes that set up server threads).
Again, while not clearly stated, note the implied equivalence between pthread_join() and pthread_detach().
I am getting back into using pthreads and the definition of pthread_join bothers me.
It says
"The pthread_join() function shall suspend execution of the calling
thread until the target thread terminates, unless the target thread
has already terminated. On return from a successful pthread_join()
call with a non-NULL value_ptr argument, the value passed to
pthread_exit() by the terminating thread shall be made available in
the location referenced by value_ptr. When a pthread_join() returns
successfully, the target thread has been terminated. The results of
multiple simultaneous calls to pthread_join() specifying the same
target thread are undefined. If the thread calling pthread_join() is
canceled, then the target thread shall not be detached."
I am trying to understand how, if I call pthread_join for one thread and then call pthread_join for a second thread, both threads get started. I would have expected the second pthread_join never to be reached, because the first join suspends the main thread and keeps it from running the next line until pthread_exit is called from within the joined thread.
In particular, I imagine, the first pthread_join must wait until the specified thread has called pthread_exit, only then it should continue. However this is not the case, as I can do:
#include <pthread.h>
#include <stdio.h>
int avail = 0;
void *consumer(void *unused)
{
while (1) {
if (avail > 0) {
--avail;
puts("consumed");
}
}
}
void *producer(void *unused)
{
while (1) {
++avail;
puts("produced");
}
}
int main(int argc, char **argv)
{
pthread_t c_thread;
pthread_t p_thread;
pthread_create(&c_thread, 0, consumer, 0);
pthread_create(&p_thread, 0, producer, 0);
pthread_join(c_thread, 0);
pthread_join(p_thread, 0);
return 0;
}
Ignoring the possible race conditions (left in to keep the code short), why are both threads running, even though the first join suspends the main thread and so, in my mind, prevents the next join from being called?
I would really like to understand how this works.
Thanks ahead of time.
Threads run concurrently, starting sometime during or after the call to pthread_create. Calling pthread_join has nothing to do with starting or running the thread, it simply waits until it exits. Both your threads have already been running and are still runnable at the point you enter and block on the first join and they will continue to run. The only thing blocked by the first join is your main thread.
The threads do not start in pthread_join, but rather in pthread_create. I misled myself into thinking pthread_join was used to actually start the thread, whereas it is a non-busy wait for the specific thread to return before the main thread continues its execution. In my case, main returns before the threads get a chance to call the puts function.
The second pthread_join in my code is never actually called, because main is indeed suspended from the first pthread_join waiting on c_thread to return. The second join in this particular scenario is a "no operation" and the program never actually gets to it because consumer never actually returns.
I have a question about pthread_kill() behavior.
Here's a small code I'm trying out:
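#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
/* Assumed definitions: the posted snippet does not show tid1_g or handle_error. */
#define handle_error(en, msg) do { errno = (en); perror(msg); exit(EXIT_FAILURE); } while (0)
pthread_t tid1_g;
void my_handler1(int sig)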
{
printf("my_handle1: Got signal %d, tid: %lu\n",sig,pthread_self());
//exit(0);
}
void *thread_func1(void *arg)
{
struct sigaction my_action;
my_action.sa_handler = my_handler1;
my_action.sa_flags = SA_RESTART;
sigaction(SIGUSR1, &my_action, NULL);
printf("thread_func1 exit\n");
}
void *thread_func2(void *arg)
{
int s;
s = pthread_kill(tid1_g,SIGUSR1);
if(s)
handle_error(s,"tfunc2: pthread_kill");
printf("thread_func2 exit\n");
}
int main()
{
int s = 0;
pthread_t tid1;
s = pthread_create(&tid1,NULL,thread_func1,NULL);
if(s)
handle_error(s,"pthread_create1");
tid1_g = tid1;
printf("tid1: %lu\n",tid1);
s = pthread_join(tid1,NULL);
if(s)
handle_error(s, "pthread_join");
printf("After join tid1\n");
pthread_t tid3;
s = pthread_create(&tid3,NULL,thread_func2,NULL);
if(s)
handle_error(s,"pthread_create3");
s = pthread_join(tid3,NULL);
if(s)
handle_error(s, "pthread_join3");
printf("After join tid3\n");
return 0;
}
The output I'm getting is:
tid1: 140269627565824
thread_func1 exit
After join tid1
my_handle1: Got signal 10, tid: 140269627565824
thread_func2 exit
After join tid3
So, even though I'm calling pthread_kill() on a thread that has already finished, the handler for that thread is still getting called. Isn't pthread_kill() supposed to return an error (ESRCH) if the thread doesn't exist?
Any use (*) of the pthread_t for a thread after its lifetime (i.e. after pthread_join successfully returns, or after the thread terminates in the detached state) results in undefined behavior. You should only expect ESRCH if the pthread_t is still valid, i.e. if you haven't joined the thread yet. Otherwise all bets are off.
Note: By "use" (*), I mean passing it to a pthread_ function in the standard library. As far as I can tell, merely assigning it to another pthread_t variable or otherwise passing it around between your own functions without "using" it doesn't result in UB.
According to this SO thread, passing a signal to an already dead thread (only if the thread was joined or has exited) results in undefined behavior!
EDIT: Found a thread which clearly quotes the latest POSIX spec which indicates the behavior to be undefined. Thanks R.. for the correct pointers!
The question asked here (How to determine if a pthread is still alive) has been marked as a duplicate of this question.
But I believe this post just clarifies the behavior of pthread_kill and confirms that it does not guarantee correct behavior when called with an ID that is no longer valid. Hence pthread_kill cannot be used to find out whether a thread is alive: if the thread was joined earlier, the ID is no longer valid or may have been reused, and the same holds if it was detached, since its resources may already have been reclaimed once it terminated.
So to determine if thread is alive (question is specifically asked for joinable threads), I could think of only one solution as below:
Use some global data/memory that both threads can access, and store there the return/exit status of the thread whose status needs to be determined. Other threads can check this data/location to get its status. (Obviously this assumes that the thread exited normally, i.e. it was either joined or detached.)
For example:
Have a global bool named "bTerminated" initialized to "FALSE", and in
the handler function of this thread either set it to "TRUE" before
returning or set it once control has returned to the caller (i.e. where you
have called `pthread_join` for this thread). Check this variable in any other
threads where you want to know if this thread is alive. It should be
straightforward to implement such logic in a way that fits your original code.
I've run into an issue with the Linux futex syscall (FUTEX_WAIT operation) sometimes returning early seemingly without cause. The documentation specifies certain conditions that may cause it to return early (without a FUTEX_WAKE) but these all involve non-zero return values: EAGAIN if the value at the futex address does not match, ETIMEDOUT for timed waits that timeout, EINTR when interrupted by a (non-restarting) signal, etc. But I'm seeing a return value of 0. What, other than FUTEX_WAKE or the termination of a thread whose set_tid_address pointer points to the futex, could cause FUTEX_WAIT to return with a return value of 0?
In case it's useful, the particular futex I was waiting on is the thread tid address (set by the clone syscall with CLONE_CHILD_CLEARTID), and the thread had not terminated. My (apparently incorrect) assumption that the FUTEX_WAIT operation returning 0 could only happen when the thread terminated led to serious errors in program logic, which I've since fixed by looping and retrying even if it returns 0, but now I'm curious as to why it happened.
Here is a minimal test case:
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/futex.h>
#include <signal.h>
static char stack[32768];
static int tid;
static int foo(void *p)
{
syscall(SYS_getpid);
syscall(SYS_getpid);
syscall(SYS_exit, 0);
}
int main()
{
int pid = getpid();
for (;;) {
int x = clone(foo, stack+sizeof stack,
CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND
|CLONE_THREAD|CLONE_SYSVSEM //|CLONE_SETTLS
|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
|CLONE_DETACHED,
0, &tid, 0, &tid);
syscall(SYS_futex, &tid, FUTEX_WAIT, x, 0);
/* Should fail... */
syscall(SYS_tgkill, pid, tid, SIGKILL);
}
}
Let it run for a while, and it should eventually terminate with Killed (SIGKILL), which is only possible if the thread still exists when the FUTEX_WAIT returns.
Before anyone goes assuming this is just the kernel waking the futex before it finishes destroying the thread (which might in fact be happening in my minimal test case here), please note that in my original code, I actually observed userspace code running in the thread well after FUTEX_WAIT returned.
Could you be dealing with a race condition between whether the parent or child operations complete first? You can probably investigate this theory by putting small sleeps at the beginning of your foo() or immediately after the clone() to determine if a forced sequencing of events masks the issue. I don't recommend fixing anything in that manner, but it could be helpful to investigate. Maybe the futex isn't ready to be waited upon until the child gets further through its initialization, but the parent's clone has enough to return to the caller?
Specifically, the very existence of the CLONE_VFORK option (which your test case does not set) seems to imply this is a dangerous scenario. You may need a bi-directional signaling mechanism such that the child signals the parent that it has gotten far enough that it is safe to wait for the child.