unknown failure when pthread_cancel right after pthread_create and pthread_detach - c

I've tested the thread cancellation process, and have a code like this.
It works on my ARM machine, and sometimes works fine, sometimes leads to a segfault, sometimes stuck after created.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
void *t(void *ptr)
{
printf("in t\n");
sleep(0.3);
return NULL;
}
int main() {
pthread_t p;
pthread_create(&p, NULL, t, NULL);
printf("created\n");
pthread_detach(p);
pthread_cancel(p);
printf("canceled\n");
return 0;
}
have no idea which part is leading to the issue(stuck/segfault).

I answered this same question 18 years ago. It is not safe to call pthread_cancel on a detached thread.
Your code has a race condition so its behavior is undefined. If the thread manages to terminate before you call pthread_cancel, you are passing an invalid parameter to pthread_cancel and that is undefined behavior.
If the code in main is managing the lifetime of the thread, do not detach it because otherwise there is no way to ensure the thread ID remains valid. If the code in main is not managing the lifetime of the thread, do not call pthread_cancel in it. You will never find a safe way to split the difference.
You should think of pthread_detach as rendering the thread ID invalid (or "semantically closed" as Glenn Burkhardt put it) and not use it again.
As Some programmer dude points out, your sleep rounds to zero which makes the race condition more likely to encounter.

Related

Condition variables in c

I just started learning about threads, mutexes and condition variables and I have this code:
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t mutex;
static pthread_mutexattr_t attr;
volatile int x = 0;
void *thread_one_function (void *dummy) {
printf("In func one.");
while (true) {
pthread_mutex_lock (&mutex);
x = rand ();
pthread_cond_signal (&cond);
pthread_mutex_unlock (&mutex);
}
}
void *thread_two_function (void *dummy) {
printf("In func two.");
pthread_mutex_lock (&mutex);
while (x != 10) {
pthread_cond_wait (&cond, &mutex);
}
printf ("%d\n", x);
pthread_mutex_unlock (&mutex);
printf ("%d\n", x);
}
int main (void){
pthread_mutexattr_init (&attr);
pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init (&mutex, &attr);
pthread_t one, two;
pthread_create(&one,NULL,thread_one_function,NULL);
pthread_create(&two,NULL,thread_two_function,NULL);
//pthread_join(one,NULL); //with this program will not end.
//pthread_join(two,NULL);
return 0;
}
I compile it as gcc prog.c -lpthread -o a.exe
And I am getting no output. Not even that my threads get into those two functions...
What am I doing wrong ? My code is created as a combination from multiple documentations.
Thanks for any help.
The most likely reason for you getting no output is that the main thread returns from the initial call to main() immediately after starting the threads. That terminates the program, including the newly-created threads.
It does not help that neither do the threads' initial messages end with newlines nor do the threads flush the standard output after the printf calls. The standard output is line-buffered when connected to a terminal, so actual delivery of those messages to the terminal will be deferred in your example.
The main thread should join the other two before it terminates. Your code comments indicate that you had other issues when you did that, but going without joining those threads is not a correct solution if in fact you want the additional threads to run to completion. The problem is not joining the threads, but rather that the threads are not terminating.
There's a pretty clear reason why the thread_one_function thread does not terminate, which I am sure you will recognize if you look for it. But what about the thread_two_function thread? The reasons for that taking a long time to terminate are at least twofold:
the more obvious one is the one hinted in #dbush's comment on the question. Consider the range of the rand() function: how many calls do you think it might take before it returns the one, specific result that thread_two_function is looking for?
moreover, how often is thread_two_function even going to get a chance to check? Consider: thread_one_function runs a tight loop in which it acquires the mutex at the top and releases it at the bottom. Pthreads mutexes do not implement any kind of fairness policy, so when thread_one_function loops around and tries to reacquire the mutex immediately after releasing it, it has a very high probability of succeeding immediately, even though thread_two_function may be trying to acquire the mutex too.
It would make more sense for your threads to take turns. That is, after generating a new number, thread_one_function would wait for thread_two_function to check it before generating another. There are several ways you could implement that. Some involve using the same condition variable in thread_one_function that you do in thread_two_function. (Details left as an exercise.) I would suggest also providing a means by which thread_two_function can tell thread_one_function that it doesn't need to generate any more numbers, and should instead terminate.
Finally, do be aware that volatile has no particular place here. It is not useful or appropriate for synchronization, whereas the mutex alone is entirely sufficient for that. It's not exactly wrong to declare x volatile, but it's extraneous.

Calling pthread_detach for an already exited thread?

#include <pthread.h>
#include <unistd.h>
static void *tfunc(void *data)
{
return NULL;
}
int main(int argc, char **argv)
{
pthread_t t;
pthread_create(&t, NULL, tfunc, NULL);
sleep(1);
pthread_detach(t);
return 0;
}
See the MWE.
It works fine, but I am unsure if this is actually defined behavior. The man page of pthread_detach says nothing about calling it on exited threads.
Yes I know about creating threads with the detached attribute, but I am specifically curious about this situation. pthread_join has a mention on this case and I assume pthread_detach works just as fine, but I haven't found any official statement.
This code is perfectly legal, and does not invoke undefined behavior:
#include <pthread.h>
#include <unistd.h>
static void *tfunc(void *data)
{
return NULL;
}
int main(int argc, char **argv)
{
pthread_t t;
pthread_create(&t, NULL, tfunc, NULL);
sleep(1);
pthread_detach(t);
return 0;
}
It's not really clearly stated, but the POSIX documentation for pthread_detach() is worded in such a way that it must be defined and correct to call pthread_detach() on a terminated thread:
The pthread_detach() function shall indicate to the implementation
that storage for the thread thread can be reclaimed when that thread
terminates. If thread has not terminated, pthread_detach() shall not
cause it to terminate.
The behavior is undefined if the value specified by the thread
argument to pthread_detach() does not refer to a joinable thread.
First, note the statement "If thread has not terminated". That implies that it must be safe to call pthread_detach() when the thread has terminated.
Second, note "The behavior is undefined if ... does not refer to a joinable thread." In your posted code, the thread you created is clearly joinable - you didn't create it with a detached attribute, so you could call pthread_join() to retrieve its returned value. So it's not undefined behavior.
Remember, there's no guaranteed way to ensure from thread A that thread B is still running when either pthread_join() or pthread_detach() is called. So either call has to be safe to call (once!) from any thread on any other thread.
Also, from the Rationale section of the POSIX documentation:
RATIONALE
The pthread_join() or pthread_detach() functions should eventually
be called for every thread that is created so that storage associated
with the thread may be reclaimed.
It has been suggested that a "detach" function is not necessary; the
detachstate thread creation attribute is sufficient, since a thread
need never be dynamically detached. However, need arises in at least
two cases:
In a cancellation handler for a pthread_join() it is nearly essential to have a pthread_detach() function in order to detach the
thread on which pthread_join() was waiting. Without it, it would be
necessary to have the handler do another pthread_join() to attempt
to detach the thread, which would both delay the cancellation
processing for an unbounded period and introduce a new call to
pthread_join(), which might itself need a cancellation handler. A
dynamic detach is nearly essential in this case.
In order to detach the "initial thread" (as may be desirable in processes that set up server threads).
Again, while not clearly stated, note the implied equivalence between pthread_join() and pthread_detach().

What if a being-waited thread detaches itself?

#include <pthread.h>
void thread_routine(void*)
{
sleep(5);
pthread_detach(pthread_self());
sleep(5);
}
int main()
{
pthread_t t;
pthread_create(&t, 0, thread_routine, 0);
pthread_join(t);
}
Will pthread_join(t); return immediately after pthread_detach(pthread_self()); succeed?
The behavior is undefined, and thus obviously to be avoided at all costs.
(As far as I can tell the behavior is implicitly undefined. There are several kindred instances of explicitly undefined behavior in the spec, but this exact scenario is not mentioned.)
For the curious, on an NPTL Linux system near me, both the pthread_detach() and the pthread_join() return 0, and, moreover, the latter blocks and successfully gets the value returned by the thread. On an OS X system near me, by contrast, the pthread_detach() succeeds, and the pthread_join() immediately fails with ESRCH.
Your code is buggy. By the time you call pthread_join, the thread may already have terminated. Since it's detached, the pthread_t is not longer valid. So your code may pass an invalid pthread_t to pthread_join, which can cause unpredictable behavior.
To avoid these kinds of problems one specific thing should control the lifetime of a thread. That can be the thread itself, if it's detached, in which case no thread should try to join it. It can also be the thread that joins it, in which case the thread should not be detached.

thread cancel issue

I am facing one issue related to pthread_cancel. Please see the code below:
void* func(void *arg)
{
while(1)
{
sleep(2);
}
}
#include<stdlib.h>
#include <stdio.h>
#include <pthread.h>
int main()
{
void *status;
pthread_t thr_Var;
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,NULL);
pthread_create(&thr_Var,NULL,func,NULL);
pthread_cancel(thr_Var);
pthread_join(thr_Var,&status);
return 0;
}
My doubt is even if i disable the cancel state, still pthread_cancel is working and thread is getting terminated.
Any help will be appreciated
pthread_setcancelstate sets the cancelability type of the calling thread, i.e. the main thread in your case. So if you want to make the newly created thread non-cancelable you should call that function from within the context of that thread.
See man 3 pthread_setcancelstate
Note that while Linux pthreads implementation permits a NULL oldstate pointer, POSIX, however, does not specify that, so it's best to provide a pointer for oldsate.

Linux futex syscall spurious wakes with return value 0?

I've run into an issue with the Linux futex syscall (FUTEX_WAIT operation) sometimes returning early seemingly without cause. The documentation specifies certain conditions that may cause it to return early (without a FUTEX_WAKE) but these all involve non-zero return values: EAGAIN if the value at the futex address does not match, ETIMEDOUT for timed waits that timeout, EINTR when interrupted by a (non-restarting) signal, etc. But I'm seeing a return value of 0. What, other than FUTEX_WAKE or the termination of a thread whose set_tid_address pointer points to the futex, could cause FUTEX_WAIT to return with a return value of 0?
In case it's useful, the particular futex I was waiting on is the thread tid address (set by the clone syscall with CLONE_CHILD_CLEARTID), and the thread had not terminated. My (apparently incorrect) assumption that the FUTEX_WAIT operation returning 0 could only happen when the thread terminated lead to serious errors in program logic, which I've since fixed by looping and retrying even if it returns 0, but now I'm curious as to why it happened.
Here is a minimal test case:
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/futex.h>
#include <signal.h>
static char stack[32768];
static int tid;
static int foo(void *p)
{
syscall(SYS_getpid);
syscall(SYS_getpid);
syscall(SYS_exit, 0);
}
int main()
{
int pid = getpid();
for (;;) {
int x = clone(foo, stack+sizeof stack,
CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND
|CLONE_THREAD|CLONE_SYSVSEM //|CLONE_SETTLS
|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
|CLONE_DETACHED,
0, &tid, 0, &tid);
syscall(SYS_futex, &tid, FUTEX_WAIT, x, 0);
/* Should fail... */
syscall(SYS_tgkill, pid, tid, SIGKILL);
}
}
Let it run for a while, at it should eventually terminate with Killed (SIGKILL), which is only possible if the thread still exists when the FUTEX_WAIT returns.
Before anyone goes assuming this is just the kernel waking the futex before it finishes destroying the thread (which might in fact be happening in my minimal test case here), please note that in my original code, I actually observed userspace code running in the thread well after FUTEX_WAIT returned.
Could you be dealing with a race condition between whether the parent or child operations complete first? You can probably investigate this theory by putting small sleeps at the beginning of your foo() or immediately after the clone() to determine if a forced sequencing of events masks the issue. I don't recommend fixing anything in that manner, but it could be helpful to investigate. Maybe the futex isn't ready to be waited upon until the child gets further through its initialization, but the parent's clone has enough to return to the caller?
Specifically, the CLONE_VFORK option's presence seems to imply this is a dangerous scenario. You may need a bi-directional signaling mechanism such that the child signals the parent that it has gotten far enough that it is safe to wait for the child.

Resources