How do I get the error code from pthread_join()? - c

The following code fails to join pthreads and the message "join failed" is printed. How do I get more information about the failure and its cause?
pthread_t aThread[MAX_LENGTH];
int errCode[MAX_LENGTH];
char returnVal;
for(int i = 0; i < MAX_LENGTH; i++)
{
if((errCode[i] = pthread_create(&aThread[i], NULL, &findMatch, &fpArgs)) != 0)
printf("error creating thread %d\n", errCode[i]);
if(!pthread_join(aThread[i], (void**)&returnVal))
printf("join failed\n i is %d", i);
}
EDIT: actually join returned no error and I made a mistake. The if statement shouldn't have the ! because join returns a non-zero number if there is a problem, which evaluates to true.

I pointed this out in comment, but it deserves amplification.
Your returnVal usage is wrong
The pthread_join API expects a void**, that is, a pointer to a void*. Unlike void*, a void** is not a universal pointer type. It is a pointer of a specific type, and you should only pass an address of that same type. However, you're not using the result anyway, so I would suggest for now you simply pass NULL. As written, it is undefined behavior: sizeof(char), the writable size at the address you are giving it, and sizeof(void*), the size pthread_join expects to have available, are almost certainly not the same. Consider this instead for now:
pthread_join(aThread[i], NULL);
In case you're wondering what the use for that void** parameter is, it is a place to store void* return value from your thread-proc. Recall a pthread thread-proc looks like this:
void* thread_proc(void* args)
// ^----- this is what is stashed in the pthread_join second parameter
Your logic for failure testing is backwards
The pthread_join function returns 0 on success; not on failure.
You're not actually running concurrent threads
Thread concurrency simply means your threads run simultaneously. But yours do not. You start a thread, then wait for it to end, then start a thread, then wait for it to end, etc. This is literally no better (and in fact, actually worse) than simply calling a function. If you want your threads to run concurrently your logic should be styled like this:
pthread_t aThread[MAX_LENGTH];
int errCode[MAX_LENGTH] = {0};
for (int i = 0; i < MAX_LENGTH; i++)
{
if((errCode[i] = pthread_create(&aThread[i], NULL, &findMatch, &fpArgs)) != 0)
printf("error creating thread %d, error=%d\n", i, errCode[i]);
}
for (int i = 0; i < MAX_LENGTH; i++)
{
// note the check for errCode[i], which is only non-zero
// if the i'th thread failed to start
if(errCode[i] == 0)
{
errCode[i] = pthread_join(aThread[i], NULL);
if (errCode[i] != 0)
printf("error joining thread %d, error=%d\n", i, errCode[i]);
}
}

When a pthread function fails, it returns a non-zero value, and that value is itself the error number. The pthread functions do not set errno. There are a couple of ways to get the textual explanation of the failure code.
int rc;
if((rc = pthread_join(aThread[i], NULL)) != 0)
{
    printf("error joining thread: %s\n", strerror(rc)); //1st option
    errno = rc; // perror and %m read errno, so copy the code there first
    perror("error joining thread"); //2nd option
    errno = rc; // the previous call may have changed errno
    printf("error joining thread: %m\n"); //3rd option
}
(1) strerror returns the error string for the error value you pass it and is convenient for placing in printf statements. Pass it the return value of the pthread call.
(2) perror prints the string you pass it, then a colon, then the description of whatever value errno is set to. Because the pthread functions do not set errno, assign the return value to errno first.
(3) There is a glibc extension to printf that provides a %m conversion specifier, which acts like strerror(errno) with a little less muss and fuss. This is the least portable option, and it also needs errno to be set first.
Once you get the description you can easily look into the man pages of the call that failed and they will provide greater hints as to why the call failed. Charlie Burns has posted the reasons pthread_join might fail.

Am I missing something? The return value tells you the error:
RETURN VALUES
If successful, the pthread_join() function will return zero. Otherwise,
an error number will be returned to indicate the error.
ERRORS
pthread_join() will fail if:
[EDEADLK] A deadlock was detected or the value of thread speci-
fies the calling thread.
[EINVAL] The implementation has detected that the value speci-
fied by thread does not refer to a joinable thread.
[ESRCH] No thread could be found corresponding to that speci-
fied by the given thread ID, thread.

More specifically:
int retVal = pthread_join(myThread, NULL);
if (retVal != 0)
printf("error joining thread: %d\n", retVal);

The pthread library does not set the errno variable upon error. The error code is returned by the function instead.
The online manual under Linux is quite clear for the pthread functions (e.g. man pthread_join) as the "RETURN VALUE" section generally contains something like:
RETURN VALUE
On success, pthread_join() returns 0; on error, it returns an error number.
If you need to output the error through functions like strerror(), strerror_r() or %m printf format (the latter is a GLIBC extension), you must use the return code of the failing service or update errno in the error branch:
if ((rc = pthread_join(...)) != 0) {
    errno = rc;
    fprintf(stderr, "pthread_join(): %m\n");
    /* OR */
    fprintf(stderr, "pthread_join(): %s\n", strerror(rc)); // errno is not needed here
    /* OR */
    char err_buf[128];
    if (strerror_r(rc, err_buf, sizeof(err_buf)) == 0) // XSI variant; the GNU variant returns char *
        fprintf(stderr, "pthread_join(): %s\n", err_buf);
}
Notes:
errno is thread-safe (it is located in the thread local storage). So, it is local to each thread
strerror_r() and %m should be used in multi-threaded environment as they are thread-safe (strerror() is not)
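As a minimal sketch of the point above, the helpers below turn a pthread return code into a message without going through errno. The names format_pthread_error and join_and_report are hypothetical, not part of any library, and strerror() is used only for brevity; per the notes above, strerror_r() is the safer choice when several threads may report errors at once.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "what: reason" into buf.
 * rc is the value returned by the pthread call, not errno. */
int format_pthread_error(char *buf, size_t len, const char *what, int rc)
{
    /* strerror() is used for brevity; it is not guaranteed to be
     * thread-safe, so prefer strerror_r() in multi-threaded reporting. */
    return snprintf(buf, len, "%s: %s", what, strerror(rc));
}

/* Hypothetical helper: join a thread and report any failure on stderr.
 * Returns the pthread_join() result unchanged so the caller can act on it. */
int join_and_report(pthread_t thread, void **result)
{
    int rc = pthread_join(thread, result);
    if (rc != 0) {
        char msg[128];
        format_pthread_error(msg, sizeof msg, "pthread_join", rc);
        fprintf(stderr, "%s\n", msg);
    }
    return rc;
}
```

A caller would replace a bare pthread_join(t, NULL) with join_and_report(t, NULL) and still test the return value against 0.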

Related

How to collect thread exit status (using join) when cancelled

I am trying to cancel a thread, either from the caller or from the callee, but both crash the program.
If I only join, I get the correct exit status.
How do I collect the exit status properly after pthread_cancel?
The man page says:
After a canceled thread has terminated, a join with that thread using
pthread_join(3) obtains PTHREAD_CANCELED as the thread's exit status.
(Joining with a thread is the only way to know that cancellation has
completed.)
#include <stdio.h>
#include <pthread.h>
void *thread_func(void *arg);
int errNum = 3;
int main()
{
pthread_t t_id;
void *status;
// on success pthread_create return zero
if(pthread_create(&t_id,NULL,thread_func,NULL) != 0){
printf("thread creation failed\n");
return 0;
}
printf("thread created with id %u successfully\n",t_id);
// status will be collecting the pthread_exit value
// error numberis returned incase of error
// pthread_cancel(t_id);
if(pthread_join(t_id,&status) != 0){
printf("join failed\n");
}
printf("thread %u exited with code %d\n", t_id, *(int *)status);
return 0;
}
void *thread_func(void *arg)
{
printf("Inside thread_func :%u\n",pthread_self());
//the arguments of pthread_exit should not be from local space, as it will be collected in caller using join
//pthread_exit(&errNum);
// if we return it may cause seg fault as we are trying to print the value from ptr(status)
//return ;
pthread_cancel(pthread_self());
}
If a thread is cancelled (before it has terminated normally), then when you join it, you will receive PTHREAD_CANCELED as the thread's return value / exit status. That macro expands to the actual void * value that is returned, so you can compare the value you receive directly to that to judge whether the thread was cancelled. It generally is not a valid pointer, so you must not try to dereference it.
Example:
void *status;
// ...
if (pthread_join(t_id, &status) != 0) {
// pthread_join failed
} else if (status == PTHREAD_CANCELED) {
// successfully joined a thread that was cancelled
// 'status' MUST NOT be dereferenced
} else {
// successfully joined a thread that terminated normally
// whether 'status' may be dereferenced or how else it may be
// used depends on the thread
}
It is worth noting that the wording of the Linux manual page is a bit fast and loose. Threads do not have an "exit status" in the sense that processes do, and the actual POSIX specifications do not use the term in the context of threads. For example, the POSIX specifications for pthread_join() say:
On return from a successful pthread_join() call with a non-NULL value_ptr argument, the value passed to pthread_exit() by the terminating thread shall be made available in the location referenced by value_ptr.
That's a bit of a mouthful compared to the Linux wording, but it is chosen to be very precise.
Note also that the choice of type void * here is intentional and useful. It is not merely an obtuse way to package an int. Through such a pointer, a thread can provide access to an object of any type, as may be useful for communicating information about the outcome of its computations. On the other hand, it is fairly common for threads to eschew that possibility and just return NULL. But if a thread did want to provide an integer code that way, then it would most likely provide an int value cast to type void *, rather than a pointer to an object of type int containing the chosen value. In that case, one would obtain the value by casting back to int, not by dereferencing the pointer.
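The cast-based approach described above might look like the following sketch. The function names and the value 42 are made up for illustration, and routing the cast through intptr_t is one common way to keep the integer/pointer conversion well-behaved; it is not the only option.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: packs the small integer 42 into the void * result. */
void *worker_returning_code(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* Hypothetical helper: run the worker, join it, and store its code in *code.
 * Returns 0 on success, the error number from pthread_create() or
 * pthread_join() on failure, or -1 if the thread was cancelled. */
int run_and_get_code(int *code)
{
    pthread_t t;
    void *status;

    int rc = pthread_create(&t, NULL, worker_returning_code, NULL);
    if (rc != 0)
        return rc;

    rc = pthread_join(t, &status);
    if (rc != 0)
        return rc;

    if (status == PTHREAD_CANCELED)
        return -1;                   /* not a real value: do not decode it */

    *code = (int)(intptr_t)status;   /* cast back, never dereference */
    return 0;
}
```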

Return from exit() with fork() is weirdly bitshifted

I have a code in C that forks itself sometimes, each fork does something then returns an error code. Currently, each child process returns its ID (0..n).
void other(int numero){
...
exit(numero);
}
int main(...){
for(int i = 0; i < 6; i++){
if(fork() == 0){
other(i);
}
}
int order[6];
for(int i = 0; i < 6; i++){
int code;
waitpid(0, &code, 0);
order[i] = code; // <-- HERE
}
}
Weirdly, this returns a multiple of the real value. By replacing the line I marked with :
order[i] = code >> 8;
I managed to get the expected result of 0..5. However, I really don't understand why this happens. I expect it's because of some kind of type problem, but I don't see it, I'm always using ints.
The correct replacement for order[i] = code; is order[i] = WEXITSTATUS(code); Also, note that waitpid can return even if the process didn't exit; you should use WIFEXITED to make sure it did.
From man 2 waitpid:
If wstatus is not NULL, wait() and waitpid() store status information
in the int to which it points. This integer can be inspected with
the following macros (which take the integer itself as an argument,
not a pointer to it, as is done in wait() and waitpid()!):
WEXITSTATUS(wstatus)
returns the exit status of the child. This consists of the
least significant 8 bits of the status argument that the child
specified in a call to exit(3) or _exit(2) or as the argument
for a return statement in main(). This macro should be
employed only if WIFEXITED returned true.
You're supposed to use the various macros listed there, such as WEXITSTATUS in your case, to make sense of wstatus. Other than using them, it's only safe to treat wstatus as an opaque blob (aside from a special case when it's 0).
You're supposed to be using the W* macros from sys/wait.h to interpret exit statuses.
See the waitpid manpage.
As far as the raw value is concerned, you can only count on the fact that status==0 means WIFEXITED(status) && WEXITSTATUS(status)==0 (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html which describes this special guarantee).
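A minimal sketch of the macro-based decoding recommended in these answers follows. The helper name child_exit_status is hypothetical, and the error handling is deliberately thin (for instance, it does not retry waitpid() after EINTR).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it,
 * and return the decoded exit status. Returns -1 if fork() or waitpid()
 * fails, or if the child did not terminate through exit()/_exit(). */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: terminate immediately */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    if (!WIFEXITED(wstatus))
        return -1;                   /* e.g. killed by a signal */

    return WEXITSTATUS(wstatus);     /* low 8 bits of the child's code */
}
```

Waiting on the specific pid rather than 0 also makes it unambiguous which child the status belongs to.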

pthread_kill() with invalid thread

I'd like to determine if a particular thread 'exists'.
pthread_kill() appears to be suited to this task, at least according to its man page.
If sig is 0, then no signal is sent, but error checking is still performed.
Or, as my system's man page puts it:
If sig is 0, then no signal is sent, but error checking is still performed; this can be used to check for the existence of a thread ID.
However when I attempt to pass in an uninitialized pthread_t, the application invariably SEGFAULTs.
Digging into this, the following snippet from pthread_kill.c (from my toolchain) appears to do no error checking, and simply attempts to de-reference threadid (the de-reference is at pd->tid).
int
__pthread_kill (threadid, signo)
pthread_t threadid;
int signo;
{
struct pthread *pd = (struct pthread *) threadid;
/* Make sure the descriptor is valid. */
if (DEBUGGING_P && INVALID_TD_P (pd))
/* Not a valid thread handle. */
return ESRCH;
/* Force load of pd->tid into local variable or register. Otherwise
if a thread exits between ESRCH test and tgkill, we might return
EINVAL, because pd->tid would be cleared by the kernel. */
pid_t tid = atomic_forced_read (pd->tid);
if (__builtin_expect (tid <= 0, 0))
/* Not a valid thread handle. */
return ESRCH;
We can't even rely on zero being a good initializer, because of the following:
# define DEBUGGING_P 0
/* Simplified test. This will not catch all invalid descriptors but
is better than nothing. And if the test triggers the thread
descriptor is guaranteed to be invalid. */
# define INVALID_TD_P(pd) __builtin_expect ((pd)->tid <= 0, 0)
Additionally, I noticed the following in the linked man page (but not on my system's):
POSIX.1-2008 recommends that if an implementation detects the use of a thread ID after the end of its lifetime, pthread_kill() should return the error ESRCH. The glibc implementation returns this error in the cases where an invalid thread ID can be detected. But note also that POSIX says that an attempt to use a thread ID whose lifetime has ended produces undefined behavior, and an attempt to use an invalid thread ID in a call to pthread_kill() can, for example, cause a segmentation fault.
As outlined here by R.., I'm asking for the dreaded undefined behavior.
It would appear that the manual is indeed misleading - particularly so on my system.
Is there a good / reliable way to find out if a thread exists? (presumably by not using pthread_kill())
Is there a good value that can be used to initialize pthread_t type variables, even if we have to catch them ourselves?
I'm suspecting that the answer is to employ pthread_cleanup_push() and keep an is_running flag of my own, but would like to hear thoughts from others.
I think I've come to a realisation while driving home, and I suspect that many others may find this useful too...
It would appear that I've been treating the worker (the thread), and the task (what the thread is doing) as one and the same, when in fact, they are not.
As I've already established from the code snippets in the question, it is unreasonable to ask "does this thread exist" as the pthread_t is likely just a pointer (it certainly is on my target). It's almost certainly the wrong question.
The same goes for process IDs, file handles, malloc()'d memory, etc... they don't use unique and never repeating identifiers, and thus are not unique 'entities' that can be tested for their existence.
The suspicions that I raised in the question are likely true - I'm going to have to use something like an is_running flag for the task (not thread).
One approach that I've thought about is to use a semaphore initialized to one, sem_trywait(), sem_post() and pthread_cleanup_push(), as in the example below (cleanup missing for brevity).
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <semaphore.h>
#include <pthread.h>
#include <unistd.h> /* sleep(), usleep() */
struct my_task {
sem_t can_start;
pthread_t tid;
/* task-related stuff */
};
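/* pthread_cleanup_push() expects a void (*)(void *) handler, so wrap sem_post(). */
static void release_can_start(void *arg) { sem_post(arg); }
void *my_task_worker(void *arg) {
struct my_task *task = arg;
pthread_cleanup_push(release_can_start, &(task->can_start));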
fprintf(stderr, "--- task starting!\n");
usleep(2500000);
fprintf(stderr, "--- task ending!\n");
pthread_cleanup_pop(1);
return NULL;
}
void my_task_start(struct my_task *task) {
int ret;
ret = sem_trywait(&(task->can_start));
if (ret != 0) {
if (errno != EAGAIN) {
perror("sem_trywait()");
exit(1);
}
fprintf(stderr, ">>> task already running...\n");
return;
}
ret = pthread_create(&(task->tid), NULL, my_task_worker, task);
if (ret != 0) {
fprintf(stderr, "pthread_create(): %s\n", strerror(ret)); /* pthread_create() returns the error; errno is not set */
exit(1);
}
fprintf(stderr, ">>> started task!\n");
return;
}
int main(int argc, char *argv[]) {
int ret;
struct my_task task;
int i;
memset(&task, 0, sizeof(task));
ret = sem_init(&(task.can_start), 0, 1);
if (ret != 0)
{
perror("sem_init()");
return 1;
}
for (i = 0; i < 10; i++) {
my_task_start(&task);
sleep(1);
}
return 0;
}
Output:
>>> started task!
--- task starting!
>>> task already running...
>>> task already running...
--- task ending!
>>> started task!
--- task starting!
>>> task already running...
>>> task already running...
--- task ending!
>>> started task!
--- task starting!
>>> task already running...
>>> task already running...
--- task ending!
>>> started task!
--- task starting!
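For the narrower question of what to initialize a pthread_t with, one possible sketch is to never inspect the pthread_t on its own and instead pair it with a flag that says whether it currently holds a joinable thread. Everything named here (thread_slot, slot_start, slot_join, slot_noop) is hypothetical, and the sketch assumes a single owner thread starts and joins the slot; it adds no locking of its own.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wrapper: tid is only meaningful while joinable is true. */
struct thread_slot {
    pthread_t tid;
    bool joinable;
};

/* Trivial example worker: hands its argument back as the result. */
void *slot_noop(void *arg)
{
    return arg;
}

/* Start a thread in the slot; EBUSY if a previous one was not joined yet. */
int slot_start(struct thread_slot *slot, void *(*fn)(void *), void *arg)
{
    if (slot->joinable)
        return EBUSY;
    int rc = pthread_create(&slot->tid, NULL, fn, arg);
    if (rc == 0)
        slot->joinable = true;
    return rc;
}

/* Join the slot's thread; ESRCH if the slot holds no joinable thread.
 * The flag is checked first, so an invalid pthread_t is never used. */
int slot_join(struct thread_slot *slot, void **result)
{
    if (!slot->joinable)
        return ESRCH;
    int rc = pthread_join(slot->tid, result);
    if (rc == 0)
        slot->joinable = false;
    return rc;
}
```

This tracks whether the handle is valid, not whether the task is still running; the semaphore above answers the second question.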

pthread_join success = thread entirely executed?

I have a question about pthreads with this little C source:
int calc = 0;
void *func(void *data){
calc = 2 * 2;
return NULL;
}
int main(){
pthread_t t;
if(0==pthread_create(&t,NULL,func,NULL)){
if(0==pthread_join(t,NULL)){
printf("Result: %d\n",calc); // 4 ?
}
}
}
If pthread_join returns success, has "func" always executed entirely? (Is calc always equal to 4 at the printf?)
The function pthread_join returns zero on success.
The documentation says that pthread_join blocks until the thread ends, so, with some applied logic one can easily conclude that the thread has ended.
On the other hand, pthread_join can fail in different ways:
When the thread is not joinable: EINVAL
When a deadlock has been detected (for example, a thread joining itself): EDEADLK
There is another possible error (recommended by the open group, but depending on the implementation): ESRCH, when it detects that the thread handle is being used past the end of the thread.
If you want to know more you may want to take a look at the documentation.
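The reasoning above can be checked with a small sketch. POSIX lists pthread_join() among the functions that synchronize memory, so a write made by the thread before it terminates is visible to the joiner once the join succeeds. The helper name compute_after_join is made up for illustration.

```c
#include <pthread.h>

int calc = 0;

/* Thread function with the signature pthread_create() expects. */
void *func(void *data)
{
    (void)data;
    calc = 2 * 2;
    return NULL;
}

/* Hypothetical helper: returns calc after a successful join,
 * or -1 if the thread could not be created or joined. */
int compute_after_join(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, func, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;

    /* The join succeeded, so func() has run to completion. */
    return calc;
}
```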

Exiting a thread in c

I am currently working on some C code and am very new to C, so apologies if this is a bit basic or a stupid question.
I have the following code which is executed within a thread using pthread_create().
if (ps.status == completed)
{
LOG(LOG_AUDIO, "evsafewait_sm_play_tone:\tPlay tone complete");
if (e2)
{
LOG(LOG_MUST, "Failed to free tone event. Result: %i", e2);
}
pccb->playToneComplete = 1;
LOG(LOG_AUDIO, "Detatching thread ID %x", manageToneParms->toneManagerThread);
//pthread_detach(manageToneParms->toneManagerThread);
int retVal;
pthread_exit(&retVal);
LOG(LOG_AUDIO, "THREAD TERMINATED WITH RESULT %i", retVal);
LOG(LOG_AUDIO, "Freeing memory");
free(manageToneParms->playToneParms);
free(manageToneParms);
return 0;
}
Before the structures are freed and the function returns, I try to exit the thread using pthread_exit(), but when it is called, everything below it is skipped. No errors are displayed, as far as I can see.
I have tried debugging it with GDB, and when pthread_exit() is called the next thing it prints is siglongjmp. I have no idea what this is; I don't believe it's in the C code, at least not in the changes I have been making to it.
How can I exit this thread? I've also tried pthread_exit(NULL) and pthread_kill(threadID, SIGKILL), but that kills the whole program, not just the thread.
pthread_exit() returns from the thread at the point of the call. Any code after pthread_exit() will not be executed.
You should be sure to release any memory allocated in the thread before calling pthread_exit().
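Have you tried something like this?
int retVal = 0; /* set this to the thread's result */
LOG(LOG_AUDIO, "Freeing memory");
/* free resources -- make sure all resources allocated in thread
function are released
*/
free(manageToneParms->playToneParms);
free(manageToneParms);
LOG(LOG_AUDIO, "THREAD TERMINATED WITH RESULT %i", retVal);
/* pass the value itself: retVal is a local, so its address would dangle once the thread exits */
pthread_exit((void *)(intptr_t)retVal); /* intptr_t is declared in <stdint.h> */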
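If the thread needs to hand back more than a small integer, one option is to allocate the result on the heap, release everything else, and only then call pthread_exit(). The sketch below assumes a made-up tone_params structure and a made-up doubling computation; it is not the asker's code, only an illustration of the ordering.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical parameter block owned by the worker thread. */
struct tone_params {
    int tone;
};

/* Worker: compute a result, free its own resources, then exit.
 * Nothing placed after pthread_exit() would ever run. */
void *tone_worker(void *arg)
{
    struct tone_params *params = arg;

    int *result = malloc(sizeof *result);   /* outlives the thread */
    if (result != NULL)
        *result = params->tone * 2;

    free(params);                           /* release before exiting */
    pthread_exit(result);                   /* the joiner frees result */
}

/* Hypothetical helper: run the worker and collect its heap result.
 * Returns 0 on success and -1 on any failure. */
int run_tone_worker(int tone, int *out)
{
    struct tone_params *params = malloc(sizeof *params);
    if (params == NULL)
        return -1;
    params->tone = tone;

    pthread_t t;
    if (pthread_create(&t, NULL, tone_worker, params) != 0) {
        free(params);                       /* thread never took ownership */
        return -1;
    }

    void *status;
    if (pthread_join(t, &status) != 0)
        return -1;
    if (status == NULL || status == PTHREAD_CANCELED)
        return -1;

    *out = *(int *)status;                  /* valid: points to heap memory */
    free(status);
    return 0;
}
```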

Resources