In a C program, I am using PTHREAD_CANCEL_ASYNCHRONOUS to cancel a thread immediately, as soon as pthread_cancel is fired from the parent thread. But it is causing the whole process to crash with a segmentation fault. The job of the child thread is to get some data from a database server, and my logic is that if it doesn't get the data within 10 seconds, the thread should be killed by the parent thread.
I want to kill only the child thread, not the whole process.
struct str_thrd_data
{
    SQLHANDLE hstmt;
    int rc;
    bool thrd_completed_flag;
};

void *str_in_thread_call(void *in_str_arg)
{
    int thrd_rc;
    struct str_thrd_data *str_arg = in_str_arg;

    thrd_rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    if (thrd_rc != 0)
        handle_error_en(thrd_rc, "pthread_setcancelstate");
    thrd_rc = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    if (thrd_rc != 0)
        handle_error_en(thrd_rc, "pthread_setcancelstate");
    thrd_rc = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    if (thrd_rc != 0)
        handle_error_en(thrd_rc, "pthread_setcanceltype");

    // Run the SQL dynamic query against the database server. This takes more than 10 seconds.
    thrd_rc = SQLExecute(str_arg->hstmt);
    str_arg->rc = thrd_rc;

    printf("\n*********************Normal thread termination within time limit %d\n", str_arg->rc);
    str_arg->thrd_completed_flag = true;
    return NULL;
}
int main()
{
    pthread_attr_t tattr;
    pthread_t th;
    size_t mysize = 1;      // note: below PTHREAD_STACK_MIN, so pthread_attr_setstacksize() fails and the default stack size is kept
    struct str_thrd_data atd;
    int thrd_rc;
    int timeout = 1000;     // 1000 iterations * 10 ms = 10 seconds

    printf("\nPJH: New thread created.\n");

    atd.hstmt = hstmt;      // hstmt and rc come from the surrounding CLI code
    atd.rc = rc;
    atd.thrd_completed_flag = false;

    thrd_rc = pthread_attr_init(&tattr);
    thrd_rc = pthread_attr_setstacksize(&tattr, mysize);
    thrd_rc = pthread_create(&th, &tattr, &str_in_thread_call, &atd);
    if (thrd_rc != 0)
        handle_error_en(thrd_rc, "pthread_create");

    // Loop to count down 10 seconds.
    while (timeout != 0)
    {
        printf("%d Value of rc=%d\n", timeout, atd.rc);
        if (atd.rc != 999) break;
        timeout--;
        usleep(10000);
    }
    rc = atd.rc;

    // Check whether the thread has completed yet.
    if (atd.thrd_completed_flag == false)
    {
        // Thread not completed within time, so kill it now.
        printf("PJH ------- 10 Seconds Over\n");
        thrd_rc = pthread_cancel(th);
        printf("PJH ------- Thread Cancelled Immediately\n");
        if (thrd_rc != 0)
        {
            handle_error_en(thrd_rc, "pthread_cancel");
        }
        printf("\nPJH &&&&&&&& Thread Cancelled Manually\n");
    }
    thrd_rc = pthread_join(th, NULL);
    // some other job .....
}
gdb process_name corefile shows the backtrace below; it is almost entirely SQL library functions:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x0059fe30 in raise () from /lib/libc.so.6
#2 0x005a1741 in abort () from /lib/libc.so.6
#3 0xdef3f5d7 in ?? () from /usr/lib/libstdc++.so.5
#4 0xdef3f624 in std::terminate() () from /usr/lib/libstdc++.so.5
#5 0xdef3f44c in __gxx_personality_v0 () from /usr/lib/libstdc++.so.5
#6 0x007e1917 in ?? () from /lib/libgcc_s.so.1
#7 0x007e1c70 in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#8 0x007cda46 in _Unwind_ForcedUnwind () from /lib/libpthread.so.0
#9 0x007cb471 in __pthread_unwind () from /lib/libpthread.so.0
#10 0x007c347a in sigcancel_handler () from /lib/libpthread.so.0
#11 <signal handler called>
#12 0xffffe410 in __kernel_vsyscall ()
#13 0x0064decb in semop () from /lib/libc.so.6
#14 0xe0245901 in sqloSSemP () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#15 0xe01e7f3c in sqlccipcrecv(sqlcc_comhandle*, sqlcc_cond*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#16 0xe03fe135 in sqlccrecv () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#17 0xe02a0307 in sqljcReceive(sqljCmnMgr*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#18 0xe02d0ba3 in sqljrReceive(sqljrDrdaArCb*, db2UCinterface*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#19 0xe02c510d in sqljrDrdaArExecute(db2UCinterface*, UCstpInfo*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#20 0xe01392bc in CLI_sqlCallProcedure(CLI_STATEMENTINFO*, CLI_ERRORHEADERINFO*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#21 0xe00589c7 in SQLExecute2(CLI_STATEMENTINFO*, CLI_ERRORHEADERINFO*) () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#22 0xe0050fc9 in SQLExecute () from /opt/IBM/db2/V9.1/lib32/libdb2.so.1
#23 0x080a81f7 in apcd_in_thread_call (in_apcd_arg=0xbc8e8f34) at dcs_db2_execute.c:357
#24 0x007c4912 in start_thread () from /lib/libpthread.so.0
#25 0x0064c60e in clone () from /lib/libc.so.6
Asynchronous thread cancellation can only be safely used on threads which perform a very restricted set of operations — the official rules are long and confusing, but in effect threads subject to async cancels can only perform pure computation. They can't do I/O, they can't allocate memory, they can't take locks of any kind, and they can't call any library function that might do any of the above. There is no way it is safe to apply async cancels to a thread that talks to a database.
Deferred cancellation is less restricted, but is still extremely finicky. If your database library is not coded to cope with the possibility that the calling thread might be cancelled mid-operation — and it probably isn't — then you can't safely use deferred cancellation, either.
You will need to find some other mechanism for aborting queries which run too long.
EDIT: Since this is DB2 and the confusingly-named "CLI" API, try using SQLSetStmtAttr to set the SQL_ATTR_QUERY_TIMEOUT attribute on the prepared statement. The DB2 CLI documentation has the full list of statement attributes that can be set this way, along with more discussion of query timeouts.
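For illustration, a minimal sketch of setting a 10-second query timeout on the statement handle before calling SQLExecute (error handling omitted; hstmt is assumed to be the already-prepared statement from the question):

#include <stdint.h>
#include <sqlcli1.h>   /* DB2 CLI definitions (SQLSetStmtAttr, SQL_ATTR_QUERY_TIMEOUT) */

SQLRETURN set_query_timeout(SQLHSTMT hstmt)
{
    /* Ask the CLI driver to abort the statement if it runs longer than 10 seconds.
       For integer attributes the value is passed in place of the pointer argument. */
    return SQLSetStmtAttr(hstmt, SQL_ATTR_QUERY_TIMEOUT, (SQLPOINTER)(intptr_t)10, 0);
}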
SON OF EDIT: According to a friend who has done a lot more database work than me, it is quite likely that there is a server-side mechanism for cancelling slow queries regardless of their source. If this exists in DB2 it may be more convenient than manually setting timeouts on all your queries client-side, especially as it may be able to log slow queries so you know which ones they are and can optimize them.
Since the database client code is probably not written in such a way that it can deal with cancellation (most library code isn't), I don't think this approach will work. See Zack's answer for details.
If you need to be able to cancel database connections, you will probably have to proxy the connection and kill the proxy. Basically, what you would do is create a second thread that listens on a port and forwards the connection to the database server, and direct your database client to connect to this port on localhost instead of the real database server/port. The proxy thread could then be cancellable (with normal deferred cancellation, not asynchronous), with a cancellation cleanup handler to shutdown the sockets. Losing connection to the database server via a closed socket (rather than just a non-responsive socket) should cause the database client library code to return with an error, and you can then have its thread exit too.
Keep in mind when setting up such a proxy that you will need to make sure you don't introduce security issues with access to the database.
Here is a sketch of the code you could use for a proxy, without any error checking logic and without anything to account for unintended clients connecting:
int s, c, port;
struct addrinfo *ai;
struct sockaddr_in sa;

/* Listening socket on an ephemeral port chosen by the kernel. */
getaddrinfo(0, "0", &(struct addrinfo){ .ai_flags = AI_PASSIVE, .ai_family = AF_INET, .ai_socktype = SOCK_STREAM }, &ai);
s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
bind(s, ai->ai_addr, ai->ai_addrlen);
freeaddrinfo(ai);
getsockname(s, (void *)&sa, &(socklen_t){sizeof sa});
port = ntohs(sa.sin_port);
/* Here, do something to pass the port (assigned by the kernel) back to the caller. */
listen(s, 1);
c = accept(s, (void *)&sa, &(socklen_t){sizeof sa});
close(s);

/* Outgoing connection to the real database server. */
getaddrinfo("dbserver", "dbport", 0, &ai);
s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
connect(s, ai->ai_addr, ai->ai_addrlen);
freeaddrinfo(ai);
At this point, you have two sockets, s connected to the database server, and c connected to the database client in another thread of your program. Whatever you read from one should be written to the other; use poll to detect which one is ready for reading or writing.
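As a hedged illustration, a minimal forwarding loop might look like the sketch below (forward_loop is a made-up helper; it assumes the blocking sockets s and c from the setup above and ignores partial writes and error reporting):

#include <poll.h>
#include <unistd.h>

/* Relay bytes between the client socket c and the server socket s
   until either side closes or an error occurs. */
static void forward_loop(int c, int s)
{
    struct pollfd pfd[2] = { { .fd = c, .events = POLLIN },
                             { .fd = s, .events = POLLIN } };
    char buf[4096];

    for (;;)
    {
        if (poll(pfd, 2, -1) < 0)
            return;
        for (int i = 0; i < 2; i++)
        {
            if (pfd[i].revents & (POLLIN | POLLHUP | POLLERR))
            {
                ssize_t n = read(pfd[i].fd, buf, sizeof buf);
                if (n <= 0)
                    return;                   /* peer closed or read error: stop relaying */
                write(pfd[i ^ 1].fd, buf, n); /* forward to the other socket */
            }
        }
    }
}

Note that poll and read are cancellation points, so with cleanup handlers pushed to close both sockets, the relay loop itself is where the thread can safely be cancelled.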
During the above setup code, cancellation should be blocked except around the accept and connect calls, and at those points, you need appropriate cleanup handlers to close your sockets and call freeaddrinfo if cancellation happens. It might make sense to copy the data you're using from getaddrinfo to local variables so you can freeaddrinfo before the blocking calls and not have to worry about doing it from a cancellation cleanup handler.
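For instance, a hedged sketch of the accept step with such a cleanup handler (accept_cancellable and close_socket are made-up names; cancellation is assumed to be disabled on entry):

#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Cleanup handler: close the socket whose descriptor arg points to. */
static void close_socket(void *arg)
{
    close(*(int *)arg);
}

/* Accept one client with cancellation enabled only while blocked in accept().
   If the thread is cancelled there, the cleanup handler closes the listening socket. */
static int accept_cancellable(int listen_fd)
{
    struct sockaddr_in sa;
    int c;

    pthread_cleanup_push(close_socket, &listen_fd);
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    c = accept(listen_fd, (void *)&sa, &(socklen_t){sizeof sa});  /* cancellation point */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    pthread_cleanup_pop(0);   /* normal path: the caller keeps and later closes listen_fd */

    return c;
}

The connect step can be handled the same way, with a handler that also calls freeaddrinfo if the result list is still live at that point.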
Related
We have created a multithreaded, single core application running on Ubuntu.
When we call getaddrinfo and gethostbyname from the main process, it does not crash.
However, when we create a thread from the main process and getaddrinfo and gethostbyname are called from that thread, it always crashes.
Kindly help.
Please find the call stack below:
#0 0xf7e9f890 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xf7e9fa73 in __res_ninit () from /lib/i386-linux-gnu/libc.so.6
#2 0xf7ea0a68 in __res_maybe_init () from /lib/i386-linux-gnu/libc.so.6
#3 0xf7e663be in ?? () from /lib/i386-linux-gnu/libc.so.6
#4 0xf7e696bb in getaddrinfo () from /lib/i386-linux-gnu/libc.so.6
#5 0x080c4e35 in mn_task_entry (args=0xa6c4130 <ipc_os_input_params>) at /home/nextg/Alps_RT/mn/src/mn_main.c:699
#6 0xf7fa5d78 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#7 0xf7e9001e in clone () from /lib/i386-linux-gnu/libc.so.6
The reason getaddrinfo was crashing is that the child thread making the call did not have sufficient stack space.
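A minimal sketch of creating the thread with a larger stack (start_task_with_bigger_stack is a made-up wrapper and the 1 MiB figure is illustrative; mn_task_entry is the thread function from the backtrace):

#include <limits.h>
#include <pthread.h>

extern void *mn_task_entry(void *arg);   /* the thread function from the backtrace */

int start_task_with_bigger_stack(pthread_t *tid, void *args)
{
    pthread_attr_t attr;
    size_t stack_size = 1024 * 1024;     /* 1 MiB, plenty for getaddrinfo/gethostbyname */

    if (stack_size < PTHREAD_STACK_MIN)  /* never go below the system minimum */
        stack_size = PTHREAD_STACK_MIN;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stack_size);

    int rc = pthread_create(tid, &attr, mn_task_entry, args);
    pthread_attr_destroy(&attr);
    return rc;
}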
Using ACE C++ (version 6.5.1) library classes which use ACE_Thread::spawn_n with the default ACE_DEFAULT_THREAD_PRIORITY (1024*1024) will crash when calling gethostbyname/getaddrinfo inside the child thread, as reported by Syed Aslam. libxml2 schema parsing was taking forever, and when moved to a child thread it segfaulted after calling xmlNanoHTTPConnectHost while trying to resolve the schemaLocation.
ACE_Task activate
const ACE_TCHAR *thr_name[1];
thr_name[0] = "Flerf";

// libxml2-2.9.7/nanohttp.c:1133
// gethostbyname will crash when the child thread making the call
// has insufficient stack space.
size_t stack_sizes[1] = {
    ACE_DEFAULT_THREAD_STACKSIZE * 100
};

const int ret = this->activate (
    THR_NEW_LWP /*Light Weight Process*/ | THR_JOINABLE,
    1,
    0 /*force_active*/,
    ACE_DEFAULT_THREAD_PRIORITY,
    -1 /*grp_id*/,
    NULL /*task*/,
    NULL /*thread_handles[]*/,
    NULL /*stack[]*/,
    stack_sizes /*stack_size[]*/,
    NULL /*thread_ids[]*/,
    thr_name
);
I have a multi-threaded program. The main thread uses getchar to close all the other threads and itself. I have timer functionality in one of the child threads; this thread uses SIG34 for timer expiration.
At some point, I receive SIG34 as shown below. This interrupts the getchar in my main thread and my program just aborts. Please help me understand this.
Program received signal SIG34, Real-time event 34.
0x00007ffff6ea38cd in read () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff6ea38cd in read () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff6e37ff8 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff6e3903e in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff6e2fb28 in getchar () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000401eef in main (argc=1, argv=0x7fffffffe178) at ../../src/SimMain.c:186
Note:
In the child thread, I have assigned SIGRTMIN (which translates to SIG34 on my system) for timer signalling, and I have a handler for it. The handler sets a global variable to let me change course after timer expiration, but I am unsure why getchar is affected.
Timer Init and usage:
/* Timer macros */
#define CLOCKID      CLOCK_REALTIME
#define SIGRT_OFFSET 4                  // was 0 before (hence SIG34); now it is SIG38
#define SIG          (SIGRTMIN + SIGRT_OFFSET)

void cc_timer_init()
{
    // Install the timer handler...
    struct sigevent sev;
    long long freq_nanosecs;
    struct sigaction disc_action;

    /* Establish timer_handler for timer signal */
    memset(&disc_action, 0, sizeof(disc_action));
    disc_action.sa_flags = SA_SIGINFO;  // 0 before
    disc_action.sa_sigaction = disc_timer_handler;
    sigaction(SIG, &disc_action, NULL);

    myState = INIT_STATE;

    /* Create the timer */
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIG;
    sev.sigev_value.sival_ptr = &timerid;
    timer_create(CLOCKID, &sev, &timerid);

    /* Set itimerspec to start the timer */
    freq_nanosecs = TMR_TV_NSEC;
    v_itimerspec.it_value.tv_sec = TMR_TV_SEC;
    v_itimerspec.it_value.tv_nsec = freq_nanosecs % 1000000000;
    v_itimerspec.it_interval.tv_sec = 0;
    v_itimerspec.it_interval.tv_nsec = 0;
}

static void disc_timer_handler(int sig, siginfo_t *si, void *uc)
{
    /* Global variable that I set */
    State = MID_1_STATE;
}

/* In another part... */
...
case INIT_STATE:
{
    v_itimerspec.it_value.tv_sec = TMR_TV_SEC;
    timer_settime(timerid, 0, &v_itimerspec, NULL);
    ret_val = SUCCESS;
}
break;
...
From the Ubuntu pthreads information sheet (LinuxThreads):
In addition to the main (initial) thread, and the threads that the
program creates using pthread_create(3), the implementation creates
a "manager" thread. This thread handles thread creation and
termination. (Problems can result if this thread is inadvertently
killed.)
- Signals are used internally by the implementation. On Linux 2.2 and
later, the first three real-time signals are used.
Other implementations use the first two RT signals. Choose your signal number above the two or three signals used by the threading implementation: see what your pthreads(7) man page says about SIGRTMIN, and adjust accordingly.
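As a small hedged illustration (pick_timer_signal is a made-up helper), you can resolve the signal number at runtime and check that the chosen offset still falls within the real-time range:

#include <signal.h>
#include <stdio.h>

#define SIGRT_OFFSET 4                  /* application-chosen offset, as in the question */

int pick_timer_signal(void)
{
    int sig = SIGRTMIN + SIGRT_OFFSET;  /* SIGRTMIN is resolved at runtime by the C library */

    if (sig > SIGRTMAX)                 /* offset pushed us out of the real-time range */
        return -1;

    printf("using real-time signal %d (SIGRTMIN=%d, SIGRTMAX=%d)\n",
           sig, SIGRTMIN, SIGRTMAX);
    return sig;
}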
I'm developing a project on an embedded Linux OS (uClinux, MIPS CPU), and it crashes occasionally.
When I check the core dump with gdb, I can see that it received a SIGILL signal.
Sometimes I can see the backtrace, which shows it died in pthread_mutex_lock, but most of the time the backtrace is not valid.
A valid backtrace:
(gdb) bt
#0 <signal handler called>
#1 0x2ab87fd8 in sigsuspend () from /lib/libc.so.0
#2 0x2aade80c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#3 0x2aadc7ac in __pthread_alt_lock () from /lib/libpthread.so.0
#4 0x2aad81a4 in pthread_mutex_lock () from /lib/libpthread.so.0
#5 0x0042fde8 in aos_mutex_lock (mutex=0x66bea8) at ../../source/ssp/os/sys/linux/aos_lock_linux.c:184
An invalid backtrace:
(gdb) bt
#0 0x00690430 in ?? ()
#1 0x00690430 in ?? ()
I used pthread_attr_setstackaddr to set up a stack for each thread, so that I can see its call frames by examining its stack. I also found it died in pthread_mutex_lock.
I use a wrapper for lock and unlock, like this:
struct aos_mutex_t
{
    pthread_mutex_t mutex;
    S8 obj_name[AOS_MAX_OBJ_NAME];
    S32 lFlag;
};

S32 aos_mutex_lock(AOS_MUTEX_T *mutex)
{
    S32 status;

    AOS_ASSERT_RETURN(mutex, AOS_EINVAL);

    mutex->lFlag++;
    status = pthread_mutex_lock( &mutex->mutex );
    if (status == 0)
    {
        return AOS_SUCC;
    }
    else
    {
        return AOS_RETURN_OS_ERROR(status);
    }
}

/*
 * aos_mutex_unlock()
 */
S32 aos_mutex_unlock(AOS_MUTEX_T *mutex)
{
    S32 status;

    AOS_ASSERT_RETURN(mutex, AOS_EINVAL);

    status = pthread_mutex_unlock( &mutex->mutex );
    mutex->lFlag--;
    if (status == 0)
        return AOS_SUCC;
    else
    {
        return AOS_RETURN_OS_ERROR(status);
    }
}
All of these mutexes are initialized before use.
When I run the program under gdb, it doesn't die.
I also wrote a simple program in which 11 threads do nothing but lock and unlock in a tight loop; it didn't die either.
Is there any suggestion?
I am running a pthread test program until it fails. Here is the main skeleton of the code:
int authSessionListMutexUnlock()
{
    int rc = 0;
    int rc2 = 0;

    rc2 = pthread_mutex_trylock(&mutex);
    ERR_IF( rc2 != EBUSY && rc2 != 0 );

    rc2 = pthread_mutex_unlock(&mutex);
    ERR_IF( rc2 != 0 );

cleanup:
    return rc;
}

static void cleanup_handler(void *arg)
{
    int rc = 0;
    (void)arg;

    rc = authSessionListMutexUnlock();
    if (rc != 0)
        AUTH_DEBUG5("authSessionListMutexUnlock() failed\n");
}

static void *destroy_expired_sessions(void *t)
{
    int rc2 = 0;
    (void)t;

    pthread_cleanup_push(cleanup_handler, NULL);

    rc2 = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    if (rc2 != 0)
        AUTH_DEBUG5("pthread_setcancelstate(): rc2 == %d\n", rc2);
    rc2 = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    if (rc2 != 0)
        AUTH_DEBUG5("pthread_setcanceltype(): rc2 == %d\n", rc2);

    while (1)
    {
        ... // destroy expired sessions
        sleep(min_timeout);
    }

    pthread_cleanup_pop(0);
}

int authDeinit( char *path )
{
    ...
    rc2 = authSessionListDeInit();
    ERR_IF( rc2 != 0 );

    rc2 = pthread_cancel(destroy_thread);
    ERR_IF( rc2 != 0 );

    rc2 = pthread_join(destroy_thread, &status);
    ERR_IF( rc2 != 0 || (int *)status != PTHREAD_CANCELED );
    ...
    return 0;
}
The test program runs well, but at round #53743 it hangs in pthread_join():
(gdb) bt
#0 0x40000410 in __kernel_vsyscall ()
#1 0x0094aa77 in pthread_join () from /lib/libpthread.so.0
#2 0x08085745 in authDeinit ()
at /users/qixu/src/moja/auth/src//app/libauth/authAPI.c:1562
#3 0x0807e747 in main ()
at /users/qixu/src/moja/auth/src//app/tests/test_session.c:45
It looks like pthread_join() caused a deadlock. But looking at the code, I see no reason a deadlock could be caused by pthread_join(). By the time pthread_join() gets a chance to run, the only mutex operations are those of the thread itself, so there should be no conflict, right? Really confused here...
At least one "oddity" shows up in your code: your cleanup handler will always unlock the mutex, even if the cancelled thread is not the one holding it.
From the manual;
Calling pthread_mutex_unlock() with a mutex that the calling thread
does not hold will result in undefined behavior.
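A common way to avoid unlocking a mutex you do not hold is to push the unlock handler only while the mutex is actually held, and let pthread_cleanup_pop(1) perform the normal-path unlock as well. A minimal sketch (the names are illustrative, not from your code):

#include <pthread.h>

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;

static void unlock_handler(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void scan_session_list(void)
{
    pthread_mutex_lock(&list_mutex);
    pthread_cleanup_push(unlock_handler, &list_mutex);

    /* ... work that may contain cancellation points ... */

    pthread_cleanup_pop(1);   /* runs the handler, so the mutex is unlocked on the normal path too */
}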
A bigger problem with your code, and probably the cause of the deadlocks, is your use of asynchronous cancellation mode (I missed this before). Only 3 functions in POSIX are async-cancel-safe:
pthread_cancel()
pthread_setcancelstate()
pthread_setcanceltype()
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_04
You certainly cannot lock and unlock mutexes while async cancel mode is enabled.
For async cancellation to be usable, you have to do one of the following things:
Use it only with code that's purely computational, e.g. doing heavy math without any library calls at all, just arithmetic operations, or
Constantly toggle it off and back on around each library call you make (a sketch of this pattern follows below).
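For example, a minimal sketch of the second option (crunch_numbers and write_results are made-up placeholders): asynchronous cancellation is enabled only across the pure-computation stretch and switched back to deferred before any library call.

#include <pthread.h>

extern void crunch_numbers(void);   /* placeholder: pure computation, no library calls */
extern void write_results(void);    /* placeholder: does I/O and other library calls */

void *worker(void *arg)
{
    (void)arg;

    /* Deferred is the default; set explicitly here for clarity. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

    for (;;)
    {
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
        crunch_numbers();            /* safe: only arithmetic while async cancel is on */
        pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

        write_results();             /* library calls happen with async cancel off */
    }
    return NULL;
}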
Edit: Based on the comments, I think you have a misunderstanding of what asynchronous cancellation type means. It has nothing to do with the manner in which cleanup handlers run. It's purely a matter of what point the thread can catch the cancellation request and begin acting on it.
When the target is in deferred cancellation mode, calling pthread_cancel on it will not necessarily do anything right away, unless it's already blocked in a function (like read or select) that's a cancellation point. Instead it will just set a flag, and the next time a function which is a cancellation point is called, the thread will instead block any further cancellation attempts, run the cancellation cleanup handlers in the reverse order they were pushed, and exit with a special status indicating that the thread was cancelled.
When the target is in asynchronous cancellation mode, calling pthread_cancel on it will interrupt the thread immediately (possibly between any pair of adjacent machine code instructions). If you don't see why this is potentially dangerous, think about it for a second. Any function that has internal state (static/global variables, file descriptors or other resources being allocated/freed, etc.) could be in inconsistent state at the point of the interruption: a variable partially modified, a lock halfway obtained, a resource obtained but with no record of it having been obtained, or freed but with no record of it having been freed, etc.
At the point of the asynchronous interruption, further cancellation requests are blocked, so there's no danger of calling whatever function you like from your cleanup handlers. When the cleanup handlers finish running, the thread of course ceases to exist.
One other potential source of confusion: cleanup handlers do not run in parallel with the thread being cancelled. When cancellation is acted upon, the cancelled thread stops running the normal flow of code, and instead runs the cleanup handlers then exits.
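To make the deferred flow concrete, here is a small self-contained sketch (not taken from the question): the worker blocks in sleep, which is a cancellation point, so the cancel request runs its cleanup handler, and pthread_join then reports PTHREAD_CANCELED.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void cleanup(void *arg)
{
    (void)arg;
    printf("cleanup handler ran\n");
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(cleanup, NULL);
    for (;;)
        sleep(1);                /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);      /* never reached, but keeps push/pop balanced */
    return NULL;
}

int main(void)
{
    pthread_t t;
    void *status;

    pthread_create(&t, NULL, worker, NULL);
    sleep(1);                    /* let the worker reach its cancellation point */
    pthread_cancel(t);           /* deferred: acted on at the next cancellation point */
    pthread_join(t, &status);
    printf("joined: %s\n", status == PTHREAD_CANCELED ? "PTHREAD_CANCELED" : "normal exit");
    return 0;
}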
Typically, if task1 holds lock A and wants to take lock B, while task2 has taken lock B and is waiting for lock A (held by task1), this causes a deadlock.
But pthread_mutex_timedlock attempts to acquire the mutex and gives up after the specified timeout.
I hit a deadlock scenario where I was trying to take the timed lock, which should have timed out eventually, and that puzzles me.
Edit: Deadlocks can be avoided with a better design, which is what I ended up doing: I made sure that the order of taking the mutex locks is the same everywhere.
But the question remains open as to whether the deadlock should have been avoided in the first place, given that I chose timedlock.
Can someone explain this behaviour to me?
Edit: Attaching sample code to make the scenario clearer (the real tasks are fairly complicated and run into thousands of lines).
T1
pthread_mutex_lock(&lockA);
//call some API, which results in a lock of m2
pthread_mutex_lock(&lockB);
//unlock in the order
pthread_mutex_unlock(&lockB);
pthread_mutex_unlock(&lockA);
T2
pthread_mutex_lock(&lockB);
//call some API, which results in locking m1
pthread_mutex_timedlock(&lockA,<10 sec>);
The crash is seen in the context of T2, bt:
Program terminated with signal 6, Aborted.
#0 0x57edada0 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x57edada0 in raise () from /lib/libc.so.6
#1 0x57edc307 in abort () from /lib/libc.so.6
#2 0x57ed4421 in __assert_fail () from /lib/libc.so.6
#3 0x57bb2a7c in pthread_mutex_timedlock () from /lib/libpthread.so.0
I traced the error to the following assertion failure:
pthread_mutex_timedlock: Assertion `(-(e)) != 35 || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.
In the glibc source for pthread_mutex_timedlock(), this assert looks like this:
int e = INTERNAL_SYSCALL (futex, __err, 4, &mutex->__data.__lock,
__lll_private_flag (FUTEX_LOCK_PI,
private), 1,
abstime);
if (INTERNAL_SYSCALL_ERROR_P (e, __err))
{
if (INTERNAL_SYSCALL_ERRNO (e, __err) == ETIMEDOUT)
return ETIMEDOUT;
if (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
|| INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK)
{
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
|| (kind != PTHREAD_MUTEX_ERRORCHECK_NP
&& kind != PTHREAD_MUTEX_RECURSIVE_NP));
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH
|| !robust);
It is probable that e == EDEADLK and kind is either PTHREAD_MUTEX_ERRORCHECK_NP or PTHREAD_MUTEX_RECURSIVE_NP. The other thing to notice is that the timeout is handled before this check, i.e. you are not hitting the timeout.
In the kernel, it is futex_lock_pi_atomic() that returns the EDEADLK code:
/*
 * Detect deadlocks.
 */
if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
    return -EDEADLK;
The above piece compares the TID of the thread that has locked the mutex with the TID of the thread that tries to acquire it. If they are the same, it suggests that the thread is trying to acquire a mutex that it has already acquired.
First of all, what was the time specified for the timeout? Was it large?
pthread_mutex_timedlock fails under three conditions:
1> A deadlock condition was detected, or the current thread already owns the mutex.
2> The mutex could not be acquired because the maximum number of recursive locks for the mutex has been exceeded.
3> The value specified by mutex does not refer to an initialized mutex object.
Was your code subject to any of the above?
Also, a code snippet may help clear things up so we can see the problem.