getaddrinfo and gethostbyname crashing when called from child thread? - c

We have created a multithreaded, single-core application running on Ubuntu.
When we call getaddrinfo and gethostbyname from the main process, it does not crash.
However, when we create a thread from the main process and call getaddrinfo and gethostbyname from that thread, it always crashes.
Kindly help.
Please find the call stack below:
#0 0xf7e9f890 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xf7e9fa73 in __res_ninit () from /lib/i386-linux-gnu/libc.so.6
#2 0xf7ea0a68 in __res_maybe_init () from /lib/i386-linux-gnu/libc.so.6
#3 0xf7e663be in ?? () from /lib/i386-linux-gnu/libc.so.6
#4 0xf7e696bb in getaddrinfo () from /lib/i386-linux-gnu/libc.so.6
#5 0x080c4e35 in mn_task_entry (args=0xa6c4130 <ipc_os_input_params>) at /home/nextg/Alps_RT/mn/src/mn_main.c:699
#6 0xf7fa5d78 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#7 0xf7e9001e in clone () from /lib/i386-linux-gnu/libc.so.6

The reason getaddrinfo was crashing is that the child thread making the call did not have sufficient stack space.
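For reference, here is a minimal plain-pthreads sketch of that fix, assuming the thread is created directly with pthread_create (the hostname, port, and 1 MiB stack size are illustrative, not taken from the original program): give the child thread an explicit, generous stack before it calls getaddrinfo.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *resolver_thread(void *arg)
{
    const char *host = arg;
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* The resolver needs a surprisingly large amount of stack. */
    int rc = getaddrinfo(host, "80", &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    else
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* 1 MiB is an illustrative value; the point is not to hand the
       resolver thread a tiny custom stack. */
    pthread_attr_setstacksize(&attr, 1024 * 1024);

    pthread_create(&tid, &attr, resolver_thread, (void *)"example.com");
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}

Compile with -pthread. If the crash disappears once the stack is enlarged, the original thread's stack was simply too small for the resolver.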

Using ACE C++ version 6.5.1 library classes that call ACE_Thread::spawn_n with the default ACE_DEFAULT_THREAD_PRIORITY (1024*1024) will crash when gethostbyname/getaddrinfo is called inside the child thread, as reported by Syed Aslam. In my case, libxml2 schema parsing was taking forever, so I moved it to a child thread, which then segfaulted after calling xmlNanoHTTPConnectHost while trying to resolve schemaLocation.
ACE_Task activate:

const ACE_TCHAR *thr_name[1];
thr_name[0] = "Flerf";

// libxml2-2.9.7/nanohttp.c:1133
// gethostbyname will crash when the child thread making the call
// has insufficient stack space.
size_t stack_sizes[1] = {
    ACE_DEFAULT_THREAD_STACKSIZE * 100
};

const int ret = this->activate (
    THR_NEW_LWP /*Light Weight Process*/ | THR_JOINABLE,
    1,
    0 /*force_active*/,
    ACE_DEFAULT_THREAD_PRIORITY,
    -1 /*grp_id*/,
    NULL /*task*/,
    NULL /*thread_handles[]*/,
    NULL /*stack[]*/,
    stack_sizes /*stack_size[]*/,
    NULL /*thread_ids[]*/,
    thr_name
);

Related

SIGABRT inside fgets

I have a relatively simple program that runs a bunch of shell scripts, concatenates their output into one string, and sets it as the status bar for my display manager. For the most part everything is working fine, but from time to time it crashes for no apparent reason. Having inspected the coredump, I found the following backtrace:
#0 0x00007fda71dc3d22 raise (libc.so.6 + 0x3cd22)
#1 0x00007fda71dad862 abort (libc.so.6 + 0x26862)
#2 0x00007fda71e05d28 __libc_message (libc.so.6 + 0x7ed28)
#3 0x00007fda71e0d92a malloc_printerr (libc.so.6 + 0x8692a)
#4 0x00007fda71e1109c _int_malloc (libc.so.6 + 0x8a09c)
#5 0x00007fda71e12397 malloc (libc.so.6 + 0x8b397)
#6 0x00007fda71dfb564 _IO_file_doallocate (libc.so.6 + 0x74564)
#7 0x00007fda71e09db0 _IO_doallocbuf (libc.so.6 + 0x82db0)
#8 0x00007fda71e08cbc _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6 + 0x81cbc)
#9 0x00007fda71e09e66 _IO_default_uflow (libc.so.6 + 0x82e66)
#10 0x00007fda71dfcf2c _IO_getline_info (libc.so.6 + 0x75f2c)
#11 0x00007fda71dfbe8a _IO_fgets (libc.so.6 + 0x74e8a)
#12 0x0000564c2b290484 getcmd (dwmblocks + 0x1484)
#13 0x0000564c2b2906ab getsigcmds (dwmblocks + 0x16ab)
#14 0x0000564c2b290b6f sighandler (dwmblocks + 0x1b6f)
#15 0x00007fda71dc3da0 __restore_rt (libc.so.6 + 0x3cda0)
#16 0x00007fda71e112cc _int_malloc (libc.so.6 + 0x8a2cc)
#17 0x00007fda71e13175 __libc_calloc (libc.so.6 + 0x8c175)
#18 0x00007fda71f83d23 XOpenDisplay (libX11.so.6 + 0x30d23)
#19 0x0000564c2b290952 setroot (dwmblocks + 0x1952)
#20 0x0000564c2b290b1a statusloop (dwmblocks + 0x1b1a)
#21 0x0000564c2b290e28 main (dwmblocks + 0x1e28)
#22 0x00007fda71daeb25 __libc_start_main (libc.so.6 + 0x27b25)
#23 0x0000564c2b29020e _start (dwmblocks + 0x120e)
The last function of the program that ran before the crash looks something like this:
void getcmd(const Block *block, char *output)
{
    ...
    char *cmd = block->command;
    FILE *cmdf = popen(cmd, "r");
    if (!cmdf) {
        return;
    }
    char tmpstr[CMDLENGTH] = "";
    char *s;
    int e;
    do {
        errno = 0;
        s = fgets(tmpstr, CMDLENGTH - (strlen(delim) + 1), cmdf);
        e = errno;
    } while (!s && e == EINTR);
    pclose(cmdf);
    ...
}
So it's just calling popen and trying to read the output with fgets.
From the backtrace it is apparent that SIGABRT is generated inside the fgets call. I have two questions:
How is this even possible? Isn't fgets supposed to return a string, or an error if anything went wrong, and let me deal with that error instead of bringing the whole program down?
What should I do to prevent that behavior?
UPDATE:
Inspecting strings from the coredump, I found that the error malloc_printerr was trying to report was malloc(): mismatching next->prev_size (unsorted).
Don't know if it means anything...
UPDATE:
It appears the problem is that getcmd is called from a signal handler, but popen and fgets are not signal-safe.
UPDATE:
I've added setvbuf(cmdf, NULL, _IONBF, 0); after the popen call to make the stream unbuffered, so fgets wouldn't try to allocate buffers with malloc, hopefully preventing that crash. Unfortunately, I can't reliably reproduce the crash, so I can't tell whether this hack helps.
From the stack trace, I can see two calls to malloc with a signal handler between them. This is going to fail because malloc is (generally) not reentrant, so trying to call it from a signal handler is never a good idea. In general, you should not call ANY POSIX async-unsafe function in a signal handler unless you can somehow guarantee that the signal will never be delivered while running any other async-unsafe function1.
So the real question here is: why does your signal handler need to call popen or fgets (both async-unsafe), and what can you do about it? What signal is being caught? Is it likely to be fatal anyway (SIGSEGV or SIGBUS), or is it an informational signal like SIGIO?
If it is a fatal signal, you should be looking into why it is occurring; the failure in the signal handler is secondary.
If it is a non-fatal signal, then you should move the async-unsafe code out of the signal handler and have the signal handler either set some global variable that the main program will check, or arrange for another thread to do whatever work is needed.
1 This is possible but quite hard; it generally requires wrapping sigblock calls around all calls to async-unsafe things. However, if you only have a few of those in your main program, it may be practical.
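A minimal sketch of the flag-based approach suggested above, assuming a simple polling main loop (the signal, block command, and names are illustrative, not taken from dwmblocks): the handler only records that work is pending, and popen()/fgets() run in normal program context.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t update_pending = 0;

static void sighandler(int signum)
{
    (void)signum;
    update_pending = 1;          /* the only thing the handler does; async-signal-safe */
}

static void run_block(void)
{
    /* Normal context: popen()/fgets()/malloc() are safe to call here. */
    FILE *cmdf = popen("date", "r");
    if (cmdf) {
        char buf[128];
        if (fgets(buf, sizeof buf, cmdf))
            fputs(buf, stdout);
        pclose(cmdf);
    }
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sighandler;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;) {
        if (update_pending) {
            update_pending = 0;
            run_block();
        }
        sleep(1);                /* a real program would poll()/pselect() instead */
    }
}

Sending SIGUSR1 to the process triggers an update on the next loop iteration, without ever running unsafe code inside the handler.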
Your code is calling popen() to run some arbitrary Linux command.
The "arbitrary command" is calling XOpenDisplay() to display an X Windows GUI to the user.
The crash is occurring in malloc(), deep inside XOpenDisplay. Many other C library functions also use malloc() - including popen().
THEORY: You've corrupted memory, hence the "malloc()" failure.
LIKELY CANDIDATE: fgets(tmpstr, CMDLENGTH-(strlen(delim)+1), cmdf);
You need to ensure that "n" (the second argument) is NEVER larger than sizeof(tmpstr)-1.
It certainly looks like you're trying to do that ("n" should always be less than CMDLENGTH), but it's worth double-checking.
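As a self-contained illustration of that bound check (the CMDLENGTH value and delim string below are made up for the example), compute the length once and clamp it, so that a long delim can never make the size_t expression wrap around to a huge value:

#include <stdio.h>
#include <string.h>

#define CMDLENGTH 50                 /* illustrative; the real value lives in the program */
static const char delim[] = " | ";

int main(void)
{
    char tmpstr[CMDLENGTH] = "";

    size_t reserve = strlen(delim) + 1;
    size_t n = (reserve < sizeof tmpstr) ? sizeof tmpstr - reserve : 1;

    /* n is now guaranteed to fit in tmpstr, whatever delim is. */
    if (fgets(tmpstr, (int)n, stdin))
        printf("read: %s", tmpstr);
    return 0;
}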
SUGGESTION: try Valgrind

gdb script to isolate a thread based on variable value

I have a program having over 300 threads to which I have attached gdb. I need to identify one particular thread whose call stack has a frame containing a variable whose value I want to use for matching. Can I script this in gdb?
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f16c1eeb700 (LWP 18833))]
#4 0x00007f17f3a3bdd5 in start_thread () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00007f17f3a3fd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f17e72838be in __afr_shd_healer_wait (healer=healer@entry=0x7f17e05203d0) at afr-self-heald.c:101
#2 0x00007f17e728392d in afr_shd_healer_wait (healer=healer@entry=0x7f17e05203d0) at afr-self-heald.c:125
#3 0x00007f17e72848e8 in afr_shd_index_healer (data=0x7f17e05203d0) at afr-self-heald.c:572
#4 0x00007f17f3a3bdd5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f17f3302ead in clone () from /lib64/libc.so.6
(gdb) frame 3
#3 0x00007f17e72848e8 in afr_shd_index_healer (data=0x7f17e05203d0) at afr-self-heald.c:572
572 afr_shd_healer_wait (healer);
(gdb) p this->name
$6 = 0x7f17e031b910 "testvol-replicate-0"
For example, can I run a macro to loop over each thread, go to frame 3 in each of them, inspect the variable this->name, and print the thread number only if the value matches testvol-replicate-0?
It's possible to integrate Python into GDB. Then, with the Python GDB API, you could loop over the threads and search for a match. Below are two examples of debugging threads with GDB and Python:
https://www.linuxjournal.com/article/11027
https://fy.blackhats.net.au/blog/html/2017/08/04/so_you_want_to_script_gdb_with_python.html

How can I find out the source of this glibc backtrace originating with clone()?

This backtrace comes from a deadlock situation in a multi-threaded application.
The other deadlocked threads are locking inside the call to malloc(), and appear
to be waiting on this thread.
I don't understand what creates this thread, since it deadlocks before calling
any functions in my application:
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
clone() is used to implement fork(), pthread_create(), and perhaps other functions. See here and here.
How can I find out if this trace comes from a fork(), pthread_create(), a signal handler, or something else? Do I just need to dig through the glibc code, or can I use gdb or some other tool? Why does this thread need the internal glibc lock? This would be useful in determining the cause of the deadlock.
Additional information and research:
malloc() is thread-safe, but not reentrant (recursive-safe) (see this and this), so malloc() is also not async-signal-safe. We don't define signal handlers for this process, so I know that we don't call malloc() from signal handlers. The deadlocked threads don't ever call recursive functions, and callbacks are handled in a new thread, so I don't think we should need to worry about reentrancy here. (Maybe I'm wrong?)
This deadlock happens when many callbacks are being spawned to signal (ultimately kill) different processes. The callbacks are spawned in their own threads.
Are we possibly using malloc in an unsafe way?
Possibly related:
glibc malloc internals
Malloc inside of signal handler causes deadlock.
How are signal handlers delivered in a multi-threaded application?
glibc fork/malloc deadlock bug that was fixed in glibc-2.17-162.el7. This looks similar, but is NOT my bug - I'm on a fixed version of glibc.
(I've been unsuccessful in creating a minimal, complete, verifiable example. Unfortunately the only way to reproduce is with the application (Slurm), and it's quite difficult to reproduce.)
EDIT:
Here's the backtrace from all the threads. Thread 6 is the trace I originally posted. Thread 1 is just waiting on a pthread_join(). Threads 2-5 are locked after a call to malloc(). Thread 7 is listening for messages and spawning callbacks in new threads (threads 2-5). Those would be callbacks that would eventually signal other processes.
Thread 7 (Thread 0x7ff69e672700 (LWP 12650)):
#0 0x00007ff6a291aa3d in poll () from /usr/lib64/libc.so.6
#1 0x00007ff6a3c09064 in _poll_internal (shutdown_time=<optimized out>, nfds=2,
pfds=0x7ff6980009f0) at ../../../../slurm/src/common/eio.c:364
#2 eio_handle_mainloop (eio=0xf1a970) at ../../../../slurm/src/common/eio.c:328
#3 0x000000000041ce78 in _msg_thr_internal (job_arg=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:245
#4 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 5 (Thread 0x7ff69e773700 (LWP 22471)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7ff6a4086700 (LWP 5633)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7ff69d53b700 (LWP 12963)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x7ff69f182700 (LWP 19734)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7ff6a4088880 (LWP 12616)):
#0 0x00007ff6a3876f57 in pthread_join () from /usr/lib64/libpthread.so.0
#1 0x000000000041084a in _wait_for_io (job=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:2219
#2 job_manager (job=job@entry=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:1397
#3 0x000000000040ca07 in main (argc=1, argv=0x7fffacab93d8)
at ../../../../../slurm/src/slurmd/slurmstepd/slurmstepd.c:172
The presence of start_thread() in the backtrace indicates that this is a pthread_create() thread.
__libc_thread_freeres() is a function that glibc calls at thread exit, which invokes a set of callbacks to free internal per-thread state. This indicates the thread you have highlighted is in the process of exiting.
arena_thread_freeres() is one of those callbacks. It is for the malloc arena allocator, and it moves the free list from the exiting thread's private arena to the global free list. To do this, it must take a lock that protects the global free list (this is the list_lock in arena.c).
It appears to be this lock that the highlighted thread (Thread 6) is blocked on.
The arena allocator installs pthread_atfork() handlers which lock the list lock at the start of fork() processing, and unlock it at the end. This means that while other pthread_atfork() handlers are running, all other threads will block on this lock.
Are you installing your own pthread_atfork() handlers? It seems likely that one of these may be causing your deadlock.
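For reference, this is the shape of such handlers (a hypothetical sketch, not code from the question): while a user-supplied prepare handler runs during fork() processing, other threads needing the malloc list lock are already blocked, so if the prepare handler in turn waits on an application lock held by one of those blocked threads, nothing can make progress.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread as part of fork() processing. Per the
   explanation above, blocking here on a lock owned by a thread that is
   itself stuck waiting for the malloc list lock (inside malloc() or in
   arena_thread_freeres at thread exit) deadlocks the process. */
static void prepare(void)   { pthread_mutex_lock(&app_lock); }
static void in_parent(void) { pthread_mutex_unlock(&app_lock); }
static void in_child(void)  { pthread_mutex_unlock(&app_lock); }

int main(void)
{
    pthread_atfork(prepare, in_parent, in_child);

    if (fork() == 0)
        _exit(0);           /* child does nothing in this sketch */
    return 0;
}

Grepping the code base for pthread_atfork, and for fork() calls made while application locks are held, is a quick way to test this theory.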

Process threads stuck with __pthread_enable_asynccancel ()

I am debugging an issue with a multi-threaded TCP server application on the CentOS platform. The application suddenly stopped processing connections; even event logging via syslog stopped appearing in the log files. It was as if the application became a black hole.
I killed the process with signal 11 to get the core dump. In the core dump I observed that all threads are stuck with similar patterns.
Ex:
Thread 19 (Thread 0xb2de8b70 (LWP 3722)):
#0 0x00c9e424 in __kernel_vsyscall ()
#1 0x00c17189 in __pthread_enable_asynccancel () from /lib/libpthread.so.0
#2 0x080b367d in server_debug_stats_timer_handler (tmp=0x0) at server_debug_utils.c:75 ==> this line is an event print with syslog(...)
Almost all threads are attempting a syslog(...) print but get stuck in __pthread_enable_asynccancel().
What is __pthread_enable_asynccancel() doing, and why isn't it returning?
Here is info reg from the thread mentioned:
(gdb) info reg
eax 0xfffffe00 -512
ecx 0x80 128
edx 0x2 2
ebx 0x154ea3c4 357475268
esp 0xb2de8174 0xb2de8174
ebp 0xb2de81a8 0xb2de81a8
esi 0x0 0
edi 0x0 0
eip 0xc9e424 0xc9e424 <__kernel_vsyscall+16>
eflags 0x200246 [ PF ZF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb)
(gdb) print $orig_eax
$1 = 240
($orig_eax = 240 is SYS_futex)
The state of one of the threads is shown below:
Thread 27 (Thread 0xa97d9b70 (LWP 3737)):
#0 0x00c9e424 in __kernel_vsyscall ()
#1 0x00faabb3 in inet_ntop () from /lib/libc.so.6
#2 0x00f20e76 in freopen64 () from /lib/libc.so.6
#3 0x00f96a55 in fcvt_r () from /lib/libc.so.6
#4 0x00f96fd7 in qfcvt_r () from /lib/libc.so.6
#5 0x00a19932 in app_signal_handler (signum=11) at appBaseClass.cpp:920
#6 <signal handler called>
#7 0x00f2aec5 in sYSMALLOc () from /lib/libc.so.6
#8 0x0043431a in CRYPTO_free () from /usr/local/lr/packages/stg_app/5.3.8/lib/ssl/libcrypto.so.10
#9 0x00000000 in ?? ()
(gdb) print $orig_eax
$5 = 240
Your stuck threads are in the futex() syscall, probably on an internal glibc lock taken within malloc() (the syslog() call allocates memory internally).
The reason for this deadlock isn't apparent, but I have two suggestions:
If you call syslog() (or any other non-async-signal-safe function, like printf()) from a signal handler anywhere in your program, that could cause a deadlock like this (see the sketch after these suggestions);
It's possible you're being bitten by a bug in futex() that was introduced in Linux 3.14 and fixed in 3.18. The bug was also backported to RHEL 6.6, so it would have been present for a while in CentOS too. The effect of this bug was to cause processes to fail to wake up from a FUTEX_WAIT when they should have.
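If suggestion 1 applies, one common restructuring (a sketch with made-up names, not the application's actual code) is the self-pipe pattern: the handler only does a write(), which is async-signal-safe, and the syslog() call happens in ordinary thread context.

#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

static int wake_pipe[2];                  /* [0] = read end, [1] = write end */

static void handler(int signum)
{
    unsigned char b = (unsigned char)signum;
    (void)write(wake_pipe[1], &b, 1);     /* write() is async-signal-safe */
}

int main(void)
{
    if (pipe(wake_pipe) != 0)
        return 1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;) {
        unsigned char b;
        if (read(wake_pipe[0], &b, 1) == 1)
            syslog(LOG_INFO, "got signal %d", (int)b);   /* normal context: safe */
    }
}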

thread died in the pthread_mutex_lock with SIGILL signal

I'm developing a project on an embedded Linux OS (uClinux, MIPS CPU), and it crashes occasionally.
When I check the coredump with gdb, I can see that it received a SIGILL signal.
Sometimes I can see the backtrace, which shows it died in pthread_mutex_lock, but most of the time the backtrace is not valid.
A valid backtrace
(gdb) bt
#0 <signal handler called>
#1 0x2ab87fd8 in sigsuspend () from /lib/libc.so.0
#2 0x2aade80c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#3 0x2aadc7ac in __pthread_alt_lock () from /lib/libpthread.so.0
#4 0x2aad81a4 in pthread_mutex_lock () from /lib/libpthread.so.0
#5 0x0042fde8 in aos_mutex_lock (mutex=0x66bea8) at ../../source/ssp/os/sys/linux/aos_lock_linux.c:184
An invalid backtrace
(gdb) bt
#0 0x00690430 in ?? ()
#1 0x00690430 in ?? ()
I used pthread_attr_setstackaddr to set up a stack for each thread, so that I can see a thread's call frames by inspecting its stack. That also showed it died in pthread_mutex_lock.
I use a wrapper for lock and unlock like this:
struct aos_mutex_t
{
    pthread_mutex_t mutex;
    S8 obj_name[AOS_MAX_OBJ_NAME];
    S32 lFlag;
};

S32 aos_mutex_lock(AOS_MUTEX_T *mutex)
{
    S32 status;

    AOS_ASSERT_RETURN(mutex, AOS_EINVAL);

    mutex->lFlag++;
    status = pthread_mutex_lock(&mutex->mutex);
    if (status == 0)
    {
        return AOS_SUCC;
    }
    else
    {
        return AOS_RETURN_OS_ERROR(status);
    }
}

/*
 * aos_mutex_unlock()
 */
S32 aos_mutex_unlock(AOS_MUTEX_T *mutex)
{
    S32 status;

    AOS_ASSERT_RETURN(mutex, AOS_EINVAL);

    status = pthread_mutex_unlock(&mutex->mutex);
    mutex->lFlag--;
    if (status == 0)
    {
        return AOS_SUCC;
    }
    else
    {
        return AOS_RETURN_OS_ERROR(status);
    }
}
All of these mutexes are initialized before use.
I tried running the program under gdb; it didn't die.
I wrote a simple test program in which 11 threads do nothing but lock and unlock in a tight loop. It didn't die.
Are there any suggestions?
