Qt multi-thread application freezes and several threads wait for the same mutex

I encountered a strange problem with my Qt-based multi-threaded application. After running for several days, the application freezes without any response.
After the freeze occurs, I can confirm that several threads, including the main thread, are in the futex_wait_queue_me state. When I attach GDB to the application to investigate the thread states, the backtraces of those threads
show that they all stopped in the following function with the same argument, futex=0x45a2f8b8 <main_arena>:
__lll_lock_wait_private (futex=0x45a2f8b8 <main_arena>)
I know that on Linux, calling non-async-signal-safe functions from signal handlers is one possible cause of this state, i.e. several threads waiting for the same mutex (I can confirm from the backtraces that they all stopped in malloc()/free()-related calls), but after reviewing my Qt application I cannot find any code that installs Linux signal handlers. (I am not sure, however, whether the Qt core library uses Linux signal handlers in its signal/slot mechanism.)
I am sorry that I cannot provide source code for this question because it is a huge project. Could you tell me some possible causes of this phenomenon, or give me some advice on how to debug it?
Thanks in advance.
UPDATE 1:
I can provide a backtrace, but sorry, I had to delete some sensitive information.
Backtrace of sub thread:
#0 in __lll_lock_wait_private (futex=0x4ad078b8 <main_arena>)
#1 in __GI___libc_malloc (bytes=32) at malloc.c:2918
... ...
#11 in SystemEventImp::event(QEvent*) ()
#12 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#13 in QApplication::notify(QObject*, QEvent*) ()
#14 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#15 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
#16 in QCoreApplication::sendPostedEvents (receiver=0x0, event_type=0) at kernel/qcoreapplication.cpp:1329
#17 in QWindowSystemInterface::sendWindowSystemEvents (flags=...) at kernel/qwindowsysteminterface.cpp:560
#18 in QUnixEventDispatcherQPA::processEvents (this=0x8079958, flags=...) at eventdispatchers/qunixeventdispatcher.cpp:70
#19 in QEventLoop::processEvents (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:136
#20 in QEventLoop::exec (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:212
#21 in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1120
#22 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1220
#23 in QApplication::exec () at kernel/qapplication.cpp:2689
#24 in main(argc=2, argv=0xbffff294)
Backtrace of main thread:
#0 in __lll_lock_wait_private (futex=0x4ad078b8 <main_arena>) at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:32
#1 in __GI___libc_malloc (bytes=8) at malloc.c:2918
... ...
#15 in QGraphicsView::paintEvent(QPaintEvent*) ()
#16 in QWidget::event(QEvent*) ()
#17 in QFrame::event(QEvent*) ()
#18 in QGraphicsView::viewportEvent(QEvent*) ()
#19 in Platform::Drawing::GraphicsView::viewportEvent(QEvent*) ()
#20 in QAbstractScrollAreaFilter::eventFilter(QObject*, QEvent*) ()
#21 in QCoreApplicationPrivate::cancel_handler(QObject*, QEvent*) ()
#22 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#23 in QApplication::notify(QObject*, QEvent*) ()
#24 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#25 in QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int, QPainter*, QWidgetBackingStore*) [clone .part.175] ()
#26 in QWidgetBackingStore::sync() ()
#27 in QWidgetPrivate::syncBackingStore() ()
#28 in QWidget::event(QEvent*) ()
#29 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#30 in QApplication::notify(QObject*, QEvent*) ()
#31 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#32 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
#33 in QCoreApplication::sendPostedEvents (receiver=0x809ea50, event_type=77)
#34 in QGraphicsViewPrivate::dispatchPendingUpdateRequests (this=0x80e4418)
#35 in QGraphicsScenePrivate::_q_processDirtyItems (this=0x80de238) at graphicsview/qgraphicsscene.cpp:508
#36 in QGraphicsScene::qt_static_metacall (_o=0x80d1a80, _c=QMetaObject::InvokeMetaMethod, _id=15, _a=0x865e238)
#37 in QMetaCallEvent::placeMetaCall (this=0x898d020, object=0x80d1a80)
#38 in QObject::event (this=0x80d1a80, e=0x898d020) at kernel/qobject.cpp:1070
#39 in QGraphicsScene::event (this=0x80d1a80, event=0x898d020) at graphicsview/qgraphicsscene.cpp:3478
#40 in QApplicationPrivate::notify_helper (this=0x8077ba0, receiver=0x80d1a80, e=0x898d020) at kernel/qapplication.cpp:3457
#41 in QApplication::notify (this=0x8077970, receiver=0x80d1a80, e=0x898d020) at kernel/qapplication.cpp:2878
#42 in QCoreApplication::notifyInternal (this=0x8077970, receiver=0x80d1a80, event=0x898d020) at kernel/qcoreapplication.cpp:867
#43 in QCoreApplication::sendEvent (receiver=0x80d1a80, event=0x898d020) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:232
#44 in QCoreApplicationPrivate::sendPostedEvents (receiver=0x0, event_type=0, data=0x8073318) at kernel/qcoreapplication.cpp:1471
#45 in QCoreApplication::sendPostedEvents (receiver=0x0, event_type=0) at kernel/qcoreapplication.cpp:1329
#46 in QWindowSystemInterface::sendWindowSystemEvents (flags=...) at kernel/qwindowsysteminterface.cpp:560
#47 in QUnixEventDispatcherQPA::processEvents (this=0x8079958, flags=...) at eventdispatchers/qunixeventdispatcher.cpp:70
#48 in QEventLoop::processEvents (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:136
#49 in QEventLoop::exec (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:212
#50 in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1120
#51 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1220
#52 in QApplication::exec () at kernel/qapplication.cpp:2689
#53 in main(argc=2, argv=0xbffff294)
UPDATE2:
In response to the valuable comments on this question, I have also shared several detailed backtrace files at the following link: 1drv.ms/f/s!AlojS_vldQMhjHRlTfU9vwErNz-H . Please refer to Readme.txt for some explanation and for the libc version I used.
By the way, when I tried to replace system() with vfork()/waitpid(), the freeze no longer seemed to appear. I do not know the reason.
Thank you all in advance.

Without source code provided, it is hard to answer the question definitively. In my experience with multithreaded programs, it is really easy to overlook some place where a deadlock can occur. In your case it sounds like something that is very unlikely to happen; however, I would bet that somewhere in your code you have a potential deadlock.
I would advise you to draw the whole environment in a diagram and look at which threads use which shared resources, and when and where the mutexes come in.
But as I said in the beginning, without further information it is hard to say.

From the backtrace, it seems malloc was called while Qt was trying to post an event.
If you send events across threads, Qt queues the events for you. But these queued events can fill up your memory if they are not drained. Then you can get weird behavior from malloc, because there is no memory left.
Do you have a way to monitor the memory usage of your program and see whether the freeze happens every time memory fills up?
Do you have a way to reduce the memory available to the system and see whether the problem occurs more often?
If the above is indeed the issue, then you might take a look at this thread for a solution.

If you are using signals and slots to communicate across threads, you should understand the different connection types.
Auto Connection (default): If the signal is emitted in the thread with which the receiving object has affinity, the behavior is the same as the Direct Connection. Otherwise, the behavior is the same as the Queued Connection.
Direct Connection: The slot is invoked immediately when the signal is emitted. The slot is executed in the emitter's thread, which is not necessarily the receiver's thread.
Queued Connection: The slot is invoked when control returns to the event loop of the receiver's thread. The slot is executed in the receiver's thread.
Blocking Queued Connection: The slot is invoked as for the Queued Connection, except that the current thread blocks until the slot returns. Note: using this type to connect objects in the same thread will cause a deadlock.
More here: https://doc.qt.io/archives/qt-5.6/threads-qobject.html
The question does need some code context, though. Does this behavior occur when you are passing data to the UI? If so, are you using QWidgets, QML, ...? A lot of Qt patterns rely on signals/slots when rendering data to the UI.


QEMU how pcie_host converts physical address to pcie address

I am learning the implementation of QEMU, and I have a question. On real hardware, when the CPU accesses a physical address that belongs to a PCI device, the PCI host is responsible for converting it to a PCI address. QEMU provides pcie_host.c to emulate the PCIe host. In this file, pcie_mmcfg_data_write is implemented, but there is nothing about the conversion of physical addresses to PCI addresses.
I did a test in QEMU using GDB:
First, I added the edu device, which is a very simple PCI device, to QEMU.
When I try to enable Memory Space (Mem- to Mem+) with setpci -s 00:02.0 04.b=2, QEMU stops in the function pcie_mmcfg_data_write.
static void pcie_mmcfg_data_write(void *opaque, hwaddr mmcfg_addr,
                                  uint64_t val, unsigned len)
{
    PCIExpressHost *e = opaque;
    PCIBus *s = e->pci.bus;
    PCIDevice *pci_dev = pcie_dev_find_by_mmcfg_addr(s, mmcfg_addr);
    uint32_t addr;
    uint32_t limit;

    if (!pci_dev) {
        return;
    }
    addr = PCIE_MMCFG_CONFOFFSET(mmcfg_addr);
    limit = pci_config_size(pci_dev);
    pci_host_config_write_common(pci_dev, addr, limit, val, len);
}
It is obvious that the PCIe host uses this function to find the device and do the work.
Using bt I get:
#0 pcie_mmcfg_data_write
(opaque=0xaaaaac573f10, mmcfg_addr=65540, val=2, len=1)
at hw/pci/pcie_host.c:39
#1 0x0000aaaaaae4e8a8 in memory_region_write_accessor
(mr=0xaaaaac574520, addr=65540, value=0xffffe14703e8, size=1, shift=0, mask=255, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:483
#2 0x0000aaaaaae4eb14 in access_with_adjusted_size
(addr=65540, value=0xffffe14703e8, size=1, access_size_min=1, access_size_max=4, access_fn=
0xaaaaaae4e7c0 <memory_region_write_accessor>, mr=0xaaaaac574520, attrs=...) at /home/mrzleo/Desktop/qemu/memory.c:544
#3 0x0000aaaaaae51898 in memory_region_dispatch_write
(mr=0xaaaaac574520, addr=65540, data=2, op=MO_8, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:1465
#4 0x0000aaaaaae72410 in io_writex
(env=0xaaaaac6924e0, iotlbentry=0xffff000e9b00, mmu_idx=2, val=2,
addr=18446603336758132740, retaddr=281473269319356, op=MO_8)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1084
#5 0x0000aaaaaae74854 in store_helper
(env=0xaaaaac6924e0, addr=18446603336758132740, val=2, oi=2, retaddr=281473269319356, op=MO_8)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1954
#6 0x0000aaaaaae74d78 in helper_ret_stb_mmu
(env=0xaaaaac6924e0, addr=18446603336758132740, val=2 '\002', oi=2, retaddr=281473269319356)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:2056
#7 0x0000ffff9a3b47cc in code_gen_buffer ()
#8 0x0000aaaaaae8d484 in cpu_tb_exec
(cpu=0xaaaaac688c00, itb=0xffff945691c0 <code_gen_buffer+5673332>)
at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:172
#9 0x0000aaaaaae8e4ec in cpu_loop_exec_tb
(cpu=0xaaaaac688c00, tb=0xffff945691c0 <code_gen_buffer+5673332>,
last_tb=0xffffe1470b78, tb_exit=0xffffe1470b70)
at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:619
#10 0x0000aaaaaae8e830 in cpu_exec (cpu=0xaaaaac688c00)
at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:732
#11 0x0000aaaaaae3d43c in tcg_cpu_exec (cpu=0xaaaaac688c00)
at /home/mrzleo/Desktop/qemu/cpus.c:1405
#12 0x0000aaaaaae3dd4c in qemu_tcg_cpu_thread_fn (arg=0xaaaaac688c00)
at /home/mrzleo/Desktop/qemu/cpus.c:1713
#13 0x0000aaaaab722c70 in qemu_thread_start (args=0xaaaaac715be0)
at util/qemu-thread-posix.c:519
#14 0x0000fffff5af84fc in start_thread (arg=0xffffffffe3ff)
at pthread_create.c:477
#15 0x0000fffff5a5167c in thread_start ()
at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Then I tried to access the address of edu with devmem 0x10000000.
QEMU stops in edu_mmio_read. Using bt:
(gdb) bt
#0 edu_mmio_read
(opaque=0xaaaaae71c560, addr=0, size=4)
at hw/misc/edu.c:187
#1 0x0000aaaaaae4e5b4 in memory_region_read_accessor
(mr=0xaaaaae71ce50, addr=0, value=0xffffe2472438, size=4, shift=0, mask=4294967295, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:434
#2 0x0000aaaaaae4eb14 in access_with_adjusted_size
(addr=0, value=0xffffe2472438, size=4, access_size_min=4, access_size_max=8, access_fn=
0xaaaaaae4e570 <memory_region_read_accessor>, mr=0xaaaaae71ce50, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:544
#3 0x0000aaaaaae51524 in memory_region_dispatch_read1
(mr=0xaaaaae71ce50, addr=0, pval=0xffffe2472438, size=4, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:1385
#4 0x0000aaaaaae51600 in memory_region_dispatch_read
(mr=0xaaaaae71ce50, addr=0, pval=0xffffe2472438, op=MO_32, attrs=...)
at /home/mrzleo/Desktop/qemu/memory.c:1413
#5 0x0000aaaaaae72218 in io_readx
(env=0xaaaaac6be0f0, iotlbentry=0xffff04282ec0, mmu_idx=0,
addr=281472901758976, retaddr=281473196263360, access_type=MMU_DATA_LOAD, op=MO_32)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1045
#6 0x0000aaaaaae738b0 in load_helper
(env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360,
op=MO_32, code_read=false, full_load=0xaaaaaae73c68 <full_le_ldul_mmu>)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1566
#7 0x0000aaaaaae73ca4 in full_le_ldul_mmu
(env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1662
#8 0x0000aaaaaae73cd8 in helper_le_ldul_mmu
(env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360)
at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1669
#9 0x0000ffff95e08824 in code_gen_buffer
()
#10 0x0000aaaaaae8d484 in cpu_tb_exec
(cpu=0xaaaaac6b4810, itb=0xffff95e086c0 <code_gen_buffer+31491700>)
at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:172
#11 0x0000aaaaaae8e4ec in cpu_loop_exec_tb
(cpu=0xaaaaac6b4810, tb=0xffff95e086c0 <code_gen_buffer+31491700>,
last_tb=0xffffe2472b78, tb_exit=0xffffe2472b70)
at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:619
#12 0x0000aaaaaae8e830 in cpu_exec
(cpu=0xaaaaac6b4810) at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:732
#13 0x0000aaaaaae3d43c in tcg_cpu_exec
(cpu=0xaaaaac6b4810) at /home/mrzleo/Desktop/qemu/cpus.c:1405
#14 0x0000aaaaaae3dd4c in qemu_tcg_cpu_thread_fn
(arg=0xaaaaac6b4810)
at /home/mrzleo/Desktop/qemu/cpus.c:1713
#15 0x0000aaaaab722c70 in qemu_thread_start (args=0xaaaaac541610) at util/qemu-thread-posix.c:519
#16 0x0000fffff5af84fc in start_thread (arg=0xffffffffe36f) at pthread_create.c:477
#17 0x0000fffff5a5167c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
It seems that QEMU locates the edu device directly, and the PCIe host does nothing in this procedure. I wonder whether QEMU does not implement the conversion here and just uses MemoryRegion to achieve polymorphism. If not, what does QEMU's PCIe host do in this procedure?
QEMU uses a set of data structures called MemoryRegions to model the address space that a CPU sees (the detailed API is documented in part in the developer docs).
MemoryRegions can be built up into a tree, where at the "root" there is one 'container' MR which covers the whole 64-bit address space the guest CPU can see, and then MRs for blocks of RAM, devices, etc are placed into that root MR at appropriate offsets. Child MRs can also be containers which in turn contain further MRs. You can then find the MR corresponding to a given guest physical address by walking through the tree of MRs.
The tree of MemoryRegions is largely built up statically when QEMU starts (because most devices don't move around), but it can also be changed dynamically in response to guest software actions. In particular, PCI works this way. When the guest OS writes to a PCI device BAR (which is in PCI config space) this causes QEMU's PCI host controller emulation code to place the MR corresponding to the device's registers into the MemoryRegion hierarchy at the correct place and offset (depending on what address the guest wrote to the BAR, ie where it asked for it to be mapped). Once this is done, the MR for the PCI device is like any other in the tree, and the PCI host controller code doesn't need to be involved in guest accesses to it.
As a performance optimisation, QEMU doesn't actually walk down a tree of MRs for every access. Instead, we first "flatten" the tree into a data structure (a FlatView) that directly says "for this range of addresses, it will be this MR; for this range; this MR", and so on. Secondly, QEMU's TLB structure can directly cache mappings from "guest virtual address" to "specific memory region". On first access it will do an emulated guest MMU page table walk to get from the guest virtual address to the guest physical address, and then it will look that physical address up in the FlatView to find either the real host RAM or the MemoryRegion that is mapped there, and it will add the "guest VA -> this MR" mapping to the TLB cache. Future accesses will hit in the TLB and need not repeat the work of converting to a physaddr and then finding the MR in the flatmap. This is what is happening in your backtrace -- the io_readx() function is passed the guest virtual address and also the relevant part of the TLB data structure, and it can then directly find the target MR and the offset within it, so it can call memory_region_dispatch_read() to dispatch the read request to that MR's read callback function. (If this was the first access, the initial "MMU walk + FlatView lookup" work will have just been done in load_helper() before it calls io_readx().)
Obviously, all this caching also implies that QEMU tracks events which mean the cached data is no longer valid so we can throw it away (eg if the guest writes to the BAR again to unmap it or to map it somewhere else; or if the MMU settings or page tables are changed to alter the guest virtual-to-physical mapping).

Need your insights on iOS threads

I am working on a POC app using datagram sockets; I'm working on the iOS part. It's a straightforward one-screen app with a couple of buttons. Anyway, my issue involves the EDT thread, the GC thread, and one of my IO threads. My IO thread has a bound datagram socket waiting for messages (recvfrom). Sometimes I see that the EDT is stuck, and when I look at the iOS thread stacks, I see the following:
1 - The EDT thread is sleeping, waiting for a boolean to turn false:
while (threadStateData->threadBlockedByGC) {
    usleep(1000);
}
#3 0x0000000100e6ed02 in java_lang_Thread_sleep___long at /dist/MyApplication-src/nativeMethods.m:1231
#4 0x0000000101194c44 in java_lang_System_gc__ at /dist/MyApplication-src/java_lang_System.m:257
#5 0x0000000100c431c1 in codenameOneGcMalloc at /dist/MyApplication-src/cn1_globals.m:791
#6 0x00000001011bac4a in __NEW_com_codename1_ui_Label_1 at /dist/MyApplication-src/com_codename1_ui_Label_1.m:31
#7 0x0000000101491019 in com_codename1_ui_Label___INIT_____java_lang_String_java_lang_String at /dist/MyApplication-src/com_codename1_ui_Label.m:1402
...
2 - The GC thread is also sleeping, waiting for another boolean to turn true:
while (t->threadActive) {
    usleep(500);
}
#3 0x0000000100c428d6 in codenameOneGCMark at /dist/MyApplication-src/cn1_globals.m:426
#4 0x0000000100e6e950 in java_lang_System_gcMarkSweep__ at /dist/MyApplication-src/nativeMethods.m:1078
#5 0x000000010119521d in java_lang_System_access$200__ at /dist/MyApplication-src/java_lang_System.m:331
...
A quick watch on t shows the threadId=8
t ThreadLocalData * 0x600001616eb0 0x0000600001616eb0
threadId JAVA_LONG 8
3 - My IO thread seems to be the one with id 8 (the address in memory is the same as well)
A quick watch on threadStateData shows the threadId=8
threadStateData ThreadLocalData * 0x600001616eb0 0x0000600001616eb0
threadId JAVA_LONG 8
ssize_t result = recvfrom(socketDescriptor, buffer, sob, 0, (struct sockaddr *)&receiveSockaddr, &receiveSockaddrLen);
#1 0x0000000101100a00 in -[net_etc_net_impl_NativeDatagramSocketImpl receive:param1:param2:param3:] at /dist/MyApplication-src/net_et_net_impl_NativeDatagramSocketImpl.m:131
#2 0x0000000101615f6b in net_etc_net_impl_NativeDatagramSocketImplCodenameOne_receive___int_int_java_lang_String_int_R_int at /dist/MyApplication-src/native_net_et_net_impl_NativeDatagramSocketImplCodenameOne.m:51
#3 0x0000000100f7fc9e in net_etc_net_impl_NativeDatagramSocketStub_receive___int_int_java_lang_String_int_R_int at /dist/MyApplication-src/net_etc_net_impl_NativeDatagramSocketStub.m:87
#4 0x0000000100d59939 in virtual_net_etc_net_impl_NativeDatagramSocket_receive___int_int_java_lang_String_int_R_int at /dist/MyApplication-src/net_etc_net_impl_NativeDatagramSocket.m:91
#5 0x000000010156690f in net_etc_net_DatagramSocket_receive___byte_1ARRAY_int_R_int at /dist/MyApplication-src/net_etceterum_net_DatagramSocket.m:215
So my question is: what can I do to prevent this?
Thanks for your help.
Emmanuel
See this code in our socket implementation. I suggest adding yield/resume calls in your code to let the GC work. Just make sure you don't do any Java-side allocations during that time.
What happens is this:
The GC needs to run so it loops over all the active threads and tries to collect
Your thread started on the Java side so it's marked as a GC thread
It's marked as alive
The GC wants it to suspend allocations so it can GC it
The thread is unaware of this because it's in C code for a long time... Deadlock

Unexplainable behaviour when integrating x86 FreeRTOS port (pthreads) and auxiliary pthreads code

I am out of ideas for figuring out where my problem is coming from.
I am trying to incorporate an async UDP handler into an existing FreeRTOS emulator, both being pthreads-based. The FreeRTOS implementation is essentially a wrapper around pthreads, and the UDP handler spawns a FreeRTOS task which then spawns a pthread for each socket, so that the spawned threads can have their own sigaction to handle that specific UDP port with a specified callback.
As a sanity check I moved the UDP handler code into a standalone build yesterday to test it, and it works without fault, found here. All Valgrind checks also show no errors. The FreeRTOS emulator is also stable when the UDP handler is not added, found here. The unstable integration can be found here.
When integrating the two, I get behavior I have not yet been able to debug. The bug presents itself as a heisenbug, in that I cannot always reproduce it while debugging. The Valgrind tools (memcheck, helgrind and drd) are unable to reproduce the bug, reporting errors only in linked libraries such as SDL2, X11, Mesa graphics, etc. Post-mortem GDB is able to capture the fault, as is running with (gdb) set disable-randomization off.
The backtrace from gdb shows me the following
(gdb) bt
#0 0x00007faa2f45a41b in pthread_kill () from /usr/lib/libpthread.so.0
#1 0x0000564392f5c93b in prvResumeThread (xThreadId=0) at /home/alxhoff/git/GitHub/FreeRTOS-Emulator/lib/FreeRTOS_Kernel/portable/GCC/Posix/port.c:561
#2 0x0000564392f5c38b in vPortYield () at /home/alxhoff/git/GitHub/FreeRTOS-Emulator/lib/FreeRTOS_Kernel/portable/GCC/Posix/port.c:329
#3 0x0000564392f5d986 in xQueueGenericReceive (xQueue=0x564396692bd0, pvBuffer=0x0, xTicksToWait=4294967295, xJustPeeking=0) at /home/alxhoff/git/GitHub/FreeRTOS-Emulator/lib/FreeRTOS_Kernel/queue.c:1376
#4 0x0000564392f5b0d3 in vDemoTask1 (pvParameters=0x0) at /home/alxhoff/git/GitHub/FreeRTOS-Emulator/src/main.c:338
#5 0x0000564392f5c754 in prvWaitForStart (pvParams=0x5643966b2780) at /home/alxhoff/git/GitHub/FreeRTOS-Emulator/lib/FreeRTOS_Kernel/portable/GCC/Posix/port.c:496
#6 0x00007faa2f4524cf in start_thread () from /usr/lib/libpthread.so.0
#7 0x00007faa2efcd2d3 in clone () from /usr/lib/libc.so.6
The problem appears to be that prvResumeThread is not being passed a valid thread ID, as seen in #1. Going into the FreeRTOS sources, I believe this should not be the case, as the same threads are created when the UDP handler and its respective task are added; their addition somehow leads to FreeRTOS's pxCurrentTCB becoming invalid when executing xTaskGetCurrentTaskHandle, which retrieves the thread handle for the faulting prvResumeThread call in #1 of the backtrace. Moving the task creation order around leads to the same error, which makes me think I am dealing with some sort of memory corruption, but given that I cannot reproduce the error under Valgrind, I am unsure how to diagnose it.
I am worried this reads like a "debug my program" post, but I am unsure what methods or tools I can use to further my diagnosis, given my limited experience with multi-threaded debugging, and I need a push in the right direction.
Cheers

Cuda hangs on cudaDeviceSynchronize randomly [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I have a piece of GPU code that has worked for a while. I recently made a couple of minor algorithmic changes, but they didn't touch the CUDA part.
I'm running production runs on a set of three Xeon machines, each with a 780 Ti. Each run takes about three minutes to complete, but at this point there have been two cases (out of ~5000) where the application has hung for hours (until killed). Both were on the same machine.
The second time, I attached GDB to the running process, and got a backtrace that looks like
#0 0x00007fff077ffa01 in clock_gettime ()
#1 0x0000003e1ec03e46 in clock_gettime () from /lib64/librt.so.1
#2 0x00002b5b5e302a1e in ?? () from /usr/lib64/libcuda.so
#3 0x00002b5b5dca2294 in ?? () from /usr/lib64/libcuda.so
#4 0x00002b5b5dbbaa4f in ?? () from /usr/lib64/libcuda.so
#5 0x00002b5b5dba8cda in ?? () from /usr/lib64/libcuda.so
#6 0x00002b5b5db94c4f in cuCtxSynchronize () from /usr/lib64/libcuda.so
#7 0x000000000041cd8d in cudart::cudaApiDeviceSynchronize() ()
#8 0x0000000000441269 in cudaDeviceSynchronize ()
#9 0x0000000000408124 in main (argc=11, argv=0x7fff076fa1d8) at src/fraps3d.cu:200
I manually did frame 8; return to forcibly make it finish, which caused it to get stuck on the next cudaDeviceSynchronize() call. Doing this again got it stuck on the next synchronization call after that (every time with the same frames 0 through 8). Even more strangely, the failure happened in the middle of the main loop, on roughly the 5000th iteration.
After killing it, the next job starts and runs properly, so it doesn't appear to be a systemic failure of the execution host.
Any ideas about what could cause a random failure like this?
I'm compiling and running with CUDA V6.0.1, with driver version 331.62.

core dump at _dl_sysinfo_int80 ()

I have created a TCP client that connects to a listening server.
We implemented TCP keep-alive as well.
Sometimes the client crashes and dumps core.
Below are the core dump traces.
The problem is on Linux kernel 2.6.9-42.0.10 (Update 4).
We had two core dumps.
(gdb) where
#0 0x005e77a2 in _dl_sysinfo_int80 () from /ddisk/d303/dumps/mhx239131/ld-linux.so.2
#1 0x006c8bd1 in connect () from /ddisk/d303/dumps/mhx239131/libc.so.6
#2 0x08057863 in connect_to_host ()
#3 0x08052f38 in open_ldap_connection ()
#4 0x0805690a in new_connection ()
#5 0x08052cc9 in ldap_open ()
#6 0x080522cf in checkHosts ()
#7 0x08049b36 in pollLDEs ()
#8 0x0804d1cd in doOnChange ()
#9 0x0804a642 in main ()
(gdb) where
#0 0x005e77a2 in _dl_sysinfo_int80 () from /ddisk/d303/dumps/mhx239131/ld-linux.so.2
#1 0x0068ab60 in __nanosleep_nocancel () from /ddisk/d303/dumps/mhx239131/libc.so.6
#2 0x080520a2 in Sleep ()
#3 0x08049ac1 in pollLDEs ()
#4 0x0804d1cd in doOnChange ()
#5 0x0804a642 in main ()
We have tried to reproduce the problem in our environment, but we could not.
What could cause the core dump?
Please help me avoid this situation.
Thanks,
Naga
_dl_sysinfo_int80 is just a function which makes a system call into the kernel. So the core dump is happening during a system call (probably the one used by connect in the first example and nanosleep in the second), most likely because you are passing invalid pointers.
The invalid pointers could be due to the code that calls these functions being broken, or to something elsewhere in the program being broken and corrupting the program's memory.
Take a look two frames up (frame #2) in the core dump in both examples and check the parameters being passed. Unfortunately, it seems you did not compile with debug information, which makes them harder to see.
Additionally, I would suggest trying valgrind and seeing if it finds something.
Your program almost certainly did not dump core in either of the above places.
Most likely, you either have multiple threads in your process (and some other thread caused the core dump), or something external caused your process to die (such as kill -SIGABRT <pid>).
If you do have multiple threads, GDB's info threads and thread apply all where are likely to provide further clues.
