My program (an SMTP server, load-tested with JMeter) runs without any problem under valgrind.
But it eventually fails with SIGABRT when it runs without valgrind or inside the gdb debugger.
I've tried all of the valgrind tools (memcheck, helgrind, drd, massif), but none of them reported a problem. I haven't found any memory leaks either (checked with mtrace()).
I get the following:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7101b70 (LWP 1639)]
0xb776d416 in __kernel_vsyscall ()
The backtrace shows various locations that change from run to run. The problems always point to malloc() or free() (and always involve a string (char array)).
The question is: how can I find the problem if valgrind and mtrace show nothing and the program runs without stopping (under valgrind) in an endless JMeter test loop?
Related
I am using gdb to find out why I am getting a seg fault. I run the command gdb myProg core so I can see the core dump from the seg fault. The core dump reads as follows.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI__IO_fwrite (buf=0x7f32040167a0, size=1, count=2, fp=0x0) at iofwrite.c:37
37 iofwrite.c: No such file or directory.
[Current thread is 1 (Thread 0x7f3209bac700 (LWP 20157))]
I'm having a hard time figuring out the error message. It seems to be saying that the seg fault is due to iofwrite.c but I can't seem to find any information on such a file. I assume it relates to fwrite.
You are passing a NULL fp to fwrite(). It's impossible to answer more completely without code.
I encountered this problem too; the cause was that my output file name was invalid.
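Both answers above come down to the same thing: fopen() returned NULL and the result was passed on to fwrite(). A minimal sketch of the usual guard follows; write_bytes is a hypothetical helper name used only for illustration, not something from the asker's code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write n bytes to path.
 * Returns 0 on success and -1 on any failure. */
int write_bytes(const char *path, const void *buf, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        /* fopen() failed (invalid name, missing directory, permissions...).
         * Report it and return instead of handing a NULL fp to fwrite(). */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }

    size_t written = fwrite(buf, 1, n, fp);
    int close_failed = (fclose(fp) != 0);

    return (written == n && !close_failed) ? 0 : -1;
}
```

With a check like this, an invalid output file name produces an error message naming the path rather than a crash inside fwrite().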
I know this question has been asked before, but I have read all the threads and I didn't find an answer.
From the moment I execute run to start debugging my project, I get this: Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]. When I press ctrl+c, gdb tells me: Program received signal SIGINT, Interrupt.
0x00000000 in ?? ()
Usually it tells me which file and function it was interrupted in, not 0x00000000 in ?? ().
GDB no longer hits breakpoints, and what makes matters crazier is that a colleague and I are sharing the same session (the debugging is done using cygwin with a remote machine) and it works fine for them but not for me.
When I try to get info about the threads using info threads, here's what I get:
[New Thread 20]
[New Thread 21]
[New Thread 22]
Id Target Id Frame
4 Thread 22 (ssp=0xa9004d5c) 0x00000000 in ?? ()
3 Thread 21 (ssp=0xa9002e64) 0x00000010 in ?? ()
2 Thread 20 (ssp=0xa9000ef4) 0x00000000 in ?? ()
The current thread <Thread ID 1> has terminated. See `help thread'
There's no thread 6, and there's no * to indicate which thread gdb is using. I don't even know if that's linked to the problem.
Can anyone please help me?
You are not providing nearly enough info to help you. Details matter, and you are withholding them. Versions of GDB and gdbserver matter, how you invoke GDB and gdbserver matter, what warnings you receive from GDB (if any) matter.
Now, this error message:
Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]
usually means that gdbserver has not attached to one of the threads of your process, and that thread has tried to execute a breakpoint instruction (you do have breakpoints set before this happens, don't you?).
One of the reasons this may happen is when your GDB loads "wrong" libthread_db.so (one that doesn't match the target libc.so.6).
what makes matters crazier is that a colleague and I are sharing the same session (the debugging is done using cygwin with a remote machine) and it works fine for them but not for me.
I am not sure what you mean by "same session", but it's probably not "when he types commands, they work; but when I type the same commands into the same GDB, they don't".
One difference between you and your colleague could be the LD_LIBRARY_PATH environment variable setting. Another could be in ~/.gdbinit or in ./.gdbinit.
I suggest running gdb -nx to get rid of the latter, and unsetting LD_LIBRARY_PATH to get rid of the former.
The problem with the whole thing, which for some reason no one seemed to notice, is this:
I was calling gdb as /usr/local/build/gdbx.y/gdb/gdb, when I should have been calling /usr/local/build/gdbx.y/build/gdb/gdb.
It was a path problem.
I'm compiling a C program with the flags "-Wall -W -pedantic -O0 --coverage" (GCC version 4.8.2). However, when a segmentation fault happens in that program I can't extract the coverage, because I don't have the .gcda file...
Does anyone know how I can use gcov even when a segmentation fault happens?
Thanks.
Does anyone know how I can use gcov even when a segmentation fault happens?
The coverage files are normally written by an atexit handler, which requires the program to call exit(). That does not happen when the program dies with SIGSEGV, which is why you don't get the .gcda file in that case.
The best solution is to fix whatever bug is causing SIGSEGV in the first place.
Alternatively, you could install a SIGSEGV handler, and call exit() from it. This is not guaranteed to work. For example, if your program hit SIGSEGV due to heap corruption, it may deadlock or crash again when exit calls global destructors.
Another possible solution is to run the program under GDB, and call __gcov_flush() from the debugger when you get SIGSEGV.
Is there any way to find out where the signal that interrupted a call to sleep() came from?
I have a ginormous amount of code, and I get this stacktrace from gdb:
#0 0x00418422 in __kernel_vsyscall ()
#1 0x001adfc6 in nanosleep () from /lib/libc.so.6
#2 0x001adde1 in sleep () from /lib/libc.so.6
#3 0x080a3cbd in MRT::setUp (this=0x9c679d8) at /code/Core/exec/mrt.cc:50
#4 0x080a1efc in main (argc=13, argv=0xbfcb6934) at /code/Core/exec/rpn.cc:211
I'm not entirely sure what all the code does, but I think this is what is going on:
Program 1 starts
Calls program 2 for shared memory allocation
Waits predetermined amount of time for allocation to complete
Program 1 continues
Find what interrupts sleep
At the time you attached GDB to the program, the sleep was in fact not interrupted by anything -- your stack trace indicates that your program is still blocked in the sleep system call.
Do you know what the sleep address is inside setup()? For example, sleep(&variable). Look for all callers of wakeup(&variable) and one of them is the sleep breaker. If there are too many, then I would add a trace array to remember the wakeups that were issued i.e. just store the PC from where wakeup was called...you can read that in the core file.
If you are sure that the sleep is interruptible && the sleep was actually interrupted, then I would do what one other poster said...catch the signal in a signal handler, capture signal info and re-arm it with the same signal.
If you are attaching to a running process, the process is interrupted by GDB itself to allow you to debug. The stack trace you observe is simply the stack of the running process at the time you attached to it. sleep() would not be an unreasonable system call for the process to be in when you are attaching to a process that appears to be idle.
If you are debugging a core file that shows the stack trace in sleep(), then when you start GDB to load a core file, it will display the top of the current stack frame of the core file. But just above that, it shows the signal that caused the core file. I wrote a test program, and this is what it showed when I loaded the core file into GDB:
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000400458 in main ()
(gdb)
A core file is just a process snapshot, it is not always due to an internal error from the code. Sometimes it is generated by a signal delivered from an external program or the shell. Sometimes it is generated by executing the command generate-core-file from within GDB. In these cases, your core file may not actually point to anything wrong, but just the state the program was in at the time the core file was created.
I am teaching myself to use gdb and am running some random tests. It may be worth mentioning that I am using a portable installation of MinGW on Windows 7 x64. I've created a program which I know results in a stack overflow, and as I run through it in gdb I first get two SIGSEGV signals (no surprise), and then it exits (again no surprise) with code 030000000375.
Program received signal SIGSEGV, Segmentation fault.
Program received signal SIGSEGV, Segmentation fault.
Program exited with code 030000000375.
Curiosity getting the best of me... what the heck is that code? I googled it and found very little.
Thanks!
UPDATE: For reference I tried the same program on Ubuntu, and the results are slightly different:
Program received signal SIGSEGV, Segmentation fault.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
gdb prints out the exit code in octal format. Not obvious, but indicated by the leading 0.
So 030000000375 is 0xC00000FD in hex, which makes the code look much more familiar to a Windows programmer.
0xC00000FD is STATUS_STACK_OVERFLOW and should be defined in ntstatus.h.
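The octal-to-hex conversion is easy to check with the C library. A small sketch, where parse_octal_exit_code is a hypothetical helper name:

```c
#include <stdlib.h>

/* Hypothetical helper: convert the octal text printed by gdb into a number.
 * Base 8 is passed explicitly; the leading 0 is accepted as part of the digits. */
unsigned long parse_octal_exit_code(const char *text)
{
    return strtoul(text, NULL, 8);
}
```

Printing parse_octal_exit_code("030000000375") with the format "0x%lX" gives 0xC00000FD, the value to look up in ntstatus.h.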