Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6] - c

I know this question has been asked before, but I have read all the threads and I didn't find an answer.
From the moment I execute run to start debugging my project, I get this: Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]. When I press Ctrl+C, gdb tells me: Program received signal SIGINT, Interrupt.
0x00000000 in ?? ()
Usually it tells me which file and function it was interrupted in, not 0x00000000 in ?? ()
GDB no longer hits breakpoints, and what makes matters crazier is that a colleague and I share the same session (the debugging is done using cygwin with a remote machine), and it works fine for them but not for me.
When I try to get information about the threads using info threads, here's what I get:
[New Thread 20]
[New Thread 21]
[New Thread 22]
Id Target Id Frame
4 Thread 22 (ssp=0xa9004d5c) 0x00000000 in ?? ()
3 Thread 21 (ssp=0xa9002e64) 0x00000010 in ?? ()
2 Thread 20 (ssp=0xa9000ef4) 0x00000000 in ?? ()
The current thread <Thread ID 1> has terminated. See `help thread'
There's no thread 6, and there's no * to indicate which thread gdb is using. I don't even know whether that's linked to the problem.
Can anyone please help me?

You are not providing nearly enough info to help you. Details matter, and you are withholding them. The versions of GDB and gdbserver matter, how you invoke GDB and gdbserver matters, and any warnings you receive from GDB matter.
Now, this error message:
Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 6]
usually means that gdbserver has not attached to one of the threads of your process, and that thread has tried to execute a breakpoint instruction (you do have breakpoints set before this happens, don't you?).
One of the reasons this may happen is that your GDB loads the "wrong" libthread_db.so (one that doesn't match the target libc.so.6).
You wrote: "what makes matter crazier is the fact that a colleague and I, are sharing the same session (the debug is done using cygwin with a remote machine) and it works fine for them but not for me."
I am not sure what you mean by "same session", but it's probably not "when he types commands, they work; but when I type the same commands into the same GDB, they don't".
One difference between you and your colleague could be the LD_LIBRARY_PATH environment variable setting. Another could be in ~/.gdbinit or in ./.gdbinit.
I suggest running gdb -nx to get rid of the latter, and unsetting LD_LIBRARY_PATH to get rid of the former.
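If you want to make that "clean" start repeatable, here is a minimal sketch of how it might be scripted. The helper name clean_gdb_invocation and the default gdb path are made up for illustration; the only gdb option used is -nx, and the sketch only builds the command line and environment rather than launching anything.

```python
import os


def clean_gdb_invocation(gdb_path="gdb", environ=None):
    """Return (argv, env) for starting gdb without init files or LD_LIBRARY_PATH.

    gdb_path is a placeholder; point it at whichever gdb binary you mean to run.
    """
    env = dict(os.environ if environ is None else environ)
    # Drop LD_LIBRARY_PATH so it cannot differ between you and your colleague.
    env.pop("LD_LIBRARY_PATH", None)
    # -nx tells gdb not to read any .gdbinit file.
    argv = [gdb_path, "-nx"]
    return argv, env


if __name__ == "__main__":
    argv, env = clean_gdb_invocation()
    print("would run:", " ".join(argv))
    print("LD_LIBRARY_PATH set:", "LD_LIBRARY_PATH" in env)
```

The returned pair can be handed to subprocess.run(argv, env=env) once you are happy with it.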

The problem with the whole thing, which for some reason no one seemed to notice, was this:
I was invoking gdb as /usr/local/build/gdbx.y/gdb/gdb, when I should have been invoking /usr/local/build/gdbx.y/build/gdb/gdb.
It was a path problem.

Related

Debugging a program that uses SIGINT with gdb

I frequently work with PostgreSQL for debugging, and it uses SIGINT internally for some of its inter-backend signalling.
As a result, when running certain backends under gdb, execution tends to get interrupted a lot. One can use the handle command to make sure SIGINT is passed to the program and that it is not captured by gdb... but then gdb doesn't respond to control-C on the command line, since that sends SIGINT.
If you run:
handle SIGINT noprint nostop pass
gdb will complain
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Is there any way to get gdb to use a different interrupt signal? Or any alternative method that'd let me have gdb ignore SIGINT?
(This isn't an issue for most PostgreSQL backend debugging, but it's a pain with background workers and autovacuum).
Readers who end up on this page (as I did) with a slightly different variation of this problem, would perhaps be more interested in this question:
Debugging a segmentation fault when I do ctrl c
... and its answer, which is:
send SIGINT from inside gdb itself:
(gdb) signal 2
(Normally I would post the link as a simple comment under the OP's question on this page, but since there are already 7 comments, comments are being hidden/buried.)
If you read all the details of the OP's question here, then it is obvious that my answer is not correct for OP.
However, my answer is correct for many situations that could be described by the same title: "Debugging a program that uses SIGINT with gdb"
On UNIX-like systems, you can distinguish a tty-initiated SIGINT from one sent by kill by looking at the si_pid element in the siginfo struct. If the pid is 0, it came from a tty.
So you could do something like this:
catch signal SIGINT
commands
  if $_siginfo._sifields._kill.si_pid == 0
    print "Received SIGINT from tty"
  else
    printf "Received SIGINT from %d; continuing\n", $_siginfo._sifields._kill.si_pid
    signal SIGINT
  end
end
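To see what si_pid looks like for a SIGINT that was sent with kill rather than typed at a tty, here is a small sketch outside gdb. It assumes Linux (signal.sigtimedwait is not available everywhere) and a single-threaded process; the function name sender_of_next_sigint is made up for illustration.

```python
import os
import signal


def sender_of_next_sigint(timeout=5.0):
    """Block SIGINT, send it to ourselves with kill(), and return si_pid.

    Returns None if no SIGINT arrives within the timeout.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        # A kill()-style SIGINT, as opposed to one generated by Ctrl-C on a tty.
        os.kill(os.getpid(), signal.SIGINT)
        # Fetch the pending signal together with its siginfo.
        info = signal.sigtimedwait({signal.SIGINT}, timeout)
        return None if info is None else info.si_pid
    finally:
        # Restore the original signal mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("si_pid:", sender_of_next_sigint(), "own pid:", os.getpid())
```

For a signal sent this way, si_pid is the sender's pid (here, our own), which is the non-zero case the gdb commands above forward to the program.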
This part of gdb is a bit tricky, both due to its history and also due to the various modes of operation it supports.
One might think that running gdb in a separate terminal and only using attach would help it do the right thing, but I don't think it is that easy.
One way forward might be to only use async execution when debugging, and then use a command to interrupt the inferior. Something like:
(gdb) attach 5555
... attaches
(gdb) continue &
... lots of stuff happens
(gdb) interrupt -a
Depending on your version of gdb you might need to set target-async for this to work.

Find what interrupts sleep()

Is there any way to find out where the signal that interrupted a call to sleep() came from?
I have a ginormous amount of code, and I get this stacktrace from gdb:
#0 0x00418422 in __kernel_vsyscall ()
#1 0x001adfc6 in nanosleep () from /lib/libc.so.6
#2 0x001adde1 in sleep () from /lib/libc.so.6
#3 0x080a3cbd in MRT::setUp (this=0x9c679d8) at /code/Core/exec/mrt.cc:50
#4 0x080a1efc in main (argc=13, argv=0xbfcb6934) at /code/Core/exec/rpn.cc:211
I'm not entirely sure what all the code does, but I think this is what is going on:
Program 1 starts
Calls program 2 for shared memory allocation
Waits predetermined amount of time for allocation to complete
Program 1 continues
At the time you attached GDB to the program, the sleep was in fact not interrupted by anything -- your stack trace indicates that your program is still blocked in the sleep system call.
Do you know which address the sleep is waiting on inside setUp()? For example, sleep(&variable). Look for all callers of wakeup(&variable); one of them is what breaks the sleep. If there are too many, I would add a trace array that records the wakeups that were issued, i.e. just store the PC from which wakeup was called. You can then read that array from the core file.
If you are sure that the sleep is interruptible and that it was actually interrupted, then I would do what another poster said: catch the signal in a signal handler, capture the signal info, and re-arm it with the same signal.
If you are attaching to a running process, the process is interrupted by GDB itself to allow you to debug. The stack trace you observe is simply the stack of the running process at the time you attached to it. sleep() would not be an unreasonable system call for the process to be in when you are attaching to a process that appears to be idle.
If you are debugging a core file that shows the stack trace in sleep(), then when you start GDB to load a core file, it will display the top of the current stack frame of the core file. But just above that, it shows the signal that caused the core file. I wrote a test program, and this is what it showed when I loaded the core file into GDB:
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000400458 in main ()
(gdb)
A core file is just a snapshot of a process; it is not always the result of an internal error in the code. Sometimes it is generated by a signal delivered from an external program or the shell. Sometimes it is generated by executing the command generate-core-file from within GDB. In these cases, your core file may not actually point to anything wrong, but just to the state the program was in at the time the core file was created.

Program with SIGFPE exception behaves differently under gdb

I have a simple C program which behaves differently when debugged with gdb and not.
The program is this:
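#include <stdio.h>
#include <signal.h>
#include <unistd.h>   /* getpid() */

int main() {
    /* Send SIGFPE to ourselves, then carry on if we are still alive. */
    kill(getpid(), SIGFPE);
    printf("I'm happy.\n");
    return 0;
}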
When run by itself, I get this very strange result:
ezyang@javelin:~$ ./mini
I'm happy.
ezyang@javelin:~$ echo $?
0
No error! That is not to say that the signal is not being fired, it is:
ezyang@javelin:~$ strace -e signal ./mini
kill(31950, SIGFPE) = 0
--- SIGFPE (Floating point exception) @ 0 (0) ---
I'm happy
When in GDB, things proceed differently:
ezyang@javelin:~/Dev/ghc-build-sandbox/libraries/unix/tests/libposix$ gdb ./mini
GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /srv/code/ghc-build-sandbox/libraries/unix/tests/libposix/mini...(no debugging symbols found)...done.
(gdb) r
Starting program: /srv/code/ghc-build-sandbox/libraries/unix/tests/libposix/mini
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7a49317 in kill () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) c
Continuing.
Program terminated with signal SIGFPE, Arithmetic exception.
The program no longer exists.
Asking GDB to not stop makes no difference
(gdb) handle SIGFPE nostop
Signal Stop Print Pass to program Description
SIGFPE No Yes Yes Arithmetic exception
(gdb) r
Starting program: /srv/code/ghc-build-sandbox/libraries/unix/tests/libposix/mini
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Program received signal SIGFPE, Arithmetic exception.
Program terminated with signal SIGFPE, Arithmetic exception.
The program no longer exists.
What's going on?! For one thing, why isn't the SIGFPE killing the program; for the second thing, why is GDB behaving differently?
Update. One thought was that the child process is inheriting the signal masks of the parent. I first concluded from the following transcript that this was not the case, but that analysis was not correct; see Update 2.
ezyang@javelin:~$ trap - SIGFPE
ezyang@javelin:~$ ./mini
I'm happy.
Update 2. A friend of mine points out that trap only reports signals as set by the shell itself, and not by any parent processes. So we tracked down the ignore masks of all the parents, and lo and behold, rxvt-unicode had SIGFPE masked. A friend confirmed he could reproduce when he ran the executable using rxvt-unicode.
Ignored signals are inherited across fork() and exec*():
$ ./mini
Floating point exception (core dumped)
$ trap '' SIGFPE
$ ./mini
I'm happy.
$ trap - SIGFPE
$ ./mini
Floating point exception (core dumped)
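The same experiment can be reproduced without a shell. The sketch below is only an illustration: the child is a Python stand-in for the mini program from the question, the name run_child is made up, and it assumes a POSIX system with the call made from the main thread.

```python
import signal
import subprocess
import sys

# Stand-in for ./mini: send SIGFPE to ourselves, then print if still alive.
CHILD = 'import os, signal; os.kill(os.getpid(), signal.SIGFPE); print("I\'m happy.")'


def run_child(parent_disposition):
    """Run the child while SIGFPE has the given disposition in the parent.

    Returns (returncode, stdout); a negative returncode means the child was
    killed by that signal number.
    """
    previous = signal.signal(signal.SIGFPE, parent_disposition)
    try:
        done = subprocess.run([sys.executable, "-c", CHILD],
                              capture_output=True, text=True)
    finally:
        # Put the parent's own disposition back when it is restorable.
        if previous is not None:
            signal.signal(signal.SIGFPE, previous)
    return done.returncode, done.stdout


if __name__ == "__main__":
    print("parent ignores SIGFPE:", run_child(signal.SIG_IGN))
    print("parent uses default:  ", run_child(signal.SIG_DFL))
```

With SIG_IGN set in the parent, the child inherits it across fork() and exec and survives its own kill(); with SIG_DFL it is terminated by the signal.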
I discussed this privately with the question author. Debugging was complicated by the fact that bash saves and restores the signal mask from its parent process, and that the trap builtin only reports signals that were handled or ignored in the current shell, even though ignored signals inherited from the parent process will still take effect.
It turns out the root problem was that he was running the test inside urxvt, which links libperl, which unconditionally ignores SIGFPE.
Signal masks and signal dispositions of SIG_IGN are inherited by child processes. The only possibility I can think of is that your shell has SIGFPE masked or ignored for some reason and is not clearing this status before starting your program.
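Since trap will not show dispositions inherited from a parent, one way to check what a process really ignores is to read it from the kernel. On Linux, the SigIgn line of /proc/<pid>/status is a hexadecimal bit mask in which bit n-1 stands for signal n. A rough sketch, with made-up helper names and Linux assumed:

```python
import signal


def signals_in_mask(hex_mask):
    """Decode a hex mask (e.g. the SigIgn field) into a set of signal numbers."""
    bits = int(hex_mask, 16)
    # Bit (n - 1) set means signal number n is in the set.
    return {n for n in range(1, 65) if bits & (1 << (n - 1))}


def ignored_signals(pid="self"):
    """Return the signal numbers a process ignores (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("SigIgn:"):
                return signals_in_mask(line.split()[1])
    return set()


if __name__ == "__main__":
    # True here would mean this very process started with SIGFPE ignored.
    print(int(signal.SIGFPE) in ignored_signals())
```

Passing the pid of the shell, the terminal emulator, and so on up the chain shows where an ignored SIGFPE first appears.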

Program runs without stopping only under valgrind

My program (an SMTP server, tested with JMeter) runs without any problem under valgrind.
But it eventually fails with SIGABRT when it runs without valgrind or inside the gdb debugger.
I've tried all of valgrind's tools (memcheck, helgrind, drd, massif), but none of them reported any problem. I haven't found any memory leaks either (checked with mtrace()).
I get the following:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7101b70 (LWP 1639)]
0xb776d416 in __kernel_vsyscall ()
The backtrace shows various locations, which change from run to run. The problems always point to malloc() or free() (and always correlate with a string (char array)).
The question is: how can I find the problem if valgrind and mtrace do not show anything wrong and the program runs without stopping (under valgrind) in an endless JMeter test loop?

How do I halt the continuing in GDB

I'm pretty much using GDB for the first time.
I run
$ gdb
then I'm running
attach <mypid>
then I see that my process is stuck (which is probably ok). Now I want it to continue running, so I run
continue
and my process continues running
but from here I'm stuck if I want to look at my current stack trace etc. again. I couldn't get out of continuing... I tried Ctrl-D etc. but nothing worked for me... (was just a guess).
You should interrupt the process that gdb is attached to.
Do not interrupt gdb itself.
Either press Ctrl-C in the terminal in which the process was started, or send the process SIGINT with kill -2 procid, where procid is the ID of the attached process.
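For illustration, this is roughly what kill -2 does when issued from a script. The target below is a hypothetical stand-in, not your real process, and no debugger is attached to it, so SIGINT simply terminates it; with gdb attached, gdb would see the signal first and stop the process instead.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the debugged process: default SIGINT handling,
# announce readiness, then sleep.
TARGET = ("import signal, time; signal.signal(signal.SIGINT, signal.SIG_DFL); "
          "print('ready', flush=True); time.sleep(60)")


def interrupt_target():
    """Start the stand-in, send it SIGINT, and return its exit status."""
    proc = subprocess.Popen([sys.executable, "-c", TARGET],
                            stdout=subprocess.PIPE, text=True)
    try:
        proc.stdout.readline()            # wait until the target is running
        os.kill(proc.pid, signal.SIGINT)  # same effect as: kill -2 <pid>
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:           # clean up if something went wrong
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print("exit status:", interrupt_target())
```

A negative exit status is the number of the signal that ended the process, so -2 here corresponds to SIGINT.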
Control+C in the gdb process should bring you back to the command prompt.
Here's a short GDB tutorial, and here's a full GDB manual.
The point of debugging is to inspect interesting/suspicious parts of the program. Breakpoints allow you to stop execution at some source location, and watchpoints allow you to stop when interesting data changes.
Simple examples:
(gdb) break my_function
(gdb) cont
This will insert a breakpoint at the beginning of my_function, so when execution of the program enters the function, the program will be suspended and you will get the GDB prompt back and be able to inspect the program's state. Or you can step through the code.
(gdb) watch my_var
(gdb) cont
Same, but now the program will be stopped at whatever location that modifies the value of my_var.
Shameless plug - here's a link to my GDB presentation at NYC BSD User Group. Hope this helps.
Use the interrupt command:
gdb> help interrupt
Interrupt the execution of the debugged program.
If non-stop mode is enabled, interrupt only the current thread,
otherwise all the threads in the program are stopped. To
interrupt all running threads in non-stop mode, use the -a option.
The interrupt command also sends SIGINT to the debugged process.
gdb> info thread
Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
gdb> interrupt
[New Thread 27138.27266]
[New Thread 27138.27267]
[New Thread 27138.27268]
[New Thread 27138.27269]
[New Thread 27138.27270]
Thread 1 "loader" received signal SIGINT, Interrupt.
0x0000007fb7c02e90 in nanosleep () from target:/system/lib64/libc.so

Resources