symbol-file command working with older gdb but not with newer gdb - c

I can't get gdb on one system to use a symbol file to show source files and line numbers in a backtrace, although the exact same commands work fine on an older system. The older system where this works is 32-bit Debian 6, and the modern one where it doesn't is 64-bit Debian 10.
In both cases I built the program, ran it, and ran gdb on the same system (so no binaries or cores cross between systems).
I can reproduce this with a simple toy program (test.c) designed just to crash immediately:
int main()
{
int *i=0;
*i=0;
return 0;
}
I compile it and split it into a stripped executable and a symbols file, and run the executable:
gcc -g test.c -o test
objcopy --only-keep-debug test test.dbg
objcopy --strip-debug test
ulimit -c unlimited
./test
I then open the core dump in gdb. On the older system (32-bit, gdb 7.0.1-debian), the backtrace initially has no source information, but once I run symbol-file test.dbg I see the source file and line in the backtrace (test.c:4). (I've omitted some early gdb output that clutters the screen and that I don't think is relevant; I can add it if it's needed.)
gdb -c core test
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
#0 0x080483a4 in main ()
(gdb) symbol-file test.dbg
Reading symbols from /home/user/test.dbg...done.
(gdb) bt
#0 0x080483a4 in main () at test.c:4
(gdb) quit
Running the exact same sequence on a newer machine (64-bit, gdb 8.2.1), I get the following
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055e5dda71135 in main ()
(gdb) symbol-file test.dbg
Load new symbol table from "test.dbg"? (y or n) y
Reading symbols from test.dbg...done.
(gdb) bt
#0 0x000055e5dda71135 in ?? ()
#1 0x000055e5dda71150 in ?? ()
#2 0x00007eff7035f09b in __libc_start_main (main=0x55e5dda71125, argc=1, argv=0x7fff46187ec8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff46187eb8)
at ../csu/libc-start.c:308
#3 0x000055e5dda7106a in ?? ()
#4 0x00007fff46187eb8 in ?? ()
#5 0x000000000000001c in ?? ()
#6 0x0000000000000001 in ?? ()
#7 0x00007fff46188e8f in ?? ()
#8 0x0000000000000000 in ?? ()
Loading the symbol file not only doesn't add the sources and lines, but seems to lose the main() in the first frame. I've also tried using add-symbol-file instead of symbol-file and including the symbol file on the command line with the -s option but with no better results. I've also tried putting -s before -c as I found recommended online but that didn't help either.
Edit: Transcript that was requested below:
$ gcc -g test.c -o test
$ objcopy --only-keep-debug test test.dbg
$ ./test
Segmentation fault (core dumped)
$ gdb test.dbg core
GNU gdb (Debian 8.2.1-2+b1) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test.dbg...done.
warning: core file may not match specified executable file.
[New LWP 26602]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000056477c36d135 in ?? ()
(gdb) bt
#0 0x000056477c36d135 in ?? ()
#1 0x000056477c36d150 in ?? ()
#2 0x00007fa3a18a509b in __libc_start_main (main=0x56477c36d125, argc=1, argv=0x7ffd0b6aed38, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd0b6aed28)
at ../csu/libc-start.c:308
#3 0x000056477c36d06a in ?? ()
#4 0x00007ffd0b6aed28 in ?? ()
#5 0x000000000000001c in ?? ()
#6 0x0000000000000001 in ?? ()
#7 0x00007ffd0b6afe8d in ?? ()
#8 0x0000000000000000 in ?? ()
(gdb)

Try building your test binary with -fno-pie -no-pie.
Your working example has a non-PIE address (0x0804xxxx). Your non-working example has a PIE address (0x000055....). Debian decided to make PIE binaries the default, which adds a modicum of security to the system.
What is likely happening is that GDB throws away the relocation info when it reloads the symbol-file.
P.S. You can avoid this whole problem by using:
gdb test.dbg core
which would start GDB on the right foot from the get-go.
P.P.S. It's generally a really bad idea to name any binary test, because that may interfere with shell evaluation of conditionals (if the shell finds ./test instead of /bin/test, and doesn't have test as built-in).

Related

What does ?? in gdb backtrace mean and how to get the actual stack frames?

I was trying to learn how to use gdb on core dumps.
Here is the code:
int main()
{
return 1/0;
}
This is the gdb output, when I run gdb a.out core:
warning: exec file is newer than core file.
[New LWP 3121]
Core was generated by `./crash'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x00000000004004fc in ?? ()
(gdb) bt
#0 0x00000000004004fc in ?? ()
#1 0x0000000000400500 in ?? ()
#2 0x00007f6ea0945b97 in ?? ()
#3 0x0000000000000000 in ?? ()
What are ?? in the backtrace? How can I resolve them?
Those ?? are usually where the name of the function is displayed. GDB does not know the name of those functions and therefore displays ??.
Now, why is this happening? Depends. GCC compiles including symbols (e.g. function names and similar) by default. Most probably you are working with a stripped version, where symbols have been removed, or just with the wrong file.
As @zwol suggests, the line warning: exec file is newer than core file indicates that something else is going on that you don't show in your question: the core dump you are opening was generated by an earlier build of the executable, so it is outdated.
I suggest you recompile the program from scratch and make sure that you are opening the right file with GDB. First produce a new core dump by crashing the new program, then open it in GDB.
Assuming the following program.c:
int main(void) { return 1/0; }
This should work:
$ rm -f core
$ gcc program.c -o program
$ ./program
Floating point exception (core dumped)
$ gdb program core
Reading symbols from program...(no debugging symbols found)...done.
[New LWP 11896]
Core was generated by `./program'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x000055d24a4cd790 in main ()
(gdb) bt
#0 0x000055d24a4cd790 in main ()
(gdb)
NOTE: if you don't see (core dumped) when running the process that means that a core dump was not generated (which leaves you with the old one). If you are using Bash, try running the command ulimit -c unlimited before crashing the program.
What does ?? in gdb backtrace mean
It means that GDB has no idea which code the addresses in the backtrace (0x04004fc, 0x0400500, etc.) correspond to.
and how to get the actual stack frames?
That depends on why this is happening. There are two common scenarios:
You are debugging the wrong executable.
One way this could happen is when you compile with optimization, e.g. gcc -O2 main.c -o crash, let the program dump core, then recompile with debugging (e.g. gcc -g main.c -o crash) and try to debug "old" core dump with "new" executable.
Don't do that. Instead, compile with optimization and debugging: gcc -O2 -g main.c -o crash.
P.S. This warning: warning: exec file is newer than core file is intended to warn you precisely about this case.
The other common cause is when you obtain a crash on a production system and try to debug it on a development one (given the addresses which you show this is unlikely to have happened here).
For that case, see this answer.
You did not compile with debug symbols - try adding -g to the compile line

Useless stack trace by gdb(1) of "-g"-based core-file: nothing but "... in ?? ()"

I'm getting the following (useless) stack trace from gdb:
$ gdb -e pqact -c core.6067
GNU gdb (GDB) Fedora (7.2-52.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[New Thread 6067]
Cannot access memory at address 0x675
Cannot access memory at address 0x675
Cannot access memory at address 0x675
Cannot access memory at address 0x675
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Cannot access memory at address 0x675
Core was generated by `pqact -f WMO|CONDUIT|NGRID|EXP /local/ldm/etc/GEMPAK/pqact.gempak_upc'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003790044d99 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.x86_64
(gdb) where
#0 0x0000003790044d99 in ?? ()
#1 0x00000000903959a0 in ?? ()
#2 0x00000000900466e4 in ?? ()
#3 0x000000379080ee20 in ?? ()
#4 0x000000000040f81e in ?? ()
#5 0x000000379080ee60 in ?? ()
#6 0x0000000000000000 in ?? ()
(gdb) ^Z
The program was built with debugging enabled (libtool boilerplate omitted for clarity):
$ c99 ... -g ... -c -o action.o action.c
$ c99 ... -g ... -c -o filel.o filel.c
$ c99 ... -g ... -c -o palt.o palt.c
$ c99 ... -g ... -c -o pbuf.o pbuf.c
$ c99 ... -g ... -c -o pqact.o pqact.c
$ c99 ... -g ... -c -o state.o state.c
$ c99 -g -o pqact action.o filel.o palt.o pbuf.o pqact.o state.o ...
c99 is gcc on my system:
$ uname -a
Linux xxx.xxx.xxx.xxx 2.6.35.14-106.fc14.x86_64 #1 SMP Wed Nov 23 13:07:52 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ type c99
c99 is /usr/bin/c99
$ c99 -dumpversion
4.5.1
And the resulting program and core-file appear OK:
$ file pqact
pqact: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped
$ file core.6067
core.6067: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'pqact -f WMO|CONDUIT|NGRID|EXP /local/ldm/etc/GEMPAK/pqact.gempak_upc'
$ ldd pqact
linux-vdso.so.1 => (0x00007fff385ff000)
libldm.so.0 => /opt/ldm/ldm-6.13.0.0/lib/libldm.so.0 (0x00007ffcb2f3a000)
libgdbm.so.3 => /usr/lib64/libgdbm.so.3 (0x0000003790400000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00000030d8a00000)
libz.so.1 => /lib64/libz.so.1 (0x0000003d27e00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003791800000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffcb2c8c000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003790800000)
libc.so.6 => /lib64/libc.so.6 (0x0000003790000000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003790c00000)
/lib64/ld-linux-x86-64.so.2 (0x000000378fc00000)
Any ideas why the stack-trace is useless?
The most common reason is that the binaries you give GDB to analyse the core do not match the binaries that were used at the time the core was produced.
For example, if your optimized binary crashed and produced core, and then you rebuild that binary without optimization, and try to use the new binary to analyze existing core, that wouldn't work.
Updating system libraries to a newer version will also have that effect. So will copying the core from one machine to another (unless the system libraries match).
See also this answer.

gdb issue on Ubuntu 14.04

The system is Ubuntu 14.04 32bit, upgraded to the latest version.
test2.c:
#include <unistd.h>
void test() {
sleep(1000);
}
int main() {
test();
}
Compile it:
gcc /tmp/test2.c -o /tmp/test2 -g
Run and try to attach it with gdb:
$ sudo gdb -p 9038
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
...
Attaching to process 9038
Reading symbols from /tmp/test2...done.
Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
0xb7799cb0 in ?? ()
(gdb) bt
#0 0xb7799cb0 in ?? ()
#1 0x00000000 in ?? ()
gdb loads the symbols from the binary, but it cannot unwind the stack correctly. It's really weird.
Any suggestions?

How to print gcc runtime error to text file?

I'm trying to write a program that captures the errors from C code compiled with gcc and writes them to a text file.
When a compile error happens, the text file contains the error messages. But when a runtime error happens, for example a segmentation fault, the file is empty.
The segmentation fault is shown on the terminal, but it doesn't appear in the file.
I tried the following commands, but it still doesn't work.
gcc new.c &> myFile
gcc new.c > myFile 2>&1
I think what you need is a core dump file, rather than trying to catch a runtime error from gcc.
Here is how to get a core dump under Linux. I wrote a test program, test_coredump.c:
#include <stdlib.h>
int main(void){
int *ptr = NULL;
*ptr = 10; //Segmentation fault will happen since you write to a null memory
return 0;
}
Normally, I do the following before compiling:
how@ubuntu-sw:~/Work/c/test_coredump
-> ulimit -c
0
how@ubuntu-sw:~/Work/c/test_coredump
-> ulimit -c unlimited
how@ubuntu-sw:~/Work/c/test_coredump
-> ulimit -c
unlimited
how@ubuntu-sw:~/Work/c/test_coredump
-> gcc -g ./test_coredump.c
how@ubuntu-sw:~/Work/c/test_coredump
-> ls
a.out test_coredump.c
how@ubuntu-sw:~/Work/c/test_coredump
-> ./a.out
Segmentation fault (core dumped)
After this, a core dump file is generated:
how@ubuntu-sw:~/Work/c/test_coredump
-> ls
a.out core test_coredump.c
You can now use gdb, or whatever debugging tool you like, to inspect it:
how@ubuntu-sw:~/Work/c/test_coredump
-> gdb ./a.out ./core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
[New LWP 6421]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x080483fd in main () at test_coredump.c:5
5 *ptr = 10; //Segmentation fault will happen since you write to a null memory
(gdb)
You can then see where your program crashed.
This should put the error in a file properly:
gcc new.c -o new.x >& error.log
gcc compiles the program. Redirecting gcc's output will not help you with any errors that occur when your program runs after it was successfully compiled.
To run your program and redirect your program's stderr to a file, you can write ./yourprogramname 2>file.txt.
However, the Segmentation fault message is not generated by your program either. It is generated by the operating system and printed on the shell's stderr (not your program's stderr). How to redirect this message depends on your shell; see this question.

GCC doesn't produce line number information even with -g option

I have built and installed GCC 4.8.1 from source:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --disable-multilib
Thread model: posix
gcc version 4.8.1 (GCC)
And I've written a simple useless program:
$ cat hw.c
#include <stdio.h>
void foo()
{
int a;
scanf("%d", &a); /* So I can press ctrl+c here. */
printf("Hello world!\n");
}
int main()
{
foo();
}
Now I compile this:
$ gcc -g -O0 hw.c -o hw
Then started debugging it with GDB:
$ gdb hw
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/calmarius/workdir/crucible/hw/hw...done.
(gdb)
Run it and Ctrl+C it immediately:
(gdb) run
Starting program: /home/dcsirmaz/workdir/crucible/hw/hw
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7b018b0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
I got function names in the backtrace but no line numbers in my code:
(gdb) bt
#0 0x00007ffff7b018b0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007ffff7a95ff8 in _IO_new_file_underflow (fp=0x7ffff7dd4340) at fileops.c:619
#2 0x00007ffff7a9703e in _IO_default_uflow (fp=0x7ffff7dd4340) at genops.c:440
#3 0x00007ffff7a74fb6 in _IO_vfscanf_internal (s=<optimized out>, format=<optimized out>, argptr=0x7fffffffe018, errp=0x0) at vfscanf.c:620
#4 0x00007ffff7a790bd in __isoc99_scanf (format=<optimized out>) at isoc99_scanf.c:37
#5 0x000000000040054e in foo ()
#6 0x0000000000400568 in main ()
What's gone wrong? Is it maybe something with the configuration?
Your gdb is too old -- you need a more recent gdb (I use 7.6) to understand the debugging info generated by gcc 4.8.1
Usually GCC uses DWARF as its main debugging format; you need to enable DWARF support when building gcc with the flag --with-dwarf2.
When compiling, you can use -ggdb instead of -g, which produces debugging information specifically for gdb.
