Linux machine 3.2.0-34-generic, x86_64
Source: libtar
Version: 1.2.11-8
#include <stdio.h>
#include <libtar.h>
#include <fcntl.h>
int main(void)
{
TAR *pTar;
char *tarFilename = "file.tar";
char *srcDir = "/home/test";
char *extractTo = ".";
tar_open(&pTar, tarFilename, NULL, O_WRONLY | O_CREAT, 0644, TAR_GNU);
tar_append_tree(pTar, srcDir, extractTo);
tar_close(pTar);
return (0);
}
Compiling as:
gcc main.c -o run -ltar
Getting segfault
Now with "gcc main.c -g3 -O0 -o run -ltar"
Program received signal SIGABRT, Aborted.
0x00007ffff7845425 in __GI_raise (sig=<optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7845425 in __GI_raise (sig=<optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff7848b8b in __GI_abort () at abort.c:91
#2 0x00007ffff788339e in __libc_message (do_abort=2,
fmt=0x7ffff798ae3f "*** %s ***: %s terminated\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:201
#3 0x00007ffff7919807 in __GI___fortify_fail (
msg=0x7ffff798add6 "buffer overflow detected") at fortify_fail.c:32
#4 0x00007ffff7918700 in __GI___chk_fail () at chk_fail.c:29
#5 0x00007ffff79179e6 in __strncpy_chk (s1=<optimized out>,
s2=<optimized out>, n=<optimized out>, s1len=<optimized out>)
at strncpy_chk.c:34
#6 0x00007ffff7bd12ef in th_finish () from /usr/lib/libtar.so.0
#7 0x00007ffff7bd0dc1 in th_write () from /usr/lib/libtar.so.0
#8 0x00007ffff7bd07f0 in tar_append_file () from /usr/lib/libtar.so.0
#9 0x00007ffff7bd3c12 in tar_append_tree () from /usr/lib/libtar.so.0
#10 0x00000000004006f1 in main () at main.c:12
All the examples I have found, e.g. the one in "Using libtar to compress a directory", seem to work fine.
It looks like a library defect
https://bugzilla.redhat.com/show_bug.cgi?id=538770
You may need to upgrade libtar-devel
In my x86-64 Linux program I deliberately do:
char *ptr = 0x3e8;
int x = *(int *)ptr;
When I run it in gdb the process crashes due to SIGSEGV and prints a valid backtrace. If I do instead:
char s[16];
snprintf(s, 16, "%s\n", ptr);
The process still crashes but the backtrace is trash:
(gdb) bt
#0 0x00007ffff5da15c7 in ?? ()
#1 0x00007ffff5c704d3 in ?? ()
#2 0x0000000000000000 in ?? ()
My example may look contrived but my production code is crashing in snprintf() in exactly the same way. I've compiled with -g -O0.
The process still crashes but the backtrace is trash
When I build this test:
#include <stdio.h>
int main()
{
char *ptr = (char *)0x3e8;
char s[16];
snprintf(s, 16, "%s\n", ptr);
return 0;
}
Using gcc (Debian 9.3.0-3) 9.3.0 and GNU C Library (Debian GLIBC 2.30-4) stable release version 2.30, with libc6-dbg installed, I get:
Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:96
96 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:96
#1 0x00007ffff7e48756 in __vfprintf_internal (s=s@entry=0x7fffffffd8b0, format=format@entry=0x555555556004 "%s\n", ap=ap@entry=0x7fffffffda30, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1688
#2 0x00007ffff7e5a1f6 in __vsnprintf_internal (string=0x7fffffffdb10 "", maxlen=<optimized out>, format=0x555555556004 "%s\n", args=args@entry=0x7fffffffda30, mode_flags=mode_flags@entry=0) at vsnprintf.c:114
#3 0x00007ffff7e335a2 in __GI___snprintf (s=<optimized out>, maxlen=<optimized out>, format=<optimized out>) at snprintf.c:31
#4 0x0000555555555169 in main () at t.c:7
I suspect that you'll get a similar result from this test case on a standard x86 Ubuntu 18.04, in which case you are not telling us the whole story; an MCVE would help a lot in getting you a real answer.
Suppose I have:
#include <stdlib.h>
int main()
{
int a = 2, b = 3;
if (a!=b)
abort();
}
Compiled with:
gcc -g c.c
Running this, I'll get a coredump (due to the SIGABRT raised by abort()), which I can debug with:
gdb a.out core
How can I get gdb to print the values of a and b from this context?
Here's another way to get the values of a and b: move to the frame you are interested in, and info locals will show them.
a.out was compiled from your code. Frame 2, i.e. main(), is the one you are interested in.
$ gdb ./a.out core
[ removed some not-so-interesting info here ]
Reading symbols from ./a.out...done.
[New LWP 14732]
Core was generated by `./a.out'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fac16269f5d in __GI_abort () at abort.c:90
#2 0x00005592862f266d in main () at f.c:7
(gdb) frame 2
#2 0x00005592862f266d in main () at f.c:7
7 abort();
(gdb) info locals
a = 2
b = 3
(gdb) q
You can also use print once you are in frame 2:
(gdb) print a
$1 = 2
(gdb) print b
$2 = 3
Did you compile with debug symbols -g? The command should be bt for backtrace, you can also use bt full for a full backtrace.
More info: https://sourceware.org/gdb/onlinedocs/gdb/Backtrace.html
My objective is to hook the open function that dlopen on linux uses. For some reason, this code is not hooking dlopen->open, but it does hook my version of open main.c->open. Is dlopen not using my symbols somehow?
Compilation process is as follows:
gcc main.c -ldl -ggdb
gcc fake-open.c -o libexample.so -fPIC -shared
export LD_PRELOAD="$PWD/libexample.so"
When I run the program, everything works (I have made sure the LD_PRELOAD variable is set, etc.).
Here is the problem, when I try to hook the open function directly or indirectly called by dlopen, somehow this "version" of open is not being resolved/redirected/hooked by my version.
[main.c]
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
puts("calling open");
int fd = open("/tmp/test.so", O_RDONLY|O_CLOEXEC);
puts("calling dlopen");
int *handle = dlopen("/tmp/test.so", RTLD_LAZY);
}
[fake-open.c]
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/stat.h>
//#include <fcntl.h>
int open(const char *pathname, int flags)
{
puts("from hooked..");
return 1;
}
Console Output:
calling open
from hooked..
calling dlopen
I know for a fact dlopen is somehow calling open due to strace.
write(1, "calling open\n", 13calling open
) = 13
write(1, "from hooked..\n", 14from hooked..
) = 14
write(1, "calling dlopen\n", 15calling dlopen
) = 15
brk(0) = 0x804b000
brk(0x806c000) = 0x806c000
open("/tmp/test.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0`\205\4\0104\0\0\0"..., 512) = 512
But, for some reason, when dlopen calls open, it is not using my version of open. This has to be some kind of link-time or run-time symbol resolution problem, or perhaps dlopen is using a static version of open and doesn't need to resolve any symbols at run or load time?
First, contrary to @usr's answer, dlopen does open the library.
We can confirm this by running a simple test under GDB:
// main.c
#include <dlfcn.h>
int main()
{
void *h = dlopen("./foo.so", RTLD_LAZY);
return 0;
}
// foo.c; compile with "gcc -fPIC -shared -o foo.so foo.c"
int foo() { return 0; }
Let's compile and run this:
gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x400605: file main.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) catch syscall open
Catchpoint 2 (syscall 'open' [2])
(gdb) c
Continuing.
Catchpoint 2 (call to syscall open), 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff7ddf5bd in open_verify (name=0x602010 "./foo.so", fbp=0x7fffffffd568, loader=<optimized out>, whatcode=<optimized out>, found_other_class=0x7fffffffd550, free_name=<optimized out>) at dl-load.c:1930
#2 0x00007ffff7de2d6f in _dl_map_object (loader=loader@entry=0x7ffff7ffe1c8, name=name@entry=0x4006a4 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0x00007ffff7deea14 in dl_open_worker (a=a@entry=0x7fffffffdae8) at dl-open.c:235
#4 0x00007ffff7de9fc4 in _dl_catch_error (objname=objname@entry=0x7fffffffdad8, errstring=errstring@entry=0x7fffffffdae0, mallocedp=mallocedp@entry=0x7fffffffdad0, operate=operate@entry=0x7ffff7dee960 <dl_open_worker>, args=args@entry=0x7fffffffdae8) at dl-error.c:187
#5 0x00007ffff7dee37b in _dl_open (file=0x4006a4 "./foo.so", mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=1, argv=0x7fffffffde28, env=0x7fffffffde38) at dl-open.c:661
#6 0x00007ffff7bd702b in dlopen_doit (a=a@entry=0x7fffffffdd00) at dlopen.c:66
#7 0x00007ffff7de9fc4 in _dl_catch_error (objname=0x7ffff7dd9110 <last_result+16>, errstring=0x7ffff7dd9118 <last_result+24>, mallocedp=0x7ffff7dd9108 <last_result+8>, operate=0x7ffff7bd6fd0 <dlopen_doit>, args=0x7fffffffdd00) at dl-error.c:187
#8 0x00007ffff7bd762d in _dlerror_run (operate=operate@entry=0x7ffff7bd6fd0 <dlopen_doit>, args=args@entry=0x7fffffffdd00) at dlerror.c:163
#9 0x00007ffff7bd70c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x0000000000400614 in main () at main.c:4
This tells you that on a 64-bit system, dlopen calls open64 instead of open, so your interposer wouldn't work (you'd need to interpose open64 instead).
But you are on a 32-bit system (as evidenced by the 0x806c000 etc. addresses printed by strace), and there the stack trace looks like this:
#0 0xf7ff3774 in open () at ../sysdeps/unix/syscall-template.S:81
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
#2 0xf7fe4114 in _dl_map_object (loader=loader@entry=0xf7ffd938, name=name@entry=0x8048590 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0xf7feec14 in dl_open_worker (a=0xffffccdc) at dl-open.c:235
#4 0xf7feac06 in _dl_catch_error (objname=objname@entry=0xffffccd4, errstring=errstring@entry=0xffffccd8, mallocedp=mallocedp@entry=0xffffccd3, operate=operate@entry=0xf7feeb50 <dl_open_worker>, args=args@entry=0xffffccdc) at dl-error.c:187
#5 0xf7fee644 in _dl_open (file=0x8048590 "./foo.so", mode=-2147483647, caller_dlopen=0x80484ea <main+29>, nsid=<optimized out>, argc=1, argv=0xffffcf74, env=0xffffcf7c) at dl-open.c:661
#6 0xf7fafcbc in dlopen_doit (a=0xffffce90) at dlopen.c:66
#7 0xf7feac06 in _dl_catch_error (objname=0xf7fb3070 <last_result+12>, errstring=0xf7fb3074 <last_result+16>, mallocedp=0xf7fb306c <last_result+8>, operate=0xf7fafc30 <dlopen_doit>, args=0xffffce90) at dl-error.c:187
#8 0xf7fb037c in _dlerror_run (operate=operate@entry=0xf7fafc30 <dlopen_doit>, args=args@entry=0xffffce90) at dlerror.c:163
#9 0xf7fafd71 in __dlopen (file=0x8048590 "./foo.so", mode=1) at dlopen.c:87
#10 0x080484ea in main () at main.c:4
So why isn't open_verify's call to open redirected to your open interposer?
First, let's look at the actual call instruction in frame 1:
(gdb) fr 1
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
1930 dl-load.c: No such file or directory.
(gdb) x/i $pc-5
0xf7fe120c <open_verify+60>: call 0xf7ff3760 <open>
Compare this to the call instruction in frame 10:
(gdb) fr 10
#10 0x080484ea in main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) x/i $pc-5
0x80484e5 <main+24>: call 0x80483c0 <dlopen@plt>
Notice anything different?
That's right: the call from main goes through the procedure linkage table (PLT), which the dynamic loader (ld-linux.so.2) resolves to the appropriate definition.
But the call to open in frame 1 does not go through the PLT (and thus is not interposable).
Why is that? Because that call must work before there is any other definition of open available: it is used while libc.so.6 (which normally supplies the definition of open) is itself being loaded by the dynamic loader.
For this reason, the dynamic loader must be entirely self-contained (in fact, it contains a statically linked copy of a subset of libc).
My objective is to hook the open function that dlopen on linux uses.
For the reason above, this objective can't be achieved via LD_PRELOAD. You'll need to use some other hooking mechanism, such as patching the loader executable code at runtime.
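For calls that do go through the PLT (such as the direct open call in the question's main.c), a forwarding interposer is usually more useful than one that just returns 1. Below is a minimal sketch, assuming glibc on Linux; it is not the asker's fake-open.c, and it will still not see the loader's internal open for the reason given above. It takes the variadic mode argument into account and forwards to the next definition of open found via dlsym(RTLD_NEXT, ...). The open_fn typedef and the log message are made up for this illustration.

```c
#define _GNU_SOURCE          /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Illustrative typedef: the variadic signature of the libc open() wrapper. */
typedef int (*open_fn)(const char *, int, ...);

/* Interposed open(): logs the call, then forwards it to the next
 * definition of "open" in symbol search order (normally libc's). */
int open(const char *pathname, int flags, ...)
{
    static open_fn real_open;   /* cached result of the dlsym() lookup */
    mode_t mode = 0;
    int needs_mode = (flags & O_CREAT) != 0;

#ifdef O_TMPFILE
    /* O_TMPFILE also takes a mode argument. */
    if ((flags & O_TMPFILE) == O_TMPFILE)
        needs_mode = 1;
#endif

    /* The third argument is only present when a file may be created. */
    if (needs_mode) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    if (real_open == NULL) {
        real_open = (open_fn)dlsym(RTLD_NEXT, "open");
        if (real_open == NULL)
            return -1;          /* no further definition found */
    }

    fprintf(stderr, "hooked open(\"%s\")\n", pathname);
    return real_open(pathname, flags, mode);
}
```

Built the same way as in the question (gcc fake-open.c -o libexample.so -fPIC -shared, adding -ldl on older glibc versions) and loaded with LD_PRELOAD, this should log the open call made from main and still return a real file descriptor. An open64 wrapper would follow the same pattern.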
I'm trying to invoke gdb with a stripped executable and a separate debug symbols file, on a core dump generated from running the stripped executable.
But when I use the separate debug symbols file, gdb is unable to give information on local variables for me.
Here is a log showing entirely how I produce my 3 ELF files and the core file and then run them through gdb 3 times.
First I just run gdb with the stripped executable and of course can't see any file names or line numbers, and can't inspect variables.
Then I run gdb using the stripped executable and grabbing the debug symbols from the original unstripped executable. This works pretty well but does give a disturbing and apparently unwarranted warning about the core and executable possibly mismatching.
Finally I run gdb with the stripped executable and the separate debug file. This still gives filenames and line numbers, but I can't inspect local variables and I get a "can't compute CFA for this frame" error.
Here is the log:
2016-09-16 16:01:45 barry@somehost ~/proj/segfault/segfault
$ cat segfault.c
#include <stdio.h>
int main(int argc, char **argv) {
char *badpointer = (char *)0x2398723;
printf("badpointer: %s\n", badpointer);
return 0;
}
2016-09-16 16:03:31 barry@somehost ~/proj/segfault/segfault
$ gcc -g -o segfault segfault.c
2016-09-16 16:03:37 barry@somehost ~/proj/segfault/segfault
$ objcopy --strip-debug segfault segfault.stripped
2016-09-16 16:03:40 barry@somehost ~/proj/segfault/segfault
$ objcopy --only-keep-debug segfault segfault.debug
2016-09-16 16:03:43 barry@somehost ~/proj/segfault/segfault
$ ./segfault.stripped
Segmentation fault (core dumped)
2016-09-16 16:03:48 barry@somehost ~/proj/segfault/segfault
$ ll /tmp/core.segfault.stripp.11
-rw------- 1 barry bsm-it 188416 2016-09-16 16:03 /tmp/core.segfault.stripp.11
2016-09-16 16:03:51 barry@somehost ~/proj/segfault/segfault
$ gdb ./segfault.stripped /tmp/core.segfault.stripp.11
GNU gdb (GDB) Fedora (7.0.1-50.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/barry/proj/segfault/segfault/segfault.stripped...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2 0x00000000004004f4 in main ()
(gdb) up
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2 0x00000000004004f4 in main ()
(gdb) p argc
No symbol table is loaded. Use the "file" command.
(gdb) q
2016-09-16 16:04:19 barry@somehost ~/proj/segfault/segfault
$ gdb -q -e ./segfault.stripped -s ./segfault -c /tmp/core.segfault.stripp.11
Reading symbols from /home/barry/proj/segfault/segfault/segfault...done.
warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2 0x00000000004004f4 in main (argc=1, argv=0x7fffd1c0a728) at segfault.c:4
(gdb) up
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2 0x00000000004004f4 in main (argc=1, argv=0x7fffd1c0a728) at segfault.c:4
4 printf("badpointer: %s\n", badpointer);
(gdb) p argc
$1 = 1
(gdb) q
2016-09-16 16:04:39 barry@somehost ~/proj/segfault/segfault
$ gdb -q -e ./segfault.stripped -s ./segfault.debug -c /tmp/core.segfault.stripp.11
Reading symbols from /home/barry/proj/segfault/segfault/segfault.debug...done.
warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0 0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2 0x00000000004004f4 in main (argc=can't compute CFA for this frame
) at segfault.c:4
(gdb) up
#1 0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2 0x00000000004004f4 in main (argc=can't compute CFA for this frame
) at segfault.c:4
4 printf("badpointer: %s\n", badpointer);
(gdb) p argc
can't compute CFA for this frame
(gdb) q
I have some questions about this:
Why does it display the warning "warning: core file may not match specified executable file.", even though I'm using the exact same executable path as was used when the core dump was originally generated?
Why does using the separate debug symbols (-s ./segfault.debug) result in the error "can't compute CFA for this frame" when attempting to inspect local variables?
What is a CFA anyway?
Am I using an incorrect method to produce the debug symbol file?
I confirmed that using "objcopy --strip-debug" gives the same result as "strip -g".
Am I using the right options to feed the debug info into gdb?
My intention is that the stripped executables will be installed on a binary-compatible production system and any core dumps generated due to segfaults can be copied back to the devel system where we can feed them into gdb with the debug info and analyse the crash position and stack variables. But as a first step I'm trying to sort out the issues with using separate debug info files on the devel system.
It seems that using a separate debug symbols file causes the "can't compute CFA for this frame" error, even when a core file is not used.
My gcc version:
2016-09-16 16:07:39 barry@somehost ~/proj/segfault/segfault
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)
I suspect that gdb might be looking for symbols related to the variables in the segfault.debug file when objcopy actually only put them in the segfault.stripped file. If this is the case, perhaps some small adjustment to the options to objcopy could put those symbols in the place gdb is looking?
I commend you for wanting to keep a set of symbol files for everything that is deployed to the production server; in my opinion this is an often overlooked practice, but you will not regret it -- one day it will save you a lot of debugging trouble.
As I have had similar issues in the past, I will try to answer some of your questions. You have quite an ancient toolchain, if you don't mind me saying so, so I'm not sure how much of this applies here, but I'll post it anyway.
CFA = Canonical Frame Address. This is the base pointer to the stack frame that every local variable is addressed relative to. If you have done some traditional x86 assembly programming, the BP register was used for this. So "can't compute CFA for this frame" basically says "I know of these local variables, but I don't know where they are located on the stack".
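To make that a bit more concrete, here is a minimal sketch, assuming GCC or Clang and a target where the stack grows downward (such as x86-64). It reads the current function's CFA with the compiler builtin __builtin_dwarf_cfa() and measures how far below it a local variable sits. The function name cfa_minus_local is made up for this illustration, and the exact distance depends on compiler, flags and target.

```c
#include <stdint.h>

/* Returns the distance in bytes from this function's CFA down to one of
 * its local variables. noinline keeps the function in its own frame. */
__attribute__((noinline))
long cfa_minus_local(void)
{
    volatile int local = 42;   /* volatile + address-taken: lives on the stack */

    /* GCC/Clang builtin: the Canonical Frame Address of the current frame. */
    uintptr_t cfa = (uintptr_t)__builtin_dwarf_cfa();
    uintptr_t addr = (uintptr_t)&local;

    /* On a downward-growing stack the local lies below the CFA, so this is
     * the positive offset a debugger would subtract from the CFA. */
    return (long)(cfa - addr);
}
```

A debugger does the reverse: the debug info gives it a rule for computing the CFA plus an offset for each local, and if it cannot evaluate that rule it has no way to find the variable, which is what the error message is saying.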
There used to be code in GDB that worked only for the DWARF-2 debugging format, and non-conformance triggered this particular error at least. That restriction was lifted some time ago, but that change won't be in your version.
The other thing is that debug information describing how variables are moved around is not always generated. This usually happens with newer compilers, though, as they get better at optimizing.
I was able to get rid of my problems by compiling like this:
gcc -g3 -gdwarf-2 -fvar-tracking -fvar-tracking-assignments -o segfault segfault.c
You can try this to see if it solves your problem, too.
Regarding the message about the location of the symbol file; it seems that the debugger wants to load it from the system directory. Maybe you have to link the executable to the symbol file with:
objcopy --add-gnu-debuglink=segfault.debug segfault
I found this question while searching for an answer to the following part of the original question:
Why does it display the warning "warning: core file may not match
specified executable file.", even though I'm using the exact same
executable path as was used when the core dump was originally
generated?
There was not an answer to this particular question but through experimentation and research I believe I have found the answer.
Below is a transcript of using gdb to debug a core file. Notice that the "warning: core file may not match specified executable file." message appears when the name of the executable file that produced the core is longer than 15 characters.
[~/t]$cat do_abort.c
#include <stdlib.h>
int func4(int f) { if(f) {abort();} return 0;}
int func3(int f) { return func4(f); }
int func2(int f) { return func3(f); }
int func1(int f) { return func2(f); }
int main(void) { return func1(1); }
[~/t]$gcc -g -o 123456789012345 do_abort.c
[~/t]$./123456789012345
Aborted (core dumped)
[~/t]$ll core*
-rw-------. 1 dev wheel 240K Apr 22 03:19 core.42697
[~/t]$gdb -q -c core.42697 123456789012345
Reading symbols from /home/dev/t/123456789012345...done.
[New LWP 42697]
Core was generated by `./123456789012345'.
Program terminated with signal 6, Aborted.
#0 0x00007f0be67631d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007f0be67631d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f0be67648c8 in __GI_abort () at abort.c:90
#2 0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3 0x000000000040055f in func3 (f=1) at do_abort.c:4
#4 0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5 0x000000000040058d in func1 (f=1) at do_abort.c:6
#6 0x000000000040059d in main () at do_abort.c:7
(gdb) quit
[~/t]$rm core.42697
[~/t]$
[~/t]$mv 123456789012345 1234567890123456
[~/t]$./1234567890123456
Aborted (core dumped)
[~/t]$ll core*
-rw-------. 1 dev wheel 240K Apr 22 03:20 core.42721
[~/t]$gdb -q -c core.42721 1234567890123456
Reading symbols from /home/dev/t/1234567890123456...done.
warning: core file may not match specified executable file.
[New LWP 42721]
Core was generated by `./1234567890123456'.
Program terminated with signal 6, Aborted.
#0 0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5b271fb8c8 in __GI_abort () at abort.c:90
#2 0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3 0x000000000040055f in func3 (f=1) at do_abort.c:4
#4 0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5 0x000000000040058d in func1 (f=1) at do_abort.c:6
#6 0x000000000040059d in main () at do_abort.c:7
(gdb) quit
[~/t]$mv 1234567890123456 123456789012345
[~/t]$gdb -q -c core.42721 123456789012345
Reading symbols from /home/dev/t/123456789012345...done.
[New LWP 42721]
Core was generated by `./1234567890123456'.
Program terminated with signal 6, Aborted.
#0 0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5b271fb8c8 in __GI_abort () at abort.c:90
#2 0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3 0x000000000040055f in func3 (f=1) at do_abort.c:4
#4 0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5 0x000000000040058d in func1 (f=1) at do_abort.c:6
#6 0x000000000040059d in main () at do_abort.c:7
(gdb) quit
Following through the gdb source code, I discovered that the ELF core file structure reserves only sixteen bytes to hold the executable filename, pr_fname[16], including the nul terminator:
struct elf_external_linux_prpsinfo32_ugid32
{
char pr_state; /* Numeric process state. */
char pr_sname; /* Char for pr_state. */
char pr_zomb; /* Zombie. */
char pr_nice; /* Nice val. */
char pr_flag[4]; /* Flags. */
char pr_uid[4];
char pr_gid[4];
char pr_pid[4];
char pr_ppid[4];
char pr_pgrp[4];
char pr_sid[4];
char pr_fname[16]; /* Filename of executable. */
char pr_psargs[80]; /* Initial part of arg list. */
};
The "warning: core file may not match specified executable file." warning will be issued by gdb when the name of the executable passed on the command-line to gdb doesn't match the value stored in pr_fname[] in the core file.
Using the demonstration I showed at the start of this answer, when the filename is 1234567890123456 the filename stored in the core file as pr_fname[] is 123456789012345 (truncated to 15 characters). If gdb is started using gdb -c core.XXXX 1234567890123456 then the warning will be issued. If gdb is started using gdb -c core.XXXX 123456789012345 then the warning will not be issued.
It should follow that in the example from the original question, if segfault.stripped was renamed to segfault.stripp and gdb was run using gdb ./segfault.stripp /tmp/core.segfault.stripp.11 then the warning should not be issued.
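The truncation rule described above can be sketched in a few lines of C. This is a simplified model of the comparison, not the actual gdb or BFD code; the names PR_FNAME_SIZE, stored_fname and name_mismatch_expected are made up for this illustration.

```c
#include <stdbool.h>
#include <string.h>

#define PR_FNAME_SIZE 16   /* size of pr_fname[] in the struct shown above */

/* Returns the last path component of path. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Models what ends up in pr_fname[]: at most 15 characters of the
 * executable's basename plus the nul terminator. */
void stored_fname(const char *exec_path, char out[PR_FNAME_SIZE])
{
    const char *name = base_name(exec_path);
    size_t len = strlen(name);

    if (len > PR_FNAME_SIZE - 1)
        len = PR_FNAME_SIZE - 1;
    memcpy(out, name, len);
    out[len] = '\0';
}

/* True if the name given to gdb would differ from the truncated name
 * stored in the core file, i.e. the case where the warning appears. */
bool name_mismatch_expected(const char *exec_path)
{
    char stored[PR_FNAME_SIZE];

    stored_fname(exec_path, stored);
    return strcmp(stored, base_name(exec_path)) != 0;
}
```

Under this model, name_mismatch_expected("./1234567890123456") and name_mismatch_expected("./segfault.stripped") are true, while the 15-character names 123456789012345 and segfault.stripp give false, which matches the transcript.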
Consider the following Contiki program.
#include<stdio.h>
#include"contiki.h"
#include <stdlib.h>
static char *mem;
static int x;
/*---------------------------------------------------------------------------*/
PROCESS(test, "test");
AUTOSTART_PROCESSES(&test);
/*---------------------------------------------------------------------------*/
PROCESS_THREAD(test, ev, data)
{
PROCESS_BEGIN();
printf("before malloc\n");
mem=(char*)malloc(10);
for(x=0;x<10;x++)
mem[x]=x+1;
printf("after malloc\n");
PROCESS_END();
}
When this program is compiled for native/z1/wismote/cooja, it executes perfectly fine and both printf statements are executed. But when it is compiled for the mbxxx target and run on hardware, only the first printf statement is executed and the code gets stuck in malloc. Any guess at the reason behind this behaviour? I am also attaching the GDB trace here.
(gdb) mon reset init
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000efc msp: 0x20000500
(gdb) b test.c:16
Breakpoint 1 at 0x8000ec8: file test.c, line 16.
(gdb) b test.c:17
Breakpoint 2 at 0x8000ece: file test.c, line 17.
(gdb) b test.c:18
Breakpoint 3 at 0x8000ed8: file test.c, line 18.
(gdb) load
Loading section .isr_vector, size 0x84 lma 0x8000000
Loading section .text, size 0xc5c4 lma 0x8000084
Loading section .data, size 0x660 lma 0x800c648
Start address 0x8000084, load size 52392
Transfer rate: 15 KB/sec, 8732 bytes/write.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, process_thread_test (process_pt=0x2000050c <test+12>, ev=129 '\201', data=0x0) at test.c:16
16 printf("before malloc\n");
(gdb) c
Continuing.
Breakpoint 2, process_thread_test (process_pt=0x2000050c <test+12>, ev=<optimized out>,
data=<optimized out>) at test.c:17
17 mem=(char*)malloc(10);
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
Default_Handler () at ../../cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c:87
87 {
(gdb) bt
#0 Default_Handler () at ../../cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c:87
#1 <signal handler called>
#2 0x08000440 in _malloc_r ()
#3 0x08000ed4 in process_thread_test (process_pt=0x2000050c <certificate_check+12>, ev=<optimized out>,
data=<optimized out>) at test.c:17
#4 0x0800272c in call_process (p=0x20000500 <test>, ev=<optimized out>, data=<optimized out>)
at ../../core/sys/process.c:190
#5 0x080028e6 in process_post_synch (p=<optimized out>, ev=ev@entry=129 '\201', data=<optimized out>)
at ../../core/sys/process.c:366
#6 0x0800291a in process_start (p=<optimized out>, arg=arg@entry=0x0) at ../../core/sys/process.c:120
#7 0x08002964 in autostart_start (processes=<optimized out>) at ../../core/sys/autostart.c:57
#8 0x08001134 in main () at ../../platform/mbxxx/./contiki-main.c:210
(gdb)
Ahhh... Finally figured out the problem. This particular problem was there because stm32w108 was not configured to use dynamic memory.
All that was needed was to open the following file:
contiki-2.7/cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c and add #define USE_HEAP at the top of the file or before the _sbrk implementation! The heap size can also be modified here, not from the linker script, although the stack size
A side note: It is a really bad idea to use dynamic memory allocation in embedded systems, so avoid it! It's filthy, trust me! Eventually I will also remove any dynamic memory allocation references! :)
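For context on the USE_HEAP fix: on newlib-based toolchains, malloc gets its memory by calling an _sbrk hook that the platform has to supply, so malloc cannot work if that hook has no heap region to hand out. Below is a minimal sketch of the general idea. It is not the code from crt-stm32w108.c; the names demo_sbrk, demo_heap and DEMO_HEAP_SIZE are hypothetical and chosen so the sketch can be compiled on a PC.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_HEAP_SIZE 1024   /* hypothetical heap size in bytes */

static unsigned char demo_heap[DEMO_HEAP_SIZE];  /* statically reserved heap region */
static ptrdiff_t demo_heap_used;                 /* bytes handed out so far */

/* Hypothetical stand-in for an _sbrk-style hook: moves the "program break"
 * by incr bytes and returns the previous break, or (void *)-1 with
 * errno = ENOMEM if the request does not fit in the reserved region. */
void *demo_sbrk(ptrdiff_t incr)
{
    if (incr > DEMO_HEAP_SIZE - demo_heap_used || incr < -demo_heap_used) {
        errno = ENOMEM;
        return (void *)-1;
    }

    void *previous_break = &demo_heap[demo_heap_used];
    demo_heap_used += incr;
    return previous_break;
}
```

Each successful call hands out the next chunk of the static array, and once the array is exhausted the hook reports ENOMEM so that malloc can return NULL instead of faulting.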