MPI debugging with GDB - No symbol "i" in current context - c

I need to debug my MPI application written in C. I want to use the approach of attaching GDB manually to the running processes, as recommended here (paragraph 6).
The problem is, when I try to print the value of the variable "i", I get this error:
No symbol "i" in current context.
The same happens with set var i=5. When I run info locals, GDB simply reports "No locals."
System: Ubuntu 14.04
MPICC: cc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
GDB: GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
I compile my code with the command
mpicc -o hello hello.c
and execute it with
mpiexec -n 2 ./hello
I've searched for this problem, but the usual advice is to avoid the GCC optimization options (-O). That does not help me, because I don't use any of them here and I compile with mpicc. I've already tried declaring the variable "i" as volatile and running mpicc with -g and -O0, but nothing helps.
GDB output
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 3778
Reading symbols from /home/martin/Dokumenty/Programovani/mpi_trenink/hello...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libmpich.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libmpich.so.10
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libmpl.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libmpl.so.1
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /usr/lib/libcr.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcr.so.0
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007f493e53c9a0 in __nanosleep_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
My code
#include <stdio.h>
#include <mpi.h>
#include <unistd.h> // sleep()
int main(){
    MPI_Init(NULL, NULL);
    /* DEBUGGING STOP */
    int i = 0;
    while(i == 0){
        sleep(30);
    }
    int world_size;
    MPI_Comm_size( MPI_COMM_WORLD, &world_size );
    int process_id; // often called 'world_rank'
    MPI_Comm_rank( MPI_COMM_WORLD, &process_id );
    char processor_name[ MPI_MAX_PROCESSOR_NAME ];
    int name_len;
    MPI_Get_processor_name( processor_name, &name_len );
    printf("Hello! - sent from process %d running on processor %s.\n\
Number of processors is %d.\n\
Length of proc name is %d.\n\
***********************\n",
        process_id, processor_name, world_size, name_len);
    MPI_Finalize();
    return 0;
}

Most likely GDB interrupted the process while it was deep inside the implementation of the sleep(3) function. You can check that by first issuing the bt (backtrace) command:
(gdb) bt
#0 0x00000030e0caca3d in nanosleep () from /lib64/libc.so.6
#1 0x00000030e0cac8b0 in sleep () from /lib64/libc.so.6
#2 0x0000000000400795 in main (argc=1, argv=0x7fff64ae4688) at sleeper.c:9
i is not present in the frame of nanosleep:
(gdb) info locals
No symbol table info available.
Select the stack frame of the main function by issuing the frame x command (where x is the frame number, 2 in the example shown).
(gdb) f 2
#2 0x0000000000400795 in main (argc=1, argv=0x7fff64ae4688) at sleeper.c:9
9 while(i == 0) { sleep(30); }
i should be there now:
(gdb) info locals
i = 0
You might also need to change the active thread if GDB happens to attach to the wrong one. Many MPI libraries spawn additional threads, e.g. with Intel MPI:
(gdb) info threads
3 Thread 0x7f8b9fada700 (LWP 39085) 0x00000030e0cdf1b3 in poll () from /lib64/libc.so.6
2 Thread 0x7f8b9f0d9700 (LWP 39087) 0x00000030e0cdf1b3 in poll () from /lib64/libc.so.6
* 1 Thread 0x7f8ba1b51700 (LWP 39066) 0x00000030e0caca3d in nanosleep () from /lib64/libc.so.6
The thread marked with * is the one being examined. If some other thread is active, switch to the main one with the thread 1 command.

I've finally solved this. The point is that I had to select the right stack frame with the up command before trying to print the variable "i" or change its value.
Step-by-step solution
Compile this code with mpicc -o hello hello.c -g -O0.
Launch the program with mpiexec -n 2 ./hello.
Find out the process ID (PID).
I use the command ps -e | grep hello.
Another option is simply to use pstree.
Finally, the program itself can report its PID via the POSIX function getpid().
The next step is to open a new terminal and launch GDB with the command gdb --pid debugged_process_id.
Now, in the debugger, type bt.
The output will be similar to this one:
#0 0x00007f63667e09a0 in __nanosleep_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f63667e0854 in __sleep (seconds=0)
at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x00000000004009ec in main () at hello.c:20
As we can see, frame #2 points into hello.c, so we can look at it in more detail. Type up 2.
The output will be similar to this one:
#2 0x00000000004009ec in main () at hello.c:20
warning: Source file is more recent than executable.
20 sleep(30);
Finally, we can print all the local variables of this frame. Type info locals.
The output:
i = 0
world_size = 0
process_id = 0
processor_name = "\000\000\000\000\000\000\000\000 5\026gc\177\000\000\200\306Η\377\177\000\000p\306Η\377\177\000\000.N=\366\000\000\000\000\272\005#\000\000\000\000\000\377\377\377\377\000\000\000\000%0`\236\060\000\000\000\250\361rfc\177\000\000x\n\026gc\177\000\000\320\067`\236\060\000\000\000\377\377\377\177\376\377\377\377\001\000\000\000\000\000\000\000\335\n#\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000"
name_len = 1718986550
Now we can release the stopper loop with set var i=1 and continue debugging.
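The stopper loop and the PID lookup can be folded into one helper. The following is only a sketch under assumptions: wait_for_debugger and the environment variable name MPI_DEBUG_WAIT are hypothetical names, not part of MPI or GDB. The helper prints the PID itself, so ps is not needed, and it blocks only when the variable is set, so normal runs are unaffected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* getenv() */
#include <unistd.h>   /* getpid(), sleep() */

/*
 * Hypothetical helper (not part of MPI): intended to be called right
 * after MPI_Init(). It blocks only when the environment variable named
 * by env_name is set. Returns 1 if it waited, 0 if it returned at once.
 */
static int wait_for_debugger(const char *env_name)
{
    /* volatile: the value is changed from outside the program, by GDB,
     * so the compiler must re-read it on every loop iteration. */
    volatile int released = 0;

    assert(env_name != NULL);
    if (getenv(env_name) == NULL)
        return 0;

    printf("PID %ld is waiting for a debugger\n", (long)getpid());
    fflush(stdout);

    /* In GDB: select this frame, then "set var released = 1", "continue". */
    while (released == 0)
        sleep(5);

    return 1;
}
```

Called as wait_for_debugger("MPI_DEBUG_WAIT") right after MPI_Init(), a run such as MPI_DEBUG_WAIT=1 mpiexec -n 2 ./hello would pause every rank, assuming the launcher passes the environment on to the ranks. As in the steps above, bt followed by up is still needed to reach the helper's frame before set var released = 1 works.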

Related

GDB symbols missing - libc claimed to be wrong library or version mismatch

I am having trouble showing proper debug symbols in the backtrace in GDB in an ARM cross-compiled system, built using Yocto.
abc.c is a simple printf("Hello world\n"); program in C (nothing tricky). On the build machine:
> yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gcc abc.c --sysroot=yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm -g -O0 -o abc
> scp abc root@DEVICE-IP:~
On the ARM target:
> gdbserver :2345 abc
Start GDB on the build machine (from installed Yocto SDK):
> /usr/local/oecore-x86_64/sysroots/x86_64-angstromsdk-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gdb abc
GNU gdb (Linaro GDB) 7.8-2014.09
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-angstromsdk-linux --target=arm-angstrom-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.linaro.org>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from abc...done.
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Cannot access memory at address 0x0
0x4ae90a20 in ?? ()
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
Reading symbols from yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3...done.
Loaded symbols for yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
Cannot access memory at address 0x0
After setting the sysroot, it still does not give symbols.
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) b main
Breakpoint 1 at 0x84a8: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
Okay, when it hits a breakpoint, it does display symbols.
(gdb) bt
Cannot access memory at address 0x0
#0 main () at abc.c:5
However, it goes weird stepping beyond there.
(gdb) n
Cannot access memory at address 0x1
0x4aea6ea0 in ?? ()
(gdb) bt
#0 0x4aea6ea0 in ?? ()
#1 0x0000a014 in do_lookup_unique (Cannot access memory at address 0x1
undef_map=0x1, ref=0x0, strtab=0x56ebb27 <error: Cannot access memory at address 0x56ebb27>, sym=0x84a0 <main>, type_class=-1224757248, result=0x1, map=<optimized out>,
new_hash=<optimized out>, undef_name=<optimized out>) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:332
#2 do_lookup_x (undef_name=<optimized out>, new_hash=<optimized out>, old_hash=<optimized out>, ref=0x0, result=<optimized out>, scope=0x177ff8e, i=<optimized out>, version=<optimized out>,
flags=-1224757248, skip=0x1, type_class=100, undef_map=0x1) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:544
#3 0x4aec0b10 in ?? ()
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It can't find the proper version of libc.so.6.
(gdb) info sharedlibrary
warning: .dynamic section for "yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?)
From To Syms Read Shared Object Library
0x000007d0 0x0001bee0 Yes yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
0x4aee73c0 0x4afe2018 No yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6
(gdb) n
Cannot find bounds of current function
It does not give an ideal debugging experience.
There is a gcc inside yocto-dir sysroot (as used above), as well as in /usr/local/oecore-x86_64. They both behave the same. The /usr/local/oecore-x86_64 SDK is freshly built and installed.
Similarly, there is an imx28scm sysroot inside yocto-dir (as used above), as well as in /usr/local/oecore-x86_64, and they both behave the same. However, they clearly do have different versions of libc.so.6 - yocto-dir's is 14.8MB, and /usr/local/oecore-x86_64's is 1.3MB. This is a concern, however setting either of these locations as the sysroot does not fix the problem.
One workaround is to link with -static. GDB does give symbols in this case:
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
_start () at ../sysdeps/arm/start.S:79
79 ../sysdeps/arm/start.S: No such file or directory.
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
(gdb) bt
#0 _start () at ../sysdeps/arm/start.S:79
(gdb) b main
Breakpoint 1 at 0x8480: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
(gdb) n
6 return 0;
(gdb) n
7 }
Linking with -Wl,--verbose seems to show it is linking with the library in the expected sysroot:
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6 succeeded
The linker also finds this one, but it isn't referred to as libc.so.6, so presumably this is not interfering.
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/usr/lib/libc.so succeeded
Why is there a library version mismatch in this case? How can I get GDB to display symbols from the library which it expects? I do not wish to link statically.
Please make sure the libc on the target device is the same as the one on your build server.
(Sorry, this should be a comment, but I don't have enough reputation yet.)
Apparently GDB for an ARM target has trouble loading symbols before main() (see "Debugging shared libraries with gdbserver"):
The problem I had was that gdbserver stops at the dynamic loader, before main, and the dynamic libraries are not yet loaded at that point, and so GDB does not know where the symbols will go in memory yet.
GDB appears to have some mechanisms to automatically load shared library symbols, and if I compile for host, and run gdbserver locally, running to main is not needed. But on the ARM target, that is the most reliable thing to do.
Therefore, set the sysroot, which makes GDB load the shared library symbols, only after main has been hit:
> b main
> c
<breakpoint hit>
> set sysroot <sysroot>
Or reload the symbols after you hit main.
> set sysroot <sysroot>
...
> b main
> c
<breakpoint hit>
> nosharedlibrary
> sharedlibrary
Alternatively, when driving GDB from an IDE debugger, it might be useful to turn off automatic loading of shared library symbols at GDB startup:
> set auto-solib-add off

Debugger in C::B. Can't open cygwin.S

Hi, I just discovered some quite weird behaviour of the debugger when declaring a simple two-dimensional array. It looks like it can't open the file cygwin.S in the library.
Cannot open file: ../../../../../src/gcc-4.8.1/libgcc/config/i386/cygwin.S
At ../../../../../src/gcc-4.8.1/libgcc/config/i386/cygwin.S:169
Running without the debugger works fine. Here is an example of the code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    const int strNumTries = 15;
    const int strLength = 98;
    char strName[strLength][strNumTries];
    printf("Hello world!\n");
    return 0;
}
The debugger stops at the char array declaration when I use 'step into'. What could the problem be?
I suspect you're seeing something like this:
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from untitled...done.
(gdb) br main
Breakpoint 1 at 0x4005f1: file untitled.c, line 6.
(gdb) r
Starting program: /home/xxxx/untitled
.
.
.
Breakpoint 1, main () at untitled.c:6
6 const int strNumTries = 15;
(gdb) s
7 const int strLength = 98;
(gdb)
8 char strName[strLength][strNumTries];
(gdb)
10 printf("Hello world!\n");
(gdb)
_IO_puts (str=0x400794 "Hello world!") at ioputs.c:34
34 ioputs.c: No such file or directory.
(gdb)
36 in ioputs.c
(gdb)
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb)
67 in ../sysdeps/x86_64/strlen.S
(gdb)
68 in ../sysdeps/x86_64/strlen.S
where that last line repeats many times before moving on to any other sub-functions.
This is not an error, but rather part of the call to printf().
The following is the only gdb output that has anything to do with the array declaration:
8 char strName[strLength][strNumTries];
Not sure if this will help your situation, but ...
I had exactly the same errors from GDB, and GDB itself reported an internal error (see the GDB listing below). I could reproduce this 100% of the time with GCC/gfortran 6.4 and 7.1 (running C::B under Windows, with MinGW).
Of course, I had never installed Cygwin, and I only use the seh and sjlj variants of the MinGW64 compilers. Also, I don't have, and never had, any of the directories that GDB is complaining about (e.g. "/../src/gcc-7.1.0/"), so it made no sense. The problem arose only after adding one more ostensibly similar subroutine to my library of thousands of subroutines (none of which had ever triggered this).
To make a long story short, the problem turned out to be "out of stack space", as I had declared some automatic arrays with a large size. Either making the arrays smaller or changing the automatic arrays to allocatable arrays (the former go on the stack, the latter on the heap) fixed the problem.
So GDB seems to have a bug, and its complaints had nothing to do with the actual error.
For completeness, here is the relevant portion of my GDB listing:
[debug][New Thread 740.0x9fc]
[debug]172 ../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S: No such file or directory.
[debug]Thread 1 received signal SIGSEGV, Segmentation fault.
[debug]__chkstk_ms () at ../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S:172
[debug]>>>>>>cb_gdb:
[debug]> info frame
[debug]Stack level 0, frame at 0x136f00:
[debug] eip = 0x664974bb in __chkstk_ms (../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S:172); saved eip = 0x664958b3
[debug] called by frame at 0x136f14
[debug] source language asm.
[debug] Arglist at 0x136ef8, args:
[debug] Locals at 0x136ef8, Previous frame's sp is 0x136f00
[debug] Saved registers:
[debug] eax at 0x136ef4, ecx at 0x136ef8, eip at 0x136efc
[debug]>>>>>>cb_gdb:
Cannot open file: ../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S
At ../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S:172
[debug]> info locals
[debug]No locals.
[debug]>>>>>>cb_gdb:
[debug]> info args
[debug]No symbol table info available.
[debug]>>>>>>cb_gdb:
[debug]> bt 30
[debug]../../../../src/gdb-7.11.1/gdb/dwarf2loc.c:364: internal-error: dwarf_expr_frame_base: Assertion `framefunc != NULL' failed.
[debug]A problem internal to GDB has been detected,
[debug]further debugging may prove unreliable.
[debug]This is a bug, please report it. For instructions, see:
[debug]<http://www.gnu.org/software/gdb/bugs/>.
[debug]This application has requested the Runtime to terminate it in an unusual way.
[debug]Please contact the application's support team for more information.
[debug]#0 __chkstk_ms () at ../../../../../src/gcc-7.1.0/libgcc/config/i386/cygwin.S:172
[debug]#1 0x664958b3 in fadcern_sixtrack_xl (
Debugger finished with status 1
I got this error when I was trying to debug with gdb in VS Code. VS Code doesn't offer redirected input, especially for C, so I was using freopen(). Everything worked fine until execution stepped into the declaration of an array:
int arr[n];
So I replaced it with a pointer and allocated the memory dynamically:
int *arr = (int *) malloc(sizeof(int) * n);
and this worked.
I think gdb is unable to handle such arr[variable] declarations, but I may be completely wrong. Hope this helps.
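The last two answers both end up moving an array from the stack to the heap. Here is a minimal sketch of that change in C, assuming the array really is large enough (or variable-sized) to trouble the stack. The helper names alloc_table and table_row are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate a rows x cols table of chars on the heap instead of declaring
 * "char table[rows][cols]" as an automatic (stack) array.
 * Returns NULL if a dimension is zero, the size would overflow, or the
 * allocation fails. The caller releases the table with free().
 */
static char *alloc_table(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0 || rows > (size_t)-1 / cols)
        return NULL;
    return calloc(rows, cols);   /* zero-initialised; sizeof(char) is 1 */
}

/* Address of row r inside a table created by alloc_table(). */
static char *table_row(char *table, size_t cols, size_t r)
{
    assert(table != NULL);
    return table + r * cols;
}
```

For the array in the question, alloc_table(98, 15) would stand in for char strName[strLength][strNumTries], with table_row(t, 15, r) giving row r. Unlike an automatic array, the table must be released with free() when it is no longer needed.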

call to ffi_call fails even though arguments look right

Consider this gist. I have checked and double checked this piece of code for defects and can't find any apparent flaws in the code. It also compiles fine when I use g++ -g -std=c++11 -Wall dynlibtest.cc -ldl -lffi -lstdc++ -odynlibtest && ./dynlibtest (the -ldl and -lffi switches are for the dynamic loading and FFI libraries, respectively).
However, when the highlighted line (l.96) executes it segfaults.
I have also tried pulling it through gdb, and after installing the libc debugging symbols it spits this message out when the ./dynlibtest bin segfaults:
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
157 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
Who can help me understand why this segfaults? Is it a bug of some kind, or am I using one of the APIs wrong?
For reference: the first part of the code calls gettimeofday directly to show that the code can indeed find it, and that even the args are correct when it is called directly.
EDIT: I have added the gdb output when the code segfaults with the output of bt also attached:
$ gdb ./dynlibtest
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./dynlibtest...done.
(gdb) break 96
Breakpoint 1 at 0x401032: file dynlibtest.cc, line 96.
(gdb) run
Starting program: /home/j/dev/elisp-ffi/dynlibtest
Test started...
Got main program handle
pre-alloc: tv.tv_sec = 140737340592552
Sleeping for 1 second
post-alloc: tv.tv_sec = 1432058412
Sleeping for 1 second
Fn ptr call: tv.tv_sec = 1432058413
FFI CIF preparation is OK
Sleeping for 1 second
Breakpoint 1, main () at dynlibtest.cc:96
96 ffi_call(&cif, FFI_FN(gettimeofday), &result, args);
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
157 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
#1 0x00007ffff79d34c2 in memcpy (__len=8, __src=0x0, __dest=0x7fffffffda48) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2 ffi_call (cif=0x7fffffffdca0, fn=0x400ab0 , rvalue=0x7fffffffdc40, avalue=0x7fffffffdc00) at ../src/x86/ffi64.c:504
#3 0x000000000040104e in main () at dynlibtest.cc:96
(gdb)

SDL - Segmentation Fault (core dumped), any thoughts?

I've had this problem since I installed SDL. First I tried to install it from the tar.gz file, but compiling did not go well (the terminal couldn't find the directory of the SDL library), so after that I installed the Synaptic package manager and successfully downloaded the "libsdl1.2-dev" package.
I am following Lazy Foo's tutorial for SDL. Whenever I try to compile and run simple code that creates a screen and blits an image, I get the following message in the terminal:
(gcc -Wall -o teste teste.c -lSDL -lSDL_image)
"Segmentation fault (core dumped)"
Here it is my code in C:
#include <stdio.h>
#include <stdlib.h>
#include "SDL/SDL.h"
int main( int argc, char* args[] )
{
    SDL_Surface* hello = NULL;
    SDL_Surface* screen = NULL;
    SDL_Init(SDL_INIT_EVERYTHING);
    screen = SDL_SetVideoMode(640, 480, 32, SDL_SWSURFACE);
    if (screen == NULL) {
        printf("SDL_SetVideoMode failed: %s\n", SDL_GetError());
        exit(1); /* Unrecoverable error */
    }
    hello = SDL_LoadBMP("hello.bmp");
    SDL_BlitSurface(hello, NULL, screen, NULL);
    SDL_Flip(screen);
    SDL_Delay(2000);
    SDL_FreeSurface(hello);
    SDL_Quit();
    return 0;
}
I've already made sure that hello.bmp is in the same dir of my teste.c file.
Here's a log using gdb to backtrace:
LOG
GNU gdb (Ubuntu 7.8-1ubuntu4) 7.8.0.20141001-cvs
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from teste...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/lazzo/Documentos/Treino/teste
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff707c700 (LWP 5605)]
Program received signal SIGSEGV, Segmentation fault.
SDL_Flip (screen=0x0) at ./src/video/SDL_video.c:1109
1109 ./src/video/SDL_video.c: No such file or directory.
(gdb) bt
#0 SDL_Flip (screen=0x0) at ./src/video/SDL_video.c:1109
#1 0x00000000004009a2 in main ()
(gdb) c
Continuing.
[Thread 0x7ffff7fd8740 (LWP 5601) exited]
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) q
lazzo@J-Ubuntu:~/Documentos/Treino$ exit
exit
END OF LOG
Any help you could give me would be really appreciated. I apologize for my bad English; I am from Brazil and still learning English.
UPDATE
After adding Klas's suggestion to my code, I got this from the terminal:
"SDL_SetVideoMode failed: No available video device"
How is that even possible? (my videocard is a radeon HD 4850 btw)
Problem round 1 (compilation):
The target filename must follow immediately after the -o option, so you should change the order of the arguments:
gcc -Wall -o teste teste.c -lSDL -lSDL_image
This may not solve all your build problems, but it is a good start.
Problem round 2 (adding error handling):
The call to SDL_SetVideoMode returned NULL. When you get a NULL return value, call SDL_GetError immediately afterwards to find out what the error is:
screen = SDL_SetVideoMode(640, 480, 32, SDL_SWSURFACE);
if (screen == NULL) {
    printf("SDL_SetVideoMode failed: %s\n", SDL_GetError());
    exit(1); /* Unrecoverable error */
}
You should add similar handling for the other SDL calls.
The only thing that worked in my case was to wipe Ubuntu and try another distro. Right now I am using Linux Mint, and despite the fact that it is based on Ubuntu, everything is working as expected. I'm just sharing my solution in case somebody else has this very same problem someday.

Unable to find stack smashing function using GDB

I have the following C application:
#include <stdio.h>
void smash()
{
    int i;
    char buffer[16];
    for(i = 0; i < 17; i++) // <-- exceeds the limit of the buffer
    {
        buffer[i] = i;
    }
}
int main()
{
    printf("Starting\n");
    smash();
    return 0;
}
I cross-compiled using the following version of gcc:
armv5l-linux-gnueabi-gcc -v
Using built-in specs.
Target: armv5l-linux-gnueabi
Configured with: /home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/gcc-4.4.1/gcc-4.4.1/configure --target=armv5l-linux-gnueabi --host=i486-linux-gnu --build=i486-linux-gnu --prefix=/home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/toolchain --with-sysroot=/home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/toolchain --with-headers=/home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/toolchain/include --enable-languages=c,c++ --with-gmp=/home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/gmp-5.0.0/gmp-host-install --with-mpfr=/home/tarjeif/svn/builder/build_armv5l-linux-gnueabi/mpfr-2.4.2/mpfr-host-install --disable-nls --disable-libgcj --disable-libmudflap --disable-libssp --disable-libgomp --enable-checking=release --with-system-zlib --with-arch=armv5t --with-gnu-as --with-gnu-ld --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --disable-nls --without-fp --enable-threads
Thread model: posix
gcc version 4.4.1 (GCC)
Invoked like this:
armv5l-linux-gnueabi-gcc -ggdb3 -fstack-protector-all -O0 test.c
When run on target, it outputs:
Starting
*** stack smashing detected ***: ./a.out terminated
Aborted (core dumped)
I load the resulting core dump in gdb yielding the following backtrace:
GNU gdb (GDB) 7.0.1
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i486-linux-gnu --target=armv5l-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/andersn/workspace/stacktest/a.out...done.
Reading symbols from /home/andersn/workspace/stacktest/linux/toolchain/lib/libc.so.6...done.
Loaded symbols for /home/andersn/workspace/stacktest/linux/toolchain/lib/libc.so.6
Reading symbols from /home/andersn/workspace/stacktest/linux/toolchain/lib/ld-linux.so.3...done.
Loaded symbols for /home/andersn/workspace/stacktest/linux/toolchain/lib/ld-linux.so.3
Reading symbols from /home/andersn/workspace/stacktest/linux/toolchain/lib/libgcc_s.so.1...done.
Loaded symbols for /home/andersn/workspace/stacktest/linux/toolchain/lib/libgcc_s.so.1
Core was generated by `./a.out'.
Program terminated with signal 6, Aborted.
#0 0x40052d4c in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
67 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x40052d4c in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#1 0x40054244 in *__GI_abort () at abort.c:92
#2 0x40054244 in *__GI_abort () at abort.c:92
#3 0x40054244 in *__GI_abort () at abort.c:92
#4 0x40054244 in *__GI_abort () at abort.c:92
#5 0x40054244 in *__GI_abort () at abort.c:92
#6 0x40054244 in *__GI_abort () at abort.c:92
... and so on ...
Now, the question:
I'm totally unable to find the function causing the stack smashing from GDB, even though the smash() function doesn't overwrite any structural data on the stack, only the stack protector's canary. What should I do?
The problem is that the version of GCC which compiled your target libc.so.6 is buggy and did not emit correct unwind descriptors for __GI_raise. With incorrect unwind descriptors, GDB gets into a loop while unwinding the stack.
You can examine the unwind descriptors with
readelf -wf /home/andersn/workspace/stacktest/linux/toolchain/lib/libc.so.6
I expect you'll get exactly the same result in GDB from any program calling abort, e.g.
#include <stdlib.h>
void foo() { abort(); }
int main() { foo(); return 0; }
Unfortunately, there isn't much you can do other than building a newer version of GCC and then rebuilding the whole "world" with it.
GDB cannot always work out what happened to a smashed stack, even with -fstack-protector-all (and even with -Wstack-protector to warn about functions whose frames weren't protected). Example.
In these cases the stack protector has done its job (killed a misbehaving app) but hasn't done the debugger any favors. (The classic example is a stack smash where a write has occurred with a large enough stride that it jumps the canary.) In these cases it may become necessary to binary search through the code via breakpoints to narrow down which region of the code is causing the smash, then single step through the smash to see how it happened.
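When breakpoints alone are too coarse, one manual alternative is to place known guard bytes after a suspect buffer and check them at chosen points in the program. This is not something the stack protector or GDB provides, and it is not taken from the answers above. It is a minimal sketch, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE   16
#define GUARD_SIZE 4
#define GUARD_BYTE 0xA5

/* A 16-byte buffer followed by guard bytes, kept in a single array so
 * that the one-byte overrun in this demo is still a defined access. */
struct guarded_buffer {
    unsigned char bytes[BUF_SIZE + GUARD_SIZE];
};

static void guarded_init(struct guarded_buffer *g)
{
    memset(g->bytes, 0, BUF_SIZE);
    memset(g->bytes + BUF_SIZE, GUARD_BYTE, GUARD_SIZE);
}

/* Returns 1 while the guard bytes are untouched, 0 once one was overwritten. */
static int guarded_intact(const struct guarded_buffer *g)
{
    for (size_t k = BUF_SIZE; k < sizeof g->bytes; k++)
        if (g->bytes[k] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Same loop shape as smash() in the question: a count of 17 writes one
 * byte past the 16-byte buffer and lands on the first guard byte. */
static void fill(struct guarded_buffer *g, size_t count)
{
    assert(count <= sizeof g->bytes);   /* keep the demo inside the array */
    for (size_t i = 0; i < count; i++)
        g->bytes[i] = (unsigned char)i;
}
```

Calling guarded_intact() before and after each suspect region plays the same role as the binary search with breakpoints: the first check that returns 0 brackets the code that overran the buffer.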
Have you tried resolving this complaint: "../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory." to see if actually being able to resolve symbols would help?

Resources