I am using GDB to debug a program that uses libpthread. There is an error
happening in pthread_create and I need to step into that function. Unfortunately, when I debug my program, GDB does not load the shared library symbols properly, so I can't step through the source code and examine the program's behaviour meaningfully. This is the output as soon as I start gdb:
Remote debugging using 127.0.0.1:21293
warning: limiting remote suggested packet size (206696 bytes) to 16384
Failed to read a valid object file image from memory.
So I believe the last message is related to the failure to read debugging symbols, even though the libc6-dbg package is installed. This is the truncated output of "where" at a point just before a SIGSEGV is encountered (in pthread_create, the function I want to examine in the debugger):
#0 0x68200ce2 in ?? ()
#1 0x68403cbf in ?? ()
#2 0x687571b0 in ?? ()
#3 0x6874c638 in ?? ()
#4 0x68867a72 in ?? ()
....
The process's /proc/.../maps shows where libpthread is mapped into memory:
683f8000-68410000 r-xp 00000000 08:01 3017052 /lib/i386-linux-gnu/i686/cmov/libpthread-2.19.so
68410000-68411000 r--p 00017000 08:01 3017052 /lib/i386-linux-gnu/i686/cmov/libpthread-2.19.so
68411000-68412000 rw-p 00018000 08:01 3017052 /lib/i386-linux-gnu/i686/cmov/libpthread-2.19.so
I believe that if I could manually load the debugging symbols into gdb, I would be able to step through the source code and find the source of my memory error. However, I am unsure how to do this.
I am debugging a 32 bit program on x86_64 Debian. What should I do to load libpthread symbols into GDB so that I can debug it meaningfully?
Loading debug symbols for a shared library
If the shared library is stripped and the debug symbols are provided as a separate file, you need to load them after the shared library has been loaded by the dynamic linker. The symbols must be loaded at the address where the library's .text section was placed in memory, which is the From address reported by info sharedlibrary.
The following is an example of loading the symbols:
Start GDB:
~$ gdb a.out
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
(gdb) info sharedlibrary
No shared libraries loaded at this time.
Set a breakpoint (on main or anywhere else) and start the program:
(gdb) b 35
Breakpoint 1 at 0xd7c: file test.c, line 35.
(gdb) r
Starting program: /root/testing/lib-test/a.out
Find the memory location where the shared library you want to debug was loaded (in this example the library is libtest.so.1):
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x0000fffff7fcd0c0 0x0000fffff7fe5468 Yes (*) /lib/ld-linux-aarch64.so.1
0x0000fffff7f9f890 0x0000fffff7fb65c0 Yes (*) /usr/local/lib/libtest.so.1
0x0000fffff7e4bbc0 0x0000fffff7f3b190 Yes /lib/aarch64-linux-gnu/libc.so.6
0x0000fffff7dfea50 0x0000fffff7e0ddec Yes /lib/aarch64-linux-gnu/libpthread.so.0
So the library's .text section starts at memory address 0x0000fffff7f9f890.
Load the symbol file at the address taken from info sharedlibrary:
(gdb) add-symbol-file ./libsrc/libtest.dbg 0x0000fffff7f9f890
add symbol table from file "./libsrc/libtest.dbg" at
.text_addr = 0xfffff7f9f890
(y or n) y
Reading symbols from ./libsrc/libtest.dbg...
After this, you can trace the execution flow inside the library, list the lines of the source code, inspect variables by name, etc.
Related
I am having trouble showing proper debug symbols in the backtrace in GDB in an ARM cross-compiled system, built using Yocto.
abc.c is a simple printf("Hello world\n"); program in C (nothing tricky). On the build machine:
> yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gcc abc.c --sysroot=yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm -g -O0 -o abc
> scp abc root@DEVICE-IP:~
On the ARM target:
> gdbserver :2345 abc
Start GDB on the build machine (from installed Yocto SDK):
> /usr/local/oecore-x86_64/sysroots/x86_64-angstromsdk-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gdb abc
GNU gdb (Linaro GDB) 7.8-2014.09
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-angstromsdk-linux --target=arm-angstrom-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.linaro.org>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from abc...done.
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Cannot access memory at address 0x0
0x4ae90a20 in ?? ()
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
Reading symbols from yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3...done.
Loaded symbols for yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
Cannot access memory at address 0x0
After setting the sysroot, it still does not give symbols.
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) b main
Breakpoint 1 at 0x84a8: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
Okay, when it hits a breakpoint, it does display symbols.
(gdb) bt
Cannot access memory at address 0x0
#0 main () at abc.c:5
However, stepping beyond that point goes wrong:
(gdb) n
Cannot access memory at address 0x1
0x4aea6ea0 in ?? ()
(gdb) bt
#0 0x4aea6ea0 in ?? ()
#1 0x0000a014 in do_lookup_unique (Cannot access memory at address 0x1
undef_map=0x1, ref=0x0, strtab=0x56ebb27 <error: Cannot access memory at address 0x56ebb27>, sym=0x84a0 <main>, type_class=-1224757248, result=0x1, map=<optimized out>,
new_hash=<optimized out>, undef_name=<optimized out>) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:332
#2 do_lookup_x (undef_name=<optimized out>, new_hash=<optimized out>, old_hash=<optimized out>, ref=0x0, result=<optimized out>, scope=0x177ff8e, i=<optimized out>, version=<optimized out>,
flags=-1224757248, skip=0x1, type_class=100, undef_map=0x1) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:544
#3 0x4aec0b10 in ?? ()
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
GDB can't find the matching version of libc.so.6:
(gdb) info sharedlibrary
warning: .dynamic section for "yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?)
From To Syms Read Shared Object Library
0x000007d0 0x0001bee0 Yes yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
0x4aee73c0 0x4afe2018 No yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6
(gdb) n
Cannot find bounds of current function
It does not give an ideal debugging experience.
There is a gcc inside the yocto-dir sysroot (as used above), as well as one in /usr/local/oecore-x86_64, and they both behave the same. The /usr/local/oecore-x86_64 SDK is freshly built and installed.
Similarly, there is an imx28scm sysroot inside yocto-dir (as used above), as well as one in /usr/local/oecore-x86_64, and they both behave the same. However, they clearly have different versions of libc.so.6: yocto-dir's is 14.8MB, and /usr/local/oecore-x86_64's is 1.3MB. This is a concern; however, setting either of these locations as the sysroot does not fix the problem.
One workaround is to link with -static. GDB does give symbols in this case:
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
_start () at ../sysdeps/arm/start.S:79
79 ../sysdeps/arm/start.S: No such file or directory.
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
(gdb) bt
#0 _start () at ../sysdeps/arm/start.S:79
(gdb) b main
Breakpoint 1 at 0x8480: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
(gdb) n
6 return 0;
(gdb) n
7 }
Linking with -Wl,--verbose seems to show that it links against the library in the expected sysroot:
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6 succeeded
The linker also finds the following file, but it isn't named libc.so.6, so presumably it does not interfere:
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/usr/lib/libc.so succeeded
Why is there a library version mismatch in this case? How can I get GDB to display symbols from the library which it expects? I do not wish to link statically.
Please make sure the libc on the device is the same as the one on your build server.
Apparently GDB has trouble loading shared library symbols for an ARM target before main() (Debugging shared libraries with gdbserver):
The problem I had was that gdbserver stops at the dynamic loader, before main, and the dynamic libraries are not yet loaded at that point, and so GDB does not know where the symbols will go in memory yet.
GDB appears to have some mechanisms to load shared library symbols automatically, and if I compile for the host and run gdbserver locally, running to main is not needed. But on the ARM target, running to main first is the most reliable approach.
Therefore, set the sysroot only after main has been hit, so that the shared library symbols are loaded at that point:
> b main
> c
<breakpoint hit>
> set sysroot <sysroot>
Or reload the symbols after you hit main.
> set sysroot <sysroot>
...
> b main
> c
<breakpoint hit>
> nosharedlibrary
> sharedlibrary
Alternatively, when interfacing with IDE debuggers, it might be useful to turn off automatic loading of shared library symbols at GDB startup:
> set auto-solib-add off
Are there any tools or libraries one can use on Linux to get the original (source) instruction only from the PID and the current instruction pointer address, even if the IP currently points into a shared library?
AFAIK it should be possible, since the location of the library mapping is available through /proc/[PID]/maps, though I haven't found any applications or examples doing so.
Any suggestions?
EDIT: an assembly instruction or the nearest symbol suffices (the source code line is not necessarily needed).
I found a way to do this with GDB:
Interactive:
$ gdb --pid 1566
(gdb) info symbol 0x7fe28b8a2b79
pselect + 89 in section .text of /lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol 0x5612550f14a4
copy_word_list + 20 in section .text of /usr/bin/bash
(gdb) info symbol 0x7fe28b878947
execve + 7 in section .text of /lib/x86_64-linux-gnu/libc.so.6
Shows exactly what I wanted!
It can also be scripted:
gdb -q --pid PID --batch -ex 'info symbol HEX_SYMBOL_ADDR'
I first noticed it while playing with GDB's rbreak ., and then made a minimal example:
(gdb) file hello_world.out
Reading symbols from hello_world.out...done.
(gdb) b _init
Breakpoint 1 at 0x4003e0
(gdb) b _start
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/ciro/bak/git/cpp/cheat/gdb/hello_world.out
Breakpoint 1, _init (argc=1, argv=0x7fffffffd698, envp=0x7fffffffd6a8) at ../csu/init-first.c:52
52 ../csu/init-first.c: No such file or directory.
(gdb) continue
Continuing.
Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) continue
Continuing.
Breakpoint 1, 0x00000000004003e0 in _init ()
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y <MULTIPLE>
breakpoint already hit 2 times
1.1 y 0x00000000004003e0 <_init>
1.2 y 0x00007ffff7a36c20 in _init at ../csu/init-first.c:52
2 breakpoint keep y 0x0000000000400440 <_start>
breakpoint already hit 1 time
Note that there are two _init functions: one in csu/init-first.c, and another that seems to come from sysdeps/x86_64/crti.S. I'm talking about the csu one.
Isn't _start supposed to be the entry point set by the linker, and stored in the ELF header? What mechanism makes _init run first? What is its purpose?
Tested on GCC 4.8, glibc 2.19, GDB 7.7.1 and Ubuntu 14.04.
Where the debugger halts first in your example isn't the real beginning of the process.
The executable's program headers contain a PT_INTERP entry naming the program interpreter (dynamic linker); on 64-bit x86 Linux its value is /lib64/ld-linux-x86-64.so.2. The kernel sets the initial instruction pointer to the entry point of this program interpreter. Its symbol name is _start too, just like the program's _start.
After the dynamic linker has done its work, which includes calling initialization functions such as _init in glibc, it jumps to the entry point of the program.
The breakpoint at _start doesn't stop in the dynamic linker because GDB resolves it only to the address of the program's _start.
You can find the dynamic linker's entry point address (the "Entry point address" field) with readelf -h /lib64/ld-linux-x86-64.so.2.
You could also set a breakpoint at _dl_start and print a backtrace to see that this function is called from dynamic linker's _start.
If you download glibc's current source code, you can find the entry point of the dynamic loader in glibc-2.21/sysdeps/x86_64/dl-machine.h, starting at line 121.
Here is what I did.
I downloaded MESA and built it with the --enable-debug configure option.
The build results are all in the ${MESA_SRC}/lib directory.
I set LIBGL_DRIVERS_PATH and LD_LIBRARY_PATH to the ${MESA_SRC}/lib directory so that the freshly built libraries are used instead of the original ones on my local PC. I checked with ldd that OpenGL apps use the libraries in ${MESA_SRC}/lib:
~/work/mesa$ ldd /usr/bin/glxgears | grep libGL
libGL.so.1 => lib/libGL.so.1 (0x00007f81aa9cc000)
where lib/libGL.so.1 is the build results from mesa source code.
Next, I wrote a very simple OpenGL app, call it A, ran it under gdb, and set a breakpoint on main:
work/mesa$ gdb A
GNU gdb (Ubuntu 7.7-0ubuntu3.1) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
(gdb) b main
Breakpoint 1 at 0x4015af: file A.c, line 168.
(gdb) r
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at compositor-pbo.c:168
168 {
(gdb)
After that, I set solib-search-path:
(gdb) set solib-search-path /home/jyyoo/work/mesa/lib
Here /home/jyyoo/work/mesa is ${MESA_SRC}. But I don't see any .so files being loaded. As far as I know, after setting solib-search-path, GDB should list the .so files it loads from that directory, but none are listed. The directory does contain these .so files:
$ ls /home/jyyoo/work/mesa/lib/*.so
/home/jyyoo/work/mesa/lib/i965_dri.so
/home/jyyoo/work/mesa/lib/libglapi.so
/home/jyyoo/work/mesa/lib/libGL.so
/home/jyyoo/work/mesa/lib/libEGL.so
/home/jyyoo/work/mesa/lib/libGLESv1_CM.so
/home/jyyoo/work/mesa/lib/mesa_dri_drivers.so
/home/jyyoo/work/mesa/lib/libgbm.so
/home/jyyoo/work/mesa/lib/libGLESv2.so
So when I try to step ('s') into GL functions, GDB does not step into them.
Additionally, printing glBindBuffer shows:
(gdb) p glBindBuffer
$1 = {<text variable, no debug info>} 0x7ffff7969f60 <glBindBufferARB>
But the process's memory map does not show anything mapped at that address:
/proc/<pid of gdb>$ cat maps
... snip ...
7fff5f652000-7fff5f673000 rw-p 00000000 00:00 0 [stack]
7fff5f74d000-7fff5f74f000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
It seems the API entry points are written in assembly language, so no source can be displayed. Set a breakpoint at glBindBuffer and run the program; when it pauses at the breakpoint, type 'step' or 'next' and you will step into the real function behind the API.
I need to debug my MPI application written in C. I wanted to attach GDB to the processes manually, as recommended here (paragraph 6).
The problem is, when I try to print the value of the variable "i", I get this error:
No symbol "i" in current context.
The same problem occurs with set var i=5. When I run info local, it simply reports that there are no locals.
System Ubuntu 14.04
MPICC cc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
GDB GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1.
I compile my code with the command
mpicc -o hello hello.c
and execute it with
mpiexec -n 2 ./hello
I've searched for this problem, but the usual solution is to avoid optimization (-O) options in GCC. That doesn't help me, because I don't use any of them here and I'm compiling with mpicc. I've already tried declaring the variable "i" as volatile and compiling with mpicc -g -O0, but nothing helps.
GDB output:
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 3778
Reading symbols from /home/martin/Dokumenty/Programovani/mpi_trenink/hello...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libmpich.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libmpich.so.10
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libmpl.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libmpl.so.1
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /usr/lib/libcr.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcr.so.0
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007f493e53c9a0 in __nanosleep_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
My code:
#include <stdio.h>
#include <mpi.h>
#include <unistd.h> // sleep()

int main(void) {
    MPI_Init(NULL, NULL);

    /* DEBUGGING STOP */
    int i = 0;
    while (i == 0) {
        sleep(30);
    }

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    int process_id; // often called 'world_rank'
    MPI_Comm_rank(MPI_COMM_WORLD, &process_id);
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello! - sent from process %d running on processor %s.\n"
           "Number of processors is %d.\n"
           "Length of proc name is %d.\n"
           "***********************\n",
           process_id, processor_name, world_size, name_len);

    MPI_Finalize();
    return 0;
}
Most likely, GDB interrupts the process while it is deep inside the implementation of the sleep(3) function. You can check this by first issuing the bt (backtrace) command:
(gdb) bt
#0 0x00000030e0caca3d in nanosleep () from /lib64/libc.so.6
#1 0x00000030e0cac8b0 in sleep () from /lib64/libc.so.6
#2 0x0000000000400795 in main (argc=1, argv=0x7fff64ae4688) at sleeper.c:9
i is not present in the frame of nanosleep:
(gdb) info locals
No symbol table info available.
Select the stack frame of the main function by issuing the frame x command (where x is the frame number, 2 in the example shown).
(gdb) f 2
#2 0x0000000000400795 in main (argc=1, argv=0x7fff64ae4688) at sleeper.c:9
9 while(i == 0) { sleep(30); }
i should be there now:
(gdb) info locals
i = 0
You might also need to change the active thread if GDB happens to attach to the wrong one. Many MPI libraries spawn additional threads, e.g. with Intel MPI:
(gdb) info threads
3 Thread 0x7f8b9fada700 (LWP 39085) 0x00000030e0cdf1b3 in poll () from /lib64/libc.so.6
2 Thread 0x7f8b9f0d9700 (LWP 39087) 0x00000030e0cdf1b3 in poll () from /lib64/libc.so.6
* 1 Thread 0x7f8ba1b51700 (LWP 39066) 0x00000030e0caca3d in nanosleep () from /lib64/libc.so.6
The thread marked with * is the one being examined. If some other thread is active, switch to the main one with the thread 1 command.
I've finally solved this. The key was that I had to select the right frame with the up command before trying to print the variable "i" or change its value.
Step-by-step solution
Compile this code with mpicc -o hello hello.c -g -O0.
Launch the program with mpiexec -n 2 ./hello.
Find out the process ID (PID).
I use the command ps -e | grep hello.
Another option is simply to use pstree.
Finally, the program can report its own PID by calling getpid().
The next step is to open a new terminal and launch GDB with the command gdb --pid debugged_process_id.
Now, in the debugger, type bt.
The output will be similar to this one:
#0 0x00007f63667e09a0 in __nanosleep_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f63667e0854 in __sleep (seconds=0)
at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x00000000004009ec in main () at hello.c:20
As we can see, frame #2 points to our code in hello.c, so we can look at it in more detail. Type up 2.
The output will be similar to this one:
#2 0x00000000004009ec in main () at hello.c:20
warning: Source file is more recent than executable.
20 sleep(30);
Finally, we can print all the local variables in this frame. Type info locals.
The output:
i = 0
world_size = 0
process_id = 0
processor_name = "\000\000\000\000\000\000\000\000 5\026gc\177\000\000\200\306Η\377\177\000\000p\306Η\377\177\000\000.N=\366\000\000\000\000\272\005#\000\000\000\000\000\377\377\377\377\000\000\000\000%0`\236\060\000\000\000\250\361rfc\177\000\000x\n\026gc\177\000\000\320\067`\236\060\000\000\000\377\377\377\177\376\377\377\377\001\000\000\000\000\000\000\000\335\n#\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000"
name_len = 1718986550
Now we can break out of the waiting loop with set var i=1 and continue debugging.