Seg fault on running a program through linker? - c

I downloaded the source for libc6 and completed the build process successfully. (Though I did not performed a make install deliberately).
With the new linker built in buil-dir/elf/ld.so I ran a program supplying it as the argument to the newly built linker.
The test code prints some string and then malloc(sizeof(char)*1024).
On running the test binary as an argument to the newly built linker I get a Seg Fault at elf/dl-addr.c:132 which is:
131 /* Protect against concurrent loads and unloads. */
132 __rtld_lock_lock_recursive (GL(dl_load_lock));
This is the last frame before the seg fault and is called through malloc() call from the test program.
Stack Trace at that point :
#0 0x0000000000000000 in ?? ()
#1 0x00007f11a6a94928 in __GI__dl_addr (address=0x7f11a69e67a0 <ptmalloc_init>, info=0x7fffe9393be0, mapp=0x7fffe9393c00, symbolp=0x0) at dl-addr.c:132
#2 0x00007f11a69e64d7 in ptmalloc_init () at arena.c:381
#3 0x00007f11a69e72b8 in ptmalloc_init () at arena.c:371
#4 malloc_hook_ini (sz=<optimized out>, caller=<optimized out>) at hooks.c:32
#5 0x00000000004005b3 in main () at test.c:20
On running the same program with the default installed linker on the machine the program runs fine.
I am not able to understand what can be the issue behind this? (Is it faulting because I am using the newly built linker without installing it first)
-Any suggestions or pointers are highly appreciated.
Thanks
(System details GCC 4.8.22, eglibc-2.15 Ubuntu 12.10 64bit

With the new linker built in buil-dir/elf/ld.so I ran a program supplying it as the argument to the newly built linker.
It that's all you did, then the crash is expected, because you are mixing newly-built loader with system libraries (which doesn't work: all parts of glibc must come from the same build of it).
What you need to do is:
buil-dir/elf/ld.so \
--library-path buil-dir:buil-dir/dlfcn:buil-dir/nptl:... \
/path/to/a.out
The list of directories to search must include all the libraries (parts of glibc) that your program uses.

Related

Debugging functions in __libc_start_main

I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and set up message queue and shared memory to communicate with other hooked CUDA binaries. When launching several hooked CUDA binaries (by python subprocess.Popen('<path-to-binary>', shell=True)) some processes hangs. So I used gdb -p <pid> to attach one suspended process, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added -g flag but it seems that the program hangs on wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System Info: gcc 6.5.0, Ubuntu 18.04 with 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this, that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)

How to gdb a core file with seg fault in centos 6.5?

My program is multi thread. i got a core file and when i try to debug it i got this.
Program terminated with signal 11, Segmentation fault.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
91 movl %ecx, (%rdi)
Missing separate debuginfos, use: debuginfo-install libssh2-1.8.0-2.0.cf.rhel6.x86_64
(gdb) bt
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
#1 0x00007f981b342feb in ?? ()
#2 0x00000000025f1ef0 in ?? ()
#3 0x00000000025edef0 in ?? ()
#4 0x00007fff4b65a810 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00000000025cb800 in ?? ()
#7 0x00000000025ccea0 in ?? ()
#8 0x0000000000000000 in ?? ()
Why the bt infos are "???" Can i identify which thread and where case the seg fault?
Thank you.
In order to run gdb and make the best use of it, firstly, you need to compile your source with the -g or -ggdb3 option of gcc in the following way:
gcc -ggdb3 sample.c -o sample
After this you will get an executable or binary file which you can execute. Upon execution the program will generate a segfault and a coredump will be created. You can use this core file in the following way with gdb to obtain the backtrace:
gdb ./sample /path/to/core/file
You can even launch your program using gdb without actually executing it separately and generating the core file explicitly. If you want to do this, execute the following command:
gdb ./sample
The "??" entries are where symbol translation failed. Stack walking – which produces the stack trace – can also fail. In that case you'll likely see a single valid frame, then a small number of bogus addresses. If symbols or stacks are too badly broken to make sense of the stack trace, then there are usually ways to fix it: installing debug info packages (giving gdb more symbols, and letting it do DWARF-based stack walks), or recompiling the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g).

many core files( e.g core.1678 etc ),how to find where the exactly error is. By using gdb?

core.1678,core.1689, how can i resolve this problem using gdb.i have tried gdb bt option but it is not resolving the error.
gdb -bt core.1678
(gdb) core
No core file now.
(gdb) n
The program is not being run.
(gdb) r
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) core.1678
/home/deepak/deepak/mss/.1678: No such file or directory.
(gdb) /home/deepak/deepak/mss/core.1678
help me out
many core files( e.g core.1678 etc )...
This indicates that your same program or different programs in that particular directory is continuously crashing. When your machine is configured to generate dump file, it creates the file in the form of core.(PID). You may refer many useful article regarding the core dump file. You may refer my blog as well which explains about core dump analysis and its internal.
http://mantoshopensource.blogspot.sg/2011/02/core-dump-analysis-part-ii.html
The basic command to load and analyze the core dump file using GDB is as follows:
mantosh#ubuntu:~$ gdb
// This is how you would open the core dump file.
(gdb) core core.23515
(no debugging symbols found)
Core was generated by `./otest LinuxWorldRocks 10'.
Program terminated with signal 11, Segmentation fault.
[New process 23515]
==> Signal 11(SIGSEGV) was the reason for this core-dump file
==> pid of a program is 23515
#0 0x080485f8 in ?? ()
// Load the debug symbol of your program(build with -g option)
(gdb) symbol ./otest
Reading symbols from /home/mantosh/Desktop/otest...done.
// Now you can execute any normal command which you perform while debugging(except breakpoints).
(gdb) bt
#0 0x080485f8 in printf_info (info=0x8ec5008 "LinuxWorld") at test.c:58
#1 0x080485c2 in my_memcpy (dest=0x8ec5012 "", source=0xbfb9c6fe "Rocks",
length=10) at test.c:47
#2 0x0804855d in main (argc=3, argv=0xbfb9b3f4) at test.c:33
EDIT
cored-ump file is the snapshot of that particular program at the time of exception/segmentation fault. So once you load core-dump in GDB you would only be able to execute the command to read
the memory information. You can not use the debugging commands like breakpoints, continue, run ...etc..........

No Debug Symbols cross compile ARM on BusyBox

i'm trying to debug a C program, which runs on an ARM926EJ-S rev 5 (v5l). The software was cross-compiled (and is statically linked) with the std. arm-linux-gnueabi compiler (intalled via synaptic). I run Ubuntu 13.04 64bit. On the device is a Busybox v1.18.2. I successfully compiled gdbserver (with host=arm-linux-gnueabi) and gdb (with target=arm-linux-gnueabi) and can start my program on the embedded device via the locally running gdb...
My problem now is, that i don't have a proper backtrace output.
Message of gdb:
Remote debugging using 192.168.21.127:2345
0x0000a79c in ?? ()
(gdb) run
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) continue
Continuing.
Cannot access memory at address 0x0
Program received signal SIGINT, Interrupt.
0x00026628 in ?? ()
(gdb) backtrace
#0 0x00026628 in ?? ()
#1 0x00036204 in ?? ()
#2 0x00036204 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I try to compile the software with -g, -g3 -gdwarf-2, -ggdb, -ggdb3 without any difference.
Has anybody an idea what i am missing here?
Is this a problem maybe with the BusyBox or do i need additional libs on my host system?
I also tried the function backtrace_symbols from execinfo.h with nearly the same output...
Thanks in advance for any reply.
Another way for debugging is use gdb inside board follow below steps.
1)Run gdb process and attach your process to gdb using attach <pid> command
2)Continue your process using c command in gdb
Whenever you find any SIGINT or SIGSEGV then refer stack of your process using bt command in gdb.

linker issue or other? dynamically loaded lib

My program loads a dynamic library, but after it tries to load it (it doesn't seem to, or at least something's amiss with the loading. A free() throws an error, and I commented out that line.)
I get the following in gdb.
Program received signal SIGSEGV, Segmentation fault.
__strlen_ia32 () at ../sysdeps/i386/i686/multiarch/../../i586/strlen.S:99
99 ../sysdeps/i386/i686/multiarch/../../i586/strlen.S: No such file or directory.
in ../sysdeps/i386/i686/multiarch/../../i586/strlen.S
How would I go about addressing this?
EDIT1:
The above issue was due to me not having an xml file where it should have been.
Here's the first error that I covered up to get to the initial error I showed.
(gdb) s
__dlopen (file=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so", mode=1)
at dlopen.c:76
76 dlopen.c: No such file or directory.
in dlopen.c
(gdb) bt
#0 __dlopen (file=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so",
mode=1) at dlopen.c:76
#1 0xb7f8680d in visual_plugin_get_references (
pluginpath=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so",
count=0xbfffd020) at lv_plugin.c:834
#2 0xb7f86168 in plugin_add_dir_to_list (list=0x804e428,
dir=0x804e288 "/usr/lib/libvisual-0.5/actor") at lv_plugin.c:609
#3 0xb7f86b2b in visual_plugin_get_list (paths=0x804e3d8,
ignore_non_existing=1) at lv_plugin.c:943
#4 0xb7f9c5db in visual_init (argc=0xbffff170, argv=0xbffff174)
at lv_libvisual.c:370
#5 0x080494b7 in main (argc=2, argv=0xbffff204) at client.c:32
(gdb) quit
A debugging session is active.
Inferior 1 [process 3704] will be killed.
Quit anyway? (y or n) y
starlon#lyrical:client$ ls /usr/lib/libvisual-0.5/actor/actor_AVS.so
/usr/lib/libvisual-0.5/actor/actor_AVS.so
starlon#lyrical:client$
The file exists. Not sure what's up. Not sure what code to provide either.
Edit2: More info on the file. Permissions are ok.
816K -rwxr-xr-x 1 root root 814K 2011-11-08 15:06 /usr/lib/libvisual-0.5/actor/actor_AVS.so
You didn't tell what dynamic library it is.
If it is a free dynamic library -or a library whose source is accessible to you- you can compile it and use it with debugging enabled.
Several Linux distributions -notably Debian & Ubuntu- provide debugging variant of many libraries (e.g. GLibc, GTK, Qt, etc...), so you don't need to rebuild them. For example, Debian has libgtk-3-0 package (the binary libraries mostly), libgtk-3-dev the development files for it (headers, etc...) and libgtk-3-0-dbg (the debugging variant of the library). You need to set LD_LIBRARY_PATH appropriately to use it (since it is in /usr/lib/debug/usr/lib/libgdk-3.so.0.200.1).
Sometimes, using the debugging variants of system libraries help you to find bugs in your own code. (Of course, you also need to compile with -g -Wall your own code)
Turned out this was due to a faulty hard drive. Looks like I need a new one.

Resources