I am doing cross-compile debugging.
My build server CPU is amd64. My device CPU is MIPS.
When I try to debug an ELF file I compiled myself, GDB only shows ld.so.1:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x7704f9c0 0x7706c490 Yes (*) /lib/ld.so.1
(*): Shared library is missing debugging information.
(gdb) q
I checked the /proc/xxxx/maps file. It showed that the shared libraries are loaded.
root#TRA:/proc/13679# cat maps
......
76549000-76d48000 rwxp 00000000 00:00 0 [stack:13682]
76d48000-76d4a000 r-xp 00000000 00:0c 5268 /usr/lib/strongswan/plugins/libstrongswan-addrblock.so
76d4a000-76d59000 ---p 00002000 00:0c 5268 /usr/lib/strongswan/plugins/libstrongswan-addrblock.so
......
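Reading the maps file by eye works, but the same check can be scripted. Below is a minimal C sketch. It assumes the usual maps column layout (address range, permissions, offset, device, inode, pathname), pulls out the pathname column, and flags names that look like shared objects. The function names are hypothetical, and the ".so" test is only a heuristic.

```c
#include <stdio.h>
#include <string.h>

/* Extract the pathname column from one /proc/<pid>/maps line.
   Returns 1 and fills path (empty for anonymous mappings), or 0 if the
   line does not look like a maps entry. Hypothetical helper. */
int maps_pathname(const char *line, char *path, size_t path_len)
{
    unsigned long start, end, offset, inode;
    unsigned int dev_major, dev_minor;
    char perms[5];
    int consumed = -1;
    size_t n;

    if (path_len == 0)
        return 0;
    if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n", &start, &end, perms,
               &offset, &dev_major, &dev_minor, &inode, &consumed) != 7)
        return 0;
    path[0] = '\0';
    if (consumed < 0)
        return 1;                       /* nothing after the inode column */
    n = strcspn(line + consumed, "\n"); /* drop a trailing newline, if any */
    if (n >= path_len)
        n = path_len - 1;
    memcpy(path, line + consumed, n);
    path[n] = '\0';
    return 1;
}

/* Heuristic (an assumption, not a rule): "*.so" and "*.so.N..." are shared objects. */
int looks_like_shared_object(const char *path)
{
    const char *p = path;
    while ((p = strstr(p, ".so")) != NULL) {
        if (p[3] == '\0' || p[3] == '.')
            return 1;
        p += 3;
    }
    return 0;
}
```

Feed it every line of /proc/<pid>/maps and print the flagged paths. That gives the list of libraries the loader actually mapped, which you can then compare with what info sharedlibrary reports.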
If I debug a binary installed from the Debian package server, GDB shows all the shared libraries.
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x77341bc0 0x77342c80 Yes (*) /lib/mips-linux-gnu/libdl.so.2
0x771d77e0 0x772ff6f0 Yes (*) /lib/mips-linux-gnu/libc.so.6
0x773549c0 0x77371490 Yes (*) /lib/ld.so.1
(*): Shared library is missing debugging information.
(gdb)
The GDB version is:
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
My question is:
Why can't the GDB command 'info sharedlibrary' show all the libraries? How can I fix it?
(EDIT)
(Does every executable need ld.so? It is missing from the NEEDED entries below.)
The output of the command "mips-linux-gnu-readelf -d src/charon/.libs/charon" is:
Dynamic section at offset 0x1fc contains 33 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libstrongswan.so.0]
0x00000001 (NEEDED) Shared library: [libhydra.so.0]
0x00000001 (NEEDED) Shared library: [libcharon.so.0]
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libdl.so.2]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000001d (RUNPATH) Library runpath: [/usr/lib/strongswan]
0x0000000c (INIT) 0xd00
0x0000000d (FINI) 0x2eb0
0x00000004 (HASH) 0x32c
0x00000005 (STRTAB) 0x904
0x00000006 (SYMTAB) 0x4d4
0x0000000a (STRSZ) 787 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x70000035 (MIPS_RLD_MAP_REL) 0x134dc
0x00000015 (DEBUG) 0x0
0x00000003 (PLTGOT) 0x13760
0x00000011 (REL) 0xcf0
0x00000012 (RELSZ) 16 (bytes)
0x00000013 (RELENT) 8 (bytes)
0x70000001 (MIPS_RLD_VERSION) 1
0x70000005 (MIPS_FLAGS) NOTPOT
0x70000006 (MIPS_BASE_ADDRESS) 0x0
0x7000000a (MIPS_LOCAL_GOTNO) 18
0x70000011 (MIPS_SYMTABNO) 67
0x70000012 (MIPS_UNREFEXTNO) 37
0x70000013 (MIPS_GOTSYM) 0x11
0x6ffffffb (FLAGS_1) Flags: PIE
0x6ffffffe (VERNEED) 0xca0
0x6fffffff (VERNEEDNUM) 2
0x6ffffff0 (VERSYM) 0xc18
0x00000000 (NULL) 0x0
EDIT
Debugging GDB:
The gdb query ‘qXfer:libraries-svr4:read’ returned empty library list.
Breakpoint 7, svr4_current_sos_via_xfer_libraries (list=0x7fff8be59ad0, annex=<optimized out>)
at /gdb/gdb-7.11.1/gdb/solib-svr4.c:1301
1301 result = svr4_parse_libraries (svr4_library_document, list);
1: svr4_library_document = 0x15cd9c0 "<library-list-svr4 version=\"1.0\"/>"
(gdb)
For the Debian packages, which I did not compile, the same query 'qXfer:libraries-svr4:read' returned the full shared library list.
How does gdbserver construct its reply to the query 'qXfer:libraries-svr4:read'?
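As I understand it (an assumption, not taken from gdbserver's source), gdbserver answers this query in three steps. It locates the loader's r_debug structure in the inferior, walks the link_map chain hanging off it, and emits one <library> element per entry. The simplified C sketch below covers only the rendering step. The lm_entry type and function names are hypothetical, names are not XML-escaped, and real replies may carry additional attributes. When r_debug cannot be found, the chain is empty, and the result is exactly the empty document seen in the trace above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of the loader's link_map chain. */
struct lm_entry {
    const char *name;            /* l_name */
    unsigned long lm;            /* address of this entry in the inferior */
    unsigned long l_addr;        /* load bias */
    unsigned long l_ld;          /* address of the object's dynamic section */
    const struct lm_entry *next; /* l_next */
};

/* Append formatted text at out + *used; -1 if it does not fit. */
static int append(char *out, size_t out_len, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*used >= out_len)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(out + *used, out_len - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= out_len - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Render the chain as a libraries-svr4 document. head == NULL (r_debug not
   found) produces only the empty element, matching the reply in the trace.
   Returns 0, or -1 on overflow or on a name that would need XML escaping
   (escaping is not implemented in this sketch). */
int svr4_library_xml(const struct lm_entry *head, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return -1;
    out[0] = '\0';
    if (head == NULL)
        return append(out, out_len, &used, "<library-list-svr4 version=\"1.0\"/>");
    if (append(out, out_len, &used, "<library-list-svr4 version=\"1.0\">") < 0)
        return -1;
    for (const struct lm_entry *e = head; e != NULL; e = e->next) {
        if (strpbrk(e->name, "\"<>&") != NULL)
            return -1;
        if (append(out, out_len, &used,
                   "<library name=\"%s\" lm=\"0x%lx\" l_addr=\"0x%lx\" l_ld=\"0x%lx\"/>",
                   e->name, e->lm, e->l_addr, e->l_ld) < 0)
            return -1;
    }
    return append(out, out_len, &used, "</library-list-svr4>");
}
```

Under this model, the interesting question is not the XML at all. It is why the r_debug lookup comes back empty for the self-compiled binary.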
EDIT
One more clue:
The packages installed from the Debian Jessie distribution are not PIE code.
The code I compiled is PIE code.
root#TRA:/proc/14956# readelf -r /usr/lib/strongswan/charon
Relocation section '.rel.dyn' at offset 0xcf0 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000000 R_MIPS_NONE
00013870 00000003 R_MIPS_REL32
root#TRA:/proc/14956# readelf -r /usr/bin/id
There are no relocations in this file.
root#TRA:/proc/14956#
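Relocation counts are only an indirect signal. A more direct check is e_type in the ELF header, which readelf -h prints as Type:
- ET_EXEC (2) is a classic fixed-address executable.
- ET_DYN (3) covers both PIE executables and shared libraries.

So ET_DYN plus a PT_INTERP segment, or the PIE flag in FLAGS_1 (as in the readelf -d dump above), points to a PIE. The minimal C sketch below reads e_type from the first bytes of a file. It respects EI_DATA, because mips-linux-gnu is big-endian while the build host is little-endian. The function name is hypothetical.

```c
#include <stddef.h>

/* Return the ELF e_type (2 = ET_EXEC, 3 = ET_DYN) from the first bytes
   of a file, or -1 if the buffer is not an ELF header. Hypothetical helper. */
int elf_type(const unsigned char *hdr, size_t len)
{
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return -1;
    /* e_type is the 2-byte field right after the 16-byte e_ident, in both
       ELF32 and ELF64. e_ident[EI_DATA] (byte 5) gives its byte order. */
    if (hdr[5] == 1)                      /* ELFDATA2LSB, e.g. amd64 */
        return hdr[16] | (hdr[17] << 8);
    if (hdr[5] == 2)                      /* ELFDATA2MSB, e.g. mips-linux-gnu */
        return (hdr[16] << 8) | hdr[17];
    return -1;
}
```

Running it on the first 18 bytes of /usr/bin/id and of your own charon binary should show the difference directly, assuming the distribution binary is ET_EXEC as the missing relocations suggest.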
EDIT
After debugging gdbserver, I found something strange.
The DT_DEBUG entry of the running process is 0. After the loader relocates the code, shouldn't DT_DEBUG be nonzero? Does the system not support PIE code? I am using a Debian Jessie MIPS system.
gdbserver source code:
if (dyn->d_tag == DT_DEBUG && map == -1)
map = dyn->d_un.d_val;
gdbserver debug print:
(gdb) p *dyn
$19 = {d_tag = 21, d_un = {d_val = 0, d_ptr = 0}}
(gdb)
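One possible explanation, stated here as an assumption rather than a verified diagnosis: on MIPS, the loader may publish the r_debug pointer through a different slot than DT_DEBUG.
- For non-PIE binaries, the slot is named by DT_MIPS_RLD_MAP, which holds an absolute address.
- For PIE binaries, the slot is named by DT_MIPS_RLD_MAP_REL, which holds an offset relative to the tag's own address. This is the MIPS_RLD_MAP_REL entry in the readelf -d dump above.

In either case DT_DEBUG can legitimately stay 0. A debugger that knows DT_DEBUG and DT_MIPS_RLD_MAP but not DT_MIPS_RLD_MAP_REL would find no r_debug for the PIE binary and report an empty library list.

The C sketch below models that lookup on a simplified dynamic array. The type, macro, and function names are hypothetical, and this is not gdbserver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag values as printed by readelf -d above (hypothetical macro names). */
#define TAG_NULL             0x00000000u
#define TAG_DEBUG            0x00000015u /* DT_DEBUG, d_tag = 21 */
#define TAG_MIPS_RLD_MAP     0x70000016u /* DT_MIPS_RLD_MAP: absolute address */
#define TAG_MIPS_RLD_MAP_REL 0x70000035u /* DT_MIPS_RLD_MAP_REL: relative to the tag */

/* Simplified 32-bit dynamic entry (the real type is Elf32_Dyn). */
typedef struct {
    uint32_t d_tag;
    uint32_t d_val;
} dyn_entry;

/* Return the runtime address of the word that will hold the r_debug pointer,
   or 0 if no usable tag exists. dyn_addr is the runtime address of dyn[0].
   The MIPS tags win over DT_DEBUG in this model (an assumption). */
uint32_t r_debug_slot(const dyn_entry *dyn, uint32_t dyn_addr)
{
    uint32_t debug_slot = 0;

    for (size_t i = 0; dyn[i].d_tag != TAG_NULL; i++) {
        uint32_t entry_addr = dyn_addr + (uint32_t)(i * sizeof(dyn_entry));
        switch (dyn[i].d_tag) {
        case TAG_MIPS_RLD_MAP:
            return dyn[i].d_val;              /* non-PIE: absolute slot address */
        case TAG_MIPS_RLD_MAP_REL:
            return entry_addr + dyn[i].d_val; /* PIE: offset from this tag */
        case TAG_DEBUG:
            /* Elsewhere the loader stores &r_debug in d_val of this entry. */
            debug_slot = entry_addr + (uint32_t)offsetof(dyn_entry, d_val);
            break;
        }
    }
    return debug_slot;
}
```

If this model is right, a tool that only follows the DT_DEBUG branch reads the 0 shown above and gives up. That would match the symptom without the system lacking PIE support as such.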
EDIT
I found some information at this link:
https://sourceware.org/ml/binutils/2015-06/msg00166.html
I installed gdbserver from the Debian Jessie MIPS package server, but it does not seem to support PIE. Where can I get a MIPS gdbserver that supports PIE?
Or how can I stop gcc from generating PIE code?
I tried these flags (-fno-pie -fPIC) when cross-compiling, but it still generates PIE code.
libtool: link: mips-linux-gnu-gcc -mfp32 -fno-pie -fPIC
-ggdb -O0 -Wall -Wno-format -Wno-format-security
-Wno-pointer-sign -I/cross-mips/usr/include -I/cross-mips/usr/include/libnl3
-I/cross-mips/usr/include/mips-linux-gnu
-I/work/strongswan/src/util
-include /work/strongswan/config.h
-o .libs/charon charon.o -L/cross-mips/lib/mips-linux-gnu
-L/cross-mips -L/cross-mips/usr/lib/mips-linux-gnu
../../src/libstrongswan/.libs/libstrongswan.so
-lm -lpthread -ldl -Wl,-rpath -Wl,/usr/lib/strongswan
Check the generated code:
mips-linux-gnu-readelf -r src/charon/.libs/charon
Relocation section '.rel.dyn' at offset 0xcf0 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000000 R_MIPS_NONE
00013870 00000003 R_MIPS_REL32
Solution
Unfortunately, the reason is that my gcc-6 compiler is broken. I used 'gcc version 6.3.0 20170516 (Debian 6.3.0-18)'. It is configured with '--enable-default-pie', and there is no way to disable PIE. This PIE default also breaks linking against static libraries. I have to switch my compiler to gcc-5.
From the info you provided, there seem to be two likely causes:
1. You fully strip your binary, and gdbserver requires some symbol, or
2. You are building a PIE binary, and gdbserver on your system doesn't support such binaries.
(It's also possible that the combination of 1 and 2 causes the problem.)
Since you know that the distribution binaries work, your best bet is probably to understand the differences between them and your binary, and to minimize those differences until gdbserver starts working.
Related
I have a number of object files in ELF format, with the usual .text and other common sections. I was wondering whether GNU ld or gold could be used to link them into an ELF executable, even if the architecture (an 8-bit micro with a proprietary toolchain) is not known beforehand by the linker. In essence, I'm asking whether the linking process is, to some extent, platform independent once you have all the required object files, or whether I will need to roll my own linker at some point.
No, it won't work.
A major thing the linker has to do is to handle relocations. Relocations are arch-specific:
int f(){return 42;}
$ gcc -c foo.c -o foo && readelf -r foo
Relocation section '.rela.eh_frame' at offset 0x198 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
$ gcc -m32 -c foo.c -o foo && readelf -r foo
Relocation section '.rel.text' at offset 0x1d0 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000004 00000b02 R_386_PC32 00000000 __x86.get_pc_thunk.ax
00000009 00000c0a R_386_GOTPC 00000000 _GLOBAL_OFFSET_TABLE_
Relocation section '.rel.eh_frame' at offset 0x1e0 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000020 00000202 R_386_PC32 00000000 .text
00000040 00000502 R_386_PC32 00000000 .text.__x86.get_pc_thu
$ clang -target arm-linux-gnueabi -c foo.c -o foo && readelf -r foo
Relocation section '.rel.ARM.exidx' at offset 0x104 contains 1 entry:
Offset Info Type Sym.Value Sym. Name
00000000 0000032a R_ARM_PREL31 00000000 .text
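To make the point concrete: the Info column packs a symbol index and a type, and the type number only has a meaning relative to the target machine. The C sketch below decodes the Info values from the three dumps above. The enum and function names are hypothetical. Only the relocation types seen above are listed, plus R_ARM_ABS32 to show a clash on type 2.

```c
#include <stdint.h>
#include <string.h>

/* Machines from the dumps above (hypothetical enum, not the EM_* values). */
enum machine { MACH_386, MACH_X86_64, MACH_ARM };

/* r_info packs (symbol index, type) differently for ELF32 and ELF64,
   mirroring the ELF32_R_SYM/ELF32_R_TYPE and ELF64_R_SYM/ELF64_R_TYPE macros. */
uint32_t rel32_type(uint32_t info) { return info & 0xffu; }
uint32_t rel32_sym(uint32_t info)  { return info >> 8; }
uint32_t rel64_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }
uint32_t rel64_sym(uint64_t info)  { return (uint32_t)(info >> 32); }

/* The same type number names different relocations on different machines. */
const char *reloc_name(enum machine m, uint32_t type)
{
    switch (m) {
    case MACH_X86_64:
        if (type == 2) return "R_X86_64_PC32";
        break;
    case MACH_386:
        if (type == 2) return "R_386_PC32";
        if (type == 10) return "R_386_GOTPC";
        break;
    case MACH_ARM:
        if (type == 2) return "R_ARM_ABS32";
        if (type == 42) return "R_ARM_PREL31";
        break;
    }
    return "unknown";
}

/* Convenience check: does this (machine, type) pair have the given name? */
int reloc_is(enum machine m, uint32_t type, const char *name)
{
    return strcmp(reloc_name(m, type), name) == 0;
}
```

Type 2 is a PC-relative relocation on both x86 flavours but an absolute one on ARM. A linker that does not know the target cannot even interpret these numbers, let alone apply them.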
Moreover the linker script which says how the ELF file should be generated (page size, start address, etc.) is arch-specific:
ld -m elf_x86_64 --verbose
ld -m elf_i386 --verbose
arm-linux-gnueabi-ld --verbose
If you're not compiling to a static executable, the linker also has to generate PLT entries, which are native code (and thus arch-specific).
Some architectures also have arch-specific sections (e.g. .ARM.extab, .ARM.exidx).
There are many questions about specific errors where stepping into a shared library with gdb doesn't work. None of them gives a systematic answer on how to pin down the cause. This question is about ways to diagnose the setup.
Setup example
main.c
#include <stdio.h>
#include "myshared.h"
int main(void)
{
int a = 3;
print_from_lib();
return 0;
}
myshared.h
void print_from_lib();
myshared.c
#include <stdio.h>
void print_from_lib()
{
printf("Printed from shared library\n");
}
Place all the files in the same directory.
export LIBRARY_PATH=$PWD:$LIBRARY_PATH
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
gcc -ggdb -c -Wall -Werror -fpic myshared.c -o myshared-ggdb.o
gcc -ggdb -shared -o libmyshared-ggdb.so myshared-ggdb.o
gcc -ggdb main.c -lmyshared-ggdb -o app-ggdb
Getting the error
$ gdb ./app-ggdb
GNU gdb (Ubuntu 7.12.50.20170314-0ubuntu1) 7.12.50.20170314-git
...### GDB STARTING TEXT
Reading symbols from app-ggdb...done.
(gdb) break 7
Breakpoint 1 at 0x78f: file main.c, line 7.
(gdb) run
Starting program: /home/user/share-lib-example/app-ggdb
Breakpoint 1, main () at main.c:7
7 print_from_lib();
(gdb) s
Printed from shared library
8 return 0;
gdb does not step into the function.
Necessary but not sufficient checks
Debug symbols in the binaries
$ objdump --syms libmyshared-ggdb.so | grep debug
0000000000000000 l d .debug_aranges 0000000000000000 .debug_aranges
0000000000000000 l d .debug_info 0000000000000000 .debug_info
0000000000000000 l d .debug_abbrev 0000000000000000 .debug_abbrev
0000000000000000 l d .debug_line 0000000000000000 .debug_line
0000000000000000 l d .debug_str 0000000000000000 .debug_str
Symbols recognized by gdb
$ gdb ./app-ggdb
...### GDB STARTING TEXT
Reading symbols from app-ggdb...done.
(gdb) break 7
Breakpoint 1 at 0x78f: file main.c, line 7.
(gdb) run
Starting program: /home/user/share-lib-example/app-ggdb
Breakpoint 1, main () at main.c:7
7 print_from_lib();
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x00007ffff7dd7aa0 0x00007ffff7df55c0 Yes /lib64/ld-linux-x86-64.so.2
0x00007ffff7bd5580 0x00007ffff7bd5693 Yes /home/user/share-lib-example/libmyshared-ggdb.so
0x00007ffff782d9c0 0x00007ffff797ed43 Yes /lib/x86_64-linux-gnu/libc.so.6
Confirm .gdbinit isn't the cause
~/.gdbinit contains commands that are executed automatically when gdb starts.
Running gdb with the -nx flag excludes .gdbinit as the source of the problem.
Question
I am looking for suggestions to complete the list of necessary but not sufficient checks.
Current issue [Update from Mark Plotnick]
This step bug is reproducible on Ubuntu 17.04 amd64 with both a 64- and 32-bit executable and library.
The bug isn't reproducible on Ubuntu 17.04 i386. (gcc 6.3.0-12ubuntu2, gdb 7.12.50 and 8.0, no .gdbinit.).
Possibly relevant: gcc on 17.04 amd64 has been built (by Canonical) to generate pie executables by default.
Question
Can the flags gcc was built with interfere with debugging? How can you tell whether your gcc is the cause?
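One quick check for PIE-by-default builds relies on predefined macros. GCC and Clang predefine __PIE__ (and __pie__) when compiling with -fpie/-fPIE, and a compiler configured with --enable-default-pie does this implicitly. The minimal C sketch below reports what the compiler did for this translation unit. The function name is hypothetical.

Note that the macro reflects code generation only. Whether the final link used -pie is a separate question; readelf -h reports Type: DYN for a PIE executable.

```c
/* Report whether this file was compiled as position-independent executable
   code. __PIE__/__pie__ are predefined by GCC and Clang under -fpie/-fPIE,
   which a toolchain built with --enable-default-pie applies implicitly. */
int compiled_as_pie(void)
{
#if defined(__PIE__) || defined(__pie__)
    return 1;
#else
    return 0;
#endif
}
```

The same information is available without compiling anything:
- gcc -dM -E - </dev/null | grep -i pie lists the predefined PIE macros, if any.
- gcc -v prints the "Configured with:" line, which shows whether --enable-default-pie was used.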
Your problem is self-imposed: don't run set step-mode on, and step will work as you expect.
From the GDB manual:
set step-mode
set step-mode on
The set step-mode on command causes the step command to stop at the first
instruction of a function which contains no debug line information
rather than stepping over it.
This is useful in cases where you may be interested in inspecting the machine
instructions of a function which has no symbolic info and do not want
GDB to automatically skip over this function.
You are interested in the opposite of the above -- you want to step into the print_from_lib function and avoid stopping inside the PLT jump stub and the dynamic loader's symbol resolution function.
I can't reproduce this problem with GDB 7.11.
These are my steps. I hope they help:
1. gcc -ggdb -c -Wall -Werror -fpic myshared.c -o myshared-ggdb.o
2. gcc -ggdb -shared -o libmyshared-ggdb.so myshared-ggdb.o
3. gcc -ggdb main.c -lmyshared-ggdb -o app-ggdb -L.
4. gdb ./app-ggdb
In GDB,
(gdb) set env LD_LIBRARY_PATH=.
(gdb) b main.c:7
Breakpoint 1 at 0x4006a5: file main.c, line 7.
(gdb) r
Starting program: /home/haolee/tmp/app-ggdb
Breakpoint 1, main () at main.c:7
7 print_from_lib();
(gdb) s
print_from_lib () at myshared.c:5
5 printf("Printed from shared library\n");
(gdb)
I step into the function print_from_lib successfully.
Some more tests you can do on built shared library:
file libmyshared-ggdb.so should report that the library has debug info and is not stripped.
nm libmyshared-ggdb.so | grep print_from_lib should find the symbol for the print_from_lib function.
If all the above tests pass, try loading the library directly in gdb and finding the function:
gdb libmyshared-ggdb.so
(gdb) info functions print_from_lib
The name of the function print_from_lib should be printed. If not, something is wrong with gdb or gcc.
When I specify the output format as i386, my executable gets a SIGSEGV. However, when I use the -m elf_i386 option, it works. According to the man page, the two are different: OUTPUT_FORMAT is equivalent to the --oformat option.
So, what are the differences between the two and which should I use in which cases?
Example code:
File hello.c:
int a = 1;
int b;
void _start() {
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
}
script.lds: OUTPUT_FORMAT and OUTPUT_ARCH don't seem to help my program run.
/* OUTPUT_FORMAT("elf32-i386"); */
/* OUTPUT_ARCH(i386); */
OUTPUT(hello);
ENTRY(_start);
SECTIONS
{
.text 0x10000:
{
*(.text)
}
.data 0x8000000:
{
*(.data)
}
.bss :
{
*(.bss)
}
}
Commands executed:
gcc -m32 -nostdlib -g -c hello.c -o hello.o
ld -m elf_i386 -T script.lds hello.o
The real difference is that an emulation means much more than just OUTPUT_ARCH and OUTPUT_FORMAT. Some of the details are almost obvious, like the difference in default linker scripts that can be seen with the --verbose option. Some are described in this document, but most of the answers can only be found in the sources, e.g. by comparing the emulation script for elf_i386 with the emulation script for elf_x86_64. The difference there doesn't seem that large, but it's not the only one. What actually bites you in your particular case can't even be seen in a diff between the ld/eelf_i386.c and ld/eelf_x86_64.c files generated at ld build time, because it boils down to a constant that comes from the bfd library and that also depends on the emulation.
So, let's drill down a little bit and see what happens. Everywhere down below by script.lds I mean your script with OUTPUT_ARCH and OUTPUT_FORMAT uncommented.
Now, let's take a look at differences in results first:
$ ld -T script.lds hello.o
$ LC_ALL=C objdump -p hello
hello: file format elf32-i386
Program Header:
LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**21
filesz 0x00010048 memsz 0x00010048 flags r-x
LOAD off 0x00200000 vaddr 0x08000000 paddr 0x08000000 align 2**21
filesz 0x00000004 memsz 0x00000008 flags rw-
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
filesz 0x00000000 memsz 0x00000000 flags rw-
$ ld -m elf_i386 -T script.lds hello.o
$ LC_ALL=C objdump -p hello
hello: file format elf32-i386
Program Header:
LOAD off 0x00001000 vaddr 0x00010000 paddr 0x00010000 align 2**12
filesz 0x00000048 memsz 0x00000048 flags r-x
LOAD off 0x00002000 vaddr 0x08000000 paddr 0x08000000 align 2**12
filesz 0x00000004 memsz 0x00000008 flags rw-
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
filesz 0x00000000 memsz 0x00000000 flags rw-
Notice that the "bad" binary has a PT_LOAD segment with a virtual address of zero and an alignment of 0x00200000. A virtual address of 0 doesn't sound right, but let's see why it really fails. Debugging this is real fun. If you try gdb, you get this:
(gdb) run
Starting program: /somewhere/hello
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
(gdb) info registers
The program has no registers now.
So the program never actually starts running. Let's look at strace then:
$ strace ./hello
execve("./hello", ["./hello"], [/* 108 vars */]) = -1 EPERM (Operation not permitted)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
We see that execve() returns EPERM. What permission could be failing? It's exactly that virtual address of zero. The kernel tries to load our ELF and map the file at virtual address 0, and it fails because a security feature introduced around Linux 2.6.23 forbids doing that. This can be configured, though, so after a simple
$ echo 0 > /proc/sys/vm/mmap_min_addr
the "bad" binary suddenly starts working.
But we're not here to make something work (yay!), we're here to understand the differences in ld behaviour. What else differs between our "bad" and "good" binaries is the alignment of the loadable segments. If you think about it for a while, you'll see that ld's behaviour is actually absolutely correct. With an alignment constraint of 0x1000, it uses virtual address 0x10000 for the segment start, which is valid for that alignment. With an alignment constraint of 0x200000, given that we instructed it to put our .text at address 0x10000, it has no choice but to use a base virtual address of zero!
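The arithmetic can be checked in a couple of lines. The C sketch below is a simplified model and an assumption: it ignores how ld chooses file offsets and places headers. It rounds the requested address down to the segment alignment and compares the result with mmap_min_addr. The function names are hypothetical.

```c
#include <stdint.h>

/* Round a segment's requested virtual address down to its alignment
   (a power of two); in this simplified model that is where the
   loadable segment starts. */
uint64_t segment_base(uint64_t vaddr, uint64_t align)
{
    return vaddr & ~(align - 1);
}

/* Nonzero if a mapping starting at base lies below mmap_min_addr
   (pass in the value read from /proc/sys/vm/mmap_min_addr). */
int below_mmap_min_addr(uint64_t base, uint64_t mmap_min_addr)
{
    return base < mmap_min_addr;
}
```

With .text requested at 0x10000, an alignment of 0x200000 yields base 0, which falls below any nonzero mmap_min_addr. An alignment of 0x1000 keeps the base at 0x10000.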
So where does this alignment requirement come from? Here we return to our emulation stuff. The default alignment for both elf_i386 and elf_x86_64 is the maximum page size (obtained from bfd via bfd_emul_get_maxpagesize()), but that page size is different for these architectures.
You actually can build your binary without elf_i386 emulation, but in order to do that you need to specify the maximum page size via parameter, like:
$ ld -T script.lds -z max-page-size=0x1000 hello.o
This resulting binary will not only work without mmap_min_addr tweaks, but it will also be bit-by-bit identical to the one built with proper elf_i386 emulation.
Getting back to the original questions — the difference is huge and subtle in its details. You definitely want to use the right emulation when you build your software. 99.99% of the time your OUTPUT_FORMAT is going to be something very similar to your emulation parameter.
But. Well. There are some cases. Things you normally don't do. But you can do if you're careful and there is need to, like, for example:
$ head -n 1 script.lds
OUTPUT_FORMAT("srec");
$ ld -T script.lds hello.o
$ file hello
hello: Motorola S-Record; binary data in text format
Exactly the case where your emulation is one thing and OUTPUT_FORMAT is really about output format that you need for some (strange) reason.
But don't try that at home, please, use proper emulations and forget about all this nightmare.
I'm trying to build a simple ("hello world") C++ program with LLVM/Clang 3.7.0 built from sources against the toolchain's libc++, with the command line:
clang++ -std=c++14 -stdlib=libc++ -fno-exceptions hello.cpp
However, I get the following errors:
/usr/bin/ld: warning: libc++abi.so.1, needed by /bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so, not found (try using -rpath or -rpath-link)
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `__cxa_rethrow_primary_exception'
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `__cxa_decrement_exception_refcount'
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `std::out_of_range::~out_of_range()'
[...]
The LD_LIBRARY_PATH is not set and the toolchain's install directory is added to my working PATH by:
export PATH=$PATH:/bulk/workbench/llvm/3.7.0/toolchain4/bin/
I'm on Ubuntu GNU/Linux 14.04, and I have not installed any LLVM- or Clang-related packages from any repository.
According to the libc++ documentation:
On Linux libc++ can typically be used with only ‘-stdlib=libc++’. However some libc++ installations require the user manually link libc++abi themselves. If you are running into linker errors when using libc++ try adding ‘-lc++abi’ to the link line.
Doing as suggested gives a successful build.
So, my question is this:
Why do I have to specify the -lc++abi dependency explicitly on the line of the build command?
Doing
readelf -d $(llvm-config --libdir)/libc++.so
gives
Dynamic section at offset 0xb68c8 contains 31 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x000000000000000e (SONAME) Library soname: [libc++.so.1]
0x000000000000000f (RPATH) Library rpath: [$ORIGIN/../lib]
0x000000000000000c (INIT) 0x350a8
[...]
Shouldn't the embedded RPATH in the dynamic section of the ELF be considered by ld as described in its man page under the section -rpath-link=dir?
Moreover, when I set the LD_LIBRARY_PATH with
LD_LIBRARY_PATH=$(llvm-config --libdir)
the initial build command (without specifying -lc++abi) works, as also described in the 5th clause of the aforementioned man entry.
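For reference, the loader's side of that RPATH entry is well defined. ld.so(8) says $ORIGIN expands to the directory containing the program or shared object that carries the tag. The C sketch below performs that expansion. Its simplifications are assumptions: only a leading $ORIGIN is handled, and ${ORIGIN} or later occurrences are left untouched. The function name is hypothetical.

Whether the static linker's -rpath-link search applies the same expansion to DT_RPATH entries is exactly what the question asks, and this sketch does not settle it.

```c
#include <stdio.h>
#include <string.h>

/* Expand a leading "$ORIGIN" in one RPATH/RUNPATH entry using the ld.so(8)
   definition: the directory containing the object that carries the tag.
   Returns 0 on success, -1 if the result does not fit in out. */
int expand_origin(const char *entry, const char *object_path, char *out, size_t out_len)
{
    const char *slash = strrchr(object_path, '/');
    int n;

    if (strncmp(entry, "$ORIGIN", 7) == 0 && (entry[7] == '/' || entry[7] == '\0')) {
        if (slash == NULL)              /* object named without a directory */
            n = snprintf(out, out_len, ".%s", entry + 7);
        else if (slash == object_path)  /* object in the root directory */
            n = snprintf(out, out_len, "/%s", entry[7] == '/' ? entry + 8 : entry + 7);
        else
            n = snprintf(out, out_len, "%.*s%s",
                         (int)(slash - object_path), object_path, entry + 7);
    } else {
        n = snprintf(out, out_len, "%s", entry);  /* nothing to expand */
    }
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

For the libc++.so from the warning above, $ORIGIN/../lib resolves to the toolchain's own lib directory, which is where libc++abi.so.1 lives.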
I'm currently working through a document titled "Building a Simple OS -- from scratch". It teaches x86 instructions only in 32-bit. At one point the author lists this C function:
int my_function() {
return 0xbaba;
}
and says that it compiles into this assembly:
00000000 55 push ebp
00000001 89E5 mov ebp, esp
00000003 B8BABA0000 mov eax, 0xbaba
00000008 5D pop ebp
00000009 C3 ret
I have the code for my_function() in a file called basic.c and I'm using the following bash instructions (on Mac OS X Yosemite w/ Xcode installed):
gcc -ffreestanding -m32 -c basic.c -o basic.o
ld -arch i386 -no_pie -e _my_function -static -o basic.bin -image_base 0x0 basic.o
These are successful, but when I run
ndisasm -b 32 basic.bin > basic.dis
I get a file with over 2000 lines of assembly, most of which are
00000FDA 0000 add [eax],al
How can I get it to compile to just the five lines listed by the author?
You should be looking at the .o file, not the linked file (or using a different tool to disassemble just the desired function in the linked file). Per the manual:
NDISASM does not have any understanding of object file formats, like objdump, and it will not understand DOS .EXE files like debug will. It just disassembles.
ld in the OS X / Xcode toolchain produces a Mach-O binary. This includes various metadata in addition to the machine code for the function. ndisasm isn't aware of the file structure and is attempting to disassemble the metadata as code (which it isn't).
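To illustrate the "it just disassembles" point, here is a toy C decoder. It is hypothetical and nowhere near a real x86 disassembler: it handles only the bytes in the author's listing plus 00 00. Like ndisasm, it decodes whatever it is handed. The zero bytes in the Mach-O file therefore decode as add [eax],al, matching the many such lines in basic.dis.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy a fixed mnemonic into out; -1 if it does not fit. */
static int emit(char *out, size_t out_len, const char *text)
{
    size_t n = strlen(text);
    if (n >= out_len)
        return -1;
    memcpy(out, text, n + 1);
    return 0;
}

/* Toy 32-bit decoder for the author's five instructions plus "00 00".
   It has no notion of file formats: it decodes whatever bytes it gets.
   Advances *pos on success; returns -1 for anything it does not know. */
int decode_one(const uint8_t *code, size_t len, size_t *pos, char *out, size_t out_len)
{
    size_t i = *pos;
    size_t size = 0;
    int rc = -1;

    if (i >= len)
        return -1;
    switch (code[i]) {
    case 0x55: size = 1; rc = emit(out, out_len, "push ebp"); break;
    case 0x5D: size = 1; rc = emit(out, out_len, "pop ebp"); break;
    case 0xC3: size = 1; rc = emit(out, out_len, "ret"); break;
    case 0x89: /* mov r/m32, r32; ModRM 0xE5 = register form, ebp <- esp */
        if (i + 1 < len && code[i + 1] == 0xE5) {
            size = 2;
            rc = emit(out, out_len, "mov ebp, esp");
        }
        break;
    case 0xB8: /* mov eax, imm32; the immediate is little-endian */
        if (i + 4 < len) {
            uint32_t imm = (uint32_t)code[i + 1] | (uint32_t)code[i + 2] << 8 |
                           (uint32_t)code[i + 3] << 16 | (uint32_t)code[i + 4] << 24;
            int n = snprintf(out, out_len, "mov eax, 0x%x", (unsigned)imm);
            size = 5;
            rc = (n >= 0 && (size_t)n < out_len) ? 0 : -1;
        }
        break;
    case 0x00: /* add r/m8, r8; ModRM 0x00 = [eax], al */
        if (i + 1 < len && code[i + 1] == 0x00) {
            size = 2;
            rc = emit(out, out_len, "add [eax],al");
        }
        break;
    }
    if (rc == 0)
        *pos = i + size;
    return rc;
}
```

Pointing a file-format-aware tool at the object file avoids decoding headers and padding as code. On OS X, for example, otool -tv basic.o disassembles just the text section.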