How to debug an experimental toolchain producing malformed executables - c

I am working on cross-compiling an experimental GNU-free Linux toolchain that uses clang (instead of gcc), compiler-rt (instead of libgcc), libunwind (available at http://llvm.org/git/libunwind.git) (instead of libgcc_s), lld (instead of GNU ld), libcxx (instead of libstdc++), libcxxabi (I'm not sure what it replaces; I'm unclear on the GNU distinction between libstdc++ and its ABI library) and musl (instead of glibc).
Using a musl-based gcc cross compiler and a few patches, I've managed to compile all of the above, and to compile and link a simple hello world C program with the result. Something seems to have gone wrong, however, as running the hello world program results in a segmentation fault:
$ ./hello
Segmentation fault
$
Normally I would simply debug it with gdb, but herein lies the problem:
$ gdb ./hello
Reading symbols from ./hello...Dwarf Error: Could not find abbrev number 5 in CU at offset 0x52 [in module /home/main/code/main/asm/hello]
(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x206
Starting program: /hello
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
I can't seem to step through the program in any way, I'm guessing because the error occurs somewhere in early C runtime startup. I can't even step through the assembly using layout asm and stepi, so I don't know how to find out where exactly the error occurs (in order to debug my toolchain).
I have confirmed that the problem lies with lld: using GNU binutils ld to link the hello world object (statically) against the cross-compiled libraries and object files produces a functional hello world program. Since lld links without any errors, however, I can't pinpoint where the failure occurs.
Note that I compiled hello as a static executable and used the -v gcc/clang option to verify that all the correct libraries and object files were linked in.
Note that the online GDB documentation has the following to say about the above error:
On Unix systems, by default, if a shell is available on your target, gdb uses it to start your program. Arguments of the run command are passed to the shell, which does variable substitution, expands wildcard characters and performs redirection of I/O. In some circumstances, it may be useful to disable such use of a shell, for example, when debugging the shell itself or diagnosing startup failures such as:
(gdb) run
Starting program: ./a.out
During startup program terminated with signal SIGSEGV, Segmentation fault.
which indicates the shell or the wrapper specified with ‘exec-wrapper’ crashed, not your program.
I don't think this applies here, considering what I'm working with, that the problem doesn't happen when I use GNU ld, and that the suggested solution (set startup-with-shell off) doesn't work.

Cross-compiling means that the compilation is done on a host machine, and the output of the compilation is a binary meant to run on a target machine. The compiled binary may therefore not be compatible with your host CPU. If your target supports it, you could instead run the binary there and use the debugger from your toolchain to connect to the running binary remotely. Alternatively, a debugger may also be available on the target itself, so you can debug the binary in place.
To get more of a feel for the problem, try running the file command on the compiled binary and on some other binaries from your host to see the differences.
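Going one step beyond file, and not part of the original answer: since the GNU ld and lld outputs differ only in how they were linked, comparing their ELF headers and program headers (the segments the kernel actually maps) is a natural next step. readelf -h -l from binutils prints this; the sketch below does the same using the &lt;elf.h&gt; definitions shipped by glibc and musl. It assumes 64-bit ELF, and the function names are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read and sanity-check a 64-bit ELF header from an open file.
   Returns 0 on success, -1 if the file is short or not ELF64. */
int read_elf64_header(FILE *f, Elf64_Ehdr *eh)
{
    if (fseek(f, 0, SEEK_SET) != 0 || fread(eh, sizeof *eh, 1, f) != 1)
        return -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)   /* "\177ELF" magic */
        return -1;
    if (eh->e_ident[EI_CLASS] != ELFCLASS64)         /* sketch handles ELF64 only */
        return -1;
    return 0;
}

/* Print the header fields and every program header (segment).
   Returns 0 on success, -1 on a read error or malformed header. */
int dump_elf64(FILE *f)
{
    Elf64_Ehdr eh;
    if (read_elf64_header(f, &eh) != 0 || eh.e_phentsize < sizeof(Elf64_Phdr))
        return -1;
    printf("type=%u machine=%u entry=0x%llx phnum=%u\n",
           (unsigned)eh.e_type, (unsigned)eh.e_machine,
           (unsigned long long)eh.e_entry, (unsigned)eh.e_phnum);
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            return -1;
        printf("  phdr[%u] type=0x%x flags=0x%x offset=0x%llx vaddr=0x%llx\n"
               "           filesz=0x%llx memsz=0x%llx align=0x%llx\n",
               i, (unsigned)ph.p_type, (unsigned)ph.p_flags,
               (unsigned long long)ph.p_offset, (unsigned long long)ph.p_vaddr,
               (unsigned long long)ph.p_filesz, (unsigned long long)ph.p_memsz,
               (unsigned long long)ph.p_align);
        if (ph.p_type == PT_INTERP)
            printf("    PT_INTERP present: dynamically linked\n");
    }
    return 0;
}
```

Calling dump_elf64 on both hello binaries from a small main and diffing the output would show, for example, a different entry point, a missing segment, or odd alignment in the lld-linked file. It would also confirm that neither binary carries a PT_INTERP segment, as expected for a static link.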

Related

Why gcc -g doesn't work with multiple files

To debug my C code, I compile it with the -g flag and use lldb to see, for example, where my segfault is.
I use the -g flag so that lldb shows C source rather than assembly.
But now I have a project with multiple files, and lldb shows only assembly even though I'm using the -g flag; it's as if -g applies to only one file.
Example:
gcc -g example.c
lldb a.out
>run
I get C code here
gcc -g example1.c example2.c main.c
lldb a.out
>run
I get assembly code here
Can anyone tell me what I'm missing here,
and how I can get C code in lldb?
Thanks in advance.
When you just run the program you shouldn't be getting code at all.
You will be getting code if the program stops running. Then you need to look at the call stack to make sure you're actually in your own code.
If you're in library code then it will likely not have source available and you'll get assembler code. Go up the call-stack until you reach your own code.
GNU’s documentation for the ‘gcc -g’ option says
Produce debugging information in the operating system’s native format (stabs, COFF, XCOFF, or DWARF). GDB can work with this debugging information.
Notice that it makes no mention of either C or assembler.
I imagine that in your first example the error was in your C code; in your second example the error was in a library, such as stdio, for which the debugger doesn’t have C source.
A segmentation fault corresponds to an invalid address. This might mean that you passed invalid data to a library that was expecting a pointer, or you passed a pointer to a buffer but an incorrect length.
A typical error that might cause this is passing a value (v) to a library that is expecting a pointer (&v).
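A minimal sketch of that mistake, using sscanf purely as an illustration (the function names are hypothetical). The %d conversion expects an int *, so passing the int itself makes the library write through whatever value the int holds. With gcc -Wall, the -Wformat check typically flags the wrong form at compile time.

```c
#include <stdio.h>

/* Parse an integer from text. %d needs the ADDRESS of an int (int *).
   Returns 1 on success, 0 if no integer could be parsed. */
int parse_int(const char *text, int *out)
{
    /* 'out' is already a pointer, so it is passed as-is. */
    return sscanf(text, "%d", out) == 1;
}

/* Returns the parsed value, or -1 if parsing failed. */
int read_value(const char *text)
{
    int v = 0;
    /* Correct: pass &v. Writing sscanf(text, "%d", v) instead would make the
       library treat the value of v (here 0) as an address and write to it,
       so the crash would be reported inside library code, not in this file. */
    if (!parse_int(text, &v))
        return -1;
    return v;
}
```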

How to run arm64 baremetal hello world program on qemu?

Often a question leads me into another question.
While trying to debug an inline assembly code, I met with another basic problem.
To make long story short, I want to run arm64 baremetal hello world program on qemu.
#include <stdio.h>
int main()
{
printf("Hello World!\n");
}
I compile it like this:
aarch64-none-elf-gcc -g test.c
I get undefined reference errors for _exit, _sbrk, _write, _close, _lseek, _read, _fstat and _isatty. I learned earlier that the -specs=rdimon.specs option removes these errors.
So I ran
aarch64-none-elf-gcc -g test.c -specs=rdimon.specs
and it compiles ok with a.out file.
Now I run the bare-metal program in QEMU to debug it.
qemu-system-aarch64 -machine virt,gic-version=max,secure=true,virtualization=true \
    -cpu cortex-a72 -kernel a.out -m 2048M -nographic -s -S
and here is the gdb run result.
ckim#ckim-ubuntu:~/testdir/testinlinedebugprint$ aarch64-none-elf-gdb a.out
GNU gdb (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.1.90.20201028-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-none-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.linaro.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) set architecture aarch64
The target architecture is set to "aarch64".
(gdb) set serial baud 115200
(gdb) target remote :1234
Remote debugging using :1234
_start ()
at /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S:90
90 /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory.
(gdb) b main
Breakpoint 1 at 0x4002f8: file test.c, line 26.
(gdb)
(gdb) r
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) c
Continuing.
It never stops at the breakpoint; it just hangs.
What am I doing wrong? And how can I solve the /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory. problem?
Any help will be really appreciated. Thanks!
ADD:
I realized I had asked the same question (How to compile baremetal hello_world.c and run it on qemu-system-aarch64?) before (ah, my memory...). I need all the pieces like start.S, crt0.S, the linker script, and so on. I stupidly thought the bare-metal compiler would take care of this automatically, when actually I have to fill in the really low-level things myself. I've worked on bare-metal programs before, but only after someone else had already set up the initial environment (sometimes I even modified it many times!). In bare metal, you have to provide everything; there isn't anything you can take for granted, because it's "bare metal". I realized this basic thing too late...
When you build a program for "bare metal" that means that you need to configure your toolchain to produce a binary that works on the specific piece of bare metal that you try to run it on. For instance, the binary must:
put its code somewhere in the machine's memory map where there is either ROM or RAM
put its data where there is RAM
make sure that on startup the stack pointer is correctly initialized to point into RAM
if it wants to print output, include routines which access a suitable device on that machine. This is likely a serial port, and serial ports are often entirely different devices, located at different addresses, on different machines
If any of these things are wrong or don't match the actual machine you run on, the result is typically exactly what you see -- the program crashes without output.
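As a sketch of only the last requirement (device output), assuming QEMU's virt machine, which maps its first PL011 UART at 0x09000000 and starts RAM at 0x40000000. The register offsets (data register UARTDR at 0x00, flag register UARTFR at 0x18, "transmit FIFO full" flag in bit 5) come from the PL011 documentation. Code and data placement and stack-pointer setup still need a linker script and startup assembly, which are not shown. The UART base is passed in as a pointer so the routine can also be exercised off-target; the names are hypothetical.

```c
#include <stdint.h>

/* Assumption: address of UART0 on QEMU's "virt" machine. */
#define VIRT_UART0_BASE 0x09000000u

/* PL011 register offsets, as indexes into an array of 32-bit registers. */
enum { UARTDR = 0x00 / 4, UARTFR = 0x18 / 4 };
#define UARTFR_TXFF (1u << 5)   /* transmit FIFO full */

static void pl011_putc(volatile uint32_t *uart, char c)
{
    while (uart[UARTFR] & UARTFR_TXFF)
        ;                       /* wait for room in the transmit FIFO */
    uart[UARTDR] = (uint32_t)(unsigned char)c;
}

static void pl011_puts(volatile uint32_t *uart, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\n')
            pl011_putc(uart, '\r');   /* serial terminals usually expect CRLF */
        pl011_putc(uart, *s);
    }
}
```

On the target, main could then call pl011_puts((volatile uint32_t *)VIRT_UART0_BASE, "Hello World!\n") instead of printf, so no C library or semihosting support is needed for output.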
More specifically, rdimon.specs tells the compiler to build in C library functions which do some of this via the "semihosting" debugger ABI (which has support for "print string" and some other things). Your QEMU command line doesn't enable implementation of semihosting (you can turn it on with the -semihosting option), so that won't work at all. But there are probably other problems you're also hitting.

Analysing stack frame of C program in Linux

I'd like to ask whether there is any gcc option on Linux that allows debugging the stack frame of a given procedure in a program written in C.
I know I can compile my program with the -ggdb3 gcc parameter, which lets me find out what the symbols in this program are. But is there any way to find out how a procedure's arguments are passed (on the stack or in registers)?
I've got a program that overwrites the stack, causing a SEGV, and I'd like to analyse it from within the same program. First I'd like to find the problematic procedure, and then I plan to find the exact place of the error.
You have a few options. The one I prefer is to look at the actual generated code, as it tells me exactly what is being executed. You can get this by compiling with gcc or g++ using the -S option, which produces a file with a .s suffix instead of an object file.
For example, gcc -S helloworld.c creates a file called helloworld.s that contains the assembly code.
If you don't have the source, you can use tools like objdump (objdump -d) to disassemble the binary code.
You'll find lots of examples by searching for "gcc assembly output".
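A small function you could compile with gcc -O1 -S to see the argument-passing question answered directly (the name sum8 is hypothetical). With eight integer arguments, the x86-64 System V ABI passes the first six in registers (rdi, rsi, rdx, rcx, r8, r9, or their 32-bit halves for int) and the remaining two on the stack, whereas AArch64 passes all eight in registers (w0 to w7 for int). The generated .s file shows which case applies to your target.

```c
/* Compile with: gcc -O1 -S sum8.c   (writes sum8.s)
   On x86-64, g and h should appear as loads relative to %rsp;
   on AArch64 all eight arguments should arrive in w0..w7. */
long sum8(int a, int b, int c, int d, int e, int f, int g, int h)
{
    return (long)a + b + c + d + e + f + g + h;
}
```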

Core dumped after program exits

I have a quite intriguing issue with my program (compiled with gcc 4.6.4 on Ubuntu 12.04). When I build the executable dynamically, the program runs flawlessly. But when I build it statically (with the -static flag), it reports 'core dumped' after exiting (e.g. after 'return 0' in main). Unfortunately, the whole program is too big to post here. What are the possibilities?
In addition to the two possibilities in johnnycrash's answer:
A function marked __attribute__ ((destructor)) is called and dumps core.
The memory heap is corrupted (check with valgrind)
Some function registered with atexit(3) is crashing
Some library/function is linked "twice"
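A minimal sketch of the first and third items, i.e. code that only runs after main() returns, as part of exit(). The function names are hypothetical, and __attribute__((destructor)) is a GCC/Clang extension. If a static build crashes at exit, running it under gdb and typing bt after the SIGSEGV should show whether the crash is inside such a handler.

```c
#include <stdio.h>
#include <stdlib.h>

/* Runs during exit(), after main() has returned. */
static void on_exit_handler(void)
{
    fputs("atexit handler running\n", stderr);
}

/* Also runs during exit() (GCC/Clang extension). A crash in either of these
   shows up as "core dumped" after 'return 0' in main. */
__attribute__((destructor))
static void on_unload(void)
{
    fputs("destructor running\n", stderr);
}

/* Registers on_exit_handler; returns 0 on success, as atexit() does. */
int register_exit_hooks(void)
{
    return atexit(on_exit_handler);
}
```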
1) You have a thread still executing.
2) You are overwriting memory and you get lucky with the dynamic libraries.

How do I know which illegal address the program accesses when a segmentation fault happens

In addition, the program runs on an ARM device running Linux; I can print out stack info and register values in the SIGSEGV handler I installed.
The problem is that I can't add the -g option when compiling the source file, since the bug might not reproduce due to the performance downgrade.
Compiling with the -g option to gcc does not cause a "performance downgrade". All it does is cause debugging symbols to be included; it does not affect the optimisation or code generation.
If you install your SIGSEGV handler using the sa_sigaction member of the sigaction struct passed to sigaction(), then the si_addr member of the siginfo_t structure passed to your handler contains the faulting address.
I tend to use valgrind which indicates leaks and memory access faults.
This seems to work
http://tlug.up.ac.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
static void signal_segv(int signum, siginfo_t *info, void *ptr) {
    (void)signum; (void)ptr;  // unused
    // info->si_addr is the illegal address (when installed with SA_SIGINFO)
}
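A fuller sketch of installing such a handler through sa_sigaction, as described above. Since printf is not async-signal-safe, the address is formatted by hand and printed with write(). SA_RESETHAND restores the default action, so when the handler returns the faulting instruction runs again and the process terminates normally (usually with a core dump). The helper names are hypothetical. If the fault is a stack overflow, the handler would additionally need an alternate stack (sigaltstack), which is not shown.

```c
#define _XOPEN_SOURCE 700   /* sigaction(), siginfo_t, SA_RESETHAND */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x..." into buf without malloc or stdio, so it is usable in a
   signal handler. Returns the length written, or 0 if cap is too small. */
static size_t format_hex(char *buf, size_t cap, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;
    do {
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    if (cap < n + 3)                 /* "0x" + digits + NUL */
        return 0;
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

static void segv_handler(int signum, siginfo_t *info, void *ucontext)
{
    static const char prefix[] = "SIGSEGV: illegal address ";
    char hex[2 + 2 * sizeof(uintptr_t) + 1];
    size_t n = format_hex(hex, sizeof hex, (uintptr_t)info->si_addr);
    (void)signum;
    (void)ucontext;
    /* write() is async-signal-safe; printf() is not. */
    ssize_t r = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    if (r >= 0) r = write(STDERR_FILENO, hex, n);
    if (r >= 0) r = write(STDERR_FILENO, "\n", 1);
    (void)r;  /* nothing useful to do on failure inside a handler */
    /* On return, the faulting instruction re-executes; SA_RESETHAND has
       restored the default action, so the process then terminates. */
}

/* Install segv_handler with SA_SIGINFO so it receives siginfo_t.
   Returns 0 on success, -1 on error (like sigaction()). */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```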
If you are worried about using -g on the binary that you load on the device, you may be able to use gdbserver on the ARM device with a stripped version of the executable and run arm-gdb on your development machine with the unstripped version of the executable. The stripped version and the unstripped version need to match up to do this, so do this:
# You may add your own optimization flags
arm-gcc -g program.c -o program.debug
arm-strip --strip-debug program.debug -o program
# or
arm-strip --strip-unneeded program.debug -o program
You'll need to read the gdb and gdbserver documentation to figure out how to use them. It's not that difficult, but it isn't as polished as it could be. Mainly it's very easy to accidentally tell gdb to do something that it ends up thinking you meant to do locally, so it will switch out of remote debugging mode.
You may also want to use the backtrace() function, if available, which provides the call stack at the time of the crash. This can be used to dump the stack the way a high-level language does when a C program gets a segmentation fault, bus error, or other memory violation error.
backtrace() is available both on Linux and Mac OS X
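A minimal sketch using backtrace() and backtrace_symbols_fd() from &lt;execinfo.h&gt; (glibc and macOS; musl does not provide them by default). The wrapper name is hypothetical. backtrace_symbols_fd() writes directly to a file descriptor without calling malloc(), which suits crash handlers better than backtrace_symbols(). Function names usually only appear if the program is linked with -rdynamic.

```c
#include <execinfo.h>
#include <unistd.h>

/* Write the current call stack, one symbolized line per frame, to stderr.
   Returns the number of frames captured. Calling it once at startup makes
   sure the unwinder library is already loaded before a crash happens. */
int print_backtrace(void)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```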
If the -g option makes the error disappear, then knowing where it crashes is unlikely to be useful anyway. It's probably writing to an uninitialized pointer in function A, and then function B tries to legitimately use that memory, and dies. Memory errors are a pain.

Resources