How to debug a crash before main? - c

My program links statically to many libraries and crashes before getting to main in GDB. How do I diagnose what the problem is?

It's a good bet that LD_DEBUG can help you here. Try this: LD_DEBUG=all ./a.out. This will allow you to easily identify the library which is being loaded when your program crashes.
(Edit: if it wasn't clear, a.out is meant to refer to a generic binary file -- in this case, replace it with the name of your executable).
Edit 2:
To clarify, LD_DEBUG is an environment variable which is examined by the dynamic linker when a program begins execution. If LD_DEBUG is set to some value, the dynamic linker will output a lot of information about the dynamic libraries being loaded during program execution, symbol binding, and so on.
For starters, execute the following on your machine:
LD_DEBUG=help ls
You will see the valid options for LD_DEBUG on your system listed. The most verbose setting is all, which will display all available information.
Using it with your program is as simple as the ls example: just replace ls with the name of your program. You don't need gdb to use LD_DEBUG, since this functionality is provided entirely by the dynamic linker, not by gdb.

It may crash because some component throws an exception and nobody catches it since main() hasn't been entered yet. Set a breakpoint on throwing an exception:
catch throw
run
(If catch throw doesn't work the first time you start the program, run it once to let the dynamic libraries load, then do catch throw and run again.)

This post has the answer: you have to set a breakpoint before main, in the crt0 startup code:
Using GDB without debugging symbols on x86?

starti
starti breaks at the very first instruction executed, see also: Stopping at the first machine code instruction in GDB
An alternative if your GDB is not new enough:
break _start
if you know that the name of the entry point function is _start, or:
info files
and look for the Entry point: line in the output, for example:
Entry point: 0x400440
and run:
break *0x400440
TODO: find out how to compile crt* objects with debug symbols and step into them: How to compile my own glibc C standard library from source and use it?

Start taking the libraries out one by one until it stops crashing.
Then examine the culprit.

I haven't run into this in C, but if you link to a C++ library, static initialization can crash. You can reproduce it easily by putting an assert in the constructor of an object with static storage duration (a global or namespace-scope variable).

If you can, link your program dynamically instead of statically and follow denniston.t's answer. The debug trace from the dynamic linker may help you pin down the problem.

Related

Why is _dl_fixup called before dynamic linker start?

I'm trying to understand how the glibc dynamic linker works. I know that _dl_fixup is called in _dl_runtime_resolve, and resolves relocations. So I thought it's called only after linker starts and has loaded some libraries. But when I add some print statements to it, I find the function is called even before _dl_start. It's confusing: why is it called? What work does it do?
I added some prints: the function is working on symbols like strncpy, fopen, fread64 and so on, but the object name (l->l_name) seems to be null.
I used gdb to debug the linker, and I think gdb itself used _dl_fixup to complete some tasks. Without gdb, _dl_fixup is called only after _dl_start.
So I thought it's called only after linker starts and has loaded some libraries
That is correct.
I find the function is called even before _dl_start
This is not correct: _dl_fixup is called only after _dl_start.
Unfortunately you didn't provide any details on how you came to the incorrect conclusion, so it's impossible to tell you where you went wrong, but you did make (at least one) mistake.

ELF binary from memory

I just wrote a Hello world program in C that I was playing around with. I'd like to try to dump the binary from memory (using gdb) and create another executable from it. I tried dumping the page with executable permissions followed by its data page; however, it segfaults. Are there any approaches to doing this? Is there any way I can debug it and find out why it crashes? Any generic suggestions at all?
Thanks.
[EDIT]
It's on Linux, and I've tried it on both 32-bit and 64-bit x86. The kernel version is 3.13. I set a breakpoint on _start, dumped the executable page followed by its data page to a file, and tried executing it.
Wait, are you just dumping the mapped text (executable page) section followed by the mapped data section to a file? That by itself wouldn't be a valid ELF object; an ELF file needs an ELF header as well. I am surprised the OS even let you attempt to execute that; you should have gotten an error about an invalid ELF header or something like that.
In addition to the header, an ELF file contains many more sections that are important to be able to run it.
As for debugging, I'd start with GDB to see where it crashes. Does your program crash, or does the dynamic linker crash when trying to load your program? If the dynamic linker crashes, try debugging that, e.g. with
gdb --args /lib64/ld-2.18.so <your program>
Attempts to re-create ELF files from memory have been done before - have a look at Statifier, which even statically includes all loaded dynamic libraries into the resulting ELF.
It might not be very simple, and it is certainly processor- and operating-system-specific.
You could look at the Emacs source file unexec.c, which does what you want. See this answer.

What function actually calls WinMain

How is WinMain() actually called? I remember a function used by pro-hackers that started with (something) that looked like __startupWinMain().
The problem is, I have a Win32 EXE (compiled with /SUBSYSTEM:WINDOWS) that gets its arguments from the command line. If the command line is incorrect, the process should print a help message to the console.
How can I manually deallocate (or FreeConsole()) from an EXE built with the /SUBSYSTEM:WINDOWS linker option?
As the very first act of your program, check the parameters. If they are fine, continue as normal.
Otherwise call AttachConsole passing ATTACH_PARENT_PROCESS. If that succeeds, then you can print your error to stdout and quit. If it doesn't, then you'll have to show the error in a message box.
Perhaps you should consider having the program pop up a message box when the command line is incorrect. Something like this:
MessageBox( NULL, "(description of command line error)",
"MyProg - Command Line Error",
MB_OK|MB_ICONEXCLAMATION );
This will open a message box in the center of the display and wait for the user to acknowledge it before actually terminating your program.
On the other hand, you could build your program as a console app and use printf() to write to the console. A console program may still create windows, but the console itself will hang around unless you figure out how to detach from it (and then, of course, you will no longer be able to use printf().)
How does the compiler know to invoke wWinMain instead of the standard main function? What actually happens is that the Microsoft C runtime library (CRT) provides an implementation of main that calls either WinMain or wWinMain.
Note: The CRT does some additional work inside main. For example, any static initializers are called before wWinMain. Although you can tell the linker to use a different entry-point function, use the default if you link to the CRT. Otherwise, the CRT initialization code will be skipped, with unpredictable results. (For example, global objects will not be initialized correctly.)
How is WinMain() actually called?
If you single-step to the first line of your program in a debugger, and then look at the stack, you can see how WinMain gets called. The actual start function for a typical build is a function pulled in from the run-time library. For me, it's _WinMainCRTStartup, but I suppose it might vary depending on the version of the compiler, linker, and library you build with. The startup function from the run-time library does some initialization and then calls WinMain.
Using dumpbin /headers (or another program that can inspect a PE binary), you can confirm which function is the "entry point" to your executable. Unless you've done something to change it, you'll probably see _WinMainCRTStartup, which is consistent with what the stack trace tells us.
That should answer your question, but it doesn't solve your problem. It looks like some others have posted good solutions.

Implementing traceback on i386

I am currently porting our code from an alpha (Tru64) to an i386 processor (Linux) in C.
Everything has gone pretty smoothly up until I looked into porting our exception handling routine. Currently we have a parent process which spawns lots of subprocesses, and when one of these subprocesses hits a fatal, unhandled error, I have routines to catch it.
I am currently struggling to find the best method of implementing a traceback routine which can list the function addresses in the error log; currently my routine just prints the signal which caused the exception and the exception qualifier code.
Any help would be greatly appreciated. Ideally I would write error handling for all processors; however, at this stage I only really care about i386 and x86_64.
Thanks
Mark
The glibc functions backtrace() and backtrace_symbols(), from execinfo.h, might be of use.
You might look at http://tlug.up.ac.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV. It covers the functionality you need. However you must link against libgdb and libdl, compile with -rdynamic (includes more symbols in the executable), and forgo the use of some optimizations.
There are two GNU (non-POSIX) functions that can help you: backtrace() and backtrace_symbols(). The first returns an array of function addresses, and the second resolves those addresses to names. Unfortunately, the names of static functions cannot be resolved.
To get it working, you need to compile your binary with the -rdynamic flag.
Unfortunately, there isn't a "best" method since the layout of the stack can vary depending on the CPU, the OS and the compiler used to compile your code. But this article may help.
Note that you must implement this in the child process; the parent process just gets a signal that something is wrong; you don't get a copy of the child stack.
In a comment, you state that you are using gcc. The GCC manual's page on return addresses, http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Return-Address.html#Return-Address, could be useful.
If you're fine with only getting proper backtraces when running through valgrind, then this might be an option for you:
VALGRIND_PRINTF_BACKTRACE(format, ...):
It will give you the backtrace for all functions, including static ones.

__libc_lock_lock is segfaulting

I am working on a piece of code which uses regular expressions in C.
All of the regex code uses the standard C regex library.
On line 246 of regexec.c, the line is
__libc_lock_lock(dfa->lock);
My program is segfaulting here and I cannot figure out why. I was trying to find where __libc_lock_lock was defined, and it turns out it is a macro in bits/libc-lock.h. However, the macro isn't actually defined to be anything; it's just defined.
Two questions:
1) Where is the code that runs when __libc_lock_lock is called? (I know it must be replaced with something, but I don't know where that would be.)
2) If dfa is a re_dfa_t object which is cast from a C string (the buffer member of the regex_t object), it will not have any member lock. Is this what is supposed to happen?
It really seems like there is some kind of magic going on here with this __libc_lock_lock.
If the segfault is in libc then you can be 99.9% sure of the following:
You are doing something wrong with the API
You have at some previous point clobbered or corrupted memory used by libc, and this is a delayed effect. (Thanks Tyler!)
You are doing something that is pushing the API's capability
You are a developer testing the current trunk with new changes in the API implementation
I suspect that the first is the cause. Posting your API usage and your library version might help. The Regexp API in libc is pretty stable.
Look up how to use gdb to get a stack trace of the execution path leading to the segfault, and install the glibc-devel packages for the symbols. If the segfault is in (or just outside) libc, then you have done something bad (not initialized an opaque pointer, for example).
[aiden@devbox ~]$ gdb ./myProgram
(gdb) r
... Loads of stuff, segfault info ..
(gdb) bt
bt will print the stack and the function names that led to the segfault. Compile your source with the -g flag to keep important debugging information.
Get an authoritative source for API usage/examples!
Good Luck
In answer to your first question:
The macro is defined in libc-lock.h; its relative path is sysdeps/mach/bits in the glibc release I use (2.2.5). Lines 67/68 from that file are:
/* Lock the named lock variable. */
#define __libc_lock_lock(NAME) __mutex_lock (&(NAME))
Run your code in gdb until you get to the segfault. Then do a backtrace to find out where it was.
Here is the set of commands you will type to do this:
gdb myprogram
run
***Make it crash***
backtrace
Typing backtrace will print the call stack and will show you what path the code has taken to get to the point where it is segfaulting.
You can go up and down in the stack to your code by typing 'up' or 'down' respectively. Then you can examine variables in that scope.
So for instance, if your backtrace command prints this:
linux_black_magic
more_linux
libc
libc
yourcode.c
Type 'up' a few times so that the stack frame is in your code instead of Linux's. You can then examine the variables and memory that your program is operating on. Do this:
print VariableName
x/10x &VariableName
That will print the value of the variable and then will print a hex dump of memory starting at the variable.
Those are some general techniques to use with gdb and debugging, post more details for more detailed answers.

Resources