How to see ruby c extension backtrace info

Here is my output. I don't understand the hex values "0xe8" and "0x7f8c783ac74d":
/home/roroco/Dropbox/rbs/ro_article/c/ro_helper_article.so(get_article_n2+0xe8) [0x7f8c783ac74d]
here is full output

It looks like you've caused (or, rather, an extension caused) ruby to segfault. This normally means that you've attempted to access memory outside of your designated bounds; basically, your program did something really, really weird. The line you specifically picked out comes from a C library: the .so extension means "shared object," and the file is loaded into the ruby process at runtime. The information it's providing tells you where the error originated. However, most production libraries do not contain debug information such as file names and line numbers. Instead, they contain a list of symbols. In your case, it's telling you exactly where, in the shared object, the error originated: 0xe8 bytes after the start of the get_article_n2 symbol, which is the address 0x7f8c783ac74d in the running process.
So now you have a few options.
You can poke around blindly in your source code (I'm assuming you wrote the library that is in error here, since it seems that's what you're testing) and try to guess where the segfault originated. You already know that it's in the function get_article_n2, since the faulting address is reported relative to that symbol.
You can disassemble the shared object to see the specific instruction that caused the error, and then attempt to map it to the source.
You can enable debugging, and have your build system output file names and line numbers so you know what you're looking at. (Disclaimer: I'm not sure this will work. It doesn't look like you're emitting debug information, and even if you were, I'm not sure the backtrace printer would use it. However, this seems the easiest course of action.)
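To make the symbol+offset notation concrete, here is a minimal sketch in C that splits a frame line of the form module(symbol+0xOFFSET) [0xADDRESS] into its parts and derives where the symbol starts in memory. The names parse_frame, symbol_start and struct frame are made up for this illustration; they are not part of ruby or glibc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed frame of the form: module(symbol+0xOFFSET) [0xADDRESS] */
struct frame {
    char symbol[128];            /* name of the nearest preceding symbol */
    unsigned long long offset;   /* bytes past the start of that symbol */
    unsigned long long address;  /* absolute address in the running process */
};

/* Returns 1 on success, 0 if the line does not have the expected shape. */
static int parse_frame(const char *line, struct frame *out)
{
    assert(line != NULL && out != NULL);
    /* The module path may itself contain '(', so start at the last one. */
    const char *open = strrchr(line, '(');
    if (open == NULL)
        return 0;
    /* %127[^+] reads the symbol name up to '+'; %llx accepts a 0x prefix. */
    return sscanf(open, "(%127[^+]+%llx) [%llx]",
                  out->symbol, &out->offset, &out->address) == 3;
}

/* Address at which the symbol itself begins: frame address minus offset. */
static unsigned long long symbol_start(const struct frame *f)
{
    return f->address - f->offset;
}
```

For the line in the question, this gives the symbol get_article_n2, the offset 0xe8, and a symbol start of 0x7f8c783ac665 (0x7f8c783ac74d minus 0xe8). The offset is the part that stays meaningful across runs, because the absolute address changes whenever the library is loaded somewhere else.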

How to execute a debugger command from within the app

At runtime I'm trying to recover the address of a function that is not exported but is available through the shared library's symbol table and is therefore visible to the debugger.
I'm working on an advanced debugging procedure that needs to capture certain events and manipulate the runtime. One of the actions requires knowing the address of a private function (just the address), which is used as a key elsewhere.
My current solution calculates the offset of that private function relative to a known exported function at build time using nm. This solution restricts debugging capabilities since it depends on a particular build of the shared library.
The preferable solution should be capable of recovering the address at runtime.
I was hoping to communicate with the attached debugger from within the app, but I'm struggling to find any API for that.
What are my options?

"At runtime I'm trying to recover the address of a function that is not exported but is available through the shared library's symbol table and is therefore visible to the debugger."
The debugger is not a magical unicorn. If the symbol table is available to the debugger, it is also available to your application.
"I need to recover its address by name using the debugger ..."
That is entirely the wrong approach.
Instead of using the debugger, read the symbol table for the library in your application, and use the information gained to call the target function.
Reading the ELF symbol table is pretty easy. Example. If you are not on an ELF platform, getting the equivalent information should not be much harder.
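As a rough sketch of what reading the symbol table yourself could look like on 64-bit Linux, the function below maps an ELF file and looks a name up in its .symtab section. The function name elf_symbol_value is invented for this example, the code trusts the file to be well formed (there are almost no bounds checks), and it only works if .symtab has not been stripped.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the st_value of `name` from the .symtab of the 64-bit ELF file
 * at `path`, or 0 if the file or the symbol cannot be found. */
static uint64_t elf_symbol_value(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);
    uint64_t result = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return 0;
    }
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return 0;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shoff != 0) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        for (unsigned i = 0; i < eh->e_shnum && result == 0; i++) {
            if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
                continue;
            /* sh_link of a symbol table names its string table section. */
            const Elf64_Sym *syms = (const Elf64_Sym *)(base + sh[i].sh_offset);
            const char *strtab = (const char *)(base + sh[sh[i].sh_link].sh_offset);
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < count; j++) {
                if (syms[j].st_value != 0 &&
                    strcmp(strtab + syms[j].st_name, name) == 0) {
                    result = syms[j].st_value;
                    break;
                }
            }
        }
    }
    munmap(base, (size_t)st.st_size);
    return result;
}
```

For a shared library or a position-independent executable, the value returned is relative to the load base, so the runtime address is the load base plus this value. The load base can be obtained with dladdr on any exported symbol of the same library, or with dl_iterate_phdr.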

In lldb you can quickly find the address by setting a symbolic breakpoint, if the symbol is known to the debugger by whatever means:
b symbolname
If you want to call a non-exported function from a library without a debugger attached, there are a couple of options, but neither will be reliable in the long run:
Hardcode the offset from an exported symbol and call exportedSymbol+offset (this will work for a particular library binary version but will likely break for anything else)
Attempt to search for a binary signature of your non-exported function in the loaded library (slightly less prone to break, but the binary signature might still change)
Perhaps if you provide more detailed context on what you are trying to achieve, better options can be considered.
Update:
Since lldb is somehow aware of the symbol, I suspect it's defined in the Mach-O LC_SYMTAB load command of your library. To verify that, you could inspect your library binary with tools like MachOView or MachOExplorer, or with Apple's otool or Jonathan Levin's jtool/jtool2 in the console.
Here's an example from the very first symbol entry yielded from LC_SYMTAB in MachOView. This is the /usr/lib/dyld binary.
In the example here, 0x1000 is the virtual address. Your library will most likely be 64-bit, so expect 0x100000000 and above. The actual base gets randomized by ASLR, but you can verify the current value with
sample yourProcess
yourProcess being an executable using the library you're after.
The output should contain:
Binary Images:
0x10566a000 - 0x105dc0fff com.apple.finder (10.14.5 - 1143.5.1) <3B0424E1-647C-3279-8F90-4D374AA4AC0D> /System/Library/CoreServices/Finder.app/Contents/MacOS/Finder
0x1080cb000 - 0x1081356ef dyld (655.1.1) <D3E77331-ACE5-349D-A7CC-433D626D4A5B> /usr/lib/dyld
...
These are the loaded addresses: 0x100000000 shifted by ASLR. There might be more nuances to how exactly those addresses are chosen for dylibs, but you get the idea.
Tbh I've never needed to find such address programmatically but it's definitely doable (as /usr/bin/sample is able to do it).
From here, to achieve something practical:
Parse the Mach-O header of your library binary (check this & this for starters)
Find the LC_SYMTAB load command
Find your symbol's text-based entry and read its virtual address (the red box stuff)
Calculate the ASLR slide and apply the shift
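The last step is plain arithmetic, which the following C sketch spells out. The helper names aslr_slide and runtime_address are invented for this example, and it assumes the image is loaded at or above its preferred base address.

```c
#include <assert.h>
#include <stdint.h>

/* Slide = where the image actually got loaded minus where it was linked
 * to load. Assumes the loaded base is not below the preferred base. */
static uint64_t aslr_slide(uint64_t loaded_base, uint64_t preferred_base)
{
    assert(loaded_base >= preferred_base);
    return loaded_base - preferred_base;
}

/* Runtime address of a symbol = its virtual address from the symbol
 * table plus the slide of the image that contains it. */
static uint64_t runtime_address(uint64_t symbol_vmaddr, uint64_t slide)
{
    return symbol_vmaddr + slide;
}
```

Using the Finder line from the sample output, a preferred base of 0x100000000 and a loaded base of 0x10566a000 give a slide of 0x566a000. A hypothetical symbol whose table entry says 0x100001000 would then be found at 0x10566b000 in that process.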
There is some C Apple API for parsing Mach-O. Also some Python code exists in the wild (being popular among reverse engineering folks).
Hope that helps.

Issue preventing GCC from optimizing out global variable

I am using ARM-GCC v4.9 (released 2015-06-23) for a STM32F105RC processor.
I've searched stackoverflow.com and I've found this in order to try to convince gcc not to optimize out a global variable, as you may see below:
static const char AppVersion[] __attribute__((used)) = "v3.05/10.oct.2015";
Yet, to my real surprise, the compiler optimized away the AppVersion variable!
BTW: I am using the optimize level -O0 (default).
I also tried using volatile keyword (as suggested on other thread), but it didn't work either :(
I already tried (void)AppVersion; but it doesn't work...
Smart compiler!? Too smart I suppose...
In the meantime, I use a printf(AppVersion); some place in my code, just to be able to keep the version... But this is a boorish solution :(
So, the question is: Is there any other trick that does the job, i.e. keep the version from being optimized away by GCC?
[EDIT]:
I also tried like this (i.e. without static):
const char AppVersion[] __attribute__((used)) = "v3.05/10.oct.2015";
... and it didn't work either :(

Unfortunately I am not aware of a pragma to do this.
There is however another solution. Change AppVersion to:
static char * AppVersion = "v3.05/10.oct.2015";
and add:
__asm__ ("" : : "" (AppVersion));
to your main function.
You'll see I dropped the 'used' attribute. (GCC documents it as both a function and a variable attribute, so it is valid on a variable, but it did not help in your case.)
Other solutions: Does gcc have any options to add version info in ELF binary file?
Though I found this one to be the easiest. This basically won't let the compiler and linker remove AppVersion since we told it that this piece of inline assembly uses it, even though we don't actually insert any inline assembly.
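A variation on the same idea, as a sketch rather than the code from the linked answer: it keeps the array form of AppVersion from the question and passes its address to an empty asm statement with an "r" (register) input constraint. The helper name keep_app_version is made up for this example; it would have to be called from code that is itself kept, such as main.

```c
#include <assert.h>

static const char AppVersion[] = "v3.05/10.oct.2015";

/* Compile-time sanity check that the version string is not empty. */
static_assert(sizeof AppVersion > 1, "version string must not be empty");

/* Call this once from main(). The asm statement emits no instructions,
 * but the compiler must treat AppVersion as an input to it, so the array
 * counts as referenced and has to be emitted. */
static void keep_app_version(void)
{
    __asm__ volatile ("" : : "r" (AppVersion));
}
```

Because the reference comes from a function that is reachable from main, a linker that discards unreferenced sections should also see the string as in use.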
Hopefully that will be satisfactory to you.
Author: Andre Simoes Dias Vieira
Original link: https://answers.launchpad.net/gcc-arm-embedded/+question/280104

Given the presence of "static", all your declaration does is ask the compiler to include the bytes representing characters of the string "v3.05/10.oct.2015" in some order at some arbitrary location within the file, but not bother to tell anyone where it put them. Given that the compiler could legitimately write that sequence of bytes somewhere in the code image file whether or not it appeared anywhere in the code, such a declaration really isn't very useful. To be sure, it would be unlikely that such a sequence would appear in the code entirely by chance, and so scanning the binary image for it might be a somewhat reliable way to determine that it appeared in the code, but in general it's much better to have some means of affirmatively determining where the string may be found.
If the string isn't declared static, then the compiler is required to tell the linker where it is. Since the linker generally outputs the names and addresses of all symbols in a variety of places including symbol tables, debug-information files, etc. which may be used in a variety of ways that the linker knows nothing about, it may be able to tell that a symbol isn't used within the code, but can generally have no clue about whether some other utility may be expecting to find it in the symbol table and make use of it. A directive saying the symbol is "used" will tell the linker that even though it doesn't know of anything that's interested in that symbol, something out in the larger universe the linker knows nothing about is interested in it.
It's typical for each compilation unit to give a blob of information to the linker and say "Here's some stuff; I need a symbol for the start of it, but I can compute all the addresses of all the internals from that". The linker has no way of knowing which parts of such a blob are actually used, so it has no choice but to accept the whole thing verbatim. If the compiler were to include unused static declarations in its blob, they'd make it through to the output file. On the other hand, the compiler knows that if it doesn't export a symbol for something within that blob, nobody else downstream would be able to find it whether or not the object was included; thus, there would typically be little benefit to being able to include such a blob, and compiler writers generally have no reason to provide a feature to force such inclusion.

It seems that using a custom section also works.
Instead of
__attribute__((used))
try with
__attribute__((section(".your.section.name.here")))
The linker won't touch it, nor will the strip command.

hidden routines linked in c program

Hello,
When you disassemble a win32 exe compiled by a C compiler, you can see that some compilers link some 'hidden' routines into it, I think even if the C program is an empty one whose own code is 5 bytes or so. I understand that those 5 bytes are wrapped in the PE .exe format, but why add extra routines? They don't seem necessary to me and even annoy me somewhat. What are they? Can they be omitted? As I understand it, a C program (leaving aside C++ for now, which I know has some initialization routines) should not need such additional hidden functions.
Thanks for any answer, and maybe a link to more detailed information, because this topic interests me a lot.
//edit
OK, here is some disassembly I made a while back. Digital Mars and the old Borland command-line compiler (which I have also tested) both generate much more code (and I'm especially interested in bcc32), but they do not include readable names/symbols in the disassembly, so I will not post them here. These are somewhat readable, but I am not experienced enough to understand what they show ;-)
https://dl.dropbox.com/u/42887985/prog_devcpp.htm
https://dl.dropbox.com/u/42887985/prog_lcc.htm
https://dl.dropbox.com/u/42887985/prog_mingw.htm
https://dl.dropbox.com/u/42887985/prog_pelles.htm
Could someone add some explanatory comments on what is in here? (I am afraid there may be some C++ stuff in here. I am interested in the pure C additions, not C++, but I am too tired right now to verify that it was compiled in C mode. The extension of the compiled empty-main program was .c, so I assumed it would be compiled as C, not C++.)
Thanks for any longer explanation of what it is.

Since your win32 exe file is a dynamically linked executable, it will contain the data needed by the dynamic linker to do its job, such as the names of libraries to link to and the symbols that need resolving.
Even a program with an empty main() will link with the C runtime and kernel32.dll libraries (and probably others; it's a while since I last did Win32 dev).
You should also be aware that main() is only the entry point of your own code. Quite a bit has already gone on before this point, such as retrieving and tokenizing the command line, setting up the locale, creating stderr, stdin, and stdout, and setting up the other mechanisms required by the C runtime library, such as atexit(). Similarly, when your main() returns, the runtime does some clean-up, and at the very least needs to call the kernel to tell it that you're done.
As to whether it's necessary? Yes, unless you fancy writing your own program prologue and epilogue each time. There probably are ways of writing minimal, statically linked applications if you're sufficiently masochistic.
As for storage overhead, why are you getting so worked up? It's not enough to worry about.
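One way to see that startup code really does run before main() is to hook into it. The sketch below relies on the constructor attribute, which is a GCC/Clang extension rather than standard C, and on the standard atexit() for the clean-up side. The names startup_ran, cleanup_registered, before_main and after_main are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

static int startup_ran = 0;         /* set by the runtime's startup phase */
static int cleanup_registered = 0;  /* set if atexit() accepted the handler */

/* Called by the C runtime after main() returns or exit() is called. */
static void after_main(void)
{
}

/* GCC/Clang extension: the runtime's startup code calls this function
 * before it transfers control to main(). */
__attribute__((constructor))
static void before_main(void)
{
    startup_ran = 1;
    cleanup_registered = (atexit(after_main) == 0);
    assert(cleanup_registered);
}
```

By the time the first statement of main() executes, startup_ran is already 1, which is only possible because some of those 'hidden' routines ran first and called before_main.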

There are several initialization functions that load whenever you run a program on Windows. These functions, among other things, call the main() function that you write - which is why you need either a main() or WinMain() function for your program to run. I'm not aware of other included functions though. Do you have some disassembly to show?

You haven't given much detail to go on, but I think most of what you're seeing is probably the routines of the specific C runtime library that your compiler works with.
For instance, there will be startup code at the entry point recorded in the Portable Executable header, which sets things up and then calls the main(int argc, char **argv) that you wrote in your C program.

__libc_lock_lock is segfaulting

I am working on a piece of code which uses regular expressions in C.
All of the regex stuff uses the standard C regex library.
Line 246 of regexec.c is
__libc_lock_lock(dfa->lock);
My program is segfaulting here and I cannot figure out why. I tried to find where __libc_lock_lock is defined, and it turns out it is a macro in bits/libc-lock.h. However, the macro isn't actually defined to be anything, just defined.
Two questions:
1) Where is the code that is run when __libc_lock_lock is called? (I know it must be replaced with something, but I don't know where that would be.)
2) If dfa is a re_dfa_t object which is cast from a C string, namely the buffer member of the regex_t object, it will not have any member named lock. Is this what is supposed to happen?
It really seems like there is some kind of magic going on here with this __libc_lock_lock.

If the segfault is in libc then you can be 99.9% sure of one of the following:
You are doing something wrong with the API
You have at some previous point clobbered or corrupted memory used by libc, and this is a delayed effect. (Thanks Tyler!)
You are doing something that is pushing the API's capability
You are a developer testing the current trunk with new changes in the API implementation
I suspect that the first is the cause. Posting your API usage and your library version might help. The Regexp API in libc is pretty stable.
Look up debugging with gdb to get a stack trace of the execution path leading to the segfault, and install the glibc debug-info package for your distribution to get the symbols. If the segfault is inside libc, then you have most likely done something bad beforehand (not initialized an opaque pointer, for example).
[aiden@devbox ~]$ gdb ./myProgram
(gdb) r
... Loads of stuff, segfault info ..
(gdb) bt
This will print the stack and function names that led to the segfault. Compile your source with the '-g' debug flag to keep important debugging information.
Get an authoritative source for API usage/examples!
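For reference, here is a minimal sketch of the usual POSIX regex call sequence, where the regex_t is only ever filled in by regcomp and is released with regfree. The wrapper name regex_matches is invented for this example, and this is not a reconstruction of the asker's code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 on match, 0 on no match, -1 if the pattern does not compile. */
static int regex_matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;  /* opaque to the caller: only regcomp() may initialize it */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;  /* re is not valid here, so skip regexec and regfree */

    /* REG_NOSUB was requested, so no match array is passed. */
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   /* release what regcomp() allocated */
    return rc == 0 ? 1 : 0;
}
```

Calling regexec on a regex_t that regcomp never filled in, or that has already been passed to regfree, is the kind of misuse that tends to crash deep inside libc rather than in your own code.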
Good Luck

In answer to your first question:
The macro is defined in libc-lock.h; its relative path is sysdeps/mach/bits on the glibc release I use (2.2.5). Lines 67/68 from that file are
/* Lock the named lock variable. */
#define __libc_lock_lock(NAME) __mutex_lock (&(NAME))

Run your code in gdb until you get to the segfault. Then do a backtrace to find out where it was.
Here is the set of commands you will type to do this:
gdb myprogram
run
***Make it crash***
backtrace
Typing backtrace will print the call stack and will show you what path the code has taken to get to the point where it is segfaulting.
You can go up and down in the stack to your code by typing 'up' or 'down' respectively. Then you can examine variables in that scope.
So for instance, if your backtrace command prints this:
linux_black_magic
more_linux
libc
libc
yourcode.c
Type 'up' a few times so that the selected stack frame is in your code instead of the library's. You can then examine variables and memory that your program is operating on. Do this:
print VariableName
x/10 &VariableName
That will print the value of the variable and then print a hex dump of memory starting at the variable.
Those are some general techniques to use with gdb and debugging, post more details for more detailed answers.

Optimized code on Unix?

What is the best and easiest method to debug optimized code on Unix which is written in C?
Sometimes we also don't have the code for building an unoptimized library.

This is a very good question. I had similar difficulties in the past where I had to integrate 3rd party tools inside my application. From my experience, you need to have at least meaningful callstacks in the associated symbol files. This is merely a list of addresses and associated function names. These are usually stripped away and from the binary alone you won't get them... If you have these symbol files you can load them while starting gdb or afterward by adding them. If not, you are stuck at the assembly level...
One weird behavior: even if you have the source code, execution will jump back and forth in places where you would not expect it (statements may be re-ordered for better performance), variables may not exist anymore (optimized away!), and setting breakpoints in inlined functions is pointless (they no longer exist on their own but are part of the place where they were inlined). So even with source code, watch out for these pitfalls.
I forgot to mention, the symbol files usually have the extension .gdb, but it can be different...

This question is not unlike "what is the best way to fix a passenger car?"
The best way to debug optimized code on UNIX depends on exactly which UNIX you have, what tools you have available, and what kind of problem you are trying to debug.
Debugging a crash in malloc is very different from debugging an unresolved symbol at runtime.
For general debugging techniques, I recommend this book.
Several things will make it easier to debug at the "assembly level":
You should know the calling convention for your platform, so you can tell what values are being passed in and returned, where to find the this pointer, which registers are "caller saved" and which are "callee saved", etc.
You should know your OS "calling convention" -- what a system call looks like, which register a syscall number goes into, the first parameter, etc.
You should "master" the debugger: know how to find threads, how to stop individual threads, how to set a conditional breakpoint on an individual instruction, single-step, step into or skip over function calls, etc.
It often helps to debug a working program and a broken program "in parallel". If version 1.1 works and version 1.2 doesn't, where do they diverge with respect to a particular API? Start both programs under debugger, set breakpoints on the same set of functions, run both programs and observe differences in which breakpoints are hit, and what parameters are passed.

Write small code samples that implement the same interfaces (as declared in the library's header), and call your samples instead of the optimized code, as a simulation, to narrow down the scope of the code you have to debug. Furthermore, you can do error injection in your samples.
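A minimal sketch of such a stand-in, assuming a hypothetical library function lib_compute whose real implementation is only available in optimized form. The function name, its signature and the inject_error switch are all invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface, as it might be declared in the library header. */
int lib_compute(int input, int *result);

/* Test switch: when nonzero, the stub reports failure instead of working. */
static int inject_error = 0;

/* Stand-in for the optimized implementation. It has the same signature,
 * but simple and fully debuggable behavior. */
int lib_compute(int input, int *result)
{
    assert(result != NULL);
    if (inject_error)
        return -1;          /* simulate the library failing */
    *result = input * 2;    /* trivially predictable "computation" */
    return 0;
}
```

Linking the application against this stub instead of the real library shows whether a bug is in your own code or in the way the optimized library is being used, and flipping inject_error exercises your error-handling paths on demand.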
