__libc_lock_lock is segfaulting - c

I am working on a piece of code which uses regular expressions in c.
All of the regex stuff is using the standard regex c library.
On line 246 of regexec.c, the line is
__libc_lock_lock(dfa->lock);
My program is segfaulting here and I cannot figure out why. I was trying to find where __libc_lock_lock was defined and it turns out it is a macro in bits/libc-lock.h. However, the macro isn't actually defined to be anything, just defined.
Two questions:
1) Where is the code that is run when __libc_lock_lock is called (I know it must be replaced with something but I don't know where that would be)?
2) If dfa is a re_dfa_t object which is cast from a C string which is the buffer member of the regex_t object type, it will not have any member lock. Is this what is supposed to happen?
It really seems like there is some kind of magic going on here with this __libc_lock_lock.

If the segfault is in libc then you can be 99.9% sure of the following:
You are doing something wrong with the API
You have at some previous point clobbered or corrupted memory used by libc, and this is a delayed effect. (Thanks Tyler!)
You are doing something that is pushing the API's capability
You are a developer testing the current trunk with new changes in the API implementation
I suspect that the first is the cause. Posting your API usage and your library version might help. The Regexp API in libc is pretty stable.
Look up debugging with gdb to find a stack trace of the execution path leading to the segfault, and install the glibc-devel packages for the symbols. If the segfault is in (or out) of libc ... then you have done something bad (not initialized an opaque pointer for example)
[aiden#devbox ~]$ gdb ./myProgram
(gdb) r
... Loads of stuff, segfault info ..
(gdb) bt
Will print the stack and function names that led to the segfault. Compile your source with the '-g' debug flag to keep important debugging information.
Get an authoritative source for API usage/examples!
Good Luck

In answer to your first question:
The macro is defined in libc-lock.h; its relative path is sysdeps/mach/bits on
the glibc release I use (2.2.5). Lines 67/68 from that file are
/* Lock the named lock variable. */
#define __libc_lock_lock(NAME) __mutex_lock (&(NAME))

Run your code in gdb until you get to the segfault. Then do a backtrace to find out where it was.
Here is the set of commands you will type to do this:
gdb myprogram
run
***Make it crash***
backtrace
Typing backtrace will print the call stack and will show you what path the code has taken to get to the point where it is segfaulting.
You can go up and down in the stack to your code by typing 'up' or 'down' respectively. Then you can examine variables in that scope.
So for instance, if your backtrace command prints this:
linux_black_magic
more_linux
libc
libc
yourcode.c
Type 'up' a few times so that the stack frame is in your code instead of linux's. You can then examine variables and memory that your program is operating on. Do this:
print VariableName
x/10 &Variable
That will print the value of the variable and then will print a hex dump of memory starting at the variable.
Those are some general techniques to use with gdb and debugging, post more details for more detailed answers.

Related

How can I figure out the full call stack with the current IP and BP registers?

I am doing a simple experiment on Ubuntu LTS 16.04.1 X86_64 with GCC 5.4.
The experiment is to get full call stack of a running C programme.
What I have done is:
Using ptrace's PTRACE_ATTACH & PTRACE_GETREGS to suspend a running C programme and get its current IP and BP.
Using PTRACE_PEEKDATA to get data at [BP] and [BP+4] (or +8 for 64 bits target), so that I can have the calling function's BP and the return address.
Because the BPs are a chain, I should be able to get a sequence of return addresses. After that, by analyzing the address sequence with listing file or dwarf data, I should finally be able to figure the full call stack. Something like 'main --> funcA --> funcB --> funcC ...'.
My problem is, this works fine if the call stack is totally inside my test programme's code. I mean the case when every function is written by me. However, if the test programme is stopped in a CRT or system API, such as 'scanf' or 'sleep', the BP chain no longer works.
I checked the disassembly and noticed that CRT or system API functions do not establish a stack frame by 'push ebp' and 'mov ebp,esp' like my functions do. No wonder the above approach does not work. But I cannot explain why GDB can still work properly in such a case?! So there must be many things I do not know about Linux C programme's call stack.
Could you figure my mistake/misunderstanding? Or could you simply suggest some articles/links for me to read? Thank you very much.
Because the BPs are a chain
They are not. It used to be that a frame pointer chain was used on i386, but for a few years now GCC defaults to -fomit-frame-pointer in optimized compiles even on i386. On x86_64 the -fno-omit-frame-pointer was never the default in optimized code.
this works fine if the call stack is totally inside my test programme's code.
This will only work if you compile without optimization (or with optimization if you also use -fno-omit-frame-pointer).
I cannot explain why GDB can still work properly in such case
GDB (and libunwind) uses DWARF unwind info, which you can examine with readelf -wf a.out.
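The frame-pointer walk the question describes, written out as an added example (not code from the question or either answer) so the assumed layout and the failure mode are both visible. It assumes x86_64 with gcc or clang (for __builtin_frame_address and the asm barrier); the constructed stack and its return addresses are made up.

```c
/* Added example, not from the question or answers: walk an x86_64 frame
 * pointer chain as the question describes ([rbp] = caller's saved rbp,
 * [rbp+8] = return address), stopping instead of dereferencing garbage
 * when a caller was built without a frame pointer.                        */
#include <inttypes.h>
#include <stdio.h>
/* Collect up to `max` return addresses from `fp`; returns the count.  Real
 * caller frames lie above `fp` (stack grows down) and at or below `limit`,
 * the outermost frame you trust; glibc's _start zeroes rbp so a complete
 * chain ends in 0.  Anything else is a stale register value: stop there.  */
size_t walk_fp_chain(const uintptr_t *fp, const uintptr_t *limit,
                     uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        const uintptr_t *next = (const uintptr_t *)fp[0];  /* saved rbp      */
        out[n++] = fp[1];                                   /* return address */
        if (next <= fp || next > limit)                     /* 0 or garbage   */
            break;
        fp = next;
    }
    return n;
}
/* Constructed stack, not a real one: frames for main, funcA, funcB laid out
 * as [saved rbp, return address] with made-up return addresses; `corrupt`
 * gives funcA a stale slot pointing below itself, as -fomit-frame-pointer
 * callers leave behind.                                                   */
size_t demo_walk(int corrupt, uintptr_t *out, size_t max)
{
    uintptr_t stack[6];
    stack[4] = 0;                    stack[5] = 0x401000;   /* main  */
    stack[2] = (uintptr_t)&stack[4]; stack[3] = 0x401100;   /* funcA */
    stack[0] = (uintptr_t)&stack[2]; stack[1] = 0x401200;   /* funcB */
    if (corrupt)
        stack[2] = (uintptr_t)&stack[0];
    return walk_fp_chain(stack, &stack[4], out, max);
}
/* Live walk of this program's own stack, main -> middle -> inner.  With
 * gcc -O0 all three frames appear; at -O2 `middle` keeps no frame pointer,
 * so the walk skips it or stops early, which is what the question saw.   */
__attribute__((noinline)) size_t inner(const uintptr_t *limit, uintptr_t *out)
{ return walk_fp_chain(__builtin_frame_address(0), limit, out, 8); }
__attribute__((noinline)) size_t middle(const uintptr_t *limit, uintptr_t *out)
{ size_t n = inner(limit, out); __asm__ volatile("" ::: "memory"); return n; }
int main(void)
{
    uintptr_t ret[8];
    printf("live frames: %zu\n", middle(__builtin_frame_address(0), ret));
    return 0;
}
```

Build it once with -O0 and once with -O2 and compare the count; the answer's readelf -wf shows the DWARF unwind tables gdb uses instead of this chain.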

When I debug a C program with gdb, and key in 'p system', what exactly do I get?

Before I go deep into my questions, I need to confess that I am still fairly inexperienced to this subject, and am confused over quite a number of concepts, so please bear with me if my manner of asking those questions seems unorganized.
I recently learnt that as the standard C library would be loaded into every C program we compiled (is this because we have #include at the beginning of the source file? [question1]), we would have its functions loaded into the memory. So, I would know that the system() function had already been loaded and stored somewhere in the memory, and then I was made aware that I could find the exact address of where the system() function was stored by debugging a random C program with gdb, and issuing the command 'p system', which would print out the address of the function. I understand that 'p' is used to print a variable in gdb, and 'system' in this case probably indicates the address of the system() function, so it seems to make sense to do so, but then I think to myself, wait a second, it does not appear that I have used the system() function anywhere in my code, why would the inventor of gdb include such a variable for me to print out the address of some function that I don't even use? And does this imply that the address of every function in the standard C library can be found out in the same fashion? And do they all have a corresponding variable name in gdb? [question2]
One more question unrelated to stuff I talked above is whether functions like system(), execve() and many others are specific to Linux OS, or they are also used in Windows OS? [question3]
Hope that you guys can help me out. Thanks in advance!
The standard C library is linked with every program because it's necessary for it to be there to be able to run your program. There's a lot of things happening in your program before your main function gets called and after it returns, the standard library takes care of this. It also provides you with most of the standard functions you can call. You can compile things without a standard library, but that's an advanced topic. This is pretty much unrelated to #include.
Gdb can see system with p because it prints more than just variables. It prints anything that is in scope. system just happens to be a symbol that's visible to you in that scope. You could print any symbol that's visible to you, including all the globally visible variables and functions in libc and your program. Symbols in this context means "names of various things that need to be findable by programs and other libraries", this includes all functions, variables, section boundaries and many other things that the compiler/linker/runtime/debugger need to find to do its job.
Usually the standard library gets linked dynamically, which means that every program has the exact same copy of the library. In that case all symbols in it will be visible to your program because there's no reason to exclude them. If you link your program statically only the necessary parts of libc will be included and you would probably not see the system symbol unless you actually use that function.

How to see ruby c extension backtrace info

Here is my output; I don't understand the hex "0xe8" and "0x7f8c783ac74d":
/home/roroco/Dropbox/rbs/ro_article/c/ro_helper_article.so(get_article_n2+0xe8) [0x7f8c783ac74d]
Here is the full output:
It looks like you've caused (or, rather, a plugin caused) ruby to segfault. This normally means that you've attempted to access memory outside of your designated bounds - basically, your program did something really, really weird. The line you specifically picked out is actually a C library - the .so extension means "static object," and is linked into the main ruby executable. The information it's providing you with tells you where the error originated - however, most production libraries do not contain information such as "file names" and "line numbers". Instead, they contain a list of symbols. In your case, it's telling you exactly where, in the static object, an error originated - exactly 0xe8 bytes after the get_article_n2 symbol - or, at the address 0x7f8c783ac74d.
So now you have a few options.
You can poke around blindly in your source code (I'm assuming you wrote the library that is in error here, since it seems that's what you're testing) and try and guess where the segfault originated. You already know that it's in the function get_article_n2, considering the error originated after that symbol.
You can disassemble the static object to see the specific instruction that caused the error, and then attempt to map it to the source.
You can enable debugging, and have your build system output file names and line numbers so you know what you're looking at. (disclaimer: I'm not sure if this will work; it doesn't look like you're emitting debug information to me, but I'm not sure if you are; and even if you would be, I'm not sure it would be used to output. However, this seems the easiest course of action.).
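As an added example, not code from the question or the answer: a small parser for the frame format quoted above that recovers where get_article_n2 starts in memory from the bracketed address and the +0xe8 offset, which is the pair of numbers you hand to gdb or objdump to find the faulting instruction. The struct and function names are mine.

```c
/* Added example, not from the question or answer: decode one frame of a
 * crash dump in the shape quoted above, "object(symbol+0xoff) [0xaddr]",
 * the format glibc's backtrace_symbols() produces, and recover the
 * runtime address of the symbol the offset is measured from.             */
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct frame {
    char object[256];       /* path of the .so containing the frame           */
    char symbol[128];       /* nearest preceding symbol, "" if none           */
    uintptr_t offset;       /* bytes past that symbol (0xe8 in the question)  */
    uintptr_t addr;         /* absolute address in the crashed process        */
    uintptr_t symbol_addr;  /* derived: addr - offset, where the symbol starts */
};

/* Returns 1 on success, 0 if the line does not have the expected shape. */
int parse_frame(const char *line, struct frame *f)
{
    memset(f, 0, sizeof *f);
    /* %255[^(] reads the object path, %127[^+] the symbol; the hex fields
     * accept the 0x prefix on their own.                                    */
    if (sscanf(line, "%255[^(](%127[^+]+%" SCNxPTR ") [%" SCNxPTR "]",
               f->object, f->symbol, &f->offset, &f->addr) != 4
        /* glibc prints "object(+0x1234) [0x...]" when it knows no symbol */
        && sscanf(line, "%255[^(](+%" SCNxPTR ") [%" SCNxPTR "]",
                  f->object, &f->offset, &f->addr) != 3)
        return 0;
    f->symbol_addr = f->addr - f->offset;
    return 1;
}

int main(void)
{
    /* The frame quoted in the question. */
    const char *line = "/home/roroco/Dropbox/rbs/ro_article/c/ro_helper_article.so"
                       "(get_article_n2+0xe8) [0x7f8c783ac74d]";
    struct frame f;
    if (!parse_frame(line, &f))
        return 1;
    printf("%s starts at %#" PRIxPTR "; the fault is %" PRIuPTR " bytes in\n",
           f.symbol, f.symbol_addr, f.offset);
    /* To reach an instruction or source line without the load base: open the
     * .so in gdb and run "info line *(get_article_n2+0xe8)" (needs -g), or
     * "objdump -d ro_helper_article.so" and read 0xe8 bytes past the
     * get_article_n2 label.                                                */
    return 0;
}
```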

GDB: dump arguments to all calls of a specific function

I need to profile the values passed as arguments to the standard C library function sqrt() in my program.
The trivial way is to insert code to dump these values to a file before the actual call to sqrt() (e.g. a simple fprintf()). However, if sqrt() is called from inside a library, or if it is called from multiple locations, the task can become hard.
Is there a way to automatically do this in GDB or in some other debugging tool?
Thanks in advance for your help.
Best Regards.
Sure, it can be done. There is an easy way and a hard way.
The easy way is if you have debuginfo for sqrt. Most distros make this available; e.g., for Fedora you can use debuginfo-install to install it.
In this case, find the function in question, set a breakpoint on it, and have the breakpoint commands print the arguments:
break sqrt
commands
silent
info args
cont
end
If you have a new enough gdb, and you know the names of the arguments, you can use the dprintf command instead. This will give you nicer formatting and not interact badly with other debugging commands like next.
The hard way is if you don't have debug info. In this case you need to know the platform ABI. Then you can still set the breakpoint, and then print the appropriate registers or dump the appropriate memory, depending on how the arguments are passed.
Yet another way is to use SystemTap. This is a pretty good tool for this kind of tracing.

How to debug a crash before main?

My program links statically to many libraries and crashes before getting to main in GDB. How do I diagnose what the problem is?
It's a good bet that LD_DEBUG can help you here. Try this: LD_DEBUG=all ./a.out. This will allow you to easily identify the library which is being loaded when your program crashes.
(Edit: if it wasn't clear, a.out is meant to refer to a generic binary file -- in this case, replace it with the name of your executable).
Edit 2:
To clarify, LD_DEBUG is an environment variable which is examined by the dynamic linker when a program begins execution. If LD_DEBUG is set to some value, the dynamic linker will output a lot of information about the dynamic libraries being loaded during program execution, symbol binding, and so on.
For starters, execute the following on your machine:
LD_DEBUG=help ls
You will see the valid options for LD_DEBUG on your system listed. The most verbose setting is all, which will display all available information.
Now, to use this is as simple as the ls example, only replace ls with the name of your program. There is no need for gdb in order to use LD_DEBUG, as it is functionality provided solely by the dynamic linker, and not by gdb.
It may crash because some component throws an exception and nobody catches it since main() hasn't been entered yet. Set a breakpoint on throwing an exception:
catch throw
run
(If catch throw doesn't work the first time you start it, run it once to let it load the dynamic libraries and then do catch throw and run again).
This post has the answer, you have to set a breakpoint before main in the crt0 startup code:
Using GDB without debugging symbols on x86?
starti
starti breaks at the very first instruction executed, see also: Stopping at the first machine code instruction in GDB
An alternative if your GDB is not new enough:
break _start
if you know that the name of the entry point method is _start, or:
info files
search for Entry point:
Entry point: 0x400440
and run:
break *0x400440
TODO: find out how to compile crt* objects with debug symbols and step into them: How to compile my own glibc C standard library from source and use it?
Start taking the libraries out one by one until it stops crashing.
Then examine the culprit.
I haven't run into this in C but if you link to a c++ library static initialization can crash. You can create it easily by having an assert in a constructor of a static scope variable.
If you can, link your program dynamically instead of statically and follow denniston.t's answer. Maybe the debug trace from the dynamic linker will help to fix this problem.