finding which function caused "Address out of bounds" in gdb - c

I have a critical bug in my project. When I open the .core file in gdb, it shows something like the following (I have trimmed the gdb output for readability):
This is a very, very suspicious, newly written part of the code:
0x00000000004579fe in http_chunk_count_loop
(f=0x82e68dbf0, pl=0x817606e8a Address 0x817606e8a out of bounds)
This is very mature part of code which worked for a long time without problem:
0x000000000045c8a5 in packet_handler_http
(f=0x82e68dbf0, pl=0x817606e8a Address 0x817606e8a out of bounds)
What confuses me is the pl=0x817606e8a Address 0x817606e8a out of bounds: gdb shows the pointer was already out of bounds before execution reached the newly written code. This makes me think the problem is caused by the function that calls packet_handler_http.
But packet_handler_http is very mature and has been working for a long time without problems, which makes me think I am misunderstanding the gdb output.
The problem is with packet_handler_http I guess, but because this code was already working I am confused. Am I right, or am I missing something?

To detect "memory errors" you might like to run the program under Valgrind: http://valgrind.org
If you compile the program with debug symbols (-g for gcc), you can quite reliably trace "out of bounds" conditions down to the line of code where the error occurs, as well as the line of code that allocated the memory (if any).

The problem is with packet_handler_http I guess
That guess is unlikely to be correct: if packet_handler_http really is receiving an invalid pointer, then the corruption has happened "upstream" from it.
This is very mature part of code which worked for a long time without problem
I routinely find bugs in code that worked "without problem" for 10+ years. Also, the corruption may be happening in newly-added code, but causing problems elsewhere. Heap and stack buffer overflows are often just like that.
As alk already suggested, run your executable under Valgrind, or Address Sanitizer (also included in GCC-4.8), and fix any problems they find.

Thanks, guys, for your contributions. Even though gdb said the opposite, it turned out the pointer was good.
There was a part of the new code that caused the out-of-bounds problem.
There was a line like (goodpointer + offset), where the offset was the HTTP chunk size, which I was taking from the network (data sniffing). There was a kind of attack in which this offset was extremely large, which caused an integer overflow and resulted in the out-of-bounds access.
My conclusions: never trust parameters that come from the network, and gdb may not always show parameters correctly in a core dump, because at the moment of the crash things can get messy on the stack.
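A minimal sketch of the kind of check that guards a (goodpointer + offset) computation, assuming the captured payload is described by a base pointer and a length. The names chunk_at, payload, payload_len and chunk_size are hypothetical and not taken from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return payload + chunk_size, or NULL when the network-supplied
 * chunk_size does not fit inside the captured payload.
 * All names here are hypothetical.
 */
static const uint8_t *chunk_at(const uint8_t *payload, size_t payload_len,
                               uint64_t chunk_size)
{
    /* Compare sizes, never pointers: payload + chunk_size may already
     * wrap around or leave the buffer before any comparison runs. */
    if (payload == NULL || chunk_size > payload_len)
        return NULL;

    /* chunk_size <= payload_len, so the cast cannot truncate and the
     * result is at most one past the end of the buffer. */
    return payload + (size_t)chunk_size;
}
```

The point of the design is that the untrusted value is validated as a plain number against the known buffer length before it ever takes part in pointer arithmetic.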

Related

Unreadable instruction at address

I get a segmentation fault in a certain scenario (it is C code with DEC VAX FMS (Forms Management System) calls to get a certain field on a CRT screen; pretty old legacy code). I am on an AIX machine, and have only dbx installed on it. GDB, valgrind etc. are not available.
Here is what I get when I try to debug:
Unreadable instruction at address 0x53484950
I do not know how to proceed from here.
I have tried a few things:
1.
(dbx) up
not that many levels
(dbx) down
not that many levels
(dbx) n
where
Segmentation fault in . at 0x53484950 ($t1)
0x53484950 (???) Unreadable instruction at address 0x53484950
I tried tracei (for machine instructions), dump (dump gives so much output that I am unable to make sense of it), etc., but nothing seems to help.
(dbx) &0x53484950/X
expected variable, found "1397246288"
I am used to getting a stack trace on "where" and going on from there. This is something I have not encountered before, and it appears I am not very good at dbx either. Any help to get to at least the line of code that is causing trouble is appreciated.
Once you have hit a segfault, there is no way to continue, so the n command is not going to do anything. At that point, all you can do is examine the stack and the variables, and that will be meaningless unless you have the source code and can recompile it.
In fact, without the source code, I am not sure how you could possibly proceed with fixing the program. Even if you could "decompile" the program, or at least disassemble the program, the risk of making a mistake when trying to patch the binary in order to fix it is virtually 100%.
I'm sorry. Given the limitations you are working under, I would argue that the problem is unsolvable. Without tools such as gdb or valgrind, it will be difficult to find the problem, and without the source code, it will be very difficult to fix the problem once you have found it.
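One check that costs nothing is to look at whether the faulting address is really text. The sketch below decodes a 32-bit value byte by byte, most significant byte first; addr_to_ascii is a hypothetical helper name. For 0x53484950 the bytes are 0x53, 0x48, 0x49 and 0x50, which are the ASCII characters S, H, I and P. That is only a hint, but an address made entirely of printable characters often means a string was copied over a saved return address or a function pointer.

```c
#include <ctype.h>
#include <stdint.h>

/* Write the four bytes of addr, most significant first, into out as
 * text, replacing non-printable bytes with '.'.
 * out must have room for 5 chars (4 bytes plus the terminator). */
void addr_to_ascii(uint32_t addr, char out[5])
{
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)(addr >> (24 - 8 * i));
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

If the decoded text matches something the program handles (a field value, a key word from a form), the buffer that receives that text is a reasonable place to start looking.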

What are best practices for finding a bug in a C program that only shows up in optimized build

My program uses a third-party library that throws a segmentation fault at some point. I tried compiling the library with debug symbols and without compiler optimization, and the crash went away. My suspicion is that compiler optimizations revealed this bug. What are best practices for debugging cases like this?
EDIT - (corrected the statement above: "revealed" instead of "caused")
I think I was misunderstood. I didn't have an intention to blame compiler, or something like that. I only asked for best practices for finding a bug in such a situation, where I don't have debug symbols in the 3rd party library (the crash backtrace leads to the 3rd party library).
What you describe is quite common. And it's almost never ever a bug in the compiler optimization. Optimization does a lot of things to your code. Variables get reordered, optimized away, etc. If you have a buffer overflow, it might overwrite memory that doesn't matter in the debug build, but that memory is very important in the optimized build.
Use valgrind to track down memory errors - they're almost always the cause of the symptoms you see.
Your suspicion is that optimization caused a bug. My suspicion is that your code has constructs that lead to Undefined Behavior, and when the optimizer is on, this Undefined Behavior manifests itself as erroneous behavior or crash. Don't blame the optimizer. Find UB in your code... might be tricky, though. Possible culprits:
Out-of-bounds index
Returning the address of a temporary
A zillion other things
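As an illustration of the second culprit in the list above, here is a small sketch. The broken version is shown only as a comment, since calling it is undefined behavior; format_id is a hypothetical name.

```c
#include <stdio.h>
#include <stddef.h>

/* Broken version, kept as a comment on purpose: buf lives on the stack
 * and stops existing when the function returns, so the caller receives
 * a dangling pointer. It may appear to work in a debug build.
 *
 *   const char *format_id(int id) {
 *       char buf[32];
 *       snprintf(buf, sizeof buf, "id-%d", id);
 *       return buf;
 *   }
 */

/* One possible fix: the caller owns the storage.
 * Returns 0 on success, -1 if out is too small or formatting fails. */
static int format_id(int id, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "id-%d", id);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Whether the broken version crashes depends on what happens to reuse that stack slot, which is exactly why such bugs tend to show up only at certain optimization levels.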
Compile with debug symbols and compiler optimization, it will "hopefully" fail as well. Allow the system to generate a core file (ulimit -c unlimited, then re-run the program). Load the core file into gdb to see what happened.
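If changing the shell limit is inconvenient (for example, the program is started by a service manager), the soft limit can also be raised from inside the program on POSIX systems. A sketch, assuming getrlimit/setrlimit are available and the hard limit allows core files; enable_core_dumps is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the system call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this early in main() has roughly the effect of ulimit -c for that process only. If the hard limit itself is 0, no core file will be written either way.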
Another powerful tool is valgrind. Run your program under valgrind with the option --db-attach=yes, and it will stop and start the debugger as soon as it detects an invalid read or write (newer Valgrind releases removed this option in favor of the built-in gdbserver, e.g. --vgdb-error=1). Invalid reads/writes are likely to provoke a segfault, and even if they don't, they should be removed anyway.
Good luck,
Keep putting debug statements or messageboxes in the place you think the code is crashing. The crash will occur between two messageboxes and this will help you locate the faulty code as long as the code wasn't changed too much.
Also comment out blocks of code until the crash stops coming. Keep commenting back in until the crash returns. What you last commented back in must be causing the crash, directly or indirectly.
Both of these methods are useful for general debugging and half your work is already done if you are able to reliably reproduce the crash.
I did not give specific advice for debugging compiler optimisations because it's highly unlikely the crash is caused by that. The optimisations are generally tested very robustly to ensure they do not change the function or semantics of the code in any way.
If the backtrace leads to the third-party library, use gdb to break before the library call. Verify that the parameters you're passing to the library are valid (i.e., aren't uninitialized pointers, aren't pointers to free'd memory, aren't out of range, etc.)
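The same checks can be left in the code as assertions around the call. This is only a sketch: lib_checksum is a hypothetical stand-in for whatever third-party function is crashing, and checked_lib_checksum is a hypothetical wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the third-party function; the real one
 * would come from the library's header. */
static unsigned long lib_checksum(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Wrapper that checks what can be checked in plain C before the call.
 * capacity is the size of the buffer that data points into. */
static unsigned long checked_lib_checksum(const unsigned char *data,
                                          size_t len, size_t capacity)
{
    assert(data != NULL);      /* catches NULL, not dangling pointers */
    assert(len <= capacity);   /* catches an out-of-range length */
    return lib_checksum(data, len);
}
```

Assertions like these cannot detect freed or uninitialized memory, so they complement Valgrind rather than replace it. They also disappear when NDEBUG is defined, which many optimized builds do.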
Can you use strace to trace the system calls and then try to determine the execution path in the third-party library? Use a printf (which ends up as a write system call) or some other system call before the failing library call so you have a starting point in the strace output.
If you really think it's a bug in the third-party library, you'll have to compile it with optimizations on so you can reproduce the failure. Are you saying that your compiler can only include debug symbols for non-optimized builds? gdb should still work for optimized builds.
Well, going through the compiled binary isn't going to help.
So that leaves going through your code to find out what part is causing the segfault. I would just work through your code manually and start commenting things out. Once you find what's causing the error, then you can determine what to do with it. It might be worth adding printfs in select locations to see exactly where the program fails.
Think of it as doing a binary search for the error ;)
If it only blows up when you turn on optimization, then that's a strong hint you've invoked undefined behavior somewhere. Unfortunately, that UB may be nowhere near the code that actually generated the segfault (as I've discovered several times in the past).
Every time this has happened to me (which hasn't been that often), the cause was a buffer overflow somewhere else in the code. I never developed a repeatable, generally applicable technique for finding the problem, though (unless you want to call hours stepping through a debugger and swearing a generally applicable technique).

Need help with buffer overrun

I've got a buffer overrun I absolutely can't seem to figure out (in C). First of all, it only happens maybe 10% of the time or so. The data that it is pulling from the DB each time doesn't seem to be all that different between executions... at least not different enough for me to find any discernible pattern as to when it happens. The exact message from Visual Studio is this:
A buffer overrun has occurred in
hub.exe which has corrupted the
program's internal state. Press
Break to debug the program or Continue
to terminate the program.
For more details please see Help topic
'How to debug Buffer Overrun Issues'.
If I debug, I find that it is broken in __report_gsfailure() which I'm pretty sure is from the /GS flag on the compiler and also signifies that this is an overrun on the stack rather than the heap. I can also see the function it threw this on as it was leaving, but I can't see anything in there that would cause this behavior, the function has also existed for a long time (10+ years, albeit with some minor modifications) and as far as I know, this has never happened.
I'd post the code of the function, but it's decently long and references a lot of proprietary functions/variables/etc.
I'm basically just looking for either some idea of what I should be looking for that I haven't or perhaps some tools that may help. Unfortunately, nearly every tool I've found only helps with debugging overruns on the heap, and unless I'm mistaken, this is on the stack. Thanks in advance.
You could try putting some local variables on either end of the buffer, or even sentinels into the (slightly expanded) buffer itself, and trigger a breakpoint if those values aren't what you think they should be. Obviously, using a pattern that is not likely in the data would be a good idea.
While it won't help you in Windows, Valgrind is by far the best tool for detecting bad memory behavior.
If you are debugging the stack, you need to get down to low-level tools: place a canary in the stack frame (perhaps a buffer filled with something like 0xA5) around any potential suspects. Run the program in a debugger and see which canaries no longer have the right size and contents. You will gobble up a large chunk of stack doing this, but it may help you spot exactly what is occurring.
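A rough sketch of the sentinel/canary idea from the answers above. The struct and function names are hypothetical, and the 64-byte size is just a placeholder for the suspect buffer. Wrapping the buffer in a struct is a deliberate choice: members of a struct are laid out in declaration order, whereas the compiler is free to place separate local variables wherever it likes.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_BYTE 0xA5
#define CANARY_LEN  16

/* A suspect buffer bracketed by canaries. */
struct guarded_buf {
    unsigned char front[CANARY_LEN];
    char          data[64];
    unsigned char back[CANARY_LEN];
};

static void guarded_init(struct guarded_buf *g)
{
    memset(g->front, CANARY_BYTE, sizeof g->front);
    memset(g->data, 0, sizeof g->data);
    memset(g->back, CANARY_BYTE, sizeof g->back);
}

/* Returns 1 if both canaries are intact, 0 if either was overwritten. */
static int guarded_ok(const struct guarded_buf *g)
{
    size_t i;
    for (i = 0; i < CANARY_LEN; i++) {
        if (g->front[i] != CANARY_BYTE || g->back[i] != CANARY_BYTE)
            return 0;
    }
    return 1;
}
```

Declare a struct guarded_buf as the local in place of the plain array, call guarded_init() once, and check guarded_ok() after each operation that writes to data. The first check that fails brackets the offending write.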
One thing I have done in the past to help narrow down a mystery bug like this was to create a variable with global visibility named checkpoint. Inside the culprit function, I set checkpoint = 0; as the very first line. Then, I added ++checkpoint; statements before and after function calls or memory operations that I even remotely suspected might be able to cause an out-of-bounds memory reference (plus peppering the rest of the code so that I had a checkpoint at least every 10 lines or so). When your program crashes, the value of checkpoint will narrow down the range you need to focus on to a handful of lines of code. This may be a bit overkill, I do this sort of thing on embedded systems (where tools like valgrind can't be used) but it should still be useful.
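The checkpoint technique might look roughly like this. suspect_function is a hypothetical stand-in for the real culprit, and the operations inside it are placeholders.

```c
#include <string.h>

/* Global so it is easy to inspect in a debugger or a crash dump.
 * volatile keeps the compiler from merging or dropping the updates. */
volatile int checkpoint;

/* Hypothetical stand-in for the suspect function: copies src into dst,
 * truncating if needed, and bumps checkpoint after every step. */
static void suspect_function(char *dst, size_t dst_len, const char *src)
{
    size_t n;

    checkpoint = 0;
    if (dst_len == 0)
        return;

    n = strlen(src);
    ++checkpoint;                 /* 1: length computed */
    if (n >= dst_len)
        n = dst_len - 1;
    ++checkpoint;                 /* 2: length clamped */
    memcpy(dst, src, n);
    ++checkpoint;                 /* 3: copy finished */
    dst[n] = '\0';
    ++checkpoint;                 /* 4: terminator written */
}
```

After a crash, the value of checkpoint tells you the last step that completed, so the fault lies between that increment and the next one.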
Wrap it in an exception handler and dump out useful information when it occurs.
Does this program recurse at all? If so, I'd check there to ensure you don't have an infinite recursion bug. If you can't see it manually, sometimes you can catch it in the debugger by pausing frequently and observing the stack.

Bizarre bug in C [closed]

So I have a C program. And I don't think I can post any code snippets due to complexity issues. But I'll outline my error, because it's weird, and see if anyone can give any insights.
I set a pointer to NULL. If, in the same function where I set the pointer to NULL, I printf() the pointer (with "%p"), I get 0x0, and when I print that same pointer a million miles away at the end of my program, I get 0x0. If I remove the printf() and make absolutely no other changes, then when the pointer is printed later, I get 0x1, and other random variables in my structure have incorrect values as well. I'm compiling it with GCC on -O2, but it has the same behavior if I take off optimization, so that's not the problem.
This sounds like a Heisenbug, and I have no idea why it's happening, nor how to fix it. Does anyone who has dealt with something like this in the past have advice on how they approached this kind of problem? I know this may sound kind of vague.
EDIT: Somehow, it works now. Thank you, all of you, for your suggestions.
The debugger told me interesting things - that my variable was getting optimized away. So I rewrote the function so it didn't need the intermediate variable, and now it works with and without the printf(). I have a vague idea of what might have been happening, but I need sleep more than I need to know what was happening.
Are you using multiple threads? I've often found that the act of printing something out can be enough to effectively suppress a race condition (i.e. not remove the bug, just make it harder to spot).
As for how to diagnose/fix it... can you move the second print earlier and earlier until you can see where it's changing?
Do you always see 0x1 later on when you don't have the printf in there?
One way of avoiding the delay/synchronization of printf would be to copy the pointer value into another variable at the location of the first printf and then print out that value later on - so you can see what the value was at that point, but in a less time-critical spot. Of course, as you've got odd value "corruption" going on, that may not be as reliable as it sounds...
EDIT: The fact that you're always seeing 0x1 is encouraging. It should make it easier to track down. Not being multithreaded does make it slightly harder to explain, admittedly.
I wonder whether it's something to do with the extra printf call making a difference to the size of stack. What happens if you print the value of a different variable in the same place as the first printf call was?
EDIT: Okay, let's take the stack idea a bit further. Can you create another function with the same sort of signature as printf and with enough code to avoid it being inlined, but which doesn't actually print anything? Call that instead of printf, and see what happens. I suspect you'll still be okay.
Basically I suspect you're screwing with your stack memory somewhere, e.g. by writing past the end of an array on the stack; changing how the stack is used by calling a function may be disguising it.
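A possible shape for the printf stand-in described above. fake_printf is a hypothetical name, and the noinline attribute is a GCC/Clang extension (the question mentions GCC), so treat this as a sketch rather than portable code.

```c
#include <stdarg.h>

/* Same signature as printf, but prints nothing. It still sets up the
 * variadic argument list and touches a volatile, so the call is not
 * trivially optimized away. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
int fake_printf(const char *format, ...)
{
    static volatile int sink;
    va_list ap;

    va_start(ap, format);
    sink += (format != 0);   /* side effect the compiler must keep */
    va_end(ap);
    return 0;
}
```

Replace the original printf("%p\n", ptr) with fake_printf("%p\n", ptr). If the bug stays hidden, the I/O is irrelevant and the extra call (and its effect on stack layout) is what masks the corruption.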
If you're running on a processor that supports hardware data breakpoints (like x86), just set a breakpoint on writes to the pointer.
Do you have a debugger available to you? If so, what do the values look like in that? Can you set any kind of memory/hardware breakpoint on the value? Maybe there's something trampling over the memory elsewhere, and the printf moves things around enough to move or hide the bug?
Probably worth looking at the asm to see if there's anything obviously wrong there. Also, if you haven't already, do a full clean rebuild. If the definition of the struct has changed recently, there's a slight chance that the compiler could be getting it wrong if the dependency checking failed to correctly rebuild everything it needed to.
Have you tried setting a condition in your debugger which notifies you when that value is modified? Or running it through Valgrind? These are the two major things that I would try, especially Valgrind if you're using Linux. There's no better way to figure out memory errors.
Without code, it's a little hard to help, but I understand why you don't want to foist copious amounts on us.
Here's my first suggestion: use a debugger and set a watchpoint on that pointer location.
If that's not possible, or the bug disappears again, here's my second suggestion.
1/ Start with the buggy code, the one where you print the pointer value and you see 0x1.
2/ Insert another printf a little way back from there (in terms of code execution path).
3/ If it's still 0x1, go back to step 2, moving a little back through the execution path each time.
4/ If it's 0x0, you know where the problem lies.
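Steps 1 to 4 are easier to repeat with a small macro that tags each print with its location. TRACE_PTR is a hypothetical name; this is a sketch, not something from the original question.

```c
#include <stdio.h>

/* Print a pointer's value together with the source location.
 * stderr is normally unbuffered, so the last message is less likely
 * to be lost if the program crashes right afterwards. */
#define TRACE_PTR(p) \
    fprintf(stderr, "%s:%d: %s = %p\n", __FILE__, __LINE__, #p, (void *)(p))

/* Usage: put TRACE_PTR(my_ptr); at successive points on the execution
 * path and compare the printed values to find where 0x0 becomes 0x1. */
```

Keep in mind that in this particular case the prints themselves changed the behavior, so a watchpoint remains the more reliable first choice.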
If there's nothing obvious between the 0x0 printf and the 0x1 printf, it's likely to be corruption of some sort. Without a watchpoint, that'll be hard to track down - you need to check every single stack variable to ensure there's no possibility of overrun.
I'm assuming that pointer is a global since you set it and print it "a million miles away". If it is, look at the variables you define on either side of it (in the source). They're the ones most likely to be causing overrun.
Another possibility is to turn off the optimization to see if the problem still occurs. We've occasionally had to ship code like that in cases where we couldn't fix the bug before deadlines (we'll always go back and fix it later, of course).

Strange code crash problem?

I have an MSVC 6.0 workspace, which contains only C code.
The code is built without any optimization switch, i.e. with option O0, and in debug mode.
This code was obtained from a third party. It runs as expected as is.
But when I add some printf statements in certain functions for debugging, and then execute the code, it crashes.
I suspect it to be some kind of code/data overflow across a memory-page/memory-segment or something alike. But the code does not have any memory map specifier, or linker command file mentioning the segments/memory map etc.
How do I narrow down the cause, and the fix for this quirky issue?
On Linux, I like valgrind. Here is a Stack Overflow thread for valgrind-like tools on Windows.
You could try to determine where the crash happens, by looking at the stack trace in Visual Studio. You should be able to see what is the sequence of function calls that eventually leads to the crash, and this may give you a hint as to what's wrong.
It is also possible that the printf() alone causes the crash. A possible cause - but not too likely on Windows - is a too-small stack that is overflowed by the call to printf().
Use CString::GetBuffer when printing CString objects with printf.
There could be an issue with wide-character versus normal strings.
printf("%s",str.GetBuffer());
str.ReleaseBuffer();
Cheers,
Atul.
In general when trying to deal with a crash, your first port of call should be the debugger.
Used correctly, this will enable you to narrow down your problem to a specific line of code and, hopefully, give you a view of the runtime memory at the moment of the crash. This will allow you to see the immediate cause of the crash.

Resources