I have an MSVC 6.0 workspace containing all C code. The code is built without any optimization switch (i.e., with optimization disabled) and in debug mode.
The code comes from a third party, and it runs as desired as-is.
But when I add some printf statements to certain functions for debugging and then run the code, it crashes.
I suspect some kind of code/data overflow across a memory page or memory segment, or something similar, but the code has no memory-map specifier or linker command file describing the segments or memory map.
How do I narrow down the cause of, and the fix for, this quirky issue?
On Linux, I like valgrind. Here is a Stack Overflow thread for valgrind-like tools on Windows.
You could try to determine where the crash happens by looking at the stack trace in Visual Studio. You should be able to see the sequence of function calls that eventually leads to the crash, which may give you a hint as to what's wrong.
It is also possible that the printf() call alone causes the crash. A possible cause - though not too likely on Windows - is a stack that is too small and is overflowed by the call to printf().
Use CString::GetBuffer() when printing CString objects with printf. There can also be issues when mixing wide-character and narrow strings.
printf("%s", str.GetBuffer());
str.ReleaseBuffer();
In general when trying to deal with a crash, your first port of call should be the debugger.
Used correctly, this will enable you to narrow down your problem to a specific line of code and, hopefully, give you a view of the runtime memory at the moment of the crash. This will allow you to see the immediate cause of the crash.
Related
My application is a multi-threaded program that runs on Solaris.
Recently I found that it can crash; the reason is that one member of a pointer array changes from a valid value to NULL, so when it is accessed, the program crashes.
Because the occurrence rate is very low (it has happened only twice in the past two months, and the changed members of the array weren't the same), I can't find steps to reproduce it, and a code review turned up no useful clues.
Could anyone give some advice on how to debug this kind of randomly-changed-memory issue?
Since you aren't able to reproduce the crash, debugging it isn't going to be easy.
However, there are some things you can do:
Go through the code and make a list of all of the places in the code that write to that variable--particularly the ones that could write a NULL to it. It's likely that one of them is your culprit.
Try to develop some kind of torture test that makes the fault more likely to occur (eg running through simulated or random transactions at top speed). If you can reproduce the crash this way you'll be in a much better situation, as you can then analyze the actual cause of the crash instead of just speculating.
If possible, run the program under valgrind or purify or similar. If they give any warnings, track down what is causing those warnings and fix it; it's possible that your program is, e.g., accessing memory that has been freed, which might seem to work most of the time (if the freed memory hasn't been reused for anything when it is accessed) but would fail occasionally (when something is reusing it).
Add a memory checker like Electric Fence to your code, or just replace free() with a custom version that overwrites the freed memory with random garbage, in the hope that this makes the crash more likely to occur (see the sketch below).
Recompile your program using different compilers (especially new/fancy ones like clang++ with the static analyzer enabled) and fix whatever they warn about. This may point you to your problem.
Run the program under different hardware and OS's; sometimes an obscure problem under one OS gives really obvious symptoms on another.
Review the various machines where the crash is known to have occurred. Do they all have anything in common? What about the machines where it hasn't crashed? Is there something different about them?
Step 2 is really the most important one, because even if you think you have fixed the problem, you won't be able to prove it unless you can reproduce the crash in the old code, and cannot reproduce it with the fixed code. Without being able to reproduce the fault, you're just guessing about whether a particular code change actually helps or not.
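For the free()-poisoning idea in the list above, a minimal sketch might look like this (the wrapper name xfree and the caller-supplied size are assumptions, since the standard free() does not expose the block size):
#include <stdlib.h>
#include <string.h>

/* Overwrite a block with garbage before releasing it, so code that keeps
   using a stale pointer is more likely to fail loudly and immediately. */
void xfree(void *p, size_t size)
{
    if (p != NULL) {
        memset(p, 0xA5, size);  /* poison the memory */
        free(p);
    }
}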
I have a critical bug in my project. When I use gdb to open the .core file, it shows me something like this (I didn't include all of the gdb output, for ease of reading):
This is in the newly written part of the code, which looks very suspicious:
0x00000000004579fe in http_chunk_count_loop
(f=0x82e68dbf0, pl=0x817606e8a <Address 0x817606e8a out of bounds>)
This is in a very mature part of the code that has worked for a long time without problems:
0x000000000045c8a5 in packet_handler_http
(f=0x82e68dbf0, pl=0x817606e8a <Address 0x817606e8a out of bounds>)
OK, now what messes with my mind is the pl=0x817606e8a <Address 0x817606e8a out of bounds> part: gdb shows the pointer was already out of bounds before it reached the newly written code. That makes me think the problem is caused by the function that calls packet_handler_http.
But packet_handler_http is very mature and has worked for a long time without problems, which makes me think I am misunderstanding the gdb output.
My guess is that the problem is with packet_handler_http, but because this code was already working, I am confused. Am I right with my guess, or am I missing something?
To detect "memory errors" you might like to run the program under Valgrind: http://valgrind.org
If the program is compiled with symbols (-g for gcc), Valgrind can quite reliably pinpoint "out of bounds" conditions down to the line of code where the error occurs, along with the line of code that allocated the memory (if any).
The problem is with packet_handler_http I guess
That guess is unlikely to be correct: if packet_handler_http is really receiving an invalid pointer, then the corruption has happened "upstream" from it.
This is very mature part of code which worked for a long time without problem
I routinely find bugs in code that worked "without problem" for 10+ years. Also, the corruption may be happening in newly-added code, but causing problems elsewhere. Heap and stack buffer overflows are often just like that.
As alk already suggested, run your executable under Valgrind or Address Sanitizer (also included in GCC 4.8), and fix any problems they find.
Thanks, guys, for your contributions. Even though gdb said otherwise, it turned out the pointer was good.
There was a part of the new code that caused the out-of-bounds problem.
There was a line like (goodpointer + offset), where the offset was an HTTP chunk size that I was taking from the network (data sniffing). There was a kind of attack in which this offset was extremely big, which caused an integer overflow, and that resulted in the out-of-bounds problem.
My conclusions: never trust parameters that come from the network, and gdb may not always show the parameters correctly in a core dump, because at the moment of the crash things can get messy on the stack.
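To make that concrete, here is a sketch (the names are hypothetical) of validating a length field taken from the wire before using it in pointer arithmetic:
#include <stddef.h>
#include <stdint.h>

/* Advance past one chunk, but only if the claimed size fits inside the
   data we actually captured; otherwise treat the packet as malformed. */
const uint8_t *advance_chunk(const uint8_t *payload, size_t payload_len,
                             uint64_t chunk_size_from_wire)
{
    if (chunk_size_from_wire > payload_len)
        return NULL;                               /* reject oversized or hostile values */
    return payload + (size_t)chunk_size_from_wire; /* now provably in bounds */
}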
My program uses a third-party library that throws a segmentation fault at some point. I tried compiling the library with debug symbols and without compiler optimization, and the crash went away. My suspicion is that compiler optimizations revealed this bug. What are best practices for debugging cases like this?
EDIT - (corrected the statement above: "revealed" instead of "caused")
I think I was misunderstood. I didn't intend to blame the compiler or anything like that. I only asked for best practices for finding a bug in such a situation, where I don't have debug symbols for the 3rd-party library (the crash backtrace leads into the 3rd-party library).
What you describe is quite common, and it's almost never a bug in the compiler optimization. Optimization does a lot of things to your code: variables get reordered, optimized away, etc. If you have a buffer overflow, it might overwrite memory that's no big deal in the debug build but very important in the optimized build.
Use valgrind to track down memory errors - they're almost always the cause of the symptoms you see.
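As a purely hypothetical illustration of that effect (not taken from the poster's code), the off-by-one write below is undefined behavior in every build, but what it actually clobbers depends on how the compiler lays out, registers, or eliminates the surrounding locals:
/* Deliberately buggy: the loop writes tmp[4], one element past the array.
   In a debug build the stray write may land somewhere harmless; in an
   optimized build it may hit something that matters, and the program crashes. */
void fill(int *dst)
{
    int tmp[4];
    int i;
    for (i = 0; i <= 4; i++)   /* off by one: should be i < 4 */
        tmp[i] = i;
    for (i = 0; i < 4; i++)
        dst[i] = tmp[i];
}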
Your suspicion is that optimization caused a bug. My suspicion is that your code has constructs that lead to undefined behavior, and when the optimizer is on, this undefined behavior manifests itself as erroneous behavior or a crash. Don't blame the optimizer; find the UB in your code. That might be tricky, though. Possible culprits:
An out-of-bounds index
Returning the address of a temporary (see the sketch after this list)
A zillion other things
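For instance, a minimal, purely illustrative sketch of the second culprit:
/* Undefined behavior: buf is a local that ceases to exist when the function
   returns, so the caller receives a dangling pointer. It may appear to work
   in a debug build and fail once the optimizer reuses that stack space. */
const char *make_greeting(void)
{
    char buf[16] = "hello";
    return buf;
}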
Compile with debug symbols and compiler optimization; it will "hopefully" fail as well. Allow the system to generate a core file (ulimit -c unlimited, then re-run the program), then load the core file into gdb to see what happened.
Another powerful tool is valgrind. Run your program within valgrind with the option --db-attach=yes; it will stop and launch the debugger as soon as it detects an invalid read or write. Invalid reads/writes are likely to provoke a segfault, and even if they don't, they should be removed anyway.
Keep putting debug statements or message boxes in the places where you think the code is crashing. The crash will occur between two message boxes, and this will help you locate the faulty code, as long as the code isn't changed too much.
Also, comment out blocks of code until the crash stops occurring, then keep commenting them back in until the crash returns. Whatever you last commented back in must be causing the crash, directly or indirectly.
Both of these methods are useful for general debugging, and half your work is already done if you can reliably reproduce the crash.
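For the first method, a small tracing helper saves typing; this is only a sketch (the macro name is arbitrary), and the fflush matters so that the last line printed really is the last line reached before the crash:
#include <stdio.h>

/* Print the source location and flush immediately, so the output is not
   lost in a stdio buffer when the program dies. */
#define TRACE_HERE() do { \
    printf("reached %s:%d\n", __FILE__, __LINE__); \
    fflush(stdout); \
} while (0)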
I did not give specific advice for debugging compiler optimisations because it's highly unlikely the crash is caused by that. The optimisations are generally tested very robustly to ensure they do not change the function or semantics of the code in any way.
If the backtrace leads to the third-party library, use gdb to break before the library call. Verify that the parameters you're passing to the library are valid (i.e., aren't uninitialized pointers, aren't pointers to free'd memory, aren't out of range, etc.)
Can you use strace to trace the function calls and then try to determine the execution path in the third-party library? Use a printf or some other system call before the failing library call so you have a starting point in the strace output.
If you really think it's a bug in the third-party library, you'll have to compile it with optimizations on so you can reproduce the failure. Are you saying that your compiler can only include debug symbols for non-optimized builds? gdb should still work for optimized builds.
Well, going through the compiled binary isn't going to help.
So that leaves going through your code to find the part that is causing the segfault. I would work through the code manually and start commenting things out; once you find what's causing the error, you can decide what to do about it. It might also be worth adding printfs in select locations to see exactly where the program fails.
Think of it as doing a binary search for the error ;)
If it only blows up when you turn on optimization, then that's a strong hint you've invoked undefined behavior somewhere. Unfortunately, that UB may be nowhere near the code that actually generated the segfault (as I've discovered several times in the past).
Every time this has happened to me (which hasn't been that often), the cause was a buffer overflow somewhere else in the code. I never developed a repeatable, generally applicable technique for finding the problem, though (unless you want to call hours of stepping through a debugger and swearing a generally applicable technique).
I've got a buffer overrun I absolutely can't seem to figure out (in C). First of all, it only happens maybe 10% of the time or so. The data it pulls from the DB doesn't seem to be all that different between executions... at least not different enough for me to find any discernible pattern as to when it happens. The exact message from Visual Studio is this:
A buffer overrun has occurred in hub.exe which has corrupted the program's internal state. Press Break to debug the program or Continue to terminate the program.
For more details please see Help topic 'How to debug Buffer Overrun Issues'.
If I debug, I find that it has broken in __report_gsfailure(), which I'm pretty sure comes from the compiler's /GS flag and also signifies that this is an overrun on the stack rather than the heap. I can also see the function it threw this on as it was returning, but I can't see anything in there that would cause this behavior. The function has also existed for a long time (10+ years, albeit with some minor modifications), and as far as I know this has never happened before.
I'd post the code of the function, but it's decently long and references a lot of proprietary functions/variables/etc.
I'm basically just looking for either some idea of what I should be looking for that I haven't or perhaps some tools that may help. Unfortunately, nearly every tool I've found only helps with debugging overruns on the heap, and unless I'm mistaken, this is on the stack. Thanks in advance.
You could try putting some local variables on either end of the buffer, or even sentinels into the (slightly expanded) buffer itself, and trigger a breakpoint if those values aren't what you think they should be. Obviously, using a pattern that is not likely in the data would be a good idea.
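A minimal sketch of that sentinel idea (the values and names are arbitrary); note that the compiler is free to rearrange locals, so this is a heuristic rather than a guarantee that the guards physically bracket the buffer:
#include <assert.h>

void suspect_function(void)
{
    volatile unsigned int guard_before = 0xDEADBEEF;
    char buf[64];
    volatile unsigned int guard_after = 0xDEADBEEF;

    /* ... the code under suspicion fills buf ... */
    buf[0] = '\0';

    /* If either guard changed, something wrote outside the buffer. */
    assert(guard_before == 0xDEADBEEF);
    assert(guard_after == 0xDEADBEEF);
}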
While it won't help you in Windows, Valgrind is by far the best tool for detecting bad memory behavior.
If you are debugging the stack, you need to get down to low-level tools: place a canary in the stack frame (perhaps a buffer filled with something like 0xA5) around any potential suspects. Run the program in a debugger and see which canaries no longer have the right size or the right contents. You will gobble up a large chunk of stack doing this, but it may help you spot exactly what is occurring.
One thing I have done in the past to help narrow down a mystery bug like this was to create a variable with global visibility named checkpoint. Inside the culprit function, I set checkpoint = 0; as the very first line. Then I added ++checkpoint; statements before and after function calls or memory operations that I even remotely suspected might cause an out-of-bounds memory reference (plus peppering the rest of the code so that I had a checkpoint at least every 10 lines or so). When your program crashes, the value of checkpoint narrows down the range you need to focus on to a handful of lines of code. This may be a bit of overkill - I do this sort of thing on embedded systems (where tools like valgrind can't be used) - but it should still be useful.
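A bare-bones sketch of that checkpoint technique (the function and call names are hypothetical); volatile keeps the variable in memory so its last value is still visible in the debugger or crash dump:
static volatile int checkpoint = -1;

void culprit_function(void)
{
    checkpoint = 0;            /* entered the function */
    /* suspicious_call_1(); */
    ++checkpoint;              /* 1: survived the first suspect */
    /* memcpy(...); */
    ++checkpoint;              /* 2: survived the memory operation */
    /* ... keep sprinkling until the crash is bracketed by two values ... */
}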
Wrap it in an exception handler and dump out useful information when it occurs.
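On Windows, one way to do that is a top-level exception filter; this is only a sketch (the filter name is arbitrary), and note that a /GS cookie failure on newer C runtimes may terminate the process before any user-registered handler runs:
#include <windows.h>
#include <stdio.h>

/* Log the exception code and faulting address before the process dies. */
static LONG WINAPI crash_filter(EXCEPTION_POINTERS *info)
{
    fprintf(stderr, "Unhandled exception 0x%08lx at address %p\n",
            (unsigned long)info->ExceptionRecord->ExceptionCode,
            info->ExceptionRecord->ExceptionAddress);
    return EXCEPTION_EXECUTE_HANDLER;  /* let the process terminate */
}

int main(void)
{
    SetUnhandledExceptionFilter(crash_filter);
    /* ... rest of the program ... */
    return 0;
}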
Does this program recurse at all? If so, I'd check there to ensure you don't have an infinite-recursion bug. If you can't spot it manually, you can sometimes catch it in the debugger by pausing frequently and observing the stack.
We are using DevPartner's BoundsChecker for detecting memory-leak issues. It does a wonderful job, though it does not find string overflows like the following:
char szTest[1] = "";
int i;
for (i = 0; i < 100; i++) {
    strcat(szTest, "hi");
}
Question 1: Is there any way I can make BoundsChecker detect this?
Question 2: Is there any other tool that can detect such issues?
I tried it with my DevPartner setup (msvc6.6, DevPartner 7.2.0.372) and I can confirm the behavior you observed: I get an access violation after about 63 passes of the loop.
What does Compuware have to say about the issue?
CppCheck will detect this issue.
One option is to simply ban the use of string functions that don't have information about the destination buffer. A set of macros like the following in a universally included header can be helpful:
#define strcpy strcpy_is_banned_use_strlcpy
#define strcat strcat_is_banned_use_strlcat
#define strncpy strncpy_is_banned_use_strlcpy
#define strncat strncat_is_banned_use_strlcat
#define sprintf sprintf_is_banned_use_snprintf
So any attempted uses of the 'banned' routines will result in a linker error that also tells you what you should use instead. MSVC has done something similar that can be controlled using macros like _CRT_SECURE_NO_DEPRECATE.
The drawback to this technique is that if you have a large set of existing code, it can be a huge chore to get things moved over to using the new, safer routines. It can drive you crazy until you've gotten rid of the functions considered dangerous.
valgrind will detect writes past dynamically allocated data, but I don't think it can do so for automatic arrays like the one in your example. If you are using strcat, strcpy, etc., you have to make sure that the destination is big enough.
Edit: I was right about valgrind, but there is some hope:
Unfortunately, Memcheck doesn't do bounds checking on static or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry.
However, the experimental tool Ptrcheck can detect errors like this. Run Valgrind with the --tool=exp-ptrcheck option to try it, but beware that it is not as robust as Memcheck.
I haven't used Ptrcheck.
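To make the earlier point that the destination must be big enough concrete, here is a sketch of the question's loop with an explicit bound check (the buffer size of 64 is arbitrary); the guard stops the loop before strcat can run past the destination:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char szTest[64] = "";
    int i;
    for (i = 0; i < 100; i++) {
        /* Stop as soon as another "hi" plus the terminator would not fit. */
        if (strlen(szTest) + strlen("hi") + 1 > sizeof szTest)
            break;
        strcat(szTest, "hi");
    }
    printf("%s\n", szTest);
    return 0;
}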
You may find that your compiler can help. For example, in Visual Studio 2008, check the project properties - C/C++ - Code Generation page. There's a "Buffer Security Check" option.
My guess would be that it reserves a bit of extra memory and writes a known sequence in there. If that sequence gets modified, it assumes a buffer overrun. I'm not sure, though - I remember reading this somewhere, but I don't remember for certain if it was about VC++.
Given that you've tagged this C++, why use a pointer to char at all?
std::stringstream test;
std::fill_n(std::ostream_iterator<std::string>(test), 100, "hi");
If you enable the /RTCs compiler switch, it may help catch problems like this. With this switch on, the test caused an access violation when running the strcat only one time.
Another useful utility that helps with problems like this (more heap-oriented than stack but extremely helpful) is application verifier. It is free and can catch a lot of problems related to heap overflow.
An alternative: our Memory Safety Checker.
I think it will handle this case.
The problem was that by default, the API Validation subsystem is not enabled, and the messages you were interested in come from there.
I can't speak for older versions of BoundsChecker, but version 10.5 has no particular problems with this test. It reports the correct results and BoundsChecker itself does not crash. The test application does, however, because this particular test case completely corrupts the call stack that led to the function where the test code was, and as soon as that function terminated, the application did too.
The results: 100 messages about write overrun to a local variable, and 99 messages about the destination string not being null terminated. Technically, that second message is not right, but BoundsChecker only searches for the null termination within the bounds of the destination string itself, and after the first strcat call, it no longer contains a zero byte within its bounds.
Disclaimer: I work for MicroFocus as a developer working on BoundsChecker.