Tools/techniques for diagnosing C app crash on Windows

I have written an application in C, which runs as a Windows service. Most users can run the app without any problems, but a significant minority experience crashes caused by an Access Violation, so I know I have a bug somewhere. I have tried setting up virtual machines to mirror the users' configurations as closely as possible, but cannot reproduce the issue.
My background is in Java - when a Java app crashes it will produce a stack trace showing exactly where the problem occurred, but native applications aren't so helpful. What techniques are normally used by C developers for tracking down this type of problem? I have no physical access to the users' machines that experience the crash, but I could send them additional tools to install, to capture information. I also have Windows error reports showing Exception Code/Offset etc., but these don't mean much to me. I have compiled my application using gcc - are there some compiler options I can use to generate more information in the event of a crash?

You could try asking the users to run ProcDump to capture a crash dump when the program crashes. Unlike something like Visual Studio, it's a single, simple command-line utility, so there should be no problem getting the users to run it.

On most modern operating systems your app can install a crash handler that'll walk the stack(s) in the event of a crash. I have no experience doing this on Windows, but this article walks through how to do it.

Related

VOLTTRON on OS X?

Has anyone got VOLTTRON running on OS X? I'm trying to assess the effort required to make this happen.
It seems that inotify would need to be replaced with something based on FSEvents. Use of inotify appears to be limited to the volttron.platform.utils.watch_file method so it shouldn't be too difficult.
VOLTTRON does start up without error if I comment out the inotify reference but whatever is dependent on watch_file is certainly not going to work. Are there other libraries or behaviors that would be different or unavailable on OS X? I'm not concerned about hardware/driver interfaces. I don't intend to deploy on OS X but it would be nice to be able to develop on it.
Up until about two years ago we had a developer who was working in OSX and would kindly point out whenever we broke something in his environment.
We haven't really tried it since.
The two places where we watch files are for authorization changes at run time. Those features will still work; they just won't update state at runtime.
I don't know of any other libraries we use that would stop you from working in OSX.

How does a desktop environment developer test his code?

I can't figure out how a desktop environment developer tests his code. Usually, a C or C++ programmer compiles his code and then runs it (I'm not one of those programmers, I'm a web one).
So, you usually build your GUI application on top of some kind of desktop environment (Windows, Mac OS X, GNOME, KDE, Xfce...), so how do they build and test the desktop GUI itself?
And if this isn't a silly question, how does a kernel programmer test his code? For example, the Linux kernel? How do you know that what you just wrote works?
Testing is a very broad term; there are many types (partial list):
unit tests - test small pieces of code. test that the code behaves as expected.
system tests - test whole application in real world scenarios.
performance tests - test what is the performance of the application or part of it.
GUI testing - test operation of GUI elements (not so common as automated tests)
static analysis - compiler warnings on steroids
dynamic analysis - at a minimum memory checks - check mem allocations and usage
coverage tests - check that all code is executed.
formal verification tests (very advanced) - e.g. check when assertions/assumptions are broken.
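
To make the first item concrete, a unit test in C can be as small as the sketch below. The function clamp_int and the test test_clamp_int are hypothetical names invented for this example, and plain assert() stands in for a real test framework.

```c
#include <assert.h>

/* Hypothetical function under test: restrict value to [lo, hi]. */
int clamp_int(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Unit test: exercises the normal case and both boundaries.
 * assert() aborts the program if an expectation does not hold. */
void test_clamp_int(void)
{
    assert(clamp_int(5, 0, 10) == 5);    /* inside the range */
    assert(clamp_int(-3, 0, 10) == 0);   /* below the range */
    assert(clamp_int(42, 0, 10) == 10);  /* above the range */
    assert(clamp_int(0, 0, 10) == 0);    /* exactly on the lower bound */
    assert(clamp_int(10, 0, 10) == 10);  /* exactly on the upper bound */
}
```

A test runner would simply call test_clamp_int() from main() and report success if the program exits normally.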
Kernel code can be debugged by connecting from a second computer (the host). Virtual machines use the same principle and simplify the setup, but they can't always work because the hardware might not exist in the guest VM.
The kernel (all OSes) has trace mechanism(s) for printing progress/problems. In Linux the simple trace is shown via the dmesg command (prints a cyclic buffer).
User mode code can easily be stopped and debugged via a debugger.
Desktop Environments
Testing desktop environments in real-world scenarios can be tedious, so the developer has to watch out for every small error; otherwise developing the DE quickly becomes painful.
As stated by @egur, there are multiple ways of testing the code. The easiest and most important one (though it cannot be used in every case) is to test the code in a simplified program.
A Desktop Environment consists of many parts, however, in your case, I suppose you're talking about the session manager (or window manager) which is responsible for almost everything. So, if he were to test that, he would simply exit his current DE and use the new executable. In case of some error, he can always keep a backup of the old executable or fix the faulty code using some commandline text editor (like vim, or nano).
Kernel
It's quite hard to test. Some kernel developers just write the code, make sure it compiles and looks correct, and then let users test it (by ACK'ing the code, etc.) before it is submitted into the kernel. The reasoning behind that is that the developer may not have the hardware needed to test the code.
Nowadays you can also compile and run the kernel in user mode (User-Mode Linux, UML), so some developers may go for that. Others may want to test on real hardware themselves (they of course back up the current kernel in case of a screw-up).
The way to test a desktop application depends on how you can control the application unattended or remotely.
The Cross Platform GUI Test Automation tool project (I don't know if this project has a website) helps you choose the interfaces/libraries required to solve the problem.
On Linux, LDTP [1] uses the accessibility libraries to control the application. There is Cobra [2] for Windows and PyATOM [3] for Mac OS, but I don't know what kind of technology is used on those platforms.
[1] http://ldtp.freedesktop.org/wiki/
[2] https://github.com/ldtp/cobra
[3] https://github.com/pyatom/pyatom

Dump call stack on error?

I'm debugging a program written in plain C (no C++, MFC, .NET, etc.) to the WIN32API. It must compile in both VS2005 (to run under Win 2K/XP) and VS2010 (to run under Win7.) I've been unable to duplicate a bug that my customer seems able to duplicate fairly reliably, so I'm looking for ways to have my program "debug itself" as-it-were. It is monitoring all of the key values that are changing, but what I'd really like to see is a stack dump when a value changes. Oh, I cannot run a "true" debug build (using the debug libraries) without installing the compiler on the customer's machine and that is not an option, so this must be built into my release build.
Is there any way to do this other than just adding my own function entry/exit calls to my own stack monitor? I'd especially like to be able to set a hardware breakpoint when a specific memory address changes unexpectedly (so I'd need to be able to disable/enable it around the few EXPECTED change locations.) Is this possible? In a Windows program?
I'd prefer something that doesn't require changing several thousand lines of code, if possible. And yes, I'm very underprivileged when it comes to development tools -- I consider myself lucky to have a pro version of the Visual Studio IDEs.
--edit--
In addition to the excellent answers provided below, I've found some info about using hardware breakpoints in your own code at http://www.codereversing.com/blog/?p=76. I think it was written with the idea of hacking other programs, but it looks like it might work fine for my needs, allowing me to create a mini dump when an unexpected location writes to a variable. That would be cool and really useful, especially if I can generalize it. Thanks for the answers, now I'm off to see what I can create using all this new information!
You can use the MiniDumpWriteDump function, which creates a dump that can be used for post-mortem debugging. If the application crashes, you can call MiniDumpWriteDump from an unhandled exception handler installed with SetUnhandledExceptionFilter. If the bug you are talking about is not a crash, you can call MiniDumpWriteDump from anywhere in the program when some unexpected situation is detected. More about crash dumps and post-mortem debugging here: http://www.codeproject.com/Articles/1934/Post-Mortem-Debugging-Your-Application-with-Minidu
The main idea of this technique is that minidump files produced at a client site are sent to the developer, who can then debug them: thread, stack and variable information is available (with the obvious restrictions caused by code optimizations).
There are a bunch of Win32 functions in dbghelp.dll that can be used to produce a stack trace for a given thread: for an example of this see this code.
You can also look up the StackWalk64() and related functions on MSDN.
To get useful information out, you should turn on PDB file generation in the compiler for your release build: if you set up your installer so that on the customer's computer the PDB files are in the same place as the DLL, then you can get an intelligible stack trace out with function names, etc. Without that, you'll just get DLL names and hex addresses for functions.
I'm not sure how practical it would be to set up hardware breakpoints: you could write some sort of debugger that uses the Win32 debugging API, but that's probably more trouble than it's worth.
If you can add limited instrumentation to raise an identifiable exception when the symptom recurs, you can use Process Dumper to generate a full process dump on any instance of that exception.
I find I cite this tool very frequently, it's a real godsend for hard-to-debug production problems but seems little-known.

Segmentation Fault in a multi-threaded server

I have been developing a multi-threaded server (using Pthreads) for a network for about 2 months now, under Linux (Ubuntu 11.04 64-bit kernel 2.6.38).
The code is about 7000 lines of C at the moment. I have been using it in the network where multiple clients connect to it and get served. It has been running quite smoothly.
Suddenly I am facing a bit of strange problem. Every now and then (about 1 out of 10 times) the server crashes due to segmentation fault. I have looked all over the code but can not seem to find the actual reason behind this. Can anyone guide me on this as to what may be going wrong here or what things I should try to find the actual bug?
Enable core file generation. When the application crashes, load the core file into the debugger.
Run your application under valgrind with the memory checker.
Write unit tests. Lots of them, and increase coverage to 100%.
Stress test your application under valgrind's Helgrind tool, which is meant for checking multithreaded applications.
100% coverage isn't realistic, but 85%-95% can reasonably happen with diligence.
About why weird errors happen:
http://stromberg.dnsalias.org/~strombrg/checking-early.html
You said this started happening suddenly. Hopefully you've been using a source code control system like Mercurial or Git or SVN. If you have (or perhaps you have nightly backups?), you probably should look back at the changes made at about the time the problems started, trying to find the error, which is likely an undefined memory reference.

How to track down exceptional bugs in application when released?

When an application hits a serious segmentation fault that is hard to find or track, I can use a debug build, generate a core dump file when the issue happens, and debug the app with that core dump.
But how do I track down such bugs in a released application? There seems to be no core dump file in the release version. Logging is an option, but it is useless for hard-to-track bugs.
So my question is: how do I track down those hard-to-track bugs in a release version? Are there any suggestions or technologies available?
Following reference may help the discussion.
[1] Core dump in Linux
[2] generate a core dump in linux
[3] Solaris Core dump analysis
You can compile a release version with gcc -g -O2 ...
The lack of core dump is related to your user's setting of resource limits (unless the application is explicitly calling setrlimit or is setuid; then you should offer a way to avoid that call). You might teach your users how to get core dumps (with the appropriate bash ulimit builtin).
(and there is some obscure way to put the debugging information outside of the executable)
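
If you would rather not depend on the user's ulimit settings, the application can try to raise its own core-file limit at startup. The following is only a sketch under that assumption: enable_core_dumps is a made-up helper name, and it can raise the soft limit only as far as the hard limit allows. Where (and whether) the core file actually gets written still depends on system configuration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a crash is allowed to leave a core file.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to, but not
     * beyond, the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling enable_core_dumps() at the top of main() has roughly the effect of the user running "ulimit -c" with the hard limit as the value before starting the program. If the hard limit itself is 0, no core file will be produced either way.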
The distributions provide -dbg packages that provide debugging symbols for programs. They are built along with the binary packages and can provide your users the ability to generate meaningful backtraces from core dumps. If you build your packages using the same utilities, you can get these -dbg packages for your own software "nearly free".
I suggest using a crash reporting system. In my experience, we use Google's Breakpad project for our Windows client program; of course, you can also write your own.
Google Breakpad is an open-source, multi-platform crash reporting system. It can make a mini or full memory dump when an exception or crash happens, and you can then configure it to upload the dump file and any additional files to a specific FTP or HTTP server, which is very helpful for finding bugs.
Here is the link:
Google Breakpad
Ask the "customer" for a description of what he or she did to make it crash, and try to replicate it yourself with your own version that has debug information.
The hard part is getting correct information from the customer. Often they will say they did nothing special or nothing different than before. If possible, go see the person having the problem, and ask them to do what they do to make the program crash, writing down every step.

Resources