Crash reporting in C for Linux

Crash reporting in C for Linux - c

Following this question:
Good crash reporting library in c#
Is there any library like CrashRpt.dll that does the same on Linux? That is, generate a failure report including a core dump and any necessary environment and notify the developer about it?
Edit: This seems to be a duplicate of this question

See Getting stack traces on Unix systems, automatically on Stack Overflow.

Compile your code with debug symbols, enter unlimit coredumpsize in your shell and you'll get a coredump in the same folder as the binary. Use gdb/ddd - open the program first and then open the core dump. You can check this out for additional info.

#Ionut
This handles generating the core dump, but it doesn't handle notifying the developer when other users had crashes.

Nathan, under what circumstances in a segment base non-zero? I've never seen that occur in my 5 years of Linux application development.
Thanks.

#Martin
I do architectural validation for x86, so I'm very familiar with the architecture the processor provides, but very unfamiliar with how it's used. That's what I based my comment on. If CR2 can be counted on to give the correct answer, then I stand corrected.

Nathan, I wasn't insisting that you were incorrect; I was just saying that in my (limited) experience with Linux, the segment base is always zero. Maybe that's a good question for me to ask...

Note: there are two interesting registers in an x86 seg-fault crash.
The first, EIP, specifies the code address at which the exception occurred. In RichQ's answer, he uses addr2line to show the source line that corresponds to the crash address. But EIP can be invalid; if you call a function pointer that is null, it can be 0x00000000, and if you corrupt your call stack, the return can pop any random value into EIP.
The second, CR2, specifies the data address which caused the segmentation fault. In RichQ's example, he is setting up i as a null pointer, then accessing it. In this case, CR2 would be 0x00000000. But if you change:
int j = *i
to:
int j = i[2];
Then you are trying to access address 0x00000008, and that's what would be found in CR2.

Related

Stack corruption; what is supposed to happen to a variable stored in a register across function calls?

I've been chasing down a crash that seems to be due to memory corruption. The setting is C, building using llvm for iOS.
The memory corruption is absent in debug mode and with optimization level 0 (-O0). In order to be able to step through the code, I've rebuilt with debug symbols but optimization level 1 (-O1). This is enough to reproduce the crash and it allows me to step through the program.
I've narrowed the problem down to a particular function call. Before it, the value of a certain pointer is correct. After it, the value is corrupted (it ends up equalling 0x02, for whatever that is worth).
Lldb doesn't seem to want to watch variables or memory locations. Switching to gdb, I find that if I try to print the address of the aforementioned variable I encounter the following message: "Address requested for identifier 'x' which is in register $r4".
I understand that, as an optimization, the compiler may decide to keep the value of a variable in a register. True enough, if I print the value of $r4 before and after the function call, I see the correct value before and 0x02 after.
I'm in a bit over my head at this point and not sure how to split this into smaller problems. My questions are therefore these:
assuming that the compiler is storing the value of a variable in a register as an optimization, what is supposed to happen to that register when another function is invoked?
Is there some mechanism whereby the value is stored and restored once the new function returns?
Any recommendations on debugging techniques?
All help and suggestions appreciated. Links to reading material on the subject also quite welcome.
Thanks
EDIT: adding version information
iOS version: 5.1
llvm version: i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2377.00)
Xcode version: 4.3.1
Gdb version: GNU gdb 6.3.50-20050815 (Apple version gdb-1708)
Running on an iPhone 3Gs (crash does not appear in simulator)

Not a full answer, but
assuming that the compiler is storing the value of a variable in a
register as an optimization, what is supposed to happen to that
register when another function is invoked?
The register should probably be pushed to the stack by the callee.
Is there some mechanism whereby the value is stored and restored once
the new function returns?
Depends on the calling conventions, but in general - whoever pushed it to stack is responsible to pop it from the stack
Last thing:
If you meet such case, where "it works" on some optimisation level and doesn't on other, you are very likely to have an undefined behavior. If you can't find it in your code, you can ask it here, giving the actual code.

Try using Valgrind if possible. This looks like a good starting point.
Also, try enabling -fstack-protector for your program.

c runtime error message

this error appeared while creating file using fopen in c programming language
the NTVDM cpu has encountered an illegal instruction CS:0000 IP0075
OP:f0 00 f0 37 05 choos 'close to terminate the operation

This kind of thing typically happens when a program tries to execute data as code. In turn, this typically happens when something tramples the stack and overwrites a return address.
In this case, I would guess that "IP0075" is the instruction pointer, and that the illegal instructions executed were at address 0x0075. My bet is that this address is NOT mapped to the apps executable code.
UPDATE on the possible connection with 'fopen': The OP states that deleting the fopen code makes the problem go away. Unfortunately, this does not prove that the fopen code is the cause of the problem. For example:
The deleted code may include extra local variables, which may mean that the stack trampling is hitting the return address in one case ... and in the other case, some word that is not going to be used.
The deleted code may cause the size of the code segment to change, causing some significant address to point somewhere else.
The problem is almost certainly that your application has done something that has "undefined behavior" per the C standard. Anything can happen, and the chances are that it won't make any sense.
Debugging this kind of problem can be really hard. You should probably start by running "lint" or the equivalent over your code and fixing all of the warnings. Next, you should probably use a good debugger and single step the application to try to find where it is jumping to the bad code/address. Then work back to figure out what caused it to happen.

Assuming that it's really the fopen() call that causes problems (it's hard to say without your source code), have you checked that the 2 character pointers that you pass to the function are actually pointers to a correctly allocated memory?
Maybe they are not properly initialized?

Hmmm.... you did mention NTVDM which sounds like an old 16 bit application that crashed inside an old command window with application compatibility set, somehow. As no code was posted, it could be possible to gauge a guess that its something to do with files (but fopen - how do you know that without showing a hint?) Perhaps there was a particular file that is longer than the conventional 8.3 DOS filename convention and it borked when attempting to read it or that the 16 bit application is running inside a folder that has again, name longer than 8.3?
Hope this helps,
Best regards,
Tom.

Debug core file with no symbols

I have a C application we have deployed to a customers site. It was compiled and runs on HP-UX. The user has reported a crash and we have obtained a core dump. So far, I've been unable to duplicate the crash in house.
As you would suspect, the core file/deployed executable is completely devoid of any sort of symbols. When I load it up in gdb and do a bt, the best I get is this:
(gdb) bt
#0 0xc0199470 in ?? ()
I can do a 'strings core' on the file, but my understanding is that all I get there is all the strings in the executable, so it seems semi-impossible to track down anything there.
I do have a debug version (compiled with -g) of the executable, which is unfortunately a couple of months newer than the released version. If I try to start gdb with that hub, I see this:
warning: exec file is newer than core file.
Core was generated by `program_name'.
Program terminated with signal 11, Segmentation fault.
__dld_list is not valid according to __dld_flags.
#0 0xc0199470 in ?? ()
(gdb) bt
#0 0xc0199470 in ?? ()
While it would be feasible to compile a debug version and deploy it at the customer's site and then wait for another crash, it would be relatively difficult and undesirable for a number of reasons.
I am quite familiar with the code and have a relatively good idea of where in code it is crashing based on the customer's bug report.
Is there ANY way I can glean any more information from this core dump? Via strings or another debugger or anything? Thanks.

This type of response from gdb:
(gdb) bt
#0 0xc0199470 in ?? ()
can also happen in the case that the stack was smashed by a buffer overrun, where the return address was overwritten in memory, so the program counter gets set to a seemingly random area.
This is one of the ways that even a build with a corresponding symbol database can cause a symbol lookup error (or strange looking backtraces). If you still get this after you have the symbol table, your problem is likely that your customer's data is causing some issues with your code.

For the future:
Make sure that you always build with an external symbols database (this is not a debug build -- it's a release build, but you store the symbol table separately)
keep it around for versions you deploy
For this situation:
You know the general area, so to see if you are right, go to the stack trace and find the assembly code -- eyeball it and see if you think it matches your source (this is easier if you have some idea what source generated this assembly). If it looks right, then you have some verification on your hypothesis. You might be able to figure out the values of the local variables by looking at the stack (since you know what you passed in and declared).

Under gdb, "info registers" should give you enough of the execution state at the time of the crash to use with a disassembly of the executable and and relevant shared libraries. I usually use objdump to disassemble, redirect output to a file, then bring up the file in my favorite editor - this is useful for keeping notes as things are figured out. Also gdb's "info target" and "info sharedlib" can be useful for figuring out where shared libraries are loaded.
With register state, stack contents, and disassembly in hand along with a little luck, it should be straightforward (if tedious) to reconstruct the callstack (unless, of course, the stack has been trashed by a buffer overrun or similar catastrophe... might need an Ouija board or crystal ball in that case.)
You might also be able to correlate a a disassembly of the newer version built with -g against the disassembly of the stripped version.

Always use source control (CVS/GIT/Subversion/etc), even for test releases
Tag all releases
Consider (in the future) making a build with debugging (-g) and strip the executable before shipping. NOTE: Don't make two builds with and without -g; they may well not match up, since -g can on occasion cause different code to be generated even at the same optimization level. In super-performance-critical code you can forgo the -g for critical files - most it won't make a difference to.
If you're really stuck, dump the stack and dump relevant parts of the heap to hex and look at it by hand; perhaps taking an instrumented copy and looking for similar "signatures" in the generated code and on the stack. This is real "old-school" debugging... :-)

Do you have the exact source that you used to compile the old version (eg; through a tag in the source tree or something like that)? Maybe you could rebuild using that, and possibly get an insight into where the crash occured?

Try running a "pmap" against the core file (if hp/ux has this tool). This should report the starting addresses of all modules in the core file. With this info, you should be able to take the address of the failure location and figure out what library crashed. Further address comparison between the crash address and the addresses of the known functions in the library ("nm" against the library should get that) may help you determine what function crashed.
Even if you do manage to identify the function at the top of the stack, it isn't very likely that this function is the source of the problem... hopefully it has actually crashed in your code and not, say, the standard C string library. Rebuilding the stack trace is the next-best thing at that point.

There is not much information here. The binary is stripped.But looking at segmentation fault...you should look for places where there is a possibility that you are overwriting a piece of memory.
This is just a suggestion. There can be many problems.
BTW, if you are not able to reproduce in your local machine then the volume of data on customers' might be a problem.

I don't think the core file is supposed to contain symbols. You need to able to build a version of your program that is exactly the same as what you shipped to your customer, but with -g. If you strip your debug executable, it should be identical to the shipped version. Only then can gdb give you anything useful.

How do you put a breakpoint on a memory location in dbx?

A co-worker has a C program that fails in a predictable manner because of some corrupted memory. He'd like to use dbx to monitor the memory location once it's allocated in order to pinpoint the code that causes the corruption.
Is this possible? If so what is the syntax to produce a breakpoint at the moment of corruption?
If not, what would be a good approach to fixing this sort of issue?
(My usual tactic is to look at the source control to see what I've changed lately, since that is usually the cause. But the code in question sounds as if it only ever worked by luck, so that won't work. Also, I've already eliminated myself as the culprit by never having worked with the code. ;-)

Having looked more deeply, it appears the solution on recent versions of dbx is something like:
stop access w <address>, <size>
Since <address> and <size> can be expressions, you can write commands like:
stop access w &p, sizeof(int)
This assumes p is a pointer and we want to monitor the first word it points to.
I've also run across a fine tutorial on tracking and stomping memory bugs. It uses gdb rather than dbx, but the principles should be the same.

On AIX, you want to use stophwp:
(dbx) help stophwp
stophwp <address> <size>
Stop execution when the contents of the specified
memory region change. This is a accomplished in
hardware and may not be available on all models.

I'm no Solaris dev, but you can do this with gdb and hardware breakpoints

Strange code crash problem?

I have a MSVC 6.o workspace, which has all C code.
The code is being run without any optimization switch i.e with option O0, and in debug mode.
This code is obtained from some 3rd party. It executes desirable as it is.
But when I add some printf statements in certain functions for debugging, and then execute the code, it crashes.
I suspect it to be some kind of code/data overflow across a memory-page/memory-segment or something alike. But the code does not have any memory map specifier, or linker command file mentioning the segments/memory map etc.
How do I narrow down the cause, and the fix for this quirky issue?

On Linux, I like valgrind. Here is a Stack Overflow thread for valgrind-like tools on Windows.

You could try to determine where the crash happens, by looking at the stack trace in Visual Studio. You should be able to see what is the sequence of function calls that eventually leads to the crash, and this may give you a hint as to what's wrong.
It is also possible that the printf() alone causes the crash. A possible cause - but not too likely on Windows - is a too-small stack that is being overflown by the call to printf().

Use string.getbuffer while printing cstring objects in printf.
There could be an issue for wide char and normal string.
printf("%s",str.Getbuffer());
str.ReleaseBuffer();
Cheers,
Atul.

In general when trying to deal with a crash, your first port of call should be the debugger.
Used correctly, this will enable you to narrow down your problem to a specific line of code and, hopefully, give you a view of the runtime memory at the moment of the crash. This will allow you to see the immediate cause of the crash.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Crash reporting in C for Linux - c

See Getting stack traces on Unix systems, automatically on Stack Overflow.

Compile your code with debug symbols, enter unlimit coredumpsize in your shell and you'll get a coredump in the same folder as the binary. Use gdb/ddd - open the program first and then open the core dump. You can check this out for additional info.

#Ionut This handles generating the core dump, but it doesn't handle notifying the developer when other users had crashes.

Nathan, under what circumstances in a segment base non-zero? I've never seen that occur in my 5 years of Linux application development. Thanks.

#Martin I do architectural validation for x86, so I'm very familiar with the architecture the processor provides, but very unfamiliar with how it's used. That's what I based my comment on. If CR2 can be counted on to give the correct answer, then I stand corrected.

Nathan, I wasn't insisting that you were incorrect; I was just saying that in my (limited) experience with Linux, the segment base is always zero. Maybe that's a good question for me to ask...

Related

Stack corruption; what is supposed to happen to a variable stored in a register across function calls?

c runtime error message

Debug core file with no symbols

How do you put a breakpoint on a memory location in dbx?

Strange code crash problem?

Categories

Resources