Getting a stack trace from GDB - c

I must have a misunderstanding of the stack, or how functions are called, the backtrace results I'm getting from GDB make no sense to me. I'm trying to find out where things get called in a program so that I can add my component.
The tool draws bounding boxes on videos, what I made is an interpolator. I thought it only made sense to open GDB and put a breakpoint in when a box was being drawn, and run a backtrace. Here's mu output (after running the program from ffmpeg.c main())
#0 draw_glyphs (vidatbox=0x10183d200, picref=0x10141e340, width=720, height=480,
rgbcolor=0x10183d284 "????", yuvcolor=0x10183d278 "뀀?\020???\020???????", x=0, y=0) at
libavfilter/vf_VidAT.c:627
#1 0x000000010001ce4c in draw_text (ctx=0x10120df20, picref=0x10141e340, width=720,
height=480) at libavfilter/vf_VidAT.c:787
Disregarding all the non ascii chars,how are the two functions draw_glyphs and draw_text being called? How come there is nothing else on the stack? When I select Frame #1 and try and go up, it tells me:
Initial frame selected; you cannot go up.
EDIT:
I've looked more, and I'm even more confused then I was when I asked. The function draw_glyphs is not even called inside of the main that I'm running. I've grepped through all the files that this uses to compile and well...it's not called anywhere!
Does this mean that it's a dynamically created function pointer or something? If so, would that make the stack innaccessible like mine is?

If the stack trace is informative but terminates unexpectedly (especially after just one or two entries), then that indicates that gdb was unable to follow the call stack past that point.
Compiler options that interfere with stack unwinding include higher optimisation levels (esp. -O3) and -fomit-frame-pointer, so search your Makefiles and remove those options. The frame pointer is not usually necessary to the execution of code, so using it as a general purpose register will improve performance on register-starved architectures such as x86, but can interfere with debugging.
More recently frame-based stack unwinding is being replaced with unwind tables, but gdb still relies on the frame pointer if unwind tables are not present.

Related

Segmentation fault with ulimit set correctly

I tried to help an OP on this question.
I found out that a code like the one below causes segmentation fault randomly even if the stack is set to 2000 Kbytes.
int main ()
{
int a[510000];
a[509999] = 1;
printf("%d", a[509999]);
return 0;
}
As you can see the array is 510000 x 4 bytes = 2040000 bytes.
The stack is set to 2000 Kbytes (2048000 bytes) using ulimit command:
ulimit -s 2000
ulimit -Ss 2000
Based on those numbers the application has room to store the array, but randomly it return segmentation fault.
Any ideas?
There's a few reasons why you can't do this. There are things that are already using parts of your stack.
main is not the first thing on your stack. There are functions called by the real entry point, dynamic linker, etc. that are before main and they are all probably using some of the stack.
Additionally, there can be things that are usually put on the top of the stack to set up execution. Many systems I know put all the strings in argv and all environment variables on top of the stack (which is why main is not the entry point, there's usually code that runs before main that sets up environment variables and argv for main).
And to top it off a part of the stack can be deliberately wasted to increase the randomness of ASLR if your system does that.
Run you program in the debugger, add a breakpoint at main, look up the value of the stack register and examine the memory above it (remember that most likely your stack grows down unless you're on a weird architecture). I bet you'll find lots of pointers and strings there. I just did this on a linux system and as I suspected all my environment variables were there.
The purpose of resource limits (ulimit) on Unix has never really been to micromanage things down to a byte/microsecond, they are there just to stop your program from going completely crazy and taking down the whole system with it. See them not as red lights and stop signs on a proper road, see them as run-off areas and crash barriers on a racetrack.
If you still wants to access the int location in the array, try to compile the code with out the main..this will not invoke _start
check this discussion enter link description here

Allocating a new call stack

(I think there's a high chance of this question either being a duplicate or otherwise answered here already, but searching for the answer is hard thanks to interference from "stack allocation" and related terms.)
I have a toy compiler I've been working on for a scripting language. In order to be able to pause the execution of a script while it's in progress and return to the host program, it has its own stack: a simple block of memory with a "stack pointer" variable that gets incremented using the normal C code operations for that sort of thing and so on and so forth. Not interesting so far.
At the moment I compile to C. But I'm interested in investigating compiling to machine code as well - while keeping the secondary stack and the ability to return to the host program at predefined control points.
So... I figure it's not likely to be a problem to use the conventional stack registers within my own code, I assume what happens to registers there is my own business as long as everything is restored when it's done (do correct me if I'm wrong on this point). But... if I want the script code to call out to some other library code, is it safe to leave the program using this "virtual stack", or is it essential that it be given back the original stack for this purpose?
Answers like this one and this one indicate that the stack isn't a conventional block of memory, but that it relies on special, system specific behaviour to do with page faults and whatnot.
So:
is it safe to move the stack pointers into some other area of memory? Stack memory isn't "special"? I figure threading libraries must do something like this, as they create more stacks...
assuming any area of memory is safe to manipulate using the stack registers and instructions, I can think of no reason why it would be a problem to call any functions with a known call depth (i.e. no recursion, no function pointers) as long as that amount is available on the virtual stack. Right?
stack overflow is obviously a problem in normal code anyway, but would there be any extra-disastrous consequences to an overflow in such a system?
This is obviously not actually necessary, since simply returning the pointers to the real stack would be perfectly serviceable, or for that matter not abusing them in the first place and just putting up with fewer registers, and I probably shouldn't try to do it at all (not least due to being obviously out of my depth). But I'm still curious either way. Want to know how these sorts of things work.
EDIT: Sorry of course, should have said. I'm working on x86 (32-bit for my own machine), Windows and Ubuntu. Nothing exotic.
All of these answer are based on "common processor architectures", and since it involves generating assembler code, it has to be "target specific" - if you decide to do this on processor X, which has some weird handling of stack, below is obviously not worth the screensurface it's written on [substitute for paper]. For x86 in general, the below holds unless otherwise stated.
is it safe to move the stack pointers into some other area of memory?
Stack memory isn't "special"? I figure threading libraries
must do something like this, as they create more stacks...
The memory as such is not special. This does however assume that it's not on an x86 architecture where the stack segment is used to limit the stack usage. Whilst that is possible, it's rather rare to see in an implementation. I know that some years ago Nokia had a special operating system using segments in 32-bit mode. As far as I can think of right now, that's the only one I've got any contact with that uses the stack segment for as x86-segmentation mode describes.
Assuming any area of memory is safe to manipulate using the stack
registers and instructions, I can think of no reason why it would be a
problem to call any functions with a known call depth (i.e. no
recursion, no function pointers) as long as that amount is available
on the virtual stack. Right?
Correct. Just as long as you don't expect to be able to get back to some other function without switching back to the original stack. Limited level of recursion would also be acceptable, as long as the stack is deep enough [there are certain types of problems that are definitely hard to solve without recursion - binary tree search for example].
stack overflow is obviously a problem in normal code anyway,
but would there be any extra-disastrous consequences to an overflow in
such a system?
Indeed, it would be a tough bug to crack if you are a little unlucky.
I would suggest that you use a call to VirtualProtect() (Windows) or mprotect() (Linux etc) to mark the "end of the stack" as unreadable and unwriteable so that if your code accidentally walks off the stack, it crashes properly rather than some other more subtle undefined behaviour [because it's not guaranteed that the memory just below (lower address) is unavailable, so you could overwrite some other useful things if it does go off the stack, and that would cause some very hard to debug bugs].
Adding a bit of code that occassionally checks the stack depth (you know where your stack starts and ends, so it shouldn't be hard to check if a particular stack value is "outside the range" [if you give yourself some "extra buffer space" between the top of the stack and the "we're dead" zone that you protected - a "crumble zone" as they would call it if it was a car in a crash]. You can also fill the entire stack with a recognisable pattern, and check how much of that is "untouched".
Typically, on x86, you can use the existing stack without any problems so long as:
you don't overflow it
you don't increment the stack pointer register (with pop or add esp, positive_value / sub esp, negative_value) beyond what your code starts with (if you do, interrupts or asynchronous callbacks (signals) or any other activity using the stack will trash its contents)
you don't cause any CPU exception (if you do, the exception handling code might not be able to unwind the stack to the nearest point where the exception can be handled)
The same applies to using a different block of memory for a temporary stack and pointing esp to its end.
The problem with exception handling and stack unwinding has to do with the fact that your compiled C and C++ code contains some exception-handling-related data structures like the ranges of eip with the links to their respective exception handlers (this tells where the closest exception handler is for every piece of code) and there's also some information related to identification of the calling function (i.e. where the return address is on the stack, etc), so you can bubble up exceptions. If you just plug in raw machine code into this "framework", you won't properly extend these exception-handling data structures to cover it, and if things go wrong, they'll likely go very wrong (the entire process may crash or become damaged, despite you having exception handlers around the generated code).
So, yeah, if you're careful, you can play with stacks.
You can use any region you like for the processor's stack (modulo the memory protections).
Essentially, you simply load the ESP register ("MOV ESP, ...") with a pointer to the new area, however you managed to allocate it.
You have to have enough for your program, and whatever it might call (e.g., a Windows OS API), and whatever funny behaviours the OS has. You might be able to figure out how much space your code needs; a good compiler can easily do that. Figuring how much is needed by Windows is harder; you can always allocate "way too much" which is what Windows programs tend to do.
If you decide to manage this space tightly, you'll probably have to switch stacks to call Windows functions. That won't be enough; you'll likely get burned by various Windows surprises. I describe one of them here Windows: avoid pushing full x86 context on stack. I have mediocre solutions, but not good solutions for this.

Stack corruption; what is supposed to happen to a variable stored in a register across function calls?

I've been chasing down a crash that seems to be due to memory corruption. The setting is C, building using llvm for iOS.
The memory corruption is absent in debug mode and with optimization level 0 (-O0). In order to be able to step through the code, I've rebuilt with debug symbols but optimization level 1 (-O1). This is enough to reproduce the crash and it allows me to step through the program.
I've narrowed the problem down to a particular function call. Before it, the value of a certain pointer is correct. After it, the value is corrupted (it ends up equalling 0x02, for whatever that is worth).
Lldb doesn't seem to want to watch variables or memory locations. Switching to gdb, I find that if I try to print the address of the aforementioned variable I encounter the following message: "Address requested for identifier 'x' which is in register $r4".
I understand that, as an optimization, the compiler may decide to keep the value of a variable in a register. True enough, if I print the value of $r4 before and after the function call, I see the correct value before and 0x02 after.
I'm in a bit over my head at this point and not sure how to split this into smaller problems. My questions are therefore these:
assuming that the compiler is storing the value of a variable in a register as an optimization, what is supposed to happen to that register when another function is invoked?
Is there some mechanism whereby the value is stored and restored once the new function returns?
Any recommendations on debugging techniques?
All help and suggestions appreciated. Links to reading material on the subject also quite welcome.
Thanks
EDIT: adding version information
iOS version: 5.1
llvm version: i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2377.00)
Xcode version: 4.3.1
Gdb version: GNU gdb 6.3.50-20050815 (Apple version gdb-1708)
Running on an iPhone 3Gs (crash does not appear in simulator)
Not a full answer, but
assuming that the compiler is storing the value of a variable in a
register as an optimization, what is supposed to happen to that
register when another function is invoked?
The register should probably be pushed to the stack by the callee.
Is there some mechanism whereby the value is stored and restored once
the new function returns?
Depends on the calling conventions, but in general - whoever pushed it to stack is responsible to pop it from the stack
Last thing:
If you meet such case, where "it works" on some optimisation level and doesn't on other, you are very likely to have an undefined behavior. If you can't find it in your code, you can ask it here, giving the actual code.
Try using Valgrind if possible. This looks like a good starting point.
Also, try enabling -fstack-protector for your program.

How to debug a stackoverflow problem on targets

I want to know how do we proceed to debug a STACKOVERFLOW issue on targets .
I mean what are the steps we should follow to reach a conclusion.
Put a memory write watchpoint for one word past the end of your stack space. Then the debugger will break in when that spot gets written to, and you can see what's at fault.
All stacks can be filled at start up with certain hex value (for example 0xAAAAAAAA). And then using special routine you can monitor all stack's maximum usage periodically by calculating the quantity of known values (0xAA..) from end of stack until finds the first difference.
Run it through a debugger such as gdb. The backtrace at the time of the stack overflow will tell you exactly which function or functions are repeating indefinitely. From there, figure out which input(s) to those functions are not changing, and not moving the function (if it's recursive) towards a base-case that will end the recursion.

Debug core file with no symbols

I have a C application we have deployed to a customers site. It was compiled and runs on HP-UX. The user has reported a crash and we have obtained a core dump. So far, I've been unable to duplicate the crash in house.
As you would suspect, the core file/deployed executable is completely devoid of any sort of symbols. When I load it up in gdb and do a bt, the best I get is this:
(gdb) bt
#0 0xc0199470 in ?? ()
I can do a 'strings core' on the file, but my understanding is that all I get there is all the strings in the executable, so it seems semi-impossible to track down anything there.
I do have a debug version (compiled with -g) of the executable, which is unfortunately a couple of months newer than the released version. If I try to start gdb with that hub, I see this:
warning: exec file is newer than core file.
Core was generated by `program_name'.
Program terminated with signal 11, Segmentation fault.
__dld_list is not valid according to __dld_flags.
#0 0xc0199470 in ?? ()
(gdb) bt
#0 0xc0199470 in ?? ()
While it would be feasible to compile a debug version and deploy it at the customer's site and then wait for another crash, it would be relatively difficult and undesirable for a number of reasons.
I am quite familiar with the code and have a relatively good idea of where in code it is crashing based on the customer's bug report.
Is there ANY way I can glean any more information from this core dump? Via strings or another debugger or anything? Thanks.
This type of response from gdb:
(gdb) bt
#0 0xc0199470 in ?? ()
can also happen in the case that the stack was smashed by a buffer overrun, where the return address was overwritten in memory, so the program counter gets set to a seemingly random area.
This is one of the ways that even a build with a corresponding symbol database can cause a symbol lookup error (or strange looking backtraces). If you still get this after you have the symbol table, your problem is likely that your customer's data is causing some issues with your code.
For the future:
Make sure that you always build with an external symbols database (this is not a debug build -- it's a release build, but you store the symbol table separately)
keep it around for versions you deploy
For this situation:
You know the general area, so to see if you are right, go to the stack trace and find the assembly code -- eyeball it and see if you think it matches your source (this is easier if you have some idea what source generated this assembly). If it looks right, then you have some verification on your hypothesis. You might be able to figure out the values of the local variables by looking at the stack (since you know what you passed in and declared).
Under gdb, "info registers" should give you enough of the execution state at the time of the crash to use with a disassembly of the executable and and relevant shared libraries. I usually use objdump to disassemble, redirect output to a file, then bring up the file in my favorite editor - this is useful for keeping notes as things are figured out. Also gdb's "info target" and "info sharedlib" can be useful for figuring out where shared libraries are loaded.
With register state, stack contents, and disassembly in hand along with a little luck, it should be straightforward (if tedious) to reconstruct the callstack (unless, of course, the stack has been trashed by a buffer overrun or similar catastrophe... might need an Ouija board or crystal ball in that case.)
You might also be able to correlate a a disassembly of the newer version built with -g against the disassembly of the stripped version.
Always use source control (CVS/GIT/Subversion/etc), even for test releases
Tag all releases
Consider (in the future) making a build with debugging (-g) and strip the executable before shipping. NOTE: Don't make two builds with and without -g; they may well not match up, since -g can on occasion cause different code to be generated even at the same optimization level. In super-performance-critical code you can forgo the -g for critical files - most it won't make a difference to.
If you're really stuck, dump the stack and dump relevant parts of the heap to hex and look at it by hand; perhaps taking an instrumented copy and looking for similar "signatures" in the generated code and on the stack. This is real "old-school" debugging... :-)
Do you have the exact source that you used to compile the old version (eg; through a tag in the source tree or something like that)? Maybe you could rebuild using that, and possibly get an insight into where the crash occured?
Try running a "pmap" against the core file (if hp/ux has this tool). This should report the starting addresses of all modules in the core file. With this info, you should be able to take the address of the failure location and figure out what library crashed. Further address comparison between the crash address and the addresses of the known functions in the library ("nm" against the library should get that) may help you determine what function crashed.
Even if you do manage to identify the function at the top of the stack, it isn't very likely that this function is the source of the problem... hopefully it has actually crashed in your code and not, say, the standard C string library. Rebuilding the stack trace is the next-best thing at that point.
There is not much information here. The binary is stripped.But looking at segmentation fault...you should look for places where there is a possibility that you are overwriting a piece of memory.
This is just a suggestion. There can be many problems.
BTW, if you are not able to reproduce in your local machine then the volume of data on customers' might be a problem.
I don't think the core file is supposed to contain symbols. You need to able to build a version of your program that is exactly the same as what you shipped to your customer, but with -g. If you strip your debug executable, it should be identical to the shipped version. Only then can gdb give you anything useful.

Resources