Stack overflow in Windows when running gdb - C

I think I am experiencing a stack overflow issue when running my unit tests through gdb within Emacs on Windows.
I run exactly the same unit tests on Linux with no problems.
I am also aware that I am using a horrendously memory-inefficient, stack-based .ini file parser within these unit tests, so it seems a reasonable possibility that a stack overflow is occurring.
I have noticed a few unit tests failing on Windows that passed on Linux. Further investigation reveals a (stack-based) counter in a for loop resetting itself to zero at random points in the loop's execution, and the (stack-based) values in the arrays that the loop is inspecting changing for the same index value.
I noticed that gdb seems to be allocated its own thread of execution under Windows - is there any way of finding out how much stack space the thread is allocated?

One of the differences between Linux and Windows is that on Windows the stack size has to be fixed when the executable is built (there are two sizes: an initial commit size and a reserved limit). I am not sure what the default is with the compiler you are using, but you might try increasing it via the linker's --stack option (e.g. -Wl,--stack,<size> with MinGW gcc).
On Linux the stack size is dynamic, up to a limit generally set by the sysadmin.
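For the Linux side, a quick way to see which limit you are actually running under is to query RLIMIT_STACK. A minimal sketch, Linux/POSIX only (it does not apply to the Windows build):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* Query the current soft and hard limits for the stack. */
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("stack soft limit: unlimited\n");
    else
        printf("stack soft limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}

On most desktop distributions this reports the same soft limit that ulimit -s shows (commonly around 8 MiB).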

Well then, perhaps Windows has tighter restrictions on the maximum amount of stack per process than Linux does?
This page details how to debug a stack overflow in Windows. It's not based on gdb, but perhaps you can still infer something.

Related

How to know/limit static stack size in C program with GCC/Clang compiler? [duplicate]

I'm writing an embedded program that uses a static, limited stack area of a known size (in other words, I have X bytes for the stack, and there's no underlying OS that can allocate more stack on demand for me). I want to avoid errors at runtime and catch them at build time instead - to have some indication if I mistakenly declared too many variables in some function block that won't fit on the stack at runtime.
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path? Or at least know how much space my variables will take in a single block (function) if the compiler is not smart enough to analyze it on all the nested calls?
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path?
Only if you don't use interrupts, which is extremely unlikely in any embedded system. So you'll have to find out stack use with dynamic analysis.
The old-school way is to set your whole stack area to a value like 0xAA upon reset from a debugger, then let the program run for a while, making sure to provoke all the use-cases. Then halt and inspect how far down you still have 0xAA in memory. It isn't a 100% scientific, fool-proof method, but it works just fine in practice, in the vast majority of cases.
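A rough sketch of that technique in C, assuming a descending stack and hypothetical linker symbols __stack_start__ / __stack_end__ bracketing the stack area (the real names come from your linker script and toolchain):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical symbols marking the stack region: lowest address and
 * one-past-highest address. Adjust to your linker script. */
extern uint8_t __stack_start__[];
extern uint8_t __stack_end__[];

#define STACK_FILL 0xAAu

/* Call as early as possible after reset, before the stack is deep. */
void stack_paint(void)
{
    /* The address of a local is a rough approximation of the current
     * stack pointer; only paint below it so live frames are untouched. */
    uint8_t *sp_approx = (uint8_t *)&sp_approx;

    for (uint8_t *p = __stack_start__; p < sp_approx; ++p)
        *p = STACK_FILL;
}

/* Call after exercising all the use-cases: returns the high-water mark. */
size_t stack_max_usage(void)
{
    uint8_t *p = __stack_start__;

    while (p < __stack_end__ && *p == STACK_FILL)
        ++p;
    return (size_t)(__stack_end__ - p);   /* bytes of stack ever used */
}

The same scan can of course be done by hand from the debugger, as described above; the helper just automates the inspection.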
Other methods involve setting write breakpoints at certain stack locations where you don't expect the program to end up, sort of like a "hardware stack canary". Run the program and ensure that the breakpoint never triggers. If it does, then investigate from there, move the breakpoint further down the memory map to see exactly where.
Another good practice is to always memory map your stack so that it can only overflow into forbidden memory or at least into read-only flash etc - ideally you'd get a hardware exception for stack overflow. You definitely want to avoid the stack overflowing into other RAM sections like .data/.bss, as that will cause severe and extremely subtle error scenarios.

How come the stack cannot be increased during runtime in most operating systems?

Is it to avoid fragmentation? Or some other reason? A set lifetime for a memory allocation is a pretty useful construct, compared to malloc() which has a manual lifetime.
The space used for stack is increased and decreased frequently during program execution, as functions are called and return. The maximum space allowed for the stack is commonly a fixed limit.
In older computers, memory was a very limited resource, and it may still be in small devices. In these cases, the hardware capability may impose a necessary limit on the maximum stack size.
In modern systems with plenty of memory, a great deal may be available for stack, but the maximum allowed is generally set to some lower value. That provides a means of catching “runaway” programs. Generally, controlling how much stack space a program uses is a neglected part of software engineering. Limits have been set based on practical experience, but we could do better.¹
Theoretically, a program could be given a small initial limit and could, if it finds itself working on a “large” problem, tell the operating system it intends to use more. That would still catch most “runaway” programs while allowing well crafted programs to use more space. However, by and large we design programs to use stack for program control (managing function calls and returns, along with a modest amount of space for local data) and other memory for the data being operated on. So stack space is largely a function of program design (which is fixed) rather than problem size. That model has been working well, so we keep using it.
Footnote
¹ For example, compilers could report, for each routine that does not use objects with run-time variable size, the maximum space used by the routine in any path through it. Linkers or other tools could report, for any call tree [hence without loops] the maximum stack space used. Additional tools could help analyze stack use in call graphs with potential recursion.
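A partial version of that footnote idea already exists: GCC can report per-function frame sizes with -fstack-usage. A hedged example of how one might use it (the exact report format depends on the GCC version and target):

/* stack_demo.c - compile with:  gcc -c -fstack-usage stack_demo.c
 * GCC then writes stack_demo.su, one line per function with its frame
 * size, along the lines of:
 *     stack_demo.c:8:6:fill_buffer    272    static
 */
#include <string.h>

void fill_buffer(void)
{
    char buf[256];              /* dominates this function's stack frame */
    memset(buf, 0, sizeof buf);
}

Summing the per-function numbers along the deepest call path (plus interrupt worst cases) is then a job for the linker or other tools, as the footnote suggests.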
How come the stack cannot be increased during runtime in most operating systems?
This is wrong for Linux. On recent Linux systems, each thread has its own call stack (see pthreads(7)), and an application could (with clever tricks) grow some call stacks using mmap(2) and mremap(2), after querying the call stacks through /proc/ (see proc(5), and use /proc/self/maps) as pmap(1) does.
Of course, such code is architecture specific, since in some cases the call stack grows towards increasing addresses and in other cases towards decreasing addresses.
Read also Operating Systems: Three Easy Pieces and the OSDEV wiki, and study the source code of GNU libc.
BTW, Appel's book Compiling with Continuations, his old paper Garbage Collection can be faster than Stack Allocation, and this paper on Compiling with Continuations and LLVM could interest you; all are very relevant to your question: sometimes there is almost "no call stack" and it makes no sense to "increase it".

setting up system for program debugging buffer overflow

I remember reading a long time ago that if I want to test for buffer overflows on my Linux box, I need to set something in the system to allow it to happen. I can't remember exactly what it was, but I was hoping someone knew what I was talking about.
I want to be able to test my programs for vulnerabilities, and see if the registers are overwritten.
EDIT: I am running ubuntu 10.04
One option is to use a memory debugger such as Valgrind. Note, however, that Valgrind only tracks buffer overflows on dynamically allocated memory.
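For instance, a deliberately broken heap write like the sketch below is the kind of thing Valgrind's memcheck flags as an invalid write, even though a plain run may appear to work:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return 1;

    /* Writes one byte past the end of the allocation: memcheck reports
     * an invalid write here; an ordinary run may not crash at all. */
    memset(buf, 'A', 17);

    free(buf);
    return 0;
}

Build with gcc -g and run it under valgrind to see the report; an equivalent overrun of a stack array would not be caught, which is the limitation mentioned above.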
If you have the option to use C++ instead of C, then you can switch to using containers rather than raw arrays, and harness GCC's "checked container" mode (see GCC STL bound checking). I'm sure other compilers offer similar tools.
Another hint (in addition to Oli's answer), when chasing memory bugs with the gdb debugger, is to disable address space layout randomization, e.g. with
echo 0 > /proc/sys/kernel/randomize_va_space
After doing that, two consecutive runs of the same deterministic program will usually mmap regions at the same addresses from one run to the next, which helps a lot when debugging with gdb (because malloc then usually returns the same addresses from one run to another, at the same given point in the run).
You can also use the watch command of gdb. In particular, if in a first run (with ASLR disabled) you figure out that the location 0x123456 is changing unexpectedly, you could give gdb the following command in a second run:
watch * (void**) 0x123456
Then gdb will break when this location changes (sadly, it has to be mmap-ed already).
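As an illustration of the kind of bug a watchpoint catches, here is a small contrived overflow; under gdb you could watch canary (or its address) and stop at the write that clobbers it:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int canary = 0x12345678;    /* the variable to watch in gdb */
    char buf[8];

    /* Deliberate overflow: writes well past buf and, depending on the
     * compiler's stack layout and stack protector, may clobber canary. */
    strcpy(buf, "this string is much longer than eight bytes");

    printf("canary = 0x%x\n", canary);
    return 0;
}

Compile with -g (and perhaps -fno-stack-protector so the overrun is not intercepted first), break on main, run, then issue watch canary and continue; gdb should stop the moment the watched variable changes, if the overrun actually reaches it on your compiler's layout.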

C code on Linux under gdb runs differently if run standalone?

I have built a plain C code on Linux (Fedora) using code-sorcery tool-chain. This is for ARM Cortex-A8 target. This code is running on a Cortex A8 board, running embedded Linux.
When I run this code for a test case that does a dynamic memory allocation (malloc) of some large size (10 MB), it crashes after some time, giving the error message below:
select 1 (init), adj 0, size 61, to kill
select 1030 (syslogd), adj 0, size 64, to kill
select 1032 (klogd), adj 0, size 74, to kill
select 1227 (bash), adj 0, size 378, to kill
select 1254 (ppp), adj 0, size 1069, to kill
select 1255 (TheoraDec_Corte), adj 0, size 1159, to kill
send sigkill to 1255 (TheoraDec_Corte), adj 0, size 1159
Program terminated with signal SIGKILL, Killed.
Then, when I debug this code for the same test case using gdb built for the target, at the point where this dynamic memory allocation happens the code fails to allocate that memory and malloc returns NULL. But during a normal stand-alone run, I believe malloc should be failing to allocate as well, yet it strangely does not seem to return NULL; instead it crashes and the OS kills my process.
Why is this behaviour different when run under gdb and when run without the debugger?
Why would malloc fail yet not return NULL? Is this possible, or is there some other reason for the error message I am getting?
How do I fix this?
thanks,
-AD
So, for this part of the question, there is a surefire answer:
Why would malloc fail yet not return NULL? Is this possible, or is there some other reason for the error message I am getting?
In Linux, by default, the kernel interfaces for allocating memory almost never fail outright. Instead, they set up your page table in such a way that on the first access to the memory you asked for, the CPU will generate a page fault, at which point the kernel handles it and looks for physical memory to back that (virtual) page. So, in an out-of-memory situation, you can ask the kernel for memory and it will "succeed"; the first time you try to touch the memory it returned is when the allocation actually fails, killing your process. (Or perhaps some other unfortunate victim. There are heuristics for that which I'm not incredibly familiar with. See "oom-killer".)
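A hedged way to watch that happen on a typical Linux desktop (be warned that it really can invoke the OOM killer):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Assumes a 64-bit build; pick a size larger than free RAM + swap. */
    size_t huge = (size_t)8 * 1024 * 1024 * 1024;

    char *p = malloc(huge);
    printf("malloc(%zu) %s\n", huge, p ? "succeeded" : "returned NULL");

    if (p != NULL) {
        /* Touching every page forces the kernel to supply real memory;
         * on an overcommitting system this is where the process (or some
         * other victim) gets killed, rather than malloc returning NULL. */
        memset(p, 1, huge);
        free(p);
    }
    return 0;
}

Depending on the overcommit mode and how much RAM plus swap the box has, the malloc call may well "succeed" even though the memset cannot possibly be satisfied.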
Some of your other questions, the answers are less clear for me.
Why is this behaviour different when run under gdb and when run without the debugger?
It could be (just a guess, really) that GDB has its own malloc and is tracking your allocations somehow. On a somewhat related point, I've frequently found that heap bugs in my code aren't reproducible under debuggers. This is frustrating and makes me scratch my head, but it's basically something I've pretty much figured one has to live with...
How do I fix this?
This is a bit of a sledgehammer solution (that is, it changes the behavior for all processes rather than just your own, and it's generally not a good idea to have your program alter global state like that), but you can write the string 2 to /proc/sys/vm/overcommit_memory. See this link that I got from a Google search.
Failing that... I'd just make sure you're not allocating more than you expect to.
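If you do switch to strict overcommit, allocation failures come back as NULL from malloc, so it is then worth funnelling allocations through a checking wrapper; a minimal sketch using a hypothetical xmalloc helper:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: with /proc/sys/vm/overcommit_memory set to 2 (or on
 * systems with strict accounting), malloc really can return NULL, so fail
 * loudly and early instead of waiting for the OOM killer. */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory requesting %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}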
By definition, running under a debugger is different from running standalone. Debuggers can and do hide many bugs. If you compile for debugging you can add a fair amount of code, similar to compiling completely unoptimized (allowing you to single-step or watch variables, for example), whereas compiling for release can remove debugging options and remove code that you needed; there are many optimization traps you can fall into. I don't know from your post who controls the compile options or what they are.
Unless you plan to deliver the product to be run under the debugger, you should do your testing standalone. Ideally do your development without the debugger as well; it saves you from having to do everything twice.
It sounds like a bug in your code. Slowly re-read your code with new eyes, as if you were explaining it to someone, or perhaps actually explain it to someone, line by line. There may be something right there that you cannot see because you have been looking at it the same way for too long. It is amazing how often, and how well, that works.
It could also be a compiler bug. Doing things like printing out the return value, or not, can cause the compiler to generate different code. Adding another variable and saving the result to that variable can kick the compiler into doing something different. Try changing the compiler options: reduce or remove any optimization options, reduce or remove the debugger compile options, etc.
Is this a proven system or are you developing on new hardware? Try running without any of the caches enabled, for example. Working in a debugger but not standalone, if it is not a compiler bug, can be a timing issue: single-stepping flushes the pipeline, mixes the cache up differently, and gives the cache and memory system an eternity to come up with a result that it doesn't have in real time.
In short, there is a very long list of reasons why running under a debugger hides bugs that you cannot find until you test in the final, deliverable-like environment; I have only touched on a few. Having it work in the debugger and not standalone is not unexpected; it is simply how the tools work. It is likely your code, the hardware, or your tools, based on the description you have given so far.
The fastest way to rule out your code or the tools is to disassemble the section and inspect how the passed values and return values are handled. If the return value is optimized out, there is your answer.
Are you compiling for a shared C library or static? Perhaps compile for static...

Can the stack size be changed dynamically - How?

Can the stack size be changed dynamically in C?
If yes, How?
It depends on the OS you are using.
On Unix/Linux you can use POSIX syscall setrlimit() for RLIMIT_STACK resource.
See man setrlimit for details.
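A minimal sketch of that call; it can only raise the soft limit up to the hard limit, and it mainly affects how far the main thread's stack may grow (threads created afterwards typically size their stacks from the limit in force when they are created):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    /* Raise the soft limit to 64 MiB, capped at the hard limit. */
    rlim_t wanted = (rlim_t)64 * 1024 * 1024;
    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;
    rl.rlim_cur = wanted;

    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    printf("stack soft limit is now %llu bytes\n",
           (unsigned long long)rl.rlim_cur);
    return 0;
}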
By dynamically, do you mean changing the stack size while the code is executing? AFAIK, that cannot be done. But you can set the stack size before you run your application. You can do this with the "ulimit -s" command in Linux, which sets the stack size for all processes executed under that shell.
In the case of Windows, the same can be done in VC6 for a project by setting the stack size under Project Properties->link options->output->stack allocations->reserve. I am not aware of the equivalent for VC8, but such options might be available.
In a single-threaded program under Linux, the stack will grow automatically until it crashes into something else in the memory space. This is usually the heap and on 32-bit systems this means that you can typically have several GB of stack.
In a multi-threaded program, this isn't generally possible as another thread's stack will be in the way.
You can control the stack size when creating a new thread, but this is generally a bad idea as it is architecture-dependent how much stack is required by a task.
It's pretty low level stuff and mostly controlled by your C library / threading library. Mess around at your peril :)
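For the per-thread case mentioned above, this is roughly what an explicit stack size looks like with pthreads (link with -pthread); a sketch, not a recommendation:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    /* ... work that is known to need an unusually deep stack ... */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    pthread_attr_init(&attr);

    /* Ask for an 8 MiB stack for this one thread. The "right" number is
     * architecture- and workload-dependent, which is why this knob is
     * usually best left alone. */
    rc = pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);
    if (rc != 0)
        fprintf(stderr, "pthread_attr_setstacksize: error %d\n", rc);

    rc = pthread_create(&tid, &attr, worker, NULL);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: error %d\n", rc);
        return 1;
    }

    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}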
In general, it's something that can't be done robustly because address space needs to be reserved for the stack. If objects were already allocated on the heap with addresses within the new desired stack-range, you'd be in big trouble. On systems with less memory than address space it could be possible, but I doubt you'll see many systems that allow it. C does not require nor support any such mechanisms for it.
No, this is outside the scope of C.
Why do you need to do this? It depends on the OS and isn't something that C itself gets directly involved in (although specific linkers and runtime environments have varying ways of managing the configuration of such things).
What OS have you got, and what are you trying to achieve?
