I'm writing firmware for an Atmel XMEGA microcontroller in C and I think I have filled up the 4 KB of SRAM. As far as I know I only have static/global data and local stack variables (I don't use malloc in my code).
I use a local variable to buffer some pixel data. If I increase the buffer to 51 bytes my display shows strange results; a buffer of 6 bytes works fine. This is why I think my RAM is full and the stack is overwriting something.
Freeing up memory is not my problem, because I can just move some static data into flash and only load it when it's needed. What bothers me is that I might never have discovered that the memory was full.
Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory fills up, instead of letting it overwrite other data?
It can be very difficult to predict exactly how much stack you'll need (some toolchains can have a go at this if you turn on the right options, but it's only ever a rough guide).
A common way of checking on the state of the stack is to fill it completely with a known value at startup, run the code as hard/long as you can, and then see how much had not been overwritten.
The startup code for your toolchain might even have an option to fill the stack for you.
Unfortunately, although the concept is very simple (fill the stack with a known value, then count how many of those values remain), implementing it can require quite a deep understanding of the way your specific tools (particularly the startup code and the linker) work.
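The fill-and-count idea can be sketched in portable C. This is only a minimal sketch under stated assumptions: the stack grows downwards, the region bounds are passed in as plain pointers, and the names stack_paint, stack_unused_bytes and STACK_FILL are made up for illustration. On a real target the bounds would come from your linker script, and the painting would have to happen in startup code before the stack is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xAAu  /* arbitrary marker value (assumption) */

/* Fill the whole region [low, low + size) with the marker. */
void stack_paint(volatile uint8_t *low, size_t size)
{
    assert(low != NULL);
    for (size_t i = 0; i < size; i++) {
        low[i] = STACK_FILL;
    }
}

/* Count the marker bytes still intact, scanning up from the low end.
 * With a downward-growing stack this is the space that was never used. */
size_t stack_unused_bytes(const volatile uint8_t *low, size_t size)
{
    assert(low != NULL);
    size_t n = 0;
    while (n < size && low[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

The count is only an estimate: stack data that happens to equal the marker makes the unused area look larger than it is, and a code path that was never exercised is not measured at all.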
Crude ways to check if stack overflow is what's causing your problem are to make all your local arrays 'static' and/or to hugely increase the size of the stack and then see if things work better. These can both be difficult to do on small embedded systems.
"Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory got filled up instead of letting it overwrite some other data?"
I suppose currently you have a memory mapping like (1).
When the stack and/or the variable space grow too much, they collide and overwrite each other (*).
Another possibility is a memory mapping like (2).
When the stack or the variable space exceeds its maximum size, it hits the unmapped address space (*).
Depending on the controller (I am not sure about the AVR family) this causes a reset, a trap or similar, which is what you want.
[not mapped addr space][ RAM mapped addr space ][not mapped addr space]
(1) [variables ---> * <--- stack]
(2) *[ <--- stack variables ---> ]*
(arrows indicate growing direction if more variable/stack is used)
Of course it is better to make sure beforehand that RAM is big enough.
Typically the linker is responsible for allocating the memory for code, constants, static data, stacks and heaps. Often you must specify required stack sizes (and available memory) to the linker which will then flag an error if it can't fit everything in.
Note also that if you're dealing with a multithreaded application, then each thread has its own stack and these are frequently allocated off the heap as the thread starts.
Unless your processor has some hardware checking for stack overflow on it (unlikely), there are a couple of tricks you can use to monitor the stack usage.
Fill the stack with a known marker pattern, and examine the stack memory (as allocated by the linker) to determine how much of the marker remains uncorrupted.
In a timer interrupt (or similar) compare the main thread stack pointer with the base of the stack to check for overflow
Both of these approaches are useful in debugging, but they are not guaranteed to catch all problems and will typically only flag a problem AFTER the stack has already corrupted something else...
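Both tricks can be sketched in a few lines of C. This is a hedged illustration, not code for any particular toolchain: guard_init, guard_intact, sp_has_margin and GUARD_PATTERN are hypothetical names, the stack is assumed to grow downwards, and how you obtain the stack pointer and the stack limit depends entirely on your compiler and linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_PATTERN 0xDEADBEEFu  /* arbitrary marker (assumption) */

/* Write the marker into the word at the far end of the stack region. */
void guard_init(volatile uint32_t *guard)
{
    assert(guard != NULL);
    *guard = GUARD_PATTERN;
}

/* Call periodically, e.g. from a timer interrupt.
 * Returns false once the marker has been overwritten. */
bool guard_intact(const volatile uint32_t *guard)
{
    assert(guard != NULL);
    return *guard == GUARD_PATTERN;
}

/* Alternative: compare a sampled stack pointer with the stack limit.
 * Returns true while at least min_margin bytes of headroom remain. */
bool sp_has_margin(uintptr_t sp, uintptr_t stack_limit, uintptr_t min_margin)
{
    return sp >= stack_limit && (sp - stack_limit) >= min_margin;
}
```

A failed check would typically trigger a reset or a log entry. As noted above, either check only fires after the damage may already have been done.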
Usually your programming tool knows the parameters of the controller, so you should be warned if your static data does not fit (without malloc, the static footprint is known at build time; the stack depth is not).
But you should be careful with pixel data, because most displays don't have a linear address space.
EDIT: usually you can specify the stack size manually. Leave just enough memory for static variables, and reserve the rest for stack.
I'm writing an embedded program that uses a static, limited stack area of a known size (in other words, I have X bytes for the stack, and there's no underlying OS that can allocate more stack on demand for me). I want to avoid errors at run time and catch them at build time instead, so that I get some indication if I mistakenly declare too many variables in some function block to fit on the stack at run time.
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path? Or at least know how much space my variables will take in a single block (function) if the compiler is not smart enough to analyze it on all the nested calls?
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path?
Only if you don't use interrupts, and almost every embedded system does use them. So you'll have to find out stack use with dynamic analysis.
The old school way is to set your whole stack area to a value like 0xAA upon reset from a debugger, then let the program run for a while, make sure to provoke all use-cases. Then halt and inspect how far down you still have 0xAA in memory. It isn't a 100% scientific, fool-proof method but works just fine in practice, in the vast majority of cases.
Other methods involve setting write breakpoints at certain stack locations where you don't expect the program to end up, sort of like a "hardware stack canary". Run the program and ensure that the breakpoint never triggers. If it does, then investigate from there, move the breakpoint further down the memory map to see exactly where.
Another good practice is to always memory map your stack so that it can only overflow into forbidden memory or at least into read-only flash etc - ideally you'd get a hardware exception for stack overflow. You definitely want to avoid the stack overflowing into other RAM sections like .data/.bss, as that will cause severe and extremely subtle error scenarios.
Let's say I have a function called from within a tight loop, that allocates a few large POD arrays (no constructors) on the stack in one scenario, vs. I allocate the arrays dynamically once and reuse them in each iteration. Do local arrays add run-time cost or not?
As I understand it, allocating local POD variables comes down to shifting the stack pointer, so it shouldn't matter much. However, a few things come to mind that may potentially affect the performance:
Checking for stack overflow: who performs these checks, when, and how often? On some systems stacks can grow automatically, but again, I know very little about this.
Cache considerations: is the stack treated in a special way by the CPU cache, or is it no different from the rest of the data?
Are variable-length arrays any different with respect to the above? Say, for constant-sized arrays the stack can be somehow preallocated (or pre-computed by the compiler?), whereas for variable-length ones something else is involved that adds run-time cost. Again I have no idea how this works.
Checking for stack overflow -- typically, for a frame larger than a page, the prologue produced by the compiler will walk through the space to be used with a stride corresponding to the page size. This guarantees that, if the OS is standing ready to extend the stack, an access to the guard page triggers that OS logic before any access occurs to addresses beyond it.
Cache -- The stack is not treated in any special way, but you're likely to get more hits because of locality to space used for spilling registers, saving return addresses, etc which make the stack hot in cache. But if your stack usage is large enough, the part that's already in cache will represent only a tiny fraction. Also, whatever part of the stack has been used recently by another function may be hot as well.
Variable length arrays / arrays with runtime bound -- not really that different. The compiler will have to compute the needed size beforehand, but touching all the pages and adjusting the stack pointer won't magically become more expensive. Exception: unrolling of the loop touching pages will be affected by the fact that the number of pages isn't constant, but this is unlikely to make any difference.
Note that there are a few platforms with dedicated separate registers to be used for return addresses and spilling -- on such platforms the note about these operations making automatic storage hot in cache does not apply.
The only performance hit you'll take is if the stack memory hasn't been mapped yet. Creating the virtual-to-physical mappings can take some time, but only the first time you use the memory. Note that you'd also have to pay that price if, for example, you created a large array of POD on the heap using new or malloc() or any variant thereof, and the memory pages you get haven't been mapped yet, either.
An overflow of the stack will most likely show up as a SIGSEGV. Whether or not the stack grows automatically depends on the OS, and in some cases on the thread that's using the stack in question, as a process can have more than one thread and therefore more than one stack. In general, the original thread of the process has a stack that grows automatically, up to some limit, while threads started by the process have a fixed-size stack. How the stack grows isn't as important as the fact that it has to grow: the growth can be a significant performance hit.
So it will in general be a lot faster to use stack memory instead of heap memory - as long as you pay the price up front and "touch" each page you need to use for the stack to ensure it "exists" and has a virtual-to-physical mapping. But the trade-off is that stack memory can only be used by the thread that's running on that stack, and its size is likely to be limited much more than the heap's. That thread may allow access to data on its stack from other threads, but it's that thread that has to retain control of its own stack.
I've been told not to use stack allocated arrays because the stack is a precious resource.
Other people have suggested to me that in fact it is perfectly fine to use stack allocated arrays so long as the array is relatively small.
I would like to have a general rule of thumb: when should I use a stack allocated array?
And when should I use a heap allocated array?
All of your memory is limited; even today, with enormous amounts of RAM and virtual memory, there is still a limit. For the heap that limit is rather large, though, especially compared with the stack, which can be anything from a couple of KB on small embedded systems to a couple of megabytes on a PC.
Besides that, there is also the question of how you are using it, and for what. For example, if you want to return an "array" from a function, it should never be on that function's stack, because the storage is gone as soon as the function returns.
In general, I would say: try to keep arrays on the stack small if you can. If you are creating an array with thousands of entries on the stack you should stop and think about what you want it for.
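To illustrate the point about returning an "array", here is a small sketch of the two usual alternatives to a local array: let the caller supply the storage, or allocate on the heap and hand ownership to the caller. The function names fill_squares and make_squares are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Option 1: the caller owns the storage (stack, static or heap). */
void fill_squares(int *out, size_t n)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)(i * i);
    }
}

/* Option 2: heap allocation; the caller must free() the result.
 * Returns NULL if the allocation fails. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a != NULL) {
        fill_squares(a, n);
    }
    return a;
}
```

Option 1 keeps small arrays on the caller's stack without any lifetime problem; option 2 is the one to reach for when the array is large or has to outlive the caller as well.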
It depends on your platform.
Nowadays, if working on the popular x64 platform, you don't really have to worry about it.
Depending on the Operating System you use, you can check how much stack space and how much heap space a userland process is allowed to use.
For example, UNIX-like systems have soft and hard limits. Some you can crank up, some you can not.
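On a POSIX system the soft and hard stack limits can be read with getrlimit() and RLIMIT_STACK. The following is a minimal sketch; the helper names print_limit and print_stack_limits are made up here, and the values printed depend on your system and shell settings.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit, handling the "no limit" value. */
static void print_limit(const char *label, rlim_t value)
{
    assert(label != NULL);
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", label);
    } else {
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
    }
}

/* Print the soft and hard stack limits of the current process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_stack_limits(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft", rl.rlim_cur);
    print_limit("hard", rl.rlim_max);
    return 0;
}
```

The soft limit is the one that applies to the process; an unprivileged process may raise it up to the hard limit.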
Bottom line is that you don't usually need to worry about such things. And when you need to know, you are usually tied so closely to the platform you'll be developing for that you know all these details.
Hope I answered your question. If you want specific values please specify your exact hardware, operating system and user privileges.
The answer to this question is context dependent. When you write for an operating system kernel, for example, the stack might be quite limited, and allocating more than a thousand bytes in a stack frame could cause a problem.
In modern consumer systems, the space available for the stack is typically quite large. One problem systems used to have was that address space was limited and, once the stack was assigned an address, it could not grow any further than the next object in the address space in the direction of stack growth, regardless of the availability of physical memory or of virtual memory elsewhere in the address space. This is less of a problem with today’s address spaces.
Commonly, megabytes of space can be allocated in a stack frame, and doing so is cheap and easy. However, if many routines that allocate large amounts of space are called, or one or a few routines that allocate large amounts of space are called recursively, then problems can occur because too much space is used, running into some limit (such as address space or physical memory).
Of course, running into a physical memory limit will not be alleviated by allocating space from the heap. So only the issue of consuming the address space available for the stack is relevant to the question of whether to use stack or heap.
A simple test for whether this is a problem is to insert use of a great deal of stack space in your main routine. If you use additional stack space and your application still functions under a load that uses large amounts of stack space normally, then, when you remove this artificial reservation in main, you will have plenty of margin.
A better way would be to calculate the maximum your program could use and compare that to the stack space available from the system. But that is rarely easy with today’s software.
If you are running into stack space limits, your linker or your operating system may have options to make more available.
Global and static variables live for the whole lifetime of a process. Their memory is allocated when the process starts and is freed only when the process exits.
A local (stack) variable lives only as long as the function in which it is defined. Its memory is allocated when the function is invoked and freed once control leaves the function.
The main purpose of dynamic memory is to create a variable with a user-defined lifetime. If you want to control how long a variable lives, you can allocate memory for a variable x in one function, pass its address to as many functions as you want, and finally free it.
So with dynamically allocated memory we can create a variable that lives longer than a local variable but shorter than a global or static one.
Apart from this, if the size is very large it is better to go for dynamic memory, especially if the architecture has memory constraints.
A good reason to use heap-allocated memory is to pass its ownership to some other function or struct. On the other hand, the stack gives you memory management for free: you cannot forget to deallocate stack memory, while there is a risk of a leak if you use the heap.
If you create an array just for local use, size is the deciding criterion; however, it is hard to give an exact size above which memory should be allocated on the heap. One could say that a few hundred bytes is enough to move to the heap; for others the threshold will be lower or higher.
I declared a struct variable in C with a size greater than 1024 bytes. On running Coverity (a static code analyzer) it reports that this stack variable is greater than 1024 bytes and therefore a cause of error.
I'd like to know if I need to worry about this warning? Is there really a maximum limit to the size of a single stack variable?
thanks,
che
The maximum size of a variable is limited by the maximum size of the stack (specifically, by how much of the stack is left over from any current use, including variables and parameters from functions higher on the stack as well as process frame overhead).
On Windows, the stack size of the first thread is a property of the executable set during linking, while the stack size of any other thread can be specified during thread creation.
On Unix, the stack size of the first thread is usually limited only by how much room there is for it to grow. Depending on how the particular system lays out memory and on your use of shared objects, that can vary. The stack size of a thread can also be specified during thread creation.
The problem it is trying to protect you from is stack overflow, which is very hard to find in testing because it depends on the execution path taken. Mostly for this reason, it is considered bad form to allocate a large amount of data on the stack. You are only really likely to run into a real problem on an embedded system though.
In other words, it sets an arbitrary limit to what it considers too much data on the stack.
Yes. Of course it's limited by the address space of your system. It's also limited by the amount of space allocated to the stack by your OS, which usually can't be changed after your program starts but can be changed beforehand (either by the launching process, or by the properties of the executable). At a quick glance, the maximum stack size on my OS X system is 8 MiB and on Linux it's 10 MiB. On some systems, you can even allocate a different amount of stack to each different thread you start, although this is of limited usefulness. Most compilers also have another limit to how much they'll allow in a single stack frame.
On a modern desktop, I wouldn't worry about a 1k stack allocation unless the function were recursive. If you're writing embedded code or code for use inside an OS kernel, it would be a problem. Code in the Linux kernel is only permitted 64 KiB stacks or less, depending on configuration options.
This article is pretty interesting regarding stack size http://www.embedded.com/columns/technicalinsights/47101892?_requestid=27362
Yes, it is OS dependent, and it depends on other things as well. Sorry to be so vague. You may also be able to dig up some code in the gcc collection for testing stack size.
If your function was involved (directly or indirectly) in recursion, then allocating a large amount on the stack would limit the depth of recursion and might well blow the stack. Under Windows this stack reserve defaults to 1MB, though you can increase it statically with linker commands. The stack will grow as it is used, but the operating system sometimes cannot extend it. I discuss this in a little more detail on my website here.
As far as I have seen, the Turbo C compiler limits a variable to a maximum size of 64 KB. If more is needed, the variable has to be declared as "huge".
It's not a good idea to try to use a massive amount of stack space.
Here is a link to the default gcc stack size: http://www.cs.nyu.edu/exact/core/doc/stackOverflow.txt
Also, you could specify --stack,xxxxx to customize the stack size, so it's best to assume xxxxx is a small number and stick with heap allocation.
Stack, heap, low, high VM -- Nuts. For the first thread, with the stack at the top of a 64-bit VM, there should be no limit, so it seems like a gcc/C compiler bug that for a local automatic "int x[2621440];" I get SIGSEGV. The compiler should be letting the first thread's stack grow until it hits the heap, which in a 16 billion billion byte VM is pretty unlikely for now. The kindest thing is to call it a compiler "limitation". (In testing some while back, probably on a Solaris SPARC, it seemed that local variables were processed faster than global ones. Go figure!)
You may think that this is a coincidence that the topic of my question is similar to the name of the forum but I actually got here by googling the term "stack overflow".
I use the OPNET network simulator in which I program using C. I think I am having a problem with big array sizes. It seems that I am hitting some sort of memory allocation limitation. It may have to do with OPNET, Windows, my laptop memory or most likely C language. The problem is caused when I try to use nested arrays with a total number of elements coming to several thousand integers. I think I am exceeding an overall memory allocation limit and I am wondering if there is a way to increase this cap.
Here's the exact problem description:
I basically have a routing table. Let's call it routing_tbl[n], meaning I am supporting 30 nodes (routers). Now, for each node in this table, I keep information about many (hundreds of) available paths, in an array called paths[p]. Again, for each path in this array, I keep the list of nodes that belong to it in an array called hops[h]. So, I am using at least n*p*h integers' worth of memory, but this table contains other information as well. In the same function, I am also using another nested array that consumes almost 40,000 integers as well.
As soon as I run my simulation, it quits complaining about stack overflow. It works when I reduce the total size of the routing table.
What do you think causes the problem and how can it be solved?
Much appreciated
Ali
It may help if you post some code. Edit the question to include the problem function and the error.
Meanwhile, here's a very generic answer:
The two principal causes of a stack overflow are 1) a recursive function, or 2) the allocation of a large number of local variables.
Recursion
If your function calls itself, like this:
    int recurse(int number) {
        /* no base case: calls itself until the stack is exhausted */
        return (recurse(number));
    }
Since local variables and function arguments are stored on the stack, every call adds another frame, so this will fill the stack and cause a stack overflow.
Large local variables
If you try to allocate a large array of local variables then you can overflow the stack in one easy go. A function like this may cause the issue:
    void hugeStack (void) {
        unsigned long long reallyBig[100000000][1000000000];
        ...
    }
There is quite a detailed answer to this similar question.
Somehow you are using a lot of stack. Possible causes include that you're creating the routing table on the stack, you're passing it on the stack, or else you're generating lots of calls (eg by recursively processing the whole thing).
In the first two cases you should create it on the heap and pass around a pointer to it. In the third case you'll need to rewrite your algorithm in an iterative form.
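For the first two cases, a heap-based version might look roughly like the sketch below. It is not based on the asker's actual code: the struct layout, the limits MAX_PATHS and MAX_HOPS, and the function names are assumptions made for illustration; only the figure of 30 nodes comes from the question.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NODES 30   /* from the question */
#define MAX_PATHS 200  /* assumption: "hundreds" of paths per node */
#define MAX_HOPS  16   /* assumption */

struct path {
    int hop_count;
    int hops[MAX_HOPS];
};

struct route_entry {
    int path_count;
    struct path paths[MAX_PATHS];
};

/* Allocate a zeroed table on the heap; returns NULL on failure. */
struct route_entry *routing_table_create(void)
{
    return calloc(MAX_NODES, sizeof(struct route_entry));
}

/* Add a path for a node. Only a pointer to the table is passed, so the
 * call costs a few bytes of stack. Returns the new path's index, or -1. */
int routing_table_add_path(struct route_entry *tbl, int node,
                           const int *hops, int hop_count)
{
    assert(tbl != NULL && hops != NULL);
    if (node < 0 || node >= MAX_NODES ||
        hop_count < 0 || hop_count > MAX_HOPS) {
        return -1;
    }
    struct route_entry *e = &tbl[node];
    if (e->path_count >= MAX_PATHS) {
        return -1;
    }
    struct path *p = &e->paths[e->path_count];
    p->hop_count = hop_count;
    for (int i = 0; i < hop_count; i++) {
        p->hops[i] = hops[i];
    }
    return e->path_count++;
}
```

With these assumed limits the table is roughly 400 KB, which is far too much for many default thread stacks but unremarkable for the heap. Release it with free() when the simulation is done.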
Stack overflows can happen in C when the number of embedded recursive calls is too high. Perhaps you are calling a function from itself too many times?
This error may also be due to allocating too much memory in fixed-size local declarations. You can switch to dynamic allocation through malloc() to fix this type of problem.
Is there a reason why you cannot use the debugger on this program?
It depends on where you have declared the variable.
A local variable (i.e. one declared on the stack) is limited by the maximum frame size. This is a limit of the compiler you are using (and can usually be adjusted with compiler flags).
A dynamically allocated object (i.e. one that is on the heap) is limited by the amount of available memory. This is a property of the OS (and can technically be larger than the physical memory if you have a smart OS).
Many operating systems dynamically expand the stack as you use more of it. When you start writing to a memory address that's just beyond the stack, the OS assumes your stack has just grown a bit more and allocates it an extra page (usually 4 KiB on x86, which is exactly 1024 4-byte ints).
The problem is, on the x86 (and some other architectures) the stack grows downwards but C arrays grow upwards. This means if you access the start of a large array, you'll be accessing memory that's more than a page away from the edge of the stack.
If you initialise your array to 0 starting from the end of the array (that's right, make a for loop to do it), the errors might go away. If they do, this is indeed the problem.
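The experiment can be sketched like this. The helper name zero_from_end is made up, and the volatile qualifier is there only because an optimising compiler might otherwise replace the loop with a memset that touches the memory in a different order.

```c
#include <assert.h>
#include <stddef.h>

/* Zero n ints starting from the highest index and working down,
 * so the addresses nearest the old stack edge are touched first
 * (assuming a downward-growing stack). */
void zero_from_end(volatile int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        a[i] = 0;
    }
}
```

Call it on the large local array right after declaring it, before any other access to the array.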
You might be able to find some OS API functions to force stack allocation, or compiler pragmas/flags. I'm not sure about how this can be done portably, except of course for using malloc() and free()!
You are unlikely to run into a stack overflow with unthreaded compiled C unless you do something particularly egregious like have runaway recursion or a cosmic memory leak. However, your simulator probably has a threading package which will impose stack size limits. When you start a new thread it will allocate a chunk of memory for the stack for that thread. Likely, there is a parameter you can set somewhere that establishes the default stack size, or there may be a way to grow the stack dynamically. For example, pthreads has a function pthread_attr_setstacksize() which you call prior to starting a new thread to set its stack size. Your simulator may or may not be using pthreads. Consult your simulator reference documentation.
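If the threads do turn out to be pthreads, setting the stack size looks roughly like the sketch below. The names worker and run_with_stack and the 256 KiB buffer are made up for illustration; whether OPNET exposes anything like this is not something this sketch can tell you. Build with your compiler's pthread option (for gcc, -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body with a large local buffer that needs a generous stack. */
static void *worker(void *arg)
{
    volatile char buf[256 * 1024];
    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return NULL;
}

/* Run worker on a thread with the requested stack size.
 * Returns 0 on success, otherwise a pthread error code. */
int run_with_stack(size_t stack_bytes, int *result)
{
    assert(result != NULL);
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        rc = pthread_create(&tid, &attr, worker, result);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }
    return pthread_join(tid, NULL);
}
```

For example, run_with_stack(1024 * 1024, &r) gives the worker a 1 MiB stack, comfortably more than its 256 KiB buffer.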