how many recursive function calls causes stack overflow? - c

I am working on a simulation problem written in c, the main part of my program is a recursive function.
when the recursive depth reaches approximately 500000 it seems stack overflow occurs.
Q1 : I want to know that is this normal?
Q2 : in general how many recursive function calls causes stack overflow?
Q3 : in the code below, removing local variable neighbor can prevent from stack overflow?
my code:
/*
* recursive function to form Wolff Cluster(= WC)
*/
void grow_Wolff_cluster(lattic* l, Wolff* wolff, site *seed){
/*a neighbor of site seed*/
site* neighbor;
/*go through all neighbors of seed*/
for (int i = 0 ; i < neighbors ; ++i) {
neighbor = seed->neighbors[i];
/*add to WC according to the Wolff Algorithm*/
if(neighbor->spin == seed->spin && neighbor->WC == -1 && ((double)rand() / RAND_MAX) < add_probability)
{
wolff->Wolff_cluster[wolff->WC_pos] = neighbor;
wolff->WC_pos++; // the number of sites that is added to WC
neighbor->WC = 1; // for avoiding of multiple addition of site
neighbor->X = 0;
///controller_site_added_to_WC();
/*continue growing Wolff cluster(recursion)*/
grow_Wolff_cluster(l, wolff, neighbor);
}
}
}

I want to know that is this normal?
Yes. There's only so much stack size.
In the code below, removing local variable neighbor can prevent from stack overflow?
No. Even with no variables and no return values the function calls themselves must be stored in the stack so the stack can eventually be unwound.
For example...
void recurse() {
recurse();
}
int main (void)
{
recurse();
}
This still overflows the stack.
$ ./test
ASAN:DEADLYSIGNAL
=================================================================
==94371==ERROR: AddressSanitizer: stack-overflow on address 0x7ffee7f80ff8 (pc 0x00010747ff14 bp 0x7ffee7f81000 sp 0x7ffee7f81000 T0)
#0 0x10747ff13 in recurse (/Users/schwern/tmp/./test+0x100000f13)
SUMMARY: AddressSanitizer: stack-overflow (/Users/schwern/tmp/./test+0x100000f13) in recurse
==94371==ABORTING
Abort trap: 6
In general how many recursive function calls causes stack overflow?
That depends on your environment and function calls. Here on OS X 10.13 I'm limited to 8192K by default.
$ ulimit -s
8192
This simple example with clang -g can recurse 261976 times. With -O3 I can't get it to overflow, I suspect compiler optimizations have eliminated my simple recursion.
#include <stdio.h>
void recurse() {
puts("Recurse");
recurse();
}
int main (void)
{
recurse();
}
Add an integer argument and it's 261933 times.
#include <stdio.h>
void recurse(int cnt) {
printf("Recurse %d\n", cnt);
recurse(++cnt);
}
int main (void)
{
recurse(1);
}
Add a double argument, now it's 174622 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
printf("Recurse %d %f\n", cnt, foo);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
Add some stack variables and it's 104773 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
double this = 42.0;
double that = 41.0;
double other = 40.0;
double thing = 39.0;
printf("Recurse %d %f %f %f %f %f\n", cnt, foo, this, that, other, thing);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
And so on. But I can increase my stack size in this shell and get twice the calls.
$ ./test 2> /dev/null | wc -l
174622
$ ulimit -s 16384
$ ./test 2> /dev/null | wc -l
349385
I have a hard upper limit to how big I can make the stack of 65,532K or 64M.
$ ulimit -Hs
65532

A stack overflow isn’t defined by the C standard, but by the implementation. The C standard defines a language with unlimited stack space (among other resources) but does have a section about how implementations are allowed to impose limits.
Usually it’s the operating system that actually first creates the error. The OS doesn’t care about how many calls you make, but about the total size of the stack. The stack is composed of stack frames, one for each function call. Usually a stack frame consists of some combination of the following five things (as an approximation; details can vary a lot between systems):
The parameters to the function call (probably not actually here, in this case; they’re probably in registers, although this doesn’t actually buy anything with recursion).
The return address of the function call (in this case, the address of the ++i instruction in the for loop).
The base pointer where the previous stack frame starts
Local variables (at least those that don’t go in registers)
Any registers the caller wants to save when it makes a new function call, so the called function doesn’t overwrite them (some registers may be saved by the caller instead, but it doesn’t particularly matter for stack size analysis). This is why passing parameters in registers doesn’t help much in this case; they’ll end up on the stack sooner or later.
Because some of these (specifically, 1., 4., and 5.) can vary in size by a lot, it can be difficult to estimate how big an average stack frame is, although it’s easier in this case because of the recursion. Different systems also have different stack sizes; it currently looks like by default I can have 8 MiB for a stack, but an embedded system would probably have a lot less.
This also explains why removing a local variable gives you more available function calls; you reduced the size of each of the 500,000 stack frames.
If you want to increase the amount of stack space available, look into the setrlimit(2) function (on Linux like the OP; it may be different on other systems). First, though, you might want to try debugging and refactoring to make sure you need all that stack space.

Yes and no - if you come across a stack overflow in your code, it could mean a few things
Your algorithm is not implemented in a way that respects the amount of memory on the stack you have been given. You may adjust this amount to suit the needs of the algorithm.
If this is the case, it's more common to change the algorithm to more efficiently utilize the stack, rather than add more memory. Converting a recursive function to an iterative one, for example, saves a lot of precious memory.
It's a bug trying to eat all your RAM. You forgot a base case in the recursion or mistakenly called the same function. We've all done it at least 2 times.
It's not necessarily how many calls cause an overflow - it's dependent upon how much memory each individual call takes up on a stack frame. Each function call uses up stack memory until the call returns. Stack memory is statically allocated -- you can't change it at runtime (in a sane world). It's a last-in-first-out (LIFO) data structure behind the scenes.
It's not preventing it, it's just changing how many calls to grow_Wolff_cluster it takes to overflow the stack memory. On a 32-bit system, removing neighbor from the function costs a call to grow_Wolff_cluster 4 bytes less. It adds up quickly when you multiply that in the hundreds of thousands.
I suggest you learn more about how stacks work for you. Here's a good resource over on the software engineering stack exchange. And another here on stack overflow (zing!)

For each time a function recurs, your program takes more memory on the stack, the memory it takes for each function depends upon the function and variables within it. The number of recursions that can be done of a function is entirely dependant upon your system.
There is no general number of recursions that will cause stack overflow.
Removing the variable 'neighbour' will allow for the function to recur further as each recursion takes less memory, but it will still eventually cause stack overflow.

This is a simple c# function that will show you how many iteration your computer can take before stack overflow (as a reference, I have run up to 10478):
private void button3_Click(object sender, EventArgs e)
{
Int32 lngMax = 0;
StackIt(ref lngMax);
}
private void StackIt(ref Int32 plngMax, Int32 plngStack = 0)
{
if (plngStack > plngMax)
{
plngMax = plngStack;
Console.WriteLine(plngMax.ToString());
}
plngStack++;
StackIt(ref plngMax, plngStack);
}
in this simple case, the condition check: "if (plngStack > plngMax)" could be removed,
but if you got a real recursive function, this check will help you localize the problem.

Related

clang eager to inline at cost of stack

Apple clang version 11.0.0 (clang-1100.0.33.17)
Rewording with example as far too many people got fixated on "recursive" when it was about stack usage, and the one recursive example was the lowest fruit (fix that, and you win more!)
The project has recently moved to mark all functions "static" that are not prototyped in headers, and only used in the specific source file.
llvm appears to be quite eager to inline functions, which is often desirable, especially in userland.
However, in kernel, there is a fixed stack of 16KB. Sometimes, the inlining does the wrong thing.
Example:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
// clang -O2 -g -Wframe-larger-than=1 -o stack stack.c
#define current_stack_pointer ({ \
register unsigned long esp asm("esp"); \
asm("" : "=r"(esp)); \
esp; \
})
__attribute__((noinline))
static void lower(unsigned long top)
{
uint64_t usage[100];
for (int i = 0; i < 100; i++)
usage[i] = usage[i] + 1;
printf("%s; stack now %lu : deepest - we care about this one\n",
__func__, top - current_stack_pointer);
}
// Changing this one to be inlined or not. It is only called once
// on the way down the stack, it's not "part" of the deep stack, and
// it is undesirable that its "cost" is pushed on the deep stack when
// inlined.
//__attribute__((noinline))
__attribute__((always_inline))
static void step_one(unsigned long top)
{
uint64_t usage[200];
for (int i = 0; i < 200; i++)
usage[i] = usage[i] + 1;
printf("%s; stack now %lu\n", __func__, top - current_stack_pointer);
}
__attribute__((noinline))
static void start(unsigned long top)
{
uint64_t usage[100];
for (int i = 0; i < 100; i++)
usage[i] = usage[i] + 1;
printf("%s; stack now %lu\n", __func__, top - current_stack_pointer);
step_one(top);
lower(top);
printf("%s; stack now %lu\n", __func__, top - current_stack_pointer);
}
int main(int argc, char **argv)
{
uint64_t usage[100];
unsigned long top;
top = current_stack_pointer;
printf("%s; stack now %lu\n", __func__, top - current_stack_pointer);
// Make it use stack space, ignore this, just to set some variables
for (int i = 0; i < 100; i++)
usage[i] = usage[i] + 1;
start(top);
}
So here we go, force step_one() to be noinline:
stack.c:55:5: warning: stack frame size of 824 bytes in function 'main'
stack.c:41:13: warning: stack frame size of 856 bytes in function 'start'
stack.c:30:13: warning: stack frame size of 1624 bytes in function 'step_one'
stack.c:13:13: warning: stack frame size of 824 bytes in function 'lower'
# ./stack
main; stack now 0
start; stack now 864
step_one; stack now 2496
lower; stack now 1696 : deepest - we care about this one
That is great, even though we called step_one on the way down, and the stack grew to handle it, it is released and lower is not affected.
Hurrah.
Now, changing step_one to be inlined
stack.c:55:5: warning: stack frame size of 824 bytes in function 'main'
stack.c:41:13: warning: stack frame size of 2440 bytes in function 'start'
stack.c:13:13: warning: stack frame size of 824 bytes in function 'lower'
# ./stack
main; stack now 0
start; stack now 2448
step_one; stack now 2448
lower; stack now 3280 : deepest - we care about this one
Here we are, step_one was inlined, and its cost is now part of start and as we descend into lower that cost is still taking stack space.
Hurroo.
This is unfortunate. For kernel files, inlining functions can make it worse (with stacksize in mind) and it at times it has gone from a frame-size of "88" to "1800". due to inlining.
I suspect it has already been answered, there is no way to tell clang to prefer a lean stack over inlining benefits.
Are there any way to ask clang to limit stack usage as a compile option for some source files?
I don't know of any, and wouldn't expect it to exist (now or in the future).
What you're hoping for only makes sense in the presence of recursion (without recursion, "reduce stack size" means "always inline to reduce stack space wasted by function epilogue/prologue"). Currently (as I understand it); clang doesn't even know if a function is recursive and just has a "max. cutoff depth for caller contexts" approach.
Manually figuring out what to "noline" is time-consuming (for the human)...
It would be annoying and possibly error prone; but shouldn't be time-consuming (given that recursion is rare, even when developers have no reason to care about stack size).
The problem with putting __attribute__(noinline) on the log function is that it will be applied to all callers and prevent the log function from being inlined when it is beneficial. What you'd really want is the reverse - e.g. some kind of (hypothetical) __attribute__(noinline_callees) you can put on the recursive function. That behavior could be achieved by putting the recursive function in its own separate file/compilation unit; but that will probably end up being even more annoying.
So far the best option we have come across is to use -finline-hint-functions for the kernel compile, then it will only inline functions that we ask it to. Changing the default away from try-to-inline-everything, and we have a chance to keep the stack small.

C: Variable value gets dirty while returning to calling function

Take a look at this code:
extern void f3(int);
void f2 (int foo) {
//some stuff
f3(foo)
printf("f2:%d\n",foo);
}
void f1 (int foo) {
//some stuff
f2(foo);
printf("f1:%d\n",foo);
}
int main() {
//some stuff
f1(foo)
//other stuff
return 0;
}
My problem is that i have outputs like:
f2: 1060 //this is the correct value
f1: 1065294485
There is no code between the print in the function f2 and the end of the function. There is no code between the call of the function f2 and the print in the function f1. How is possible this change of value?
I need to allocate big data structures in the stack and I'm using ulimit -s 2^28. I'm also using gcc -mO0 -m32 -msse to compile because the function f3 is written in nasm with sse. Can the problem depend on this?
Ask me for other things that may be helpful to understand the problem.
Edit: I'm showing the real f2() function:
void upgma_start(float* centroids,int k,int c,int d,float* size,float *md) {
float mc1 [d];
float mc2 [d];
upgma(centroids,k,c,d,size,md,mc1,mc2);
printf("uuu:::%d:\n",k);
}
the function upgma is the function f3 of the example code and k is the foo var.
I need to allocate big data structures in the stack
Why? Why can't you simply use malloc/free?
Can the problem depend on this?
Probably memory access in your f2 is out of bounds and reaches up into the stack frame of f1. This is not directly related to the size of your stack, but you likely have an out-of-bounds array access there, with an index too large instead of writing to the array in f2's stack frame you're trashing something in f1.
Had you used dynamic memory this problem would likely manifest itself as a segfault. I strongly suggest you switch to dynamic memory and use a memory debugger like Valgrind to track down the offending instruction in your code.

Detecting stack overflows during runtime beforehand

I have a rather huge recursive function (also, I write in C), and while I have no doubt that the scenario where stack overflow happens is extremely unlikely, it is still possible. What I wonder is whether you can detect if stack is going to get overflown within a few iterations, so you can do an emergency stop without crashing the program.
In the C programming language itself, that is not possible. In general, you can't know easily that you ran out of stack before running out. I recommend you to instead place a configurable hard limit on the recursion depth in your implementation, so you can simply abort when the depth is exceeded. You could also rewrite your algorithm to use an auxillary data structure instead of using the stack through recursion, this gives you greater flexibility to detect an out-of-memory condition; malloc() tells you when it fails.
However, you can get something similar with a procedure like this on UNIX-like systems:
Use setrlimit to set a soft stack limit lower than the hard stack limit
Establish signal handlers for both SIGSEGV and SIGBUS to get notified of stack overflows. Some operating systems produce SIGSEGV for these, others SIGBUS.
If you get such a signal and determine that it comes from a stack overflow, raise the soft stack limit with setrlimit and set a global variable to identify that this occured. Make the variable volatile so the optimizer doesn't foil your plains.
In your code, at each recursion step, check if this variable is set. If it is, abort.
This may not work everywhere and required platform specific code to find out that the signal came from a stack overflow. Not all systems (notably, early 68000 systems) can continue normal processing after getting a SIGSEGV or SIGBUS.
A similar approach was used by the Bourne shell for memory allocation.
Heres a simple solution that works for win-32. Actually resembles what Wossname already posted but less icky :)
unsigned int get_stack_address( void )
{
unsigned int r = 0;
__asm mov dword ptr [r], esp;
return r;
}
void rec( int x, const unsigned int begin_address )
{
// here just put 100 000 bytes of memory
if ( begin_address - get_stack_address() > 100000 )
{
//std::cout << "Recursion level " << x << " stack too high" << std::endl;
return;
}
rec( x + 1, begin_address );
}
int main( void )
{
int x = 0;
rec(x,get_stack_address());
}
Here's a naive method, but it's a bit icky...
When you enter the function for the first time you could store the address of one of your variables declared in that function. Store that value outside your function (e.g. in a global). In subsequent calls compare the current address of that variable with the cached copy. The deeper you recurse the further apart these two values will be.
This will most likely cause compiler warnings (storing addresses of temporary variables) but it does have the benefit of giving you a fairly accurate way of knowing exactly how much stack you're using.
Can't say I really recommend this but it would work.
#include <stdio.h>
char* start = NULL;
void recurse()
{
char marker = '#';
if(start == NULL)
start = &marker;
printf("depth: %d\n", abs(&marker - start));
if(abs(&marker - start) < 1000)
recurse();
else
start = NULL;
}
int main()
{
recurse();
return 0;
}
An alternative method is to learn the stack limit at the start of the program, and each time in your recursive function to check whether this limit has been approached (within some safety margin, say 64 kb). If so, abort; if not, continue.
The stack limit on POSIX systems can be learned by using getrlimit system call.
Example code that is thread-safe: (note: it code assumes that stack grows backwards, as on x86!)
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
void *stack_limit;
#define SAFETY_MARGIN (64 * 1024) // 64 kb
void recurse(int level)
{
void *stack_top = &stack_top;
if (stack_top <= stack_limit) {
printf("stack limit reached at recursion level %d\n", level);
return;
}
recurse(level + 1);
}
int get_max_stack_size(void)
{
struct rlimit rl;
int ret = getrlimit(RLIMIT_STACK, &rl);
if (ret != 0) {
return 1024 * 1024 * 8; // 8 MB is the default on many platforms
}
printf("max stack size: %d\n", (int)rl.rlim_cur);
return rl.rlim_cur;
}
int main (int argc, char *argv[])
{
int x;
stack_limit = (char *)&x - get_max_stack_size() + SAFETY_MARGIN;
recurse(0);
return 0;
}
Output:
max stack size: 8388608
stack limit reached at recursion level 174549

Compiling Tail-Call Optimization In Mutual Recursion Across C and Haskell

I'm experimenting with the foreign-function interface in Haskell. I wanted to implement a simple test to see if I could do mutual recursion. So, I created the following Haskell code:
module MutualRecursion where
import Data.Int
foreign import ccall countdownC::Int32->IO ()
foreign export ccall countdownHaskell::Int32->IO()
countdownHaskell::Int32->IO()
countdownHaskell n = print n >> if n > 0 then countdownC (pred n) else return ()
Note that the recursive case is a call to countdownC, so this should be tail-recursive.
In my C code, I have
#include <stdio.h>
#include "MutualRecursionHaskell_stub.h"
void countdownC(int count)
{
printf("%d\n", count);
if(count > 0)
return countdownHaskell(count-1);
}
int main(int argc, char* argv[])
{
hs_init(&argc, &argv);
countdownHaskell(10000);
hs_exit();
return 0;
}
Which is likewise tail recursive. So then I make a
MutualRecursion: MutualRecursionHaskell_stub
ghc -O2 -no-hs-main MutualRecursionC.c MutualRecursionHaskell.o -o MutualRecursion
MutualRecursionHaskell_stub:
ghc -O2 -c MutualRecursionHaskell.hs
and compile with make MutualRecursion.
And... upon running, it segfaults after printing 8991.
Just as a test to make sure gcc itself can handle tco in mutual recursion, I did
void countdownC2(int);
void countdownC(int count)
{
printf("%d\n", count);
if(count > 0)
return countdownC2(count-1);
}
void countdownC2(int count)
{
printf("%d\n", count);
if(count > 0)
return countdownC(count-1);
}
and that worked quite fine. It also works in the single-recursion case of just in C and just in Haskell.
So my question is, is there a way to indicate to GHC that the call to the external C function is tail recursive? I'm assuming that the stack frame does come from the call from Haskell to C and not the other way around, since the C code is very clearly a return of a function call.
I believe cross-language C-Haskell tail calls are very, very hard to achieve.
I do not know the exact details, but the C runtime and the Haskell runtime are vastly different. The main factors for this difference, as far as I can see, are:
different paradigm: purely functional vs imperative
garbage collection vs manual memory management
lazy semantics vs strict one
The kinds of optimizations which are likely to survive across language boundaries given such differences are next to zero. Perhaps, in theory, one could invent an ad hoc C runtime together with a Haskell runtime so that some optimizations are feasible, but GHC and GCC were not designed in this way.
Just to show an example of the potential differences, assume we have the following Haskell code
p :: Int -> Bool
p x = x==42
main = if p 42
then putStrLn "A" -- A
else putStrLn "B" -- B
A possible implementation of the main could be the following:
push the address of A on the stack
push the address of B on the stack
push 42 on the stack
jump to p
A: print "A", jump to end
B: print "B", jump to end
while p is implemented as follows:
p: pop x from the stack
pop b from stack
pop a from stack
test x against 42
if equal, jump to a
jump to b
Note how p is invoked with two return addresses, one for each possible result. This is different from C, whose standard implementations use only one return address. When crossing boundaries the compiler must account for this difference and compensate.
Above I also did not account for the case when the argument of p is a thunk, to keep it simple. The GHC allocator can also trigger garbage collection.
Note that the above fictional implementation was actually used in the past by GHC (the so called "push/enter" STG machine). Even if now it is no longer in use, the "eval/apply" STG machine is only marginally closer to the C runtime. I'm not even sure about GHC using the regular C stack: I think it does not, using its own one.
You can check the GHC developer wiki to see the gory details.
While I am no expert in Haskel-C interop, I do not imagine a call from C to Haskel can be a straight function invocation - it most likely has to go through intermediary to set up environment. As a result, your call to haskel would actually consist of call to this intermediary. This call likely was optimized by gcc. But the call from intermediary to actual Haskel routine was not neccessarily optimized - so I assume, this is what you are dealing with. You can check assembly output to make sure.

snprintf() prints garbage floats with newlib nano

I am running a bare metal embedded system with an ARM Cortex-M3 (STM32F205). When I try to use snprintf() with float numbers, e.g.:
float f;
f = 1.23;
snprintf(s, 20, "%5.2f", f);
I get garbage into s. The format seems to be honored, i.e. the garbage is a well-formed string with digits, decimal point, and two trailing digits. However, if I repeat the snprintf, the string may change between two calls.
Floating point mathematics seems to work otherwise, and snprintf works with integers, e.g.:
snprintf(s, 20, "%10d", 1234567);
I use the newlib-nano implementation with the -u _printf_float linker switch. The compiler is arm-none-eabi-gcc.
I do have a strong suspicion of memory allocation problems, as integers are printed without any hiccups, but floats act as if they got corrupted in the process. The printf family functions call malloc with floats, not with integers.
The only piece of code not belonging to newlib I am using in this context is my _sbrk(), which is required by malloc.
caddr_t _sbrk(int incr)
{
extern char _Heap_Begin; // Defined by the linker.
extern char _Heap_Limit; // Defined by the linker.
static char* current_heap_end;
char* current_block_address;
// first allocation
if (current_heap_end == 0)
current_heap_end = &_Heap_Begin;
current_block_address = current_heap_end;
// increment and align to 4-octet border
incr = (incr + 3) & (~3);
current_heap_end += incr;
// Overflow?
if (current_heap_end > &_Heap_Limit)
{
errno = ENOMEM;
current_heap_end = current_block_address;
return (caddr_t) - 1;
}
return (caddr_t)current_block_address;
}
As far as I have been able to track, this should work. It seems that no-one ever calls it with negative increments, but I guess that is due to the design of the newlib malloc. The only slightly odd thing is that the first call to _sbrk has a zero increment. (But this may be just malloc's curiosity about the starting address of the heap.)
The stack should not collide with the heap, as there is around 60 KiB RAM for the two. The linker script may be insane, but at least the heap and stack addresses seem to be correct.
As it may happen that someone else gets bitten by the same bug, I post an answer to my own question. However, it was #Notlikethat 's comment which suggested the correct answer.
This is a lesson of Thou shall not steal. I borrowed the gcc linker script which came with the STMCubeMX code generator. Unfortunately, the script along with the startup file is broken.
The relevant part of the original linker script:
_estack = 0x2000ffff;
and its counterparts in the startup script:
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
...
g_pfnVectors:
.word _estack
.word Reset_Handler
...
The first interrupt vector position (at 0) should always point to the startup stack top. When the reset interrupt is reached, it also loads the stack pointer. (As far as I can say, the latter one is unnecessary as the HW anyway reloads the SP from the 0th vector before calling the reset handler.)
The Cortex-M stack pointer should always point to the last item in the stack. At startup there are no items in the stack and thus the pointer should point to the first address above the actual memory, 0x020010000 in this case. With the original linker script the stack pointer is set to 0x0200ffff, which actually results in sp = 0x0200fffc (the hardware forces word-aligned stack). After this the stack is misaligned by 4.
I changed the linker script by removing the constant definition of _estack and replacing it by _stacktop as shown below. The memory definitions were there before. I changed the name just to see where the value is used.
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
}
_stacktop = ORIGIN(RAM) + LENGTH(RAM);
After this the value of _stacktop is 0x20010000, and my numbers float beautifully... The same problem could arise with any external (library) function using double length parameters, as the ARM Cortex ABI states that the stack must be aligned to 8 octets when calling external functions.
snprintf accepts size as second argument. You might want to go through this example http://www.cplusplus.com/reference/cstdio/snprintf/
/* snprintf example */
#include <stdio.h>
int main ()
{
char buffer [100];
int cx;
cx = snprintf ( buffer, 100, "The half of %d is %d", 60, 60/2 );
snprintf ( buffer+cx, 100-cx, ", and the half of that is %d.", 60/2/2 );
puts (buffer);
return 0;
}

Resources