Go: Effects of empty curly braces on array initialisation memory allocation

I was playing with different ways to initialise/declare arrays in Go and got different behaviours/results.
go version go1.3 darwin/amd64
version 1:
func main() {
    a := [100000000]int64{}
    var i int64
    for i = 0; i < 100000000; i++ {
        a[i] = i
    }
}
Produces a 763MB binary. It crashes after a few seconds when I run it with this message.
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
version 2:
func main() {
    var a [100000000]int64
    var i int64
    for i = 0; i < 100000000; i++ {
        a[i] = i
    }
}
Produces a 456KB binary. It runs in less than one second.
Question:
Can anybody help me understand why those differences (and others I may have missed) are there? Thanks!
edit:
Running times:
I built the two snippets and ran the compiled versions, so compilation time isn't included. The first run of version1 is extremely slow in comparison, though. Here is the output.
go build version1.go
go build version2.go
These are the execution outputs
version 1
first run
time ./version1
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
runtime stack:
runtime.throw(0x2fb42a8e)
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/panic.c:520 +0x69
runtime.newstack()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/stack.c:770 +0x486
runtime.morestack()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/asm_amd64.s:228 +0x61
goroutine 16 [stack growth]:
main.main()
/Users/ec/repo/offers/lol/version1.go:3 fp=0x2b7b85f50 sp=0x2b7b85f48
runtime.main()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/proc.c:247 +0x11a fp=0x2b7b85fa8 sp=0x2b7b85f50
runtime.goexit()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/proc.c:1445 fp=0x2b7b85fb0 sp=0x2b7b85fa8
created by _rt0_go
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/asm_amd64.s:97 +0x120
goroutine 17 [runnable]:
runtime.MHeap_Scavenger()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/mheap.c:507
runtime.goexit()
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/proc.c:1445
./version1 0.00s user 0.10s system 1% cpu 7.799 total
second run
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
(identical stack trace to the first run)
./version1 0.00s user 0.10s system 98% cpu 0.102 total
version 2
first run
time ./version2
./version2 0.16s user 0.26s system 99% cpu 0.429 total
second run
time ./version2
./version2 0.17s user 0.25s system 97% cpu 0.421 total

In version 1, you're declaring an array literal, [100000000]int64{}, which the compiler allocates immediately.
In version 2, you're only declaring a with the type [100000000]int64.
When you only have a variable declaration, the contents aren't known at that point during compilation. In version 2, the compiler knows that a is of type [100000000]int64, but the memory isn't allocated until runtime.
When you use a literal, that exact memory representation is written into the binary. It works the same way as declaring a string literal vs a variable of type string; the string literal is written in place, while a variable declaration is only a placeholder.
Even though the current compiler (Go 1.3) allows a to escape to the heap, the literal data is expected to live in the stack frame. You can see this in the assembly output (the frame size is 800000016):
TEXT "".func1+0(SB),$800000016-0
If you actually need a larger literal than what can fit in the stack, you can place it in a global variable. The following executes just fine:
var a = [100000000]int64{1}

func func1() {
    var i int64
    for i = 0; i < 100000000; i++ {
        a[i] = i
    }
}
I did have to initialize at least one value in a here; it seems the compiler can elide the literal if it's equal to the zero value.

It's not actually an answer but maybe someone can find this useful.
In my case this happens when I try to json.Marshal the following structure:
type Element struct {
    Parent *Element
    Child  []*Element
}
where Parent points to an element whose Child slice contains the current element.
So I just mark Parent with a json:"-" tag so that it is ignored during marshalling.

Related

`.bss' is not within region `ram' error - esp32 ulp risc-v

I'm trying to copy array elements from one array to another.
One of the arrays uses an int32_t variable to track the current index.
int32_t pres_log_last_5_readings_raw[5];
int32_t final_arr_pressure[100];
int32_t final_array_index = 0;

for (int i = 0; i < 4; i++)
{
    final_arr_pressure[final_array_index] = pres_log_last_5_readings_raw[i]; // This line will compile without error if the next line is commented
    final_array_index++;
}
If the line is active I will get this error:
/Users/user/.espressif/tools/riscv32-esp-elf/esp-2022r1-11.2.0/riscv32-esp-elf/bin/../lib/gcc/riscv32-esp-elf/11.2.0/../../../../riscv32-esp-elf/bin/ld: address 0x10dc of ulp_main section `.bss' is not within region `ram'
/Users/user/.espressif/tools/riscv32-esp-elf/esp-2022r1-11.2.0/riscv32-esp-elf/bin/../lib/gcc/riscv32-esp-elf/11.2.0/../../../../riscv32-esp-elf/bin/ld: address 0x10dc of ulp_main section `.bss' is not within region `ram'
collect2: error: ld returned 1 exit status
I've tried many variations but nothing seems to work; I'm not sure whether the problem is with my C skills or with the ESP-IDF toolchain.
Solved:
The right answer is indeed that I was running out of memory; reducing the size of the arrays to 50 fixed the problem.
I do need to mention that I should have increased the memory allocation for RTC slow memory from 4 KB to 8 KB in idf.py menuconfig! That would have solved the problem without reducing the size to 50 (in my case).
Thanks everyone
This error message says that your uninitialized static storage data (stored in the .bss section) is too big. Basically, you ran out of RAM. You need to reduce the number and size of your variables.
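As a rough, hedged illustration of where that memory goes (based only on the declarations in the question, not the full project): uninitialised globals in a ULP program end up in .bss, which has to fit in the RTC slow memory reserved for the ULP, alongside the rest of the program's data.
#include <stdint.h>

int32_t pres_log_last_5_readings_raw[5];   /*  5 * 4 =  20 bytes of .bss */
int32_t final_arr_pressure[100];           /* 100 * 4 = 400 bytes of .bss */
int32_t final_array_index;                 /*            4 bytes of .bss */

/* Shrinking final_arr_pressure to 50 elements (the fix the asker used)
 * halves its footprint to 200 bytes; raising the reserved RTC slow
 * memory in menuconfig is the other option mentioned above. */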

Why are some local variables not listed in the corresponding stack frame when inspected using GDB?

I have a piece of code in C as shown below-
In a .c file-
1 custom_data_type2 myFunction1(custom_data_type1 a, custom_data_type2 b)
2 {
3 int c=foo();
4 custom_data_type3 t;
5 check_for_ir_path();
6 ...
7 ...
8 }
9
10 custom_data_type4 myFunction2(custom_data_type3 c, const void* d)
11 {
12 custom_data_type4 e;
13 struct custom_data_type5 f;
14 check_for_ir_path();
15 ...
16 temp = myFunction1(...);
17 return temp;
18 }
In a header file-
1 void CRASH_DUMP(int *i)
2 __attribute__((noinline));
3
4 #define INTRPT_FORCE_DUMMY_STACK 3
5
6 #define check_for_ir_path() { \
7 if (checkfunc1() && !checkfunc2()) { \
8 int sv = INTRPT_FORCE_DUMMY_STACK; \
9 ...
10 CRASH_DUMP(&sv);\
11 }\
12 }\
In an unknown scenario, there is a crash.
After processing the core dump using GDB, we get the call stack like -
#0 0x00007ffa589d9619 in myFunction1 [...]
(custom_data_type1=0x8080808080808080, custom_data_type2=0x7ff9d77f76b8) at ../xxx/yyy/zzz.c:5
sv = 32761
t = <optimized out>
#1 0x00007ffa589d8f91 in myFunction2 [...]
(custom_data_type3=<optimized out>, d=0x7ff9d77f7748) at ../xxx/yyy/zzz.c:16
sv = 167937677
f = {
...
}
If you look at the function myFunction1, there are three local variables: c, t, and sv (defined as part of the macro definition). However, in frame 0 of the backtrace we see only two local variables, t and sv; the variable c is not listed.
The same is true for myFunction2: there are three local variables, e, f, and sv (defined as part of the macro definition), yet in frame 1 of the backtrace we see only two local variables, f and sv; the variable e is not listed.
Why is the behavior like this?
Shouldn't any non-static variable declared inside a function be placed on the call stack during execution, and therefore be listed by backtrace full? Yet some of the local variables are missing from the backtrace. Could someone provide an explanation?
Objects local to a C function often do not appear on the stack because optimization during compilation often makes it unnecessary to store them there. In general, while an implementation of the C abstract machine may be viewed as storing objects local to a function on the stack, the actual implementation on a real processor after compilation and optimization may be very different. In particular:
An object local to a function may be created and used only inside a processor register. When there are enough processor registers to hold a function’s local objects, or some of them, there is no point in writing them to memory, so optimized code will not do so.
Optimization may eliminate a local object completely or fold it into other values. For example, given void foo(int x) { int t = 10; bar(x+2*t); … }, the compiler may merely generate code that adds an immediate value of 20 to x, with the result that neither 10 nor any other instantiation of t ever appears on stack, in a register, or even in the immediate operand of an instruction. It simply does not exist in the generated code because there was no need for it.
An object local to a function may appear on the stack at one point during a function’s code but not at others. And the places it appears may differ from place to place in the code. For example, with { int t = x*x; … bar(t); … t = x/3; … bar(t); … }, the compiler may decide to stash the first value of t in one place on the stack. But the second value assigned to t is effectively a separate lifetime, and the compiler may stash it in another place on the stack (or not at all, per the above). In a good implementation, the debugger may be aware of these different places and display the stored value of t while the program counter is in a matching section of code. And, while the program counter is not in a matching section of code, t may effectively not exist, and the debugger could report it is optimized out at that point.
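Not from this thread, but a common workaround sketch if you need a particular local to remain inspectable in the debugger: compile that file with -O0 or -Og, or force the object to have a real memory location with volatile. Reusing the made-up foo example from above:
#include <stdio.h>

void bar(int v) { printf("%d\n", v); }

void foo(int x)
{
    /* volatile forces the compiler to keep an actual slot for t instead of
     * folding it into the constant 20, so the debugger can always show it;
     * the price is slightly slower code. */
    volatile int t = 10;
    bar(x + 2 * t);
}

int main(void)
{
    foo(1);
    return 0;
}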

How many recursive function calls cause a stack overflow?

I am working on a simulation problem written in C; the main part of my program is a recursive function.
When the recursion depth reaches approximately 500000, it seems a stack overflow occurs.
Q1: Is this normal?
Q2: In general, how many recursive function calls cause a stack overflow?
Q3: In the code below, can removing the local variable neighbor prevent the stack overflow?
My code:
/*
 * recursive function to form Wolff Cluster (= WC)
 */
void grow_Wolff_cluster(lattic* l, Wolff* wolff, site *seed){
    /* a neighbor of site seed */
    site* neighbor;
    /* go through all neighbors of seed */
    for (int i = 0; i < neighbors; ++i) {
        neighbor = seed->neighbors[i];
        /* add to WC according to the Wolff Algorithm */
        if (neighbor->spin == seed->spin && neighbor->WC == -1 && ((double)rand() / RAND_MAX) < add_probability)
        {
            wolff->Wolff_cluster[wolff->WC_pos] = neighbor;
            wolff->WC_pos++;  // the number of sites that is added to WC
            neighbor->WC = 1; // for avoiding multiple addition of a site
            neighbor->X = 0;
            ///controller_site_added_to_WC();
            /* continue growing Wolff cluster (recursion) */
            grow_Wolff_cluster(l, wolff, neighbor);
        }
    }
}
Is this normal?
Yes. There's only so much stack size.
In the code below, can removing the local variable neighbor prevent the stack overflow?
No. Even with no variables and no return values, each call must still store at least a return address on the stack so the calls can eventually be unwound.
For example...
void recurse() {
    recurse();
}

int main (void)
{
    recurse();
}
This still overflows the stack.
$ ./test
ASAN:DEADLYSIGNAL
=================================================================
==94371==ERROR: AddressSanitizer: stack-overflow on address 0x7ffee7f80ff8 (pc 0x00010747ff14 bp 0x7ffee7f81000 sp 0x7ffee7f81000 T0)
#0 0x10747ff13 in recurse (/Users/schwern/tmp/./test+0x100000f13)
SUMMARY: AddressSanitizer: stack-overflow (/Users/schwern/tmp/./test+0x100000f13) in recurse
==94371==ABORTING
Abort trap: 6
In general, how many recursive function calls cause a stack overflow?
That depends on your environment and function calls. Here on OS X 10.13 I'm limited to 8192K by default.
$ ulimit -s
8192
This simple example, compiled with clang -g, can recurse 261976 times. With -O3 I can't get it to overflow; I suspect compiler optimizations have eliminated my simple recursion.
#include <stdio.h>

void recurse() {
    puts("Recurse");
    recurse();
}

int main (void)
{
    recurse();
}
Add an integer argument and it's 261933 times.
#include <stdio.h>

void recurse(int cnt) {
    printf("Recurse %d\n", cnt);
    recurse(++cnt);
}

int main (void)
{
    recurse(1);
}
Add a double argument, now it's 174622 times.
#include <stdio.h>

void recurse(int cnt, double foo) {
    printf("Recurse %d %f\n", cnt, foo);
    recurse(++cnt, foo);
}

int main (void)
{
    recurse(1, 2.3);
}
Add some stack variables and it's 104773 times.
#include <stdio.h>

void recurse(int cnt, double foo) {
    double this = 42.0;
    double that = 41.0;
    double other = 40.0;
    double thing = 39.0;
    printf("Recurse %d %f %f %f %f %f\n", cnt, foo, this, that, other, thing);
    recurse(++cnt, foo);
}

int main (void)
{
    recurse(1, 2.3);
}
And so on. But I can increase my stack size in this shell and get twice the calls.
$ ./test 2> /dev/null | wc -l
174622
$ ulimit -s 16384
$ ./test 2> /dev/null | wc -l
349385
I have a hard upper limit on how big I can make the stack: 65,532K, or 64M.
$ ulimit -Hs
65532
A stack overflow isn’t defined by the C standard, but by the implementation. The C standard defines a language with unlimited stack space (among other resources) but does have a section about how implementations are allowed to impose limits.
Usually it’s the operating system that actually first creates the error. The OS doesn’t care about how many calls you make, but about the total size of the stack. The stack is composed of stack frames, one for each function call. Usually a stack frame consists of some combination of the following five things (as an approximation; details can vary a lot between systems):
1. The parameters to the function call (probably not actually here, in this case; they're probably in registers, although this doesn't actually buy anything with recursion).
2. The return address of the function call (in this case, the address of the ++i instruction in the for loop).
3. The base pointer where the previous stack frame starts.
4. Local variables (at least those that don't go in registers).
5. Any registers the caller wants to save when it makes a new function call, so the called function doesn't overwrite them (some registers may be saved by the callee instead, but it doesn't particularly matter for stack size analysis). This is why passing parameters in registers doesn't help much in this case; they'll end up on the stack sooner or later.
Because some of these (specifically, 1., 4., and 5.) can vary in size by a lot, it can be difficult to estimate how big an average stack frame is, although it’s easier in this case because of the recursion. Different systems also have different stack sizes; it currently looks like by default I can have 8 MiB for a stack, but an embedded system would probably have a lot less.
This also explains why removing a local variable gives you more available function calls; you reduced the size of each of the 500,000 stack frames.
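If you want a rough feel for your own per-call frame size, a hedged probe like the one below (compiled without optimization) compares the address of a local between two consecutive recursion levels; comparing addresses from different stack frames is implementation-specific, so treat the number as an estimate only:
#include <stdio.h>
#include <stdint.h>

/* Each level records the address of a local and compares it with the
 * previous level's address, giving an approximate per-call frame size. */
static void probe(int depth, uintptr_t prev)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;

    if (prev != 0)
        printf("approx. frame size: %ld bytes\n", (long)(prev - here));
    if (depth < 3)
        probe(depth + 1, here);
}

int main(void)
{
    probe(0, 0);
    return 0;
}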
If you want to increase the amount of stack space available, look into the setrlimit(2) function (on Linux like the OP; it may be different on other systems). First, though, you might want to try debugging and refactoring to make sure you need all that stack space.
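For reference, a minimal sketch of raising the soft stack limit with setrlimit(2) at the start of the program, assuming a Linux-like system whose hard limit allows it; the 64 MiB figure is an arbitrary choice for illustration:
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("current soft stack limit: %llu bytes\n",
           (unsigned long long)rl.rlim_cur);

    /* Raise the soft limit to 64 MiB (must not exceed rl.rlim_max). */
    rl.rlim_cur = 64ull * 1024 * 1024;
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* ... start the deep recursion from here ... */
    return 0;
}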
Yes and no - if you come across a stack overflow in your code, it could mean a few things
Your algorithm is not implemented in a way that respects the amount of memory on the stack you have been given. You may adjust this amount to suit the needs of the algorithm.
If this is the case, it's more common to change the algorithm to use the stack more efficiently rather than to add more memory. Converting a recursive function to an iterative one, for example, saves a lot of precious memory (see the sketch after this answer).
It's a bug trying to eat all your RAM. You forgot a base case in the recursion or mistakenly called the same function. We've all done it at least 2 times.
It's not necessarily how many calls cause an overflow - it's dependent upon how much memory each individual call takes up on a stack frame. Each function call uses up stack memory until the call returns. Stack memory is statically allocated -- you can't change it at runtime (in a sane world). It's a last-in-first-out (LIFO) data structure behind the scenes.
It's not preventing it, it's just changing how many calls to grow_Wolff_cluster it takes to overflow the stack memory. On a 32-bit system, removing neighbor makes each call to grow_Wolff_cluster 4 bytes smaller. That adds up quickly when you multiply it by hundreds of thousands of calls.
I suggest you learn more about how the stack works; there are good explanations over on the Software Engineering Stack Exchange and here on Stack Overflow (zing!).
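To make the "convert the recursion to iteration" suggestion concrete, here is a hedged sketch of grow_Wolff_cluster driven by an explicit worklist instead of the call stack. The site, Wolff and lattic types, the neighbors count and add_probability are assumed to be the ones from the question; MAX_SITES and the pending[] array are made up for illustration. Since pending sites live in a static array rather than in stack frames, the depth of the cluster no longer matters:
enum { MAX_SITES = 1000000 };            /* assumed upper bound on cluster size */

void grow_Wolff_cluster_iterative(lattic *l, Wolff *wolff, site *seed)
{
    static site *pending[MAX_SITES];     /* worklist replacing the call stack */
    int top = 0;

    pending[top++] = seed;
    while (top > 0) {
        site *current = pending[--top];
        for (int i = 0; i < neighbors; ++i) {
            site *neighbor = current->neighbors[i];
            if (neighbor->spin == current->spin && neighbor->WC == -1 &&
                ((double)rand() / RAND_MAX) < add_probability) {
                wolff->Wolff_cluster[wolff->WC_pos] = neighbor;
                wolff->WC_pos++;
                neighbor->WC = 1;            /* mark so it is not added twice */
                neighbor->X = 0;
                pending[top++] = neighbor;   /* examine its neighbors later */
            }
        }
    }
    (void)l;   /* unused here, kept only to match the original signature */
}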
Each time a function recurses, your program takes up more memory on the stack; how much each call takes depends on the function and the variables within it. The maximum recursion depth is entirely dependent on your system.
There is no general number of recursions that will cause a stack overflow.
Removing the variable neighbor will allow the function to recurse further, since each call takes less memory, but it will still eventually overflow the stack.
This is a simple C# function that will show you how many iterations your computer can take before a stack overflow (for reference, I have run up to 10478):
private void button3_Click(object sender, EventArgs e)
{
    Int32 lngMax = 0;
    StackIt(ref lngMax);
}

private void StackIt(ref Int32 plngMax, Int32 plngStack = 0)
{
    if (plngStack > plngMax)
    {
        plngMax = plngStack;
        Console.WriteLine(plngMax.ToString());
    }
    plngStack++;
    StackIt(ref plngMax, plngStack);
}
In this simple case the condition check if (plngStack > plngMax) could be removed, but if you have a real recursive function, this check will help you localize the problem.

OpenCL 2.0 - work_group operations on CPU and GPU

I was testing the following code in order to perform a parallel array element addition in OpenCL 2.0 with the work-group built-in functions (inclusive_add and reduce_add in this case).
kernel void sum_groups(global const float *input,
                       global float *sum,
                       local float *scratch)
{
    uint local_id = get_local_id(0);
    scratch[local_id] = work_group_scan_inclusive_add(input[get_global_id(0)]);
    if (local_id == get_local_size(0) - 1)
        sum[get_group_id(0)] = work_group_reduce_add(scratch[local_id]);
}
If I test it with an array of floats from 0 to 15 with steps of 1, global_size = 16 and local_size = 4 I'm expecting as a result "6.0 22.0 38.0 54.0" and this works fine if I choose my CPU as a device.
But as soon as I choose the GPU and run the same code I get "0.0 4.0 8.0 12.0" (which is just the element in the first position for each work-group)
Am I missing something?
Things that I tried to do but didn't affect a thing:
Adding "barrier(CLK_LOCAL_MEM_FENCE)" before the "if"
Changing the local size and/or the array size / global size.
Notes:
I am passing the input array with clEnqueueWriteBuffer and then reading the sum with clEnqueueReadBuffer
CPU: i5 6200u
GPU: Intel HD Graphics 520
(yes they support OpenCL 2.0 and I can build the kernel successfully with ioc64 passing -cl-std=CL2.0 as I do while building the program at runtime)
You are getting different results because you are using work_group_reduce_add the wrong way.
The OpenCL 2.0 spec says:
This built-in function must be encountered by all work-items in a
work-group executing the kernel.
This isn't the case when you call work_group_reduce_add inside that if.
You need to remove the if statement from there altogether. By adding an if statement that lets only one work-item reach the call, you are calculating the sum of just one value, and that is what is returned to you.
After work_group_scan_inclusive_add the numbers should be as follows:
w1: 0,1,2,3 -> 0,1,3,6
w2: 4,5,6,7 -> 4,9,15,22
w3: 8,9,10,11 -> 8,17,27,38
w4: 12,13,14,15 -> 12,25,39,54
After work_group_reduce_add:
w1: 10
w2: 50
w3: 90
w4: 130
And a second thing from the spec:
NOTE: The order of floating-point operations is not guaranteed for the
work_group_reduce_<op>, work_group_scan_inclusive_<op> and
work_group_scan_exclusive_<op> built-in functions that operate on
half, float and double data types. The order of these floating-point
operations is also non-deterministic for a given workgroup.
So the results after the inclusive scan that I calculated may not necessarily be the same, and this is what you are observing in what the GPU returns (the GPU is returning 0, 4, 8, 12, which happens to be the first element of each work-group).
To summarize: removing the if statement before work_group_reduce_add should fix the issue; a corrected sketch follows below.
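A hedged sketch of a corrected kernel, assuming the goal is one partial sum per work-group; the kernel name is made up. Note that work_group_reduce_add already returns the group-wide sum to every work-item, so the separate scratch/scan step isn't needed at all:
// Every work-item in the group executes the reduce; one of them stores the result.
kernel void group_sum(global const float *input, global float *sum)
{
    float group_total = work_group_reduce_add(input[get_global_id(0)]);
    if (get_local_id(0) == 0)
        sum[get_group_id(0)] = group_total;
}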

Strange stack overflow?

I'm running into a strange situation with passing a pointer to a structure with a very large array defined in the struct definition, a float array around 34MB in size. In a nutshell, the pseudo-code looks like this:
typedef struct config_t {
    ...
    float values[64000][64];
} CONFIG;

int32_t Create_Structures(CONFIG **the_config)
{
    CONFIG *local_config;
    int32_t number_nodes;

    number_nodes = Find_Nodes();
    local_config = (CONFIG *)calloc(number_nodes, sizeof(CONFIG));
    *the_config = local_config;
    return(number_nodes);
}

int32_t Read_Config_File(CONFIG *the_config)
{
    /* do init work here */
    return(SUCCESS);
}

main()
{
    CONFIG *the_config;
    int32_t number_nodes, rc;

    number_nodes = Create_Structures(&the_config);
    rc = Read_Config_File(the_config);
    ...
    exit(0);
}
The code compiles fine, but when I try to run it, I get a SIGSEGV at the { beneath Read_Config_File().
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
763 {
(gdb) bt
#0 0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
#1 0x00000000004068d2 in main (argc=1, argv=0x7fffffffe448) at ../src/main.c:148
I've done this sort of thing all the time, with smaller arrays. And strangely, 0x7fffffffe448 - 0x7ffffdf45428 = 0x20B8EF8, or about the 34MB of my float array.
Valgrind will give me similar output:
==10894== Warning: client switching stacks? SP change: 0x7ff000290 --> 0x7fcf47398
==10894== to suppress, use: --max-stackframe=34311928 or greater
==10894== Invalid write of size 8
==10894== at 0x407D0A: Read_Config_File (config_parsing.c:763)
==10894== by 0x4068D1: main (main.c:148)
==10894== Address 0x7fcf47398 is on thread 1's stack
The error messages all point to me clobbering the stack pointer, but a) I've never run across one that crashes on entry of the function and b) I'm passing pointers around, not the actual array.
Can someone help me out with this? I'm on a 64-bit CentOS box running kernel 2.6.18 and gcc 4.1.2
Thanks!
Matt
You've blown up the stack by allocating one of these huge config_t structs onto it. The two stack pointers in evidence in the gdb output, 0x7fffffffe448 and 0x7ffffdf45428, are very suggestive of this.
$ gdb
GNU gdb 6.3.50-20050815 ...blahblahblah...
(gdb) p 0x7fffffffe448 - 0x7ffffdf45428
$1 = 34312224
There's your ~34MB constant that matches the size of the config_t struct. Systems don't give you that much stack space by default, so either move the object off the stack or increase your stack space.
The short answer is that there must be a config_t declared as a local variable somewhere, which would put it on the stack. Probably a typo: missing * after a CONFIG declaration somewhere.
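A hedged illustration of the kind of typo that answer suspects; the function body below is made up, only the CONFIG type comes from the question:
int32_t Read_Config_File(CONFIG *the_config)
{
    CONFIG  scratch;      /* BUG: a whole ~34MB CONFIG in this stack frame;  */
                          /* the frame setup blows the stack limit on entry  */
    CONFIG *cursor;       /* OK: just an 8-byte pointer on the stack         */

    cursor = the_config;  /* work through the heap-allocated array instead   */
    (void)scratch;
    return(SUCCESS);
}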
