Does Go garbage collect parts of slices?

If I implement a queue like this...
package main
import(
"fmt"
)
func PopFront(q *[]string) string {
r := (*q)[0]
*q = (*q)[1:len(*q)]
return r
}
func PushBack(q *[]string, a string) {
*q = append(*q, a)
}
func main() {
q := make([]string, 0)
PushBack(&q, "A")
fmt.Println(q)
PushBack(&q, "B")
fmt.Println(q)
PushBack(&q, "C")
fmt.Println(q)
PopFront(&q)
fmt.Println(q)
PopFront(&q)
fmt.Println(q)
}
... I end up with an array ["A", "B", "C"] that has no slices pointing to the first two elements. Since the "start" pointer of a slice can never be decremented (AFAIK), those elements can never be accessed.
Is Go's garbage collector smart enough to free them?

Slices are just descriptors (small struct-like data structures) which if not referenced will be garbage collected properly.
The underlying array for a slice (to which the descriptor points), on the other hand, is shared between all slices that are created by reslicing it. Quoting from the Go Language Specification, Slice Types:
A slice, once initialized, is always associated with an underlying array that holds its elements. A slice therefore shares storage with its array and with other slices of the same array; by contrast, distinct arrays always represent distinct storage.
Therefore, as long as at least one slice referring to the array exists, or a variable holding the array (if a slice was created by slicing the array), the array will not be garbage collected.
Official Statement about this:
The blog post Go Slices: usage and internals by Andrew Gerrand clearly states this behaviour:
As mentioned earlier, re-slicing a slice doesn't make a copy of the underlying array. The full array will be kept in memory until it is no longer referenced. Occasionally this can cause the program to hold all the data in memory when only a small piece of it is needed.
...
Since the slice references the original array, as long as the slice is kept around the garbage collector can't release the array.
Back to your example
While the underlying array will not be freed, note that if you add new elements to the queue, the built-in append function might occasionally allocate a new array and copy the current elements to the new one. The copy includes only the elements of the slice, not the whole underlying array! When such a reallocation and copying occurs, the "old" array may be garbage collected if no other reference to it exists.
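The reallocation behaviour described above can be observed with a small sketch. The helper names popThenGrow and sameBackingStart are made up for this illustration, and the capacity of 3 is chosen only so that the next append is forced to reallocate. Note that the sketch keeps the old slice around for comparison, which by itself keeps the old array alive.

```go
package main

import "fmt"

// sameBackingStart reports whether two non-empty slices start at the same
// array element (hypothetical helper for this illustration).
func sameBackingStart(a, b []string) bool {
	return &a[0] == &b[0]
}

// popThenGrow fills a queue to its capacity of 3, pops two elements from
// the front, and then appends once more so that append has to reallocate.
func popThenGrow() (before, after []string) {
	q := make([]string, 0, 3)
	q = append(q, "A", "B", "C")
	q = q[2:] // pop "A" and "B": len 1, cap 1
	before = q
	q = append(q, "D") // exceeds the remaining capacity: new array, only "C" is copied
	return before, q
}

func main() {
	before, after := popThenGrow()
	fmt.Println(len(before), cap(before))        // 1 1
	fmt.Println(after)                           // [C D]
	fmt.Println(sameBackingStart(before, after)) // false
}
```

Once nothing refers to the old three-element array any more, it becomes eligible for collection as a whole.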
Another very important point: if an element is popped from the front, the slice is resliced and no longer contains a reference to the popped element, but since the underlying array still contains that value, the value also remains in memory (not just the array). It is recommended that whenever an element is popped or removed from your queue (slice/array), you always zero it (its respective element in the slice) so the value does not remain in memory needlessly. This becomes even more critical if your slice contains pointers to big data structures.
func PopFront(q *[]string) string {
r := (*q)[0]
(*q)[0] = "" // Always zero the removed element!
*q = (*q)[1:len(*q)]
return r
}
This is mentioned on the Slice Tricks wiki page:
Delete without preserving order
a[i] = a[len(a)-1]
a = a[:len(a)-1]
NOTE If the type of the element is a pointer or a struct with pointer fields, which need to be garbage collected, the above implementations of Cut and Delete have a potential memory leak problem: some elements with values are still referenced by slice a and thus can not be collected.
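As a minimal sketch of the fix that note suggests, the unordered delete can clear the vacated last slot before shrinking the slice. The type big and the function deleteUnordered are hypothetical names used only for this example.

```go
package main

import "fmt"

// big stands in for a large structure that we want collected promptly.
type big struct{ payload [1 << 10]byte }

// deleteUnordered removes element i without preserving order. It clears
// the vacated last slot so the backing array does not keep a stale extra
// pointer beyond the new length.
func deleteUnordered(a []*big, i int) []*big {
	a[i] = a[len(a)-1]
	a[len(a)-1] = nil
	return a[:len(a)-1]
}

func main() {
	a := []*big{{}, {}, {}}
	last := a[2]
	a = deleteUnordered(a, 0)
	fmt.Println(len(a), a[0] == last) // 2 true
	fmt.Println(a[:3][2] == nil)      // true: the hidden slot was cleared
}
```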

No. At the time of this writing, the Go garbage collector (GC) is not smart enough to collect the beginning of an underlying array in a slice, even if it is inaccessible.
As mentioned by others here, a slice (under the hood) is a struct of exactly three things: a pointer to its underlying array, the length of the slice (values accessible without reslicing), and the capacity of the slice (values accessible by reslicing). On the Go blog, slice internals are discussed at length. Here is another article I like about Go memory layouts.
When you reslice and cut off the tail end of a slice, it is obvious (upon understanding the internals) that the underlying array, the pointer to the underlying array, and the slice's capacity are all left unchanged; only the slice length field is updated. When you re-slice and cut off the beginning of a slice, you are really changing the pointer to the underlying array along with the length and capacity. In this case, it is generally unclear (based on my readings) why the GC does not clean up this inaccessible part of the underlying array because you cannot re-slice the array to access it again. My assumption is that the underlying array is treated as one block of memory from the GC's point of view. If you can point to any part of the underlying array, the entire thing is ineligible for deallocation.
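The difference between the two kinds of reslicing can be checked directly. This is only an illustrative sketch; describe is a hypothetical helper, and the sizes 5 and 8 are arbitrary.

```go
package main

import "fmt"

// describe returns the length and capacity of s, and whether s still
// starts at the same array element as base.
func describe(base, s []int) (length, capacity int, sameStart bool) {
	return len(s), cap(s), &s[0] == &base[0]
}

func main() {
	base := make([]int, 5, 8)

	// Cutting off the tail: only the length changes.
	fmt.Println(describe(base, base[:3])) // 3 8 true

	// Cutting off the front: pointer, length and capacity all change.
	fmt.Println(describe(base, base[2:])) // 3 6 false
}
```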
I know what you're thinking... like the true computer scientist you are, you may want some proof. I'll indulge you:
https://goplay.space/#tDBQs1DfE2B
As mentioned by others and as shown in the sample code, using append can cause a reallocation and copy of the underlying array, which allows the old underlying array to be garbage collected.

Simple question, simple answer: No. (But if you keep pushing, the slice will at some point overflow its underlying array; then the unused elements become available to be freed.)

Contrary to what I'm reading, Golang certainly seems to garbage collect at least the unused starting sections of slices. The following test case provides evidence.
In the first case the slice is set to slice[1:] in each iteration. In the comparison case, that step is skipped.
The second case dwarfs the memory consumed in the first case. But why?
func TestArrayShiftMem(t *testing.T) {
slice := [][1024]byte{}
mem := runtime.MemStats{}
mem2 := runtime.MemStats{}
runtime.GC()
runtime.ReadMemStats(&mem)
for i := 0; i < 1024*1024*1024*1024; i++ {
slice = append(slice, [1024]byte{})
slice = slice[1:]
runtime.GC()
if i%(1024) == 0 {
runtime.ReadMemStats(&mem2)
fmt.Println(mem2.HeapInuse - mem.HeapInuse)
fmt.Println(mem2.StackInuse - mem.StackInuse)
fmt.Println(mem2.HeapAlloc - mem.HeapAlloc)
}
}
}
func TestArrayShiftMem3(t *testing.T) {
slice := [][1024]byte{}
mem := runtime.MemStats{}
mem2 := runtime.MemStats{}
runtime.GC()
runtime.ReadMemStats(&mem)
for i := 0; i < 1024*1024*1024*1024; i++ {
slice = append(slice, [1024]byte{})
// slice = slice[1:]
runtime.GC()
if i%(1024) == 0 {
runtime.ReadMemStats(&mem2)
fmt.Println(mem2.HeapInuse - mem.HeapInuse)
fmt.Println(mem2.StackInuse - mem.StackInuse)
fmt.Println(mem2.HeapAlloc - mem.HeapAlloc)
}
}
}
Output Test1:
go test -run=.Mem -v .
...
0
393216
21472
^CFAIL github.com/ds0nt/cs-mind-grind/arrays 1.931s
Output Test3:
go test -run=.Mem3 -v .
...
19193856
393216
19213888
^CFAIL github.com/ds0nt/cs-mind-grind/arrays 2.175s
If you disable garbage collection on the first test, indeed memory skyrockets. The resulting code looks like this:
func TestArrayShiftMem2(t *testing.T) {
debug.SetGCPercent(-1)
slice := [][1024]byte{}
mem := runtime.MemStats{}
mem2 := runtime.MemStats{}
runtime.GC()
runtime.ReadMemStats(&mem)
// 1kb per
for i := 0; i < 1024*1024*1024*1024; i++ {
slice = append(slice, [1024]byte{})
slice = slice[1:]
// runtime.GC()
if i%(1024) == 0 {
fmt.Println("len, cap:", len(slice), cap(slice))
runtime.ReadMemStats(&mem2)
fmt.Println(mem2.HeapInuse - mem.HeapInuse)
fmt.Println(mem2.StackInuse - mem.StackInuse)
fmt.Println(mem2.HeapAlloc - mem.HeapAlloc)
}
}
}

Related

Copying a struct's slice parameter in Go

Let's say I have a struct with a slice parameter like the following, and I create one and fill it with some values:
type s struct {
sl []float32
}
func NewS() *s {
return &s{
sl: make([]float32, 3),
}
}
func main() {
a := NewS()
a.sl[0] = 1
a.sl[1] = 2
a.sl[2] = 3
b := NewS()
// Code here
}
I want b to have an sl with the same values as a.sl, and in particular, I'd like to understand how to do it in different ways:
Where the entire b points to the entire a (so that if there were other parameters in the struct, those would all match)
Where b.sl would point to a.sl
As a deep copy, where there are no pointers (and done not by iterating through the slice, but by a way that's repeatable regardless of type, ideally)
What follows is what I've tried and found so far.
b = a
fmt.Printf("a: %p\n", &a)
fmt.Printf("b: %p\n", &b)
fmt.Printf("a.sl: %p\n", &a.sl)
fmt.Printf("b.sl: %p\n", &b.sl)
Which outputs different memory addresses for the structs, but the same memory addresses for the slices. Why is that? Why wouldn't the struct addresses be the same, too, if a is just pointing to b?
a: 0xc00000e030
b: 0xc00000e038
a.sl: 0xc000054040
b.sl: 0xc000054040
Then I tried:
b.sl = a.sl
// ...
Which outputs all distinct memory addresses. It makes sense of course that the address of a and b are different, but why isn't b.sl pointing to the address of a.sl?
a: 0xc00000e028
b: 0xc00000e030
a.sl: 0xc00000c030
b.sl: 0xc00000c048
And finally I tried:
copy(b.sl, a.sl)
// ...
Which outputs similarly to the above. This result is the only one I expected, since I expected this to make a deep copy.
a: 0xc0000b8018
b: 0xc0000b8020
a.sl: 0xc0000ae018
b.sl: 0xc0000ae030
Could you help me understand what's happening in these 3 cases and other ways to achieve my desired results and perform various types of copies?
Here's some background information: A slice contains a pointer to the slice's backing array, the length of the slice and the capacity of the backing array. The (pointer, length, capacity) is referred to as a slice header.
The address of a slice is the address of the slice header, not the pointer to the slice's backing array or the first element of the slice's backing array.
b = a
The variables a and b point to the same value after the statement is executed.
The variables a and b are distinct and therefore have different addresses.
The expression &a.sl is the address of the sl field in the pointed at value. The expression &b.sl equals &a.sl because a and b point at the same value.
This is your #1: a and b point at the same value.
b.sl = a.sl
The statement copies the slice header from a.sl to b.sl. The slices have different addresses because slice headers remain distinct.
This is close to your #2: a.sl and b.sl point at the same backing array. A modification to an element in a.sl is visible through b.sl, but changes to the slice header a.sl are not reflected in b.sl.
copy(b.sl, a.sl)
The statement copies the elements in a.sl's backing array to b.sl's backing array.
This is your #3, a deep copy. Changes through a are not reflected in b.
A slice is a triple containing a pointer to the underlying array, length, and capacity. When you assign a slice to another slice, you assign that triple. The underlying array remains the same.
A slice is a view of an array. If you need a deep copy of a slice, you have to create another slice and copy it yourself.
If you assign an array to another, that copies each element of the source to the target.
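The three cases discussed above can be put side by side in one short sketch. It reuses the struct s from the question; deepCopy and the variable names alias, shared and deep are hypothetical and exist only for this example.

```go
package main

import "fmt"

type s struct {
	sl []float32
}

// deepCopy returns a new struct whose slice has its own backing array.
func deepCopy(a *s) *s {
	b := &s{sl: make([]float32, len(a.sl))}
	copy(b.sl, a.sl)
	return b
}

func main() {
	a := &s{sl: []float32{1, 2, 3}}

	alias := a             // case 1: both pointers refer to one struct
	shared := &s{sl: a.sl} // case 2: separate struct, shared backing array
	deep := deepCopy(a)    // case 3: separate struct and backing array

	a.sl[0] = 42
	fmt.Println(alias.sl[0], shared.sl[0], deep.sl[0]) // 42 42 1

	a.sl = append(a.sl, 4) // changes the slice header stored in *a only
	fmt.Println(len(alias.sl), len(shared.sl), len(deep.sl)) // 4 3 3
}
```

The last line shows why case 2 is only "close to" sharing: element writes are visible through both headers, but a change to one header's length is not.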

Is returning a slice of a local array in a Go function safe?

What happens if I return a slice of an array that is a local variable of a function or method? Does Go copy the array data into a slice created with make()? Will the capacity match the slice size or the array size?
func foo() []uint64 {
var tmp [100]uint64
end := 0
...
for ... {
...
tmp[end] = uint64(...)
end++
...
}
...
return tmp[:end]
}
This is detailed in Spec: Slice expressions.
The array will not be copied; instead, the result of the slice expression will be a slice that refers to the array. In Go it is perfectly safe to return local variables or their addresses from functions or methods. The Go compiler performs escape analysis to determine if a value may escape the function, and if so (or rather if it can't prove that a value may not escape), it allocates it on the heap so it will be available after the function returns.
The slice expression: tmp[:end] means tmp[0:end] (because a missing low index defaults to zero). Since you didn't specify the capacity, it will default to len(tmp) - 0 which is len(tmp) which is 100.
You can also control the result slice's capacity by using a full slice expression, which has the form:
a[low : high : max]
Which sets the resulting slice's capacity to max - low.
More examples to clarify the resulting slice's length and capacity:
var a [100]int
s := a[:]
fmt.Println(len(s), cap(s)) // 100 100
s = a[:50]
fmt.Println(len(s), cap(s)) // 50 100
s = a[10:50]
fmt.Println(len(s), cap(s)) // 40 90
s = a[10:]
fmt.Println(len(s), cap(s)) // 90 90
s = a[0:50:70]
fmt.Println(len(s), cap(s)) // 50 70
s = a[10:50:70]
fmt.Println(len(s), cap(s)) // 40 60
s = a[:50:70]
fmt.Println(len(s), cap(s)) // 50 70
Try it on the Go Playground.
Avoiding heap allocation
If you want to allocate it on the stack, you can't return any value that points to it (or parts of it). If it were allocated on the stack, there would be no guarantee that it remains available after the function returns.
A possible solution to this would be to pass a pointer to an array as an argument to the function (and you may return a slice designating the useful part that the function filled), e.g.:
func foo(tmp *[100]uint64) []uint64 {
// ...
return tmp[:end]
}
If the caller function creates the array (on the stack), this will not cause a "reallocation" or "moving" to the heap:
func main() {
var tmp [100]uint64
foo(&tmp)
}
Running go run -gcflags '-m -l' play.go, the result is:
./play.go:8: leaking param: tmp to result ~r1 level=0
./play.go:5: main &tmp does not escape
The variable tmp is not moved to heap.
Note that [100]uint64 is considered a small array to be allocated on the stack. For details see What is considered "small" object in Go regarding stack allocation?
Nothing wrong happens.
Go does not make a copy, but the compiler performs escape analysis and allocates on the heap any variable that will be visible outside the function.
The capacity will be the capacity of underlying array.
The data is not copied. The array will be used as the underlying array of the slice.
It looks like you're worrying about the life time of the array, but the compiler and garbage collector will figure that out for you. It's as safe as returning pointers to "local variables".
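A complete, runnable variant of the pattern from the question might look like the sketch below. The function name firstSquares and the values it stores are invented for illustration; only the shape (fill a local array, return a slice of it) comes from the question.

```go
package main

import "fmt"

// firstSquares fills a local array and returns a slice of its first n
// elements, with n clamped to the range 0..100. Because the slice
// outlives the call, the array is expected to be heap-allocated.
func firstSquares(n int) []uint64 {
	var tmp [100]uint64
	if n < 0 {
		n = 0
	}
	if n > len(tmp) {
		n = len(tmp)
	}
	for i := 0; i < n; i++ {
		tmp[i] = uint64(i * i)
	}
	return tmp[:n]
}

func main() {
	sq := firstSquares(4)
	fmt.Println(sq, len(sq), cap(sq)) // [0 1 4 9] 4 100
}
```

The returned slice stays valid after the call, and its capacity is that of the whole array, as the answers above describe.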

unsafe Pointers in Go: function call end kills array

I'm writing a library and I want to return an array (or write to an array) of an unspecific type to the caller. The type can vary, depending on who calls - I can, however, create as many objects of said type from within my function. One way would be that the caller creates an array and the callee fills that - however, there is no way of telling how long this array is going to be. (Is there a way that the callee makes the caller's array bigger? Remember, the callee only sees x interface{}...)
The other way, which I chose because I don't see how the above is possible, is that the caller gives me a pointer to his specific type and I redirect it to the array of objects which I created.
Below is my solution. My question: why is the array empty after the function call? They are pointing to the same array after my operation, they should be the same. Am I overlooking something? I thought about GC, but it couldn't be that fast, could it?
http://play.golang.org/p/oVoPx5Nf84
package main
import "unsafe"
import "reflect"
import "log"
func main() {
var x []string
log.Printf("before: %v, %p", x, x)
manipulate(&x)
log.Printf("after: %v, %p", x, x)
}
func manipulate(target interface{}) {
new := make([]string, 0, 10)
new = append(new, "Hello", "World")
log.Printf("new: %v, %p", new, new)
p := unsafe.Pointer(reflect.ValueOf(target).Pointer())
ptr := unsafe.Pointer(reflect.ValueOf(new).Pointer())
*(*unsafe.Pointer)(p) = ptr
}
First of all, unsafe is usually a bad idea. So is reflection, but unsafe is at least an order of magnitude worse.
Here is your example using pure reflection (http://play.golang.org/p/jTJ6Mhg8q9):
package main
import (
"log"
"reflect"
)
func main() {
var x []string
log.Printf("before: %v, %p", x, x)
manipulate(&x)
log.Printf("after: %v, %p", x, x)
}
func manipulate(target interface{}) {
t := reflect.Indirect(reflect.ValueOf(target))
t.Set(reflect.Append(t, reflect.ValueOf("Hello"), reflect.ValueOf("World")))
}
So, why didn't your unsafe way work? Unsafe is extremely tricky and at times requires understanding the internals. First, some misconceptions you have:
You are using arrays: you are not, you are using slices. Slices are a view of an array. They contain within them a pointer to the data, a length, and a capacity. They are structs internally and not pure pointers.
Pointer returns the pointer only if it is a pointer: it can actually return a value for many types like a slice.
From http://golang.org/pkg/reflect/#Value.Pointer:
If v's Kind is Slice, the returned pointer is to the first element of the slice. If the slice is nil the returned value is 0. If the slice is empty but non-nil the return value is non-zero.
Arrays are pointers: in Go, arrays are actually values. That means they are copied when passed to other functions or assigned. It also means the .Pointer method wouldn't work.
You are assigning a pointer to an array to a slice type. By luck, the implementation of slices used internally has the data pointer first, so you are actually setting the internal array pointer used by the slice. I must stress that this is effectively pure accident. Even so, you are not setting the length and capacity of the slice, so it still prints zero elements.
Unsafe lets you do things at such a low level that the actual results aren't really defined. It is best to stay away from it unless you really know what you are doing. Even then, be aware that things can change and what works today may not work in the next version of Go or another implementation.
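Two of the points above can be checked with a short read-only sketch: a slice value is larger than a single pointer, and Value.Pointer on a slice yields the address of its first element. The helper names headerWords and pointsAtFirstElement are hypothetical, and the three-word size is an implementation detail of the standard gc toolchain, not a guarantee of the language.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// headerWords reports how many machine words a []string value occupies.
func headerWords() uintptr {
	var x []string
	return unsafe.Sizeof(x) / unsafe.Sizeof(uintptr(0))
}

// pointsAtFirstElement reports whether reflect's Pointer for a non-empty
// slice equals the address of the slice's first element.
func pointsAtFirstElement(x []string) bool {
	return reflect.ValueOf(x).Pointer() == uintptr(unsafe.Pointer(&x[0]))
}

func main() {
	fmt.Println(headerWords())                                    // 3 with the gc toolchain
	fmt.Println(pointsAtFirstElement([]string{"Hello", "World"})) // true
}
```

This is why overwriting a single pointer-sized word of the caller's slice leaves its length and capacity at zero.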

Can Ada functions return arrays?

I read somewhere that Ada allows a function only to return a single item. Since an array can hold multiple items does this mean that I can return the array as a whole or must I return only a single index of the array?
Yes, an Ada function can return an array - or a record.
There can be a knack to using it, though. For example, if you are assigning the return value to a variable, the variable must be exactly the right size to hold the array, and there are two common ways of achieving that.
1) Fixed size array - cleanest way is to define an array type, e.g.
type Vector is array (1 .. 3) of Integer;
function Unit_Vector return Vector;
A : Vector;
begin
A := Unit_Vector;
...
2) Unconstrained array variables.
These are arrays whose size is determined at runtime by the initial assignment to them. Subsequent assignments to them will fail unless the new value happens to have the same size as the old. The trick is to use a declare block - a new scope - so that each assignment to the unconstrained variable is its first assignment. For example:
for i in 1 .. last_file loop
declare
text : String := Read_File(Filename(i));
-- the size of "text" is determined by the file contents
begin
-- process the text here.
for j in text'range loop
if text(j) = '*' then
...
end if;
end loop;
end;
end loop;
One warning : if the array size is tens of megabytes or more, it may not be successfully allocated on the stack. So if this construct raises Storage_Error exceptions, and you can't raise the stack size, you may need to use access types, heap allocation via "new" and deallocation as required.
Yes, an Ada function can return an array. For example, an Ada String is "A one-dimensional array type whose component type is a character type." Several of the functions defined in Ada.Strings.Fixed—including Insert, Delete, Head, Tail and Trim—return a String.

How to allocate a non-constant sized array in Go

How do you allocate an array in Go with a run-time size?
The following code is illegal:
n := 1
var a [n]int
You get the message prog.go:12: invalid array bound n (or similar), whereas this works fine:
const n = 1
var a [n]int
The trouble is, I might not know the size of the array I want until run-time.
(By the way, I first looked in the question How to implement resizable arrays in Go for an answer, but that is a different question.)
The answer is you don't allocate an array directly, you get Go to allocate one for you when creating a slice.
The built-in function make([]T, length, capacity) creates a slice and the array behind it, and there is no (silly) compile-time-constant-restriction on the values of length and capacity. As it says in the Go language specification:
A slice created with make always allocates a new, hidden array to which the returned slice value refers.
So we can write:
n := 12
s := make([]int, n, 2*n)
and have an array of size 2*n allocated, with s a slice initialised to be the first half of it.
I'm not sure why Go doesn't allocate the array [n]int directly, given that you can do it indirectly, but the answer is clear: "In Go, use slices rather than arrays (most of the time)."
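As a small sketch of the same idea with a size that is not a compile-time constant, the length can arrive as an ordinary function parameter. The name newBuffer is hypothetical.

```go
package main

import "fmt"

// newBuffer allocates a slice whose length is only known at run time and
// reserves twice that much capacity in the hidden array.
func newBuffer(n int) []int {
	return make([]int, n, 2*n)
}

func main() {
	s := newBuffer(12)
	fmt.Println(len(s), cap(s)) // 12 24

	// Reslicing up to the capacity exposes the rest of the hidden array.
	full := s[:cap(s)]
	fmt.Println(len(full)) // 24
}
```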

Resources