I'm new to Go and am trying to understand the language in order to write efficient code. In the following code, the sizes of the two arrays differ by 140%. Can someone explain this?
package main

import (
    "fmt"
    "unsafe"
)

func main() {
    ind1 := make([]bool, 10)
    var ind2 [10]bool
    fmt.Println(unsafe.Sizeof(ind1)) // 24
    fmt.Println(len(ind1))           // 10
    fmt.Println(unsafe.Sizeof(ind2)) // 10
    fmt.Println(len(ind2))           // 10
}
The size of the first array remains 10, even when the capacity is set explicitly:
ind1 := make([]bool, 10, 10)
Can someone explain this? Is there any additional overhead in using make? If yes, why is it recommended to use make over default initialization?
Arrays and slices in Go are different things.
Your ind1 is a slice, and ind2 is an array. The length of an array is part of the type, so for example [2]bool and [3]bool are 2 different array types.
A slice in Go is a descriptor for a contiguous segment of an underlying array and provides access to a numbered sequence of elements from that array. This slice header is a struct-like data structure represented by the type reflect.SliceHeader:
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
It contains a data pointer (to the first element of the represented segment), a length and a capacity.
The unsafe.Sizeof() function returns the size in bytes of a hypothetical variable that would hold the passed value. It does not include any memory possibly referenced by it.
So if you pass a slice value (ind1), it will tell you the size of the above-mentioned slice header. Note that the sizes of the fields of SliceHeader are architecture dependent, e.g. int may be 4 bytes on one platform and 8 bytes on another. The size 24 applies to 64-bit architectures.
At the time of writing, the Go Playground ran on a 32-bit architecture. Let's see this example:
fmt.Println(unsafe.Sizeof(make([]bool, 10)))
fmt.Println(unsafe.Sizeof(make([]bool, 20)))
fmt.Println(unsafe.Sizeof([10]bool{}))
fmt.Println(unsafe.Sizeof([20]bool{}))
Output (try it on the Go Playground):
12
12
10
20
As you can see, no matter the length of the slice you pass to unsafe.Sizeof(), it always returns 12 on the Go Playground (and 24 on 64-bit architectures).
On the other hand, an array value includes all its elements, and as such, its size depends on its length. Size of [10]bool is 10, and size of [20]bool is 20.
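To make the distinction concrete, here is a minimal sketch that separates the size of the slice header from the memory held by its backing array. The helper names (wordSize, sliceHeaderSize, backingBytes) are made up for this illustration, and the "three machine words" check assumes the standard gc toolchain, where int and uintptr have the same size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size in bytes of a pointer-sized integer on this platform.
func wordSize() uintptr {
	return unsafe.Sizeof(uintptr(0))
}

// sliceHeaderSize is what unsafe.Sizeof reports for a []bool value:
// only the header (data pointer, length, capacity), never the elements.
func sliceHeaderSize(s []bool) uintptr {
	return unsafe.Sizeof(s)
}

// backingBytes estimates the memory held by the backing array of s:
// capacity times element size.
func backingBytes(s []bool) uintptr {
	var elem bool
	return uintptr(cap(s)) * unsafe.Sizeof(elem)
}

func main() {
	s := make([]bool, 10, 16)
	var a [10]bool

	fmt.Println(sliceHeaderSize(s) == 3*wordSize()) // true
	fmt.Println(backingBytes(s))                    // 16
	fmt.Println(unsafe.Sizeof(a))                   // 10
}
```

So a slice costs one header plus its backing array, while an array value is just its elements.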
See related questions+answers to learn more about slices, arrays and the difference and relation between them:
How do I find the size of the array in go
Why have arrays in Go?
Why use arrays instead of slices?
Must read blog posts:
Go Slices: usage and internals
Arrays, slices (and strings): The mechanics of 'append'
ind1 is a slice (the type is []bool).
ind2 is an array (the type is [10]bool).
They are not of the same type.
The result of unsafe.Sizeof(ind1) has nothing to do with the arguments passed to make; it is the size of the slice header only.
Is it possible to create arrays based on their index as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the run, without creating foo[0...3][0...4]?
If not, is there a data structure that allows me to do something similar to this in C?
No.
As written, your code makes no sense at all. You need foo to be declared somewhere, and then you can index into it with foo[x][y] = someNr;. But you can't just make foo spring into existence, which is what it looks like you are trying to do.
Either create foo with the correct sizes (only you can say what they are), for example int foo[16][16];, or use a different data structure.
In C++ you could do a map<pair<int, int>, int>
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
    for (int j = 0; j < y; j++)
        foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
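The set of functions described above (allocate, assign, retrieve, default value) can be sketched as follows. This is only an illustration written in Go, where a built-in map does the bookkeeping; in C you would have to supply your own hash table or tree. All names here (sparse, newSparse, set, get) are hypothetical, and release is left to the garbage collector.

```go
package main

import "fmt"

// cell identifies one position in the logical matrix.
type cell struct{ row, col int }

// sparse stores only the cells that were explicitly set.
type sparse struct {
	rows, cols int          // logical (maximum) dimensions
	def        int          // value reported for cells with no entry
	cells      map[cell]int // explicit entries only
}

// newSparse allocates the base structure with a fixed logical size.
func newSparse(rows, cols, def int) *sparse {
	return &sparse{rows: rows, cols: cols, def: def, cells: make(map[cell]int)}
}

func (s *sparse) inBounds(r, c int) bool {
	return r >= 0 && r < s.rows && c >= 0 && c < s.cols
}

// set records v at (r, c); it reports false when the cell is out of bounds.
func (s *sparse) set(r, c, v int) bool {
	if !s.inBounds(r, c) {
		return false
	}
	if v == s.def {
		delete(s.cells, cell{r, c}) // storing the default would waste space
		return true
	}
	s.cells[cell{r, c}] = v
	return true
}

// get returns the stored value, or the default when there is no entry.
func (s *sparse) get(r, c int) int {
	if v, ok := s.cells[cell{r, c}]; ok {
		return v
	}
	return s.def
}

func main() {
	m := newSparse(3000, 4000, 0)
	m.set(4, 5, 123)
	fmt.Println(m.get(4, 5), m.get(0, 0), len(m.cells)) // 123 0 1
}
```

Memory use grows with the number of non-default cells rather than with rows times columns, which is the whole point of the exercise.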
The most efficient way to create a dynamic index of an array is to create an empty array of the same data type as the array being indexed.
Let's imagine we are using integers for the sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.
Let's say you have 1 million 64-bit integers in the array to index.
First of all you should sort the data and eliminate duplicates. That's easy to achieve by using qsort() (the sorting function in the C standard library) and some duplicate-removal function such as
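#include <stdint.h>

/* Copies the distinct values of unord_arr into ord_arr and returns how
   many were written. unord_arr must already be sorted (duplicates adjacent). */
uint64_t remove_dupes(const uint64_t *unord_arr, uint64_t *ord_arr, uint64_t arr_size)
{
    uint64_t i, j = 0;
    for (i = 1; i < arr_size; i++)
    {
        /* Emit the previous value whenever a run of equal values ends. */
        if (unord_arr[i] != unord_arr[i-1]) {
            ord_arr[j] = unord_arr[i-1];
            j++;
        }
        /* The last element always closes a run, so emit it too. */
        if (i == arr_size-1) {
            ord_arr[j] = unord_arr[i];
            j++;
        }
    }
    return j;
}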
Adapt the code above to your needs; you should free() the input array once the function has copied its distinct values into the output array. The function is a single linear pass, so it is very fast. It will return zero entries when the input array contains only one element, but that's probably something you can live with.
Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although sticking to powers of 10 will make everything easier in the case of integers.
uint64_t* idx = calloc(pow(10, indexdepth), sizeof(uint64_t));
This will create an empty index array.
Then populate the index. Traverse the array to index just once, and every time the leading digits (as many as the index depth) change, record the position where the new prefix was first seen.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts with 10 (103456), you add an entry to the index. Let's say that 103456 was detected at position 733; your index entry would be:
idx[10] = 733;
The first entry beginning with 11 goes in the next index slot. Let's say that the first number beginning with 11 is found at position 2023:
idx[11] = 2023;
And so on.
When you later need to find some number in your original array of 1 million entries, you don't have to iterate over the whole array; you just need to look up in your index where the first number starting with the same two significant digits is stored. Entry idx[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.
In my example I employed a small index, thus the average number of iterations that you will need to perform will be 1000000/100 = 10000.
If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazing fast.
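The build-and-lookup steps above can be sketched as follows, here in Go rather than C for brevity. Several things are assumptions of this sketch and not part of the description: every value is treated as zero-padded to a fixed number of decimal digits (the digits constant), an empty slot is marked with -1 instead of calloc's 0, and the function names are invented.

```go
package main

import "fmt"

const (
	digits = 6 // assumption: every value has at most 6 decimal digits
	depth  = 2 // leading digits used as the index key, giving 10^depth slots
)

// pow10 returns 10^n for small non-negative n.
func pow10(n int) uint64 {
	p := uint64(1)
	for i := 0; i < n; i++ {
		p *= 10
	}
	return p
}

// slot maps a value to its index slot: its leading `depth` digits
// once the value is zero-padded to `digits` digits.
func slot(v uint64) int {
	return int(v / pow10(digits-depth))
}

// buildIndex records, for each slot, the position of the first value in
// sorted that falls into it; -1 marks an empty slot.
func buildIndex(sorted []uint64) []int {
	idx := make([]int, pow10(depth))
	for i := range idx {
		idx[i] = -1
	}
	for pos, v := range sorted {
		if s := slot(v); s < len(idx) && idx[s] == -1 {
			idx[s] = pos
		}
	}
	return idx
}

// find returns the position of target in sorted, or -1 if it is absent.
// It jumps to the start of the target's slot and scans forward from there.
func find(sorted []uint64, idx []int, target uint64) int {
	s := slot(target)
	if s >= len(idx) || idx[s] == -1 {
		return -1
	}
	for pos := idx[s]; pos < len(sorted) && slot(sorted[pos]) == s; pos++ {
		if sorted[pos] == target {
			return pos
		}
	}
	return -1
}

func main() {
	data := []uint64{42, 100001, 103456, 103999, 110000, 990000}
	idx := buildIndex(data)
	fmt.Println(idx[10], idx[11])        // 1 4
	fmt.Println(find(data, idx, 103999)) // 3
	fmt.Println(find(data, idx, 105000)) // -1
}
```

Because the values are zero-padded, slot order matches numerical order, so the forward scan can stop as soon as it leaves the target's slot.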
What I like to do is to create some simple algorithm that tells me what's the ideal depth of the index after knowing the type and length of the data to index.
Please note that in the example I have posed, 64-bit numbers are indexed by their first indexdepth significant figures, thus 10 and 100001 will be stored in the same index segment. That's not a problem on its own, and everyone has their own tricks for dealing with it. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.
You don't have to change the base though: you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base-10 numbers ordered. Using different numerical bases is nonetheless trivial in C, which is of great help for this task.
As you make your index depth larger, the number of entries per index segment will be reduced.
Please do note that programming, especially at a lower level like C, consists in great part of understanding the trade-off between CPU cycles and memory use.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.
As SSD speeds get closer to that of RAM, using files to store indexes is worth taking into account. Nevertheless, modern OSs tend to load as much as they can into RAM, thus using files would end up being similar from a performance point of view.
Goal
I am programming an Allen-Bradley / Rockwell CompactLogix PLC in SCL. I would like to determine the size of Arrays at runtime. It would be possible to enter the Array lengths as constants into the code before compiling. However, the reusability would be improved greatly if the lengths of the arrays could be determined automatically.
Problem
There is the function SIZE(Source,Dimtovary,Size) which does exactly what I need although only for SINT[] INT[] DINT[] REAL[] structure and STRING. Unfortunately I need the length of BOOL[].
"The SIZE instruction finds the number of elements (size) in the designated dimension of the Source array or string operand and places the result in the Size operand. The instruction finds the size of one dimension of an array."
Pseudo code
Int_array := INT[16];
Bool_array := BOOL[64];
SIZE(Int_array[0],0,Int_array_len);
// Works, Int_array_len contains 16
SIZE(Bool_array[0],0,Bool_array_len);
// Doesn't compile because SIZE() isn't defined for boolean arrays
Environment
IDE: Rockwell Studio 5000 / RSLogix 5000
PLC: 1769-L36ERMS
Language: SCL (Structured text)
Reference: Programming reference manual
Question
Is there a way to determine the length of a boolean array for example BOOL[64]?
Additionally, is there a fundamental reason why SIZE(Source,Dimtovary,Size); doesn't work with boolean arrays?
The answer is simply no; it is not possible to get the size of a BOOL[] array.
As @DanMašek correctly suggested, BOOL[] arrays are very limited. It is even recommended to use UDTs containing members of type BOOL instead.
Unfortunately, I still have no solution to get the length of multiple BITs arranged in some way and loop through them in a for loop.
I am trying to write an application that reads RPM files. The start of each block has a Magic char of [4]byte.
Here is my struct
type Lead struct {
    Magic        [4]byte
    Major, Minor byte
    Type         uint16
    Arch         uint16
    Name         string
    OS           uint16
    SigType      uint16
}
I am trying to do the following:
lead := Lead{}
lead.Magic = buffer[0:4]
I have been searching online and am not sure how to go from a slice to an array (without copying). I can always make Magic a []byte (or even a uint64), but I was more curious about how I would go from type []byte to [4]byte if I needed to.
The built-in function copy will only copy a slice to a slice, NOT a slice to an array.
You must trick copy into thinking the array is a slice
copy(varLead.Magic[:], someSlice[0:4])
Or use a for loop to do the copy:
for index, b := range someSlice[:4] {
    varLead.Magic[index] = b
}
Or do as zupa has done using literals. I have added onto their working example.
Go Playground
You have allocated four bytes inside that struct and want to assign a value to that four byte section. There is no conceptual way to do that without copying.
Look at the copy built-in for how to do that.
Try this:
copy(lead.Magic[:], buf[0:4])
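A small sketch of the copy approach, wrapped in a helper that also reports whether the source was long enough. The helper name magicFrom and the sample bytes are made up for illustration; it relies on copy returning the number of elements copied, which is the shorter of the two lengths.

```go
package main

import "fmt"

// magicFrom copies the first four bytes of buf into a [4]byte.
// ok is false when buf is too short to fill the array.
func magicFrom(buf []byte) (magic [4]byte, ok bool) {
	n := copy(magic[:], buf) // copy stops at the shorter of the two lengths
	return magic, n == len(magic)
}

func main() {
	buffer := []byte{0xed, 0xab, 0xee, 0xdb, 0x03, 0x00}
	magic, ok := magicFrom(buffer)
	fmt.Println(magic, ok) // [237 171 238 219] true

	buffer[0] = 0         // magic is a copy, so it is unaffected
	fmt.Println(magic[0]) // 237
}
```

Unlike slicing with buf[0:4], this does not panic on a short buffer; it just reports ok == false.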
Tapir Liui (author of Go101) tweets:
Go 1.20 (originally announced for 1.18, then 1.19) will support conversions from slice to array: golang/go issue 46505.
So, since Go 1.20, the slice copy2 implementation could be written as:
*(*[N]T)(d) = [N]T(s)
or, even simpler if the conversion is allowed to present as L-values:
[N]T(d) = [N]T(s)
Without copying, you can convert a slice to an array pointer as of Go 1.17 (Q3 2021).
This is called "un-slicing", giving you back a pointer to the underlying array of a slice, again, without any copy/allocation needed:
See golang/go issue 395: spec: convert slice x into array pointer, now implemented with CL 216424, and commit 1c26843
Converting a slice to an array pointer yields a pointer to the underlying array of the slice.
If the length of the slice is less than the length of the array,
a run-time panic occurs.
s := make([]byte, 2, 4)
s0 := (*[0]byte)(s) // s0 != nil
s2 := (*[2]byte)(s) // &s2[0] == &s[0]
s4 := (*[4]byte)(s) // panics: len([4]byte) > len(s)
var t []string
t0 := (*[0]string)(t) // t0 == nil
t1 := (*[1]string)(t) // panics: len([1]string) > len(t)
So in your case, provided Magic type is *[4]byte:
lead.Magic = (*[4]byte)(buffer)
Note: a defined (named) array type will work too:
type A [4]int
var s = (*A)([]int{1, 2, 3, 4})
Why convert to an array pointer? As explained in issue 395:
One motivation for doing this is that using an array pointer allows the compiler to range check constant indices at compile time.
A function like this:
func foo(a []int) int {
    return a[0] + a[1] + a[2] + a[3]
}
could be turned into:
func foo(a []int) int {
    b := (*[4]int)(a)
    return b[0] + b[1] + b[2] + b[3]
}
allowing the compiler to check all the bounds once only and give compile-time errors about out of range indices.
Also:
One well-used example is making classes as small as possible for tree nodes or linked list nodes so you can cram as many of them into L1 cache lines as possible.
This is done by each node having a single pointer to a left sub-node, and the right sub-node being accessed by the pointer to the left sub-node + 1.
This saves the 8-bytes for the right-node pointer.
To do this you have to pre-allocate all the nodes in a vector or array so they're laid out in memory sequentially, but it's worth it when you need it for performance.
(This also has the added benefit of the prefetchers being able to help things along performance-wise - at least in the linked list case)
You can almost do this in Go with:
type node struct {
value int
children *[2]node
}
except that there's no way of getting a *[2]node from the underlying slice.
Go 1.20 (Q1 2023): this is addressed with CL 430415, 428938 (type), 430475 (reflect) and 429315 (spec).
Go 1.20
You can convert from a slice to an array directly with the usual conversion syntax T(x). The array's length can't be greater than the slice's length:
func main() {
slice := []int64{10, 20, 30, 40}
array := [4]int64(slice)
fmt.Printf("%T\n", array) // [4]int64
}
Go 1.17
Starting from Go 1.17 you can directly convert a slice to an array pointer. With Go's type conversion syntax T(x) you can do this:
slice := make([]byte, 4)
arrptr := (*[4]byte)(slice)
Keep in mind that the length of the array must not be greater than the length of the slice, otherwise the conversion will panic.
bad := (*[5]byte)(slice) // panics: slice len < array len
This conversion has the advantage of not making any copy, because it simply yields a pointer to the underlying array.
Of course you can dereference the array pointer to obtain a non-pointer array variable, so the following also works:
slice := make([]byte, 4)
var arr [4]byte = *(*[4]byte)(slice)
However dereferencing and assigning will subtly make a copy, since the arr variable is now initialized to the value that results from the conversion expression. To be clear (using ints for simplicity):
v := []int{10,20}
a := (*[2]int)(v)
a[0] = 500
fmt.Println(v) // [500 20] (changed, both point to the same backing array)
w := []int{10,20}
b := *(*[2]int)(w)
b[0] = 500
fmt.Println(w) // [10 20] (unchanged, b holds a copy)
One might wonder why the conversion checks the slice length and not the capacity (I did). Consider the following program:
func main() {
a := []int{1,2,3,4,5,6}
fmt.Println(cap(a)) // 6
b := a[:3]
fmt.Println(cap(a)) // still 6
c := (*[3]int)(b)
ptr := uintptr(unsafe.Pointer(&c[0]))
ptr += 3 * unsafe.Sizeof(int(0))
i := (*int)(unsafe.Pointer(ptr))
fmt.Println(*i) // 4
}
The program shows that the conversion might happen after reslicing. The original backing array with six elements is still there, so one might wonder why a runtime panic occurs with (*[6]int)(b) where cap(b) == 6.
This has actually been brought up. It's worth remembering that, unlike slices, an array has a fixed size, therefore it needs no notion of capacity, only length:
a := [4]int{1,2,3,4}
fmt.Println(len(a) == cap(a)) // true
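Since the conversion panics when the slice is too short, one option is to wrap it in a helper that checks the length first. This is a minimal sketch assuming Go 1.17 or later; the helper name asArray4 is made up for illustration.

```go
package main

import "fmt"

// asArray4 returns a pointer to the first four elements of s without copying.
// ok is false (and the pointer nil) when len(s) < 4, which is exactly the
// case where the bare conversion would panic.
func asArray4(s []byte) (p *[4]byte, ok bool) {
	if len(s) < 4 {
		return nil, false
	}
	return (*[4]byte)(s), true
}

func main() {
	buf := []byte{1, 2, 3, 4, 5}
	if p, ok := asArray4(buf); ok {
		p[0] = 9            // writes through to buf: same backing array
		fmt.Println(buf[0]) // 9
	}

	// len is 2 even though cap is 5, so the conversion is refused.
	_, ok := asArray4(buf[:2])
	fmt.Println(ok) // false
}
```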
You might be able to do the whole thing with one read, instead of reading individually into each field. If the fields are fixed-length, then you can do:
lead := Lead{}
// make a reader to dispense bytes so you don't have to keep track of where you are in buffer
reader := bytes.NewReader(buffer)
// read into each field in Lead, so Magic becomes buffer[0:4],
// Major becomes buffer[4], Minor is buffer[5], and so on...
binary.Read(reader, binary.LittleEndian, &lead)
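Note that binary.Read only decodes fixed-size values, so a struct with a string field (such as Name above) makes it return an error. Below is a minimal sketch using a hypothetical fixed-size prefix of the struct; the type name header, the sample bytes, and the choice of big-endian byte order are assumptions for illustration, so use whatever your file format specifies.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical fixed-size prefix; every field has a known size.
type header struct {
	Magic        [4]byte
	Major, Minor byte
	Type         uint16
}

// parseHeader decodes a header from the start of buf.
// Big-endian is an assumption here; pick the order your format uses.
func parseHeader(buf []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(buf), binary.BigEndian, &h)
	return h, err
}

func main() {
	buf := []byte{0xed, 0xab, 0xee, 0xdb, 3, 0, 0, 1}
	h, err := parseHeader(buf)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(h.Magic, h.Major, h.Minor, h.Type) // [237 171 238 219] 3 0 1
}
```

Checking the returned error matters: a buffer shorter than the struct yields an error instead of a partially filled value you might mistake for valid data.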
Don't. A slice by itself suffices for all purposes. An array in Go should be regarded as the underlying structure of a slice. In every case, use only slices; you don't have to deal with arrays yourself, you just do everything with slice syntax. Arrays are only for the computer. In most cases a slice is better and clearer in code, and even in the other cases a slice is still sufficient to express your idea.
I'd like for my array to be of a set length using a simple format. Please, let me know how this is done.
What I already have:
arr[100]
Pseudocode: what I would like to have:
arr[4-20] or arr[$min_int THROUGH $max_int]
Additional detail edit: The int should be within the range array = (4, 20). The input may contain leading zeros. I'd like to keep the length of the array restricted (i.e., to 9 or 10 characters).
Arrays simply do not work this way in C. You will need to implement it yourself by only looping through valid indices (and wasting memory in the process) or by using a data structure better suited to the job, like a map (which you will have to find in a library or write yourself as it does not exist in the language).
#define ARRMINIDX 4
#define ARRMAXIDX 20
int arrmem[ARRMAXIDX+1-ARRMINIDX];
/* Map logical index x (4..20) onto arrmem[0..16]. */
#define arr(x) arrmem[(x)-ARRMINIDX]
// process elements of arr
for( int i = ARRMINIDX; i <= ARRMAXIDX; i++ )
    dosomething(arr(i));
OTOH, this may not be what you want at all, given your comment
I want an array with 0-1 elements: a limited int or limited "numeric
int"--string mimicking an int.
which I can't make heads or tails of in this context. Are you saying that you want a string of 4-20 chars that represents an integer?
Greetings
I need to calculate a first-order entropy (Markov source, like on wiki here http://en.wikipedia.org/wiki/Entropy_(information_theory)) of a signal that consists of 16-bit words.
This means I must calculate how frequently each combination a->b (symbol b appears after a) occurs in the data stream.
When I was doing it for just the 4 least significant or 4 most significant bits, I used a two-dimensional array, where the first dimension was the first symbol and the second dimension was the second symbol.
My algorithm looked like this
Read current symbol
Array[prev_symbol][curr_symbol]++
prev_symbol=curr_symbol
Move forward 1 symbol
Then, Array[a][b] would mean how many times symbol b following symbol a has occurred in the stream.
Now, I understand that an array in C is a pointer that is incremented to get the exact value; for example, to get element [3][4] from array[10][10] I have to increment the pointer to array[0][0] by (3*10+4)*(size of variable stored in array). I understand that the problem must be that 2^32 elements of type unsigned long take up too much memory.
But still, is there a way to deal with it?
Or maybe there is another way to accomplish this?
A two-dimensional array of 4-byte integers with 65,536 by 65,536 elements (one row and one column per 16-bit symbol) occupies 16 GiB of RAM. Does your machine have that much memory?
Anyhow, out of the more than 4 billion array elements, only very few will have a count different from zero. So it's probably better to go with some sort of sparse storage.
One solution would be to use a dictionary where the tuple (a, b) is the key and the count of occurrences is the value.
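A rough sketch of the dictionary idea, written in Go because it has a built-in map (in C you would need a hash table from a library or your own). The names pair, countPairs and conditionalEntropy are invented for this example, and the entropy computed is the conditional entropy H(curr | prev) in bits, estimated from the observed pair counts, which is one common reading of "first-order entropy".

```go
package main

import (
	"fmt"
	"math"
)

// pair is the dictionary key: symbol curr observed right after symbol prev.
type pair struct{ prev, curr uint16 }

// countPairs counts every adjacent (prev, curr) pair in the stream.
// Only pairs that actually occur take up memory.
func countPairs(symbols []uint16) map[pair]uint64 {
	counts := make(map[pair]uint64)
	for i := 1; i < len(symbols); i++ {
		counts[pair{symbols[i-1], symbols[i]}]++
	}
	return counts
}

// conditionalEntropy estimates H(curr | prev) in bits:
// the sum over pairs of -p(prev, curr) * log2 p(curr | prev).
func conditionalEntropy(counts map[pair]uint64) float64 {
	rowTotals := make(map[uint16]uint64) // occurrences of each prev symbol
	var total uint64
	for p, c := range counts {
		rowTotals[p.prev] += c
		total += c
	}
	if total == 0 {
		return 0
	}
	var h float64
	for p, c := range counts {
		pJoint := float64(c) / float64(total)
		pCond := float64(c) / float64(rowTotals[p.prev])
		h -= pJoint * math.Log2(pCond)
	}
	return h
}

func main() {
	stream := []uint16{0, 0, 1, 1, 0}
	counts := countPairs(stream)
	fmt.Println(len(counts))                           // 4
	fmt.Printf("%.3f\n", conditionalEntropy(counts)) // 1.000
}
```

The map never holds more entries than there are samples minus one, so its size is bounded by the length of the data and not by 2^32.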
Perhaps you could do multiple passes over the data. The entropy contribution from pairs beginning with symbol X is essentially independent of pairs beginning with any other symbol (aside from the total number of them, of course), so you can calculate the entropy for all such pairs and then throw away the distribution data. At the end, combine 2^16 partial entropy values to get the total. You don't necessarily have to do 2^16 passes over the data, you can be "interested" in as many initial characters in a single pass as you have space for.
Alternatively, if your data is smaller than 2^32 samples, then you know for sure that you won't see all possible pairs, so you don't actually need to allocate a count for each one. If the sample is small enough, or the entropy is low enough, then some kind of sparse array would use less memory than your full 16GB matrix.
Did a quick test on Ubuntu 10.10 x64
gt@thinkpad-T61p:~/test$ uname -a
Linux thinkpad-T61p 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux
gt@thinkpad-T61p:~/test$ cat mtest.c
#include <stdio.h>
#include <stdlib.h>

short *big_array;

int main(void)
{
    if((big_array = (short *)malloc(4UL*1024*1024*1024*sizeof (short))) == NULL) {
        perror("malloc");
        return 1;
    }
    big_array[0]++;
    big_array[100]++;
    big_array[1UL*1024*1024*1024]++;
    big_array[2UL*1024*1024*1024]++;
    big_array[3UL*1024*1024*1024]++;
    printf("array[100] = %d\narray[3G] = %d\n", big_array[100], big_array[3UL*1024*1024*1024]);
    return 0;
}
gt@thinkpad-T61p:~/test$ gcc -Wall mtest.c -o mtest
gt@thinkpad-T61p:~/test$ ./mtest
array[100] = 1
array[3G] = 1
gt@thinkpad-T61p:~/test$
It looks like the virtual memory system on Linux is up to the job, as long as you have enough memory and/or swap.
Have fun!