Usage of non UTF-8-encoded string as map key - arrays

I would like to use a bytearray of variable length as key within a map.
myMap := make(map[[]byte]int)
As slices and variable-length byte arrays are not valid key types in Go, the code above is not valid.
Then I read that strings are just a sequence of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text.
Are there any problems with using such a non-UTF-8-encoded string as a map key, regarding hashing?
The following code demonstrates how I converted []byte to string and back to []byte again:
package main
import (
"bytes"
"fmt"
)
func main() {
// src is a byte array with all available byte values
src := make([]byte, 256)
for i := 0; i < len(src); i++ {
src[i] = byte(i)
}
fmt.Println("src:", src)
// convert byte array to string for key usage within a map
mapKey := string(src[:]) // <- can this be used for key in map[string]int?
//fmt.Println(mapKey) // <- this destroys the print function!
fmt.Printf("len(mapKey): %d\n", len(mapKey)) // <- that actually works
// convert string back to dst for binary usage
dst := []byte(mapKey)
fmt.Println("dst:", dst)
if !bytes.Equal(src, dst) {
panic("Oops... something went wrong!")
}
}

There is no problem using string as key in a map where the string is not valid UTF-8.
The Go Blog: Strings, bytes, runes and characters in Go:
In Go, a string is in effect a read-only slice of bytes.
And Spec: Comparison operators:
String values are comparable and ordered, lexically byte-wise.
What matters is which bytes the string contains, whether or not they form a valid UTF-8 sequence. If two string values contain the same invalid UTF-8 byte sequence, they are equal; if not, they aren't.
Testing invalid and valid sequences ("\xff" and "\x00"):
m := map[string]byte{}
m["\xff"] = 1
m["\x00"] = 2
fmt.Println(m["\xff"], m["\x00"])
Output:
1 2
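As a minimal sketch of the same idea applied to the original question, the following program uses the string conversion of each []byte as the map key. The helper name countKeys and the sample keys are made up for illustration.

```go
package main

import "fmt"

// countKeys counts how often each byte sequence occurs, using the
// string conversion of each []byte as the map key.
func countKeys(keys [][]byte) map[string]int {
	counts := make(map[string]int)
	for _, k := range keys {
		// The key holds the raw bytes; they need not be valid UTF-8.
		counts[string(k)]++
	}
	return counts
}

func main() {
	keys := [][]byte{
		{0xff, 0xfe},
		{0x00},
		{0xff, 0xfe}, // same bytes as the first key
	}
	counts := countKeys(keys)
	fmt.Println(counts["\xff\xfe"]) // 2
	fmt.Println(counts["\x00"])     // 1
	fmt.Println(len(counts))        // 2
}
```

Converting a key back with []byte(key) yields the original bytes unchanged, so the map can also be iterated to recover the binary keys.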

Related

How do strings differ from arrays of chars in Pascal?

I have a problem. Why can't I assign characters to a string one by one, when the same thing works with an array of chars? Why is only ^^^^^^ printed? Where is the string? Why is the character s1[i] still there?
Is it because of the internal representation of strings and chars?
program massive;
type
chars = array [1..255] of char;
var
s,s1: string;
ch1,ch2: chars;
i: integer;
begin
s1 := '';
s := 'abrakadabra';
for i := 1 to 5 do
begin
s1[i] := s[i];
writeln(s1[i],#10,'^^^',s1,'^^^')
end;
ch2 := '';
ch1 := 'abrakadabra';
for i := 1 to 5 do
begin
ch2[i] := ch1[i];
writeln(ch2[i])
end;
writeln('%%%',ch2,'%%%');
for i := 1 to 5 do
writeln('&&&',s1[i],'&&&');
end.
Output:
a
^^^^^^
b
^^^^^^
r
^^^^^^
a
^^^^^^
k
^^^^^^
a
b
r
a
k
%%%abrak%%%
&&&a&&&
&&&b&&&
&&&r&&&
&&&a&&&
&&&k&&&
The main difference between type chars = Array[1..255] of Char and String is that a chars array has a fixed length, while the string has a dynamic length.
You did not say which compiler you use, but I do think that the String type is what in some Pascal editions is called a ShortString, with a max length of 255 chars. The space for 255 chars is preallocated and the structure includes a length field, that keeps track of assigned length of the string.
In your example, you assign s1 := ''; in other words the length is set to zero. Then you make a mistake in the for loop by assigning s1[i] := s[i]; without setting the length of s1.
Subsequent reads of s1 always return an empty string, as the length field is 0.
If you would assign the characters to the string, e.g. as:
for i := 1 to 5 do
begin
SetLength(s1, Length(s1)+1);
s1[i] := s[i];
writeln(s1[i],#10,'^^^',s1,'^^^');
end;
then the result would be what you originally expected.
Better still, set the length to the final 5 before the for loop.
Of course there are other solutions too. One is to not set the length at all, but to concatenate the string in the loop and let it handle the length field by itself:
for i := 1 to 5 do
begin
s1 := s1 + s[i];
writeln(s1[i],#10,'^^^',s1,'^^^');
end;
Edit 24.12.2021:
In a comment you said: But I still don't understand, why, when i wrote s1[1] in the for loop, all worked?
...and presumably you are referring to this code just before end.:
for i := 1 to 5 do
writeln('&&&',s1[i],'&&&');
We need to look at the memory layout and know that the first byte of the memory allocated to s1 is the length of the string. It can be referred to as s1[0]. Subsequent bytes hold the characters that make up the stored string and they can be referred to as s1[1]..s1[n].
The first byte was set to 0 when you wrote (in the very beginning):
s1 := '';
// memory content:
0
|_|_|_|_|_|_|_|_|_| ...
Then you added the characters to s1 by manipulating the memory directly when you wrote in the first for loop:
s1[i] := s[i];
// content after 5 characters
0 a b r a k
|_|_|_|_|_|_|_|_|_| ...
Because you did not use concatenation (or adjust the length as you added the characters), the length is still 0.
Then at the end in the last loop, you fetch the characters again by accessing the memory directly, and get the result you do, seemingly correct, but badly misusing the string structure.
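The layout described above can be modelled in Go as a rough sketch. The type shortString and its methods are hypothetical and only imitate a length-prefixed string: byte 0 is the length, bytes 1..255 are the characters. This is not how any particular Pascal compiler is implemented.

```go
package main

import "fmt"

// shortString is a hypothetical model of a length-prefixed string:
// byte 0 holds the length, bytes 1..255 hold the characters.
type shortString [256]byte

// String returns only the first s[0] characters, the way writeln(s1)
// consults the length field.
func (s *shortString) String() string {
	n := int(s[0])
	return string(s[1 : 1+n])
}

// appendChar stores c after the current content and bumps the length byte.
func (s *shortString) appendChar(c byte) {
	if s[0] == 255 {
		return // already full
	}
	s[0]++
	s[s[0]] = c
}

func main() {
	src := "abrakadabra"

	var direct shortString // length byte is 0, like s1 := ''
	for i := 0; i < 5; i++ {
		direct[1+i] = src[i] // like s1[i] := s[i]: the length byte is untouched
	}
	fmt.Printf("%q\n", direct.String())     // ""
	fmt.Printf("%q\n", string(direct[1:6])) // "abrak": the bytes are there

	var built shortString
	for i := 0; i < 5; i++ {
		built.appendChar(src[i]) // like s1 := s1 + s[i]
	}
	fmt.Printf("%q\n", built.String()) // "abrak"
}
```

The first variable shows the behaviour from the question: the characters are stored, but anything that respects the length field sees an empty string.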

converting an array of byte to array of int64 golang [duplicate]

This question already has an answer here:
Go binary.Read into slice of data gives zero result
(1 answer)
Closed 10 months ago.
I came across a situation where I want to convert an array of byte to array of int64 and I am trying to do the below
func covertToInt64(message []byte) []int64{
rbuf := bytes.NewBuffer(message)
arr := []int64{}
e := binary.Read(rbuf, binary.LittleEndian, &arr)
if e != nil {
}
return arr
}
The above returns an empty arr but when I convert []byte to a string as below
msg:=string(message)
msg has the value "[1,2]"
May I know a better and correct way to do this in Go?
The question is: what exactly do you want?
If the message is byte values from 0 to 0xFF, and you simply want to cast each member of the slice into int64, then the answer is:
ints := make([]int64, len(message))
for index, b := range message {
ints[index] = int64(b)
}
If the message is binary data representing int64 values, then the solution is a bit more complicated. Each int64 is 8 bytes long, so to convert a slice of bytes cleanly, the length of the message should be divisible by eight without a remainder. We're ignoring other cases here.
So, then the answer is:
ml := len(message)
il := ml/8
if ml%8 != 0 {
// there's more than il*8 bytes, but not
// enough to make il+1 int64 values
// error out here, if needed
}
ints := make([]int64, il)
err := binary.Read(bytes.NewReader(message), binary.LittleEndian, ints)
The thing is that when you call binary.Read, you need to know the size of the destination value in advance. In your code the read produces nothing because the destination slice has length zero, and in addition the source is too short to hold even a single int64 value.
I guess the second situation is a bit more complicated and what you actually wanted can be solved with the first scenario.
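For comparison, here is a hedged, self-contained sketch of two interpretations of the input. The function names decodeBinary and decodeJSON are made up for this example. The JSON variant is an assumption based on the question's remark that the bytes print as "[1,2]", which suggests the message may be text rather than packed integers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// decodeBinary interprets message as consecutive little-endian int64 values.
func decodeBinary(message []byte) ([]int64, error) {
	if len(message)%8 != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of 8", len(message))
	}
	ints := make([]int64, len(message)/8) // size the destination first
	err := binary.Read(bytes.NewReader(message), binary.LittleEndian, ints)
	return ints, err
}

// decodeJSON interprets message as JSON text such as "[1,2]".
func decodeJSON(message []byte) ([]int64, error) {
	var ints []int64
	err := json.Unmarshal(message, &ints)
	return ints, err
}

func main() {
	raw := []byte{1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0}
	fmt.Println(decodeBinary(raw))           // [1 2] <nil>
	fmt.Println(decodeJSON([]byte("[1,2]"))) // [1 2] <nil>
}
```

Which of the two applies depends on how the sender produced the bytes; the five bytes of the text "[1,2]" cannot be decoded as packed int64 values at all.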

Go: how to convert unsafe.Pointer into pointer to array with unknown length?

I am trying to write a Go program which uses mmap to map a very large file containing float32 values into memory. Here is my attempt (inspired by a previous answer, error handling omitted for brevity):
package main
import (
"fmt"
"os"
"syscall"
"unsafe"
)
func main() {
fileName := "test.dat"
info, _ := os.Stat(fileName)
fileSize := info.Size()
n := int(fileSize / 4)
mapFile, _ := os.Open(fileName)
defer mapFile.Close()
mmap, _ := syscall.Mmap(int(mapFile.Fd()), 0, int(fileSize),
syscall.PROT_READ, syscall.MAP_SHARED)
defer syscall.Munmap(mmap)
mapArray := (*[n]float32)(unsafe.Pointer(&mmap[0]))
for i := 0; i < n; i++ {
fmt.Println(mapArray[i])
}
}
This fails with the following error message:
./main.go:21: non-constant array bound n
Since n is determined by the length of the file (not known at compile time), I cannot replace n with a constant value in the cast. How do I convert mmap into an array (or slice) of float32 values?
You first convert to an array of a type with a static length that can fit your data, then slice that array to the correct length and capacity.
mapSlice := (*[1 << 30]float32)(unsafe.Pointer(&mmap[0]))[:n:n]
Unfortunately you can't get a pointer to an array in your case. This is because n is not a constant value (i.e. it's determined at runtime with fileSize/4). (Note: if fileSize were a constant, you could get an array.)
There are safe and unsafe alternatives though.
The safe, or some might call the "right" way -- this requires a copy, but you have control over the endianness. Here's an example:
import (
"encoding/binary"
"bytes"
"unsafe" // optional
)
const SIZE_FLOAT32 = int(unsafe.Sizeof(float32(0))) // or 4; converted to int so it can divide len(mmap)
bufRdr := bytes.NewReader(mmap)
mapSlice := make([]float32, len(mmap)/SIZE_FLOAT32) // = fileSize/4
err := binary.Read(bufRdr, binary.LittleEndian, mapSlice) // could pass &mapSlice instead of mapSlice: same result.
// mapSlice now can be used like the mapArray you wanted.
There are a couple of ways to do this unsafely, but since Go 1.17 it's pretty simple.
mapSlice := unsafe.Slice((*float32)(unsafe.Pointer(&mmap[0])), len(mmap)/SIZE_FLOAT32)
You could also use reflect.SliceHeader. There are lots of nuances to be careful of here to prevent garbage collector issues:
var mapSlice []float32 // mapSlice := []float32{} also works (important thing is that len and cap are 0)
// newSh and oldSh are here for readability (i.e. inlining these variables is ok, but makes things less readable IMO)
newSh := (*reflect.SliceHeader)(unsafe.Pointer(&mapSlice))
oldSh := (*reflect.SliceHeader)(unsafe.Pointer(&mmap))
// Note: order of assigning Data, Cap, Len is important (due to GC)
newSh.Data = oldSh.Data
newSh.Cap = oldSh.Cap/SIZE_FLOAT32
newSh.Len = oldSh.Len/SIZE_FLOAT32
runtime.KeepAlive(mmap) // ensure `mmap` is not freed up until this point.
The final unsafe way I can think of is given in @JimB's answer: cast a pointer to the mmap data to an unsafe.Pointer, then cast that to a pointer to an arbitrarily large array, and finally slice that array to the desired length and capacity.
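To compare the copying and the non-copying approach side by side, here is a small sketch that runs without a real mapped file. The helpers floatsCopyLE, floatsView and rawBytes are hypothetical names, and an ordinary byte slice stands in for the mmap region. It assumes Go 1.17 or later for unsafe.Slice.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unsafe"
)

// floatsCopyLE decodes little-endian data into a newly allocated slice.
func floatsCopyLE(data []byte) ([]float32, error) {
	out := make([]float32, len(data)/4)
	err := binary.Read(bytes.NewReader(data[:len(out)*4]), binary.LittleEndian, out)
	return out, err
}

// floatsView reinterprets data in place. No copy is made, the host byte
// order applies, and data should be 4-byte aligned.
func floatsView(data []byte) []float32 {
	if len(data) < 4 {
		return nil
	}
	return unsafe.Slice((*float32)(unsafe.Pointer(&data[0])), len(data)/4)
}

// rawBytes exposes the memory of f as bytes; it only builds demo input here.
func rawBytes(f []float32) []byte {
	if len(f) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&f[0])), len(f)*4)
}

func main() {
	src := []float32{1.5, -2, 3.25}
	data := rawBytes(src) // stand-in for the mapped region

	view := floatsView(data)
	fmt.Println(view) // [1.5 -2 3.25]

	src[0] = 9           // the view shares memory with src
	fmt.Println(view[0]) // 9

	// 0x3f800000 is 1.0; stored little-endian below.
	fmt.Println(floatsCopyLE([]byte{0x00, 0x00, 0x80, 0x3f})) // [1] <nil>
}
```

With a real mapping, the view stays valid only until the region is unmapped, so the slice must not be used after Munmap.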

delphi x32 and x64, (typecast?) array of bytes into (wide)string

I receive a widestring as an array of bytes from an external function (DLL).
To convert the bytes to a string, I use the following simple code:
mystrvar := widestring(buffer);
where buffer is the byte array.
When I compile for 32 bits it works great, but when I compile for 64 bits the code returns an empty string, although the buffer (byte array) is the same in both cases.
The same happens when I use
mystrvar := string(buffer);
while pchar(buffer) or pwchar(buffer) works.
The reason I do not use pwchar is this:
pwchar(buffer) stops at the first 00, while widestring(buffer) does not. This buffer (byte array) contains a string list delimited by 00.
By the way, excuse my bad English.
Use
SetString(mystrvar,PWideChar(buffer),LENGTH(buffer) DIV SizeOf(WideChar));
assuming that
VAR
mystrvar : WideString;
buffer: ARRAY OF BYTE;
and that "buffer" does not contain a trailing zero terminator. Also note that "buffer" is an array of BYTEs, so its length is twice the length of the resulting string.
Assuming your array is double-null-terminated, you can use this:
while Buffer^ <> WideNull do
begin
value := PWChar(Buffer);
CommaText := CommaText + value + ',';
Inc(Buffer, (Length(value) + 1));
end;
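The walk over a double-null-terminated list can also be sketched in Go, the language used for the other examples on this page. This is only an illustration of the splitting logic, not a Delphi replacement. It assumes the buffer holds UTF-16LE code units, and the name splitWideStrings is made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// splitWideStrings decodes buf as UTF-16LE and splits it at NUL code units.
// An empty entry (two NULs in a row) ends the list; data after the last
// NUL that is not terminated is ignored.
func splitWideStrings(buf []byte) []string {
	units := make([]uint16, len(buf)/2)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(buf[2*i:])
	}
	var out []string
	start := 0
	for i, u := range units {
		if u != 0 {
			continue
		}
		if i == start { // empty entry: the double NUL terminator
			break
		}
		out = append(out, string(utf16.Decode(units[start:i])))
		start = i + 1
	}
	return out
}

func main() {
	// "ab", NUL, "c", NUL, NUL as UTF-16LE bytes.
	buf := []byte{'a', 0, 'b', 0, 0, 0, 'c', 0, 0, 0, 0, 0}
	fmt.Printf("%q\n", splitWideStrings(buf)) // ["ab" "c"]
}
```

Note that the string count is derived from the terminators, while the byte count is always twice the number of 16-bit code units.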

How do you convert a slice into an array?

I am trying to write an application that reads RPM files. The start of each block has a Magic char of [4]byte.
Here is my struct
type Lead struct {
Magic [4]byte
Major, Minor byte
Type uint16
Arch uint16
Name string
OS uint16
SigType uint16
}
I am trying to do the following:
lead := Lead{}
lead.Magic = buffer[0:4]
I am searching online and not sure how to go from a slice to an array (without copying). I can always make the Magic []byte (or even uint64), but I was more curious on how would I go from type []byte to [4]byte if needed to?
The built-in function copy only copies from a slice to a slice, NOT from a slice to an array.
You must trick copy into thinking the array is a slice:
copy(varLead.Magic[:], someSlice[0:4])
Or use a for loop to do the copy:
for index, b := range someSlice[0:4] {
varLead.Magic[index] = b
}
Or do as zupa has done using literals. I have added onto their working example.
Go Playground
You have allocated four bytes inside that struct and want to assign a value to that four byte section. There is no conceptual way to do that without copying.
Look at the copy built-in for how to do that.
Try this:
copy(lead.Magic[:], buf[0:4])
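A slightly fuller sketch of the copy approach follows. The helper readMagic and the sample bytes are hypothetical. The length check is there because copy silently copies fewer bytes when the source is short, while buf[0:4] would panic.

```go
package main

import (
	"errors"
	"fmt"
)

// readMagic copies the first four bytes of buf into a [4]byte.
func readMagic(buf []byte) ([4]byte, error) {
	var magic [4]byte
	if len(buf) < len(magic) {
		return magic, errors.New("buffer shorter than 4 bytes")
	}
	copy(magic[:], buf) // copies min(len(magic), len(buf)) = 4 bytes
	return magic, nil
}

func main() {
	buf := []byte{0xed, 0xab, 0xee, 0xdb, 0x03, 0x00}
	magic, err := readMagic(buf)
	fmt.Println(magic, err) // [237 171 238 219] <nil>

	buf[0] = 0            // magic is a copy, so it does not change
	fmt.Println(magic[0]) // 237
}
```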
Tapir Liui (author of Go101) tweets:
Go 1.20 (after being planned for 1.18 and then 1.19) will support conversions from slice to array: golang/go issue 46505.
So, since Go 1.20, the slice copy2 implementation could be written as:
*(*[N]T)(d) = [N]T(s)
or, even simpler if the conversion is allowed to present as L-values:
[N]T(d) = [N]T(s)
Without copying, you can convert a slice to an array pointer starting with Go 1.17 (Q3 2021).
This is called "un-slicing", giving you back a pointer to the underlying array of a slice, again, without any copy/allocation needed:
See golang/go issue 395: spec: convert slice x into array pointer, now implemented with CL 216424, and commit 1c26843
Converting a slice to an array pointer yields a pointer to the underlying array of the slice.
If the length of the slice is less than the length of the array,
a run-time panic occurs.
s := make([]byte, 2, 4)
s0 := (*[0]byte)(s) // s0 != nil
s2 := (*[2]byte)(s) // &s2[0] == &s[0]
s4 := (*[4]byte)(s) // panics: len([4]byte) > len(s)
var t []string
t0 := (*[0]string)(t) // t0 == nil
t1 := (*[1]string)(t) // panics: len([1]string) > len(t)
So in your case, provided Magic type is *[4]byte:
lead.Magic = (*[4]byte)(buffer)
Note: a defined array type will work too:
type A [4]int
var s = (*A)([]int{1, 2, 3, 4})
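Because the conversion panics when the slice is too short, a small guard can be useful when the input length is not under your control. The following is a sketch assuming Go 1.17 or later; the helper name asArray4 is made up.

```go
package main

import "fmt"

// asArray4 returns a pointer to the first four bytes of s without copying.
// It reports false instead of panicking when s is too short.
func asArray4(s []byte) (*[4]byte, bool) {
	if len(s) < 4 {
		return nil, false
	}
	return (*[4]byte)(s), true
}

func main() {
	buffer := []byte{1, 2, 3, 4, 5}
	magic, ok := asArray4(buffer)
	fmt.Println(*magic, ok) // [1 2 3 4] true

	magic[0] = 9           // writes through to buffer
	fmt.Println(buffer[0]) // 9

	_, ok = asArray4(buffer[:2])
	fmt.Println(ok) // false
}
```

A slice longer than the array is fine: the pointer simply covers the first four elements.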
Why convert to an array pointer? As explained in issue 395:
One motivation for doing this is that using an array pointer allows the compiler to range check constant indices at compile time.
A function like this:
func foo(a []int) int {
return a[0] + a[1] + a[2] + a[3]
}
could be turned into:
func foo(a []int) int {
b := (*[4]int)(a)
return b[0] + b[1] + b[2] + b[3]
}
allowing the compiler to check all the bounds once only and give compile-time errors about out of range indices.
Also:
One well-used example is making classes as small as possible for tree nodes or linked list nodes so you can cram as many of them into L1 cache lines as possible.
This is done by each node having a single pointer to a left sub-node, and the right sub-node being accessed by the pointer to the left sub-node + 1.
This saves the 8-bytes for the right-node pointer.
To do this you have to pre-allocate all the nodes in a vector or array so they're laid out in memory sequentially, but it's worth it when you need it for performance.
(This also has the added benefit of the prefetchers being able to help things along performance-wise - at least in the linked list case)
You can almost do this in Go with:
type node struct {
value int
children *[2]node
}
except that there's no way of getting a *[2]node from the underlying slice.
Go 1.20 (Q1 2023): this is addressed with CL 430415, 428938 (type), 430475 (reflect) and 429315 (spec).
Go 1.20
You can convert from a slice to an array directly with the usual conversion syntax T(x). The array's length can't be greater than the slice's length:
func main() {
slice := []int64{10, 20, 30, 40}
array := [4]int64(slice)
fmt.Printf("%T\n", array) // [4]int64
}
Go 1.17
Starting from Go 1.17 you can directly convert a slice to an array pointer. With Go's type conversion syntax T(x) you can do this:
slice := make([]byte, 4)
arrptr := (*[4]byte)(slice)
Keep in mind that the length of the array must not be greater than the length of the slice, otherwise the conversion will panic.
bad := (*[5]byte)(slice) // panics: slice len < array len
This conversion has the advantage of not making any copy, because it simply yields a pointer to the underlying array.
Of course you can dereference the array pointer to obtain a non-pointer array variable, so the following also works:
slice := make([]byte, 4)
var arr [4]byte = *(*[4]byte)(slice)
However dereferencing and assigning will subtly make a copy, since the arr variable is now initialized to the value that results from the conversion expression. To be clear (using ints for simplicity):
v := []int{10,20}
a := (*[2]int)(v)
a[0] = 500
fmt.Println(v) // [500 20] (changed, both point to the same backing array)
w := []int{10,20}
b := *(*[2]int)(w)
b[0] = 500
fmt.Println(w) // [10 20] (unchanged, b holds a copy)
One might wonder why the conversion checks the slice length and not the capacity (I did). Consider the following program:
func main() {
a := []int{1,2,3,4,5,6}
fmt.Println(cap(a)) // 6
b := a[:3]
fmt.Println(cap(b)) // still 6
c := (*[3]int)(b)
ptr := uintptr(unsafe.Pointer(&c[0]))
ptr += 3 * unsafe.Sizeof(int(0))
i := (*int)(unsafe.Pointer(ptr))
fmt.Println(*i) // 4
}
The program shows that the conversion might happen after reslicing. The original backing array with six elements is still there, so one might wonder why a runtime panic occurs with (*[6]int)(b) where cap(b) == 6.
This has actually been brought up. It's worth remembering that, unlike slices, an array has a fixed size, therefore it needs no notion of capacity, only length:
a := [4]int{1,2,3,4}
fmt.Println(len(a) == cap(a)) // true
You might be able to do the whole thing with one read, instead of reading individually into each field. If the fields are fixed-length, then you can do:
lead := Lead{}
// make a reader to dispense bytes so you don't have to keep track of where you are in buffer
reader := bytes.NewReader(buffer)
// read into each field in Lead, so Magic becomes buffer[0:4],
// Major becomes buffer[4], Minor is buffer[5], and so on...
// (binary.Read needs fixed-size fields, so Name cannot stay a string)
binary.Read(reader, binary.LittleEndian, &lead)
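Here is a runnable sketch of the single-read approach. The struct header and its 8-byte Name field are hypothetical and deliberately not the real RPM lead layout. The byte order follows the snippet above and is an assumption to check against the actual file format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical fixed-size layout used only for this example.
type header struct {
	Magic        [4]byte
	Major, Minor byte
	Type         uint16
	Name         [8]byte // fixed-size array instead of string
}

// parseHeader fills every field of header from buf in one call.
func parseHeader(buf []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &h)
	return h, err
}

func main() {
	buf := []byte{
		0xed, 0xab, 0xee, 0xdb, // Magic
		3, 0, // Major, Minor
		1, 0, // Type (little-endian 1)
		'd', 'e', 'm', 'o', 0, 0, 0, 0, // Name, NUL padded
	}
	h, err := parseHeader(buf)
	fmt.Println(h.Magic, h.Major, h.Minor, h.Type, err) // [237 171 238 219] 3 0 1 <nil>
	fmt.Printf("%s\n", bytes.TrimRight(h.Name[:], "\x00")) // demo

	_, err = parseHeader(buf[:5])
	fmt.Println(err != nil) // true: not enough bytes for the whole struct
}
```

binary.Read copies the bytes into the struct, so this is a convenience for parsing, not a way to avoid the copy.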
Don't. A slice is sufficient for every purpose. An array in Go should be regarded as the underlying structure of a slice. In practically every case, use slices: you don't have to manage arrays yourself, and you can do everything with slice syntax. Arrays are mostly for the machine. In most cases a slice is better and clearer in code, and even in the other cases a slice is still sufficient to express your idea.

Resources