Map in order range loop

I'm looking for a definitive way to range over a Go map in-order.
The Go spec states the following:
The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0.
All I've found here on Stack Overflow and via Google are (IMHO) workarounds that I don't like.
Is there a solid way to iterate through a map and retrieve items in the order they've been inserted?
The solutions I've found are:
Keep track of keys and values in two separate slices: which sounds like "Do not use a map", losing all the advantages of using maps.
Use a map but keep track of keys in a different slice: this means data duplication which might lead to data misalignment and eventually may bring loads of bugs and painful debugging.
What do you suggest?
Edit in response to the possible duplicate flag.
There's a slight difference between my question and the one provided (this question, but also this one); both of those asked for looping through the map following the keys' lexicographic order. I, instead, have specifically asked:
Is there a solid way to iterate through a map and retrieve items in the order they've been inserted?
which is not lexicographic and thus different from #gramme.ninja question:
How can I get the keys to be in order / sort the map so that the keys are in order and the values correspond?

If you need a map and keys in order, those are two different things: you need two different (data) types to provide that functionality.
With a keys slice
The easiest way to achieve this is to maintain key order in a separate slice. Whenever you put a new pair into the map, first check if the key is already in it. If not, add the new key to the keys slice. When you need elements in order, iterate over the keys slice. Of course, when you remove a pair, you also have to remove the key from the slice.
The keys slice only has to contain the keys (and not the values), so the overhead is small.
Wrap this new functionality (map+keys slice) into a new type and provide methods for it, and hide the map and slice. Then data misalignment cannot occur.
Example implementation:
package main

import "fmt"

type Key int   // Key type
type Value int // Value type

type Map struct {
    m    map[Key]Value
    keys []Key // keys in insertion order
}

func New() *Map {
    return &Map{m: make(map[Key]Value)}
}

func (m *Map) Set(k Key, v Value) {
    if _, ok := m.m[k]; !ok {
        m.keys = append(m.keys, k) // new key: remember its position
    }
    m.m[k] = v
}

func (m *Map) Range() {
    for _, k := range m.keys {
        fmt.Println(m.m[k])
    }
}
Using it:
func main() {
    m := New()
    m.Set(1, 11)
    m.Set(2, 22)
    m.Range()
}
Try it on the Go Playground.
With a value-wrapper implementing a linked-list
Another approach is to wrap the values and, along with the real value, also store the next/previous key.
For example, assuming you want a map like map[Key]Value:
type valueWrapper struct {
    value Value
    next  *Key // key of the next pair, in insertion order
}
Whenever you add a pair to the map, you set a valueWrapper as the value, and you have to link it to the previous (last) pair. To link, set the next field of the last wrapper to point to the new key. To implement this easily, it's recommended to also store the last key (to avoid having to search for it).
When you want to iterate over the elements in insertion order, you start from the first (you have to store this), and its associated valueWrapper will tell you the next key (in insertion order).
Example implementation:
package main

import "fmt"

type Key int   // Key type
type Value int // Value type

type valueWrapper struct {
    v    Value
    next *Key // the key inserted after this one
}

type Map struct {
    m           map[Key]valueWrapper
    first, last *Key
}

func New() *Map {
    return &Map{m: make(map[Key]valueWrapper)}
}

func (m *Map) Set(k Key, v Value) {
    if w, ok := m.m[k]; ok {
        // Existing key: update the value but keep its position in the order.
        m.m[k] = valueWrapper{v, w.next}
        return
    }
    if m.last != nil {
        // Link the previous last pair to the new key.
        w2 := m.m[*m.last]
        m.m[*m.last] = valueWrapper{w2.v, &k}
    }
    m.m[k] = valueWrapper{v: v}
    if m.first == nil {
        m.first = &k
    }
    m.last = &k
}

func (m *Map) Range() {
    for k := m.first; k != nil; {
        w := m.m[*k]
        fmt.Println(w.v)
        k = w.next
    }
}
Using it is the same. Try it on the Go Playground.
Notes: You may vary a couple of things to your liking:
You may declare the internal map like m map[Key]*valueWrapper, so in Set() you can change the next field without having to assign a new valueWrapper (see the sketch after this list).
You may choose first and last fields to be of type *valueWrapper
You may choose next to be of type *valueWrapper
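A minimal sketch of the first variation (only Set is shown; New(), Range() and the rest are unchanged from the implementation above):
type Map struct {
    m           map[Key]*valueWrapper
    first, last *Key
}

func (m *Map) Set(k Key, v Value) {
    if w, ok := m.m[k]; ok {
        w.v = v // existing key: mutate the wrapper in place
        return
    }
    w := &valueWrapper{v: v}
    m.m[k] = w
    if m.last != nil {
        m.m[*m.last].next = &k // no wrapper re-assignment needed
    } else {
        m.first = &k
    }
    m.last = &k
}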
Comparison
The approach with an additional slice is easier and cleaner. But removing an element may become slow if the map grows big, as we also have to find the key in the slice, which is unsorted: that's O(n) complexity.
The approach with a linked list in the value-wrapper can easily be extended to support fast element removal even if the map is big, if you also add a prev field to the valueWrapper struct. To remove an element, you find its wrapper in O(1), update the prev and next wrappers (to point to each other), and perform a simple delete(): the whole removal is O(1).
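A minimal sketch of that extension (Delete only; the prev field is the addition described above, and Set would also have to maintain prev, which is omitted here):
type valueWrapper struct {
    v          Value
    prev, next *Key
}

func (m *Map) Delete(k Key) {
    w, ok := m.m[k]
    if !ok {
        return
    }
    if w.prev != nil {
        p := m.m[*w.prev]
        m.m[*w.prev] = valueWrapper{p.v, p.prev, w.next} // unlink: prev -> next
    } else {
        m.first = w.next
    }
    if w.next != nil {
        n := m.m[*w.next]
        m.m[*w.next] = valueWrapper{n.v, w.prev, n.next} // unlink: next -> prev
    } else {
        m.last = w.prev
    }
    delete(m.m, k) // O(1) overall
}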
Note that deletion in the first solution (with the slice) could still be sped up by using one additional map, which would map from key to the index of the key in the slice (map[Key]int), so the delete operation could still be implemented in O(1), in exchange for greater complexity. Another option for speeding it up is to change the value in the map to a wrapper, which holds the actual value and the index of the key in the slice.
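A minimal sketch of the key-to-index idea. One detail the note leaves open: removing the element from the keys slice itself would still shift O(n) entries, so this sketch (my assumption, not part of the original note) leaves a nil tombstone that Range() skips:
type Map struct {
    m     map[Key]Value
    keys  []*Key      // nil entries are deleted slots
    index map[Key]int // key -> position in keys
}

func New() *Map {
    return &Map{m: make(map[Key]Value), index: make(map[Key]int)}
}

func (m *Map) Set(k Key, v Value) {
    if _, ok := m.m[k]; !ok {
        m.index[k] = len(m.keys)
        kk := k
        m.keys = append(m.keys, &kk)
    }
    m.m[k] = v
}

func (m *Map) Delete(k Key) {
    if i, ok := m.index[k]; ok {
        m.keys[i] = nil // O(1): tombstone instead of shifting the slice
        delete(m.index, k)
        delete(m.m, k)
    }
}

func (m *Map) Range() {
    for _, k := range m.keys {
        if k != nil {
            fmt.Println(m.m[*k])
        }
    }
}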
See related question: Why can't Go iterate maps in insertion order?

What's the fastest way of finding the index of the maximum value in an array?

I have a 2D array of type f32 (from ndarray::ArrayView2) and I want to find the index of the maximum value in each row, and put the index value into another array.
The equivalent in Python is something like:
import numpy as np
for i in range(0, max_val, batch_size):
    sims = np.dot(batch, vectors.T)
    # sims is the dot product of batch and vectors.T
    # its shape is, for example, (1024, 10000)
    best_rows[i : i + batch_size] = sims.argmax(axis=1)
In Python, the function .argmax is very fast, but I don't see any function like that in Rust. What's the fastest way of doing so?
Consider the easy case of a general Ord type first. The answer differs slightly depending on whether you know the values are Copy, but here's the code:
fn position_max_copy<T: Ord + Copy>(slice: &[T]) -> Option<usize> {
    slice.iter()
        .enumerate()
        .max_by_key(|(_, &value)| value)
        .map(|(idx, _)| idx)
}

fn position_max<T: Ord>(slice: &[T]) -> Option<usize> {
    slice.iter()
        .enumerate()
        .max_by(|(_, value0), (_, value1)| value0.cmp(value1))
        .map(|(idx, _)| idx)
}
The basic idea is that we pair [a reference to] each item in the array (really, a slice; it doesn't matter whether it comes from a Vec, an array, or something more exotic) with its index, use std::iter::Iterator functions to find the maximum according to the value only (not the index), then return just the index. If the slice is empty, None is returned. Per the documentation, the rightmost index will be returned; if you need the leftmost, add rev() after enumerate().
rev(), enumerate(), max_by_key(), and max_by() are documented on std::iter::Iterator; slice::iter() is documented on the slice primitive type (that one needs to be on your shortlist of things to recall without documentation as a Rust dev); map is Option::map() (ditto). And cmp is Ord::cmp, but most of the time you can use the Copy version, which doesn't need it (e.g. if you're comparing integers).
Now here's the catch: f32 isn't Ord because of the way IEEE floats work; most languages ignore this and have subtly wrong algorithms. The most popular crate to provide a total order on floats (by declaring all NaNs to be equal, and greater than all numbers) seems to be ordered-float. Assuming it's implemented correctly, it should be very lightweight. It does pull in num_traits, but that is part of the most popular numerics library, so it might well be pulled in by other dependencies already.
You'd use it here by mapping ordered_float::OrderedFloat (the tuple struct's constructor) over the slice iterator (slice.iter().copied().map(ordered_float::OrderedFloat)). Since you only want the position of the maximum element, there's no need to extract the f32 afterward.
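A minimal sketch of that combination (the helper name is mine; assumes the ordered-float crate as a dependency):
use ordered_float::OrderedFloat;

fn position_max_f32(slice: &[f32]) -> Option<usize> {
    slice.iter()
        .copied()
        .map(OrderedFloat)               // wrap so the values are Ord
        .enumerate()
        .max_by_key(|&(_, value)| value) // compare by value only
        .map(|(idx, _)| idx)             // keep just the index
}
For example, position_max_f32(&[0.5, 2.0, 1.5]) returns Some(1).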
The approach from #David A is cool, but as mentioned, there's a catch: f32 & f64 do not implement Ord. (Which is really a pain in you-know-where.)
There are multiple ways of solving that: you can implement cmp yourself, or you can use ordered-float, etc.
In my case, this is part of a bigger project and we are very careful about using external packages. Besides, I am pretty sure we don't have any NaN values. Therefore I would prefer using fold, which, if you take a close look at the max_by_key source code, is what it uses too.
for (i, row) in matrix.axis_iter(Axis(0)).enumerate() {
    // fold carries (index_of_max, max_value) across the row
    let (max_idx, max_val) =
        row.iter()
            .enumerate()
            .fold((0, row[0]), |(idx_max, val_max), (idx, val)| {
                if &val_max > val {
                    (idx_max, val_max)
                } else {
                    (idx, *val)
                }
            });
    // e.g. best_rows[i] = max_idx
}

Swift algorithm to enumerate a multilinear map, using multiple indexes: [Int]

A multilinear map M has its elements stored in a one-dimensional array of length N, with a Shape S defined by S: [Int] = [p, q, r, ...] so that p*q*r*... = N. The Shape is of variable size, not known at compile time.
The issue I'm trying to solve is a generic approach to accessing the map's elements using an array of integers whose individual values are coordinates in the Shape S, e.g. M[1,3,2], M[2,3,3,3], etc. This is different from a simple enumeration of the map's elements.
One method is to use M[i,j,k] and implement a subscript method. Unfortunately, this approach hardcodes the map's shape, and the algorithm is no longer generic.
Say there's a utility function that returns an element index from the coordinates, derived from the map's Shape, so that:
func index(_ indexes: [Int]) -> Int { .... }

func elementAt(indexes: [Int]) -> Element {
    return elements_of_the_map[self.index(indexes)]
}
M.elementAt(indexes: [i,j,k]) or M.elementAt(indexes: [i,j,k,l,m]) always works. So the problem at this point is to build the array [i,j,k,...].
Question: Is there an algorithm to efficiently enumerate those indexes? Nested loops won't work, since the number of loops isn't known at compile time, and recursive functions seem to add a lot of complexity (in particular, keeping track of previous indexes).
I'm thinking about an algorithm 'a la' base-x counting: add one unit to the rightmost index, and carry one unit leftwards whenever a coordinate exceeds what the map's Shape allows in that position.
Same idea, but less code:
func addOneUnit(shape: [Int], indexes: [Int]) -> [Int]? {
    var next = indexes
    // Work right to left, carrying like base-x addition.
    for i in shape.indices.reversed() {
        next[i] += 1
        if next[i] < shape[i] {
            return next
        }
        next[i] = 0 // overflow: reset and carry into the next position
    }
    return nil // wrapped past the last coordinate
}
Here's the code; it's primitive, but should work. The idea is to increment right to left, moving e.g. from [1,2,1] to [1,2,2] under the shape constraint [2,3,3].
func add_one_unit(shape: [Int], indexes: [Int]) -> [Int]? {
    // Addition is right to left, so we have to reverse the arrays.
    // Shape arrays are usually very small, so it's fast.
    let uu = Array(indexes.reversed())           // Array to add one unit to.
    let shape_reversed = Array(shape.reversed()) // Shape array.
    var vv: [Int] = []
    var move_next: Bool = true
    for i in 0..<uu.count {
        if move_next {
            if uu[i] < shape_reversed[i] - 1 { // Shape constraint is OK.
                vv.append(uu[i] + 1)
                move_next = false
            } else {
                vv.append(0)     // Shape constraint is reached.
                move_next = true // We'll carry into the next index.
            }
        } else {
            vv.append(uu[i]) // Nothing to change.
        }
    }
    // Returns nil once we wrap back around to the zero vector.
    return vv.reduce(true, { $0 && ($1 == 0) }) ? nil : Array(vv.reversed())
}
Which gives:
add_one_unit(shape: [2,3,3], indexes: [0,0,0]) -> [0,0,1]
add_one_unit(shape: [2,3,3], indexes: [1,2,2]) -> nil (wrapped back to the zero vector)
Once this is done, this function can be used to enumerate a multilinear map of any shape (a mapping of [i,j,k,...] to a unique element index, such as a matrix-to-index mapping, is necessary and depends on your implementation), or to slice a map starting from any particular vector.
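For illustration, a minimal driver (the shape is arbitrary) that enumerates every coordinate using addOneUnit from the shorter answer above:
var coords: [Int]? = [0, 0, 0] // start at the zero vector
while let current = coords {
    print(current) // or visit M.elementAt(indexes: current) here
    coords = addOneUnit(shape: [2, 3, 3], indexes: current)
}
// prints [0, 0, 0], [0, 0, 1], ... up to [1, 2, 2]: 18 coordinates in total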

Duplicate Int in Array, Dictionary or Set in Swift

Reading up on Sets and Arrays, I find that a Set cannot store duplicate values (Ints, Strings, etc.).
Knowing this, if we are to solve finding a duplicate Int in an array, and one method is to convert the Array to a Set, how come we don't get an error once the Array is a Set?
The methods below simply return a Bool value if the array contains duplicates.
import UIKit

func containsDuplicatesDictionary(a: [Int]) -> Bool {
    var aDict = [Int: Int]()
    for value in a {
        if let count = aDict[value] {
            aDict[value] = count + 1
            return true // value seen before: duplicate found
        } else {
            aDict[value] = 1
        }
    }
    return false
}
containsDuplicatesDictionary(a: [1,2,2,4,5])
func containsDuplicatesSet(a: [Int]) -> Bool {
    return Set(a).count != a.count
}
containsDuplicatesSet(a: [1,2,2,4])
In the first function, containsDuplicatesDictionary, I convert the array to a Dictionary, which of course takes a for loop as well. The Set method can be done in one line, which is really nice. But since I am new to this, I would have thought converting the array would throw an error immediately, since there are duplicate values.
What am I missing when it's converted?
Thank you.
Set, by design, is an unordered collection of unique elements. The implementation of Set takes care of duplicate values itself: when you try to add a duplicate value, it checks whether the value is already present in the Set, and if it is, the value is not added.
When you call the initializer of Set that takes a sequence as its input parameter (this is what you use when writing Set(a), where a is of type [Int]), under the hood the initializer adds the elements one by one, checking whether each new element is already present in the Set.
You could make a custom initializer for Set that would throw an error if you tried to add a duplicate value, but it wouldn't really have any advantage for users of Swift; hence the current implementation, which simply doesn't add a value that is already present and doesn't throw an error. This way, you can safely and easily get rid of any duplicates in a non-unique collection of elements (such as an array).
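You can see this report-instead-of-throw behavior directly with Set's insert method, which returns a tuple instead of raising an error:
var set: Set<Int> = []
print(set.insert(2)) // (inserted: true, memberAfterInsert: 2)
print(set.insert(2)) // (inserted: false, memberAfterInsert: 2); the duplicate is silently ignored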

Is there any better way to handle slices of variable size?

Please see the code below:
names := make([]string, 0, 100)
names = append(names, "Jack")
names = append(names, "Jacob")
// adding many names in here
Given circumstances like this: I will get these names from somewhere else, and I don't know the size in advance, so I need a dynamic array to contain these names. The above code is the way I came up with. I was wondering if there is a more elegant way to do this.
If I initialise it like this:
names := make([]string, 100, 200)
// then use append here,
// I would get the first 100 elements as empty strings, and the appends start at index 100.
I suppose this would be such a waste of memory.
I am totally new to statically typed languages, so if there is any wrong concept in this post, please point it out.
Just declare the variable and assign the result of append to it:
package main

import "fmt"

func main() {
    var names []string
    names = append(names, "foo")
    names = append(names, "bar")
    fmt.Println(names)
}
Yields:
>> [foo bar]
If you are into the mechanics of it, here is a nice blog post.
Stick with what you are doing. The 100 DOES NOT prevent the slice from having more than 100 elements:
names := make([]string, 0, 100)
names = append(names, "Jack")
names = append(names, "Jacob")
I would strongly suggest setting the capacity of the slice if you have a rough estimate of the number of elements, and ALWAYS using append to add elements to the slice. You don't have to worry about exceeding your estimate, as append WILL create a new backing array to fit the added elements.
names := make([]string, 0)
In the above case your slice has 0 capacity, and appending will cause the underlying array to be reallocated again and again as it grows. This will have an impact on performance; you should avoid it. If you worry about taking up more space in memory, you might consider creating a slice of pointers to a type:
objList := make([]*MyStructType, 0, 100)
You can consider setting an initial length, instead of 0, in addition to your capacity (100, meaning the backing array can hold 100 elements before append has to reallocate).
See "Arrays, slices (and strings): The mechanics of 'append'"
allocation.
We could use the new built-in function to allocate a bigger array and then slice the result, but it is simpler to use the make built-in function instead. It allocates a new array and creates a slice header to describe it, all at once. The make function takes three arguments: the type of the slice, its initial length, and its capacity, which is the length of the array that make allocates to hold the slice data.
The idea is to avoid forcing append() to grow the slice too soon, especially if you know you will receive at least n elements.
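A small runnable sketch of the difference the asker ran into (the names and sizes are illustrative):
package main

import "fmt"

func main() {
    names := make([]string, 0, 100) // len 0, cap 100: appends start at index 0
    names = append(names, "Jack", "Jacob")
    fmt.Println(len(names), cap(names)) // 2 100

    padded := make([]string, 100, 200) // len 100: slots 0..99 are already ""
    padded = append(padded, "Jack")    // this one lands at index 100
    fmt.Println(len(padded), padded[100]) // 101 Jack
}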

dynamic array auto shrink

When designing a dynamic array that can shrink, what is a good method to keep track of the highest used index, and, when the object held at that index is deleted, to find the new highest used index?
Right now I can think of having a simple int last_index and charging the cost of maintaining this variable to func_delete, which must check whether it deleted the highest entry and, if so, scan downward looking for the next non-NULL entry. (The array will always be NULL-initialized.)
if (last_index == deleted_index) {
    // Scan downward past NULL slots to find the new highest used index.
    while (last_index > 0 && array[--last_index] == NULL)
        ;
    // if the array is now only half full, realloc to shrink
}
Are there any other smart ways?
It looks okay to me. But logic-wise, if func_delete always deletes the highest item, then there shouldn't be any NULL values at smaller indices, so this should do:
if (last_index == deleted_index && array[last_index] != NULL) {
    // delete the last item
    free(array[last_index]);
    // set to NULL and decrement
    array[last_index--] = NULL;
}
Edit:
Based on your comments, I understand what you're trying to do; I think you could just keep track of the highest used index in the insertion function instead:
void array_insert(array *arr, element e, int index)
{
    if (index > arr->highest_index) {
        arr->highest_index = index;
    }
    // insert element
}
And when you delete, you check whether the index is the highest or not, like you're doing; I don't think there's a better way to do that without complicating things further. For example, you could keep another sorted list of indices, so that when you delete the highest index you find the next one in constant time, but like I said, it complicates things. However, another data structure might be more useful: a linked list, for example, is more efficient for randomly deleting nodes, but not for randomly inserting them.
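Putting the two halves together, a minimal sketch of the delete side (func_delete and highest_index come from the question and answer; the array struct with an items field is my assumption):
void func_delete(array *arr, int index)
{
    free(arr->items[index]);
    arr->items[index] = NULL;
    if (index == arr->highest_index) {
        /* scan downward past NULL slots for the new highest used index */
        while (arr->highest_index > 0 && arr->items[arr->highest_index] == NULL)
            arr->highest_index--;
    }
}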
