I'm porting some old PHP script to Go in order to achieve better performance. However, the old PHP is full of multidimensional arrays. Some excerpt from the codebase:
while (($row = $stmt->fetch(PDO::FETCH_ASSOC)) !== false) {
$someData[$row['column_a']][$row['column_b']] = $row;
}
// ... more queries and stuff
if (isset($moreData['id']) && isset($anotherData['id']) && $someData[$anotherData['id']][$moreData['id']]) {
echo $someData[$anotherData['id']][$moreData['id']];
}
Awful, i know, but i can't change the logic. I made the whole script perform much better by compiling with phc, but moving to a procedural, statically-typed language seems like a better move. How can i replicate those data structures efficiently with Go or Rust? It needs to be fault-tolerant when it comes to checking indexes, there's a lot of issets all around the script to check if the identifiers exist in the data structures.
In Go, this would be represented as a map. The syntax is map[key]value. So for example to store a multidimensional map of [string, string] -> int it would be map[string]map[string]int. If you know your indices are integers and densely packed, then you'd want to use slices. Those are simpler and look like [][]type.
As for the checking for a key existing, use this syntax, where m is a map:
if val, ok := m[key1][key2]; ok {
///Do something with val
}
Remember that to add a key to a multidimensional map you'd have to make sure the inner map is allocated before adding to it.
if _, ok := m[key]; !ok {
m[key] = make(map[string]int)
}
m[key1][key2] = value
Obviously you'd want to wrap this up in a type with methods or a few simple functions.
Rust
PHP arrays are actually associative arrays, also known as maps or dictionaries. Rust uses the name Map in its standard library. The Map trait provides an interface for a variety of implementations. In particular, this trait defines a contains_key method, which you can use to check if the map contains a particular key (instead of writing isset($array[$key]), you write map.contains_key(key)).
Map has two type parameters: K is the type of the map's keys (i.e. the values you use as an index) and V is the type of the map's values.
If you need your map to contain keys and/or values of various types, you'll need to use the Any trait. For example, if the keys are strings and the values are of various types, you could use HashMap<String, Box<Any>> (Box is necessary because trait objects are unsized; see this answer for more information). Check out the documentation for AnyRefExt and AnyMutRefExt to see how to work with an Any value.
However, if the possible types are relatively limited, it might be easier for you to define your own trait and use that trait instead of Any so that you can implement operations on those types without having to cast explicitly everywhere you need to use the values (plus, you can add types by adding an impl without having to change all the places that use values).
Related
In many languages you can specify that an array is of a certain type. For instance, in Java you could write:
String[] arrayOfStrings;
However in ActionScript 3 it seems that you can only specify that an object is of type Array, for instance:
var myArray:Array;
Is there a way to specify what type of object an AS3 array will contain?
You can use Vector.<String> to store several objects of the given type in an array. Vector is type-safe and is faster than Array so in almost all cases (when it's up to you) you should use Vector instead of Array.
I also recommend reading this article about the various ways to construct a vector. The article is from 2010 (so many Flash Player improvements have been done since then) but much of it still applies and you can download Jackson's test source to run the performance test on the current player.
So in Swift, what's the difference between
var arr = ["Foo", "Bar"] // normal array in Swift
and
var arr = NSMutableArray.array() // 'NSMutableArray' object
["Foo", "Bar"].map {
arr.addObject($0)
}
other than being different implementations of the same thing.
Both appear to have all the basic features that one might need (.count, the ability to insert/remove objects etc.).
NSMutableArray was invented back in the Obj-C days, obviously to provide a more modern solution instead of the regular C-style arrays. But how does it compare to Swift's built-in array?
Which one is safer and/or faster?
The most important difference, in my opinion, is that NSMutableArray is a class type and Array is a value type. Ergo, an NSMutableArray will be passed as a reference, whereas a Swift Array will be passed by value.
Furthermore NSMutableArray is a subclass of NSObject whereas Array has no parent class. - this means that you get access to all NSObject methods and other 'goodies' when utilising NSMutableArray.
An NSMutableArray will not be copied when you amend it, a Swift Array will be.
Which one is best really depends on your application.
I find (when working with UIKit and Cocoa touch) that NSMutableArray is great when I need a persistent model, whereas Array is great for performance and throw away arrays.
These are just my initial thoughts, I'm sure someone from the community can offer much deeper insight.
Reference Type When:(NSMutableArray)
Subclasses of NSObject must be class types
Comparing instance identity with === makes sense
You want to create shared, mutable state
Value Type When: (Swift array)
Comparing instance data with == makes sense (Equatable protocol)
You want copies to have independent state
The data will be used in code across multiple threads (avoid explicit synchronization)
Interestingly enough, the Swift standard library heavily favors value types:Primitive types (Int, Double, String, …) are value types
Standard collections (Array, Dictionary, Set, …) are value types
Aside from what is illustrated above, the choice really depends on what you are trying to implement. As a rule of thumb, if there is no specific constraint that forces you to opt for a reference type, or you are not sure which option is best for your specific use case, you could start by implementing your data structure using a value type. If needed, you should be able to convert it to a reference type later with relatively little effort.
Conclusion:
Reference types incur more memory overhead, from reference counting and also for storing its data on the heap.
It's worth knowing that copying value types is relatively cheap in Swift,
But it’s important to keep in mind that if your value types become too large, the performance cost of copying can become greater than the cost of using reference types.
I can see that it's possible to write functions like map/sortBy/findIndex and some other List-related functions for Arrays instead (at least those indexed by integers.) Is this done anywhere in the standard library, or would I need to roll my own?
I need to use an array in my program for the in-place update, but there are also several locations I'd like to use some of the above list functions on it. Is converting back and forth between the two the best solution?
(The arrays I've been looking at are from Data.Array.IArray. I'm also happy to use any other array library that implements this functionality.)
I recommend you have a look at the vector and vector-algorithms packages. They contain very efficient implementations of many common operations on Int-indexed arrays, in both mutable and immutable variants.
fmap (from Control.Monad) is sort of like a generic version of map that works on anything that supports the Functor type class. Array supports that, so you should be able to use fmap instead of map for array.
As hammar says, the vector and vector-algorithms are probably a better way to approach the problem if you need to consider indexed arrays.
This is a bit of a general question but I was wondering if anybody could advise me on what would be advantages of working with Array vs ArraySeq. From what I have seen Array is scala's representation of java Array and there are not too many members in its API whereas ArraySeq seems to contain a much richer API.
There are actually four different classes you could choose from to get mutable array-like functionality.
Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer
Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.
ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
From the scala-lang.org forum:
Array[T] - Benefits: Native, fast -
Limitations: Few methods (only apply,
update, length), need to know T at
compile-time, because Java bytecode
represents (char[] different from
int[] different from Object[])
ArraySeq[T] (the class formerly known
as GenericArray[T]): - Benefits: Still
backed by a native Array, don't need
to know anything about T at
compile-time (new ArraySeq[T] "just
works", even if nothing is known about
T), full suite of SeqLike methods,
subtype of Seq[T] - Limitations: It's
backed by an Array[AnyRef], regardless
of what T is (if T is primitive, then
elements will be boxed/unboxed on
their way in or out of the backing
Array)
ArraySeq[Any] is much faster than
Array[Any] when handling primitives.
In any code you have Array[T], where T
isn't <: AnyRef, you'll get faster
performance out of ArraySeq.
Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.
The advantage of Array is that it's the only collection type on the JVM to not undergo type erasure, Arrays are also able to directly hold primitives without boxing, this can make them very fast under some circumstances.
Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[Int] to some Java class it can be assigned to a variable of type Array[Object] which will then throw an ArrayStoreException on trying to add anything that isn't an int.)
ArraySeq is rarely used nowadays, it's more of a historic artifact from older versions of Scala that treated arrays differently. Seeing as you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.
Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.
Unless you have a specific need for the unique properties of arrays, try to avoid them too.
See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.
After watching that video, it's interesting to note that Scala uses Seq for varargs :)
As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.
The relation between the both could be roughly compared to the relation of the ArrayList and arrays in Java.
Due to it's API, I would recommend using the ArraySeq unless there is a specific reason not to do so. Using toArray(), you can convert to an Array any time.
Scala has all sorts sorts of immutable sequences like List, Vector,etc. I have been surprised to find no implementation of immutable indexed sequence backed by a simple array (Vector seems way too complicated for my needs).
Is there a design reason for this? I could not find a good explanation on the mailing list.
Do you have a recommendation for an immutable indexed sequence that has close to the same performances as an array? I am considering scalaz's ImmutableArray, but it has some issues with scala trunk for example.
Thank you
You could cast your array into a sequence.
val s: Seq[Int] = Array(1,2,3,4)
The array will be implicitly converted to a WrappedArray. And as the type is Seq, update operations will no longer be available.
So, let's first make a distinction between interface and class. The interface is an API design, while the class is the implementation of such API.
The interfaces in Scala have the same name and different package to distinguish with regards to immutability: Seq, immutable.Seq, mutable.Seq.
The classes, on the other hand, usually don't share a name. A List is an immutable sequence, while a ListBuffer is a mutable sequence. There are exceptions, like HashSet, but that's just a coincidence with regards to implementation.
Now, and Array is not part of Scala's collection, being a Java class, but its wrapper WrappedArray shows clearly where it would show up: as a mutable class.
The interface implemented by WrappedArray is IndexedSeq, which exists are both mutable and immutable traits.
The immutable.IndexedSeq has a few implementing classes, including the WrappedString. The general use class implementing it, however, is the Vector. That class occupies the same position an Array class would occupy in the mutable side.
Now, there's no more complexity in using a Vector than using an Array, so I don't know why you call it complicated.
Perhaps you think it does too much internally, in which case you'd be wrong. All well designed immutable classes are persistent, because using an immutable collection means creating new copies of it, so they have to be optimized for that, which is exactly what Vector does.
Mostly because there are no arrays whatsoever in Scala. What you're seeing is java's arrays pimped with a few methods that help them fit into the collection API.
Anything else wouldn't be an array, with it's unique property of not suffering type erasure, or the broken variance. It would just be another type with indexes and values. Scala does have that, it's called IndexedSeq, and if you need to pass it as an array to some 3rd party API then you can just use .toArray
Scala 2.13 has added ArraySeq, which is an immutable sequence backed by an array.
Scala 3 now has IArray, an Immutable Array.
It is implemented as an Opaque Type Alias, with no runtime overhead.
The point of the scala Array class is to provide a mechanism to access the abilities of Java arrays (but without Java's awful design decision of allowing arrays to be covariant within its type system). Java arrays are mutable, hence so are those in the scala standard library.
Suppose there were also another class immutable.Array in the library but that the compiler were also to use a Java array as the underlying structure (for efficiency/speed). The following code would then compile and run:
val i = immutable.Array("Hello")
i.asInstanceOf[Array[String]](0) = "Goodbye"
println( i(0) ) //I thought i was immutable :-(
That is, the array would really be mutable.
The problem with Arrays is that they have a fixed size. There is no operation to add an element to an array, or remove one from it.
You can keep an array that you guess will be long enough as a backing store, "wasting" the memory you're not using, keep track of the last used index, and copy to a larger array if you need the extra space. That copying is O(N) obviously.
Changing a single element is also O(N) as you will need to copy over the entire array. There is no structural sharing, which is the lynchpin of performant functional datastructures.
You could also allocate an extra array for the "overflowing" elements, and somehow keep track of your arrays. At that point you're on your way of re-inventing Vector.
In short, due to their unsuitablility for structural sharing, immutable facades for arrays have terrible runtime performance characteristics for most common operations like adding an element, removing an element, and changing an element.
That only leaves the use-case of a fixed size fixed content data-carrier, and that use-case is relatively rare. Most uses better served with List, Stream or Vector
You can simply use Array[T].toIndexSeq to convert Array[T] to ArraySeq[T], which is of type immutable.IndexedSeq[T].
(after Scala 2.13.0)
scala> val array = Array(0, 1, 2)
array: Array[Int] = Array(0, 1, 2)
scala> array.toIndexedSeq
res0: IndexedSeq[Int] = ArraySeq(0, 1, 2)