Why no immutable arrays in the Scala standard library?

Scala has all sorts of immutable sequences like List, Vector, etc. I have been surprised to find no implementation of an immutable indexed sequence backed by a simple array (Vector seems way too complicated for my needs).
Is there a design reason for this? I could not find a good explanation on the mailing list.
Do you have a recommendation for an immutable indexed sequence that has close to the same performance as an array? I am considering scalaz's ImmutableArray, but it has some issues with Scala trunk, for example.
Thank you

You could cast your array into a sequence.
val s: Seq[Int] = Array(1,2,3,4)
The array will be implicitly converted to a WrappedArray. And as the type is Seq, update operations will no longer be available.
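One caveat with this approach can be sketched as follows. This is a minimal example, assuming the value is typed as scala.collection.Seq so that the array is wrapped rather than copied (the wrapper class is WrappedArray in Scala 2.12 and mutable.ArraySeq in 2.13). The wrapper shares storage with the original array, so the sequence is read-only by interface only:

```scala
val arr = Array(1, 2, 3, 4)

// Implicit wrap: no copy is made, the Seq is a view over arr's storage.
val s: scala.collection.Seq[Int] = arr

// Writes through the original array are visible through the Seq.
arr(0) = 99
val seenThroughSeq = s(0)

// A client can also cast back to a mutable interface and write through it.
s.asInstanceOf[scala.collection.mutable.IndexedSeq[Int]](1) = -1
val seenThroughArray = arr(1)
```

So the Seq type hides update, but it does not guarantee that the contents never change.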

So, let's first make a distinction between interface and class. The interface is an API design, while the class is the implementation of such API.
The interfaces in Scala share a name and are distinguished by package with regard to mutability: Seq, immutable.Seq, mutable.Seq.
The classes, on the other hand, usually don't share a name. A List is an immutable sequence, while a ListBuffer is a mutable sequence. There are exceptions, like HashSet, but that's just a coincidence with regards to implementation.
Now, an Array is not part of Scala's collections, being a Java class, but its wrapper WrappedArray shows clearly where it would show up: as a mutable class.
The interface implemented by WrappedArray is IndexedSeq, which exists as both mutable and immutable traits.
The immutable.IndexedSeq has a few implementing classes, including the WrappedString. The general use class implementing it, however, is the Vector. That class occupies the same position an Array class would occupy in the mutable side.
Now, there's no more complexity in using a Vector than using an Array, so I don't know why you call it complicated.
Perhaps you think it does too much internally, in which case you'd be wrong. All well designed immutable classes are persistent, because using an immutable collection means creating new copies of it, so they have to be optimized for that, which is exactly what Vector does.

Mostly because there are no arrays whatsoever in Scala. What you're seeing is java's arrays pimped with a few methods that help them fit into the collection API.
Anything else wouldn't be an array, with its unique property of not suffering type erasure, or the broken variance. It would just be another type with indexes and values. Scala does have that, it's called IndexedSeq, and if you need to pass it as an array to some 3rd party API then you can just use .toArray

Scala 2.13 has added ArraySeq, which is an immutable sequence backed by an array.
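A short sketch of how this looks in practice, assuming Scala 2.13 or later. The value names are only for illustration:

```scala
import scala.collection.immutable.ArraySeq

val xs = ArraySeq(1, 2, 3)

// updated returns a new ArraySeq; xs itself is never modified.
val ys = xs.updated(1, 20)

// unsafeWrapArray wraps an existing array without copying it.
// The "unsafe" part: later writes to raw show through the wrapper.
val raw = Array(4, 5, 6)
val wrapped = ArraySeq.unsafeWrapArray(raw)
raw(0) = 40
val leaked = wrapped(0)
```

If the source array may still be written to, copying it first (for example with raw.toIndexedSeq) is the safer choice.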

Scala 3 now has IArray, an Immutable Array.
It is implemented as an Opaque Type Alias, with no runtime overhead.

The point of the scala Array class is to provide a mechanism to access the abilities of Java arrays (but without Java's awful design decision of allowing arrays to be covariant within its type system). Java arrays are mutable, hence so are those in the scala standard library.
Suppose there were also another class immutable.Array in the library but that the compiler were also to use a Java array as the underlying structure (for efficiency/speed). The following code would then compile and run:
val i = immutable.Array("Hello")
i.asInstanceOf[Array[String]](0) = "Goodbye"
println( i(0) ) //I thought i was immutable :-(
That is, the array would really be mutable.

The problem with Arrays is that they have a fixed size. There is no operation to add an element to an array, or remove one from it.
You can keep an array that you guess will be long enough as a backing store, "wasting" the memory you're not using, keep track of the last used index, and copy to a larger array if you need the extra space. That copying is O(N) obviously.
Changing a single element is also O(N) as you will need to copy over the entire array. There is no structural sharing, which is the lynchpin of performant functional datastructures.
You could also allocate an extra array for the "overflowing" elements, and somehow keep track of your arrays. At that point you're on your way of re-inventing Vector.
In short, due to their unsuitability for structural sharing, immutable facades for arrays have terrible runtime performance characteristics for most common operations like adding an element, removing an element, and changing an element.
That only leaves the use-case of a fixed size, fixed content data-carrier, and that use-case is relatively rare. Most uses are better served with List, Stream or Vector.
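To make the copying cost concrete, here is a small sketch comparing a non-destructive update on an array with the same update on a Vector. It only shows the behaviour, not the timing:

```scala
val original = Array(1, 2, 3)

// Non-destructive update on an array: a whole new array is allocated
// and every element is copied, so this is O(N).
val changed = original.updated(1, 99)

// The same operation on a Vector returns a new Vector that shares
// most of its internal structure with the old one.
val v = Vector(1, 2, 3)
val v2 = v.updated(1, 99)
```

In both cases the original value is left as it was. The difference is how much has to be copied to get there.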

You can simply use Array[T].toIndexedSeq to convert Array[T] to ArraySeq[T], which is of type immutable.IndexedSeq[T].
(after Scala 2.13.0)
scala> val array = Array(0, 1, 2)
array: Array[Int] = Array(0, 1, 2)
scala> array.toIndexedSeq
res0: IndexedSeq[Int] = ArraySeq(0, 1, 2)

Related

Why do filter method implementations create another array instead of modifying the current array?

From what I know, many popular implementations of filter collection methods, e.g. JavaScript's Array#filter method, tend to create a new array rather than modifying the existing one. (As @Berthur mentioned, this is also generally useful in terms of functional programming.)
However, from what I've seen in homemade methods of filter implementations, sometimes the author chooses to use a while / for loop on a dynamically allocated array (e.g. an ArrayList in Java) and remove elements instead.
I have a general idea of why this is the case (since removing elements requires the rest of the array's elements afterwards to be shifted over, which is O(n) while adding elements is O(1)), but I also know that in the same case, if an element is added to the end of an array when the array is full, it requires memory to be allocated, which requires, in the case for Java, the array to be copied.
Thus, is there some mathematical reason why creating a new array for filtering is (generally) faster than removing and moving elements over, or is it just for the immutability of the original array that it guarantees?
It's not generally faster, and it's not done for performance reasons. It's more of a programming paradigm, as well as being a convenient tool.
While in-place algorithms are often faster for performance and/or memory critical applications, they need to know about the underlying implementation of the data structure, and become more specific. This immutable approach allows for more general functionality, apart from being convenient. The approach is common in functional programming. As you say, it guarantees immutability, which makes it compatible with this way of thinking.
In your JavaScript example, for instance, notice that you can call filter on a regular array, but you could also call it on a TypedArray. Now, typed arrays cannot be resized, so performing an in-place filter would not be possible in the first place. But the filter method behaves in the same way through their common interface, following the principles of polymorphism.
Ultimately, these functions are just available to you and while they can be very convenient for many cases, it is up to you as a programmer to decide whether they cover your specific need or whether you must implement your own custom algorithm.
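The two approaches can be sketched side by side. This example is in Scala rather than JavaScript or Java, and the helper names filterCopy and filterInPlaceSketch are made up for illustration. The in-place version compacts the kept elements to the front in a single pass, so it avoids shifting the tail once per removed element:

```scala
import scala.collection.mutable.ArrayBuffer

// Copying filter: leaves the input alone and returns a new buffer.
def filterCopy(xs: ArrayBuffer[Int])(keep: Int => Boolean): ArrayBuffer[Int] =
  xs.filter(keep)

// In-place filter: moves kept elements to the front, then drops the tail.
// It relies on the buffer being resizable, which is exactly the kind of
// implementation knowledge the copying version does not need.
def filterInPlaceSketch(xs: ArrayBuffer[Int])(keep: Int => Boolean): Unit = {
  var write = 0
  var read = 0
  while (read < xs.length) {
    val x = xs(read)
    if (keep(x)) { xs(write) = x; write += 1 }
    read += 1
  }
  xs.remove(write, xs.length - write)
}

val buf = ArrayBuffer(1, 2, 3, 4, 5, 6)
val evens = filterCopy(buf)(_ % 2 == 0)   // buf still holds all six elements
```

Calling filterInPlaceSketch(buf)(_ % 2 == 0) afterwards would shrink buf itself to the even elements.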

Normal array in Swift vs 'NSMutableArray'?

So in Swift, what's the difference between
var arr = ["Foo", "Bar"] // normal array in Swift
and
var arr = NSMutableArray.array() // 'NSMutableArray' object
["Foo", "Bar"].map {
arr.addObject($0)
}
other than being different implementations of the same thing.
Both appear to have all the basic features that one might need (.count, the ability to insert/remove objects etc.).
NSMutableArray was invented back in the Obj-C days, obviously to provide a more modern solution instead of the regular C-style arrays. But how does it compare to Swift's built-in array?
Which one is safer and/or faster?
The most important difference, in my opinion, is that NSMutableArray is a class type and Array is a value type. Ergo, an NSMutableArray will be passed as a reference, whereas a Swift Array will be passed by value.
Furthermore, NSMutableArray is a subclass of NSObject, whereas Array has no parent class. This means that you get access to all NSObject methods and other 'goodies' when utilising NSMutableArray.
An NSMutableArray will not be copied when you amend it, a Swift Array will be.
Which one is best really depends on your application.
I find (when working with UIKit and Cocoa touch) that NSMutableArray is great when I need a persistent model, whereas Array is great for performance and throw away arrays.
These are just my initial thoughts, I'm sure someone from the community can offer much deeper insight.
Reference type when (NSMutableArray):
Subclasses of NSObject must be class types
Comparing instance identity with === makes sense
You want to create shared, mutable state
Value type when (Swift array):
Comparing instance data with == makes sense (Equatable protocol)
You want copies to have independent state
The data will be used in code across multiple threads (avoid explicit synchronization)
Interestingly enough, the Swift standard library heavily favors value types:
Primitive types (Int, Double, String, …) are value types
Standard collections (Array, Dictionary, Set, …) are value types
Aside from what is illustrated above, the choice really depends on what you are trying to implement. As a rule of thumb, if there is no specific constraint that forces you to opt for a reference type, or you are not sure which option is best for your specific use case, you could start by implementing your data structure using a value type. If needed, you should be able to convert it to a reference type later with relatively little effort.
Conclusion:
Reference types incur more memory overhead, from reference counting and from storing their data on the heap.
It's worth knowing that copying value types is relatively cheap in Swift, but it's important to keep in mind that if your value types become too large, the performance cost of copying can become greater than the cost of using reference types.

How do I resize an array in Scala

I am trying to create a DB management tool in Scala, and I want to be able to draw from this database into Arrays, whose size can shift based on the data being passed to them. I know how to do this in C, PHP, VB, etc. but can't seem to figure out the syntax for Scala.
I'm sure this should be a simple problem, so any help would be appreciated
Collections by default in Scala tend to be immutable. Operations will create new immutable collections from existing collections (by adding/removing elements etc.). The benefit of this is that collections don't change under iteration and writing multi-threaded applications tends to be easier (lots of caveats/assumptions with how you write standard Java apply here!).
Having said all that, if you need a mutable array, have you looked at an ArrayBuffer (a mutable collection with an underlying array implementation) ?
e.g.
val a = new scala.collection.mutable.ArrayBuffer[String]()
a += "A"
a += "B"
a(1) // gives you 'B'
You could use System.arraycopy for this task, if you really want to use an array, or you could directly use a container that will resize itself automatically, such as ListBuffer or ArrayList.
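For the plain-array route, a minimal sketch of manual resizing using the JDK method System.arraycopy. The helper name resized is made up for illustration:

```scala
// Returns a new array of the requested length holding as many of the
// old elements as fit. Arrays cannot grow in place, so "resizing"
// always means allocating and copying.
def resized(src: Array[Int], newLength: Int): Array[Int] = {
  val dest = new Array[Int](newLength)
  System.arraycopy(src, 0, dest, 0, math.min(src.length, newLength))
  dest
}

val small = Array(1, 2, 3)
val bigger = resized(small, 5)   // extra slots hold the default value 0
```

This is essentially what ArrayBuffer does internally when it runs out of room, which is why it is usually the more convenient choice.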

Array vs ArraySeq comparison

This is a bit of a general question, but I was wondering if anybody could advise me on what the advantages would be of working with Array vs ArraySeq. From what I have seen, Array is Scala's representation of a Java array and there are not too many members in its API, whereas ArraySeq seems to contain a much richer API.
There are actually four different classes you could choose from to get mutable array-like functionality.
Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer
Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.
ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
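A small sketch of the two ends of that list, a plain Array used through ArrayOps and an ArrayBuffer. The value names are only for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

// A plain Java array; map comes from the implicit ArrayOps conversion
// and the result is again a plain Array.
val plain: Array[Int] = Array(1, 2, 3)
val doubled: Array[Int] = plain.map(_ * 2)

// An ArrayBuffer can change length, and converts back to an Array
// when an API needs one.
val buf = ArrayBuffer(1, 2, 3)
buf += 4
buf.remove(0)
val back: Array[Int] = buf.toArray
```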
From the scala-lang.org forum:
Array[T] - Benefits: Native, fast - Limitations: Few methods (only apply, update, length), need to know T at compile-time, because Java bytecode represents (char[] different from int[] different from Object[])
ArraySeq[T] (the class formerly known as GenericArray[T]): - Benefits: Still backed by a native Array, don't need to know anything about T at compile-time (new ArraySeq[T] "just works", even if nothing is known about T), full suite of SeqLike methods, subtype of Seq[T] - Limitations: It's backed by an Array[AnyRef], regardless of what T is (if T is primitive, then elements will be boxed/unboxed on their way in or out of the backing Array)
ArraySeq[Any] is much faster than Array[Any] when handling primitives. In any code you have Array[T], where T isn't <: AnyRef, you'll get faster performance out of ArraySeq.
Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.
The advantage of Array is that it's the only collection type on the JVM to not undergo type erasure, Arrays are also able to directly hold primitives without boxing, this can make them very fast under some circumstances.
Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[Int] to some Java class it can be assigned to a variable of type Array[Object] which will then throw an ArrayStoreException on trying to add anything that isn't an int.)
ArraySeq is rarely used nowadays, it's more of a historic artifact from older versions of Scala that treated arrays differently. Seeing as you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.
Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.
Unless you have a specific need for the unique properties of arrays, try to avoid them too.
See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.
After watching that video, it's interesting to note that Scala uses Seq for varargs :)
As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.
The relation between the two could be roughly compared to the relation between ArrayList and arrays in Java.
Due to its API, I would recommend using ArraySeq unless there is a specific reason not to do so. Using toArray, you can convert to an Array any time.

What is most efficient way to do immutable byte arrays in Scala?

I want to get an array of bytes (Array[Byte]) from somewhere (read from file, from socket, etc) and then provide an efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in the array). I would then like to wrap the byte array (hiding it), providing functions to pull bits out from the array (probably using lazy val for each bit to pull out).
I would imagine having a wrapping class that takes an immutable byte array type in the constructor to prove the array contents is never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].
Part 2 of the question is if I used IndexedSeq[Byte] will the resultant code be any slower? I need the code to execute as fast as possible, so would stick with Array[Byte] if the compiler could do a better job with it.
I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!
PS: I am a Scala newbie.
Treating Array[T] as an IndexedSeq[T] could hardly be simpler:
Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte)) // explicitly calling
Unboxing will kill you long before an extra layer of indirection.
C:\>scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]]// Scala source: scalacmd5680604016099242427.scala
val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));
To avoid this, the Scala collections library should be specialized on the element type, in the same style as Tuple1 and Tuple2. I'm told this is planned, but it's a bit more involved than simply slapping @specialized everywhere, so I don't know how long it will take.
UPDATE
Yes, WrappedArray is mutable, although collection.IndexedSeq[Byte] doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.
The boxing comes from retrieving an element from the collection via this generic method:
trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
def apply(idx: Int): A
}
At the JVM level, this signature is type-erased to:
def apply(idx: Int): Object
If your collection contains primitives, that is, subtypes of AnyVal, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutil.
Annotation directed specialization was added to Scala 2.8 to instruct the compiler to generate various versions of a class or method tailored to the permutations of primitive types. This has been applied to a few places in the standard library already, e.g. TupleN, ProductN, Function{0, 1, 2}. If this was also applied to the collections hierarchy, this performance cost could be alleviated.
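Until then, one way to sidestep the boxing is the hand-written wrapper the question mentions. Below is a minimal sketch, assuming big-endian byte order and a defensive copy on construction. The class name ImmutableBytes and the method int32At are made up for illustration and are not part of any library:

```scala
// Keeps the array private and exposes only read operations with
// primitive return types, so no generic apply (and no boxing) is involved.
final class ImmutableBytes private (private val bytes: Array[Byte]) {
  def length: Int = bytes.length

  def apply(i: Int): Byte = bytes(i)

  // Reads 4 bytes starting at offset as a big-endian 32-bit integer.
  def int32At(offset: Int): Int =
    ((bytes(offset) & 0xff) << 24) |
    ((bytes(offset + 1) & 0xff) << 16) |
    ((bytes(offset + 2) & 0xff) << 8) |
    (bytes(offset + 3) & 0xff)
}

object ImmutableBytes {
  // Copies the source so later writes to it cannot leak into the wrapper.
  def apply(source: Array[Byte]): ImmutableBytes =
    new ImmutableBytes(source.clone())
}

val source = Array[Byte](0, 0, 1, 2)
val data = ImmutableBytes(source)
val value = data.int32At(0)   // 0x00000102
```

The copy costs O(N) once. If the caller can guarantee the source array is never touched again, the clone could be dropped at the price of that guarantee being unchecked.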
If you want to work with sequences in Scala, I recommend you choose one of these:
Immutable seqs:
(linked seqs) List, Stream, Queue
(indexed seqs) Vector
Mutable seqs:
(linked seq) ListBuffer
(indexed seq) ArrayBuffer
The new (2.8) Scala collections have been hard to grasp for me, primarily due to a shortage of (correct) documentation but also because of the source code (complex hierarchies). To clear my mind I made this pic to visualize the basic structure:
(source: programmera.net)
Also, note that Array is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).
