Scala - Efficient element wise sum of two arrays - arrays

I have two arrays which I would like to reduce to one array in which at each index you have the sum of the two elements in the original arrays. For example:
val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2)
// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)
I've seen this post: Element-wise sum of arrays in Scala, and I currently use this approach (zip/map). However, using this for a big data application I am concerned about its performance. Using this approach one has to traverse the array(s) at least twice. Is there a better approach in terms of efficiency?

The most efficient way might well be to do it lazily.
As with anything collection-oriented, Scala 2.12 and 2.13 are going to differ. This code is for Scala 2.13; 2.12 will be similar (it might need to extend IndexedSeqLike, but I don't know for sure).
import scala.collection.IndexedSeq
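import scala.math.Numeric
import scala.math.Numeric.Implicits._ // brings the + operator into scope for any T with a Numeric instance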
case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
override val length: Int = seq1.length.min(seq2.length)
override def apply(i: Int) =
if (i >= length) throw new IndexOutOfBoundsException
else seq1(i) + seq2(i)
}
Arrays are implicitly convertible to a subtype of collection.IndexedSeq. This will compute the sum of the corresponding elements on every access (which may be generally desirable as it's possible to use a mutable IndexedSeq).
If you need an Array, you can get one with only a single traversal via
val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray
but SumIndexedSeq can be used anywhere a Seq can be used without a traversal.
As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same elements together twice. It can also be generalized, if you so care, to any binary operations on T (in which case the Numeric constraint can be removed).
As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element to put in the cache, so you might need to be accessing the same elements many times in order for the cache to be a win (and a sufficiently large cache may have implications for the stability of a distributed system).
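The generalization to any binary operation mentioned above could look roughly like the sketch below. It assumes Scala 2.13; the class name ZipWithIndexedSeq and its parameter names are made up for illustration and are not part of any library. It recomputes f on every access and has no cache.

```scala
import scala.collection.IndexedSeq

// Hypothetical generalisation of SumIndexedSeq: combines two indexed
// sequences element-wise with an arbitrary binary function, lazily.
final class ZipWithIndexedSeq[A, B, C](left: IndexedSeq[A], right: IndexedSeq[B])(f: (A, B) => C)
    extends IndexedSeq[C] {

  // The view is only as long as the shorter input.
  override def length: Int = math.min(left.length, right.length)

  // Nothing is stored: every access recomputes f on the underlying elements.
  override def apply(i: Int): C =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else f(left(i), right(i))
}

val arr1 = Array(1, 1, 3, 3, 5)
val arr2 = Array(2, 1, 2, 2, 1)

// The arrays are wrapped implicitly; toArray performs the single traversal.
val summed: Array[Int] = new ZipWithIndexedSeq[Int, Int, Int](arr1, arr2)(_ + _).toArray
// summed = Array(3, 2, 5, 5, 6)
```

Since no Numeric instance is involved, the same class works for subtraction, max, or combining elements of different types.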

Well, first of all, as with everything performance-related, the only real answer is to benchmark.
Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?
Third, you can do something like the following, or use a while loop, which would amount to the same thing.
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
arr1(i) + arr2(i)
}
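For comparison, the while-loop variant mentioned above might look like this. It is only a sketch: sumArrays is an illustrative name, and it is specialised to Int so the loop works on unboxed primitives.

```scala
// Element-wise sum with a single pass and a single allocation.
def sumArrays(a: Array[Int], b: Array[Int]): Array[Int] = {
  val n = math.min(a.length, b.length) // stop at the shorter input
  val out = new Array[Int](n)
  var i = 0
  while (i < n) {
    out(i) = a(i) + b(i) // primitive ints, no boxing
    i += 1
  }
  out
}

val arr3 = sumArrays(Array(1, 1, 3, 3, 5), Array(2, 1, 2, 2, 1))
// arr3 = Array(3, 2, 5, 5, 6)
```

Whether this beats tabulate or zip/map for your data is something only a benchmark can tell.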

Related

Tile a small Array in a large Array multiple times in scala

I want to tile a small array multiple times in a large array. I'm looking for an "official" way of doing this. A naive solution follows:
val arr = Array[Int](1, 2, 3)
val array = {
val arrBuf = ArrayBuffer[Int]()
for (_ <- 1 until 10) {
arrBuf ++= arr
}
arrBuf.toArray
}
If you do not know why Arrays are good for performance (meaning you do not really need raw performance in this case) I would recommend you do not use them, and rather stick with List or Vector instead.
Arrays are not proper Scala collections, they are just plain JVM arrays. Meaning, they are mutable, very efficient (especially for unboxed primitives), fixed in memory size, and very restricted. They behave like normal scala collections because of implicit conversions and extension methods. But, due to their mutability and invariance, you really should avoid them unless you have good reasons for using them.
The solution proposed by Andronicus is not ideal for arrays (though it would be a very good solution for any real collection): because arrays have a fixed size in memory, the flattening will end up in constant reallocations and memory copying under the hood.
Anyway, here is a slight variation on that solution, using lists instead, which is a little more efficient.
implicit class ListOps[A](private val list: List[A]) extends AnyVal {
def times[B >: A](n: Int): List[B] =
Iterator.fill(n)(list).flatten.toList
}
List(1, 2, 3).times(3)
// res: List[Int] = List(1, 2, 3, 1, 2, 3, 1, 2, 3)
And here is also an efficient version using the new ArraySeq introduced in 2.13; which is an immutable Array.
(Note, you can do this using plain Arrays too)
implicit class ArraySeqOps[A](private val arr: ArraySeq[A]) extends AnyVal {
def times[B >: A](n: Int): ArraySeq[B] =
ArraySeq.tabulate(n * arr.length) { i => arr(i % arr.length) }
}
ArraySeq(1, 2, 3).times(3)
// res: ArraySeq[Int] = ArraySeq(1, 2, 3, 1, 2, 3, 1, 2, 3)
You can use Array.fill:
Array.fill(10)(Array(1, 2, 3)).flatten
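If plain arrays really are required, the reallocation concern raised above can be sidestepped by allocating the result once and copying the small array into it slot by slot. A minimal sketch, assuming Int elements; tile is a hypothetical helper name:

```scala
// Hypothetical helper: repeats arr n times into a result that is allocated once.
def tile(arr: Array[Int], n: Int): Array[Int] = {
  val out = new Array[Int](arr.length * n)
  var k = 0
  while (k < n) {
    // Copy the whole small array into the k-th slot of the result.
    System.arraycopy(arr, 0, out, k * arr.length, arr.length)
    k += 1
  }
  out
}

val tiled = tile(Array(1, 2, 3), 3)
// tiled = Array(1, 2, 3, 1, 2, 3, 1, 2, 3)
```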

Adding value to arrays in scala

I faced a problem where I needed to add a new value in the middle of an Array (i.e. make a copy of the original array and replace that with the new one). I successfully solved my problem, but I was wondering whether there were other methods to do this without changing the array to buffer for a while.
val original = Array(0, 1, 3, 4)
val parts = original.splitAt(2)
val modified = parts._1 ++ (2 +: parts._2)
res0: Array[Int] = Array(0, 1, 2, 3, 4)
What I don't like about my solution is the parts variable; I'd prefer not to use an intermediate step like that. Is this the easiest way to add the value, or is there a better way to add an element?
This is precisely what patch does:
val original = Array(0, 1, 3, 4)
original.patch(2, Array(2), 0) // Array[Int] = Array(0, 1, 2, 3, 4)
You can use a mutable version of a collection to do this. The insert method does what you want (it inserts an element at a given index).
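A sketch of that suggestion using ArrayBuffer, one of the standard mutable collections. The helper name insertAt is made up for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative helper: copy into a buffer, insert, and copy back out to an array.
def insertAt(original: Array[Int], index: Int, value: Int): Array[Int] = {
  val buf = ArrayBuffer.empty[Int]
  buf ++= original
  buf.insert(index, value) // shifts the elements from index onwards one slot to the right
  buf.toArray
}

val modified = insertAt(Array(0, 1, 3, 4), 2, 2)
// modified = Array(0, 1, 2, 3, 4)
```

For a single insertion this copies the data twice, so it mainly pays off when several edits are applied to the same buffer before converting back.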
Well, if indeed the extra variable is what's troubling you, you can do it in one go:
val modified = original.take(2) ++ (2 +: original.drop(2))
But using a mutable collection like Augusto suggested might fit better, depending on your use case (e.g. performance, array size, multiple such edits...).
The question is, what's the context? If you are doing this in a loop, allocating a new array every time will kill your performance anyway, and you should rethink your approach (e.g. collect all the elements you want to insert before inserting them).
If you aren't, well, you can use System.arraycopy to avoid any intermediate conversions:
val original = Array(0, 1, 3, 4)
val index = 2
val valueToInsert = 2
val modified = Array.ofDim[Int](original.length + 1)
System.arraycopy(original, 0, modified, 0, index)
modified(index) = valueToInsert
System.arraycopy(original, index, modified, index + 1, original.length - index)
But note how easy it is to make an off-by-one error here (I think there isn't one, but I haven't tested it). So the only reason to do it is if you really need high performance, and that's only likely if it happens in a loop, in which case go back to the second sentence.

Scala: remove first column (first element in each row)

Given a var x: Array[Seq[Any]], what would be the most efficient (fast, in-place if possible?) way to remove the first element from each row?
I've tried the following but it didn't work - probably because of immutability...
for (row <- x.indices)
x(row).drop(1)
First off, Arrays are mutable, and I'm guessing you're intending to change the elements of x, rather than replacing the entire array with a new array object, so it makes more sense to use val instead of var:
val x: Array[Seq[Any]]
Since you said your Seq objects are immutable, you need to make sure you are assigning the new values back into your array. This will work:
for (row <- x.indices)
x(row) = x(row).drop(1)
This can be written in nicer ways. For example, you can use transform to map all the values of your array with a function:
x.transform(_.drop(1))
This updates in-place, unlike map, which will leave the old array unmodified and return a new array.
EDIT: I started to speculate on which method would be faster or more efficient, but the more I think about it, the more I realize I'm not sure. Both should have acceptable performance for most use cases.
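The difference between map and transform can be seen in a small sketch. The sample data is made up, and note that in Scala 2.13 transform still works on arrays but is deprecated in favour of mapInPlace:

```scala
val rows: Array[Seq[Any]] = Array(Seq(1, "a", 2.0), Seq(3, "b", 4.0))

// map builds a new array and leaves rows untouched.
val mapped: Array[Seq[Any]] = rows.map(_.drop(1))
val lengthAfterMap = rows(0).length // still 3

// transform overwrites each slot of rows with the result of the function.
rows.transform(_.drop(1))
val lengthAfterTransform = rows(0).length // now 2
```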
This would work:
scala> val x = Array(List(1,2,3),List(1,2,3))
x: Array[List[Int]] = Array(List(1, 2, 3), List(1, 2, 3))
scala> x map(_.drop(1))
res0: Array[List[Int]] = Array(List(2, 3), List(2, 3))

Efficient way to convert Scala Array to Unique Sorted List

Can anybody optimize the following statement in Scala:
// maybe large
val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6)
// output a sorted list which contains the unique elements from the array, without 0
val newList=(someArray filter (_>0)).toList.distinct.sort((e1, e2) => (e1 > e2))
Since the performance is critical, is there a better way?
Thank you.
This simple line is one of the fastest solutions so far:
someArray.toList.filter (_ > 0).sortWith (_ > _).distinct
but the clear winner so far, according to my measurements, is Jed Wesley-Smith. Maybe it will look different once Rex's code is fixed.
Typical disclaimer 1 + 2:
I modified the code samples to accept an Array and return a List.
Typical benchmark considerations:
This was random data, equally distributed. For 1 Million elements, I created an Array of 1 Million ints between 0 and 1 Million. So with more or less zeros, and more or less duplicates, it might vary.
It might depend on the machine, etc. I used a single core CPU, Intel-Linux-32bit, jdk-1.6, scala 2.9.0.1
Here is the underlying benchcoat-code and the concrete code to produce the graph (gnuplot). Y-axis: time in seconds. X-axis: 100 000 to 1 000 000 elements in Array.
update:
After finding the problem with Rex's code, his code is as fast as Jed's code, but the last operation is a conversion of his Array to a List (to fulfill my benchmark interface). Using a var result = List[Int]() and result = someArray(i) :: result speeds his code up, so that it is about twice as fast as Jed's code.
Another, maybe interesting, finding: if I rearrange the filter/sort/distinct (fsd) steps of my code into any other order (dsf, dfs, fsd, ...), the 6 possibilities don't differ significantly.
I haven't measured, but I'm with Duncan, sort in place then use something like:
util.Sorting.quickSort(array)
array.foldRight(List.empty[Int]){
case (a, b) =>
if (!b.isEmpty && b(0) == a)
b
else
a :: b
}
In theory this should be pretty efficient.
Without benchmarking I can't be sure, but I imagine the following is pretty efficient:
val list = collection.SortedSet(someArray.filter(_>0) :_*).toList
Also try adding .par after someArray in your version. It's not guaranteed to be quicker, but it might be. You should run a benchmark and experiment.
sort is deprecated. Use .sortWith(_ > _) instead.
Boxing primitives is going to give you a 10-30x performance penalty. Therefore if you really are performance limited, you're going to want to work off of raw primitive arrays:
def arrayDistinctInts(someArray: Array[Int]) = {
java.util.Arrays.sort(someArray)
var overzero = 0
var ndiff = 0
var last = 0
var i = 0
while (i < someArray.length) {
if (someArray(i)<=0) overzero = i+1
else if (someArray(i)>last) {
last = someArray(i)
ndiff += 1
}
i += 1
}
val result = new Array[Int](ndiff)
var j = 0
i = overzero
last = 0
while (i < someArray.length) {
if (someArray(i) > last) {
result(j) = someArray(i)
last = someArray(i)
j += 1
}
i += 1
}
result
}
You can get slightly better than this if you're careful (and be warned, I typed this off the top of my head; I might have typoed something, but this is the style to use), but if you find the existing version too slow, this should be at least 5x faster and possibly a lot more.
Edit (in addition to fixing up the previous code so it actually works):
If you insist on ending with a list, then you can build the list as you go. You could do this recursively, but I don't think in this case it's any clearer than the iterative version, so:
def listDistinctInts(someArray: Array[Int]): List[Int] = {
java.util.Arrays.sort(someArray) // sort first, so that the last element is the maximum
if (someArray.length == 0 || someArray(someArray.length-1) <= 0) List[Int]()
else {
var last = someArray(someArray.length-1)
var list = last :: Nil
var i = someArray.length-2
while (i >= 0) {
if (someArray(i) < last) {
last = someArray(i)
if (last <= 0) return list;
list = last :: list
}
i -= 1
}
list
}
}
Also, if you may not destroy the original array by sorting, you are by far best off if you duplicate the array and destroy the copy (array copies of primitives are really fast).
And keep in mind that there are special-case solutions that are far faster yet depending on the nature of the data. For example, if you know that you have a long array, but the numbers will be in a small range (e.g. -100 to 100), then you can use a bitset to track which ones you've encountered.
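The bitset idea can be sketched as follows for the problem in the question (distinct values greater than zero, here returned in ascending order). It assumes the positive values are small enough that a bitset indexed by value is affordable; distinctPositiveSorted is an illustrative name. A range that includes negative numbers would additionally need an offset, which is not shown.

```scala
import scala.collection.mutable

// Marks each positive value in a bitset, then reads the set bits back in
// ascending order. Duplicates collapse because a bit can only be set once.
def distinctPositiveSorted(values: Array[Int]): List[Int] = {
  val seen = mutable.BitSet.empty
  var i = 0
  while (i < values.length) {
    if (values(i) > 0) seen += values(i) // zeros and negatives are skipped
    i += 1
  }
  seen.toList // a BitSet iterates its elements in increasing order
}

val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6)
val result = distinctPositiveSorted(someArray)
// result = List(1, 2, 4, 5, 6, 9)
```

No sort is needed and the input array is left untouched, but memory use grows with the largest value rather than with the number of elements.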
For efficiency, depending on your value of large:
val a = someArray.toSet.filter(_>0).toArray
java.util.Arrays.sort(a) // quicksort, mutable data structures bad :-)
res15: Array[Int] = Array(1, 2, 4, 5, 6, 9)
Note that this does the sort using qsort on an unboxed array.
I'm not in a position to measure, but some more suggestions...
Sorting the array in place before converting to a list might well be more efficient, and you might look at removing dups from the sorted list manually, as they will be grouped together. The cost of removing 0's before or after the sort will also depend on their ratio to the other entries.
How about adding everything to a sorted set?
val a = scala.collection.immutable.SortedSet(someArray filter (0 !=): _*)
Of course, you should benchmark the code to check what is faster, and, more importantly, that this is truly a hot spot.

ArrayStack Remove method?

Is there a method that does the same as Java's remove in the ArrayStack class?
Or is it possible to write one in Scala?
All Scala's collection types support adding/removing elements at either the start or the end (with varying performance trade-offs), and are only limited in size by certain properties of the JVM - such as the maximum size of pointers.
So if this is the only reason that you're using a Stack, then you've chosen the wrong collection type. Given the requirement to be able to remove elements from the middle of the collection, something like a Vector would be a much better fit.
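As a rough illustration of the Vector suggestion, removing the element at a given index can be written with patch. The helper name removeAt is hypothetical:

```scala
// Illustrative helper: returns a copy of v without the element at index n.
def removeAt[A](v: Vector[A], n: Int): Vector[A] =
  v.patch(n, Nil, 1) // replace one element starting at n with nothing

val stackLike = Vector(3, 2, 1)
val withoutMiddle = removeAt(stackLike, 1)
// withoutMiddle = Vector(3, 1)
```

The original vector is unchanged, since Vector is immutable.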
You can use filterNot(_ == o) to create another stack with any instances of o missing (at least in 2.9), and you can use stack.slice(0,n) ++ stack.slice(n+1,stack.length) to create a new stack with an indexed element missing.
But, no, there isn't an exact analog, probably because removing an item at a random position in an array is a low-performance thing to do.
Edit: slice seems buggy to me, actually, in 2.9.0.RC2 (I have filed a bug report with code to fix it, so this will be fixed for 2.9.0.final, presumably). And in 2.8.1, you have to create a new ArrayStack by hand. So I guess the answer for now is a pretty emphatic "no".
Edit: slice has been fixed, so as of 2.9.0.RC4 and later, the slice approach should work.
Maybe this could fit your needs:
scala> import collection.mutable.Stack
import collection.mutable.Stack
scala> val s = new Stack[Int]
s: scala.collection.mutable.Stack[Int] = Stack()
scala> s push 1
res0: s.type = Stack(1)
scala> s push 2
res1: s.type = Stack(2, 1)
scala> s push 3
res2: s.type = Stack(3, 2, 1)
scala> s pop
res3: Int = 3
scala> s pop
res4: Int = 2
scala> s pop
res5: Int = 1
There is also an immutable version of the Stack class.
