How to sum up every column of a Scala array? - arrays

If I have an array of array (similar to a matrix) in Scala, what's the efficient way to sum up each column of the matrix? For example, if my array of array is like below:
val arr = Array(Array(1, 100, ...), Array(2, 200, ...), Array(3, 300, ...))
and I want to sum up each column (e.g., sum up the first element of all sub-arrays, sum up the second element of all sub-arrays, etc.) and get a new array like below:
newArr = Array(6, 600, ...)
How can I do this efficiently in Spark Scala?

There is a suitable .transpose method on List that can help here, although I can't say what its efficiency is like:
arr.toList.transpose.map(_.sum)
(then call .toArray if you specifically need the result as an array).

Using breeze Vector:
scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.map(breeze.linalg.Vector(_)).reduce(_ + _)
res0: breeze.linalg.Vector[Int] = DenseVector(6, 600)
If your input is sparse you may consider using breeze.linalg.SparseVector.

In practice a linear algebra vector library as mentioned by #zero323 will often be the better choice.
If you can't use a vector library, I suggest writing a function col2sum that can sum two columns -- even if they are not the same length -- and then use Array.reduce to extend this operation to N columns. Using reduce is valid because we know that sums are not dependent on order of operations (i.e. 1+2+3 == 3+2+1 == 3+1+2 == 6) :
def col2sum(x:Array[Int],y:Array[Int]):Array[Int] = {
x.zipAll(y,0,0).map(pair=>pair._1+pair._2)
}
def colsum(a:Array[Array[Int]]):Array[Int] = {
a.reduce(col2sum)
}
val z = Array(Array(1, 2, 3, 4, 5), Array(2, 4, 6, 8, 10), Array(1, 9));
colsum(z)
--> Array[Int] = Array(4, 15, 9, 12, 15)

scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300 ))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.flatten.zipWithIndex.groupBy(c => (c._2 + 1) % 2)
.map(a => a._1 -> a._2.foldLeft(0)((sum, i) => sum + i._1))
res40: scala.collection.immutable.Map[Int,Int] = Map(2 -> 600, 1 -> 6, 0 -> 15)
flatten array and zipWithIndex to get index and groupBy to map new array as column array, foldLeft to sum the column array.

Related

Scala Array Copy looks to be working differently

object Solution extends App {
val arr1 = Array(
Array(1,2,3),
Array(4,5,6)
)
var arr2 = Array.ofDim[Int](2,3)
Array.copy(arr1,0,arr2,0,arr1.length)
arr1(0)(1) = 23
println(arr1.map(_.mkString(",")).mkString("\n"))
println()
println(arr2.map(_.mkString(",")).mkString("\n"))
}
1,23,3
4,5,6
1,23,3
4,5,6
what is wrong, why is the 23 appearing in both arrays
Because Array in Scala, or if to be more precise in JVM, because of Scala interop with Java - is a mutable structure, and you performing shallow copy and not a deep copy. Meaning - you copying the upper structure (or top array in your case) and not entire structure recursively, like all downstream array.
Solution might look like:
val source = Array(
Array(1, 2, 3),
Array(4, 5, 6)
)
val target = Array.ofDim[Int](2, 3)
source.zipWithIndex.foreach { case (row, index) =>
Array.copy(source(index), 0, target(index), 0, source.length)
}
target(0)(1) = 23
println(source.map(_.mkString(",")).mkString("\n"))
println()
println(target.map(_.mkString(",")).mkString("\n"))
which will print out result:
1,2,3
4,5,6
1,23,0
4,5,0
Scatie example: https://scastie.scala-lang.org/lrrHyGqZRxKk7mZ6CbLoiA
UPDATE
As correctly stated #Luis Miguel Mejía Suárez in the comment section - zipWithIndex expensive operation. More optimal solution would be
(0 until source.length).foreach { index =>
Array.copy(source(index), 0, target(index), 0, source.length)
}
Array.copy uses System.arrayCopy which modifies both the arrays. In the doc:
Copy one array to another. Equivalent to Java's System.arraycopy(src, srcPos, dest, destPos, length), except that this also works for polymorphic and boxed arrays.
Note that the passed-in dest array will be modified by this call.
You can try a simple map with identity:
scala> val arr1 = Array(Array(1,2,3),Array(4,5,6))
arr1: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
scala> val arr3 = arr1.map(_.map(identity))
arr3: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
scala> arr1(0)(1) = 23
scala> arr1
res16: Array[Array[Int]] = Array(Array(1, 23, 3), Array(4, 5, 6))
scala> arr3
res17: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))

In Scala, how do you convert an Array with n elements into an Array of tuples?

How do you convert from an Array[Int] of length n, into an Array[(Int,Option[Int])] of length Math.ceil(n / 2.0).toInt?
scala> val a = Array(1, 2, 3, 4)
a: Array[Int] = Array(1, 2, 3, 4)
The resultant Array for the example above would be Array((1, Some(2)), (3, some(4))). If a were Array(1), then the desired result would be Array((1, None)).
Capice?
grouped is useful for breaking the array into pairs, and case is useful for dealing with the possibility of a leftover element:
def toPairs[T](xs: Array[T]): Array[(T, Option[T])] =
xs.grouped(2)
.map{
case Array(a, b) => (a, Some(b))
case Array(a) => (a, None)
}.toArray
scala> toPairs(Array(1, 2, 3, 4, 5))
res0: Array[(Int, Option[Int])] = Array((1,Some(2)), (3,Some(4)), (5,None))
Something similar to Seth's suggestion, just a little more concise.
scala> Array(1,2,3,4,5).grouped(2).map(x=> (x.head, x.tail.headOption)).toArray
res17: Array[(Int, Option[Int])] = Array((1,Some(2)), (3,Some(4)), (5,None))

How can I select a non-sequential subset elements from an array using Scala and Spark?

In Python, this is how I would do it.
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
This doesn't work in the Scala Spark shell:
scala> val indices = Array(3,2,0)
indices: Array[Int] = Array(3, 2, 0)
scala> val A = Array(10,11,12,13,14,15)
A: Array[Int] = Array(10, 11, 12, 13, 14, 15)
scala> A(indices)
<console>:28: error: type mismatch;
found : Array[Int]
required: Int
A(indices)
The foreach method doesn't work either:
scala> indices.foreach(println(_))
3
2
0
scala> indices.foreach(A(_))
<no output>
What I want is the result of B:
scala> val B = Array(A(3),A(2),A(0))
B: Array[Int] = Array(13, 12, 10)
However, I don't want to hard code it like that because I don't know how long indices is or what would be in it.
The most concise way I can think of is to flip your mental model and put indices first:
indices map A
And, I would potentially suggest using lift to return an Option
indices map A.lift
You can use map on indices, which maps each element to a new element based on a mapping lambda. Note that on Array, you get an element at an index with the apply method:
indices.map(index => A.apply(index))
You can leave off apply:
indices.map(index => A(index))
You can also use the underscore syntax:
indices.map(A(_))
When you're in a situation like this, you can even leave off the underscore:
indices.map(A)
And you can use the alternate space syntax:
indices map A
You were trying to use foreach, which returns Unit, and is only used for side effects. For example:
indices.foreach(index => println(A(index)))
indices.map(A).foreach(println)
indices map A foreach println

Faster way to make a zeroed array in Scala

I create zeroed Arrays in Scala with
(0 until Nrows).map (_ => 0).toArray but is there anything faster ? map is slow.
I have the same question but with 1 instead of O, i.e. I also want to accelerate (0 until Nrows).map (_ => 1).toArray
Zero is the default value for an array of Ints, so just do this:
val array = new Array[Int](NRows)
If you want all those values to be 1s then use .fill() (with thanks to #gourlaysama):
val array = Array.fill(NRows)(1)
However, looking at how this works internally, it involves the creation of a few objects that you don't need. I suspect the following (uglier) approach may be quicker if speed is your main concern:
val array = new Array[Int](NRows)
for (i <- 0 until array.length) { array(i) = 1 }
For multidimensional arrays consider Array.ofDim, for instance,
scala> val a = Array.ofDim[Int](3,3)
a: Array[Array[Int]] = Array(Array(0, 0, 0), Array(0, 0, 0), Array(0, 0, 0))
Likewise,
scala> val a = Array.ofDim[Int](3)
a: Array[Int] = Array(0, 0, 0)
In the context here,
val a = Array.ofDim[Int](NRows)
For setting (possibly nonzero) initial values, consider Array.tabulate, for instance,
scala> Array.tabulate(3,3)( (x,y) => 1)
res5: Array[Array[Int]] = Array(Array(1, 1, 1), Array(1, 1, 1), Array(1, 1, 1))
scala> Array.tabulate(3)( x => 1)
res18: Array[Int] = Array(1, 1, 1)

Removing arrays which are subsets of other arrays in Scala

I have an array of array of integers. Like:
val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
I want to remove the arrays that are subsets of another array. So, the result should be:
Array(Array(1, 2, 3), Array(4, 5, 6))
Ideally, these should be Sets, but in the context of my program, they are arrays, and I don't want to convert them to sets due to performance reasons.
I solved it this way in Scala, but I would like to know if there is a more elegant (and/or more efficient) way to do this:
def removeSubsets[T: ClassManifest](clusters: Array[Array[T]]) = {
val sortedClusters = clusters.sortBy(-1 * _.length)
sortedClusters.foldLeft(Array[Array[T]]()){ (acc, ele) =>
val isASubset = acc.exists(arr => (ele diff arr).length == 0)
if (isASubset) acc else acc :+ ele
}
}

Resources