Multidimensional Array zip array in scala - arrays

I have two array like:
val one = Array(1, 2, 3, 4)
val two = Array(4, 5, 6, 7)
var three = one zip two map{case(a, b) => a * b}
It's ok.
But I have a multidimensional Array and a one-dimensional array now, like this:
val mulOne = Array(Array(1, 2, 3, 4),Array(5, 6, 7, 8),Array(9, 10, 11, 12))
val one_arr = Array(1, 2, 3, 4)
I would like to multiplication them, how can I do this in scala?

You could use:
val tmpArr = mulOne.map(_ zip one_arr).map(_.map{case(a,b) => a*b})
This would give you Array(Array(1*1, 2*2, 3*3, 4*4), Array(5*1, 6*2, 7*3, 8*4), Array(9*1, 10*2, 11*3, 12*4)).
Here mulOne.map(_ zip one_arr) maps each internal array of mulOne with corresponding element of one_arr to create pairs like: Array(Array((1,1), (2,2), (3,3), (4,4)), ..) (Note: I have used placeholder syntax). In the next step .map(_.map{case(a,b) => a*b}) multiplies each elements of pair to give
output like: Array(Array(1, 4, 9, 16),..)
Then you could use:
tmpArr.map(_.reduce(_ + _))
to get sum of all internal Arrays to get Array(30, 70, 110)

Try this
mulOne.map{x => (x, one_arr)}.map { case(one, two) => one zip two map{case(a, b) => a * b} }
mulOne.map{x => (x, one_arr)} => for every array inside mulOne, create a tuple with content of one_arr.
.map { case(one, two) => one zip two map{case(a, b) => a * b} } is basically the operation that you performed in your first example on every tuple that were created in the first step.

Using a for comprehension like this,
val prods =
for { xs <- mulOne
zx = one_arr zip xs
(a,b) <- zx
} yield a*b
and so
prods.sum
delivers the final result.

Related

map and reduce an array

I want to map an array in every 3 elements, output is many [k,v]pairs, for example:
input: array(1,2,3,4,5,6,7,8,9,7,12,11)
output: (1 => 2,3) (4 => 5,6)(7 => 8,9) (7 => 12,11)
And I also want to reduce these pairs by keys, for example ,if I want to collect the data with key=7, then the output will be (7=> 8,9,12,11).
Thanks a lot.
I think what you need is following
val input = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)
val output = input.toSeq.grouped(3)
.map(g => (g.head, g.tail)).toList
.groupBy(_._1)
.mapValues(l => l.flatMap(_._2))
Result would be
Map(4 -> List(5, 6), 7 -> List(8, 9, 12, 11), 1 -> List(2, 3))
Try this:
res0 = list.grouped(3).map {x => (x(0), List(x(1),x(2)))}.toList
// you must dump your converted data format into your storage eg hdfs.
// And not the entire thing in the form of array. Transform in form of
// (key,value) and dump in hdfs. That will save a lot of computation.
res1 = sc.parallelize(res0)
res2 = res1.reduceByKey(_++_).collect
But I am not sure how much scalable this solution would be.
EDIT
val res1 = sc.parallelize(arr)
// (1,2,3,4,5,6,7,8,9,7,12,11)
val res2 = res1.zipWithIndex.map(x._2/3,List(x._1))
// (1,0),(2,1),...(12,10),(11,11) -> (0,1),(0,2),(0,3),(1,4),(1,5),(1,6)
val res3 = res2.reduceByKey(_++_).map(_._2)
//(0,List(1,2,3)),(1,List(4,5,6)) -> List(1,2,3),List(4,5,6)
val res4 = res3.map(x => x match {
case x1::xs => (x1,xs)
}).reduceByKey(_++_)
//List(1,2,3) - > (1,List(2,3)) -> reduceByKey
//(1,List(2,3)),(4,List(5,6)),(7,List(8,9,12,11))

Get the maximum N elements (along with their indices) of an Array

I've got an array that contains Integers as the one shown below:
val my_array = Array(10, 20, 6, 31, 0, 2, -2)
I need to get the maximum 3 elements of this array along with their corresponding indices (either using a single function or two separate funcs).
For example, the output might be something like:
// max values
Array(31, 20, 10)
// max indices
Array(3, 1, 0)
Although the operations look simple, I was not able to find any relevant functions around.
Here's a straightforward way - zipWithIndex followed by sorting:
val (values, indices) = my_array
.zipWithIndex // add indices
.sortBy(t => -t._1) // sort by values (descending)
.take(3) // take first 3
.unzip // "unzip" the array-of-tuples into tuple-of-arrays
Here's another way to do it:
(my_array zip Stream.from(0)).
sortWith(_._1 > _._1).
take(3)
res1: Array[(Int, Int)] = Array((31,3), (20,1), (10,0))

How to sum up every column of a Scala array?

If I have an array of array (similar to a matrix) in Scala, what's the efficient way to sum up each column of the matrix? For example, if my array of array is like below:
val arr = Array(Array(1, 100, ...), Array(2, 200, ...), Array(3, 300, ...))
and I want to sum up each column (e.g., sum up the first element of all sub-arrays, sum up the second element of all sub-arrays, etc.) and get a new array like below:
newArr = Array(6, 600, ...)
How can I do this efficiently in Spark Scala?
There is a suitable .transpose method on List that can help here, although I can't say what its efficiency is like:
arr.toList.transpose.map(_.sum)
(then call .toArray if you specifically need the result as an array).
Using breeze Vector:
scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.map(breeze.linalg.Vector(_)).reduce(_ + _)
res0: breeze.linalg.Vector[Int] = DenseVector(6, 600)
If your input is sparse you may consider using breeze.linalg.SparseVector.
In practice a linear algebra vector library as mentioned by #zero323 will often be the better choice.
If you can't use a vector library, I suggest writing a function col2sum that can sum two columns -- even if they are not the same length -- and then use Array.reduce to extend this operation to N columns. Using reduce is valid because we know that sums are not dependent on order of operations (i.e. 1+2+3 == 3+2+1 == 3+1+2 == 6) :
def col2sum(x:Array[Int],y:Array[Int]):Array[Int] = {
x.zipAll(y,0,0).map(pair=>pair._1+pair._2)
}
def colsum(a:Array[Array[Int]]):Array[Int] = {
a.reduce(col2sum)
}
val z = Array(Array(1, 2, 3, 4, 5), Array(2, 4, 6, 8, 10), Array(1, 9));
colsum(z)
--> Array[Int] = Array(4, 15, 9, 12, 15)
scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300 ))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.flatten.zipWithIndex.groupBy(c => (c._2 + 1) % 2)
.map(a => a._1 -> a._2.foldLeft(0)((sum, i) => sum + i._1))
res40: scala.collection.immutable.Map[Int,Int] = Map(2 -> 600, 1 -> 6, 0 -> 15)
flatten array and zipWithIndex to get index and groupBy to map new array as column array, foldLeft to sum the column array.

How can I select a non-sequential subset elements from an array using Scala and Spark?

In Python, this is how I would do it.
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
This doesn't work in the Scala Spark shell:
scala> val indices = Array(3,2,0)
indices: Array[Int] = Array(3, 2, 0)
scala> val A = Array(10,11,12,13,14,15)
A: Array[Int] = Array(10, 11, 12, 13, 14, 15)
scala> A(indices)
<console>:28: error: type mismatch;
found : Array[Int]
required: Int
A(indices)
The foreach method doesn't work either:
scala> indices.foreach(println(_))
3
2
0
scala> indices.foreach(A(_))
<no output>
What I want is the result of B:
scala> val B = Array(A(3),A(2),A(0))
B: Array[Int] = Array(13, 12, 10)
However, I don't want to hard code it like that because I don't know how long indices is or what would be in it.
The most concise way I can think of is to flip your mental model and put indices first:
indices map A
And, I would potentially suggest using lift to return an Option
indices map A.lift
You can use map on indices, which maps each element to a new element based on a mapping lambda. Note that on Array, you get an element at an index with the apply method:
indices.map(index => A.apply(index))
You can leave off apply:
indices.map(index => A(index))
You can also use the underscore syntax:
indices.map(A(_))
When you're in a situation like this, you can even leave off the underscore:
indices.map(A)
And you can use the alternate space syntax:
indices map A
You were trying to use foreach, which returns Unit, and is only used for side effects. For example:
indices.foreach(index => println(A(index)))
indices.map(A).foreach(println)
indices map A foreach println

Removing arrays which are subsets of other arrays in Scala

I have an array of array of integers. Like:
val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
I want to remove the arrays that are subsets of another array. So, the result should be:
Array(Array(1, 2, 3), Array(4, 5, 6))
Ideally, these should be Sets, but in the context of my program, they are arrays, and I don't want to convert them to sets due to performance reasons.
I solved it this way in Scala, but I would like to know if there is a more elegant (and/or more efficient) way to do this:
def removeSubsets[T: ClassManifest](clusters: Array[Array[T]]) = {
val sortedClusters = clusters.sortBy(-1 * _.length)
sortedClusters.foldLeft(Array[Array[T]]()){ (acc, ele) =>
val isASubset = acc.exists(arr => (ele diff arr).length == 0)
if (isASubset) acc else acc :+ ele
}
}

Resources