For comprehension over Option array - arrays

I am getting compilation error:
Error:(64, 9) type mismatch;
found : Array[(String, String)]
required: Option[?]
y <- x
^
in a fragment:
val z = Some(Array("a"->"b", "c" -> "d"))
val l = for(
x <- z;
y <- x
) yield y
Why doesn't the generator over the Array produce the items of the array? And where does the requirement for an Option come from?
Stranger still, if I replace "yield" with println(y), it compiles.
Scala version: 2.10.6

This is because of the way for expressions are translated into map, flatMap and foreach calls. Let's first simplify your example:
val someArray: Some[Array[Int]] = Some(Array(1, 2, 3))
val l = for {
array: Array[Int] <- someArray
number: Int <- array
} yield number
In accordance with the relevant part of the Scala language specification, this first gets translated into
someArray.flatMap {case array => for (number <- array) yield number}
which in turn gets translated into
someArray.flatMap {case array => array.map{case number => number}}
The problem is that someArray.flatMap expects a function from Array[Int] to Option[B] for some type B, whereas we've provided a function from Array[Int] to Array[Int].
The reason the compilation error goes away if yield number is replaced by println(number) is that for loops are translated differently from for comprehensions: it will now be translated as someArray.foreach{case array => array.foreach {case item => println(item)}}, which doesn't have the same typing issues.
A possible solution is to begin by converting the Option to the kind of collection you want to end up with, so that its flatMap method will have the right signature:
val l = for {
array: Array[Int] <- someArray.toArray
number: Int <- array
} yield number
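To make the translation concrete, here is a minimal sketch that puts the failing desugared call (as a comment) next to two variants that compile. The object name OptionArrayDemo and the method names are illustrative only and do not come from the question.

```scala
object OptionArrayDemo {
  // What the failing comprehension desugars to (does not compile):
  //   opt.flatMap(array => array.map(number => number))
  // Option.flatMap needs a function returning an Option, but array.map returns Array[Int].

  // Variant 1: convert the Option to an Array first, so the outer flatMap is Array's.
  def viaToArray(opt: Option[Array[Int]]): Array[Int] =
    for {
      array  <- opt.toArray
      number <- array
    } yield number

  // Variant 2: skip the comprehension and fall back to an empty array for None.
  def viaGetOrElse(opt: Option[Array[Int]]): Array[Int] =
    opt.getOrElse(Array.empty[Int])
}
```

Both variants return an empty array for None; which one reads better is largely a matter of taste.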

It's the usual "option must be converted to mix monads" thing.
scala> for (x <- Option.option2Iterable(Some(List(1,2,3))); y <- x) yield y
res0: Iterable[Int] = List(1, 2, 3)
Compare
scala> for (x <- Some(List(1,2,3)); y <- x) yield y
<console>:12: error: type mismatch;
found : List[Int]
required: Option[?]
for (x <- Some(List(1,2,3)); y <- x) yield y
^
to
scala> Some(List(1,2,3)) flatMap (is => is map (i => i))
<console>:12: error: type mismatch;
found : List[Int]
required: Option[?]
Some(List(1,2,3)) flatMap (is => is map (i => i))
^
or
scala> for (x <- Some(List(1,2,3)).toSeq; y <- x) yield y
res3: Seq[Int] = List(1, 2, 3)

Related

How do I take slice from an array position to end of the array?

How do I get an array of arrays with elements like this? Is there a built-in Scala API that can provide this value (without using combinations)?
e.g
val inp = Array(1,2,3,4)
Output
Vector(
Vector((1,2), (1,3), (1,4)),
Vector((2,3), (2,4)),
Vector((3,4))
)
My answer is below. I feel that there should be a more elegant answer than this in Scala.
val inp = Array(1,2,3,4)
val mp = (0 until inp.length - 1).map( x => {
(x + 1 until inp.length).map( y => {
(inp(x),inp(y))
})
})
print(mp)
+Edit
Added combination constraint.
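Using combinations(2) and groupBy() on the first element (index 0) of each combination will give you the values and grouping you want. Getting the result as nested Vectors of tuples will require some further conversion, for example with toVector. Note that the groups come out in map order, which need not match the input order: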
scala> inp.combinations(2).toList.groupBy(a => a(0)).values
res11: Iterable[List[Array[Int]]] = MapLike.DefaultValuesIterable
(
List(Array(2, 3), Array(2, 4)),
List(Array(1, 2), Array(1, 3), Array(1, 4)),
List(Array(3, 4))
)
ORIGINAL ANSWER
Note This answer is OK only if the elements in the Seq are unique and sorted (according to <). See edit for the more general case.
With
val v = a.toVector
and by foregoing combinations, I can choose tuples instead and not have to cast at the end
for (i <- v.init) yield { for (j <- v if i < j) yield (i, j) }
or
v.init.map(i => v.filter(i < _).map((i, _)))
Not sure if there's a performance hit for using init on vector
EDIT
For non-unique elements, we can use the indices
val v = a.toVector.zipWithIndex
for ((i, idx) <- v.init) yield { for ((j, jdx) <- v if idx < jdx) yield (i, j) }
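As another way to avoid both combinations and explicit indices, the same structure can be sketched with tails, which yields every suffix of the sequence. This is only a sketch; PairsDemo and laterPairs are hypothetical names. Because it works by position, it does not require the elements to be unique or sorted.

```scala
object PairsDemo {
  // For each element, pair it with every later element; one inner Vector per first element.
  def laterPairs[A](xs: Seq[A]): Vector[Vector[(A, A)]] =
    xs.toVector.tails
      // Keep only suffixes that still have at least one element after the head.
      .collect { case t if t.size > 1 => t.tail.map(other => (t.head, other)) }
      .toVector
}
```

For example, PairsDemo.laterPairs(Array(1, 2, 3, 4).toSeq) gives the three groups requested in the question.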

Merge two sorted Arrays, List vs Array

This function merges two sorted lists. It takes two lists as a parameter and returns one.
def merge(xs : List[Int], ys : List[Int]) : List[Int] = {
(xs, ys) match {
case (Nil, Nil) => Nil
case (Nil, ys) => ys
case (xs, Nil) => xs
case (x :: xs1, y :: ys1) =>
if(x < y) x :: merge(xs1, ys)
else y :: merge(xs, ys1)
}
}
I wanted to rewrite this function by changing the parameter type from List to Array, and it did not work. However, going from List to Seq it worked. Could you tell me what does not work with the Arrays?
def mergeDontWork(xs : Array[Int], ys : Array[Int]) : Array[Int] = {
(xs, ys) match {
case (Array.empty,Array.empty) => Array.empty
case (Array.empty, ys) => ys
case (xs, Array.empty) => xs
case (x +: xs1, y +: ys1) => if(x < y) x +: merge2(xs1, ys)
else y +: merge2(xs, ys1)
}
}
The error comes from this part of the code: if(x < y) x +: merge2(xs1, ys) : Array[Any] does not conform to the expected type Array[Int]
EDIT
I finally understood how to go from List to Array thanks to the solutions proposed by pedromss and Harald. I modified the function by making it tail recursive.
def mergeTailRecursion(xs : Array[Int], ys : Array[Int]) : Array[Int] ={
def recurse( acc:Array[Int],xs:Array[Int],ys:Array[Int]):Array[Int]={
(xs, ys) match {
case (Array(),Array()) => acc
case (Array(), ys) => acc++ys
case (xs, Array()) => acc++xs
case (a @ Array(x, _*), b @ Array(y, _*)) =>
if (x < y) recurse(acc:+x, a.tail, b)
else recurse( acc:+y, a, b.tail)
}
}
recurse(Array(),xs,ys)
}
You can't pattern match on Array.empty because it is a method. Use Array() instead.
(x +: xs1, y +: ys1) doesn't appear to be a valid match expression here. Change it to (xs1 @ Array(x, _*), ys1 @ Array(y, _*)).
Compiling version of your code:
object Arrays extends App {
def merge(xs: Array[Int], ys: Array[Int]): Array[Int] = {
(xs, ys) match {
case (Array(), Array()) => Array.empty
case (Array(), ys2) => ys2
case (xs2, Array()) => xs2
case (xs1 @ Array(x, _*), ys1 @ Array(y, _*)) =>
if (x < y) x +: merge(xs1.tail, ys)
else y +: merge(xs, ys1.tail)
}
}
merge(Array(1, 2, 3), Array(4, 5, 6)).foreach(println)
}
Refer to "Why can't I pattern match on Stream.empty in Scala?" for the explanation about pattern matching on methods.
And see "How do I pattern match arrays in Scala?" for the explanation of the _*. Basically it will match any number of arguments.
Lastly, about the xs1 @, from the documentation (https://www.scala-lang.org/files/archive/spec/2.11/08-pattern-matching.html):
Pattern Binders
Pattern2 ::= varid `@' Pattern3
A pattern binder x@p consists of a pattern variable x and a pattern p. The type of the variable x is the static type T of the pattern p. This pattern matches any value v matched by the pattern p, provided the run-time type of v is also an instance of T, and it binds the variable name to that value.
You could also do it with
case (Array(x, _*), Array(y, _*)) =>
if (x < y) x +: merge(xs.tail, ys)
else y +: merge(xs, ys.tail)
The unapply method of +: in the last pattern-matching case doesn't seem to resolve to the right types, Int and Array[Int].
You could try something like this:
case (Array(x, xs1 @ _*), Array(y, ys1 @ _*)) =>
if (x<y) x +: mergeDontWork(xs.tail, ys)
else y +: mergeDontWork(xs, ys.tail)
Unfortunately the construct xs1 @ _* results in xs1 being of type Seq[Int], so it's also not possible to pass it into the recursive call. I used xs.tail as a workaround.
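Since tail and +: each copy the array, the recursive Array versions above do a lot of copying for large inputs. As an alternative, here is a minimal sketch of an index-based merge that fills a preallocated result. It swaps pattern matching for a plain loop, so it is not a drop-in translation of the List version; ArrayMergeDemo is a hypothetical name.

```scala
object ArrayMergeDemo {
  // Merge two sorted arrays using index cursors instead of head/tail patterns.
  def merge(xs: Array[Int], ys: Array[Int]): Array[Int] = {
    val out = new Array[Int](xs.length + ys.length)
    var i = 0 // cursor into xs
    var j = 0 // cursor into ys
    var k = 0 // next write position in out
    while (i < xs.length && j < ys.length) {
      if (xs(i) < ys(j)) { out(k) = xs(i); i += 1 }
      else { out(k) = ys(j); j += 1 }
      k += 1
    }
    // At most one input still has elements left; copy the remainder.
    while (i < xs.length) { out(k) = xs(i); i += 1; k += 1 }
    while (j < ys.length) { out(k) = ys(j); j += 1; k += 1 }
    out
  }
}
```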

Spark sequences [int] comparison with [String] sequence output

I was trying to compare integer wrapped arrays in two different columns and give the ratings as string:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import scala.collection.mutable.WrappedArray
The DataFrame data has column A and B with wrapped array I would like to compare:
val data = Seq(
(Seq(1,2,3),Seq(4,5,6),Seq(7,8,9)),
(Seq(1,1,3),Seq(6,5,7),Seq(11,9,8))
).toDF("A","B","C")
Here is what it looks like:
data: org.apache.spark.sql.DataFrame = [A: array<int>, B: array<int> ... 1 more field]
+---------+---------+----------+
| A| B| C|
+---------+---------+----------+
|[1, 2, 3]|[4, 5, 6]| [7, 8, 9]|
|[1, 1, 3]|[6, 5, 7]|[11, 9, 8]|
+---------+---------+----------+
Here is the user-defined function with which I would like to compare the elements of the paired arrays in columns A and B, row by row, and assign ratings with simple logic. For example if A(1) > B(1) then D(1) is "Top". So as first row with column D, I hope to have ["Top", "Top", "Top"]
def myToChar(num1: Seq[Int], num2: Seq[Int]): Seq[String] = {
val twozipped = num1.zip(num2)
for ((x,y) <- num1.zip(num2)) {
if (x > y) "Top"
if (x < y) "Well"
if (x == y) "Good"
}}
val udfToChar = udf(myToChar(_: Seq[Int], _: Seq[Int]))
val ouput = data.withColumn("D",udfToChar($"A",$"B"))
However, I kept getting the <console>:45: error: type mismatch; error information. Not sure if my udf() type definition is wrong and appreciate any guidance to correct my mistake.
Your myToChar definition is declared to return a Seq[String] - but its implementation doesn't - it returns Unit, because a for expression (without a yield clause) has Unit type.
You can fix this by fixing the implementation of the function:
Replace the for with a map operation
Replace the last if with an else, otherwise the mapping function also returns Unit for inputs that adhere to none of the if conditions (unlike with pattern matching, the compiler can't conclude that your if conditions are exhaustive - it must assume there's also a possibility none of them would hold true)
So - a correct implementation would be:
def myToChar(num1: Seq[Int], num2: Seq[Int]): Seq[String] = {
num1.zip(num2).map { case (x, y) =>
if (x > y) "Top"
else if (x < y) "Well"
else "Good"
}
}
Or alternatively using pattern matching with guards:
def myToChar(num1: Seq[Int], num2: Seq[Int]): Seq[String] = {
num1.zip(num2).map {
case (x, y) if x > y => "Top"
case (x, y) if x < y => "Well"
case _ => "Good"
}
}
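The comparison logic can be checked in plain Scala before wrapping it in udf, with no Spark session involved. The sketch below assumes the three branches are meant to be mutually exclusive and therefore chains them with else if; RatingDemo is a hypothetical wrapper name.

```scala
object RatingDemo {
  // A single if / else if / else expression, so every pair maps to exactly one String.
  // A standalone `if (cond) "Top"` followed by other statements would be evaluated
  // and then discarded, because only the last expression of the block is the result.
  def myToChar(num1: Seq[Int], num2: Seq[Int]): Seq[String] =
    num1.zip(num2).map { case (x, y) =>
      if (x > y) "Top"
      else if (x < y) "Well"
      else "Good"
    }
}
```

With the sample data, the first row (A = [1, 2, 3], B = [4, 5, 6]) rates as "Well" three times, because every element of A is smaller than its counterpart in B.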

Element-wise sum of arrays in Scala

How do I compute element-wise sum of the Arrays?
val a = new Array[Int](5)
val b = new Array[Int](5)
// assign values
// desired output: Array -> [a(0)+b(0), a(1)+b(1), a(2)+b(2), a(3)+b(3), a(4)+b(4)]
a.zip(b).flatMap(_._1+_._2)
missing parameter type for expanded function
Try:
a.zip(b).map { case (x, y) => x + y }
When you use an underscore as a placeholder in a function definition, it can only appear once (for each function argument position, that is, but in this case flatMap takes a Function1, so there's only one). If you need to refer to an argument more than once, you can't use the placeholder syntax—you'll need to give the argument a name.
As the other answers point out, you can use .map { case (x, y) => x + y } or the tuple accessor version, but it's also worth noting that if you want to avoid a bunch of tuple allocations in an intermediate collection, you can write the following:
scala> (a, b).zipped.map(_ + _)
res5: Array[Int] = Array(0, 0, 0, 0, 0)
Here zipped is a method that's available on pairs of collections that has a special map that takes a Function2, which means the only tuple that gets created is the (a, b) pair. The extra efficiency probably doesn't matter much in most cases, but the fact that you can pass a Function2 instead of a function from pairs means the syntax is often a little nicer as well.
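For comparison, here is a small sketch of forms that name the argument instead of using the placeholder, plus an index-based form that builds no intermediate tuples. The sample values and the name ElementWiseSumDemo are made up for illustration, and the index-based form assumes both arrays have the same length.

```scala
object ElementWiseSumDemo {
  val a = Array(1, 2, 3)
  val b = Array(10, 20, 30)

  // Named tuple argument: the single parameter is used twice, so no placeholder.
  val viaAccessors: Array[Int] = a.zip(b).map(t => t._1 + t._2)

  // Pattern-matching form that destructures the pair.
  val viaCase: Array[Int] = a.zip(b).map { case (x, y) => x + y }

  // Index-based form with no intermediate tuples; assumes a.length == b.length.
  val viaTabulate: Array[Int] = Array.tabulate(a.length)(i => a(i) + b(i))
}
```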
// 1-D arrays
val x = Array(1, 2, 3, 40, 55)
val x1 = Array(1, 2, 3, 40, 55)
x.indices.map(i => x(i) + x1(i))
// 2-D case: arrays of pairs
val x1= Array((3,5), (5,7))
val x = Array((1,2), (3,4))
x.indices.map(i=>( x(i)._1 + x1(i)._1, x(i)._2 + x1(i)._2))

Scala logical indexing with for comprehension

I'm trying to translate the following Matlab logical-indexing pattern into Scala code:
% x is an [Nx1] array of Int32
% y is an [Nx1] array of Int32
% myExpensiveFunction() processes batches of unique x.
ux = unique(x);
z = nan(size(x));
for i = 1:length(ux)
idx = x == ux(i);
z(idx) = myExpensiveFunction(x(idx), y(idx));
end
Assume I'm working with val x: Array[Int] in Scala. What is the best way to do this?
Edit: To clarify, I'm looking to process batches of (x,y) at a time, grouped by unique x, and return a result (z) with an order corresponding to the initial input. I'm open to sorting x, but eventually need to get back to the original unsorted order. My primary requirement is to handle all the indexing/mapping/sorting in a clear and reasonably efficient way.
Most of this is pretty straightforward in Scala; the only thing that's a bit out of the ordinary is the unique x indices. In Scala you'd do that with a groupBy. Since this is a really index-heavy method, I'm just going to give in and go with indices all the way:
val z = Array.fill(x.length)(Double.NaN)
x.indices.groupBy(i => x(i)).foreach{ case (xi, is) =>
is.foreach(i => z(i) = myExpensiveFunction(xi, y(i)))
}
z
assuming you can live with a lack of vectors going to myExpensiveFunction. If not,
val z = Array.fill(x.length)(Double.NaN)
x.indices.groupBy(i => x(i)).foreach{ case (xi, is) =>
val xs = Array.fill(is.length)(xi)
val ys = is.map(i => y(i)).toArray
val zs = myExpensiveFunction(xs, ys)
is.indices.foreach(k => z(is(k)) = zs(k)) // zs is indexed by position within the group, not by original index
}
z
This isn't the most natural way to do the computation in Scala, or the most efficient, but you don't care about efficiency if your expensive function is expensive, and it's the closest I can come to a literal translation.
(Translating your Matlab algorithms into almost anything else involves a certain amount of pain or rethinking, since the "natural" computations in Matlab are not like those in most other languages.)
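A self-contained sketch of the batched variant follows. The parameter batchFn is a hypothetical stand-in for myExpensiveFunction, assumed here to take the x and y values of one group and return one Double per element of that group; the real function's signature is not given in the question.

```scala
object LogicalIndexingDemo {
  // batchFn is a hypothetical stand-in for myExpensiveFunction (assumed signature).
  def processBatches(
      x: Array[Int],
      y: Array[Int],
      batchFn: (Array[Int], Array[Int]) => Array[Double]
  ): Array[Double] = {
    val z = Array.fill(x.length)(Double.NaN)
    // Group the original indices by their x value, like idx = x == ux(i) in Matlab.
    x.indices.groupBy(i => x(i)).foreach { case (_, is) =>
      val xs = is.map(i => x(i)).toArray
      val ys = is.map(i => y(i)).toArray
      val zs = batchFn(xs, ys)
      // zs is indexed by position within the group, so map position k
      // back to the original index is(k).
      is.indices.foreach(k => z(is(k)) = zs(k))
    }
    z
  }
}
```

The result keeps the original ordering of x because each group's outputs are written back through the stored indices.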
The important point is to get Matlab's unique right. A simple solution would be to use a Set to determine the unique values:
val occurringValues = x.toSet
occurringValues.foreach{ value =>
val indices = x.indices.filter(i => x(i) == value)
for (i <- indices) {
z(i) = myExpensiveFunction(x(i), y(i))
}
}
Note: I assume that it is possible to change myExpensiveFunction to element-wise operation...
scala> def process(xs: Array[Int], ys: Array[Int], f: (Seq[Int], Seq[Int]) => Double): Array[Double] = {
| val ux = xs.distinct
| val zs = Array.fill(xs.size)(Double.NaN)
| for(x <- ux) {
| val idx = xs.indices.filter{ i => xs(i) == x }
| val res = f(idx.map(xs), idx.map(ys))
| idx foreach { i => zs(i) = res }
| }
| zs
| }
process: (xs: Array[Int], ys: Array[Int], f: (Seq[Int], Seq[Int]) => Double)Array[Double]
scala> val xs = Array(1,2,1,2,3)
xs: Array[Int] = Array(1, 2, 1, 2, 3)
scala> val ys = Array(1,2,3,4,5)
ys: Array[Int] = Array(1, 2, 3, 4, 5)
scala> val f = (a: Seq[Int], b: Seq[Int]) => a.sum/b.sum.toDouble
f: (Seq[Int], Seq[Int]) => Double = <function2>
scala> process(xs, ys, f)
res0: Array[Double] = Array(0.5, 0.6666666666666666, 0.5, 0.6666666666666666, 0.6)

Resources