How to read a file and split it in Scala - arrays

Suppose I have a file (CSV, TXT, ...).
I want to get two arrays, such as
Array(Array(1.0,2.0),Array(4.0,5.0),Array(7.0, 8.0),Array(10.0,11.0),Array(13.0,14.0))
and
Array(3.0, 6.0, 9.0, 12.0, 15.0)
What's the ideal way to do this in Scala?

val rdd = sc.textFile("1.csv").map(_.split(',').map(_.trim().toDouble))
rdd.map(_.take(2)).collect()
res0: Array[Array[Double]] = Array(Array(1.0, 2.0), Array(4.0, 5.0), Array(7.0, 8.0), Array(10.0, 11.0), Array(13.0, 14.0))
rdd.map(_(2)).collect()
res2: Array[Double] = Array(3.0, 6.0, 9.0, 12.0, 15.0)

You can get both arrays in one go, so that you don't need to traverse the data twice:
val (first, second) = {
io.Source.fromFile(name).getLines
.map(_.split(",").map(_.toDouble))
.foldRight(Seq.empty[Array[Double]] -> Seq.empty[Double]) {
case (Array(x, y, z), (as, bs)) => (Array(x, y) +: as, z +: bs)
}
}
Now, you end up with two lists rather than arrays. If that matters to you, first.toArray and second.toArray will do the conversion for you.
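For reference, a minimal self-contained sketch of the same single-pass idea, working on any iterator of lines so it can be tried without a file. The object and method names (SplitColumns, split) are made up for illustration, and it assumes every row has exactly three numeric fields; a row of any other length would fail with a MatchError.

```scala
object SplitColumns {
  // Splits rows of exactly three comma-separated numbers into
  // (first two columns, third column) in a single pass.
  def split(lines: Iterator[String]): (Array[Array[Double]], Array[Double]) = {
    val (firstTwo, third) = lines
      .map(_.split(",").map(_.trim.toDouble))
      .foldRight(List.empty[Array[Double]] -> List.empty[Double]) {
        // Prepending inside foldRight keeps the rows in file order.
        case (Array(x, y, z), (as, bs)) => (Array(x, y) :: as, z :: bs)
      }
    (firstTwo.toArray, third.toArray)
  }
}
```

It could be called as SplitColumns.split(io.Source.fromFile(name).getLines()).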

Similar to @Vitaliy Kotlyarenko's answer, but without using third-party libraries like Spark (Spark is great if your data is large, but overkill otherwise):
val lines: Iterator[String] = scala.io.Source.fromFile("txt.csv").getLines()
val matrix: Array[Array[Double]] = lines.map(_.split(",").map(_.trim.toDouble)).toArray
val twoFirstColumns: Array[Array[Double]] = matrix.map(_.take(2))
val thirdColumn: Array[Double] = matrix.map(_(2))
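Note that the snippet above never closes the Source. A possible variant, assuming Scala 2.13 or later (where scala.util.Using is available), closes the file once the rows have been read. The names CsvColumns and read are hypothetical.

```scala
import java.nio.file.Path
import scala.io.Source
import scala.util.Using

object CsvColumns {
  // Reads a CSV of numbers into a matrix and closes the file afterwards.
  // toArray forces all lines to be read before the source is closed.
  def read(path: Path): Array[Array[Double]] =
    Using.resource(Source.fromFile(path.toFile)) { src =>
      src.getLines().map(_.split(",").map(_.trim.toDouble)).toArray
    }
}
```

The columns can then be taken exactly as before, with matrix.map(_.take(2)) and matrix.map(_(2)).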

Related

Array filtering based on a condition in scala

I have the following array:
scala> Array((65.0,53.0,54.0),(20.0,30.0,24.0),(11.0,19.0,43.0))
res3: Array[(Double, Double, Double)] = Array((65.0,53.0,54.0), (20.0,30.0,24.0), (11.0,19.0,43.0))
How can I filter this array based on the third element? That is, I am trying to get the item with the smallest third element. Here the third elements are 54.0, 24.0 and 43.0.
Expected output:
scala> Array((20.0,30.0,24.0))
res4: Array[(Double, Double, Double)] = Array((20.0,30.0,24.0))
How about minBy, which returns the single tuple with the smallest third element:
val a = Array((65.0, 53.0, 54.0), (20.0, 30.0, 24.0), (11.0, 19.0, 43.0))
val l = a.minBy(_._3)
println(s">>Least third: ${l}")
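If the result has to be an Array, as in the expected output, one option is to filter on the minimum instead. This is only a sketch; LeastThird and select are illustrative names. Unlike minBy, it keeps ties and tolerates an empty input.

```scala
object LeastThird {
  type Triple = (Double, Double, Double)

  // Keeps every tuple whose third element equals the minimum.
  // Returns the empty array unchanged, where minBy would throw.
  def select(a: Array[Triple]): Array[Triple] =
    if (a.isEmpty) a
    else {
      val least = a.map(_._3).min
      a.filter(_._3 == least)
    }
}
```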

Create Array of Arrays with different sizes in scala [duplicate]

How do I create an array of multiple dimensions?
For example, I want an integer or double matrix, something like double[][] in Java.
I know for a fact that arrays changed in Scala 2.8 and that the old arrays are deprecated, but are there multiple ways to do it now and if yes, which is best?
Like so:
scala> Array.ofDim[Double](2, 2, 2)
res2: Array[Array[Array[Double]]] = Array(Array(Array(0.0, 0.0), Array(0.0, 0.0)), Array(Array(0.0, 0.0), Array(0.0, 0.0)))
scala> {val (x, y) = (2, 3); Array.tabulate(x, y)( (x, y) => x + y )}
res3: Array[Array[Int]] = Array(Array(0, 1, 2), Array(1, 2, 3))
The old way of creating multi-dimensional arrays is deprecated. The Array companion object now provides the factory method ofDim:
val cube = Array.ofDim[Float](8, 8, 8)
How to create and use a multi-dimensional array in Scala?
var dd : Array[(Int, (Double, Double))] = Array((1,(0.0,0.0)))
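Array.ofDim always produces rectangular arrays. For rows of different sizes, as the title asks, the rows can be built individually. A small sketch, with Jagged and triangle as made-up names:

```scala
object Jagged {
  // Row i has i + 1 elements, all initialised to 0.0.
  def triangle(rows: Int): Array[Array[Double]] =
    Array.tabulate(rows)(i => Array.fill(i + 1)(0.0))
}
```

Jagged.triangle(3) gives rows of length 1, 2 and 3.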

Using type to define a multidimensional array in scala

I'm trying to define a type for a matrix (two dimensional array). I have this:
scala> type DMatrix[T] = Array[Array[T]]
defined type alias DMatrix
and then I define the DMatrix:
scala> def DMatrix = Array.ofDim[Double](2,2)
DMatrix: Array[Array[Double]]
So far so good. The problem now is how to work with the DMatrix. I've tried some examples but nothing happens:
scala> DMatrix(0)(0) = 1.0
scala> DMatrix
res40: Array[Array[Double]] = Array(Array(0.0, 0.0), Array(0.0, 0.0))
scala> DMatrix(0)
res41: Array[Double] = Array(0.0, 0.0)
scala> DMatrix(0) = Array(1.0,2.1)
scala> DMatrix(0)
res43: Array[Double] = Array(0.0, 0.0)
So, the question is how to use this DMatrix type?
Thanks in advance.
There's just a tiny but crucial mistake here - in:
scala> def DMatrix = Array.ofDim[Double](2,2)
You've used def instead of val to declare DMatrix: that means that the expression is evaluated anew every time you access it, so when you modify the values in the arrays, the result is "thrown away" in favor of a new DMatrix instance.
Changing it to val would fix the issue and you'll see all changes:
scala> val DMatrix = Array.ofDim[Double](2,2)
DMatrix: Array[Array[Double]] = Array(Array(0.0, 0.0), Array(0.0, 0.0))
scala> DMatrix(0)(0) = 1.0
scala> DMatrix
res1: Array[Array[Double]] = Array(Array(1.0, 0.0), Array(0.0, 0.0))
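The difference can be seen side by side in a small sketch that also uses the type alias from the question. The names DefVsVal, fresh, shared and demo are invented for the example.

```scala
object DefVsVal {
  type DMatrix[T] = Array[Array[T]]

  def fresh: DMatrix[Double] = Array.ofDim[Double](2, 2)  // new array on every access
  val shared: DMatrix[Double] = Array.ofDim[Double](2, 2) // created once

  def demo(): (Double, Double) = {
    fresh(0)(0) = 1.0  // writes into a temporary array that is then discarded
    shared(0)(0) = 1.0 // writes into the one stored array
    (fresh(0)(0), shared(0)(0))
  }
}
```

Here demo() returns (0.0, 1.0): only the write through the val is visible afterwards.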

Subtracting elements at specified indices in array

I am a beginner to functional programming and Scala. I have an Array of arrays which contain Double values. I want to subtract elements (basically two arrays, see example) and I am unable to find online how to do this.
For example, consider
val instance = Array(Array(2.1, 3.4, 5.6),
Array(4.4, 7.8, 6.7))
I want to subtract 4.4 from 2.1, 7.8 from 3.4 and 6.7 from 5.6
Is this possible in Scala?
Apologies if the question seems very basic but any guidance in the right direction would be appreciated. Thank you for your time.
You can use .zip:
scala> instance(1).zip(instance(0)).map{ case (a,b) => a - b}
res3: Array[Double] = Array(2.3000000000000003, 4.4, 1.1000000000000005)
instance(1).zip(instance(0)) pairs up corresponding elements of the two rows, giving Array((4.4,2.1), (7.8,3.4), (6.7,5.6)).
.map{ case (a,b) => a - b} or .map(x => x._1 - x._2) then does the subtraction for every tuple.
I would also recommend using a tuple instead of your top-level array:
val instance = (Array(2.1, 3.4, 5.6), Array(4.4, 7.8, 6.7))
So now, with some additional definitions, it looks much better:
scala> val (a,b) = instance
a: Array[Double] = Array(2.1, 3.4, 5.6)
b: Array[Double] = Array(4.4, 7.8, 6.7)
scala> val sub = (_: Double) - (_: Double) //defined it as function, not method
sub: (Double, Double) => Double = <function2>
scala> b zip a map sub.tupled
res20: Array[Double] = Array(2.3000000000000003, 4.4, 1.1000000000000005)
*sub.tupled turns sub into a function that takes a single tuple of two values instead of two separate parameters.
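The examples above compute the second row minus the first. The question literally asks for the opposite direction (subtracting 4.4 from 2.1, and so on), which only requires swapping the operands. A minimal sketch, where RowDiff and minus are hypothetical names:

```scala
object RowDiff {
  // Element-wise xs(i) - ys(i); zip stops at the shorter array.
  def minus(xs: Array[Double], ys: Array[Double]): Array[Double] =
    xs.zip(ys).map { case (x, y) => x - y }
}
```

With the array from the question, RowDiff.minus(instance(0), instance(1)) yields the negated values of the results shown above.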

Resources