Subtracting elements at specified indices in array - arrays

I am a beginner in functional programming and Scala. I have an Array of arrays containing Double values. I want to subtract the elements of one inner array from the corresponding elements of another (see the example), and I have been unable to find out how to do this.
For example, consider
val instance = Array(Array(2.1, 3.4, 5.6),
Array(4.4, 7.8, 6.7))
I want to subtract 4.4 from 2.1, 7.8 from 3.4 and 6.7 from 5.6
Is this possible in Scala?
Apologies if the question seems very basic but any guidance in the right direction would be appreciated. Thank you for your time.

You can use .zip:
scala> instance(1).zip(instance(0)).map{ case (a,b) => a - b}
res3: Array[Double] = Array(2.3000000000000003, 4.4, 1.1000000000000005)
instance(1).zip(instance(0)) makes an array of tuples, Array((4.4,2.1), (7.8,3.4), (6.7,5.6)), from the corresponding pairs in your arrays.
.map{ case (a,b) => a - b} or .map(x => x._1 - x._2) performs the subtraction for each tuple.
I would also recommend using a tuple instead of your top-level array:
val instance = (Array(2.1, 3.4, 5.6), Array(4.4, 7.8, 6.7))
So now, with additional definitions, it looks much better
scala> val (a,b) = instance
a: Array[Double] = Array(2.1, 3.4, 5.6)
b: Array[Double] = Array(4.4, 7.8, 6.7)
scala> val sub = (_: Double) - (_: Double) //defined it as function, not method
sub: (Double, Double) => Double = <function2>
scala> b zip a map sub.tupled
res20: Array[Double] = Array(2.3000000000000003, 4.4, 1.1000000000000005)
*sub.tupled converts sub, which takes two separate parameters, into a function that takes a single tuple of two elements, which is what map passes to it here.
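Two details of the zip approach are easy to miss: zip pairs elements in the order the arrays are given, so the operand order decides the sign of the result, and zip silently truncates to the shorter array. A minimal sketch that makes both explicit is below; the helper name subtract and the length check are assumptions added for illustration, not part of the answer above.

```scala
// Hypothetical helper (not from the answer above): element-wise a(i) - b(i).
def subtract(a: Array[Double], b: Array[Double]): Array[Double] = {
  // zip silently truncates to the shorter input, so check lengths up front.
  require(a.length == b.length, s"length mismatch: ${a.length} vs ${b.length}")
  a.zip(b).map { case (x, y) => x - y }
}

val instance = Array(Array(2.1, 3.4, 5.6), Array(4.4, 7.8, 6.7))

// First row minus second row: every result is negative for this data.
val firstMinusSecond = subtract(instance(0), instance(1))
// Second row minus first row: the same magnitudes with the opposite sign.
val secondMinusFirst = subtract(instance(1), instance(0))
```

Swapping the arguments is enough to switch between the two directions, so it is worth deciding which one you actually need before picking the zip order.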

Related

Array filtering based on a condition in scala

I have the below array with me
scala> Array((65.0,53.0,54.0),(20.0,30.0,24.0),(11.0,19.0,43.0))
res3: Array[(Double, Double, Double)] = Array((65.0,53.0,54.0), (20.0,30.0,24.0), (11.0,19.0,43.0))
How can I filter the items of this array based on the third element? That is, I am trying to get the item with the smallest third element. Here the third elements are 54.0, 24.0 and 43.0, so 24.0 is the smallest.
Expected output -
scala> Array((20.0,30.0,24.0))
res4: Array[(Double, Double, Double)] = Array((20.0,30.0,24.0))
How about:
val a = Array((65.0, 53.0, 54.0), (20.0, 30.0, 24.0), (11.0, 19.0, 43.0))
val l = a.minBy(_._3)
println(s">>Least third: ${l}")
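Note that minBy returns the tuple itself rather than an Array containing it. If the result should be an Array, as in the expected output, one possible sketch is to filter on the minimum value; the names least, minThird and leastAsArray are made up for this example.

```scala
val a = Array((65.0, 53.0, 54.0), (20.0, 30.0, 24.0), (11.0, 19.0, 43.0))

// minBy returns the single tuple with the smallest third element
// (and throws on an empty array).
val least: (Double, Double, Double) = a.minBy(_._3)

// To get an Array back, keep every item whose third element equals
// the minimum; unlike minBy, this also keeps ties.
val minThird = a.map(_._3).min
val leastAsArray: Array[(Double, Double, Double)] = a.filter(_._3 == minThird)
```

The filter version traverses the array twice, which is unlikely to matter unless the array is very large.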

how to read file and split in scala

Suppose I have a file (such as a csv or txt file).
I would like to get two arrays from it, such as
Array(Array(1.0,2.0),Array(4.0,5.0),Array(7.0, 8.0),Array(10.0,11.0),Array(13.0,14.0))
and
Array(3.0, 6.0, 9.0, 12.0, 15.0)
What's the ideal way to do this in scala?
val rdd = sc.textFile("1.csv").map(_.split(',').map(_.trim().toDouble))
rdd.map(_.take(2)).collect()
res0: Array[Array[Double]] = Array(Array(1.0, 2.0), Array(4.0, 5.0), Array(7.0, 8.0), Array(10.0, 11.0), Array(13.0, 14.0))
rdd.map(_(2)).collect()
res2: Array[Double] = Array(3.0, 6.0, 9.0, 12.0, 15.0)
You can get both arrays in one go, so that you don't need to traverse the data twice:
val (first, second) = {
  io.Source.fromFile(name).getLines
    .map(_.split(",").map(_.toDouble))
    .foldRight(Seq.empty[Array[Double]] -> Seq.empty[Double]) {
      case (Array(x, y, z), (as, bs)) => (Array(x, y) +: as, z +: bs)
    }
}
Now you end up with two lists rather than arrays. If that matters to you, first.toArray and second.toArray will do the conversion for you.
Similar to @Vitaliy Kotlyarenko's answer, but without using third-party libraries like Spark (Spark is great if your data is large, but overkill otherwise):
val lines: Iterator[String] = scala.io.Source.fromFile("txt.csv").getLines()
val matrix: Array[Array[Double]] = lines.map(_.split(",").map(_.trim.toDouble)).toArray
val twoFirstColumns: Array[Array[Double]] = matrix.map(_.take(2))
val thirdColumn: Array[Double] = matrix.map(_(2))
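The column split can also be done in one pass over the parsed rows with unzip. The sketch below works on an Iterator[String], so it can be tried without a file; the helper name splitColumns, the blank-line filter and the sample rows are assumptions, since the question does not show the file contents.

```scala
// Hypothetical helper: splits each CSV row into (first two columns, third column).
def splitColumns(lines: Iterator[String]): (Array[Array[Double]], Array[Double]) = {
  val rows = lines
    .filter(_.trim.nonEmpty)                    // skip blank lines
    .map(_.split(",").map(_.trim.toDouble))     // parse every cell as a Double
    .toArray
  // unzip turns an array of pairs into a pair of arrays.
  rows.map(row => (row.take(2), row(2))).unzip
}

// In-memory stand-in for the file contents assumed from the question.
val sample = Iterator("1,2,3", "4,5,6", "7,8,9", "10,11,12", "13,14,15")
val (features, labels) = splitColumns(sample)
```

With a real file, the same function could be fed scala.io.Source.fromFile("txt.csv").getLines(), ideally closing the Source afterwards.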

Easiest way to represent Euclidean Distance in scala

I am writing a data mining algorithm in Scala and I want to write the Euclidean Distance function for a given test and several train instances. I have an Array[Array[Double]] with test and train instances. I have a method which loops through each test instance against all training instances and calculates distances between the two (picking one test and train instance per iteration) and returns a Double.
Say, for example, I have the following data points:
testInstance = Array(Array(3.2, 2.1, 4.3, 2.8))
trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8), Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))
I have a method stub (highlighting the distance function) which returns neighbours around a given test instance:
def predictClass(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]], k: Int): Array[Double] = {
  for(testInstance <- testPoints)
  {
    for(trainInstance <- trainPoints)
    {
      for(i <- 0 to k)
      {
        distance = euclideanDistanceBetween(testInstance, trainInstance) //need help in defining this function
      }
    }
  }
  return distance
}
I know how to write a generic Euclidean Distance formula as:
math.sqrt(math.pow((x1 - y1), 2) + math.pow((x2 - y2), 2))
I have some pseudo steps as to what I want the method to do with a basic definition of the function:
def distanceBetween(testInstance: Array[Double], trainInstance: Array[Double]): Double = {
// subtract each element of trainInstance with testInstance
// for example,
// iteration 1 will do [Array(3.9, 4.1, 6.2, 7.3) - Array(3.2, 2.1, 4.3, 2.8)]
// i.e. sqrt(3.9-3.2)^2+(4.1-2.1)^2+(6.2-4.3)^2+(7.3-2.8)^2
// return result
// iteration 2 will do [Array(4.5, 6.1, 8.3, 3.8) - Array(3.2, 2.1, 4.3, 2.8)]
// i.e. sqrt(4.5-3.2)^2+(6.1-2.1)^2+(8.3-4.3)^2+(3.8-2.8)^2
// return result, and so on......
}
How can I write this in code?
So the formula you put in only works for two-dimensional vectors. You have four dimensions, but you should probably write your function to be flexible on this. So check out this formula.
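A standard way to write the n-dimensional Euclidean distance between two vectors X and Y, which may or may not be the exact form the link pointed to, is:

```latex
d(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2}
```

For n = 2 this reduces to the two-term formula from the question.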
So what you really want to say is:
for each position i:
subtract the ith element of Y from the ith element of X
square it
add all of those up
square root the whole thing
To make this more functional-programming style it will be more like:
square root the:
sum of:
zip X and Y into pairs
for each pair, square the difference
So that would look like:
import math._
def distance(xs: Array[Double], ys: Array[Double]) = {
  sqrt((xs zip ys).map { case (x,y) => pow(y - x, 2) }.sum)
}
val testInstances = Array(Array(5.0, 4.8, 7.5, 10.0), Array(3.2, 2.1, 4.3, 2.8))
val trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8), Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))
distance(testInstances.head, trainPoints.head)
// 3.2680269276736382
As for predicting the class, you can make that more functional too, but it's unclear what the Double is that you are intending to return. It seems like you would want to predict the class for each test instance? Maybe choosing the class c corresponding to the nearest training point?
def findNearestClasses(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]]): Array[Int] = {
  testPoints.map { testInstance =>
    trainPoints.zipWithIndex.map { case (trainInstance, c) =>
      c -> distance(testInstance, trainInstance)
    }.minBy(_._2)._1
  }
}
findNearestClasses(testInstances, trainPoints)
// Array(2, 0)
Or maybe you want the k-nearest neighbors:
def findKNearestClasses(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]], k: Int): Array[Int] = {
  testPoints.map { testInstance =>
    val distances =
      trainPoints.zipWithIndex.map { case (trainInstance, c) =>
        c -> distance(testInstance, trainInstance)
      }
    val classes = distances.sortBy(_._2).take(k).map(_._1)
    val classCounts = classes.groupBy(identity).mapValues(_.size)
    classCounts.maxBy(_._2)._1
  }
}
findKNearestClasses(testInstances, trainPoints)
// Array(2, 1)
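The functions above use the index of each training point as its "class", so a majority vote only becomes meaningful once several training points share a label. A minimal sketch of that variant follows, assuming a separate trainLabels array; the labels, the function name knnPredict and the choice of k = 3 are made up for illustration and are not part of the question.

```scala
import scala.math.{pow, sqrt}

def distance(xs: Array[Double], ys: Array[Double]): Double =
  sqrt(xs.zip(ys).map { case (x, y) => pow(y - x, 2) }.sum)

// Hypothetical variant: trainLabels(i) is the assumed class of trainPoints(i).
def knnPredict(test: Array[Double],
               trainPoints: Array[Array[Double]],
               trainLabels: Array[String],
               k: Int): String = {
  val nearestLabels = trainPoints.zip(trainLabels)
    .map { case (p, label) => (label, distance(test, p)) }
    .sortBy(_._2)   // closest first
    .take(k)
    .map(_._1)
  // Majority vote among the k nearest labels; ties resolve arbitrarily.
  nearestLabels.groupBy(identity).maxBy(_._2.length)._1
}

val trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8),
                        Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))
val trainLabels = Array("a", "b", "a", "b")   // made-up labels for illustration

val predicted = knnPredict(Array(5.0, 4.8, 7.5, 10.0), trainPoints, trainLabels, k = 3)
```

For this test point the three nearest training points carry the labels a, a and b, so the vote yields "a".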
The general formula for the Euclidean distance between two 2D points (x1, y1) and (x2, y2) is as follows:
math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2), 2))
You can only compare the x coordinate with the x, and y with the y.

Julia Approach to python equivalent list of lists

I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. For example, in Python (although not very efficient or pythonic), I would create an empty list and append a list of a known size and type, and then convert to a NumPy array:
Python Snippet
a = []
for ....
a.append([1.,2.,3.,4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
push!(a,[1.,2.,3.,4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
    a = vcat(a, [1. 2. 3. 4.])
end
# Building up as columns
b = [1.,2.,3.,4.]
for i in 1:3
    b = hcat(b, [1.,2.,3.,4.])
end
But this isn't a natural pattern in Julia. Instead, you'd do something like
A = zeros(4,4)
for i in 1:4, j in 1:4
    A[i,j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i=1:3
           push!(a,[1.,2.,3.,4.])
       end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
   ....:     a.append([1, 2, 3, 4])
   ....:
In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])
I should add that this is probably not what you actually want to be doing as the hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2d array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
The other answers don't work if the number of loop iterations isn't known in advance, or assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function for "take this list of N-D arrays and return me a new (N+1)-D array".
Julia requires a different concatenation call depending on the dimension of the underlying data. For example, if the elements of a are vectors, one can use hcat(a...) or cat(a..., dims=2). But if the elements of a are, e.g., 2D arrays, one must use cat(a..., dims=3), and so on. The dims argument to cat is not optional, and there is no default value to indicate "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array)
function collapse(x)
    return cat(x...,dims=length(size(x[1]))+1)
end
One would use this as
a = []
for ...
... compute new_a...
push!(a,new_a)
end
a = collapse(a)

how 2d array filter get value in scala

I have a 2d array
val A = Array((10.0,1.0,2.0,3.0),(20.0,4.0,5.0,6.0),(10.0,7.2,8.0,9.0))
How can I keep only the items whose first element equals 10, and get the values of the other elements?
The result should look like:
x = Array((1.0,2.0,3.0),(7.2,8.0,9.0))
and I can use x(i) to get the value inside the array
thank you ! :)
You could do it like this:
A.filter(_._1 == 10).map{case (a,b,c,d)=>(b,c,d)}
Or like this:
for ((a,b,c,d) <- A if a == 10) yield (b,c,d)
(By the way, it's recommended that you don't use arrays in Scala unless you really need to; you should prefer immutable collections such as Seq and Vector. There's a (somewhat old) introduction to Scala collections here.)
scala> A.filter(_._1 == 10.0).map(t => t.productIterator.toList.tail)
res0: Array[List[Any]] = Array(List(1.0, 2.0, 3.0), List(7.2, 8.0, 9.0))
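The filter and map steps from the answers above can also be combined into a single collect with a literal pattern, which keeps the result typed as tuples of Double rather than List[Any]. This is a sketch of one alternative; the name secondRowFirstValue is made up for the example.

```scala
val A = Array((10.0, 1.0, 2.0, 3.0), (20.0, 4.0, 5.0, 6.0), (10.0, 7.2, 8.0, 9.0))

// The literal 10.0 in the pattern acts as the filter; b, c and d are kept.
val x: Array[(Double, Double, Double)] = A.collect { case (10.0, b, c, d) => (b, c, d) }

// x(i) is a typed tuple, so its fields are read with _1, _2, _3.
val secondRowFirstValue: Double = x(1)._1
```

Because x(i) is a tuple here, individual values are accessed as x(i)._1 and so on rather than by a second index.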
