Process a sequence of elements in Scala - arrays

Given a sorted sequence of pairs, I want to merge neighbouring elements when they are contiguous or overlapping, and start a new element when there is a gap.
For example:
if A = Seq((1,2),(3,4),(5,6),(8,10),(11,15))
output should be Seq((1,6),(8,15))
since the last element of the current entry is contiguous with the first element of the next entry
if B = Seq((1,4),(3,5),(6,7),(9,10),(11,15))
output should be Seq((1,7),(9,15))
since the first element of the next entry is smaller than (or contiguous with) the last element of the current entry
I tried something like:
val finalOut = mySeq.sliding(2).map {
case Array(x, y, _*) => (x, y, (x._2 - y._1))
}.toList
The problem with this is that it only looks at 2 elements at a time, whereas we need to keep traversing until there is a gap in continuity. I am not sure how to achieve that in Scala.
I tried a for loop as well, but that doesn't help either, because it processes one element at a time and gives no way to keep track of other elements or a counter as in C++.

You can do this with foldRight and accumulate into a new Seq.
This works
val tups1 = Seq((1,2),(3,4),(5,6),(8,10),(11,15))
val tups2 = Seq((1,4),(3,5),(6,7),(9,10),(11,15))
def f(tups: Seq[(Int, Int)]): Seq[(Int, Int)] = {
  val emptySeq: Seq[(Int, Int)] = Seq()
  tups.foldRight(emptySeq){ (next, accum) => accum match {
    case Nil => Seq(next)
    case (a, b) +: cs if a - 1 <= next._2 => (next._1, next._2 max b) +: cs
    case _ => next +: accum
  }}
}
f(tups1) // Seq[(Int, Int)] = List((1,6), (8,15))
f(tups2) // Seq[(Int, Int)] = List((1,7), (9,15))
I'd use foldRight over foldLeft because the default Seq is a List under the hood, and for Lists prepend is constant time while append is O(n); foldRight lets you accomplish this with only prepends.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
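For comparison, here is a minimal sketch of the same merge written with foldLeft, prepending to a List and reversing once at the end, so it also uses only prepends. The name mergeLeft is made up for illustration and is not part of the answer above; the sketch assumes the input is sorted by the first element of each pair, as in the question.

```scala
// Hypothetical helper: merge contiguous or overlapping pairs, left to right.
// Assumes `tups` is sorted by the first element of each pair.
def mergeLeft(tups: Seq[(Int, Int)]): Seq[(Int, Int)] =
  tups.foldLeft(List.empty[(Int, Int)]) {
    // The most recently built range sits at the head of the accumulator.
    // If the next pair starts at or before its end + 1, extend that range.
    case ((a, b) :: rest, (c, d)) if c - 1 <= b => (a, b max d) :: rest
    // Otherwise there is a gap: start a new range.
    case (acc, next) => next :: acc
  }.reverse // ranges were prepended, so restore ascending order

mergeLeft(Seq((1,2),(3,4),(5,6),(8,10),(11,15))) // List((1,6), (8,15))
mergeLeft(Seq((1,4),(3,5),(6,7),(9,10),(11,15))) // List((1,7), (9,15))
```

Both formulations do one pass over the input; which one reads better is largely a matter of taste.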

Merge an array of arrays in Ruby on Rails

I have an array like the one below:
[["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
And I want a result like this:
"GJ, MP, KL, HR, MH"
Start with the first element of the array, ["GJ","MP"], which gives answer_string = "GJ, MP".
Now find the pair whose first element is MP (the last element added so far), which is ["MP","KL"].
Then add KL to the string: answer_string = "GJ, MP, KL".
Continuing this way gives the output I want.
Given
ary = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
(where each element is in fact an edge in a simple graph that you need to traverse) your task can be solved in a quite straightforward way:
acc = ary.first.dup
ary.size.times do
# Find an edge whose "from" value is equal to the latest "to" one
next_edge = ary.find { |a, _| a == acc.last }
acc << next_edge.last if next_edge
end
acc
#=> ["GJ", "MP", "KL", "HR", "MH"]
The bad thing here is that it runs in quadratic time (you search through the whole array on each iteration), which would hit you badly if the initial array is large enough. It would be faster to use an auxiliary data structure with faster lookup (a hash, for instance). Something like:
head, *tail = ary
edges = tail.to_h
tail.reduce(head.dup) { |acc, (k, v)| acc << edges[acc.last] }
#=> ["GJ", "MP", "KL", "HR", "MH"]
(I'm not joining the resulting array into a string, but that part is straightforward.)
d = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
o = [] # List for output
c = d[0][0] # Save the current first object
loop do # Keep looping through until there are no matching pairs
o.push(c) # Push the current first object to the output
n = d.index { |a| a[0] == c } # Get the index of the first matched pair of the current `c`
break if n == nil # If there are no found index, we've essentially gotten to the end of the graph
c = d[n][1] # Update the current first object
end
puts o.join(',') # Join the results
Updated as the question was dramatically changed. Essentially, you are navigating a graph.
I use arr.size.times to loop
def check arr
new_arr = arr.first #new_arr = ["GJ","MP"]
arr.delete_at(0) # remove the first of arr. arr = [["HR","MH"],["MP","KL"],["KL","HR"]]
arr.size.times do
find = arr.find {|e| e.first == new_arr.last}
new_arr << find.last if find
end
new_arr.join(',')
end
array = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
p check(array)
#=> "GJ,MP,KL,HR,MH"
Assumptions:
a is an Array or a Hash
a is in the form provided in the Original Post
For each element b in a, b[0] is unique
The first thing I would do, if a is an Array, is convert a to a Hash for faster and easier lookup (this is not technically necessary, but it simplifies the implementation and should improve performance)
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
a.to_h
#=> {"GJ"=>"MP", "HR"=>"MH", "MP"=>"KL", "KL"=>"HR"}
UPDATE
If the path will always run from the first element to the end of the chain and the elements always form a complete chain, then, borrowing from @KonstantinStrukov's inspiration (if you prefer this option, please give him the credit ✔️):
a.to_h.then {|edges| edges.reduce { |acc,_| acc << edges[acc.last] }}.join(",")
#=> "GJ,MP,KL,HR,MH"
Caveat: If there are disconnected elements in the original, this result will contain nil (represented as trailing commas). This could be solved by adding Array#compact, but there would still be an unnecessary traversal for each disconnected element.
ORIGINAL
We can use a recursive method to look up the path from a given key to the end of the path. The default key is a[0][0].
def navigate(h,from:h.keys.first)
return unless h.key?(from)
[from, *navigate(h,from:h[from]) || h[from]].join(",")
end
Explanation:
navigate(h,from:h.keys.first) - the Hash to traverse and the starting point for the traversal
return unless h.key?(from) - if the Hash does not contain the from key, return nil (end of the chain)
[from, *navigate(h,from:h[from]) || h[from]].join(",") - build an Array of the from key and the recursive result of looking up the value for that from key; if the recursion returns nil, append the last value instead. Then simply convert the Array to a String, joining the elements with a comma.
Usage:
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]].to_h
navigate(a)
#=> "GJ,MP,KL,HR,MH"
navigate(a,from: "KL")
#=> "KL,HR,MH"
navigate(a,from: "X")
#=> nil

Efficient Way to take first n sorted elements in Spark Partitions

I have created an RDD from an array in Spark. I want to take the n smallest elements from each partition. Each time, I sort the iterator of each partition, take the first n elements and replace them with the elements of arr1. The way I have done it is:
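import scala.util.Random
val arr = (1 to 50000).toArray
val n = 50
val iterations = 100
val r = new Random()
val arr1 = Array.fill(n)(r.nextInt(10))
val rdd = sc.parallelize(arr,3)
rdd.mapPartitionsWithIndex{(index , it) =>
  // An Iterator cannot be sorted or updated in place, so materialize the partition first
  val sorted = it.toArray.sorted
  // Overwrite the n smallest elements (or fewer, if the partition is shorter)
  for(i <- 0 until math.min(n, sorted.length)){
    sorted(i) = arr1(i)
  }
  sorted.iterator
}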
Is there a more efficient way to perform the same task in Scala?
rdd.sortBy(x=>x)
.foreachPartition(y=>println(y.take(n).toList))
Replace println with your use case
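If only the n smallest elements of each partition are needed, sorting the whole partition can be avoided. The sketch below is plain Scala with no Spark dependency; smallestN is a hypothetical helper (not a Spark API) that consumes an iterator once and keeps at most n elements in a max-heap. Assuming rdd is an RDD[Int], it could presumably be plugged in as rdd.mapPartitions(it => smallestN(it, n)), though that wiring is not tested here.

```scala
import scala.collection.mutable

// Hypothetical helper: the n smallest values of an iterator, in ascending order.
// Uses a bounded max-heap, so memory stays O(n) and time is O(m log n)
// for m input elements.
def smallestN(it: Iterator[Int], n: Int): Iterator[Int] = {
  // PriorityQueue is a max-heap under the default Int ordering,
  // so `head` is the largest of the values kept so far.
  val heap = mutable.PriorityQueue.empty[Int]
  it.foreach { x =>
    if (heap.size < n) heap.enqueue(x)
    else if (n > 0 && x < heap.head) {
      heap.dequeue() // drop the current largest
      heap.enqueue(x) // keep the smaller newcomer
    }
  }
  heap.toArray.sorted.iterator
}

smallestN(Iterator(5, 1, 4, 2, 3), 2).toList // List(1, 2)
```

Whether this beats a full sort depends on how small n is relative to the partition size; for n close to the partition size, sorting is likely just as good.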

Efficient method of merging two neighboring indices in Scala

What would be the most efficient method of merging two neighboring indices in Scala? What I have in mind is a nasty while loop with copying.
For example, there's a buffer array A with length N. A new array needs to be generated such that A(i) = A(i) + A(i+1), where i < N.
For example, merging and summing the second and third elements to generate a new array:
ArrayBuffer(1,2,4,3) => ArrayBuffer(1,6,3)
UPDATE:
I think I came up with a solution, but I don't like it much. Any suggestions for improvement would be highly appreciated.
scala> val i = 1
i: Int = 1
scala> ArrayBuffer(1,2,4,3).zipWithIndex.foldLeft(ArrayBuffer[Int]())( (k,v)=> if(v._2==i+1){ k(k.length-1) =(k.last+v._1);k; }else k+= v._1 )
The simplest way to get neighbors is to use the sliding method.
a.sliding(2, 1).map(_.sum)
where the first argument is the window size and the second one is the step.
If you want to keep the first and the last element intact something like this should work:
a.head +: a.drop(1).dropRight(1).sliding(2, 1).map(_.sum).toArray :+ a.last
If you want to avoid copying the array on append/prepend, you can rewrite it as follows:
val aa = a.sliding(2, 1).map(_.sum).toArray
aa(0) = a.head
aa(aa.size - 1) = a.last
or use a ListBuffer, which provides constant-time prepend and append.
It should be also possible to use Iterators:
val middle: Iterator[Int] = a.drop(1).dropRight(1).sliding(2, 1).map(_.sum)
(Iterator(a.head) ++ middle ++ Iterator(a.last)).toArray // or toBuffer
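For the case in the question, where only the elements at one given index i and its right neighbor are merged, a different technique may be simpler: the standard collections method patch, which replaces a slice with another sequence. The sketch below is an assumption about what the asker wants (a new buffer, input left untouched); mergeAt is a made-up name.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: return a new buffer in which the elements at
// positions i and i + 1 are replaced by their sum.
def mergeAt(a: ArrayBuffer[Int], i: Int): ArrayBuffer[Int] = {
  require(i >= 0 && i + 1 < a.length, "i and i + 1 must be valid indices")
  // patch(from, replacement, replaced): starting at `from`, drop `replaced`
  // elements and splice `replacement` in their place.
  a.patch(i, Seq(a(i) + a(i + 1)), 2)
}

mergeAt(ArrayBuffer(1, 2, 4, 3), 1) // ArrayBuffer(1, 6, 3)
```

This still copies the buffer once, but there is no index bookkeeping inside a fold.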

2d Array Sort in Haskell

I'm trying to teach myself Haskell (coming from OOP languages). Having a hard time grasping the immutable variables stuff. I'm trying to sort a 2d array in row major.
In java, for example (pseudo):
int array[3][3] = **initialize array here
for(i = 0; i<3; i++)
for(j = 0; j<3; j++)
if(array[i][j] < current_low)
current_low = array[i][j]
How can I implement this same sort of thing in Haskell? If I create a temp array to add the low values to after each iteration, I won't be able to add to it because it is immutable, correct? Also, Haskell doesn't have loops, right?
Here's some useful stuff I know in Haskell:
main = do
  let a = [[10,4],[6,10],[5,2]] --assign random numbers
  print (a !! 0 !! 1) --will print a[0][1] in java notation
  --How can we loop through the values?
First, your Java code does not sort anything. It just finds the smallest element. And, well, there's a kind of obvious Haskell solution... guess what, the function is called minimum! Let's see what it does:
GHCi> :t minimum
minimum :: Ord a => [a] -> a
ok, so it takes a list of values that can be compared (hence Ord) and outputs a single value, namely the smallest. How do we apply this to a "2D list" (nested list)? Well, basically we need the minimum amongst all minima of the sub-lists. So we first replace the list of list with the list of minima
allMinima = map minimum a
...and then use minimum allMinima.
Written compactly:
main :: IO ()
main = do
  let a = [[10,4],[6,10],[5,2]] -- don't forget the indentation
  print (minimum $ map minimum a)
That's all!
Indeed "looping through values" is a very un-functional concept. We generally don't want to talk about single steps that need to be taken, rather think about properties of the result we want, and let the compiler figure out how to do it. So if we weren't allowed to use the pre-defined minimum, here's how to think about it:
If we have a list and look at a single value... under what circumstances is it the correct result? Well, if it's smaller than all other values. And what is the smallest of the other values? Exactly, the minimum amongst them.
minimum' :: Ord a => [a] -> a
minimum' (x:xs)
  | x < minimum' xs = x
If it's not smaller, then we just use the minimum of the other values
minimum' (x:xs)
  | x < minxs = x
  | otherwise = minxs
  where minxs = minimum' xs
One more thing: if we recurse through the list this way, there will at some point be no first element left to compare with something. To prevent that, we first need the special case of a single-element list:
minimum' :: Ord a => [a] -> a
minimum' [x] = x -- obviously smallest, since there's no other element.
minimum' (x:xs)
  | x < minxs = x
  | otherwise = minxs
  where minxs = minimum' xs
Alright, well, I'll take a stab. Zach, this answer is intended to get you thinking in recursions and folds. Recursions, folds, and maps are the fundamental ways that loops are replaced in functional style. Just try to believe that in reality, the question of nested looping rarely arises naturally in functional programming. When you actually need to do it, you'll often enter a special section of code, called a monad, in which you can do destructive writes in an imperative style. Here's an example. But, since you asked for help with breaking out of loop thinking, I'm going to focus on that part of the answer instead. @leftaroundabout's answer is also very good, and you can fill in his definition of minimum here.
flatten :: [[a]] -> [a]
flatten [] = []
flatten xs = foldr (++) [] xs

squarize :: Int -> [a] -> [[a]]
squarize _ [] = []
squarize len xs = (take len xs) : (squarize len $ drop len xs)

crappySort :: Ord a => [a] -> [a]
crappySort [] = []
crappySort xs =
  let smallest = minimum xs
      rest = filter (smallest /=) xs
      count = (length xs) - (length rest)
  in
    replicate count smallest ++ crappySort rest

sortByThrees xs = squarize 3 $ crappySort $ flatten xs

2d scala array iteration

I have a 2d array of type Boolean (the element type is not important).
It is easy to iterate over the array in a non-functional style.
How do I do it in FP style?
var matrix = Array.ofDim[Boolean](5, 5)
For example, I would like to iterate through all the rows for a given column and return a list of Int containing the row indices whose cell matches a specific function.
Example: for column 3, iterate through rows 1 to 5 and return 4, 5 if the cells at (4, 3) and (5, 3) match a specific function. Thanks very much.
def getChildren(nodeId: Int) : List[Int] = {
info("getChildren("+nodeId+")")
var list = List[Int]()
val nodeIndex = id2indexMap(nodeId)
for (rowIndex <- 0 until matrix.size) {
val elem = matrix(rowIndex)(nodeIndex)
if (elem) {
println("Row Index = " + rowIndex)
list = rowIndex :: list
}
}
list
}
What about
(1 to 5) filter {i => predicate(matrix(i)(3))}
where predicate is your function?
Note that for an array initialized with (5, 5), the indices go from 0 to 4.
Update: based on your example
def getChildren(nodeId: Int) : List[Int] = {
  info("getChildren("+nodeId+")")
  val nodeIndex = id2indexMap(nodeId)
  // Keep the row indices whose cell in column nodeIndex is true
  val result = (0 until matrix.size).filter(matrix(_)(nodeIndex)).toList
  result.foreach(println)
  result
}
You may move the print into the filter if you want to, and reverse the list if you want it exactly as in your example.
If you're not comfortable with filters and zips, you can stick with the for-comprehension but use it in a more functional way:
for {
rowIndex <- matrix.indices
if matrix(rowIndex)(nodeIndex)
} yield {
println("Row Index = " + rowIndex)
rowIndex
}
yield builds a new collection from the results of the for-comprehension, so this expression evaluates to the collection you want to return. seq.indices is a method equivalent to 0 until seq.size. The curly braces allow you to span multiple lines without semicolons, but you can make it in-line if you want:
for (rowIndex <- matrix.indices; if matrix(rowIndex)(nodeIndex)) yield rowIndex
Should probably also mention that normally if you're iterating through an Array you won't need to refer to the indices at all. You'd do something like
for {
row <- matrix
elem <- row
} yield f(elem)
but your use-case is a bit unusual in that it requires the indices of the elements, which you shouldn't normally be concerned with (using array indices is essentially a quick and dirty hack to pair a data element with a number). If you want to capture and use the notion of position you might be better off using a Map[Int, Boolean] or a case class with such a field.
def findIndices[A](aa: Array[Array[A]], pred: A => Boolean): Array[Array[Int]] =
aa.map(row =>
row.zipWithIndex.collect{
case (v,i) if pred(v) => i
}
)
You can refactor it to be a bit nicer by extracting a function that finds the indices in a single row only:
def findIndices2[A](xs: Array[A], pred: A => Boolean): Array[Int] =
xs.zipWithIndex.collect{
case (v,i) if pred(v) => i
}
And then write
matrix.map(row => findIndices2(row, pred))
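The same zipWithIndex/collect idea can also be applied down a column, which is closer to what the question asks for (row indices for one fixed column). This is only a sketch; rowsMatching and its parameter names are invented for illustration.

```scala
// Hypothetical helper: indices of the rows whose cell in column `col`
// satisfies `pred`.
def rowsMatching[A](matrix: Array[Array[A]], col: Int)(pred: A => Boolean): List[Int] =
  matrix.zipWithIndex.collect {
    // Pair every row with its index, keep the index when the cell matches.
    case (row, i) if pred(row(col)) => i
  }.toList

val m = Array.ofDim[Boolean](5, 5) // all cells start as false
m(3)(2) = true
m(4)(2) = true
rowsMatching(m, 2)(b => b) // List(3, 4)
```

Unlike the getChildren version in the question, the indices come out in ascending order, since nothing is prepended to a list.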
