How to sum up every n elements of a Scala array? - arrays

what's the efficient way to sum up every n elements of an array in Scala? For example, if my array is like below:
val arr = Array(3,1,9,2,5,8,...)
and I want to sum up every 3 elements of this array and get a new array like below:
newArr = Array(13, 15, ...)
How can I do this efficiently in Spark Scala? Thank you very much.

grouped followed by map should do the trick:
scala> val arr = Array(3,1,9,2,5,8)
arr: Array[Int] = Array(3, 1, 9, 2, 5, 8)
scala> arr.grouped(3).map(_.sum).toArray
res0: Array[Int] = Array(13, 15)

Calling the toIterator method on the array before calling grouped should speed things up a bit, i.e.
arr.toIterator.grouped(3).map(_.sum).toArray
For example, using
val xs = Array.range(0, 10000)
10000 iterations of
xs.toIterator.grouped(3).map(_.sum).toArray
takes about 16.93 seconds, while 10000 iterations of
xs.grouped(3).map(_.sum).toArray
requires approximately 21.49 seconds.

Related

Find common elements in subarrays of arrays

I have two numpy arrays of shape arr1=(~140000, 3) and arr2=(~450000, 10). The first 3 elements of each row, for both the arrays, are coordinates (z,y,x). I want to find the rows of arr2 that have the same coordinates of arr1 (which can be considered a subgroup of arr2).
for example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr=[]
for i in arr1:
for j in arr2:
if i[0]==j[0] and i[1]==j[1] and i[2]==j[2]:
list_arr.append (j)
I also tried to create (after the 1st loop) a subarray of arr2, filtering it on the value of i[0] (arr2_filt = [el for el in arr2 if el[0]==i[0]). This speed a bit the operation, but it still remains really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b): # a, b are arrays
a = np.ascontiguousarray(a)
b = np.ascontiguousarray(b)
void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
return a.view(void_dt).ravel(), b.view(void_dt).ravel()
a,b = view1D(arr1,arr2[:,:3])
out = arr2[np.in1d(b,a)]
Approach #2
Another with dimensionality-reduction for ints -
d = np.maximum(arr2[:,:3].max(0),arr1.max(0))
s = np.r_[1,d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.in1d(b,a)]
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a,b)
idx[idx==len(a)] = 0
out = arr2[unq_a[idx] == b]
Improvement #2
For the last improvement on using np.searchsorted that also uses np.unique, we could use argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
You can do it with the help of set
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
[1, 2, 3]])
Edit
Use list comprehension
l = [list(i) for i in arr2 if i[:3] in arr]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers Divakar already gave an excellent answer. If you want to compare floats you have to consider e.g. the following:
1.+1e-15==1.
False
1.+1e-16==1.
True
If this behaviour could lead to problems in your code I would recommend to perform a nearest neighbour search and probably check if the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1,arr2):
tree=spatial.cKDTree(arr2[:,0:3])
#You can check here if the distance is small enough and otherwise raise an error
dist,ind=tree.query(arr1, k=1)
return ind

Get the maximum N elements (along with their indices) of an Array

I've got an array that contains Integers as the one shown below:
val my_array = Array(10, 20, 6, 31, 0, 2, -2)
I need to get the maximum 3 elements of this array along with their corresponding indices (either using a single function or two separate funcs).
For example, the output might be something like:
// max values
Array(31, 20, 10)
// max indices
Array(3, 1, 0)
Although the operations look simple, I was not able to find any relevant functions around.
Here's a straightforward way - zipWithIndex followed by sorting:
val (values, indices) = my_array
.zipWithIndex // add indices
.sortBy(t => -t._1) // sort by values (descending)
.take(3) // take first 3
.unzip // "unzip" the array-of-tuples into tuple-of-arrays
Here's another way to do it:
(my_array zip Stream.from(0)).
sortWith(_._1 > _._1).
take(3)
res1: Array[(Int, Int)] = Array((31,3), (20,1), (10,0))

Read File and store element as array Scala

I am new to Scala and this is the first time I'm using it. I want to read in a textfile with with two columns of numbers and store each column items in a separate list or array that will have to be cast as integer. For example the textfile looks like this:
1 2
2 3
3 4
4 5
1 6
6 7
7 8
8 9
6 10
I want to separate the two columns so that each column is stored in its on list or array.
Lets say you named the file as "columns", this would be a solution:
val lines = Source.fromFile("columns").getLines()
/* gets an Iterator[String] which interfaces a collection of all the lines in the file*/
val linesAsArraysOfInts = lines.map(line => line.split(" ").map(_.toInt))
/* Here you transform (map) any line to arrays of Int, so you will get as a result an Interator[Array[Int]] */
val pair: (List[Int], List[Int]) = linesAsArraysOfInts.foldLeft((List[Int](), List[Int]()))((acc, cur) => (cur(0) :: acc._1, cur(1) :: acc._2))
/* The foldLeft method on iterators, allows you to propagate an operation from left to right on the Iterator starting from an initial value and changing this value over the propagation process. In this case you start with two empty Lists stored as Tuple an on each step you prepend the first element of the array to the first List, and the second element to the second List. At the end you will have to Lists with the columns in reverse order*/
val leftList: List[Int] = pair._1.reverse
val rightList: List[Int] = pair._2.reverse
//finally you apply reverse to the lists and it's done :)
Here is one possible way of doing this:
val file: String = ??? // path to .txt in String format
val source = Source.fromFile(file)
scala> val columnsTogether = source.getLines.map { line =>
val nums = line.split(" ") // creating an array of just the 'numbers'
(nums.head, nums.last) // assumes every line has TWO numbers only
}.toList
columnsTogether: List[(String, String)] = List((1,2), (2,3), (3,4), (4,5), (1,6), (6,7), (7,8), (8,9), (6,10))
scala> columnsTogether.map(_._1.toInt)
res0: List[Int] = List(1, 2, 3, 4, 1, 6, 7, 8, 6)
scala> columnsTogether.map(_._2.toInt)
res1: List[Int] = List(2, 3, 4, 5, 6, 7, 8, 9, 10)

How can I select a non-sequential subset elements from an array using Scala and Spark?

In Python, this is how I would do it.
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
This doesn't work in the Scala Spark shell:
scala> val indices = Array(3,2,0)
indices: Array[Int] = Array(3, 2, 0)
scala> val A = Array(10,11,12,13,14,15)
A: Array[Int] = Array(10, 11, 12, 13, 14, 15)
scala> A(indices)
<console>:28: error: type mismatch;
found : Array[Int]
required: Int
A(indices)
The foreach method doesn't work either:
scala> indices.foreach(println(_))
3
2
0
scala> indices.foreach(A(_))
<no output>
What I want is the result of B:
scala> val B = Array(A(3),A(2),A(0))
B: Array[Int] = Array(13, 12, 10)
However, I don't want to hard code it like that because I don't know how long indices is or what would be in it.
The most concise way I can think of is to flip your mental model and put indices first:
indices map A
And, I would potentially suggest using lift to return an Option
indices map A.lift
You can use map on indices, which maps each element to a new element based on a mapping lambda. Note that on Array, you get an element at an index with the apply method:
indices.map(index => A.apply(index))
You can leave off apply:
indices.map(index => A(index))
You can also use the underscore syntax:
indices.map(A(_))
When you're in a situation like this, you can even leave off the underscore:
indices.map(A)
And you can use the alternate space syntax:
indices map A
You were trying to use foreach, which returns Unit, and is only used for side effects. For example:
indices.foreach(index => println(A(index)))
indices.map(A).foreach(println)
indices map A foreach println

Scala: what is the best way to append an element to an Array?

Say I have an Array[Int] like
val array = Array( 1, 2, 3 )
Now I would like to append an element to the array, say the value 4, as in the following example:
val array2 = array + 4 // will not compile
I can of course use System.arraycopy() and do this on my own, but there must be a Scala library function for this, which I simply could not find. Thanks for any pointers!
Notes:
I am aware that I can append another Array of elements, like in the following line, but that seems too round-about:
val array2b = array ++ Array( 4 ) // this works
I am aware of the advantages and drawbacks of List vs Array and here I am for various reasons specifically interested in extending an Array.
Edit 1
Thanks for the answers pointing to the :+ operator method. This is what I was looking for. Unfortunately, it is rather slower than a custom append() method implementation using arraycopy -- about two to three times slower. Looking at the implementation in SeqLike[], a builder is created, then the array is added to it, then the append is done via the builder, then the builder is rendered. Not a good implementation for arrays. I did a quick benchmark comparing the two methods, looking at the fastest time out of ten cycles. Doing 10 million repetitions of a single-item append to an 8-element array instance of some class Foo takes 3.1 sec with :+ and 1.7 sec with a simple append() method that uses System.arraycopy(); doing 10 million single-item append repetitions on 8-element arrays of Long takes 2.1 sec with :+ and 0.78 sec with the simple append() method. Wonder if this couldn't be fixed in the library with a custom implementation for Array?
Edit 2
For what it's worth, I filed a ticket:
https://issues.scala-lang.org/browse/SI-5017
You can use :+ to append element to array and +: to prepend it:
0 +: array :+ 4
should produce:
res3: Array[Int] = Array(0, 1, 2, 3, 4)
It's the same as with any other implementation of Seq.
val array2 = array :+ 4
//Array(1, 2, 3, 4)
Works also "reversed":
val array2 = 4 +: array
Array(4, 1, 2, 3)
There is also an "in-place" version:
var array = Array( 1, 2, 3 )
array +:= 4
//Array(4, 1, 2, 3)
array :+= 0
//Array(4, 1, 2, 3, 0)
The easiest might be:
Array(1, 2, 3) :+ 4
Actually, Array can be implcitly transformed in a WrappedArray

Resources