Comparing Arrays in Kotlin - arrays

I am learning Kotlin at the moment. Is there a way to "compare" two arrays? For example,
I have an array (1,2,3) and an array (1,2,1). The output should be something like this:
"2,2" for "took two from index two".
Thanks in advance.

You can use zip as follows:
val array1 = listOf(1, 2, 3)
val array2 = listOf(1, 2, 1)
val out1 = array1.zip(array2, Int::minus)
println(out1) // [0, 0, 2]
This gives you a new list containing the element-wise differences.
From there, it's just a short step to the (uncommon) format you're requesting using mapIndexedNotNull:
val out2 = out1.mapIndexedNotNull { i, v -> if (v != 0) listOf(i, v) else null }
println(out2) // [[2, 2]]

Related

Iterating over for an Array Column with dynamic size in Spark Scala Dataframe

I am familiar with this approach; a case in point is the example from How to obtain the average of an array-type column in scala-spark over all row entries per entry?
val array_size = 3
val avgAgg = for (i <- 0 to array_size -1) yield avg($"value".getItem(i))
df.select(array(avgAgg: _*).alias("avg_value")).show(false)
However, the 3 is hard-coded, which won't do in my real case.
No matter how hard I try to avoid a UDF, I cannot do this type of thing dynamically based on the size of an array column already present in the data frame. E.g.:
...
val z = for (i <- 1 to size($"sortedCol") ) yield array (element_at($"sortedCol._2", i), element_at($"sortedCol._3", i) )
...
...
.withColumn("Z", array(z: _*) )
I am looking for a way to do this by applying it to an existing array column of variable length. transform? expr? Not sure.
Full code as per request:
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
case class abc(year: Int, month: Int, item: String, quantity: Int)
val df0 = Seq(abc(2019, 1, "TV", 8),
abc(2019, 7, "AC", 10),
abc(2018, 1, "TV", 2),
abc(2018, 2, "AC", 3),
abc(2019, 2, "CO", 10)).toDS()
val df1 = df0.toDF()
// Gen some data, can be done easier, but not the point.
val itemsList= collect_list(struct("month", "item", "quantity"))
// This nn works.
val nn = 3
val z = for (i <- 1 to nn) yield array (element_at($"sortedCol.item", i), element_at($"sortedCol.quantity", i) )
// But want this.
//val z = for (i <- 1 to size($"sortedCol") ) yield array (element_at($"sortedCol.item", i), element_at($"sortedCol.quantity", i) )
val df2 = df1.groupBy($"year")
.agg(itemsList as "items")
.withColumn("sortedCol", sort_array($"items", asc = true))
.withColumn("S", size($"sortedCol")) // cannot use this either
.withColumn("Z", array(z: _*) )
.drop("items")
.orderBy($"year".desc)
df2.show(false)
// Col Z is the output I want, but not the null value Array
UPD
In the question In apache spark SQL, how to remove the duplicate rows when using collect_list in window function? I solved it with a very simple UDF, but here I was looking for a way without a UDF, and in particular for setting the to value in the for loop dynamically. The answer proves that certain constructs are not possible, which was the verification being sought.
If I correctly understand your need, you can simply use the transform function like this:
val df2 = df1.groupBy($"year")
.agg(itemsList as "items")
.withColumn("sortedCol", sort_array($"items", asc = true))
val transform_expr = "transform(sortedCol, x -> array(x.item, x.quantity))"
df2.withColumn("Z", expr(transform_expr)).show(false)
//+----+--------------------------------------+--------------------------------------+-----------------------------+
//|year|items |sortedCol |Z |
//+----+--------------------------------------+--------------------------------------+-----------------------------+
//|2018|[[1, TV, 2], [2, AC, 3]] |[[1, TV, 2], [2, AC, 3]] |[[TV, 2], [AC, 3]] |
//|2019|[[1, TV, 8], [7, AC, 10], [2, CO, 10]]|[[1, TV, 8], [2, CO, 10], [7, AC, 10]]|[[TV, 8], [CO, 10], [AC, 10]]|
//+----+--------------------------------------+--------------------------------------+-----------------------------+
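As a side note, on Spark 3.0+ the same logic can be written with the Column-based transform function from org.apache.spark.sql.functions instead of an SQL expr string. This is just a sketch under that version assumption; on Spark 2.4 the expr form above is the way to go.
import org.apache.spark.sql.functions.{array, transform}
// Same mapping as the SQL expression, but expressed with Columns (Spark 3.0+).
df2.withColumn("Z", transform($"sortedCol", x => array(x.getField("item"), x.getField("quantity"))))
  .show(false)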

Find common elements in subarrays of arrays

I have two numpy arrays of shapes arr1 = (~140000, 3) and arr2 = (~450000, 10). The first 3 elements of each row, for both arrays, are coordinates (z,y,x). I want to find the rows of arr2 that have the same coordinates as arr1 (which can be considered a subgroup of arr2).
For example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0] == j[0] and i[1] == j[1] and i[2] == j[2]:
            list_arr.append(j)
I also tried to create (after the 1st loop) a subarray of arr2, filtering it on the value of i[0] (arr2_filt = [el for el in arr2 if el[0] == i[0]]). This speeds the operation up a bit, but it still remains really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
# https://stackoverflow.com/a/45313353/ #Divakar
import numpy as np

def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
a,b = view1D(arr1,arr2[:,:3])
out = arr2[np.in1d(b,a)]
Approach #2
Another with dimensionality-reduction for ints -
d = np.maximum(arr2[:,:3].max(0), arr1.max(0)) + 1  # per-axis base (max value + 1) keeps the encoding collision-free
s = np.r_[1, d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.in1d(b,a)]
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a, b)
idx[idx == len(unq_a)] = 0  # clip out-of-range indices before the equality check
out = arr2[unq_a[idx] == b]
Improvement #2
Instead of np.unique in the previous np.searchsorted-based improvement, we could use argsort -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
You can do it with the help of a set:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
       [1, 2, 3]])
Edit
Use a list comprehension, reusing aset from above for the row-wise membership test:
l = [list(i) for i in arr2 if tuple(i[:3]) in aset]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers, Divakar already gave an excellent answer. If you want to compare floats, you have to consider e.g. the following:
1.+1e-15==1.
False
1.+1e-16==1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest-neighbour search and then checking whether the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1, arr2):
    tree = spatial.cKDTree(arr2[:,0:3])
    # You can check here if the distance is small enough and otherwise raise an error
    dist, ind = tree.query(arr1, k=1)
    return ind

Finding common strings from map type and array Scala without loops

I am trying to find the common strings in a map and an array, and to output the corresponding values from the map (the map being Map[key -> value]) in Scala, without using any loops. Example:
Input:
Array("Ash","Garcia","Mac") Map("Ash" -> 5, "Mac" -> 4, "Lucas" -> 3)
Output:
Array(5,4)
The output is an array with 5 and 4 because Ash and Mac are common to both data structures.
What constitutes a loop?
def common(arr: Array[String], m: Map[String,Int]): Array[Int] =
arr flatMap m.get
Usage:
common(Array("Ash","Garcia","Mac")
,Map("Ash" -> 5, "Mac" -> 4, "Lucas" -> 3))
// res0: Array[Int] = Array(5, 4)
This is the most elegant solution, I think, but the results may not fit your requirements if there are duplicates in the array.
yourArray.collect(yourMap) // Array(5,4)
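For concreteness, with the question's inputs that call looks like this (the Map acts as the PartialFunction passed to collect):
val names = Array("Ash", "Garcia", "Mac")
val nameToNumber = Map("Ash" -> 5, "Mac" -> 4, "Lucas" -> 3)
names.collect(nameToNumber) // Array(5, 4) -- keys missing from the map are simply skipped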
Use .filter to find the matching entries only, then get the value of your filtered map.
Given
scala> val names = Array("Ash","Garcia","Mac")
names: Array[String] = Array(Ash, Garcia, Mac)
scala> val nameToNumber = Map("Ash" -> 5, "Mac" -> 4, "Lucas" -> 3)
nameToNumber: scala.collection.immutable.Map[String,Int] = Map(Ash -> 5, Mac -> 4, Lucas -> 3)
.filter.map
scala> nameToNumber.filter(x => names.contains(x._1)).map(_._2)
res3: scala.collection.immutable.Iterable[Int] = List(5, 4)
Alternatively, you can use collect,
scala> nameToNumber.collect{case kv if names.contains(kv._1) => kv._2}
res6: scala.collection.immutable.Iterable[Int] = List(5, 4)
Your complexity here is O(n²).
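If that matters for large inputs, a quick sketch of the usual fix is to build a Set from the array first, so each lookup is effectively constant time instead of a linear contains scan:
val nameSet = names.toSet
nameToNumber.collect { case (k, v) if nameSet(k) => v }.toArray // Array(5, 4)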
Quite easy with Scala's elegant syntax:
val a = Array("Ash","Garcia","Mac")
val m = Map("Ash" -> 5, "Mac" -> 4, "Lucas" -> 3)
println(m.filter { case (k, v) => a.contains(k) }.map { case (k, v) => v }.toArray.mkString(", ")) // prints: 5, 4
Here is the solution!

how to insert element to rdd array in spark

Hi, I've tried to insert elements into an RDD of Array[String] using Scala in Spark.
Here is an example.
val data: RDD[Array[String]] = Array(Array(1,2,3), Array(1,2,3,4), Array(1,2))
I want to make the length of all arrays in this data equal to 4.
If the length of an array is less than 4, I want to fill it up with NULL values.
Here is the code I tried:
val newData = data.map(x =>
  if (x.length < 4) {
    for (i <- x.length until 4) {
      x.union("NULL")
    }
  }
  else {
    x
  }
)
But the result is Array[Any] = Array((), Array(1, 2, 3, 4), ()).
So I tried another way: I used yield in the for loop.
val newData = data.map(x =>
  if (x.length < 4) {
    for (i <- x.length until 4) yield {
      x.union("NULL")
    }
  }
  else {
    x
  }
)
The result is Array[Object] = Array(Vector(Array(1, 2, 3, N, U, L, L)), Array(1, 2, 3, 4), Vector(Array(1, 2, N, U, L, L), Array(1, 2, N, U, L, L)))
These are not what I want. I want a result like this:
RDD[Array[String]] = Array(Array(1,2,3,NULL), Array(1,2,3,4), Array(1,2,NULL,NULL)).
What should I do?
Is there a method to solve it?
union is a functional operation; it doesn't change the array x. You don't need a loop for this, though, and any loop implementation will probably be slower: it's much better to create one new collection with all the NULL values than to mutate something every time you add a null. Here's a helper function that should work for you:
def fillNull(x: Array[Int], desiredLength: Int): Array[String] = {
  x.map(_.toString) ++ Array.fill(desiredLength - x.length)("NULL")
}
val newData = data.map(fillNull(_, 4))
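A quick check on the sample data from the question (just a sketch; sc is assumed to be an available SparkContext, and the inner elements are Ints as in the example):
val sample = sc.parallelize(Seq(Array(1, 2, 3), Array(1, 2, 3, 4), Array(1, 2)))
sample.map(fillNull(_, 4)).collect().map(_.mkString(",")).foreach(println)
// 1,2,3,NULL
// 1,2,3,4
// 1,2,NULL,NULL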
I solved your use case with the following code:
val initialRDD = sparkContext.parallelize(Array(Array[AnyVal](1, 2, 3), Array[AnyVal](1, 2, 3, 4), Array[AnyVal](1, 2, 3)))
val transformedRDD = initialRDD.map(array =>
  if (array.length < 4) {
    val transformedArray = Array.fill[AnyVal](4)("NULL")
    Array.copy(array, 0, transformedArray, 0, array.length)
    transformedArray
  } else {
    array
  }
)
val result = transformedRDD.collect()

Groupby Array[Array[String]] Scala

I have an Array[Array[String]] like this:
Array(Array("A","1","2"),
Array("A","3","4"),
Array("A","5","6"),
Array("B","7","8"),
Array("B","9","10"))
I would like to group by the first element of each sub-array and get a Map like this:
var A: Map[String, Array[String]] = Map()
A += ("A" -> Array("1", "2"))
A += ("A" -> Array("3", "4"))
A += ("A" -> Array("5", "6"))
A += ("B" -> Array("7", "8"))
A += ("B" -> Array("9", "10"))
I don't know how to manipulate groupby to get this result.
Do you have any idea?
Try this.
val arr = Array(Array("A","1","2"),
Array("A","3","4"),
Array("A","5","6"),
Array("B","7","8"),
Array("B","9","10"))
val result = arr.groupBy(_.head).map{case (k,v) => k -> v.flatMap(_.tail)}
result("A") // Array(1, 2, 3, 4, 5, 6)
result("B") // Array(7, 8, 9, 10)
Basically, after grouping you need to remove the head of each sub-array (that's the tail part), and you need to flatten the sub-arrays into a single array (that's the flatMap part).
Warning: this will throw a runtime exception if any of the sub-arrays are empty. Here's a version that will take care of that.
val result = arr.groupBy(_.headOption).collect { case (Some(k), v) => k -> v.flatMap(_.tail) }
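If you would rather keep each row's tail as its own sub-array instead of flattening, a small variation along the same lines (sketch) is:
val grouped: Map[String, Array[Array[String]]] =
  arr.groupBy(_.head).map { case (k, v) => k -> v.map(_.tail) }
grouped("A") // Array(Array(1, 2), Array(3, 4), Array(5, 6))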
