Scala, Collection.searching with user-defined implicit Ordering[] - arrays

I need to perform binary search on an array of custom case class. This should be as simple as calling the search function defined in scala.collection.Searching:
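For reference, the signature in question (it is also quoted in an answer below) is:
// ScalaDoc: a binary search is used when the underlying sequence is an IndexedSeq
final def search[B >: A](elem: B)(implicit ord: Ordering[B]): SearchResult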
As you can see, if the collection on which I call the search method is an indexed sequence, the binary search is performed.
Now I need to create my custom Ordering[B] parameter and I want to pass it explicitly to the search function (I don't want it to pick up an implicit Ordering inferred from the context).
I have the following code:
// File 1
case class Person(name: String, id: Int)
object Person {
  val orderingById: Ordering[Person] = Ordering.by(e => e.id)
}
// File 2 (same package)
for (i <- orderedId.indices) {
  // orderedId is an array of Int
  // listings is an array of Person
  val listingIndex = listings.search(orderedId(i))(Person.orderingById)
  ...
}
I get the following error:
Type mismatch. Required: Ordering[Any], found: Ordering[Nothing]
So, I tried to change the implementation in this way:
// file 1
object Person {
  implicit def orderingById[A <: Person]: Ordering[A] = {
    Ordering.by(e => e.id)
  }
}
//file 2 as before
This time getting the following error:
Type mismatch. Required: Ordering[Any], found: Ordering[Person]
Why does this happen? At least in the second case, shouldn't it convert from Any to Person?

Follow the type specifications.
If you want to .search() on a collection of Person elements then the first search parameter should be a Person (or a super-class thereof).
val listingIndex =
listings.search(Person("",orderedId(i)))(Person.orderingById)
Or, to put it in a more complete and succinct context:
import scala.collection.Searching.SearchResult
case class Person(name: String, id: Int)
val listings: Array[Person] = ...
val orderedId: Array[Int] = ...
for (id <- orderedId) {
  val listingIndex: SearchResult =
    listings.search(Person("", id))(Ordering.by(_.id))
}

I'll add a bit to elaborate on your error. First, please note that Searching.search is deprecated, with the deprecation message:
Search methods are defined directly on SeqOps and do not require scala.collection.Searching any more.
search is now defined on IndexedSeqOps. Let's look at the signature:
final def search[B >: A](elem: B)(implicit ord: Ordering[B])
When you call:
listings.search(orderedId(i))(Person.orderingById)
The result of orderedId(i) is Int. Therefore, B in the signature above is Int. The definition of Int is:
final abstract class Int private extends AnyVal
A is Person, because listings is of type Array[Person]. Therefore, search is looking for a common supertype of both Int and Person. This common supertype is Any, hence you are getting this error. One way to overcome it is to define an implicit conversion from Int to Person:
object Person {
  val orderingById: Ordering[Person] = Ordering.by(e => e.id)

  implicit def apply(id: Int): Person = {
    Person("not defined", id)
  }
}
Then the following:
import scala.collection.Searching.{Found, InsertionPoint}

val listings = Array(Person("aa", 1), Person("bb", 2), Person("dd", 4))
val orderedId = 1.to(6).toArray

for (i <- orderedId.indices) {
  // orderedId is an array of Int
  // listings is an array of Person
  listings.search[Person](orderedId(i))(Person.orderingById) match {
    case Found(foundIndex) =>
      println("foundIndex: " + foundIndex)
    case InsertionPoint(insertionPoint) =>
      println("insertionPoint: " + insertionPoint)
  }
}
will produce:
foundIndex: 0
foundIndex: 1
insertionPoint: 2
foundIndex: 2
insertionPoint: 3
insertionPoint: 3
Code run in Scastie.

Related

GenericRowWithSchema ClassCastException in Spark 3 Scala UDF for Array data

I am writing a Spark 3 UDF to mask an attribute in an Array field.
My data (in parquet, but shown in a JSON format):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
case class:
case class MyClass(conditions: Seq[MyItem])
case class MyItem(code: String, category: String)
Spark code:
val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
import spark.implicits._
val rdd = spark.sparkContext.parallelize(data)
val ds = rdd.toDF().as[MyClass]
val maskedConditions: Column = updateArray.apply(col("conditions"))
ds.withColumn("conditions", maskedConditions)
.select("conditions")
.show(2)
I tried the following UDF function.
UDF code:
def updateArray = udf((arr: Seq[MyItem]) => {
  for (i <- 0 to arr.size - 1) {
    // Line 3: cast the element to GenericRowWithSchema
    val a = arr(i).asInstanceOf[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema]
    // alternative without the cast: val a = arr(i) -- removing line 3 leads to the ClassCastException below
    println(a.getAs[MyItem](0))
    // TODO: How to make code = "XXXX" here
    // a.code = "XXXX"
  }
  arr
})
Goal:
I need to set 'code' field value in each array item to "XXXX" in a UDF.
Issue:
I am unable to modify the array fields.
Also, I get the following error if I remove line 3 in the UDF (the cast to GenericRowWithSchema).
Error:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to MyItem
Question: How to capture Array of Structs in a function and how to return a modified array of items?
Welcome to Stack Overflow!
There is a small JSON linting error in your data: I assumed that you wanted to close the [] square brackets of the list array. So, for this example I used the following data (which is otherwise the same as yours):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
You don't need UDFs for this: a simple map operation will be sufficient! The following code does what you want:
import spark.implicits._
import org.apache.spark.sql.Encoders
case class MyItem(code: String, category: String)
case class MyElement(element: MyItem)
case class MyList(list: Seq[MyElement])
case class MyClass(conditions: MyList)
val df = spark.read.json("./someData.json").as[MyClass]
val transformedDF = df.map {
  case MyClass(MyList(list)) => MyClass(MyList(list.map {
    case MyElement(item) => MyElement(MyItem(code = "XXXX", category = item.category))
  }))
}
transformedDF.show(false)
+--------------------------------+
|conditions |
+--------------------------------+
|[[[[XXXX, ABC]], [[XXXX, EDC]]]]|
+--------------------------------+
As you can see, we're doing some simple pattern matching on the case classes we've defined and replacing all of the code fields' values with "XXXX". If you want to get JSON back, you can call the to_json function like so:
transformedDF.select(to_json($"conditions")).show(false)
+----------------------------------------------------------------------------------------------------+
|structstojson(conditions) |
+----------------------------------------------------------------------------------------------------+
|{"list":[{"element":{"code":"XXXX","category":"ABC"}},{"element":{"code":"XXXX","category":"EDC"}}]}|
+----------------------------------------------------------------------------------------------------+
Finally, a very small remark about the data. If you have any control over how the data gets made, I would add the following suggestions:
The conditions JSON object seems to have no function here, since it just contains a single array called list. Consider making the conditions object the array itself, which would allow you to discard the list name. That would simplify your structure.
The element object does nothing except contain a single item. Consider removing one level of abstraction there too.
With these suggestions, your data would contain the same information but look something like:
{"conditions":[{"code":"1234","category":"ABC"},{"code":"4550","category":"EDC"}]}
With these suggestions, you would also remove the need for the MyElement and MyList case classes! But very often we're not in control of the data we receive, so this is just a small disclaimer :)
Hope this helps!
EDIT: After your addition of simplified data according to the above suggestions, the task gets even easier. Again, you only need a map operation here:
import spark.implicits._
import org.apache.spark.sql.Encoders
case class MyItem(code: String, category: String)
case class MyClass(conditions: Seq[MyItem])
val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
val df = data.toDF.as[MyClass]
val transformedDF = df.map {
  case MyClass(conditions) => MyClass(conditions.map {
    item => MyItem("XXXX", item.category)
  })
}
transformedDF.show(false)
+--------------------------+
|conditions |
+--------------------------+
|[[XXXX, ABC], [XXXX, EDC]]|
+--------------------------+
I was able to find a simple solution with Spark 3.1+, as new features were added in that Spark version.
Updated code:
val data = Seq(
  MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("234", "KBC"))),
  MyClass(conditions = Seq(MyItem("4550", "DTC"), MyItem("900", "RDT")))
)

import spark.implicits._
import org.apache.spark.sql.functions.{col, transform, udf}

val ds = data.toDF()
val updatedDS = ds.withColumn(
  "conditions",
  transform(
    col("conditions"),
    x => x.withField("code", updateArray(x.getField("code")))))
updatedDS.show()
UDF:
def updateArray = udf((oldVal: String) => {
  if (oldVal.contains("1234"))
    "XXX"
  else
    oldVal
})
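As a side note, with Spark 3.1+ the same masking can be expressed without a UDF at all, using only built-in column functions (when/otherwise instead of updateArray). This is just a sketch reusing the ds defined above; updatedDS2 is an illustrative name:
import org.apache.spark.sql.functions.{col, lit, transform, when}

// Same effect as the UDF above, expressed with built-in column functions
val updatedDS2 = ds.withColumn(
  "conditions",
  transform(
    col("conditions"),
    x => x.withField(
      "code",
      when(x.getField("code").contains("1234"), lit("XXX"))
        .otherwise(x.getField("code")))))
updatedDS2.show()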

Initializing interleaved Arrays in Kotlin

I'd like to create a custom Array in Kotlin.
class Node(val point: Point) {
    var neighbour: Array<Node?> = Array(4, { _ -> null })
    var prev: Byte = -1
}
Now, in another class, I tried to create an object like:
class OtherClass {
    var field: Array<Array<Node?>> = Array(size.x, { _ -> Array(size.y, { _ -> null }) })
}
So basically, I need a Grid of Nodes, all initialized with null. The provided sizes are of type Integer.
I get the following error:
Type inference failed. Expected type mismatch:
required: Array<Array<Node?>>
found:    Array<Array<Nothing?>>
Kotlin has an arrayOfNulls function which might make this a bit more elegant:
val field: Array<Array<Node?>> = Array(4) { arrayOfNulls<Node?>(4) }
Or, without optional types:
val field = Array(4) { arrayOfNulls<Node?>(4) }
I still have to specify Node? as the innermost type, however.
Okay, the solution was quite simple:
var field: Array<Array<Node?>> = Array(size.x, {_ -> Array(size.y, {_ -> null as Node?})})

Get field names and values from the LabelledGeneric of a case class

I'm trying to get a Seq[String] containing the field names of a case class, and another Seq[String] containing the values of the case class, in a generic way. I think I will have to map the values with a Poly1 function to turn each arbitrary type into a String. But right now I'm not able to extract the keys and values from the LabelledGeneric.
import shapeless._
import shapeless.record._
import shapeless.ops.record.{Keys, Values}

def apply[T, R <: HList](value: T)(implicit
    gen: LabelledGeneric.Aux[T, R],
    keys: Keys[R],
    valuesR: Values[R]
) {
  val hl = gen.to(value)
  val keyList = hl.keys ...
  val valueList = hl.values.map ...
}
I'm not sure whether I have to ask for the Keys and Values implicits, or whether it's possible to get this from the LabelledGeneric alone.
I have tried to map the following Poly over the keys to get an HList of Strings, but it seems the keys are not Witness instances.
object PolyWitnesToString extends Poly1 {
  implicit def witnessCase = at[Witness] { w => w.toString }
}
I'm a little bit lost now.
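A sketch of one possible approach with shapeless 2.3.x, assuming the Keys/Values/Mapper/ToTraversable machinery mentioned above (fieldNames, fieldValues, fieldToString and User are hypothetical names used only for illustration):
import shapeless._
import shapeless.ops.hlist.{Mapper, ToTraversable}
import shapeless.ops.record.{Keys, Values}

// Poly1 that turns a value of any type into its String representation
object fieldToString extends Poly1 {
  implicit def default[A]: Case.Aux[A, String] = at[A](_.toString)
}

// Field names: ask for Keys on the record type R and turn the resulting
// HList of Symbols into a List[String]
def fieldNames[T, R <: HList, K <: HList](value: T)(implicit
    gen: LabelledGeneric.Aux[T, R],
    keys: Keys.Aux[R, K],
    toList: ToTraversable.Aux[K, List, Symbol]
): Seq[String] = keys().toList.map(_.name)

// Field values: ask for Values, map the Poly1 over them, then convert to a List
def fieldValues[T, R <: HList, V <: HList, S <: HList](value: T)(implicit
    gen: LabelledGeneric.Aux[T, R],
    values: Values.Aux[R, V],
    mapper: Mapper.Aux[fieldToString.type, V, S],
    toList: ToTraversable.Aux[S, List, String]
): Seq[String] = values(gen.to(value)).map(fieldToString).toList

case class User(name: String, id: Int)
fieldNames(User("Ana", 1))   // List(name, id)
fieldValues(User("Ana", 1))  // List(Ana, 1)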

Tuple case mapping doesn't work with generic Array[T] in Scala

I don't understand why the compiler cannot handle the case pattern when mapping over tuples, when I try to use it with a generic Array[T].
class Variable[T](val p: Prototype[T], val value: T)
class Prototype[T](val name: String)(implicit m: Manifest[T])
// Columns to variable converter
implicit def columns2Variables[T](columns: Array[(String, Array[T])]): Iterable[Variable[Array[T]]] = {
  columns.map {
    case (name, value) =>
      new Variable[Array[T]](new Prototype[Array[T]](name), value)
  }.toIterable
}
The error says:
error: constructor cannot be instantiated to expected type;
found : (T1, T2)
required: fr.geocite.simExplorator.data.Variable[Array[T]]
case(name,value) =>
I'm also not sure about the wording of the error, but first of all, you will need the manifest for T because it is required for constructing new Prototype[Array[T]] (the array manifest can be automatically generated if a manifest for its type parameter is in scope).
Is there any reason you absolutely need arrays? They come with the irregularity of Java's type system, they are mutable, and they offer very little advantage over, for example, Vector. Lastly, and that's probably why you carry the manifests around in the first place: unlike arrays, standard collections do not require manifests for construction.
class Variable[T](val p: Prototype[T], val value: T)
class Prototype[T](val name: String)

implicit def cols2v[T](cols: Vector[(String, Vector[T])]): Vector[Variable[Vector[T]]] =
  cols.map {
    case (name, value) => new Variable(new Prototype(name), value)
  }
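If you do want to keep Array, a minimal sketch of the original converter with the manifest threaded through (an untested assumption based on the remark above, not a verified fix for the exact error message) could look like:
class Variable[T](val p: Prototype[T], val value: T)
class Prototype[T](val name: String)(implicit m: Manifest[T])

// The Manifest[T] context bound lets the compiler derive Manifest[Array[T]]
// for the Prototype[Array[T]] constructor.
implicit def columns2Variables[T: Manifest](columns: Array[(String, Array[T])]): Iterable[Variable[Array[T]]] =
  columns.map {
    case (name, value) => new Variable[Array[T]](new Prototype[Array[T]](name), value)
  }.toIterable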

Scala: how to specify type parameter bounds implying equality?

Don't be put off by the long text, the points are quite trivial but require a bit of code to illustrate the problem. :-)
The Setup:
Say I would like to create a trait, here modeled as a Converter of some kind, that itself is generic but has a typed method convert() that returns an appropriately typed result object, say a Container[T]:
trait Converter {
  def convert[T](input: T): Container[T]
}

trait Container[T] // details don't matter
My question is about type constraints on methods, in particular for enforcing equality, and has two closely related parts.
Part 1: Say now that there was a specialized container type that was particularly suitable for array-based contents, like so:
object Container {
  trait ForArrays[U] extends Container[Array[U]]
}
Given this possibility, I'd now like to specialize the Converter and in particular the return type of the convert() method, to the specialized Container.ForArrays type:
object Converter {
  trait ForArrays extends Converter {
    // the following line is rubbish - how to do this right?
    def convert[E, T <: Array[E]](input: T): Container.ForArrays[E]
  }
}
So that I can do something like this:
val converter = new Converter.ForArrays { ... }
val input = Array( 'A', 'B', 'C' )
val converted : Container.ForArrays[Char] = converter.convert( input )
Basically I want Scala, if the type of converter is known to be Converter.ForArrays, to also infer the specialized return type of convert[Char]() as Container.ForArrays[Char], i.e. the matching container type plus the array type of the input. Is this or something like it possible and if so, how do I do it? E.g. how do I specify the type parameters / bounds on convert() (what is provided is just a stand-in - I have no idea how to do this). Oh, and naturally so that it still overrides its super method, otherwise nothing is gained.
Part 2: As a fallback, should this not be possible, I could of course push the convert function down into the Array-focused variant, like so:
trait Converter // now pretty useless as a shared trait

object Converter {
  trait ForValues extends Converter {
    def convert[T](input: T): Container[T]
  }
  trait ForArrays extends Converter {
    def convert[E](input: Array[E]): Container.ForArrays[E]
  }
}
OK. Now say I have an even more specialized Converter.ForArrays.SetBased that can internally use a set of elements of type E (the same as the input array's element type) to do some particular magic during the conversion. The set is now a parameter of the class, however, like so:
case class SetBased[F](set: Set[F]) extends Converter.ForArrays {
  // the following line is also rubbish...
  def convert[E = F](input: Array[E]): Container.ForArrays[E] = {...}
}
Again, this is about the type parameters of the convert() method. The difficulty here is: how do I glue the type parameter of the class - F - to the type parameter of the method - E - such that the Scala compiler will only let the user call convert() with an array whose elements match the elements of the set? Example:
val set = Set( 'X', 'Y', 'Z' )
val converter = new Converter.ForArrays.SetBased( set )
val input = Array( 'A', 'B', 'C' )
val converted : Container.ForArrays[Char] = converter.convert( input )
No, you can't, for the same reason you can't narrow argument types or widen return types when overriding a method (narrowing the return type is fine). Here is what you can do, however, for your fallback solution:
trait Converter {
  type Constraint[T]
}

trait ForArrays extends Converter {
  def convert[E](input: Array[E])(implicit ev: Constraint[E]): Container.ForArrays[E]
}

case class SetBased[F](set: Set[F]) extends ForArrays {
  type Constraint[T] = T =:= F
  def convert[E](input: Array[E])(implicit ev: E =:= F) = ...
}
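To make the constraint-gluing concrete, here is a minimal self-contained sketch along those lines (the ForArraysConverter name and the dummy convert body are assumptions made only so the example stands on its own):
trait Container[T]
object Container {
  trait ForArrays[U] extends Container[Array[U]]
}

trait Converter {
  type Constraint[T]
}

trait ForArraysConverter extends Converter {
  def convert[E](input: Array[E])(implicit ev: Constraint[E]): Container.ForArrays[E]
}

case class SetBased[F](set: Set[F]) extends ForArraysConverter {
  type Constraint[T] = T =:= F
  // dummy body, just enough to make the sketch compile
  def convert[E](input: Array[E])(implicit ev: E =:= F): Container.ForArrays[E] =
    new Container.ForArrays[E] {}
}

val converter = SetBased(Set('X', 'Y', 'Z'))
converter.convert(Array('A', 'B', 'C'))  // compiles: E = Char and Char =:= Char is found
// converter.convert(Array(1, 2, 3))     // rejected: no implicit Int =:= Char in scope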
I'm going to assume that Container.ForArrays is a subclass of Container; without that, Converter.ForArrays.convert won't match the signature of the overridden Converter.convert.
Try writing it something like this:
object Converter {
  trait ForArrays extends Converter {
    def convert[E](input: Array[E]): Container.ForArrays[E]
  }
}
Regarding your fallback solution. If two types are the same, then just use the same type param!
case class SetBased[F](set: Set[F]) extends Converter.ForArrays {
  def convert(input: Array[F]): Container.ForArrays[F] = {...}
}
