I can't find a string contained in a column's value.
Here is the table:
object Equivalences extends Table[(Option[Int], String, String)]("EQUIVALENCES") {
  def id = column[Int]("EQV_ID", O.PrimaryKey, O.AutoInc)
  def racine = column[String]("RAC")
  def definition = column[String]("DEF")
  def * = id.? ~ racine ~ definition
}
And here is the code that doesn't work:
def ajoute_si_racine_absente(racine_ajoutée: String, definition_ajoutée: String) = {
  val cul = Query(Equivalences).filter(
    equ => {
      println(equ.racine)
      equ.racine.toString.contains(racine_ajoutée)
    })
  if (cul.list().length == 0) {
    Equivalences.insert(None, racine_ajoutée, definition_ajoutée)
  }
}
The code is meant to insert a value if it does not already exist, but the println inside the filter displays "(EQUIVALENCES #409303125).RAC", which does not match the column's content.
Maybe I should use the "getResult" method, but I didn't find any example online.
Thanks.
Karol S is right. This does what you want:
def ajoute_si_racine_absente(racine_ajoutée: String, definition_ajoutée: String) = {
  // list() executes the query, so each row is a plain (Option[Int], String, String) tuple
  val cul = Query(Equivalences).list().filter {
    case (_, racine, _) =>
      println(racine)
      racine.contains(racine_ajoutée)
  }
  if (cul.isEmpty) {
    Equivalences.insert(None, racine_ajoutée, definition_ajoutée)
  }
}
But this may not be efficient, because it fetches the complete table. Slick is a query builder with a collection-like API. Everything you write merely resembles collection code and builds up a query description until you finally call .list or .run; only then is the query executed. Before that, you are handling placeholder objects that represent tables, queries and columns, and the placeholder object for the racine column prints as "(EQUIVALENCES #409303125).RAC".
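To illustrate the difference between describing a query and running it, here is a toy query builder in plain Scala. It is a hypothetical sketch, not Slick's API: `Column`, `Predicate`, `EquivalencesTable` and `Query` are invented names. `filter` only records a condition, rows are inspected only in `run`, and printing a column shows a placeholder, much like the "(EQUIVALENCES #409303125).RAC" output in the question.

```scala
// Toy query builder (hypothetical, NOT Slick's API) illustrating deferred execution.

// A column only references a table field; it holds no data.
final case class Column(table: String, name: String) {
  // Builds a description of a condition instead of evaluating it.
  def contains(substring: String): Predicate = Predicate(name, substring)
  override def toString: String = s"($table).$name"
}

// A recorded "column value contains substring" condition.
final case class Predicate(column: String, substring: String)

// Placeholder table exposing its columns, loosely modelled on a Slick table object.
object EquivalencesTable {
  val racine: Column = Column("EQUIVALENCES", "RAC")
  val definition: Column = Column("EQUIVALENCES", "DEF")
}

final case class Query(predicates: List[Predicate] = Nil) {
  // filter only appends to the description; no rows are read here.
  def filter(f: EquivalencesTable.type => Predicate): Query =
    copy(predicates = predicates :+ f(EquivalencesTable))

  // Only run looks at actual rows (standing in for the database round trip).
  def run(rows: List[Map[String, String]]): List[Map[String, String]] =
    rows.filter(row => predicates.forall(p => row.get(p.column).exists(_.contains(p.substring))))
}
```

Printing `t.racine` inside `filter` shows the placeholder text, because at that point there is no row value to print, only a reference to the column.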
I am writing a Spark 3 UDF to mask an attribute in an Array field.
My data (in parquet, but shown in a JSON format):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
Case classes:
case class MyClass(conditions: Seq[MyItem])
case class MyItem(code: String, category: String)
Spark code:
val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
import spark.implicits._
val rdd = spark.sparkContext.parallelize(data)
val ds = rdd.toDF().as[MyClass]
val maskedConditions: Column = updateArray.apply(col("conditions"))
ds.withColumn("conditions", maskedConditions)
  .select("conditions")
  .show(2)
I tried the following UDF.
UDF code:
def updateArray = udf((arr: Seq[MyItem]) => {
  for (i <- 0 to arr.size - 1) {
    // Line 3
    val a = arr(i).asInstanceOf[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema]
    val a = arr(i)
    println(a.getAs[MyItem](0))
    // TODO: How to make code = "XXXX" here
    // a.code = "XXXX"
  }
  arr
})
Goal:
I need to set the 'code' field of each array item to "XXXX" in a UDF.
Issue:
I am unable to modify the array fields.
Also, I get the following error if I remove line 3 in the UDF (the cast to GenericRowWithSchema):
Error:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to MyItem
Question: How do I receive an array of structs in a function, and how do I return a modified array of items?
Welcome to Stack Overflow!
There was a small JSON syntax error in your data: I assumed you wanted to close the [] square brackets of the list array. So, for this example I used the following data (otherwise the same as yours):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
You don't need UDFs for this: a simple map operation will be sufficient! The following code does what you want:
import spark.implicits._
case class MyItem(code: String, category: String)
case class MyElement(element: MyItem)
case class MyList(list: Seq[MyElement])
case class MyClass(conditions: MyList)
val df = spark.read.json("./someData.json").as[MyClass]
// Rebuild each row, replacing every item's code with "XXXX"
val transformedDF = df.map {
  case MyClass(MyList(list)) => MyClass(MyList(list.map {
    case MyElement(item) => MyElement(MyItem(code = "XXXX", item.category))
  }))
}
transformedDF.show(false)
+--------------------------------+
|conditions |
+--------------------------------+
|[[[[XXXX, ABC]], [[XXXX, EDC]]]]|
+--------------------------------+
As you can see, we're doing some simple pattern matching on the case classes we've defined and replacing every code value with "XXXX". If you want to get JSON back, you can call the to_json function like so:
import org.apache.spark.sql.functions.to_json
transformedDF.select(to_json($"conditions")).show(false)
+----------------------------------------------------------------------------------------------------+
|structstojson(conditions) |
+----------------------------------------------------------------------------------------------------+
|{"list":[{"element":{"code":"XXXX","category":"ABC"}},{"element":{"code":"XXXX","category":"EDC"}}]}|
+----------------------------------------------------------------------------------------------------+
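Because the function passed to df.map is ordinary Scala, it can be pulled out and checked without a Spark session. The sketch below reuses the answer's case classes and, as a small variation, uses copy(code = "XXXX") so any other fields of MyItem are carried over unchanged; `maskCodes` is a hypothetical name.

```scala
// Same shapes as the case classes in the answer above.
case class MyItem(code: String, category: String)
case class MyElement(element: MyItem)
case class MyList(list: Seq[MyElement])
case class MyClass(conditions: MyList)

// Pure function with the same logic that df.map applies to each row.
def maskCodes(c: MyClass): MyClass = {
  // copy keeps every field except code as it was
  val masked = c.conditions.list.map(e => e.copy(element = e.element.copy(code = "XXXX")))
  c.copy(conditions = c.conditions.copy(list = masked))
}
```

In Spark you would then write something like df.map(c => maskCodes(c)), with import spark.implicits._ in scope for the encoder.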
Finally, a small remark about the data. If you have any control over how the data is produced, I would make the following suggestions:
The conditions JSON object seems to serve no purpose here, since it just contains a single array called list. Consider making conditions itself the array, which would let you drop the list name and simplify your structure.
The element object does nothing except wrap a single item. Consider removing that level of nesting too.
With these suggestions, your data would contain the same information but look something like:
{"conditions":[{"code":"1234","category":"ABC"},{"code":"4550","category":"EDC"}]}
With these suggestions, you would also no longer need the MyElement and MyList case classes! But very often we're not in control of the data we receive, so this is just a small disclaimer :)
Hope this helps!
EDIT: Now that you've added simplified data following the suggestions above, the task gets even easier. Again, you only need a map operation here:
import spark.implicits._
case class MyItem(code: String, category: String)
case class MyClass(conditions: Seq[MyItem])
val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
val df = data.toDF.as[MyClass]
val transformedDF = df.map {
  case MyClass(conditions) => MyClass(conditions.map {
    item => MyItem("XXXX", item.category)
  })
}
transformedDF.show(false)
+--------------------------+
|conditions |
+--------------------------+
|[[XXXX, ABC], [XXXX, EDC]]|
+--------------------------+
I found a simple solution with Spark 3.1+, using Column.withField (added in Spark 3.1) inside functions.transform.
Updated code:
import org.apache.spark.sql.functions.{col, transform, udf}
val data = Seq(
  MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("234", "KBC"))),
  MyClass(conditions = Seq(MyItem("4550", "DTC"), MyItem("900", "RDT")))
)
import spark.implicits._
val ds = data.toDF()
// For each struct in the array, replace its "code" field with the UDF's result
val updatedDS = ds.withColumn(
  "conditions",
  transform(
    col("conditions"),
    x => x.withField("code", updateArray(x.getField("code")))))
updatedDS.show()
UDF:
// Masks codes containing "1234"; every other value is returned unchanged
def updateArray = udf((oldVal: String) => {
  if (oldVal.contains("1234"))
    "XXX"
  else
    oldVal
})
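One caveat with this UDF: if a code value is null, Spark passes null to the function and oldVal.contains throws a NullPointerException. A null-safe variant of the function body, written as plain Scala so it can be tested on its own, might look like this; `maskCode` is a hypothetical name, and it would be wrapped as udf((s: String) => maskCode(s)) to plug into the transform call above.

```scala
// Returns "XXX" for codes containing "1234"; passes every other value, including null, through.
def maskCode(oldVal: String): String =
  Option(oldVal) match {
    case Some(v) if v.contains("1234") => "XXX"
    case _                             => oldVal
  }
```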
fun main() {
    val data = ArrayList<List<String>>()
    data.add(listOf("32701", "First"))
    data.add(listOf("32702", "Second"))
    data.add(listOf("32702", "Second"))
    data.add(listOf("32701", "First True"))
    println(data.distinct())
}
Result:
[[32701, First], [32702, Second], [32701, First True]]
Question: How can I remove [32701, First] and keep only the latest entry for each ID?
Expected:
[[32702, Second], [32701, First True]]
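The problem is that distinct() uses equals(), which compares the lists in their entirety.
You could use distinctBy { it.first() } if you can ensure the lists won't be empty. Note that distinctBy keeps the first occurrence of each key, so on its own it would keep [32701, First].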
Edit
To keep the latest value instead, you can:
a) Reverse the list and then call distinctBy
yourList
.reversed() // Now latest values are first in the list
.distinctBy { it.first() } // first element of list holds the id
b) Group the rows by id into a Map<String, List<List<String>>> by calling groupBy { it.first() }, then take the last row of each group:
val correctResults = map.values.map { valueList -> valueList.last() }
As a whole it would look like:
yourList
    .groupBy { it.first() }
    .values
    .map { valueList -> valueList.last() }
(associateBy { it.first() } also works, since it keeps only the last row for each key, but then the map's values are already the rows and no .map { ... last() } is needed.)
Be aware that none of these approaches handles empty lists: first() throws a NoSuchElementException on an empty list.
To deal with empty lists, you could filter them out first:
val listsThatAreNotEmpty = yourList.filter { it.isNotEmpty() }
Use a combination of reversed and distinctBy:
fun main() {
    val data = ArrayList<List<String>>()
    data.add(listOf("32701", "First"))
    data.add(listOf("32702", "Second"))
    data.add(listOf("32702", "Second"))
    data.add(listOf("32701", "First True"))
    println(data.reversed().distinctBy { it[0] })
    // prints [[32701, First True], [32702, Second]]
}
You can reverse the result again to get the original relative order.
As others have mentioned, using listOf for each record is sub-optimal; here is a cleaner version with a data class:
data class Item(val id: String, val text: String)
fun distinct(data: List<Item>) = data.reversed().distinctBy { it.id }
fun main() {
    val data = listOf(
        Item("32701", "First"),
        Item("32702", "Second"),
        Item("32702", "Second"),
        Item("32701", "First True")
    )
    println(distinct(data))
    // [Item(id=32701, text=First True), Item(id=32702, text=Second)]
}
uri:23e6b806-7a39-4836-bae2-f369673defef offset:1
uri:z65e9d4e-a099-41a1-a9fe-3cf74xbb01a4 offset:2
uri:2beff8bf-1019-4265-9da4-30c696397e08 offset:3
uri:3b1df8bb-69f6-4892-a516-523fd285d659 offset:4
uri:4f961415-b847-4d2c-9107-87617671c47b offset:5
uri:015ba25c-c145-456a-bae7-ebe999cb8e0f offset:6
uri:z1f9592f-64d0-443d-ad0d-38c386dd0adb offset:7
The above is an array of arrays.
Each line is an element of the outer array, and is itself an array: I got this by splitting each line on the comma and removing the comma. What I am trying to do is extract only the uri and offset values and put them into a case class.
case class output2(uri: String, offset: Int)
I only want the actual values, so in each instance of the case class the uri and offset would look like this:
e1af5db7-3aad-4ab0-ac3a-55686fccf6ae
1
I'm trying to find a simple way to do this.
There's no need to split() each line on the comma. Make the comma part of the recognized input pattern.
val data = Array("uri:23e6b806-7a39-4836-bae2-f369673defef,offset:1"
,"uri:z65e9d4e-a099-41a1-a9fe-3cf74xbb01a4,offset:2"
,"poorly formatted data will be ignored"
,"uri:2beff8bf-1019-4265-9da4-30c696397e08,offset:3"
,"uri:3b1df8bb-69f6-4892-a516-523fd285d659,offset:4"
,"uri:4f961415-b847-4d2c-9107-87617671c47b,offset:5"
,"uri:015ba25c-c145-456a-bae7-ebe999cb8e0f,offset:6"
,"uri:z1f9592f-64d0-443d-ad0d-38c386dd0adb,offset:7")
case class Data(uri:String, offset:Int)
val dataRE = "uri:([^,]+),offset:(\\d+)".r
val rslt:Array[Data] = data.collect{case dataRE(uri, os) => Data(uri, os.toInt)}
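The listing in the question separates uri and offset with a space rather than a comma. Assuming either separator can occur, the same collect approach works with a slightly looser pattern; `lineRE` and the sample lines are illustrative, not from the original.

```scala
case class Data(uri: String, offset: Int)

// Accepts a comma and/or whitespace between the two fields.
val lineRE = """uri:([^,\s]+)[,\s]+offset:(\d+)""".r

val lines = Array(
  "uri:23e6b806-7a39-4836-bae2-f369673defef offset:1", // space-separated
  "uri:2beff8bf-1019-4265-9da4-30c696397e08,offset:3", // comma-separated
  "not a data line"                                    // no full match, so collect drops it
)

// A Regex used as an extractor must match the whole line.
val parsed: Array[Data] = lines.collect { case lineRE(uri, os) => Data(uri, os.toInt) }
```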
You can build your data while validating the GUID with a regex, like this:
val regexp = """uri:(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b) offset:([0-9]+)""".r
val regexp(pattern, value) = "uri:23e6b806-7a39-4836-bae2-f369673defef offset:1"
output2(pattern, value.toInt)
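A pattern-binding val such as val regexp(pattern, value) = ... throws a scala.MatchError when a line does not match. That matters for this data: the uri:z65e9d4e-... and uri:z1f9592f-... lines contain non-hex characters, so this GUID pattern rejects them. A sketch that returns None instead, using a match on the same regex (`parseLine` is a hypothetical name):

```scala
case class output2(uri: String, offset: Int)

val regexp = """uri:(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b) offset:([0-9]+)""".r

// Returns None for lines that don't fully match, instead of throwing MatchError.
def parseLine(line: String): Option[output2] = line match {
  case regexp(uri, offset) => Some(output2(uri, offset.toInt))
  case _                   => None
}
```

Applied to all lines, lines.flatMap(parseLine) then keeps only the well-formed entries.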
I'd do it this way:
import scala.io.Source
case class Output(uri: String, offset: Int)
val lines = Source
  .fromFile("input")
  .getLines()
  .toList
// Everything after the first ':' is the value; an empty value means "missing"
def parseUri(s: String): Option[String] =
  s.splitAt(s.indexOf(":") + 1)._2 match {
    case "" => None
    case uri => Some(uri)
  }
def parseOffset(s: String): Option[Int] =
  s.splitAt(s.indexOf(":") + 1)._2 match {
    case "" => None
    case offset => Some(offset.toInt)
  }
def parseOutput(xs: Array[String]): Option[Output] = for {
  uri <- parseUri(xs(0))
  offset <- parseOffset(xs(1))
} yield {
  Output(uri, offset)
}
lines.map(_.split(" ")).flatMap { x =>
  parseOutput(x)
}
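Two small hardening steps for the file-based version, shown as a sketch assuming Scala 2.13 (for scala.util.Using and toIntOption): close the file after reading, and turn a non-numeric offset into None rather than a NumberFormatException. `readLines` and `parseLine` are hypothetical helper names, and the single-pass parser is a variation on the answer's parseUri/parseOffset split.

```scala
import java.nio.file.Path
import scala.io.Source
import scala.util.{Try, Using}

case class Output(uri: String, offset: Int)

// Reads all lines and closes the file, even if reading fails.
def readLines(path: Path): Try[List[String]] =
  Using(Source.fromFile(path.toFile))(_.getLines().toList)

// Parses "uri:<id> offset:<n>"; returns None for malformed lines or non-numeric offsets.
def parseLine(line: String): Option[Output] =
  line.split(" ") match {
    case Array(u, o) if u.startsWith("uri:") && o.startsWith("offset:") =>
      o.stripPrefix("offset:").toIntOption.map(n => Output(u.stripPrefix("uri:"), n))
    case _ => None
  }
```

Usage would then be readLines(path).map(_.flatMap(parseLine)), which yields a Failure only if the file itself cannot be read.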
I'm trying to return StringBuilders from a loop. Is this workable? I iterate over a list with each(), appending 'it' to "scriptBldr_" to create a different object name each time to hold the string. Then I collect the object names in a list and try to return each one with a for loop. But it's failing.
List.each {
    String builderstring = "scriptBldr_" + it.replaceAll (/"/, '')
    StringBuilder $builderstring = StringBuilder.newInstance()
    if (ValidUDA == "Region") {
        $builderstring <<"""
XYZCode
"""
        StringBuilders.add(builderstring)
    }
}
for(item in StringBuilders)
{
    return item
}
I guess the following code does what you intended:
def myList = ['Hello "World"', 'asb"r"sd']
def ValidUDA = "Region"
def builders = [:]
myList.each {
    String builderstring = "scriptBldr_" + it.replaceAll(/"/, '')
    builders[builderstring] = StringBuilder.newInstance()
    if (ValidUDA == "Region") {
        builders[builderstring] << """
XYZCode
"""
    }
}
return builders
A return statement immediately returns from the method, so it exits the loop on its first iteration. So, I guess, what you wanted to achieve is to return a list of StringBuilders.
Some hints:
it is unusual in Groovy to start a variable name with $, and you can run into problems with such names
when asking a question on SO, try to come up with a working example. As you can see, your example was missing some definitions
Update: since you stated in your comment that you tried to create dynamic variable names, I've updated the code to use a map. The returned map now contains the StringBuilders keyed by their names.
I've recently had to move a project over from MySQL to MSSQL. I'm using IDENTITY(1,1) on my id columns for my tables to match MySQL's auto-increment feature.
When I try to insert an object though, I'm getting this error:
[SQLServerException: Cannot insert explicit value for identity column in table 'categories' when IDENTITY_INSERT is set to OFF.]
After some research, I found out that it's because I'm inserting an explicit value (0) for the id column of my tables. For example, I have a Category object:
case class Category(
  id: Long = 0L,
  name: String
)
object Category extends Table[Category]("categories"){
  def name = column[String]("name", O.NotNull)
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def * = id ~ name <> (Category.apply _, Category.unapply _)
  def add(model:Category) = withSession{ implicit session =>
    Category.insert(model)
  }
  def remove(id:Long) = withSession{implicit session =>
    try{Some(Query(Category).filter(_.id === id).delete)}
    catch{case _ => None}
  }
}
Is there a way to insert my object into the database while ignoring the 0L, without MSSQL throwing an SQLException? MySQL would just ignore the id's value and do the increment as if it hadn't received an id.
I'd really rather not create a new case class with everything but the id.
Try redefining your add method like this and see if it works for you:
def add(model:Category) = withSession{ implicit session =>
  Category.name.insert(model.name)
}
If you had more columns then you could have added a forInsert projection to your Category table class that specified all fields except id, but since you don't, this should work instead.
EDIT
Now, if you do have more than 2 fields in your table objects, you can do something like this, as described in the Lifted Embedding documentation:
case class Category(
  id: Long = 0L,
  name: String,
  foo: String
)
object Category extends Table[Category]("categories"){
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def name = column[String]("name", O.NotNull)
  def foo = column[String]("foo", O.NotNull)
  def * = id ~ name ~ foo <> (Category.apply _, Category.unapply _)
  // Projection that leaves out the AutoInc id column, so the database generates it
  def forInsert = name ~ foo <> (t => Category(0L, t._1, t._2), {(c: Category) => Some((c.name, c.foo))})
  def add(model: Category) = withSession{ implicit session =>
    Category.forInsert insert model
  }
  def remove(id: Long) = withSession{ implicit session =>
    try{Some(Query(Category).filter(_.id === id).delete)}
    catch{case _ => None}
  }
  // Stub so the snippet compiles on its own: it never calls f.
  // In real code, use your Database's withSession instead.
  def withSession(f: Session => Unit): Unit = {}
}