How to get multiple records from db and put into array or map

I want to get multiple records from the db and put them into an array or map.
This is my sample array with user ids:
{"array":[133,136,137]}
This is my code:
def array(conn, %{"array" => array}) do
  userlist = %{}
  Enum.each(array, fn(x) ->
    Map.put(userlist, x, Repo.get(ApiDb.User, x))
  end)
  json conn, userlist
end
but this method returns an empty map
below is the console output

I think this approach is more optimized than @Dogbert's one (correct me if I'm wrong), because we ask Ecto directly to return each row as a tuple, then convert the list of tuples into a map using the built-in Enum.into/2. For that, you'll want to import Ecto.Query in your current module and:
query = from user in ApiDb.User,
        where: user.id in ^user_ids,
        select: {user.id, user}

Repo.all(query) |> Enum.into(%{})
which yields
%{id1 => user1, id2 => user2...}
Regarding the Poison encoding problem, I didn't encounter any issue with the number keys, as Poison converts them into strings automatically.
Hope this also helps :)

You cannot modify the value of a variable outside Enum.each from within Enum.each. For this specific case, I'd use a for comprehension to iterate through the list, fetch each user, and put it in a map with the id as key:
def array(conn, %{"array" => array}) do
  users = for x <- array, into: %{}, do: {"#{x}", Repo.get(ApiDb.User, x)}
  json conn, users
end
I'd suggest using an id IN query here so that all the records are fetched in a single query:
def array(conn, %{"array" => array}) do
  users = from(u in ApiDb.User, where: u.id in ^array) |> Repo.all
  map = for user <- users, into: %{}, do: {"#{user.id}", user}
  json conn, map
end

Related

GenericRowWithSchema ClassCastException in Spark 3 Scala UDF for Array data

I am writing a Spark 3 UDF to mask an attribute in an Array field.
My data (in parquet, but shown in a JSON format):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
case class:
case class MyClass(conditions: Seq[MyItem])
case class MyItem(code: String, category: String)
Spark code:
val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
import spark.implicits._
val rdd = spark.sparkContext.parallelize(data)
val ds = rdd.toDF().as[MyClass]
val maskedConditions: Column = updateArray.apply(col("conditions"))

ds.withColumn("conditions", maskedConditions)
  .select("conditions")
  .show(2)
I tried the following UDF function.
UDF code:
def updateArray = udf((arr: Seq[MyItem]) => {
  for (i <- 0 to arr.size - 1) {
    // Line 3
    val a = arr(i).asInstanceOf[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema]
    // val a = arr(i)  // variant without the cast on line 3
    println(a.getAs[MyItem](0))
    // TODO: How to make code = "XXXX" here
    // a.code = "XXXX"
  }
  arr
})
Goal:
I need to set 'code' field value in each array item to "XXXX" in a UDF.
Issue:
I am unable to modify the array fields.
Also, I get the following error if I remove line 3 in the UDF (the cast to GenericRowWithSchema).
Error:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to MyItem
Question: How do I handle an Array of Structs in a function, and how do I return a modified array of items?
Welcome to Stack Overflow!
There is a small JSON linting error in your data: I assumed that you wanted to close the [] square brackets of the list array. So, for this example I used the following data (which is otherwise the same as yours):
{"conditions":{"list":[{"element":{"code":"1234","category":"ABC"}},{"element":{"code":"4550","category":"EDC"}}]}}
You don't need UDFs for this: a simple map operation will be sufficient! The following code does what you want:
import spark.implicits._
import org.apache.spark.sql.Encoders

case class MyItem(code: String, category: String)
case class MyElement(element: MyItem)
case class MyList(list: Seq[MyElement])
case class MyClass(conditions: MyList)

val df = spark.read.json("./someData.json").as[MyClass]

val transformedDF = df.map {
  case MyClass(MyList(list)) => MyClass(MyList(list.map {
    case MyElement(item) => MyElement(MyItem("XXXX", item.category))
  }))
}
transformedDF.show(false)
+--------------------------------+
|conditions |
+--------------------------------+
|[[[[XXXX, ABC]], [[XXXX, EDC]]]]|
+--------------------------------+
As you can see, we're doing some simple pattern matching on the case classes we've defined and successfully replacing all of the code fields' values with "XXXX". If you want to get JSON back, you can call the to_json function like so:
transformedDF.select(to_json($"conditions")).show(false)
+----------------------------------------------------------------------------------------------------+
|structstojson(conditions) |
+----------------------------------------------------------------------------------------------------+
|{"list":[{"element":{"code":"XXXX","category":"ABC"}},{"element":{"code":"XXXX","category":"EDC"}}]}|
+----------------------------------------------------------------------------------------------------+
Finally, a very small remark about the data. If you have any control over how the data gets made, I would add the following suggestions:
The conditions JSON object seems to have no function here, since it just contains a single array called list. Consider making the conditions object the array itself, which would allow you to discard the list name. That would simplify your structure.
The element object does nothing except contain a single item. Consider removing one level of abstraction there too.
With these suggestions, your data would contain the same information but look something like:
{"conditions":[{"code":"1234","category":"ABC"},{"code":"4550","category":"EDC"}]}
With these suggestions, you would also remove the need for the MyElement and MyList case classes! But very often we're not in control of what data we receive, so this is just a small disclaimer :)
Hope this helps!
EDIT: After your addition of simplified data according to the above suggestions, the task gets even easier. Again, you only need a map operation here:
import spark.implicits._
import org.apache.spark.sql.Encoders

case class MyItem(code: String, category: String)
case class MyClass(conditions: Seq[MyItem])

val data = Seq(MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("4550", "EDC"))))
val df = data.toDF.as[MyClass]

val transformedDF = df.map {
  case MyClass(conditions) => MyClass(conditions.map {
    item => MyItem("XXXX", item.category)
  })
}
transformedDF.show(false)
+--------------------------+
|conditions |
+--------------------------+
|[[XXXX, ABC], [XXXX, EDC]]|
+--------------------------+
I was able to find a simple solution with Spark 3.1+, as new features were added in this Spark version.
Updated code:
val data = Seq(
  MyClass(conditions = Seq(MyItem("1234", "ABC"), MyItem("234", "KBC"))),
  MyClass(conditions = Seq(MyItem("4550", "DTC"), MyItem("900", "RDT")))
)

import spark.implicits._
val ds = data.toDF()

val updatedDS = ds.withColumn(
  "conditions",
  transform(
    col("conditions"),
    x => x.withField("code", updateArray(x.getField("code")))))

updatedDS.show()
UDF:
def updateArray = udf((oldVal: String) => {
  if (oldVal.contains("1234"))
    "XXX"
  else
    oldVal
})
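As a side note (not part of the original answers), on Spark 3.1+ the same masking can likely be done without a UDF at all, by using when/otherwise directly on the extracted field. Below is a minimal sketch under that assumption, reusing ds from the updated code above; noUdfDS is just an illustrative name:

import org.apache.spark.sql.functions.{col, lit, transform, when}

// Mask the code field in place: replace matching codes, keep everything else unchanged.
val noUdfDS = ds.withColumn(
  "conditions",
  transform(
    col("conditions"),
    x => x.withField(
      "code",
      when(x.getField("code") === "1234", lit("XXXX"))
        .otherwise(x.getField("code")))))

noUdfDS.show(false)

Keeping everything in column expressions lets Catalyst optimize the whole transformation, which a black-box UDF would prevent.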

How to get a list from the database instead of a list of lists?

I have a column with regions. I need to get all the unique values into a List.
The following function returns a List of Lists:
getListOfRegionsFromDB() async {
  String sql_get_regions = """SELECT DISTINCT(region) FROM "public"."ftp_files"; """;
  List<dynamic> result = await connection.query(sql_get_regions);
  print(result);
}
And I am getting result like:
[[Tverskaja_obl], [Belgorodskaja_obl], [Uljanovskaja_obl], [Jaroslavskaja_obl]]
But I want to:
["Tverskaja_obl", "Belgorodskaja_obl", "Uljanovskaja_obl", "Jaroslavskaja_obl"]
Is there any short way to do it? I see only one way -- to iterate through the data with a for loop.
I am using PostgreSQL.

Ruby - Set key-value pairs inside array of hashes

The problem is:
I have a method
def comparison_reporter(list_of_scenarios_results1, list_of_scenarios_results2)
  actual_failed_tests = list_of_scenarios_results2.select {|k,v| v == 'Failed'}
  actual_passed_tests = list_of_scenarios_results2.select {|k,v| v == 'Passed'}
  failed_tests = Array.new(actual_failed_tests.length) { Hash.new }
  failed_tests.each do |hash|
    actual_failed_tests.keys.map {|name| hash["test_name"] = name}
    actual_failed_tests.values.map {|new_status| hash["actual_status"] = new_status}
    list_of_scenarios_results1.values_at(*actual_failed_tests.keys).map {|old_status| hash["previous_status"] = old_status}
  end
  final_result = {
    "passed_tests_count" => list_of_scenarios_results2.select {|k,v| v == 'Passed'}.length,
    "failed_tests_count" => list_of_scenarios_results2.select {|k,v| v == 'Failed'}.length,
    "failed_tests" => failed_tests
  }
  return final_result
end
This method takes 2 hashes as arguments and returns the result of their comparison and some other things. Currently, it always returns failed_tests with two (or more) identical hashes (same key-value pairs).
I think the problem is somewhere in the failed_tests.each do |hash| block, but I can't find the reason for this bug; please advise. Example of the method result (in JSON format):
{
  "passed_tests_count": 3,
  "failed_tests_count": 2,
  "failed_tests": [
    {
      "test_name": "As a user I want to use Recent searches tab",
      "actual_status": "Failed",
      "previous_status": "Failed"
    },
    {
      "test_name": "As a user I want to use Recent searches tab",
      "actual_status": "Failed",
      "previous_status": "Failed"
    }
  ]
}
UPD:
hash1 (first argument) -
{""=>"Passed",
"As a new user I want to use no fee rentals tab"=>"Passed",
"As a new user I want to use Luxury rentals tab"=>"Passed",
"As a user I want to use Recent searches tab"=>"Failed",
"As a user I want to use new listings for you tab"=>"Passed"}
hash2 (second argument)-
{""=>"Passed",
"As a new user I want to use no fee rentals tab"=>"Failed",
"As a new user I want to use Luxury rentals tab"=>"Passed",
"As a user I want to use Recent searches tab"=>"Failed",
"As a user I want to use new listings for you tab"=>"Passed"}
Example of desired output:
{
  "passed": "count",
  "failed": "count",
  "failed_tests": [
    {"test_name": "As a user I want to use Recent searches tab",
     "actual_status": "failed",
     "previous_status": "failed"},
    {"test_name": "As a new user I want to use no fee rentals tab",
     "actual_status": "failed",
     "previous_status": "passed"}]
}
Solution:
def comparison_reporter(before, after)
  failed_tests = after.select { |k,v| v == "Failed" }.map do |k,v|
    {
      test_name: k,
      actual_status: v,
      previous_status: before[k]
    }
  end
  {
    passed: after.size - failed_tests.size,
    failed: failed_tests.size,
    failed_tests: failed_tests
  }
end
This simplifies failed_tests quite a bit. Since we calculate the number of failed tests, we can use it for the final counts instead of iterating over the hash again.
The problem is on line 8: You're overwriting hash["previous_status"] with the last value in list_of_scenarios_results1.values_at(*actual_failed_tests.keys) when you map over it.
Usually you use map to transform an iterable into something new, not to modify something else.
e.g.
x = ['1','2','3'].map(&:to_i)
rather than
x = []; ['1','2','3'].map {|v| x << v.to_i}
I'd suggest re-thinking your approach. Will you always have the same keys in both hashes? If so you could simplify this. I'd also suggest looking into byebug. It's an interactive debugger that'll let you step through your function and see where things aren't doing what you expect.

How to pass Array[Long] ids into Aerospike Record UDF?

I store many sub-records as a Map in the same top record. Now I need to remove some sub-records according to the keys passed in, and I plan to use a UDF to implement this. When I invoke it from Java, the log shows the length is 0. Why, and how do I correct this? Thanks!
In Java =>
val ids = Array(1,2,3)
Value.get(ids)
In Lua UDF =>
function remove_keys(rec, binName, rmkeys)
  info("Value(%s) valType(%s)", tostring(rec), type(rec));
  info("Value(%s) valType(%s)", tostring(binName), type(binName));
  info("Value(%s) valType(%s)", tostring(rmkeys), type(rmkeys));
  info("BinName(%s)", binName)
  info("len(%s)", tostring(#rmkeys))
  ...
end
Try:
String binName = "binName";
List<Value> rmKeys = new ArrayList<Value>();
rmKeys.add(Value.get("key1"));
client.execute(policy, key, packageName, "remove_keys", Value.get(binName), Value.get(rmKeys));

Saving users and items features to HDFS in Spark Collaborative filtering RDD

I want to extract users and items features (latent factors) from the result of collaborative filtering using ALS in Spark. The code I have so far:
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
// Load and parse the data
val data = sc.textFile("myhdfs/inputdirectory/als.data")
val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
  Rating(user.toInt, item.toInt, rate.toDouble)
})
// Build the recommendation model using ALS
val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01)
// extract users latent factors
val users = model.userFeatures
// extract items latent factors
val items = model.productFeatures
// save to HDFS
users.saveAsTextFile("myhdfs/outputdirectory/users") // does not work as expected
items.saveAsTextFile("myhdfs/outputdirectory/items") // does not work as expected
However, what gets written to HDFS is not what I expect. I expected each line to have a tuple (userId, Array_of_doubles). Instead I see the following:
[myname#host dir]$ hadoop fs -cat myhdfs/outputdirectory/users/*
(1,[D@3c3137b5)
(3,[D@505d9755)
(4,[D@241a409a)
(2,[D@c8c56dd)
...
It is dumping the hash value of the array instead of the entire array. I did the following to print the desired values:
for (user <- users) {
  val (userId, lf) = user
  val str = "user:" + userId + "\t" + lf.mkString(" ")
  println(str)
}
This does print what I want but I can't then write to HDFS (this prints on the console).
What should I do to get the complete array written to HDFS properly?
Spark version is 1.2.1.
@JohnTitusJungao is right, and the following lines also work as expected:
users.saveAsTextFile("myhdfs/outputdirectory/users")
items.saveAsTextFile("myhdfs/outputdirectory/items")
And this is the reason: userFeatures returns an RDD[(Int, Array[Double])]. The array values are rendered as the symbols you see in the output, e.g. [D@3c3137b5: [D stands for an array of doubles, followed by @ and a hex code produced by the default Java toString method for this type of object. More on that here.
val users: RDD[(Int, Array[Double])] = model.userFeatures
To solve that, you'll need to turn the array into a string:
val users: RDD[(Int, String)] = model.userFeatures.mapValues(_.mkString(","))
The same goes for items.
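Putting it together, a minimal sketch of the full save step under the same assumptions as the question (Spark 1.2-era MLlib API, and the same HDFS output paths, which must not already exist); usersAsText and itemsAsText are just illustrative names:

// Serialize each latent-factor array to a comma-separated string before writing,
// so the full vector (not its JVM hash code) lands in HDFS.
val usersAsText = model.userFeatures.mapValues(_.mkString(","))    // RDD[(Int, String)]
val itemsAsText = model.productFeatures.mapValues(_.mkString(",")) // RDD[(Int, String)]

usersAsText.saveAsTextFile("myhdfs/outputdirectory/users")
itemsAsText.saveAsTextFile("myhdfs/outputdirectory/items")

Each output line then looks like (1,0.1,0.2,...), which is straightforward to parse back later if needed.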
