Transform flattened JSON into a date object in Snowflake - snowflake-cloud-data-platform

I know how to flatten the object below, but I'm wondering how I can pass the values in as I flatten them to create a date.
What I have:
"dateRange": {
  "start": {
    "month": 9,
    "day": 2,
    "year": 2020
  }
}
What I want
2020-09-02

Use DATE_FROM_PARTS:
DATE_FROM_PARTS(year, month, day)
or, for your object:
DATE_FROM_PARTS(f.value:dateRange:start:year, f.value:dateRange:start:month, f.value:dateRange:start:day)
assuming the flatten looks like table(flatten(input => <something>)) f
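To check the year, month, day argument order outside Snowflake, the same construction can be sketched in Python. This is only an illustration built on the standard library's datetime.date; the helper name date_from_parts and the variable doc are made up for the example, and the sample object has its closing braces added so that it parses.

```python
from datetime import date

# Sample object from the question (closing braces added so it parses).
doc = {"dateRange": {"start": {"month": 9, "day": 2, "year": 2020}}}

def date_from_parts(parts):
    """Build a date from a dict with year, month and day keys (hypothetical helper)."""
    return date(parts["year"], parts["month"], parts["day"])

start = date_from_parts(doc["dateRange"]["start"])
print(start.isoformat())  # 2020-09-02
```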

Related

Combine JSON with same value into JSON array - Scala

I have converted a dataframe with columns email, account, id into json using toJSON. Each row is a JSON which looks like: {"email": "xyz", "account": "pqr", "id": "1"}.
The id field is not unique, and I want to combine these JSON rows into arrays such that each row is an array of the JSON objects with the same id value.
For example: One row would look like: [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}]
After this, I want to populate this JSON array into another dataframe user which has columns id and user.
The JSON array of each user with the matching id should be in the user dataframe.
The output would have each row as: | 1 | [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}] |
Can someone suggest how I can do this efficiently without exploding all the arrays multiple times?
I'm unsure which JSON library you are using, so I'd recommend converting each row to a case class that has an id field. You can then group by the id field and insert into your user dataframe, converting the grouped rows to JSON.
Something along the lines of:
case class Row(email: String, account: String, id: String)
val rows: List[Row] = ??? // converted from your dataframe
rows.groupBy(_.id)
  .map { case (id, rows) =>
    // insert into user dataframe; convert rows to JSON
  }
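The grouping step can also be sketched in plain Python, independent of any dataframe library. This is a minimal sketch under assumptions: the third sample row, the function name group_by_id, and the user_rows list are hypothetical, and loading the result into the user dataframe is left out.

```python
import json
from collections import defaultdict

# Sample rows as JSON strings; the first two come from the question,
# the third is a made-up row with a different id.
json_rows = [
    '{"email": "xyz", "account": "pqr", "id": "1"}',
    '{"email": "abc", "account": "lmn", "id": "1"}',
    '{"email": "def", "account": "stu", "id": "2"}',
]

def group_by_id(rows):
    """Parse each JSON string and collect the objects under their id."""
    grouped = defaultdict(list)
    for raw in rows:
        obj = json.loads(raw)
        grouped[obj["id"]].append(obj)
    return dict(grouped)

grouped = group_by_id(json_rows)

# One (id, JSON array) pair per id, ready to load into the user table.
user_rows = [(key, json.dumps(objs)) for key, objs in grouped.items()]
```

Each input row is parsed and visited once, so no array is exploded more than once.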

Cosmos DB SQL query in single embed document

I am working with Cosmos DB and I want to write a SQL query that returns multiple documents combined into one single document with the related documents embedded.
To elaborate, imagine you have the following two document types in one container. The OrderDetail documents reference the Order document through OrderId.
1.Order
{
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Name": "ABC",
"Type": "Order",
"DeptName": "ABC",
"TotalAmount": 100.05
}
2.OrderDetail
{
"OrderDetailId": "689bdc38-9849-4a11-b856-53f8628b76c9",
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Type": "OrderDetail",
"ItemNo": 202,
"Quantity": 10,
"UnitPrice": 10.05
},
{
"OrderDetailId": "789bdc38-9849-4a11-b856-53f8628b76c9",
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Type": "OrderDetail",
"ItemNo": 200,
"Quantity": 11,
"UnitPrice": 15.05
}
I want to write a query that will return all entries of OrderDetail in one array based on reference OrderId="31d4c08b-ee59-4ede-b801-3cacaea38808"
Output should be like below
{
  "OrderId":"31d4c08b-ee59-4ede-b801-3cacaea38808",
  "Name":"ABC",
  "Type":"Order",
  "OrderDetail":[
    {
      "OrderDetailId":"689bdc38-9849-4a11-b856-53f8628b76c9",
      "Type":"OrderDetail",
      "ItemNo":202,
      "Quantity":10,
      "UnitPrice":10.05
    },
    {
      "OrderDetailId":"789bdc38-9849-4a11-b856-53f8628b76c9",
      "Type":"OrderDetail",
      "ItemNo":200,
      "Quantity":11,
      "UnitPrice":15.05
    }
  ]
}
I have no idea how to query in Cosmosdb to get the above result.
Your desired output is the kind of result you would get from a relational database. Cosmos DB is a non-relational database, so it is not well suited to this scenario. As far as I know, no single SQL query can produce the above output directly.
I suggest executing two queries. One produces:
{"OrderId":"31d4c08b-ee59-4ede-b801-3cacaea38808",
"Name":"ABC",
"Type":"Order"}
The other produces:
"OrderDetail":[
  {
    "OrderDetailId":"689bdc38-9849-4a11-b856-53f8628b76c9",
    "Type":"OrderDetail",
    "ItemNo":202,
    "Quantity":10,
    "UnitPrice":10.05
  },
  {
    "OrderDetailId":"789bdc38-9849-4a11-b856-53f8628b76c9",
    "Type":"OrderDetail",
    "ItemNo":200,
    "Quantity":11,
    "UnitPrice":15.05
  }
]
Then combine them. You could also do this in a stored procedure.
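The combine step could look roughly like the following on the client side. This is a sketch under assumptions: the two input values stand in for the results of the two queries (no Cosmos DB SDK call is shown), and the function name embed_details is hypothetical.

```python
# Stand-ins for the results of the two queries described above.
order = {
    "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
    "Name": "ABC",
    "Type": "Order",
}
details = [
    {"OrderDetailId": "689bdc38-9849-4a11-b856-53f8628b76c9",
     "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
     "Type": "OrderDetail", "ItemNo": 202, "Quantity": 10, "UnitPrice": 10.05},
    {"OrderDetailId": "789bdc38-9849-4a11-b856-53f8628b76c9",
     "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
     "Type": "OrderDetail", "ItemNo": 200, "Quantity": 11, "UnitPrice": 15.05},
]

def embed_details(order_doc, detail_docs):
    """Return a copy of order_doc with its matching details embedded."""
    combined = dict(order_doc)
    combined["OrderDetail"] = [
        # Drop the redundant OrderId from each embedded detail.
        {k: v for k, v in d.items() if k != "OrderId"}
        for d in detail_docs
        if d["OrderId"] == order_doc["OrderId"]
    ]
    return combined

result = embed_details(order, details)
```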

Querying an array within an array with Postgres JSONB query

I have some JSON in a field in my Postgres 9.4 db and I want to find rows where the given name is a certain value, where the field is named model and the JSON structure is as follows:
{
  "resourceType": "Person",
  "id": "8a7b72b1-49ec-43e5-bd21-bc62674d9875",
  "name": [
    {
      "family": [
        "NEWMAN"
      ],
      "given": [
        "JOHN"
      ]
    }
  ]
}
So I tried this: SELECT * FROM current WHERE model->'name' #> '{"given":["JOHN"]}'; (as well as various other guesses) but that does not match the above data. How should I do this?
Use the function jsonb_array_elements():
select t.*
from current t,
     jsonb_array_elements(model->'name') names
where names->'given' ? 'JOHN'
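To make the logic of that query explicit: it unnests the name array and keeps a row when any element's given array contains the value. The same check can be sketched in Python. The function name has_given_name is hypothetical, and the dict stands in for one row's model value.

```python
# Document from the question, standing in for one row's model value.
model = {
    "resourceType": "Person",
    "id": "8a7b72b1-49ec-43e5-bd21-bc62674d9875",
    "name": [{"family": ["NEWMAN"], "given": ["JOHN"]}],
}

def has_given_name(doc, wanted):
    """Return True if any element of doc["name"] lists wanted under "given"."""
    return any(
        wanted in entry.get("given", [])
        for entry in doc.get("name", [])
    )

print(has_given_name(model, "JOHN"))
```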

Elasticsearch constant score sort

I have a pretty simple Elasticsearch query where I filter some items by category. It's a constant score query, something like this:
"query": {
  "constant_score": {
    "filter": {
      "term": {
        "category": "[category-id]"
      }
    }
  }
}
The problem is that, with no score to sort these results by, they don't always come back in the same order. This is an issue because it messes up my pagination.
An example. I request the first 5 items and I receive back let's say the following ids: [4, 7, 8, 10, 3]. I then want the next 5 items to display the next page, but I may get some items repeated, like this: [12, 15, 7, 13, 9].
The problem is that all my fields are string fields, and I wouldn't want to sort by any of them. The sort order is not important, it's just important to keep the same order every time.
Any ideas? Thanks!
Try this:
GET _search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "category": "[category-id]"
        }
      }
    }
  }
}
Since this is what used to be known as a filtered query, no scores are calculated and the score field will have a value of 0.

Multiple sorting in ArangoDB

My webapp needs to display several sorted lists of document attributes in a graph. These are hours, cycles, and age.
I have an AQL query that beautifully traverses the graph and gets me all the data my app needs in 2 ms. I'm very impressed! But I need it sorted for each graph. The query currently returns an array of json objects that contain all three of the attributes and the id for which they apply. Awesome. The query also very easily sorts on one of the attributes.
My problem is: I need to have a sorted list of all three, and would prefer not to query the database three times since the data is all in the same documents my traversal returned.
I would like to return three sorted arrays of json objects: one containing hours and the id, one containing cycles and the id, and one containing age and the id. This way, my graphs can easily display all three graphs without client-side sorting.
HTTP requests themselves are time consuming although the database is very fast, which is why I'd like to pull all three at once, as the data itself is small.
My current query is a simple graph traversal:
for v, e, p in outbound startNode graph 'myGraph'
  filters & definitions...
  sort v.hours desc
  return {"hours": v.hours, "cycles": v.cycles, "age": v.age, "id": v.id}
Is there an easy way I can tell Arango to return me this structure?
{
  [
    {
      "id": 47,
      "hours": 123
    },
    {
      "id": 23,
      "hours": 105
    }...
  ],
  [
    {
      "id": 47,
      "cycles": 18
    },
    {
      "id": 23,
      "cycles": 5
    }...
  ],
  [
    {
      "id": 47,
      "age": 4.2
    },
    {
      "id": 23,
      "age": 0.9
    }
  ]
}
Although the traversal is fast, I would prefer if I didn't have to re-traverse the graph three times to do it, if possible.
My solution:
let data = (for v, e, p in outbound startNode graph 'myGraph'
  filters & definitions...
  return {"hours": v.hours, "cycles": v.cycles, "age": v.age, "id": v.id})
let byHours = (for thing in data
  sort thing.hours desc
  return {"hours": thing.hours, "id": thing.id})
let byCycles = (for thing in data
  sort thing.cycles desc
  return {"cycles": thing.cycles, "id": thing.id})
let byAge = (for thing in data
  sort thing.age desc
  return {"age": thing.age, "id": thing.id})
return {"hours": byHours, "cycles": byCycles, "age": byAge}
I'm not sure how this compares against your solution performance-wise, but the most obvious solution would be to traverse once and then create three sorted results like this:
LET nodes = (
  FOR v, e, p IN OUTBOUND startNode GRAPH 'myGraph'
    FILTER ...
    RETURN v
)
RETURN {
  hours: (
    FOR n IN nodes
      SORT n.hours DESC
      RETURN KEEP(n, ['hours', 'id'])
  ),
  cycles: (
    FOR n IN nodes
      SORT n.cycles DESC
      RETURN KEEP(n, ['cycles', 'id'])
  ),
  age: (
    FOR n IN nodes
      SORT n.age DESC
      RETURN KEEP(n, ['age', 'id'])
  )
}
This would traverse the graph only once but sort the result three times.
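The same "fetch once, sort three times" shape can be sketched outside the database in Python. This is an illustration only: the list nodes stands in for the traversal result, the third node is a made-up value added so the three orderings differ, and the helper name sorted_view is hypothetical.

```python
# Stand-in for the traversal result; ids 47 and 23 use the values from
# the question, id 8 is a hypothetical extra node.
nodes = [
    {"id": 47, "hours": 123, "cycles": 18, "age": 4.2},
    {"id": 23, "hours": 105, "cycles": 5, "age": 0.9},
    {"id": 8, "hours": 110, "cycles": 30, "age": 0.5},
]

def sorted_view(rows, key):
    """Sort rows by key in descending order and keep only key and id."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    return [{key: r[key], "id": r["id"]} for r in ordered]

# One pass over the data per attribute; the source list is built once.
result = {key: sorted_view(nodes, key) for key in ("hours", "cycles", "age")}
```

The result is keyed by attribute name, which mirrors the object returned by the AQL query above rather than the unkeyed arrays in the question.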
