Json data flattening on snowflake - snowflake-cloud-data-platform

I'm trying to flatten the JSON data below in Snowflake:
JSON data:
{
"empDetails": [
{
"kind": "person",
"fullName": "John Doe",
"age": 22,
"gender": "Male",
"phoneNumber": {
"areaCode": "206",
"number": "1234567"
},
"children": [
{
"name": "Jane",
"gender": "Female",
"age": "6"
},
{
"name": "John",
"gender": "Male",
"age": "15"
}
],
"citiesLived": [
{
"place": "Seattle",
"yearsLived": [
"1995"
]
},
{
"place": "Stockholm",
"yearsLived": [
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Mike Jones",
"age": 35,
"gender": "Male",
"phoneNumber": {
"areaCode": "622",
"number": "1567845"
},
"children": [
{
"name": "Earl",
"gender": "Male",
"age": "10"
},
{
"name": "Sam",
"gender": "Male",
"age": "6"
},
{
"name": "Kit",
"gender": "Male",
"age": "8"
}
],
"citiesLived": [
{
"place": "Los Angeles",
"yearsLived": [
"1989",
"1993",
"1998",
"2002"
]
},
{
"place": "Washington DC",
"yearsLived": [
"1990",
"1993",
"1998",
"2008"
]
},
{
"place": "Portland",
"yearsLived": [
"1993",
"1998",
"2003",
"2005"
]
},
{
"place": "Austin",
"yearsLived": [
"1973",
"1998",
"2001",
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Anna Karenina",
"age": 45,
"gender": "Female",
"phoneNumber": {
"areaCode": "425",
"number": "1984783"
},
"citiesLived": [
{
"place": "Stockholm",
"yearsLived": [
"1992",
"1998",
"2000",
"2010"
]
},
{
"place": "Russia",
"yearsLived": [
"1998",
"2001",
""
]
},
{
"place": "Austin",
"yearsLived": [
"1995",
"1999"
]
}
]
}
]
}
I'm able to flatten most of the data except for the yearsLived array;
for that last column I'm getting NULL values.
Below is what I have tried so far:
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children,
chldrn.value:name,
chldrn.value:gender,
chldrn.value:age,
city.value:place,
yr.value:yearsLived
from my_json emp,
lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children, OUTER => TRUE) chldrn,
lateral flatten(input=>empd.value:citiesLived) city,
lateral flatten(input=>city.value:yearsLived) yr -- not getting data for this array
Can someone help me understand why I'm getting NULL values for the yearsLived array? I'm not sure if I'm missing anything here.

Your query selects the column
yr.value:yearsLived
as if yr.value were an OBJECT with a yearsLived field.
But you have already expanded the yearsLived array in the line
lateral flatten(input=>city.value:yearsLived) yr
so yr.value is just a VARIANT containing a single year, and looking up a field on it returns NULL. Select yr.value directly, or wrap it in TO_NUMBER or TO_VARCHAR to get a more precise type.
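The point above can be illustrated outside Snowflake. The following is a minimal Python sketch, not Snowflake code: the `get_field` helper is a hypothetical stand-in for a VARIANT path lookup, and the sample record is one `citiesLived` entry taken from the question.

```python
# One citiesLived entry from the question's data.
city = {"place": "Austin", "yearsLived": ["1995", "1999"]}


def get_field(value, key):
    """Hypothetical stand-in for a VARIANT path lookup.

    A lookup on anything that is not an object yields None (NULL).
    """
    return value.get(key) if isinstance(value, dict) else None


# Flattening yearsLived gives one row per element; each row's value is a plain string.
rows = [{"place": city["place"], "value": year} for year in city["yearsLived"]]

# Analogous to yr.value:yearsLived -- a field lookup on a scalar.
wrong = [get_field(row["value"], "yearsLived") for row in rows]
# Analogous to yr.value -- the element itself.
right = [row["value"] for row in rows]

print(wrong)
print(right)
```

Here `wrong` holds only `None` entries because each flattened value is already a year string, while `right` holds the years themselves.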

You can try this out on a smaller example.
create or replace table json_tab as
select parse_json('{ "place": "Austin","yearsLived": [ "1995","1999"]}') as years;
select years:yearsLived[0]::int from json_tab;
Since yearsLived is an array, you need to access its elements by index to get specific values, or use a function such as FLATTEN to explode it.
With the flatten function:
select years, v.value::string
from json_tab,
lateral flatten(input => years:yearsLived) v;

Related

JOLT: Merge specific data from JSON array using id key

I'm getting data in a specific format from an API and I have to convert it to a cleaner version.
What I get from the API is a JSON document like this (note that the first fields are duplicated across entries while the investor differs).
{
"clubhouse": [
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1234",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
}
]
},
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "4321",
"gender": "02"
},
"inamount": "1700000",
"ratio": "12"
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1333",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
}
]
},
{
"id": "03",
"statusId": "ok",
"stateid": "5",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "",
"gender": ""
},
"inamount": "",
"ratio": ""
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1334",
"gender": "02"
},
"inamount": "1900000",
"ratio": "12"
}
]
}
]
}
I need to merge the investors and eliminate the duplicated information, so the expected result would be
{
"clubhouse": [
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1234",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
},
{
"investor": {
"id": "4321",
"gender": "02"
},
"inamount": "1700000",
"ratio": "12"
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1333",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
},
{
"investor": {
"id": "1334",
"gender": "02"
},
"inamount": "1900000",
"ratio": "12"
}
]
},
{
"id": "03",
"statusId": "ok",
"stateid": "5",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1555",
"gender": "01"
},
"inamount": "2000000",
"ratio": "15"
}
]
}
]
}
I tried a couple of JOLT specs and managed to merge the fields, but not to eliminate the duplicates.

You can start by grouping by the id values, such as
[
{
// group by "id" values to create separate objects
"operation": "shift",
"spec": {
"*": {
"*": {
"*": "#(1,id).&",
"investors": {
"*": {
"*": {
"#": "#(4,id).&3[&4].&" // &3 -> going 3 levels up to grab literal "investors", [&4] -> going 4 levels up the tree in order to reach the indexes of "clubhouse" array, & -> replicate the leaf node values for the current key-value pair
}
}
}
}
}
}
},
{
// get rid of "null" values
"operation": "modify-overwrite-beta",
"spec": {
"*": "=recursivelySquashNulls"
}
},
{
// pick only the first components from the repeated values populated within the arrays
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE",
"investors": "MANY"
}
}
},
{
// get rid of object labels
"operation": "shift",
"spec": {
"*": ""
}
}
]
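For readers who want to check what the grouping is meant to produce, here is a rough Python sketch of the same idea. It is not JOLT and not part of the original answer: the `merge_by_id` function name is made up for illustration, and the sample is a trimmed copy of the question's input. It keeps the scalar fields from the first entry seen for each id and concatenates the investors lists.

```python
def merge_by_id(payload):
    """Group clubhouse entries by "id" and concatenate their investors lists."""
    merged = {}  # dicts preserve insertion order, so ids stay in first-seen order
    for entry in payload["clubhouse"]:
        key = entry["id"]
        if key not in merged:
            # First time this id is seen: copy the entry with its own investors list.
            merged[key] = {**entry, "investors": list(entry["investors"])}
        else:
            # Repeated id: keep the existing scalar fields, append the investors.
            merged[key]["investors"].extend(entry["investors"])
    return {"clubhouse": list(merged.values())}


# Trimmed sample based on the question's input.
sample = {
    "clubhouse": [
        {"id": "01", "statusId": "ok", "investors": [{"investor": {"id": "1234"}}]},
        {"id": "01", "statusId": "ok", "investors": [{"investor": {"id": "4321"}}]},
        {"id": "02", "statusId": "ok", "investors": [{"investor": {"id": "1333"}}]},
    ]
}

result = merge_by_id(sample)
print(result)
```

With this sample, the two entries for id "01" collapse into one entry carrying both investors, and id "02" is left as a single entry.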

Performance issue running mongodb aggregation

I need to run a query that joins documents from two collections. I wrote an aggregation query, but it takes too long when run against the production database, which has many documents. Is there a more efficient way to write this query?
Query in Mongo playground: https://mongoplayground.net/p/dLb3hsJHNYt
There are two collections, users and activities. I need to get some users (from the users collection) along with their last activity (from the activities collection).
Database:
db={
"users": [
{
"_id": 1,
"email": "user1#gmail.com",
"username": "user1",
"country": "BR",
"creation_date": 1646873628
},
{
"_id": 2,
"email": "user2#gmail.com",
"username": "user2",
"country": "US",
"creation_date": 1646006402
}
],
"activities": [
{
"_id": 1,
"email": "user1#gmail.com",
"activity": "like",
"timestamp": 1647564787
},
{
"_id": 2,
"email": "user1#gmail.com",
"activity": "comment",
"timestamp": 1647564834
},
{
"_id": 3,
"email": "user2#gmail.com",
"activity": "like",
"timestamp": 1647564831
}
]
}
Inefficient Query:
db.users.aggregate([
{
// Get users using some filters
"$match": {
"$expr": {
"$and": [
{ "$not": { "$in": [ "$country", [ "AR", "CA" ] ] } },
{ "$gte": [ "$creation_date", 1646006400 ] },
{ "$lte": [ "$creation_date", 1648684800 ] }
]
}
}
},
{
// Get the last activity within the time range
"$lookup": {
"from": "activities",
"as": "last_activity",
"let": { "cur_email": "$email" },
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{ "$eq": [ "$email", "$$cur_email" ] },
{ "$gte": [ "$timestamp", 1647564787 ] },
{ "$lte": [ "$timestamp", 1647564834 ] }
]
}
}
},
{ "$sort": { "timestamp": -1 } },
{ "$limit": 1 }
]
}
},
{
// Remove users with no activity
"$match": {
"$expr": {
"$gt": [ { "$size": "$last_activity" }, 0 ] }
}
}
])
Result:
[
{
"_id": 1,
"country": "BR",
"creation_date": 1.646873628e+09,
"email": "user1#gmail.com",
"last_activity": [
{
"_id": 2,
"activity": "comment",
"email": "user1#gmail.com",
"timestamp": 1.647564788e+09
}
],
"username": "user1"
},
{
"_id": 2,
"country": "US",
"creation_date": 1.646006402e+09,
"email": "user2#gmail.com",
"last_activity": [
{
"_id": 3,
"activity": "like",
"email": "user2#gmail.com",
"timestamp": 1.647564831e+09
}
],
"username": "user2"
}
]
I'm more familiar with relational databases, so I'm struggling a little to run this query efficiently.
Thanks!

The most appropriate way to render unstructured data in React

I have a backend that returns unstructured data (another dev is responsible for the backend), and I don't know the most appropriate way to render it. Any ideas?
What I have already tried is rendering it with the react-json-view library, but it's not very user friendly.
This is an example of the data I receive:
[
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1639677563,
"data": {
"msisdn": "571345543122"
},
"planName": "PRE_PAGO",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#MI_tienda#backofficeco#cbb1efe963",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"typeItem": "MSISDN",
"createdDate": 1644521244,
"data": {
"MSISDN": "asdfk"
},
"backendName": "adfs;fk",
"pk": "#CO#MSISDN#asdf#adfs;fk#7578238817",
"country": "CO",
"resourceGroup": "asdf"
},
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1644940771,
"data": {
"msisdn": "3007279930"
},
"planName": "POS_PAGO",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#MI_tienda#backofficeco#25831ae7cf",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1644420646,
"data": {
"msisdn": "571345543122"
},
"planName": "adfasdf",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#asdfasdf#backofficeco#c30d28f552",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"typeItem": "MSISDN",
"createdDate": 1644525223,
"data": {
"MSISDN": "asdfasd"
},
"backendName": "asdfasdf",
"pk": "#CO#MSISDN#asdfasdf#asdfasdf#02ac5aa61b",
"country": "CO",
"resourceGroup": "asdfasdf"
},
{
"conditions": [
"adsfas"
],
"typeItem": "MSISDN",
"createdDate": 1646230406,
"data": {
"msisdn": "571345543122"
},
"planName": "adfasdf",
"backendName": "backofficeco",
"ttl": 1646835206,
"pk": "#CO#MSISDN#MI_tienda#backofficeco#cd40ee06af",
"country": "CO",
"resourceGroup": "adsfa"
}
]

Assuming you just want to render the list, you can try building a map keyed on some field (perhaps 'pk') and passing it on to, say, a grid component.
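As a rough illustration of that suggestion, here is a language-neutral sketch written in Python (the same shape would carry over to JavaScript in a React app). The helper names `index_by_key`, `grid_columns`, and `grid_rows` are made up for this example, and the two records are trimmed copies of the question's data. Since the records do not all share the same fields, the columns are taken as the union of the keys and missing cells are left empty.

```python
def index_by_key(items, key="pk"):
    """Build a lookup map from each item's key field to the item."""
    return {item[key]: item for item in items}


def grid_columns(items):
    """Collect the union of top-level field names, in first-seen order."""
    columns = []
    for item in items:
        for name in item:
            if name not in columns:
                columns.append(name)
    return columns


def grid_rows(items, columns):
    """Turn each item into a row, leaving missing fields as empty strings."""
    return [[item.get(col, "") for col in columns] for item in items]


# Trimmed records based on the question's data.
records = [
    {"typeItem": "MSISDN", "planName": "PRE_PAGO", "pk": "#CO#MSISDN#MI_tienda#backofficeco#cbb1efe963"},
    {"typeItem": "MSISDN", "pk": "#CO#MSISDN#asdf#adfs;fk#7578238817"},
]

by_pk = index_by_key(records)
columns = grid_columns(records)
rows = grid_rows(records, columns)
print(columns)
print(rows)
```

The map gives direct access to a record by its pk, while the columns and rows are the kind of flat structure a grid component typically expects.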

Flatten JSON Data on snowflake

Below is the JSON data I'm trying to flatten in Snowflake.
JSON document:
{
"empDetails": [
{
"kind": "person",
"fullName": "John Doe",
"age": 22,
"gender": "Male",
"phoneNumber": {
"areaCode": "206",
"number": "1234567"
},
"children": [
{
"name": "Jane",
"gender": "Female",
"age": "6"
},
{
"name": "John",
"gender": "Male",
"age": "15"
}
],
"citiesLived": [
{
"place": "Seattle",
"yearsLived": [
"1995"
]
},
{
"place": "Stockholm",
"yearsLived": [
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Mike Jones",
"age": 35,
"gender": "Male",
"phoneNumber": {
"areaCode": "622",
"number": "1567845"
},
"children": [
{
"name": "Earl",
"gender": "Male",
"age": "10"
},
{
"name": "Sam",
"gender": "Male",
"age": "6"
},
{
"name": "Kit",
"gender": "Male",
"age": "8"
}
],
"citiesLived": [
{
"place": "Los Angeles",
"yearsLived": [
"1989",
"1993",
"1998",
"2002"
]
},
{
"place": "Washington DC",
"yearsLived": [
"1990",
"1993",
"1998",
"2008"
]
},
{
"place": "Portland",
"yearsLived": [
"1993",
"1998",
"2003",
"2005"
]
},
{
"place": "Austin",
"yearsLived": [
"1973",
"1998",
"2001",
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Anna Karenina",
"age": 45,
"gender": "Female",
"phoneNumber": {
"areaCode": "425",
"number": "1984783"
},
"citiesLived": [
{
"place": "Stockholm",
"yearsLived": [
"1992",
"1998",
"2000",
"2010"
]
},
{
"place": "Russia",
"yearsLived": [
"1998",
"2001",
""
]
},
{
"place": "Austin",
"yearsLived": [
"1995",
"1999"
]
}
]
}
]
}
This data has 3 employees and their details, such as name, children, and cities lived.
One of the employees, "Anna Karenina", has no children details, while the other 2 employees do.
Because of the missing children details, I'm not able to flatten the 3rd employee's data.
Below is what I have tried so far:
Snowflake flatten JSON code:
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
--empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children -- flattening children
//chldrn.value:name,
//chldrn.value:gender,
//chldrn.value:age,
//city.value:place,
//yr.value:yearsLived
from my_json emp , lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children) chldrn
//lateral flatten(input=>empd.value:citiesLived) city,
//lateral flatten(input=>city.value:yearsLived) yr

You need to use the OUTER argument of FLATTEN:
FLATTEN
OUTER => TRUE | FALSE
If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output.
If TRUE, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children,
chldrn.value:name,
chldrn.value:gender,
chldrn.value:age,
city.value:place,
yr.value -- already the year itself; yr.value:yearsLived would return NULL
from my_json emp,
lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children, OUTER => TRUE) chldrn, -- <HERE>
lateral flatten(input=>empd.value:citiesLived) city,
lateral flatten(input=>city.value:yearsLived) yr
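The OUTER behaviour quoted above can be mimicked in plain Python to see its effect on the row count. This is only a sketch of the semantics, not Snowflake code: the `flatten` generator and `child_rows` helper are hypothetical, and the two employees are trimmed from the question's data.

```python
def flatten(values, outer=False):
    """Yield (index, value) pairs for each element of values.

    With outer=True, a missing or empty input yields a single (None, None) pair
    instead of nothing, mirroring the OUTER => TRUE description above.
    """
    if not values:
        if outer:
            yield (None, None)
        return
    for index, value in enumerate(values):
        yield (index, value)


# Trimmed employees based on the question's data.
employees = [
    {"fullName": "John Doe", "children": [{"name": "Jane"}, {"name": "John"}]},
    {"fullName": "Anna Karenina"},  # no "children" key, as in the question
]


def child_rows(outer):
    """Join each employee with their flattened children, like a lateral flatten."""
    rows = []
    for emp in employees:
        for _, child in flatten(emp.get("children"), outer=outer):
            rows.append((emp["fullName"], child["name"] if child else None))
    return rows


inner = child_rows(outer=False)
with_outer = child_rows(outer=True)
print(inner)
print(with_outer)
```

Without the outer flag, the employee with no children disappears from the result entirely; with it, she is kept with a `None` child.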

How to get a value in one mongodb collection and use that value to update another document in another collection order & inventory system

Hello, I have been stuck for weeks trying to figure out how to create an order and inventory system for a project I am working on. I'm not sure how to ask this properly, but my problem is this: when a user adds items to their cart, I store the order details in an orders collection in MongoDB. I then need to subtract the quantity of each item in the customer's order from my inventory collection. How can I do this with MongoDB and Python?
This is the document created when a customer places an order
{
"_id": "5eca94b4f56331fd9eade681",
"ordernumber": 343,
"order": {
"order_details": [
{
"customer_info": [
{
"first_name": "John",
"last_name": "Doe",
"email": "email#email.com"
}
],
"shipping_details": [
{
"shipping_address": "Test Address",
"shipping_zip": "12345",
"shippingl_city": "Test city",
"shipping_country": "USA"
}
],
"products_ordered": [
{
"variant_id": "a",
"product_name": "red shirt",
"price": 30,
"quantity": 2,
"image": "imageurl",
"size": "Small"
},
{
"variant_id": "f",
"product_name": "Blue Jeans",
"price": 20,
"quantity": 3,
"image": "imageurl",
"size": "Large"
}
]
}
]
}
}
These are the products in my inventory collection. I want the inventory quantity reduced by the quantity the customer purchased.
{
"_id": "5eca0ff4898b8f30a9fee5e5",
"product_id": 1,
"product_name": "red shirt",
"category": "shirts",
"price": 30,
"status": "instock",
"description": "nice red shirt",
"alt": "string",
"images": [
"imageUrl1",
"imageUrl2"
],
"variants": [
{
"Small": [
{
"variant_id": "a",
"inventory": 30
}
],
"Medium": [
{
"variant_id": "b",
"inventory": 10
}
],
"Large": [
{
"variant_id": "c",
"inventory": 10
}
]
}
]
}
{
"_id": "5eca108f898b8f30a9fee5e6",
"product_id": 2,
"product_name": "blue jeans",
"category": "jeans",
"price": 20,
"status": "instock",
"description": "nice blue jeans",
"alt": "string",
"images": [
"ImageURL"
],
"variants": [
{
"Small": [
{
"variant_id": "d",
"inventory": 100
}
],
"Medium": [
{
"variant_id": "e",
"inventory": 150
}
],
"Large": [
{
"variant_id": "f",
"inventory": 70
}
]
}
]
}

I would suggest doing it in the service that creates the order.
I would also suggest refactoring the DB structure a bit, as the current one would be harder to maintain at a larger scale.
With the current structure, we would have to write something like
for ordered_product in products_ordered:
    # Look up the inventory document by product name (names must match exactly, including case)
    query = {"product_name": ordered_product["product_name"]}
    inventory_product = inventory_collection.find_one(query)
    size = ordered_product["size"]
    # Index 0 is hardcoded for both the variants list and the per-size list
    existing_count = inventory_product["variants"][0][size][0]["inventory"]
    new_count = existing_count - ordered_product["quantity"]
    # Update only the nested inventory field instead of rewriting the whole document
    inventory_collection.update_one(
        {"_id": inventory_product["_id"]},
        {"$set": {f"variants.0.{size}.0.inventory": new_count}},
    )
I have hardcoded the index values of the lists. You could use filter() to pick out the variant and size you need.
This code definitely seems messy to me.
Of course, you could tidy it up by splitting it into functions inside the model file itself, but I would suggest refactoring the DB structure for better scalability.
Maybe you could move the variants to a separate collection and use the product_id as a link. You have to think this through before getting on with the code.
Hope this helps.
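To make the suggested refactoring a little more concrete, here is a minimal sketch under stated assumptions. It assumes a hypothetical separate `variants` collection with one document per variant, shaped like `{"variant_id": "a", "product_id": 1, "size": "Small", "inventory": 30}`; neither that collection nor the `build_inventory_updates` helper exists in the question. The helper only builds the filter and update documents, so it runs without a database.

```python
def build_inventory_updates(products_ordered):
    """Return one (filter, update) pair per ordered line item.

    Assumes a hypothetical "variants" collection keyed by variant_id.
    """
    updates = []
    for item in products_ordered:
        quantity = item["quantity"]
        # Match the variant only if enough stock is left.
        filter_doc = {"variant_id": item["variant_id"], "inventory": {"$gte": quantity}}
        # $inc with a negative number subtracts the ordered quantity.
        update_doc = {"$inc": {"inventory": -quantity}}
        updates.append((filter_doc, update_doc))
    return updates


# Line items trimmed from the order document in the question.
products_ordered = [
    {"variant_id": "a", "product_name": "red shirt", "quantity": 2, "size": "Small"},
    {"variant_id": "f", "product_name": "Blue Jeans", "quantity": 3, "size": "Large"},
]

updates = build_inventory_updates(products_ordered)
for filter_doc, update_doc in updates:
    print(filter_doc, update_doc)
```

Each pair could then be passed to pymongo's `update_one(filter, update)` on that collection; a `matched_count` of 0 on the result would suggest the variant is missing or does not have enough stock. Matching on variant_id also sidesteps the product name mismatch between the order ("Blue Jeans") and the inventory ("blue jeans").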
