Below is the JSON data I'm trying to flatten in Snowflake.
JSON document:
{
"empDetails": [
{
"kind": "person",
"fullName": "John Doe",
"age": 22,
"gender": "Male",
"phoneNumber": {
"areaCode": "206",
"number": "1234567"
},
"children": [
{
"name": "Jane",
"gender": "Female",
"age": "6"
},
{
"name": "John",
"gender": "Male",
"age": "15"
}
],
"citiesLived": [
{
"place": "Seattle",
"yearsLived": [
"1995"
]
},
{
"place": "Stockholm",
"yearsLived": [
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Mike Jones",
"age": 35,
"gender": "Male",
"phoneNumber": {
"areaCode": "622",
"number": "1567845"
},
"children": [
{
"name": "Earl",
"gender": "Male",
"age": "10"
},
{
"name": "Sam",
"gender": "Male",
"age": "6"
},
{
"name": "Kit",
"gender": "Male",
"age": "8"
}
],
"citiesLived": [
{
"place": "Los Angeles",
"yearsLived": [
"1989",
"1993",
"1998",
"2002"
]
},
{
"place": "Washington DC",
"yearsLived": [
"1990",
"1993",
"1998",
"2008"
]
},
{
"place": "Portland",
"yearsLived": [
"1993",
"1998",
"2003",
"2005"
]
},
{
"place": "Austin",
"yearsLived": [
"1973",
"1998",
"2001",
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Anna Karenina",
"age": 45,
"gender": "Female",
"phoneNumber": {
"areaCode": "425",
"number": "1984783"
},
"citiesLived": [
{
"place": "Stockholm",
"yearsLived": [
"1992",
"1998",
"2000",
"2010"
]
},
{
"place": "Russia",
"yearsLived": [
"1998",
"2001",
""
]
},
{
"place": "Austin",
"yearsLived": [
"1995",
"1999"
]
}
]
}
]
}
In this data I have 3 employees and their details such as name, children, and cities lived in,
but for one of the employees, "Anna Karenina", the children details are missing, while the other 2 employees do have children data.
Because of the missing children details I'm not able to flatten the 3rd employee's data.
Below is what I have tried so far.
Snowflake flatten JSON code:
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
--empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children -- flattening children
//chldrn.value:name,
//chldrn.value:gender,
//chldrn.value:age,
//city.value:place,
//yr.value:yearsLived
from my_json emp , lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children) chldrn
//lateral flatten(input=>empd.value:citiesLived) city,
//lateral flatten(input=>city.value:yearsLived) yr
You need to use the OUTER switch:
FLATTEN
OUTER => TRUE | FALSE
If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output.
If TRUE, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).
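To see the effect in isolation, here is a minimal self-contained sketch (the inline PARSE_JSON literal is only an illustration, not your table):

```sql
-- A record with no "children" key: FLATTEN alone emits zero rows for it,
-- but OUTER => TRUE keeps one row with NULL in KEY, INDEX, and VALUE.
select f.key, f.index, f.value
from (select parse_json('{"fullName": "Anna"}') as doc) t,
     lateral flatten(input => t.doc:children, outer => true) f;
```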
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children,
chldrn.value:name,
chldrn.value:gender,
chldrn.value:age,
city.value:place,
yr.value:yearsLived
from my_json emp,
lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children, OUTER => TRUE) chldrn, -- <HERE>
lateral flatten(input=>empd.value:citiesLived) city,
lateral flatten(input=>city.value:yearsLived) yr
I'm getting data in a specific way from an API and I have to convert it to a cleaner version of it.
What I get from the API is a JSON like this (you can see that some information is duplicated in the first fields but the investor is different):
{
"clubhouse": [
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1234",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
}
]
},
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "4321",
"gender": "02"
},
"inamount": "1700000",
"ratio": "12"
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1333",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
}
]
},
{
"id": "03",
"statusId": "ok",
"stateid": "5",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "",
"gender": ""
},
"inamount": "",
"ratio": ""
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1334",
"gender": "02"
},
"inamount": "1900000",
"ratio": "12"
}
]
}
]
}
I need to merge the investors and eliminate the duplicated information, so the expected result will be:
{
"clubhouse": [
{
"id": "01",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1234",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
},
{
"investor": {
"id": "4321",
"gender": "02"
},
"inamount": "1700000",
"ratio": "12"
}
]
},
{
"id": "02",
"statusId": "ok",
"stateid": "2",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1333",
"gender": "01"
},
"inamount": "1500000",
"ratio": "12"
},
{
"investor": {
"id": "1334",
"gender": "02"
},
"inamount": "1900000",
"ratio": "12"
}
]
},
{
"id": "03",
"statusId": "ok",
"stateid": "5",
"TypeId": "3",
"investors": [
{
"investor": {
"id": "1555",
"gender": "01"
},
"inamount": "2000000",
"ratio": "15"
}
]
}
]
}
I tried a couple of JOLT specs and managed to merge the fields, but not to eliminate the duplicates.
You can start by grouping by the id values, such as:
[
{
// group by "id" values to create separate objects
"operation": "shift",
"spec": {
"*": {
"*": {
"*": "#(1,id).&",
"investors": {
"*": {
"*": {
"#": "#(4,id).&3[&4].&" // &3 -> going 3 levels up to grab literal "investors", [&4] -> going 4 levels up the tree in order to reach the indexes of "clubhouse" array, & -> replicate the leaf node values for the current key-value pair
}
}
}
}
}
}
},
{
// get rid of "null" values
"operation": "modify-overwrite-beta",
"spec": {
"*": "=recursivelySquashNulls"
}
},
{
// pick only the first components from the repeated values populated within the arrays
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE",
"investors": "MANY"
}
}
},
{
// get rid of object labels
"operation": "shift",
"spec": {
"*": ""
}
}
]
I need to run a query that joins documents from two collections. I wrote an aggregation query, but it takes too much time when running against the production database with many documents. Is there any way to write this query more efficiently?
Query in Mongo playground: https://mongoplayground.net/p/dLb3hsJHNYt
There are two collections users and activities. I need to run a query to get some users (from users collection), and also their last activity (from activities collection).
Database:
db={
"users": [
{
"_id": 1,
"email": "user1#gmail.com",
"username": "user1",
"country": "BR",
"creation_date": 1646873628
},
{
"_id": 2,
"email": "user2#gmail.com",
"username": "user2",
"country": "US",
"creation_date": 1646006402
}
],
"activities": [
{
"_id": 1,
"email": "user1#gmail.com",
"activity": "like",
"timestamp": 1647564787
},
{
"_id": 2,
"email": "user1#gmail.com",
"activity": "comment",
"timestamp": 1647564834
},
{
"_id": 3,
"email": "user2#gmail.com",
"activity": "like",
"timestamp": 1647564831
}
]
}
Inefficient Query:
db.users.aggregate([
{
// Get users using some filters
"$match": {
"$expr": {
"$and": [
{ "$not": { "$in": [ "$country", [ "AR", "CA" ] ] } },
{ "$gte": [ "$creation_date", 1646006400 ] },
{ "$lte": [ "$creation_date", 1648684800 ] }
]
}
}
},
{
// Get the last activity within the time range
"$lookup": {
"from": "activities",
"as": "last_activity",
"let": { "cur_email": "$email" },
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{ "$eq": [ "$email", "$$cur_email" ] },
{ "$gte": [ "$timestamp", 1647564787 ] },
{ "$lte": [ "$timestamp", 1647564834 ] }
]
}
}
},
{ "$sort": { "timestamp": -1 } },
{ "$limit": 1 }
]
}
},
{
// Remove users with no activity
"$match": {
"$expr": {
"$gt": [ { "$size": "$last_activity" }, 0 ] }
}
}
])
Result:
[
{
"_id": 1,
"country": "BR",
"creation_date": 1.646873628e+09,
"email": "user1#gmail.com",
"last_activity": [
{
"_id": 2,
"activity": "comment",
"email": "user1#gmail.com",
"timestamp": 1.647564788e+09
}
],
"username": "user1"
},
{
"_id": 2,
"country": "US",
"creation_date": 1.646006402e+09,
"email": "user2#gmail.com",
"last_activity": [
{
"_id": 3,
"activity": "like",
"email": "user2#gmail.com",
"timestamp": 1.647564831e+09
}
],
"username": "user2"
}
]
I'm more familiar with relational databases, so I'm struggling a little to run this query efficiently.
Thanks!
I have a backend that returns unstructured data (another dev is responsible for the backend) and I have no idea what the most appropriate way to render it is. Any ideas?
What I have already tried is rendering it with the library react-json-view, but it's not very user friendly.
This is an example of the data I receive:
[
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1639677563,
"data": {
"msisdn": "571345543122"
},
"planName": "PRE_PAGO",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#MI_tienda#backofficeco#cbb1efe963",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"typeItem": "MSISDN",
"createdDate": 1644521244,
"data": {
"MSISDN": "asdfk"
},
"backendName": "adfs;fk",
"pk": "#CO#MSISDN#asdf#adfs;fk#7578238817",
"country": "CO",
"resourceGroup": "asdf"
},
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1644940771,
"data": {
"msisdn": "3007279930"
},
"planName": "POS_PAGO",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#MI_tienda#backofficeco#25831ae7cf",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"conditions": [
"SIN_SALDO"
],
"typeItem": "MSISDN",
"createdDate": 1644420646,
"data": {
"msisdn": "571345543122"
},
"planName": "adfasdf",
"backendName": "backofficeco",
"pk": "#CO#MSISDN#asdfasdf#backofficeco#c30d28f552",
"country": "CO",
"resourceGroup": "MI_tienda"
},
{
"typeItem": "MSISDN",
"createdDate": 1644525223,
"data": {
"MSISDN": "asdfasd"
},
"backendName": "asdfasdf",
"pk": "#CO#MSISDN#asdfasdf#asdfasdf#02ac5aa61b",
"country": "CO",
"resourceGroup": "asdfasdf"
},
{
"conditions": [
"adsfas"
],
"typeItem": "MSISDN",
"createdDate": 1646230406,
"data": {
"msisdn": "571345543122"
},
"planName": "adfasdf",
"backendName": "backofficeco",
"ttl": 1646835206,
"pk": "#CO#MSISDN#MI_tienda#backofficeco#cd40ee06af",
"country": "CO",
"resourceGroup": "adsfa"
}
]
Assuming you just want to render the list, you can try creating a map based on some key (maybe 'pk') and pass it on to, say, a grid.
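As a rough sketch of that suggestion (field names are taken from the sample payload above; the grid itself is whatever component you use):

```javascript
// Sketch: key the unstructured records by 'pk' and flatten them into
// plain { column: value } rows that a grid component can consume.
const records = [
  { pk: "#CO#MSISDN#MI_tienda#backofficeco#cbb1efe963", country: "CO", resourceGroup: "MI_tienda" },
  { pk: "#CO#MSISDN#asdf#adfs;fk#7578238817", country: "CO", resourceGroup: "asdf" },
];

// A Map keyed by pk gives O(1) lookup and a stable per-row key for React.
const byPk = new Map(records.map((rec) => [rec.pk, rec]));

// Rows for the grid: rename pk to 'key' and keep the remaining columns as-is.
const gridRows = [...byPk.values()].map(({ pk, ...rest }) => ({ key: pk, ...rest }));
```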
I'm trying to flatten the JSON data below in Snowflake:
JSON data:
{
"empDetails": [
{
"kind": "person",
"fullName": "John Doe",
"age": 22,
"gender": "Male",
"phoneNumber": {
"areaCode": "206",
"number": "1234567"
},
"children": [
{
"name": "Jane",
"gender": "Female",
"age": "6"
},
{
"name": "John",
"gender": "Male",
"age": "15"
}
],
"citiesLived": [
{
"place": "Seattle",
"yearsLived": [
"1995"
]
},
{
"place": "Stockholm",
"yearsLived": [
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Mike Jones",
"age": 35,
"gender": "Male",
"phoneNumber": {
"areaCode": "622",
"number": "1567845"
},
"children": [
{
"name": "Earl",
"gender": "Male",
"age": "10"
},
{
"name": "Sam",
"gender": "Male",
"age": "6"
},
{
"name": "Kit",
"gender": "Male",
"age": "8"
}
],
"citiesLived": [
{
"place": "Los Angeles",
"yearsLived": [
"1989",
"1993",
"1998",
"2002"
]
},
{
"place": "Washington DC",
"yearsLived": [
"1990",
"1993",
"1998",
"2008"
]
},
{
"place": "Portland",
"yearsLived": [
"1993",
"1998",
"2003",
"2005"
]
},
{
"place": "Austin",
"yearsLived": [
"1973",
"1998",
"2001",
"2005"
]
}
]
},
{
"kind": "person",
"fullName": "Anna Karenina",
"age": 45,
"gender": "Female",
"phoneNumber": {
"areaCode": "425",
"number": "1984783"
},
"citiesLived": [
{
"place": "Stockholm",
"yearsLived": [
"1992",
"1998",
"2000",
"2010"
]
},
{
"place": "Russia",
"yearsLived": [
"1998",
"2001",
""
]
},
{
"place": "Austin",
"yearsLived": [
"1995",
"1999"
]
}
]
}
]
}
I'm able to flatten most of the data except for the column/array yearsLived;
for that last column I'm getting null values.
Below is what I have tried so far:
select empd.value:kind,
empd.value:fullName,
empd.value:age,
empd.value:gender,
empd.value:phoneNumber,
empd.value:phoneNumber.areaCode,
empd.value:phoneNumber.number ,
empd.value:children,
chldrn.value:name,
chldrn.value:gender,
chldrn.value:age,
city.value:place,
yr.value:yearsLived
from my_json emp,
lateral flatten(input=>emp.Json_data:empDetails) empd ,
lateral flatten(input=>empd.value:children, OUTER => TRUE) chldrn,
lateral flatten(input=>empd.value:citiesLived) city,
lateral flatten(input=>city.value:yearsLived) yr -- not getting data for this array
Can someone help me understand why I'm getting null values for the yearsLived array? I'm not sure if I'm missing anything here.
Your query returns the column
yr.value:yearsLived
as if yr.value were an OBJECT with fields.
But you have already expanded the yearsLived field in the line
lateral flatten(input=>city.value:yearsLived) yr
so yr.value is really just a VARIANT containing the year. You can leave it as such, or wrap it in TO_NUMBER or TO_VARCHAR to get a more precise type.
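A sketch of how the tail of the query could look with that cast applied (same table and column names as the question; TRY_TO_NUMBER is used defensively since one yearsLived entry in the sample is an empty string):

```sql
select empd.value:fullName::string      as full_name,
       city.value:place::string        as place,
       try_to_number(yr.value::string) as year_lived  -- yr.value is already the year
from my_json emp,
     lateral flatten(input => emp.Json_data:empDetails) empd,
     lateral flatten(input => empd.value:citiesLived)   city,
     lateral flatten(input => city.value:yearsLived)    yr;
```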
Why don't you try this out:
create or replace table json_tab as
select parse_json('{ "place": "Austin","yearsLived": [ "1995","1999"]}') as years;
select years:yearsLived[0]::int from json_tab;
Since your JSON data is an array, you need to access the elements by index if you want specific values, or use an array function to explode it.
With the FLATTEN function:
select years, v.value::string
from json_tab,
lateral flatten(input =>years:yearsLived ) v;
I would appreciate any help turning a multi-faceted CSV (with header) into JSON I can use for a POST. Here are the CSV and the required JSON format. Ultimately, I need to create a single location that may have 1, 2, or many fax machines.
csv
ID,Location Name,timezone,code,display,address,city,state,postalcode,country,faxlocation,faxnumber,(in)active
5,Location1,America/Chicago,bu,Building,1313 Mocking Bird Lane,The City,IL,999999,USA,Room 1; Room 2,111111111; 2222222222,active
8,Location2,America/New_York,bu,Building,2626 Humpty Dumpty Lane,Another City,NY,999999,USA,Room 1; Room 2; Room 3,111111111; 2222222222; 3333333333,active
32,Location3,America/Los_Angeles,bu,Building,3939 Big Bird Lane,Last City,CA,999999,USA,Room 1,111111111,active
json
{
"resourceType": "Location",
"id": "5",
"description": "America/Chicago",
"name": "Location1",
"address": {
"address": "1313 Mocking Bird Lane",
"city": "The City",
"state": "IL",
"postalCode": "999999",
"country": "USA"
},
"telecom": [
{
"system": "fax",
"value": "1111111111",
"use": "work",
"extension": [
{
"url": "displayValue",
"valueString": "Room 1"
}
]
},
{
"system": "fax",
"value": "2222222222",
"use": "work",
"extension": [
{
"url": "displayValue",
"valueString": "Room 2"
}
]
}
],
"status": "active"
}
All the necessary requires are in place; here's what I have so far:
locations = CSV.read(csv)
locations.shift # remove header row
locations.each_index do |index|
faxnumarray = locations[index][11].to_s.delete(' ').split(';') # take index 11 (faxnumber) and turn it into an array
faxlocarray = locations[index][10].to_s.delete(' ').split(';') # take index 10 (faxlocation) and turn it into an array
# keys = ['fax1','fax2']
# values = ['loc1','loc2']
my_hash = Hash[faxnumarray.zip(faxlocarray)] # combine fax numbers and locations
my_hash.each do |key, value|
#fax_hash = { 'system' => 'fax', 'value' => key, 'use' => 'work', 'extension' => [{ 'url' => 'automaticSend', 'valueBoolean' => false }, { 'url' => 'displayValue', 'valueString' => value }] } #Build up a hash using k,v. ISSUE: Only creates one fax. Need to put all faxes associated no matter how many per location
end
end
Instead of reading in the entire CSV as an array of arrays, you can instead open
or create a new CSV instance and pass in an option that converts the first line
into headers, so you can access columns within a row more easily; e.g.
locations = CSV.open(csv, headers: true)
location = locations.first
# => #<CSV::Row "ID":"5" "Location Name":"Location1" "timezone":"America/Chicago" "code":"bu" "display":"Building" "address":"1313 Mocking Bird Lane" "city":"The City" "state":"IL" "postalcode":"999999" "country":"USA" "faxlocation":"Room 1; Room 2" "faxnumber":"111111111; 2222222222" "(in)active":"active">
location["Location Name"]
# => "Location1"
With the headers as keys, we can then build a new Hash object from
those values, whilst also building the collection you had started in your
question example:
locations = CSV.open(csv, headers: true)
results = locations.map do |location|
faxnumarray = location["faxnumber"].delete(' ').split(';') # turn the faxnumber column into an array
faxlocarray = location["faxlocation"].delete(' ').split(';') # turn the faxlocation column into an array
my_hash = Hash[faxnumarray.zip(faxlocarray)] # combine fax numbers (keys) and locations (values)
{
"resourceType" => "Location",
"id" => location["ID"],
"description" => location["timezone"],
"name" => location["Location Name"],
"address" => {
"address" => location["address"],
"city" => location["city"],
"state" => location["state"],
"postalCode" => location["postalcode"],
"country" => location["country"]
},
"telecom" => my_hash.map do |key, value|
{
'system' => 'fax',
'value' => key,
'use' => 'work',
'extension' => [
{
'url' => 'automaticSend',
'valueBoolean' => false
},
{
'url' => 'displayValue',
'valueString' => value
}
]
}
end,
"status" => location["(in)active"]
}
end
results.to_json
I reused your code where I could and made the last object (the returned object)
within the map block resemble the requested Hash result object. Using map
instead of each means it
Returns a new array with the results of running block once for every element in enum.
-- Module: Enumerable (Ruby 2_4_0)
This produces the following result
[
{
"resourceType": "Location",
"id": "5",
"description": "America/Chicago",
"name": "Location1",
"address": {
"address": "1313 Mocking Bird Lane",
"city": "The City",
"state": "IL",
"postalCode": "999999",
"country": "USA"
},
"telecom": [
{
"system": "fax",
"value": "111111111",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": "Room 1"
}
]
},
{
"system": "fax",
"value": "2222222222",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": " Room 2"
}
]
}
],
"status": "active"
},
{
"resourceType": "Location",
"id": "8",
"description": "America/New_York",
"name": "Location2",
"address": {
"address": "2626 Humpty Dumpty Lane",
"city": "Another City",
"state": "NY",
"postalCode": "999999",
"country": "USA"
},
"telecom": [
{
"system": "fax",
"value": "111111111",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": "Room 1"
}
]
},
{
"system": "fax",
"value": "2222222222",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": " Room 2"
}
]
},
{
"system": "fax",
"value": "3333333333",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": " Room 3"
}
]
}
],
"status": "active"
},
{
"resourceType": "Location",
"id": "32",
"description": "America/Los_Angeles",
"name": "Location3",
"address": {
"address": "3939 Big Bird Lane",
"city": "Last City",
"state": "CA",
"postalCode": "999999",
"country": "USA"
},
"telecom": [
{
"system": "fax",
"value": "111111111",
"use": "work",
"extension": [
{
"url": "automaticSend",
"valueBoolean": false
},
{
"url": "displayValue",
"valueString": "Room 1"
}
]
}
],
"status": "active"
}
]
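On the map versus each point above, the difference is easy to see in a tiny standalone example:

```ruby
# map returns a NEW array built from the block's return values;
# each runs the block for its side effects and returns the original receiver.
numbers = [1, 2, 3]

doubled = numbers.map  { |n| n * 2 } # => [2, 4, 6]
same    = numbers.each { |n| n * 2 } # => [1, 2, 3] (numbers itself)
```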
References:
Class: CSV (Ruby 2_4_0)