With standard fields, like id, this works perfectly. But I am not finding a way to parse the custom fields where the structure is
"custom_fields": [
{
"id": 57852188,
"value": ""
},
{
"id": 57522467,
"value": ""
},
{
"id": 57522487,
"value": ""
}
]
The general format that I have been using is:
Select v:id,v:updatedat
from zd_tickets
updated data:
{
"id":151693,
"brand_id": 36000,
"created_at": "2022-0523T19:26:35Z",
"custom_fields": [
{ "id": 57866008, "value": false },
{ "id": 360022282754, "value": "" },
{ "id": 80814087, "value": "NC" } ],
"group_id": 36000770
}
If you want to select all repeating elements you will need to use FLATTEN, otherwise you can use standard notation. This is all documented here: https://docs.snowflake.com/en/user-guide/querying-semistructured.html#retrieving-a-single-instance-of-a-repeating-element
So, using this CTE to access the data in a way that looks like a table:
with data(json) as (
select parse_json(column1) from values
('{
"id":151693,
"brand_id": 36000,
"created_at": "2022-0523T19:26:35Z",
"custom_fields": [
{ "id": 57866008, "value": false },
{ "id": 360022282754, "value": "" },
{ "id": 80814087, "value": "NC" } ],
"group_id": 36000770
} ')
)
This SQL unpacks the top-level items, which you have shown you already have working:
select
json:id::number as id
,json:brand_id::number as brand_id
,try_to_timestamp(json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,json:custom_fields as custom_fields
from data;
gives:
ID | BRAND_ID | CREATED_AT | CUSTOM_FIELDS
151693 | 36000 | 2022-05-23 19:26:35.000 | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
So now, how do we tackle that JSON array of custom_fields?
Well, if you only ever have 3 values and the order is always the same, you can index into the array directly:
select
to_array(json:custom_fields) as custom_fields_a
,custom_fields_a[0] as field_0
,custom_fields_a[1] as field_1
,custom_fields_a[2] as field_2
from data;
gives:
CUSTOM_FIELDS_A | FIELD_0 | FIELD_1 | FIELD_2
[ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] | { "id": 57866008, "value": false } | { "id": 360022282754, "value": "" } | { "id": 80814087, "value": "NC" }
Otherwise, we can use FLATTEN to access those objects, which produces one row per array element:
select
d.json:id::number as id
,d.json:brand_id::number as brand_id
,try_to_timestamp(d.json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,f.*
from data as d
,table(flatten(input=>json:custom_fields)) f
ID | BRAND_ID | CREATED_AT | SEQ | KEY | PATH | INDEX | VALUE | THIS
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 |  | [0] | 0 | { "id": 57866008, "value": false } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 |  | [1] | 1 | { "id": 360022282754, "value": "" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 |  | [2] | 2 | { "id": 80814087, "value": "NC" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
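For readers who think more easily in application code, the row multiplication that FLATTEN performs can be modelled in plain JavaScript. This is only a sketch: flattenRows is a made-up helper name, not a Snowflake function, and it reproduces just the SEQ, INDEX and VALUE columns from the output above.

```javascript
// Plain-JavaScript model of LATERAL FLATTEN over an array column.
// `flattenRows` is a hypothetical helper, not part of any Snowflake API.
function flattenRows(rows, pickArray) {
  return rows.flatMap((row, i) =>
    (pickArray(row) ?? []).map((value, index) => ({
      seq: i + 1,  // one sequence number per input row
      index,       // position of the element inside the array
      value,       // the array element itself
      parent: row, // the input row, repeated for every element
    }))
  );
}

const sampleTickets = [
  {
    id: 151693,
    brand_id: 36000,
    custom_fields: [
      { id: 57866008, value: false },
      { id: 360022282754, value: "" },
      { id: 80814087, value: "NC" },
    ],
  },
];

const flatRows = flattenRows(sampleTickets, (t) => t.custom_fields);
console.log(flatRows.map((r) => [r.parent.id, r.seq, r.index, r.value.id]));
```

An input row whose array is empty or missing contributes no output rows in this model.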
So we can pull out known values (a manual PIVOT):
select
d.json:id::number as id
,d.json:brand_id::number as brand_id
,try_to_timestamp(d.json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,max(iff(f.value:id=80814087, f.value:value::text, null)) as v80814087
,max(iff(f.value:id=360022282754, f.value:value::text, null)) as v360022282754
,max(iff(f.value:id=57866008, f.value:value::text, null)) as v57866008
from data as d
,table(flatten(input=>json:custom_fields)) f
group by 1,2,3, f.seq
Grouping by f.seq means that if you have many input rows, they are kept apart even if they share common values for columns 1, 2 and 3.
gives:
ID | BRAND_ID | CREATED_AT | V80814087 | V360022282754 | V57866008
151693 | 36000 | 2022-05-23 19:26:35.000 | NC | <empty string> | false
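The manual pivot above can also be sketched outside SQL. The snippet below is an illustration under assumptions, not the answer's method: pivotCustomFields is a hypothetical helper, and it takes the first element with a matching id, whereas the SQL takes max() over all matching elements.

```javascript
// Hypothetical helper: turn a known list of custom-field ids into one property each.
function pivotCustomFields(ticket, knownIds) {
  const out = { id: ticket.id, brand_id: ticket.brand_id };
  for (const fieldId of knownIds) {
    const hit = (ticket.custom_fields ?? []).find((f) => f.id === fieldId);
    // Mirror value::text: booleans become "false"/"true"; a missing id stays null.
    out[`v${fieldId}`] =
      hit === undefined || hit.value === null ? null : String(hit.value);
  }
  return out;
}

const sampleTicket = {
  id: 151693,
  brand_id: 36000,
  custom_fields: [
    { id: 57866008, value: false },
    { id: 360022282754, value: "" },
    { id: 80814087, value: "NC" },
  ],
};

const pivoted = pivotCustomFields(sampleTicket, [80814087, 360022282754, 57866008]);
console.log(pivoted);
```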
Now, if you do not know the names of the values, there is no way short of dynamic SQL and double parsing to turn rows into columns.
I ended up doing the following, with 2 different CTEs (CTE and UCF):
Used to_array to gather my custom fields.
Unioned the array elements into two fields, one for the id of the custom field and one for its value, using combinations of substring, position and replace to clean up the data as needed (same setup for all fields).
Joined the resulting data to a Custom Fields table (which contains the id and a name) to include the name of the custom field in my result set.
WITH UCF AS (--Union Gathered Array into 2 fields (an id field and a value field)
WITH CTE AS( ---Gather array of custom fields
SELECT v:id as id,
to_array(v:custom_fields) as cf
,cf[0] as f0,cf[1] as f1,cf[2] as f2
FROM ZD_TICKETS)
SELECT id,
substring(f0,7,position(',',f0)-7) AS cf_id, REPLACE(substring(f0,position('value":',f0)+8,position('"',f0,position('value":',f0)+8)),'"}') AS cf_value
FROM CTE c
WHERE f0 not like '%null%'
UNION
SELECT id,
substring(f1,7,position(',',f1)-7) AS cf_id,
REPLACE(substring(f1,position('value":',f1)+8,position('"',f1,position('value":',f1)+8)),'"}') AS cf_value
FROM CTE c
WHERE f1 not like '%null%'
-- field 3
UNION
SELECT id,
substring(f2,7,position(',',f2)-7) AS cf_id,
REPLACE(substring(f2,position('value":',f2)+8,position('"',f2,position('value":',f2)+8)),'"}') AS cf_value
FROM CTE c
WHERE f2 not like '%null%' --this removes records where the value is null
)
SELECT UCF.*,CFD.name FROM UCF
LEFT OUTER JOIN "FLBUSINESS_DB"."STAGING"."FILE_ZD_CUSTOM_FIELD_IDS" CFD
ON CFD.id=UCF.cf_id
WHERE cf_value<>'' --this removes records where the value is blank
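The same shape of result (one row per ticket and custom field, blanks and nulls dropped, field name looked up by id) can be sketched without string slicing. This is an illustration only: unpivotCustomFields is a hypothetical helper, and the name "State" in the lookup is invented to stand in for the custom-field-ids table.

```javascript
// Invented lookup standing in for the custom field ids table (id -> name).
const fieldNames = new Map([[80814087, "State"]]);

// Hypothetical helper: one output row per non-empty custom field.
function unpivotCustomFields(tickets, names) {
  const rows = [];
  for (const t of tickets) {
    for (const f of t.custom_fields ?? []) {
      if (f.value === null || f.value === "") continue; // drop null and blank values
      rows.push({
        id: t.id,
        cf_id: f.id,
        cf_value: f.value,
        name: names.get(f.id) ?? null, // left-join behaviour: unknown ids get null
      });
    }
  }
  return rows;
}

const zdTickets = [
  {
    id: 151693,
    custom_fields: [
      { id: 57866008, value: false },
      { id: 360022282754, value: "" },
      { id: 80814087, value: "NC" },
    ],
  },
];

const unpivoted = unpivotCustomFields(zdTickets, fieldNames);
console.log(unpivoted);
```

Unlike the fixed f0, f1, f2 columns, the loop handles any number of custom fields per ticket.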
There is a Variant field "events" that is made up of an Array storing Objects(aka dictionaries, key-value pairs), as per the below:
[
{ "field_name": "status", "id": 987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
{ "field_name": "360020024138", "id": 987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
{ "field_name": "360016698218", "id": 987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
{ "field_name": "360016774537", "id": 987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
]
This field belongs to an event log, and I am trying to use the content of "events" as a filter to get the related timestamps and ids.
Through the Snowflake documentation on Flatten I found out that the recursive => True parameter allows me to expand the Variant all the way down to its nested objects, but with hopes of optimising code, I wanted to use the path parameter, to selectively expand "events" only for the Objects I was interested in.
However, for some reason, Flatten does not allow me to pass a numeric path to identify the Array index of Object that I want to expand, as:
select b.*
from "event_log" a
,lateral flatten (input => a."events", path => 0) b limit 100;
returns: invalid type [NUMBER(1,0)] for parameter 'path'
and
select b.*
from "event_log" a
, lateral flatten (input => a."events", path => [0]) b limit 100;
returns: Syntax error: unexpected '['. (line 162)
Ironically, when using recursive => True, the b.path field represents indexes like this: [i].
The example in the Snowflake docs demonstrates the path parameter with an Object that stores Arrays, whereas here "events" is an Array of Objects, so I do not have a working example for this type of Variant.
The array index should be applied in the input expression rather than passed as path:
select b.*
from "event_log" a,
lateral flatten (input => a."events"[1]) b
limit 100;
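To picture what that query returns, here is a rough JavaScript model of selecting one array element and then expanding it into key/value rows. It is a sketch under assumptions: flattenObject is a made-up helper, and the key order shown is JavaScript's insertion order, which may differ from the order Snowflake returns.

```javascript
const events = [
  { field_name: "status", id: 987418431597, previous_value: "new", type: "Change", value: "pending" },
  { field_name: "360020024138", id: 987418431617, previous_value: null, type: "Change", value: "#55927" },
];

// Hypothetical helper: expand one object into { key, value } rows.
function flattenObject(obj) {
  return Object.entries(obj).map(([key, value]) => ({ key, value }));
}

// Pick the element first (events[1]), then expand only that object.
const secondEventRows = flattenObject(events[1]);
console.log(secondEventRows);
```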
Sample:
CREATE OR REPLACE TABLE "event_log"("events" VARIANT)
AS
-- PARSE_JSON stores the text as an array instead of a VARIANT holding a plain string
SELECT PARSE_JSON('[ { "field_name": "status", "id": 987418431597, "previous_value": "new", "type": "Change", "value": "pending" }, { "field_name": "360020024138", "id": 987418431617, "previous_value": null, "type": "Change", "value": "#55927" }, { "field_name": "360016698218", "id": 987418431637, "previous_value": null, "type": "Change", "value": "0681102386" }, { "field_name": "360016774537", "id": 987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212" } ]');
So you have an array in each row. The array has many objects with the same "structure". Getting the ID makes sense as there is one in every object.
So you could just access it.
select a."events"[0]:id::number as id
from "event_log" as a
limit 100;
gives:
A."EVENTS"[0]:ID
987418431597
But given that each object's ID looks to be different, how do you know you are getting the correct ID? It would seem to make more sense to use flatten to unroll the array and access the object elements.
So with this input of 2 rows, each an array of 4 objects:
with "event_log"("events") as (
select parse_json(column1) from values
('[
{ "field_name": "status", "id": 0987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
{ "field_name": "360020024138", "id": 0987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
{ "field_name": "360016698218", "id": 0987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
{ "field_name": "360016774537", "id":0987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
]'),
('[
{ "field_name": "status", "id": 1987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
{ "field_name": "360020024138", "id":1987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
{ "field_name": "360016698218", "id": 1987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
{ "field_name": "360016774537", "id":1987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
]')
)
select b.seq as input_row
,b.index as array_index
,b.value:field_name::text as field_name
,b.value:id::number as id
,b.value:previous_value::text as previous_value
,b.value:type::text as type
,b.value:value::text as value
from "event_log" as a
,lateral flatten(input=>a."events") b
;
we get:
INPUT_ROW | ARRAY_INDEX | FIELD_NAME | ID | PREVIOUS_VALUE | TYPE | VALUE
1 | 0 | status | 987418431597 | new | Change | pending
1 | 1 | 360020024138 | 987418431617 |  | Change | #55927
1 | 2 | 360016698218 | 987418431637 |  | Change | 0681102386
1 | 3 | 360016774537 | 987418431657 |  | Change | 89367031562011632212
2 | 0 | status | 1987418431597 | new | Change | pending
2 | 1 | 360020024138 | 1987418431617 |  | Change | #55927
2 | 2 | 360016698218 | 1987418431637 |  | Change | 0681102386
2 | 3 | 360016774537 | 1987418431657 |  | Change | 89367031562011632212
How can I get the data out of this array stored in a variant column in Snowflake. I don't care if it's a new table, a view or a query. There is a second column of type varchar(256) that contains a unique ID.
If you can just help me read the "confirmed" data and the "editorIds" data I can probably take it from there. Many thanks!
Output example would be
UniqueID ConfirmationID EditorID
u3kd9 xxxx-436a-a2d7 nupd
u3kd9 xxxx-436a-a2d7 9l34c
R3nDo xxxx-436a-a3e4 5rnj
yP48a xxxx-436a-a477 jTpz8
yP48a xxxx-436a-a477 nupd
[
{
"confirmed": {
"Confirmation": "Entry ID=xxxx-436a-a2d7-3525158332f0: Confirmed order submitted.",
"ConfirmationID": "xxxx-436a-a2d7-3525158332f0",
"ConfirmedOrders": 1,
"Received": "8/29/2019 4:31:11 PM Central Time"
},
"editorIds": [
"xxsJYgWDENLoX",
"JR9bWcGwbaymm3a8v",
"JxncJrdpeFJeWsTbT"
] ,
"id": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"messages": [],
"orderJson": {
"EntryID": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"Orders": [
{
"DropShipFlag": 1,
"FromAddressValue": 1,
"OrderAttributes": [
{
"AttributeUID": 548
},
{
"AttributeUID": 553
},
{
"AttributeUID": 2418
}
],
"OrderItems": [
{
"EditorId": "aC3f5HsJYgWDENLoX",
"ItemAssets": [
{
"AssetPath": "https://xxxx573043eac521.png",
"DP2NodeID": "10000",
"ImageHash": "000000000000000FFFFFFFFFFFFFFFFF",
"ImageRotation": 0,
"OffsetX": 50,
"OffsetY": 50,
"PrintedFileName": "aC3f5HsJYgWDENLoX-10000",
"X": 50,
"Y": 52.03909266409266,
"ZoomX": 100,
"ZoomY": 93.75
}
],
"ItemAttributes": [
{
"AttributeUID": 2105
},
{
"AttributeUID": 125
}
],
"ItemBookAttribute": null,
"ProductUID": 52,
"Quantity": 1
}
],
"SendNotificationEmailToAccount": true,
"SequenceNumber": 1,
"ShipToAddress": {
"Addr1": "Addr1",
"Addr2": "0",
"City": "City",
"Country": "US",
"Name": "Name",
"State": "ST",
"Zip": "00000"
}
}
]
},
"orderNumber": null,
"status": "order_placed",
"submitted": {
"Account": "350000",
"ConfirmationID": "xxxxx-436a-a2d7-3525158332f0",
"EntryID": "xxxxx-5AvGgeSHy8Ms6Ytyc-1",
"Key": "D83590AFF0CC0000B54B",
"NumberOfOrders": 1,
"Orders": [
{
"LineItems": [],
"Note": "",
"Products": [
{
"Price": "00.30",
"ProductDescription": "xxxxxint 8x10",
"Quantity": 1
},
{
"Price": "00.40",
"ProductDescription": "xxxxxut Black 8x10",
"Quantity": 1
},
{
"Price": "00.50",
"ProductDescription": "xxxxx"
},
{
"Price": "00.50",
"ProductDescription": "xxxscount",
"Quantity": 1
}
],
"SequenceNumber": "1",
"SubTotal": "00.70",
"Tax": "1.01",
"Total": "00.71"
}
],
"Received": "8/29/2019 4:31:10 PM Central Time"
},
"tracking": null,
"updatedOn": 1.598736670503000e+12
}
]
So, this is how I'd query that exact JSON assuming the data is in column var in table x:
SELECT x.var[0]:confirmed:ConfirmationID::varchar as ConfirmationID,
f.value::varchar as EditorID
FROM x,
LATERAL FLATTEN(input => var[0]:editorIds) f
;
Since your sample output doesn't match the JSON that you provided, I will assume that this is what you need.
Also, as a note, your JSON includes outer [ ] which indicates that the entire JSON string is inside an array. This is the reason for var[0] in my query. If you have multiple records inside that array, then you should remove that. In general, you should exclude those and instead load each record into the table separately. I wasn't sure whether you could make that change, so I just wanted to make note.
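If the outer array can hold more than one record, the same extraction can be pictured as a loop over every record rather than a fixed [0]. The sketch below is plain JavaScript on a trimmed copy of the sample; confirmationEditorRows and the uniqueId property are hypothetical names chosen for illustration.

```javascript
// One table row: a unique id plus the parsed variant (an array of records).
const tableX = [
  {
    uniqueId: "u3kd9", // hypothetical name for the varchar(256) column
    variant: [
      {
        confirmed: { ConfirmationID: "xxxx-436a-a2d7-3525158332f0" },
        editorIds: ["xxsJYgWDENLoX", "JR9bWcGwbaymm3a8v", "JxncJrdpeFJeWsTbT"],
      },
    ],
  },
];

// Hypothetical helper: one output row per (record, editor id) pair.
function confirmationEditorRows(rows) {
  return rows.flatMap((row) =>
    row.variant.flatMap((rec) =>
      (rec.editorIds ?? []).map((editorId) => ({
        UniqueID: row.uniqueId,
        ConfirmationID: rec.confirmed?.ConfirmationID ?? null,
        EditorID: editorId,
      }))
    )
  );
}

const editorRows = confirmationEditorRows(tableX);
console.log(editorRows);
```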
I need to write a SQL query in the CosmosDB query editor, that will fetch results from JSON documents stored in Collection, as per my requirement shown below
The example JSON
{
"id": "abcdabcd-1234-1234-1234-abcdabcdabcd",
"source": "Example",
"data": [
{
"Laptop": {
"New": "yes",
"Used": "no",
"backlight": "yes",
"warranty": "yes"
}
},
{
"Mobile": [
{
"order": 1,
"quantity": 2,
"price": 350,
"color": "Black",
"date": "07202019"
},
{
"order": 2,
"quantity": 1,
"price": 600,
"color": "White",
"date": "07202019"
}
]
},
{
"Accessories": [
{
"covers": "yes",
"cables": "few"
}
]
}
]
}
Requirement:
SELECT 'warranty' (Laptop), 'quantity' (Mobile), 'color' (Mobile), 'cables' (Accessories) for a specific 'date' (for eg: 07202019)
I've tried the following query
SELECT
c.data[0].Laptop.warranty,
c.data[1].Mobile[0].quantity,
c.data[1].Mobile[0].color,
c.data[2].Accessories[0].cables
FROM c
WHERE ARRAY_CONTAINS(c.data[1].Mobile, {date : '07202019'}, true)
Original Output from above query:
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
}
]
But how can I get this Expected Output, that has all order details in the array 'Mobile':
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
},
{
"warranty": "yes",
"quantity": 1,
"color": "White",
"cables": "few"
}
]
Since I wrote c.data[1].Mobile[0].quantity i.e 'Mobile[0]' which is hard-coded, only one entry is returned in the output (i.e. the first one), but I want to have all the entries in the array to be listed out
Please consider using JOIN operator in your sql:
SELECT DISTINCT
c.data[0].Laptop.warranty,
mobile.quantity,
mobile.color,
c.data[2].Accessories[0].cables
FROM c
JOIN data in c.data
JOIN mobile in data.Mobile
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
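As a mental model, the two JOINs behave like nested loops inside a single document. The JavaScript below is a sketch of that reading, not how the engine is implemented; queryDocument is a hypothetical helper written against the example document.

```javascript
const exampleDoc = {
  id: "abcdabcd-1234-1234-1234-abcdabcdabcd",
  data: [
    { Laptop: { New: "yes", Used: "no", backlight: "yes", warranty: "yes" } },
    {
      Mobile: [
        { order: 1, quantity: 2, price: 350, color: "Black", date: "07202019" },
        { order: 2, quantity: 1, price: 600, color: "White", date: "07202019" },
      ],
    },
    { Accessories: [{ covers: "yes", cables: "few" }] },
  ],
};

// Hypothetical helper modelling: JOIN data IN c.data JOIN mobile IN data.Mobile.
function queryDocument(c, date) {
  const rows = [];
  for (const data of c.data) {
    const mobiles = Array.isArray(data.Mobile) ? data.Mobile : []; // no Mobile, no rows
    if (!mobiles.some((m) => m.date === date)) continue; // partial-match ARRAY_CONTAINS
    for (const mobile of mobiles) {
      rows.push({
        warranty: c.data[0].Laptop.warranty,
        quantity: mobile.quantity,
        color: mobile.color,
        cables: c.data[2].Accessories[0].cables,
      });
    }
  }
  // DISTINCT: keep the first occurrence of each identical row.
  const seen = new Set();
  return rows.filter((r) => {
    const key = JSON.stringify(r);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const joined = queryDocument(exampleDoc, "07202019");
console.log(joined);
```

Note that the date test asks whether any order in the array has that date, so every order of a matching array is kept rather than individual orders being filtered.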
Update Answer:
Your sql:
SELECT DISTINCT c.data[0].Laptop.warranty, mobile.quantity, mobile.color, accessories.cables FROM c
JOIN data in c.data JOIN mobile in data.Mobile
JOIN accessories in data.Accessories
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
My advice:
I have to say that a Cosmos DB JOIN operation is limited to the scope of a single document. What is possible is joining a parent object with its child objects within the same document; cross-document joins are NOT supported. However, your SQL tries to implement multiple parallel joins. In other words, Accessories and Mobile sit side by side at the same level of the data array rather than one being nested inside the other.
I suggest using a stored procedure to execute two SQL queries and then putting the results together. Or you could implement the above process in your application code.
Please see this case: CosmosDB Join (SQL API)
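The difference between the chained JOINs and combining results in code can be illustrated with plain JavaScript. This is a sketch of the reasoning under the nested-loop reading of JOIN, not of any Cosmos DB SDK call; chainedJoin and combineInCode are hypothetical helpers.

```javascript
const sampleDoc = {
  data: [
    { Laptop: { warranty: "yes" } },
    {
      Mobile: [
        { quantity: 2, color: "Black", date: "07202019" },
        { quantity: 1, color: "White", date: "07202019" },
      ],
    },
    { Accessories: [{ covers: "yes", cables: "few" }] },
  ],
};

// Chained JOINs read Mobile and Accessories from the SAME `data` element,
// so an element that holds only one of them contributes no rows.
function chainedJoin(c) {
  const rows = [];
  for (const data of c.data)
    for (const mobile of data.Mobile ?? [])
      for (const accessories of data.Accessories ?? [])
        rows.push({ quantity: mobile.quantity, color: mobile.color, cables: accessories.cables });
  return rows;
}

// Alternative: collect each sibling array separately, then combine in code.
function combineInCode(c) {
  const mobiles = c.data.flatMap((d) => d.Mobile ?? []);
  const accessories = c.data.flatMap((d) => d.Accessories ?? []);
  return mobiles.flatMap((m) =>
    accessories.map((a) => ({ quantity: m.quantity, color: m.color, cables: a.cables }))
  );
}

console.log(chainedJoin(sampleDoc).length);
console.log(combineInCode(sampleDoc));
```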
I want to group the data based on the type and type_id
Here is the array
var addArray = [
{
"id": 24,
"language_id": 3,
"type": "service",
"type_id": 2,
"key": "service seeker",
"value": " need service"
},
{
"id": 23,
"language_id": 3,
"type": "service",
"type_id": 2,
"key": "phone",
"value": "phone number"
},
{
"id": 24,
"language_id": 3,
"type": "service",
"type_id": 7,
"key": "tester",
"value": "service tester"
},
{
"id": 19,
"language_id": 3,
"type": "offer",
"type_id": 4,
"key": "source",
"value": "resource"
}
]
I have tried let result = _.groupBy(addArray,'type') it is grouping the data based on type but I need to group by type as well as type_id
Expected output
If you need a flat grouping based on two or more properties, use the _.groupBy() callback to combine the properties into a string:
const addArray = [{"id":24,"language_id":3,"type":"service","type_id":2,"key":"service seeker","value":" need service"},{"id":23,"language_id":3,"type":"service","type_id":2,"key":"phone","value":"phone number"},{"id":24,"language_id":3,"type":"service","type_id":7,"key":"tester","value":"service tester"},{"id":19,"language_id":3,"type":"offer","type_id":4,"key":"source","value":"resource"}]
const result = _.groupBy(addArray, o => `${o.type}-${o.type_id}`)
console.log(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.min.js"></script>
If you need a multi-level grouping, start by grouping by type, then map the groups with _.mapValues() and group each of them again by type_id:
const { flow, partialRight: pr, groupBy, mapValues } = _
const fn = flow(
pr(groupBy, 'type'),
pr(mapValues, g => groupBy(g, 'type_id'))
)
const addArray = [{"id":24,"language_id":3,"type":"service","type_id":2,"key":"service seeker","value":" need service"},{"id":23,"language_id":3,"type":"service","type_id":2,"key":"phone","value":"phone number"},{"id":24,"language_id":3,"type":"service","type_id":7,"key":"tester","value":"service tester"},{"id":19,"language_id":3,"type":"offer","type_id":4,"key":"source","value":"resource"}]
const result = fn(addArray)
console.log(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.min.js"></script>
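If lodash is not available, both groupings can be approximated with Array.prototype.reduce. This is a minimal sketch rather than a drop-in replacement for _.groupBy; groupFlat and groupNested are hypothetical helper names, and the sample is trimmed to the properties that matter.

```javascript
const items = [
  { id: 24, type: "service", type_id: 2, key: "service seeker" },
  { id: 23, type: "service", type_id: 2, key: "phone" },
  { id: 24, type: "service", type_id: 7, key: "tester" },
  { id: 19, type: "offer", type_id: 4, key: "source" },
];

// Flat grouping: one bucket per "type-type_id" composite key.
function groupFlat(list) {
  return list.reduce((acc, o) => {
    const k = `${o.type}-${o.type_id}`;
    (acc[k] ??= []).push(o);
    return acc;
  }, {});
}

// Nested grouping: first by type, then by type_id inside each type.
function groupNested(list) {
  return list.reduce((acc, o) => {
    const byType = (acc[o.type] ??= {});
    (byType[o.type_id] ??= []).push(o);
    return acc;
  }, {});
}

console.log(groupFlat(items));
console.log(groupNested(items));
```

The `??=` operator needs a reasonably recent runtime (Node 15 or later, or a current browser).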