Querying an array of elements in Snowflake

I am using Snowflake and trying to query data stored as an array of elements under a column named nested_colmn, for example:
nested_colmn
[
  {
    "firstKey": "val1",
    "secondKey": 2555,
    "thirdKey": false,
    "fourthkey": "otrvalue"
  },
  {
    "firstKey": "val2",
    "secondKey": 255221,
    "thirdKey": true,
    "fourthkey": "otrvalu"
  }
]
The above array gets returned as one complete row if I do
Select nested_colmn from table_name
Now I want to get the results only for firstKey (nested_colmn.firstKey) from that column. How do I frame the query to retrieve individual elements from the array instead of getting everything? Any thoughts are appreciated.

Note: I will assume that you truly want the source table to have the array as a value, instead of stripping the outer array and placing each element into its own row.
First, create a test table with your sample data:
CREATE OR REPLACE TEMPORARY TABLE table_name (
  nested_colmn VARIANT
)
AS
SELECT PARSE_JSON($1) AS nested_colmn
FROM VALUES
($$
[
  {
    "firstKey": "val1",
    "secondKey": 2555,
    "thirdKey": false,
    "fourthkey": "otrvalue"
  },
  {
    "firstKey": "val2",
    "secondKey": 255221,
    "thirdKey": true,
    "fourthkey": "otrvalu"
  }
]
$$)
;
With that, here is a sample query:
SELECT F.VALUE:"firstKey"::VARCHAR AS FIRST_KEY
FROM table_name T
,LATERAL FLATTEN(nested_colmn) F
;
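For readers new to it, LATERAL FLATTEN conceptually explodes each element of the array into its own row, and the projection then pulls one key out of each element. A rough Python sketch of the same transformation (sample data from the question; this is an illustration, not how Snowflake executes it):

```python
import json

# One table row whose VARIANT column holds an array of objects (from the question).
nested_colmn = json.loads("""
[
  {"firstKey": "val1", "secondKey": 2555, "thirdKey": false, "fourthkey": "otrvalue"},
  {"firstKey": "val2", "secondKey": 255221, "thirdKey": true, "fourthkey": "otrvalu"}
]
""")

# LATERAL FLATTEN: one output row per array element.
# F.VALUE:"firstKey"::VARCHAR: project firstKey from each element as a string.
first_keys = [str(element["firstKey"]) for element in nested_colmn]
print(first_keys)  # ['val1', 'val2']
```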

You're going to need to run a lateral flatten on the array and then parse the JSON:
WITH x AS (
  SELECT array_construct(
    parse_json('{
      "firstKey": "val1",
      "secondKey": 2555,
      "thirdKey": false,
      "fourthkey": "otrvalue"
    }'),
    parse_json('{
      "firstKey": "val2",
      "secondKey": 255221,
      "thirdKey": true,
      "fourthkey": "otrvalu"
    }')
  ) AS var
)
SELECT p.value:firstKey::varchar
FROM x,
  lateral flatten(input => x.var) p;

Related

dynamically flatten json using snowflake function

Is it possible to dynamically flatten JSON using a Snowflake function?
select key, value from table a, lateral flatten(input => variant_column)
gives one key/value row per attribute, which needs to be converted into a columnar result (the before/after tables were posted as images).
I've assumed your source JSON is something like this:
[
  {
    "empname": "e1",
    "empid": 123
  },
  {
    "empname": "e2",
    "empid": 456
  }
]
Based on this, you can achieve the output you want using:
select
  s.value:empname::varchar as empname,
  s.value:empid::number as empid
from
  json j,
  lateral flatten (input => j.src, path => '', mode => 'ARRAY') s
;
Full example replication code:
create or replace table json (src variant);
insert into json (src) select parse_json($$
[
  {
    "empname": "e1",
    "empid": 123
  },
  {
    "empname": "e2",
    "empid": 456
  }
]
$$
);
select * from json;
select
  s.value:empname::varchar as empname,
  s.value:empid::number as empid
from
  json j,
  lateral flatten (input => j.src, path => '', mode => 'ARRAY') s
;
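The FLATTEN query above effectively does the rows-to-columns step in one pass: each array element becomes a row, and each s.value:field cast projects a typed column. A rough Python analogue of that projection (illustrative only; sample data from the answer):

```python
import json

# The source VARIANT: an array of employee objects (from the answer above).
src = json.loads("""
[
  {"empname": "e1", "empid": 123},
  {"empname": "e2", "empid": 456}
]
""")

# flatten(..., mode => 'ARRAY') yields one row per element;
# the ::varchar and ::number casts become str() and int() here.
rows = [(str(e["empname"]), int(e["empid"])) for e in src]
print(rows)  # [('e1', 123), ('e2', 456)]
```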

How to parse this JSON in Snowflake

{
  "segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4",
  "diffFields": [
    {
      "fieldName": "name",
      "valueBefore": null,
      "valueAfter": "new-segment-name"
    },
    {
      "fieldName": "active",
      "valueBefore": null,
      "valueAfter": true
    }
  ]
}
In the above JSON I have an array of diffFields. I am trying to parse this in Snowflake to get the array as columns instead of rows.
I tried FLATTEN, but that flattens it into rows.
I am trying to parse this in dbt to create another table from the above JSON with a table structure such as:
create table some_table (
  field_one,
  -- if `name` is present in the above JSON I want that to be the 2nd column
  -- if `active` is present in the above JSON I want that to be the 3rd column
);
I would flatten it like:
WITH data AS (
  select parse_json('{
    "segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4",
    "diffFields": [
      {
        "fieldName": "name",
        "valueBefore": null,
        "valueAfter": "new-segment-name"
      },
      {
        "fieldName": "active",
        "valueBefore": null,
        "valueAfter": true
      }
    ]
  }') as json
)
select
  json:segmentId::text as seg_id,
  f.value:fieldName::text as fieldName,
  f.value:valueBefore as valueBefore,
  f.value:valueAfter as valueAfter
from data, table(flatten(input => json:diffFields)) f
which gives:
SEG_ID                               | FIELDNAME | VALUEBEFORE | VALUEAFTER
b204c220-ea8d-4cf4-b579-30eb59a1a2a4 | name      | null        | "new-segment-name"
b204c220-ea8d-4cf4-b579-30eb59a1a2a4 | active    | null        | true
Note that those VARIANT nulls are not real SQL NULLs, so you will want to use something like IS_NULL_VALUE to test for them and convert them to real NULLs.
To select array parts:
select json:segmentId::text
,max(iff(f.value:fieldName::text = 'name', f.value, null)) as name_object
,max(iff(f.value:fieldName::text = 'active', f.value, null)) as active_object
from data, table(flatten(input=>json:diffFields)) f
group by 1;
gives:
JSON:SEGMENTID::TEXT                 | NAME_OBJECT                                                                    | ACTIVE_OBJECT
b204c220-ea8d-4cf4-b579-30eb59a1a2a4 | { "fieldName": "name", "valueAfter": "new-segment-name", "valueBefore": null } | { "fieldName": "active", "valueAfter": true, "valueBefore": null }
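The MAX(IFF(...)) ... GROUP BY pattern above is conditional aggregation: within each group, IFF returns the object only on the matching row and NULL elsewhere, and MAX keeps the single non-NULL value, pivoting rows into columns. A rough Python sketch of the same pivot (illustrative only, using the question's data):

```python
import json

doc = json.loads("""
{
  "segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4",
  "diffFields": [
    {"fieldName": "name", "valueBefore": null, "valueAfter": "new-segment-name"},
    {"fieldName": "active", "valueBefore": null, "valueAfter": true}
  ]
}
""")

# Pivot: one column per fieldName, keeping the whole object as the column value
# (the SQL does this with MAX(IFF(fieldName = '...', value, NULL)) per field).
row = {"seg_id": doc["segmentId"]}
for field in doc["diffFields"]:
    row[field["fieldName"] + "_object"] = field

print(row["name_object"]["valueAfter"])    # new-segment-name
print(row["active_object"]["valueAfter"])  # True
```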
You can also use QuickTable to connect to Snowflake and do the JSON parsing; QuickTable can generate the corresponding Snowflake-compatible SQL.

Querying for an array object in Postgres jsonb column

I have a Postgres table with two columns, "nodes" and "timestamp". The "nodes" column is of type jsonb and holds an array of objects in the following format:
[
  {
    "addr": {},
    "node_number": "1",
    "primary": false
  },
  {
    "addr": {},
    "node_number": "2",
    "primary": true
  }
]
I want to find the object in this array that has "primary":true in the most recent row. If the above was the latest row, the result should be:
{
  "addr": {},
  "node_number": "2",
  "primary": true
}
I have tried:
SELECT (nodes -> 0) FROM table WHERE nodes @> '[{"primary": true}]'
order by timestamp desc
limit 1;
which gives the object at index 0 in the array not the desired object that has "primary": true.
How can I implement the query ?
Use jsonb_array_elements() in a lateral join:
select elem
from my_table
cross join jsonb_array_elements(nodes) as elem
where (elem->>'primary')::boolean
elem
---------------------------------------------------
{"addr": {}, "primary": true, "node_number": "2"}
(1 row)
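jsonb_array_elements plays the same role as Snowflake's FLATTEN: the lateral join emits one row per array element, and the WHERE clause keeps only the element whose primary flag is true. A rough Python analogue (data from the question; illustration only):

```python
import json

# The jsonb "nodes" value from the question.
nodes = json.loads("""
[
  {"addr": {}, "node_number": "1", "primary": false},
  {"addr": {}, "node_number": "2", "primary": true}
]
""")

# cross join jsonb_array_elements(nodes): one row per element;
# WHERE (elem->>'primary')::boolean keeps only the primary node.
primary_nodes = [elem for elem in nodes if elem["primary"]]
print(primary_nodes[0]["node_number"])  # 2
```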

How to write a MongoDB query for the following SQL query

Suppose I have following data:
articles: [
  {
    _id: 1,
    flag1: true,
    date: 2016-09-09,
    title: "...",
    flag2: false
  },
  {
    _id: 2,
    flag1: true,
    date: 2016-09-10,
    title: "...",
    flag2: false
  },
  {
    _id: 3,
    flag1: false,
    date: 2016-09-11,
    title: "...",
    flag2: true
  },
  {
    _id: 4,
    flag1: false,
    date: 2016-09-13,
    title: "...",
    flag2: true
  }
]
I want individual sorting: basically I have to select two lists, one with flag1:true and one with flag2:true, each sorted by date descending, and finally merge them into one list with the flag1:true records on top.
I want to get output in following order:
[
  {
    _id: 2,
    flag1: true,
    date: 2016-09-10,
    title: "...",
    flag2: false
  },
  {
    _id: 1,
    flag1: true,
    date: 2016-09-09,
    title: "...",
    flag2: false
  },
  {
    _id: 4,
    flag1: false,
    date: 2016-09-13,
    title: "...",
    flag2: true
  },
  {
    _id: 3,
    flag1: false,
    date: 2016-09-11,
    title: "...",
    flag2: true
  }
]
How do I write this SQL query in mongoose/mongodb?
select * from articles
where _id in
  (select _id from articles where Flag1 = true
   order by date desc)
or _id in
  (select _id from articles where Flag2 = true
   order by date desc)
I want individual sorting, so that I get the Flag1-based records first, in sorted order.
> db.articles.find({ $or: [ { Flag1: true }, { Flag2: true } ] }).sort({ date: -1 })
However, I am unclear on your requirements; still, I hope this helps.
UPDATE:
Okay, then you just need to also sort by those two fields:
db.articles.find({ $or: [ { Flag1: true }, { Flag2: true } ] })
.sort({Flag1:-1,Flag2:-1,date:-1})
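Sorting on { Flag1: -1, Flag2: -1, date: -1 } is a compound descending sort: Flag1:true documents come first, then Flag2:true, each group ordered by date descending. A rough Python sketch of the same ordering on the question's sample data (ISO date strings sort correctly as text):

```python
articles = [
    {"_id": 1, "flag1": True,  "flag2": False, "date": "2016-09-09"},
    {"_id": 2, "flag1": True,  "flag2": False, "date": "2016-09-10"},
    {"_id": 3, "flag1": False, "flag2": True,  "date": "2016-09-11"},
    {"_id": 4, "flag1": False, "flag2": True,  "date": "2016-09-13"},
]

# Compound descending sort: flag1 first, then flag2, then date,
# mirroring .sort({Flag1: -1, Flag2: -1, date: -1}).
ordered = sorted(articles, key=lambda a: (a["flag1"], a["flag2"], a["date"]), reverse=True)
print([a["_id"] for a in ordered])  # [2, 1, 4, 3]
```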
I found the answer to my question. I used the aggregation framework with $project: for sorting, I created a virtual field with $project. Subquery-like behavior can be achieved using $project.

Group by specific value inside array inside another array in MongoDB

I am trying to write a query to group my records by the value inside an array inside another array inside a collection in MongoDB. Now if that doesn't have your head hurting I think a sample schema might be easier to understand:
{
  "_id": ObjectId("..."),
  "attributes": [
    [ "attributeA", "valueA" ],
    [ "attributeB", "valueB" ],
    [ "attributeC", "valueC" ],
    ...
  ]
}
Now I want to be able to group my records by the attributeB field based on valueB.
So far I can aggregate if I know the actual value of valueB:
collection.aggregate([
  { '$match': { 'attributes': [ "attributeB", "valueB" ] } },
  { '$group': {
      '_id': { 'attributes': [ "attributeB", "valueB" ] }
  } }
])
Basically seeing if the attributes array contains the pair: [ "attributeB", "valueB" ]. But now I want to be able to have the query determine what valueB is as it performs the aggregation.
To paraphrase: I can't seem to figure out how to group by the value if I don't know the value of valueB. I just want all records to group by their valueB's, when attributeB is found at the first position inside an array inside the attributes array.
Any help is appreciated. Thanks!
After grouping your data you should use the $unwind operator. It pairs up your other fields with every item in the array.
collection.aggregate([
  { '$match': { 'attributes': [ "attributeB", "valueB" ] } },
  { '$group': {
      '_id': { 'attributes': [ "attributeB", "valueB" ] }
  } },
  { '$unwind': '$attributes' },
  ... // here you can match again and continue the aggregation
])
Most probably this is not the fastest solution. I will think of a better one.
Also note that the order of elements in the array is not preserved.
UPDATE
This is a similar question. So what I would do is create documents in the attribute array like
'attributes': [
  { 'attribute': 'attributeB', 'value': 'valueB' },
  { 'attribute': 'attributeC', 'value': 'valueA' },
]
So you can access your valueB after the $match or $unwind through $value.
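The reshaped documents make the grouping key addressable by field name instead of array position, which is what the aggregation needs. A rough Python sketch of the reshape and of picking out the attributeB value (illustration only; field names follow the update above):

```python
# Original shape: an array of [attribute, value] pairs (from the question).
record = {
    "attributes": [
        ["attributeA", "valueA"],
        ["attributeB", "valueB"],
        ["attributeC", "valueC"],
    ]
}

# Reshape each pair into a small document, as suggested above.
reshaped = [{"attribute": a, "value": v} for a, v in record["attributes"]]

# The grouping value for attributeB is now reachable by name, not position.
group_key = next(d["value"] for d in reshaped if d["attribute"] == "attributeB")
print(group_key)  # valueB
```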
