Snowflake: array_agg inside an array_agg

I would like to perform the transformation below:
Below is the input; the desired output is the JSON where each object has "d": ["c_value_1", "c_value_2"].
As you can see, I am working with an array of nested objects, and I would like to flatten c.
I know this involves two array_aggs, but I always end up with this error:
SQL compilation error: Unsupported subquery type cannot be evaluated
with table_1 as (
select parse_json(
'[
{
"a": "a_value_1",
"b": [
{
"c": "c_value_1",
},
{
"c": "c_value_2",
}
]
},
{
"a": "a_value_2",
"b": [
{
"c": "c_value_2",
},
{
"c": "c_value3",
}
]
}
]'
) as json_object
)
select parse_json(
'[
{
"a": "a_value_1",
"d": [ "c_value_1", "c_value_2"]
},
{
"a": "a_value_2",
"d": [ "c_value_2", "c_value_3"]
}
]'
);
The goal is to have a table with the second array of objects as a column

Get Path (explains how the ':' works)
Object Agg()
Object Keys (never leave home without 'em)
with table_1 as (
select
parse_json(
'[
{
"a": "a_value_1",
"b": [
{
"c": "c_value_1",
},
{
"c": "c_value_2",
}
]
},
{
"a": "a_value_2",
"b": [
{
"c": "c_value_2",
},
{
"c": "c_value3",
}
]
}
]'
) as json_object
)
SELECT DISTINCT
OBJECT_AGG(
h.value::string,
iff(
h.value = 'a',
object_construct(h.value::string, g.value:a),
object_construct(h.value::string, array_construct(g.value:b[0].c, g.value:b[1].c))
)
) over (partition by h.seq) bob
FROM
table_1,
lateral flatten(input => json_object) g,
lateral flatten(input => object_keys(g.value)) h;
Summary:
Rip it down to the atomic (lowest) layer using the flattens etc.
Build it back up using the semi-structured functions.
Possible improvements - removing the IFF in the middle - just ran out of time... sorry.
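For reference, here is a rough ARRAY_AGG-based variant (an untested sketch, not part of the original answer) that drops the IFF and the hard-coded b[0]/b[1] indexes by flattening b and grouping on the outer array element:
with table_1 as (
select parse_json('[
{"a": "a_value_1", "b": [{"c": "c_value_1"}, {"c": "c_value_2"}]},
{"a": "a_value_2", "b": [{"c": "c_value_2"}, {"c": "c_value_3"}]}
]') as json_object
)
select object_construct(
'a', g.value:a,
'd', array_agg(h.value:c) within group (order by h.index)
) as rebuilt
from table_1,
lateral flatten(input => json_object) g,
lateral flatten(input => g.value:b) h
group by g.seq, g.index, g.value:a;
Wrapping that result in one more ARRAY_AGG over all rows would give back a single JSON array, which is where the "array_agg inside an array_agg" from the title comes in.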

Related

Add new column to multi nested array in mongo db

"ParentType": {
"Food": [
{
"Name": "Burger",
"FoodId": "5e3abe145c1bfb31b4e335de",
"Price": 0,
"Quantity": 1,
"SubCategory": 0
}
],
"Inventory": [
{
"Name": "Small Popcorn",
"InventoryId": "5e3a64245c1bfb31b4e335b7",
"Price": 0,
"Quantity": 1,
"SubCategory": 0
}
]
}
I need to add UOM as a new column only for the Inventory array. I have used aggregate as below, but the collection is not getting updated. Please help me with adding this new column in MongoDB.
db.Concession.aggregate([
{
$addFields: { "ParentType.Inventory.UOM": "null" }
}
])
To add UOM to all elements in Inventory:
db.collection.update(
{},
{ $set: { 'ParentType.Inventory.$[].UOM':''} }
)
Option 1: ( Update/$set 3.6+ )
db.collection.update({},
{
$set: {
"ParentType.Inventory.$[].UOM": null
}
},
{
multi: true
})
Explained:
Use the update() operation with the all-positional operator $[] (MongoDB 3.6+)
Use the multi: true option to update all documents in the collection matching the filter criteria
Playground
Option 2: ( Update/aggregation 4.2+)
db.collection.update({},
[
{
$addFields: {
"ParentType.Inventory": {
"$map": {
input: "$ParentType.Inventory",
as: "i",
in: {
"$mergeObjects": [
"$$i",
{
UOM: null
}
]
}
}
}
}
}
],
{
multi: true
})
Explained:
Use update with an aggregation pipeline (MongoDB 4.2+)
In the aggregation, use $addFields with $mergeObjects to add the new field to each array element
Use multi: true to affect all documents in the collection matching the filter criteria
Playground 2

How to retrieve all child nodes from JSON file

I have the JSON file below in an external stage, and I'm trying to write a COPY query into a table using the query below. But it's fetching a single record from the "values" node, whereas I need to insert all child elements of the values node. I have loaded this file into a table with the VARIANT datatype.
The query I'm using:
select record:batchId batchId, record:results[0].pageInfo.numberOfPages NoofPages, record:results[0].pageInfo.pageNumber pageNo,
record:results[0].pageInfo.pageSize PgSz, record:results[0].requestId requestId,record:results[0].showPopup showPopup,
record:results[0].values[0][0].columnId columnId,record:results[0].values[0][0].value val
from lease;
{
"batchId": "",
"results": [
{
"pageInfo": {
"numberOfPages": ,
"pageNumber": ,
"pageSize":
},
"requestId": "",
"showPopup": false,
"values": [
[
{
"columnId": ,
"value": ""
},
{
"columnId": ,
"value":
}
]
]
}
]
}
You need to use the LATERAL FLATTEN function, something like this:
I created this table:
create table json_test (seq_no integer, json_text variant);
and then populated it with this JSON string:
insert into json_test(seq_no, json_text)
select 1, parse_json($${
"batchId": "a",
"results": [
{
"pageInfo": {
"numberOfPages": "1",
"pageNumber": "1",
"pageSize": "100000"
},
"requestId": "a",
"showPopup": false,
"values": [
[
{
"columnId": "4567",
"value": "2020-10-09T07:24:29.000Z"
},
{
"columnId": "4568",
"value": "2020-10-10T10:24:29.000Z"
}
]
]
}
]
}$$);
Then the following query:
select
json_text:batchId batchId
,json_text:results[0].pageInfo.numberOfPages numberOfPages
,json_text:results[0].pageInfo.pageNumber pageNumber
,json_text:results[0].pageInfo.pageSize pageSize
,json_text:results[0].requestId requestId
,json_text:results[0].showPopup showPopup
,f.value:columnId columnId
,f.value:value value
from json_test t
,lateral flatten(input => t.json_text:results[0]:values[0]) f;
gives these results - which I think is roughly what you are looking for:
BATCHID NUMBEROFPAGES PAGENUMBER PAGESIZE REQUESTID SHOWPOPUP COLUMNID VALUE
"a" "1" "1" "100000" "a" false "4567" "2020-10-09T07:24:29.000Z"
"a" "1" "1" "100000" "a" false "4568" "2020-10-10T10:24:29.000Z"

Snowflake query for retrieving JSONs that contain string in ANY element of a list

The JSON object contains this:
"entities": {
"hashtags": [
{
"indices": [
17,
29
],
"text": "HBOMAX4ZACK"
},
{
"indices": [
38,
51
],
"text": "TheSnyderCut"
}
],
I want to select only those rows that contain 'XYZ' in ANY ONE of the entries in 'hashtags'. I know I can do this:
select record_content:text, * from tweets where record_content:entities:hashtags[0].text = 'HBOMAX4ZACK';
But as you can see, I have hard-coded 'hashtags[0]' in this case. I want to check if 'HBOMAX4ZACK' exists in any element.
Use LATERAL FLATTEN to explode the JSON. With FLATTEN you explode the JSON down to the level you need (via the path argument):
select t.json,
t2.VALUE value_matching
from (select parse_json('{
"entities": {
"hashtags": [{
"indices": [
17,
29
],
"text": "HBOMAX4ZACK"
},
{
"indices": [
38,
51
],
"text": "TheSnyderCut"
}
]
}
}') as json) t,
lateral flatten(input => parse_json(t.json), path => 'entities.hashtags') t2
WHERE t2.VALUE:text = 'HBOMAX4ZACK';
You can find a tutorial on the topic here: https://community.snowflake.com/s/article/How-To-Lateral-Join-Tutorial
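The same pattern applied to the table from the question would look roughly like this (a sketch; it assumes record_content in the tweets table holds the whole tweet as a VARIANT, as in the hard-coded query above):
select t.record_content:text, t.*
from tweets t,
lateral flatten(input => t.record_content:entities:hashtags) h
where h.value:text = 'HBOMAX4ZACK';
Note that a tweet will appear once per matching hashtag, so deduplicate (for example on the tweet id) if you only want each row once.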

Querying array in hazelcast jsonvalues

I am trying to search HazelcastJsonValue entries; here is an example of the data model:
class A {
B[] listOfB;
}
class B {
int num;
String name;
}
The 'A' object is stored in Hazelcast as a HazelcastJsonValue, and I want to create a query which fetches all objects that contain a B for which num = 10 and name = test.
Hazelcast query for array search using a predicate:
Predicate.equal("listOfB[any].name","test")
For the above scenario, I can build the query using predicates:
Predicate[] arrayOfPredicate = {Predicates.equal("listOfB[any].num",10)
,Predicates.equal("listOfB[any].name","test")};
Predicate p = Predicates.and(arrayOfPredicate);
System.out.println(p.toString()); // (listOfB[any].num=10 AND listOfB[any].name=test)
Example data in Hazelcast:
[
{
"listOfB": [
{
"num": 10,
"name": "ab"
},
{
"num": 11,
"name": "test"
}
]
},
{
"listOfB": [
{
"num": 10,
"name": "test"
},
{
"num": 12,
"name": "xyz"
}
]
},
{
"listOfB": [
{
"num": 30,
"name": "abc"
}
]
}
]
Hazelcast query for the same:
(listOfB[any].num=10 AND listOfB[any].name=test)
But this is not giving the desired results; instead, the result below came back:
[
{
"listOfB": [
{
"num": 10,
"name": "ab"
},
{
"num": 11,
"name": "test"
}
]
},
{
"listOfB": [
{
"num": 10,
"name": "test"
},
{
"num": 12,
"name": "xyz"
}
]
}
]
The desired result is:
{
"listOfB": [
{
"num": 10,
"name": "test"
},
{
"num": 12,
"name": "xyz"
}
]
}
How can I get the desired results?
Both of the above should have been returned in your result set - is this not the case? Because you are using [any], each predicate returns true if any element matches it, so the two conditions can be satisfied by different elements of listOfB, and both documents match for the above data. If you limited the filter to listOfB[0], only the second document would be returned, but I'm sure your intention is not to limit the check to the first item only.

Remove common elements from JSON Objects in Array

I have an array of objects and I am looking to remove all the elements from the objects and their sub-objects that are common across all objects.
Maybe the best way to explain this is with an example:
[
{
"a": {
"k1": [1,2,3],
"k2": 4
},
"b": {
"k3": {
"foo": "bar",
"top": "bottom"
},
"k4": 5
},
"c": {
"k5": [{"cat":"dog"},{"rat":"not rat"}]
},
"d": { }
},
{
"a": {
"k1": [1,2,3],
"k2": -4
},
"b": {
"k3": {
"foo": "hat",
"top": "bottom"
},
"k4": 5
},
"c": {
"k5": [{"cat":"dog"},{"rat":"mouse"}]
}
}
]
would evaluate to
[
{
"a": {
"k2": 4
},
"b": {
"k3": {
"foo": "bar"
}
},
"c": {
"k5": [{"cat":"dog"},{"rat":"not rat"}]
},
"d": { }
},
{
"a": {
"k2": -4
},
"b": {
"k3": {
"foo": "hat"
}
},
"c": {
"k5": [{"cat":"dog"},{"rat":"mouse"}]
},
"d": null
}
]
Are there any good tools I can use to solve this? I looked at json-diff but that doesn't quite fit my requirements.
I wrote some Julia functions to do this for me.
I started by computing the fields common to all of the objects and then removed those common fields from each object.
# Compute the Dict of fields shared (with equal values) by every object in the array.
function common_in_array(a::Array)
    common = deepcopy(a[end])
    common_in_array!(a[1:end-1], common)
end

function common_in_array!(a::Array, common::Dict)
    if size(a, 1) == 0
        return common
    else
        return common_in_array!(a[1:end-1], dict_common!(a[end], common))
    end
end

# Shrink `common` to the keys and values it shares with `d`, recursing into nested Dicts.
function dict_common!(d::Dict, common::Dict)
    keys_d = keys(d)
    keys_common = keys(common)
    all_keys = union(keys_d, keys_common)
    and_keys = intersect(keys_d, keys_common)
    for k in setdiff(all_keys, and_keys)
        delete!(common, k)
    end
    for k in and_keys
        v1 = d[k]
        v2 = common[k]
        if typeof(v1) != typeof(v2)
            delete!(common, k)
        elseif isa(v2, Dict)
            dict_common!(v1, v2)
        elseif v1 != v2
            delete!(common, k)
        end
    end
    common
end

# Delete from `d` every entry that matches `common`, recursing into nested Dicts.
function remove_common_from_dict!(d::Dict, common::Dict)
    for (key, value) in common
        if key in keys(d)
            value_d = d[key]
            if value == value_d
                delete!(d, key)
            elseif isa(value, Dict) && isa(value_d, Dict)
                remove_common_from_dict!(value_d, value)
            end
        end
    end
    d
end

function remove_common_from_array!(a::Array, common::Dict)
    map(d -> remove_common_from_dict!(d, common), a)
end

function remove_common_from_array!(a::Array)
    remove_common_from_array!(a, common_in_array(a))
end
Then I evaluate this on my json_array string:
using JSON
JSON.print(remove_common_from_array!(JSON.parse(json_array)))
