4 levels of nested Json, how to flatten? - snowflake-cloud-data-platform

Hi, I am trying to flatten JSON with 4 levels of nested arrays. What is the best way to flatten this data without having to flatten it 4 times?
Data example, staged:
{
"sample": {
"someitem": {
"thesearecool": [
{
"neat": "wow"
},
{
"neat": "tubular"
}
]
}
}
}
I think this works for the first flatten, but is there some way to flatten it two more times, so that each value is in a different column?
select src:sample::string, src:someitem::string, value
from
raw_source
, lateral flatten( input => src:sample )
Source: https://community.snowflake.com/s/article/How-To-Lateral-Join-Tutorial

Do you want to list the values of the neat keys?
with raw_source as (select parse_json('{
  "sample": {
    "someitem": {
      "thesearecool": [
        {
          "neat": "wow"
        },
        {
          "neat": "tubular"
        }
      ]
    }
  }
}') c)
select f.value:neat as neat
from
  raw_source
  , lateral flatten( input => c, path => 'sample.someitem.thesearecool' ) f;
In this case, you can use the path parameter of FLATTEN to go straight to the nested array, so a single flatten is enough.
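To make the effect of the path parameter more concrete, here is a rough JavaScript sketch of what the query computes on a plain object: walk the dotted path first, then emit one row per array element. The helper name flattenAtPath is made up for illustration and is not a Snowflake API. It only handles arrays, which is all this example needs.

```javascript
// Hypothetical helper (not part of Snowflake): mimics
// lateral flatten(input => c, path => 'sample.someitem.thesearecool')
// for the case where the path points at an array.
function flattenAtPath(doc, path) {
  // Follow each key of the dotted path; give up with undefined if a key is missing.
  const target = path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
  // This sketch only handles arrays; anything else yields no rows.
  if (!Array.isArray(target)) return [];
  return target.map((value, index) => ({ index, value }));
}

const src = {
  sample: { someitem: { thesearecool: [{ neat: 'wow' }, { neat: 'tubular' }] } },
};

// Equivalent of "select f.value:neat as neat".
const neat = flattenAtPath(src, 'sample.someitem.thesearecool').map(
  (row) => row.value.neat
);
console.log(neat); // [ 'wow', 'tubular' ]
```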

Related

MongoDB: find out query array's element not in database

Is it possible to find out which elements of a query array are not in the database?
example:
const query = ['aaa','bbb','ccc']
Documents in db:
[{name:'bbb'},{name:'ccc'}]
I want to find the query array elements that are not in the database.
The result should be:
['aaa']
I can't find a quick way to do this other than querying each element (or batch?) of the array.
Does anyone have a better method? Thanks.
Querying for things that are missing is always a more expensive operation, and there is no "magic" query that does it for you. I recommend using Mongo's distinct method, like so:
const queryArr = ['aaa', 'bbb', 'ccc'];
const allNames = await db.collection.distinct('name');
const notInDb = queryArr.filter(e => !allNames.includes(e));
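The filter step above calls includes once per query element, which rescans allNames each time. A small variation, sketched here with a plain array standing in for the result of distinct, puts the names into a Set first. The function name findMissing is illustrative only.

```javascript
// Returns the entries of queryArr that do not appear in existingNames.
// existingNames stands in for the array returned by distinct('name').
function findMissing(queryArr, existingNames) {
  const known = new Set(existingNames); // constant-time membership checks
  return queryArr.filter((name) => !known.has(name));
}

console.log(findMissing(['aaa', 'bbb', 'ccc'], ['bbb', 'ccc'])); // [ 'aaa' ]
```

This does not change how many names are loaded from the database, only how the comparison is done once they are in memory.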
However if you want to do it in 1 db command you could do something like this:
db.collection.aggregate([
  {
    $group: {
      _id: null,
      names: {
        "$addToSet": "$name"
      }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": {
        results: {
          $filter: {
            input: [
              "aaa",
              "bbb",
              "ccc"
            ],
            as: "datum",
            cond: {
              $not: {
                "$setIsSubset": [
                  [
                    "$$datum"
                  ],
                  "$names"
                ]
              }
            }
          }
        }
      }
    }
  }
])
As you can tell, both approaches require you to load all the names into memory, and there is no way around this. If your collection is too large for these approaches, you will have to iterate over the query input and check the names one by one:
const queryArr = ['aaa', 'bbb', 'ccc'];
const notInDb = [];
for (const queryName of queryArr) {
  // One lookup per query element.
  const found = await db.collection.findOne({ name: queryName });
  if (!found) {
    notInDb.push(queryName); // this name is not in the collection
  }
}
Assuming you have an index on the name field, this should be very efficient.

querying a multilevel array using SnowFlake

I am using Snowflake and trying to query data stored as a multi-level array of elements in a column named multi_array. For example:
multi_array
[
{
"attribute_1": "hello",
"attribute_2": "hello1",
"group_attrbutes": [
{
"grp_attr1": "tst_val",
"grp_attr2": "test_val2"
}
]
}
]
The flattened output would be:
attribute_1  attribute_2  grp_attr1  grp_attr2
hello        hello1       tst_val    test_val2
Can anyone please advise how to flatten the group_attrbutes array so that the result is in tabular form?
SELECT d.json
  ,f.value:attribute_1 as attribute_1
  ,f.value:attribute_2 as attribute_2
  ,g.value:grp_attr1 as grp_attr1
  ,g.value:grp_attr2 as grp_attr2
FROM (
  SELECT parse_json('[
    {
      "attribute_1": "hello",
      "attribute_2": "hello1",
      "group_attrbutes": [
        {
          "grp_attr1": "tst_val",
          "grp_attr2": "test_val2"
        }
      ]
    }
  ]') as json
) AS d,
table(flatten(input => d.json)) f,
table(flatten(input => f.value:group_attrbutes)) g
;
gives (with the JSON column stripped out):
ATTRIBUTE_1  ATTRIBUTE_2  GRP_ATTR1  GRP_ATTR2
"hello"      "hello1"     "tst_val"  "test_val2"

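For readers more at home in JavaScript, the two chained flatten calls in the Snowflake answer above behave roughly like a nested loop: one output row per pair of outer element and inner group element. The sketch below mimics that on a plain array. The function name flattenGroups is made up, and the key group_attrbutes is spelled as in the question.

```javascript
// Mimics two chained flattens: one row per (outer element, inner element) pair.
function flattenGroups(multiArray) {
  return multiArray.flatMap((outer) =>
    // An outer element without group_attrbutes contributes no rows in this sketch.
    (outer.group_attrbutes || []).map((inner) => ({
      attribute_1: outer.attribute_1,
      attribute_2: outer.attribute_2,
      grp_attr1: inner.grp_attr1,
      grp_attr2: inner.grp_attr2,
    }))
  );
}

const multiArray = [
  {
    attribute_1: 'hello',
    attribute_2: 'hello1',
    group_attrbutes: [{ grp_attr1: 'tst_val', grp_attr2: 'test_val2' }],
  },
];

// One row, with the four attributes side by side.
console.log(flattenGroups(multiArray));
```
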
Typescript - Querying or flattening nested array but keeping some objects as nested ones

Again I'm stuck with a nested array of objects. I want to flatten it, but I have to keep some nested objects. The problem I'm running into is how to rename the keys of the nested objects, since their number is not known in advance: there might be 3 of them or 8. So property1_3 has to be renamed to e.g. property1_3_1, property1_3_2, depending on how many objects are in the original JSON data. I also need to apply them to the correct parent object.
The JSON data I receive looks like:
data = [{
"property1_1": "value1_1",
"property1_2": "value1_2",
"property1_3": [
[{
"subproperty1_1_1": "subvalue1_1_1",
"subproperty1_1_2": "subvalue1_1_2"
}],
[{
"subproperty1_2_1": "subvalue1_2_1",
"subproperty1_2_2": "subvalue1_2_2"
}]
]
},
{
"property2_1": "value2_1",
"property2_2": "value2_2",
"property2_3": [
[{
"subproperty2_1_1": "subvalue2_2_1",
"subproperty2_1_2": "subvalue2_2_2"
}],
[{
"subproperty2_2_1": "subvalue2_2_1",
"subproperty2_2_2": "subvalue2_2_2"
}],
[{
"subproperty2_3_1": "subvalue2_2_1",
"subproperty2_3_2": "subvalue2_2_2"
}]
]
}
]
What I want to achieve now is:
data = [
{
"property1_1": "value1_1",
"property1_2": "value1_2",
"property1_3_index1": {"subproperty1_1_1":"subvalue1_1_1", "subproperty1_1_2":"subvalue1_1_2"},
"property1_3_index2": {"subproperty1_2_1":"subvalue1_2_1", "subproperty1_2_2":"subvalue1_2_2"}
},
{
"property2_1": "value2_1",
"property2_2": "value2_2",
"property2_3_index1": {"subproperty2_1_1":"subvalue2_2_1", "subproperty2_1_2":"subvalue2_2_2"},
"property2_3_index2": {"subproperty2_2_1":"subvalue2_2_1", "subproperty2_2_2":"subvalue2_2_2"},
"property2_3_index3": {"subproperty2_3_1":"subvalue2_2_1", "subproperty2_3_2":"subvalue2_2_2"}
}
]
My last try was:
transformData(input) {
const testArray = [];
input.map(obj => {
for (const prop in obj) {
if (obj.hasOwnProperty(prop) && Array.isArray(obj[prop])) {
for (const [index, element] of obj[prop].entries()) {
testArray.push(element[0]);
}
}
}
});
}
but this only leads to a single array containing all the subobjects. I'm also not sure whether it is better to convert the original data or to build a new array, as I tried above.
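I finally found a way to achieve this.
transformData(input) {
  return input.map(obj => {
    // Iterate over a snapshot of the keys, since properties are added and deleted below.
    for (const prop of Object.keys(obj)) {
      if (Array.isArray(obj[prop])) {
        // Each entry is a one-element array; flatten once to get the objects.
        const items = obj[prop].flat(1);
        items.forEach((item, i) => {
          // e.g. property1_3 -> property1_3_index1, property1_3_index2, ...
          obj[`${prop}_index${i + 1}`] = item;
        });
        delete obj[prop];
      }
    }
    return obj;
  });
}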
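Note that the approach above modifies the input objects in place. As an alternative sketch, assuming a standalone function is acceptable and the input should stay untouched, the version below builds new objects and uses the _indexN suffix from the desired output shown in the question.

```javascript
// Returns new objects; the input array and its objects are left untouched.
function transformData(input) {
  return input.map((obj) => {
    const result = {};
    for (const [prop, value] of Object.entries(obj)) {
      if (Array.isArray(value)) {
        // Each element is a one-item array wrapping the object to keep.
        value.flat(1).forEach((element, i) => {
          result[`${prop}_index${i + 1}`] = element;
        });
      } else {
        result[prop] = value;
      }
    }
    return result;
  });
}

const sample = [
  {
    property1_1: 'value1_1',
    property1_3: [
      [{ subproperty1_1_1: 'subvalue1_1_1' }],
      [{ subproperty1_2_1: 'subvalue1_2_1' }],
    ],
  },
];
console.log(transformData(sample));
```

Building a fresh object avoids adding and deleting keys on an object while it is being iterated, at the cost of copying the scalar properties.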

How can I assign an element of the array to a new key in the document on MongoDB

I have a problem with MongoDB.
I have a collection with many documents like this:
{
key1:[
{el:"EL1"},
{el:"EL2"}
]
}
I want to update all documents in the collection col adding a new key key2 where the value is key1.0.
In particular, a generic output document would be:
{
key1:[
{el:"EL1"},
{el:"EL2"}
],
key2: {el:"EL1"}
}
How can I do that?
Thanks
You can use $addFields with $let to generate a new field based on the key1.0 value, and then run $out to write the aggregation result back, which replaces the contents of the existing collection:
db.col.aggregate([
  {
    $addFields: {
      "key2.el": {
        $let: {
          vars: { fst: { $arrayElemAt: [ "$key1", 0 ] } },
          in: "$$fst.el"
        }
      }
    }
  },
  { $out: "col" }
])

logstash filter to modify array field

I am looking for a logstash filter that can modify array fields.
For example, I would like a modifier that can turn this JSON document
{
arrayField: [
{
subfield: {
subsubfield: "value1"
}
},
{
subfield: {
subsubfield: "value2"
}
}
]
}
Into this JSON document
{
arrayField: [
{
subfield: "value1"
},
{
subfield: "value2"
}
]
}
I have tried the following input
input {
mutate {
replace => ["[arrayField][subfield]", "%{[arrayField][subField][subsubField]}"]
}
}
but the input just rewrites the array field instead of operating on each element of the array. How do you set up a modifier to operate on each element of an array?
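Thanks to Alain Collins for pointing out the ruby filter. The filter below did the trick. Note that it belongs in a filter block, and that on Logstash 5.0 and later fields are read and written with event.get and event.set rather than event['...'].
filter {
  ruby {
    code => "
      event['arrayField'].each{|subdoc| subdoc['subfield'] = subdoc['subfield']['subsubfield']}
    "
  }
}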
