Querying list in a Snowflake VARIANT file

I have a VARIANT table that has many JSON files but, for the sake of the example and to illustrate my issue, let's look at only the two rows below.
{
"id" : "1",
"fields":
[
{
"id": "somekey1",
"value" : "value1"
},
{
"id": "somekey2",
"value" : "value2"
}
]
},
{
"id" : "2",
"fields":
[
{
"id": "somekey1",
"value" : "value1"
},
{
"id": "somekey2",
"value" : "value2"
},
{
"id": "somekey3",
"value" : "value3"
}
]
}
I want to write a query that would give me this output:
ID    VALUES
1     ["value1","value2","value3"]
2     ["value1","value2"]
I have tried many things and this query gave me something of a result but not remotely close to the desired output:
SELECT
file:id as ID,
s.value:value::varchar as VALUES
from variant_table,
table(flatten(FILE:fields)) s
The result is below; in addition, the query drops any JSON document whose fields array is empty:
ID    VALUES
1     "value1"
1     "value2"
1     "value3"
2     "value1"
2     "value2"
What would be the best approach to solve this in Snowflake?

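After flattening, array_agg with GROUP BY collapses the rows back into one array per ID (to also keep IDs whose fields array is empty, call flatten(input => FILE:fields, outer => true) instead):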
SELECT
file:id::int as ID,
array_agg(s.value:value::varchar) as "VALUES"
from variant_table,
table(flatten(FILE:fields)) s
group by ID

Related

How to project a specific index inside a multilevel nested array in mongodb

I have a particular field in my document which has a multilevel nested array structure. The document looks something like this:
{
"_id" : ObjectId("62171b4207476091a17f595f"),
"data" : [
{
"id" : "1",
"content" : [
{
"id" : "1.1",
"content" : []
},
{
"id" : "1.2",
"content" : [
{
"id" : "1.2.1",
"content" : [
{
"id" : "1.2.1.1",
"content" : []
}
]
},
{
"id" : "1.2.2",
"content" : []
}
]
}
]
}
]
}
(The ids in my actual data are random strings; I have used more structured ids here just for readability.)
In my application code the nesting level is controlled so it won't go more than 5 levels deep.
I need to project a particular object inside one of the deeply nested arrays.
I have all the information needed to traverse the nested structure. For example if I need to fetch the object with id "1.2.2" my input will look something like this:
[{id: 1, index: 0}, {id: 1.2, index: 1}, {id: 1.2.2, index: 1}]
In the above array, each element represents one level of nesting. I have the id and the index. So in the above example, I know I first need to go to index 0 at the top level, then inside that to index 1, then inside that to index 1 again to find my object.
Is there a way to get only the inner object I want directly with a query? Or will I need to fetch the whole "data" field and do the traversal in my application code? I have been unable to figure out how to construct a query that satisfies my need.
Query
If you know the path, you can do it with a series of nested $getField and $arrayElemAt calls.
You can do it in one stage with nested calls, with several intermediate fields as I did below, or with MongoDB variables.
*I am not sure what output you need; this follows the indexes down to the value 2 (if this is not what you need, please add the expected output).
Data
[
{
"_id": ObjectId( "62171b4207476091a17f595f"),
"data": [
{
"id": "1",
"content": [
{
"id": "1.1",
"content": []
},
{
"id": "1.2",
"content": [
{
"id": "1.2.1",
"content": [
{
"id": "1.2.1.1",
"content": []
}
]
},
{
"id": "1.2.2",
"content": [1,2]
}
]
}
]
}
]
}
]
Query
aggregate(
[{"$set":
{"c1":
{"$getField":
{"field":"content", "input":{"$arrayElemAt":["$data", 0]}}}}},
{"$set":
{"c2":
{"$getField":
{"field":"content", "input":{"$arrayElemAt":["$c1", 1]}}}}},
{"$set":
{"c3":
{"$getField":
{"field":"content", "input":{"$arrayElemAt":["$c2", 1]}}}}},
{"$project":{"_id":0, "c4":{"$arrayElemAt":["$c3", 1]}}}])
Results
[{
"c4": 2
}]
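When the path arrives as a list of indexes, the nested $getField / $arrayElemAt expression can be generated rather than written by hand. The following is a minimal sketch, assuming the field names "data" and "content" from the question; buildPathExpression and walk are hypothetical helper names, and walk only mirrors the same traversal in application code so the idea can be checked without a server.

```javascript
// Hypothetical helper: builds one nested aggregation expression from a list
// of array indexes, e.g. [0, 1, 1] for the object with id "1.2.2".
function buildPathExpression(indexes) {
  let expr = "$data"; // top-level array field from the question
  indexes.forEach((index, depth) => {
    const element = { $arrayElemAt: [expr, index] };
    const isLast = depth === indexes.length - 1;
    // Descend into "content" at every level except the last one.
    expr = isLast ? element : { $getField: { field: "content", input: element } };
  });
  return expr;
}

// Hypothetical helper: the same traversal done in application code.
function walk(doc, indexes) {
  let level = doc.data;
  let node;
  for (const index of indexes) {
    if (!Array.isArray(level)) return undefined; // nothing to descend into
    node = level[index];
    if (node === undefined) return undefined; // index out of range
    level = node.content;
  }
  return node;
}

const sampleDoc = {
  data: [{ id: "1", content: [
    { id: "1.1", content: [] },
    { id: "1.2", content: [
      { id: "1.2.1", content: [{ id: "1.2.1.1", content: [] }] },
      { id: "1.2.2", content: [] },
    ] },
  ] }],
};

const pathExpr = buildPathExpression([0, 1, 1]);
const found = walk(sampleDoc, [0, 1, 1]);
console.log(JSON.stringify(pathExpr));
console.log(found.id);
```

The generated expression could then be used in a single stage such as {"$project": {"_id": 0, "result": pathExpr}}, assuming a server version that supports $getField.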

Multiple match conditions for same array element

I have a collection named "devices" with roughly 50,000 documents. I'm trying to query the "routes" array within each document and have it return the document if multiple conditions are met for the individual array elements. The problem is it seems Mongo is giving back answers where the multiple conditions are satisfied for different array elements.
Sample Data:
{
"_id": 0,
"name": "example1",
"serial": "123456",
"routes": [
{
"description": "8989",
"zone": "front"
},
{
"description": "1221",
"zone": "back"
}
]
},
{
"_id": 1,
"name": "example2",
"serial": "987654",
"routes": [
{
"description": "1515",
"zone": "front"
},
{
"description": "8989",
"zone": "side"
}
]
}
I've tried simple .find() variations with no luck, including:
db.devices.find({"routes.description":"8989", "routes.zone":"front"})
db.devices.find({"$and": [{"routes.description":"8989"}, {"routes.zone":"front"}]})
I've also tried aggregations, which seem to fail on me since my understanding of them is elementary. The desired result for the queries above would be a single document ("_id":0) and not both documents.
{ "_id" : 0, "name" : "example1", "serial" : "123456", "routes" : [ { "description" : "8989", "zone" : "front" }, { "description" : "1221", "zone" : "back" } ] }
Additionally, the ability to query the array using the $in operator would be desired. For example, the following query's desired output would be both documents since both of them have routes that match "zone":"front" and "descriptions" that are in the list.
db.devices.find({"$and": [{"routes.description": { $in: ["8989", "1515"] }}, {"routes.zone":"front"}]})
You simply need to use $elemMatch here, which requires a single array element to satisfy all of the conditions:
db.devices.find({routes: {$elemMatch: {description:"8989", zone:"front"}}})
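To see why the dotted-field queries return both documents, the two matching rules can be modelled in plain JavaScript against the sample data. This is only a sketch of the semantics, not how the server evaluates queries; the variable names are illustrative, and inFilter shows how the $in requirement from the question could be written with $elemMatch.

```javascript
const devices = [
  { _id: 0, routes: [{ description: "8989", zone: "front" }, { description: "1221", zone: "back" }] },
  { _id: 1, routes: [{ description: "1515", zone: "front" }, { description: "8989", zone: "side" }] },
];

// Models {"routes.description": "8989", "routes.zone": "front"}:
// each condition may be satisfied by a different array element.
const dottedMatch = devices.filter(
  (d) =>
    d.routes.some((r) => r.description === "8989") &&
    d.routes.some((r) => r.zone === "front")
);

// Models {routes: {$elemMatch: {description: "8989", zone: "front"}}}:
// one and the same element must satisfy both conditions.
const elemMatch = devices.filter((d) =>
  d.routes.some((r) => r.description === "8989" && r.zone === "front")
);

// Models the $in variant: description in a list AND zone "front" on one element.
const wanted = ["8989", "1515"];
const elemMatchIn = devices.filter((d) =>
  d.routes.some((r) => wanted.includes(r.description) && r.zone === "front")
);

// The corresponding filter document for db.devices.find(...).
const inFilter = { routes: { $elemMatch: { description: { $in: wanted }, zone: "front" } } };

console.log(dottedMatch.map((d) => d._id)); // [ 0, 1 ]
console.log(elemMatch.map((d) => d._id)); // [ 0 ]
console.log(elemMatchIn.map((d) => d._id)); // [ 0, 1 ]
```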

JSON schema different for first row and different for remaining rows

My problem statement is:
Consider a list of 15 rows in which every row should have 5 keys, except the 0th row, which has only 4 keys. All the remaining rows have all 5 keys.
I want to validate this against my response. Do the "first" and "other" keywords really exist?
I found this in "Correct JSON Schema for an array of items of different type".
Example schema
{
"type": "array",
"items": {
"oneOf": [
{
"first": [{
"type": "object",
"required": ["state"],
"properties":{
"state":{
"type":"string"
}
}
}]
},
{
"other": [{
"type": "object",
"required": ["state", "zip"],
"properties":{
"state":{
"type":"string"
},
"zip":{
"type":"string"
}
}
}]
}
]
}
}
First things first: what do you want to achieve with the following schema definition?
"first" : [ { ...schema... } ]
As to your problem statement, I am not sure which of these you want:
A schema that allows the first array item to be an object with 4 keys, while all other items must have 5 keys?
A schema that allows only array items that are objects with 5 keys, and rejects a JSON document whose first item has 4 keys?
Could you please rephrase your question to make it clearer? I put together a solution based on assumptions, but it would be good if you could confirm my understanding.
Required reading
Please read first through:
http://json-schema.org/latest/json-schema-validation.html#rfc.section.6.4.1
If "items" is an array of schemas, validation succeeds if each element
of the instance validates against the schema at the same position, if
any.
plus https://stackoverflow.com/a/52758108/2811843 on above topic
https://json-schema.org/understanding-json-schema/reference/array.html#length
https://json-schema.org/understanding-json-schema/reference/array.html#tuple-validation
and https://json-schema.org/understanding-json-schema/reference/array.html in general
as well as
https://json-schema.org/understanding-json-schema/reference/object.html#property-names
https://json-schema.org/understanding-json-schema/reference/object.html#size
and https://json-schema.org/understanding-json-schema/reference/object.html in general.
Possible solution
After looking at the sample schema, I will rephrase the problem statement, making some wild assumptions: you want a schema that allows an array of items, where each item is an object. The first item may have 4 keys, while all other items must have 5 keys.
I need a JSON schema that will describe an array of objects, where
first object always has 4 keys/properties, while all remaining objects
do have 5 keys/properties.
Additionally, there is always at least first item in array (containing 4 keys) and there can be up to X other
objects (containing 5 keys) in array.
Go for tuple typing and an array of objects. That way you can check that the first item (an object) has at least 4 properties (add "maxProperties" if it must be exactly 4) and define a separate schema for the rest of the items.
First, the full working schema (with comments inside). The "examples" section contains sample arrays to illustrate the logic; only the last 3 are valid against the schema.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"$comment" : "This is an array, where first item must be an object with at least 4 properties and one property named \"state\" and can contain minimum 1 and maximum of 3 items",
"minItems" : 1,
"maxItems" : 3,
"items": [
{
"type": "object",
"minProperties" : 4,
"required" : ["state"]
}
],
"additionalItems" : {
"$comment" : "Any additional item in this array must be an object with at least 5 keys and two of them must be \"state\" and \"zip\".",
"type" : "object",
"minProperties" : 5,
"required" : ["state", "zip"]
},
"examples" : [
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{},
{}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
}
],
[],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
}
]
]
}
So, step by step:
"type": "array",
"minItems" : 1,
"maxItems" : 3,
A JSON array with a minimum of 1 item and a maximum of 3 items will pass. If you don't define "minItems", an empty array would also pass validation against the schema.
"items": [
{
"type": "object",
"minProperties" : 4,
"required" : ["state"]
}
],
This is the tuple magic: a finite, ordered list of elements (a sequence). Yep, maths has its say. By using "items" : [ ... ] instead of { ... } you fall under the section of the JSON Schema Validation spec quoted above (http://json-schema.org/latest/json-schema-validation.html#rfc.section.6.4.1 ).
The above basically says: this is an array whose first item must be an object with at least 4 keys, one of which must be "state".
OK, last but not least:
"additionalItems" : {
"$comment" : "Any additional item in this array must be an object with at least 5 keys and two of them must be \"state\" and \"zip\".",
"type" : "object",
"minProperties" : 5,
"required" : ["state", "zip"]
}
By this I said:
In this array (whose first item must be an object with at least 4 keys, one of which is "state", and which, by the way, must have at least 1 item and at most 3 items) you can have additional items on top of the ones already defined in the "items" section. Each such additional item must be an object with at least 5 keys, two of which must be "state" and "zip".
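The same rules can be spelled out as a small hand-written check, which may help when reasoning about which arrays pass. This is only a sketch that mirrors the schema above in plain JavaScript; it is not a JSON Schema validator, and checkRows, hasKeys and isPlainObject are hypothetical helper names.

```javascript
// Returns true when every required key is an own property of obj.
function hasKeys(obj, required) {
  return required.every((key) => Object.prototype.hasOwnProperty.call(obj, key));
}

// Mirrors "type": "object" (arrays and null are not objects in JSON Schema).
function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function checkRows(rows) {
  if (!Array.isArray(rows)) return false; // "type": "array"
  if (rows.length < 1 || rows.length > 3) return false; // minItems / maxItems
  return rows.every((row, position) => {
    if (!isPlainObject(row)) return false;
    // Position 0 follows "items"; every later position follows "additionalItems".
    const minProperties = position === 0 ? 4 : 5;
    const required = position === 0 ? ["state"] : ["state", "zip"];
    return Object.keys(row).length >= minProperties && hasKeys(row, required);
  });
}

const firstRow = { key1: "1", key2: "2", key3: "3", state: "some state" };
const otherRow = { ...firstRow, zip: "12345" };

console.log(checkRows([firstRow])); // true
console.log(checkRows([firstRow, otherRow, otherRow])); // true
console.log(checkRows([firstRow, otherRow, firstRow])); // false, third item lacks "zip"
console.log(checkRows([])); // false, fewer than 1 item
console.log(checkRows([firstRow, otherRow, otherRow, otherRow])); // false, more than 3 items
```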
Does it solve your issue?

How to update an embedded document into a nested array?

I have this kind of structure in a Mongo collection:
{
"_id": "12345678",
"Invoices": [
{
"_id": "123456789",
"Currency": "EUR",
"DueTotalAmountInvoice": 768.3699999999999,
"InvoiceDate": "2016-01-01 00:00:00.000",
"Items": [
{
"Item": 10,
"ProductCode": "ABC567",
"Quantity": 1
},
{
"Item": 20,
"ProductCode": "CDE987",
"Quantity": 1
}
]
},
{
"_id": "87654321",
"Currency": "EUR",
"DueTotalAmountInvoice": 768.3699999999999,
"InvoiceDate": "2016-01-01 00:00:00.000",
"Items": [
{
"Item": 30,
"ProductCode": "PLO987",
"Quantity": 1,
"Units": "KM3"
},
{
"Item": 40,
"ProductCode": "PLS567",
"Quantity": 1,
"DueTotalAmountInvoice": 768.3699999999999
}
]
}
]
}
So I have a first object storing several Invoices and each Invoice is storing several Items. An item is an embedded document.
In relational modelling terms:
A customer has one or several Invoices
An Invoice has one or several Items
I am facing an issue when trying to update a specific Item in a specific Invoice. For example, I want to change the quantity of item 10 in Invoice 123456789.
How is it possible to do that in MongoDB?
I tried:
A push statement, but it doesn't seem to work for nested arrays
arrayFilters, but it doesn't seem to work for embedded documents in nested arrays (only for arrays of simple values).
Can you give me some advice about it?
Thank you !
As per your problem description ("For example I want to change the quantity of the item 10 in Invoice 123456789"), I changed the Quantity to 3. You can perform any other operation here; just take note of how I used arrayFilters.
Try this query:
db.collection.update(
{"_id" : "12345678"},
{$set:{"Invoices.$[element1].Items.$[element2].Quantity":3}},
{multi:true, arrayFilters:[ {"element1._id": "123456789"},{
"element2.Item": { $eq: 10 }} ]}
)
The above query executed successfully from the mongo shell (Mongo 3.6.3), and I see this result:
/* 1 */
{
"_id" : "12345678",
"Invoices" : [
{
"_id" : "123456789",
"Currency" : "EUR",
"DueTotalAmountInvoice" : 768.37,
"InvoiceDate" : "2016-01-01 00:00:00.000",
"Items" : [
{
"Item" : 10,
"ProductCode" : "ABC567",
"Quantity" : 3.0
},
{
"Item" : 20,
"ProductCode" : "CDE987",
"Quantity" : 1
}
]
},
{
"_id" : "87654321",
"Currency" : "EUR",
"DueTotalAmountInvoice" : 768.37,
"InvoiceDate" : "2016-01-01 00:00:00.000",
"Items" : [
{
"Item" : 30,
"ProductCode" : "PLO987",
"Quantity" : 1,
"Units" : "KM3"
},
{
"Item" : 40,
"ProductCode" : "PLS567",
"Quantity" : 1,
"DueTotalAmountInvoice" : 768.37
}
]
}
]
}
Is that what you wanted?
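To make the role of the two filtered positional identifiers explicit, here is a plain-JavaScript model of what the update selects: element1 picks the invoices whose _id matches, and element2 picks the items inside them whose Item matches. It is a sketch of the semantics on an in-memory copy of the sample document (trimmed to the relevant fields), not the server's implementation; setItemQuantity is a hypothetical helper.

```javascript
// Models {$set: {"Invoices.$[element1].Items.$[element2].Quantity": quantity}}
// with arrayFilters [{"element1._id": invoiceId}, {"element2.Item": itemNumber}].
// Mutates the document in place and returns how many items were changed.
function setItemQuantity(customer, invoiceId, itemNumber, quantity) {
  let changed = 0;
  for (const invoice of customer.Invoices) {
    if (invoice._id !== invoiceId) continue; // element1 filter
    for (const item of invoice.Items) {
      if (item.Item !== itemNumber) continue; // element2 filter
      item.Quantity = quantity;
      changed += 1;
    }
  }
  return changed;
}

const customer = {
  _id: "12345678",
  Invoices: [
    { _id: "123456789", Items: [
      { Item: 10, ProductCode: "ABC567", Quantity: 1 },
      { Item: 20, ProductCode: "CDE987", Quantity: 1 },
    ] },
    { _id: "87654321", Items: [
      { Item: 30, ProductCode: "PLO987", Quantity: 1 },
      { Item: 40, ProductCode: "PLS567", Quantity: 1 },
    ] },
  ],
};

const changedCount = setItemQuantity(customer, "123456789", 10, 3);
console.log(changedCount, customer.Invoices[0].Items[0].Quantity);
```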
MongoDB can also address a specific array element by its index. To reach the element at a given index, use dot notation (.) rather than brackets ([ ]). One more important point: when the path goes into an embedded value (an object or an array), the whole path must be written as a quoted string, so the update looks like this:
yourModel.findOneAndUpdate(
{ _id: "12345678" },
{
$set: {
"Invoices.0.Items.0.Quantity": 10,
},
}
);
0 is the index of the element in each array
$set is the operator that sets the new value
10 is the new value
You can go further and build the path from variable indexes by using a template string:
yourModel.findOneAndUpdate(
{ _id: "12345678" },
{
$set: {
[`Invoices.${invoiceIndex}.Items.${itemIndex}.Quantity`]:newValue ,
},
}
);
It is the same update, but with the indexes supplied by variables.
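The computed-key step can be isolated into a small function so the path construction is easy to verify on its own. A minimal sketch: buildQuantityUpdate is a hypothetical helper, and the index validation is an added assumption rather than something the answer requires.

```javascript
// Hypothetical helper: builds the update document for a known
// invoice position and item position.
function buildQuantityUpdate(invoiceIndex, itemIndex, newValue) {
  const valid = (n) => Number.isInteger(n) && n >= 0;
  if (!valid(invoiceIndex) || !valid(itemIndex)) {
    throw new RangeError("indexes must be non-negative integers");
  }
  // The computed property name becomes the dotted path string.
  return { $set: { [`Invoices.${invoiceIndex}.Items.${itemIndex}.Quantity`]: newValue } };
}

const quantityUpdate = buildQuantityUpdate(0, 0, 10);
console.log(JSON.stringify(quantityUpdate));
// {"$set":{"Invoices.0.Items.0.Quantity":10}}
```

Index-based paths depend on the element positions staying the same between reading the document and updating it, whereas the arrayFilters approach matches elements by their values.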

How can I provide multiple criteria for an attribute within an element of array in mongo query?

I have a collection with following documents:
{
"_id": 1,
"books": [
{
"id":"Sherlock Holmes",
"category":"Novel"
},
{
"id":"10 tips for cook",
"category":"Tips"
}
]
},
{
"_id": 2,
"books": [
{
"id":"10 tips for cook",
"category":"Tips"
}
]
},
{
"_id": 3,
"books": [
{
"id":"Sherlock Holmes",
"category":"Novel"
}
]
}
I want to query for the document that contains both books, with ids "Sherlock Holmes" and "10 tips for cook"; in this data that is the document whose "_id" is 1.
I've tried $in and $elemMatch, but the result contains all three documents. I only need one in this case.
Do you have any solutions?
Use the $and operator to apply multiple expressions to the same field:
db.coll.find({
'$and': [
{'books.id': 'Sherlock Holmes'},
{'books.id': '10 tips for cook'}
]
})
Result:
{
"_id" : 1,
"books" : [
{
"id" : "Sherlock Holmes",
"category" : "Novel"
},
{
"id" : "10 tips for cook",
"category" : "Tips"
}
]
}
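The difference between "any of these ids" and "all of these ids" can be modelled in plain JavaScript on the sample documents. This is a sketch of the matching semantics only; the variable names are illustrative, and allFilter shows an equivalent filter document written with the $all operator instead of $and.

```javascript
const coll = [
  { _id: 1, books: [{ id: "Sherlock Holmes", category: "Novel" }, { id: "10 tips for cook", category: "Tips" }] },
  { _id: 2, books: [{ id: "10 tips for cook", category: "Tips" }] },
  { _id: 3, books: [{ id: "Sherlock Holmes", category: "Novel" }] },
];
const wantedIds = ["Sherlock Holmes", "10 tips for cook"];

// Models the $and of two 'books.id' conditions: every wanted id
// must appear in some element of the books array.
const withAllBooks = coll.filter((doc) =>
  wantedIds.every((id) => doc.books.some((book) => book.id === id))
);

// Models {'books.id': {$in: wantedIds}}: one wanted id is enough,
// which is why that attempt returned all three documents.
const withAnyBook = coll.filter((doc) =>
  doc.books.some((book) => wantedIds.includes(book.id))
);

// An equivalent filter document using $all.
const allFilter = { "books.id": { $all: wantedIds } };

console.log(withAllBooks.map((doc) => doc._id)); // [ 1 ]
console.log(withAnyBook.map((doc) => doc._id)); // [ 1, 2, 3 ]
```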
Because _id is unique in a MongoDB collection, you can simply query by it:
db.myCollection.find({_id:1})
If you don't want the whole document to be returned, you can use a projection:
db.myCollection.find({_id:1},{_id:0, books:1})
