Snowflake MERGE with dynamic attributes, or any other solutions - snowflake-cloud-data-platform

We are trying to use Snowflake MERGE for the scenario below but have not been able to make it work. The first set of data is loaded into the target table using a stream and a task; then the same set of data with different attributes is loaded. We need to keep both the pre- and post-merge attributes, and attributes common to both loads should be updated. Are there any solutions to this?
The stream receives approximately 10k messages per minute, so assume the volume will be high.
The attributes are not fixed; they change dynamically based on the job id.
Also, in our case the rows are not duplicates, but we get the exception "Duplicate row detected during DML action" in the second load while updating.
We tried the JavaScript Snowflake connector and performed a deep merge, but it takes too much time.
We are using the script below:
MERGE INTO TARGET_TABLE target USING STREAM_TABLE Pstream
ON (target.message:meta:jobid = Pstream.message:meta:jobid)
WHEN MATCHED
THEN UPDATE SET target.message = Pstream.message, target.lastUpdatedTimestamp = current_timestamp
WHEN NOT MATCHED
THEN INSERT VALUES (Pstream.message, current_timestamp);
First load:
{
"at": {
"attribute1" : "value1",
"attribute2" : "value2",
"attribute3" : "value3",
"attribute4" : "value4",
"attribute5" : "value5"
},
"meta": {
"jobid" : 123,
"countryCode": "value",
"countryName" : "value"
}
}
Second load (same jobid, multiple messages):
{
"at": {
"attribute3" : "value3", <update>
"attribute4" : "value4", <update>
"attribute6" : "value6" <Insert>
},
"meta": {
"jobid" : 123,
"countryCode": "value",
"countryName" : "value"
}
}
{
"at": {
"attribute1" : "value1", <update>
"attribute4" : "value4", <update>
"attribute6" : "value6", <update>
"attribute7" : "value7" <Insert>
},
"meta": {
"jobid" : 123,
"countryCode": "value",
"countryName" : "value"
}
}
Final result (consider the "at" structure dynamic):
{
"at": {
"attribute1" : "value1", <updated value>
"attribute2" : "value2", <old value>
"attribute3" : "value3", <updated value>
"attribute4" : "value4", <updated value>
"attribute5" : "value5", <old value>
"attribute6" : "value6", <updated value>
"attribute7" : "value7" <new value>
},
"meta": {
"jobid" : 123,
"countryCode": "value",
"countryName" : "value"
}
}
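Snowflake raises "Duplicate row detected during DML action" when more than one source row in a MERGE joins to the same target row (governed by the ERROR_ON_NONDETERMINISTIC_MERGE parameter, which is TRUE by default). In the second load both messages carry jobid 123, so both match the single target row even though they are not duplicates. A common way around this is to collapse each stream batch to one row per jobid before merging, and to merge the "at" objects key by key. The sketch below models that logic in plain JavaScript (the language already tried via the connector); it is not Snowflake SQL, the helper names applyBatch and mergeAt are hypothetical, it assumes messages arrive in order, and it does only the shallow merge of "at" that the expected final result calls for.

```javascript
// Plain-JavaScript model of the intended merge; this is NOT Snowflake SQL.
// TARGET_TABLE is modelled as a Map from jobid to message.

// Shallow merge of the "at" objects: keys only in oldAt keep their old value,
// keys present in newAt are inserted (if new) or overwritten (if common).
function mergeAt(oldAt, newAt) {
  return { ...oldAt, ...newAt };
}

// Hypothetical helper: applies one batch of stream messages to the table.
function applyBatch(table, messages) {
  // Step 1: collapse the batch to ONE message per jobid, folding messages
  // in arrival order. In a real MERGE this is what keeps several source
  // rows from matching the same target row.
  const collapsed = new Map();
  for (const msg of messages) {
    const jobId = msg.meta.jobid;
    const prev = collapsed.get(jobId);
    collapsed.set(
      jobId,
      prev
        ? { at: mergeAt(prev.at, msg.at), meta: { ...prev.meta, ...msg.meta } }
        : { at: { ...msg.at }, meta: { ...msg.meta } }
    );
  }

  // Step 2: merge each collapsed message into a copy of the table.
  const result = new Map(table);
  for (const [jobId, msg] of collapsed) {
    const existing = result.get(jobId);
    if (existing) {
      // WHEN MATCHED: keep old attributes, update common ones, add new ones.
      result.set(jobId, {
        at: mergeAt(existing.at, msg.at),
        meta: { ...existing.meta, ...msg.meta },
      });
    } else {
      // WHEN NOT MATCHED: insert the collapsed message as-is.
      result.set(jobId, msg);
    }
  }
  return result;
}

// Data as given in the question.
const meta = { jobid: 123, countryCode: "value", countryName: "value" };
const firstLoad = [
  {
    at: { attribute1: "value1", attribute2: "value2", attribute3: "value3",
          attribute4: "value4", attribute5: "value5" },
    meta,
  },
];
const secondLoad = [
  { at: { attribute3: "value3", attribute4: "value4", attribute6: "value6" }, meta },
  { at: { attribute1: "value1", attribute4: "value4", attribute6: "value6",
          attribute7: "value7" }, meta },
];

let table = applyBatch(new Map(), firstLoad);
table = applyBatch(table, secondLoad);
console.log(Object.keys(table.get(123).at).join(","));
// attribute1,attribute2,attribute3,attribute4,attribute5,attribute6,attribute7
```

The logged keys match the expected final result: attribute2 and attribute5 keep their old values, common keys take the latest value, and attribute6 and attribute7 are added. In Snowflake the collapse step would still have to be expressed in SQL or a UDF over the stream, which this sketch does not show.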

Related

Find only the documents which have two embedded documents

I'm using MongoDB to analyse a Nobel Prize dataset whose documents look like this:
> db.laureate.find().pretty().limit(1)
{
"_id" : ObjectId("604bc8c847d640142f02b3b1"),
"id" : "1",
"firstname" : "Wilhelm Conrad",
"surname" : "Röntgen",
"born" : "1845-03-27",
"died" : "1923-02-10",
"bornCountry" : "Prussia (now Germany)",
"bornCountryCode" : "DE",
"bornCity" : "Lennep (now Remscheid)",
"diedCountry" : "Germany",
"diedCountryCode" : "DE",
"diedCity" : "Munich",
"gender" : "male",
"prizes" : [
{
"year" : "1901",
"category" : "physics",
"share" : "1",
"motivation" : "\"in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him\"",
"affiliations" : [
{
"name" : "Munich University",
"city" : "Munich",
"country" : "Germany"
}
]
}
]
}
As you can see, the "prizes" field holds an array of embedded documents. The query I am trying to write should return only those laureates who won two prizes (which I already know to be Marie Curie and Linus Pauling). Can you help me with that?
Thanks in advance!
The $size operator should work fine for this. You can read about it at this link: https://docs.mongodb.com/manual/reference/operator/query/size/
Your new query (without .limit(1), which would return only one of the matching laureates):
db.laureate.find({prizes: {$size: 2}}).pretty()
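$size matches arrays of exactly the given length; it does not accept ranges such as "two or more". For that case, a commonly used alternative is to test whether the array has an element at index 1, e.g. {"prizes.1": {$exists: true}}. The sketch below models both filters in plain JavaScript on a few hypothetical, trimmed-down documents (not the real dataset); matchesSize and hasAtLeast are illustrative names, not MongoDB functions.

```javascript
// Model of {prizes: {$size: 2}}: the field must be an array of exactly n elements.
function matchesSize(doc, field, n) {
  const value = doc[field];
  return Array.isArray(value) && value.length === n;
}

// Model of {"prizes.1": {$exists: true}} for n = 2: the array has at least n elements.
function hasAtLeast(doc, field, n) {
  const value = doc[field];
  return Array.isArray(value) && value.length >= n;
}

// Hypothetical, trimmed-down sample documents.
const laureates = [
  { surname: "Röntgen", prizes: [{ year: "1901", category: "physics" }] },
  { surname: "Curie", prizes: [{ year: "1903", category: "physics" },
                               { year: "1911", category: "chemistry" }] },
  { surname: "Pauling", prizes: [{ year: "1954", category: "chemistry" },
                                 { year: "1962", category: "peace" }] },
];

const twoPrizes = laureates
  .filter((d) => matchesSize(d, "prizes", 2))
  .map((d) => d.surname);
console.log(twoPrizes.join(", ")); // Curie, Pauling
```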

Jolt to Transform JSON Array

I want to use Jolt to transform a JSON dataset. The problem is that my entire dataset is treated as an array because it was originally converted from XML. Here is an example of the first 3 records:
{
"XMLSOCCER.COM" : { "Team" :[{
"Team_Id" : "45",
"Name" : "Aberdeen",
"Country" : "Scotland",
"Stadium" : "Pittodrie Stadium",
"HomePageURL" : "http://www.afc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/Aberdeen_F.C.",
"Capacity" : "20866",
"Manager" : "Derek McInnes"
},{
"Team_Id" : "46",
"Name" : "St Johnstone",
"Country" : "Scotland",
"Stadium" : "McDiarmid Park",
"HomePageURL" : "http://www.perthstjohnstonefc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/St._Johnstone_F.C."
},{
"Team_Id" : "47",
"Name" : "Motherwell",
"Country" : "Scotland",
"Stadium" : "Fir Park Stadium",
"HomePageURL" : "http://www.motherwellfc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/Motherwell_F.C."
}]}}
For a single record set, I can use this spec, which gives me the correct output:
[
{
"operation": "shift",
"spec": {
"XMLSOCCER.COM": {
"Team": {
"Team_Id": "Team_Id",
"Name": "Name",
"Country": "Country",
"Stadium": "Stadium",
"Capacity": "Capacity",
"Manager": "Manager"
}
}
}}]
But because my entire dataset is treated as a JSON array (an array under "Team"), I cannot figure out how to write a spec that handles this structure. I appreciate any input. Thanks!
Spec: match into every element of the Team array with "*", then use &1 (the index of the matched element) in each output key.
[
{
"operation": "shift",
"spec": {
"XMLSOCCER.COM": {
"Team": {
"*": {
"Team_Id": "soccer[&1].Team_Id",
"Name": "soccer[&1].Name",
"Country": "soccer[&1].Country",
"Stadium": "soccer[&1].Stadium",
"Capacity": "soccer[&1].Capacity",
"Manager": "soccer[&1].Manager"
}
}
}
}
}
]
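To check what the answer's spec should produce, here is a plain-JavaScript model of the same shift (it is not Jolt, which is a Java library): every element of XMLSOCCER.COM.Team is written to soccer[index], where index plays the role of &1, and only the six listed keys are copied. HomePageURL and WIKILink are therefore dropped, and keys missing from an element (Capacity and Manager for St Johnstone) are simply absent from the output. shiftTeams and FIELDS are hypothetical names.

```javascript
// Keys copied by the shift spec; anything else is dropped.
const FIELDS = ["Team_Id", "Name", "Country", "Stadium", "Capacity", "Manager"];

function shiftTeams(input) {
  const teams = input["XMLSOCCER.COM"].Team;
  const soccer = [];
  teams.forEach((team, index) => {
    // "index" corresponds to &1: the array position matched by "*".
    const out = {};
    for (const key of FIELDS) {
      if (key in team) out[key] = team[key]; // shift writes only keys that exist
    }
    soccer[index] = out;
  });
  return { soccer };
}

// Two of the records from the question.
const teamInput = {
  "XMLSOCCER.COM": {
    Team: [
      { Team_Id: "45", Name: "Aberdeen", Country: "Scotland", Stadium: "Pittodrie Stadium",
        HomePageURL: "http://www.afc.co.uk", Capacity: "20866", Manager: "Derek McInnes" },
      { Team_Id: "46", Name: "St Johnstone", Country: "Scotland", Stadium: "McDiarmid Park",
        HomePageURL: "http://www.perthstjohnstonefc.co.uk" },
    ],
  },
};

console.log(JSON.stringify(shiftTeams(teamInput), null, 2));
```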

MongoDB - Array element as variable

Let's say I have the following document:
{
"_id" : 1.0,
"custname" : "metlife",
"address" : {
"city" : "Bangalore",
"country" : "INDIA"
}
}
I want to add an extra field to this document, with something like the following:
db.customers.updateMany(
{"address.country":"INDIA"},
{$push : {city: "$address.country"}}
)
It results in the wrong update:
{
"_id" : 1.0,
"custname" : "metlife",
"address" : {
"city" : "Bangalore",
"country" : "INDIA"
},
"city" : "$address.city"
}
Instead of this:
{
"_id" : 1.0,
"custname" : "metlife",
"address" : {
"city" : "Bangalore",
"country" : "INDIA"
},
"city" : "Bangalore"
}
How do I achieve the above result?
You can't currently refer to other field values in an update. There is a workaround in the aggregation framework (using $out), but it will replace the entire collection.
I think you can consider using $rename in your case. It will not add a new field, but it can move city to the top level of your document.
db.customers.updateMany({"address.country":"INDIA"}, {$rename: {"address.city": "city"}})
will give you the following structure:
{ "_id" : 1, "custname" : "metlife", "address" : { "country" : "INDIA" }, "city" : "Bangalore" }
As @mickl said, you can't currently refer to other field values in an update, so you have to iterate through your collection to update the documents. Try this:
db.eval(function() {
    // copy address.city into a top-level "city" field of each matching document
    db.customers.find({"address.country":"INDIA"}).forEach(function(e) {
        e.city = e.address.city;
        db.customers.save(e);
    });
});
Keep in mind that db.eval blocks the database until all updates are done; it was also deprecated in MongoDB 3.0 and removed in 4.2.
Try this:
db.customers.updateMany(
{"address.country":"INDIA"},
{$push : {city: "address.country"}}
)
(that is, remove the $ sign)
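On MongoDB 4.2 and later, updateMany also accepts an aggregation pipeline as its update argument, so the copy can be done server-side in one statement, for example db.customers.updateMany({"address.country": "INDIA"}, [{ $set: { city: "$address.city" } }]); inside a pipeline stage, "$address.city" is read as a field path rather than a literal string. The sketch below is only a plain-JavaScript model of the per-document effect (what the forEach loop above does), run on an in-memory array instead of a real collection; copyCityToTopLevel is a hypothetical helper name.

```javascript
// Plain-JavaScript model (no MongoDB involved): copy address.city into a new
// top-level "city" field for every document whose address.country matches.
function copyCityToTopLevel(docs, country) {
  return docs.map((doc) =>
    doc.address && doc.address.country === country
      ? { ...doc, city: doc.address.city } // like $set: { city: "$address.city" }
      : doc // non-matching documents are returned unchanged
  );
}

const customerDocs = [
  { _id: 1, custname: "metlife", address: { city: "Bangalore", country: "INDIA" } },
];

console.log(JSON.stringify(copyCityToTopLevel(customerDocs, "INDIA")[0]));
// {"_id":1,"custname":"metlife","address":{"city":"Bangalore","country":"INDIA"},"city":"Bangalore"}
```

Unlike $rename, this keeps address.city in place and adds the top-level copy, which is the result the question asks for.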

Searching a JSON document in MongoDB for a value across array elements

I have a fairly complex document (being new to MongoDB schemas, I think it's complex) that I'm trying to search for a specific array value across different array sections of the document.
Sample content of my document:
{
"_id" : ObjectId("541c0c9bdfecb53368e12ef0"),
"SRVIP" : "10.10.10.10",
"INSNME" : "myinstance",
"DBNAME" : "mydbname",
"DBGRPL" : [{
"GRPNME" : "grp1",
"GRPPRV" : "7",
"GRPAUT" : [ "AUTH1","AUTH2"],
"GRPUSR" : [ "USER1","USER2"]
}
],
"SAUTLV" : [ { "SAUNME" : "USER4",
"SAUPRV" : "0",
"SAUAUT" : [ "AUTH2","AUTH3"],
"SAUUSR" : [ "USER2" ]
}
],
"USRLVL" : [
{ "ULVNME" : "USER1",
"ULVPRV" : "0",
"ULVAUT" : [ "AUTH1","AUTH2","AUTH3"]
},
{
"ULVNME" : "USER2",
"ULVPRV" : "2411",
"ULVAUT" : [ "AUTH3"]
}
]
}
I'm trying to return only the sections of the document where, for example, USER1 exists.
At the moment I've created two different aggregate statements to retrieve the information, but I'm looking for a single statement that searches all the arrays in the document.
Retrieving USER1 at the DBGRPL array level:
var var1 = ["USER1"]
db.authinfo.aggregate({$unwind:"$DBGRPL"},{$match:{"DBGRPL.GRPUSR":{$in:var1}}},{$project:{SRVIP:1,DBNAME:1,"DBGRPL":1}})
Retrieving USER1 at the USRLVL array level:
var var1 = "USER1"
db.authinfo.aggregate({$unwind:"$USRLVL"},{$match:{"USRLVL.ULVNME":var1}},{$project:{SRVIP:1,DBNAME:1,"USRLVL":1}})
The obvious problem with the above approach is that the two queries need the variable to have different types (an array for $in, a plain string for the equality match), which I also can't resolve at the moment...
How can I combine the search into a single statement?
Expected output:
{
"_id" : ObjectId("541c0c9bdfecb53368e12ef0"),
"SRVIP" : "10.10.10.10",
"INSNME" : "myinstance",
"DBNAME" : "mydbname",
"DBGRPL" : [{
"GRPNME" : "grp1",
"GRPPRV" : "7",
"GRPAUT" : [ "AUTH1","AUTH2"],
"GRPUSR" : [ "USER1","USER2"]
}
],
"USRLVL" : [
{ "ULVNME" : "USER1",
"ULVPRV" : "0",
"ULVAUT" : [ "AUTH1","AUTH2","AUTH3"]
}
]
}
when searching for USER1.
I will also search across the GRPAUT, SAUAUT and ULVAUT sections of the document, for example where AUTH1 is a value...
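One way to avoid the two-variable-types problem is to use a single predicate that accepts the value whether a field holds it directly (ULVNME) or inside an array (GRPUSR, SAUAUT, ...). The sketch below models that in plain JavaScript on the sample document: it keeps only the matching elements of each listed array section and drops sections with no match, which reproduces the expected output for USER1. On the server, the same per-array filtering could be expressed in one aggregate using the $filter expression (available since MongoDB 3.2) on each array inside a $project stage; that pipeline is not shown here. SECTIONS, elementMentions and filterSections are hypothetical names, and ObjectId is replaced by a plain string.

```javascript
// Array sections to search (hypothetical list taken from the sample document).
const SECTIONS = ["DBGRPL", "SAUTLV", "USRLVL"];

// True if any field of the element equals `value`, or is an array containing it.
function elementMentions(element, value) {
  return Object.values(element).some((v) =>
    Array.isArray(v) ? v.includes(value) : v === value
  );
}

// Keep top-level scalars; in each section keep only matching elements;
// drop sections where nothing matches.
function filterSections(doc, value) {
  const result = {};
  for (const [key, v] of Object.entries(doc)) {
    if (!SECTIONS.includes(key) || !Array.isArray(v)) {
      result[key] = v;
      continue;
    }
    const matches = v.filter((el) => elementMentions(el, value));
    if (matches.length > 0) result[key] = matches;
  }
  return result;
}

const authDoc = {
  _id: "541c0c9bdfecb53368e12ef0", // ObjectId replaced by a string for this sketch
  SRVIP: "10.10.10.10",
  INSNME: "myinstance",
  DBNAME: "mydbname",
  DBGRPL: [{ GRPNME: "grp1", GRPPRV: "7", GRPAUT: ["AUTH1", "AUTH2"], GRPUSR: ["USER1", "USER2"] }],
  SAUTLV: [{ SAUNME: "USER4", SAUPRV: "0", SAUAUT: ["AUTH2", "AUTH3"], SAUUSR: ["USER2"] }],
  USRLVL: [
    { ULVNME: "USER1", ULVPRV: "0", ULVAUT: ["AUTH1", "AUTH2", "AUTH3"] },
    { ULVNME: "USER2", ULVPRV: "2411", ULVAUT: ["AUTH3"] },
  ],
};

console.log(JSON.stringify(filterSections(authDoc, "USER1"), null, 2));
console.log(JSON.stringify(filterSections(authDoc, "AUTH1"), null, 2));
```

Because the same string works for name fields and list fields, one search value ("USER1" or "AUTH1") covers every section.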

PostgreSQL JSON array index

I have data for customers with more than one address, with a JSON representation like this:
{
"firstName" : "Max",
"lastName" : "Mustermann",
"addresses" : [{
"city" : "München",
"houseNumber" : "1",
"postalCode" : "87654",
"street" : "Leopoldstraße"
}, {
"city" : "Berlin",
"houseNumber" : "2a",
"postalCode" : "12345",
"street" : "Kurfürstendamm"
}
]
}
This JSON is stored in a column named json, of data type json, in a table named customer.
I want to query it like this:
SELECT *
FROM customer cust,
json_array_elements(cust.json#>'{addresses}') as adr
WHERE adr->>'city' like '%erlin'
and adr->>'street' like '%urf%';
The query works fine, but I can't create an index that PostgreSQL 9.3.4 can use.
Any ideas?
