Generate nested JSON (reverse lateral flatten) - snowflake-cloud-data-platform

How can I use Snowflake to generate nested JSON from table data?
We can use dot notation to specify where in a JSON document to read data from; is it possible to do the reverse and specify a hierarchy under which to save data?
My end goal is to output a dataset without duplicating parent values, nesting the children underneath them instead.

The OBJECT_CONSTRUCT function would be of help here:
https://docs.snowflake.com/en/sql-reference/functions/object_construct.html
A couple of related how-to articles:
https://community.snowflake.com/s/article/Generating-a-JSON-Dataset-using-Relational-Data-in-Snowflake
https://community.snowflake.com/s/article/How-to-Merge-Combine-Two-JSON-Fields
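For example, to nest child rows under each parent without repeating the parent values, OBJECT_CONSTRUCT can be combined with ARRAY_AGG and GROUP BY. A minimal sketch, assuming hypothetical ORDERS and ORDER_ITEMS tables:

-- ORDERS(ORDER_ID, CUSTOMER) and ORDER_ITEMS(ORDER_ID, SKU, QTY) are placeholder tables.
-- Each parent row becomes one JSON object with its children nested in an "items" array.
select object_construct(
         'order_id', o.order_id,
         'customer', o.customer,
         'items',    array_agg(object_construct('sku', i.sku, 'qty', i.qty))
       ) as order_json
from orders o
join order_items i on i.order_id = o.order_id
group by o.order_id, o.customer;

TO_JSON can then render each resulting object as JSON text, or the query result can be exported with COPY INTO a stage.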

Related

Performance-effective JSON data masking in Snowflake

I am trying to perform data masking on JSON data.
I am using a JavaScript UDF to update a list of nested JPATH attributes, similar to what is done here:
https://www.snowflake.com/blog/masking-semi-structured-data-with-snowflake/
I also tried nested OBJECT_INSERT statements to mask a specific attribute, but with multiple attributes to mask I have to build a chain of subqueries, each performing OBJECT_INSERT on the previous subquery's result, which gets complex.
Ex:
FROM (
  SELECT OBJECT_INSERT(
           VAR_COL,
           'LVL1',
           OBJECT_INSERT(
             VAR_COL:LVL1,
             'KEY1',
             OBJECT_INSERT(VAR_COL:LVL1.KEY1, 'KEY2', 'VALUE', TRUE),
             TRUE),
           TRUE) AS VAR_COL
  FROM TABLE
)
Another problem with OBJECT_INSERT, which prevents me from using it, is that if the JPATH doesn't exist for a specific JSON row, it will add that JPATH, which I don't want.
I am working with millions of records, and on an XS warehouse it takes 15 minutes to run a simple query using the JavaScript UDF.
Alternately, I also tried a Snowpark UDF, but it shows only a very small improvement.
Any ideas on improving performance further?
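Regarding OBJECT_INSERT creating a missing JPATH: one possible guard is to wrap the update in GET_PATH so it only applies when the path already exists. A rough sketch, using placeholder table and path names:

-- Only mask LVL1.KEY1.KEY2 when that path already exists in the row;
-- otherwise the original VARIANT is returned unchanged.
SELECT IFF(GET_PATH(VAR_COL, 'LVL1.KEY1.KEY2') IS NOT NULL,
           OBJECT_INSERT(VAR_COL, 'LVL1',
             OBJECT_INSERT(VAR_COL:LVL1, 'KEY1',
               OBJECT_INSERT(VAR_COL:LVL1.KEY1, 'KEY2', '***MASKED***', TRUE),
               TRUE),
             TRUE),
           VAR_COL) AS VAR_COL
FROM MY_TABLE;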

How do you copy data from a JSON file in an external stage in Snowflake when the file is too large?

I have a JSON file (well, technically several) in an external GCS stage in Snowflake. I'm trying to extract data from it into tables, but the file is too large. The file doesn't contain an array of JSON objects; it is actually one giant JSON object. As such, setting STRIP_OUTER_ARRAY to true isn't an option in this case. Breaking the files into smaller files isn't really an option either, because they are maintained by an external program and I don't have any control over that.
The general structure of the JSON is:
{
  meta1: value1,
  meta2: value2,
  ...
  data: {
    component1: value1,
    component2: value2,
    ...
  }
}
The issue is due to the value of data. There can be a varying number of components and their names aren't reliably predictable. I could avoid the size issue if I could separate out the components, but I'm unable to do a lateral flatten inside of copy into. I can't load into a temporary variant column either because the JSON is too large. I tried to do an insert instead of a copy, but that complains about the size as well.
Are there any other options? I wondered if there might be a way to utilize a custom function or procedure, but I don't have enough experience with those to know if they would help in this case.
So it's external programs to GCP to Snowflake. Looking at everything you have tried within Snowflake, it seems there may not be any other option available on the Snowflake side. However, how about handling it on the GCP end? Once a file lands in GCP, say in Folder 1, try to split the JSON into smaller pieces and move them into Folder 2, then create your external stage on top of Folder 2.
I don't have much GCP experience either, but I thought I'd share this idea in case it works out for you.
I could avoid the size issue if I could separate out the components,
but I'm unable to do a lateral flatten inside of copy into.
I think you can use LATERAL FLATTEN in a CREATE TABLE ... AS SELECT ... statement (see this). You leave your huge JSON file in an external stage like S3, and you access it directly from there with a statement like this, to split it up by the "data" elements:
-- Assumes the stage's file format is JSON, so $1 is a VARIANT and $1:data resolves.
create table mytable (id string, value variant) as
select f.key::string, f.value::variant
from @mystage/myfile.json.gz,
     lateral flatten(input => $1:data) f;

How to parse a single-line JSON file in a Snowflake external table

This is the first time I have tried to load a single-line JSON file into Snowflake via an external table.
The files are around 60 MB and stored on S3. They contain nested records and arrays, and no newline characters.
Right now I cannot see the data from the external table; however, if the file is small enough (around 1 MB), the external table works fine.
The closest solution I can find is this, but it doesn't provide a sufficient answer.
The files can be much bigger, and I have no control over them.
Is there a way to fix this issue?
Thanks!
Edit:
Looks like the only tangible solution is to make the file smaller, as the JSON file is not NDJSON.
What JSON format does STRIP_OUTER_ARRAY support?
If your nested records are wrapped in an outer array, you can potentially use STRIP_OUTER_ARRAY in the stage's file format definition to remove it, so each JSON element within the outer array gets loaded as a record. You can use METADATA$FILENAME in the table definition to derive which file each element came from.
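A rough sketch of what that could look like, using placeholder names for the file format, external table, and stage:

-- File format that removes the outer array so each element becomes one row.
create or replace file format my_json_fmt
  type = json
  strip_outer_array = true;

-- External table exposing the source file name alongside each element.
create or replace external table my_ext_table (
  src_file string  as (metadata$filename),
  record   variant as (value)
)
with location = @my_stage/path/
file_format = (format_name = 'my_json_fmt');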

Mule Salesforce connector in query operation. List of Maps as input

I have a huge list of Ids and Names for Contacts defined in this simple DW script. Let's show just two for simplicity:
[
  {id:'169057', Name:'John'},
  {id:'169099', Name:'Mark'}
]
I need the Salesforce connector to execute a single query (not 1000 unitary queries) over these two fields.
I expect the connector to be able to use a List of Maps as input, as it does for update/insert operations. So I tried with this connector config:
I am getting a ConsumerIterator as the response. If I add an Object to String transformer, I get an empty String as the response.
I guess there must be a way to execute a big query in just one API call, but I am not finding it. Keep in mind that I need to use two or more WHERE clauses.
Any ideas? Thanks!
I think you need to create two separate lists for the id and Name values from your input and then use those lists in the query with IN:
1. Construct the id list to be used in the WHERE clause. This can be executed inside a For Each with a batch size of 1000.
2. Retrieve name and id from Contact for each batch:
select id, name from contact where id in (id list)
3. Get the desired ids by matching the names retrieved in step 2 within Mule.

What is the best method to exclude data and query parts of data in a Swift Firebase query?

I am querying users from Firebase and would like to know the best method to query all users excluding the current ref.authData.uid. In Parse it reads like this:
query?.whereKey("username", notEqualTo: PFUser.currentUser()!.username!)
query?.whereKey("username", containsString: self.searchTXTFLD.text)
Also, is there any Firebase query type similar to Parse's containsString?
There is no way to retrieve only items from Firebase that don't match a certain condition. You'll have to retrieve all items and exclude the offending ones client-side. Also see is it possible query data that are not equal to the specified condition?
There is also no contains operator for Firebase queries. This has been covered many times before, such as in this question: Firebase query - Find item with child that contains string
