Environment: Hevo pulling data from Xero into Snowflake, transforming with dbt, and visualising with Tableau.
I'm pulling Reports from Xero into Snowflake using Hevo and that works great. I end up with a variant field that contains an array (I'm working outside my comfort zone/experience here), but the array contains JSON as well. I'm trying to "unpack" it into normal database fields (process them with dbt and then access them in Tableau).
When I query using
select "ROWS"[0] from profit_and_loss_report;
I get
{
"Cells": [
{
"Value": ""
},
{
"Value": "31 Oct 21"
}
],
"RowType": "Header"
}
but when I use
select seq, key, path, index from profit_and_loss_report pnl
, lateral flatten(input => pnl."ROWS"[0]);
I can't get the date or "Header" to appear in the output at all.
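For reference, when FLATTEN is given an object like this, the data ends up in the value column (which isn't in the select list above), with key holding "Cells" and "RowType". A minimal sketch of one way the nested structure could be unpacked, assuming a second lateral FLATTEN over the Cells array (the column aliases are made up and this is untested against the real data):

select
    r.value:"RowType"::string as row_type,   -- e.g. "Header"
    c.value:"Value"::string   as cell_value  -- e.g. "31 Oct 21"
from profit_and_loss_report pnl,
     lateral flatten(input => pnl."ROWS") r,
     lateral flatten(input => r.value:"Cells") c;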
I'm pretty new to Azure Data Factory (ADF) and have stumbled into something I would have solved with a couple of lines of code.
Background
Main flow:
Lookup Activity fetching an array of IDs to process
ForEach Activity looping over the input array and using a Copy Activity to pull data from a REST API and store it in a database
Step #1 results in an array containing IDs
{
"count": 10000,
"value": [
{
"id": "799128160"
},
{
"id": "817379102"
},
{
"id": "859061172"
},
... many more...
Step #2: When the lookup returns a lot of IDs, the individual REST calls take a lot of time. The REST API supports batching IDs using a comma-separated input.
The question
How can I convert the input array into a new array of comma-separated fields? This would reduce the number of Activities and the time to run.
Expecting something like this:
{
"count": 1000,
"value": [
{
"ids": "799128160,817379102,859061172,...."
},
{
"ids": "n,n,n,n,n,n,n,n,n,n,n,n,...."
}
... many more...
EDIT 1 - 19th Dec 22
Using an "Until" Activity and keeping track of positions, I managed to solve this in plain ADF. It would have been nice if this could have been done with some simple array manipulation in a code snippet.
The ideal answer might be that we have to do this manipulation with a Data Flow.
My sample input:
First, I created a Data Flow and added a Surrogate Key transformation after the source; say the new key field is 'SrcKey'.
Data preview of Surrogate key 1
Add an Aggregate transformation where you group by mod(SrcKey, 3). This will group rows with the same remainder into the same bucket.
In the same Aggregate, add a column that collects the ids and turns them into a comma-separated string with the expression trim(toString(collect(id)),'[]').
Data preview of Aggregate 1
Store the output in a single file in blob storage.
OUTPUT
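For readers more comfortable with SQL, the bucket-then-collect idea from the Aggregate step above can be illustrated roughly like this in Snowflake-flavoured SQL (an illustration of the logic only, not something ADF executes; the table ids and column id are hypothetical):

select
    mod(seq, 3) as bucket,                                   -- rows with the same remainder land in the same bucket
    listagg(id, ',') within group (order by seq) as ids      -- comma-separated batch, like the collect/trim expression
from (
    select id, row_number() over (order by id) as seq
    from ids
)
group by mod(seq, 3);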
I'm new to Snowflake and trying to figure out a way to flatten multiple rows. I appreciate any help here!
We are receiving files via Snowpipe, and multiple files land in one staged table. Each file contains multiple records, but all the records are in one array variable. When there is only one file, the flatten command works fine: it flattens the array and separates the records. However, when there are multiple files, the flatten command fails with the error "single-row subquery returns more than one row". How do I handle this?
{ "test": [ { "a": true, "b": "20" }, { "a": true, "b": "30"}, { "a": false, "b": "40" } ], "Date": "Sun Jan 02 2022 16:00:30 GMT+0000 (Coordinated Universal Time)" }
This is made-up data for clarification; each file contains data like this. The SQL I'm using is:
select * from table(flatten(select $1:test from @stage))
If there is only one file with the above structure, it works fine; however, for multiple files it fails.
Change the order of operations:
SELECT
    s.*,
    f.*
FROM @stage s,
    TABLE(FLATTEN(input => s.$1:test)) f
This way you get a row s for every stage file, and then get access to the flattened f results.
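If the goal is normal columns, the flattened value can then be drilled into. A minimal sketch based on the sample JSON above (the field names a and b come from that sample; the casts are assumptions about the data types):

SELECT
    f.value:a::boolean AS a,
    f.value:b::string  AS b
FROM @stage s,
    TABLE(FLATTEN(input => s.$1:test)) f;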
I have a problem writing a query to extract a table out of the arrays in a JSON file.
The problem is how to get the information out of the "dataPackets" array and its nested arrays, and then turn it all into a normal SQL table.
One hard issue is "CrashNotification" and "CrashMaxModuleAccelerations"; I don't know how to define and use them.
The file looks like this:
{ "imei": { "imei": "351631044527130F", "imeiNotEncoded":
"351631044527130"
},
"dataPackets": [ [ "CrashNotification", { "version": 1, "id": 28 } ], [
"CrashMaxModuleAccelerations", { "version": 1, "module": [ -1243, -626,
14048 ] } ] ]}
I tried to use the GetArrayElement method and other approaches, but I am never able to access second-level arrays, such as the elements of "CrashNotification" within "dataPackets", or the elements of "module" within the "CrashMaxModuleAccelerations" entry of "dataPackets".
I also looked here (Select the first element in a JSON array in Microsoft stream analytics query) and it doesn't work.
I would appreciate any help :)
Based on your schema, here's an example of a query that will extract a table with the following columns: imei, crashNotification_version, crashNotification_id
WITH Datapackets AS
(
SELECT imei.imei as imei,
GetArrayElement(dataPackets, 0) as CrashNotification
FROM input
)
SELECT
imei,
GetRecordPropertyValue (GetArrayElement(CrashNotification, 1), 'version') as crashNotification_version,
GetRecordPropertyValue (GetArrayElement(CrashNotification, 1), 'id') as crashNotification_id
FROM Datapackets
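Following the same pattern, the second packet and its nested "module" array could presumably be reached like this (a sketch only, not tested; the column aliases are made up):

WITH Datapackets AS
(
SELECT imei.imei as imei,
GetArrayElement(dataPackets, 1) as CrashMaxModuleAccelerations
FROM input
)
SELECT
imei,
GetRecordPropertyValue (GetArrayElement(CrashMaxModuleAccelerations, 1), 'version') as crashMaxModuleAccelerations_version,
GetArrayElement (GetRecordPropertyValue(GetArrayElement(CrashMaxModuleAccelerations, 1), 'module'), 0) as module_x
FROM Datapackets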
Let me know if you have any further questions.
Thanks,
JS (Azure Stream Analytics)
We built an HTTP API called Stride for converting streaming JSON data into realtime, incrementally updated tables using only SQL.
All you'd need to do is write raw JSON data to the Stride API's /collect endpoint, define continuous SQL queries via the /process endpoint, and then push or pull data via the /analyze endpoint.
This approach eliminates the need to deal with any underlying data infrastructure and gives you a SQL-based approach to this type of streaming analytics problem.
I'm developing a system to store our translations using Couchbase.
I have about 15,000 entries in my bucket that look like this:
{
"classifications": [
{
"documentPath": "Test Vendor/Test Project/Ordered",
"position": 1
}
],
"id": "message-Test Vendor/Test Project:first",
"key": "first",
"projectId": "project-Test Vendor/Test Project",
"translations": {
"en-US": [
{
"default": {
"owner": "414d6352-c26b-493e-835e-3f0cf37f1f3c",
"text": "first"
}
}
]
},
"type": "message",
"vendorId": "vendor-Test Vendor"
},
And I want, as an example, to find all messages that are classified with a "documentPath" of "Test Vendor/Test Project/Ordered".
I use this query:
SELECT message.*
FROM couchlate message UNNEST message.classifications classification
WHERE classification.documentPath = "Test Vendor/Test Project/Ordered"
AND message.type="message"
ORDER BY classification.position
But I'm very surprised that the query takes 2 seconds to execute!
Looking at the query execution plan, it seems that Couchbase is looping over all the messages and then filtering on "documentPath".
I'd like it to first filter on "documentPath" (because there are in reality only 2 documentPaths matching my query) and then find the messages.
I've tried to create an index on "classifications" but it did not change anything.
Is there something wrong with my index setup, or should I structure my data differently to get fast results?
I'm using Couchbase 4.5 beta, if that matters.
Your query filters on the documentPath field, so an index on classifications doesn't actually help. You need to create an array index on the documentPath field itself using the new array index syntax on Couchbase 4.5:
CREATE INDEX ix_documentPath ON myBucket ( DISTINCT ARRAY c.documentPath FOR c IN classifications END ) ;
Then you can query on documentPath with a query like this:
SELECT * FROM myBucket WHERE ANY c IN classifications SATISFIES c.documentPath = "your path here" END ;
Add EXPLAIN to the start of the query to see the execution plan and confirm that it is indeed using the index ix_documentPath.
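For example, the same query prefixed with EXPLAIN:

EXPLAIN SELECT * FROM myBucket WHERE ANY c IN classifications SATISFIES c.documentPath = "your path here" END ;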
More details and examples here: http://developer.couchbase.com/documentation/server/4.5-dp/indexing-arrays.html
OK, maybe this is too broad for Stack Overflow, but is there a good, generalized way to assemble data in relational tables into hierarchical JSON?
For example, let's say we have a "customers" table and an "orders" table. I want the output to look like this:
{
"customers": [
{
"customerId": 123,
"name": "Bob",
"orders": [
{
"orderId": 456,
"product": "chair",
"price": 100
},
{
"orderId": 789,
"product": "desk",
"price": 200
}
]
},
{
"customerId": 999,
"name": "Fred",
"orders": []
}
]
}
I'd rather not have to write a lot of procedural code to loop through the main table and fetch orders a few at a time and attach them. It'll be painfully slow.
The database I'm using is MS SQL Server, but I'll need to do the same thing with MySQL soon. I'm using Java and JDBC for access. If either of these databases had some magic way of assembling these records server-side it would be ideal.
How do people migrate from relational databases to JSON databases like MongoDB?
Here is a useful set of functions for converting relational data to JSON and XML and from JSON back to tables: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
SQL Server 2016 is finally catching up and adding support for JSON.
The JSON support still does not match other products such as PostgreSQL, e.g. no JSON-specific data type is included. However, several useful T-SQL language elements were added that make working with JSON a breeze.
For example, in the following Transact-SQL code a text variable containing a JSON string is defined:
DECLARE @json NVARCHAR(4000)
SET @json =
N'{
"info":{
"type":1,
"address":{
"town":"Bristol",
"county":"Avon",
"country":"England"
},
"tags":["Sport", "Water polo"]
},
"type":"Basic"
}'
and then, you can extract values and objects from JSON text using the JSON_VALUE and JSON_QUERY functions:
SELECT
JSON_VALUE(@json, '$.type') as type,
JSON_VALUE(@json, '$.info.address.town') as town,
JSON_QUERY(@json, '$.info.tags') as tags
Furthermore, the OPENJSON function allows you to return elements from a referenced JSON array:
SELECT value
FROM OPENJSON(@json, '$.info.tags')
Last but not least, there is a FOR JSON clause that can format a SQL result set as JSON text:
SELECT object_id, name
FROM sys.tables
FOR JSON PATH
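Applied to the customers/orders shape asked about above, a nested FOR JSON query could look roughly like this (the table and column names are assumed from the example; note that customers with no orders would have the orders property omitted rather than an empty array):

SELECT
    c.customerId,
    c.name,
    (SELECT o.orderId, o.product, o.price
     FROM orders o
     WHERE o.customerId = c.customerId
     FOR JSON PATH) AS orders
FROM customers c
FOR JSON PATH, ROOT('customers')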
Some references:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
https://learn.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/05/json-in-sql-server-2016-part-1-of-4/
https://www.red-gate.com/simple-talk/sql/learn-sql-server/json-support-in-sql-server-2016/
I think one 'generalized' solution would be as follows:
Create a 'select' query which joins all the required tables to fetch the results into a two-dimensional structure (like a CSV file or a temporary table).
If each row of this join is unique, and the MongoDB schema and the columns have a one-to-one mapping, then it's all about importing this CSV/table using the mongoimport command with the required parameters.
But a case like the one above, where a given customer ID can have an array of 'orders', needs some computation before mongoimport.
You will have to write a program which can 'vertically merge' the orders for a given customer ID. For a small set of data, a simple Java program will work, but for larger sets, parallel processing using Spark can do this job.
SQL Server 2016 now supports reading JSON in much the same way as it has supported XML for many years: using OPENJSON to query it directly, with the JSON itself stored in ordinary NVARCHAR columns (there is no dedicated JSON data type).
There is no generalized way because SQL Server doesn't support JSON as a native data type. You'll have to create your own "generalized way" for this.
Check out this article; there are good examples there of how to convert SQL Server data to JSON format.
https://www.simple-talk.com/blogs/2013/03/26/sql-server-json-to-table-and-table-to-json/