Flattens (explodes) compound values into multiple rows - snowflake-cloud-data-platform

Can FLATTEN be used on both semi-structured and structured data, or only on semi-structured data?

Flatten is for a VARIANT, OBJECT, or ARRAY. Unpivot will work on structured data.
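
As a rough illustration of the difference, here is a minimal plain-Python sketch (not Snowflake code). The sample rows and the helper names `flatten_rows` and `unpivot_rows` are hypothetical. They only model the idea that FLATTEN expands one compound value into several rows, while UNPIVOT turns ordinary columns into rows.

```python
def flatten_rows(row, array_key):
    """Model of FLATTEN: one output row per element of an array-valued field."""
    rest = {k: v for k, v in row.items() if k != array_key}
    return [
        {**rest, "index": i, "value": elem}
        for i, elem in enumerate(row[array_key])
    ]


def unpivot_rows(row, id_key, value_columns):
    """Model of UNPIVOT: one output row per listed column of a flat row."""
    return [
        {id_key: row[id_key], "name": col, "value": row[col]}
        for col in value_columns
    ]


# Hypothetical sample data: one semi-structured row, one structured row.
semi_structured = {"id": 1, "tags": ["a", "b", "c"]}
structured = {"id": 1, "jan": 10, "feb": 20}

flattened = flatten_rows(semi_structured, "tags")
unpivoted = unpivot_rows(structured, "id", ["jan", "feb"])
```

In both cases one input row becomes several output rows. The difference is whether the source of those rows is the elements of a nested value or a set of named columns.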

Related

pyspark filter an array of structs based on one value in the struct

I have a df with an array of structs:
When I call df.dtypes for this column I would get:
('forminfo', 'array<struct<id: string, code: string>>')
I want to create a new column called 'forminfo_approved' which takes my array and filters within that array to keep only the structs with code == "APPROVED". So if I did a df.dtypes on this new field, the type would be the same, (another array of structs) but I would only have the APPROVED structs from the array.
I've played around with UDFs and exprs for a long time now and I can't quite get this one to behave as described above. Thanks so much if you can help!
Christie
I think I got it:
df.withColumn("forminfo_approved", expr("filter(forminfo, s -> s.code == 'APPROVED')"))
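
For intuition, the higher-order `filter` in that expression behaves like a list comprehension applied to each row's array. The sketch below models this in plain Python on made-up sample data. The rows and the helper `keep_approved` are hypothetical, and no Spark session is involved.

```python
def keep_approved(forminfo):
    """Keep only the structs (modelled here as dicts) whose code is APPROVED."""
    return [s for s in forminfo if s["code"] == "APPROVED"]


# Hypothetical rows; each has a 'forminfo' array of {id, code} structs.
rows = [
    {"forminfo": [{"id": "1", "code": "APPROVED"}, {"id": "2", "code": "REJECTED"}]},
    {"forminfo": [{"id": "3", "code": "PENDING"}]},
]

# Equivalent of withColumn: add a new field and leave the original untouched.
result = [{**r, "forminfo_approved": keep_approved(r["forminfo"])} for r in rows]
```

The new field has the same shape as the original (a list of structs). A row with no approved entries gets an empty list rather than a missing value.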

Combining Ranges in Excel Functions without helper cells

I am having difficulty getting the correct data type evaluated as a single array passed to the first argument of Small() in this function. My overall goal is (completely without the use of helper cells) to combine two sets of ranges passed to Small() as arrays into a 2-dimensional array output. The formulas work correctly when placed separately in ranges, but when combined in a Let() I get #VALUE! errors caused by a type inconsistency.
Here is the LET() function ...
=LET(
A1v, SEQUENCE(1,10,1,0),
A2v, SEQUENCE(1,10,2,0),
SMALL((A1v,A2v),SEQUENCE(2,COLUMNS(A1v)))
)
When the Let() formula is broken into pieces and placed into separate ranges as described below, it produces the desired output in A3:J4 (A3#).
Range A1 formula:
=SEQUENCE(1,10,1,0)
Range A2 formula:
=SEQUENCE(1,10,2,0)
Range A3 formula:
=SMALL((A1#,A2#),SEQUENCE(2,COLUMNS(A1#)))
I am aware that there are other function constructs that can be used to combine ranges. I am not looking for alternatives to using Small(). I am looking for answers that will help further my understanding of constructing arrays from and to be used as inputs to the new array functions in Excel. Thanks to everyone in advance!
The reason your Let function fails is the (A1v,A2v) parameter to Small. That construct is the Union Operator for Ranges. A1v and A2v are arrays, not ranges.
This can be seen in the Formula Evaluator dialog
In contrast (A1#,A2#) works because A1# and A2# are ranges.
FWIW, Small itself can accept either ranges or arrays
To solve this you'll need a general-purpose way of getting the Union of two data sets, whether they are Ranges or Arrays. This can be achieved using a Lambda function.
Add a Workbook-scoped Name to the Name Manager; let's call it Union.
In the Refers To section insert:
=LAMBDA(tabl1, tabl2,
LET(rowindex, SEQUENCE(ROWS(tabl1)+ROWS(tabl2)),
colindex, SEQUENCE(1,COLUMNS(tabl1)),
IF(rowindex<=ROWS(tabl1),
INDEX(tabl1,rowindex,colindex),
INDEX(tabl2,rowindex-ROWS(tabl1),colindex)
)
)
)
Union is now available to use as a standalone function, or embedded in another function.
Your formula now becomes:
=LET(
A1v, SEQUENCE(1,10,1,0),
A2v, SEQUENCE(1,10,2,0),
SMALL(Union(A1v,A2v),SEQUENCE(2,COLUMNS(A1v)))
)
Or, without Lambda:
=LET(
A1v, SEQUENCE(1,10,1,0),
A2v, SEQUENCE(1,10,2,0),
rowindex, SEQUENCE(ROWS(A1v)+ROWS(A2v)),
colindex, SEQUENCE(1,COLUMNS(A1v)),
Av, IF(rowindex<=ROWS(A1v),
INDEX(A1v,rowindex,colindex),
INDEX(A2v,rowindex-ROWS(A1v),colindex)),
SMALL(Av,SEQUENCE(2,COLUMNS(A1v)))
)
Untested, so you might have to tweak it a bit
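
To make the logic of the Union approach concrete, here is a minimal plain-Python model of it (a sketch, not Excel code). The helpers `sequence`, `union`, and `small` are hypothetical stand-ins for SEQUENCE, the Union lambda, and SMALL. Tables are modelled as lists of rows, and the two tables are assumed to have the same number of columns.

```python
def sequence(rows, cols, start=1, step=1):
    """Model of SEQUENCE: fill a rows x cols grid row by row."""
    return [[start + step * (r * cols + c) for c in range(cols)] for r in range(rows)]


def union(tabl1, tabl2):
    """Model of the Union lambda: stack tabl2 underneath tabl1."""
    if len(tabl1[0]) != len(tabl2[0]):
        raise ValueError("both tables must have the same number of columns")
    return [list(row) for row in tabl1] + [list(row) for row in tabl2]


def small(table, k):
    """Model of SMALL: the k-th smallest value (1-based) over all cells."""
    values = sorted(v for row in table for v in row)
    return values[k - 1]


a1v = sequence(1, 10, 1, 0)      # one row of ten 1s
a2v = sequence(1, 10, 2, 0)      # one row of ten 2s
stacked = union(a1v, a2v)        # a real 2 x 10 array, not a range union
ks = sequence(2, len(a1v[0]))    # k = 1..20 laid out as 2 x 10
output = [[small(stacked, k) for k in row] for row in ks]
```

With these inputs `output` is a row of ten 1s followed by a row of ten 2s. The key point is that the two arrays are combined into one array before being handed to the ranking function.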

Google Sheets: Conduct VLOOKUP on array stored in a different cell

I am trying to store a series of key/value pairs in a single cell in Google Sheets, then interrogate the array using formulae, such as VLOOKUP().
As an example, there is an array storing a series of key/value pairs:
{"keyA", "valueA"; "keyB", "valueB"; "keyC", "valueC"}
You can use VLOOKUP on this array if it is embedded in the formula:
=VLOOKUP("keyB",{"keyA","valueA";"keyB","valueB";"keyC","valueC"},2,FALSE)
which will return "valueB".
But if you store the array in a cell (eg B2) and refer to that cell in the formula, eg:
=VLOOKUP("keyB",B2,2,FALSE)
...you get a #REF! response with the detail: "Error. VLOOKUP evaluates to an out-of-bounds range."
Can anyone suggest a solution to this please?
Many thanks
There is no EVALUATE function in Google Sheets.
The array expression is itself a kind of formula. To output a range of values, you have to enter it with a leading equals sign:
={"keyA", "valueA"; "keyB", "valueB"; "keyC", "valueC"}
You could then do VLOOKUP on the output.
Alternatively, you can parse the stored text yourself with SPLIT:
=ArrayFormula(SPLIT(TRANSPOSE(SPLIT(REGEXREPLACE(B2,"^{""|""}$",),"""; """,FALSE)),""", """,FALSE))
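
The same parsing idea (strip the braces, split rows on `;`, split cells on `,`) can be sketched in plain Python. This is only an illustration of the steps, not Sheets code. The helper name `parse_pairs` is hypothetical, and it assumes keys and values contain no commas or semicolons.

```python
def parse_pairs(text):
    """Parse '{"k1", "v1"; "k2", "v2"}' into a dict of key/value pairs."""
    inner = text.strip()
    # Drop the surrounding braces of the array literal, if present.
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    pairs = {}
    for row in inner.split(";"):          # ';' separates rows
        key, value = (cell.strip().strip('"') for cell in row.split(","))
        pairs[key] = value                # ',' separates the two cells in a row
    return pairs


# The text stored in the cell, as in the question.
cell_b2 = '{"keyA", "valueA"; "keyB", "valueB"; "keyC", "valueC"}'
lookup = parse_pairs(cell_b2)
value_b = lookup["keyB"]
```

Once the text has been turned back into structured pairs, the lookup is an ordinary exact-match search, which is what VLOOKUP with FALSE does on the parsed output.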

Can't query semi-structured data using lateral flatten etc

I've got some data in a table, and one of the columns is a Variant which contains a tree of JSON data. I can successfully flatten arrays, and arrays within arrays, to access the data in them, but I'm struggling with flattening key-value pairs to access the value for a given key.
I've seen the docs at https://docs.snowflake.net/manuals/user-guide/json-basics-tutorial.html but mapping them onto my use case produces NULL values in the results.
My variant is shown in part below. In particular, it's values like MatchStatus and the key/values under Variables that I'm interested in extracting.
Thanks for any helpful suggestions.
The described JSON has a simple path-like structure with objects at various levels (and no arrays).
Per Snowflake's semi-structured data documentation, use the dot notation to extract a value following a (flatly nested) path:
Insert a colon : between the VARIANT column name and any first-level element: <column>:<level1_element>.
Use dot notation to traverse a path in a JSON object: <column>:<level1_element>.<level2_element>.<level3_element>.
An example would be (note the chained use of dots in the third and fourth lines):
SELECT
badminton_odds:Id as id,
badminton_odds:PricingRequest.MatchStatus as match_status,
badminton_odds:PricingRequest.Variables.Dispersion as var_dispersion
FROM odds_table
You do not require FLATTEN for simple, singular value extraction. Use FLATTEN when you have a need to explode some series data into multiple rows (such as in case of arrays).
For example, if the described JSON in the question is how a single array element looks in a long array of such objects, you may use FLATTEN to first break the whole array into rows, and then apply path style extraction to retrieve the value from each row.
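
The distinction between path extraction and flattening can be modelled in plain Python. This is a sketch, not Snowflake code. The key names come from the query above, but the sample values, the list `odds_array`, and the helper `get_path` are hypothetical, since the original JSON is not shown.

```python
import json


def get_path(value, path):
    """Follow a dotted path through nested dicts; None if any key is missing."""
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical document shaped like the one implied by the query.
doc = json.loads("""
{"Id": 42,
 "PricingRequest": {"MatchStatus": "InPlay",
                    "Variables": {"Dispersion": 0.5}}}
""")

# Single-value extraction: no flattening needed.
match_status = get_path(doc, "PricingRequest.MatchStatus")
dispersion = get_path(doc, "PricingRequest.Variables.Dispersion")
wrong_case = get_path(doc, "PricingRequest.matchstatus")

# If such objects arrive in an array, first explode it into one row per
# element (the FLATTEN step), then apply the same path to each row.
odds_array = [doc, {"Id": 43, "PricingRequest": {"MatchStatus": "Finished"}}]
statuses = [get_path(elem, "PricingRequest.MatchStatus") for elem in odds_array]
```

Note that a path with the wrong letter case yields `None` here. Element names in Snowflake paths are likewise case-sensitive, which is one possible cause of unexpected NULL results.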

VFP: 3d arrays?

The following doesn't work... Can you do 3d arrays in FoxPro?
DIMENSION sqlresults[10]
select list_code, count(donor) as ndine FROM cGift group by list_code INTO ARRAY sqlresults[1]
Edit:
Ah, a Google search for "vfp multi-dimensional arrays" turned up something ("vfp 3d arrays" didn't).
FoxPro only supports 2d arrays. Guess I'll have to fake it with some macro substitution (&).
The only problem with your code is that you included a dimension in the query. Try this instead:
select list_code, count(donor) as ndine
FROM cGift
group by list_code
INTO ARRAY sqlresults
That said, on the whole, you're better off putting query results into a cursor than an array.
Sqlresults[1] = sys(2015)
Select ... into cursor (sqlresults[1])
This way your array holds the names of the cursors, and you can access their values like this:
Select (sqlresults[1])
?fieldname
Or use EVAL() or &.

Resources