Can't query semi-structured data using lateral flatten etc - snowflake-cloud-data-platform

I've got some data in a table, and one of the columns is a Variant which contains a tree of JSON data. I can successfully flatten arrays, and arrays within arrays, to access the data therein, but I'm struggling with flattening key-value pairs to access the value for a given key.
I've seen the docs at https://docs.snowflake.net/manuals/user-guide/json-basics-tutorial.html, but mapping this onto my use case results in NULL values in the results.
My variant is shown in part below. In particular, it's values like MatchStatus and the key/values under Variables that I'm interested in extracting.
Thanks for any helpful suggestions.

The described JSON has a simple path-like structure with objects at various levels (and no arrays).
Per Snowflake's semi-structured data documentation, use the dot notation to extract a value following a (flatly nested) path:
Insert a colon : between the VARIANT column name and any first-level element: <column>:<level1_element>.
Use dot notation to traverse a path in a JSON object:
<column>:<level1_element>.<level2_element>.<level3_element>.
An example would be (note the chained use of dots in the third and fourth lines):
SELECT
badminton_odds:Id as id,
badminton_odds:PricingRequest.MatchStatus as match_status,
badminton_odds:PricingRequest.Variables.Dispersion as var_dispersion
FROM odds_table
You do not require FLATTEN for simple, singular value extraction. Use FLATTEN when you have a need to explode some series data into multiple rows (such as in case of arrays).
For example, if the described JSON in the question is how a single array element looks in a long array of such objects, you may use FLATTEN to first break the whole array into rows, and then apply path style extraction to retrieve the value from each row.
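The difference between path-style extraction and a flatten-style expansion can be sketched outside Snowflake. The Python below is only an analogy, not Snowflake code: the dictionary is a hypothetical stand-in for one row's VARIANT value, all values and the Skew key are invented for illustration, and only the key names Id, PricingRequest, MatchStatus, Variables and Dispersion come from the query above. Flattening an object in Snowflake yields one row per key with KEY and VALUE columns, which the list comprehension imitates.

```python
# Hypothetical stand-in for one row's VARIANT value. The values and the
# "Skew" key are made up; the other key names come from the query above.
badminton_odds = {
    "Id": 101,
    "PricingRequest": {
        "MatchStatus": "InPlay",
        "Variables": {"Dispersion": 0.25, "Skew": 1.5},
    },
}

# Path-style extraction: one value per row, no flattening needed.
match_status = badminton_odds["PricingRequest"]["MatchStatus"]
var_dispersion = badminton_odds["PricingRequest"]["Variables"]["Dispersion"]

# Flatten-style expansion: one output row per key/value pair under Variables.
variable_rows = [
    {"id": badminton_odds["Id"], "key": key, "value": value}
    for key, value in badminton_odds["PricingRequest"]["Variables"].items()
]
```

If the keys under Variables are known in advance, the path form is enough; the expansion is only useful when the set of keys varies from row to row.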

Related

Is there a way to return a cell value if two arrays associated with that value are equal?

I have a data set that details specific discounts based on a store number. The data set looks similar to this:
My two questions are:
Is there a way to pull out the unique arrays within the data set using formulas? (The arrays are a set size of columns and rows within the data set)
Is there a way to get a unique list of stores that fit the criteria of each unique array?
In this example I am trying to get something like this to return:
I've been able to build a workaround by assigning specific areas of the worksheet as Array1, Array2, etc. and then using a formula to check if the individual arrays for each store are equal but I'm looking for a quicker way to easily pull out the unique arrays and associated stores.

Fetch different columns from different google sheets using Query [duplicate]

When I append arrays in google spreadsheets, all of the resulting elements are not rendered in cells. For example, if I enter the formula:
={{1,2,3}, {4,5,6}}
the values rendered in spreadsheet cells are 1,4,5,6. Any ideas about why this is happening, or alternatives? My broader problem is to accumulate rows from separate sheets into another sheet - I can do that via
={ImportRange(...), ImportRange(...)}
but the same problem is apparent (missing the second element and beyond from the first array).
Edit (2 Oct 2014)
I just happened upon this when someone upvoted. The information below is obsolete in the newest version of Sheets - you can now (have been able to for a few months) concatenate arrays inside embedded arrays. All the examples that I provided below will work, including the one I said "shouldn't work".
Embedded arrays in Google Sheets
An array of values may be populated by a single function using an embedded array. Each element in the embedded array (and this may be a point of conjecture; it is more or less just my opinion) represents the value that will be populated in contiguous cells in the sheet. Semi-colons are row delimiters; commas (or backslashes in locales that use a comma for a decimal separator) are column delimiters. So this will successfully create a two-row, three-column array (all of the following examples assume a locale supporting comma column delimiters):
={1,2,3;4,5,6}
Embedded arrays within embedded arrays
As each element in an embedded array represents a cell in the spreadsheet, I think it is reasonable to assume that one should be able to populate a cell with another embedded array, as long as it does not overwrite other elements in the outer embedded array. So IMO something like this should (see point 3) be successful:
={{1;2;3},{4;5;6}}
However something like this shouldn't work (again IMO), as the second and third elements of the first embedded array would be "overwriting" the second embedded array:
={{1,2,3},{4,5,6}}
There is a bug associated with the first embedded array inside an embedded array
As +Jason pointed out, something like ={{1;2;3},{4;5;6},{7;8;9}} doesn't work in that the first embedded array only populates one element (but every other column is populated correctly). It is also interesting that that one element is auto-converted to a text string. This is (unfortunately) a long standing bug in Google Sheets. The same thing occurs when you attempt to invoke the SPLIT() function on an array (every element in the array is split successfully except for the first one).
I don't think embedded arrays within embedded arrays will help with your broader problem anyway
Embedded arrays can't really be used to append one array onto the end of another anyway (due to the "overwriting" effect), and there is no native function that can do it directly. The VMERGE function, which you can obtain via the Script gallery (credit to +ahab), will work out of the box:
=VMERGE(ImportRange(...);ImportRange(...);...)
or you can use native functions to do some string manipulation to achieve this. For example, for one-dimensional arrays:
=ArrayFormula(TRANSPOSE(SPLIT(CONCATENATE(ImportRange("key1";"A1:A10")&CHAR(9);ImportRange("key2";"A1:A10")&CHAR(9));CHAR(9))))
but as well as being clunky and not very readable, this type of formula can be very expensive performance-wise for large data sets (I would tend to recommend the VMERGE custom function option in preference).
It is possible to make a union in Google Spreadsheet very easily. For example:
={'Sheet1'!A2:A;'Sheet2'!A2:A;'Sheet3'!A2:A}
See more info in Google Docs Help: Using arrays in Google Sheets
Assuming you have 3 arrays A2:B7, D4:E12, and F2:G230 with the same number of columns but different lengths (often the case if you have the same table of data split into different tabs for each period), I think the easiest way is something like this:
=TRANSPOSE({TRANSPOSE(A2:B7), TRANSPOSE(D4:E12), TRANSPOSE(F2:G230)})
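Why the transpose trick stacks the ranges vertically can be checked with a small Python model. This is only a sketch of the idea, not Sheets code: the helper names transpose and hconcat and the sample data are hypothetical. Transposing each block turns its rows into columns, so blocks with the same number of columns all end up with the same number of rows and can be joined side by side; transposing the result back gives the blocks stacked on top of each other.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


def hconcat(*blocks):
    """Join blocks side by side; every block must have the same number of rows."""
    return [sum((block[i] for block in blocks), []) for i in range(len(blocks[0]))]


# Two hypothetical two-column ranges of different lengths.
first = [["a", 1], ["b", 2]]
second = [["c", 3], ["d", 4], ["e", 5]]

# Mirrors =TRANSPOSE({TRANSPOSE(first), TRANSPOSE(second)}).
stacked = transpose(hconcat(transpose(first), transpose(second)))
```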

How to modify array under specific JSONB key in PostgreSQL?

We're storing various heterogeneous data in a JSONB column called ext, and under some keys we have arrays of values. I know how to replace the whole key (||). If I want to add one or two values I still need to extract the original values (that would be ext->'key2' in the example below), and in some cases this may be too many.
I realize this is a trivial problem in the relational world and that PG still needs to overwrite the whole row anyway, but at least I don't need to pull the unchanged part of the data from the DB to the application and push it back.
I can construct the final value of the array in the select, but I don't know how to merge this into the final value of ext so it is usable in UPDATE statement:
select ext, -- whole JSONB
ext->'key2', -- JSONB array
ARRAY(select jsonb_array_elements_text(ext->'key2')) || array['asdf'], -- array + concat
ext || '{"key2":["new", "value"]}' -- JSONB with whole "key2" key replaced (not what I want)
from (select '{"key1": "val1", "key2": ["val2-1", "val2-2"]}'::jsonb ext) t
So the question: How to write such a modification into the UPDATE statement?
Example uses jsonb_*_text function, some values are non-textual, e.g. numbers, that would need non _text function, but I know what type it is when I construct the query, no problem here.
We also need to remove the values from the arrays as well, in which case if the array is completely empty we would like to remove the key from the JSONB altogether.
Currently we achieve this with this expression in the UPDATE statement
coalesce(ext, '{}')::jsonb - <array of items to delete> || <jsonb with additions> (<parts> are symbolic here, we use a single JDBC parameter for each value). If the final value of the array is empty, the key for that value goes into the first array, otherwise the final value appears in the JSONB after the || operator.
To be clear:
I know the path to the JSONB value I want to change - it's actually always a single key on the top level.
I know whether that key stores single value (no problem for those) or array (that's where I don't have satisfying solution yet), because we know the definitions of each key, this is stored separately.
I need to add and/or remove multiple values I provide, but I don't know what is in the array at that moment - that's the whole point, so that application doesn't need to read it.
I may also want to replace the whole array under the key, but this is trivial case and I know how to do this.
Finally, if removal results in an empty array, we'd like to get rid of the key as well.
I could probably write a function doing it all if necessary but I've not committed to that yet.
Obviously, restructuring the data out of that JSONB column is not an option. Eventually I want to make it more flexible and data with these characteristics would go to some other table, but at this moment we're not able to do it with our application.
You can use jsonb_set to modify an array which is placed under some key.
To update a value in the array, specify its zero-based index in the path, as in the example below.
To add a new element at the start or end of the array, specify an out-of-range index: a negative one prepends the value, a positive one appends it.
UPDATE <table>
SET ext = jsonb_set(ext, '{key2, <index>}', '5')
WHERE <condition>
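The index rules for the last path step can be modelled in a few lines of Python. This is a sketch of the documented jsonb_set behaviour with its default create_if_missing = true, not PostgreSQL code; the function name set_array_element is hypothetical, and the sample array is the key2 value from the question.

```python
def set_array_element(array, index, new_value):
    """Model of jsonb_set when the last path step is an array index."""
    result = list(array)
    if -len(result) <= index < len(result):
        # In range (negative indexes count from the end): replace in place.
        result[index] = new_value
    elif index < 0:
        # Out of range and negative: the value is added at the beginning.
        result.insert(0, new_value)
    else:
        # Out of range and positive: the value is added at the end.
        result.append(new_value)
    return result


key2 = ["val2-1", "val2-2"]
replaced = set_array_element(key2, 0, 5)
appended = set_array_element(key2, 99, 5)
prepended = set_array_element(key2, -99, 5)
```

Note that this sets or adds one element per call; it does not cover removing values or dropping the key when the array becomes empty, which the question also asks for.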

Subtract arrays (Set Difference) with Google Sheets Filter

I need to filter an array in Google Sheets to remove from a first array all elements that appear in a second array.
The arrays are always sorted, although they might contain duplicates. But to complicate the matter, the arrays are not ranges in the sheet (they are delimited text strings), and they are usually not the same size, e.g. "a, b, d" - "b, c" (should evaluate to "a, d").
QUESTION PART 1: The underlying logic I am using to implement A - B using FILTER and COUNTIF (just on ranges for simplicity) is
FILTER(A1:1, COUNTIF(A1:1, B1:1)=0)
But this fails for these test cases (should-be❌is):
{a,b,c}-{}⥱{a,b,c}✅
{a,b,c}-{a}⥱{b,c} ✅ {a,b,c}-{b}⥱{a,c}❌{b,c} {a,b,c}-{c}⥱{a,b}❌{b,c}
{a,b,c}-{a,b}⥱{c}✅ {a,b,c}-{b,c}⥱{a}❌{c} {a,b,c}-{a,c}⥱{b}❌{c}
{a,b,c}-{d}⥱{a,b,c}✅ {a,b,c}-{b,d}⥱{a,c}❌{b,c}
{}-{a}⥱{}✅ {}-{a,b}⥱{}✅
{}-{}⥱{}✅
{a,b,c}-{a,b,c}⥱{}✅
Should I be using another implementation, maybe with MATCH?
QUESTION PART 2: Since I need to use delimited texts instead of ranges, I am splitting my string with SPLIT(A1, ",") to get my arrays, but have to pad them and use ARRAY_CONSTRAIN to get them the same size so that I can use the FILTER and COUNTIF functions, e.g.
ARRAY_CONSTRAIN(SPLIT(CONCATENATE(A1,REPT(",",999)),",",false,false),1,999)
Is there a more direct, not-so-intensive way to get arrays that will work in FILTER and COUNTIF?
I have figured this out after stumbling upon this sheet by Marc Meyer: https://docs.google.com/spreadsheets/d/1-beBOT1CjVyny7QwLz-RQCeN6fDTcIpLI1iZIjdTSgI/edit#gid=0
QUESTION PART 1: The underlying logic I am using to implement A - B using FILTER and COUNTIF (just on ranges for simplicity)
should be
=FILTER(A1:1, ISERROR(MATCH(A1:1, B1:1, false)))
and using this, I can have unequal sized arrays passed to it, obviating the need to pad and splice.
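One plausible reading of why the COUNTIF version fails is that COUNTIF(A1:1, B1:1) produces one count per element of B, so the FILTER mask lines up with B's positions rather than A's. The Python model below is a sketch under that assumption, with hypothetical helper names; it also assumes that blank cells to the right of B produce a count of 0. Under those assumptions it reproduces the failing results listed in the question, while the MATCH-style test is evaluated once per element of A.

```python
def countif(values, criterion):
    """Count how many entries of values equal criterion."""
    return sum(1 for v in values if v == criterion)


def difference_countif(a, b):
    """Model of FILTER(A, COUNTIF(A, B)=0): the mask follows B's positions."""
    padded_b = b + [None] * (len(a) - len(b))  # assumed blank cells after B
    mask = [countif(a, criterion) == 0 for criterion in padded_b]
    return [v for v, keep in zip(a, mask) if keep]


def difference_match(a, b):
    """Model of FILTER(A, ISERROR(MATCH(A, B, false))): one test per element of A."""
    return [v for v in a if v not in b]


abc = ["a", "b", "c"]
wrong = difference_countif(abc, ["b"])   # mask is [False, True, True]
right = difference_match(abc, ["b"])     # keeps the elements absent from B
```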

Nest an Excel Array Function In a Non-Array Function and Return an Array

I am trying to create a spreadsheet that will be used to provide quotes to customers. Some part-numbers apply to a single item. Some part-numbers are bundles of up to 4 items. I am trying to create a formula that returns all of the values associated with a given part-number.
Initially, I had two sections in the quote: one that uses VLOOKUP to return part-numbers with single items and one that uses an array formula that returns an array of items.
The first formula is
=IF(ISNA(VLOOKUP(B12,PriceList,2,FALSE)),"",VLOOKUP(B12,PriceList,2,FALSE))
The second is
{=IFERROR(INDEX(Bundles!$B$2:$B$101, SMALL(IF($B$33=Bundles!$A$2:$A$101, ROW(Bundles!$B$2:$B$101 ) - 1,""), ROW() - 32 )),"")}
Screenshot showing results of first two formulas
Both work fine on their own. They rely on two data tables "PriceList" and "Bundles"
I want the sales reps to be able to type a part-number in column B and get the correct part descriptions - whether it is 1, 2, 3, or 4 items - to display in column C. I want them to be able to enter multiple part numbers on the same quote.
I tried to base this on the part number
{=IF(LEFT(B29,4)="BUND",IFERROR(INDEX(Bundles!$B$2:$B$101,SMALL(IF($B$29=Bundles!$A$2:$A$101,ROW(Bundles!$B$2:$B$101)-1,""),ROW()-28)),""),VLOOKUP(B29,PriceList,2,FALSE))}
This works for bundled items, but repeats single items.
I would like to have a single data source (PriceList) and a single formula.
Partnumbers in Datasource
What I am now trying to do is use COUNTIF. For example if COUNTIF returns more than 1, use the array formula, else use the VLOOKUP formula.
I picture it to be something like
IF((COUNTIF(PriceList,Quote!B11)>1),"BUNDLE",IF(ISNA(VLOOKUP(Quote!B11,PriceList,2,FALSE)),"",VLOOKUP(Quote!B11,PriceList,2,FALSE)))
where "BUNDLE" is replaced by an array function. I can't seem to come up with the right array formula.
I tried
{=IF((COUNTIF(PriceList,Quote!B11)>1),IFERROR(INDEX(Bundles!$B$2:$B$101, SMALL(IF($B$11=Bundles!$A$2:$A$101, ROW(Bundles!$B$2:$B$101 ) - 1,""), ROW() - 32 )),""),IF(ISNA(VLOOKUP(Quote!B11,PriceList,2,FALSE)),"",VLOOKUP(Quote!B11,PriceList,2,FALSE)))}
This returns four rows of the same item for single items and nothing for bundles
I had thought about placing the array function in another cell and referencing that cell, but this does not help if the bundle contains more than one item.
Any thoughts or advice would be welcome.

Resources