BigQuery: Checking the content of an array without using UNNEST - arrays

I've been trying for the past few hours to find a way to check in BigQuery whether an array contains a certain value without using UNNEST. The reason I don't want to use UNNEST is that I don't want an unnested result; I just want to check whether the value is in the array or not (and then use that in a CASE WHEN condition).
I've tried different approaches like value = ANY(array), CONTAINS and CONTAINS_ARRAY, but none of them work in BigQuery.
Thank you!

If the only reason you don't want to use UNNEST is the unnested result, I would not rule this option out. Instead, I suggest using UNNEST without selecting the unnested columns. That way your nested result is kept, and you can still use these temporary columns to check your conditions in your CASE WHEN statements.
I used a public BigQuery dataset to illustrate this approach. The query is:
WITH
  temporary_table AS (
    SELECT
      t.*,
      param
    FROM
      `firebase-public-project.analytics_153293282.events_20181003` AS t,
      UNNEST(t.event_params) AS param )
SELECT
  * EXCEPT (param),
  CASE
    WHEN param.key IN ('value', 'board') THEN TRUE
  END AS check
FROM
  temporary_table
LIMIT
  100;
Notice that the unnested param column from event_params is not displayed in the final result. Also, the check column is created as a Boolean flag: you can omit it from the output, or use it as a condition to make the desired modifications to your columns.
I hope it helps.
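One thing to keep in mind with this approach: joining a table against its own unnested array returns one row per array element, so an event with several parameters appears several times, and a row whose array is empty disappears from the join entirely. The sketch below illustrates that row count change with Python's built-in sqlite3 module rather than BigQuery. SQLite has no ARRAY type, so (as an assumption for illustration only) arrays are stored as JSON text and json_each() stands in for UNNEST; this needs an SQLite build with the JSON functions, which recent versions include by default. The events table and params column are hypothetical.

```python
import json
import sqlite3

def rows_after_unnest_join(arrays):
    """Return (rows in the base table, rows after joining each row with its unnested array)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per id, with the array stored as JSON text.
    conn.execute("CREATE TABLE events (id INTEGER, params TEXT)")
    conn.executemany(
        "INSERT INTO events (id, params) VALUES (?, ?)",
        [(i, json.dumps(arr)) for i, arr in enumerate(arrays, start=1)],
    )
    (before,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    # json_each() plays the role of UNNEST: one output row per array element,
    # and no output row at all for an empty array.
    (after,) = conn.execute(
        "SELECT COUNT(*) FROM events AS e, json_each(e.params) AS p"
    ).fetchone()
    conn.close()
    return before, after

print(rows_after_unnest_join([[1, 2, 3], [4, 5]]))  # (2, 5)
```

Two input rows become five joined rows, which is why the membership test in the next answer, which keeps one row per id, may fit the original question better.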

The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, [1, 2, 3] arr UNION ALL
  SELECT 2, [4, 5]
)
SELECT id, arr,
  CASE 1 IN UNNEST(arr)
    WHEN TRUE THEN 'value is in array'
    ELSE 'value is not in array'
  END conclusion
FROM `project.dataset.table`
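This returns one row per id: id 1 gets 'value is in array' and id 2 gets 'value is not in array', with arr still shown as an array.
As you can see, the result is not unnested! 1 IN UNNEST(arr) evaluates to a single BOOL per row, so it can be used directly in CASE (or in WHERE) without changing the shape of the result.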
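The key property of 1 IN UNNEST(arr) is that the membership test is evaluated per row and yields one boolean, instead of joining rows to elements. As a rough analogy only (this is SQLite via Python's sqlite3 module, not BigQuery), the sketch below runs the same CASE logic with a correlated IN subquery over json_each(); storing the arrays as JSON text and the table t are assumptions made for illustration, and json_each() requires an SQLite build with the JSON functions.

```python
import json
import sqlite3

def membership_flags(arrays, value):
    """Return one (id, conclusion) pair per input row; rows are never multiplied."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table t: the array column is stored as JSON text.
    conn.execute("CREATE TABLE t (id INTEGER, arr TEXT)")
    conn.executemany(
        "INSERT INTO t (id, arr) VALUES (?, ?)",
        [(i, json.dumps(a)) for i, a in enumerate(arrays, start=1)],
    )
    # The correlated subquery is evaluated for each row of t and the IN test
    # collapses it to a single true/false, similar in spirit to `value IN UNNEST(arr)`.
    rows = conn.execute(
        """
        SELECT id,
               CASE WHEN ? IN (SELECT j.value FROM json_each(t.arr) AS j)
                    THEN 'value is in array'
                    ELSE 'value is not in array'
               END AS conclusion
        FROM t
        ORDER BY id
        """,
        (value,),
    ).fetchall()
    conn.close()
    return rows

print(membership_flags([[1, 2, 3], [4, 5]], 1))
# [(1, 'value is in array'), (2, 'value is not in array')]
```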

Related

Filter based on decrypted value using DecryptByKey

I have the following simplified query, which should filter on the decrypted column values:
SELECT ResourceId, ClientId, UserName
FROM dbo.Resources
WHERE DecryptByKey(UserName) LIKE '%PETER%';
It doesn't work: it returns 0 results.
If I use this instead:
SELECT K.ResourceId, K.ClientId, K.DUserName
FROM (
SELECT ResourceId, ClientId, UserName, DecryptByKey(UserName) AS DUserName
FROM dbo.Resources
) AS K
WHERE K.DUserName LIKE '%PETER%'
This works correctly and gives me the desired results.
I want to go with something similar to option #1, since I want it to be efficient and filter the results before the join.
Is there a way to filter based on the decrypted value in the same select statement?
I'm not sure why the first one doesn't work, but your two queries do exactly the same thing: both perform a full scan of the table, so their performance will be the same.
Unfortunately, the leading % wildcard in LIKE '%PETER%', applied to a value that has to be decrypted row by row, makes a performance fix impossible in this case.
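To illustrate the point that both forms do the same work, here is a minimal sketch using Python's built-in sqlite3 module, not SQL Server. A hypothetical toy_decrypt function (base64 decoding, which is not encryption) stands in for DecryptByKey, and the table layout mirrors the question. Both forms return the same rows, and SQLite's EXPLAIN QUERY PLAN reports a table scan for each; SQLite's planner is not SQL Server's, so check the actual execution plan in SQL Server before drawing conclusions there.

```python
import base64
import sqlite3

def toy_decrypt(value):
    # Hypothetical stand-in for DecryptByKey. Base64 is NOT encryption; it just
    # gives a reversible function that has to be evaluated for every row.
    return base64.b64decode(value).decode("utf-8")

def compare_filter_forms(names):
    conn = sqlite3.connect(":memory:")
    conn.create_function("toy_decrypt", 1, toy_decrypt)
    conn.execute(
        "CREATE TABLE Resources (ResourceId INTEGER, ClientId INTEGER, UserName TEXT)"
    )
    conn.executemany(
        "INSERT INTO Resources VALUES (?, ?, ?)",
        [
            (i, 100 + i, base64.b64encode(name.encode("utf-8")).decode("ascii"))
            for i, name in enumerate(names, start=1)
        ],
    )
    # Form 1: call the function directly in the WHERE clause.
    direct = """
        SELECT ResourceId, ClientId, UserName
        FROM Resources
        WHERE toy_decrypt(UserName) LIKE '%PETER%'
        ORDER BY ResourceId
    """
    # Form 2: compute the value in a derived table and filter outside it.
    derived = """
        SELECT K.ResourceId, K.ClientId, K.UserName
        FROM (
            SELECT ResourceId, ClientId, UserName,
                   toy_decrypt(UserName) AS DUserName
            FROM Resources
        ) AS K
        WHERE K.DUserName LIKE '%PETER%'
        ORDER BY K.ResourceId
    """
    results = [conn.execute(q).fetchall() for q in (direct, derived)]
    # The 4th column of EXPLAIN QUERY PLAN output is the human-readable detail.
    plans = [
        [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + q)]
        for q in (direct, derived)
    ]
    conn.close()
    return results, plans

results, plans = compare_filter_forms(["PETER PAN", "MARY JANE", "JOHN SMITH"])
print(results[0] == results[1])  # True
print(plans)
```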

Select Value of Parameter in Array with Denodo (VQL)

I am trying to do something that seems simple but cannot find the right syntax for Denodo's VQL (Virtual Query Language). I have a string like this: XXXX-YYYY-ZZZZ-AAAA-BBBB in a column called "location" that varies in length, and I want to get the value of the fourth set (i.e. AAAA in this example). I am using the Denodo split function like this:
SELECT SPLIT("-",location)[3] AS my_variable FROM my_table
However, the [3] doesn't work. I've tried a bunch of variations:
SELECT SPLIT("-",location)[3].value AS my_variable FROM my_table
SELECT SPLIT("-",location).column3 AS my_variable FROM my_table
etc.
Can someone please help me figure out the right syntax to return a single parameter from an array? Thank you!
SELECT field_1[3].string
FROM (SELECT split('-', 'XXXX-YYYY-ZZZZ-AAAA-BBBB') as field_1)
You have to do it using a subquery because the syntax for accessing an array element (that is, [<number>]) can only be used with field names. You cannot put something like [4] next to the result of an expression.
This question helps: https://community.denodo.com/answers/question/details?questionId=90670000000CcQPAA0
I got it working by creating a view that saves the array as a field:
CREATE OR REPLACE VIEW p_sample_data FOLDER = '/stack_overflow'
AS SELECT bv_sample_data.location AS location
, bv_sample_data.id AS id
, split('-', location) AS location_array
FROM bv_sample_data;
Notice that I created a column called location_array.
Now you can use a select statement on top of your view to extract the information you want:
SELECT location, id, location_array[2].string
FROM p_sample_data
location_array[2] is the 3rd element, and .string tells Denodo that you want the string value (I think that's what it does; you'd have to read more about compound values in the documentation: https://community.denodo.com/docs/html/browse/6.0/vdp/vql/advanced_characteristics/management_of_compound_values/management_of_compound_values )
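Because the indexing is zero-based, the fourth segment asked for in the question (AAAA) is at index 3, and index 2 is ZZZZ. The Python sketch below mirrors split-then-index on the question's sample string; returning None when a shorter location has too few segments is a design choice of this sketch, not a statement about what Denodo returns for an out-of-range index. The helper name nth_segment is hypothetical.

```python
def nth_segment(location, index, sep="-"):
    """Return the zero-based `index`-th segment of `location`, or None if there are too few segments."""
    parts = location.split(sep)
    return parts[index] if 0 <= index < len(parts) else None

print(nth_segment("XXXX-YYYY-ZZZZ-AAAA-BBBB", 3))  # AAAA
print(nth_segment("XXXX-YYYY", 3))                 # None
```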
Another way you could probably do it is by creating a view with the array, and then flattening the array, although I haven't tried that option.
Update: I tried creating a view that flattens the array and then using an analytic (or "window") function to get a row_number() OVER (PARTITION BY id ORDER BY id ASC), but analytic/window functions don't work against flat file sources.
So if you go the "flatten" route and your source system doesn't support analytic functions, you could use a plain rownum() function instead, but you'd have to offset the value by the column number you want and then use remainder (modulo) division to pull out the data you want.
Like this:
--My view with the array is called p_sample_data
CREATE OR REPLACE VIEW f_p_sample_data FOLDER = '/stack_overflow' AS
SELECT location AS location
, id AS id
, string AS string
, rownum(2) AS rownumber
FROM FLATTEN p_sample_data AS v ( v.location_array);
Now, with the rownum() function (and an offset of 2), I can use remainder division in my where clause to get the data I want:
SELECT location, id, string, rownumber
FROM f_p_sample_data
WHERE rownumber % 5 = 0
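A caveat on the remainder trick: a row number that counts across all flattened rows only lines up with element positions when every array has the same number of elements (5 in this example), and it also assumes the flattened rows come back in array order. The question says the location string varies in length, so this matters. The Python sketch below makes the arithmetic explicit with its own numbering starting at 1 (it does not model Denodo's rownum(offset) semantics); the helper names and sample data are hypothetical.

```python
def pick_by_global_rownum(arrays, k, width):
    """Flatten the arrays, number rows 1, 2, 3, ... across all of them, keep rows where (rownum - 1) % width == k."""
    flat = [elem for arr in arrays for elem in arr]
    return [elem for rownum, elem in enumerate(flat, start=1) if (rownum - 1) % width == k]

def pick_per_array(arrays, k):
    """Index each array on its own (zero-based), like location_array[k]."""
    return [arr[k] if k < len(arr) else None for arr in arrays]

same_length = [["X1", "Y1", "Z1", "A1", "B1"], ["X2", "Y2", "Z2", "A2", "B2"]]
mixed_length = [["X1", "Y1", "Z1", "A1"], ["X2", "Y2", "Z2", "A2", "B2"]]

print(pick_by_global_rownum(same_length, 3, 5))   # ['A1', 'A2']
print(pick_by_global_rownum(mixed_length, 3, 5))  # ['A1', 'B2'] (wrong element for the second array)
print(pick_per_array(mixed_length, 3))            # ['A1', 'A2']
```

Indexing each array on its own is unaffected by the length of the other rows, which supports keeping the data in the array as suggested next.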
Frankly, I think the easier way is to just leave your location data in the array and extract out the nth column with the location_array[2].string syntax, where 2 is the nth column, zero based.

Access SubQuery: SELECT TOP (count from select query) Table

Is it possible to use a Count() or number from another Select query to SELECT TOP a number of rows in a different query?
Below is a sample of the update query I'm trying to use; I would like to take the count from another query to replace "10".
...
WHERE Frames.Package IN (
SELECT TOP 10 Frames
FROM Frames.Package WHERE Package = "100"
ORDER BY Frames.ReferenceNumber
)
So, for example, I've tried:
SELECT TOP SelectQuery.RecordCount Frames
where SelectQuery, which returns RecordCount, is:
SELECT COUNT(Frames.Package) AS RecordCount
FROM Frames
HAVING Frames.Package = "100";
Any assistance would be appreciated...
Access does not support using a parameter for SELECT TOP. You must write a literal value into the text of the SQL statement.
From another answer: Select TOP N not working in MS Access with parameter
On that note, your two queries appear to just swap the HAVING and WHERE clauses to get the record count. They don't seem to do anything more, so why bother with the TOP clause instead of simply using SELECT * FROM Frames WHERE [..]?
Am I missing something?
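Since Access needs a literal after TOP, the usual workaround is to get the count first and then build the SQL text with that number written in. In Access itself this would be done in VBA (for example, reading the count with DCount and concatenating it into the SQL string); the Python sketch below only shows the string-building and validation step. The helper name build_top_n_subquery and its column list are hypothetical, not the question's exact query.

```python
def build_top_n_subquery(n):
    """Build the subquery text with n spliced in as a literal, because TOP can't take a parameter in Access."""
    # Only a positive int may go into the SQL text. bool is excluded explicitly
    # because it is a subclass of int in Python.
    if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
        raise ValueError("TOP needs a positive integer literal")
    # Hypothetical column list; adapt it to the query you actually run.
    return (
        f"SELECT TOP {n} Frames.ReferenceNumber "
        'FROM Frames WHERE Frames.Package = "100" '
        "ORDER BY Frames.ReferenceNumber"
    )

record_count = 10  # in Access, this number would come from your count query first
print(build_top_n_subquery(record_count))
```

Validating the value before formatting it into the SQL keeps arbitrary text out of the statement, since a parameter cannot be used here.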

Postgresql jsonb set-union of lists

I am hoping it is straightforward to do the following:
Given rows containing jsonb of the form
{
  "a": "hello",
  "b": ["jim", "bob", "kate"]
}
I would like to be able to get all the 'b' fields from a table (as in select jsondata->'b' from mytable) and then form a list consisting of all strings which occur in at least one 'b' field. (Basically a set-union.)
How can I do this? Or am I better off using a python script to extract the 'b' entries, do the set-union there, and then store it back into the database somewhere else?
This gives you the set-union of the elements in the 'b' lists of the JSON.
SELECT array_agg(a ORDER BY a)
FROM (SELECT DISTINCT unnest(txt_arr) AS a
      FROM (SELECT ARRAY(SELECT trim(elem::text, '"')
                         FROM jsonb_array_elements(jsondata->'b') elem) AS txt_arr
            FROM jtest1) y) z;
Query explanation:
jsondata->'b' gets the list from b.
jsonb_array_elements() expands the JSON array into a set of JSON values.
trim() strips the surrounding " characters from each element.
ARRAY() collects the trimmed elements back into an array.
unnest() expands those arrays again, and DISTINCT keeps each value only once.
Finally, array_agg() builds the result array, sorted by value.
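The same pattern (expand each 'b' list into rows, then DISTINCT) can be tried without PostgreSQL. The sketch below uses Python's built-in sqlite3 module, where json_each(jsondata, '$.b') plays the role of jsonb_array_elements; it assumes an SQLite build with the JSON functions, and the mytable name and sample rows are hypothetical. In PostgreSQL itself, jsonb_array_elements_text() returns the elements already as text, which avoids the trim() step (trim() would leave backslashes behind in strings containing escaped quotes); something like SELECT array_agg(DISTINCT elem ORDER BY elem) FROM mytable, jsonb_array_elements_text(jsondata->'b') AS elem should work, but it is untested here.

```python
import json
import sqlite3

def union_of_b(documents):
    """Return the sorted set-union of the strings found in each document's 'b' list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (jsondata TEXT)")
    conn.executemany(
        "INSERT INTO mytable (jsondata) VALUES (?)",
        [(json.dumps(doc),) for doc in documents],
    )
    # json_each(..., '$.b') yields one row per element of the 'b' array, with
    # string elements returned as plain text; DISTINCT performs the set-union.
    rows = conn.execute(
        """
        SELECT DISTINCT j.value
        FROM mytable, json_each(mytable.jsondata, '$.b') AS j
        ORDER BY j.value
        """
    ).fetchall()
    conn.close()
    return [value for (value,) in rows]

docs = [
    {"a": "hello", "b": ["jim", "bob", "kate"]},
    {"a": "world", "b": ["bob", "ann"]},
]
print(union_of_b(docs))  # ['ann', 'bob', 'jim', 'kate']
```

Doing the union in the database this way avoids the round trip through a separate Python script that the question considers.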

SQL Server Reference a Calculated Column

I have a select statement with calculated columns and I would like to use the value of one calculated column in another. Is this possible? Here is a contrived example to show what I am trying to do.
SELECT [calcval1] = CASE Statement, [calcval2] = [calcval1] * .25
No.
All the results of a single row from a select are atomic. That is, you can view them all as if they occur in parallel and cannot depend on each other.
If you're referring to computed columns, then you need to update the formula's input for the result to change during a select.
Think of computed columns as macros or mini-views which inject a little calculation whenever you call them.
For example, these columns will be identical, always:
-- assume that 'Calc' is a computed column equal to Salary*.25
SELECT Calc, Salary*.25 Calc2 FROM YourTable
Also keep in mind that the persisted option doesn't change any of this. It keeps the value around which is nice for indexing, but the atomicity doesn't change.
Unfortunately not really, but a workaround that is sometimes worth it is:
SELECT [calcval1], [calcval1] * .25 AS [calcval2]
FROM (SELECT [calcval1] = CASE Statement FROM whatever WHERE whatever) AS t
Yes, it's possible.
Use a WITH statement (a common table expression) for nested selects:
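A minimal sketch of the WITH approach, assuming a hypothetical CASE expression in place of "CASE Statement" and the YourTable/Salary names used elsewhere on this page. It runs in SQLite through Python's built-in sqlite3 module only so it can be executed as-is; the WITH ... AS (...) SELECT ... shape is the same in SQL Server. calcval1 is computed once inside the CTE and then referenced by name in the outer SELECT.

```python
import sqlite3

def calc_with_cte(salaries):
    """Compute calcval1 once in a CTE, then reuse it to derive calcval2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Salary REAL)")
    conn.executemany("INSERT INTO YourTable (Salary) VALUES (?)", [(s,) for s in salaries])
    rows = conn.execute(
        """
        WITH base AS (
            -- Hypothetical CASE expression standing in for "CASE Statement".
            SELECT Salary,
                   CASE WHEN Salary >= 1000 THEN Salary ELSE 0 END AS calcval1
            FROM YourTable
        )
        SELECT Salary, calcval1, calcval1 * 0.25 AS calcval2
        FROM base
        ORDER BY Salary
        """
    ).fetchall()
    conn.close()
    return rows

for row in calc_with_cte([500, 2000]):
    print(row)
```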
There are two ways I can think of to do that. First, understand that the calcval1 column does not exist as far as SQL Server is concerned until the statement has run, so it cannot be used directly as shown in your example. So you can put the calculation in there twice: once for calcval1, and once in place of calcval1 in the calcval2 calculation.
The other way is to make a derived table with calcval1 in it and then calculate calcval2 outside the derived table, something like:
select calcval1*.25 as calcval2, calcval1, field1, field2
from (select casestatement as calcval1, field1, field2 from mytable) a
You'll need to test both for performance.
You should use an outer apply instead of a subselect:
select V.calc,V.calc*0.25 from FOO outer apply (select case Statement as calc) V
You can't "reset" the value of a computed column in a SELECT clause, if that's what you're trying to do. The value of a computed column is based on the computed column's formula, and you can't change that formula in a SELECT clause. (SQL Server also does not allow one computed column's formula to reference another computed column; you have to repeat the expression.) If all you want to do is "output" a value based on two calculated columns, as the syntax in your question reads, then
[calcval2]
in
SELECT [calcval1] = CASE Statement, [calcval2] = [calcval1] * .25
would just become a column alias in the output of the SELECT clause.
Or are you asking how to define the formula for one computed column so that it is based on another?
