Matching at least 1 of 2 values of an array - snowflake-cloud-data-platform

I have an array such as [1,2,3,4,5] as a field value, and I'd like to calculate the sum of all fields where the array contains either 1 or 2.
For example:
array [2,3,4,5,6] -> 2
array [2,1,3] -> 1,2
array [5,8,9] -> false

ARRAYS_OVERLAP() checks this in a single call, which is simpler and usually faster than testing each value one by one.
-- Set up sample arrays to play with
WITH CTE AS (
    SELECT [1,2,3,4,5] RAY
    UNION ALL SELECT [10,20,30,40,50]
    UNION ALL SELECT [1,2,3,4,5]
)
SELECT SUM(VALUE)
FROM CTE, TABLE(FLATTEN(INPUT=>RAY))
WHERE ARRAYS_OVERLAP([1,2], RAY)
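If the goal is the per-row result shown in the question rather than a single overall sum, a minimal sketch along the same lines might look like the following. It assumes ARRAY_INTERSECTION is an acceptable way to list which of the two values matched; the column aliases HAS_1_OR_2 and MATCHED are made up for illustration.

```sql
-- Sample rows taken from the question; aliases are illustrative only.
WITH CTE AS (
    SELECT [2,3,4,5,6] AS RAY
    UNION ALL SELECT [2,1,3]
    UNION ALL SELECT [5,8,9]
)
SELECT RAY,
       -- TRUE when RAY shares at least one element with [1,2]
       ARRAYS_OVERLAP([1,2], RAY)     AS HAS_1_OR_2,
       -- The shared elements; an empty array when nothing matches
       ARRAY_INTERSECTION([1,2], RAY) AS MATCHED
FROM CTE;
```

The order of elements inside MATCHED should not be relied on.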

Related

Replacing elements in an array column in snowflake

I have sample data as follows:
team_id mode
123 [1,2]
Here mode is an array. The goal is to replace the values in the mode column with literal values: 1 stands for Ocean and 2 stands for Air.
Expected Output
team_id mode
123 [Ocean,Air]
Present Approach
As an attempt, I tried to first flatten the data into multiple rows:
team_id mode
123 1
123 2
Then we can define a new column assigning literal values to mode column using a case statement, followed by aggregating the values into an array to get desired output.
Can I get some help here to do the replacement directly in the array? Thanks in advance.
Using FLATTEN and ARRAY_AGG:
CREATE OR REPLACE TABLE tab(team_id INT, mode ARRAY) AS SELECT 123, [1,2];
SELECT TEAM_ID,
       -- Map each element to its label; f.index keeps the original array order.
       ARRAY_AGG(CASE f.value::INT
                     WHEN 1 THEN 'Ocean'
                     WHEN 2 THEN 'Air'
                     ELSE 'Unknown'
                 END) WITHIN GROUP(ORDER BY f.index) AS new_mode
FROM tab
,LATERAL FLATTEN(tab.mode) AS f
GROUP BY TEAM_ID;
Output:
TEAM_ID NEW_MODE
123 [ "Ocean", "Air" ]
As an alternative that makes array manipulation easy, you could create a JS UDF:
create or replace function replace_vals_in_array(A variant)
returns variant
language javascript
as $$
    // Lookup table from the original value to its replacement.
    const dict = {1:'a', 2:'b', 3:'c', 4:'d'};
    // Values with no entry in dict map to undefined.
    return A.map(x => dict[x]);
$$;
Then to update your table:
update arrs
set arr = replace_vals_in_array(arr);
Example setup:
create or replace temp table arrs as (
select 1 id, [1,2,3] arr
union all select 2, [2,4]
);
select *, replace_vals_in_array(arr)
from arrs;

How to extract data from an Array looking JSON in BigQuery

I am trying to extract data from the variable metric_data that looks like an Array but it's a JSON.
This is an example:
[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]
I am specifically trying to sum the second element of each sub-array under the key "values", so that I have one row per segmentName with the sum of its values. I only got as far as transforming the data into an array.
SELECT
  array(
    select x
    FROM UNNEST(JSON_EXTRACT_ARRAY(metric_data, '$')) x
  ) extracted
FROM temp
Based on my understanding, you would like to get the sum for each "segmentName". Two possibilities are to sum everything (both array elements together) or to get the sum per element. If my understanding is wrong, please let me know so I can edit or delete my answer.
You can consider the queries below illustrating these two possibilities:
Sum of values
with sample_data as (
select '[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],[1588896000000.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]' as json_string
)
-- Sum all values
select
  json_query(js,'$.segmentName') as segment_name,
  sum(cast(arr_val as NUMERIC)) as sum_of_values
from sample_data
  ,unnest(json_query_array(json_string, '$')) js
  ,unnest(json_query_array(js,'$.values')) val
  ,unnest(json_query_array(val,'$')) arr_val
group by 1
Sum per element of values
with sample_data as (
select '[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],[1588896000000.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]' as json_string
)
,cte as (
  select
    json_query(js,'$.segmentName') as segment_name,
    split(regexp_extract(val,r'\[(\d+\.?\d+?,\d+)\]'),',') as new_value
  from sample_data
    ,unnest(json_query_array(json_string, '$')) js
    ,unnest(json_query_array(js,'$.values')) val
)
select
  segment_name,
  sum(cast(new_value[offset(0)] as numeric)) as elem1,
  sum(cast(new_value[offset(1)] as numeric)) as elem2
from cte
group by segment_name
NOTE: Your JSON string is missing some brackets and values, hence I created a dummy value.
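Since the question asks only for the sum of the second element of each pair, a shorter variant without the regular expression may be enough. The following is a sketch under a few assumptions: the sample string is a shortened, made-up subset of the data above, json_value is used so that segment names come back without the surrounding quotes that json_query keeps, and the column name sum_of_second_elements is illustrative.

```sql
with sample_data as (
  -- Shortened sample for illustration; not the full data from the question.
  select '[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680]]}]' as json_string
)
select
  -- json_value returns the scalar without JSON quoting
  json_value(js, '$.segmentName') as segment_name,
  -- each val is one [timestamp, value] pair; offset(1) picks the value
  sum(cast(json_query_array(val, '$')[offset(1)] as numeric)) as sum_of_second_elements
from sample_data
  ,unnest(json_query_array(json_string, '$')) js
  ,unnest(json_query_array(js, '$.values')) val
group by segment_name
```

This relies on every sub-array having at least two elements; a shorter pair would make offset(1) fail, in which case safe_offset(1) could be substituted.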

Using the window function "last_value", when the values of the sorted field are same, the value snowflake returns is not the last value

As we all know, the window function "last_value" returns the last value within an ordered group of values.
In the following example, the rows are partitioned by field "A" and sorted by field "B" in ascending order.
In the group "A = 1", the last value is returned, that is, the C value 4 where B = 2.
However, in the group "A = 2", the values of field "B" are all the same.
In this case, instead of the last value (the C value 4 in row 6), the first C value for B = 2, which is 1, is returned.
I'm puzzled why the last value within the ordered group is not returned when the sort column contains ties.
Example
row_number A B C LAST_VALUE(C) IGNORE NULLS OVER (PARTITION BY A ORDER BY B ASC)
1 1 1 2 4
2 1 1 1 4
3 1 1 3 4
4 1 2 4 4
5 2 2 1 1
6 2 2 4 1
In the partition where A equals 2, there is a tie on column B.
The sort is NOT stable. To get a deterministic order, the column or combination of columns in the ORDER BY clause must be unique.
To illustrate:
SELECT C
FROM tab
WHERE A = 2
ORDER BY B
LIMIT 1;
It could return either 1 or 4.
If you sort by B within A then any duplicate rows (same A and B values) could appear in any order and therefore last_value could give any of the possible available values.
If you want a specific row, based on some logic, then you need to sort by all the columns within the group that reflect that logic. In your case, that means sorting by B and C.
Good day Bill!
Right, the sorting is not stable, so the output can differ between runs.
To get stable results, we can run something like the following:
select
    column1,
    column2,
    column3,
    last_value(column3) over (partition by column1 order by column2, column3) as column2_last
from values
    (1,1,2), (1,1,1), (1,1,3),
    (1,2,4), (2,2,1), (2,2,4)
order by column1;

Postgres: select array element based on value in other column

I have two columns in a Postgres table: one contains an array of variable length, the other the array position of the element I want to select.
How do I query these?
testtable
testarray arraypos
a,b,c 3
a NULL
NULL
b,c,d,e 3
a,c 1
What I want is something like:
SELECT testarray[arraypos] FROM testtable;
i.e. getting c, d, a.
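For what it's worth, PostgreSQL accepts a column as an array subscript, so the query above should work as written. The sketch below assumes testarray is text[] and arraypos is int, and that the row shown as a lone NULL has NULL in both columns; the setup statements and the alias picked are hypothetical, added only to make the example self-contained.

```sql
-- Hypothetical setup mirroring the table in the question.
CREATE TEMP TABLE testtable (testarray text[], arraypos int);
INSERT INTO testtable VALUES
    (ARRAY['a','b','c'],     3),
    (ARRAY['a'],             NULL),
    (NULL,                   NULL),
    (ARRAY['b','c','d','e'], 3),
    (ARRAY['a','c'],         1);

-- Subscripts are 1-based. A NULL array, a NULL position, or an
-- out-of-range position all yield NULL, so those rows are filtered out.
SELECT testarray[arraypos] AS picked
FROM testtable
WHERE testarray[arraypos] IS NOT NULL;
```

Without an ORDER BY the row order is not guaranteed, so add one if the sequence of results matters.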

Concatenate array elements in order PostgreSQL

Is it possible to concatenate elements of 2 arrays in the correct order of its elements?
Example:
array1=['a','b','c']
array2=['d','e','f']
concatenated_array=['ad','be','cf']
My data is in the following way:
id col1 col2
1 ['a','b','c'] ['d','e','f']
2 ['g','h','i'] ['j','k','l']
3 ['a','b','c'] ['j','k','l']
Use array_agg and unnest (with column alias).
SELECT array_agg(el1||el2)
FROM unnest(ARRAY['a','b','c'], ARRAY['d','e','f']) el (el1, el2);
array_agg
------------
{ad,be,cf}
(1 row)
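Because the data in the question lives in table columns rather than literals, the same idea can be applied per row. This is a sketch only: the CTE t stands in for the real table, it assumes col1 and col2 are text arrays, and WITH ORDINALITY is added so the aggregation order is explicit rather than implied.

```sql
WITH t (id, col1, col2) AS (
    -- Stand-in for the table from the question.
    VALUES (1, ARRAY['a','b','c'], ARRAY['d','e','f']),
           (2, ARRAY['g','h','i'], ARRAY['j','k','l']),
           (3, ARRAY['a','b','c'], ARRAY['j','k','l'])
)
SELECT t.id,
       -- Pair elements by position and concatenate each pair
       array_agg(el.el1 || el.el2 ORDER BY el.ord) AS concatenated_array
FROM t
CROSS JOIN LATERAL unnest(t.col1, t.col2) WITH ORDINALITY AS el (el1, el2, ord)
GROUP BY t.id
ORDER BY t.id;
```

If the two arrays differ in length, the multi-argument unnest pads the shorter one with NULLs, and the concatenation for those positions is NULL.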
