I have the following table in BigQuery, which has an array-of-struct column. I need to UNION it with a simple table, adding null values in place of the nested columns.
Actual nested table example -
Simple table (which needs to be unioned):
acc    date        count
acc_6  11/29/2022  2
acc_8  11/30/2022  3
I tried the following query, but it gives an incompatible-types error on the nested columns:
select * from actual_table
union all
select acc, date, count,
  array_agg(struct(cast(null as string) as device_id, cast(null as date) as to_date, cast(null as string) as from_date)) as d
from simple_table
The resultant table should look like this -
Since d has the type array<struct<string, string, string>>, you need to write a null struct like the one below.
SELECT * FROM actual_table
UNION ALL
SELECT *, [STRUCT(CAST(null AS STRING), CAST(null AS STRING), CAST(null AS STRING))] FROM simple_table;
[] is an array literal; see Using array literals.
Field names in the null struct are optional because they are already declared by actual_table before UNION ALL.
You can use STRING(null) instead of CAST(null AS STRING), which is a little more concise.
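To see why the padded column makes the two sides union-compatible, here is a small Python sketch (hypothetical dict-based rows, not BigQuery code) that mimics giving each simple-table row a one-element array holding an all-null struct:

```python
# Rows from the simple table lack the nested column d.
simple_rows = [
    {"acc": "acc_6", "date": "11/29/2022", "count": 2},
    {"acc": "acc_8", "date": "11/30/2022", "count": 3},
]

# Equivalent of [STRUCT(CAST(null AS STRING), ...)]: every field is NULL.
null_struct = {"device_id": None, "to_date": None, "from_date": None}

# Pad each simple row with a one-element array of the null struct,
# so its "schema" matches the nested table's.
padded = [{**row, "d": [dict(null_struct)]} for row in simple_rows]
```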
I have a Snowflake array column as in the input rows below. For each value in the array, I want to check its length and split the values into separate output arrays: values with 5 digits in one column, and values with 6 digits in another.
ID_COL,ARRAY_COL_VALUE
1,[22,333,666666]
2,[1,55555,999999999]
3,[22,444]
Output table:
ID_COL,FIVE_DIGIT_COL,SIX_DIGIT_COL
1,[],[666666]
2,[55555],[]
3,[],[]
Please let me know if we could iterate through each array value, check its length with SQL aggregation, and output the results as separate columns. Plain SQL would be great, but a UDF in JavaScript or Python would also work.
Using SQL and FLATTEN:
CREATE OR REPLACE TABLE t(ID_COL INT,ARRAY_COL_VALUE VARIANT)
AS
SELECT 1,[22,333,666666] UNION ALL
SELECT 2,[1,55555,999999999] UNION ALL
SELECT 3,[22,444];
Query:
SELECT ID_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 10000 AND 99999 THEN s.value END) AS FIVE_DIGIT_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 100000 AND 999999 THEN s.value END) AS SIX_DIGIT_COL
FROM t, TABLE(FLATTEN(ARRAY_COL_VALUE)) AS s
GROUP BY ID_COL;
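To see the logic of the FLATTEN + conditional ARRAY_AGG query outside Snowflake, here is a small Python emulation (the dict of rows is made-up sample data matching the question):

```python
# Each row: ID_COL -> flattened array elements.
rows = {1: [22, 333, 666666], 2: [1, 55555, 999999999], 3: [22, 444]}

def bucket(values, lo, hi):
    # Mimics ARRAY_AGG(CASE WHEN v BETWEEN lo AND hi THEN v END):
    # values outside the range become NULL, which ARRAY_AGG skips.
    return [v for v in values if lo <= v <= hi]

result = {
    id_col: (bucket(vals, 10_000, 99_999), bucket(vals, 100_000, 999_999))
    for id_col, vals in rows.items()
}
# result[1] == ([], [666666]); result[2] == ([55555], []); result[3] == ([], [])
```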
And Python UDF:
create or replace function filter_arr(arr variant, num_digits INT)
returns variant
language python
runtime_version = 3.8
handler = 'main'
as $$
def main(arr, num_digits):
    return [x for x in arr if len(str(x)) == num_digits]
$$;
SELECT ID_COL,
ARRAY_COL_VALUE,
filter_arr(ARRAY_COL_VALUE, 5),
filter_arr(ARRAY_COL_VALUE, 6)
FROM t;
Output:
If you're dealing strictly with numbers here's another way
with cte (id, array_col) as
(select 1,[22,333,666666,666666] union all
select 2,[1,22222,55555,999999999] union all
select 3,[22,444])
select *,
concat(',',array_to_string(array_col,',,'),',') as str_col,
regexp_substr_all(str_col,',([^,]{5}),',1,1,'e') as len_5,
regexp_substr_all(str_col,',([^,]{6}),',1,1,'e') as len_6
from cte;
The basic idea is to turn the array into a string while keeping every value surrounded by , on both sides, so that we can parse out the pattern using REGEXP_SUBSTR_ALL.
If you're dealing with strings, you can modify it to use a delimiter that won't show up in your data.
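The same comma-wrapping trick can be emulated in Python with the standard `re` module, which makes it easy to see why the ",," join works (every value gets its own delimiter on each side, so adjacent matches don't steal each other's commas):

```python
import re

def split_by_len(values, n):
    # Join with ",," and wrap in commas so each value is delimited on
    # both sides, mirroring concat(',', array_to_string(col, ',,'), ',').
    s = "," + ",,".join(str(v) for v in values) + ","
    # Mirrors regexp_substr_all(s, ',([^,]{n}),', 1, 1, 'e').
    return re.findall(r",([^,]{%d})," % n, s)
```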
I have a table with columns id (int) and trans_id (string).
trans_id contains values such as 20345 and 19345; the first two characters represent the year. I want a query for transactions that happened in 2020 and 2019.
You should store dates in a date or datetime column, not as a string or integer. And you certainly shouldn't store multiple values in one column.
Assuming trans_id is an int, you can do:
SELECT *
FROM YourTable t
WHERE trans_id >= 19000 AND trans_id < 21000;
If trans_id is a varchar string, you can do
SELECT *
FROM YourTable t
WHERE trans_id LIKE '20%' OR trans_id LIKE '19%';
If you've gone for an even worse version and stored multiple values, you need to split them first:
SELECT *
FROM YourTable t
WHERE EXISTS (SELECT 1
FROM STRING_SPLIT(trans_id, ',') s
WHERE s.value LIKE '20%' OR s.value LIKE '19%'
);
You can also use LEFT to get the first two characters of the string.
Then use IN for the list of years you need.
SELECT *
FROM YourTable t
WHERE LEFT(trans_id, 2) IN ('19', '20')
But don't use BETWEEN on the two leading characters without casting them to an INT first.
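A quick Python check (with made-up sample ids) confirms that the numeric range test and the two-character prefix test select the same rows for 5-digit ids:

```python
# Sample 5-digit transaction ids; 18345 and 21001 fall outside 2019-2020.
trans_ids = [20345, 19345, 18345, 21001]

# Numeric approach: trans_id >= 19000 AND trans_id < 21000
by_range = [t for t in trans_ids if 19000 <= t < 21000]

# String approach: LEFT(trans_id, 2) IN ('19', '20')
by_prefix = [t for t in trans_ids if str(t)[:2] in ("19", "20")]
```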
I'm trying to migrate an Oracle query to Postgres:
SELECT /*+ materialize */ distinct r.empid, r.mgr_id, CONNECT_BY_ISLEAF leafs
FROM (Select * from empid_reports_to_mgrid WHERE sysdate BETWEEN eff_date AND eff_date_end) r
CONNECT BY PRIOR r.mgr_id = r.empid
START WITH r.empid IN (SELECT distinct empid
FROM employee
WHERE event_oid ='345345' AND F_HISTORICAL=0 and F_ELIGIBLE=1);
I arrived at this solution:
( with recursive cte (empid, mgr_id, level, visited, root_id) AS
(
select empid::varchar ,
mgr_id::varchar,
1 as level,
array[empid]::varchar[] as visited,
empid::varchar as root_id
from (Select * from empid_reports_to_mgrid
      WHERE now() BETWEEN eff_date AND eff_date_end
        AND empid IN (SELECT distinct empid
                      FROM employee
                      WHERE event_oid ='345345' AND F_HISTORICAL=0 and F_ELIGIBLE=1)) e
union all
select c.empid::varchar,
c.mgr_id::varchar,
p.level + 1,
(p.visited::varchar[] ||c.empid::varchar[]),
p.root_id::varchar
from (Select * from empid_reports_to_mgrid WHERE now() BETWEEN eff_date AND eff_date_end) c
join cte p on p.mgr_id= c.empid
where c.empid <> all(p.visited)
)
SELECT e.*,
not exists (select * from cte p where p.mgr_id = e.empid) as leafs
FROM cte e);
The columns empid and mgr_id are of data type varchar(32).
When I run this query, I'm getting the below error:
SQL Error [42804]: ERROR: recursive query "cte" column 4 has type character varying(32)[] in non-recursive term but type character varying[] overall
Hint: Cast the output of the non-recursive term to the correct type.
The type casts that are present were added after looking at the post below, which suggests casting the recursive columns to get rid of the error, but it didn't work:
Postgres CTE : type character varying(255)[] in non-recursive term but type character varying[] overall
How do we migrate CONNECT_BY_ISLEAF to Postgres? Please help!
Also, what are the recursive columns in this case?
If I typecast to text and text[] instead of varchar and varchar[], I get the error below:
malformed array literal: "21466694N" Detail: Array value must start with "{" or dimension information.
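A hedged sketch (not from the thread, assuming empid and mgr_id are varchar(32)): the "malformed array literal" error usually comes from casting a scalar to an array type, as in c.empid::varchar[]. Seeding the path as text[] once and appending elements with array concatenation avoids both errors:

```sql
-- Sketch only: never cast a scalar to an array type.
select empid::text, mgr_id::text, 1 as level,
       array[empid::text] as visited,      -- text[] in the non-recursive term
       empid::text as root_id
...
union all
select c.empid::text, c.mgr_id::text, p.level + 1,
       p.visited || c.empid::text,         -- array || element, not c.empid::varchar[]
       p.root_id
...
```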
I have a couple of string columns and an array column. My requirement is to convert the array to a string and concatenate it with the other string columns, so that I can run the MD5 function over the concatenated string column.
But casting an array to a string is not possible, and I have tried the explode and inline functions to extract the array contents, with no luck so far.
Any idea how to achieve this?
Explode the array and get the struct elements, build the string you need from the struct elements and collect an array of strings, use concat_ws to convert that array to a string, and then concatenate it with the other columns. Like this:
with mydata as (
select ID, my_array
from
( --some array<struct> example
select 1 ID, array(named_struct("city","Hudson","state","NY"),named_struct("city","San Jose","state","CA"),named_struct("city","Albany","state","NY")) as my_array
union all
select 2 ID, array(named_struct("city","San Jose","state","CA"),named_struct("city","San Diego","state","CA")) as my_array
)s
)
select ID, concat(ID,'-', --'-' is a delimiter
concat_ws(',',collect_list(element)) --collect array of strings and concatenate it using ',' delimiter
) as my_string --concatenate with ID column also
from
(
select s.ID, concat_ws(':',a.mystruct.city, mystruct.state) as element --concatenate struct using : as a delimiter Or concatenate in some other way
from mydata s
lateral view explode(s.my_array) a as mystruct
)s
group by ID
;
Returns:
OK
1 1-Hudson:NY,San Jose:CA,Albany:NY
2 2-San Jose:CA,San Diego:CA
Time taken: 63.368 seconds, Fetched: 2 row(s)
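The shape of that pipeline is easier to see in a small Python emulation (same made-up city/state data as the Hive example): format each struct as "city:state", join the per-ID list with ",", then prefix the ID and delimiter.

```python
# (ID, array<struct<city, state>>) sample rows from the answer above.
my_array = [
    (1, [("Hudson", "NY"), ("San Jose", "CA"), ("Albany", "NY")]),
    (2, [("San Jose", "CA"), ("San Diego", "CA")]),
]

result = {
    # concat(ID, '-', concat_ws(',', collect_list("city:state")))
    id_: f"{id_}-" + ",".join(f"{city}:{state}" for city, state in structs)
    for id_, structs in my_array
}
# result[1] == "1-Hudson:NY,San Jose:CA,Albany:NY"
```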
Using INLINE you can get struct elements exploded
with mydata as (
select ID, my_array
from
( --some array<struct> example
select 1 ID, array(named_struct("city","Hudson","state","NY"),named_struct("city","San Jose","state","CA"),named_struct("city","Albany","state","NY")) as my_array
union all
select 2 ID, array(named_struct("city","San Jose","state","CA"),named_struct("city","San Diego","state","CA")) as my_array
)s
)
select s.ID, a.city, a.state
from mydata s
lateral view inline(s.my_array) a as city, state
;
Then concatenate them into a string however you want, collect the array, apply concat_ws, and so on.
My table has two columns, id and a. Column id contains a number, column a contains an array of strings. I want to count the number of unique id for a given array, equality between arrays being defined as "same size, same string for each index".
When using GROUP BY a, I get Grouping by expressions of type ARRAY is not allowed. I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of my array (so if I want to use it later in another query, I have to split the string).
The values in this field array come from the user, so I can't assume that some character is simply never going to be there (and use it as a separator).
Instead of GROUP BY ARRAY_TO_STRING(a, ",") use GROUP BY TO_JSON_STRING(a)
so your query will look like the one below:
#standardsql
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
You can test it with dummy data like below
#standardsql
WITH `project.dataset.table` AS (
SELECT 1 id, ["a,b", "c"] a UNION ALL
SELECT 1, ["a","b,c"]
)
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
with result as
Row  arr          cnt
1    ["a,b","c"]  1
2    ["a","b,c"]  1
Update based on #Ted's comment
#standardsql
SELECT
ANY_VALUE(a) a,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)
Alternatively, you can use a separator other than a comma:
ARRAY_TO_STRING(a,"|")
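Why TO_JSON_STRING is the safer grouping key can be checked in Python with the standard `json` module: a plain comma join collides on the two arrays from the question, while the JSON encoding keeps them distinct.

```python
import json

a1, a2 = ["a,b", "c"], ["a", "b,c"]

# ARRAY_TO_STRING(a, ',') analogue: both arrays flatten to "a,b,c".
joined_equal = ",".join(a1) == ",".join(a2)

# TO_JSON_STRING analogue: quoting and escaping keep the arrays distinct.
json_equal = json.dumps(a1) == json.dumps(a2)
```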