How to perform SQL aggregation on a Snowflake array and output multiple arrays? - snowflake-cloud-data-platform

I have a Snowflake array column as input (rows below). I want to check each value in the array and split the values into multiple output arrays based on their length: values with 5 digits in one column and values with 6 digits in another.
Input table:
ID_COL,ARRAY_COL_VALUE
1,[22,333,666666]
2,[1,55555,999999999]
3,[22,444]
Output table:
ID_COL,FIVE_DIGIT_COL,SIX_DIGIT_COL
1,[],[666666]
2,[55555],[]
3,[],[]
Is there a way to iterate through each array value, check its length, and aggregate the matches into separate output columns? A pure SQL solution would be great, but JavaScript or Python UDFs would also be fine if that's an option.

Using SQL and FLATTEN:
CREATE OR REPLACE TABLE t(ID_COL INT,ARRAY_COL_VALUE VARIANT)
AS
SELECT 1,[22,333,666666] UNION ALL
SELECT 2,[1,55555,999999999] UNION ALL
SELECT 3,[22,444];
Query:
SELECT ID_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 10000 AND 99999 THEN s.value END) AS FIVE_DIGIT_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 100000 AND 999999 THEN s.value END) AS SIX_DIGIT_COL
FROM t, TABLE(FLATTEN(ARRAY_COL_VALUE)) AS s
GROUP BY ID_COL;
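To sanity-check the expected result, the FLATTEN + ARRAY_AGG logic can be mirrored in plain Python. This is only a sketch, assuming every array element is a non-negative integer; bucket_by_digits is a hypothetical helper name. As in the SQL, a value that matches neither range is dropped (ARRAY_AGG omits the NULLs produced by the CASE), so a row without matches ends up with empty lists.

```python
# Sketch: mirror the FLATTEN + ARRAY_AGG(CASE ...) query in plain Python.
# Assumes every array element is a non-negative integer.

def bucket_by_digits(rows):
    """rows: dict of ID_COL -> list of ints. Returns ID_COL -> (five, six)."""
    result = {}
    for id_col, values in rows.items():
        # Same range checks as BETWEEN 10000 AND 99999 / BETWEEN 100000 AND 999999
        five = [v for v in values if 10000 <= v <= 99999]
        six = [v for v in values if 100000 <= v <= 999999]
        result[id_col] = (five, six)
    return result


data = {1: [22, 333, 666666], 2: [1, 55555, 999999999], 3: [22, 444]}
for id_col, (five, six) in bucket_by_digits(data).items():
    print(id_col, five, six)
```

Row 2 gets 55555 in its five-digit list, since that is the only five-digit value in its input.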
And with a Python UDF:
create or replace function filter_arr(arr variant, num_digits INT)
returns variant
language python
runtime_version = 3.8
handler = 'main'
as $$
def main(arr, num_digits):
    return [x for x in arr if len(str(x)) == num_digits]
$$;
SELECT ID_COL,
ARRAY_COL_VALUE,
filter_arr(ARRAY_COL_VALUE, 5) AS FIVE_DIGIT_COL,
filter_arr(ARRAY_COL_VALUE, 6) AS SIX_DIGIT_COL
FROM t;
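Because the UDF body is ordinary Python, the handler can be exercised locally before it is created in Snowflake. A minimal sketch, assuming the VARIANT array reaches the handler as a Python list of ints; main is copied from the UDF above. It also surfaces one pitfall: len(str(x)) counts the minus sign, so a negative five-digit number is treated as a six-digit one.

```python
# Local copy of the UDF handler, for testing outside Snowflake.
def main(arr, num_digits):
    # Keep elements whose decimal string has exactly num_digits characters.
    return [x for x in arr if len(str(x)) == num_digits]


print(main([22, 333, 666666], 6))        # [666666]
print(main([1, 55555, 999999999], 5))    # [55555]
# Pitfall: the '-' sign is counted, so -12345 lands in the 6-digit bucket.
print(main([-12345], 6))                 # [-12345]
```

If negative values are possible, counting len(str(abs(x))) instead would look at the digits only.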

If you're dealing strictly with numbers, here's another way:
with cte (id, array_col) as
(select 1,[22,333,666666,666666] union all
select 2,[1,22222,55555,999999999] union all
select 3,[22,444])
select *,
concat(',',array_to_string(array_col,',,'),',') as str_col,
regexp_substr_all(str_col,',([^,]{5}),',1,1,'e') as len_5,
regexp_substr_all(str_col,',([^,]{6}),',1,1,'e') as len_6
from cte;
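The basic idea is to turn the array into a string in which every value is surrounded by its own pair of commas (hence the doubled ,, delimiter between values), so that we can extract the values of a given length with regexp_substr_all. Note that the extracted values are strings.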
If you're dealing with strings, you can modify it to use a delimiter that won't show up in your data.
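The delimiter trick can be reproduced with Python's re module to show why the values are joined with ,, rather than a single comma: each match consumes the comma on both sides of a value, so with single commas two adjacent matching values would share one comma and the second would be missed. This is an illustrative sketch; extract_by_length is a hypothetical helper, and re.findall stands in for REGEXP_SUBSTR_ALL with the 'e' (extract group) parameter.

```python
import re


def extract_by_length(values, length, sep=",,"):
    # Build ",v1,,v2,,v3," so every value is wrapped in its own pair of commas.
    s = "," + sep.join(str(v) for v in values) + ","
    # Return capture group 1 only, like REGEXP_SUBSTR_ALL(..., 1, 1, 'e').
    return re.findall(r",([^,]{%d})," % length, s)


print(extract_by_length([22, 333, 666666, 666666], 6))              # ['666666', '666666']
print(extract_by_length([1, 22222, 55555, 999999999], 5))           # ['22222', '55555']
# With a single-comma separator the second adjacent match is lost.
print(extract_by_length([1, 22222, 55555, 999999999], 5, sep=","))  # ['22222']
```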

Related

Is there a way you can produce an output like this in T-SQL?

I have a column whose values I translate into numbers using a CASE expression, which gives me numbers like the ones below. I need to produce a result like this for multiple columns; this is just one of them.
How do you produce the output as a whole, like the one below?
12 is the total count of numbers, counting from top to bottom.
49 is the sum.
4.08 is the average, 49/12.
1 is how many 1s there are in the list above; as you can see, there is only one.
8.33% is that count as a percentage: 1/12 * 100.
And so on. Is there a way to produce the output below?
drop table test111
create table test111
(
Q1 nvarchar(max)
);
INSERT INTO TEST111(Q1)
VALUES('Strongly Agree')
,('Agree')
,('Disagree')
,('Strongly Disagree')
,('Strongly Agree')
,('Agree')
,('Disagree')
,('Neutral');
SELECT
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS 'Test Q1'
FROM test111
I have to make a few assumptions here, but it looks like you want to treat an output column like a column in a spreadsheet. You have 12 numbers. You then have a blank "separator" row. Then a row with the number 12 (which is the count of how many numbers you have). Then a row with the number 49, which is the sum of those 12 numbers. Then the 4.08 row, which is the average rounded to two decimals, and so on.
Some of these outputs can be provided by CUBE or ROLLUP, but neither is a complete solution.
If you wanted to produce this output directly from TSQL, you would need to have multiple select statements and combine the results of all of those statements using union all. First you would have a select just to get the numbers. Then you would have a second select which outputs a "blank". Then another select which is providing a count. Then another select which is providing a sum. And so on.
You would also no longer be able to output actual numbers, since a "blank" is not a number. Visually it's best represented as an empty string. But now your output column has to be of datatype char or varchar.
You also have to make sure rows come out in the correct order for presentation. So you need a column to order by. You would have to add some kind of ordering column "manually" to each of the select statements, so when you union them all together you can tell SQL in what order the output should be provided.
So the answer to "can it be done?" is technically "yes". But if you think that seems like a whole lot of laborious and inefficient T-SQL work, you'd be right.
The real solution here is to change your approach. SQL should not be concerned with "output formatting". What you should do is just return the actual data (your 12 numbers) from SQL, and then do all of the additional presentation (like adding a blank row, adding a count row, etc), in the code of the program that is calling SQL to get that data.
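As a sketch of doing that formatting in the calling program instead, here is a small Python function that takes the numbers returned by the query and appends the blank separator and the summary rows described in the question (count, sum, average, count of 1s, percentage of 1s). The name summary_rows and the two-decimal formatting are assumptions; the sample input is the eight scores the CASE expression yields for the rows inserted into test111.

```python
def summary_rows(values):
    """Return the column as display strings: values, blank, then summary rows.

    Assumes values is a non-empty list of numbers.
    """
    count = len(values)
    total = sum(values)
    ones = values.count(1)
    rows = [str(v) for v in values]
    rows.append("")                              # blank separator row
    rows.append(str(count))                      # e.g. 12 in the question
    rows.append(str(total))                      # e.g. 49
    rows.append(f"{total / count:.2f}")          # average, e.g. 49/12 -> 4.08
    rows.append(str(ones))                       # how many 1s
    rows.append(f"{ones / count * 100:.2f}%")    # e.g. 1/12 * 100 -> 8.33%
    return rows


# Scores produced by the CASE expression for the 8 rows in test111
scores = [5, 4, 2, 1, 5, 4, 2, 3]
print("\n".join(summary_rows(scores)))
```

The database only returns plain numbers; the blank row and the text formatting of the percentage live entirely in application code.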
I must say, this is one of the strangest T-SQL requirements I've seen, and is really best left to the presentation layer.
It is possible using GROUPING SETS though. We can use it to get an extra rollup row that aggregates the whole table.
Once you have the rollup, you need to unpivot the totalled row (identified by GROUPING() = 1) to get your final result. We can do this using CROSS APPLY.
This is impossible without a row-identifier. I have added ROW_NUMBER, but any primary or unique key will do.
WITH YourTable AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn,
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS TestQ1
FROM test111
),
RolledUp AS (
SELECT
rn,
TestQ1,
grouping = GROUPING(TestQ1),
count = COUNT(*),
sum = SUM(TestQ1),
avg = AVG(TestQ1 * 1.0),
one = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END),
onePct = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END) * 100.0 / COUNT(*) -- as a percentage, e.g. 1 of 12 -> 8.33
FROM YourTable
GROUP BY GROUPING SETS(
(rn, TestQ1),
()
)
)
SELECT v.TestQ1
FROM RolledUp r
CROSS APPLY (
SELECT r.TestQ1, 0 AS ordering
WHERE r.grouping = 0
UNION ALL
SELECT v.value, v.ordering
FROM (VALUES
(NULL , 1),
(r.count , 2),
(r.sum , 3),
(r.avg , 4),
(r.one , 5),
(r.onePct, 6)
) v(value, ordering)
WHERE r.grouping = 1
) v
ORDER BY
v.ordering,
r.rn;

How do I get the MAX of two values in SQL Server?

I am trying to get the larger of two numbers, and I found that I cannot do it like this: SELECT MAX(2, 4).
I also tried the following, but got the error "Cannot perform an aggregate function on an expression containing an aggregate or a subquery."
SELECT MAX( (SELECT LEN('tests') as value
UNION ALL
SELECT LEN('test') as value) );
How can I overcome this or achieve what I want?
No, you can't do MAX(2,4); MAX only expects one parameter.
For something simple like this, you can use a CASE expression. For example:
SELECT CASE WHEN A > B THEN A ELSE B END
Note this assumes neither value can be NULL. If they can be, you would do something like this:
SELECT CASE WHEN B IS NULL OR A > B THEN A
ELSE B
END
For more complex scenarios, you can use a subquery to unpivot the data:
SELECT (SELECT MAX(V.V)
FROM(VALUES(A),(B),(C),(D),(E),(F),(G))V(V))
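To make the NULL handling concrete, here is a rough Python model of the two expressions above, with None standing in for NULL. In SQL, A > B is unknown (not true) when either side is NULL, so the CASE falls through to its ELSE branch, while MAX over the VALUES list simply ignores NULLs. The function names are hypothetical.

```python
def sql_gt(a, b):
    # A comparison involving NULL is never true, so CASE treats it as false.
    return a is not None and b is not None and a > b


def max_of_two(a, b):
    # CASE WHEN B IS NULL OR A > B THEN A ELSE B END
    return a if (b is None or sql_gt(a, b)) else b


def max_of_many(*values):
    # SELECT MAX(V.V) FROM (VALUES (A),(B),...) V(V): NULLs ignored, NULL if all are NULL
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


print(max_of_two(2, 4))             # 4
print(max_of_two(None, 4))          # 4
print(max_of_two(2, None))          # 2
print(max_of_many(3, None, 7, 1))   # 7
```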
A small change will do: give the UNION ALL result a name, and then query from it. I used the name union_result; you can of course pick just about any name you like.
SELECT MAX(union_result.value)
FROM (SELECT LEN('tests') as value
UNION ALL
SELECT LEN('test') as value
) AS union_result

BigQuery standard SQL: how to group by an ARRAY field

My table has two columns, id and a. Column id contains a number, column a contains an array of strings. I want to count the number of unique ids for a given array, equality between arrays being defined as "same size, same string for each index".
When using GROUP BY a, I get Grouping by expressions of type ARRAY is not allowed. I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of my array (so if I want to use it later in another query, I have to split the string).
The values in this array field come from users, so I can't assume that some character will never appear in them (and use it as a separator).
Instead of GROUP BY ARRAY_TO_STRING(a, ","), use GROUP BY TO_JSON_STRING(a),
so your query will look like this:
#standardsql
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
You can test it with dummy data like this:
#standardsql
WITH `project.dataset.table` AS (
SELECT 1 id, ["a,b", "c"] a UNION ALL
SELECT 1, ["a","b,c"]
)
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
with the result:
Row arr cnt
1 ["a,b","c"] 1
2 ["a","b,c"] 1
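TO_JSON_STRING works as a grouping key because JSON quotes each element, so the element boundaries survive. A quick Python illustration, using json.dumps with compact separators as a stand-in for BigQuery's TO_JSON_STRING (the exact spacing is not the point, only that every string element is quoted and escaped):

```python
import json

a1 = ["a,b", "c"]
a2 = ["a", "b,c"]

# ARRAY_TO_STRING(a, ",") style key: both arrays collapse to "a,b,c"
print(",".join(a1) == ",".join(a2))            # True

# TO_JSON_STRING(a) style key: quoting keeps the arrays distinct
print(json.dumps(a1, separators=(",", ":")))   # ["a,b","c"]
print(json.dumps(a2, separators=(",", ":")))   # ["a","b,c"]
```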
Update based on @Ted's comment (this version returns the actual array instead of its JSON string):
#standardsql
SELECT
ANY_VALUE(a) a,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)
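A plain-Python sketch of the updated query, assuming rows arrive as (id, array) pairs: group on the JSON string, keep one representative array per group (the role ANY_VALUE plays), and count distinct ids. The helper count_ids_per_array is hypothetical, and the third sample row is made up to show a count above 1.

```python
import json
from collections import defaultdict


def count_ids_per_array(rows):
    """rows: iterable of (id, list_of_strings). Returns list of (array, distinct id count)."""
    groups = defaultdict(lambda: {"array": None, "ids": set()})
    for id_, arr in rows:
        key = json.dumps(arr)            # stand-in for TO_JSON_STRING(a)
        group = groups[key]
        if group["array"] is None:
            group["array"] = arr         # like ANY_VALUE(a): any member of the group will do
        group["ids"].add(id_)            # COUNT(DISTINCT id)
    return [(g["array"], len(g["ids"])) for g in groups.values()]


rows = [(1, ["a,b", "c"]), (1, ["a", "b,c"]), (2, ["a,b", "c"])]
for arr, cnt in count_ids_per_array(rows):
    print(arr, cnt)
```

Since every array in a group has the same JSON string, it does not matter which one ANY_VALUE returns.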
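Alternatively, you can use a separator other than a comma, e.g.
ARRAY_TO_STRING(a,"|")
but this only works if that separator can never appear in the data, which the question says cannot be guaranteed.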

SQL Server 2014: How to convert a VARCHAR column mixed with characters and numbers to corresponding numbers

I have a column called result in SQL Server 2014 which holds various kinds of lab test results. The values can be text or numbers (integers, decimals, or scientific notation), like this:
positive
negative
not detect
n/a
101
15.3
78.002
-12.1
3.49952E-10
7.3E9
I want to only select those representing numbers, which are...
101
15.3
78.002
-12.1
3.49952E-10
7.3E9
I also want to convert them into a numeric column with the corresponding values, and compute the AVG, STDEV, MIN, and MAX of them.
Can someone help me please?
Thanks a lot!
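You could use the ISNUMERIC function and CAST the matching values to a number. Be aware that ISNUMERIC also returns 1 for some strings that are not really numbers, such as '+', '-', or currency symbols like '$', and such values may then fail the CAST to float.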
DECLARE @SampleData AS TABLE (Value varchar(30))
INSERT INTO @SampleData
VALUES ('positive'),('negative'),('101'),('15.3'),
('78.002'),('-12.1'),('3.49952E-10'),('7.3E9')
SELECT CAST(sd.[Value] AS float) AS Value
FROM @SampleData sd
WHERE isnumeric(sd.[Value]) = 1
In SQL Server 2012 and newer, you can also use the TRY_CAST function to try to convert a string to a numeric value - if it fails, it will not crash and burn, but instead just simply return NULL.
Based on that, you could use something like this:
-- define a CTE - an "inline" view which handles the conversion
;WITH CTE AS
(
SELECT NumValue = TRY_CAST(YourColumnName AS FLOAT)
FROM dbo.YourTable
)
-- select only those rows from the CTE that have a non-NULL "NumValue"
SELECT *
FROM CTE
WHERE NumValue IS NOT NULL
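The TRY_CAST idea (convert if possible, otherwise NULL) looks like this as a Python sketch, followed by the AVG/STDEV/MIN/MAX the question asks for. try_float is a hypothetical helper, and Python's float() only approximates TRY_CAST(... AS FLOAT): it also accepts strings such as 'nan' and 'inf', which the sketch rejects explicitly. statistics.stdev is the sample standard deviation, which is also what SQL Server's STDEV computes.

```python
import math
import statistics


def try_float(text):
    """Return text as a float, or None if it is not a finite number."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


results = ["positive", "negative", "not detect", "n/a", "101", "15.3",
           "78.002", "-12.1", "3.49952E-10", "7.3E9"]

# Equivalent of WHERE NumValue IS NOT NULL
numbers = [v for v in (try_float(r) for r in results) if v is not None]

print(numbers)
print("avg  ", statistics.mean(numbers))
print("stdev", statistics.stdev(numbers))
print("min  ", min(numbers))
print("max  ", max(numbers))
```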
You could also use pattern matching with the LIKE operator:
SELECT AVG(NumValue) AS Average
,STDEV(NumValue) AS StDev
,MIN(NumValue) AS Min
,MAX(NumValue) AS Max
FROM
(SELECT CONVERT(FLOAT,YourColumn) AS NumValue
FROM YourTable
WHERE YourColumn LIKE '%[0-9]%') x
This subquery keeps any value that contains a digit, so it will raise a conversion error if some value contains digits but is not a valid number (exponential notation such as 3.49952E-10 converts fine). In that case, you could specify a stricter pattern after the LIKE operator.
By using the LIKE operator, we can filter out the text-only values:
;WITH Cte (TextData)
AS
(
SELECT 'positive' UNION ALL
SELECT 'negative' UNION ALL
SELECT 'not detect' UNION ALL
SELECT 'n/a' UNION ALL
SELECT '101' UNION ALL
SELECT '15.3' UNION ALL
SELECT '78.002' UNION ALL
SELECT '-12.1' UNION ALL
SELECT '3.49952E-10' UNION ALL
SELECT '7.3E9'
)
SELECT *
FROM Cte
WHERE TextData LIKE '%[0-9]%'
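The LIKE '%[0-9]%' filter keeps any value that contains at least one digit. A small Python sketch with re.search shows the caveat mentioned for the previous answer: the purely textual results are removed, but a value that merely contains a digit still passes the filter and would then make the conversion fail. The extra value '<5' is made up for illustration and is not in the question's data.

```python
import re

values = ["positive", "negative", "not detect", "n/a", "101", "15.3",
          "78.002", "-12.1", "3.49952E-10", "7.3E9",
          "<5"]  # hypothetical extra value, not in the question's data


def like_has_digit(text):
    # Same test as LIKE '%[0-9]%': at least one digit anywhere in the string
    return re.search(r"[0-9]", text) is not None


kept = [v for v in values if like_has_digit(v)]
print(kept)

for v in kept:
    try:
        float(v)
    except ValueError:
        print("would fail to convert:", v)
```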

How can I select the top numbers from an array column in PostgreSQL 9.4?

create table foo_table(foo_id int, foo_array int[], some_other_column text);
insert into foo_table(foo_array) values (ARRAY[1,3,8,32,55]);
insert into foo_table(foo_array) values (ARRAY[2,4,9,31,38,92,99]);
insert into foo_table(foo_array) values (ARRAY[5,12,15,35,47]);
insert into foo_table(foo_array) values (ARRAY[6,7,13]);
The foo_array arrays will have variable number of elements.
All the array elements will be unique and all the numbers in all arrays will also be unique.
How can I select the 5 biggest numbers from the foo_array column? In this case they would be 99, 92, 55, 47, 38.
select t.nr
from foo_table
cross join lateral unnest(foo_array) as t(nr)
order by nr desc
limit 5
Alternatively somewhat shorter:
select unnest(foo_array) nr
from foo_table
order by nr desc
limit 5
Using a set-returning function in the select list is somewhat deprecated, or at least discouraged.
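For comparison, the same "unnest, sort descending, take five" logic in Python. The data mirrors the four INSERT statements in the question; heapq.nlargest avoids fully sorting every element, though for arrays this small a plain sort works just as well.

```python
import heapq
from itertools import chain

foo_arrays = [
    [1, 3, 8, 32, 55],
    [2, 4, 9, 31, 38, 92, 99],
    [5, 12, 15, 35, 47],
    [6, 7, 13],
]

# chain.from_iterable plays the role of unnest(foo_array) across all rows
top5 = heapq.nlargest(5, chain.from_iterable(foo_arrays))
print(top5)  # [99, 92, 55, 47, 38]
```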
