Postgres Arrays Rows aggregation - arrays

I have a recursive query in which i m getting rows of arrays as shown below. How could I possible merge all rows into one array in one row and removing duplicates? Ordering is not important.
--my_column--
"{431}"
"{431,33}"
"{431,60}"
"{431,28}"
"{431,1}"
"{431,226}"
"{431,38}"
"{431,226,229}"
"{431,226,227}"
"{431,226,235}"
"{431,226,239}"
"{431,226,241}"
I tried the query below but I am getting one empty integer [] column
select array(select unnest(my_column) from my_table
thanks

Use array_agg() with distinct and (not necessary) order by from unnest():
with my_table(my_column) as (
values
('{431}'::int[]),
('{431,33}'),
('{431,60}'),
('{431,28}'),
('{431,1}'),
('{431,226}'),
('{431,38}'),
('{431,226,229}'),
('{431,226,227}'),
('{431,226,235}'),
('{431,226,239}'),
('{431,226,241}')
)
select array_agg(distinct elem order by elem)
from my_table,
lateral unnest(my_column) elem;
array_agg
---------------------------------------------
{1,28,33,38,60,226,227,229,235,239,241,431}
(1 row)

Another solution without lateral subquery:
select array_agg(distinct val) from
(select unnest(my_column) as val from my_table) x;

Related

SQL - Attain Previous Transaction Informaiton [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagining that the "previous" variable reference the latest selected row. Of course with a select like that I will end up with n-1 rows selected in a table with n rows, that's not a probably, actually is exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
curr.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again you just want that rows culled, so you should an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare #temp (value int, primaryKey int, tempid int identity)
insert value, primarykey from mytable order by primarykey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter
previous
difference
1
0
1
2
1
1
3
2
1
4
3
1
5
4
1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following funtion to get current row value and previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer

Select with except in postgresql

General except query is like this
(SELECT * FROM name_of_table_one
EXCEPT
SELECT * FROM name_of_table_two);
is there a way to write a query where I pass a list of values and perform except or intersect operation with a specific column of a table and select from that the list I had passed to DB.
Use a VALUES list.
select * from (values (1),(2),(100000001)) as f (aid)
except
select aid from pgbench_accounts
You can achieve that with the IN clause
"except":
SELECT * FROM your_table WHERE your_column NOT IN (list of values)
"intersect":
SELECT * FROM your_table WHERE your_column IN (list of values)

Exploding a list in Hive SQL to identify blanks

I have a column called part_nos_list as array<\string> in a hive table. Apparently that column has blank and I want to update that with a '-'. The code sort of does that but 42 rows in a group by has shown having blanks. I tried to check individual records but am not successful. Here is the hive sql. Is there something wrong here in this sql
SELECT order_id, exploded_part_nos
FROM sales.order_detail LATERAL VIEW explode(part_no_list) part_nos AS exploded_part_nos where sale_type in ('POS', 'OTC' , 'CCC') and exploded_part_nos = ''
However this group by sql shows 42 as blanks
select * from (SELECT explo,count(*) as uni_explo_cnt
FROM sales.order_detail
LATERAL VIEW explode(split(concat_ws("##", part_no_list),'##')) yy AS explo where sale_type in ('POS', 'OTC' , 'CCC') group by explo order by explo asc) DD
Here is what hive table looks like
Id Part_no_list
1 ["OTC","POS","CCC"]
2 ["OTC","POS"]
4 NULL
5
6 ["-"]
7 ["OTC","POS","CCC"]
Thanks in advance
To test if exploded array element is empty, use this:
select * from(
select explode( array("OTC","POS","CCC","")) as explo
) s where explo=''
Result is one empty string.
If you want to identify array containing empty element use array_contains:
select * from(
select array("OTC","POS","CCC","") as a
) s where array_contains(a,'')
Result:
["OTC","POS","CCC",""]
If you want to find array containing only one element - empty string use size(array)=1 or array_contains(array,'')
But there is also such thing as an empty array.
It displayed the same as array containing empty element but it is not the same.
And to find empty array, use size()=0
Example:
select * from(
select array() as a
) s where size(a)=0
Returns []
Run all these queries on your data and you will become enlightened. I think it is empty arrays, not empty element in your case
Empty array is not NULL, because it is still an array object of zero size:
select * from(
select array() as a
) s where a is null
Returns no rows
Better query array only without explode and use array_contains and size to find empty arrays and empty elements. Use LATERAL VIEW OUTER to generate rows even when a LATERAL VIEW usually would not generate a row. LATERAL VIEW without OUTER word works as INNER JOIN, see docs about LATERAL VIEW OUTER

BigQuery standard SQL: how to group by an ARRAY field

My table has two columns, id and a. Column id contains a number, column a contains an array of strings. I want to count the number of unique id for a given array, equality between arrays being defined as "same size, same string for each index".
When using GROUP BY a, I get Grouping by expressions of type ARRAY is not allowed. I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of my array (so if I want to use it later in another query, I have to split the string).
The values in this field array come from the user, so I can't assume that some character is simply never going to be there (and use it as a separator).
Instead of GROUP BY ARRAY_TO_STRING(a, ",") use GROUP BY TO_JSON_STRING(a)
so your query will look like below
#standardsql
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
You can test it with dummy data like below
#standardsql
WITH `project.dataset.table` AS (
SELECT 1 id, ["a,b", "c"] a UNION ALL
SELECT 1, ["a","b,c"]
)
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
with result as
Row arr cnt
1 ["a,b","c"] 1
2 ["a","b,c"] 1
Update based on #Ted's comment
#standardsql
SELECT
ANY_VALUE(a) a,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)
Alternatively, you can use another separator than comma
ARRAY_TO_STRING(a,"|")

Check if a group contains all the ids in any of the arrays supplied

I pass a 2d array to a procedure. This array contains multiple arrays of ids. I want to
group a table by group_id
for each group, for each array in the 2d array
IF this group has all the ids within this iteration array, then return it
I read here about issues with 2d arrays:
postgres, contains-operator for multidimensional arrays performs flatten before comparing?
I think I'm nearly there, but I am unsure how to get around the problem. I understand why the following code produces the error "Subquery can only return one column", but I cant work out how to fix it
DEALLOCATE my_proc;
PREPARE my_proc (bigint[][]) AS
WITH cte_arr AS (select $1 AS arr),
cte_s AS (select generate_subscripts(arr,1) AS subscript,
arr from cte_arr),
grouped AS (SELECT ufs.user_id, array_agg(entity_id)
FROM table_A AS ufs
GROUP BY ufs.user_id)
SELECT *
FROM grouped
WHERE (select arr[subscript:subscript] #> array_agg AS sub,
arr[subscript:subscript]
from cte_s);
EXECUTE my_proc(array[array[1, 2], array[1,3]]);
You can create a row for each group and each array in the parameter with a cross join:
PREPARE stmt (bigint[][]) AS
with grouped as
(
select user_id
, array_agg(entity_id) as user_groups
from table_A
group by
user_id
)
select user_id
, user_groups
, $1[subscript:subscript] as matches
from grouped
cross join
generate_subscripts($1, 1) as gen(subscript)
where user_groups #> $1[subscript:subscript]
;
Example at SQL Fiddle

Resources