PostgreSQL: group by array elements in common

I have a table like this:
CREATE TABLE preferences (name varchar, preferences varchar[]);
INSERT INTO preferences (name, preferences)
VALUES
('John','{pizza, spaghetti}'),
('Charlie','{spaghetti, rice}'),
('Lucy','{rice, potatoes}'),
('Beth','{bread, cheese}'),
('Trudy','{rice, milk}');
So from the table
John {pizza, spaghetti}
Charlie {spaghetti, rice}
Lucy {rice, potatoes}
Beth {bread, cheese}
Trudy {rice, milk}
I would like to group all rows that have elements in common (even if the connection is only indirect, through other people).
So in this case I would like to end up with:
{John,Charlie,Lucy,Trudy} {pizza,spaghetti,rice,potatoes,milk}
{Beth} {bread, cheese}
because John's preferences intersect with Charlie's, and Charlie's intersect with both Lucy's and Trudy's.
I already have an array_intersection function like this:
CREATE OR REPLACE FUNCTION array_intersection(anyarray, anyarray)
RETURNS anyarray
language sql
as $FUNCTION$
SELECT ARRAY(
SELECT UNNEST($1)
INTERSECT
SELECT UNNEST($2)
);
$FUNCTION$;
I also know the array_agg function for aggregating arrays, but I am missing the step that turns these into the grouping I want.

This is a typical task for recursion. You need an auxiliary function to merge and sort two arrays:
create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
returns anyarray
language sql immutable
as $function$
select array_agg(distinct elem order by elem)
from (
select unnest(arr1) elem
union
select unnest(arr2)
) s
$function$;
Use the function in the recursive query:
with recursive cte(name, preferences) as (
select *
from preferences
union
select p.name, array_merge(c.preferences, p.preferences)
from cte c
join preferences p
on c.preferences && p.preferences
and c.name <> p.name
)
select array_agg(name) as names, preferences
from (
select distinct on(name) *
from cte
order by name, cardinality(preferences) desc
) s
group by preferences;
names | preferences
---------------------------+--------------------------------------
{Charlie,John,Lucy,Trudy} | {milk,pizza,potatoes,rice,spaghetti}
{Beth} | {bread,cheese}
(2 rows)
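The recursive CTE is effectively computing connected components: two rows are linked when their arrays overlap, and the links are followed transitively. As a sketch of that grouping logic outside the database, here is a union-find (disjoint-set) version in Python. This is a swapped-in technique for illustration, not what the query does internally; it assumes names are unique, and the function name group_by_shared_elements is hypothetical.

```python
from collections import defaultdict


def group_by_shared_elements(rows):
    """Group (name, items) rows whose item sets overlap, directly or transitively.

    Returns a sorted list of (sorted names, sorted items) tuples, one per group.
    Assumes each name appears only once.
    """
    parent = {}

    def find(x):
        # Find the root of x, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link each row node to a node for every item it contains,
    # so rows sharing any item end up in the same component.
    for name, items in rows:
        row_node = ("row", name)
        parent.setdefault(row_node, row_node)
        for item in items:
            item_node = ("item", item)
            parent.setdefault(item_node, item_node)
            union(row_node, item_node)

    names_by_root = defaultdict(set)
    items_by_root = defaultdict(set)
    for name, items in rows:
        root = find(("row", name))
        names_by_root[root].add(name)
        items_by_root[root].update(items)

    return sorted(
        (tuple(sorted(names_by_root[r])), tuple(sorted(items_by_root[r])))
        for r in names_by_root
    )


preferences = [
    ("John", ["pizza", "spaghetti"]),
    ("Charlie", ["spaghetti", "rice"]),
    ("Lucy", ["rice", "potatoes"]),
    ("Beth", ["bread", "cheese"]),
    ("Trudy", ["rice", "milk"]),
]

for group_names, group_items in group_by_shared_elements(preferences):
    print(group_names, group_items)
# ('Beth',) ('bread', 'cheese')
# ('Charlie', 'John', 'Lucy', 'Trudy') ('milk', 'pizza', 'potatoes', 'rice', 'spaghetti')
```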

Related

Select with except in postgresql

A general EXCEPT query looks like this:
(SELECT * FROM name_of_table_one
EXCEPT
SELECT * FROM name_of_table_two);
Is there a way to write a query where I pass a list of values, perform an EXCEPT or INTERSECT against a specific column of a table, and select from the list I passed in?
Use a VALUES list.
select * from (values (1),(2),(100000001)) as f (aid)
except
select aid from pgbench_accounts
You can also do it with the IN clause, although this filters the table's rows by the list rather than returning values from the list:
"except":
SELECT * FROM your_table WHERE your_column NOT IN (list of values)
"intersect":
SELECT * FROM your_table WHERE your_column IN (list of values)
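The two answers address slightly different questions. The VALUES/EXCEPT form returns the passed-in values that are missing from the table, while NOT IN returns the table rows whose value is missing from the list. A minimal Python sketch of the two set operations; the function names are hypothetical and the aid values are made up to stand in for pgbench_accounts:

```python
def values_not_in_table(values, column):
    """Model of VALUES (...) EXCEPT SELECT col: distinct passed-in values absent from the column."""
    return sorted(set(values) - set(column))


def rows_not_in_values(rows, values):
    """Model of WHERE col NOT IN (list): table rows whose value is not in the list."""
    wanted = set(values)
    return [r for r in rows if r not in wanted]


aid_column = [1, 2, 3]       # hypothetical contents of pgbench_accounts.aid
passed = [1, 2, 100000001]   # the list passed to the query

print(values_not_in_table(passed, aid_column))  # [100000001]
print(rows_not_in_values(aid_column, passed))   # [3]
```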

Flatten and aggregate two columns of arrays via distinct in Snowflake

The table structure is:
+------------+---------+
| Animals | Herbs |
+------------+---------+
| [Cat, Dog] | [Basil] |
| [Dog, Lion]| [] |
+------------+---------+
Desired output (don't care about sorting of this list):
+-------------------------+
| unique_things           |
+-------------------------+
| [Cat, Dog, Lion, Basil] |
+-------------------------+
My first attempt was something like:
SELECT ARRAY_CAT(ARRAY_AGG(DISTINCT(animals)), ARRAY_AGG(herbs))
But this produces:
[[Cat, Dog], [Dog, Lion], [Basil], []]
because DISTINCT operates on whole arrays rather than on the individual elements across all arrays.
If I understand your requirements correctly, and assuming source data like this:
insert into tabarray select array_construct('cat', 'dog'), array_construct('basil');
insert into tabarray select array_construct('lion', 'dog'), null;
the query would look like this:
select array_agg(distinct value) from
(
select
value from tabarray
, lateral flatten( input => col1 )
union all
select
value from tabarray
, lateral flatten( input => col2 ))
;
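The FLATTEN query turns every element of both columns into a separate row and then keeps the distinct values. A rough Python model of that logic, assuming (as with FLATTEN's default OUTER => FALSE) that a NULL array contributes no rows; unique_things here is a hypothetical helper, not Snowflake code:

```python
def unique_things(rows):
    """Flatten two array columns per row, skip NULL (None) arrays, keep distinct values."""
    seen = set()
    for animals, herbs in rows:
        for arr in (animals, herbs):
            # Assumption: a NULL array produces no flattened rows, so it contributes nothing.
            if arr is None:
                continue
            seen.update(arr)
    return seen


rows = [(["cat", "dog"], ["basil"]), (["lion", "dog"], None)]
print(sorted(unique_things(rows)))  # ['basil', 'cat', 'dog', 'lion']
```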
UPDATE
It is possible without using FLATTEN, by using ARRAY_UNION_AGG:
Returns an ARRAY that contains the union of the distinct values from the input ARRAYs in a column.
For sample data:
CREATE OR REPLACE TABLE t AS
SELECT ['Cat', 'Dog'] AS Animals, ['Basil'] AS Herbs
UNION SELECT ['Dog', 'Lion'], [];
Query:
SELECT ARRAY_UNION_AGG(ARRAY_CAT(Animals, Herbs)) AS Result
FROM t
or:
SELECT ARRAY_UNION_AGG(Animals) AS Result
FROM (SELECT Animals FROM t
UNION ALL
SELECT Herbs FROM t);
You could flatten the combined array and then aggregate back:
SELECT ARRAY_AGG(DISTINCT F."VALUE") AS unique_things
FROM tab, TABLE(FLATTEN(ARRAY_CAT(tab.Animals, tab.Herbs))) f
Here is another variation that handles NULLs, in case they appear in the data set:
SELECT ARRAY_AGG(DISTINCT a.VALUE) AS unique_things FROM tab, TABLE(FLATTEN(ARRAY_COMPACT(ARRAY_CAT(COALESCE(tab.Animals, ARRAY_CONSTRUCT()), COALESCE(tab.Herbs, ARRAY_CONSTRUCT()))))) a

postgres array_agg ERROR: cannot accumulate arrays of different dimensionality

I have a parcels table in PostgreSQL in which the zoning and zoning_description columns are text arrays built with array_agg. The new.universities table has 9 rows, and I need to return 9 rows in the output.
The purpose of this query is to find all the properties these universities are located on, collapse their zoning types into one column of unique values, and union/dissolve their geometries into multipolygons.
select array_agg(distinct dp.zoning) zoning,array_agg(distinct dp.zoning_description) zoning_description,
uni.school name_,uni.address,'University' type_,1000 buff,st_union(dp.geom)
from new.universities uni join new.detroit_parcels_update dp
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff
I get this error
ERROR: cannot accumulate arrays of different dimensionality
********** Error **********
ERROR: cannot accumulate arrays of different dimensionality
SQL state: 2202E
I can use array_agg(distinct dp.zoning::text) zoning etc., but this returns a messy column of arrays nested inside arrays.
Based on the answer, here is my updated query, which does not work:
select array_agg(distinct zoning_u) zoning,array_agg(distinct zoning_description_u) zoning_description,
uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp,unnest(zoning) zoning_u,
unnest(zoning_description) zoning_description_u
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff order by name_
I get this error:
ERROR: invalid reference to FROM-clause entry for table "dp"
LINE 6: on st_intersects(st_buffer(uni.geom,-10),dp.geom)
^
HINT: There is an entry for table "dp", but it cannot be referenced from this part of the query.
********** Error **********
ERROR: invalid reference to FROM-clause entry for table "dp"
SQL state: 42P01
Hint: There is an entry for table "dp", but it cannot be referenced from this part of the query.
Character: 373
My final query, which worked, was:
with t as(select dp.zoning,dp.zoning_description,uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff,dp.zoning,zoning_description order by name_
)
select name_,address,type_,buff,st_union(geom) geom,array_agg(distinct z) zoning, array_agg(distinct zd) zoning_description
from t,unnest(zoning) z,unnest(zoning_description) zd
group by name_,address,type_,buff
Example data:
create table my_table(name text, numbers text[], letters text[]);
insert into my_table values
('first', '{1, 2}', '{a}' ),
('first', '{2, 3}', '{a, b}'),
('second', '{4}', '{c, d}'),
('second', '{5, 6}', '{c}' );
You should aggregate array elements, not arrays. Use unnest():
select
name,
array_agg(distinct number) as numbers,
array_agg(distinct letter) as letters
from
my_table,
unnest(numbers) as number,
unnest(letters) as letter
group by name;
name | numbers | letters
--------+---------+---------
first | {1,2,3} | {a,b}
second | {4,5,6} | {c,d}
(2 rows)
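Roughly speaking, the original error comes from array_agg over array inputs trying to stack them into a multidimensional array, which must be rectangular, so arrays of different lengths cannot be combined. Unnesting first means only scalar elements are aggregated. A Python sketch of the per-group logic, with a hypothetical helper name; unlike the SQL cross join of two unnest() calls, it does not drop a row whose other array is empty:

```python
from collections import defaultdict


def aggregate_elements(rows):
    """Per name, collect the distinct elements of every numbers/letters array."""
    numbers = defaultdict(set)
    letters = defaultdict(set)
    for name, nums, lets in rows:
        numbers[name].update(nums)  # like unnest(numbers) + array_agg(distinct number)
        letters[name].update(lets)  # like unnest(letters) + array_agg(distinct letter)
    return {name: (sorted(numbers[name]), sorted(letters[name])) for name in numbers}


my_table = [
    ("first", ["1", "2"], ["a"]),
    ("first", ["2", "3"], ["a", "b"]),
    ("second", ["4"], ["c", "d"]),
    ("second", ["5", "6"], ["c"]),
]
print(aggregate_elements(my_table))
# {'first': (['1', '2', '3'], ['a', 'b']), 'second': (['4', '5', '6'], ['c', 'd'])}
```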
Alternatively, you can create a custom aggregate. You need a function to merge arrays (concatenation with duplicate removal):
create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
returns anyarray language sql immutable
as $$
select array_agg(distinct elem order by elem)
from (
select unnest(arr1) elem
union
select unnest(arr2)
) s
$$;
create aggregate array_merge_agg(anyarray) (
sfunc = array_merge,
stype = anyarray
);
select
name,
array_merge_agg(numbers) as numbers,
array_merge_agg(letters) as letters
from my_table
group by name;
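A custom aggregate is a fold: PostgreSQL calls the state function (sfunc) once per input row, carrying the state (stype) from one row to the next. A minimal Python sketch of that fold using functools.reduce, where a None starting state stands in for the aggregate's initial NULL state; the function names simply mirror the SQL ones and are not a PostgreSQL API:

```python
from functools import reduce


def array_merge(arr1, arr2):
    """Model of the SQL array_merge(): union of both arrays, duplicates removed, sorted.

    A None argument is treated as an empty array, like unnest(NULL) yielding no rows.
    """
    return sorted(set(arr1 or []) | set(arr2 or []))


def array_merge_agg(arrays):
    """Model of CREATE AGGREGATE ... (sfunc = array_merge): fold the state function over the inputs."""
    return reduce(array_merge, arrays, None)


print(array_merge_agg([["1", "2"], ["2", "3"]]))  # ['1', '2', '3']
print(array_merge_agg([["a"], ["a", "b"]]))       # ['a', 'b']
```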
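A simpler alternative is to create a custom aggregate function based on array_cat (you only need to do this once). On PostgreSQL 14 and later, where array_cat takes anycompatiblearray arguments, declare the aggregate with anycompatiblearray in place of anyarray: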
CREATE AGGREGATE array_concat_agg(anyarray) (
SFUNC = array_cat,
STYPE = anyarray
);
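Then replace array_agg with array_concat_agg. Note that DISTINCT here removes duplicate arrays, not duplicate elements, so the result can still contain repeated values: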
SELECT
array_concat_agg(DISTINCT dp.zoning) zoning,
array_concat_agg(DISTINCT dp.zoning_description) zoning_description,
uni.school name_,
uni.address,
'University' type_,
1000 buff,
st_union(dp.geom)
FROM
new.universities uni
JOIN new.detroit_parcels_update dp ON st_intersects(st_buffer(uni.geom, - 10), dp.geom)
GROUP BY
name_,
uni.address,
type_,
buff
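Because DISTINCT is applied to whole arrays before array_cat concatenates them, an element shared by two different arrays still appears more than once. A small Python model of that behaviour with hypothetical zoning codes; the sort mirrors the assumption that distinct inputs reach the aggregate in sorted order, and only the contents matter here:

```python
def array_concat_agg_distinct(arrays):
    """Model of array_concat_agg(DISTINCT col) with SFUNC = array_cat.

    DISTINCT drops duplicate *arrays*; the surviving arrays are then concatenated,
    so an element shared by two different arrays is kept twice.
    """
    result = []
    # Deduplicate whole arrays, then concatenate them in sorted order.
    for arr in sorted(set(map(tuple, arrays))):
        result.extend(arr)
    return result


zoning = [["R1", "B2"], ["R1", "B2"], ["R1"]]  # hypothetical zoning arrays
print(array_concat_agg_distinct(zoning))  # ['R1', 'R1', 'B2']
```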

Postgres: aggregating rows of arrays

I have a recursive query that returns rows of arrays, as shown below. How can I merge all rows into a single array in one row, removing duplicates? Ordering is not important.
--my_column--
"{431}"
"{431,33}"
"{431,60}"
"{431,28}"
"{431,1}"
"{431,226}"
"{431,38}"
"{431,226,229}"
"{431,226,227}"
"{431,226,235}"
"{431,226,239}"
"{431,226,241}"
I tried the query below, but I get a single empty integer[] column:
select array(select unnest(my_column) from my_table)
Use array_agg() with distinct (and, optionally, order by) over the elements produced by unnest():
with my_table(my_column) as (
values
('{431}'::int[]),
('{431,33}'),
('{431,60}'),
('{431,28}'),
('{431,1}'),
('{431,226}'),
('{431,38}'),
('{431,226,229}'),
('{431,226,227}'),
('{431,226,235}'),
('{431,226,239}'),
('{431,226,241}')
)
select array_agg(distinct elem order by elem)
from my_table,
lateral unnest(my_column) elem;
array_agg
---------------------------------------------
{1,28,33,38,60,226,227,229,235,239,241,431}
(1 row)
Another solution, without a lateral join:
select array_agg(distinct val) from
(select unnest(my_column) as val from my_table) x;
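Both queries unnest every array into individual elements and keep each distinct element once. The same logic in Python, as a sketch using the sample data from the question (merge_rows is a hypothetical name):

```python
def merge_rows(rows):
    """Union the elements of every array row into one sorted, duplicate-free list."""
    return sorted({elem for row in rows for elem in row})


my_column = [
    [431], [431, 33], [431, 60], [431, 28], [431, 1], [431, 226], [431, 38],
    [431, 226, 229], [431, 226, 227], [431, 226, 235], [431, 226, 239], [431, 226, 241],
]
print(merge_rows(my_column))
# [1, 28, 33, 38, 60, 226, 227, 229, 235, 239, 241, 431]
```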

How do I search for an item in an array in Hive?

Using Hive, I've created a table with the following fields:
ID BIGINT,
MSISDN STRING,
DAY TINYINT,
MONTH TINYINT,
YEAR INT,
GENDER TINYINT,
RELATIONSHIPSTATUS TINYINT,
EDUCATION STRING,
LIKES_AND_PREFERENCES STRING
This was filled with data via the following SQL command:
Insert overwrite table temp_output
Select a.ID, a.MSISDN, a.DAY, a.MONTH, a.YEAR, a.GENDER, a.RELATIONSHIPSTATUS, b.NAME, COLLECT_SET(c.NAME)
FROM temp_basic_info a
JOIN temp_education b ON (a.ID = b.ID)
JOIN likes_and_music c ON (c.ID = b.ID)
GROUP BY a.ID, a.MSISDN, a.DAY, a.MONTH, a.YEAR, a.Gender, a.RELATIONSHIPSTATUS, b.NAME;
LIKES_AND_PREFERENCES holds an array, but I didn't have the foresight to declare it as one (it's a STRING instead). How would I go about selecting records that have a specific item in the array?
Is it as simple as:
select * from table_result where LIKES_AND_PREFERENCES = "item"
Or will that have some unforeseen issues?
I tried the query above, and it does seem to return only the records whose array contains nothing but "item".
Maybe you should try something like this:
select * from (
select col1,col2..coln, new_column from table_name lateral view explode(array_column_name) exploded_table as new_column
) t where t.new_column = '<value of items to be searched>'
Use the array_contains UDF like this:
select *
from mytable
where array_contains(likes_and_preferences,'item') = TRUE
array_contains will return a Boolean that you can predicate on.
You are correct: the query you used returns only records whose array has exactly one element, "item".
You need to use the array_contains function.
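The difference between the two predicates is equality of the whole value versus membership of one element. A rough Python model with hypothetical rows and helper names; it treats the column as a real array, which is what array_contains expects:

```python
def where_equals(rows, item):
    """Rough model of comparing the whole value: only rows whose set is exactly [item] match."""
    return [name for name, likes in rows if likes == [item]]


def where_array_contains(rows, item):
    """Model of array_contains(likes, item): rows whose array includes item anywhere."""
    return [name for name, likes in rows if item in likes]


rows = [("a", ["jazz"]), ("b", ["jazz", "rock"]), ("c", ["rock"])]  # hypothetical data
print(where_equals(rows, "jazz"))          # ['a']
print(where_array_contains(rows, "jazz"))  # ['a', 'b']
```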
