Array difference in PostgreSQL

I have two arrays [1,2,3,4,7,6] and [2,3,7] in PostgreSQL which may have common elements. What I am trying to do is to exclude from the first array all the elements that are present in the second.
So far I have achieved the following:
SELECT array
(SELECT unnest(array[1, 2, 3, 4, 7, 6])
EXCEPT SELECT unnest(array[2, 3, 7]));
However, the ordering is not correct, as the result is {4,6,1} instead of the desired {1,4,6}.
How can I fix this?
I finally created a custom function with the following definition (taken from here) which resolved my issue:
create or replace function array_diff(array1 anyarray, array2 anyarray)
returns anyarray language sql immutable as $$
select coalesce(array_agg(elem), '{}')
from unnest(array1) elem
where elem <> all(array2)
$$;
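For the arrays from the question, a call looks like this (note, as a caveat, that array_agg() has no ORDER BY here, so the input order is preserved only in practice, not by formal guarantee):
SELECT array_diff(ARRAY[1, 2, 3, 4, 7, 6], ARRAY[2, 3, 7]);
-- {1,4,6}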

I would use the ORDINALITY option of UNNEST and put an ORDER BY in the array_agg() call when converting back to an array. NOT EXISTS is preferred over EXCEPT to keep it simpler.
SELECT array_agg(e order by id)
FROM unnest( array[1, 2, 3, 4, 7, 6] ) with ordinality as s1(e,id)
WHERE not exists
(
SELECT 1 FROM unnest(array[2, 3, 7]) as s2(e)
where s2.e = s1.e
)
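For the sample arrays this returns the elements in their original order:
 array_agg
-----------
 {1,4,6}
(1 row)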

Simpler, with NULL support, and probably faster:
select array(
select v
from unnest(array[2,2,null,1,3,3,4,5,null]) with ordinality as t(v, pos)
where array_position(array[3,3,5,5], v) is null
order by pos
);
Result: {2,2,null,1,4,null}
Function array_diff() with tests.

Postgres is unfortunately lacking this functionality. In my case, what I really needed to do was to detect cases where the array difference was not empty. In that specific case you can do it with the @> operator, which means "Does the first array contain the second?"
ARRAY[1,4,3] @> ARRAY[3,1,3] → t
See the documentation.
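As a sketch of that check (the column alias is illustrative): the difference "first minus second" is non-empty exactly when the second array does not contain the first:
SELECT NOT (ARRAY[2, 3, 7] @> ARRAY[1, 2, 3, 4, 7, 6]) AS diff_not_empty;
-- t, because 1, 4 and 6 are not covered by the second array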

Related

What is the equivalent of PostgreSQL unnest() in Snowflake SQL?

How do I modify the following PostgreSQL for Snowflake?
UNNEST(array[
'x' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY'),
'y' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY')
])
You can use "flatten" to break out values from the array, and then "table" to convert the values into a table:
-- Use an array for testing:
select array_construct(1, 2, 3, 4, 5);
-- Flattens into a table with metadata for each row:
select * from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
--Pulls out just the values from the array:
select value::integer from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
The "::integer" part casts the values to the data type you want from the array. It's optional but recommended.
You can approximate the syntax of unnest by creating a user defined table function:
create or replace function UNNEST(V array)
returns table ("VALUE" variant)
language SQL
as
$$
select VALUE from table(flatten(input => V))
$$;
You would call it like this:
select * from table(unnest(array_construct(1, 2, 3, 4, 5)));
This returns a table with a single column named VALUE of type variant. You can make a version that returns strings, integers, etc.
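For instance, an integer-returning variant could look like the sketch below (the name UNNEST_INT is made up for illustration):
create or replace function UNNEST_INT(V array)
returns table ("VALUE" integer)
language SQL
as
$$
select VALUE::integer from table(flatten(input => V))
$$;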

How to check if an array is in a multidimensional array

I want to check, using a filtering query, if an array is equal to an element of another multidimensional array, which can be considered an array of arrays.
For example:
Given the multidimensional array {{1,2}, {3,4}, {5,6}}, I want to check if a one-dimensional array equals one of its elements.
Expected results:
Input: {1,2} or {3,4} -> Output: TRUE
Input: {2,3} or {1,5} -> Output: FALSE
I've already tried <@, but it returns TRUE for all the example cases, and I can't use ANY without slicing the multidimensional array.
Does anyone have a solution without using plpgsql?
This does seem like a difficult problem to solve without any plpgsql. However, it is much simpler if this function is used:
https://wiki.postgresql.org/wiki/Unnest_multidimensional_array
CREATE OR REPLACE FUNCTION public.reduce_dim(anyarray)
RETURNS SETOF anyarray AS
$function$
DECLARE
s $1%TYPE;
BEGIN
FOREACH s SLICE 1 IN ARRAY $1 LOOP
RETURN NEXT s;
END LOOP;
RETURN;
END;
$function$
LANGUAGE plpgsql IMMUTABLE;
To use:
create table array_test (arr integer[][]);
insert into array_test (select '{{1,2}, {3,4}, {5,6}}');
select (case when '{1,2}' in (select reduce_dim(arr) from array_test) then true
else false end);
case
------
t
(1 row)
select (case when '{1,4}' in (select reduce_dim(arr) from array_test) then true
else false end);
case
------
f
(1 row)
Simple way: search in the array's text representation like in a string:
select '{{1, 2}, {3, 4}, {5, 6}}'::int[][]::text like '%{1,2}%';
Complex way: decompose array to slices (without plpgsql):
with t(x) as (values('{{1, 2}, {3, 4}, {5, 6}}'::int[][]))
select *
from t
where (
select bool_or(x[s:s] = '{{1,3}}') from generate_subscripts(x,1) as s);
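Note that the slice compared against here, '{{1,3}}', is not one of the sub-arrays, so the query returns no row; comparing against '{{1,2}}' instead would return the row.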

Efficiently saving summable array values in RDBMs

I have a dataset where we track engagement per-percent (so 8 people are active at 38%, 7 people are active at 39%, etc.). This gives an array with 100 values, filled with integers.
I need to store this in a postgres table. The only/major requirement is that I need to be able to sum the values for each index to form a new array. Example:
Row 1: [5, 3, 5, ... 7]
Row 2: [2, 5, 3, ... 1]
Sum: [7, 8, 8, ... 8]
The naive way to save these would be 100 individual (BIG)INT columns, which would allow you to sum the values per-column over multiple rows. However, this makes the table very wide (and does not seem like the most efficient way to do it). I have looked into (BIG)INT[100] columns, but I cannot seem to find a good, native way to sum the values. Same thing with json(b) columns (with a native JSON array).
Have I overlooked something? Is there a good, efficient way to do this without completely bloating a table?
The solution using unnest() with ordinality:
with the_table(intarr) as (
values
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4])
)
select array_agg(sum order by ordinality)
from (
select ordinality, sum(unnest)
from the_table,
lateral unnest(intarr) with ordinality
group by 1
) s;
array_agg
------------
{3,6,9,12}
(1 row)
Here is one method that seems to work:
select array_agg(sum_aval order by ind)
from (select ind, sum(aval) sum_aval
from (select id, unnest(a) as aval, generate_series(1, 3) as ind
from (values (1, array[1, 2, 3]), (2, array[3, 4, 5])) v(id, a)
) x
group by ind
) x;
That is, unnest the arrays and generate indexes for them using generate_series(). Then you can aggregate at the index level and then re-combine into an array (using two separate aggregations).
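For the two sample rows above this yields:
 array_agg
-----------
 {4,6,8}
(1 row)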

Splice Array in Postgres

Is it possible to (easily) splice arrays in Postgres? For example, I want to replace all values of 4 with the values 8 and 12, so an array of {2, 4, 7} should become {2, 8, 12, 7}. Perhaps I'm going about this the wrong way, but I need to maintain the integer array column type for these columns. Thanks for any guidance you can give me.
Perhaps UNNEST?
WITH rep(ord, what, with_what) AS(
VALUES (1,4,8),
(2,4,12)
)
SELECT array_agg(COALESCE(with_what,elem) ORDER BY no, ord) AS new_array
FROM(
SELECT *
FROM UNNEST('{2, 4, 7}'::INTEGER[]) WITH ORDINALITY AS arr(elem, no)
LEFT JOIN rep ON arr.elem = rep.what
) AS q;
This way you can define a whole set of replaces easily.
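With the sample array and replacement set above, the result is:
 new_array
------------
 {2,8,12,7}
(1 row)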

appending to multi dimensional arrays in postgres

I am trying to create a multi-column aggregate function that needs to accumulate all the column values for later processing.
CREATE OR REPLACE FUNCTION accumulate_scores(prev integer[][],l1 integer,
l2 integer,l3 integer) RETURNS integer[][] AS
$$
BEGIN
prev[1] = array_append(prev[1],l1);
prev[2] = array_append(prev[2],l2);
prev[3] = array_append(prev[3],l3);
return prev;
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE my_aggregate(integer,integer,integer)(
SFUNC = accumulate_scores,
STYPE = integer[][],
INITCOND = '{}'
);
select accumulate_scores(ARRAY[1]|| ARRAY[[1],[1]],2,3,4);
I get this error
ERROR: function array_append(integer, integer) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
How do i accumulate these values into a multi dimensional array?
Edit: I have tried array_cat and got the same error. I thought array_append might be right since prev[1] is a one-dimensional array.
Your function & aggregate would "work" like this, but it would only create a 1-dimensional array:
CREATE OR REPLACE FUNCTION accumulate_scores(prev integer[][]
,l1 integer
,l2 integer
,l3 integer)
RETURNS integer[][] AS
$func$
BEGIN
prev := prev || ARRAY[l1,l2,l3];
RETURN prev;
END
$func$ LANGUAGE plpgsql;
CREATE AGGREGATE my_aggregate(integer,integer,integer)(
SFUNC = accumulate_scores,
STYPE = integer[][],
INITCOND = '{}'
);
The declaration RETURNS integer[][] is equivalent to just RETURNS integer[]. For Postgres, all arrays of the same base element are the same type. Additional brackets are for documentation only and are ignored.
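A quick way to confirm this:
SELECT pg_typeof(ARRAY[[1,2],[3,4]]);
-- integer[], the same type as a one-dimensional integer array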
To actually produce a multi-dimensional array, you don't need a custom function. Just a custom aggregate function:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Then you can:
SELECT array_agg_mult(ARRAY[ARRAY[l1,l2,l3]]) -- note the two array layers!
from tbl;
More details here:
Selecting data into a Postgres array
This is how I understand the question:
select array[
array_agg(l1),
array_agg(l2),
array_agg(l3)
]
from (values
(1, 9, 1, 2, 3), (2, 9, 4, 5, 6)
) s(id, bar, l1, l2, l3)
group by bar;
array
---------------------
{{1,4},{2,5},{3,6}}
In case somebody encounters an issue following Erwin's great answer:
Before PostgreSQL 14, follow Erwin's answer as is.
For PostgreSQL 14 and later, we need a tiny tweak:
create or replace aggregate
array_agg_mult(anycompatiblearray)
(sfunc = array_cat,
stype = anycompatiblearray,
initcond = '{}');
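Usage stays the same as in Erwin's answer (a sketch, assuming a table tbl with integer columns l1, l2, l3):
SELECT array_agg_mult(ARRAY[ARRAY[l1, l2, l3]])
FROM tbl;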
