Efficiently saving summable array values in an RDBMS

I have a dataset where we track engagement per percent (e.g. 8 people are active at 38%, 7 people are active at 39%, and so on). This gives an array of 100 integer values.
I need to store this in a Postgres table. The main requirement is that I need to be able to sum the values at each index across rows to form a new array. Example:
Row 1: [5, 3, 5, ... 7]
Row 2: [2, 5, 3, ... 1]
Sum: [7, 8, 8, ... 8]
The naive way to store these would be 100 individual (BIG)INT columns, which would let you sum the values per column over multiple rows. However, this makes the table very wide (and does not seem like the most efficient approach). I have looked into (BIG)INT[100] columns, but I cannot find a good, native way to sum the values. The same goes for json/jsonb columns holding a native JSON array.
Have I overlooked something? Is there a good, efficient way to do this without completely bloating a table?

The solution using unnest() with ordinality:
with the_table(intarr) as (
values
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4])
)
select array_agg(sum order by ordinality)
from (
select ordinality, sum(unnest)
from the_table,
lateral unnest(intarr) with ordinality
group by 1
) s;
array_agg
------------
{3,6,9,12}
(1 row)

Here is one method that seems to work:
select array_agg(sum_aval order by ind)
from (select ind, sum(aval) sum_aval
from (select id, unnest(a) as aval, generate_series(1, 3) as ind
from (values (1, array[1, 2, 3]), (2, array[3, 4, 5])) v(id, a)
) x
group by ind
) x;
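That is, unnest the arrays and generate an index for each element with generate_series(). Note that the upper bound (3 here) must match the array length, so it would be 100 for the data in the question; generate_subscripts(a, 1) produces the indexes without hard-coding the length. Then aggregate at the index level and re-combine the sums into an array (two separate aggregations).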
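To make the two aggregation steps concrete, here is a rough Python analogue. It is a sketch only: sum_by_index is a hypothetical client-side helper, not part of either answer, and it assumes plain lists of integers. Because the index comes from each element's position (enumerate with start=1, mirroring Postgres's 1-based arrays) rather than from a fixed range, it does not need to know the array length in advance.

```python
from collections import defaultdict


def sum_by_index(rows):
    """Element-wise sum of integer arrays via two aggregation steps."""
    totals = defaultdict(int)
    # Step 1: "unnest" each array into (index, value) pairs and sum per index.
    for row in rows:
        for ind, aval in enumerate(row, start=1):
            totals[ind] += aval
    # Step 2: re-combine the per-index sums into an array ordered by index.
    return [totals[ind] for ind in sorted(totals)]


print(sum_by_index([[1, 2, 3], [3, 4, 5]]))  # [4, 6, 8]
```

Arrays of different lengths are handled the same way the SQL GROUP BY handles them: an index simply receives contributions only from the rows that have it.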

Related

Need a database for key-array storage with array specific operations like "update union" and sub-array selection

I need a database to store key-array pairs like the following:
===== TABLE: shoppingCart =====
user_id - product_ids
1 - [1, 2, 3, 4]
2 - [100, 200, 300, 400]
I also want to be able to update a row by merging a new array into the old one while skipping duplicate values, i.e. I want operations like:
UPDATE shoppingCart SET product_ids = UNION(product_ids, [4, 5, 6]) WHERE user_id = 1
so that the first row's product_ids column becomes:
[1, 2, 3, 4, 5, 6]
I also need operations like selecting a sub-array, e.g.:
SELECT product_ids[0:2] from shoppingCart
which should return:
[1,2]
Any suggestions for the best database for this purpose?
The arrays I need to work with are usually long (about 1,000 to 10,000 long integers, or string representations of long integers).
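For reference, the two operations being asked for can be pinned down with a small in-memory Python sketch. This only illustrates the desired semantics and says nothing about which database to pick; union_append is a hypothetical name, and the sketch assumes the [0:2] notation in the question means a 0-based, end-exclusive slice, as in Python.

```python
def union_append(existing, new):
    """Append values from `new` that are not already present, keeping order.

    Assumes hashable values (ints or strings). A set gives O(1) membership
    checks, which matters for arrays of 1,000-10,000 elements.
    """
    seen = set(existing)
    merged = list(existing)
    for value in new:
        if value not in seen:  # skip duplicates, including repeats within `new`
            seen.add(value)
            merged.append(value)
    return merged


cart = {1: [1, 2, 3, 4], 2: [100, 200, 300, 400]}
cart[1] = union_append(cart[1], [4, 5, 6])
print(cart[1])       # [1, 2, 3, 4, 5, 6]
print(cart[1][0:2])  # [1, 2]
```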

What is the equivalent of PostgreSQL unnest() in Snowflake SQL?

How do I convert this PostgreSQL to Snowflake?
UNNEST(array[
'x' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY'),
'y' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY')
])
You can use "flatten" to break out values from the array, and then "table" to convert the values into a table:
-- Use an array for testing:
select array_construct(1, 2, 3, 4, 5);
-- Flattens into a table with metadata for each row:
select * from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
--Pulls out just the values from the array:
select value::integer from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
The "::integer" part casts the values to the data type you want from the array. It's optional but recommended.
You can approximate the syntax of unnest by creating a user defined table function:
create or replace function UNNEST(V array)
returns table ("VALUE" variant)
language SQL
as
$$
select VALUE from table(flatten(input => V))
$$;
You would call it like this:
select * from table(unnest(array_construct(1, 2, 3, 4, 5)));
This returns a table with a single column named VALUE of type variant. You can make a version that returns strings, integers, etc.

In Scala, how can I count the elements that do not appear in both arrays?

For example, I have array a:
Array[Int] = Array(1, 1, 2, 2, 3)
and array b:
Array[Int] = Array(2, 3, 4, 5)
I'd like to count how many elements appear only in a or only in b. In this case those are (1, 1, 4, 5), so the count is 4.
I tried diff, union, intersect, but I couldn't find a combination of them to get the result I want.
You can try something like this. It is not an efficient approach (each contains call scans the other array), but it does the trick:
a.filterNot(b.contains).size + b.filterNot(a.contains).size
Same idea as the other answer, but linear time:
a.iterator.filterNot(b.toSet).size + b.iterator.filterNot(a.toSet).size
(.iterator to avoid creating intermediate collections)
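The same linear-time idea translated to Python, as a sketch (count_not_in_both is a hypothetical name). Each lookup set is built once, so the total work is proportional to len(a) + len(b), and duplicates in either array are counted every time they occur, matching the expected answer of 4.

```python
def count_not_in_both(a, b):
    """Count elements of a missing from b plus elements of b missing from a."""
    set_a, set_b = set(a), set(b)  # build each lookup set once
    only_a = sum(1 for x in a if x not in set_b)
    only_b = sum(1 for x in b if x not in set_a)
    return only_a + only_b


print(count_not_in_both([1, 1, 2, 2, 3], [2, 3, 4, 5]))  # 4
```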

Array difference in PostgreSQL

I have two arrays [1,2,3,4,7,6] and [2,3,7] in PostgreSQL which may have common elements. What I am trying to do is to exclude from the first array all the elements that are present in the second.
So far I have achieved the following:
SELECT array
(SELECT unnest(array[1, 2, 3, 4, 7, 6])
EXCEPT SELECT unnest(array[2, 3, 7]));
However, the ordering is not correct as the result is {4,6,1} instead of the desired {1,4,6}.
How can I fix this ?
I finally created a custom function with the following definition (taken from here) which resolved my issue:
create or replace function array_diff(array1 anyarray, array2 anyarray)
returns anyarray language sql immutable as $$
select coalesce(array_agg(elem), '{}')
from unnest(array1) elem
where elem <> all(array2)
$$;
I would use the WITH ORDINALITY option of UNNEST and put an ORDER BY inside array_agg when converting the result back to an array. NOT EXISTS is simpler than EXCEPT here, and unlike EXCEPT it keeps duplicate elements of the first array.
SELECT array_agg(e order by id)
FROM unnest( array[1, 2, 3, 4, 7, 6] ) with ordinality as s1(e,id)
WHERE not exists
(
SELECT 1 FROM unnest(array[2, 3, 7]) as s2(e)
where s2.e = s1.e
)
Simpler, with NULL support, and probably faster:
select array(
select v
from unnest(array[2,2,null,1,3,3,4,5,null]) with ordinality as t(v, pos)
where array_position(array[3,3,5,5], v) is null
order by pos
);
Result: {2,2,NULL,1,4,NULL}
Function array_diff() with tests.
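As a cross-check of the expected results, here is a Python sketch of the same order-preserving difference (a hypothetical client-side helper that shares the SQL function's name only for readability). Python's None plays the role of NULL: set membership matches None against None, similar to the IS NOT DISTINCT FROM comparison array_position() uses, so a None in the second list would also remove Nones from the first.

```python
def array_diff(a, b):
    """Keep elements of a that do not occur in b, preserving order and duplicates."""
    exclude = set(b)  # assumes hashable elements (ints, strings, None)
    return [v for v in a if v not in exclude]


print(array_diff([1, 2, 3, 4, 7, 6], [2, 3, 7]))                      # [1, 4, 6]
print(array_diff([2, 2, None, 1, 3, 3, 4, 5, None], [3, 3, 5, 5]))   # [2, 2, None, 1, 4, None]
```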
Postgres unfortunately lacks a built-in array-difference function. In my case, what I really needed was to detect cases where the array difference was not empty. For that specific case you can use the @> operator, which means "Does the first array contain the second?": A minus B is empty exactly when B @> A, so NOT (B @> A) flags a non-empty difference.
ARRAY[1,4,3] @> ARRAY[3,1,3] → t
See the array operators section of the PostgreSQL documentation.
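The equivalence between "the difference is empty" and containment can be checked with a small Python sketch. Both helpers are hypothetical, and the sketch assumes NULL-free arrays (in SQL, NULL elements never compare equal, so @> behaves differently there). Like @>, contains ignores duplicates and only looks at membership.

```python
def contains(a, b):
    """Rough analogue of Postgres a @> b: every element of b occurs in a."""
    return set(b) <= set(a)


def diff_is_empty(a, b):
    """True when 'a minus b' (elements of a not found in b) is empty."""
    return not [x for x in a if x not in b]


print(contains([1, 4, 3], [3, 1, 3]))       # True, like ARRAY[1,4,3] @> ARRAY[3,1,3]
print(diff_is_empty([3, 1, 3], [1, 4, 3]))  # True: B minus A is empty exactly when A @> B
```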

Splice Array in Postgres

Is it possible to (easily) splice arrays in Postgres? For example, I want to replace all values of 4 with the values 8 and 12, so an array of {2, 4, 7} should become {2, 8, 12, 7}. Perhaps I'm going about this the wrong way, but I need to maintain the integer array column type for these columns. Thanks for any guidance you can give me.
Perhaps UNNEST?
WITH rep(ord, what, with_what) AS(
VALUES (1,4,8),
(2,4,12)
)
SELECT array_agg(COALESCE(with_what,elem) ORDER BY no, ord) AS new_array
FROM(
SELECT *
FROM UNNEST('{2, 4, 7}'::INTEGER[]) WITH ORDINALITY AS arr(elem, no)
LEFT JOIN rep ON arr.elem = rep.what
) AS q;
This way you can easily define a whole set of replacements.
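The same splice logic in Python, as a sketch (splice is a hypothetical helper). The dictionary plays the role of the rep CTE: each key is a value to replace, and the order of its list corresponds to the ord column. Elements without an entry fall back to themselves, which is what the LEFT JOIN plus COALESCE achieves in the query.

```python
def splice(arr, replacements):
    """Replace each element found in `replacements` with a sequence of values."""
    out = []
    for elem in arr:
        # Unmatched elements are kept as-is, in their original position.
        out.extend(replacements.get(elem, [elem]))
    return out


print(splice([2, 4, 7], {4: [8, 12]}))  # [2, 8, 12, 7]
```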

Resources