How to Group by and concatenate arrays in PostgreSQL

How to Group by and concatenate arrays in PostgreSQL - arrays

I have a table in PostgreSQL. I want to concatenate all the arrays(i.e. col) after grouping them by time. The arrays are of varying dimensions.
| time | col |
|------ |------------------ |
| 1 | {1,2} |
| 1 | {3,4,5,6} |
| 2 | {} |
| 2 | {7} |
| 2 | {8,9,10} |
| 3 | {11,12,13,14,15} |
The result should be as follows:
| time | col |
|------ |------------------ |
| 1 | {1,2,3,4,5,6} |
| 2 | {7,8,9,10} |
| 3 | {11,12,13,14,15} |
What I have come up with so far is as follows:
SELECT ARRAY(SELECT elem FROM tab, unnest(col) elem);
But this does not do the grouping. It just takes the entire table and concatenates it.

To preserve the same dimension of you array you can't directly use array_agg(), so first we unnest your arrays and apply distinct to remove duplicates (1). In outer query this is the time to aggregate. To preserve values ordering include order by within aggregate function:
select time, array_agg(col order by col) as col
from (
select distinct time, unnest(col) as col
from yourtable
) t
group by time
order by time
(1) If you don't need duplicate removal just remove distinct word.

you can use next query
SELECT
array_agg(_unnested.item) as array_coll
from my_table
left join LATERAL (SELECT unnest(my_table.array_coll) as item) _unnested ON TRUE

Related

Snowflake fill null values

I have a table like this in Snowflake, where I have a "class" column, where all rows with level 1 have a value. I would like to have a new column "WANTED_OUTPUT", where the value from class are filled until new value occurs, then fill with that value.
I have been looking at first_value and last_value function, but I miss something there can "group" all the rows from level 1 until next level 1 together before I can use the first_value and the partition over that.
Any suggestion?
+----+-------+-------+-------+---------------+
| id | col_c | level | class | WANTED_OUTPUT |
+----+-------+-------+-------+---------------+
| a | q1 | 1 | c99 | c99 |
+----+-------+-------+-------+---------------+
| a | w2 | 2 | NULL | c99 |
+----+-------+-------+-------+---------------+
| a | g6 | 2 | NULL | c99 |
+----+-------+-------+-------+---------------+
| a | j5 | 3 | NULL | c99 |
+----+-------+-------+-------+---------------+
| a | x8 | 1 | c3 | c3 |
+----+-------+-------+-------+---------------+
| a | x9 | 2 | NULL | c3 |
+----+-------+-------+-------+---------------+
| a | h5 | 1 | c67 | c67 |
+----+-------+-------+-------+---------------+

Using FIRST/LAST_VALUE:
SELECT *,
FIRST_VALUE(class) IGNORE NULLS OVER (PARTITION BY id ORDER BY level) AS wanted
FROM tab;
Tables are unordered sets by design so stable sort is required. Level may not be enough based on provided input.
Suggestion: Adding explicit timestamp or seq id column to provide stable sort column would be preferred.
SELECT tab.*,
LAST_VALUE(class) IGNORE NULLS OVER (PARTITION BY id
ORDER BY rn
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS wanted
FROM tab;
db<>fiddle demo

DELETE TOP variable records with variable from grouping of another table

Say I have two tables: A and B
Table A
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 0 |
+----+-------+
Table B
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 10 |
| 3 | 30 |
| 4 | 20 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
If I do SELECT value, COUNT(*) AS occurrence FROM A GROUP BY value, I'll get:
+-------+------------+
| value | occurrence |
+-------+------------+
| 20 | 2 |
| 10 | 1 |
| 0 | 1 |
+-------+------------+
Based on this grouping of table A, I want to delete occurrence records from table B with the same values. In other words, I want to delete from B 2 records with value 20, 1 record with value 10, and 1 record with value 0. (Other conditions include 'do nothing if no record exists' and 'smallest id first', but I think these conditions are pretty trivial compared to the bulk of this question.)
Table B after deleting should be:
+----+-------+
| id | value |
+----+-------+
| 3 | 30 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
From the official TOP documentation, doesn't seems like I can perform some JOIN to use as the TOP expression.

We could use ROW_NUMBER with CTEs here:
WITH cteA AS (
SELECT value, COUNT(*) cnt
FROM A
GROUP BY value
),
cteB AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
FROM B
)
DELETE
FROM cteB b
INNER JOIN cteA a
ON b.value = a.value
WHERE
b.rn <= a.cnt;
The logic here is that we use ROW_NUMBER to keep track of the order of each value in the B table. Then, we join to bring in the counts of each value in the A table, and we only delete B records for which the row number is strictly less than or equal to the A count.
See the demo link below to verify that the logic be correct. Note that I use a select there, not a delete, but the correct rows are being targeted for deletion.
Demo

Selecting grouped rows after first two rows SQL Server

This is a bit of a tricky question/situation and my search fu failed me.
Lets say i have the following data
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 3 | 1 | foo | 2/8/2016 |
| 4 | 1 | foo | 2/11/2016 |
| 5 | 2 | bar | 1/11/2016 |
| 6 | 2 | bar | 2/11/2016 |
| 7 | 3 | baz | 2/1/2016 |
| 8 | 3 | baz | 2/3/2016 |
| 9 | 3 | baz | 2/11/2016 |
And I would like to ommit a variable number of leading rows (most recent date in this case) and lets say that number is 2 in this example. The resulting table would be something like this:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 7 | 3 | baz | 2/1/2016 |
Is this possible in SQL? Essentially I want to filter on an unknown number of rows which uses the date column as the order by. The goal is to get the oldest types and get a list of UID's in the process.

Sure, it's possible. Use a ROW_NUMBER function to assign a value to each row, partitioning by the SharedID column so that the count restarts every time that ID changes, and select those rows with a value greater than your limit.
WITH cteNumberedRows AS (
SELECT UID, SharedID, Type, Date,
ROW_NUMBER() OVER(PARTITION BY SharedID ORDER BY Date DESC) AS RowNum
FROM YourTable
)
SELECT UID, SharedID, Type, Date
FROM cteNumberedRows
WHERE RowNum > 2;

Not sure if I understand what you mean but something like this?
SELECT * FROM MyTable t1 JOIN MyTable T2 ON t2.id NOT IN (
SELECT TOP 2 UID FROM myTable
WHERE SharedID = t1.sharedID
ORDER BY [Date] DESC
)

Join two Select Statements into a single row when one select has n amount of entries?

Is it possible in SQL Server to take two select statements and combine them into a single row without knowing how many entries one of the select statements got?
I've been looking around at various Join solutions but they all seem to work on the basis that the amount of columns is predetermined. I have a case here where one table has a determined amount of columns (t1) and the other table have an undetermined amount of entries (t2) which all use a key that matches one entry in t1.
+----+------+-----+
| id | name | ... |
+----+------+-----+
| 1 | John | ... |
+----+------+-----+
And
+-------------+----------------+
| activity_id | account_number |
+-------------+----------------+
| 1 | 12345467879 |
| 1 | 98765432515 |
| ... | ... |
| ... | ... |
+-------------+----------------+
The number of account numbers belonging to the first query is unknown.
After the query it would become:
+----+------+-----+----------------+------------------+-----+------------------+
| id | name | ... | account_number | account_number_2 | ... | account_number_n |
+----+------+-----+----------------+------------------+-----+------------------+
| 1 | John | ... | 12345467879 | 98765432515 | ... | ... |
+----+------+-----+----------------+------------------+-----+------------------+
So I don't know how many account numbers could be associated with the id beforehand.

Rearranging and deduplicating SQL columns based on column data

Sorry I know that's a rubbish Title but I couldn't think of a more concise way of describing the issue.
I have a (MSSQL 2008) table that contains telephone numbers:
| CustomerID | Tel1 | Tel2 | Tel3 | Tel4 | Tel5 | Tel6 |
| Cust001 | 01222222 | 012333333 | 07111111 | 07222222 | 01222222 | NULL |
| Cust002 | 07444444 | 015333333 | 07555555 | 07555555 | NULL | NULL |
| Cust003 | 01333333 | 017777777 | 07888888 | 07011111 | 016666666 | 013333 |
I'd like to:
Remove any duplicate phone numbers
Rearrange the telephone numbers so that anything beginning with "07" is the first phone number. If there are multiple 07's, they should be in the first fields. The order of the numbers apart from that doesn't really matter.
So, for example, after processing, the table would look like:
| CustomerID | Tel1 | Tel2 | Tel3 | Tel4 | Tel5 | Tel6 |
| Cust001 | 07111111 | 07222222 | 01222222 | 012333333 | NULL | NULL |
| Cust002 | 07444444 | 07555555 | 015333333 | NULL | NULL | NULL |
| Cust003 | 07888888 | 07011111 | 016666666 | 013333 | 01333333 | 017777777 |
I'm struggling to figure out how to efficiently achieve my goal (there are 600,000+ records in the table). Can anyone help?
I've created a fiddle if it'll help anyone play around with the scenario.

You can break up the numbers into individual rows using UNPIVOT, then reorder them based on the occurence of the '07' prefix using ROW_NUMBER(), and finally recombine it using PIVOT to end up with the 6 Tel columns again.
select *
FROM
(
select CustomerID, Col, Tel
FROM
(
select *, Col='Tel' + RIGHT(
row_number() over (partition by CustomerID
order by case
when Tel like '07%' then 1
else 2
end),10)
from phonenumbers
UNPIVOT (Tel for Seq in (Tel1,Tel2,Tel3,Tel4,Tel5,Tel6)) seqs
) U
) P
PIVOT (MAX(TEL) for Col IN (Tel1,Tel2,Tel3,Tel4,Tel5,Tel6)) V;
SQL Fiddle

Perhaps using cursor to collect all customer id and sorting the fields...traditional sorting technique as we used to do in school c++ ..lolz...like to know if any other method possible.
If you dont get any then it is the last way . It will take a long time for sure to execute.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to Group by and concatenate arrays in PostgreSQL - arrays

you can use next query SELECT array_agg(_unnested.item) as array_coll from my_table left join LATERAL (SELECT unnest(my_table.array_coll) as item) _unnested ON TRUE

Related

Snowflake fill null values

DELETE TOP variable records with variable from grouping of another table

Selecting grouped rows after first two rows SQL Server

Join two Select Statements into a single row when one select has n amount of entries?

Rearranging and deduplicating SQL columns based on column data

Categories

Resources