Postgres - join on array values - arrays

Say I have a table with schema as follows
id | name | tags |
1 | xyz | [4, 5] |
Where tags is an array of references to ids in another table called tags.
Is it possible to join these tags onto the row? i.e. replacing the id numbers with the values for those rows in the tags table, such as:
id | name | tags |
1 | xyz | [[tag_name, description], [tag_name, description]] |
If not, is this an issue with the design of the schema?

Example tags table:
create table tags(id int primary key, name text, description text);
insert into tags values
(4, 'tag_name_4', 'tag_description_4'),
(5, 'tag_name_5', 'tag_description_5');
You should unnest the tags column, use its elements to join the tags table, and aggregate the columns of that table. You can aggregate arrays into an array:
select t.id, t.name, array_agg(array[g.name, g.description])
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | array_agg
----+------+-----------------------------------------------------------------
1 | xyz | {{tag_name_4,tag_description_4},{tag_name_5,tag_description_5}}
(1 row)
or strings to array:
select t.id, t.name, array_agg(concat_ws(', ', g.name, g.description))
...
or maybe strings inside a string:
select t.id, t.name, string_agg(concat_ws(', ', g.name, g.description), '; ')
...
or, last but not least, as jsonb:
select t.id, t.name, jsonb_object_agg(g.name, g.description)
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | jsonb_object_agg
----+------+------------------------------------------------------------------------
1 | xyz | {"tag_name_4": "tag_description_4", "tag_name_5": "tag_description_5"}
(1 row)
Live demo: db<>fiddle.
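The three steps of this answer (unnest, join, aggregate) can be sketched outside the database as well. The following is only an illustration in Python with hypothetical in-memory stand-ins for my_table and tags; it is not how Postgres executes the query:

```python
# Hypothetical in-memory stand-ins for the my_table and tags tables.
my_table = [{"id": 1, "name": "xyz", "tags": [4, 5]}]
tags = {
    4: ("tag_name_4", "tag_description_4"),
    5: ("tag_name_5", "tag_description_5"),
}


def join_tags(rows, tag_lookup):
    """Unnest each row's tag ids, join them to the tag table, aggregate per row."""
    result = []
    for row in rows:
        # "unnest": walk over the array elements one by one.
        # "join": keep only ids that exist in the tag table (inner join).
        joined = [list(tag_lookup[t]) for t in row["tags"] if t in tag_lookup]
        # "aggregate": collect the joined pairs into one value per row.
        if joined:  # an inner join drops rows without any matching tag
            result.append({"id": row["id"], "name": row["name"], "tags": joined})
    return result


joined_rows = join_tags(my_table, tags)
print(joined_rows)
```

The nested list per row corresponds to the two-dimensional array produced by array_agg(array[...]) above.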

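Not sure if this is still helpful for anyone, but unnesting the tags can be quite a bit slower than letting Postgres work directly from the array. You can rewrite the query as follows; this is generally more performant because g.id = ANY(tags) can be resolved with a simple primary-key index scan, without the expansion step. Note that, because of the LEFT JOIN, rows of my_table without any matching tag are kept (with NULLs in the aggregated array):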
SELECT t.id, t.name, ARRAY_AGG(ARRAY[g.name, g.description])
FROM my_table AS t
LEFT JOIN tags AS g
ON g.id = ANY(tags)
GROUP BY t.id;

Related

T-SQL grouping by either of two columns

I have a table where I hold chat history as follows:
Id From To Text Hour
=================================================
1 A B Msg_A_B1 00:01
2 A B Msg_A_B2 00:02
3 B A Msg_B_A1 00:03
4 A B Msg_A_B3 00:05
5 C A Msg_C_A1 00:11
6 A C Msg_A_C1 00:12
7 C A Msg_C_A2 00:14
8 D B Msg_D_B1 00:17
I want to create a chat header list from this data for a specific user. The rules are:
I want to get the start (first) "Hour" of each chat
and the last message of each chat for a specific user
ordered by "Hour" ascending
For example if the user is "A" I want to get
Correspondant Text Hour
=======================================
B Msg_A_B3 00:01
C Msg_C_A2 00:11
Or for user "B" :
Correspondant Text Hour
=======================================
A Msg_A_B3 00:01
D Msg_D_B1 00:17
I can possibly do it by using Temporary tables, but I am seeking a simpler and faster solution.
This information might lead to the use of Stored Procedures, but a proper use of Views is also accepted.
What you are missing is a grouping column to mark a chat between A and B as "belonging together" without looking if A or B is the From or the To.
It is your table design that makes things difficult. Below my suggestion I will add some hints on how this might be done better:
Your mockup to simulate your issue:
CREATE TABLE #mockupTable(Id INT,[From] VARCHAR(100),[To] VARCHAR(100),[Text] VARCHAR(100),[Hour] TIME(0));
INSERT INTO #mockupTable VALUES
(1,'A','B','Msg_A_B1','00:01')
,(2,'A','B','Msg_A_B2','00:02')
,(3,'B','A','Msg_B_A1','00:03')
,(4,'A','B','Msg_A_B3','00:05')
,(5,'C','A','Msg_C_A1','00:11')
,(6,'A','C','Msg_A_C1','00:12')
,(7,'C','A','Msg_C_A2','00:14')
,(8,'D','B','Msg_D_B1','00:17');
--The query
WITH cte AS
(
SELECT t.*
,CONCAT(CASE WHEN t.[From]>t.[To] THEN t.[To] ELSE t.[From] END,'-',CASE WHEN t.[From]>t.[To] THEN t.[From] ELSE t.[To] END) AS ChatID
FROM #mockupTable t
)
,FindFirstAndLast AS
(
SELECT cte1.ChatID
,(SELECT TOP 1 Id FROM cte cte2 WHERE cte2.ChatID=cte1.ChatID ORDER BY cte2.[Hour] ASC) AS FirstId
,(SELECT TOP 1 Id FROM cte cte2 WHERE cte2.ChatID=cte1.ChatID ORDER BY cte2.[Hour] DESC) AS LastId
FROM cte cte1
GROUP BY cte1.ChatID
)
SELECT fal.ChatID
,tFirst.[From] AS FirstFrom
,tFirst.[To] AS FirstTo
,tFirst.[Hour] AS FirstHour
,tLast.[From] AS LastFrom
,tLast.[To] AS LastTo
,tLast.[Text] AS LastText
FROM FindFirstAndLast fal
INNER JOIN #mockupTable tFirst ON fal.FirstId=tFirst.Id
INNER JOIN #mockupTable tLast ON fal.LastId=tLast.Id;
The idea in short:
The first CTE will create a ChatID by concatenating the From and the To in a sorted way. Doing so a message from A to B will get the same ChatID as a message from B to A.
The second CTE will use a correlated sub-query to find the first and the last message id, grouped for the previously computed ChatID.
The final SELECT will use these message ids to join the appropriate rows.
The result contains everything you need. It's up to you to put it into the format you need:
+--------+-----------+---------+-----------+----------+--------+----------+
| ChatID | FirstFrom | FirstTo | FirstHour | LastFrom | LastTo | LastText |
+--------+-----------+---------+-----------+----------+--------+----------+
| A-B | A | B | 00:01:00 | A | B | Msg_A_B3 |
+--------+-----------+---------+-----------+----------+--------+----------+
| A-C | C | A | 00:11:00 | C | A | Msg_C_A2 |
+--------+-----------+---------+-----------+----------+--------+----------+
| B-D | D | B | 00:17:00 | D | B | Msg_D_B1 |
+--------+-----------+---------+-----------+----------+--------+----------+
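As an illustration of the sorted-ChatID idea, and of shaping the result into the header list asked for in the question, here is a minimal Python sketch. The function name chat_headers and the tuple layout are hypothetical, and the sketch assumes zero-padded "HH:MM" strings so that text order equals time order:

```python
# Sample rows from the question: (Id, From, To, Text, Hour).
messages = [
    (1, "A", "B", "Msg_A_B1", "00:01"),
    (2, "A", "B", "Msg_A_B2", "00:02"),
    (3, "B", "A", "Msg_B_A1", "00:03"),
    (4, "A", "B", "Msg_A_B3", "00:05"),
    (5, "C", "A", "Msg_C_A1", "00:11"),
    (6, "A", "C", "Msg_A_C1", "00:12"),
    (7, "C", "A", "Msg_C_A2", "00:14"),
    (8, "D", "B", "Msg_D_B1", "00:17"),
]


def chat_headers(rows, user):
    """Return (correspondent, last text, first hour) per chat of `user`."""
    chats = {}
    for _id, sender, receiver, text, hour in rows:
        if user not in (sender, receiver):
            continue
        # Sorting the participants yields the same key for A->B and B->A.
        chat_id = tuple(sorted((sender, receiver)))
        chats.setdefault(chat_id, []).append((hour, text))
    headers = []
    for (first, second), msgs in chats.items():
        msgs.sort()  # zero-padded "HH:MM" strings sort chronologically
        correspondent = second if first == user else first
        headers.append((correspondent, msgs[-1][1], msgs[0][0]))
    # Order the header list by the start hour of each chat.
    return sorted(headers, key=lambda h: h[2])


headers_a = chat_headers(messages, "A")
headers_b = chat_headers(messages, "B")
print(headers_a)
print(headers_b)
```

The sorted participant pair plays the role of the ChatID column computed in the first CTE.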
Some ideas about the design
I would use
one table Person for your chatting persons.
a second table Chat for a chat with a ChatID.
one m:n mapping table ChattingPerson with JoinTime, a ChatID and a PersonID, both as FKs. Here you can set timestamps like LastAction or mark the status (active, has left, ...)
one more table Message for the messages with time, text, and ChatPersonID as FK.
Your advantages
The opener can explicitly invite more persons (or limit it to one for a person2person chat), or just wait for participants.
Starting a chat creates the row in the Chat table, the first row in the ChattingPerson table to mark the opener, and possibly a first message row.
Following messages add - if it does not exist yet - a row to the ChattingPerson table (for a new participant) and the message row.
The ID of the ChattingPerson table will give you the ChatID and the PersonID.
You can filter per chat and/or by person.
There can be separate chats between A and B over time
You can control the type of chat with a PersonCount-Constraint
You can enforce that a new ChattingPerson can only be added by the opener
You can create certain chat types (like "person2person") with a template
Happy Coding :-)
Let's do it by creating a view:
First let's load the data in table t1:
create table t1 (Id int,[From] varchar(10),[To] varchar(10),Text varchar(100),Hour time(0))
insert into t1 values (1,'A','B','Msg_A_B1','00:01')
insert into t1 values (2,'A','B','Msg_A_B2','00:02')
insert into t1 values (3,'B','A','Msg_B_A1','00:03')
insert into t1 values (4,'A','B','Msg_A_B3','00:05')
insert into t1 values (5,'C','A','Msg_C_A1','00:11')
insert into t1 values (6,'A','C','Msg_A_C1','00:12')
insert into t1 values (7,'C','A','Msg_C_A2','00:14')
insert into t1 values (8,'D','B','Msg_D_B1','00:17')
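Then let's create the view. Note that it derives the chat from the message text (everything except the last character), so it relies on the naming convention of the sample data and treats messages from A to B and from B to A as separate chats: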
create view vChats
as
with cte as (
select left([text],len([text])-1) as chat,
t.*
from t1 as t
),
cte2 as (
select chat,
min(hour) as minHour,
max(text) as maxText
from cte
group by chat
),
cte3 as (select distinct [From] as [User]
from t1
UNION
select distinct [To] as [User]
from t1
)
select c3.[User],
t.[To] as Correspondant,
c.maxText as [Text],
c.minHour as [Hour]
from cte2 as c
inner join t1 as t ON c.maxText = t.[Text]
inner join cte3 as c3 ON c3.[User] = t.[From]
UNION
select c3.[User],
t.[From] as Correspondant,
c.maxText as [Text],
c.minHour as [Hour]
from cte2 as c
inner join t1 as t ON c.maxText = t.[Text]
inner join cte3 as c3 ON c3.[User] = t.[To]
After that you can use it to get all the communications for each user like this:
select *
from vChats
where [User] = 'A'

select resultset of counts by array param in postgres

I've been searching for this and it seems like it should be something simple, but apparently not so much. I want to return a resultSet within PostgreSQL 9.4.x using an array parameter so:
| id | count |
--------------
| 1 | 22 |
--------------
| 2 | 14 |
--------------
| 14 | 3 |
where I'm submitting a parameter of {'1','2','14'}.
Using something (clearly not) like:
SELECT id, count(a.*)
FROM tablename a
WHERE a.id::int IN array('{1,2,14}'::int);
I want to test it first of course, and then write it as a storedProc (function) to make this simple.
Forget it, here is the answer:
SELECT a.id,
COUNT(a.id)
FROM tableName a
WHERE a.id IN
(SELECT b.id
FROM tableName b
WHERE b.id = ANY('{1,2,14}'::int[])
)
GROUP BY a.id;
You can simplify to:
SELECT id, count(*) AS ct
FROM tbl
WHERE id = ANY('{1,2,14}'::int[])
GROUP BY 1;
More:
Check if value exists in Postgres array
To include IDs from the input array that are not found I suggest unnest() followed by a LEFT JOIN:
SELECT id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t USING (id)
GROUP BY 1;
Related:
Preserve all elements of an array while (left) joining to a table
If there can be NULL values in the array parameter as well as in the id column (which would be an odd design), you'd need (slower!) NULL-safe comparison:
SELECT id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t ON t.id IS NOT DISTINCT FROM id.id
GROUP BY 1;
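To try the LEFT JOIN idea without a Postgres server, here is a small sketch using Python's built-in sqlite3 module. SQLite has no unnest(), so a VALUES list in a CTE stands in for it; the table contents and the names wanted and ids are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
# Made-up sample data: id 1 twice, id 2 once, id 14 not at all.
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(1,), (1,), (2,), (3,)])

wanted = [1, 2, 14]
# One "(?)" per input element; the VALUES list replaces Postgres' unnest().
placeholders = ", ".join("(?)" for _ in wanted)
query = f"""
    WITH ids(id) AS (VALUES {placeholders})
    SELECT ids.id, count(t.id) AS ct
    FROM ids
    LEFT JOIN tbl AS t ON t.id = ids.id
    GROUP BY ids.id
    ORDER BY ids.id
"""
counts = conn.execute(query, wanted).fetchall()
print(counts)  # [(1, 2), (2, 1), (14, 0)]
```

Counting t.id rather than * is what makes the missing id 14 show up with a count of 0 instead of 1.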

SQL Pulling the latest information and information from another table

I have a record table that records changes within a table. I can pull the data from the first table fine; however, when I try to join in another table to add some of its column information, it stops displaying the information.
PartNumber | PartDesc | value | date
1 | test | 1 | 3/4/2015
I wanted to include the aisle tags from the location table
PartNumber| AisleTag | AisleTagTwo
1 | A1 | N/A
Here is what I have as my SQL statement so far:
Select t1.PartNumber, t1.PartDesc , t1.NewValue , t1.Date,t2.AisleTag,t2.AisleTagTwo
from InvRecord t1
JOIN PartAisleListTbl t2 ON t1.PartNumber = t2.PartNumber
where Date = (select max(Date) from InvRecord where t1.PartNumber = InvRecord.PartNumber)
order by t1.PartNumber
It is coming up blank; my original SQL statement doesn't include anything from t2. I am not sure what approach to take to get the data combined. Any help is much appreciated, thank you!
This should be the end result:
PartNumber | PartDesc | value | date | AisleTag | AisleTagTwo
1 | test | 1 | 3/4/2015 | A1 | N/A
Pull the most recent row (based on Date) for each PartNumber in Table A and append data from Table B (joined on PartNumber):
SELECT *
FROM (
SELECT A.PartNumber
, A.PartDesc
, A.NewValue
, A.Date
, B.AisleTag
, B.AisleTagTwo
, DateSeq = ROW_NUMBER() OVER(PARTITION BY A.PartNumber ORDER BY A.Date DESC)
FROM InvRecord A
LEFT JOIN PartAisleListTbl B
ON A.PartNumber = B.PartNumber
) A
WHERE A.DateSeq = 1
ORDER BY A.PartNumber
Are you returning no records at all, or only records with AisleTag and AisleTagTwo as null?
Your sentence "it is coming up blank, my original sql statement doesn't include anything from t2." makes it sound like you're getting records with nulls for the t2 fields.
If you are, then you probably have a record in t2 that has nulls for those fields.
For troubleshooting purposes, try running the query without the WHERE clause:
Select t1.PartNumber, t1.PartDesc , t1.NewValue , t1.Date,t2.AisleTag,t2.AisleTagTwo
from InvRecord t1
JOIN PartAisleListTbl t2 ON t1.PartNumber = t2.PartNumber
order by t1.PartNumber
If you do get records, your problem is with the WHERE clause. If you don't, your problem is with the PartNumber fields in InvRecord and PartAisleListTbl not matching.
Not sure why yours isn't working... is Date in both t1 and t2 by any chance?
Here it is refactored to use an inline view instead of a correlated subquery; I wonder if it makes a difference.
Select t1.PartNumber, t1.PartDesc , t1.NewValue , t1.Date,t2.AisleTag,t2.AisleTagTwo
from InvRecord t1
JOIN PartAisleListTbl t2
ON t1.PartNumber = t2.PartNumber
JOIN (select max(Date) mdate, PartNumber from InvRecord GROUP BY PartNumber) t3
on t3.partNumber= T1.PartNumber
and T3.mdate = T1.Date
order by t1.PartNumber
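The inline-view variant can be tried on a small in-memory data set. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, with made-up sample rows and ISO-formatted dates (an assumption, so that max() on text picks the latest date). It also shows how an inner join hides parts that have no row in PartAisleListTbl, while a LEFT JOIN keeps them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InvRecord (PartNumber INTEGER, PartDesc TEXT, NewValue INTEGER, Date TEXT);
    CREATE TABLE PartAisleListTbl (PartNumber INTEGER, AisleTag TEXT, AisleTagTwo TEXT);
    -- Made-up rows: part 1 has two records, part 2 has no aisle entry.
    INSERT INTO InvRecord VALUES
        (1, 'test', 5, '2015-03-01'),
        (1, 'test', 1, '2015-03-04'),
        (2, 'other', 7, '2015-03-02');
    INSERT INTO PartAisleListTbl VALUES (1, 'A1', 'N/A');
""")

# {join} is filled with either JOIN or LEFT JOIN below.
LATEST = """
    SELECT t1.PartNumber, t1.PartDesc, t1.NewValue, t1.Date, t2.AisleTag, t2.AisleTagTwo
    FROM InvRecord AS t1
    {join} PartAisleListTbl AS t2 ON t1.PartNumber = t2.PartNumber
    JOIN (SELECT PartNumber, max(Date) AS mdate
          FROM InvRecord GROUP BY PartNumber) AS t3
      ON t3.PartNumber = t1.PartNumber AND t3.mdate = t1.Date
    ORDER BY t1.PartNumber
"""
inner_rows = conn.execute(LATEST.format(join="JOIN")).fetchall()
left_rows = conn.execute(LATEST.format(join="LEFT JOIN")).fetchall()
print(inner_rows)
print(left_rows)
```

If the inner-join version returns nothing on the real data, comparing it with the LEFT JOIN version is a quick way to see whether the PartNumber values fail to match.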

SQL Server: Duplicate columns in joined table, but distinct row info

So I have joined two tables to identify claims and their corresponding reversals if there are any.
The following is a simplified explanation of what I have done: join where MbrNo is the same in both tables, and where Amount=-Amount. So now I have an output table that contains duplicate column names:
MbrNo | ClaimType | Amount | MbrNo | ClaimType | Amount
xyz | Medicine | R 300 | xyz | Reversal | - R300
I cannot insert this into a table as the column names are not unique.
But I would like to
1. Format this table to look as follows
MbrNo | ClaimType | Amount
xyz | Medicine | R 300
xyz | Reversal | - R300
with t as
(
select *,
count(*) over(partition by [MbrNo], [DepNo], [PracticeNo], [DisciplineCd], [ServiceDt],[PayAmt]) as rownum
from Claims
)
Select * from
(Select * from t where PayAmt<0) a
left outer join
(Select * from t where PayAmt>0) b
on a.[MbrNo]=b.[MbrNo]
and a.[DepNo]=b.[DepNo]
and a.[PracticeNo]=b.[PracticeNo]
and a.[DisciplineCd]=b.[DisciplineCd]
and a.[ServiceDt]=b.[ServiceDt]
and a.[PayAmt]=-b.[PayAmt]
Basically I want to put the 2nd table in the joined table underneath the first table.
Please help:(
If I've understood your requirements correctly then I think you want the UNION operator. See if this gets you going in the right direction.
with t as
(
select *,
count(*) over(partition by [MbrNo], [DepNo], [PracticeNo], [DisciplineCd], [ServiceDt],[PayAmt]) as rownum
from Claims
)
Select t.* from t where PayAmt < 0
union all
select b.* from
(Select * from t where PayAmt < 0) a
inner join
(Select * from t where PayAmt > 0) b
on a.[MbrNo] = b.[MbrNo]
and a.[DepNo] = b.[DepNo]
and a.[PracticeNo] = b.[PracticeNo]
and a.[DisciplineCd] = b.[DisciplineCd]
and a.[ServiceDt] = b.[ServiceDt]
and a.[PayAmt] = -b.[PayAmt]
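A reduced version of the UNION ALL approach can be tested with Python's sqlite3 module. This is only a sketch: the Claims table is cut down to three hypothetical columns and SQLite stands in for SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, made-up version of the Claims table.
    CREATE TABLE Claims (MbrNo TEXT, ClaimType TEXT, PayAmt INTEGER);
    INSERT INTO Claims VALUES
        ('xyz', 'Medicine', 300),
        ('xyz', 'Reversal', -300),
        ('abc', 'Medicine', 150);
""")

rows = conn.execute("""
    -- First block: the reversals themselves.
    SELECT MbrNo, ClaimType, PayAmt FROM Claims WHERE PayAmt < 0
    UNION ALL
    -- Second block: the claims that are matched by a reversal.
    SELECT b.MbrNo, b.ClaimType, b.PayAmt
    FROM Claims AS a
    JOIN Claims AS b ON a.MbrNo = b.MbrNo AND a.PayAmt = -b.PayAmt
    WHERE a.PayAmt < 0 AND b.PayAmt > 0
""").fetchall()

# UNION ALL does not guarantee an order, so sort before printing.
stacked = sorted(rows)
print(stacked)
```

Both SELECTs return the same column list, which is what allows the second result to be placed underneath the first. The unreversed claim for member abc does not appear.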

Select N rows avoiding duplicates on a non-key, non-index field

Using T-SQL, how can I select n rows of a non-key, non-index column and avoid duplicate results?
Example table:
ID_ | state | customer | memo
------------------------------------------
1 | abc | 123 | memo text xyz
2 | abc | 123 | memo text abc
3 | abc | 456 | memo text def
4 | abc | 456 | memo text rew
5 | abc | 789 | memo text yte
6 | def | 123 | memo text hrd
7 | def | 432 | memo text dfg
I want to select, say, 2 memos for state 'abc' but the returned memos should not be for the same customer.
memo
----
memo text xyz
memo text def
PS: The only select condition available is state (eg: where state = 'abc')
I have managed to do this in a very inefficient way
SELECT top 2 MAX(memo)
FROM table
WHERE state = 'abc'
GROUP BY customer
This works fine for small sample size, but the production table has over 1 billion rows.
You can try the following query against your actual table. I am not sure how it performs on a table with a billion rows, so you will have to test that yourself.
SELECT TOP 2 memo
FROM (SELECT memo,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY (SELECT 0)) AS RN
FROM table1
WHERE state = 'abc') T
WHERE RN = 1
You can check the SQL FIDDLE
EDIT: Adding a non-clustered index on state and customer that includes memo should improve the performance considerably, because the query can then be answered from the index alone.
CREATE NONCLUSTERED INDEX [custom_index] ON table1
(
[state] ASC,
[customer] ASC
)
INCLUDE ( [memo]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [DATA]
One way to get n distinct state/customer combinations is to pick one ID for every group:
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
(MIN can be substituted by MAX, it's just a way to get one of the values)
then JOIN that to the table adding the other condition
WITH getID AS (
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
)
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN getID g ON t.ID_ = g.ID
WHERE t.state = 'abc'
SQLFiddle demo
If your version of SQL Server doesn't support WITH, the CTE can become a subquery:
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN (SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
) g ON t.ID_ = g.ID
WHERE t.state = 'abc'
Another way is to use CROSS APPLY to get the distinct ID
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
CROSS APPLY (SELECT TOP 1
ID_
FROM table1 t1
WHERE t1.State = t.State AND t1.Customer = t.Customer
ORDER BY t1.ID_) c
WHERE t.state = 'abc'
AND c.ID_ = t.ID_;
SQLFiddle demo
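The "one ID per group" approach can be checked against the sample table from the question. The sketch below uses Python's sqlite3 module instead of SQL Server, so LIMIT replaces TOP; the ORDER BY is an addition to make the result deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, customer INTEGER, memo TEXT);
    INSERT INTO table1 VALUES
        (1, 'abc', 123, 'memo text xyz'),
        (2, 'abc', 123, 'memo text abc'),
        (3, 'abc', 456, 'memo text def'),
        (4, 'abc', 456, 'memo text rew'),
        (5, 'abc', 789, 'memo text yte'),
        (6, 'def', 123, 'memo text hrd'),
        (7, 'def', 432, 'memo text dfg');
""")

memos = conn.execute("""
    SELECT t.memo
    FROM table1 AS t
    -- One representative row (lowest ID_) per customer within the state.
    JOIN (SELECT min(ID_) AS ID
          FROM table1
          WHERE state = ?
          GROUP BY customer) AS g ON t.ID_ = g.ID
    ORDER BY t.ID_
    LIMIT ?
""", ("abc", 2)).fetchall()
print(memos)  # [('memo text xyz',), ('memo text def',)]
```

Filtering on state inside the grouping subquery keeps the grouped set small, which matters more on a large table than in this toy example.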

Resources