Associate couples between 2 tables with SQL Server 2005 - sql-server

The question is easy; the answer is not (for me).
I have 2 identical tables, each composed of 2 columns: ID and value. I have to find all couples (one row from each table) sharing the same value. Once a record is used in a couple, it can't be reused in another couple.
For example, with these two tables:
CREATE TABLE [Tab1]([ID1] [int], [Val] [int])
CREATE TABLE [Tab2]([ID2] [int], [Val] [int])
INSERT [Tab1] ([ID1], [Val]) VALUES (1, 10)
INSERT [Tab1] ([ID1], [Val]) VALUES (2, 20)
INSERT [Tab1] ([ID1], [Val]) VALUES (3, 20)
INSERT [Tab1] ([ID1], [Val]) VALUES (4, 50)
INSERT [Tab1] ([ID1], [Val]) VALUES (5, 100)
INSERT [Tab2] ([ID2], [Val]) VALUES (1, 20)
INSERT [Tab2] ([ID2], [Val]) VALUES (2, 10)
INSERT [Tab2] ([ID2], [Val]) VALUES (3, 50)
INSERT [Tab2] ([ID2], [Val]) VALUES (4, 30)
INSERT [Tab2] ([ID2], [Val]) VALUES (5, 20)
GO
A good answer would be (there are several valid solutions, but one is enough):
ID1 ID2 Val
--- ---- ---
2 1 20
1 2 10
4 3 50
3 5 20
I'm looking for a query to find this result. I use SQL Server 2005 but I can use SQL Server 2008 if it's needed.

This works on SQL Server 2008, and it should also work on 2005, which is the version that introduced the ranking functions (RANK, ROW_NUMBER) with the OVER clause.
SELECT t1.Val, ID1, ID2 FROM
(SELECT Val, ID1, RANK() OVER (PARTITION BY Val ORDER BY ID1) rank FROM Tab1) t1
INNER JOIN
(SELECT Val, ID2, RANK() OVER (PARTITION BY Val ORDER BY ID2) rank FROM Tab2) t2
ON t1.Val = t2.Val AND t1.rank = t2.rank
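The trick here is to partition each table by Val and rank the rows within each partition by their ID. Joining on both Val and rank then pairs the first row with the first row, the second with the second, and so on, so no row is used twice. This relies on the IDs being unique within each table; if they can repeat, RANK() produces ties and ROW_NUMBER() is the safer choice.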
And here's an SQLFiddle
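To try the pairing idea without a SQL Server instance, here is a minimal sketch that runs the same kind of query through Python's standard-library sqlite3 module. It assumes a bundled SQLite of 3.25 or newer (needed for window functions), and it swaps RANK() for ROW_NUMBER() so that duplicate IDs could not produce tied ranks. The function name pair_rows and the final ORDER BY are illustrative choices, not part of the original answer.

```python
import sqlite3


def pair_rows(tab1, tab2):
    """Pair rows of tab1 and tab2 that share a Val, using each row at most once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tab1 (ID1 INTEGER, Val INTEGER)")
    conn.execute("CREATE TABLE Tab2 (ID2 INTEGER, Val INTEGER)")
    conn.executemany("INSERT INTO Tab1 VALUES (?, ?)", tab1)
    conn.executemany("INSERT INTO Tab2 VALUES (?, ?)", tab2)
    rows = conn.execute(
        """
        SELECT t1.ID1, t2.ID2, t1.Val
        FROM (SELECT ID1, Val,
                     ROW_NUMBER() OVER (PARTITION BY Val ORDER BY ID1) AS rn
              FROM Tab1) AS t1
        JOIN (SELECT ID2, Val,
                     ROW_NUMBER() OVER (PARTITION BY Val ORDER BY ID2) AS rn
              FROM Tab2) AS t2
          ON t1.Val = t2.Val AND t1.rn = t2.rn  -- n-th row pairs with n-th row
        ORDER BY t2.ID2                         -- only to make the output stable
        """
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question.
tab1 = [(1, 10), (2, 20), (3, 20), (4, 50), (5, 100)]
tab2 = [(1, 20), (2, 10), (3, 50), (4, 30), (5, 20)]
print(pair_rows(tab1, tab2))
```

With the sample data this yields the four couples listed in the question; the rows with values 100 and 30 have no partner and drop out of the inner join.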

Related

UPDATE using Rank(), Row_Number excluding duplicate values

I've a dataset similar to the one below.
I need to update the base lookup table based on the values provided in the Updated_CustomerId column. The base table is the same as the dataset, but it does not have the Updated_CustomerId column.
The challenge here is that the base table has a unique constraint on the combination of the three columns below:
Current_CustomerID
Order_ID
OrderCategory
DESIRED OUTPUT:
After the update, either one of the old customer IDs (17360410, PrimaryKey 8, or 21044488, PrimaryKey 9) can be reassigned to the Updated_CustomerId.
PrimaryKey 2 will not be updated, as that would violate the unique constraint. Instead it will be deleted, along with whichever of PrimaryKeys 8 and 9 was not updated (reassigned to the new ID).
After everything is updated on the base table, I then delete from the base table all records where Current_CustomerID was not reassigned to the Updated_CustomerId (if different).
IF OBJECT_ID('tempdb..#DataSet') IS NOT NULL
DROP TABLE #DataSet
IF OBJECT_ID('tempdb..#BaseTable') IS NOT NULL
DROP TABLE #BaseTable
CREATE TABLE #DataSet
(
PrimaryKey INT NOT NULL CONSTRAINT [PK_dataset_ID] PRIMARY KEY,
Current_CustomerID INT NOT NULL,
Order_ID INT NOT NULL,
OrderCategory VARCHAR(50) NOT NULL,
Updated_CustomerId INT NOT NULL
)
INSERT INTO #DataSet (PrimaryKey, Current_CustomerID, Order_ID, OrderCategory, updated_CustomerId)
VALUES
(1, 17395001, 4451784, 'Kitchen', 25693110),
(2, 25693110, 4451784, 'Kitchen', 25693110),
(3, 25693110, 2083059, 'Kitchen', 25693110),
(4, 25693110, 2163679, 'Kitchen', 25693110),
(5, 25693110, 2171466, 'Kitchen', 25693110),
(6, 25693110, 2163679, 'Bathroom', 25693110),
(7, 25693110, 2171466, 'Bathroom', 25693110),
(8, 17360410, 3377931, 'Furniture', 16303984),
(9, 21044488, 3377931, 'Furniture', 16303984),
(10, 1534323, 2641714, 'Furniture', 16303984),
(11, 16303984, 2641726, 'Furniture', 16303984),
(12, 16303984, 2641793, 'Furniture', 16303984),
(13, 16303984, 2641816, 'Furniture', 16303984),
(14, 16303345, 2641816, 'Garden', 16301239),
(15, 12345678, 1239065, 'Medicine', 1075432)
CREATE TABLE #BaseTable
(
PrimaryKey INT NOT NULL CONSTRAINT [PK_baseTable_ID] PRIMARY KEY,
CustomerID INT NOT NULL,
Order_ID INT NOT NULL,
OrderCategory VARCHAR(50) NOT NULL,
)
CREATE UNIQUE NONCLUSTERED INDEX [IDX_LookUp] ON #BaseTable
(
CustomerID ASC,
Order_ID ASC,
OrderCategory ASC
) ON [PRIMARY]
INSERT INTO #BaseTable (PrimaryKey, CustomerID, Order_ID, OrderCategory)
VALUES
(1, 17395001, 4451784, 'Kitchen'),
(2, 25693110, 4451784, 'Kitchen'),
(3, 25693110, 2083059, 'Kitchen'),
(4, 25693110, 2163679, 'Kitchen'),
(5, 25693110, 2171466, 'Kitchen'),
(6, 25693110, 2163679, 'Bathroom'),
(7, 25693110, 2171466, 'Bathroom'),
(8, 17360410, 3377931, 'Furniture'),
(9, 21044488, 3377931, 'Furniture'),
(10, 1534323, 2641714, 'Furniture'),
(11, 16303984, 2641726, 'Furniture'),
(12, 16303984, 2641793, 'Furniture'),
(13, 16303984, 2641816, 'Furniture'),
(14, 16303345, 2641816, 'Garden'),
(15, 12345678, 1239065, 'Medicine')
-- select * from #BaseTable
-- select * from #DataSet
; with CTE AS (
select a.*
,rank() over (partition by a.updated_CustomerId, a.Order_ID, a.OrderCategory
order by a.Current_CustomerID) as flag
from #DataSet a
)
update b
set CustomerID = a.Updated_CustomerId
from #BaseTable b
inner join CTE a on b.PrimaryKey = a.PrimaryKey
where flag <> 2
Msg 2601, Level 14, State 1, Line 82
Cannot insert duplicate key row in object 'dbo.#BaseTable' with unique index 'IDX_LookUp'. The duplicate key value is (25693110, 4451784, Kitchen).
The statement has been terminated.
I think you just want to get a row number over #DataSet, partitioned by what the unique key will be after the update, and then delete the rows where there is more than one per key:
-- ...
DELETE bt
FROM #BaseTable bt
INNER JOIN (
SELECT a.PrimaryKey,
a.Updated_CustomerId,
a.Order_ID,
a.OrderCategory,
row = ROW_NUMBER() OVER (PARTITION BY a.Updated_CustomerId, a.Order_ID, a.OrderCategory ORDER BY a.Current_CustomerID)
FROM #BaseTable b
INNER JOIN #DataSet a
ON b.PrimaryKey = a.PrimaryKey
) x
ON bt.PrimaryKey = x.PrimaryKey
AND x.row > 1
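As a rough check of the delete-then-update order, the sketch below replays the idea with Python's sqlite3 module (assuming SQLite 3.25+ for ROW_NUMBER). SQLite has no DELETE ... JOIN, so the delete is written with IN and the update with a correlated subquery; the table names without # and the function name reassign_customers are stand-ins for illustration only.

```python
import sqlite3

# (PrimaryKey, Current_CustomerID, Order_ID, OrderCategory, Updated_CustomerId)
DATASET = [
    (1, 17395001, 4451784, "Kitchen", 25693110),
    (2, 25693110, 4451784, "Kitchen", 25693110),
    (3, 25693110, 2083059, "Kitchen", 25693110),
    (4, 25693110, 2163679, "Kitchen", 25693110),
    (5, 25693110, 2171466, "Kitchen", 25693110),
    (6, 25693110, 2163679, "Bathroom", 25693110),
    (7, 25693110, 2171466, "Bathroom", 25693110),
    (8, 17360410, 3377931, "Furniture", 16303984),
    (9, 21044488, 3377931, "Furniture", 16303984),
    (10, 1534323, 2641714, "Furniture", 16303984),
    (11, 16303984, 2641726, "Furniture", 16303984),
    (12, 16303984, 2641793, "Furniture", 16303984),
    (13, 16303984, 2641816, "Furniture", 16303984),
    (14, 16303345, 2641816, "Garden", 16301239),
    (15, 12345678, 1239065, "Medicine", 1075432),
]


def reassign_customers(dataset):
    """Delete rows that would collide after the update, then apply the update."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DataSet (PrimaryKey INTEGER PRIMARY KEY, "
        "Current_CustomerID INTEGER, Order_ID INTEGER, "
        "OrderCategory TEXT, Updated_CustomerId INTEGER)"
    )
    conn.execute(
        "CREATE TABLE BaseTable (PrimaryKey INTEGER PRIMARY KEY, "
        "CustomerID INTEGER, Order_ID INTEGER, OrderCategory TEXT, "
        "UNIQUE (CustomerID, Order_ID, OrderCategory))"
    )
    conn.executemany("INSERT INTO DataSet VALUES (?, ?, ?, ?, ?)", dataset)
    conn.execute(
        "INSERT INTO BaseTable "
        "SELECT PrimaryKey, Current_CustomerID, Order_ID, OrderCategory FROM DataSet"
    )
    # Step 1: keep one row per future key (lowest Current_CustomerID), delete the rest.
    conn.execute(
        """
        DELETE FROM BaseTable
        WHERE PrimaryKey IN (
            SELECT PrimaryKey FROM (
                SELECT PrimaryKey,
                       ROW_NUMBER() OVER (
                           PARTITION BY Updated_CustomerId, Order_ID, OrderCategory
                           ORDER BY Current_CustomerID) AS rn
                FROM DataSet)
            WHERE rn > 1)
        """
    )
    # Step 2: the surviving rows can now take the new customer ID safely.
    conn.execute(
        """
        UPDATE BaseTable
        SET CustomerID = (SELECT d.Updated_CustomerId
                          FROM DataSet d
                          WHERE d.PrimaryKey = BaseTable.PrimaryKey)
        """
    )
    rows = conn.execute(
        "SELECT PrimaryKey, CustomerID FROM BaseTable ORDER BY PrimaryKey"
    ).fetchall()
    conn.close()
    return rows


print(reassign_customers(DATASET))
```

With this ordering, PrimaryKeys 2 and 9 are the ones removed, and PrimaryKeys 1 and 8 take the new customer IDs without tripping the unique index.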

Most efficient way of finding duplicates SQL Server

The fiddle:
CREATE TABLE person
([first_name] varchar(10), [surname] varchar(10), [date_of_birth] date, [person_id] int);
INSERT INTO person
([first_name], [surname], [date_of_birth] ,[person_id])
VALUES
('Alice', 'AA', '1/1/1990', 1),
('Bob' , 'BB', '1/1/1990', 3),
('Carol', 'CC', '1/1/1990', 4),
('Kate' , 'KK', '1/1/1990', 7);
CREATE TABLE person_membership
([person_id] int, [status_flag] varchar(1), [membership_id] int);
INSERT INTO person_membership
([person_id], [status_flag], [membership_id])
VALUES
(1, 'A', 10),
(1, 'A', 20),
(3, 'A', 30),
(4, 'A', 40),
(7, 'A', 60),
(7, 'T', 70);
CREATE TABLE memship
([membership_id] int, [memship_status] varchar(1));
INSERT INTO memship
([membership_id], [memship_status])
VALUES
(10, 'A'),
(20, 'A'),
(30, 'A'),
(40, 'A'),
(50, 'T'),
(60, 'A'),
(70, 'A');
The query:
WITH t AS
(SELECT first_name, surname, date_of_birth, p.person_id, m.membership_id
FROM person p
INNER JOIN person_membership pm ON p.person_id=pm.person_id
INNER JOIN memship m ON pm.membership_id = m.membership_id
WHERE pm.status_flag='A' and m.memship_status='A')
SELECT t.first_name, t.surname, t.date_of_birth, t.person_id, t1.membership_id
FROM t
INNER JOIN t t1 ON t.person_id=t1.person_id
GROUP BY t.first_name, t.surname, t.date_of_birth, t.person_id, t1.membership_id
HAVING count(*) > 1
The problem:
Find and display only those records that are marked as active and where multiple membership IDs are assigned to one person ID.
The expected outcome:
The question:
My query works fine and gives me the expected outcome but the execution plan looks rather convoluted. What are the better, more elegant, expert-recommended ways of doing it?
Seems like you don't need that big GROUP BY at all, you could use a windowed function inside the CTE instead:
WITH Counts AS(
SELECT p.first_name,
p.surname,
p.date_of_birth,
p.person_id,
m.membership_id,
COUNT(*) OVER (PARTITION BY p.person_id) AS PersonMemCount
FROM person p
INNER JOIN person_membership pm ON p.person_id=pm.person_id
INNER JOIN memship m ON pm.membership_id = m.membership_id
WHERE pm.status_flag='A'
AND m.memship_status='A')
SELECT C.first_name,
C.surname,
C.date_of_birth,
C.person_id,
C.membership_id
FROM Counts C
WHERE C.PersonMemCount > 1;
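For anyone who wants to confirm the windowed version against the fiddle data, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+). The dates are stored as ISO text because SQLite has no date type, and the function name is made up for the example.

```python
import sqlite3


def people_with_multiple_active_memberships():
    """Return (first_name, person_id, membership_id) for people with >1 active membership."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (first_name TEXT, surname TEXT,
                             date_of_birth TEXT, person_id INTEGER);
        INSERT INTO person VALUES
            ('Alice', 'AA', '1990-01-01', 1),
            ('Bob',   'BB', '1990-01-01', 3),
            ('Carol', 'CC', '1990-01-01', 4),
            ('Kate',  'KK', '1990-01-01', 7);
        CREATE TABLE person_membership (person_id INTEGER, status_flag TEXT,
                                        membership_id INTEGER);
        INSERT INTO person_membership VALUES
            (1, 'A', 10), (1, 'A', 20), (3, 'A', 30),
            (4, 'A', 40), (7, 'A', 60), (7, 'T', 70);
        CREATE TABLE memship (membership_id INTEGER, memship_status TEXT);
        INSERT INTO memship VALUES
            (10, 'A'), (20, 'A'), (30, 'A'), (40, 'A'),
            (50, 'T'), (60, 'A'), (70, 'A');
        """
    )
    rows = conn.execute(
        """
        WITH Counts AS (
            SELECT p.first_name, p.person_id, m.membership_id,
                   -- counted after the WHERE filter, so only active rows count
                   COUNT(*) OVER (PARTITION BY p.person_id) AS PersonMemCount
            FROM person p
            JOIN person_membership pm ON p.person_id = pm.person_id
            JOIN memship m ON pm.membership_id = m.membership_id
            WHERE pm.status_flag = 'A' AND m.memship_status = 'A')
        SELECT first_name, person_id, membership_id
        FROM Counts
        WHERE PersonMemCount > 1
        ORDER BY membership_id
        """
    ).fetchall()
    conn.close()
    return rows


print(people_with_multiple_active_memberships())
```

Only Alice qualifies here: Kate's second membership is flagged 'T' in person_membership, so it is filtered out before the window count is taken.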

Get the Average of a Datediff function using a partition by in Snowflake

I am looking to understand what the average amount of days between transactions is for each of the customers in my database using Snowflake.
select Customer_ID,Day_ID,
datediff(Day,lag(Day_ID) over (Partition by Customer_ID ORDER BY DAY_ID), DAY_ID) as Time_Since
from Table
order by Customer_ID, Day_ID
The code above works to get me the time_elapsed but when I try to add an average function I get an error:
select Customer_ID
avg(datediff(Day,lag(Day_ID) over (Partition by Customer_ID ORDER BY DAY_ID), DAY_ID)) as AVG_Time_Since
from Table
order by Customer_ID
group by Customer_ID
The error reads:
SQL compilation error: Window function [LAG(TABLE.DAY_ID) OVER (PARTITION BY TABLE.CUSTOMER_ID ORDER BY TABLE.DAY_ID ASC NULLS LAST)] may not appear inside an aggregate function.
Any ideas?
You can nest them and get the answer you're seeking.
Note: the CTE below only supplies sample data. To run this against your own data, delete the CTE at the beginning and replace FROM cte with FROM YourTable.
WITH cte as
(SELECT column1 customer_id, column2::date day_id
FROM
VALUES (1, '2019-01-01'), (1, '2019-01-06'), (1, '2019-01-15'), (1, '2019-01-25'), (1, '2019-01-27'), (1, '2019-01-31'), (2, '2019-01-01'), (2, '2019-01-08'), (2, '2019-01-13'), (2, '2019-01-17'), (2, '2019-01-21'), (2, '2019-01-25'), (2, '2019-02-02'), (3, '2019-02-12'), (3, '2019-02-14'), (3, '2019-02-18'), (3, '2019-02-23'), (3, '2019-03-04'), (3, '2019-03-10'))
SELECT customer_id,
avg(time_since) AVG_Time_Since
FROM
(SELECT Customer_ID,
Day_ID,
datediff(DAY, lag(Day_ID) OVER (PARTITION BY Customer_ID
ORDER BY DAY_ID), DAY_ID) AS Time_Since
FROM cte
ORDER BY Customer_ID,
Day_ID)
GROUP BY customer_id ;
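The same nesting (window function in the inner query, aggregate in the outer one) can be tried locally with Python's sqlite3 module. This is only a sketch: SQLite has no DATEDIFF, so the day difference is computed with julianday(), which is an SQLite-specific substitute and not Snowflake syntax, and the table name tx is invented for the example. It assumes SQLite 3.25+ for LAG.

```python
import sqlite3

# Sample rows from the answer: (customer_id, day_id)
TRANSACTIONS = [
    (1, "2019-01-01"), (1, "2019-01-06"), (1, "2019-01-15"),
    (1, "2019-01-25"), (1, "2019-01-27"), (1, "2019-01-31"),
    (2, "2019-01-01"), (2, "2019-01-08"), (2, "2019-01-13"),
    (2, "2019-01-17"), (2, "2019-01-21"), (2, "2019-01-25"),
    (2, "2019-02-02"),
    (3, "2019-02-12"), (3, "2019-02-14"), (3, "2019-02-18"),
    (3, "2019-02-23"), (3, "2019-03-04"), (3, "2019-03-10"),
]


def avg_days_between(transactions):
    """Average gap in days between consecutive transactions, per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (customer_id INTEGER, day_id TEXT)")
    conn.executemany("INSERT INTO tx VALUES (?, ?)", transactions)
    rows = conn.execute(
        """
        SELECT customer_id, AVG(time_since) AS avg_time_since
        FROM (
            -- inner query: the window function runs here, one gap per row;
            -- the first row of each customer has no previous day and yields NULL
            SELECT customer_id,
                   CAST(julianday(day_id)
                        - julianday(LAG(day_id) OVER (PARTITION BY customer_id
                                                      ORDER BY day_id))
                        AS INTEGER) AS time_since
            FROM tx)
        GROUP BY customer_id      -- outer query: plain aggregate, NULLs ignored
        ORDER BY customer_id
        """
    ).fetchall()
    conn.close()
    return dict(rows)


print(avg_days_between(TRANSACTIONS))
```

Because AVG skips NULLs, the first transaction of each customer does not drag the average down; customer 1, for example, averages five gaps (5, 9, 10, 2 and 4 days).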

how to select and join same table in mssql [duplicate]

I have a simple categories table with the following columns:
Id
Name
ParentId
So, any number of categories can be children of a category. Take for example the following hierarchy:
I want a simple query that returns the category "Business Laptops" and also returns a column with all its parents, comma-separated or something similar:
Or take the following example:
Recursive cte to the rescue....
Create and populate sample table (Please save us this step in your future questions):
CREATE TABLE #T
(
id int,
name varchar(100),
parent_id int
)
INSERT INTO #T VALUES
(1, 'A', NULL),
(2, 'A.1', 1),
(3, 'A.2', 1),
(4, 'A.1.1', 2),
(5, 'B', NULL),
(6, 'B.1', 5),
(7, 'B.1.1', 6),
(8, 'B.2', 5),
(9, 'A.1.1.1', 4),
(10, 'A.1.1.2', 4)
The cte:
;WITH CTE AS
(
SELECT id, name, name as path, parent_id
FROM #T
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.name, cast(cte.path +','+ t.name as varchar(100)), t.parent_id
FROM #T t
INNER JOIN CTE ON t.parent_id = CTE.id
)
The query:
SELECT id, name, path
FROM CTE
Results:
id name path
1 A A
5 B B
6 B.1 B,B.1
8 B.2 B,B.2
7 B.1.1 B,B.1,B.1.1
2 A.1 A,A.1
3 A.2 A,A.2
4 A.1.1 A,A.1,A.1.1
9 A.1.1.1 A,A.1,A.1.1,A.1.1.1
10 A.1.1.2 A,A.1,A.1.1,A.1.1.2
See online demo on rextester
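If the online demo is unavailable, the recursive CTE can be reproduced with Python's sqlite3 module. This sketch assumes SQLite's dialect (WITH RECURSIVE and || for concatenation instead of + and CAST); the table name categories and the function name category_paths are placeholders for illustration.

```python
import sqlite3

# Sample rows from the answer: (id, name, parent_id)
CATEGORIES = [
    (1, "A", None), (2, "A.1", 1), (3, "A.2", 1), (4, "A.1.1", 2),
    (5, "B", None), (6, "B.1", 5), (7, "B.1.1", 6), (8, "B.2", 5),
    (9, "A.1.1.1", 4), (10, "A.1.1.2", 4),
]


def category_paths(categories):
    """Map each category id to its comma-separated path from the root."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (id INTEGER, name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", categories)
    rows = conn.execute(
        """
        WITH RECURSIVE cte(id, name, path) AS (
            -- anchor: root categories start their own path
            SELECT id, name, name
            FROM categories
            WHERE parent_id IS NULL
            UNION ALL
            -- recursive step: append the child's name to the parent's path
            SELECT c.id, c.name, cte.path || ',' || c.name
            FROM categories c
            JOIN cte ON c.parent_id = cte.id)
        SELECT id, path FROM cte ORDER BY id
        """
    ).fetchall()
    conn.close()
    return dict(rows)


paths = category_paths(CATEGORIES)
print(paths[9])
```

To get the path of a single category, as the question asks, filter the final SELECT by id or name rather than filtering inside the CTE, since the recursion has to start from the roots.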

SQL Server count number of records having the same value then group by

I have a table called ReportStats.
The table columns are:
Id, MemberId, Action and RecordId.
Action can have a value of create, change or deactivate.
What I want to do is count the number of each action per MemberId.
Is this possible?
Create table
CREATE TABLE #ReportStats (Id INT IDENTITY(1, 1), MemberId INT, [Action] VARCHAR(10))
Insert some sample data
INSERT INTO #ReportStats VALUES (1, 'create'),
(1, 'create'),
(1, 'change'),
(1, 'deactivate'),
(1, 'change'),
(1, 'deactivate'),
(2, 'create'),
(2, 'change'),
(2, 'change'),
(2, 'change'),
(2, 'change'),
(2, 'deactivate')
A simple COUNT(*) and GROUP BY will do the trick:
Query
SELECT
MemberId,
Action,
COUNT(*) AS Cnt
FROM #ReportStats
GROUP BY MemberId, Action
ORDER BY MemberId
Output
MemberId Action Cnt
1 change 2
1 create 2
1 deactivate 2
2 change 4
2 create 1
2 deactivate 1
Another way of doing the same is to use the COUNT function along with an OVER clause, with DISTINCT collapsing the repeated rows:
SELECT DISTINCT
MemberId,
Action,
COUNT(*) OVER (PARTITION BY MemberID,Action) as CntAction
FROM #ReportStats
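To see that the two approaches agree, here is a short sketch that runs both through Python's sqlite3 module (assuming SQLite 3.25+ for the OVER clause). The ORDER BY on both columns is added only so the two result lists can be compared directly, and the function name is hypothetical.

```python
import sqlite3

# Sample rows from the answer: (MemberId, Action)
ACTIONS = [
    (1, "create"), (1, "create"), (1, "change"), (1, "deactivate"),
    (1, "change"), (1, "deactivate"),
    (2, "create"), (2, "change"), (2, "change"), (2, "change"),
    (2, "change"), (2, "deactivate"),
]


def count_actions(actions):
    """Return per-member action counts via GROUP BY and via COUNT(*) OVER."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ReportStats (Id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "MemberId INTEGER, Action TEXT)"
    )
    conn.executemany(
        "INSERT INTO ReportStats (MemberId, Action) VALUES (?, ?)", actions
    )
    grouped = conn.execute(
        """
        SELECT MemberId, Action, COUNT(*) AS Cnt
        FROM ReportStats
        GROUP BY MemberId, Action
        ORDER BY MemberId, Action
        """
    ).fetchall()
    windowed = conn.execute(
        """
        SELECT DISTINCT MemberId, Action,
               COUNT(*) OVER (PARTITION BY MemberId, Action) AS CntAction
        FROM ReportStats
        ORDER BY MemberId, Action
        """
    ).fetchall()
    conn.close()
    return grouped, windowed


grouped, windowed = count_actions(ACTIONS)
print(grouped)
print(grouped == windowed)
```

The GROUP BY form is usually the more direct choice; the windowed form is mainly useful when the count needs to sit next to the detail rows instead of replacing them.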

Resources