How to efficiently find minimum associated ID after applying transitivity - sql-server

I've a very large table with ID variables in two columns. I'd like to find the minimum associated ID after applying transitivity. For example, if ID1 = 1 and ID2 = 2 for one record and ID1 = 2 and ID2 = 3 for another record, I'd like the results to ultimately yield three records where the first column has an ID (1, 2, and 3) and the second column has a Min_ID (all equal to 1, in the above example).
I've determined that IDs can be related in at most 6 ways, and I believe the code below is a minimal working example.
Is there a more efficient way to do this?
Source
| ID1 | ID2 |
|------|-------|
| 1 | 1 |
| 2 | 2 |
| 1 | 1 |
| 1 | 2 |
| 2 | 11 |
| 11 | 13 |
| 13 | 99 |
| 99 | 1000 |
| 1000 | 97887 |
| 3 | 5 |
| 5 | 17 |
| 17 | 19 |
| 23 | 34 |
Results
| ID | Min_ID |
|-------|--------|
| 1 | 1 |
| 2 | 1 |
| 3 | 3 |
| 5 | 3 |
| 11 | 1 |
| 13 | 1 |
| 17 | 3 |
| 19 | 3 |
| 23 | 23 |
| 34 | 23 |
| 99 | 1 |
| 1000 | 1 |
| 97887 | 1 |
A working, albeit potentially inefficient, example:
IF OBJECT_ID('tempdb..##IDs') IS NOT NULL DROP TABLE ##IDs
GO
CREATE TABLE ##IDs
(
ID1 int
,ID2 int
);
INSERT INTO ##IDs (ID1, ID2)
VALUES
(1, 1),
(2, 2),
(1, 1),
(1, 2),
(2, 11),
(11, 13),
(13, 99),
(99, 1000),
(1000, 97887),
(3, 5),
(5, 17),
(17, 19),
(23, 34)
;
WITH t1 AS
(
SELECT
ID1
,ID2
FROM ##IDs
UNION
SELECT
ID2
,ID1
FROM ##IDs
),
t2 AS
(
SELECT
*
FROM t1
UNION
SELECT
a.ID1
,b.ID2
FROM t1 a
LEFT JOIN t1 b ON
a.ID2 = b.ID1
UNION
SELECT
b.ID2
,a.ID1
FROM t1 a
LEFT JOIN t1 b ON
a.ID2 = b.ID1
),
t3 AS
(
SELECT
*
FROM t2
UNION
SELECT
a.ID1
,b.ID2
FROM t2 a
LEFT JOIN t2 b ON
a.ID2 = b.ID1
UNION
SELECT
b.ID2
,a.ID1
FROM t2 a
LEFT JOIN t2 b ON
a.ID2 = b.ID1
),
t4 AS
(
SELECT
*
FROM t3
UNION
SELECT
a.ID1
,b.ID2
FROM t3 a
LEFT JOIN t3 b ON
a.ID2 = b.ID1
UNION
SELECT
b.ID2
,a.ID1
FROM t3 a
LEFT JOIN t3 b ON
a.ID2 = b.ID1
)
SELECT
ID1 AS [ID]
,MIN(ID2) Min_ID
FROM t4
GROUP BY
ID1

The complexity here for me was finding an appropriate stopping condition for the linking recursion. That said, it's really quite simple: stop linking when the ID links to itself.
IF OBJECT_ID('tempdb..##IDs') IS NOT NULL DROP TABLE ##IDs
GO
CREATE TABLE ##IDs
(
ID1 int
,ID2 int
);
INSERT INTO ##IDs (ID1, ID2)
VALUES
(1, 1),
(2, 2),
(1, 1),
(1, 2),
(2, 11),
(11, 13),
(13, 99),
(99, 1000),
(1000, 97887),
(3, 5),
(5, 17),
(17, 19),
(23, 34)
;
WITH t1 AS
(
SELECT
ID1
,ID2
FROM ##IDs
UNION
SELECT
ID2
,ID1
FROM ##IDs
UNION ALL
SELECT
c.ID1
,t1.ID2
FROM ##IDs c
INNER JOIN t1 ON
c.ID2 = t1.ID1
WHERE
c.ID1 <> c.ID2
UNION ALL
SELECT
t1.ID2
,c.ID1
FROM ##IDs c
INNER JOIN t1 ON
c.ID2 = t1.ID1
WHERE
c.ID1 <> c.ID2
)
SELECT
ID1 AS [ID]
,MIN(ID2) Min_ID
FROM t1
GROUP BY
ID1
;
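
A recursive CTE has to stay within the MAXRECURSION limit (100 by default), so for long chains a loop-based alternative may be worth trying. The following is only a sketch of a different technique, iterative minimum-label propagation, and not the method used above. The temp table names #Edges and #Labels are hypothetical, and whether it is actually faster on a very large table would need to be measured.

```sql
-- Sketch only: #Edges and #Labels are hypothetical names, not part of the original post.
IF OBJECT_ID('tempdb..#Labels') IS NOT NULL DROP TABLE #Labels;
IF OBJECT_ID('tempdb..#Edges') IS NOT NULL DROP TABLE #Edges;

CREATE TABLE #Edges (ID1 int NOT NULL, ID2 int NOT NULL);
INSERT INTO #Edges (ID1, ID2)
VALUES (1, 1), (2, 2), (1, 1), (1, 2), (2, 11), (11, 13), (13, 99),
       (99, 1000), (1000, 97887), (3, 5), (5, 17), (17, 19), (23, 34);

-- Every ID starts out labelled with itself.
CREATE TABLE #Labels (ID int NOT NULL PRIMARY KEY, Min_ID int NOT NULL);
INSERT INTO #Labels (ID, Min_ID)
SELECT ID1, ID1 FROM #Edges
UNION
SELECT ID2, ID2 FROM #Edges;

DECLARE @changed int = 1;
WHILE @changed > 0
BEGIN
    -- Lower each ID's label to the smallest label held by any direct neighbour.
    UPDATE l
    SET l.Min_ID = n.Min_ID
    FROM #Labels AS l
    INNER JOIN
    (
        SELECT e.ID, MIN(nl.Min_ID) AS Min_ID
        FROM
        (
            SELECT ID1 AS ID, ID2 AS Neighbour FROM #Edges
            UNION
            SELECT ID2, ID1 FROM #Edges
        ) AS e
        INNER JOIN #Labels AS nl ON nl.ID = e.Neighbour
        GROUP BY e.ID
    ) AS n ON n.ID = l.ID
    WHERE n.Min_ID < l.Min_ID;

    -- Stop once a full pass changes nothing.
    SET @changed = @@ROWCOUNT;
END;

SELECT ID, Min_ID
FROM #Labels
ORDER BY ID;
```

Labels only ever decrease, so the loop terminates; the number of passes grows with the length of the longest chain of linked IDs.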

Related

Track the changes of a few columns in an existing table leveraging primary keys?

I'm currently trying to track the changes of a few columns (let's call them col1 & col2) in a SQL Server table. The table is not being "updated/inserted/deleted" over time; new records are just being added to it (please see below 10/01 vs 11/01).
My end goal would be to run a SQL query or stored procedure that highlights the changes over time, using primary keys and following this framework:
PrimaryKey | ColumnName | BeforeValue | AfterValue | Date
e.g.
Original table:
+-------+--------+--------+--------+
| PK1 | Col1 | Col2 | Date |
+-------+--------+--------+--------+
| 1 | a | e | 10/01 |
| 1 | b | e | 11/01 |
| 2 | c | e | 10/01 |
| 2 | d | f | 11/01 |
+-------+--------+--------+--------+
Output:
+--------------+--------------+---------------+--------------+--------+
| PrimaryKey | ColumnName | BeforeValue | AfterValue | Date |
+--------------+--------------+---------------+--------------+--------+
| 1 | Col1 | a | b | 11/01 |
| 2 | Col1 | c | d | 11/01 |
| 2 | Col2 | e | f | 11/01 |
+--------------+--------------+---------------+--------------+--------+
Any help appreciated.
Here is some code which is a bit clunky, but seems to work. Basically, for each row I try to find an earlier row with a different value. This is done twice, once for Col1 and once for Col2.
To make it work I had to add a unique PK field. I don't know whether you have one or not, but you can easily add it as an identity field, either to your real table or to the table used for the calculations.
-- Sample data: PK is the added unique key per row, PK1 is the original key.
declare @TestTable table (PK int, PK1 int, Col1 varchar(1), Col2 varchar(1), [Date] date)
insert into @TestTable (PK, PK1, Col1, Col2, [Date])
select 1, 1, 'a', 'e', '10 Jan 2018'
union all select 2, 1, 'b', 'e', '11 Jan 2018'
union all select 3, 2, 'c', 'e', '10 Jan 2018'
union all select 4, 2, 'd', 'f', '11 Jan 2018'
-- Col1: pair each row with the latest earlier row (same PK1) that has a different Col1.
select T1.[Date], T1.PK1 as PrimaryKey, 'Col1' as ColumnName, T2.Col1 as BeforeValue, T1.Col1 as AfterValue
from @TestTable T1
inner join @TestTable T2 on T2.PK = (
select top 1 PK
from @TestTable T21
where T21.PK1 = T1.PK1 and T21.Col1 != T1.Col1 and T21.[Date] < T1.[Date]
order by T21.[Date] desc
)
union all
-- Col2: same lookup, comparing Col2 instead.
select T1.[Date], T1.PK1, 'Col2', T3.Col2, T1.Col2
from @TestTable T1
inner join @TestTable T3 on T3.PK = (
select top 1 PK
from @TestTable T31
where T31.PK1 = T1.PK1 and T31.Col2 != T1.Col2 and T31.[Date] < T1.[Date]
order by T31.[Date] desc
)
order by [Date], PrimaryKey
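
If the server is SQL Server 2012 or later (an assumption, since the question does not state a version), the same "compare with the previous row" idea can be sketched with LAG instead of self-joins. Note one difference: this version compares each row only with the immediately preceding row for the same PK1, so it reports a change only on the date the value actually changed. It also needs no extra unique PK column.

```sql
-- Sketch assuming SQL Server 2012+ (LAG); sample rows mirror the question's table.
DECLARE @TestTable TABLE (PK1 int, Col1 varchar(1), Col2 varchar(1), [Date] date);
INSERT INTO @TestTable (PK1, Col1, Col2, [Date])
VALUES (1, 'a', 'e', '20180110'),
       (1, 'b', 'e', '20180111'),
       (2, 'c', 'e', '20180110'),
       (2, 'd', 'f', '20180111');

WITH Prev AS
(
    -- Attach the previous row's values (per PK1, ordered by date) to each row.
    SELECT PK1, [Date], Col1, Col2,
           LAG(Col1) OVER (PARTITION BY PK1 ORDER BY [Date]) AS PrevCol1,
           LAG(Col2) OVER (PARTITION BY PK1 ORDER BY [Date]) AS PrevCol2
    FROM @TestTable
)
SELECT p.PK1 AS PrimaryKey, c.ColumnName, c.BeforeValue, c.AfterValue, p.[Date]
FROM Prev AS p
-- Unpivot the two tracked columns into one row per column.
CROSS APPLY (VALUES ('Col1', p.PrevCol1, p.Col1),
                    ('Col2', p.PrevCol2, p.Col2)) AS c (ColumnName, BeforeValue, AfterValue)
-- The first row per PK1 has a NULL "before" value and is filtered out here,
-- as is any change to or from NULL.
WHERE c.BeforeValue <> c.AfterValue
ORDER BY p.[Date], p.PK1, c.ColumnName;
```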

T SQL Pivot still one row per pivot column

I am trying to create a pivot table in SQL Server 2008 R2, reproducing an Access pivot table in SQL. When I run the following script, I get one record for each pivot column instead of one record with two populated pivoted columns.
SELECT * FROM
(SELECT DataView.MAEID
, DataView.MoYr
, DataView.ChemicalName
, TblCategories.CatDesc
, DataView.[CAS#]
, DataView.HAP
, Sum([SpecAE]/2000) AS [Total SpecAE Tons]
, SUM([SpecAE]) AS [SpecAE]
FROM TblCategories
INNER JOIN DataView
ON TblCategories.CatID = DataView.Category
GROUP BY DataView.MAEID
, DataView.MoYr
, DataView.ChemicalName
, DataView.[CAS#]
, DataView.HAP
, TblCategories.CatDesc) TBL
PIVOT (
Sum([SpecAE])
FOR CatDesc IN ([INCIINERABLE LIQUIDS], [Supplemental Fuels])
)pvt
Any thoughts?
You haven't provided sample data, so I'll explain your issue with an example. Let's say I have a very simple table with some very simple values:
DECLARE @T TABLE
(
MainColumn INT NOT NULL,
PivotColumn INT NOT NULL,
SumColumn INT NOT NULL
);
INSERT @T VALUES
(1, 1, 10), (1, 1, 20), (1, 1, 30),
(1, 2, 60),
(1, 3, 50),
(2, 1, 10), (2, 1, 15),
(2, 2, 20),
(3, 1, 10),
(3, 2, 10),
(4, 1, 150);
If I perform the following query:
SELECT MainColumn,
PivotColumn,
PivotValue = SUM(SumColumn),
OtherSum = SUM(SumColumn / 5)
FROM @T
GROUP BY MainColumn, PivotColumn
ORDER BY MainColumn, PivotColumn
I get:
+------------+-------------+------------+----------+
| MainColumn | PivotColumn | PivotValue | OtherSum |
+------------+-------------+------------+----------+
| 1 | 1 | 60 | 12 |
| 1 | 2 | 60 | 12 |
| 1 | 3 | 50 | 10 |
| 2 | 1 | 25 | 5 |
| 2 | 2 | 20 | 4 |
| 3 | 1 | 10 | 2 |
| 3 | 2 | 10 | 2 |
| 4 | 1 | 150 | 30 |
+------------+-------------+------------+----------+
Now if I use a PIVOT to pivot the PivotValue for each PivotColumn, it's going to group by MainColumn AND OtherSum column. A pivot groups by every column that isn't part of the pivot.
So my result set will be split into (MainColumn=1, OtherSum=12), (MainColumn=1, OtherSum=10), (MainColumn=2, OtherSum=5), (MainColumn=2, OtherSum=4), etc... I will get a new line for each of these values. If the OtherSum value was unique for each line, I'd expect 8 lines with a pivot.
If I remove OtherSum from my result set, my result set is just going to group by MainColumn alone, so it'll all be on one line for each distinct MainColumn value, since that's the only column the pivot would group by.
If getting the other sum value is important, I can do something like the following:
SELECT P.MainColumn,
Val1A = P.[1],
Val1B = P.[1] / 5,
Val2A = P.[2],
Val2B = P.[2] / 5,
Val3A = P.[3],
Val3B = P.[3] / 5
FROM
(
-- Only MainColumn, the pivot column and the pivoted value are exposed to PIVOT.
SELECT MainColumn,
PivotColumn,
PivotValue = SUM(SumColumn)
FROM @T
GROUP BY MainColumn, PivotColumn
) AS T
PIVOT
(
SUM(PivotValue) FOR PivotColumn IN ([1], [2], [3])
) AS P;
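
As an alternative to PIVOT, conditional aggregation groups only by the columns listed in GROUP BY, so extra aggregates cannot split the rows. This is a minimal sketch using the same sample values as above; it is a different technique from the PIVOT shown, offered only as an illustration.

```sql
-- Sketch: conditional aggregation instead of PIVOT, same sample data as above.
DECLARE @T TABLE
(
    MainColumn INT NOT NULL,
    PivotColumn INT NOT NULL,
    SumColumn INT NOT NULL
);
INSERT @T VALUES
(1, 1, 10), (1, 1, 20), (1, 1, 30),
(1, 2, 60),
(1, 3, 50),
(2, 1, 10), (2, 1, 15),
(2, 2, 20),
(3, 1, 10),
(3, 2, 10),
(4, 1, 150);

-- One output row per MainColumn; each CASE keeps only the rows for one pivot value.
SELECT MainColumn,
       Val1A = SUM(CASE WHEN PivotColumn = 1 THEN SumColumn END),
       Val1B = SUM(CASE WHEN PivotColumn = 1 THEN SumColumn / 5 END),
       Val2A = SUM(CASE WHEN PivotColumn = 2 THEN SumColumn END),
       Val2B = SUM(CASE WHEN PivotColumn = 2 THEN SumColumn / 5 END),
       Val3A = SUM(CASE WHEN PivotColumn = 3 THEN SumColumn END),
       Val3B = SUM(CASE WHEN PivotColumn = 3 THEN SumColumn / 5 END)
FROM @T
GROUP BY MainColumn
ORDER BY MainColumn;
```

Here the "B" columns divide each row before summing, matching SUM(SumColumn / 5) in the earlier query, whereas the PIVOT version divides the already-summed value; with integer division the two can differ.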

I want to make query to read subject score of student to mark

I have a database in SQL Server 2012 with two tables.
The first table's columns are:
| Student_ID | Subject_ID | Subject_score | Subject_mark |
|------------|------------|---------------|--------------|
| 1 | 1 | 92 | ? |
| 1 | 2 | 88 | ? |
The second table's columns are:
| Score_ID | Subject_mark |
|----------|--------------|
| 100 | A+ |
| 95 | A |
| 90 | B+ |
| 85 | B |
| 80 | c |
| 75 | E |
I want to write a query to get Subject_mark from the second table and put it in Subject_mark in the first table
select from Subject_mark on table_1 ranged from table_2
I'm on my phone, so I'll have to check this later, as something seems a bit wrong about it, but try something like
Update table_1
Set Subject_mark = (select Subject_mark from table_2
where Score_ID <= (select Subject_score)
and Score_ID > (select Subject_score - 5))
Sample execution with given sample data:
DECLARE @FirstTable TABLE (Student_ID INT, Subject_ID INT, Subject_score INT, Subject_mark VARCHAR(3))
INSERT INTO @FirstTable (Student_ID, Subject_ID, Subject_score, Subject_mark)
VALUES
(1, 1, 92, NULL),
(1, 2, 88, NULL)
DECLARE @SecondTable TABLE (Score_ID INT, Subject_mark VARCHAR(3))
INSERT INTO @SecondTable (Score_ID, Subject_mark)
VALUES
(100, 'A+'),
(95 , 'A'),
(90 , 'B+'),
(85 , 'B'),
(80 , 'C'),
(75 , 'E')
-- For each student row, pick the mark whose Score_ID lies in (Subject_score - 5, Subject_score].
UPDATE @FirstTable
SET Subject_mark = (SELECT Subject_mark
FROM @SecondTable
WHERE Score_ID <= (SELECT Subject_score) AND
Score_ID > (SELECT Subject_score - 5))
SELECT * FROM @FirstTable
Given that marks are classified at a difference of 5 marks each, this should work.
UPDATE t1
SET t1.Subject_mark = CASE WHEN t1.Subject_score < 75 THEN 'E' ELSE t2.Subject_mark END
FROM FirstTable t1
LEFT OUTER JOIN SecondTable t2 ON t1.Subject_score <= t2.Score_ID AND t1.Subject_score > t2.Score_ID - 5;
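
Both updates above rely on the bands being exactly 5 points apart. A sketch that drops that assumption is shown below. It assumes, as the first answer does, that Score_ID is the lower bound of each band (so 92 maps to the row with Score_ID 90); if Score_ID is meant as the upper bound, the comparison and sort order would need to be reversed.

```sql
-- Sketch: look up the highest band whose Score_ID does not exceed the score.
-- Assumes Score_ID is the lower bound of a band; the question does not say so explicitly.
DECLARE @FirstTable TABLE (Student_ID INT, Subject_ID INT, Subject_score INT, Subject_mark VARCHAR(3));
INSERT INTO @FirstTable (Student_ID, Subject_ID, Subject_score, Subject_mark)
VALUES (1, 1, 92, NULL),
       (1, 2, 88, NULL);

DECLARE @SecondTable TABLE (Score_ID INT, Subject_mark VARCHAR(3));
INSERT INTO @SecondTable (Score_ID, Subject_mark)
VALUES (100, 'A+'), (95, 'A'), (90, 'B+'), (85, 'B'), (80, 'C'), (75, 'E');

UPDATE f
SET f.Subject_mark = b.Subject_mark
FROM @FirstTable AS f
OUTER APPLY
(
    -- Closest band at or below the score; no row (NULL mark) if the score is below every band.
    SELECT TOP (1) s.Subject_mark
    FROM @SecondTable AS s
    WHERE s.Score_ID <= f.Subject_score
    ORDER BY s.Score_ID DESC
) AS b;

SELECT * FROM @FirstTable;
```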

In SQL getting the Max() of a Count() for a specific Group by

My script
SELECT ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id, opt.Description, count(ans.Option_Id) as [Count]
FROM Answers ans
LEFT OUTER JOIN Questions que
ON ans.Questions_Id = que.Id
LEFT OUTER JOIN Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1
and ans.Questions_Id = 1
GROUP By ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id, opt.Description
ORDER BY 2, 5 desc
I am trying to get the top number responses (Description) for each Answer_Numeric. The result at the moment looks like this:
| Questions_Id | Answer_Numeric | Option_Id | Description | Count
-----------------------------------------------------------------------
| 1 | 1 | 27 | Technology | 183
| 1 | 1 | 24 | Personal Items | 1
| 1 | 2 | 28 | Wallet / Purse | 174
| 1 | 2 | 24 | Personal Items | 3
| 1 | 2 | 26 | Spiritual | 1
| 1 | 3 | 24 | Personal Items | 53
| 1 | 3 | 25 | Food / Fluids | 5
| 1 | 3 | 26 | Spiritual | 5
| 1 | 3 | 27 | Technology | 1
| 1 | 3 | 28 | Wallet / Purse | 1
As from the example data from above I need it to look like this:
| Questions_Id | Answer_Numeric | Option_Id | Description | Count
-----------------------------------------------------------------------
| 1 | 1 | 27 | Technology | 183
| 1 | 2 | 28 | Wallet / Purse | 174
| 1 | 3 | 24 | Personal Items | 53
I am pretty sure that I need to have a max or something in my Having clause but everything I have tried has not worked. Would really appreciate any help on this.
You can use ROW_NUMBER:
SELECT Questions_Id, Answer_Numeric, Option_Id, Description, [Count]
FROM (
SELECT ans.Questions_Id,ans.Answer_Numeric,ans.Option_Id,
opt.Description, count(ans.Option_Id) as [Count],
ROW_NUMBER() OVER (PARTITION BY ans.Questions_Id, ans.Answer_Numeric
ORDER BY count(ans.Option_Id) DESC) AS rn
FROM Answers ans
LEFT OUTER JOIN Questions que
ON ans.Questions_Id = que.Id
LEFT OUTER JOIN Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1
and ans.Questions_Id = 1
GROUP By ans.Questions_Id,
ans.Answer_Numeric,
ans.Option_Id,
opt.Description) AS t
WHERE t.rn = 1
ORDER BY 2, 5 desc
Alternatively, you can use RANK to handle ties, i.e. more than one row per Questions_Id, Answer_Numeric partition sharing the same maximum Count.
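To illustrate the difference, here is a small sketch with RANK on hypothetical, already-aggregated rows (the table variable @Counts and its tied values are made up for the example). Where ROW_NUMBER would keep one arbitrary row from a tie, RANK keeps all of them.

```sql
-- Sketch: @Counts is a hypothetical stand-in for the grouped result of the query above.
DECLARE @Counts TABLE (Answer_Numeric int, Description varchar(20), Cnt int);
INSERT INTO @Counts (Answer_Numeric, Description, Cnt)
VALUES (1, 'Technology', 183),
       (1, 'Personal Items', 1),
       (2, 'Food / Fluids', 5),
       (2, 'Spiritual', 5),      -- ties with 'Food / Fluids' for the top count
       (2, 'Technology', 1);

SELECT Answer_Numeric, Description, Cnt
FROM
(
    SELECT Answer_Numeric, Description, Cnt,
           RANK() OVER (PARTITION BY Answer_Numeric ORDER BY Cnt DESC) AS rnk
    FROM @Counts
) AS t
WHERE t.rnk = 1   -- every row sharing the maximum count gets rank 1
ORDER BY Answer_Numeric, Description;
```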
Use row_number():
SELECT *
FROM (SELECT ans.Questions_Id, ans.Answer_Numeric, ans.Option_Id, opt.Description,
count(*) as cnt,
row_number() over (partition by ans.Questions_Id, ans.Answer_Numeric
order by count(*) desc) as seqnum
FROM Answers ans LEFT OUTER JOIN
Questions que
ON ans.Questions_Id = que.Id LEFT OUTER JOIN
Options opt
ON ans.Option_Id = opt.Id
WHERE que.Survey_Id = 1 and ans.Questions_Id = 1
GROUP By ans.Questions_Id, ans.Answer_Numeric, ans.Option_Id, opt.Description
) t
WHERE seqnum = 1
ORDER BY 2, 5 desc;
The same result set can be obtained in other ways. I have used a sample data set here; you would merge your own joins into this code.
declare @Table1 TABLE
(Id int, Answer int, OptionId int, Description varchar(14), Count int)
;
INSERT INTO @Table1
(Id, Answer, OptionId, Description, Count)
VALUES
(1, 1, 27, 'Technology', 183),
(1, 1, 24, 'Personal Items', 1),
(1, 2, 28, 'Wallet / Purse', 174),
(1, 2, 24, 'Personal Items', 3),
(1, 2, 26, 'Spiritual', 1),
(1, 3, 24, 'Personal Items', 53),
(1, 3, 25, 'Food / Fluids', 5),
(1, 3, 26, 'Spiritual', 5),
(1, 3, 27, 'Technology', 1),
(1, 3, 28, 'Wallet / Purse', 1)
;
-- Note: the "<> 5" filter below is hard-coded for this sample data only.
SELECT tt.Id, tt.Answer, tt.OptionId, tt.Description, tt.Count
FROM @Table1 tt
INNER JOIN
(SELECT OptionId, MAX(Count)OVER(PARTITION BY OptionId ORDER BY OptionId)AS RN
FROM @Table1
GROUP BY OptionId,count) groupedtt
ON
tt.Count = groupedtt.RN
WHERE tt.Count <> 5
GROUP BY tt.Id, tt.Answer, tt.OptionId, tt.Description, tt.Count
OR
-- Keep a row only if no other row with the same Answer has a higher Count.
select distinct Count, Description , Id , Answer from @Table1 e where 1 =
(select count(distinct Count ) from @Table1 where
Count >= e.Count and (Answer = e.Answer))

Split column with delimited values for an inner join

I need to perform an inner join to a column containing delimited values like:
123;124;125;12;3433;35343;
Now what I am currently doing is this:
ALTER procedure [dbo].[GetFruitDetails]
(
@CrateID int
)
AS
SELECT Fruits.*, Fruits_Crates.CrateID
FROM Fruits_Crates INNER JOIN Fruits
ON Fruits_Crates.FruitID = Fruits.ID
WHERE Fruits_Crates.CrateID = @CrateID
Now the issue is I am saving the data this way:
| FruitCrateID | FruitID |
|--------------|---------|
| 1 | 1; |
| 2 | 1;2;3;4 |
| 3 | 3; |
How can I inner join FruitsIDs to the fruit table to get fruit details as well?
Using the method posted in this answer, you can convert the delimited string into rows of a temp table and then join to that:
Schema Setup:
CREATE TABLE Fruits_Crates
([FruitCrateID] int, [FruitID] varchar(10))
;
INSERT INTO Fruits_Crates
([FruitCrateID], [FruitID])
VALUES
(1, '1;'),
(2, '1;2;3;4;'),
(3, '3;')
;
CREATE TABLE Fruits
([FruitID] int, [FruitName] varchar(10))
;
INSERT INTO Fruits
([FruitID], [FruitName])
VALUES
(1, 'Apple'),
(2, 'Banana'),
(3, 'Orange'),
(4, 'Pear')
;
Insert to temp table:
SELECT A.[FruitCrateID],
Split.a.value('.', 'VARCHAR(100)') AS FruitId
INTO #fruits
FROM (SELECT [FruitCrateID],
CAST ('<M>' + REPLACE([FruitID], ';', '</M><M>') + '</M>' AS XML) AS String
FROM Fruits_Crates) AS A CROSS APPLY String.nodes ('/M') AS Split(a)
Join temp table to lookup:
SELECT t1.*, t2.FruitName
FROM #Fruits t1
INNER JOIN Fruits t2 on t1.FruitId = t2.FruitId
Results:
| FRUITCRATEID | FRUITID | FRUITNAME |
|--------------|---------|-----------|
| 1 | 1 | Apple |
| 2 | 1 | Apple |
| 2 | 2 | Banana |
| 2 | 3 | Orange |
| 2 | 4 | Pear |
| 3 | 3 | Orange |
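
On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function can replace the XML trick. The question does not state a version, so treat this as a sketch under that assumption; the table variables below are stand-ins for the Fruits_Crates and Fruits tables from the schema setup.

```sql
-- Sketch assuming SQL Server 2016+ (STRING_SPLIT); table variables stand in for the real tables.
DECLARE @Fruits_Crates TABLE (FruitCrateID int, FruitID varchar(10));
INSERT INTO @Fruits_Crates (FruitCrateID, FruitID)
VALUES (1, '1;'),
       (2, '1;2;3;4;'),
       (3, '3;');

DECLARE @Fruits TABLE (FruitID int, FruitName varchar(10));
INSERT INTO @Fruits (FruitID, FruitName)
VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Orange'), (4, 'Pear');

SELECT fc.FruitCrateID, f.FruitID, f.FruitName
FROM @Fruits_Crates AS fc
-- One row per delimited piece; the trailing ';' produces an empty piece.
CROSS APPLY STRING_SPLIT(fc.FruitID, ';') AS s
-- TRY_CAST returns NULL instead of raising an error for non-numeric pieces.
INNER JOIN @Fruits AS f ON f.FruitID = TRY_CAST(s.value AS int)
WHERE s.value <> ''
ORDER BY fc.FruitCrateID, f.FruitID;
```

Storing one FruitID per row in Fruits_Crates would avoid the split altogether and let the original inner join work unchanged.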
