Imagine that I have Table A:
Value1  Value2  Value3  Date1       Date2
1       2       1       2022/02/01  1900/02/01
1       2       2       2004/02/01  1992/02/01
2       2       2       2022/02/01  2001/07/01
3       3       1       2021/02/01  1990/02/01
3       3       2       2021/02/01  1980/02/01
3       3       3       2005/02/01  2022/02/01
I want a query that returns, for each (Value1, Value2) pair, the record with the max Date1; if several records share that Date1, the one with the max Date2.
For this example, I want to get the following results:
Value1  Value2  Value3  Date1       Date2
1       2       1       2022/02/01  1900/02/01
2       2       2       2022/02/01  2001/07/01
3       3       1       2021/02/01  1990/02/01
I'm trying to use the SELECT - OVER clause in the following query:
SELECT
Value1,
Value2,
--Value3 OVER (PARTITION BY Value1, Value2 ORDER BY Date1 DESC, Date2 DESC) AS Value3,
MAX(Date1) OVER (PARTITION BY Value1, Value2) AS FinalDate1,
MAX(Date2) OVER (PARTITION BY Value1, Value2) AS FinalDate2
FROM A
but I'm getting the following error:
Msg 8120, Level 16, State 1, Server b784427b284a, Line 21
Column 'A.Date1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Also, I'm not sure how to handle the Value3 column, since I just want to get its value without any aggregate function or similar.
Does anyone have an idea on how I can do this?
You can use ROW_NUMBER to partition and order the data the way you want, and then use a CTE to keep only the rows where ROW_NUMBER is 1.
Data preparation:
DECLARE @vTable TABLE (
    Value1 INT,
    Value2 INT,
    Value3 INT,
    Date1 DATE,
    Date2 DATE
);
INSERT INTO @vTable
VALUES
    (1, 2, 1, '2022-02-01', '1900-02-01'),
    (1, 2, 2, '2004-02-01', '1992-02-01'),
    (2, 2, 2, '2022-02-01', '2001-07-01'),
    (3, 3, 1, '2021-02-01', '1990-02-01'),
    (3, 3, 2, '2021-02-01', '1980-02-01'),
    (3, 3, 3, '2005-02-01', '2022-02-01');
The query script:
;WITH CTE AS (
    SELECT
        -- Date2 DESC breaks ties between rows that share the same Date1
        RowNumber = ROW_NUMBER() OVER (PARTITION BY Value1, Value2 ORDER BY Date1 DESC, Date2 DESC)
        , Value1
        , Value2
        , Value3
        , Date1
        , Date2
    FROM
        @vTable
)
SELECT
    *
FROM
    CTE
WHERE
    RowNumber = 1;
The result:
RowNumber  Value1  Value2  Value3  Date1       Date2
1          1       2       1       2022-02-01  1900-02-01
1          2       2       2       2022-02-01  2001-07-01
1          3       3       1       2021-02-01  1990-02-01
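For anyone who wants to try the ranking idea without a SQL Server instance, here is a minimal sketch in Python. It assumes SQLite (3.25 or later, for window functions) as a stand-in engine, stores the dates as ISO text so they sort correctly, and uses the question's sample data; the CTE name `ranked` and the variable `top_rows` are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (Value1 INT, Value2 INT, Value3 INT, Date1 TEXT, Date2 TEXT)"
)
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, "2022-02-01", "1900-02-01"),
        (1, 2, 2, "2004-02-01", "1992-02-01"),
        (2, 2, 2, "2022-02-01", "2001-07-01"),
        (3, 3, 1, "2021-02-01", "1990-02-01"),
        (3, 3, 2, "2021-02-01", "1980-02-01"),
        (3, 3, 3, "2005-02-01", "2022-02-01"),
    ],
)

# Rank rows inside each (Value1, Value2) group: newest Date1 first,
# then newest Date2 to break ties, and keep only the first row per group.
top_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT Value1, Value2, Value3, Date1, Date2,
               ROW_NUMBER() OVER (
                   PARTITION BY Value1, Value2
                   ORDER BY Date1 DESC, Date2 DESC
               ) AS rn
        FROM A
    )
    SELECT Value1, Value2, Value3, Date1, Date2
    FROM ranked
    WHERE rn = 1
    ORDER BY Value1, Value2
    """
).fetchall()

for row in top_rows:
    print(row)
```

Without a second sort key, two rows with the same Date1 are ranked in an arbitrary order, so the tie-break column is what makes the pick for the (3, 3) group repeatable.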
I have a table that looks like the following which was created using the following code...
SELECT Orders.ID, Orders.CHECKIN_DT_TM, Orders.CATALOG_TYPE,
Orders.ORDER_STATUS, Orders.ORDERED_DT_TM, Orders.COMPLETED_DT_TM,
Min(DateDiff("n",Orders.ORDERED_DT_TM,Orders.COMPLETED_DT_TM)) AS
Time_to_complete
FROM Orders
GROUP BY Orders.ORDER_ID, Orders.ID,
Orders.CHECKIN_DT_TM, Orders.CATALOG_TYPE, Orders.ORDERED_DT_TM,
Orders.COMPLETED_DT_TM
HAVING (((Orders.CATALOG_TYPE)="radiology"));
ID Time_to_complete ... .....
1 5
1 7
1 8
2 23
2 6
3 7
4 16
4 14
I'd like to extend this code so that it selects the smallest Time_to_complete value per subject ID, leaving the desired table:
ID Time_to_complete ... .....
1 5
2 6
3 7
4 14
I'm using Access and prefer to continue using Access to finish this code but I do have the option to use SQL Server if this is not possible in Access. Thanks!
I suspect you need a correlated subquery:
SELECT O.*, DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) AS Time_to_complete
FROM Orders O
WHERE DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) = (SELECT Min(DateDiff("n", O1.ORDERED_DT_TM, O1.COMPLETED_DT_TM))
FROM Orders O1
WHERE O1.ORDER_ID = O.ORDER_ID AND . . .
);
EDIT: If you want exactly one record per group, you can do this instead:
SELECT O.*, DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) AS Time_to_complete
FROM Orders O
WHERE o.pk = (SELECT TOP (1) o1.pk
FROM Orders O1
WHERE O1.ORDER_ID = O.ORDER_ID AND . . .
ORDER BY DateDiff("n", O1.ORDERED_DT_TM, O1.COMPLETED_DT_TM) ASC
);
Here pk stands for the identity column that uniquely identifies each row in the Orders table, so replace it with your own column name.
Have a look at this:
DECLARE @myTable AS TABLE (ID INT, Time_to_complete INT);
INSERT INTO @myTable
VALUES (1, 5)
     , (1, 7)
     , (1, 8)
     , (2, 23)
     , (2, 6)
     , (3, 7)
     , (4, 16)
     , (4, 14);
WITH cte AS
    (SELECT *
          , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Time_to_complete) AS RN
     FROM @myTable)
SELECT cte.ID
     , cte.Time_to_complete
FROM cte
WHERE RN = 1;
Results :
ID Time_to_complete
----------- ----------------
1 5
2 6
3 7
4 14
It uses row numbers over groups, then selects the first row for each group. You should be able to adjust your code to use this technique. If in doubt wrap your entire query in a cte first then apply the technique here.
It's worth becoming familiar with this process as it gets used in a lot of places - especially around de-duping data.
Try this:
DECLARE @myTable AS TABLE (ID INT, Time_to_complete INT);
INSERT INTO @myTable
VALUES (1, 5)
     , (1, 7)
     , (1, 8)
     , (2, 23)
     , (2, 6)
     , (3, 7)
     , (4, 16)
     , (4, 14);
SELECT o.ID, o.Time_to_complete
FROM @myTable o
WHERE o.Time_to_complete = (SELECT MIN(m.Time_to_complete)
                            FROM @myTable m
                            WHERE o.ID = m.ID
                           );
Result :
ID Time_to_complete
1 5
2 6
3 7
4 14
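One difference between the two techniques in this thread shows up when a group has several rows tied for the minimum: the correlated MIN subquery returns every tied row, while ROW_NUMBER keeps exactly one. The sketch below illustrates that. It assumes SQLite (3.25+) in place of SQL Server, and the table `t` with its duplicated (1, 5) row is invented for the demonstration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INT, Time_to_complete INT)")
# ID 1 deliberately has two rows tied for the smallest value.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 5), (1, 5), (1, 7), (2, 23), (2, 6)],
)

# Correlated subquery: keeps every row equal to its group's minimum.
min_rows = conn.execute(
    """
    SELECT o.ID, o.Time_to_complete
    FROM t AS o
    WHERE o.Time_to_complete = (
        SELECT MIN(m.Time_to_complete) FROM t AS m WHERE m.ID = o.ID
    )
    ORDER BY o.ID
    """
).fetchall()

# ROW_NUMBER: keeps exactly one row per group, ties or not.
rn_rows = conn.execute(
    """
    WITH cte AS (
        SELECT ID, Time_to_complete,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Time_to_complete) AS rn
        FROM t
    )
    SELECT ID, Time_to_complete FROM cte WHERE rn = 1 ORDER BY ID
    """
).fetchall()

print(min_rows)  # two rows for ID 1
print(rn_rows)   # one row for ID 1
```

Which behaviour is wanted depends on the report; if only the two selected columns matter, the tied rows are indistinguishable and a DISTINCT on the first query would also do.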
I have a recursive query that works as intended for calculating the weighted average cost for an inventory calculation. My problem is that I need multiple weighted averages from the same query, grouped by different columns. I know I can solve this by calculating it multiple times, once for each key column, but for performance reasons I want the data to be traversed only once. Sometimes I have 1M+ rows.
I have simplified the data and replaced the weighted average with a simple sum to make my problem easier to follow.
How can I get the result below using a recursive CTE? Remember that I have to use a recursive query to calculate the weighted average cost. I am on SQL Server 2016.
Example data (Id is also the sort order. Id and Key together are unique.)
Id Key1 Key2 Key3 Value
1 1 1 1 10
2 1 1 1 10
3 1 2 1 10
4 2 2 1 10
5 1 2 1 10
6 1 1 2 10
7 1 1 1 10
8 3 3 1 10
Expected result
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
EDIT
After some well-deserved criticism, I need to get much better at asking questions.
Here is an example and why I need a recursive query. In the example I get the result for Key1, but I need it for Key2 and Key3 as well in the same query. I know that I can repeat the same query three times, but that is not preferable.
DECLARE @InventoryItem AS TABLE (
    IntentoryItemId INT NULL,
    InventoryOrder INT,
    Key1 INT NULL,
    Key2 INT NULL,
    Key3 INT NULL,
    Quantity NUMERIC(22,9) NOT NULL,
    Price NUMERIC(16,9) NOT NULL
);
INSERT INTO @InventoryItem (
    IntentoryItemId,
    InventoryOrder,
    Key1,
    Key2,
    Key3,
    Quantity,
    Price
)
VALUES
    (1, NULL, 1, 1, 1, 10, 1),
    (2, NULL, 1, 1, 1, 10, 2),
    (3, NULL, 1, 2, 1, 10, 2),
    (4, NULL, 2, 2, 1, 10, 1),
    (5, NULL, 1, 2, 1, 10, 5),
    (6, NULL, 1, 1, 2, 10, 3),
    (7, NULL, 1, 1, 1, 10, 3),
    (8, NULL, 3, 3, 1, 10, 1);
--The steps below will give me the cost "grouped" by Key1
WITH Key1RowNumber AS (
    SELECT
        IntentoryItemId,
        ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS RowNumber
    FROM @InventoryItem
)
UPDATE InventoryItem
SET InventoryOrder = Key1RowNumber.RowNumber
FROM @InventoryItem InventoryItem
INNER JOIN Key1RowNumber
    ON Key1RowNumber.IntentoryItemId = InventoryItem.IntentoryItemId;
WITH cte AS (
    -- Anchor: the first row of each Key1 group
    SELECT
        IntentoryItemId,
        InventoryOrder,
        Key1,
        Quantity,
        Price,
        CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
        CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
    FROM @InventoryItem InventoryItem
    WHERE InventoryItem.InventoryOrder = 1
    UNION ALL
    -- Recursive step: fold the next row of the same Key1 group into the running average
    SELECT
        Sub.IntentoryItemId,
        Sub.InventoryOrder,
        Sub.Key1,
        Sub.Quantity,
        Sub.Price,
        CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
        CONVERT(NUMERIC(22,9),
            ((Main.CurrentQuantity) * Main.AvgPrice + Sub.Quantity * Sub.Price)
            /
            NULLIF((Main.CurrentQuantity) + Sub.Quantity, 0)
        ) AS AvgPrice
    FROM cte Main
    INNER JOIN @InventoryItem Sub
        ON Main.Key1 = Sub.Key1
        AND Sub.InventoryOrder = Main.InventoryOrder + 1
)
SELECT cte.IntentoryItemId, cte.AvgPrice
FROM cte
ORDER BY IntentoryItemId;
Why would you want to calculate over 1M+ rows?
Secondly, I think your database design is questionable: Key1, Key2 and Key3 should have been unpivoted into a single Keys column, with one more column identifying each key group.
This will become clear in the example below.
If I can optimize a query, I consider running it over many rows; otherwise I try to limit the number of rows.
Also, if possible, consider storing the average price as a calculated column, i.e. compute and store it when the table is populated.
First, let us know whether the output is correct.
DECLARE @InventoryItem AS TABLE (
    IntentoryItemId INT NULL,
    InventoryOrder INT,
    Key1 INT NULL,
    Key2 INT NULL,
    Key3 INT NULL,
    Quantity NUMERIC(22,9) NOT NULL,
    Price NUMERIC(16,9) NOT NULL
);
INSERT INTO @InventoryItem (
    IntentoryItemId,
    InventoryOrder,
    Key1,
    Key2,
    Key3,
    Quantity,
    Price
)
VALUES
    (1, NULL, 1, 1, 1, 10, 1),
    (2, NULL, 1, 1, 1, 10, 2),
    (3, NULL, 1, 2, 1, 10, 2),
    (4, NULL, 2, 2, 1, 10, 1),
    (5, NULL, 1, 2, 1, 10, 5),
    (6, NULL, 1, 1, 2, 10, 3),
    (7, NULL, 1, 1, 1, 10, 3),
    (8, NULL, 3, 3, 1, 10, 1);
--select * from @InventoryItem
--return
;with cte as
(
    -- One row number per key column, each ordered by item id
    select *
        , ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS rn1
        , ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY IntentoryItemId) AS rn2
        , ROW_NUMBER() OVER (PARTITION BY Key3 ORDER BY IntentoryItemId) AS rn3
    from @InventoryItem
)
,cte1 AS (
    -- Unpivot: one copy of every row per key column; pk says which key the copy belongs to
    SELECT
        IntentoryItemId,
        Key1 keys,
        Quantity,
        Price
        ,rn1
        ,rn1 rn
        ,1 pk
    FROM cte c
    union ALL
    SELECT
        IntentoryItemId,
        Key2 keys,
        Quantity,
        Price
        ,rn1
        ,rn2 rn
        ,2 pk
    FROM cte c
    union ALL
    SELECT
        IntentoryItemId,
        Key3 keys,
        Quantity,
        Price
        ,rn1
        ,rn3 rn
        ,3 pk
    FROM cte c
)
, cte2 AS (
    -- Anchor: the first row of every (pk, keys) group
    SELECT
        IntentoryItemId,
        rn,
        Keys,
        Quantity,
        Price,
        CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
        CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price)) a,
        CONVERT(NUMERIC(22,9), InventoryItem.Price) b,
        CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
        ,pk
    FROM cte1 InventoryItem
    WHERE InventoryItem.rn = 1
    UNION ALL
    -- Recursive step: fold in the next row of the same (pk, keys) group
    SELECT
        Sub.IntentoryItemId,
        sub.rn,
        Sub.Keys,
        Sub.Quantity,
        Sub.Price,
        CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
        CONVERT(NUMERIC(22,9),Main.CurrentQuantity * Main.AvgPrice),
        CONVERT(NUMERIC(22,9),Sub.Quantity * Sub.price),
        CONVERT(NUMERIC(22,9),
            ((Main.CurrentQuantity * Main.AvgPrice) + (Sub.Quantity * Sub.price))
            /
            NULLIF(((Main.CurrentQuantity) + Sub.Quantity), 0)
        ) AS AvgPrice
        ,sub.pk
    FROM CTE2 Main
    INNER JOIN cte1 Sub
        ON Main.Keys = Sub.Keys and main.pk=sub.pk
        AND Sub.rn = main.rn + 1
    --and Sub.InventoryOrder<=2
)
select *
    ,(select AvgPrice from cte2 c1 where pk=2 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice2
    ,(select AvgPrice from cte2 c1 where pk=3 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice3
from cte2 c
where pk=1
ORDER BY pk,rn
Alternative solution (for SQL Server 2012+), with many thanks to Jason:
SELECT *
    ,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
        OVER(PARTITION BY Key1 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey1Price
    ,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
        OVER(PARTITION BY Key2 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey2Price
    ,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
        OVER(PARTITION BY Key3 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey3Price
from @InventoryItem
order by IntentoryItemId
Here's how to do it in SQL Server 2012 & later...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT,
Key1 INT,
Key2 INT,
Key3 INT,
[Value] INT
);
INSERT #TestData(Id, Key1, Key2, Key3, Value) VALUES
(1, 1, 1, 1, 10),
(2, 1, 1, 1, 10),
(3, 1, 2, 1, 10),
(4, 2, 2, 1, 10),
(5, 1, 2, 1, 10),
(6, 1, 1, 2, 10),
(7, 1, 1, 1, 10),
(8, 3, 3, 1, 10);
--=============================================================
SELECT
td.Id, td.Key1, td.Key2, td.Key3, td.Value,
Key1Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key1 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key2Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key2 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key3Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key3 ORDER BY td.Id ROWS UNBOUNDED PRECEDING)
FROM
#TestData td
ORDER BY
td.Id;
results...
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
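The same running-window idea can be pushed from a plain sum to the weighted average itself. The recurrence in the question, (CurrentQuantity * AvgPrice + Quantity * Price) / (CurrentQuantity + Quantity), equals the running total of Quantity * Price divided by the running total of Quantity, as long as the running quantity never reaches zero. Here is a rough sketch of that, assuming SQLite (3.25+) instead of SQL Server and REAL columns instead of NUMERIC; the table name `item`, the helper `running_avg` and the output column names are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (Id INT, Key1 INT, Key2 INT, Key3 INT, Quantity REAL, Price REAL)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 10, 1),
        (2, 1, 1, 1, 10, 2),
        (3, 1, 2, 1, 10, 2),
        (4, 2, 2, 1, 10, 1),
        (5, 1, 2, 1, 10, 5),
        (6, 1, 1, 2, 10, 3),
        (7, 1, 1, 1, 10, 3),
        (8, 3, 3, 1, 10, 1),
    ],
)


def running_avg(key: str) -> str:
    """Build the running weighted-average expression for one key column."""
    window = (
        f"(PARTITION BY {key} ORDER BY Id "
        "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)"
    )
    return (
        f"SUM(Quantity * Price) OVER {window} "
        f"/ NULLIF(SUM(Quantity) OVER {window}, 0) AS Avg{key}Price"
    )


# One pass over the table, three running averages (one per key column).
query = (
    "SELECT Id, "
    + ", ".join(running_avg(k) for k in ("Key1", "Key2", "Key3"))
    + " FROM item ORDER BY Id"
)
averages = conn.execute(query).fetchall()

for row in averages:
    print(row)
```

If the real calculation also has to handle stock withdrawals or resets, this shortcut may not apply and the recursive form could still be needed.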
First, I apologize if the title won't make sense but below is the detailed scenario.
Say I have a document_revision table
id document_id phase_id user_id
1 1 3 1
2 1 2 1
3 1 1 1
4 2 3 2
5 2 2 2
where phase_id is: transcribe = 3; proof = 2; and submit = 1.
I would like to write a query that filters the revision records so that a proof phase is disregarded if the same user did both the transcribe and the proof for that document. So the output would be:
id document_id phase_id user_id
1 1 3 1
3 1 1 1
4 2 3 2
I've been struggling for hours figuring out a query for this but no luck so far.
Assuming you only want the phase 3 for any case where a user_id was involved in phase 2 and 3, then one way you could do this is with ROW_NUMBER(), e.g.:
DECLARE @T TABLE (ID INT IDENTITY(1, 1), Document_ID INT, Phase_ID INT, [User_ID] INT);
INSERT @T (Document_ID, Phase_ID, [User_ID]) VALUES
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 3, 2), (2, 2, 2), (3, 1, 1), (3, 2, 1), (3, 3, 2);
SELECT ID, Document_ID, Phase_ID, [User_ID]
FROM
(
    -- Phases 2 and 3 share one partition per document and user, so only the higher phase gets RN = 1
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY Document_ID, [User_ID], CASE WHEN Phase_ID IN (2, 3) THEN 2 ELSE Phase_ID END ORDER BY Phase_ID DESC)
    FROM @T
) AS T
WHERE RN = 1;
DECLARE @document_revision TABLE (
    id INT IDENTITY(1,1),
    document_id INT,
    phase_id INT,
    user_id INT
);
INSERT INTO @document_revision
    (document_id, phase_id, user_id)
VALUES
    (1, 3, 1),
    (1, 2, 1),
    (1, 1, 1),
    (2, 3, 2),
    (2, 2, 2),
    -- To test a scenario where there is a proof and a submit with no transcribe phases and same document
    (3, 2, 3),
    (3, 1, 3),
    -- To test a scenario where there is a transcribe and a submit with no proof phases and same document
    (4, 3, 4),
    (4, 1, 4),
    -- To test a scenario where there is a proof and a submit with no transcribe phase (for document_id 5) but different document and same user as above
    (5, 2, 4);
SELECT dr.id
    , dr.document_id
    , dr.phase_id
    , dr.user_id
FROM @document_revision AS dr
WHERE NOT EXISTS ( SELECT 1
                   FROM @document_revision AS temp
                   -- Same user
                   WHERE temp.user_id = dr.user_id
                   -- Same document
                   AND temp.document_id = dr.document_id
                   -- To check if there is already a transcribe phase_id with the same user_id and document_id
                   AND temp.phase_id = 3
                   -- Only proof rows (phase_id = 2) are candidates for removal
                   AND dr.phase_id = 2 );
results:
id document_id phase_id user_id
1 1 3 1
3 1 1 1
4 2 3 2
6 3 2 3
7 3 1 3
8 4 3 4
9 4 1 4
10 5 2 4
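To make the filtering rule easier to check in isolation, here is a small sketch that states it as "drop a row only if it is a proof row and the same user has a transcribe row for the same document". It assumes SQLite instead of SQL Server; the question's five rows are used, plus two invented rows for a document 3 whose proof was done by a different user than the transcribe, which should both survive.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_revision "
    "(id INTEGER PRIMARY KEY, document_id INT, phase_id INT, user_id INT)"
)
conn.executemany(
    "INSERT INTO document_revision VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, 1),
        (2, 1, 2, 1),
        (3, 1, 1, 1),
        (4, 2, 3, 2),
        (5, 2, 2, 2),
        # Invented rows: transcribe by user 1, proof by user 2, so the proof stays.
        (6, 3, 3, 1),
        (7, 3, 2, 2),
    ],
)

# phase_id: transcribe = 3, proof = 2, submit = 1
kept_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT dr.id
        FROM document_revision AS dr
        WHERE NOT (
            dr.phase_id = 2
            AND EXISTS (
                SELECT 1
                FROM document_revision AS t
                WHERE t.document_id = dr.document_id
                  AND t.user_id = dr.user_id
                  AND t.phase_id = 3
            )
        )
        ORDER BY dr.id
        """
    )
]

print(kept_ids)
```

Pulling the `phase_id = 2` test out of the subquery is only a readability choice; it should select the same rows as the NOT EXISTS form with the test inside.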
I have a problem with a query.
This is the data (order by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract the distinct values and the date of their first occurrence. The exception is that rows should be grouped only if the run is not interrupted by a different value in that timeframe.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I'm sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with, and it could probably be made to work. It is a query that should locate where a value changes.
SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and
d1.Value <> d2.Value
It could probably be done with good use of the row_number function, but I can't manage it.
Sample data:
declare @T table (ID int, Value int, Timestamp date)
insert into @T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
    select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
    from @T
), Firsts as (
    select
        ov1.* --TODO - specific columns better than *
    from
        OrderedValues ov1
            left join
        OrderedValues ov2
            on
                ov1.Value = ov2.Value and
                ov1.rn = ov2.rn + 1
    where
        ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If that's the situation, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The second query simply finds rows where there isn't an immediate preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
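On SQL Server 2012 and later the same "no immediately preceding row with the same Value" test can also be written with LAG instead of a self-join. A minimal sketch follows, assuming SQLite (3.25+) as a stand-in engine, ISO text dates, and a Value column that is never NULL; the CTE name `marked` and the column `prev_value` are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (ID INT, Value INT, Timestamp TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?)",
    [
        (1, 0, "2001-01-01"),
        (2, 0, "2002-01-01"),
        (3, 1, "2003-01-01"),
        (4, 1, "2004-01-01"),
        (5, 0, "2005-01-01"),
        (6, 2, "2006-01-01"),
        (7, 2, "2007-01-01"),
        (8, 2, "2008-01-01"),
    ],
)

# A row starts a new run when it has no predecessor (prev_value IS NULL)
# or when the predecessor's Value differs from its own.
run_starts = conn.execute(
    """
    WITH marked AS (
        SELECT ID, Value, Timestamp,
               LAG(Value) OVER (ORDER BY Timestamp) AS prev_value
        FROM Data
    )
    SELECT ID, Value, Timestamp
    FROM marked
    WHERE prev_value IS NULL OR prev_value <> Value
    ORDER BY Timestamp
    """
).fetchall()

for row in run_starts:
    print(row)
```

If Value can be NULL, the `prev_value IS NULL` check would no longer identify only the first row, so the condition would need adjusting.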