I have a table with a few hundred thousand rows, with columns containing start and finish datetimes, something like this:
ID StartDateTime FinishDateTime
--------------------------------------------------------
1 2001-01-01 04:05:06.789 2001-02-03 04:05:06.789
2 2001-01-01 05:05:06.789 2001-01-01 07:05:06.789
3 2001-01-01 06:05:06.789 2001-02-04 07:05:06.789
4 2001-03-01 06:05:06.789 2001-02-03 04:05:06.789
For each row, I need to count the number of 'active' rows at its start time, i.e. the rows that start before and finish after that row's StartDateTime. For instance, for ID=3 the StartDateTime falls between the StartDateTime and FinishDateTime of ID=1 and ID=2, but not of ID=3 or ID=4, so it should return 2.
The desired output is:
ID ActiveRows
-----------------
1 0
2 1
3 2
4 0
I can get it to work using the query below, but it takes hours to run.
select
ID,
(select count(1)
from table tbl2
where tbl2.StartDateTime < tbl.StartDateTime
and tbl2.FinishDateTime > tbl.StartDateTime) as 'ActiveRows'
from
table tbl
I've also tried joining the table on itself, but it also seems extremely slow.
select
tbl.ID, count(1)
from
table tbl
left join table
tbl2 on tbl2.StartDateTime < tbl.StartDateTime
and tbl2.FinishDateTime > tbl.StartDateTime
group by
tbl.ID
What is the fastest way to perform this calculation?
You can do this using the APPLY operator:
SELECT tbl.id,
oa.activerows
FROM yourtable tbl
OUTER apply(SELECT Count(tbl2.id)
FROM yourtable tbl2
WHERE tbl2.startdatetime < tbl.startdatetime
AND tbl2.finishdatetime > tbl.startdatetime) oa (activerows)
Also, your original join query should use a LEFT JOIN and count tbl2.ID (rather than 1) so that IDs with no matches still get a count of 0.
To further improve performance you can create a nonclustered index on yourtable:
CREATE NONCLUSTERED INDEX Nix_table ON yourtable (StartDateTime, FinishDateTime) INCLUDE (ID)
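If the correlated count is still slow even with the index, a different formulation is possible (my own sketch, not part of the answer above; it assumes SQL Server 2012 or later for the windowed SUM, and that every row has StartDateTime before FinishDateTime, which sample row 4 does not): unpivot each row into a +1 event at its start and a -1 event at its finish, keep a running total of the events, and look up that total just before each row's StartDateTime.
;WITH Events AS (
    -- +1 when a row becomes active, -1 when it stops being active
    SELECT StartDateTime  AS EventTime,  1 AS Delta FROM yourtable
    UNION ALL
    SELECT FinishDateTime AS EventTime, -1 AS Delta FROM yourtable
),
Running AS (
    SELECT EventTime,
           SUM(Delta) OVER (ORDER BY EventTime) AS ActiveCount  -- running total; tied times share one value
    FROM Events
)
SELECT t.ID,
       ISNULL(oa.ActiveCount, 0) AS ActiveRows
FROM yourtable t
OUTER APPLY (SELECT TOP (1) r.ActiveCount
             FROM Running r
             WHERE r.EventTime < t.StartDateTime   -- running total just before this row starts
             ORDER BY r.EventTime DESC) oa
One caveat: a row whose FinishDateTime equals the probe time exactly is still counted as active here, whereas the strict > in the original query would exclude it. In practice you would also materialize Running into a temp table with an index on EventTime before applying it.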
I want to update 15 records: the first 5 records' date should be June 2019, the next 5 July 2019, and the last 5 Aug 2019, based on employee ID. Can anyone tell me how to write this type of query in SQL Server Management Studio v17.7? I've tried the query below but can't get it to work for the next 5 rows.
This is the query I tried:
Update TOP(5) emp.employee(nolock) set statusDate=GETDATE()-31 where EMPLOYEEID='XCXXXXXX';
To update only a certain number of rows of a table you will need to include a FROM clause and join a sub-query that limits the number of rows. I would suggest using OFFSET ... FETCH instead of TOP so that you can skip a given number of rows.
You will also want to use the DATEADD function instead of directly subtracting a number from GETDATE(); subtracting an integer from a DATETIME subtracts that many days, which is easy to misread. If you intend to go back a month I would suggest subtracting a month rather than 31 days. Alternatively, it might be easier to specify an exact date like '2019-06-01'.
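A quick illustration of the DATEADD point (my own snippet, not from the original answer):
SELECT DATEADD(MONTH, -1, GETDATE()) AS OneMonthAgo   -- go back one calendar month
SELECT GETDATE() - 31 AS ThirtyOneDaysAgo             -- integer arithmetic on DATETIME subtracts days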
For example:
TableA
- TableAID INT PK
- EmployeeID INT FK
- statusDate DATETIME
UPDATE TableA
SET statusDate = '2019-06-01'
FROM TableA
INNER JOIN
(
SELECT TableAID
FROM TableA
WHERE EmployeeID = ''
ORDER BY TableAID
OFFSET 0 ROWS
FETCH NEXT 5 ROWS ONLY
) T1 ON TableA.TableAID = T1.TableAID
Right now it looks like your original query is updating the employee table rather than a purchases table. You will want to replace my TableA with whichever table it is you're updating, and replace TableAID with its PK field.
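To handle the next two batches (July and August) the same pattern can be repeated with a different OFFSET; a sketch for the July batch, reusing the hypothetical TableA names from above:
UPDATE TableA
SET statusDate = '2019-07-01'
FROM TableA
INNER JOIN
(
    SELECT TableAID
    FROM TableA
    WHERE EmployeeID = ''        -- same employee filter as above
    ORDER BY TableAID
    OFFSET 5 ROWS                -- skip the 5 rows already set to June
    FETCH NEXT 5 ROWS ONLY
) T1 ON TableA.TableAID = T1.TableAID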
You can use a ROW_NUMBER to get a ranking by employee, then just update the first 15 rows.
;WITH EmployeeRowsWithRowNumbers AS
(
SELECT
T.*,
RowNumberByEmployee = ROW_NUMBER() OVER (
PARTITION BY
T.EmployeeID -- Generate a ranking by each different EmployeeID
ORDER BY
(SELECT NULL)) -- ... in no particular order (you should supply one if you have an ordering column)
FROM
emp.employee AS T
)
UPDATE E SET
statusDate = CASE
WHEN E.RowNumberByEmployee <= 5 THEN '2019-06-01'
WHEN E.RowNumberByEmployee BETWEEN 6 AND 10 THEN '2019-07-01'
ELSE '2019-08-01' END
FROM
EmployeeRowsWithRowNumbers AS E
WHERE
E.RowNumberByEmployee <= 15
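To preview which rows fall into each bucket before running the UPDATE, the same CTE can be queried directly (a sketch reusing the names above):
;WITH EmployeeRowsWithRowNumbers AS
(
    SELECT
        T.*,
        RowNumberByEmployee = ROW_NUMBER() OVER (
            PARTITION BY T.EmployeeID
            ORDER BY (SELECT NULL))
    FROM
        emp.employee AS T
)
SELECT
    EmployeeID,
    RowNumberByEmployee,
    CASE
        WHEN RowNumberByEmployee <= 5 THEN '2019-06-01'
        WHEN RowNumberByEmployee BETWEEN 6 AND 10 THEN '2019-07-01'
        ELSE '2019-08-01' END AS ProposedStatusDate
FROM
    EmployeeRowsWithRowNumbers
WHERE
    RowNumberByEmployee <= 15
ORDER BY
    EmployeeID, RowNumberByEmployee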
I have a tree where a specific node can appear under more than one other node (node 2 in my example):
    1
   / \
  2   3
 / \   \
4   5   6
         \
          2
         / \
        4   5
Notice that 2 is duplicated: first under 1, and again under 6.
My recursion is:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
)
select * from cte;
But the problem is that in the result, relations 2-4 and 2-5 are duplicated (first from relation 1-2 and again from relation 6-2):
ParentId ChildId OtherFields
1 2
1 3
2 4 /*from 1-2*/
2 5 /*from 1-2*/
3 6
6 2
2 4 /*from 6-2*/
2 5 /*from 6-2*/
Is there any way to skip visiting duplicated relationships? I do not see any reason why the recursion should run over rows that are already in the result; it would be faster. Something like this:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
------> WHERE (select count(*) FROM SoFarCollectedResult WHERE ParentId=BOM.ParentId AND ChildId=BOM.ChildId ) = 0
)
select * from cte;
I found this thread, but it is 8 years old.
I am using SQL server 2016.
If this is not possible, then my question is: how can I remove duplicates from the final result, checking distinctness only on the ParentId and ChildId columns?
Edited:
Expected result is:
ParentId ChildId OtherFields
1 2
1 3
2 4
2 5
3 6
6 2
You can, by adding two little tricks to the SQL.
But you need an extra Id column with a sequential number,
for example an identity column, or a datetime field that shows when the record was inserted.
The simple reason is that, as far as the database is concerned, there is no order to the records as they were inserted unless you have a column that indicates that order.
Trick 1) Join the CTE record only to Ids that are higher, because if they were lower then those are the duplicates you don't want to join.
Trick 2) Use the window function ROW_NUMBER to keep only the rows that are nearest to the Id the recursion started from.
Example:
declare #BillOfMaterials table (Id int identity(1,1) primary key, ParentId int, ChildId int, Field1 varchar(8), Field2 varchar(8));
insert into #BillOfMaterials (ParentId, ChildId, Field1, Field2) values
(1,2,'A','1-2'),
(1,3,'B','1-3'),
(2,4,'C','2-4'), -- from 1-2
(2,5,'D','2-5'), -- from 1-2
(3,6,'E','3-6'),
(6,2,'F','6-2'),
(2,4,'G','2-4'), -- from 6-2
(2,5,'H','2-5'); -- from 6-2
;with cte AS
(
select Id as BaseId, 0 as Level, BOM.*
from #BillOfMaterials BOM
WHERE ParentId in (1)
UNION ALL
SELECT CTE.BaseId, CTE.Level + 1, BOM.*
FROM cte
JOIN #BillOfMaterials BOM on (BOM.ParentId = cte.ChildId and BOM.Id > CTE.Id)
)
select ParentId, ChildId, Field1, Field2
from (
select *
--, row_number() over (partition by BaseId, ParentId, ChildId order by Id) as RNbase
, row_number() over (partition by ParentId, ChildId order by Id) as RN
from cte
) q
where RN = 1
order by ParentId, ChildId;
Result:
ParentId ChildId Field1 Field2
-------- ------- ------ ------
1 2 A 1-2
1 3 B 1-3
2 4 C 2-4
2 5 D 2-5
3 6 E 3-6
6 2 F 6-2
Anyway, as a side note, a parent-child relation table is normally used differently:
more often it's just a table of unique parent-child combinations that are foreign keys to another table where that Id is the primary key, so the other fields are kept in that other table.
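A minimal sketch of that more usual layout (the table and column names here are hypothetical, just to illustrate the side note):
-- Nodes carry the descriptive fields once, keyed by a surrogate Id
create table Node (
    NodeId int primary key,
    Field1 varchar(8),
    Field2 varchar(8)
);
-- The relation table holds only unique parent/child pairs referencing Node
create table BillOfMaterials (
    ParentId int not null references Node (NodeId),
    ChildId  int not null references Node (NodeId),
    primary key (ParentId, ChildId)
);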
Change your last query from:
select * from cte;
To:
select * from cte group by ParentId, ChildId;
This will essentially take what you have right now, but go one step further and remove rows that have already appeared, which takes care of your duplicate problem. Just be sure that all * returns here is ParentId and ChildId; should it return other columns, you will need to either add them to the GROUP BY or apply some sort of aggregate (MAX, MIN, COUNT...) so that the query can still group.
Should you have more columns that you can't aggregate or group on, you could write the query as such:
select * from cte where ID in (select MAX(ID) from cte group by ParentId, ChildId);
Here ID would be the primary key of the table behind cte. This takes the maximum ID when rows match, which would normally be your latest entry; if you want the earliest entry, just change MAX() to MIN().
I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2 that would probably use LEAD/LAG in more modern versions. I am trying to aggregate data based on the sequencing of one column, but there can be a varying number of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number, so it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
The source data comes from joined tables and looks like this:
recordID instanceDate moduleID iResult interactionNum
1356 10/6/15 16:14 1 68 1
1357 10/7/15 16:22 1 100 2
1434 10/9/15 16:58 1 52 1
1435 10/11/15 17:00 1 60 2
1436 10/15/15 16:57 1 100 3
1437 10/15/15 16:59 1 100 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) and then interactionNum, and the group separation should happen when the query finds a row whose interactionNum is less than or equal to the previous row's (it will usually start over with '1', but not always, so I'd prefer to split on a lower-or-equal integer value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.
Not sure how ModuleID was supposed to be used, but I guess you're looking for something like this:
select min (instanceDate), [moduleID], avg([iResult])
from (
select *,row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with ROW_NUMBER for each moduleID, and then use the difference between that and interactionNum as the grouping criterion.
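To see why the difference works as a group key, here is the inner query expanded with the computed key (my own illustration; Table1 is the placeholder name used in the query above):
select recordID, [moduleID], [interactionNum],
       row_number() over (partition by [moduleID] order by instanceDate) as RN,
       row_number() over (partition by [moduleID] order by instanceDate) - [interactionNum] as grp
from Table1
-- With the sample rows, interactionNum runs 1,2 then 1,2,3,4 while RN runs 1..6,
-- so grp is 0,0,2,2,2,2 - constant within each run, and grouping by (moduleID, grp)
-- separates the two sequences.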
Here is my solution, although it should be said that I think JamesZ's answer is cleaner.
I created a new field called newinstance which is 1 wherever your interactionNum is 1. I then created a rolling SUM(newinstance), called rollinginstance, to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance
I want to fetch orders that have a “Received” (ActivityID = 1) activity but not a “Delivered” (ActivityID = 4) activity in the order activity table, i.e. orders that have been received but not delivered yet.
My query is:
SELECT OrderID FROM tblOrderActivity
where (tblOrderActivity.ActivityID = 1 AND tblOrderActivity.ActivityID != 4)
GROUP BY OrderID
It is not returning the desired result.
The result should be OrderID 2 and 4.
Your query doesn't really make sense. Grouping happens after the WHERE clause, so you're basically getting all orders that have ActivityID = 1 (because if ActivityID is 1, it is always not equal to 4).
After the WHERE clause is applied you end up with the following rows:
OrderID ActivityID
1 1
2 1
3 1
4 1
And these are the orders you group; no further condition is evaluated.
If 4 is the highest possible ActivityID, you could do the following:
SELECT OrderID
FROM tblOrderActivity
GROUP BY OrderID
HAVING MAX(ActivityID) < 4
The HAVING condition is applied after grouping, which is what you want.
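If 4 is not guaranteed to be the highest ActivityID, a conditional-aggregation variant (my own sketch, not part of the answer above) checks for each activity explicitly:
SELECT OrderID
FROM tblOrderActivity
GROUP BY OrderID
HAVING SUM(CASE WHEN ActivityID = 1 THEN 1 ELSE 0 END) > 0   -- has a Received activity
   AND SUM(CASE WHEN ActivityID = 4 THEN 1 ELSE 0 END) = 0   -- but no Delivered activity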
I don't think GROUP BY is needed here. You can use a subquery to find the orders which have not been delivered. Try this:
SELECT *
FROM Yourtable a
WHERE a.ActivityID = 1
AND NOT EXISTS (SELECT 1
FROM yourtable b
WHERE a.OrderID = b.OrderID
AND b.ActivityID = 4)
I have an MSSQL 2000 table that has a lot of duplicate entries. Each row has an EffectiveChange date column. I want to get the most up-to-date row by taking the row with the MAX(EffectiveChange) for each key value.
This is some sample data:
NPANXX TZONE EFFCHANGE RATE
555555 1 01/01/09 1
555555 1 05/01/09 6
214555 2 01/01/09 1
214555 2 05/01/09 3
657555 3 05/01/09 1
657555 1 01/01/09 1
I came up with this:
SELECT DISTINCT
NPANXX,
TZONE,
RATE
FROM AreaCodes
INNER JOIN (SELECT DISTINCT NPANXX, EFFCHANGE FROM AREACODES) b
ON b.NPANXX = AreaCodes.NPANXX
GROUP BY
NPANXX,
TZONE,
RATE
HAVING AreaCodes.EFFCHANGE = max(b.EFFCHANGE)
My question is whether or not this query will give me the max EFFCHANGE row for each key (NPANXX) or will it only give me rows having the MAX(EFFCHANGE) for the whole table?
Here is one way, since you are using 2000; in 2005 and up you can also use ROW_NUMBER().
SELECT t1.*
from AreaCodes t1
INNER JOIN (SELECT NPANXX, max(EFFCHANGE) as MaxDate FROM AREACODES
group by NPANXX) t2
ON t1.NPANXX = t2.NPANXX
and t1.EFFCHANGE = t2.MaxDate
Here is the complete code, including DDL and DML:
create table AreaCodes(NPANXX int,TZONE int,EFFCHANGE datetime,RATE int)
insert AreaCodes values(555555,1,'20090101',1)
insert AreaCodes values(555555,1,'20090501',6)
insert AreaCodes values(214555,2,'20090101',1)
insert AreaCodes values(214555,2,'20090501',3)
insert AreaCodes values(657555,3,'20090501',1)
insert AreaCodes values(657555,1,'20090101',1)
SELECT t1.*
from AreaCodes t1
INNER JOIN (SELECT NPANXX, max(EFFCHANGE) as MaxDate FROM AREACODES
group by NPANXX) t2
ON t1.NPANXX = t2.NPANXX
and t1.EFFCHANGE = t2.MaxDate
Output:
657555 3 2009-05-01 00:00:00.000 1
555555 1 2009-05-01 00:00:00.000 6
214555 2 2009-05-01 00:00:00.000 3
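For reference, a sketch of the ROW_NUMBER() variant mentioned above (it needs SQL Server 2005 or later, so it won't run on the asker's 2000 instance):
SELECT NPANXX, TZONE, EFFCHANGE, RATE
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY NPANXX ORDER BY EFFCHANGE DESC) AS rn
    FROM AreaCodes
) t
WHERE rn = 1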