SQL Server: retrieve the data without overlapping datetimes

I need some thoughts on the best implementation for this case.
I have data where there can be multiple rows with start and end dates, and I need to pull the rows whose date ranges do not overlap. Below is the sample data.
CREATE TABLE table2 (
start_date DATE NOT NULL,
end_date DATE NOT NULL,
comments VARCHAR(100) NULL ,
id int
);
INSERT INTO table2 (start_date, end_date, id) VALUES
('2011-12-01', '2012-01-02', 5),
('2012-01-01', '2012-01-06', 5),
('2012-01-05', '2012-01-10', 5),
('2012-01-09', '2012-01-11', 5);
From this I need the rows that do not overlap for each id:
('2011-12-01', '2012-01-02', 5),
('2012-01-05', '2012-01-10', 5)
Please share your thoughts on the best way to implement this.
Thanks for the support,
Manoj.

The output you provide is very unclear. At first sight you are looking for intervals within which no other interval starts (which would lead to a continued interval). But your second expected row overlaps another row at 2012-01-10?
The following query returns a row if no other row's end_date falls within its interval. But it does not return your two expected rows, just the first.
SELECT * FROM table2 AS t
WHERE NOT EXISTS(SELECT 1
FROM table2 AS x
WHERE x.start_date<>t.start_date
AND x.end_date BETWEEN t.start_date AND t.end_date
);
I hope this points you in the right direction...

The following will do it:
WITH cte
AS
(
SELECT
[start_date]
, end_date
, comments
, id
FROM
(
SELECT
[start_date]
, end_date
, comments
, id
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY [start_date]) R
FROM table2
) Q
WHERE R = 1
UNION ALL
SELECT
[start_date]
, end_date
, comments
, id
FROM
(
SELECT
T.[start_date]
, T.end_date
, T.comments
, T.id
, ROW_NUMBER() OVER (PARTITION BY T.id ORDER BY T.[start_date]) R
FROM
cte C
JOIN table2 T ON
C.id = T.id
AND T.[start_date] > C.end_date
) Q
WHERE R = 1
)
SELECT
[start_date]
, end_date
, comments
, id
FROM cte
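The recursive CTE above keeps the earliest row per id and then repeatedly keeps the earliest row that starts after the previously kept row ends. As a rough sketch of that selection logic outside the database (Python is used only for illustration; the function name `pick_non_overlapping` and the tuple layout are assumptions, not part of the original answer):

```python
from datetime import date
from itertools import groupby

def pick_non_overlapping(rows):
    """rows: iterable of (start_date, end_date, id) tuples.
    For each id, keep the earliest row, then every next row that
    starts strictly after the end of the last kept row."""
    kept = []
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    for _id, group in groupby(ordered, key=lambda r: r[2]):
        last_end = None
        for start, end, rid in group:
            if last_end is None or start > last_end:
                kept.append((start, end, rid))
                last_end = end
    return kept

# Sample data from the question
rows = [
    (date(2011, 12, 1), date(2012, 1, 2), 5),
    (date(2012, 1, 1), date(2012, 1, 6), 5),
    (date(2012, 1, 5), date(2012, 1, 10), 5),
    (date(2012, 1, 9), date(2012, 1, 11), 5),
]
result = pick_non_overlapping(rows)
for row in result:
    print(row)
```

With the sample data this keeps the rows starting 2011-12-01 and 2012-01-05, which are the two rows the question asks for.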

Related

Multi - Columns OVERLAPPING DATES

;with cte as (
select Domain_Id, Starting_Date, End_Date
from Que_Date
union all
select t.Domain_Id, cte.Starting_Date, t.End_Date
from cte
join Que_Date t on cte.Domain_Id = t.Domain_Id and cte.End_Date = t.Starting_Date),
cte2 as (
select *, rn = row_number() over (partition by Domain_Id, End_Date order by Domain_Id)
from cte
)
select DISTINCT Domain_Id, Starting_Date, max(End_Date) enddate
from cte2
where rn=1
group by Domain_Id, Starting_Date
order by Domain_Id, Starting_Date;
select * from Que_Date
This is the code that I have written, but I am getting an extra row: the 2nd row is extra. The expected output should have only the 1st, 3rd and 4th rows, so please help me with it.
I have attached an image showing the input, the expected output, and the output that I am getting.
Your first CTE returns too many rows. Its anchor selects every row of Que_Date, so it also builds chains that start in the middle of a consecutive run, and you cannot filter those out afterwards. That is why your query returns unnecessary rows.
Try this solution. The CTE ConsistentDomains contains only the consecutive (chained) domains, so based on it we can get the non-overlapping results.
Create and fill data:
CREATE TABLE FooTable
(
Domain_ID INT,
Starting_Date DATE,
End_Date Date
)
INSERT INTO dbo.FooTable
(
Domain_ID,
Starting_Date,
End_Date
)
VALUES
( 1, -- Domain_ID - int
CONVERT(datetime,'01-01-2011',103), -- Starting_Date - date
CONVERT(datetime,'05-01-2011',103) -- End_Date - date
)
, (1, CONVERT(datetime,'05-01-2011',103), CONVERT(datetime,'07-01-2011',103))
, (1, CONVERT(datetime,'07-01-2011',103), CONVERT(datetime,'15-01-2011',103))
, (2, CONVERT(datetime,'11-05-2011',103), CONVERT(datetime,'12-05-2011',103))
, (2, CONVERT(datetime,'13-05-2011',103), CONVERT(datetime,'14-05-2011',103))
Query to find the non-overlapping results:
DECLARE @startDate varchar(50) = '2011-01-01';
WITH ConsistentDomains AS
(
SELECT
f.Domain_ID
, f.Starting_Date
, f.End_Date
FROM FooTable f
WHERE f.Starting_Date = @startDate
UNION ALL
SELECT
s.Domain_ID
, s.Starting_Date
, s.End_Date
FROM FooTable s
INNER JOIN ConsistentDomains cd
ON s.Domain_ID = cd.Domain_ID
AND s.Starting_Date = cd.End_Date
), ConsistentDomainsRownumber AS
(
SELECT
cd.Domain_ID
, cd.Starting_Date
, cd.End_Date
, ROW_NUMBER() OVER (PARTITION BY cd.Domain_ID ORDER BY cd.Starting_Date,
cd.End_Date) RN
FROM ConsistentDomains cd
)
SELECT cd.Domain_ID
, convert(varchar, cd.Starting_Date, 105) Starting_Date
, convert(varchar, cd.End_Date, 105) End_Date
FROM ConsistentDomainsRownumber cd WHERE cd.RN = 1
UNION ALL
SELECT
ft.Domain_ID
, convert(varchar, ft.Starting_Date, 105) Starting_Date
, convert(varchar, ft.End_Date, 105) End_Date
FROM dbo.FooTable ft WHERE ft.Domain_ID NOT IN (SELECT cd.Domain_ID FROM
ConsistentDomainsRownumber cd)
Output:
I used the same table-creation script as provided by @stepup, but you can also get your outcome this way.
CREATE TABLE testtbl
(
Domain_ID INT,
Starting_Date DATE,
End_Date Date
)
INSERT INTO testtbl
VALUES
(1, convert(date, '01-01-2011' ,103), convert(date, '05-01-2011',103) )
,(1, convert(date, '05-01-2011' ,103), convert(date, '07-01-2011',103) )
,(1, convert(date, '07-01-2011' ,103), convert(date, '15-01-2011',103) )
,(2, convert(date, '11-05-2011' ,103), convert(date, '12-05-2011',103) )
,(2, convert(date, '13-05-2011' ,103), convert(date, '14-05-2011',103) )
You can use a self join together with FIRST_VALUE and LAST_VALUE within the group to make sure that you are comparing within the same ID and overlapping dates.
select distinct t.Domain_ID,
case when lag(t1.starting_date)over (partition by t.Domain_id order by
t.starting_date) is not null
then first_value(t.Starting_Date) over (partition by t.domain_id order by
t.starting_date)
else t.Starting_Date end StartingDate,
case when lead(t.domain_id) over (partition by t.domain_id order by t.starting_date) =
t1.Domain_ID then isnull(last_value(t.End_Date) over (partition by t.domain_id order by t.end_date rows between unbounded preceding and unbounded following),t.End_Date)
else t.End_Date end end_date
from testtbl t
left join testtbl t1 on t.Domain_ID = t1.Domain_ID
and t.End_Date = t1.Starting_Date
and t.Starting_Date < t1.Starting_Date
Output:
Domain_ID StartingDate end_date
1 2011-01-01 2011-01-15
2 2011-05-11 2011-05-12
2 2011-05-13 2011-05-14
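The merge rule used in this thread is that a row continues the previous one when it belongs to the same domain and starts on the date the previous row ended. A minimal Python sketch of that rule, for illustration only (the function name `merge_adjacent` is an assumption; the data is the sample from the answers, read with style 103 as dd-mm-yyyy):

```python
from datetime import date

def merge_adjacent(rows):
    """rows: iterable of (domain_id, starting_date, end_date) tuples.
    Rows of the same domain are chained when a row's starting_date
    equals the end_date of the chain built so far."""
    merged = []
    for dom, start, end in sorted(rows):
        if merged and merged[-1][0] == dom and merged[-1][2] == start:
            # Extend the current chain to the new end date
            merged[-1] = (dom, merged[-1][1], end)
        else:
            merged.append((dom, start, end))
    return merged

rows = [
    (1, date(2011, 1, 1), date(2011, 1, 5)),
    (1, date(2011, 1, 5), date(2011, 1, 7)),
    (1, date(2011, 1, 7), date(2011, 1, 15)),
    (2, date(2011, 5, 11), date(2011, 5, 12)),
    (2, date(2011, 5, 13), date(2011, 5, 14)),
]
merged = merge_adjacent(rows)
for row in merged:
    print(row)
```

Domain 1 collapses into a single range from 2011-01-01 to 2011-01-15, while the two rows of domain 2 stay separate because 2011-05-12 and 2011-05-13 do not touch.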

HOWTO: Include column that is not part of an aggregate function or Group by clause in SQL SERVER

I have below recursive CTE:
DECLARE @T AS TABLE
(
PARENT_TEST_ID int,
TEST_ID int,
VALIDATED int,
ERR int
)
INSERT INTO @T VALUES
(NULL, 1, 0, 0),
(NULL, 2, 0, 0),
(1,3,0, 0),
(1,4,0, 0),
(2,5,0, 0),
(2,6,0, 0),
(2,7,0, 0),
(7,8,0, 1)
;with C as
(
select TEST_ID, PARENT_TEST_ID, (CASE WHEN ERR=1 THEN 0 ELSE 1 END) AS VALIDATED, ERR
from @T
where TEST_ID not in (select PARENT_TEST_ID
from @T
where PARENT_TEST_ID is not null) AND PARENT_TEST_ID IS NOT NULL
union all
select
T.TEST_ID,
T.PARENT_TEST_ID,
(case when t.TEST_ID=c.PARENT_TEST_ID and c.VALIDATED=1 AND T.ERR=0 THEN 1 ELSE 0 END) as VALIDATED,
T.ERR
from @T as T
inner join C
on T.TEST_ID = C.PARENT_TEST_ID
)
SELECT DISTINCT PARENT_TEST_ID, TEST_ID, MIN(VALIDATED) FROM C
GROUP BY TEST_ID
But I cannot include the PARENT_TEST_ID column in the final SELECT because it is not part of the GROUP BY clause, so I found this link:
Including column that is not part of the group by
Now I am trying to do the same in my case by applying John Woo's solution, but I do not know how. Any help, or is there a better solution?
iamdave is right, but if you want to implement John Woo's solution from that linked answer, it would look like this:
rextester: http://rextester.com/QQQGM79701
;with C as (
select
test_id
, parent_test_id
, validated=(case when err = 1 then 0 else 1 end)
, err
from @T as t
where t.test_id not in (
select i.parent_test_id
from @T as i
where i.parent_test_id is not null
)
and t.parent_test_id is not null
union all
select
t.test_id
, t.parent_test_id
, validated = case
when t.test_id = c.parent_test_id
and c.validated = 1
and t.err = 0
then 1
else 0
end
, t.err
from @T as T
inner join c on t.test_id = c.parent_test_id
)
, r as (
select
parent_test_id
, test_id
, Validated
, rn = row_number() over (
partition by test_id
order by Validated
)
from C
)
select
parent_test_id
, test_id
, Validated
from r
where rn=1
Just change your last line to GROUP BY PARENT_TEST_ID, TEST_ID
The error you are getting is telling you that you can't add columns to the output if you don't either aggregate on it or group your other aggregates by it. By adding the column to the group by, you are telling SQL Server that you want to do your min by both the parent and the test ID values.
rextester: http://rextester.com/JRF55398
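The ROW_NUMBER query earlier in this thread keeps, for each test_id, the whole row with the lowest Validated value, which is how parent_test_id survives without being grouped on. A small Python sketch of that "first row per group" idea, for illustration only (the function name `first_per_group` and the sample rows are hypothetical, hand-written in the shape of the CTE output):

```python
def first_per_group(rows):
    """rows: iterable of (parent_test_id, test_id, validated) tuples.
    Keep, for each test_id, the row with the smallest validated value,
    like ROW_NUMBER() OVER (PARTITION BY test_id ORDER BY validated) = 1."""
    best = {}
    for parent_id, test_id, validated in rows:
        if test_id not in best or validated < best[test_id][2]:
            best[test_id] = (parent_id, test_id, validated)
    return [best[key] for key in sorted(best)]

# Hypothetical rows: test 2 appears twice with different validated values
rows = [
    (None, 1, 1),
    (None, 2, 1),
    (None, 2, 0),
    (2, 7, 0),
    (7, 8, 0),
    (1, 3, 1),
]
picked = first_per_group(rows)
for row in picked:
    print(row)
```

Each test_id appears once in the result, and the non-aggregated column travels along with the chosen row.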

SQL - How can I Group sets of sequential numbers and return the MIN and Max Dates

This is driving me crazy! Does anyone know how to write some SQL that will return the MIN and MAX dates from groups of sequential numbers? Please see the screenshots below.
This is the SQL I used:
SELECT
num
, empid
, orderdate
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY orderdate) AS Num
, empid
, orderdate
FROM TSQL.Sales.Orders)T1
WHERE empid = 4
This is what it returns:
What I would like to do is get the Min and Max dates for each set of sequential numbers based on the num column. For example, the first set would be num 3, 4, 5 and 6, so the Min date is 2006-07-08 and the Max date is 2006-07-10.
See the example of the results needed below.
Any help with this would be much appreciated. Thank you in advance.
Update
I have now changed the SQL to do what I needed: example as follows:
Select
empid
, Island
, MIN(orderdate) as 'From'
, Max(orderdate) as 'To'
From
(select
empid
, num
, num - ROW_NUMBER() OVER (ORDER BY num, orderdate) as Island
, orderdate
from
(Select
ROW_NUMBER() OVER (ORDER BY orderdate) as Num
, empid
, orderdate
from TSQL.Sales.Orders)T1
where empid = 4
)T2
group By
empid
, Island
Result
Thank you so much for your help on this, I have been trying this for ages
Regards
Jason
This should do it:
;with dateSequences(num, empId, orderDate) as
(
select ROW_NUMBER() over (order by orderdate) as num
, empId
, orderdate
from yourTable
),
dateGroups(groupNum, empId, orderDate, num) as
(
select currD.num, currD.empid, currD.orderDate, currD.num
from dateSequences currD
left join dateSequences prevD on prevD.num = currD.num - 1 and prevD.empid = currD.empId
where prevD.num is null
union all
select dg.groupNum, d.empId, d.orderDate, d.num
from dateSequences d
inner join dateGroups dg on dg.num + 1 = d.num and d.empId = dg.empId
)
select empId, min(orderDate) as MinDate, max(orderDate) as MaxDate
from dateGroups
where empId = 4
group by empId, groupNum
Basically, it first makes a CTE that assigns a row number to each row in date order. Then it makes a recursive CTE that first finds the rows with no previous sequential entry (the start of each group) and then adds all subsequent entries to the same group. Finally, it takes the records with their group numbers assigned, groups them by group number, and gets the min and max dates.
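The grouping of sequential numbers in this thread is the classic "gaps and islands" pattern: within a run of consecutive numbers, the number minus its position in the sorted list is constant. A minimal Python sketch of that idea, for illustration only (the function name `islands` and the sample rows are hypothetical, not the data from the screenshots):

```python
from datetime import date
from itertools import groupby

def islands(rows):
    """rows: list of (num, order_date) tuples for one employee.
    Returns one (min_date, max_date) pair per run of consecutive nums."""
    ordered = sorted(rows)
    result = []
    # p is (position, (num, order_date)); num - position is the island key
    for _key, grp in groupby(enumerate(ordered), key=lambda p: p[1][0] - p[0]):
        dates = [d for _pos, (_num, d) in grp]
        result.append((min(dates), max(dates)))
    return result

# Hypothetical rows: nums 3-6 form one island, 9-10 another
rows = [
    (3, date(2006, 7, 8)),
    (4, date(2006, 7, 8)),
    (5, date(2006, 7, 9)),
    (6, date(2006, 7, 10)),
    (9, date(2006, 7, 15)),
    (10, date(2006, 7, 16)),
]
ranges = islands(rows)
for r in ranges:
    print(r)
```

This mirrors the `num - ROW_NUMBER()` expression from the asker's update rather than the recursive CTE, but both produce one group per run of consecutive numbers.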

SQL Server - Get customers with nth order in specific date range

I'm tasked with the following:
Select a list of all customers who had their nth order during a certain date range (usually a specific month).
This list needs to contain: customer id, sum of first n orders
My tables are something like this:
[dbo.customers]: customerID
[dbo.orders]: orderID, customerID, orderDate, orderTotal
Here is what I've tried so far:
-- Let's assume our threshold (n) is 10
-- Let's assume our date range is April 2013
-- Get customers that already had n orders before the beginning of the given date range.
DECLARE @tmpcustomers TABLE (tmpcustomerID varchar(8))
INSERT INTO
@tmpcustomers
SELECT
c.customerID
FROM
orders o
INNER JOIN customers c ON o.customerID = c.customerID
WHERE
o.orderDate < '2013-04-01'
GROUP BY c.customerID
HAVING (COUNT(o.orderID) >= 10)
-- Now get all customers that have n orders sometime within the given date range
-- but did not have n orders before the beginning of the given date range.
SELECT
a.customerID, SUM(orderTotal) AS firstTenOrderTotal
SELECT
o.customerID, o.orderID, o.orderTotal
FROM
orders o
INNER JOIN customers c ON c.customerID = o.customerID
WHERE
a.customerID NOT IN ( SELECT tmpcustomerID FROM @tmpcustomers )
AND
o.orderDate > '2013-04-01'
AND
o.orderDate < '2013-05-01'
GROUP BY c.customerID
HAVING COUNT(o.orderID) >= 10
This seems to work but it's clunky and slow. Another big problem is that the firstTenOrderTotal is actually the SUM of the total amount of orders by the end of the given date range and not necessarily the first 10.
Any suggestions for a better approach would be much appreciated.
In the insert into @tmpcustomers, why are you joining back to the customers table? The orders table already has the customerID that you want. Also, why are you looking for orders where the order date is before your date range? Don't you just want customers with more than n orders within a date range? This will make the second query easier.
By only having the customers with n or more orders in the table variable @tmpcustomers, you should be able to join it to the orders table in the second query to get the sum of all the orders for those customers, once again limiting the orders to your date range (so you do not get orders outside of that range). This removes the HAVING clause and the join to the customers table in your final query.
Give this a try. Depending on your order distribution it may perform better. In this query I'm assembling the list of orders in the range, and then looking back to count the number of prior orders (also grabbing the orderTotal).
Note: I am assuming the orderID increments as orders are placed.
If this isn't the case, just use a ROW_NUMBER over the date to project the sequence into the query.
declare @orders table (orderID int primary key identity(1,1), customerID int, orderDate datetime, orderTotal int)
insert into @orders (customerID, orderDate, orderTotal)
select 1, '2013-01-01', 1 union all
select 1, '2013-01-02', 2 union all
select 1, '2013-02-01', 3 union all
select 2, '2013-01-25', 5 union all
select 2, '2013-01-26', 5 union all
select 2, '2013-02-02', 10 union all
select 2, '2013-02-02', 10 union all
select 2, '2013-02-04', 20
declare @N int, @StartDate datetime, @EndDate datetime
select @N = 3,
@StartDate = '2013-02-01',
@EndDate = '2013-02-20'
select o.customerID,
[total] = o.orderTotal + p.total --the nth order + total prior
from @orders o
cross apply ( -- count and sum the customer's earlier orders
select count(*)+1, isnull(sum(orderTotal), 0) -- isnull keeps the total non-null when @N = 1
from @orders
where customerId = o.customerID and
orderID < o.orderID and
orderDate <= o.orderDate
) p(n, total)
where orderDate between @StartDate and @EndDate and p.n = @N
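The requirement can be restated as: order each customer's orders chronologically, find the nth one, and keep the customer only if that nth order falls inside the date range, summing the first n totals. A minimal Python sketch of that restatement, for illustration only (the function name `nth_order_totals` is an assumption; the data is the sample from the query above, with order IDs assigned in insertion order):

```python
from collections import defaultdict
from datetime import date

def nth_order_totals(orders, n, start, end):
    """orders: iterable of (order_id, customer_id, order_date, order_total).
    Returns {customer_id: sum of first n order totals} for customers
    whose nth order (by date, then order_id) falls within [start, end]."""
    by_customer = defaultdict(list)
    for order_id, customer_id, order_date, total in orders:
        by_customer[customer_id].append((order_date, order_id, total))
    result = {}
    for customer_id, items in by_customer.items():
        items.sort()
        if len(items) >= n and start <= items[n - 1][0] <= end:
            result[customer_id] = sum(t for _d, _i, t in items[:n])
    return result

orders = [
    (1, 1, date(2013, 1, 1), 1),
    (2, 1, date(2013, 1, 2), 2),
    (3, 1, date(2013, 2, 1), 3),
    (4, 2, date(2013, 1, 25), 5),
    (5, 2, date(2013, 1, 26), 5),
    (6, 2, date(2013, 2, 2), 10),
    (7, 2, date(2013, 2, 2), 10),
    (8, 2, date(2013, 2, 4), 20),
]
totals = nth_order_totals(orders, 3, date(2013, 2, 1), date(2013, 2, 20))
print(totals)
```

For n = 3 both customers qualify: customer 1 with 1 + 2 + 3 and customer 2 with 5 + 5 + 10.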
Here is my suggestion:
Use Northwind
GO
select ords.OrderID , ords.OrderDate , '<-->' as Sep1 , derived1.* from
dbo.Orders ords
join
(
select CustomerID, OrderID, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderId ASC) AS ThisCustomerCardinalOrderNumber from dbo.Orders /* ASC so that 3 means the customer's third order, not the third from last */
) as derived1
on ords.OrderID = derived1.OrderID
where
derived1.ThisCustomerCardinalOrderNumber = 3
and ords.OrderDate between '06/01/1997' and '07/01/1997'
EDIT:
I took my CTE example, and reworked it for multiple Customers (seen below).
Give it the college try.
Use Northwind
GO
declare @BeginDate datetime
declare @EndDate datetime
select @BeginDate = '01/01/1900'
select @EndDate = '12/31/2010'
;
WITH
MyCTE /* http://technet.microsoft.com/en-us/library/ms175972.aspx */
( ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address],
City,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName,
ProductID,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight,ROWID) AS
(
SELECT
ShipName ,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address]
,City ,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName
,ProductID ,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate , ProductName ASC ) as ROWID /* Note that the ORDER BY (here) is directly related to the ORDER BY (near the very end of the query) */
FROM
dbo.Invoices inv /* “Invoices” is a VIEW, FYI */
where
(inv.OrderDate between @BeginDate and @EndDate)
)
SELECT
/*
ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address],
City,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName,
ProductID,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight,
*/
/*trim the list down a little for the final output */
CustomerID ,OrderID , OrderDate, (ExtendedPrice + Freight) as ComputedTotal
/*The below line is the “trick”. I reference the above CTE, but only get data that is less than or equal to the row that I am on (outerAlias.ROWID)*/
, (Select SUM (ExtendedPrice + Freight) from MyCTE innerAlias where innerAlias.ROWID <= outerAlias.ROWID and innerAlias.CustomerID = outerAlias.CustomerID) as RunningTotal
, ROWID as ROWID_SHOWN_FOR_KICKS , OrderDate as OrderDate
FROM
MyCTE outerAlias
GROUP BY CustomerID ,OrderID, OrderDate, ProductName,(ExtendedPrice + Freight) ,ROWID,OrderDate
/*Two Order By Options*/
ORDER BY outerAlias.CustomerID , outerAlias.OrderDate , ProductName
/* << Whatever the ORDER BY is here, should match the “ROW_NUMBER() OVER ( ORDER BY ________ ASC )” statement inside the CTE */
/*ORDER BY outerAlias.ROWID */ /* << Or, to keep is more “trim”, ORDER BY the ROWID, which will of course be the same as the “ROW_NUMBER() OVER ( ORDER BY” inside the CTE */

t-sql test data warehouse type 2 changes

I need to look at a data warehouse and check that a type 2 change works correctly.
I need to check that the valid-to date on a row is the same as the valid-from date on the next row.
This check is to make sure that when a row has been ended, the next row has also been started correctly.
thanks, Marc
The following relates to a Kimball type-2 dimension table.
Note that this assumes:
3000-01-01 is used as a far-future date for the current entries.
CustomerKey is an auto-incrementing integer.
This example should give you the list of rows with missing or mismatched next entries.
;
with
q_00 as (
select
CustomerKey
, CustomerBusinessKey
, rw_ValidFrom
, rw_ValidTo
, row_number() over (partition by CustomerBusinessKey order by CustomerKey asc) as rn
from dimCustomer
)
select
a.CustomerKey
, a.CustomerBusinessKey
, a.rw_ValidFrom
, a.rw_ValidTo
, b.CustomerKey as b_key
, b.CustomerBusinessKey as b_bus_key
, b.rw_ValidFrom as b_ValidFrom
, b.rw_ValidTo as b_ValidTo
from q_00 as a
left join q_00 as b on b.CustomerBusinessKey = a.CustomerBusinessKey and (b.rn = a.rn + 1)
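where a.rw_ValidTo < '3000-01-01'
-- b.rw_ValidFrom is null when the next row is missing (the left join found no match)
and (b.rw_ValidFrom is null or a.rw_ValidTo != b.rw_ValidFrom) ;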
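The check described above flags every closed row whose next version is either missing or starts on a different date. A minimal Python sketch of the same check, for illustration only (the function name `broken_links`, the constant `FAR_FUTURE`, and the sample rows are hypothetical):

```python
from datetime import date
from itertools import groupby

FAR_FUTURE = date(3000, 1, 1)  # assumed marker for current rows

def broken_links(rows):
    """rows: iterable of (customer_key, business_key, valid_from, valid_to).
    Returns the customer_keys of closed rows whose next version is
    missing or does not start on the row's valid_to date."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    bad = []
    for _bk, grp in groupby(ordered, key=lambda r: r[1]):
        versions = list(grp)
        for i, (key, _b, _valid_from, valid_to) in enumerate(versions):
            if valid_to >= FAR_FUTURE:
                continue  # current row, nothing to follow it
            nxt = versions[i + 1] if i + 1 < len(versions) else None
            if nxt is None or nxt[2] != valid_to:
                bad.append(key)
    return bad

rows = [
    (1, "C1", date(2020, 1, 1), date(2020, 6, 1)),
    (2, "C1", date(2020, 6, 1), FAR_FUTURE),
    (3, "C2", date(2020, 1, 1), date(2020, 3, 1)),
    (4, "C2", date(2020, 4, 1), FAR_FUTURE),   # starts a month late
    (5, "C3", date(2020, 1, 1), date(2020, 2, 1)),  # closed, no successor
]
bad = broken_links(rows)
print(bad)
```

Row 3 is reported because its successor starts on a different date, and row 5 because it was ended without any successor.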
Also useful
-- Make sure there are no nulls
-- for rw_ValidFrom, rw_ValidTo
select
CustomerKey
, rw_ValidFrom
, rw_ValidTo
from dimCustomer
where rw_ValidFrom is null
or rw_ValidTo is null ;
-- make sure there are no duplicates in rw_ValidFrom
-- for the same customer
select
CustomerBusinessKey
, rw_ValidFrom
, count(1) as cnt
from dimCustomer
group by CustomerBusinessKey, rw_ValidFrom
having count(1) > 1 ;
-- make sure there are no duplicates in rw_ValidTo
-- for the same customer
select
CustomerBusinessKey
, rw_ValidTo
, count(1) as cnt
from dimCustomer
group by CustomerBusinessKey, rw_ValidTo
having count(1) > 1 ;
