How to eliminate overlapping date ranges in SQL? - sql-server

I am trying to eliminate overlapping date ranges in a data set. Here is a smaller sample of the data I will be using:
How would I eliminate the highlighted first row of data, since it overlaps the other date ranges for that specific ID?

This is provided as a basic way to get you started since you are new to SO. You will undoubtedly need to change the logic on what you classify as overlapping.
--your test data...
declare @table table (ID int, BeginTime datetime, EndTime datetime)
insert into @table (ID, BeginTime, EndTime) VALUES
(101,'7/4/2016','9/21/2016'),
(101,'8/8/2016','9/8/2016'),
(101,'9/8/2016','9/21/2016'),
(102,'9/2/2016','9/7/2016'),
(103,'9/22/2016','9/28/2016'),
(103,'9/23/2016','9/28/2016')
/*
In SQL 2012 onward use LEAD and LAG to compare rows to the ones above or below them
Change this logic as you need... based on the limited information for "overlapping"
I just placed a flag where the dates didn't line up perfectly. There are undoubtedly
more cases / better logic you will need.
*/
select
ID,
BeginTime,
EndTime,
case when lead(BeginTime) over (partition by ID order by BeginTime asc) <> EndTime then 'n' else 'y' end as toKeep
from @table
--This is the same logic applied in a CTE so we can update the table
;with cte as(
select
ID,
BeginTime,
EndTime,
case when lead(BeginTime) over (partition by ID order by BeginTime asc) <> EndTime then 'n' else 'y' end as toKeep
from @table)
--Update your table via the CTE
delete from cte where toKeep = 'n'
select * from @table
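Since the answer leaves the definition of "overlapping" open, here is a minimal sketch of one possible rule: drop a range when it fully contains a different range for the same ID. That rule is my assumption, not something the question states. SQLite (through Python's sqlite3 module) stands in for SQL Server so the sketch runs anywhere; the table name ranges and the ISO date strings are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges (ID INTEGER, BeginTime TEXT, EndTime TEXT)")
conn.executemany(
    "INSERT INTO ranges VALUES (?, ?, ?)",
    [
        (101, "2016-07-04", "2016-09-21"),
        (101, "2016-08-08", "2016-09-08"),
        (101, "2016-09-08", "2016-09-21"),
        (102, "2016-09-02", "2016-09-07"),
        (103, "2016-09-22", "2016-09-28"),
        (103, "2016-09-23", "2016-09-28"),
    ],
)

# Keep a row only if no *different* range for the same ID lies entirely inside it.
# ISO-formatted date strings compare correctly as plain text.
remaining = conn.execute(
    """
    SELECT r.ID, r.BeginTime, r.EndTime
    FROM ranges AS r
    WHERE NOT EXISTS (
        SELECT 1
        FROM ranges AS o
        WHERE o.ID = r.ID
          AND o.BeginTime >= r.BeginTime
          AND o.EndTime <= r.EndTime
          AND (o.BeginTime <> r.BeginTime OR o.EndTime <> r.EndTime)
    )
    ORDER BY r.ID, r.BeginTime
    """
).fetchall()

for row in remaining:
    print(row)
```

With the sample data this drops the wide 101 row and the wider of the two 103 rows. If your rule is "any shared day counts as overlap", the comparison inside NOT EXISTS would need to change.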

Related

SQL Server - Where clause uses maximum date in data

I'm struggling with something I thought would be easy.
I have a table that is updated via an append on most days and has a report date field that shows the date the rows were updated.
I want to join to this table but only pull back the records from the date the table was last updated.
Most of the time I could get away with just looking for yesterday's date, as the table is updated most days:
Where [reportdate] > DATEADD(DAY, -1, GETDATE())
But as it's not always updated daily, I wanted to rule this issue out. Is there any way of returning the max date?
I was trying to figure out max(date), but I can't figure out the grouping. I need to return all the fields. The query below just seems to return the whole table:
SELECT max ([ReportDate]) as reportdate
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM table
group by guid
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
I could get round it with a temp table that just holds the max report date and then use it as the left side of a join:
SELECT max ([ReportDate]) as reportdate
FROM [DOMCustomers].[dbo].[DCC_Device_Comms_Compiled]
But the SQL is triggered in Excel, so temp tables are problematic (I think).
Is there any way of returning the max date?
Like this:
SELECT *
FROM SomeTable
where ReportDate = (select max(ReportDate) from SomeTable)
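To see what that subquery filter returns, here is a small runnable sketch. It uses SQLite via Python's sqlite3 module as a stand-in for SQL Server, and the two-column SomeTable and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (ReportDate TEXT, Make TEXT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?)",
    [
        ("2020-12-29", "Tesla"),
        ("2020-12-30", "Ford"),
        ("2020-12-31", "Ford"),
        ("2020-12-31", "Tesla"),
    ],
)

# The scalar subquery finds the latest ReportDate once;
# the outer query returns every row carrying that date.
latest_rows = conn.execute(
    """
    SELECT ReportDate, Make
    FROM SomeTable
    WHERE ReportDate = (SELECT MAX(ReportDate) FROM SomeTable)
    ORDER BY Make
    """
).fetchall()

print(latest_rows)
```

No GROUP BY is needed, because the maximum is computed over the whole table rather than per row.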
Here is a conceptual example.
It will produce the latest row for each car make.
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, make VARCHAR(20), ReportDate DATETIME);
INSERT INTO @tbl (make, ReportDate) VALUES
('Ford', '2020-12-31'),
('Ford', '2020-10-17'),
('Tesla', '2020-10-25'),
('Tesla', '2020-12-30');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY make ORDER BY ReportDate DESC) AS seq
FROM @tbl
)
SELECT * FROM rs
WHERE seq = 1;
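For anyone unsure what ROW_NUMBER() with PARTITION BY is doing there, the same steps can be sketched in plain Python: sort by make with the newest date first, number the rows within each make, and keep number 1. This is only an illustration of the logic, not how SQL Server executes it.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

rows = [
    ("Ford", date(2020, 12, 31)),
    ("Ford", date(2020, 10, 17)),
    ("Tesla", date(2020, 10, 25)),
    ("Tesla", date(2020, 12, 30)),
]

# ORDER BY make, ReportDate DESC
ordered = sorted(rows, key=lambda r: (r[0], -r[1].toordinal()))

latest = []
# PARTITION BY make: number the rows inside each group, keep seq = 1
for make, group in groupby(ordered, key=itemgetter(0)):
    for seq, row in enumerate(group, start=1):
        if seq == 1:
            latest.append(row)

print(latest)
```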
Seems like a DENSE_RANK and TOP would work (assuming ReportDate is a date):
SELECT TOP (1) WITH TIES
[ReportDate]
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM YourTable
ORDER BY DENSE_RANK() OVER (ORDER BY ReportDate DESC);
If ReportDate is a date and time value, and you want everything for the latest date (ignoring time), then replace ReportDate with CONVERT(date,ReportDate) in the ORDER BY.
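The point about ignoring the time part can be sketched in Python: strip each timestamp down to its date, find the latest date, and keep every row that falls on it. The sample rows and the single-letter labels are made up.

```python
from datetime import datetime

rows = [
    (datetime(2020, 12, 30, 23, 59), "A"),
    (datetime(2020, 12, 31, 8, 0), "B"),
    (datetime(2020, 12, 31, 17, 30), "C"),
]

# Equivalent in spirit to ranking on CONVERT(date, ReportDate):
# compare calendar days, not full timestamps.
latest_day = max(ts.date() for ts, _ in rows)
latest_rows = [row for row in rows if row[0].date() == latest_day]

print(latest_rows)
```

Comparing full timestamps instead would return only row C, because 17:30 is later than 08:00 on the same day.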

Repeat customers with multiple purchases on the same day count as 1

I am trying to wrap my head around this problem. I was asked to create a report that shows repeat customers in our database.
One of the requirements is that if a customer has more than 1 order on a specific date, it only counts as 1.
Then, if they have more than 1 purchase date, they count as a repeat customer.
Searching on here, I found this, which works for finding the customers with more than 1 purchase on a specific purchase date.
SELECT DISTINCT s.[CustomerName], s.PurchaseDate
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1;
This MSSQL code works as it should, showing customers who had more than 1 purchase on the same date.
My problem is: what would be the best approach to join this into another query (this is where I need help) that then shows a complete repeat customer list, where customers with more than 1 purchase would be returned?
I am using MSSQL. Any help would be greatly appreciated.
You're close. You need to move distinct into your having clause, because you want to include only customers that have more than 1 distinct purchase date.
Also, group by the customer only, because the different dates have to be part of the same group for count distinct to work.
SELECT s.[CustomerName], COUNT(distinct cast(s.PurchaseDate as date))
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName]
HAVING COUNT(distinct cast(s.PurchaseDate as date)) > 1;
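Here is a runnable sketch of that query on made-up rows, using SQLite through Python's sqlite3 module in place of SQL Server. SQLite's date() function plays the role of cast(... as date); the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vw_Repeat (CustomerName TEXT, PurchaseDate TEXT)")
conn.executemany(
    "INSERT INTO vw_Repeat VALUES (?, ?)",
    [
        ("Cust1", "2017-11-16 09:00:00"),  # two orders, same day
        ("Cust1", "2017-11-16 15:30:00"),
        ("Cust2", "2017-11-16 10:00:00"),  # two orders, two days
        ("Cust2", "2017-11-20 11:00:00"),
        ("Cust3", "2017-11-17 08:00:00"),  # three orders, two days
        ("Cust3", "2017-11-17 12:00:00"),
        ("Cust3", "2017-11-18 09:00:00"),
    ],
)

# Count distinct purchase *days* per customer; more than one day = repeat customer.
repeat_customers = conn.execute(
    """
    SELECT CustomerName, COUNT(DISTINCT date(PurchaseDate)) AS PurchaseDays
    FROM vw_Repeat
    WHERE PurchaseDate <> ''
    GROUP BY CustomerName
    HAVING COUNT(DISTINCT date(PurchaseDate)) > 1
    ORDER BY CustomerName
    """
).fetchall()

print(repeat_customers)
```

Cust1 is excluded because both orders fall on the same day, which is exactly the "counts as 1" requirement.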
If you want to pass a parameter to a query and join the result, that's what table-valued functions are for. When you join it, you use CROSS APPLY or OUTER APPLY instead of an INNER JOIN or a LEFT JOIN.
Also, I think this goes without saying, but when you check if PurchaseDate is empty:
WHERE s.PurchaseDate <> ''
Could be issues there... it implies it's a varchar field instead of a datetime (yes?) and doesn't handle null values. You might, at least, want to replace that with ISNULL(s.PurchaseDate, '') <> ''. If it's actually a datetime, use IS NOT NULL instead of <> ''.
(Edited to add sample data and DDL statements. I recommend adding these to SQL posts to assist answerers. Also, I made purchasedate a varchar instead of a datetime because of the string comparison in the query.)
https://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
CREATE TABLE company (company_name VARCHAR(25))
INSERT INTO company VALUES ('Company1'), ('Company2')
CREATE TABLE vw_repeat (customername VARCHAR(25), purchasedate VARCHAR(25), company VARCHAR(25))
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust2', '11/16/2017', 'Company2')
GO
-- CREATE FUNCTION has to be the first statement in its batch, hence the GO above
CREATE FUNCTION [dbo].tf_customers
(
@company varchar(25)
)
RETURNS TABLE AS RETURN
(
SELECT s.[CustomerName], cast(s.PurchaseDate as date) PurchaseDate
FROM vw_Repeat s
WHERE s.PurchaseDate <> '' AND s.Company = @company
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1
)
GO
SELECT *
FROM company c
CROSS APPLY tf_customers(c.company_name)
First, thanks to everyone for the help.
@MaxSzczurek suggested I use table-valued functions. After looking into this more, I ended up just using a temporary table first to get the DISTINCT purchase dates for each customer. I then loaded that into another temp table RIGHT JOINed to the main table. This gave me the result I was looking for. It's a little (a lot) ugly, but it works.

List all dates within date range in SQL but ignore bank holidays

I'm making a holiday manager.
I have a table with a list of start and end dates for each instance of holiday.
[LeaveID], [EmployeeID], [StartDate], [EndDate]
I also do have a calendar table with dates from 2016-2030, listing the usual variations of date format as well as times the factory is shut, including bank holidays, etc.
I'm working on the front end for it now. They want me to display it in a sort of calendar format, so I will need to mark on each day who has booked time off.
I figure I need to list each date within each date range (start date to end date), then check if each date on the calendar appears on that list.
So basically I need to get a list of dates within a date range.
On top of that, I'd like to be able to compare the list of dates from above to the calendar table, so I can ignore bank holidays when calculating the amount of holiday used for each instance.
Thanks in advance!
To get a list of dates within a date range, you will need a source of numbers from 1 to n. I usually create such a table and call it a Numbers table.
To generate a list of dates within a range, use the following query.
SELECT
DATEADD(DAY, Numbers.Number-1, [StartDate]) Date
FROM
Numbers
WHERE
DATEADD(DAY, Numbers.Number-1, [StartDate]) <= [EndDate]
To create such a table, refer to this question.
If you want to list all the dates for every row in the Employee table, just cross join it.
SELECT
e.EmployeeID,
DATEADD(DAY, n.Number-1, e.[StartDate]) Date
FROM
Numbers n, Employee e
WHERE
DATEADD(DAY, n.Number-1, e.[StartDate]) <= e.[EndDate]
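The Numbers-table trick is just "start date plus (n - 1) days, for n = 1, 2, 3, ... until the end date is passed". A minimal Python sketch of the same idea follows; expand_range, max_days and the two leave rows are names and data I made up for illustration.

```python
from datetime import date, timedelta

def expand_range(start, end, max_days=366):
    """Yield every date from start to end inclusive.

    range(1, max_days + 1) plays the role of the Numbers table;
    start + (number - 1) days mirrors DATEADD(DAY, Number-1, StartDate).
    """
    for number in range(1, max_days + 1):
        day = start + timedelta(days=number - 1)
        if day > end:  # same filter as the WHERE ... <= EndDate clause
            break
        yield day

leave = [
    (1, date(2016, 11, 1), date(2016, 11, 3)),
    (2, date(2016, 11, 7), date(2016, 11, 8)),
]

# "Cross join" each leave row with the numbers and keep the dates in range.
days = [(emp, d) for emp, start, end in leave for d in expand_range(start, end)]

for emp, d in days:
    print(emp, d.isoformat())
```

As in the SQL version, the Numbers source has to be at least as long as the longest range, which is what max_days stands for here.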
As you already have a dates table, you do not need the numbers table mentioned in the other answer. To accomplish what you are after requires a simple SQL Join from your dates table. Depending on how you want to format your final report you can either count up the number of EmployeeIDs returned or group them all into a calendar/table control in your front end on the DateValue.
In the query below you will get at least one DateValue for every date specified in the range (for which you can apply your own filtering such as where Dates.BankHoliday = 0 etc) and more than one where multiple Employees have taken leave:
-- Build some dummy data to run the query against.
declare @Emp table (LeaveID int, EmployeeID int , StartDate datetime, EndDate datetime);
insert into @Emp values
(1,1,'20161101','20161105')
,(2,1,'20161121','20161124')
,(3,2,'20161107','20161109')
,(4,3,'20161118','20161122');
declare @Dates table (DateKey int, DateValue datetime, DateLabel nvarchar(50));
declare @s datetime = '20161025';
with cte as
(
select cast(convert(nvarchar(8),@s,112) as int) as DateKey
,@s as DateValue
,convert(nvarchar(50),@s,103) as DateLabel
union all
select cast(convert(nvarchar(8),DateValue+1,112) as int)
,DateValue+1
,convert(nvarchar(50),DateValue+1,103)
from cte
where DateValue+1 <= '20161205'
)
insert into @Dates
select * from cte;
-- Actually query the data.
-- Define the start and end of your date range to return.
declare @MinStart datetime = (select min(StartDate) from @Emp);
declare @MaxEnd datetime = (select max(EndDate) from @Emp);
select d.DateValue
,e.EmployeeID
from @Dates d
left join @Emp e
on(d.DateValue between e.StartDate and e.EndDate)
where d.DateValue between @MinStart and @MaxEnd
order by d.DateValue
,e.EmployeeID;
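For the second half of the question (days of holiday used, ignoring bank holidays), the same join can be grouped by LeaveID. Below is a rough sketch using SQLite through Python's sqlite3 module instead of SQL Server. The BankHoliday flag column follows the Dates.BankHoliday = 0 filter mentioned above, and the bank holiday on 2016-11-03 is invented purely so the filter has something to exclude.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (LeaveID INTEGER, EmployeeID INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2016-11-01", "2016-11-05"),
        (2, 1, "2016-11-21", "2016-11-24"),
        (3, 2, "2016-11-07", "2016-11-09"),
        (4, 3, "2016-11-18", "2016-11-22"),
    ],
)

# Small calendar table; the holiday date is a made-up example, not a real one.
conn.execute("CREATE TABLE Dates (DateValue TEXT, BankHoliday INTEGER)")
made_up_holidays = {date(2016, 11, 3)}
day = date(2016, 10, 25)
while day <= date(2016, 12, 5):
    conn.execute(
        "INSERT INTO Dates VALUES (?, ?)",
        (day.isoformat(), int(day in made_up_holidays)),
    )
    day += timedelta(days=1)

# One calendar row per day of leave, bank holidays filtered out, counted per leave.
days_used = conn.execute(
    """
    SELECT e.LeaveID, COUNT(*) AS DaysUsed
    FROM Emp AS e
    JOIN Dates AS d ON d.DateValue BETWEEN e.StartDate AND e.EndDate
    WHERE d.BankHoliday = 0
    GROUP BY e.LeaveID
    ORDER BY e.LeaveID
    """
).fetchall()

print(days_used)
```

Leave 1 spans five calendar days but counts as four because of the invented holiday. Weekends are not excluded here; that would need another flag on the calendar table.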

SQL running sum for an MVC application

I need a faster method to calculate and display a running sum.
It's an MVC telerik grid that queries a view that generates a running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(inside.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution: (one of many, this one is orders of magnitude faster than a sub-query)
Create a new table with all the columns in the view except for the RunningTotal column. Create a stored procedure that first truncates the table, then INSERTs INTO the table using a SELECT of all columns, without the running sum column.
Use the local-variable UPDATE method:
DECLARE @Amount DECIMAL(18,4)
SET @Amount = 0
UPDATE TABLE_YOU_JUST_CREATED SET RunningTotal = @Amount, @Amount = @Amount + ISNULL(Amount,0)
Create a task agent that will run the stored procedure once a day. Use the TABLE_YOU_JUST_CREATED for all your reports.
Take a look at this post
Calculate a Running Total in SQL Server
If you have SQL Server 2012 (code-named Denali) or later, you can use the new windowed functions.
In SQL Server 2008 R2, I suggest you use a recursive common table expression.
One small problem with the CTE is that, for a fast query, you need an identity column without gaps (1, 2, 3,...). If you don't have such a column, you have to create a temporary table or table variable with one and move your data there.
The CTE approach will be something like this:
declare @Temp_Numbers table (RowNum int, Amount <your type>, EffectiveDate datetime)
insert into @Temp_Numbers (RowNum, Amount, EffectiveDate)
select row_number() over (order by EffectiveDate), Amount, EffectiveDate
from vCI_UNIONALL
-- you can also use identity
-- declare @Temp_Numbers table (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- insert into @Temp_Numbers (Amount, EffectiveDate)
-- select Amount, EffectiveDate
-- from vCI_UNIONALL
-- order by EffectiveDate
;with
CTE_RunningTotal
as
(
select T.RowNum, T.EffectiveDate, T.Amount as Total_Amount
from @Temp_Numbers as T
where T.RowNum = 1
union all
select T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount as Total_Amount
from CTE_RunningTotal as C
inner join @Temp_Numbers as T on T.RowNum = C.RowNum + 1
)
select C.RowNum, C.EffectiveDate, C.Total_Amount
from CTE_RunningTotal as C
option (maxrecursion 0)
There may be some questions about duplicate EffectiveDate values. It depends on how you want to handle them: do you want them ordered arbitrarily, or do you want them to share the same running total?
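The reason these approaches beat the correlated sub-query is that they walk the rows once in EffectiveDate order and carry the total forward, instead of re-summing every earlier row for each row. A minimal Python sketch of that single pass, on made-up amounts:

```python
from datetime import date
from itertools import accumulate

# (EffectiveDate, Amount) in arbitrary order; values are illustrative only.
rows = [
    (date(2021, 1, 3), 5),
    (date(2021, 1, 1), 10),
    (date(2021, 1, 2), -3),
]

# Order once by EffectiveDate, then carry the running total forward.
ordered = sorted(rows, key=lambda r: r[0])
running = list(accumulate(amount for _, amount in ordered))

result = [(d, amount, total) for (d, amount), total in zip(ordered, running)]

for d, amount, total in result:
    print(d.isoformat(), amount, total)
```

Rows that share an EffectiveDate would be totalled in whatever order the sort leaves them, which is the duplicate-date question raised above.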

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (omitted above for the sake of space). That is the source of the slowness of the query, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over the generation of the query, so if I could find more efficient SQL that returned the same results, I could modify the query generator appropriately. Any pointers would be greatly appreciated.
I haven't tested it, but I think it can be done with:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
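A quick way to check the conditional-aggregation idea is to run it on a few rows. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server, with date(dateCol) standing in for [dateExpr]; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dateCol TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("2021-01-01 08:00:00", "A"),
        ("2021-01-01 09:30:00", "B"),
        ("2021-01-01 17:45:00", "A"),
        ("2021-01-02 10:00:00", "B"),
        ("2021-01-03 12:00:00", "A"),
    ],
)

# One scan of the table: each row contributes 1 to its day's sum only when
# column2 = 'A', so no per-group sub-query is needed.
series = conn.execute(
    """
    SELECT date(dateCol) AS domainValue,
           SUM(CASE WHEN column2 = 'A' THEN 1 ELSE 0 END) AS series1
    FROM table1
    GROUP BY date(dateCol)
    ORDER BY domainValue
    """
).fetchall()

print(series)
```

Days with no matching rows still appear with a count of 0, which matches what the original COUNT(*) sub-query would return for them.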
The fastest way to do this would be to use calendar tables. Create a SQL table with an entry for every month, for however many years ahead you need. Then select from that calendar table, joining in the entries from table1 that have dates between the start and end date for the month. Then, if your clustered index is on the dateCol in table1, the query will run very quickly.
EDIT: Example Query. This assumes a months table exists with two columns, StartDate and EndDate where EndDate is the midnight on the first day of the next month. The clustered index on the months table should be on StartDate
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
GROUP BY months.StartDate;
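The key detail in that join is the half-open range: >= StartDate and < EndDate, so a row at exactly midnight on the first of the month lands in the new month only. A small sketch with SQLite via Python's sqlite3 module (standing in for SQL Server, with made-up rows) shows the boundary behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE months (StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO months VALUES (?, ?)",
    [
        ("2021-01-01 00:00:00", "2021-02-01 00:00:00"),
        ("2021-02-01 00:00:00", "2021-03-01 00:00:00"),
    ],
)
conn.execute("CREATE TABLE table1 (dateCol TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [
        ("2021-01-15 10:00:00",),
        ("2021-01-31 23:59:59",),  # last second of January
        ("2021-02-01 00:00:00",),  # first instant of February
    ],
)

# Range join on the raw column: no per-row string building,
# so an index on dateCol remains usable.
counts = conn.execute(
    """
    SELECT months.StartDate, COUNT(*) AS Cnt
    FROM months
    INNER JOIN table1
        ON table1.dateCol >= months.StartDate
       AND table1.dateCol <  months.EndDate
    GROUP BY months.StartDate
    ORDER BY months.StartDate
    """
).fetchall()

print(counts)
```

Because this is an inner join, a month with no rows in table1 would be missing from the output; a LEFT JOIN with COUNT(table1.dateCol) would keep it with a count of 0.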
With Calendar As
(
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
From Table1
Union All
Select DateAdd(d, 1, [date])
From Calendar
Where [date] <= (
Select Max( DateAdd(d, DateDiff(d, 0, dateCol) + 1, 0) )
From Table1
)
)
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.

Resources