How To Count Rows Based on Values of Two Variables in SSIS - sql-server

I am fairly new to SSIS, and I have a requirement to exclude weekends for performance-management reporting. I have created a calendar table and flagged the weekends. What I am trying to do, using SSIS, is take the start and end date of every status and count how many weekends fall in between. I am struggling to work out which component to use for this.
So I have mainly two tables:
1- Table Calendar
2- Table History-Log
Calendar has the following columns:
1- ID
2- date
3- year
4- month
5- day of week
6- isweekend
History-Log has the following:
1- ID
2- Status
3- startdate
4- enddate
Your help is really appreciated.

I'm not an SSIS user, so apologies if this answer does not help, but if I wanted to get the result you describe, based on some test data:
DECLARE @Calendar TABLE (
    ID INT,
    [Date] DATETIME,
    [Year] INT,
    [Month] INT,
    [DayOfWeek] VARCHAR(10),
    IsWeekend BIT
)
DECLARE @HistoryLog TABLE (
    ID INT,
    [Status] INT,
    StartDate DATETIME,
    EndDate DATETIME
)
DECLARE @StartDate DATE = '20100101', @NumberOfYears INT = 10
DECLARE @CutoffDate DATE = DATEADD(YEAR, @NumberOfYears, @StartDate);
-- Populate the calendar with one row per day between @StartDate and @CutoffDate
INSERT INTO @Calendar
SELECT ROW_NUMBER() OVER (ORDER BY d) AS ID,
       d AS [Date],
       DATEPART(YEAR,d) AS [Year],
       DATEPART(MONTH,d) AS [Month],
       DATENAME(WEEKDAY,d) AS [DayOfWeek],
       CASE WHEN DATENAME(WEEKDAY,d) IN ('Saturday','Sunday') THEN 1 ELSE 0 END AS IsWeekend
FROM
(
    SELECT d = DATEADD(DAY, rn - 1, @StartDate)
    FROM
    (
        SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate))
               rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
        FROM sys.all_objects AS s1
        CROSS JOIN sys.all_objects AS s2
        ORDER BY s1.[object_id]
    ) AS x
) AS y;
-- Test data for the history log
INSERT INTO @HistoryLog
SELECT 1, 3, '2016-01-05', '2016-01-20'
UNION
SELECT 2, 7, '2016-01-08', '2016-01-25'
UNION
SELECT 3, 4, '2016-01-01', '2016-02-03'
UNION
SELECT 4, 3, '2016-02-09', '2016-02-10'
I would use a query like this to return all of the HistoryLog records with a count of the number of weekend days between their StartDate and EndDate:
SELECT h.ID,
       h.[Status],
       h.StartDate,
       h.EndDate,
       COUNT(c.ID) AS WeekendDays
FROM @HistoryLog h
LEFT JOIN @Calendar c ON c.[Date] >= h.StartDate AND c.[Date] <= h.EndDate AND c.IsWeekend = 1
GROUP BY h.ID, h.[Status], h.StartDate, h.EndDate
ORDER BY 1
If you wanted to know the number of weekends, rather than the number of weekend days, we'd need to slightly amend this logic (and define how a range containing only one weekend day - or one starting on a Sunday and ending on a Saturday inclusive - should be handled). Assuming you just want to know how many distinct weekends are at least partially within the date range, you could do:
SELECT h.ID,
       h.[Status],
       h.StartDate,
       h.EndDate,
       COUNT(weekends.ID) AS Weekends
FROM @HistoryLog h
LEFT JOIN
(
    -- One row per weekend, keyed on its Saturday
    SELECT c.ID,
           c.[Date] AS SatDate,
           DATEADD(DAY,1,c.[Date]) AS SunDate
    FROM @Calendar c
    WHERE c.[DayOfWeek] = 'Saturday'
) weekends ON h.StartDate BETWEEN weekends.SatDate AND weekends.SunDate
           OR h.EndDate BETWEEN weekends.SatDate AND weekends.SunDate
           OR (h.StartDate <= weekends.SatDate AND h.EndDate >= weekends.SunDate)
GROUP BY h.ID, h.[Status], h.StartDate, h.EndDate
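Since the stated goal is to exclude weekends, the weekend-day count can also be subtracted from the inclusive length of each range. This is only a minimal sketch: it assumes whole-day dates and a calendar that covers every date in the range, and the names @Cal, @Log, TotalDays and WorkingDays are made up for illustration rather than taken from the question.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
DECLARE @Cal TABLE ([Date] DATE PRIMARY KEY, IsWeekend BIT NOT NULL);
DECLARE @Log TABLE (ID INT, StartDate DATE, EndDate DATE);

INSERT INTO @Cal ([Date], IsWeekend)
VALUES ('2016-02-05', 0),  -- Friday
       ('2016-02-06', 1),  -- Saturday
       ('2016-02-07', 1),  -- Sunday
       ('2016-02-08', 0);  -- Monday

INSERT INTO @Log (ID, StartDate, EndDate)
VALUES (1, '2016-02-05', '2016-02-08');

SELECT l.ID,
       DATEDIFF(DAY, l.StartDate, l.EndDate) + 1 AS TotalDays,      -- inclusive span
       COUNT(c.[Date]) AS WeekendDays,                              -- flagged days only
       DATEDIFF(DAY, l.StartDate, l.EndDate) + 1 - COUNT(c.[Date]) AS WorkingDays
FROM @Log AS l
LEFT JOIN @Cal AS c
       ON c.[Date] BETWEEN l.StartDate AND l.EndDate
      AND c.IsWeekend = 1
GROUP BY l.ID, l.StartDate, l.EndDate;
```

For the single sample row (Friday to Monday) this gives 4 total days, 2 weekend days and 2 working days. The LEFT JOIN keeps ranges that contain no weekend at all, which then simply report 0 weekend days.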

Related

Comments please on my script - how to make it more efficient

I have a table (SYS_Holidays) that has the start and end dates of each holiday period. I need to output all holiday dates in relational form. For example, if I have 25-Dec-2017 to 2-Jan-2018 as one row in the input, I want to output 25-Dec, 26-Dec ... through 2-Jan as 9 rows.
I have written this script, could you please tell me how I can make it more efficient?
SELECT
H.HolidayName
, DATEADD(DAY, Number-1, H.StartDate) AS HolidayDate
FROM
SYS_Holidays AS H
CROSS JOIN Config_Numbers AS N
WHERE
-- Figure the # of days between start and end: one row for each holiday-date
-- If EndDate is null, just use StartDate (i.e. 1-day holiday)
N.Number <= DATEDIFF(DAY, H.StartDate, ISNULL(H.EndDate, H.StartDate) ) + 1
NB: Config_Numbers is a table I have created with a huge list of integers (as BIGINT)
It can be done using a date table and an inner join. If the subquery below is not efficient enough, materialize it as a real table:
Create table #Test (HolidayName nvarchar(100), StartDate Date, EndDate Date)
Insert Into #Test Values ('Christmas', '2017-12-22', '2018-01-03'), ('Easter' , '2017-04-10', '2017-04-16')
SELECT HolidayName, DatesList.[Date] as HolidayDate
FROM #Test t
inner join (
SELECT cast(dateadd(day, number, '2017-1-1') as date) as [Date]
FROM master..spt_values WHERE type='P' AND number < 1000) AS DatesList
on t.StartDate<=DatesList.[Date] and t.EndDate>=DatesList.[Date]
I modified @cloudsafe's answer to yield the code below. It is still much faster than any of the joins using Config_Numbers. Subtree cost came to ~0.2785.
I figured that 2048 days cover a little more than 5 years, so I broke my code up into 5-year blocks and used a UNION to join them up.
Trouble is, I'd have to remember to add another UNION every 5 years :-(
SELECT HolidayName, DatesList.[Date] as HolidayDate
FROM SYS_Holidays AS H
inner join (
    SELECT cast(dateadd(day, number, '2013-01-01') as date) as [Date]
    FROM master..spt_values WHERE type='P' AND number < 2048) AS DatesList
on H.StartDate <= DatesList.[Date] and H.EndDate >= DatesList.[Date]
UNION
-- Both branches of a UNION must return the same number of columns
SELECT HolidayName, DatesList.[Date] as HolidayDate
FROM SYS_Holidays AS H
inner join (
    SELECT cast(dateadd(day, number, '2018-01-01') as date) as [Date]
    FROM master..spt_values WHERE type='P' AND number < 2048) AS DatesList
on H.StartDate <= DatesList.[Date] and H.EndDate >= DatesList.[Date]
Any further suggestions for improvement, please?
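One way to avoid adding a UNION every five years might be to offset the numbers from each row's own StartDate instead of from a fixed calendar date, so the 2048-number limit applies to the length of a single holiday rather than to the calendar window. A rough sketch, assuming no holiday runs longer than 2048 days; the table variable @Holidays and the 'Founders Day' row are invented for the example, and master..spt_values is an undocumented system table, so a dedicated numbers table may be the safer choice in production.

```sql
-- Hypothetical stand-in for SYS_Holidays.
DECLARE @Holidays TABLE (HolidayName NVARCHAR(100), StartDate DATE, EndDate DATE NULL);

INSERT INTO @Holidays (HolidayName, StartDate, EndDate)
VALUES ('Christmas',    '2017-12-25', '2018-01-02'),
       ('Founders Day', '2018-03-01', NULL);        -- NULL EndDate = one-day holiday

SELECT h.HolidayName,
       DATEADD(DAY, n.number, h.StartDate) AS HolidayDate
FROM @Holidays AS h
JOIN master..spt_values AS n
  ON n.type = 'P'                                    -- the 0..2047 integer sequence
 AND n.number <= DATEDIFF(DAY, h.StartDate, ISNULL(h.EndDate, h.StartDate))
ORDER BY h.HolidayName, HolidayDate;
```

With the sample rows this produces nine rows for Christmas (25-Dec through 2-Jan) and one row for the single-day holiday.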

TSQL Query 12 months of Data - Include Months without records

I am trying to create a 12-month grid view of all questions that were submitted in each month of that 12-month period.
SELECT
YEAR(h.metaInsert) [Year],
MONTH(h.metaInsert) [Month],
DATENAME(MONTH,h.metaInsert) [Month Name],
COUNT(1) [Total Documents]
FROM
Document_Count_History AS h
WHERE
YEAR(h.metaInsert) = 2017
GROUP BY
YEAR(h.metaInsert), MONTH(h.metaInsert), DATENAME(MONTH, h.metaInsert)
ORDER BY
1, 2
This returns the data perfectly for the months that have it, but I get no data returned for those with 0 records for that specific month.
My goal is to see all 12 months along with the count of documents. If there are no documents, it will simply be a 0 for that month but it will be included in the result set.
How can I take what I have and apply the missing months?
You could use something like this to generate the sequence of months for your query:
declare @StartDate date = '20170101'
      , @NumberOfYears int = 1;
;with Months as (
    select top (12*@NumberOfYears)
        [Month] = dateadd(Month, row_number() over (order by number) -1, @StartDate)
      , NextMonth = dateadd(Month, row_number() over (order by number), @StartDate)
    from master.dbo.spt_values
)
select
    year(m.Month) [Year],
    Month(m.Month) [Month],
    datename(Month,m.Month) [Month Name],
    count(h.metaInsert) [Total Documents] -- counts matched rows only, so empty months show 0
from Months as m
left join Document_Count_History AS h
    on h.metaInsert >= m.Month
    and h.metaInsert < m.NextMonth
--where h.metaInsert >= '20170101'
group by m.Month
order by m.Month
Although you may want to consider adding a Calendar table, or Date Dimension.
Calendar and Numbers table references:
Generate a set or sequence without loops - 1 - Aaron Bertrand
The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
Creating a Date Table/Dimension in SQL Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in SQL Server - Aaron Bertrand
An example months table:
create table dbo.Months(
    MonthStart date not null primary key
  , NextMonthStart date not null
  , [Year] smallint not null
  , [Month] tinyint not null
  , [MonthName] varchar(16) not null
);
declare @StartDate date = '20100101'
      , @NumberOfYears int = 30;
-- The column list must name all five columns supplied by the select
insert dbo.Months(MonthStart,NextMonthStart,[Year],[Month],[MonthName])
select top (12*@NumberOfYears)
    [MonthStart] = dateadd(month, row_number() over (order by number) -1, @StartDate)
  , NextMonthStart = dateadd(month, row_number() over (order by number), @StartDate)
  , [year] = year(dateadd(month, row_number() over (order by number) -1, @StartDate))
  , [Month] = Month(dateadd(month, row_number() over (order by number) -1, @StartDate))
  , MonthName = datename(Month,dateadd(month, row_number() over (order by number) -1, @StartDate))
from master.dbo.spt_values;
and your query would simplify to:
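select
    m.[Year],
    m.[Month],
    m.[MonthName],
    count(h.metaInsert) [Total Documents] -- counts matched rows only, so empty months show 0
from Months as m
left join Document_Count_History AS h
    on h.metaInsert >= m.MonthStart
    and h.metaInsert < m.NextMonthStart
where m.[Year] = 2017
-- MonthStart must be grouped because it is used in the order by
group by m.MonthStart, m.[Year], m.[Month], m.[MonthName]
order by m.MonthStart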
You need a date dimension. Specifically, you need a table that has all the values for months. Then, you can do a left-join on the table that gets the totals, and pull out a sum value.

Count the number of times a date is contained between 2 date columns

I have a table that looks like this
ID start_dt end_dt
--------------------------
1 1951-12-05 1951-12-21
2 1951-12-19 1951-12-31
3 1957-12-05 1957-12-19
4 1995-12-06 1995-12-20
5 1996-06-24 1996-07-08
6 1997-05-12 1997-05-26
7 1997-10-07 1997-10-21
8 1997-12-25 1998-01-08
9 1998-01-19 1998-02-02
10 1998-08-05 1998-08-19
I'd like to know how many times each individual date is contained between start_dt and end_dt.
From my example, the result set should look something like this
date count
------------------
1951-12-05 1
1951-12-06 1
...
1951-12-19 2
1951-12-20 2
1951-12-21 2
...
1998-08-19 1
What would be the best way to do this?
EDIT: To clarify, I need each date that appears at least once in a date range (between start_dt and end_dt) to get a row in my result set and I want the number of ranges that this date fits in next to it
hope this helps
When you need to turn 2 values (a range) into a series of rows you can use a number table (see Aaron Bertrand's The SQL Server Numbers Table article if you aren't familiar with the idea).
I've used shorter and simpler data but you should get the idea.
declare @dates table (id int not null, start_dt date not null, end_dt date not null)
insert @dates values (1, '20160601', '20160603'),
                     (2, '20160603', '20160605'),
                     (3, '20160610', '20160612')
;with cte as (
    -- zero-based number sequence built from a system view
    select
        row_number() over (order by so1.object_id) - 1 as n
    from
        sys.objects so1
        cross join sys.objects so2
)
select
    dateadd(d, c.n, d.start_dt) as [date],
    count(*)
from
    @dates d
    join cte c on dateadd(d, c.n, d.start_dt) <= d.end_dt
group by
    dateadd(d, c.n, d.start_dt)
order by
    dateadd(d, c.n, d.start_dt)
If there are no more than a few days (< 80 or so, depending on your sys.objects table) between start_dt and end_dt, you can use this approach (inspired by Rhys's answer).
DECLARE @dates TABLE (id int not null, start_dt date not null, end_dt date not null)
INSERT @dates VALUES
    (1, '1951-12-05', '1951-12-21'),
    (2, '1951-12-19', '1951-12-31'),
    (3, '1957-12-05', '1957-12-19'),
    (4, '1995-12-06', '1995-12-20'),
    (5, '1996-06-24', '1996-07-08'),
    (6, '1997-05-12', '1997-05-26'),
    (7, '1997-10-07', '1997-10-21'),
    (8, '1997-12-25', '1998-01-08'),
    (9, '1998-01-19', '1998-02-02'),
    (10, '1998-08-05', '1998-08-19');
WITH RawData AS (
    -- one row per day of every range
    SELECT
        DATEADD(d, n.n, d.start_dt) AS [date]
    FROM @dates d
    INNER JOIN (
        SELECT ROW_NUMBER() OVER (ORDER BY object_id) - 1 AS n FROM sys.objects
    ) n ON DATEADD(d, n.n, d.start_dt) <= d.end_dt
)
SELECT [date], COUNT(*) [count]
FROM RawData
GROUP BY [date]
ORDER BY [date]
I don't think this could take long even with 1000 date ranges. Perhaps you are using a table with more fields and even missing some index?
You could use a CTE
WITH CTE AS(SELECT start_dt AS dates FROM Table
UNION ALL
SELECT end_dt AS dates FROM Table)
SELECT CAST(dates as DATE) as Date, COUNT(dates) AS Count
FROM CTE c
GROUP BY c.dates
order by Count desc
Or perhaps you need something broader if your columns are of DATETIME data type. This way will GROUP BY the whole day:
WITH CTE AS(SELECT CAST(start_dt AS DATE) AS dates FROM Table
UNION ALL
SELECT CAST(end_dt AS DATE) AS dates FROM Table)
SELECT Dates as Date, COUNT(Dates) AS Count
FROM CTE c
GROUP BY c.dates
order by Count desc

Select count with 0 count

Let's say I have the following query:
SELECT top (5) CAST(Created AS DATE) as DateField,
Count(id) as Counted
FROM Table
GROUP BY CAST(Created AS DATE)
order by DateField desc
Let's say it returns the following data set:
DateField Counted
2016-01-18 34
2016-01-17 99
2016-01-14 1
2015-12-28 1
2015-12-27 6
But when Counted = 0 for a certain date, I would like that date to appear in the result set too. For example, it should look like the following:
DateField Counted
2016-01-18 34
2016-01-17 99
2016-01-16 0
2016-01-15 0
2016-01-14 1
Thank you!
Expanding upon KM's answer, you need a date table which is like a numbers table.
There are many examples on the web but here's a simple one.
CREATE TABLE DateList (
DateValue DATE,
CONSTRAINT PK_DateList PRIMARY KEY CLUSTERED (DateValue)
)
GO
-- Insert dates from 01/01/2015 to 12/31/2015
DECLARE @StartDate DATE = '01/01/2015'
DECLARE @EndDatePlus1 DATE = '01/01/2016'
DECLARE @CurrentDate DATE = @StartDate
WHILE @EndDatePlus1 > @CurrentDate
BEGIN
    INSERT INTO DateList VALUES (@CurrentDate)
    SET @CurrentDate = DATEADD(dd,1,@CurrentDate)
END
Now you have a table
then you can rewrite your query as follows:
SELECT top (5) DateValue, isnull(Count(id),0) as Counted
FROM DateList
LEFT OUTER JOIN Table
on DateValue = CAST(Created AS DATE)
GROUP BY DateValue
order by DateValue desc
Two notes:
You'll need a where clause to specify your range.
A join on a cast isn't ideal. The type in your date table should match the type in your regular table.
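Both notes can be illustrated together. The following is just a sketch with made-up table variables (@DateList and @Events stand in for the date table and the real table) and an arbitrary five-day range; it filters the date table with a WHERE clause and joins on a half-open datetime range so that no CAST is applied to the Created column.

```sql
-- Hypothetical stand-ins for the date table and the source table.
DECLARE @DateList TABLE (DateValue DATE PRIMARY KEY);
DECLARE @Events   TABLE (id INT, Created DATETIME);

INSERT INTO @DateList (DateValue)
VALUES ('2016-01-14'), ('2016-01-15'), ('2016-01-16'), ('2016-01-17'), ('2016-01-18');

INSERT INTO @Events (id, Created)
VALUES (1, '2016-01-14T09:30:00'),
       (2, '2016-01-17T11:00:00'),
       (3, '2016-01-17T23:59:00');

SELECT d.DateValue,
       COUNT(e.id) AS Counted                        -- 0 when no event matched
FROM @DateList AS d
LEFT JOIN @Events AS e
       ON e.Created >= d.DateValue                   -- from midnight of the day
      AND e.Created <  DATEADD(DAY, 1, d.DateValue)  -- up to, not including, the next day
WHERE d.DateValue BETWEEN '2016-01-14' AND '2016-01-18'
GROUP BY d.DateValue
ORDER BY d.DateValue DESC;
```

Keeping the range filter on the date table rather than on the joined table matters: a WHERE condition on the outer-joined side would discard the zero-count days again.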
One more solution as a single query:
;WITH dates AS
(
SELECT CAST(DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY [object_id]) - 1, '2016-01-14') as date) 'date'
FROM sys.all_objects
)
SELECT TOP 5
[date] AS 'DateField',
SUM(CASE WHEN Created IS NULL THEN 0 ELSE 1 END) AS 'Counted'
FROM dates
LEFT JOIN Table ON [date]=CAST(Created as date)
GROUP BY [date]
ORDER BY [date]
For a more edgy solution, you could use a recursive common table expression to create the date list. PLEASE NOTE: do not use recursive common table expressions in your day job! They are dangerous because it is easy to create one that never terminates.
DECLARE @StartDate date = '1/1/2016';
DECLARE @EndDate date = '1/15/2016';
WITH DateList(DateValue)
AS
(
    -- anchor: the first date in the range
    SELECT @StartDate
    UNION ALL
    -- recursive step: add one day until @EndDate is reached
    SELECT DATEADD(DAY, 1, DateValue)
    FROM DateList
    WHERE DateList.DateValue < @EndDate
)
SELECT DateValue, isnull(Count(id),0) as Counted
FROM DateList
LEFT OUTER JOIN [Table]
    ON DateValue = CAST(Created AS DATE)
GROUP BY DateValue
ORDER BY DateValue DESC
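On the termination worry: SQL Server stops a recursive CTE after 100 levels by default and raises an error, so a longer date list needs an explicit MAXRECURSION hint (0 to 32767, where 0 removes the limit). A small sketch, with the one-year range chosen purely for illustration:

```sql
DECLARE @StartDate DATE = '2016-01-01';
DECLARE @EndDate   DATE = '2016-12-31';

WITH DateList (DateValue) AS
(
    SELECT @StartDate                      -- anchor row
    UNION ALL
    SELECT DATEADD(DAY, 1, DateValue)      -- one more day per recursion level
    FROM DateList
    WHERE DateValue < @EndDate             -- termination condition
)
SELECT DateValue
FROM DateList
OPTION (MAXRECURSION 366);                 -- 365 levels are needed here; the default of 100 would fail
```

Setting the hint to a bound just above what the range needs keeps the safety net in place, whereas MAXRECURSION 0 would let a faulty termination condition run indefinitely.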

Rolling Prior13 months with Current Month Sales

Within a SQL Server 2012 database, I have a table with two columns customerid and date. I am interested in getting by year-month, a count of customers that have purchased in current month but not in prior 13 months. The table is extremely large so something efficient would be highly appreciated. Results table is shown after the input data. In essence, it is a count of customers that purchased in current month but not in prior 13 months (by year and month).
---input table-----
declare @Sales as Table ( customerid Int, date Date );
insert into @Sales ( customerid, date) values
( 1, '01/01/2012' ),
( 1, '04/01/2013' ),
( 1, '01/01/2014' ),
( 1, '01/01/2014' ),
( 1, '04/06/2014' ),
( 2, '04/01/2014' ),
( 3, '01/03/2012' ),
( 3, '01/03/2014' ),
( 4, '01/04/2012' ),
( 4, '04/04/2013' ),
( 5, '02/01/2010' ),
( 5, '02/01/2013' ),
( 5, '04/01/2014' )
select customerid, date
from @Sales;
---desired results ----
yearmth monthpurchasers monthpurchasernot13m
201002 1 1
201201 3 3
201302 1 1
201304 2 2
201401 2 1
201404 3 2
Thanks very much for looking at this!
Dev
You didn't provide the expected result, but I believe this is pretty close (at least logically):
;with g as (
select customerid, year(date)*100 + month(date) as mon
from @Sales
group by customerid, year(date)*100 + month(date)
),
x as (
select *,
count(*) over(partition by customerid order by mon
rows between 13 preceding and 1 preceding) as cnt
from g
),
y as (
select mon, count(*) as cnt from x
where cnt = 0
group by mon
)
select g.mon,
count(distinct(g.customerid)) as monthpurchasers,
isnull(y.cnt, 0) as cnt
from g
left join y on g.mon = y.mon
group by g.mon, y.cnt
order by g.mon
Tell me if this query helps. It extracts all the rows which meet your condition into a Table variable. Then, I use your query and join to this table.
declare @startDate datetime
declare @todayDate datetime
declare @tbl_Custs as Table(customerid int)
set @startDate = '04/01/2014' -- mm/dd/yyyy
set @todayDate = GETDATE()
insert into @tbl_Custs
-- purchased only this month
select customerid
from Sales
where ([date] >= @startDate and [date] <= @todayDate)
and customerid NOT in
(
    -- purchased in past 13 months
    select distinct customerid
    from Sales
    -- compare against @startDate; comparing [date] with itself is always true
    where ([date] >= DATEADD(MONTH,-13,@startDate)
    and [date] < @startDate)
)
-- your query goes here
select year(date) as year
,month(date) as month
,count(distinct(c.customerid)) as monthpurchasers
from @tbl_Custs as c right join
Sales as s
on c.customerid = s.customerid
group by year(date) , month(date)
order by year(date) , month(date)
The query below will produce what you are looking for. I am not sure how it will perform on a big table (how big is your table?), but it is pretty straightforward, so I think it will be OK. I simply calculate the date 13 months earlier in a CTE to find my sale window, then join to the Sales table within that window by customer id and keep the rows that have no match. You don't actually need two CTEs here; you could do the DATEADD(mm,-13,date) in the join of the second CTE, but I thought it might be clearer this way.
P.S. If you need to change the time frame from 13 months to something else, all you have to change is the DATEADD(mm,-13,date), which simply subtracts 13 months from the date value.
Hope this helps or at least leads to a better solution
;WITH PurchaseWindow AS (
    select customerid, date, DATEADD(mm,-13,date) minsaledate
    FROM @Sales
), JoinBySaleWindow AS (
    SELECT a.customerid, a.date, a.minsaledate, b.date earliersaledate
    FROM PurchaseWindow a
    LEFT JOIN @Sales b ON a.customerid = b.customerid
    --Find the sales for the customer within the last 13 months of original sale
    --(lower bound first, and strictly before the sale so a row never matches itself)
    AND b.date >= a.minsaledate AND b.date < a.date
)
SELECT DATEPART(yy,date) AS [year], DATEPART(mm, date) AS [month], COUNT(DISTINCT customerid) monthpurchases
FROM JoinBySaleWindow
--Exclude records where a sale within last 13 months occurred
WHERE earliersaledate IS NULL
GROUP BY DATEPART(mm, date), DATEPART(yy,date)
Sorry about the typos they are fixed now.
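For comparison, the same "no purchase in the prior 13 months" test can be written with NOT EXISTS, measured in whole calendar months before the month of the sale. This is only a sketch under that interpretation of the requirement; the three sample rows are a cut-down, invented subset, and whether it performs well on a very large table would depend on an index over (customerid, date), which is assumed here rather than known.

```sql
DECLARE @Sales TABLE (customerid INT, [date] DATE);

INSERT INTO @Sales (customerid, [date])
VALUES (1, '2013-04-01'),
       (1, '2014-01-01'),
       (2, '2014-04-01');

SELECT YEAR(s.[date]) * 100 + MONTH(s.[date]) AS yearmth,
       COUNT(DISTINCT s.customerid) AS monthpurchasernot13m
FROM @Sales AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM @Sales AS p
    WHERE p.customerid = s.customerid
      -- before the first day of the month of the current sale
      AND p.[date] <  DATEADD(MONTH, DATEDIFF(MONTH, 0, s.[date]), 0)
      -- on or after the first day of the month 13 months earlier
      AND p.[date] >= DATEADD(MONTH, DATEDIFF(MONTH, 0, s.[date]) - 13, 0)
)
GROUP BY YEAR(s.[date]) * 100 + MONTH(s.[date])
ORDER BY yearmth;
```

With these rows, customer 1's January 2014 purchase is excluded because of the April 2013 purchase nine months earlier, leaving 201304 and 201404 with one qualifying customer each.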
