SQL Active Users by day - sql-server

I've got a table with CustomerID, StartDate and EndDate.
I'm trying to create a table with the following columns: Date, ActiveUsers.
The Date needs to be all dates between 01/01/2016 and today. ActiveUsers is a count of CustomerID where the Date falls between the StartDate and EndDate.
I hope all that makes sense.
I found code that gives me a list of dates but I have no idea how I can join my customers table to this result.
DECLARE @StartDateTime DATE
DECLARE @EndDateTime DATE
SET @StartDateTime = '2016-01-01'
SET @EndDateTime = GETDATE();
WITH DateRange(DateData) AS
(
SELECT @StartDateTime as Date
UNION ALL
SELECT DATEADD(d,1,DateData)
FROM DateRange
WHERE DateData < @EndDateTime -- < rather than <=, so the last date generated is @EndDateTime itself
)
SELECT dr.DateData
FROM DateRange dr
OPTION (MAXRECURSION 0)
GO

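This is a simple left join, group by and count. Use it in place of the final SELECT of the query above, keeping the OPTION (MAXRECURSION 0) hint:
SELECT DateData, COUNT(CustomerID) as ActiveUsers
FROM DateRange AS D
LEFT JOIN Customers AS C
ON D.DateData >= C.StartDate
AND D.DateData <= C.EndDate
GROUP BY DateData
ORDER BY DateData
OPTION (MAXRECURSION 0)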
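A self-contained sketch of the whole thing, assuming a Customers table with the columns described in the question. A table variable (@Customers) with three made-up customers stands in for the real table, and a short fixed range replaces GETDATE() so the output is predictable:

```sql
-- Hypothetical stand-in for the Customers table (CustomerID, StartDate, EndDate).
DECLARE @Customers TABLE (CustomerID int, StartDate date, EndDate date);
INSERT INTO @Customers (CustomerID, StartDate, EndDate) VALUES
    (1, '2016-01-01', '2016-01-03'),
    (2, '2016-01-02', '2016-01-05'),
    (3, '2016-01-04', '2016-01-04');

-- A short fixed range keeps the example small; use GETDATE() for "today" in practice.
DECLARE @StartDateTime date = '2016-01-01';
DECLARE @EndDateTime   date = '2016-01-06';
DECLARE @Result TABLE (DateData date, ActiveUsers int);

WITH DateRange (DateData) AS
(
    SELECT @StartDateTime
    UNION ALL
    SELECT DATEADD(DAY, 1, DateData)
    FROM DateRange
    WHERE DateData < @EndDateTime          -- stop at the end date, not one day past it
)
INSERT INTO @Result (DateData, ActiveUsers)
SELECT D.DateData,
       COUNT(C.CustomerID)                 -- ignores the NULLs produced for days with no match
FROM DateRange AS D
LEFT JOIN @Customers AS C
    ON D.DateData >= C.StartDate
   AND D.DateData <= C.EndDate
GROUP BY D.DateData
OPTION (MAXRECURSION 0);

SELECT DateData, ActiveUsers FROM @Result ORDER BY DateData;
```

Days with no active customers still appear with ActiveUsers = 0, because COUNT(C.CustomerID) skips the NULLs from the LEFT JOIN; COUNT(*) would report 1 for those days instead.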
However, here's a free tip: a recursive CTE is fine for this kind of thing when the range is small, but if you find yourself needing OPTION (MAXRECURSION 0), the range is large enough that the recursion itself is likely to cost you performance, and you should replace it with a tally-table-based solution.
If you don't know what a tally table is, read Jeff Moden's The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
If you don't already have a tally table, read What is the best way to create and populate a numbers table?
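A minimal sketch of a tally-based replacement for the recursive CTE. The inline tally here is a 10-row VALUES list cross-joined with itself, which is one common pattern and not the exact code from the articles above. Four cross joins give 10,000 numbers (roughly 27 years of dates), so add another if you need more:

```sql
DECLARE @StartDate date = '2016-01-01';
DECLARE @EndDate   date = '2016-03-31';   -- use CAST(GETDATE() AS date) for "today"
DECLARE @Dates TABLE (DateData date PRIMARY KEY);

WITH Ten (n) AS
(
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(n)
),
Tally (n) AS
(
    -- 10 x 10 x 10 x 10 = 10,000 rows, numbered 1..10,000 without recursion
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int)
    FROM Ten AS a CROSS JOIN Ten AS b CROSS JOIN Ten AS c CROSS JOIN Ten AS d
)
INSERT INTO @Dates (DateData)
SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)   -- exactly one number per day in the range
       DATEADD(DAY, n - 1, @StartDate)
FROM Tally
ORDER BY n;

SELECT DateData FROM @Dates ORDER BY DateData;
```

The resulting dates can be LEFT JOINed to Customers exactly like the DateRange CTE, without any MAXRECURSION hint.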
Having said that, date-related queries often benefit from a pre-populated calendar table. Such a table can save you from calculating weekends, national holidays, etc., at a storage cost that is practically negligible on modern servers.
Read Aaron Bertrand's Creating a date dimension or calendar table in SQL Server for a step-by-step explanation on how to create one for yourself.
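To show the idea on a small scale, here is a sketch of a one-month calendar table with weekend and holiday flags. The column names are illustrative, not taken from the article. The weekend flag relies on 1900-01-01 having been a Monday, so DATEDIFF(DAY, '19000101', d) % 7 returns 5 for Saturday and 6 for Sunday regardless of the DATEFIRST setting:

```sql
-- Illustrative calendar table for January 2016 (column names are hypothetical).
DECLARE @Calendar TABLE
(
    CalendarDate date PRIMARY KEY,
    IsWeekend    bit  NOT NULL,
    IsHoliday    bit  NOT NULL DEFAULT (0)
);

WITH Ten (n) AS
(
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(n)
),
Tally (n) AS
(
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int)
    FROM Ten AS a CROSS JOIN Ten AS b          -- 100 numbers is plenty for one month
),
Days (d) AS
(
    SELECT DATEADD(DAY, n - 1, CAST('20160101' AS date))
    FROM Tally
    WHERE n <= 31
)
INSERT INTO @Calendar (CalendarDate, IsWeekend)
SELECT d,
       -- 1900-01-01 was a Monday, so 5 = Saturday and 6 = Sunday
       CASE WHEN DATEDIFF(DAY, '19000101', d) % 7 IN (5, 6) THEN 1 ELSE 0 END
FROM Days;

-- Holidays are maintained by hand (or loaded from an external list).
UPDATE @Calendar SET IsHoliday = 1 WHERE CalendarDate = '20160101';   -- New Year's Day

-- Working days in January 2016
SELECT COUNT(*) AS WorkingDays
FROM @Calendar
WHERE IsWeekend = 0 AND IsHoliday = 0;
```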

Related

SQL Server - Where clause uses maximum date in data

I'm struggling with something I thought would be easy.
I have a table that is updated via an append on most days and has a report date field that shows the date the rows were updated.
I want to join to this table but only pull back the records from the date the table was last updated.
Most of the time I could get away with just looking for yesterday's date, as the table is updated most days:
Where [reportdate] > DATEADD(DAY, -1, GETDATE())
But as it's not always updated daily, I wanted to rule this issue out. Is there any way of returning the max date?
I was trying to figure out MAX(date), but I can't figure out the grouping. I need to return all the fields. The query below just seems to return the whole table:
SELECT max ([ReportDate]) as reportdate
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM table
group by guid
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
I could get round it with a temp table that just holds the max report date, and then use that as the left side of a join:
SELECT max ([ReportDate]) as reportdate
FROM [DOMCustomers].[dbo].[DCC_Device_Comms_Compiled]
But the SQL is triggered from Excel, so temp tables are problematic (I think).
Is there any way of returning the max date?
Like this:
SELECT *
FROM SomeTable
where ReportDate = (select max(ReportDate) from SomeTable)
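A quick self-contained check of this approach, with a hypothetical table variable standing in for SomeTable and only two of its columns. Because it is a single SELECT with a subquery, no temp table is needed, which should help with the Excel concern in the question. (The INSERT into @Latest is only there so the result can be checked.)

```sql
-- Made-up rows standing in for SomeTable; two rows share the latest ReportDate.
DECLARE @SomeTable TABLE (ReportDate date, Make varchar(20));
INSERT INTO @SomeTable (ReportDate, Make) VALUES
    ('2020-12-29', 'Tesla'),
    ('2020-12-30', 'Ford'),
    ('2020-12-31', 'Ford'),
    ('2020-12-31', 'Tesla');

DECLARE @Latest TABLE (ReportDate date, Make varchar(20));

-- The subquery finds the latest date; the outer query returns every row stamped with it.
INSERT INTO @Latest (ReportDate, Make)
SELECT ReportDate, Make
FROM @SomeTable
WHERE ReportDate = (SELECT MAX(ReportDate) FROM @SomeTable);

SELECT ReportDate, Make FROM @Latest;
```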
Here is a conceptual example.
It will produce a latest row for each car make.
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, make VARCHAR(20), ReportDate DATETIME);
INSERT INTO @tbl (make, ReportDate) VALUES
('Ford', '2020-12-31'),
('Ford', '2020-10-17'),
('Tesla', '2020-10-25'),
('Tesla', '2020-12-30');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY make ORDER BY ReportDate DESC) AS seq
FROM @tbl
)
SELECT * FROM rs
WHERE seq = 1;
It seems like DENSE_RANK with TOP (1) WITH TIES would work (assuming ReportDate is a date):
SELECT TOP (1) WITH TIES
[ReportDate]
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM YourTable
ORDER BY DENSE_RANK() OVER (ORDER BY ReportDate DESC);
If ReportDate is a date and time value, and you want everything for the latest date (ignoring time), then replace ReportDate with CONVERT(date,ReportDate) in the ORDER BY.
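A sketch of the date-and-time variant, using a hypothetical table variable in place of YourTable. Both rows from 2020-12-31 come back even though their times differ; ordering by the raw ReportDate instead would return only the 17:40 row:

```sql
-- Hypothetical stand-in for YourTable, with ReportDate holding date and time.
DECLARE @YourTable TABLE (ReportDate datetime, Make varchar(20));
INSERT INTO @YourTable (ReportDate, Make) VALUES
    ('2020-12-30T23:59:00', 'Ford'),
    ('2020-12-31T08:15:00', 'Ford'),
    ('2020-12-31T17:40:00', 'Tesla');

DECLARE @Latest TABLE (ReportDate datetime, Make varchar(20));

-- Rank by calendar day only, so every row from the latest day ties for rank 1.
INSERT INTO @Latest (ReportDate, Make)
SELECT TOP (1) WITH TIES ReportDate, Make
FROM @YourTable
ORDER BY DENSE_RANK() OVER (ORDER BY CONVERT(date, ReportDate) DESC);

SELECT ReportDate, Make FROM @Latest;
```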

Repeat Customers with multiple purchases on the same day count as 1

I am trying to wrap my head around this problem. I was asked to create a report that shows repeat customers in our database.
One of the requirements is if a customer has more than 1 order on a specific date, it would only count as 1.
Then if they have more than 1 purchase date, they would then count as a repeat customer.
Searching on here, I found this, which works for finding the customers with more than 1 purchase on a specific purchase date:
SELECT DISTINCT s.[CustomerName], cast(s.PurchaseDate as date) AS PurchaseDate
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1;
This MSSQL code works like it should, by showing customers who had more than 1 purchase on the same date.
My problem is what the best approach would be to join this into another query (this is where I need help) that then shows a complete repeat-customer list, i.e. one that returns customers with more than 1 purchase.
I am using MSSQL. Any help would be greatly appreciated.
You're close. You need to move DISTINCT into your HAVING clause, because you want to include only customers that have more than 1 distinct purchase date.
Also, group only by the customer, because the different dates have to be part of the same group for COUNT(DISTINCT ...) to work.
SELECT s.[CustomerName], COUNT(distinct cast(s.PurchaseDate as date))
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName]
HAVING COUNT(distinct cast(s.PurchaseDate as date)) > 1;
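To see the rule in action, here is the same query run against a hypothetical table variable with PurchaseDate stored as datetime (in the question it may be a varchar, as discussed below). Cust1's two purchases fall on the same day and count once, so only Cust2 qualifies:

```sql
-- Hypothetical stand-in for Reports.vw_Repeat, with PurchaseDate as datetime.
DECLARE @vw_Repeat TABLE (CustomerName varchar(25), PurchaseDate datetime);
INSERT INTO @vw_Repeat (CustomerName, PurchaseDate) VALUES
    ('Cust1', '2017-11-16T09:00:00'),
    ('Cust1', '2017-11-16T15:30:00'),  -- same day as the row above: counts once
    ('Cust2', '2017-11-16T10:00:00'),
    ('Cust2', '2017-11-20T11:00:00');  -- a second purchase day: a repeat customer

DECLARE @Repeaters TABLE (CustomerName varchar(25), PurchaseDays int);

INSERT INTO @Repeaters (CustomerName, PurchaseDays)
SELECT CustomerName, COUNT(DISTINCT CAST(PurchaseDate AS date))
FROM @vw_Repeat
GROUP BY CustomerName
HAVING COUNT(DISTINCT CAST(PurchaseDate AS date)) > 1;

SELECT CustomerName, PurchaseDays FROM @Repeaters;
```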
If you want to pass a parameter to a query and join the result, that's what table-valued functions are for. When you join it, you use CROSS APPLY or OUTER APPLY instead of an INNER JOIN or a LEFT JOIN.
Also, I think this goes without saying, but when you check if PurchaseDate is empty:
WHERE s.PurchaseDate <> ''
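There could be issues there: it implies it's a varchar field instead of a datetime (yes?), and it only excludes NULLs implicitly (NULL <> '' evaluates to UNKNOWN, so those rows are filtered out without the query saying so). You might, at least, want to replace that with ISNULL(s.PurchaseDate, '') <> '' to make the intent explicit. If it's actually a datetime, use IS NOT NULL instead of <> ''.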
(Edited to add sample data and DDL statements. I recommend adding these to SQL posts to assist answerers. Also, I made purchasedate a varchar instead of a datetime because of the string comparison in the query.)
https://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
CREATE TABLE company (company_name VARCHAR(25))
INSERT INTO company VALUES ('Company1'), ('Company2')
CREATE TABLE vw_repeat (customername VARCHAR(25), purchasedate VARCHAR(25), company VARCHAR(25))
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust2', '11/16/2017', 'Company2')
GO
CREATE FUNCTION [dbo].tf_customers
(
@company varchar(25)
)
RETURNS TABLE AS RETURN
(
SELECT s.[CustomerName], cast(s.PurchaseDate as date) PurchaseDate
FROM vw_Repeat s
WHERE s.PurchaseDate <> '' AND s.Company = @company
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1
)
GO
SELECT *
FROM company c
CROSS APPLY tf_customers(c.company_name)
First, thanks to everyone for the help.
@MaxSzczurek suggested I use table-valued functions. After looking into this more, I ended up just using a temporary table first to get the DISTINCT purchase dates for each customer. I then loaded that into another temp table, RIGHT JOINed to the main table. This gave me the result I was looking for. It's a little (OK, a lot) ugly, but it works.

List all dates within date range in SQL but ignore bank holidays

I'm making a holiday manager.
I have a table with a list of start and end dates for each instance of holiday.
[LeaveID], [EmployeeID], [StartDate], [EndDate]
I also do have a calendar table with dates from 2016-2030, listing the usual variations of date format as well as times the factory is shut, including bank holidays, etc.
I'm working on the front end now, and they want me to display it in a sort of calendar format, so I will need to mark, on each day, who has booked time off.
I figure I need to list each date within each date range (start date to end date), then check if each date on the calendar appears on that list.
So basically I need to get a list of dates within a date range.
On top of that, I'd like to be able to compare the list of dates from above to the calendar table, so I can ignore bank holidays when calculating the amount of holiday used for each instance.
Thanks in advance!
To get a list of dates within a date range, you will need a source of numbers from 1 to n. I usually create such a table and call it Numbers.
To generate a list of dates within a range, use the following query:
SELECT
DATEADD(DAY, Numbers.Number-1, [StartDate]) Date
FROM
Numbers
WHERE
DATEADD(DAY, Numbers.Number-1, [StartDate]) <= [EndDate]
To create such table, refer to this question.
If you want to list all dates for every row in the Employee table, just cross join it:
SELECT
e.EmployeeID,
DATEADD(DAY, n.Number-1, e.[StartDate]) Date
FROM
Numbers n, Employee e
WHERE
DATEADD(DAY, n.Number-1, e.[StartDate]) <= e.[EndDate]
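A runnable sketch of the same idea, with hypothetical table variables: @Numbers is filled with 1 to 100 on the fly, and @Leave uses the column names from the question. This version puts the day count in the join predicate (n.Number <= DATEDIFF(...) + 1) instead of comparing DATEADD(...) to EndDate. For date columns both forms return the same rows, but this one leaves the Number column bare, which can let an index on it be used:

```sql
-- Hypothetical Numbers table, filled with 1..100 for the example.
DECLARE @Numbers TABLE (Number int PRIMARY KEY);

WITH Ten (n) AS
(
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(n)
)
INSERT INTO @Numbers (Number)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM Ten AS a CROSS JOIN Ten AS b;

-- Leave bookings, using the column names from the question.
DECLARE @Leave TABLE (LeaveID int, EmployeeID int, StartDate date, EndDate date);
INSERT INTO @Leave (LeaveID, EmployeeID, StartDate, EndDate) VALUES
    (1, 1, '2016-11-01', '2016-11-05'),
    (2, 2, '2016-11-07', '2016-11-09');

DECLARE @LeaveDays TABLE (EmployeeID int, LeaveDate date);

-- One row per employee per day of leave: number n maps to StartDate + (n - 1) days,
-- and only as many numbers as the booking has days are joined.
INSERT INTO @LeaveDays (EmployeeID, LeaveDate)
SELECT l.EmployeeID, DATEADD(DAY, n.Number - 1, l.StartDate)
FROM @Leave AS l
JOIN @Numbers AS n
    ON n.Number <= DATEDIFF(DAY, l.StartDate, l.EndDate) + 1;

SELECT EmployeeID, LeaveDate FROM @LeaveDays ORDER BY EmployeeID, LeaveDate;
```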
As you already have a dates table, you do not need the numbers table mentioned in the other answer. What you are after only requires a simple join from your dates table. Depending on how you want to format your final report, you can either count up the number of EmployeeIDs returned or group them all into a calendar/table control in your front end on the DateValue.
In the query below, you will get at least one DateValue for every date in the range (to which you can apply your own filtering, such as where Dates.BankHoliday = 0, etc.) and more than one where multiple employees have taken leave:
-- Build some dummy data to run the query against.
declare @Emp table (LeaveID int, EmployeeID int , StartDate datetime, EndDate datetime);
insert into @Emp values
(1,1,'20161101','20161105')
,(2,1,'20161121','20161124')
,(3,2,'20161107','20161109')
,(4,3,'20161118','20161122');
declare @Dates table (DateKey int, DateValue datetime, DateLabel nvarchar(50));
declare @s datetime = '20161025';
with cte as
(
select cast(convert(nvarchar(8),@s,112) as int) as DateKey
,@s as DateValue
,convert(nvarchar(50),@s,103) as DateLabel
union all
select cast(convert(nvarchar(8),DateValue+1,112) as int)
,DateValue+1
,convert(nvarchar(50),DateValue+1,103)
from cte
where DateValue+1 <= '20161205'
)
insert into @Dates
select * from cte;
-- Actually query the data.
-- Define the start and end of your date range to return.
declare @MinStart datetime = (select min(StartDate) from @Emp);
declare @MaxEnd datetime = (select max(EndDate) from @Emp);
select d.DateValue
,e.EmployeeID
from @Dates d
left join @Emp e
on(d.DateValue between e.StartDate and e.EndDate)
where d.DateValue between @MinStart and @MaxEnd
order by d.DateValue
,e.EmployeeID;
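For the second part of the question (not counting bank holidays in the amount of leave used), a sketch along the same lines. It assumes the calendar table carries flags such as IsWeekend and BankHoliday (hypothetical names); the flags below are filled in by hand for a trimmed December 2016 calendar, treating 26 and 27 December as bank holidays:

```sql
DECLARE @Leave TABLE (LeaveID int, EmployeeID int, StartDate date, EndDate date);
INSERT INTO @Leave (LeaveID, EmployeeID, StartDate, EndDate) VALUES
    (1, 1, '2016-12-22', '2016-12-28'),   -- spans a weekend and two bank holidays
    (2, 2, '2016-12-05', '2016-12-06');   -- two ordinary weekdays

-- Trimmed calendar holding only the dates this example needs.
DECLARE @Calendar TABLE (DateValue date PRIMARY KEY, IsWeekend bit NOT NULL, BankHoliday bit NOT NULL);
INSERT INTO @Calendar (DateValue, IsWeekend, BankHoliday) VALUES
    ('2016-12-05', 0, 0), ('2016-12-06', 0, 0),
    ('2016-12-22', 0, 0), ('2016-12-23', 0, 0),
    ('2016-12-24', 1, 0), ('2016-12-25', 1, 0),
    ('2016-12-26', 0, 1), ('2016-12-27', 0, 1),
    ('2016-12-28', 0, 0);

DECLARE @Used TABLE (LeaveID int, DaysUsed int);

-- Join each booking to the calendar days it covers, keeping only working days.
INSERT INTO @Used (LeaveID, DaysUsed)
SELECT l.LeaveID, COUNT(c.DateValue)
FROM @Leave AS l
LEFT JOIN @Calendar AS c
    ON c.DateValue BETWEEN l.StartDate AND l.EndDate
   AND c.IsWeekend = 0
   AND c.BankHoliday = 0
GROUP BY l.LeaveID;

SELECT LeaveID, DaysUsed FROM @Used ORDER BY LeaveID;
```

The weekend and holiday conditions sit in the ON clause rather than the WHERE clause, so a booking made up entirely of non-working days still appears, with DaysUsed = 0.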

Which Transact-SQL query is most efficient?

I plan to pass the exam "Querying Microsoft SQL Server 2012".
There is one question I am having trouble understanding.
The question is:
Which Transact-SQL query should you use?
Your database contains a table named Purchases. The table includes a DATETIME column named PurchaseTime that stores the date and time each purchase is made. There is a non-clustered index on the PurchaseTime column. The business team wants a report that displays the total number of purchases made on the current day. You need to write a query that will return the correct results in the most efficient manner.
Which Transact-SQL query should you use?
Possible answers are:
A.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = CONVERT(DATE, GETDATE())
B.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = GETDATE()
C.
SELECT COUNT(*)
FROM Purchases
WHERE CONVERT(VARCHAR, PurchaseTime, 112) = CONVERT(VARCHAR, GETDATE(), 112)
D.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime >= CONVERT(DATE, GETDATE())
AND PurchaseTime < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()))
This is the source: Which Transact-SQL query should you use?
According to them, the correct answer is 'D'.
But I don't see why this is more efficient than 'A'.
In 'D' we call two functions (CONVERT and DATEADD).
Thanks for help.
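D will be the most efficient because it does not convert the PurchaseTime column to any other data type, which means SQL Server can seek on the non-clustered index defined on that column. The CONVERT and DATEADD calls in D are applied to GETDATE(), not to the column, so they are evaluated once for the query rather than once per row.
A predicate that can use an index seek like this is known as a sargable expression (from "Search ARGument ABLE").
C cannot seek on the PurchaseTime index: every row has to be read and converted, so it results in a full scan (of an index, or of the table itself if it is a heap, i.e. a table without a clustered index).
And queries A and B will simply not return the correct results: A only matches purchases made at exactly midnight at the start of the current day, and B only matches purchases made at exactly the instant the query runs.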
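The difference in results is easy to check. A sketch with a table variable standing in for Purchases and a fixed @Now in place of GETDATE(), so the counts are repeatable (both are assumptions of this example). There is no index on PurchaseTime here, so it only checks correctness, not performance:

```sql
-- Stand-in for the Purchases table from the question.
DECLARE @Purchases TABLE (PurchaseID int IDENTITY PRIMARY KEY, PurchaseTime datetime NOT NULL);
INSERT INTO @Purchases (PurchaseTime) VALUES
    ('2024-03-14T23:59:59'),   -- yesterday
    ('2024-03-15T00:00:00'),   -- today, exactly midnight
    ('2024-03-15T09:30:00'),   -- today
    ('2024-03-15T18:45:00'),   -- today
    ('2024-03-16T00:00:00');   -- tomorrow

-- Fixed "current time" so the counts are repeatable; the exam question uses GETDATE().
DECLARE @Now datetime = '2024-03-15T12:00:00';

DECLARE @A int = (SELECT COUNT(*) FROM @Purchases
                  WHERE PurchaseTime = CONVERT(DATE, @Now));          -- only the midnight row
DECLARE @B int = (SELECT COUNT(*) FROM @Purchases
                  WHERE PurchaseTime = @Now);                         -- only rows at exactly 12:00
DECLARE @C int = (SELECT COUNT(*) FROM @Purchases
                  WHERE CONVERT(VARCHAR, PurchaseTime, 112) = CONVERT(VARCHAR, @Now, 112));  -- right count, converts every row
DECLARE @D int = (SELECT COUNT(*) FROM @Purchases
                  WHERE PurchaseTime >= CONVERT(DATE, @Now)
                    AND PurchaseTime <  DATEADD(DAY, 1, CONVERT(DATE, @Now)));       -- right count, sargable

SELECT @A AS A, @B AS B, @C AS C, @D AS D;
```

C gets the right count too; its problem is that it has to convert the column on every row, which is what rules out an index seek.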

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (omitted above for the sake of space). That is the source of the slowness of the query, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over the generation of the query, so if I could find more efficient SQL that returned the same results, I could modify the query generator appropriately. Any pointers would be greatly appreciated.
I haven't tested it, but I think it can be done with:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
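A small check of this suggestion, with a hypothetical table variable in place of table1. This sketch groups by CAST(dateCol AS date) (SQL Server 2008 or later) instead of the string [dateExpr], which groups by calendar day without building strings. Days on which no row has column2 = 'A' still appear with series1 = 0, as they did with the correlated subquery:

```sql
-- Hypothetical stand-in for table1.
DECLARE @table1 TABLE (dateCol datetime, column2 char(1));
INSERT INTO @table1 (dateCol, column2) VALUES
    ('2010-05-01T08:00:00', 'A'),
    ('2010-05-01T17:00:00', 'B'),
    ('2010-05-01T20:00:00', 'A'),
    ('2010-05-02T09:00:00', 'B');

DECLARE @Series TABLE (domainValue date, series1 int);

-- One pass over the table: group by calendar day and count the 'A' rows
-- with a conditional SUM instead of a correlated subquery per group.
INSERT INTO @Series (domainValue, series1)
SELECT CAST(dateCol AS date),
       SUM(CASE WHEN column2 = 'A' THEN 1 ELSE 0 END)
FROM @table1
GROUP BY CAST(dateCol AS date);

SELECT domainValue, series1 FROM @Series ORDER BY domainValue;
```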
The fastest way to do this would be to use calendar tables. Create a SQL table with an entry for every month, for as many years ahead as you need. Then select from that calendar table, joining in the entries from table1 whose dates fall between the start and end date of the month. If your clustered index is on dateCol in table1, the query should run very quickly.
EDIT: Example query. This assumes a months table exists with two columns, StartDate and EndDate, where EndDate is midnight on the first day of the next month. The clustered index on the months table should be on StartDate.
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
GROUP BY months.StartDate;
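With Calendar As
(
-- The last day is computed once in the anchor and carried along,
-- so the recursive part needs no aggregate and stops exactly on that day.
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
, DateAdd(d, DateDiff(d, 0, Max( dateCol ) ), 0) As [lastDate]
From Table1
Union All
Select DateAdd(d, 1, [date]), [lastDate]
From Calendar
Where [date] < [lastDate]
)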
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.
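For reference, the time-stripping expression used above next to the CAST-to-date alternative available from SQL Server 2008 (a sketch; the value is made up):

```sql
DECLARE @dt datetime = '2010-05-01T17:42:10';

-- Whole days between day 0 (1900-01-01) and @dt, added back to day 0:
-- midnight of the same day, still typed as datetime.
DECLARE @Stripped datetime = DATEADD(d, DATEDIFF(d, 0, @dt), 0);

-- The same calendar day as a date value, with no time part at all.
DECLARE @AsDate date = CAST(@dt AS date);

SELECT @Stripped AS Stripped, @AsDate AS AsDate;
```

The DATEADD/DATEDIFF form keeps the datetime type, which is handy when comparing against other datetime columns; the date type is simpler when only the day matters.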
