Repeat customers: multiple purchases on the same day count as 1 - sql-server

I am trying to wrap my head around this problem. I was asked to create a report that shows repeat customers in our database.
One of the requirements is that if a customer has more than 1 order on a specific date, it only counts as 1.
If they have more than 1 purchase date, they then count as a repeat customer.
Searching on here, I found this, which works for finding the customers with more than 1 purchase on a specific purchase date.
SELECT DISTINCT s.[CustomerName], s.PurchaseDate
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1;
This MSSQL code works like it should, by showing customers who had more than 1 purchase on the same date.
My problem is: what would be the best approach to join this into another query (this is where I need help) that then shows a complete repeat customer list, where customers with more than 1 purchase are returned?
I am using MSSQL. Any help would be greatly appreciated.

You're close. You need to move DISTINCT into the COUNT in your HAVING clause, because you want to include only customers that have more than 1 distinct purchase date.
Also, group only by the customer, because the different dates have to be part of the same group for COUNT(DISTINCT ...) to work.
SELECT s.[CustomerName], COUNT(distinct cast(s.PurchaseDate as date))
FROM Reports.vw_Repeat s WHERE s.PurchaseDate <> ''
GROUP BY s.[CustomerName]
HAVING COUNT(distinct cast(s.PurchaseDate as date)) > 1;
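To check the logic above in isolation, here is a minimal sketch against a table variable. The @Purchases table and its rows are made up for illustration and stand in for Reports.vw_Repeat; the column is assumed to be a real datetime.

```sql
-- Hypothetical sample data; @Purchases stands in for Reports.vw_Repeat.
DECLARE @Purchases TABLE (CustomerName varchar(25), PurchaseDate datetime);

INSERT INTO @Purchases (CustomerName, PurchaseDate) VALUES
    ('Cust1', '2017-11-16T09:00:00'),  -- two orders on the same day
    ('Cust1', '2017-11-16T15:30:00'),
    ('Cust2', '2017-11-16T10:00:00'),  -- orders on two different days
    ('Cust2', '2017-11-20T11:00:00'),
    ('Cust3', '2017-11-18T08:00:00');  -- a single order

SELECT p.CustomerName,
       COUNT(DISTINCT CAST(p.PurchaseDate AS date)) AS PurchaseDays
FROM @Purchases AS p
WHERE p.PurchaseDate IS NOT NULL
GROUP BY p.CustomerName
HAVING COUNT(DISTINCT CAST(p.PurchaseDate AS date)) > 1;
-- Only Cust2 is returned: Cust1's two orders fall on the same day.
```

Casting to date before counting is what collapses several orders on one day into a single purchase day.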

If you want to pass a parameter to a query and join the result, that's what table-valued functions are for. When you join it, you use CROSS APPLY or OUTER APPLY instead of an INNER JOIN or a LEFT JOIN.
Also, I think this goes without saying, but when you check whether PurchaseDate is empty:
WHERE s.PurchaseDate <> ''
there could be issues. It implies the column is a varchar rather than a datetime (yes?), and it excludes NULL values only implicitly, because NULL <> '' does not evaluate to true. To make that explicit you might, at least, replace it with ISNULL(s.PurchaseDate, '') <> ''. If it's actually a datetime, use IS NOT NULL instead of <> ''.
(Edited to add sample data and DDL statements. I recommend adding these to SQL posts to assist answerers. Also, I made purchasedate a varchar instead of a datetime because of the string comparison in the query.)
https://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
CREATE TABLE company (company_name VARCHAR(25))
INSERT INTO company VALUES ('Company1'), ('Company2')
CREATE TABLE vw_repeat (customername VARCHAR(25), purchasedate VARCHAR(25), company VARCHAR(25))
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust1', '11/16/2017', 'Company1')
INSERT INTO vw_repeat VALUES ('Cust2', '11/16/2017', 'Company2')
GO
-- CREATE FUNCTION must be the first statement in its batch.
CREATE FUNCTION [dbo].tf_customers
(
@company varchar(25)
)
RETURNS TABLE AS RETURN
(
SELECT s.[CustomerName], cast(s.PurchaseDate as date) PurchaseDate
FROM vw_Repeat s
WHERE s.PurchaseDate <> '' AND s.Company = @company
GROUP BY s.[CustomerName] , cast(s.PurchaseDate as date)
HAVING COUNT(*) > 1
)
GO
SELECT *
FROM company c
CROSS APPLY tf_customers(c.company_name)

First thanks to everyone for the help.
@MaxSzczurek suggested I use table-valued functions. After looking into this more, I ended up using a temporary table first to get the DISTINCT purchase dates for each customer. I then loaded that into another temp table RIGHT JOINed to the main table. This gave me the result I was looking for. It's a little (lot) ugly, but it works.


Is there a way to tell SQL Server to check the table for a duplicate before inserting each new row?

I tried using the SQL below to insert values from one table, importTable, into another table, POInvoicing. It appears that the way this query below works is it checks the POInvoicing table for any possible duplicates from the importTable and for those entries that are not duplicates, it inserts them into the table. The end result is SQL inserting duplicates that already exist in importTable. Is there a way to tell SQL Server to check the table for a possible duplicate entry, if not, add the next row. Then check the table for a duplicate entry, if not, add the next row. I know this will be slower but speed isn't an issue.
INSERT INTO POInvoicing
(VendorID, InvoiceNo)
SELECT dbo.importTable.VendorID,
dbo.importTable.InvoiceNo
FROM dbo.importTable
WHERE NOT EXISTS (SELECT VendorID,
InvoiceNo
FROM POInvoicing
WHERE POInvoicing.VendorID = dbo.importTable.VendorID AND
POInvoicing.InvoiceNo = dbo.importTable.InvoiceNo)
This isn't exactly the functionality I was hoping for. What I want is for the query to insert a row into the table and then check for "duplicates" before inserting the next row. What constitutes a duplicate in the importTable would be the combination of VendorID and InvoiceNo. There are about a dozen different columns in importTable and technically each row is distinct, so DISTINCT won't work here.
I can't simply remove duplicates from the importTable for a couple of reasons not relevant to the question above (though I can provide it if necessary), so that method is out.
If you really don't care (or refuse to tell us) how you want to decide between two rows with the same VendorID and InvoiceNo values, you can pick an arbitrary row like this:
;WITH NewRows AS
(
SELECT VendorID, InvoiceNo, InvoiceDate, /* ... other columns ... */
rn = ROW_NUMBER() OVER (PARTITION BY VendorID, InvoiceNo ORDER BY (SELECT NULL))
FROM dbo.importTable AS i
WHERE NOT EXISTS (SELECT 1 FROM dbo.POInvoicing AS p
WHERE p.VendorID = i.VendorID
AND p.InvoiceNo = i.InvoiceNo)
)
INSERT dbo.POInvoicing(VendorID, InvoiceNo, InvoiceDate /* , ... other columns ... */)
SELECT VendorID, InvoiceNo, InvoiceDate /* , ... other columns */
FROM NewRows
WHERE rn = 1;
If you later decide there is a specific row you want in the case of duplicates, you can swap out (SELECT NULL) for something else. For example, to take the row with the latest invoice date:
OVER (PARTITION BY VendorID, InvoiceNo ORDER BY InvoiceDate DESC)
Again, I wasn't asking questions here to be annoying, it was to help you get the solution you need. If you want SQL Server to pick between two duplicates, you can either tell it how to pick, or you'll have to accept arbitrary / non-deterministic results. You should not jump the fence for looping / cursors just because the first thing you tried didn't work the way you wanted it to.
Also please always specify the schema and use sensible table aliases.
Add a primary key or unique constraint to your table to prevent duplicate rows from being inserted.
Also use the DISTINCT keyword in your SELECT query to avoid this.
Duplicate rows can also be eliminated by using GROUP BY or the ROW_NUMBER() function.
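As a rough sketch of the constraint suggestion, something like the following could be used. The constraint name is made up, and it assumes dbo.POInvoicing does not already contain duplicate (VendorID, InvoiceNo) pairs.

```sql
-- Hypothetical constraint name; fails if duplicate pairs already exist in the table.
ALTER TABLE dbo.POInvoicing
    ADD CONSTRAINT UQ_POInvoicing_Vendor_Invoice UNIQUE (VendorID, InvoiceNo);
```

Note that with this constraint in place, an INSERT ... SELECT whose source still contains a duplicate pair fails as a whole, so the source rows still have to be reduced to one per pair first.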
Using the DISTINCT keyword:
INSERT INTO POInvoicing
(VendorID, InvoiceNo, InvoiceDate)
SELECT DISTINCT dbo.importTable.VendorID,
dbo.importTable.InvoiceNo,
dbo.importTable.InvoiceDate
FROM dbo.importTable
WHERE NOT EXISTS (SELECT VendorID,
InvoiceNo
FROM POInvoicing
WHERE POInvoicing.VendorID = dbo.importTable.VendorID
AND
POInvoicing.InvoiceNo = dbo.importTable.InvoiceNo)
Try this with a LEFT JOIN
INSERT INTO dbo.POInvoicing
(VendorID, InvoiceNo, InvoiceDate)
SELECT IM.VendorID,
IM.InvoiceNo,
IM.InvoiceDate
FROM dbo.importTable AS IM
-- Anti-join: keep only import rows that have no match in POInvoicing.
-- (An INNER JOIN on <> would pair each import row with every non-matching row.)
LEFT JOIN dbo.POInvoicing AS S ON S.VendorID = IM.VendorID
AND S.InvoiceNo = IM.InvoiceNo
WHERE S.VendorID IS NULL

SQL Active Users by day

I've got a table with CustomerID, StartDate and EndDate.
I'm trying to create a table with the following columns: Date, ActiveUsers.
The Date needs to be all dates between 01/01/2016 and today. ActiveUsers is a count of CustomerID where the Date falls between the StartDate and EndDate.
I hope all that makes sense.
I found code that gives me a list of dates but I have no idea how I can join my customers table to this result.
DECLARE @StartDateTime DATE
DECLARE @EndDateTime DATE
SET @StartDateTime = '2016-01-01'
SET @EndDateTime = GETDATE();
WITH DateRange(DateData) AS
(
SELECT @StartDateTime as Date
UNION ALL
SELECT DATEADD(d,1,DateData)
FROM DateRange
WHERE DateData < @EndDateTime -- < rather than <=, so the last row is @EndDateTime itself
)
SELECT dr.DateData
FROM DateRange dr
OPTION (MAXRECURSION 0)
GO
This is a simple left join, group by and count:
SELECT DateData, COUNT(CustomerID) as ActiveUsers
FROM DateRange AS D
LEFT JOIN Customers AS C
ON D.DateData >= C.StartDate
AND D.DateData <= C.EndDate
GROUP BY DateData
However, here's a free tip: Using a recursive cte for things like that is fine when the range is small, but if you find yourself having to use OPTION (MAXRECURSION 0) it means you are in danger of a performance hit because of the recursive cte and should replace it with a tally table based solution.
If you don't know what a tally table is, read Jeff Moden's The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
If you don't already have a tally table, read What is the best way to create and populate a numbers table?
Having said that, date-related queries often benefit from a pre-populated calendar table. Such a table can save you from calculating weekends, national holidays, etc., at a storage price that's practically negligible on modern servers.
Read Aaron Bertrand's Creating a date dimension or calendar table in SQL Server for a step-by-step explanation on how to create one for yourself.
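For illustration, here is one way the date list could be built from an inline tally instead of a recursive CTE. This is only a sketch: the @Customers rows are made up, and the stacked-CTE tally is capped at 10,000 days (about 27 years), which is an arbitrary choice.

```sql
DECLARE @StartDate date = '2016-01-01';
DECLARE @EndDate   date = GETDATE();

-- Hypothetical sample rows standing in for the Customers table.
DECLARE @Customers TABLE (CustomerID int, StartDate date, EndDate date);
INSERT INTO @Customers VALUES
    (1, '2016-01-01', '2016-01-03'),
    (2, '2016-01-02', '2016-01-05');

WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), -- 10 rows
     E4(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d),      -- 10,000 rows
     Tally(n) AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) FROM E4),
     DateRange(DateData) AS
     (
         -- One row per day from @StartDate through @EndDate, no recursion needed.
         SELECT DATEADD(DAY, n, @StartDate)
         FROM Tally
         WHERE n <= DATEDIFF(DAY, @StartDate, @EndDate)
     )
SELECT d.DateData, COUNT(c.CustomerID) AS ActiveUsers
FROM DateRange AS d
LEFT JOIN @Customers AS c
       ON d.DateData >= c.StartDate
      AND d.DateData <= c.EndDate
GROUP BY d.DateData
ORDER BY d.DateData;
```

COUNT(c.CustomerID) rather than COUNT(*) keeps days with no active customer at 0, because the LEFT JOIN leaves CustomerID NULL on those rows.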

T-SQL: GROUP BY, but while keeping a non-grouped column (or re-joining it)?

I'm on SQL Server 2008, and having trouble querying an audit table the way I want to.
The table shows every time a new ID comes in, as well as every time an ID's Type changes:
Record #  ID     Type  Date
1         ae08k  M     2017-01-02:12:03
2         liei0  A     2017-01-02:12:04
3         ae08k  C     2017-01-02:13:05
4         we808  A     2017-01-03:20:05
I'd kinda like to produce a snapshot of the status for each ID, at a certain date. My thought was something like this:
SELECT
ID
,max(date) AS Max
FROM
Table
WHERE
Date < 'whatever-my-cutoff-date-is-here'
GROUP BY
ID
But that loses the Type column. If I add the Type column to my GROUP BY, then I naturally get duplicate rows per ID, for all the types it had before the date.
So I was thinking of running a second version of the table (via a common table expression), and left joining that in to get the Type.
On my query above, all I have to join to are the ID & Date. Somehow if the dates are too close together, I end up with duplicate results (like say above, ae08k would show up once for each Type). That or I'm just super confused.
Basically all I ever do in SQL are left joins, group bys, and common table expressions (to then left join). What am I missing that I'd need in this situation...?
Use row_number()
select *
from ( select *
, row_number() over (partition by id order by date desc) as rn
from table
WHERE Date < 'whatever-my-cutoff-date-is-here'
) tt
where tt.rn = 1
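Here is the same idea run against the sample rows from the question, as a minimal sketch. The @Audit table variable, the @Cutoff value and the ISO 8601 timestamp literals are assumptions made so the example is self-contained.

```sql
-- Sample rows from the question; timestamps rewritten as ISO 8601 literals (an assumption).
DECLARE @Audit TABLE (RecordNo int, ID varchar(10), [Type] char(1), [Date] datetime);
INSERT INTO @Audit VALUES
    (1, 'ae08k', 'M', '2017-01-02T12:03:00'),
    (2, 'liei0', 'A', '2017-01-02T12:04:00'),
    (3, 'ae08k', 'C', '2017-01-02T13:05:00'),
    (4, 'we808', 'A', '2017-01-03T20:05:00');

DECLARE @Cutoff datetime = '2017-01-03T00:00:00';  -- hypothetical cutoff

SELECT tt.ID, tt.[Type], tt.[Date]
FROM (
    SELECT a.ID, a.[Type], a.[Date],
           ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY a.[Date] DESC) AS rn
    FROM @Audit AS a
    WHERE a.[Date] < @Cutoff
) AS tt
WHERE tt.rn = 1;
-- ae08k -> C (its latest row before the cutoff), liei0 -> A; we808 falls after the cutoff.
```

Because the row number restarts per ID and orders by date descending, rn = 1 is always the most recent row for that ID, with its Type intact.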
I'd kinda like to know how many IDs are of each type, at a certain date.
Well, for that you use COUNT and GROUP BY on Type:
SELECT Type, COUNT(ID)
FROM Table
WHERE Date < 'whatever-your-cutoff-date-is-here'
GROUP BY Type
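One thing to watch for: grouping the audit rows directly counts every change, so an ID whose Type changed before the cutoff is counted once per Type it ever had. If the goal is a snapshot of each ID's latest Type, a sketch along these lines may be closer. The @Audit table variable, the @Cutoff value and the timestamp literals are assumptions for illustration.

```sql
-- Sample rows from the question; timestamps rewritten as ISO 8601 literals (an assumption).
DECLARE @Audit TABLE (RecordNo int, ID varchar(10), [Type] char(1), [Date] datetime);
INSERT INTO @Audit VALUES
    (1, 'ae08k', 'M', '2017-01-02T12:03:00'),
    (2, 'liei0', 'A', '2017-01-02T12:04:00'),
    (3, 'ae08k', 'C', '2017-01-02T13:05:00'),
    (4, 'we808', 'A', '2017-01-03T20:05:00');

DECLARE @Cutoff datetime = '2017-01-03T00:00:00';  -- hypothetical cutoff

WITH latest AS
(
    -- rn = 1 marks each ID's most recent row before the cutoff.
    SELECT a.ID, a.[Type],
           ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY a.[Date] DESC) AS rn
    FROM @Audit AS a
    WHERE a.[Date] < @Cutoff
)
SELECT l.[Type], COUNT(*) AS IdCount
FROM latest AS l
WHERE l.rn = 1
GROUP BY l.[Type];
-- A -> 1 (liei0), C -> 1 (ae08k); M is not counted because ae08k has since changed to C.
```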
Based on your comment under Zohar Peled's answer, you are probably looking for something like this:
; with cte as (select distinct ID from Table where Date < '$param')
select [data].*, [data2].[count]
from cte
cross apply
( select top 1 *
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
order by Table.Date desc
) as [data]
cross apply
( select count(1) as [count]
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
) as [data2]

SQL Server & SSMS 2012 - Move a value from one column to a new one to ensure only one row

This is a problem that has troubled me several times in the past, and I have always wondered if a solution is possible.
I have a query using several tables; one of the values is a mobile phone number.
I have name, address, etc. I also have income information in the table, which is used for a summary in Excel.
The problem occurs when a contact has more than one mobile number: as you know, this creates extra rows with the majority of the data duplicated, including the income.
Question: is it possible for the query to identify whether the contact has more than one number and, if so, create a new column with the 2nd mobile number?
Effectively returning the contacts information to one row and creating new columns.
My SQL is intermediate and I cannot think of a solution so thought I would ask.
Many thanks
I am pretty sure this isn't the best possible solution, since we don't know how many records you have in your dataset and I didn't have much time. It is just an idea of how you can solve your original problem when one customer has two different numbers.
declare @t table (id int
,firstName varchar(20)
,lastName varchar(20)
,phoneNumber varchar(20)
,income money)
insert into @t values
(1,'John','Doe','1234567',50)
,(1,'John','Doe','6789856',50)
,(2,'Mike','Smith','5687456',150)
,(3,'Stela','Hodhson','3334445',500)
,(4,'Nick','Slotter','5556667',550)
,(4,'Nick','Slotter','8889991',550)
,(5,'Abraham','Lincoln','4578912',52)
,(6,'Ronald','Regan','6987456',587)
,(7,'Thomas','Jefferson','8745612',300);
with a as(
select id
,max(phoneNumber) maxPhone
from @t group by id
),
b as(
select id
,min(phoneNumber) minPhone
from @t group by id
)
SELECT distinct t.id
,t.firstName
,t.lastName
,t.income
,a.maxPhone as phoneNumber1
,case when b.minPhone = a.maxPhone then ''
else b.minphone end as phoneNumber2
from @t t
inner join a a on a.id = t.id
inner join b b on b.id = t.id
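The MIN/MAX approach above tops out at two numbers per contact. As an alternative sketch, numbering each contact's phones with ROW_NUMBER() and pivoting with conditional aggregation extends to a fixed number of columns. The three-column limit and the trimmed sample data here are assumptions for illustration.

```sql
DECLARE @t TABLE (id int, firstName varchar(20), lastName varchar(20),
                  phoneNumber varchar(20), income money);
INSERT INTO @t VALUES
    (1, 'John', 'Doe',   '1234567', 50),
    (1, 'John', 'Doe',   '6789856', 50),
    (2, 'Mike', 'Smith', '5687456', 150);

WITH numbered AS
(
    -- Number each contact's phone rows 1, 2, 3, ...
    SELECT t.id, t.firstName, t.lastName, t.income, t.phoneNumber,
           ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.phoneNumber) AS rn
    FROM @t AS t
)
SELECT id, firstName, lastName, income,
       MAX(CASE WHEN rn = 1 THEN phoneNumber END) AS phoneNumber1,
       MAX(CASE WHEN rn = 2 THEN phoneNumber END) AS phoneNumber2,
       MAX(CASE WHEN rn = 3 THEN phoneNumber END) AS phoneNumber3
FROM numbered
GROUP BY id, firstName, lastName, income;
-- One row per contact; unused phone columns come back as NULL.
```

Grouping by the non-phone columns collapses the duplicated rows, so income appears once per contact and is not double counted in the Excel summary.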

SQL running sum for an MVC application

I need a faster method to calculate and display a running sum.
It's an MVC telerik grid that queries a view that generates a running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(inside.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution: (one of many, this one is orders of magnitude faster than a sub-query)
Create a new table with all the columns in the view, replacing the computed running sum with a plain RunningTotal column. Create a stored procedure that first truncates the table, then fills it with INSERT INTO ... SELECT of every column except the running sum.
Use the update-local-variable method:
DECLARE @Amount DECIMAL(18,4)
SET @Amount = 0
-- Compound form: the variable and the column both receive the new total, so each row includes its own Amount.
-- The order in which this UPDATE visits rows is not guaranteed by SQL Server.
UPDATE TABLE_YOU_JUST_CREATED SET @Amount = RunningTotal = @Amount + ISNULL(Amount,0)
Schedule a job (for example with SQL Server Agent) that runs the stored procedure once a day. Use TABLE_YOU_JUST_CREATED for all your reports.
Take a look at this post: Calculate a Running Total in SQL Server
If you have SQL Server Denali (SQL Server 2012), you can use the new windowed functions.
In SQL Server 2008 R2, I suggest using a recursive common table expression.
A small problem with the CTE is that, for a fast query, you need an identity column without gaps (1, 2, 3, ...). If you don't have such a column, you have to create a temporary table or table variable with one and move your data there.
The CTE approach would look something like this:
declare @Temp_Numbers table (RowNum int, Amount <your type>, EffectiveDate datetime)
insert into @Temp_Numbers (RowNum, Amount, EffectiveDate)
select row_number() over (order by EffectiveDate), Amount, EffectiveDate
from vCI_UNIONALL
-- you can also use identity
-- declare @Temp_Numbers table (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- insert into @Temp_Numbers (Amount, EffectiveDate)
-- select Amount, EffectiveDate
-- from vCI_UNIONALL
-- order by EffectiveDate
;with
CTE_RunningTotal
as
(
select T.RowNum, T.EffectiveDate, T.Amount as Total_Amount
from @Temp_Numbers as T
where T.RowNum = 1
union all
select T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount as Total_Amount
from CTE_RunningTotal as C
inner join @Temp_Numbers as T on T.RowNum = C.RowNum + 1
)
select C.RowNum, C.EffectiveDate, C.Total_Amount
from CTE_RunningTotal as C
option (maxrecursion 0)
There may be some questions about duplicate EffectiveDate values. It depends on how you want to handle them: do you want them ordered arbitrarily, or do you want them to share the same running amount?
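For the SQL Server 2012 and later route mentioned above, a windowed SUM could look roughly like this. It is a sketch only: the @Items table variable stands in for vCI_UNIONALL, and ItemID is a hypothetical tiebreaker column that the original view may not have.

```sql
-- Hypothetical sample rows standing in for vCI_UNIONALL.
DECLARE @Items TABLE (ItemID int, EffectiveDate date, Amount decimal(18,4));
INSERT INTO @Items VALUES
    (1, '2013-01-01', 100),
    (2, '2013-01-02', -30),
    (3, '2013-01-02',  50),
    (4, '2013-01-05',  20);

SELECT i.ItemID, i.EffectiveDate, i.Amount,
       SUM(i.Amount) OVER (ORDER BY i.EffectiveDate, i.ItemID
                           ROWS UNBOUNDED PRECEDING) AS RunningBalance
FROM @Items AS i
ORDER BY i.EffectiveDate, i.ItemID;
-- RunningBalance per row: 100, 70, 120, 140
```

The tiebreaker makes rows that share an EffectiveDate accumulate one at a time. Ordering by EffectiveDate alone and dropping the ROWS clause would instead give tied dates the same balance, which matches what the original <= sub-query returns.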
