How to calculate SUM balance of all accounts in T-SQL? - sql-server

I have this table and data
CREATE TABLE #transactions (
[transactionId] [int] NOT NULL,
[accountId] [int] NOT NULL,
[dt] [datetime] NOT NULL,
[balance] [smallmoney] NOT NULL,
CONSTRAINT [PK_transactions_1] PRIMARY KEY CLUSTERED
( [transactionId] ASC)
)
INSERT #transactions ([transactionId], [accountId], [dt], [balance]) VALUES
(1, 1, CAST(0x0000A13900107AC0 AS DateTime), 123.0000),
(2, 1, CAST(0x0000A13900107AC0 AS DateTime), 192.0000),
(3, 1, CAST(0x0000A13A00107AC0 AS DateTime), 178.0000),
(4, 2, CAST(0x0000A13B00107AC0 AS DateTime), 78.0000),
(5, 2, CAST(0x0000A13D011D1860 AS DateTime), 99.0000),
(6, 2, CAST(0x0000A13F00000000 AS DateTime), 97.0000),
(7, 1, CAST(0x0000A13D0141E640 AS DateTime), 201.0000),
(8, 3, CAST(0x0000A1420094DD60 AS DateTime), 4000.0000),
(9, 3, CAST(0x0000A14300956A00 AS DateTime), 4100.0000),
(10, 3, CAST(0x0000A14700000000 AS DateTime), 4200.0000),
(11, 2, CAST(0x0000A14B00B84BB0 AS DateTime), 110.0000)
I need two queries.
1. For each transaction, I want to return the most recent balance for each account, plus an extra column with the SUM of every account's latest balance at that point in time.
2. Same as 1, but grouped by date without the time portion: the latest balance of each account at the end of each day (for days with a transaction in any account), SUMed together as in 1.
Data above is sample data that I just made up, but my real table has hundreds of rows and ten accounts (which may increase soon). Each account has a unique accountId. Seems quite a tricky piece of SQL.
EXAMPLE
For 1. I need a result like this:
+---------------+-----------+-------------------------+---------+-------------+
| transactionId | accountId | dt | balance | sumBalances |
+---------------+-----------+-------------------------+---------+-------------+
| 1 | 1 | 2013-01-01 01:00:00.000 | 123 | 123 |
| 2 | 1 | 2013-01-01 01:00:00.000 | 192 | 192 |
| 3 | 1 | 2013-01-02 01:00:00.000 | 178 | 178 |
| 4 | 2 | 2013-01-03 01:00:00.000 | 78 | 256 |
| 5 | 2 | 2013-01-05 17:18:00.000 | 99 | 277 |
| 7 | 1 | 2013-01-05 19:32:00.000 | 201 | 300 |
| 6 | 2 | 2013-01-07 00:00:00.000 | 97 | 298 |
| 8 | 3 | 2013-01-10 09:02:00.000 | 4000 | 4298 |
| 9 | 3 | 2013-01-11 09:04:00.000 | 4100 | 4398 |
| 10 | 3 | 2013-01-15 00:00:00.000 | 4200 | 4498 |
| 11 | 2 | 2013-01-19 11:11:00.000 | 110 | 4511 |
+---------------+-----------+-------------------------+---------+-------------+
So, for transactionId 8, I take the latest balance for each account in turn and then sum them: AccountId 1 is 201, AccountId 2 is 97 and AccountId 3 is 4000. Therefore the result for transactionId 8 will be 201+97+4000 = 4298. When calculating, the set must be ordered by dt.
For 2. I need this
+------------+-------------+
| date | sumBalances |
+------------+-------------+
| 01/01/2013 | 192 |
| 02/01/2013 | 178 |
| 03/01/2013 | 256 |
| 05/01/2013 | 300 |
| 07/01/2013 | 298 |
| 10/01/2013 | 4298 |
| 11/01/2013 | 4398 |
| 15/01/2013 | 4498 |
| 19/01/2013 | 4511 |
+------------+-------------+
So on date 15/01/2013 the latest account balance for each account in turn (1,2,3) is 201,97,4200. So the result for that date would be 201+97+4200 = 4498

This gives your first desired resultset (SQL Fiddle)
WITH T
AS (SELECT *,
balance -
isnull(lag(balance) OVER (PARTITION BY accountId
ORDER BY dt, transactionId), 0) AS B
FROM #transactions)
SELECT transactionId,
accountId,
dt,
balance,
SUM(B) OVER (ORDER BY dt, transactionId ROWS UNBOUNDED PRECEDING) AS sumBalances
FROM T
ORDER BY dt;
It subtracts the account's previous balance from its current balance to get the net difference, then calculates a running total of those differences.
And that can be used as a base for your second result
WITH T1
AS (SELECT *,
balance -
isnull(lag(balance) OVER (PARTITION BY accountId
ORDER BY dt, transactionId), 0) AS B
FROM #transactions),
T2 AS (
SELECT transactionId,
accountId,
dt,
balance,
ROW_NUMBER() OVER (PARTITION BY CAST(dt AS DATE) ORDER BY dt DESC, transactionId DESC) AS RN,
SUM(B) OVER (ORDER BY dt, transactionId ROWS UNBOUNDED PRECEDING) AS sumBalances
FROM T1)
SELECT CAST(dt AS DATE) AS [date], sumBalances
FROM T2
WHERE RN=1
ORDER BY [date];
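The net-difference idea behind both queries can also be sketched outside SQL. The Python below is only an illustration under stated assumptions: the names `Txn`, `running_sum_of_balances` and `end_of_day_sums` are made up for this sketch, and `dt` is stored as an ISO-style string so that plain string sorting is chronological.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; fields mirror the #transactions columns.
Txn = namedtuple("Txn", "transaction_id account_id dt balance")

# Sample rows from the question.
rows = [
    Txn(1, 1, "2013-01-01 01:00", 123),
    Txn(2, 1, "2013-01-01 01:00", 192),
    Txn(3, 1, "2013-01-02 01:00", 178),
    Txn(4, 2, "2013-01-03 01:00", 78),
    Txn(5, 2, "2013-01-05 17:18", 99),
    Txn(6, 2, "2013-01-07 00:00", 97),
    Txn(7, 1, "2013-01-05 19:32", 201),
    Txn(8, 3, "2013-01-10 09:02", 4000),
    Txn(9, 3, "2013-01-11 09:04", 4100),
    Txn(10, 3, "2013-01-15 00:00", 4200),
    Txn(11, 2, "2013-01-19 11:11", 110),
]


def running_sum_of_balances(txns):
    """Return (transaction_id, dt, sum of every account's latest balance), ordered by dt."""
    last_balance = {}  # accountId -> most recent balance seen so far
    total = 0
    out = []
    for t in sorted(txns, key=lambda t: (t.dt, t.transaction_id)):
        # Net change for this account (the LAG step), added to the running total.
        total += t.balance - last_balance.get(t.account_id, 0)
        last_balance[t.account_id] = t.balance
        out.append((t.transaction_id, t.dt, total))
    return out


def end_of_day_sums(txns):
    """Keep only the last running total of each calendar day (the RN = 1 step)."""
    daily = {}
    for _, dt, total in running_sum_of_balances(txns):
        daily[dt[:10]] = total  # later rows of the same day overwrite earlier ones
    return daily


per_transaction = running_sum_of_balances(rows)
per_day = end_of_day_sums(rows)
```

Because each account contributes only the change since its previous row, the running total always equals the sum of the latest balance of every account, which is why a single ordered pass is enough.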

Part 1
; WITH a AS (
SELECT *, r = ROW_NUMBER()OVER(PARTITION BY accountId ORDER BY dt, transactionId) -- transactionId breaks ties on equal dt
FROM #transactions t
)
, b AS (
SELECT t.*
, transamount = t.balance - ISNULL(t0.balance,0)
FROM a t
LEFT JOIN a t0 ON t0.accountId = t.accountId AND t0.r + 1 = t.r
)
SELECT transactionId, accountId, dt, balance
, sumBalance = SUM(transamount)OVER(ORDER BY dt, transactionId)
FROM b
ORDER BY dt
Part 2
; WITH a AS (
SELECT *, r = ROW_NUMBER()OVER(PARTITION BY accountId ORDER BY dt, transactionId) -- transactionId breaks ties on equal dt
FROM #transactions t
)
, b AS (
SELECT t.*
, transamount = t.balance - ISNULL(t0.balance,0)
FROM a t
LEFT JOIN a t0 ON t0.accountId = t.accountId AND t0.r + 1 = t.r
)
, c AS (
SELECT transactionId, accountId, dt, balance
, sumBalance = SUM(transamount)OVER(ORDER BY CAST(dt AS DATE))
, r1 = ROW_NUMBER()OVER(PARTITION BY CAST(dt AS DATE) ORDER BY dt DESC, transactionId DESC) -- one row per day, not one per account per day
FROM b
)
SELECT dt = CAST(dt AS DATE)
, sumBalance
FROM c
WHERE r1 = 1
ORDER BY CAST(dt AS DATE)

Related

Join created table under condition

I am creating a code to join two different tables under a certain condition. The tables look like this
(TABLE 2)
date | deal_code | originator | servicer | random |
-----------------------------------------------------
2011 | 001 | commerzbank | SPV1 | 1 |
2012 | 001 | commerzbank | SPV1 | 12 |
2013 | 001 | commerzbank | SPV1 | 7 |
2013 | 005 | unicredit | SPV2 | 7 |
and another table
(TABLE 1)
date | deal_code | amount |
---------------------------
2011 | 001 | 100 |
2012 | 001 | 100 |
2013 | 001 | 100 |
2013 | 005 | 200 |
I would like to have this as the final result
date | deal_code | amount | originator | servicer | random |
--------------------------------------------------------------
2013 | 001 | 100 | commerzbank | SPV1 | 7 |
2013 | 005 | 200 | unicredit | SPV2 | 7 |
I created the following code
select q1.deal_code, q1.date
from table1 q1
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)
that gives me:
(TABLE 3)
date | deal_code | amount |
---------------------------
2013 | 001 | 100 |
2013 | 005 | 200 |
That is the latest observation for table 1, now I would like to have the originator and servicer information given the deal_code and date. Any suggestion? I hope to have been clear enough. Thanks.
This should do what you are looking for. Please be careful when naming columns. Date is a reserved word and is too ambiguous to be a good name for a column.
declare @Something table
(
SomeDate int
, deal_code char(3)
, originator varchar(20)
, servicer char(4)
, random int
)
insert @Something values
(2011, '001', 'commerzbank', 'SPV1', 1)
, (2012, '001', 'commerzbank', 'SPV1', 12)
, (2013, '001', 'commerzbank', 'SPV1', 7)
, (2013, '005', 'unicredit', 'SPV2', 7)
declare @SomethingElse table
(
SomeDate int
, deal_code char(3)
, amount int
)
insert @SomethingElse values
(2011, '001', 100)
, (2012, '001', 100)
, (2013, '001', 100)
, (2013, '005', 200)
select x.SomeDate
, x.deal_code
, x.originator
, x.servicer
, x.random
, x.amount
from
(
select s.SomeDate
, s.deal_code
, s.originator
, s.servicer
, s.random
, se.amount
-- number each deal's rows from newest to oldest
, RowNum = ROW_NUMBER()over(partition by s.deal_code order by s.SomeDate desc)
from @Something s
join @SomethingElse se on se.SomeDate = s.SomeDate and se.deal_code = s.deal_code
) x
where x.RowNum = 1
Looks like this would work:
DECLARE @MaxYear INT;
SELECT @MaxYear = MAX(t1.date) -- qualified: both tables have a date column
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.deal_code = t2.deal_code;
SELECT t1.date,
t1.deal_code,
t1.amount,
t2.originator,
t2.servicer,
t2.random
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.deal_code = t2.deal_code
AND t1.date = t2.date -- without this, every table2 year for the deal would join
WHERE t1.date = @MaxYear;
I agree with Sean Lange about the date column name. His method gets around the dependency on the correlated sub-query, but at the heart of things, you really just need to add an INNER JOIN to your existing query in order to get the amount column into your result set.
select
q2.date,
q2.deal_code,
q1.amount,
q2.originator,
q2.servicer,
q2.random
from
table1 q1
join
table2 q2
on q1.date = q2.date
and q1.deal_code = q2.deal_code
where q1.date = (SELECT MAX(t4.date)
FROM table1 t4
WHERE t4.deal_code = q1.deal_code)

Get the count of statuses by date but only count continuous rows

I have this data:
ID Name Status Date
1 Machine1 Active 2018-01-01
2 Machine2 Fault 2018-01-01
3 Machine3 Active 2018-01-01
4 Machine1 Fault 2018-01-02
5 Machine2 Active 2018-01-02
6 Machine3 Active 2018-01-02
7 Machine2 Active 2018-01-03
8 Machine1 Fault 2018-01-03
9 Machine2 Active 2018-01-04
10 Machine1 Fault 2018-01-04
11 Machine3 Active 2018-01-06
The data above is the input. I want this output:
EXPECTED OUTPUT
Name Last Status Count
Machine1 Fault 3
Machine2 Active 3
Machine3 Active 1 Because Date is not Continuous
*Count : Last number of status in continuous history
I believe it is as simple as this:
WITH cte1 AS (
SELECT
Name,
Status,
DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY Name, Status ORDER BY Date DESC) - 1, Date) AS GroupingDate
FROM testdata
), cte2 AS (
SELECT
Name,
Status,
RANK() OVER (PARTITION BY Name ORDER BY GroupingDate DESC) AS GroupingNumber
FROM cte1
)
SELECT Name, Status AS LastStatus, COUNT(*) AS LastStatusCount
FROM cte2
WHERE GroupingNumber = 1
GROUP BY Name, Status
ORDER BY Name
Result and DBFiddle:
| Name | LastStatus | LastStatusCount |
|----------|------------|-----------------|
| Machine1 | Fault | 3 |
| Machine2 | Active | 3 |
| Machine3 | Active | 1 |
In order to understand how this works, look at the intermediate values generated by CTE:
| Name | Status | Date | RowNumber | GroupingDate | GroupingNumber |
|----------|--------|---------------------|-----------|---------------------|----------------|
| Machine1 | Fault | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Active | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine2 | Active | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Fault | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine3 | Active | 06/01/2018 00:00:00 | 1 | 06/01/2018 00:00:00 | 1 |
| Machine3 | Active | 02/01/2018 00:00:00 | 2 | 03/01/2018 00:00:00 | 2 |
| Machine3 | Active | 01/01/2018 00:00:00 | 3 | 03/01/2018 00:00:00 | 2 |
The trick here is that if two numbers are contiguous then subtracting contiguous numbers from them will result in same value. E.g. if we have 5, 6, 8, 9 then subtracting 1, 2, 3, 4 in that order will produce 4, 4, 5, 5.
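The same trick can be sketched in a few lines of Python for a single machine and status. This is only an illustration of the idea, not the answer's query: `latest_run_length` is a hypothetical helper name, and the sketch assumes one row per day, numbered from the most recent date as in `cte1`.

```python
from datetime import date, timedelta


def latest_run_length(days):
    """Length of the unbroken run of consecutive days that ends at the latest date."""
    ordered = sorted(days, reverse=True)  # newest first, like ORDER BY Date DESC
    # Date + (row_number - 1) days: rows of one consecutive run share the same key.
    keys = [d + timedelta(days=i) for i, d in enumerate(ordered)]
    return sum(1 for k in keys if k == keys[0])


# Machine3 / Active from the question: the 6th is isolated from the 1st and 2nd.
machine3_active = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 6)]
# Machine1 / Fault: three consecutive days.
machine1_fault = [date(2018, 1, 2), date(2018, 1, 3), date(2018, 1, 4)]

machine3_count = latest_run_length(machine3_active)
machine1_count = latest_run_length(machine1_fault)
```

For Machine3 the keys are 2018-01-06, 2018-01-03 and 2018-01-03, matching the GroupingDate column in the table above, so only the first row belongs to the latest run.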
I think this will work, though SQLFiddle is having a fit at the moment, so I can't test:
SELECT [Name], [Status], ct as [Count]
FROM (
SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x
) y
WHERE
rn = 1
First, it uses LAG to compare the current row with the previous row (partitioning the data by machine name and status, ordered by date). If the current date is exactly 1 day after the previous date it records a 1, else a 0.
These 1s and 0s are summed in a running-total fashion, resetting when the machine name or status changes (the partition of the SUM() OVER()).
We also want to consider the data just in terms of the machine name, and we only want the latest record for each machine. So we partition by the machine name, number the rows in descending date order, and then pick (with the WHERE clause) the rows numbered 1 for each machine.
It actually makes a lot more sense if you run the queries separately, like this
Calculate the "is the current report consecutive with the previous report, for the given status and machine" 1 = yes, 0 = no:
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
Calculate the "what is the running total of the current block of consecutive reports":
SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x
Then of course, the whole thing but without the where clause, so you can see the data we're discarding:
SELECT [Name], [Status], ct as [Count]
FROM (
SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x
) y
Fiddle finally woke up:
http://www.sqlfiddle.com/#!18/77dae/2

(T-SQL) How to query an audit table, and find changes between 2 dates

The audit table looks like this:
Audit ID VendorID PaymentType CreateDateUTC
999 8048 2 2017-10-30-08:84:24
1000 1234 5 2017-10-31-01:17:34
1001 8048 7 2017-10-31-01:17:45
1002 1234 5 2017-10-31-01:17:53
1003 1234 7 2017-10-31-01:18:23
1004 1234 5 2017-11-01-01:18:45
In this example, you can see that say - VendorID 1234 started as PaymentType 5, then had another entry where it's still 5 (the audit table records additional changes not relevant to my query), then it changes to 7, but then back to 5.
Say I'd want to answer the question: 'Between now and date X, these VendorIDs had a change in PaymentType'. A bonus would be - this was the previous PaymentType.
Expected Results:
VendorID PaymentType Prev_PaymentType
8048 7 2
So if I queried between now and 10-31-01:00:00, I'd want it to return VendorID 8048 as having changed (and as a bonus, that its previous PaymentType was 2), but VendorID 1234 shouldn't show up, since at 2017-10-31-01:00:00 it was a 5, and now is still a 5, despite the intermediate changes.
How would one go about querying the VendorIDs whose payment type changed between 2 dates?
Thanks!
Here is an alternative approach that may prove useful, using OUTER APPLY. Note that the AuditID column is used as a tie-breaker, mostly because the sample data below has only dates, not datetime values.
SQL Fiddle
CREATE TABLE AuditTable (
AuditID int
, VendorID int
, PaymentType int
, CreateDateUTC date
);
INSERT INTO AuditTable
VALUES (999, 8048, 2, '2017-10-30'),
(1000, 1234, 5, '2017-10-31'),
(1001, 8048, 7, '2017-10-31'),
(1002, 1234, 5, '2017-10-31'),
(1003, 1234, 7, '2017-10-31'),
(1004, 1234, 5, '2017-11-01');
Query 1:
select
*
from AuditTable a
outer apply (
select top(1) PaymentType, CreateDateUTC
from AuditTable t
where a.VendorID = t.VendorID
and a.CreateDateUTC >= t.CreateDateUTC
and a.AuditID > t.AuditID
order by CreateDateUTC DESC, AuditID DESC
) oa (PrevPaymentType, PrevDate)
order by
vendorid
, CreateDateUTC
Results:
| AuditID | VendorID | PaymentType | CreateDateUTC | PrevPaymentType | PrevDate |
|---------|----------|-------------|---------------|-----------------|------------|
| 1000 | 1234 | 5 | 2017-10-31 | (null) | (null) |
| 1002 | 1234 | 5 | 2017-10-31 | 5 | 2017-10-31 |
| 1003 | 1234 | 7 | 2017-10-31 | 5 | 2017-10-31 |
| 1004 | 1234 | 5 | 2017-11-01 | 7 | 2017-10-31 |
| 999 | 8048 | 2 | 2017-10-30 | (null) | (null) |
| 1001 | 8048 | 7 | 2017-10-31 | 2 | 2017-10-30 |
CREATE TABLE AuditTable (
AuditID INT,
VendorID INT,
PaymentType INT,
CreateDateUTC DATE
);
INSERT INTO AuditTable VALUES
(999 , 8048, 2, '2017-10-30'),
(1000, 1234, 5, '2017-10-31'),
(1001, 8048, 7, '2017-10-31'),
(1002, 1234, 5, '2017-10-31'),
(1003, 1234, 7, '2017-10-31'),
(1004, 1234, 5, '2017-11-01');
WITH CTE AS (
SELECT *,
ROW_NUMBER () OVER (PARTITION BY CreateDateUTC ORDER BY PaymentType) AS N1
FROM AuditTable
WHERE CreateDateUTC <= '2017-11-02' AND CreateDateUTC >= '2017-10-01'
) ,
MAXP AS(
SELECT VendorID, PaymentType, CreateDateUTC
FROM CTE
WHERE N1 = (SELECT MAX(N1) FROM CTE)
)
SELECT TOP 1 MAXP.VendorID, MAXP.PaymentType AS PaymentType, CTE.PaymentType AS Prev_PaymentType
FROM MAXP
JOIN CTE ON CTE.VendorID = MAXP.VendorID;
Result:
+----------+-------------+------------------+
| VendorID | PaymentType | Prev_PaymentType |
+----------+-------------+------------------+
| 8048 | 7 | 2 |
+----------+-------------+------------------+
Demo
Here is a variant that does not use LEAD() or LAG() but does use ROW_NUMBER() and COUNT() OVER().
See this version work at: SQL Fiddle
CREATE TABLE AuditTable (
AuditID int
, VendorID int
, PaymentType int
, CreateDateUTC date
);
INSERT INTO AuditTable
VALUES (999, 8048, 2, '2017-10-30'),
(1000, 1234, 5, '2017-10-31'),
(1001, 8048, 7, '2017-10-31'),
(1002, 1234, 5, '2017-10-31'),
(1003, 1234, 7, '2017-10-31'),
(1004, 1234, 5, '2017-11-01');
Query 1:
WITH
rowz AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY VendorID
ORDER BY CreateDateUTC, AuditID) AS lagno
FROM AuditTable
),
cte AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY VendorID, CreateDateUTC
ORDER BY c DESC, span_dt) rn
FROM (
SELECT r1.AuditID, r1.VendorID, r1.CreateDateUTC
, r1.PaymentType AS prevpaymenttype
, r2.PaymentType
, COALESCE(r2.CreateDateUTC, CAST(GETDATE() AS date)) span_dt
, COUNT(*) OVER (PARTITION BY r1.VendorID, r1.CreateDateUTC, r1.PaymentType) c
FROM rowz r1
LEFT JOIN rowz r2 ON r1.VendorID = r2.VendorID
AND r1.lagno = r2.lagno - 1
) d
)
SELECT
AuditID, VendorID, PrevPaymentType, PaymentType, CreateDateUTC
FROM (
SELECT
*
FROM cte
WHERE ('20171031' BETWEEN CreateDateUTC AND span_dt AND rn = 1)
OR (CAST(GETDATE() AS date) BETWEEN CreateDateUTC AND span_dt AND rn = 1)
) d
WHERE PaymentType <> PrevPaymentType
Results:
| AuditID | VendorID | PrevPaymentType | PaymentType | CreateDateUTC |
|---------|----------|-----------------|-------------|---------------|
| 999 | 8048 | 2 | 7 | 2017-10-30 |

SQL DATEDIFF in an sql query

I have two tables Customers and Purchases:
Customers table:
+------------+-----------+----------+
| CustomerID | FirstName | Surname |
+------------+-----------+----------+
| 101 | Jeff | Smith |
| 102 | Alex | Jones |
| 103 | Pam | Clark |
| 104 | Zola | Lona |
| 105 | Simphele | Ndima |
| 106 | Andre | Williams |
| 107 | Wayne | Shelton |
| 108 | Bob | Banard |
| 109 | Ken | Davidson |
| 110 | Sally | Ivan |
+------------+-----------+----------+
Purchases table:
+------------+--------------+------------+-----------+
| PurchaseId | PurchaseDate | CustomerID | ProductID |
+------------+--------------+------------+-----------+
| 1 | 2012-08-15 | 105 | a510 |
| 2 | 2012-08-15 | 102 | a510 |
| 3 | 2012-08-15 | 103 | a506 |
| 4 | 2012-08-16 | 105 | a510 |
| 5 | 2012-08-17 | 106 | a507 |
| 6 | 2012-08-17 | 107 | a509 |
| 7 | 2012-08-18 | 108 | a502 |
| 8 | 2012-08-19 | 108 | a510 |
| 9 | 2012-08-19 | 109 | a502 |
| 10 | 2012-08-20 | 110 | a503 |
| 11 | 2012-08-21 | 101 | a510 |
| 12 | 2012-08-22 | 102 | a507 |
+------------+--------------+------------+-----------+
My question (which I have been struggling with for the last 2 days): create a query that will display all the customers who purchased products after five days or more, since their last purchase.
Desired outputs:
+-----------+------------------+
| Firstname | Daysdifference |
+-----------+------------------+
| Alex | 7 |
+-----------+------------------+
select c.FirstName, t.dif as Daysdifference from customer c
inner join
(
select p1.CustomerID,
datediff(day,p1.PurchaseDate,p2.PurchaseDate) as dif
from purchases p1
inner join purchases p2
on p1.CustomerID=p2.CustomerID
where datediff(day,p1.PurchaseDate,p2.PurchaseDate)>=5
) t
on t.CustomerID= c.CustomerID
Here you go:
DECLARE @Customers TABLE (CustomerID INT, FirstName VARCHAR(30), Surname VARCHAR(30));
DECLARE @Purchases TABLE (PurchaseId INT, PurchaseDate DATE, CustomerID INT, ProductID VARCHAR(10) );
-- Sample data from the question
INSERT INTO @Customers VALUES
(101, 'Jeff', 'Smith'),
(102, 'Alex', 'Jones'),
(103, 'Pam', 'Clark'),
(104, 'Zola', 'Lona'),
(105, 'Simphele', 'Ndima'),
(106, 'Andre', 'Williams'),
(107, 'Wayne', 'Shelton'),
(108, 'Bob', 'Banard'),
(109, 'Ken', 'Davidson'),
(110, 'Sally', 'Ivan');
INSERT INTO @Purchases VALUES
(1, '2012-08-15', 105, 'a510'),
(2, '2012-08-15', 102, 'a510'),
(3, '2012-08-15', 103, 'a506'),
(4, '2012-08-16', 105, 'a510'),
(5, '2012-08-17', 106, 'a507'),
(6, '2012-08-17', 107, 'a509'),
(7, '2012-08-18', 108, 'a502'),
(8, '2012-08-19', 108, 'a510'),
(9, '2012-08-19', 109, 'a502'),
(10, '2012-08-20', 110, 'a503'),
(11, '2012-08-21', 101, 'a510'),
(12, '2012-08-22', 102, 'a507');
-- Pair every purchase with every other purchase by the same customer
WITH CTE AS (
SELECT Pur1.CustomerID, DATEDIFF(DAY, Pur1.PurchaseDate, Pur2.PurchaseDate) Daysdifference
FROM @Purchases Pur1 INNER JOIN @Purchases Pur2 ON Pur1.CustomerID = Pur2.CustomerID
)
SELECT Cus.FirstName, CTE.Daysdifference
FROM @Customers Cus INNER JOIN CTE ON Cus.CustomerID = CTE.CustomerID
WHERE CTE.Daysdifference >= 5;
Result:
+-----------+------------------+
| Firstname | Daysdifference |
+-----------+------------------+
| Alex | 7 |
+-----------+------------------+
Demo
You can solve it like this:
1. Create a ranking ordered by date descending and partitioned by customer ID.
2. Check the date difference between consecutive ranks to find those customers.
The query is below:
; with cte as
(
select
*,
row_number() over(partition by CustomerID order by PurchaseDate desc) r
from
Purchases
)
select
Name= c.FirstName,
Daysdifference =datediff(d,c1.PurchaseDate, c2.PurchaseDate)
from
Customers c join
cte c1
on c.customerid=c1.customerid
join cte c2
on c1.CustomerID=c2.CustomerId
and c1.r-1=c2.r
and datediff(d,c1.PurchaseDate, c2.PurchaseDate) >=5
See working demo
Since SQL Server 2012 added the LAG and LEAD functions, there is no need for a self join for something like this.
Note: window functions can be extremely efficient compared to other methods, but they need the help of a proper index to perform their best (note the additional POC index, keyed on the partitioning and ordering columns, in the test script).
CREATE TABLE #Customers (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(30),
Surname VARCHAR(30)
);
CREATE TABLE #Purchases (
PurchaseId INT PRIMARY KEY,
PurchaseDate DATE,
CustomerID INT,
ProductID VARCHAR(10)
);
INSERT INTO #Customers VALUES
(101,'Jeff ' , 'Smith '),
(102,'Alex ' , 'Jones '),
(103,'Pam ' , 'Clark '),
(104,'Zola ' , 'Lona '),
(105,'Simphele' , 'Ndima '),
(106,'Andre ' , 'Williams'),
(107,'Wayne ' , 'Shelton '),
(108,'Bob ' , 'Banard '),
(109,'Ken ' , 'Davidson'),
(110,'Sally ' , 'Ivan ');
INSERT INTO #Purchases VALUES
(1, '2012-08-15' ,105, 'a510'),
(2, '2012-08-15' ,102, 'a510'),
(3, '2012-08-15' ,103, 'a506'),
(4, '2012-08-16' ,105, 'a510'),
(5, '2012-08-17' ,106, 'a507'),
(6, '2012-08-17' ,107, 'a509'),
(7, '2012-08-18' ,108, 'a502'),
(8, '2012-08-19' ,108, 'a510'),
(9, '2012-08-19' ,109, 'a502'),
(10,'2012-08-20' ,110, 'a503'),
(11,'2012-08-21' ,101, 'a510'),
(12,'2012-08-22' ,102, 'a507');
-- add POC index...
CREATE NONCLUSTERED INDEX ix_POC ON #Purchases (CustomerID, PurchaseDate);
--===========================================================
SELECT
c.FirstName,
p2.Daysdifference
FROM
#Customers c
JOIN (
SELECT
p.CustomerID,
Daysdifference = DATEDIFF(DAY, p.PurchaseDate, LEAD(p.PurchaseDate, 1) OVER (PARTITION BY p.CustomerID ORDER BY p.PurchaseDate))
FROM
#Purchases p
) p2
ON c.CustomerID = p2.CustomerID
WHERE
p2.Daysdifference >= 5;
Results...
FirstName Daysdifference
------------------------------ --------------
Alex 7
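For readers who want to check the LEAD logic by hand, here is a minimal Python sketch of the same idea: sort each customer's purchases by date and compare every purchase with the next one. The function name `gaps_of_at_least` and the tuple layout are assumptions made for this sketch only.

```python
from datetime import date
from itertools import groupby

# (PurchaseId, PurchaseDate, CustomerID) from the question's Purchases table.
purchases = [
    (1, date(2012, 8, 15), 105), (2, date(2012, 8, 15), 102),
    (3, date(2012, 8, 15), 103), (4, date(2012, 8, 16), 105),
    (5, date(2012, 8, 17), 106), (6, date(2012, 8, 17), 107),
    (7, date(2012, 8, 18), 108), (8, date(2012, 8, 19), 108),
    (9, date(2012, 8, 19), 109), (10, date(2012, 8, 20), 110),
    (11, date(2012, 8, 21), 101), (12, date(2012, 8, 22), 102),
]
first_names = {102: "Alex"}  # only the name needed to show the result


def gaps_of_at_least(rows, min_days):
    """Return (CustomerID, days) for each pair of consecutive purchases at least min_days apart."""
    ordered = sorted(rows, key=lambda r: (r[2], r[1]))  # PARTITION BY customer, ORDER BY date
    out = []
    for customer_id, group in groupby(ordered, key=lambda r: r[2]):
        dates = [r[1] for r in group]
        # zip(dates, dates[1:]) pairs each purchase with the following one, like LEAD(..., 1).
        for current, following in zip(dates, dates[1:]):
            gap = (following - current).days
            if gap >= min_days:
                out.append((customer_id, gap))
    return out


result = [(first_names[c], days) for c, days in gaps_of_at_least(purchases, 5)]
```

Customers with a single purchase never produce a pair, which mirrors LEAD returning NULL for the last row of each partition.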

SQL Server query to see what changed from month to month

I am struggling with developing a query to compare changes in a single table from month to month, example data -
+-----------------------------------------------------------+
| TaxGroupDetails |
+-----------+--+----------+--+-----------+--+---------------+
| Tax Group | | Tax Type | | Geocode | | EffectiveDate |
+-----------+--+----------+--+-----------+--+---------------+
| 2001 | | 1D | | 440011111 | | 1120531 |
| 2001 | | X1 | | 440011111 | | 1120531 |
| 2001 | | D3 | | 440011111 | | 1120531 |
| 2001 | | DGH | | 440011111 | | 1120531 |
| 2001 | | 1D | | 440011111 | | 1130101 |
| 2001 | | X1 | | 440011111 | | 1130101 |
| 2001 | | D3 | | 440011111 | | 1130101 |
| 2001 | | 1D | | 440011111 | | 1140201 |
| 2001 | | X1 | | 440011111 | | 1140201 |
| 2001 | | D3 | | 440011111 | | 1140201 |
| 2001 | | Z9 | | 440011111 | | 1140201 |
+-----------+--+----------+--+-----------+--+---------------+
I want to see the changes in the table, what was added or removed from a taxgroup, between the top two effective dates.
The results I am trying to obtain based on the sample data would be Z9 (added) if I was running the query in February (1140201) of this year.
If I was running the query in January (1130101) of last year I would expect to see DGH (removed)
I would expect two separate queries, one to show what was added and another to show what was removed.
I have tried multiple avenues to come up with these two queries but can't seem to obtain the correct results. Can anyone point me in the right direction?
-- CURRENT is a reserved keyword in T-SQL, so the aliases are Curr and Prev
-- Rows present in the current period but not in the previous one
SELECT
Curr.TaxGroup,
Curr.TaxType,
Curr.GeoCode,
'Added' AS ChangeType
FROM
TaxGroupDetails AS Curr
WHERE
Curr.EffectiveDate = @CurrentPeriod AND
NOT EXISTS
(
SELECT *
FROM TaxGroupDetails AS Prev
WHERE
Prev.EffectiveDate = @PreviousPeriod AND
Curr.TaxGroup = Prev.TaxGroup AND
Curr.TaxType = Prev.TaxType AND
Curr.GeoCode = Prev.GeoCode
)
UNION ALL
-- Rows present in the previous period but not in the current one
SELECT
Prev.TaxGroup,
Prev.TaxType,
Prev.GeoCode,
'Removed' AS ChangeType
FROM
TaxGroupDetails AS Prev
WHERE
Prev.EffectiveDate = @PreviousPeriod AND
NOT EXISTS
(
SELECT *
FROM TaxGroupDetails AS Curr
WHERE
Curr.EffectiveDate = @CurrentPeriod AND
Curr.TaxGroup = Prev.TaxGroup AND
Curr.TaxType = Prev.TaxType AND
Curr.GeoCode = Prev.GeoCode
)
As you say you need two queries, one to select each of the two groups of data you want to compare.
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1120531
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1140201
You then need to join these two together using some form of key, the combination of tax group and tax type seems sensible here.
SELECT *
FROM
(
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1120531
) AS FirstGroup
FULL OUTER JOIN
(
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1140201
) AS SecondGroup
ON FirstGroup.[Tax Group] = SecondGroup.[Tax Group]
AND FirstGroup.[Tax Type] = SecondGroup.[Tax Type]
The FULL OUTER JOIN here tells SQL to include the remaining row when the other doesn't exist.
Finally let's tidy up and order the columns and not use a *:
SELECT COALESCE(FirstGroup.[Tax Group], SecondGroup.[Tax Group]),
COALESCE(FirstGroup.[Tax Type], SecondGroup.[Tax Type]),
FirstGroup.Geocode, SecondGroup.Geocode,
FirstGroup.EffectiveDate, SecondGroup.EffectiveDate
FROM
.
.
.
COALESCE removes the NULLs from the first matched columns, and as we are saying these must be equal there is no point showing both copies.
The set-based solution: take the difference between the whole table and the whole table with all dates projected forward by one time interval. That will eliminate all rows except the ones with "new" codes.
SELECT
[TaxGroup],
[Tax Type],
[EffectiveDate]
FROM TaxGroupDetails t
EXCEPT
SELECT
[TaxGroup],
[Tax Type],
( SELECT MIN([EffectiveDate])
FROM TaxGroupDetails
WHERE [EffectiveDate] > t.[EffectiveDate]
AND [TaxGroup] = t.[TaxGroup]
)
FROM TaxGroupDetails t
To see what got deleted, project backwards instead. Change the subquery to:
SELECT MAX([EffectiveDate])
FROM TaxGroupDetails
WHERE [EffectiveDate] < t.[EffectiveDate]
AND [TaxGroup] = t.[TaxGroup]
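The projection idea may be easier to follow with plain sets. The Python sketch below is an illustration under assumptions, not a translation of the exact query: `changed_codes` is a hypothetical helper, rows are reduced to (tax group, tax type, effective date) tuples, and rows whose projected date would be NULL are simply skipped.

```python
# (Tax Group, Tax Type, EffectiveDate) from the question's sample data.
rows = {
    (2001, "1D", 1120531), (2001, "X1", 1120531),
    (2001, "D3", 1120531), (2001, "DGH", 1120531),
    (2001, "1D", 1130101), (2001, "X1", 1130101), (2001, "D3", 1130101),
    (2001, "1D", 1140201), (2001, "X1", 1140201),
    (2001, "D3", 1140201), (2001, "Z9", 1140201),
}


def changed_codes(table, forward=True):
    """Rows left after subtracting the table shifted by one effective date.

    forward=True shifts each row to the next date (survivors are additions);
    forward=False shifts each row to the previous date (survivors are removals).
    """
    dates_by_group = {}
    for group, _, eff_date in table:
        dates_by_group.setdefault(group, set()).add(eff_date)

    projected = set()
    for group, tax_type, eff_date in table:
        if forward:
            candidates = [d for d in dates_by_group[group] if d > eff_date]
            target = min(candidates) if candidates else None
        else:
            candidates = [d for d in dates_by_group[group] if d < eff_date]
            target = max(candidates) if candidates else None
        if target is not None:  # no neighbouring date: nothing to project
            projected.add((group, tax_type, target))
    return table - projected  # the EXCEPT step


added = changed_codes(rows, forward=True)
removed = changed_codes(rows, forward=False)
```

Note that every row of the earliest date counts as added and every row of the latest date counts as removed, because there is no neighbouring period to compare against; filtering on the effective date of interest narrows the result to Z9 (added on 1140201) or DGH (last present on 1120531).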
If you have SQL Server 2005 or later (ROW_NUMBER is available from that version on):
WITH t AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY [TaxGroup], [Tax Type] ORDER BY [EffectiveDate] ASC) rownum
FROM TaxGroupDetails
)
SELECT *
FROM t
WHERE rownum = 1
AND [EffectiveDate] = @Date
To get the other query, change ASC to DESC
Try this / you could start from this [partial] solution:
DECLARE @MyTable TABLE (
ID INT IDENTITY PRIMARY KEY,
[Tax Group] SMALLINT NOT NULL,
[Tax Type] VARCHAR(3) NOT NULL,
[Geocode] INT NOT NULL,
[EffectiveDate] INT NOT NULL
);
INSERT @MyTable
SELECT 2001, '1D', 440011111, 1120531
UNION ALL SELECT 2001, 'X1', 440011111, 1120531
UNION ALL SELECT 2001, 'D3', 440011111, 1120531
UNION ALL SELECT 2001, 'DGH', 440011111, 1120531
UNION ALL SELECT 2001, '1D', 440011111, 1130101
UNION ALL SELECT 2001, 'X1', 440011111, 1130101
UNION ALL SELECT 2001, 'D3', 440011111, 1130101
UNION ALL SELECT 2001, '1D', 440011111, 1140201
UNION ALL SELECT 2001, 'X1', 440011111, 1140201
UNION ALL SELECT 2001, 'D3', 440011111, 1140201
UNION ALL SELECT 2001, 'Z9', 440011111, 1140201;
DECLARE @Results TABLE (
ID INT NOT NULL,
Rnk INT NOT NULL,
EffectiveYear SMALLINT NOT NULL,
PRIMARY KEY (Rnk, EffectiveYear)
);
-- Rnk identifies a (Tax Group, Tax Type, Geocode) combination; EffectiveYear is the date without month and day
INSERT @Results
SELECT x.ID,
DENSE_RANK() OVER(ORDER BY x.[Tax Group], x.[Tax Type], x.[Geocode]) AS Rnk,
x.EffectiveDate / 10000 AS EffectiveYear
FROM @MyTable x;
SELECT
crt.*,
prev.*,
CASE
WHEN crt.ID IS NOT NULL AND prev.ID IS NOT NULL THEN '-' -- No change
WHEN crt.ID IS NULL AND prev.ID IS NOT NULL THEN 'D' -- Deleted
WHEN crt.ID IS NOT NULL AND prev.ID IS NULL THEN 'I' -- Inserted
END AS RowStatus
FROM @Results crt FULL OUTER JOIN @Results prev ON crt.Rnk = prev.Rnk
AND crt.EffectiveYear - 1 = prev.EffectiveYear
ORDER BY ISNULL(crt.EffectiveYear - 1, prev.EffectiveYear), crt.Rnk;
Sample output:
---- ---- ------------- ---- ---- -------------
| Current data | | Previous data |
---- ---- ------------- ---- ---- ------------- ---------
ID Rnk EffectiveYear ID Rnk EffectiveYear RowStatus
---- ---- ------------- ---- ---- ------------- ---------
1 1 112 NULL NULL NULL I -- Current vs. previous: current row hasn't a previous row
3 2 112 NULL NULL NULL I -- the same thing
4 3 112 NULL NULL NULL I -- the same thing
2 4 112 NULL NULL NULL I -- the same thing
NULL NULL NULL 4 3 112 D <-- Deleted: ID 4 = 'DGH'
5 1 113 1 1 112 - -- there is no change
7 2 113 3 2 112 -
6 4 113 2 4 112 -
8 1 114 5 1 113 -
10 2 114 7 2 113 -
9 4 114 6 4 113 -
11 5 114 NULL NULL NULL I <-- Inserted: ID 11 = 'Z9'
NULL NULL NULL 8 1 114 D
NULL NULL NULL 10 2 114 D
NULL NULL NULL 9 4 114 D
NULL NULL NULL 11 5 114 D
Note: I assume that there are no duplicated rows (x.[Tax Group], x.[Tax Type], x.[Geocode]) within a year.

Resources