I can't find any helpful resources online for what I have in mind. I have a table that holds one transaction amount per client. This amount is either zero or below zero. In another table I have invoices with a total amount per invoice per client. For all the transaction records that are below zero, I want to calculate which invoices are still (partially) open.
Imagine client A has a transaction amount of -300 and client B has a transaction amount of -100. So there is a table with the fields client_id and transaction_amount. Then take a look at the table below (taken from an Excel sheet).
client  invoice      amount  aggregated amount from last invoice up  status
A       invoice 101  -25     -575                                    closed
A       invoice 102  -100    -550                                    closed
A       invoice 103  -100    -450                                    closed
A       invoice 104  -75     -350                                    partial open
A       invoice 105  -25     -275                                    fully open
A       invoice 106  -150    -250                                    fully open
A       invoice 107  -25     -100                                    fully open
A       invoice 108  -75     -75                                     fully open
B       invoice 201  -25     -125                                    closed
B       invoice 202  -50     -100                                    fully open
B       invoice 203  -25     -50                                     fully open
B       invoice 204  -25     -25                                     fully open
So when I start the calculation for client A at invoice 108 and work my way up by aggregating the amounts, I see that invoices 105 through 108 are fully open and invoice 104 is partially open. For client B, invoices 204 through 202 are open. I then want a query result that shows me those invoices for clients A and B. Just so you know, the real dataset has lots of clients and lots of invoices.
In an extended version of what I would like to see, you not only see that invoice 104 is partially paid, but also that an amount of 50 was paid and 25 remains open.
I think I need some kind of reversed loop? I've tried a few things with a @transaction variable and subtracting the amounts, but it either keeps on running or returns the same invoice number over and over again. The result should be something like the table below.
client  invoice      amount  amount open in transaction "debt amount"
A       invoice 104  -75     -25
A       invoice 105  -25     -25
A       invoice 106  -150    -150
A       invoice 107  -25     -25
A       invoice 108  -75     -75
B       invoice 202  -50     -50
B       invoice 203  -25     -25
B       invoice 204  -25     -25
Note on invoice 104: -50 isn't in the transaction "debt amount": -75 + -25 + -150 + -25 + -25 = -300. That's why it's a partial in the status field.
I can't imagine that I'm the first to want to do this, so if anyone has a link to documentation on how to do this or is able to help me in this topic, it would be much appreciated.
Answer provided by SOS with some help from shawnt00. Thanks. See the dbfiddle result for the code.
WITH cte AS (
    SELECT client
         , invoice
         , amount
         , status
         , [debt amount]
         , SUM(Amount) OVER(PARTITION BY client ORDER BY invoice DESC) AS [aggregated amount from last invoice up]
         , SUM(Amount) OVER(PARTITION BY client ORDER BY invoice DESC)
             + ABS([debt amount]) AS RemainingDebt
    FROM YourTable
)
SELECT client
     , invoice
     , amount
     , [aggregated amount from last invoice up]
     , CASE WHEN RemainingDebt >= 0 THEN 'fully open'
            WHEN ABS(RemainingDebt) < ABS(Amount) THEN 'partial open'
            WHEN ABS(RemainingDebt) >= ABS(Amount) THEN 'closed'
       END AS status
FROM cte
ORDER BY client, invoice
Results:
client  invoice      amount  aggregated amount from last invoice up  status
A       invoice 101  -25     -575                                    closed
A       invoice 102  -100    -550                                    closed
A       invoice 103  -100    -450                                    closed
A       invoice 104  -75     -350                                    partial open
A       invoice 105  -25     -275                                    fully open
A       invoice 106  -150    -250                                    fully open
A       invoice 107  -25     -100                                    fully open
A       invoice 108  -75     -75                                     fully open
B       invoice 201  -25     -125                                    closed
B       invoice 202  -50     -100                                    fully open
B       invoice 203  -25     -50                                     fully open
B       invoice 204  -25     -25                                     fully open
db<>fiddle here
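The "extended version" from the question (how much of each invoice is still open) is not part of the query above. A minimal sketch of one way to get it follows, reusing the same running-sum idea. It runs the SQL through Python's sqlite3 module purely so it can be executed without a SQL Server instance; the table names invoices and debts, the column debt_amount, and the integer invoice numbers are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (client TEXT, invoice INTEGER, amount INTEGER);
CREATE TABLE debts (client TEXT, debt_amount INTEGER);  -- hypothetical names
""")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [
    ("A", 101, -25), ("A", 102, -100), ("A", 103, -100), ("A", 104, -75),
    ("A", 105, -25), ("A", 106, -150), ("A", 107, -25), ("A", 108, -75),
    ("B", 201, -25), ("B", 202, -50), ("B", 203, -25), ("B", 204, -25),
])
conn.executemany("INSERT INTO debts VALUES (?, ?)", [("A", -300), ("B", -100)])

OPEN_INVOICES_SQL = """
WITH cte AS (
    SELECT i.client, i.invoice, i.amount,
           -- running total from the newest invoice backwards, offset by the debt
           SUM(i.amount) OVER (PARTITION BY i.client ORDER BY i.invoice DESC)
             + ABS(d.debt_amount) AS remaining_debt
    FROM invoices AS i
    JOIN debts AS d ON d.client = i.client
)
SELECT client, invoice, amount,
       CASE WHEN remaining_debt >= 0 THEN amount    -- fully open
            ELSE amount - remaining_debt            -- partially open
       END AS open_amount
FROM cte
WHERE remaining_debt > amount                       -- drops closed invoices
ORDER BY client, invoice
"""

open_rows = conn.execute(OPEN_INVOICES_SQL).fetchall()
for row in open_rows:
    print(row)
```

With the sample data this lists invoices 104 to 108 for client A (invoice 104 with an open amount of -25) and 202 to 204 for client B. The window expressions themselves are standard SQL, so the query should carry over to T-SQL with little more than bracketed column names, but that has not been verified against the original fiddle.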
Table 1
Amount
10
20
25
40
50
60
70
80
90
100
110
120
130
Write an SQL query to get the following output:
07/11/2018 10
07/12/2018 20
07/13/2018 25 55
07/14/2018 40 85
07/15/2018 50 115
07/16/2018 60 150
07/17/2018 70 180
07/18/2018 80 210
07/19/2018 90 240
07/20/2018 100 270
07/21/2018 110 300
07/22/2018 120 330
07/23/2018 130 360
So for each date I want to add up the amounts of the last 3 days (the current row and the two rows before it) and show that sum.
The LAG window function is what you need. It allows you to access the value in a column from a previous row. The format is LAG([Column], [Row Offset], [Default]), so this example adds the value from the current row to the value 2 rows back and the value 1 row back.
DECLARE @t TABLE (dt DATE, c INT)
INSERT INTO @t VALUES
    ('07/11/2018',10),
    ('07/12/2018',20),
    ('07/13/2018',25),
    ('07/14/2018',40),
    ('07/15/2018',50),
    ('07/16/2018',60),
    ('07/17/2018',70),
    ('07/18/2018',80),
    ('07/19/2018',90),
    ('07/20/2018',100),
    ('07/21/2018',110),
    ('07/22/2018',120),
    ('07/23/2018',130)
-- Current row plus the values 2 rows and 1 row back.
-- LAG returns NULL when no such row exists, so the first two sums are NULL.
SELECT dt, c,
    c + LAG(c, 2) OVER (ORDER BY dt) + LAG(c, 1) OVER (ORDER BY dt)
FROM @t
ORDER BY dt
Returns:
dt c
2018-07-11 10 NULL
2018-07-12 20 NULL
2018-07-13 25 55
2018-07-14 40 85
2018-07-15 50 115
2018-07-16 60 150
2018-07-17 70 180
2018-07-18 80 210
2018-07-19 90 240
2018-07-20 100 270
2018-07-21 110 300
2018-07-22 120 330
2018-07-23 130 360
I'm not going to give you a full answer here, as you've not responded to my comments. Thus I'm going to give you a partial answer, so that you can work out how to do this yourself.
When working with SUM, you also have access to the OVER clause. In 2012+ (which I assume you're using, as 2008 is effectively out of support now, and anything before is completely unsupported) you have access to the ROWS BETWEEN clause in OVER.
For example:
WITH N AS (
SELECT *
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) V(N)),
Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1
CROSS JOIN N N2)
SELECT I,
SUM(I) OVER (ORDER BY I
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS SomeSum
FROM Tally
ORDER BY I;
This example calculates the SUM of the current row, and the 3 following rows afterwards. So, for I = 1, that's SUM(1,2,3,4) = 10.
This can all be found in the documentation: SUM (Transact-SQL). SELECT - OVER Clause (Transact-SQL)
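Applied to the question's data, the ROWS BETWEEN idea might look like the sketch below. It is run through Python's sqlite3 module only so that it is executable as-is; the table name t and the columns dt and c are borrowed from the LAG answer, and the extra COUNT(*) check is an assumption about the desired output (blank until three rows are available, as in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, c INTEGER)")
amounts = [10, 20, 25, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(f"2018-07-{11 + i:02d}", amount) for i, amount in enumerate(amounts)],
)

ROLLING_SQL = """
SELECT dt, c,
       -- only report a sum once the frame really holds three rows
       CASE WHEN COUNT(*) OVER (ORDER BY dt
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
            THEN SUM(c) OVER (ORDER BY dt
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
       END AS last3
FROM t
ORDER BY dt
"""

rolling = conn.execute(ROLLING_SQL).fetchall()
for row in rolling:
    print(row)
```

Without the CASE wrapper the first two rows would show 10 and 30 (the sum of whatever rows exist so far) instead of NULL, which may or may not be what is wanted.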
I have a T-SQL Quotes table and need to be able to count how many quotes were in an open status during past months.
The dates I have to work with are an 'Add_Date' timestamp and an 'Update_Date' timestamp. Once a quote is put into a 'Won' or 'Loss' columns with a value of '1' in that column it can no longer be updated. Therefore, the 'Update_Date' effectively becomes the Closed_Status timestamp.
Here's a few example records:
Quote_No Add_Date Update_Date Open_Quote Win Loss
001 01-01-2016 NULL 1 0 0
002 01-01-2016 3-1-2016 0 1 0
003 01-01-2016 4-1-2016 0 0 1
Here's a link to all the data here:
https://drive.google.com/open?id=0B4xdnV0LFZI1T3IxQ2ZKRDhNd1k
I asked this question earlier this year and have been using the following code:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
)
select format(dateadd(month, n.n, q.add_date), 'yyyy-MM') as yyyymm,
count(*) as Open_Quote_Count
from quotes q join
n
on (closed_status = 1 and dateadd(month, n.n, q.add_date) <= q.update_date) or
(closed_status = 0 and dateadd(month, n.n, q.add_date) <= getdate())
group by format(dateadd(month, n.n, q.add_date), 'yyyy-MM')
order by yyyymm;
The problem is that this code returns a cumulative value. So January was fine, but then Feb is really Jan + Feb, March is Jan + Feb + March, and so on. It took me a while to discover this; the numbers returned are now way, way off and I'm trying to correct them.
From the full data set the results of this code are:
Year-Month Open_Quote_Count
2017-01 153
2017-02 265
2017-03 375
2017-04 446
2017-05 496
2017-06 560
2017-07 609
The desired result would be how many quotes were in an open status during that particular month, not the cumulative :
Year-Month Open_Quote_Count
2017-01 153
2017-02 112
2017-03 110
2017-04 71
Thank you in advance for your help!
Unless I am missing something, LAG() would be a good fit here
Example
Declare @YourTable Table ([Year-Month] varchar(50),[Open_Quote_Count] int)
Insert Into @YourTable Values
 ('2017-01',153)
,('2017-02',265)
,('2017-03',375)
,('2017-04',446)
,('2017-05',496)
,('2017-06',560)
,('2017-07',609)
-- Subtract the previous month's cumulative count (0 for the first month)
Select *
      ,NewValue = [Open_Quote_Count] - lag([Open_Quote_Count],1,0) over (Order by [Year-Month])
 From @YourTable --<< Replace with your initial query
Returns
Year-Month Open_Quote_Count NewValue
2017-01 153 153
2017-02 265 112
2017-03 375 110
2017-04 446 71
2017-05 496 50
2017-06 560 64
2017-07 609 49
I'm trying to create a table that will support a simple event study analysis, but I'm not sure how best to approach this.
I'd like to create a table with the following columns: Customer, Date, Time on website, Outcome. I'm testing the premise that the outcome for a particular customer on any given day is a function of the time spent on the website on the current day as well as on the preceding five site visits. I'm envisioning a table similar to this:
I'm hoping to write a T-SQL query that will produce an output like this:
Given this objective, here are my questions:
Assuming this is indeed possible, how should I structure my table to accomplish this objective? Is there a need for a column that refers to the prior visit? Do I need to add an index to a particular column?
Would this be considered a recursive query?
Given the appropriate table structure, what would the query look like?
Is it possible to structure the query with a variable that determines the number of prior periods to include in addition to the current period (for example, if I want to compare 5 periods to 3 periods)?
I'm not sure I understand the analytic value of your matrix, but here is one way to build it:
Declare @Table table (id int,VisitDate date,VisitTime int,Outcome varchar(25))
Insert Into @Table (id,VisitDate,VisitTime,Outcome) values
(123,'2015-12-01',100,'P'),
(123,'2016-01-01',101,'P'),
(123,'2016-02-01',102,'N'),
(123,'2016-03-01',100,'P'),
(123,'2016-04-01', 99,'N'),
(123,'2016-04-09', 98,'P'),
(123,'2016-05-09', 99,'P'),
(123,'2016-05-14',100,'N'),
(123,'2016-06-13', 99,'P'),
(123,'2016-06-15', 98,'P')
-- Rows are ordered newest first, so Lead() looks at earlier visits; 0 is the default when there is none
Select *
      ,T0 = VisitTime
      ,T1 = Lead(VisitTime,1,0) over(Partition By ID Order By ID,VisitDate Desc)
      ,T2 = Lead(VisitTime,2,0) over(Partition By ID Order By ID,VisitDate Desc)
      ,T3 = Lead(VisitTime,3,0) over(Partition By ID Order By ID,VisitDate Desc)
      ,T4 = Lead(VisitTime,4,0) over(Partition By ID Order By ID,VisitDate Desc)
      ,T5 = Lead(VisitTime,5,0) over(Partition By ID Order By ID,VisitDate Desc)
 From @Table
 Order By ID,VisitDate Desc
Returns
id VisitDate VisitTime Outcome T0 T1 T2 T3 T4 T5
123 2016-06-15 98 P 98 99 100 99 98 99
123 2016-06-13 99 P 99 100 99 98 99 100
123 2016-05-14 100 N 100 99 98 99 100 102
123 2016-05-09 99 P 99 98 99 100 102 101
123 2016-04-09 98 P 98 99 100 102 101 100
123 2016-04-01 99 N 99 100 102 101 100 0
123 2016-03-01 100 P 100 102 101 100 0 0
123 2016-02-01 102 N 102 101 100 0 0 0
123 2016-01-01 101 P 101 100 0 0 0 0
123 2015-12-01 100 P 100 0 0 0 0 0
With a fixed number of columns you can do it with lag. Note that the window has to be ordered by date ascending for lag to return the earlier visits (ordered descending, lag would return the later ones):
select
    time,
    lag(time, 1) over (partition by customer order by date),
    lag(time, 2) over (partition by customer order by date),
    lag(time, 3) over (partition by customer order by date),
    lag(time, 4) over (partition by customer order by date)
from
    yourtable
If you need dynamic columns, then you'll have to build it using dynamic SQL.
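As a rough illustration of the dynamic route, the sketch below builds the LAG column list from a parameter and runs it against the sample data from the other answer. It uses Python's sqlite3 module purely to stay runnable; the table name visits and the helper prior_visits_sql are hypothetical, and in T-SQL the same string would have to be assembled and passed to sp_executesql instead.

```python
import sqlite3

def prior_visits_sql(n_prior: int) -> str:
    """Build a query with T0 (current visit) plus n_prior LAG columns."""
    if not isinstance(n_prior, int) or n_prior < 0:
        raise ValueError("n_prior must be a non-negative integer")
    columns = ["VisitTime AS T0"]
    for k in range(1, n_prior + 1):
        # ascending order, so LAG reaches back to earlier visits; 0 when none
        columns.append(
            f"LAG(VisitTime, {k}, 0) OVER "
            f"(PARTITION BY id ORDER BY VisitDate) AS T{k}"
        )
    return (
        "SELECT id, VisitDate, Outcome, " + ", ".join(columns)
        + " FROM visits ORDER BY id, VisitDate DESC"
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER, VisitDate TEXT, VisitTime INTEGER, Outcome TEXT)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", [
    (123, "2015-12-01", 100, "P"), (123, "2016-01-01", 101, "P"),
    (123, "2016-02-01", 102, "N"), (123, "2016-03-01", 100, "P"),
    (123, "2016-04-01", 99, "N"), (123, "2016-04-09", 98, "P"),
    (123, "2016-05-09", 99, "P"), (123, "2016-05-14", 100, "N"),
    (123, "2016-06-13", 99, "P"), (123, "2016-06-15", 98, "P"),
])

three_back = conn.execute(prior_visits_sql(3)).fetchall()
five_back = conn.execute(prior_visits_sql(5)).fetchall()
print(five_back[0])
```

Because only a validated integer is interpolated into the SQL text, the string building does not open the door to injection; column or table names coming from user input would need stricter handling.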
We have a table that contains, for this example, links to demographic questions (questionID) for each subscriber, with a date indicating when the subscriber answered a particular demographic question. In some cases, a subscriber may have answered the same question again at a later date, and we now have multiple records for the same subscriber and questionID, but with different answer dates (see sample data):
subscriberID questionID dateAnswered isDeleted
------------ ----------- ----------------------- ---------
100 559 2015-07-29 13:07:26.153 0
100 560 2015-07-29 13:07:26.153 0
100 561 2015-07-29 13:07:26.153 0
100 562 2015-07-29 13:07:26.153 0
100 575 2015-07-29 13:07:26.153 0
102 559 2015-07-30 15:12:46.143 0
102 564 2015-07-30 15:12:46.143 0
102 588 2015-07-30 15:12:46.143 0
102 559 2015-07-31 16:11:53.323 0
114 575 2015-08-21 11:27:14.253 0
114 588 2015-08-21 11:27:14.253 0
114 560 2015-08-21 11:27:14.253 0
114 588 2015-08-24 05:44:42.030 0
114 562 2015-08-21 11:27:14.253 0
114 575 2015-08-24 05:44:42.030 0
The app that was storing the answers should have flagged the older records as "deleted" (set isDeleted = 1) but it did not do so, and I now need to clean up the older records.
This seems like it should be simple, but it's got me stumped. How do I (a) select any records where there are duplicate subscriberID and questionIDs but with different answer dates? And (b) how do I do an update to set all but the newest records for each subscriber to have isDeleted=1?
Any help would be appreciated! I suspect a self join may be in order, but I haven't figured it out yet. Thus the question!
;WITH X AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY
subscriberID, questionID
ORDER BY dateAnswered DESC) rn
, *
FROM TableName
)
UPDATE X
SET isDeleted = 1
WHERE rn > 1
Just another approach: the update below affects all records that are not yet marked as deleted, except for the latest record per subscriber and question.
WITH LastAnswers AS
(
SELECT subscriberID ,questionID , MAX(dateAnswered) AS LastAnsweredDate
FROM TableName
GROUP BY subscriberID ,questionID
)
UPDATE TableName
SET TableName.isDeleted = 1
FROM
TableName
LEFT JOIN LastAnswers
ON TableName.subscriberID = LastAnswers.subscriberID
AND TableName.questionID = LastAnswers.questionID
AND TableName.dateAnswered = LastAnswers.LastAnsweredDate
WHERE LastAnswers.LastAnsweredDate IS NULL AND TableName.isDeleted = 0
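For part (a) of the question, listing the duplicates before touching them, a windowed COUNT is one option. The sketch below shows that select together with a ROW_NUMBER-based flagging step on the question's sample rows. It runs through Python's sqlite3 module so it is executable without SQL Server; the table name answers is an assumption, and because SQLite cannot update through a CTE the way T-SQL can, the update is expressed with rowid instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE answers (
    subscriberID INTEGER, questionID INTEGER,
    dateAnswered TEXT, isDeleted INTEGER DEFAULT 0)
""")
first, second = "2015-07-29 13:07:26.153", "2015-07-30 15:12:46.143"
third, fourth = "2015-08-21 11:27:14.253", "2015-08-24 05:44:42.030"
conn.executemany(
    "INSERT INTO answers (subscriberID, questionID, dateAnswered) VALUES (?, ?, ?)",
    [(100, 559, first), (100, 560, first), (100, 561, first),
     (100, 562, first), (100, 575, first),
     (102, 559, second), (102, 564, second), (102, 588, second),
     (102, 559, "2015-07-31 16:11:53.323"),
     (114, 575, third), (114, 588, third), (114, 560, third),
     (114, 588, fourth), (114, 562, third), (114, 575, fourth)],
)

# (a) every row that shares subscriberID + questionID with another row
dupes = conn.execute("""
SELECT subscriberID, questionID, dateAnswered
FROM (SELECT *, COUNT(*) OVER (PARTITION BY subscriberID, questionID) AS n
      FROM answers)
WHERE n > 1
ORDER BY subscriberID, questionID, dateAnswered
""").fetchall()

# (b) flag everything except the newest row of each group
conn.execute("""
UPDATE answers SET isDeleted = 1
WHERE rowid IN (
    SELECT rid
    FROM (SELECT rowid AS rid,
                 ROW_NUMBER() OVER (PARTITION BY subscriberID, questionID
                                    ORDER BY dateAnswered DESC) AS rn
          FROM answers)
    WHERE rn > 1)
""")
flagged = conn.execute("""
SELECT subscriberID, questionID, dateAnswered
FROM answers WHERE isDeleted = 1
ORDER BY subscriberID, questionID
""").fetchall()
print(dupes)
print(flagged)
```

On the sample data three groups have duplicates (102/559, 114/575 and 114/588), so six rows are listed and the three older ones end up flagged.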
I have a table :
UNIQUE KEY ID Clicks INSERTDATE
1 100001 10 2011-05-14 00:00:00.000
2 100001 20 2011-05-13 00:00:00.000
3 100001 30 2011-05-18 00:00:00.000
4 100002 10 2011-05-20 00:00:00.000
5 100002 15 2011-05-24 00:00:00.000
6 100002 10 2011-05-05 00:00:00.000
I have a threshold value for clicks, let's say 20.
I need to write a T-SQL query that removes the rows that come after the cumulative sum of clicks for each ID has reached the threshold.
So in the example above, ID "100001" has a cumulative click total of 60 (10+20+30), but since the threshold is 20, the last record, i.e. the one with the click value of 30, should be removed from the result.
However, the second record should still be included, even though the sum at that point (10 + 20) is greater than my threshold.
EDIT :
Another major rule that needs to be applied is that the INSERTDATE has to be ordered before performing any calculations
Any help would be much appreciated.
If I understood the question correctly, you'd like to filter on the RunningTotal for a given Id, like so:
select c1.*
from ClickTable c1
outer apply (
select sum(Clicks) as RunningTotal
from ClickTable
where pk < c1.pk
and id = c1.id
) c2
where isnull(RunningTotal, 0) <= 20
This assumes that the table has a unique key column called pk.
Running sample: http://www.sqlfiddle.com/#!3/98173/11
Update
To order by Clicks instead of the primary key, just change the line
where pk < c1.pk
to
where Clicks < c1.Clicks
Running sample: http://www.sqlfiddle.com/#!3/31750/2
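The question's edit asks for the rows to be ordered by INSERTDATE before the running total is taken, which neither variant above does. A possible sketch using a windowed SUM over the preceding rows is shown below, again run through Python's sqlite3 module only to make it executable. The table and column names follow the answer's ClickTable, the threshold is passed as a parameter, and the `<=` comparison simply mirrors the answer above; use `<` instead if a row should be dropped once the earlier rows already add up to exactly the threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ClickTable (
    pk INTEGER PRIMARY KEY, id INTEGER, Clicks INTEGER, InsertDate TEXT)
""")
conn.executemany("INSERT INTO ClickTable VALUES (?, ?, ?, ?)", [
    (1, 100001, 10, "2011-05-14"), (2, 100001, 20, "2011-05-13"),
    (3, 100001, 30, "2011-05-18"), (4, 100002, 10, "2011-05-20"),
    (5, 100002, 15, "2011-05-24"), (6, 100002, 10, "2011-05-05"),
])

THRESHOLD_SQL = """
SELECT pk, id, Clicks, InsertDate
FROM (SELECT *,
             -- clicks accumulated by earlier rows of the same id (0 for the first)
             COALESCE(SUM(Clicks) OVER (
                 PARTITION BY id ORDER BY InsertDate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS PriorTotal
      FROM ClickTable)
WHERE PriorTotal <= ?
ORDER BY id, InsertDate
"""

kept = conn.execute(THRESHOLD_SQL, (20,)).fetchall()
for row in kept:
    print(row)
```

With a threshold of 20 this keeps the rows of 13 and 14 May for ID 100001 and drops the row with 30 clicks. All three rows of ID 100002 are kept, because the earlier rows add up to exactly 20 before the last one.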
I hope I read the question correctly. Seems too simple:
SELECT ID, SUM(Clicks) AS Clicks
FROM t1
WHERE Clicks <= 20 -- <== this is your threshold
GROUP BY ID
Would give you
ID Clicks
100001 30
100002 35