SQL Server: How to get a rolling sum over 3 days for different customers within the same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so a window defined by a fixed number of rows per customer doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions made every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] and [DATE]".

One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2 -- 3-day window: the current day plus the 2 preceding days
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
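To sanity-check the date-window logic outside the database, here is a minimal Python sketch of the same idea: for each row, sum the amounts of that customer's rows dated on the same day or within the two preceding days. The function name `rolling_3_day_sum` and the tuple layout are illustrative assumptions, not part of the answer, and like the apply query it compares every row against every other row.

```python
from datetime import date, timedelta

def rolling_3_day_sum(rows, window_days=3):
    """rows: list of (customer_id, date, amount); returns the rows with the rolling sum appended."""
    result = []
    for cust, day, amount in rows:
        # Earliest date that still falls inside the window ending on `day`.
        start = day - timedelta(days=window_days - 1)
        total = sum(a for c, d, a in rows if c == cust and start <= d <= day)
        result.append((cust, day, amount, total))
    return result

# Sample data from the question.
rows = [
    (1, date(2014, 4, 11), 20),
    (1, date(2014, 4, 13), 10),
    (1, date(2014, 4, 14), 30),
    (1, date(2014, 4, 18), 25),
    (2, date(2014, 5, 15), 15),
    (2, date(2014, 6, 21), 25),
    (2, date(2014, 6, 22), 35),
    (2, date(2014, 6, 23), 10),
]

for row in rolling_3_day_sum(rows):
    print(row)
```

On the sample data the last column comes out as 20, 30, 40, 25, 15, 25, 60, 70, which matches the expected result in the question.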

Related

Write Query That Consider Date Interval

I have a table that contains Transactions of Customers.
I have to find customers that had at least 2 transactions with amount >= 20000 within three consecutive days, checked each month.
For example, if today is 2022/03/12, I should gather the transactions from 2022/02/13 to 2022/03/12, then check whether a customer had at least 2 transactions with amount >= 20000 within three consecutive days.
For example, consider the table below:
Id CustomerId Transactiondate Amount
1 1 2022-01-01 50000
2 2 2022-02-01 20000
3 3 2022-03-05 30000
4 3 2022-03-07 40000
5 2 2022-03-07 20000
6 4 2022-03-07 30000
7 4 2022-03-07 30000
The output should be: CustomerId = 3 and CustomerId = 4.
I wrote a query that finds these customers for one specific day, but I don't know how to find them across a whole month without using a loop.
The query for a specific day is:
With cte as (select customerid, amount, TransactionDate,Dateadd(day,-2,TransactionDate) as PrevDate
From [Transaction] -- TRANSACTION is a reserved word, so the table name is bracketed
Where TransactionDate='2022-03-12')
Select CustomerId,Count(*)
From Cte
Where
TransactionDate>=Prevdate and TransactionDate<=TransactionDate
And Amount>=20000
Group By CustomerId
Having count(*)>=2
Hi, there are many ways to achieve this.
I think the easiest (though maybe not the best for performance) is to use the LAG function:
WITH lagged_days AS (
SELECT
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
,*
FROM [Transaction]
), valid_cust_base as (
SELECT
*
FROM lagged_days
WHERE DATEPART(MONTH, lagged_dt) = DATEPART(MONTH, Transactiondate)
AND ABS(DATEDIFF(day, Transactiondate, lagged_dt)) <= 3 -- ABS: lagged_dt can be earlier or later
AND Amount >= 20000
)
SELECT
CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I create a lagged TransactionDate per customer (I assume that id is incremental). Then I keep only the transactions whose neighbouring transaction is in the same month, with amount >= 20000, and where the date difference between the two transactions is less than 4 days. Finally I select the customers who have more than 1 such transaction.
With LAG, the first value per customer is always missing, but you still need to be able to say that the 1st and 2nd transactions are within 3 days of each other. That's why I replace the first NULL value with LEAD. It doesn't matter whether you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have, for each transaction, the TransactionDate of a neighbouring transaction.
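The pairing rule can also be checked outside SQL. The sketch below is a Python illustration under stated assumptions, not the query above: it keeps only transactions with amount >= 20000, sorts each customer's dates, and flags a customer when two neighbouring qualifying transactions are at most two days apart (my reading of "within three consecutive days"). The name `customers_with_close_pairs` is hypothetical, and the reporting-period and same-month filters are assumed to have been applied to the input beforehand.

```python
from collections import defaultdict
from datetime import date

def customers_with_close_pairs(transactions, min_amount=20000, span_days=3):
    """transactions: iterable of (customer_id, date, amount)."""
    by_customer = defaultdict(list)
    for customer_id, tx_date, amount in transactions:
        if amount >= min_amount:  # filter first, so neighbours are qualifying rows only
            by_customer[customer_id].append(tx_date)

    hits = []
    for customer_id, dates in by_customer.items():
        dates.sort()
        # Two neighbouring dates fit in `span_days` consecutive days
        # when they differ by fewer than `span_days` days.
        if any((b - a).days < span_days for a, b in zip(dates, dates[1:])):
            hits.append(customer_id)
    return sorted(hits)

# Sample data from the question.
transactions = [
    (1, date(2022, 1, 1), 50000),
    (2, date(2022, 2, 1), 20000),
    (3, date(2022, 3, 5), 30000),
    (3, date(2022, 3, 7), 40000),
    (2, date(2022, 3, 7), 20000),
    (4, date(2022, 3, 7), 30000),
    (4, date(2022, 3, 7), 30000),
]

print(customers_with_close_pairs(transactions))  # [3, 4]
```

Filtering on the amount before looking at neighbours means a small transaction in between cannot hide a qualifying pair.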

Choose row that equal to the max value from a query

I want to know who has the most friends in the app I own (based on transactions), where a friend is anyone the user paid or was paid by.
I can't get the query to show only those with the maximum number of friends (there can be one or many, and this can change, so I can't use LIMIT).
;with relationships as
(
select
paid as 'auser',
Member_No as 'afriend'
from Payments$
union all
select
member_no as 'auser',
paid as 'afriend'
from Payments$
),
DistinctRelationships AS (
SELECT DISTINCT *
FROM relationships
)
select
afriend,
count(*) cnt
from DistinctRelationShips
GROUP BY
afriend
order by
count(*) desc
I just can't figure it out. I've tried count, max(count), and where = max; nothing worked.
It's a two-column table, "Member_No" and "Paid": the member pays the money, and "Paid" is the one who received it.
Member_No Paid
14 18
17 1
12 20
12 11
20 8
6 3
2 4
9 20
8 10
5 20
14 16
5 2
12 1
14 10
It's from Excel, but I loaded it into sql-server.
It's just a sample, there are 1000 more rows
It seems like you are massively over-complicating this. There is no need for self-joining.
Just unpivot each row so you have both sides of the relationship, then group by one side and count the distinct values of the other side:
SELECT
-- for just the first then SELECT TOP (1)
-- for all that tie for the top place use SELECT TOP (1) WITH TIES
v.Id,
Relationships = COUNT(DISTINCT v.Other),
TotalTransactions = COUNT(*)
FROM Payments$ p
CROSS APPLY (VALUES
(p.Member_No, p.Paid),
(p.Paid, p.Member_No)
) v(Id, Other)
GROUP BY
v.Id
ORDER BY
COUNT(DISTINCT v.Other) DESC;
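For readers who want to see the unpivot-and-count idea with ties kept, here is a small Python sketch of the same logic. It is an illustration only; `most_connected` is a hypothetical name, and the list holds just the 14 sample rows shown in the question.

```python
from collections import defaultdict

def most_connected(payments):
    """payments: iterable of (member_no, paid).

    Returns (highest distinct-counterpart count, sorted ids that share it).
    """
    friends = defaultdict(set)
    for member_no, paid in payments:
        # Record both directions, like the two VALUES rows in the query.
        friends[member_no].add(paid)
        friends[paid].add(member_no)
    if not friends:
        return 0, []
    top = max(len(others) for others in friends.values())
    return top, sorted(uid for uid, others in friends.items() if len(others) == top)

# Sample rows from the question.
payments = [
    (14, 18), (17, 1), (12, 20), (12, 11), (20, 8), (6, 3), (2, 4),
    (9, 20), (8, 10), (5, 20), (14, 16), (5, 2), (12, 1), (14, 10),
]

print(most_connected(payments))  # (4, [20])
```

Returning every id that shares the maximum plays the same role as TOP (1) WITH TIES.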

How can I do a cumulative total in SQL Server?

Company_Name Amount Cumulative Total
---------------------------------------------
Company 6 100 100
Company 6 200 300
Company 6 150 450
Company 7 700 700
Company 7 1100 1800
Company 7 500 2300
How can I do cumulative sum group by company as shown in this example?
First, you need a column that specifies the ordering, because SQL tables represent unordered sets. Let me assume you have such a column.
Then the function is sum() as a window function:
select t.*,
sum(amount) over (partition by company order by <ordering col>)
from t;
Note: This does not return 0 for the "first" row for each company, so it really is a cumulative sum. For your logic, you need an additional conditional:
select t.*,
(case when row_number() over (partition by company order by <ordering col>) = 1
then 0
else sum(amount) over (partition by company order by <ordering col>)
end)
from t;
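As a cross-check of what the windowed sum produces, here is a short Python sketch of a per-company running total. It assumes the rows already arrive in the desired order with each company's rows contiguous (the role played by the ordering column in SQL); `cumulative_by_company` is an illustrative name.

```python
from itertools import accumulate, groupby
from operator import itemgetter

def cumulative_by_company(rows):
    """rows: list of (company, amount), ordered, with each company's rows adjacent."""
    out = []
    # groupby only merges adjacent rows, hence the contiguity assumption.
    for company, group in groupby(rows, key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        # accumulate yields the running total, restarting for each company.
        for amount, running in zip(amounts, accumulate(amounts)):
            out.append((company, amount, running))
    return out

# Sample data from the question.
rows = [
    ("Company 6", 100), ("Company 6", 200), ("Company 6", 150),
    ("Company 7", 700), ("Company 7", 1100), ("Company 7", 500),
]

for row in cumulative_by_company(rows):
    print(row)
```

On the sample data the running totals are 100, 300, 450 for Company 6 and 700, 1800, 2300 for Company 7.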

Find and replace rows with similar value in one column in Oracle SQL

I want to find the rows which are similar to each other, and replace them with a new row. My table looks like this:
OrderID | Price | Minimum Number | Maximum Number | Volume
1 45 2 10 250
2 46 2 10 250
3 60 2 10 250
"Similar" in this context means rows that have the same Maximum Number, Minimum Number, and Volume. Prices can differ, but by at most 2.
In this example, the orders with OrderID 1 and 2 are similar, but 3 is not (even though it has the same Minimum Number, Maximum Number, and Volume, its price is not within 2 units of orders 1 and 2).
Then, I want orders 1 and 2 to be replaced by a new order, say OrderID 4, which has the same Minimum Number and Maximum Number. Its Volume has to be the sum of the volumes of the orders it replaces. Its Price can be the Price of any of the replaced orders (45 or 46 in this example). So, the output for the example above would be:
OrderID | Price | Minimum Number | Maximum Number | Volume
4 45 2 10 500
3 60 2 10 250
Here is a way to do this in SQL Server 2012 or Oracle. The idea is to use lag() to find where groups should begin and end and then aggregate.
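select min(OrderID) as OrderID, min(price) as price, MinimumNumber, MaximumNumber, sum(Volume) as Volume
from (select t.*,
-- running count of "gap > 2" breaks gives each run of close prices its own group number
sum(case when prev_price < price - 2 then 1 else 0 end) over
(partition by MinimumNumber, MaximumNumber, Volume order by price) as grp
from (select t.*,
lag(price) over (partition by MinimumNumber, MaximumNumber, Volume
order by price
) as prev_price
from orders t -- "orders" stands in for your table name
) t
) t
-- group by the partition keys plus grp (not by price, which would split similar rows apart)
group by grp, MinimumNumber, MaximumNumber, Volume;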
The only issue is the setting of the id. I'm not sure what the exact rule is for that.
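The lag-then-aggregate idea can be sketched in Python to make the grouping rule concrete. This is an illustration under assumptions: the dictionary keys and the names `collapse` and `merge_similar_orders` are hypothetical, the merged row reuses the smallest OrderID because the rule for new ids is not specified, and prices are chained (45, 47, 49 would land in one group because each neighbouring gap is at most 2).

```python
from collections import defaultdict

def collapse(island, min_no, max_no):
    """Merge one run of similar orders into a single order."""
    return {
        "order_id": min(o["order_id"] for o in island),
        "price": min(o["price"] for o in island),
        "min_no": min_no,
        "max_no": max_no,
        "volume": sum(o["volume"] for o in island),
    }

def merge_similar_orders(orders, max_gap=2):
    # Bucket by the columns that must match exactly.
    buckets = defaultdict(list)
    for o in orders:
        buckets[(o["min_no"], o["max_no"], o["volume"])].append(o)

    merged = []
    for (min_no, max_no, _), group in buckets.items():
        group.sort(key=lambda o: o["price"])
        island = [group[0]]
        for o in group[1:]:
            # A price jump larger than max_gap closes the current run.
            if o["price"] - island[-1]["price"] > max_gap:
                merged.append(collapse(island, min_no, max_no))
                island = []
            island.append(o)
        merged.append(collapse(island, min_no, max_no))
    return merged

# Sample data from the question.
orders = [
    {"order_id": 1, "price": 45, "min_no": 2, "max_no": 10, "volume": 250},
    {"order_id": 2, "price": 46, "min_no": 2, "max_no": 10, "volume": 250},
    {"order_id": 3, "price": 60, "min_no": 2, "max_no": 10, "volume": 250},
]

for row in merge_similar_orders(orders):
    print(row)
```

On the sample data this yields one merged order with price 45 and volume 500, and the untouched order with price 60 and volume 250.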

SELECT multiple rows where date Is greater than X minutes of previous row

I have a need to SELECT all the rows from a table where the selected rows are greater than the datetime of the previously selected row by a given constant number of minutes. An example probably speaks best.
The following represents the table of data - we will call it myTable.
guid fkGuid myDate
------- ------- ---------------------
1 100 2013-01-10 11:00:00.0
2 100 2013-01-10 11:05:00.0
3 100 2013-01-10 11:10:00.0
4 100 2013-01-10 11:15:00.0
5 100 2013-01-10 11:20:00.0
6 100 2013-01-10 11:25:00.0
7 100 2013-01-10 11:30:00.0
8 100 2013-01-10 11:35:00.0
9 100 2013-01-10 11:40:00.0
10 100 2013-01-10 11:50:00.0
11 100 2013-01-10 11:55:00.0
What I want to do is provide a constant increment (say 10 minutes) and get back all the rows from the first that are 10 minutes or more from the previous row. So, with 10 minutes the result set should look like this:
guid myDate
------- ---------------------
1 2013-01-10 11:00:00.0
3 2013-01-10 11:10:00.0
5 2013-01-10 11:20:00.0
7 2013-01-10 11:30:00.0
9 2013-01-10 11:40:00.0
11 2013-01-10 11:55:00.0
The constant is passed in as a variable so it could be anything. Let's say it was 23 minutes, then the result set should look like this:
guid myDate
------- ---------------------
1 2013-01-10 11:00:00.0
6 2013-01-10 11:25:00.0
10 2013-01-10 11:50:00.0
The last example shows that I start at row 0's time (11:00:00) add 23 minutes and get the next >= row which is 11:25:00, add 23 minutes to the new row's time and then get the next (11:50:00) and so on.
I have tried doing this with a CTE but although I can quite easily get back all my times or none of them, I can't seem to figure how to get the rows I need. My current test code using 23 minutes hard coded into the WHERE clause:
WITH myCTE AS
(
SELECT guid,
myDate,
ROW_NUMBER() OVER (PARTITION BY guid ORDER BY myDate ASC) AS rowNum
FROM myTable
WHERE fkGuid = 100
)
SELECT currentRow.guid, currentRow.myDate
FROM myCTE AS currentRow
LEFT OUTER JOIN
myCTE AS previousRow
ON currentRow.guid = previousRow.guid
AND currentRow.rowNum = previousRow.rowNum + 1
WHERE
currentRow.myDate > DATEADD(minute, 23, previousRow.myDate)
ORDER BY
currentRow.myDate ASC
This returns nothing. If I omit the WHERE clause I get all rows back (obviously because I'm not filtering).
What am I missing?
Any and all help would be very much appreciated as it always is!
@gilly3, hardly SQL voodoo:
WITH CTE
AS
(
SELECT TOP 1
guid
,fkGuid
,myDate
,ROW_NUMBER() OVER (ORDER BY myDate) RowNum
FROM MyTable
UNION ALL
SELECT mt.guid
,mt.fkGuid
,mt.myDate
,ROW_NUMBER() OVER (ORDER BY mt.myDate)
FROM MyTable mt
INNER JOIN
CTE ON mt.myDate>=DATEADD(minute,23,CTE.myDate)
WHERE RowNum=1
)
SELECT guid
,fkGuid
,myDate
FROM CTE
WHERE RowNum=1
First, your join will never return any rows, regardless of the where clause. Guid and rowNum are both unique keys per row, so if the guid is the same, so will be the rowNum. You can see that the join always fails by adding a field from previousRow to your select list and running your query without the where clause.
Next, joining on rowNum + 1 prevents skipping rows. You will only select adjacent rows that satisfy the date filter.
There may be some SQL voodoo with recursive queries that will make this work, but there will be a huge performance hit. Filter the data in your application code. Eg, in C#:
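// Requires System, System.Collections.Generic and System.Data (Field<T> is a DataRow extension method).
// Assumes rows are already ordered by the date column, ascending.
List<DataRow> FilterByInterval(IEnumerable<DataRow> rows, string dateColumn, int minutes)
{
    List<DataRow> filteredRows = new List<DataRow>();
    DateTime lastDate = DateTime.MinValue;
    foreach (DataRow row in rows)
    {
        DateTime dt = row.Field<DateTime>(dateColumn);
        TimeSpan diff = dt - lastDate;
        // Keep the row only if it is at least `minutes` after the last kept row.
        if (diff.TotalMinutes >= minutes)
        {
            filteredRows.Add(row);
            lastDate = dt;
        }
    }
    return filteredRows; // return the filtered list, not the unfiltered input
}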
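The same greedy filter, sketched in Python so it can be run against the sample data from the question. The function name `filter_by_interval` and the tuple layout are illustrative assumptions; the input is assumed to be sorted by date.

```python
from datetime import datetime, timedelta

def filter_by_interval(rows, minutes):
    """rows: list of (guid, datetime) sorted by datetime, ascending."""
    kept = []
    last = None
    step = timedelta(minutes=minutes)
    for guid, ts in rows:
        # Always keep the first row; afterwards keep a row only if it is
        # at least `minutes` later than the last row that was kept.
        if last is None or ts - last >= step:
            kept.append((guid, ts))
            last = ts
    return kept

# Sample data from the question (all on 2013-01-10).
times = ["11:00", "11:05", "11:10", "11:15", "11:20", "11:25",
         "11:30", "11:35", "11:40", "11:50", "11:55"]
rows = [(i + 1, datetime.strptime("2013-01-10 " + t, "%Y-%m-%d %H:%M"))
        for i, t in enumerate(times)]

print([guid for guid, _ in filter_by_interval(rows, 23)])  # [1, 6, 10]
```

Because each decision depends on the last row that was kept rather than on the previous row in the table, the comparison has to be carried forward step by step, which is why a plain self-join on adjacent row numbers cannot express it.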
