Choosing distinct ID with differing column values - sql-server

Let's say I have this query:
SELECT id, date, amount, cancelled
FROM transactions
Which gives me the following results:
id date amount cancelled
1 01/2019 25.10 0
1 02/2019 19.55 1
1 06/2019 20.33 0
2 10/2019 11.00 0
If there are duplicate IDs, how can I get the one with the latest date? So it would look like this:
id date amount cancelled
1 06/2019 20.33 0
2 10/2019 11.00 0

One method is with ROW_NUMBER and a common table expression like this example. In a multi-statement batch, be mindful to terminate the preceding statement with a semi-colon to avoid parsing errors.
WITH data_with_date_sequence AS (
SELECT
id
, date
, amount
, cancelled
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC) AS seq
FROM dbo.SomeTable
)
SELECT
id
, date
, amount
, cancelled
FROM data_with_date_sequence
WHERE seq = 1;

One option could be to use the ROW_NUMBER function, which numbers the rows within each id ordered by date (latest first), so the latest row for each id gets Position 1.
;WITH max_dates AS (
SELECT id
, date
, amount
, cancelled
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS Position
FROM transactions
)
SELECT * FROM max_dates WHERE Position = 1
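You can try the ROW_NUMBER pattern outside SQL Server with the minimal sketch below. It runs the same query through Python's built-in `sqlite3` module and assumes SQLite 3.25 or later, which added window functions. The `transactions` table and its columns come from the question. The dates are stored as ISO `YYYY-MM` strings, which is an assumption: text like `06/2019` stops sorting chronologically once the year changes.

```python
import sqlite3

def latest_per_id(rows):
    """Keep the most recent row per id using ROW_NUMBER() (needs SQLite 3.25+)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER, date TEXT, amount REAL, cancelled INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        WITH ranked AS (
            SELECT id, date, amount, cancelled,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS seq
            FROM transactions
        )
        SELECT id, date, amount, cancelled
        FROM ranked
        WHERE seq = 1          -- seq 1 is the latest date within each id
        ORDER BY id
    """).fetchall()
    conn.close()
    return result

sample = [
    (1, "2019-01", 25.10, 0),
    (1, "2019-02", 19.55, 1),
    (1, "2019-06", 20.33, 0),
    (2, "2019-10", 11.00, 0),
]
print(latest_per_id(sample))  # [(1, '2019-06', 20.33, 0), (2, '2019-10', 11.0, 0)]
```

In SQL Server you would normally keep a real `date` column. `ORDER BY date DESC` then sorts correctly without the ISO-string workaround.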

Related

I would like the number '1000' to appear once only and then '0' for the remaining records until the next month appears-maybe a case type statement?

I am using SQL Server and I would like the number '1000' to appear once per month. I have a record set in which the first of every month appears multiple times. I would like '1000' to appear only once and then '0' for the remaining records until the next month appears. Maybe a CASE statement, or ROW_NUMBER with PARTITION BY? Please see the table below for how I would like the data to appear.
Many Thanks :)
Date Amount
01/01/2022 1000
01/01/2022 0
01/01/2022 0
01/02/2022 1000
01/02/2022 0
01/02/2022 0
01/03/2022 1000
01/03/2022 0
Solution for your problem:
WITH CT1 AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY YEAR([Date]), MONTH([Date]) ORDER BY [Date]) as rn
FROM your_table
)
SELECT [Date],
CASE WHEN rn = 1 THEN 1000 ELSE 0 END AS Amount
FROM CT1;
Given just a list of dates you could use row_number and a conditional expression to arbitrarily assign one row of each month a value of 1000
select *,
Iif(Row_Number() over(partition by Year([date]), Month([date]) order by (select null)) = 1, 1000, 0) Amount
from t
order by [date], Amount desc;
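The sketch below is a quick way to check the once-per-month logic. It again uses Python's `sqlite3` as a stand-in for SQL Server, with SQLite 3.25+ assumed. The table `t` and column `d` are hypothetical. The question's `01/02/2022`-style values are read as day/month/year and stored in ISO form. Partitioning by both year and month keeps January 2021 and January 2022 apart, which partitioning by month alone would not.

```python
import sqlite3

def amount_once_per_month(dates):
    """Return (date, amount) pairs: 1000 on the first row of each month, 0 elsewhere."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (d TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])
    rows = conn.execute("""
        WITH ranked AS (
            SELECT d,
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y', d), strftime('%m', d)  -- year AND month
                       ORDER BY d
                   ) AS rn
            FROM t
        )
        SELECT d, CASE WHEN rn = 1 THEN 1000 ELSE 0 END AS amount
        FROM ranked
        ORDER BY d, amount DESC   -- show the 1000 row first within each date
    """).fetchall()
    conn.close()
    return rows

dates = ["2022-01-01"] * 3 + ["2022-02-01"] * 3 + ["2022-03-01"] * 2
for row in amount_once_per_month(dates):
    print(row)
```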

Write Query That Consider Date Interval

I have a table that contains customer transactions.
I need to find customers who had at least 2 transactions with amount >= 20000 within three consecutive days, each month.
For example, if today is 2022/03/12, I should gather the transactions from 2022/02/13 to 2022/03/12, then check whether any customer had at least 2 transactions with amount >= 20000 within three consecutive days.
For example, consider the table below:
Id CustomerId TransactionDate Amount
1 1 2022-01-01 50000
2 2 2022-02-01 20000
3 3 2022-03-05 30000
4 3 2022-03-07 40000
5 2 2022-03-07 20000
6 4 2022-03-07 30000
7 4 2022-03-07 30000
The output should be: CustomerId = 3 and CustomerId = 4.
I wrote a query that finds such customers for a specific day, but I don't know how to find them over a whole month without using a loop.
The query for a specific day is:
WITH cte AS (
SELECT CustomerId, Amount, TransactionDate, DATEADD(day, -2, TransactionDate) AS PrevDate
FROM [Transaction]
WHERE TransactionDate = '2022-03-12')
SELECT CustomerId, COUNT(*)
FROM cte
WHERE
TransactionDate >= PrevDate AND TransactionDate <= TransactionDate
AND Amount >= 20000
GROUP BY CustomerId
HAVING COUNT(*) >= 2
Hi, there are many ways to achieve this.
I think the easiest (though maybe not the fastest) is to use the LAG function:
WITH lagged_days AS (
SELECT
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
,*
FROM [Transaction] -- TRANSACTION is a reserved word in T-SQL, so the table name needs brackets
WHERE Amount >= 20000 -- filter before LAG/LEAD so neighbours are qualifying transactions only
), valid_cust_base as (
SELECT
*
FROM lagged_days
WHERE DATEPART(MONTH, lagged_dt) = DATEPART(MONTH, Transactiondate)
AND ABS(DATEDIFF(day, Transactiondate, lagged_dt)) <= 2 -- both dates fall within three consecutive days
)
SELECT
CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I create a lagged TransactionDate per customer, considering only transactions with Amount >= 20000 (I assume that id is incremental). Then I keep only the transactions whose neighbouring transaction is in the same month and at most 2 days away, i.e. both fall within three consecutive days. Finally I select the customers who have at least 2 such transactions.
With LAG, the first value per customer is always NULL, but you still need to be able to say that the 1st and 2nd transactions are within 3 days. That's why I replace that first NULL with LEAD. It doesn't matter whether you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have for each transaction closest TransactionDate.
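You can also check the logic in plain Python. This sketch is not a translation of the LAG query. It sorts each customer's qualifying transactions by date and tests each run of `min_count` sorted neighbours (adjacent pairs, for the default of 2), which is enough to decide whether that many fall within three consecutive days. Because every adjacent pair is examined, a close pair is found even when the earlier transaction's own LAG points further back. The function name, the `(Id, CustomerId, TransactionDate, Amount)` tuple layout and the explicit window bounds are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def customers_with_close_transactions(rows, window_start, window_end,
                                      min_amount=20000, min_count=2, span_days=3):
    """rows: (Id, CustomerId, TransactionDate, Amount) tuples (hypothetical layout)."""
    dates_by_customer = defaultdict(list)
    for _id, customer_id, tx_date, amount in rows:
        if amount >= min_amount and window_start <= tx_date <= window_end:
            dates_by_customer[customer_id].append(tx_date)

    flagged = set()
    for customer_id, dates in dates_by_customer.items():
        dates.sort()
        # min_count transactions fit in span_days consecutive days exactly when
        # some run of min_count sorted neighbours spans at most span_days - 1 days.
        for i in range(len(dates) - min_count + 1):
            if (dates[i + min_count - 1] - dates[i]).days <= span_days - 1:
                flagged.add(customer_id)
                break
    return flagged

transactions = [
    (1, 1, date(2022, 1, 1), 50000),
    (2, 2, date(2022, 2, 1), 20000),
    (3, 3, date(2022, 3, 5), 30000),
    (4, 3, date(2022, 3, 7), 40000),
    (5, 2, date(2022, 3, 7), 20000),
    (6, 4, date(2022, 3, 7), 30000),
    (7, 4, date(2022, 3, 7), 30000),
]
print(sorted(customers_with_close_transactions(
    transactions, date(2022, 2, 13), date(2022, 3, 12))))  # [3, 4]
```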

Is it possible to use the SQL DATEADD function but exclude dates from a table in the calculation?

Is it possible to use the DATEADD function but exclude dates from a table?
We already have a table with all dates we need to exclude. Basically, I need to add number of days to a date but exclude dates within a table.
Example: Add 5 days to 01/08/2021. Dates 03/08/2021 and 04/08/2021 exist in the exclusion table. So, resultant date should be: 08/08/2021.
Thank you
A bit of a "wonky" solution, but it works. First, we use a tally to create a calendar of dates that excludes the dates in your table. Then we take row n + 1 (row 1 is the start date itself), where n is the number of days to add:
DECLARE @DaysToAdd int = 5,
@StartDate date = '20210801';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3), --Up to 1,000
Calendar AS(
SELECT DATEADD(DAY,T.I, @StartDate) AS D,
ROW_NUMBER() OVER (ORDER BY T.I) AS I
FROM Tally T
WHERE NOT EXISTS (SELECT 1
FROM dbo.DatesTable DT
WHERE DT.YourDate = DATEADD(DAY,T.I, @StartDate)))
SELECT D
FROM Calendar
WHERE I = @DaysToAdd+1;
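The same idea fits in a few lines of Python: walk forward and count only the dates that are not excluded. This is a hedged sketch, not part of the answer. The function name is hypothetical, and the exclusion table is modelled as a Python set of dates.

```python
from datetime import date, timedelta

def add_days_excluding(start, days_to_add, excluded):
    """Move forward from `start`, counting only dates that are not in `excluded`."""
    current = start
    remaining = days_to_add
    while remaining > 0:
        current += timedelta(days=1)
        if current not in excluded:  # excluded dates are stepped over without counting
            remaining -= 1
    return current

excluded = {date(2021, 8, 3), date(2021, 8, 4)}
print(add_days_excluding(date(2021, 8, 1), 5, excluded))  # 2021-08-08
```

Unlike the tally query, this loop never checks the start date itself. If the start date is in the exclusion table, the tally query's row numbering shifts by one. For example, starting on 2021-08-03 and adding 1 day gives 2021-08-06 with the tally query but 2021-08-05 here. Decide which behaviour you want.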
The best solution is probably a calendar table.
But if you're willing to traverse every date, a recursive CTE can work. It requires tracking the total number of iterations, plus another column that does not count any traversed date found in the table. The exit condition uses that running count.
An example dataset would be:
CREATE TABLE mytable(mydate date); INSERT INTO mytable VALUES ('20210803'), ('20210804');
And an example function, created in its own batch:
CREATE FUNCTION dbo.fn_getDays (@mydate date, @daysadd int)
RETURNS date
AS
BEGIN
DECLARE @newdate date;
WITH CTE(num, diff, mydate) AS (
SELECT 0 AS [num]
,0 AS [diff]
,DATEADD(DAY, 0, @mydate) [mydate]
UNION ALL
SELECT num + 1 AS [num]
,CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END
AS [diff]
,DATEADD(DAY, num+1, @mydate) [mydate]
FROM CTE
WHERE (CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END) <= @daysadd
)
SELECT @newdate = (SELECT MAX(mydate) AS [mydate] FROM CTE);
RETURN @newdate;
END
Running the function:
SELECT dbo.fn_getDays('20210801', 5)
Produces output, which is the MAX(mydate) from the function:
----------
2021-08-08
For reference the MAX(mydate) is taken from this dataset:
num diff mydate
----------- ----------- ----------
0 0 2021-08-01
1 1 2021-08-02
2 1 2021-08-03
3 1 2021-08-04
4 2 2021-08-05
5 3 2021-08-06
6 4 2021-08-07
7 5 2021-08-08
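The Python model below shows how the recursive CTE arrives at that trace by mimicking its anchor and recursive members. It is a sketch, and the function name is made up. Each loop iteration plays the role of one recursion level: it computes the candidate row and stops when the recursive member's `WHERE` condition would fail.

```python
from datetime import date, timedelta

def recursive_cte_trace(start, days_to_add, excluded):
    """Return the (num, diff, mydate) rows the recursive CTE would produce."""
    rows = [(0, 0, start)]  # anchor member: num = 0, diff = 0, mydate = start
    num, diff = 0, 0
    while True:
        candidate = start + timedelta(days=num + 1)
        # diff only grows for dates that are not in the exclusion table
        candidate_diff = diff + (0 if candidate in excluded else 1)
        if candidate_diff > days_to_add:  # recursive member's WHERE fails: recursion ends
            break
        num, diff = num + 1, candidate_diff
        rows.append((num, diff, candidate))
    return rows

excluded = {date(2021, 8, 3), date(2021, 8, 4)}
trace = recursive_cte_trace(date(2021, 8, 1), 5, excluded)
for row in trace:
    print(row)
print(max(d for _, _, d in trace))  # 2021-08-08
```

By the same reasoning there is one edge case. If the day after the target date is also excluded, `diff` stays at the limit, the recursion takes one more step, and `MAX(mydate)` lands on the excluded day. With 2021-08-09 added to the exclusion set, the model's last row is 2021-08-09. As far as I can tell, the T-SQL function behaves the same way.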
You can use the IN clause.
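To perform the test, I used a W3Schools test DB (note that DATE_ADD(..., INTERVAL ...) is MySQL syntax; in SQL Server the equivalent is DATEADD(DAY, 10, BirthDate)):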
SELECT DATE_ADD(BirthDate, INTERVAL 10 DAY) FROM Employees WHERE FirstName NOT IN (Select FirstName FROM Employees WHERE FirstName LIKE 'N%')
This query shows all the birth dates + 10 days except for the only employee with name starting with N (Nancy)

Create new incremental grouping column based on if logic from group, rank, and category columns

I'm trying to sum totals together in a way that goes beyond a basic "group by" or "case" statement.
Here's an example dataset:
Amt Cust_id Ranking PlanType
10 1 1 Term
6 1 2 Variable
8 1 3 Variable
7 1 4 Variable
12 1 5 Term
6 1 6 Variable
10 1 7 Variable
The objective is to return the max sum where the plan type is 'Variable' and the Ranking numbers are adjacent to each other.
So the answer to the example would be the sum of rows 2-4 which returns 21.
The answer is not the sum of all variable plan types, because row 5 is a 'Term' which breaks it apart.
So I'd like to end with a dataset like below to handle multiple groups of customers:
Amt Cust_ID
21 1
30 2
45 3
Here's where I'm stuck; this returns the wrong answer:
Create Table #tb (Amt INT, Cust_id TINYINT, Ranking INT, PlanType
VARCHAR(10))
INSERT INTO #tb
VALUES (10,1,1,'Term'),
(6,1,2,'Variable'),
(8,1,3,'Variable'),
(7,1,4,'Variable'),
(12,1,5,'Term'),
(6,1,6,'Variable'),
(10,1,7,'Variable'),
(10,2,1,'Term'),
(6,2,2,'Variable'),
(7,2,4,'Variable'),
(12,2,5,'Term'),
(6,2,6,'Variable'),
(50,2,7,'Variable')
select
( SELECT SUM(Amt) FROM #tb as t2
WHERE t2.Cust_ID=t1.Cust_ID AND t2.Ranking<=t1.Ranking AND
t2.PlanType='Variable') RollingAmt
,Cust_ID, Ranking, Amt, PlanType
from #tb as t1
order by Cust_ID, Ranking
The query runs a rolling sum ordered by "Ranking" where PlanType = 'Variable'. Unfortunately it runs a rolling sum of all "Variable"'s together. I need it to not do that.
If it runs into a PlanType "Term" it needs to start over its sum within each group.
In order to do this you need to use a gaps-and-islands technique to generate a "group id" based on consecutive runs of the same PlanType, then you can sum and sort based on that new group id.
Try this:
CREATE TABLE #data (Amt INT, Cust_id TINYINT, Ranking INT, PlanType VARCHAR(10))
INSERT INTO #data
VALUES (10,1,1,'Term'),
(6,1,2,'Variable'),
(8,1,3,'Variable'),
(7,1,4,'Variable'),
(12,1,5,'Term'),
(6,1,6,'Variable'),
(10,1,7,'Variable'),
(10,2,1,'Term'),
(6,2,2,'Variable'),
(7,2,4,'Variable'),
(12,2,5,'Term'),
(6,2,6,'Variable'),
(50,2,7,'Variable')
;WITH X AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Cust_id,PlanType ORDER BY Ranking)
- ROW_NUMBER() OVER(PARTITION BY Cust_id ORDER BY Ranking) groupID /* Assign a groupID to consecutive runs of PlanTypes by Cust_id */
FROM #data
), Y AS
(
SELECT *, SUM(Amt) OVER(PARTITION BY Cust_id,groupID) AS AmtSum /* Sum Amt by Cust/groupID */
FROM X
WHERE PlanType='Variable'
), Z AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Cust_id ORDER BY AmtSum DESC) AS RN /* Assign a row number (1) to highest AmtSum by Cust */
FROM Y
)
SELECT AmtSum, Cust_id
FROM Z
WHERE RN=1 /* Only select RN=1 to get highest value by cust_id/groupId */
If you are curious about how this all works, you can comment out the last SELECT and run SELECT * FROM X, then SELECT * FROM Y, etc., to see what each step does along the way; only one SELECT can follow the entire CTE structure.
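You can run the three CTEs as-is outside SQL Server. This sketch feeds the question's sample rows to Python's `sqlite3` module (SQLite 3.25+ assumed for window functions) and executes the answer's query. The rows go into an ordinary table named `data` instead of `#data`, and a final `ORDER BY` is added only to make the output order stable.

```python
import sqlite3

ROWS = [
    (10, 1, 1, "Term"), (6, 1, 2, "Variable"), (8, 1, 3, "Variable"),
    (7, 1, 4, "Variable"), (12, 1, 5, "Term"), (6, 1, 6, "Variable"),
    (10, 1, 7, "Variable"), (10, 2, 1, "Term"), (6, 2, 2, "Variable"),
    (7, 2, 4, "Variable"), (12, 2, 5, "Term"), (6, 2, 6, "Variable"),
    (50, 2, 7, "Variable"),
]

QUERY = """
WITH X AS (   -- groupID is constant within each consecutive run of one PlanType
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Cust_id, PlanType ORDER BY Ranking)
         - ROW_NUMBER() OVER (PARTITION BY Cust_id ORDER BY Ranking) AS groupID
    FROM data
), Y AS (     -- total Amt of each Variable run
    SELECT *, SUM(Amt) OVER (PARTITION BY Cust_id, groupID) AS AmtSum
    FROM X
    WHERE PlanType = 'Variable'
), Z AS (     -- RN = 1 marks the largest run per customer
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Cust_id ORDER BY AmtSum DESC) AS RN
    FROM Y
)
SELECT AmtSum, Cust_id FROM Z WHERE RN = 1 ORDER BY Cust_id
"""

def max_variable_run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (Amt INTEGER, Cust_id INTEGER, Ranking INTEGER, PlanType TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(max_variable_run(ROWS))  # [(21, 1), (56, 2)]
```

For customer 2 the result is 56 (6 + 50). Ranking 3 is missing for that customer, yet Rankings 2 and 4 still form one island, because the ROW_NUMBER difference compares row positions, not Ranking values. If a gap in Ranking should also break a run, `Ranking - ROW_NUMBER() OVER (PARTITION BY Cust_id, PlanType ORDER BY Ranking)` would work as an alternative group id.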

SQL Server : find Cust with Continuous Enrollment

I have a task to solve a well-known industry problem: identify the CustIDs that have continuous activity for a given period of time, allowing small breaks between contracts.
I did the first part by populating a matrix table, like in the snippet below, for the whole period and setting a flag if the customer is active on that date. I think this is the only reliable way to do this, as contracts can overlap, etc.
Now I need to flag each CustID 1/0 for continuous activity, and I'm stuck on how to track this. In my example there is a 3-day break, which is OK, but I need to make sure that those break days are consecutive.
Do you have any good ideas how I can do this nicely? I'd appreciate your help and leads. I saw some examples, but they were done in SAS, so they're hard to understand.
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
;with matrix_dd as
(
select *
from
(select 111 CustID, '2015-1-11' dd, 1 Active union
select 111 CustID, '2015-1-12' dd, 0 Active union
select 111 CustID, '2015-1-13' dd, 0 Active union
select 111 CustID, '2015-1-14' dd, 0 Active union
select 111 CustID, '2015-1-15' dd, 1 Active union
select 111 CustID, '2015-1-16' dd, 1 Active union
select 111 CustID, '2015-1-17' dd, 1 Active union
select 111 CustID, '2015-1-18' dd, 1 Active union
select 111 CustID, '2015-1-19' dd, 0 Active union
select 111 CustID, '2015-1-20' dd, 0 Active) a
)
select *
from matrix_dd
Best
M
This solution calculates the active ranges and how long of a break it's been since the last interval ended:
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
with matrix_dd as
(
select * from ( values
(111, '2015-1-11', 1 ),
(111, '2015-1-12', 0 ),
(111, '2015-1-13', 0 ),
(111, '2015-1-14', 0 ),
(111, '2015-1-15', 1 ),
(111, '2015-1-16', 1 ),
(111, '2015-1-17', 1 ),
(111, '2015-1-18', 1 ),
(111, '2015-1-19', 0 ),
(111, '2015-1-20', 0 )
) as x(CustID, dd, Active)
), active_with_groups as (
select *,
row_number() over (partition by CustID order by dd) -
datediff(day, '2000-01-01', dd) as gid
from matrix_dd
where active = 1
and dd between @PeriodStart and @PeriodEnd
), islands as (
select CustId, min(dd) as islandStart, max(dd) as islandEnd
from active_with_groups
group by CustID, gid
), islands_with_gaps as (
select *,
datediff(
day,
lag(islandEnd, 1, islandStart)
over (partition by CustID order by islandStart),
islandStart
) - 1 as [break]
from islands
)
select *
from islands_with_gaps
where [break] > @maxBreak -- breaks of exactly @maxBreak days are allowed
order by islandStart
Let's break it down. In the "active_with_groups" common table expression (CTE), all I'm doing is converting the dates into integers that have the same relationship by using datediff(). Why? Integers are easier to work with for this problem. Note that I'm also using row_number() to get a contiguous sequence and then getting the difference between that and the datediff() value. The key observation is that if the days also don't go up contiguously, that difference will be, well, different. Likewise, if the dates do go up contiguously, then the difference will be the same. Therefore, we can use this value as a group identifier for values that are in a contiguous range.
Next, we use the group identifier to group by (bet you didn't see that coming!). This gives us the start and end of each interval. Nothing very clever is going on here.
The next step is to calculate the amount of time that's passed between when the last interval ended and the current one began. For this, we use a simple call to the lag() function. The only thing to note here is that I've chosen to have the lag() function emit a default value of islandStart in the case of the first interval. It could have just as easily been no default (which would have then caused it to emit a NULL value).
Lastly, we look for intervals with a gap over the specified threshold.
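The island logic is easy to reproduce in Python, which can help when checking the SQL against the sample data. In this sketch (function name hypothetical), each date's ordinal minus its position plays the role of `row_number() - datediff()`. The sign is flipped, but the key is still constant within a run of consecutive days. The first island gets a break of -1 because the SQL's `lag()` defaults to `islandStart`.

```python
from datetime import date
from itertools import groupby

def islands_with_breaks(active_days):
    """Return (island_start, island_end, break_days) for each run of consecutive active days."""
    # Consecutive dates share the same (ordinal - position) key.
    keyed = [(d.toordinal() - i, d) for i, d in enumerate(sorted(active_days))]
    islands = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        islands.append((days[0], days[-1]))

    result = []
    prev_end = None
    for start, end in islands:
        lagged_end = prev_end if prev_end is not None else start  # lag(islandEnd, 1, islandStart)
        result.append((start, end, (start - lagged_end).days - 1))
        prev_end = end
    return result

# Active days for CustID 111 between @PeriodStart and @PeriodEnd
active = [date(2015, 1, 11)] + [date(2015, 1, d) for d in range(15, 19)]
for island in islands_with_breaks(active):
    print(island)
```

With the question's data, the only real break is 3 days (12 to 14 January), which is within the allowed maximum of 3.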
Similar to Ben's answer. I'm assuming that all your dates are represented in the data. So really we just need to make sure there isn't a run of zeroes longer than 3.
with inactive_runs as (
select
CustID,
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
select count(*) from inactive_runs ir
where ir.CustID = m.CustID
group by grp
);
http://rextester.com/AHI22250
Using all isn't particularly common. Here's an alternative:
...
with inactive_runs as (
select
CustID, dd, /* <-- had to add dd */
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
select 1 from inactive_runs ir
where ir.CustID = m.CustID
group by grp
having datediff(day, min(dd), max(dd)) > 2
);
I glanced at your comment above. I think it confirms my suspicion that you've got a single row for every date. If you're on SQL Server 2012 or later, you can just sum over the previous three rows. Unfortunately, you wouldn't be able to use a variable for the window size if the length is variable:
with cust as (
select
CustID,
case when
sum(case when Active = 0 then 1 end) over (
partition by CustID
order by dd
rows between 3 preceding and current row
) = 4 then 1
end as isBrk
from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
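The sliding-frame check can be modelled in Python for comparison. This sketch uses hypothetical names and assumes one Active flag per consecutive day, as the answer does. It reports a break when some frame of `max_break + 1` days contains no active day, which is what the `rows between 3 preceding and current row` sum detects.

```python
def has_excessive_break(active_flags, max_break=3):
    """active_flags: one 0/1 flag per consecutive day, already ordered by date."""
    frame = max_break + 1  # e.g. 3 preceding rows + the current row
    for end in range(frame - 1, len(active_flags)):
        window = active_flags[end - frame + 1:end + 1]
        if sum(window) == 0:  # every day in the frame was inactive
            return True
    return False

# CustID 111 from 2015-01-11 to 2015-01-20
flags_111 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
print(has_excessive_break(flags_111))  # False
```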
Edit:
Based on your comment with the data in a "pre-matrix" format, yes, that's a simpler query. At that point you're just looking at the previous end date and the current row's start date.
with data as (
select * from (
values (111, 1230, '2014-12-11', '2015-01-11'),
(111, 1231, '2015-01-15', '2015-01-18'),
(111, 1232, '2015-03-22', '2015-04-01')
) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
select
CustID,
datediff(day,
lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
StartDD
) as days
from data
)
select CustID
from gaps
group by CustID
having max(days) <= 3;
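The pre-matrix version reduces to comparing each contract's start with the previous contract's end. The Python sketch below (function name hypothetical) mirrors `lag(EndDD, 1, StartDD)`, so the first contract contributes a gap of 0.

```python
from datetime import date

def max_gap_days(contracts):
    """contracts: (start, end) date pairs for one customer."""
    ordered = sorted(contracts)  # order by StartDD
    gaps = [0]  # first contract: lag(EndDD, 1, StartDD) gives datediff(StartDD, StartDD) = 0
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        gaps.append((start - prev_end).days)
    return max(gaps)

contracts = [
    (date(2014, 12, 11), date(2015, 1, 11)),
    (date(2015, 1, 15), date(2015, 1, 18)),
    (date(2015, 3, 22), date(2015, 4, 1)),
]
print(max_gap_days(contracts) <= 3)  # False: the customer is not continuously enrolled
```

`days` here is a plain day difference. The value 4 between 2015-01-11 and 2015-01-15 means three uncovered days (12 to 14 January), which is why the first answer subtracts 1 for its `[break]` column. Like the SQL, this compares each contract only with the one immediately before it by start date. If contracts can overlap so that a short contract ends before an earlier, longer one, a running maximum of the end dates would be safer.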
