Multiple rows calculation in SQL Server

I have a table:

UNIQUE KEY   ID       Clicks   INSERTDATE
1            100001   10       2011-05-14 00:00:00.000
2            100001   20       2011-05-13 00:00:00.000
3            100001   30       2011-05-18 00:00:00.000
4            100002   10       2011-05-20 00:00:00.000
5            100002   15       2011-05-24 00:00:00.000
6            100002   10       2011-05-05 00:00:00.000
I have a threshold value for clicks, let's say 20.
I need to write a T-SQL query that removes the clicks that do not meet the threshold on the cumulative sum of clicks for each ID.
So for the above example, ID "100001" has a cumulative click count of 60 (10 + 20 + 30), but since the threshold is 20, the last record, i.e. the one with a click value of 30, should be removed from the result.
However, the second record should still be included even though the sum at that point (10 + 20) is greater than my threshold.
EDIT: Another major rule that needs to be applied is that the rows have to be ordered by INSERTDATE before performing any calculations.
Any help would be much appreciated.

If I understood the question correctly, you'd like to filter on the running total for a given ID, like so:

select c1.*
from ClickTable c1
outer apply (
    select sum(Clicks) as RunningTotal
    from ClickTable
    where pk < c1.pk
      and id = c1.id
) c2
where isnull(RunningTotal, 0) <= 20

This implies that you have a unique key field in the table, called PK.
Running sample: http://www.sqlfiddle.com/#!3/98173/11
Update
To order by Clicks instead of the primary key, just change the line
where pk < c1.pk
to
where Clicks < c1.Clicks
Running sample: http://www.sqlfiddle.com/#!3/31750/2
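As a side note (not part of the original answer), on platforms with window functions the same prior-row running total can be expressed without the correlated subquery. A minimal sketch of that logic, assuming the question's sample data and a threshold of 20, run in SQLite via Python (window functions need SQLite 3.25+); it orders by InsertDate per the question's edit:

```python
import sqlite3

# Hypothetical ClickTable recreated in SQLite; names follow the question.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ClickTable (pk INTEGER PRIMARY KEY, id INT, Clicks INT, InsertDate TEXT);
INSERT INTO ClickTable VALUES
  (1, 100001, 10, '2011-05-14'), (2, 100001, 20, '2011-05-13'),
  (3, 100001, 30, '2011-05-18'), (4, 100002, 10, '2011-05-20'),
  (5, 100002, 15, '2011-05-24'), (6, 100002, 10, '2011-05-05');
""")

# Window-function equivalent of the OUTER APPLY: the running total of all
# *prior* rows (ordered by InsertDate) must stay at or below the threshold.
rows = con.execute("""
SELECT pk, id, Clicks
FROM (
  SELECT *,
         COALESCE(SUM(Clicks) OVER (
           PARTITION BY id ORDER BY InsertDate
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS prior_total
  FROM ClickTable
)
WHERE prior_total <= 20
ORDER BY pk
""").fetchall()
print(rows)
```

The `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` frame reproduces the `pk < c1.pk` semantics of the accepted answer: the row with 30 clicks is dropped (prior total 30 > 20), while the 20-click row is kept because nothing precedes it by date.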

I hope I read the question correctly. Seems too simple:

SELECT ID, SUM(Clicks) AS Clicks
FROM t1
WHERE Clicks <= 20 -- <== this is your threshold
GROUP BY ID

would give you:

ID       Clicks
100001   30
100002   35


T-SQL Grouping Dynamic Date Ranges

Using MS SQL Server 2019.
I have a set of recurring donation records. Each has a First Gift Date and a Last Gift Date associated with it. I need to add a GroupedID to these rows so that I can get the full date range from the earliest FirstGiftDate to the latest LastGiftDate, as long as there is not a break of more than 45 days in between the recurring donations.
For example, Bob is a long-time supporter. His card has expired multiple times and he has always started new gifts within 45 days, so all of his gifts need to be given a single GroupedID. On the opposite side, June has been donating and her card expires. She doesn't give again for 6 months, but then continues to give after her card expires. June's first gift should get its own GroupedID and the second and third should be grouped together. The grouping count should restart with each donor.
My initial attempt was to join the donation table back to itself aliased as D2. This did work to give me an indicator of which rows were within the 45-day mark, but I can't wrap my head around how to then link them. My only thought was to use LEAD and LAG to analyze each scenario and figure out the different combinations of LEAD and LAG values needed to catch each case, but that doesn't seem as reliable or scalable as I'd like.
I appreciate any help anyone can give.
My code:
SELECT #Donation.*, D2.*
FROM #Donation
LEFT JOIN #Donation D2
    ON #Donation.RecurringGiftID <> D2.RecurringGiftID
    AND #Donation.Donor = D2.Donor
    AND ABS(DATEDIFF(DAY, #Donation.FirstGiftDate, D2.LastGiftDate)) < 45
Table structure and sample data:
CREATE TABLE #Donation
(
    RecurringGiftID int,
    Donor nvarchar(25),
    FirstGiftDate date,
    LastGiftDate date
)

INSERT INTO #Donation
VALUES (1, 'Bob', '2017-02-15', '2018-07-01'),
       (15, 'Bob', '2018-08-05', '2019-04-01'),
       (32, 'Bob', '2019-04-15', '2022-06-15'),
       (54, 'June', '2015-05-01', '2016-05-01'),
       (96, 'June', '2016-12-15', '2018-02-01'),
       (120, 'June', '2018-03-04', '2020-07-01')
Desired output:

RecurringGiftId   Donor   FirstGiftDate   LastGiftDate   GroupedID
1                 Bob     2017-02-15      2018-07-01     1
15                Bob     2018-08-05      2019-04-01     1
32                Bob     2019-04-15      2022-06-15     1
54                June    2015-05-01      2016-05-01     1
96                June    2016-12-15      2018-02-01     2
120               June    2018-03-04      2020-07-01     2
Use LAG() to detect when the current row starts more than 45 days after the previous one, then perform a cumulative sum over those flags to form the required GroupedID:

select *,
       GroupedID = sum(g) over (partition by Donor order by FirstGiftDate)
from
(
    select *,
           g = case when datediff(day,
                                  lag(LastGiftDate, 1, '19000101')
                                      over (partition by Donor order by FirstGiftDate),
                                  FirstGiftDate) > 45
                    then 1
                    else 0
               end
    from #Donation
) d
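To sanity-check the flag-then-running-sum idea, here is a small sketch of the same two-step logic run in SQLite via Python. Note the substitutions: `julianday()` stands in for DATEDIFF, and the table is unprefixed since SQLite has no `#` temp-table syntax:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Donation (RecurringGiftID INT, Donor TEXT, FirstGiftDate TEXT, LastGiftDate TEXT);
INSERT INTO Donation VALUES
  (1,'Bob','2017-02-15','2018-07-01'), (15,'Bob','2018-08-05','2019-04-01'),
  (32,'Bob','2019-04-15','2022-06-15'), (54,'June','2015-05-01','2016-05-01'),
  (96,'June','2016-12-15','2018-02-01'), (120,'June','2018-03-04','2020-07-01');
""")

# Step 1 (inner query): flag rows starting > 45 days after the previous
# LastGiftDate via LAG; the far-past default makes each donor's first row a 1.
# Step 2 (outer query): a running SUM of the flags yields the group id.
rows = con.execute("""
SELECT RecurringGiftID, Donor,
       SUM(g) OVER (PARTITION BY Donor ORDER BY FirstGiftDate) AS GroupedID
FROM (
  SELECT *,
         CASE WHEN julianday(FirstGiftDate)
                   - julianday(LAG(LastGiftDate, 1, '1900-01-01')
                               OVER (PARTITION BY Donor ORDER BY FirstGiftDate))
                   > 45
              THEN 1 ELSE 0 END AS g
  FROM Donation
)
ORDER BY Donor, FirstGiftDate
""").fetchall()
print(rows)
```

The gaps for Bob (35 and 14 days) never exceed 45, so all his rows share GroupedID 1; June's 228-day gap starts a second group, matching the desired output.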

SQL Merge to Update table with changed history

I have been tasked with building a history table in SQL Server. I have already built the base table, which contains multiple left joins amongst other things. The base table needs to be compared to another table, updating only the specific columns that have changed and inserting new rows where the key doesn't match.
Previously I have used other ETL tools which have GUI-style built-in SCD loaders, but I don't have that luxury in SQL Server. Here the MERGE statement can handle such operations. I have used MERGE before, but I get a bit stuck when handling the flags and date fields based on the operation performed.
Here is the BASE table:

KEY   CLIENT   QUANTITY   CONTRACT_NO   FC_COUNT   DELETE_FLAG   RECORD_UPDATED_DATE
345   A        1000       5015          1          N             31/12/9999
346   B        2000       9352          1          N             31/12/9999
347   C        3000       6903          1          N             31/12/9999
348   D        1000       7085          1          N             31/12/9999
349   E        1000       8488          1          N             31/12/9999
350   F        500        6254          1          N             31/12/9999
Here is the table I plan to merge with:

KEY   CLIENT   QUANTITY   CONTRACT_NO   FC_COUNT
345   A        1299       5015          1
346   B        2011       9352          1
351   Z        5987       5541          1
The results I'm looking for are:

KEY   CLIENT   QUANTITY   CONTRACT_NO   FC_COUNT   DELETE_FLAG   RECORD_UPDATED_DATE
345   A        1000       5015          1          N             06/07/2022
345   A        1299       5015          1          N             31/12/9999
346   B        2000       9352          1          N             06/07/2022
346   B        2011       9352          1          N             31/12/9999
347   C        3000       6903          1          Y             06/07/2022
348   D        1000       7085          1          Y             06/07/2022
349   E        1000       8488          1          Y             06/07/2022
350   F        500        6254          1          Y             06/07/2022
351   Z        5987       5541          1          N             31/12/9999
As you can see, I have shown the changes: old records are closed off with an update date, rows that are missing now but were there previously are marked with a delete flag, and a new row is added with the new key and data.
Would this be a MERGE? Some direction on how to perform this sort of operation would be a great help. We have a lot of tables where we need to keep change history, and this would help a lot going forward.
My code shell attempt:

SELECT
    MAIN_KEY,
    CLIENT,
    QUANTITY,
    CONTRACT_NO,
    1 AS FC_COUNT,
    NULL AS DELETE_FLG_DD,
    GETDATE() AS RECORD_UPDATED_DATE
INTO #G1_DELTA
FROM [dwh].STG_DTL;

MERGE [dwh].[PRJ1_DELTA] TARGET
USING #G1_DELTA SOURCE
    ON TARGET.MAIN_KEY = SOURCE.MAIN_KEY
WHEN MATCHED THEN INSERT
(
    MAIN_KEY,
    CLIENT,
    QUANTITY,
    CONTRACT_NO,
    FC_COUNT,
    DELETE_FLG_DD,
    RECORD_UPDATED_DATE
)
VALUES
(
    SOURCE.MAIN_KEY,
    SOURCE.CLIENT,
    SOURCE.QUANTITY,
    SOURCE.CONTRACT_NO,
    SOURCE.FC_COUNT,
    SOURCE.DELETE_FLG_DD,
    SOURCE.RECORD_UPDATED_DATE
);
If you need to build a history table containing the updated information from your two tables, you first need to select that updated information from both tables.
The changes that need to be applied are:

"tab1.[DELETE_FLAG]", which should be set to 'Y' whenever there is no match in tab2 (and 'N' when there is)
"tab1.[RECORD_UPDATED_DATE]", which should be updated to the current date
"tab2.[DELETE_FLAG]", missing, and which should be initialized to 'N'
"tab2.[RECORD_UPDATED_DATE]", missing, and which should be initialized to the sentinel date 9999-12-31

Once these changes are made, you can apply UNION ALL to bring the rows from your two tables together.
Then, in order to generate a table, you can use a CTE to select the output result set and use the INTO <table> clause after the selection to generate your "history" table.
WITH cte AS (
    SELECT tab1.[KEY],
           tab1.[CLIENT],
           tab1.[QUANTITY],
           tab1.[CONTRACT_NO],
           tab1.[FC_COUNT],
           CASE WHEN tab2.[KEY] IS NOT NULL
                THEN 'N'
                ELSE 'Y'
           END AS [DELETE_FLAG],
           CAST(GETDATE() AS DATE) AS [RECORD_UPDATED_DATE]
    FROM tab1
    LEFT JOIN tab2
        ON tab1.[KEY] = tab2.[KEY]
    UNION ALL
    SELECT *,
           'N' AS [DELETE_FLAG],
           '9999-12-31' AS [RECORD_UPDATED_DATE]
    FROM tab2
)
SELECT *
INTO history
FROM cte
ORDER BY [KEY];
Check the demo here.
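A trimmed-down sketch of the same LEFT JOIN + UNION ALL shape, run in SQLite via Python so it can be checked (only a few columns kept for brevity; `date('now')` stands in for GETDATE(), and the tab1/tab2 names follow the answer above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tab1 ("KEY" INT, QUANTITY INT, DELETE_FLAG TEXT, RECORD_UPDATED_DATE TEXT);
CREATE TABLE tab2 ("KEY" INT, QUANTITY INT);
INSERT INTO tab1 VALUES (345, 1000, 'N', '9999-12-31'),
                        (346, 2000, 'N', '9999-12-31'),
                        (347, 3000, 'N', '9999-12-31');
INSERT INTO tab2 VALUES (345, 1299), (351, 5987);
""")

# First branch: close off every tab1 row, flagging 'Y' only when the key has
# disappeared from tab2. Second branch: current tab2 rows get the sentinel date.
rows = con.execute("""
SELECT tab1."KEY", tab1.QUANTITY,
       CASE WHEN tab2."KEY" IS NOT NULL THEN 'N' ELSE 'Y' END AS DELETE_FLAG,
       date('now') AS RECORD_UPDATED_DATE
FROM tab1
LEFT JOIN tab2 ON tab1."KEY" = tab2."KEY"
UNION ALL
SELECT "KEY", QUANTITY, 'N', '9999-12-31' FROM tab2
ORDER BY "KEY", RECORD_UPDATED_DATE
""").fetchall()
print(rows)
```

Key 345 gets two rows (closed-off old version plus current version), 347 is marked deleted, and new key 351 appears only with the sentinel date, mirroring the desired output above.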

How to get some meaningful data from a chronicle-like table in SQL Server?

Say I have a table, let's call it purchase, in SQL Server that represents user purchasing.
Table name: purchase

purchase_id   buyer_member_id   song_id
1             101               1001
2             101               1002
3             102               1001
4             102               1003
5             103               1001
6             103               1003
7             103               1004

Now I'm trying to produce some stats from this table. I want to know who has purchased both song 1001 and 1003.

select distinct buyer_member_id from purchase where
buyer_member_id in (select buyer_member_id from purchase where song_id = 1001)
and buyer_member_id in (select buyer_member_id from purchase where song_id = 1003)

This works, but as more and more criteria are added it becomes slower and slower. It's nearly impossible to run a query for something like: find people who bought a, b and c but not d nor f. I understand that the nature of this, and the use of "where someid in (select someid from table where something)", is probably not the best way to do it.
Question is, is there a better way?
I call these "set-within-a-set" queries, and like to approach them using group by and having:

select buyer_member_id
from purchase p
group by buyer_member_id
having sum(case when song_id = 1001 then 1 else 0 end) > 0 and
       sum(case when song_id = 1003 then 1 else 0 end) > 0;

Each sum() counts the number of purchases that match a song. The > 0 says there is at least one; = 0 would say there are none.
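The approach can be checked end to end against the question's sample data; the sketch below (SQLite via Python) also demonstrates the "but not d" style criterion the asker mentioned, as a third SUM(CASE...) = 0 term:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE purchase (purchase_id INT, buyer_member_id INT, song_id INT);
INSERT INTO purchase VALUES (1,101,1001),(2,101,1002),(3,102,1001),
                            (4,102,1003),(5,103,1001),(6,103,1003),(7,103,1004);
""")

# Bought 1001 and 1003 but not 1004: one SUM(CASE...) term per criterion,
# so adding a condition adds a term rather than another IN (subquery).
rows = con.execute("""
SELECT buyer_member_id
FROM purchase
GROUP BY buyer_member_id
HAVING SUM(CASE WHEN song_id = 1001 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1003 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1004 THEN 1 ELSE 0 END) = 0
ORDER BY buyer_member_id
""").fetchall()
print(rows)  # [(102,)] -- 103 bought all three and is excluded by the = 0 term
```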

How to get the record for which a given date falls between two dates of the same column in PostgreSQL

My table has data such as EmpCode, DesignationCode and PromotionDate. I want to find what an employee's designation was on some given date. For example:

EmpCode   DesignationCode   PromotionDate
101       50                2010-01-25
101       10                2014-01-01
101       11                2015-01-01
102       10                2009-10-01
103       15                2015-01-01

Now if I check the designation as of 2014-02-01, it should give the following result:

EmpCode   DesignationCode   PromotionDate
101       10                2014-01-01
102       10                2009-10-01

Can anyone please tell me what query I should write?
Thanks in advance.
You can try:
SELECT DISTINCT ON (EmpCode) EmpCode, DesignationCode, PromotionDate
FROM mytable
WHERE PromotionDate <= '2014-02-01'
ORDER BY EmpCode, PromotionDate DESC
The query first filters out any records having a PromotionDate past the given date, i.e. '2014-02-01'.
Using DISTINCT ON (EmpCode), we get one row per EmpCode: the one having the most recent PromotionDate (achieved by placing PromotionDate DESC in the ORDER BY clause).
Demo here
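For readers outside PostgreSQL: DISTINCT ON is Postgres-specific, but the same "latest promotion per employee up to a cut-off date" can be expressed portably with ROW_NUMBER(). A sketch in SQLite via Python, using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mytable (EmpCode INT, DesignationCode INT, PromotionDate TEXT);
INSERT INTO mytable VALUES (101,50,'2010-01-25'),(101,10,'2014-01-01'),
                           (101,11,'2015-01-01'),(102,10,'2009-10-01'),
                           (103,15,'2015-01-01');
""")

# ROW_NUMBER() numbers each employee's promotions newest-first among those
# on or before the cut-off; rn = 1 keeps the effective designation.
rows = con.execute("""
SELECT EmpCode, DesignationCode, PromotionDate
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY EmpCode
                            ORDER BY PromotionDate DESC) AS rn
  FROM mytable
  WHERE PromotionDate <= '2014-02-01'
)
WHERE rn = 1
ORDER BY EmpCode
""").fetchall()
print(rows)
```

Employee 103's only promotion is after the cut-off, so they drop out entirely, matching the expected result.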

Spotfire date difference using over function

I have the following data set:

Item   Date         Client ID   Date difference
A      12/12/2014   102
A      13/12/2014   102         1
B      12/12/2014   141
B      17/12/2014   141         5

I would like to calculate the difference between the two dates when the client ID is the same. What expression can I use in a calculated column to get that value?
UPDATE:
These would be the intended calculated values. My table has approximately 300,000 records in no particular order. Would I have to sort the physical table before using this formula? I took this example from another one I found; my actual file has no Item column, only the client ID and the date of the transaction. Thanks again for the help!

ClientId   Date         Days
102        2014.12.12   0
102        2014.12.13   1
141        2014.12.12   0
141        2014.12.17   5
123        2014.12.01   0
123        2014.12.02   1
123        2014.12.04   2
I used the following solution to deal with groups that had more than 2 rows/dates.
First create a calculated column to provide a rank order by date within each group:
RankDatePerUnit:
Rank([EventDate],[Group_Name])
Then another calculated column to do the date diff using an over expression to reference the previous date within the group.
TimeSinceLastEvent:
DateDiff("day",
First([EventDate]) OVER (Intersect([Group_Name], Previous([RankDatePerUnit]))),
[EventDate])
Note: duplicate dates could be handled differently by using DenseRank. The above approach will not calculate a zero date diff between two rows from the same group that share the same time; they'll both calculate their delta from an earlier date within the same group if one exists.
EDIT 2015.07.15
Got it. So if you want the difference from the last customer-date pair, this expression will give you the table you've listed above (spacing for readability):

DateDiff('day',
         First([Date]) OVER (Intersect([ClientId], Previous([Date]))),
         [Date]
)
EDIT 2015.07.13
If you want to reduce this so that you can accurately aggregate [Days], you can surround the above expression with an If(). I'll add some spacing to make this more readable:
If(
[Date] = Min([Date]) OVER Intersect([ClientId], [Item]),
DateDiff( 'day',
Min([Date]) OVER Intersect([ClientId], [Item]),
Max([Date]) OVER Intersect([ClientId], [Item])
)
, 0
)
In English: "If the value of the [Date] column in this row matches the earliest date for this [ItemId] and [ClientId] combination, then put the number of days difference between the first and last [Date] for this [ItemId] and [ClientId] combination; otherwise, put zero."
It results in something like:

Item   ClientId   Date         Days
A      102        2014.12.12   1
A      102        2014.12.13   0
B      141        2014.12.12   5
B      141        2014.12.17   0
C      123        2014.12.01   2
C      123        2014.12.02   0
C      123        2014.12.03   0
WARNING: filters may break this calculation. For example, if you are filtering based on [Date] and, with the above table as an example, filter OUT all dates before 2014.12.13, Sum([Days]) will be 7 instead of 8 (because the first row has been filtered out).
You can use Spotfire's OVER functions to look at data points with common IDs across rows.
It looks like you've only got two rows per Client ID and Item ID, which helps us out! Use the following formula:

DateDiff('day', Min([Date]) OVER Intersect([ClientId], [Item]), Max([Date]) OVER Intersect([ClientId], [Item]))

This will give you a column with the number of days difference between the two dates in each row:

Item   ClientId   Date         Days
A      102        2014.12.12   1
A      102        2014.12.13   1
B      141        2014.12.12   5
B      141        2014.12.17   5
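Outside Spotfire, the Previous([Date])-within-group idea from the accepted update is just "sort per client, then diff each row against the prior one". A plain-Python sketch of that logic, using the ClientId/Date sample from the question's update:

```python
from datetime import date
from itertools import groupby

# Rows as (client_id, date) pairs; order is deliberately not guaranteed,
# matching the asker's "300,000 records in no particular order".
rows = [(102, date(2014, 12, 12)), (102, date(2014, 12, 13)),
        (141, date(2014, 12, 12)), (141, date(2014, 12, 17)),
        (123, date(2014, 12, 1)), (123, date(2014, 12, 2)),
        (123, date(2014, 12, 4))]

rows.sort()  # sort by client then date, like Rank([EventDate], [Group_Name])
result = []
for client, grp in groupby(rows, key=lambda r: r[0]):
    prev = None
    for _, d in grp:
        # the first row of each group has no previous date -> 0 days
        result.append((client, d, (d - prev).days if prev else 0))
        prev = d
print(result)
```

This reproduces the intended Days column (0/1 for client 102, 0/1/2 for 123, 0/5 for 141) and also answers the sorting question: the data only needs to be ordered within the calculation, not in the physical table.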
