Date span over tables - sql-server

I want to show all profit centers a user had. The necessary information is in two tables.
TerritoryAssignment
+-------------+----------+------------+------------+--------------+
| TerritoryID | DBUserID | ValidFrom | ValidThru | AssignmentID |
+-------------+----------+------------+------------+--------------+
| T1 | 472 | 2019-03-01 | 2019-12-31 | 1389 |
| T4 | 472 | 2020-01-01 | 2020-10-31 | 2105 |
| T8 | 472 | 2020-11-01 | 2021-09-12 | 2226 |
| T12 | 472 | 2021-09-13 | 2021-11-30 | 2578 |
| T2 | 472 | 2021-12-01 | 9999-12-31 | 2659 |
+-------------+----------+------------+------------+--------------+
TerritoryDetails
+-----------+--------------+------------+------------+--------------------+
| TerritoryID | ProfitCenter | ValidFrom | ValidThru | TerritoryDetailsID |
+-----------+--------------+------------+------------+--------------------+
| T2 | P05 | 2021-12-01 | 2022-04-30 | 983 |
| T2 | P18 | 2022-05-01 | 9999-12-31 | 1029 |
| T1 | P45 | 2012-09-01 | 9999-12-31 | 502 |
| T4 | P23 | 2020-01-01 | 9999-12-31 | 755 |
| T12 | P05 | 2020-01-01 | 9999-12-31 | 846 |
| T8 | P18 | 2020-01-01 | 9999-12-31 | 956 |
+-----------+--------------+------------+------------+--------------------+
Both tables are joined over the field TerritoryID.
As you can see, the user had profit center P18 in two different time periods. Therefore a simple MIN and MAX with grouping is not possible. Also note that there are two records for profit center P05, which should be aggregated into one.
I want to get all Profit Centers a user had in the past including the exact start date and end date. The output could look like this:
+----------+------------+------------+--------------+
| DBUserID | ValidFrom | ValidThru | ProfitCenter |
+----------+------------+------------+--------------+
| 472 | 2019-03-01 | 2019-12-31 | P45 |
| 472 | 2020-01-01 | 2020-10-31 | P23 |
| 472 | 2020-11-01 | 2021-09-12 | P18 |
| 472 | 2021-09-13 | 2022-04-30 | P05 |
| 472 | 2022-05-01 | 9999-12-31 | P18 |
+----------+------------+------------+--------------+
The problem is:
A user can have different territory assignments over time -> multiple entries in table TerritoryAssignment for a user.
Territory details can change -> multiple entries in table TerritoryDetails for a territory.
The profit center can be the same over several TerritoryDetails records.
The profit center can appear in another territory as well.
The start and end dates in the records of both tables are independent and therefore cannot be used for a join.
I already tried some CTEs with ROW_NUMBER() but was not successful. Here is my last try, which does not give the correct result:
SELECT
DBUserID,
ProfitCenter,
MIN(ValidFrom) AS IslandStartDate,
MAX(ValidThru) AS IslandEndDate
FROM
(
SELECT
*,
CASE WHEN Groups.PreviousEndDate >= ValidFrom THEN 0 ELSE 1 END AS IslandStartInd,
SUM(CASE WHEN Groups.PreviousEndDate >= ValidFrom THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY DBUserID,ProfitCenter,TD.ValidFrom,TD.ValidThru) AS RN,
DBUserID,
ProfitCenter,
IIF(TA.ValidFrom<TD.ValidFrom,TD.ValidFrom,TA.ValidFrom) AS ValidFrom,
IIF(TA.ValidThru>TD.ValidThru,TD.ValidThru,TA.ValidThru) AS ValidThru,
LAG(TD.ValidFrom,1) OVER (ORDER BY DBUserID,ProfitCenter,TD.ValidFrom,TD.ValidThru) AS PreviousEndDate
FROM
dbo.TerritoryDetails TD INNER JOIN dbo.TerritoryAssignment TA ON TD.TerritoryID=TA.TerritoryID
WHERE TA.DBUserID=472
AND (
(TD.ValidFrom<=TA.ValidFrom AND TD.ValidThru>=TA.ValidFrom)
OR (TD.ValidFrom>=TA.ValidFrom AND TD.ValidThru<=TA.ValidThru)
OR (TD.ValidFrom<=TA.ValidThru AND TD.ValidThru>=TA.ValidThru)
)
) Groups
) Islands
GROUP BY
IslandId,DBUserID,ProfitCenter
ORDER BY
IslandStartDate desc
Can anybody help?

The first problem here is finding intervals of time which overlap.
This actually turns out to be a fairly trivial expression. Given two intervals T1 and T2, they overlap if:
T1.start <= T2.end and T1.end >= T2.start
Here is a good explanation of why that works
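As a quick sanity check of that predicate outside the database, here is a minimal Python sketch. The function names `overlaps` and `intersection` are made up for illustration, and both ends of each interval are assumed to be inclusive, as they appear to be in the tables above.

```python
from datetime import date

def overlaps(start1, end1, start2, end2):
    """Closed intervals overlap when each one starts no later than the other ends."""
    return start1 <= end2 and end1 >= start2

def intersection(start1, end1, start2, end2):
    """Return the overlapping part of two closed intervals, or None if disjoint."""
    if not overlaps(start1, end1, start2, end2):
        return None
    # The overlap runs from the later start to the earlier end.
    return max(start1, start2), min(end1, end2)

# Assignment of T2 to user 472 versus the two TerritoryDetails rows for T2.
assignment = (date(2021, 12, 1), date(9999, 12, 31))
p05 = (date(2021, 12, 1), date(2022, 4, 30))
p18 = (date(2022, 5, 1), date(9999, 12, 31))

print(intersection(*assignment, *p05))
print(intersection(*assignment, *p18))
```

Taking the later start and the earlier end is the same clipping that the two `iif` expressions in the query perform.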
The next problem is that once we have constructed the overlapped intervals, those intervals may be consecutive, i.e., one interval starts the day after a prior interval ends. You want to "concatenate" those intervals. An example of this can be seen in your sample data for P05, where the following intervals exist:
2021-09-13 to 2021-11-30
2021-12-01 to 2022-04-30
And your desired result shows the total interval 2021-09-13 to 2022-04-30
We can use some running-sum "magic" to find consecutive intervals and merge them together.
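To make the running-sum idea concrete, here is a small Python sketch of the same steps applied to the P05 intervals. It is only an illustration of the logic, not the query itself; the helper names `gap_flags` and `merge_consecutive` are hypothetical.

```python
from datetime import date, timedelta
from itertools import accumulate

# Clipped intervals for user 472 and profit center P05, ordered by start date.
intervals = [
    (date(2021, 9, 13), date(2021, 11, 30)),
    (date(2021, 12, 1), date(2022, 4, 30)),
]

def gap_flags(rows):
    """1 when a row does not start the day after the previous row ends, else 0."""
    flags = []
    previous_end = None
    for start, end in rows:
        consecutive = (previous_end is not None
                       and previous_end == start - timedelta(days=1))
        flags.append(0 if consecutive else 1)
        previous_end = end
    return flags

def merge_consecutive(rows):
    """Group rows by the running sum of gap flags; take min start and max end per group."""
    group_ids = list(accumulate(gap_flags(rows)))
    merged = {}
    for (start, end), group in zip(rows, group_ids):
        old_start, old_end = merged.get(group, (start, end))
        merged[group] = (min(old_start, start), max(old_end, end))
    return [merged[g] for g in sorted(merged)]

print(gap_flags(intervals))          # flags per row
print(merge_consecutive(intervals))  # one merged interval for P05
```

Rows that share a running-sum value belong to the same "island". The two P18 intervals are not consecutive, so each gets its own running-sum value and they stay separate.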
Here is the complete logic broken up into distinct CTEs to make it easier to understand. Comments inline to explain the logic.
If you insert a select for each CTE in turn and look at the output up to that point it may help with understanding the overall logic.
create table #territoryassignment
(
TerritoryID char(3),
DBUserID int,
ValidFrom date,
ValidThru date,
AssignmentID int
);
create table #territoryDetails
(
TerritoryID char(3),
ProfitCenter char(3),
ValidFrom date,
ValidThru date,
TerritoryDetailsID int
);
insert #territoryassignment values
('T1 ', 472, '2019-03-01', '2019-12-31', 1389),
('T4 ', 472, '2020-01-01', '2020-10-31', 2105),
('T8 ', 472, '2020-11-01', '2021-09-12', 2226),
('T12', 472, '2021-09-13', '2021-11-30', 2578),
('T2 ', 472, '2021-12-01', '9999-12-31', 2659)
insert #territoryDetails values
('T2 ', 'P05', '2021-12-01', '2022-04-30', 983 ),
('T2 ', 'P18', '2022-05-01', '9999-12-31', 1029),
('T1 ', 'P45', '2012-09-01', '9999-12-31', 502 ),
('T4 ', 'P23', '2020-01-01', '9999-12-31', 755 ),
('T12', 'P05', '2020-01-01', '9999-12-31', 846 ),
('T8 ', 'P18', '2020-01-01', '9999-12-31', 956 );
-- Find overlapping intervals.
-- An overlap exists if T1.start <= T2.end and T1.end >= T2.start
-- so include that predicate in the join.
-- Where an overlap is found, construct a new interval consisting of the overlapped section
with intervals as
(
select ta.DBUserID,
td.ProfitCenter,
ValidFrom = iif(td.ValidFrom > ta.ValidFrom, td.ValidFrom, ta.ValidFrom),
ValidThru = iif(td.ValidThru < ta.ValidThru, td.ValidThru, ta.ValidThru)
from #territoryAssignment ta
join #territoryDetails td on td.territoryID = ta.territoryId
and td.validFrom <= ta.ValidThru
and td.ValidThru >= ta.ValidFrom
),
-- For each interval,
-- if the prior interval ended yesterday
-- then it is not gapped, otherwise it is.
gaps as
(
select DbUserId,
ProfitCenter,
ValidFrom,
ValidThru,
gap = iif
(
lag(ValidThru) over
(
partition by DBUserId, profitCenter
order by ValidFrom asc
) = dateadd(day, -1, ValidFrom),
0, 1
)
from intervals
),
-- gapCount is the running total number of gaps seen
-- for a given user and profit center
-- in order of the interval start dates.
-- Consecutive intervals will have the same gapCount
-- non consecutive intervals will have an ever increasing gapCount.
grouped as
(
select DBUserId,
ProfitCenter,
ValidFrom,
ValidThru,
gapCount = sum(gap) over
(
partition by DBUserId, ProfitCenter
order by ValidFrom
rows unbounded preceding
)
from gaps
)
-- "Merge" consecutive intervals by grouping on the gapCount
select DBUserId,
ProfitCenter,
ValidFrom = min(ValidFrom),
ValidThru = max(ValidThru)
from grouped
group by DBUserId,
ProfitCenter,
gapCount
order by DBUserId,
ProfitCenter,
ValidFrom;

Related

SQL Server: how min and max blank the other row if they same result and convert 24 hours to 12 hours without where method

I want to join EmpID to EmployeeNO and combine the last name, first name, and middle name from the second table. I also want to split the entries into I (in) and O (out) using MIN and MAX. If a row has no I or O entry, that value should be blank or NULL rather than repeating the same result.
This is the first table:
| Entries | recordDate | Empid | Reference |
+-----------------------+-------------------------+--------+-----------+
| 0016930507201907:35I | 2019-05-07 00:00:00 000 | 001693 | 1693 |
| 0016930507201917:06O | 2019-05-07 00:00:00 000 | 001693 | 1693 |
| 0016930507201907:35I | 2019-05-08 00:00:00 000 | 001693 | 1693 |
| | 2019-05-08 00:00:00 000 | 001693 | 1693 |
2nd table
| LastName | FirstName | middleName | EmployeeNO |
+----------+-----------+------------+------------+
| Cruz | Kimberly | Castillo | 001693 |
I want to join the two tables, combining LastName, FirstName, and middleName from the second table, with EmployeeNO joined to Empid. The entries should be split into I and O using the MIN or MAX for each Empid; if there is no I or O entry, the value should be blank, like this. The query should also work with a WHERE clause:
| Name | EmployeeNO | RecordDate | TimeIn | TimeOut |
+-------------------------+------------+-------------------------+--------+---------+
| CRUZ, MA KIMBERLY, CA | 001693 | 2019-05-07 00:00:00 000 | 07:35 | 05:06 |
| CRUZ, MA KIMBERLY,CA | 001693 | 2019-05-08 00:00:00 000 | 07:35 |
If I add a WHERE clause I get an error. Please help; this is the error:
Conversion failed when converting date and/or time from character string.
The following will do it for you. I use a couple of nested CTEs. The first one does your data conversions, the second one sets the min and max times, and the final query sets any rows that are missing I records to blank.
The test data uses table variables for your Table1 and Table2.
declare @E table(Entries varchar(50), RecordDate varchar(50), EmpID varchar(6), Ref int)
insert @E values ('0016930507201907:35I','2019-05-07 00:00:00 000','001693',1693)
,('0016930507201917:06O','2019-05-07 00:00:00 000','001693',1693)
,('0016930507201907:35I','2019-05-08 00:00:00 000','001693',1693)
,('','2019-05-08 00:00:00 000','001693',1693)
declare @B table(LastName varchar(50),FirstName varchar(50),middleName varchar(50),EmployeeNO varchar(6))
insert @B values ('Cruz','Kimberly','Castillo','001693')
-- e: split each non-blank entry into its I/O flag, record date and HH:MM time
;with e as (
select right(Entries,1) as InOut,
convert(datetime,left(recorddate,10)) as RecordDate, Entries,
substring(Entries,15,5) as t,
EmpID
from @E
where len(ltrim(Entries))>0
)
-- details: earliest I time and latest O time per employee and day
, details as (
select B.LastName + ',' + B.FirstName + ',' + B.MiddleName [Name],
EmployeeNO,
RecordDate,
min(case when InOut='I' then T else '99:99' end) as I,
max(case when InOut='O' then T else '' end) as O
from @B B
join e on e.EmpID=B.EmployeeNO
group by B.LastName,B.FirstName,B.MiddleName,EmployeeNO,RecordDate
)
-- '99:99' is the placeholder for "no I record"; show it as blank
select Name, EmployeeNO, RecordDate, case when I='99:99' then '' else I end as TimeIn, O as TimeOut
from details
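The query returns the times on a 24-hour clock (17:06), while the desired output in the question shows 05:06. As a hedged illustration of the string handling and the 12-hour conversion, here is a short Python sketch. The entry layout (6-character employee id, 8-character date, HH:MM, then the I/O flag) is an assumption inferred from the sample rows, and `parse_entry` and `to_12_hour` are hypothetical helper names.

```python
from datetime import datetime

def parse_entry(entry):
    """Split an entry such as '0016930507201917:06O' into its parts.

    Assumed layout: 6-char employee id, 8-char date, HH:MM time, 1-char I/O flag.
    """
    entry = entry.strip()
    if not entry:
        return None  # blank entries carry no time information
    return {
        "emp_id": entry[:6],
        "time_24h": entry[14:19],  # same slice as substring(Entries, 15, 5)
        "in_out": entry[-1],       # same as right(Entries, 1)
    }

def to_12_hour(time_24h):
    """Convert 'HH:MM' on a 24-hour clock to 'hh:MM' on a 12-hour clock (no AM/PM)."""
    return datetime.strptime(time_24h, "%H:%M").strftime("%I:%M")

parsed = parse_entry("0016930507201917:06O")
print(parsed["in_out"], to_12_hour(parsed["time_24h"]))
```

Dropping the AM/PM marker follows the sample output in the question; whether that is acceptable depends on how the report is read.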

Issue with Running Total Still

I still have an issue with working out the best way to calculate a running balance.
I am going to be using this code in a Rent Statement that I am going to produce in SSRS, but the problem I am having is that I can't seem to work out how to achieve a running balance.
SELECT rt.TransactionId,
rt.TransactionDate,
rt.PostingDate,
rt.AccountId,
rt.TotalValue,
rab.ClosingBalance,
ROW_NUMBER()OVER(PARTITION BY rt.AccountId ORDER BY rt.PostingDate desc) AS row,
CASE WHEN ROW_NUMBER()OVER(PARTITION BY rt.AccountId ORDER BY rt.PostingDate desc) = 1
THEN ISNULL(rab.ClosingBalance,0)
ELSE 0 end
FROM RentTransactions rt
--all accounts for the specific agreement
INNER JOIN (select raa.AccountId
from RentAgreementEpisode rae
inner join RentAgreementAccount raa on raa.AgreementEpisodeId = rae.AgreementEpisodeId
where rae.AgreementId=1981
) ij on ij.AccountId = rt.AccountId
LEFT JOIN RentBalance rab on rab.AccountId = rt.AccountId AND rt.PostingDate BETWEEN rab.BalanceFromDate AND isnull(rab.BalanceToDate,dateadd(day, datediff(day, 0, GETDATE()), 0))
What this gives me are the below results- I have included the results below -
So my code is sorting my transactions in the order I want and also is row numbering them in the correct order as well.
Where the Row Number is 1, I need it to pull back the balance on that account at that point in time, which is what I am doing. BUT I am then unsure how to get my code to start subtracting the value of each following row. In this case, the current figure of 1118.58 would need the Total Value in Row 2 (91.65) subtracted from it, so the running balance for row 2 would be 1026.93, and so on.
Any help would be greatly appreciated.
Assuming your query returns all the transactions, you can calculate a running total using the OVER clause. You just need to start at the beginning of your dataset rather than working backwards from your current balance:
declare @t table(d date,v decimal(10,2));
insert into @t values ('20170101',10),('20170102',20),('20170103',30),('20170104',40),('20170105',50),('20170106',60),('20170107',70),('20170108',80),('20170109',90);
select *
,sum(v) over (order by d
rows between unbounded preceding
and current row
) as RunningTotal
from @t
order by d desc
Output:
+------------+-------+--------------+
| d | v | RunningTotal |
+------------+-------+--------------+
| 2017-01-09 | 90.00 | 450.00 |
| 2017-01-08 | 80.00 | 360.00 |
| 2017-01-07 | 70.00 | 280.00 |
| 2017-01-06 | 60.00 | 210.00 |
| 2017-01-05 | 50.00 | 150.00 |
| 2017-01-04 | 40.00 | 100.00 |
| 2017-01-03 | 30.00 | 60.00 |
| 2017-01-02 | 20.00 | 30.00 |
| 2017-01-01 | 10.00 | 10.00 |
+------------+-------+--------------+
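If the statement really has to start from the closing balance on the newest row and work backwards, as the question describes, the same cumulative idea applies in reverse. Below is a minimal Python sketch of that arithmetic, assuming the rows are already sorted newest first. Only 1118.58 and 91.65 come from the question; the other values and the function name `backward_balances` are made up for illustration.

```python
from decimal import Decimal

def backward_balances(closing_balance, values_newest_first):
    """Row 1 shows the closing balance; each later row subtracts its own value,
    following the arithmetic described in the question."""
    balances = []
    balance = closing_balance
    for position, value in enumerate(values_newest_first):
        if position > 0:
            balance -= value
        balances.append(balance)
    return balances

# First and third values are hypothetical; 91.65 is row 2 from the question.
values = [Decimal("50.00"), Decimal("91.65"), Decimal("20.00")]
print(backward_balances(Decimal("1118.58"), values))
```

In the query this would roughly correspond to subtracting a windowed sum of TotalValue, ordered by PostingDate descending and leaving out the newest row's own value, from the closing balance.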

How to (Dirty) Pair DateTimes Across Two Tables

I am looking at a SQL Server 2008 Database with two Tables, each with a PK (INT) column and a DateTime column.
There is no explicit relationship between the Tables, except I know the application has a heuristic tendency to insert to the database in pairs, one row into each Table, with DateTimes that seem to never match exactly but are usually pretty close.
I am trying to match back up the PKs in each table by finding the closest matching DateTime in the other table. Each PK can only be used once for this matching.
What is the best way to do this?
EDIT: Sorry, please find at bottom some example input and desired output.
+-------+-------------------------+
| t1.PK | t1.DateTime |
+-------+-------------------------+
| 1 | 2016-08-11 00:11:03.000 |
| 2 | 2016-08-11 00:11:08.000 |
| 3 | 2016-08-11 11:03:00.000 |
| 4 | 2016-08-11 11:08:00.000 |
+-------+-------------------------+
+-------+-------------------------+
| t2.PK | t2.DateTime |
+-------+-------------------------+
| 1 | 2016-08-11 11:02:00.000 |
| 2 | 2016-08-11 00:11:02.000 |
| 3 | 2016-08-11 22:00:00.000 |
| 4 | 2016-08-11 11:07:00.000 |
| 5 | 2016-08-11 00:11:07.000 |
+-------+-------------------------+
+-------+-------+-------------------------+-------------------------+
| t1.PK | t2.PK | t1.DateTime | t2.DateTime |
+-------+-------+-------------------------+-------------------------+
| 1 | 2 | 2016-08-11 00:11:03.000 | 2016-08-11 00:11:02.000 |
| 2 | 5 | 2016-08-11 00:11:08.000 | 2016-08-11 00:11:07.000 |
| 3 | 1 | 2016-08-11 11:03:00.000 | 2016-08-11 11:02:00.000 |
| 4 | 4 | 2016-08-11 11:08:00.000 | 2016-08-11 11:07:00.000 |
+-------+-------+-------------------------+-------------------------+
JOIN to the row with lowest DATEDIFF (in seconds) between t1.DateTime and t2.DateTime.
You can achieve the result you are looking for by cross joining table 1 with table 2 and then getting the difference between the dates in seconds, as per Tab Alleman’s suggestion. The next step is to rank each match using the ROW_NUMBER() function. The final step is to select only the rows where Rank = 1.
The following example demonstrates using your example data:
DECLARE @t1 TABLE
(
ID INT PRIMARY KEY
,[DateTime] DATETIME
);
DECLARE @t2 TABLE
(
ID INT PRIMARY KEY
,[DateTime] DATETIME
);
INSERT INTO @t1
(
ID
,[DateTime]
)
VALUES
(1 ,'2016-08-11 00:11:03.000'),
(2 ,'2016-08-11 00:11:08.000'),
(3 ,'2016-08-11 11:03:00.000'),
(4 ,'2016-08-11 11:08:00.000');
INSERT INTO @t2
(
ID
,[DateTime]
)
VALUES
(1, '2016-08-11 11:02:00.000'),
(2, '2016-08-11 00:11:02.000'),
(3, '2016-08-11 22:00:00.000'),
(4, '2016-08-11 11:07:00.000'),
(5, '2016-08-11 00:11:07.000');
WITH CTE_DateDifference
AS
(
SELECT t1.ID AS T1_ID
,t2.ID AS T2_ID
,t1.[DateTime] AS T1_DateTime
,t2.[DateTime] AS T2_DateTime
,ABS(DATEDIFF(SECOND, t1.[DateTime], t2.[DateTime])) AS Duration -- Determine the difference between the dates in seconds.
FROM @t1 t1
CROSS JOIN @t2 t2
),CTE_RankDateMatch
AS
(
SELECT T1_ID
,T2_ID
,T1_DateTime
,T2_DateTime
,ROW_NUMBER() OVER (PARTITION BY T1_ID ORDER BY Duration) AS [Rank] -- Rank each match by the duration between the dates, so the row numbered 1 is the closest t2 match for each t1 row.
FROM CTE_DateDifference
)
-- Finally select out the rows with a Rank equal to 1.
SELECT *
FROM CTE_RankDateMatch
WHERE [Rank] = 1
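Note that the ranking above is partitioned by T1_ID only, so in principle the same t2 row could come out as the closest match for two different t1 rows. The question asks for each PK to be used only once. One simple way to enforce that is a greedy pass over all candidate pairs, sketched here in Python on the sample data. The greedy strategy and the name `pair_closest` are assumptions for illustration; a greedy pass is not guaranteed to give the globally best matching on every dataset.

```python
from datetime import datetime

def pair_closest(t1, t2):
    """Greedily pair ids from t1 and t2 by smallest time difference, using each id once."""
    candidates = sorted(
        (abs((d1 - d2).total_seconds()), id1, id2)
        for id1, d1 in t1.items()
        for id2, d2 in t2.items()
    )
    used1, used2, pairs = set(), set(), {}
    for _seconds, id1, id2 in candidates:
        if id1 not in used1 and id2 not in used2:
            pairs[id1] = id2
            used1.add(id1)
            used2.add(id2)
    return pairs

def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

t1 = {1: ts("2016-08-11 00:11:03"), 2: ts("2016-08-11 00:11:08"),
      3: ts("2016-08-11 11:03:00"), 4: ts("2016-08-11 11:08:00")}
t2 = {1: ts("2016-08-11 11:02:00"), 2: ts("2016-08-11 00:11:02"),
      3: ts("2016-08-11 22:00:00"), 4: ts("2016-08-11 11:07:00"),
      5: ts("2016-08-11 00:11:07")}

print(pair_closest(t1, t2))  # maps each t1 id to its paired t2 id
```

On the sample data this gives the same pairs as the desired output in the question, and t2 row 3 stays unmatched.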

Complex SQL query, not sure where to start

I have a tough one here I think. I have the following tables:
[Assets]
AssetId | Name
1 | Acura NSX
2 | Dodge Ram
[Assignments]
AssignmentId | AssetId | StartMileage | EndMileage | StartDate | EndDate
1 | 1 | 8000 | 10000 | 4/1/2015 | 5/1/2015
2 | 1 | 10000 | 16000 | 9/15/2015 | 1/5/2016
3 | 2 | 51000 | NULL | 1/1/2016 | NULL
[Reminders]
ReminderId | AssetId | Name | Distance | Time | Active
1 | 1 | Oil Change | 3000 (miles)| 3 (months)| 1
2 | 1 | Tire Rotation | 5000 | 6 | 0
3 | 2 | Oil Change | 3000 | 3 | 1
4 | 2 | Air Filter | 50000 | 48 | 1
[Maintenance]
MaintenanceId | AssetId | ReminderId | Mileage | Date | Vendor
1 | 1 | 1 | 10000 | 5/1/2015 | Jiffy Lube
2 | 2 | 3 | 51000 | 6/1/2015 | Dealership
I need a query that will join these 4 tables and return something like the following.
Name | Name | Current Mileage | Last Mileage | Last Date
Acura NSX | Oil Change | 16000 | 10000 | 5/1/2015
Dodge RAM | Oil Change | 51000 | 51000 | 6/1/2015
Dodge RAM | Air Filter | 51000 | -- | --
I need to take the distance threshold from the Reminders table and add it to the mileage from the Maintenance table then compare it to the start and end mileage from the Assignments table. If the threshold is greater than the start or end mileage then select the asset name, the name of the reminder, the current mileage (start or end mileage from Assignments, whichever is greater), and mileage and date from the last maintenance for that reminder. I need to do the same for time threshold. Add it to the date from the Maintenance table then compare it to today's date. If it's greater then display the asset.
Can one of you SQL gurus help me with this please?
UPDATE:
SELECT
v.Name,
r.Name AS Reminder,
a.CurrentMileage,
i.MaintenanceMileage,
i.MaintenanceDate
FROM
Assets v
LEFT JOIN
(SELECT AssetId,
COALESCE(EndMileage, StartMileage) AS CurrentMileage,
ROW_NUMBER() OVER (PARTITION BY AssetId
ORDER BY AssignmentId DESC) AS window_id
FROM Assignments) a
ON v.AssetId = a.AssetId
AND a.window_id = 1
JOIN
Reminders r
ON v.AssetId = r.AssetId
AND r.ActiveFlag = 1
LEFT JOIN
(SELECT AssetId,
ReminderId,
MAX(Mileage) AS MaintenanceMileage,
MAX([Date]) AS MaintenanceDate
FROM Maintenances
GROUP BY AssetId, ReminderId) i
ON r.ReminderId = i.ReminderId
AND (a.CurrentMileage > (NULLIF(i.MaintenanceMileage, 0) + r.DistanceThreshold))
OR (GETDATE() > DATEADD(m, r.[TimeThreshold], i.MaintenanceDate))
Here is a starting point:
SELECT v.Name AS [Asset Name], r.Name AS Reminder, a.CurrentMileage,
m.Mileage + r.Distance AS [Last Mileage], m.[Date] AS [Last Date]
FROM Assets v
JOIN ( -- get the latest relevant row as window_id = 1
SELECT AssetId, COALESCE(EndMileage, StartMileage) AS CurrentMileage,
COALESCE(EndDate, StartDate) AS AssignDate,
ROW_NUMBER() OVER (partition by AssetId
order by COALESCE(EndDate, StartDate) DESC) AS window_id
FROM Assignments
) a
ON v.AssetId = a.AssetId
AND a.window_id = 1
JOIN Reminders r
ON v.AssetId = r.AssetId
AND r.Active = 1
LEFT JOIN Maintenance m
ON r.AssetId = m.AssetId
AND r.ReminderId = m.ReminderId
-- corrected
AND ((a.CurrentMileage > (NULLIF(m.Mileage, 0) + r.Distance))
-- slightly oversimplified
OR (GETDATE() > DATEADD(m, r.[Time], COALESCE(m.[Date], a.AssignDate))))
The date calculations are slightly oversimplified because they use the latest assignment dates. What you would really want is a column Assets.InServiceDate that would anchor the time before the first maintenance would be due. But this will get you started.

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is already ordered and separated for understanding):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
what is ActionCode? we use this in c# and there it represents an enum-value
what do i want to achieve?
well, i need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
well, what is the actual logic?
we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
it is not possible to have following datapool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
i don't even know how/where to start ... how can i achieve this?
we are running on mssql 2005+
EDIT:
there's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)
I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note that I'm using all kinds of reserved names here)
The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. The rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
finalranks AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
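For comparison, the partition rule from the question (a partition starts at Create or CreateSystem and runs until the next such row) can be walked through in a single ordered pass. The following Python sketch uses the sample rows from the question and is only meant to show the expected grouping, not to replace either query. It assumes CreationTS values are distinct within each FK1/FK2 group, and `last_value_per_partition` is a hypothetical name.

```python
from itertools import groupby

STARTERS = {"Create", "CreateSystem"}

# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
rows = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

def last_value_per_partition(rows):
    """Return (FK1, FK2, starting ActionCode, last SomeAttributeValue) per partition."""
    result = []
    # ISO-style timestamp strings sort chronologically as plain text.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[4]))
    for (fk1, fk2), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        current = None
        for _pk, _fk1, _fk2, action, _ts, value in group:
            if action in STARTERS or current is None:
                current = [fk1, fk2, action, value]  # a new partition begins
                result.append(current)
            else:
                current[3] = value  # a Change overwrites the partition's value
    return [tuple(r) for r in result]

for line in last_value_per_partition(rows):
    print(line)
```

The rows come out in chronological order within each key, so the Z partition is listed before the H partition; apart from ordering, the result matches the desired output in the question.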
