I am using OVER with PARTITION BY to get the min date and max date of a dataset.
|ResdetId | bookingdate | Amount | AmountExcl |
-----------------------------------------------
|120106 | 2018-02-04 | 75.00 | 70.7547 |
|120106 | 2018-02-05 | 75.00 | 70.7547 |
|120106 | 2018-02-06 | 90.00 | 84.9057 |
|120106 | 2018-02-08 | 75.00 | 70.7547 |
|120106 | 2018-02-09 | 75.00 | 70.7547 |
I am using this query
select distinct ResDetId, Amount, AmountExcl,
min(Bookingdate) OVER(Partition by ResDetId, Amount, AmountExcl) as Mindate,
max(Bookingdate) OVER(Partition by ResDetId, Amount, AmountExcl) as MaxDate
from #Cumulatedbookingdetails
And I am getting this result
|ResdetId | Amount | AmountExcl | MinDate | MaxDate |
------------------------------------------------------------
|120106 | 75.00 | 70.7547 | 2018-02-04 | 2018-02-09 |
|120106 | 90.00 | 84.9057 | 2018-02-06 | 2018-02-06 |
As you can see, there is no record for 2018-02-07 in the data set, so I need a result like this:
|ResdetId | Amount | AmountExcl | MinDate | MaxDate |
------------------------------------------------------------
|120106 | 75.00 | 70.7547 | 2018-02-04 | 2018-02-05 |
|120106 | 75.00 | 70.7547 | 2018-02-08 | 2018-02-09 |
|120106 | 90.00 | 84.9057 | 2018-02-06 | 2018-02-06 |
One way to approach an "Islands and Gaps" problem, such as this, is to use a recursive CTE to build up the islands. We make the non-recursive portion (above the union) find the row which marks the start of each island, and the recursive part grows each island one match at a time.
The final results of the CTE unfortunately contain all of the intermediate rows used in building the islands, so you need a final GROUP BY to select the completed islands:
declare @t table (ResdetId int, bookingdate date, Amount decimal(9,3), AmountExcl decimal (9,3))
insert into @t(ResdetId,bookingdate,Amount,AmountExcl) values
(120106,'20180204',75.00,70.7547),
(120106,'20180205',75.00,70.7547),
(120106,'20180206',90.00,84.9057),
(120106,'20180208',75.00,70.7547),
(120106,'20180209',75.00,70.7547)
;With Islands as (
select ResdetId, Amount, AmountExcl,bookingdate as MinDate,bookingDate as MaxDate
from @t t
where not exists (select * from @t t2
where t2.ResdetId = t.ResdetId
and t2.Amount = t.Amount
and t2.AmountExcl = t.AmountExcl
and t2.bookingdate = DATEADD(day,-1,t.BookingDate))
union all
select i.ResdetId, i.Amount,i.AmountExcl,i.MinDate,t.bookingDate
from Islands i
inner join
@t t
on t.ResdetId = i.ResdetId
and t.Amount = i.Amount
and t.AmountExcl = i.AmountExcl
and t.bookingdate = DATEADD(day,1,i.MaxDate)
)
select
ResdetId, Amount, AmountExcl,MinDate,MAX(MaxDate) as MaxDate
from
Islands
group by ResdetId, Amount, AmountExcl,MinDate
Results:
ResdetId Amount AmountExcl MinDate MaxDate
----------- --------- ------------ ---------- ----------
120106 75.000 70.755 2018-02-04 2018-02-05
120106 75.000 70.755 2018-02-08 2018-02-09
120106 90.000 84.906 2018-02-06 2018-02-06
The gap at 2018-02-07 does not show up because bookingdate is not part of your PARTITION BY, so
|ResdetId | Amount | AmountExcl
--------------------------------
|120106 | 75.00 | 70.7547
|120106 | 90.00 | 84.9057
are the only distinct combinations in your partition, so they act like a key. You need another attribute to tell apart the separate runs of the same data:
|ResdetId | Amount | AmountExcl
--------------------------------
|120106 | 75.00 | 70.7547
This would be much easier to do with GROUP BY. OVER and DISTINCT are much "harder" ways to do the same query:
WITH VTE AS(
SELECT ResdetId,
CONVERT(date,bookingdate) AS bookingdate,
Amount,
AmountExcl
FROM (VALUES (120106,'20180204',75.00,70.7547),
(120106,'20180205',75.00,70.7547),
(120106,'20180206',90.00,84.9057),
(120106,'20180208',75.00,70.7547),
(120106,'20180209',75.00,70.7547)) V(ResdetId,bookingdate,Amount,AmountExcl))
SELECT ResdetId,Amount,AmountExcl,
MIN(bookingdate) AS MinBookingDate,
MAX(bookingdate) AS MaxBookingDate
FROM VTE
GROUP BY ResdetId,Amount,AmountExcl;
As noted by Sami, I had read the results the wrong way round; this is a gaps-and-islands question:
WITH VTE AS(
SELECT ResdetId,
CONVERT(date,bookingdate) AS bookingdate,
Amount,
AmountExcl
FROM (VALUES (120106,'20180204',75.00,70.7547),
(120106,'20180205',75.00,70.7547),
(120106,'20180206',90.00,84.9057),
(120106,'20180208',75.00,70.7547),
(120106,'20180209',75.00,70.7547)) V(ResdetId,bookingdate,Amount,AmountExcl)),
Grps AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ResdetId ORDER BY V.bookingdate) -
ROW_NUMBER() OVER (PARTITION BY ResdetId, Amount ORDER BY V.bookingdate) AS Grp
FROM VTE V)
SELECT ResdetId,
Amount,
AmountExcl,
MIN(bookingdate) AS MinBookingDate,
MAX(bookingdate) AS MaxBookingDate
FROM Grps
GROUP BY ResdetId,
Amount,
AmountExcl,
Grp
ORDER BY ResdetId,
Amount,
MinBookingDate;
Try this; it uses the row-number difference technique:
declare @tbl table(ResdetId int, bookingdate date, Amount float, AmountExcl float);
insert into @tbl values
(120106 , '2018-02-04' , 75.00 , 70.7547 ),
(120106 , '2018-02-05' , 75.00 , 70.7547 ),
(120106 , '2018-02-06' , 90.00 , 84.9057 ),
(120106 , '2018-02-08' , 75.00 , 70.7547 ),
(120106 , '2018-02-09' , 75.00 , 70.7547 );
select MIN(bookingDate), MAX(bookingDate), Amount, AmountExcl
from (
select *,
ROW_NUMBER() over (order by bookingDate) -
ROW_NUMBER() over (partition by amount, AmountExcl order by bookingDate) rn
from @tbl
) a group by Amount, AmountExcl, rn
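Note that the row-number difference starts a new group when the amount changes between rows. It would not, on its own, split a run whose only interruption is a missing date. Below is a minimal sketch of a variant that groups on the dates themselves. It assumes at most one row per date within each ResdetId/Amount/AmountExcl combination; the table variable @bookings and the alias grp are illustrative names, not part of the original question.

```sql
-- Illustrative table variable holding the sample rows from the question.
DECLARE @bookings TABLE (ResdetId int, bookingdate date, Amount decimal(9,2), AmountExcl decimal(9,4));

INSERT INTO @bookings (ResdetId, bookingdate, Amount, AmountExcl) VALUES
(120106, '20180204', 75.00, 70.7547),
(120106, '20180205', 75.00, 70.7547),
(120106, '20180206', 90.00, 84.9057),
(120106, '20180208', 75.00, 70.7547),
(120106, '20180209', 75.00, 70.7547);

SELECT ResdetId, Amount, AmountExcl,
       MIN(bookingdate) AS MinDate,
       MAX(bookingdate) AS MaxDate
FROM (
    SELECT ResdetId, bookingdate, Amount, AmountExcl,
           -- Consecutive dates minus an increasing row number give the same
           -- anchor date; a missing day shifts the anchor and starts a new island.
           DATEADD(day,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY ResdetId, Amount, AmountExcl
                                            ORDER BY bookingdate) AS int),
                   bookingdate) AS grp
    FROM @bookings
) AS a
GROUP BY ResdetId, Amount, AmountExcl, grp
ORDER BY ResdetId, Amount, MinDate;
```

With the sample rows this should give 2018-02-04 to 2018-02-05 and 2018-02-08 to 2018-02-09 for amount 75.00, and 2018-02-06 to 2018-02-06 for amount 90.00.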
I’ve got a table containing a list of patient appointments: the clinic they attended, and the date of their attendance.
I’m trying to write a query that gives me the following:
‘Which patients attended clinic ‘123-45’ at any point during the period April 2016 – March 2017, and what were the subsequent 2 appointments (the appointment date and clinic attended) for that patient’?
I’ve tried to come at this by first querying out the list of patient ID numbers for all those patients that attended clinic ‘123-45’ during the time frame, and then putting this list of Patient IDs into a WHERE clause and using ROW_NUMBER() OVER (PARTITION BY… to give me an ordered list of all appointments for each patient during the 12 month period.
SELECT
x.Patient_Id
,x.Clinic_Code
,x.Appointment_Date
,x.Row_No FROM
(
SELECT
Patient_Id
,Clinic_Code
,Appointment_Date
,ROW_NUMBER() OVER (PARTITION BY Patient_Id ORDER BY Patient_Id, Appointment_Date asc) [Row_No]
FROM
Appointments
WHERE
Appointment_Date BETWEEN '01/10/2016' AND '30/09/2017'
AND Patient_ID = 'BLO123'
) x
WHERE x.Row_No < 4
However, this has the unintended consequence of numbering any appointments that occurred prior to the clinic ‘123-45’ attendance.
So, if the following is my source:
Patient_ID | Clinic_Code | Appointment_Date
--------------------------------------------
BLO123 | QWE-QW | 01-04-2016
BLO123 | OPD-ZZ | 05-10-2016
BLO123 | 123-45 | 13-11-2016
BLO123 | 333-44 | 15-12-2016
BLO123 | 999-45 | 02-02-2017
BLO123 | 222-44 | 15-02-2017
BLO123 | 777-45 | 19-03-2017
What I'm trying to get is:
Patient_ID | Clinic_Code | Appointment_Date | Row_No
--------------------------------------------------------------
BLO123 | 123-45 | 13-11-2016 | 1
BLO123 | 333-44 | 15-12-2016 | 2
BLO123 | 999-45 | 02-02-2017 | 3
But by including the preceding appointments within the date range, I'm instead getting:
Patient_ID | Clinic_Code | Appointment_Date | Row_No
--------------------------------------------------------------
BLO123 | QWE-QW | 01-04-2016 | 1
BLO123 | OPD-ZZ | 05-10-2016 | 2
BLO123 | 123-45 | 13-11-2016 | 3
What I would like the query to do is ignore any clinic appointments that precede the ‘123-45’ attendance.
Please can anyone advise if it's possible to do this?
This approach uses a common table expression (CTE) to find the first appointment each patient has at clinic 123-45. The main body of the query returns all subsequent appointments.
Sample data:
DECLARE @Appointment TABLE
(
Patient_ID varchar(6),
Clinic_code varchar(6),
Appointment_Date date
)
;
INSERT INTO @Appointment
(
Patient_ID,
Clinic_code,
Appointment_Date
)
VALUES
('BLO123','QWE-QW','20160401'),
('BLO123','OPD-ZZ','20161005'),
('BLO123','123-45','20161113'),
('BLO123','333-44','20161215'),
('BLO123','999-45','20170202')
;
Query:
WITH
FirstAppointment AS
(
-- Find each patient's first visit to clinic 123-45.
SELECT
Patient_ID,
MIN(Appointment_Date) AS FirstAppointment_Date
FROM
@Appointment
WHERE
Appointment_Date >= '20160401'
AND Appointment_Date <= '20170331'
AND Clinic_code = '123-45'
GROUP BY
Patient_ID
)
SELECT
ROW_NUMBER() OVER (PARTITION BY a.Patient_ID ORDER BY a.Appointment_Date) AS Rn,
a.*
FROM
FirstAppointment AS fa
INNER JOIN @Appointment AS a ON a.Patient_ID = fa.Patient_ID
AND a.Appointment_Date >= fa.FirstAppointment_Date
;
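The query above returns every appointment from the first 123-45 visit onwards. To keep only that visit and the two that follow, the numbering can be wrapped in a second CTE and filtered. This is a sketch rather than a tested drop-in: the CTE name Numbered is illustrative, and the sample rows include the two later appointments from the question so the cut-off is visible.

```sql
DECLARE @Appointment TABLE
(
    Patient_ID       varchar(6),
    Clinic_code      varchar(6),
    Appointment_Date date
);

INSERT INTO @Appointment (Patient_ID, Clinic_code, Appointment_Date)
VALUES
    ('BLO123','QWE-QW','20160401'),
    ('BLO123','OPD-ZZ','20161005'),
    ('BLO123','123-45','20161113'),
    ('BLO123','333-44','20161215'),
    ('BLO123','999-45','20170202'),
    ('BLO123','222-44','20170215'),
    ('BLO123','777-45','20170319');

WITH FirstAppointment AS
(
    -- First visit to clinic 123-45 inside the reporting period, per patient.
    SELECT Patient_ID, MIN(Appointment_Date) AS FirstAppointment_Date
    FROM @Appointment
    WHERE Appointment_Date >= '20160401'
      AND Appointment_Date <= '20170331'
      AND Clinic_code = '123-45'
    GROUP BY Patient_ID
),
Numbered AS
(
    -- Number that visit and everything after it, per patient.
    SELECT ROW_NUMBER() OVER (PARTITION BY a.Patient_ID ORDER BY a.Appointment_Date) AS Rn,
           a.Patient_ID, a.Clinic_code, a.Appointment_Date
    FROM FirstAppointment AS fa
    INNER JOIN @Appointment AS a
        ON a.Patient_ID = fa.Patient_ID
       AND a.Appointment_Date >= fa.FirstAppointment_Date
)
SELECT Patient_ID, Clinic_code, Appointment_Date, Rn AS Row_No
FROM Numbered
WHERE Rn <= 3          -- the 123-45 visit plus the next two appointments
ORDER BY Patient_ID, Rn;
```

If a patient can have several appointments on the same date as the 123-45 visit, the ORDER BY in ROW_NUMBER would need a tie-breaker so that the 123-45 row is numbered first.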
with foo as
(
select
*
from (values
('BLO123','QWE-QW', cast('20160401' as date))
,('BLO123','OPD-ZZ',cast('20161005' as date))
,('BLO123','123-45',cast('20161113' as date))
,('BLO123','333-44',cast('20161215' as date))
,('BLO123','999-45',cast('20170202' as date))
) a(Patient_ID , Clinic_Code , Appointment_Date)
)
,lags as
(
select
*
,lag(Clinic_code,1) over (partition by Patient_id order by Appointment_Date) l1
,lag(Clinic_code,2) over (partition by Patient_id order by Appointment_Date) l2
,ROW_NUMBER() over (partition by Patient_id order by Appointment_Date) rn
from foo
)
select Patient_ID,Clinic_Code,Appointment_Date
,case when Clinic_Code='123-45' then 1
when l1='123-45' then 2
else 3 end Row_Nr
from lags
where '123-45' in (Clinic_Code,l1,l2)
The result:
+----------------------------------------------+
|Patient_ID|Clinic_Code|Appointment_Date|Row_No|
+----------------------------------------------+
|BLO123 |123-45 |2016-11-13 |1 |
|BLO123 |333-44 |2016-12-15 |2 |
|BLO123 |999-45 |2017-02-02 |3 |
+----------------------------------------------+
I have a table in SQL Server with the following data:
+-----------------+-------------------+-------------------+--------+
|Product Family | Product Class | Product | Sales |
|Food | Vegetables | Cauliflower | 24 |
|Food | Prepared Meals | Steak & Patatoes | 54 |
|Food | Fruit | Apples | 76 |
|Food | Fruit | Oranges | 14 |
|Food | Fruit | Pears | 32 |
|Electronics | MP3 Players | Cool Player Z | 57 |
|Electronics | MP3 Players | iStuff 16GB | 45 |
|Electronics | TV's | HD | 96 |
|Electronics | TV's | Ultra HD | 76 |
+-----------------+-------------------+-------------------+--------+
There is a hierarchy in this data:
Product Family
Product Class
Product
I'd like to create a query that will return the sum for each hierarchy level. This union does that:
SELECT 1 as Level, [Product Family] as Item, SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01] GROUP BY [Product Family]
UNION ALL
SELECT 2 as Level, [Product Class] as Item, SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01] GROUP BY [Product Class]
UNION ALL
SELECT 3 as Level, Product as Item, SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01] GROUP BY Product
However, I also require an additional column that will be a concatenation of the 3 string columns, in the order of the hierarchy. The desired output being:
+-------+------------------+-------------------------------------------+-------+
| Level | Item             | Hierarchy                                 | Sales |
| 1     | Electronics      | Electronics                               | 274   |
| 1     | Food             | Food                                      | 200   |
| 2     | Fruit            | Food > Fruit                              | 122   |
| 2     | MP3 Players      | Electronics > MP3 Players                 | 102   |
| 2     | Prepared Meals   | Food > Prepared Meals                     | 54    |
| 2     | TV's             | Electronics > TV's                        | 172   |
| 2     | Vegetables       | Food > Vegetables                         | 24    |
| 3     | Apples           | Food > Fruit > Apples                     | 76    |
| 3     | Cauliflower      | Food > Vegetables > Cauliflower           | 24    |
| 3     | Cool Player Z    | Electronics > MP3 Players > Cool Player Z | 57    |
| 3     | HD               | Electronics > TV's > HD                   | 96    |
| 3     | iStuff 16GB      | Electronics > MP3 Players > iStuff 16GB   | 45    |
| 3     | Oranges          | Food > Fruit > Oranges                    | 14    |
| 3     | Pears            | Food > Fruit > Pears                      | 32    |
| 3     | Steak & Patatoes | Food > Prepared Meals > Steak & Patatoes  | 54    |
| 3     | Ultra HD         | Electronics > TV's > Ultra HD             | 76    |
+-------+------------------+-------------------------------------------+-------+
This is where I get stuck. I can't add all 3 fields to each query in the Union, because then I don't get the right totals by level. But I'm not sure what other avenue to try.
Thanks & Let me know what other info I can supply to clarify the case.
I think you just want a tweak on your query:
SELECT 1 as Level, [Product Family] as Item,
SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01]
GROUP BY [Product Family]
UNION ALL
SELECT 2 as Level, [Product Family] + '>' + [Product Class] as Item,
SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01]
GROUP BY [Product Family] + '>' + [Product Class]
UNION ALL
SELECT 3 as Level, [Product Family] + '>' + [Product Class] + '>' + Product as Item,
SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01]
GROUP BY [Product Family] + '>' + [Product Class] + '>' + Product;
That said, you could do this using GROUPING SETS:
SELECT [Product Family], [Product Class], Product, SUM(SALES) as Sales
FROM [dbo].[HK_Termp_01]
GROUP BY GROUPING SETS ( ([Product Family], [Product Class], Product),
([Product Family], [Product Class]),
([Product Family])
);
You would then need to fiddle with the names to get the exact output you want.
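As a rough illustration of that fiddling, the sketch below derives Level, Item and Hierarchy from the GROUPING SETS result. It assumes the three text columns never contain real NULLs and that the default CONCAT_NULL_YIELDS_NULL setting is on. The table variable @Sales is an illustrative stand-in for [dbo].[HK_Termp_01].

```sql
-- Illustrative stand-in for [dbo].[HK_Termp_01].
DECLARE @Sales TABLE (
    [Product Family] varchar(50),
    [Product Class]  varchar(50),
    Product          varchar(50),
    Sales            int
);

INSERT INTO @Sales VALUES
('Food','Vegetables','Cauliflower',24),
('Food','Prepared Meals','Steak & Patatoes',54),
('Food','Fruit','Apples',76),
('Food','Fruit','Oranges',14),
('Food','Fruit','Pears',32),
('Electronics','MP3 Players','Cool Player Z',57),
('Electronics','MP3 Players','iStuff 16GB',45),
('Electronics','TV''s','HD',96),
('Electronics','TV''s','Ultra HD',76);

SELECT
    -- GROUPING(col) is 1 when col is rolled up in the current grouping set,
    -- so family totals get 1, class totals 2 and product rows 3.
    3 - GROUPING([Product Class]) - GROUPING(Product) AS [Level],
    -- The deepest non-NULL column names the row (assumes no real NULLs in the data).
    COALESCE(Product, [Product Class], [Product Family]) AS Item,
    -- ' > ' + NULL yields NULL, so rolled-up levels drop out of the path.
    [Product Family]
        + COALESCE(' > ' + [Product Class], '')
        + COALESCE(' > ' + Product, '') AS Hierarchy,
    SUM(Sales) AS Sales
FROM @Sales
GROUP BY GROUPING SETS (
    ([Product Family], [Product Class], Product),
    ([Product Family], [Product Class]),
    ([Product Family])
)
ORDER BY [Level], Item;
```

Grouping on the full path at each level also keeps two classes with the same name under different families apart, which the original three-way UNION grouped by name alone would merge.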
Just for fun,
Declare @YourTable table ([Product Family] varchar(50),[Product Class] varchar(50),Product varchar(50),Sales int)
Insert Into @YourTable values
('Food','Vegetables','Cauliflower',24),
('Food','Prepared Meals','Steak & Patatoes',54),
('Food','Fruit','Apples',76),
('Food','Fruit','Oranges',14),
('Food','Fruit','Pears',32),
('Electronics','MP3 Players','Cool Player Z',57),
('Electronics','MP3 Players','iStuff 16GB',45),
('Electronics','TV''s','HD',96),
('Electronics','TV''s','Ultra HD',76)
Declare @Top varchar(25) = NULL --<< Sets top of Hier Try 'MP3 Players'
Declare @Nest varchar(25) = '|-----' --<< Optional: Added for readability
;with cte0 as (
Select Distinct ID=Product,Parent=[Product Class],Sales from @YourTable
Union All
Select Distinct ID=[Product Class],Parent=[Product Family],0 from @YourTable
Union All
Select Distinct ID=[Product Family],Parent='Total',0 from @YourTable
Union All
Select Distinct ID='Total',Parent=NULL,0 )
,cteP as (
Select Seq = cast(100000+Row_Number() over (Order by ID) as varchar(500))
,ID
,Parent
,Lvl=1
,Sales = Sales
From cte0
Where IsNull(@Top,'X') = case when @Top is null then isnull(Parent,'X') else ID end
Union All
Select Seq = cast(concat(p.Seq,'.',100000+Row_Number() over (Order by r.ID)) as varchar(500))
,r.ID
,r.Parent
,p.Lvl+1
,r.Sales
From cte0 r
Join cteP p on r.Parent = p.ID)
,cteR1 as (Select *,R1=Row_Number() over (Order By Seq) From cteP)
,cteR2 as (Select A.Seq,A.ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.ID )
Select A.R1
,B.R2
,A.ID
,A.Parent
,A.Lvl
,Title = Replicate(@Nest,A.Lvl-1) + A.ID
,Sales = (Select sum(Sales) from cteR1 S where S.R1 between A.R1 and B.R2)
From cteR1 A
Join cteR2 B on A.ID=B.ID
Group By A.R1,B.R2,A.ID,A.Parent,A.Lvl
Order By A.R1
Returns
Now, if you set @Top = 'MP3 Players' rather than NULL, you'll get:
Just a little narrative:
cte0, we normalize your hierarchy into a Parent/Child relationship
cteP, we build your hierarchy via a recursive cte
cteR1, we generate the sequence/R1 keys
cteR2, we generate the R2 Keys
Now, if you have slow-moving hierarchies, I tend to store them with the range keys to facilitate navigation and aggregation.
Due to company policies I cannot give the actual query I am working with, but here's the breakdown and general idea. We have an attendance register that records, for each day, whether an employee was at work and where the employee works. I am trying to summarise this to say that between two dates the employee worked, for example, 5 shifts. The problem is that one particular employee worked in workplace A for 2 days and was then transferred to workplace B. After a few days at workplace B the employee was transferred back to workplace A.
My attempt shows that the employee began working at workplace A on 1-Jan and ended on 10-Jan with only 2 working shifts. I group by the workplace, and the begin and end dates are a MIN and MAX selection.
SELECT att.Employee, att.Workplace, dte.BeginDate, dte.EndDate, shf.WorkShift FROM
(SELECT * FROM Attendance WHERE WorkDate BETWEEN '1-Jan' AND '30-Jan') att
CROSS APPLY (SELECT COUNT(Shift) WorkShift FROM Attendance WHERE WorkDate BETWEEN '1-Jan' AND '30-Jan' AND Employee = att.Employee AND WorkPlace = att.WorkPlace AND Shift = 'Worked') shf
CROSS APPLY (SELECT MIN(WorkDate) BeginDate, MAX(WorkDate) EndDate FROM Attendance WHERE WorkDate BETWEEN '1-Jan' AND '30-Jan' AND Employee = att.Employee AND WorkPlace = att.WorkPlace) dte
So this employee's records should appear like this (I am sorry for the very bad grid; I don't know how to make it look pretty, and you are more than welcome to edit it to look better):
| Name | Workplace | beginDate | endDate | WorkShift |
| Jane | WorkPlaceA | 1-Jan | 2-Jan | 2 |
| Jane | WorkPlaceB | 3-Jan | 8-Jan | 5 |
| Jane | WorkPlaceA | 9-Jan | 10-Jan | 2 |
The attendance table looks something like this
| Name | Workplace | Date | Shift |
| Jane | WorkplaceA | 1-Jan | Worked |
| Jane | WorkplaceA | 2-Jan | Worked |
| Jane | WorkplaceB | 3-Jan | Worked |
| Jane | WorkplaceB | 4-Jan | Worked |
| Jane | WorkplaceB | 5-Jan | Worked |
| Jane | WorkplaceA | 6-Jan | Absent |
| Jane | WorkplaceA | 7-Jan | Absent |
| Jane | WorkplaceA | 8-Jan | Worked |
| Jane | WorkplaceB | 9-Jan | Worked |
| Jane | WorkplaceB | 10-Jan | Worked |
I believe you can accomplish this using CTEs. Here is working sample code that shows your expected values.
;WITH CTE1 AS (
SELECT Employee, WorkPlace, TransactionDate,
ROW_NUMBER() OVER(PARTITION BY Employee, WorkPlace ORDER BY TransactionDate) AS WP,
ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY TransactionDate) AS RN FROM Attendance WHERE Shift = 'Worked'),
CTE2 AS (SELECT Employee, WorkPlace, TransactionDate, WP, RN, WP-RN AS GB FROM CTE1),
CTE3 AS (SELECT Employee, WorkPlace, MIN(TransactionDate) AS TransactionDate, COUNT(1) AS Shifts FROM CTE2 GROUP BY Employee, WorkPlace, GB)
SELECT Employee, WorkPlace, TransactionDate AS [Start Date], DATEADD(DAY,Shifts - 1,TransactionDate) AS [End Date], Shifts FROM CTE3 ORDER BY TransactionDate ASC
I think your given output is wrong.
I think the way you are populating the table is wrong.
Check my query. It can be optimized further, and it does not count absent days.
declare @t table(Name varchar(100),Workplace varchar(100), AttnDate date ,Shifts varchar(100))
insert into @t values
('Jane','WorkplaceA',' 1-Jan-16','Worked')
,('Jane','WorkplaceA',' 2-Jan-16','Worked')
,('Jane','WorkplaceB',' 3-Jan-16','Worked')
,('Jane','WorkplaceB',' 4-Jan-16','Worked')
,('Jane','WorkplaceB',' 5-Jan-16','Worked')
,('Jane','WorkplaceA',' 6-Jan-16','Absent')
,('Jane','WorkplaceA',' 7-Jan-16','Absent')
,('Jane','WorkplaceA',' 8-Jan-16','Worked')
,('Jane','WorkplaceB',' 9-Jan-16','Worked')
,('Jane','WorkplaceB','10-Jan-16','Worked')
DECLARE @Name VARCHAR(100) = 'Jane'
DECLARE @FromDate DATE = '01-Jan-16'
DECLARE @ToDate DATE = '31-Jan-16';
WITH CTE
AS (
SELECT *
,row_number() OVER (
ORDER BY attndate
) rn
FROM @t
WHERE NAME = @Name
AND (
AttnDate BETWEEN @FromDate
AND @ToDate
)
)
,CTE1
AS (
SELECT A.NAME
,A.workplace
,A.AttnDate
,Shifts
,rn
,1 RN1
FROM cte A
WHERE rn = 1
UNION ALL
SELECT a.NAME
,a.workplace
,a.AttnDate
,a.Shifts
,CASE
WHEN a.workplace = b.workplace
THEN b.rn
ELSE b.rn + 1
END rn
,RN1 + 1
FROM CTE A
INNER JOIN CTE1 b ON a.attndate > b.attndate
WHERE a.rn = RN1 + 1
)
,CTE2
AS (
SELECT NAME
,Workplace
,AttnDate beginDate
,(
SELECT max(AttnDate)
FROM CTE1 b
WHERE b.rn = a.rn
) endDate
,(
SELECT count(*)
FROM CTE1 b
WHERE b.rn = a.rn
AND Shifts = 'Worked'
) WorkShift
,rn
,ROW_NUMBER() OVER (
PARTITION BY rn ORDER BY rn
) rn3
FROM cte1 a
)
SELECT NAME
,workplace
,beginDate
,endDate
,WorkShift
FROM cte2
WHERE rn3 = 1
Here's the data:
[ TABLE_1 ]
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|--------|--------|--------|--------|-------|
1 | null | null | null | null | null | null |
2 | null | null | null | null | null | null |
3 | null | null | null | null | null | null |
[ TABLE_2 ]
id | date | product |
-----|-------------|-----------|
1 | 20140101 | X |
1 | 20140102 | Y |
1 | 20140103 | Z |
2 | 20141201 | data |
2 | 20141201 | Y |
2 | 20141201 | Z |
3 | 20150101 | data2 |
3 | 20150101 | data3 |
3 | 20160101 | X |
Both tables have other columns not listed here.
date is formatted: yyyymmdd and datatype is int.
[ TABLE_2 ] doesn't have empty rows, just tried to make sample above more readable.
Here's the Goal:
I need to update [ TABLE_1 ] prod1,date1,prod2,date2,prod3,date3
with product collected from [ TABLE_2 ] with corresponding date values.
Data must be sorted so that "latest" product becomes prod1,
2nd latest product will be prod2 and 3rd is prod3.
Latest product = biggest date (int).
If dates are equal, order doesn't matter. (see id=2 and id=3).
Updated [ TABLE_1 ] should be:
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|----------|--------|----------|--------|----------|
1 | Z | 20140103 | Y | 20140102 | X | 20140101 |
2 | data | 20141201 | Y | 20141201 | Z | 20141201 |
3 | X | 20160101 | data2 | 20150101 | data3 | 20150101 |
Ultimate goal is to get the following :
[ TABLE_3 ]
id | order1 | order2 | order3 | + Columns from [ TABLE_1 ]
---|--------------------|----------------------|------------|--------------------------
1 | 20140103:Z | 20140102:Y | 20140101:X |
2 | 20141201:data:Y:Z | NULL | NULL |
3 | 20160101:X | 20150101:data2:data3 | NULL |
I have to admit this exceeds my knowledge and I haven't tried anything.
Should I do it with a JOIN or a SELECT subquery?
Should I try to do it in one SQL statement, or perhaps in 3 steps,
one prod/date pair at a time?
What about creating [ TABLE_3 ]?
It has to have columns from [ TABLE_1 ].
Is it easiest to create it from the [ TABLE_2 ] data or from the updated [ TABLE_1 ]?
Any help would be highly appreciated.
Thanks in advance.
I'll post some of my own attempts in the comments.
After looking into it (after my comment), I think a stored procedure would be best: you can call it to view the data as a pivot and do away with TABLE_1. If you need to make this dynamic, you'll need to look into dynamic pivots; this version is a bit of a hack with CTEs:
CREATE PROCEDURE DBO.VIEW_AS_PIVOTED_DATA
AS
;WITH CTE AS (
SELECT ID, [DATE], 'DATE' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE2 AS (
SELECT ID, PRODUCT, 'PROD' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE3 AS (
SELECT ID, [DATE1], [DATE2], [DATE3]
FROM CTE
PIVOT(MAX([DATE]) FOR RN IN ([DATE1],[DATE2],[DATE3])) PIV)
, CTE4 AS (
SELECT ID, [PROD1], [PROD2], [PROD3]
FROM CTE2
PIVOT(MAX(PRODUCT) FOR RN IN ([PROD1],[PROD2],[PROD3])) PIV)
SELECT A.ID, [PROD1], [DATE1], [PROD2], [DATE2], [PROD3], [DATE3]
FROM CTE3 AS A
JOIN CTE4 AS B
ON A.ID=B.ID
Construction:
WITH ranked AS (
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
)
SELECT id, [prod1],[date1],[prod2],[date2],[prod3],[date3]
FROM
(
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
) unpivoted
PIVOT
(
max(value)
for col IN ([prod1],[date1],[prod2],[date2],[prod3],[date3])
) pivoted
You need to take a few steps to achieve the aim.
Rank your products by date:
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
Unpivot your date and product columns into one column. You can use either UNPIVOT or CROSS APPLY; I prefer CROSS APPLY:
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
or the same result using UNPIVOT
SELECT id, type+cast(rn as varchar(1)) col, value
FROM (
SELECT [id],
rn,
CAST([date] AS varchar(500)) date,
CAST([product] AS varchar(500)) prod
FROM ranked) t
UNPIVOT
(
value FOR type IN (date, prod)
) unpvt
Finally, apply PIVOT to get the result.
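For the [ TABLE_3 ] layout asked about in the question, one possible sketch is to collapse the products per id and date first, then rank the dates and spread them over order1 to order3 with conditional aggregation. It assumes SQL Server 2017 or later, because STRING_AGG is not available before that. The table variable @TABLE_2 and the names per_date, order_text and rn are illustrative. The extra columns from [ TABLE_1 ] could be added afterwards with a join on id.

```sql
-- Illustrative copy of [ TABLE_2 ] with the sample rows from the question.
DECLARE @TABLE_2 TABLE (id int, [date] int, product varchar(50));

INSERT INTO @TABLE_2 (id, [date], product) VALUES
(1, 20140101, 'X'),
(1, 20140102, 'Y'),
(1, 20140103, 'Z'),
(2, 20141201, 'data'),
(2, 20141201, 'Y'),
(2, 20141201, 'Z'),
(3, 20150101, 'data2'),
(3, 20150101, 'data3'),
(3, 20160101, 'X');

WITH per_date AS
(
    -- One row per id and date: "yyyymmdd:prod[:prod...]", newest date ranked first.
    SELECT id,
           CAST([date] AS varchar(8)) + ':'
               + STRING_AGG(product, ':') WITHIN GROUP (ORDER BY product) AS order_text,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date] DESC) AS rn
    FROM @TABLE_2
    GROUP BY id, [date]
)
SELECT id,
       MAX(CASE WHEN rn = 1 THEN order_text END) AS order1,
       MAX(CASE WHEN rn = 2 THEN order_text END) AS order2,
       MAX(CASE WHEN rn = 3 THEN order_text END) AS order3
FROM per_date
GROUP BY id
ORDER BY id;
```

Ids with fewer than three distinct dates get NULL in the unused order columns, which matches the NULL cells shown for id 2 and id 3 in the question.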