I have two tables, defined as follows:
PTable:
[StartDate], [EndDate], [Type], PValue
.................................................
2011-07-01 2011-07-07 001 5
2011-07-08 2011-07-14 001 10
2011-07-01 2011-07-07 002 15
2011-07-08 2011-07-14 002 20
TTable:
[Date], [Type], [TValue]
..................................
2011-07-01 001 11
2011-07-02 001 4
2011-07-03 001 0
2011-07-08 002 12
2011-07-09 002 12
2011-07-10 002 0
I want to update the TValue column in TTable with the PValue from PTable, where [Date] in TTable is between [StartDate] and [EndDate] in PTable, DATEDIFF(DAY,TTable.[Date],PTable.[EndDate]) is minimal, and PTable.Type = TTable.Type.
The final TTable should look like this:
[Date], [Type], [TValue]
..................................
2011-07-01 001 11
2011-07-02 001 4
2011-07-03 001 5 --updated
2011-07-08 002 12
2011-07-09 002 12
2011-07-10 002 20 --updated
What I have tried is this:
UPDATE [TTable]
SET
TValue = T1.PValue
FROM TTable
INNER JOIN PTable T1 ON
[Date] BETWEEN T1.StartDate AND T1.EndDate
AND DATEDIFF(DAY,[Date],T1.EndDate) =
(SELECT MIN( DATEDIFF(DAY,TTable.[Date],T.EndDate) )
FROM PTable T WHERE TTable.[Date] BETWEEN T.StartDate AND T.EndDate
)
AND
T1.[Type] = TTable.[Type]
It gives me this error :
"Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression."
Later edit:
Considering TTable AS T and PTable AS P, the conditions for the update are:
1. T.Type = P.Type
2. T.Date BETWEEN P.StartDate AND P.EndDate
3. DATEDIFF(DAY,T.Date,P.EndDate) = minimum value of all DATEDIFFs WHERE P.Type = T.Type AND T.Date BETWEEN P.StartDate AND P.EndDate
Later Edit 2:
Sorry, the final result was wrong because I mistyped the last row in PTable (2011-08-10 instead of 2011-07-14).
I also managed to update in a simpler way, which I obviously should have tried from the start:
UPDATE TTABLE
SET
TValue = T1.PValue
FROM TTable
INNER JOIN PTABLE T1 ON
[Date] = (SELECT TOP(1) MAX(Date) FROM [TTABLE] WHERE [Date] BETWEEN T1.StartDate AND T1.EndDate)
AND
T1.Type = [TTABLE].Type
Sorry about this.
So you said something about "DATEDIFF(DAY,TTable.[Date],PTable.[EndDate]) is minimum", which confused me. It would seem that if there is a weekly entry per Type, then a particular Date, Type combination would only ever match one row. You might give this a try:
UPDATE TTABLE
SET TValue = T1.PValue
FROM TTable
INNER JOIN PTABLE T1 ON T1.Type = [TTABLE].Type -- find row in PTable that the Date falls between
and [Date] BETWEEN T1.StartDate AND T1.EndDate
where
TValue = ( select MIN(TValue) -- finds the lowest TValue, 0 in example
from TTable)
...updated...
So it appears I read the problem incorrectly the first time. I had thought we update the TTable entries that have the lowest TValue. Not sure how I got that impression. Still seems like there needs to be a check for if it is 0?
UPDATE TTable
SET TValue = T1.PValue
FROM TTable
INNER JOIN PTable T1 ON T1.Type = TTable.Type
and T1.EndDate = (
SELECT top 1 EndDate
FROM PTable
WHERE Type=TTable.Type
ORDER BY abs(DATEDIFF(day,TTable.Date,PTable.EndDate)) desc)
WHERE
TValue = 0 -- only updating entries that aren't set, have a 0
This only works if there is exactly one row in PTable with an EndDate of 7/7 (or whatever) for a given type. If there are two entries for Type 001 with an end date of 7/7, it will join to both. The same happens if two entries are equally distant from the date in question, e.g. an EndDate of 7/7 and one of 7/13 are both 3 days from 7/10. If the EndDates are all 7 days apart (weekly) you should be OK.
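To check the matching rule against the sample data, here is a minimal sketch that runs standalone with Python's sqlite3 module. It is not T-SQL: SQLite uses LIMIT instead of TOP, and the function name update_tvalues is made up for this example. It also assumes, as the answer above does, that only rows whose TValue is still 0 should be updated.

```python
import sqlite3

def update_tvalues():
    """Copy the matching PValue into every TTable row whose TValue is still 0."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PTable (StartDate TEXT, EndDate TEXT, [Type] TEXT, PValue INTEGER);
        CREATE TABLE TTable ([Date] TEXT, [Type] TEXT, TValue INTEGER);
        INSERT INTO PTable VALUES
            ('2011-07-01', '2011-07-07', '001', 5),
            ('2011-07-08', '2011-07-14', '001', 10),
            ('2011-07-01', '2011-07-07', '002', 15),
            ('2011-07-08', '2011-07-14', '002', 20);
        INSERT INTO TTable VALUES
            ('2011-07-01', '001', 11),
            ('2011-07-02', '001', 4),
            ('2011-07-03', '001', 0),
            ('2011-07-08', '002', 12),
            ('2011-07-09', '002', 12),
            ('2011-07-10', '002', 0);
    """)
    conn.execute("""
        UPDATE TTable
        SET TValue = (SELECT p.PValue
                      FROM PTable p
                      WHERE p.[Type] = TTable.[Type]
                        AND TTable.[Date] BETWEEN p.StartDate AND p.EndDate
                      ORDER BY p.EndDate   -- earliest EndDate = smallest day difference
                      LIMIT 1)
        WHERE TValue = 0                   -- assumption: 0 means "not set yet"
          AND EXISTS (SELECT 1
                      FROM PTable p
                      WHERE p.[Type] = TTable.[Type]
                        AND TTable.[Date] BETWEEN p.StartDate AND p.EndDate)
    """)
    rows = conn.execute(
        "SELECT [Date], [Type], TValue FROM TTable ORDER BY [Type], [Date]"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in update_tvalues():
        print(row)
```

Because the dates are stored as ISO strings, BETWEEN and ORDER BY compare them correctly as text. The ORDER BY ... LIMIT 1 also makes the choice deterministic when several periods of the same type overlap a date.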
We have been keeping track of some changes in a History Table like this:
ChangeID EmployeeID PropertyName OldValue NewValue ModifiedDate
100 10 EmploymentStart Not Set 1 2013-01-01
101 10 SalaryValue Not Set 55000 2013-01-01
102 10 SalaryValue 55000 61500 2013-03-20
103 10 SalaryEffectiveDate 2013-01-01 2013-04-01 2013-03-20
104 11 EmploymentStart Not Set 1 2013-01-21
105 11 SalaryValue Not Set 43000 2013-01-21
106 10 SalaryValue 61500 72500 2013-09-20
107 10 SalaryEffectiveDate 2013-04-01 2013-10-01 2013-09-20
Basically, if an employee's salary changes, we log two rows in the history table: one row for the salary value itself and the other for the salary effective date. These two rows have identical modification date/time, and it is fairly safe to assume that they always follow each other in the database. We can also assume that the salary value is always logged first (so it is one record before the corresponding effective date).
Now we are looking into creating reports based on a given date range into a table like this:
Annual Salary Change Report (2013)
EmployeeID Date1 Date2 Salary
10 2013-01-01 2013-04-01 55000
10 2013-04-01 2013-10-01 61500
10 2013-10-01 2013-12-31 72500
11 2013-03-21 2013-12-31 43000
I have done something similar in the past by joining the table to itself, but in those cases the effective date and the new value were in the same row. Now I have to create each row of the output table by looking into a few rows of the existing history table. Is there a straightforward way of doing this without using cursors?
Edit #1:
I'm reading up on this, and apparently it's doable using PIVOT.
Thank you very much in advance.
You can use self join to get the result you want. The trick is to create a cte and add two rows for each EmployeeID as follows (I call the history table ht):
with cte1 as
(
select EmployeeID, PropertyName, OldValue, NewValue, ModifiedDate
from ht
union all
select t1.EmployeeID,
(case when t1.PropertyName = 'EmploymentStart' then 'SalaryEffectiveDate' else t1.PropertyName end),
(case when t1.PropertyName = 'EmploymentStart' then t1.ModifiedDate else t1.NewValue end),
(case when t1.PropertyName = 'SalaryValue' then t1.NewValue
when t1.PropertyName = 'SalaryEffectiveDate' then '2013-12-31'
when t1.PropertyName = 'EmploymentStart' then '2013-12-31' end),
'2013-12-31'
from ht t1
where t1.ModifiedDate = (select max(t2.ModifiedDate) from ht t2 where t1.EmployeeID = t2.EmployeeID)
)
select t3.EmployeeID, t4.OldValue Date1, t4.NewValue Date2, t3.OldValue Salary
from cte1 t3
inner join cte1 t4 on t3.EmployeeID = t4.EmployeeID
and t3.ModifiedDate = t4.ModifiedDate
where t3.PropertyName = 'SalaryValue'
and t4.PropertyName = 'SalaryEffectiveDate'
order by t3.EmployeeID, Date1
I hope this helps.
It is a little overkill to use PIVOT since you only need two properties. GROUP BY can also achieve this:
;WITH cte_salary_history(EmployeeID,SalaryEffectiveDate,SalaryValue)
AS
(
SELECT EmployeeID,
MAX(CASE WHEN PropertyName='SalaryEffectiveDate' THEN NewValue ELSE NULL END) AS SalaryEffectiveDate,
MAX(CASE WHEN PropertyName='SalaryValue' THEN NewValue ELSE NULL END) AS SalaryValue
FROM yourtable
GROUP BY EmployeeID,ModifiedDate
)
SELECT EmployeeID,SalaryEffectiveDate,
LEAD(SalaryEffectiveDate,1,'9999-12-31') OVER(PARTITION BY EmployeeID ORDER BY SalaryEffectiveDate) AS SalaryEndDate,
SalaryValue
FROM cte_salary_history
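The GROUP BY idea can be tried out on the sample history rows with the sketch below, which uses Python's sqlite3 module so it runs standalone. Several things here are assumptions rather than part of the answer: the table name ht, the function name salary_report, the fallback to ModifiedDate when a change has no SalaryEffectiveDate row (the initial salary), the fixed report end date 2013-12-31, and the use of a correlated MIN subquery in place of LEAD so the query does not depend on window-function support.

```python
import sqlite3

HISTORY = [
    (100, 10, "EmploymentStart", "Not Set", "1", "2013-01-01"),
    (101, 10, "SalaryValue", "Not Set", "55000", "2013-01-01"),
    (102, 10, "SalaryValue", "55000", "61500", "2013-03-20"),
    (103, 10, "SalaryEffectiveDate", "2013-01-01", "2013-04-01", "2013-03-20"),
    (104, 11, "EmploymentStart", "Not Set", "1", "2013-01-21"),
    (105, 11, "SalaryValue", "Not Set", "43000", "2013-01-21"),
    (106, 10, "SalaryValue", "61500", "72500", "2013-09-20"),
    (107, 10, "SalaryEffectiveDate", "2013-04-01", "2013-10-01", "2013-09-20"),
]

def salary_report(report_end="2013-12-31"):
    """Return (EmployeeID, Date1, Date2, Salary) rows built from the history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ht (ChangeID INTEGER, EmployeeID INTEGER, PropertyName TEXT,"
        " OldValue TEXT, NewValue TEXT, ModifiedDate TEXT)"
    )
    conn.executemany("INSERT INTO ht VALUES (?, ?, ?, ?, ?, ?)", HISTORY)
    rows = conn.execute(
        """
        WITH salary AS (
            -- one row per change: pivot the two properties with conditional MAX
            SELECT EmployeeID,
                   COALESCE(MAX(CASE WHEN PropertyName = 'SalaryEffectiveDate'
                                     THEN NewValue END),
                            ModifiedDate) AS EffectiveDate,
                   MAX(CASE WHEN PropertyName = 'SalaryValue'
                            THEN NewValue END) AS Salary
            FROM ht
            GROUP BY EmployeeID, ModifiedDate
        )
        SELECT s.EmployeeID,
               s.EffectiveDate,
               -- the next effective date of the same employee closes the interval
               COALESCE((SELECT MIN(n.EffectiveDate)
                         FROM salary n
                         WHERE n.EmployeeID = s.EmployeeID
                           AND n.EffectiveDate > s.EffectiveDate),
                        ?) AS EndDate,
               CAST(s.Salary AS INTEGER) AS Salary
        FROM salary s
        WHERE s.Salary IS NOT NULL
        ORDER BY s.EmployeeID, s.EffectiveDate
        """,
        (report_end,),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in salary_report():
        print(row)
```

Grouping by EmployeeID and ModifiedDate relies on the statement in the question that the salary value and its effective date are always logged with the same modification date.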
I'm trying to find out how to calculate difference between multiple rows from one simple query. Here it is:
SELECT [DateTime],EmployeeId,ControlPointID,EventTypeID
FROM [Events]
WHERE Day([DateTime]) = 4
AND Month([DateTime]) = 7
AND Year([DateTime]) = 2017
AND EmployeeId = 451
AND ControlPointID IN ( 3, 6 )
AND EventTypeID IN ( 1, 2 )
ORDER BY [DateTime]
Result:
DateTime EmployeeId ControlPointID EventTypeID
2017-07-04 11:32:10.000 451 6 1
2017-07-04 16:07:00.000 451 3 2
2017-07-04 16:42:50.000 451 6 1
2017-07-04 20:04:10.000 451 3 2
I need to calculate difference between [DateTime] in minutes.
EventTypeID = 1 means that the employee enters the building and EventTypeID = 2 means that the employee leaves. I can calculate the difference between the first enter event and the last leave event; in this case it's 512 minutes. But I have a problem calculating work time when someone enters twice and leaves twice. It should be 477 minutes. The calculation should look like this:
DateDiff = (2017-07-04 16:07:00.000 - 2017-07-04 11:32:10.000) +
(2017-07-04 20:04:10.000 - 2017-07-04 16:42:50.000)
Can you help me figure it out, please ?
Given a building entry, finding the first leave after that entry can be done with cross apply:
select entry.EmployeeId, entry.DateTime, [exit].DateTime
from Events entry
cross apply (select top 1 e.DateTime
from Events e
where e.EmployeeId = entry.EmployeeId
and e.DateTime > entry.DateTime
and e.EventTypeId = 2
order by e.DateTime asc
) as [exit] -- EXIT is a reserved word in T-SQL, so the alias is bracketed
where entry.EventTypeId = 1
At this point you just need the applicable T-SQL function to get the difference in whatever unit you want (e.g. in minutes with datediff(minute, entry.DateTime, [exit].DateTime)).
To get the total of all the differences simply sum the differences:
select EmployeeId, sum(mins)
from (
select entry.EmployeeId, entry.DateTime as EntryDateTime, [exit].DateTime as ExitDateTime,
datediff(minute, entry.DateTime, [exit].DateTime) as mins -- column aliases cannot be reused in the same select list
from Events entry
cross apply (select top 1 e.DateTime
from Events e
where e.EmployeeId = entry.EmployeeId
and e.DateTime > entry.DateTime
and e.EventTypeId = 2
order by e.DateTime asc
) as [exit]
where entry.EventTypeId = 1
) as input
group by EmployeeId
Edit: added overall summation (with diff on the inside for clarity)
This can be done using the LAG window function. Since SQL Server 2008 does not support it, we need to left join on Row_Number to find the previous entry:
;WITH cte
AS (SELECT Row_number()OVER(Partition by EmployeeID ORDER BY [DateTime]) rn,*
FROM Yourresult)
SELECT a.EmployeeID,
Sum(Datediff(minute, b.[DateTime], a.[DateTime]))
FROM cte a
LEFT JOIN cte b
ON a.EmployeeID = b.EmployeeID
AND a.rn = b.rn + 1
WHERE a.[EventTypeId] = 2
GROUP BY a.EmployeeID
Note: this assumes there aren't any wrong punches, just like in your sample data.
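The pairing of each enter event with the first later leave event can be verified on the four sample rows with the sketch below. It runs on Python's sqlite3 module rather than SQL Server, so a correlated MIN subquery stands in for CROSS APPLY, and the minute difference is computed on timestamps truncated to the minute. That truncation is my assumption about how to mimic DATEDIFF(minute, ...), which counts minute boundaries crossed; the function name worked_minutes is made up for this example.

```python
import sqlite3

EVENTS = [
    ("2017-07-04 11:32:10.000", 451, 6, 1),
    ("2017-07-04 16:07:00.000", 451, 3, 2),
    ("2017-07-04 16:42:50.000", 451, 6, 1),
    ("2017-07-04 20:04:10.000", 451, 3, 2),
]

def worked_minutes():
    """Sum, per employee, the minutes between each enter and the next leave."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Events ([DateTime] TEXT, EmployeeId INTEGER,"
        " ControlPointID INTEGER, EventTypeID INTEGER)"
    )
    conn.executemany("INSERT INTO Events VALUES (?, ?, ?, ?)", EVENTS)
    rows = conn.execute(
        """
        SELECT EmployeeId,
               -- truncate both timestamps to the minute, then diff in whole minutes
               SUM((strftime('%s', substr(LeaveTime, 1, 16) || ':00')
                  - strftime('%s', substr(EnterTime, 1, 16) || ':00')) / 60) AS Minutes
        FROM (
            SELECT e.EmployeeId,
                   e.[DateTime] AS EnterTime,
                   -- first leave event after this enter event
                   (SELECT MIN(l.[DateTime])
                    FROM Events l
                    WHERE l.EmployeeId = e.EmployeeId
                      AND l.EventTypeID = 2
                      AND l.[DateTime] > e.[DateTime]) AS LeaveTime
            FROM Events e
            WHERE e.EventTypeID = 1
        ) AS pairs
        WHERE LeaveTime IS NOT NULL   -- ignore an enter event with no later leave
        GROUP BY EmployeeId
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(worked_minutes())
```

The two intervals contribute 275 and 202 minutes, which adds up to the 477 minutes the question expects. Like the answers above, this assumes clean punches: two enter events in a row would both be paired with the same leave event.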
I am using SQL Server with its sample database AdventureWorks 2014, and I ran into a problem with join and sum. Here are the two tables I used:
PurchaseOrderHeader:
PurchaseOrderID VendorID OrderDate TotalDue
1 1580 2011-04-16 00:00:00.000 222.1492
2 1496 2011-04-16 00:00:00.000 300.6721
3 1494 2011-04-16 00:00:00.000 9776.2665
4 1650 2011-04-16 00:00:00.000 189.0395
5 1654 2011-04-30 00:00:00.000 22539.0165
6 1664 2011-04-30 00:00:00.000 16164.0229
7 1678 2011-04-30 00:00:00.000 64847.5328
PurchaseOrderDetail:
PurchaseOrderID PurchaseOrderDetailID OrderQty ProductID
1 1 4 1
2 2 3 359
2 3 3 360
3 4 550 530
4 5 3 4
5 6 550 512
6 7 550 513
7 8 550 317
7 9 550 318
7 10 550 319
Here is the sql script:
CREATE TABLE PurchaseOrderHeader(
PurchaseOrderID INTEGER NOT NULL PRIMARY KEY
,VendorID INTEGER NOT NULL
,OrderDate VARCHAR(23) NOT NULL
,TotalDue NUMERIC(10,4) NOT NULL
);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (1,1580,'2011-04-16 00:00:00.000',222.1492);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (2,1496,'2011-04-16 00:00:00.000',300.6721);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (3,1494,'2011-04-16 00:00:00.000',9776.2665);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (4,1650,'2011-04-16 00:00:00.000',189.0395);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (5,1654,'2011-04-30 00:00:00.000',22539.0165);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (6,1664,'2011-04-30 00:00:00.000',16164.0229);
INSERT INTO PurchaseOrderHeader(PurchaseOrderID,VendorID,OrderDate,TotalDue) VALUES (7,1678,'2011-04-30 00:00:00.000',64847.5328);
CREATE TABLE PurchaseOrderDetail(
PurchaseOrderID INTEGER NOT NULL
,PurchaseOrderDetailID INTEGER NOT NULL PRIMARY KEY
,OrderQty INTEGER NOT NULL
,ProductID INTEGER NOT NULL
);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (1,1,4,1);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,2,3,359);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (2,3,3,360);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (3,4,550,530);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (4,5,3,4);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (5,6,550,512);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (6,7,550,513);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,8,550,317);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,9,550,318);
INSERT INTO PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID,OrderQty,ProductID) VALUES (7,10,550,319);
and here is my code:
select PurchaseOrderHeader.VendorID,
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) as "TotalOrder IN 2011"
from PurchaseOrderHeader
left join PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID
Here is what I got:
VendorID TotalPay IN 2011 TotalOrder IN 2011
1494 9776.2665 550
1496 601.3442 6
1580 222.1492 4
1650 189.0395 3
1654 22539.0165 550
1664 16164.0229 550
1678 194542.5984 1650
while I should expect:
VendorID TotalPay IN 2011 TotalOrder IN 2011
1494 9776.2665 550
1496 300.6721 6
1580 222.1492 4
1650 189.0395 3
1654 22539.0165 550
1664 16164.0229 550
1678 64847.5328 1650
This code joins the two tables on PurchaseOrderID and calculates the TotalDue grouped by VendorID. The problem is that when I use the join, there will be multiple rows from table PurchaseOrderDetail referring to one row in table PurchaseOrderHeader. In this example, for vendors 1496 and 1678 there are two or three detail rows that refer to one row in PurchaseOrderHeader, so its TotalDue is added two or three times. How can I avoid adding it multiple times? Thanks!
You can take the SUM of the header column and divide it by COUNT(*). The detail quantities are not duplicated by the join, so they are summed as they are. Note that this only holds while each vendor has a single purchase order. Something like this:
select PurchaseOrderHeader.VendorID,
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderHeader.TotalDue else 0 END) / COUNT(*) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,PurchaseOrderHeader.OrderDate) = 2011 THEN PurchaseOrderDetail.OrderQty else 0 END) as "TotalOrder IN 2011"
from Purchasing.PurchaseOrderHeader
left join Purchasing.PurchaseOrderDetail on PurchaseOrderHeader.PurchaseOrderID = PurchaseOrderDetail.PurchaseOrderID
group by PurchaseOrderHeader.VendorID
order by VendorID
select h.VendorID,
SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN h.TotalDue else 0 END) as "TotalPay IN 2011",
SUM(CASE WHEN Datename(year,h.OrderDate) = 2011 THEN d.OrderQty else 0 END) as "TotalOrder IN 2011"
from PurchaseOrderHeader h
left join (
select t.PurchaseOrderID,
sum(t.OrderQty) as OrderQty
from PurchaseOrderDetail t
group by t.PurchaseOrderID
) d on d.PurchaseOrderID = h.PurchaseOrderID
group by h.VendorID
order by VendorID
The default way to avoid double counting is to use SUM(DISTINCT expr).
This does not always work well enough, as you do not want to sum distinct values, but want to sum distinct rows even when those rows share the same values.
The solution is to use a sub-query to sum the details on order number and then join the result. Then you have only one total per order id to join with the order lines:
SELECT PurchaseOrderHeader.VendorID,
SUM(PurchaseOrderHeader.TotalDue) AS "TotalPay IN 2011",
SUM(POD.Qty) AS "TotalOrder IN 2011"
FROM PurchaseOrderHeader
LEFT JOIN (
SELECT PurchaseOrderDetail.PurchaseOrderID, SUM(OrderQty) AS Qty
FROM PurchaseOrderDetail
GROUP BY PurchaseOrderDetail.PurchaseOrderID
) AS POD on PurchaseOrderHeader.PurchaseOrderID = POD.PurchaseOrderID
WHERE Datename(year,PurchaseOrderHeader.OrderDate) = 2011
GROUP BY PurchaseOrderHeader.VendorID
ORDER BY VendorID
I also took the liberty of moving the CASE WHEN condition out of the SUM() and into the WHERE part of the query. In this case that should give you the same results with shorter code.
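To see the double counting and its fix side by side, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. The function name vendor_totals and its pre_aggregate flag are invented for this illustration, and strftime('%Y', ...) stands in for T-SQL's Datename(year, ...).

```python
import sqlite3

HEADERS = [
    (1, 1580, "2011-04-16 00:00:00.000", 222.1492),
    (2, 1496, "2011-04-16 00:00:00.000", 300.6721),
    (3, 1494, "2011-04-16 00:00:00.000", 9776.2665),
    (4, 1650, "2011-04-16 00:00:00.000", 189.0395),
    (5, 1654, "2011-04-30 00:00:00.000", 22539.0165),
    (6, 1664, "2011-04-30 00:00:00.000", 16164.0229),
    (7, 1678, "2011-04-30 00:00:00.000", 64847.5328),
]
DETAILS = [
    (1, 1, 4, 1), (2, 2, 3, 359), (2, 3, 3, 360), (3, 4, 550, 530),
    (4, 5, 3, 4), (5, 6, 550, 512), (6, 7, 550, 513),
    (7, 8, 550, 317), (7, 9, 550, 318), (7, 10, 550, 319),
]

def vendor_totals(pre_aggregate):
    """Return {VendorID: (TotalPay, TotalOrder)} for 2011.

    pre_aggregate=False joins the raw detail rows (double counts TotalDue);
    pre_aggregate=True collapses the details to one row per order first.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PurchaseOrderHeader (PurchaseOrderID INTEGER PRIMARY KEY,"
        " VendorID INTEGER, OrderDate TEXT, TotalDue REAL)"
    )
    conn.execute(
        "CREATE TABLE PurchaseOrderDetail (PurchaseOrderID INTEGER,"
        " PurchaseOrderDetailID INTEGER PRIMARY KEY, OrderQty INTEGER, ProductID INTEGER)"
    )
    conn.executemany("INSERT INTO PurchaseOrderHeader VALUES (?, ?, ?, ?)", HEADERS)
    conn.executemany("INSERT INTO PurchaseOrderDetail VALUES (?, ?, ?, ?)", DETAILS)
    if pre_aggregate:
        detail_source = """(SELECT PurchaseOrderID, SUM(OrderQty) AS OrderQty
                            FROM PurchaseOrderDetail
                            GROUP BY PurchaseOrderID)"""
    else:
        detail_source = "PurchaseOrderDetail"
    query = f"""
        SELECT h.VendorID, SUM(h.TotalDue), SUM(d.OrderQty)
        FROM PurchaseOrderHeader h
        LEFT JOIN {detail_source} d ON d.PurchaseOrderID = h.PurchaseOrderID
        WHERE strftime('%Y', h.OrderDate) = '2011'
        GROUP BY h.VendorID
        ORDER BY h.VendorID
    """
    result = {vendor: (round(pay, 4), qty) for vendor, pay, qty in conn.execute(query)}
    conn.close()
    return result

if __name__ == "__main__":
    print("raw join:     ", vendor_totals(False)[1496])
    print("pre-aggregate:", vendor_totals(True)[1496])
```

With the raw join, vendor 1496 has its TotalDue counted once per detail row. With the pre-aggregated details there is exactly one joined row per order, so the header total is counted once and the quantities are still summed in full.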
Lots of good answers, but I think they miss the bit where a vendor could have multiple purchase orders, and that throws off how the TotalOrder gets calculated. (Try a sample with multiple vendors with multiple orders with each order having multiple details.) Don't forget to check for possible NULL values!
Here, I use the subquery to calculate the TotalOrder for each vendor for the year in question, and then join that back to the list of all vendors. (Threw in table aliases as well, for legibility.)
-- As a subquery
SELECT
hd.VendorID
,sum(case
when year(hd.OrderDate) = 2011 then hd.TotalDue
else 0
end) as "TotalPay IN 2011"
,isnull(max(subQuery.TotalOrderIn2011), 0) as "TotalOrder IN 2011" -- max() because the column is not in the GROUP BY
from PurchaseOrderHeader hd
left join (-- Calculate volume by vendor for 2011
select
hd.VendorID
,sum(OrderQty) TotalOrderIn2011
from PurchaseOrderHeader hd
inner join PurchaseOrderDetail dt
on hd.PurchaseOrderID = dt.PurchaseOrderID
where year(hd.OrderDate) = 2011
group by
hd.VendorID
) subQuery
on subQuery.VendorId = hd.VendorId
group by hd.VendorID
order by hd.VendorID
I don't know if what I'm looking for is possible with my current dataset, or if what I'm expecting is possible at all.
What I am trying to accomplish is to get all rows with status = 2 or 7, take the date, and then get the next row with a different status, so that I obtain the date interval and the number of days the status lasted.
DataSet
id_compromiso|fecha |id_actividad|status
-------------+-----------+------------+----------
32 2013-12-10 359 2
32 2013-12-16 380 5
32 2013-12-18 401 7
32 2013-12-24 485 8
58 2013-12-02 248 2
58 2013-12-03 254 2
58 2013-12-10 360 2
58 2013-12-10 378 5
58 2013-12-12 395 2
What I have tried:
SQL query:
WITH pausa AS (
SELECT tmp.id_compromiso, tmp.fecha, MIN(tact.id_actividad) as id_actividad
FROM Actividades as tact
INNER JOIN (
SELECT act.id_compromiso, CAST(act.fecha as date) as fecha
FROM actividades as act
WHERE act.[status]=7
) as tmp
ON(tmp.id_compromiso = tact.id_compromiso AND tmp.fecha = CAST(tact.fecha as date))
WHERE tact.[status]=7
GROUP BY tmp.id_compromiso, tmp.fecha
),
revision AS (
SELECT tmp.id_compromiso, tmp.fecha, MIN(tact.id_actividad) as id_actividad
FROM Actividades as tact
INNER JOIN (
SELECT act.id_compromiso, CAST(act.fecha as date) as fecha
FROM actividades as act
WHERE act.[status]=2
) as tmp
ON(tmp.id_compromiso = tact.id_compromiso AND tmp.fecha = CAST(tact.fecha as date))
WHERE tact.[status]=2
GROUP BY tmp.id_compromiso, tmp.fecha
)
SELECT * FROM revision ORDER BY id_compromiso;
but I'm really running out of ideas on how to get the next item with a different status from the table ...
-- First, it extends actividades to include the minimum fecha for the status
-- on the compromiso; this is min(fecha) in the partition by compromiso/status
WITH status_start AS(
SELECT *, MIN(fecha) OVER (PARTITION BY id_compromiso, status) sStart
FROM actividades
),
-- Then, join the extended actividades table with itself (aliased a and b) by compromiso but status 2,7 with status not 2,7
-- (this is the AND a.STATUS IN (2,7) AND b.STATUS NOT IN(2,7) in the join clause)
-- and making sure it's a later status (the a.sStart <b.sStart bit)
-- at this point also calculates the date difference in days
status_start_end AS(
SELECT a.*,b.sStart sEnd, DATEDIFF(d, a.sStart, b.sStart) AS sDiff FROM status_start a
JOIN status_start b ON (a.id_compromiso =b.id_compromiso AND a.STATUS IN (2,7) AND b.STATUS NOT IN(2,7) AND a.sStart <b.sStart))
-- Finally, as the previous query has the day difference in relation to ALL later statuses, we need to select only the minimum difference,
-- as this is when the status actually changes. We also need to eliminate duplicates using DISTINCT,
-- as there could be many entries for the same status and
-- also many later statuses.
SELECT DISTINCT id_compromiso, status ,
MIN(sDiff) OVER (PARTITION BY id_compromiso) "Nr. of days in status"
FROM status_start_end
Without knowing more about the context in question it's difficult to provide a fitting answer, but something like this may help:
SELECT TOP 1 id_compromiso, fecha, id_actividad, status
FROM Actividades
WHERE CAST(fecha AS DATE)>( SELECT MAX(CAST(fecha AS DATE))
FROM Actividades
WHERE status IN (2,7))
AND status NOT IN (2,7)
ORDER BY CAST(fecha AS DATE) DESC
I have set up a SQL Fiddle here.
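One way to read the requirement is: for the first row of every run of status 2 or 7, find the next row of the same id_compromiso with a different status and count the days in between. The sketch below tries that reading on the sample data using Python's sqlite3 module. It assumes that id_actividad increases with time (true in the sample) and uses it to order rows that share a date; the function name status_durations is hypothetical, and julianday() replaces T-SQL's DATEDIFF.

```python
import sqlite3

ACTIVIDADES = [
    (32, "2013-12-10", 359, 2),
    (32, "2013-12-16", 380, 5),
    (32, "2013-12-18", 401, 7),
    (32, "2013-12-24", 485, 8),
    (58, "2013-12-02", 248, 2),
    (58, "2013-12-03", 254, 2),
    (58, "2013-12-10", 360, 2),
    (58, "2013-12-10", 378, 5),
    (58, "2013-12-12", 395, 2),
]

def status_durations():
    """Return (id_compromiso, status, start, end, days) per run of status 2 or 7."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE actividades (id_compromiso INTEGER, fecha TEXT,"
        " id_actividad INTEGER, status INTEGER)"
    )
    conn.executemany("INSERT INTO actividades VALUES (?, ?, ?, ?)", ACTIVIDADES)
    rows = conn.execute(
        """
        SELECT id_compromiso, status, start_date, end_date,
               CAST(julianday(end_date) - julianday(start_date) AS INTEGER) AS days
        FROM (
            SELECT a.id_compromiso, a.status, a.fecha AS start_date,
                   -- date of the next activity with a different status
                   (SELECT MIN(b.fecha)
                    FROM actividades b
                    WHERE b.id_compromiso = a.id_compromiso
                      AND b.id_actividad > a.id_actividad
                      AND b.status <> a.status) AS end_date
            FROM actividades a
            WHERE a.status IN (2, 7)
              -- keep only the first row of each run: the previous row, if any,
              -- must have a different status
              AND NOT EXISTS (
                    SELECT 1
                    FROM actividades p
                    WHERE p.id_compromiso = a.id_compromiso
                      AND p.status = a.status
                      AND p.id_actividad = (SELECT MAX(q.id_actividad)
                                            FROM actividades q
                                            WHERE q.id_compromiso = a.id_compromiso
                                              AND q.id_actividad < a.id_actividad))
        ) AS runs
        ORDER BY id_compromiso, start_date
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in status_durations():
        print(row)
```

A run that is still open, such as the last status 2 of id_compromiso 58, comes back with NULL for both the end date and the day count.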
I have the following table
SnapShotDay OperationalUnitNumber IsOpen StatusDate
1-01-2014 001 1 1-01-2014
2-01-2014 NULL NULL NULL
3-01-2014 001 0 3-01-2014
4-01-2014 NULL NULL NULL
5-01-2014 001 1 5-01-2014
I obtain this with a SELECT construct, but what I need to do now is fill in the NULL rows with the values from the nearest preceding non-NULL row. That would give:
SnapShotDay OperationalUnitNumber IsOpen StatusDate
1-01-2014 001 1 1-01-2014
2-01-2014 001 1 1-01-2014
3-01-2014 001 0 3-01-2014
4-01-2014 001 0 3-01-2014
5-01-2014 001 1 5-01-2014
In functional terms: I have event records that give me an event on a date for an operational unit; the event is IsOpen or IsClosed. Chaining those events together by date gives a sort of range. What I need is to generate daily records for those ranges (the target is a fact table).
I am trying to achieve this in plain SQL query (no stored procedure).
Can you think of a trick ?
Declare @t table(
SnapShotDay date,
OperationalUnitNumber int,
IsOpen bit,
StatusDate date
)
insert into @t
select '1-01-2014', 001 , 1 , '1-01-2014' union all
select '2-01-2014', NULL, NULL, NULL union all
select '3-01-2014', 001 , 0 ,'3-01-2014' union all
select '4-01-2014', NULL,NULL,NULL union all
select '5-01-2014', 001 ,1,'5-01-2014'
;
with CTE as
(
select *,row_number()over( order by (select 0))rn from @t
)
select *,
case when a.isopen is null then (
select IsOpen from cte where rn=a.rn-1
) else a.isopen end
from cte a
OK, I got it. Add one more CTE, cte1, then:
,cte1 as
(
select top 1 rn ,IsOpen from cte where IsOpen is not null order by rn desc
)
--select * from Statuses
select *,
case
when a.rn<=(select b.rn from cte1 b) and a.IsOpen is null then
(
select
a1.IsOpen
from
cte a1
where
a1.rn=a.rn-1
)
when a.rn>=(select b.rn from cte1 b) and a.IsOpen is null then
(select IsOpen from cte1)
else
a.isopen
end
from
cte a
Try this. In the main query we're looking for the previous date with not null values. Then just JOIN this table with this LastDate.
WITH T1 AS
(
SELECT *, (SELECT MAX(SnapShotDay)
FROM T
WHERE SnapShotDay<=TMain.SnapShotDay
AND OPERATIONALUNITNUMBER IS NOT NULL)
as LastDate
FROM T as TMain
)
SELECT T1.SnapShotDay,
T.OperationalUnitNumber,
T.IsOpen,
T.StatusDate
FROM T1
JOIN T ON T1.LastDate=T.SnapShotDay
SELECT
t1.SnapShotDay,
CASE WHEN t1.OperationalUnitNumber IS NOT NUll
THEN t1.OperationalUnitNumber
ELSE (SELECT TOP 1 t2.OperationalUnitNumber FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.OperationalUnitNumber IS NOT NULL ORDER BY SnapShotDay DESC)
END AS OperationalUnitNumber,
CASE WHEN t1.IsOpen IS NOT NUll
THEN t1.IsOpen
ELSE (SELECT TOP 1 t2.IsOpen FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.IsOpen IS NOT NULL ORDER BY SnapShotDay DESC)
END AS IsOpen,
CASE WHEN t1.StatusDate IS NOT NUll
THEN t1.StatusDate
ELSE (SELECT TOP 1 t2.StatusDate FROM YourTable t2 WHERE t2.SnapShotDay < t1.SnapShotDay AND t2.StatusDate IS NOT NULL ORDER BY SnapShotDay DESC)
END AS StatusDate
FROM YourTable t1
You asked for 'plain sql', here is a tested attempt using SQL, with comments, that gives the required answer.
I have tested the code using SQLite and MySQL on Windows XP. It is pure SQL and should work everywhere.
SQL is about 'sets' and combining them and ordering the results.
This problem seems to be about two separate sets:
1) The 'snap shot day' that have readings.
2) the 'snap shot day' that don't have readings.
I have added extra columns so that we can easily see where values came from.
Let us deal with the easy set first:
This is the set of 'supplied' readings.
SELECT dss.SnapShotDay theDay,
'supplied' readingExists,
dss.OperationalUnitNumber,
dss.IsOpen,
dss.StatusDate
FROM dailysnapshot dss
WHERE dss.OperationalUnitNumber IS NOT NULL
results:
theDay readingExists OperationalUnitNumber IsOpen StatusDate
2014-01-01 supplied 001 1 2014-01-01
2014-01-03 supplied 001 0 2014-01-03
2014-01-05 supplied 001 1 2014-01-05
Now let us deal with the set of 'days that have missing readings'. We need to get the 'most recent day that has readings that is closest to the day with the missing readings' and assume the same values from the 'most recent day' that is before the 'current' missing day.
It sounds complex but it isn't. It asks:
For each day without a reading, get me the closest earlier date that has readings, and I will use those readings.
Here is the query:
SELECT emptyDSS.SnapShotDay,
'missing' readingExists,
maxPrevDSS.OperationalUnitNumber,
maxPrevDSS.IsOpen,
maxPrevDSS.StatusDate
FROM dailysnapshot emptyDSS
INNER JOIN dailysnapshot maxPrevDSS ON maxPrevDSS.SnapShotDay =
(SELECT MAX(dss.SnapShotDay)
FROM dailysnapshot dss
WHERE dss.SnapShotDay < emptyDSS.SnapShotDay
AND dss.OperationalUnitNumber IS NOT NULL)
WHERE emptyDSS.OperationalUnitNumber IS NULL
results:
SnapShotDay readingExists OperationalUnitNumber IsOpen StatusDate
2014-01-02 missing 001 1 2014-01-01
2014-01-04 missing 001 0 2014-01-03
This is not about efficiency! It is about getting the correct 'result set' with the easiest to understand SQL code. I assume the database engine will optimize the query. The query can be 'tweaked' later if required.
We now need to combine the two queries and order the results in the manner we require.
The standard way of combining results from SQL queries is with set operators (union, intersection, minus).
We use 'union' and an 'order by' on the result set.
This gives the final query of:
SELECT dss.SnapShotDay theDay,
'supplied' readingExists,
dss.OperationalUnitNumber,
dss.IsOpen,
dss.StatusDate
FROM dailysnapshot dss
WHERE dss.OperationalUnitNumber IS NOT NULL
UNION
SELECT emptyDSS.SnapShotDay theDay,
'missing' readingExists,
maxPrevDSS.OperationalUnitNumber,
maxPrevDSS.IsOpen,
maxPrevDSS.StatusDate
FROM dailysnapshot emptyDSS
INNER JOIN dailysnapshot maxPrevDSS ON maxPrevDSS.SnapShotDay =
(SELECT MAX(dss.SnapShotDay)
FROM dailysnapshot dss
WHERE dss.SnapShotDay < emptyDSS.SnapShotDay
AND dss.OperationalUnitNumber IS NOT NULL)
WHERE emptyDSS.OperationalUnitNumber IS NULL
ORDER BY theDay ASC
result:
theDay readingExists dss.OperationalUnitNumber dss.IsOpen dss.StatusDate
2014-01-01 supplied 001 1 2014-01-01
2014-01-02 missing 001 1 2014-01-01
2014-01-03 supplied 001 0 2014-01-03
2014-01-04 missing 001 0 2014-01-03
2014-01-05 supplied 001 1 2014-01-05
I enjoyed doing this.
It should work with most SQL engines.
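The two sets can also be produced by a single self-join, in the spirit of the "last date with non-NULL values" answer earlier in the thread: using <= instead of < makes a supplied day match itself, so no UNION is needed. Here is a minimal sketch of that variant, run through Python's sqlite3 module. The table name dailysnapshot follows the answer above, the function name fill_forward is made up, and the dates are assumed to be stored as ISO strings (YYYY-MM-DD) so that MAX and <= order them correctly as text.

```python
import sqlite3

SNAPSHOTS = [
    ("2014-01-01", "001", 1, "2014-01-01"),
    ("2014-01-02", None, None, None),
    ("2014-01-03", "001", 0, "2014-01-03"),
    ("2014-01-04", None, None, None),
    ("2014-01-05", "001", 1, "2014-01-05"),
]

def fill_forward():
    """Return one row per snapshot day, NULL rows filled from the last known day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dailysnapshot (SnapShotDay TEXT, OperationalUnitNumber TEXT,"
        " IsOpen INTEGER, StatusDate TEXT)"
    )
    conn.executemany("INSERT INTO dailysnapshot VALUES (?, ?, ?, ?)", SNAPSHOTS)
    rows = conn.execute(
        """
        SELECT cur.SnapShotDay,
               src.OperationalUnitNumber,
               src.IsOpen,
               src.StatusDate
        FROM dailysnapshot cur
        JOIN dailysnapshot src
          ON src.SnapShotDay = (
                -- latest day, up to and including the current one, that has readings
                SELECT MAX(p.SnapShotDay)
                FROM dailysnapshot p
                WHERE p.SnapShotDay <= cur.SnapShotDay
                  AND p.OperationalUnitNumber IS NOT NULL)
        ORDER BY cur.SnapShotDay
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in fill_forward():
        print(row)
```

Because this is an inner join, a leading day with no earlier reading would drop out of the result. A LEFT JOIN would keep it with NULL values instead.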