I'm connecting to Microsoft SQL Server on Tableau through a custom SQL query. I have a table with 3 fields DateTime, TagName, Value, and I want to replace null values in the Value field by the last (respecting the DateTime value) non-null value in each group of TagName.
|---------------------|------------------|-----------------|
| DateTime | TagName | Value
|---------------------|------------------|-----------------
| 15.04.2019 16:51:30| A | 10
|---------------------|------------------|-----------------
| 15.04.2019 16:52:42| A | NULL
|---------------------|------------------|-----------------
| 15.04.2019 16:53:14| A | NULL
|---------------------|------------------|-----------------
| 15.04.2019 17:52:14| A | 15
|---------------------|------------------|-----------------
| 15.04.2019 16:51:30| B | NULL
|---------------------|------------------|-----------------
| 15.04.2019 16:52:42| B | NULL
|---------------------|------------------|-----------------
| 15.04.2019 16:53:14| B | NULL
|---------------------|------------------|-----------------
| 15.04.2019 17:52:14| B | 15
|---------------------|------------------|-----------------|
The new table should look like this:
|---------------------|------------------|-----------------|
| DateTime | Computer | Value
|---------------------|------------------|-----------------
| 15.04.2019 16:51:30| A | 10
|---------------------|------------------|-----------------
| 15.04.2019 16:52:42| A | 10
|---------------------|------------------|-----------------
| 15.04.2019 16:53:14| A | 10
|---------------------|------------------|-----------------
| 15.04.2019 17:52:14| A | 15
|---------------------|------------------|-----------------
| 15.04.2019 16:51:30| B | 0
|---------------------|------------------|-----------------
| 15.04.2019 16:52:42| B | 0
|---------------------|------------------|-----------------
| 15.04.2019 16:53:14| B | 0
|---------------------|------------------|-----------------
| 15.04.2019 17:52:14| B | 15
|---------------------|------------------|-----------------|
This is already what I've tried, but it replaces NULL values without considering the TagNames values (In this example there is only one TagName).
SELECT Computer, DateTime
, CASE
WHEN Value IS NULL
THEN
(SELECT TOP 1 Value
FROM History
WHERE DateTime<T.DateTime
AND TagName='RM02EL00CPT81.rEp'
AND DateTime >='2018-12-31 23:59:00'
AND wwRetrievalMode='Delta'
AND Value IS NOT NULL ORDER BY DateTime DESC
)
ELSE Value
END
AS ValueNEW
FROM History T
WHERE TagName='RM02EL00CPT81.rEp' AND DateTime >='2018-12-31 23:59:00' AND wwRetrievalMode='Delta'
I wanted to do almost the same thing by adding OVER(PARTITION BY TagName), but it threw an error. (This is because it doesn't work with SELECT TOP 1.)
This is a "classic" Gaps and Islands question. You can achieve this without two scans of the table or a triangular join, by using window functions:
WITH VTE AS(
SELECT CONVERT(datetime, [DateTime],104) AS [DateTime],
TagName,
[Value]
FROM (VALUES ('15.04.2019 16:51:30','A',10 ),
('15.04.2019 16:52:42','A',NULL),
('15.04.2019 16:53:14','A',NULL),
('15.04.2019 17:52:14','A',15 ),
('15.04.2019 16:51:30','B',NULL),
('15.04.2019 16:52:42','B',NULL),
('15.04.2019 16:53:14','B',NULL),
('15.04.2019 17:52:14','B',15 )) V([DateTime],TagName,[Value])),
Grps AS(
SELECT [DateTime],
TagName,
[Value],
COUNT(CASE WHEN [Value] IS NOT NULL THEN 1 END) OVER (PARTITION BY TagName ORDER BY [DateTime]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM VTE)
SELECT DateTime,
TagName,
ISNULL(MAX([Value]) OVER (PARTITION BY TagName, Grp),0) AS [Value]
FROM Grps
ORDER BY TagName, [DateTime]
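The running-count trick may be easier to follow when it can be run end to end. Below is a minimal sketch, assuming SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for SQL Server; the table name readings and the columns ts, tag and val are made up for the illustration. COUNT(val) only counts non-null values, so every NULL row inherits the group number of the last non-null row before it.

```python
import sqlite3

# Sample rows from the question; ISO timestamps so text ordering is chronological.
ROWS = [
    ("2019-04-15 16:51:30", "A", 10),
    ("2019-04-15 16:52:42", "A", None),
    ("2019-04-15 16:53:14", "A", None),
    ("2019-04-15 17:52:14", "A", 15),
    ("2019-04-15 16:51:30", "B", None),
    ("2019-04-15 16:52:42", "B", None),
    ("2019-04-15 16:53:14", "B", None),
    ("2019-04-15 17:52:14", "B", 15),
]

FILL_SQL = """
WITH grps AS (
    SELECT ts, tag, val,
           -- running count of non-null values: a new group starts at each non-null row
           COUNT(val) OVER (PARTITION BY tag ORDER BY ts
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM readings
)
SELECT ts, tag,
       -- each group holds at most one non-null value, so MAX picks it; 0 if none yet
       COALESCE(MAX(val) OVER (PARTITION BY tag, grp), 0) AS val
FROM grps
ORDER BY tag, ts
"""


def forward_fill(rows):
    """Load rows into an in-memory table and return them with NULLs filled."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (ts TEXT, tag TEXT, val INTEGER)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute(FILL_SQL).fetchall()
    con.close()
    return result


filled = forward_fill(ROWS)
for row in filled:
    print(row)
```

Tag B has no earlier value for its first three rows, so they fall into group 0 and end up as 0, which matches the desired output in the question.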
Try this
;WITH CTE([DateTime],TagName,Valu)
AS
(
SELECT '15.04.2019 16:51:30','A' , 10 UNION ALL
SELECT '15.04.2019 16:52:42','A' , NULL UNION ALL
SELECT '15.04.2019 16:53:14','A' , NULL UNION ALL
SELECT '15.04.2019 17:52:14','A' , 15 UNION ALL
SELECT '15.04.2019 16:51:30','B' , NULL UNION ALL
SELECT '15.04.2019 16:52:42','B' , NULL UNION ALL
SELECT '15.04.2019 16:53:14','B' , NULL UNION ALL
SELECT '15.04.2019 17:52:14','B' , 15
)
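SELECT [DateTime],TagName As Computer,
ISNULL(CASE WHEN Valu IS NOT NULL
THEN Valu
ELSE
(
SELECT TOP 1 i.Valu FROM
CTE i
WHERE i.TagName = o.TagName
AND i.Valu IS NOT NULL
-- only look at earlier rows of the same tag (style 104 = dd.mm.yyyy)
AND CONVERT(datetime, i.[DateTime], 104) < CONVERT(datetime, o.[DateTime], 104)
-- TOP 1 needs an ORDER BY to reliably return the latest earlier value
ORDER BY CONVERT(datetime, i.[DateTime], 104) DESC
) END,0) As Valu
FROM CTE o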
Result
DateTime Computer Valu
---------------------------------------------
15.04.2019 16:51:30 A 10
15.04.2019 16:52:42 A 10
15.04.2019 16:53:14 A 10
15.04.2019 17:52:14 A 15
15.04.2019 16:51:30 B 0
15.04.2019 16:52:42 B 0
15.04.2019 16:53:14 B 0
15.04.2019 17:52:14 B 15
So you're trying to retrieve data from Wonderware Historian. Perhaps you don't need any windowing and replacing, because the Historian retrieval engine should be able to give you the data you need without nulls. Try this:
select DateTime, TagName as Computer, Value
from History
where TagName in ('A', 'B') --put here the tagnames you want to retrieve
and DateTime > '2018-12-31'
AND wwRetrievalMode='Delta'
order by TagName, DateTime
I have a datetime column whose consecutive values are normally 5 minutes apart. I want to check whether the column contains any interval shorter than 5 minutes, in particular intervals of 5 seconds.
So for example:
one date would read 2018-05-04 19:21:46.000
the next row would read 2018-05-04 19:26:46.000
and 2018-05-04 19:31:46.000.
However, we sometimes get rows that read:
2018-05-04 19:36:46.000
then 2018-05-04 19:36:51.000
then 2018-05-04 19:36:56.000
What SQL script would be best to filter the column to distinguish the erroneous data (the 5 secs interval) from the correct data (5 min interval) especially in a table with thousands of rows?
Hi @Andrea, thanks for that. I have a couple of questions. What does the 'q' stand for? And when I rewrite the query as
SELECT ProductID, MyTimestamp, DATEDIFF(second, xMyTimestamp, MyTimestamp) as DIFFERENCE_IN_SECONDS
FROM (
SELECT *,
Lag(MyTimestamp) OVER (ORDER BY MyTimestamp, ProductID) as xMyTimestamp
FROM TableName
) q
WHERE xMyTimestamp IS NOT NULL and ProductID= 31928
I get this result which doesn't compute the time accurately.
+-----------+-------------------------+-----------------------+
| ProductID | MyTimestamp | DIFFERENCE_IN_SECONDS |
+-----------+-------------------------+-----------------------+
| 31928 | 2017-03-21 13:36:30.000 | 0 |
| 31928 | 2017-03-21 13:46:30.000 | 0 |
| 31928 | 2017-03-21 13:56:32.000 | 0 |
| 31928 | 2017-03-21 14:01:32.000 | 0 |
| 31928 | 2017-03-21 14:11:32.000 | 0 |
| 31928 | 2017-03-21 14:16:32.000 | 0 |
| 31928 | 2017-03-21 14:26:32.000 | 0 |
| 31928 | 2017-03-21 14:36:32.000 | 0 |
+-----------+-------------------------+-----------------------+
Any reason why
Since you are on SQL Server 2014, you can use LEAD to compare the value of one row to the value of the next.
declare @table table(id int identity(1,1), interval datetime)
insert into @table
values
('2018-05-04 19:21:46.000'),
('2018-05-04 19:26:46.000'),
('2018-05-04 19:31:46.000'),
('2018-05-04 19:36:46.000'),
('2018-05-04 19:36:51.000'),
('2018-05-04 19:36:56.000')
select
id
,interval
,issue_with_row = case
when
-- note: ISNULL(..., 0) also flags the last row, which has no next value
isnull(datediff(minute,interval,lead(interval) over (order by id, interval)),0) < 5
then 1
else 0
end
from @table
order by id
Or if you wanted to only see those,
;with cte as(
select
id
,interval
,issue_with_row = case
when
isnull(datediff(minute,interval,lead(interval) over (order by id, interval)),0) < 5
then 1
else 0
end
from @table)
select *
from cte
where issue_with_row = 1
You can use LAG:
declare @tmp table(MyTimestamp datetime)
insert into @tmp values
('2018-05-04 19:21:46.000')
,('2018-05-04 19:26:46.000')
,('2018-05-04 19:31:46.000')
,('2018-05-04 19:36:46.000')
,('2018-05-04 19:36:51.000')
,('2018-05-04 19:36:56.000')
SELECT DATEDIFF(second, xMyTimestamp, MyTimestamp) as DIFFERENCE_IN_SECONDS
FROM (
SELECT *,
LAG(MyTimestamp) OVER (ORDER BY MyTimestamp) xMyTimestamp
FROM @tmp
) q
WHERE xMyTimestamp IS NOT NULL
results:
So you should use it like this:
SELECT DATEDIFF(second, xMyTimestamp, MyTimestamp) as DIFFERENCE_IN_SECONDS
FROM (
SELECT *,
LAG(MyTimestamp) OVER (ORDER BY MyTimestamp) xMyTimestamp
FROM [YOUR_TABLE_NAME_HERE]
) q
WHERE xMyTimestamp IS NOT NULL
Edit
Here is another sample based on new data posted by OP:
declare @tmp table(ProductID int, MyTimestamp datetime)
insert into @tmp values
(31928, '2017-03-21 13:36:30.000')
,(31928, '2017-03-21 13:46:30.000')
,(31928, '2017-03-21 13:56:32.000')
,(31928, '2017-03-21 14:01:32.000')
,(31928, '2017-03-21 14:11:32.000')
,(31928, '2017-03-21 14:16:32.000')
,(31928, '2017-03-21 14:26:32.000')
,(31928, '2017-03-21 14:36:32.000')
SELECT ProductID
,MyTimestamp
,DATEDIFF(second, xMyTimestamp, MyTimestamp) AS DIFFERENCE_IN_SECONDS
FROM (
SELECT *
-- if the real table holds several products, add PARTITION BY ProductID here
-- so that LAG only looks at the previous row of the same product
,Lag(MyTimestamp) OVER (
ORDER BY MyTimestamp
,ProductID
) AS xMyTimestamp
FROM @tmp
) q
WHERE xMyTimestamp IS NOT NULL
AND ProductID = 31928
Output:
Here you can check that the results are calculated correctly.
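To go from a list of differences to the filter the question asks for, the LAG result can be compared against a threshold. A minimal sketch, assuming SQLite 3.25 or later (through Python's sqlite3 module) instead of SQL Server; the table name samples, the column ts and the 300-second threshold are assumptions made for the illustration. SQLite has no DATEDIFF, so strftime('%s', ...) is used here to get epoch seconds.

```python
import sqlite3

# Timestamps from the question: 5-minute steps, then two 5-second steps.
TIMESTAMPS = [
    "2018-05-04 19:21:46.000",
    "2018-05-04 19:26:46.000",
    "2018-05-04 19:31:46.000",
    "2018-05-04 19:36:46.000",
    "2018-05-04 19:36:51.000",
    "2018-05-04 19:36:56.000",
]

SHORT_GAP_SQL = """
SELECT ts, gap_seconds
FROM (
    SELECT ts,
           -- seconds since the previous row; NULL for the first row
           CAST(strftime('%s', ts) AS INTEGER)
             - CAST(strftime('%s', LAG(ts) OVER (ORDER BY ts)) AS INTEGER) AS gap_seconds
    FROM samples
) q
WHERE gap_seconds < ?   -- a NULL gap never passes this test, so the first row is skipped
ORDER BY ts
"""


def short_gaps(timestamps, threshold_seconds):
    """Return (timestamp, gap) for rows that follow their predecessor too closely."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE samples (ts TEXT)")
    con.executemany("INSERT INTO samples VALUES (?)", [(t,) for t in timestamps])
    rows = con.execute(SHORT_GAP_SQL, (threshold_seconds,)).fetchall()
    con.close()
    return rows


suspicious = short_gaps(TIMESTAMPS, 300)
for row in suspicious:
    print(row)
```

With LAG, the flagged row is the one that arrived too soon after its predecessor, which is usually the row you want to inspect or discard.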
I have searched high and low for weeks now trying to find a solution to my problem.
As far as I can ascertain, my SQL Server version (2008 R2) is a limiting factor on this, but I am positive there is a solution out there.
My problem is as follows:
I have a table with potentially contiguous dates in the form of Customer-Status-DateStart-DateEnd-EventID.
I need to merge contiguous dates by customer and status. The status field can shift up and down throughout a customer's pathway.
Some example data is as follows:
DECLARE @Tbl TABLE([CustomerID] INT
,[Status] INT
,[DateStart] DATE
,[DateEnd] DATE
,[EventID] INT)
INSERT INTO @Tbl
VALUES (1,1,'20160101','20160104',1)
,(1,1,'20160104','20160108',3)
,(1,2,'20160108','20160110',4)
,(1,1,'20160110','20160113',7)
,(1,3,'20160113','20160113',9)
,(1,3,'20160113',NULL,10)
,(2,1,'20160101',NULL,2)
,(3,2,'20160109','20160110',5)
,(3,1,'20160110','20160112',6)
,(3,1,'20160112','20160114',8)
Desired output:
Customer | Status | DateStart | DateEnd
---------+--------+-----------+-----------
1 | 1 | 2016-01-01| 2016-01-08
1 | 2 | 2016-01-08| 2016-01-10
1 | 1 | 2016-01-10| 2016-01-13
1 | 3 | 2016-01-13| NULL
2 | 1 | 2016-01-01| NULL
3 | 2 | 2016-01-09| 2016-01-10
3 | 1 | 2016-01-10| 2016-01-14
Any ideas or code will be gratefully received.
Thanks,
Dan
Try this
DECLARE @Tbl TABLE([CustomerID] INT
,[Status] INT
,[DateStart] DATE
,[DateEnd] DATE
,[EventID] INT)
INSERT INTO @Tbl
VALUES (1,1,'20160101','20160104',1)
,(1,1,'20160104','20160108',3)
,(1,2,'20160108','20160110',4)
,(1,1,'20160110','20160113',7)
,(1,3,'20160113','20160113',9)
,(1,3,'20160113',NULL,10)
,(2,1,'20160101',NULL,2)
,(3,2,'20160109','20160110',5)
,(3,1,'20160110','20160112',6)
,(3,1,'20160112','20160114',8)
;WITH CTE
AS
(
SELECT CustomerID ,
Status ,
DateStart ,
COALESCE(DateEnd, '9999-01-01') AS DateEnd,
EventID,
ROW_NUMBER() OVER (ORDER BY CustomerID, EventID) RowId,
ROW_NUMBER() OVER (PARTITION BY CustomerID, Status ORDER BY EventID) StatusRowId FROM @Tbl
)
SELECT
A.CustomerID ,
A.Status ,
A.DateStart ,
CASE WHEN A.DateEnd = '9999-01-01' THEN NULL
ELSE A.DateEnd END AS DateEnd
FROM
(
SELECT
CTE.CustomerID,
CTE.Status,
MIN(CTE.DateStart) AS DateStart,
MAX(CTE.DateEnd) AS DateEnd
FROM
CTE
GROUP BY
CTE.CustomerID,
CTE.Status,
CTE.StatusRowId -CTE.RowId
) A
ORDER BY A.CustomerID, A.DateStart
Output
CustomerID Status DateStart DateEnd
----------- ----------- ---------- ----------
1 1 2016-01-01 2016-01-08
1 2 2016-01-08 2016-01-10
1 1 2016-01-10 2016-01-13
1 3 2016-01-13 NULL
2 1 2016-01-01 NULL
3 2 2016-01-09 2016-01-10
3 1 2016-01-10 2016-01-14
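The difference between the two row numbers is what identifies each run of consecutive rows with the same status. A minimal sketch of the same idea, assuming SQLite 3.25 or later (through Python's sqlite3 module) rather than SQL Server 2008 R2; the table name events and the snake_case column names are made up. As a variation, it compares COUNT(date_end) with COUNT(*) to detect an open-ended NULL instead of substituting a sentinel date. Like the query above, it groups by event order and does not verify that the dates actually touch.

```python
import sqlite3

# (customer_id, status, date_start, date_end, event_id) from the question.
EVENTS = [
    (1, 1, "2016-01-01", "2016-01-04", 1),
    (1, 1, "2016-01-04", "2016-01-08", 3),
    (1, 2, "2016-01-08", "2016-01-10", 4),
    (1, 1, "2016-01-10", "2016-01-13", 7),
    (1, 3, "2016-01-13", "2016-01-13", 9),
    (1, 3, "2016-01-13", None, 10),
    (2, 1, "2016-01-01", None, 2),
    (3, 2, "2016-01-09", "2016-01-10", 5),
    (3, 1, "2016-01-10", "2016-01-12", 6),
    (3, 1, "2016-01-12", "2016-01-14", 8),
]

MERGE_SQL = """
WITH numbered AS (
    SELECT customer_id, status, date_start, date_end,
           -- constant within a run of consecutive rows sharing the same status
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_id)
         - ROW_NUMBER() OVER (PARTITION BY customer_id, status ORDER BY event_id) AS island
    FROM events
)
SELECT customer_id, status,
       MIN(date_start) AS first_start,
       -- COUNT(date_end) skips NULLs: if any row is open-ended, keep the end NULL
       CASE WHEN COUNT(date_end) < COUNT(*) THEN NULL ELSE MAX(date_end) END AS last_end
FROM numbered
GROUP BY customer_id, status, island
ORDER BY customer_id, first_start
"""


def merge_periods(events):
    """Collapse consecutive rows with the same customer and status into one period."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (customer_id INTEGER, status INTEGER, "
        "date_start TEXT, date_end TEXT, event_id INTEGER)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", events)
    rows = con.execute(MERGE_SQL).fetchall()
    con.close()
    return rows


merged = merge_periods(EVENTS)
for row in merged:
    print(row)
```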
Here's the data:
[ TABLE_1 ]
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|--------|--------|--------|--------|-------|
1 | null | null | null | null | null | null |
2 | null | null | null | null | null | null |
3 | null | null | null | null | null | null |
[ TABLE_2 ]
id | date | product |
-----|-------------|-----------|
1 | 20140101 | X |
1 | 20140102 | Y |
1 | 20140103 | Z |
2 | 20141201 | data |
2 | 20141201 | Y |
2 | 20141201 | Z |
3 | 20150101 | data2 |
3 | 20150101 | data3 |
3 | 20160101 | X |
Both tables have other columns not listed here.
date is formatted: yyyymmdd and datatype is int.
[ TABLE_2 ] doesn't have empty rows, just tried to make sample above more readable.
Here's the Goal:
I need to update [ TABLE_1 ] prod1,date1,prod2,date2,prod3,date3
with product collected from [ TABLE_2 ] with corresponding date values.
Data must be sorted so that "latest" product becomes prod1,
2nd latest product will be prod2 and 3rd is prod3.
Latest product = biggest date (int).
If dates are equal, order doesn't matter. (see id=2 and id=3).
Updated [ TABLE_1 ] should be:
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|----------|--------|----------|--------|----------|
1 | Z | 20140103 | Y | 20140102 | X | 20140101 |
2 | data | 20141201 | Y | 20141201 | Z | 20141201 |
3 | X | 20160101 | data2 | 20150101 | data3 | 20150101 |
Ultimate goal is to get the following :
[ TABLE_3 ]
id | order1 | order2 | order3 | + Columns from [ TABLE_1 ]
---|--------------------|----------------------|------------|--------------------------
1 | 20140103:Z | 20140102:Y | 20140101:X |
2 | 20141201:data:Y:Z | NULL | NULL |
3 | 20160101:X | 20150101:data2:data3 | NULL |
I have to admit this exceeds my knowledge and I haven't tried anything.
Should I do it with JOIN or SELECT subquery?
Should I try to make it in one SQL -clause or perhaps in 3 steps,
each prod&date -pair at the time ?
What about creating [ TABLE_3 ] ?
It has to have columns from [ TABLE_1 ].
Is it easiest to create it from [ TABLE_2 ] -data or Updated [ TABLE_1 ] ?
Any help would be highly appreciated.
Thanks in advance.
I'll post some of my own shots on comments.
After looking into it (following my comment), I think a stored procedure would be best: you can call it to view the data as a pivot and do away with TABLE_1. If this needs to be dynamic, you'll have to look into dynamic pivots; the version below is a bit of a hack with CTEs:
CREATE PROCEDURE DBO.VIEW_AS_PIVOTED_DATA
AS
;WITH CTE AS (
SELECT ID, [DATE], 'DATE' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE2 AS (
SELECT ID, PRODUCT, 'PROD' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE3 AS (
SELECT ID, [DATE1], [DATE2], [DATE3]
FROM CTE
PIVOT(MAX([DATE]) FOR RN IN ([DATE1],[DATE2],[DATE3])) PIV)
, CTE4 AS (
SELECT ID, [PROD1], [PROD2], [PROD3]
FROM CTE2
PIVOT(MAX(PRODUCT) FOR RN IN ([PROD1],[PROD2],[PROD3])) PIV)
SELECT A.ID, [PROD1], [DATE1], [PROD2], [DATE2], [PROD3], [DATE3]
FROM CTE3 AS A
JOIN CTE4 AS B
ON A.ID=B.ID
Construction:
WITH ranked AS (
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
)
SELECT id, [prod1],[date1],[prod2],[date2],[prod3],[date3]
FROM
(
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
) unpivoted
PIVOT
(
max(value)
for col IN ([prod1],[date1],[prod2],[date2],[prod3],[date3])
) pivoted
You need to take a few steps to achieve the aim.
Rank your products by date:
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
Unpivot your date and product columns into one column. You can use UNPIVOT OR CROSS APPLY statements. I prefer CROSS APPLY
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
or the same result using UNPIVOT
SELECT id, type+cast(rn as varchar(1)) col, value
FROM (
SELECT [id],
rn,
CAST([date] AS varchar(500)) date,
CAST([product] AS varchar(500)) prod
FROM ranked) t
UNPIVOT
(
value FOR type IN ([date], [prod]) -- must match the aliases in the inner SELECT
) unpvt
Finally, use PIVOT to get the result.
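The rank-then-spread steps can also be tried without PIVOT. The sketch below assumes SQLite 3.25 or later (through Python's sqlite3 module), which has no PIVOT operator, so it swaps in conditional aggregation (MAX over a CASE per target column); the same pattern also runs on SQL Server. The lower-case table and column names are made up, and the tie-break on product is an added assumption so that equal dates give a repeatable order.

```python
import sqlite3

# (id, dt, product) rows of TABLE_2 from the question; dt is an int in yyyymmdd form.
TABLE_2 = [
    (1, 20140101, "X"), (1, 20140102, "Y"), (1, 20140103, "Z"),
    (2, 20141201, "data"), (2, 20141201, "Y"), (2, 20141201, "Z"),
    (3, 20150101, "data2"), (3, 20150101, "data3"), (3, 20160101, "X"),
]

TOP3_SQL = """
WITH ranked AS (
    SELECT id, dt, product,
           -- latest date first; product is only a tie-breaker (assumption)
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC, product) AS rn
    FROM table_2
)
SELECT id,
       MAX(CASE WHEN rn = 1 THEN product END) AS prod1,
       MAX(CASE WHEN rn = 1 THEN dt END)      AS date1,
       MAX(CASE WHEN rn = 2 THEN product END) AS prod2,
       MAX(CASE WHEN rn = 2 THEN dt END)      AS date2,
       MAX(CASE WHEN rn = 3 THEN product END) AS prod3,
       MAX(CASE WHEN rn = 3 THEN dt END)      AS date3
FROM ranked
WHERE rn <= 3
GROUP BY id
ORDER BY id
"""


def latest_three(rows):
    """Return one row per id with its three most recent products and dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_2 (id INTEGER, dt INTEGER, product TEXT)")
    con.executemany("INSERT INTO table_2 VALUES (?, ?, ?)", rows)
    result = con.execute(TOP3_SQL).fetchall()
    con.close()
    return result


wide = latest_three(TABLE_2)
for row in wide:
    print(row)
```

Because each (id, rn) pair occurs at most once, every MAX sees a single non-null value, so the aggregate only serves to collapse the rows.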
I have a table with the following format
YEAR, MONTH, ITEM, REQ_QTY1, REQ_QTY2 , ....REQ_QTY31 ,CONVERTED1, CONVERTED2 ....CONVERTED31
Where the suffix of each column is the day of the month.
I need to convert it to the following format, where Day_of_month is the numeric suffix of each column
YEAR, MONTH, DAY_OF_MONTH, ITEM, REQ_QTY, CONVERTED
I thought of using CROSS APPLY to retrieve the data, but I can't use CROSS APPLY to get the "Day of Month"
SELECT A.YEAR, A.MONTH, A.ITEM, B.REQ_QTY, B.CONVERTED
FROM TEST A
CROSS APPLY
(VALUES
(REQ_QTY1, CONVERTED1),
(REQ_QTY2, CONVERTED2),
(REQ_QTY3, CONVERTED3),
......
(REQ_QTY31, CONVERTED31)
)B (REQ_QTY, CONVERTED)
The only way I found is to use a nested select with inner join
SELECT A.YEAR, A.MONTH, A.DAY_OF_MONTH, A.ITEM,A.REQ_QTY, D.CONVERTED FROM
(SELECT YEAR, MONTH, ITEM, SUBSTRING(DAY_OF_MONTH,8,2) AS DAY_OF_MONTH, REQ_QTY FROM TEST
UNPIVOT
(REQ_QTY FOR DAY_OF_MONTH IN ([REQ_QTY1],[REQ_QTY2],[REQ_QTY3],......[REQ_QTY30],[REQ_QTY31])
) B
) A
INNER JOIN (SELECT YEAR, MONTH, ITEM, SUBSTRING(DAY_OF_MONTH,10,2) AS DAY_OF_MONTH, CONVERTED FROM TEST
UNPIVOT
(CONVERTED FOR DAY_OF_MONTH IN ([CONVERTED1],[CONVERTED2],[CONVERTED3],....[CONVERTED30],[CONVERTED31])
) C
) D
ON D.YEAR = A.YEAR AND D.MONTH = A.MONTH AND D.ITEM = A.ITEM AND D.DAY_OF_MONTH = A.DAY_OF_MONTH
Is there a way to use CROSS APPLY and yet get the DAY_OF_MONTH out?
This is not a solution with CROSS APPLY, but it should be somewhat faster, since it uses a simpler approach and produces a simpler execution plan.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE Test_Table([YEAR] INT, [MONTH] INT, [ITEM] INT, REQ_QTY1 INT
, REQ_QTY2 INT ,REQ_QTY3 INT , CONVERTED1 INT, CONVERTED2 INT, CONVERTED3 INT)
INSERT INTO Test_Table VALUES
( 2015 , 1 , 1 , 10 , 20 , 30 , 100 , 200 , 300),
( 2015 , 2 , 1 , 10 , 20 , 30 , 100 , 200 , 300),
( 2015 , 3 , 1 , 10 , 20 , 30 , 100 , 200 , 300)
Query 1:
SELECT *
FROM
(
SELECT [YEAR]
,[MONTH]
,ITEM
,Vals
,CASE WHEN LEFT(N,3) = 'REQ' THEN SUBSTRING(N,8 ,2)
WHEN LEFT(N,3) = 'CON' THEN SUBSTRING(N,10,2)
END AS Day_Of_Month
,CASE WHEN LEFT(N,3) = 'REQ' THEN LEFT(N,7)
WHEN LEFT(N,3) = 'CON' THEN LEFT(N,9)
END AS Tran_Type
FROM Test_Table t
UNPIVOT (Vals FOR N IN ([REQ_QTY1],[REQ_QTY2],[REQ_QTY3],
[CONVERTED1],[CONVERTED2],[CONVERTED3]))up
)t2
PIVOT (SUM(Vals)
FOR Tran_Type
IN (REQ_QTY, CONVERTED))p
Results:
| YEAR | MONTH | ITEM | Day_Of_Month | REQ_QTY | CONVERTED |
|------|-------|------|--------------|---------|-----------|
| 2015 | 1 | 1 | 1 | 10 | 100 |
| 2015 | 1 | 1 | 2 | 20 | 200 |
| 2015 | 1 | 1 | 3 | 30 | 300 |
| 2015 | 2 | 1 | 1 | 10 | 100 |
| 2015 | 2 | 1 | 2 | 20 | 200 |
| 2015 | 2 | 1 | 3 | 30 | 300 |
| 2015 | 3 | 1 | 1 | 10 | 100 |
| 2015 | 3 | 1 | 2 | 20 | 200 |
| 2015 | 3 | 1 | 3 | 30 | 300 |
Well, I found a way using CROSS APPLY, but instead of taking a substring, I'm basically hardcoding the days. Works well enough so...
SELECT A.YEAR, A.MONTH, A.ITEM, B.DAY_OF_MONTH, B.REQ_QTY, B.CONVERTED
FROM TEST A
CROSS APPLY
(
VALUES
('01', REQ_QTY1, CONVERTED1),
('02', REQ_QTY2, CONVERTED2),
('03', REQ_QTY3, CONVERTED3),
('04', REQ_QTY4, CONVERTED4),
......
('31', REQ_QTY31, CONVERTED31)
) B (DAY_OF_MONTH, REQ_QTY, CONVERTED)
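Typing 31 nearly identical VALUES rows by hand is error-prone, so one option is to generate the statement. The following is a small sketch of a hypothetical helper, build_unpivot_sql, that only builds the query text shown above; it does not connect to any database, and the default table name TEST is taken from the question.

```python
def build_unpivot_sql(table="TEST", days=31):
    """Build the CROSS APPLY (VALUES ...) query text for REQ_QTY1..N / CONVERTED1..N."""
    # One VALUES row per day: a zero-padded day label plus the two matching columns.
    value_rows = ",\n".join(
        f"    ('{day:02d}', REQ_QTY{day}, CONVERTED{day})"
        for day in range(1, days + 1)
    )
    return (
        "SELECT A.YEAR, A.MONTH, A.ITEM, B.DAY_OF_MONTH, B.REQ_QTY, B.CONVERTED\n"
        f"FROM {table} A\n"
        "CROSS APPLY\n"
        "(\n"
        "    VALUES\n"
        f"{value_rows}\n"
        ") B (DAY_OF_MONTH, REQ_QTY, CONVERTED)"
    )


# Print a short version to eyeball the shape before generating all 31 rows.
print(build_unpivot_sql(days=3))
```

The generated text can then be pasted into the custom query; the table name is interpolated as-is, so it should only ever come from trusted input.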
I am struggling with developing a query to compare changes in a single table from month to month, example data -
+-----------------------------------------------------------+
| TaxGroupDetails |
+-----------+--+----------+--+-----------+--+---------------+
| Tax Group | | Tax Type | | Geocode | | EffectiveDate |
+-----------+--+----------+--+-----------+--+---------------+
| 2001 | | 1D | | 440011111 | | 1120531 |
| 2001 | | X1 | | 440011111 | | 1120531 |
| 2001 | | D3 | | 440011111 | | 1120531 |
| 2001 | | DGH | | 440011111 | | 1120531 |
| 2001 | | 1D | | 440011111 | | 1130101 |
| 2001 | | X1 | | 440011111 | | 1130101 |
| 2001 | | D3 | | 440011111 | | 1130101 |
| 2001 | | 1D | | 440011111 | | 1140201 |
| 2001 | | X1 | | 440011111 | | 1140201 |
| 2001 | | D3 | | 440011111 | | 1140201 |
| 2001 | | Z9 | | 440011111 | | 1140201 |
+-----------+--+----------+--+-----------+--+---------------+
I want to see the changes in the table, what was added or removed from a taxgroup, between the top two effective dates.
The results I am trying to obtain based on the sample data would be Z9 (added) if I was running the query in February (1140201) of this year.
If I was running the query in January (1130101) of last year I would expect to see DGH (removed)
I would expect two separate queries, one to show what was added and another to show what was removed.
I have tried multiple avenues to come up with these two queries but can't seem to obtain the correct results. Can anyone point me in the right direction?
-- Rows present in the current period but not in the previous one
SELECT
[Current].TaxGroup,
[Current].TaxType,
[Current].GeoCode,
'Added' AS ChangeType
FROM
TaxGroupDetails AS [Current] -- CURRENT is a reserved word in T-SQL, hence the brackets
WHERE
[Current].EffectiveDate = @CurrentPeriod AND
NOT EXISTS
(
SELECT *
FROM TaxGroupDetails As Previous
WHERE
Previous.EffectiveDate = @PreviousPeriod and
[Current].TaxGroup = Previous.TaxGroup and
[Current].TaxType = Previous.TaxType and
[Current].GeoCode = Previous.GeoCode
)
UNION ALL
-- Rows present in the previous period but not in the current one
SELECT
Previous.TaxGroup,
Previous.TaxType,
Previous.GeoCode,
'Removed'
FROM
TaxGroupDetails AS Previous
WHERE
Previous.EffectiveDate = @PreviousPeriod AND
NOT EXISTS
(
SELECT *
FROM TaxGroupDetails As [Current]
WHERE
[Current].EffectiveDate = @CurrentPeriod and
[Current].TaxGroup = Previous.TaxGroup and
[Current].TaxType = Previous.TaxType and
[Current].GeoCode = Previous.GeoCode
)
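The NOT EXISTS comparison can be checked against the sample data from the question. A minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server; the table name tax_group_details, the snake_case columns and the helper only_in are made up for the illustration. One query serves both directions: swapping the two periods turns "added" into "removed".

```python
import sqlite3

# (tax_group, tax_type, geocode, effective_date) from the question.
DETAILS = [
    (2001, "1D", 440011111, 1120531), (2001, "X1", 440011111, 1120531),
    (2001, "D3", 440011111, 1120531), (2001, "DGH", 440011111, 1120531),
    (2001, "1D", 440011111, 1130101), (2001, "X1", 440011111, 1130101),
    (2001, "D3", 440011111, 1130101),
    (2001, "1D", 440011111, 1140201), (2001, "X1", 440011111, 1140201),
    (2001, "D3", 440011111, 1140201), (2001, "Z9", 440011111, 1140201),
]

ONLY_IN_SQL = """
SELECT a.tax_group, a.tax_type, a.geocode
FROM tax_group_details AS a
WHERE a.effective_date = :present
  AND NOT EXISTS (
      SELECT 1
      FROM tax_group_details AS b
      WHERE b.effective_date = :absent
        AND b.tax_group = a.tax_group
        AND b.tax_type  = a.tax_type
        AND b.geocode   = a.geocode
  )
ORDER BY a.tax_type
"""


def only_in(con, present, absent):
    """Rows that exist for the `present` date but have no match for the `absent` date."""
    return con.execute(ONLY_IN_SQL, {"present": present, "absent": absent}).fetchall()


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tax_group_details "
    "(tax_group INTEGER, tax_type TEXT, geocode INTEGER, effective_date INTEGER)"
)
con.executemany("INSERT INTO tax_group_details VALUES (?, ?, ?, ?)", DETAILS)

added = only_in(con, present=1140201, absent=1130101)    # new in February
removed = only_in(con, present=1120531, absent=1130101)  # gone by January
print("added:", added)
print("removed:", removed)
```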
As you say you need two queries, one to select each of the two groups of data you want to compare.
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1120531
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1140201
You then need to join these two together using some form of key, the combination of tax group and tax type seems sensible here.
SELECT *
FROM
(
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1120531
) AS FirstGroup
FULL OUTER JOIN
(
SELECT [Tax Group], [Tax Type], [Geocode], [EffectiveDate]
FROM TaxGroupDetails
WHERE EffectiveDate = 1140201
) AS SecondGroup
ON FirstGroup.[Tax Group] = SecondGroup.[Tax Group]
AND FirstGroup.[Tax Type] = SecondGroup.[Tax Type]
The FULL OUTER JOIN here tells SQL Server to keep a row from either side even when no matching row exists on the other side.
Finally let's tidy up and order the columns and not use a *:
SELECT COALESCE(FirstGroup.[Tax Group], SecondGroup.[Tax Group]),
COALESCE(FirstGroup.[Tax Type], SecondGroup.[Tax Type]),
FirstGroup.Geocode, SecondGroup.Geocode,
FirstGroup.EffectiveDate, SecondGroup.EffectiveDate
FROM
.
.
.
COALESCE removes the NULLs from the first matched columns, and since we are saying these must be equal there is no point showing both copies.
The set-based solution: take the difference between the whole table and the whole table with all dates projected forward by one time interval. That will eliminate all rows except the ones with "new" codes.
SELECT
[TaxGroup],
[Tax Type],
[EffectiveDate]
FROM TaxGroupDetails t
EXCEPT
SELECT
[TaxGroup],
[Tax Type],
( SELECT MIN([EffectiveDate])
FROM TaxGroupDetails
WHERE [EffectiveDate] > t.[EffectiveDate]
AND [TaxGroup] = t.[TaxGroup]
)
FROM TaxGroupDetails t
To see what got deleted, project backwards instead. Change the subquery to:
SELECT MAX([EffectiveDate])
FROM TaxGroupDetails
WHERE [EffectiveDate] < t.[EffectiveDate]
AND [TaxGroup] = t.[TaxGroup]
If you have SQL2012:
WITH t AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY [TaxGroup], [Tax Type] ORDER BY [EffectiveDate] ASC) rownum
FROM TaxGroupDetails
)
SELECT *
FROM t
WHERE rownum = 1
AND [EffectiveDate] = @Date
To get the other query, change ASC to DESC
Try this / you could start from this [partial] solution:
DECLARE @MyTable TABLE (
ID INT IDENTITY PRIMARY KEY,
[Tax Group] SMALLINT NOT NULL,
[Tax Type] VARCHAR(3) NOT NULL,
[Geocode] INT NOT NULL,
[EffectiveDate] INT NOT NULL
);
INSERT @MyTable
SELECT 2001, '1D ', 440011111, 1120531
UNION ALL SELECT 2001, 'X1 ', 440011111, 1120531
UNION ALL SELECT 2001, 'D3 ', 440011111, 1120531
UNION ALL SELECT 2001, 'DGH', 440011111, 1120531
UNION ALL SELECT 2001, '1D ', 440011111, 1130101
UNION ALL SELECT 2001, 'X1 ', 440011111, 1130101
UNION ALL SELECT 2001, 'D3 ', 440011111, 1130101
UNION ALL SELECT 2001, '1D ', 440011111, 1140201
UNION ALL SELECT 2001, 'X1 ', 440011111, 1140201
UNION ALL SELECT 2001, 'D3 ', 440011111, 1140201
UNION ALL SELECT 2001, 'Z9 ', 440011111, 1140201;
DECLARE @Results TABLE (
ID INT NOT NULL,
Rnk INT NOT NULL,
EffectiveYear SMALLINT NOT NULL,
PRIMARY KEY (Rnk, EffectiveYear)
);
INSERT @Results
SELECT x.ID,
DENSE_RANK() OVER(ORDER BY x.[Tax Group], x.[Tax Type], x.[Geocode]) AS Rnk,
x.EffectiveDate / 10000 AS EffectiveYear
FROM @MyTable x;
SELECT
crt.*,
prev.*,
CASE
WHEN crt.ID IS NOT NULL AND prev.ID IS NOT NULL THEN '-' -- No change
WHEN crt.ID IS NULL AND prev.ID IS NOT NULL THEN 'D' -- Deleted
WHEN crt.ID IS NOT NULL AND prev.ID IS NULL THEN 'I' -- Inserted
END AS RowStatus
FROM @Results crt FULL OUTER JOIN @Results prev ON crt.Rnk = prev.Rnk
AND crt.EffectiveYear - 1 = prev.EffectiveYear
ORDER BY ISNULL(crt.EffectiveYear - 1, prev.EffectiveYear), crt.Rnk;
Sample output:
---- ---- ------------- ---- ---- -------------
| Current data | | Previous data |
---- ---- ------------- ---- ---- ------------- ---------
ID Rnk EffectiveYear ID Rnk EffectiveYear RowStatus
---- ---- ------------- ---- ---- ------------- ---------
1 1 112 NULL NULL NULL I -- Current vs. previous: current row hasn't a previous row
3 2 112 NULL NULL NULL I -- the same thing
4 3 112 NULL NULL NULL I -- the same thing
2 4 112 NULL NULL NULL I -- the same thing
NULL NULL NULL 4 3 112 D <-- Deleted: ID 4 = 'DGH'
5 1 113 1 1 112 - -- there is no change
7 2 113 3 2 112 -
6 4 113 2 4 112 -
8 1 114 5 1 113 -
10 2 114 7 2 113 -
9 4 114 6 4 113 -
11 5 114 NULL NULL NULL I <-- Inserted: ID 11 = 'Z9'
NULL NULL NULL 8 1 114 D
NULL NULL NULL 10 2 114 D
NULL NULL NULL 9 4 114 D
NULL NULL NULL 11 5 114 D
Note: I assume that there are no duplicated rows (x.[Tax Group], x.[Tax Type], x.[Geocode]) within a year.