union all in TDengine - tdengine

Recently I tried union all in TDengine and found there was a issue:
taos> select count(*) as count, loc from st where ts between 1600000000000 and 1600000000010 group by loc;
count | loc | loc |
==========================================================================================
10 | nchar0 | nchar0 |
10 | nchar1 | nchar1 |
10 | nchar2 | nchar2 |
10 | nchar3 | nchar3 |
10 | nchar4 | nchar4 |
10 | nchar5 | nchar5 |
Query OK, 6 row(s) in set (0.003831s)
taos> select count(*) as count, loc from st where ts between 1600000000020 and 1600000000030 group by loc;
Query OK, 0 row(s) in set (0.002620s)
taos> select count(*) as count, loc from st where ts between 1600000000000 and 1600000000010 group by loc
-> union all
-> select count(*) as count, loc from st where ts between 1600000000020 and 1600000000030 group by loc;
count | loc | loc |
==========================================================================================
10 | nchar0 | nchar0 |
10 | nchar1 | nchar1 |
10 | nchar2 | nchar2 |
10 | nchar3 | nchar3 |
10 | nchar4 | nchar4 |
10 | nchar5 | nchar5 |
Query OK, 6 row(s) in set (0.004686s)
taos> select count(*) as count, loc from st where ts between 1600000000020 and 1600000000030 group by loc
-> union all
-> select count(*) as count, loc from st where ts between 1600000000000 and 1600000000010 group by loc;
count | loc | loc |
==========================================================================================
Query OK, 0 row(s) in set (0.004371s)
From above queries, why the result for query 3 and query 4 is different? it makes me confused.

Because TDengine's SQL is a SQL-like language, I guess its implementation of SQL is basically the same as SQL. In mySQL, the implementation principle of UNIONALL is that the result table is generated based on the previous table structure instead of the following table. For example, the structure of Table 1 is:
CREATE TABLE `table_1` (
`col1` int(255) DEFAULT NULL,
`col2` int(255) DEFAULT NULL,
`col3` int(255) DEFAULT NULL,
`col4` int(255) DEFAULT NULL
) ENGINE = InnoDB CHARSET = latin1;
The structure of Table 2 is:
CREATE TABLE `table_1` (
`col4` int(255) DEFAULT NULL,
`col3` int(255) DEFAULT NULL,
`col2` int(255) DEFAULT NULL,
`col1` int(255) DEFAULT NULL
) ENGINE = InnoDB CHARSET = latin1;
The results of Table 1 Unionall and Table 2 have the same structure as Table 1, and the results of Table 2 Unionall have the same structure as Table 2.
So for this problem, in the second case, the previous table is empty and TDengine may not recognize its table structure, so the result table structure is also empty, resulting in inconsistent results.

Related

SSIS Lookup Multiple Columns in one table to the same ID column in another

I have the following table:
EventValue | Person1 | Person2 | Person3 | Person4 | Meta1 | Meta2
-------------------------------------------------------------------------------------------
123 | joePerson01 | samRock01 | nancyDrew01 | steveRogers01 | 505 | 606
321 | steveRogers02 | yoMama01 | ruMo01 | lukeJedi01 | 707 | 808
I want to transform the Person columns into IDs for my destination table, so all of the ID's would be coming from the same Person table in my Destination DB:
ID | FirstName | LastName | DatabaseOneID | DatabaseTwoID
----------------------------------------------------------
1 | Joe | Person | joePerson01 | personJoe01
2 | Sam | Rockwell | samRock01 | rockSam01
3 | Nancy | Drew | nancyDrew01 | drewNancy01
4 | Steve | Rogers | steveRogers01 | rogersSteve01
5 | Steve R | Rogers | steveRogers02 | rogersSteve02
6 | Yo | Mama | yoMama01 | mamaYo01
7 | Rufus | Murdock | ruMo01 | moRu01
8 | Luke | Skywalker | lukeJedi01 | jediLuke01
With results like so:
MetaID | EventValue | Person1ID | Person2ID | Person3ID | Person4ID
------------------------------------------------------------------------
1 | 123 | 1 | 2 | 3 | 4
2 | 321 | 5 | 6 | 7 | 8
I currently have a Lookup Transform looking up the first Person column, but couldn't figure out how to convert all 4 Person columns into IDs within the same lookup.
You could do it in one query, or use UNPIVOT, or use a scalar function if you think it'll be more fixable for your implementation. Then, you just create a view of it, in which it'll be an easy access for you.
here is a quick example :
DECLARE
#tb1 TABLE
(
EventValue INT
, Person1 VARCHAR(250)
, Person2 VARCHAR(250)
, Person3 VARCHAR(250)
, Person4 VARCHAR(250)
, Meta1 INT
, Meta2 INT
)
DECLARE
#Person TABLE
(
ID INT
, FirstName VARCHAR(250)
, LastName VARCHAR(250)
, DatabaseOneID VARCHAR(250)
, DatabaseTwoID VARCHAR(250)
)
INSERT INTO #tb1
VALUES
(123,'joePerson01','samRock01','nancyDrew01','steveRogers01',505,606),
(321,'steveRogers02','yoMama01','ruMo01','lukeJedi01',707,808)
INSERT INTO #Person
VALUES
(1,'Joe','Person','joePerson01','personJoe01'),
(2,'Sam','Rockwell','samRock01','rockSam01'),
(3,'Nancy','Drew','nancyDrew01','drewNancy01'),
(4,'Steve','Rogers','steveRogers01','rogersSteve01'),
(5,'SteveR','Rogers','steveRogers02','rogersSteve02'),
(6,'Yo','Mama','yoMama01','mamaYo01'),
(7,'Rufus','Murdock','ruMo01','moRu01'),
(8,'Luke','Skywalker','lukeJedi01','jediLuke01')
SELECT ROW_NUMBER() OVER(ORDER BY EventValue) AS MetaID, *
FROM (
SELECT
t.EventValue
, MAX(CASE WHEN t.Person1 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person1ID
, MAX(CASE WHEN t.Person2 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person2ID
, MAX(CASE WHEN t.Person3 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person3ID
, MAX(CASE WHEN t.Person4 IN(p.DatabaseOneID, p.DatabaseTwoID) THEN p.ID ELSE NULL END) AS Person4ID
FROM #tb1 t
LEFT JOIN #Person p
ON p.DatabaseOneID IN(t.Person1, t.Person2, t.Person3, t.Person4)
OR p.DatabaseTwoID IN(t.Person1, t.Person2, t.Person3, t.Person4)
GROUP BY t.EventValue
) D
I currently have a Lookup Transform looking up the first Person column, but couldn't figure out how to convert all 4 Person columns into IDs within the same lookup.
You cannot do this within the same lookup, you have to add a Lookup Transformation for each Column. In your case you should add 4 Lookup Transformation.
If source database and destination database are on the same server, then you can use a SQL query to achieve that as mentioned in the other answer, but in case that each database is on a separate server you have to go with Lookup transformation or you have to import data into a staging table and perform Join operations using SQL.

Does Oracle's Optimizer Take into Consideration a Subquery's Underlying Columns?

Say you have a query like so:
with subselect as (
select foo_id
from foo
)
select bar_id
from bar
join subselect on foo_id = bar_id
where foo_id = 1000
Imagine you have an index on foo_id. Is Oracle's database smart enough to use the index in the query for the line "where foo_id = 1000"? OR since foo_id is wrapped in a subquery, does Oracle lose the index information related to this column?
Perform a simple test:
create table foo as
select t.object_id as foo_id, t.* from all_objects t;
create table bar as
select t.object_id as bar_id, t.* from all_objects t;
create index foo_id_ix on foo(foo_id);
exec dbms_stats.GATHER_TABLE_STATS(ownname=>user, tabname=>'FOO', method_opt=>'FOR ALL INDEXED COLUMNS' );
explain plan for
with subselect as (
select foo_id
from foo
)
select bar_id
from bar
join subselect on foo_id = bar_id
where foo_id = 1000;
select * from table( DBMS_XPLAN.DISPLAY );
and a result of last query is:
Plan hash value: 445248211
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 366 (1)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN| | 1 | 10 | 366 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | BAR | 1 | 5 | 365 (1)| 00:00:01 |
| 3 | BUFFER SORT | | 1 | 5 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | FOO_ID_IX | 1 | 5 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("BAR_ID"=1000)
4 - access("FOO_ID"=1000)
In the above example Oracle uses |* 4 | INDEX RANGE SCAN using index: FOO_ID_IX for filter 4 - access("FOO_ID"=1000)
So the answer is:
yes, the Oracle's database is smart enough to use the index in the query for the line "where foo_id = 1000"

Getting a lineage of linked rows with details

I'm trying to get a "lineage" or similar, and also information about the first and last links (at least; all would be good), out of a table that has self-referential links between rows that have been "replaced" and rows that have replaced them. The table has a structure along these lines:
CREATE TABLE Thing (
Id INT PRIMARY KEY,
TStamp DATETIME,
Replaces INT NULL,
ReplacedBy INT NULL
);
I'm stuck with this structure. :-) It's sort of doubly-linked (yes, it's a bit silly): Each row has a unique Id, and then a row that has been "replaced" by another will have a non-NULL ReplacedBy giving the Id of the replacement row, and the replacement row will also have a link back to what it replaces in Replaces. So we can use either Replaces or ReplacedBy (or both) if we like.
Here's some sample data:
INSERT INTO Thing
(Id, TStamp, Replaces, ReplacedBy)
VALUES
(1, '2017-01-01', NULL, 11),
(2, '2017-01-02', NULL, 12),
(3, '2017-01-03', NULL, NULL),
(4, '2017-01-04', NULL, NULL),
(11, '2017-01-11', 1, NULL),
(12, '2017-01-12', 2, 22),
(22, '2017-01-22', 12, NULL);
So 1 was replaced by 11, 2 was replaced by 12, and 12 was replaced by 22.
I'd like to get the following information for each chain of links from this table in a reasonable way:
Details of the row that started the chain
Details of the final row in the chain
Details of the links in-between or at least how many links (total) there are in the chain
...filtered by a date range applied to the last row in the chain.
In an ideal universe, I'd get back something like this:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 2017−01−01 |
| 1 | 11 | 11 | 2 | 2017−01−11 |
| 2 | 22 | 2 | 3 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2017−01−22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
So far I have this query, which I could post-process to get the above:
WITH Data AS (
SELECT Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.TStamp, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
)
SELECT *
FROM Data
WHERE ReplacedBy IS NOT NULL OR Depth > 0
ORDER BY
Id, Depth;
That gives me:
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| Id | TStamp | Replaces | ReplacedBy | Depth |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| 1 | 2017−01−01 | NULL | 11 | 0 |
| 2 | 2017−01−02 | NULL | 12 | 0 |
| 11 | 2017−01−11 | 1 | NULL | 1 |
| 12 | 2017−01−12 | 2 | 12 | 0 |
| 12 | 2017−01−12 | 2 | 12 | 1 |
| 22 | 2017−01−13 | 12 | NULL | 1 |
| 22 | 2017−01−13 | 12 | NULL | 2 |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
And I could use something like this to figure out (for instance) the final row of each chain:
WITH Data AS (
SELECT Id, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
),
MaxData AS (
SELECT Data.Id, Data.Depth
FROM Data
JOIN (
SELECT Id, MAX(Depth) AS MaxDepth
FROM Data
GROUP BY Id
) j ON data.Id = j.Id AND Data.Depth = j.MaxDepth
WHERE Depth > 0
)
SELECT *
FROM MaxData
ORDER BY
Id;
...which gives me:
+−−−−+−−−−−−−+
| Id | Depth |
+−−−−+−−−−−−−+
| 11 | 1 |
| 12 | 1 |
| 22 | 2 |
+−−−−+−−−−−−−+
...but I've lost the starting point and the points along the way.
I have the strong feeling I'm missing something really straight-forward — but clever — that would let me get this largely with the query rather than post-processing, some kind of join with a "min" and "max" query (but not like my one above). What would it be?
The table doesn't have any indexes on Replaces or ReplacedBy, but we could add any needed. The table is only lightly used (roughly 300k rows and probably only a couple of hundred updates/inserts a day).
I'm limited to SQL Server 2008 features.
Inspired by Gordon Linoff's answer and HABO's comment which highlighted something Gordon was doing that was critical, I:
Removed the SQL Server 2012+ FIRST_VALUE function, replacing it with a CROSS JOIN on an "overview" query of the data
Included the Links count in the overview query
Removed the reliance on t in Gordon's WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id), which (at last on SQL Server 2008) wasn't bound to anything
Filtered out rows that weren't replaced
Below, I also add the date filtering mentioned in the question
...filtered by a date range applied to the last row in the chain.
...which Gordon didn't cover at all, and changes our approach, but only in terms of the arrow of time.
So, first, without the date criteria, sticking fairly close to Gordon's answer:
WITH Data AS (
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
),
Overview AS (
SELECT FirstId, MAX(Id) AS LastId, COUNT(*) AS Links
FROM Data
GROUP BY
FirstId
)
SELECT d.FirstId, o.LastId, d.Id, o.Links, d.Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT LastId, Links
FROM Overview
WHERE FirstId = d.FirstId
) o
ORDER BY
d.FirstId, d.Depth
;
The critical parts of that are grabbing the seed Id as FirstId here:
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
and then propagating it through the results of the recursive join:
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
Just adding that to my original query gives us most of what I wanted. Then we add a second query to get the LastId for each FirstId (Gordon did it as a FIRST_VALUE over a partition, but I can't do that in SQL Server 2008) and using an overview query also lets me grab the number of links. We cross-apply that on the basis of the FirstId value to get the overall results I wanted.
The query above returns the following for the sample data:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...e.g., exactly what I wanted, plus Depth if I want (so I know what order the intermediary links were in).
If we wanted to include rows that were never replaced, we'd just change
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
to
WHERE Replaces IS NULL
Giving us:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-13 |
| 3 | 3 | 3 | 1 | 0 | 2017-01-03 |
| 4 | 4 | 4 | 1 | 0 | 2017-01-04 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
But we've ignored the date criteria required by the question:
...filtered by a date range applied to the last row in the chain.
To do that without building a massive temporary result set, we have to work backward: Instead of selecting the starting point (the first entry in a chain, Replaces IS NULL), we need to select the ending point (the last entry in a chain, ReplacedBy IS NULL), and then invert our logic working back through the chain. It's largely a matter of:
Swapping FirstId with LastId
Swapping Replaces with ReplacedBy (convenient the table had both!)
Using MIN to get the first ID in the chain rather than MAX to get the last
Using d.Depth - 1 rather than d.Depth + 1
Then fixing-up Depth based on Links once we know it in our final select, to get those nice values where 0 = first link rather than some varying negative number: o.Links + d.Depth - 1 AS Depth
All of which gives us:
WITH Data AS (
SELECT Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE ReplacedBy IS NULL AND Replaces IS NOT NULL
-- Filtering by date of last entry would go here
UNION ALL
SELECT d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
FROM Data d
JOIN Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
SELECT LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
FROM Data
GROUP BY
LastId
)
SELECT o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT FirstId, Links
FROM Overview
WHERE LastId = d.LastId
) o
ORDER BY
o.FirstId, d.Depth
;
So for instance, if we used
AND TStamp BETWEEN '2017-01-12' AND '2017-02-01'
where I have
-- Filtering by date of last entry would go here
above, with our sample data we'd get this result:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 2 | 22 | 2 | 3 | 0 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 1 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2 | 2017−01−13 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...because the last link the Id = 1 chain is outside the date range, so we don't include it.
This is a little tricky. Arrange the CTE to start at the beginning of each list. That makes the subsequent processing easier:
WITH Data AS (
SELECT Id as FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing t
WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id)
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d JOIN
Thing t
ON t.Replaces = d.Id
)
SELECT d.*,
FIRST_VALUE(id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) as LastId
FROM Data d;
Then, you can use FIRST_VALUE() with a reverse sort to get the last value in the chain.
This returns chains that have no links. You can add a filter to remove these.

Updating a set of data from the next matching record

I have a table that contains the status history (including current status) of a number of resources. It looks something like this:
CREATE TABLE [dbo].[RESOURCE_STATUS](
[id] [int] IDENTITY(1,1) NOT NULL,
[resource_id] [int] NOT NULL,
[date_timestamp] [datetime] NOT NULL,
[end_timestamp] [datetime] NULL,
...
)
And an example of the data ( obviously represents a valid timestamp):
+------+-------------+-----------------+---------------+
| id | resource_id | start_timestamp | end_timestamp |
+------+-------------+-----------------+---------------+
| 1 | 1 | <valid_ts> | <valid_ts> |
| 2 | 2 | <valid_ts> | <valid_ts> |
| 3 | 3 | <valid_ts> | NULL |
| 4 | 1 | <valid_ts> | NULL |
| 5 | 2 | <valid_ts> | NULL |
| 6 | 1 | <valid_ts> | <valid_ts> |
| 7 | 2 | <valid_ts> | NULL |
| 8 | 1 | <valid_ts> | NULL |
+------+-------------+-----------------+---------------+
There are, of course, additional columns representing the status etc but I don't think they are relevant at this point.
In theory, the start_timestamp and end_timestamp in each record are supposed to indicate the date and time of the start and end of each status with a NULL end_timestamp indicating that the status is ongoing (in this case rows 3, 7 and 8 indicate ongoing statuses).
The problem I have is that in some cases (rows 4 and 5 in the example) the end_timestamp hasn't been set and to get our reporting system working properly I need to go back and set that timestamp from the start_timestamp in the next record for that resource in the set if it exists. I.E. update row 4 from row 6 and row 5 from row 7. Rows 3, 7 and 8 shouldn't be modified since they represent the current state of the resource.
Note that the row missing the end_timestamp may not be the first row for that resource_id and there may be multiple rows for one or more resources that have incorrectly NULL end_timestamps.
I need to do this both for the existing data and on an ongoing basis when data is added to the table (I know the creator of the data should be fixed, but for various reasons that isn't on the table at this point).
In case it's relevant, we're using MS SQL Server 2008 and the table currently contains just over two million rows and obviously is growing on a daily basis.
Can anyone help me out with this please?
Try this...
WITH AddRowNumber AS
(
SELECT
id
, resource_id
, start_timestamp
, end_timestamp
, ROW_NUMBER() OVER(PARTITION BY resource_id ORDER BY start_timestamp) AS RowNumber
FROM
#Resource_Status
)
, NewTimestamp AS
(
SELECT
A1.id
, A1.resource_id
, A1.start_timestamp
, A1.end_timestamp
, A2.start_timestamp AS NewEndTimeStamp
FROM
AddRowNumber AS A1
INNER JOIN
AddRowNumber AS A2
ON
A1.resource_id = A2.resource_id
AND A1.RowNumber = A2.RowNumber - 1
WHERE
A1.end_timestamp IS NULL
)
UPDATE
#Resource_Status
SET
end_timestamp = Nt.NewEndTimeStamp
FROM
#Resource_Status AS R
INNER JOIN
NewTimestamp AS NT
ON
R.id = NT.id
Let me know if that works.
Ash

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is already ordered and separated for understanding):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
what is ActionCode? we use this in c# and there it represents an enum-value
what do i want to achieve?
well, i need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
well, what is the actual logic?
we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
it is not possible to have following datapool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
i don't even know how/where to start ... how can i achieve this?
we are running on mssql 2005+
EDIT:
there's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)
I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION_FRAGMENT_ID BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note than I'm using all kinds of reserved names here)
The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. the rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
finalranks AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1

Resources