Order By A Value In Another Field - sql-server

I have a job definition table with example data, shown below, that needs to be sorted in such a way that records that have a NextJobDefinitionID > 0 are kept together. The sort order for records where the NextJobDefinitionID = 0 does not matter. In the example the record with the JobName of "M1 P1" must follow "M1 Pre-Roll" and "M1 Pre-Roll" must follow "M1 Recurring Benefits". I am using SQL Server 2014.
Data:
My desired output would be:
M1 Recurring Benefits
M1 Pre-Roll
M1 P1

I believe this constructs the required ordering:
declare #t table (ID int,NextID int)
insert into #t(ID,NextID) values
(1,0),
(2,5),
(3,6),
(4,2),
(5,0),
(6,4)
;With Parents as (
select ID,ID as ParentID, 0 as Depth, NextID
from #t
where ID not in (select NextID from #t)
union all
select p.NextID,p.ParentID,Depth+1,t.NextID
from Parents p
inner join
#t t
on
p.NextID = t.ID
where p.NextID != 0
)
select * from Parents
order by ParentID,Depth
It works by building a CTE by using rows which may be freely ordered as the base case and then following the NextID values along the chain, keeping the original ParentID and increasing a Depth value, to then be able to have a simple ORDER BY at the end.
(Translating back to your original column/table/sample data left as an exercise for the reader, since as I say, I don't need the typing practice to transcribe it from an image)

If I correctly understand, you need something like this:
(select JobDefinitionID, FloatingJobID, JobName, NextJobDefinitionID from JobDefinitions
where NextJobDefinitionID <> 0)
UNION ALL
(select JobDefinitionID, FloatingJobID, JobName, 9223372036854775807 AS NextJobDefinitionID from JobDefinitions WHERE JobDefinitionID = (SELECT MAX(NextJobDefinitionID) FROM JobDefinitions))
order by NextJobDefinitionID

Related

Getting non-deterministic results from WITH RECURSIVE cte

I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of you CTE you select the fourth column fee_charged which does not exist, and then in the recursion you don't sum the fees paid and other stuff, basically you logic seems rather strange.
So creating some random data, that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now altering you CTE just a little bit to concat that strings together..
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
we get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
Which is exactly as I would expect. So how this relates to your "it gets strange when I join to other stuff" is ether, your output of you CTE is not how you expect it to be.. Or your join to other stuff is not working as you expect, Or there is a bug with snowflake.
Which all comes down to, if the CTE results are exactly what you expect, create a table and join that to your other table, so eliminate some form of CTE vs JOIN bug, and to debug why your join is not working.
But if your CTE output is not what you expect, then lets help debug that.

Creating permutation via recursive CTE in SQL server?

Looking at :
;WITH cte AS(
SELECT 1 AS x UNION
SELECT 2 AS x UNION
SELECT 3 AS x
)
I can create permutation table for all 3 values :
SELECT T1.x , y=T2.x , z=t3.x
FROM cte T1
JOIN cte T2
ON T1.x != T2.x
JOIN cte T3
ON T2.x != T3.x AND T1.x != T3.x
This uses the power of SQL's cartesian product plus eliminating equal values.
OK.
But is it possible to enhance this recursive pseudo CTE :
;WITH cte AS(
SELECT 1 AS x , 2 AS y , 3 AS z
UNION ALL
...
)
SELECT * FROM cte
So that it will yield same result as :
NB there are other solutions in SO that uses recursive CTE , but it is not spread to columns , but string representation of the permutations
I tried to do the lot in a CTE.
However trying to "redefine" a rowset dynamically is a little tricky. While the task is relatively easy using dynamic SQL doing it without poses some issues.
While this answer may not be the most efficient or straight forward, or even correct in the sense that it's not all CTE it may give others a basis to work from.
To best understand my approach read the comments, but it might be worthwhile looking at each CTE expression in turn with by altering the bit of code below in the main block, with commenting out the section below it.
SELECT * FROM <CTE NAME>
Good luck.
IF OBJECT_ID('tempdb..#cteSchema') IS NOT NULL
DROP Table #cteSchema
GO
-- BASE CTE
;WITH cte AS( SELECT 1 AS x, 2 AS y, 3 AS z),
-- So we know what columns we have from the CTE we extract it to XML
Xml_Schema AS ( SELECT CONVERT(XML,(SELECT * FROM cte FOR XML PATH(''))) AS MySchema ),
-- Next we need to get a list of the columns from the CTE, by querying the XML, getting the values and assigning a num to the column
MyColumns AS (SELECT D.ROWS.value('fn:local-name(.)','SYSNAME') AS ColumnName,
D.ROWS.value('.','SYSNAME') as Value,
ROW_NUMBER() OVER (ORDER BY D.ROWS.value('fn:local-name(.)','SYSNAME')) AS Num
FROM Xml_Schema
CROSS APPLY Xml_Schema.MySchema.nodes('/*') AS D(ROWS) ),
-- How many columns we have in the CTE, used a coupld of times below
ColumnStats AS (SELECT MAX(NUM) AS ColumnCount FROM MyColumns),
-- create a cartesian product of the column names and values, so now we get each column with it's possible values,
-- so {x=1, x =2, x=3, y=1, y=2, y=3, z=1, z=2, z=3} -- you get the idea.
PossibleValues AS (SELECT MyC.ColumnName, MyC.Num AS ColumnNum, MyColumns.Value, MyColumns.Num,
ROW_NUMBER() OVER (ORDER BY MyC.ColumnName, MyColumns.Value, MyColumns.Num ) AS ID
FROM MyColumns
CROSS APPLY MyColumns MyC
),
-- Now we have the possibly values of each "column" we now have to concat the values together using this recursive CTE.
AllRawXmlRows AS (SELECT CONVERT(VARCHAR(MAX),'<'+ISNULL((SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = 1),'')+'>'+Value) as ConcatedValue, Value,ID, Counterer = 1 FROM PossibleValues
UNION ALL
SELECT CONVERT(VARCHAR(MAX),CONVERT(VARCHAR(MAX), AllRawXmlRows.ConcatedValue)+'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'><'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer+1)+'>'+CONVERT(VARCHAR(MAX),PossibleValues.Value)) AS ConcatedValue, PossibleValues.Value, PossibleValues.ID,
Counterer = Counterer+1
FROM AllRawXmlRows
INNER JOIN PossibleValues ON AllRawXmlRows.ConcatedValue NOT LIKE '%'+PossibleValues.Value+'%' -- I hate this, there has to be a better way of making sure we don't duplicate values....
AND AllRawXmlRows.ID <> PossibleValues.ID
AND Counterer < (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- The above made a list but was missing the final closing XML element. so we add it.
-- we also restict the list to the items that contain all columns, the section above builds it up over many columns
XmlRows AS (SELECT DISTINCT
ConcatedValue +'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'>'
AS ConcatedValue
FROM AllRawXmlRows WHERE Counterer = (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- Wrap the output in row and table tags to create the final XML
FinalXML AS (SELECT (SELECT CONVERT(XML,(SELECT CONVERT(XML,ConcatedValue) FROM XmlRows FOR XML PATH('row'))) FOR XML PATH('table') )as XMLData),
-- Prepare a CTE that represents the structure of the original CTE with
DataTable AS (SELECT cte.*, XmlData
FROM FinalXML, cte)
--SELECT * FROM <CTE NAME>
-- GETS destination columns with XML data.
SELECT *
INTO #cteSchema
FROM DataTable
DECLARE #XML VARCHAR(MAX) ='';
SELECT #Xml = XMLData FROM #cteSchema --Extract XML Data from the
ALTER TABLE #cteSchema DROP Column XMLData -- Removes the superflous column
DECLARE #h INT
EXECUTE sp_xml_preparedocument #h OUTPUT, #XML
SELECT *
FROM OPENXML(#h, '/table/row', 2)
WITH #cteSchema -- just use the #cteSchema to define the structure of the xml that has been constructed
EXECUTE sp_xml_removedocument #h
How about translating 1,2,3 into a column, which will look exactly like the example you started from, and use the same approach ?
;WITH origin (x,y,z) AS (
SELECT 1,2,3
), translated (x) AS (
SELECT col
FROM origin
UNPIVOT ( col FOR cols IN (x,y,z)) AS up
)
SELECT T1.x , y=T2.x , z=t3.x
FROM translated T1
JOIN translated T2
ON T1.x != T2.x
JOIN translated T3
ON T2.x != T3.x AND T1.x != T3.x
ORDER BY 1,2,3
If I understood correctly the request, this might just do the trick.
And to run it on more columns, just need to add them origin cte definition + unpivot column list.
Now, i dont know how you pass your 1 - n values for it to be dynamic, but if you tell me, i could try edit the script to be dynamic too.

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column, I have another table that list all the employees in the company, also with ssn, but including name, and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table from a certain building, then get the most recent name from the second table, plus allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours(DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date to enter these records, the higher number, the more recent the entry. It is common/expected that each employee has several records. But several may not have the most recent(i.e. '8').
My SProc is:
#BuildingNumber CHAR(7), #SortField VARCHAR(25)
BEGIN
DECLARE #returnValue TABLE(ssn CHAR(9), buildingNumber CAHR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO #returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE #SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = #BuildingNumber
SELECT * from #returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to records in the table with duplicates (I think table2), such as the most recent records have a value of 1. Then just select this as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t1.ssn and t2.seqnum = 1
where t1.buildingNumber = #BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
#BuildingNumber int
)
returns table as
return (
select t1.ssn, t1.buildingNum, t2.fname, t2.lname, rowNum
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t1.ssn and t2.seqnum = 1
where t1.buildingNumber = #BuildingNumber;
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t1.ssn
where t1.buildingNumber = #BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t1.ssn
where t1.buildingNumber = #BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
Try using some temp tables instead of the table variables. Not sure what kind of system you are working on, but I have had pretty good luck. Temp tables actually write to the drive so you wont be holding and processing so much in memory. Depending on other system usage this might do the trick.
Simple define the temp table using #Tablename instead of #Tablename. Put the name sorting subquery in a temp table before everything else fires off and make a join to it.
Just make sure to drop the table at the end. It will drop the table at the end of the SP when it disconnects, but it is a good idea to make tell it to drop to be on the safe side.

Wrong order in Table valued Function(keep "order" of a recursive CTE)

a few minutes ago i asked here how to get parent records with a recursive CTE.
This works now, but I get the wrong order(backwards, ordered by the PK idData) when i create a Table valued Function which returns all parents. I cannot order directly because i need the logical order provided by the CTE.
This gives the correct order(from next parent to that parent and so on):
declare #fiData int;
set #fiData=16177344;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = #fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
select iddata from PreviousClaims
But the following function returns all records in backwards order(ordered by PK):
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
#fiData INT
)
RETURNS #retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
AS
BEGIN
DECLARE #idData int;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = #fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
INSERT INTO #retPreviousClaims
SELECT idData FROM PreviousClaims;
RETURN;
END;
select * from dbo._previousClaimsByFiData(16177344);
UPDATE:
Since everybody beliefs that the CTE is not ordering(Any "ordering" will be totally arbitrary and coincidental), i'm wondering why the opposite seems to be true. I have queried a child claim with many parents and the order in the CTE is exactly the logical order when i go from child to parent and so on. This would mean that the CTE is iterating from record to record like a cursor and the following select returns it in exact this order. But when i call the TVF i got the order of the primary key idData instead.
The solution was simple. I only needed to remove the parent key of the return-Table of the TVF. So change...
RETURNS #retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
to...
RETURNS #retPreviousClaims TABLE
(
idData int
)
.. and it keeps the right "order" (same order they were inserted into the CTE's temporary result set).
UPDATE2:
Because Damien mentioned that the "CTE-Order" could change in certain circumstances, i will add a new column relationLevel to the CTE which describes the level of relationship of the parent records (what is by the way quite useful in general f.e. for a ssas cube).
So the final Inline-TVF(which returns all columns) is now:
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
#fiData INT
)
RETURNS TABLE AS
RETURN(
WITH PreviousClaims
AS(
SELECT 1 AS relationLevel, child.*
FROM tabData child
WHERE child.idData = #fiData
UNION ALL
SELECT relationLevel+1, child.*
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
SELECT TOP 100 PERCENT * FROM PreviousClaims order by relationLevel
)
This is an exemplary relationship:
select idData,fiData,relationLevel from dbo._previousClaimsByFiData(46600314);
Thank you.
The correct way to do your ORDERing is to add an ORDER BY clause to your outermost select. Anything else is relying on implementation details that may change at any time (including if the size of your database/tables goes up, which may allow more parallel processing to occur).
If you need something convenient to allow the ordering to take place, look at Example D in the examples from the MSDN page on WITH:
WITH DirectReports(ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
SELECT ManagerID, EmployeeID, Title, 0 AS EmployeeLevel
FROM dbo.MyEmployees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ManagerID, e.EmployeeID, e.Title, EmployeeLevel + 1
FROM dbo.MyEmployees AS e
INNER JOIN DirectReports AS d
ON e.ManagerID = d.EmployeeID
)
Add something similay to the EmployeeLevel column to your CTE, and everything should work.
I think the impression that the CTE is creating an ordering is wrong. It's a coincidence that the rows are coming out in order (possibly due to how they were originally inserted into tabData). Regardless, the TVF is returning a table so you have to explicitly add an ORDER BY to the SELECT you're using to call it if you want to guarantee ordering:
select * from dbo._previousClaimsByFiData(16177344) order by idData
There is no ORDER BY anywhere in sight - neither in the table-valued function, nor in the SELECT from that TVF.
Any "ordering" will be totally arbitrary and coincidental.
If you want a specific order, you need to specify an ORDER BY.
So why can't you just add an ORDER BY to your SELECT:
SELECT * FROM dbo._previousClaimsByFiData(16177344)
ORDER BY (whatever you want to order by)....
or put your ORDER BY into the TVF:
INSERT INTO #retPreviousClaims
SELECT idData FROM PreviousClaims
ORDER BY idData DESC (or whatever it is you want to order by...)

SQL Server Comparing Subsequent Rows for Duplicates

I am trying to write a SQL Server query but have had no luck and was wondering if anyone may have any ideas on how to achieve my query.
What i'm trying to do:
I have a table with several columns naming the ones that i am dealing with TaskID, StatusCode, Timestamp. Now this table just holds tasks for one of our systems that run throughout the day and when something runs it gets a timestamp and the statuscode depending on the status for that task.
Sometimes what happens is the task table will be updated with a new timestamp but the statusCode will not have changed since the last update of the task so for two or more consecutive rows of a given task the statusCode can be the same. When i say consecutive rows i mean with regards to timestamp.
So example task 88 could have twenty rows at statusCode 2 after which the status code changes to something else.
Now what i am trying to do with no luck at the moment is to retrieve a list from this table of all the tasks and the statuscodes and the timestamps but in the case where i have more than one consecutive row for a task with the same statuscode i just want to take the first row with the lowest timestamp and ignore the rest of the row until the statuscode for that task changes.
To make it simpler in this case you can assume that i have a taskid which i am filtering on so i am just looking at a single task.
Does anyone have any ideas as to how i can do this or perhaps something that i coudl probably read to help me?
Thanks
Irfan.
This are a couple ways of getting what you want:
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp < T1.status_timestamp
LEFT OUTER JOIN My_Table T3 ON
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp AND
T3.status_timestamp > T2.status_timestamp
WHERE
T3.task_id IS NULL AND
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
or
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp = (
SELECT
MAX(status_timestamp)
FROM
My_Table T3
WHERE
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp)
WHERE
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
Both methods rely on there being no exact matches of the status_timestamp values (two rows can't have the same exact status_timestamp for a given task_id.)
Something like
select TaskID,StatusCode,Min(TimeStamp)
from table
group by TaskID,StatusCode
order by 1,2
Note that is statuscode can duplicate, you will need an additional field, but hopefully this can point you in the right direction...
Something like the following should get you in the right direction....
CREATE TABLE #T
(
TaskId INT
,StatusCode INT
,StatusTimeStamp DATETIME
)
INSERT INTO #T
SELECT 1, 1, '2009-12-01 14:20'
UNION SELECT 1, 2, '2009-12-01 16:20'
UNION SELECT 1, 2, '2009-12-02 09:15'
UNION SELECT 1, 2, '2009-12-02 12:15'
UNION SELECT 1, 3, '2009-12-02 18:15'
;WITH CTE AS
(
SELECT TaskId
,StatusCode
,StatusTimeStamp
,ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode ORDER BY TaskId, StatusTimeStamp DESC) AS RNUM
FROM #T
)
SELECT TaskId
,StatusCode
,StatusTimeStamp
FROM CTE
WHERE RNUM = 1
DROP TABLE #T

Resources