How does this recursion repeat itself? - database

I have a question about some code.
I have a relation that is called comedians. It has the attribute comedian and preceding comedian. So the first comedian say Bob, has null in his field for preceding comedian. My question is, how does this code repeat until all child instances are found? I just can not wrap my head around it.
I know that the first part: the one part before UNION ALL selects all parent elements, so all comedians that have no comedians that performed before them (preceding comedian), but how can all the other comedians, under the parent be chosen? What makes it recursive?
with recursive tree as (
select company, comedian, preceding_comedian, 1 as level
from the_table
where preceding_comedian is null
union all
select ch.company, ch.comedian, ch.preceding_comedian, p.level + 1
from the_table ch
join tree p on ch.preceding_comedian = p.comedian
)

First, the non-recursive part of the query is performed:
select company, comedian, preceding_comedian, 1 as level
from the_table
where preceding_comedian is null
and the result is put in a “work table”.
Then the recursive part of the query is performed, where the work table is substituted for the recursive CTE:
select ch.company, ch.comedian, ch.preceding_comedian, p.level + 1
from the_table ch
join <work-table> p on ch.preceding_comedian = p.comedian
The result is added to the work table (if UNION is used instead of UNION ALL, duplicates are removed in the result).
The second step is repeated, each time joining only against the rows that the previous pass added, until a pass adds no new rows.
The resulting work table is the result of the CTE.
So it is actually not so much a recursive, but an “iterative” CTE.
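The loop described above can be sketched outside the database. This is a minimal illustration, not the engine's actual implementation: it assumes each pass joins only against the rows produced by the previous pass, uses made-up sample rows, and runs the same query through SQLite (via Python's sqlite3 module) for comparison. The function names are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (company, comedian, preceding_comedian)
ROWS = [
    ("Acme", "Bob", None),
    ("Acme", "Carol", "Bob"),
    ("Acme", "Dave", "Carol"),
    ("Acme", "Erin", "Bob"),
]


def iterate_by_hand(rows):
    """Emulate the evaluation loop: anchor first, then repeat the recursive step."""
    # Step 1: the non-recursive part seeds both the result and the work table.
    work = [(c, name, prev, 1) for c, name, prev in rows if prev is None]
    result = list(work)
    # Step 2: join the_table against the rows of the previous pass only,
    # and stop as soon as a pass produces nothing new.
    while work:
        work = [
            (c, name, prev, level + 1)
            for c, name, prev in rows
            for _, parent, _, level in work
            if prev == parent
        ]
        result.extend(work)  # UNION ALL: append without removing duplicates
    return result


def run_cte(rows):
    """Run the recursive CTE from the question against an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("create table the_table (company, comedian, preceding_comedian)")
    con.executemany("insert into the_table values (?, ?, ?)", rows)
    return con.execute("""
        with recursive tree as (
            select company, comedian, preceding_comedian, 1 as level
            from the_table
            where preceding_comedian is null
            union all
            select ch.company, ch.comedian, ch.preceding_comedian, p.level + 1
            from the_table ch
            join tree p on ch.preceding_comedian = p.comedian
        )
        select * from tree
    """).fetchall()


by_hand = iterate_by_hand(ROWS)
from_cte = run_cte(ROWS)
print(sorted(by_hand, key=lambda r: r[1]) == sorted(from_cte, key=lambda r: r[1]))
```

Both versions should assign level 1 to Bob, level 2 to Carol and Erin, and level 3 to Dave.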

Related

SQL - Attain Previous Transaction Information [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the latest selected row. Of course, with a select like that I will end up with n-1 rows selected in a table with n rows. That's not a problem; actually, it is exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
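A small runnable illustration of the LAG approach, using SQLite through Python's sqlite3 module rather than SQL Server. It assumes a SQLite build with window functions (3.25 or later); the table name `readings` and its rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table readings (id integer primary key, value integer)")
# Ids 3 and 4 are deliberately missing to mimic a sequence with gaps.
con.executemany(
    "insert into readings values (?, ?)",
    [(1, 10), (2, 15), (5, 12), (6, 20)],
)

# lag(value) looks at the previous row in id order, regardless of gaps.
diffs = con.execute("""
    select id, value - lag(value) over (order by id) as diff
    from readings
    order by id
""").fetchall()

print(diffs)  # [(1, None), (2, 5), (5, -3), (6, 8)]
```

The first row has no predecessor, so its difference is NULL; filter it out with an outer query if only n-1 rows are wanted.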
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t2.primaryKey = t1.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t2.Rank = t1.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1 DESC, pk DESC
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
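The correlated-subquery idea can be tried in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite has no TOP, so LIMIT 1 stands in for it, and the sample rows are invented. The subquery sorts in descending order so that the single row it keeps is the nearest earlier one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table mytable (pk integer primary key, col1 integer, value integer)"
)
# col1 has a tie (100 twice) and pk has gaps, to exercise the tie-break on pk.
con.executemany(
    "insert into mytable values (?, ?, ?)",
    [(1, 100, 7), (2, 100, 9), (4, 200, 4), (7, 300, 10)],
)

rows = con.execute("""
    select m1.pk,
           m1.value - (
               select m2.value
               from mytable m2
               where m2.col1 < m1.col1
                  or (m2.col1 = m1.col1 and m2.pk < m1.pk)
               order by m2.col1 desc, m2.pk desc  -- nearest earlier row first
               limit 1
           ) as diff
    from mytable m1
    order by m1.col1, m1.pk
""").fetchall()

print(rows)  # [(1, None), (2, 2), (4, -5), (7, 6)]
```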
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the case where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare @temp table (value int, primaryKey int, tempid int identity)
insert into @temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from @temp t1
join @temp t2
on t2.tempid = t1.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter  previous  difference
1        0         1
2        1         1
3        2         1
4        3         1
5        4         1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following function to get the current row value and the previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1 preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer.

Using prior line record in next line record SQL

The data set above is what I have created.
What I'd like to do is loop the last column (New_UPB) and have that be the first column in the next line of records and have the data set continue until the UPB reaches 0.
So that the outcome is this:
I have all of the fields already in my database as a temp table, I just need to figure out how to loop that until the installments complete but not sure how to work that.
This is what my query looks like so far:
SELECT
AMS.Loan,
AMS.Payment#,
AMS.Due_Date,
AMS.UPB,
AMS.Int_Rate,
AMS.Total_PI,
AMS.Monthly_Int_Amt,
AMS.Monthly_Prin_Amt,
AMS.New_UPB
FROM #AmSchedule AMS
WHERE 1=1
Since you are using SQL Server, you can use a Recursive Common Table Expression.
A Recursive CTE is composed of two complementary queries unioned together. The first query is the anchor, which sets up the initial conditions for the recursion or looping, while the second query does the recursion by doing a self-referential select. That is, it references the Recursive CTE in its FROM clause:
-- vvvvvvvvvvvv this is the Recursive CTEs name
with RecursiveCTE(Loan, Payment#, Due_Date, UPB, Int_Rate, Total_PI,
Monthly_Int_Amt, Monthly_Prin_Amt, New_UPB)
as (
-- The Anchor Query
SELECT AMS.Loan,
AMS.Payment#,
AMS.Due_Date,
AMS.UPB,
AMS.Int_Rate,
AMS.Total_PI,
AMS.Monthly_Int_Amt,
AMS.Monthly_Prin_Amt,
AMS.New_UPB
FROM #AmSchedule AMS
UNION ALL
-- The Recursive Part
SELECT Prior.Loan,
Prior.Payment# + 1, -- Increment Pay#
dateadd(mm, 1, Prior.Due_Date), -- Increment Due Date
Prior.new_UPB, -- <-- New_UPB from last iteration
Prior.Int_Rate,
Prior.Total_PI,
Prior.Monthly_Int_Amt, -- <-- Put your
Prior.Monthly_Prin_Amt, -- <-- calculations
Prior.New_UPB -- <-- here
FROM RecursiveCTE Prior
-- ^^^^^^^^^^^^ this is what makes it recursive
WHERE Prior.New_UPB > 0 -- stop recursing once the balance is paid off
)
-- Output the results
select * from RecursiveCTE
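To make the "put your calculations here" part concrete, here is a minimal amortization sketch in SQLite via Python's sqlite3 module. Everything in it is an assumption for illustration: the `loans` table, its column names, the sample figures, and the simple formula (new balance = balance + one month of interest - payment). The WHERE clause in the recursive part is what ends the loop once the balance would be paid off.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table loans (loan_id integer, upb real, int_rate real, total_pi real)"
)
# Hypothetical loan: 1000 balance, 12% annual rate, 300 paid per month.
con.execute("insert into loans values (1, 1000.0, 0.12, 300.0)")

schedule = con.execute("""
    with recursive schedule(loan_id, payment_no, upb) as (
        -- Anchor: the first payment starts from the original balance.
        select loan_id, 1, upb from loans
        union all
        -- Recursive part: the next opening balance is the prior balance
        -- plus one month of interest minus the payment.
        select s.loan_id,
               s.payment_no + 1,
               round(s.upb + s.upb * l.int_rate / 12 - l.total_pi, 2)
        from schedule s
        join loans l on l.loan_id = s.loan_id
        -- Stop once the next balance would be zero or less.
        where round(s.upb + s.upb * l.int_rate / 12 - l.total_pi, 2) > 0
    )
    select payment_no, upb from schedule order by payment_no
""").fetchall()

for payment_no, upb in schedule:
    print(payment_no, upb)
```

With these sample figures the schedule has four rows, with opening balances of roughly 1000, 710, 417.10 and 121.27. In SQL Server the same shape applies, but the default recursion limit of 100 levels (the MAXRECURSION option) is worth keeping in mind for long schedules.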

T-SQL Selecting TOP 1 In A Query With Aggregates/Groups

I'm still fairly new to SQL. This is a stripped-down version of the query I'm trying to run. This query is supposed to find those customers with more than 3 cases and display either the top 1 case or all cases, but still show the overall count of cases per customer in each row in addition to all the case numbers.
The TOP 1 subquery approach didn't work but is there another way to get the results I need? Hope that makes sense.
Here's the code:
SELECT t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
,COUNT(t1.CaseNo) AS CasesCount
,(SELECT TOP 1 t1.CaseNo)
FROM MainDatabase t1
INNER JOIN CustomerDatabase t2
ON t1.StoreID = t2.StoreID
WHERE t1.SubmittedDate >= '01/01/2017' AND t1.SubmittedDate <= '05/31/2017'
GROUP BY t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
HAVING COUNT (t1.CaseNo) >= 3
ORDER BY t1.StoreID, t1.PatronID
I would like it to look something like this, either one row with just the most recent case and detail or several rows showing all details of each case in addition to the store id, customer id, last name, first name, and case count.
Data Example
For these I usually like to make a temp table of aggregates:
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
CustomerID int NOT NULL DEFAULT 0,
case_count int NOT NULL DEFAULT 0,
case_max int NOT NULL DEFAULT 0
);
INSERT INTO #tmp
(CustomerID, case_count, case_max)
SELECT t1.CustomerID, COUNT(t1.CaseNo), MAX(t1.CaseNo)
FROM MainDatabase t1
GROUP BY t1.CustomerID;
Then you can join this "tmp" table back to any other table you want to display the number of cases on, or the max case number on. And you can limit it to customers that have more than 3 cases with WHERE case_count > 3
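A runnable sketch of the aggregate-then-join-back pattern, using SQLite through Python's sqlite3 module instead of SQL Server. The sample rows are invented, and the temp table is created with CREATE TEMP TABLE ... AS rather than a separate CREATE and INSERT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table MainDatabase (CustomerID integer, CaseNo integer)")
# Customer 1 has four cases, customer 2 only two.
con.executemany(
    "insert into MainDatabase values (?, ?)",
    [(1, 101), (1, 102), (1, 103), (1, 104), (2, 201), (2, 202)],
)

# One row per customer with the case count and the highest case number.
con.execute("""
    create temp table tmp as
    select CustomerID,
           count(CaseNo) as case_count,
           max(CaseNo)   as case_max
    from MainDatabase
    group by CustomerID
""")

# Join the aggregates back so every case row also carries the totals.
rows = con.execute("""
    select m.CustomerID, m.CaseNo, t.case_count, t.case_max
    from MainDatabase m
    join tmp t on t.CustomerID = m.CustomerID
    where t.case_count > 3
    order by m.CaseNo
""").fetchall()

print(rows)
# [(1, 101, 4, 104), (1, 102, 4, 104), (1, 103, 4, 104), (1, 104, 4, 104)]
```

To show only the most recent case per customer, filter on `m.CaseNo = t.case_max` as well, assuming a higher case number means a newer case.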

How does a recursive CTE eliminate duplicates?

I'm learning recursive CTEs in the AdventureWorks2012 database using SQL Server 2014 Express. I think I'm mostly getting the below example (taken from Beginning T-SQL 3rd Edition), but I don't quite understand why the recursive CTE doesn't produce duplicates.
Below is the recursive CTE that I'm trying to understand, it's a standard employee - manager hierarchy.
;with orgchart (employeeid, managerid, title, level, node) as (
--Anchor
select employeeid
, managerid
, title
, 0
, convert(varchar(30),'/') 'node'
from employee
where managerid is null
union all
--Recursive
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join orgchart oc on oc.employeeid = emp.managerid
)
select employeeid
, managerid
, space(level * 3) + title 'title'
, level
, node
from orgchart
order by node;
It works fine, but the question comes when I try to understand what's going on by recreating it via temp tables. I create a series of temp tables to plug one output into the next query's input and recreate what the recursive CTE does.
--Anchor (Level 0)
select employeeid
, managerid
, title
, 0 'level'
, convert(varchar(30),'/') 'node'
into #orgchart
from employee
where managerid is null
Then I use that temp table to recreate the first level of recursion, at this point it's just the recursive CTE but with temp tables.
--Anchor + 1 level
select *
into #orgchart2
from #orgchart
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart oc on oc.employeeid = emp.managerid
So far so good, the results make sense. Then I do it one more time, but here's where it starts to break down:
--Anchor + 2 levels
select *
into #orgchart3
from #orgchart2
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart2 oc on oc.employeeid = emp.managerid
The output from this begins to return duplicate rows (all fields duplicate) of the level 1 employees. This makes sense - the second query after the UNION ALL will return the previous levels as well as the new level of recursion, and UNION ALL doesn't deduplicate. If I do another round of recursion, the level 2 employees are also duplicated, and so on.
I understand that I can change UNION ALL to UNION in order to remove duplicates, but I'm trying to understand why the recursive CTE doesn't produce duplicates as well? It uses UNION ALL so I don't understand where the deduplication comes in. Is removal of duplicates an intrinsic part of a recursive CTE?
I'm trying not to post all the result sets, but if they're needed to understand the problem then let me know and I will post them. Thanks in advance.
The difference is that when you populate your #orgchart2, you are including all the rows from #orgchart. So now when you create #orgchart3 (which represents a 3rd level of recursion), you are joining on the rows from #orgchart as well as #orgchart2.
So when you create the third level in #orgchart3, it is related to rows in both #orgchart and #orgchart2, when it should only be related to #orgchart2. Instead your third level includes rows that are one level beyond the 2nd level, but also one level beyond the anchor level, so you are duplicating rows, since you already have rows in the second level that are one level beyond the anchor level.
A recursive CTE does not do that. Each level of recursion only looks at the rows produced by the previous level and ignores all the ones that came before it, so no duplicates are created.
You would simulate what the recursive CTE does if you left out the top half of the UNION ALL when you populated #orgchart2 and #orgchart3, and then finally produced a single UNION ALL of all three temp tables.
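The corrected simulation can be sketched in SQLite through Python's sqlite3 module. The employee rows are invented, and the table names `level0`, `level1`, ... are hypothetical stand-ins for #orgchart, #orgchart2, and so on. Each temp table holds only one level, built from the level before it, and the levels are combined at the end.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table employee (employeeid integer, managerid integer, title text)"
)
con.executemany("insert into employee values (?, ?, ?)", [
    (1, None, "CEO"), (2, 1, "VP Sales"), (3, 1, "VP Eng"),
    (4, 2, "Sales Rep"), (5, 4, "Intern"),
])

# Level 0: the anchor rows only.
con.execute("""
    create temp table level0 as
    select employeeid, managerid, 0 as level
    from employee
    where managerid is null
""")

by_levels = []
n = 0
while True:
    rows = con.execute(f"select employeeid, level from level{n}").fetchall()
    if not rows:
        break  # an empty level means the recursion is finished
    by_levels.extend(rows)  # the final UNION ALL of all levels
    # The next level joins against the previous level only, not everything so far.
    con.execute(f"""
        create temp table level{n + 1} as
        select emp.employeeid, emp.managerid, oc.level + 1 as level
        from employee emp
        join level{n} oc on oc.employeeid = emp.managerid
    """)
    n += 1

from_cte = con.execute("""
    with recursive orgchart(employeeid, managerid, level) as (
        select employeeid, managerid, 0 from employee where managerid is null
        union all
        select emp.employeeid, emp.managerid, oc.level + 1
        from employee emp
        join orgchart oc on oc.employeeid = emp.managerid
    )
    select employeeid, level from orgchart
""").fetchall()

print(sorted(by_levels) == sorted(from_cte))
```

Each employee appears exactly once in both results, even though only UNION ALL semantics are used.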

Wrong order in Table valued Function(keep "order" of a recursive CTE)

a few minutes ago i asked here how to get parent records with a recursive CTE.
This works now, but I get the wrong order(backwards, ordered by the PK idData) when i create a Table valued Function which returns all parents. I cannot order directly because i need the logical order provided by the CTE.
This gives the correct order(from next parent to that parent and so on):
declare @fiData int;
set @fiData=16177344;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = @fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
select iddata from PreviousClaims
But the following function returns all records in backwards order(ordered by PK):
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
@fiData INT
)
RETURNS @retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
AS
BEGIN
DECLARE @idData int;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = @fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
INSERT INTO @retPreviousClaims
SELECT idData FROM PreviousClaims;
RETURN;
END;
select * from dbo._previousClaimsByFiData(16177344);
UPDATE:
Since everybody believes that the CTE is not ordering (any "ordering" will be totally arbitrary and coincidental), I'm wondering why the opposite seems to be true. I have queried a child claim with many parents and the order in the CTE is exactly the logical order when I go from child to parent and so on. This would mean that the CTE is iterating from record to record like a cursor and the following select returns it in exactly this order. But when I call the TVF I get the order of the primary key idData instead.
The solution was simple. I only needed to remove the primary key of the return table of the TVF. So change...
RETURNS @retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
to...
RETURNS @retPreviousClaims TABLE
(
idData int
)
.. and it keeps the right "order" (same order they were inserted into the CTE's temporary result set).
UPDATE2:
Because Damien mentioned that the "CTE-Order" could change in certain circumstances, I will add a new column relationLevel to the CTE which describes the level of relationship of the parent records (which is, by the way, quite useful in general, e.g. for an SSAS cube).
So the final Inline-TVF(which returns all columns) is now:
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
@fiData INT
)
RETURNS TABLE AS
RETURN(
WITH PreviousClaims
AS(
SELECT 1 AS relationLevel, child.*
FROM tabData child
WHERE child.idData = @fiData
UNION ALL
SELECT relationLevel+1, child.*
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
SELECT TOP 100 PERCENT * FROM PreviousClaims order by relationLevel
)
This is an exemplary relationship:
select idData,fiData,relationLevel from dbo._previousClaimsByFiData(46600314);
Thank you.
The correct way to do your ORDERing is to add an ORDER BY clause to your outermost select. Anything else is relying on implementation details that may change at any time (including if the size of your database/tables goes up, which may allow more parallel processing to occur).
If you need something convenient to allow the ordering to take place, look at Example D in the examples from the MSDN page on WITH:
WITH DirectReports(ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
SELECT ManagerID, EmployeeID, Title, 0 AS EmployeeLevel
FROM dbo.MyEmployees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ManagerID, e.EmployeeID, e.Title, EmployeeLevel + 1
FROM dbo.MyEmployees AS e
INNER JOIN DirectReports AS d
ON e.ManagerID = d.EmployeeID
)
Add something similar to the EmployeeLevel column to your CTE, and everything should work.
I think the impression that the CTE is creating an ordering is wrong. It's a coincidence that the rows are coming out in order (possibly due to how they were originally inserted into tabData). Regardless, the TVF is returning a table so you have to explicitly add an ORDER BY to the SELECT you're using to call it if you want to guarantee ordering:
select * from dbo._previousClaimsByFiData(16177344) order by idData
There is no ORDER BY anywhere in sight - neither in the table-valued function, nor in the SELECT from that TVF.
Any "ordering" will be totally arbitrary and coincidental.
If you want a specific order, you need to specify an ORDER BY.
So why can't you just add an ORDER BY to your SELECT:
SELECT * FROM dbo._previousClaimsByFiData(16177344)
ORDER BY (whatever you want to order by)....
or put your ORDER BY into the TVF:
INSERT INTO @retPreviousClaims
SELECT idData FROM PreviousClaims
ORDER BY idData DESC (or whatever it is you want to order by...)
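A small sketch of the explicit-ordering advice, run in SQLite through Python's sqlite3 module rather than SQL Server. The tabData rows are invented so that key order differs from chain order, and the CTE carries a level column (named relationLevel here, as in the question's second update) that the outermost SELECT sorts on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tabData (idData integer primary key, fiData integer)")
# fiData points at the parent claim; the ids are chosen so that
# primary-key order is not the same as child-to-parent order.
con.executemany(
    "insert into tabData values (?, ?)",
    [(5, None), (40, 5), (12, 40), (30, 12)],
)

chain = con.execute("""
    with recursive PreviousClaims(idData, fiData, relationLevel) as (
        select idData, fiData, 1
        from tabData
        where idData = ?
        union all
        select child.idData, child.fiData, parent.relationLevel + 1
        from tabData child
        join PreviousClaims parent on parent.fiData = child.idData
    )
    select idData, relationLevel
    from PreviousClaims
    order by relationLevel  -- the only thing that guarantees the order
""", (30,)).fetchall()

print(chain)  # [(30, 1), (12, 2), (40, 3), (5, 4)]
```

Because the ORDER BY sits on the outermost SELECT, the result no longer depends on how the engine happens to produce or store the intermediate rows.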
