Recursive SQL Query with Defined Start Point - sql-server

I'm trying to get a recursive query to work with a defined starting point.
Here's some sample data from my table called Part_v_Container:
Serial_No From_Container Part_Key Part_Operation_Key
1234 1233 5678 5
1233 1232 5678 4
1231 1230 5678 3
1230 NULL 5678 2
Basically we ship a serial number and then I need to trace it back following all the previous serial numbers From_Container through the process. In my real query I'll have the starting Serial_No be a user defined variable.
Here's my attempt:
;WITH Recursive_cte AS
(SELECT
PContainer.Serial_No
,PContainer.From_Container
,PContainer.Part_Key
,PContainer.Part_Operation_Key
FROM Part_v_Container AS PContainer
WHERE Serial_No LIKE '1234' -- Will be user defined variable
UNION ALL
SELECT
PContainer.Serial_No
,PContainer.From_Container
,PContainer.Part_Key
,PContainer.Part_Operation_Key
FROM Recursive_cte
INNER JOIN
Part_v_Container AS PContainer
ON PContainer.From_Container = Recursive_cte.Serial_No
)
SELECT *
FROM Recursive_cte
Yet this only returns one row, the Serial_No = 1234 row. My real data set has thousands of serial numbers and I need to be able to pick which ones I pick from to trace back, not a broad query that is recursive for every one in my table.
I've tried reading several articles and examples to get this to work, including the one here with no success.
Thank you in advance for your help.

You had the join fields inverted.
SQL Demo
;WITH Recursive_cte AS (
SELECT
PContainer.Serial_No
, PContainer.From_Container
, PContainer.Part_Key
, PContainer.Part_Operation_Key
FROM Part_v_Container AS PContainer
WHERE Serial_No = 1234 -- Will be user defined variable
UNION ALL
SELECT
PContainer.Serial_No
, PContainer.From_Container
, PContainer.Part_Key
, PContainer.Part_Operation_Key
FROM Recursive_cte
INNER JOIN Part_v_Container AS PContainer
ON Recursive_cte.From_Container = PContainer.Serial_No
)
SELECT *
FROM Recursive_cte
OUTPUT
| Serial_No | From_Container | Part_Key | Part_Operation_Key |
|-----------|----------------|----------|--------------------|
| 1234 | 1233 | 5678 | 5 |
| 1233 | 1232 | 5678 | 4 |

Related

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this
ID | City | Sales
1 | London,New York,Paris,Berlin,Madrid| 20,30,,50
2 | Istanbul,Tokyo,Brussels | 4,5,6
There can be an unlimited amount of cities and/or sales.
I need to get each city and their salesamount their own record. So my result should look something like this:
ID | City | Sales
1 | London | 20
1 | New York | 30
1 | Paris |
1 | Berlin | 50
1 | Madrid |
2 | Istanbul | 4
2 | Tokyo | 5
2 | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APLLY STRING_SPLIT(Table.City,',') splitC
CROSS APLLY STRING_SPLIT(Table.Sales,',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not neccesary, it's just how I started on it.
STRING_SPLIT() is not an option, because (as is mentioned in the documantation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try with a JSON-based approach, using OPENJSON() and string transformation (comma-separated values are transformed into a valid JSON array - London,New York,Paris,Berlin,Madrid into ["London","New York","Paris","Berlin","Madrid"]). The result from the OPENJSON() with default schema is a table with columns key, value and type and the key column is the 0-based index of each item in this array:
Table:
CREATE TABLE Data (
ID int,
City varchar(1000),
Sales varchar(1000)
)
INSERT INTO Data
(ID, City, Sales)
VALUES
(1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
(2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
SELECT c.[value] AS City, s.[value] AS Sales
FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
ON c.[key] = s.[key]
) a
Result:
ID City Sales
1 London 20
1 New York 30
1 Paris
1 Berlin 50
1 Madrid NULL
2 Istanbul 4
2 Tokyo 5
2 Brussels 6
STRING_SPLIT has no context of what oridinal positions are. In fact, the documentation specifically states that it doesn't care about it:
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
SELECT ID,
DSc.Item,
DSc.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc)
Sales AS(
SELECT ID,
DSs.Item,
DSs.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
C.Item AS City,
S.Item AS Sale
FROM Cities C
FULL OUTER JOIN Sales S ON C.ItemNumber = S.ItemNumber;
Of course, however, the real solution is fix your design. This type of design is going to only cause you 100's of problems in the future. Fix it now, not later; you'll reap so many rewards sooner the earlier you do it.

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY because the revised date could be greater than or less than than the original date so use of aggregate functions MAX or MIN don't work. I thought about using CASE somehow, but also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and display one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!
you can use row number for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

SQL Server query involving subqueries - performance issues

I have three tables:
Table 1: | dbo.pc_a21a22 |
batchNbr Other columns...
-------- ----------------
12345
12346
12347
Table 2: | dbo.outcome |
passageId record
---------- ---------
00003 200
00003 9
00004 7
Table 3: | dbo.passage |
passageId passageTime batchNbr
---------- ------------- ---------
00001 2015.01.01 12345
00002 2016.01.01 12345
00003 2017.01.01 12345
00004 2018.01.01 12346
What I want to do: for each batchNbr in Table 1 get first its latest passageTime and the corresponding passageID from Table 3. With that passageID, get the relevant rows in Table 2 and establish whether any of these rows contains the record 200. Per passageId there are at most 2 records in Table 2
What is the most efficient way to do this?
I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.
My current solution (slow):
SELECT TOP 50000
a.batchNbr,
CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults
)
) then 1 else 0 end as bit) as KGT_IO_END
FROM dbo.pc_a21a22 a
The desired output is:
batchNbr 200present
--------- ----------
12345 1
12346 0
I suggest you use table joining rather than subqueries.
select
a.*, b.*
from
dbo.table1 a
join
dbo.table2 b on a.id = b.id
where
/*your where clause for filtering*/
EDIT:
You could use this as a reference Join vs. sub-query
Try this
SELECT TOP 50000 a.*, (CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS
KGT_IO_END
FROM dbo.Test1 AS a
LEFT OUTER JOIN
(SELECT record, p.batchNbr
FROM dbo.Test2 AS o
LEFT OUTER JOIN (SELECT MAX(passageId) AS passageId, batchNbr FROM
dbo.Test3 GROUP BY batchNbr) AS p ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr;
The MAX subquery is to get the latest passageId by batchNbr.
However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.
I used LEFT OUTER JOIN since the passageId from Table2 no longer match any of the latest passageId from Table3. The resulting subquery would have no records to join to Table1. Therefore INNER JOIN would not show any records from your example data.
Output from your example data:
batchNbr KGT_IO_END
12345 0
12346 0
12347 0
Output if we change the passageId of record 200 to 00003 (the latest for 12345)
batchNbr KGT_IO_END
12345 1
12346 0
12347 0

Finding the best match with "fuzzy" ranking logic

I need help with grouping results of the below temp table using a 'rank' column.
The temp table (MS SQL) is as follows:
student_address | school_address | student_st| school_st| district | districtID | rank
---------------------------------------------------------------------------------------
123 some street | 12 apple way | CT | CT | 322 | 322 | 0.2
123 some street | 33 pear street| CT | NJ | 039 | 039 | 0.1
333 another st. | NULL | VT | NULL | 111 | 111 | 0.0
I populated the #temp table as such:
SELECT st.student_address, sc.school_address, st.student_st, sc.district, st.districtID, '0.0' as rank
FROM students st
LEFT OUTER JOIN schools sc
ON st.[District ID] = sc.District
ORDER BY st.[District ID] asc;
I followed the results of my temp table by a series of updates that changed the 'rank' column based on certain rules (e.g. no match between school and student = 0.0, only a district match = 0.1, a district match & a state match = 0.2 and so on). The end result is that highly ranked rows are more likely to show the student's actual school vs. lesser ranked rows.
Where I need help is the final query. I essentially want to return all student info (all rows from the original students table) and the most likely corresponding school (determined by rank).
Something like (pseudo code)
select student_address, student_st, student_etc, school_address
from #temp
where rank = max(rank)
group by student_address
I know the above isn't correct SQL, but I hope it gives you an idea what I am trying to achieve?
Thanks for any guidance.
You can try this out:
select student_address, student_st, student_etc, school_address,RANK
from #temp t1
group by student_address, student_st, student_etc, school_address,RANK having
RANK=(select MAX(RANK) from #temp t2 where t1.student_address=t2.student_address)
I think you're close. Probably need to use a subquery like:
SELECT student_address, student_st, student_etc, school_address
FROM #temp
WHERE rank = (SELECT MAX(rank) FROM #temp)
...though I'm missing where student_street is coming from. The above, however looks like the pattern you're looking for.

SQL Server - Transpose rows into columns

I've searched high and low for an answer to this so apologies if it's already answered!
I have the following result from a query in SQL 2005:
ID
1234
1235
1236
1267
1278
What I want is
column1|column2|column3|column4|column5
---------------------------------------
1234 |1235 |1236 |1267 |1278
I can't quite get my head around the pivot operator but this looks like it's going to be involved. I can work with there being only 5 rows for now but a bonus would be for it to be dynamic, i.e. can scale to x rows.
EDIT:
What I'm ultimately after is assigning the values of each resulting column to variables, e.g.
DECLARE #id1 int, #id2 int, #id3 int, #id4 int, #id5 int
SELECT #id1 = column1, #id2 = column2, #id3 = column3, #id4 = column4,
#id5 = column5 FROM [transposed_table]
You also need a value field in your query for each id to aggregate on. Then you can do something like this
select [1234], [1235]
from
(
-- replace code below with your query, e.g. select id, value from table
select
id = 1234,
value = 1
union
select
id = 1235,
value = 2
) a
pivot
(
avg(value) for id in ([1234], [1235])
) as pvt
I think you'll find the answer in this answer to a slightly different question: Generate "scatter plot" result of members against sets from SQL query
The answer uses Dynamic SQL. Check out the last link in mellamokb's answer: http://www.sqlfiddle.com/#!3/c136d/14 where he creates column names from row data.
In case you have a grouped flat data structure that you want to group transpose, like such:
GRP | ID
---------------
1 | 1234
1 | 1235
1 | 1236
1 | 1267
1 | 1278
2 | 1234
2 | 1235
2 | 1267
2 | 1289
And you want its group transposition to appear like:
GRP | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
-------------------------------------------------------------
1 | 1234 | 1235 | 1236 | 1267 | 1278
2 | 1234 | 1235 | NULL | 1267 | NULL
You can accomplish it with a query like this:
SELECT
Column1.ID As column1,
Column2.ID AS column2,
Column3.ID AS column3,
Column4.ID AS column4,
Column5.ID AS column5
FROM
(SELECT GRP, ID FROM FlatTable WHERE ID = 1234) AS Column1
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1235) AS Column2
ON Column1.GRP = Column2.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1236) AS Column3
ON Column1.GRP = Column3.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1267) AS Column4
ON Column1.GRP = Column4.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1278) AS Column5
ON Column1.GRP = Column5.GRP
(1) This assumes you know ahead of time which columns you will want — notice that I intentionally left out ID = 1289 from this example
(2) This basically uses a bunch of left outer joins to append 1 column at a time, thus creating the transposition. The left outer joins (rather than inner joins) allow for some columns to be null if they don't have corresponding values from the flat table, without affecting any subsequent columns.

Resources