Microsoft SQL, ROW_NUMBER & OUTER APPLY - sql-server

I am having some problems with an OUTER APPLY, specifically that I can't reference ROW from within the OUTER APPLY.
Note: if I place the WHERE ROW criteria outside of the OUTER APPLY, people without a 3rd, 4th or 5th row, for example, aren't returned.
OUTER APPLY
(
SELECT ROW_NUMBER()
OVER (ORDER BY L.DATECREATED) AS ROW,
L.PERCENTAGE
FROM LABOURALLOCATION L
WHERE ROW = 1 -- this is the line that fails
) RECORDS

You can't reference a column alias in the WHERE clause of the same query. You also can't put the ROW_NUMBER() function itself in a WHERE or HAVING clause. If you need to limit to row = 1 within the OUTER APPLY, the only way to do it is to put the ROW_NUMBER() in a subquery, or perhaps a CTE, and filter on it one level up. Note that ROW is a reserved word - I usually use ROW_NUM.
OUTER APPLY
(
SELECT ROW_NUM, PERCENTAGE
FROM (
SELECT ROW_NUMBER()
OVER (ORDER BY L.DATECREATED) AS ROW_NUM,
L.PERCENTAGE
FROM LABOURALLOCATION L
) AS NUMBERED -- SQL Server requires an alias on a derived table
WHERE ROW_NUM = 1
) RECORDS
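
For a version you can run end to end, here is a minimal sketch on made-up data. The @PEOPLE table, the PERSONID column and the sample rows are assumptions, since the question doesn't show the outer query. It asks for the 2nd allocation per person, so you can see that a person without one is still returned (with NULLs) because the filter sits inside the OUTER APPLY.

```sql
-- Hypothetical sample data; @PEOPLE and PERSONID are assumptions.
DECLARE @PEOPLE TABLE (PERSONID INT PRIMARY KEY);
DECLARE @LABOURALLOCATION TABLE (
    PERSONID    INT,
    DATECREATED DATE,
    PERCENTAGE  DECIMAL(5, 2)
);

INSERT INTO @PEOPLE VALUES (1), (2);
INSERT INTO @LABOURALLOCATION VALUES
    (1, '2020-01-01', 50.00),
    (1, '2020-02-01', 30.00),
    (2, '2020-01-15', 100.00);

SELECT P.PERSONID, RECORDS.ROW_NUM, RECORDS.PERCENTAGE
FROM @PEOPLE P
OUTER APPLY
(
    SELECT N.ROW_NUM, N.PERCENTAGE
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY L.DATECREATED) AS ROW_NUM,
               L.PERCENTAGE
        FROM @LABOURALLOCATION L
        WHERE L.PERSONID = P.PERSONID   -- correlate to the outer row
    ) N
    WHERE N.ROW_NUM = 2                 -- filter one level above the ROW_NUMBER()
) RECORDS
ORDER BY P.PERSONID;
-- PERSONID 1 -> ROW_NUM 2, PERCENTAGE 30.00
-- PERSONID 2 -> NULL, NULL (no 2nd row, but the person is still returned)
```

Because the inner query is correlated on PERSONID, no PARTITION BY is needed: the numbering restarts for each outer row.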

Related

SQL Newbie - Over Partition?

I have the following query. I am trying to get the Row # to increment whenever the value in Value1 field changes. The SensorData table has 2800 records and the Value1 is either 0 or 3 and changes throughout the day.
SELECT
ROW_NUMBER() OVER(PARTITION BY Value1 ORDER BY Block ASC) AS Row#,
GatewayDetailID, Block, Value1
FROM
SensorData
ORDER BY
Row#
I get the following results:
It seems to create only 2 partitions, 0 and 3. It is not restarting the row number every time Value1 changes.
First, instead of creating a permanent table, I used a temp table.
Given your example, here is what I came up with:
WITH CTE as(
select ROW_NUMBER() OVER(ORDER BY BLOCK) RN, LAG(Value1,1,VALUE1) OVER (ORDER BY BLOCK) LG,
GatewayDetailID, Block, Value1,Value2,Vaule3
from #tmp
),
CTE2 as (
select *, CASE WHEN LG <> VALUE1 THEN RN ELSE 0 END RowMark
from cte
),
CTE3 AS (
select MIN(Block) BL, RowMark from CTE2
GROUP BY ROwMark
),
CTE4 AS (
SELECT GatewayDetailID,Block,Value1,Value2,Vaule3,RMM from cte2 t1
CROSS APPLY (SELECT MAX(ROWMark) RMM FROM CTE3 t9 where t1.Block >= t9.BL and t1.RN >= t9.RowMark) t2
)
SELECT GateWayDetailID,Block,Value1,Value2,Vaule3, ROW_NUMBER() OVER(Partition by RMM ORDER BY BLOCK) RN
FROM CTE4
ORDER BY BLOCK
I first had to get a row number for all the rows; then, wherever Value1 changed, I marked that row as the start of a new group. From that I created a CTE with the date and row boundary for each group. Lastly, I cross applied that back to the table to find the group each row belongs to.
From that last CTE I simply applied a ROW_NUMBER() function partitioned by each RowMark group and poof... row numbers.
There may be a better way to do this, but this was how I logically worked through the problem.
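
One shorter alternative, offered as a sketch and not as what the answer above does, is the usual "gaps and islands" pattern: flag each row where Value1 differs from the previous row, take a running sum of the flags to get a group number, then number the rows within each group. The sample rows and the INT type for Block are assumptions. It needs SQL Server 2012 or later, which the LAG in the answer above already requires.

```sql
-- Hypothetical sample rows standing in for SensorData.
DECLARE @SensorData TABLE (Block INT PRIMARY KEY, Value1 INT);
INSERT INTO @SensorData VALUES (1, 0), (2, 0), (3, 3), (4, 3), (5, 3), (6, 0);

WITH Flagged AS (
    -- 1 when Value1 differs from the previous row (or there is no previous row)
    SELECT Block, Value1,
           CASE WHEN LAG(Value1) OVER (ORDER BY Block) = Value1
                THEN 0 ELSE 1 END AS IsChange
    FROM @SensorData
),
Grouped AS (
    -- The running total of change flags gives each run of equal values its own number
    SELECT Block, Value1,
           SUM(IsChange) OVER (ORDER BY Block ROWS UNBOUNDED PRECEDING) AS Grp
    FROM Flagged
)
SELECT Block, Value1,
       ROW_NUMBER() OVER (PARTITION BY Grp ORDER BY Block) AS RowNo
FROM Grouped
ORDER BY Block;
-- RowNo for Blocks 1..6: 1, 2, 1, 2, 3, 1
```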

How to filter TOP 1 condition in WHERE clause

What would be the most efficient way to eliminate records in WHERE clause using TOP 1 logic?
Table tblQuoteStatusChangeLog is not in a JOIN.
But based on value in this table I need to eliminate records that have NewQuoteStatusID = 12
It works the way it is, but I am looking for a more efficient way, since the Sort (Top N Sort) operator in the plan is too expensive.
SELECT
Q.ControlNo
,sum(fid.amtbilled) as Premium
FROM
[dbo].tblQuotes Q
inner join [dbo].[tblFin_Invoices] FI on Q.QuoteID = FI.QuoteID and FI.failed = 0
inner join [dbo].[tblFin_InvoiceDetails] FID on FI.[InvoiceNum] = FID.InvoiceNum
WHERE (
SELECT TOP 1 NewQuoteStatusID
FROM tblQuoteStatusChangeLog
WHERE (ControlNo = Q.ControlNo)
ORDER BY Timestamp DESC
) <> 12
Group by
Q.ControlNo
Your code is RBAR (row by agonizing row): it runs the same subquery once per row, which is very inefficient.
You worry about "sort", but that by itself would not be a problem. Look further up and left of the plan; to the nested loop. See the fat input line at the top and thin just below. Basically you're hitting your sort very many times.
Suggestion: try to use a set-based solution. "Prepare" the data you require for the WHERE clause "in advance", so you can eliminate the RBAR. Imagine you had LatestStatus as a table with ControlNo and StatusID columns. It would be much simpler to apply your filter; and the Query Optimiser should be able to find a more efficient overall plan.
You can set this up using a CTE.
;with StatusByControlNo as (
SELECT ROW_NUMBER() OVER(PARTITION BY ControlNo ORDER BY Timestamp DESC) AS RowNo,
ControlNo, Timestamp, NewQuoteStatusID
FROM tblQuoteStatusChangeLog
) ...
/*Easy to get Latest status per ControlNo from here*/
SELECT ControlNo, NewQuoteStatusID
FROM StatusByControlNo
WHERE RowNo = 1
Now with a few tweaks your query becomes:
;with StatusByControlNo as (
SELECT ROW_NUMBER() OVER(PARTITION BY ControlNo ORDER BY Timestamp DESC) AS RowNo,
ControlNo, Timestamp, NewQuoteStatusID
FROM tblQuoteStatusChangeLog
)
SELECT
Q.ControlNo,
sum(fid.amtbilled) as Premium
FROM
tblQuotes Q
inner join tblFin_Invoices FI
on Q.QuoteID = FI.QuoteID and FI.failed = 0
inner join tblFin_InvoiceDetails FID
on FI.InvoiceNum = FID.InvoiceNum
inner join StatusByControlNo S
on S.ControlNo = Q.ControlNo and S.RowNo = 1
WHERE
S.NewQuoteStatusID <> 12
Group by Q.ControlNo
It should go without saying you could try a number of variations on this. But the core principle is to reduce RBAR and look for solutions that are more 'set-based'.
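
If you want to sanity-check the "latest status" logic on its own before wiring it into the big query, here is a small sketch. The sample rows are made up; only the column names come from the question.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @tblQuoteStatusChangeLog TABLE (
    ControlNo        INT,
    [Timestamp]      DATETIME,
    NewQuoteStatusID INT
);
INSERT INTO @tblQuoteStatusChangeLog VALUES
    (100, '2020-01-01', 12),
    (100, '2020-01-02', 5),   -- latest status for 100 is 5: kept
    (200, '2020-01-01', 5),
    (200, '2020-01-03', 12);  -- latest status for 200 is 12: filtered out

WITH StatusByControlNo AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY ControlNo
                              ORDER BY [Timestamp] DESC) AS RowNo,
           ControlNo, NewQuoteStatusID
    FROM @tblQuoteStatusChangeLog
)
SELECT ControlNo, NewQuoteStatusID
FROM StatusByControlNo
WHERE RowNo = 1
  AND NewQuoteStatusID <> 12;
-- Returns one row: ControlNo 100, NewQuoteStatusID 5
```

Note that the filter is on the status column of the latest row, not on ControlNo.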

How to select top X from a Row_number in SQL Server ?

I have a data sample, and I want to get a range of rows by combining TOP X with ROW_NUMBER().
IndexNo ProductName
1 Black
2 Blue
3 Brown
4 Green
5 Red
6 White
7 Yellow
In this case, the result I want after running the SQL statement is:
IndexNo ProductName
3 Brown
4 Green
5 Red
I use the SQL statement below, but I get the error Invalid column name 'IndexNo'.
SELECT TOP 3 ROW_NUMBER() OVER(ORDER BY TEMPA.ProductName) AS IndexNo, TEMPA.ProductName
FROM (
SELECT DISTINCT ProductName FROM PRODUCTS WHERE ProductType ='Food'
) AS TEMPA
WHERE IndexNo between 3 and 5
You could use another level of subquery with parentheses.
SELECT TOP 3 * FROM
( SELECT ROW_NUMBER() OVER(ORDER BY TEMPA.ProductName) AS IndexNo, TEMPA.ProductName
FROM (
SELECT DISTINCT ProductName FROM PRODUCTS
) AS TEMPA
) as TEMPB
WHERE IndexNo between 3 and 5
You need to wrap your ROW_NUMBER in a common table expression and apply BETWEEN at the outer level:
with cte as (
SELECT ROW_NUMBER() OVER(ORDER BY TEMPA.ProductName) AS IndexNo, TEMPA.ProductName
FROM (
SELECT DISTINCT ProductName FROM PRODUCTS WHERE ProductType ='Food'
) AS TEMPA
) select top 3 * from cte
WHERE cte.IndexNo between 3 and 5
You need to create the ROW_NUMBER() in one scope and filter it in another scope...
SELECT
*
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY x) AS ix FROM example
)
indexed_example
WHERE
ix BETWEEN 3 AND 5
This is NOT the same for TOP and ORDER BY, as these are applied after the SELECT and WHERE clauses, so this would work fine...
SELECT TOP(3)
*,
ROW_NUMBER() OVER (ORDER BY id DESC) ix
FROM
example
ORDER BY
ix
This is especially useful to your case when using ORDER BY ? OFFSET ? FETCH ? instead of TOP.
SELECT
*,
ROW_NUMBER() OVER (ORDER BY id DESC) ix
FROM
example
ORDER BY
ix DESC
OFFSET 2 ROWS -- Skip 2 rows
FETCH NEXT 3 ROWS ONLY -- Fetch the 3rd, 4th and 5th rows.
In your example, you're also using DISTINCT which is applied after the SELECT values are calculated, but you could use GROUP BY instead as it is applied before the SELECT values are calculated.
SELECT
ROW_NUMBER() OVER (ORDER BY Products.ProductName) ix,
Products.ProductName
FROM
Products
WHERE
Products.ProductType = 'Food'
GROUP BY
Products.ProductName
ORDER BY
ix
OFFSET 2 ROWS
FETCH NEXT 3 ROWS ONLY
The logical order of evaluation is:
1. All the joins in the FROM clause first (nothing to do in your case)
2. Apply the WHERE clause
3. Apply the GROUP BY clause (same effect as your DISTINCT)
4. Calculate the SELECT values, including the ROW_NUMBER()
5. Apply the ORDER BY including the OFFSET and FETCH NEXT clauses
Everything you wanted, without needing to nest anything in sub-queries.
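
Here is a self-contained sketch of that approach using the sample rows from the question, ordering by the row number ascending so that rows 3 to 5 come back. The ProductType values and the extra duplicate row are assumptions added to show the GROUP BY doing the job of DISTINCT. OFFSET ... FETCH needs SQL Server 2012 or later.

```sql
-- Product names are from the question; ProductType values are assumed.
DECLARE @Products TABLE (ProductName VARCHAR(20), ProductType VARCHAR(20));
INSERT INTO @Products VALUES
    ('Black', 'Food'), ('Blue', 'Food'), ('Brown', 'Food'), ('Green', 'Food'),
    ('Red', 'Food'), ('White', 'Food'), ('Yellow', 'Food'),
    ('Red', 'Food');   -- duplicate, collapsed by the GROUP BY

SELECT ROW_NUMBER() OVER (ORDER BY P.ProductName) AS IndexNo,
       P.ProductName
FROM @Products P
WHERE P.ProductType = 'Food'
GROUP BY P.ProductName       -- runs before ROW_NUMBER() is calculated
ORDER BY IndexNo
OFFSET 2 ROWS                -- skip rows 1 and 2
FETCH NEXT 3 ROWS ONLY;      -- return rows 3, 4 and 5
-- 3 Brown
-- 4 Green
-- 5 Red
```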

T-SQL Full outer join on two subqueries or arbitrary static values?

I'm trying to basically combine the columns from two outputs into one row.
Here's one example:
SELECT * FROM (SELECT 'Today' AS Txt) t1
FULL OUTER JOIN (SELECT * FROM (SELECT GETDATE() AS D) t2)
-- desired result is one row with a 'Txt' column with value 'Today' and a 'D' column with the result of the GETDATE function
And another:
SELECT * FROM (SELECT * FROM dbo.myTableFunc()) t1 -- returns 5 rows
FULL OUTER JOIN (SELECT * FROM (SELECT * FROM dbo.myOtherTableFunc())) t2 -- also returns 5 rows
The thing I cannot figure out how to do is to do the "outer join" on the two subqueries. In the first example, I'm basically trying to combine the result of two scalars into a single row result. In the second I'm trying to take two tables, each with five rows, and combine their columns, without any relationship between the data in the two tables.
I'm trying to do the above in a UDF and also in a view, so anything that involves creating temporary tables will not work.
In both of the above cases I get syntax errors around the closing ) signs in the outer join.
You're just missing the join conditions. In the first example, your join condition is "always", or 1 = 1:
SELECT * FROM (SELECT 'Today' AS Txt) t1
FULL OUTER JOIN (SELECT * FROM (SELECT GETDATE() AS D) t2) t2 on 1=1
In the second example you don't want any relationship between the rows in each data set - well, if you want to join them then there needs to be SOME relationship, even if it's spurious. Using a row number like this would work (assumes you have a unique column called Id in both tables):
select * from (
select row_number() over (order by Id asc) rn, * from dbo.myTableFunc()
) t1
full join (
select row_number() over (order by Id asc) rn, * from dbo.myOtherTableFunc()
) t2 on t1.rn=t2.rn
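
To try the row-number pairing without the two functions, here is a minimal sketch with inline VALUES lists standing in for them (the values and column names are made up). The two sets have different row counts on purpose, to show why you want FULL JOIN and not INNER JOIN.

```sql
SELECT t1.Txt, t2.Num
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY v.Txt) AS rn, v.Txt
    FROM (VALUES ('a'), ('b'), ('c')) AS v(Txt)
) t1
FULL JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY v.Num) AS rn, v.Num
    FROM (VALUES (10), (20)) AS v(Num)
) t2 ON t1.rn = t2.rn
ORDER BY COALESCE(t1.rn, t2.rn);
-- a  10
-- b  20
-- c  NULL
```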

Top 1 for each joined record

In this SQL Server query, I want to return at most 1 lostReason for each booking. However, the sub-query seems to be returning the first record from the lostBusiness table for every booking. Let me know if I need to clarify.
SELECT
bookings.bookingNumber, lost.lostReason
FROM
bookings
LEFT OUTER JOIN(SELECT TOP (1)
bookingNumber,
lostReason
FROM
lostBusiness) AS lost ON bookings.bookingNumber = lost.bookingNumber
if you need more than one column
select
bookings.bookingNumber, lost.*
from bookings
outer apply
(
select top 1
lost.bookingNumber,
lost.lostReason
-- add other columns here
from lostBusiness as lost
where bookings.bookingNumber = lost.bookingNumber
order by -- put your order by here
) as lost
or
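;with cte as (
select
-- list the columns explicitly: select * would return bookingNumber twice, which a CTE does not allow
bookings.bookingNumber,
lost.lostReason,
row_number() over (partition by bookings.bookingNumber order by /* ??? */) as row_num
from bookings
left outer join lostBusiness as lost on bookings.bookingNumber = lost.bookingNumber
)
select * from cte where row_num = 1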
if you need only one column
select
bookings.bookingNumber, max(lost.lostReason) as lostReason
from bookings
left outer join lostBusiness as lost on bookings.bookingNumber = lost.bookingNumber
group by bookings.bookingNumber
If you just want any lost reason, MAX or MIN would do:
SELECT
Bookings.BookingNumber,
MAX(LostBusiness.LostReason) as SomeLostReason
FROM
Bookings
LEFT JOIN LostBusiness ON bookings.BookingNumber= lostBusiness.BookingNumber
GROUP BY
Bookings.BookingNumber
Your query fails because you are joining to a single record, not to a record for each booking.
Try this
select *,
(
select top 1
lostreason
from lostbusiness
where lostbusiness.bookingnumber = bookings.bookingnumber
-- order by goes here.
)
from bookings
You should understand that data doesn't have any inherent order, so you should define what you mean by the "first" reason, by means of an order by clause in the subquery.
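
To make the point about ordering concrete, here is a minimal sketch of the OUTER APPLY approach on made-up data. The lostDate column and the sample rows are assumptions; use whatever column defines "first" in your data.

```sql
-- Hypothetical sample data; lostDate is an assumed column.
DECLARE @bookings TABLE (bookingNumber INT PRIMARY KEY);
DECLARE @lostBusiness TABLE (
    bookingNumber INT,
    lostReason    VARCHAR(50),
    lostDate      DATE
);
INSERT INTO @bookings VALUES (1), (2);
INSERT INTO @lostBusiness VALUES
    (1, 'Price', '2020-01-01'),
    (1, 'Cancelled', '2020-03-01');

SELECT b.bookingNumber, lost.lostReason
FROM @bookings b
OUTER APPLY (
    SELECT TOP (1) lb.lostReason
    FROM @lostBusiness lb
    WHERE lb.bookingNumber = b.bookingNumber
    ORDER BY lb.lostDate DESC      -- "first" here means most recent
) AS lost
ORDER BY b.bookingNumber;
-- 1  Cancelled
-- 2  NULL   (booking with no lost reason is still returned)
```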
