Cross Apply yields nulls - sql-server

I'm quoting MS:
CROSS APPLY returns only rows from the outer table that produce a
result set from the table-valued function.
This would mean that it would not return rows with null values, right?
However, my query is:
select ....,cat_custom,....
from ...(various inner joins)...
cross apply
(
select
case
when i.cat1='01' then 1
when i.cat2='04' then 2
when i.cat2='07' then 3
when i.cat2 in ('08') or i.cat3 in ('014','847') then 4
else null
end as cat_custom
) as cat_custom_query
...and sure enough, I get rows with nulls. Wouldn't that be OUTER apply's job? What is going on?

CROSS APPLY returns only rows from the outer table that produce a
result set from the table-valued function.
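In your example a row is produced: a single row whose cat_custom column is NULL. A SELECT with no FROM clause always returns exactly one row, so the applied subquery is never empty and CROSS APPLY keeps every outer row.
To drop those rows, filter inside the applied subquery: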
select ....,cat_custom,....
from ...(various inner joins)...
cross apply
(
select
case
when i.cat1='01' then 1
when i.cat2='04' then 2
when i.cat2='07' then 3
when i.cat2 in ('08') or i.cat3 in ('014','847') then 4
else null
end as cat_custom
WHERE i.cat1 = '01' OR i.cat2 IN ('04', '07', '08') OR i.cat3 IN ('014', '847')
) as cat_custom_query
Also, if this is part of your real query, you can put the CASE expression directly in the SELECT list. You do not need CROSS APPLY here, as the subquery does not refer to any SQL objects (tables, views, functions, etc.).
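A minimal sketch of the behaviour, using made-up sample rows (the VALUES list and the id column are assumptions, not part of the original query). The applied SELECT still produces one row for id 3 even though no CASE branch matches, so the NULL has to be filtered explicitly, here in the outer WHERE:

```sql
-- Sample rows are hypothetical; column names follow the question.
WITH i AS (
    SELECT *
    FROM (VALUES
        (1, '01', '00', '000'),  -- cat1 = '01'       -> 1
        (2, '00', '04', '000'),  -- cat2 = '04'       -> 2
        (3, '00', '00', '000')   -- no branch matches -> NULL
    ) AS v(id, cat1, cat2, cat3)
)
SELECT i.id, c.cat_custom
FROM i
CROSS APPLY (
    SELECT CASE
               WHEN i.cat1 = '01' THEN 1
               WHEN i.cat2 = '04' THEN 2
               WHEN i.cat2 = '07' THEN 3
               WHEN i.cat2 IN ('08') OR i.cat3 IN ('014', '847') THEN 4
           END AS cat_custom      -- no ELSE: NULL when nothing matches
) AS c
WHERE c.cat_custom IS NOT NULL    -- remove this line to see the NULL row for id 3
ORDER BY i.id;
```

With the filter in place only ids 1 and 2 come back. Using CROSS APPLY this way is mainly a convenience for naming the expression once so it can be reused in WHERE.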

Related

Outer Apply Returning columns unexpectedly NOT NULL when no match

I'm hitting some weird behavior on a table valued function when used with OUTER APPLY. I have a simple inline function that returns some simple calculations based on a row in another table. When the input values for the TVF are hard-coded scalars, there is no row returned. When I take the same scalars and make a single row out of them in a CTE, then feed them in as columns using CROSS APPLY, no result set. When I do the same with OUTER APPLY, I get 1 row (as expected), but two of the output columns are NULL and the other two NOT NULL. Based on BOL, that shouldn't happen with an OUTER APPLY. Is this a user error? I wrote a simple version to demonstrate the issue.
--Test set-up
CREATE FUNCTION dbo.TVFTest
(
@keyID INT,
@matchValue1 MONEY,
@matchValue2 MONEY
)
RETURNS TABLE AS RETURN
(
WITH TestRow
AS (SELECT @keyID AS KeyID,
@matchValue1 AS MatchValue1,
@matchValue2 AS MatchValue2)
SELECT KeyID,
MatchValue1,
MatchValue2,
CASE
WHEN MatchValue1 <> MatchValue2
THEN 'Not equal'
ELSE 'Something else'
END AS MatchTest
FROM TestRow
WHERE MatchValue1 <> MatchValue2
)
GO
Query
WITH Test AS
(
SELECT 12 AS PropertyID,
$350000 AS Ap1,
350000 AS Ap2
)
SELECT LP.*
FROM Test T
OUTER APPLY dbo.TVFTest
(
T.PropertyID,
T.Ap1,
T.Ap2
) LP;
Results
+-------+-------------+-------------+-----------+
| KeyID | MatchValue1 | MatchValue2 | MatchTest |
+-------+-------------+-------------+-----------+
| 12 | 350000.00 | NULL | NULL |
+-------+-------------+-------------+-----------+
Using Cross Apply returns no rows as expected. Also removing the CTE and using inline constants returns no row.
--Scalars, no row here...
SELECT LP.*
FROM dbo.TVFTest
(
12,
$350000,
350000
) LP;
This is certainly a bug in the product.
A similar bug was already reported and closed as "Won't Fix".
Including this question, the linked connect item and another two questions on this site I have seen four cases of this type of behaviour with inline TVFs and OUTER APPLY - All of them were of the format
OUTER APPLY dbo.SomeFunction(...) F
And returned correct results when written as
OUTER APPLY (SELECT * FROM dbo.SomeFunction(...)) F
So this looks like a possible workaround.
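Applied to the query from the question, the workaround would look roughly like this. It is a sketch rather than a verified fix: it assumes dbo.TVFTest from the question already exists and that this case behaves like the other reported ones, in which case every LP column should come back NULL.

```sql
-- Assumes dbo.TVFTest from the question has been created.
WITH Test AS
(
    SELECT 12      AS PropertyID,
           $350000 AS Ap1,
           350000  AS Ap2
)
SELECT LP.*
FROM Test AS T
-- Wrapping the TVF call in a derived table instead of applying it directly
OUTER APPLY (SELECT *
             FROM dbo.TVFTest(T.PropertyID, T.Ap1, T.Ap2)) AS LP;
```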
For the query
WITH Test AS
(
SELECT 12 AS PropertyID,
$350000 AS Ap1,
350000 AS Ap2
)
SELECT LP.*
FROM Test T
OUTER APPLY dbo.TVFTest
(
T.PropertyID,
T.Ap1,
T.Ap2
) LP;
The execution plan looks like
The list of output columns in the final projection is Expr1000, Expr1001, Expr1003, Expr1004.
However, only two of those columns are defined in the table of constants in the bottom right.
The literal $350000 is defined in the table of constants in the top right (Expr1001). This then gets outer joined onto the table of constants in the bottom right. As no rows match the join condition, the two columns defined there (Expr1003, Expr1004) are correctly evaluated as NULL. Finally, the Compute Scalar adds the literal 12 into the data flow as a new column (Expr1000) irrespective of the result of the outer join.
These are not at all the correct semantics. Compare with the (correct) plan when the inline TVF is manually inlined.
WITH Test
AS (SELECT 12 AS PropertyID,
$350000 AS Ap1,
350000 AS Ap2)
SELECT LP.*
FROM Test T
OUTER APPLY (SELECT KeyID,
MatchValue1,
MatchValue2,
CASE
WHEN MatchValue1 <> MatchValue2
THEN 'Not equal'
ELSE 'Something else'
END AS MatchTest
FROM (SELECT T.PropertyID AS KeyID,
T.Ap1 AS MatchValue1,
T.Ap2 AS MatchValue2) TestRow
WHERE MatchValue1 <> MatchValue2) LP
Here the columns used in the final projection are Expr1003, Expr1004, Expr1005, Expr1006. All of these are defined in the bottom right constant scan.
In the case of the TVF it all seems to go wrong very early on.
Adding OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8606); shows the input tree to the process is already incorrect. Expressed in SQL it is something like.
SELECT Expr1000,
Expr1001,
Expr1003,
Expr1004
FROM (VALUES (12,
$350000,
350000)) V1(Expr1000, Expr1001, Expr1002)
OUTER APPLY (SELECT Expr1003,
IIF(Expr1001 <> Expr1003,
'Not equal',
'Something else') AS Expr1004
FROM (SELECT CAST(Expr1002 AS MONEY) AS Expr1003) D
WHERE Expr1001 <> Expr1003) OA
The full output of that trace flag is as follows (And 8605 shows basically the same tree.)
*** Input Tree: ***
LogOp_Project COL: Expr1000 COL: Expr1001 COL: Expr1003 COL: Expr1004
LogOp_Apply (x_jtLeftOuter)
LogOp_Project
LogOp_ConstTableGet (1) [empty]
AncOp_PrjList
AncOp_PrjEl COL: Expr1000
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=12)
AncOp_PrjEl COL: Expr1001
ScaOp_Const TI(money,ML=8) XVAR(money,Not Owned,Value=(10000units)=(-794967296))
AncOp_PrjEl COL: Expr1002
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=350000)
LogOp_Project
LogOp_Select
LogOp_Project
LogOp_ConstTableGet (1) [empty]
AncOp_PrjList
AncOp_PrjEl COL: Expr1003
ScaOp_Convert money,Null,ML=8
ScaOp_Identifier COL: Expr1002
ScaOp_Comp x_cmpNe
ScaOp_Identifier COL: Expr1001
ScaOp_Identifier COL: Expr1003
AncOp_PrjList
AncOp_PrjEl COL: Expr1004
ScaOp_IIF varchar collate 53256,Var,Trim,ML=14
ScaOp_Comp x_cmpNe
ScaOp_Identifier COL: Expr1001
ScaOp_Identifier COL: Expr1003
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=9) XVAR(varchar,Owned,Value=Len,Data = (9,Not equal))
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=14) XVAR(varchar,Owned,Value=Len,Data = (14,Something else))
AncOp_PrjList
*******************
I did some further research (SQL Server 2012) - and this is really weird!
You can simplify this. It seems to me that it has something to do with implicit type conversion. That's the reason why I tried around with data types...
Try this:
--Test set-up
CREATE FUNCTION dbo.TVFTest
(
@ValueInt INT,
@ValueMoney MONEY,
@ValueVarchar VARCHAR(10),
@ValueDate DATE,
@DateAsVarchar DATE
)
RETURNS TABLE AS RETURN
(
SELECT @ValueInt AS ValueInt
,@ValueMoney AS ValueMoney
,@ValueVarchar AS ValueVarchar
,@ValueDate AS ValueDate
,@DateAsVarchar AS DateAsVarchar
WHERE 1 != 1
)
GO
This function will never return a row because of the WHERE clause...
DECLARE @d AS DATE='20150101';
This typed date variable is needed later; try replacing it with GETDATE() in the calls...
--direct call: comes back with no row
SELECT * FROM dbo.TVFTest(1,2,'test',@d,'20150101');
--parameters via CTE:
WITH Test AS
(
SELECT 1 AS valint,
2 AS valmoney,
'test' AS valchar,
@d AS valdate, --try GETDATE() here!
'20150101' AS valdateasvarchar
)
SELECT *
FROM Test AS T
OUTER APPLY dbo.TVFTest(T.valint,T.valmoney,T.valchar,T.valdate,T.valdateasvarchar) AS LP;
Both implicitly converted parameters (Money and DateAsVarchar) don't show up, but the INT, the VARCHAR and the "real" DATE do!
Look at the execution plan:
This call was done with GETDATE(). Otherwise there'd be only 2 scalar operators...
EDIT: The first "Compute Scalar" in the execution plan shows all columns, while the Constant Scan (scanning an internal table of constants) has only two columns (three if you use GETDATE()). The "bad" columns don't even seem to be part of the CTE at this stage...
--parameters via CTE with single calls
WITH Test AS
(
SELECT 1 AS valint,
2 AS valmoney,
'test' AS valchar,
@d AS valdate,
'20150101' AS valdateasvarchar
)
SELECT * FROM dbo.TVFTest((SELECT valint FROM Test)
,(SELECT valmoney FROM Test)
,(SELECT valchar FROM Test)
,(SELECT valdate FROM Test)
,(SELECT valdateasvarchar FROM Test));
GO
DROP FUNCTION dbo.TVFTest;
Just one more try: this returns the expected result (empty).
My conclusion: only scalar values that need some extra handling are processed inside the function and therefore "know" that they shouldn't show up. All scalar values that can be passed through without any extra work are not handled within the function and show up, which is a bug.
What's your opinion?

Ignore condition in WHERE clause when column is NULL

I have a table where one row (with Type = 'E') is related to another row.
I have written a query to return the COUNT of those related rows. The problem is that there is no explicit relationship (such as an ID column that would clearly say which row is related to which). Therefore I am trying to find the relationship based on multiple conditions in the WHERE clause.
In a few cases, columns A and B can be NULL (for records where Type = 'M'). In such cases I would like to ignore that condition, so that only the first three conditions determine the relationship.
I have tried a CASE expression, but it is not working as expected:
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]
,( SELECT COUNT(*)
FROM MyTable T2
WHERE [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
[T2].[A]=CASE WHEN [T2].[A] IS NULL THEN NULL ELSE [T1].[A] END AND
[T2].[B]=CASE WHEN [T2].[B] IS NULL THEN NULL ELSE [T1].[B] END AND
[T2].[Type]='M'
) as TotalCount
FROM MyTable T1
WHERE [T1].[Type] = 'E'
I can't simply drop that condition: in some cases Date and ServiceID are the same, and only A and B distinguish the records. Luckily, where A and B are NULL, Date and ServiceID are enough to tell the two records apart.
http://sqlfiddle.com/#!3/c98db/1
Many thanks in advance.
You could join the tables and use COUNT and GROUP BY to get the counts. Then you can JOIN [A] and [B] if they are equal or NULL.
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID], count([T2].[ID])
FROM MyTable T1
INNER JOIN MyTable T2 ON [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
([T2].[A]= [T1].[A] OR [T2].[A] IS NULL )AND
([T2].[B]= [T1].[B] OR [T2].[B] IS NULL )AND
[T2].[Type] <> [T1].[Type]
WHERE [T1].[Type] = 'E'
GROUP BY [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]
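One difference from the original correlated subquery is worth checking: an INNER JOIN drops 'E' rows that have no related row, whereas the subquery would report them with a count of 0. The sketch below uses a LEFT JOIN to keep them. The sample rows are invented for illustration, and the join filters on Type = 'M' as in the question.

```sql
-- Sample rows are hypothetical; column names follow the question.
WITH MyTable AS (
    SELECT *
    FROM (VALUES
        (1, 'X', 'E', 'a1', 'b1', '2015-01-01', 10),
        (2, 'X', 'M', 'a1', 'b1', '2015-01-01', 10),  -- related to ID 1 via A and B
        (3, 'X', 'M', NULL, NULL, '2015-01-01', 10),  -- related to ID 1, A and B ignored
        (4, 'Y', 'E', 'a2', 'b2', '2015-01-02', 20),
        (5, 'Y', 'M', 'a9', 'b9', '2015-01-02', 20)   -- different A and B: not related
    ) AS v(ID, AlphaId, [Type], A, B, [Date], ServiceID)
)
SELECT T1.ID,
       COUNT(T2.ID) AS TotalCount   -- counts only matched rows, so unmatched 'E' rows get 0
FROM MyTable AS T1
LEFT JOIN MyTable AS T2
       ON  T1.AlphaId   = T2.AlphaId
       AND T1.[Date]    = T2.[Date]
       AND T1.ServiceID = T2.ServiceID
       AND (T2.A = T1.A OR T2.A IS NULL)
       AND (T2.B = T1.B OR T2.B IS NULL)
       AND T2.[Type] = 'M'
WHERE T1.[Type] = 'E'
GROUP BY T1.ID
ORDER BY T1.ID;
```

For this data ID 1 gets a count of 2 and ID 4 a count of 0. With INNER JOIN the row for ID 4 would be missing.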

SQL Server Column From Another Table With Static Value In View

Is there a way to have a column from another table, with a value that is always the same, inside a View? Example:
SELECT *,
(SELECT value FROM tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM tblStudents
Will the above nested query get executed for each row? Is there a way to execute it once and use the result for all rows?
Please note, I'm specifically talking about a View, not a Stored Procedure. I know this can be done in a Stored Procedure.
This actually depends on your table setup. Unless PrefixName is constrained to be unique, you could come across errors where the subquery returns more than one row. If it is not constrained to be unique, but happens to be unique for SeniorPrefix, then the subquery will be executed once per row (10,000 times in the demonstration below). To demonstrate, I have used the following DDL:
CREATE TABLE #tblStudents (ID INT IDENTITY(1, 1), Filler CHAR(100));
INSERT #tblStudents (Filler)
SELECT TOP 10000 NULL
FROM sys.all_objects a, sys.all_objects b;
CREATE TABLE #tblStudentPrefixes (Value VARCHAR(10), PrefixName VARCHAR(20));
INSERT #tblStudentPrefixes (Value, PrefixName) VALUES ('A Value', 'SeniorPrefix');
Running your query gives the following IO output:
Table '#tblStudentPrefixes'. Scan count 10000, logical reads 10000
Table '#tblStudents'. Scan count 1, logical reads 142
The key point is the 10,000 logical reads on tblStudentPrefixes. The other problem with it not being constrained to be unique is that if you have duplicates your query will fail with the error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
If you can't constrain PrefixName to be unique, then you can stop it executing for each row and avoid the errors by using TOP:
SELECT *,
(SELECT TOP 1 value FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix' ORDER BY Value)
AS StudentPrefix
FROM #tblStudents
The IO now becomes:
Table '#tblStudentPrefixes'. Scan count 1, logical reads 1
Table '#tblStudents'. Scan count 1, logical reads 142
However, I would still recommend switching to a CROSS JOIN here:
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
CROSS JOIN
( SELECT TOP 1 value
FROM #tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
ORDER BY Value
) AS p;
Inspection of the execution plans shows that the sub-select uses a table spool, which is unnecessary for a single value.
So in summary, it depends on your table set up whether it will execute for each row, but regardless you are giving the optimiser a better chance if you switch to a cross join.
EDIT
In light of the fact that you need to return rows from tblStudents when there is no match for SeniorPrefix in tblStudentPrefixes, and that PrefixName is not currently constrained to be unique, the best solution is:
SELECT *,
(SELECT MAX(value) FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM #tblStudents;
If you do constrain it to be unique, then the following 3 queries produce (essentially) the same plan and the same results, it is simply personal preference:
SELECT *,
(SELECT value FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM #tblStudents;
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
LEFT JOIN #tblStudentPrefixes AS p
ON p.PrefixName = 'SeniorPrefix';
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
OUTER APPLY
( SELECT Value
FROM #tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
) AS p;
I hope I understand your question right, but try this
SELECT *
FROM tblStudents
Outer Apply
(
SELECT value
FROM tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
) as tble
This is OK, but the subquery may be executed for every row, which could hurt performance.
You could try also:
SELECT tblStudents.*,StudentPrefix.value
FROM tblStudents,
(SELECT value
FROM tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix')StudentPrefix

T-SQL: Can a subquery in the SELECT clause implicitly reference a table in the main outer query?

In T-SQL, is it possible to have a subquery in the SELECT clause that implicitly references tables in the main, outer query? For example:
select NAME,
case when exists (select o.ORDERID) then 1 else 0 end BUYER
from CUSTOMER c
left join ORDER o
on c.CUSTID = o.CUSTID
In other words, can I write subqueries without a FROM clause?
Intellisense seems to recognize outer table aliases in subqueries, but I can't find any documentation that says this is acceptable T-SQL. I can certainly run some of my own tests, but I also wanted to check with the community. Thanks.
Yes this is valid syntax.
A SELECT without a FROM is treated as though selecting from a single row table. Referencing columns from the outer query is required for correlated sub queries and is perfectly valid there.
The particular query you have makes no sense though. It will always evaluate to 1 as that subquery always returns a single row (with a single column containing the corresponding o.OrderId) going into the EXISTS check.
Probably you want to check o.OrderId IS NULL:
SELECT NAME,
CASE
WHEN o.ORDERID IS NULL THEN 0
ELSE 1
END BUYER
FROM CUSTOMER c
LEFT JOIN [ORDER] o
ON c.CUSTID = o.CUSTID
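To see the two forms side by side, here is a small self-contained sketch. The customer and order rows are invented, and the column alias AlwaysOne is just a label for the EXISTS version from the question.

```sql
-- Hypothetical data: Ann has one order, Bob has none.
SELECT c.NAME,
       -- The subquery has no FROM, so it always yields one row and EXISTS is always true
       CASE WHEN EXISTS (SELECT o.ORDERID) THEN 1 ELSE 0 END AS AlwaysOne,
       -- Testing the outer-joined column directly distinguishes buyers from non-buyers
       CASE WHEN o.ORDERID IS NULL THEN 0 ELSE 1 END         AS BUYER
FROM (VALUES (1, 'Ann'), (2, 'Bob')) AS c(CUSTID, NAME)
LEFT JOIN (VALUES (100, 1)) AS o(ORDERID, CUSTID)
       ON c.CUSTID = o.CUSTID
ORDER BY c.CUSTID;
```

AlwaysOne is 1 for both customers, while BUYER is 1 for Ann and 0 for Bob.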
Where it does make sense to use this type of syntax is in a null safe equality check.
e.g.
SELECT A,
B,
CASE
WHEN EXISTS (SELECT T.A
EXCEPT
SELECT T.B) THEN 1
ELSE 0
END AS DistinctFrom
FROM T
Is equivalent to
SELECT A,
B,
CASE
WHEN A <> B
OR ( A IS NULL
AND B IS NOT NULL )
OR ( A IS NOT NULL
AND B IS NULL ) THEN 1
ELSE 0
END AS DistinctFrom
FROM T
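A quick way to convince yourself of the equivalence is to run the EXCEPT form over a few literal rows. The VALUES list below is illustrative only; it stands in for the table T.

```sql
-- Inline rows stand in for table T; they cover equal, different, one-NULL and both-NULL pairs.
SELECT T.A,
       T.B,
       CASE
           WHEN EXISTS (SELECT T.A
                        EXCEPT
                        SELECT T.B) THEN 1   -- EXCEPT treats two NULLs as the same value
           ELSE 0
       END AS DistinctFrom
FROM (VALUES (1, 1),
             (1, 2),
             (1, NULL),
             (NULL, NULL)) AS T(A, B);
```

The pairs (1, 1) and (NULL, NULL) give 0, while (1, 2) and (1, NULL) give 1.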

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The subselect works fine: I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, those rows in the sub-select where luo is NULL do not appear in the overall results. In other words, where I get 6 results in the sub-select, I only get 2 in the overall results, and it's only those rows where Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. It's as if SQL Server is not relating NULL to NULL.
How can I fix this? Why wouldn't two fields compare when both values are NULL?
You could use Coalesce (example assuming Store_Number is an integer)
ON
Coalesce(t1.[Store_Number],0) = Coalesce(t2.[Store_Number],0)
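With ANSI_NULLS ON, which is the default for most client connections, NULL = NULL evaluates to UNKNOWN rather than TRUE, so NULL doesn't equal NULL.
You can change this (if your business case and your database design's use of NULL require it) with the SET option:
SET ansi_nulls off
Be aware that this setting only affects comparisons in which one operand is a NULL literal or a variable that is NULL. It does not change column-to-column comparisons such as the join in the question, and the OFF setting is deprecated. A basic workaround that also works for columns: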
ON ((t1.[Store_Number] = t2.[Store_Number]) OR
(t1.[Store_Number] IS NULL AND t2.[Store_Number] IS NULL))
Executing your POC:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works:
AND EXISTS (SELECT t1.Store_Number INTERSECT SELECT t2.Store_Number)
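In the question the column that can be NULL is Last_Updated_On, so that is where the null-safe comparison is needed. The sketch below applies the INTERSECT form there. The sample rows are invented, and ISO-formatted strings stand in for the date columns to keep the example short.

```sql
-- Hypothetical rows: id 1 was never updated; ids 2 and 3 are two versions of one record.
WITH StoreResponses AS (
    SELECT *
    FROM (VALUES
        (1, 123, '2015-01-01', '2015-01-02', NULL),
        (2, 123, '2015-02-01', '2015-02-02', '2015-02-03'),
        (3, 123, '2015-02-01', '2015-02-02', '2015-02-05')
    ) AS v(id, Store_Number, Date_Of_Inventory, Created_On, Last_Updated_On)
)
SELECT t1.id, t1.Last_Updated_On
FROM StoreResponses AS t1
JOIN (
    SELECT Store_Number,
           Date_Of_Inventory,
           MAX(Created_On)      AS co,
           MAX(Last_Updated_On) AS luo   -- NULL when the group has no updates at all
    FROM StoreResponses
    GROUP BY Store_Number, Date_Of_Inventory
) AS t2
  ON  t1.Store_Number      = t2.Store_Number
  AND t1.Date_Of_Inventory = t2.Date_Of_Inventory
  AND t1.Created_On        = t2.co
  -- Null-safe equality: matches when both sides are equal or both are NULL
  AND EXISTS (SELECT t1.Last_Updated_On INTERSECT SELECT t2.luo)
WHERE t1.Store_Number = 123
ORDER BY t1.id;
```

This returns ids 1 and 3. With a plain t1.Last_Updated_On = t2.luo the never-updated row (id 1) would be lost.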

Resources