CROSS APPLY with table valued function restriction performance

I have a problem with CROSS APPLY and a parameterised table-valued function.
Here is a simplified pseudocode example:
SELECT *
FROM (
SELECT lor.*
FROM LOT_OF_ROWS_TABLE lor
WHERE ...
) AS lor
CROSS APPLY dbo.HeavyTableValuedFunction(lor.ID) AS htvf
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
WHERE ...
The inner select on LOT_OF_ROWS_TABLE returns many rows.
Joining LOT_OF_ROWS_TABLE and ANOTHER_TABLE returns only one or a few rows.
The table-valued function is very time-consuming, and when it is called for a lot of
rows the select takes a very long time.
My problem:
The function is called for all rows returned from LOT_OF_ROWS_TABLE, regardless of the fact that the join to ANOTHER_TABLE will restrict the data to just a few rows.
The select has to stay in the format shown: it is generated, and in reality it is much more complicated.
When I rewrite it by hand as follows it is very fast, but the generated query cannot be rewritten like this:
SELECT *
FROM (
SELECT lor.*
FROM LOT_OF_ROWS_TABLE lor
WHERE ...
) AS lor
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
CROSS APPLY dbo.HeavyTableValuedFunction(at.ID) AS htvf
WHERE ...
I'd like to know:
Is there any setting, hint, or other option that forces the select to call the function only for the finally restricted rows?
Thank you.
EDIT:
The table valued function is very complex: http://pastebin.com/w6azRvxR.
The select we are talking about is "user configured" and generated: http://pastebin.com/bFbanY2n.

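You can split this query into two parts, using either a table variable or a temp table. First, materialize the restricted rows:
-- SELECT ... INTO needs unique column names, so list the columns explicitly if lor and at share any (such as ID)
SELECT lor.*, at.* INTO #tempresult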
FROM (
SELECT lor.*
FROM LOT_OF_ROWS_TABLE lor
WHERE ...
) lor
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
WHERE ...
Now do the time-consuming part, the table-valued function, against only those rows:
SELECT * FROM #tempresult AS tr
CROSS APPLY dbo.HeavyTableValuedFunction(tr.ID) AS htvf

I believe this is what you are looking for:
Plan Forcing Scenario: Create a Plan Guide to Force a Plan Obtained from a Rewritten Query
Basically, it describes rewriting the query to get a generated plan that uses the correct join order, then saving that plan and forcing your existing query (which stays unchanged) to use the saved plan.
The BOL link I put in even gives a specific example of rewriting the query with the joins in a different order and using a FORCE ORDER hint, then using sp_create_plan_guide to take the plan from the rewritten query and apply it to the original query.
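A rough sketch of what those steps could look like in T-SQL. The guide name, the LIKE filter used to find the cached plan, and the statement text are placeholders (the real generated query is not shown here), and there is no guarantee that SQL Server will accept the captured plan as valid for the original statement, so treat this as an outline rather than a tested recipe.

```sql
-- Step 1 (not shown): run the hand-rewritten, fast form of the query once,
-- for example with OPTION (FORCE ORDER) appended, so its plan is cached.

-- Step 2: capture that plan as XML showplan text from the plan cache.
DECLARE @xml_showplan nvarchar(max);
SET @xml_showplan = (
    SELECT TOP (1) qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, DEFAULT, DEFAULT) AS qp
    WHERE st.text LIKE N'%HeavyTableValuedFunction(at.ID)%'  -- placeholder filter for the rewritten query
      AND st.text NOT LIKE N'%dm_exec_query_stats%'          -- skip this lookup query itself
    ORDER BY qs.last_execution_time DESC
);

-- Step 3: attach the captured plan to the text of the original, unchanged query.
EXEC sp_create_plan_guide
    @name = N'Guide_HeavyTvf_JoinFirst',              -- hypothetical guide name
    @stmt = N'<exact text of the generated query>',   -- placeholder: must match the submitted text exactly
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = @xml_showplan;
```

A plan guide of type SQL only matches when the statement text is identical to what the application submits, so a query generator that varies its output would need one guide per distinct text.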

YES and NO... it's hard to interpret what you're trying to achieve without sample input data and the expected result to compare outcomes against.
I'd like to know:
Is there any setting or hint or something that forces select to call
function only for finally restricted rows?
So I'll answer your question above (3 years later!!) directly, with a direct statement:
You need to learn about CTE and the difference between CROSS APPLY
compared to INNER JOIN and why using CROSS APPLY in your case is
necessary. You "could" take the code in your function and apply it
into a single SQL statement using CTE.
ie:
Read this and this.
Essentially, something like this...
WITH t2o AS
(
SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
FROM t2
)
SELECT t1.*, t2o.*
FROM t1
INNER JOIN
t2o
ON t2o.t1_id = t1.id
AND t2o.rn <= 3
Run your query to extract the data you want ONCE, using a CTE, then apply your second SQL statement using the CROSS APPLY.
You have no choice. You cannot do what you're trying to do in ONE SQL statement.

Related

Need help write a loop in snowflake

I have data like below in the table. The component part no shows the part no that replaced the Part no.
(image: Table having data)
I want to write code that gets the last part, i.e. the latest part. The loop ends when a Part doesn't return anything.
I want to show the data like below:
(image: How data is needed)
I tried using a recursive CTE, but the table is so large that it keeps running for 2 hours.
I am weak at writing stored procedures.
Is there any way we can achieve this? We are okay if it completes in 1 hour.

If we need to analyze the level of nesting, a recursive CTE is a good solution. The key is to choose the right starting point: only the roots, so that there are no infinite loops or duplicate results.
If the CTE takes too long because there is too much data, try scaling up the warehouse or dividing the data into batches.
The CTE should look something like this:
CREATE OR REPLACE TABLE T1 (
PART_NO STRING,
COMPOMENT_NO STRING);
INSERT INTO T1 (PART_NO, COMPOMENT_NO)
VALUES ('9U8806', '1252127'),
('1252127', '1073295'),
('1073295', '1386464'),
('1386464', '2320160'),
('2320160', '3153441');
WITH CTE AS (
SELECT T1.PART_NO AS ORIGINAL_PART_NO, T1.PART_NO, T1.PART_NO AS PREVIOUS_PART_NO, 1 AS PART_LEVEL
FROM T1
WHERE T1.PART_NO NOT IN (SELECT COMPOMENT_NO FROM T1) -- only roots
UNION ALL
SELECT CTE.ORIGINAL_PART_NO, T1.COMPOMENT_NO AS PART_NO, CTE.PART_NO AS PREVIOUS_PART_NO, CTE.PART_LEVEL + 1 AS PART_LEVEL
FROM T1
JOIN CTE ON CTE.PART_NO = T1.PART_NO
)
SELECT *
FROM CTE;
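The CTE above returns every step of each replacement chain. To keep only the latest part for each root part, one option is to take the deepest level per ORIGINAL_PART_NO. The following is a minimal sketch that assumes the T1 table created above (with the column spelled COMPOMENT_NO as in that example) and uses Snowflake's QUALIFY clause; the LATEST_PART_NO alias is made up for illustration.

```sql
-- Sketch: same recursive walk as above, reduced to the last part in each chain.
WITH CTE AS (
    -- Anchor: only root parts, i.e. parts that never appear as a replacement.
    SELECT T1.PART_NO AS ORIGINAL_PART_NO, T1.PART_NO, 1 AS PART_LEVEL
    FROM T1
    WHERE T1.PART_NO NOT IN (SELECT COMPOMENT_NO FROM T1 WHERE COMPOMENT_NO IS NOT NULL)
    UNION ALL
    -- Recursive step: follow each part to the part that replaced it.
    SELECT CTE.ORIGINAL_PART_NO, T1.COMPOMENT_NO AS PART_NO, CTE.PART_LEVEL + 1 AS PART_LEVEL
    FROM T1
    JOIN CTE ON CTE.PART_NO = T1.PART_NO
)
SELECT ORIGINAL_PART_NO, PART_NO AS LATEST_PART_NO, PART_LEVEL
FROM CTE
-- Keep only the deepest row of each chain.
QUALIFY ROW_NUMBER() OVER (PARTITION BY ORIGINAL_PART_NO ORDER BY PART_LEVEL DESC) = 1;
```

With the sample rows inserted above, this should map the root 9U8806 to 3153441. The IS NOT NULL filter in the anchor guards against NOT IN returning no roots at all when the replacement column contains NULLs.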

Can you set multiple column names as a macro in SQL to query against?

Can you set multiple column names from a SQL table as a macro in SQL to query against?
For example I have multiple columns I am hitting against multiple times, can I use a macro or some type of reference to identify them ONCE to avoid displaying them repetitively and cluttering up the code?
The current code works, I am just looking for a cleaner/streamlined option.
Current Code:
WHERE ('ABC') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25])
AND ('CFS') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25])
etc. (20 more times)
Goal:
WHERE 'ABC' IN (&columnsmentionedabove)
OR 'FGS' in (&columnsmentionedabove)
OR 'g6s' in (&columnsmentionedabove)
etc.....
This is inherited code and just seems very clunky.
Thank you

Numbered columns like this are almost always a sign you should have an additional table. So if your existing table structure is like this:
Table1
Table1ID, OtherFields, Code1, Code2, Code3.... Code25
You really want something more like this:
Table1
Table1ID, OtherFields
Table1Codes
Table1ID, Code
Where each entry in Table1 will have many entries in Table1Codes. Then you write JOIN statements to show the two sets side-by-side when needed.
FROM Table1 t
INNER JOIN Table1Codes tc1 ON tc1.Table1ID = t.Table1ID AND tc1.Code = 'ABC'
INNER JOIN Table1Codes tc2 ON tc2.Table1ID = t.Table1ID AND tc2.Code = 'CFS'
Or
FROM Table1 t
INNER JOIN Table1Codes tc ON tc.Table1ID = t.Table1ID AND tc.Code IN ('ABC','FGS','g6s')

If you can't change the table's schema, as is often the case, you can UNPIVOT it. For example, assuming CODE1...CODE25 come from MyTable, wrap the UNPIVOT operation inside a CTE:
;WITH
cte AS
(
SELECT upvt.*
FROM MyTable
UNPIVOT (
CodeValue FOR CodeLabel IN ([CODE1], [CODE2], ..., [CODE25])
) upvt
)
SELECT *
FROM cte
WHERE CodeValue IN ('ABC', 'DEF', ...)
The unpivot operation is not free. Make sure you filter as much as possible from MyTable before unpivoting it.
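The question's original filter required several codes to be present at once (the AND form). One way to express that on top of the unpivoted rows is to group by the row key and count the distinct matches. The sketch below uses a hypothetical temp table #MyTable with an ID key and only three code columns so that it is self-contained; the real table and its 25 columns would take their place.

```sql
-- Hypothetical demo table: three code columns stand in for CODE1..CODE25.
CREATE TABLE #MyTable (
    ID    int PRIMARY KEY,
    CODE1 varchar(10),
    CODE2 varchar(10),
    CODE3 varchar(10)
);
INSERT INTO #MyTable (ID, CODE1, CODE2, CODE3)
VALUES (1, 'ABC', 'CFS', NULL),
       (2, 'ABC', 'XYZ', 'QRS'),
       (3, 'CFS', 'FGS', 'ABC');

WITH cte AS
(
    -- The column list is written once here; UNPIVOT skips NULL values.
    SELECT upvt.ID, upvt.CodeValue
    FROM #MyTable
    UNPIVOT (
        CodeValue FOR CodeLabel IN ([CODE1], [CODE2], [CODE3])
    ) upvt
)
SELECT ID
FROM cte
WHERE CodeValue IN ('ABC', 'CFS')
GROUP BY ID
HAVING COUNT(DISTINCT CodeValue) = 2;  -- both codes must be present

DROP TABLE #MyTable;
```

In this demo only IDs 1 and 3 carry both codes. For the OR form in the question's goal, the HAVING clause can simply be dropped.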

How does t-sql update work without a join

I think my head is muddy or something. I'm trying to figure out how a t-sql update works without a join when updating one table from another. I've always used joins in the past but came across a stored proc where someone else created one without a join. This update is being used in SQL 2008R2 and it works.
Update table1
SET col1 = (SELECT TOP 1 colX FROM table2 WHERE colZ = colY),
col2 = (SELECT TOP 1 colE FROM table2 WHERE colZ = colY)
Obviously, colY is a field in table1. To get the same results in a select statement (not update), a join is required. I guess I don't understand how an update works behind the scenes but it must be doing some kind of join?

SQL Server translates those subqueries into joins. You can look at this by getting the query plan. You can write an equivalent query with UPDATE ... FROM ... JOIN syntax and observe the query plan to be essentially the same.
The sample code shown is unusual, hard to understand, redundant and inflexible. I recommend against using this style.
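As a rough illustration of a join-style equivalent, here is a sketch using OUTER APPLY with the table and column names from the question. It is not guaranteed to be identical in every case: with a single TOP (1), colX and colE always come from the same table2 row, whereas the two separate subqueries in the original have no ORDER BY and could in principle pick different rows.

```sql
-- Sketch: one lookup per table1 row instead of two correlated subqueries.
UPDATE t1
SET col1 = x.colX,
    col2 = x.colE
FROM table1 AS t1
OUTER APPLY (
    SELECT TOP (1) t2.colX, t2.colE
    FROM table2 AS t2
    WHERE t2.colZ = t1.colY
) AS x;  -- OUTER APPLY keeps unmatched rows, which get NULL as in the original
```

Comparing the actual execution plans of both forms is the quickest way to check how close they are on a given server.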

No, it's doing a subquery, well, two in this case. It would be damn painful if you had another 98 columns.
You can do something similar for select
select *,
(SELECT TOP 1 colX FROM table2 WHERE colZ = colY) as col1
From table1
A left join would simply be more efficient.
Unless the DBMS optimises it, your example runs the subquery (or subqueries) for each row in the table.
Got to say, whoever wrote it is less than competent.

These subqueries are called correlated subqueries. If you were to write the same query as a SELECT rather than an UPDATE, it would look like this:
SELECT col1 = (SELECT TOP 1 table2.colX FROM table2 WHERE table2.colZ = table1.colY),
col2 = (SELECT TOP 1 table2.colE FROM table2 WHERE table2.colZ = table1.colY)
FROM table1
The JOIN is in the fact that you are referencing a column from an outside table on the inside of the subquery. Table1 is referenced in the UPDATE command. You can include a FROM clause but it isn't required for a setup like this.

You can use the same syntax in a SELECT with no join, but you need to alias the table if colY also exists in table2
SELECT (SELECT TOP 1 colX FROM table2 WHERE colZ = T.colY)
, (SELECT TOP 1 colE FROM table2 WHERE colZ = T.colY)
FROM table1 AS T
I only ever use this sort of thing when building up an ad hoc query just for my own information. If it's going to be put into any sort of permanent code I'll convert it to a join, as it's easier to read and more maintainable.

Dynamic inner query

Is there a way to code a dynamic inner query? Basically, I find myself typing something like the following query over and over:
;with tempData as (
--this inner query is the part that changes, but there's always a timeGMT column.
select timeGMT, dataCol2, dataCol3
from tbl1 t1
join tbl2 t2 on t1.ID=t2.ID
)
select dateadd(ss,d.gmtOffset,t.timeGMT) timeLocal,
t.*
from tempData t
join dst d on t.timeGMT between d.sTimeGMT and d.eTimeGMT
where d.zone = 'US-Eastern'
The only thing I can think of is a stored proc with the inner query text as the input for some dynamic sql... However, my understanding of the optimizer (which is, admittedly, limited) says this isn't really a good idea.

From a performance perspective, what you have there is the version on which I would expect the optimizer to do the best job.
If the "outer" part of your example is static and code maintenance overrides performance, I'd look to encapsulating the dateadd result in a table-valued function (TVF). Since the time conversion is very much the common thread in these queries, I would definitely focus on that part of the workload.
For example, your query that can vary would look like this:
select timeGMT, dataCol2, dataCol3, lt.timeLocal
from tbl1 t1
join tbl2 t2 on t1.ID = t2.ID
cross apply dbo.LocalTimeGet(timeGMT, 'US-Eastern') AS lt
Where the TVF dbo.LocalTimeGet contains the logic for dateadd(ss,d.gmtOffset,t.timeGMT) and the lookup of the time zone offset value based on the time zone name. The implementation of that function would look something like:
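CREATE FUNCTION dbo.LocalTimeGet (
@TimeGMT datetime,
@TimeZone varchar(20)
)
RETURNS TABLE
AS
RETURN (
SELECT DATEADD(ss, d.gmtOffset, @TimeGMT) AS timeLocal
FROM dst AS d
WHERE d.zone = @TimeZone
-- pick the offset row whose validity window contains the input time, as in the original join
AND @TimeGMT BETWEEN d.sTimeGMT AND d.eTimeGMT
);
GO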
The upside of this approach is when you upgrade to 2008 or later, there are system functions you could use to make this conversion a lot easier to code and you'll only have to alter the TVF. If your result sets are small, I'd consider a system scalar function (SQL 2008) over a TVF, even if it implements those same system functions. Based on your comment, it sounds like the system functions won't do what you need, but you could still stick with your implementation of a dst table, which is encapsulated in the TVF above.
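Multi-statement TVFs can be a performance problem because the optimizer gives them a fixed row estimate (1 row in older versions) regardless of how many rows they return. An inline TVF like the one above is expanded into the calling query and does not have that limitation.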
If you need to combine encapsulation and performance, then I'd do the time zone calc in the application code instead. Even though you'd have to apply it to each project that uses it, you would only have to implement it 1x in each project (in the Data Access Layer) and treat it as a common utility library if you'll be using across projects.

To answer the OP's follow-on question, a SQL Server 2008 solution would look like this:
First, create permanent definitions:
CREATE TYPE dbo.tempDataType AS TABLE (
timeGMT DATETIME,
dataCol2 int,
dataCol3 int)
GO
CREATE PROCEDURE ComputeDateWithDST
@tempData dbo.tempDataType READONLY
AS
SELECT dateadd(ss,d.gmtOffset,t.timeGMT) timeLocal, t.*
FROM @tempData t
JOIN dst d ON t.timeGMT BETWEEN d.sTimeGMT AND d.eTimeGMT
WHERE d.zone = 'US-Eastern'
GO
Thereafter, whenever you want to plug a subquery (which has now become a separate query, no longer a CTE) into the stored procedure:
DECLARE @tempData dbo.tempDataType
INSERT @tempData
-- sample subquery:
SELECT timeGMT, dataCol2, dataCol3
FROM tbl1 t1
JOIN tbl2 t2 ON t1.ID=t2.ID
EXEC ComputeDateWithDST @tempData;
GO
Performance could be an issue because you'd be running separately what used to be a CTE instead of letting SQL Server combine it with the main query to optimize the execution plan.

Call TVF on every record of a table and concat results

I thought that must be obvious but I can't figure it out.
Say there is a table tblData with a column ID and a table-valued-function (_tvf) that takes an ID as parameter. I need the results for all ID's in tblData.
But:
SELECT * FROM tblData data
INNER JOIN dbo._tvf(data.ID) AS tvfData
ON data.ID = tvfData.ID
gives me an error: The multi-part identifier "data.ID" could not be bound
What is the correct way to pass all ID's to this TVF and concat the results?
Thanks

I think you might need to use CROSS APPLY instead of an inner join here:
SELECT *
FROM dbo.tblData data
CROSS APPLY dbo._tvf(data.ID) AS tvfData
This will call the TVF function for each data.ID of the base table and join the results to the base table's columns.
See resources here:
Using CROSS APPLY in SQL Server
Understanding APPLY clause in SQL Server
Using T-SQL CROSS APPLY and OUTER APPLY
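To make the difference between the two APPLY forms concrete, here is a small self-contained sketch. The objects dbo.DemoData and dbo.DemoTvf are made up for illustration and are not the tblData or _tvf from the question; the function deliberately returns no row for ID 2.

```sql
-- Hypothetical demo objects; names are illustrative only.
CREATE TABLE dbo.DemoData (ID int PRIMARY KEY);
INSERT INTO dbo.DemoData (ID) VALUES (1), (2), (3);
GO
-- Inline TVF that returns one row per ID, except for ID 2 (no rows).
CREATE FUNCTION dbo.DemoTvf (@ID int)
RETURNS TABLE
AS
RETURN (
    SELECT @ID AS ID, @ID * 10 AS Value
    WHERE @ID <> 2
);
GO
-- CROSS APPLY: rows for which the function returns nothing are dropped (IDs 1 and 3 remain).
SELECT d.ID, t.Value
FROM dbo.DemoData AS d
CROSS APPLY dbo.DemoTvf(d.ID) AS t;

-- OUTER APPLY: every row of DemoData is kept; ID 2 appears with a NULL Value.
SELECT d.ID, t.Value
FROM dbo.DemoData AS d
OUTER APPLY dbo.DemoTvf(d.ID) AS t;
GO
-- Clean up the demo objects.
DROP FUNCTION dbo.DemoTvf;
DROP TABLE dbo.DemoData;
```

CROSS APPLY therefore behaves like an inner join against the function's output and OUTER APPLY like a left join, which matters when the TVF can return an empty set for some IDs.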
