Is there an inherent cost to using inline-table-valued functions in SQL Server 2008 that is not incurred if the SQL is inlined directly? Our application makes very heavy use of inline-table-valued functions to reuse common queries, but recently, we've found that queries run much faster if we don't use them.
Consider this:
CREATE FUNCTION dbo.fn_InnerQuery (@asOfDate DATETIME)
RETURNS TABLE
AS
RETURN
(
SELECT ... -- common, complicated query here
)
Now, when I do this:
SELECT TOP 10 Amount FROM dbo.fn_InnerQuery(dbo.Date(2009,1,1)) ORDER BY Amount DESC
The query returns with results in about 15 seconds.
However, when I do this:
SELECT TOP 10 Amount FROM
(
SELECT ... -- inline the common, complicated query here
) inline
ORDER BY Amount DESC
The query returns in less than 1 second.
I'm a little baffled by the overhead of using the table valued function in this case. I did not expect that. We have a ton of table valued functions in our application, so I'm wondering if there is something I'm missing here.
In this case, the UDF should be unnested/expanded like a view and it should be transparent.
Obviously, it's not...
In this case, my guess is that the column is smalldatetime and is cast to datetime because of the UDF parameter, whereas the constant is correctly evaluated (to match the column datatype) when inlined.
datetime has a higher precedence than smalldatetime, so the column would be the side that gets cast.
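If that is the cause, the fix is to declare the parameter with the column's exact type. A minimal sketch, assuming a hypothetical dbo.Ledger table with a smalldatetime PostedOn column (not the poster's actual schema):
CREATE FUNCTION dbo.fn_InnerQuery (@asOfDate SMALLDATETIME) -- parameter type matches the column
RETURNS TABLE
AS
RETURN
(
    SELECT l.Amount
    FROM dbo.Ledger AS l
    WHERE l.PostedOn <= @asOfDate -- no implicit cast on the column side, so an index on PostedOn stays seekable
)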
What do the query plans say? The UDF would show a scan, the inline a seek most likely (not 100%, just based on what I've seen before)
Edit: Blog post by Adam Machanic
One thing that can slow functions down is omitting dbo. from table references inside the function. That causes SQL Server to do a security check for every call, which can be expensive.
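For example (dbo.Orders here is a hypothetical table; the point is only the schema prefix):
SELECT o.OrderID FROM Orders AS o     -- unqualified: name resolved (and security-checked) per call
SELECT o.OrderID FROM dbo.Orders AS o -- schema-qualified: resolved once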
Try running the table-valued function independently to see how fast or slow it executes.
Also, I am not sure how to clear whatever execution caching SQL Server might retain from running the UDF. I mean, if you run the UDF first, SQL Server could cache its plan (or still have the data in memory), so if you then run the complicated query separately, it could be benefiting from that cache.
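For what it's worth, on a dev/test box the standard way to clear those caches between timing runs is:
DBCC FREEPROCCACHE      -- discard all cached execution plans
DBCC DROPCLEANBUFFERS   -- discard cached data pages, so timings start from a cold cache
(Don't run these on a production server.)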
In your second example the table-valued function has to return the entire data set before the query can apply the filter. Hopping across the TVF boundary is not something that the optimiser can always do.
In the third example the query optimiser can work out that the user only wants the top few 'amounts'. If this isn't an aggregate value the optimiser can push that processing right to the start of the query and not bother with any other data. If it is an aggregate amount then the slowdown is for a different reason.
If you compare the query plans of the two queries you should see that they are different.
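One quick way to compare the two forms, beyond the graphical plans, is to look at the I/O and CPU statistics for each:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT TOP 10 Amount FROM dbo.fn_InnerQuery(dbo.Date(2009,1,1)) ORDER BY Amount DESC
-- then run the inlined version and compare the logical reads and CPU time in the Messages tab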
Related
What is the difference between scalar-valued, table-valued, and aggregate functions in SQL server? And does calling them from a query need a different method, or do we call them in the same way?
Scalar Functions
Scalar functions (sometimes referred to as User-Defined Functions / UDFs) return a single value as a return value, not as a result set, and can be used in most places within a query or SET statement, except for the FROM clause (and maybe other places?). Also, scalar functions can be called via EXEC, just like Stored Procedures, though there are not many occasions to make use of this ability (for more details on this ability, please see my answer to the following question on DBA.StackExchange: Why scalar valued functions need execute permission rather than select?). These can be created in both T-SQL and SQLCLR.
T-SQL (UDF):
Prior to SQL Server 2019: these scalar functions are typically a performance issue because they generally run for every row returned (or scanned) and always prohibit parallel execution plans.
Starting in SQL Server 2019: certain T-SQL scalar UDFs can be inlined, that is, have their definitions placed directly into the query such that the query does not call the UDF (similar to how iTVFs work (see below)). There are restrictions that can prevent a UDF from being inlineable (if that wasn't a word before, it is now), and UDFs that can be inlined will not always be inlined due to several factors. This feature can be disabled at the database, query, and individual UDF levels. For more information on this really cool new feature, please see: Scalar UDF Inlining (be sure to review the "requirements" section). A quick way to check and control inlining is sketched after the SQLCLR notes below.
SQLCLR (UDF): these scalar functions also typically run per each row returned or scanned, but there are two important benefits over T-SQL UDFs:
Starting in SQL Server 2012, return values can be constant-folded into the execution plan IF the UDF does not do any data access, and if it is marked IsDeterministic = true. In this case the function wouldn't run per each row.
SQLCLR scalar functions can work in parallel plans ( 😃 ) if they do not do any database access.
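Returning to T-SQL scalar UDF inlining: a quick sketch of how to check and control it on SQL Server 2019+ (the UDF and table names below are placeholders, not objects from this answer):
-- Which scalar UDFs qualify for inlining?
SELECT OBJECT_NAME(object_id) AS udf_name, is_inlineable
FROM sys.sql_modules
WHERE is_inlineable = 1

-- Disable inlining for a single query if the inlined plan regresses:
SELECT dbo.SomeScalarUdf(o.Amount)
FROM dbo.Orders AS o
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'))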
Table-Valued Functions
Table-Valued Functions (TVFs) return result sets, and can be used in a FROM clause, JOIN, or CROSS APPLY / OUTER APPLY of any query, but unlike simple Views, cannot be the target of any DML statements (INSERT / UPDATE / DELETE). These can also be created in both T-SQL and SQLCLR.
T-SQL MultiStatement (TVF): these TVFs, as their name implies, can have multiple statements, similar to a Stored Procedure. Whatever results they are going to return are stored in a Table Variable and returned at the very end; meaning, nothing is returned until the function is done processing. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) depends on the version of SQL Server:
Prior to SQL Server 2014: these always report 1 (yes, just 1) row.
SQL Server 2014 and 2016: these always report 100 rows.
Starting in SQL Server 2017: default is to report 100 rows, BUT under some conditions the row count will be fairly accurate (based on current statistics) thanks to the new Interleaved Execution feature.
T-SQL Inline (iTVF): these TVFs can only ever be a single statement, and that statement is a full query, just like a View. And in fact, Inline TVFs are essentially a View that accepts input parameters for use in the query. They also do not cache their own query plan as their definition is placed into the query in which they are used (unlike the other objects described here), hence they can be optimized much better than the other types of TVFs ( 😃 ). These TVFs perform quite well and are preferred if the logic can be handled in a single query (a minimal example follows this list).
SQLCLR (TVF): these TVFs are similar to T-SQL MultiStatement TVFs in that they build up the entire result set in memory (even if it is swap / page file) before releasing all of it at the very end. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) is always 1000 rows. Given that a fixed row count is far from ideal, please support my request to allow for specifying the row count: Allow TVFs (T-SQL and SQLCLR) to provide user-defined row estimates to query optimizer
SQLCLR Streaming (sTVF): these TVFs allow for complex C# / VB.NET code just like regular SQLCLR TVFs, but are special in that they return each row to the calling query as they are generated ( 😃 ). This model allows the calling query to start processing the results as soon as the first one is sent so the query doesn't need to wait for the entire process of the function to complete before it sees any results. And it requires less memory since the results aren't being stored in memory until the process completes. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) is always 1000 rows. Given that a fixed row count is far from ideal, please support my request to allow for specifying the row count: Allow TVFs (T-SQL and SQLCLR) to provide user-defined row estimates to query optimizer
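To make the iTVF shape concrete, here is a minimal sketch (the table and column names are made up):
-- Single statement, no BEGIN/END: the optimizer expands this into the calling
-- query just like a parameterized view.
CREATE FUNCTION dbo.GetOrdersSince (@cutoff DATETIME)
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderID, o.CustomerID, o.Amount
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= @cutoff
)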
Aggregate Functions
User-Defined Aggregates (UDA) are aggregates similar to SUM(), COUNT(), MIN(), MAX(), etc. and typically require a GROUP BY clause. These can only be created in SQLCLR, and that ability was introduced in SQL Server 2005. Also, starting in SQL Server 2008, UDAs were enhanced to allow for multiple input parameters ( 😃 ). One particular deficiency is that there is no knowledge of row ordering within the group, so creating a running total, which would be relatively easy if ordering could be guaranteed, is not possible within a SAFE Assembly.
Please also see:
CREATE FUNCTION (MSDN documentation)
CREATE AGGREGATE (MSDN documentation)
CLR Table-Valued Function Example with Full Streaming (STVF / TVF) (article I wrote)
A scalar function returns a single value. It might not even be related to tables in your database.
A table-valued function returns your specified columns for rows in your table meeting your selection criteria.
An aggregate-valued function returns a calculation across the rows of a table -- for example summing values.
Scalar function
Returns a single value. It is just like writing functions in other programming languages using T-SQL syntax.
Table Valued function
Is a little different from the above: it returns a table. Inside the body of this function you write a query that returns exactly that table.
For example:
CREATE FUNCTION <function name> (@parameter datatype)
RETURNS TABLE
AS
RETURN
(
    -- write your query here
)
Note that there are no BEGIN and END statements here.
Aggregate Functions
Includes built-in functions that are used alongside the GROUP BY clause. For example: SUM(), MAX(), MIN(), AVG(), and COUNT() are aggregate functions.
Aggregate and scalar functions both return a single value, but scalar functions operate on a single input value argument while aggregate functions operate on a single input set of values (a collection or column name). Examples of scalar functions are the string functions, ISNULL, and ISNUMERIC; examples of aggregate functions are AVG, MAX, and the others you can find in the Aggregate Functions section of the Microsoft website.
Table-valued functions return a table regardless of the existence of any input argument. These functions are executed by using them like a regular physical table, e.g.: SELECT * FROM fnGetMulEmployee()
This following link is very useful to understand the difference: https://www.dotnettricks.com/learn/sqlserver/different-types-of-sql-server-functions
I am aware that derived tables and Common Table Expressions (CTEs) do not persist. They live in memory until the end of the outer query, and every call is a repeated execution.
Do functions such as inline table-valued functions persist, meaning they are only calculated once ? Can we index an inline table-valued function?
An inline function is basically the same thing as a view or a CTE, except that it has parameters. If you look at the query plan, you'll see that the logic from the function is included in the query using it - so no, you can't index it, and SQL Server doesn't cache its results as such, but of course the pages will be in the buffer pool for future use.
I wouldn't say that each call to CTE is a repeated execution either, since SQL server can freely decide how to run the query, as long as the results are correct.
For a multi-statement UDF, each of the calls (at least in versions up to 2014) is a separate execution, as far as I know, every time, and the results are not cached in the sense I assume you mean.
Do functions such as inline table-valued functions persist
No. If you look at the syntax of a table-valued function, it essentially returns the result of a SELECT statement, so it doesn't store the fetched data anywhere (same as a view). So, no, there is no question of creating an index on it, since the data doesn't get stored.
Unless you store that fetched data in another table, like below; then you can create an index and other structures on that TestTable:
SELECT *
INTO TestTable
FROM dbo.yourInlineTableValuedFunction(@parameter)
Can we index an inline table-valued function?
No, but if you materialize the rows into a temp table, you can certainly index them and speed things up. The overhead of creating a temp table will more than pay off in improved indexed access, caching, and, depending on your use case, repeated use of the same temp table in a multi-user scenario.
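A sketch of that approach (the function, parameter, and key column are placeholders):
DECLARE @parameter INT = 42   -- placeholder parameter value

SELECT f.*
INTO #Results
FROM dbo.yourInlineTableValuedFunction(@parameter) AS f

CREATE CLUSTERED INDEX IX_Results ON #Results (Id)   -- Id assumed to be a useful key column

SELECT * FROM #Results WHERE Id = 42   -- later reads hit the index instead of re-running the query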
I am reiterating the question asked by Mongus Pong, Why would using a temp table be faster than a nested query?, which doesn't have an answer that works for me.
Most of us at some point find that when a nested query reaches a certain complexity it needs to be broken into temp tables to keep it performant. It is absurd that this could ever be the most practical way forward, and it means these processes can no longer be made into a view. And often 3rd-party BI apps will only play nicely with views, so this is crucial.
I am convinced there must be a simple queryplan setting to make the engine just spool each subquery in turn, working from the inside out. No second guessing how it can make the subquery more selective (which it sometimes does very successfully) and no possibility of correlated subqueries. Just the stack of data the programmer intended to be returned by the self-contained code between the brackets.
It is common for me to find that simply changing from a subquery to a #table takes the time from 120 seconds to 5. Essentially the optimiser is making a major mistake somewhere. Sure, there may be very time consuming ways I could coax the optimiser to look at tables in the right order but even this offers no guarantees. I'm not asking for the ideal 2 second execute time here, just the speed that temp tabling offers me within the flexibility of a view.
I've never posted on here before but I have been writing SQL for years and have read the comments of other experienced people who've also just come to accept this problem and now I would just like the appropriate genius to step forward and say the special hint is X...
There are a few possible explanations as to why you see this behavior. Some common ones are
The subquery or CTE may be being repeatedly re-evaluated.
Materialising partial results into a #temp table may force a more optimum join order for that part of the plan by removing some possible options from the equation.
Materialising partial results into a #temp table may improve the rest of the plan by correcting poor cardinality estimates.
The most reliable method is simply to use a #temp table and materialize it yourself.
Failing that, regarding point 1, see Provide a hint to force intermediate materialization of CTEs or derived tables. The use of TOP (large_number) ... ORDER BY can often encourage the result to be spooled rather than repeatedly re-evaluated.
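A sketch of that trick on a made-up derived table (the large constant is just the int maximum):
SELECT d.CustomerID, d.Total
FROM (
    SELECT TOP (2147483647) CustomerID, SUM(Amount) AS Total
    FROM dbo.Orders
    GROUP BY CustomerID
    ORDER BY CustomerID   -- TOP ... ORDER BY may encourage spooling the result once
) AS d
JOIN dbo.Customers AS c ON c.CustomerID = d.CustomerID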
Even if that works, however, there are no statistics on the spool.
For points 2 and 3 you would need to analyse why you weren't getting the desired plan. Possibly rewriting the query to use sargable predicates, or updating statistics might get a better plan. Failing that you could try using query hints to get the desired plan.
I do not believe there is a query hint that instructs the engine to spool each subquery in turn.
There is the OPTION (FORCE ORDER) query hint, which forces the engine to perform the JOINs in the order specified and could potentially coax it into achieving that result in some instances. This hint will sometimes produce a more efficient plan for a complex query when the engine keeps insisting on a sub-optimal plan. Of course, the optimizer should usually be trusted to determine the best plan.
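For example (tables are placeholders):
SELECT s.OrderID
FROM dbo.SmallTable AS s          -- with FORCE ORDER, joined first, exactly as written
JOIN dbo.LargeTable AS l ON l.OrderID = s.OrderID
OPTION (FORCE ORDER)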
Ideally there would be a query hint that would allow you to designate a CTE or subquery as "materialized" or "anonymous temp table", but there is not.
Another option (for future readers of this article) is to use a user-defined function. Multi-statement functions (as described in How to Share Data between Stored Procedures) appear to force SQL Server to materialize the results of your subquery. In addition, they allow you to specify primary keys and indexes on the resulting table to help the query optimizer. This function can then be used in a select statement as part of your view. For example:
CREATE FUNCTION SalesByStore (@storeid varchar(30))
RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
                  qty smallint NOT NULL) AS
BEGIN
    INSERT @t (title, qty)
    SELECT t.title, s.qty
    FROM sales s
    JOIN titles t ON t.title_id = s.title_id
    WHERE s.stor_id = @storeid
    RETURN
END
CREATE VIEW SalesData As
SELECT * FROM SalesByStore('6380')
Having run into this problem, I found out that (in my case) SQL Server was evaluating the conditions in incorrect order, because I had an index that could be used (IDX_CreatedOn on TableFoo).
SELECT bar.*
FROM
(SELECT * FROM TableFoo WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
I managed to work around it by forcing the subquery to use another index (i.e. one that would be used when the subquery was executed without the parent query). In my case I switched to PK, which was meaningless for the query, but allowed the conditions from the subquery to be evaluated first.
SELECT bar.*
FROM
(SELECT * FROM TableFoo WITH (INDEX([PK_Id])) WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
Filtering by the Deleted column was really simple and filtering the few results by CreatedOn afterwards was even easier. I was able to figure it out by comparing the Actual Execution Plan of the subquery and the parent query.
A more hacky solution (and not really recommended) is to force the subquery to get executed first by limiting the results using TOP; however, this could lead to weird problems in the future if the results of the subquery ever exceed the limit (you could always set the limit to something ridiculous). Unfortunately TOP 100 PERCENT can't be used for this purpose, since SQL Server just ignores it.
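Applied to the query above, the hack would look like this (the limit is arbitrary; it just has to stay above the subquery's real row count):
SELECT bar.*
FROM
(SELECT TOP (1000000) * FROM TableFoo WHERE Deleted = 1) foo   -- TOP 100 PERCENT would be ignored
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())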
I have a relatively complex query, with several self joins, which works on a rather large table.
For that query to perform faster, I thus need to only work with a subset of the data.
Said subset of data can range between 12 000 and 120 000 rows depending on the parameters passed.
More details can be found here: SQL Server CTE referred in self joins slow
As you can see, I was using a CTE to return the data subset before, which caused some performance problems as SQL Server was re-running the Select statement in the CTE for every join instead of simply being run once and reusing its data set.
The alternative, using temporary tables worked much faster (while testing the query in a separate window outside the UDF body).
However, when I tried to implement this in a multi-statement UDF, I was harshly reminded by SQL Server that multi-statement UDFs do not support temporary tables for some reason...
UDFs do allow table variables, however, so I tried that, but the performance is absolutely horrible as it takes 1m40s for my query to complete, whereas the CTE version only took 40 seconds.
I believe the table variables are slow for the reasons listed in this thread: Table variable poor performance on insert in SQL Server Stored Procedure
The temporary table version takes around 1 second, but I can't make it into a function due to the SQL Server restrictions, and I have to return a table back to the caller.
Considering that CTEs and table variables are both too slow, and that temporary tables are rejected in UDFs, what are my options for making my UDF perform quickly?
Thanks a lot in advance.
In many such cases, all we need to do is declare primary keys for those table variables, and it is fast again.
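For example:
DECLARE @t TABLE
(
    Id     INT   NOT NULL PRIMARY KEY,  -- the declared key gives the optimizer a unique, ordered structure
    Amount MONEY NOT NULL
)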
Set up and use a Process-Keyed Table; see the article How to Share Data Between Stored Procedures by Erland Sommarskog.
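The core of the pattern, roughly sketched (the table and columns here are illustrative, not from the article verbatim): a permanent table keyed by the caller's spid, so each session only touches its own rows.
CREATE TABLE dbo.SalesByStoreResults
(
    ProcessKey SMALLINT    NOT NULL DEFAULT (@@SPID),
    Title      VARCHAR(80) NOT NULL,
    Qty        SMALLINT    NOT NULL,
    PRIMARY KEY (ProcessKey, Title)
)

-- Typical caller sequence:
DELETE dbo.SalesByStoreResults WHERE ProcessKey = @@SPID              -- clear leftovers
INSERT dbo.SalesByStoreResults (Title, Qty) VALUES ('example', 1)     -- fill (normally INSERT ... SELECT)
SELECT Title, Qty FROM dbo.SalesByStoreResults WHERE ProcessKey = @@SPID
DELETE dbo.SalesByStoreResults WHERE ProcessKey = @@SPID              -- clean up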
One kludgey work-around I've used involves code like so (pseudocode follows):
-- Stored procedure definition (created first; it references a temp table
-- that must exist in the caller's session at execution time)
CREATE PROCEDURE MyStoredProcedure
AS
    INSERT #foo VALUES (whatever)
    RETURN
GO

-- Caller creates the temp table, lets the procedure fill it, then reads it
CREATE TABLE #foo (whatever int)   -- columns must match the INSERT above
EXECUTE MyStoredProcedure
SELECT *
FROM #foo
GO
In short, the stored procedure references and uses a temp table created by the calling procedure (or routine). This will work, but it can be confusing for others to follow what's going on if you don't document it clearly, and you will get recompiles, statistics recalcs, and other oddness that may consume unwanted clock cycles.
I'm having trouble understanding the behavior of the estimated query plans for my statement in SQL Server when changing from a parameterized query to a non-parameterized query.
I have the following query:
DECLARE @p0 UniqueIdentifier = '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'
SELECT [t5].[value2] AS [Date], [t5].[value] AS [New]
FROM (
SELECT COUNT(*) AS [value], [t4].[value] AS [value2]
FROM (
SELECT CONVERT(DATE, [t3].[ServerTime]) AS [value]
FROM (
SELECT [t0].[CookieID]
FROM [dbo].[Usage] AS [t0]
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
GROUP BY [t0].[CookieID]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[ServerTime]
FROM [dbo].[Usage] AS [t2]
WHERE ((([t1].[CookieID] IS NULL) AND ([t2].[CookieID] IS NULL))
OR (([t1].[CookieID] IS NOT NULL) AND ([t2].[CookieID] IS NOT NULL)
AND ([t1].[CookieID] = [t2].[CookieID])))
AND ([t2].[CookieID] IS NOT NULL)
AND ([t2].[ProductID] = @p0)
ORDER BY [t2].[ServerTime]
) AS [t3]
) AS [t4]
GROUP BY [t4].[value]
) AS [t5]
ORDER BY [t5].[value2]
This query is generated by a Linq2SQL expression and extracted from LINQPad. It produces a nice query plan (as far as I can tell) and executes in about 10 seconds on the database. However, if I replace the two uses of the parameter with the exact value, that is, replace the two '= @p0' parts with '= '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'', I get a different estimated query plan and the query now runs much longer (more than 60 seconds; I haven't seen it complete).
Why is it that performing the seemingly innocent replacement produces a much less efficient query plan and execution? I have cleared the procedure cache with 'DBCC FreeProcCache' to ensure that I was not caching a bad plan, but the behavior remains.
My real problem is that I can live with the 10 second execution time (at least for a good while) but I can't live with the 60+ second execution time. My query will (as hinted above) be produced by Linq2SQL, so it is executed on the database as
exec sp_executesql N'
...
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
...
AND ([t2].[ProductID] = @p0)
...
',N'@p0 uniqueidentifier',@p0='1FC66E37-6EAF-4032-B374-E7B60FBD25EA'
which produces the same poor execution time (which I think is doubly strange, since this seems to be using parameterized queries).
I'm not looking for advice on which indexes to create or the like; I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.
EDIT: I have uploaded execution plans for the non-parameterized and the parameterized query as well as an execution plan for a parameterized query (as suggested by Heinz) with a different GUID here
Hope it helps you help me :)
If you provide an explicit value, SQL Server can use the statistics on this field to make a "better" query plan decision. Unfortunately (as I've experienced myself recently), if the information contained in the statistics is misleading, sometimes SQL Server just makes the wrong choices.
If you want to dig deeper into this issue, I recommend checking what happens if you use other GUIDs: if a different query plan is used for different concrete GUIDs, that's an indication that the statistics data is being used. In that case, you might want to look at sp_updatestats and related commands.
EDIT: Have a look at DBCC SHOW_STATISTICS: The "slow" and the "fast" GUID are probably in different buckets in the histogram. I've had a similar problem, which I solved by adding an INDEX table hint to the SQL, which "guides" SQL Server towards finding the "right" query plan. Basically, I've looked at what indices are used during a "fast" query and hard-coded those into the SQL. This is far from an optimal or elegant solution, but I haven't found a better one yet...
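Concretely, the histogram can be inspected like this (the table and index names are taken from the answer below and may not match your schema exactly):
DBCC SHOW_STATISTICS ('dbo.Usage', IX_NonCluster_ProductID_CookieID_With_ServerTime)

-- and refreshed if it looks stale:
UPDATE STATISTICS dbo.Usage WITH FULLSCAN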
I'm not looking for advice on which indexes to create or the like; I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.
You seem to have two indexes:
IX_NonCluster_Config (ProductID, ServerTime)
IX_NonCluster_ProductID_CookieID_With_ServerTime (ProductID, CookieID) INCLUDE (ServerTime)
The first index does not cover CookieID but is ordered on ServerTime, and hence is more efficient for the less selective ProductIDs (i.e. those for which you have many rows).
The second index does cover all columns but is not ordered, and hence is more efficient for the more selective ProductIDs (those for which you have few rows).
On average, your ProductID cardinality is such that SQL Server expects the second method to be efficient, which is what it uses when you use parameterized queries or explicitly provide selective GUIDs.
However, your original GUID is considered less selective, which is why the first method is used.
Unfortunately, the first method requires additional filtering on CookieID, which is why it is in fact less efficient.
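If you want to steer the optimizer here, one hedged sketch (this is a simplified shape of the inner aggregate, not the original query, which wraps this logic in an OUTER APPLY):
DECLARE @p0 UNIQUEIDENTIFIER = '1FC66E37-6EAF-4032-B374-E7B60FBD25EA'

SELECT t2.CookieID, MIN(t2.ServerTime) AS FirstSeen
FROM dbo.Usage AS t2
     WITH (INDEX (IX_NonCluster_ProductID_CookieID_With_ServerTime))  -- pin the covering index
WHERE t2.ProductID = @p0
  AND t2.CookieID IS NOT NULL
GROUP BY t2.CookieID
OPTION (OPTIMIZE FOR (@p0 UNKNOWN))   -- or: plan for average density rather than the sniffed value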
My guess is that when you take the non-parameterized route, your GUID has to be converted from a varchar to a uniqueidentifier, which may cause an index not to be used, while it will be used when taking the parameterized route.
I've seen this happen with using queries that have a smalldatetime in the where clause against a column that uses a datetime.
It's difficult to tell without looking at the execution plans, but if I were to guess at a reason, I'd say that it's a combination of parameter sniffing and poor statistics. In the case where you hard-code the GUID into the query, the query optimiser attempts to optimise the query for that value of the parameter. I believe that the same thing happens with the parameterised / prepared query (this is called parameter sniffing - the execution plan is optimised for the parameters used the first time that the prepared statement is executed); however, this definitely doesn't happen when you declare the parameter and use it in the query.
Like I said, SQL Server attempts to optimise the execution plan for that value, so usually you should see better results. It seems here that the information it is basing its decisions on is incorrect or misleading, and you are better off (for some reason) when it optimises the query for a generic parameter value.
This is mostly guesswork, however - it's impossible to tell really without the execution plans - but if you can upload the execution plans somewhere, then I'm sure someone will be able to help you with the real reason.