Here is a simpler version of one of the SELECT statements from my procedure:
select
...
from
...
where
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
@SearchTextList is just a varchar variable that holds a comma-separated list of strings. #SearchTextListTable is a single-column temp table that holds the search text values.
This query takes 30 seconds to complete, which is a performance issue in my application.
If I get rid of the first condition (i.e. if I remove the OR condition), it takes just ONE second.
select
...
from
...
where
SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)
Can somebody please explain why there is this much difference?
What's going on internally in SQL Server engine?
Thanks.
Since you said the SQL is fast when you don't have the OR specified, I assume the table has an index on SomeColumn and the number of rows in #SearchTextListTable is small. When that is the case, SQL Server can decide to use the index to search for the rows.
If you specify the OR clause and the query looks like this:
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
SQL Server can't create a plan that uses the index, because plans are cached and must also be usable when @SearchTextList is NULL.
There are usually two ways to improve this: either use dynamic SQL or recompile the plan for each execution.
To get the plan recompiled, just add OPTION (RECOMPILE) to the end of the query. Unless this query is executed really often, that should be an OK solution. The downside is slightly higher CPU usage, because the plans can't be re-used.
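For example, a minimal sketch of the original query with the hint added (dbo.SomeTable is a hypothetical name standing in for the elided FROM clause):
select SomeColumn
from dbo.SomeTable
where ((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
option (recompile) -- the plan is rebuilt per execution, so the index on SomeColumn can be used whenever @SearchTextList is not NULL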
The other option is to create dynamic SQL and execute it with sp_executesql. Since at that point you know whether @SearchTextList is NULL, you can simply omit the SomeColumn IN ... predicate when it's not needed. Be aware of SQL injection in this case: don't just concatenate the variable values into the SQL string, but use variables in the SQL and pass those as parameters to sp_executesql.
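A minimal sketch of that approach, assuming the same hypothetical dbo.SomeTable; note that nothing user-supplied is concatenated here, because the filter values live in the temp table, which is visible to sp_executesql from the same session:
DECLARE @sql nvarchar(max)
SET @sql = N'select SomeColumn from dbo.SomeTable'
IF @SearchTextList IS NOT NULL
    SET @sql = @sql + N' where SomeColumn IN (SELECT SomeRelatedColumn FROM #SearchTextListTable)'
EXEC sp_executesql @sql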
If you only have this one optional condition in the SQL, you could also create two separate procedures for the two cases and execute them from the original procedure depending on which one applies.
Related
Does a cfquery become a prepared statement as long as there's one cfqueryparam? Or are there other conditions?
What happens when the ORDER BY clause or FROM clause is dynamic? Would every unique combination become a prepared statement?
And what happens when we do a cfloop with INSERT, with every value cfqueryparam'ed, and invoke the cfquery with a different number of iterations?
Any potential problems with too many prepared statements?
How does the DB handle prepared statements? Will they be converted into something similar to stored procedures?
Under what circumstances should we NOT use prepared statements?
Thank you!
I can answer some parts of your question:
A query will become a preparedStatement as long as there is at least one <cfqueryparam>. I have in the past added a where 1 = <cfqueryparam value="1"> to queries which didn't have any dynamic parameters, in order to get them run as preparedStatements.
Most DBs handle prepared statements similarly to stored procedures, just held temporarily rather than long-term; however, the details are likely to be DB-specific.
Assuming you are using the drivers supplied with ColdFusion, if you turn on the 'Log Activity' checkbox in the advanced panel of the DataSource setup, you'll get very detailed information about how CF is interacting with the DB, when it is creating a new preparedStatement and when it is re-using them. I'd recommend trying this out for yourself, as so many factors are involved (DB setup, driver, CF version etc). If you do use the DB logging, restart CF before running your test code so you can see it creating the prepared statements; otherwise you'll just see it re-using statements by ID, without seeing what those statements are.
In addition, if you are asking about execution plans, then there is more involved than just the number of PreparedStatements generated. It is a huge topic and very database dependent. I do not have a DBA's grasp on it, but I can answer a few of the questions about MS SQL.
What happens when the ORDER BY clause or FROM clause is dynamic? Would every unique combination become a prepared statement?
The base SQL is different, so you will end up with a separate execution plan for each unique ORDER BY clause.
And what happens when we do a cfloop with INSERT, with every value cfqueryparam'ed, and invoke the cfquery with a different number of iterations?
MS SQL should reuse the same plan for all iterations because only the parameters change.
The sys.dm_exec_cached_plans view is very useful for seeing what plans are cached and how often they are reused.
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text( p.plan_handle) t
ORDER BY p.usecounts DESC
To clear the cache first, use DBCC FLUSHPROCINDB. Obviously do not use it on a production server.
DECLARE @ID int
SET @ID = DB_ID(N'YourTestDatabaseName')
DBCC FLUSHPROCINDB( @ID )
In SQL Server, what is the best way to allow multiple execution plans to exist for a query in an SP, without having to recompile every time?
For example, I have a case where the query plan varies significantly depending on how many rows are in a temp table that the query uses. Since there was no "one size fits all" plan that was satisfactory, and since it was unacceptable to recompile every time, I ended up copy/pasting (ick) the main query in the SP multiple times within several IF statements, forcing the SQL engine to give each case its own optimal plan. It actually seemed to work beautifully performance-wise, but it feels a bit clunky. (I know I could similarly break this part out into multiple SPs to do the same thing.) Is there a better way to do this?
IF @RowCount < 1
[paste query here]
ELSE IF @RowCount < 50
[paste query here]
ELSE IF @RowCount < 200
[paste query here]
ELSE
[paste query here]
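For instance, the multiple-SPs variant mentioned above would look roughly like this (the sub-procedure names are hypothetical):
IF @RowCount < 50
    EXEC dbo.MyQuery_SmallSet @RowCount -- hypothetical sub-procedure holding the query, compiled for the small-row case
ELSE
    EXEC dbo.MyQuery_LargeSet @RowCount -- hypothetical sub-procedure holding the same query, compiled for the large-row case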
You can use OPTIMIZE FOR in certain situations to create a plan targeted at a certain value of a parameter (but not multiple plans per se). It lets you specify what parameter value you want SQL Server to use when creating the execution plan. This is a SQL Server 2005 onwards hint.
Optimize Parameter Driven Queries with the OPTIMIZE FOR Hint in SQL Server
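As a rough sketch of the hint (procedure, table, column, and parameter names are all hypothetical):
CREATE PROCEDURE dbo.GetAmounts (@CustomerId int)
AS
SELECT c.Amount
FROM dbo.SomeTable AS c
WHERE c.CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId = 42)) -- the plan is built as if @CustomerId were 42, regardless of the value actually passed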
There is also OPTIMIZE FOR UNKNOWN – a SQL Server 2008 onwards feature (use judiciously):
This hint directs the query optimizer to use the standard algorithms it has always used if no parameter values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the query plan should be, instead of looking at the specific parameter values that were passed to the query by the application.
Perhaps also look into the optimize for ad hoc workloads option.
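A minimal sketch of both, reusing the same hypothetical query as above and then the server-wide setting:
-- per query: build the plan from average density statistics rather than a sniffed parameter value
SELECT c.Amount
FROM dbo.SomeTable AS c
WHERE c.CustomerId = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN)
-- server-wide: cache only a plan stub the first time an ad hoc batch is seen ('optimize for ad hoc workloads' is an advanced option)
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'optimize for ad hoc workloads', 1
RECONFIGURE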
SQL Server 2005+ has statement-level recompilation and is better at dealing with this kind of branching. You still have one plan, but it can be partially recompiled at the statement level.
But it is ugly.
I'd go with @Mitch Wheat's option personally, because you get recompilations anyway when the stored procedure uses a temp table. See Temp table and stored proc compilation.
What's the efficient way to check for a NULL or a value for a column in a SQL query? Consider a SQL table table with an integer column column which has an index. @value can be some integer or NULL, e.g. 16 or NULL.
Query 1: I'm not sure, but it seems one should not rely on short-circuiting in SQL. However, the query below always works correctly whether @value is some integer or NULL.
select * from
table
where (@value is null or column = @value)
The below query is an expanded version of the above query. It works correctly too.
select * from
table
where ((@value is null)
or (@value is not null and column = @value))
Would the above two queries take advantage of the index?
Query 2: The query below compares the column with a non-NULL @value; otherwise it compares the column column with itself, which is always true and returns everything. It works correctly too. Would this query take advantage of the index?
select * from
table
where (column = isnull(@value, column))
What's the best way?
Note: If the answer varies with databases, I'm interested in MS-SQL.
Variations on this question have come up several times in the past couple of days (why do these things always happen in groups?). The short answer is that yes, SQL Server will short-circuit the logic IF it creates the query plan with known values. So, if you have that code in a script where the variables are set then I believe it should short-circuit the logic (test to be sure). However, if it's in a stored procedure then SQL Server will create a query plan ahead of time and it won't know whether or not it can short-circuit the query, because it doesn't know the parameter values at the time of generating the query plan.
Regardless of whether it is short-circuited or not, SQL Server should be able to use the index if that's the only part of your query. If the variable is NULL though, then you probably don't want SQL Server using the index because it will be useless.
If you're in a stored procedure, then your best bet is to use OPTION (RECOMPILE) on your query. This will cause SQL Server to create a new query plan each time. This is a little bit of overhead, but the gains typically outweigh it by a lot. This is ONLY good for SQL 2008, and even then only for some of the later service packs; there was a bug with RECOMPILE before that, rendering it useless. For more information check out Erland Sommarskog's great article on the subject. Specifically you'll want to look under the Static SQL sections.
To clarify a point: SQL doesn't really have short-circuiting as we know it in C-based languages. What looks like a short circuit is really that, in SQL Server's three-valued logic, TRUE OR NULL evaluates to TRUE:
TRUE OR NULL ===> TRUE
TRUE AND NULL ===> NULL
Eg:
if 1=null or 1=1 print 'true' else print 'false'
if 1=null and 1=1 print 'true' else print 'false'
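Applied to the pattern from the question (table and column names are hypothetical), that is why every row comes back when the variable is NULL:
DECLARE @value int -- left NULL
SELECT *
FROM dbo.SomeTable
WHERE (@value IS NULL OR SomeColumn = @value)
-- for every row, @value IS NULL is TRUE, so TRUE OR UNKNOWN = TRUE and the row qualifies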
Is there an inherent cost to using inline-table-valued functions in SQL Server 2008 that is not incurred if the SQL is inlined directly? Our application makes very heavy use of inline-table-valued functions to reuse common queries, but recently, we've found that queries run much faster if we don't use them.
Consider this:
CREATE FUNCTION dbo.fn_InnerQuery (@asOfDate DATETIME)
RETURNS TABLE
AS
RETURN
(
SELECT ... -- common, complicated query here
)
Now, when I do this:
SELECT TOP 10 Amount FROM dbo.fn_InnerQuery(dbo.Date(2009,1,1)) ORDER BY Amount DESC
The query returns with results in about 15 seconds.
However, when I do this:
SELECT TOP 10 Amount FROM
(
SELECT ... -- inline the common, complicated query here
) inline
ORDER BY Amount DESC
The query returns in less than 1 second.
I'm a little baffled by the overhead of using the table valued function in this case. I did not expect that. We have a ton of table valued functions in our application, so I'm wondering if there is something I'm missing here.
In this case, the UDF should be unnested/expanded like a view and it should be transparent.
Obviously, it's not...
In this case, my guess is that the column is smalldatetime and is cast to datetime because of the UDF parameter, but the constant is correctly evaluated (to match the column's datatype) when inlined. datetime has a higher precedence than smalldatetime, so the column would be cast.
What do the query plans say? The UDF would show a scan, the inline a seek most likely (not 100%, just based on what I've seen before)
Edit: Blog post by Adam Machanic
One thing that can slow functions down is omitting dbo. from table references inside the function. That causes SQL Server to do a security check for every call, which can be expensive.
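A hedged illustration of what that means (function, table, and column names are hypothetical):
CREATE FUNCTION dbo.fn_Example (@asOfDate DATETIME)
RETURNS TABLE
AS
RETURN
(
    SELECT t.Amount
    FROM dbo.SomeTable AS t -- schema-qualified, so the per-call name resolution/security check is avoided
    WHERE t.AsOfDate <= @asOfDate
)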
Try running the table-valued function independently to see how fast or slow it executes.
Also, I am not sure how to clear the execution cache that SQL Server might retain from the execution of the UDF. I mean, if you run the UDF first, SQL Server could cache the plan/result for the actual query, so if you then run the complicated query separately, it could be running from cache.
In your second example the Table Valued function has to return the entire data set before the query can apply the filter. Hopping across the TF boundary is not something that the optimiser can always do.
In the third example the query optimiser can work out that the user only wants the top few 'amounts'. If this isn't an aggregate value the optimiser can push that processing right to the start of the query and not bother with any other data. If it is an aggregate amount then the slowdown is for a different reason.
If you compare the query plans of the two queries you should see that they are different.
Today I stumbled upon an interesting performance problem with a stored procedure running on SQL Server 2005 SP2, in a database running at compatibility level 80 (SQL 2000).
The proc runs about 8 minutes, and the execution plan shows the use of an index with an actual row count of 1,339,241,423, which is about a factor of 1000 higher than the "real" row count of the table itself, which is 1,144,640, as shown correctly by the estimated row count. So the actual row count given by the query plan is definitely wrong!
Interestingly enough, when I copy the proc's parameter values to local variables inside the proc and then use the local variables in the actual query, everything works fine: the proc runs in 18 seconds and the execution plan shows the right actual row count.
EDIT: As suggested by TrickyNixon, this seems to be a sign of the parameter sniffing problem. But actually I get exactly the same execution plan in both cases. The same indexes are being used in the same order. The only difference I see is the way too high actual row count on the PK_ED_Transitions index when using the parameter values directly.
I have already run DBCC DBREINDEX and UPDATE STATISTICS without any success. DBCC SHOW_STATISTICS shows good data for the index, too.
The proc is created WITH RECOMPILE, so a new execution plan is compiled every time it runs.
To be more specific - this one runs fast:
CREATE Proc [dbo].[myProc](
@Param datetime
)
WITH RECOMPILE
as
set nocount on
declare @local datetime
set @local = @Param
select
some columns
from
table1
where
column = @local
group by
some other columns
And this version runs terribly slowly, but produces exactly the same execution plan (apart from the much too high actual row count on one of the indexes used):
CREATE Proc [dbo].[myProc](
@Param datetime
)
WITH RECOMPILE
as
set nocount on
select
some columns
from
table1
where
column = @Param
group by
some other columns
Any ideas?
Is there anybody out there who knows where SQL Server gets the actual row count value from when calculating query plans?
Update: I tried the query on another server with compat mode set to 90 (SQL 2005). It's the same behavior. I think I will open an MS support call, because this looks like a bug to me.
OK, I finally got to the bottom of it myself.
The two query plans differ in a small detail which I missed at first: the slow one uses a Nested Loops operator to join two subqueries together. That results in the high actual row count on the index scan operator, which is simply the result of multiplying the number of rows of input A by the number of rows of input B.
I still don't know why the optimizer decides to use the nested loops instead of a hash match, which runs 1000 times faster in this case, but I could handle my problem by creating a new index, so that the engine does an index seek instead of an index scan under the nested loops.
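A rough sketch of the kind of index that turns the scan into a seek (index and column names are hypothetical, keyed on the column filtered in the simplified proc above):
CREATE NONCLUSTERED INDEX IX_table1_column ON dbo.table1 ([column]) -- gives the inner side of the nested loops a seek instead of a scan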
When you're checking execution plans of the stored proc against the copy/paste query, are you using the estimated plans or the actual plans? Make sure to click Query, Include Execution Plan, and then run each query. Compare those plans and see what the differences are.
It sounds like a case of Parameter Sniffing. Here's an excellent explanation along with possible solutions: I Smell a Parameter!
Here's another StackOverflow thread that addresses it: Parameter Sniffing (or Spoofing) in SQL Server
To me it still sounds as if the statistics were incorrect. Rebuilding the indexes does not necessarily update them.
Have you already tried an explicit UPDATE STATISTICS for the affected tables?
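For example (using the table1 name from the simplified proc above):
UPDATE STATISTICS dbo.table1 WITH FULLSCAN -- scans the whole table rather than sampling, so the histogram reflects the current data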
Have you run sp_spaceused to check if SQL Server's got the right summary for that table? I believe in SQL 2000 the engine used to use that sort of metadata when building execution plans. We used to have to run DBCC UPDATEUSAGE weekly to update the metadata on some of the rapidly changing tables, as SQL Server was choosing the wrong indexes due to the incorrect row count data.
You're running SQL 2005, and BOL says that in 2005 you shouldn't have to run UpdateUsage anymore, but since you're in 2000 compat mode you might find that it is still required.
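A minimal sketch of both checks, again using the hypothetical table1:
EXEC sp_spaceused 'dbo.table1' -- reports rows, reserved, data and index_size for the table
DBCC UPDATEUSAGE (0, 'dbo.table1') -- corrects row and page counts in the current database's metadata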