Extremely High Estimated Number of Rows in Execution Plan - sql-server

I have a stored procedure running 10 times slower in production than in staging. I took a look at the execution plan, and the first thing I noticed was that the cost of the Table Insert (into a table variable @temp) was 100% in production and 2% in staging.
The estimated number of rows in production was almost 200 million, but in staging it was only about 33.
The production DB is running on SQL Server 2008 R2 while staging is on SQL Server 2012, but I don't think this difference could cause such a problem.
What could be the cause of such a huge difference?
UPDATED
Added the execution plan. As you can see, the large number of estimated rows shows up in Nested Loops (Inner Join) but all it does is a clustered index seek to another table.
UPDATED2
Link for the plan XML included
plan.xml
And SQL Sentry Plan Explorer view (with estimated counts shown)

This looks like a bug to me.
There are an estimated 90,991.1 rows going into the nested loops.
The table cardinality of the table being seeked on is 24,826.
If there are no statistics for a column and the equality operator is used, SQL Server can't know the density of the column, so it uses a fixed 10 percent value.
90,991.1 * 24,826 * 10% = 225,894,504.86 which is pretty close to your estimated rows of 225,894,000
But the execution plan shows that only 1 row is estimated per seek. Not the 24,826 from above.
So these figures don't add up. I would assume that it starts off from an original 10% ball park estimate and then later adjusts it to 1 because of the presence of a unique constraint without making a compensating adjustment to the other branches.
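The arithmetic above can be checked with a short sketch. The variable names are mine, and the 10% figure is the fixed guess described above rather than something read from the plan XML; this only illustrates the two competing estimates, not how the optimizer computes them internally.

```python
# Figures quoted from the execution plan discussed above.
outer_rows = 90_991.1        # estimated rows entering the nested loops
inner_table_rows = 24_826    # cardinality of the table being seeked
fixed_guess = 0.10           # fixed 10% guess used when the density is unknown

# Join estimate if every seek were guessed to match 10% of the inner table.
estimate_with_guess = outer_rows * inner_table_rows * fixed_guess

# Join estimate if every seek returns the 1 row shown on the seek operator.
estimate_with_unique_seek = outer_rows * 1

print(f"{estimate_with_guess:,.2f}")
print(f"{estimate_with_unique_seek:,.1f}")
```

The first figure lands next to the 225,894,000 shown in the plan, while the second is what the join estimate would be if it agreed with the 1-row-per-seek estimate.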
I see that the seek is calling a scalar UDF, [dbo].[TryConvertGuid]. I was able to reproduce similar behavior on SQL Server 2005: seeking on a unique index on the inside of a nested loops join, with the predicate being a UDF, produced a plan where the number of rows estimated out of the join was much larger than would be expected from multiplying the estimated seeked rows by the estimated number of executions.
But in your case the operators to the left of the problematic part of the plan are pretty simple and not sensitive to the number of rows (neither the rowcount top operator nor the insert operator will change), so I don't think this quirk is responsible for the performance issues you noticed.
Regarding the point in the comments to another answer that switching to a temp table helped the performance of the insert: this may be because it allows the read part of the plan to operate in parallel (inserting into a table variable would block this).

Run EXEC sp_updatestats; on the production database. This updates statistics on all tables. It might produce more sane execution plans if your statistics are screwed up.

Please don't run EXEC sp_updatestats; on a large system it could take hours, or days, to complete. What you may want to do is look at the query plan that is being used in production. Check whether there is an index that could be used but is not being used. Try rebuilding the index (as a side effect, this rebuilds the statistics on the index). After rebuilding, look at the query plan and note whether it is using the index. Perhaps you may need to add an index to the table. Does the table have a clustered index?
As a general rule, since 2005, SQL Server manages statistics on its own rather well. The only time you need to explicitly update statistics is when you know the query would execute a lot faster if SQL Server used an index, but it does not. You may want to run (on a nightly or weekly basis) scripts that automatically test every table and every index to see whether the index needs to be reorganized or rebuilt (depending on how fragmented it is). These kinds of scripts may take a long time to run on a large, active OLTP system, and you should consider carefully when you have a window to run them. There are quite a few versions of this script floating around, but I have used this one often:
https://msdn.microsoft.com/en-us/library/ms189858.aspx

Sorry this is probably too late to help you.
Row counts for table variables are impossible for SQL Server to predict. It always estimates exactly one row coming back.
To get accurate estimates, so that a better plan can be created, you need to switch your table variable to a temp table or a CTE.

Related

Gather Streams operator before table update causing serial update leading to long running query in SQL Server 2017

I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that take most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds, and then passes it to a Gather Streams operation which then passes to
This is somewhat similar to the article below, and we see the same behavior with index creation statements too, causing slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, takes most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before and recreating them after should improve insert performance.
Add a tablock hint to the table you are inserting into, i.e. insert into [Table] with (tablock). This lets SQL Server lock the table exclusively and allows the insert to run in parallel as well.
Alternatively, if that doesn't yield an improvement, try adding a maxdop 1 hint to the query.
How often do you UPDATE the rows in this heap?
Unlike clustered indexes, heaps use a RID to find specific rows. The thing is that (unless you specifically rebuild the heap) when you update a row, the old row will still remain where it was and now point to the new location instead, increasing the number of lookups needed each time you perform an update on that row.
I don't really think that is what is affecting you here, but could you possibly add a clustered index on the table and see how the update times are affected?
Also, I assume you don't have some heavy trigger on the table, doing a bunch of stuff as well, right?
Additionally, since you are referring to an article by Brent Ozar: he advocates breaking updates into batches of no more than 4000 rows at a time, as that has been shown to be fastest and stays below the 5000-row threshold at which an exclusive lock on the whole table (lock escalation) would occur during updates.
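The batching suggestion above can be sketched as a small range generator. This is only an illustration under assumptions that are not in the question: the table is assumed to have an integer key (called `id` here, a hypothetical name), and the database call itself is left out. The 4000 default is the batch size mentioned above.

```python
from typing import Iterator, Tuple


def batch_ranges(min_id: int, max_id: int, batch_size: int = 4000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges covering min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


# Each range would drive one UPDATE ... WHERE id BETWEEN low AND high,
# committed separately (the statement and the id column are hypothetical).
ranges = list(batch_ranges(1, 10_000))
print(ranges)
```

Because the ranges are defined on key values, a range over a key with gaps touches fewer than 4000 rows, which still satisfies the "no more than" limit.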

Improve performance of insert?

I ran the Performance – Top Queries by Total IO (I am trying to improve this process).
The top #1 is this code:
DECLARE @LeadsVS3 AS TT_LEADSMERGE
DECLARE @LastUpdateDate DATETIME
SELECT @LastUpdateDate = MAX(updatedate)
FROM [BUDatamartsource].[dbo].[salesforce_lead]
INSERT INTO @LeadsVS3
SELECT
Lead_id,
(more columns…)
OrderID__c,
City__c
FROM
[ReplicatedVS3].[dbo].[Lead]
WHERE
UpdateDate > @LastUpdateDate
(the code is a piece of a larger SP)
This is in a job that runs every 15 minutes... Other than running the job less frequently is there any other improvement I could make?
Try a local temp table (one prefixed with a hash, like #LeadsVS3); it is faster than a user-defined table type (UDTT) in most cases.
There is also another trick you can use.
In cases where you always fetch all 'recent' rows, you may end up blocked on a single row, the latest one, waiting for it to commit. You can sacrifice a small slice of time, e.g. ignore records from the last minute (current datetime - 1 minute). You pick those up on the next run and save yourself any transaction (or replication) lock waits.
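The cutoff trick above amounts to choosing an upper bound slightly in the past. A minimal sketch of that window calculation follows; the function name and the sample timestamps are hypothetical, and the one-minute lag is the example value given above.

```python
from datetime import datetime, timedelta


def extraction_window(last_update: datetime, now: datetime,
                      lag: timedelta = timedelta(minutes=1)):
    """Return (lower, upper): rows with lower < UpdateDate <= upper are loaded."""
    upper = now - lag
    # If the lag reaches back past the last loaded row, there is nothing to load yet.
    if upper < last_update:
        upper = last_update
    return last_update, upper


# Illustrative timestamps only: a job that last loaded up to 12:00 and runs at 12:15.
last = datetime(2020, 1, 1, 12, 0, 0)
now = datetime(2020, 1, 1, 12, 15, 0)
lower, upper = extraction_window(last, now)
print(lower, upper)
```

Rows newer than `upper` are skipped in this run and fall inside the window of the next one.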
The execution plan that you posted appears to be the estimated execution plan (the actual execution plan includes the actual number of rows). Without the actual plan it's impossible to tell what's really going on.
The obvious improvement would be to add a covering nonclustered index on Lead.leadid that includes the other columns in your SELECT statement. Right now you're scanning the widest possible index (your clustered index) to retrieve a presumably small percentage of records. Turning that clustered scan into a nonclustered seek will be huge.
On that same note, you could make that index a filtered index that only includes records with dates greater than your last UpdateDate. Then set up a regular SQL job that periodically rebuilds it to filter on a more current date.
Other things you can do to increase insert performance:
Drop any constraints and/or indexes before the insert, then rebuild them after.
Use smaller data types

Database reads varying dramatically on a query with indexes

I have a query that has appropriate indexes and is shown in the query plan with an estimated subtree cost of circa 1.5. The plan shows an Index Seek, followed by Key Lookup - which is fine for a query expected to return 1 row from a set of between 5 and 20 rows (i.e. the Index Seek should find between 5 and 20 rows, and after 5 - 20 Key Lookups, we should return 1 row).
When run interactively, the query returns almost immediately. However, DB traces this morning show runtimes from live (a web app) that vary wildly; typically the query is taking < 100 DB Reads, and effectively 0 runtime... but we are getting a few runs that consume > 170,000 DB Reads, and runtime up to 60s (greater than our timeout value).
What could explain this variation in disk reads? I have tried comparing queries interactively and using Actual Execution plans from two parallel runs with filter values taken from fast and slow runs, but interactively these show effectively no difference in the plan used.
I also tried to identify other queries that could be locking this one, but I am not sure that would impact the DB Reads so much... and in any event this query tended to be the worst for runtime in my trace logs.
Update: Here's a sample of the plan produced when the query is run interactively:
Please ignore the 'missing index' text. It is true that changes to the current indexes could allow a faster query with fewer lookups, but that is not the issue here (there are already appropriate indexes). This is an Actual Execution Plan, where we see figures like Actual Number of Rows. For example, on the Index Seek, the Actual number of rows is 16, and the I/O cost is 0.003. The I/O cost is the same on the Key Lookup.
Update 2: The results from the trace for this query are:
exec sp_executesql N'select [...column list removed...] from ApplicationStatus where ApplicationGUID = @ApplicationGUID and ApplicationStatusCode = @ApplicationStatusCode;',N'@ApplicationGUID uniqueidentifier,@ApplicationStatusCode bigint',@ApplicationGUID='ECEC33BC-3984-4DA4-A445-C43639BF7853',@ApplicationStatusCode=10
The query is constructed using the Gentle.Framework SqlBuilder class, which builds parameterised queries like this:
SqlBuilder sb = new SqlBuilder(StatementType.Select, typeof(ApplicationStatus));
sb.AddConstraint(Operator.Equals, "ApplicationGUID", guid);
sb.AddConstraint(Operator.Equals, "ApplicationStatusCode", 10);
SqlStatement stmt = sb.GetStatement(true);
IList apps = ObjectFactory.GetCollection(typeof(ApplicationStatus), stmt.Execute());
Could the data be getting evicted from the cache? That might explain why, with a hot cache (data already in memory), the reads recorded are very low, and then when the data is no longer in RAM the reads increase because it has to be read off disk again.
Just one idea to get things moving.
Run Profiler to see if statistics are being updated around the same time, or simply to see what else is going on.
Also, please add the SQL query as well as the client code.
Thoughts:
It sounds like your "5-20" rows could be far more than that
With a bad plan/parameter sniffing you'd get consistently bad performance
How many writes happen on this table: enough to update statistics?
Is there some datatype issue? (eg concatenating parameters and introducing a datatype conversion)

How to profile and address insert/update performance issues?

I am trying to insert thousands of rows into a table and performance is not acceptable. Rows on a particular table take 300ms per row to insert.
I know that tools exist to profile queries run against SQL Server (SQL Server Profiler, Database Tuning Advisor), but how would I profile insert and update statements to determine which inserts are slow? Am I forced to use perfmon while the queries run and deduce the issue from counters?
I would first check the query plan of a single insert to understand the costs associated to that operation - it is not known from the question whether the insert is selecting the data from elsewhere.
I would then check the table indexing for the following:
how many indexes are in place (apart from filtered indexes, each index will be inserted into as well)
whether a clustered index is present or are we inserting into a heap.
if the clustered index key means we will be getting a hotspot benefit on the end of the table or causing a large quantity of page splits.
These are all SQL schema based issues. Assuming there are no problems within SQL, you can start checking disk IO counters for disk queue lengths and response times, not forgetting the log drive response time, since each insert will be logged.
These kinds of problems are very difficult to nail down; there is no single prescriptive fix or silver bullet to offer, just a range of things you should be checking.
I'm betting that the problem is with the selects and not necessarily the updates. Have you tried profiling the select part of the update statement to make sure there isn't a problem there first?

SQL Server query execution plan shows wrong "actual row count" on a used index and performance is terribly slow

Today I stumbled upon an interesting performance problem with a stored procedure running on SQL Server 2005 SP2 in a database with a compatibility level of 80 (SQL 2000).
The proc runs for about 8 minutes, and the execution plan shows the usage of an index with an actual row count of 1,339,241,423, which is about a factor of 1000 higher than the "real" row count of the table itself, 1,144,640, as shown correctly by the estimated row count. So the actual row count given by the query plan optimizer is definitely wrong!
Interestingly enough, when I copy the proc's parameter values to local variables inside the proc and then use the local variables in the actual query, everything works fine: the proc runs in 18 seconds and the execution plan shows the right actual row count.
EDIT: As suggested by TrickyNixon, this seems to be a sign of the parameter sniffing problem. But actually, I get exactly the same execution plan in both cases. The same indexes are being used in the same order. The only difference I see is the way too high actual row count on the PK_ED_Transitions index when using the parameter values directly.
I have done dbcc dbreindex and UPDATE STATISTICS already without any success.
dbcc show_statistics shows good data for the index, too.
The proc is created WITH RECOMPILE so every time it runs a new execution plan is getting compiled.
To be more specific - this one runs fast:
CREATE Proc [dbo].[myProc](
@Param datetime
)
WITH RECOMPILE
as
set nocount on
declare @local datetime
set @local = @Param
select
some columns
from
table1
where
column = @local
group by
some other columns
And this version runs terribly slowly, but produces exactly the same execution plan (apart from the too high actual row count on a used index):
CREATE Proc [dbo].[myProc](
@Param datetime
)
WITH RECOMPILE
as
set nocount on
select
some columns
from
table1
where
column = @Param
group by
some other columns
Any ideas?
Anybody out there who knows where Sql Server gets the actual row count value from when calculating query plans?
Update: I tried the query on another server with compat mode set to 90 (SQL 2005). It shows the same behavior. I think I will open an MS support call, because this looks like a bug to me.
OK, I finally figured it out myself.
The two query plans differ in a small detail which I missed at first. The slow one uses a nested loops operator to join two subqueries together. That results in the high actual row count on the index scan operator, which is simply the number of rows of input A multiplied by the number of rows of input B.
I still don't know why the optimizer decides to use nested loops instead of a hash match, which runs 1000 times faster in this case, but I could handle my problem by creating a new index, so that the engine does an index seek instead of an index scan under the nested loops.
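The multiplication described above can be sanity-checked against the two row counts quoted in the question. This is a rough sketch that assumes every execution of the inner index scan reads the whole table, which the plan itself would have to confirm; the variable names are mine.

```python
table_rows = 1_144_640               # rows in the table (the estimated row count)
actual_rows_on_scan = 1_339_241_423  # "actual row count" shown on the index scan

# Under a nested loops join the inner input is executed once per outer row,
# and the plan shows the row count summed over all executions.
implied_executions = actual_rows_on_scan / table_rows
print(round(implied_executions))  # prints 1170
```

Under that assumption the outer input would have supplied roughly 1170 rows, which is in line with the "about factor 1000" difference noted in the question.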
When you're checking execution plans of the stored proc against the copy/paste query, are you using the estimated plans or the actual plans? Make sure to click Query, Include Execution Plan, and then run each query. Compare those plans and see what the differences are.
It sounds like a case of Parameter Sniffing. Here's an excellent explanation along with possible solutions: I Smell a Parameter!
Here's another StackOverflow thread that addresses it: Parameter Sniffing (or Spoofing) in SQL Server
To me it still sounds as if the statistics were incorrect. Rebuilding the indexes does not necessarily update them.
Have you already tried an explicit UPDATE STATISTICS for the affected tables?
Have you run sp_spaceused to check if SQL Server's got the right summary for that table? I believe in SQL 2000 the engine used to use that sort of metadata when building execution plans. We used to have to run DBCC UPDATEUSAGE weekly to update the metadata on some of the rapidly changing tables, as SQL Server was choosing the wrong indexes due to the incorrect row count data.
You're running SQL 2005, and BOL says that in 2005 you shouldn't have to run UpdateUsage anymore, but since you're in 2000 compat mode you might find that it is still required.

Resources