Two separate instances of SQL Server running a different explain plan - sql-server

Here's one where I need help from the SQL administrators out there. I have two separate SQL Server instances on Amazon EC2. One is our staging environment, and the other is our production environment, but they are configured exactly the same way (spawned from the same image).
We had a database that we copied from staging to our production environment last week. The way we copy a db to production is we take a backup of it on our staging site, and restore the backup in production. Anyways, we found that in production, one particular complex query was timing out after an hour, but that exact query in our staging environment completed in 10 minutes.
The explain plans on both were almost the same, except that on one server it was doing a PK scan on a large table (8M rows), and on the other server it was doing an index seek. We're assuming this was the difference. So one server was doing a lot of disk IO, and the other was not.
So my question is, what are the reasons that one installation of SQL server would decide to use an index, while another one ignores it--assuming same versions of SQL server, and same data set? Even better, what are the best ways to find out why SQL is ignoring an index?

SQL Server uses statistics to determine the query execution plan.
Normally, they should be the same on the same datasets, but there is a chance of outdated statistics on one of the machines.
Use sp_updatestats to update statistics on both machines.
Also, I'm not familiar with Amazon EC2, but there may be a chance that the machines running the two instances have a different number of CPUs installed (or made available for use by SQL Server). This is also taken into account by the optimizer.
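A minimal sketch of how both checks could look, to be run on each instance and compared side by side. The table name dbo.BigTable is a placeholder for the 8M-row table, not something from the question.

```sql
-- Run on BOTH instances and compare the results.

-- 1. When were the statistics on the large table last updated?
--    dbo.BigTable is a placeholder name.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.BigTable');

-- 2. Update statistics across the current database.
EXEC sp_updatestats;

-- 3. How many logical CPUs does the instance see, and is parallelism capped?
SELECT cpu_count
FROM sys.dm_os_sys_info;

SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max degree of parallelism';
```

If the dates or the CPU figures differ between staging and production, that is a plausible reason for the optimizer to cost the same query differently.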

Parameter Sniffing?
An SP will use the query plan that was deemed most appropriate based on the parameters passed to it when it was executed (and so compiled) for the first time.
Restoring a database wipes the plan cache; if the SP on the copy of the database was run with parameters that favored an index seek, then that's what will subsequently be used.
You can check this by sp_recompile'ing both and running them again with identical parameters.
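Roughly, that check could look like the following. The procedure name and parameter values are placeholders; the point is only that both servers compile a fresh plan from the same input.

```sql
-- dbo.MyComplexProc and its parameters are hypothetical names.

-- Mark the procedure so that its plan is recompiled on the next execution.
EXEC sp_recompile N'dbo.MyComplexProc';

-- Run with the SAME parameter values on both servers, so each new plan
-- is compiled from identical input.
EXEC dbo.MyComplexProc @CustomerId = 42, @FromDate = '20240101';
```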

This was our mistake.
After much digging, we found that one of our devs had added a couple of additional indexes to the production db after the transfer. This was a case where the additional indexes actually caused the query optimizer to pick a less efficient route in the production environment.
Removing those additional indexes appeared to have addressed the performance issue for the particular query, and both explain plans are now the same.
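For anyone hitting the same thing, a quick way to spot this kind of drift is to list the indexes on the table in both environments and compare the output. A sketch, assuming a placeholder table name dbo.BigTable:

```sql
-- Run on both servers and diff the results.
-- dbo.BigTable is a placeholder for the table in question.
SELECT i.name AS index_name,
       i.type_desc,
       i.is_unique,
       ic.key_ordinal,
       c.name AS column_name,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
```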

Related

Queries slow when run by specific Windows account

Running SQL Server 2014 Express on our domain. We use Windows Authentication to log on. All queries are performed in stored procedures.
Now, the system runs fine for all our users - except one. When he logs on (using our software), all queries take around 10 times longer (e.g. 30 ms instead of 2 ms). The queries are identical, the database is the same, the network speed is the same, the operating system is the same, the SQL Server drivers are the same, connection pooling is the same, DNS is the same. Changing computer does not help. The problem seems to be linked to the account being used.
What on Earth may be the cause for this huge performance hit?
Please advise!
I would try rebuilding the SP (by running an ALTER statement that duplicates its existing structure) to force SQL Server to recompile. I don't know every way SQL Server caches things but it can definitely create distinct execution plans for different types of connections so I wouldn't be surprised if your slow user is running a version with an inefficient execution plan.
http://www.sommarskog.se/query-plan-mysteries.html
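One way to see whether the procedure really has more than one cached plan is to look at the plan cache attributes. This is only a sketch: dbo.MyProc is a placeholder name, and which attributes matter in this case is an assumption. The set_options and user_id attributes are both part of the cache key, so connections that differ in SET options, or users whose default schema differs when objects are not schema-qualified, can end up with separate plans.

```sql
-- dbo.MyProc is a hypothetical procedure name.
-- More than one plan_handle for the same procedure means more than one
-- cached plan; the attribute values show what distinguishes them.
SELECT cp.plan_handle,
       cp.usecounts,
       pa.attribute,
       pa.value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE st.dbid = DB_ID()
  AND st.objectid = OBJECT_ID(N'dbo.MyProc')
  AND pa.attribute IN (N'set_options', N'user_id')
ORDER BY cp.plan_handle, pa.attribute;
```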

SQL Server Using TableDiff on large tables

We have a process which uses SQL Server's amazing tableDiff via:
Microsoft SQL Server\100\COM\Tablediff.exe
It's SQL Server 2008 R2. It connects from one instance to another identical instance. It works very well!
I have a situation where a job that contains only one table, which now has 10767594 records, is taking 2.5 hours to complete. How can I improve this?
The process is triggered by a Windows Scheduled Task, this calls a .bat file, the .bat file contains the recommended code which has no issue. We have a couple of these in place and have had for some time. It's just the one job that deals with the big table from instance to instance that is taking too long.
I have realised that the source table does have an index but the destination table does not. I will put an index on this table, what else can I do?
Does table diff run better with indexes?
Is there a way to use table diff more effectively?
E.g. if I capture the lastProcessedID can I run tableDiff next time for all records where id > lastProcessedID?
Any advice would be great. Thank you in advance
EDITED:
MY SOLUTION - This was a very big surprise. As I mentioned above, the 10 million+ record table was identical on the source and destination except for 2 indexes (on the source). After waiting until out of hours, since this is an internal production server, I applied the indexes to the destination. I then ran the tableDiff job, which has not been changed at all, and it completed in under 2 minutes. 2.5 hours to 2 mins!
I have accepted the answer below because it was very helpful. I did go down the Merge Replication path; however, after setting up replication and publishing, I found out that the production instance could not be a subscriber because replication was not ticked on install. As Jason says, it's a reasonable amount of research, learning and setting up. Since I am not a DBA and had not looked at this before, it was a worthwhile experience.
The performance issue is because the remote queries pull every record from each place to do the comparison to generate the output. Indexes can help slightly to make the pull a little faster from each location, but it's not likely to be significant.
An incremental approach is definitely better. I don't believe tablediff directly supports comparing 2 queries. If it did, you could do something like EXCEPT or INTERSECT to do the comparisons. If you're trying to keep these databases in sync, why not consider other solutions, like log shipping, mirroring, SSIS, replication, clustering, etc.?
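Outside of tablediff, an incremental comparison with EXCEPT could be sketched in plain T-SQL as below. Everything named here is an assumption for illustration: the table dbo.BigTable, its Id key, the columns Col1 and Col2, and a linked server DestServer pointing at database DestDb.

```sql
-- All object names are placeholders, including the linked server.
DECLARE @lastProcessedID INT;
SET @lastProcessedID = 10000000;  -- value saved from the previous run

-- Rows with Id above the saved value that exist at the source but are
-- missing (or differ in any listed column) at the destination.
SELECT s.Id, s.Col1, s.Col2
FROM dbo.BigTable AS s
WHERE s.Id > @lastProcessedID
EXCEPT
SELECT d.Id, d.Col1, d.Col2
FROM DestServer.DestDb.dbo.BigTable AS d
WHERE d.Id > @lastProcessedID;
```

Note that this only looks at rows above the saved ID, so updates or deletes to older rows would go unnoticed.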

Copy Stored Proc Execution Plan to Another Database

Setup:
Using SQL Server 2008 R2.
We've got a stored procedure that has been intermittently running very long. I'd like to test a theory that parameter sniffing is causing the query engine to choose a bad plan.
Question:
How can I copy the query's execution plans from one database to another (test) database?
Note:
I'm fully aware that this may not be parameter sniffing issues. However, I'd like to go through the motions of creating a test plan and using it, if at all possible. Therefore please do not ask me to post code and/or table schema, since this is irrelevant at this time.
Plans are not portable; they bind to object IDs. You can use plan guides, but they are strictly tied to the database. What you have to do is test on a restored backup of the same database. On a restored backup you can use a plan guide. But for the test to be relevant, the physical characteristics of the machines should be similar (CPUs, RAM, disks).
Normally though, one does not need to resort to such shenanigans as copying the plans. If you look at the actual execution plans, all the answers are right there.
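To test a parameter sniffing theory without copying anything, you can pull the cached plan for the procedure and look at the values it was compiled for. A sketch, assuming a placeholder procedure name dbo.MyProc:

```sql
-- dbo.MyProc is a hypothetical procedure name.
SELECT ps.cached_time,
       ps.execution_count,
       ps.total_elapsed_time,
       qp.query_plan   -- click the XML in SSMS to open the graphical plan
FROM sys.dm_exec_procedure_stats AS ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.MyProc');
```

In the plan XML, the ParameterList element shows a ParameterCompiledValue for each parameter. If those values are atypical for your workload, that supports the sniffing theory.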
Have you tried using the OPTIMIZE FOR clause? With it you can tune your procedure more easily, and without the risk that a plan copied from another database will be inappropriate due to differences between those databases (if copying the plan is even possible).
http://www.mssqltips.com/sqlservertip/1354/optimize-parameter-driven-queries-with-sql-server-optimize-for-hint/
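As a rough illustration of the hint, with a made-up procedure, table, and value (none of these come from the question):

```sql
-- Hypothetical procedure: all names and the value 42 are placeholders.
CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerId INT
AS
BEGIN
    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId
    -- Compile the plan as if @CustomerId were 42, whatever value is passed.
    OPTION (OPTIMIZE FOR (@CustomerId = 42));
END;
```

The value in the hint should be one that is representative of typical calls, otherwise you just trade one bad plan for another.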

SQL Server Performance and Update Statistics

We have a site in development that when we deployed it to the client's production server, we started getting query timeouts after a couple of hours.
This was with a single user testing it and on our server (which is identical in terms of Sql Server version number - 2005 SP3) we have never had the same problem.
One of our senior developers had come across similar behaviour in a previous job and he ran a query to manually update the statistics and the problem magically went away - the query returned in a few milliseconds.
A couple of hours later, the same problem occurred. So we again manually updated the statistics and again, the problem went away. We've checked the database properties and sure enough, auto update statistics is TRUE.
As a temporary measure, we've set a task to update stats periodically, but clearly, this isn't a good solution.
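For reference, the setting check and the manual update look roughly like this (dbo.Orders is a placeholder table name, not one of ours):

```sql
-- Confirm the auto update statistics settings for the current database.
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();

-- Manually refresh the statistics on one table by reading every row.
-- dbo.Orders is a hypothetical table name.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```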
The developer who experienced this problem before is certain it's an environment problem - when it occurred for him previously, it went away of its own accord after a few days.
We have examined the SQL Server installation on their db server and it's not what I would regard as normal. Although they have SQL 2005 installed (and not 2008), there's an empty "100" folder in the installation directory. There are also MSQL.1, MSQL.2, MSQL.3 and MSQL.4 folders (which is where the executables and data are actually stored).
If anybody has any ideas we'd be very grateful - I'm of the opinion that rather than the statistics failing to update, they are somehow becoming corrupt.
Many thanks
Tony
Disagreeing with Remus...
Parameter sniffing allows SQL Server to guess the optimal plan for a wide range of input values. Sometimes it's wrong, and the plan is bad because of an atypical value or a poorly chosen default.
I used to be able to demonstrate this on demand by changing a default between 0 and NULL: plan and performance changed dramatically.
A statistics update will invalidate the plan. The query will thus be compiled and cached again when next used.
The workarounds are one of the following:
parameter masking
use OPTIMIZE FOR UNKNOWN hint
duplicate "default"
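A minimal sketch of the first workaround, parameter masking, using a made-up procedure and table. It is shown because it also works on SQL Server 2005, whereas the OPTIMIZE FOR UNKNOWN hint is only available from SQL Server 2008 onwards.

```sql
-- Hypothetical procedure: all object names are placeholders.
CREATE PROCEDURE dbo.SearchOrders
    @CustomerId INT = NULL
AS
BEGIN
    -- Copy the parameter into a local variable. The optimizer does not
    -- sniff the runtime value of a local variable, so it compiles the plan
    -- from average density statistics instead of one specific value.
    DECLARE @LocalCustomerId INT;
    SET @LocalCustomerId = @CustomerId;

    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @LocalCustomerId;
END;
```

The trade-off is a "middle of the road" plan: rarely terrible, but not tuned for any particular value.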
See these SO questions
Why does the SqlServer optimizer get so confused with parameters?
At some point in your career with SQL Server does parameter sniffing just jump out and attack?
SQL poor stored procedure execution plan performance - parameter sniffing
Known issue?: SQL Server 2005 stored procedure fails to complete with a parameter
...and Google search on SO
Now, Remus works for the SQL Server development team. However, this phenomenon is well documented by Microsoft on their own website, so blaming developers is unfair.
How Data Access Code Affects Database Performance (MSDN mag)
Suboptimal index usage within stored procedure (MS Connect)
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (an excellent white paper)
It is not that the statistics are outdated. What happens is that when you update statistics, all plans get invalidated and some bad cached plan gets evicted. Things run smoothly until a bad plan gets cached again and causes slow execution.
The real question is why do you get bad plans to start with? We can get into lengthy technical and philosophical arguments about whether a query processor should create a bad plan to start with, but the thing is that, when applications are written in a certain way, bad plans can happen. The typical example is having a where clause like (@somevariable is null) or (somefield = @somevariable). Ultimately 99% of the bad plans can be traced to developers writing queries with C-style procedural expectations instead of sound, set-based, relational processing.
What you need to do now is to identify the bad queries. It is really easy: just check sys.dm_exec_query_stats, and the bad queries will stand out in terms of total_elapsed_time and total_logical_reads. Once you have identified the bad plan, you can take corrective measures, which vary from query to query.
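Something along these lines would list the worst offenders. Treat it as a sketch: the TOP (20) cutoff and the ordering column are arbitrary choices.

```sql
-- Top statements by total elapsed time (reported in microseconds)
-- since their plans were cached.
SELECT TOP (20)
       qs.total_elapsed_time,
       qs.total_logical_reads,
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
       -- Offsets are in bytes of Unicode text, hence the division by 2.
       SUBSTRING(st.text,
                 qs.statement_start_offset / 2 + 1,
                 (CASE qs.statement_end_offset
                      WHEN -1 THEN DATALENGTH(st.text)
                      ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```

Swap the ORDER BY to total_logical_reads to rank by IO instead.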

Same query, different execution plans

I am trying to find a solution for a problem that is driving me mad...
I have a query which runs very fast on a QA server but is very slow in production. I realized that they have different execution plans... so I have tried recompiling, clearing the execution plan cache, updating statistics, checking the collation... but I still can't find what's going on...
The databases where the query is running are exactly the same and the SQL Servers have also the same configuration.
Any new ideas would be much appreciated.
Thanks,
A.
I just realised that the QA server is running SP3 and production is running SP2. Could this have any impact on this issue?
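For anyone wanting to confirm the same thing, the build and service pack level can be read directly on each server:

```sql
-- Run on both servers and compare.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,  -- e.g. RTM, SP2, SP3
       SERVERPROPERTY('Edition')        AS edition;
```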
Is it possible the production server has a larger database size? The plan can be different because it is based on statistics on the data it contains.
I think it could be due to the volume of data present. It happened to us one time where the query literally flew on the QA server but was incredibly slow in production. After breaking our heads for a while we found out that the QA server had 15K rows whereas production had 1.5 million.
HTH
If the execution plan was the same and one was slow, it would be database load, hardware, locking/blocking, etc.
However, if the execution plans are different, something is different between the two databases. Are statistics up to date in both? Do they have the exact same schemas, the same indexes, a similar number of rows, the same distribution of PK and index values, etc.? Where did the QA data come from: is it random data or a restore from production?
Disable parallel query execution on production :)
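If you want to try that for a single query rather than for the whole server, a query hint is the least invasive way. A sketch with a made-up table:

```sql
-- dbo.Orders and its columns are hypothetical names.
-- MAXDOP 1 forces a serial plan for this statement only.
SELECT o.CustomerId, COUNT(*) AS order_count
FROM dbo.Orders AS o
GROUP BY o.CustomerId
OPTION (MAXDOP 1);
```

If the serial plan on production then matches the QA plan, parallelism is a likely suspect.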
I ran into this recently and here's what I found.
I had two databases that were essentially copies of each other. On one version a TVF was taking 1 second to run, while on the other version it took 15 minutes to run.
The execution plans of the underlying SQL code were very different. I was able to fix it by rebuilding some indexes that the TVF relied on. The execution plans still aren't identical, but the slow one did change a lot, and the execution time is back down to around a second.
Now, both versions had indexes that were highly fragmented. My assumption is that historical statistics or execution plan information allowed the fast version to continue to find an optimal execution plan.
So to sum up: make sure you look at the fragmentation of your indexes even if they have the same structure or similar rates of fragmentation.
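A sketch of how to check fragmentation and rebuild, with an arbitrary page count threshold and a placeholder table name:

```sql
-- Fragmentation for every index in the current database.
-- The 1000-page threshold is an arbitrary cutoff to skip tiny indexes.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Rebuild all indexes on one table. dbo.Orders is a hypothetical name.
ALTER INDEX ALL ON dbo.Orders REBUILD;
```

Worth keeping in mind that a rebuild also refreshes the statistics on the rebuilt indexes, so the plan change may come from the new statistics rather than from the defragmentation itself.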
