Clear SQL Azure execution plan / query cache - sql-server

I have a few "inefficient" queries that I am trying to debug on Azure SQL (v12). The problem is that after a query executes for the first time (which takes many seconds), Azure appears to cache the query / execution plan. I have done some research, and several people have suggested that adding and removing a column will clear the cache, but this doesn't seem to work. If I leave the server alone for a few hours or overnight and re-run the query, it takes its usual time to execute, but then the cache is in place once again. This makes it very hard to optimise my query. Does anyone know how to force Azure SQL not to cache my queries / execution plans?

ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE is designed to help with this problem.
https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-database-scoped-configuration-transact-sql?view=sql-server-2017
This is closest to the DBCC FREEPROCCACHE you have in SQL Server but is scoped to a database instead of the server instance. This does not prevent caching of query plans - it just invalidates the current cache entries.
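A minimal sketch of how this might be used while tuning on a development database. The table dbo.Orders and its columns are hypothetical names introduced for illustration; OPTION (RECOMPILE) is a standard query hint, shown here as an alternative that requests a fresh plan for a single statement without touching the rest of the cache.

```sql
-- Invalidate all cached plans for the current database only.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

-- Alternative while testing one statement: compile a fresh plan on every run.
-- dbo.Orders, OrderId, OrderDate and CustomerId are hypothetical names.
DECLARE @CustomerId int = 42;

SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);
```

Clearing plans does not evict data pages from memory, so if the slow first run is mostly caused by reading data from disk, it will not necessarily come back after this command.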
Please note that the query store is there to help you in SQL Azure (on-by-default). It stores a history of plan choices and plan performance (per-plan). So, if you have a prior plan that performs better available in the history of your application, you can force it using SSMS if you'd prefer to have the query optimizer pick this plan each time your query compiles. One common reason for what you are seeing is parameter-sensitivity in the plan choice where the optimizer will use the passed parameter value to try to generate the query plan, assuming it is representing a common pattern when you run that query. If that value is actually not close to a common value (in terms of how frequent it is in the table), then you can sometimes compile and cache a plan that is not better on average for your application.
Query store has an overview here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
Note that SQL Azure also has an automated mechanism that tries forcing prior plans if it notices a performance regression. It is somewhat conservative, however, so it may not kick in for every single regression until it sees an obvious pattern over time. So, while you can force things in SSMS, you can also potentially just wait (assuming this is the issue you were seeing).
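Plan forcing can also be done from T-SQL rather than SSMS. The following is a sketch under assumptions: the LIKE filter on dbo.Orders is a hypothetical way to find the query, and the query_id / plan_id values passed to the procedures are placeholders to be replaced with ids returned by the first statement.

```sql
-- List the plans Query Store has recorded for queries mentioning a table.
-- dbo.Orders is a hypothetical table name used only as a search string.
SELECT q.query_id,
       p.plan_id,
       p.is_forced_plan,
       rs.count_executions,
       rs.avg_duration            -- reported in microseconds
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%dbo.Orders%'
ORDER BY q.query_id, rs.avg_duration;

-- Force a chosen plan, and later remove the forcing (ids are placeholders).
EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 1;
EXEC sp_query_store_unforce_plan @query_id = 1, @plan_id = 1;
```

A query with several rows here, each with a very different avg_duration, is a hint that parameter-sensitive plan choice may be involved.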

Related

SQL Server CXPACKET timeout

We've got SQL Server 2016 (v13.0.4206.0). By default there are no restrictions on parallelism: SQL Server can use as many threads as it wants. That didn't cause any problems... until now.
For another feature, a query was written that unexpectedly raised a timeout exception in our application. I was deeply surprised when it executed successfully after setting the maximum threads per query to 1. Yes, 6 seconds for a query is not great, even accounting for most of the time being spent on fetching, but it's far from the 3-minute timeout!
By the way, executing this query in SQL Server Management Studio works every time regardless of the parallelism settings. It seems that something is wrong with the connection to the database, but all other queries work fine, even ones much heavier than this one.
Our application is built on ASP.NET Core 3.0 (I don't know if it matters), and the database connection is made using System.Data.SqlClient v4.8.0. All I could determine is that a large number of tasks are created for this query:
I've tried to watch the execution in sys.dm_os_waiting_tasks (thanks, Google). I'm not sure I got it right, but it seems that tasks with context_id 0-8 are blocked by those with context_id 9-16 and vice versa. An obvious example of a deadlock, isn't it? But how can SQL Server manage threads so that this happens without my "help"? Or what am I doing wrong?
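For reference, the kind of check described above can be sketched as follows. The session id 53 is a placeholder for the application's connection; the column the question calls context_id is exec_context_id in this view.

```sql
-- Replace 53 with the session_id of the application's connection (placeholder).
DECLARE @session_id int = 53;

SELECT wt.session_id,
       wt.exec_context_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.blocking_exec_context_id,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id = @session_id
ORDER BY wt.exec_context_id;
```

Tasks of one parallel query waiting on each other with CXPACKET is expected behaviour at exchange operators and is not by itself a deadlock. One thing worth checking is whether any task of the session shows ASYNC_NETWORK_IO, which would suggest the client is consuming rows slowly rather than the server being stuck.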
Just in case, to head off some inappropriate answers:
I won't turn parallelism off (set maximum threads per query to 1) as a solution, because of some heavy queries in our application;
I don't want to raise the Cost Threshold for Parallelism setting because I'm afraid of hitting the same problem with another (presumably heavier) query. So I just want to determine the real cause;
Optimizing the query isn't considered (anymore), as according to the actual execution plan I can't make it faster: there are enough indexes for it. But I'm ready to rethink this given some really weighty arguments.
So, my question is: why does parallelism that I didn't ask for spoil the query execution? And how can I avoid that?
It's true that the engine sometimes chooses parallel execution (or chooses not to use it) in a way that leads to worse performance.
You do not want to change the server option or the cost threshold because you are not sure how this will affect other queries, which is understandable.
If you are sure your query will execute better without being handled in parallel, you can specify the option just for that query using a query hint, MAXDOP, like this:
SELECT ...
FROM ...
OPTION (MAXDOP 1);
It's easy and you can roll it back if needed. Also, you are not affecting other queries.
You are saying that:
Optimizing query isn't considered (anymore), as according to actual execution plan...
The execution plan is sometimes misleading. As a start, you can save your execution plan and open it with SentryOne Plan Explorer. It's free and can give you a better view of what's going on.
Also, if a query executes in either 6 seconds or 3 minutes, there must be something wrong with it, or possibly with the activity on your database. If it always executes fast in SSMS, maybe the engine is using the correct cached plan there. I think it's better to share the query itself, attach the two plans (serial and parallel), and spend more time tuning it.

SQL Server force multithreading when executing SSIS package

Here is the question:
I am using VisualCron to run an SSIS package on SQL Server 2008 R2. The SSIS package runs a query that gets millions of rows and writes them to a flat file. Sometimes when I run this SSIS package, SQL Server doesn't use multi-threading (I can tell from Activity Monitor), which leads to a very long running time of about 20 hours. When it does use multi-threading, it finishes in 8 minutes.
Is there a way to force SQL Server to use multi-threading whenever it runs this SSIS package?
There are a few options for optimizing your query to handle multiple simultaneous operations... or at least improving performance.
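Apply OPTION (MAXDOP n) in your query to set the maximum number of processors (the degree of parallelism) the query may use. MAXDOP 0 lets SQL Server use all the processors available to it, whereas MAXDOP 1 forces a serial plan, which is the opposite of what you want here. Note that the hint is only an upper limit, so the optimizer can still choose a serial plan. Listed below is an example and here is a link with more detail.
SELECT FirstName, LastName
FROM dbo.Customer
OPTION (MAXDOP 0)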
Apply NOLOCK in your query if there is no concern about data in the tables being updated during the SSIS package operation. That is, this works if there is no concern about "dirty reads." See the following link and example.
SELECT FirstName, LastName
FROM dbo.Customer WITH(NOLOCK)
Review this link for best practices for improving query performance. There may be additional steps you have not taken or tools you have not applied that can greatly assist in improving performance.
The problem is getting clearer.
It's related to how SQL Server decides between a serial execution plan and a parallel execution plan. That's the optimizer's job. It turns out that I have two tasks scheduled in VisualCron, and both run the same big query. The difference is that they get different input parameters.
The first one gets parameters that do not touch much data.
The second one gets parameters that touch a large amount of data.
I assume the optimizer sees the first submitted query, which will not touch much data, and so decides to use a serial plan.
I guess the plan for this query is then cached, so when the second query is submitted, the optimizer checks the cache to see whether a previously compiled plan exists for it. If one exists, it is reused.
That's why it still chooses the serial plan for the second query.
After I changed the order of the two tasks (so that the one dealing with more data runs first, followed by the one dealing with less data), it works: it now uses a parallel plan for both. (You may need to restart the instance to clear the cached execution plans.)
How the optimizer works is still my assumption.
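Two alternatives to reordering the tasks or restarting the instance can be sketched as follows. This is an illustration only: dbo.BigTable, LoadDate, Col1, Col2 and the variables are hypothetical names, not the query from the package.

```sql
-- Option 1: compile a plan per execution, so each parameter set gets its own
-- plan and the small run cannot leave a serial plan cached for the large run.
-- dbo.BigTable, LoadDate, Col1 and Col2 are hypothetical names.
DECLARE @FromDate date = '20240101',
        @ToDate   date = '20240131';

SELECT Col1, Col2
FROM dbo.BigTable
WHERE LoadDate >= @FromDate
  AND LoadDate <  @ToDate
OPTION (RECOMPILE);

-- Option 2: drop all cached plans without restarting the instance.
-- This affects every database on the server, so avoid it on busy systems.
DBCC FREEPROCCACHE;
```

With OPTION (RECOMPILE) the extra compile time per run is usually negligible next to a query that exports millions of rows.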
Another person's post, explaining how the optimizer plays an important role here:
http://web.archive.org/web/20180404164406/http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

SQL Server Performance and Update Statistics

We have a site in development, and when we deployed it to the client's production server, we started getting query timeouts after a couple of hours.
This was with a single user testing it, and on our server (which is identical in terms of SQL Server version number, 2005 SP3) we have never had the same problem.
One of our senior developers had come across similar behaviour in a previous job. He ran a query to manually update the statistics and the problem magically went away: the query returned in a few milliseconds.
A couple of hours later, the same problem occurred. So we again manually updated the statistics and again, the problem went away. We've checked the database properties and sure enough, auto update statistics is TRUE.
As a temporary measure, we've set a task to update stats periodically, but clearly, this isn't a good solution.
The developer who experienced this problem before is certain it's an environment problem - when it occurred for him previously, it went away of its own accord after a few days.
We have examined the SQL Server installation on their db server and it's not what I would regard as normal. Although they have SQL 2005 installed (and not 2008), there's an empty "100" folder in the installation directory. There are also MSSQL.1, MSSQL.2, MSSQL.3 and MSSQL.4 folders (which is where the executables and data are actually stored).
If anybody has any ideas we'd be very grateful. I'm of the opinion that rather than the statistics failing to update, they are somehow becoming corrupt.
Many thanks
Tony
Disagreeing with Remus...
Parameter sniffing allows SQL Server to guess the optimal plan for a wide range of input values. Sometimes it's wrong, and the plan is bad because of an atypical value or a poorly chosen default.
I used to be able to demonstrate this on demand by changing a default between 0 and NULL: plan and performance changed dramatically.
A statistics update will invalidate the plan. The query will thus be recompiled and cached when next used.
The workarounds are one of the following:
parameter masking
use the OPTIMIZE FOR UNKNOWN hint (available from SQL Server 2008)
duplicate "default"
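The first two workarounds can be sketched like this. The procedure names, dbo.Orders and its columns are hypothetical; GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Parameter masking: copy the parameter into a local variable so the optimizer
-- cannot sniff the caller's value and estimates from average density instead.
-- dbo.GetOrders_Masked and dbo.Orders are hypothetical names.
CREATE PROCEDURE dbo.GetOrders_Masked
    @CustomerId int
AS
BEGIN
    DECLARE @LocalCustomerId int;
    SET @LocalCustomerId = @CustomerId;

    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @LocalCustomerId;
END;
GO

-- OPTIMIZE FOR UNKNOWN: a similar effect expressed as a query hint
-- (SQL Server 2008 and later, so not available on 2005).
CREATE PROCEDURE dbo.GetOrders_Unknown
    @CustomerId int
AS
BEGIN
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR UNKNOWN);
END;
GO
```

Both trade a plan tuned for one specific value for a plan tuned for the "average" value, which tends to be more stable but is not always the fastest for any given call.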
See these SO questions
Why does the SqlServer optimizer get so confused with parameters?
At some point in your career with SQL Server does parameter sniffing just jump out and attack?
SQL poor stored procedure execution plan performance - parameter sniffing
Known issue?: SQL Server 2005 stored procedure fails to complete with a parameter
...and Google search on SO
Now, Remus works for the SQL Server development team. However, this phenomenon is well documented by Microsoft on their own website, so blaming developers is unfair.
How Data Access Code Affects Database Performance (MSDN mag)
Suboptimal index usage within stored procedure (MS Connect)
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (an excellent white paper)
It is not that the statistics are outdated. What happens is that when you update statistics, all plans get invalidated and some bad cached plan gets evicted. Things run smoothly until a bad plan gets cached again and causes slow execution.
The real question is why you get bad plans to start with. We can get into lengthy technical and philosophical arguments about whether a query processor should create a bad plan to start with, but the thing is that, when applications are written in a certain way, bad plans can happen. The typical example is having a where clause like (@somevariable is null) or (somefield = @somevariable). Ultimately, 99% of the bad plans can be traced to developers writing queries that have C-style procedural expectations instead of sound, set-based, relational processing.
What you need to do now is identify the bad queries. It is really easy: just check sys.dm_exec_query_stats, and the bad queries will stand out in terms of total_elapsed_time and total_logical_reads. Once you have identified the bad plan, you can take corrective measures, which differ from query to query.
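A minimal sketch of such a check, joining to sys.dm_exec_sql_text to recover the statement text. The TOP (20) cutoff and the ordering column are arbitrary choices for illustration.

```sql
-- Top statements by cumulative elapsed time since their plans were cached.
SELECT TOP (20)
       qs.total_elapsed_time,       -- microseconds
       qs.total_logical_reads,
       qs.execution_count,
       SUBSTRING(st.text,
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```

Ordering by qs.total_logical_reads instead gives the I/O-heavy view. The numbers only cover plans currently in cache, so they reset whenever a plan is evicted or recompiled.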

How do you fix queries that only run slow until they're cached

I have some queries that are causing timeouts in our live environment. (>30 seconds)
If I run Profiler, grab the exact SQL being run, and run it from Management Studio, the queries take a long time the first time and then drop to a few hundred milliseconds on each run after that.
This is obviously SQL caching the data and getting it all in memory.
I'm sure there are optimisations that can be made to the SQL that will make it run faster.
My question is, how can I "fix" these queries when the second time I run it the data has already been cached and is fast?
May I suggest that you inspect the execution plan for the queries that are responsible for your poor performance issues.
You need to identify, within the execution plan, which steps have the highest cost and why. It could be that your queries are performing a table scan, or that an inappropriate index is being used for example.
There is a very detailed, free ebook available from the RedGate website that concentrates specifically on understanding the contents of execution plans.
https://www.red-gate.com/Dynamic/Downloads/DownloadForm.aspx?download=ebook1
You may find that there is a particular execution plan that you would like to be used for your query. You can force which execution plan is used for a query in SQL Server using query hints. This is quite an advanced concept however and should be used with discretion. See the following Microsoft White Paper for more details.
http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx
I would also not recommend that you clear the procedure cache in your production environment, as this will be detrimental to the performance of all other queries on the platform that are not currently experiencing performance issues.
If you are executing a stored procedure, for example, you can ensure that a new execution plan is calculated for each execution of the procedure by using the WITH RECOMPILE option.
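As a sketch, WITH RECOMPILE can be applied either to the procedure definition or to a single call. dbo.SearchOrders, dbo.Orders and the column names are hypothetical.

```sql
-- Procedure-level: no plan is cached; every call compiles a fresh plan.
-- dbo.SearchOrders and dbo.Orders are hypothetical names.
CREATE PROCEDURE dbo.SearchOrders
    @CustomerId int
WITH RECOMPILE
AS
BEGIN
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END;
GO

-- Call-level: request a fresh plan for this one execution only.
EXEC dbo.SearchOrders @CustomerId = 42 WITH RECOMPILE;
```

The call-level form is convenient while investigating, because it leaves the procedure definition untouched.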
For overall performance tuning information, there are some excellent resources over at Brent Ozar’s blog.
http://www.brentozar.com/sql-server-performance-tuning/
Hope this helps. Cheers.
According to http://morten.lyhr.dk/2007/10/how-to-clear-sql-server-query-cache.html, you can run the following to clear the cache:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
EDIT: I checked with the SQL Server documentation I have and this is at least true for SQL Server 2000.
You can use
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
But only use this in your development environment whilst tuning the queries for deployment to a live server.
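A typical cold-cache test run on a development server might look like the sketch below. The SELECT is a hypothetical stand-in for the query being tuned; CHECKPOINT is included because DBCC DROPCLEANBUFFERS only removes clean pages.

```sql
-- Development only: write dirty pages to disk, then empty the buffer pool
-- and the plan cache so the next run starts cold.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;

-- Report I/O and timing figures for the statement under test.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical query under test.
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the physical reads and logical reads in the Messages output between a cold and a warm run shows how much of the first-run cost is disk I/O rather than plan compilation.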
I think people are running off in the wrong direction. If I understand, you want the performance to be good all the time? Are they not running fast the 2nd (and subsequent executions) and are slow the first time?
The DBCC commands above clear out the cache, causing WORSE performance.
What you want, I think, is to prime the pump and cache the data. You can do this with some startup procedures that execute the queries and load data into memory.
Memory is a finite resource, so you can't load all data, likely, into memory, but you can find a balance. Brent has some good references above to help learn what you can do here.
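One way to "prime the pump" is a startup procedure registered with sp_procoption. The sketch below makes several assumptions: dbo.WarmCache, MyDatabase and dbo.Orders are hypothetical names, and startup procedures have to live in the master database.

```sql
USE master;
GO

-- Hypothetical warm-up procedure: scanning a hot table pulls its pages
-- into the buffer pool. MyDatabase.dbo.Orders is a placeholder.
CREATE PROCEDURE dbo.WarmCache
AS
BEGIN
    DECLARE @rows bigint;

    -- INDEX(0) forces a scan of the clustered index (or heap), so the
    -- base table pages are read rather than a narrow nonclustered index.
    SELECT @rows = COUNT_BIG(*)
    FROM MyDatabase.dbo.Orders WITH (INDEX(0));
END;
GO

-- Run the procedure automatically each time the instance starts.
EXEC sp_procoption @ProcName = N'dbo.WarmCache',
                   @OptionName = 'startup',
                   @OptionValue = 'on';
```

This only helps if the tables fit comfortably in memory; otherwise the warmed pages may be evicted again before the real queries arrive.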
Query optimisation is a large subject, and there is no single answer to your question. The clues as to what to do are all in the query plan, which should be the same regardless of whether the results are cached or not.
Look for the usual things such as table scans, indexes not being used when you expect them to be used, etc. Ultimately you may have to review your data model and perhaps implement a denormalisation strategy.
From MSDN:
"Use DBCC DROPCLEANBUFFERS to test queries with a cold buffer cache without shutting down and restarting the server."

What's the best way to profile a SQL Server 2005 database for performance?

What techniques do you use? How do you find out which jobs take the longest to run? Is there a way to find out the offending applications?
Step 1:
Install the SQL Server Performance Dashboard.
Step 2:
Profit.
Seriously, you do want to start with a look at that dashboard. More about installing and using it can be found here and/or here
To identify problematic queries, start Profiler and select the following events:
TSQL:BatchCompleted
TSQL:StmtCompleted
SP:Completed
SP:StmtCompleted
Filter the output, for example, by:
Duration > x ms (for example 100ms, depends mainly on your needs and type of system)
CPU > y ms
Reads > r
Writes > w
Depending on what you want to optimize.
Be sure to filter the output enough that you don't have thousands of rows scrolling through your window, because that will impact your server performance!
It's helpful to log the output to a database table to analyse it afterwards.
It's also helpful to run Windows System Monitor in parallel to view CPU load, disk I/O and some SQL Server performance counters. Configure sysmon to save the data to a file.
Then you need a production-typical query load and data volume on your database to see meaningful values in Profiler.
After getting some output from Profiler, you can stop profiling.
Then load the stored data from the profiling table back into Profiler, and use the import menu to import the output from System Monitor; Profiler will correlate the sysmon output with your SQL Profiler data. That's a very nice feature.
In that view you can immediately identify bottlenecks in your memory, disk or CPU system.
When you have identified some queries you want to optimize, go to Query Analyzer, look at the execution plan, and try to optimize index usage and query design.
I have had good success with the Database Tuning tools provided inside SSMS or SQL Profiler when working on SQL Server 2000.
The key is to work with a GOOD sample set: track a portion of TRUE production workload for analysis, and that will get the best overall bang for the buck.
I use the SQL Profiler that comes with SQL Server. Most of the poorly performing queries I've found are not using a lot of CPU but are generating a ton of disk IO.
I tend to put in filters on disk reads and look for queries that tend to do more than 20,000 or so reads. Then I look at the execution plan for those queries which usually gives you the information you need to optimize either the query or the indexes on the tables involved.
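If the trace has been saved to a table, the same filter can be applied afterwards in T-SQL. This is a sketch: dbo.ProfilerTrace is a hypothetical table name, while TextData, Reads, Writes, CPU, Duration and StartTime are the standard trace columns.

```sql
-- Heaviest statements by reads from a saved Profiler trace.
-- dbo.ProfilerTrace is a hypothetical name for the saved trace table.
SELECT TOP (50)
       CAST(TextData AS nvarchar(4000)) AS query_text,
       Reads,
       Writes,
       CPU,
       Duration,
       StartTime
FROM dbo.ProfilerTrace
WHERE Reads > 20000
ORDER BY Reads DESC;
```

As far as I know, from SQL Server 2005 onwards the Duration column in a saved trace is stored in microseconds even though the Profiler window shows milliseconds by default, so check the units before comparing against a threshold.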
I use a few different techniques.
If you're trying to optimize a specific query, use Query Analyzer. Use the tools in there like displaying the execution plan, etc.
For your situation where you're not sure WHICH query is running slowly, one of the most powerful tools you can use is SQL Profiler.
Just pick the database you want to profile, and let it do its thing.
You need to let it run for a decent amount of time (this varies on traffic to your application) and then you can dump the results in a table and start analyzing them.
You are going to want to look at queries that have a lot of reads, or take up a lot of CPU time, etc.
Optimization is a bear, but keep going at it, and most importantly, don't assume you know where the bottleneck is, find proof of where it is and fix it.

Resources