What's changed on Azure to slow my SQL sproc down to a crawl? - sql-server

In December 2015 I deployed a small Azure web app (Web API, 1 controller, 2 REST endpoints) along with an Azure SQL database (1 table, 1.7M rows, 3 stored procedures).
I could call my rest endpoints and get data back within a few seconds. Happy days.
Now I make the same call and my app throws a 500 error. Closer examination shows the SQL access timed out.
I can open the db (using Visual Studio data tools) and run the queries and call the stored procedures. For my main sproc execution time is about 50 seconds - way too long for the app to wait.
The data in the table has not changed since deployment, and the app and db have been untouched for the last few months, so how come it ran OK back in December but fails miserably now?
All help greatly appreciated.

The Query Store is available in SQL Server 2016 and Azure SQL Database. It is a sort of "flight recorder" which records a history of query executions.
Its purpose is to help identify what has gone wrong when a query execution plan suddenly becomes slow. Unlike DMVs, the Query Store data is persisted in tables, so it isn't lost when SQL Server is restarted, and it can be retained for months.
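Enabling it is a one-line ALTER DATABASE. The options below are only an illustrative sketch, and [MyAzureDb] plus the retention values are placeholders, not anything from the question:
-- Turn on the Query Store and keep roughly three months of history.
ALTER DATABASE [MyAzureDb]
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     QUERY_CAPTURE_MODE = AUTO,
     MAX_STORAGE_SIZE_MB = 500,
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 90));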
It has four reports in SSMS. This picture shows the Top Resource Consuming Queries. The top left pane shows a bar graph where each bar represents a query, ordered by descending resource usage.
You can select a particular query of interest, then the top right pane shows a timeline with points for each execution. In this example, you can see that the query has got much worse, because the second dot is showing much higher resource usage. (Actually I forced this to happen by deliberately dropping a covering index.)
Then you can click on a particular dot and the graphical execution plan is displayed in the lower pane. So in this example, I can compare the two plans to see what has changed. The graphical execution plan is telling me there is a missing index (this feature in itself is not new), and if I clicked on the previous dot this message wouldn't appear. So that's a pretty good clue as to what's gone wrong!
The Regressed Queries report has the same format, but it shows only queries that have "regressed" or got worse. So it is ideal for troubleshooting.
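If you prefer T-SQL to the reports, a rough sketch like the following (my own illustration against the standard Query Store views, not part of the report itself) lists every plan a query has used together with its average duration, which makes a regression easy to spot:
-- Average duration per plan, per query; a query whose newer plan is much
-- slower than an older one is a likely regression candidate.
SELECT q.query_id,
       p.plan_id,
       AVG(rs.avg_duration) AS avg_duration_us,
       MAX(rs.last_execution_time) AS last_run
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, p.plan_id
ORDER BY q.query_id, avg_duration_us DESC;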
I know this doesn't resolve your present situation, unless you happened to have Query Store enabled. However it could be very useful for the future and for other people reading this.
See MSDN > Monitoring Performance By Using the Query Store: https://msdn.microsoft.com/en-GB/library/dn817826.aspx

Related

Azure SQL Query Editor vs Management Studio

I'm pretty new to Azure and cloud computing in general and would like your help in figuring out an issue.
The issue was first seen when a web page timed out because the SQL timeout (set to 30 seconds) was exceeded.
The first thing I did was connect to the production database using SQL Server Management Studio 2014 (connected to the Azure prod DB).
I ran the stored procedure used by the poorly performing page, but it returned almost instantly (in under a second). This left me confused about what could be causing the issue.
By accident I also tried running the same query in the Azure SQL query editor and was shocked that it took 29 seconds.
My main question is: why is there a difference between running the query in the Azure SQL query editor vs Management Studio? It is the exact same database.
DTU usage is at 98% and I'm thinking there is a performance issue with the stored proc, but first I want to know why the query editor runs the SP slower than Management Studio.
The current Azure DB has 50 DTUs.
Two guesses (posting query plans will help get you an answer for situations like this):
SQL Server has various session-level settings. For example, there is one to determine if you should use ansi_nulls behavior (vs. the prior setting from very old versions of SQL Server). There are others for how identifiers are quoted and similar. Due to legacy reasons, some of the drivers have different default settings. These different settings can impact which query plans get chosen, in the limit. While they won't always impact performance, there is a chance that you get a scan instead of a seek on some query of interest to you.
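One quick way to check this guess (a sketch using a standard DMV; the filter is only illustrative) is to compare the SET options of your SSMS session with those of the application's sessions:
-- Differences in SET options (ARITHABORT is the classic one) between SSMS and
-- the application's driver can produce different cached plans for the same query.
SELECT session_id, program_name,
       ansi_nulls, quoted_identifier, arithabort, ansi_padding
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;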
The other main possible path for explaining this kind of issue is that you have a parameter sniffing difference. SQL's optimizer will peek into the parameter values used to pick a better plan (hoping that the value will represent the average use case for future parameter values). Oracle calls this bind peeking - SQL calls it parameter sniffing. Here's the post I did on this some time ago that goes through some examples:
https://blogs.msdn.microsoft.com/queryoptteam/2006/03/31/i-smell-a-parameter/
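As an illustration only (dbo.Orders, dbo.GetOrdersByCustomer and @CustomerId below are made-up names, not the asker's schema), the usual experiments for a sniffing problem look something like this:
CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId int
AS
BEGIN
    -- OPTION (RECOMPILE) builds the plan for the actual parameter value on
    -- every call, which removes the sniffing effect at some compile-time cost.
    SELECT OrderId, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);
    -- Alternative: OPTION (OPTIMIZE FOR UNKNOWN) plans for an "average" value
    -- instead of the sniffed one.
END;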
I recommend you do your experiments and then look at the query store to see if there are different queries or different plans being picked. You can learn about the query store and the SSMS UI here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
For this specific case, please note that the query store exposes those different session-level settings using "context settings". Each unique combination of context settings will show up as a different context settings id, and this will inform how query texts are interpreted. In query store parlance, the same query text can be interpreted different ways under different context settings, so two different context settings for the same query text would imply two semantically different queries.
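For example, a sketch like this against the documented Query Store catalog views shows whether the same statement text is being tracked under more than one context settings id:
-- One row per (query text, context settings) pair; more than one row for the
-- same text means it ran under different SET options (e.g. SSMS vs query editor).
SELECT qt.query_sql_text,
       q.query_id,
       q.context_settings_id,
       cs.set_options
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_context_settings AS cs ON cs.context_settings_id = q.context_settings_id
ORDER BY qt.query_sql_text, q.context_settings_id;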
Hope that helps - best of luck on your perf problem

Best Way to Pull in Live Data From 'Root' Database On Demand

Let me start by apologizing as I'm afraid this might be more of a "discussion" than an "answerable" question...but I'm running out of options.
I work for the Research Dept. for my city's public schools and am in charge of a reporting web site. We use a third-party vendor (Infinite Campus/IC) solution to track information on our students -- attendance, behavior, grades, etc. The IC database sits in a cloud and they replicate the data to a local database controlled by our IT Dept.
I have created a series of SSIS packages that pull in data each night from our local database, so the reporting data is through the prior school day. This has worked well, but recently users have requested that some of the data be viewed in real-time. My database sits on a different server than the local IC database.
My first solution was to create a linked server from my server to the local IC server, and this was slow but worked. Unfortunately, this put a strain on the local IC database; my IT Dept. freaked out and told me I could no longer do that.
My next & current solution was to create an SSIS package that would be called by a stored procedure. The SSIS package would query the local IC database and bring in the needed data to my database. This has been working well and is actually much quicker than using the linked server. It takes about 30 seconds to pull in the data, process it and spit it out on the screen as opposed to the 2-3 minutes the linked server took. It's been in place for about a month or so.
Yesterday, this live report turned into a parking lot -- the report says "loading" and just sits like that for hours. It eventually will bring back the data. I discovered the department head that I created this report for sent out an e-mail to all schools (approximately 160) encouraging them to check out the report. As far as I can tell, about 90 people tried to run the report at the same time, and I guess this is what caused the traffic jam.
So my question is...is there a better way to pull in this data from the local IC database? I'm kind of limited with what I can do, because I'm not in our IT Dept. I think if I presented a solution to them, they may work with me, but it would have to be minimal impact on their end. I'm good with SQL queries but I'm far from a db admin so I don't really know what options are available to me.
UPDATE
I talked to my IT Dept about doing transactional replication on the handful of tables that I needed, and as suspected it was quickly shot down. What I decided to do was set up an SSIS package that is called via Job Scheduler and runs every 5 minutes. The package only takes about 25-30 seconds to execute. On the report, I've put a big "Last Updated 3/29/2018 5:50 PM" at the top of the report along with a message explaining the report gets updated every 5 minutes. So far this morning, the report is running fantastically and the users I've checked in with seem to be satisfied. I still wish my IT team was more open to replicating, but I guess that is a worry for another day.
Thanks to everybody who offered solutions and ideas!!
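For anyone who wants to copy the every-5-minutes approach described above: assuming the package is kicked off by a SQL Server Agent job (the job name below is a placeholder), the schedule portion is roughly:
-- Attach a run-every-5-minutes schedule to an existing Agent job.
EXEC msdb.dbo.sp_add_jobschedule
    @job_name = N'Refresh IC Live Data',  -- placeholder job name
    @name = N'Every 5 minutes',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every day
    @freq_subday_type = 4,       -- unit: minutes
    @freq_subday_interval = 5,   -- every 5 minutes
    @active_start_time = 0;      -- HHMMSS, 0 = midnight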
One option which I've done in the past is an "ETL on the Fly" method.
You set up an SSIS package as a data flow, but it writes to a DataReader Destination. This then becomes the source for your SSRS report. In effect, this means that when the SSRS report is run it automatically runs the SSIS package and fetches the data; it can pass parameters into the SSIS package as well.
There's a bit of extra config involved but this is straightforward.
This article goes through it -
https://www.mssqltips.com/sqlservertip/1997/enable-ssis-as-data-source-type-on-sql-server-reporting-services/

How can I find out why an Azure SQL Database is restarting/resetting periodically?

Over the past week or two, we've seen four cases where our Azure SQL Database DTU graph ends up looking like this:
That is, it seems to "restart" (note that the graph consistently shows 0 DTUs before the spike, which was definitely not the case because we have constant traffic on this server). This seems to indicate that the DTU measurements restarted. The large spike, followed by the subsequent decaying and stabilizing DTU values, seems to indicate to us that the database is "warming up" (presumably doing things like populating caches and organizing indexes). The traffic to the web app that accesses this database showed nothing abnormal over the same time period, so we don't have any reason to think that this is a result of "high load".
The "Activity Log" item in Azure doesn't show any information. Looking at the "Resource Health" of our database, however, we saw the following:
Note the "A problem with your SQL database has been resolved" message. The timestamp, however, doesn't exactly correspond to the time of the spike above (the graph shows UTC+1 time, and presumably the resource-health timestamp is in UTC, so there's about a 1.15 hr difference).
Clicking on "View History" gives us all such events for the past couple of weeks:
In each case the database is "available" again within the refresh-granularity (2 minutes), again suggesting restarts. Interestingly, the restarts are around 4 days apart in each case.
Of course I expect and understand that the database will be moved around and restarted from time to time. Our web app is ASP.NET Core 2.0 and uses connection resiliency, so we don't have any failing requests.
That said, considering that this has been happening relatively frequently in the last few weeks, I'm of course wondering if this is something that needs action from our side. We did, for example, upgrade to Entity Framework Core 2.0 around 5 weeks ago, so I'm slightly concerned that that might have something to do with it.
My questions:
Is there any way to know for sure that the database server restarted? Is this information stored in the database itself anywhere, or perhaps on the master database?
Is there any way to know the reason for such restarts, and whether or not it's "our fault" or simply a result of hosting-environment changes? Does the Azure team make such information publicly available anywhere?
The database is on S3 Standard level (100 DTUs) and is hosted in South-East Asia. It's around 3.5GB in size.
Please enable the Query Store to identify the queries and statements involved in those spikes you see on the DTU consumption graph.
ALTER DATABASE [DB1] SET QUERY_STORE = ON;
Then use a query like the one below to identify long-running queries and the tables involved in them. The names of the tables may give you an idea of what is creating those spikes.
SELECT TOP 10
    rs.avg_duration,
    qt.query_sql_text,
    q.query_id,
    qt.query_text_id,
    p.plan_id,
    GETUTCDATE() AS CurrentUTCTime,
    rs.last_execution_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
WHERE rs.last_execution_time > DATEADD(hour, -1, GETUTCDATE())
ORDER BY rs.avg_duration DESC;
About the downtimes logged in Resource Health: it seems they are related to maintenance tasks because they occur every 4 days, but I will report it to the SQL Azure team and try to get feedback.

How to view ActualNumberOfRows in SQL Server execution plan

I've been trying to diagnose a performance issue in my database and have googled a lot about MAXDOP. I have seen in many places that ActualNumberOfRows, ActualRebinds, etc. are shown in the Properties view, but the first thing I see is DefinedValues.
After running the query with the actual execution plan included, I right-click an Index Scan, for example, and expect to see these fields so I can determine how rows are distributed amongst threads.
I am using SQL Server 2005 Enterprise.
Include the Actual Execution Plan and, in the resulting plan, click on one of the arrows between operators; there you can see the Actual Number of Rows.
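If the tooltip or Properties pane still doesn't show the per-thread breakdown, the raw plan XML always contains it; a quick way to capture it (dbo.SomeLargeTable is only a placeholder query) is:
-- Returns the actual execution plan as XML alongside the results.
-- In the XML, each operator has a <RunTimeInformation> element with one
-- <RunTimeCountersPerThread Thread="n" ActualRows="..."> entry per thread,
-- which shows how the rows were distributed across parallel threads.
SET STATISTICS XML ON;
SELECT COUNT(*) FROM dbo.SomeLargeTable;
SET STATISTICS XML OFF;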

SQL Server 2005 - Queries going into suspended state immediately

I'm having a problem with an ad-hoc query that manages a fairly high amount of data. Upon executing the query, the status immediately goes into suspended state. It will stay suspended for around 25 minutes and then complete execution.
I have a mirror environment with SQL2K and the same query executes in around 2 minutes and never goes into suspended state.
@@VERSION =
Microsoft SQL Server 2005 - 9.00.3068.00 (Intel IA-64) Feb 26 2008 21:28:22 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
Perhaps the statistics are out of date and need updating.
Update them, but it's better to rebuild the indexes at the same time.
Or, you don't have any. Are stats set to create and update automatically?
I've seen cases where they're switched off because someone does not understand what they are for or how updates happen.
Note: the sampling rate of stats is based on the last stats update. So if you last sampled 100%, it may take some time.
What happens when you run the query twice? Is it quicker the second time?
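To check the settings and refresh the statistics, something along these lines works (the database and table names are placeholders, not anything from the question):
-- Are auto create/update statistics switched on for the database?
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = N'YourDatabase';

-- When were the statistics on a suspect table last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.YourBigTable');

-- Refresh them with a full scan (rebuilding the indexes does the same for
-- index statistics).
UPDATE STATISTICS dbo.YourBigTable WITH FULLSCAN;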
It's hard to tell from the limited information, but I'd be curious to know what's happening from a performance perspective on the server while the query is running. You can capture performance metrics with Perfmon, and I've got a tutorial about it here:
http://www.brentozar.com/perfmon
While the query's running, what do the statistics for each of those counters look like? If you capture the statistics as described in that article, you can email 'em to me at brento@brentozar.com and I'll take a look at 'em to see what's going on.
Another thing that'd help is the execution plan of the query. Go into SQL Server Management Studio, put the query in, and click Query, Display Estimated Execution Plan. Right-click anywhere on the plan and save it as a file, and then other people can see what the query looks like.
Then ideally, click Query, Include Actual Execution Plan, run the query, and then go to the Execution Plan tab. Save that one too. If you post the two plans (or email 'em to me) you'll get better answers about what's going on.
