Azure SQL connection timeouts (non-transient) - database

I know there are posts around Azure SQL connection timeouts, but have not found the following case.
I've been using Azure SQL (S3 plan). Normally the DTU is very low, and there are no timeouts when apps connect to this DB.
The problem starts when we run batch jobs against the DB, such as updating certain column value for millions of rows. It may take hours to complete these batch jobs. During this period, the DTU value reaches the max and other apps fail with timeouts.
Are there guidelines on what should be done? Here are options I thought of.
Upgrade to higher tier. This option likely works, but not attractive as the DTU is usually very low.
Increasing timeouts for the apps that connect to DB. Not sure if this works, because timeout would have to be a very long time.
If there is a way to allocate a certain portion of DTU to the batch job (say 70%) and always keeps some DTU left for others, that'd be ideal, but I don't think it's possible. Any suggestion would be appreciated!

Increasing timeouts is rarely a true solution, and can often make things worse.
First see if the operations being performed by the batch job can be made more efficient, by reviewing the query execution plans for missing or insufficient indexes, inefficient query logic, opportunities for caching, etc.
You could add a configuration setting and self-throttling logic to your batch job that allows you to control how many operations it may perform in a given timeframe, and then use that to determine what works best in your situation.
Maybe an easier option would be to just add a step to the beginning of your batch job that temporarily scales-up the database to a higher pricing tier when it starts, and then scales it back down when it finished.
ALTER DATABASE MyDatabaseName MODIFY (SERVICE_OBJECTIVE = 'P4')

You can upgrade to higher tier before running batch job and then downgrade back to S3.

Related

SQL Server CXPACKET timeout

We've got SQL Server 2016 (v13.0.4206.0), by default there is no restrictions for parallelism - any count SQL wants. And it didn't lead any problems... Till now.
For another feature there were written query that unexpectedly raised timeout exception in our application. I was deeply surprised when it was successfully executed with setting up maximum threads per query to 1. Yes, 6 seconds for query is not so good, even accounting to most of time was spent for fetching, but it's far away from 3 minutes timeout!
By the way, executing this query with SQL Server Management Studio works all the time despite of parallelism settings. It seems that something wrong with connection to database, but all other queries works fine, even which much harder then that one.
Our application is built on ASP.NET Core 3.0 (don't know if it matters), database connection is made using System.Data.SqlClient v4.8.0. All I could determine is that there are so much tasks created for this query:
I've tried to watch for execution in sys.dm_os_waiting_tasks (thanks google). I'm not sure I got it right, but it seems that tasks with context_id 0-8 is blocked with those who have context_id 9-16 and vise versa. Obvious example of deadlock, isn't it? But how can SQL Server manage threads to make it without my "help"? Or what am I doing wrong?
Just in case some inappropriate answers:
I won't turn parallelism off (set maximum threads per query to 1) as solution because of some heavy queries in our application;
I don't want to raise Cost Threshold for Parallelism setting because I'm afraid of same problem with another query (guess, a heavier one). So I just want to determine real cause;
Optimizing the query isn't considered (anymore), as according to actual execution plan I can't make it faster - there are enough indexes for it. But I'm ready to rethink after some really weighty arguments.
So, my question is: why does parallelism that I didn't ask for spoil the query execution? And how can I avoid that?
It's true sometimes the engine chooses to use parallel execution (or not to use) which leads to worse performance.
You do not want to control the server option and the cost as you are not sure how this will reflect to other queries, which is understandable.
If you are sure, your query will be execute better without being handle in parallel, you can specify the option just for it using query hints - MAXDOP like this:
SELECT ...
FROM ...
OPTION (MAXDOP 1);
It's easy and you can rollback if needed. Also, you are not affecting other queries.
You are saying that:
Optimizing query isn't considered (anymore), as according to actual execution plan...
The execution plan is sometimes misleading. As a start - you can save your execution plan and open it with SentryOne Plan Explorer - it's free and can give you a better look of what's going on.
Also, if a query is execute for either 3 seconds or 6 minutes, there must be something wrong with it or may be the activity of your database. If it is executed fast in the SSMS always, maybe the engine is using the correct cache plan. I thing it's better to share the query itself and to attach the two plans (serial and parallel) and spend more time tuning it.

How many SQL jobs a sql server can handle?

I am creating a database medical system and then I came to a point where I am trying to create a notification feature and i will use SQL jobs in it, where the SQL job responsibility is to check some tables and the entities that will find it need to be notified for a change in certain data will put their ids in an entity called Notification and a trigger will be called for the app to check that table and send the notificiation.
what I want to ask is how many SQL jobs can a sql server handle ?
Does the number of running SQL jobs in background affect the performance of my application or the database performance in a way or another ?
NOTE: the SQL job will run every 10 seconds
I couldn't find any useful information online.
thanks in advance.
This question really doesn't have enough background to get a definitive answer. What are the considerations?
Do the queries in your ten-second job actually complete in ten seconds, even when your DBMS is under its peak transactional workload? Obviously, if the job routinely doesn't complete in ten seconds, you'll get jobs piling up.
Do the queries in your job lock up tables and/or indexes so the transactional load can't run efficiently? (You should use SET ISOLATION LEVEL READ UNCOMMITTED; as much as you can so database reads won't lock things unnecessarily.)
Do the queries in your job do a lot of rows' worth of inserts and updates, and so swamp the SQL Server transaction logs?
How big is your server? (CPU cores? RAM? IO capacity?) How big is your database?
If your project succeeds and you get many users, will your answers to the above questions remain the same? (Hint: no.)
You should spend some time on the execution plans for the queries in your job, and try to make them as efficient as possible. Add the necessary indexes. If necessary refactor the queries to make them more efficient. SSMS will show you the execution plans and suggest appropriate indexes.
If your job is doing things like deleting expired rows, you may want to build the expiration in your data model. For example, suppose your job does
DELETE FROM readings WHERE expiration_date >= GETDATE()
and your application does this, relying on your job to avoid getting expired readings.
SELECT something FROM readings
You can refactor your application query to say
SELECT something FROM readings WHERE expiration_date < GETDATE()
and then run your job overnight, at a quiet time, rather than every ten seconds.
A ten-second job is not the greatest idea in the world. If you can rework your application so it will function correctly with a ten-second, ten-minute, or twelve-hour job, you'll have a more resilient production system. At any rate if something goes wrong with the job when your system is very busy you'll have more than ten seconds to fix it.

What is XE_FILE_TARGET_TVF wait type in Azure SQL Database?

I'm experiencing periodical Azure SQL Database connection slow downs. As recommended in Wait statistics, or please tell me where it hurts article I ran sys.dm_db_wait_stats (analogue of sys.dm_os_wait_stats for Azure SQL Database) aggregation script which show me that longest waits are of type XE_FILE_TARGET_TVF. Average wait time is 54 seconds.
XE_FILE_TARGET_TVF didn't mentioned in documentation as in any other online resources that I know. I suspect that "XE" means "Extended Events", "TVF" is "table-valued functions", "FILE_TARGET" is probably indicator that something is being written to some file.
So, what kind of wait it is?
Use Azure SQL Database Query Performance Insight for troubleshooting SQL Azure DBs. To begin with, wait stats alone are not a correct performance troubleshooting approach. Read How to analyse SQL Server performance for a more thorough approach, learn to analyze also where the CPU is spent, not only where the elapsed time is spent waiting. Aggregate wait stats are often misleading, filtering out the 'benign' wait stats is a chasing the red lights game.
Unfortunately not everything is actionable in SQL Azure DB environment. Start with the Query Performance Insight to see if you can correlate the performance issues with queries issued by your app. Use the SQL Database Index Advisor to get index recommendations for your workload.
If you cannot find the application problems and suspect this is caused by the platform, you will have to open a support case. Twitting #AzureSupport is very effective in getting help.
To answer your question: a lot of Azure SQL DB monitoring relies on Extended Events and they store data to files. This specific wait stat is unlikely to be related to the cause of your performance problems.
I believe that it's a async process so possibly nothing to worry around and could be a red herring. However I think this is a fabric related task behind the DB. Does this event seem to be consistently causing you issues?

How to debug slowdown on a SQL Azure server?

We've had a SQL Azure cloudapp/database in production for a long time and while its performance has been a little volatile, over the last few days it has suddenly dropped drastically. Our application is unresponsive because SQL queries and stored procedures that used to take 5-10 seconds are now taking 90 seconds or more.
What are the things I should check, given that we already do regular index rebuilds/reorgs, clear down large tables when we're finished, etc.
We're still on the "Web" service tier and are planning to move soon to the newer S2 perhaps but we need to tackle this issue.
1) How many active connections does your SQL Azure DB have during slow times? Things get wierd once you get into 150+ range on a shared plan.
If you have a ton of connections open, that means you're not properly clearing them in your app somewhere.
2) Does your DB have any blocking queries? DBs with alot of blocking (deadlocking) queries may behave much slower, if you need access to locked resources
3) You should really consider switching to a dedicated SQL Azure plan. It is very quick to do and no action is required on the app-dev side. http://azure.microsoft.com/blog/2014/07/08/azure-update-sql-database-easy-upgrade-to-new-service-tiers-performance-improvements-pitr-for-basic-and-automated-export-for-all-service-tiers/
4) If neither helps, contact support. This could be an issue on their end
5) Once immediate problems are resolved, consider active monitoring of your SQL Azure db's (link in my profile signature)
http://www.developer.com/services/how-to-identify-performance-bottlenecks-on-azure-sql-database.html
You could also have a device in your network that is slowing down the performance. You might want to run some network tests to see if the problem is internal or external. For instance, someone might have changed some firewall or security settings on a rollout and messed it up a bit or a device might be ready to fail.

Would it ever be wise to have a SQL server per web server?

I'm wondering if, under the circumstances that
You get lots more reads than writes
Your SQL server of choice is cheap/free and offers a fast mirroring/replication service
Your database isn't insanely large
rather than having separate SQL servers it would be better to have an instance of SQL on each machine getting instant updates from the master. This way there would be no network latency when doing all the read queries, but there would be a per box performance hit as the SQL instance has to execute. Would this be better overall for performance? Are there any other pros/cons that might come up?
Your SQL Server should always be on a different box to the webserver, of that there is no question.
How many DB servers and webservers you have, and how they mirror (or otherwise) is up to how you scale your application.
You have SQL Server on a different machine because it needs (and deserves) a lot of RAM.
It's quite a common architectural pattern to have read-only replicas of a database. We accept some degree of stalesness in them, perhaps they are even only updated once a day.
The general rule will be that multiple copies will introduce complexity in terms of operations and management and tend to introduce the possibilities of inconsistency of data - almost inevitably the copies will not be perfectly is step (or the costs of making them soo will be too high.)
An example: what happens if your replication processing breaks a bit. So that some, but not all copies become stale. Now your users start to see radically different views of the world. How much might that matter to you? If it's a site with low value data (eg. celebrity sightings in London suberbs) then perhaps that's fine. If it's on hand inventory, and being out of date means that your customers can't place orders, then maybe you care rather more.
My advice: things that sound simple at a boxed on paper sort of level don't always work out that way when you're sitting in an operations room at 3AM. Be very sure that you can easily operate your solution.
How would your SQL Server be cheap/free? I should have said the licensing costs for this setup would be crippling. At retail prices you're looking at $6000 per server. See also Jeff's comments about costs. Scale out the web servers by all means, but not your SQL Server until it's pretty much on its' knees.
You might instead want to think about a distributed cache like Velocity or NCache.
Either way, run your site first with one SQL server and see how it copes with the load, then think about mirroring/replication across servers, otherwise you're just optimising prematurely. Measure first!
An immediate con is that there is no distributed lock co-ordinator in SQL Server so you can get merge conflicts as updates can change the same row on two different servers at the same time.
Depending on the size of the database and the disks in the web servers, you will find your network latency is smaller than the disk latency you will start suffering as the web server disks will not usually be as performant as the disk array you give to the database. If you wanted that kind of performance, you would be buying it per web server.
Replication performance is not without latency either, the distribution of the transactions isn't 'free' and careful maintenance of the transaction log would have to be planned to ensure you did not get log fragmentation (too many vlog's wthin the transaction log) which kills replication performance.

Resources