We have some DB instances on Azure, and I am trying to optimize DB performance there. Can anyone explain what an Azure DTU is and how we can calculate it?
1. What is Azure DTU:
Azure SQL Database provides two purchasing models: vCore-based purchasing model and DTU-based purchasing model.
The DTU-based purchasing model is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools.
The Database Transaction Unit (DTU) represents a blended measure of CPU, memory, reads, and writes. The DTU-based purchasing model offers a set of preconfigured bundles of compute resources and included storage to drive different levels of application performance: Basic, Standard and Premium.
For more details, please see: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu
2. How to optimize DB Performance:
Since your DB instances are already on Azure, you can monitor them in the Azure Portal and improve performance either by troubleshooting the workload or by changing the service tier. Please see: Monitoring and performance tuning: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-monitor-tune-overview#improving-database-performance-with-more-resources
3. How to calculate Azure DTU:
Here is a link to the Azure SQL Database DTU Calculator. This calculator helps determine the number of DTUs for an existing SQL Server database, and recommends the minimum performance level and service tier needed before migrating to Azure SQL Database.
If you still have the database backup file, I think you can try this calculator.
Please see: http://dtucalculator.azurewebsites.net/
The main consideration with Azure SQL Database is to meet the performance requirements of the deployed database at the minimum cost. Nobody wants to pay for redundant resources or features that they do not use or plan to use.
At this point, Microsoft Azure offers two different purchasing models to provide cost-efficiency:
Database Transaction Unit (DTU)-Based purchasing model.
Virtual Core (vCore)-Based purchasing model
The purchasing model decision directly affects both database performance and the total bill. In my opinion, if the deployed database will not consume too many resources, the DTU-based purchasing model will be more suitable.
Now, we will discuss the details about these two purchasing models in the following sections.
Database Transaction Unit (DTU)-Based purchasing model
To understand the DTU-based purchasing model more clearly, we first need to clarify what a DTU means in Azure SQL Database. DTU is an abbreviation for "Database Transaction Unit" and it describes a performance unit metric for Azure SQL Database. We can liken the DTU to the horsepower of a car because it directly affects the performance of the database. A DTU represents a mixture of the following performance metrics as a single performance unit for Azure SQL Database:
CPU
Memory
Data I/O and Log I/O
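As a rough illustration of how these dimensions collapse into one number: as far as I know, the DTU percentage that Azure reports for a database is derived as the highest of the CPU, data I/O and log write utilization percentages (memory does not appear in that reported figure). A minimal Python sketch of that idea, with made-up sample values and a hypothetical function name:

```python
def dtu_percent(cpu_percent: float, data_io_percent: float, log_write_percent: float) -> float:
    """Return DTU utilization as the highest of the three resource percentages."""
    return max(cpu_percent, data_io_percent, log_write_percent)


# Made-up sample: the workload is I/O bound, so data I/O sets the DTU percentage.
sample = {"cpu_percent": 35.0, "data_io_percent": 80.0, "log_write_percent": 10.0}
print(dtu_percent(**sample))
```

The practical consequence is that a database can hit its DTU limit with the CPU mostly idle if one of the I/O dimensions is saturated.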
Elastic Pool
Briefly, an Elastic Pool helps us automatically manage and scale multiple databases that have unpredictable and varying resource demands on top of a shared resource pool. With an Elastic Pool, we don't need to continuously rescale individual databases as their resource demand fluctuates. The databases in the pool consume the pool's resources when they need them, but together they cannot exceed the pool's resource limits, which is what makes it a cost-effective solution.
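To see why pooling can be cheaper, here is a small sketch with hypothetical hourly DTU demand for three databases whose peaks fall at different times. The numbers and names are invented for illustration only; real sizing should be based on measured utilization.

```python
# Hypothetical DTU demand per database over four consecutive hours.
demand = {
    "db_a": [10, 80, 10, 10],
    "db_b": [10, 10, 70, 10],
    "db_c": [60, 10, 10, 10],
}


def sum_of_individual_peaks(demand):
    """DTUs needed if every database is sized separately for its own peak."""
    return sum(max(series) for series in demand.values())


def peak_of_combined_demand(demand):
    """eDTUs needed if the databases share one pool sized for the combined peak."""
    return max(sum(values) for values in zip(*demand.values()))


print(sum_of_individual_peaks(demand))  # 210
print(peak_of_combined_demand(demand))  # 100
```

Because the peaks do not coincide, the pool needs far fewer units than the separately sized databases would. If all databases peak at the same time, the two numbers converge and the pool brings little benefit.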
Properly Estimating the DTUs for Azure SQL Database
After deciding to use the DTU-based purchasing model, we have to answer the following question with sound reasoning:
Which service tier and how many DTUs are required for my workload when migrating to Azure SQL?
The DTU Calculator is the main tool for estimating the DTU requirement when migrating on-premises databases to Azure SQL Database. The idea behind the tool is to capture the utilization of the metrics that affect DTUs on the existing SQL Server, and then to estimate the approximate DTUs and service tier from the collected utilization data. The DTU Calculator collects the following metrics through either a command-line utility or a PowerShell script and saves them to a CSV file.
Processor - % Processor Time
Logical Disk - Disk Reads/sec
Logical Disk - Disk Writes/sec
Database - Log Bytes Flushed/sec
Extracted from https://www.spotlightcloud.io/blog/what-is-dtu-in-azure-sql-database-and-how-much-do-we-need.
Check this well-written article on calculating DTU
Credit to original author : Esat Erkec
I am new to Azure and I am currently exploring the possibility of migrating my database from a virtual private server to an Azure-hosted SQL database. I can see the potential advantages of moving to Azure: less maintenance, cloud hosted, cheaper.
For my VPS I currently pay £36 every month, which is a fixed payment. However, using the Azure pricing calculator I can see that a standard tier would cost about £14, which is a huge saving. My only issue is that I have chosen the DTU model. Now I am worried that one month it may be fine, but another month the cost may spike. The reason I have been hesitant to migrate is that I am fine with paying £36 a month, knowing I can use less or more and still be at a fixed cost.
On the other hand, using Azure with DTUs, there is no guarantee that I will have the same cost each month; it may potentially be higher.
My question is: can someone explain DTU for me, and is there a way I can make sure my cost stays low without any surprise costs in the future?
In DTU-based SQL purchase models, a fixed set of resources is assigned to the database or elastic pool via performance tiers: Basic, Standard and Premium. This model is best for customers who prefer the simplicity of fixed payments each month, where the simplicity of pre-configured options is desired.
You first need to measure your resource utilization to check how many DTUs are enough for you.
Measure the following utilization metrics for at least an hour so the calculator can analyze utilization over time to provide you the best recommendation:
Processor - % Processor Time
Logical Disk - Disk Reads/sec
Logical Disk - Disk Writes/sec
Database - Log Bytes Flushed/sec
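Before feeding the captured counters to the calculator, it can help to glance at their averages and peaks yourself. The sketch below assumes a hypothetical CSV layout with one row per sample and one column per counter; the column names are my own invention, and the files produced by the calculator's collection scripts may be laid out differently.

```python
import csv
import io
import statistics

# Hypothetical CSV layout: one row per sample, one column per counter.
# The real collection scripts may use different column names.
SAMPLE_CSV = """cpu_percent,disk_reads_per_sec,disk_writes_per_sec,log_bytes_flushed_per_sec
12.5,40,15,20000
55.0,120,60,90000
30.0,80,30,45000
"""


def summarize(csv_text):
    """Return the mean and max of every counter column in the CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [float(row[column]) for row in rows]
        summary[column] = {"mean": statistics.mean(values), "max": max(values)}
    return summary


for name, stats in summarize(SAMPLE_CSV).items():
    print(f"{name}: mean={stats['mean']:.1f} max={stats['max']:.1f}")
```

A large gap between the mean and the max suggests a spiky workload, in which case sizing for the average alone would likely lead to throttling at peak times.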
Refer to this document to calculate the database resource consumption. Once you have the output, click the calculate button to view your recommended Service Tier/Performance Level and DTUs on the same page.
Your consumption cost will be as per given image below:
You can also check the pricing based on your calculated DTUs here.
I am a Data Warehouse developer currently looking into using the Azure platform to host a new Data Warehouse.
My experience is with using on premise servers hosting standard SQL Server Databases, one for the staging database and one for the Data Warehouse. Typically I would use a combination of SSIS and stored procedures running in a scheduled SQL server agent job for the ETL.
How can I replicate this kind of setup within Azure?
The storage size will be less than 1TB so could I just use Azure SQL Server Database over Azure SQL Data Warehouse?
If so would I need separate databases for staging and the data warehouse using the elastic pool option?
The data that I will be loading into staging will all be on premise. Will SSIS still be suitable for loading to Azure or will Azure Data Factory be a better fit?
Any help at all would be greatly appreciated! Thanks.
Leon has lots of good information there. But from a Data Warehouse perspective, I wouldn't use Data Sync for ETL purposes (mentioned as "not preferred" in the link Leon provided, Data Sync, in the list "When to use Data Sync").
For DW, Azure DB is a good option. Azure SQL Data Warehouse (known as Azure Synapse Analytics nowadays) is a heavy duty beast for handling DW. Are you really sure you need this kind of system with < 1 TB of data? I'd personally leave Azure Synapse for now and try Azure DB first. It's a LOT cheaper and you can upgrade later if necessary.
One thing to note about Azure DB though: Azure DB doesn't support cross-database queries. That's not a deal breaker though, everything can be handled in the same database. I personally use a schema to differentiate staging from the DW (and of course I use other schemas in the DW as well). It's not very difficult to use separate databases of course, but the border between them is a lot deeper in Azure DB than in on-premises SQL Server or other Azure solutions (Managed Instance for example).
SSIS is still an option, but the problem is, what do you use to run the packages? There are options like:
continue running them from on-premise (all the hard work is still done in the cloud)
rent a VM with SQL Server from Azure, deploy the packages to the VM and run them from VM
use Data Factory to run the SSIS packages
None of those are a perfect solution for every use case. First two options come with quite a heavy cost, if running SSIS is the only thing you need them for. Using Data Factory to run SSIS is a bit cumbersome at the moment, but it's an option anyway.
Data Factory itself is a good option as well (I haven't personally tried it, but I have heard good things about it). If you use Data Factory to run your SSIS, why not start using Data Factory without SSIS packages in the first place? Of course Data Factory has some limitations compared to SSIS which might be the reason, but if your SSIS packages are simple enough, why not give Data Factory a try.
I would suggest using Azure SQL Database. It provides many pricing tiers with different storage sizes, so you can select the one that suits you best. Azure SQL Database also supports scaling up and down based on usage.
Ref: Service tiers in the DTU-based purchase model
And as you said, the data that you will be loading into staging will all be on-premises.
Azure SQL Database has a feature, Data Sync, that can help you do that:
Data Sync is useful in cases where data needs to be kept updated across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data
synchronized between your on-premises databases and Azure SQL
databases to enable hybrid applications. This capability may appeal
to customers who are considering moving to the cloud and would like
to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate
different workloads across different databases. For example, if you
have a large production database, but you also need to run a
reporting or analytics workload on this data, it's helpful to have a
second database for this additional workload. This approach minimizes
the performance impact on your production workload. You can use Data
Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several
regions and even several countries/regions. To minimize network
latency, it's best to have your data in a region close to you. With
Data Sync, you can easily keep databases in regions around the world
synchronized.
When you create the SQL database, you can migrate the schema or data to Azure with many tools, such as Data Migration Assistant (DMA).
Then set up SQL Data Sync between Azure SQL Database and the on-premises SQL Server; it will sync the data automatically every 5 minutes.
Hope this helps.
If you want to start on the less expensive options in Azure, go with a general purpose SQL database and an Azure Data Factory pipeline with a few activities.
Dynamic Resource Scaling ETL
You can scale up the database by issuing an alter database statement and then move onto your stored proc based ETL. I would even use a "master" proc to call the dimension and fact proc's to control the execution flow. Then scale down the database with another alter database statement. I even created my own stored proc to issue these scaling statements.
You also cannot predict when the scaling will be completed, so I have a wait activity. You could be a little more nerdy with a loop that checks the service objective property and then proceeds when it is complete. But it was just easier to wait for 10 minutes. I have only been burnt a couple times when the scaling took longer.
Data Pipeline Activities:
Scale up, proceed if successful
Wait about 10 minutes, proceed always
Execute the ETL, proceed always
Scale down
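The "more nerdy" polling loop mentioned above could look roughly like the following Python sketch, offered as one possible shape rather than what my pipeline actually does. The scaling statement is ordinary T-SQL (ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...)). The get_current_objective callable is a hypothetical hook that you would implement yourself, for example by running SELECT DATABASEPROPERTYEX('<db>', 'ServiceObjective') through whatever database driver you use. The database name and tier names here are placeholders.

```python
import time


def scale_statement(database: str, service_objective: str) -> str:
    """Build the T-SQL that requests a new service objective for the database."""
    return f"ALTER DATABASE [{database}] MODIFY (SERVICE_OBJECTIVE = '{service_objective}');"


def wait_for_objective(get_current_objective, target, poll_seconds=30,
                       timeout_seconds=1800, sleep=time.sleep):
    """Poll until the database reports the target objective or the timeout passes.

    get_current_objective is a hypothetical callable supplied by the caller;
    it should return the current service objective as a string.
    """
    waited = 0
    while waited <= timeout_seconds:
        if get_current_objective() == target:
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False


# Demo with fake readings instead of a real database connection.
readings = iter(["S0", "S0", "S3"])
print(scale_statement("MyDw", "S3"))
print(wait_for_objective(lambda: next(readings), "S3", poll_seconds=1, sleep=lambda s: None))
```

Compared with a fixed 10 minute wait, this proceeds as soon as the scaling is done and fails explicitly on timeout instead of running the ETL on the smaller tier.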
Elastic Query
You can query across databases with vertical partition Elastic Query. Performance isn't great, and they don't recommend it for ETL, but it will work. To improve performance try dumping any large table you need into a temp table and then transform the data locally.
We have around 300 million documents on a drive. We need to delete around 200 million of them. I am going to write the 200 million paths to some storage so I can keep track of deleted documents. My current thought is that an Azure SQL database is probably not very well suited for this amount. Cosmos DB is too expensive. Storing CSV files is bad, because I need to do updates every time I delete a file. Table storage seems to be a pretty good match, but does not offer group-by operations that could come in handy when doing status reports. I don't know much about Data Lake, whether you can do fast updates or whether it is more like an archive. All input is welcome for choosing the right storage for this kind of reporting.
Thanks in advance.
According to your need, you can use Azure Cosmos DB or Azure table storage.
Azure Table Storage offers a NoSQL key-value store for semi-structured data. Unlike a traditional relational database, each entity (such as a row - in relational database terminology) can have a different structure, allowing your application to evolve without downtime to migrate between schemas.
Azure Cosmos DB is a multi-model database service designed for global use in mission-critical systems. Not only does it expose a Table API, it also offers SQL, Apache Cassandra, MongoDB and Gremlin APIs. These allow you to easily swap out existing databases for a Cosmos DB implementation.
Their differences are as below:
Performance
Azure Table Storage has no upper bound on latency. Cosmos DB defines
latency of single-digit milliseconds for reads and writes along with
operations at sub-15 milliseconds at the 99th percentile worldwide.
(That was a mouthful) Throughput is limited on Table Storage to 20,000
operations per second. On Cosmos DB, there is no upper limit on
throughput, and more than 10 million operations per second are
supported.
Global Distribution
Azure Table Storage supports a single region with an optional
read-only secondary region for availability. Cosmos DB supports
distribution from 1 to more than 30 regions with automatic failovers
worldwide.
Billing
Azure Table Storage uses storage volume to determine billing. Pricing
is tiered to get progressively cheaper per GB the more storage you
use. Operations incur a charge measured per 10,000 transactions.
Cosmos DB has two billing components: Provisioned Throughput and
Consumed Storage.
Provisioned Throughput: Provisioned throughput (also called reserved throughput) guarantees high performance at any scale. You
specify the throughput (RU/s) that you need, and Azure Cosmos DB
dedicates the resources required to guarantee the configured
throughput. You are billed hourly for the maximum provisioned
throughput for a given hour.
Consumed Storage: You are billed a flat rate for the total amount of storage (GBs) consumed for data and the indexes for a given
hour.
For more details, please refer to the document.
I am unable to locate the cost per transaction for an Azure SQL Database.
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-single-databases-manage
I know the SQL Server database is about $5 per month, but how much are the transactions?
If I go to the Azure Pricing Calculator (https://azure.microsoft.com/en-us/pricing/calculator/) it does not seem to have the info. It lists the price for a single database as $187.77, so that is not the same service as the one you create if you use the link above.
TL;DR:
Azure SQL pricing is "flat": first you choose a performance level for your database which has a fixed cost (e.g. S6 for $580/mo or S1 for $30/mo), and this is billed by the second. Azure does not bill your account for actual IO/CPU usage.
The rest:
There is no single "cost per transaction" because a "transaction" is not a single uniform amount of work for a database server (e.g. a single SELECT over a small table with indexes is significantly less IO and CPU intensive compared to a MERGE over millions of rows).
There are four different types of Azure SQL deployment in Azure, each with its own formula for determining monthly cost:
Single database (DTU)
Single database (vCore)
Elastic pool
Managed Instance
I assume you're interested in the "single database" deployment types, as "Managed instance" is a rather niche application and "Elastic pool" is to save money if you have lots (think: hundreds or thousands) of smaller databases. If you have a small number (e.g. under 100) of larger databases (in terms of disk space) then "Single database" is probably right for you. I won't go into detail on the other deployment types.
If you go with DTU-based Single Database deployment (which most users do), then the pricing follows this general formula:
Monthly-price = ( Instances * Performance-level )
Where Performance-level is the selected SKU for the minimum level of performance you need. You can change this level up or down at will at any point in time as you're billed by the second and not per month (but per-second pricing is difficult to work into a monthly price estimate).
A "DTU" (Database Transaction Unit) is a unit of measure that represents the actual cost to Microsoft of running your database, which is then passed on to you somewhat transparently (disregarding whatever profit-margin Microsoft has per-DTU, of course).
When determining what Performance-level to get for your database you should select the performance level that offers the minimum number of DTUs that your application actually needs. You determine this through profiling and estimating, usually by starting off with a high-performance database for a few hours (which won't cost more than a few dollars) and running your application code. If the actual DTU usage numbers are low (e.g. you get an "S6" 400 DTU (~$580/mo) database and see that you only use 20 DTUs under normal loads) then you can safely leave it on the "S1" 20 DTU (~$30/mo) performance level.
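To put a number on "won't cost more than a few dollars", here is a back-of-the-envelope sketch that prorates the approximate monthly prices quoted above by the hours spent on each performance level. The 730 hours per month figure and the function names are my own assumptions, and the prices are only the rough ones from this answer, so check the current pricing page before relying on any result.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for the estimate

# Approximate monthly prices quoted in this answer, not current list prices.
MONTHLY_PRICE = {"S1": 30.0, "S6": 580.0}


def prorated_cost(level: str, hours: float) -> float:
    """Cost of running one database on the given level for a number of hours."""
    return MONTHLY_PRICE[level] / HOURS_PER_MONTH * hours


def estimate(usage_hours: dict) -> float:
    """Total cost for a mapping of performance level to hours spent on it."""
    return sum(prorated_cost(level, hours) for level, hours in usage_hours.items())


profiling = prorated_cost("S6", 3)
month = estimate({"S6": 3, "S1": HOURS_PER_MONTH - 3})
print(f"3 hours of profiling on S6: ${profiling:.2f}")
print(f"Month with 3 hours on S6 and the rest on S1: ${month:.2f}")
```

Under these assumptions a short profiling session on the larger tier adds only a couple of dollars to the month, which is why starting high and scaling down is a cheap way to size the database.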
The question about what a DTU actually is has been asked before, so to avoid creating a duplicate answer please read that QA here: Azure SQL Database "DTU percentage" metric
It depends on your requirements. I am using a single Azure SQL Database, where the cost is based on a bundle of CPU, transaction limit and storage space called a 'DTU', so the price depends entirely on what you need.
If it runs in a VM (virtual machine), you pay the VM cost plus the SQL Server cost (if you do not already have a SQL Server licence).
Cost https://azure.microsoft.com/en-us/pricing/calculator/
I am in the process of migrating to Amazon AWS and need a SQL server high availability solution. The current licence that I have is SQL standard 2016.
At this time Amazon does not support shared volumes for Windows instances. Therefore, I am not able to set up a regular SQL Server failover cluster, the kind where, if the entire server goes down, the standby server picks up the slack and continues writing to the same storage. My only option is Always On basic availability groups. As I am starting to get familiar with this feature, I find it very maintenance-intensive and can see it becoming a problem when dealing with thousands of databases. In my case I have about 5k databases, mostly small, 600 MB or less each. My question is: is Amazon not a viable hosting environment for a full SQL Server failover solution? Are Always On basic availability groups, one per database, a viable solution?