I have recently completed two migrations from Hadoop to Snowflake on AWS. I am now moving on to work with another customer where Snowflake is on Azure. I wanted to know if there are any feature differences between Snowflake on AWS and Snowflake on Azure.
Thanks
Not in terms of Snowflake itself - most customers pick the cloud based on the other systems they use and where the deployment sits, to avoid the extra costs related to data transfer (Snowflake doesn't charge for data loading, but AWS and Azure do charge to get the data into the correct deployment if it's not there already).
Metadata operations on Azure are slightly slower.
And the maximum file size is limited to 256 MB, while it's 5 GB on AWS and Google Cloud.
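For illustration, here is a minimal sketch of loading staged files from Azure Blob Storage into Snowflake; the stage name, container URL, SAS token, table name, and file pattern are all placeholders, and the per-file limits mentioned above are worth keeping in mind when you split your extracts:

    -- Hypothetical names throughout; adjust to your own account and container.
    CREATE OR REPLACE STAGE my_azure_stage
      URL = 'azure://myaccount.blob.core.windows.net/mycontainer/'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- Keep individual files comfortably below the platform's per-file limit
    -- (e.g. split large extracts into 100-250 MB chunks before staging them).
    COPY INTO my_table
      FROM @my_azure_stage
      PATTERN = '.*orders_.*[.]csv';

The loading itself works the same way on both clouds; only the stage URL and credential type differ.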
Related
Is there a maximum throughput or upload limit on the amount of simultaneous data being transferred using an Azure IRT as part of an Azure Data Factory pipeline before a pipeline and/or activity may time out or fail?
Good question. The answer is that we cannot tell.
There is an interesting article about Hyperscale throughput, but most of the research I did around throughput didn't give me a concrete answer.
In the same article you can find a comment that states:
Don't bother with Azure Data Factory. For some reason, with type casting from blob to Azure SQL, but also with Azure SQL Database as a source, the throughput is dramatically low. From 6 MB/s to 13 MB/s on high service tiers for transferring 1 table, 5 GB in total. That is beyond bad.
So there are too many factors to account for: where the data comes from, where the data goes, the products you are using, etc.
You might need to open a (paid) ticket with Azure support and ask if there is a hardcoded limit that triggers a timeout. But in my experience I've seen a simple database migration from SQL Server to Azure SQL DB fail for no reason and then complete on a second try.
Is a Snowflake warehouse based on virtual machines like EC2? I mean, is every Snowflake warehouse an EC2 instance? If so, it seems impossible for a warehouse to resume so fast. If it is not, is it based on containers or something?
Thanks
Steven
The compute (Virtual Warehouse) layer of Snowflake uses whatever compute is available on the platform the account is set up on. One of the good things about Snowflake is that you don't have to concern yourself with these details. It's probably something like:
AWS -> EC2
GCP -> Compute Engine
Azure -> Virtual Machines
From what I understand, Snowflake maintains a pool of servers. When you shut down your Virtual Warehouse, that doesn't mean your EC2 instances terminate - they just go back into the pool of servers that can be used by other Snowflake customers.
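To make that concrete, here is a minimal sketch in Snowflake SQL (the warehouse name and size are placeholders); because the compute comes from Snowflake's pre-provisioned pool, suspend and resume are near-instant, and billing stops while the warehouse is suspended:

    -- Hypothetical warehouse; AUTO_SUSPEND/AUTO_RESUME make the pooling transparent.
    CREATE WAREHOUSE IF NOT EXISTS my_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
      AUTO_RESUME = TRUE;    -- resume automatically when the next query arrives

    -- Manual control, if you prefer:
    ALTER WAREHOUSE my_wh SUSPEND;
    ALTER WAREHOUSE my_wh RESUME;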
At a high level, a Snowflake DWH is composed of three layers:
a.) Storage Layer: in AWS this would typically be S3
b.) Compute Layer: these are EC2 instances, used for compute and data loading
c.) Cloud Services Layer: this takes care of services like authentication and security
So on AWS, the compute layer of Snowflake is hosted on EC2.
We have some DB instances on Azure, and I am trying to optimize DB performance on Azure. Can anyone explain what an Azure DTU is and how we can calculate Azure DTUs?
1. What is Azure DTU:
Azure SQL Database provides two purchasing models: the vCore-based purchasing model and the DTU-based purchasing model.
The DTU-based purchasing model is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools.
The Database Transaction Unit (DTU) represents a blended measure of CPU, memory, reads, and writes. The DTU-based purchasing model offers a set of preconfigured bundles of compute resources and included storage to drive different levels of application performance: Basic, Standard and Premium.
For more details, please see: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu
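As a quick illustration of what goes into a DTU, you can look at the recent utilization of those bundled resources with the sys.dm_db_resource_stats view (run inside the user database); this is just a monitoring sketch, not a sizing tool:

    -- Roughly the last hour of 15-second samples for the current database.
    SELECT TOP (20)
           end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent,
           avg_memory_usage_percent
    FROM   sys.dm_db_resource_stats
    ORDER  BY end_time DESC;

If any of these columns sits near 100% for long stretches, you are probably hitting the DTU limit of your current service objective.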
2. How to optimize DB performance:
Since your DB instances are already on Azure, you can monitor them and improve performance from the Azure portal by troubleshooting or changing the service tier. Please see Monitoring and performance tuning: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-monitor-tune-overview#improving-database-performance-with-more-resources
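For example, changing the service tier of a single database can also be done with T-SQL (the database name and target objective below are placeholders; the operation is online but can take a while to complete):

    -- Scale a single database to the Standard S3 DTU service objective.
    ALTER DATABASE [MyAppDb]
      MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');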
3. How to calculate Azure DTU:
Here is a link to the Azure SQL Database DTU Calculator. This calculator will help us determine the number of DTUs for our existing SQL Server database(s), as well as a recommendation of the minimum performance level and service tier that we need before we migrate to Azure SQL Database.
If you still have the database backup file, I think you can try this calculator.
Please see: http://dtucalculator.azurewebsites.net/
The significant consideration with Azure SQL Database is meeting the performance requirements of the deployed database at the minimum cost. Undoubtedly, nobody wants to pay for redundant resources or features that they do not use or plan to use.
At this point, Microsoft Azure offers two different purchasing models to provide cost-efficiency:
Database Transaction Unit (DTU)-based purchasing model
Virtual Core (vCore)-based purchasing model
A purchasing model decision directly affects database performance and the total amount of the bill. In my opinion, if the deployed database will not consume too many resources, the DTU-based purchasing model will be more suitable.
Now, we will discuss the details about these two purchasing models in the following sections.
Database Transaction Unit (DTU)-Based purchasing model
In order to understand the DTU-based purchasing model more clearly, we need to clarify what a DTU means in Azure SQL Database. DTU is an abbreviation for “Database Transaction Unit” and it describes a performance unit metric for Azure SQL Database. We can liken the DTU to the horsepower of a car, because it directly affects the performance of the database. DTU represents a mixture of the following performance metrics as a single performance unit for Azure SQL Database:
CPU
Memory
Data I/O and Log I/O
Elastic Pool
Briefly, an Elastic Pool helps us automatically manage and scale multiple databases that have unpredictable and varying resource demands on top of a shared resource pool. With an Elastic Pool, we don't need to scale the databases continuously against resource demand fluctuations. The databases that take part in the pool consume the Elastic Pool resources when they need them, but they cannot exceed the Elastic Pool resource limits, so it provides a cost-effective solution.
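As a small sketch (the pool and database names are placeholders), moving an existing single database into an elastic pool is a one-line T-SQL change, after which the database draws its eDTUs from the shared pool:

    -- Move MyAppDb into the elastic pool MyPool.
    ALTER DATABASE [MyAppDb]
      MODIFY (SERVICE_OBJECTIVE = ELASTIC_POOL(name = [MyPool]));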
Proper Estimation of the DTU for Azure SQL Database
After deciding to use the DTU-based purchasing model, we have to answer the following question with sound reasoning:
Which service tier and how many DTUs are required for my workload when migrating to Azure SQL?
The DTU Calculator is the main tool for estimating the DTU requirement when migrating on-premise databases to Azure SQL Database. The main idea of this tool is to capture the various utilization metrics that affect the DTUs from the existing SQL Server, and then estimate the approximate DTUs and service tier in light of the collected performance data. The DTU Calculator collects the following metrics through either a command-line utility or a PowerShell script and saves them to a CSV file:
Processor - % Processor Time
Logical Disk - Disk Reads/sec
Logical Disk - Disk Writes/sec
Database - Log Bytes Flushed/sec
Extracted from https://www.spotlightcloud.io/blog/what-is-dtu-in-azure-sql-database-and-how-much-do-we-need.
Check this well-written article on calculating DTUs.
Credit to the original author: Esat Erkec.
Let me explain the problem. I have a Magento project with 3 million products and more than 6 million URLs. The problem is only the database, because of this many products. I would like to autoscale only the Google Cloud SQL database; then it would always respond adequately. I know it's possible for Google Compute Engine, and that also includes the database. So is it possible with Google or another cloud SQL provider?
You cannot scale databases the same way the Autoscaler does for Compute Engine managed instances. The autoscaling capabilities of Compute Engine work for stateless VMs; databases are stateful. You can use read replicas to scale Cloud SQL. Read replica instances allow data from the master instance to be replicated to one or more replicas. This setup can provide increased read throughput. Visit this article for the different read replica scenarios.
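If you go the read-replica route (a sketch, assuming Cloud SQL for MySQL, which is what Magento typically runs on), you would point read-heavy traffic such as catalog queries at the replica and keep writes on the primary, and you can check replication health directly on the replica:

    -- Run on the replica; on MySQL 8.0.22+ the statement is SHOW REPLICA STATUS.
    SHOW SLAVE STATUS;
    -- The Seconds_Behind_Master / Seconds_Behind_Source column shows replication lag.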
SQL Azure has a database size limit of 150 GB. I have read through their documentation several times and also searched online, but I'm unclear about this: does using federations allow a developer to grow beyond a 150 GB database? For example, can I have several 150 GB federation members?
If not, how can I handle a database larger than 150 GB on Windows Azure?
Basically, how do I scale out beyond 150 GB on Windows Azure?
If there's no other way, is RDS a good alternative? (Share any other alternatives.)
Currently it is not possible to have a single database larger than 150 GB.
The only approach is to either split the data into multiple databases (one account can have up to 149 user databases plus the master DB) or use SQL Azure Federations. Currently, if I am not mistaken, the total number of federations supported is Int16.MaxValue - 1. Each federation member is actually a separate database, transparent to the developer, which can be up to 150 GB.
However, SQL Azure Federations has its own pros and cons, along with some data access layer re-factoring. If you are interested you may check out these cool videos on SQL Azure Federations:
Building Scalable Apps with SQL Azure
Using SQL Azure Database Federations
UPDATE
I will not completely agree with #ryancrawcour. What he explains is just the tip of the iceberg; the bulk lies below the water. The amount of required re-factoring really depends on how data is consumed by the application. I will just mention a few factors for consideration (which are not the complete picture at all). Consider any of the following:
Data that is common to all federations (how do you get this data?)
A stored proc that post-processes data - you have to iterate over each and every federation member and execute that stored proc. There is no way to execute the stored proc once and process data in all the federations.
Aggregating data that is spread across more than one federation member
Listing data from more than one federation member
These are just a few operations that you will need to consider, and handling them requires more than "just changing the connection string and executing one USE FEDERATION ... before each query". Actually, when using SQL Azure Federations you don't need to change the connection string at all - it is all the same SQL Azure connection string. The "USE FEDERATION ..." statement is what you have to execute before each query, but it is far from the only thing. And what if one is using Entity Framework (model first, code first, or whatever)? Things get even more complicated and require a real understanding of SQL Azure Federations.
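For reference, here is a hedged sketch of what the Federations syntax looked like (names, key ranges, and the table below are hypothetical, and the statements are quoted from memory of the original SQL Azure Federations feature, so treat this purely as an illustration of the programming model):

    -- Create a federation keyed on a BIGINT range.
    CREATE FEDERATION CustomerFederation (cid BIGINT RANGE);

    -- Connect to the member that covers cid = 0 and create a federated table;
    -- the federation key must be part of the table's key.
    USE FEDERATION CustomerFederation (cid = 0) WITH RESET, FILTERING = OFF;
    CREATE TABLE Orders
    (
        OrderId    BIGINT NOT NULL,
        CustomerId BIGINT NOT NULL,
        Total      MONEY,
        PRIMARY KEY (OrderId, CustomerId)
    ) FEDERATED ON (cid = CustomerId);

    -- Before each query, route the same connection to the member holding a given key.
    USE FEDERATION CustomerFederation (cid = 12345) WITH RESET, FILTERING = ON;
    SELECT * FROM Orders;   -- with FILTERING = ON, only rows for cid = 12345 are visible

    -- Scale out by splitting a member that has grown too large.
    USE FEDERATION ROOT WITH RESET;
    ALTER FEDERATION CustomerFederation SPLIT AT (cid = 1000000);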
I would say that SQL Azure Federations is a different way of thinking about data, about modelling and normalizing.
UPDATE 2 - new Database sizes announced by Microsoft
As of 3 April 2014, the maximum size for a single database has been increased to 500 GB. The only available information to date is here. Be aware that the management portal still doesn't show this option (as of today, 4 April 2014, 15:00 GMT+0:00).
I was looking for these same answers a while ago. In addition to the answers Anton provided (which are very accurate), I found that you can make your Windows Azure VM (WAVM) with a SQL Server installation redundant through load balancing and mirroring.
The advantage of Windows Azure SQL Database (WASD) is that everything is automated. E.g. when your WAVM instance is taken out of the rotation of the load balancer, you'll need to bring a new one up yourself; WASD takes care of all of this.
With WASD Federations you're able to scale to 75 TB of data (if I remember correctly), while with a WAVM with SQL Server you can scale to 16 TB at most.
Also, with WASD Federations you can divide the SQL workloads more granularly.
Regards,
Patriek
There is also the new Azure feature of persistent VMs (currently in preview), which will allow you to migrate your on-premises applications to the cloud with minimal changes.
Further reading: Infrastructure as a Service Series: Running SQL Server in a Windows Azure Virtual Machine
This guide might be helpful as well.
Edit
Here is a comparison with SQL Azure.
While considering your scaling options, be aware that, as of April 3, 2014, Microsoft announced upcoming changes to SQL Premium, including the ability to scale each SQL Database instance to 500 GB (along with geo-replication, self-service restore, and a higher uptime SLA). No date has been announced yet, but you can read about the announcement details here.
There is now a 1 terabyte tier available - see https://azure.microsoft.com/en-us/pricing/details/sql-database/ and look at the Premium level.