Overcoming Windows Azure Sql Database 150 gb size limitation - database

SQL Azure has a database size limit of 150 gb. I have read through their documentation several times and also searched online but I'm unclear about this: Does using federations allow a developer to grow beyond a 150 gb data base? For example can I have several 150GB federation members.
If not, how can I handle a database larger than 150 gb on Windows Azure?
basically, How do I scale out beyond 150 gb on Windows Azure
If theres no other way is RDS a good alternative(share any other alternatives)

Currently it is not possible to have a single database larger than 150G.
The only approach is to either split the data into multiple databases, one account can have up to 149 user databases plus the master DB, or use SQL Azure Federations. Currently, if I am not mistaken, the total number of Federations supported is Int16.MaxValue - 1. Each federation is actually a separate database, transparent to the developer, which can be up to 150GB.
However, SQL Azure Federations has its own pros and cons, along with some data access layer re-factoring. If you are interested you may check out these cool videos on SQL Azure Federations:
Building Scalable Apps with SQL Azure
Using SQL Azure Database Federations
UPDATE
I will not completely agree with #ryancrawcour. What he explains is just the peak of the iceberg lying bellow the water. The amount of required re-factoring really depends on how data is consumed from the application. I will just mention a few factors for considerations (which are not complete picture at all). Consider any of the following:
Data that is common for all federations (how you get this data)
Stored proc, that post-processes data - you have to iterate in each and ever federation member and execute that stored proc. There is no way to execute the Stored proc once and process data in all the federations.
Aggregate data, which is spread across more than 1 federation member
List data from more than one federation member.
These are just few operations that you will need to consider, and that does not require "just change in connection string and execute one use federation ..." before each query. Actually using SQL Azure Federations you don't need to change the connection string at all. It is all the same SQL Azure connection string. The "USE FEDERATION ..." statement is what you have execute before each query. But it is way not just the only thing. And how about if one is using EntityFramework (model first, or code first, or whatever). Things get even more complicated and need real understanding of SQL Azure Federations.
I would say that SQL Azure Federations is different way of thinking about data, about modelling and normalizing.
UPDATE 2 - new Database sizes announced by Microsoft
As of 03. April 2014 the maximum size for a single Database has been increased to 500GB. The only available information to date is here. Be aware that the management portal still doesn't show this option (as of Today and now: 4. Apri 2014, 15:00 GMT+0:00).

I've been looking for these same answers a while ago. In addition to the answers Anton provided (which are very accurate), I found that you can make your WAVM with SQL Server installation redundant through load balancing and mirroring.
The advantage of WASD is that everything is automated. E.g. when your WAVM instance is taken out of the roulation of the load balancer, you'll need bring a new one up yourself. WASD takes care of all of this.
With WASD Federations you're able to scale to 75TB of data (if I remember correctly), while with WAVM with SQL Server you can scale to 16TB tops.
Also with WASD Federations you can more granularly divide the SQL Workloads.
Regards,
Patriek

There is also the new Azure feature of persistent VMs (currently in preview) which will allow you to migrate your on-premises applications to cloud with minimal changes.
Further reading: Infrastructure as a Service Series: Running SQL Server in a Windows Azure Virtual Machine
.This guide might be helpful as well.
Edit
Here is a comparison with Sql Azure

While considering your scale options, be aware that, as of April 3 2014, Microsoft announced upcoming changes to SQL Premium, including ability to scale each SQL Database instance to 500GB (along with geo-replication, self-service restore, and higher uptime SLA). No date has been announced yet, but you can read about the announcement details here.

There is now a 1 Terrabyte tier available - see https://azure.microsoft.com/en-us/pricing/details/sql-database/ and look at the Premium level.

Related

Regularly Transfer SQL Server to Azure SQL

I'm completely stymied. Let me describe my situation.
We're a relatively small company and the vast majority of our operational data is contained in a vendor database. Our vendor offers a Data Warehousing service. They've taken all of our data and applied some OLAP-ish modeling to it. Each day, they place either a .bak or a .diff file (.bak once a week, .diff every other day) in a FTP endpoint that we pay to access. Currently, we use a PowerShell script to download this data to a server that we've got sitting at a local server farm, where we then use SQL Server to "rehydrate it" by restoring from it.
That's all fine and good, but we really want to move as many of our workloads into the cloud as possible (we use Azure). As far as I can tell, SQL Managed Instances are the only way we can restore from a .bak file in the cloud. This is waaaay more expensive than we need, and we really don't need the managed instance platform at all except to restore from this file.
Basically, everything about this current process is diametrically opposed to us moving it to the cloud, unless we want to pay even more than we are to rent out this server farm.
I'm trying to lobby them for a different method of getting their data, but I'm having trouble coming up with a method to propose. We need to, every day, transfer a ~40gb database from SQL Server (at our vendor) to Azure SQL (in our cloud). What's the least-intrusive way we could do this?
We are glad that you choose the Azure SQL on Azure VM as the solution. Thanks for the suggestions of Alex and Davaid too:
I've actually seen all of those resources already. The biggest
obstacle here is that the entire process has to be automated
end-to-end, which makes bacpac restores more difficult (they'd have
to write some sort of .NET app to back up to bacpac). I think SQL on Azure VM is the only real option, so I may have
to look at cost for that.
If others face the same scenario, we could reference this. This also can be beneficial to other community members.

Advice on Azure platform to host Data Warehouse

I am a Data Warehouse developer currently looking into using the Azure platform to host a new Data Warehouse.
My experience is with using on premise servers hosting standard SQL Server Databases, one for the staging database and one for the Data Warehouse. Typically I would use a combination of SSIS and stored procedures running in a scheduled SQL server agent job for the ETL.
How can I replicate this kind of setup within Azure?
The storage size will be less than 1TB so could I just use Azure SQL Server Database over Azure SQL Data Warehouse?
If so would I need separate databases for staging and the data warehouse using the elastic pool option?
The data that I will be loading into staging will all be on premise. Will SSIS still be suitable for loading to Azure or will Azure Data Factory be a better fit?
Any help at all would be greatly appreciated! Thanks.
Leon has lots of good information there. But from a Data Warehouse perspective, I wouldn't use Data Sync for ETL purposes (mensioned as "not preferred" in the link Leon provided, Data Sync, in the list "When to use Data Sync").
For DW, Azure DB is a good option. Azure SQL Data Warehouse (known as Azure Synapse Analytics nowadays) is a heavy duty beast for handling DW. Are you really sure you need this kind of system with < 1Tb data? I'd personnally leave Azure Synaptics for now, and tried with Azure DB first. It's a LOT cheaper and you can upgrade later if necessary.
One thing to note about Azure DB though: Azure DB doesn't support queries over databases. That's not a deal breaker though, everything can be handled in the same database. I personally use a schema to differentiate staging from the DW (and of course I use other schemas in the DW as well). It's not very difficult to use separate databases of course, but the border between them is a lot deeper in Azure DB than on-premise SQL Server or other Azure solutions (Managed Instance for example).
SSIS is still an option, but the problem is, what you use to run the packages? There are options like:
continue running them from on-premise (all the hard work is still done in the cloud)
rent a VM with SQL Server from Azure, deploy the packages to the VM and run them from VM
use Data Factory to run the SSIS packages
None of those are a perfect solution for every use case. First two options come with quite a heavy cost, if running SSIS is the only thing you need them for. Using Data Factory to run SSIS is a bit cumbersome at the moment, but it's an option anyway.
Data Factory itself is a good option as well (I haven't personally tried it, but I have heard good things about it). If you use Data Factory to run your SSIS, why not start using Data Factory without SSIS packages in the first place? Of course Data Factory has some limitations compared to SSIS which might be the reason, but if your SSIS packages are simple enough, why not give Data Factory a try.
I would suggest you using Azure SQL database. It provides many price tier with difference storage for you. You can select the most suitable price tier for you. Azure SQL database also support scale up/down base on the usage.
Ref: Service tiers in the DTU-based purchase model
And as you said, the data that I will be loading into staging will all be on premise.
Azure SQL database has the feature Data Sync can help you do that:
Data Sync is useful in cases where data needs to be kept updated across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data
synchronized between your on-premises databases and Azure SQL
databases to enable hybrid applications. This capability may appeal
to customers who are considering moving to the cloud and would like
to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate
different workloads across different databases. For example, if you
have a large production database, but you also need to run a
reporting or analytics workload on this data, it's helpful to have a
second database for this additional workload. This approach minimizes
the performance impact on your production workload. You can use Data
Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several
regions and even several countries/regions. To minimize network
latency, it's best to have your data in a region close to you. With
Data Sync, you can easily keep databases in regions around the world
synchronized.
When you create the SQL database, you can migrate the schema or data to Azure with many tools, such as Data Migration Assistant(DMA).
Then Set up SQL Data Sync between Azure SQL Database and SQL Server on-premises, it will help sync the data auto every 5 mins.
Hope this helps.
If you want to start on the less expensive options in Azure, go with a general purpose SQL database and an Azure Data Factory pipeline with a few activities.
Dynamic Resource Scaling ETL
You can scale up the database by issuing an alter database statement and then move onto your stored proc based ETL. I would even use a "master" proc to call the dimension and fact proc's to control the execution flow. Then scale down the database with another alter database statement. I even created my own stored proc to issue these scaling statements.
You also cannot predict when the scaling will be completed, so I have a wait activity. You could be a little more nerdy with a loop that checks the service objective property and then proceeds when it is complete. But it was just easier to wait for 10 minutes. I have only been burnt a couple times when the scaling took longer.
Data Pipeline Activities:
Scale up, proceed if successful
Wait about 10 minutes, proceed always
Execute the ETL, proceed always
Scale down
Elastic Query
You can query across databases with vertical partition Elastic Query. Performance isn't great, and they don't recommend it for ETL, but it will work. To improve performance try dumping any large table you need into a temp table and then transform the data locally.

SQL server high availability solutions in Amazon AWS

I am in the process of migrating to Amazon AWS and need a SQL server high availability solution. The current licence that I have is SQL standard 2016.
At this time Amazon does not support shared volumes for Windows instances. Therefore, I am not able to do a regular SQL cluster fail over solution. This is the one where if the entire server goes down the stand by server picks up the slack and continues writing to the same storage. My only option is high availability always on basic groups. As I am starting to get familiar with this feature I find it very maintenance intensive and can see it becoming a problem when dealing with thousands of databases. In my case I have about 5k databases mostly small in size 600mb or less each. My question is Amazon not a viable hosting environment for a full SQL fail over solution. Is the high availability always on basic groups one per database a viable solution?

Azure Database Query Optimizer using View Indices

SQL Server Enterprise Edition's query optimizer will use indices from a view to increase performance of a query even if the view is not explicitly referenced in the query, if applicable. Question: does Azure Database do the same thing? I know SQL Server Express does not do this, for example. I want to ensure I can still get the performance I need from the query optimizer when doing a sort on a joined table with a few million users (works great on enterprise edition but takes several seconds on express - bottle neck at the sort).
Sometime last year (2012) Microsoft announced that the engine was the same between SQL Server and SQL Azure (now called Windows Azure SQL Database :/). So you will likely get the same behavior. Same performance may be another question. Windows Azure SQL Database is also keeping replicas in place in the event of hardware failure. You get the benefit of the secondary coming online in a fashion that is seamless to you. But, This does have a bit of a performance cost. Also, the SQL running in Windows Azure is running in a shared environment. It is pretty well documented that the performance is not the same as a local dedicated multi-processor machine with fast storage. It is a bit of an unfair comparison multi-user, multi-instance vs. dedicated. For many applications this is fast enough, but not all.

SQL Server High Availability on premise - cloud

I would like to know which is the best way to make a copy and keep the copies synchronized of a on premises SQL Server 2008 (not R2) database to SQL Azure.
Think of the SQL Azure as a failover kind of structure...
Notes:
The database runs fine in SQL Azure
I have already figured out how to get the rest of the app running on Azure
Please consider suggestions of the type "Upgrade to SQL Server 2012 because of X" if the gain (reliability, efficiency, time to replicate, etc...) are worth it
I`m looking for instant replication (as fast as possible)
Yes it will have to sync back eventually. If the on-premises deploy crash and the cloud get activated and changed, sync back will be necessary, but i think it does not need to be automatic... of it is, better!
The Database consist of 900+ tables (legacy system)
http://www.windowsazure.com/en-us/manage/services/sql-databases/getting-started-w-sql-data-sync/
http://msdn.microsoft.com/en-us/library/hh456371.aspx
I think the best bet is to use SQL Data Sync, it should give you bidirectional and we use it currently to sync data around the world in terms of datacenters and one local on premise database. It will only give you 5 mins sync timing but this will probably do, otherwise the next best options is to use SQL Server VMs and do the old fashion way. But with SQL Azure Data Sync we have found to be reasonable reliable and been running it for a good six months syncing across 4 database in four data centres in Azure.
Some problems though with it,
It uses Triggers.
It will obivously add load and connections to your current SQL Database.
The new control panel in Azure is a nightmare for it, so I would use the old panel for the moment.
It is in preview last time I looked, so it might not be 100% suitable
for you.
I would imagine there is some better third party solutions out there but off the shelf and in Azure SQL Data sync is well worth a look for the situation you a describing.

Resources