What are some of the best practices for taking SQL Server backups from an Amazon EC2-based server? I have a nightly job that creates the backups, but I still need to move them "off-site". So I'm really asking two questions: (1) are there sample scripts (BAT or otherwise) that can take the files and move them (FTP?) to another server, and (2) are there any other EC2-specific options that I can take a look at?
You should look at a two-part approach.
Amazon S3 is distributed across three different data centers and offers durability of 11 9s - realistically there's more chance of Amazon going out of business than your data being unavailable. You could back up your data to S3 on a schedule based on how much it would hurt to lose that period's worth of data: e.g. if losing a day of data isn't a big deal, perhaps you back up nightly. You asked about EC2-specific options; EBS snapshots are automatically saved to S3, and they are part of the best practice for ensuring durability of EBS disks.
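As a rough illustration of the scheduled copy to S3 described above, here is a minimal Python sketch. It assumes the boto3 library is installed and AWS credentials are already configured; the bucket name, the key prefix and the function names are hypothetical and not something the question specifies.

```python
from datetime import date
from pathlib import Path


def backup_key(backup_path, day, prefix="sqlserver-backups"):
    """Build a dated S3 object key, e.g. '<prefix>/2024/01/31/db.bak'."""
    return f"{prefix}/{day:%Y/%m/%d}/{Path(backup_path).name}"


def upload_backup(backup_path, bucket, day=None, client=None):
    """Upload one backup file to S3 and return the key it was stored under."""
    if client is None:
        # Imported lazily so the key-naming helper works without boto3 installed.
        import boto3
        client = boto3.client("s3")
    key = backup_key(backup_path, day or date.today())
    # boto3's upload_file(Filename, Bucket, Key) switches to multipart
    # uploads automatically for large files.
    client.upload_file(str(backup_path), bucket, key)
    return key
```

The nightly job could call `upload_backup(path_to_bak, "my-backup-bucket")` once the backup file has been written; the dated prefix keeps each night's file under its own key.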
Because you're never likely to lose data on S3, the next most likely problem is "Amazon goes out of business", or "Amazon does not let me access my data". You can consider either another cloud provider, a traditional hosting provider, your own on-site storage, etc. The sky is really the limit on options (search StackOverflow if you can't come up with a handful - there is no S3-alternative hosted by anyone but Amazon unfortunately) so I won't cover that here, but it's worth weighing up your perception of the likelihood of this happening and the effort it would take you to make your data useful again when considering how frequently you need to do the off-AWS backup.
People who run services outside of AWS have it so easy, because they can just off-site back up to S3!
Take a look at SQL Backup Master, which can take database backups and move them to Amazon S3, Dropbox, FTP, etc.
Any standard backup tool for SQLServer should work. But I agree with crb that S3 (using EBS snapshots or not) should be the first line of defense.
I'm completely stymied. Let me describe my situation.
We're a relatively small company and the vast majority of our operational data is contained in a vendor database. Our vendor offers a Data Warehousing service. They've taken all of our data and applied some OLAP-ish modeling to it. Each day, they place either a .bak or a .diff file (.bak once a week, .diff every other day) in a FTP endpoint that we pay to access. Currently, we use a PowerShell script to download this data to a server that we've got sitting at a local server farm, where we then use SQL Server to "rehydrate it" by restoring from it.
That's all fine and good, but we really want to move as many of our workloads into the cloud as possible (we use Azure). As far as I can tell, SQL Managed Instances are the only way we can restore from a .bak file in the cloud. This is waaaay more expensive than we need, and we really don't need the managed instance platform at all except to restore from this file.
Basically, everything about this current process is diametrically opposed to us moving it to the cloud, unless we want to pay even more than we are to rent out this server farm.
I'm trying to lobby them for a different method of getting their data, but I'm having trouble coming up with a method to propose. Every day, we need to transfer a ~40 GB database from SQL Server (at our vendor) to Azure SQL (in our cloud). What's the least intrusive way we could do this?
We are glad that you chose SQL Server on an Azure VM as the solution. Thanks to Alex and Davaid for their suggestions too:
I've actually seen all of those resources already. The biggest obstacle here is that the entire process has to be automated end-to-end, which makes bacpac restores more difficult (they'd have to write some sort of .NET app to back up to bacpac).
I think SQL on Azure VM is the only real option, so I may have to look at cost for that.
If others face the same scenario, they can refer to this answer. It may also benefit other community members.
I'm using a Cloud SQL instance to store two types of data: typical transactional data and large "read-only" data. Each of these read-only tables could have GBs of data and they work like snapshots that are refreshed once a day. The old data is totally replaced by the most recent data. The "read-only" tables reference data from the "transactional tables", but I don't necessarily need to perform joins between them, so they're kind of "independent".
In this context, I believe using Cloud SQL to store this kind of table is going to be a problem in terms of billing. Because Cloud SQL is fully managed, I would be paying for maintenance work from Google, and I don't need any kind of maintenance for those specific tables.
Maybe there are databases more suitable for storing snapshot/temporary data. I'm considering moving those tables to another kind of storage, but it's possible that I would end up making the bill even higher. Or maybe I could continue using Cloud SQL for those tables and just make them unlogged.
Can anyone help me with this? Is there any kind of storage in GCP that would be great for storing large snapshots that are refreshed once a day? Or is there a workaround to make Cloud SQL not maintain those tables?
This is a tough question because there are a lot of options and a lot of things that could work. The GCP documentation page "Choosing a Storage Option" is very handy in this kind of case. It has a flowchart for selecting a storage option based on the kind of data you want to store, a video that explains each storage option, and a table with the description, strong points and use cases for each option. I would recommend starting there.
Also, if the issue with Cloud SQL is that it is fully managed and pricey, you can set up MySQL on Google Compute Engine and manage it yourself. It is also considerably cheaper for the same machine: for an n1-standard-1, $0.0965 in Cloud SQL versus $0.0475 in GCE (keep in mind that other charges may apply on top of the machine price).
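To put the two machine prices side by side, here is a small worked calculation. It assumes the quoted figures are hourly rates and that a month has roughly 730 hours; both are assumptions, and storage, network and licence charges are ignored.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Machine cost for one month at a given hourly rate (machine price only)."""
    return hourly_rate * hours


cloud_sql = monthly_cost(0.0965)  # n1-standard-1 price quoted for Cloud SQL
gce = monthly_cost(0.0475)        # n1-standard-1 price quoted for GCE

print(f"Cloud SQL: ${cloud_sql:.2f}/month")
print(f"GCE:       ${gce:.2f}/month")
print(f"Saving:    ${cloud_sql - gce:.2f}/month")
```

Under those assumptions the self-managed machine costs about half as much, before counting the time spent administering it.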
I am planning to host my iPhone game on Amazon AWS. Basically my game just needs a database, and currently I am using MySQL (a relational database) to store users' data.
I am new to amazon aws, and I have read some of the articles. This page: http://aws.amazon.com/running_databases/ provides some available choices for databases.
RDS (relational database services)
EC2 with Relational Database AMI (it has mysql)
simpleDB
I think I will skip SimpleDB: I have read the sample code, and the database structure is quite different from a relational DB, with no joins between tables and all data stored as strings. The game I am developing is already in relational form, with all the PHP code already written. Maybe I could consider it for a future project.
That leaves RDS and EC2. Which one should I use, comparing cost, performance, reliability and stability? My game server requirements:
MySQL database (it is the only database engine I am familiar with, and the game is already half developed, so I have no time to rewrite it or learn a new language)
Easy to scale
Load balancing
Automatic backup
(if possible, less maintenance work in future)
Please give me some advice, thank you very much.
As you have already chosen MySQL on AWS, the only question is whether you want to host the database server on an EC2 instance yourself or use the AWS RDS service.
Comparing cost, performance, reliability and stability against your game server requirements:
MySQL database that is easy to scale,
Load balancing,
Automatic backup,
(if possible, less maintenance work in future),
AWS RDS would be the best option.
Once you scale the environment, hosting the database on an EC2 instance yourself can get complex and needs a lot of administration and maintenance.
AWS RDS makes that easy for you.
Hope It Helps.. :)
If you need EC2 instance(s) anyway (for web hosting for instance) then hosting MySQL on an EC2 instance that you are already paying for is going to be cheaper...
But as your load goes up I would definitely look towards RDS for easier scaling, reduced admin overhead, a better disaster recovery story, etc. There is no reason, in my opinion, to host MySQL on dedicated EC2 instances.
For your requirements of load balancing and easy scaling, you will need a dedicated instance for the database. The EC2 instances hosting your game would sit behind a load balancer, and they would all connect to one database on the dedicated instance.
That dedicated instance hosting your database could be RDS or EC2 instance. RDS is expensive but has its benefits.
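If RDS is chosen, the "automatic backup" requirement maps to the backup retention setting on the instance. Below is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the instance class, storage size and helper names are placeholder choices rather than recommendations from the answers above.

```python
def rds_mysql_params(identifier, username, password, retention_days=7, multi_az=True):
    """Build keyword arguments for boto3's RDS create_db_instance call."""
    if not 0 <= retention_days <= 35:
        raise ValueError("RDS keeps automated backups for 0 to 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",         # placeholder instance size
        "AllocatedStorage": 20,                   # GiB, placeholder
        "MasterUsername": username,
        "MasterUserPassword": password,
        "BackupRetentionPeriod": retention_days,  # 0 disables automated backups
        "MultiAZ": multi_az,                      # standby copy in another zone
    }


def create_instance(params, client=None):
    """Create the instance; a client can be injected for testing."""
    if client is None:
        import boto3  # imported lazily so the parameter helper needs no AWS setup
        client = boto3.client("rds")
    return client.create_db_instance(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review them, or keep them in version control, before anything is created.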
As a .NET web developer, I've always used SQL Server as my database store because it's already in the MSFT ecosystem and easy to work with from the .NET platform.
Recently, however, I had a computer almost literally blow up, and consequently lost all my data in SQL Server on that machine.
Now that I've got a new computer, I want to start using an off-site database so that this doesn't happen again. A database hosted by a third-party (i.e. hosting company) or cloud service.
It doesn't have to be SQL Server or even an RDBMS necessarily, but if it's not, it'd have to be something cutting-edge (e.g. redis, Cassandra, MongoDB, CouchDB, etc.) and not just MySQL or PostgreSQL or something.
Does anyone have any recommendations for those with little financial means?
I'd like to be able to use it during development of projects, and if they ever go live, not have to migrate the data anywhere to a new service--keep the data right there where it is and point my live domain requiring the data to the same service it pointed while in development.
It's not so much a question of available hosted services as of what setup you want for your standard development environment. If one of the cloud datastores doesn't work for you, you can always get a virtual server and install whatever you need.
However, you may want to rethink the idea of putting dev databases in the cloud. Performance will not be as good as something running locally (particularly if you are working with things like bulk import), and turning a dev database into a production database isn't a particularly good idea. I think what you are really looking for is a combination of easy backup, schema management and data setup.
Backup on a live server is easy enough - either you are backing up the entire server or have a script that uploads the backup file somewhere. For dev I don't bother as I prefer to set up disposable environments - have code that can set up the database if it doesn't already exist and add any necessary default data. Most apps don't need much data unless there is some sort of import process involved, and the same code works quite nicely when you first set up the live environment.
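A disposable environment of that kind can be as small as the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for whichever database the project really uses, and the `roles` table and its default rows are invented for illustration.

```python
import sqlite3

DEFAULT_ROLES = ["admin", "editor", "viewer"]  # hypothetical default data


def ensure_schema(conn):
    """Create the schema if it is missing and add default rows; safe to re-run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS roles ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    # INSERT OR IGNORE skips rows that already exist, so seeding is idempotent.
    conn.executemany(
        "INSERT OR IGNORE INTO roles (name) VALUES (?)",
        [(name,) for name in DEFAULT_ROLES],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_schema(conn)
ensure_schema(conn)  # a second run changes nothing
```

Because the function is idempotent, the same call can run at application start-up in development and once when the live environment is first set up.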
Schema management is one of the more painful aspects of working with SQL and where NoSQL systems can make life a lot easier as most have the schema defined entirely by the code that is using it - I mostly use redis myself, but whether or not it is appropriate for you will depend on the type of project you work on - if you need a lot of joins or transactions you probably need SQL, but if you just need basic data storage most NoSQL platforms would be better.
May I suggest looking into Windows Azure table storage? It is quite different from the purely relational model of SQL Server, is the "next big thing" from Microsoft, and is in general something of a paradigm shift for folks used to relational databases.
If you're ever going to come face to face with Azure in the future (and I suspect many .NET people will), it may be a beneficial experience to have.
With respect to costs, they're negligible for individual use. 10,000 transactions a month cost a penny. A gigabyte per month of storage costs 15 cents, and data transfers are 10-15 cents per gigabyte.
If you have only "development" projects that store their data in the cloud, I'll be damned if you pay more than $2-3/month to MS... if that :)
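The arithmetic behind that estimate can be checked with a few lines of Python. The rates are the ones quoted in this answer (they may well have changed since), and the usage figures in the example are made up.

```python
def table_storage_monthly_cost(transactions, stored_gb, transfer_gb, transfer_rate=0.15):
    """Estimate a monthly bill in dollars from the rates quoted in the answer."""
    transaction_cost = transactions / 10_000 * 0.01  # a penny per 10,000 transactions
    storage_cost = stored_gb * 0.15                  # 15 cents per GB stored per month
    transfer_cost = transfer_gb * transfer_rate      # 10-15 cents per GB transferred
    return transaction_cost + storage_cost + transfer_cost


# Hypothetical development workload: one million transactions, 2 GB stored,
# 1 GB transferred, priced at the upper transfer rate.
estimate = table_storage_monthly_cost(1_000_000, stored_gb=2, transfer_gb=1)
print(f"${estimate:.2f} per month")
```

Even a fairly busy development workload like this one stays within the $2-3 a month mentioned above.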
Google Cloud Datastore is in beta now and could be a good option for you. It's free up to 1GB and 50K requests per day. The API is rather low level. However, I wrote a high level ORM for GCD called Pogo that serializes and deserializes plain old objects into GCD entities.
Take a look at the documentation and open source here - http://code.thecodeprose.com/pogo
It's also available on NuGet as "Pogo".
I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is:
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point reporting services to the live databases for performance/security reasons so we currently backup both databases on a daily basis and restore them to our Reporting Services server.
However this means we are putting a strain on our networks by backing up the entire databases, and also the data is only up-to-date by midnight yesterday.
Our aim is to have the data no more than 15 minutes out of date. It has been suggested that we look at Log Shipping, so I wondered if anyone had any experience in setting this up, what the pros and cons are, and whether there is a better alternative?
Any help would be greatly appreciated,
Thanks
We developed a similar environment. We used Mirroring to get the data off to our reporting server and created an automated routine to create Snapshots of the database every 15 min. These snapshots only take 1 to 2 seconds to create in our environment and give us a read only copy of the database. Let me know if you would like me to go into deeper detail.
Note we are running Enterprise on both servers.
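For anyone wanting to try the snapshot approach, the statement that the automated routine has to issue looks roughly like the output of this sketch. The naming scheme, directory and helper function are assumptions made for illustration, not the answerer's actual routine, and the names are assumed to come from trusted configuration (no escaping is done).

```python
from datetime import datetime


def snapshot_statement(source_db, logical_file, snapshot_dir, taken_at):
    """Build a CREATE DATABASE ... AS SNAPSHOT OF statement with a timestamped name."""
    name = f"{source_db}_snap_{taken_at:%Y%m%d_%H%M}"
    filename = f"{snapshot_dir}\\{name}.ss"  # sparse file backing the snapshot
    return (
        f"CREATE DATABASE [{name}] "
        f"ON (NAME = [{logical_file}], FILENAME = N'{filename}') "
        f"AS SNAPSHOT OF [{source_db}];"
    )


print(snapshot_statement("Sales", "Sales_Data", r"D:\Snapshots", datetime.now()))
```

A scheduled job would run the generated statement every 15 minutes, repoint the reports at the newest snapshot, and drop older snapshots once nothing is reading from them. A database with several data files needs one `(NAME = ..., FILENAME = ...)` entry per file.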
Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less), and the rest of the time you do it every 15 minutes.
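The two-schedule idea boils down to picking a restore interval from the time of day. Here is a minimal sketch, assuming business hours of 09:00 to 17:00 on weekdays, which is a placeholder definition to adjust locally.

```python
from datetime import datetime, timedelta

BUSINESS_START_HOUR = 9   # assumption: the working day starts at 09:00
BUSINESS_END_HOUR = 17    # assumption: the working day ends at 17:00


def restore_interval(now):
    """Return how long to wait between log restores at the given time."""
    in_business_hours = (
        now.weekday() < 5  # Monday to Friday
        and BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR
    )
    # Restore rarely while report users are connected, often otherwise.
    return timedelta(hours=1) if in_business_hours else timedelta(minutes=15)


print(restore_interval(datetime.now()))
```

In practice the same effect is usually achieved with two job schedules on the restore job rather than with custom code; the function only makes the rule explicit.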
You should look at replication as an alternative to backups.
I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use Transactional Replication (albeit in real time; you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right that there is indeed an element of configuration required with Replication, along with security considerations that would need to be addressed. However, there are a number of key advantages to using Replication in my opinion, including:
Reduced latency in comparison to log shipping.
The ability to publish only the Articles (tables) that are required for reporting.
Reduced storage requirements.
Less data being published means less network traffic.
Access to your reporting data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.