I would like to know how people tackle backup and restore of data in the cloud.
I plan to use AppEngine for business use, and as far as I can tell there is no classic backup and restore functionality built into AppEngine. That is understandable, since the underlying database structure is quite different from a traditional relational database.
So the question is: how should one approach backup and restore in a High Replication AppEngine application?
Agree with the previous posts. The built-in datastore backup / restore is solid enough.
However, if your data is partitioned by namespace, Google does not offer the ability to backup / restore by namespace, which is a big limitation. It means that if you have a multi-tenant application, you cannot backup / restore the data for a single namespace.
If you really need backup / restore by namespace, you will have to extend Google's backup / restore code, which is open source (see http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/datastore_admin/backup_handler.py).
I am currently planning to make this modification to Google's open source code but have not found the time to do it yet.
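In the meantime, if you just need an ad-hoc per-namespace export, something along these lines works. This is a minimal sketch using the standalone google-cloud-datastore client rather than the datastore_admin handler linked above; the project, namespace, and output path are placeholders.

```python
"""Minimal per-namespace export sketch (NOT the datastore_admin backup handler).
Project, namespace, and output path are placeholders -- adjust for your setup."""
import json
from google.cloud import datastore


def export_namespace(project, namespace, out_path):
    client = datastore.Client(project=project, namespace=namespace)

    # Discover the kinds present in this namespace via the __kind__ metadata query.
    kind_query = client.query(kind="__kind__")
    kind_query.keys_only()
    kinds = [k.key.id_or_name for k in kind_query.fetch()
             if not k.key.id_or_name.startswith("__")]

    # Dump every entity of every kind as one JSON line.
    with open(out_path, "w") as out:
        for kind in kinds:
            for entity in client.query(kind=kind).fetch():
                record = {"kind": kind,
                          "key": entity.key.flat_path,
                          "properties": dict(entity)}
                out.write(json.dumps(record, default=str) + "\n")


export_namespace("my-project", "tenant-a", "tenant-a-export.jsonl")
```

Restoring is the mirror image: read the JSON lines back, rebuild keys in the target namespace, and `put()` the entities in batches.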
Hope this helps !
Backup/Restore, Copy and Delete on Google App Engine are still experimental, but they are there. I would suggest you build a prototype and try to Backup/Restore a few times before deciding to build the whole thing. The data is pretty secure, but if you want to protect the datastore from abuse or an attack, then it's necessary to have that covered. If you are afraid of losing data, the chances of that actually happening are pretty low, but still, you never know!
GAE backup works pretty well for us: a few days ago I backed up entities totaling about 800 MB in size. No problems there. It also does restore - just save the data to a file in the blobstore or Cloud Storage and you can restore it anytime. There is one limitation: no automatic/programmable backup - it's all manual.
I would like to back up my SonarQube production server. Can you please help with the two queries below:
1. What needs to be backed up (e.g. the database, and which folders from the SonarQube home directory)?
2. Is there an existing solution that can be used directly to take a backup of SonarQube?
Thanks and Regards,
Deepti
We regularly do backups in our company, and for that we only back up two things:
the database - because it contains all the data and, according to the upgrade notes (https://docs.sonarqube.org/latest/setup/upgrading/), it is the only thing you need for a restore
the ./extensions/plugins folder - because afterwards you never know which version of which plugin was installed, and you might have a custom plugin, or one that is not in the marketplace, which you will surely forget about
There is no reason to back up the Elasticsearch data, as SonarQube will recreate all the necessary information on startup. Just be aware that the first startup will take some time, depending on the amount of data you have stored.
As those two things are pretty straightforward, I am not sure there is really a dedicated tool for it - a small script is enough, see the sketch below.
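For illustration, a minimal nightly backup sketch covering exactly those two things. It assumes a PostgreSQL backend and that connection credentials come from the environment (e.g. PGHOST/PGUSER/.pgpass); the paths and database name are placeholders.

```python
#!/usr/bin/env python3
"""Nightly SonarQube backup sketch: dump the database, archive the plugins folder.
PostgreSQL backend, paths, and database name are assumptions -- adjust as needed."""
import datetime
import shutil
import subprocess

SONARQUBE_HOME = "/opt/sonarqube"        # assumed install location
BACKUP_DIR = "/var/backups/sonarqube"    # assumed backup target
DB_NAME = "sonarqube"                    # assumed database name


def backup():
    stamp = datetime.date.today().isoformat()

    # 1. Dump the database - the only thing SonarQube needs for a restore.
    with open(f"{BACKUP_DIR}/sonarqube-db-{stamp}.sql", "wb") as out:
        subprocess.run(["pg_dump", DB_NAME], stdout=out, check=True)

    # 2. Archive the plugins folder so you know which plugin versions were installed.
    shutil.make_archive(f"{BACKUP_DIR}/sonarqube-plugins-{stamp}", "zip",
                        f"{SONARQUBE_HOME}/extensions/plugins")


if __name__ == "__main__":
    backup()
```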
So I recently started using AWS and Elastic Beanstalk with RDS.
I wonder what the best practices are for creating database backups?
So far my setup is this.
Enable automatic backups.
A bash script that creates manual snapshots every day and removes manual snapshots older than 8 days.
A bash script that creates an SQL dump of the database and uploads it to S3.
The reason I am creating the manual snapshots is that if I were to delete the database by mistake, I would still have the snapshots.
The bash scripts run on an EC2 instance launched with an IAM role that allows them to perform these operations.
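Roughly, the snapshot-rotation part of my setup does the following (sketched here in Python with boto3 rather than bash; the instance name and prefix are placeholders):

```python
#!/usr/bin/env python3
"""Daily RDS snapshot + cleanup sketch. Instance name and prefix are placeholders."""
import datetime
import boto3

rds = boto3.client("rds")
DB_INSTANCE = "my-db-instance"   # placeholder
PREFIX = "daily-manual"
KEEP_DAYS = 8


def create_snapshot():
    stamp = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    rds.create_db_snapshot(
        DBInstanceIdentifier=DB_INSTANCE,
        DBSnapshotIdentifier=f"{PREFIX}-{DB_INSTANCE}-{stamp}",
    )


def prune_old_snapshots():
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=KEEP_DAYS)
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier=DB_INSTANCE, SnapshotType="manual"
    )["DBSnapshots"]
    for snap in snapshots:
        # SnapshotCreateTime is missing while a snapshot is still being created.
        created = snap.get("SnapshotCreateTime")
        if snap["DBSnapshotIdentifier"].startswith(PREFIX) and created and created < cutoff:
            rds.delete_db_snapshot(DBSnapshotIdentifier=snap["DBSnapshotIdentifier"])


if __name__ == "__main__":
    create_snapshot()
    prune_old_snapshots()
```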
Am I on the right track here?
I really appreciate answers, thanks.
A bit of context...
That automated backups are not saved after DB deletion is a very important technical gotcha. I've seen it catch devs on the team unawares, so thanks for bringing this up.
After the DB instance is deleted, RDS retains this final DB snapshot and all other manual DB snapshots indefinitely. However, all automated backups are deleted and cannot be recovered when you delete a DB instance. source
I suspect for most folks, the final snapshot is sufficient.
Onto the question at hand...
Yes. 110%. Absolutely.
I wouldn't create manual snapshots; rather, copy the automated ones.
Option 1: You already have the automated snapshots available. Why not just copy the automated snapshot, which creates a manual snapshot (less unnecessary DB load; though, admittedly, that is less of an issue if you're multi-AZ, since you'll be snapshotting from the replica)? I'd automate this using the AWS SDK and a cron job.
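For example, a minimal copy-the-latest-automated-snapshot sketch with boto3 (the instance name and target identifier are placeholders; run it from cron):

```python
"""Sketch of Option 1: copy the most recent automated RDS snapshot into a manual one.
Instance name and target identifier are placeholders."""
import datetime
import boto3

rds = boto3.client("rds")
DB_INSTANCE = "my-db-instance"  # placeholder


def copy_latest_automated_snapshot():
    automated = rds.describe_db_snapshots(
        DBInstanceIdentifier=DB_INSTANCE, SnapshotType="automated"
    )["DBSnapshots"]
    # Ignore snapshots still being created (no SnapshotCreateTime yet).
    automated = [s for s in automated if s.get("SnapshotCreateTime")]
    latest = max(automated, key=lambda s: s["SnapshotCreateTime"])

    stamp = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=latest["DBSnapshotIdentifier"],
        TargetDBSnapshotIdentifier=f"copy-{DB_INSTANCE}-{stamp}",
    )


if __name__ == "__main__":
    copy_latest_automated_snapshot()
```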
Option 2: Requires manual adherence. Simply copy your automated snapshots (to create manual snapshots) before terminating a DB.
Unclear why you'd need the s3 dump if you have the snapshots.
For schema changes?: If you're doing it for schema changes, these should really be handled with migrations (we use knex.js, but pick your poison). If that's a bridge too far, remember that there's an option for schema-only dumps (pg_dump --schema-only). Much more manageable.
Getting at the data?: Your snapshots are already on s3, see faq. You can always load a snapshot, and sql dump it if you choose. I don't see an immediately obvious reason for purposely duplicating your data.
Google Datastore has a backup utility, but it is far too slow for an operational database, taking hours to run a backup or restore of a few dozen GB. Also, Google recommends disabling Cloud Datastore writes during a backup, which again is impossible for an operational database.
How can I backup my Datastore so that if there is data corruption, I can rapidly restore, losing at most a few minutes of transactions?
It seems that this is an essential part of any full-strength database system.
(Other databases provide this with
append-only storage or
periodic backups augmented with a differential backup or transaction log or
realtime mirroring, though that doesn't handle the case of data corruption from a bug that writes to the database.)
The backup utility starts MapReduce jobs in the background to parallelize the backup/restore and perform it faster. However, it seems to shard the entities by namespace, so if your data is in one or a few namespaces the process can be very slow.
You could implement your own parallel backup/restore mechanism using a tool like AppEngine MapReduce or Cloud Dataflow to make it faster.
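As a lighter-weight stand-in for the MapReduce/Dataflow approach, you can get a useful amount of parallelism just by exporting kinds in parallel processes with the plain google-cloud-datastore client. The project, namespace, and kind list below are placeholders:

```python
"""Parallel-export sketch: one worker process per kind, using google-cloud-datastore.
Project, namespace, and kind list are placeholders -- a simpler stand-in for MapReduce/Dataflow."""
import json
from multiprocessing import Pool

from google.cloud import datastore

PROJECT = "my-project"                    # placeholder
NAMESPACE = "default"                     # placeholder
KINDS = ["Customer", "Order", "Invoice"]  # placeholder kind list


def export_kind(kind):
    # Create the client inside the worker; clients are not fork-safe.
    client = datastore.Client(project=PROJECT, namespace=NAMESPACE)
    with open(f"{kind}.jsonl", "w") as out:
        for entity in client.query(kind=kind).fetch():
            out.write(json.dumps(dict(entity), default=str) + "\n")
    return kind


if __name__ == "__main__":
    with Pool(len(KINDS)) as pool:
        for done in pool.imap_unordered(export_kind, KINDS):
            print("exported", done)
```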
You aren't going to get "few minutes" latency with an "eventually consistent" NoSQL datastore like this in order to protect yourself from yourself with a naive backup of everything. You really should invest in good testing to make sure that no bugs like this exist in the first place.
Your only real solution is to use immutable data and versioning, since this is how the other NoSQL systems do it as well. The datastore key system that already exists can work well for this and is extremely fast.
Never update anything and you never corrupt or lose anything, and you can roll individual records back to previous revisions very quickly.
Archive the older revisions to Cloud Storage at some point to save money on storage, or delete them after a fixed number of revisions.
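A rough sketch of what that can look like with ndb (the model and property names are made up; revisions hang off an ancestor key so the latest one can be read with a strongly consistent query):

```python
from google.appengine.ext import ndb


class RecordRevision(ndb.Model):
    """One immutable revision of a logical record. Names here are illustrative."""
    data = ndb.JsonProperty()                          # the payload, never mutated
    created = ndb.DateTimeProperty(auto_now_add=True)  # orders the revisions


def record_key(record_id):
    # All revisions of one record share this ancestor key.
    return ndb.Key('Record', record_id)


def write_revision(record_id, data):
    # "Updating" means appending a new revision; nothing is ever overwritten.
    return RecordRevision(parent=record_key(record_id), data=data).put()


def latest(record_id):
    return (RecordRevision.query(ancestor=record_key(record_id))
            .order(-RecordRevision.created).get())


def history(record_id, limit=20):
    # Rolling back is just picking an earlier revision from this list.
    return (RecordRevision.query(ancestor=record_key(record_id))
            .order(-RecordRevision.created).fetch(limit))
```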
I've seen a few posts on this, but I just want to make sure I'm not missing something.
I'm seriously considering moving from Azure to App Harbor, but I'm a bit dismayed that there doesn't seem to be a way to maintain daily SQL Server database backups.
I understand that App Harbor maintains daily file system snapshots. This is great for recovering from a catastrophic failure, but doesn't do much to deal with recovering from user errors. For example, if I accidentally delete a chunk of rows, I may want to restore a database from a few days ago to help recover.
I know about these tools for transferring data to/from App Harbor:
- "Generate Scripts" tool in SQL Management Studio
- Bulk copy tool: https://github.com/appharbor/AppHarbor-SqlServerBulkCopy
Those are fine for doing a one-off backup or restore, but I'm looking to figure out some way to back up data automatically, and ideally save it off to AWS S3 storage. Is there a tool or service out there that could possibly do this?
Thank you!
I've created a simple console app that does a daily backup of tables in a SQL Server database. The output is then zipped and uploaded to Amazon S3 storage. This app can be deployed as an AppHarbor background worker. No local SQL server required!
See notes in the readme file for instructions and limitations. This does what we need for now, but I'm happy to let others work on the project if you'd like to extend it.
https://bitbucket.org/ManicBlowfish/ah-dbbackup
What are some of the best practices for taking SQL Server backups from an Amazon EC2 based server? I have a nightly job that creates the backups, but I still need to move them "off-site". So I'm really asking two questions: (1) are there sample scripts (BAT or otherwise) that can take the files and move them (FTP?) to another server, and (2) are there any other EC2-specific options that I can look at?
You should look at a two part approach.
Amazon S3 is distributed across three different data centers and offers durability of 11 9s - realistically there's more chance of Amazon going out of business than your data being unavailable. You could back up your data to S3 on a schedule based on how much it would hurt to lose that period's worth of data: e.g. if losing a day of data isn't a big deal, perhaps you back up nightly. You asked about EC2-specific options; EBS snapshots are automatically saved to S3, and they are part of the best practice for ensuring durability of EBS disks.
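For the "sample scripts" part of your question, here is a minimal sketch that pushes the nightly .bak files to S3; the bucket name and backup folder are placeholders, and you would schedule it (e.g. with Task Scheduler) to run after the backup job finishes:

```python
"""Sketch: upload the nightly SQL Server .bak files to S3.
Bucket name and local backup folder are placeholders."""
import os
import boto3

BACKUP_DIR = r"D:\SQLBackups"        # assumed local backup folder
BUCKET = "my-sqlserver-backups"      # placeholder bucket name

s3 = boto3.client("s3")


def upload_backups():
    for name in os.listdir(BACKUP_DIR):
        if name.lower().endswith(".bak"):
            s3.upload_file(os.path.join(BACKUP_DIR, name), BUCKET, f"nightly/{name}")


if __name__ == "__main__":
    upload_backups()
```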
Because you're never likely to lose data on S3, the next most likely problem is "Amazon goes out of business", or "Amazon does not let me access my data". You can consider either another cloud provider, a traditional hosting provider, your own on-site storage, etc. The sky is really the limit on options (search StackOverflow if you can't come up with a handful - there is no S3-alternative hosted by anyone but Amazon unfortunately) so I won't cover that here, but it's worth weighing up your perception of the likelihood of this happening and the effort it would take you to make your data useful again when considering how frequently you need to do the off-AWS backup.
People who run services outside of AWS have it so easy, because they can just off-site back up to S3!
Take a look at SQL Backup Master, which can take database backups and move them to Amazon S3, Dropbox, FTP, etc.
Any standard backup tool for SQL Server should work. But I agree with crb that S3 (using EBS snapshots or not) should be the first line of defense.