Backup approach in Snowflake - snowflake-cloud-data-platform

I understand that replication can be used for disaster recovery, and that Time Travel plus Fail-safe can recover data from roughly the last 90 to 97 days. I am curious about options for implementing a backup approach beyond what Time Travel can offer, based on the requirements below:
(1) Retain weekly/monthly/yearly snapshots of data to comply with various regulations
(2) Recover an entire database to a back-in-time state
While traditional databases offer flexibility in terms of backups for recovery, I am trying to understand which backup options beyond cloning and Time Travel customers use to facilitate Snowflake back-in-time recovery for up to the last 7 years.
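(For reference, the two mechanisms mentioned above are often combined along these lines; the database name SALES_DB and the external stage @ARCHIVE_STAGE below are hypothetical, and the statements would normally be driven by a scheduled task or external orchestration.)

    -- Weekly zero-copy snapshot kept inside Snowflake
    CREATE DATABASE SALES_DB_SNAP_2024_W01 CLONE SALES_DB;

    -- Point-in-time clone via Time Travel (only possible within the retention window)
    CREATE DATABASE SALES_DB_RESTORE CLONE SALES_DB
      AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);

    -- For retention beyond Time Travel (e.g. 7 years), unload tables to an external stage
    COPY INTO @ARCHIVE_STAGE/2024_W01/orders/
      FROM SALES_DB.PUBLIC.ORDERS
      FILE_FORMAT = (TYPE = PARQUET);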
Thanks for reading the post and any inputs that you may have.

Related

Recommendations on which SQL Server Recovery Model to use

We have a production server which has approx. 50 customer production databases running. We are reviewing the recovery models and would like to know what your recommendations are for which recovery model to use.
I have been doing some research, but the responses have been mixed as to which model is best.
The server scenario is:
Full server backups are performed once daily (this is done by the cloud provider)
The databases themselves don't currently get backed up regularly (as they used to be, up until approx. 6 months ago)
The information/amount of data stored in these databases differs per database, according to each customer's needs
All of the databases are currently set to the SIMPLE recovery model. This was decided when the log files began to grow excessively, which caused HDD/space issues and prevented connections to the databases. Since changing to SIMPLE the HDD issues have stopped, but now we have to consider what recovery of the databases would look like if a disaster were to happen.
I look forward to your response/recommendations!
Thank you in advance
Simple
Work loss exposure: Changes since the most recent backup are unprotected. In the event of a disaster, those changes must be redone.
Full
Work loss exposure: Normally none. If the tail of the log is damaged, changes since the most recent log backup must be redone.
So what if you lose your data since your last backup? Acceptable or not?
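If you decide that some databases do need point-in-time recovery, the switch and the log backups that must accompany it look roughly like this (the database name and paths are placeholders, and the log backup would normally run on a SQL Agent schedule):

    -- Move one database to the FULL recovery model
    ALTER DATABASE [CustomerDb] SET RECOVERY FULL;

    -- A full backup must follow the switch to start the log chain
    BACKUP DATABASE [CustomerDb]
        TO DISK = N'D:\Backups\CustomerDb_full.bak' WITH INIT;

    -- Frequent log backups keep the log file from growing and bound the data loss
    BACKUP LOG [CustomerDb]
        TO DISK = N'D:\Backups\CustomerDb_log.trn';

Databases left in SIMPLE only need regular full (and optionally differential) backups, at the cost of losing everything since the most recent one.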

Database Backup best practices

I am working in a production environment where we process XML files daily. Our database size is quite big, and we are taking a daily backup. I have learned that MarkLogic adds the changes since your previous backup to create a new backup.
I wanted to confirm whether this is the best way to keep a daily backup, or whether there is a better way to do it. Also, is there any limit to the process that I am following? My database size is around 350 GB and increasing daily, so I am looking for a faster and easier solution.
This question is fairly open-ended: there is no single "best way". MarkLogic supports full online backups, and journal archiving for continuous incremental backup. The docs at http://docs.marklogic.com/guide/admin/backup_restore discuss these options.
Instead of a full daily backup, you might consider a full weekly backup plus journal archiving. As you start a new week, you can do whatever you like with the data from the previous week: retain it, delete it, move it onto cheaper storage, etc.
As MarkLogic databases go, 350 GB is not so large. However, at that point you should already have configured multiple forests: see http://docs.marklogic.com/guide/cluster/scalability#id_96443 for guidelines. Assuming you have multiple CPU cores, storing the content in a proportional number of forests will improve performance throughout the system. That includes backup, because multiple forests will back up in parallel - though of course the disk may still be the bottleneck. If storage is the bottleneck, separating the I/O for forests and backup is advisable.
If having multiple forests is a new idea, you might also be interested in https://github.com/mblakele/task-rebalancer

Maintenance window and recovery for a large database

One of our teams is developing a database that will be somewhat large (~500GB) and grow from there (I know 500 Gigs may seem small to many of you, but it will be one of the larger databases in our shop). One of the issues they are grappling with is backing up and restoring the database. Basically, the database will have several "data" tables and one table used for storing images / documents. We need to accomplish the following:
Be able to quickly backup and restore only the data tables (sans images) to our test server for debugging and testing purposes.
In the event of a catastrophic database failure, restore the data tables only to get most of the application up and running ASAP. Then, restore the images table when possible.
Backup the database within the allotted nightly time window (a few hours).
My questions are:
Is it possible to accomplish the first two goals while still having the images stored in the same database? If so, would we use filegroups, filestream, or something else?
How do other shops backup their databases in a reasonable time window while maintaining high availability? Do you replicate to a second server and backup from there?
We have dealt with similar issues. We are a $2.5B solar manufacturing company and disaster recovery is critical for us, as well as keeping our databases backed up. Our main database is our plant floor production database. We decided to strip this database to the absolutely essential data needed to maintain production, and move other data off into its own database. This has allowed us high availability and reasonable backup/restore times.
In your case, is it really necessary to store images in the same database as your other data? I suspect it's not, and that it's just a case of making some issues easier to deal with. I think separate filegroups would also help your problem. But you might want to seriously reconsider whether everything needs to be in a single DB.
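If the images do stay in the same database, a rough sketch of the filegroup route is below (names and paths are placeholders; deferring the images filegroup assumes the FULL recovery model and existing log backups, and restoring it while the rest of the database stays online requires Enterprise Edition):

    -- Keep the images table in its own filegroup
    ALTER DATABASE AppDb ADD FILEGROUP ImagesFG;
    ALTER DATABASE AppDb ADD FILE
        (NAME = AppDb_Images, FILENAME = N'D:\Data\AppDb_Images.ndf')
        TO FILEGROUP ImagesFG;

    -- Back up the data and image filegroups separately
    BACKUP DATABASE AppDb FILEGROUP = 'PRIMARY'  TO DISK = N'E:\Backups\AppDb_data.bak';
    BACKUP DATABASE AppDb FILEGROUP = 'ImagesFG' TO DISK = N'E:\Backups\AppDb_images.bak';

    -- Piecemeal restore: bring the data tables online first, the images later
    RESTORE DATABASE AppDb FILEGROUP = 'PRIMARY'
        FROM DISK = N'E:\Backups\AppDb_data.bak' WITH PARTIAL, NORECOVERY;
    RESTORE LOG AppDb FROM DISK = N'E:\Backups\AppDb_log.trn' WITH RECOVERY;
    -- ...the application is usable here, with ImagesFG still offline...
    RESTORE DATABASE AppDb FILEGROUP = 'ImagesFG'
        FROM DISK = N'E:\Backups\AppDb_images.bak' WITH NORECOVERY;
    RESTORE LOG AppDb FROM DISK = N'E:\Backups\AppDb_log2.trn' WITH RECOVERY;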

Backing up SQL Database for Reports

I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is:
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point reporting services to the live databases for performance/security reasons so we currently backup both databases on a daily basis and restore them to our Reporting Services server.
However, this means we are putting a strain on our networks by backing up the entire databases, and the data is only up to date as of midnight the previous day.
Our aim is to have the data up to date to within at least 15 minutes. It has been suggested that we look at log shipping, so I wondered whether anyone has any experience in setting this up, what the pros and cons are, and whether there is a better alternative?
Any help would be greatly appreciated,
Thanks
We developed a similar environment. We used mirroring to get the data over to our reporting server and created an automated routine to create snapshots of the database every 15 minutes. These snapshots only take 1 to 2 seconds to create in our environment and give us a read-only copy of the database. Let me know if you would like me to go into deeper detail.
Note we are running Enterprise on both servers.
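For anyone unfamiliar with that approach, the snapshot statement itself looks roughly like this (database, logical file, and path names are placeholders; the automated routine drops the old snapshot and creates a fresh one every 15 minutes):

    -- Drop the previous snapshot, if one exists
    IF DB_ID('ReportsDb_Snapshot') IS NOT NULL
        DROP DATABASE ReportsDb_Snapshot;

    -- Create a new read-only, point-in-time snapshot of the mirrored database
    CREATE DATABASE ReportsDb_Snapshot ON
        (NAME = ReportsDb_Data, FILENAME = N'D:\Snapshots\ReportsDb.ss')
    AS SNAPSHOT OF ReportsDb;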
Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less often), and the rest of the time you do it every 15 minutes.
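The restore that disconnects users is the log restore on the reporting server; a minimal sketch (the backup path and standby undo file below are placeholders) looks like:

    -- Apply the latest shipped log on the reporting server. WITH STANDBY leaves the
    -- database readable between restores, but existing connections must be dropped first.
    RESTORE LOG ReportsDb
        FROM DISK = N'D:\LogShip\ReportsDb_0915.trn'
        WITH STANDBY = N'D:\LogShip\ReportsDb_undo.tuf';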
You should look at replication as an alternative to backups.
I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use Transactional Replication (albeit in real time; you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right in that there is indeed an element of configuration required with replication, along with security considerations that would need to be addressed. However, there are a number of key advantages to using replication in my opinion, including:
Reduced latency in comparison to log shipping.
The ability to publish only the articles (tables) that are required for reporting.
Reduced storage requirements.
Less data being published means less network traffic.
Access to your reporting data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.
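As a rough illustration of what that configuration involves (this assumes the distributor is already set up; the publication, table, server, and database names are placeholders), the publisher-side steps look something like:

    -- Create a transactional publication on the production database
    EXEC sp_addpublication
        @publication = N'ReportingPub',
        @status = N'active',
        @repl_freq = N'continuous';

    -- Publish only the tables the reports actually need
    EXEC sp_addarticle
        @publication = N'ReportingPub',
        @article = N'Orders',
        @source_owner = N'dbo',
        @source_object = N'Orders';

    -- Push the published data to the reporting server
    EXEC sp_addsubscription
        @publication = N'ReportingPub',
        @subscriber = N'REPORTSRV',
        @destination_db = N'ReportingDb',
        @subscription_type = N'push';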

DB2 Online Database Backup

I currently have a 200+ GB database that uses the DB2 built-in backup to do a daily backup (and hopefully never a restore - lol). But since that backup now takes more than 2.5 hours to complete, I am looking into a third-party backup and restore utility. The version is 8.2 FP14, but I will be moving soon to 9.1, and I also have some 9.5 databases to back up and restore. What are the best tools that you have used for this purpose?
Thanks!
One thing that will help is going to DB2 version 9 and turning on compression. The size of the backup will then decrease (by up to 70-80% at the table level), which should shorten the backup time. Of course, if your database is continuously growing you'll soon run into problems again, but then data archiving might be the thing for you.
Before looking at third party tools, which I doubt would help too much, I would consider a few optimizations.
1) Have you used REORG on your tables and indexes? This would compact the information and minimize the number of pages used;
2) If you can, backup on multiple disks at the same time. This can easily be achieved by running db2 backup db mydb /mnt/disk1 /mnt/disk2 /mnt/disk3 ...
3) DB2 should do a good job at fine tuning itself, but you can always experiment with the WITH num_buffers BUFFERS, BUFFER buffer-size and PARALLELISM n options. But again, usually DB2 does a better job on its own;
4) Consider performing daily incremental backups, and a full backup once on Saturdays or Sundays;
5) UTIL_IMPACT_PRIORITY and UTIL_IMPACT_LIM let you throttle the backup process so that it doesn't affect your regular workload too much. This is useful if your main concern is not the time per se, but rather the performance of your data server while you back up;
6) DB2 9's data compression can truly do wonders when it comes to reducing the amount of data that needs to be backed up. I have seen very impressive results and would highly recommend it if you can migrate to version 9.1 or, even better, 9.5. A combined example of these options is sketched just after this list.
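Putting several of these options together, the commands look roughly like this (the database name, paths, and parameter values are illustrative, and incremental backups also require change tracking to be enabled):

    # Enable change tracking so incremental backups are possible
    db2 update db cfg for mydb using TRACKMOD ON

    # Weekly full online backup, striped across two paths, buffered, parallel, compressed and throttled
    db2 backup db mydb online to /mnt/backup1, /mnt/backup2 with 4 buffers buffer 4096 parallelism 2 compress util_impact_priority 50

    # Daily incremental online backup of changes since the last full backup
    db2 backup db mydb online incremental to /mnt/backup1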
There really are only two ways to make backup, and more importantly recovery, run faster:
1. backup less data and/or
2. have a bigger pipe to the backup media
I think you got a lot of suggestions on how to reduce the amount of data that you back up. Basically, you should be creating a backup strategy that relies on relatively infrequent full backups and much more frequent backups of data changed since the last full backup. I encourage you to take a look at the "Configure Automatic Maintenance" wizard in the DB2 Control Center. It will help you with creating automatic backups and with other utilities like REORG that Antonio suggested. Things like compression obviously can help, as the amount of data is much lower. However, not all DB2 editions offer compression. For example, DB2 Express-C does not. Frankly, doing compression on a 200 GB database may not be worthwhile anyway, and that is precisely why free DBMSs like DB2 Express-C don't offer compression.
As far as opening a bigger pipe for your backup, you first have to decide if you are going to back up to disk or to tape. There is a big difference in speed (obviously disk is a lot faster). Second, DB2 can parallelize backups. So, if you have multiple devices to back up to, it will back up to all of them at the same time, i.e. your elapsed time will be a lot less depending on how many devices you have to throw at the problem. Again, the DB2 Control Center can help you set it up.
Try High Performance Unload (HPU) - this was a standalone product from Infotel and is now available as part of Optim Data Studio - see the post here: https://www.ibm.com/developerworks/mydeveloperworks/blogs/idm/date/200910?lang=en
It's not a "third-party" product but anyone that I have ever seen using DB2 is using Tivoli Storage Manager to store their database backups.
Most shops will set up archive logging to TSM so you only have to take the "big" backup every week or so.
Since it's also an IBM product you won't have to worry about it working with all the different flavors of DB2 that you have.
The downside is it's an IBM product. :) Not sure if that ($) makes a difference to you.
I doubt that you can speed things up using another backup tool. As Mike mentions, you can add TSM to the stack, but that will hardly make the backup run any faster.
If I were you, I'd look into where the backup files are stored: are they using the same disk spindles as the database itself? If so, see if you can store the backup files on a storage area which isn't contended for access during your backup window.
And consider using incremental backups for daily backups, and then a long full backup on Saturdays.
(I assume that you are already running online backups, so that your data aren't unavailable during backup.)
A third party backup package probably won't help your speed much. Making sure that you are not doing full backups every 2 hours is probably the first step.
After that, look at where you are writing your backup to. Is it a local drive, instead of a network drive? Are the spindles used for anything else? Backups don't involve a lot of seek activity, but do involve a lot of big writes, so you probably want to avoid RAID 5 and go for large stripe sizes to help maximize throughput.
Naturally, you have to do full backups sooner or later, but hopefully you can find a window when load is light and you can live with a longer time period between backups. Do your full backup during a 4-6 hour period when the normal incrementals are off and then do incrementals based off of that the rest of the time.
Also, until you get your backup copied to a completely separate system you really aren't backed up. You'll have to experiment to figure out if you're better off compressing it before, during or after sending.
