Creating efficient, fast, incremental InfluxDB database backups - database

I have a raspberrypi (4b) running Raspbian Linux, collecting IoT data around the house and feeding this into an InfluxDB 1.8.3 (opensource) database. This works fine so far.
I also have a backup which runs daily like this:
influxd backup -portable /home/pi/influx-backup/
Question:
This backup process takes almost 30 minutes, during which InfluxDB is almost unuseable, system load climbs to >7 and my Pi cannot collect data. Each backup is a complete backup. Can I somehow create a faster incremental backup daily? The documentation only mentions a -since parameter but you'd have to specify this manually, which would be risky.
Alternatively, the whole system is backed up daily using borgbackup anyway. Stopping Influx, making a rsync copy of /var/lib/influxdb/data as backup, and restarting it is much, much faster than influxd backup. Is this a good alternative idea to backup the database?
What other alternatives exist to perform regular, quick, (if possible online) backups of Influx databases?
Thanks!

Acording to this site:
InfluxDB also has support for incremental backups. Snapshotting from the server now creates a full backup if one does not exist and creates numbered incremental backups after that.
DataSource
If this is the case, but you are still having a problem, perhaps you could downsize your data by running continuous queries and a data retention policy.

Related

Finding an alternative for taking full backup daily

I'm using Microsoft SQL server and storing lot of data daily. I'm taking full backup daily which takes more than 5 hours to complete. Is there any idea to complete my backup process within a hour ? An alternative things to do?
Daily differential backup (only changes will be backed up) and CloudBerry software will make you deal:)
Definitely consider taking differential backups. Depending on how much data is changing you may be able to take differential backups every hour or so, giving you a quick way to restore the hourly backup if needed. Perhaps complete a full backup once a week or every few days.
There are several considerations and I suggest you research it well to see if it will meet your specific needs, for example (source):
If your databases are small or compress effectively enough so that your full and transaction log backups fall within storage and SLA limits, differential backups are unnecessary.
If your databases change a lot between backups, you might as well perform full backups.
If the changes to your database are few and the transaction log backups would take longer to restore than the differential backups, using differential backups might make sense and are worth investigating.
Other things to consider:
Are you currently taking transaction log backups? Should you be? What recovery model are you using?
If you decide to do differential backups, be aware of things such as the "Copy only backup" option in SQL Server Management Studio. These have implications for restoring data in a disaster recovery situation.
In essence, you should educate yourself on SQL Server backup and restore before you make any changes.
To improve the situation and to speed the backup you may consider differential and transaction log backups. You can do it through the studio, but to schedule it you will have to refer to a 3-rd party tool like CloudBerry.
The alternate to taking full backups daily is taking incremental/ progressive incremental or reverse incremental backups.
An incremental backup copies the changes only from the last successful backups.
Progressive incremental has only on one full backup copy for the servers and takes incremental backups for the rest of cycle.
Reverse incremental supported by Veeam and DPM has its own mechanism to backup and store only incremental copies from the server, and store each copy as a full copy which allows quick restoration in case of disasters.
To configure this kind of backups, you can go for any enterprise level backup tool, but if you focus on cost effectiveness I would recommend Cloudberry backup.
I tried CloudBerry Cross-Platform Cloud Backup which provides a simple GUI to manage backup and restores and cloud storage account comes bundled with the software.You can also use the cloudberry explorer to view, move and manage your data on the cloud storage account.
It also provides enterprise backup features like job scheduling, CLI, compression and encryption.

Continuous database backups?

I have the following scenario:
Our system is running a SQL Server Express 2005 database locally (on each users desktop, if you will). The system is storing a lot of production data from a machine. There are high demands on the safety of the data, and doing a backup each night, or even each hour is not enough. We need a backup strategy that will ensure almost instantaneous/continuous backup of the database.
Is there anyone out there that has successfully implemented a system similar to this, and/or has got some ideas of how to accomplish it? The only thing I can think of right now is to have mirrored drives (raid) to hold the data, but that would be complicated and expensive.
I would appreciate any and all thoughts on this, since it is a real issue for me and my company. Thanks in advance!
Update:
I was not clear enough in my description of the scenario. The system is storing data in a vehicle that has no connection to anything. A centralized database is therefor not possible. Neither can we use a standard/enterprise version of SQL Server, since it would be to expensive (each vehicle would need a license). Thanks for your input!
Switch your database into "Full" recovery mode. Do full backup every night and do delta backup after major user action. The delta backups can be done to the flash memory or different hard-drive, and all data can be synchronized with server when online.
Another simple way is to trace all user changes and important data in a text file that stored on a separate drive. If SQL database crashes the user or other operator can repeat steps to restore data.
One way I've seen this done is by using DoubleTake.
I will assume that a central database on a server is not feasible because your systems are running standalone and are not connected to anything. So this is what I would do
Set up RAID on the computer. This insures you against simple disk failure.
Any SQLSever database can be recovered to the point of the last commited transaction if you have a full database backup and a set of transaction logs available. Basically you simply restore the last full backup then apply the transaction logs going forward. See these links.
http://www.enterpriseitplanet.com/storage/features/article.php/11318_3776361_3
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=132
So what you need to do is set up a periodic full backup of both the database and transaction logs, and more regular transaction log backups (and ensure that your transaction log can never run out of space).
In the event of failure you restore the last full backup, then apply the transaction logs going forward.
Myself, if these are critical systems, I would be inclined to add an additional drive to the system and make sure that the backups are copied over to that. This is because as good as raid is it does sometimes have issues - raid controllers fail, disks get wiped accidentally in parallel, disk failures go unnoticed so your just running on one disk etc. If you ensure backups are copied to a separate disk then you can always recover to the last transaction log backup. You should also ensure tape backups of course, but they are generally a last resort in the event of trouble.
If for some reason you cannot set up raid then you should still install a second disk, but place the database file on one drive and the transaction log on the other and copy backups to both disks. In the event of failure of the C drive, or some other software issue crashing the database you can still recover to the last commited transaction. Failure on the D drive limits you to the last transaction log backup (Oracle used to allow you to mirror the transaction log from the database, which again would completely cover you, but I don't think this facility exists in SQL Server)
If you are looking for a scheduler for SQL Server Express (which doesn't come with one) then I've been using SQLScheduler quite happily without problems, and it's free.
The most obvious answer would be to ditch SQL Server Express running locally and use a single source for your data (such as a standard SQL server install on a central storage location). Unless your system requires individual back ups of every single person's own individual instance of SQL Server Express.
If your requirements are so stringent as to call for instantaneous backups on every operation, you should definitely think about a different method of storage than local instances of SQL Server Express.
Wouldn't it be easier to just use one centralized SQL Server and back that up every hour or so? If you truly need instantaneous backup, your company (which seems not to want to spend money by installing Express on each machine) will need to spring for two servers and two SQL Server Enterprise licenses to implement Mirroring.
Raid isn't that expensive, but it is also not the best option. If you really want high availability data you should upgrade to sql server standard on a remote server where each user connects to and use transaction based replication to an sql server (express) instance on another machine. Raid doesn't always protect you from dataloss. If the data is that important for you then the costs should not be that much of an issue.
Update in response to the question update.
If you can't use remote servers then there a couple of options:
You write a trigger which initiates a backup script on each insert or update and stores it on a seperate harddrive.
You use raid. But beware that if the raid controller fails that you still got a problem.
RAID is not expensive. Use RAID to protect against hard drive failure. You also need monitoring though. No point in having this if you let both drives fail.
Also, implement hourly incremental backups, then daily incremental backups and finally weekly full backups.
You need all of these strategies working together because they protect against different things. RAID does not protect against human or coding errors destroying data. Hourly and weekly backups don't protect against hard drive failure.

SQL Server 2005 Backup strategy

I manage a web application for a client with the following specs:
ASP.net 3.5 running on a Virtual Windows 2003 Web Server
SQL Server Standard hosting the database
Database current size of 6Gb, with 1Gb/month growth rate
One single table is responsible for 98% of the size, holds the most critical data for the client
Log is not kept for this big table, only selects are done in this table
50 Gb FTP space avaiable for backup
Considering this scenario, what would be the best strategy for a SQL Backup and what tool would be best suited for this task (commercial applications included, client can pay for the license fee)?
Here is the strategy we use for CodePlex.com:
All SQL servers run with a peer server using SQL mirroring
Weekly full backup (stored on separate drive from databases)
Daily differential backup (stored on separate drive from databases)
Transaction log backup every 5 minutes (stored on separate drive from databases)
Daily tape backup
Tape backups taken offsite weekly
Also very important test your backups! Studies have shown that over 30% of untested backup procedures are flawed. Here is our backup testing strategy:
Every 30 minutes verify the full backup file exists (using scheduled task)
Every 30 minutes verify the differential backup file exists (using scheduled task)
Every 30 minutes verify the transaction log backup file exists (using scheduled task)
Every 30 minutes verify the database mirroring is configured (using scheduled task)
Every day, do a test restore of the full+differential backup and report the table row counts (using scheduled task)
Once a month do a test restore of the most recent tape backup and verify the data
It depends how critical is the data. Here is however how I i'd do it.
1. Run a full backup every day.
2. Run a differential backup every 4 hours.
3. Run a transactional log backup every 15 minutes
4. Keep a copy at the site and move a copy off the site as well as soon as the backup is done.
The database is not too big, and this is easily doable.
Use a third party tool like Redgate SQL Backup and it will automatically compress and encrypt the database backup for you. I have used it extensively and am a big fan.
Additionally if you another site available, and the data is very critical, you might want to think about setting up log shipping as well.
This is a VPC? Can you install apps?
http://www.jungledisk.com/
That's what we use - make a sql job that pushes out a backup every day, then use that service to push a copy back to Amazons S3 service. If not maybe you could have a local app that pulls the backup to a machine then pushes it /w S3 webservice, or still using Jungledisk.
This is important! If your app goes down it hurts! Also make sure you backup your deployed app and resources stored there... i.e. uploaded content to your apps storage directory.
I was supposed to type in my answer to your question but I realized there are lots of far greater resources somewhere like this article in SQLServerCentral.com. You can also find lots of "Best Practices on Backup" like this one.
You might also want to take into consideration how much data you can afford to lose and how long it will take you to restore the database. Your client may decide that they never want to lose more than 15 minutes of data ever, or they may decide that losing up to a days worth of data is okay with them.

Backup SQL Server while minimizing bandwidth

I want to implement an automated backup system for my site's SQL Server 2005 DB that will backup nightly to Amazon's S3 service. But since S3 charges both for space and bandwidth used, I would like to minimize the size of the files that I transfer in. What is the best way to achieve this?
I should clarify that I'm not really talking so much about compression, which is pretty straightforward, but concerning backup strategies like whether to do differential backups all the time, whether I need to copy transaction logs, etc.
Differential backups will be smaller than full backups, of course. However, you should consider the restoration side as well. You'll need your last full backup as well as your differentials to perform the restore which can add up to a lot of bandwidth/transfer time for a restore. One option is to perform a full backup weekly and do differentials daily (or a similar type of schedule).
As for transaction logs, it depends on what granularity you're looking for in restoring your data. If restoring to the last full or differential backup is sufficient, then you don't need to worry about taking transaction log backups. If that's not the case, then transaction log backups will be necessary.
Either use a commercial product do compress the backups like Red Gate Backup Pro or just zip-compress it after you're done.
Write a .batch script or powershell script that will find the file/s created in the past day and zip them up. Then FTP or whatever you have to do.
A powershell example that I just came across.

DB2 Online Database Backup

I have currently 200+ GB database that is using the DB2 built in backup to do a daily backup (and hopefully not restore - lol) But since that backup now takes more than 2.5 hours to complete I am looking into a Third party Backup and Restore utility. The version is 8.2 FP 14 But I will be moving soon to 9.1 and I also have some 9.5 databases to backup and restore. What are the best tools that you have used for this purpose?
Thanks!
One thing that will help is going to DB2 version 9 and turn on compression. The size of the backup will then decrease (by up to 70-80% on table level) which should shorten the backup time. Of course, if your database is continuosly growing you'll soon run into problems again, but then data archiving might be the thing for you.
Before looking at third party tools, which I doubt would help too much, I would consider a few optimizations.
1) Have you used REORG on your tables and indexes? This would compact the information and minimize the amount of pages used;
2) If you can, backup on multiple disks at the same time. This can easily be achieved by running db2 backup db mydb /mnt/disk1 /mnt/disk2 /mnt/disk3 ...
3) DB2 should do a good job at fine tuning itself, but you can always experiment with the WITH num_buffers BUFFERS, BUFFER buffer-size and PARALLELISM n options. But again, usually DB2 does a better job on its own;
4) Consider performing daily incremental backups, and a full backup once on Saturdays or Sundays;
5) UTIL_IMPACT_PRIORITY and UTIL_IMPACT_LIM let you throttle the backup process so that it doesn't affect your regular workload too much. This is useful if your main concern is not the time per se, but rather the performance of your datasever while you backup;
6) DB2 9's data compression can truly do wonders when it comes to reducing the dimensions of the data that needs to be backed up. I have seen very impressive results and would highly recommend it if you can migrate to version 9.1 or, even better, 9.5.
There really are only two ways to make backup, and more important recovery, run faster:
1. backup less data and/or
2. have a bigger pipe to the backup media
I think you got a lot of suggestions on how to reduce the amount of data that you back up. Basically, you should be creating a backup strategy that relies on relatively infrequent full backup and much more frequent backups of changed (since last full backup) data. I encourage you to take a look at the "Configure Automatic Maintenance" wizard in the DB2 Control Center. It will help you with creating automatic backups and with other utilities like REORG that Antonio suggested. Things like compression obviously can help as the amount of data is much lower. However, not all DB2 editions offer compression. For example, DB2 Express-C does not. Frankly, doing compression on a 200GB database may not be worth while anyway and that is precisely why free DBMS like DB2 Express-C don't offer compression.
As far as openign a bigger pipe for your backup you first have to decide if you are going to backup to disk or to tape. There is a big difference in speed (obviously disk is a lot faster). Second, DB2 can paralelize backups. So, if you have multiple devices to back to, it will backup to all of them at the same time i.e. your elapsed time will be a lot less depending how many devices you have to throw at the problem. Again, DB2 Control Center can help you have it set up.
Try High Performance Unload (HPU) - this was a standalone product from Infotel is now available as part of the Optim data studio - posting here https://www.ibm.com/developerworks/mydeveloperworks/blogs/idm/date/200910?lang=en
It's not a "third-party" product but anyone that I have ever seen using DB2 is using Tivoli Storage Manager to store their database backups.
Most shops will set up archive logging to TSM so you only have to take the "big" backup every week or so.
Since it's also an IBM product you won't have to worry about it working with all the different flavors of DB2 that you have.
The downside is it's an IBM product. :) Not sure if that ($) makes a difference to you.
I doubt that you can speed things up using another backup tool. As Mike mentions, you can add TSM to the stack, but that will hardly make the backup run any faster.
If I were you, I'd look into where backup files are stored: Are they using the same disk spindles as the database itself? If so: See if you can store the backup files on a storage area which isn't contented for access during your backup window.
And consider using incremental backups for daily backups, and then a long full backup on Saturdays.
(I assume that you are already running online backups, so that your data aren't unavailable during backup.)
A third party backup package probably won't help your speed much. Making sure that you are not doing full backups every 2 hours is probably the first step.
After that, look at where you are writing your backup to. Is it a local drive, instead of a network drive? Are the spindles used for anything else? Backups don't involve a lot of seek activity, but do involve a lot of big writes, so you probably want to avoid RAID 5 and go for large stripe sizes to help maximize throughput.
Naturally, you have to do full backups sooner or later, but hopefully you can find a window when load is light and you can live with a longer time period between backups. Do your full backup during a 4-6 hour period when the normal incrementals are off and then do incrementals based off of that the rest of the time.
Also, until you get your backup copied to a completely separate system you really aren't backed up. You'll have to experiment to figure out if you're better off compressing it before, during or after sending.

Resources