PostgreSQL Replication: First Full Copy Then Just Change Deltas

Is it possible to have, in PostgreSQL, a replication setup where we first take a full backup and then just take the deltas - the changes since the last backup? That way, the first backup would take as long as needed, depending on the size of the database, but the following ones could be very fast, depending on the amount of change. Could this be implemented using logical replication?

Related

Incremental backup of greenplum database not working

In Greenplum, when using the gpbackup utility, I understand that heap tables, even when partitioned, are backed up in full even when we take an incremental backup. But if I create a primary key or an index on the heap table, shouldn't it start behaving like an append-organized table? It still takes a full backup when --incremental is specified. Is there a reason for that?
gpcrondump utility only compares the state of each table in the database against the last backup using state files. If there is any change in the state of the table since the last backup, it is marked dirty and is backed up during an incremental backup.
At the file level, heap tables, when vacuumed, have empty tuple slots that are filled by the next available tuple -- as soon as that slot is filled, that whole file has been modified.
As such, gpcrondump can only take incremental backups of "append-only" tables.
I would take a look at gpbackup - which has incremental backups on the roadmap and currently operates much faster than gpcrondump for most backup operations.
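The state-file comparison described above can be sketched roughly like this (the table names and state values are made up for illustration; this is not gpcrondump's actual state-file format):

```python
# Illustrative sketch of incremental-backup dirty-table detection:
# compare each table's current state against the state recorded at the
# last backup, and back up only the tables whose state has changed.
# (Hypothetical data; not gpcrondump's actual state-file format.)

def find_dirty_tables(last_state, current_state):
    """Return tables that are new or whose recorded state changed."""
    dirty = []
    for table, state in current_state.items():
        if last_state.get(table) != state:
            dirty.append(table)
    return sorted(dirty)

last_backup_state = {"sales": "eof:1024", "customers": "eof:2048"}
current_state = {"sales": "eof:1024", "customers": "eof:4096", "orders": "eof:512"}

print(find_dirty_tables(last_backup_state, current_state))
# ['customers', 'orders'] -- only these go into the incremental backup
```

The point of the file-level caveat above is that for heap tables, a vacuum filling an old tuple slot changes the table's state, so the whole table is marked dirty even though logically little has changed.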

SQLServer differential backup and restore

I have a scenario in which I need to maintain a replica of an existing database.
Is there a solution to achieve the approach below?
1. Take a full backup once and restore it to a destination database.
2. Take a scheduled (e.g. every day) differential backup (only the data that has changed since the last backup) of the source database and restore it into the destination database.
This is to avoid taking a full backup and restoring it each time.
You can use Differential Backups, but you would need to ship a new Full backup periodically or the Differentials will continue to grow.
A better solution might be Log Shipping, where you can ship just the changes on whatever schedule you want.
You can consider configuring an availability group and using a secondary SQL Server instance with asynchronous data sync. This should be considered only if the primary (the original live SQL Server) and secondary servers are in the same location/data centre. That way you don't need to take backups and restore them, or do any extra work beyond configuring it properly the first time.
If that is not the case (the copy should be available in another location/data centre), it would be better to go with configuring log shipping.
The first option is a lot better because the secondary would contain an exact copy of the primary database (with a sync delay depending on various factors... probably seconds) and you can fail over directly to the secondary in case of any issues with the primary server.
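The trade-off between differential backups and log shipping described above can be sketched with a toy model: a differential contains everything changed since the last full backup, while each log backup contains only the changes since the previous log backup (the "changed page" ids below are made up for illustration):

```python
# Sketch of why differential backups keep growing while log backups
# stay small. A differential holds everything changed since the last
# FULL backup; a log backup holds only changes since the previous log
# backup. (Toy model using sets of changed-page ids; illustrative only.)

daily_changes = [{1, 2}, {2, 3}, {4}]  # pages touched each day after the full

changed_since_full = set()
for day, changes in enumerate(daily_changes, start=1):
    changed_since_full |= changes  # differential accumulates every change
    print(f"day {day}: differential holds {len(changed_since_full)} pages, "
          f"log backup holds {len(changes)} changes")
```

This is why the answer above says you must ship a new full backup periodically when using differentials: the differential only resets when a new full backup is taken.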

Do indexes need to be rebuilt when restoring a database with existing indexes?

I removed some indexes on a very large table and realized I needed them. Instead of adding them back concurrently, which would take a very long time, I was wondering if I could just restore from a database copy that was taken before the indexes were removed?
If by "database copy" you mean a copy of the Postgres DB directory at file level (with Postgres not running to get a consistent state), then yes, such a snapshot includes everything, indexes too. You could copy that back on file level, and then start Postgres - falling back to the previous state, of course.
If, OTOH, you mean a backup made with the standard Postgres tools pg_dump or pg_dumpall, then no, indexes are not included physically - just the instructions to build them. It would not make sense to include huge chunks of functionally dependent values. Building them from restored data may be about as fast.
Either way, you could not add back an index from an older snapshot to a live DB anyway, after changes have been made to the table. That's logically impossible. So there is no alternative to rebuilding the index one way or another.
I'll answer for MySQL. You tagged your question with both mysql and postgresql so I don't know which one you really use.
If your backup was a physical backup made with a backup solution like Percona XtraBackup or MySQL Enterprise Backup, it will include the indexes, so restoring it will be quicker.
If your backup was a logical backup made with mysqldump or mydumper, then the backup includes only data. Restoring it will have to rebuild the indexes anyway. It will not save any time.
If you made the mistake of making a "backup" only by copying files out of the data directory, those are sort of like the physical backup, but unless you copied the files while the MySQL Server was shut down, the backup is probably not viable.

Optimized Way of Scheduling a Differential Backup

I am working with a data warehouse with SQL Server 2012 and was wondering what would be the most optimized, automated procedure for a backup/restore strategy.
Current observations and limitations:
1) Cannot use transaction logs as it would affect my load performance - datasets are potentially huge with large transactions
2) Current plan is to do full backup every week and differential backup every day
I am not sure when DML operations will happen as it depends on my application's usage, but is there a way to just track the NUMBER of changes to a database that would trigger a differential backup? A way that would not affect performance? I do not want to be taking unnecessary differential backups.
Would Change tracking be a good solution for my scenario? Or would there be overhead involved? I do not need to know the actual data that was changed, just the fact that it was changed by a certain amount.
Thanks in advance!
Well, there's this ( http://www.sqlskills.com/blogs/paul/new-script-how-much-of-the-database-has-changed-since-the-last-full-backup/ ). I'm just trying to figure out what problem you're trying to solve. That is, if you find that the size is below some threshold, it will be (by definition) cheap to do.
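The idea behind that script - measure how much of the database has changed, and only take the differential when it is worthwhile - can be sketched as a simple threshold check (the numbers and the 5% threshold are made up; the linked script reads the actual differential bitmap from SQL Server to count changed extents):

```python
# Decide whether a differential backup is worth taking, based on the
# fraction of the database changed since the last full backup.
# (Hypothetical values; the linked script queries SQL Server's
# differential bitmap to get the real changed-extent counts.)

def should_take_differential(changed_extents, total_extents, threshold=0.05):
    """Take a differential only when at least `threshold` of the DB changed."""
    return (changed_extents / total_extents) >= threshold

print(should_take_differential(100, 10_000))    # 1% changed -> False
print(should_take_differential(1_000, 10_000))  # 10% changed -> True
```

As the comment above notes, though, if the changed fraction is below the threshold, the differential is by definition cheap to take anyway, so skipping it may not buy much.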
It all depends on your DWH configuration.
1. Is your DWH database partitioned? If yes, it would be easier to take the daily backup (the diff backup) of the current partition ONLY. It's a much smaller set of data to be backed up.
If not, your current plan - a full backup every week and a differential backup every day - is the only way, since you cannot use the transaction log.
You could also try third-party disk (block) level backup software (e.g. Doubletake)...
Hope it helps.
You seem to have a mistaken notion of what a differential backup is. Don't worry; it's common.
When you say things like "track the number of changes to a database that would trigger a differential backup", it implies that you think that a differential backup gets all of the changes since the latest full or differential.
However, a differential backup gets all of the data that has changed since the last full backup only. So, you'd expect the size of subsequent differential backups to get larger and larger. For example, let's say you take a full backup on Sunday and a differential backup every other day. You'd get something like:
Monday: All of the data changed since Sunday's backup.
Tuesday: All of the data changed since Sunday's backup (including Monday's data)
Wednesday: All of the data changed since Sunday's backup (including Tuesday's data)
etc
Additionally, you'd only ever restore at most one differential backup if/when you need to restore your database. For instance, if your database crashed right before Thursday's backup, you'd restore your last full backup (from Sunday in my example), then Wednesday's differential, and you're done.
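The restore logic just described - the most recent full backup, plus at most one differential - can be sketched like this (the schedule is hypothetical, with days numbered 0 = Sunday onward):

```python
# Pick which backups to restore: the most recent full backup before the
# failure, plus the most recent differential taken after that full (if
# any). (Hypothetical schedule; days numbered 0 = Sunday onward.)

def restore_chain(fulls, differentials, failure_day):
    """Return the backups to restore, oldest first."""
    last_full = max(d for d in fulls if d < failure_day)
    diffs_after = [d for d in differentials if last_full < d < failure_day]
    chain = [("full", last_full)]
    if diffs_after:
        chain.append(("diff", max(diffs_after)))  # at most ONE differential
    return chain

# Full on Sunday (day 0), differentials Mon-Sat (days 1..6); crash on day 4.
print(restore_chain([0], [1, 2, 3, 5, 6], failure_day=4))
# [('full', 0), ('diff', 3)] -- Sunday's full, then Wednesday's differential
```

Note that the intermediate differentials (Monday's and Tuesday's) are never restored; Wednesday's already contains everything they held.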
As for when to schedule it, that's typically dictated by the rhythm of your business. For instance, you might decide to take a backup just before you kick off your ETL or just after. Doing it during doesn't make much sense as you'd have an inconsistent (with respect to your ETL process) database if you ever need to restore it.

Mix transactional replication and log shipping?

I've replicated a large database (close to 1 TB) to three remote servers using push transactional replication. The subscribers are read-only. A lot of data is inserted and updated (from other sources) on one day every month. Replication always fails after that day, and we manually reinitialize it from a backup every month.
Is it possible to switch to log shipping before the insert day and switch back to transactional replication after the bulk insertions have been log-shipped, so there is no need to copy the big backup file for re-initialization?
No. Transactional replication is logical while log shipping is physical. You can't switch at will between the two. But if your subscribers are read only to start with then transactional replication can be replaced out of the box with log shipping, at the cost of a slight delay in updates and having to disconnect readers on the stand-by sites every time a log is being applied (usually this is nowhere near as bad as it sounds). Given how much more efficient and less problematic log shipping is compared to transactional replication, I would not hesitate for a single second in doing this replace for good.
I question your need to re-initialize on a scheduled basis. I've had replication topologies go for a really long time without needing to re-init, and when we did, it was only because there was a schema change that didn't play nice. When you say that the large amount of data fails replication, what does that mean? Replication will gladly deliver large data changes to the subscribers. If you're running afoul of latency limits, you can either increase those or break large transactions down into smaller ones at the publisher. You also have the option of setting the MaxCmdsInTran option for the log reader agent to have it break up your transactions for you.
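Breaking a large transaction into smaller ones at the publisher (the manual alternative to MaxCmdsInTran mentioned above) usually means committing changes in batches rather than in one enormous transaction. A rough sketch of the batching pattern (the row ids and batch size are made up for illustration):

```python
# Sketch of splitting one huge change into smaller batches - the manual
# alternative to letting MaxCmdsInTran split transactions for you.
# Each batch would be committed separately, so replication delivers many
# small transactions instead of one enormous one. (Illustrative only.)

def batches(row_ids, batch_size):
    """Yield row-id chunks; each chunk would be one committed transaction."""
    for start in range(0, len(row_ids), batch_size):
        yield row_ids[start:start + batch_size]

rows = list(range(10))  # stand-in for primary keys of rows to update
for chunk in batches(rows, batch_size=4):
    print(f"UPDATE ... WHERE id IN {tuple(chunk)}")  # one txn per chunk
```

Smaller transactions also keep the distribution database and log reader from choking on a single multi-gigabyte transaction during the monthly bulk load.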
