Incremental backup of Greenplum database not working

In a Greenplum database, when using the gpbackup utility, I understand that heap tables, even when partitioned, are backed up in full even when we take an incremental backup. But if I create a primary key or index on a heap table, shouldn't it start behaving like an append-optimized table? It still takes a full backup when --incremental is specified. Is there a reason for that?

The gpcrondump utility compares the state of each table in the database against the last backup using state files. If the state of a table has changed since the last backup, the table is marked dirty and is backed up during an incremental backup.
At the file level, when heap tables are vacuumed they get empty tuple slots, which are filled by the next available tuple; as soon as a slot is filled, the whole file has been modified. Adding a primary key or an index does not change this, because it does not change the table's storage type: the table is still a heap table.
As such, gpcrondump can only take incremental backups of "append-only" tables.
I would take a look at gpbackup, which has incremental backups on its roadmap and is currently much faster than gpcrondump for most backup operations.
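To make the answer's logic concrete, here is a toy model in Python of how an incremental backup might decide which tables to include. It is a sketch under stated assumptions, not gpcrondump's or gpbackup's actual code: the fields `storage`, `modcount` and `last_ddl` are hypothetical stand-ins for the per-table state a tool could record. Heap tables are always selected, while append-optimized tables are selected only when their recorded state differs from the previous backup. The `has_index` flag never enters the decision, which is the point of the question: an index does not change the storage type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableState:
    """Per-table state recorded at backup time (hypothetical fields)."""
    name: str
    storage: str             # "heap" or "ao" (append-optimized)
    modcount: int = 0        # hypothetical counter bumped on every AO write
    last_ddl: str = ""       # hypothetical timestamp of the last DDL change
    has_index: bool = False  # indexes do not change the storage type


def tables_to_back_up(current, previous):
    """Return the names of tables an incremental backup must include.

    `previous` maps table name -> TableState recorded at the last backup.
    """
    selected = []
    for table in current:
        if table.storage == "heap":
            # No cheap way to prove a heap file is unchanged: back it up in full.
            selected.append(table.name)
            continue
        old = previous.get(table.name)
        if old is None or (old.modcount, old.last_ddl) != (table.modcount, table.last_ddl):
            selected.append(table.name)  # new or changed ("dirty") AO table
    return selected


previous = {"sales_2019": TableState("sales_2019", "ao", modcount=5, last_ddl="t1")}
current = [
    TableState("orders", "heap", has_index=True),               # heap table with a primary key
    TableState("sales_2019", "ao", modcount=5, last_ddl="t1"),  # unchanged
    TableState("sales_2020", "ao", modcount=1, last_ddl="t2"),  # new since last backup
]
print(tables_to_back_up(current, previous))  # ['orders', 'sales_2020']
```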


Do indexes need to be rebuilt when restoring a database with existing indexes?

I removed some indexes on a very large table and realized I needed them. Instead of adding them back concurrently, which would take a very long time, I was wondering if I could just do a restore using a database copy that was taken before the indexes were removed?
If by "database copy" you mean a copy of the Postgres DB directory at the file level (taken with Postgres not running, to get a consistent state), then yes, such a snapshot includes everything, indexes too. You could copy that back at the file level and then start Postgres, which of course falls back to the previous state.
If, OTOH, you mean a backup made with the standard Postgres tools pg_dump or pg_dumpall, then no, indexes are not included physically, only the instructions to build them. It would not make sense to include huge chunks of functionally dependent values, and building the indexes from the restored data may be about as fast.
Either way, you could not add back an index from an older snapshot to a live DB after changes to the table have been made; that's logically impossible. So there is no alternative to rebuilding the index one way or another.
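For the logical-dump case, the difference is easy to see with Python's built-in sqlite3 module, used here as a stand-in for pg_dump or mysqldump: `Connection.iterdump()` produces a plain SQL script. The table and index names are made up for illustration. The index survives only as a CREATE INDEX statement, and restoring the script rebuilds it from the restored rows.

```python
import sqlite3

# Build a small table with a secondary index.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
src.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
src.executemany("INSERT INTO orders (customer) VALUES (?)", [("alice",), ("bob",)])
src.commit()

# A logical dump is just SQL text: the index is an instruction, not stored pages.
dump = list(src.iterdump())
index_stmts = [stmt for stmt in dump if stmt.startswith("CREATE INDEX")]
print(index_stmts)

# Restoring replays the script, so the index is rebuilt from the restored data.
dst = sqlite3.connect(":memory:")
dst.executescript("\n".join(dump))
print(dst.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall())
```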
I'll answer for MySQL. You tagged your question with both mysql and postgresql so I don't know which one you really use.
If your backup was a physical backup made with a backup solution like Percona XtraBackup or MySQL Enterprise Backup, it will include the indexes, so restoring it will be quicker.
If your backup was a logical backup made with mysqldump or mydumper, then the backup includes only the data and the index definitions, not the index contents. Restoring it will have to rebuild the indexes anyway, so it will not save any time.
If you made the mistake of making a "backup" only by copying files out of the data directory, those are sort of like the physical backup, but unless you copied the files while the MySQL Server was shut down, the backup is probably not viable.

SQL Server: piecemeal restore of filegroups from a complete backup in Simple Recovery Model

We have a large database in MS SQL in which one of the tables is partitioned by a date column. The primary key index is also partitioned using the same partition function. The database is kept in the Simple recovery model, since data is added to it in batches every 3 months.
DBCC CHECKFILEGROUP found consistency errors, so we needed to bring back just one filegroup from a complete backup.
RESTORE did not allow me to restore a filegroup in the Simple recovery model, so I changed to the Full recovery model and then ran the following, with no errors:
restore database aricases filegroup='2003'
from disk=N'backupfile-name.bak'
with recovery
I expected the "with recovery" clause to bring this back to working order, but the process ended with a note saying:
The roll forward start point is now at log sequence number (LSN) 511972000001350200037. Additional roll forward past LSN 549061000001370900001 is required to complete the restore sequence.
When I query the table that uses this filegroup, I get a message saying that one of the partitions of the primary key cannot be accessed because it is offline, restoring, or defunct.
Why didn't the "with recovery" clause leave this filegroup fully restored? Now what?
The entire database is very large (1.5 TB). I can't back up the log, because I'd first need to take a full backup under the full recovery model. The filegroup itself is only 300 GB.
I can do the restore again-- but would like to know the correct way of performing this.
Is there a way of staying in complete recovery mode and performing a piecemeal filegroup restore from a complete database backup?
I found the answer. The bottom line is that the Simple recovery model is very limited. You must restore ALL read/write filegroups together from the same backup. Individual read-only filegroups CAN be restored separately, as long as they became read-only (no more changes) BEFORE the last backup of the read/write filegroups.
Bottom line: only the Full or Bulk-Logged recovery models let you restore single read/write filegroups.
The Bulk-Logged model is what a data warehouse with batch loading should be using, not the Simple model. My error in design.
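The rules above can be written down as a small decision function. This is a hypothetical Python sketch that simplifies SQL Server's behavior (it ignores, for example, bulk-logged edge cases and tail-log backups); `Filegroup`, `read_only_since` and `can_restore_alone` are illustrative names, times are plain integers, and the read-only filegroup "2002" is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Filegroup:
    name: str
    read_only: bool
    read_only_since: Optional[int] = None  # hypothetical: time it became read-only


def can_restore_alone(fg, recovery_model, last_rw_backup_time, log_backups_available=False):
    """Simplified check: may `fg` be restored on its own (piecemeal)?"""
    model = recovery_model.upper()
    if model in ("FULL", "BULK_LOGGED"):
        # A single read/write filegroup can be rolled forward to match the
        # rest of the database, but only with an unbroken chain of log backups.
        return log_backups_available
    if model == "SIMPLE":
        # No log backups exist, so only a read-only filegroup that stopped
        # changing before the last backup of the read/write filegroups qualifies.
        return (
            fg.read_only
            and fg.read_only_since is not None
            and fg.read_only_since < last_rw_backup_time
        )
    raise ValueError(f"unknown recovery model: {recovery_model}")


fg_2003 = Filegroup("2003", read_only=False)
fg_2002 = Filegroup("2002", read_only=True, read_only_since=50)
print(can_restore_alone(fg_2003, "SIMPLE", last_rw_backup_time=100))        # False
print(can_restore_alone(fg_2002, "SIMPLE", last_rw_backup_time=100))        # True
print(can_restore_alone(fg_2003, "FULL", 100, log_backups_available=True))  # True
```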
See the Microsoft documentation:
http://msdn.microsoft.com/en-us/library/ms191253.aspx
Then look at piecemeal restores under the Simple model, which are very limited:
http://msdn.microsoft.com/en-us/library/ms190984%28v=sql.100%29.aspx

SQL Server - Tempdb vs. Database Log usage

This may be a very basic question, but how can you determine beforehand whether a large operation will end up using database log space or tempdb space?
For instance, one large insert/update operation I ran used the database log to the point where we had to employ SSIS and bulk operations just so the space wouldn't run out, because all the changes in the script had to be deployed at once.
So now I'm working on a massive delete operation that would fill the log 10 times over. So I created a script that checks the space used by the database log file and deletes the rows in smaller batches. The idea was that once the log file grew large enough, the script would abort and then continue from that point the next day (allowing normal usage to continue until the next backup, without the risk of the log running out of space).
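The batching idea described above can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The `events` table, its `obsolete` flag and the `delete_in_batches` helper are hypothetical, and a fixed batch budget replaces the log-space check. In T-SQL the per-batch statement would more commonly be written as DELETE TOP (n) ... WHERE .... Committing each batch separately is what lets log space be reused: under the SIMPLE recovery model after a checkpoint, under FULL only after a log backup.

```python
import sqlite3


def delete_in_batches(conn, batch_size, max_batches):
    """Delete obsolete rows in small, separately committed batches.

    `max_batches` is a hypothetical stand-in for the "log file is large
    enough" check. Returns (rows_deleted, finished).
    """
    deleted = 0
    for _ in range(max_batches):
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN "
            "(SELECT rowid FROM events WHERE obsolete = 1 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # each batch is its own short transaction
        deleted += cur.rowcount
        if cur.rowcount < batch_size:
            return deleted, True   # nothing left to delete
    return deleted, False          # budget used up; resume on the next run


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, obsolete INTEGER)")
conn.executemany("INSERT INTO events (obsolete) VALUES (?)", [(1,)] * 25 + [(0,)] * 5)
conn.commit()
print(delete_in_batches(conn, batch_size=10, max_batches=2))  # (20, False): first "day"
print(delete_in_batches(conn, batch_size=10, max_batches=2))  # (5, True): finishes
```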
Now, instead of filling the log, the latter query started filling up tempdb. Tempdb data file, not log file, to be specific. So I'm thinking there's a huge hole where my understanding of these two should be. :)
Thanks for any advice!
Edit:
To clarify, the question here is why the first example uses the database log, while the latter uses the tempdb data file, to store the changes. And in general, by what logic are DML operations stored in either tempdb or the log? Normally the log should store all DB changes, while tempdb is only used to store data being processed during an operation when explicitly requested (i.e., temp objects) or when the server runs out of RAM, right?
There is actually quite a bit that goes on behind the scenes when deleting records from a table. This MSDN Blog link may help shed some light on why tempdb is filling up when you try to delete. Either way, the delete will fill up the transaction log as well; it just sounds like tempdb is filling up before it gets to the step of logging the transaction(s).
I'm not entirely sure what your requirements are, but the following links could be somewhat enlightening on your transaction logging issues. These are all set for SQL Server 2008 R2, but you can switch to whatever version you are running.
Recovery Model Overview
Considerations for Switching from the Simple Recovery Model
Considerations for Switching from the Full or Bulk-Logged Recovery Model
You also have the option of truncating the table, but that depends on a few things. If you are deleting all the records from the table and don't need each row deletion to be logged, you can truncate it (TRUNCATE TABLE logs only the page deallocations). If you are doing some sort of conditional delete, but you're deleting more than you're keeping, you could insert all of the records you want to keep into another "staging" table and then truncate the original. Then you can re-insert the records from the staging table into the original table. However, that really only works when there are no foreign key relationships on that table.
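The keep-and-truncate approach can be sketched with Python's sqlite3 module. SQLite has no TRUNCATE TABLE, so an unqualified DELETE stands in for it; the `logs` table, the `year >= 2020` predicate and the `keep_only` helper are hypothetical. In SQL Server the TRUNCATE step would fail if the table is referenced by a foreign key, which is the restriction mentioned above.

```python
import sqlite3


def keep_only(conn, table, keep_where):
    """Keep only rows matching `keep_where` by going through a staging table.

    `table` and `keep_where` are interpolated into SQL, so they must be
    trusted literals, never user input. Expects an autocommit connection
    (isolation_level=None) so the explicit BEGIN/COMMIT control the transaction.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f"CREATE TEMP TABLE staging AS SELECT * FROM {table} WHERE {keep_where}")
        conn.execute(f"DELETE FROM {table}")  # stand-in for TRUNCATE TABLE
        conn.execute(f"INSERT INTO {table} SELECT * FROM staging")
        conn.execute("DROP TABLE staging")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO logs (year) VALUES (?)", [(2010,)] * 8 + [(2024,)] * 2)
keep_only(conn, "logs", "year >= 2020")
print(conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # 2
```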

How are DDL changes replicated in PostgreSQL

In PostgreSQL 9.1, streaming replication is done by streaming WAL records, which are generated by UPDATEs and DELETEs on the master's data.
How are DDL changes replicated? Those are not part of the WAL.
PostgreSQL's write-ahead log (WAL) does contain DDL. In PostgreSQL, DDL is transactional, just like DML. Everything goes through the WAL.
See http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
To elaborate on Colin's answer, almost everything goes through the write-ahead log. It is a block-level journal that records every write that will be made to any database structure. Every change to any part of the data directory is first recorded in the WAL. That's because the primary purpose of the WAL is to allow replay of changes if the system crashes or loses power, so it needs to record every single planned disk write.
In PostgreSQL, tables, views, etc. are just entries in the system catalog tables. Changes to these catalogs get write-ahead logged along with everything else. The same is true of database creation; a database is just an entry in pg_database plus the corresponding directory structure.
Changes to tables made by VACUUM, CLUSTER, TRUNCATE, etc. all go through the WAL, either as block-level change records or as special WAL entries that describe the operation.
Only a few non-durable things don't go through WAL, like:
changes to UNLOGGED and TEMPORARY tables
Temp files for on-disk sorts
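The point that DDL is just WAL-logged catalog writes can be illustrated with a toy model in Python. It is not PostgreSQL's WAL format: `Node`, `Primary`, `replay` and the record kinds are hypothetical, and a dict stands in for pg_class. A CREATE TABLE becomes a catalog record in the log, so a standby that replays the log ends up with the table too. For an UNLOGGED table the catalog entry is still logged but the rows are not, so the standby knows about the table but has none of its data (real PostgreSQL goes further and refuses to read unlogged tables on a standby).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    catalog: dict = field(default_factory=dict)  # table name -> {"unlogged": bool}
    rows: dict = field(default_factory=dict)     # table name -> list of rows

    def apply(self, record):
        kind, payload = record
        if kind == "catalog_insert":  # CREATE TABLE is just a catalog row
            name, unlogged = payload
            self.catalog[name] = {"unlogged": unlogged}
            self.rows.setdefault(name, [])
        elif kind == "heap_insert":
            name, row = payload
            self.rows[name].append(row)


class Primary(Node):
    def __init__(self):
        super().__init__()
        self.wal = []

    def _write(self, record, logged=True):
        if logged:
            self.wal.append(record)  # log first, then apply
        self.apply(record)

    def create_table(self, name, unlogged=False):
        # The catalog entry is WAL-logged even for UNLOGGED tables.
        self._write(("catalog_insert", (name, unlogged)))

    def insert(self, name, row):
        # Row data of an UNLOGGED table bypasses the WAL.
        self._write(("heap_insert", (name, row)), logged=not self.catalog[name]["unlogged"])


def replay(wal):
    """A standby rebuilds its state purely by replaying the primary's WAL."""
    standby = Node()
    for record in wal:
        standby.apply(record)
    return standby


primary = Primary()
primary.create_table("orders")
primary.insert("orders", (1, "alice"))
primary.create_table("scratch", unlogged=True)
primary.insert("scratch", (42,))
standby = replay(primary.wal)
print(sorted(standby.catalog))                       # ['orders', 'scratch']
print(standby.rows["orders"], standby.rows["scratch"])  # [(1, 'alice')] []
```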

SQL Server Partial Database Backup (excluding some tables)

I'm managing a reasonably large SQL Server database. Some tables contain data that are business-critical and must be backed up offsite daily. But we also have other (read-write) tables that take up about half the size of the database that aren't business-critical. What I would like to do is something like this:
Primary filegroup: Tables A, B, C --> daily backup
Secondary filegroup: Tables D, E, F --> monthly (or occasional manual) backup
When I tried to test this, I got errors while trying to restore the filegroups. It looks like I can't restore a single filegroup alone, or different filegroups from different points in time. Ideally, I'd like to be able to just restore the primary filegroup (the most important one) first, and then restore the secondary one. I'm willing to accept some data loss on the secondary filegroup.
Can this be done?
In order to succeed with a partial or piecemeal restore strategy, you first need to adopt a filegroup backup strategy. You can still back up your whole database at one time if you wish, but the backup needs to be done at the filegroup level.
Details of how to perform filegroup backups can be found at the following link: http://msdn.microsoft.com/en-us/library/ms179401(v=sql.105).aspx
Details of how to perform a piecemeal restore can be found here: http://msdn.microsoft.com/en-us/library/ms177425(v=sql.100).aspx
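Even with filegroup backups in place, a read/write filegroup restored from an older filegroup backup has to be rolled forward with every log backup taken since then. This is also what the "Additional roll forward past LSN ... is required" message quoted in the earlier question is asking for. The hypothetical Python helper below sketches how such a chain could be chosen. LSNs are plain integers here, `first_lsn`/`last_lsn` loosely follow the msdb `backupset` convention (where `last_lsn` is the LSN just after the backup), and in practice SQL Server itself validates the restore sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogBackup:
    name: str
    first_lsn: int  # first LSN contained in the backup
    last_lsn: int   # LSN just after the last record in the backup (exclusive)


def log_chain(backups, start_lsn, target_lsn):
    """Return the log backups needed to roll forward from start_lsn past target_lsn.

    Raises ValueError if the chain has a gap or stops short of the target.
    """
    chain = []
    lsn = start_lsn
    if lsn > target_lsn:
        return chain  # already past the target
    for b in sorted(backups, key=lambda b: b.first_lsn):
        if b.last_lsn <= lsn:
            continue  # entirely older than what we still need
        if b.first_lsn > lsn:
            raise ValueError(f"log chain broken: no backup contains LSN {lsn}")
        chain.append(b.name)
        lsn = b.last_lsn
        if lsn > target_lsn:
            return chain
    raise ValueError(f"no log backup reaches past LSN {target_lsn}")


backups = [LogBackup("log1", 100, 200), LogBackup("log2", 200, 300), LogBackup("log3", 300, 400)]
print(log_chain(backups, start_lsn=150, target_lsn=350))  # ['log1', 'log2', 'log3']
print(log_chain(backups, start_lsn=250, target_lsn=280))  # ['log2']
```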
