I've a bit of a strange problem with a BACPAC I took last night using the SQL Azure Import/Export Service.
In our database there are 2 related tables.
dbo.Documents --All Documents in the database
Id
DocName
Extension
dbo.ProcessDocuments --Doc's specific to a process
Id
DocumentId (FK -> dbo.Documents.Id with Check Constraint)
ProcessId
Based on that Schema it should not be possible for the ProcessDocuments table to include a row that does not have a companion entry in the main Documents table.
However, after I restored the database in another environment I ended up with:
7001 entries in ProcessDocuments, but only 7000 matching entries in Documents (1 missing). The restore then failed when attempting to apply the ALTER TABLE ... CHECK CONSTRAINT on ProcessDocuments.
The only thing I can imagine is that when the backup was being taken, it went through the tables sequentially (alphabetically???), backing up the data one table at a time, and something like the following happened:
Documents gets backed up. Contains 7000 entries.
Someone adds a new process document to the system (an insert into both Documents and ProcessDocuments).
ProcessDocuments gets backed up. Contains 7001 entries.
If that's the case, then it creates a massive problem in terms of using BACPACs as a valid disaster recovery asset, because if they're taken while the system has data in motion, it's possible that your BACPAC contains data integrity issues.
Is this the case, or can anyone shed any light on what else could have caused this?
Data export uses bulk operations on the DB and is NOT guaranteed to be transactional, so issues like the one you described can and eventually will happen.
"An export operation performs an individual bulk copy of the data from each table in the database so does not guarantee the transactional consistency of the data. You can use the Windows Azure SQL Database copy database feature to make a consistent copy of a database, and perform the export from the copy."
http://msdn.microsoft.com/en-us/library/windowsazure/hh335292.aspx
If you want to create transactionally consistent backups you have to copy the DB first (which may cost you a lot, depending on the size of your DB) and then export the copied DB as a BACPAC (as ramiramilu pointed out): http://msdn.microsoft.com/en-us/library/windowsazure/jj650016.aspx
You can do it yourself or use RedGate SQL Azure Backup, but from what I understand they follow exactly the same steps as described above, so if you choose their consistent backup option it's gonna cost you as well.
As per the answer from Slav, the BACPAC export is non-transactional and can end up with inconsistent data if new rows are added to related tables while the BACPAC is being generated.
To avoid this:
1) Copy the target database. The copy command returns straight away, but the database will take some time to copy. This operation creates a full transactionally consistent copy:
CREATE DATABASE <name> AS COPY OF <original_name>
2) Find the status of your copy operation:
SELECT * FROM sys.dm_database_copies
3) Generate a bacpac file on the copied database, which isn't being used by anyone.
4) Delete the copied database, and you'll have a working bacpac file.
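Put together, a minimal sketch of that flow (database names are placeholders; run the statements against the master database of your SQL Azure server):
-- 1) Kick off a transactionally consistent copy (the statement returns immediately).
CREATE DATABASE MyDb_Copy AS COPY OF MyDb;
-- 2) Poll until the copy has finished.
SELECT database_id, start_date, percent_complete
FROM sys.dm_database_copies;
-- 3) Export MyDb_Copy as a BACPAC via the Import/Export service once the copy is online.
-- 4) Drop the copy when the BACPAC has been generated, to stop paying for it.
DROP DATABASE MyDb_Copy;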
How do various databases implement copying data (replication) to a new instance when it is added to the replication setup?
I.e., when we add a new instance, how is the data loaded into it?
There is a lot of information about replication methods, but it is usually explained for the case where the target database instance already has the same data as its source, not for the case where the target is a new, initially empty database instance.
There are basically 3 approaches here.
First you start capturing the changes from the source database using a CDC (change data capture) tool. Since the target database is not yet created, you store all the changes so you can apply them later.
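If the source database happens to be Microsoft SQL Server, its built-in change data capture can play the role of the CDC tool; a minimal sketch, with placeholder database/schema/table names:
USE SourceDb;
GO
-- Enable CDC for the database (requires sysadmin).
EXEC sys.sp_cdc_enable_db;
GO
-- Start capturing changes for one table; the changes accumulate in a change
-- table and can be applied to the target once it has been initialized.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;
GO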
Depending on the architecture you can:
If you have a 1:1 copy
Take a backup of the source database and restore it to the target. Knowing the point in time at which the backup was created, you then start applying the captured changes from that timestamp onwards.
Assuming you have a consistent backup of the database, you end up with the same data on the target, just delayed compared to the source.
If you have a subset of the tables or a different vendor
The same approach as in the first case, but you don't back up and restore the full database, just a list of tables. You can also restore the database backup in a temporary location, export part of the tables (or not full tables but just a subset of columns), and then load them into the target.
Once the target is initially prepared, you start applying the changes from the source to the target.
No source database snapshot available
If you can't get a snapshot of the source database, the replication tool often contains a method to work with that. Depending on the tool, the function is named AUTOCORRECTION (SAP/Sybase Replication Server) or HANDLECOLLISIONS (Oracle GoldenGate). This mode basically means that the replication tool has a full image of each UPDATE operation, so when the record does not exist in the target it is created; when the row for a DELETE does not exist, the operation is ignored; and when the row already exists for an INSERT, the operation is ignored.
To get a consistent state of the target you work in the mode described here for some time, until the point when the data is in sync, and then switch to regular replication.
One thing to mention about this mode is that you need to make sure that during the reconciliation period the CDC provides the full UPDATE content for each row. If an UPDATE contains just the modified columns, you would not be able to construct an INSERT command (with all column values) when the row is missing on the target.
Of course, the replication tool you use may incorporate the solution described above and do the task for you automatically.
I need a backup of a database that backs up the whole database but replaces a few fields in a table with random data (instead of the real data).
What is a good way to do that?
Thanks in advance ;-)
There is no silver bullet or one-size-fits-all script for this, however the process itself is not a big deal. If I follow your question to the letter, you are looking for guidance on how to script the following operations:
1) Backup the database
2) Restore it into a temporary location
3) Execute scripts to anonymize the data
4) Backup the anonymized data (see step 1)
This is a common development scenario, but it also comes up when we want to test or demonstrate an application to prospective clients. Read over this tip for detailed information as well as a solution: Automatically Create and Anonymize Downstream Databases from Azure
Scripting Backups
MSDN is a good source of scripts for common SQL Server tasks. There are many different flavours of backup script, and many of them depend on what resources your database is hosted on; this is a good start, but Google or SO is your friend here: Create a Full Database Backup (SQL Server)
USE SQLTestDB;
GO
BACKUP DATABASE SQLTestDB
TO DISK = 'E:\Backups\SQLTestDB.Bak'
WITH FORMAT,
MEDIANAME = 'E_Backups',
NAME = 'Full Backup of SQLTestDB';
GO
Restoring backups
You can easily use backup and restore to clone an existing database onto the same server, as long as you use a different database name and you restore to different physical files on disk.
By default a restore operation will try to use the original filenames, but that will not work when we restore side-by-side with the original source database, because those files are still in use by the source database!
Have a read over Restore a Database to a New Location (SQL Server)
First you need to know the names of the files stored within the backup, then you can construct a query that will restore the database mapped to new files.
This answer to How do you backup and restore a database as a copy on the same server? should help a lot.
Putting that together we get:
BACKUP DATABASE SQLTestDB TO DISK = 'E:\Backups\SQLTestDB.Bak'
GO
-- use the filelistonly command to work out what the logical names
-- are to use in the MOVE commands. the logical name needs to
-- stay the same, the physical name can change
restore filelistonly from disk='E:\Backups\SQLTestDB.Bak'
-- --------------------------------------------------
--| LogicalName | PhysicalName |
-- --------------------------------------------------
--| SQLTestDB | C:\mssql\data\SQLTestDB.mdf |
--| SQLTestDB_log | C:\mssql\data\SQLTestDB_log.ldf |
-- -------------------------------------------------
restore database SQLTestDB_Temp from disk='E:\Backups\SQLTestDB.Bak'
with move 'SQLTestDB' to 'C:\mssql\data\SQLTestDB_Temp.mdf',
move 'SQLTestDB_log' to 'C:\mssql\data\SQLTestDB_Temp_log.ldf'
It is possible to put this script into a stored proc so you can reuse it. One issue is how to use the results from RESTORE FILELISTONLY; you'll find this answer helps if you want to go down that path: https://stackoverflow.com/a/4018782/1690217
Anonymizing Data
This is where things get specific. Now that your database has been restored to a temporary location, you can pretty much do whatever you want to the data in whatever series of INSERT, UPDATE, or DELETE statements you need; you could even modify the schema to remove particularly sensitive columns, or drop audit and other logging tables that you don't need to distribute.
Do not leave audit tables in the database that you have anonymized, unless you plan on anonymizing the content within those logs as well! Depending on your circumstances, also consider nulling out all IMAGE and VARBINARY columns, as their contents will be particularly hard to process sufficiently.
I won't go into the specifics; there are already healthy discussions on this topic on SO:
Anonymizing customer data for development or testing
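That said, as a purely illustrative sketch (the table and column names below are made up, not from your schema), the scrubbing step run against the restored SQLTestDB_Temp copy might look something like this:
USE SQLTestDB_Temp;
GO
-- Overwrite personally identifiable fields with generated values.
UPDATE dbo.Customers
SET FirstName = 'First' + CAST(CustomerId AS varchar(10)),
    LastName  = 'Last' + CAST(CustomerId AS varchar(10)),
    Email     = 'user' + CAST(CustomerId AS varchar(10)) + '@example.com',
    Phone     = RIGHT('0000000000' + CAST(ABS(CHECKSUM(NEWID())) AS varchar(10)), 10);
GO
-- Null out binary content that is hard to scrub reliably.
UPDATE dbo.Attachments
SET Content = NULL;
GO
-- Drop logging tables you don't want to distribute at all.
IF OBJECT_ID('dbo.AuditLog', 'U') IS NOT NULL
    DROP TABLE dbo.AuditLog;
GO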
Backup the Anonymized data
When you have finished scrubbing or anonymizing your database, simply complete your script with a call to backup the temporary DB that has been anonymized.
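With the names used in the scripts above (the target path here is just a placeholder), that final step is simply another BACKUP DATABASE, this time against the temporary copy:
BACKUP DATABASE SQLTestDB_Temp
TO DISK = 'E:\Backups\SQLTestDB_Anonymized.Bak'
WITH FORMAT,
MEDIANAME = 'E_Backups',
NAME = 'Anonymized copy of SQLTestDB';
GO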
I'm able to implement a trigger on a user-defined table but not on the system tables (log_shipping_primaries and log_shipping_secondaries), which store info about when the backup (.bak) file was generated and when the transaction log shipping (.trn) file was copied and restored at the secondary database.
Objective: to implement an RPO (recovery point objective) > 15 min at the secondary database (DR site).
I was given a task to monitor log shipping activities and provide historical data to higher management. Now the problem with these 2 tables is that the old entry (for a given database) is updated every time a new entry is added.
Solution (does not work in SQL Server 2000): whenever a new entry is inserted, insert the same data into a user-defined table via a trigger, so as to keep the historical data.
How do I implement a trigger on a system table (what permission do I need), or is it possible at all (please be precise)?
Is there an alternative to a trigger, like a stored procedure or something? (I've no experience with stored procedures.)
If you are using a SQL Agent job for the backup and transaction log shipping, you can add a step before it that imports that data into a history table.
It's not possible to create triggers on system tables.
I'd look at adding a job to SQL Server Agent that runs at whatever frequency is appropriate (say, every 30 seconds).
Why not create a job to save the data from the table before each backup? As for how to do this, it's your choice really. Write a stored procedure, write an insert statement directly into a SQL Agent job, sit by your computer and hit F5 every 15 minutes... :-) Lots of options.
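To make that concrete, a small sketch of such a job step (the history table name is made up here, and I'm assuming the msdb log shipping monitor table keeps the same columns between snapshots):
-- One-time setup: an empty history table with the same columns as the
-- monitor table plus a capture timestamp.
SELECT GETDATE() AS captured_at, *
INTO dbo.log_shipping_primaries_history
FROM msdb.dbo.log_shipping_primaries
WHERE 1 = 0;
GO
-- Scheduled job step: snapshot the current rows so history survives even
-- though the source row is overwritten on each new backup/copy/restore.
INSERT INTO dbo.log_shipping_primaries_history
SELECT GETDATE(), *
FROM msdb.dbo.log_shipping_primaries;
GO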
I'm using redgate's sql azure backup tool: http://www.red-gate.com/products/dba/sql-azure-backup/
It looks like if you check "Make Backup Transactionally Consistent" you get charged a full day's use for an extra SQL Azure database. I'm wondering if I need to check this.
I do daily backups to blob storage and I backup the database to my local machine to work with every 3 days or so.
If I don't check the Transactionally Consistent box, am I going to run into any problems?
Well as the person who wrote SQL Azure Backup at Red Gate I can say that the only way to create a guaranteed transactionally consistent backup in Azure currently is indeed to use CREATE DATABASE ... AS COPY OF. This copy only exists for the duration of us taking the backup and is then dropped immediately afterwards.
If you don't check the box you'll only hit problems if there is a risk of transactions leaving the data in an inconsistent state while it is read from each table in turn. CREATE DATABASE ... AS COPY OF can take a very long time and may also cost money for the copy.
If you're backing up to a BLOB you're using the Microsoft Import/Export service rather than SQL Compare and SQL Data Compare technology, but that also reads data from the tables, so it could be inconsistent too.
Hope this helps
Richard
AFAIK transactionally consistent means that you get a snapshot of the database at a point in time; presumably SQL Azure locks the DB while it (quickly, we hope) makes a copy of the entire database, hence your one-day charge for a DB that exists for only a few minutes.
This is better illustrated by a non-transactionally consistent backup, where you begin by copying table X. While you are doing that, someone amends table Y (as it's a live database), which later gets copied to the backup. The foreign keys between X and Y might now not match because X is from an earlier point in time than Y.
I have used Sql Azure Backup and I did go for transactional consistency because the backups are for an emergency and the last thing I want in that scenario is inconsistencies in the data.
Edit: now that I think about it, Redgate should really state that if you back up every day you are effectively paying twice the rate for your database. I've been waiting for the sync framework, which I think is there now...
To answer the question in the title: a SQL Azure database copy (the 'backup') is a SQL Azure database that is copied (fully online) from the source database and contains no uncommitted transactions (i.e. it is transactionally consistent). This is achieved the same way database snapshots or backup restores achieve consistency on the standalone SQL Server product: all transactions pending at the moment of 'separation' are rolled back.
As to why or how RedGate's product utilizes this, I don't know. I would venture a guess that in order to achieve a 'transactionally consistent backup' they do a CREATE DATABASE ... AS COPY OF ... (which provides the desired transactional consistency) and then use the technology from SQL Compare and SQL Data Compare to copy out the schema and data.
I have a table in a database that I would like to backup daily, and keep the backups of the last two weeks. It's important that only this single table will be backed up.
I couldn't find a way of creating a maintenance plan or a job that will back up a single table, so I thought of creating a stored procedure job that runs the logic I mentioned above by copying rows from my table to a database on a different server, and deleting old rows from that destination database.
Unfortunately, I'm not sure if that's even possible.
Any ideas how can I accomplish what I'm trying to do would be greatly appreciated.
You back up an entire database.
A table consists of entries in system tables (sys.objects) with permissions assigned (sys.database_permissions), indexes (sys.indexes), plus the allocated 8 KB data pages. What about foreign key consistency, for example?
Upshot: There is no "table" to back up as such.
If you insist, then bcp the contents out and back up that file. YMMV for restore.
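For example, a sketch with made-up database/table/server names (the -n switch keeps bcp's native format, -T uses Windows authentication):
rem export the table's contents to a native-format file
bcp MyDb.dbo.MyTable out E:\Backups\MyTable.bcp -n -T -S MyServer
rem later, load it back into an existing (empty) copy of the table
bcp MyDb.dbo.MyTable in E:\Backups\MyTable.bcp -n -T -S MyServer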
You can create a DTS/SSIS package to do this.
I've never done this, but I think you can create another filegroup in your database and then move the table to that filegroup. Then you can schedule backups just for that filegroup. I'm not saying this will work, but it's worth your time investigating; a rough sketch follows after the links below.
To get you started...
http://decipherinfosys.wordpress.com/2007/08/14/moving-tables-to-a-different-filegroup-in-sql-2005/
http://msdn.microsoft.com/en-us/library/ms179401.aspx
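For what it's worth, a rough, untested sketch of that idea (filegroup, file, index, and table names are placeholders; the table is moved by rebuilding its clustered index on the new filegroup):
-- Add a filegroup and a data file for it.
ALTER DATABASE MyDb ADD FILEGROUP SingleTableFG;
ALTER DATABASE MyDb
ADD FILE (NAME = 'MyDb_SingleTableFG', FILENAME = 'E:\Data\MyDb_SingleTableFG.ndf')
TO FILEGROUP SingleTableFG;
-- Move the table by recreating its clustered index on the new filegroup.
CREATE UNIQUE CLUSTERED INDEX PK_MyTable
ON dbo.MyTable (Id)
WITH (DROP_EXISTING = ON)
ON SingleTableFG;
-- Back up just that filegroup (generally needs the full or bulk-logged
-- recovery model, plus the usual full backup as a baseline).
BACKUP DATABASE MyDb
FILEGROUP = 'SingleTableFG'
TO DISK = 'E:\Backups\MyDb_SingleTableFG.bak';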