MS SQL Server restart CDC to read from beginning after cleanup - sql-server

recently I started working with CDC on MS SQL Server. I have a scenario.
Enabled CDC on a SQL Server
Enalbed CDC on a certain table
Data ingested using debezium connector to kafka
Data has been cleared by cdc cleanup job
Is it possible to run cdc capturing changes once again from beginning ? Like restarting whole CDC process to initial point ?

Kind of, but it you may or may not like it.
It seems like you're looking for "can I get the missing change history back?". The answer is "not really". But you have options.
If you have historic backups of the database that has CDC on it, you could restore those somewhere and grab the CDC data from those. Depending on the size of your database, the configured retention on the CDC data, and the rate of change (i.e. how much change data has been captured), this is probably not a great option. That is, let's say that you have a backup from a month ago and your configured retention is two days. Once you restore the database, it will have the two days of change data from a month ago. You could continue to restore successively newer backups to get to current but that seems like a lot to me.
If you're using CDC to keep a target in sync with the source, you could either restore a backup of the source db somewhere or use a database snapshot of it to grab the initial state of the data and then consume CDC data from that point forward (based on the LSN of the source).

Sounds like you are asking for snapshots, not just CDC... Debezium maintains a history topic, and Kafka Source connectors store offsets as well.
These topics could be modified such that you reset both. For example, here's a blog explaining how it's done with the FileStream source connector.
Otherwise, re-posting a new Debezium connector with a different name should achieve the same effect

Related

Backup and restore cdc tables

Currently working on customer releases for our product. First step is to compare the customer database with our current dev db. Apparently we need to shut off cdc to do this, which drops all of the system cdc tables. Is there any way to backup and restore these tables for after we've finished comparing the two db's?
Apparently we need to shut off cdc to do this.
Completely agree with #DavidBrown that it isn't necessary to disable CDC for comparing the databases.
CDC enabled database simply means that your database captures the data change automatically. It manages that changed data in a separate table therefore you can still compare your database with different database without disabling CDC.
In case if required, you can restore the CDC enabled database by referring this third-party tutorial.
Restoring a SQL Server database that uses Change Data Capture

SQLServer differential backup and restore

I have a scenario in which I need to maintain a replica of existing database.
Is there a solution to achieve the below mentioned approach.
1. Take a full back once and restore to a destination database.
2. Scheduled( ex: Every day) differential backup(Only the data which has changed since last backup) of the source database and restore into the destination database
This is to avoid taking full backup and restore each time.
You can use Differential Backups, but you would need to ship a new Full backup periodically or the Differentials will continue to grow.
A better solution might be Log Shipping, where you can ship just the changes on whatever schedule you want.
You can consider configuring an availability group and use a secondary SQL server instance with asynchronous data sync. This should be considered only if the primary(original live SQL server) and secondary servers are in the same location\data centre. So you don't need to take backup-restore or do any extra work other than properly configuring it at the first time.
If that is not the case (copy should be available in another location\data center), it would be better to go with configuring log shipping.
First option is a lot better because it would contain the exact copy of the primary database (with a sync delay depending on various factors...probably seconds) and you can directly fail over to the secondary in case of any issues with the primary server.

cdc when source system is sql server azure/standard 2008+

Let us say my target staging db/data warehouse is sql server 2008+ enterprise. However, my source systems are sql server azure/standard 2008+. Can I still exploit CDC? As far as I understand, I cannot as I have to turn CDC on in the source systems and it is only available for eneterprise editions. Is this correct? I am also curious what happens if the transaction log is truncated. Thanks.
I just googled it and... if you need this for replicating into a data warehouse you probably only need change tracking https://technet.microsoft.com/en-us/library/cc280519(v=sql.105).aspx. This http://azure.microsoft.com/en-us/documentation/articles/sql-database-preview-whats-new/ says change tracking is available in Azure.
I don't see any specific info anywhere about whether change tracking uses the transaction log, but this info is in one of the links:
The tracking mechanism in change data capture involves an asynchronous
capture of changes from the transaction log so that changes are
available after the DML operation. In change tracking, the tracking
mechanism involves synchronous tracking of changes in line with DML
operations so that change information is available immediately.

what is the best way to replicate database for SSRS

I have installed SQL server database (mainserver) in one instance and SQL server database for RerportServer in others. what is the best way to replicate data from mainServer to report Server? Data in mainServer changes frequently and actual information in the ReportSever is very important.
And there is many ways to do this:
mirroring
shipping log
transactional replication
merge replication
snapshot replication
are there some best-practices about this?
Thanks
You need Transactional Replication for your case. Here is why you would not need the other 4 cases:
Mirroring
This is generally used to increase the availability of a database server and provides for automatic failover in case of a disaster.
Typically even though you have more than a single copy of the database (recommended to be on different server instances), only one of them is active at a time, called the principle server.
Every operation on this server instance is mirrored on the others continuously (as soon as possible), so this doesn't fit your use case.
Log Shipping
In this case, apart from the production database servers, you have extra failover servers such that the backup of the production server's database, differential & transactional logs are automatically shipped (copied) to the failovers, and restored.
The replication here is relatively scheduled to be at a longer interval of time than the other mechanisms, typically ranging from an hour to a couple of hours.
This also provides for having the failver servers readies manually in case of a disaster at the production sites.
This also doesn't fit your use case.
Merge Replication
The key difference between this and the others is that the replicated database instances can communicate to the different client applications independent of the changes being made to each other.
For example a database server in North America being updated by clients across Americas & Europe and another one in Australia being updated by clients across the Asia-Pacific region, and then the changes being merged to one another.
Again, it doesn't fit your use case.
Snapshot Replication
The whole snapshot of the database is published to be replicated to the secondary database (different from just the log files being shipped for replication.)
Initially however, for each type of replication a snapshot is generated to initialized the subscribing database, i.e only once.
Why you should use Transactional Replication?
You can choose the objects (Tables, Views, etc) to be replicated continuously, so if there are only a subset of the tables which are used to reporting, it would save a lot of bandwidth. This is not possible in Mirroring and Log Shipping.
You can redirect traffic from your application to the reporting server for all the reads and reports (which you can also do in others too, btw).
You can have independent batch jobs generating some of the more used reports running on the reporting server, reducing the load on the main server if it has quite frequent Inserts, Updates or Deletes.
Going through your list from top to bottom.
Mirroring: If you mirror your data from your mainServer to your reportServer you will not be able to access your reportServer. Mirroring puts the mirrored database into a continuous restoring state. Mirroring is a High Availability solution. In your case the reportServer will only be available to query if you do a fail over. The mirrored server is never operational till fail over. This is not what you want as you cannot use the reportServer till it is operational.
Log Shipping: Log shipping will allow you to apply transactional log backups on a scheduled event to the reportServer. If you backup the transaction log every 15 minutes and apply the data to the reportServer you will have a delay of 15+ minutes between your mainServer and Log server. Mirroring is actually real time log shipping. Depending on how you setup log shipping your client will have to disconnect while the database is busy restoring the log files. Thus during a long restore it might be impossible to use reporting. Log Shipping is also a High Availability feature and not really useful for reporting. See this link for a description of trying to access a database while it is trying to restore http://social.msdn.microsoft.com/forums/en-US/sqldisasterrecovery/thread/c6931747-9dcb-41f6-bdf4-ae0f4569fda7
Replication : I am lumping all the replication together here. Replication especially transactional replication can help you scale out your reporting needs. It would generally be mush easier to implement and also you would be able to report on the data all of the time where in mirroring you cant report on the data in transaction log shipping you will have gaps. So in your case replication makes much more sense. Snapshot replication would be useful if your reports could be say a day old. You can make a snapshot every morning of the data you need from mainServer and publish this to the subscribers reportServer. However if the database is extremely large then Snapshot is going to be problematic to deal with on a daily basis. Merge replication is only usefull when you want to update the replicated data. In your case you want to have a read only copy of the data to report on so Merge replication is not going to help. Transactional Replication would allow you to send replications across the wire. In your case where you need frequently updated information in your reportServer this would be extremely useful. I would probably suggest this route for you.
Just remember that by implementing the replication/mirroring/log shipping you are creating more maintenance work. Replication CAN fail. So can mirroring and so can transaction log shipping. You will need to monitor these solutions to make sure they are running smoothly. So the question is do you really need to scale out your reports to another server or maybe spend time identifying why you cant report on the production server?
Hope that helps!

SQL Azure Backup: What does transactionally consistent mean?

I'm using redgate's sql azure backup tool: http://www.red-gate.com/products/dba/sql-azure-backup/
It looks like if you check "Make Backup Transactionally Consistent" you get charged a full day's use for sql server. I'm wondering if I need to check this.
I do daily backups to blob storage and I backup the database to my local machine to work with every 3 days or so.
If I don't check the Transactionally Consistent box, am I going to run into any problems?
Well as the person who wrote SQL Azure Backup at Red Gate I can say that the only way to create a guaranteed transactionally consistent backup in Azure currently is indeed to use CREATE DATABASE ... AS COPY OF. This copy only exists for the duration of us taking the backup and is then dropped immediately afterwards.
If you don't check the box you'll only hit problems if there is a risk of transactions being in an inconsistent state when reading the data from each table in turn. CREATE COPY OF can take a very long time and also may cost money for the copy too.
If you're backing up to a BLOB you're using the Microsoft Import Export service rather than SQL Compare and SQL Data Compare technology but that also reads data from the tables to could be inconsistent too.
Hope this helps
Richard
AFAIK transactionally consistent means that you get a snapshot of the database at a point in time (which presumably means SQL Azure locks the db while (quickly we hope) it makes a copy of the entire database = your one day charge for a db that exists for only a few minutes).
This is better illustrated by non-transactionally consistent backup where begin by copying table X. While you are doing that someone amends (as it's a live database) table Y, which later gets copied to the backup. The foreign keys between X and Y might now not match 'cos X is from an earlier time period than Y.
I have used Sql Azure Backup and I did go for transactional consistency because the backups are for an emergency and the last thing I want in that scenario is inconsistencies in the data.
edit: now I think about it, Redgate should really state that if you backup every day you are effectively paying twice the rate for your database. I've been waiting for the sync framework which I think is there now...
To answer the question in the title: a SQL Azure database copy (the 'backup') is a SQL Azure database that is copied (fully online) from the source database and contains no uncommitted transactions (ie. is transactionally consistent). This is achieved the same way database snapshots or backup restores achieve consistency on the standalone SQL Server product: all pending transactions at the moment of 'separation' are rolled back.
As to why or how RedGate's product utilizes this, I don't know. I would venture a guess that in order to achieve a 'transitionally consistent backup' they are doing a CREATE DATABASE ... AS COPY OF ... (which creates the desired transactionally consistence) and then they use the technology from SQL Compare and Data Compare to copy out the schema and data.

Resources