I have a system in place where a SQL Server 2008 R2 database is mirrored to a second server. The database on the first server has change tracking turned on, and on the second server we create a database snapshot of the mirror database in order to pull data for an ETL system using the change tracking functions. Both databases have an isolation level of Read Committed.
Occasionally, when we pull the change tracking information from the database snapshot using the CHANGETABLE function, there is a row marked as an insert or update ('i' or 'u') but the row is not in the base table; it has been deleted. When I go to check the first server, the CHANGETABLE function shows only a 'd' row for the same primary key. It appears that the snapshot was taken right at the time the delete happened on the base table, but before the information was tracked by change tracking.
My questions are:
Is the mechanism for capturing change tracking in the change tracking internal tables asynchronous?
EDIT:
The mechanism for change tracking is synchronous and part of the transaction that makes the change to the table. I found that information here:
http://technet.microsoft.com/en-us/magazine/2008.11.sql.aspx
So that leaves me with these questions.
Why am I seeing the behavior I outlined above? If change tracking is synchronous, why would I get inserts/updates from change tracking that don't have a corresponding row in the table?
Would changing the isolation level to snapshot isolation help alleviate the above situation?
EDIT:
After reading more about the snapshot isolation recommendation: it is recommended so that you can wrap the call to CHANGE_TRACKING_CURRENT_VERSION and the call to CHANGETABLE(CHANGES ...) in a single transaction so that everything stays consistent. The database snapshot should be doing that for me, since it is as of a point in time. I pass the value from CHANGE_TRACKING_CURRENT_VERSION into the CHANGETABLE(CHANGES ...) function.
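For reference, a minimal sketch of the pattern that recommendation describes, assuming a hypothetical tracked table dbo.MyTable with primary key Id, snapshot isolation enabled on the database, and a @last_sync_version value saved from the previous ETL run:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRANSACTION;

    DECLARE @last_sync_version bigint = 0;  -- value persisted by the previous ETL run
    DECLARE @current_version   bigint = CHANGE_TRACKING_CURRENT_VERSION();

    SELECT  ct.Id,
            ct.SYS_CHANGE_OPERATION,        -- 'I', 'U' or 'D'
            t.*                             -- columns come back NULL for deleted rows
    FROM    CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS ct
    LEFT JOIN dbo.MyTable AS t
            ON t.Id = ct.Id;

    COMMIT TRANSACTION;
    -- Persist @current_version as the @last_sync_version for the next run.

Against the live principal, this transaction gives the same point-in-time consistency that the database snapshot gives you for free.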
Let me know if you need any more information!
Thanks
Chris
Hi Stack Overflow community,
Let me ask for your help, as I have run into a critical issue.
We have two linked servers, both Microsoft SQL Servers: a CRM server and a DW server. Some changes in the CRM system trigger a procedure to instantly push updates to the DW server; the way it works is that the CRM system calls the DW server to update the record. In my case the updates coming from the CRM system for the CRM and DW SQL servers are called simultaneously, and here the problem begins.
The DW server tries to read the changes and gets only the records as they were before the transaction began. Yes, this happens because the CRM server uses:
Read Committed Snapshot On
Unfortunately, we are not able to change the isolation level on the CRM SQL server. The simple explanation: the CRM comes from a third-party provider, and they restrict what we are allowed to change.
Is there any other way to wait for the transaction to commit and then read the latest data after it commits?
If there is a lack of information, please let me know and I will provide more details.
I don't understand the control flow here, but in the first paragraph you said updates in the CRM trigger a proc to update the DW server, so I don't see how the DW server could be updating before the CRM server. You stated they are called simultaneously, but that would negate the comment about the trigger. You wouldn't want the DW to get dirty reads, so READ COMMITTED SNAPSHOT is a good choice here, but you can also specify whatever isolation level you want at the transaction level and override the server default.
Since you asked, "Is there a way to wait for the transaction to commit and then read the latest data after it commits?" - sure, this can be handled in a few ways...
one would be an AFTER INSERT trigger
another would be to add the UPDATE to the DW in a code block after the INSERT statement, in the same procedure. Here, you could use TRY / CATCH and, of course, SET XACT_ABORT ON so that if anything fails, the entire transaction is rolled back. Remember, nested transactions aren't real. A rough sketch of that second option is below.
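For illustration only, with made-up names (dbo.Orders locally, DWSERVER.DW.dbo.Orders as the linked-server target); note that the cross-server insert inside the same transaction will be promoted to a distributed (MSDTC) transaction:

    CREATE PROCEDURE dbo.InsertOrderAndPushToDW
        @OrderId int,
        @Amount  money
    AS
    BEGIN
        SET NOCOUNT ON;
        SET XACT_ABORT ON;                 -- any error rolls the whole transaction back

        BEGIN TRY
            BEGIN TRANSACTION;

            INSERT INTO dbo.Orders (OrderId, Amount)
            VALUES (@OrderId, @Amount);

            -- Push to the DW only after the local insert succeeded, in the same
            -- transaction so both writes succeed or both roll back.
            INSERT INTO DWSERVER.DW.dbo.Orders (OrderId, Amount)
            VALUES (@OrderId, @Amount);

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0
                ROLLBACK TRANSACTION;
            THROW;                         -- re-raise the original error (SQL Server 2012+)
        END CATCH
    END;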
When a record is updated in a SQL Server table, how does the database engine physically execute such a request: is it an INSERT + DELETE, or an UPDATE operation?
As we know, the performance of a database and of any statement depends on many variables. But I would like to know if some things can be generalized.
Is there a threshold (table size, query length, # records affected...) after which the database switches from one approach to the other upon UPDATEs?
If there are times when SQL Server physically performs an insert/delete when a logical update is requested, is there a system view or metric that would show this? I.e., if there is a running total of all the inserts, updates and deletes that the database engine has performed since it was started, then I would be able to figure out how the database behaves after I issue a single UPDATE.
Is there any difference in the UPDATE statement's behavior depending on the SQL Server version (2008, 2012...)?
Many thanks.
Peter
An UPDATE on a base table without triggers is always a physical UPDATE. SQL Server has no such threshold. You can look up usage statistics in, for example, sys.dm_db_index_usage_stats.
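As a rough illustration (the table name dbo.MyTable is a placeholder), the cumulative per-index counters that view exposes since the last service restart:

    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            i.name                   AS index_name,
            s.user_seeks,
            s.user_scans,
            s.user_lookups,
            s.user_updates           -- insert/update/delete operations against the index
    FROM    sys.dm_db_index_usage_stats AS s
    JOIN    sys.indexes AS i
            ON  i.object_id = s.object_id
            AND i.index_id  = s.index_id
    WHERE   s.database_id = DB_ID()
      AND   s.object_id   = OBJECT_ID('dbo.MyTable');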
UPDATE edits the existing row. If it were insert/delete, you'd get update failures for duplicate keys.
INSERT, UPDATE and DELETE can also each be permissioned separately, so a user could be allowed to update records but not to insert or delete them, which also points to that not being the way it works.
I have an application that has been in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too tightly tied to that database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new empty database, and I only added the tables and columns I need (with different names than the production ones).
Now I need some way to update the new database from the production database regularly (it would be best if it were almost immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to using only SQL Server for the job (I can develop and run a custom application if needed).
I already did some searching and found these (partial) solutions:
Using transactional replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / column names will be problematic. So I could use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate that database to my new one (though it might keep my production database from being stressed too much?)
Using triggers to insert/update/delete the rows
Creating a custom job (either a SQL Agent job or a Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate column that is updated by a trigger on my tables, so I can tell that a row has been updated since my last replication); a rough sketch of such a job is shown below.
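Here is a rough sketch of that last option, with hypothetical names (source table prod.dbo.Customer, target table rpt.dbo.Client, and a watermark table rpt.dbo.SyncState); deletes would still need a trigger or a separate comparison:

    DECLARE @lastSync datetime2 =
           (SELECT LastSyncDate FROM rpt.dbo.SyncState WHERE TableName = 'Customer');
    DECLARE @now datetime2 = SYSUTCDATETIME();

    -- Upsert every row edited since the last run, based on the LastEditDate column.
    MERGE rpt.dbo.Client AS tgt
    USING (SELECT CustomerID, CustomerName, LastEditDate
           FROM   prod.dbo.Customer
           WHERE  LastEditDate > @lastSync) AS src
          ON tgt.ClientId = src.CustomerID
    WHEN MATCHED THEN
        UPDATE SET tgt.ClientName      = src.CustomerName,
                   tgt.LastModifiedUtc = src.LastEditDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ClientId, ClientName, LastModifiedUtc)
        VALUES (src.CustomerID, src.CustomerName, src.LastEditDate);

    -- Move the watermark forward for the next run.
    UPDATE rpt.dbo.SyncState
    SET    LastSyncDate = @now
    WHERE  TableName = 'Customer';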
Do you have any advice, or maybe some other solutions that I didn't foresee?
Thanks
I think that transactional replication is better than using triggers.
Too many resources would be used on the source server/database, because a trigger fires for each DML transaction.
Transactional replication can be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source database is...
There is one more thing that you could try - database mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
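A quick sketch of such a wrapper view, with made-up names (the replicated table dbo.Customer and the names the new reporting app expects):

    CREATE VIEW rpt.Clients
    AS
    SELECT  c.CustomerID   AS ClientId,
            c.CustomerName AS ClientName,
            c.LastEditDate AS LastModifiedUtc
    FROM    dbo.Customer AS c;

The renaming then lives in exactly one place, and the view doubles as documentation of the mapping.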
I'm gonna throw this out there and say that I'd use transaction log shipping. You can even set the secondary DB to read-only. There would be some setting up for full recovery mode and transaction log backups, but that way you can just automatically restore the transaction logs to the secondary database, be hands-off with it, and the secondary database will be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that takes your daily backups and then just restores them to the secondary.
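Roughly, if done by hand rather than with the built-in log shipping wizard (all paths and logical file names below are assumptions):

    -- On the primary (the database must be in FULL recovery):
    BACKUP DATABASE ProdDb TO DISK = 'E:\Backups\ProdDb_full.bak' WITH INIT;
    BACKUP LOG ProdDb TO DISK = 'E:\Backups\ProdDb_log_01.trn';

    -- On the secondary: restore the full backup once, then apply each log backup
    -- WITH STANDBY so the database stays readable between restores.
    RESTORE DATABASE ReportDb
        FROM DISK = 'E:\Backups\ProdDb_full.bak'
        WITH NORECOVERY,
             MOVE 'ProdDb'     TO 'D:\Data\ReportDb.mdf',
             MOVE 'ProdDb_log' TO 'D:\Data\ReportDb_log.ldf';

    RESTORE LOG ReportDb
        FROM DISK = 'E:\Backups\ProdDb_log_01.trn'
        WITH STANDBY = 'D:\Data\ReportDb_undo.dat';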
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1,000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.
I have a remote DB for which I'd like to record all changes (updates and inserts) for the next couple of hours. If necessary, I'd like to be able to revert all of those changes as well. Note that the changes might be coming in from several users/applications.
The DB is fairly small (a couple of GB) and the number of updates should be small too (< 1000).
How would I do that with SQL Server 2008 R2?
Let me know if you need more information.
Thanks,
Christian
So many options, depending on your exact needs and what you have available:
Take a full backup when the database is in the "before" state. Use a data comparison tool to determine what changed.
Create a database snapshot (Enterprise Edition engine only). This will let you see the "before" state at all times, and you can manually consolidate/roll back/whatever the changes that get made, or revert to the snapshot, which will essentially do the same thing as a restore. Use a data comparison tool to determine what changed. (A short sketch of this option follows the list.)
Use Change Data Capture (CDC) (Enterprise Edition engine only). This will capture all the changes made to the tables on which it's enabled. This is a really granular approach. Combine with either a backup or snapshot to roll back.
Set up triggers on all the tables that can be affected, and direct the data to change tracking tables you create yourself. Combine with either a backup or snapshot to roll back.
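For the snapshot option above, a minimal sketch (the database name MyDb, the logical data file name and the paths are assumptions; a snapshot needs one sparse file per data file):

    -- Create the "before" snapshot (Enterprise Edition):
    CREATE DATABASE MyDb_Before ON
        (NAME = MyDb_Data, FILENAME = 'D:\Snapshots\MyDb_Before.ss')
    AS SNAPSHOT OF MyDb;

    -- ...the next couple of hours of changes happen...

    -- To throw them all away, revert to the snapshot (requires exclusive access,
    -- and any other snapshots of MyDb must be dropped first):
    RESTORE DATABASE MyDb FROM DATABASE_SNAPSHOT = 'MyDb_Before';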
Is there a way of finding all the rows that have been updated by a single statement? SQL Server itself must be tracking this, as it could roll back the update if required. I'm interested in finding all the changed rows because I'm getting a performance hit using update triggers.
I have some large (2M-10M row) tables in SQL Server, and I'm adding audit triggers to track when records are updated and by what; the trouble is this is killing performance. Most of the updates against these tables touch 20,000+ rows, and they're now taking 5-10 times longer than previously.
I've thought of some options
1) Ditch triggers entirely and add the audit fields to every update statement, but that relies on everyone's code being changed.
2) Use before/after checksum values on the fields and then use them to update the changed rows a second time, still a performance hit.
Has anyone else solved this problem?
An UPDATE trigger already has the records affected by the update statement in the inserted and deleted pseudo tables. You can select their primary key columns into a preliminary audit table serving as a queue, and move the more complicated calculation into a separate job.
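A minimal sketch of that queue idea, assuming a hypothetical audited table dbo.Orders (key OrderId) and a narrow queue table dbo.OrderAuditQueue:

    CREATE TRIGGER trg_Orders_AuditQueue
    ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Record only the keys; a separate job picks them up for the heavy audit work.
        INSERT INTO dbo.OrderAuditQueue (OrderId, QueuedAtUtc)
        SELECT i.OrderId, SYSUTCDATETIME()
        FROM   inserted AS i;
    END;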
Another option is the OUTPUT clause for the UPDATE statement, which was introduced in SQL Server 2005. (updated after comment by Philip Kelley)
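For example (dbo.Orders and dbo.OrderAudit are made-up names; the audit table could carry a default column for the change timestamp):

    UPDATE dbo.Orders
    SET    Status = 'Shipped'
    OUTPUT inserted.OrderId,
           deleted.Status  AS OldStatus,   -- value before the update
           inserted.Status AS NewStatus    -- value after the update
    INTO   dbo.OrderAudit (OrderId, OldStatus, NewStatus)
    WHERE  Status = 'Packed';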
SQL Server knows how to roll back because it has the transaction log. That is not something you can find in the data tables.
You can try adding a timestamp column to your rows, then save a "current" timestamp and update all the rows. The changed rows should be all the rows with a timestamp greater than your "current" timestamp. This will help you find the changed rows, but not what changed them.
You can use Change Tracking or Change Data Capture. These are technologies built into the engine for tracking changes, leveraging the replication infrastructure (log reader or table triggers). Both are only available in SQL Server 2008 or 2008 R2, and CDC requires Enterprise Edition licensing.
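For reference, enabling Change Tracking is just a couple of ALTER statements (MyDb and dbo.Orders are placeholder names):

    ALTER DATABASE MyDb
    SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.Orders
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);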
Anything else you'd try to do would ultimately boil down to one of:
reading the log for changes (which is only doable by Replication, including Change Data Capture, otherwise the Engine will recycle the log before you can read it)
track changes in triggers (which is what Change Tracking would use)
track changes in application
There just isn't any free lunch. If audit is a requirement, then the overhead of auditing has to be taken into consideration and capacity planning must be done accordingly. All data audit solutions induce significant overhead, so an increase in operating cost by a factor of 2x, 4x or even 10x is not unheard of.