SQL Server, find all rows that have been updated by a statement - sql-server

Is there a way of finding all the rows that have been updated by a single statement? SQL Server itself must be tracking this, as it could roll back the update if required. I'm interested in finding all the changed rows because I'm getting a performance hit using update triggers.
I have some large (2M-10M row) tables in SQL Server, and I'm adding audit triggers to track when records are updated and by what; the trouble is this is killing performance. Most of the updates against these tables touch 20,000+ rows, and they now take 5-10 times longer than before.
I've thought of some options
1) Ditch triggers entirely and add the audit fields to every update statement, but that relies on everyone's code being changed.
2) Use before/after checksum values on the fields and then use them to update the changed rows a second time, which is still a performance hit.
Has anyone else solved this problem?

An UPDATE trigger already has the records affected by an update statement in the inserted and deleted pseudo tables. You can select their primary key columns into a preliminary audit table serving as a queue, and move the more complicated calculations into a separate job.
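For example, a minimal sketch of that pattern, assuming the audited table is dbo.BigTable with an int primary key Id, and that the queue table is one you create yourself (all names here are illustrative):

    -- Lightweight queue: only the key of the changed row and when it changed
    CREATE TABLE dbo.AuditQueue (
        Id        int      NOT NULL,
        ChangedAt datetime NOT NULL DEFAULT GETDATE()
    );
    GO
    CREATE TRIGGER trg_BigTable_AuditQueue
    ON dbo.BigTable
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Record only which rows changed; heavier audit work runs later in a separate job
        INSERT INTO dbo.AuditQueue (Id)
        SELECT i.Id
        FROM inserted AS i;
    END;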
Another option is the OUTPUT clause for the UPDATE statement, which was introduced in SQL Server 2005. (updated after comment by Philip Kelley)
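A rough sketch of the OUTPUT variant, again with illustrative table and column names:

    DECLARE @Changed TABLE (Id int, OldValue varchar(50), NewValue varchar(50));

    UPDATE dbo.BigTable
    SET    SomeColumn = 'new value'
    OUTPUT deleted.Id, deleted.SomeColumn, inserted.SomeColumn
    INTO   @Changed (Id, OldValue, NewValue)
    WHERE  SomeFilter = 1;

    -- @Changed now holds the before/after values of every row the statement touched
    SELECT * FROM @Changed;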

SQL Server knows how to roll back because it has the transaction log. It's not something you can find in the data tables.
You can try adding a timestamp (rowversion) column to your rows, saving the "current" timestamp value, and then running the update. The changed rows will be all the rows with a timestamp greater than your saved "current" value. This will help you find the changed rows, but not what changed them.
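A minimal sketch of that idea, assuming the "timestamp" column is the rowversion data type and is named RowVer (the table and column names are placeholders):

    DECLARE @before binary(8);
    SET @before = @@DBTS;            -- highest rowversion value used in the database so far

    UPDATE dbo.BigTable
    SET    SomeColumn = 'new value'
    WHERE  SomeFilter = 1;

    -- Every row the update touched now carries a rowversion above the saved value
    SELECT Id
    FROM   dbo.BigTable
    WHERE  RowVer > @before;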

You can use Change Tracking or Change Data Capture. These are technologies built into the engine for tracking changes and leverage the Replication infrastructure (the log reader) or table triggers. Both require SQL Server 2008 or newer, and CDC requires Enterprise Edition licensing.
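For reference, a rough sketch of enabling each one (the database and table names are placeholders):

    -- Change Tracking: SQL Server 2008 or newer, any edition
    ALTER DATABASE MyDb
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
    ALTER TABLE dbo.BigTable
        ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

    -- Change Data Capture: Enterprise Edition
    EXEC sys.sp_cdc_enable_db;
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'BigTable',
        @role_name     = NULL;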
Anything else you'd try to do would ultimately boil down to one of:
reading the log for changes (which is only doable by Replication, including Change Data Capture, otherwise the Engine will recycle the log before you can read it)
track changes in triggers (which is what Change Tracking would use)
track changes in application
There just isn't any free lunch. If auditing is a requirement, then its overhead has to be taken into consideration and capacity planning must be done accordingly. All data audit solutions induce significant overhead, so an increase in operating cost by a factor of 2x, 4x or even 10x is not unheard of.

Related

Create audit table for a big table with a lot of columns in SQL Server

I know this question has been asked many times. My situation is that I have a table of around 8,000 records but with around 25 columns, and I would like to monitor any changes we make to this table. My server is only 2008.
We usually create an audit table for the specific table we monitor and record any changes into it using cursors, as we usually have a lot of columns to monitor. But I don't want to do that this time!
Do you think that, instead of cursors, I could use a trigger to populate an audit table XYZ and monitor changes in it, with columns like field name, old value, new value, update_date and username?
Many thanks!
Short answer
Yes, absolutely use triggers over cursors. Cursors have a bad reputation for being misused and performing terribly, so where possible, avoid using them.
Longer answer
If you have control over the application which is reading/writing to this table, consider having it build the auditing queries instead. The thing to watch out for with an INSERT/UPDATE/DELETE trigger (which I assume is what you're going for) is that it's going to increase your write time for queries on that table, whereas writing the audit in its own query will avoid this (there is a caveat that I'll detail in the next paragraph). A consideration you also need to make is how much metadata the audit table needs to contain. For example, if your application requires users to log in, you may want to log their username to the audit table, which may not be available to a trigger. It all comes down to the purpose the audit table needs to serve for your application.
An advantage that triggers do have in this scenario is that they are bound to the same transaction as the underlying query. So if your INSERT/UPDATE/DELETE query fails and is rolled back, the audit rows created by the trigger will be rolled back along with it, so you'll never end up with an audit entry for rows which never existed. If you favour writing your own audit queries over a trigger, you'll need to be careful to ensure that they are in the same transaction and get rolled back correctly in the event of an error.
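If you do go the trigger route, here is a minimal sketch for one monitored column; the table XYZ, its assumed Id primary key and the column name are placeholders, and you would repeat (or generate) the INSERT block for each column you care about:

    CREATE TABLE dbo.AuditXYZ (
        FieldName  sysname       NOT NULL,
        OldValue   nvarchar(max) NULL,
        NewValue   nvarchar(max) NULL,
        UpdateDate datetime      NOT NULL DEFAULT GETDATE(),
        UserName   sysname       NOT NULL DEFAULT SUSER_SNAME()
    );
    GO
    CREATE TRIGGER trg_XYZ_Audit
    ON dbo.XYZ
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- One block per monitored column; logs only rows whose value actually changed
        INSERT INTO dbo.AuditXYZ (FieldName, OldValue, NewValue)
        SELECT 'SomeColumn', d.SomeColumn, i.SomeColumn
        FROM   inserted AS i
        JOIN   deleted  AS d ON d.Id = i.Id
        WHERE  ISNULL(d.SomeColumn, '') <> ISNULL(i.SomeColumn, '');
    END;

Note that SUSER_SNAME() records the database login, not the application user, which ties back to the metadata point above.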

SQL Server - Rolling back particular transaction only at a later date

I have SQL Server 2014, Standard Edition. We have several tables that we delete data from and then re-insert it under different primary keys (to merge records for two people in our system who are actually the same). All these changes are performed within a T-SQL transaction.
I understand how transactions and rollbacks work, but what I need is more of an audit/rollback since my users may need to rollback just this transaction only at a later date (not restoring the whole database or table). "Change Data Capture" is not an option since I only have standard edition.
My real question lies in how to store this auditing information. I imagine I'll need a unique key to mark this as one unit of work so that all these table changes get tied to the same group as far as the user is concerned. But if I have a DELETE WHERE ID = @ID query, for example, how do I store all the deleted records before deleting them so that I can re-insert them later if needed? I'm fine with even storing a large rollback T-SQL script of some kind; I'm just not sure how to generate INSERT scripts that I can store and run later for data I'm about to delete.
I'm open to any ideas; I just need an architecture that's generic enough to handle multiple tables and the ability to roll back deletions and insertions. I care more about the rollback ability than keeping a pretty audit table.
You cannot do that out of the box: even with full logging you can roll back an entire database to a point in time, but not specific transactions.
You will have to code something to undo transactions, but I believe simple audit triggers will give you the data you need to make it happen. Here is a good article to get you started:
https://www.mssqltips.com/sqlservertip/4055/create-a-simple-sql-server-trigger-to-build-an-audit-trail/
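As a rough sketch of one way to capture the deleted rows at the moment of deletion so they can be re-inserted later, using the OUTPUT clause and a batch id to tie the unit of work together (the table, its columns and the MergeId scheme are all illustrative):

    -- History table mirrors the columns you need to restore, plus a batch id
    CREATE TABLE dbo.PersonHistory (
        MergeId   uniqueidentifier NOT NULL,
        DeletedAt datetime         NOT NULL DEFAULT GETDATE(),
        Id        int              NOT NULL,
        FirstName nvarchar(100)    NULL,
        LastName  nvarchar(100)    NULL
    );
    GO
    DECLARE @MergeId uniqueidentifier = NEWID();  -- ties all changes in this unit of work together
    DECLARE @Id int = 123;                        -- key of the record being merged away (example value)

    BEGIN TRANSACTION;

    DELETE FROM dbo.Person
    OUTPUT @MergeId, deleted.Id, deleted.FirstName, deleted.LastName
    INTO   dbo.PersonHistory (MergeId, Id, FirstName, LastName)
    WHERE  Id = @Id;

    -- ... re-insert the data under the surviving key here ...

    COMMIT TRANSACTION;

An "undo" for a given MergeId is then an INSERT ... SELECT from dbo.PersonHistory back into dbo.Person.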

Database Engine Update Logic

When a record is updated in a SQL Server table, how does the DB engine physically execute such a request: is it an INSERT + DELETE, or an UPDATE operation?
As we know, the performance of a database and any statements depends on many variables. But I would like to know if some things can be generalized.
Is there a threshold (table size, query length, # records affected...) after which the database switches to one approach or the other upon UPDATEs?
If there are times when SQL Server is physically performing an insert/delete when a logical update is requested, is there a system view or metric that would show this? I.e., if there is a running total of all the inserts, updates and deletes that the database engine has performed since it was started, then I would be able to figure out how the database behaves after I issue a single UPDATE.
Is there any difference in the UPDATE statement's behavior depending on the SQL Server version (2008, 2012, ...)?
Many thanks.
Peter
An UPDATE on a base table without triggers is always a physical UPDATE; SQL Server has no such threshold. You can look up usage statistics, for example, in sys.dm_db_index_usage_stats.
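For example, a quick look at cumulative modification activity per index since the instance last restarted (these counters reset on restart):

    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_updates,          -- inserts, updates and deletes rolled into one counter
           s.user_seeks, s.user_scans, s.user_lookups
    FROM   sys.dm_db_index_usage_stats AS s
    JOIN   sys.indexes AS i
           ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE  s.database_id = DB_ID()
    ORDER  BY s.user_updates DESC;

Note that user_updates lumps inserts, updates and deletes together, so on its own it won't show a physical insert/delete split.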
Update edits the existing row. If it was insert/delete, then you'd get update failures for duplicate keys.
Insert/Update/Delete can also each be permissioned discretely, so a user could be allowed to update records but not to insert or delete them, which also points to that not being the way it works.

Insert data from a different DB server every second

The primary DB receives all the raw data every 10 minutes, but it only stores it for 1 week. I would like to keep all the raw data for 1 year in another DB, which is on a different server. How is that possible?
I have created a T-SQL query to select the required data from the primary DB. How can I keep pulling the updated data from the primary DB and inserting it into the secondary DB accordingly? The table has a Datetime column; would it be possible to insert only the new data for the latest datetime?
Notes: source DB is SQL Server 2012
secondary DB is SQL Server 2005
If you are on SQL Server 2008 or higher, the MERGE command (see the MS docs) may be very useful in your actual update process. Be sure you understand it.
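A rough sketch of the MERGE shape, which only applies on the side that is 2008 or newer (the table and column names are placeholders):

    MERGE dbo.RawDataYear AS target
    USING dbo.RawDataStaging AS source
          ON  target.SensorId    = source.SensorId
          AND target.ReadingTime = source.ReadingTime
    WHEN MATCHED THEN
        UPDATE SET target.Value = source.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (SensorId, ReadingTime, Value)
        VALUES (source.SensorId, source.ReadingTime, source.Value);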
Your table containing the full year of data sounds like it could be OLAP, so I refer to it that way occasionally (if you don't know what OLAP is, look it up sometime, but it does not matter to this answer).
If you are only updating 1 or 2 tables, log shipping, replication and failover may not work well for you, especially since you are not replicating the table exactly, due to different retention policies if nothing else. So make sure you understand how replication, etc. work before you go down that path. If these tables are over perhaps 50% of the total database, log shipping style methods might still be your best option. They work well and handle downtime issues for you -- you just replicate the source database to the OLAP server and then update from the duplicate database into your OLAP database.
Doing an update like this every second is an unusual requirement. However, if you create a linked server, you should be able to insert your selected rows into a staging table on the remote server and then update from it into your OLAP table(s). If you can reliably update your OLAP table(s) on the remote server in 1 second, you have a potentially useful method. If not, you may fall behind on posting data to your OLAP tables. If you can update once a minute, you may find you are much less likely to fall behind on the update cycle (at the cost of being slightly less current at all times).
You want to consider putting AFTER triggers on the source table(s) that copy the changes into staging table(s) (still on the source database), with an identity column on each staging table along with a flag to indicate Insert, Update or Delete; then you are well positioned to ship updates for one or a few tables instead of the whole database. You don't need to requery your source database repeatedly to determine what data needs to be transmitted; just select the TOP 1000 rows from your staging table(s) (ordered by the staging id) and move them to the remote staging table.
If you fall behind, a TOP 1000 loop keeps you from trying to post too much data in any one cross-server call.
Depending on your data, you may be able to optimize storage and reduce log churn by not copying all columns to your staging table -- just the staging id and the primary key of the source table -- and pretending that whatever data is in the source record at the time you post it to the OLAP database accurately reflects the data at the time the record was staged. It won't be 100% accurate on your OLAP table at all times, but it will be accurate eventually.
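A minimal sketch of the staging-table-plus-trigger approach described above, with illustrative names and an int primary key Id assumed on the source table:

    CREATE TABLE dbo.RawDataStage (
        StageId   int IDENTITY(1,1) PRIMARY KEY,
        Operation char(1) NOT NULL,   -- 'I', 'U' or 'D'
        SourceId  int     NOT NULL    -- primary key of the source row
    );
    GO
    CREATE TRIGGER trg_RawData_Stage
    ON dbo.RawData
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.RawDataStage (Operation, SourceId)
        SELECT CASE WHEN d.Id IS NULL THEN 'I'
                    WHEN i.Id IS NULL THEN 'D'
                    ELSE 'U' END,
               COALESCE(i.Id, d.Id)
        FROM inserted AS i
        FULL OUTER JOIN deleted AS d ON d.Id = i.Id;
    END;
    GO
    -- The shipping job then moves batches of at most 1000 rows at a time, e.g.
    -- SELECT TOP 1000 * FROM dbo.RawDataStage ORDER BY StageId;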
I cannot overemphasize that you need to accommodate downtime in your design -- unless you can live with data loss or just wrong data. Even reliable connections are not 100% reliable.

Find most recent SQL Server database activity

Data from another system is replicated into a SQL Server 2005 database in real-time (during the day, it's hundreds of transactions/second) using Goldengate. I'd like to be able to tell if there's been a transaction recently, which will tell me if replication is currently happening. Even in the off-hours, I can expect a transaction every few minutes, though I won't know which of the 400 tables it will go into.
Here's my current process:
IUD trigger on most popular replicated table
Updates date in "Sync Notification" table every time there's any activity on that table
SQL Agent job runs every few minutes and compares this date with GETDATE(). If it's been too long, it emails me.
This works for the most part, but I get false positives if there's activity in other tables but not in the monitored one, which can happen overnight.
Any other suggestions short of adding this same trigger to every table in the database? If I do add the triggers, how do I prevent deadlocks and contention on the "Sync Notification" table? Since I don't care about the most recent date being exact during high-contention periods, is there a way I can have SQL try to update the date but just skip it if some other process has locked it?
The only "application-level" choice I have is to TELNET to the Goldengate monitor and ask for the replica lag, then screen scrape the results. I'm open to that, but I'd like to do something SQL-side if it's more feasible.
Is this for an automated job or something you want to look at every now and then? If the latter, then you could use a transaction log examination tool (Redgate Log Rescue, Apex SQLLog, probably others).
Another option open to you is to look at sysindexes (SQL Server 2000: dbo.sysindexes; 2005: sys.sysindexes). The column rowmodctr (to quote MSDN) "Counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table". It may not return everything you need to know but, provided you've got covering indexes, it would give an indication of how many changes there have been, and where, if sampled on a regular basis.
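For example, something like this sampled on a schedule (rowmodctr is only an approximation from 2005 onwards):

    SELECT OBJECT_NAME(id) AS table_name,
           rowmodctr              -- rows modified since statistics were last updated
    FROM   sys.sysindexes
    WHERE  indid IN (0, 1)        -- one row per table: heap (0) or clustered index (1)
    ORDER  BY rowmodctr DESC;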
You can check SELECT * FROM ::fn_dblog(@startLSN, NULL) and see whether any LOP_MODIFY_ROW operation has occurred since the last check (i.e. since the last LSN you checked).
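Something along these lines; fn_dblog is undocumented, so its columns and behaviour can change between versions, and reading the log this way is not cheap on a busy system:

    -- NULL, NULL scans the whole active portion of the log; a start LSN narrows it down
    SELECT [Current LSN], Operation, AllocUnitName, [Transaction ID]
    FROM   fn_dblog(NULL, NULL)
    WHERE  Operation = 'LOP_MODIFY_ROW';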
