Recording all SQL Server Inserts and Updates - sql-server

How can I record all the Inserts and Updates being performed on a database (MS SQL Server 2005 and above)?
Basically I want a table in which I can record all the inserts and updates issued on my database.
Triggers will be tough to manage because there are 100s of tables and growing.
Thanks
Bullish

We have hundreds of tables and growing and use triggers. In newer versions of SQL Server you can use Change Data Capture or Change Tracking, but we have not found them adequate for auditing.
What we have are two separate audit tables for each table: one for recording the details of the audit event (1 row even if you updated a million records) and one for recording the actual old and new values. Every table gets the same audit structure, which is created by running a dynamic SQL proc that looks for unaudited tables and generates the audit triggers. This proc is run every time we deploy.
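To make the shape concrete, here is a minimal sketch of the kind of trigger such a proc might generate. The AuditHeader/AuditDetail tables, dbo.MyTable and its Id/SomeColumn columns are made-up names for illustration, not the actual structure described above:

    -- Hypothetical shared audit tables
    CREATE TABLE dbo.AuditHeader (
        AuditId     INT IDENTITY(1,1) PRIMARY KEY,
        TableName   SYSNAME  NOT NULL,
        AuditAction CHAR(1)  NOT NULL,              -- 'I' or 'U'
        AuditDate   DATETIME NOT NULL DEFAULT GETDATE(),
        AuditUser   SYSNAME  NOT NULL DEFAULT SUSER_SNAME()
    );

    CREATE TABLE dbo.AuditDetail (
        AuditDetailId   INT IDENTITY(1,1) PRIMARY KEY,
        AuditId         INT NOT NULL REFERENCES dbo.AuditHeader (AuditId),
        PrimaryKeyValue NVARCHAR(100) NOT NULL,
        ColumnName      SYSNAME NOT NULL,
        OldValue        NVARCHAR(MAX) NULL,
        NewValue        NVARCHAR(MAX) NULL
    );

    -- The kind of trigger the dynamic SQL proc would generate per table
    CREATE TRIGGER trg_MyTable_Audit ON dbo.MyTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @AuditId INT;
        INSERT INTO dbo.AuditHeader (TableName, AuditAction)
        VALUES ('dbo.MyTable',
                CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END);
        SET @AuditId = SCOPE_IDENTITY();

        -- One row per changed column; the generator repeats this block per audited column
        INSERT INTO dbo.AuditDetail (AuditId, PrimaryKeyValue, ColumnName, OldValue, NewValue)
        SELECT @AuditId, CAST(i.Id AS NVARCHAR(100)), 'SomeColumn', d.SomeColumn, i.SomeColumn
        FROM inserted i
        LEFT JOIN deleted d ON d.Id = i.Id
        WHERE ISNULL(d.SomeColumn, '') <> ISNULL(i.SomeColumn, '');
    END;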
Then you should also take the time to write a proc to pull the data back out of the audit tables if you want to restore the old values. That can be tricky to write on the fly with this structure, so it is best to have it handy before the CEO is breathing down your neck while you restore the 50,000 user records that were accidentally deleted.

As of SQL Server 2008 and above you have Change Data Capture.
Triggers, although unwieldy and a maintenance nightmare, will do the job on versions prior to 2008.
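For reference, enabling Change Data Capture looks roughly like this (it requires SQL Server 2008+ Enterprise or Developer edition; MyDatabase and dbo.MyTable are placeholders):

    -- Enable CDC at the database level
    USE MyDatabase;
    EXEC sys.sp_cdc_enable_db;

    -- Enable CDC for an individual table
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'MyTable',
        @role_name     = NULL;      -- NULL = no gating role

    -- Changes can then be read from the generated function, e.g.
    -- cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, 'all')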

Related

SQL Server: Archiving old data

I have a database that is getting pretty big, but the client is only interested in the last 2 years' data. They would still like to keep the older data "just in case".
Now we would like to archive the data to a different server over a WAN.
My plan is to create a stored proc to:
Copy all data from lookup tables, tables containing master data and foreign key tables over to the archive server.
Copy data from transactional tables over to the archive DB.
Delete transactional data from master db that's older than 2 years.
Although the approach will theoretically meet our needs, there are 2 main problems:
Performance: I'm copying the data over via SQL linked servers. Some of the big tables are really slow because the process has to compare which records already exist and update them, while records that don't exist need to be created. It looks like it will run for 3-4 hours.
We need to copy the tables in the correct sequence to prevent foreign key violations, and tables that have a relationship to themselves (e.g. a Customers table with a ParentCustomer field) need to be transferred without the ParentCustomer value, which then has to be updated afterwards to prevent FK violations. That makes it difficult to auto-generate my insert and update statements (I would like to auto-generate my statements as far as possible).
I just feel there might be a better way of archiving data that I do not yet know about. SSIS might be an option, but I'm not sure it will solve my existing challenges. I don't know much about SSIS, so I might need to find some material to study if that's the way to go.
I believe you need a batch process that will run as a scheduled task; perhaps every night. There are two options, which you have already discussed:
1) SQL Agent Job, which executes a Stored Procedure. The stored procedure will use Linked Servers.
2) SQL Agent Job, which will execute an SSIS package.
I believe you could benefit from a combination of both approaches, which would avoid linked servers. Here are the steps:
1) An SQL Agent Job executes an SSIS package, which transfers the data to be archived from the live database to the copy database. This should be done in a specific sequence to avoid foreign key violations.
2) Once the SSIS package has completed the transfer, it executes a stored procedure on the live database that deletes the information that is over two years old. The stored procedure will not require any linked servers.
You will have to use transactions to make sure duplicate data is not archived. For example, if the SSIS package fails then the transaction should be rolled back and the Stored Procedure should not be executed.
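As a rough sketch of the step-2 clean-up procedure (dbo.Orders and OrderDate are placeholder names), deleting in batches keeps the transaction log and locking manageable:

    CREATE PROCEDURE dbo.PurgeArchivedData
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @Cutoff DATETIME = DATEADD(YEAR, -2, GETDATE());

        -- Delete in small batches so the log and locks stay manageable
        WHILE 1 = 1
        BEGIN
            DELETE TOP (10000) FROM dbo.Orders
            WHERE OrderDate < @Cutoff;

            IF @@ROWCOUNT = 0 BREAK;
        END
    END;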
You can use table partitions to create separate partitions for relevant date ranges.
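If you go the partitioning route (Enterprise Edition only on these versions), a yearly partition scheme might look roughly like this; the boundary dates, names, and filegroup mapping are illustrative:

    -- Partition function with yearly boundaries
    CREATE PARTITION FUNCTION pfOrderDate (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

    -- Map every partition to the PRIMARY filegroup for simplicity
    CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

    -- A table created on the scheme; old partitions can later be switched out
    CREATE TABLE dbo.OrdersPartitioned (
        OrderId   INT           NOT NULL,
        OrderDate DATETIME      NOT NULL,
        Amount    DECIMAL(18,2) NOT NULL,
        CONSTRAINT PK_OrdersPartitioned PRIMARY KEY (OrderId, OrderDate)
    ) ON psOrderDate (OrderDate);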

Replicating a SQL Server database for read access

I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too tightly coupled to that database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new empty database to which I only added the tables and columns I need (with different names than in the production database).
Now I need some way to update the new database from the production database regularly (ideally it would be almost immediate).
I hesitated to ask this question on http://dba.stackexchange.com, but I'm not necessarily limited to using SQL Server for the job (I can develop and run a custom application if needed).
I have already done some searching and came up with these partial solutions:
Using transactional replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table/column names will be problematic. So I could use it to create a smaller database that SQL Server replicates automatically, but I would still need to replicate that database into my new one (though it might at least keep the extra load off my production database?)
Using triggers to insert/update/delete the rows
Creating a custom job (either a SQL job or a Windows service that runs every X minutes) that updates the necessary tables (I have a LastEditDate column that is updated by a trigger on my tables, so I can tell which rows have been updated since my last replication)
Do you have any advice, or maybe other solutions that I didn't foresee?
Thanks
I think transactional replication is better than using triggers.
Too many resources would be used on the source server/database, because a trigger fires for every DML transaction.
Transactional replication can be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is...
There is one more thing you could try: DB mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
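A sketch of what that wrapper could look like; the rpt schema, the replicated dbo.Customer table, and the renamed columns are made up for illustration:

    -- View on the reporting side that maps replicated names to the new model
    CREATE VIEW rpt.Client
    AS
    SELECT
        c.CustomerId   AS ClientId,     -- renamed column
        c.CustomerName AS ClientName,
        c.LastEditDate
    FROM dbo.Customer AS c;             -- table kept with its original name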
I'm gonna throw this out there and say that I'd use transaction log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups, but that way you can just automatically restore the transaction logs to the secondary database, be hands-off with it, and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
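Roughly, the restore on the secondary looks like this if you roll it yourself (paths and database names are placeholders); restoring WITH STANDBY is what keeps the copy readable between restores:

    -- Restore the latest log backup on the secondary, keeping it readable
    RESTORE LOG ReportingDB
    FROM DISK = N'\\backupshare\ReportingDB\ReportingDB_latest.trn'
    WITH STANDBY = N'D:\SQLData\ReportingDB_undo.dat';  -- read-only between restores

    -- Or, for the daily-backup variant, a nightly full restore
    RESTORE DATABASE ReportingDB
    FROM DISK = N'\\backupshare\ReportingDB\ReportingDB_full.bak'
    WITH REPLACE, STANDBY = N'D:\SQLData\ReportingDB_undo.dat';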
In the end, we went with the trigger solution. We don't have that many changes a day (maybe 500, 1,000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.

insert data from different db server every second

The primary DB receives all the raw data every 10 minutes, but it only stores 1 week of it. I would like to keep all the raw data for 1 year in another DB on a different server. How can I do that?
I have written a T-SQL query to select the required data from the primary DB. How can I keep picking up new data from the primary DB and inserting it into the secondary DB accordingly? The table has a Datetime column; can I use it to insert only the rows newer than the latest datetime already copied?
Notes: the source DB is SQL Server 2012; the secondary DB is SQL Server 2005.
If you are on SQL 2008 or higher, the MERGE command (see the MS docs) may be very useful in your actual update process. Be sure you understand it.
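A minimal sketch of MERGE for this kind of upsert; dbo.RawDataYear, staging.RawData, and the key/value columns are placeholder names:

    -- Upsert the staged rows into the year-long history table
    MERGE dbo.RawDataYear AS target
    USING staging.RawData AS source
        ON target.SensorId = source.SensorId
       AND target.ReadingTime = source.ReadingTime
    WHEN MATCHED THEN
        UPDATE SET target.Value = source.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (SensorId, ReadingTime, Value)
        VALUES (source.SensorId, source.ReadingTime, source.Value);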
Your table containing the full year of data sounds like it could be OLAP, so I refer to it that way occasionally (if you don't know what OLAP is, look it up sometime, but it does not matter to this answer).
If you are only updating 1 or 2 tables, log shipping, replication and failover may not work well for you, especially since the table is not a straight replica anyway because of the different retention policies, if nothing else. So make sure you understand how replication, etc. work before you go down that path. If these tables make up more than perhaps 50% of the total database, log-shipping-style methods might still be your best bet. They work well and handle downtime issues for you: you just replicate the source database to the OLAP server and then update from the duplicate database into your OLAP database.
Doing an update like this every second is an unusual requirement. However, if you create a linked server, you may be able to insert your selected rows into a staging table on the remote server and then update from it into your OLAP table(s). If you can reliably update your OLAP table(s) on the remote server in 1 second, you have a potentially useful method. If not, you may fall behind on posting data to your OLAP tables. If you can update once a minute, you are much less likely to fall behind on the update cycle (at the cost of being slightly less current at all times).
You may want to consider putting AFTER triggers on the source table(s) that copy the changes into staging table(s) (still on the source database), with an identity column on the staging table along with a flag to indicate Insert, Update or Delete. Then you are well positioned to ship updates for one or a few tables instead of the whole database. You don't need to requery your source database repeatedly to determine what data needs to be transmitted; just select the top 1000 rows from your staging table(s) (ordered by the staging id) and move them to the remote staging table.
If you fall behind, a TOP 1000 loop keeps you from trying to post too much data in any one cross-server call.
Depending on your data, you may be able to optimize storage and reduce log churn by not copying all columns to your staging table, just the staging id and the primary key of the source table, and accept that whatever data is in the source record at the time you post it to the OLAP database accurately reflects the data at the time the record was staged. It won't be 100% accurate on your OLAP table at all times, but it will be accurate eventually.
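Putting those pieces together, a rough sketch of the staging table, trigger, and drain step; all names, including the linked server, are made up:

    -- Minimal staging design on the source database (keys only, per the note above)
    CREATE TABLE dbo.RawData_Staging (
        StagingId   INT IDENTITY(1,1) PRIMARY KEY,
        Operation   CHAR(1)  NOT NULL,        -- 'I', 'U' or 'D'
        SensorId    INT      NOT NULL,        -- primary key of the source row
        ReadingTime DATETIME NOT NULL
    );

    -- AFTER trigger on the source table queues the changed keys
    CREATE TRIGGER trg_RawData_Stage ON dbo.RawData
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.RawData_Staging (Operation, SensorId, ReadingTime)
        SELECT 'I', i.SensorId, i.ReadingTime FROM inserted i
        WHERE NOT EXISTS (SELECT 1 FROM deleted d
                          WHERE d.SensorId = i.SensorId AND d.ReadingTime = i.ReadingTime)
        UNION ALL
        SELECT 'U', i.SensorId, i.ReadingTime FROM inserted i
        WHERE EXISTS (SELECT 1 FROM deleted d
                      WHERE d.SensorId = i.SensorId AND d.ReadingTime = i.ReadingTime)
        UNION ALL
        SELECT 'D', d.SensorId, d.ReadingTime FROM deleted d
        WHERE NOT EXISTS (SELECT 1 FROM inserted i
                          WHERE i.SensorId = d.SensorId AND i.ReadingTime = d.ReadingTime);
    END;

    -- Drain job: push at most 1000 queued rows per cross-server call,
    -- then delete (or flag) the shipped rows locally in the same transaction
    INSERT INTO RemoteServer.OlapDb.dbo.RawData_Staging (StagingId, Operation, SensorId, ReadingTime)
    SELECT TOP (1000) StagingId, Operation, SensorId, ReadingTime
    FROM dbo.RawData_Staging
    ORDER BY StagingId;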
I cannot overemphasize that you need to accommodate downtime in your design, unless you can live with data loss or just wrong data. Even reliable connections are not 100% reliable.

Populating SQL Server databases and creating indexes - which is the most efficient way?

We've got a project site where we have to replicate a legacy database system into SQL Server 2008 on a nightly basis.
We are using the SQL DataWizard tool from Maestro to do the job, and because we cannot get an accurate delta every night, it was decided that we would dump the previous SQL Server database and take a fresh snapshot every night. Several million rows in about 10 different tables. The snapshot takes about 2 hours to run.
Now, we also need to create some custom indexes on the snapshot copy of the data, so that certain BI tools can query the data quickly.
My question is: is it more efficient to create the tables AND the indexes before the snapshot copy is run, or do we just create the table structures first, run the snapshot copy then create the indexes after the tables are populated?
Is there a performance difference between SQL Server building the indexes WHILE adding rows vs adding all rows first and then creating the indexes on the final data set?
Just trying to work out which way will result in less database server CPU overhead.
When you perform snapshot replication, the first task is to bulk copy the data. After the data has been copied, primary and secondary indexes are added; the indexes don't exist until that second step is complete. So no, there is nothing to be gained by applying the indexes yourself after the snapshot.
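If the copy is done by your own tool rather than by snapshot replication, the same "load first, index afterwards" pattern is what you would code by hand. A rough sketch with made-up table and file names:

    -- 1. Recreate the table with no indexes (a plain heap)
    IF OBJECT_ID('dbo.LegacyOrders') IS NOT NULL DROP TABLE dbo.LegacyOrders;
    CREATE TABLE dbo.LegacyOrders (
        OrderId   INT           NOT NULL,
        OrderDate DATETIME      NOT NULL,
        Amount    DECIMAL(18,2) NOT NULL
    );

    -- 2. Bulk load the nightly extract
    BULK INSERT dbo.LegacyOrders
    FROM 'D:\Extracts\LegacyOrders.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

    -- 3. Build the BI-facing indexes once the data is in place
    CREATE CLUSTERED INDEX IX_LegacyOrders_OrderId ON dbo.LegacyOrders (OrderId);
    CREATE NONCLUSTERED INDEX IX_LegacyOrders_OrderDate
        ON dbo.LegacyOrders (OrderDate) INCLUDE (Amount);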

SQL Server replication for 70 databases with transformation in a small time window

We have 70+ SQL Server 2008 databases that need to be copied from an OLTP environment to a separate reporting server. Once the DB's are copied, we will do some partial data transformation: de-normalization, row level security, etc.
SSRS Reports will be written based on these static denormalized tables and views.
We have a small nightly window for copying and transforming all 70 databases (3 hours).
Currently databases average about 10GB.
Options:
1. Transactional replication:
We would need to create 100+ static denormalized tables on each reporting database.
Doing this for all 70 databases almost reaches our nightly time limit.
As the databases grow we will exceed the time limit. We thought of mixing denormalized tables with views to speed up the transformation, but then there would be some dynamic and some static data, which is not a solution we can use.
Also with 70 databases using transactional replication we are concerned about bandwidth usage.
2. Snapshot replication:
Copy the entire database each night.
This means we could have a mixture of denormalized tables and views so the data transformation process is quicker.
But the snapshot is a full data copy, so as the DB grows, we will exceed our time limit for completing copy and transformation.
3. Log shipping:
In our nightly window, we could use the log shipping to update the reporting databases, then truncate and repopulate the denormalized tables and use some views.
However, I understand that with log shipping, extra tables and views cannot be added to the subscribing database.
4. Mirroring:
Mirroring is being deprecated, and in any case the mirrored DB is not available for reporting until failover.
5. SQL Server 2012 AlwaysOn.
We don't have SQL Server 2012 yet. Can this be configured to update once a day instead of in real time?
And can extra tables and views be created on the subscribing database (our reporting databases)?
6. Merge replication:
This is meant to be for combining multiple data sources into one database.
But it looks like it allows for a scheduled update (once per day) and only updates the subscriber DB with the latest changes rather than doing an entire snapshot.
It requires adding a rowversion column to every table, but we could handle this. Also, with this solution, could additional tables be created on the subscriber database without the update getting out of sync?
The final option is to use SSIS to select only the data we need from the OLTP databases. I think this option creates more risk, as we would have to handle inserts/updates/deletes to our denormalized tables, rather than just dropping and recreating the denormalized tables daily.
Any help on our options would be greatly appreciated.
If I've made any incorrect assumptions, please say.
If it were me, I'd go with transactional replication that runs continuously and have views (possibly indexed) at the subscriber. This has the advantage of not having to wait for the data to come over since it's always coming over.
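A sketch of an indexed (materialized) view at the subscriber; the replicated tables and the denormalization are made up for illustration, and note that indexed views require SCHEMABINDING and two-part table names:

    -- Denormalized, schema-bound view over replicated tables
    CREATE VIEW rpt.OrderFlat
    WITH SCHEMABINDING
    AS
    SELECT
        o.OrderId,
        o.OrderDate,
        c.CustomerName,
        o.Amount
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerId = o.CustomerId;
    GO

    -- A unique clustered index materializes the view
    CREATE UNIQUE CLUSTERED INDEX IX_OrderFlat ON rpt.OrderFlat (OrderId);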
