I'm trying to create a LINQ to SQL class that represents the "latest" version of itself.
Right now, the table that this entity represents has a single auto-incrementing ID, and I was thinking that I would add a version number to the primary key. I've never done anything like this, so I'm not sure how to proceed. I would like to be able to abstract the idea of the object's version away from whoever is using it. In other words, you have an instance of this entity that represents the most current version, and whenever any changes are submitted, a new copy of the object is stored with an incremented version number.
How should I proceed with this?
If you can avoid keeping a history, do. It's a pain.
If a complete history is unavoidable (regulated financial and medical data or the like), consider adding history tables. Use a trigger to 'version' into the history tables. That way, you're not dependent on your application to ensure a version is recorded - all inserts/updates/deletes are captured regardless of the source.
If your app needs to interact with historical data, make sure it's readonly. There's no sense capturing transaction histories if someone can simply change them.
If your concern is concurrent updates, consider using a record change timestamp. When both User A and User B view a record at noon, they fetch the record's timestamp. When User A updates the record, her timestamp matches the record's so the update goes through and the timestamp is updated as well. When User B updates the record five minutes later, his timestamp doesn't match the record's so he's warned that the record has changed since he last viewed it. Maybe it's automatically reloaded...
Whatever you decide, I would avoid inter-mingling current and historic data.
Trigger resources per comments:
MSDN
A SQL Team Introduction
Stackoverflow's Jon Galloway describes a general data-change logging trigger
The keys to an auditing trigger are the virtual tables 'inserted' and 'deleted'. These tables contain the rows effected by an INSERT, UPDATE, or DELETE. You can use them to audit changes. Something like:
CREATE TRIGGER tr_TheTrigger
ON [YourTable]
FOR INSERT, UPDATE, DELETE
AS
IF EXISTS(SELECT * FROM inserted)
BEGIN
--this is an insert or update
--your actual action will vary but something like this
INSERT INTO [YourTable_Audit]
SELECT * FROM inserted
END
IF EXISTS(SELECT * FROM deleted)
BEGIN
--this is a delete, mark [YourTable_Audit] as required
END
GO
The best way to proceed is to stop and seriously rethink your approach.
If you are going to keep different versions of the "object" around, then you are better off serializing it into an xml format and storing that in an XML column with a field for the version number.
There are serious considerations when trying to maintain versioned data in sql server revolving around application maintenance.
UPDATE per comment:
Those considerations include: the inability to remove a field or change the data type of a field in future "versions". New fields are required to be nullable or, at the very least, have a default value stored in the DB for them. As such you will not be able to use them in a unique index or as part of the primary keys.
In short, the only thing your application can do is expand. Provided the expansion can be ignored by previous layers of code.
This is the classic problem of Backwards Compatibility which desktop software makers have struggled with for years. And is the reason you might want to stay away from it.
Related
We want to know what rows in a certain table is used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive? (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is the classical application server's task. If you want to realize your own architecture (there tracking architecture) do it on your own layer.
And in any case you will need application server there. You are not going to update tracking field it in the same transaction with select, isn't it? what about rollbacks? so you have some manager who first run select than write track information. And what is the point to save tracking information together with entity info sending it back to DB? Save it into application server file.
You could either update the column in the table as you suggested, but if it was me I'd log the event to another table, i.e. id of the record, datetime, userid (maybe ip address etc, browser version etc), just about anything else I could capture and that was even possibly relevant. (For example, 6 months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage pattern is etc).
This type of information can be useful for things you've never even thought of down the road, and if it starts to grow large you can always roll-up and prune the table to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available down the road and will be impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action which wouldn't slow down the users experience at all.
Adding new column to track SELECT is not a practice, because it may affect database performance, and the database performance is one of major critical issue as per Database Server Administration.
So here you can use one very good feature of database called Auditing, this is very easy and put less stress on Database.
Find more info: Here or From Here
Or Search for Database Auditing For Select Statement
Use another table as a key/value pair with two columns(e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query in the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable(id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1='somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY
UPDATE times=times+1
The ON DUPLICATE KEY is right from the top of my head in MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead
I recently ran across a very interesting problem involving database design.
I have changed the table names to simplify the problem, so let me describe it as such:
I have 2 tables, fruit and vegetable each stores whether or not a fruit or vegetable is tasty.
Now lets say that someone keeps changing the IsTasty setting through the UI of my application and management is demanding the ability to see when someone changed it last and who. The tricky part here is although we are ignoring the other data on the table, there is other data, and we don’t want to track when any data on the table was changed, just this one column.
What is the best way to solve this problem?
I have a description of the problem with ER diagrams here:
I like the way the acts_as_versioned plugin does it in Rails. It's closest to your solution 2 with an additional version number field. You basically have your fruits table and your fruits_versions table. Every time a row in fruits is updated, you insert the same data in the fruits_versions table, with an incremented version number.
I think it's more extensible than you solution 3 approach if you ever want to add more fields to the tables or track additional values. Solution 4 is sort of a non-relational solution, you could probably keep an audit log like that.
Another approach, as opposed to keeping the versions, is to keep track of the changes, like subversion or a version control system does. This can make it easier if you often need to know if something changed from a to b, versus just what it changed to. I guess that means the real answer is "it depends" on how the data will be used.
If you are using SQL Server 2008, have you taken a look at CDC (Change Data Capture)?
In a Trigger, there is a way to see which columns were modified..
Using SQL Server 2005 and up you can create a trigger with an if statement like this:
IF UPDATE(IsTasty)
BEGIN
Insert INTO Log (ID, NewValue) VALUES (#ID, #NewValue)
END
One way to do this is to add triggers to those tables. In the triggers check to see if the column you are interested has changed. If it has changed, insert a row into another table that tracks changes. The change tracking table might want to store data like the name of the column that was changed, the previous value, the new value and the date of the change.
Fo SQL Server, I use AutoAudit to generate the triggers. The audit table contains the history and views can be used to show the changes (AutoAudit makes views for deleted rows automatically).
I am trying to do an audit history by adding triggers to my tables and inserting rows intto my Audit table. I have a stored procedure that makes doing the inserts a bit easier because it saves code; I don't have to write out the entire insert statement, but I instead execute the stored procedure with a few parameters of the columns I want to insert.
I am not sure how to execute a stored procedure for each of the rows in the "inserted" table. I think maybe I need to use a cursor, but I'm not sure. I've never used a cursor before.
Since this is an audit, I am going to need to compare the value for each column old to new to see if it changed. If it did change I will execute the stored procedure that adds a row to my Audit table.
Any thoughts?
I would trade space for time and not do the comparison. Simply push the new values to the audit table on insert/update. Disk is cheap.
Also, I'm not sure what the stored procedure buys you. Can't you do something simple in the trigger like:
insert into dbo.mytable_audit
(select *, getdate(), getdate(), 'create' from inserted)
Where the trigger runs on insert and you are adding created time, last updated time, and modification type fields. For an update, it's a little tricker since you'll need to supply named parameters as the created time shouldn't be updated
insert into dbo.mytable_audit (col1, col2, ...., last_updated, modification)
(select *, getdate(), 'update' from inserted)
Also, are you planning to audit only successes or failures as well? If you want to audit failures, you'll need something other than triggers I think since the trigger won't run if the transaction is rolled back -- and you won't have the status of the transaction if the trigger runs first.
I've actually moved my auditing to my data access layer and do it in code now. It makes it easier to both success and failure auditing and (using reflection) is pretty easy to copy the fields to the audit object. The other thing that it allows me to do is give the user context since I don't give the actual user permissions to the database and run all queries using a service account.
If your database needs to scale past a few users this will become very expensive. I would recommend looking into 3rd party database auditing tools.
There is already a built in function UPDATE() which tells you if a column has changed (but it is over the entire set of inserted rows).
You can look at some of the techniques in Paul Nielsen's AutoAudit triggers which are code generated.
What it does is check both:
IF UPDATE(<column_name>)
INSERT Audit (...)
SELECT ...
FROM Inserted
JOIN Deleted
ON Inserted.KeyField = Deleted.KeyField -- (AutoAudit does not support multi-column primary keys, but the technique can be done manually)
AND NOT (Inserted.<column_name> = Deleted.<column_name> OR COALESCE(Inserted.<column_name>, Deleted.<column_name>) IS NULL)
But it audits each column change as a separate row. I use it for auditing changes to configuration tables. I am not currently using it for auditing heavy change tables. (But in most transactional systems I've designed, rows on heavy activity tables are typically immutable, you don't have a lot of UPDATEs, just a lot of INSERTs - so you wouldn't even need this kind of auditing). For instance, orders or ledger entries are never changed, and shopping carts are disposable - neither would have this kind of auditing. On low volume change tables, like customer, you can use this kind of auditing.
Jeff,
I agree with Zodeus..a good option is to use a 3rd tool.
I have used auditdatabase (FREE)web tool that generates audit triggers (you do not need to write a single line of TSQL code)
Another good tools is Apex SQL Audit but..it's not free.
I hope this helps you,
F. O'Neill
What is the best way to track changes in a database table?
Imagine you got an application in which users (in the context of the application not DB users ) are able to change data which are store in some database table. What's the best way to track a history of all changes, so that you can show which user at what time change which data how?
In general, if your application is structured into layers, have the data access tier call a stored procedure on your database server to write a log of the database changes.
In languages that support such a thing aspect-oriented programming can be a good technique to use for this kind of application. Auditing database table changes is the kind of operation that you'll typically want to log for all operations, so AOP can work very nicely.
Bear in mind that logging database changes will create lots of data and will slow the system down. It may be sensible to use a message-queue solution and a separate database to perform the audit log, depending on the size of the application.
It's also perfectly feasible to use stored procedures to handle this, although there may be a bit of work involved passing user credentials through to the database itself.
You've got a few issues here that don't relate well to each other.
At the basic database level you can track changes by having a separate table that gets an entry added to it via triggers on INSERT/UPDATE/DELETE statements. Thats the general way of tracking changes to a database table.
The other thing you want is to know which user made the change. Generally your triggers wouldn't know this. I'm assuming that if you want to know which user changed a piece of data then its possible that multiple users could change the same data.
There is no right way to do this, you'll probably want to have a separate table that your application code will insert a record into whenever a user updates some data in the other table, including user, timestamp and id of the changed record.
Make sure to use a transaction so you don't end up with cases where update gets done without the insert, or if you do the opposite order you don't end up with insert without the update.
One method I've seen quite often is to have audit tables. Then you can show just what's changed, what's changed and what it changed from, or whatever you heart desires :) Then you could write up a trigger to do the actual logging. Not too painful if done properly...
No matter how you do it, though, it kind of depends on how your users connect to the database. Are they using a single application user via a security context within the app, are they connecting using their own accounts on the domain, or does the app just have everyone connecting with a generic sql-account?
If you aren't able to get the user info from the database connection, it's a little more of a pain. And then you might look at doing the logging within the app, so if you have a process called "CreateOrder" or whatever, you can log to the Order_Audit table or whatever.
Doing it all within the app opens yourself up a little more to changes made from outside of the app, but if you have multiple apps all using the same data and you just wanted to see what changes were made by yours, maybe that's what you wanted... <shrug>
Good luck to you, though!
--Kevin
In researching this same question, I found a discussion here very useful. It suggests having a parallel table set for tracking changes, where each change-tracking table has the same columns as what it's tracking, plus columns for who changed it, when, and if it's been deleted. (It should be possible to generate the schema for this more-or-less automatically by using a regexed-up version of your pre-existing scripts.)
Suppose I have a Person Table with 10 columns which include PersonSid and UpdateDate. Now, I want to keep track of any updates in Person Table.
Here is the simple technique I used:
Create a person_log table
create table person_log(date datetime2, sid int);
Create a trigger on Person table that will insert a row into person_log table whenever Person table gets updated:
create trigger tr on dbo.Person
for update
as
insert into person_log(date, sid) select updatedDTTM, PersonSID from inserted
After any updates, query person_log table and you will be able to see personSid that got updated.
Same you can do for Insert, delete.
Above example is for SQL, let me know in case of any queries or use this link :
https://web.archive.org/web/20211020134839/https://www.4guysfromrolla.com/webtech/042507-1.shtml
A trace log in a separate table (with an ID column, possibly with timestamps)?
Are you going to want to undo the changes as well - perhaps pre-create the undo statement (a DELETE for every INSERT, an (un-) UPDATE for every normal UPDATE) and save that in the trace?
Let's try with this open source component:
https://tabledependency.codeplex.com/
TableDependency is a generic C# component used to receive notifications when the content of a specified database table change.
If all changes from php. You may use class to log evry INSERT/UPDATE/DELETE before query. It will be save action, table, column, newValue, oldValue, date, system(if need), ip, UserAgent, clumnReference, operatorReference, valueReference. All tables/columns/actions that need to log are configurable.
The table doesn't have a last updated field and I need to know when existing data was updated. So adding a last updated field won't help (as far as I know).
SQL Server 2000 does not keep track of this information for you.
There may be creative / fuzzy ways to guess what this date was depending on your database model. But, if you are talking about 1 table with no relation to other data, then you are out of luck.
You can't check for changes without some sort of audit mechanism. You are looking to extract information that ha not been collected. If you just need to know when a record was added or edited, adding a datetime field that gets updated via a trigger when the record is updated would be the simplest choice.
If you also need to track when a record has been deleted, then you'll want to use an audit table and populate it from triggers with a row when a record has been added, edited, or deleted.
You might try a log viewer; this basically just lets you look at the transactions in the transaction log, so you should be able to find the statement that updated the row in question. I wouldn't recommend this as a production-level auditing strategy, but I've found it to be useful in a pinch.
Here's one I've used; it's free and (only) works w/ SQL Server 2000.
http://www.red-gate.com/products/SQL_Log_Rescue/index.htm
You can add a timestamp field to that table and update that timestamp value with an update trigger.
OmniAudit is a commercial package which implments auditng across an entire database.
A free method would be to write a trigger for each table which addes entries to an audit table when fired.