Maintaining audit log for entities split across multiple tables - sql-server

We have an entity split across 5 different tables. Records in 3 of those tables are mandatory. Records in the other two tables are optional (based on sub-type of entity).
One of the tables is designated the entity master. Records in the other four tables are keyed by the unique id from master.
An AFTER UPDATE/DELETE trigger is present on each table, and a change to a record saves off history (from the deleted pseudo-table inside the trigger) into a related history table. Each history table contains the related entity fields plus a timestamp.
So, live records are always in the live tables and history/changes are in history tables. Historical records can be ordered based on the timestamp column. Obviously, timestamp columns are not related across history tables.
Now, for the more difficult part.
1. Records are initially inserted in a single transaction. Either 3 or 5 records will be written in a single transaction.
2. Individual updates can happen to any or all of the 5 tables.
3. All records are updated as part of a single transaction. Again, either 3 or 5 records will be updated in a single transaction.
4. Number 2 can be repeated multiple times.
5. Number 3 can be repeated multiple times.

The application is supposed to display a list of point-in-time history entries based on records written as single transactions only (points 1, 3 and 5 only).
I'm currently having problems with an algorithm that will retrieve historical records based on timestamp data alone.
Adding a HISTORYMASTER table to hold the extra information about transactions seems to partially address the problem. A new record is added into HISTORYMASTER before every transaction. New HISTORYMASTER.ID is saved into each entity table during a transaction.
Point in time history can be retrieved by selecting the first record for a particular HISTORYMASTER.ID (ordered by timestamp)
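To make the idea concrete, a minimal sketch of that scheme might look like the following (the table and column names here are illustrative, not my actual schema):

    -- One HISTORYMASTER row is created per logical transaction.
    CREATE TABLE HISTORYMASTER (
        ID        INT IDENTITY(1,1) PRIMARY KEY,
        CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Each entity table carries the ID of the transaction that last touched it,
    -- so the history triggers can copy that ID into the history rows.
    ALTER TABLE EntityMaster ADD HistoryMasterId INT NULL
        REFERENCES HISTORYMASTER (ID);

    -- Inside the application transaction:
    --   INSERT INTO HISTORYMASTER DEFAULT VALUES;
    --   SET @HistoryMasterId = SCOPE_IDENTITY();
    --   ... then insert/update the 3 or 5 entity rows with that HistoryMasterId.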
Is there a better way to manage audit tables based on AFTER (UPDATE, DELETE) TRIGGERs for entities spanning multiple tables?

Your HistoryMaster seems similar to how we have addressed history of multiple related items in one of our systems. By having a single point to hang all the related changes from in the history tables, it is easy to then create a view that uses the history master as the hub and attaches the related information. It also allows you to avoid creating history records where an audit is not desired.
In our case the primary tables were called EntityAudit (where Entity was the "primary" item being retained) and all data was stored in EntityHistory tables related back to the audit. In our case we were using a data layer for business rules, so it was easy to insert the audit rules into the data layer itself. I feel that the data layer is an optimal point for such tracking if and only if all modifications go through that data layer. If you have multiple applications using distinct data layers (or none at all), then I suspect that a trigger that creates the master record is pretty much the only way to go.
If you don't have additional information to track in the audit (we track the user who made the change, for example, something not on the main tables), then I would contemplate putting the extra audit ID on the "primary" record itself. Your description does not seem to indicate you are interested in the minor changes to individual tables, but only in changes that update the entire entity set (although I may be misreading that). I would only do so if you don't care about the minor edits, though. In our case, we needed to track all changes, even to the related records.
Note that the use of an Audit/Master table has an advantage in that you are making minimal changes to the history tables compared to the source tables: a single AuditID (in our case a GUID, although autonumbers would be fine in non-distributed databases).
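For illustration only, the hub-and-spoke view described above might look roughly like this (every table and column name below is hypothetical):

    -- Hypothetical point-in-time view: the audit row is the hub, the per-table
    -- history rows hang off it; optional child history comes back as NULL.
    CREATE VIEW EntityAuditView AS
    SELECT a.AuditID,
           a.AuditDate,
           mh.Name          AS MasterName,
           dh.DetailValue   AS DetailValue,
           oh.OptionalValue AS OptionalValue
    FROM   EntityAudit a
           JOIN EntityMasterHistory        mh ON mh.AuditID = a.AuditID
           JOIN EntityDetailHistory        dh ON dh.AuditID = a.AuditID
           LEFT JOIN EntityOptionalHistory oh ON oh.AuditID = a.AuditID;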

Can you add a TimeStamp / RowVersion datatype column to the entity master table, and associate all the audit records with that?
But an Update to any of the "child" tables will need to update the Master entity table to force the TimeStamp / RowVersion to change :(
Or stick a GUID in there that you freshen whenever one of the associated records changes.
Thinking that through out loud, it may be better to have a table joined 1:1 to the Master Entity that only contains the Master Entity ID and the "version number" of the record - either TimeStamp / RowVersion, GUID, incremented number, or something else.
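A hedged sketch of that 1:1 version table, assuming a hypothetical EntityMaster(Id) table:

    -- One row per entity; any change to the entity or its children "freshens" it.
    CREATE TABLE EntityMasterVersion (
        EntityMasterId  INT NOT NULL PRIMARY KEY
            REFERENCES EntityMaster (Id),
        RowVersionStamp ROWVERSION,   -- changes automatically on every update of this row
        ChangeToken     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
    );

    -- A trigger on any child table then only needs to touch this one row:
    --   UPDATE EntityMasterVersion SET ChangeToken = NEWID()
    --   WHERE  EntityMasterId = @EntityMasterId;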

I think it's a symptom of trying to capture "abstract" audit events at the lowest level of your application stack - the database.
If it's possible consider trapping the audit events in your business layer. This would allow you to capture the history per logical transaction rather than on a row-by-row basis. The date/time is unreliable for resolving things like this as it can be different for different rows, and the same for concurrent (or closely spaced) transactions.
I understand that you've asked how to do this in DB triggers though. I don't know about SQL Server, but in Oracle you can overcome this by using the DBMS_TRANSACTION.LOCAL_TRANSACTION_ID system package to return the ID for the current transaction. If you can retrieve an equivalent SQLServer value, then you can use this to tie the record updates for the current transaction together into a logical package.
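As an aside (not something the answer above claims): SQL Server does expose an equivalent. On SQL Server 2016 and later there is CURRENT_TRANSACTION_ID(); earlier versions can read transaction_id from sys.dm_tran_current_transaction (which requires VIEW SERVER STATE). A rough sketch of a trigger using it, with made-up table and column names:

    -- Stamp each history row with the ID of the transaction that wrote it,
    -- so rows written together can be grouped later.
    CREATE TRIGGER trg_EntityMaster_History
    ON EntityMaster
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @txnId BIGINT = CURRENT_TRANSACTION_ID();  -- SQL Server 2016+

        INSERT INTO EntityMasterHistory (EntityId, Name, ChangedAt, TransactionId)
        SELECT d.Id, d.Name, SYSUTCDATETIME(), @txnId
        FROM   deleted AS d;
    END;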

Related

CDC ODI - Why does ODI need two views, JV$ and JV$D?

During the CDC process, ODI creates two views, JV$ and JV$D. Both have the same structure, so why does ODI need two views if they do the same work?
The next paragraphs describe the differences (extract from the link).
The JV$ view is the view that is used in the mappings where you select the option Journalized data only. Records from the J$ table are filtered so that only the following records are returned:
Only locked records: JRN_CONSUMED = '1';
If the same PK appears multiple times, only the last entry for that PK (based on the JRN_DATE) is taken into account. Again the logic here is that we want to replicate values as they are currently in the source database. We are not interested in the history of intermediate values that could have existed.
An additional filter is added in the mappings at design time so that only the records for the selected subscriber are consumed from the J$ table, as we saw in figure 5.
Similarly to the JV$ view, the JV$D view joins the J$ table with the source table on the primary key. This view shows all changed records, locked or not, but applies the same filter on the JRN_DATE column so that only the last entry is taken into account when the same record has been modified multiple times since the last consumption cycle. It lists the changes for all subscribers.

Keeping update changes for each column

I've been asked by my client to manage update history for each column/field in one of our SQL Server tables. I've managed "batch" versions of entire rows before, but haven't done this type of backup/history before. They want to be able to keep track of changes for each column/field in a row of a table. I could use some help in the most efficient way to track this. Thanks!
The answer is triggers and history tables, but the best practice depends on what data your database holds and how often and by how much it is modified.
Essentially, every time a record in a table is updated, an update trigger (attached to the table) gets notified of what the old record looked like and what the new record will look like. You can then write the change history as new records in another table (e.g. tblSomething_History). Note: if updates to your tables are done via stored procs, you could write the history from there, but the problem with this is that if another stored procedure (or anything else) updates your table as well, the history won't be written.
Depending on the number of fields / tables you want history for, you may do as suggested by @M.Al, you may embed your history directly into the base table through versioning, you may create a history table for each individual table, or you may create a generic history table such as:
| TblName | FieldName | NewValue | OldValue | User | Date Time |
Getting the modified time is easy, but determining which user changed what depends on your security setup. Keeping the history in a separate table means less impact on retrieving the current data; as it is already separated, you do not need to filter it out. But if you need to show history most of the time, this probably won't have the same effect.
Unfortunately you cannot add a single trigger to all tables in a database, you need to create a separate trigger for each, but they can then call a single stored procedure to do the guts of the actual work.
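As a hedged illustration of the generic-table approach for a single column of a single table (all names are made up, and the insert is done inline rather than via the shared stored procedure mentioned above):

    -- Generic column-level change history, one row per changed field.
    CREATE TABLE ChangeHistory (
        TblName   SYSNAME       NOT NULL,
        FieldName SYSNAME       NOT NULL,
        OldValue  NVARCHAR(MAX) NULL,
        NewValue  NVARCHAR(MAX) NULL,
        ChangedBy SYSNAME       NOT NULL DEFAULT SUSER_SNAME(),
        ChangedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Illustrative trigger: logs changes to one column of a hypothetical Customer table.
    -- A real implementation repeats this per audited column (or builds it dynamically).
    CREATE TRIGGER trg_Customer_History
    ON Customer
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO ChangeHistory (TblName, FieldName, OldValue, NewValue)
        SELECT 'Customer', 'Email', d.Email, i.Email
        FROM   inserted AS i
        JOIN   deleted  AS d ON d.CustomerId = i.CustomerId
        WHERE  ISNULL(d.Email, '') <> ISNULL(i.Email, '');   -- only log actual changes
    END;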
Another word of warning as well: automatically loading all history associated with tables can dramatically increase the load. Depending on the type of data stored in your tables, the history may become significantly larger than the base table. I have encountered and been affected by numerous applications that became unusable because the history tables were needlessly loaded whenever the base table was, and given that the change history for a table can run into the hundreds of entries per item, that's how much the load time increased.
Final Note: This is a strategy which is easy to absorb if built into your application from the ground up, but be careful bolting it on to an existing solution, as it can have a dramatic impact on performance if not tailored to your requirements. And can cost more than the client would expect it to.
I worked on a similar database not long ago, where no row is ever deleted; every time a record was updated, it actually added a new row and assigned it a version number. Something like....
Each table will have two columns
Original_ID | Version_ID
Each time a new record is added it gets assigned a sequential Version_ID, which is a unique column, and the Original_ID column remains NULL. Every subsequent change to this row actually inserts a new row into the table with a higher Version_ID, and the Version_ID that was assigned when the record was first created is stored in Original_ID.
If you have situations where records need deleting, use a BIT column Deleted and set its value to 1 when a record is supposed to be deleted. This is called soft deletion.
Also add a column, something like LastUpdate datetime, to keep track of the time that a change was made.
This way you end up with all versions of a row, from when the row is inserted until it is (soft) deleted.
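A minimal sketch of that scheme, assuming a hypothetical SomeTable (everything except Original_ID, Version_ID, Deleted and LastUpdate is made up):

    CREATE TABLE SomeTable (
        Version_ID  INT IDENTITY(1,1) PRIMARY KEY,  -- sequential, unique per version
        Original_ID INT NULL,                       -- NULL for the first version, then the first Version_ID
        SomeValue   NVARCHAR(100) NOT NULL,
        Deleted     BIT NOT NULL DEFAULT 0,         -- soft-delete flag
        LastUpdate  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Current version of every logical record that has not been soft-deleted.
    SELECT s.*
    FROM   SomeTable AS s
    WHERE  s.Deleted = 0
    AND    s.Version_ID = (SELECT MAX(s2.Version_ID)
                           FROM   SomeTable AS s2
                           WHERE  ISNULL(s2.Original_ID, s2.Version_ID)
                                = ISNULL(s.Original_ID,  s.Version_ID));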

How do I update the database in the most efficient manner?

I am building a price comparison site that holds about 300,000 products and several hundred clients.
On a daily basis the site needs updating of prices and vendor stock availability.
When a vendor needs updating I was thinking about deleting all the vendor's information and then pulling and inserting a new set - each time.
Doing this I don't have to worry about a vendor deleting a product. In a simple way I would get a new set of data each day.
On the other hand I need to keep the autoincrement counter in check and it seems a waste to delete everything from a vendor if he has only updated prices on 3 products in his entire warehouse.
But updating has its disadvantages too. I still have to read out all the data and compare each and every product's price/availability against the data from the vendor, one product at a time. Also, it's more complicated to notice products that get deleted by the vendor.
How do I achieve updating in the most efficient and easy way?
I am not sure which database technology you are using. In any case, a database supports handling of relationships between multiple tables; it is up to you to implement the concepts. For example, you can use the CASCADE option when creating a table A that references table B (by primary key, for example) to ensure that an update/delete of tuples in table B cascades to the corresponding tuples in table A.
See an example of CASCADE and when to use it.
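For illustration (hypothetical vendor/product tables), a cascading foreign key in SQL looks like this:

    -- Deleting a vendor removes its products automatically; a change to the
    -- vendor's key value is propagated to the referencing rows.
    CREATE TABLE Vendor (
        VendorId INT PRIMARY KEY
    );

    CREATE TABLE Product (
        ProductId INT PRIMARY KEY,
        VendorId  INT NOT NULL,
        Price     DECIMAL(10, 2) NOT NULL,
        CONSTRAINT FK_Product_Vendor
            FOREIGN KEY (VendorId) REFERENCES Vendor (VendorId)
            ON DELETE CASCADE
            ON UPDATE CASCADE
    );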

Where should I break up my user records to keep track of revisions

I am putting together a staff database and I need to be able to revise the staff member information, but also keep track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user data but be able to query against the most recent revision? I am looking at information that changes rarely, like Last Name, but that I will need to be able to query for out of date values. So if Jenny Smith changes her name to Jenny James I need to be able to find the user's current information when I search against her old name.
I assume that I will need at least 2 tables, one that contains the uid and another that contains the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? I am looking at about 40 fields per record and only one or two fields will probably change per update. Also I cannot remove any data from the database, I need to be able to look back on all previous records.
A simple way of doing this is to add a deleted flag and instead of updating records you set the deleted flag on the existing record and insert a new record.
You can of course also write the existing record to an archive table, if you prefer. But if changes are infrequent and the table is not big I would not bother.
To get the active record, query with 'where deleted = 0'; the speed impact will be minimal when there is an index on this field.
Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.
I would use the separate table because then you can have a unique identifier that all the other child records point to and that is also the PK of the table, which I think makes it less likely you will have data integrity issues. For instance, you have Mary Jones, who has records in the address table and the email table and the performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.
With a deleted field in one table, you then have to have a non-autogenerated person ID and an autogenerated record ID.
You also have the possibility of people forgetting to use the WHERE deleted = 0 clause that is needed for almost every query. (If you do use the deleted flag field, do yourself a favor and set up a view with the WHERE deleted = 0 and require developers to use the view in queries, not the original table.)
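A minimal sketch of the view suggested above, assuming a hypothetical Employee table with a deleted flag:

    -- Developers query the view; only current (not soft-deleted) rows come back.
    CREATE VIEW ActiveEmployee AS
    SELECT EmployeeId, FirstName, LastName, HireDate
    FROM   Employee
    WHERE  Deleted = 0;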
With the deleted flag field you will also need a trigger to ensure one and only one record is marked as active.
@Peter Tillemans' suggestion is a common way to accomplish what you're asking for. But I don't like it.
The structure of a database should reflect the real-world facts that are being modeled.
I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.
Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!

trigger insertions into same table

I have many tables in my database which are interrelated. I have a table (table one) which has had data inserted, and its ID auto-increments. Once that row has an ID, I want to insert it into another table (table three) along with another set of IDs which come from a form (this data will also be going into a table, so it could come from that table) - the same form that the data which went into the first table came from.
The two ID's together make the primary key of the third table.
How can I do this? It's to show that more than one ID is joined to a single ID for something else.
Thanks.
You can't do that through a trigger, as the trigger only has available to it the data that you have already inserted, not data that is currently only residing in your user interface.
Normally how you handle this situation is to write a stored proc that inserts the first record, returns the ID value (using SCOPE_IDENTITY() in SQL Server, but I'm sure other databases have a method to return the auto-generated ID as well), and then uses that value to insert into the other table along with the other values you need for that table. You would of course want to wrap the whole thing in a transaction.
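A hedged sketch of that pattern (table, column and procedure names are all illustrative):

    -- Insert the parent row, capture its identity, then insert the related row,
    -- all inside one transaction.
    CREATE PROCEDURE dbo.InsertTableOneAndThree
        @SomeValue NVARCHAR(100),
        @OtherId   INT              -- the second ID coming from the form
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        INSERT INTO dbo.TableOne (SomeValue)
        VALUES (@SomeValue);

        DECLARE @NewId INT = SCOPE_IDENTITY();   -- ID generated for table one

        INSERT INTO dbo.TableThree (TableOneId, OtherId)
        VALUES (@NewId, @OtherId);

        COMMIT TRANSACTION;
    END;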
I think you can probably do what you're describing (just write the INSERTs to table 3 in the table 1 trigger), but you'll have to put the additional info for the table 3 rows into your table 1 row, which isn't very smart.
I can't see why you would do that instead of writing the INSERTs in your code, where someone reading it can see what's happening.
The trouble with triggers is that they make it easy to hide business logic in the database. I think (and I believe I'm in the majority here) that it's easier to understand, manage, maintain and generally all-round deal with an application where all the business rules exist in the same general area.
There are reasons to use triggers (for propagating denormalised values, for example) just as there are reasons for using stored procedures. I'm going to assert that they are largely related to performance-critical areas. Or should be.

Resources