How to keep track of revisions? (SQL Server)

I am trying to design my database and I want the ability to keep track of the history of changes.
I will have a table with all the nutritional facts of a food. When a user makes a change to an item (say, changes calories from 100 to 200), I want to record that as a new revision.
That way a person who comes along later can see that it was originally 100 calories and was then updated to 200 calories. I guess this would be a lot like how Stack Overflow does it, where you can see what has been edited.
I am wondering what is the best way to do this? I am using SQL Server and NHibernate.
I was thinking of having another column that would be the revision number. Then every time a revision is made, the number is incremented. Is this a good way?

NHibernate.Envers helps you with that.

You could just do as you've written and add a revision number column.
Another approach would be to add a timestamp and use that as the history: every time an item is updated you add a row with the current values and the current timestamp.
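For illustration, here's a minimal sketch of that idea in T-SQL, combining a revision number with a timestamp (the table and column names are made up, so adjust to your model):

-- Hypothetical schema: each row is one revision of one food.
CREATE TABLE dbo.FoodNutrition (
    FoodId         INT            NOT NULL,
    RevisionNumber INT            NOT NULL,
    Calories       INT            NOT NULL,
    ModifiedAt     DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME(),
    ModifiedBy     NVARCHAR(128)  NOT NULL,
    PRIMARY KEY (FoodId, RevisionNumber)
);

-- An edit is an INSERT of the next revision, never an UPDATE:
INSERT INTO dbo.FoodNutrition (FoodId, RevisionNumber, Calories, ModifiedBy)
SELECT FoodId, MAX(RevisionNumber) + 1, 200, 'jsmith'
FROM dbo.FoodNutrition
WHERE FoodId = 42
GROUP BY FoodId;

-- The current state is simply the highest revision per food:
SELECT f.*
FROM dbo.FoodNutrition f
WHERE f.RevisionNumber = (SELECT MAX(RevisionNumber)
                          FROM dbo.FoodNutrition
                          WHERE FoodId = f.FoodId);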

I'd normally use some form of audit table(s) to handle this. Where I currently work, we keep all our audit tables under an audit schema, and we have an audit table for every table we wish to track revisions for. We don't use NHibernate, so we simply use triggers to ensure that every update to a given row is recorded in the audit table along with a timestamp and user id, so you get some context and ordering for the revisions.
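A minimal sketch of that trigger-plus-audit-table setup, assuming a dbo.FoodNutrition table like the one in the question (all names are hypothetical):

CREATE SCHEMA audit;
GO
-- Audit table mirrors the tracked columns plus who/when.
CREATE TABLE audit.FoodNutrition (
    AuditId   BIGINT        IDENTITY PRIMARY KEY,
    FoodId    INT           NOT NULL,
    Calories  INT           NOT NULL,
    AuditedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    AuditUser NVARCHAR(128) NOT NULL DEFAULT SUSER_SNAME()
);
GO
CREATE TRIGGER trg_FoodNutrition_Audit
ON dbo.FoodNutrition
AFTER UPDATE
AS
BEGIN
    -- "deleted" holds the pre-update values of the affected rows.
    INSERT INTO audit.FoodNutrition (FoodId, Calories)
    SELECT FoodId, Calories
    FROM deleted;
END;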
Triggers like these seem to be difficult to use with NHibernate, though, but you could use something like NHibernate interceptors or events, as mentioned in this post. I'd say triggers are preferable to relying on your code, but if this is the only way to go with NHibernate, then maybe it is worth a look.
Lastly, I've seen it mentioned that you can use SQL Server's native audit or trace capabilities. I've never used this myself, but I remember a post on SO suggesting that it replaced the need for creating your own manual audit tables and associated triggers. It seems to do what you want, as illustrated by this quote from the link above:
The auditing of activity of users, roles, or groups on database objects can be restricted down to the table level. That is, you can target SQL Server Audit to track specific activities of a user or users down to the individual table level. For example, SQL Server Audit allows a record to be made of all the UPDATEs to the Payroll table by DBO.
To me that sounds more like a true audit than just a revision history, but maybe someone with more experience in this area can comment on its feasibility for revision tracking. Of course, you'd also have to see what effect it has on NHibernate.
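For reference, a minimal SQL Server Audit setup looks roughly like the sketch below. I haven't used it either, so treat the audit names, database name, and file path as placeholders:

USE master;
GO
-- Server-level audit object writing to a file target (path must exist).
CREATE SERVER AUDIT RevisionAudit
TO FILE (FILEPATH = 'C:\SqlAudit\');
ALTER SERVER AUDIT RevisionAudit WITH (STATE = ON);
GO
USE MyDatabase;
GO
-- Database-level specification: record all UPDATEs to one table by dbo.
CREATE DATABASE AUDIT SPECIFICATION FoodNutritionAudit
FOR SERVER AUDIT RevisionAudit
ADD (UPDATE ON dbo.FoodNutrition BY dbo)
WITH (STATE = ON);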

Related

With Microsoft Common Data Service, is incremental refresh possible?

We have tables in Salesforce which we'd like to make available to other applications using Microsoft Common Data Service. Moreover, we'd like to keep CDS more or less up to date, even including data that was created or updated five minutes ago.
However, some of those tables have hundreds of thousands, or even millions, of records. So, refreshing all the data is inefficient and impractical.
In theory, when CDS queries for data, it should be able to know how recent its most recent data is and include that timestamp in the query for new data.
But I'm not clear how to make that part of the query that gets used in the refresh operation.
Is this possible?
What do I need to do?
How are you selecting the data that you are retrieving? If you have the option to use a SOQL query, you can use the fields CreatedDate and LastModifiedDate as part of your queries. For example:
SELECT Id, Name
FROM Account
WHERE CreatedDate > 2020-10-25T23:01:01Z
or
SELECT Id, Name
FROM Account
WHERE LastModifiedDate > 2020-10-25T23:01:01Z
There are some options but I have no idea what (if anything) is implemented in your connector. Read up a bit and maybe you'll decide to do something custom. There's also a semi-decent SF connector in Azure Data Factory 2 if that helps.
Almost every Salesforce table contains CreatedDate, LastModifiedDate and SystemModstamp columns, but we don't have raw access to the underlying database. There's no ODBC driver (or if there is, it's lying: it pretends SF objects are "linked tables" and hides the API implementation magic in stored procedures). I'm not affiliated and have never used it personally, but I've heard good things about DBAmp. It's been a few years though; alternatives might have popped up. Go give that ADF connector a go.
You could even rephrase the problem a bit and look around for a backup tool that does incremental backups and works with SQL Server: kill two birds with one stone.
So...
The other answer gives you the query route, which is OK but a bit impractical if it's 100+ tables.
There's the Data Replication API to get the Ids (primary keys) of recently updated/deleted records. The SOAP API has a getUpdated call, and there's something similar in the REST API. You'd still have to call it per object, though, and you'd only learn which records were modified; you'd still need to query all columns (there's a "retrieve" call similar to SELECT *).
Perhaps you need to change direction. SF can raise events when data changes, and subscribing apps have between 1 and 3 days to consume them. It uses the CometD protocol, and chances are there's a library for that in the .NET world. There are a few types of events (you can raise custom events, or raise them only when certain conditions are met, from SF config or code; and the other way around, a subscribing app can specify a query it's interested in and get notified whenever the query's results would change). But if you just want everything, search for "Change Data Capture". It could be a nice near-real-time solution.

Table for History of changes

I am working on an ASP.NET MVC 3 website and I need to keep track of any changes made to a table/entity. Whenever something is modified on the Edit view, a list of the changes made, with dates, should display below that Edit view. Do I need to create another table named something like EntityHistory, or should I insert another record in the same table for that?
Please suggest.
It depends what you want to do with the history data. If you want to show record or object-graph snapshots, I have found that a History table with the same columns as the current table is easier to work with when reconstructing how the complete record looked before or after a certain change. This also means you'll have duplicated tables and data.
If your need is purely an audit requirement, it is easier to have one or two tables that hold entity, property, old value and new value columns.
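A sketch of that second, property-level approach (all names are illustrative):

-- One generic log table covers every audited entity.
CREATE TABLE dbo.EntityAudit (
    AuditId      BIGINT        IDENTITY PRIMARY KEY,
    EntityName   NVARCHAR(128) NOT NULL,  -- e.g. 'Product'
    EntityKey    NVARCHAR(64)  NOT NULL,  -- primary key of the changed row
    PropertyName NVARCHAR(128) NOT NULL,  -- e.g. 'Price'
    OldValue     NVARCHAR(MAX) NULL,
    NewValue     NVARCHAR(MAX) NULL,
    ChangedAt    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    ChangedBy    NVARCHAR(128) NOT NULL DEFAULT SUSER_SNAME()
);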
Besides the audit options, SQL Server now has the CDC (Change Data Capture, introduced in SQL Server 2008) feature, which enables developers to trace data changes on a SQL table.
You can build a similar logging mechanism by using triggers (refer to http://www.kodyaz.com/articles/sql-trigger-sql-server-trigger-example-to-log-changes-history.aspx for a sample)
You can also check the following article for an enhanced solution for logging data changes similar to CDC in SQL2005 http://www.kodyaz.com/articles/log-data-changes-using-change-data-capture-for-sql-server-2005.aspx
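For the native CDC route mentioned above, enabling it is just two system stored-procedure calls; a rough sketch (the table name is illustrative, and note CDC required Enterprise/Developer edition before SQL Server 2016 SP1):

-- Enable CDC at the database level, then per table:
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Product',
    @role_name     = NULL;

-- SQL Server then exposes the change rows through generated
-- cdc.fn_cdc_get_all_changes_dbo_Product(...) table-valued functions.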

Update Table in the DataBase when we made changes in DataGrid contents

Hi, I am developing a WPF app that will have paginated records (I am doing the pagination myself, depending on the filters or on the number of records per page the user wants shown).
I have never worked seriously with DataGrids, so what I am asking is: what is the best approach and policy when working with a DataGrid to update the table in the DB?
Do we detect the rows that have changed, or do we update the whole table in the DB? Which is the better way?
Because the user can change one row, and then another; imagine the user changes 50 rows. Will the app have to connect to the DB 50 times?
Unit of work is probably the most common infrastructure solution to this problem; basically it stores the changes applied to the data and, when ready, executes them in a transaction against the database. Many ORMs like Entity Framework or NHibernate already do this for you, so I'd start there.
EDIT
See this example implementation, as it sounds from your comments like you'd need to write your own version. Basically you build a list of the inserts, updates and deletes that should happen and execute them all in one transaction: first inserts, then updates, then deletes. But I'd recommend you look at an ORM like the ones I described above; they already have this as a feature.
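Conceptually, what the unit of work ultimately flushes to the database is a single batch like this sketch (table and values invented for illustration), instead of 50 separate round trips:

BEGIN TRANSACTION;

-- Accumulated changes, applied in order: inserts, updates, deletes.
INSERT INTO dbo.Orders (OrderId, CustomerName) VALUES (101, 'New customer');
UPDATE dbo.Orders SET CustomerName = 'Renamed'      WHERE OrderId = 55;
UPDATE dbo.Orders SET CustomerName = 'Also renamed' WHERE OrderId = 56;
DELETE FROM dbo.Orders WHERE OrderId = 12;

COMMIT TRANSACTION;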

django AuditTrail vs Reversion

I am working on a new web app and I need to store any changes in the database in audit table(s). The purpose of such audit tables is that later, in a real physical audit, we can ascertain what happened in a situation, who edited what, and what the state of the DB was at the time of e.g. a complex calculation.
So mostly the audit tables will be written and not read, though reports may sometimes be generated from them.
I have looked at the available solutions:
AuditTrail - simple, and that is why I am inclining towards it; I can understand its single-file code.
Reversion - looks simple enough to use, but I am not sure how easy it would be to modify if needed.
rcsField - seems to be very complex and too much for my needs.
I haven't tried any of these, so I wanted to hear some real experiences and which one I should be using. E.g. which one is faster, uses less space, and is easy to extend and maintain?
Personally I prefer to create audit tables in the database and populate them through triggers, so that any change, even ad hoc queries from the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than from outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than at the stored procedure level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what came through the GUI. We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables record only the changes and not the whole record, we do not need to change them every time a field is added.
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
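As a sketch of the changes-only idea in SQL: the trigger below logs into a generic entity/property/old/new table like the one sketched earlier on this page. The table and columns are hypothetical, and a real version would repeat the pattern per column (or be generated, like the dynamic proc mentioned above):

CREATE TRIGGER trg_Product_AuditChanges
ON dbo.Product
AFTER UPDATE
AS
BEGIN
    -- Log only the columns whose values actually changed.
    INSERT INTO dbo.EntityAudit (EntityName, EntityKey, PropertyName, OldValue, NewValue)
    SELECT 'Product',
           CAST(d.ProductId AS NVARCHAR(64)),
           'Price',
           CAST(d.Price AS NVARCHAR(MAX)),
           CAST(i.Price AS NVARCHAR(MAX))
    FROM deleted d
    JOIN inserted i ON i.ProductId = d.ProductId
    WHERE i.Price <> d.Price;
END;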
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords, developed by the same author (Marty Alchin, aka @gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my table, and maybe come back to those changes later to generate some reports.
So I tested AuditTrail and Reversion.
Reversion seems to be the more full-blown application, with many features (which I do not need). Also, as far as I know it saves data in a single table in XML or YAML format, which I think will generate too much data in a single table, and to read that data I may not be able to use the DB tools I already have.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, the per-table data is smaller, and it can be easily manipulated and used for report generation.
So I am going with AuditTrail.

Effective strategy for leaving an audit trail/change history for DB applications?

What are some strategies that people have had success with for maintaining a change history for data in a fairly complex database? One of the applications that I frequently use and develop for could really benefit from a more comprehensive way of tracking how records have changed over time. For instance, right now records can have a number of timestamp and modified-user fields, but we currently don't have a scheme for logging multiple changes, for instance if an operation is rolled back. In a perfect world, it would be possible to reconstruct the record as it was after each save, etc.
Some info on the DB:
Needs to have the capacity to grow by thousands of records per week
50-60 Tables
Main revisioned tables may have several million records each
Reasonable amount of foreign keys and indexes set
Using PostgreSQL 8.x
One strategy you could use is MVCC, multiversion concurrency control. In this scheme, you never do updates to any of your tables; you just do inserts, maintaining version numbers for each record. This has the advantage of providing an exact snapshot from any point in time, and it also completely sidesteps the update lock problems that plague many databases.
But it makes for a huge database, and every select requires an extra clause to pick the current version of a record.
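A sketch of the insert-only scheme (PostgreSQL flavor to match the question; the table and values are invented):

-- Each row is one immutable version of one record.
CREATE TABLE product (
    id      integer NOT NULL,
    version integer NOT NULL,
    name    text    NOT NULL,
    price   numeric NOT NULL,
    PRIMARY KEY (id, version)
);

-- An "update" is really an insert of the next version:
INSERT INTO product (id, version, name, price)
SELECT id, MAX(version) + 1, name, 12.50
FROM product
WHERE id = 7
GROUP BY id, name;

-- The extra clause every select needs, to pick the current version:
SELECT p.*
FROM product p
WHERE p.version = (SELECT MAX(version) FROM product WHERE id = p.id);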
If you are using Hibernate, take a look at JBoss Envers. From the project homepage:
The Envers project aims to enable easy versioning of persistent JPA classes. All that you have to do is annotate your persistent class, or the properties of it that you want to version, with @Versioned. For each versioned entity, a table will be created which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.
This is somewhat similar to Eric's approach, but probably much less effort. I don't know what language/technology you use to access the database, though.
In the past I have used triggers to construct DB update/insert/delete logging.
Each time one of those actions is performed on a given table, you insert a record into a logging table that tracks the action, which DB user did it, a timestamp, the table it was performed on, and the previous value.
There is probably a better answer, though, as this requires you to capture the value before the actual delete or update is performed. But you could use this to do rollbacks.
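For example, a rollback could pull the previous value back out of the log table. A sketch in PostgreSQL flavor, assuming a hypothetical change_log table with table_name, column_name, row_id, old_value and changed_at columns:

-- Restore the most recent pre-change price for product 7.
UPDATE product p
SET price = l.old_value::numeric
FROM (
    SELECT row_id, old_value
    FROM change_log
    WHERE table_name = 'product'
      AND column_name = 'price'
      AND row_id = 7
    ORDER BY changed_at DESC
    LIMIT 1
) l
WHERE p.id = l.row_id;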
The only problem with using triggers is that they add performance overhead to every insert/update/delete. For higher scalability and performance, you want to keep database transactions to a minimum. Auditing via triggers increases the time required to complete the transaction and, depending on the volume, may cause performance issues.
Another way is to explore whether the database provides any way of mining the "redo" logs, as is the case in Oracle. Redo logs are what the database uses to recreate the data in case it fails and has to recover.
Similar to a trigger (or even with) you can have every transaction fire a logging event asynchronously and have another process (or just thread) actually handle the logging. There would be many ways to implement this depending upon your application. I suggest having the application fire the event so that it does not cause unnecessary load on your first transaction (which sometimes leads to locks from cascading audit logs).
In addition, you may be able to improve performance to the primary database by keeping the audit database in a separate location.
I use SQL Server, not PostgreSQL, so I'm not sure if this will work for you or not, but Pop Rivett had a great article on creating an audit trail here:
Pop Rivett's SQL Server FAQ No.5: Pop on the Audit Trail
Build an audit table, then create a trigger for each table you want to audit.
Hint: use CodeSmith to build your triggers.
