We are currently working on a healthcare application. Since medical data are very sensitive, we may have to keep a history of all changes made to some medical records. I know that we can do this by keeping history tables for logging the changes, but keeping history tables will cause the database to grow large. So I wonder if there are any features in SQL Server to track the complete data changes of a table. I have heard about the Change Data Capture feature. Is it applicable in my scenario?
Thanks in advance
It depends greatly on what you need to track. First of all, anything that tracks all history will use lots more space, and assuming the same level of detail, the options will not differ significantly in how much space they need. Data is data, and it just takes space to store. However, with several of your options you could look at storing your audit info outside of SQL Server if you wanted, if that would help.
Second, you have at least three immediately obvious options.
Use SQL triggers on tables to create history tables (a sketch follows this list). This allows you to include as many or as few columns as you want, so that is one nice feature.
Use SQL Change Tracking to monitor for changes and create history tables or otherwise audit the changes. Change Tracking only records that there was a change; it is up to you to store what the change was. Since this is queryable, you can store it relatively easily outside the DB if desired.
Use SQL Change Data Capture as mentioned. This is really heavyweight, and has performance and storage implications. However, it will do most of the work for you, and is relatively easy to use.
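For the trigger approach (option 1), a minimal sketch might look like this. The Patients table, its columns, and all object names are hypothetical; a real medical schema would carry many more columns:

    -- Hypothetical history table; only a couple of audited columns shown.
    CREATE TABLE dbo.PatientHistory (
        HistoryId     INT IDENTITY(1,1) PRIMARY KEY,
        PatientId     INT NOT NULL,
        DiagnosisCode VARCHAR(16),
        Operation     CHAR(1)   NOT NULL,  -- 'U' = update, 'D' = delete
        ChangedBy     SYSNAME   NOT NULL DEFAULT SUSER_SNAME(),
        ChangedAt     DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO
    CREATE TRIGGER trg_Patients_History
    ON dbo.Patients
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- "deleted" holds the pre-change rows for both updates and deletes,
        -- so each record is captured as it was before the change.
        INSERT INTO dbo.PatientHistory (PatientId, DiagnosisCode, Operation)
        SELECT d.PatientId, d.DiagnosisCode,
               CASE WHEN EXISTS (SELECT 1 FROM inserted) THEN 'U' ELSE 'D' END
        FROM deleted d;
    END;

Change Data Capture, by contrast, is enabled per database and per table (sys.sp_cdc_enable_db, then sys.sp_cdc_enable_table) and reads changes from the transaction log, so it needs SQL Server Agent running.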
A client has one of my company's applications, which points to a specific database, and to tables within that database, on their server. We need to update the data several times a day. We don't want to update the tables that the users are looking at in live sessions. We want to refresh the data on the side and then flip which database/tables the users are accessing.
What is the accepted way of doing this? Do we have two databases and rename the databases? Do we put the data into separate tables, then rename the tables? Are there other approaches that we can take?
Based on the information you have provided, I believe your best bet would be partition switching. I've included a couple of links for you to check out, because it's much easier to direct you to a source that already explains it well. There are several approaches to partition switching you can take.
Links: Microsoft and Cathrine Wilhelmsen's blog
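As a rough illustration of what switching buys you, here is a sketch of the simplest variant: a metadata-only swap between whole tables. All table names are invented, and both tables must have identical schemas, indexes, and constraints on the same filegroup:

    BEGIN TRAN;
        -- Move the current data out of the way (dbo.SalesOld must be empty)...
        ALTER TABLE dbo.SalesLive SWITCH TO dbo.SalesOld;
        -- ...then switch the freshly loaded data in. Both statements only
        -- update metadata, so users see the new data almost instantly.
        ALTER TABLE dbo.SalesStaging SWITCH TO dbo.SalesLive;
    COMMIT;
    TRUNCATE TABLE dbo.SalesOld;  -- discard the previous copy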
Hope this helps!
I think I understand what you're saying: if the user is on a screen, you don't want the screen updating with new information while they're viewing it, and only want to update when they pull a new screen after the new data has been loaded? Correct me if I'm wrong. And Mike's question is also a good one: how is this data being fed to the users? Possibly there's a way to pause that while the new data is being loaded. There are more elegant ways to load data, like partitioning the table, using a staging table, replication, having the users view snapshots, etc. But we need to know what you mean by 'live sessions'.
Edit: with the additional information you've given me, partition switching could be the answer. The process takes virtually no time; it just changes the pointers from the old records to the new ones. The only issue is you have to partition on something partitionable, like a date or timestamp, to differentiate old and new data. It's also an Enterprise Edition feature, and I'm not sure which edition you're running.
Possibly a better thing to look at is Read Committed Snapshot Isolation. It ensures that your users only see new data after it's committed; it provides a statement-level consistent view of the data and has minimal concurrency issues, though there is more overhead in TempDB. Here are some resources for more research:
http://www.databasejournal.com/features/mssql/snapshot-isolation-level-in-sql-server-what-why-and-how-part-1.html
https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx
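Enabling it is a one-line change. A sketch, assuming a database named AppDb; the ALTER needs near-exclusive access, hence the rollback clause:

    ALTER DATABASE AppDb
        SET READ_COMMITTED_SNAPSHOT ON
        WITH ROLLBACK IMMEDIATE;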
Hope this helps and good luck!
The question details are a little vague so to clarify:
What is a live session? Is it a session in the application itself (with app code managing its own connections to the database), or is it a low-level connection-per-user/session situation? Are users just running reports, or actively reading/writing from the database during the session? When is a session over, and how do you know?
Some options:
1) Pull all the data into the client for the entire session.
2) Use read committed snapshot or partition switching, as mentioned in other answers (however, this requires careful setup for your queries and increases the requirements on the database).
3) Use a replica database for all queries and pause/resume replication when necessary (updating the data should be faster than your current process, but it still might take a while depending on volume and complexity).
4) Use a replica database and automate a backup/restore from the master (this might be the quickest, depending on the overall size of your database; see the sketch after this list).
5) Use multiple replica databases with replication or backup/restore, and then switch the connection string (this allows you to update the master constantly, then update a replica and switch over at a certain predictable time).
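For option 4, the refresh can be scripted along these lines. This is only a sketch: the database names, paths, and logical file names are all assumptions (check yours with RESTORE FILELISTONLY):

    BACKUP DATABASE AppDb
        TO DISK = N'D:\Backups\AppDb.bak'
        WITH INIT;

    RESTORE DATABASE AppDb_Replica
        FROM DISK = N'D:\Backups\AppDb.bak'
        WITH REPLACE,
             MOVE 'AppDb'     TO N'D:\Data\AppDb_Replica.mdf',
             MOVE 'AppDb_log' TO N'D:\Data\AppDb_Replica.ldf';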
I would like to track changes to view data. I don't think it is possible out of the box with current SQL Server change tracking. Has anyone come up with a solution to this one?
Edit:
I'm synchronizing data between two databases. Synchronization works mostly on views (some tables too), so I need to track changes that are made to view data (insert/update/delete). The task is not trivial, because some views are just JOINs and others use PIVOT.
No. No, you cannot. I wish you could.
You can track changes to the view definitions with DDL triggers. It seems these work on SQL Server 2005 and above.
One benefit is for users that are just not going to use source control (it happens). That can be a problem when they create views through the GUI and don't want to export them to actual text.
This records the previous versions (you already have the current in the database itself) without any user involvement.
This would free you up from having to be the bottleneck and approve or apply every change.
I didn't read enough of this to see if it would capture the user who submitted the change.
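On that last point: the EVENTDATA() XML available inside a DDL trigger does include the login, so something along these lines would capture who changed what. A sketch only; the table and trigger names are made up:

    CREATE TABLE dbo.ViewDefinitionHistory (
        Id         INT IDENTITY(1,1) PRIMARY KEY,
        EventType  NVARCHAR(64),
        ObjectName NVARCHAR(256),
        LoginName  NVARCHAR(128),
        EventDate  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
        Tsql       NVARCHAR(MAX)  -- the submitted CREATE/ALTER/DROP statement
    );
    GO
    CREATE TRIGGER trg_TrackViewChanges
    ON DATABASE
    FOR CREATE_VIEW, ALTER_VIEW, DROP_VIEW
    AS
    BEGIN
        DECLARE @e XML = EVENTDATA();
        INSERT INTO dbo.ViewDefinitionHistory (EventType, ObjectName, LoginName, Tsql)
        VALUES (
            @e.value('(/EVENT_INSTANCE/EventType)[1]',  'NVARCHAR(64)'),
            @e.value('(/EVENT_INSTANCE/ObjectName)[1]', 'NVARCHAR(256)'),
            @e.value('(/EVENT_INSTANCE/LoginName)[1]',  'NVARCHAR(128)'),
            @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(MAX)')
        );
    END;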
Manage the changes in your source control server and generate your views WITH ENCRYPTION so people don't mess with them on the server.
I have a really odd user requirement. I have tried to explain to them that there are much better ways of supporting their business process, but they don't want to hear it. I am tempted to walk away, but first I want to see if maybe there is another way.
Is there any way that I can lock a whole database, as opposed to a row lock or table lock? I know I can perhaps put the database into single-user mode, but that means only one person can use it at a time. I would like many people to be able to read at a time, but only one person to be able to write to it at a time.
They are trying to do some really odd data migration.
What do you want to achieve?
Do you want to make the whole database read-only? You can definitely do that
Do you want to prevent any new clients from connecting to the database? You can definitely do that too
But there's really no concept of a "database lock" in terms of only ever allowing one person to use the database. At least not in SQL Server, not that I'm aware of. What good would that do you, anyway?
If you want to do data migration out of this database, then setting the database into read-only mode (or creating a snapshot copy of it) will probably be sufficient and the easiest way to go.
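If read-only mode fits, the toggle is a one-liner each way. A sketch, assuming a database named AppDb:

    -- Kick out in-flight writers and make the database read-only.
    ALTER DATABASE AppDb SET READ_ONLY WITH ROLLBACK IMMEDIATE;
    -- ...run the migration from the stable, read-only data...
    ALTER DATABASE AppDb SET READ_WRITE WITH ROLLBACK IMMEDIATE;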
UPDATE: for the scenario you mention (grab the data for people with laptops, and then re-synchronize), you should definitely check out ADO.NET Sync Services - that's exactly what it's made for!
Even if you can't use ADO.NET Sync Services, you should still be able to selectively and intelligently update your central database with the changes from laptops without locking the entire database. SQL Server has several methods to update rows even while the database is in use - there's really no need to completely lock the whole database just to update a few rows!
For instance: you should have a TIMESTAMP (or ROWVERSION) column on each of your data tables, which would easily allow you to see if any changes have occurred at all. The TIMESTAMP field is really just a counter - it has nothing to do with date or time - and if it has not changed, the row has not changed and thus doesn't need to be considered for an update.
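A sketch of that pattern, with invented table and column names; the sync job would persist the high-water mark between runs:

    ALTER TABLE dbo.Patients ADD RowVer ROWVERSION;

    -- On each sync, ask only for rows changed since the last run.
    DECLARE @LastSync BINARY(8) = 0x0000000000000000;  -- stored by the sync job
    SELECT PatientId, DiagnosisCode, RowVer
    FROM dbo.Patients
    WHERE RowVer > @LastSync;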
I am working on a new web app and I need to store any changes in the database in audit table(s). The purpose of such audit tables is that later on, in a real physical audit, we can ascertain what happened in a situation, who edited what, and what the state of the DB was at the time of e.g. a complex calculation.
So mostly the audit tables will be written and not read, though a report may sometimes be generated.
I have looked at the available solutions:
AuditTrail - simple, and that is why I am inclining towards it; I can understand its single-file code.
Reversion - looks simple enough to use, but I'm not sure how easy it would be to modify if needed.
rcsField - seems to be very complex and too much for my needs.
I haven't tried any of these, so I wanted to hear about some real experiences and which one I should be using. E.g., which one is faster, uses less space, and is easier to extend and maintain?
Personally, I prefer to create audit tables in the database and populate them through triggers, so that any change, even one made in an ad hoc query from the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface, but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the stored-procedure level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what comes from the GUI.

We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables store only the changes and not the whole record, we do not need to change them every time a field is added.
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords, developed by the same author (Marty Alchin, aka @gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my tables, and maybe come back to those changes later to generate some reports.
So I tested AuditTrail and Reversion.
Reversion seems to be a more full-blown application with many features (which I do not need). Also, as far as I know, it saves data in a single table in XML or YAML format, which I think will generate too much data in a single table, and to read that data I may not be able to use the DB tools already available.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, the per-table data is smaller, and it can be easily manipulated and used for report generation.
So I am going with AuditTrail.
What are some strategies that people have had success with for maintaining a change history for data in a fairly complex database? One of the applications that I frequently use and develop for could really benefit from a more comprehensive way of tracking how records have changed over time. For instance, right now records can have a number of timestamp and modified-user fields, but we currently don't have a scheme for logging multiple changes, for instance if an operation is rolled back. In a perfect world, it would be possible to reconstruct the record as it was after each save.
Some info on the DB:
Needs to have the capacity to grow by thousands of records per week
50-60 Tables
Main revisioned tables may have several million records each
Reasonable amount of foreign keys and indexes set
Using PostgreSQL 8.x
One strategy you could use is MVCC, Multiversion Concurrency Control. In this scheme, you never update any of your tables; you only do inserts, maintaining version numbers for each record. This has the advantage of providing an exact snapshot from any point in time, and it also completely sidesteps the update lock problems that plague many databases.
But it makes for a huge database, and all selects require an extra clause to pick the current version of a record.
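A minimal sketch of the scheme in SQL, with invented table and column names:

    CREATE TABLE widget (
        widget_id  INTEGER   NOT NULL,
        version_no INTEGER   NOT NULL,
        name       VARCHAR(100),
        valid_from TIMESTAMP NOT NULL DEFAULT now(),
        PRIMARY KEY (widget_id, version_no)
    );

    -- An "update" is just an insert of the next version:
    INSERT INTO widget (widget_id, version_no, name)
    VALUES (42, 3, 'renamed widget');

    -- ...and this is the extra clause every select needs in order to
    -- see only the current version of each record:
    SELECT w.*
    FROM widget w
    WHERE w.version_no = (SELECT MAX(v.version_no)
                          FROM widget v
                          WHERE v.widget_id = w.widget_id);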
If you are using Hibernate, take a look at JBoss Envers. From the project homepage:
The Envers project aims to enable easy versioning of persistent JPA classes. All that you have to do is annotate your persistent class, or some of its properties that you want to version, with @Versioned. For each versioned entity, a table will be created, which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.
This is somewhat similar to Eric's approach, but probably much less effort. I don't know what language/technology you use to access the database, though.
In the past I have used triggers to construct DB update/insert/delete logging.
Each time one of the above actions is performed on a specific table, you could insert a record into a logging table that keeps track of the action, which DB user did it, the timestamp, the table it was performed on, and the previous value.
There is probably a better answer, though, as I think this would require you to cache the value before the actual delete or update is performed. But you could use this to do rollbacks.
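(In PostgreSQL, at least, the trigger itself already sees the pre-change row as OLD, so no separate caching is needed.) A sketch for a single hypothetical table:

    CREATE TABLE widget_log (
        log_id     SERIAL PRIMARY KEY,
        action     TEXT      NOT NULL,               -- INSERT / UPDATE / DELETE
        db_user    TEXT      NOT NULL DEFAULT current_user,
        changed_at TIMESTAMP NOT NULL DEFAULT now(),
        old_name   VARCHAR(100)                      -- previous value; NULL on insert
    );

    CREATE OR REPLACE FUNCTION widget_audit() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO widget_log (action) VALUES (TG_OP);
        ELSE
            -- OLD carries the row as it was before the update/delete.
            INSERT INTO widget_log (action, old_name) VALUES (TG_OP, OLD.name);
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER widget_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON widget
    FOR EACH ROW EXECUTE PROCEDURE widget_audit();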
The only problem with using triggers is that they add performance overhead to every insert/update/delete. For higher scalability and performance, you want to keep each database transaction to a minimum. Auditing via triggers increases the time required for the transaction and, depending on the volume, may cause performance issues.
Another way is to explore whether the database provides any way of mining the "redo" logs, as is the case in Oracle. Redo logs are what the database uses to recreate the data in case it fails and has to recover. (PostgreSQL's rough equivalent is the write-ahead log, or WAL.)
Similar to a trigger (or even together with one), you can have every transaction fire a logging event asynchronously and have another process (or just a thread) actually handle the logging. There would be many ways to implement this, depending on your application. I suggest having the application fire the event so that it does not put unnecessary load on your first transaction (which can otherwise lead to locks from cascading audit logs).
In addition, you may be able to improve performance to the primary database by keeping the audit database in a separate location.
I use SQL Server, not PostgreSQL, so I'm not sure if this will work for you or not, but Pop Rivett had a great article on creating an audit trail here:
Pop Rivett's SQL Server FAQ No.5: Pop on the Audit Trail
Build an audit table, then create a trigger for each table you want to audit.
Hint: use CodeSmith to build your triggers.