Problem statement:
Record updates must always be approved before the changes are reflected in the main record. I want to try solving this problem using an SQL database.
Eg.
User: { "name" : "Ravi Kumar", "city" : "Bangalore" }
Say we want to update the city to "Delhi" but it has to undergo approval before it reflects in the main record.
Once approved, it should show:
User: { "name" : "Ravi Kumar", "city" : "Delhi" }
Required Features:
Changes must get approved before they reflect in the main record
An audit trail of the record must be maintained
Table joins must be supported: what if a record is fetched by joining multiple tables, i.e. if the DB is normalised?
Possible Solutions:
Have an additional column "approved" in the table. All approved records will have status 1 and the rest will have 0. To get the current record, we fetch the record with the most recent timestamp and approved=1.
Have 2 tables: the main table and another one for approvals. When someone approves a change, we apply it to the record in the main table.
Questions:
How to incorporate joins when there is normalisation? Is it simple or complex when joins are a must? Is it even possible?
In case of joins, can we still implement this using ORMs like Hibernate?
What if there are multiple updates waiting for approval on the same record and each update modifies a different set of fields? If all of them are approved, the record might end up with only the changes from the last update (assuming one of the above solutions is used).
Eg. There is a record in the main table and no unapproved records for it. Now a user changes a property, say "name", and submits it for approval. Another user changes the property "city" and submits it for approval. Another user changes the property "salary" and submits it for approval. All unapproved records are then approved. The last update (which changed the salary property) still contains the old name, not the one from the first update. How do we get all the approved changes?
This can be achieved by storing only the changed properties instead of the whole record, but it comes at the cost of more code changes.
How are these problems tackled in the industry?
Related questions:
What's the best way to store changes to database records that require approval before being visible?
Structure to handle changes to records that require approval
If you want to be very careful about approving changes and you expect so many changes that it is likely that more changes will be made to a record before prior changes can be approved, then your best approach would be to have a separate set of tables with one record for each proposed change.
These "pending change" records could (should) include extra information about the change transaction, such as who proposed it and when.
Your process for handling all of these changes, and especially conflicting or overlapping changes will depend on your business rules, which you haven't stated definitively. Options include:
Prevent a second change while one is pending
Last change approved wins
Changes must be approved in sequence, overlaying earlier pending changes on top of the official data so that there is a presumption that all earlier changes will be approved prior to applying later changes
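A minimal sketch of the "pending change" table, assuming each pending record stores only the fields it changes (as a JSON object), so that separately approved changes to name, city and salary do not overwrite one another. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names, and the `propose`/`approve` helpers are hypothetical and not part of the original question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name TEXT, city TEXT, salary INTEGER
    );
    -- One row per proposed change; `changes` holds only the modified fields.
    CREATE TABLE pending_changes (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id     INTEGER NOT NULL REFERENCES users(id),
        changes     TEXT NOT NULL,            -- JSON object: field -> new value
        proposed_by TEXT NOT NULL,
        proposed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        status      TEXT NOT NULL DEFAULT 'pending'
    );
    INSERT INTO users VALUES (1, 'Ravi Kumar', 'Bangalore', 1000);
""")

# Column names cannot be bound as SQL parameters, so they are checked
# against a whitelist before being placed into a statement.
EDITABLE_COLUMNS = {"name", "city", "salary"}


def propose(user_id, proposed_by, **fields):
    """Record a proposed change without touching the main table."""
    unknown = set(fields) - EDITABLE_COLUMNS
    if unknown:
        raise ValueError(f"not editable: {sorted(unknown)}")
    with conn:
        cur = conn.execute(
            "INSERT INTO pending_changes (user_id, changes, proposed_by) "
            "VALUES (?, ?, ?)",
            (user_id, json.dumps(fields), proposed_by),
        )
    return cur.lastrowid


def approve(change_id):
    """Apply only the changed fields to the main record, in one transaction."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT user_id, changes FROM pending_changes "
            "WHERE id = ? AND status = 'pending'",
            (change_id,),
        ).fetchone()
        if row is None:
            raise ValueError("no pending change with that id")
        user_id, changes = row
        for column, value in json.loads(changes).items():
            if column not in EDITABLE_COLUMNS:
                raise ValueError(f"not editable: {column}")
            conn.execute(
                f"UPDATE users SET {column} = ? WHERE id = ?", (value, user_id)
            )
        conn.execute(
            "UPDATE pending_changes SET status = 'approved' WHERE id = ?",
            (change_id,),
        )


# Three users each change a different field; all three are then approved.
ids = [
    propose(1, "alice", name="Ravi K."),
    propose(1, "bob", city="Delhi"),
    propose(1, "carol", salary=1200),
]
for change_id in ids:
    approve(change_id)

final = conn.execute("SELECT name, city, salary FROM users WHERE id = 1").fetchone()
```

Approved rows stay in `pending_changes`, so the same table can serve as a simple audit trail. This sketch implements neither "prevent a second change while one is pending" nor strict sequencing; those rules would need extra checks in `propose` and `approve`.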
Regarding normalized databases and joins, this presents no special problems in your case. You will join the tables containing official, approved data as you would in any case. If you want to join to an interim/pre-approved version of a record, you should create a view which reflects these changes overlapped over the official data and then join to that view.
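The view idea can be sketched as follows, again with sqlite3 and hypothetical table names. It assumes a pending-change table that mirrors the editable columns of `users`, where NULL means "this field is not changed" and the most recent pending value wins. The view `users_with_pending` overlays those values on the official data and can be joined like any table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE users (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES cities(id)
    );
    -- NULL in a column means the pending change leaves that field alone.
    CREATE TABLE pending_user_changes (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL REFERENCES users(id),
        name    TEXT,
        city_id INTEGER REFERENCES cities(id)
    );
    -- Official data with the latest pending value of each field laid on top.
    CREATE VIEW users_with_pending AS
    SELECT u.id,
           COALESCE((SELECT p.name FROM pending_user_changes p
                     WHERE p.user_id = u.id AND p.name IS NOT NULL
                     ORDER BY p.id DESC LIMIT 1), u.name) AS name,
           COALESCE((SELECT p.city_id FROM pending_user_changes p
                     WHERE p.user_id = u.id AND p.city_id IS NOT NULL
                     ORDER BY p.id DESC LIMIT 1), u.city_id) AS city_id
    FROM users u;

    INSERT INTO cities VALUES (1, 'Bangalore'), (2, 'Delhi');
    INSERT INTO users VALUES (1, 'Ravi Kumar', 1);
    INSERT INTO pending_user_changes (user_id, city_id) VALUES (1, 2);
""")

# The same join works against the official table or the overlay view.
JOIN_SQL = "SELECT u.name, c.city FROM {source} u JOIN cities c ON c.id = u.city_id"
official = conn.execute(JOIN_SQL.format(source="users")).fetchall()
preview = conn.execute(JOIN_SQL.format(source="users_with_pending")).fetchall()
```

One limitation of this particular encoding is that a pending change cannot set a field to NULL; a per-field change table would avoid that at the cost of a more involved view.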
Related
I am putting together a staff database and I need to be able to revise the staff member information, but also keep track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user data but be able to query against the most recent revision? I am looking at information that changes rarely, like Last Name, but that I will need to be able to query for out of date values. So if Jenny Smith changes her name to Jenny James I need to be able to find the user's current information when I search against her old name.
I assume that I will need at least 2 tables, one that contains the uid and another that contains the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? I am looking at about 40 fields per record and only one or two fields will probably change per update. Also I cannot remove any data from the database, I need to be able to look back on all previous records.
A simple way of doing this is to add a deleted flag and instead of updating records you set the deleted flag on the existing record and insert a new record.
You can of course also write the existing record to an archive table, if you prefer. But if changes are infrequent and the table is not big I would not bother.
To get the active record, query with 'where deleted = 0', the speed impact will be minimal when there is an index on this field.
Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.
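A rough sketch of this approach, using sqlite3 with hypothetical column names: every revision is a new row, the superseded row gets `deleted = 1`, and the revision number doubles as an optimistic lock. The lookup function shows one way to find the current record when searching by an out-of-date last name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        record_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- one per revision
        staff_id   INTEGER NOT NULL,                   -- stable across revisions
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        revision   INTEGER NOT NULL,
        deleted    INTEGER NOT NULL DEFAULT 0,         -- 1 = superseded
        updated_by TEXT NOT NULL,
        updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX staff_active ON staff (staff_id, deleted);
    INSERT INTO staff (staff_id, first_name, last_name, revision, updated_by)
    VALUES (1, 'Jenny', 'Smith', 1, 'hr');
""")


def revise(staff_id, expected_revision, updated_by, first_name, last_name):
    """Flag the current row as deleted and insert the next revision."""
    with conn:
        cur = conn.execute(
            "UPDATE staff SET deleted = 1 "
            "WHERE staff_id = ? AND deleted = 0 AND revision = ?",
            (staff_id, expected_revision),
        )
        if cur.rowcount != 1:
            # Someone else revised the record first (optimistic locking).
            raise RuntimeError("stale revision, reload and retry")
        conn.execute(
            "INSERT INTO staff (staff_id, first_name, last_name, revision, updated_by) "
            "VALUES (?, ?, ?, ?, ?)",
            (staff_id, first_name, last_name, expected_revision + 1, updated_by),
        )


def find_current_by_any_last_name(last_name):
    """Match any revision by last name, but return the active row."""
    return conn.execute(
        """
        SELECT DISTINCT cur.staff_id, cur.first_name, cur.last_name, cur.revision
        FROM staff AS old
        JOIN staff AS cur ON cur.staff_id = old.staff_id AND cur.deleted = 0
        WHERE old.last_name = ?
        """,
        (last_name,),
    ).fetchall()


revise(1, 1, "hr", "Jenny", "James")
result = find_current_by_any_last_name("Smith")
```

Searching for the old name "Smith" returns Jenny's current row, because the self-join goes from any matching revision to the one row that is still active.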
I would use the separate table because then you can have a unique identifier that points to all the other child records that is also the PK of the table which I think makes it less likely you will have data integrity issues. For instance, you have Mary Jones who has records in the address table and the email table and performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.
With a deleted field in one table, you then have to have a non-autogenerated person id and an autogenerated record id.
You also have the possibility of people forgetting the where deleted = 0 clause that is needed for almost every query. (If you do use the deleted flag field, do yourself a favor and set up a view with where deleted = 0 and require developers to use the view in queries, not the original table.)
With the deleted flag field you will also need a trigger to ensure one and only one record is marked as active.
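As an illustration, the view and the "only one active record" rule might look like the sketch below. Note that it swaps the trigger for a partial unique index, which SQLite and some other databases support; this guarantees at most one active row per person, not that one always exists. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        record_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- autogenerated per row
        person_id INTEGER NOT NULL,                   -- not autogenerated
        name      TEXT NOT NULL,
        deleted   INTEGER NOT NULL DEFAULT 0
    );
    -- At most one row per person may have deleted = 0.
    CREATE UNIQUE INDEX person_one_active ON person (person_id) WHERE deleted = 0;
    -- Developers query this view instead of the base table.
    CREATE VIEW active_person AS
        SELECT record_id, person_id, name FROM person WHERE deleted = 0;

    INSERT INTO person (person_id, name, deleted) VALUES (1, 'Mary Jones', 1);
    INSERT INTO person (person_id, name) VALUES (1, 'Mary Smith');
""")

# A second active row for the same person violates the partial index.
try:
    conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Mary Brown')")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

active = conn.execute("SELECT person_id, name FROM active_person").fetchall()
```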
@Peter Tillemans' suggestion is a common way to accomplish what you're asking for. But I don't like it.
The structure of a database should reflect the real-world facts that are being modeled.
I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.
Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!
For the first time, I am developing in an environment in which there is a central repository for a number of different industry standard reference data tables and many different customers who need to select records from these industry standard reference data tables to fill in foreign key information for their customer specific records.
Because these industry standard reference files are utilized by all customers, I want to reserve Create/Update/Delete access to these records for global product administrators. However, I would like to implement a (semi-)automated interface by which specific customers could request record additions, deletions or modifications to any of the industry standard reference files that are shared among all customers.
I know I need something like a "data change request" table specifying:
user id,
user request datetime,
request type (insert, modify, delete),
a user entered text explanation of the change request,
the user request's current status (pending, declined, completed),
admin resolution datetime,
admin id,
an admin entered text description of the resolution,
etc.
What I can't figure out is how to elegantly handle the fact that these data change requests could apply to dozens of different tables with differing table column definitions. I would like to give the customer users making these data change requests a convenient way to enter their proposed record additions/modifications directly into CRUD screens that look very much like the reference table CRUD screens they don't have write/delete permissions for (with an additional text explanation and perhaps request priority field). I would also like to give the global admins a tool that allows them to view all the outstanding data change requests for the users they oversee sorted by date requested or user/date requested. Upon selecting a data change request record off the list, the admin would be directed to another CRUD screen that would be populated with the fields the customer users requested for the new/modified industry standard reference table record along with customer's text explanation, the request status and the text resolution explanation field. At this point the admin could accept/edit/reject the requested change and if accepted the affected industry standard reference file would be automatically updated with the appropriate fields and the data change request record's status, text resolution explanation and resolution datetime would all also be appropriately updated.
However, I want to keep the actual production reference tables as simple as possible and free from these extraneous and typically null customer change request fields. I'd also like the data change request file to aggregate all data change requests across all the reference tables yet somehow "point to" the specific reference table and primary key in question for modification & deletion requests or the specific reference table and associated customer user entered field values in question for record creation requests.
Does anybody have any ideas of how to design something like this effectively? Is there a cleaner, simpler way I am missing?
Option 1
If preserving the base tables is important then I would create a "change details" table as a child to your change request table. I'm envisioning something like
ChangeID
TableName
TableKeyValue
FieldName
ProposedValue
Add/Change/Delete Indicator
So you'd have a row in this table for every proposed field change. The challenge in this scenario is maintaining the mapping of TableName and FieldName values to the actual tables and fields. If your database structure is fairly static then this may not be an issue.
Option 2
Add a ChangeID field to each of your base tables. When a change is proposed add a record to the base table with the ChangeID populated. So as an example if you have a Company table, for a single company you could have multiple records:
CompanyCode ChangeID CompanyName CompanyAddress
----------- -------- ----------- --------------
COMP1 My Company Boston <-- The "live" record
COMP1 1 New Name Boston <-- A proposed change
When the admin commits the change the existing live record is deleted or archived and the ChangeID value is removed from the proposed record making it the live record. It may be a little tricky to handle proposed deletions with this option. This option also has the potential for impacting performance of selecting live data for normal usage. However it does save you the hassle of maintaining a list of table names and field names somewhere in your code.
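A minimal sketch of the commit step in Option 2, using sqlite3 and the Company example above. It assumes the live record is the one with a NULL ChangeID and that the old live record is archived rather than simply deleted; the `company_archive` table and the `commit_change` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (
        company_code    TEXT NOT NULL,
        change_id       INTEGER,          -- NULL marks the live record
        company_name    TEXT NOT NULL,
        company_address TEXT NOT NULL
    );
    CREATE TABLE company_archive (
        company_code    TEXT NOT NULL,
        company_name    TEXT NOT NULL,
        company_address TEXT NOT NULL,
        archived_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO company VALUES ('COMP1', NULL, 'My Company', 'Boston');
    INSERT INTO company VALUES ('COMP1', 1, 'New Name', 'Boston');
""")


def commit_change(change_id):
    """Archive the live row, then promote the proposed row to live."""
    with conn:  # all three statements succeed or fail together
        row = conn.execute(
            "SELECT company_code FROM company WHERE change_id = ?", (change_id,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown change id")
        (code,) = row
        conn.execute(
            "INSERT INTO company_archive (company_code, company_name, company_address) "
            "SELECT company_code, company_name, company_address FROM company "
            "WHERE company_code = ? AND change_id IS NULL",
            (code,),
        )
        conn.execute(
            "DELETE FROM company WHERE company_code = ? AND change_id IS NULL",
            (code,),
        )
        conn.execute(
            "UPDATE company SET change_id = NULL WHERE change_id = ?", (change_id,)
        )


commit_change(1)
live = conn.execute(
    "SELECT company_code, company_name, company_address "
    "FROM company WHERE change_id IS NULL"
).fetchall()
archived = conn.execute("SELECT company_name FROM company_archive").fetchall()
```

Normal reads need `WHERE change_id IS NULL` on every query, which is the performance and convenience cost mentioned above; a view over the live rows would hide that filter.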
I'm sure others will have some opinions!
Say you have a ServiceCall database table that records all the service calls made to you. Each of these records has a many-to-one relationship to a Customer record, which stores which customer made the service call.
Now suppose the Customer has stopped doing business with you and you no longer need the Customer's record in your database. You no longer need the Customer's name to appear in the dropdown list when you create a new ServiceCall record.
What do you do?
Do you allow the user to delete the Customer's record from the database?
Do you set a special column IsDeleted to true for that Customer's record, then make sure no dropdown list loads records that have IsDeleted set to true? Although this keeps the old records from breaking at inner joins, it also prevents the user from adding a new record with the same name as the old Customer, doesn't it?
Do you disallow deletion at all? Just allow to 'disable' it?
Any other strategies you have used? I am guessing everyone has their own way; I just need to see your opinions.
Of course the above is quite simplified, usually a ServiceCall record will link to many other entity tables. All of which will face the same problem when they are required to be deleted.
I prefer to set an IsDeleted flag; one of the benefits is that you can still report on historical information (all the data is still there).
As to the issue of not being able to insert another customer with the same name, this isn't a problem if you use an ID column (eg CustomerId) which is generally auto populated.
I agree with @Tetraneutron's answer.
Additionally, you can create a VIEW that lists only the active customers, to make it more convenient to populate drop-down lists and such.
Let's say I have the following 2 tables in a database:
[Movies] (Scheme: Automatic)
----------------------------
MovieID
Name
[Comments] (Scheme: Manual)
----------------------------
CommentID
MovieID
Text
The "Movies" table gets updated by a service every 10 minutes and the "Comments" table gets updated manually by the users of the database.
Normally you'd just create a simple foreign-key relationship between the two tables with cascading updates and deletes, but in this case I want to be able to keep the manually entered data even if the movie it refers to gets deleted (the update service isn't that reliable). This should only be a problem in one-to-many relationships from an automatic table to a manual table. How would you separate the manual and the automatically populated parts of the database?
I was planning to add a foreign-key that isn't maintaining referential integrity and only cascades updates, not deletions. But are there any pitfalls I should be aware of by doing it this way? I mean, except the fact that I might end up with some manual data that doesn't actually reference anything.
Edit / Clarification:
Just to clarify. The example tables are totally made up. In reality the DB will contain objects like servers, applications, application notes, versions numbers etc. Server related information will be populated automatically but some application details will be filled in manually. It could be information like special configurations and such. Even if the server record gets deleted the application notes on that server are still valuable and shouldn't be deleted.
I'd suggest you use an import table that gets updated by the service and then populate the movies table from that. Then you get to keep movies that are deleted in the movies table, possibly tagging them as deleted or obsolete, but you'd still be able to keep them for historical purposes.
I think you should use a soft delete for that scenario. I don't think you want to have comments you don't know which movie they belong to.
Agree; an example route would be to copy the movies table and add a status field which indicates each record's present state (live/checking/deleted). Then the autoimport should go into a temporary table, set the status of all movies to 'checking', then use the temporary table to update the real movies table, setting the movie status to live when it's found in the temporary table. Once complete, set any movie which still has a status of 'checking' to deleted, since they weren't found in the autoimport. At the application end, select any movie which doesn't have status = deleted.
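The status-based import described above could be sketched like this with sqlite3. The table layout, the `sync` helper and the sample rows are made up for illustration; the steps follow the description: load the import into a temporary table, mark everything 'checking', set movies found in the import to 'live', then mark whatever is still 'checking' as 'deleted'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (
        movie_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        status   TEXT NOT NULL DEFAULT 'live'   -- live / checking / deleted
    );
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
        movie_id   INTEGER NOT NULL REFERENCES movies(movie_id),
        text       TEXT NOT NULL
    );
    CREATE TEMP TABLE movie_import (movie_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    INSERT INTO movies (movie_id, name) VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO comments (movie_id, text) VALUES (2, 'Worth keeping');
""")


def sync(imported):
    """Soft-delete movies missing from the import instead of removing them."""
    with conn:
        conn.execute("DELETE FROM movie_import")
        conn.executemany("INSERT INTO movie_import VALUES (?, ?)", imported)
        # 1. Everything is unconfirmed until seen in the import.
        conn.execute("UPDATE movies SET status = 'checking'")
        # 2. Movies present in the import are refreshed and set to live.
        conn.execute("""
            UPDATE movies
            SET status = 'live',
                name = (SELECT i.name FROM movie_import i
                        WHERE i.movie_id = movies.movie_id)
            WHERE movie_id IN (SELECT movie_id FROM movie_import)
        """)
        # 3. Movies new in the import are added as live.
        conn.execute("""
            INSERT INTO movies (movie_id, name, status)
            SELECT movie_id, name, 'live' FROM movie_import
            WHERE movie_id NOT IN (SELECT movie_id FROM movies)
        """)
        # 4. Anything still 'checking' was not in the import.
        conn.execute("UPDATE movies SET status = 'deleted' WHERE status = 'checking'")


sync([(1, "Movie A"), (3, "Movie C")])

statuses = dict(conn.execute("SELECT movie_id, status FROM movies"))
visible = [r[0] for r in conn.execute(
    "SELECT name FROM movies WHERE status != 'deleted' ORDER BY movie_id")]
kept_comments = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Because rows are never physically removed from `movies`, the comment on the missing movie keeps a valid parent, and a movie that reappears in a later import simply becomes 'live' again.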
"I was planning to add a foreign-key that isn't maintaining referential integrity and only cascades updates, not deletions."
Since you appear to be using surrogate keys, updates will not be relevant to foreign elements. Additionally, since you do not care about orphaning data, then why use the referential constraint at all? You use constraints to ensure that something exists, which you do not appear to require in this situation.
We have an entity split across 5 different tables. Records in 3 of those tables are mandatory. Records in the other two tables are optional (based on sub-type of entity).
One of the tables is designated the entity master. Records in the other four tables are keyed by the unique id from master.
An AFTER UPDATE/DELETE trigger is present on each table, and a change to a record saves off history (from the deleted table inside the trigger) into a related history table. Each history table contains the related entity fields plus a timestamp.
So, live records are always in the live tables and history/changes are in history tables. Historical records can be ordered based on the timestamp column. Obviously, timestamp columns are not related across history tables.
Now, for the more difficult part.
1. Records are initially inserted in a single transaction. Either 3 or 5 records will be written in a single transaction.
2. Individual updates can happen to any or all of the 5 tables.
3. All records are updated as part of a single transaction. Again, either 3 or 5 records will be updated in a single transaction.
4. Number 2 can be repeated multiple times.
5. Number 3 can be repeated multiple times.
The application is supposed to display a list of point in time history entries based on records written as single transactions only (points 1, 3 and 5 only).
I'm currently having problems with an algorithm that will retrieve historical records based on timestamp data alone.
Adding a HISTORYMASTER table to hold the extra information about transactions seems to partially address the problem. A new record is added into HISTORYMASTER before every transaction. New HISTORYMASTER.ID is saved into each entity table during a transaction.
Point in time history can be retrieved by selecting the first record for a particular HISTORYMASTER.ID (ordered by timestamp)
Is there any more optimal way to manage audit tables based on AFTER (UPDATE, DELETE) TRIGGERs for entities spanning multiple tables?
Your HistoryMaster seems similar to how we have addressed history of multiple related items in one of our systems. By having a single point to hang all the related changes from in the history table, it is easy to then create a view that uses the history master as the hub and attaches the related information. It also allows you to not create records in the history where an audit is not desired.
In our case the primary tables were called EntityAudit (where entity was the "primary" item being retained) and all data was stored in EntityHistory tables related back to the Audit. In our case we were using a data layer for business rules, so it was easy to insert the audit rules into the data layer itself. I feel that the data layer is an optimal point for such tracking if and only if all modifications use that data layer. If you have multiple applications using distinct data layers (or none at all) then I suspect that a trigger that creates the master record is pretty much the only way to go.
If you don't have additional information to track in the Audit (we track the user who made the change, for example, something not on the main tables) then I would contemplate putting the extra Audit ID on the "primary" record itself. Your description does not seem to indicate you are interested in the minor changes to individual tables, but only changes that update the entire entity set (although I may be misreading that). I would only do so if you don't care about the minor edits though. In our case, we needed to track all changes, even to the related records.
Note that the use of an Audit/Master table has an advantage in that you are making minimal changes to the History tables as compared to the source tables: a single AuditID (in our case, a Guid, although autonumbers would be fine in non distributed databases).
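A rough sketch of the audit hub, written as a data-layer function rather than a trigger, using sqlite3. The table names loosely follow the EntityAudit/EntityHistory naming above but are otherwise hypothetical, and an autoincrement audit id stands in for the Guid. Each logical update creates one audit row, and every history row written in that update carries its `audit_id`, so a view can use the audit table as the hub.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE entity_address (
        entity_id INTEGER PRIMARY KEY REFERENCES entity(id),
        city      TEXT NOT NULL
    );
    -- Hub: one row per logical change, with data not held on the main tables.
    CREATE TABLE entity_audit (
        audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id  INTEGER NOT NULL REFERENCES entity(id),
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- History tables differ from their sources only by the audit_id column.
    CREATE TABLE entity_history (
        audit_id INTEGER NOT NULL REFERENCES entity_audit(audit_id),
        name     TEXT NOT NULL
    );
    CREATE TABLE entity_address_history (
        audit_id INTEGER NOT NULL REFERENCES entity_audit(audit_id),
        city     TEXT NOT NULL
    );
    CREATE VIEW entity_audit_view AS
    SELECT a.audit_id, a.entity_id, a.changed_by, h.name, ah.city
    FROM entity_audit a
    LEFT JOIN entity_history h ON h.audit_id = a.audit_id
    LEFT JOIN entity_address_history ah ON ah.audit_id = a.audit_id;

    INSERT INTO entity VALUES (1, 'Old Name');
    INSERT INTO entity_address VALUES (1, 'Bangalore');
""")


def update_entity(entity_id, changed_by, name=None, city=None):
    """Save the prior values under one audit id, then apply the update."""
    with conn:  # one logical transaction for the whole entity
        audit_id = conn.execute(
            "INSERT INTO entity_audit (entity_id, changed_by) VALUES (?, ?)",
            (entity_id, changed_by),
        ).lastrowid
        if name is not None:
            conn.execute(
                "INSERT INTO entity_history (audit_id, name) "
                "SELECT ?, name FROM entity WHERE id = ?",
                (audit_id, entity_id),
            )
            conn.execute("UPDATE entity SET name = ? WHERE id = ?", (name, entity_id))
        if city is not None:
            conn.execute(
                "INSERT INTO entity_address_history (audit_id, city) "
                "SELECT ?, city FROM entity_address WHERE entity_id = ?",
                (audit_id, entity_id),
            )
            conn.execute(
                "UPDATE entity_address SET city = ? WHERE entity_id = ?",
                (city, entity_id),
            )
    return audit_id


update_entity(1, "alice", name="New Name", city="Delhi")
update_entity(1, "bob", city="Mumbai")
rows = conn.execute(
    "SELECT audit_id, changed_by, name, city FROM entity_audit_view ORDER BY audit_id"
).fetchall()
```

In the view, a NULL column means that table was not touched in that change, which makes it easy to tell whole-entity updates from edits to a single table without comparing timestamps.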
Can you add a TimeStamp / RowVersion datatype column to the entity master table, and associate all the audit records with that?
But an Update to any of the "child" tables will need to update the Master entity table to force the TimeStamp / RowVersion to change :(
Or stick a GUID in there that you freshen whenever one of the associated records changes.
Thinking that through, out loud, it may be better to have a table joined 1:1 to Master Entity that only contains the Master Entity ID and the "version number" of the record: either TimeStamp / RowVersion, GUID, incremented number, or something else.
I think it's a symptom of trying to capture "abstract" audit events at the lowest level of your application stack - the database.
If it's possible consider trapping the audit events in your business layer. This would allow you to capture the history per logical transaction rather than on a row-by-row basis. The date/time is unreliable for resolving things like this as it can be different for different rows, and the same for concurrent (or closely spaced) transactions.
I understand that you've asked how to do this in DB triggers though. I don't know about SQL Server, but in Oracle you can overcome this by calling the LOCAL_TRANSACTION_ID function in the DBMS_TRANSACTION package, which returns the ID of the current transaction. If you can retrieve an equivalent SQL Server value, then you can use this to tie the record updates for the current transaction together into a logical package.