How to separate automatically populated tables from manually populated tables, properly, in SQL Server? - sql-server

Let's say I have the following two tables in a database:
[Movies] (Scheme: Automatic)
----------------------------
MovieID
Name

[Comments] (Scheme: Manual)
----------------------------
CommentID
MovieID
Text
The "Movies" table gets updated by a service every 10 minutes and the "Comments" table gets updated manually by the users of the database.
Normally you'd just create a simple foreign-key relationship between the two tables with cascading updates and deletes, but in this case I want to be able to keep the manually entered data even if the movie it refers to gets deleted (the update service isn't that reliable). This should only be a problem in one-to-many relationships from an automatic table to a manual table. How would you separate the manual and the automatically populated parts of the database?
I was planning to add a foreign key that doesn't maintain referential integrity and only cascades updates, not deletions. But are there any pitfalls I should be aware of with this approach? I mean, apart from the fact that I might end up with some manual data that doesn't actually reference anything.
Edit / Clarification:
Just to clarify: the example tables are totally made up. In reality the DB will contain objects like servers, applications, application notes, version numbers, etc. Server-related information will be populated automatically, but some application details will be filled in manually. It could be information like special configurations and such. Even if the server record gets deleted, the application notes on that server are still valuable and shouldn't be deleted.

I'd suggest you use an import table that gets updated by the service and then populate the Movies table from that. That way you get to keep movies that the service deletes, possibly tagging them as deleted or obsolete, so they remain available for historical purposes.
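For illustration, a minimal sketch of that idea, assuming a staging table MoviesImport that the service reloads every 10 minutes and an IsObsolete flag on Movies (both names are made up here):

-- Reconcile the import into the real table without ever deleting rows.
MERGE dbo.Movies AS m
USING dbo.MoviesImport AS i
    ON m.MovieID = i.MovieID
WHEN MATCHED THEN
    UPDATE SET Name = i.Name, IsObsolete = 0
WHEN NOT MATCHED BY TARGET THEN
    INSERT (MovieID, Name, IsObsolete) VALUES (i.MovieID, i.Name, 0)
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET IsObsolete = 1;  -- the movie vanished from the feed: keep it, just flag it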

I think you should use a soft delete for that scenario. I don't think you want comments whose movie you can no longer identify.

Agreed; one example route would be to copy the movies table and add a status field which indicates each record's present state (live/checking/deleted). The autoimport should go into a temporary table; set the status of all movies to 'checking', then use the temporary table to update the real movies table, setting a movie's status to 'live' when it's found in the temporary table. Once complete, set any movie which still has a status of 'checking' to 'deleted', since it wasn't found in the autoimport. At the application end, select any movie whose status isn't 'deleted', as sketched below.
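A rough T-SQL version of that flow, assuming a Status column on Movies and a temp table #MoviesImport holding the autoimport (both names are assumptions, not from the answer):

UPDATE dbo.Movies SET Status = 'checking';

UPDATE m                                        -- found in the import: mark live
SET    Name = i.Name, Status = 'live'
FROM   dbo.Movies AS m
JOIN   #MoviesImport AS i ON i.MovieID = m.MovieID;

INSERT INTO dbo.Movies (MovieID, Name, Status)  -- brand-new movies from the import
SELECT i.MovieID, i.Name, 'live'
FROM   #MoviesImport AS i
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Movies AS m WHERE m.MovieID = i.MovieID);

UPDATE dbo.Movies SET Status = 'deleted' WHERE Status = 'checking';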

"I was planning to add a foreign-key
that isn't maintaining referencial
integrity and only cascades updates,
not deletions."
Since you appear to be using surrogate keys, the key values should never change, so cascading updates buys you nothing. Additionally, since you do not care about orphaning data, why use the referential constraint at all? You use constraints to ensure that something exists, which you do not appear to require in this situation.

Related

Where should I break up my user records to keep track of revisions

I am putting together a staff database and I need to be able to revise the staff member information, but also keep track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user data but be able to query against the most recent revision? I am looking at information that changes rarely, like Last Name, but for which I will need to be able to query out-of-date values. So if Jenny Smith changes her name to Jenny James, I need to be able to find the user's current information when I search against her old name.
I assume that I will need at least 2 tables, one that contains the uid and another that contains the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? I am looking at about 40 fields per record and only one or two fields will probably change per update. Also I cannot remove any data from the database, I need to be able to look back on all previous records.
A simple way of doing this is to add a deleted flag, and instead of updating records you set the deleted flag on the existing record and insert a new record.
You can of course also write the existing record to an archive table if you prefer. But if changes are infrequent and the table is not big, I would not bother.
To get the active record, query with WHERE deleted = 0; the speed impact will be minimal when there is an index on this field.
Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.
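A minimal sketch of that pattern, assuming hypothetical columns Staff(record_id identity PK, staff_uid, last_name, revision, deleted, updated_at, updated_by):

DECLARE @uid int = 42,
        @new_last_name nvarchar(100) = N'James',
        @editor sysname = SUSER_SNAME();

BEGIN TRAN;

    UPDATE dbo.Staff                    -- retire the current revision
    SET    deleted = 1
    WHERE  staff_uid = @uid AND deleted = 0;

    INSERT INTO dbo.Staff (staff_uid, last_name, revision, deleted, updated_at, updated_by)
    SELECT staff_uid, @new_last_name, MAX(revision) + 1, 0, SYSDATETIME(), @editor
    FROM   dbo.Staff
    WHERE  staff_uid = @uid
    GROUP BY staff_uid;

COMMIT;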
I would use the separate table, because then you can have a unique identifier that points to all the other child records and that is also the PK of the table, which I think makes data integrity issues less likely. For instance, you have Mary Jones, who has records in the address table, the email table, the performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.
With a deleted field in one table, you then have to have a non-autogenerated person ID and an autogenerated record ID.
You also have the possibility of people forgetting the WHERE deleted = 0 clause that is needed for almost every query. (If you do use the deleted flag field, do yourself a favor and create a view with WHERE deleted = 0, like the sketch below, and require developers to query the view, not the original table.)
With the deleted flag field you will also need a trigger to ensure that one and only one record per person is marked as active.
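For illustration, the view idea mentioned above might look like this (table and column names assumed):

CREATE VIEW dbo.ActiveStaff
AS
SELECT record_id, staff_uid, last_name, revision  -- list the columns callers actually need
FROM   dbo.Staff
WHERE  deleted = 0;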
@Peter Tillemans' suggestion is a common way to accomplish what you're asking for. But I don't like it.
The structure of a database should reflect the real-world facts that are being modeled.
I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.
Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!
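A rough sketch of that approach, with hypothetical columns limited to what needs to stay searchable:

CREATE TABLE dbo.obsolete_employee
(
    employee_id int           NOT NULL,
    first_name  nvarchar(100) NOT NULL,
    last_name   nvarchar(100) NOT NULL,
    retired_at  datetime2     NOT NULL DEFAULT SYSDATETIME()
);

-- Before updating or deleting the live row, copy the old values across:
DECLARE @id int = 123;  -- the employee being changed
INSERT INTO dbo.obsolete_employee (employee_id, first_name, last_name)
SELECT employee_id, first_name, last_name
FROM   dbo.employee
WHERE  employee_id = @id;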

inserting into a view in SQL Server

I have SQL Server as the backend and use MS Access as the frontend.
I have two tables (persons and managers); a manager is derived from a person (a 1:1 relation), so I created a view managersFull, which is basically:
SELECT *
FROM managers m
INNER JOIN persons p
    ON m.id = p.id
id in persons is auto-incrementing and the primary key; id in managers is the primary key and a foreign key referencing persons.id.
Now I want to be able to insert a new dataset with a form in MS Access, but I can't get it to work: no error message, no status line, nothing. The new rows aren't inserted, and I have to press Escape to cancel my changes to get back to design view in MS Access.
I'm talking about a managers form, and I want to be able to enter manager AND person information at the same time in a single form.
My question is: is what I want to do here even possible? If not, is there a "simple" workaround using after insert triggers or some lines of VBA code?
Thanks in advance.
The problem is that your view joins several tables. When a view accesses multiple tables, you can update or insert into only one of them per statement.
Please also check MSDN for more detailed information on the restrictions and on proper strategies for view updates.
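One common workaround, not mentioned in this answer and sketched here with assumed person/manager columns, is an INSTEAD OF INSERT trigger on the view, which splits the insert across the two base tables:

CREATE TRIGGER dbo.trg_managersFull_insert
ON dbo.managersFull
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Access submits one row at a time, so a single-row assumption is tolerable here.
    INSERT INTO dbo.persons (name)    -- 'name' is an assumed column
    SELECT name FROM inserted;

    INSERT INTO dbo.managers (id)     -- reuse the id persons just generated
    SELECT SCOPE_IDENTITY();
END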
Assuming ODBC, some things to consider:
- Make sure you have a timestamp field in the person table, and that it is returned in your managers view. You also probably need the real PK of the person table in the manager view (I'm assuming your view takes the FK used for the self-join and aliases it as the ID field -- I wouldn't do that myself, as it is confusing. Instead, I'd use the real foreign key name in the managers view, and let the PK stand on its own with its real name).
- Try the Jet/ACE-specific DISTINCTROW predicate in your recordsource. With Jet/ACE back ends, this often makes it possible to insert into both tables when it's otherwise impossible. I don't know for certain if Jet will be smart enough to tell SQL Server to do the right thing, though.
- If neither of those things works, change your form to use a recordsource based on your person table, and use a combo box based on the managers view as the control with which you edit the record to relate the person to a manager.
Ilya Kochetov pointed out that you can only update one table, but the work-around would be to apply the updates to the fields on one table and then the other. This solution assumes that the only access you have to these two tables is through this view and that you are not allowed to create a stored procedure to take care of this.
To model and maintain two related tables in Access, you don't use a query or view that joins both tables. What you do is use a main form and drop in a sub-form that is based on the child table. If the link master and link child settings in the sub-form are set correctly, then you don't need to write any code and Access will insert the person's ID into the link field.
So, don't use a joined table here. Simply use a form + sub-form setup and you'll be able to edit and maintain the data and the data in the related child table.
This means you base the form on the table, not a view, and you base the sub-form on the child table. So, don't use a view here.

Entity Deletion Strategy

Say you have a ServiceCall database table that records all the service calls made to you. Each of these records contains a many-to-one relationship to a Customer record, storing which customer made the service call.
OK, suppose the Customer has stopped doing business with you and you do not need the Customer's record in your database. You no longer need the Customer's name to appear in the dropdown list when you create a new ServiceCall record.
What do you do?
Do you allow the user to delete the Customer's record from the database?
Do you set a special column IsDeleted to true for that Customer's record, then make sure all dropdown lists will not load records that have IsDeleted set to true? Although this keeps the old records from breaking at inner joins, it also prevents users from adding a new record with the same name as the old Customer, doesn't it?
Do you disallow deletion at all? Just allow to 'disable' it?
Any other strategies you have used? I am guessing everyone has their own way; I just need to see your opinions.
Of course the above is quite simplified; usually a ServiceCall record will link to many other entity tables, all of which face the same problem when they are required to be deleted.
I prefer to set an IsDeleted flag; one of the benefits is you can still report on historical information (all the data is still there).
As to the issue of not being able to insert another customer with the same name, this isn't a problem if you use an ID column (e.g. CustomerId), which is generally auto-populated.
I agree with @Tetraneutron's answer.
Additionally, you can create a VIEW that lists only the active customers, to make it more convenient to populate drop-down lists and such.
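For illustration, assuming Customers(CustomerId, Name, IsDeleted): the view might look like the sketch below, and a filtered unique index (SQL Server 2008 and later) would additionally enforce name uniqueness among active customers only, which addresses the duplicate-name worry from the question:

CREATE VIEW dbo.ActiveCustomers
AS
SELECT CustomerId, Name
FROM   dbo.Customers
WHERE  IsDeleted = 0;

CREATE UNIQUE NONCLUSTERED INDEX UX_Customers_Name_Active
    ON dbo.Customers (Name)
    WHERE IsDeleted = 0;   -- uniqueness applies to active rows only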

trigger insertions into same table

I have many tables in my database which are interrelated. I have a table (table one) into which data has been inserted and whose ID auto-increments. Once that row has an ID, I want to insert it into a table (table three) together with another set of IDs which come from a form (this data will also be going into a table, so it could come from that table), the same form the data which went into the first table came from.
The two IDs together make the primary key of the third table.
How can I do this? It's to show that more than one ID is joined to a single ID for something else.
Thanks.
You can't do that through a trigger, as the trigger only has available to it the data that you already inserted, not data that is currently residing only in your user interface.
Normally how you handle this situation is that you write a stored proc that inserts the parent record and returns the ID value (using SCOPE_IDENTITY() in SQL Server, but I'm sure other databases have a method to return the auto-generated ID as well). Then you use that value to insert into the other table along with the other values you need for that table. You would of course want to wrap the whole thing in a transaction.
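A minimal sketch of that proc, with hypothetical table and column names:

CREATE PROCEDURE dbo.InsertParentAndLink
    @Name   nvarchar(100),
    @FormID int
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

        INSERT INTO dbo.TableOne (Name)
        VALUES (@Name);

        DECLARE @NewID int = SCOPE_IDENTITY();          -- the auto-incremented ID

        INSERT INTO dbo.TableThree (TableOneID, FormID) -- the composite-key table
        VALUES (@NewID, @FormID);

    COMMIT;
END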
I think you can probably do what you're describing (just write the INSERTs to table 3 in the table 1 trigger), but you'd have to put the additional info for the table 3 rows into your table 1 row, which isn't very smart.
I can't see why you would do that instead of writing the INSERTs in your code, where someone reading it can see what's happening.
The trouble with triggers is that they make it easy to hide business logic in the database. I think (and I believe I'm in the majority here) that it's easier to understand, manage, maintain and generally all-round deal with an application where all the business rules exist in the same general area.
There are reasons to use triggers (for propagating denormalised values, for example) just as there are reasons for using stored procedures. I'm going to assert that they are largely related to performance-critical areas. Or should be.

Maintaining audit log for entities split across multiple tables

We have an entity split across 5 different tables. Records in 3 of those tables are mandatory. Records in the other two tables are optional (based on sub-type of entity).
One of the tables is designated the entity master. Records in the other four tables are keyed by the unique id from master.
An AFTER UPDATE/DELETE trigger is present on each table, and a change to a record saves off history (from the deleted virtual table inside the trigger) into a related history table. Each history table contains the related entity fields + a timestamp.
So, live records are always in the live tables and history/changes are in history tables. Historical records can be ordered based on the timestamp column. Obviously, timestamp columns are not related across history tables.
Now, for the more difficult part.
1. Records are initially inserted in a single transaction. Either 3 or 5 records will be written in a single transaction.
2. Individual updates can happen to any or all of the 5 tables.
3. All records are updated as part of a single transaction. Again, either 3 or 5 records will be updated in a single transaction.
4. Number 2 can be repeated multiple times.
5. Number 3 can be repeated multiple times.
The application is supposed to display a list of point-in-time history entries based on records written as single transactions only (points 1, 3 and 5 only).
I'm currently having problems with an algorithm that will retrieve historical records based on timestamp data alone.
Adding a HISTORYMASTER table to hold the extra information about transactions seems to partially address the problem. A new record is added into HISTORYMASTER before every transaction. New HISTORYMASTER.ID is saved into each entity table during a transaction.
Point-in-time history can be retrieved by selecting the first record for a particular HISTORYMASTER.ID (ordered by timestamp).
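A sketch of that structure, with assumed column names:

CREATE TABLE dbo.HistoryMaster
(
    ID        int IDENTITY(1,1) PRIMARY KEY,
    ChangedAt datetime2 NOT NULL DEFAULT SYSDATETIME(),
    ChangedBy sysname   NOT NULL DEFAULT SUSER_SNAME()
);

-- Each transaction inserts one HistoryMaster row first and stamps the new ID on
-- every entity row it touches; the audit triggers then copy that ID into history.
-- The point-in-time list becomes one entry per HistoryMaster.ID:
SELECT HistoryMasterID, MIN(ChangeTimestamp) AS AsOf
FROM   dbo.EntityHistory    -- hypothetical history table
GROUP BY HistoryMasterID;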
Is there any more optimal way to manage audit tables based on AFTER (UPDATE, DELETE) TRIGGERs for entities spanning multiple tables?
Your HistoryMaster seems similar to how we have addressed history of multiple related items in one of our systems. By having a single point to hang all the related changes from in the history table, it is easy to then create a view that uses the history master as the hub and attaches the related information. It also allows you to not create records in the history where an audit is not desired.
In our case the primary tables were called EntityAudit (where Entity was the "primary" item being retained) and all data was stored in EntityHistory tables related back to the audit. In our case we were using a data layer for business rules, so it was easy to insert the audit rules into the data layer itself. I feel that the data layer is an optimal point for such tracking if and only if all modifications use that data layer. If you have multiple applications using distinct data layers (or none at all), then I suspect that a trigger that creates the master record is pretty much the only way to go.
If you don't have additional information to track in the audit (we track the user who made the change, for example, something not on the main tables), then I would contemplate putting the extra audit ID on the "primary" record itself. Your description does not seem to indicate you are interested in the minor changes to individual tables, but only in changes that update the entire entity set (although I may be misreading that). I would only do so if you don't care about the minor edits, though. In our case, we needed to track all changes, even to the related records.
Note that the use of an Audit/Master table has an advantage in that you are making minimal changes to the history tables as compared to the source tables: a single AuditID (in our case a GUID, although autonumbers would be fine in non-distributed databases).
Can you add a TimeStamp / RowVersion datatype column to the entity master table, and associate all the audit records with that?
But an Update to any of the "child" tables will need to update the Master entity table to force the TimeStamp / RowVersion to change :(
Or stick a GUID in there that you freshen whenever one of the associated records changes.
Thinking that through out loud, it may be better to have a table joined 1:1 to the master entity that only contains the master entity ID and the "version number" of the record: either TimeStamp / RowVersion, a GUID, an incremented number, or something else.
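For illustration, that 1:1 table might look like this (names are made up); a rowversion column changes automatically whenever its row is updated, so a child-table trigger only needs to touch the row:

CREATE TABLE dbo.EntityVersion
(
    EntityID int NOT NULL PRIMARY KEY
        REFERENCES dbo.EntityMaster (EntityID),  -- hypothetical master table
    RowVer   rowversion
);

-- Inside each child table's trigger, a no-op update is enough to bump RowVer
-- (@EntityID would come from the trigger's inserted/deleted rows):
UPDATE dbo.EntityVersion
SET    EntityID = EntityID
WHERE  EntityID = @EntityID;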
I think it's a symptom of trying to capture "abstract" audit events at the lowest level of your application stack - the database.
If it's possible consider trapping the audit events in your business layer. This would allow you to capture the history per logical transaction rather than on a row-by-row basis. The date/time is unreliable for resolving things like this as it can be different for different rows, and the same for concurrent (or closely spaced) transactions.
I understand that you've asked how to do this in DB triggers, though. I don't know about SQL Server, but in Oracle you can overcome this by using the DBMS_TRANSACTION.LOCAL_TRANSACTION_ID system package to return the ID for the current transaction. If you can retrieve an equivalent SQL Server value, then you can use this to tie the record updates for the current transaction together into a logical package.
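For what it's worth, SQL Server does expose a comparable value. A hedged sketch: sys.dm_tran_current_transaction requires VIEW SERVER STATE permission, and CURRENT_TRANSACTION_ID() is available from SQL Server 2016 onwards:

-- Inside the trigger, read the ID of the transaction that fired it:
DECLARE @txid bigint;
SELECT @txid = transaction_id
FROM   sys.dm_tran_current_transaction;

-- On SQL Server 2016 and later this is equivalent:
-- SET @txid = CURRENT_TRANSACTION_ID();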
