Can Instead Of triggers co-exist with regular triggers? If so, are there any potential issues we should be aware of?
INSTEAD OF triggers can coexist with normal triggers. I've done this a good bit.
INSTEAD OF triggers have numerous potential issues, mainly because they replace the normal insert/update/delete behavior with whatever you define. A developer may think nothing of UPDATE User SET Address = 'foo' WHERE UserID = 4, but if your trigger is using that as a hook to touch a dozen authentication tables and maybe talk to a server around the world, you've bought yourself a lot of potential confusion.
Keep the behavior of these triggers in line with the expected behavior of INSERT/UPDATE/DELETE statements. Don't do too much.
INSTEAD OF triggers are a very powerful tool, easily misused. Use them appropriately and thoughtfully.
I haven't found anything to be concerned about with respect to using both INSTEAD OF and AFTER (AKA FOR) triggers at the same time. The main issues with INSTEAD OF triggers are:
You can only have one INSTEAD OF trigger per operation, per table;
They can mess with OUTPUT INTO clauses (i.e. you'll get identity values of 0);
If you make any schema changes to the table, things may mysteriously break at some point in the future if you weren't careful to maintain the trigger.
None of these caveats are related to AFTER triggers, so you don't really have anything to worry about in that regard. Although I will say that it's more common to write INSTEAD OF triggers on views as opposed to tables, because there's less chance of them interfering with table operations. They were primarily designed as a tool to help you create insertable/updatable views.
Anyway, you'll be fine if you're careful, but I would still recommend against using an INSTEAD OF trigger unless you actually need to, because they make ordinarily simple operations harder to reason about.
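As a rough illustration of the two trigger kinds working together, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. SQLite only allows INSTEAD OF triggers on views, which happens to match the common use mentioned above. The table, view, and trigger names are made up for the example, and the T-SQL syntax would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE audit_log (user_id INTEGER, action TEXT);

-- Hypothetical view that exposes the table to callers.
CREATE VIEW v_users AS SELECT user_id, address FROM users;

-- INSTEAD OF trigger: replaces the insert on the view with an insert
-- into the base table, and does nothing more surprising than that.
CREATE TRIGGER trg_v_users_insert INSTEAD OF INSERT ON v_users
BEGIN
    INSERT INTO users (user_id, address) VALUES (NEW.user_id, NEW.address);
END;

-- AFTER trigger on the base table: fires for the insert issued above.
CREATE TRIGGER trg_users_audit AFTER INSERT ON users
BEGIN
    INSERT INTO audit_log (user_id, action) VALUES (NEW.user_id, 'insert');
END;
""")

# The caller writes to the view; both triggers run as part of this statement.
conn.execute("INSERT INTO v_users (user_id, address) VALUES (4, 'foo')")
users = conn.execute("SELECT user_id, address FROM users").fetchall()
audit = conn.execute("SELECT user_id, action FROM audit_log").fetchall()
```

The INSTEAD OF trigger here only forwards the write, which is the "don't do too much" advice in practice: the AFTER trigger still sees an ordinary insert on the base table.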
I have a production SQL-Server DB (reporting) that has many Stored Procedures.
The SPs are publicly exposed to the external world in different ways
- some users have access directly to the SP,
- some are exposed via a WebService
- while others are encapsulated as interfaces thru a DCOM layer.
The user base is large and we do not know exactly which user-set uses which method of accessing the DB.
We get frequent (about 1 every other month) requests from user-sets for modifying an existing SP by adding one column to the output or a group of columns to the existing output, all else remaining same.
We initially started doing this by modifying the existing SP and adding the newly requested columns to the end of the output. But this broke the custom tools built by some other user bases as their tool had the number of columns hardcoded, so adding a column meant they had to modify their tool as well.
Also for some columns complex logic is required to get that column into the report which meant the SP performance degraded, affecting all users - even those who did not need the new column.
We are thinking of various ways to fix this:
1 Default Parameters to control flow
Update the existing SP and add a flag parameter with a default value to control the code path. The new functionality runs only when the flag is set to true; by default it is set to false.
Advantage
New Object is not required.
Ongoing maintenance is not affected.
Testing overhead remains under control.
Disadvantage
Since an existing SP is modified, it will need testing of existing functionality as well as new functionality.
Since we have no inkling of how the client tools are calling the SPs, we can never be sure that we have not broken anything.
It will be difficult to handle if same report gets modified again with more requests – will mean more flags and code will become un-readable.
2 New Stored procedure
A new stored procedure will be created for any requirement that changes the signature (input/output) of the SP. The new SP will call the original stored procedure for the existing output and add the logic for the new requirement on top of it.
Advantage
The benefit is that there is no impact on the existing procedure, so no testing is required for the old logic.
Disadvantage
Need to create new objects in database whenever changes are requested. This will be overhead in database maintenance.
Will the execution plan change based on adding a new parameter? If yes then this could adversely affect users who did not request the new column.
Considering that an SP is a public interface to the DB and interfaces should be immutable, should we go for option 2?
What is the best practice, or does it depend on a case-by-case basis, and what should be the main driving factors when choosing an option?
Thanks in advance!
Quoting from a disadvantage for your first option:
It will be difficult to handle if same report gets modified again with more requests – will mean more flags and code will become un-readable.
Personally I feel this is the biggest reason not to modify an existing stored procedure to accommodate the new columns.
When bugs come up with a stored procedure that has several branches, it can become very difficult to debug. Also, as you hinted at, the execution plan can change with branching/IF statements (see the related question "SQL using different execution plans when running a query and when running that query inside a stored procedure?").
This is very similar to object-oriented coding, and your instinct is correct that it's best to extend existing objects instead of modifying them.
I would go for approach #2. You will have more objects, but at least when an issue comes up, you will be able to know the affected stored procedure has limited scope/impact.
Over time I've learned to grow objects/data structures horizontally, not vertically. In other words, just make something new, don't keep making existing things bigger and bigger and bigger.
Ok. #2. Definitely. No doubt.
#1 says: "change the existing procedure", causing things to break. No way that's a good thing! Your customers will hate you. Your code just gets more complex meaning it is harder and harder to avoid breaking things leading to more hatred. It will go horribly slowly, and be impossible to tune. And so on.
For #2 you have a stable interface. No hatred. Yay! Seriously, "yay" as in "I still have a job!" as opposed to "boo, I got fired for annoying the hell out of my customers". Seriously. Never ever do #1 for that reason alone. You know this is true. You know it!
Having said that, record what people are doing. Take a user-id as a parameter. Log it. Know your users. Find the ones using old crappy code and ask them nicely to upgrade if necessary.
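A minimal sketch of the "take a user-id and log it" idea, written in Python with sqlite3 as a stand-in for a stored procedure. The function name, the usage_log table, and the caller ids are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('north', 10), ('south', 20);
CREATE TABLE usage_log (caller_id TEXT, proc_name TEXT);
""")

def sales_report(conn, caller_id):
    """Hypothetical report entry point that records who called it."""
    conn.execute(
        "INSERT INTO usage_log (caller_id, proc_name) VALUES (?, ?)",
        (caller_id, "sales_report"),
    )
    return conn.execute(
        "SELECT region, amount FROM sales ORDER BY region"
    ).fetchall()

rows = sales_report(conn, "team-a")
sales_report(conn, "team-b")

# Later: see which callers still depend on this entry point.
callers = conn.execute(
    "SELECT caller_id, COUNT(*) FROM usage_log "
    "GROUP BY caller_id ORDER BY caller_id"
).fetchall()
```

With a log like this you at least know whom to contact before retiring or changing an old procedure.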
Your reason given to avoid number 2 is proliferation. But that is only a problem if you don't test stuff. If you do test stuff properly, then proliferation is happening anyway, in your tests. And you can always tune things in #2 if you have to, or at least isolate performance problems.
If the fatter procedure is really great, then retrofit the skinny version with a slimmer version of the fat one. In SQL this is tricky, but copy/paste and cut down your select column list works. Generally I just don't bother to do this. Life is too short. Having really good test code is a much better investment of time, and data schema tend to rarely change in ways that break existing queries.
Okay. Rant over. Serious message. Do #2, or at the very least do NOT do #1 or you will get yourself fired, or hated, or both. I can't think of a better reason than that.
Easier to go with #2. Nullable SP parameters can create some very difficult to locate bugs. Although, I do employ them from time to time.
Especially when you start getting into joins on nulls and ANSI settings. The way you write the query will change the results dramatically. KISS (keep it simple, stupid).
Also, if it's a parameterized search for reporting or displaying, I might consider a super-fast fetch of data into a LINQ-able object. Then you can search an in-memory list rather than re-fetching from the database.
#2 could be a better option than #1, particularly considering the third disadvantage of #1, since requirements keep changing most of the time. I feel this way because the disadvantages outweigh the advantages on either side.
I would also vote for #2. I've seen a few stored procedures which take #1 to the extreme: the SP has a parameter @Option and a few parameters @param1, @param2, .... The net effect is that this is a single stored procedure that tries to play the role of many stored procedures.
The main disadvantage to #2 is that there are more stored procedures. It may be more difficult to find the one you're looking for, but I think that is a small price to pay for the other advantages you get.
I want to make sure also, that you don't just copy and paste the original stored procedure and add some columns. I've also seen too many of those. If you are only adding a few columns, you can call the original stored procedure and join in the new columns. This will definitely incur a performance penalty if those columns were readily available before, but you won't have to change your original stored procedure (refactoring to allow for good performance and no duplication of the code), nor will you have to maintain two copies of the code (copy and paste for performance).
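To make the "call the original and join in the new columns" idea concrete, here is a small sketch in Python with sqlite3 instead of T-SQL. The two functions stand in for the original and the new stored procedure; the tables and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total INTEGER);
CREATE TABLE shipping (order_id INTEGER PRIMARY KEY, carrier TEXT);
INSERT INTO orders VALUES (1, 'alice', 50), (2, 'bob', 75);
INSERT INTO shipping VALUES (1, 'ups');
""")

def order_report(conn):
    """Original report: its column list never changes."""
    return conn.execute(
        "SELECT order_id, customer, total FROM orders ORDER BY order_id"
    ).fetchall()

def order_report_with_carrier(conn):
    """New report: reuses the original output and joins in one extra column."""
    carriers = dict(conn.execute("SELECT order_id, carrier FROM shipping"))
    # Append the carrier (or None when there is no match) to each original row.
    return [row + (carriers.get(row[0]),) for row in order_report(conn)]
```

Callers with a hard-coded column count keep using order_report unchanged, and only callers of the new function pay for the extra lookup.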
I am going to suggest a couple of other options based on the options you gave.
Alternative option #1: Add another variable, but instead of making it a default variable, base it on the customer name. That way Customer A can get his specialized report and Customer B can get his slightly different customized report. This adds a ton of work, as updates to the 'Main' portion would have to be copied to all the customer-specific ones.
You could do this with branching 'if' statements.
Alternative option #2: Add new stored procedures, just add the customer's name to the stored procedure. Maintenance wise, this might be a little more difficult but it will achieve the same end results, each customer gets his own report type.
Option #2 is the one to choose.
You yourself mentioned (dis)advantages.
While you consider adding new objects to db based on requirement changes, add only necessary objects that don't make your new SP bigger and difficult to maintain.
I am working on an application that someone else wrote, and it appears that they are using IDs throughout the application that are not defined in the database. For a simplified example, let's say there is a table called Question:
Question
------------
Id
Text
TypeId
SubTypeId
Currently the SubTypeId column is populated with a set of IDs that do not reference another table in the database. In the code these SubTypeIds are mapped to a specific string in a configuration file.
In the past when I have had these types of values I would create a lookup table and insert the appropriate values, but in this application there is a mapping between the IDs and their corresponding text values in a configuration file.
Is it bad practice to define a lookup table in a configuration file rather than in the database itself?
Is it bad practice to define a lookup table in a configuration file rather than in the database itself?
Absolutely, yes. It brings in a heavy dependence on the code to manage and maintain references, fetch necessary values, etc. In a situation where you now need to create additional functionality, you would rely on copy-pasting the mapping (or importing them, etc.) which is more likely to cause an issue.
It's similar to why DB constraints should be in the DB rather than in the program/application that's accessing it: any maintenance or new application needs to replicate all the behaviour and rules. Having things this way has similar side effects, which I've mentioned in another answer.
Good reasons to have a lookup table:
Since DBs naturally support these kinds of relations, the DB is the obvious place for them.
Without a lookup table, queries first need to be constructed in code to translate Type and SubType text to IDs, instead of having the text be part of the WHERE/HAVING clause of the query that is actually executed.
Speed/Performance - with the right indexes and table structures, you'd benefit from this (and reduce the code complexity that manages it).
You don't need to update your code to add a new Type or SubType, or to edit/delete them.
Possible reasons it was done that way, which I don't think are valid reasons:
The TypeID and SubTypeID are related and the original designer did not know how to create a complex foreign key. (Not a good reason though.)
Another could be 'translation' but that could also be handled using foreign key relations.
In some pieces of code, there may not be a strict TypeID-to-SubTypeID relation and that logic was handled in code rather than in the DB. Again, can be managed using 'flag' values or NULLs if possible. Those specific cases could be handled by designing the DB right and then working around a unique/odd situation in code instead of putting all the dependence on the code.
NoSQL: Original designer may be under the impression that such foreign keys or relations cannot be done in a NoSQL db.
And the obvious 'people' problem vs technical challenge: The original designer may not have had a proper understanding of databases and may have been a programmer who did that application (or was made to do it) without the right knowledge or assistance.
Just to put it out there: If the previous designer was an external contractor, he may have used the code maintenance complexity or 'support' clause as a means to get more business/money.
As a general rule of thumb, I'd say that keeping all the related data in a DB is a better practice since it removes a tacit dependency between the DB and your app, and because it makes the DB more "comprehensible." If the definitions of the SubTypeIDs are in a lookup table it becomes possible to create queries that return human-readable results, etc.
That said, the right answer probably depends a bit on the specifics of the application. If there's very tight coupling between the DB and app to begin with (eg, if the DB isn't going to be accessed by other clients) this is probably a minor concern particularly if the set of SubTypeIDs is small and seldom changes.
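For what it's worth, here is a minimal sketch of the lookup-table approach, using Python's sqlite3 module. The SubType table name and the sample rows are assumptions, and TypeId is left out to keep it short. It shows the two benefits mentioned in the answers: the database rejects an unknown SubTypeId, and a plain join returns human-readable results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE SubType (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Question (
    Id INTEGER PRIMARY KEY,
    Text TEXT NOT NULL,
    SubTypeId INTEGER NOT NULL REFERENCES SubType (Id)
);
INSERT INTO SubType VALUES (1, 'Multiple choice'), (2, 'Free text');
INSERT INTO Question VALUES (10, 'Favourite colour?', 1);
""")

# Human-readable output straight from the database, no config file needed.
readable = conn.execute("""
    SELECT q.Text, s.Name
    FROM Question AS q JOIN SubType AS s ON s.Id = q.SubTypeId
""").fetchall()

# An ID that is not in the lookup table is refused by the foreign key.
try:
    conn.execute("INSERT INTO Question VALUES (11, 'Orphan?', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```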
I would like to ask experienced users whether you prefer to use data-aware controls to add, insert, delete and edit data in a DB, or whether you prefer to do it manually.
I have developed some DB applications in which, for the sake of user-friendliness, I ran into a complicated web of table events (afterinsert, afteredit, after... and beforeedit, beforeinsert, before...). After that, debugging the application was quite nasty work.
Aware of this risk, in a later application I tried to avoid the problem by paying extra attention to writing clear, readable code. Everything seemed all right at the beginning, but as I needed to handle some preprocessing before sending and loading data, I ran into the same problems again, "slowly and inevitably". Sometimes I could not use data-aware controls anyway, and what seemed to be a "cool" feature of data-aware controls at the beginning turned into an obstacle in the end. I "had to" write special routines to make non-data-aware controls behave like data-aware ones. Then I asked myself, why on earth should I use data-aware controls? Is it better to base the application architecture on non-data-aware controls? It takes more time to write bug-proof code, of course, but is it worth it? I do not know...
It has happened to me several times, as if jinxed: paradise at the beginning, hell at the end...
I do not know whether I am using the wrong method to write DB programs, or whether there is some standard practice for how to proceed. Or is it a common problem for everybody?
Thanks for your advice and experiences.
I've written applications that used data aware components against TTable style components and applications which used non-data aware components.
My preference these days is to use data aware components but with TClientDataSets rather than TTable style components.
Using a TClientDataSet I don't have to make my user interface structure mimic my database structure. It's flexible enough to fill it with the data from several tables and then when you are applying the updates back to the database you can manually add/delete/update records as you see fit.
The secret should be in DataSet parameter automation: you can create a control that glues datasets together in a master-slave way, just by defining connections between them. Of course, such a control should be fed with form parameters in some other generalized way. In this case, when a form is called with an entity identifier, all datasets get filled in the proper order and the provider can update the data in the database automatically.
Generally it is better to have DataSets be an exact representation of tables, with optional calculated fields (fkInternalCalc sometimes works better, as it updates on row change rather than field change), bound to data-aware controls. Data-aware controls are usually the best approach and are less error-prone. As with everything, there are exceptions.
If you must write too many glue functions, the problem is probably in the design pattern, not in the VCL.
A lot of the time I use data aware controls linked to an in-memory table (kbmMemTable) that is filled from a query.
The benefits I see are:
I have full control over all inserts/updates/posts/edits to the database.
No need to worry about a user leaving a record in update mode (potentially locking other users)
Did I mention full control over all inserts/updates/posts/edits?
Using the in-memory table is as easy as:
dataset.sql.add('select a.field,b.field from a,b');
dataset.open;
inMemoryTable.loadfromdataset(dataset);
inMemoryTable.checkpoint;
And then, when "resolving" back to the database, you are given access to the original and new data for each field in each record (similar in a way to a trigger). You can easily wrap the whole edit in a transaction and resolve it back in milliseconds, even if it took the end user 30 mins to fill in the data aware controls.
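The load, edit, and resolve cycle can be sketched roughly as follows. This is plain Python with sqlite3, not kbmMemTable, and the dictionaries merely play the role of the in-memory table; the customer table and the resolve function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO customer VALUES (1, 'Ann', 'Oslo'), (2, 'Ben', 'Rome');
""")

# Load once; keep an untouched copy so old and new values can be compared.
original = {
    row[0]: {"name": row[1], "city": row[2]}
    for row in conn.execute("SELECT id, name, city FROM customer")
}
edited = {key: dict(values) for key, values in original.items()}

# The user edits the in-memory copy for as long as they like; no locks are held.
edited[2]["city"] = "Milan"

def resolve(conn, original, edited):
    """Write only the changed rows back inside one short transaction."""
    changed = [key for key in edited if edited[key] != original[key]]
    with conn:  # commits on success, rolls back on error
        for key in changed:
            conn.execute(
                "UPDATE customer SET name = ?, city = ? WHERE id = ?",
                (edited[key]["name"], edited[key]["city"], key),
            )
    return changed

changed_ids = resolve(conn, original, edited)
```

The database is only touched during the initial load and the final resolve, which is what keeps the transaction short no matter how long the editing took.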
Have you considered an O/R mapper for Delphi like tiOPF or hcOPF?
This will separate the business domain logic from the database layer. For big and legacy systems, it is even common to add another layer, the 'Anti Corruption Layer', which protects the model from changes in the database design.
For me, the classic wisdom is to store enum values (OrderStatus, UserTypes, etc) as Lookup tables in your db. This lets me enforce data integrity in the database, preventing false or null values, etc.
However, more and more, this feels like unnecessary duplication to me. Not only do I have to create tables for these values (or have an unwieldy central lookup table), but if I want to add a value, I have to remember to add it in two places (or more, counting production, testing, and live DBs), and things can get out of sync easily.
Still I have a hard time letting go of lookup tables.
I know there are probably certain scenarios where one has an advantage over the other, but what are your general thoughts?
I've done both, but I now much prefer defining them as classes in code.
New files cost nothing, and the benefits that you seek by having it in the database should be handled as business rules.
Also, I have an aversion to holding data in a database that really doesn't change. And it seems an enum fits this description. It doesn't make sense for me to have a States lookup table, but a States enum class makes sense to me.
If it has to be maintained I would leave them in a lookup table in the DB. Even if I think they won't need to be maintained I would still go towards a lookup table so that if I am wrong it's not a big deal.
EDIT:
I want to clarify that if the Enum is not part of the DB model then I leave it in code.
I put them in the database, but I really can't defend why I do that. It just "seems right". I guess I justify it by saying there's always a "right" version of what the enums can be by checking the database.
Schema dependencies should be stored in the database itself to ensure any changes to your architecture can be easily performed transparently to the app.
I prefer enums, as they enforce early binding of values in code, so that exceptions aren't caused by missing values.
It's also helpful if you can use code generation that can bring in the associations of the integer columns to an enumeration type, so that in business logic you only have to deal with easily memorable enumeration values.
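One possible way to get some of both worlds, sketched in Python with sqlite3: keep the enum in code, seed the lookup table from it, and check that the two agree. The OrderStatus values, the order_status table, and the helper function are all assumptions made for the example.

```python
import enum
import sqlite3

class OrderStatus(enum.IntEnum):
    """Hypothetical enum; the values double as the lookup table's keys."""
    PENDING = 1
    SHIPPED = 2
    CANCELLED = 3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_status (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Seed the lookup table from the enum so there is one source of truth.
conn.executemany(
    "INSERT INTO order_status (id, name) VALUES (?, ?)",
    [(status.value, status.name) for status in OrderStatus],
)

def lookup_matches_enum(conn):
    """Return True when the table and the enum hold the same id/name pairs."""
    in_db = set(conn.execute("SELECT id, name FROM order_status"))
    in_code = {(status.value, status.name) for status in OrderStatus}
    return in_db == in_code

in_sync = lookup_matches_enum(conn)

# Business logic deals with the enum member, not the raw integer column.
status_from_row = OrderStatus(
    conn.execute("SELECT id FROM order_status WHERE name = 'SHIPPED'").fetchone()[0]
)
```

A check like lookup_matches_enum could run in a test or at startup to catch the "out of sync" case the question worries about.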
Consider it a form of documentation.
If you've already documented the enum constants properly in the code that uses the DB, do you really need a duplicate set of documentation (to use and maintain)?
How do you implement a last-modified column in SQL?
I know for a date-created column you can just set the default value to getdate(). For last-modified I have always used triggers, but it seems like there must be a better way.
Thanks.
Triggers are the best way, because this logic is intimately associated with the table, and not the application. This is just about the only obvious proper use of a trigger that I can think of, apart from more granular referential integrity.
But I think "last-modified" dates are a flawed concept anyway. What makes the "last-changed" timestamp any more valuable than the ones that came before it? You generally need a more complete audit trail.
The only other way to perform this without using triggers is to disallow any inserts/updates/deletes on the table directly via permissions, and insist all these actions are performed via stored procedures that will take care of setting the modified date.
An administrator might still be able to modify data without using the stored procedures, but an administrator can also disable triggers.
If there are a lot of tables that require this sort of functionality, I would favour triggers as it simplifies the code. Simple, well written and well-indexed auditing triggers are generally not too evil - they only get bad when you try to put too much logic in the trigger.
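A simple stamping trigger of this kind might look like the sketch below. It uses SQLite through Python's sqlite3 module rather than SQL Server, so the trigger syntax differs from T-SQL, and the item table and trigger name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Already the SQLite default; stated explicitly so the UPDATE inside the
# trigger below cannot fire the trigger again.
conn.execute("PRAGMA recursive_triggers = OFF")
conn.executescript("""
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    name TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    last_modified TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Stamp the row after any update, however the update was issued.
CREATE TRIGGER trg_item_last_modified AFTER UPDATE ON item
BEGIN
    UPDATE item SET last_modified = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

-- Insert with a deliberately old stamp so the change is easy to see.
INSERT INTO item (id, name, last_modified)
VALUES (1, 'widget', '2000-01-01 00:00:00');
""")

# The caller does not mention last_modified at all.
conn.execute("UPDATE item SET name = 'gadget' WHERE id = 1")
name, last_modified = conn.execute(
    "SELECT name, last_modified FROM item WHERE id = 1"
).fetchone()
```

The trigger contains a single statement and no business logic, which is the "simple, well written" shape described above.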
You can use the keyword DEFAULT, assuming you have a default constraint.
On insert, there is no need to specify a value, although you could use the keyword there too.
No trigger is needed, and it is done in the same write as the "real" data:
UPDATE
table
SET
blah,
LastUpdatedDateTime = DEFAULT
WHERE
foo = bar
Using a trigger to update the last modified column is the way to go. Almost every record in the system at work is stamped with an add and change timestamp and this has helped me quite a bit. Implementing it as a trigger will let you stamp it whenever there is any change, no matter how it was initiated.
Another nice thing about a trigger is that you can easily expand it to store an audit trail as well without too much trouble.
And as long as you are doing so, add a field for last_updated_by and update the user every time the record is updated. Not as good as a real audit table, but much better than just the date last updated.
A trigger is the only method you can use to do this. And to those who said triggers are evil, no they aren't. They are the best way - by far - to maintain data integrity that is more complex than a simple default or constraint. They do need to be written by people who actually know what they are doing, though. But then that is true of most things that affect database design and integrity.