How to avoid a circular relationship in SQL Server?

I am creating a self-related table:
Table Item columns:
ItemId int - PK;
Amount money - not null;
Price money - a computed column using a UDF that retrieves its value according to the Amount of the item's ancestors.
ParentItemId int - nullable, reference to another ItemId in this table.
I need to prevent loops, meaning an item must never become an ancestor of one of its own ancestors. For example, if ItemId = 2 has ParentItemId = 1, then setting ParentItemId = 2 on ItemId = 1 shouldn't be allowed.
I don't know what the best practice is in this situation.
I think I should add a check constraint (CK) that gets a scalar value from a UDF, or something similar.
EDIT:
Another option is to create an INSTEAD OF trigger that, in one transaction, updates the ParentItemId field and selects the Price field from the ##RowIdentity; if that fails, the transaction is cancelled. I would prefer validating with a UDF, though.
Any ideas are sincerely welcomed.

Does this definitely need to be enforced at the database level?
I'm only asking as I have databases like this (where the table similar to this is like a folder) and I only make sure that the correct parent/child relationships are set up in the application.

Checks like this are not easy to implement, and the possible solutions could cause a lot of bugs and problems that may be harder than the initial one. Usually it is enough to validate the user's input and to guard against infinite loops when reading the data.
If your application uses stored procedures and no ORM, then I would choose to implement this logic in a stored procedure. Otherwise, handle it in other layers, not in the DB.

How big of a problem is this, in real life? It can be expensive to detect these situations (using a trigger, perhaps). In fact, it's likely going to cost you a lot of effort, on each transaction, when only a tiny subset of all your transactions would ever cause this problem.
Think about it first.

A simple trick is to force the ParentItemId to be less than the ItemId. This prevents loop closure in this simple context.
However, there's a downside: if you need to delete and re-insert a parent for some reason, you may need to delete and re-insert all of its children, in order, as well.
Equally, hierarchies need to be inserted in order, and you may not be able to reassign a parent.
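The "parent id must be smaller" trick could be enforced with an ordinary check constraint. This is only a minimal sketch: the table below is a cut-down, hypothetical version of the Item table from the question (the computed Price column is left out), and the constraint name is made up.

```sql
-- Hypothetical minimal version of the Item table from the question.
CREATE TABLE Item (
    ItemId       int   NOT NULL PRIMARY KEY,
    Amount       money NOT NULL,
    ParentItemId int   NULL REFERENCES Item (ItemId),
    -- A parent must have a smaller id than its child, so following
    -- parents upward can never lead back to the starting row.
    -- Rows with a NULL parent pass: the check evaluates to UNKNOWN.
    CONSTRAINT CK_Item_ParentBeforeChild CHECK (ParentItemId < ItemId)
);

INSERT INTO Item (ItemId, Amount, ParentItemId) VALUES (1, 10, NULL); -- root
INSERT INTO Item (ItemId, Amount, ParentItemId) VALUES (2, 20, 1);    -- child of 1

-- Rejected by the check constraint, because 2 is not less than 1.
UPDATE Item SET ParentItemId = 2 WHERE ItemId = 1;
```

The constraint is cheap because it only looks at the row being written, which is also why it cannot allow re-parenting an item under a newer one.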

Tested and works just great:
CREATE TRIGGER Item_UPDATE
ON Item
FOR INSERT, UPDATE
AS
BEGIN
    BEGIN TRY
        -- Evaluating the computed Price column fails if the new parent creates a loop
        SELECT Price FROM INSERTED
    END TRY
    BEGIN CATCH
        RAISERROR('This item cannot be specified with this parent.', 16, 1)
        ROLLBACK TRANSACTION;
    END CATCH
END
GO

Related

What is the proper way to write insert triggers in SQL Server?

My question is a little bit theoretical because I don't have a concrete working example. But I think it's worth answering.
What is the proper way to write insert-triggers in SQL Server?
Let's say I create a trigger like this (more or less pseudocode)
CREATE TRIGGER MY_TRIGGER
ON MY_TABLE
FOR INSERT AS
    DECLARE @myVariable INT; -- type assumed; the original pseudocode omits it
    DECLARE InsertedRows CURSOR FAST_FORWARD FOR SELECT A_COLUMN FROM INSERTED;
    OPEN InsertedRows;
    FETCH NEXT FROM InsertedRows INTO @myVariable;
    ...
    INSERT INTO ANOTHER_TABLE (
        CODE,
        DATE_INSERTED
    ) VALUES (
        @myVariable,
        GETDATE()
    );
    ...etc
Now what if someone else creates another trigger on the same table, and that trigger changes some columns of the inserted rows? Something like this:
CREATE TRIGGER ANOTHER_TRIGGER
ON MY_TABLE
FOR INSERT AS
    UPDATE MY_TABLE
    SET A_COLUMN = something
    WHERE ID IN (SELECT ID FROM INSERTED);
    ...etc
Then my trigger (if fired after the other trigger) operates on the wrong data, because the INSERTED data is not the same as the actual rows in the table, which have been changed by the other trigger, right?
Summary:
Trigger A updates new inserted rows on table T, trigger B then operates on dirty data because the update from trigger A is not visible in the INSERTED pseudo table which trigger B operates on. BUT if the trigger B would operate directly on the table instead of on the pseudo table INSERTED, it would see updated data by trigger A.
Is that true? Should I always work with the data from the table itself and not from the INSERTED table?
I'd usually recommend against having multiple triggers. For just two, you can, if you want to, define what order you want them to run in. Once you have a few more though, you have no control over the order in which the non-first, non-last triggers run.
It also increasingly makes it difficult just to reason about what's happening during insert.
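For the two-trigger case, the firing order can be pinned with the system procedure sp_settriggerorder. A minimal sketch, assuming both triggers from the question already exist on MY_TABLE:

```sql
-- Make ANOTHER_TRIGGER the first AFTER INSERT trigger to fire on its table...
EXEC sp_settriggerorder
    @triggername = N'ANOTHER_TRIGGER',
    @order       = N'First',
    @stmttype    = N'INSERT';

-- ...and MY_TRIGGER the last one.
EXEC sp_settriggerorder
    @triggername = N'MY_TRIGGER',
    @order       = N'Last',
    @stmttype    = N'INSERT';
```

Only the First and Last slots can be assigned; any triggers in between fire in an undefined order, which is the limitation described above.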
I'd instead recommend having a single trigger per table, per action, that accomplishes all tasks that should happen for that action. If you're concerned about the size of the code that results, that's usually an indication that you ought to be moving that code out of the trigger altogether - triggers should be fast and light.
Instead, you should start thinking about having the trigger just record an action and then use e.g. Service Broker or a SQL Server Agent job that picks up those records and performs the additional processing. Importantly, it does that within its own transactions rather than delaying the original INSERT.
I would also caution against the current code you're showing in example 1. Rather than using a cursor and inserting rows one by one, consider writing an INSERT ... SELECT statement that references inserted directly and inserts all new rows into the other table.
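A set-based version of the first trigger might look roughly like this. It is a sketch only: the two table definitions are hypothetical stand-ins (the question never shows the real ones), and only the table and column names that appear in the question's pseudocode are taken from it.

```sql
-- Hypothetical stand-ins for the tables in the question.
CREATE TABLE MY_TABLE (ID int IDENTITY(1,1) PRIMARY KEY, A_COLUMN int NOT NULL);
CREATE TABLE ANOTHER_TABLE (CODE int NOT NULL, DATE_INSERTED datetime NOT NULL);
GO

CREATE TRIGGER MY_TRIGGER
ON MY_TABLE
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One statement handles every row of the firing INSERT,
    -- whether it inserted zero, one, or many rows.
    INSERT INTO ANOTHER_TABLE (CODE, DATE_INSERTED)
    SELECT i.A_COLUMN, GETDATE()
    FROM inserted AS i;
END;
GO

-- A multi-row insert fires the trigger once, with three rows in "inserted".
INSERT INTO MY_TABLE (A_COLUMN) VALUES (1), (2), (3);
```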
One thing you should absolutely avoid in a trigger is using a CURSOR!
A trigger should be very nimble, small, fast - and a cursor is anything but! After all, it's being executed in the context of the transaction that caused it to fire. Don't delay completion of that transaction unnecessarily!
You need to also be aware that Inserted will contain multiple rows and write your trigger accordingly, but please use set-based techniques - not cursors and while loops - to keep your trigger quick and fast.
Don't do heavy lifting or time-consuming work in a trigger. Just updating a few columns, or making an entry into another table, is fine, but NO heavy lifting, and no e-mail sending etc.!
My Personal Guide to SQL Trigger Happiness
The trigger should be light and fast. Expensive triggers make for a slow database for EVERYBODY (and not incidentally unhappiness for everybody concerned including the trigger author)
One trigger per operation/table combination, please. That is, at most one insert trigger on the foo table. Though using the same trigger for multiple operations on a table is not necessarily bad.
Don't forget that the inserted and deleted tables may contain more than a single row or even no rows at all. A happy trigger (and more importantly happy database users and administrators) will be well-behaved no matter how many rows are involved in the operation.
Do not Not NOT NOT ever use cursors in triggers. Server-side cursors are usually an abuse of good practice though there are rare circumstances where their use is justified. A trigger is NEVER one of them. Prefer instead a series of set-oriented DML statements to anything resembling a cursor.
Remember there are two classes of triggers - AFTER triggers and INSTEAD OF triggers. Consider this when writing a trigger.
Never overlook that triggers (AFTER or INSTEAD OF) begin execution with @@TRANCOUNT one greater than in the context where the statement that fired them runs.
Prefer declarative referential integrity (DRI) over triggers as a means of keeping data in the database consistent. Some application integrity rules require triggers. But DRI has come a long way over the years and functions like row_number() make triggers less necessary.
Triggers are transactional. If you tried to do a circular update as you've described, it should result in a deadlock - the first update will block the second from completing.
Looking at this code, though, you're trying to cursor through the INSERTED pseudo-table to do the inserts, and nothing in the example requires that behaviour. If you just insert directly from the full INSERTED table you'd get a definite improvement, and also fewer firings of your second trigger.

Updating column with its current value

I have a stored proc that should conditionally update a bunch of fields in the same table. Conditionally, because for each field I also pass a "dirty" flag and a field should be updated only if flag is set to 1.
So I'm going to do the following:
create proc update
    @field1 nvarchar(1000), @field1Dirty bit, ...other fields...
as
begin
    update mytable
    set field1 = case when @field1Dirty = 1 then @field1 else field1 end,
    ... same for other fields
end
go
Question - is SQL Server (2008) smart enough to not physically update a field if it's been assigned its own value, as in the case where @field1Dirty = 0?
"Question - is SQL Server (2008) smart enough to not physically update a field if it's been assigned its own value, like in case if @field1dirty = 0?"
No. You should add a WHERE clause along the lines of: WHERE field <> the value you are updating to.
This doesn't seem like a big deal at first, but in truth it can create a massive amount of overhead. For example, think about triggers. If the statement touches every row in the table, the trigger has to process every one of those rows. YIKES, that's a lot of needless code execution, especially if that code is, say, moving updated rows to a logging table. I'm sure you get the idea.
Remember, you're updating the field, it just happens to be the same value it was before. It's actually good that this happens, because that means that you can still count the field as modified (think timestamp etc.). If it didn't think updating the field to the same value was modifying the row, you wouldn't know if someone inadvertently (or deliberately) tried to change data.
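As a rough illustration of that WHERE clause for a single field: the table definition and the id filter below are assumptions made for the sketch, and the extra IS NULL tests are there because a plain <> comparison never matches when either side is NULL.

```sql
-- Hypothetical table; only mytable and field1 come from the question.
CREATE TABLE mytable (id int PRIMARY KEY, field1 nvarchar(1000) NULL);
INSERT INTO mytable (id, field1) VALUES (1, N'old');

DECLARE @field1 nvarchar(1000) = N'old',
        @field1Dirty bit = 1;

UPDATE mytable
SET field1 = @field1
WHERE id = 1
  AND @field1Dirty = 1
  -- Only touch the row when the stored value really differs.
  AND (   field1 <> @field1
       OR (field1 IS NULL AND @field1 IS NOT NULL)
       OR (field1 IS NOT NULL AND @field1 IS NULL));

-- Number of rows the UPDATE actually modified.
SELECT @@ROWCOUNT AS RowsUpdated;
```

With several dirty-flagged fields in one statement the filter has to be the OR of such a test per field, so the statement gets long quickly; that is the cost of skipping no-op writes.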
Update due to comments:
Link to the coalesce function
Example:
For handling null parameter values in your stored procedure
Update Table SET My_Field = COALESCE(@Variable, My_Field)
This doesn't get around what I was talking about before with the field being updated to the same value, but it does allow you to check parameter and conditionally update the field.
SQL doesn't check the value before writing to it. It will overwrite it anyway.
SQL Server will perform the update. The row will be updated as an entire row, so if one column in the row does have FieldxDirty = 1, the update is required anyway. There's no optimization gained in the SET clause.
@Kevin's answer will help more than optimizing the SET clause.
Sorry to come here with an opinion, but I have nowhere else to write :-)
There should at least be a kind of "hint" possibility to tell the UPDATE statement to generally NOT update to the same value.
There are at least 2 reasons I can think of:
1st: the value to update to can be a complicated expression and it is a waste of execution time (not to mention the maintenance of expression changes) to express it again in the WHERE clause. Think also of NULL values!
Ex. UPDATE X SET A = B WHERE ISNULL(A,'') <> ISNULL(B,'')
2nd: we have a synchronous mirroring scenario where the "backup" server is physically located in another part of the city. This means that a write to disk is committed only once the backup server has performed the write too. There is a huge time difference between writing and skipping the write. When the developers created the application, they worked in a test environment without mirroring. Most of the UPDATE statements did not actually change the values, but that did not matter in the test environment. After deploying the application to production with mirroring, we would really love to have that "only changed value" hint. Reading the original value and checking it takes hardly any time compared to writing.

Postgresql: keep 2 sequences synchronized

Is there a way to keep 2 sequences synchronized in Postgres?
I mean if I have:
table_A_id_seq = 1
table_B_id_seq = 1
if I execute SELECT nextval('table_A_id_seq'::regclass)
I want that table_B_id_seq takes the same value of table_A_id_seq
and obviously it must be the same on the other side.
I need 2 different sequences because I have to hack some constraints I have in Django (and that I cannot solve there).
The two tables must be related in some way? I would encapsulate that relationship in a lookup table containing the sequence and then replace the two tables you expect to be handling with views that use the lookup table.
Just use one sequence for both tables. You can't keep them in sync unless you re-sync them over and over again. Sequences are not transaction safe: they always roll forwards, never backwards, not even on ROLLBACK.
Edit: one sequence is also not going to work, doesn't give you the same number for both tables. Use a subquery to get the correct number and use just a single sequence for a single table. The other table has to use the subquery.
My first thought when seeing this is why do you really want to do this? This smells a little spoiled, kinda like milk does after being a few days expired.
What is the scenario that requires that these two seq stay at the same value?
Ignoring the "this seems a bit odd" feelings I'm getting in my stomach you could try this:
Put a trigger on table_a that does this on insert.
--set b seq to the value of a.
select setval('table_b_seq',currval('table_a_seq'));
The problem with this approach is that it assumes only an insert into table_a will change the table_a_seq value and that nothing else will be incrementing table_a_seq. If you can live with that, this may work in a really hackish fashion that I wouldn't release to production if it were my call.
If you really need this, to make it more robust make a single interface to increment table_a_seq such as a function. And only allow manipulation of table_a_seq via this function. That way there is one interface to increment table_a_seq and you should also put
select setval('table_b_seq',currval('table_a_seq')); into that function. That way no matter what, table_b_seq will always be set to be equal to table_a_seq. That means removing any grants to the users to table_a_seq and only granting them execute grant on the new function.
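The single-interface idea might be sketched like this in PostgreSQL. The function name next_shared_id is made up, and the two CREATE SEQUENCE statements are only there so the sketch runs on its own; in practice the sequences already exist.

```sql
-- Stand-ins for the existing sequences.
CREATE SEQUENCE table_a_seq;
CREATE SEQUENCE table_b_seq;

-- Hypothetical single entry point: advances table_a_seq and
-- immediately copies the new value into table_b_seq.
CREATE FUNCTION next_shared_id() RETURNS bigint
LANGUAGE sql
AS $$
    -- setval returns the value it has just set.
    SELECT setval('table_b_seq', nextval('table_a_seq'));
$$;

-- Callers use the function instead of nextval() on either sequence.
SELECT next_shared_id();
```

Note that the two calls are not one atomic step: with concurrent callers, table_b_seq can briefly end up behind table_a_seq, so this is still the hackish variant rather than a guarantee.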
You could put an INSERT trigger on Table_A that executes some code that increases Table_B's sequence. Now, every time you insert a new row into Table_A, it will fire off that trigger.

LINQ Inserts without IDENTITY column

I'm using LINQ, but my database tables do not have an IDENTITY column (although they are using a surrogate Primary Key ID column)
Can this work?
To get the identity values for a table, there is a stored procedure called GetIDValueForOrangeTable(), which looks at a SystemValues table and increments the ID therein.
Is there any way I can get LINQ to get the ID value from this SystemValues table on an insert, rather than the built in IDENTITY?
As an aside, I don't think this is a very good idea, especially not for a web application. I imagine there will be a lot of concurrency conflicts because of this SystemValues lookup. Am I justified in my concern?
Cheers
Duncan
Sure you can make this work with LINQ, and safely, too:
wrap the access to the underlying SystemValues table in the "GetIDValue.....()" function in a TRANSACTION (and not with the READUNCOMMITTED isolation level!), then one and only one user can access that table at any given time and you should be able to safely distribute ID's
call that stored proc from LINQ just before saving your entity and store the ID if you're dealing with a new entity (if the ID hasn't been set yet)
store your entity in the database
That should work - not sure if it's any faster and any more efficient than letting the database handle the work - but it should work - and safely.
Marc
UPDATE:
Something like this (adapt to your needs) will work safely:
CREATE PROCEDURE dbo.GetNextTableID(@TableID INT OUTPUT)
AS BEGIN
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION
    UPDATE SystemTables
    SET MaxTableID = MaxTableID + 1
    WHERE ........
    SELECT
        @TableID = MaxTableID
    FROM
        dbo.SystemTables
    COMMIT TRANSACTION
END
As for performance - as long as you have a reasonable number (less than 50 maybe) of concurrent users, and as long as this SystemTables table isn't used for much else, then it should perform OK.
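A variant worth considering is to increment and read the counter in one UPDATE statement, so there is no separate SELECT at all. This is a sketch under assumptions: the TableName key column, the procedure name, and the 'Orange' row are all hypothetical, since the original leaves the WHERE clause open.

```sql
-- Hypothetical layout for the counter table.
CREATE TABLE dbo.SystemTables (
    TableName  nvarchar(128) NOT NULL PRIMARY KEY,
    MaxTableID int           NOT NULL
);
INSERT INTO dbo.SystemTables (TableName, MaxTableID) VALUES (N'Orange', 0);
GO

CREATE PROCEDURE dbo.GetNextTableID_Sketch
    @TableName nvarchar(128),
    @TableID   int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Increment the counter and capture the new value in the same
    -- statement, so no other session can slip in between.
    UPDATE dbo.SystemTables
    SET @TableID = MaxTableID = MaxTableID + 1
    WHERE TableName = @TableName;
END;
GO

DECLARE @id int;
EXEC dbo.GetNextTableID_Sketch @TableName = N'Orange', @TableID = @id OUTPUT;
SELECT @id AS NextId;
```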
You are very justified in your concern. If two users try to insert at the same time, both might be given the same number unless you do as described by marc_s and put the thing in a transaction. However, if the transaction doesn't wrap around your whole insert as well as the table that contains the id values, you may still have gaps if the outer insert fails (it got a value but then for some other reason didn't insert a record). Since most people do this to avoid gaps (something that is in most cases an unnecessary requirement), it makes life more complicated and still may not achieve the result. Using an identity field is almost always a better choice.

What should be returned when inserting into SQL?

A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces, SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is better way to do it, and I don't mean the implementation of either method. Just which is better, return just the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The select statement only has an advantage if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already, so there is no point in selecting and returning it again, IMHO. If it's for the id, then you can get it with SCOPE_IDENTITY(), which returns the last identity value created in the current session and scope by the insert.
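Returning just the id through an output parameter could look something like this. The Widget table and the procedure name are invented for the sketch; only SCOPE_IDENTITY() comes from the answer above.

```sql
-- Hypothetical table with an identity primary key.
CREATE TABLE dbo.Widget (
    WidgetId int IDENTITY(1,1) PRIMARY KEY,
    Name     nvarchar(100) NOT NULL
);
GO

CREATE PROCEDURE dbo.Widget_Insert
    @Name     nvarchar(100),
    @WidgetId int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Widget (Name) VALUES (@Name);

    -- Identity value generated by the INSERT above, in this scope only
    -- (identity values generated inside triggers are not picked up).
    SET @WidgetId = SCOPE_IDENTITY();
END;
GO

DECLARE @newId int;
EXEC dbo.Widget_Insert @Name = N'Widget', @WidgetId = @newId OUTPUT;
SELECT @newId AS NewId;
```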
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows--names, dollars, whatever. But a few minutes reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex than that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exists? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the @@IDENTITY function. You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF @return_whole_row = 1
    INSERT
    INTO mytable (col1, col2)
    OUTPUT INSERTED.*
    VALUES ('value1', 'value2')
ELSE
    INSERT
    INTO mytable (col1, col2)
    OUTPUT INSERTED.id
    VALUES ('value1', 'value2')
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parameterize a single proc to determine whether to return the row over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up, and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.
