I have a stored proc that should conditionally update a bunch of fields in the same table. Conditionally, because for each field I also pass a "dirty" flag and a field should be updated only if flag is set to 1.
So I'm going to do the following:
create proc update
#field1 nvarchar(1000), #field1Dirty bit, ...other fields...
as
begin
update mytable
set field1 = case when #field1dirty = 1 then #field1 else field1 end,
... same for other fields
end
go
Question - is SQL Server (2008) smart enough to not physically update a field if it's been assigned its own value, like in case if #field1dirty = 0?
Question - is SQL Server (2008) smart enough to not physically update
a field if it's been assigned its own
value, like in case if #field1dirty =
0?
No you should add a where clause that says...where field <> the value you are updating to.
This doesn't seem like a big deal at first, but in truth it can create a massive amount of overhead. One example, think about triggers. If that updates every field in the table, that trigger will fire for every row. YIKES, that's a lot of code execution that's needless, especially if that code is say, moving updates rows to a logging table. I'm sure you get the idea.
Remember, you're updating the field, it just happens to be the same value it was before. It's actually good that this happens, because that means that you can still count the field as modified (think timestamp etc.). If it didn't think updating the field to the same value was modifying the row, you wouldn't know if someone inadvertently (or deliberately) tried to change data.
Update due to comments:
Link to the coalesce function
Example:
For handling null parameter values in your stored procedure
Update Table SET My_Field = COALESCE(#Variable, My_Field)
This doesn't get around what I was talking about before with the field being updated to the same value, but it does allow you to check parameter and conditionally update the field.
SQL doesn't check the value before writing to it. It will overwrite it anyway.
SQL Server will perform the update. The row will be updated as an entire row, so if one column in the row does have FieldxDirty = 1, the update is required anyway. There's no optimization gained in the SET clause.
#Kevin's answer will help more than optimizing the SET clause.
Sorry to come here with an opinion, but I have nowhere else to write :-)
There should at least be a kind of "hint" possibility to tell the UPDATE statement to generally NOT update to the same value.
There are at least 2 reasons I can think of:
1st: the value to update to can be a complicated expression and it is a waste of execution time (not to mention the maintenance of expression changes) to express it again in the WHERE clause. Think also of NULL values!
Ex. UPDATE X SET A = B WHERE ISNULL(A,'') <> ISNULL(B,'')
2nd: we have a synchronized mirroring scenario where the "backup" server is physically placed in another part of the city. This means, that the write to disk is comitted first when the backup-server has performed the write. There is a huge time difference between the write and skip writing. When the developers created the application, they worked in a test environment without mirroring. Most of the UPDATE statements just did not change the values, but it did not matter in the test environment. After deloying the application to production with mirroring, we would really love to have that "only changed value" hint. Reading the original value and checking it does not take time compared to writing
Related
I have an API that i'm trying to read that gives me just the updated field. I'm trying to take that and update my tables using a stored procedure. So far the only way I have been able to figure out how to do this is with dynamic SQL but i would prefer to not do that if there is a way not to.
If it was just a couple columns, I'd just write a proc for each but we are talking about 100 fields and any of them could be updated together. One ticket might just need a timestamp updated at this time, but the next ticket might be a timestamp and who modified it while the next one might just be a note.
Everything I've read and have been taught have told me that dynamic SQL is bad and while I'll write it if I have too, I'd prefer to have a proc.
YOU CAN PERHAPS DO SOMETHING LIKE THIS:::
IF EXISTS (SELECT * FROM NEWTABLE NOT IN (SELECT * FROM OLDTABLE))
BEGIN
UPDATE OLDTABLE
SET OLDTABLE.OLDRECORDS = NEWTABLE.NEWRECORDS
WHERE OLDTABLE.PRIMARYKEY= NEWTABLE.PRIMARYKEY
END
The best way to solve your problem is using MERGE:
Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
As you can see your update could be more complex but more efficient as well. Using MERGE requires some proficiency, but when you start to use it you'll use it with pleasure again and again.
I am not sure how your business logic works that determines what columns are updated at what time. If there are separate business functions that require updating different but consistent columns per function, you will probably want to have individual update statements for each function. This will ensure that each process updates only the columns that it needs to update.
On the other hand, if your API is such that you really don't know ahead of time what needs to be updated, then building a dynamic SQL query is a good idea.
Another option is to build a save proc that sets every user-configurable field. As long as the calling process has all of that data, it can call the save procedure and pass every updateable column. There is no harm in having a UPDATE MyTable SET MyCol = #MyCol with the same values on each side.
Note that even if all of the values are the same, the rowversion (or timestampcolumns) will still be updated, if present.
With our software, the tables that users can edit have a widely varying range of columns. We chose to create a single save procedure for each table that has all of the update-able columns as parameters. The calling processes (our web servers) have all the required columns in memory. They pass all of the columns on every call. This performs fine for our purposes.
I needed to create a function today which will always return the exact same value on the specific database it's executed on. It may / may not be the same across databases which is why it has to be able to load it from a table the first time it's required.
CREATE FUNCTION [dbo].[PAGECODEGET] ()
RETURNS nvarchar(6)
AS
BEGIN
DECLARE #PageCode nvarchar(6) = ( SELECT PCO_IDENTITY FROM PAGECODES WHERE PCO_PAGE = 'SWTE' AND PCO_TAB = 'RECORD' )
RETURN #PageCode
END
The PCO_IDENTITY field is a sql identity field, so once the record is inserted for the first time, it's always going to return the same result thereafter.
My question is, is there any way to persist this value to something equivalent to a C# readonly variable?
From a perfomance point of view I know sql will optimise the plan etc, but from a best practice point of view I'm thinking there may possibly be a better way of doing it.
We use a mix of SQL Servers, but the lowest is 2008 R2 in case there's a version specific solution.
I'm afraid there's no such thing as a global variable like you suggest in SQL Server.
As you've pointed out, the function will potentially return different results on another database, depending on a variety of factors, such as when the row was inserted, what other values exist in the table already etc. - basically, the PCO_IDENTITY value for this row cannot be relied upon to be consistent.
A few observations:
I don't see how getting this value occasionally is really going to be a performance bottleneck. I don't think best practices cover this, as selecting a value from a table is as basic as you can get.
If this is part of another larger query, you will probably get better performance by using a join to the PAGECODES table directly, rather than potentially running this function for every row
However, if you are really worried:
There are objects in the database which are persistant - tables. When you first insert this value, retrieve the PCO_IDENTITY value, and create a new table with just that in, that you may join to in your queries. Seems a bit of a waste for one value, doesn't it? (Note you could also make a view, but how would that be any better performing than the function you started with?)
You could force these values into a row with a specific PCO_IDENTITY value, using IDENTITY_INSERT. That way the value is consistent, and you know what it is - you could hard code it in your queries. (NB: Turn IDENTITY_INSERT off again afterwards, and other rows inserted into this table will continue to be automatically generated again)
TL;DR: How you are doing it is probably fine. I suspect you are trying to optimise something that isn't a problem. As always - if in doubt, try out a few approaches and measure.
I have a table that is currently using a couple of columns named DateFrom and DateTo. I'm trying to replace them with a single NewDate column, populated for existing rows with the value from DateFrom.
I need good error/transaction handling as, if the change fails, I don't want a halfway in-between table, I want to revert.
I've tried a number of things but can't get it to work properly. Any help is appreciated as I'm far from experienced with this.
I started with
BEGIN TRAN
ALTER TABLE TableName
ADD NewDate DATETIME
IF ##ERROR = 0 AND ##TRANCOUNT = 1
UPDATE TableName
SET NewDate = ValidFrom
....
This fails immediately as NewDate is not currently a column in the table. Fine, so I add a GO in there. This breaks it into two batches and it now runs, except it makes the ##ERROR check pointless. I also can't use a local variable as those are lost after GO as well. Ideally I'd like to use a TRY...CATCH to avoid checking errors after each statement but I can't use a GO with that as it needs to be one batch.
None of the articles I've found talk about this situation (error handling with GO). So the question is: Is there any way I can get the transaction-with-error-handling approach I'm looking for when adding and updating a column (which seems to necessitate a GO somewhere)?
Or am I going to have to settle for doing it in several batches, without the ability to roll back to my original table if anything goes wrong?
Why are you worried about creating the new column in the transaction? Just create the column and then populate it. You don't even need an explicit tran when populating it. If it fails (which is very unlikely), just do the update again.
I would do the following steps
Add new column
Update new column
Check to see if the data in the new column looks correct
Drop the old columns no longer needed (you may want to check where these columns are being used before dropping them e.g. are they used in any stored procedures, reports, front-end application code)
Also, it is worth adding more context to your question. I assume you are testing a script against a test database and will later apply the script to a prod database. Is the prod database very big? Very busy? Mission critical? Backed up on a schedule?
I am creating a self-related table:
Table Item columns:
ItemId int - PK;
Amount money - not null;
Price money - a computed column using a UDF that retrieves value according to the items ancestors' Amount.
ParentItemId int - nullable, reference to another ItemId in this table.
I need to avoid a loop, meaning, a sibling cannot become an ancestor of his ancestors, meaning, if ItemId=2 ParentItemId = 1, then ItemId 1 ParentItemId = 2 shouldn't be allowed.
I don't know what should be the best practice in this situation.
I think I should add a CK that gets a Scalar value from a UDF or whatever else.
EDIT:
Another option is to create an INSTEAD OF trigger and put in 1 transaction the update of the ParentItemId field and selecting the Price field from the ##RowIdentity, if it fails cancel transaction, but I would prefer a UDF validating.
Any ideas are sincerely welcomed.
Does this definitely need to be enforced at the database level?
I'm only asking as I have databases like this (where the table similar to this is like a folder) and I only make sure that the correct parent/child relationships are set up in the application.
Checks like this is not easy to implement, and possible solutions could cause a lot of bugs and problems may be harder then initial one. Usually it is enough to add control for user's input and prevent infinite loop on read data.
If your application uses stored procedures, no ORM, than I would choose to implement this logic in SP. Otherwise - handle it in other layers, not in DB
How big of a problem is this, in real life? It can be expensive to detect these situations (using a trigger, perhaps). In fact, it's likely going to cost you a lot of effort, on each transaction, when only a tiny subset of all your transactions would ever cause this problem.
Think about it first.
A simple trick is to force the ParentItemId to be less than the ItemId. This prevents loop closure in this simple context.
However, there's a down side - if you need for some reason to delete/insert a parent, you may need to delete/insert all of its children in order as well.
Equally, hierarchies need to be inserted in order, and you may not be able to reassign a parent.
Tested and works just great:
CREATE TRIGGER Item_UPDATE
ON Item
FOR INSERT, UPDATE
AS
BEGIN
BEGIN TRY
SELECT Price FROM INSERTED
END TRY
BEGIN CATCH
RAISERROR('This item cannot be specified with this parent.', 16, 1)
ROLLBACK TRANSACTION;
END CATCH
END
GO
A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces, SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is better way to do it, and I don't mean the implementation of either method. Just which is better, return just the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The select statement will have some sort of an advantage only if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already so no point in selecting and returning again, IMHO. if its for the id then you can have it with SCOPE_IDENTITY(), that will return the last identity value created in the current session for the insert.
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows--names, dollars, whatever. But a few minutes reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex that that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exists? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the ##Identity property (see here). You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF #return_whole_row = 1
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
ELSE
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.id
VALUES ('value1', 'value2')
FI
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whther to return it over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.