How to control order of Update query execution? - sql-server

I have a table in MS SQL Server 2005 and would like to run:
update Table
set ID = ID + 1
where ID > 5
The problem is that ID is the primary key, so the statement fails: when the query reaches the row with ID 8 it tries to change the value to 9, but a row with ID 9 still exists, which violates the constraint.
Therefore I would like to control the update so that it executes in descending order.
So not ID = 1, 2, 3, 4 and so on, but rather ID = 98574 (or whatever the highest value is), then 98573, 98572 and so on. That way there would be no constraint violation.
So how can I control the order of update execution? Is there a simple way to accomplish this programmatically?

Transact SQL defers constraint checking until the statement finishes.
That's why this query:
UPDATE mytable
SET id = CASE WHEN id = 7 THEN 8 ELSE 7 END
WHERE id IN (7, 8)
will not fail, though it swaps id's 7 and 8.
It seems that some duplicate values are left after your query finishes.
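The difference between checking uniqueness after every row and checking it once per statement can be sketched with a toy Python model. This is only an illustration of the idea, not how SQL Server is implemented; the function names and the sample IDs are made up for the example.

```python
def shift_row_by_row(ids, threshold, ascending=True):
    """Add 1 to every id above threshold, checking uniqueness after each row."""
    keys = set(ids)
    targets = sorted((i for i in ids if i > threshold), reverse=not ascending)
    for old in targets:
        keys.remove(old)
        if old + 1 in keys:
            raise ValueError(f"duplicate key {old + 1}")
        keys.add(old + 1)
    return sorted(keys)


def shift_whole_statement(ids, threshold):
    """Compute every new id first, then check uniqueness once at the end."""
    new_ids = [i + 1 if i > threshold else i for i in ids]
    if len(set(new_ids)) != len(new_ids):
        raise ValueError("duplicate key after statement")
    return sorted(new_ids)


ids = [4, 5, 6, 7, 8, 9]
print(shift_whole_statement(ids, 5))              # [4, 5, 7, 8, 9, 10]
print(shift_row_by_row(ids, 5, ascending=False))  # [4, 5, 7, 8, 9, 10]
try:
    shift_row_by_row(ids, 5, ascending=True)
except ValueError as exc:
    print("ascending row-by-row fails:", exc)     # duplicate key 7
```

In this model only the ascending, per-row check fails. A check made once at the end of the statement succeeds regardless of the order in which rows are visited, which is the behaviour the answer above describes.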

Try this:
update Table
set ID = ID * 100000 + 1
where ID > 5
-- Integer division drops the +1 added above, so add 1 again to end up with old ID + 1.
update Table
set ID = ID / 100000 + 1
where ID > 500000

Don't forget the parentheses...
update Table
set ID = (ID * 100000) + 1
where ID > 5
If the IDs get too big here, you can always use a loop.

Personally, I would not update an ID field this way. I would create a work table that maps old IDs to new IDs; it stores both values, and all the updates are driven from it. If you are not using cascade delete (which, incidentally, could lock your tables for a long time), start with the child tables and work up; otherwise start with the PK table. Do not do this unless you are in single-user mode, or you can get some nasty data integrity problems if other users are changing things while the tables are inconsistent with each other.
PKs are nothing to fool around with changing and if at all possible should not be changed.
Before you do any changes to production data in this way, make sure to take a full backup. Messing this up can cost you your job if you can't recover.

Related

Which is more efficient update where or if exists then update

I would like to know which is more efficient and why.
if not exists (select 1 from table where ID = 101 and TT = 5)
begin
update table
set TT = 5
where ID = 101;
end;
or
update table
set TT = 5
where ID = 101 and TT <> 5;
Assume there is a clustered index on ID (nothing more; the table was created with default settings).
WHERE, IF EXISTS and IN all have different performance benefits. I would suggest checking out these two articles.
https://www.sqlshack.com/t-sql-commands-performance-comparison-not-vs-not-exists-vs-left-join-vs-except/
https://sqlchitchat.com/sqldev/tsql/semi-joins-in-sql-server/
SQL Server will generally optimize a non-updating UPDATE to not actually issue any updates. Therefore, with a simple table, you are not going to see much difference.
If you have triggers, they will be fired if the UPDATE statement executes, irrespective of how many rows are updated.
If the UPDATE statement executes over rows, even if they are modified to the same value, they will appear in the trigger.
If rows are filtered out with a WHERE clause, for example AND TT <> 5, then the trigger will fire with 0 rows.
rowversion and GENERATED AS columns will be updated regardless.
Clustered key columns will cause a delete and insert of the whole row.
If ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT are on, even if not being used, then due to the way row-versioning works, an actual update will always be made.
If the IF EXISTS is complex, it still may not be worth doing, but in simple cases it usually is.
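The difference between the unguarded UPDATE and the one with the extra TT <> 5 predicate can be observed through the affected-row count. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, so it says nothing about SQL Server's internals (triggers, rowversion, row versioning); the table name t is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, tt INTEGER)")
conn.execute("INSERT INTO t (id, tt) VALUES (101, 5)")

# Unguarded update: the row matches and is counted as changed,
# even though tt already holds the value 5.
unguarded = conn.execute("UPDATE t SET tt = 5 WHERE id = 101").rowcount

# Guarded update: the extra predicate filters the row out, so nothing is touched.
guarded = conn.execute(
    "UPDATE t SET tt = 5 WHERE id = 101 AND tt <> 5"
).rowcount

print(unguarded, guarded)  # 1 0
```

One behavioural difference to keep in mind: TT <> 5 does not match rows where TT is NULL, so the guarded form leaves such rows alone, whereas the IF NOT EXISTS form from the question would update them.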

How to safely use current identity as value in insert query

I have a table where one of the columns is a path to an image and I need to create a directory for the record being inserted.
Example:
Id | PicPath
1 | /Pics/1/0.jpg
2 | /Pics/2/0.jpg
This way I can be sure that the folder name is always valid and it is unique (no clash between two records).
Question is: how can I safely refer to the current id of the record being insert? Keep in mind that this is a highly concurrent environment, and I would like to avoid multiple trips to the DB if possible.
I have tried the following:
insert into Dummy values(CONCAT('a', (select IDENT_CURRENT('Dummy'))))
and
insert into Dummy values(CONCAT('a', (select SCOPE_IDENTITY() + 1)))
The first query is not safe, for when running 1000 concurrent inserts I got 58 'duplicate key' exceptions.
The second query didn't work because SCOPE_IDENTITY() returned the same value for all queries as I suspected.
What are my alternatives here?
Try using a temporary table to track your inserted IDs with the OUTPUT clause:
INSERT #temp_ids(someval) OUTPUT inserted.identity_column
This will get all the inserted ids from your queries. 'inserted' is context safe.

SQL Server RowVersion/Timestamp - Comparisons

I know that the value itself for a RowVersion column is not in and of itself useful, except that it changes each time the row is updated. However, I was wondering if they are useful for relative (inequality) comparison.
If I have a table with a RowVersion column, are either of the following true:
Will all updates that occur simultaneously (either same update statement or same transaction) have the same value in the RowVersion column?
If I do update "A", followed by update "B", will the rows involved in update "B" have a higher value than the rows involved in update "A"?
Thanks.
From MSDN:
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. This tracks a relative time within a database, not an actual time that can be associated with a clock. Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column.
http://msdn.microsoft.com/en-us/library/ms182776.aspx
As far as I understand, nothing ACTUALLY happens simultaneously in the system. This means that all rowversions should be unique. I venture to say that they would be effectively useless if duplicates were allowed within the same table. Also lending credence to rowversions not being duplicated is MSDN's stance on not using them as primary keys: not because it would cause violations, but because it would cause foreign key issues.
According to MSDN, "The rowversion data type is just an incrementing number..." so yes, later is larger.
To the question of how much it increments, MSDN states, "[rowversion] tracks a relative time within a database" which indicates that it is not a fluid integer incrementing, but time based. However, this "time" reveals nothing of when exactly, but rather when in relation to other rows a row was inserted/modified.
Some additional information.
RowVersion converts nicely to bigint and thus one can display better readable output when debugging:
CREATE TABLE [dbo].[T1](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](50) NULL,
[RowVer] [timestamp] NOT NULL
)
insert into t1 ([value]) values ('a')
insert into t1 ([value]) values ('b')
insert into t1 ([value]) values ('c')
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'x' where id = 3
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'y'
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
Id Value RowVer
1 a 2037
2 b 2038
3 c 2039
Id Value RowVer
1 a 2037
2 b 2038
3 x 2040
Id Value RowVer
1 y 2041
2 y 2042
3 y 2043
I spent ages trying to sort something out with this: asking for rows updated after a particular sequence number. The timestamp is really just a sequence number. It is also big-endian, whereas C# functions like BitConverter.ToInt64 expect little-endian bytes on the usual little-endian platforms.
I ended up creating a DB view on the table I want data from, with an alias column 'SequenceNo':
SELECT ID, CONVERT(bigint, Timestamp) AS SequenceNo
FROM dbo.[User]
C# Code First sees the view (i.e. UserV) identically to a normal table,
so in my LINQ I can join the view and the parent table and compare against a sequence number
var users = (from u in context.GetTable<User>()
join uv in context.GetTable<UserV>() on u.ID equals uv.ID
where mysequenceNo < uv.SequenceNo
orderby uv.SequenceNo
select u).ToList();
to get what I want - all the entries changed since the last time I checked.
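The byte-order point can be illustrated outside the database. The following Python sketch assumes the client receives the rowversion as 8 raw bytes; the helper name and the sample value are made up for the example.

```python
def rowversion_to_int(raw: bytes) -> int:
    """Interpret an 8-byte rowversion value as a big-endian unsigned integer."""
    if len(raw) != 8:
        raise ValueError("rowversion values are 8 bytes long")
    return int.from_bytes(raw, byteorder="big")


# Hypothetical raw value, as it might arrive from a driver.
raw = bytes.fromhex("00000000000007F5")

print(rowversion_to_int(raw))  # 2037

# Reading the same bytes as little-endian gives a different, misleading number.
print(int.from_bytes(raw, byteorder="little") == rowversion_to_int(raw))  # False
```

Converting on the server with CONVERT(bigint, ...) as in the view above avoids the issue entirely; a client-side conversion like this is only needed when the raw bytes are what you have.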
What makes you think Timestamp data types are evil? The data type is very useful for concurrency checking. Linq-To-SQL uses this data type for this very purpose.
The answers to your questions:
1) No. This value is updated each time the row is updated. If you are updating the row say five times, each update will increment the Timestamp value. Of course, you realize that updates that "occur simultaneously" really don't. They still only occur one at a time, in turn.
2) Yes.
Just as a note, timestamp is deprecated in SQL Server 2008 onwards. rowversion should be used instead.
From this page on MSDN:
The timestamp syntax is deprecated. This feature will be removed in a
future version of Microsoft SQL Server. Avoid using this feature in
new development work, and plan to modify applications that currently
use this feature.
Rowversion does break one of the "idealistic" approaches of SQL - that an UPDATE statement is a single, atomic action, and acts as if all UPDATEs (both to all columns within a row, and all rows within the table) occur "at the same time". But in this case, with Rowversion, it is possible to determine that one row was updated at a slightly different time than another.
Note that the order in which rows are updated (by a single update statement) is not guaranteed - it may, by coincidence follow the same order as the clustered key for the table, but I wouldn't count on that being true.
To answer part of your question: you can end up with duplicate values according to MSDN:
Duplicate rowversion values can be generated by using the SELECT INTO
statement in which a rowversion column is in the SELECT list. We do
not recommend using rowversion in this manner.
Source: rowversion (Transact-SQL)
Every database has a counter that is incremented one by one on every data modification that is done in the database. If the table containing the affected (by update/insert) row contains a timestamp/rowversion column, the current counter value of the database is stored in that column of the updated/inserted record.

How to make tasks double-checked (the way how to store it in the DB)?

I have a DB that stores different types of tasks and more items in different tables.
Many of these tables (which have different structures) need their items to be double-checked, meaning that an item does not count as 'saved' (it will of course be stored) until someone else goes into the program and confirms it.
What is the right way to record which items are confirmed? The options I see:
1. Each of these tables gets an "IsConfirmed" column; when the reviewer wants to confirm everything, the program walks through all the tables and builds a list of the items that are not yet confirmed.
2. A third table holds the table name and ID of each row that still has to be confirmed.
I hope you have a better idea than the two ugly options above.
Is the double-confirmed status something that happens exactly once for an entity? Or can it be rejected and need to go through confirmation again? In the latter case, do you need to keep all of this history? Do you need to keep track of who confirmed each time (e.g. so you don't have the same person performing both confirmations)?
The simple case:
ALTER TABLE dbo.Table ADD ConfirmCount TINYINT NOT NULL DEFAULT 0;
ALTER TABLE dbo.Table ADD Processed BIT NOT NULL DEFAULT 0;
On first confirmation:
UPDATE dbo.Table SET ConfirmCount = 1 WHERE PK = <PK> AND ConfirmCount = 0;
On second confirmation:
UPDATE dbo.Table SET ConfirmCount = 2 WHERE PK = <PK> AND ConfirmCount = 1;
When rejected:
UPDATE dbo.Table SET ConfirmCount = 0 WHERE PK = <PK>;
Now obviously your background job can only treat rows where Processed = 0 and ConfirmCount = 2. Then when it has processed that row:
UPDATE dbo.Table SET Processed = 1 WHERE PK = <PK>;
If you have a more complex scenario than this, please provide more details, including the goals of the double-confirm process.
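The guarded updates above (WHERE ... AND ConfirmCount = 0, and so on) also let the application detect a confirmation that arrives in the wrong state, by checking the affected-row count. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server; the task table and the confirm helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (pk INTEGER PRIMARY KEY, "
    "confirm_count INTEGER NOT NULL DEFAULT 0, "
    "processed INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO task (pk) VALUES (1)")


def confirm(pk: int, expected: int) -> bool:
    """Advance confirm_count from expected to expected + 1.

    Returns False when the row was not in the expected state.
    """
    cur = conn.execute(
        "UPDATE task SET confirm_count = ? WHERE pk = ? AND confirm_count = ?",
        (expected + 1, pk, expected),
    )
    return cur.rowcount == 1


print(confirm(1, 0))  # True: first confirmation
print(confirm(1, 0))  # False: the row has already been confirmed once
print(confirm(1, 1))  # True: second confirmation
```

Because the state check and the change happen in one statement, two people clicking "confirm" at the same moment cannot both record the first confirmation.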
Consider adding a new table to hold the records to be confirmed (e.g. TasksToBeConfirmed). Once the records are confirmed, move those records to the permanent table (Tasks).
The disadvantage of adding an "IsConfirmed" column is that virtually every SQL statement that uses the table will have to filter on "IsConfirmed" to prevent getting unconfirmed records. Every time this is missed, a defect is introduced.
In cases where you need confirmed and unconfirmed records, use UNION.
This pattern is a little more work to code and implement, but in my experience, significantly improves performance and reduces defects.

MS SQL Server trigger to update item rating and number of votes

To make this easier to understand, I will present the exact same problem as if it was about a forum (the actual app doesn't have to do with forums at all, but I think such a parallel is easier for most of us to grasp, the actual app is about something very specific that most programmers won't understand (it's an app intended for hardcore graphic designers)).
Let's suppose that there is a thread table that stores information about each forum thread and a threadrating table that stores thread ratings per user (1-5). For efficiency I decided to cache the rating average and number of votes in the thread table and triggers sounded like a good idea for updating it (I used to do such stuff in the actual application code, but I think triggers are worth a try, despite the debugging dangers).
As you know, MS SQL Server doesn't support a trigger to be executed per row, it has to be per statement. So I tried defining it this way:
CREATE TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
UPDATE thread
SET
thread.rating = (thread.rating * thread.voters + SUM(inserted.rating))/(thread.voters + COUNT(inserted.rating)),
thread.voters = thread.voters + COUNT(inserted.rating)
FROM thread
INNER JOIN inserted ON(inserted.threadid = thread.threadid)
GROUP BY inserted.threadid
but I get an error for the "GROUP BY" clause (which I expected). The question is, how can I make this work?
Sorry if the question is stupid, but it's the first time I actually try to use triggers.
Additional info: The thread table would contain threadid (int, primary key), rating (float), voters (int) and some other fields that are irrelevant to the current question.
The threadrating table only contains threadid (foreign key), userid (foreign key to the primary key of the users table) and rating (tinyint between 1 and 5).
The error message is "Incorrect syntax near the keyword 'GROUP'."
First, I strongly recommend that you not use triggers.
If you're getting a syntax error, check that your parens are balanced as well as your begin/ends. In your case, you have an end (at the end) but no begin. You can fix that by just removing the end.
Once you fix that, you'll likely get some more errors like "columns x,y,z not in an aggregate or group by". That's because you have several columns that are not in either. You need to add thread.rating, thread.voters, etc. to your group by or perform some kind of aggregate on them.
This is all assuming that there are multiple records with the same threadID (ie, it's not the primary key). If that's not the case, then what's the purpose of the group by?
Edit:
I'm stumped on the syntax error. I worked around it with a couple correlated sub queries. I guessed at your table structure so modify as needed and try this:
--CREATE TABLE ThreadRating (threadid int not null, userid int not null, rating int not null)
--CREATE TABLE Thread (threadid int not null, rating int not null, voters int not null)
ALTER TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
UPDATE Thread
SET Thread.rating =
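-- Aggregate only the newly inserted rows (alias I), not the whole ThreadRating table,
-- otherwise existing votes would be counted twice.
(SELECT (Thread.Rating * Thread.Voters + SUM(I.Rating)) / (Thread.Voters + COUNT(I.Rating))
FROM Inserted I WHERE I.ThreadID = Thread.ThreadID)
,Thread.Voters =
(SELECT Thread.Voters + COUNT(I.Rating)
FROM Inserted I WHERE I.ThreadID = Thread.ThreadID)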
FROM Thread
JOIN Inserted ON Inserted.ThreadID = Thread.ThreadID
If that's what you wanted, then we can check the performance/execution plan and modify as needed. We might be able to get it to work with the group by yet.
Alternatives to triggers
If you are updating data that impacts ratings in only a few select places, I'd recommend updating the ratings directly there. Factoring the logic into a trigger is nice but introduces lots of problems (performance, visibility, etc.). This can be aided by a function.
Consider this: your trigger will execute every single time someone touches that table. Things like view counts, last updated dates, etc. will execute this trigger. You can add logic to short circuit the trigger in those cases but it gets complicated rapidly.
D'ohh! I totally misread your question and I thought you were asking about MySQL. Mea culpa! I will leave the solution below intact, and mark it as community wiki. Maybe it'll be useful to someone with a similar problem on MySQL.
MySQL triggers are executed per row. Also the pseudo-table "inserted" is a Microsoft SQL Server convention.
MySQL uses pseudo-tables NEW and OLD as extensions to the trigger language.
Here's a solution to your problem:
CREATE TRIGGER thread_rating
AFTER INSERT ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
SET rating = (rating*voters + NEW.rating)/(voters+1),
voters = voters + 1
WHERE threadid = NEW.threadid;
END
Likewise you'd need triggers for UPDATE and DELETE:
-- Trigger names must be unique within a schema, so this one gets its own name.
CREATE TRIGGER thread_rating_update
AFTER UPDATE ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
SET rating = (rating*voters - OLD.rating + NEW.rating)/voters
WHERE threadid = NEW.threadid;
END
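-- Trigger names must be unique within a schema, so this one gets its own name.
CREATE TRIGGER thread_rating_delete
AFTER DELETE ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
-- Note: when voters = 1 the divisor below is zero; handle that case as needed.
SET rating = (rating*voters - OLD.rating)/(voters-1),
voters = voters - 1
WHERE threadid = OLD.threadid;
END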
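The arithmetic behind the three triggers is an incremental update of a cached average. It can be checked in isolation with a small Python sketch; the function names are made up for the example, and returning 0.0 for an empty thread is an assumption, since the triggers above do not define that case.

```python
def add_vote(rating, voters, new):
    """Cached average and count after one vote is inserted."""
    return (rating * voters + new) / (voters + 1), voters + 1


def change_vote(rating, voters, old, new):
    """Cached average after one existing vote changes from old to new."""
    return (rating * voters - old + new) / voters, voters


def remove_vote(rating, voters, old):
    """Cached average and count after one vote is deleted."""
    if voters == 1:
        # Assumption: an empty thread gets a rating of 0.0.
        return 0.0, 0
    return (rating * voters - old) / (voters - 1), voters - 1


state = (0.0, 0)
state = add_vote(*state, 4)        # (4.0, 1)
state = add_vote(*state, 2)        # (3.0, 2)
state = change_vote(*state, 2, 5)  # (4.5, 2)
state = remove_vote(*state, 4)     # (5.0, 1)
print(state)                       # (5.0, 1)
```

Each step matches recomputing the average from scratch over the remaining votes, which is what the cached columns are meant to avoid.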
You may find the following reading helpful:
An introduction to Triggers
Wikipedia: DB Triggers
