UPDATE slow when setting column to NULL - sql-server

I have a SQL Server 2008 table with 80,000 rows and am executing the following query:
UPDATE dbo.TableName WITH (ROWLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = #helloWorldID
HelloWorldID is an int and the #helloWorldID parameter is also int.
The query is taking too long and I'd like to optimize it. I created a nonclustered index on HelloWorldID but it didn't matter. I may have to redesign this...maybe put the HelloWorldID on another table that links it to the TableName table?

Since the command you're waiting on is DELETE I have to guess that there is a trigger on dbo.TableName and that it is performing additional work that you do not expect. Or perhaps some CASCADE option that is affecting other tables that have triggers on them.

It all depends on how much rows will be updated by this query.
If you're updating a lot of rows, say 30% of the table, then the index will actually slow down the query (as index will be updated along with the table, and it won't help with filtering the rows for update). Also ROWLOCK will slow it down, because the engine will issue a separate lock for each row (as opposed to pagelocks that would occur normally).
Try removing the index and running this update using WITH(TABLOCK) just to see what happens.

I get this problem sometimes. Your query is dependent upon simultaneously getting a write-lock on every row in the table meeting the conditions of the WHERE-Clause . Depending on your needs for full 'ACID', you could do something like this:
SELECT getdate() -- force ##rowcount=1
while ##rowcount > 0
UPDATE TOP (1000) dbo.TableName
SET HelloWorldID = NULL
WHERE HelloWorldID = #helloWorldID
This will do the update is smaller chunks, and help overcome locking issues. But remember, this-method gives up on doing this-query as a single-transaction. You will need to tune the 1000 to a value that is right for your server.

Related

Which is more efficient update where or if exists then update

I would like to know which is more efficient and why.
if not exists (select 1 from table where ID = 101 and TT = 5)
begin
update table
set TT = 5
where ID = 101;
end;
or
update table
set TT = 5
where ID = 101 and TT <> 5;
Assume there is a clustered index on ID (nothing more table used default table creation setting)
WHERE, IF EXISTS and IN all have different performance benefits. I would suggest checking out these two articles.
https://www.sqlshack.com/t-sql-commands-performance-comparison-not-vs-not-exists-vs-left-join-vs-except/
https://sqlchitchat.com/sqldev/tsql/semi-joins-in-sql-server/
SQL Server will generally optimize a non-updating UPDATE to not actually issue any updates. Therefore, with a simple table, you are not going to see much difference.
If you have triggers, they will be fired if the UPDATE statement executes, irrelevant of how many rows are updated.
If the UPDATE statement executes over rows, even if they are modified to the same value, they will appear in the trigger.
If rows are filtered out with a WHERE clause, for example and TT <> 5, then the trigger will fire with 0 rows
rowversion and GENERATED AS columns will be updated regardless.
Clustered key columns will cause a delete and insert of the whole row.
If ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT are on, even if not being used, then due to the way row-versioning works, an actual update will always be made.
If the IF EXISTS is complex, it still may not be worth doing, but in simple cases it usually is.

Use of inserted and deleted tables for logging - is my concept sound?

I have a table with a simple identity column primary key. I have written a 'For Update' trigger that, among other things, is supposed to log the changes of certain columns to a log table. Needless to say, this is the first time I've tried this.
Essentially as follows:
Declare Cursor1 Cursor for
select a.*, b.*
from inserted a
inner join deleted b on a.OrderItemId = b.OrderItemId
(where OrderItemId is the actual name of the primary identity key).
I then do the usual open the cursor and go into a fetch next loop. With the columns I want to test, I do:
if Update(Field1)
begin
..... do some logging
end
The columns include varchars, bits, and datetimes. It works, sometimes. The problem is that the log function is writing the a and b values of the field to a log and in some cases, it appears that the before and after values are identical.
I have 2 questions:
Am I using the Update function correctly?
Am I accessing the before and after values correctly?
Is there a better way?
If you are using SQL Server 2016 or higher, I would recommend skipping this trigger entirely and instead using system-versioned temporal tables.
Not only will it eliminate the need for (and performance issues around) the trigger, it'll be easier to query the historical data.

TSQL - SELECT TOP and UPDATE affecting more rows than expected

I'm trying to understand the behavior of an UPDATE/REPLACE that I'm carrying out that is removing some invalid data and replacing with preferred data.
The UPDATE executes normally and does what it needs to do, but the rows affected are not what I expected in some cases (I'm carrying this out on multiple databases).
I've put part of the script below (The rest is essentially replicating the same function across multiple tables)
UPDATE TBL_HISTORY
SET DETAILS = REPLACE(DETAILS,'&QUOT','Times New Roman')
WHERE HISTORYID IN
(SELECT TOP 1000 (HISTORYID) FROM TBL_HISTORY
WHERE DETAILS LIKE '%&QUOT%')
GO
What I'd imagine to happen with the script above is to select the TOP 1000 records in TBL_HISTORY that contain the unwanted string of data and carry out the REPLACE.
The result has been in cases where there are more than 1000 affected rows it will update all of them, returning a value of 1068 rows affected for example.
HISTORYID is the PK on the table. Am I misunderstanding how this should work? Any guidance would be appreciated.
Try this instead(it is faster). If it still update more than 1000 rows, it is due to a trigger. If it updates 1000 rows then HISTORYID is not the only column in the primary key(composite primary key).
;WITH CTE as
(
SELECT top 1000
DETAILS
FROM
TBL_HISTORY
WHERE
DETAILS LIKE '%&QUOT%'
)
UPDATE CTE
SET DETAILS = REPLACE(DETAILS,'&QUOT','Times New Roman')

T-SQL Insert or update

I have a question regarding performance of SQL Server.
Suppose I have a table persons with the following columns: id, name, surname.
Now, I want to insert a new row in this table. The rule is the following:
If id is not present in the table, then insert the row.
If id is present, then update.
I have two solutions here:
First:
update persons
set id=#p_id, name=#p_name, surname=#p_surname
where id=#p_id
if ##ROWCOUNT = 0
insert into persons(id, name, surname)
values (#p_id, #p_name, #p_surname)
Second:
if exists (select id from persons where id = #p_id)
update persons
set id=#p_id, name=#p_name, surname=#p_surname
where id=#p_id
else
insert into persons(id, name, surname)
values (#p_id, #p_name, #p_surname)
What is a better approach? It seems like in the second choice, to update a row, it has to be searched two times, whereas in the first option - just once. Are there any other solutions to the problem? I am using MS SQL 2000.
Both work fine, but I usually use option 2 (pre-mssql 2008) since it reads a bit more clearly. I wouldn't stress about the performance here either...If it becomes an issue, you can use NOLOCK in the exists clause. Though before you start using NOLOCK everywhere, make sure you've covered all your bases (indexes and big picture architecture stuff). If you know you will be updating every item more than once, then it might pay to consider option 1.
Option 3 is to not use destructive updates. It takes more work, but basically you insert a new row every time the data changes (never update or delete from the table) and have a view that selects all the most recent rows. It's useful if you want the table to contain a history of all its previous states, but it can also be overkill.
Option 1 seems good. However, if you're on SQL Server 2008, you could also use MERGE, which may perform good for such UPSERT tasks.
Note that you may want to use an explicit transaction and the XACT_ABORT option for such tasks, so that the transaction consistency remains in the case of a problem or concurrent change.
I tend to use option 1. If there is record in a table, you save one search. If there isn't, you don't loose anything. Moreover, in the second option you may run into funny locking and deadlocking issues related to locks incompatibility.
There's some more info on my blog:
http://sqlblogcasts.com/blogs/piotr_rodak/archive/2010/01/04/updlock-holdlock-and-deadlocks.aspx
You could just use ##RowCount to see if the update did anything. Something like:
UPDATE MyTable
SET SomeData = 'Some Data' WHERE ID = 1
IF ##ROWCOUNT = 0
BEGIN
INSERT MyTable
SELECT 1, 'Some Data'
END
Aiming to be a little more DRY, I avoid writing out the values list twice.
begin tran
insert into persons (id)
select #p_id from persons
where not exists (select * from persons where id = #p_id)
update persons
set name=#p_name, surname=#p_surname
where id = #p_id
commit
Columns name and surname have to be nullable.
The transaction means no other user will ever see the "blank" record.
Edit: cleanup

Delete large amount of data in sql server

Suppose that I have a table with 10000000 record. What is difference between this two solution?
delete data like :
DELETE FROM MyTable
delete all of data with a application row by row :
DELETE FROM MyTable WHERE ID = #SelectedID
Is the first solution has best performance?
what is the impact on log and performance?
If you need to restrict to what rows you need to delete and not do a complete delete, or you can't use TRUNCATE TABLE (e.g. the table is referenced by a FK constraint, or included in an indexed view), then you can do the delete in chunks:
DECLARE #RowsDeleted INTEGER
SET #RowsDeleted = 1
WHILE (#RowsDeleted > 0)
BEGIN
-- delete 10,000 rows a time
DELETE TOP (10000) FROM MyTable [WHERE .....] -- WHERE is optional
SET #RowsDeleted = ##ROWCOUNT
END
Generally, TRUNCATE is the best way and I'd use that if possible. But it cannot be used in all scenarios. Also, note that TRUNCATE will reset the IDENTITY value for the table if there is one.
If you are using SQL 2000 or earlier, the TOP condition is not available, so you can use SET ROWCOUNT instead.
DECLARE #RowsDeleted INTEGER
SET #RowsDeleted = 1
SET ROWCOUNT 10000 -- delete 10,000 rows a time
WHILE (#RowsDeleted > 0)
BEGIN
DELETE FROM MyTable [WHERE .....] -- WHERE is optional
SET #RowsDeleted = ##ROWCOUNT
END
If you have that many records in your table and you want to delete them all, you should consider truncate <table> instead of delete from <table>. It will be much faster, but be aware that it cannot activate a trigger.
See for more details (this case sql server 2000):
http://msdn.microsoft.com/en-us/library/aa260621%28SQL.80%29.aspx
Deleting the table within the application row by row will end up in long long time, as your dbms can not optimize anything, as it doesn't know in advance, that you are going to delete everything.
The first has clearly better performance.
When you specify DELETE [MyTable] it will simply erase everything without doing checks for ID. The second one will waste time and disk operation to locate a respective record each time before deleting it.
It also gets worse because every time a record disappears from the middle of the table, the engine may want to condense data on disk, thus wasting time and work again.
Maybe a better idea would be to delete data based on clustered index columns in descending order. Then the table will basically be truncated from the end at every delete operation.
Option 1 will create a very large transaction and have a big impact on the log / performance, as well as escalating locks so that the table will be unavailable.
Option 2 will be slower, although it will generate less impact on the log (assuming bulk / full mode)
If you want to get rid of all the data, Truncate Table MyTable would be faster than both, although it has no facility to filter rows, it does a meta data change at the back and basically drops the IAM on the floor for the table in question.
The best performance for clearing a table would bring TRUNCATE TABLE MyTable. See http://msdn.microsoft.com/en-us/library/ms177570.aspx for more verbose explaination
Found this post on Microsoft TechNet.
Basically, it recommends:
By using SELECT INTO, copy the data that you want to KEEP to an intermediate table;
Truncate the source table;
Copy back with INSERT INTO from intermediate table, the data to the source table;
..
BEGIN TRANSACTION
SELECT *
INTO dbo.bigtable_intermediate
FROM dbo.bigtable
WHERE Id % 2 = 0;
TRUNCATE TABLE dbo.bigtable;
SET IDENTITY_INSERT dbo.bigTable ON;
INSERT INTO dbo.bigtable WITH (TABLOCK) (Id, c1, c2, c3)
SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate ORDER BY Id;
SET IDENTITY_INSERT dbo.bigtable OFF;
ROLLBACK TRANSACTION
The first will delete all the data from the table and will have better performance that your second who will delete only data from a specific key.
Now if you have to delete all the data from the table and you don't rely on using rollback think of the use a truncate table

Resources