Massive Update with foreign key dependencies - sql-server

I have the following query:
update largeTable
set largeTable_id ='NA';
I would like to know the best practices for performing that kind of update on a table with 45 million records. Should I consider a cascading update, or is that done automatically?
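(For reference: SQL Server only cascades an update to referencing rows if the foreign key was created with ON UPDATE CASCADE; it is not automatic otherwise. A minimal sketch, using a hypothetical child table name:)
-- Hypothetical child table; cascading must be declared on the FK itself
ALTER TABLE childTable
ADD CONSTRAINT FK_childTable_largeTable
    FOREIGN KEY (largeTable_id)
    REFERENCES largeTable (largeTable_id)
    ON UPDATE CASCADE;  -- child rows follow changes to the referenced key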
I have the query below as an example of performing the update in separate batches to avoid tlog space issues:
DECLARE @i INT = 1
WHILE (@i <= 10)
BEGIN
    UPDATE TOP (20000) largeTable
    SET largeTable_id = 'NA'
    WHERE largeTable_id <> 'NA'  -- without this filter, TOP may keep picking rows that are already 'NA'
    SET @i = @i + 1
END
So, that's pretty much the idea; any comments or suggestions will be appreciated.
Thanks in advance :).
Adding a new idea:
--T-SQL using the ROWCOUNT setting to control update size
SET ROWCOUNT 1000
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION
    UPDATE tableB
    SET TableB_TableA_id = 'NA'
    WHERE TableB_TableA_id <> 'NA'  -- the filter lets @@ROWCOUNT reach 0; without it the loop never ends
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END
    COMMIT TRANSACTION
END
SET ROWCOUNT 0
The main goal is to perform the update in multiple batches, avoiding pressure on the tlog data file, and to perform the cascading updates with no performance issues.

DECLARE @AffectedRows INT, @BatchSize INT;
SET @BatchSize = 5000;
SET @AffectedRows = @BatchSize;
WHILE (@AffectedRows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) tableB
    SET TableB_TableA_id = 'NA'
    WHERE TableB_TableA_id <> 'NA';
    SET @AffectedRows = @@ROWCOUNT;
END;
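One possible refinement (a sketch, assuming the database uses the SIMPLE recovery model): issue a CHECKPOINT between batches so log space from committed batches can be reused; under the FULL recovery model you would rely on frequent log backups instead.
DECLARE @AffectedRows INT = 1;
WHILE (@AffectedRows > 0)
BEGIN
    UPDATE TOP (5000) tableB
    SET TableB_TableA_id = 'NA'
    WHERE TableB_TableA_id <> 'NA';
    SET @AffectedRows = @@ROWCOUNT;
    CHECKPOINT;  -- SIMPLE recovery only: lets committed log records be truncated between batches
END;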


DELETE using WHILE LOOP in a TRANSACTION

I am trying to delete from a table that has only around 39K records, but for some reason it is taking a long time (around 1.5 minutes) even to delete a single record. How can I improve the performance of my delete operation, and how can I ensure that log activity is not taking too much time? Can I put the DELETE statement inside a WHILE loop, opening a transaction and committing it each time it completes successfully? Is any other effective method available?
[PrimaryKey] here has a "Clustered Index"
DECLARE @BatchCount INT;
SELECT @BatchCount = COUNT(1) FROM #DHDID
DECLARE @Counter INT = 1
WHILE (@Counter <= @BatchCount)
BEGIN
    BEGIN TRANSACTION
    DECLARE @ID INT;
    SELECT @ID = DHDID FROM #DHDID WHERE ID = @Counter
    DELETE FROM <MYTABLE> WHERE [PrimaryKey] = @ID
    COMMIT TRANSACTION
    SET @Counter = @Counter + 1
END
Based on your answer, you should do a set-based delete via a join. Try something like this:
BEGIN TRY
    BEGIN TRANSACTION
    DELETE m
    FROM <MyTable> m
    INNER JOIN #DHDID d
        ON d.DHDID = m.[PrimaryKey]
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- error detection: roll back if anything failed
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION
END CATCH
I would also try creating an index on the #DHDID table:
CREATE NONCLUSTERED INDEX [idx] ON [#DHDID] ([ID] ASC) INCLUDE ([DHDID])

SQL Update with Inner Join on different databases

I am trying to update the data in a table with data from a table in a different database, using an inner join. The amount of data is pretty big, and this results in an execution time of over 10 hours, which makes me think something may be wrong with my query.
UPDATE [Database1]..[Table1]
SET [Database1]..[Table1].Table1BitValue =
    CASE
        WHEN ([Database2]..[Table2].Table2BitValue IS NULL
              OR [Database2]..[Table2].Table2BitValue = 0)
        THEN 0
        ELSE 1
    END
FROM [Database1]..[Table1]
INNER JOIN [Database2]..[Table2] ON [Database2]..[Table2].[Table2Id] = [Database1]..[Table1].[Table1Id]
You can try updating the table in chunks.
The idea is to avoid locking the entire table due to the large number of rows:
DECLARE @maxID INT,
        @startRange INT,
        @endRange INT,
        @batchSize INT; -- keep below 5000 to stay under the lock-escalation threshold
SET @batchSize = 2000;
SET @startRange = 0;
SET @endRange = @batchSize;
SET @maxID = 1;
SELECT @maxID = MAX([Table1Id]) FROM [Database1]..[Table1]
BEGIN TRY
    WHILE (@startRange < @maxID)
    BEGIN
        UPDATE [Database1]..[Table1]
        SET [Database1]..[Table1].Table1BitValue =
            CASE
                WHEN ([Database2]..[Table2].Table2BitValue IS NULL
                      OR [Database2]..[Table2].Table2BitValue = 0)
                THEN 0
                ELSE 1
            END
        FROM [Database1]..[Table1]
        INNER JOIN [Database2]..[Table2] ON [Database2]..[Table2].[Table2Id] = [Database1]..[Table1].[Table1Id]
        WHERE [Database1]..[Table1].[Table1Id] BETWEEN @startRange AND @endRange;
        SET @startRange = @endRange + 1;
        SET @endRange = @endRange + @batchSize;
    END;
END TRY
BEGIN CATCH
    -- Add your code for: RAISERROR();
    RETURN;
END CATCH;
This is just the idea of chunking; you can modify it according to your needs.
I haven't verified it, so please test the script above before executing it.
You need to help the optimizer out a little and put a filter on the remote table if possible. Otherwise it is going to pull back all of the rows in the remote table to satisfy the join.
Here are some other ways to attack it:
https://blogs.technet.microsoft.com/pfelatam/2011/09/07/linked-server-behavior-when-used-on-join-clauses/
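In the same spirit, one option (a sketch, not verified against your schema; the temp table and index names are illustrative) is to stage just the needed columns from the remote table into a local temp table so the join runs entirely locally:
-- Pull only the needed remote columns across once
SELECT [Table2Id], [Table2BitValue]
INTO #Table2Local
FROM [Database2]..[Table2];

CREATE CLUSTERED INDEX ix_Table2Local ON #Table2Local ([Table2Id]);

-- Now the join is entirely local
UPDATE t1
SET t1.Table1BitValue = CASE
                            WHEN t2.Table2BitValue IS NULL OR t2.Table2BitValue = 0 THEN 0
                            ELSE 1
                        END
FROM [Database1]..[Table1] AS t1
INNER JOIN #Table2Local AS t2 ON t2.[Table2Id] = t1.[Table1Id];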

Is there a better way to DELETE 80 million+ rows from a table?

WHILE EXISTS (SELECT TOP 1 * FROM large_table)
BEGIN
    WITH LT AS
    (
        SELECT TOP 60000 *
        FROM large_table
    )
    DELETE FROM LT
END
This does the job of keeping my transaction logs from becoming too large, but I need to know if there is a way to make this process go faster? I've had my computer on for 5+ days now running this script and I haven't gotten very far, very fast.
If you don't need to keep any rows, you can simply truncate the table (TRUNCATE is minimally logged, but it requires that no foreign keys reference the table):
TRUNCATE TABLE large_table
GO
You can also delete with a WHERE condition. The time taken by DELETE depends on various factors. You can reduce the cost by eliminating the SELECT query in the WHILE loop's condition:
DECLARE @rows INT = 1
WHILE (@rows > 0)
BEGIN
    DELETE TOP (1000)
    FROM large_table
    SET @rows = @@ROWCOUNT
END
Bulk deletion will create a lot of log records, and a rollback will happen if the log file fills up.
You can do the delete in batches and ensure every transaction is committed:
DECLARE @IDCollection TABLE (ID INT)
DECLARE @Batch INT = 1000;
DECLARE @ROWCOUNT INT = 1;
WHILE (@ROWCOUNT > 0)
BEGIN
    DELETE FROM @IDCollection  -- clear the IDs from the previous batch
    BEGIN TRANSACTION;
    INSERT INTO @IDCollection
    SELECT TOP (@Batch) ID
    FROM table
    ORDER BY ID
    DELETE
    FROM table
    WHERE ID IN (
        SELECT ID
        FROM @IDCollection
    )
    SET @ROWCOUNT = @@ROWCOUNT
    COMMIT TRANSACTION;  -- commit before the loop test, so no transaction is left open
END

Transaction is rolled back after commit?

I'm experiencing some problems that look a LOT like a transaction in a stored procedure having been rolled back, even though I'm fairly certain it was committed. The output variable isn't set until after the commit, and the user gets the value of the output variable (I know, because they print it out, and I also set up a log table where I record the value of the output variable).
In theory someone COULD manually delete and update the data such that it would look like a rollback, but it is extremely unlikely.
So, I'm hoping someone can spot some kind of structural mistake in my stored procedure. Meet BOB:
CREATE PROCEDURE [dbo].[BOB] (@output_id int OUTPUT, @output_msg varchar(255) OUTPUT)
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @id int
    DECLARE @record_id int
    SET @output_id = 1
    -- some preliminary if-statements that don't alter any data, but might do a RETURN
    SET XACT_ABORT ON
    BEGIN TRANSACTION
    BEGIN TRY
        --insert into table A
        SET @id = SCOPE_IDENTITY()
        --update table B
        DECLARE csr CURSOR LOCAL FOR
        SELECT [some stuff], record_id
        FROM temp_table_that_is_not_actually_a_temporary_table
        OPEN csr
        FETCH NEXT FROM csr INTO [some variables], @record_id
        WHILE @@FETCH_STATUS = 0
        BEGIN
            --check type of item + if valid
            IF (something)
            BEGIN
                SET SOME VARIABLE
            END
            ELSE
            BEGIN
                ROLLBACK TRANSACTION
                SET @output_msg = 'item does not exist'
                SET @output_id = 0
                RETURN
            END
            --update table C
            --update table D
            --insert into table E
            --execute some other stored procedure (without transactions)
            IF (something)
            BEGIN
                --insert into table F
                --update table C again
            END
            DELETE FROM temp_table_that_is_not_actually_a_temporary_table WHERE record_id = @record_id
            FETCH NEXT FROM csr INTO [some variables], @record_id
        END
        CLOSE csr
        DEALLOCATE csr
        COMMIT TRANSACTION
        SET @output_msg = 'ok'
        SET @output_id = @id
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION
        SET @output_msg = 'transaction failed !'
        SET @output_id = 0
        INSERT INTO errors (record_time, sp_name, sp_msg, error_msg)
        VALUES (GETDATE(), 'BOB', @output_msg, ERROR_MESSAGE())
    END CATCH
    RETURN
END
I know, my user gets an @output_id that is the SCOPE_IDENTITY() and he also gets an @output_msg that says 'ok'. Is there ANY way he can get those outputs without the transaction getting committed?
Thank you.
You know, the problem is that transactions do NOT support rollback of variables, because there is no data change inside the database. A commit or rollback of a transaction only makes a difference to database objects (tables, temp tables, etc.), NOT to variables (including table variables).
--EDIT
declare @v1 int = 0, @v2 int = 0, @v3 int = 0
set @v2 = 1
begin tran
set @v1 = 1
commit tran
begin tran
set @v3 = 1
rollback tran
select @v1 as v1, @v2 as v2, @v3 as v3
The result is v1 = 1, v2 = 1, v3 = 1; note that @v3 is still 1 even though the transaction that set it was rolled back.
Personally, I never use transactions in stored procedures, especially when they are used simultaneously by many people. I seriously avoid cursors as well.
I think I would go with passing the involved rows of temp_table_that_is_not_actually_a_temporary_table into a real temp table and then handling all the rows together with one if statement. That's simple in T-SQL:
select (data) into #temp from (normal_table) where (conditions).
What's the point of checking each row, doing the job, and then rolling the whole thing back if, say, the last row doesn't meet the condition? Do the check for all of them at once, then do the job for all of them at once, as sketched below. That's what SQL is all about.
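To illustrate that set-based approach (a sketch only: the original procedure elides its table and column names, so items, tableC, and the columns below are placeholders), validate the whole batch up front and then apply the updates in one statement:
-- Reject the whole batch if any row refers to a missing item
IF EXISTS (SELECT 1
           FROM #temp t
           LEFT JOIN items i ON i.item_id = t.item_id
           WHERE i.item_id IS NULL)
BEGIN
    SET @output_msg = 'item does not exist'
    SET @output_id = 0
    RETURN
END

-- All rows are valid: do the work as a single set-based update
BEGIN TRANSACTION
UPDATE c
SET c.some_column = t.some_value
FROM tableC c
INNER JOIN #temp t ON t.record_id = c.record_id
COMMIT TRANSACTION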

Correct locking for stored procedure

Apologies for the basic nature of this question, but it comes from a SQL noob.
I've created the following stored procedure after some online research. The aim of the procedure is to maintain a count (VisitCount), so appropriate locking is necessary to maintain integrity. As far as I understand, MERGE takes the correct level of lock for this scenario, but I'd appreciate it if someone could advise whether that is correct or not.
Thanks.
ALTER PROCEDURE dbo.Popularity_Update
    @TermID int
AS
SET NOCOUNT ON
DECLARE @Now date = SYSDATETIME()
BEGIN TRY
    MERGE Popularity AS t
    USING (SELECT @TermID AS TermID, @Now AS VisitDate) AS s
        ON t.TermID = s.TermID
        AND t.VisitDate = s.VisitDate
    WHEN MATCHED THEN
        UPDATE
        SET VisitCount += 1
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (TermID, VisitDate, VisitCount)
        VALUES (s.TermID, s.VisitDate, 1);
END TRY
BEGIN CATCH
END CATCH
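One point worth noting on the locking question: MERGE by itself does not make the match-then-insert atomic. Under the default isolation level, two concurrent executions can both take the NOT MATCHED branch for the same key, and one will fail if there is a unique constraint. The well-documented remedy is a HOLDLOCK (i.e. SERIALIZABLE) hint on the target, sketched here against the same statement:
MERGE Popularity WITH (HOLDLOCK) AS t  -- range lock held until commit closes the race window
USING (SELECT @TermID AS TermID, @Now AS VisitDate) AS s
    ON t.TermID = s.TermID AND t.VisitDate = s.VisitDate
WHEN MATCHED THEN
    UPDATE SET VisitCount += 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TermID, VisitDate, VisitCount)
    VALUES (s.TermID, s.VisitDate, 1);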
How about this....
ALTER PROCEDURE dbo.Popularity_Update
    @TermID int
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Now date = SYSDATETIME()
    BEGIN TRY
        UPDATE Popularity
        SET VisitCount = COALESCE(VisitCount, 0) + 1
        WHERE TermID = @TermID
          AND VisitDate = @Now
        IF (@@ROWCOUNT = 0)
        BEGIN
            INSERT INTO Popularity (TermID, VisitDate, VisitCount)
            VALUES (@TermID, @Now, 1)
        END
    END TRY
    BEGIN CATCH
    END CATCH
END
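Bear in mind that this UPDATE-then-INSERT version has the same race as the plain MERGE: two concurrent calls can both see @@ROWCOUNT = 0 and both attempt the INSERT. A common mitigation (a sketch, one option among several) is to wrap the pair in a transaction and take a restrictive lock on the UPDATE:
BEGIN TRANSACTION
UPDATE Popularity WITH (UPDLOCK, SERIALIZABLE)  -- hold a range lock until commit
SET VisitCount = COALESCE(VisitCount, 0) + 1
WHERE TermID = @TermID
  AND VisitDate = @Now
IF (@@ROWCOUNT = 0)
BEGIN
    INSERT INTO Popularity (TermID, VisitDate, VisitCount)
    VALUES (@TermID, @Now, 1)
END
COMMIT TRANSACTION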
