SQL Server Used Transaction Log Space Not Decreasing After Long UPDATE Finishes - sql-server

In SQL Server, I have a table (dbo.MYTABLE) containing 12 million rows and occupying 5.6 GB of storage.
I need to update a varchar(150) field for each record.
I perform the UPDATE in a WHILE loop, updating 50K rows in each iteration.
The transaction log does not seem to be freed after each iteration and keeps growing.
Even after the whole UPDATE process has finished, the used transaction log space is not released.
Why does the used transaction log space never decrease, even after the UPDATE is complete? The code is below:
DECLARE @MAX_COUNT INT;
DECLARE @COUNT INT;
DECLARE @INC INT;
SET @COUNT = 0;
SET @INC = 50000;
SELECT @MAX_COUNT = MAX(ID) FROM dbo.MYTABLE (NOLOCK);
WHILE @COUNT <= @MAX_COUNT
BEGIN
    UPDATE dbo.MYTABLE
    SET NEW_NAME = REPLACE(NAME, ' X', 'Y')
    WHERE ID > @COUNT AND ID <= (@COUNT + @INC);

    SET @COUNT = @COUNT + @INC;
END

If you don't take transaction log backups, switch the database to the SIMPLE recovery model:
USE [master]
GO
ALTER DATABASE [YOUR_DB] SET RECOVERY SIMPLE WITH NO_WAIT
GO
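If the database must stay in the FULL recovery model (for point-in-time restores), the used log space is only marked reusable after a transaction log backup; a checkpoint alone is not enough. A minimal sketch (the backup path and file name are assumptions) that could be run between batches or on a schedule:

-- Sketch only: in FULL recovery, log space becomes reusable after a log backup.
-- The disk path below is an assumption; adjust it to your environment.
BACKUP LOG [YOUR_DB]
TO DISK = N'D:\Backups\YOUR_DB_log.trn';

-- Check how much of the log file is actually in use afterwards:
DBCC SQLPERF(LOGSPACE);

Note that even after the log is truncated, the physical log file does not shrink on its own; the used-space percentage drops, but reclaiming file size requires DBCC SHRINKFILE.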

Related

SQL Large Insert Transaction log full Error

I am trying to insert almost 17,500,000 rows into 8 tables.
I have a stored procedure for that. I begin a transaction at the start of that procedure and commit it at the end.
Error: The transaction log for database 'Database' is full due to 'ACTIVE_TRANSACTION'.
Note: I want to keep everything in one transaction; it's an automated process that will run on the database every month.
CREATE OR ALTER PROCEDURE [dbo].[InsertInMainTbls]
AS
BEGIN
    PRINT('STARTED [InsertInMainTbls]');
    DECLARE @NoRows INT;
    DECLARE @maxLoop INT;
    DECLARE @isSuccess BIT = 1;
    BEGIN TRY
        BEGIN TRAN
        --1st table
        SET @NoRows = 1;
        SELECT @maxLoop = (MAX([col1]) / 1000) + 1 FROM ProcessTbl;
        SELECT 'loop==' + CAST(@maxLoop AS varchar);
        WHILE (@NoRows <= @maxLoop)
        BEGIN
            INSERT INTO MainTbl WITH (TABLOCK)
                (col1, col2, col3....col40)
            SELECT
                val1, val2, val3....val40
            FROM ProcessTbl
            WHERE [col1] BETWEEN (@NoRows * 1000) - 1000
                             AND (@NoRows * 1000) - 1;
            SET @NoRows = @NoRows + 1;
        END
        --2nd table
        .
        .
        .
        --8th table
        SET @isSuccess = 1;
        COMMIT TRAN
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE();
        SELECT ERROR_MESSAGE() AS 'ErrorMsg';
        SET @isSuccess = 0;
        ROLLBACK TRAN
    END CATCH
END
Even though it makes little sense to have such a huge transaction (you could instead allow a manual rollback by tagging the rows with something like a timestamp or a GUID), keeping it means the transaction log file needs enough space to hold every row of every insert from the first one to the last, plus all the transactions that other users run at the same time. There are several ways to solve your problem, sketched after the list:
1) enlarge your transaction log file
2) add some complementary log files
3) remove or decrease the transaction scope
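A minimal sketch of options 1 and 2; the logical file names, the path, and the sizes are assumptions and must be adapted to your database:

-- 1) Enlarge the existing transaction log file (logical name and size are assumptions):
ALTER DATABASE [Database]
MODIFY FILE (NAME = N'Database_log', SIZE = 50GB);

-- 2) Add a complementary log file on another drive (name, path, and size are assumptions):
ALTER DATABASE [Database]
ADD LOG FILE (NAME = N'Database_log2',
              FILENAME = N'E:\SQLLogs\Database_log2.ldf',
              SIZE = 20GB);

Option 3 usually means committing per table, or per batch inside the WHILE loop, instead of wrapping all eight tables in a single transaction.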

Deadlocking in Concurrently Executed Nested Stored Procedures

I have a stored procedure that calls child stored procedures to parse JSON that is coming in from the client. The problem is that the JSON parsing mechanism is bottlenecking the application because it processes row by row. My idea was to turn it into a sort of queueing structure where I could have x jobs running this same parent stored procedure and get through the records more quickly. I created a proof of concept for this, and it periodically runs into deadlocking issues that I am at a loss as to how to resolve. I'd first like to confirm that the way I'm handling the transactions and the general code is a reasonable approach, and then I plan to approach my DBAs to see if they can help in actually resolving the deadlocks, because I won't have the access in production to be able to do anything.
NOTE
Please ignore any syntax or spelling issues or the lack of declared variables. I cut out a lot of unnecessary code to keep this as simple as possible while making sure the overarching strategy was still in place. Please consider this "pseudo-code". I realize that having the actual queries run in the child procs would be valuable in sorting out the cause, but I'm mainly interested in whether the way I'm using transactions in the nested procedures is correct, and whether the transaction isolation level I'm using is right. Also, in the way of solutions, I'm wondering if programmatically re-processing the deadlock victims is a valid workaround, or if I should be aiming for a different solution.
So this is the parent procedure:
CREATE PROCEDURE [dbo].[ParentProc]
    @BatchSize int = 5
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON; --http://www.sommarskog.se/error_handling/Part1.html#jumpXACT_ABORT
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --mandated by DBAs unless I provide good reason not to use it.
    /*
    I opted to use UPDLOCK and READPAST hints in this proc based on doing some research online to determine how best to handle a queue processing system, which is essentially what we're doing.
    For reference: https://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/
    */
    DECLARE @RowKey bigint;
    DECLARE @msg nvarchar(2048);
    BEGIN TRY
        --this transaction just marks the records with a guid that this instance of the proc will be handling.
        BEGIN TRANSACTION
        DECLARE @ProcessID varchar(36) = (SELECT NEWID());
        ;WITH cte AS (
            SELECT TOP (@BatchSize) ProcessID, UpdateDate
            FROM dbo.MainJSONTable WITH (UPDLOCK, READPAST)
            WHERE IsProcessed = 0
              AND ProcessID IS NULL
              AND ProcessingFailed = 0
            ORDER BY CreateDate ASC
        )
        UPDATE cte
        SET ProcessID = @ProcessID,
            UpdateDate = GETUTCDATE();
        DECLARE @i bigint = 1;
        SET @BatchSize = (SELECT COUNT(*) FROM dbo.MainJSONTable WHERE IsProcessed = 0 AND ProcessingFailed = 0 AND ProcessID = @ProcessID);
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF (@@TRANCOUNT > 0)
        BEGIN
            ROLLBACK TRANSACTION;
            RETURN;
        END
    END CATCH;
    WHILE @i <= @BatchSize
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION
            SET @RowKey = (SELECT TOP 1 RowKey FROM dbo.MainJSONTable WHERE IsProcessed = 0 AND ProcessingFailed = 0 AND ProcessID = @ProcessID ORDER BY CreateDate ASC);
            IF (@RowKey IS NULL)
            BEGIN
                RETURN;
            END
            BEGIN TRY
                EXEC dbo.DoStuffInChildProc @RowKey;
            END TRY
            BEGIN CATCH
                SET @msg = ERROR_MESSAGE();
                RAISERROR (@msg, 16, 1);
            END CATCH
            UPDATE dbo.MainJSONTable
            SET IsProcessed = 1
            WHERE RowKey = @RowKey
              AND ProcessID = @ProcessID;
            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF (@@TRANCOUNT > 0)
            BEGIN
                ROLLBACK TRANSACTION;
            END
            EXEC Logging.SetLogEntry @@PROCID, @msg;
            UPDATE dbo.MainJSONTable
            SET ProcessingFailed = 1
            WHERE RowKey = @RowKey
              AND ProcessID = @ProcessID;
        END CATCH;
        SET @i = @i + 1;
    END
END
And here is an example of the child proc that is called. There are multiple child procs called, but this is how they are all handled:
CREATE PROCEDURE [dbo].[DoStuffInChildProc]
    @RowKey bigint
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON; --http://www.sommarskog.se/error_handling/Part1.html#jumpXACT_ABORT
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --mandated by DBAs unless I provide good reason not to use it.
    DECLARE @FinalSQL nvarchar(max);
    BEGIN TRY
        --before this I get some dynamic SQL ready.
        EXEC sp_executesql @FinalSQL;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @msg nvarchar(200) = CONCAT('[dbo].[DoStuffInChildProc] generated an error while processing RowKey ', @RowKey);
        EXEC Logging.SetLogEntry @@PROCID, @msg, @FinalSQL;
        RAISERROR(@msg, 16, 1);
        RETURN 1;
    END CATCH;
END
The error in my log is for the parent procedure:
Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
UPDATE
This is the table create statement, and I just realized there currently aren't any indexes on the table other than the clustered index on the primary key. All of the child procs select the Result field, which contains the JSON to be parsed. Should I be committing the child transactions as soon as they're finished, or just letting them bubble up to the parent proc?
CREATE TABLE dbo.MainJSONTable (
    TableID bigint IDENTITY(1,1) NOT NULL,
    ParentTableFK bigint NULL,
    Result nvarchar(MAX),
    IsProcessed bit NOT NULL,
    ProcessingFailed bit NOT NULL,
    ProcessID varchar(36) NULL
)
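As an illustration only, an index supporting the queue lookup might look something like the sketch below. The column choices are inferred from the procedures above (RowKey and CreateDate are not shown in the trimmed table definition) and this is an assumption to discuss with the DBAs, not a tested fix for the deadlocks:

-- Sketch only: a filtered index aimed at the "find unprocessed rows" queries.
-- Column names are inferred from the procedures above; verify against the real schema.
CREATE NONCLUSTERED INDEX IX_MainJSONTable_Queue
ON dbo.MainJSONTable (CreateDate)
INCLUDE (ProcessID, RowKey)
WHERE IsProcessed = 0 AND ProcessingFailed = 0;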

SQL Server Log full due to active transaction

I have been trying to update a column in a table and I am getting the below error:
The transaction log for database 'STAGING' is full due to 'ACTIVE_TRANSACTION'.
I am trying to run the following statement:
UPDATE [STAGING].[dbo].[Stg_Encounter_Alias]
SET
[valid_flag] = 1
FROM [Stg_Encounter_Alias] Stg_ea
where [ACTIVE_IND] = 1
and [END_EFFECTIVE_DT_TM] > convert(date,GETDATE())
My table has approximately 18 million rows, and the above update will modify all of them. The table size is 2.5 GB, and the database is in SIMPLE recovery mode.
This is something that I'll be doing very frequently on different tables. How can I manage it?
I have tried changing the log file size to unlimited, but it goes back to the default.
Can anyone tell me an efficient way to handle this scenario?
If I run it in batches:
BEGIN
    DECLARE @COUNT INT;
    SET @COUNT = 0;
    SET NOCOUNT ON;
    DECLARE @Rows INT,
            @BatchSize INT; -- keep below 5000 to be safe
    SET @BatchSize = 2000;
    SET @Rows = @BatchSize; -- initialize just to enter the loop
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) [STAGING].[dbo].[Stg_Encounter_Alias]
        SET [valid_flag] = 1
        FROM [Stg_Encounter_Alias] Stg_ea
        WHERE [ACTIVE_IND] = 1
          AND [END_EFFECTIVE_DT_TM] > CONVERT(date, GETDATE())
          -- only touch rows not yet flagged, otherwise the loop never finishes
          AND ([valid_flag] <> 1 OR [valid_flag] IS NULL)
        SET @Rows = @@ROWCOUNT;
    END;
END
You are performing your update in a single transaction, and this causes the transaction log to grow very large.
Instead, perform your updates in batches, say 50K - 100K at a time.
Do you have an index on END_EFFECTIVE_DT_TM that includes ACTIVE_IND and valid_flag? That would help performance.
CREATE INDEX NC_Stg_Encounter_Alias_END_EFFECTIVE_DT_TM_I_
ON [dbo].[Stg_Encounter_Alias](END_EFFECTIVE_DT_TM)
INCLUDE (valid_flag)
WHERE ([ACTIVE_IND] = 1);
Another thing that can help performance drastically, if you are running Enterprise Edition or SQL Server 2016 SP1 or later (any edition), is turning on data_compression = page for the table and its indexes.
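As a rough sketch, enabling page compression for the table and its indexes could look like the following (run it in a maintenance window, since the rebuilds take locks and time):

-- Sketch only: enable page compression on the table and all of its indexes.
ALTER TABLE [dbo].[Stg_Encounter_Alias]
REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX ALL ON [dbo].[Stg_Encounter_Alias]
REBUILD WITH (DATA_COMPRESSION = PAGE);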

How to prevent multi threaded application to read this same Sql Server record twice

I am working on a system that uses multiple threads to read, process and then update database records. Threads run in parallel and try to pick records by calling a SQL Server stored procedure.
They call this stored procedure looking for unprocessed records multiple times per second, and sometimes they pick up the same record twice.
I try to prevent this from happening this way:
UPDATE dbo.GameData
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT Inserted.ID INTO #ExportedIDs
WHERE ID IN ( SELECT TOP(@ArraySize) GD.ID
              FROM dbo.GameData GD
              WHERE GD.Exported IS NULL
              ORDER BY GD.ID ASC)
The idea is to mark a record as exported first, using an UPDATE with OUTPUT (remembering the record ID), so no other thread can pick it up again. Once the record is marked as exported, I can do some extra calculations and pass the data to the external system, hoping that no other thread will pick the same record up again in the meantime, since the UPDATE is meant to secure the record first.
Unfortunately it doesn't seem to be working, and the application sometimes picks the same record twice anyway.
How to prevent it?
Kind regards
Mariusz
I think you should be able to do this atomically using a common table expression. (I'm not 100% certain about this, and I haven't tested, so you'll need to verify that it works for you in your situation.)
;WITH cte AS
(
    SELECT TOP(@ArrayCount)
        ID, Exported, ExportExpires, ExportSession
    FROM dbo.GameData WITH (READPAST)
    WHERE Exported IS NULL
    ORDER BY ID
)
UPDATE cte
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT INSERTED.ID INTO #ExportedIDs
I have a similar setup and I use sp_getapplock. My application runs many threads, and they call a stored procedure to get the ID of the element that has to be processed. sp_getapplock guarantees that the same ID will not be chosen by two different threads.
I have a MyTable with a list of IDs that my application checks in an infinite loop using many threads. For each ID there are two datetime columns: LastCheckStarted and LastCheckCompleted. They are used to determine which ID to pick. The stored procedure picks the ID that wasn't checked for the longest period. There is also a hard-coded period of 20 minutes: the same ID can't be checked more often than every 20 minutes.
CREATE PROCEDURE [dbo].[GetNextIDToCheck]
-- Add the parameters for the stored procedure here
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarID int = NULL;
        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'SomeUniqueName_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';
        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Find ID that wasn't checked for the longest period
            SELECT TOP 1
                @VarID = ID
            FROM dbo.MyTable
            WHERE
                LastCheckStarted <= LastCheckCompleted
                -- this ID is not being checked right now
                AND LastCheckCompleted < DATEADD(minute, -20, GETDATE())
                -- last check was done more than 20 minutes ago
            ORDER BY LastCheckCompleted;
            -- Start checking
            UPDATE dbo.MyTable
            SET LastCheckStarted = GETDATE()
            WHERE ID = @VarID;
            -- There is no need to explicitly verify if we found anything.
            -- If @VarID is null, no rows will be updated
        END;
        -- Return found ID, or no rows if nothing was found,
        -- or failed to acquire the lock
        SELECT @VarID AS ID
        WHERE @VarID IS NOT NULL;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
The second procedure is called by an application when it finishes checking the found ID.
CREATE PROCEDURE [dbo].[SetCheckComplete]
-- Add the parameters for the stored procedure here
    @ParamID int
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'SomeUniqueName_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';
        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Completed checking the given ID
            UPDATE dbo.MyTable
            SET LastCheckCompleted = GETDATE()
            WHERE ID = @ParamID;
        END;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
It does not work because multiple transactions might first execute the IN clause and find the same set of rows, then update them multiple times and overwrite each other.
LukeH's answer is best; accept it.
You can also fix it by adding AND Exported IS NULL to the UPDATE to prevent double updates, as sketched below.
Or make the statement SERIALIZABLE. This will lead to some blocking and deadlocking, which can safely be handled by timeouts and retrying in case of deadlock. SERIALIZABLE is always safe for all workloads, but it may block/deadlock more often.
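A minimal sketch of the AND Exported IS NULL guard applied to the statement from the question (only the added predicate and comment are new):

UPDATE dbo.GameData
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT Inserted.ID INTO #ExportedIDs
WHERE ID IN ( SELECT TOP(@ArraySize) GD.ID
              FROM dbo.GameData GD
              WHERE GD.Exported IS NULL
              ORDER BY GD.ID ASC)
  -- re-checked at update time, so a row already claimed by a
  -- concurrent transaction is simply not updated a second time
  AND Exported IS NULL;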

How to delete records in SQL 2005 keeping transaction logs in check

I am running the following stored procedure to delete a large number of records. I understand that the DELETE statement writes to the transaction log and that deleting many rows will make the log grow.
I have looked into the option of creating a new table, inserting the records to keep, and then truncating the source; that method will not work for me.
How can I make the stored procedure below more efficient while making sure the transaction log doesn't grow unnecessarily?
CREATE PROCEDURE [dbo].[ClearLog]
(
    @Age int = 30
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    -- DELETE ERRORLOG
    WHILE EXISTS ( SELECT [LogId] FROM [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age )
    BEGIN
        SET ROWCOUNT 10000
        DELETE [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age
        WAITFOR DELAY '00:00:01'
        SET ROWCOUNT 0
    END
END
Here is how I would do it:
CREATE PROCEDURE [dbo].[ClearLog] (
    @Age int = 30)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @d DATETIME
          , @batch INT;
    SET @batch = 10000;
    SET @d = DATEADD( dd, -@Age, GETDATE() )
    WHILE (1=1)
    BEGIN
        DELETE TOP (@batch) [dbo].[Error_Log]
        WHERE [Timestamp] < @d;
        IF (0 = @@ROWCOUNT)
            BREAK
    END
END
Make the Timestamp comparison SARGable.
Capture GETDATE() once at the start of the batch to produce a consistent run (otherwise the loop can turn infinite as new records 'age' while the old ones are being deleted).
Use TOP instead of SET ROWCOUNT (deprecated: "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server.").
Check @@ROWCOUNT to break the loop instead of running a redundant SELECT.
Assuming you have the option of rebuilding the error log table on a partition scheme, one option would be to partition the table on date and switch out old partitions. Do a Google search for 'alter table switch partition' to dig a bit further; a rough sketch follows.
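A rough sketch of the partition-switch idea; it assumes Error_Log is already partitioned by date and that dbo.Error_Log_Staging is a hypothetical empty table with an identical schema and indexes, on the same filegroup as the partition being removed:

-- Sketch only: move the oldest partition out of the table (a metadata-only operation).
ALTER TABLE dbo.Error_Log
SWITCH PARTITION 1 TO dbo.Error_Log_Staging;

-- The old rows can then be discarded cheaply with minimal logging.
TRUNCATE TABLE dbo.Error_Log_Staging;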
How about running it more often and deleting fewer rows each time? Run this every 30 minutes:
CREATE PROCEDURE [dbo].[ClearLog]
(
    @Age int = 30
)
AS
BEGIN
    SET NOCOUNT ON;
    SET ROWCOUNT 10000 --I assume you are on an old version of SQL Server and can't use TOP
    DELETE dbo.Error_Log WHERE Timestamp > GETDATE() - @Age
    WAITFOR DELAY '00:00:01' --why???
    SET ROWCOUNT 0
END
The way it handles the dates will not truncate the time, so you will only delete about 30 minutes' worth of data each time.
If your database is in FULL recovery mode, the only way to minimize the impact of your delete statements is to space them out: only delete so many per "transaction interval". For example, if you take t-log backups every hour, delete only, say, 20,000 rows per hour. That may not drop everything you need all at once, but will things even out after 24 hours, or after a week?
If your database is in SIMPLE or BULK_LOGGED mode, breaking the deletes into chunks should do it. But since you're already doing that, I'd have to guess your database is in FULL recovery mode. (That, or the connection calling the procedure may be part of a transaction.)
A solution I have used in the past was to temporarily set the recovery model to "Bulk Logged", then back to "Full" at the end of the stored procedure:
DECLARE @dbName NVARCHAR(128);
SELECT @dbName = DB_NAME();
EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY BULK_LOGGED')
WHILE EXISTS (...)
BEGIN
    -- Delete a batch of rows, then WAITFOR here
END
EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY FULL')
This will significantly reduce the transaction log consumption for large batches.
I don't like that it sets the recovery model for the whole database (not just for this session), but it's the best solution I could find.
