I have been trying to update a column in a table and I am getting the below error:
The transaction log for database 'STAGING' is full due to 'ACTIVE_TRANSACTION'.
I am trying to run the below statement:
UPDATE [STAGING].[dbo].[Stg_Encounter_Alias]
SET
[valid_flag] = 1
FROM [Stg_Encounter_Alias] Stg_ea
where [ACTIVE_IND] = 1
and [END_EFFECTIVE_DT_TM] > convert(date,GETDATE())
My table has approximately 18 million rows, and the update above will modify all of them. The table size is 2.5 GB, and the database is in the SIMPLE recovery model.
This is something that I'll be doing very frequently on different tables. How can I manage this?
(Database size and properties screenshots omitted.) I have tried changing the log size to unlimited, but it goes back to the default.
Can anyone tell me an efficient way to handle this scenario?
If I run it in batches:
BEGIN
    DECLARE @COUNT INT
    SET @COUNT = 0

    SET NOCOUNT ON;

    DECLARE @Rows INT,
            @BatchSize INT; -- keep below 5000 to be safe

    SET @BatchSize = 2000;
    SET @Rows = @BatchSize; -- initialize just to enter the loop

    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) [STAGING].[dbo].[Stg_Encounter_Alias]
        SET [valid_flag] = 1
        FROM [Stg_Encounter_Alias] Stg_ea
        WHERE [ACTIVE_IND] = 1
          AND [END_EFFECTIVE_DT_TM] > CONVERT(date, GETDATE())

        SET @Rows = @@ROWCOUNT;
    END;
END
You are performing your update in a single transaction, and this causes the transaction log to grow very large.
Instead, perform your updates in batches, say 50K - 100K at a time.
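For example, a minimal sketch of that pattern for the table in the question (one assumption here: rows already set to valid_flag = 1 can be excluded, so each batch makes progress and the loop terminates):

DECLARE @BatchSize INT = 50000; -- tune: larger means fewer round trips, smaller reduces lock-escalation risk

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) [STAGING].[dbo].[Stg_Encounter_Alias]
    SET [valid_flag] = 1
    WHERE [ACTIVE_IND] = 1
      AND [END_EFFECTIVE_DT_TM] > CONVERT(date, GETDATE())
      AND ([valid_flag] IS NULL OR [valid_flag] <> 1); -- skip rows already updated

    IF @@ROWCOUNT = 0 BREAK;

    CHECKPOINT; -- in SIMPLE recovery, lets the log space used by the previous batch be reused
END;

Because each batch commits on its own, the log only has to hold one batch at a time instead of the whole 18-million-row update.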
Do you have an index on END_EFFECTIVE_DT_TM that includes ACTIVE_IND and valid_flag? That would help performance.
CREATE INDEX NC_Stg_Encounter_Alias_END_EFFECTIVE_DT_TM_I_
ON [dbo].[Stg_Encounter_Alias](END_EFFECTIVE_DT_TM)
INCLUDE (valid_flag)
WHERE ([ACTIVE_IND] = 1);
Another thing that can drastically help performance, if you are running Enterprise Edition or SQL Server 2016 SP1 or later (any edition), is turning on DATA_COMPRESSION = PAGE for the table and its indexes.
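For example, a minimal sketch using the table from the question (note that rebuilding an 18-million-row table is itself a logged, potentially blocking operation, so schedule it accordingly):

-- compress the base table (heap or clustered index)
ALTER TABLE [dbo].[Stg_Encounter_Alias]
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- compress all nonclustered indexes as well
ALTER INDEX ALL ON [dbo].[Stg_Encounter_Alias]
REBUILD WITH (DATA_COMPRESSION = PAGE);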
Related
We are trying to delete data from tables that have not been used for many years.
When we run CRUD operations on these tables, the SQL Server log file grows,
and our goal is to avoid growing the log file or consuming other system resources.
Are there any other solutions?
We had this issue and used the below code to delete 10,000 records at a time, until the table is empty.
DECLARE @Rowcount INT = 1

WHILE @Rowcount > 0
BEGIN
    DELETE TOP (10000)
    FROM TABLE -- replace TABLE with your table name

    SET @Rowcount = @@ROWCOUNT
END
I'm trying to benchmark memory optimized tables in Microsoft SQL Server 2016 with classic temporary tables.
SQL Server version:
Microsoft SQL Server 2016 (SP2) (KB4052908) - 13.0.5026.0 (X64) Mar 18 2018 09:11:49
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 17134: ) (Hypervisor)
I'm following steps described here: https://learn.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/faster-temp-table-and-table-variable-by-using-memory-optimization?view=sql-server-ver15.
CrudTest_TempTable 1000, 100, 100
go 1000
versus
CrudTest_memopt_hash 1000, 100, 100
go 1000
What does this test do?
1000 inserts
100 random updates
100 random deletes
And this is repeated 1000 times.
First stored procedure that uses classic temporary tables takes about 6 seconds to run.
Second stored procedure takes at least 15 seconds and usually errors out:
Beginning execution loop
Msg 3998, Level 16, State 1, Line 3
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
Msg 701, Level 17, State 103, Procedure CrudTest_memopt_hash, Line 16 [Batch Start Line 2]
There is insufficient system memory in resource pool 'default' to run this query.
I have done the following optimizations (before these, it was even worse):
the hash index includes both Col1 and SpidFilter
doing everything in a single transaction makes it work faster (though it would be nice to run without it)
I'm generating random IDs - without this, records from every iteration ended up in the same buckets
I haven't created a natively compiled SP yet since my results are awful.
I have plenty of free RAM on my box and SQL Server can consume it - in other scenarios it allocates a lot of memory, but in this test case it simply errors out.
To me, these results suggest that memory-optimized tables cannot replace temporary tables. Do you have similar results, or am I doing something wrong?
The code that uses temporary tables is:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
DROP PROCEDURE IF EXISTS CrudTest_TempTable;
GO
CREATE PROCEDURE CrudTest_TempTable
    @InsertsCount INT, @UpdatesCount INT, @DeletesCount INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

    CREATE TABLE #tempTable
    (
        Col1 INT NOT NULL PRIMARY KEY CLUSTERED,
        Col2 NVARCHAR(4000),
        Col3 NVARCHAR(4000),
        Col4 DATETIME2,
        Col5 INT NOT NULL
    );

    DECLARE @cnt INT = 0;
    DECLARE @currDate DATETIME2 = GETDATE();

    WHILE @cnt < @InsertsCount
    BEGIN
        INSERT INTO #tempTable (Col1, Col2, Col3, Col4, Col5)
        VALUES (@cnt,
            'sdkfjsdjfksjvnvsanlknc kcsmksmk ms mvskldamvks mv kv al kvmsdklmsdkl mal mklasdmf kamfksam kfmasdk mfksamdfksafeowa fpmsad lak',
            'msfkjweojfijm skmcksamepi eisjfi ojsona npsejfeji a piejfijsidjfai spfdjsidjfkjskdja kfjsdp fiejfisjd pfjsdiafjisdjfipjsdi s dfipjaiesjfijeasifjdskjksjdja sidjf pajfiaj pfsdj pidfe',
            @currDate, 100);

        SET @cnt = @cnt + 1;
    END

    SET @cnt = 0;

    WHILE @cnt < @UpdatesCount
    BEGIN
        UPDATE #tempTable SET Col5 = 101 WHERE Col1 = CAST((RAND() * @InsertsCount) AS INT);
        SET @cnt = @cnt + 1;
    END

    SET @cnt = 0;

    WHILE @cnt < @DeletesCount
    BEGIN
        DELETE FROM #tempTable WHERE Col1 = CAST((RAND() * @InsertsCount) AS INT);
        SET @cnt = @cnt + 1;
    END

    COMMIT;
END
GO
The objects used in the in-memory test are :
DROP PROCEDURE IF EXISTS CrudTest_memopt_hash;
GO
DROP SECURITY POLICY IF EXISTS tempTable_memopt_hash_SpidFilter_Policy;
GO
DROP TABLE IF EXISTS tempTable_memopt_hash;
GO
DROP FUNCTION IF EXISTS fn_SpidFilter;
GO
CREATE FUNCTION fn_SpidFilter(@SpidFilter SMALLINT)
RETURNS TABLE
WITH SCHEMABINDING, NATIVE_COMPILATION
AS
RETURN
    SELECT 1 AS fn_SpidFilter
    WHERE @SpidFilter = @@SPID;
GO
CREATE TABLE tempTable_memopt_hash
(
Col1 INT NOT NULL,
Col2 NVARCHAR(4000),
Col3 NVARCHAR(4000),
Col4 DATETIME2,
Col5 INT NOT NULL,
SpidFilter SMALLINT NOT NULL DEFAULT (@@SPID),
INDEX ix_SpidFiler NONCLUSTERED (SpidFilter),
INDEX ix_hash HASH (Col1, SpidFilter) WITH (BUCKET_COUNT=100000),
CONSTRAINT CHK_SpidFilter CHECK ( SpidFilter = @@SPID )
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO
CREATE SECURITY POLICY tempTable_memopt_hash_SpidFilter_Policy
ADD FILTER PREDICATE dbo.fn_SpidFilter(SpidFilter)
ON dbo.tempTable_memopt_hash
WITH (STATE = ON);
GO
And the stored procedure that uses them is:
CREATE PROCEDURE CrudTest_memopt_hash
    @InsertsCount INT, @UpdatesCount INT, @DeletesCount INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

    DECLARE @cnt INT = 0;
    DECLARE @currDate DATETIME2 = GETDATE();
    DECLARE @IdxStart INT = CAST((RAND() * 1000) AS INT);

    WHILE @cnt < @InsertsCount
    BEGIN
        INSERT INTO tempTable_memopt_hash (Col1, Col2, Col3, Col4, Col5)
        VALUES (@IdxStart + @cnt,
            'sdkfjsdjfksjvnvsanlknc kcsmksmk ms mvskldamvks mv kv al kvmsdklmsdkl mal mklasdmf kamfksam kfmasdk mfksamdfksafeowa fpmsad lak',
            'msfkjweojfijm skmcksamepi eisjfi ojsona npsejfeji a piejfijsidjfai spfdjsidjfkjskdja kfjsdp fiejfisjd pfjsdiafjisdjfipjsdi s dfipjaiesjfijeasifjdskjksjdja sidjf pajfiaj pfsdj pidfe',
            @currDate, 100);

        SET @cnt = @cnt + 1;
    END

    SET @cnt = 0;

    WHILE @cnt < @UpdatesCount
    BEGIN
        UPDATE tempTable_memopt_hash
        SET Col5 = 101
        WHERE Col1 = @IdxStart + CAST((RAND() * @InsertsCount) AS INT);

        SET @cnt = @cnt + 1;
    END

    SET @cnt = 0;

    WHILE @cnt < @DeletesCount
    BEGIN
        DELETE FROM tempTable_memopt_hash
        WHERE Col1 = @IdxStart + CAST((RAND() * @InsertsCount) AS INT);

        SET @cnt = @cnt + 1;
    END

    DELETE FROM tempTable_memopt_hash;
    COMMIT;
END
GO
Index stats:
table:                [dbo].[tempTable_memopt_hash]
index:                PK__tempTabl__3ED0478731BB5AF0
total_bucket_count:   131072
empty_bucket_count:   130076
empty_bucket_percent: 99
avg_chain_length:     1
max_chain_length:     3
UPDATE
I'm including my final test cases and the SQL code for creating the procedures, tables, etc. I've performed the test on an empty database.
SQL Code: https://pastebin.com/9K6SgAqZ
Test cases: https://pastebin.com/ckSTnVqA
My last run looks like this (the temp table is the fastest among the tables, but I am able to achieve the fastest times overall using a memory-optimized table variable):
Start CrudTest_TempTable 2019-11-18 10:45:02.983
Beginning execution loop
Batch execution completed 1000 times.
Finish CrudTest_TempTable 2019-11-18 10:45:09.537
Start CrudTest_SpidFilter_memopt_hash 2019-11-18 10:45:09.537
Beginning execution loop
Batch execution completed 1000 times.
Finish CrudTest_SpidFilter_memopt_hash 2019-11-18 10:45:27.747
Start CrudTest_memopt_hash 2019-11-18 10:45:27.747
Beginning execution loop
Batch execution completed 1000 times.
Finish CrudTest_memopt_hash 2019-11-18 10:45:46.100
Start CrudTest_tableVar 2019-11-18 10:45:46.100
Beginning execution loop
Batch execution completed 1000 times.
Finish CrudTest_tableVar 2019-11-18 10:45:47.497
IMHO, the test in the OP cannot show the advantages of memory-optimized tables,
because the greatest advantage of these tables is that they are lock- and latch-free: your updates/inserts/deletes take no locks at all, which permits concurrent changes to these tables.
But the test does not include concurrent changes at all; the code shown makes all the changes in one session.
Another observation: the hash index defined on the table is wrong for this workload, because you search on only one column while the hash index is defined on two columns. A hash index on two columns means the hash function is applied to both arguments; since you search on only one column, the hash index simply cannot be used.
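For example, one option (a sketch; the index name is illustrative and the bucket count is carried over from the original definition) is to add a single-column hash index so that point lookups on Col1 can actually use it:

ALTER TABLE tempTable_memopt_hash
ADD INDEX ix_hash_col1 HASH (Col1) WITH (BUCKET_COUNT = 100000);

Alternatively, declare the hash index on Col1 alone in the CREATE TABLE statement.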
Do you think by using mem opt tables I can get performance improvements over temp tables, or is it just for limiting IO on tempdb?
Memory-optimized tables are not supposed to substitute for temporary tables; as already mentioned, you'll see the profit in a highly concurrent OLTP environment, while, as you note, a temporary table is visible only to your session, so there is no concurrency at all.
Eliminate latches and locks. All In-Memory OLTP internal data structures are latch- and lock-free. In-Memory OLTP uses a new multi-version concurrency control (MVCC) to provide transaction consistency. From a user standpoint, it behaves in a way similar to the regular SNAPSHOT transaction isolation level; however, it does not use locking under the hood. This schema allows multiple sessions to work with the same data without locking and blocking each other and improves the scalability of the system allowing fully utilize modern multi-CPU/multi-core hardware.
Cited book: Pro SQL Server Internals by Dmitri Korotkevitch
What do you think about the title "Faster temp table and table variable by using memory optimization"?
I opened this article and see these examples (in the order they appear in the article):
A. Basics of memory-optimized table variables
B. Scenario: Replace global tempdb ##table
C. Scenario: Replace session tempdb #table
A. I use table variables only in cases where they contain very few rows. Why should I even care about these few rows?
B. Replace global tempdb ##table. I just don't use them at all.
C. Replace session tempdb #table. As already mentioned, a session tempdb #table is not visible to any other session, so what is the gain? That the data doesn't go to disk? Maybe you should think about a faster SSD for your tempdb if you really have problems with tempdb. Starting with SQL Server 2014, tempdb objects don't necessarily go to disk even in the case of bulk inserts; in any case, I even have RCSI enabled on my databases and have no problems with tempdb.
You will likely not see a performance improvement, except in very special applications. SQL Server dabbled with things like 'pin table' in the past, but the optimizer, choosing which pages are in memory based on real activity, is probably as good as it gets for almost all cases. This has been performance-tuned over decades. I think 'in memory' is more a marketing touchpoint than of any practical use. Prove me wrong, please.
In SQL Server, I have a table (dbo.MYTABLE) containing 12 million rows and using 5.6 GB of storage.
I need to update a varchar(150) field for each record.
I perform the UPDATE operation in a WHILE loop, updating 50K rows in each iteration.
The transaction log does not seem to be freed after each iteration and keeps growing.
Even after the whole UPDATE process has finished, the transaction log space is not returned.
My question is: why does the used transaction log space never decrease, even after the UPDATE is complete? Code is below:
DECLARE @MAX_COUNT INT;
DECLARE @COUNT INT;
DECLARE @INC INT;

SET @COUNT = 0;
SET @INC = 50000;

SELECT @MAX_COUNT = MAX(ID) FROM dbo.MYTABLE (NOLOCK)

WHILE @COUNT <= @MAX_COUNT
BEGIN
    UPDATE dbo.MYTABLE
    SET NEW_NAME = REPLACE(NAME, ' X', 'Y')
    WHERE ID > @COUNT AND ID <= (@COUNT + @INC)

    SET @COUNT = (@COUNT + @INC)
END
If you don't take transaction log backups, switch to the SIMPLE recovery model:
USE [master]
GO
ALTER DATABASE [YOUR_DB] SET RECOVERY SIMPLE WITH NO_WAIT
GO
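If the log space still is not reused after that, it can help to check what is holding truncation back and how full the log actually is (a diagnostic sketch; replace YOUR_DB with your database name):

SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'YOUR_DB';

DBCC SQLPERF(LOGSPACE);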
I am working on a client's database and there are about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:
DELETE FROM table_1 where condition1 = 'value' ?
Here is a structure for a batched delete as suggested above. Do not try 1M at once...
The size of the batch and the WAITFOR delay are obviously quite variable and depend on your server's capabilities, as well as your need to mitigate contention. You may need to manually delete some rows, measure how long they take, and adjust your batch size to something your server can handle. As mentioned above, anything over 5000 rows can trigger lock escalation to a table lock (which I was not aware of).
This would be best done after hours... but 1M rows is really not a lot for SQL Server to handle. If you watch your messages in SSMS, it may take a while for the print output to show, but it will appear after several batches; just be aware it won't update in real time.
Edit: Added a stop time @MAXRUNTIME & @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production at 8 AM.
Edit: Answer is pretty popular, so I have added the RAISERROR in lieu of PRINT per comments.
DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)

SET DEADLOCK_PRIORITY LOW;

SET @BATCHSIZE = 4000
SET @WAITFORVAL = '00:00:10'
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS

WHILE @BATCHSIZE > 0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8), GETDATE(), 108) >= @MAXRUNTIME AND @BSTOPATMAXTIME = 1
    BEGIN
        RETURN
    END

    DELETE TOP (@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- placeholder: replace with your delete condition

    SET @BATCHSIZE = @@ROWCOUNT
    SET @ITERATION = @ITERATION + 1
    SET @TOTALROWS = @TOTALROWS + @BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)

    RAISERROR (@MSG, 0, 1) WITH NOWAIT

    WAITFOR DELAY @WAITFORVAL
END
BEGIN TRANSACTION

DoAgain:
DELETE TOP (1000)
FROM <YourTable>

IF @@ROWCOUNT > 0
    GOTO DoAgain

COMMIT TRANSACTION
Maybe this solution from Uri Dimant:
WHILE 1 = 1
BEGIN
DELETE TOP(2000)
FROM Foo
WHERE <predicate>;
IF @@ROWCOUNT < 2000 BREAK;
END
(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)
Here is something I have used:
If the bad data is mixed in with the good-
INSERT INTO #table
SELECT columns
FROM old_table
WHERE statement to exclude bad rows
TRUNCATE old_table
INSERT INTO old_table
SELECT columns FROM #table
Not sure how good this would be, but what if you do the following (provided table_1 is a standalone table, i.e. not referenced by any other table):
create a duplicate table of table_1, e.g. table_1_dup
INSERT INTO table_1_dup SELECT * FROM table_1 WHERE condition1 <> 'value';
DROP TABLE table_1;
EXEC sp_rename 'table_1_dup', 'table_1';
If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table
If you are in a hurry and need the fastest way possible:
take the database out of production
drop all non-clustered indexes and triggers (or disable them and rebuild/re-enable them afterwards; see the sketch after this list)
delete the records (or if the majority of records is bad, copy+drop+rename the table)
(if applicable) fix the inconsistencies caused by the fact that you dropped triggers
re-create the indexes and triggers
bring the database back in production
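A minimal sketch of the disable/re-enable variant mentioned in the list above (IX_table_1_condition1 is a hypothetical non-clustered index name):

-- take the non-clustered index and the triggers out of play
ALTER INDEX IX_table_1_condition1 ON dbo.table_1 DISABLE;
DISABLE TRIGGER ALL ON dbo.table_1;

-- delete the records here (ideally in batches, as shown above)

-- put them back; REBUILD re-enables a disabled index
ALTER INDEX IX_table_1_condition1 ON dbo.table_1 REBUILD;
ENABLE TRIGGER ALL ON dbo.table_1;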
I am running the following stored procedure to delete a large number of records. I understand that the DELETE statement writes to the transaction log and deleting many rows will make the log grow.
I have looked into the alternative of creating a new table, inserting the records to keep, and then truncating the source, but this method will not work for me.
How can I make my stored procedure below more efficient while making sure that I keep the transaction log from growing unnecessarily?
CREATE PROCEDURE [dbo].[ClearLog]
(
    @Age int = 30
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- DELETE ERRORLOG
    WHILE EXISTS ( SELECT [LogId] FROM [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age )
    BEGIN
        SET ROWCOUNT 10000
        DELETE [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age
        WAITFOR DELAY '00:00:01'
        SET ROWCOUNT 0
    END
END
Here is how I would do it:
CREATE PROCEDURE [dbo].[ClearLog] (
    @Age int = 30)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @d DATETIME
          , @batch INT;

    SET @batch = 10000;
    SET @d = DATEADD( dd, -@Age, GETDATE() );

    WHILE (1=1)
    BEGIN
        DELETE TOP (@batch) [dbo].[Error_Log]
        WHERE [Timestamp] < @d;

        IF (0 = @@ROWCOUNT)
            BREAK
    END
END
Make the Timestamp comparison SARGable
Evaluate GETDATE() once at the start of the batch to produce a consistent run (otherwise it can get stuck in an effectively infinite loop as new records 'age' while the old ones are being deleted).
use TOP instead of SET ROWCOUNT (deprecated: Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server.)
check ##ROWCOUNT to break the loop instead of redundant SELECT
Assuming you have the option of rebuilding the error log table on a partition scheme one option would be to partition the table on date and swap out the partitions. Do a google search for 'alter table switch partition' to dig a bit further.
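A minimal sketch of that approach, assuming Error_Log has been rebuilt on a partition scheme keyed on [TimeStamp] and Error_Log_Staging is a hypothetical empty table with an identical structure on the same filegroup:

-- switching a partition out is a metadata-only operation,
-- so the old rows leave Error_Log almost instantly
ALTER TABLE dbo.Error_Log
SWITCH PARTITION 1 TO dbo.Error_Log_Staging;

-- then remove them cheaply without row-by-row logging
TRUNCATE TABLE dbo.Error_Log_Staging;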
How about running it more often and deleting fewer rows each time? Run this every 30 minutes:
CREATE PROCEDURE [dbo].[ClearLog]
(
    @Age int = 30
)
AS
BEGIN
    SET NOCOUNT ON;

    SET ROWCOUNT 10000 -- I assume you are on an old version of SQL Server and can't use TOP
    DELETE dbo.Error_Log WHERE Timestamp > GETDATE() - @Age
    WAITFOR DELAY '00:00:01' -- why???
    SET ROWCOUNT 0
END
The way it handles the dates does not truncate the time portion, so you will only delete about 30 minutes' worth of data each time.
If your database is in the FULL recovery model, the only way to minimize the impact of your delete statements is to "space them out" -- only delete so many during a "transaction interval". For example, if you take t-log backups every hour, only delete, say, 20,000 rows per hour. That may not drop everything you need all at once, but will things even out after 24 hours, or after a week?
If your database is in SIMPLE or BULK_LOGGED mode, breaking the deletes into chunks should do it. But since you're already doing that, I'd have to guess your database is in the FULL recovery model. (That, or the connection calling the procedure may be part of a transaction.)
A solution I have used in the past was to temporarily set the recovery model to "Bulk Logged", then back to "Full" at the end of the stored procedure:
DECLARE @dbName NVARCHAR(128);
SELECT @dbName = DB_NAME();

EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY BULK_LOGGED')

WHILE EXISTS (...)
BEGIN
    -- Delete a batch of rows, then WAITFOR here
END

EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY FULL')
This will significantly reduce the transaction log consumption for large batches.
I don't like that it sets the recovery model for the whole database (not just for this session), but it's the best solution I could find.