How to loop and/or wait for delay - sql-server

I have imported over 400 million records into a dummy dimension table. I need to take my existing fact table and join it to the dummy dimension to perform an update on the fact table. To avoid filling up the transaction log, somebody suggested I perform a loop to update these records instead of updating hundreds of millions of records at once. I have researched loops and WAITFOR DELAY, but I am not sure of the best approach for writing the logic.
Here is the sample update I need to perform:
UPDATE f
SET f.value_key = r.value_key
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r
    ON f.some_key = r.some_Key
    AND r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
  AND f.date_key < 20141201
  AND f.diff_key = 17
If anybody has a suggestion on the best way to write this, I would really appreciate it.

To avoid filling up the transaction log you CAN set your recovery model to SIMPLE on your dev machines - that will prevent transaction log bloat when transaction log backups aren't being done.
ALTER DATABASE MyDB SET RECOVERY SIMPLE;
If you want to perform your update faster, use a table hint, e.g. WITH (TABLOCK).
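For illustration, here is the question's update with the hint applied (a sketch only; TABLOCK takes a table-level lock, so it trades concurrency for speed):
UPDATE f
SET f.value_key = r.value_key
FROM [dbo].[FACT_Table] f WITH (TABLOCK)
INNER JOIN dbo.dummy_table r
    ON f.some_key = r.some_Key
    AND r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
  AND f.date_key < 20141201
  AND f.diff_key = 17;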

Please don't do what the previous person suggested unless you really understand what else will happen. The most important consequence is that you lose the ability to do a point-in-time recovery. If you take a full backup every night and a transaction log backup every hour (or every 15 minutes), switching to the SIMPLE recovery model breaks the chain, and you can only recover to the time of the last full backup.
If you do it, you have to switch to SIMPLE, do the work, switch back to FULL, take a full backup, and then resume log backups on a schedule. What the previous person suggests is like driving without bumpers to save on car weight. Sounds great until you hit something.
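A minimal sketch of that safe sequence (MyDB and the backup path are placeholders):
ALTER DATABASE MyDB SET RECOVERY SIMPLE;

-- ... perform the bulk update here ...

ALTER DATABASE MyDB SET RECOVERY FULL;

-- Take a full backup immediately to restart the log backup chain.
BACKUP DATABASE MyDB TO DISK = 'D:\Backups\MyDB_full.bak';

-- Scheduled transaction log backups can resume from this point.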

-- Run this section of code one time to create a global queue containing only
-- the columns needed to identify a unique row to process. Update the SELECT
-- statement as necessary.
IF OBJECT_ID('Tempdb.dbo.##GlobalQueue') IS NOT NULL
    DROP TABLE ##GlobalQueue;

SELECT f.some_key, f.calendar_key
INTO ##GlobalQueue
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r
    ON f.some_key = r.some_Key
    AND r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
  AND f.date_key < 20141201
  AND f.diff_key = 17;

-- Copy/paste the SQL below to run from multiple sessions/clients if you want
-- to process things faster than a single session can. Start with 1 session
-- and add sessions from there as needed to ramp up processing load.
SELECT TOP (0) * INTO #RowsToProcess FROM ##GlobalQueue; -- empty per-session work table

WHILE 1 = 1
BEGIN
    -- Feel free to raise the number depending on how big of a 'bite' you
    -- want each iteration to take.
    DELETE TOP ( 10000 )
    FROM ##GlobalQueue WITH ( READPAST ) -- READPAST lets concurrent sessions skip rows locked by each other
    OUTPUT Deleted.*
    INTO #RowsToProcess;

    IF @@ROWCOUNT > 0
    BEGIN
        UPDATE f
        SET f.value_key = r.value_key
        FROM [dbo].[FACT_Table] f
        INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
        INNER JOIN #RowsToProcess RTP
            ON f.some_key = RTP.some_key
            AND f.calendar_key = RTP.calendar_key;

        TRUNCATE TABLE #RowsToProcess;
        --WAITFOR DELAY '00:01' -- Commented out the 1-minute delay, as running
        -- multiple sessions to process the records eliminates the need for it.
        -- It is left here for demonstration, as it may be useful elsewhere.
    END
    ELSE
        BREAK;
END

Use TOP and <>. In an UPDATE you must use TOP (n). TOP caps the size of each batch, and the <> filter makes already-updated rows drop out of every subsequent batch, so the loop terminates once nothing is left to change.
WHILE 1 = 1
BEGIN
    UPDATE TOP (100) [test].[dbo].[Table_1]
    SET lname = 'bname'
    WHERE lname <> 'bname';

    IF @@ROWCOUNT = 0 BREAK;
END
Applied to the fact table update (the IS NULL branch covers rows where value_key has not been populated yet, which a bare <> would silently skip):
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) f
    SET f.value_key = r.value_key
    FROM [dbo].[FACT_Table] f
    INNER JOIN dbo.dummy_table r
        ON f.some_key = r.some_Key
        AND r.calendar_key = f.calendar_key
        AND f.date_Key > 20130101
        AND f.date_key < 20141201
        AND f.diff_key = 17
        AND ( f.value_key <> r.value_key OR f.value_key IS NULL );

    IF @@ROWCOUNT = 0 BREAK;
END
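Since the question also asks about WAITFOR DELAY: to throttle the loop instead of running it flat out (for example, so transaction log backups and other sessions can keep up), a delay can go between batches. A minimal sketch using the toy table above; the 5-second interval is an arbitrary assumption:
WHILE 1 = 1
BEGIN
    UPDATE TOP (100) [test].[dbo].[Table_1]
    SET lname = 'bname'
    WHERE lname <> 'bname';

    IF @@ROWCOUNT = 0 BREAK;

    WAITFOR DELAY '00:00:05'; -- pause 5 seconds before the next batch
END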

Related

MSSQL - IF within a TRIGGER

We're just migrating from MariaDB (Galera) to MSSQL.
One of our applications has a very special behaviour - from time to time (I have not found a pattern; the vendor uses very fancy AI-related stuff which no one can debug :-/) it blocks the monitor user of our load balancers because of too many connects, so the load balancer is no longer able to get the health state, suspends all services on all servers, and the whole service goes down.
So I wrote a trigger which re-enables this user after it has been disabled.
I've already thought about a constraint which prohibits this, but then the application goes nuts when it tries to disable the user.
Anyway - in MySQL this works perfectly for us:
delimiter $$
CREATE TRIGGER f5mon_no_disable AFTER UPDATE ON dpa_web_user
FOR EACH ROW
BEGIN
IF NEW.user_id = '9999999' AND NEW.enabled = 0 THEN
UPDATE dpa_web_user SET enabled = 1 WHERE user_id = '9999999';
END IF;
END$$
delimiter ;
I tried this in T-SQL (if it's important, it is MSSQL 2016):
CREATE TRIGGER f5mon_no_disable ON [dbo].[dpa_web_user]
AFTER UPDATE
AS
BEGIN
IF ( inserted.[user_id] = '9999999' AND inserted.[enabled] = 0 )
BEGIN
UPDATE dpa_web_user SET enabled = 1 WHERE user_id = '9999999';
END
END
I think it's the IF statement which is totally wrong in more than one way - but I have no idea what the correct syntax is in T-SQL.
Thanks in advance for your help.
You can't reference column values in inserted like that - inserted is a pseudo-table that can hold many rows, so you need set-based access to it. You can use IF EXISTS:
IF EXISTS (SELECT 1 FROM inserted WHERE [user_id] = '9999999' AND [enabled] = 0)
BEGIN
UPDATE dpa_web_user SET enabled = 1 WHERE user_id = '9999999';
END
You may want to add AND enabled <> 1 to prevent updating a row for no reason.
You can do this in a single statement though:
UPDATE u
SET enabled = 1
FROM dbo.dpa_web_user AS u
INNER JOIN inserted AS i
ON u.[user_id] = i.[user_id]
WHERE u.[user_id] = '9999999'
AND i.[user_id] = '9999999'
AND u.enabled <> 1
AND i.enabled = 0;

TSQL Cancel long running UPDATE without rolling back

I have no BEGIN / END and no explicit BEGIN TRANSACTION.
UPDATE a
SET [fk_company] = b.pk_company --INT
FROM [rpt].[Symbol_Index] a
JOIN lu.company b
    ON a.[fmp_Y_SeriesSymbol] = b.[ticker] --VARCHAR(18)
I'm replacing a logical key of type VARCHAR(18) with an INT, assuming this to be more efficient for joins. This is the first step of the key reset.
There are about 5 million rows, and 16 hours seems too long. This is a case where I discover how un-DBA I really am.
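The batched TOP (n) pattern from the first question applies here as well; because each batch commits on its own (there is no surrounding transaction), cancelling mid-run only rolls back the current batch rather than hours of work. A sketch, under the assumption that fk_company starts out NULL or stale so that finished rows drop out of each batch:
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) a
    SET [fk_company] = b.pk_company
    FROM [rpt].[Symbol_Index] a
    JOIN lu.company b
        ON a.[fmp_Y_SeriesSymbol] = b.[ticker]
    WHERE a.[fk_company] IS NULL
       OR a.[fk_company] <> b.pk_company;

    IF @@ROWCOUNT = 0 BREAK;
END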

SQL Trigger Works in Play but not Production

I created an SQL trigger in my Play database and it worked great. When I moved it over to Production, it suddenly won't work. We want the trigger to kick off whenever someone edits one of two custom fields in our database. The company who created the software already set up a trigger that kicks off any time a change is made to the database object (it just didn't track the changes made to custom fields). If I let my new trigger create a new record, I wound up with two audit records, so I changed my trigger to update the audit record that the software company's trigger created. Could anyone tell me what I have done wrong? Here is my trigger:
USE [TmsEPrd]
GO
/****** Object: Trigger [dbo].[tr_Biograph_Udef_Audit_tracking] Script Date: 11/23/2020 10:22:57 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[tr_Biograph_Udef_Audit_tracking] ON [dbo].[BIOGRAPH_MASTER] FOR UPDATE AS
BEGIN
    IF EXISTS (SELECT 1
               FROM deleted d
               JOIN inserted i ON d.ID_NUM = i.ID_NUM
               JOIN (SELECT ID_NUM, BINARY_CHECKSUM(UDEF_10A_1, UDEF_2A_4) AS inserted_checksum
                     FROM inserted) a ON i.ID_NUM = a.ID_NUM
               JOIN (SELECT ID_NUM, BINARY_CHECKSUM(UDEF_10A_1, UDEF_2A_4) AS deleted_checksum
                     FROM deleted) b ON d.ID_NUM = b.ID_NUM
               WHERE a.inserted_checksum <> b.deleted_checksum)
    BEGIN
        UPDATE BIOGRAPH_HISTORY
        SET archive_job_name = 'UDEF_Change',
            udef_2a_4 = i.udef_2a_4,
            udef_2a_4_CHG = i.udef_2a_4_chg,
            udef_10a_1 = i.udef_10a_1,
            udef_10a_1_chg = i.udef_10a_1_chg
        FROM (SELECT i.ID_NUM, SYSDATETIME() AS job_time_a,
                     i.UDEF_10A_1,
                     CASE WHEN i.UDEF_10A_1 = d.UDEF_10A_1 THEN 0
                          WHEN i.UDEF_10A_1 IS NULL AND d.UDEF_10A_1 IS NULL THEN 0
                          ELSE 1 END AS UDEF_10A_1_CHG,
                     i.UDEF_2A_4,
                     CASE WHEN i.UDEF_2A_4 = d.UDEF_2A_4 THEN 0
                          WHEN i.UDEF_2A_4 IS NULL AND d.UDEF_2A_4 IS NULL THEN 0
                          ELSE 1 END AS UDEF_2A_4_CHG,
                     d.USER_NAME, d.JOB_NAME, d.JOB_TIME
              FROM deleted d
              JOIN inserted i ON d.ID_NUM = i.ID_NUM) i
        JOIN BIOGRAPH_HISTORY b ON i.ID_NUM = b.ID_NUM
        WHERE DATEDIFF(MINUTE, i.job_time_a, b.ARCHIVE_JOB_TIM) = 0
          AND b.ARCHIVE_JOB_NAME NOT LIKE 'UDEF_Change%'
    END;
END;
Try specifying @order = 'Last' for your trigger using sp_settriggerorder. It might be that your trigger is executing first and not finding a record to update. In your test system, the trigger execution order might be reversed.
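A sketch of the call (the trigger name comes from the question; 'UPDATE' is the statement type this trigger is defined for):
EXEC sp_settriggerorder
    @triggername = N'dbo.tr_Biograph_Udef_Audit_tracking',
    @order = 'Last',
    @stmttype = 'UPDATE';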
The order in which triggers are created might affect trigger execution order, but this is not something to rely upon. When you think about it, this can be a headache: a test system that looks just like production can behave differently.
This is similar to relying upon the "natural" record order of a clustered index instead of using an ORDER BY clause. A different execution plan can use a different index or go parallel, resulting in a different order or no order at all.

How can I avoid SQL deadlock during read/update in stored procedure

I have a procedure used to push queued changes to another system. The procedure was created several years ago, with a lot of thought, to prevent deadlocks during its execution. In the past week we started seeing deadlocks, but can't identify a pattern. When deadlocks start occurring, it can go on for more than a day or just a few hours. Low-activity periods have so far shown few or no deadlocks, while high-activity periods have ranged from no deadlocks to thousands. One 12-hour period with consistently high activity had no deadlocks; a slower 12-hour period had over 4,000.
What I've found in existing posts makes me think I should be happy with what I have and not worry about deadlocks, but the lack of any pattern in frequency or occurrence is hard to accept. Without a pattern, I can't duplicate the problem in our test environment.
Here is the procedure where the deadlock is occurring. We've had as many as 30 services calling this procedure to process changes. Today we are slow and are running only five services (on five separate servers), but deadlocks continue.
ALTER PROCEDURE [dbo].[retrieveAndLockTransactionChangeLogItems]
@serverName VARCHAR(50)
AS
--Create a common-table-expression to represent the data we're looking
--for. In this case, all the unprocessed, unlocked items for a
--transaction.
WITH cte AS
(
SELECT t.*
FROM TransactionChangeLog t WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE t.intTransactionID =
(SELECT TOP 1 t1.intTransactionID
FROM TransactionChangeLog t1 WITH (ROWLOCK, READPAST)
INNER JOIN TransactionServices s ON
s.intTransactionID = t1.intTransactionID
WHERE s.ProductionSystemDefinitionID > 2 --SoftPro v4
AND t1.bitProcessed = 0
AND t1.bitLocked = 0
AND t1.intSourceID = 1
AND NOT EXISTS(SELECT t2.intChangeID
FROM TransactionChangeLog t2 WITH (ROWLOCK, READPAST)
WHERE t2.intTransactionID = t1.intTransactionID
AND t2.bitLocked = 1)
AND NOT EXISTS (SELECT ts.intServiceID
FROM TransactionServices ts
WHERE ts.intTransactionID = t1.intTransactionID
AND ts.intStatusID in (0, 5, 6, 7))
ORDER BY t1.Priority, t1.dtmLastUpdatedDate, t1.dtmChangeDate)
AND t.bitProcessed = 0
)
--Now update those records to Locked.
UPDATE cte
SET bitLocked = 1,
dtmLastUpdatedDate = GETUTCDATE(),
dtmLockedDate = GETUTCDATE(),
LockedBy = @serverName
--Use the OUTPUT clause to return the affected records.
--We do this instead of a SELECT then UPDATE because it acts as a
--single operation and we can avoid any deadlocks.
OUTPUT
Inserted.intChangeID AS 'ChangeID',
Inserted.intSourceID AS 'SourceID',
Inserted.bnyChangeData AS 'ChangeData',
Inserted.intChangeDataTypeID AS 'ChangeDataTypeID',
Inserted.intTransactionID AS 'TransactionID',
Inserted.intUserID AS 'UserID',
Inserted.dtmChangeDate AS 'ChangeDate',
Inserted.bitLocked AS 'Locked',
Inserted.dtmLockedDate AS 'LockedDate',
Inserted.bitProcessed AS 'Processed',
Inserted.dtmLastUpdatedDate AS 'LastUpdatedDate',
Inserted.Retries,
Inserted.LockedBy,
Inserted.ServiceID,
Inserted.Priority
Based on its history before the past week, I would expect no deadlocks from this procedure. At this point I would just be happy understanding why/how we are now getting deadlocks.

SQL Server - Multiple select queries hit performance

Recently I ran into an issue where multiple concurrent client requests were causing a performance problem in the db. I set up a test scenario and, as it turned out, when I run the same SELECT query 6 to 7 times concurrently (it gets worse with more), performance degrades and execution takes a lot of time. This is the query I tried:
SELECT TOP (100) COUNT(DISTINCT([Doc_Number])) AS "Expression"
FROM (
SELECT *
FROM "dbo"."Dummy_Table" "table_alias"
WHERE ((CAST("table_alias"."ID" AS NVARCHAR)) NOT IN
(
SELECT "PrimaryKey" AS ExceptionKey
FROM dbo.exceptions inner_exceptionStatus
LEFT JOIN dbo.Workflow inner_workflowStates ON
(inner_exceptionStatus."Status"= inner_workflowStates."UUID" AND
inner_exceptionStatus."UUID"= 'CA1662D6-73A2-4692-A765-E7E3EDB66062')
WHERE ("inner_workflowStates"."RemoveFromRecordSet" = 1 AND
"inner_workflowStates"."IsDeleted" = 0) AND
("inner_exceptionStatus"."IsArchived" IS NULL OR
"inner_exceptionStatus"."IsArchived" = 0)))) wrapperQuery
The query takes around 1 second when run alone. But if we run copies in parallel, each one takes a weirdly long time or hits a timeout.
The only thing that bothers me here is that a SELECT query should be non-blocking, and even with shared locks they should get along easily.
I am not sure if there is anything wrong in the query that adds to the situation.
Any help is deeply appreciated!
Try it this way:
SELECT Count(DISTINCT( [Doc_Number] )) AS Expression
FROM dbo.Dummy_Table table_alias
WHERE NOT EXISTS (SELECT 1
FROM dbo.exceptions inner_exceptionStatus
INNER JOIN dbo.Workflow inner_workflowStates
ON ( inner_exceptionStatus.Status = inner_workflowStates.UUID
AND inner_exceptionStatus.UUID = 'CA1662D6-73A2-4692-A765-E7E3EDB66062' )
WHERE inner_workflowStates.RemoveFromRecordSet = 1
AND inner_workflowStates.IsDeleted = 0
AND ( inner_exceptionStatus.IsArchived IS NULL
OR inner_exceptionStatus.IsArchived = 0 )
AND table_alias.ID = PrimaryKey)
I made a couple of changes:
Changed NOT IN to NOT EXISTS.
Removed the CAST on "table_alias"."ID" because it prevents the use of any index on the "table_alias"."ID" column. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single record anyway.
If the query is still running slowly, you need to post the execution plan and make sure the statistics are up to date.
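For example, a quick sketch for refreshing statistics on the tables involved (FULLSCAN gives the most accurate estimates but can be expensive on large tables):
UPDATE STATISTICS dbo.Dummy_Table WITH FULLSCAN;
UPDATE STATISTICS dbo.exceptions WITH FULLSCAN;
UPDATE STATISTICS dbo.Workflow WITH FULLSCAN;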
You can simplify your query like this:
SELECT COUNT(DISTINCT(Doc_Number)) AS Expression
FROM dbo.Dummy_Table dmy
WHERE not exists
(
SELECT *
FROM dbo.exceptions ies
INNER JOIN dbo.Workflow iws ON ies.Status= iws.UUID AND ies.UUID= 'CA1662D6-73A2-4692-A765-E7E3EDB66062'
WHERE iws.RemoveFromRecordSet = 1 AND iws.IsDeleted = 0 AND (ies.IsArchived IS NULL OR ies.IsArchived = 0)
and dmy.ID=PrimaryKey
)
As prdp says:
Changed NOT IN to NOT EXISTS.
Removed the CAST on "table_alias"."ID" because it prevents the use of any index on that column. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single record anyway.
I would add:
Removed your derived table wrapperQuery.
You can use INNER JOIN because the WHERE clause tests RemoveFromRecordSet = 1, which filters out the NULL rows anyway.
Removed unnecessary quotes, brackets, and parentheses in the WHERE clause.
