TSQL Cancel long running UPDATE without rolling back - sql-server

I have no BEGIN/END and no explicit BEGIN TRANSACTION.
UPDATE [rpt].[Symbol_Index]
SET [fk_company] = b.pk_company --INT
FROM [rpt].[Symbol_Index] A
JOIN lu.company b
ON a.[fmp_Y_SeriesSymbol] = b.[ticker] --VARCHAR(18)
I'm replacing a logical key of type VARCHAR(18) with an INT, assuming this will be more efficient for joins. This is the first step of the key reset.
There are about 5 million rows, and 16 hours seems too long. This is a case where I'm discovering how un-DBA I really am.
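For what it's worth, each statement runs in its own auto-commit transaction when there is no explicit BEGIN TRANSACTION, so cancelling the 16-hour UPDATE still rolls the whole statement back. One common way to keep the rollback exposure small is to commit in batches; a minimal sketch using the question's table and column names (the batch size and the assumption that not-yet-reset rows have a NULL [fk_company] are mine):
WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) a                  -- batch size is a guess; tune it
    SET a.[fk_company] = b.pk_company
    FROM [rpt].[Symbol_Index] a
    JOIN lu.company b
        ON a.[fmp_Y_SeriesSymbol] = b.[ticker]
    WHERE a.[fk_company] IS NULL;         -- assumption: rows not yet reset are NULL

    IF @@ROWCOUNT = 0 BREAK;              -- each batch auto-commits, so a cancel only loses the in-flight batch
END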

Related

How can I avoid SQL deadlock during read/update in stored procedure

I have a procedure used to push queued changes to another system. The procedure was created several years ago, with a lot of thought, to prevent deadlocks during its execution. In the past week we started seeing deadlocks, but can't identify a pattern. When deadlocks start occurring, it can go on for more than a day or just a few hours. Low activity periods, thus far, have shown few or no deadlocks, but high activity periods have shown anywhere from no deadlocks to thousands of deadlocks. A 12-hour period with consistently high activity had no deadlocks; a slower 12-hour period had over 4,000.
What I've found on existing posts makes me think I should be happy with what I have and not worry about deadlocks, but the lack of any pattern to frequency/occurrence is hard to accept. Without a pattern, I can't duplicate the problem in our test environment.
Here is the procedure where the deadlock is occurring. We've had as many as 30 services calling this procedure to process changes. Today we are slow and are only running five services (on five separate servers), but deadlocks continue.
ALTER PROCEDURE [dbo].[retrieveAndLockTransactionChangeLogItems]
@serverName VARCHAR(50)
AS
--Create a common-table-expression to represent the data we're looking
--for. In this case, all the unprocessed, unlocked items for a
--transaction.
WITH cte AS
(
SELECT t.*
FROM TransactionChangeLog t WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE t.intTransactionID =
(SELECT TOP 1 t1.intTransactionID
FROM TransactionChangeLog t1 WITH (ROWLOCK, READPAST)
INNER JOIN TransactionServices s ON
s.intTransactionID = t1.intTransactionID
WHERE s.ProductionSystemDefinitionID > 2 --SoftPro v4
AND t1.bitProcessed = 0
AND t1.bitLocked = 0
AND t1.intSourceID = 1
AND NOT EXISTS(SELECT t2.intChangeID
FROM TransactionChangeLog t2 WITH (ROWLOCK, READPAST)
WHERE t2.intTransactionID = t1.intTransactionID
AND t2.bitLocked = 1)
AND NOT EXISTS (SELECT ts.intServiceID
FROM TransactionServices ts
WHERE ts.intTransactionID = t1.intTransactionID
AND ts.intStatusID in (0, 5, 6, 7))
ORDER BY t1.Priority, t1.dtmLastUpdatedDate, t1.dtmChangeDate)
AND t.bitProcessed = 0
)
--Now update those records to Locked.
UPDATE cte
SET bitLocked = 1,
dtmLastUpdatedDate = GETUTCDATE(),
dtmLockedDate = GETUTCDATE(),
LockedBy = @serverName
--Use the OUTPUT clause to return the affected records.
--We do this instead of a SELECT then UPDATE because it acts as a
--single operation and we can avoid any deadlocks.
OUTPUT
Inserted.intChangeID AS 'ChangeID',
Inserted.intSourceID AS 'SourceID',
Inserted.bnyChangeData AS 'ChangeData',
Inserted.intChangeDataTypeID AS 'ChangeDataTypeID',
Inserted.intTransactionID AS 'TransactionID',
Inserted.intUserID AS 'UserID',
Inserted.dtmChangeDate AS 'ChangeDate',
Inserted.bitLocked AS 'Locked',
Inserted.dtmLockedDate AS 'LockedDate',
Inserted.bitProcessed AS 'Processed',
Inserted.dtmLastUpdatedDate AS 'LastUpdatedDate',
Inserted.Retries,
Inserted.LockedBy,
Inserted.ServiceID,
Inserted.Priority
Based on history, before the past week, I expect no deadlocks from this procedure. At this point I would just be happy understanding why/how we are now getting deadlocks.
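Not an answer, but a starting point for diagnosis: SQL Server already records deadlock graphs in the built-in system_health Extended Events session, so you can pull the graphs for the affected period and see which statements, rows, and indexes are actually involved. A sketch (the file pattern may need adjusting for your instance):
-- Pull recent deadlock graphs from the system_health session
SELECT CAST(event_data AS xml) AS deadlock_graph
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report';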

How do I set the correct transaction level?

I am using Dapper on ADO.NET. So at present I am doing the following:
using (IDbConnection conn = new SqlConnection("MyConnectionString"))
{
    conn.Open();
    using (IDbTransaction transaction = conn.BeginTransaction())
    {
        // ...
However, there are various transaction isolation levels that can be set; I believe these are the settings in question.
My first question is how do I set the transaction level (where I am using Dapper)?
My second question is what is the correct level for each of the following cases? In each of these cases we have multiple instances of a web worker (Azure) service running that will be hitting the DB at the same time.
I need to run monthly charges on subscriptions. So in a transaction I need to read a record and if it's due for a charge create the invoice record and mark the record as processed. Any other read of that record for the same purpose needs to fail. But any other reads of that record that are just using it to verify that it is active need to succeed.
So what transaction do I use for the access that will be updating the processed column? And what transaction do I use for the other access that just needs to verify that the record is active?
In this case it's fine if a conflict causes the charge to not be run (we'll get it the next day). But it is critical that we not charge someone twice. And it is critical that the read to verify that the record is active succeed immediately while the other operation is in its transaction.
I need to update a record where I am setting just a couple of columns. One use case is I set a new password hash for a user record. It's fine if other access occurs during this except for deleting the record (I think that's the only problem use case). If another web service is also updating that's the user's problem for doing this in 2 places simultaneously.
But it's key that the record stays consistent. This includes the use case of "SET NumUses = NumUses + @ParamNum", so it needs to treat the read, calculation, and write of the column value as an atomic action. And if I am setting 3 column values, they all get written together.
1) Assuming that the invoicing process is a stored procedure with multiple statements, your best bet is to create a separate "lock" table to record the fact that the invoicing job is already running, e.g.:
CREATE TABLE InvoicingJob( JobStarted DATETIME, IsRunning BIT NOT NULL )
-- Table will only ever have one record
INSERT INTO InvoicingJob
SELECT NULL, 0
EXEC InvoicingProcess
ALTER PROCEDURE InvoicingProcess
AS
BEGIN
DECLARE @InvoicingJob TABLE( IsRunning BIT )
-- Try to acquire the lock
UPDATE InvoicingJob WITH( TABLOCK )
SET JobStarted = GETDATE(), IsRunning = 1
OUTPUT INSERTED.IsRunning INTO @InvoicingJob( IsRunning )
WHERE IsRunning = 0
-- To also recover when a job has been running for more than a day (i.e. it likely crashed without releasing the lock), add:
-- OR ( IsRunning = 1 AND JobStarted <= DATEADD( DAY, -1, GETDATE()))
IF NOT EXISTS( SELECT * FROM @InvoicingJob )
BEGIN
PRINT 'Another Job is already running'
RETURN
END
ELSE
RAISERROR( 'Start Job', 0, 0 ) WITH NOWAIT
-- Do invoicing tasks
WAITFOR DELAY '00:01:00' -- to simulate execution time
-- Release lock
UPDATE InvoicingJob
SET IsRunning = 0
END
2) Read about how transactions work: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/transactions-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2017
Your second question is quite broad.
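For what it's worth, with Dapper/ADO.NET the isolation level can be passed to BeginTransaction, e.g. conn.BeginTransaction(IsolationLevel.Snapshot). For the first case (never charge twice, but let "is it active" reads proceed), a pattern very like the procedure in the deadlock question above works well: a single atomic UPDATE ... OUTPUT with UPDLOCK/READPAST claims each due subscription exactly once, while readers that only verify the record is active avoid blocking if the database uses READ_COMMITTED_SNAPSHOT. A sketch with hypothetical table and column names:
-- Hypothetical schema: dbo.Subscriptions(SubscriptionID, IsActive, IsProcessed, NextChargeDate)
-- READPAST skips rows another worker has already locked, so the same row is never claimed twice.
UPDATE TOP (100) s
SET    s.IsProcessed = 1
OUTPUT Inserted.SubscriptionID
FROM   dbo.Subscriptions s WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE  s.IsProcessed = 0
  AND  s.IsActive = 1
  AND  s.NextChargeDate <= GETUTCDATE();
Create the invoice rows from the OUTPUT results inside the same transaction, so the claim and the invoice commit (or roll back) together.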

How to loop and/or wait for delay

I have imported over 400 million records into a dummy dimension table. I need to take my existing fact table and join it to the dummy dimension to perform an update on the fact table. To avoid filling up the transaction log, somebody suggested I update these records in a loop instead of updating hundreds of millions of records at once. I have researched loops and using WAITFOR DELAY, but I am not sure of the best approach for writing the logic out.
Here is the sample update I need to perform:
Update f
set f.value_key = r.value_key
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
and r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
AND f.date_key < 20141201
AND f.diff_key = 17
If anybody has a suggestion on the best way to write this, I would really appreciate it.
To avoid filling up the transaction log you can set your recovery model to SIMPLE on your dev machines; that will prevent transaction log bloat when transaction log backups aren't being taken.
ALTER DATABASE MyDB SET RECOVERY SIMPLE;
If you want to perform your update faster, use a table hint, i.e. WITH (TABLOCK).
Please don't do what the previous person suggested unless you really understand what else will happen. The most important result is that you lose the ability to do a point-in-time recovery. If you take a full backup every night and a transaction log backup every hour (or every 15 minutes), switching to the simple recovery model breaks the chain and you can only restore to the last full backup.
If you do it, you have to switch to SIMPLE, switch back to FULL, take a full backup, and then resume log backups on a schedule. What the previous person suggests is like driving without bumpers to save on car weight: it sounds great until you hit something.
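For reference, the sequence described above looks roughly like this (database name and backup path are placeholders):
ALTER DATABASE MyDB SET RECOVERY SIMPLE;
-- ... run the large batched update here ...
ALTER DATABASE MyDB SET RECOVERY FULL;
BACKUP DATABASE MyDB TO DISK = N'X:\Backup\MyDB_full.bak';  -- restarts the log backup chain
-- then resume the scheduled BACKUP LOG jobs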
-- Run this section of code one time to create a global queue containing only the columns needed to identify a unique row to process. Update the SELECT statement as necessary.
IF OBJECT_ID('Tempdb.dbo.##GlobalQueue') IS NOT NULL
DROP TABLE ##GlobalQueue
SELECT f.some_key, f.calendar_key
INTO ##GlobalQueue
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
    AND r.calendar_key = f.calendar_key
WHERE f.date_Key > 20130101
    AND f.date_key < 20141201
    AND f.diff_key = 17
-- Copy/paste the SQL below to run from multiple sessions/clients if you want to process things faster than a single session. Start with 1 session and move up from there as needed to ramp up processing load.
-- Create an empty work table with the same shape as the queue (one per session).
SELECT TOP (0) * INTO #RowsToProcess FROM ##GlobalQueue
WHILE 1 = 1
BEGIN
    DELETE TOP ( 10000 ) -- Feel free to use a higher number depending on how big of a 'bite' you want it to take each time.
    ##GlobalQueue WITH ( READPAST )
    OUTPUT Deleted.*
    INTO #RowsToProcess
    IF @@ROWCOUNT > 0
    BEGIN
        UPDATE f
        SET f.value_key = r.value_key
        FROM [dbo].[FACT_Table] f
        INNER JOIN dbo.dummy_table r ON f.some_key = r.some_Key
            AND r.calendar_key = f.calendar_key
        INNER JOIN #RowsToProcess RTP ON f.some_key = RTP.some_key
            AND f.calendar_key = RTP.calendar_key
        DELETE FROM #RowsToProcess
        --WAITFOR DELAY '00:01' -- (Commented out: the 1-minute delay isn't needed when multiple sessions are processing the records. It's here for demonstration, as it may be useful in other situations.)
    END
    ELSE
        BREAK
END
Use TOP and <>.
In an UPDATE you must use TOP (n):
while 1 = 1
begin
update top (100) [test].[dbo].[Table_1]
set lname = 'bname'
where lname <> 'bname'
if @@ROWCOUNT = 0 break
end
while 1 = 1
begin
Update top (100000) f
set f.value_key = r.value_key
FROM [dbo].[FACT_Table] f
INNER JOIN dbo.dummy_table r
ON f.some_key = r.some_Key
and r.calendar_key = f.calendar_key
AND f.date_Key > 20130101
AND f.date_key < 20141201
AND f.diff_key = 17
AND f.value_key <> r.value_key
if @@ROWCOUNT = 0 break
end

My trigger is running forever? Any idea why?

I have created the following T-SQL trigger, which appears to run forever whenever the underlying table gets updated.
CREATE TRIGGER Trigger_MDSS_ComputeAggregates
ON dbo.MonthlyDetectionScoresSums
AFTER UPDATE, INSERT
AS
BEGIN
update dbo.MonthlyDetectionScoresSums
SET
YPElec = CAST(COALESCE (i.YPLocChain_TotElec, i.YPGlobChain_TotElec, i.YPSIC_TotElec) AS real),
YPGas = CAST(COALESCE (i.YPLocChain_TotGas, i.YPSIC_TotGas) AS real)
from MonthlyDetectionScoresSums mdss
inner join INSERTED i on i.ACI_OI = mdss.ACI_OI
END
GO
Do you know why it might be running for a really really long time?
May I suggest that you use computed columns and drop the trigger?
ALTER TABLE dbo.MonthlyDetectionScoresSums ADD
    YPElec AS CAST(COALESCE (YPLocChain_TotElec, YPGlobChain_TotElec, YPSIC_TotElec) AS real),
    YPGas AS CAST(COALESCE (YPLocChain_TotGas, YPSIC_TotGas) AS real)
From what I see, you are updating rows you've just updated/inserted. With computed columns the engine does that for you, and no trigger is needed.
Do you have recursive triggers turned on?
Although an infinite loop should be terminated, it's possible if your update is very large that it takes a long time to get to the nesting limit of 32:
http://msdn.microsoft.com/en-us/library/ms190739.aspx
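If recursion is the suspect, you can check and turn off the database option like this. Note that RECURSIVE_TRIGGERS controls direct recursion, which is exactly the case here since the trigger updates its own table (database name is a placeholder):
-- Is direct trigger recursion enabled for this database?
SELECT name, is_recursive_triggers_on FROM sys.databases WHERE name = DB_NAME();

-- Turn it off if so
ALTER DATABASE MyDatabase SET RECURSIVE_TRIGGERS OFF;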

Why does an UPDATE take much longer than a SELECT?

I have the following select statement that finishes almost instantly.
declare @weekending varchar(6)
set @weekending = 100103
select InvoicesCharges.orderaccnumber, Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
However, the equivalent update statement takes 1m40s
declare @weekending varchar(6)
set @weekending = 100103
update InvoicesCharges
set InvoicesCharges.orderaccnumber = Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and dbo.Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
Even if I add:
and InvoicesCharges.orderaccnumber <> Accountnumbersorders.accountnumber
at the end of the update statement reducing the number of writes to zero, it takes the same amount of time.
Am I doing something wrong here? Why is there such a huge difference?
transaction log file writes
index updates
foreign key lookups
foreign key cascades
indexed views
computed columns
check constraints
locks
latches
lock escalation
snapshot isolation
DB mirroring
file growth
other processes reading/writing
page splits / unsuitable clustered index
forward pointer/row overflow events
poor indexes
statistics out of date
poor disk layout (eg one big RAID for everything)
Check constraints with UDFs that have table access
...
Although, the usual suspect is a trigger...
Also, your extra condition doesn't help much: how would SQL Server know to skip the work? An update is still generated with most of the baggage... even triggers will still fire, and locks must be held while rows are searched against the other conditions, for example.
Edited Sep 2011 and Feb 2012 with more options
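Since a trigger is the usual suspect, a quick first check is to list the triggers on the updated table (dbo schema assumed here) and then rerun both statements with I/O and timing statistics turned on:
-- Any triggers defined on the table being updated?
SELECT name, is_disabled
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.InvoicesCharges');

-- Then compare the SELECT and the UPDATE with runtime statistics enabled
SET STATISTICS IO ON;
SET STATISTICS TIME ON;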
The update has to lock and modify the data in the table, and also log the changes to the transaction log. The select does not have to do any of those things.
Because reading does not affect indices, triggers, and what have you?
On slow servers or large databases I usually use UPDATE DELAYED, which waits for a "break" to update the database itself.

Resources