How do I set the correct transaction level? - sql-server

I am using Dapper on ADO.NET. So at present I am doing the following:
using (IDbConnection conn = new SqlConnection("MyConnectionString"))
{
    conn.Open();
    using (IDbTransaction transaction = conn.BeginTransaction())
    {
        // ...
However, there are various transaction isolation levels that can be set, and I believe those are the settings I need to choose between.
My first question is: how do I set the transaction isolation level (given that I am using Dapper)?
My second question is what is the correct level for each of the following cases? In each of these cases we have multiple instances of a web worker (Azure) service running that will be hitting the DB at the same time.
I need to run monthly charges on subscriptions. So in a transaction I need to read a record and if it's due for a charge create the invoice record and mark the record as processed. Any other read of that record for the same purpose needs to fail. But any other reads of that record that are just using it to verify that it is active need to succeed.
So what transaction do I use for the access that will be updating the processed column? And what transaction do I use for the other access that just needs to verify that the record is active?
In this case it's fine if a conflict causes the charge to not be run (we'll get it the next day). But it is critical that we not charge someone twice. And it is critical that the read to verify that the record is active succeed immediately while the other operation is in its transaction.
I need to update a record where I am setting just a couple of columns. One use case is setting a new password hash for a user record. It's fine if other access occurs during this, except for deleting the record (I think that's the only problem case). If another web service is also updating the record, that's the user's problem for doing this in two places simultaneously.
But it's key that the record stay consistent. This includes the use case of "set NumUses = NumUses + @ParamNum", so the read, calculation, and write of the column value need to be treated as an atomic action. And if I am setting 3 column values, they all get written together.
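For illustration, the kind of single statement I have in mind (table, column, and parameter names other than NumUses and @ParamNum are just placeholders) is:
UPDATE dbo.Users
SET PasswordHash = @NewPasswordHash,
    NumUses = NumUses + @ParamNum,
    ModifiedUtc = GETUTCDATE()
WHERE UserId = @UserId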

1) Assuming the invoicing process is a stored procedure with multiple statements, your best bet is to create a separate "lock" table to record the fact that the invoicing job is already running, e.g.:
CREATE TABLE InvoicingJob( JobStarted DATETIME, IsRunning BIT NOT NULL )
-- Table will only ever have one record
INSERT INTO InvoicingJob
SELECT NULL, 0

EXEC InvoicingProcess

ALTER PROCEDURE InvoicingProcess
AS
BEGIN
    DECLARE @InvoicingJob TABLE( IsRunning BIT )

    -- Try to acquire the lock
    UPDATE InvoicingJob WITH( TABLOCK )
    SET JobStarted = GETDATE(), IsRunning = 1
    OUTPUT INSERTED.IsRunning INTO @InvoicingJob( IsRunning )
    WHERE IsRunning = 0
    -- Uncomment to reclaim the lock when the job has been running for more than a day,
    -- i.e. it likely crashed without releasing the lock:
    -- OR ( IsRunning = 1 AND JobStarted <= DATEADD( DAY, -1, GETDATE()) )

    IF NOT EXISTS( SELECT * FROM @InvoicingJob )
    BEGIN
        PRINT 'Another job is already running'
        RETURN
    END
    ELSE
        RAISERROR( 'Start Job', 0, 0 ) WITH NOWAIT

    -- Do invoicing tasks
    WAITFOR DELAY '00:01:00' -- to simulate execution time

    -- Release the lock
    UPDATE InvoicingJob
    SET IsRunning = 0
END
2) Read about how transactions work: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/transactions-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2017
Your second question is quite broad.
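As for your first question: ADO.NET's IDbConnection.BeginTransaction has an overload that accepts an IsolationLevel (for example conn.BeginTransaction(IsolationLevel.Serializable)), and Dapper simply uses whatever transaction you pass via its transaction parameter. On the T-SQL side the equivalent, as a rough sketch, is:
-- applies to the current session/connection until changed
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    -- reads / updates for the unit of work go here
COMMIT TRANSACTION;
Also note that a single statement such as UPDATE ... SET NumUses = NumUses + @ParamNum is atomic on its own, so the read-calculate-write of that column does not need a higher isolation level to stay consistent.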

Related

singleton pattern implementation in Snowflake?

We need to implement some singleton pattern to ensure a stored procedure cannot be run several times simultaneously.
As I cannot see this functionality in place, I thought about implementing this via a "Lock" table.
We are in a "batch" environment so waiting a few seconds is no problem.
CREATE TABLE SHARED.LOCK(
     LOCK_NAME   STRING NOT NULL PRIMARY KEY
    ,SESSION_ID  STRING
    ,ACQUIRED_AT TIMESTAMP_NTZ
)
LOCK_NAME is forced to upper case and used as a Primary Key
SESSION_ID is the current session
ACQUIRED_AT is just useful information
I then create a stored proc to "acquire" the lock $LOCK_NAME; it tries to update the lock record with its own session id, as long as the record is not "locked" already:
UPDATE SHARED.LOCK
SET SESSION_ID = CURRENT_SESSION()
   ,ACQUIRED_AT = CURRENT_TIMESTAMP()
WHERE LOCK_NAME = $LOCK_NAME
AND SESSION_ID IS NULL;
To avoid Snowflake optimistic locking side effects, I would ensure that this stored procedure is not called as part of an explicit transaction.
I then check whether I successfully "acquired" this lock
SELECT 1
FROM SHARED.LOCK
WHERE LOCK_NAME = $LOCK_NAME
AND SESSION_ID = CURRENT_SESSION();
If I get a record, then I have the lock.
Otherwise, I could wait X seconds and try again later, up to a certain number of attempts.
Once I am done, I can release the lock with a simple Update statement
UPDATE SHARED.LOCK
SET SESSION_ID = NULL
,ACQUIRED_AT = NULL
WHERE LOCK_NAME = $LOCK_NAME
AND SESSION_ID = CURRENT_SESSION();
And of course we'll have to do something about locks not released within a certain amount of time or locked by a session that is not live anymore, etc...
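For example, something along these lines could reclaim a lock that has been held for more than an hour (the one-hour timeout is arbitrary):
UPDATE SHARED.LOCK
SET SESSION_ID = NULL
   ,ACQUIRED_AT = NULL
WHERE LOCK_NAME = $LOCK_NAME
AND ACQUIRED_AT < DATEADD('hour', -1, CURRENT_TIMESTAMP());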
I think this should work... but maybe there is a simpler way to implement a singleton in Snowflake?
Any better ideas?
Depending on requirements, if the stored procedure is going to be run on a schedule, a TASK could be used, which has overlap protection built in:
CREATE OR REPLACE TASK my_task
WAREHOUSE = compute_wh
SCHEDULE = '1 minute'
ALLOW_OVERLAPPING_EXECUTION = FALSE
AS
CALL procedure_call();
CREATE TASK - ALLOW_OVERLAPPING_EXECUTION :
ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE
Specifies whether to allow multiple instances of the task tree to run concurrently
FALSE ensures only one instance of a particular tree of tasks is allowed to run at a time.
Demo:
CREATE TABLE log(id INT NOT NULL IDENTITY(1,1), d TIMESTAMP);
CREATE OR REPLACE procedure insert_log()
returns string
language javascript
execute as owner
as
$$
snowflake.execute ({sqlText: "INSERT INTO log (d) SELECT CURRENT_TIMESTAMP()"});
snowflake.execute ({sqlText: "CALL SYSTEM$WAIT(2, 'MINUTES')"});
return "Succeeded.";
$$
;
-- Recreate my_task as above, but with "AS CALL insert_log();", then resume it:
ALTER TASK my_task RESUME;
SELECT * FROM log;

Streams + tasks missing inserts?

We've set up a stream on a table that is continuously loaded via Snowpipe.
We're consuming this data with a task that runs every minute and merges into another table. There is a possibility of duplicate keys, so we use a ROW_NUMBER() window function ordered by the file-created timestamp descending, keeping row_num = 1. This way we always get the latest insert.
Initially we used a standard task with the merge statement, but we noticed that in some instances, since Snowpipe does not guarantee loading in order of when the files were staged, we were updating rows with older data. As such, on the WHEN MATCHED section we added a condition so that a row is only updated when the incoming file-created timestamp is greater than the existing one.
However, since we did that, reconciliation checks show that some new inserts are missing. I don't know for sure why changing the matched clause would interfere with the not matched clause.
My theory was that the extra clause added a bit of time to the task run, so some runs were skipped or the next run happened almost immediately after the last one completed. The idea being that the missing rows were caught up in the middle and the offset changed before they could be consumed.
As such, we changed the task to call a stored procedure which uses an explicit transaction. We did this because the docs seem to suggest that using a transaction will lock the stream. However, even with this, we can see that new inserts are still missing. We're talking very small numbers, e.g. 8 out of 100,000s.
Any ideas what might be happening?
Example task code below (not the sp version)
WAREHOUSE = TASK_WH
SCHEDULE = '1 minute'
WHEN SYSTEM$stream_has_data('my_stream')
AS
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED AND ms.file_created >= pd.file_created THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....
;
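For completeness, the stored-procedure variant we switched to is roughly shaped like this (simplified sketch; the procedure name is a placeholder and the real procedure runs the same MERGE as above):
CREATE OR REPLACE PROCEDURE merge_stream_sp()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
    BEGIN TRANSACTION;
    -- MERGE INTO processed_data ... (same statement as in the task above)
    COMMIT;
    RETURN 'Succeeded.';
END;
$$;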
I am not fully sure what is going wrong here, but Snowflake does give a recommendation related to the file-created time: the file-created timestamp is calculated in the cloud services layer and may be slightly different from what you expect. There is another recommendation related to Snowpipe and data ingestion: the queue service can take about a minute to consume data from the pipe, and if a lot of data flows in within that minute you can end up with this issue. Review your implementation, test whether pushing data at one-minute intervals solves the issue, and don't rely on the file-created time.
The condition "AND ms.file_created >= pd.file_created" seems to be added as a mechanism to avoid updating the same row multiple times.
An alternative approach could be to use IS DISTINCT FROM to compare the source columns against the target columns (except id):
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED
AND (pd.col1, pd.col2,..., pd.coln) IS DISTINCT FROM (ms.col1, ms.col2,..., ms.coln)
THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....;
This approach will also prevent updating row when nothing has changed.

Concurrent updates on a single staging table

I am developing a service application (VB.NET) which pulls information from a source and imports it to a SQL Server database
The process can involve one or more “batches” of information at a time (the number and size of batches in any given “run” is arbitrary based on a queue maintained elsewhere)
Each batch is assigned an identifier (BatchID) so that the set of records in the staging table which belong to that batch can be easily identified
The ETL process for each batch is sequential in nature; the raw data is bulk inserted to a staging table and then a series of stored procedures perform updates on a number of columns until the data is ready for import
These stored procedures are called in sequence by the service and are generally simple UPDATE commands
Each SP takes the BatchID as an input parameter and specifies it as the criterion for inclusion in each UPDATE, à la:
UPDATE dbo.stgTable
SET FieldOne = (CASE
WHEN S.[FieldOne] IS NULL
THEN T1.FieldOne
ELSE
S.[FieldOne]
END
)
, FieldTwo = (CASE
WHEN S.[FieldTwo] IS NULL
THEN T2.FieldTwo
ELSE
S.[FieldTwo]
END
)
FROM dbo.stgTable AS S
LEFT JOIN dbo.someTable T1 ON S.[SomeField] = T1.[SomeField]
LEFT JOIN dbo.someOtherTable T2 ON S.[SomeOtherField] = T2.[SomeOtherField]
WHERE S.BatchID = @BatchID
Some of the SP’s also refer to functions (both scalar and table-valued) and all incorporate a TRY / CATCH structure so I can tell from the output parameters if a particular SP has failed
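(For illustration, each SP is shaped roughly like this; the procedure and column names here are placeholders:)
CREATE PROCEDURE dbo.uspEnrichStepOne
    @BatchID INT,
    @RowsAffected INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        -- enrichment UPDATE for this step, always filtered to the batch
        UPDATE dbo.stgTable
        SET FieldOne = COALESCE(FieldOne, 'n/a')
        WHERE BatchID = @BatchID;
        SET @RowsAffected = @@ROWCOUNT;
    END TRY
    BEGIN CATCH
        SET @RowsAffected = -1; -- signals failure back to the service
    END CATCH
END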
The final SP is a MERGE operation to move the enriched data from the staging table into the production table (again, specific to the provided BatchID)
I would like to thread this process in the service so that a large batch doesn’t hold up smaller batches in the same run
I figured there should be no issue with this as no thread could ever attempt to process records in the staging table that could be targeted by another thread (no race conditions)
However, I’ve noticed that, when I do thread the process, arbitrary steps on arbitrary batches seem to fail (but no error is recorded from the output of the SP)
The failures are inconsistent; e.g. sometimes batches 2, 3 & 5 will fail (on SP’s 3, 5 & 7 respectively), other times it will be different batches, each at different steps in the sequence
When I import the batches sequentially, they all import perfectly fine – always!
I can’t figure out if this is an issue on the service side (VB.NET) – e.g. is each thread opening an independent connection to the DB or could they be sharing the same one (I’ve set it up that each one should be independent…)
Or if the issue is on the SQL Server side – e.g. is it not feasible for concurrent SP calls to manipulate data on the same table, even though, as described above, no thread/batch will ever touch records belonging to another thread/batch
(On this point – I tried using CTE’s to create subsets of data from the staging table based on the BatchID and apply the UPDATE’s to those instead but the exact same behaviour occurred)
WITH CTE AS (
SELECT *
FROM dbo.stgTable
WHERE BatchID = @BatchID
)
UPDATE CTE...
Or maybe the problem is that multiple SP’s are calling the same function at the same time and that is why one or more of them are failing (I don’t see why that would be a problem though?)
Any suggestions would be very gratefully received – I’ve been playing around with this all week and I can’t for the life of me determine precisely what the problem might be!
Update to include sample service code
This is the code in the service class where the threading is initiated
For Each ItemInScope In ScopedItems
With ItemInScope
_batches(_batchCount) = New Batch(.Parameter1, .Parameter2, .ParameterX)
With _batches(_batchCount)
If .Initiate() Then
_doneEvents(_batchCount) = New ManualResetEvent(False)
Dim _batchWriter = New BatchWriter(_batches(_batchCount), _doneEvents(_batchCount))
ThreadPool.QueueUserWorkItem(AddressOf _batchWriter.ThreadPoolCallBack, _batchCount)
Else
_doneEvents(_batchCount) = New ManualResetEvent(True)
End If
End With
End With
_batchCount += 1
Next
WaitHandle.WaitAll(_doneEvents)
Here is the BatchWriter class
Public Class BatchWriter
Private _batch As Batch
Private _doneEvent As ManualResetEvent
Public Sub New(ByRef batch As Batch, ByVal doneEvent As ManualResetEvent)
_batch = batch
_doneEvent = doneEvent
End Sub
Public Sub ThreadPoolCallBack(ByVal threadContext As Object)
Dim threadIndex As Integer = CType(threadContext, Integer)
With _batch
If .PrepareBatch() Then
If .WriteTextOutput() Then
.ProcessBatch()
End If
End If
End With
_doneEvent.Set()
End Sub
End Class
The PrepareBatch and WriteTextOutput functions of the Batch class are entirely contained within the service application - it is only the ProcessBatch function where the service starts to interact with the database (via Entity Framework)
Here is that function
Public Sub ProcessBatch()
' Confirm that a file is ready for import
If My.Computer.FileSystem.FileExists(_filePath) Then
Dim dbModel As New DatabaseModel
With dbModel
' Pass the batch to the staging table in the database
If .StageBatch(_batchID, _filePath) Then
' First update (results recorded for event log)
If .UpdateOne(_batchID) Then
_stepOneUpdates = .RetUpdates.Value
' Second update (results recorded for event log)
If .UpdateTwo(_batchID) Then
_stepTwoUpdates = .RetUpdates.Value
' Third update (results recorded for event log)
If .UpdateThree(_batchID) Then
_stepThreeUpdates = .RetUpdates.Value
....
End Sub

Large update in Sybase

I need your help to solve a problem I have with a large update. We want to create a server for developers/testers that has a copy of production, so we want to obfuscate the NSS with a random NSS generated at the moment the update is executed. I created a function that does this. This is the update:
UPDATE CUSTOMERS SET
    NAME = 'John',
    LAST_NAME = 'Doe',
    NSS = RandomNSS
plan '(i_scan PK_CUSTOMERS CUSTOMERS)'
The update works fine, except that the table is 9 million rows long, so I get the log suspended or a message that I am out of locks, and the process never finishes. So I tried to implement the following:
SET ROWCOUNT 10000
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION
    UPDATE CUSTOMERS SET
        NAME = 'John',
        LAST_NAME = 'Doe',
        NSS = RandomNSS
    plan '(i_scan PK_CUSTOMERS CUSTOMERS)'
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END
    COMMIT TRANSACTION
END
SET ROWCOUNT 0
But it doesn't solve the problem: because I don't have a WHERE clause, the update never finishes, even once all the customers are already named John Doe. Can you help me create a query that lets this process finish?
First, on the DEV system, I would change the setting for the database so that it will truncate the transaction log when it checkpoints. That will keep the log from filling up while you accomplish the update. Because the DEV database is restored from the PROD database, it's not important to dump the transaction logs for recoverability.
sp_dboption DEV_DB, 'trunc log on chkpt', true
If you must do the update, try changing the while loop to continue until all the names are changed to "John", like the following.
SET ROWCOUNT 10000
WHILE ( SELECT COUNT(*) FROM CUSTOMERS WHERE NAME != 'John' OR LAST_NAME != 'Doe' ) > 0
BEGIN
    BEGIN TRANSACTION
    UPDATE CUSTOMERS SET
        NAME = 'John',
        LAST_NAME = 'Doe',
        NSS = RandomNSS
    WHERE NAME != 'John' OR LAST_NAME != 'Doe'
    plan '(i_scan PK_CUSTOMERS CUSTOMERS)'
    COMMIT TRANSACTION
END
SET ROWCOUNT 0
Alternatively, you can create a view on the database, grabbing the production data you want while obscuring the fields you don't want the devs to see. This view can then be used to select into the development DB, or exported/imported with bcp.
create view DEV_VIEW (NAME, LAST_NAME, NSS, colA, colB) as
select "John", "Doe", RandomNSS, other_colA, other_colB
from CUSTOMERS
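The copy could then be pulled across with something along these lines (the database name is illustrative, and the target database needs the 'select into' option enabled):
SELECT NAME, LAST_NAME, NSS, colA, colB
INTO DEV_DB..CUSTOMERS
FROM DEV_VIEW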

Why does an UPDATE take much longer than a SELECT?

I have the following select statement that finishes almost instantly.
declare @weekending varchar(6)
set @weekending = 100103
select InvoicesCharges.orderaccnumber, Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
However, the equivalent update statement takes 1m40s
declare @weekending varchar(6)
set @weekending = 100103
update InvoicesCharges
set InvoicesCharges.orderaccnumber = Accountnumbersorders.accountnumber
from Accountnumbersorders, storeinformation, routeselecttable,InvoicesCharges, invoice
where InvoicesCharges.pubid = Accountnumbersorders.publication
and Accountnumbersorders.actype = 0
and dbo.Accountnumbersorders.valuezone = 'none'
and storeinformation.storeroutename = routeselecttable.istoreroutenumber
and storeinformation.storenumber = invoice.store_number
and InvoicesCharges.invoice_number = invoice.invoice_number
and convert(varchar(6),Invoice.bill_to,12) = @weekending
Even if I add:
and InvoicesCharges.orderaccnumber <> Accountnumbersorders.accountnumber
at the end of the update statement reducing the number of writes to zero, it takes the same amount of time.
Am I doing something wrong here? Why is there such a huge difference?
transaction log file writes
index updates
foreign key lookups
foreign key cascades
indexed views
computed columns
check constraints
locks
latches
lock escalation
snapshot isolation
DB mirroring
file growth
other processes reading/writing
page splits / unsuitable clustered index
forward pointer/row overflow events
poor indexes
statistics out of date
poor disk layout (eg one big RAID for everything)
Check constraints with UDFs that have table access
...
Although, the usual suspect is a trigger...
Also, your extra condition changes little: how would SQL Server know to ignore it? An update is still generated with most of the baggage... even the trigger will still fire, and locks must be held while rows are searched for the other conditions, for example.
Edited Sep 2011 and Feb 2012 with more options
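If you want to see where the time actually goes, it can help to run both statements with statistics output switched on and compare the I/O and CPU figures (and the actual execution plans):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the SELECT, then the UPDATE, and compare logical reads, CPU time and elapsed time
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;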
The update has to lock and modify the data in the table, and also log the changes to the transaction log. The select does not have to do any of those things.
Because reading does not affect indices, triggers, and what have you?
On slow servers or large databases I usually use UPDATE DELAYED, which waits for a "break" to update the database itself.
