Queue up requests to a SQL Server stored procedure - sql-server

I am working with a stored procedure that:
determines the number of rows in the table where the chosenBy column is null
picks one of these rows at random
updates the chosenBy column of this row
returns the row to the client
How do I prevent two clients from choosing the same row when they call the procedure at exactly the same time?
I have tried various table hints and isolation levels but just get deadlock exceptions at the client. I just want the second call to wait for a fraction of a second until the first call has completed.

One way of avoiding deadlocks (as indicated in your question title) would be to serialise access to that procedure.
You can do this with sp_getapplock and sp_releaseapplock
See Application Locks (or Mutexes) in SQL Server 2005 for some example code.
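As a rough sketch of how that could look for this particular procedure (the table name dbo.MyTable and the @ChosenBy parameter are assumptions; only the chosenBy column comes from the question):

CREATE PROCEDURE dbo.PickRandomRow
    @ChosenBy nvarchar(128)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Serialise access: only one caller at a time gets past this point.
    -- With @LockOwner = 'Transaction' the lock is released automatically at
    -- COMMIT/ROLLBACK, so an explicit sp_releaseapplock call is not required.
    DECLARE @rc int;
    EXEC @rc = sp_getapplock @Resource    = 'PickRandomRow',
                             @LockMode    = 'Exclusive',
                             @LockOwner   = 'Transaction',
                             @LockTimeout = 5000;
    IF @rc < 0
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Could not acquire application lock.', 16, 1);
        RETURN;
    END

    -- Pick one unchosen row at random, mark it, and return it in one statement.
    WITH candidate AS
    (
        SELECT TOP (1) *
        FROM dbo.MyTable
        WHERE chosenBy IS NULL
        ORDER BY NEWID()
    )
    UPDATE candidate
    SET chosenBy = @ChosenBy
    OUTPUT inserted.*;

    COMMIT TRANSACTION;
END

The second caller simply waits on the application lock (up to the 5-second timeout) until the first caller commits, which is exactly the queueing behaviour asked for.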

Related

Sql Server Stored Procedure Concurrency Issue

We have a SQL Server stored procedure.
It selects some rows from a table and puts them into a temp table in order to run some data validations.
The next part of the procedure either updates the actual table based on the temp-table data or sends back an error status.
Initially selected rows can only be updated once; no further updates are allowed to the same rows.
The problem we are facing is that sometimes two threads execute the procedure simultaneously and both pass the initial validation block, because the in-memory temp data has not been processed yet. The second thread is then able to overwrite the first transaction's changes.
We applied a transaction mechanism to prevent duplicate inserts and updates by checking the number of rows affected by the update query and aborting the transaction.
I am not sure whether this is correct and optimal.
Also, can we lock rows with a SELECT statement as well?
This has been solved by using UPDLOCK on the SELECT query inside the transaction.
It locks the specific rows and allows the transaction to proceed in isolation.
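As a rough sketch of that pattern (the table and column names here are placeholders, not the original schema):

BEGIN TRANSACTION;

-- UPDLOCK makes the SELECT take update locks that are held until the transaction
-- ends, so a second caller running the same validation blocks here instead of
-- reading rows that the first caller is about to change.
SELECT t.Id, t.Status
INTO #work
FROM dbo.TargetTable AS t WITH (UPDLOCK)
WHERE t.Status = 'Pending';

-- ... run the data validations against #work ...

UPDATE t
SET t.Status = 'Processed'
FROM dbo.TargetTable AS t
JOIN #work AS w ON w.Id = t.Id;

COMMIT TRANSACTION;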
Thanks Everyone for your help.

Does sp_getapplock cause SQL Server performance problems?

I have a stored procedure which cannot be executed concurrently. Multiple processes call this stored procedure, but it is of vital importance that the processes access the stored procedure sequentially.
The stored procedure basically scans a table for a primary key that meets various conditions, marks the record as in-use by the calling process, and then passes the primary key back to the calling process.
Anywhere from one to a dozen instances of the calling process could exist, depending upon how much work exists.
I decided to prevent concurrency by using sp_GetAppLock inside the stored procedure. I grab an exclusive transaction lock, with @Resource set to a string that is only used inside this stored procedure. The only thing that is ever blocked by this lock is the execution of this stored procedure.
The call inside the stored procedure looks like this:
EXEC sp_getapplock @Resource = 'My Unique String Here'
                 , @LockMode = 'Exclusive'    -- Type of lock
                 , @LockOwner = 'Transaction' -- Transaction or Session
                 , @LockTimeout = 5000
It works swimmingly. If a dozen instances of my process are running, only one of them executes the stored procedure at any one point in time, while the other 11 obediently queue up and wait their turn.
The only problem is our DBA. He is a very good DBA who constantly monitors the database for blocking and receives an alert when it exceeds a certain threshold. My use of sp_getapplock triggers a lot of alerts. My DBA claims that the blocking in-and-of-itself is a performance problem.
Is his claim accurate? My feeling is that this is "good" blocking, in that the only thing being blocked is execution of the stored procedure, and I want that to be blocked. But my DBA says that forcing SQL Server to enforce this blocking is a significant drain on resources.
Can I tell him to "put down the crack pipe," as we used to say? Or should I re-write my application to avoid the need for sp_getapplock?
The article I read which sold me on sp_getapplock is here: sp_getapplock
Unfortunately, I think your DBA has a point: blocking does drain resources, and this type of blocking puts extra load on the server.
Let me explain how:
The proc gets called; SQL Server assigns it a worker thread from the thread pool and it starts executing.
Calls 2, 3, 4, ... come in; again SQL Server assigns worker threads to these calls. The threads start executing, but because of the exclusive lock you have obtained, they all get suspended and sit in the waiting list until the resource becomes available.
Worker threads, which are very limited in number on any SQL Server, are being held up because of your process.
Now SQL Server is accumulating waits because of something a developer decided to do.
As DBAs we want you to come to SQL Server, get what you need, and leave as soon as possible. If you are intentionally staying there, holding on to resources and putting SQL Server under pressure, it will piss off the DBA.
I think you need to reconsider your application design and come up with an alternative solution.
Maybe a "Process Table" in the SQL Server, update it with some value when a process start and for each call check the process table first before you fire the next call for that proc. So the wait stuff happens in the application layer and only when the resources are available then go to DB.
"The stored procedure basically scans a table for a primary key that meets various conditions, marks the record as in-use by the calling process, and then passes the primary key back to the calling process."
Here is a different way to do it inside the SP:
BEGIN TRANSACTION

DECLARE @PK int;

SELECT TOP (1) @PK = x.PKCol
FROM dbo.[myTable] x WITH (FASTFIRSTROW, XLOCK, ROWLOCK, READPAST)
WHERE x.col1 = @col1 ...

IF @@ROWCOUNT > 0
BEGIN
    UPDATE dbo.[myTable]
    SET ...
    WHERE PKCol = @PK
END

COMMIT TRANSACTION
XLOCK
Specifies that exclusive locks are to be taken and held until the transaction completes. If specified with ROWLOCK, PAGLOCK, or TABLOCK, the exclusive locks apply to the appropriate level of granularity.

SQL server, pyodbc and deadlock errors

I have some code that writes Scrapy-scraped data to a SQL Server db. The data items consist of some basic hotel data (name, address, rating, ...) and a list of rooms with associated data (price, occupancy, etc.). There can be multiple celery threads and multiple servers running this code, simultaneously writing different items to the db. I am encountering deadlock errors like:
[Failure instance: Traceback: <class 'pyodbc.ProgrammingError'>:
('42000', '[42000] [FreeTDS][SQL Server]Transaction (Process ID 62)
was deadlocked on lock resources with another process and has been
chosen as the deadlock victim. Rerun the transaction. (1205) (SQLParamData)')
The code that actually does the insert/update schematically looks like this:
1) Check if hotel exists in hotels table, if it does update it, else insert it new.
Get the hotel id either way. This is done by `curs.execute(...)`
2) Python loop over the hotel rooms scraped. For each room check if room exists
in the rooms table (which is foreign keyed to the hotels table).
If not, then insert it using the hotel id to reference the hotels table row.
Else update it. These upserts are done using `curs.execute(...)`.
It is a bit more complicated than this in practice, but this illustrates that the Python code is using multiple curs.executes before and during the loop.
If, instead of upserting the data in the above manner, I generate one big SQL command that does the same thing (checks for the hotel, upserts it, records the id in a temporary variable, and for each room checks whether it exists and upserts against the hotel id variable, etc.), and then do only a single curs.execute(...) in the Python code, I no longer see deadlock errors.
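For illustration only, the single-batch version is shaped roughly like this (table and column names are hypothetical, and the literal values stand in for whatever the Python code fills in):

-- All of this is one batch sent through a single curs.execute(...).
DECLARE @hotel_id int;

SELECT @hotel_id = h.id
FROM dbo.hotels AS h
WHERE h.name = N'Hotel Example';

IF @hotel_id IS NULL
BEGIN
    INSERT INTO dbo.hotels (name, address, rating)
    VALUES (N'Hotel Example', N'1 Example Street', 4.5);
    SET @hotel_id = SCOPE_IDENTITY();
END
ELSE
    UPDATE dbo.hotels SET rating = 4.5 WHERE id = @hotel_id;

-- Repeated once per scraped room:
IF EXISTS (SELECT 1 FROM dbo.rooms WHERE hotel_id = @hotel_id AND room_name = N'Double')
    UPDATE dbo.rooms SET price = 120.00, occupancy = 2
    WHERE hotel_id = @hotel_id AND room_name = N'Double';
ELSE
    INSERT INTO dbo.rooms (hotel_id, room_name, price, occupancy)
    VALUES (@hotel_id, N'Double', 120.00, 2);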
However, I don't really understand why this makes a difference, and I'm also not entirely sure it is safe to run big SQL blocks with multiple SELECTs, INSERTs and UPDATEs in a single pyodbc curs.execute. As I understand it, pyodbc is supposed to handle only single statements; however, it does seem to work, and I see my tables populated with no deadlock errors.
Nevertheless, it seems impossible to get any output if I run a big command like this. I tried declaring a variable @output_string and recording various things to it (whether we had to insert or update the hotel, for example) before finally doing SELECT @output_string AS outputstring, but doing a fetch after the execute in pyodbc always fails with
<class 'pyodbc.ProgrammingError'>: No results. Previous SQL was not a query.
Experiments within the shell suggest pyodbc ignores everything after the first statement:
In [11]: curs.execute("SELECT 'HELLO'; SELECT 'BYE';")
Out[11]: <pyodbc.Cursor at 0x7fc52c044a50>
In [12]: curs.fetchall()
Out[12]: [('HELLO', )]
So if the first statement is not a query you get that error:
In [13]: curs.execute("PRINT 'HELLO'; SELECT 'BYE';")
Out[13]: <pyodbc.Cursor at 0x7fc52c044a50>
In [14]: curs.fetchall()
---------------------------------------------------------------------------
ProgrammingError Traceback (most recent call last)
<ipython-input-14-ad813e4432e9> in <module>()
----> 1 curs.fetchall()
ProgrammingError: No results. Previous SQL was not a query.
Nevertheless, except for the inability to fetch my @output_string, my real "big query", consisting of multiple selects, updates and inserts, actually works and populates multiple tables in the db.
However, if I try something like
curs.execute('INSERT INTO testX (entid, thecol) VALUES (4, 5); INSERT INTO testX (entid, thecol) VALUES (5, 6); SELECT * FROM testX;')
I see that both rows were inserted into the table testX, even though a subsequent curs.fetchall() fails with the "Previous SQL was not a query." error, so it seems that pyodbc's execute does execute everything, not just the first statement.
If I can trust this, then my main problem is how to get some output for logging.
EDIT Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.executes. But why does this fix it?
Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.executes. But why does this fix it?
When establishing a connection, pyodbc defaults to autocommit=False in accordance with the Python DB-API spec. Therefore when the first SQL statement is executed, ODBC begins a database transaction that remains in effect until the Python code does a .commit() or a .rollback() on the connection.
The default transaction isolation level in SQL Server is "Read Committed". Unless the database is configured to support SNAPSHOT isolation by default, a write operation within a transaction under Read Committed isolation will place transaction-scoped locks on the rows that were updated. Under conditions of high concurrency, deadlocks can occur if multiple processes generate conflicting locks. If those processes use long-lived transactions that generate a large number of such locks then the chances of a deadlock are greater.
Setting autocommit=True will avoid the deadlocks because each individual SQL statement will be automatically committed, thus ending the transaction (which was automatically started when that statement began executing) and releasing any locks on the updated rows.
So, to help avoid deadlocks you can consider a couple of different strategies:
continue to use autocommit=True, or
have your Python code explicitly .commit() more often, or
use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to "loosen up" the transaction isolation level so that your reads no longer block on the persistent locks created by write operations (at the cost of dirty reads), or
configure the database to use SNAPSHOT isolation, which will avoid most lock contention but will make SQL Server work harder (see the configuration sketch below).
You will need to do some homework to determine the best strategy for your particular usage case.
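For reference, the snapshot-related settings mentioned above look like this at the database level (the database name is a placeholder):

-- Make the default READ COMMITTED level use row versioning, so readers see the
-- last committed version of a row instead of blocking on writers' locks.
ALTER DATABASE MyScrapyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Or allow sessions to opt in to full SNAPSHOT isolation explicitly...
ALTER DATABASE MyScrapyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- ...and then, per session or transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;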

Are all deadlocks caused by a bad query

"Transaction (Process ID 63) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly."
Could this deadlock be caused by something that the stored proc uses, like SQL Mail? Or is it always caused by something like two applications accessing the same table at the same time?
Two processes accessing the same table at the same time happens all the time in an application. Generally that won't cause a deadlock. A deadlock typically happens when you have, say, process 'A' attempting to update Table 1, then Table 2, then Table 3, and process 'B' attempting to update Table 3, then Table 2, then Table 1. Process 'A' holds a resource locked that process 'B' needs, and process 'B' holds a resource that process 'A' needs. SQL Server detects this as a deadlock and rolls one of the processes back as a failed transaction.
The bottom line is that you have two processes attempting to update the same tables at the same time, but not in the same order. This will often lead to deadlocks.
One easy way to handle this in your application is to handle the failed transaction and simply re-execute the transaction. It will almost always execute successfully. A better solution is to make sure your processes are updating tables in the same order, as much as possible.
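A rough sketch of that retry pattern in T-SQL (the transaction body is a placeholder; THROW requires SQL Server 2012 or later):

DECLARE @retries int = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- ... the real work, updating tables in a consistent order ...

        COMMIT TRANSACTION;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        -- Error 1205 = this session was chosen as the deadlock victim: try again.
        -- Anything else is re-thrown to the caller.
        IF ERROR_NUMBER() = 1205
            SET @retries = @retries - 1;
        ELSE
            THROW;
    END CATCH
END
-- If @retries reached 0, the operation kept deadlocking and should be reported as failed.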
Missing Indexes is another common cause of deadlocks. If a select query can get the info it needs from an index instead of the base table, then it won't be blocked by any updates/inserts on the table itself.
To find out for sure, use the SQL profiler to trace for "Deadlock Graph" events, which will show you the detail of the deadlock itself.
Based on this, I don't think SQL Mail itself would directly be the culprit. I say "directly" because I don't know what you're doing with it. However, I assume SQL Mail is probably slow compared to the rest of your SQL ops, so if you're doing a lot with that, it could indirectly create a bottleneck that leads to a deadlock if you're holding onto tables while sending off the SQL Mail.
It's hard to recommend a specific strategy without knowing more specifics about what you're doing. The short of it is that you should consider whether there's a way to avoid holding onto the table while you do this, such as using NOLOCK, using a temp table or a non-temp "holding" table, or just refactoring the SP that is making the call.

Sql Server 2005 - manage concurrency on tables

In an ASP.NET application I've got this process:
Start a connection
Start a transaction
Insert a lot of values into a table "LoadData" with the SqlBulkCopy class, with a column that contains a specific LoadId.
Call a stored procedure that:
reads the table "LoadData" for the specific LoadId.
for each line does a lot of calculations, which involves reading dozens of tables, and writes the results into a temporary (#temp) table (a process that lasts several minutes).
deletes the lines in "LoadData" for the specific LoadId.
Once everything is done, write the result into the result table.
Commit the transaction, or roll back if something fails.
My problem is that if 2 users start the process, the second one has to wait until the first has finished (because the insert seems to put an exclusive lock on the table), and my application sometimes times out (and the users are not happy to wait :) ).
I'm looking for a way to let the users run everything in parallel, as there is no interaction between them except for the last step: writing the result. I think what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to release the exclusive lock on the "LoadData" table (is it possible to force SQL Server to lock only rows and not the table?) once the insert is finished, but without ending the transaction.
Any suggestion?
Look up READ COMMITTED SNAPSHOT (the READ_COMMITTED_SNAPSHOT database option) in Books Online.
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendency to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow, thus locking other users out of querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truly being secured by the transaction, or can you do without it?
If you're sure that there is no issue with concurrent operations except for the last part, why not start the transaction just before those last statements (whichever they are that DO require isolation), and commit immediately after they succeed? Then all the upfront read operations will not block each other.
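A rough sketch of that restructuring on the stored procedure side, assuming the outer ASP.NET transaction is dropped (the LoadData / ResultTable names come loosely from the question; the columns are placeholders):

CREATE PROCEDURE dbo.ProcessLoad
    @LoadId int
AS
BEGIN
    SET NOCOUNT ON;

    -- The long-running work happens outside any explicit transaction. Each LoadId
    -- only touches its own rows in LoadData, so parallel runs do not conflict here.
    SELECT ld.RawValue            -- placeholder column
    INTO #work
    FROM dbo.LoadData AS ld
    WHERE ld.LoadId = @LoadId;

    -- ... minutes of calculations against #work and the other tables ...

    -- Only the final, conflicting steps are wrapped in a short transaction.
    BEGIN TRANSACTION;

    INSERT INTO dbo.ResultTable (LoadId, ComputedValue)   -- placeholder columns
    SELECT @LoadId, w.RawValue
    FROM #work AS w;

    DELETE FROM dbo.LoadData
    WHERE LoadId = @LoadId;

    COMMIT TRANSACTION;
END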
