SSIS Conditional split results in deadlock error

SSIS Conditional split results in deadlock error - sql-server

I have a conditional split in an SSIS job that inserts or updates based on the CDC operation. The deletes dont actually delete, they just mark the row deleted (so it is also an update statement).
This is what it looks like:
And the error message associated with the red x is
"Transaction (Process ID 67) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.".
I tried to put an additional input arrow in there, so it only runs one update at a time, but it won't let me.

I suspect FactWaybillTrans is has table lock checked in the destination. That's usually what you want when loading large amounts of data. However, since you also want to update the same thing, that is going to conflict with the lock and thus, you get a deadlock. Even without checking table lock, the default lock could escalate to a full lock.
I would look to stage my updates to a table (stage.CDCWaybillUpdates) and then have an Execute SQL Task fire after the Data Flow. Much cleaner and no opportunity for deadlocks.
You might be able to fake it but it'd be totally unreliable. Add a Sort Operation between the NumRowsUpdated and Update. That might be able to induce enough drag so that the OLE DB Destination finishes and releases its lock before the update begins firing. If it doesn't slow it down enough, then sort the same data in the opposite direction. Terrible, hackish approach but sometimes you have to do the dumb.

Related

how are concurrent queries handled in snowflake?

For example, if I have a task that's inserting rows into a table while another task is truncating the same table, what happens?
I'm asking because I have a task that runs every minute which inserts rows into a table and then a lambda that reads and truncates the same table that runs every minute. I know snow tasks and event bridge don't run at every minute on the dot so I haven't really run into this issue yet but I'm thinking it'll happen eventually.
How does snowflake handle this?

It is the same concept in other SQL engines, that lock on resources will be placed.
In the Snowflake world, INSERT will have PARTITION level locking, because most of the INSERT statements write only new partitions.
Please see the below doc:
https://docs.snowflake.com/en/sql-reference/transactions.html#resource-locking
If the INSERT query is submitted before the TRUNCATE, then the TRUNCATE will have to wait until the INSERT query finishes. They can't be operated at the same time on the same resource.
See the screenshot below, the first query was the INSERT, which was HOLDING the PARTITION level lock, while the second query was the TRUNCATE, which was in the WAITING state:

The table will be locked by the first transaction that runs and subsequent transactions will be queued until the preceding transaction(s) complete.
BTW (and this may be the point of your question) having two processes like this operate independently doesn’t seem like a good design - as the lambda process seems to be logically dependent on the task.

Does inserting data into SQL Server lock the whole table?

I am using Entity Framework, and I am inserting records into our database which include a blob field. The blob field can be up to 5 MB of data.
When inserting a record into this table, does it lock the whole table?
So if you are querying any data from the table, will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load, will it take longer to cause a deadlock?
Is there a way to monitor and see what is locked at any particular time?
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
This is taking into account that most of my code is just a bunch of select statements, not heaps of long running transactions or anything like that.

Holy cow, you've got a lot of questions in here, heh. Here's a few answers:
When inserting a record into this table, does it lock the whole table?
Not by default, but if you use the TABLOCK hint or if you're doing certain kinds of bulk load operations, then yes.
So if you are querying any data from the table will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
This one gets a little trickier. If someone's trying to select data from a page in the table that you've got locked, then yes, you'll block 'em. You can work around that with things like the NOLOCK hint on a select statement or by using Read Committed Snapshot Isolation. For a starting point on how isolation levels work, check out Kendra Little's isolation levels poster.
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load will it take longer to cause a deadlock?
Deadlocks aren't based on time - they're based on dependencies. Say we've got this situation:
Query A is holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query B
Query B is also holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query A
Neither query can move forward (think Mexican standoff) so SQL Server calls it a draw, shoots somebody's query in the back, releases his locks, and lets the other query keep going. SQL Server picks the victim based on which one will be less expensive to roll back. If you want to get fancy, you can use SET DEADLOCK_PRIORITY LOW on particular queries to paint targets on their back, and SQL Server will shoot them first.
Is there a way to monitor and see what is locked at any particular time?
Absolutely - there's Dynamic Management Views (DMVs) you can query like sys.dm_tran_locks, but the easiest way is to use Adam Machanic's free sp_WhoIsActive stored proc. It's a really slick replacement for sp_who that you can call like this:
sp_WhoIsActive #get_locks = 1
For each running query, you'll get a little XML that describes all of the locks it holds. There's also a Blocking column, so you can see who's blocking who. To interpret the locks being held, you'll want to check the Books Online descriptions of lock types.
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
Believe it or not, a single query can actually deadlock itself, and yes, queries can deadlock on just one table. To learn even more about deadlocks, check out The Difficulty with Deadlocks by Jeremiah Peschka.

If you have direct control over the SQL, you can force row level locking using:
INSERT INTO WITH (ROWLOCK) MyTable(Id, BigColumn)
VALUES(...)
These two answers might be helpful:
Is it possible to force row level locking in SQL Server?
Locking a table with a select in Entity Framework
To view current held locks in Management Studio, look under the server, then under Management/Activity Monitor. It has a section for locks by object, so you should be able to see whether the inserts are really causing a problem.

Deadlock errors generally return quite quickly. Deadlock states do not occur as a result of a timeout error occurring while waiting for a lock. Deadlock is detected by SQL Server by looking for cycles in the lock requests.

The best answer I can come up with is: It depends.
The best way to check is to find your connection SPID and use sp_lock SPID to check if the lock mode is X on the TAB type. You can also verify the table name with SELECT OBJECT_NAME(objid). I also like to use the below query to check for locking.
SELECT RESOURCE_TYPE,RESOURCE_SUBTYPE,DB_NAME(RESOURCE_DATABASE_ID) AS 'DATABASE',resource_database_id DBID,
RESOURCE_DESCRIPTION,RESOURCE_ASSOCIATED_ENTITY_ID,REQUEST_MODE,REQUEST_SESSION_ID,
CASE WHEN RESOURCE_TYPE = 'OBJECT' THEN OBJECT_NAME(RESOURCE_ASSOCIATED_ENTITY_ID,RESOURCE_DATABASE_ID) ELSE '' END OBJETO
FROM SYS.DM_TRAN_LOCKS (NOLOCK)
WHERE REQUEST_SESSION_ID = --SPID here
In SQL Server 2008 (and later) you can disable the lock escalation on the table and enforce a WITH (ROWLOCK) in your insert clause effectively forcing a rowlock. This can't be done prior to SQL Server 2008 (you can write WITH ROWLOCK, but SQL Server can choose to ignore it).
I'm speaking generals here, and I don't have much experience with BLOBs as I usually advise developers to avoid them, especially if larger than 1 MB.

Are all deadlocks caused by a bad query

"Transaction (Process ID 63) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly."
Could this deadlock be caused by something that stored proc uses like SQL mail? Or is it always caused my something like two applications accessing the same table at the same time?

Two tables accessing the same table at the same time happens all the time in an application. Generally that won't cause a deadlock. A deadlock typically happens when you have say process 'A' attempting to update Table 1 and then Table 2 and then Table 3, and you have process 'B' attempting to update Table 3, then Table 2, and then Table 1. Process 'A' will have a resource locked that process 'B' needs and process 'B' has a resource process 'A' needs. SQL Server detects this as a deadlock and rolls one of the processes back, as a failed transaction.
The bottom line is that you have two processes attempting to update the same tables at the same time, but not in the same order. This will often lead to deadlocks.
One easy way to handle this in your application is to handle the failed transaction and simply re-execute the transaction. It will almost always execute successfully. A better solution is to make sure your processes are updating tables in the same order, as much as possible.

Missing Indexes is another common cause of deadlocks. If a select query can get the info it needs from an index instead of the base table, then it won't be blocked by any updates/inserts on the table itself.
To find out for sure, use the SQL profiler to trace for "Deadlock Graph" events, which will show you the detail of the deadlock itself.

Based on this, I don't think SQL Mail itself would directly be the culprit. I say "directly" because I don't know what you're doing with it. However, I assume SQL Mail is probably slow compared to the rest of your SQL ops, so if you're doing a lot with that, it could indirectly create a bottleneck that leads to a deadlock if you're holding onto tables while sending off the SQL Mail.
It's hard to recommend a specific strategy without having too many specifics about what you're doing. The short of it is that you should consider whether there's a way to break your dependence on holding onto the table while you're doing this such as using NOLOCK, using a temp table or non-temp "holding" table or just refactoring the SP that is doing the call.

database record locking

I have a server application, and a database. Multiple instances of the server can run at the same time, but all data comes from the same database (on some servers it is postgresql, in other cases ms sql server).
In my application, there is a process that is performed which can take hours. I need to ensure that this process is only executed one at a time. If one server is processing, no other server instance can process until the first one has completed.
The process depends on one table (let's call it 'ProcessTable'). What I do is, before any server starts the hour-long process, I set a boolean flag in the ProcessTable which indicates that this record is 'locked' and is being processed (not all records in this table are processed / locked, so I need to specifically mark each record which is needed by the process). So when the next server instance comes along while the previous instance is still processing, it sees the boolean flags and throws an exception.
The problem is, that 2 server instances might both be activated at nearly the same time, and when both check the ProcessTable, there may not be any flags set, but both servers are actually in the process of 'setting' the flags but since the transaction hasn't yet commited for either process, neither process will see the locking done by the other process. This is because the locking mechanism itself may take a few seconds, so there is that window of opportunity where 2 servers might still be able to process at the same time.
It appears that what I need is a single record in my 'Settings' table which should store a boolean flag called 'LockInProgress'. So before even a server can lock the needed records in the ProcessTable, it first must make sure that it has full rights to do the locking by checking the 'LockInProgress' column in the Settings table.
So my question is, how do I prevent two servers from both modifying that LockInProgress column in the settings table, at the same time... or am I going about this in the wrong manner?
Please note that I need to support both postgresql and ms sql server as some servers use one database, and some servers use the other.
Thanks in advance...

How about obtaining a lock on the record first and then update the record to show "locked". This would avoid the 2nd instance to get a lock successfully and thereby the update of record fails.
The point is to make sure the lock and update as one atomic step.

Make a stored procedure that hands out the lock, and run it under 'serializable' isolation. This will guarantee that one and only one process can get at the resource at any given time.
Note that this means that the second process trying to get at the lock will block until the first process releases it. Also, if you have to get multiple locks in this manner, make sure that the design of the process guarantees that the locks will be acquired and released in the same order. This will avoid deadlock situations where two processes hold resources while waiting for each other to release locks.
Unless you can't deal with your other processes blocking this would probably be easier to implement and more robust than attempting to implement 'test and set' semantics.

I've been thinking about this, and I think this is the simplest way of doing things; I just execute a command like this:
update settings set settingsValue = '333' where settingsKey = 'ProcessLock' and settingsValue = '0'
'333' would be a unique value which each server process gets (based on date/time, server name, + random value etc).
If no other process has locked the table, then the settingsValue would be = to 0, and that statement would adjust the settingsValue.
If another process has already locked the table, then that statement becomes a no-op, and nothing get's modified.
I then immediately commit the transaction.
Finally, I requery the table for the settingsValue, and if it is the correct value, then our lock succeeded and we continue on, otherwise an exception is thrown, etc. When we're done with the lock, we reset the value back down to 0.
Since I'm using SERIALIZATION transaction mode, I can't see this causing any issues... please correct me if I'm wrong.

Sql Server 2005 - manage concurrency on tables

I've got in an ASP.NET application this process :
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that :
read the table "LoadData" for the specific LoadId.
For each line does a lot of calculations which implies reading dozens of tables and write the results into a temporary (#temp) table (process that last several minutes).
Deletes the lines in "LoadDate" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if I have 2 users that start the process, the second one will have to wait that the previous has finished (because the insert seems to put an exclusive lock on the table) and my application sometimes falls in timeout (and the users are not happy to wait :) ).
I'm looking for a way to be able to have the users that does everything in parallel as there is no interaction, except the last one: writing the result. I think that what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table (is it possible to force SqlServer to only lock rows and not table ?) when the Insert is finished, but without ending the transaction.
Any suggestion?

Look up SET TRANSACTION ISOLATION LEVEL READ COMMITTED SNAPSHOT in Books OnLine.

Transactions should cover small and fast-executing pieces of SQL / code. They have a tendancy to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow thus locking out the other users from querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truely being secured by the transaction, or can you do without it?

if you're sure that there is no issue with cioncurrent operations except the last part, why not start the transaction just before those last statements, Whichever they are that DO require isolation), and commit immediately after they succeed.. Then all the upfront read operations will not block each other...

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight