Should I avoid using sp_getAppLock? - sql-server

I have a stored procedure, and I want to ensure it cannot be executed concurrently.
My (multi-threaded) application does all necessary work on the underlying table via this stored procedure.
IMO, locking the table itself is an unnecessarily drastic action to take, and so when I found out about sp_GetAppLock, which essentially enforces a critical section, this sounded ideal.
My plan was to encase the stored procedure in a transaction and to set up sp_getapplock with transaction scope. The code was written and tested successfully.
The code has now been put forward for review, and I have been told that I should not call this function. However, when I ask the obvious question "why not?", the only reasons I get are highly subjective, to do with any form of locking being complicated.
I don't necessarily buy this, but I was wondering whether anyone had any objective reasons why I should avoid this construct. Like I say, given my circumstances a critical section sounds an ideal approach to me.
Further info: an application sits on top of this with two threads, T1 and T2, each waiting for a different message, M1 and M2 respectively. The business logic says that processing can only happen once both M1 and M2 have arrived. The stored procedure logs that Mx has arrived (an insert) and then checks whether My is present (a select). The built-in locking is fine for making sure the inserts happen serially, but the selects need to happen serially too, and I think I need something over and above the built-in functionality here.
Just for clarity, I want the "processing" to happen exactly once. So I can't afford for the stored procedure to return either false positives or false negatives. I'm worried that if the stored proc runs twice in very quick succession, then both "selects" might return data which indicates that it is appropriate to perform processing.
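For reference, the pattern I tested looks roughly like this (a sketch only; the procedure name, lock name, and timeout are placeholders):

CREATE PROCEDURE dbo.usp_ProcessMessage -- hypothetical name
    @msgType char(2)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;
    DECLARE @rc int;
    EXEC @rc = sp_getapplock
        @Resource    = 'usp_ProcessMessage', -- any agreed-on lock name
        @LockMode    = 'Exclusive',
        @LockOwner   = 'Transaction',        -- released automatically at COMMIT/ROLLBACK
        @LockTimeout = 10000;                -- ms
    IF @rc < 0
    BEGIN
        ROLLBACK;
        RAISERROR('Could not acquire the app lock', 16, 1);
        RETURN;
    END;
    -- log that Mx arrived (insert), check whether My is present (select),
    -- and run the processing exactly once if both are there
    COMMIT;
END;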

What is the procedure doing that you cannot rely on SQL Server's built-in concurrency control mechanisms? Often queries can be rewritten to allow real concurrency.
But if this procedure indeed has to be executed "alone", locking the table itself on first access is most likely going to be a lot faster than using the call to sp_GetAppLock. It sounds like this procedure is going to be called often. If that is the case you should look for a way to achieve the goal with minimal impact.
If the table contains no rows other than M1 and M2, a table lock is still your best bet.
If you have multiple threads sending multiple messages, you can get more fine-grained by using SERIALIZABLE as the transaction isolation level and checking whether the other message is there before you do the insert, but within the same transaction. To prevent deadlocks in this case, make sure you check for both messages, for example like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DECLARE @hasM1 bit, @hasM2 bit;
SELECT
    @hasM1 = MAX(CASE WHEN msg_type = 'M1' THEN 1 ELSE 0 END),
    @hasM2 = MAX(CASE WHEN msg_type = 'M2' THEN 1 ELSE 0 END)
FROM messages WITH (UPDLOCK)
WHERE msg_type IN ('M1', 'M2');
INSERT ...
IF (??) EXEC do_other_stuff_and_delete_messages;
COMMIT;
In the IF statement before(!) the COMMIT you can use the information collected before the insert together with the information that you inserted to decide if additional processing is necessary.
In that processing step make sure to either mark those messages as processed or to delete them all still within the same transaction. That will make sure that you will not process those messages twice.
SERIALIZABLE is the only transaction isolation level that allows you to lock rows that do not exist yet, so the first SELECT with the WITH (UPDLOCK) hint effectively prevents the other row from being inserted while the first execution is still running.
Finally, there are a lot of things to be aware of that could go wrong here. You might want to have a look at Service Broker instead. You could use three queues with that: one for type M1, one for type M2, and a third. Every time a message arrives in one of the first two queues, a procedure can automatically be called to insert a token into the third queue. The third queue can then activate a process to check whether both messages exist and do the work. That would make the entire process asynchronous, but it also makes it easy to restrict the queue 3 activation to only ever run one check at a time.
See Service Broker on MSDN; also look at "activation" for the automatic message processing.
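For illustration only, a sketch of the activation piece with hypothetical queue and procedure names (a full Service Broker setup also needs message types, contracts, services, and conversations):

CREATE QUEUE dbo.TokenQueue
    WITH STATUS = ON,
    ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.usp_CheckBothMessages, -- hypothetical proc that checks for M1 and M2
        MAX_QUEUE_READERS = 1,  -- at most one concurrent check, as described above
        EXECUTE AS OWNER );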

sp_GetAppLock is just like many other tools and as such it can be misused, overused, or correctly used. It is an exact match for the type of problem described by the original poster.
This is a good MSSQL Tips post on the usage
Prevent multiple users from running the same SQL Server stored procedure at the same time
http://www.mssqltips.com/sqlservertip/3202/prevent-multiple-users-from-running-the-same-sql-server-stored-procedure-at-the-same-time/

We use sp_getapplock all the time, due to the fact that we support some legacy applications that have been re-worked to use a SQL back-end, and the SQL Server locking model is not an exact match for our application logic.
We tend to go for a 'pessimistic' locking model, where we lock an entity before allowing a user to edit it, and use the (NOLOCK) hint extensively when reading data to bypass any blocking from the native locks on the actual tables. sp_getapplock is a good match for this. We also use it to enforce critical paths in large multi-user systems. You have to be systematic about how you name the locks you place.
We've found no performance problems with large numbers of users/locks via this route, so I see no reason why it wouldn't work well for you. Just be aware that you can get blocking and deadlocks if you have processes that place the same named locks, but not necessarily in the same order, as sketched below.
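For example, if two procedures both need the same pair of named locks, acquiring them in one fixed order avoids the deadlock (hypothetical lock names):

BEGIN TRAN;
-- Always take multiple app locks in the same agreed order (e.g. alphabetical)
EXEC sp_getapplock @Resource = 'Lock_A', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
EXEC sp_getapplock @Resource = 'Lock_B', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
-- ... protected work ...
COMMIT; -- transaction-owned locks are released here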

You can create a table with a flag for each set of messages, so whichever thread starts processing first marks the flag as "processing".
To make sure the record is blocked properly once one of the threads reaches it, use:
SELECT ... FROM ... WITH (XLOCK, ROWLOCK, READCOMMITTED) WHERE ...
This piece of code puts an exclusive lock on the record, meaning whoever gets to it first owns the row.
Then you make your changes and update the flag; the other thread will see the updated value, because it is blocked by the exclusive lock until the first thread commits or rolls back its transaction.
For this to work you always need to select records from the table with XLOCK; that way it behaves as expected.
Hope this helps.
Exclusive lock proof:
USE master
GO
IF OBJECT_ID('dbo.tblTest') IS NOT NULL
    DROP TABLE dbo.tblTest
CREATE TABLE tblTest ( id int PRIMARY KEY )

-- Fill the table with 1000 rows
;WITH cteNumbers AS (
    SELECT 1 N
    UNION ALL
    SELECT N + 1 FROM cteNumbers WHERE N < 1000
)
INSERT INTO tblTest
SELECT N
FROM cteNumbers
OPTION (MAXRECURSION 0)

BEGIN TRANSACTION
    -- Take an exclusive row lock on id = 1, then inspect the locks held
    SELECT * FROM dbo.tblTest WITH (XLOCK, ROWLOCK, READCOMMITTED) WHERE id = 1
    SELECT * FROM sys.dm_tran_locks WHERE resource_database_id = DB_ID('master')
ROLLBACK TRANSACTION

Related

Primary key conflict even when TABLOCKX and HOLDLOCK hints

I have a table which is used to create locks with unique key to control execution of a critical section over multiple servers, i.e. only one thread at a time from all the web servers can enter that critical section.
The lock mechanism starts by trying to add a record to the database, and if successful it enters the region, otherwise it waits. When it exits the critical section, it removes that key from the table. I have the following procedure for this:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
DECLARE @startTime DATETIME2
DECLARE @lockStatus INT
DECLARE @lockTime INT
SET @startTime = GETUTCDATE()
IF EXISTS (SELECT * FROM GuidLocks WITH (TABLOCKX, HOLDLOCK) WHERE Id = @lockName)
BEGIN
    SET @lockStatus = 0
END
ELSE
BEGIN
    INSERT INTO GuidLocks VALUES (@lockName, GETUTCDATE())
    SET @lockStatus = 1
END
SET @lockTime = (SELECT DATEDIFF(millisecond, @startTime, GETUTCDATE()))
SELECT @lockStatus AS Status, @lockTime AS Duration
COMMIT TRANSACTION GetLock
So I do a SELECT on the table and use TABLOCKX and HOLDLOCK so I get an exclusive lock on the complete table and hold it until the end of the transaction. Then, depending on the result, I either return a fail status (0) or create a new record and return success (1).
However, I am getting this exception from time to time and I just don't know how it is happening:
System.Data.SqlClient.SqlException: Violation of PRIMARY KEY constraint 'PK_GuidLocks'. Cannot insert duplicate key in object 'dbo.GuidLocks'. The duplicate key value is (XXXXXXXXX). The statement has been terminated.
Any idea how this is happening? How is it possible that two threads managed to obtain an exclusive lock on the same table and tried to insert rows at the same time?
UPDATE: It looks like readers might not have fully understood my question, so I would like to elaborate. My understanding is that using TABLOCKX obtains an exclusive lock on the table. I also understood from the documentation (and I could be mistaken) that if I use the HOLDLOCK hint, the lock will be held until the end of the transaction, which in this case I assume (apparently wrongly, but that's what I understood from the documentation) is the outer transaction initiated by the BEGIN TRANSACTION statement and ended by the COMMIT TRANSACTION statement. So the way I understand things, by the time SQL Server reaches the SELECT statement with TABLOCKX and HOLDLOCK, it will try to obtain an exclusive lock on the whole table and will not release it until COMMIT TRANSACTION executes. If that's the case, how come two threads seem to be trying to execute the same INSERT statement at the same time?
If you look up the documentation for TABLOCK and HOLDLOCK, you'll see that it is not doing what you think it is:
Tablock: Specifies that the acquired lock is applied at the table level. The
type of lock that is acquired depends on the statement being executed.
For example, a SELECT statement may acquire a shared lock. By
specifying TABLOCK, the shared lock is applied to the entire table
instead of at the row or page level. If HOLDLOCK is also specified,
the table lock is held until the end of the transaction.
So the reason your query is not working is that you are only getting a shared lock on the table. What Frisbee is attempting to point out is that you don't need to re-implement all of the transaction isolation and locking code, because there is a more natural syntax that handles this implicitly. His version is better than yours because it's much harder to make a mistake that introduces bugs.
More generally, when ordering statements in your query, you should place the statements requiring the more restrictive lock first.
In my concurrent programming text many years ago, we read the parable of the blind train engineers who needed to transport trains in both directions through a single-track pass across the Andes, only one track wide. In the first mutex model, an engineer would walk up to a synchronization bowl at the top of the pass and, if it was empty, place a pebble in it to lock the pass. After driving through the pass he would remove his pebble to unlock the pass for the next train. This is the mutex model you have implemented, and it doesn't work. In the parable a crash occurred soon after implementation, and sure enough there were two pebbles in the bowl: a READ-READ-WRITE-WRITE anomaly due to the multi-threaded environment.
The parable then describes a second mutex model, where there is already a single pebble in the bowl. Each engineer walks up to the bowl and removes the pebble if one is there, placing it in his pocket while he drives through the pass. Then he restores the pebble to unlock the pass for the next train. If an engineer finds the bowl empty he keeps trying (or blocks for some length of time) until a pebble is available. This is the model that works.
You can implement this (correct) model by having (only ever) a single row in the GuidLocks table with a (by default) NULL value for the lock holder. In a suitable transaction each process UPDATEs (in place) this single row with its SPID exactly if the old value IS NULL, returning 1 if this succeeds and 0 if it fails. It updates this column back to NULL when it releases the lock.
This will ensure that the resource being locked actually includes the row being modified, which in your case is clearly not always true.
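A minimal sketch of that model, assuming a one-row GuidLocks table with a hypothetical Holder column (int, NULL when the lock is free):

DECLARE @gotLock int;
-- Acquire: exactly one session can flip Holder from NULL to its SPID
UPDATE GuidLocks SET Holder = @@SPID WHERE Holder IS NULL;
SET @gotLock = @@ROWCOUNT; -- 1 = lock acquired, 0 = somebody else holds it
-- ... critical section ...
-- Release: put the pebble back
UPDATE GuidLocks SET Holder = NULL WHERE Holder = @@SPID;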
See the answer by usr to this question for an interesting example.
I believe you are being confused by the error message. Clearly the engine is locating the row of a potential conflict before testing for the existence of a lock, resulting in a misleading error message; and since (due to implementing model 1 above instead of model 2) the TABLOCK is being held on the resource used by the SELECT instead of the resource used by the INSERT/UPDATE, a second process is able to sneak in.
Note that, especially in the presence of support for snapshot isolation, the resource on which you have taken your TABLOCKX (the table snapshot before any inserts) is not guaranteed to include the resource to which you have written the lock specifics (the table snapshot after an insert).
Use an app lock.
exec sp_getapplock @Resource = @lockName,
     @LockMode = 'Exclusive',
     @LockOwner = 'Session';
Your approach is incorrect from many points of view: granularity (a table lock), scope (a transaction that commits and releases the lock mid-operation), leakage (it will leak locks). Session-scoped app locks are what you actually intend to use.
INSERT INTO GuidLocks
SELECT @lockName, GETUTCDATE()
WHERE NOT EXISTS ( SELECT *
                   FROM GuidLocks
                   WHERE Id = @lockName );
IF @@ROWCOUNT = 0 ...
And to be safe about optimization, use SELECT 1 rather than SELECT * in the NOT EXISTS subquery:
SELECT 1
FROM GuidLocks

Understanding SQL Server LOCKS on SELECT queries

I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
The problem with the WITH (NOLOCK) hint is that you might be reading data rows that aren't going to be inserted at all in the end (if the INSERT transaction is rolled back), so e.g. your report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
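A quick way to see the difference between the three behaviours, assuming a hypothetical dbo.Orders table:

-- Session 1: take an exclusive row lock and keep the transaction open
BEGIN TRAN;
UPDATE dbo.Orders SET Status = 'X' WHERE OrderID = 1;

-- Session 2, meanwhile:
SELECT * FROM dbo.Orders;                  -- blocks on the locked row
SELECT * FROM dbo.Orders WITH (NOLOCK);    -- returns immediately, sees the uncommitted 'X'
SELECT * FROM dbo.Orders WITH (READPAST);  -- returns immediately, silently skips OrderID = 1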
On performance: you keep focusing on the SELECT side.
Shared locks do not block reads.
A shared lock blocks an update.
If you have hundreds of shared locks, it is going to take an update a while to get an exclusive lock, as it must wait for the shared locks to clear.
By default a SELECT (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other SELECTs (1 or 1000).
The difference is how NOLOCK versus a shared lock affects UPDATE or INSERT operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But NOLOCK does not block an update.
This can have a huge impact on the performance of updates. It also impacts inserts.
A dirty read (NOLOCK) just sounds dirty. You are never going to get partial data: if an update is changing John to Sally, you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is staleness on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks and 4 hours to run if users take no locks. Shared locks in this case cause the data to be 16 hours stale.
Don't use NOLOCK wrongly, but it does have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut, that is not a time for NOLOCK.
I have to add one important comment. Everyone mentions that NOLOCK reads only dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add NOLOCK to our SELECT statement, because it was a huge SELECT on a table that is used a lot, by a lot of users at the same time, and we had LOCKS all the time.
I don't know if my example is clear enough; this is a real-life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your session. The NOLOCK keyword allows finer-grained control than setting the isolation level for the whole session.
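The two forms side by side (dbo.Orders is a placeholder table):

-- Per statement, per table:
SELECT * FROM dbo.Orders WITH (NOLOCK);

-- Per session, affecting every subsequent read on the connection:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.Orders;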
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
SELECT with NOLOCK will return records that may or may not end up being inserted; you will read dirty data.
For example, let's say a transaction inserts 1000 rows and then fails:
when you SELECT, you will get the 1000 rows.

What SQL Server 2005/2008 locking approach should I use to process individual table rows in multiple server application instances?

I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).
Looks quite simple, but things get tougher when I add the following application requirements:
Multiple application instances must be running at the same time, for Load Balancing and Fault Tolerance purposes. Typically, the application will be deployed on two or more servers, and will concurrently access the same database table. Each table row must be processed only once, so a common synchronization/locking mechanism must be used between multiple application instances.
When an application instance is processing a set of rows, other application instances shouldn't have to wait for it to end in order to read a different set of rows waiting to be processed.
If an application instance crashes, no manual intervention should need to take place on the table rows that were being processed (such as removing temporary status used for application locking on rows that the crashing instance was processing).
The rows should be processed in a queue-like fashion, i.e., the oldest rows should be processed first.
Although these requirements don't look too complex, I'm having some trouble coming up with a solution.
I've seen locking hint suggestions, such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I see no combination of locking hints that will allow me to implement these requirements.
Thanks for any help.
Regards,
Nuno Guerreiro
This is the typical table-as-queue pattern, as described in Using tables as Queues. You would use a Pending Queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. It is not realistically possible to hold locks for the duration of the web calls. On success, you would remove the pending item.
You also need to be able to dequeue in batches; dequeuing one-by-one is too slow if you go into serious load (hundreds or thousands of operations per second). So, taking the Pending Queue example from the linked article:
create table PendingQueue (
    id int not null identity(1,1), -- generated, so enqueue only supplies DueTime and Payload
    DueTime datetime not null,
    Payload varbinary(max),
    constraint pk_pending_id nonclustered primary key(id));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
    @dueTime datetime,
    @payload varbinary(max)
as
    set nocount on;
    insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go

create procedure usp_dequeuePending
    @batchsize int = 100,
    @retryseconds int = 600
as
    set nocount on;
    declare @now datetime;
    set @now = getutcdate();
    with cte as (
        select top(@batchsize)
            id,
            DueTime,
            Payload
        from PendingQueue with (rowlock, readpast)
        where DueTime < @now
        order by DueTime)
    update cte
        set DueTime = dateadd(seconds, @retryseconds, DueTime)
        output deleted.Payload, deleted.id;
go
On successful processing you would remove the item from the queue using its id. On failure, or on crash, it will be retried automatically in 10 minutes. One thing you must internalize is that, as long as HTTP does not offer transactional semantics, you will never be able to do this with 100% consistent semantics (e.g. guarantee that no item is processed twice). You can achieve a very high margin for error, but there will always be a moment when the system can crash after the HTTP call succeeded but before the database is updated, and that will cause the same item to be retried, since you cannot distinguish this case from a crash before the HTTP call.
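Sketched, the success path looks like this (the id comes from the result set that usp_dequeuePending outputs):

-- After the web call for item @id succeeds:
DELETE FROM PendingQueue WHERE id = @id;
-- On failure or crash there is nothing to do: the dequeue already pushed
-- DueTime forward, so the item becomes visible again after @retryseconds.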
I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.
What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.
Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements. However, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive reads method (DELETE with OUTPUT or OUTPUT INTO).
SET XACT_ABORT ON; -- blow up the whole tran on any errors
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN
UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
    SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
    WHERE StatusID = 1 -- ready
    ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;
-- Do work in application, holding open the tran.
DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- value taken from the recordset that was output earlier
COMMIT TRAN;
In the case of several/many rows being locked at once by a single client, there is a possibility of the row locks escalating to a table lock, so be aware of that. Also, holding long-running transactions that maintain locks is normally a big no-no. It may work in this special usage case, but I fear that high tps by multiple clients will make the system break down. Note that normally, the only processes querying your queue table should be those that are doing queue work. Any processes doing reporting should use READ UNCOMMITTED or WITH (NOLOCK) to avoid interfering with the queue in any way.
What is the implication of rows being processed out of order? If an application instance crashes while another instance is successfully completing rows, this delay will likely cause at least one row to be delayed in its completion, causing the processing order to be incorrect.
If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that has the capacity to check periodically if those named instances are running. When a named instance starts up it would always reset any unprocessed rows that possess its instance identifier (something as simple as "instance A" and "instance B" would work). Additionally, the monitor process would check if the instances are running and if one of them is not, reset the rows for that missing instance, enabling any other instances to run. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
Note: The following links should be edifying:
info about XLOCK
Tables as Queues
You can't do this by relying on SQL transactions as your main component here. Actually, you can do it, but you shouldn't: transactions are not meant to be used this way, for long locks, and you shouldn't abuse them like this.
Keeping a transaction open for that long (retrieve rows, call the web service, get back to make some updates) is simply not good. And there's no optimistic locking isolation level that will allow you to do what you want.
Using ROWLOCK is also not a good idea, because it's just that. A hint. It's subject to lock escalation, and it can be converted to a table lock.
May I suggest a single entry point to your database? I think it fits in the pub/sub design.
So there would be only one component that reads/updates these records:
Reads batches of messages (enough for all your other instances to consume) - 1000, 10000, whatever you see fit. It makes these batches available to the other (concurrent) components through some queued way. I'm not going to say MSMQ :) (it would be the second time today I recommend it, but it's really suitable in your case too).
It marks the messages as in progress or something similar.
Your consumers are all bound, transactionally, to the inbound queue and do their stuff.
When ready, after the web service call, they put the messages in an outbound queue.
The central component picks them up and, inside a distributed transaction, does an update on the database (if it fails the messages will stay in the queue). Since it is the only one that could do that operation you won't have any concurrency issues. At least not on the database.
In the mean time it can read the next pending batch and so on.

SQL Server - Ensuring only one transaction tries to create a table

An internal application needs to dynamically create SQL tables based on some provided criteria. There are multiple consumers of this application.
IF (NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'SomeTableName'))
BEGIN
-- Create table in here.
END
To do this I have the above basic construct within the sproc. I am aware of possible race conditions, so my first solution was to add some locking hints to the SELECT statement, to ensure that all other transactions checking for the existence of the table would be blocked until the other transaction had finished. However, no matter which hints I used, this would not work.
My next solution was to wrap the table creation in a TRY..CATCH so that even if it did fail, I could just ignore the error. However, the failure of the CREATE TABLE statement dooms the transaction so I cannot carry on even if I do ignore the error.
My last solution, which works, was to use the TRY..CATCH construct and if an error is raised then GOTO the top of the sproc where a fresh transaction is created and everything goes through fine as the table exists second time round.
I am not happy with the solution as it seems like a hack. Any SQL gurus out there who knows a clean solution to this issue?
Just to clarify, the solution I discussed above does not have a large impact on performance, so I am really looking for a clean solution which doesn't have large performance implications.
Use semaphores (a.k.a. manual locking) with sp_getapplock (at the top of the code) and sp_releaseapplock (at the bottom of the code) to ensure that only one process runs at a time.
A second process will fail, wait, or time out based on your sp_getapplock parameters.
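A sketch of that pattern applied to the table-creation problem, with a hypothetical lock name:

DECLARE @rc int;
EXEC @rc = sp_getapplock
    @Resource    = 'Create_SomeTableName', -- hypothetical lock name
    @LockMode    = 'Exclusive',
    @LockOwner   = 'Session',              -- held across transactions until released
    @LockTimeout = 10000;                  -- ms; a 2nd caller waits up to this long
IF @rc >= 0
BEGIN
    IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
                   WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'SomeTableName')
        CREATE TABLE dbo.SomeTableName (Id int NOT NULL PRIMARY KEY);
    EXEC sp_releaseapplock @Resource = 'Create_SomeTableName', @LockOwner = 'Session';
END;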

Does inserting data into SQL Server lock the whole table?

I am using Entity Framework, and I am inserting records into our database which include a blob field. The blob field can be up to 5 MB of data.
When inserting a record into this table, does it lock the whole table?
So if you are querying any data from the table, will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load, will it take longer to cause a deadlock?
Is there a way to monitor and see what is locked at any particular time?
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
This is taking into account that most of my code is just a bunch of select statements, not heaps of long running transactions or anything like that.
Holy cow, you've got a lot of questions in here, heh. Here's a few answers:
When inserting a record into this table, does it lock the whole table?
Not by default, but if you use the TABLOCK hint or if you're doing certain kinds of bulk load operations, then yes.
So if you are querying any data from the table will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
This one gets a little trickier. If someone's trying to select data from a page in the table that you've got locked, then yes, you'll block 'em. You can work around that with things like the NOLOCK hint on a select statement or by using Read Committed Snapshot Isolation. For a starting point on how isolation levels work, check out Kendra Little's isolation levels poster.
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load will it take longer to cause a deadlock?
Deadlocks aren't based on time - they're based on dependencies. Say we've got this situation:
Query A is holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query B
Query B is also holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query A
Neither query can move forward (think Mexican standoff) so SQL Server calls it a draw, shoots somebody's query in the back, releases his locks, and lets the other query keep going. SQL Server picks the victim based on which one will be less expensive to roll back. If you want to get fancy, you can use SET DEADLOCK_PRIORITY LOW on particular queries to paint targets on their back, and SQL Server will shoot them first.
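For example:

SET DEADLOCK_PRIORITY LOW; -- this session volunteers to be the victim in any deadlock
-- ... run the expendable query here ...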
Is there a way to monitor and see what is locked at any particular time?
Absolutely. There are Dynamic Management Views (DMVs) you can query, like sys.dm_tran_locks, but the easiest way is to use Adam Machanic's free sp_WhoIsActive stored proc. It's a really slick replacement for sp_who that you can call like this:
sp_WhoIsActive @get_locks = 1
For each running query, you'll get a little XML that describes all of the locks it holds. There's also a Blocking column, so you can see who's blocking who. To interpret the locks being held, you'll want to check the Books Online descriptions of lock types.
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
Believe it or not, a single query can actually deadlock itself, and yes, queries can deadlock on just one table. To learn even more about deadlocks, check out The Difficulty with Deadlocks by Jeremiah Peschka.
If you have direct control over the SQL, you can force row-level locking using:
INSERT INTO MyTable WITH (ROWLOCK) (Id, BigColumn)
VALUES(...)
These two answers might be helpful:
Is it possible to force row level locking in SQL Server?
Locking a table with a select in Entity Framework
To view current held locks in Management Studio, look under the server, then under Management/Activity Monitor. It has a section for locks by object, so you should be able to see whether the inserts are really causing a problem.
Deadlock errors generally return quite quickly. Deadlock states do not occur as a result of a timeout error occurring while waiting for a lock. Deadlock is detected by SQL Server by looking for cycles in the lock requests.
The best answer I can come up with is: It depends.
The best way to check is to find your connection SPID and use sp_lock SPID to check if the lock mode is X on the TAB type. You can also verify the table name with SELECT OBJECT_NAME(objid). I also like to use the query below to check for locking:
SELECT RESOURCE_TYPE, RESOURCE_SUBTYPE, DB_NAME(RESOURCE_DATABASE_ID) AS 'DATABASE', resource_database_id DBID,
       RESOURCE_DESCRIPTION, RESOURCE_ASSOCIATED_ENTITY_ID, REQUEST_MODE, REQUEST_SESSION_ID,
       CASE WHEN RESOURCE_TYPE = 'OBJECT' THEN OBJECT_NAME(RESOURCE_ASSOCIATED_ENTITY_ID, RESOURCE_DATABASE_ID) ELSE '' END OBJETO
FROM SYS.DM_TRAN_LOCKS (NOLOCK)
WHERE REQUEST_SESSION_ID = --SPID here
In SQL Server 2008 (and later) you can disable lock escalation on the table and enforce a WITH (ROWLOCK) hint in your INSERT statement, effectively forcing a row lock. This can't be done prior to SQL Server 2008 (you can write WITH (ROWLOCK), but SQL Server can choose to ignore it).
I'm speaking in generalities here, and I don't have much experience with BLOBs, as I usually advise developers to avoid them, especially if larger than 1 MB.
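A sketch of that 2008+ approach, using a placeholder table:

ALTER TABLE dbo.MyTable SET (LOCK_ESCALATION = DISABLE);

INSERT INTO dbo.MyTable WITH (ROWLOCK) (Id, BigColumn)
VALUES (1, 0x00);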
