Is there a way (using config + transaction isolation levels) to ensure that there are no interim holes in a SQL Server IDENTITY column? Persistent holes are OK. The situation I am trying to avoid is when one query returns a hole but a subsequent similar query returns a row that was not yet committed when the query had been run the first time.
Your question is one of isolation levels and has nothing to do with IDENTITY. The same problem applies to any insert/update visibility. The first query can return results that include an uncommitted row in one and only one situation: if you use dirty reads (READ UNCOMMITTED). If you do, then you deserve all the inconsistent results you'll get, and you deserve no help.
If you want to see stable results between two consecutive reads, you must have a transaction that encompasses both reads and use the SERIALIZABLE isolation level or, better, a row-versioning based isolation level like SNAPSHOT. My recommendation would be to enable SNAPSHOT and use it. See Using Snapshot Isolation.
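A minimal sketch of that recommendation (the database MyDb and table dbo.Orders are placeholders, not from the question):
-- one-time setup: enable the row-versioning based SNAPSHOT isolation level
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- both reads see the data as it was when the transaction started
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT MAX(OrderId) FROM dbo.Orders;  -- first read
SELECT COUNT(*) FROM dbo.Orders;      -- second read, same consistent view
COMMIT TRANSACTION;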
All I need is the promise that inserts to a table are committed in order of identity values they claim.
I hope you read this again and realize the impossibility of the request ('promise ... commit ...'). You can't ask for a guarantee of success before something has finished. What you're asking for eventually boils down to not allocating a new identity value before the previously allocated one has committed successfully. In other words, full serialization of all insert transactions.
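To illustrate what such full serialization would amount to (not a recommendation), here is a sketch using an application lock; the table dbo.Orders and the lock name are made up:
-- every inserter takes the same exclusive application lock, so inserts run strictly one at a time
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'Orders_Insert', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
INSERT INTO dbo.Orders (CustomerId) VALUES (42);  -- identity value allocated here
COMMIT TRANSACTION;  -- the lock is released with the commit; only then can the next insert proceed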
Related
Recently I was thinking about query consistency in various SQL and NoSQL databases. What happens when I have a (long-running) query and rows are inserted or updated while the query is running? A simple theoretical example:
Let’s assume the following query takes a long time:
SELECT SUM(salary) FROM emp;
And while this query is running, another transaction does:
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;
COMMIT;
When the SUM query has read half of the affected employees before the update, and the other half after the update, I would get an inconsistent, nonsensical result. Does this phenomenon have a name? By definition, it is not really a phantom read, because only one query is involved.
How do various DBs handle this situation? I am especially interested in SQL Server, MongoDB, RavenDB and Azure Table Storage.
Oracle for example guarantees statement-level read consistency, which says that the data returned by a single query is committed and consistent for a single point in time.
UPDATE: SQL Server seems to only prevent this kind of problem when READ_COMMITTED_SNAPSHOT is set to ON.
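For reference, whether that option is on can be checked in the standard catalog view (the database name is a placeholder):
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'MyDb';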
I believe the term you're looking for is "Dirty Read"
I can answer this one for SQL server.
You get 5 options for transaction isolation level, where the default is READ COMMITTED.
Only READ UNCOMMITTED allows dirty reads. You'll have to specifically enable that using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.
READ UNCOMMITTED is equivalent to NOLOCK, but syntactically nicer (opinion) as it doesn't need to be repeated for each table in your query.
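To illustrate the difference, a sketch with placeholder tables dbo.Orders and dbo.Customers:
-- hint form: must be repeated for every table
SELECT o.OrderId, c.Name
FROM dbo.Orders o WITH (NOLOCK)
JOIN dbo.Customers c WITH (NOLOCK) ON c.CustomerId = o.CustomerId;
-- session form: set once, applies to every table in subsequent queries on this connection
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT o.OrderId, c.Name
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerId = o.CustomerId;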
Possible isolation levels are listed below. I've linked the docs for more detail; if future readers find the link stale, please edit.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE
By default (READ COMMITTED), your query runs and the update is blocked by the shared locks taken by your SELECT until it completes.
If you enable Read Committed Snapshot Isolation Level (RCSI) as a database option, you continue to see the previous version of the data but the update isn't blocked.
Similarly, if the update was running first, when you have RCSI enabled, it doesn't block you, but you see the data as it was before the update started.
RCSI is generally (but not 100% always) a good thing. I always design with it on. In Azure SQL DB, it's on by default.
Say that a method only reads data from a database and does not write to it. Is it always the case that such methods don't need to run within a transaction?
In many databases a request for reading from the database which is not in an explicit transaction implicitly creates a transaction to run the request.
In a SQL database you may want to use a transaction if you are running multiple SELECT statements and you don't want changes from other transactions to show up in one SELECT but not an earlier one. A transaction running at the SERIALIZABLE transaction isolation level will present a consistent view of the data across multiple statements.
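A sketch of that pattern (dbo.Orders is a placeholder table):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM dbo.Orders;    -- first read
SELECT SUM(Total) FROM dbo.Orders;  -- second read sees exactly the same data
COMMIT TRANSACTION;                 -- locks held until here keep concurrent writers out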
No. If you don't read at a specific isolation level you might not get enough guarantees. For example rows might disappear or new rows might appear.
This is true even for a single statement:
select * from Tab
except select * from Tab
This query can actually return rows in case of concurrent modifications because it scans the table twice.
SQL Server: There is an easy way to get fast, nonblocking, nonlocking, consistent reads: Enable snapshot isolation and read in a snapshot transaction. AFAIK Oracle has this capability as well. Postgres too.
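For example, applied to the two-scan query above (assuming ALLOW_SNAPSHOT_ISOLATION has been enabled on the database):
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
select * from Tab
except select * from Tab;  -- returns no rows: both scans read the same snapshot
COMMIT TRANSACTION;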
The purpose of a transaction is to roll back or commit the operations done to a database. If you are just selecting values and making no changes to the data, there is no need for a transaction.
My SQL tempOrder table has millions of records, and 10 triggers update tempOrder whenever another table is updated.
So I want to apply WITH(NOLOCK) to the table.
I know I can do this per statement with
SELECT * FROM tempOrder with(NOLOCK)
But is there any way to apply with(NOLOCK) directly to the table in SQL Server 2008?
The direct answer to your question is NO -- there is no option to tell SQL Server to never lock TableX. With that said, your question opens up a whole series of things that should be brought up.
Isolation Level
First, the most direct way you can accomplish what you want is to use the WITH (NOLOCK) hint or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED (aka chaos). These options apply per query or for the duration of the connection, respectively. If I chose this route I would combine it with a long-running SQL Profiler trace to identify any queries taking locks on TableX.
Lock Escalation
Second, SQL Server does have a table-wide LOCK_ESCALATION setting (ALTER TABLE ... SET (LOCK_ESCALATION = AUTO | TABLE | DISABLE)). This controls whether, and at what granularity, SQL attempts to consolidate many fine-grained locks into fewer coarse-grained locks once a statement has taken out a large number of locks on a single database object (think index).
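For example (dbo.TableX is a placeholder; the valid settings are AUTO, TABLE and DISABLE):
-- allow escalation to the partition level on a partitioned table instead of the whole table
ALTER TABLE dbo.TableX SET (LOCK_ESCALATION = AUTO);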
Overriding SQL's lock escalation generally isn't a good idea. As the documentation states:
In most cases, the Database Engine delivers the best performance when operating with its default settings for locking and lock escalation.
As counterintuitive as it may seem, in the scenario you described you might have more luck with fewer, broader locks instead of NOLOCK. You'll need to test this theory with a real workload to determine whether it's worthwhile.
Snapshot Isolation
You might also check out the SNAPSHOT isolation level. There isn't enough information in your question to know, but I suspect it would help.
Dangers of NOLOCK
With that said, as you might have picked up from #GSerg's comment, NOLOCK can be evil. No-Lock is colloquially referred to as Chaos--and for good reason. When developers first encounter NOLOCK it seems like allowing dirty reads is the only implication. There are more...
Dirty data is read, giving inconsistent results (the common impression)
Wrong data -- values consistent with neither the pre-write nor the post-write state of your data
Hard exceptions (like error 601 due to data movement) that terminate your query
Blank data is returned
Previously committed rows are missed
Malformed bytes are returned
But don't take my word for it:
Actual Email: "NoLOCK is the epitome of evil?"
SQL Server NOLOCK hint & other poor ideas
Is the nolock hint a bad practice
This is not something you configure on a table.
If you add (nolock) to the query (it is called a query hint) you are saying that when executing this (and only this) query, it won't take locks on the affected tables.
Of course, you can make this behavior the default for the current connection by setting the transaction isolation level to read uncommitted, for example: set transaction isolation level read uncommitted. But again, it is only valid while that connection is open.
Perhaps if you explain in more detail what you are trying to achieve, we can better help you.
You cannot change the default isolation level for a table or a database (except for the snapshot options); however, you can change it for all read queries on the current connection:
set transaction isolation level read uncommitted
See msdn for more information.
I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update rows (on new dates) and monthly and daily reporting processes. The fact tables are partitioned by the dates for easy rolloff of old data.
I understand that WITH(NOLOCK) can cause uncommitted data to be read; however, I also do not wish to create any locks which would cause the ETL processes to fail or block.
In all cases, when we are reading from the DW, we are reading from fact tables for a date which will not change (the fact tables are partitioned by date) and dimension tables which will not have attributes changing for the facts they are linked to.
So - are there any disadvantages? - perhaps in the execution plans or in the operation of such SELECT-only queries running in parallel off the same tables.
This is what you probably need:
ALTER DATABASE AdventureWorks
SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE AdventureWorks
SET ALLOW_SNAPSHOT_ISOLATION ON;
Then go ahead and use
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
in your queries. According to BOL:
The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:
If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.
If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.
Hope this helps.
As long as it's all no-update data there's no harm, but I'd be surprised if there's much benefit either. I'd say it's worth a try. The worst that will happen is that you'll get incomplete and/or inconsistent data if you are in the middle of a batch insert, but you can decide if that invalidates anything useful.
Have you considered creating a DATABASE SNAPSHOT of your DW and run your reports off it?
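A sketch of that option; the snapshot name and file path are made up, and NAME must match the logical data file name of the source database:
CREATE DATABASE MyDW_Reporting ON
    (NAME = MyDW_Data, FILENAME = 'D:\Snapshots\MyDW_Reporting.ss')
AS SNAPSHOT OF MyDW;
-- reports then query MyDW_Reporting and see a static, transactionally consistent copy of MyDW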
Yes. Your SQL will be far less readable, and you will inevitably miss some NOLOCK hints, because the hint has to be repeated for every table in every SELECT.
You can get the same thing by setting the isolation level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
In the end you get a 10% performance boost (sorry, I'm too lazy to look up the article for it, but it's out there).
I'd say a 10% gain isn't worth reducing readability.
If making the whole database read-only is possible, then this is a better option. You'll get read-uncommitted performance without having to modify all your code.
ALTER DATABASE AdventureWorks SET READ_ONLY
NOLOCK performs a ‘dirty read’ (incidentally, READ UNCOMMITTED does the same thing as NOLOCK). If the database is being updated as you read, there is a danger that you will get inconsistent data back. The only option is to either accept locking, and hence blocking, or to pick one of the two new isolation levels offered in SQL 2005 onwards, discussed here.
There should be only one service in a Kimball DWH that manipulates data: the ETL process itself.
If you have a full end-to-end ETL job you will never encounter locks (when you set the dependencies of the sub-tasks correctly).
But: if you have independent jobs, which update data pipelines end to end from sourcing up to the stars, models and reports, you need a concept to ensure consistency and accessibility for concurrent jobs sharing resources/artefacts. Good advice is to use partitioned tables, update cloned tables, and then switch the updated partitions of the involved tables together in one short transaction (after the ETL process), so the main table stays consistent with the others and accessible all the time.
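A rough sketch of that switch step (table names and the partition number are placeholders; both tables must have identical structure and the target partition must be empty):
BEGIN TRANSACTION;
-- metadata-only operation: moves the freshly loaded partition into the live fact table
ALTER TABLE dbo.FactSales_Staging SWITCH PARTITION 42 TO dbo.FactSales PARTITION 42;
COMMIT TRANSACTION;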
This pattern is a best practice, but not without stones in your road; if you google a bit, you will agree.
In an ASP.NET application I've got this process:
Start a connection
Start a transaction
Insert a lot of values into a table "LoadData" with the SqlBulkCopy class, with a column that contains a specific LoadId.
Call a stored procedure that:
Reads the table "LoadData" for the specific LoadId.
For each row, does a lot of calculations (which involve reading dozens of tables) and writes the results into a temporary (#temp) table (a process that lasts several minutes).
Deletes the rows in "LoadData" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if 2 users start the process, the second one has to wait until the first has finished (because the insert seems to put an exclusive lock on the table), and my application sometimes hits a timeout (and the users are not happy to wait :) ).
I'm looking for a way to let the users run everything in parallel, as there is no interaction between them except the last step: writing the result. I think that what is blocking me is the inserts/deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table when the insert is finished, but without ending the transaction (is it possible to force SQL Server to lock only rows and not the table?).
Any suggestion?
Look up the READ_COMMITTED_SNAPSHOT database option (read committed snapshot isolation) in Books Online.
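That is a database-level option, so enabling it looks roughly like this (the database name is a placeholder):
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- readers under the default READ COMMITTED level then use row versions instead of shared locks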
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendency to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow, thus locking out other users from querying or updating the same row / page / table.
Why not forget the transaction and handle processing errors in another way? Is your data integrity truly being secured by the transaction, or can you do without it?
If you're sure that there is no issue with concurrent operations except for the last part, why not start the transaction just before those last statements (whichever they are that DO require isolation), and commit immediately after they succeed? Then all the upfront read operations will not block each other.