I am trying to get range locks working with Entity Framework. Let's say I have a table with the following columns:
| Id | int |
| Type | int |
| Value | int |
Where Id is a PRIMARY KEY with a CLUSTERED INDEX and Type has a NON-CLUSTERED, NON-UNIQUE INDEX.
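For reference, a minimal DDL sketch of such a table (the table and index names are illustrative):
CREATE TABLE dbo.MyTable
(
    Id    int NOT NULL,
    Type  int NOT NULL,
    Value int NOT NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (Id)
);
-- non-unique, non-clustered index on the filter column
CREATE NONCLUSTERED INDEX IX_MyTable_Type ON dbo.MyTable (Type);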
If I want to select a value within a serializable transaction using this code:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT Value FROM MyTable WHERE Type = 5
SELECT * FROM sys.dm_tran_locks WHERE request_session_id = @@SPID AND resource_type = 'KEY'
COMMIT
It correctly range-locks the row with Type = 5 and the next row.
If I do this query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT Id, Type, Value FROM MyTable WHERE Type = 5
SELECT * FROM sys.dm_tran_locks WHERE request_session_id = @@SPID AND resource_type = 'KEY'
COMMIT
It locks all rows. Unfortunately, Entity Framework selects all columns:
SELECT [Id], [Type], [Value] FROM ...
I am filtering my real table on a column with a FOREIGN KEY, and this column is not unique. I tried making my NON-CLUSTERED INDEX on the Type column UNIQUE, and then it locks the correct rows even when I select all columns.
How can I get the same behavior with a NON-UNIQUE INDEX?
What is locked depends on the query plan: everything that the plan reads is subject to locking. So you need to make the index that you want to lock on attractive to SQL Server. Start by creating an optimal index for that query.
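For example, a covering index on Type (a sketch; the index name is made up) lets the plan stay on the index you want the range locks taken on, even when all three columns are selected:
-- The clustered key (Id) is carried in every non-clustered index anyway,
-- so INCLUDE (Value) is enough to cover
-- SELECT Id, Type, Value ... WHERE Type = 5 from this index.
CREATE NONCLUSTERED INDEX IX_MyTable_Type_Covering
ON dbo.MyTable (Type) INCLUDE (Value);
Whether the optimizer actually chooses it still depends on statistics and row counts.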
Why do you want a specific locking pattern to occur? If it's for performance reasons, that is totally valid. If it's for behavioral reasons, that is quite unreliable.
You can also make EF select fewer columns by selecting DTO objects (e.g. anonymous types) instead of full entities.
It's a pity that a SERIALIZABLE transaction can't take a range lock via the clustered index when the WHERE clause filters on other columns that have a NON-UNIQUE INDEX or no index at all.
I found a nice workaround for Entity Framework.
If you want to LOCK ROWS with specific values, for example all rows with Type = 'FINISHED', create a NON-UNIQUE index (if the column can contain duplicates).
We then have to tell SQL Server which INDEX to use:
var tables = context.MyTables.SqlQuery("SELECT * FROM dbo.MyTable WITH(INDEX(MyIndex)) WHERE Type='FINISHED'").ToList();
I used WITH(INDEX(MyIndex)), so it locks all rows where Type = 'FINISHED' even though the index is NON-UNIQUE.
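For reference, the plain T-SQL equivalent, run inside the serializable transaction, is just the hinted query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM dbo.MyTable WITH(INDEX(MyIndex)) WHERE Type = 'FINISHED'
-- ... do the work while the matching key range stays locked ...
COMMIT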
Perhaps someone will come up with a better solution than a raw query.
EDIT: Range locking works with a NON-UNIQUE INDEX without any problem. The index was simply not used before because there was not enough data in the database.
Some of our tables have dynamic relationships to other tables.
For example, we have an address table that stores all addresses. In it are two 'linking' fields, entity_id and entity_key_id, that are used to link the addresses to other tables.
For instance, 'member' might be entity_id 1 and 'organization' might be entity_id 2, so if we are storing a member address the row would have entity_id = 1 and entity_key_id = mem_id (the PK of the mem table), but if we are storing an organization address, the row would have entity_id = 2 and entity_key_id would store the PK of the org table.
How best would I index this? Should I have two indexes, one for entity_id and one for entity_key_id? Or would it be better to include both columns in a single index, and if so, in what order?
The DB is SQL Server 2008 R2.
It depends on the queries you are going to run against this database. You can use the Database Engine Tuning Advisor (https://msdn.microsoft.com/en-us/library/ms166575(v=sql.100).aspx); it will help.
But generally your index should list the columns you use in the predicate (WHERE) first, followed by the columns you select. Here are some examples.
SELECT ... from table1 where table1.column1=.. and table1.column2=...
Here you should have a (column1, column2) index, so the DB will be able to resolve both predicates directly from the index. You may also have just a (column1) index, but in that case the DB will first find the matches on column1 in the index, and then go to the table itself to check column2, which is slower.
But if you have a (column2, column42, column1) index, it can only seek on column2; the DB can't use it to narrow down column1, because column42 sits in between and is not in your WHERE condition.
It is also good to have the selected columns in the index (a covering index).
Running SELECT column1 FROM table1 WHERE column2 = ... with a (column2, column1) index gives the DB the ability to read both columns from the index and not even touch the table! It is fast. But if you reverse the order in this index, it will not be used for a seek, because the DB needs column2 first (according to the WHERE). See the sketch below.
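As a sketch (the table, columns, and index name here are the made-up ones from above), the covering index for that last query would be:
-- Covers: SELECT column1 FROM table1 WHERE column2 = ...
-- column2 leads because it is the seek predicate; column1 rides along,
-- so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_table1_column2_column1
ON table1 (column2, column1);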
You should always use the profiler to obtain the execution plan (see https://msdn.microsoft.com/en-us/library/ms181091(v=sql.100).aspx): not only does it help you find bottlenecks, it also teaches you how the DB optimizer uses indexes.
A cleaner approach would be to have two distinct keys mem_key_id and org_key_id. This not only allows you to create an index on each of them but also to declare proper foreign key constraints.
One of the two keys would always be null.
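A sketch of that design (names are illustrative, and the referenced PK columns are assumptions; adapt to the real mem/org schemas):
CREATE TABLE address
(
    address_id  int NOT NULL PRIMARY KEY,
    mem_key_id  int NULL REFERENCES mem (mem_id),  -- PK of the mem table
    org_key_id  int NULL REFERENCES org (org_id),  -- assumed PK name of the org table
    -- exactly one of the two keys must be set; the other stays NULL
    CONSTRAINT CK_address_one_owner CHECK (
        (mem_key_id IS NOT NULL AND org_key_id IS NULL)
        OR (mem_key_id IS NULL AND org_key_id IS NOT NULL))
);
CREATE NONCLUSTERED INDEX IX_address_mem_key_id ON address (mem_key_id);
CREATE NONCLUSTERED INDEX IX_address_org_key_id ON address (org_key_id);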
Partition function and scheme:
CREATE PARTITION FUNCTION SegmentPartitioningFunction (smallint)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) ;
GO
CREATE PARTITION SCHEME SegmentPartitioningSchema
AS PARTITION SegmentPartitioningFunction ALL TO ([PRIMARY])
GO
I have a table (note the table is M:N):
CREATE TABLE PartTable
(
    prd_id int,
    cat_id smallint,
    datacolumn smallint,
    PRIMARY KEY (prd_id, cat_id) ON SegmentPartitioningSchema(cat_id)
) ON SegmentPartitioningSchema(cat_id)
NOTE: I tried all 3 combinations (partitioned table, partitioned index, and both partitioned).
In addition, I have indexes on PRD_ID and CAT_ID by themselves, as they are foreign keys.
I set LOCK_ESCALATION to AUTO as confirmed using this select:
SELECT lock_escalation_desc FROM sys.tables
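For reference, the statement that sets this (available from SQL Server 2008 on) is:
-- AUTO allows lock escalation to go to the partition level
-- instead of the whole table on a partitioned table
ALTER TABLE PartTable SET (LOCK_ESCALATION = AUTO);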
Now, here is what I am trying to do, which by my understanding should work:
begin tran
update PartTable set DataColumn = DataColumn where cat_id = 1
-- commit tran -- this doesn't happen yet
and at the same time in different connection
begin tran
update PartTable set DataColumn = DataColumn where cat_id = 2
-- commit tran -- this doesn't happen yet
The execution plan is on Pastebin: http://pastebin.com/MgK6whYG
Bonus Q: Why is the TOP operator used?
By my understanding, I should get IX locks on the partitions, and only on one partition per transaction, which means both transactions can complete at the same time: the second one, where cat_id = 2, shouldn't have to wait for the first one, as both table and index are partitioned and the index is aligned.
However, this doesn't happen, and when I check the execution plan, I see that the table scan happens on every partition, even though that doesn't make sense to me.
I tried TechNet, the documentation, and Stack Overflow, but I didn't find an answer.
Now I know that the PK should be (CAT_ID, PRD_ID); this makes more sense given the way the table is used.
I am currently working on SQL Server 2012, but I suppose it should work better than on 2008 R2.
"T-SQL Querying" book (http://www.amazon.com/Inside-Microsoft-Querying-Developer-Reference/dp/0735626030) has an interesting example, where, querying a table under default transaction isolation level during clustered index key column update, you may miss a row or read a row twice. It looks to be acceptable, since updating table/entity key is not a good idea anyway. However, I've updated this example so that the same happens, when you update non-clustered index key column value.
Following is the table structure:
SET NOCOUNT ON;
USE master;
IF DB_ID('TestIndexColUpdate') IS NULL CREATE DATABASE TestIndexColUpdate;
GO
USE TestIndexColUpdate;
GO
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;
CREATE TABLE dbo.Employees
(
empid CHAR(900) NOT NULL, -- this column should be big enough, so that 9 rows fit on 2 index pages
salary MONEY NOT NULL,
filler CHAR(1) NOT NULL DEFAULT('a')
);
CREATE INDEX idx_salary ON dbo.Employees(salary) include (empid); -- include empid into index, so that test query reads from it
ALTER TABLE dbo.Employees ADD CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid);
INSERT INTO dbo.Employees(empid, salary) VALUES
('A', 1500.00),('B', 2000.00),('C', 3000.00),('D', 4000.00),
('E', 5000.00),('F', 6000.00),('G', 7000.00),('H', 8000.00),
('I', 9000.00);
This is what needs to be done in the first connection (on each update, the row will jump between 2 index pages):
SET NOCOUNT ON;
USE TestIndexColUpdate;
WHILE 1=1
BEGIN
UPDATE dbo.Employees SET salary = 10800.00 - salary WHERE empid = 'I'; -- on each update, "I" employee jumps between 2 pages
END
This is what needs to be done in the second connection:
SET NOCOUNT ON;
USE TestIndexColUpdate;
DECLARE @c INT
WHILE 1 = 1
BEGIN
SELECT salary, empid FROM dbo.Employees
IF @@ROWCOUNT <> 9 BREAK;
END
Normally, this query should return the 9 records we inserted in the first code sample. However, very soon I see 8 records being returned. This query reads all its data from the "idx_salary" index, which is being updated by the previous sample code.
This seems to be quite a lax attitude towards data consistency on SQL Server's part. I would expect some locking coordination when data is being read from an index while its key column is being updated.
Do I interpret this behavior correctly? Does this mean that even non-clustered index keys should not be updated?
UPDATE:
To solve this problem, you only need to enable "snapshots" on the db (READ_COMMITTED_SNAPSHOT ON). No more deadlocking or missing rows. I've tried to summarize all of this here: http://blog.konstantins.net/2015/01/missing-rows-after-updating-sql-server.html
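For reference, enabling it is a single database-level setting (WITH ROLLBACK IMMEDIATE kicks out other sessions so the change can be applied):
ALTER DATABASE TestIndexColUpdate
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;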
UPDATE 2:
This seems to be the very same problem as in this good old article: http://blog.codinghorror.com/deadlocked/
Do I interpret this behavior correctly?
Yes.
Does this mean that even non-clustered index keys should not be updated?
No. You should use a proper isolation level or make the application tolerate the inconsistencies that READ COMMITTED allows.
This issue of missing rows is not limited to clustered indexes. It is caused by moving a row in a b-tree. Clustered and nonclustered indexes are implemented as b-trees with only tiny physical differences between them.
So you are seeing the exact same physical phenomenon. It applies every time your query reads a range of rows from a b-tree. The contents of that range can move around.
Use an isolation level that provides the guarantees that you need. For read-only transactions, the snapshot isolation level is usually a very elegant and complete solution to concurrency. It seems to apply to your case.
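As a sketch, enabling snapshot isolation for this database and using it in the reading session would look like this:
-- one-time database setting
ALTER DATABASE TestIndexColUpdate SET ALLOW_SNAPSHOT_ISOLATION ON;
-- in the reading session: the SELECT sees a transactionally consistent
-- version of the rows and takes no shared locks
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN
SELECT salary, empid FROM dbo.Employees;
COMMIT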
This seems to be quite a lax attitude towards data consistency on SQL Server's part. I would expect some locking coordination when data is being read from an index while its key column is being updated.
This is an understandable request. On the other hand, you specifically requested a low level of isolation. You can dial all the way up to SERIALIZABLE if you want. SERIALIZABLE gives you as-if-serial execution.
Missing rows are just one special case of the many effects that READ COMMITTED allows. It makes no sense to specifically prevent them while allowing all kinds of other inconsistencies.
SET NOCOUNT ON;
USE TestIndexColUpdate;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE @c INT
WHILE 1 = 1
BEGIN
DECLARE @count INT
SELECT @count = COUNT(*) FROM dbo.Employees WITH (INDEX (idx_salary))
WHERE empid > '' AND CONVERT(NVARCHAR(MAX), empid) > '__'
AND salary > 0
IF @count <> 9 BREAK;
END
We are using MS SQL Server 2005.
Hi, I am performing an UPDATE statement on a database table. Let's say this table has the following columns:
int Id PK
int Column1
int Column2
It also has several indexes:
Unique Clustered (Id)
Non-Unique Non-Clustered (Column1)
Non-Unique Non-Clustered (Column2)
I perform the following operation:
UPDATE [dbo].[Table]
SET Column1 = @Value1
WHERE Column1 = @Param1
AND Column2 = @Param2
The actual execution plan after that looks like this (execution plan screenshot):
It says that 86% of the time was spent on updating the clustered index, which does not include the column I have just changed.
This operation should run hundreds of thousands times with web application disabled, which means it is very time critical.
So, does anybody have any idea why things are going this way, and whether it can be fixed somehow? Does this question make sense? I am ready to provide more information if needed.
The "clustered index" is the actual table. All of the columns of the table are in the "clustered index" (with some exceptions for "out of row" storage for lobs, etc.)
When you change the value of a column, it has to be changed in the table pages, as well as in any index that the column appears in.
In terms of performance for quickly locating the rows to be updated (for your particular query), an index on dbo.Table(Column1,Column2) or dbo.Table(Column2,Column1) would be the most appropriate.
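A sketch of such an index (the name is illustrative; which column should lead depends on the selectivity of your data):
-- One seek locates the rows to update; note that this index itself
-- must also be maintained when Column1 changes.
CREATE NONCLUSTERED INDEX IX_Table_Column2_Column1
ON [dbo].[Table] (Column2, Column1);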
If it's possible that the column being modified already has the value being assigned (i.e. @Param1 and @Value1 both represent the same value), then adding another predicate may improve performance by avoiding a lock being obtained on the row:
UPDATE [dbo].[Table]
SET Column1 = @Value1
WHERE Column1 = @Param1
AND Column2 = @Param2
AND Column1 <> @Value1
The SERIALIZABLE transaction isolation level avoids the problem of phantom reads by blocking any inserts into a table that conflict with SELECT statements in other transactions. I am trying to understand it with an example, but it blocks the insert even when the filter in the SELECT statement is not conflicting. I would appreciate any explanation of why it behaves that way.
Table Script
CREATE TABLE [dbo].[dummy](
[firstname] [char](20) NULL,
[lastname] [char](20) NULL
) ON [PRIMARY]
GO
Session - 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
begin tran
select * from dummy where firstname = 'abc'
Session - 2
insert into dummy values('lmn', 'lmn') -- Why does this block?
The first issue in your test scenario is that the table has no useful index on firstname. The second is that the table is empty.
From Key-Range Locking in BOL
Before key-range locking can occur, the following conditions must be satisfied:
The transaction-isolation level must be set to SERIALIZABLE.
The query processor must use an index to implement the range filter predicate. For example, the WHERE clause in a SELECT statement could establish a range condition with this predicate: ColumnX BETWEEN N'AAA' AND N'CZZ'. A key-range lock can only be acquired if ColumnX is covered by an index key.
There is no suitable index to take RangeS-S locks on, so to guarantee serializable semantics SQL Server needs to lock the whole table.
If you try adding a clustered index on the table on the firstname column as below and repeat the experiment ...
CREATE CLUSTERED INDEX [IX_FirstName] ON [dbo].[dummy] ([firstname] ASC)
... you will find that you are still blocked!
This is despite the fact that a suitable index now exists and the execution plan shows that it is seeked into to satisfy the query.
You can see why by running the following
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT *
FROM dummy
WHERE firstname = 'abc'
SELECT resource_type,
resource_description,
request_mode
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
COMMIT
Returns
+---------------+----------------------+--------------+
| resource_type | resource_description | request_mode |
+---------------+----------------------+--------------+
| DATABASE | | S |
| OBJECT | | IS |
| PAGE | 1:198 | IS |
| KEY | (ffffffffffff) | RangeS-S |
+---------------+----------------------+--------------+
SQL Server does not just take out a range lock on exactly the range you specify in your query.
For an equality predicate on a unique index, if there is a matching key it will just take a regular lock rather than any type of range lock at all.
For a non-unique seek predicate, it takes out locks on all matching keys within the range plus the "next" one at the end of the range (or on ffffffffffff, representing infinity, if no "next" key exists). Even deleted "ghost" records can be used in this range key locking.
As described here, for an equality predicate on either a unique or non-unique index:
If the key does not exist, then the ‘range’ lock is taken on the ‘next’ key, both for unique and non-unique indexes. If the ‘next’ key does not exist, then a range lock is taken on the ‘infinity’ value.
So with an empty table the SELECT still ends up locking the entire index. You would need to have previously inserted a row between 'abc' and 'lmn'; then your insert would succeed:
insert into dummy values('def', 'def')
From http://msdn.microsoft.com/en-us/library/ms173763.aspx
SERIALIZABLE
Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
As I understand this, your insert will be blocked since the transaction under which your SELECT is running has not completed.