I detected a bookmark lookup deadlock in my application, and I can't decide which solution to use. None of them seem to be optimal.
Here are the queries:
UPDATE TEST SET DATA = @data WHERE CATEGORY = @cat
SELECT DATA, EXTRA_COLUMN FROM TEST WHERE CATEGORY = @cat
The problem is that there is a nonclustered index on CATEGORY and DATA, and the two queries use it and the clustered index in opposite orders.
That is, the update locks the clustered index and updates the table, while the select locks the nonclustered index to perform the bookmark lookup, and then each wants the other's lock (deadlock).
Here are the options that I found:
1 - Create an index that includes all the columns from the select query.
- It worked, but I don't think it is a good idea: I would have to include every column used in any select query whose data can be updated anywhere in the application.
2 - Change the transaction isolation level of the database to READ_COMMITTED_SNAPSHOT
3 - Add a NOLOCK hint to the select
4 - Drop the index
5 - Force one of the transactions to block at an earlier point, before it has had an opportunity to acquire the lock that ends up blocking the other transaction. (Did not work)
I think the second option is the best choice, but I know that it can create other issues. Shouldn't READ_COMMITTED_SNAPSHOT be the default isolation level in SQL Server?
It seems to me that there isn't any error in either the application or the database logic: it's one simple table with a nonclustered index and two queries that access the same table, one to update and the other to select.
Which is the best way to solve this problem? Is there any other solution?
I really expected SQL Server to be able to solve this by itself.
Snapshot isolation is a very robust way to take reads out of the locking equation. Many RDBMSes have it always on, and it doesn't cause many problems in practice. Prefer it to a brittle manual solution such as very specific indexes or hints.
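For reference, a minimal sketch of turning this on in SQL Server. The database name MyAppDb is a placeholder, not something from the question, and WITH ROLLBACK IMMEDIATE rolls back other sessions' open transactions, so this is something to try in a maintenance window rather than a recipe:

```sql
-- Assumption: the database is called MyAppDb (hypothetical name).
-- ROLLBACK IMMEDIATE terminates other sessions' open transactions so the
-- option change does not wait on them.
ALTER DATABASE MyAppDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- Confirm the setting: 1 means READ COMMITTED now reads row versions
-- instead of taking shared locks.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'MyAppDb';
```

With the option on, the SELECT from the question no longer takes shared locks under the default READ COMMITTED level, so it should not take part in the lock cycle with the UPDATE.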
Please try adding a nonclustered index on Category (including Data and Extra_Column) and adding the following hints to your queries:
UPDATE t SET t.DATA = @data FROM TEST t WITH (INDEX(ix_Cat)) WHERE t.CATEGORY = @cat
SELECT DATA, EXTRA_COLUMN FROM TEST WITH (INDEX(ix_Cat)) WHERE CATEGORY = @cat
This ensures that both queries access the data in the same order, which prevents them from deadlocking each other.
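The index this answer relies on is not spelled out. One plausible definition, assuming the name ix_Cat used in the hints and the table and column names from the question, would be:

```sql
-- Sketch only: key on CATEGORY, with the other two columns carried in the
-- leaf level so the SELECT needs no bookmark lookup into the clustered index.
CREATE NONCLUSTERED INDEX ix_Cat
    ON TEST (CATEGORY)
    INCLUDE (DATA, EXTRA_COLUMN);
```

Because DATA and EXTRA_COLUMN are included columns, the SELECT can be answered from this index alone, which is what removes the second lock from its side of the cycle.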
I have a table in SQL Server (table name: Material, 6 columns). It contains 2.6 million records. I need to update this table based on two column values. The update takes 2 seconds.
Please help me optimize the query below.
UPDATE Material
SET Value = @Value,
Format = @Format,
SValue = @SValue,
CGroup = @CGroup
WHERE
SM = @SM
AND Characteristic = @Characteristic
You really need to provide the query plan before we can tell you with any certainty what, if anything, might help.
Having said that, the first thing I would check is whether the plan shows a great deal of time spent on a table scan. If so, and the table is large, adding an index on SM and Characteristic could improve performance dramatically, because it allows the optimizer to perform an index seek instead of a table scan.
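A sketch of such an index, assuming SQL Server and the column names from the question (the index name is made up for illustration):

```sql
-- Hypothetical index name; column order (SM, Characteristic) is one choice,
-- the reverse may suit better depending on which column is more selective.
CREATE NONCLUSTERED INDEX IX_Material_SM_Characteristic
    ON Material (SM, Characteristic);
```

After creating it, compare the execution plan of the UPDATE before and after: the scan on Material should be replaced by a seek on the new index if the predicate is selective.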
Since the table is large, a few tweaks can improve performance:
(1) If a column being updated is indexed, remove the index
(2) Execute the update in smaller batches
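DECLARE @i INT = 1
WHILE (@i <= 10)
BEGIN
    -- Caveat: rows that were already updated still match this WHERE clause,
    -- so add a predicate that excludes them to make each batch progress.
    UPDATE TOP (20000) Material
    SET Value = @Value,
        Format = @Format,
        SValue = @SValue,
        CGroup = @CGroup
    WHERE SM = @SM
      AND Characteristic = @Characteristic
    SET @i = @i + 1
END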
(3) Disabling Delete triggers (if any)
Hope this helps !
Try putting a composite index on SM and Characteristic. With it, SQL Server will be able to locate the records more easily. Operationally, an update is a combination of an insert and a delete. If your table has many columns, that may slow down your update even if you are not updating all of them.
The steps I prefer:
Try putting a composite index on SM and Characteristic.
Try re-creating the table with only the required columns and use joins wherever needed.
2.6 mil rows is not that much. 2 secs for an update is probably too much.
Having said that, the update times could depend on two things.
First, how many rows are being updated with a single update command, ie is it just one row or some larger set? You can't really do much about that, just saying it should be taken into consideration.
The other thing is indexes: you could have either too many of them or not enough.
If the table is missing an index on (SM, Characteristic) -- or (Characteristic, SM), depending on the selectivity -- then it's probably a full table scan every time. If the update touches only a couple of rows, that's a waste of time. So it's the first thing to check.
If there are too many indexes on the affected columns, this could slow down updates as well, because those indexes have to be maintained with every change of data. You can check the usefulness of indexes by querying the sys.dm_db_index_usage_stats DMV (plenty of explanation on the internet, so I won't get into it here) and remove the unused ones. Just be careful with this and test thoroughly.
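As a rough illustration of the DMV check, something along these lines lists each index on the table with its read and write counters. It assumes the table lives in the dbo schema, which the question does not state:

```sql
-- Assumption: the table is dbo.Material. Run in the database that owns it.
-- Indexes with no row in the DMV (NULL counters) have not been used since
-- the counters were last reset, e.g. by an instance restart.
SELECT i.name          AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates   -- maintenance cost paid on every data change
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Material')
ORDER BY i.index_id;
```

An index with high user_updates and no seeks, scans or lookups over a representative period is a candidate for removal, but the counters only cover the time since they were last reset, so a short observation window can be misleading.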
One other thing to check is whether the affected columns are part of some foreign key constraint. In that case, the engine must check the validity of the constraint every time (iow, check if the new value exists in the referenced table, or check if there's data in referencing tables, depending on which side of the FK the column is). If there are no supporting indexes for this check, it would cause (again) a scan on the other tables involved.
But to really make sure, check the exec plan and IO stats (SET STATISTICS IO ON), it will tell you exactly what is going on.
I am currently experiencing very long sync times on a Zumero-synced database (well over a minute), and following some profiling, the culprit appears to be a particular query that is taking 20+ seconds (suitably anonymised):
WITH relevant_rvs AS
(
SELECT rv.z_rv AS rv FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$rv$271340031" rv
WHERE (rv.txid<=913960)
AND NOT EXISTS (SELECT 1 FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$dd$271340031" dd WHERE dd.rv=rv.z_rv AND (dd.txid<=913960))
)
INSERT INTO #final_included_271340031_e021cfbe1c97213dd5adbacd667c08439fb8c6 (z_rv)
SELECT z$this.z_rv
FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$271340031" z$this
WHERE (z$this.z_rv IN (SELECT rv FROM relevant_rvs))
AND (MyID = XXX AND MyOtherField=XXX)
UNION SELECT z$this.z_rv
FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$old$271340031" z$this
WHERE (z$this.z_rv IN (SELECT rv FROM relevant_rvs))
AND (MyID = XXX AND MyOtherField=XXX)
I have taken the latter SELECT part of the query and run it in isolation, which reproduces the same poor performance. Interestingly, the execution plan recommends adding an index, but I'm reluctant to change the schema of Zumero-generated tables. Is adding indexes to these tables something that can be attempted safely, and is it likely to help?
The source tables have around 100,000 records in them and the filter results in each client syncing roughly 100-1000 records, so not trivial data volumes, but not levels I would expect to cause major query performance issues.
Does anyone have any experience optimising Zumero sync performance server side? Do any indexes on source tables propagate to these tables? They don't appear to in this case.
Creating a custom index on the z$old table should be safe. I do hope it helps boost your query performance! (And it would be great to see a comment letting us know if it does or not.)
I believe the only issue such an index may cause would be that it could block certain schema changes on the host table. For example, if you tried to DROP the [MyOtherField] column from the host table, the Zumero triggers would attempt to drop the same column from the z$old table as well, and the transaction would fail with an error (which might be a bit surprising, since the index is not on the table being directly acted on).
Another thing to consider: It might also help to give this new index a name that will be recognized/helpful if it ever appears in an error message. Then (as always) feel free to contact support@zumero.com with any further questions or issues if they come up.
I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features that reduce locking, such as row-level locking, but I still see no really good approach to avoiding index contention.
I see three different options to select the newly inserted rows.
1. Put a sequence number in the table and store the last processed sequence number in the background process. I would use the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
The isolation level must be CS to avoid selecting uncommitted rows that will later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Neg: Index contention because of ascending key used.
I can clean up processed records later (e.g. by rolling partitions).
2. Put a Status field in the table (processed and unprocessed) and change the SELECT as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Cons: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
3. DELETE the processed records instead of updating the status field.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required which could create a hot spot.
Also, I think I could SELECT with isolation level UR, because I would detect potentially uncommitted data on the later delete of the row.
For a primary key index I could use GENERATE_UNIQUE, which is random and not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
I look forward to hearing what the community thinks about this problem. It must be a pretty common one; e.g. SAP should have a similar issue with their Batch Input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe there is yet another solution out there.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's autoincrement feature to get your sequence number, with little or no performance implications.
For the rest, perhaps you should look at a queue-based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row into an MQ queue.
Then have a long-running background task (in CICS?) do your post-processing. As it is processing one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput on the order of 300 to 500 updates a second.
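For the autoincrement part of this answer, a rough sketch of what an identity column on the central table might look like. Every column other than SEQ_NO is invented for illustration, and the cache size is an arbitrary choice, not a tuned value:

```sql
-- Sketch under assumptions: OPERATION, ROW_KEY and CREATED_TS are
-- hypothetical columns; only SEQ_NO and CENTRAL_TABLE come from the question.
CREATE TABLE CENTRAL_TABLE (
    SEQ_NO     BIGINT NOT NULL
               GENERATED ALWAYS AS IDENTITY
               (START WITH 1, INCREMENT BY 1, CACHE 50, NO ORDER),
    OPERATION  CHAR(1)     NOT NULL,             -- 'I', 'U' or 'D'
    ROW_KEY    VARCHAR(64) NOT NULL,             -- key of the changed row
    CREATED_TS TIMESTAMP   NOT NULL WITH DEFAULT -- filled in by DB2 on insert
);
```

One thing to verify before relying on this: with cached, unordered identity values in a data sharing group, each member hands out values from its own cached range, so SEQ_NO order is not guaranteed to match commit order across members. A reader that remembers the "last processed sequence number" could then skip rows that commit late with a lower number.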
My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as follows:
Table tab (
rowid NUMBER(11),
unique_id_string VARCHAR2(2000),
year NUMBER(4),
dynamic_col_1 NUMBER(11),
dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created both of the indexes described above.
The explain plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
Some more information:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split over 3 dbf files with about 50GB size.
I would be really glad if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan.
The advantage is that the query now takes about 5 seconds to execute instead of 1 hour as before.
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans read data about 20 times faster than index access. If tab has more than 5% of its data in 2011, it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, having year as the second column makes the index even slower.
If the index really is faster, then the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without an index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
If the time is off, check your workload statistics.
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
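The first two checks in that list could look roughly like this. It is a sketch that assumes the table and columns were created unquoted (so the table is stored as TAB in the current schema) and that the existing index Index_unique_id_year is the one being tested:

```sql
-- 1. Plan with an index hint, to compare against the unhinted plan.
--    The hint must directly follow the SELECT keyword.
EXPLAIN PLAN FOR
SELECT /*+ INDEX(t Index_unique_id_year) */ DISTINCT
       t.dynamic_col_1, t.dynamic_col_1_text
FROM tab t
WHERE t.year = 2011;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 2. Refresh table and index statistics with the automatic sample size.
--    Assumption: the table name is stored in upper case as TAB.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TAB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE   -- also gather statistics on the indexes
  );
END;
/
```

Comparing the Rows and Cost columns of the hinted and unhinted plans before and after gathering statistics shows whether stale statistics were steering the optimizer towards the full table scan.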
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query, though, you could also include the two fetched columns in it. The optimiser would then not have to go to the table for the query data; it could retrieve it directly from the index, making your query more efficient again.
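A sketch of that covering variant, reusing the tablespace from the question (the index name is made up for illustration):

```sql
-- Hypothetical index name. year leads so the WHERE year = 2011 predicate can
-- use a range scan; the two selected columns follow so the query can be
-- answered from the index without visiting the table.
CREATE INDEX Index_year_dyn_col1
    ON tab (year, dynamic_col_1, dynamic_col_1_text)
    TABLESPACE tabspace_index;
```

Since every row matching year = 2011 has a non-null leading key, all of those rows are present in this index even when the two dynamic columns are nullable.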
Hope it helps...
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query shows that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
Try this:
1) Create an index on the year field (see Ollie's answer).
2) And then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.
I'm using LINQ, but my database tables do not have an IDENTITY column (although they are using a surrogate Primary Key ID column)
Can this work?
To get the identity values for a table, there is a stored procedure called GetIDValueForOrangeTable(), which looks at a SystemValues table and increments the ID therein.
Is there any way I can get LINQ to get the ID value from this SystemValues table on an insert, rather than the built in IDENTITY?
As an aside, I don't think this is a very good idea, especially not for a web application. I imagine there will be a lot of concurrency conflicts because of this SystemValues lookup. Am I justified in my concern?
Cheers
Duncan
Sure you can make this work with LINQ, and safely, too:
wrap the access to the underlying SystemValues table in the "GetIDValue.....()" function in a TRANSACTION (and not with the READUNCOMMITTED isolation level!); then one and only one user can access that table at any given time and you should be able to safely distribute IDs
call that stored proc from LINQ just before saving your entity and store the ID if you're dealing with a new entity (if the ID hasn't been set yet)
store your entity in the database
That should work - not sure if it's any faster and any more efficient than letting the database handle the work - but it should work - and safely.
Marc
UPDATE:
Something like this (adapt to your needs) will work safely:
CREATE PROCEDURE dbo.GetNextTableID(@TableID INT OUTPUT)
AS BEGIN
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION
    UPDATE SystemTables
    SET MaxTableID = MaxTableID + 1
    WHERE ........
    SELECT
        @TableID = MaxTableID
    FROM
        dbo.SystemTables
    COMMIT TRANSACTION
END
As for performance - as long as you have a reasonable number (less than 50 maybe) of concurrent users, and as long as this SystemTables table isn't used for much else, then it should perform OK.
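As a variation on the procedure above (not what the answer itself proposes), T-SQL also allows the increment and the read to happen in one UPDATE statement, which avoids the separate SELECT. The TableName column and the procedure name here are assumptions, since the original WHERE clause is elided:

```sql
-- Sketch only: assumes dbo.SystemTables has a TableName column identifying
-- the row to increment; that column is hypothetical.
CREATE PROCEDURE dbo.GetNextTableID_Atomic
    @TableName SYSNAME,
    @TableID   INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Increments the counter and captures the new value in one statement.
    UPDATE dbo.SystemTables
    SET @TableID = MaxTableID = MaxTableID + 1
    WHERE TableName = @TableName;
END
GO  -- batch separator for SSMS/sqlcmd; CREATE PROCEDURE must be alone in its batch

-- Usage
DECLARE @NewID INT;
EXEC dbo.GetNextTableID_Atomic @TableName = N'OrangeTable', @TableID = @NewID OUTPUT;
SELECT @NewID AS NewID;
```

The single statement holds its exclusive row lock only for the duration of the UPDATE (or of the caller's transaction, if there is one), so two callers cannot be handed the same value.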
You are very justified in your concern. If two users try to insert at the same time, both might be given the same number unless you do as described by marc_s and put the thing in a transaction. However, if the transaction doesn't wrap around your whole insert as well as the table that contains the ID values, you may still have gaps if the outer insert fails (it got a value but then for some other reason didn't insert a record). Since most people do this to avoid gaps (something that is in most cases an unnecessary requirement), it makes life more complicated and still may not achieve the result. Using an identity field is almost always a better choice.
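For comparison, the identity-based alternative is short. The table name is borrowed from the stored procedure name in the question and the Name column is invented for illustration:

```sql
-- Sketch: dbo.OrangeTable and its Name column are illustrative only.
CREATE TABLE dbo.OrangeTable (
    ID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- value assigned by SQL Server
    Name NVARCHAR(100)     NOT NULL
);

INSERT INTO dbo.OrangeTable (Name) VALUES (N'example');

-- Identity value generated by the last insert in this scope and session.
SELECT SCOPE_IDENTITY() AS NewID;
```

Identity values can still leave gaps after a rolled-back insert, so this does not give gap-free numbering either; it just removes the shared counter table as a point of contention.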