SQL Server: creating indexes on foreign keys

What I'm experimenting with here is how DELETE statements perform in a very simple example. I'm currently using SQL Server 2017 (I also tried SQL Server 2014 and the results were similar).
I have two tables: Parent and Child. Child has a foreign key to Parent (Parent_ID).
Parent:
Parent_ID   Name
---------   ----
1           P1
2           P2
Child:
Child_ID   Parent_ID   Data
--------   ---------   --------
1          1           P1C1
2          2           P2C1
3          2           PPPPCCCC
4          2           P2C1
5          2           PPPPCCCC
(around 4 million more rows with Parent_ID = 2)
I always thought that adding an index on the foreign key (Parent_ID in Child here) was a good idea. But today I tried the behavior of DELETE in a somewhat extreme case - though I'm sure this kind of case could happen in real life - with 4 million rows for Parent_ID = 2 in the Child table and only one row for Parent_ID = 1.
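For reference, here is a rough reconstruction of the schema described above; the original DDL was not posted, so the column types and the foreign key constraint name are assumptions.

CREATE TABLE dbo.Parent
(
    Parent_ID INT NOT NULL PRIMARY KEY,
    Name      VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Child
(
    Child_ID  INT NOT NULL PRIMARY KEY,
    Parent_ID INT NOT NULL
        CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (Parent_ID),  -- assumed constraint name
    Data      VARCHAR(50) NULL
);

-- the non-clustered index on the foreign key discussed below
CREATE INDEX IX_Child_Parent_ID ON dbo.Child (Parent_ID);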
If I try to delete rows with Parent_ID = 1, it looks good: it is fast enough, the index is used, and the number of logical reads seems fine (12 logical reads; I am no expert and don't know if that's really OK for such a small amount of data).
Now here is what I don't understand (and don't like):
I try to delete all records in Child where Parent_ID=2:
BEGIN TRAN
DELETE FROM child
WHERE parent_id = 2
ROLLBACK TRAN
The IO statistics show this (for the DELETE):
Table 'Child'. Scan count 1, logical reads 38486782, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
38486782 logical reads... isn't that huge? I updated statistics to be sure:
UPDATE STATISTICS Child WITH FULLSCAN
Then I ran my query again => same results. Maybe the problem is the Index Delete on IX_Child_Parent_ID?
After disabling the index on the foreign key, things went much better:
Table 'Child'. Scan count 1, logical reads 202233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Note: SQL Server suggests creating an index for the FK.
202233 logical reads sounds much better... at least for the specific case of Parent_Id=2.
The question is: why does SQL Server use the index instead of choosing a clustered index scan when it knows there are about 4,000,000 rows for Parent_ID = 2? Or maybe it doesn't know? Aren't statistics supposed to "help" SQL Server know this kind of information?
I'm probably missing something.
(I have double-checked, and the statistics seem to be OK after the index is created.)

For the purpose of a delete and with the cardinality of the data you have, the index on parent_id is not useful, as you have seen.
Given you know you need to delete 99% of the rows, doing so is highly inefficient for many reasons, not least the growth of the transaction log.
Every statement you execute is atomic and runs in its own implicit transaction. If the delete were to fail midway (e.g. a power cut), SQL Server needs to be able to roll the incomplete delete back, for which it uses the transaction log, so every row that is deleted will hit the transaction log.
In cases such as these it's much more performant to insert the rows you want to keep into a new table, drop the original table, and then rename the new table to the original name; you can also script the indexes/constraints from the original and apply them to the new table.
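A minimal sketch of that approach, assuming the Child/Parent_ID names from the question; the staging table name is made up, and in a real system you would script and re-create the original indexes, constraints and foreign key before the rename.

SELECT *
INTO dbo.Child_keep               -- hypothetical staging table
FROM dbo.Child
WHERE Parent_ID <> 2;             -- keep only the rows you do NOT want to delete

-- re-create indexes/constraints scripted from dbo.Child here

BEGIN TRAN;
    DROP TABLE dbo.Child;
    EXEC sp_rename 'dbo.Child_keep', 'Child';
COMMIT TRAN;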
When deleting a large proportion of the rows but less than roughly 50% of the table, another recommendation is to split the job into batches and delete fewer than 5000 rows at a time (5000 is the rough threshold for lock escalation).
Often it can also help to create a view on the table selecting the top n rows to delete, ordered by a specific key, and then delete from the view.
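A sketch of such a batched delete, assuming the Child/Parent_ID names from the question; 4500 is an arbitrary batch size chosen to stay under the lock-escalation threshold mentioned above.

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (4500) FROM dbo.Child
    WHERE Parent_ID = 2;

    SET @rows = @@ROWCOUNT;   -- stop once a batch deletes nothing
END;

Each batch runs as its own implicit transaction, which also keeps transaction-log growth in check.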

Maybe I found the answer here: https://stackoverflow.com/a/3650886.
Looks like this is an expected behavior and that the problem is really updating the index (IX_Child_Parent_ID in my case).

Related

How to best use multicolumn index with value ranges in SQL Server?

I'm running SQL Server 2016 Enterprise edition.
I have a table with 24 columns and 10 indexes. Those indexes are vendor-defined, so I cannot change them. I have a hard time understanding how to get the best performance, as whatever I do, SQL Server chooses (in my opinion) a poor execution plan.
The following query :
SELECT event_id
FROM Events e WITH(NOLOCK, index=[Event_By_PU_And_TimeStamp])
WHERE e.timestamp > '2022-05-12 15:00'
AND e.PU_Id BETWEEN 103 AND 186
results in this index seek:
The specified index is the clustered index and it has two columns, PU_ID and Timestamp. Even though the seek predicate lists both PU_ID and Timestamp as the used columns, the "Number of rows read" is too high in my opinion. Without the index hint, SQL Server chooses a different index for the seek, with double the number of rows read.
Unfortunately the order of the columns in the index is PU_ID, Timestamp, while Timestamp is the much more selective column here.
However, if I change the PU_ID condition to list every possible number between the bounds
PU_ID IN (103,104,105,...186)
then the rows read are exactly the number of returned rows, and the statistics output confirms the better performance (validated with a Profiler trace).
Between-condition:
(632 rows affected)
Table 'Events'. Scan count 7, logical reads 139002, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
IN-condition with every number written out:
(632 rows affected)
Table 'Events'. Scan count 84, logical reads 459, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Edit: the IndexSeek for the IN-query
What is the best way to make SQL Server choose the better plan?
Do I really need to write out all possible PU_IDs in every query?
The index used is just a simple two-column index; it is also the clustered index:
CREATE UNIQUE CLUSTERED INDEX [Event_By_PU_And_TimeStamp] ON [dbo].[Events]
(
[PU_Id] ASC,
[TimeStamp] ASC
)
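One way to avoid writing out every PU_ID by hand is to generate the range and APPLY against it, which can give the optimizer the same per-value seeks as the long IN list. This is only a sketch under the assumption that the table and column names above are correct; whether the optimizer actually produces 84 individual seeks still depends on the plan it chooses.

;WITH pu AS
(
    -- generate the PU_Id values 103..186
    SELECT TOP (186 - 103 + 1)
           102 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PU_Id
    FROM sys.all_objects          -- any table with enough rows to drive the range
)
SELECT x.event_id
FROM pu
CROSS APPLY
(
    SELECT ev.event_id
    FROM dbo.Events AS ev
    WHERE ev.PU_Id = pu.PU_Id
      AND ev.[timestamp] > '2022-05-12 15:00'
) AS x;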

No blocking, good execution plan, slow query: why?

I have a query that occasionally takes several minutes to complete. Several processes are running concurrently but there is no blocking (I'm running an extended events session, I can see blocking of other transactions, so the query to inspect the logged events is working).
Looking at the query plan cache, the execution plan is a good one: running it in SSMS, it takes less than 100 IOs, and there are no table or index scans.
There is the possibility that the users are getting a different plan, but if I add hints to use scans on all tables (and some are fairly large), it still returns in around 1 second. So the worst possible execution plan still wouldn't result in a query that takes several minutes.
Having ruled out blocking and a bad execution plan, what else can make a query slow?
One thing worth pointing out is that SQL Server uses an indexed view we have created, although the code doesn't reference it (we're using SQL Server Enterprise). That indexed view has a covering index to support the query and it is being used - again, the execution plan is very good. The original query is using NOLOCK, and I observed that no locks are taken on any rows or pages of the indexed view either (so SQL Server respects our locking hints, even though it's accessing an indexed view instead of the underlying tables - good). This makes sense, otherwise I would have expected to see blocking.
We are using indexed views in some other queries but we reference them in SQL code (and specify NOLOCK, NOEXPAND). I've not seen any problems with those queries, and I'm not aware that there should be any difference between indexed views that we tell the optimizer to use and indexed views that the optimizer itself decides to use, but what I'm seeing suggests that there is.
Any thoughts? Anything else I should be looking at?
This is the query:
execute sp_executesql
N'SELECT DISTINCT p.policy_id
, p.name_e AS policy_name_e
, p.name_l AS policy_name_l
FROM patient_visit_nl_view AS pv
INNER JOIN swe_cashier_transaction_nl_view AS ct ON ct.patient_visit_id = pv.patient_visit_id
AND ct.split_date_time IS NOT NULL
INNER JOIN ar_invoice_nl_view AS ai ON ai.ar_invoice_id = ct.invoice_id
AND ai.company_code = ''KOC''
AND ai.transaction_status_rcd = ''TEMP''
INNER JOIN policy_nl_view p ON p.policy_id = ai.policy_id
WHERE pv.patient_id = @pv__patient_id'
, N'@pv__patient_id uniqueidentifier'
, @pv__patient_id = '5D61EDF1-7542-11E8-BFCB-D89EF37315A2'
Note: views with the suffix _nl_view select from the table with NOLOCK (the idea is we can change this in the future without affecting the business-tier code).
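The view definitions were not posted, so purely as an illustration of that note, a *_nl_view presumably looks something like this (column list abbreviated):

CREATE VIEW dbo.policy_nl_view
AS
SELECT p.policy_id
     , p.name_e
     , p.name_l
     -- remaining columns
FROM dbo.policy AS p WITH (NOLOCK);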
You can see the query plan here: https://www.brentozar.com/pastetheplan/?id=HJI9Lj_WH
IO stats:
Table 'policy'. Scan count 0, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ar_invoice_cashier_transaction_visit_iview'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Locks taken (IS locks on the objects involved, nothing else):
locks taken
Below the relevant part of the indexed view:
CREATE VIEW dbo.ar_invoice_cashier_transaction_visit_iview WITH SCHEMABINDING
AS
SELECT ai.ar_invoice_id
, ai.company_code
, ai.policy_id
, ai.transaction_status_rcd
, ct.cashier_transaction_id
, pv.patient_id
-- more columns
FROM dbo.ar_invoice AS ai
INNER JOIN dbo.swe_cashier_transaction AS ct ON ct.invoice_id = ai.ar_invoice_id AND ct.split_date_time IS NOT NULL
INNER JOIN dbo.patient_visit AS pv ON pv.patient_visit_id = ct.patient_visit_id
CREATE UNIQUE CLUSTERED INDEX XPKar_invoice_cashier_transaction_visit_iview ON dbo.ar_invoice_cashier_transaction_visit_iview (ar_invoice_id, cashier_transaction_id)
CREATE INDEX XIE4ar_invoice_cashier_transaction_visit_iview ON dbo.ar_invoice_cashier_transaction_visit_iview (patient_id, transaction_status_rcd, company_code) INCLUDE (policy_id)
So far so good.
But every few days (and not at the same time of day), things go pear-shaped: the query takes minutes and actually times out (the command timeout of the provider is set to 10 minutes). When this happens, there is no blocking. I have an extended events session, and this is my query:
DECLARE @event_xml xml;
SELECT @event_xml = CONVERT(xml, target_data)
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS t ON s.address = t.event_session_address
WHERE s.name = 'Blocking over 10 seconds'
SELECT DATEADD(hour, DATEDIFF(hour, GETUTCDATE(), GETDATE()), R.c.value('@timestamp', 'datetime')) AS time_stamp
, R.c.value('(data[@name="blocked_process"]/value[1]/blocked-process-report[1]/blocked-process[1]/process)[1]/@spid', 'int') AS blocked_spid
, R.c.value('(data[@name="blocked_process"]/value[1]/blocked-process-report[1]/blocked-process[1]/process[1]/inputbuf)[1]', 'varchar(max)') AS blocked_inputbuf
, R.c.value('(data[@name="blocked_process"]/value[1]/blocked-process-report[1]/blocked-process[1]/process[1]/@waitresource)[1]', 'varchar(max)') AS wait_resource
, R.c.value('(data[@name="blocked_process"]/value[1]/blocked-process-report[1]/blocking-process[1]/process)[1]/@spid', 'int') AS blocking_spid
, R.c.value('(data[@name="blocked_process"]/value[1]/blocked-process-report[1]/blocking-process[1]/process[1]/inputbuf)[1]', 'varchar(max)') AS blocking_inputbuf
, R.c.query('.')
FROM @event_xml.nodes('/RingBufferTarget/event') AS R(c)
ORDER BY R.c.value('@timestamp', 'datetime') DESC
This query returns other cases of blocking, so I believe it's correct. When the problem (the timeouts) occurs, there are no cases of blocking involving the query above, or any other query.
Since there is no blocking, I'm looking at the possibility of a bad query plan. I didn't find a bad plan in the cache (I had already recommended an sp_recompile on one of the tables before I was given remote access), so I tried to think of the worst possible one: scans for every table. Applying the relevant options, here are the IO stats for this query:
Table 'patient_visit'. Scan count 1, logical reads 4559, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'swe_cashier_transaction'. Scan count 9, logical reads 24840, physical reads 0, read-ahead reads 23660, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ar_invoice'. Scan count 9, logical reads 21247, physical reads 0, read-ahead reads 7074, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'policy'. Scan count 9, logical reads 271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
And here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJr29s_br
The customer has a beefy SQL Server 2012 box, plenty of cores (maxdop is set to 8), tons of memory. It eats this bad query for breakfast (takes around 350 msec).
For completeness, here are the row counts of the tables involved:
ar_invoice: 2363527
swe_cashier_transaction: 2946514
patient_visit: 654976
policy: 1038
ar_invoice_cashier_transaction_visit_iview: 1999609
I also ran the query for a patient_id that returns the most rows, and for a patient_id that didn't exist (i.e. 0 rows). I ran these with the recompile option: in both cases the optimizer selected the same (good) execution plan.
So back to the question: there is no blocking, the query plan seems to be good (and even if it were bad, it wouldn't be bad to the extent that this query takes 10 minutes), so what can cause this?
The only thing a little unusual here is that, although the SQL doesn't select from the indexed view, the optimizer uses it anyway - and this is, or should be, a good thing. I know Enterprise edition claims it can do this, but this is the first time I've seen it in the wild (I've seen plenty of the opposite, though: referencing an indexed view in SQL, but the optimizer selects from the view's underlying tables anyway). I'm tempted to believe that this is relevant.
Without knowing anything about your setup, a few other things I would check:
what is the overall CPU and memory utilisation like on the box; could there be resource contention?
if your storage is on a SAN rather than local storage, is there contention at the storage end? (This can happen if you have heavy reads/writes on the same disk arrays from different systems.)
There can be several other factors involved in slowing down a query. Personally, I don't really trust SQL Server's optimization technique, though. Normally I would recommend optimizing your query so that the optimizer does NOT have to do hard work, for example using EXISTS / IN on the main table instead of joining and doing DISTINCT/grouping, like:
select distinct ia.AttributeCode, ia.AttributeDescription
from ItemsTable as i
inner join ItemAttributesTable as ia on i.AttributeCode = ia.AttributeCode
where i.Manufacturer = @paramMfr
and i.MfrYear between @paramYearStart and @paramYearEnd
instead of running a query like the one above, run it like this:
select ia.AttributeCode, ia.AttributeDescription
from ItemAttributesTable as ia
where ia.AttributeCode in (
select i.AttributeCode
from ItemsTable as i
where i.Manufacturer = @paramMfr
and i.MfrYear between @paramYearStart and @paramYearEnd
)
I am NOT really an expert in indexing, but for the above case, I think a single index on ItemsTable should be sufficient.
Another optimization can be done by removing the views and directly using the tables, because the views may also be doing joins on other tables that are really not required here.
All in all, the main point is that while the query optimizer is figuring out the best possible plan, it may hit its own time limit (the "optimizer timeout"), in which case it may pick a plan that is NOT really good for that specific execution, which is why the plan cache should be used. That's the reason I am recommending focusing on optimizing the query rather than looking into why it's timing out.
Check this out as well https://blogs.msdn.microsoft.com/psssql/2018/10/19/understanding-optimizer-timeout-and-how-complex-queries-can-be-affected-in-sql-server/
Update-1:
Recommendations:
Use EXISTS / IN; even if you see the same execution plan as your current query, this will help the optimizer almost always use the correct plan
Try eliminating the views and using the tables directly, with fewer selected columns
Make sure you have proper indexes defined for the given parameters
Try breaking the query into smaller parts, for example pick the filtered data into a temporary table and then grab the rest of the details using that temporary table (see the sketch after this list)
Try googling "Timeout in Application not in SSMS" and see the different workarounds
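A hedged sketch of the "break it into smaller parts" idea using the view names from the question; the temp-table name and the choice of split point are assumptions, not something tested against this schema.

SELECT ct.invoice_id
INTO #visit_invoices                         -- hypothetical temp table
FROM patient_visit_nl_view AS pv
INNER JOIN swe_cashier_transaction_nl_view AS ct
        ON ct.patient_visit_id = pv.patient_visit_id
       AND ct.split_date_time IS NOT NULL
WHERE pv.patient_id = @pv__patient_id;

SELECT DISTINCT p.policy_id
     , p.name_e AS policy_name_e
     , p.name_l AS policy_name_l
FROM #visit_invoices AS vi
INNER JOIN ar_invoice_nl_view AS ai
        ON ai.ar_invoice_id = vi.invoice_id
       AND ai.company_code = 'KOC'
       AND ai.transaction_status_rcd = 'TEMP'
INNER JOIN policy_nl_view AS p
        ON p.policy_id = ai.policy_id;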
Common causes of query timeout:
No indexing defined
Extracting too much data
There is/are lock(s) on one or more tables while you are trying to read data from those table(s)
Parameter type and column type differences, for example the column is varchar while the parameter is nvarchar
Parameter sniffing

weird phenomena during deadlocks involving IMAGE or TEXT columns

This is something very disturbing I stumbled upon while stress-testing an application using Sybase ASE 15.7.
We have the following table:
CREATE TABLE foo
(
i INT NOT NULL,
blob IMAGE
);
ALTER TABLE foo ADD PRIMARY KEY (i);
The table has, even before starting the test, a single row with some data in the IMAGE column. No rows are either deleted or inserted during the test. So the table always contains a single row. Column blob is only updated (in transaction T1 below) to some value (not NULL).
Then, we have the following two transactions:
T1: UPDATE foo SET blob=<some not null value> WHERE i=1
T2: SELECT * FROM foo WHERE i=1
For some reason, the above transactions may deadlock under load (approx. 10 threads doing T1 20 times in a loop and another 10 threads doing T2 20 times in loop).
This is already weird enough, but there's more to come. T1 is always chosen as the deadlock victim. So the application logic, in the event of a deadlock (error code 1205), simply retries T1. This should work and should normally be the end of the story. However …
… it happens that sometimes T2 will retrieve a row in which the value of the blob column is NULL! This is even though the table already starts with a row and the updates simply reset the previous (non-NULL) value to some other (non-NULL) value. This is 100% reproducible in every test run.
This is observed with the READ COMMITTED isolation level.
I verified that the above behavior also occurs with the TEXT column type, but not with VARCHAR.
I've also verified that obtaining an exclusive lock on table foo in transaction T1 makes the issue go away.
So I'd like to understand: how can something that so fundamentally breaks transaction isolation even be possible? In fact, I think this is worse than a transaction isolation problem, as T1 never sets the value of the blob column to NULL.
The test code is written in Java using the jconn4.jar driver (class com.sybase.jdbc4.jdbc.SybDriver) so I don't rule out that this may be a JDBC driver bug.
update
This is reproducible simply using isql and spawning several shells in parallel that continuously execute T1 in a loop. So I am removing the Java and JDBC tags as this is definitely server-related.
Your example CREATE TABLE code would by default create an allpages-locked table, unless your DBA has changed the system-wide 'lock scheme' parameter via sp_configure to another value (you can check this yourself, as any user, via sp_configure 'lock scheme').
Unless you have a very large number of rows, they are all going to sit on a single data page, because an int is only 4 bytes long and the blob data is stored at the end of the table (unless you use the in-row LOB functionality in ASE 15.7 and up). This is why you are getting deadlocks: you have, by definition, created a single hotspot where all the data is being accessed at the page level. This is even more likely where page sizes larger than 2k are used, since by their nature they will hold even more rows per page and, with allpages locking, give even more likelihood of contention.
Change your locking scheme to datarows (unless you are planning to have very high row counts), as has been said above, and your problem should go away. I will add that your blob column looks to allow NULLs from your code, so you should also consider setting the 'dealloc_first_txtpg' attribute for your table to avoid wasted space if you have NULLs in your image column.
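A short sketch of that change in isql, assuming the foo table from the question (check your server's default lock scheme first):

sp_configure 'lock scheme'
go
alter table foo lock datarows
go
-- optional, as suggested above, if the image column can hold NULLs
sp_chgattribute foo, 'dealloc_first_txtpg', 1
go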
We've seen all kinds of weird stuff with isolation level 1. I'm under the impression that when T2 is in progress, T1 can change data and T2 might return an intermediate result of T1.
Try isolation level 2 and see if it helps (it does for us).
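If you want to try that, the isolation level can be raised per session in ASE; a minimal sketch for the reader's T2:

set transaction isolation level 2
go
SELECT * FROM foo WHERE i = 1
go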

SQL Server does not choose to use index although everything seems to suggest it

Something is wrong here and I don't understand what. It's worth mentioning that the searched value is not in the table; for an existing value there is no problem. Still, why does the first query require a clustered key lookup on the primary key, which is not even used in the query, while the second can run on the index directly?
Forcing the query to use the index with WITH(INDEX(indexname)) does work, but why does the optimizer not choose to use it by itself?
The column PIECE_NUM is not in any other index and is also not the primary key.
SET STATISTICS IO ON
DECLARE @vchEventNum VARCHAR(50)
SET @vchEventNum = '54235DDS28KC1F5SJQMWZ'
SELECT TOP 1
fwt.WEIGHT,
fwt.TEST_RESULT
FROM FIN_WEIGHT_TESTS fwt WITH(NOLOCK)
WHERE fwt.PIECE_NUM LIKE @vchEventNum + '%'
ORDER BY fwt.DTTM_INSERT DESC
SELECT TOP 1
fwt.WEIGHT,
fwt.TEST_RESULT
FROM FIN_WEIGHT_TESTS fwt WITH(NOLOCK)
WHERE fwt.PIECE_NUM LIKE '54235DDS28KC1F5SJQMWZ' + '%'
ORDER BY fwt.DTTM_INSERT DESC
SET STATISTICS IO OFF
I let both queries run in one batch:
IO statistics report:
Query 1: logical reads 16244910
Query 2: logical reads 5
Table 'FIN_WEIGHT_TESTS'. Scan count 1, logical reads 16244910, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FIN_WEIGHT_TESTS'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The table has a non-clustered index on PIECE_NUM, including (as INCLUDE columns) all three other columns from the query.
Here are the query execution plans (with a little editing to remove the actual names):
I noticed the CONVERT_IMPLICIT, but that is just due to the conversion of the varchar parameter to the nvarchar column. Changing the parameter type did not change the behaviour of the query.
Why does the query with the parameter not use the index while replacing the parameter with its value does?
The first query is going to scan because you are using a local variable. The optimizer sees this as an "anonymous" value and therefore cannot use statistics to build a good query plan.
The second query seeks because it uses a literal value, and SQL Server can look into its stats and knows much better how many rows it estimates it will find for that value.
If you run your first query as follows I would imagine you will see it use the better plan:
DECLARE @vchEventNum VARCHAR(50)
SET @vchEventNum = '54235DDS28KC1F5SJQMWZ'
SELECT TOP 1
fwt.WEIGHT,
fwt.TEST_RESULT
FROM FIN_WEIGHT_TESTS fwt WITH(NOLOCK)
WHERE fwt.PIECE_NUM LIKE @vchEventNum + '%'
ORDER BY fwt.DTTM_INSERT DESC
OPTION(RECOMPILE)
I would suggest using a parameterized procedure to run this code to ensure that it uses a cached plan. Using the RECOMPILE hint has its own drawbacks, as the optimizer will need to rebuild the plan every time it runs, so if you run this code very often I would avoid this hint.
You can read about local variables here: https://www.brentozar.com/archive/2014/06/tuning-stored-procedures-local-variables-problems/
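A sketch of that suggestion; the procedure name is hypothetical, while the table and columns come from the question. Because the value arrives as a procedure parameter rather than a local variable, it can be sniffed at compile time and the plan can be cached and reused.

CREATE PROCEDURE dbo.GetLatestWeightTest      -- hypothetical name
    @vchEventNum VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP 1
           fwt.WEIGHT,
           fwt.TEST_RESULT
    FROM FIN_WEIGHT_TESTS AS fwt
    WHERE fwt.PIECE_NUM LIKE @vchEventNum + '%'
    ORDER BY fwt.DTTM_INSERT DESC;
END;
GO

EXEC dbo.GetLatestWeightTest @vchEventNum = '54235DDS28KC1F5SJQMWZ';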
I think the cause of what happened is the combined use of a @variable and ORDER BY in your query. To test my guess, remove the ORDER BY from your query; it may lead to equal plans in both cases (this time with a different estimated number of rows reported in the SELECT).
As mentioned in the previous answer, local variables cannot be sniffed at compilation time because the batch is compiled as a whole; only the RECOMPILE option lets the server know the value of a variable at compilation time, since recompilation begins after the variable has already been assigned. This leads to an "estimate for unknown" in the first case, i.e. the statistics cannot be used because we don't know the value in the filter, so more rows are estimated in the output.
But the query has TOP + ORDER BY in it. This means that if we expect many rows, then to get just the first one ordered by DTTM_INSERT DESC we must sort all the filtered rows. In fact, if you look at the second plan, you see that the Sort operator costs the most. But when you use the constant, SQL Server uses the statistics and finds out that only one row will be returned, so it can afford to sort the result.
When many rows are expected, it instead decides to use an index that is already ordered by DTTM_INSERT. This is only my guess, because you did not post the creation scripts for your indexes, but from the plan I see that the first plan goes to the clustered index to grab the fields that are missing in the non-clustered index, which means it's not the same non-clustered index that is used in the second case; I'm sure the index chosen in the first case has DTTM_INSERT as its leading key column. By doing so, the server eliminates the sort that we see in the second plan.

Avoiding Locking Contention on DB2 zOS

I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
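Purely as an illustration of that design, a rough sketch of one such trigger in DB2 for z/OS syntax; the source table, key column and central-table columns are placeholders, and SEQ_NO is assumed to be generated by the central table itself (identity or sequence), as discussed further below.

CREATE TRIGGER TRG_ORDERS_INS
  AFTER INSERT ON ORDERS                      -- placeholder source table
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
  INSERT INTO CENTRAL_TABLE (SRC_TABLE, SRC_KEY, CHANGE_TS)
  VALUES ('ORDERS', N.ORDER_ID, CURRENT TIMESTAMP);
END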
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features to reduce locking, like row-level locking, but I still see no really good approach to avoiding index contention.
I see three different options to select the newly inserted rows.
Put a sequence number in the table and store the last processed sequence number in the background process. I would use the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
The locking level must be CS to avoid selecting uncommitted rows, which might later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Con: Index contention because of the ascending key.
I can clean up processed records later (e.g. by rolling partitions).
Put a Status field in the table (processed and unprocessed) and change the Select as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Cons: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
DELETE the processed records instead of updating the status field.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required which could create a hot spot.
Also I think I could SELECT with Isolation Level UR, because I would detect potential uncommitted data on the later delete of this row.
For a primary key index I could use GENERATE_UNIQUE, which is random and not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
Looking forward to what the community thinks about this problem. This must be a pretty common problem; e.g. SAP should have a similar issue with their Batch Input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe there is still another solution in your minds out there.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's autoincrement feature to get your sequence number, with little or no performance impact.
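A sketch of what that could look like for the central table; the column names are placeholders, and only the identity clause is the point here.

CREATE TABLE CENTRAL_TABLE
(
  SEQ_NO    BIGINT      NOT NULL GENERATED ALWAYS AS IDENTITY,
  SRC_TABLE VARCHAR(32) NOT NULL,
  SRC_KEY   VARCHAR(64) NOT NULL,
  CHANGE_TS TIMESTAMP   NOT NULL WITH DEFAULT
);

CREATE UNIQUE INDEX IX_CENTRAL_SEQ ON CENTRAL_TABLE (SEQ_NO ASC);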
For the rest, perhaps you should look at a queue-based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row into an MQ queue.
Then have a long-running background task (in CICS?) do your post-processing; as it's processing one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput on the order of 300 to 500 updates a second.
