Maintaining Row Sequence across multiple tables - sql-server

I have triggers on multiple tables that capture updates by inserting rows into a single table (the Capture Table). Once that insert is complete, the trigger finishes and control returns to the calling application. This Capture Table is expected to receive about 1.75 million records per day.
I then have a Windows service that runs and takes the data from the Capture Table, adds some context by referencing other tables in our database, then inserts records into a notification table that is used by another process.
Our Capture Table has an IDENTITY column and a DATETIME column to help with sequencing. For the DATETIME, we set a variable to GETDATE() and use that variable so that all records in the same transaction get the exact same DATETIME value.
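For illustration, the trigger pattern described above might look roughly like this (the table and column names here are hypothetical):

CREATE TRIGGER trg_Orders_Capture ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Read GETDATE() once into a variable so every row written by
    -- this firing carries the exact same DATETIME.
    DECLARE @CapturedAt DATETIME = GETDATE();
    INSERT INTO dbo.CaptureTable (SourceKey, CapturedAt)
    SELECT i.OrderId, @CapturedAt
    FROM inserted AS i;
END;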
I am looking at making the process more robust. At this point the Windows service calls a stored procedure and processes one transaction at a time in sequence. This could become a bottleneck. We may need to process multiple transactions at the same time.
How can I guarantee the sequence is the same between the Capture Table and the notification table? It’s critical that the records stay in the same order. Thanks for your input!
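For context, the service currently reads pending work in capture order along these lines (a simplified sketch; the Processed flag and the names are hypothetical):

SELECT TOP (1000) CaptureId, SourceKey, CapturedAt
FROM dbo.CaptureTable
WHERE Processed = 0                -- hypothetical status flag
ORDER BY CapturedAt, CaptureId;    -- shared timestamp first, IDENTITY as tie-breaker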

Related

is rowversion a transactionally-consistent value to capture table data changes

If an ETL process attempts to detect data changes on system-versioned tables in SQL Server by selecting rows whose rowversion column falls within a rowversion "delta window", e.g.:
where row_version >= @previous_etl_cycle_rowversion
and row_version < @current_etl_cycle_rowversion
... and the values for @previous_etl_cycle_rowversion and @current_etl_cycle_rowversion are selected from a logging table whose newest rowversion gets appended to said logging table at the start of each ETL cycle via:
insert into etl_cycle_logged_rowversion_marker (cycle_start_row_version)
select @@DBTS
... is it possible that a rowversion of a record falling within a given "delta window" (bounded by the 2 @@DBTS values) could be missed/skipped due to rowversion's behavior vis-à-vis transactional consistency? - i.e., is it possible that rowversion would be reflected on a basis of "eventual" consistency?
I'm thinking of a case where say, 1000 records are updated within a single transaction and somehow @@DBTS is "ahead" of the record's committed rowversion yet that specific version of the record is not yet readable...
(For the sake of scoping the question, please exclude any cases of deleted records or immediately consecutive updates on a given record within such a large batch transaction.)
If you make sure to avoid row versioning for the queries that read the change windows you shouldn't miss many rows. With READ COMMITTED SNAPSHOT or SNAPSHOT ISOLATION an updated but uncommitted row would not appear in your query.
But you can also miss rows that got updated after you query @@DBTS. That's not such a big deal usually as they'll be in the next window. But if you have a row that is constantly updated you may miss it for a long time.
But why use rowversion at all? If these are temporal tables you can query the history table directly. And Change Tracking is better and easier than using rowversion, as it tracks deletes and, optionally, column changes. The feature was literally built to replace the need to do this manually, which:
usually involved a lot of work and frequently involved using a combination of triggers, timestamp columns, new tables to store tracking information, and custom cleanup processes.
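For comparison, a minimal Change Tracking cycle might look like this (the table name and the sync-version bookkeeping are placeholders; CHANGETABLE and CHANGE_TRACKING_CURRENT_VERSION are the documented interfaces):

-- One-time setup: enable at the database, then per table.
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
ALTER TABLE dbo.MyTable ENABLE CHANGE_TRACKING;

-- Each cycle: read everything changed since the last sync version.
DECLARE @last_sync_version bigint = 0;          -- persisted from the prior run
SELECT CT.SYS_CHANGE_OPERATION, CT.MyTableId    -- PK column name is table-specific
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS CT;

SELECT CHANGE_TRACKING_CURRENT_VERSION();       -- persist as the next cycle's marker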
Under SNAPSHOT isolation, it turns out the proper function for bounding the delta window, one that keeps the windows contiguous while not skipping rowversion values attached to long-running transactions, is MIN_ACTIVE_ROWVERSION() rather than @@DBTS.
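Applied to the marker insert from the question, that would be (same placeholder logging table):

-- MIN_ACTIVE_ROWVERSION() returns the lowest rowversion still active, so
-- rows touched by in-flight transactions fall into a later window instead
-- of being skipped.
insert into etl_cycle_logged_rowversion_marker (cycle_start_row_version)
select MIN_ACTIVE_ROWVERSION();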

Oracle database table delete best practices

Environment: Oracle 12C
Got a table with about 10 columns, including a few CLOB and DATE columns. This is a very busy table for an ETL process, as described below:
Flat files are loaded into the table first, then updated and processed. The inserts and updates happen in batches; millions of records are inserted and updated.
There is also a delete process that removes old data based on a date field. It runs as a PL/SQL procedure and deletes from the table in a loop, fetching only the first n records per iteration based on the date field.
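For illustration, the delete loop looks roughly like this (the table, column, and batch size are placeholders):

-- Delete old rows in small batches so each commit releases undo and
-- locks quickly.
BEGIN
  LOOP
    DELETE FROM etl_stage
     WHERE load_date < ADD_MONTHS(SYSDATE, -6)
       AND ROWNUM <= 10000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
  COMMIT;
END;
/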
I do not want the delete process to interfere with the regular inserts/updates. What is the best practice for coding the delete so that it has minimal impact on the regular insert/update process?
I can also partition the table and delete in parallel since each partition uses its own rollback segment but am looking for a simpler way to tune the delete process.
Any suggestions on using a special rollback segment, or other tuning tips?
The first thing to look at is decoupling the various ETL processes so that you need not run them all together or in a particular sequence, thereby removing the dependency between the INSERTs/UPDATEs and the DELETEs. While the insert/update can be managed in a single MERGE block in your ETL, you can do the delete later by simply marking rows for deletion, i.e. a soft delete via a flag column in the table. Then use that flag in your application and queries to filter those rows out.
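A minimal sketch of the soft-delete idea, with hypothetical names:

-- On the critical path, mark instead of deleting...
UPDATE etl_stage
   SET delete_flag = 'Y'
 WHERE load_date < ADD_MONTHS(SYSDATE, -6);

-- ...and filter the flag wherever the data is read.
SELECT *
  FROM etl_stage
 WHERE delete_flag = 'N';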
By doing the delete later, the critical path of your ETL should shrink. Partitioning the data by date range should definitely help you maintain the data and also make the transactions efficient if the processing is date driven. Also, look for any row-by-row (and thus slow-by-slow) processing and convert it to bulk operations, avoiding context switching between SQL and PL/SQL as much as possible.
If you partition the table by date range, you can also look into DROP/TRUNCATE PARTITION, which discards the rows stored in a partition as a DDL statement. This cannot be rolled back, executes quickly, and uses few system resources (undo and redo). You can read more about it in the documentation.
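For example, with monthly range partitions (names hypothetical):

-- DDL: discards the partition's rows almost instantly; cannot be rolled back.
ALTER TABLE etl_stage DROP PARTITION p_2018_01 UPDATE INDEXES;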

How to reduce blocking during concurrent DELETE & INSERT to a single table in SQL Server

We have a stored procedure which loads details about an order. We always want the latest information, so the order details are regenerated every time the stored procedure is called. We are using SQL Server 2016.
Pseudo code:
DELETE by clustered index based on order identifier
INSERT into the table, based on a huge query containing information about order
When multiple end users execute the stored procedure concurrently, blocking occurs on the orderdetails table: once the first caller is done, the second caller proceeds, followed by the third. So the time to generate the orderdetails increases as time goes by. This happens especially with big orders containing over 100k (or 1-2 million) detail rows, as a table-level lock is taken.
The approach we took
We partitioned the table based on the last digit of the order identifier to allow concurrent orderdetails loading. This improves performance the first time orderdetails are loaded, as there are no deletes. But from the second time onwards, the INSERT in the first session blocks the other sessions' DELETEs; the other sessions are blocked until the first session finishes its INSERT.
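For reference, the last-digit scheme looks roughly like this (simplified; names are illustrative, and the partitioning key is a persisted computed column over the last digit):

CREATE PARTITION FUNCTION pf_LastDigit (tinyint)
    AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8);  -- 10 partitions, digits 0-9

CREATE PARTITION SCHEME ps_LastDigit
    AS PARTITION pf_LastDigit ALL TO ([PRIMARY]);

CREATE TABLE dbo.orderdetails
(
    order_id    bigint       NOT NULL,
    order_digit AS CAST(order_id % 10 AS tinyint) PERSISTED,
    detail_data varchar(100) NULL
) ON ps_LastDigit (order_digit);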
We are considering creation of separate orderdetails table for every order to avoid this concurrency issues.
Question
Can you please suggest an approach that will support the concurrent DELETE & INSERT scenario?
We solved the contention issue by using a temporary table for the orderdetails. We found that the huge queries were taking a long time in the SELECT, and that this was contributing to longer table-level locks on the orderdetails table.
So we first loaded the data into a temporary table, #orderdetail, and only then did the DELETE and INSERT on the orderdetail table.
As the orderdetail table is already partitioned, the DELETEs were faster and the INSERTs happened in parallel. The INSERT was also very fast here, as it is a simple table scan from the #orderdetail table.
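A minimal sketch of that pattern (the source query and names are placeholders; @order_id stands in for the procedure's parameter):

DECLARE @order_id bigint = 12345;  -- in practice the proc parameter

-- 1. Run the expensive query into a temp table first, outside any lock
--    on the contended orderdetail table.
SELECT od.order_id, od.line_no, od.detail_data
INTO #orderdetail
FROM dbo.order_source AS od        -- placeholder for the huge query
WHERE od.order_id = @order_id;

-- 2. The DELETE + INSERT now only holds locks for the short copy.
BEGIN TRAN;
DELETE FROM dbo.orderdetail WHERE order_id = @order_id;
INSERT INTO dbo.orderdetail (order_id, line_no, detail_data)
SELECT order_id, line_no, detail_data FROM #orderdetail;
COMMIT TRAN;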
You can take a look at the Hekaton engine (In-Memory OLTP). It is available even in SQL Server Standard Edition if you are on SP1.
If this is too complicated to implement due to hardware or software limitations, you can try playing with the isolation levels of the database. Sometimes queries that read huge amounts of data are blocked by, or even become deadlock victims of, queries that modify parts of that data. Ask yourself whether you need to guarantee that the data read by the user is valid, or whether you can afford, for example, some dirty reads.
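For instance, if last-committed-version reads are acceptable, enabling read committed snapshot at the database level stops readers from blocking on writers (the database name is a placeholder):

-- Readers see the last committed version from the version store
-- instead of waiting on writers' locks.
ALTER DATABASE OrdersDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;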

Table locking when transactions and multiple queries access the same table

I'm using transactions in .NET to insert data into four tables via different methods.
Since this is the order-inflow path, it is busy during peak hours. The same table used for order submission is also used for bulk order processing and for batch job processing at the same time, so sometimes we get timeouts in any of the three areas below.
In other words, a single table is used in three major transactional areas at the same time:
order submission (insert operation)
job (updating a column)
bulk order processing (updating a column)
Could locking from order submission be affecting the other transactions as well?
NOLOCK is not used in the transactions.

Is Log Sequence Number (LSN) unique for database or table in SQL Server?

I am using SQL Server CDC to track changes for multiple tables. I want to report these changes out in the right sequence for each table. I have a program which collects the data from each CDC table, but I want to make sure that all the changes happening to these tables are reported in the correct sequence. Can I rely on the LSN for the right ordering?
The LSN number is unique for a given transaction but is not globally unique. If you have multiple records within the same transaction they will all share the same __$start_lsn value in cdc. If you want the correct order of operations you need to sort by __$start_lsn, __$seqval, then __$operation. The __$seqval represents the id of the individual operation within the wrapping transaction.
For example, say I have a table in the dbo schema named foo, with one column, y. If I run these statements:
INSERT INTO dbo.foo VALUES (1);
INSERT INTO dbo.foo VALUES (2);
Then I will see two separate LSN values in cdc because these are in two separate transactions. If I run this:
BEGIN TRAN
INSERT INTO dbo.foo VALUES (1);
INSERT INTO dbo.foo VALUES (2);
COMMIT TRAN
Then I will see one LSN value for both records, but they will have different __$seqval values, and the seqval for my first record will be less than the seqval for my second record.
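Putting that together, a change reader could order rows like this (assuming the default capture instance name dbo_foo for the table above; persisting the LSN window between runs is elided):

DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_foo');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- Transaction order, then operation order within the transaction.
SELECT __$start_lsn, __$seqval, __$operation, y
FROM cdc.fn_cdc_get_all_changes_dbo_foo(@from_lsn, @to_lsn, 'all')
ORDER BY __$start_lsn, __$seqval, __$operation;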
The LSN is unique and ever-increasing within the database, across all tables in that database.
In most cases an LSN value is unique across all tables; however, I have found instances where one single LSN value belonged to changes in 40 tables. I don't know the SQL script associated with those changes, but I know that all the operations were INSERTs.
Not sure if it is a bug. The CDC documentation is poor and covers just the basics. Not many users know that the CDC capture process has many bugs confirmed by Microsoft for both SQL Server 2014 and 2016 (we have an open case).
So I would not rely on the documentation; it may be wrong in some scenarios. It's better to implement extra checks and to test with a large volume of different combinations of changes.
I also encountered that scenario. In my experience, and as I understand it, two transactions happened in your first example, so you will really get two different LSNs. In your second example you have only one transaction with two queries inside; CDC counts it as one transaction since it is inside BEGIN and COMMIT TRAN. I can't provide links, since this is my personal experience.
