Short version:
I am trying to determine the best transaction isolation level for our on-premises SQL Server.
Long version:
I am extracting data from API calls and loading it into staging tables, which are then incrementally loaded into destination tables. These destination tables are used in multiple ways, some of which are listed below:
Load data into a CRM through SSIS
Feed PowerBI reports (scheduled refreshes)
Apply business transformations to the data and load it into a Data Warehouse
Extract data into Excel documents
(Most importantly) Make changes to the destination tables outside of the initial ETL process (from API to staging to destination)
Due to the large datasets, the issues I am facing are:
Deadlocks, which I work around by using temporary tables and CTEs
Long waits between updates of the tables (a stored procedure that updates a destination table may wait up to an hour for another update to release that table)
Long PowerBI refresh waits and sometimes refresh timeouts when the SQL tables are being updated
Long waits on SELECT statements when the SQL tables are being updated
Given that:
The industry I work in is not banking or any other industry in which the data needs to be 100% accurate at all times
The PowerBI reports refresh only twice a day
I urgently need to utilize the data in those destination tables for other reporting purposes too
The datasets contain millions of records
What isolation level is suitable for this situation? Or would it be better to set isolation levels on individual queries through table hints?
Note 1: My employer and I would not mind some dirty reads in the report refreshes, as long as the reports refresh consistently and the tables can be used by other stored procedures (both reads and updates) without having to wait.
Note 2: is_read_committed_snapshot_on (in sys.databases) is 0 for our database.
READ COMMITTED with the READ_COMMITTED_SNAPSHOT option set for the database lets readers read without being blocked by writers and keeps writers from being blocked by readers, and it doesn't cause dirty reads.
So that’s the obvious first step.
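On SQL Server the option itself is switched on with ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON, which requires that no other connections be active in the database while it runs. The reader/writer behavior can't be reproduced without a SQL Server instance, so what follows is only an analogue: a Python sketch using SQLite's WAL journal mode, where a reader likewise sees the last committed data without waiting on an open write transaction and never sees the writer's uncommitted rows. The temp file and the dest table are hypothetical.

```python
import os
import sqlite3
import tempfile


def snapshot_read_demo() -> tuple[int, int]:
    """Return (rows the reader sees while a write is open, rows it sees after commit)."""
    # WAL needs a real file; this throwaway temp path is hypothetical.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None: no implicit transactions, so BEGIN/COMMIT are explicit.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # readers work from a committed snapshot
    writer.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, val TEXT)")
    writer.execute("INSERT INTO dest (val) VALUES ('committed')")

    # timeout=0: raise "database is locked" at once instead of waiting.
    reader = sqlite3.connect(path, isolation_level=None, timeout=0)

    writer.execute("BEGIN IMMEDIATE")  # open write transaction, like a long ETL update
    writer.execute("INSERT INTO dest (val) VALUES ('uncommitted')")

    # The reader is not blocked and sees only committed data (no dirty read).
    during = reader.execute("SELECT COUNT(*) FROM dest").fetchone()[0]

    writer.execute("COMMIT")
    after = reader.execute("SELECT COUNT(*) FROM dest").fetchone()[0]

    reader.close()
    writer.close()
    return during, after


print(snapshot_read_demo())  # (1, 2)
```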
Related
From analyzing table locks in SQL Server, I found that my Win32 application, built in RAD Studio XE7, starts numerous transactions, one for each FDQuery while it is active. Sometimes this causes application problems and locking with dozens of users, especially on tables with triggers.
For my test, I used a simple FDConnection and an FDQuery with Select * from Customer and default settings, and concluded that FDQuery1.Active:=True starts a transaction on the Customer table. The transaction disappears when FDQuery1.Active:=false.
I would like to prevent FDQuery from starting transactions for read-only data, such as lists for grids or reports.
But I can't find the appropriate FDQuery setting to do this.
By default, SQL Server does not use row versioning. So, to return a consistent set of rows, it uses shared locks to ensure that no other session changes the data while the query is reading it.
Using "WITH(NOLOCK)" disables shared locks, but can result in an inconsistent result set.
The only real solution is to use the READ_COMMITTED_SNAPSHOT database option, which stores versions of changed rows in tempdb and uses them to return consistent result sets without blocking updates.
Look at the NOLOCK table hint: https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/
I need some help here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in a single table, 'tbSysMasterLog', in another database on the same server. Yes, the application's log is stored in a separate database.
The problem is that a transaction is started before any INSERT/UPDATE/DELETE command on the application database, so the log table is locked until the transaction is committed or rolled back. As a result, anyone else who tries to write to any other table of the application is blocked.
So, is there any way to disable transactions on a particular database or on a particular table?
You cannot turn off the transaction log; everything gets logged. You can set the recovery model to "Simple", which limits the amount of log data kept after the transactions are committed.
"The table of the log database is locked": why is that?
Normally you log changes by inserting records. Inserting records should not lock the whole table; normally there should not be any contention on inserts.
If you do more than inserts, consider changing that. Also look at the indexes defined on the log table; you may be able to drop some of them.
It sounds from the question as though you have a BEGIN TRANSACTION at the start of your triggers and that you are logging to the other database before the COMMIT TRANSACTION.
Normally you do not need explicit transactions in SQL Server.
If you do need explicit transactions, you could put the data to be logged into variables, commit the transaction, and then insert that data into your log table.
Normally inserts are fast and can happen in parallel without blocking. Some things, such as identity columns, require ordering, but this is a very lightweight mechanism; you could avoid it by generating GUIDs so that inserts never wait on each other. For something like your log table, though, an identity primary key gives you a clear sequence, which is probably helpful in working out the order of events.
Obviously, if you log after the transaction, the log entries may not be in the same order as the transactions, because transactions take different amounts of time to commit.
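A minimal sketch of "commit first, then log", in Python with SQLite standing in for SQL Server: two in-memory databases play the application database and the log database. tbSysMasterLog comes from the question; the customer table and the update_then_log helper are hypothetical. The old and new values are held in a Python list (the equivalent of the T-SQL variables) while the business transaction runs, and the log database is written only after that commit, in its own short transaction.

```python
import sqlite3


def update_then_log(app: sqlite3.Connection, log: sqlite3.Connection,
                    customer_id: int, new_name: str) -> None:
    pending = []  # holds the data to be logged, like T-SQL variables
    with app:  # business transaction: commit on success, roll back on error
        row = app.execute("SELECT name FROM customer WHERE id = ?",
                          (customer_id,)).fetchone()
        if row is None:
            raise ValueError(f"no customer {customer_id}")
        app.execute("UPDATE customer SET name = ? WHERE id = ?",
                    (new_name, customer_id))
        pending.append((customer_id, row[0], new_name))
    # Only now, after the commit, is the log database touched,
    # in its own short transaction.
    with log:
        log.executemany(
            "INSERT INTO tbSysMasterLog (customer_id, old_name, new_name) "
            "VALUES (?, ?, ?)", pending)


app = sqlite3.connect(":memory:")  # stands in for the application database
log = sqlite3.connect(":memory:")  # stands in for the separate log database
with app:
    app.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    app.execute("INSERT INTO customer VALUES (1, 'Acme')")
with log:
    # The identity-style key gives the log a clear sequence.
    log.execute("CREATE TABLE tbSysMasterLog (id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "customer_id INTEGER, old_name TEXT, new_name TEXT)")

update_then_log(app, log, 1, "Acme Ltd")
print(log.execute("SELECT customer_id, old_name, new_name FROM tbSysMasterLog").fetchall())
# [(1, 'Acme', 'Acme Ltd')]
```

Because the log insert happens after the commit, a failure there leaves the business change in place without a log row. That is the trade-off this approach accepts in exchange for not holding the log table during the business transaction.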
We normally log into individual tables with names similar to the master table, e.g. FooHistory or AuditFoo.
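The per-table layout can be sketched with SQLite triggers driven from Python. This is SQLite trigger syntax, not T-SQL, and the Foo columns and trigger names are hypothetical. Each master table gets its own history table, so writers to different tables don't share one log table. SQLite itself locks the whole database file on write, so this sketch only shows the layout, not the concurrency benefit you would get on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Foo (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One history table per master table, named after it.
CREATE TABLE FooHistory (
    history_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- clear sequence of changes
    foo_id     INTEGER NOT NULL,
    action     TEXT NOT NULL,                      -- 'I', 'U' or 'D'
    old_name   TEXT,
    new_name   TEXT,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER Foo_ai AFTER INSERT ON Foo BEGIN
    INSERT INTO FooHistory (foo_id, action, new_name) VALUES (NEW.id, 'I', NEW.name);
END;
CREATE TRIGGER Foo_au AFTER UPDATE ON Foo BEGIN
    INSERT INTO FooHistory (foo_id, action, old_name, new_name)
    VALUES (NEW.id, 'U', OLD.name, NEW.name);
END;
CREATE TRIGGER Foo_ad AFTER DELETE ON Foo BEGIN
    INSERT INTO FooHistory (foo_id, action, old_name) VALUES (OLD.id, 'D', OLD.name);
END;
""")

with conn:
    conn.execute("INSERT INTO Foo (id, name) VALUES (1, 'alpha')")
    conn.execute("UPDATE Foo SET name = 'beta' WHERE id = 1")
    conn.execute("DELETE FROM Foo WHERE id = 1")

rows = conn.execute(
    "SELECT foo_id, action, old_name, new_name FROM FooHistory ORDER BY history_id"
).fetchall()
print(rows)
# [(1, 'I', None, 'alpha'), (1, 'U', 'alpha', 'beta'), (1, 'D', 'beta', None)]
```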
There are other options. A very lightweight method is to use a trace: this is what is used for performance tuning, and it gives you a copy of every statement run on the database (including triggers). You can log the trace to a different database server, which is a good idea if you are tracing a heavily used server, since the volume of data is massive when you trace, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file (for better performance), load it into a table afterwards, and script the starting, stopping, and loading of traces.
The load on the server that is getting the trace log is minimal and I have never had a locking problem on the server receiving the trace, so I am pretty sure that you are doing something to cause the locks.
I have an SSIS package that updates a table when it runs. It uses a staging table and then a slowly changing dimension to load the data into the warehouse. We have set it up as a SQL Agent job that runs every two hours.
The isolation level of the package is serializable. The database isolation level is read committed.
The issue is that while this job runs, it blocks the table, so clients cannot run any reports; the reports come up blank.
So what would be the best option for avoiding this? Clients need to see the data, but we also need to update the table every two hours.
Using Microsoft SQL Server 2012 (SP3-GDR) (KB4019092) - 11.0.6251.0 (X64)
Thanks.
You're getting "lock escalation". It's a feature, not a bug. 8-)
SQL Server combines large numbers of smaller locks into a table lock to improve performance.
If INSERT performance isn't an issue, you can do your data load in smaller chunks inside of transactions and commit after each chunk.
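SQL Server attempts lock escalation when a single statement acquires roughly 5,000 locks on one table, so the usual aim is to keep each batch well below that. The chunked pattern is sketched below in Python with SQLite standing in for SQL Server, so it shows the loop structure (keyset paging plus one commit per chunk) rather than SQL Server's locking. The staging/dest table names and the chunk size of 1,000 are assumptions; on SQL Server the loop body would typically be an INSERT ... SELECT TOP (n) ... statement followed by a commit.

```python
import sqlite3


def load_in_chunks(conn: sqlite3.Connection, chunk_size: int = 1000) -> int:
    """Copy staging rows into dest, committing after each chunk; return the chunk count."""
    chunks, last_id = 0, 0
    while True:
        with conn:  # one short transaction per chunk
            rows = conn.execute(
                "SELECT id, val FROM staging WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size)).fetchall()
            if not rows:
                return chunks
            conn.executemany("INSERT INTO dest (id, val) VALUES (?, ?)", rows)
        last_id = rows[-1][0]  # keyset paging: resume after the last copied id
        chunks += 1


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)",
                     ((i, f"row {i}") for i in range(1, 4501)))

chunks = load_in_chunks(conn)
print(chunks)  # 5
print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # 4500
```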
https://support.microsoft.com/en-us/help/323630/how-to-resolve-blocking-problems-that-are-caused-by-lock-escalation-in
Another option is to give your clients/reports access to a clone of your warehouse table.
Do your ETL into a table that no one else can read from, and when it is finished, switch the table with the clone.
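A sketch of the build-then-swap idea, again in Python with SQLite (report_data and report_data_new are hypothetical names). The slow load goes into the shadow table; the swap is two renames and a drop inside one explicit transaction, so a reader sees either the old table or the new one. Python's sqlite3 module does not implicitly open a transaction before DDL statements, which is why the swap issues BEGIN itself. On SQL Server you would reach for sp_rename or ALTER TABLE ... SWITCH instead; check their locking requirements (both need a brief schema-modification lock) before relying on them.

```python
import sqlite3


def rebuild_and_swap(conn: sqlite3.Connection, new_rows: list[tuple[int, str]]) -> None:
    # 1. Slow ETL step: fill a shadow table that no report reads from.
    with conn:
        conn.execute("DROP TABLE IF EXISTS report_data_new")
        conn.execute("CREATE TABLE report_data_new (id INTEGER PRIMARY KEY, val TEXT)")
        conn.executemany("INSERT INTO report_data_new VALUES (?, ?)", new_rows)
    # 2. Fast swap. sqlite3 does not implicitly BEGIN before DDL, so start the
    #    transaction explicitly; the renames and the drop then commit together.
    with conn:
        conn.execute("BEGIN")
        conn.execute("ALTER TABLE report_data RENAME TO report_data_old")
        conn.execute("ALTER TABLE report_data_new RENAME TO report_data")
        conn.execute("DROP TABLE report_data_old")


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE report_data (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("INSERT INTO report_data VALUES (1, 'old')")

rebuild_and_swap(conn, [(1, "new"), (2, "new")])
print(conn.execute("SELECT id, val FROM report_data ORDER BY id").fetchall())
# [(1, 'new'), (2, 'new')]
```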
We have an audit database (Oracle) that holds monitoring information about all activities performed by the services (about 100) deployed on application servers. As you may imagine, the audit database is huge because of the volume of requests the services process. The only write transactions on this database are the services writing audit information in real time.
As the audit database started growing (more than a million records per day), querying required data (for example select all errors occurred with service A for requests between start date and end date) quickly became nearly impossible.
To address this, some "smart kids" decided to devise a batch job that copies data from the database to another database (say, audit_archives) and deletes records so that only 2 days' worth of audit data is retained in the audit database.
This initially looked neat, but whenever the batch process runs, the audit process that inserts data into the audit database becomes very slow, and sometimes the batch process also fails due to database contention.
What is a better way to design this archival so that it runs as efficiently as possible with the least impact on both the audit process and the batch?
You might want to look into partitioning your base table.
Create a mirror table (as the target of the "historic" data) and create the same partitioning scheme on that one (most probably on a per-date basis).
Then you can simply exchange the "old" partitions from one table to the other (using ALTER TABLE the_table EXCHANGE PARTITION ... WITH TABLE ...). Moving a partition this way should only take a few seconds, because it is a data dictionary operation rather than a physical copy of the rows. The actual performance depends on the indexes defined (local or global).
This technique is usually used the other way round (to prepare new data to be fed into a reporting table in a data warehouse environment), but it should work for archiving as well.
I. Easy way
delete old records in batches, ideally with a FORALL statement
copy data in batches, ideally with FORALL
add partitioning based on the day of the week
II. Queues
delete old records in batches, ideally with a FORALL statement
fill audit_archives with a trigger on audit; in the trigger, use a queue to avoid long DML
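FORALL is PL/SQL and can't run here, so the "copy and delete in batches" step is sketched in Python with SQLite instead; this is not Oracle code. The table names audit and audit_archives come from the question; the columns, the ISO-8601 text timestamps (which compare correctly as strings), the sample rows, and the batch size are assumptions. Each batch copies and deletes the same ids in one short transaction, so a concurrent writer would only have to wait for one small batch at a time.

```python
import sqlite3


def archive_in_batches(conn: sqlite3.Connection, cutoff: str, batch_size: int = 500) -> int:
    """Move audit rows with ts < cutoff into audit_archives; return how many moved."""
    moved = 0
    while True:
        with conn:  # copy and delete of one batch commit (or roll back) together
            ids = [r[0] for r in conn.execute(
                "SELECT id FROM audit WHERE ts < ? ORDER BY id LIMIT ?",
                (cutoff, batch_size))]
            if not ids:
                return moved
            # One "?" per id; batch_size stays below SQLite's older 999-parameter limit.
            marks = ",".join("?" * len(ids))
            conn.execute(f"INSERT INTO audit_archives SELECT * FROM audit WHERE id IN ({marks})", ids)
            conn.execute(f"DELETE FROM audit WHERE id IN ({marks})", ids)
        moved += len(ids)


conn = sqlite3.connect(":memory:")
with conn:
    for table in ("audit", "audit_archives"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, ts TEXT, msg TEXT)")
    conn.executemany(
        "INSERT INTO audit VALUES (?, ?, ?)",
        [(i, "2024-01-01" if i <= 1200 else "2024-01-03", f"event {i}")
         for i in range(1, 1501)])

moved = archive_in_batches(conn, cutoff="2024-01-02")
print(moved)  # 1200
print(conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0])  # 300
```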
My application currently needs to upload a large amount of data to a database server (SQL Server) and locally on a SQLite database (local cache).
I have always used transactions when inserting data into a database, for speed. But now that I am working with something like 20k rows or more per insert batch, I am worried that transactions might cause issues. Basically, what I don't know is whether transactions have a limit on how much data you can insert under them.
What is the correct way to use transactions with large amounts of rows to be inserted in a database? Do you for instance begin/commit every 1000 rows?
No, there is no such limit. Contrary to what you might believe, SQLite does not keep a pending transaction entirely in RAM: as its page cache fills, it writes pending changes to disk. So you should not run into any limit on the amount of data you can write under a transaction.
See the SQLite docs for this information: http://sqlite.org/docs.html
Follow the link "Limits in SQLite" for implementation limits like these.
Follow the link "How SQLite Implements Atomic Commit" for how transactions work
I don't see any problem with doing this, but if there are any constraint or referential-integrity errors, you will probably have to insert them all again, and the table stays locked until the transaction is committed. Breaking the load into smaller batches and logging the activity of each batch will help.
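A minimal sketch of that batching, in Python with SQLite (the target table, the batch size of 1,000, the sample rows, and the insert_in_batches helper are hypothetical). Each batch is its own transaction: if one row violates a constraint, only that batch is rolled back and logged, while the batches before and after it stay committed, so only the failed batch needs to be retried.

```python
import logging
import sqlite3


def insert_in_batches(conn: sqlite3.Connection, rows: list[tuple[int, str]],
                      batch_size: int = 1000) -> tuple[int, list[int]]:
    """Insert rows batch by batch; return (rows committed, numbers of failed batches)."""
    inserted, failed = 0, []
    for start in range(0, len(rows), batch_size):
        batch_no, batch = start // batch_size, rows[start:start + batch_size]
        try:
            with conn:  # commit this batch, or roll back only this batch on error
                conn.executemany("INSERT INTO target (id, val) VALUES (?, ?)", batch)
            inserted += len(batch)
            logging.info("batch %d: %d rows committed", batch_no, len(batch))
        except sqlite3.IntegrityError as exc:
            failed.append(batch_no)  # only this batch needs to be fixed and retried
            logging.warning("batch %d rolled back: %s", batch_no, exc)
    return inserted, failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
rows = [(i, f"v{i}") for i in range(2500)]
rows[1500] = (0, "duplicate")  # hypothetical bad row: id 0 is already in batch 0
result = insert_in_batches(conn, rows)
print(result)  # (1500, [1])
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 1500
```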
A better option when dealing with many rows would be to BCP them into the target, or even to use an SSIS package.