Huge transaction in Sql Server, are there any problems? - sql-server

I have a program which does many bulk operations on an SQL Server 2005 or 2008 database (drops and creates indexes, adds columns, full-table updates, etc.), all in one transaction.
Are there any problems to be expected?
I know that the transaction log expands even in Simple recovery mode.
This program is not executed during normal operation of the system, so locking and concurrency are not an issue.
Are there other reasons to split the transaction into smaller steps?

In short,
Using smaller transactions provides more robust recovery from failure.
Long transactions may also unnecessarily hold locks on objects for extended periods of time that other processes may require access to, i.e. blocking.
Consider that if at any point between the time the transaction started and finished your server experienced a failure, then in order to bring the database online SQL Server would have to perform the crash recovery process, which would involve rolling back all uncommitted transactions from the log.
Suppose you developed a data processing solution that is intelligent enough to pick up from where it left off. With a single transaction that would not be an option available to you, because you would need to start the process from the beginning once again.
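A minimal sketch of that kind of restartable, batched processing (the names dbo.BigTable, dbo.JobProgress, SomeColumn, etc. are made up for illustration): each batch commits on its own and records how far it got, so a failure costs you at most one batch instead of the whole run.

DECLARE @BatchSize int;
DECLARE @LastId int, @NextId int;
SET @BatchSize = 10000;

-- Resume point recorded by the previous run (hypothetical progress table).
SELECT @LastId = LastProcessedId
FROM   dbo.JobProgress
WHERE  JobName = 'NightlyRebuild';

WHILE 1 = 1
BEGIN
    -- Upper bound of the next batch.
    SELECT @NextId = MAX(Id)
    FROM  (SELECT TOP (@BatchSize) Id
           FROM dbo.BigTable
           WHERE Id > @LastId
           ORDER BY Id) AS b;

    IF @NextId IS NULL BREAK;               -- nothing left to process

    BEGIN TRANSACTION;

    UPDATE dbo.BigTable
    SET    SomeColumn = UPPER(SomeColumn)   -- whatever the real work is
    WHERE  Id > @LastId AND Id <= @NextId;

    UPDATE dbo.JobProgress                  -- record progress in the same small transaction
    SET    LastProcessedId = @NextId
    WHERE  JobName = 'NightlyRebuild';

    COMMIT TRANSACTION;                     -- short transaction: quick to roll back, lets the log truncate

    SET @LastId = @NextId;
END;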

If the transaction generates too many database log entries (updates), the log can hit what is known as the "high water mark". That is the point at which the log reaches (roughly) half of its absolute maximum size, at which point it must start rolling back all the updates (which will consume about the same amount of log space as it took to make them).
Not rolling back at that point would mean risking eventually reaching the maximum log size without finishing the transaction or hitting a rollback command, at which point the database is stuck because there is not enough log space left to roll back.
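Either way, it is worth watching how much log space a long-running transaction is consuming while it runs. A couple of standard checks (the DMV query assumes you run it in the database in question):

-- Log size and percent used for every database (works on SQL Server 2005+).
DBCC SQLPERF (LOGSPACE);

-- Log space pinned by currently open transactions in the current database.
SELECT database_transaction_log_bytes_used,
       database_transaction_log_bytes_reserved
FROM   sys.dm_tran_database_transactions
WHERE  database_id = DB_ID();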

It isn't really a problem until you run out of disk space, but you'll find that a rollback takes a long time. I'm not saying you should plan for failure, of course.
However, consider the process rather than the transaction log as such. I'd consider separating:
DDL into a separate transaction
Bulk load staging tables with a transaction
Flush data from staging to final table in another transaction
If something goes wrong I'd hope that you have rollback scripts and/or a backup.
Is there really a need to do everything atomically?
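A rough sketch of that separation (object names are invented; each step commits on its own):

-- 1) DDL in its own batch/transaction.
DROP INDEX IX_Orders_CustomerId ON dbo.Orders;
ALTER TABLE dbo.Orders ADD NewColumn int NULL;
GO

-- 2) Bulk load the staging table in one transaction.
BEGIN TRANSACTION;
INSERT INTO dbo.Orders_Staging (OrderId, NewColumn)
SELECT OrderId, ComputedValue
FROM   dbo.SourceFeed;
COMMIT TRANSACTION;
GO

-- 3) Flush from staging to the final table in another transaction.
BEGIN TRANSACTION;
UPDATE o
SET    o.NewColumn = s.NewColumn
FROM   dbo.Orders AS o
JOIN   dbo.Orders_Staging AS s ON s.OrderId = o.OrderId;
COMMIT TRANSACTION;

If step 3 fails, you can re-run just that step from the staging data, which is exactly the "pick up where it left off" property mentioned above.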

Depending on the complexity of your update statements, I'd recommend doing this only on small tables of, say, a few hundred rows, especially if you have only a small amount of main memory available. Otherwise, for instance, updates on big tables can take a very long time and even appear to hang. Then it's difficult to figure out what the process (spid) is doing and how long it might take.
I'm not sure whether DROP INDEX is a fully logged operation anyway. See this question on stackoverflow.com.

Related

Why do a periodic full data refresh?

Is there a benefit from doing a periodic full table refresh when you regularly insert/update/delete incrementally?
To clarify, this question is in regards to ETL processes.
If you are 100% certain that your incremental updates are capturing all CRUD operations, there is no reason to flush and fill. If your incrementals have room for error beyond the tolerance of the business rules governing the process, then you should consider periodic flush and fills.
It all depends on your source system, your target system, your ETL process, and your tolerance for error.
I'm not sure what you mean by 'data refresh', so I will take some liberties and assume that you mean rebuilding indexes. Good maintenance involves rebuilding indexes periodically in order to eliminate the fragmentation of table indexes that results from INSERT/UPDATE/DELETE activity.
For more information, read: https://dba.stackexchange.com/questions/4283/when-should-i-rebuild-indexes
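As a rough sketch of what such maintenance typically looks like (thresholds and index names are illustrative, not a prescription): check fragmentation from the DMV, then reorganize or rebuild accordingly.

-- Fragmentation per index in the current database.
SELECT OBJECT_NAME(ps.object_id)            AS table_name,
       i.name                               AS index_name,
       ps.avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN   sys.indexes AS i
       ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE  ps.avg_fragmentation_in_percent > 5
ORDER BY ps.avg_fragmentation_in_percent DESC;

-- Then, per index, roughly:
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REORGANIZE;  -- moderate fragmentation (~5-30%)
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;     -- heavy fragmentation (> ~30%)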
If you mean a full backup, that gives you a more recent base to restore from, so you don't have to restore an older full backup plus all the subsequent differential backups and transaction log backups. (Strictly speaking, it is log backups, not full backups, that allow the transaction log to be truncated and reused.)
For more information, read this: https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/full-database-backups-sql-server
and this: https://technet.microsoft.com/en-us/library/2009.07.sqlbackup.aspx
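For reference, the basic commands look like this (database name and file paths are placeholders):

-- Full database backup: a new restore base.
BACKUP DATABASE MyDb
TO DISK = N'D:\Backups\MyDb_full.bak'
WITH INIT, CHECKSUM;

-- Under the FULL recovery model, regular log backups are what let the
-- transaction log be truncated and reused.
BACKUP LOG MyDb
TO DISK = N'D:\Backups\MyDb_log.trn'
WITH CHECKSUM;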

SQL Server ROLLBACK transaction took forever. Why?

We have a huge DML script that opens a transaction, performs a lot of changes, and only then commits.
Recently I triggered this script (through an app), and as it was taking quite a long time, I killed the session, which triggered a ROLLBACK.
The problem is that this ROLLBACK took forever, and moreover it was hogging a lot of CPU (100% utilization). As I was monitoring the session (using the exec DMVs), I saw a lot of IO-related waits (IO_COMPLETION, PAGEIOLATCH, etc.).
So my question is:
1. Why does a rollback take so much time? Is it because it needs to write every compensating change to the log file? And could the IO waits I saw be related to IO against this log file?
2. Are there any online resources I can find that explain how the ROLLBACK mechanism works?
Thank You
Based on another answer on the DBA side of Stack Exchange, ROLLBACKs are slower for at least two reasons: one, the original SQL is capable of being multithreaded, whereas the rollback is single-threaded; and two, a commit simply confirms work that is already complete, whereas the rollback must not only identify each log action to reverse but also locate the impacted row.
https://dba.stackexchange.com/questions/5233/is-rollback-a-fast-operation
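A practical aside: once a session has been killed, SQL Server can report how far the rollback has progressed (52 below is just an example spid):

KILL 52 WITH STATUSONLY;
-- Reports something like:
-- "SPID 52: transaction rollback in progress. Estimated rollback completion: 80%. Estimated time remaining: 600 seconds."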
This is what I have found out about why a ROLLBACK operation in SQL Server could be time-consuming and as to why it could produce a lot of IO.
Background Knowledge (Open Tran/Log mechanism):
When a lot of changes to the DB are being written as part of an open transaction, those changes modify the data pages in memory (dirty pages), and the log records generated (grouped into structures called log blocks) are initially written to memory as well. The dirty pages are flushed to disk either by a recurring checkpoint operation or by the lazy-writer process. In accordance with SQL Server's write-ahead logging mechanism, before the dirty pages are flushed, the log records describing those changes need to be flushed to disk as well.
Keeping this background in mind, when a transaction is rolled back it is almost like a recovery operation, in that all the changes that were written to disk have to be undone. So the heavy IO we were experiencing probably happened because of this, as there were lots of data changes that had to be undone.
Information Source: https://app.pluralsight.com/library/courses/sqlserver-logging/table-of-contents
This course has a very deep and detailed explanation of how logging recovery works in SQL Server.
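While such a rollback is running, you can also watch its waits and a progress estimate from the request DMV (replace 52 with the spid in question):

SELECT session_id, command, status, percent_complete,
       wait_type, wait_time, last_wait_type
FROM   sys.dm_exec_requests
WHERE  session_id = 52;
-- For a killed session, command typically shows KILLED/ROLLBACK and
-- percent_complete tracks the rollback.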

Can I be a deadlock victim if I'm not executing a query within a transaction?

Let's say I open a transaction and run update queries.
BEGIN TRANSACTION
UPDATE x SET y = z WHERE w = v
The query returns successfully and the transaction stays open deliberately for a period of time before I decide to commit.
While I'm sitting on the open transaction, is it ever possible that the MSSQL deadlock machinery would preempt my transaction, which is not actually executing anything, either to clear a deadlock or to free resources as system memory/resource limits are reached?
I know about SET DEADLOCK_PRIORITY and have read the MSDN articles on the topic of deadlocks. Logically, since I'm not actively seeking to stake a claim on any additional resources, I can't imagine a scenario that would trigger a sane deadlock-avoidance algorithm.
Does anyone know for sure whether simply holding locks can make me a valid target? Similarly, could a low-resource condition trigger the killing of my SPID?
NO
For a deadlock to occur all the participants in the deadlock chain must be waiting for a resource (a lock). If your connection is idle it means it doesn't execute a request, which implies it cannot be waiting.
As for other conditions that can kill your session I can think of at least three:
administrative operations that use WITH ROLLBACK_IMMEDIATE
a mirroring failover
intentional KILL <yourspid>, perhaps as a joke by your friendly DBA
To answer your question: you can be a deadlock victim if you're not executing a query in a transaction.
It's counter-intuitive, but you can be a deadlock victim by running a SELECT statement.
It can happen if you're running a query that uses an index:
you scan the index looking for matching rows
another process starts updating data pages
you now want to fetch data from the data pages for the matching rows
the other process is holding locks on those data pages
you wait for the data page locks to be released
the other process finishes updating the data pages and now wants to update the index
you are holding read locks on the index
the other process waits for the index locks to be released
DEADLOCK
So, strictly speaking, you can be a deadlock victim, when you're not executing a query in a transaction. The other guy wasn't executing his UPDATE statement in a transaction either.
Nobody's explicitly using a transaction, yet there's a deadlock.
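To make that concrete, here is a hedged sketch of the two sessions. Assume a made-up table dbo.Accounts with a clustered primary key on Id and a non-clustered index on Name; whether this actually deadlocks depends on timing, isolation level and lock granularity, but it is the shape of the scenario described above.

-- Hypothetical schema: dbo.Accounts(Id int PRIMARY KEY CLUSTERED, Name, Balance)
-- plus a non-clustered index IX_Accounts_Name on (Name).

-- Session 1: a plain SELECT, no explicit transaction.
SELECT Id, Balance
FROM   dbo.Accounts
WHERE  Name LIKE 'A%';        -- seeks the NC index, then looks up the data pages

-- Session 2: a plain UPDATE, no explicit transaction.
UPDATE dbo.Accounts
SET    Name = 'Archived-' + Name
WHERE  Balance = 0;           -- modifies data pages first, then maintains the NC index

-- If session 1 still holds shared locks on index pages while waiting for data
-- pages that session 2 has locked, and session 2 then needs those index pages,
-- each waits on the other and SQL Server picks one as the deadlock victim.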
Possible problems:
SQL Server only has a finite number of locks. It is possible to run out of locks.
Other resources are finite (e.g. memory, tempdb). Holding on to these resources could cause those resources to run out.
Transaction logs - the logical log cannot be freed for reuse while a transaction is open. The result could be a log that fills up. This would not just stop your process; a full log can bring the whole database to a halt.
To consider:
CASCADE: a DELETE may name only one table in the command, but a CASCADE relationship may touch other tables.
Triggers: triggers on the modified table may affect other tables.
DELETE and UPDATE commands may use a FROM clause that touches other tables. I've never seen this, but I would not rule it out.
Transactions may time out - is that what is happening?
As you have at least one update lock taken out, and maybe some read and table-scan locks as well, you may be killed to help clear deadlocks created by other transactions. The deadlock recovery code in SQL Server is unlikely to be totally bug-free, and it is not normal to keep a transaction open for a long time on SQL Server. However, I would not expect that to happen often.
Some systems, when they detect deadlock-type problems, just start killing "long-lived" transactions that have not done much work, so as to free up locks. Just because you are not part of the deadlock loop does not stop the system from picking on you.
To understand what is going on in your case, you will have to use SQL Server Profiler to collect all the locking- and deadlock-related events, as well as events about aborted connections and transactions, etc. Good luck: this will take some time and a good level of understanding of the Profiler events you are looking at...
The details of this sort of thing differ between database vendors and versions of their databases. However, as most database vendors consider it bad design to have a transaction open for a long time, doing so tends to lead to problems and to hit code paths that have not had the most testing effort.
Just because you're not in a transaction doesn't mean you're not holding locks.
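If you want to see exactly what an apparently idle session with an open transaction is still holding, the lock DMV will show it (replace 52 with the session id in question):

SELECT resource_type, resource_database_id, resource_associated_entity_id,
       request_mode, request_status
FROM   sys.dm_tran_locks
WHERE  request_session_id = 52;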

Advice for minimizing locking on an append only table in MS SQL Server?

I'm writing some logging/auditing code that will be running in production (not just when errors are thrown or while developing). After reading Coding Horror's experiences with deadlocking and logging, I decided I should seek advice. (Jeff's solution of "not logging" won't work for me; this is legally mandated security auditing.)
Is there a suitable isolation level for minimizing contention and deadlocking? Are there any query hints I can add to the insert statement or the stored procedure?
I care deeply about transactional integrity for everything except the audit table. The idea is that so much will be logged that if a few entries fail, it's not a problem. If the logging stops some other transaction, that would be bad.
I can log to a database or a file, although logging to a file is less attractive because I need to be able to display the results somehow. Logging to a file would (almost) guarantee the logging wouldn't interfere with other code though.
A normal transaction (i.e. READ COMMITTED) insert already does the 'minimal' locking. Insert-intensive applications will not deadlock on the insert, no matter how the inserts are mixed with other operations. At worst, an intensive insert system may cause page latch contention on the hot spot where the inserts occur, but not deadlocks.
To cause deadlocks as described by Jeff there has to be more at play, like any one of the following:
The system is using a higher isolation level (they had it coming then and well deserve it)
They were reading from the log table during the transaction (so is no longer 'append-only')
The deadlock chain involved application layer locks (ie. .Net lock statements in the log4net framework) resulting in undetectable deadlocks (ie. application hangs). Given that solving the problem involved looking at process dumps, I guess this is the scenario they were having.
So as long as you do insert-only logging in READ COMMITTED isolation level transactions, you are safe. If you expect the same problem I suspect SO had (i.e. deadlocks involving application-layer locks), then no amount of database wizardry can save you, as the problem can still manifest even if you log in a separate transaction or over a separate connection.
If you don't care about consistency on your logging table, why not perform all the logging from a separate thread?
I probably would not wait for transactions to complete before logging, since the log can be pivotal in diagnosing long-running transactions. Also, this enables you to see all the work done by a transaction that rolled back.
Grab the stack trace and all of your logging data in the logging thread, chuck it on a queue when there are new logging messages, flush them to the db in a single transaction.
Steps to minimizing locking:
(KEY) perform all appends to the logging table outside of the main thread/connection/transaction.
Ensure your logging table has a monotonically increasing clustered key (e.g. an int IDENTITY column) so that each appended log message lands at the end of the index. This ensures the pages being inserted into are usually in memory and avoids the performance hits you get with heap tables.
Perform multiple appends to the log in a transaction (10 inserts in a transaction are faster than 10 inserts outside a transaction, and usually acquire/release fewer locks).
Give it a break. Only perform logging to your db every N milliseconds. Batch up bits of work.
If you need to report on stuff historically, you can consider partitioning your logging table. Example: You could create a new logging table every month, and at the same time have a log VIEW that is a UNION ALL of all the older logging tables. Perform the reporting against the most appropriate source.
You will get better performance by flushing multiple logging messages in a single (smallish) transaction, and have the advantage that if 10 threads are doing work and logging stuff, only a single thread is flushing stuff to the logging table. This pipelining actually makes stuff scale better.
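A small illustration of those points together (the table name and batch contents are made up): an IDENTITY-clustered, append-only audit table, with the background thread flushing several queued messages in one short transaction.

-- Append-only audit table: the IDENTITY clustered key keeps inserts on the hot last page.
CREATE TABLE dbo.AuditLog
(
    AuditId  int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    LoggedAt datetime          NOT NULL DEFAULT GETUTCDATE(),
    Message  nvarchar(4000)    NOT NULL
);
GO

-- The logging thread drains its in-memory queue every N milliseconds
-- and writes the whole batch in one small transaction.
BEGIN TRANSACTION;
INSERT INTO dbo.AuditLog (Message) VALUES (N'queued message 1');
INSERT INTO dbo.AuditLog (Message) VALUES (N'queued message 2');
INSERT INTO dbo.AuditLog (Message) VALUES (N'queued message 3');
COMMIT TRANSACTION;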
Since you don't care about the transactional integrity of the audit table, you can obviously perform logging outside of the transaction (i.e. after it completes). That will minimise impact on the transaction.
Also, if you want to minimize locking, you should try to ensure that as much of your query workload as possible is covered by non-clustered indexes. (On SQL Server 2005 and above, the INCLUDE clause on non-clustered indexes can make a big difference.)
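For example (table and column names are illustrative), a covering index like the one below lets queries that filter on CustomerId and return only OrderDate and Total be answered entirely from the non-clustered index, without touching the clustered index:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, Total);   -- included columns make the index "covering" for these queries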
One easy way to prevent your logging from having locking issues with your 'regular' database is to not use the same database. Just create another database for your logging. As a bonus, the rapid growth of your logging database won't result in fragmentation in your main DB. Personally, I usually prefer to log to a file - but then again, I'm used to doing heavy text manipulation in my editor, Vim. Logging to a separate DB should help avoid deadlocking issues.
Just make sure that if you try writing your own database appender for the logging framework you use, you are very careful about your locks (which I'm guessing is what tripped up Jeff in the blog post you reference). Properly written (see several of the comments on Jeff's post), you shouldn't have locking issues with your logging framework unless it does something odd.

Concurrency issues

Here's my situation (SQL Server):
I have a web application that utilizes nHibernate for data access, and another 3 desktop applications. All access the same database, and are likely to utilize the same tables at any one time.
Now, with the help of NH I'm batching selects in order to load an aggregate with all of its hierarchy - so I would see 4 to maybe 7 selects being issued at once (not sure if it matters).
Every few days one of the applications will get a : "Transaction has been chosen as the deadlock victim." (this usually appears on a select)
I tried changing the database to snapshot isolation, but that didn't help; I ended up with:
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table '...' directly or indirectly in database '...' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
What suggestions do you have for this situation? What should I try, or what should I read, in order to find a solution?
EDIT:
Actually there's no RAID in there :). The number of users per day is small (I'd say 100 per day, with hundreds of small orders on a busy day), and the database is a bit bigger, at about 2GB, and growing faster every day.
It's a business app, that handles orders, emails, reports, invoices and stuff like that.
Lazy loading would not be an option in this case.
I guess taking a very close looks at those indexes is my best bet.
Deadlocks are complicated. A deadlock means that at least two sessions have locks and are waiting for one another to release a different lock; since both are waiting, the locks never get released, neither session can continue, and a deadlock occurs.
In other words, A has lock X, B has lock Y, now A wants Y and B wants X. Neither will give up the lock they have until they are finished with their transaction. Both will wait indefinitely until they get the other lock. SQL Server sees that this is happening and kills one of the transactions in order to prevent the deadlock. Snapshot isolation won't help you - the DB still needs to preserve atomicity of transactions.
There is no simple answer anyone can give as to why a deadlock would be occurring. You'll need to profile your application to find out.
Start here: How to debug SQL deadlocks. That's a good intro.
Next, look at Detecting and Ending Deadlocks on MSDN. That will give you a lot of good background information on why deadlocks occur, and help you understand what you're looking at/for.
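One low-effort way to get the details of each deadlock is to have SQL Server write a deadlock report to the error log (this trace flag works on SQL Server 2005 and later; on 2008 and later the built-in system_health Extended Events session also records deadlock graphs):

DBCC TRACEON (1222, -1);   -- write a detailed report to the error log for every deadlock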
There are also some previous SO questions that you might want to look at:
Diagnosing Deadlocks in SQL Server 2005
Zero SQL deadlock by design
Or, if the deadlocks are very infrequent, just write some exception-handling code into your application to retry the transaction if a deadlock occurs. Sometimes it can be extremely hard (if not nearly impossible) to prevent certain deadlocks. As long as you write transactionally-safe code, it's not the end of the world; it's completely safe to just try the transaction again.
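A sketch of that retry logic, written here in T-SQL for illustration (error 1205 is the deadlock-victim error; in practice the same pattern often lives in the application's data-access layer, and the statements inside the transaction are placeholders):

DECLARE @retries int;
SET @retries = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... the statements that occasionally deadlock ...
        UPDATE dbo.Orders SET Status = 'Processed' WHERE Status = 'Pending';
        COMMIT TRANSACTION;
        BREAK;                              -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205            -- chosen as deadlock victim: try again
            SET @retries = @retries - 1;
        ELSE
        BEGIN
            DECLARE @msg nvarchar(2048);
            SET @msg = ERROR_MESSAGE();
            RAISERROR (@msg, 16, 1);        -- not a deadlock: surface the original error
            RETURN;
        END;
    END CATCH;
END;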
Is your hardware properly configured (specifically RAID configuration)? Is it capable of matching your workload?
If hardware is all good and humming, you should ensure you have the 'right' indexes to match your query workload.
Many locking/deadlock problems can be eliminated with the correct indexes (covering indexes can take pressure off the clustered index during inserts).
BTW: turning on snapshot isolation will put increased pressure on your tempDB. How is tempDB configured? RAID 0 is preferred (and even better use an SSD if tempDB is a bottleneck).
While it's not uncommon to find this error in NHibernate sessions with large numbers of users, it seems to be happening too often in your case.
Perhaps your objects are very large resulting in long-running selects? And if your selects are taking too long, that might indicate problems with your indexes (as Mitch Wheat explains)
If everything is in order, you could also try Lazy Loading to postpone your selects until when you really need your data. This might not be appropriate for your exact situation so you do have to see if it works.
