I have n machines writing to a DB (SQL Server) at the exact same time (each initiating a transaction). I'm setting the isolation level to SERIALIZABLE. My understanding is that whichever machine's transaction gets to the DB first gets executed, and the other transactions will be blocked while this completes.
Is that correct?
It depends - are they all performing the same activities? That is, exactly the same statements executing in the same order, with no flow control?
If not, and two connections are accessing independent objects in the DB, they can run in parallel.
If there's some overlap of resources, then some progress may be made by multiple connections until one of them wants to take a lock that another already has - at which point it will wait. There is then the possibility of deadlocks.
SERIALIZABLE:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
whichever machine's transaction gets to the DB first, gets executed and the other transactions will be blocked while this completes
No, this is incorrect. The results should be as if each transaction were executed one after another (serially, hence the isolation level's name). But the engine is free to use any implementation it likes, as long as it honors the guarantees of the serializable isolation model. And some engines actually implement it pretty much as you describe it, e.g. Redis Transactions (although Redis has no 'isolation level' concept).
For SQL Server, the transactions will execute in parallel until they hit a lock conflict. When a conflict occurs, the transaction that already holds the lock continues undisturbed, while the one that requests the lock in a conflicting mode has to wait for the lock to be freed (i.e. for the holding transaction to commit). Which transaction happens to be the requester and which the holder is entirely up to what is being executed. That means it may well be the case that the second machine gets the grant first and finishes first, while the first machine waits.
For a better understanding of how the locking behavior differs under the serializable isolation level, see Key-Range Locking.
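To make this concrete, here is a minimal two-session sketch; the Orders table, its columns, and the assumption of an index on CustomerId are all made up for illustration:

-- Session 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
-- takes key-range locks covering CustomerId = 42 (via the assumed index)
SELECT * FROM Orders WHERE CustomerId = 42;

-- Session 2, running concurrently: blocks until Session 1 commits,
-- because the new key falls inside the range locked above
INSERT INTO Orders (CustomerId, Amount) VALUES (42, 100);

-- Session 1
COMMIT; -- Session 2's INSERT can now proceed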
Yes, this will be true for write operations at any isolation level: "My understanding is that whichever machine's transaction gets to the DB first, gets executed and the other transactions will be blocked while this completes."
The isolation level helps determine what happens when you READ data while this is going on. Serializable read operations will block your write operations, which might be the behavior you want.
I have gone through the entire Microsoft site to understand the isolation levels in SQL Server 2008 R2. However, before adopting one I would like to take suggestions from the experts at SO.
I have a PHP-based web page primarily used as a dashboard. Users (not more than 5) will upload bulk data (around 40,000 rows) each day, and around 70 users will have read-only access to the database. Please note that I have a fixed schedule for these 5 users' uploads, but I want to mistake-proof the process against any data loss. Help me with the questions below:
What is the best isolation level I can use?
Will the default READ COMMITTED isolation help me here?
Also is there a way to set isolation level thru SSMS for a particular database, other than TSQL statements? (universal isolation for a database)
The 70 users will have download options; is there a chance that the DB will get corrupted if all or most of them try to download at the same time? How do I avoid that?
Any suggestions from the experts are appreciated.
Isolation levels are really about how long shared locks on data being read are kept. But as Lieven already mentioned: they are NOT about preventing "corruption" in the database - they are about preventing readers and writers from getting in each other's way.
First up: any write operation (INSERT, UPDATE) will always require an exclusive lock on that row, and exclusive locks are not compatible with anything else - so if a given row to be updated is already locked, any UPDATE operation will have to wait - no way around this.
For reading data, SQL Server takes out shared locks - and the isolation levels are about how long those are held.
The default isolation level (READ COMMITTED) means: SQL Server will try to get a shared lock on a row, and if successful, read the contents of the row and release that lock again right away. So the lock only exists for the brief period during which the row is being read. Shared locks are compatible with other shared locks, so any number of readers can read the same rows at the same time. Shared locks do, however, block exclusive locks, so they mostly prevent UPDATEs on the same rows.
And then there's also the READ UNCOMMITTED isolation level - which basically takes out no locks; this means, it can also read rows that are currently being updated and exclusively locked - so you might get non-committed data - data that might not even really end up in the database in the end (if the transaction updating it gets rolled back) - be careful with this one!
The next level up is REPEATABLE READ, in which case the shared locks once acquired are held until the current transaction terminates. This locks more rows and for a longer period of time - reads are repeatable since those rows you have read are locked against updates "behind your back".
And the ultimate level is SERIALIZABLE, in which entire ranges of rows (defined by your WHERE clause in the SELECT) are locked until the current transaction terminates.
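To illustrate (a sketch only; the DailyUpload table and its columns are made up), the isolation level is chosen per session in T-SQL like this:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- the default
BEGIN TRANSACTION;
SELECT SUM(Amount) FROM DailyUpload WHERE UploadDate = '20120115';
-- shared locks are released as soon as each row has been read
COMMIT;

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT SUM(Amount) FROM DailyUpload WHERE UploadDate = '20120115';
-- shared locks on the rows read are now held until COMMIT
COMMIT;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(Amount) FROM DailyUpload WHERE UploadDate = '20120115';
-- key-range locks are held until COMMIT: no one can insert rows for that date
COMMIT;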
Update:
More than the download part (secondary for me) I am worried about 5 users trying to update one database at the same time.
Well, don't worry - SQL Server will definitely handle this without any trouble!
If those 5 (or even 50) concurrent users are updating different rows - they won't even notice someone else is around. The updates will happen, no data will be hurt in the process - all is well.
If some of those users try to update the same row - they will be serialized. The first one will be able to get the exclusive lock on the row, do its update, release the lock and then go on. Now the second user would get its chance at it - get the exclusive lock, update the data, release lock, go on.
Of course: if you don't do anything about it, the second user's data will simply overwrite the first update. That's why there is a need for concurrency checks. You should check whether the data has changed between the time you read it and the time you want to write it; if it has changed, it means someone else already updated it in the meantime -> you need to think of a concurrency conflict resolution strategy for this case (but that's a whole other question in itself....)
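One common way to implement such a check in SQL Server is optimistic concurrency with a rowversion column. This is only a sketch; the Dashboard table and every name in it are made up:

-- assumed: Dashboard(Id INT PRIMARY KEY, Payload NVARCHAR(MAX), RowVer ROWVERSION)
DECLARE @id INT = 1, @newPayload NVARCHAR(MAX) = N'updated data';
DECLARE @payload NVARCHAR(MAX), @originalVersion BINARY(8);

-- 1. read the row, remembering its version
SELECT @payload = Payload, @originalVersion = RowVer
FROM Dashboard WHERE Id = @id;

-- 2. write it back only if nobody changed it in the meantime
UPDATE Dashboard
SET Payload = @newPayload
WHERE Id = @id AND RowVer = @originalVersion;

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user; reload and retry.', 16, 1);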
I am newbie to database and SQL Server.
So when I searched for things about databases on the internet, I found that a database is said to be good if it obeys or follows the ACID (Atomicity, Consistency, Isolation, Durability) properties.
I wonder whether Microsoft SQL Server (any version, current or previous) follows the ACID properties internally, or whether, when we use MS SQL Server in our application, we have to write our code in such a way that our application follows the ACID properties.
In short: is maintaining the ACID properties the task (or liability) of the database, or the task of the application programmer?
IMHO, it is a two-fold responsibility. Both DB admins (writing stored procedures) and programmers should enforce ACID properties. SQL Server maintains its own ACID properties internally, and we don't have to worry about that.
ACID Properties are enforced in SQL Server.
Read this: Acid Properties of SQL 2005
But that doesn't mean that the database will handle everything for you.
According to Pinal Dave (blog.sqlauthority.com)
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for when evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.

Atomicity is an all-or-none proposition.

Consistency guarantees that a transaction never leaves your database in a half-finished state.

Isolation keeps transactions separated from each other until they're finished.

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.

The above four rules are very important for any developer dealing with databases.
That goes for developers dealing with databases. But application developers should also write their business logic to take advantage of the ACID properties being enforced - e.g. by wrapping multi-step operations in a single transaction.
An example of the practical use of ACID properties would help you more, I guess.
Almost every modern database system enforces the ACID properties.
Read this: Database transaction and ACID properties
ACID --> Atomicity, Consistency, Isolation, Durability
Atomicity:
A transaction is the fundamental unit of processing. Either all of its operations are executed, or none of them are.
Suppose that the system crashes after the write(A) operation (but before write(B)).
The database must be able to recover the old values of A and B (or else complete the entire transaction).
Consistency Preserving:
Executing a transaction alone must move the database from one consistent state to another consistent state.
The sum of A and B must be unchanged by the execution of the transaction.
Isolation:
A transaction should not make its effects known to other transactions until after it commits.
If two transactions execute concurrently, it must appear that one completed execution before the other started.
If another transaction executing at the same time is reading (and/or writing to) accounts A and B, it should not be able to see the data in an inconsistent state (after the write to A and before the write to B).
Durability:
Once a transaction commits, the changes to the database cannot be lost due to a future failure.
Once the transaction completes, we will always have the new values of A and B in the database.
Transaction: A transaction is a batch of SQL statements that behaves like a single unit. In simple words, a transaction is a unit in which a sequence of work is done to complete the whole activity. We can take the example of a bank transaction to understand this.
When we transfer money from account “A” to account “B”, a transaction takes place. Every transaction has four characteristics, known as the ACID properties.
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
Atomicity: Every transaction follows the atomicity model, which means that once a transaction is started, it should be either completed or rolled back. To understand this, let's take the above example: if a person is transferring an amount from account “A” to account “B”, it should be credited to account “B” once the transaction completes. If any failure happens after debiting the amount from account “A”, the change should be rolled back.
Consistency: Consistency says that after the completion of a transaction, the changes made during the transaction should be consistent. Referring to the above example: if account “A” has been debited by 200 RS, then after completion of the transaction account “B” should be credited by 200 RS. It means the changes should be consistent.
Isolation: Isolation states that every transaction should be isolated from every other transaction; there should not be any interference between two transactions.
Durability: Durability means that once the transaction is completed, all the changes should be permanent; that is, in case of any system failure, the changes should not be lost.
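For illustration only (the Accounts table and the amounts are made up), a typical T-SQL shape for such a transfer wraps both steps in one transaction, so that a failure after the debit rolls everything back:

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Accounts SET Balance = Balance - 200 WHERE AccountId = 'A';  -- debit A
    UPDATE Accounts SET Balance = Balance + 200 WHERE AccountId = 'B';  -- credit B
    COMMIT TRANSACTION;  -- both changes become permanent together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- any failure undoes the debit as well
    THROW;  -- rethrow (SQL Server 2012+); use RAISERROR on older versions
END CATCH;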
ACID :
[A]tomic:- Everything succeeds or fails as a single unit.
[C]onsistent:- When the operation is complete, everything is left in a safe state.
[I]solated:- No other operation can impact my operation.
[D]urable:- When the operation is completed, changes are safe.
What is a deadlock in SQL Server and when it arises?
What are the issues with deadlock and how to resolve it?
In general, deadlock means that two or more entities are blocking some resources, and none of them is able to finish because they are blocking the resources in a cyclic way.
One example: let's say I have table A and table B. I need to do some updates in A and then in B, and I decide to lock both of them at the moment of usage (this is really stupid behaviour, but it serves its purpose now). At the same moment, someone else does the same thing in the opposite order - locks B first, then locks A.
Chronologically, this happens:
proc1: Lock A
proc2: Lock B
proc1: Lock B - starts waiting until proc2 releases B
proc2: Lock A - starts waiting until proc1 releases A
Neither of them will ever finish. That's a deadlock. In practice this usually results in timeout errors, since it is not desirable to have queries hanging forever, and the underlying system (e.g. the database) will kill queries that don't finish in time.
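In SQL Server terms, the same interleaving can be reproduced with two sessions (TableA, TableB and the rows here are made up for the sketch):

-- Session 1:
BEGIN TRANSACTION;
UPDATE TableA SET Val = 1 WHERE Id = 1;  -- locks the row in TableA

-- Session 2:
BEGIN TRANSACTION;
UPDATE TableB SET Val = 1 WHERE Id = 1;  -- locks the row in TableB

-- Session 1:
UPDATE TableB SET Val = 2 WHERE Id = 1;  -- blocks, waiting for Session 2

-- Session 2:
UPDATE TableA SET Val = 2 WHERE Id = 1;  -- blocks, waiting for Session 1: deadlock
-- SQL Server's deadlock monitor kills one of the two sessions with error 1205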
One real world example of a deadlock is when you lock your house keys in your car, and your car keys in your house.
What is a deadlock
A deadlock happens when two concurrent transactions cannot make progress because each one waits for the other to release a lock.
Because both transactions are in the lock acquisition phase, neither one releases a lock prior to acquiring the next one.
Recovering from a deadlock situation
If you're using a Concurrency Control algorithm that relies on locks, then there is always the risk of running into a deadlock situation. Deadlocks can occur in any concurrent environment, not just in a database system.
For instance, a multithreaded program can deadlock if two or more threads are waiting on locks that were previously acquired, so that no thread can make any progress. If this happens in a Java application, the JVM cannot just force a Thread to stop its execution and release its locks.
Even though the Thread class exposes a stop method, that method has been deprecated since Java 1.1 because it can cause objects to be left in an inconsistent state after a thread is stopped. Instead, Java defines an interrupt method, which acts only as a hint, since a thread that gets interrupted can simply ignore the interruption and continue its execution.
For this reason, a Java application cannot recover from a deadlock situation, and it is the responsibility of the application developer to order the lock acquisition requests in such a way that deadlocks can never occur.
However, a database system cannot enforce a given lock acquisition order, since it's impossible to foresee what other locks a certain transaction will want to acquire next. Preserving the lock order becomes the responsibility of the data access layer, and the database can only assist in recovering from a deadlock situation.
The database engine runs a separate process that scans the current conflict graph for lock-wait cycles (which indicate deadlocks).
When a cycle is detected, the database engine picks one transaction and aborts it, causing its locks to be released, so that the other transaction can make progress.
Unlike the JVM, a database transaction is designed as an atomic unit of work. Hence, a rollback leaves the database in a consistent state.
Deadlock priority
While the database chooses to roll back one of the two stuck transactions, it's not always possible to predict which one will be rolled back. As a rule of thumb, the database might choose to roll back the transaction with the lower rollback cost.
Oracle
According to the Oracle documentation, the transaction that detected the deadlock is the one whose statement will be rolled back.
SQL Server
SQL Server allows you to control which transaction is more likely to be rolled back during a deadlock situation via the DEADLOCK_PRIORITY session variable.
The DEADLOCK_PRIORITY session option can accept any integer between -10 and 10, or the pre-defined values LOW (-5), NORMAL (0) and HIGH (5).
In case of a deadlock, the current transaction will roll back, unless the other transactions have a lower deadlock priority value. If both transactions have the same priority value, then SQL Server rolls back the transaction with the least rollback cost.
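For example (a sketch only), a low-priority background job can volunteer itself as the preferred deadlock victim:

-- in the background/batch session: prefer to be chosen as the victim
SET DEADLOCK_PRIORITY LOW;    -- same as -5

-- in the interactive session: prefer to survive a deadlock
SET DEADLOCK_PRIORITY HIGH;   -- same as 5

-- the numeric form is also accepted, from -10 to 10
SET DEADLOCK_PRIORITY -10;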
PostgreSQL
As explained in the documentation, PostgreSQL does not guarantee which transaction is to be rolled back.
MySQL
MySQL tries to roll back the transaction that modified the least number of records, as releasing fewer locks is less costly.
Deadlock is what happens when two people each need multiple resources to execute, and some of those resources are locked by each of them. This means that A can't execute without something B has, and vice versa.
Lets say I have Person A and Person B. They both need to get two rows to run (Row1 and Row2).
Person A locks Row1 and tries to get Row2.
Person B locks Row2 and tries to get Row1.
Person A can't run because it needs Row2, Person B can't run because it needs Row1. Neither person will ever be able to execute because they're locking what the other needs and vice versa.
One reasonably simple way to reduce deadlocks is to do operations in the same order in all your complex transactions. In other words, always access Table1 and then Table2 in that same order. This will help reduce the number of deadlocks that occur.
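As a sketch of what that looks like (table and column names are made up), both transactions below touch Table1 before Table2, so neither can hold a lock the other needs while waiting:

-- Transaction in procedure/session A:
BEGIN TRANSACTION;
UPDATE Table1 SET Val = Val + 1 WHERE Id = 1;
UPDATE Table2 SET Val = Val + 1 WHERE Id = 1;
COMMIT;

-- Transaction in procedure/session B: same order, never reversed
BEGIN TRANSACTION;
UPDATE Table1 SET Val = Val - 1 WHERE Id = 2;
UPDATE Table2 SET Val = Val - 1 WHERE Id = 2;
COMMIT;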
An impasse that may result when two (or more) transactions are each waiting for locks to be released that are held by the other.
Deadlock is when a process or thread enters a waiting state because a requested system resource is held by another waiting process.
Let's say I open a transaction and run update queries.
BEGIN TRANSACTION
UPDATE x SET y = z WHERE w = v
The query returns successfully and the transaction stays open deliberately for a period of time before I decide to commit.
While I'm sitting on the transaction, is it ever possible that the MSSQL deadlock machinery would preempt my open transaction - which is not actually executing anything - either to clear a deadlock or to free resources as system memory/resource limits are reached?
I know about SET DEADLOCK_PRIORITY and have read the MSDN articles on the topic of deadlocks. Logically, since I'm not actively seeking to stake a claim on any additional resources, I can't imagine a scenario that would trigger a sane deadlock avoidance algorithm.
Does anyone know for sure whether simply holding locks can make me a valid target? Similarly, could any low-resource condition trigger the killing of my SPID?
NO
For a deadlock to occur, all the participants in the deadlock chain must be waiting for a resource (a lock). If your connection is idle, it means it isn't executing a request, which implies it cannot be waiting.
As for other conditions that can kill your session I can think of at least three:
administrative operations that use WITH ROLLBACK_IMMEDIATE
a mirroring failover
intentional KILL <yourspid>, perhaps as a joke by your friendly DBA
To answer your question: you can be a deadlock victim even when you're not executing a query inside an explicit transaction.
It's counter-intuitive, but you can be a deadlock victim by running a SELECT statement.
It can happen if you're running a query that uses an index:
you scan indexes looking for matching rows
another process starts updating data pages
you now want to fetch the data from the data pages for the matching rows
the other process is holding locks on the data pages
you wait for the data-page locks to be released
the other process finishes updating the data pages and wants to update the indexes
you are holding read locks on the indexes
the other process waits for the index locks to be released
DEADLOCK
So, strictly speaking, you can be a deadlock victim even when you're not executing a query inside a transaction. The other guy wasn't executing his UPDATE statement inside a transaction either.
Nobody is explicitly using a transaction, yet there's a deadlock.
Possible problems:
SQL Server only has a finite number of locks. It is possible to run out of locks.
Other resources are finite (e.g. memory, tempdb). Holding on to these resources could cause those resources to run out.
Transaction logs - the logical transaction log cannot be freed for re-use while a transaction is open. The result could be a log that fills up. This problem could stop your process, because it would halt the entire instance; see the diagnostic sketch below.
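If it helps, SQL Server exposes why each database's log cannot currently be truncated; an ACTIVE_TRANSACTION value points at an open transaction holding the log:

-- shows, per database, what is preventing log truncation right now
SELECT name, log_reuse_wait_desc
FROM sys.databases;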
To consider:
CASCADE: a DELETE may only name one table in the command, but a CASCADE relationship may touch other tables.
Triggers: Triggers on the modified table may affect other tables.
DELETE and UPDATE commands may use the FROM clause, which touches other tables. I've never seen this, but I would not rule it out.
Transactions may time out - is that what is happening?
As you have at least one update lock taken out, and maybe some read and table-scan locks, you may be killed to help free up deadlocks created by other transactions. The deadlock recovery code in SQL Server is unlikely to be totally bug-free, and it is not normal to keep a transaction open for a long time on SQL Server. However, I would not expect that to happen often.
Some systems, when they detect deadlock-type problems, just start killing “long-lived” transactions that have not done much work, so as to free up locks. Just because you are not part of the deadlock loop does not stop the system picking on you.
To understand what is going on in your case, you will have to use the SQL Server Profiler to collect all the locking- and deadlock-related events, as well as events about aborted connections, transactions, etc. Good luck - this will take some time and a good level of understanding of the profiler events you are looking at...
The details of this sort of thing differ between database vendors and versions of their databases. However, as most database vendors consider it bad design to keep a transaction open for a long time, doing so tends to lead to problems and to hit code paths that have not had the most testing effort.
Just because you're not in a transaction doesn't mean you're not holding locks.
So I know that auto-commit commits every SQL statement, but do updates to the database go directly to the disk, or do they remain in a cache until flushed?
I realize it's dependent on the database implementation.
Does auto-commit mean
a) every statement is a complete transaction AND it goes straight to disk or
b) every statement is a complete transaction and it may go to cache where it will be flushed later or it may go straight to disk
Clarification would be great.
Auto-commit simply means that each statement is in its own transaction which commits immediately. This is in contrast to the "normal" mode, where you must explicitly BEGIN a transaction and then COMMIT once you are done (usually after several statements).
The phrase "auto-commit" has nothing to do with disk access or caching. As an implementation detail, most databases will write to disk on commit so as to avoid data loss, but this isn't mandatory in the spec.
For ARIES-based protocols, committing a transaction involves logging all modifications made within that transaction. Changes are flushed immediately to the log file, but not necessarily to the data file (that is dependent on the implementation). That is enough to ensure that the changes can be recovered in the event of a failure. So, (b).
Commit provides no guarantee that something has been written to disk, only that your transaction has been completed and the changes are now visible to other users.
Permanent does not necessarily mean written to disk (i.e. durable)... Even whether a "commit" waits for the transaction's changes to be written can be configured in some databases.
For example, Oracle 10gR2 has several commit modes, including IMMEDIATE, WAIT, BATCH and NOWAIT. BATCH will buffer the changes, and the writer will write the changes to disk at some future time. NOWAIT will return immediately without regard for I/O.
The exact behavior of commit is very database-specific and can often be configured depending on your tolerance for data loss.
It depends on the DBMS you're using. For example, Firebird has it as an option in configuration file. If you turn Forced Writes on, the changes go directly to the disk. Otherwise they are submitted to the filesystem, and the actual write time depends on the operating system caching.
If the database transaction is claimed to be ACID, then the D (durability) mandates that a transaction, once successfully committed, should survive a crash that occurs immediately after the commit. For a single-server database, that means it's on the disk (disk commit). For some modern multi-server databases, it can also mean that the transaction was sent to one or more other servers (network commit, which is typically much faster than disk), under the assumption that the probability of multiple servers crashing at the same time is much smaller.
It's impossible to guarantee that commits are atomic, so modern databases use two-phase or three-phase commit strategies. See Atomic Commit.