According to the reference documentation, is the READ ONLY transaction flag useful for anything other than allowing DEFERRABLE transactions?
SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;
The DEFERRABLE transaction property has no effect unless the
transaction is also SERIALIZABLE and READ ONLY. When all three of
these properties are selected for a transaction, the transaction may
block when first acquiring its snapshot, after which it is able to run
without the normal overhead of a SERIALIZABLE transaction and without
any risk of contributing to or being canceled by a serialization
failure. This mode is well suited for long-running reports or backups.
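For reference, such a transaction can be started in PostgreSQL as follows (a minimal sketch; the big_report table is made up for illustration):

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
-- may block here until a safe snapshot is available, after which the query
-- runs without the usual SERIALIZABLE bookkeeping or risk of serialization failure
SELECT count(*) FROM big_report;
COMMIT;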
Does the database engine run other optimizations for read-only transactions?
To sum up the comments from Nick Barnes and Craig Ringer on the question:
The READ_ONLY flag does not necessarily provide any optimization
The main benefit of setting the READ_ONLY flag is to ensure that no tuple is going to be modified
Actually, it does. Let me just cite a source code comment here:
/*
* Check if we have just become "RO-safe". If we have, immediately release
* all locks as they're not needed anymore. This also resets
* MySerializableXact, so that subsequent calls to this function can exit
* quickly.
*
* A transaction is flagged as RO_SAFE if all concurrent R/W transactions
* commit without having conflicts out to an earlier snapshot, thus
* ensuring that no conflicts are possible for this transaction.
*/
I am inserting a few rows within a transaction. But when I do an 'END TRANSACTION' it takes around 250 ms to execute, while the 'BEGIN TRANSACTION' hardly takes 1 ms. I need to speed this up to suit my application. How can I?
[edit]
* A single thread is accessing the database.
* I have 2 tables in this database and both have a primary key on them.
* Within a transaction there is exactly one insert into each of the tables.
* OS: Windows 7
Using the out-of-the-box (default) settings of SQLite, 250 ms to commit a transaction makes sense. This is due to how SQLite commits your transaction: it waits for the VFS to confirm that the writes have reached disk before returning.
Here are a couple of possibilities to optimize.
Encapsulate more inserts per transaction
If possible, execute more inserts per transaction. Try running, say, 100 inserts in one transaction and see how little difference there is compared to only 1 insert. You may not even see any (i.e. 100 inserts will probably take only marginally more than 250 ms).
Bottom line is that you'll get more bang for your buck as each insert will ultimately take less time (on average).
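As a rough sketch of what batching looks like in SQLite (table and column names are made up for illustration):

BEGIN TRANSACTION;
INSERT INTO table_a (id, value) VALUES (1, 'a1');
INSERT INTO table_b (id, value) VALUES (1, 'b1');
INSERT INTO table_a (id, value) VALUES (2, 'a2');
INSERT INTO table_b (id, value) VALUES (2, 'b2');
-- ... more inserts ...
COMMIT;   -- one commit (and its disk sync) for the whole batch instead of one per pair of inserts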
Use WAL journaling
I strongly recommend you try WAL journaling, as you should see a dramatic reduction from 250 ms. WAL shouldn't be any less safe than regular journaling. The reason WAL is faster is in its name (write-ahead log): it appends to a log file instead of having the database file absorb the changes every time you commit. Read this for the full story.
To activate WAL journaling, set the journal_mode pragma to WAL:
PRAGMA journal_mode = WAL;
Change the synchronous pragma
This may or may not be good enough for you as it is less safe. As such, I recommend it only if you understand what the risks are, and also if the two previous suggestions are not good enough for you, or if you cannot use them.
Basically, changing the synchronous pragma to NORMAL or OFF will cause SQLite not to wait for the VFS to confirm that the writes have reached disk before returning.
Please read the documentation first, then if you still want to try it, you can set your pragma to either OFF or NORMAL:
PRAGMA synchronous = NORMAL;
or
PRAGMA synchronous = OFF;
Background
Given:
a set of threads
each thread has its own data source
objects in each data source reference objects in other data sources
there is a possibility for duplicate objects across various data sources
the threads are writing to a database with an engine that enforces the foreign key constraint
each type of object gets its own table and references the other objects through a foreign key
each thread has its own connection to the database
Proposed Solution
A register class which tracks the IDs of the objects that have been written. The interface of the register class has these public methods (represented in Java):
public interface Register
{
    // Implementations are expected to make these methods thread-safe
    // (interface methods cannot themselves be declared synchronized in Java).
    boolean requestObjectLock(int id);
    boolean saveFinalized(int id);
    boolean checkSaved(int id);
}
The method requestObjectLock checks whether the object has already been locked by another thread, and returns false if it has. Otherwise, it locks that ID and returns true. It is then the responsibility of the calling thread to call saveFinalized once the object has been successfully written to the database, and the responsibility of all other threads to check with checkSaved whether it has already been written before writing an object that references it. In other words, there are three states an object can be in: unregistered, locked (registered but unwritten), and saved (registered and written).
Reasoning
As far as I know there is no way to guarantee that one SQL query will finish before another when issued by different threads. Thus, if an object was only registered or unregistered, it seems possible that a thread could check whether an object had been written, start writing an object that referenced it, and have its query complete (and fail) before the query that actually wrote the referenced object did.
Questions
Is it possible to guarantee the sequence of execution of queries being executed by different threads? And therefore, is this solution overengineered? Is there a simpler solution? On the other hand, is it safe?
The terms you need to research on the database side are "transaction isolation level" and "concurrency control". DBMS platform support varies. Some platforms implement a subset of the isolation levels defined in the SQL standards. (The SQL standards allow this. They're written in terms of what behavior isn't allowed.) And different platforms approach concurrency control in different ways.
Wikipedia, although not authoritative, has a good introduction to isolation levels, and also a good introduction to concurrency control.
As far as I know there is no way to guarantee that one SQL query will
finish before another when issued by different threads.
That's kind of true. It's also kind of not true. In SQL standards, transaction isolation levels aren't concerned with who finishes first. They're concerned with behavior that's not allowed.
dirty read: Transaction A can read data written by concurrent, uncommitted transaction B.
nonrepeatable read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. The data transaction A read first is different from the data it read second, because of transaction B. (Some people describe transaction A as seeing "same rows, different column values".)
phantom read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. Transaction A's two reads return two different sets of rows, because transaction B has affected the evaluation of transaction A's WHERE clause. (Some people describe transaction A as seeing "same column values, different rows".)
You control transaction behavior in SQL using SET TRANSACTION. So SET TRANSACTION SERIALIZABLE means dirty reads, nonrepeatable reads, and phantom reads are impossible. SET TRANSACTION REPEATABLE READ allows phantom reads, but dirty reads and nonrepeatable reads are impossible.
You'll have to check your platform's documentation to find out what it supports. PostgreSQL, for example, supports all four isolation levels syntactically. But internally it only has two levels: read committed and serializable. That means you can SET TRANSACTION READ UNCOMMITTED, but you'll get "read committed" behavior.
Important for you: The effect of a serializable isolation level is to guarantee that transactions appear to have been issued one at a time by a single client. But that's not quite the same thing as saying that if transaction A starts before transaction B, it will commit before transaction B. If they don't affect each other, the dbms is allowed to commit transaction B first without violating the serializable isolation level semantics.
When I have questions myself about how these work, I test them by opening two command-line clients connected to the same database.
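For example, a sketch of such a two-client test against PostgreSQL might look like this (the accounts table is made up for illustration):

-- client 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- first read

-- client 2 (a separate connection), meanwhile:
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
COMMIT;

-- client 1 again
SELECT balance FROM accounts WHERE id = 1;   -- same value as the first read: no nonrepeatable read
COMMIT;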
Do I understand correctly that table/row lock hints are being used for pessimistic transaction (TX) isolation models of concurrency ONLY?
In other words, when can table/row lock hints be used while the optimistic TX isolation provided by SQL Server (2005 and higher) is engaged?
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
I did read that pessimistic options are legacy and are not needed anymore, though I am in doubt.
Also, with optimistic (aka snapshot, aka versioning) TX isolation levels built into SQL Server 2005+,
when would one need to manually code optimistic concurrency features?
The last question is inspired by having read:
"Optimistic Concurrency in SQL Server" (September 28, 2007)
describing custom coding to provide versioning in SQL Server.
Optimistic concurrency requires more resources and is more expensive when a conflict occurs.
Two sessions can read and modify the values, and a conflict only occurs when they try to apply their changes simultaneously. This means that in the case of a concurrent update both values must be stored somewhere (which of course requires resources).
Also, when a conflict occurs, usually the whole transaction should be rolled back or the cursor refetched, which is expensive too.
Pessimistic concurrency model uses locking, thus downgrading concurrency but improving performance.
In the case of two concurrent tasks, it may be cheaper for the second task to wait for a lock to be released than to spend CPU time and disk I/O on two simultaneous pieces of work, and then yet more on rolling back the less fortunate one and redoing it.
Say, you have a query like this:
UPDATE mytable
SET myvalue = very_complex_function(@range)
WHERE rangeid = @range
, where very_complex_function reads some data from mytable itself. In other words, this query transforms the subset of mytable that shares the same rangeid value.
Now, when two functions work on the same range, there may be two scenarios:
Pessimistic: the first query locks, the second query waits for it. The first query completes in 10 seconds, the second one does too. Total: 20 seconds.
Optimistic: both queries work independently (on the same input). This shares CPU time between them, plus some overhead for switching. They also have to keep their intermediate data somewhere, so the data is stored twice (which implies twice the I/O or memory). Let's say both complete almost at the same time, in 15 seconds.
But when it's time to commit the work, the second query will conflict and will have to rollback its changes (say, it takes the same 15 seconds). Then it needs to reread the data again and do the work again, with the new set of data (10 seconds).
As a result, both queries complete later than with a pessimistic locking: 15 and 40 seconds vs. 10 and 20.
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
Optimistic isolation levels are, well, optimistic. You should not use them when you expect high contention on your data.
BTW, optimistic isolation (for the read queries) was available in SQL Server 2000 too.
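For context, enabling SQL Server's row-versioning (optimistic) isolation looks roughly like this (the MyAppDb database name and the sample query are made up for illustration):

-- enable snapshot isolation for the database
ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- optionally make READ COMMITTED use row versioning as well
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;

-- then, in a session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT myvalue FROM mytable WHERE rangeid = 1;   -- reads a consistent snapshot without taking shared locks
COMMIT;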
I have a detailed answer here: Developing Modifications that Survive Concurrency
I think there's a bit confusion over terminology here.
Optimistic locking (also called optimistic concurrency) is a programming technique used to avoid the following scenario:
start transaction
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
display data on user's screen
await user input, lock remains active
keep awaiting user input, lock still preventing any writes/modifications
user input never comes (for whatever reason)
transaction times out (and this usually does not happen very quickly, as the user must be given reasonable time to enter his input).
Optimistic locking replaces this with the following:
start transaction READ
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
end transaction READ, releasing the read lock just set
display data on user's screen
await user input, but data can be modified/deleted meanwhile by other transactions
user input arrives
start transaction WRITE
verify that the data has remained unaltered, raising an exception if it has changed
apply user updates
end transaction WRITE
So the single "user transaction" to go fetch some data, and change and update them, consists of two distinct "database transactions". What is usually called "isolation levels" applies to those database transactions. The "optimistic locking" that you refer to applies to the "user transaction".
The matter is further complicated in that, broadly speaking, two completely distinct strategies are possible for the "isolating the database transactions" part:
MVCC
2-phase locking
I think the "snapshot versioning isolation level" means that the MVCC technique (well, one of its various possible variations) is being used for the database transaction. The other commonly known isolation levels apply more to transaction isolation using 2PL as the serialization(/isolation) technique. (And mixing them up can get messy ...)
What is the relationship between ACID and database transaction?
Does ACID give database transaction or is it the same thing?
Could someone shed some light on this topic?
ACID is a set of properties that you would like to apply when modifying a database.
Atomicity
Consistency
Isolation
Durability
A transaction is a set of related changes which is used to achieve some of the ACID properties. Transactions are tools to achieve the ACID properties.
Atomicity means that you can guarantee that all of a transaction happens, or none of it does; you can do complex operations as one single unit, all or nothing, and a crash, power failure, error, or anything else won't allow you to be in a state in which only some of the related changes have happened.
Consistency means that you guarantee that your data will be consistent; none of the constraints you have on related data will ever be violated.
Isolation means that one transaction cannot read data from another transaction that is not yet completed. If two transactions are executing concurrently, each one will see the world as if they were executing sequentially, and if one needs to read data that is written by another, it will have to wait until the other is finished.
Durability means that once a transaction is complete, it is guaranteed that all of the changes have been recorded to a durable medium (such as a hard disk), and the fact that the transaction has been completed is likewise recorded.
So, transactions are a mechanism for guaranteeing these properties; they are a way of grouping related actions together such that as a whole, a group of operations can be atomic, produce consistent results, be isolated from other operations, and be durably recorded.
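As an illustration (the accounts table and amounts are made up), a classic money transfer shows why grouping the operations matters:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- if anything fails before COMMIT (a constraint violation, a crash, ...),
-- the whole transfer is rolled back and neither balance changes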
ACID are desirable properties of any transaction processing engine.
A DBMS is (if it is any good) a particular kind of transaction processing engine that exposes, usually to a very large extent but not quite entirely, those properties.
But other engines exist that can also expose those properties. The kind of software that used to be called a "TP monitor" is a case in point (its modern equivalent mostly being web servers).
Such TP monitors can access resources other than a DBMS (e.g. a printer), and still guarantee ACID toward their users. As an example of what ACID might mean when a printer is involved in a transaction:
Atomicity: an entire document gets printed or nothing at all
Consistency: at end-of-transaction, the paper feed is positioned at top-of-page
Isolation: no two documents get mixed up while printing
Durability: the printer can guarantee that it was not "printing" with empty cartridges.
What is the relationship between ACID and database transaction?
In a relational database, every SQL statement must execute in the scope of a transaction.
Without defining the transaction boundaries explicitly, the database is going to use an implicit transaction which wraps around every individual statement.
The implicit transaction begins before the statement is executed and ends (with a commit or rollback) after the statement is executed.
The implicit transaction mode is commonly known as auto-commit.
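As a minimal sketch (the audit_log table is made up), the difference between the implicit and explicit modes looks like this:

-- auto-commit: this statement runs in its own implicit transaction
-- and is committed as soon as it completes
INSERT INTO audit_log (message) VALUES ('implicit transaction');

-- explicit boundaries: both inserts commit or roll back as one unit,
-- and neither is visible to other sessions until COMMIT
BEGIN;
INSERT INTO audit_log (message) VALUES ('first statement');
INSERT INTO audit_log (message) VALUES ('second statement');
COMMIT;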
A transaction is a collection of read/write operations succeeding only if all contained operations succeed.
Inherently a transaction is characterized by four properties (commonly referred as ACID):
Atomicity
Consistency
Isolation
Durability
Does ACID give database transaction or is it the same thing?
For a relational database system, this is true because the SQL Standard specifies that a transaction should provide the ACID guarantees:
Atomicity
Atomicity takes individual operations and turns them into an all-or-nothing unit of work, succeeding if and only if all contained operations succeed.
A transaction might encapsulate a state change (unless it is a read-only one). A transaction must always leave the system in a consistent state, no matter how many concurrent transactions are interleaved at any given time.
Consistency
Consistency means that constraints are enforced for every committed transaction. That implies that all key, data type, check, and trigger constraints are satisfied and no constraint violation occurs.
Isolation
Transactions require concurrency control mechanisms, and they guarantee correctness even when being interleaved. Isolation brings us the benefit of hiding uncommitted state changes from the outside world, as failing transactions shouldn’t ever corrupt the state of the system. Isolation is achieved through concurrency control using pessimistic or optimistic locking mechanisms.
Durability
A successful transaction must permanently change the state of a system, and before ending, the state changes are recorded in a persisted transaction log. If our system is suddenly affected by a crash or a power outage, then all committed transactions that have not yet been applied can be replayed.
I slightly modified the printer example to make it easier to explain:
A document with 2 pages of content is sent to the printer.
Transaction - the document sent to the printer
atomicity - the printer prints both pages of the document or none
consistency - the printer prints half a page and the paper gets stuck; the printer restarts itself and prints the 2 pages with all the content
isolation - while many printouts are in progress, the printer prints the right content of each document
durability - while printing, there was a power cut; the printer prints the document again without any errors
Hope this helps someone to get the hang of the concept of ACID
The ACID properties are a very old and important concept in database theory. I know you can find lots of posts on this topic, but I would still like to share an answer, because it is a very important topic for any RDBMS.
A database system handles many different types of transactions, and every transaction has certain characteristics. These characteristics are known as the ACID properties.
The ACID properties guarantee that a database transaction accomplishes all of its tasks reliably.
Atomicity: either commit everything or nothing.
Consistency: keep records consistent by validating all rules and constraints of the transaction.
Isolation: make sure that two transactions are unaware of each other.
Durability: committed data is stored permanently.
To quote Wikipedia:
ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably.
A DBMS that supports transactions will strive to support all of these properties - any commercial DBMS (as well as several open-source DBMSs) provide full ACID 'support' - although it's often possible (for example, with varying isolation levels in MSSQL) to lessen the ACIDness - thus losing the guarantee of fully transactional behaviour.
ACID Properties in Databases:
Atomicity : Transactions are all or nothing
Consistency: Only valid data is saved (database from one state that is consistent to another state that is also consistent.)
Isolation: Transactions do not affect each other (Multiple transactions can run at the same time in the system. Executing multiple transactions in parallel must have the same results as running them sequentially.)
Durability: Written data will not be lost (even if the database crashes immediately or in the event of a power loss.)
[Gray] introduced the ACD properties for a transaction in 1981. In 1983 [Haerder] added the Isolation property. In my opinion, the ACD properties would be a more useful set of properties to discuss. One interpretation of Atomicity (that the transaction should be atomic as seen from any client at any time) would actually imply the isolation property. The "isolation" property is useful when the transaction is not isolated, that is, when the isolation property is relaxed. In ANSI SQL speak: if the isolation level is weaker than SERIALIZABLE. But when the isolation level is SERIALIZABLE, the isolation property is not really of interest.
I have written more about this in a blog post: "ACID Does Not Make Sense".
http://blog.franslundberg.com/2013/12/acid-does-not-make-sense.html
[Gray] The Transaction Concept, Jim Gray, 1981.
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
[Haerder] Principles of Transaction-Oriented Database Recovery, Haerder and Reuter, 1983.
http://www.stanford.edu/class/cs340v/papers/recovery.pdf
A transaction can be defined as a collection of tasks that is treated as a single minimum processing unit. A minimum processing unit cannot be divided further.
Every transaction must have the four properties commonly known as the ACID properties, i.e. ACID is the group of properties of any transaction:
Atomicity
Consistency
Isolation
Durability
I've a question regarding distributed transactions. Let's assume I have 3 transaction programs:
Transaction A
begin
a=read(A)
b=read(B)
c=a+b
write(C,c)
commit
Transaction B
begin
a=read(A)
a=a+1
write(A,a)
commit
Transaction C
begin
c=read(C)
c=c*2
write(A,c)
commit
So there are 5 pairs of critical operations: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4.
I need to ensure integrity and confidentiality. Do you have any idea how to achieve this?
Thank you in advance!
What you have described in your post is a common situation in multi-user systems. Different sessions simultaneously start transactions using the same tables and indeed the same rows. There are two issues here:
What happens if Session C reads a record after Session A has updated it but before Session A has committed its transaction?
What happens if Session C updates the same record which Session A has updated but not committed?
(Your scenario only illustrates the first of these issues).
The answer to the first question is isolation level. This is the definition of the visibility of uncommitted transactions across sessions. The ANSI standard specifies four levels:
SERIALIZABLE: no changes from another session are ever visible.
REPEATABLE READ: phantom reads are allowed, that is, the same query executed twice may return a different set of rows.
READ COMMITTED: only changes which have been committed by another session are visible.
READ UNCOMMITTED: dirty reads are allowed, that is, uncommitted changes from one session are visible in another.
Different flavours of database implement these in different fashions, and not all databases support all of them. For instance, Oracle only supports READ COMMITTED and SERIALIZABLE, and it implements SERIALIZABLE as a snapshot (i.e. it is a read-only transaction). However, it uses multiversion concurrency control to prevent non-repeatable reads in READ COMMITTED transactions.
So, coming back to your question, the answer is: set the appropriate Isolation Level. What the appropriate level is depends on what levels your database supports, and what behaviour you wish to happen. Probably you want READ COMMITTED or SERIALIZABLE, that is you want your transactions to proceed on the basis of data values being consistent with the start of the transaction.
As to the other matter, the answer is simpler: transactions must issue locks on tables or preferably just the required rows, before they start to update them. This ensures that the transaction can proceed to change those values without causing a deadlock. This is called pessimistic locking. It is not possible in applications which use connection pooling (i.e. most web-based applications), and the situation there is much gnarlier.
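As a rough sketch of pessimistic row locking (table and column names are made up; SELECT ... FOR UPDATE is supported by, for example, PostgreSQL, Oracle and MySQL):

BEGIN;
-- lock only the rows this transaction needs, before reading and updating them;
-- another session asking for the same rows will wait until we commit or roll back
SELECT amount FROM items WHERE id IN (1, 2) FOR UPDATE;
UPDATE items SET amount = amount + 1 WHERE id = 1;
COMMIT;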