Is Software Transactional Memory the same as database transactions?

I have read a lot about Software Transactional Memory, especially in relation to Haskell, but I am trying to figure out how it differs from database transactions. Are there advantages to STM that I am not seeing?

The idea of a "transaction" in software transactional memory is explicitly borrowed from databases. The difference is where the transactions are implemented and how they are used.
STM is a language-level concept: a sequence of operations does not take effect until a transaction is committed. Typically this means that the values of some global/shared variables only change when a transaction succeeds. The property is enforced by the language runtime. There is no inherent notion of persistence: the variables involved in a transaction may be purely dynamic in nature (e.g., the size of a work queue).
Database transactions are an application-level concept: a sequence of data operations does not take effect until the transaction is committed. Since this is a database, persistence is fundamental: "taking effect" inside a database means the data is saved in some persistent store.
You could potentially use a database and database transactions to implement an STM-style algorithm, but you would lose the ease and convenience (and probably, in most cases, the performance) of a language-level implementation.
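To make the contrast concrete, here is a toy Java sketch of the optimistic read-compute-commit loop at the heart of STM. It uses only java.util.concurrent and a single shared variable; a real STM (such as GHC Haskell's) tracks whole read/write sets across many variables, so treat this as an illustration of the idea, not an implementation.

    import java.util.concurrent.atomic.AtomicInteger;

    // Toy sketch, not a real STM: shows the optimistic retry loop an STM
    // runtime performs, here on a single shared variable.
    public class TinyStmSketch {
        // Shared, in-memory state, e.g. the size of a work queue. Note there
        // is no persistence here -- the key contrast with a database.
        private static final AtomicInteger queueSize = new AtomicInteger(0);

        static void transactionalIncrement() {
            while (true) {
                int snapshot = queueSize.get();   // read: begin the "transaction"
                int updated = snapshot + 1;       // pure computation on the snapshot
                if (queueSize.compareAndSet(snapshot, updated)) {
                    return;                       // commit: no conflicting write seen
                }
                // Another thread committed first; retry the whole "transaction",
                // just as an STM runtime re-executes a conflicted transaction.
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Runnable worker = () -> { for (int i = 0; i < 1000; i++) transactionalIncrement(); };
            Thread a = new Thread(worker), b = new Thread(worker);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(queueSize.get());  // prints 2000 on every run
        }
    }

Nothing here ever touches a disk: if the process dies, the committed value is simply gone, which is exactly the durability gap discussed below.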

An STM transaction has a lot in common with a database transaction. In particular, of the ACID properties important to database designers, STM provides Atomicity and Isolation. Consistency, however, is up to the programmer—you can write STM transactions that violate the invariants of internal data structures, for example. Finally, STM transactions typically are not Durable; results are stored in volatile RAM, and if the machine crashes after a successful transaction, the results can be lost. That, in my mind, is probably the most salient difference between an STM transaction and a database transaction.

STM is mostly used for concurrency, while database transactions are about data consistency.

Related

Does Two Phase Commit mean all the Participant database updates have to use Pessimistic Locking?

I'm referring to the scenario of applying 2PC to heterogeneous distributed transactions. Say I want to write to both database A and database B together, atomically. Reads and writes to each of A and B may themselves come from multiple concurrent users. Normally, in a high-throughput environment, we would prefer A and B to use Optimistic Locking (row versioning, for example) instead of Pessimistic Locking, so that high-volume concurrent read-only operations are not blocked.
But if A and B are also involved, as a whole, in a 2PC protocol, does that mean they HAVE to take locks on the relevant data changes during their PREPARE phases, and hold those locks until COMMIT is done? Because the transaction on each side is forcibly split into two phases, you can't simply "prepare to write a value that I have just read as the most up-to-date value": by the time you actually commit, it could very well have changed, unless the resource is locked.
Does that mean, for example, that in distributed transaction environments involving multiple databases, each database's own concurrent throughput is limited simply because it needs to be coordinated with the other databases, even though on its own it could have used Optimistic Locking?
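To make the concern concrete, here is a hypothetical sketch of a 2PC coordinator; the Participant interface is invented for illustration, not a real XA/JTA API. The comment on prepare() is the crux: after voting yes, a participant must guarantee that a later commit() cannot fail, which is exactly the hold-locks-until-commit obligation the question asks about.

    import java.util.List;

    // Hypothetical 2PC sketch; Participant is an invented interface.
    interface Participant {
        // Phase 1: validate and durably stage the changes. After voting
        // "yes", the participant must guarantee that commit() cannot fail --
        // in practice, the touched rows stay locked (or otherwise pinned)
        // until the coordinator's decision arrives.
        boolean prepare();
        void commit();    // Phase 2a: make the staged changes visible
        void rollback();  // Phase 2b: discard the staged changes
    }

    class TwoPhaseCoordinator {
        static void run(List<Participant> participants) {
            boolean allPrepared = true;

            // Phase 1: ask every participant to prepare and collect the votes.
            for (Participant p : participants) {
                if (!p.prepare()) { allPrepared = false; break; }
            }

            // Phase 2: commit everywhere or roll back everywhere. Between the
            // two phases each participant is still holding its guarantee, so a
            // purely optimistic "re-check the version at commit time" is no
            // longer an option -- the commit must succeed unconditionally.
            for (Participant p : participants) {
                if (allPrepared) p.commit(); else p.rollback();
            }
        }
    }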

Transaction isolation in JET?

MSDN describes the JET transaction isolation for its OLEDB provider as follows:
Jet supports five levels of nesting in transactions. The only supported mode for transactions is Read Committed. Setting lesser levels of transactional separation implies Read Committed. Setting higher levels will cause StartTransaction to fail.
Jet supports only single-phase commit.
MSDN describes Read Committed as follows:
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.
My questions are:
What is single-phase commit? What consequence does this have for transactions and isolation?
Would the Read Committed isolation level as described above be suitable for my requirements here?
What is the best way to achieve Serializable transaction isolation using Jet?
By question number:
Single-phase commit is used where all of your data is in one database: the activity of the transaction is committed atomically and you're done. If you have a logical transaction that needs to span multiple storage engines (say, a relational database for metadata and some sort of document store for a big blob), you can use a transaction manager to coordinate the activities so that the work is persisted in both or neither, provided both products support two-phase commit. They are telling you here that Jet does not support two-phase commit, so the product is not suitable for distributed transactions.
Yes, provided you check the condition in the UPDATE statement itself (see the sketch below); otherwise you might have problems.
They seem to be suggesting that you can't.
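For answer 2, a minimal JDBC sketch of pushing the check into the UPDATE itself; the accounts table and its columns are hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Sketch: under Read Committed, make the UPDATE itself carry the
    // condition, and use the update count to learn whether it held.
    public class GuardedUpdate {
        // Debit an account only if it still has sufficient funds.
        static boolean debit(Connection conn, int accountId, int amount)
                throws SQLException {
            String sql = "UPDATE accounts SET balance = balance - ? " +
                         "WHERE id = ? AND balance >= ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, amount);
                ps.setInt(2, accountId);
                ps.setInt(3, amount);
                return ps.executeUpdate() == 1;  // 0 rows: the condition failed
            }
        }
    }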
As an aside, I worked for decades as a consultant in quite a variety of environments. More than once I was engaged to migrate people off of Jet because of performance problems. In one case a simple "star" type query was running for two minutes because it was joining on the client rather than letting the database do it. As a direct query against the database it was sub-second. In another case there was a report which took 72 hours to run through Jet, which took 2 minutes when run directly against the database. If it generally works OK for you, you might be able to deal with such situations by using stored procedures where Jet is causing performance pain.

SQL Server and the ACID properties of a database

I am a newbie to databases and SQL Server.
While reading about databases on the internet, I found that a database is said to be good if it obeys the ACID (Atomicity, Consistency, Isolation, Durability) properties.
I wonder whether Microsoft SQL Server (any version, current or previous) follows the ACID properties internally, or whether, when using MS SQL Server in our application, we have to write our code in such a way that the application enforces them.
In short: is maintaining the ACID properties the task (or liability) of the database,
or the task of the application programmer?
Thanks.
IMHO, the maintenance is twofold: both DB admins (writing stored procedures) and programmers should enforce the ACID properties. SQL Server maintains its own ACID properties internally, and we don't have to worry about that.
ACID Properties are enforced in SQL Server.
Read this: Acid Properties of SQL 2005
But that doesn't mean the database will handle everything for you.
According to Pinal Dave (blog.sqlauthority.com):
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for when evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.
Atomicity is an all-or-none proposition.
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they're finished.
Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
The above four rules are very important for any developer dealing with databases.
That covers developers working with the database directly. But application developers should also write their business logic so that the ACID properties are enforced.
An example of the practical use of the ACID properties would help you more, I guess.
Almost every modern database system enforces the ACID properties.
Read this: Database transaction and ACID properties
ACID --> Atomicity, Consistency, Isolation, Durability
Atomicity:
A transaction is the fundamental unit of processing. Either all of its operations are executed, or none of them are.
Suppose the system crashes after the Write(A) operation (but before Write(B)).
The database must be able to recover the old values of A and B (or complete the entire transaction).
Consistency Preserving:
Executing a transaction alone must move the database from one consistent state to another consistent state.
The sum of A and B must be unchanged by the execution of the transaction.
Isolation:
A transaction should not make its effects known to other transactions until after it commits.
If two transactions execute concurrently, it must appear that one completed execution before the other started.
If another transaction executing at the same time is reading (and/or writing to) accounts A and B, it should not be able to read the data in an inconsistent state (after write to A and before write to B)
Durability:
Once a transaction commits, the changes to the database cannot be lost due to a later failure.
Once the transaction completes, we will always have the new values of A and B in the database.
Transaction: A transaction is a batch of SQL statements that behaves like a single unit. In simple words, a transaction is a unit in which a sequence of work is done to complete the whole activity. We can take the example of a bank transfer to understand this.
When we transfer money from account “A” to account “B”, a transaction takes place. Every transaction has four characteristics, known as the ACID properties.
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
Atomicity: Every transaction follows the atomicity model, which means that once a transaction is started, it must either complete or be rolled back. To understand this, take the example above: if a person transfers an amount from account “A” to account “B”, it should be credited to account “B” when the transaction completes. If any failure happens after the amount is debited from account “A”, the change must be rolled back.
Consistency: Consistency says that after the completion of a transaction, the changes made during it must be consistent. Referring to the example above: if account “A” has been debited by 200 RS, then account “B” must be credited by 200 RS after the transaction completes.
Isolation: Isolation states that transactions must be isolated from each other; there should be no interference between two transactions.
Durability: Durability means that once the transaction is completed, all the changes must be permanent; even in the case of a system failure, they should not be lost.
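A short JDBC sketch of the transfer example used in the last two answers; the accounts table and the connection are hypothetical, but the commit/rollback pattern is the standard one:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Sketch: the database gives us atomicity and durability via
    // commit/rollback; the business rule (debit one, credit the other)
    // is the application's contribution to consistency.
    public class TransferSketch {
        static void transfer(Connection conn, String from, String to, int amount)
                throws SQLException {
            conn.setAutoCommit(false);  // start an explicit transaction
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, amount);
                debit.setString(2, from);
                credit.setInt(1, amount);
                credit.setString(2, to);
                debit.executeUpdate();
                credit.executeUpdate();
                conn.commit();      // both updates become durable together
            } catch (SQLException e) {
                conn.rollback();    // atomicity: neither update takes effect
                throw e;
            }
        }
    }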
ACID :
[A]tomic:- Everything succeeds or fails as a single unit.
[C]onsistent:- When the operation is complete, everything is left in a safe state.
[I]solated:- No other operation can impact my operation.
[D]urable:- When the operation is completed, the changes are safe.

Database JDBC using multicore vs isolation level overhead

Hello,
I want to load data into a database on a multicore system with an active WAL, using JDBC. I was thinking about spawning multiple threads in my application to insert data in parallel.
If the application has multiple threads, I will have to raise the isolation level to Repeatable Read, which on MVCC databases should map to Snapshot Isolation.
If I were using one thread, I wouldn't need to worry about isolation levels. As far as I know, most snapshot-isolation databases analyze the write sets of all transactions that could conflict and then roll back all but one of the genuinely conflicting transactions. Specifically, I'm talking about Oracle, InnoDB, and PostgreSQL.
1.) Is this analysis of the write sets expensive?
2.) Is it a good idea to multithread the inserts for higher total throughput? Real conflicts are nearly impossible because the application layer feeds the threads conflict-free data, but the database should act as a safety net.
Oracle does not support Repeatable Read. It supports only Read Committed and Serializable. I might be mistaken, but requesting Repeatable Read on Oracle may actually give you a transaction running at Serializable. In short, you are at the mercy of the database's support for the isolation levels you desire.
I cannot speak for InnoDB and PostgreSQL, but the same would apply if they do not support the required isolation levels. The database could automatically upgrade the isolation level to a higher level to meet the desired isolation characteristics. You ought to rethink this approach, if your application's desired isolation level has to be Repeatable Read.
The problem, as you've rightly inferred, is that optimistic locking can result in transaction rollbacks when a conflict is detected. Oracle reports this with the ORA-08177 SQL error. Since the error is raised when two threads access the same data range, it can be avoided if the threads work on disjoint data ranges. You will have to ensure that this is the case when dividing work across threads.
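A sketch of that advice: each thread gets its own connection and a disjoint key range, so the snapshot-isolation conflict check never finds a real conflict. The JDBC URL, credentials, and table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: partitioned, batched inserts; one connection per thread.
    public class PartitionedInserts {
        public static void main(String[] args) throws Exception {
            final int threads = 4, rowsPerThread = 10_000;
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            for (int t = 0; t < threads; t++) {
                final int lo = t * rowsPerThread;  // disjoint id range per thread
                pool.submit(() -> {
                    try (Connection conn = DriverManager.getConnection(
                            "jdbc:postgresql://localhost/testdb", "user", "pass")) {
                        conn.setTransactionIsolation(
                                Connection.TRANSACTION_REPEATABLE_READ);
                        conn.setAutoCommit(false);
                        try (PreparedStatement ps = conn.prepareStatement(
                                "INSERT INTO measurements (id, payload) VALUES (?, ?)")) {
                            for (int i = lo; i < lo + rowsPerThread; i++) {
                                ps.setInt(1, i);
                                ps.setString(2, "data-" + i);
                                ps.addBatch();     // batch to cut round trips
                            }
                            ps.executeBatch();
                        }
                        conn.commit();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
        }
    }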
I think the limiting factor here will be disk IO, not the overhead of moving to Repeatable Read.
Even a single thread may be able to max out the disks on the DB server especially with the amount of DB logging required on insert / update. Are you sure that's not already the case?
Also, in any multi-user system, you probably want to be running with Repeatable Read isolation anyway (PostgreSQL implements Repeatable Read as snapshot isolation). So I don't think of this as adding any "overhead" beyond what I would normally see.

What's the difference between a transaction manager and a database manager?

Reading about both, it seems that they have similar responsibilities: managing the sharing and integrity of resources, and prioritizing execution. But I cannot find how they differ. Can someone clarify this misunderstanding?
Thank you
In addition to what Oded already said:
A transaction manager manages transactions - and a transaction can include/address resources other than just databases. I have given the example of a printer on some occasions before.
A database manager manages data - and not necessarily in a transactional way. There is a very popular SQL system whose 1.0 version did not have commit/rollback; in other words, it did not offer transactional functionality and thus did not offer much support for data integrity.
The distinction is rather blurred in practice, however, because:
a great many real-life transactions involve no recoverable resources other than the database, and
in order to guarantee data consistency, DBMSs cannot avoid offering most, if not all, of the functionality of transactions.
A transaction manager manages transactions - these can be distributed (i.e. involving several databases/systems).
A database manager deals with a single database - managing its on-disk storage, its memory consumption, query parsing, and so on.
Just to ensure understanding:
A transaction manager deals with multiple levels of control above the physical database.
A database manager deals with direct access to the physical database.
I would also like to add to both of these answers that the transaction manager is also responsible for enforcing ACID (Atomicity, Consistency, Isolation, and Durability). I was pretty confused about this as well.
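To tie the answers together, here is a hypothetical sketch reusing the printer example; the interfaces are invented for illustration, not a real JTA API. The transaction manager only drives a consistent outcome across whatever resources are enlisted (compare the 2PC sketch earlier); the database manager is just one such resource, and the only one whose job is managing data:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical interfaces illustrating the division of labour.
    interface EnlistedResource {
        boolean prepare();   // stage the work and vote on the outcome
        void commit();
        void rollback();
    }

    class TransactionManager {
        private final List<EnlistedResource> enlisted = new ArrayList<>();

        void enlist(EnlistedResource r) { enlisted.add(r); }

        // The TM's whole job: one consistent outcome across all resources.
        void commitTransaction() {
            boolean ok = true;
            for (EnlistedResource r : enlisted) {
                if (!r.prepare()) { ok = false; break; }
            }
            for (EnlistedResource r : enlisted) {
                if (ok) r.commit(); else r.rollback();
            }
        }
    }

    class DatabaseManagerResource implements EnlistedResource {
        // The database manager's own concerns: storage, buffers, logging,
        // query processing, recovery -- none of which the TM knows about.
        public boolean prepare() { /* write undo/redo log, keep locks */ return true; }
        public void commit()     { /* flush the log, release locks */ }
        public void rollback()   { /* apply the undo log */ }
    }

    class PrinterResource implements EnlistedResource {
        public boolean prepare() { /* reserve the printer, buffer the job */ return true; }
        public void commit()     { /* actually print the buffered job */ }
        public void rollback()   { /* discard the buffered job */ }
    }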
