How to avoid executing database transactions on dirty data?

I am trying to find an architectural approach to handling the following concurrency issue:
Multiple users may simultaneously submit transactions against the same subset of a table in a relational database, each with a different transaction. In this scenario each transaction will run at the Serializable isolation level, which ensures that the transactions are handled as if they occurred one after the other, but this does not solve my specific concurrency issue...
Transaction 1 originates from User 1, who has, through the application, made a batch of inserts, updates, and deletes on a subset of a table.
User 2 may have started editing the data in the application any time prior to the commit of Transaction 1, and is thus editing dirty data. Transaction 2 therefore originates from User 2, who has (unknowingly) made a batch of inserts, updates, and deletes on the same subset of the table as User 1, BUT these are likely to overwrite, and in some instances overlap with, the changes made in Transaction 1. Transaction 2 will not fail under MVCC, but it is not sensible to perform it.
I need Transaction 2 to fail (ideally not even start) because the data in the database (after the Transaction 1 commit) is no longer in the state it was in when User 2 received his/her initial data to work on.
There must be "standard" architectural patterns to achieve my objective - any pointers in the right direction will be much appreciated.

Transaction 2 will not fail based on MVCC
If the transactions are really run at the Serializable isolation level, then Transaction 2 will fail, provided the transaction includes both the original read of the data (which has since changed) and the attempted changes based on it. True serializable isolation was implemented way back in PostgreSQL 9.1.
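If you cannot rely on serializable isolation, or want Transaction 2 to fail fast at the application level, the standard architectural pattern is optimistic concurrency control: carry a version number (or last-modified timestamp) on each row, hand it to the client along with the data, and make every write conditional on the version being unchanged. A minimal sketch, assuming a hypothetical orders table with a version column:

-- Hypothetical schema:
-- CREATE TABLE orders (id int PRIMARY KEY, payload text, version int NOT NULL DEFAULT 0);

-- User 2 originally read the row at version 7; the write only succeeds
-- if nobody has committed a change since that read.
UPDATE orders
SET payload = 'edited by user 2',
    version = version + 1
WHERE id = 42
  AND version = 7;

-- If Transaction 1 already bumped the version, this statement matches zero
-- rows; the application checks the affected-row count and rolls back instead
-- of overwriting the newer data.

The table and column names are illustrative; the same idea works with a timestamp column or a hash of the row contents.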

Related

How is an update query on a locked database resolved in postgresql?

Using PostgreSQL, if a user wants to add new data or update existing data in the database while it is locked, how is their transaction resolved? Let's consider this scenario, and please correct me if my understanding is wrong at any point:
1. User 1 wants to batch update some records in the database.
2. The transaction from user 1 locks the database until all the updates are pushed.
3. User 2 wants to update something in the database, or insert some new data to the database while it is locked.
4. Based on what MVCC denotes, user 2 is shown the pre-lock version of the database.
5. User 2 inserts or updates the data.
6. User 1 finishes pushing their transaction and releases the database.
7. There are now two versions of the data that must be resolved.
How does the issue in step 7 get resolved? I read somewhere that the data with the latest global timestamp wins, but how can I be sure that is the data that should be kept? If the data from user 2 has priority over user 1, but the transaction from user 2 finished before user 1's, how would this priority be resolved? Thank you for your help.
You cannot lock the database in PostgreSQL, but you can lock a table exclusively. This is something you should never do, because it is not necessary and hurts concurrency and system maintenance processes (autovacuum).
Every row you modify or delete will be automatically locked for the duration of the transaction. This does not affect the modification of other rows by concurrent sessions.
If a transaction tries to modify a row that is already locked by another transaction, the second transaction is blocked and has to wait until the first transaction is done. This way the scenario you describe is avoided.
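You can observe this with two concurrent sessions; a sketch (table and values invented for illustration):

-- Session 1: modifying a row locks it until the transaction ends.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- (transaction left open)

-- Session 2: plain reads still see the old committed row version (MVCC)...
SELECT balance FROM accounts WHERE id = 1;
-- ...but an UPDATE of the same row blocks here until session 1 finishes.
UPDATE accounts SET balance = balance - 50 WHERE id = 1;

-- Session 1:
COMMIT;  -- session 2's UPDATE now resumes against the committed row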

Payment Transactions vs Database Transactions

So in payments processing, you have this concept of payment transaction. As payments or messages come in from various internal and external interfaces, payment transactions get created, probably in some main base transaction table in some relational database (at least for the sake of this question). Then the transactions have some state that changes through a series of edits until one of several final states (paid or unpaid, approved or declined, etc) is reached.
When dealing with databases, you of course have the database transaction, and my question really is: are there any rules of thumb about processing payment transactions within database transactions? The transaction is often an aggregate root to many other tables for information on the customer or cardholder or merchant or velocity settings that participate in that transaction.
I could see a rule saying, "never process more than one payment transaction in a database transaction". But I could also see a database transaction being correct when performing batch type operations, when you must consider the whole batch of transactions successful or failed, so you have the option of rollback.
The normal design pattern is to stick to the following rule: Use database transactions to transition the database from one valid state to another.
In the context of payment transactions that probably means that adding a transaction is one db transaction. Then, each processing step on the transaction (like validation, fulfillment, ...) would be another db transaction.
I could see a rule saying, "never process more than one payment transaction in a database transaction"
You can put multiple logical transactions into one physical transaction for performance reasons or for architectural reasons. That's not necessarily a problem. You need to make sure that work is not lost, though, because a failure will abort the entire batch.
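One way to keep a single failure from discarding the whole batch is a savepoint per logical payment transaction. A hedged sketch (table and values invented):

BEGIN;

SAVEPOINT payment_1;
INSERT INTO payment_log (payment_id, status) VALUES (1, 'processed');
-- on error: ROLLBACK TO SAVEPOINT payment_1; record the failure and move on

SAVEPOINT payment_2;
INSERT INTO payment_log (payment_id, status) VALUES (2, 'processed');
-- on error: ROLLBACK TO SAVEPOINT payment_2;

COMMIT;  -- persists every payment that succeeded, skipping the failed ones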

Transaction isolation in JET?

MSDN describes the JET transaction isolation for its OLEDB provider as follows:
Jet supports five levels of nesting in transactions. The only supported mode for transactions is Read Committed. Setting lesser levels of transactional separation implies Read Committed. Setting higher levels will cause StartTransaction to fail.
Jet supports only single-phase commit.
MSDN describes Read Committed as follows:
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.
My questions are:
What is single-phase commit? What consequence does this have for transactions and isolation?
Would the Read Committed isolation level as described above be suitable for my requirements here?
What is the best way to achieve a Serializable transaction isolation using Jet?
By question number:
Single-phase commit is used where all of your data is in one database: the activity of the transaction is committed atomically and you're done. If you have a logical transaction that needs to be spread across multiple storage engines (like a relational database for metadata and some sort of document store for a big blob), you can use a transaction manager to coordinate the activities so that the work is persisted in both or neither, provided both products support two-phase commit. They are just telling you that they don't support two-phase commit, so the product is not suitable for distributed transactions.
Yes, provided you check the condition in the UPDATE statement itself (see the sketch after these answers); otherwise you might have problems.
They seem to be suggesting that you can't.
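To illustrate the second answer: "checking the condition in the UPDATE statement itself" means folding the precondition from your earlier read into the WHERE clause, so the check and the write happen atomically even under Read Committed. A sketch with invented names:

-- Instead of reading the stock level, deciding in application code, and then
-- writing (which leaves a window for another session to change the row):
UPDATE inventory
SET quantity = quantity - 1
WHERE item_id = 7
  AND quantity > 0;  -- the precondition travels with the write

-- Zero affected rows means the condition no longer held; handle that in the
-- application rather than trusting an earlier read.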
As an aside, I worked for decades as a consultant in quite a variety of environments. More than once I was engaged to migrate people off of Jet because of performance problems. In one case a simple "star" type query was running for two minutes because it was joining on the client rather than letting the database do it. As a direct query against the database it was sub-second. In another case there was a report which took 72 hours to run through Jet, which took 2 minutes when run directly against the database. If it generally works OK for you, you might be able to deal with such situations by using stored procedures where Jet is causing performance pain.

SQL Server and ACID property of Database

I am a newbie to databases and SQL Server.
When I searched for information about databases on the internet, I found that a database is said to be good if it obeys or follows the ACID (Atomicity, Consistency, Isolation, Durability) properties.
I wonder whether Microsoft SQL Server (any version, current or previous) follows the ACID properties internally, or whether, when using MS SQL Server in our application, we have to write our code in such a way that our application follows the ACID properties.
In short: is maintaining the ACID properties the task (or liability) of the database, or the task of the application programmer?
Thanks.
IMHO, it is a two-fold responsibility: both DBAs (writing stored procedures) and programmers should enforce the ACID properties. SQL Server maintains its own ACID properties internally, and we don't have to worry about that.
The ACID properties are enforced in SQL Server.
Read this: ACID Properties of SQL 2005
But that doesn't mean that the database will handle everything for you.
According to Pinal Dave (blog.sqlauthority.com):
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for when evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.
Atomicity is an all-or-none proposition.
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they're finished.
Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
The above four rules are very important for any developers dealing with databases.
That goes for developers dealing with databases, but application developers should also write their business logic in such a way that the ACID properties are enforced.
An example of the practical use of the ACID properties would help you more, I guess.
Almost every modern database system enforces the ACID properties.
Read this: Database transaction and ACID properties
ACID --> Atomicity, Consistency, Isolation, Durability
Atomicity:
A transaction is the fundamental unit of processing: either all of its operations are executed, or none of them are.
Suppose the system crashes after the write(A) operation but before write(B). The database must be able to recover the old values of A and B (or complete the entire transaction).
Consistency preservation:
Executing a transaction alone must move the database from one consistent state to another consistent state.
The sum of A and B must be unchanged by the execution of the transaction.
Isolation:
A transaction should not make its effects known to other transactions until after it commits.
If two transactions execute concurrently, it must appear that one completed execution before the other started.
If another transaction executing at the same time is reading (and/or writing to) accounts A and B, it should not be able to read the data in an inconsistent state (after the write to A and before the write to B).
Durability:
Once a transaction commits, the changes to the database cannot be lost due to a future failure.
Once the transaction completes, we will always have the new values of A and B in the database.
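Written out as SQL, the transfer looks like this (table and column names invented for illustration):

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE name = 'A';
UPDATE accounts SET balance = balance + 100 WHERE name = 'B';
COMMIT;
-- If the server crashes between the two UPDATEs, recovery rolls the whole
-- transaction back: A is never debited without B being credited, and the
-- sum of A and B is preserved.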
Transaction: A transaction is a batch of SQL statements that behaves as a single unit. In simple words, a transaction is a unit in which a sequence of work is done to complete the whole activity. We can take the example of a bank transaction to understand this.
When we transfer money from account “A” to account “B”, a transaction takes place. Every transaction has four characteristics, known as the ACID properties:
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
Atomicity: Every transaction follows the atomicity model, which means that once a transaction is started, it should either be completed or rolled back. To understand this, let's take the above example: if a person transfers an amount from account “A” to account “B”, it should be credited to account “B” once the transaction completes; if any failure happens after debiting the amount from account “A”, the change should be rolled back.
Consistency: Consistency says that after the completion of a transaction, the changes made during the transaction should be consistent. Referring to the above example: if account “A” has been debited by 200 RS, then after completion of the transaction, account “B” should be credited by 200 RS. The changes should be consistent.
Isolation: Isolation states that every transaction should be isolated from the others; there should not be any interference between two transactions.
Durability: Durability means that once the transaction is completed, all the changes should be permanent; even in case of a system failure, they should not be lost.
ACID:
[A]tomic: Everything succeeds or fails as a single unit.
[C]onsistent: When the operation is complete, everything is left in a safe state.
[I]solated: No other operation can impact my operation.
[D]urable: When the operation is complete, the changes are safe.

What is a "distributed transaction"?

The Wikipedia article for Distributed transaction isn't very helpful.
Can you give a high-level description with more details of what a distributed transaction is?
Also, can you give me an example of why an application or database should perform a transaction that updates data on two or more networked computers?
I understand the classic bank example; I care more about distributed transactions in Web-scale databases like Dynamo, Bigtable, HBase, or Cassandra.
Usually, transactions occur on one database server:
BEGIN TRANSACTION;
SELECT something FROM myTable;
UPDATE myTable SET something = 'newValue';
COMMIT;
A distributed transaction involves multiple servers:
BEGIN TRANSACTION;
UPDATE bankAccounts SET amount = amount - 100 WHERE accountNr = 1;
UPDATE someRemoteDatabaseAtSomeOtherBank.bankAccounts SET amount = amount + 100 WHERE accountNr = 2;
COMMIT;
The difficulty comes from the fact that the servers must communicate to ensure that transactional properties such as atomicity are satisfied on both servers: if the transaction succeeds, the values must be updated on both servers; if it fails, it must be rolled back on both servers. It must never happen that the values are updated on one server but not on the other.
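The usual mechanism for this coordination is two-phase commit. PostgreSQL, for example, exposes the prepare phase directly in SQL (it requires max_prepared_transactions to be set above zero); a sketch of what a coordinator would issue on each participating server, with an invented transaction identifier:

-- Phase 1 (on every participant): do the work, then prepare.
BEGIN;
UPDATE bankAccounts SET amount = amount - 100 WHERE accountNr = 1;
PREPARE TRANSACTION 'transfer-4711';
-- The work is now durably staged; the server promises it can commit.

-- Phase 2: only after every participant has prepared successfully does
-- the coordinator tell each of them to finish.
COMMIT PREPARED 'transfer-4711';

-- If any participant failed to prepare, the coordinator instead issues:
-- ROLLBACK PREPARED 'transfer-4711';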
Distributed transactions span multiple physical systems, whereas standard transactions do not. Synchronization amongst the systems becomes a need which traditionally would not exist in a standard transaction.
From your Wikipedia reference...
...a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations...
A distributed transaction is a transaction that works across several computers. Say you start a transaction in some method in a program on computer A. You then make some changes to data in the method on computer A, and afterwards the method calls a web service on computer B. The web service method on computer B fails and rolls the transaction back. Since the transaction is distributed, any changes made on computer A also need to be rolled back. The combination of the Distributed Transaction Coordinator on Windows and the .NET Framework facilitates this functionality.
I have tried to show the details of distributed transactions in this post: Performance tuning of Distributed ( XA ) transaction - How?
Data that is a good fit for distributed transactions is data with a very high requirement for consistency. Usually this is money, or something else for which we can never tolerate stale data. I usually define two categories: live data, and data for which there is no immediate need for correctness/consistency.
Now for the second part of the question, about Dynamo, Bigtable, HBase, or Cassandra.
You cannot draw a parallel between NoSQL databases and distributed transactions. The very existence of this class of databases is justified as a means of avoiding distributed transactions. A distributed transaction is centered entirely around consistency; that is quite the opposite of NoSQL storage, which is centered around availability and partition tolerance.
The usual transactional model used in such databases is eventual consistency.
A distributed transaction is a transaction on a distributed database (i.e., one where the data is stored on a number of physically separate systems). It's noteworthy because there's a fair amount of complexity involved (especially in the communications) to assure that all the machines remain in agreement, so either the whole transaction succeeds, or else it appears that nothing happened at all.
Generally, a distributed transaction involves multiple physical servers. There are two classes of distributed transactions:
Updating data in a distributed database, i.e., one logical database that stores its data across multiple physical servers. Examples are Google's Spanner or PingCAP's TiDB. In these cases, the database system takes care of the distributed transaction, and developers do not need to worry about it.
Updating data in multiple databases or multiple services. In the context of microservices, coupons, accounts, payments, and so on may be separate services of your order system. In this case, the developer must ensure the atomicity of the updates: if the transaction succeeds, the values must be updated on all servers; if it fails, the changes must be rolled back on all servers. It must never happen that the values are updated on one server but not on the other. The article The seven most classic solutions for distributed transaction management has an in-depth discussion of distributed transactions.
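For the second class, one widely used solution is the transactional outbox: commit the local state change and a message describing it in a single local transaction, then deliver the message to the other service asynchronously. A sketch with invented table names:

BEGIN;
-- 1. The local state change in the order service.
INSERT INTO orders (id, status) VALUES (1001, 'created');

-- 2. In the same local transaction, record the event to be delivered to the
--    payment service: either both rows commit or neither does.
INSERT INTO outbox (event_type, payload)
VALUES ('order_created', '{"order_id": 1001, "amount": 100}');
COMMIT;

-- A separate relay process reads the outbox and forwards events to the
-- payment service, retrying until acknowledged (at-least-once delivery).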
