What transaction isolation level should I use - database

I am working on a custom application that allows users to insert / update / delete / retrieve data from a database.
The stored procedures I use affect a few tables, so I use a transaction, and I want to be able to lock these tables to prevent unintended things from happening when multiple users are inserting, deleting, updating, or retrieving (CRUD) data.
My question is: what would be the best isolation level to use here? I have read the MSDN documentation on isolation levels and tried to make sense of it, as well as searching around. If someone who has been there and done that could comment quickly, it would be greatly appreciated.

The default .Net transaction level is serializable. Why do you need another transaction level? Are you under the impression that without an explicit transaction, other connections will be able to change the data out from under you? If so, that is incorrect -- all actions take place in an explicit or implicit transaction, and the only time you can get inconsistent results is if you explicitly set the transaction isolation level to something which allows it.
Edit: As pointed out in a comment, the default for the database engine is READ COMMITTED, but the default for .Net transactions/connections is SERIALIZABLE, with a caveat: if the isolation level is changed, it is not reset when a pooled connection is reused. This means that you can never be absolutely sure what it is unless you set it yourself. For most cases, you can probably get away with assuming SERIALIZABLE and leaving it at that.
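The answer above is about .Net, but the same caveat applies to connection pools in general. As an illustration of "set it yourself", here is a minimal JDBC sketch in which the isolation level is stated explicitly at the start of the unit of work; the connection string, table, and values are placeholders, not anything from the question:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ExplicitIsolationExample {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details; replace with your own database.
            try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/appdb", "app", "secret")) {
                // Don't assume the pool handed back the level you expect: state it explicitly.
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                con.setAutoCommit(false);
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE accounts SET balance = balance - ? WHERE id = ?")) {
                    ps.setBigDecimal(1, new java.math.BigDecimal("10.00"));
                    ps.setLong(2, 42L);
                    ps.executeUpdate();
                    con.commit();
                } catch (SQLException e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }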

Related

Isolation Level vs Optimistic Locking - Hibernate, JPA

I have a web application where I want to ensure concurrency with a DB level lock on the object I am trying to update. I want to make sure that a batch change or another user or process may not end up introducing inconsistency in the DB.
I see that isolation levels ensure read consistency, and that an optimistic lock with a @Version field can ensure data is written in a consistent state.
My question is: can't we ensure consistency with the isolation level alone? By making any transaction that updates the record Serializable (not considering performance), will I not ensure that a proper lock is taken by that transaction, and that any other transaction trying to update or lock the same record will fail?
Do I really need version or timestamp management for this?
Depending on the isolation level you've chosen, a specific resource is going to be locked until the given transaction commits or rolls back. It can be a lock on a whole table, a row, or a page. This is pessimistic locking, and it is enforced at the database level while the transaction runs.
Optimistic locking, on the other hand, assumes that multiple transactions rarely interfere with each other, so no locks are required in this approach. It is an application-side check that uses the @Version attribute to establish whether the version of a record has changed between fetching it and attempting to update it.
It is reasonable to use the optimistic locking approach in web applications, as most operations span multiple HTTP requests. Usually you fetch some information from the database in one request and update it in another. It would be very expensive and unwise to keep transactions open, with locks on database resources, for that long. That's why we assume that nobody else is going to touch the set of data we're working on: it's cheaper. If the assumption turns out to be wrong and the version has changed between requests, Hibernate won't update the row and will throw an OptimisticLockException. As a developer, you are responsible for managing this situation.
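As a rough sketch of what that @Version check looks like with JPA/Hibernate (using the jakarta.persistence API; older stacks use javax.persistence). The Item entity, its fields, and the service method below are invented for illustration, and transaction demarcation is omitted:

    import jakarta.persistence.Entity;
    import jakarta.persistence.EntityManager;
    import jakarta.persistence.Id;
    import jakarta.persistence.OptimisticLockException;
    import jakarta.persistence.Version;

    @Entity
    class Item {
        @Id
        private Long id;

        private String description;

        @Version                 // incremented automatically on every successful update
        private long version;

        void setDescription(String description) {
            this.description = description;
        }
    }

    class ItemService {
        // em is assumed to be an injected EntityManager.
        void updateDescription(EntityManager em, Long itemId, String newText) {
            try {
                Item item = em.find(Item.class, itemId);   // reads the row together with its version
                item.setDescription(newText);
                em.flush();   // UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?
            } catch (OptimisticLockException e) {
                // The row changed between our read and our write: reload and let the
                // user decide, instead of silently overwriting someone else's change.
            }
        }
    }

Under the hood the update is issued with the previously read version in the WHERE clause; if zero rows match, the provider raises the exception instead of overwriting the newer data.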
Simple example: an online auction service. You're looking at an item page, reading its description and specification. All of this takes, let's say, 5 minutes. With pessimistic locking and some isolation levels you'd block other users from this particular item page (or even from all of the items!). With optimistic locking everybody can access it. After reading about the item you decide to bid on it, so you click the proper button. If any other user watching this item changed its state in the meantime (the owner edited the description, someone else bid on it), you will probably (depending on the app implementation) be informed about the changes before the application accepts your bid, because the version you've got is not the same as the version persisted in the database.
Hope that clarifies a few things for you.
Unless we are talking about some small, isolated web application (the only app working against the database), making all of your transactions Serializable would mean having a lot of confidence in your design, while ignoring the fact that yours may not be the only application hitting that database.
In my opinion, adopting the Serializable isolation level (in other words, a pessimistic lock) should be a very well thought-out decision, best applied in environments:
With large databases and short transactions that update only a few rows,
Where the chance that two concurrent transactions will modify the same rows is relatively low, and
Where relatively long-running transactions are primarily read-only.
Based on my experience, in most cases using just optimistic locking is the most beneficial decision, since genuinely frequent concurrent modifications of the same rows happen in only a small percentage of cases.
Optimistic locking definitely also helps other applications run faster (don't think only of yourself!).
So on the pessimistic-to-optimistic locking spectrum, in my opinion the truth lies somewhere closer to optimistic locking, with a flavor of Serializable here and there.
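For that occasional flavor of Serializable, JPA also lets you take a pessimistic lock on just the hot rows rather than raising the isolation level globally. A sketch, reusing the hypothetical Item entity from the earlier example and assuming it runs inside an active transaction:

    import jakarta.persistence.EntityManager;
    import jakarta.persistence.LockModeType;

    class BidService {
        // Assumed to run inside an active transaction, with an injected EntityManager.
        void placeBid(EntityManager em, Long itemId, long amountCents) {
            // Issues a SELECT ... FOR UPDATE (or equivalent) on this one row;
            // other writers block until we commit or roll back.
            Item item = em.find(Item.class, itemId, LockModeType.PESSIMISTIC_WRITE);
            // ... validate the bid against the locked, up-to-date state of 'item' and persist it ...
        }
    }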
I really cannot reference anything here, as the answer is based on my personal experience with many complex web projects and on my notes from when I was preparing for my JPA certificate.
Hope that helps.

Replicate a database using snapshots and transaction logs

For learning purposes, I want to write my own database that is able to replicate itself. I have made some progress, but now I am facing a problem that I cannot solve. Suppose I have a database (let's call it the source) that I would like to replicate to another database (let's call it the target).
The basic principle is easy: in the source you don't store the actual tables, but instead a log of transactions. It's easy to send the transaction log over to the target, where the database then rebuilds itself. If you want to update the target, you simply request the part of the transaction log that has been added since. Basically, this is what almost every database does.
While this works, it has one major drawback: if a table has already existed for a long time, the transaction log is very long, and hence replicating the table requires a lot of time.
To avoid this you can store the current state as well. This means you have an up-to-date snapshot that you can copy fast. Additionally, the target has to subscribe to the transaction log of the source. Once it contains additional entries, the target applies them to its copied table. This works well, too, and it's way better in terms of performance and transferred volume.
But now I am facing a problem: suppose the snapshot is large; then it may happen that changes are made to the data while the snapshot is being delivered. That means the copied snapshot contains some old and some new data. Now, how do I get the target database into a consistent state? Even if I know where to start in the transaction log, I either have to apply a change that was already applied to some of the records, or I have to leave it out, in which case the change is not applied at all to some other records.
Of course I could use the Serializable isolation level, but then performance drops. Of course I could do what e.g. CouchDB does and remember the current table revision in every record, keeping a copy of every record for every revision. But then the required space grows enormously.
So, what shall I do?
Everything that I was able to find on the web relies either on replaying the entire transaction log, or on a CouchDB-style approach that takes up huge amounts of space.
Any ideas?
Your snapshot needs to be consistent, and you need to know at what point (with regard to the transaction log) it is consistent. You then apply any transactions that have been committed since that point.
Obtaining a consistent snapshot can be done with exclusive locking, which may delay other transactions from committing, or using row versions (MVCC).
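In code, the target side of that idea could look roughly like the sketch below. All of the types and names are invented for illustration; the only real assumption is that the source stamps the snapshot, and every log entry, with a monotonically increasing sequence number:

    import java.util.List;

    // All types here are invented for illustration.
    interface TableStore {
        void replaceAllTables(Object tables);   // install the snapshot contents
        void apply(LogEntry entry);             // apply one insert/update/delete
    }

    record Snapshot(Object tables, long lsn) {}    // lsn = log position the snapshot is consistent at
    record LogEntry(long lsn, String operation) {}

    class ReplicationTarget {
        private final TableStore store;
        private long appliedUpTo;   // last log sequence number reflected locally

        ReplicationTarget(TableStore store) {
            this.store = store;
        }

        // Step 1: install a snapshot the source guarantees is consistent as of snapshot.lsn()
        // (taken on the source under a short exclusive lock, or via MVCC / row versions).
        void installSnapshot(Snapshot snapshot) {
            store.replaceAllTables(snapshot.tables());
            appliedUpTo = snapshot.lsn();
        }

        // Step 2: stream the log and apply only entries committed after that position,
        // so nothing is applied twice and nothing is skipped.
        void applyLog(List<LogEntry> entries) {
            for (LogEntry entry : entries) {
                if (entry.lsn() <= appliedUpTo) {
                    continue;   // already contained in the snapshot
                }
                store.apply(entry);
                appliedUpTo = entry.lsn();
            }
        }
    }

Because entries at or below the snapshot's sequence number are skipped, nothing is applied twice and nothing is dropped, which is exactly the consistency problem described in the question.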
Good luck with your project.

Does the Google Cloud Datastore create transactions implicitly?

In many databases, when an operation is performed without explicitly starting a transaction, the database creates a new transaction implicitly.
Does the datastore do this?
If it does not, is there any model for reasoning about how the data changes in the absence of transactions? How do puts, fetches, and reads work outside of transactions?
If it does, is there any characterization of when and how? Does it always do it? What is the scope of the transaction?
A mutation (put, delete) of a single entity will always be atomic (succeed entirely or fail entirely). You can think of the single mutation as transactional, even if you did not provide a transaction.
However, if you send multiple mutations in the same non-transactional request, that overall request is not atomic. Each mutation may succeed or fail independently -- one failure will not cause the other mutations to be reverted.
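So if a group of mutations needs to succeed or fail together, you have to open the transaction yourself. A sketch using the google-cloud-datastore Java client; the kind, key names, and property below are made up for illustration:

    import com.google.cloud.datastore.Datastore;
    import com.google.cloud.datastore.DatastoreOptions;
    import com.google.cloud.datastore.Entity;
    import com.google.cloud.datastore.Key;
    import com.google.cloud.datastore.Transaction;

    public class TransferExample {
        public static void main(String[] args) {
            Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
            Key from = datastore.newKeyFactory().setKind("Account").newKey("alice");
            Key to = datastore.newKeyFactory().setKind("Account").newKey("bob");

            Transaction txn = datastore.newTransaction();
            try {
                Entity a = txn.get(from);
                Entity b = txn.get(to);
                txn.put(Entity.newBuilder(a).set("balance", a.getLong("balance") - 10).build());
                txn.put(Entity.newBuilder(b).set("balance", b.getLong("balance") + 10).build());
                txn.commit();   // both puts are applied atomically, or neither is
            } finally {
                if (txn.isActive()) {
                    txn.rollback();   // e.g. if commit failed or an exception was thrown above
                }
            }
        }
    }

If the transaction cannot be committed (for example because of a conflict), commit() throws and none of the puts take effect.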
"Transactions are an optional feature of the Datastore; you're not required to use transactions to perform Datastore operations."
So no transaction is automatically opened for you across more than a single-entity Datastore operation.
A single-entity commit will behave the same as a transaction internally. So if you are changing more than one entity, or committing more than once, it is as if you open and close a transaction every time.

ISOLATION levels in Transaction

I want to know: what is the best way to arrive at the right isolation level for a transaction?
This is a good link describing the available isolation levels.
It would be nice if someone could explain the various isolation levels of a transaction.
Update: Clarified and corrected explanation.
Isolation levels just indicate how much of your transaction is affected by other concurrent transactions. The higher the isolation level, the less affected it is.
The cost shows up as CPU load, memory load, and perhaps commit latency. In addition, write conflicts can be more likely at higher isolation levels, which may mean that you have to abort your transaction and retry the whole thing. (This only affects transactions that perform updates or inserts, not transactions that only perform selects.)
In general, the rule of thumb is to use the lowest level that gives your application the consistency it needs.
The partial transaction isolation provided by Read Committed mode is adequate for many applications, and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complex queries and updates might require a more rigorously consistent view of the database than Read Committed mode provides.
The Serializable mode provides a rigorous guarantee that each transaction sees a wholly consistent view of the database. However, the application has to be prepared to retry transactions when concurrent updates make it impossible to sustain the illusion of serial execution. Since the cost of redoing complex transactions can be significant, serializable mode is recommended only when updating transactions contain logic sufficiently complex that they might give wrong answers in Read Committed mode. Most commonly, Serializable mode is necessary when a transaction executes several successive commands that must see identical views of the database.
( http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html is very nice. )
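The retry that the quoted documentation mentions usually ends up as a small loop around the transaction. A JDBC sketch; the helper and its names are invented, and the SQLSTATE check assumes PostgreSQL's code 40001 for serialization failures (other engines use their own codes):

    import java.sql.Connection;
    import java.sql.SQLException;

    class SerializableRetry {

        @FunctionalInterface
        interface TxWork {
            void run(Connection con) throws SQLException;
        }

        // Runs the given work in a SERIALIZABLE transaction, retrying when the
        // database aborts us because it could not keep up the serializable illusion.
        static void runWithRetry(Connection con, TxWork work, int maxAttempts) throws SQLException {
            con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            con.setAutoCommit(false);
            for (int attempt = 1; ; attempt++) {
                try {
                    work.run(con);
                    con.commit();
                    return;
                } catch (SQLException e) {
                    con.rollback();
                    boolean serializationFailure = "40001".equals(e.getSQLState());
                    if (!serializationFailure || attempt >= maxAttempts) {
                        throw e;   // a real error, or we have retried enough
                    }
                    // otherwise: a concurrent transaction won the conflict; run the work again
                }
            }
        }
    }

You would call it as runWithRetry(con, c -> { /* statements */ }, 3), keeping the work idempotent from the application's point of view since it may run more than once.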
If you're not sure about the differences in isolation levels, then stick to the default. Changing the level can have peculiar side-effects. 99% of applications are fine with the default.
The default, I think, varies with each JDBC driver, although some frameworks like JPA may enforce one; I can't recall offhand. The most common default is READ COMMITTED, because it gives the best general-purpose balance between transactional safety and concurrency. If you pick a different isolation level, you sacrifice either safety or concurrency, and you have to be aware of the compromise.
What the heck is the question?!
Isolation levels define the lock type and lock granularity used by the DBMS. Locking is essential in the context of a DBMS, because transactions are executed concurrently, potentially by many users. Higher transaction isolation, such as SERIALIZABLE, is safer: you can potentially eliminate dirty reads and phantom reads. But it imposes a penalty, because serialized transactions limit concurrency and therefore scalability.
What to do? Architect the application such that the logic limits the possibility of "bad data" by judiciously using serialized transactions when they're absolutely needed, but not such that concurrency is unnecessarily hampered.
Transaction Isolation Levels are about solving data reading problems in concurrent transactions (when, within one transaction, we read the same data that another transaction changes at the same time).
There are four isolation levels. Each one solves its own problem plus the problems of all the previous levels:
1. Read Uncommitted: solves Lost Update. Only the last of the concurrent transactions affects the read data; the impact of the other transactions is lost.
2. Read Committed: solves Dirty Read. The read data was changed by a transaction that was then rolled back.
3. Repeatable Read: solves Non-repeatable Read. A second read of the same data gives a result that differs from the first read, because the data was changed by another transaction between the reads.
4. Serializable: solves Phantom Read. A second select with the same parameters returns a different result set than the first, because the data was changed by another transaction between the reads.
Isolation levels refer to the DB layer.
Isolation levels are a tradeoff between data accuracy and performance: higher level gives higher accuracy, but lower speed.
The default database level is usually Read Committed (PostgreSQL) or Repeatable Read (MySQL).
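As a concrete illustration of the Non-repeatable Read row above, the sketch below issues the same query twice inside one transaction. Under READ COMMITTED the second read may see a change committed by another session in between; under REPEATABLE READ it will not. The connection details and table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class NonRepeatableReadDemo {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details.
            try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/appdb", "app", "secret")) {
                // Switch between the two levels to observe the difference.
                con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
                // con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
                con.setAutoCommit(false);
                System.out.println("first read:  " + readBalance(con));
                // ... while we pause here, another session updates and commits the same row ...
                System.out.println("second read: " + readBalance(con));   // may differ under READ COMMITTED
                con.commit();
            }
        }

        private static long readBalance(Connection con) throws SQLException {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT balance FROM accounts WHERE id = 42")) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }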

Does Active Directory commitChanges method work the same as a DataBase Commit transaction?

I do not have a way to test this right now, so could you confirm the question in the title?
I mean, in an ADO.NET database transaction I can update/insert thousands of records before committing to the database. In Active Directory, using System.DirectoryServices, it seems I need to commit for every entry (or record) that I update/insert.
Thanks.
Active Directory is not a transactional store - so you don't have the transaction support like you have with a database.
Your observation is absolutely correct - with Active Directory, you deal on a per-object basis; you can retrieve an object, manipulate it, and then save back all the changes (or discard them) - but you don't have any transaction support to roll back a whole series of operations.
If you really must have this capability, you'd have to write your own Resource Manager for AD (see some ideas here in MSDN) - this would allow you to wrap your AD operations in a TransactionScope() and roll them back. I don't think this is a trivial undertaking, otherwise, someone would have done it already....
So your current observations are absolutely correct, and without a whole lot of effort, this cannot be changed, unfortunately.
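This per-entry behaviour is a property of the directory itself, so it looks the same from any client. Purely as an illustration (written here in Java over JNDI, since there is no transaction to lean on), a batch update ends up being applied entry by entry, with only best-effort, hand-rolled compensation if something fails part-way; every name below is invented:

    import javax.naming.NamingException;
    import javax.naming.directory.BasicAttribute;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.ModificationItem;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class AdBatchUpdate {
        record Change(String dn, String attribute, String oldValue, String newValue) {}

        // Applies one attribute change per entry; if a later change fails, re-applies
        // the recorded old values by hand, because the directory offers no transaction.
        static void updateAll(DirContext ctx, Deque<Change> changes) throws NamingException {
            Deque<Change> undo = new ArrayDeque<>();
            try {
                for (Change c : changes) {
                    ModificationItem[] mod = {
                        new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                                new BasicAttribute(c.attribute(), c.newValue()))
                    };
                    ctx.modifyAttributes(c.dn(), mod);   // committed immediately, per entry
                    undo.push(c);
                }
            } catch (NamingException e) {
                // Best-effort compensation: restore the old values we already overwrote.
                while (!undo.isEmpty()) {
                    Change c = undo.pop();
                    ModificationItem[] back = {
                        new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                                new BasicAttribute(c.attribute(), c.oldValue()))
                    };
                    ctx.modifyAttributes(c.dn(), back);
                }
                throw e;
            }
        }
    }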
