I'm currently facing the following problem:
I have a C# .NET application connecting to a database (using NHibernate). The application basically displays the database content and lets the user edit it. Since multiple instances of the application run at the same time (on the same and on different workstations), I'm having concurrency problems as soon as two users modify the same record at the same time.
Currently I have sort of solved the issue with optimistic locking. But this is not a perfect solution, since one user still loses his changes.
Now I came up with the idea of having the application lock an entry every time it loads one from the database, and release the lock as soon as the user switches to another entry. So basically all entries currently displayed to the user are locked in the database. If another user loads locked entries, they are displayed in read-only mode.
Now to my actual question:
Is it a good idea to do the locking at the database level? That would mean opening a new transaction every time a user loads a new entry and locking the entry there. Or would it be better to do it through a "lock table" which holds, for example, a key for every locked entry?
Thanks for your help!
Is it a good idea to do the locking at the database level?
Yes, it is fine in some cases.
So basically all entries currently displayed to the user are locked in the database.
...
Or would it be better to do it through a "lock table" which holds, for example, a key for every locked entry?
So you lock a bunch of entries on page load? And when would you release them? What if the editing takes a lot of time (e.g. a user starts editing an entry and then goes for lunch)? What if the user closes the page without editing all of these locked entries; for how long would they remain locked?
Pessimistic locking and a "lock table" help to avoid some problems of optimistic locking, but they bring new ones.
Currently I have sort of solved the issue with optimistic locking. But this is not a perfect solution, since one user still loses his changes.
I can't agree that this is losing changes, because in your case, if the validate and commit phases are performed as a single atomic operation, the entry won't be corrupted and only one transaction will succeed (let's suppose it is the 1st); the other (the 2nd) will be rolled back.
According to NHibernate's documentation on Optimistic concurrency control:
It will be atomic if only one of these database transactions (the last one) stores the updated data; all the others simply read data.
The only approach that is consistent with high concurrency and high
scalability is optimistic concurrency control with versioning.
NHibernate provides for three possible approaches to writing
application code that uses optimistic concurrency.
So the 2nd transaction would be gracefully rolled back, and after that the user could be notified that he has to either make the edit again (a new transaction) or skip this entry.
But everything depends on your business logic and requirements. If you don't have high contention for the data, and thus there won't be lots of collisions, then I suggest you use optimistic locking.
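In NHibernate terms, that is a versioned entity. Here is a minimal sketch; the Entry class, its mapping, and the TrySave helper are invented for illustration, not taken from the question:

```csharp
// Optimistic locking with NHibernate via a <version> mapping.
// Hypothetical hbm.xml for the entity:
//   <class name="Entry" table="Entry">
//     <id name="Id"><generator class="native" /></id>
//     <version name="Version" column="Version" />
//     <property name="Text" />
//   </class>
public class Entry
{
    public virtual int Id { get; set; }
    public virtual int Version { get; set; }  // incremented by NHibernate on every update
    public virtual string Text { get; set; }
}

public static class EntryEditor
{
    public static bool TrySave(NHibernate.ISessionFactory factory, int id, string newText)
    {
        using (var session = factory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            try
            {
                var entry = session.Get<Entry>(id);
                entry.Text = newText;
                tx.Commit();  // issues UPDATE ... WHERE Id = ? AND Version = ?
                return true;
            }
            catch (NHibernate.StaleObjectStateException)
            {
                tx.Rollback();  // someone else committed first
                return false;   // caller reloads the entry and lets the user decide
            }
        }
    }
}
```

The caller that gets false is exactly the "2nd transaction" above: nothing is corrupted, the user just has to redo or skip the edit.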
Related question: Is keeping a lock on a record for a long period of time common practice with modern database systems?
My understanding is that locks on records in a database (optimistic or pessimistic) are usually held for a very short period of time, during a transaction.
The software I'm working with right now keeps locks on records for long periods of time:
A lock is kept on the record of the logged-in user (in the ACTIVE_USERS table) for the whole time the user is logged in to the software.
Let's say USER A is working on a file. The record corresponding to the file is locked until USER A saves or exits the file. So if a colleague, USER B, tries to work on the same file, a popup shows up saying 'You can't work on this file because USER A is working on it right now'.
The company I'm working for wants the changes needed to implement compatibility with Microsoft SQL Server to be minimal, so I need to implement such a locking mechanism. I've hacked together something that works in a minimal test project, but I'm not sure it is up to industry and MSSQL standards...
This is a bit long for a comment.
Using the database locking mechanism for this application-level locking seems unusual. Database locks could be on the row, page, or table level, and they also affect indexes, so there could be unexpected side effects. Obviously, a proliferation of locks also makes deadlocks much more likely.
Normally, application locks would be handled at the record level. Using flags (of some sort) in the record, the application would ensure that only one user at a time has write access to the file.
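As a sketch of that flag approach (the Files table and its LockedBy/LockedAt columns are invented here), the key point is that the claim is a single atomic UPDATE, so at most one user can succeed:

```csharp
using System.Data.SqlClient;

public static class RecordLock
{
    // Tries to claim a record by setting a lock flag in the row itself.
    // Returns true only for the caller whose UPDATE actually hit the row.
    public static bool TryClaim(SqlConnection conn, int fileId, string userId)
    {
        const string sql = @"
            UPDATE Files
            SET LockedBy = @user, LockedAt = SYSUTCDATETIME()
            WHERE Id = @id AND LockedBy IS NULL";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@user", userId);
            cmd.Parameters.AddWithValue("@id", fileId);
            return cmd.ExecuteNonQuery() == 1;  // 0 rows: someone else holds the flag
        }
    }
}
```

Releasing is the mirror image (SET LockedBy = NULL WHERE Id = @id AND LockedBy = @user), and the timestamp column makes it possible to expire stale locks.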
I would say, it might work. But I would never design a system that way and I'd be wary of unexpected consequences.
I made a database system (schema diagram omitted). Comments on the normalization are highly appreciated as well; I have a feeling you'll hate me for what I did with tblIsolateSensitivity. tblHAIFile only has a bunch of Boolean fields and foreign keys.
Let's say we have x terminals accessing the database: X1 edits Patient 01, X2 edits Patient 02, and X3 deletes Patient 01 at the same time. How can I ensure that the data across the three terminals is all up-to-date and consistent?
At the moment, I am querying the data only when a query needs to be done (i.e. when the user searches for a record, or when the program needs to verify something against a database record), meaning the data is only as fresh as the most recent query the user made. This makes it difficult to ensure that the data is up-to-date on all terminals. Of course, for deleted entries I have error handling, but for the rest, well...
So, my question is: how do you guys typically handle this kind of situation? Is there a name for this concept, so that I can look it up and read more?
From a database design perspective, you should read up on optimistic concurrency and pessimistic concurrency. These are two options for making sure that you either don't have two users modifying the same record at the same time, or at least if you do allow that, the conflict is detected so it can be resolved.
The basic idea behind optimistic concurrency is that you allow multiple users to view and modify the data at the same time, on the assumption that this will be relatively rare. However, before any user writes changes to the data, a check is made to ensure that the underlying data hasn't changed since it was originally read. In some cases you do this manually with a read before update, checking each column value against a cached value. However, that is cumbersome. Some DBMS systems have features that make this simpler. For example, SQL Server has the ROWVERSION (formerly known as TIMESTAMP) data type, which lets you check easily using a single value whether someone else has changed a record since the last time you read it.
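A hedged sketch of that ROWVERSION pattern (the Customers table and its columns are invented): read the row together with its ROWVERSION value, then make the UPDATE conditional on it.

```csharp
using System.Data.SqlClient;

public static class OptimisticUpdate
{
    // 'rowVersion' is the byte[] read by the original SELECT,
    // e.g. SELECT Name, RowVer FROM Customers WHERE Id = @id
    // (RowVer being a ROWVERSION column).
    public static bool TryUpdateName(SqlConnection conn, int id, string newName, byte[] rowVersion)
    {
        const string sql = @"
            UPDATE Customers
            SET Name = @name
            WHERE Id = @id AND RowVer = @rv";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@name", newName);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@rv", rowVersion);
            // 0 rows affected: somebody changed the row since we read it,
            // so the application reports a conflict instead of overwriting.
            return cmd.ExecuteNonQuery() == 1;
        }
    }
}
```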
The basic idea behind pessimistic concurrency is that you put a lock on a record in the expectation that you're going to change it. While you hold the lock, the DBMS will prevent anyone else from getting their own lock.
The advantage of optimistic concurrency is that it's pretty lightweight, doesn't interfere too much with your application, and lets you manually (or automatically) resolve any conflicts on the rare occasions when they happen. You also don't have to worry about someone reading a record, locking it, and then going home for the weekend.
The advantage of pessimistic concurrency is that it prevents collisions, but it can stop one user from working while they wait for another to finish what they're doing.
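A SQL Server flavored sketch of the pessimistic variant (the Accounts table is invented): take an update lock when reading inside a transaction, so a second reader-for-update blocks until the first commits.

```csharp
using System.Data.SqlClient;

public static class PessimisticUpdate
{
    public static void AddToBalance(SqlConnection conn, int accountId, decimal amount)
    {
        using (var tx = conn.BeginTransaction())
        {
            // UPDLOCK keeps an update lock on the row until the transaction
            // ends, so concurrent callers queue up instead of colliding.
            decimal balance;
            using (var read = new SqlCommand(
                "SELECT Balance FROM Accounts WITH (UPDLOCK, ROWLOCK) WHERE Id = @id", conn, tx))
            {
                read.Parameters.AddWithValue("@id", accountId);
                balance = (decimal)read.ExecuteScalar();
            }

            using (var write = new SqlCommand(
                "UPDATE Accounts SET Balance = @b WHERE Id = @id", conn, tx))
            {
                write.Parameters.AddWithValue("@b", balance + amount);
                write.Parameters.AddWithValue("@id", accountId);
                write.ExecuteNonQuery();
            }

            tx.Commit();  // the lock is released here
        }
    }
}
```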
As for notifying users when records change in the background (i.e. when they're changed by another user): that isn't a database design feature. It may be a feature of your application logic or of your application's data access layer.
I have a web application where I want to ensure concurrency with a DB-level lock on the object I am trying to update. I want to make sure that a batch change, another user, or another process cannot end up introducing inconsistency into the DB.
I see that isolation levels ensure read consistency, and an optimistic lock with a @Version field can ensure data is written in a consistent state.
My question is: can't we ensure consistency with the isolation level alone? By making any transaction that updates the record Serializable (not considering performance), won't I ensure that a proper lock is taken by the transaction, and that any other transaction trying to update the record or acquire the lock will either wait or fail?
Do I really need version or timestamp management for this?
Depending on the isolation level you've chosen, a specific resource is going to be locked until the given transaction commits or rolls back; it can be a lock on a whole table, a page, or a single row. This is pessimistic locking, and it is enforced at the database level while the transaction runs.
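For instance, a minimal sketch of requesting Serializable isolation (shown with C# and ADO.NET for concreteness; the Records table is made up). While this transaction is open, conflicting statements from other transactions block rather than interleave:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class SerializableDemo
{
    public static void UpdateWithSerializable(SqlConnection conn, int id, string value)
    {
        // Serializable takes (and keeps) row/range locks until commit or rollback.
        using (var tx = conn.BeginTransaction(IsolationLevel.Serializable))
        using (var cmd = new SqlCommand(
            "UPDATE Records SET Value = @v WHERE Id = @id", conn, tx))
        {
            cmd.Parameters.AddWithValue("@v", value);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.ExecuteNonQuery();
            tx.Commit();
        }
    }
}
```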
Optimistic locking, on the other hand, assumes that multiple transactions rarely interfere with each other, so no database locks are required in this approach. It is an application-side check that uses the @Version attribute to establish whether the version of a record has changed between fetching it and attempting to update it.
It is reasonable to use the optimistic locking approach in web applications, as most operations span multiple HTTP requests. Usually you fetch some information from the database in one request and update it in another. It would be very expensive and unwise to keep transactions open, with locks on database resources, for that long. That's why we assume that nobody else is going to touch the data set we're working on; it's cheaper. If the assumption turns out to be wrong and the version has been changed between requests by someone else, Hibernate won't update the row and will throw an OptimisticLockException. As a developer, you are responsible for managing this situation.
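Whatever the ORM, the version check boils down to a guarded UPDATE that also bumps the version. A sketch of what effectively gets executed (illustrated in C# like the other examples here; the Items table and TrySave helper are invented):

```csharp
using System.Data.SqlClient;

public static class VersionedUpdate
{
    // Update only if the version is still the one we fetched earlier,
    // and increment it so the next concurrent writer fails the guard.
    public static bool TrySave(SqlConnection conn, long itemId, string description, int fetchedVersion)
    {
        const string sql = @"
            UPDATE Items
            SET Description = @d, Version = Version + 1
            WHERE Id = @id AND Version = @v";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@d", description);
            cmd.Parameters.AddWithValue("@id", itemId);
            cmd.Parameters.AddWithValue("@v", fetchedVersion);
            // 0 rows: the version moved on; this is where the ORM
            // raises its optimistic-lock exception.
            return cmd.ExecuteNonQuery() == 1;
        }
    }
}
```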
A simple example: an online auction service. You're looking at an item page; you read its description and specification. All of that takes, let's say, 5 minutes. With pessimistic locking and some isolation levels, you'd block other users from this particular item page (or even from all items!). With optimistic locking, everybody can access it. After reading about the item, you decide to bid on it, so you click the proper button. If any other user watching this item changed its state in the meantime (the owner changed the description, somebody else placed a bid), you will probably (depending on the app implementation) be informed about the changes before the application accepts your bid, because the version you've got is no longer the one persisted in the database.
Hope that clarifies a few things for you.
Unless we are talking about some small, isolated web application (the only app working on that database), making all of your transactions Serializable would mean having a lot of confidence in your design, while ignoring the fact that yours may not be the only application hitting that database.
In my opinion, incorporating the Serializable isolation level, or a pessimistic lock in other words, should be a very well-thought-out decision, applied for:
Large databases and short transactions that update only a few rows
Where the chance that two concurrent transactions will modify the same rows is relatively low.
Where relatively long-running transactions are primarily read-only.
Based on my experience, in most cases using just optimistic locking is the most beneficial decision, as concurrent modifications of the same rows happen in only a small percentage of cases.
Optimistic locking definitely also helps other applications run faster (don't think only of yourself!).
So on the spectrum between pessimistic and optimistic locking strategies, in my opinion the truth lies somewhere closer to optimistic locking, with a flavor of Serializable here and there.
I really cannot reference anything here, as the answer is based on my personal experience with many complex web projects and on my notes from when I was preparing for my JPA certificate.
Hope that helps.
I recently came across a case that makes me wonder if I'm a newbie or if something trivial has escaped me.
Suppose I have software that is run by many users and uses a table. When a user logs in to the app, a set of information from the table appears, and he just has to add to or correct some of it and save. Now, if the software is run by many people, how can I guarantee that he is the only one working with that particular record? I mean, how can I know the record is not selected and being worked on by 2 or more users at the same time? And please, I wouldn't like the answer to be "SELECT FOR UPDATE...", because from what I've read it has too negative an impact on the database. Thanks to all of you. Keep up the good work.
This is something that is not solved primarily by the database. The database manages isolation and locking of concurrent transactions. But by the time the records are sent to the client, you usually (and hopefully) have closed the transaction, and you start a new one when the data comes back.
So you have to care yourself.
There are different approaches; the ones that come to my mind are:
optimistic locking strategies (first wins)
pessimistic locking strategies
last wins
Optimistic locking: when storing, you check whether the record has been changed in the meantime. Usually this is done with a version counter or timestamp. Some ORMs and frameworks help a little with implementing this.
Pessimistic locking: build a mechanism that records that someone has started to edit something, and do not allow anyone else to edit the same record. Especially in web projects, it needs a timeout after which the lock is released anyway.
Last wins: the second person storing the record simply overwrites the first person's changes.
... makes me wonder if I'm a newbie ...
That's what always happens when we discover that very common stuff is still not solved by the tools and frameworks we use, and we have to solve it over and over again.
Now, if the software is run by many people, how can I guarantee that he is the only one working with that particular record?
Ah...
And please, I wouldn't like the answer to be "SELECT FOR UPDATE...", because from what I've read it has too negative an impact on the database.
Who cares? I mean, it is the only way (keep a lock on a row) to guarantee you are the only one who can change it. Yes, this limits throughput, but then this is WHAT YOU WANT.
It is called programming: choosing the right tools for the job. In this case the impact is required, because of the requirements.
The alternative, which is not a guarantee at the database level but at the application-server level, is an in-memory or in-database locking mechanism (like a table indicating which objects belong to which user).
But if you need to guarantee that one record is only used by one person at the DB level, then you MUST keep a lock around and deal with the impact.
But seriously, most programs avoid this. They deal with it either with optimistic locking (the second user submitting changes gets an error) or with other programmer-level decisions, BECAUSE the cost of such guarantees is ridiculously high.
Oracle is different from SQL Server.
In Oracle, when you update a record or data set, the old information is still available to readers, because your change stays pending until you commit.
Therefore, whoever is reading the same record will still see the old values.
If the access to this record is a write access, though, it will block on the lock until the first transaction commits; only then will you get access to write the same record.
Whenever two sessions end up waiting on each other's locks, a deadlock will pop up.
SQL Server, by default, doesn't have the ability to read a record that has been locked for writing (unless snapshot isolation or READ_COMMITTED_SNAPSHOT is enabled), so depending on which query you're running, you might end up blocking on, or even locking, an entire table.
First, you can separate read queries from inserts/updates by using a separate data-warehouse database; that alone can relieve the slow updates that cause locks.
The next step is to identify what is causing the locks and work out each case separately.
Rebuilding indexes during working hours can cause very nasty locks; push those jobs to after hours.
We are re-writing an old Cobol application in Java EE.
The old Cobol system is a full client application.
The client's requirement is to lock entities (e.g. a particular account) so that no one else can access them, or can access them only in read-only mode. This is because some transactions might be long, and we don't want users to enter a lot of data just to lose everything when updating.
Optimistic locking is not wanted.
Currently the requirement is implemented by creating locks on a file system, which brings a lot of problems: concurrent access, no transactions. Not very Java EE compliant.
The lock should also record which client is locking the entity.
Any suggestion?
I would suggest using the database itself for locking instead of the file system approach.
In an application, we implemented locking on a per-entity basis by using a special table LOCK, which had the fields ENTITY_TYPE, ENTITY_ID, USER_ID and VALID_TO. Locks timed out after a certain period in which the user did not do anything. This was to prevent entities from remaining locked forever, and thus uneditable by other users, when a client closed the application, lost network connectivity, etc.
Before allowing a user to edit an entity, we checked/created a row in this table. In the UI, the user had a button to lock an entity, or an info box showing the user holding the lock if the entity was already locked.
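A sketch of that check/create step (everything beyond the four fields named above, including the DDL and the TryAcquire helper, is invented; shown in C# like the other examples here, but the SQL is what matters). A primary key on (ENTITY_TYPE, ENTITY_ID) makes concurrent lock attempts safe:

```csharp
using System;
using System.Data.SqlClient;

public static class EntityLocks
{
    // Hypothetical table:
    //   CREATE TABLE LOCK (
    //     ENTITY_TYPE varchar(50) NOT NULL,
    //     ENTITY_ID   bigint      NOT NULL,
    //     USER_ID     varchar(50) NOT NULL,
    //     VALID_TO    datetime2   NOT NULL,
    //     CONSTRAINT PK_LOCK PRIMARY KEY (ENTITY_TYPE, ENTITY_ID)
    //   );
    public static bool TryAcquire(SqlConnection conn, string type, long id, string userId, TimeSpan ttl)
    {
        using (var tx = conn.BeginTransaction())
        {
            // Purge a lock on this entity that has already timed out.
            using (var purge = new SqlCommand(
                "DELETE FROM LOCK WHERE ENTITY_TYPE = @t AND ENTITY_ID = @i AND VALID_TO < SYSUTCDATETIME()",
                conn, tx))
            {
                purge.Parameters.AddWithValue("@t", type);
                purge.Parameters.AddWithValue("@i", id);
                purge.ExecuteNonQuery();
            }

            try
            {
                using (var ins = new SqlCommand(
                    "INSERT INTO LOCK (ENTITY_TYPE, ENTITY_ID, USER_ID, VALID_TO) " +
                    "VALUES (@t, @i, @u, DATEADD(SECOND, @ttl, SYSUTCDATETIME()))", conn, tx))
                {
                    ins.Parameters.AddWithValue("@t", type);
                    ins.Parameters.AddWithValue("@i", id);
                    ins.Parameters.AddWithValue("@u", userId);
                    ins.Parameters.AddWithValue("@ttl", (int)ttl.TotalSeconds);
                    ins.ExecuteNonQuery();
                }
                tx.Commit();
                return true;                // we hold the lock until VALID_TO (or explicit release)
            }
            catch (SqlException)            // primary-key violation: someone else holds the lock
            {
                tx.Rollback();
                return false;               // query USER_ID to show who holds it
            }
        }
    }
}
```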