I'm working on a viewer program that formats the contents of a database. So far it's all been read-only, and I have a Refresh button that re-queries the database if I want to make sure to use current data.
Now I'm looking at changing the viewer to an editor (read-write) which involves writing back to the database, and am realizing there are potential concurrency issues: if more than one user is working on the database then there are the possibilities of stale data & other concurrency bugaboos.
What I'm wondering is, what are appropriate design patterns both for the database and the application UI to avoid concurrency problems?
To be bulletproof, I could force the user to use an explicit transaction (e.g. it's in read-only mode most of the time, then they have to push an Edit button to start a transaction, then Commit and Revert buttons to commit or revert the transaction) but that seems clunky and wouldn't work well with large sets of changes (Edit, then 1 hour's worth of changes yields an overly large transaction and may prevent other people from making changes). Also it would suck if someone's making a bunch of changes and then it fails -- then what should they do to avoid losing that work?
It seems like I'd want to notify the user when the relevant data is being changed, so that the granularity of changes is small and they get cued to refresh from the database & get in the habit of doing so.
Also, if there are updates, should I automatically bring them into the application display? (assuming they don't clobber what the user is working on) Or should the user be forced to explicitly refresh?
A great example, which is sort of close to the situation I'm working on, is filesystem explorers (e.g. Windows Explorer) which show a hierarchy of folders/directories and a list of objects within them. Windows Explorer lets you refresh, but there's also some notification from the filesystem to the Explorer window, so that if a new file is created, it will just appear in the viewport without you having to hit F5 to refresh.
I found these StackOverflow posts, but they're not exactly the same question:
Web services and database concurrency
Distributed Concurrency Control
C# Database Application
Only display one record for editing at a time.
Submit new values conditionally, after applying whatever domain-specific validation is appropriate. If the record has changed in the meantime (most DAL-type software will throw an exception so you don't need to check manually), display the current (changed) values, advise the user, and accept another try (or abandon). You may want to indicate the source and timestamp of the change you are displaying.
That's the simplest reliable standard pattern I know of. Trying to induce the user to explicitly choose "Display" vs. "Edit" mode is problematic. It locks the record for some indeterminate amount of time, and it's not always reliable that you know when the user (for instance) gives up, turns off their computer, and goes home.
If you have a case where you have a parent record with editable child records (e.g. the line items on a purchase order), it gets more complex but let's worry about that later? There are patterns for those too.
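As a rough illustration of the conditional submit described above, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, the version counter and the changed_by / changed_at columns are assumptions made up for this example; a DAL would normally do the equivalent check for you and raise an exception instead of returning a flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "version" is bumped on every successful save.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, phone TEXT, "
    "version INTEGER NOT NULL, changed_by TEXT, changed_at TEXT)"
)
conn.execute(
    "INSERT INTO customer VALUES (1, '555-1234', 1, 'import', '2024-01-01 09:00')"
)


def save_phone(conn, cust_id, new_phone, version_read, user, now):
    """Try a conditional save; return (True, None) or (False, current_row)."""
    cur = conn.execute(
        "UPDATE customer SET phone = ?, version = version + 1, "
        "changed_by = ?, changed_at = ? WHERE id = ? AND version = ?",
        (new_phone, user, now, cust_id, version_read),
    )
    conn.commit()
    if cur.rowcount == 1:
        return True, None
    # Someone else saved first: hand back the current values so the UI can
    # display them, including who changed them and when.
    row = conn.execute(
        "SELECT phone, version, changed_by, changed_at FROM customer WHERE id = ?",
        (cust_id,),
    ).fetchone()
    return False, row


# Users A and B both loaded the record at version 1.
ok_a, _ = save_phone(conn, 1, "555-1235", 1, "alice", "2024-01-02 10:00")
ok_b, current = save_phone(conn, 1, "555-1243", 1, "bob", "2024-01-02 10:05")
```

Here the first save succeeds, and the second one is rejected and gets back alice's values, which is the point where the UI would advise the user and accept another try (or abandon).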
A good working approach I use:
Don't open a transaction until you are really applying changes to the database (after the user presses the Save button).
You don't even need to refresh the record before opening the user's edit dialog.
But just before applying changes, check in your app code whether the record has been changed by another user.
This is done through a SELECT statement just before the UPDATE statement.
If a record with the old field values (in the DataSet) no longer exists in the database, alert the user that 'record is changed by another user'; the user must close the dialog, refresh the record and start editing again.
Otherwise, open the transaction and do the rest.
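A minimal sketch of that check, using Python's sqlite3 module with a made-up product table. One deliberate variation from the steps above: the SELECT runs inside the same short transaction as the UPDATE, so nobody can slip a change in between the check and the write. The transaction still only opens after the user presses Save.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Widget', 9.99)")


def save_if_unchanged(conn, prod_id, old, new):
    """old and new are (name, price) tuples; old is what the edit dialog loaded."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        unchanged = conn.execute(
            "SELECT 1 FROM product WHERE id = ? AND name = ? AND price = ?",
            (prod_id, *old),
        ).fetchone()
        if unchanged is None:
            conn.execute("ROLLBACK")
            return False  # caller alerts: 'record is changed by another user'
        conn.execute(
            "UPDATE product SET name = ?, price = ? WHERE id = ?",
            (*new, prod_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Two dialogs were opened on the same original values.
first = save_if_unchanged(conn, 1, ("Widget", 9.99), ("Widget", 10.49))
second = save_if_unchanged(conn, 1, ("Widget", 9.99), ("Widget", 8.99))
final = conn.execute("SELECT name, price FROM product WHERE id = 1").fetchone()
```

Note that comparing with = will never match NULL columns, so nullable fields would need IS comparisons (or a version column) in a real schema.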
Optimistic locking works fine for most cases where your records are composed of short simple fields (e.g., a short string or single numeric value per field), giving users the greatest access to the data, and not forcing them to worry about locks and stuff. Apply a write lock only when actually in the process of saving a record. No records are locked while anyone is merely editing. If the app finds a record it's trying to save is already locked, then the app simply retries a short time (<500 ms) later. There’s no need to alert the user (other than maybe hourglass/pointer feedback if the wait lasts longer than 500 ms) since no lock is ever in place long enough to matter to the user. When User A saves a record, the database only updates the fields that User A has changed (along with any other fields that depend on those changed values). This avoids overwriting, with old values, the fields changed by User B since User A retrieved the record.
The implicit assumption is that whoever edits a field of a record last has the final say, which is not an unreasonable way of doing business. For example, User A retrieves a record and edits a field, then User B retrieves a record and edits the same field. User B saves, then User A saves. User A’s changes overwrite User B’s. User B’s work was “a waste,” but that sort of thing is going to happen anyway when users share data. Locks can only prevent wasted work when users happen to try to edit the same record in the same thin slice of time. However, the more likely event is that User B edits the record’s field and saves, then User A edits the field and saves, again wasting User B’s work. There’s nothing you can do with locks to prevent that. If there’s really a high chance of wasted work by user interactions, it’s better to prevent it through the design of the business process rather than database locks.
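To make the "only update the fields that changed" idea concrete, here is a small sketch in Python with sqlite3. The contact table and the EDITABLE whitelist are invented for the example; the app compares what it loaded with what the user ended up with and writes only the differences.

```python
import sqlite3

# Whitelist of editable columns; column names never come from user input.
EDITABLE = ("name", "phone", "email")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT)"
)
conn.execute("INSERT INTO contact VALUES (1, 'Ann', '555-1234', 'ann@example.com')")


def load(conn, cid):
    row = conn.execute(
        "SELECT name, phone, email FROM contact WHERE id = ?", (cid,)
    ).fetchone()
    return dict(zip(EDITABLE, row))


def save_changed_fields(conn, cid, loaded, edited):
    """Write only the columns whose value differs from what was loaded."""
    changed = {c: edited[c] for c in EDITABLE if edited[c] != loaded[c]}
    if not changed:
        return []
    assignments = ", ".join(f"{col} = ?" for col in changed)
    conn.execute(
        f"UPDATE contact SET {assignments} WHERE id = ?",
        (*changed.values(), cid),
    )
    conn.commit()
    return list(changed)


# A and B load the same record, then edit different fields.
a_loaded, b_loaded = load(conn, 1), load(conn, 1)
b_edited = dict(b_loaded, email="ann@new.example")
a_edited = dict(a_loaded, phone="555-9999")
save_changed_fields(conn, 1, b_loaded, b_edited)  # B saves first
save_changed_fields(conn, 1, a_loaded, a_edited)  # A's stale email is not written
result = load(conn, 1)
```

Both edits survive because A's save never touches the email column. If both had edited the same field, the last save would still win, as described above.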
As for the UI, there are two server styles I recommend: (1) Real Time, and (2) Transactional.
In Real Time style, the users’ displays automatically correspond as closely as practical to what’s in the database. Refreshes are automatic, either based on a short period (e.g., every five seconds) or “pushed” to the user when changes are made by others. When the user enters a field and makes an edit, the app suppresses refreshes for that field, but continues to refresh other fields and records. There is no Save button or menu item. The app saves a record anytime a user edits a field and then leaves it or hits Enter. When the user starts to edit a field, the app changes the field's appearance to indicate that things are tentative (e.g., changing the border around the field to a dashed line) in order to encourage the user to hit Enter or Tab when done.
In Transactional, the users' displays are presented as a snapshot of what's in the database. The user must explicitly save and manually refresh data with buttons or menu items, except the app should automatically refresh a record when the user starts to edit it or after the user saves it. The user can edit any number of fields or records before saving. However, you can encourage frequent saves by changing the appearance of edited fields to indicate their tentative state, like recommended for Real Time. You can also display a timestamp or other indication of the last refresh to encourage users to refresh frequently.
Generally, Real Time is preferred. Users don’t have to worry about stale data or losing a lot of work by forgetting to save. However, use Transactional if it is necessary to maintain sufficient database performance. You probably don’t want Real Time if updating a field typically takes more than 1.0 second for server response. You should also consider Transactional if users’ edits trigger major events that are difficult to reverse or can produce wasted work (e.g., changing a budget value triggers notice to superior for approval). An explicit Save command is good for saying, “Okay, I’ve checked my work, let ‘er rip.”
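For the Real Time style, the "suppress refreshes for the field being edited" rule could look something like the sketch below. It is plain Python with no UI toolkit; apply_refresh and the field names are hypothetical, and a real app would call something like it from its polling timer or push handler.

```python
def apply_refresh(displayed, incoming, being_edited):
    """Return the new display values after a refresh arrives.

    displayed / incoming: dicts mapping field name -> value
    being_edited: set of field names the user is currently editing
    """
    merged = {}
    for field, shown in displayed.items():
        if field in being_edited:
            merged[field] = shown  # keep the user's tentative input
        else:
            merged[field] = incoming.get(field, shown)
    return merged


# The user is typing in "note" when a refresh arrives.
displayed = {"qty": "12", "price": "4.50", "note": "call back Tue"}
incoming = {"qty": "11", "price": "4.75", "note": "left voicemail"}
merged = apply_refresh(displayed, incoming, being_edited={"note"})
```

The other fields pick up the new values, while the note the user is typing stays untouched until they leave the field and it gets saved.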
A client wants to build a worksheet-like application to show data from a database (presumably on a TDbGrid or similar), allowing free search and editing of all cells, as you would do in a worksheet. The underlying table will have more than 100k rows.
The problem with using TClientDataset is that it tends to load all data into memory, violating the user's requirements, which are these three:
The user will be able to navigate from the first to the last record at any moment using the scroll bar, keyboard, or a search filter (note that TClientDataset will load all records if you go to the last record, AFAIK...).
The connection will be through an external VPN / internet (possibly slow), so only the records actually visible on screen should be loaded. Never all.
Edits must be kept inside a transaction, so they can be committed or rolled back at the end, with reconciliation if necessary.
Is it possible to accomplish these 3 points using TClientDataset?
If not, what are the alternatives?
I'm answering only your last line, regarding alternatives. Here are some suggestions:
1- You can use some creativity: provide pagination and fetch, let's say, 100 rows per page using a thread, with a nice progress bar in the UI. With this method you must manage search and filters through some smart queries, reloading data sometimes, etc...
2- Use third-party components that are optimized for this purpose, like SDAC + EhLib Dbgrid.
SDAC is a dataset that can be useful for cache updates, and EhDBGrid has a MemTable component inside it which is very powerful: free search, fuzzy match or approximate search work nicely, and it is possible to revert, undo and redo, etc...
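To give an idea of the "smart queries" needed for suggestion 1, here is a language-neutral sketch of page-by-page fetching. It is written in Python with sqlite3 rather than Delphi purely so it is self-contained; the item table and the function names are made up, and the same SQL shape should carry over to whatever dataset component you end up using.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 1001)],
)


def fetch_page(conn, after_id=0, page_size=100):
    """Return up to page_size rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


def fetch_last_page(conn, page_size=100):
    """Jump to the end without reading the rows in between."""
    rows = conn.execute(
        "SELECT id, name FROM item ORDER BY id DESC LIMIT ?", (page_size,)
    ).fetchall()
    return rows[::-1]  # back to ascending order for display


first = fetch_page(conn)
second = fetch_page(conn, after_id=first[-1][0])
last = fetch_last_page(conn)
```

Seeking by key (id > ?) instead of using OFFSET keeps each fetch cheap on a big table, and going to the last record only transfers one page over the slow link.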
I need a solution for a special scenario:
Each user has many (millions to billions of) database rows for his products. Each row is a product.
Each user can only change his own products.
Each subset of those products can be changed differently.
Each user can change different values (price, amount, ...) by adding or subtracting a fixed, entered value.
Each user can also change those values by adding or subtracting a percentage (add 3% to all selected values, or to a subset of all products).
A user can apply any number of such changes before saving them.
In addition, users need to be able to roll back their changes to the initial state, or to a state they defined at some earlier point, so they can choose which state to restore (like multi-level undo/redo, with each state stored with a timestamp).
If a user has defined 12 (or any other number of) states and rolls them back in reverse order, then all values must be restored to their initial state.
Given the massive amount of data for each user, it is not really practical to store every change to each product.
It will be used in a web based application written in PHP, Javascript and MySQL.
Is there any possibility (database functionality, another database, API, ...) to realize that?
Maybe something like the command pattern, in a different way?
I hope someone has an idea how I can realize that.
I have a situation where some information is valid only for a limited period of time.
One example is conversion rates stored in DB with validFrom and ValidTo timestamps.
Imagine the situation when user starts the process and I render a pre-receipt for him with one conversion rate, but when he finally hits the button other rate is already valid.
Some solutions I see for now:
Show user a message about new rate, render updated pre-receipt and ask him to submit form again.
Have overlapping validity periods for rates, so that transactions started with one rate can finish with it, while new ones start with the new rate.
While the 1st solution seems most logical, I've never seen such messages on websites. I wonder are there other solutions and what is the best practice.
So this is a question best posed to the product owner of your application. If I were wearing my product owner hat, I would want that the data being displayed never be out of sync, such that option (2) above never occurs. This is to make sure the display is fair in all respects.
Ways to handle this:
As you say: display an alert that something changed and allow a refresh.
Handle updates to the data tables using DHTML/AJAX updates so that the data is usually fresh.
To summarize: it's a business decision, but generally speaking it's a bad choice to show unfair and/or out-of-date data on a page.
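For the "alert and refresh" route, the server-side check at submit time might look roughly like this. It is a sketch in Python with sqlite3; the rate table layout (valid_from / valid_to as sortable text timestamps), the function names and the returned status strings are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rate (id INTEGER PRIMARY KEY, value REAL, "
    "valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO rate VALUES (?, ?, ?, ?)",
    [
        (1, 1.10, "2024-05-01 00:00", "2024-05-01 12:00"),
        (2, 1.12, "2024-05-01 12:00", "2024-05-02 00:00"),
    ],
)


def current_rate(conn, now):
    """Return (id, value) of the rate valid at 'now', or None."""
    return conn.execute(
        "SELECT id, value FROM rate WHERE valid_from <= ? AND ? < valid_to",
        (now, now),
    ).fetchone()


def submit(conn, quoted_rate_id, now):
    """Accept the form only if the rate shown on the pre-receipt is still valid."""
    rate = current_rate(conn, now)
    if rate is None:
        return {"status": "no_rate"}
    rate_id, value = rate
    if rate_id != quoted_rate_id:
        # The page would re-render the pre-receipt with this rate and ask again.
        return {"status": "rate_changed", "new_rate": value}
    return {"status": "accepted", "rate": value}


quote = current_rate(conn, "2024-05-01 11:58")  # rendered on the pre-receipt
on_time = submit(conn, quote[0], "2024-05-01 11:59")
too_late = submit(conn, quote[0], "2024-05-01 12:01")
```

The id of the quoted rate travels with the form, so the server can tell a stale quote from a current one without trusting any value sent by the client.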
Here are two potential workflows I would like to perform in a web application.
Variation 1
user sends request
server reads data
server modifies data
server saves modified data
Variation 2:
user sends request
server reads data
server sends data to user
user sends request with modifications
server saves modified data
In each of these cases, I am wondering: what are the standard approaches to ensuring that concurrent access to this service will produce sane results? (i.e. nobody's edit gets clobbered, values correspond to some ordering of the edits, etc.)
The situation is hypothetical, but here are some details of where I would likely need to deal with this in practice:
web application, but language unspecified
potentially, using a web framework
data store is a SQL relational database
the logic involved is too complex to express well in a query e.g. value = value + 1
I feel like I would prefer not to try and reinvent the wheel here. Surely these are well known problems with well known solutions. Please advise.
Thanks.
To the best of my knowledge, there is no general solution to the problem.
The root of the problem is that the user may retrieve data and stare at it on the screen for a long time before making an update and saving.
I know of three basic approaches:
When the user reads the database, lock the record, and don't release until the user saves any updates. In practice, this is wildly impractical. What if the user brings up a screen and then goes to lunch without saving? Or goes home for the day? Or is so frustrated trying to update this stupid record that he quits and never comes back?
Express your updates as deltas rather than destinations. To take the classic example, suppose you have a system that records stock in inventory. Every time there is a sale, you must subtract 1 (or more) from the inventory count.
So say the present quantity on hand is 10. User A creates a sale. Current quantity = 10. User B creates a sale. He also gets current quantity = 10. User A enters that two units are sold. New quantity = 10 - 2 = 8. Save. User B enters one unit sold. New quantity = 10 (the value he loaded) - 1 = 9. Save. Clearly, something went wrong.
Solution: Instead of writing "update inventory set quantity=9 where itemid=12345", write "update inventory set quantity=quantity-1 where itemid=12345". Then let the database queue the updates. This is very different from strategy #1, as the database only has to lock the record long enough to read it, make the update, and write it. It doesn't have to wait while someone stares at the screen.
Of course, this is only useable for changes that can be expressed as a delta. If you are, say, updating the customer's phone number, it's not going to work. (Like, old number is 555-1234. User A says to change it to 555-1235. That's a change of +1. User B says to change it to 555-1243. That's a change of +9. So total change is +10, the customer's new number is 555-1244. :-) ) But in cases like that, "last user to click the enter key wins" is probably the best you can do anyway.
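The inventory example can be replayed in a few lines to see the difference. This is a sketch using Python's sqlite3 module, with the same table and item id as the statements above; both "users" share a single connection here, which is enough to show the arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (itemid INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES (12345, 10)")


def on_hand(conn):
    return conn.execute(
        "SELECT quantity FROM inventory WHERE itemid = 12345"
    ).fetchone()[0]


# Destination-style writes: both users computed from the 10 they loaded.
loaded_a = loaded_b = on_hand(conn)
conn.execute("UPDATE inventory SET quantity = ? WHERE itemid = 12345", (loaded_a - 2,))
conn.execute("UPDATE inventory SET quantity = ? WHERE itemid = 12345", (loaded_b - 1,))
absolute_result = on_hand(conn)  # user A's sale of 2 units has been lost

# Reset, then express the same two sales as deltas.
conn.execute("UPDATE inventory SET quantity = 10 WHERE itemid = 12345")
conn.execute("UPDATE inventory SET quantity = quantity - ? WHERE itemid = 12345", (2,))
conn.execute("UPDATE inventory SET quantity = quantity - ? WHERE itemid = 12345", (1,))
delta_result = on_hand(conn)
```

The first run ends at 9, the second at 7, which is what three units sold out of ten should leave.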
On update, check that relevant fields in the database match your "from" value. For example, say you work for a law firm negotiating contracts for your clients. You have a screen where a user can enter notes about negotiations. User A brings up a contract record. User B brings up the same contract record. User A enters that he just spoke to the other party on the phone and they are agreeable to the proposed terms. User B, who has also been trying to call the other party, enters that they are not responding to phone calls and he suspects they are stonewalling. User A clicks save. Do we want user B's comments to overwrite user A's? Probably not. Instead we display a message indicating that the notes have been changed since he read the record, and allowing him to see the new value before deciding whether to proceed with the save, abort, or enter something different.
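A sketch of that "from" value check for the notes example, again in Python with sqlite3 and a made-up contract table. The function name and return convention are hypothetical: it returns None when the save went through, or the notes currently in the database when they no longer match what the user loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract (id INTEGER PRIMARY KEY, notes TEXT)")
conn.execute("INSERT INTO contract VALUES (1, 'Draft sent.')")


def save_notes(conn, contract_id, notes_as_loaded, new_notes):
    """Save only if the notes still equal the 'from' value the user loaded."""
    cur = conn.execute(
        "UPDATE contract SET notes = ? WHERE id = ? AND notes = ?",
        (new_notes, contract_id, notes_as_loaded),
    )
    conn.commit()
    if cur.rowcount == 1:
        return None  # saved
    # Conflict: return what is there now so the user can read it first.
    return conn.execute(
        "SELECT notes FROM contract WHERE id = ?", (contract_id,)
    ).fetchone()[0]


loaded = "Draft sent."  # both users loaded this
conflict_a = save_notes(conn, 1, loaded, "Other party agrees to the terms.")
conflict_b = save_notes(conn, 1, loaded, "No response; possibly stonewalling.")

# User B has now seen A's note and decides to proceed with a combined entry,
# using the value just shown to him as the new "from" value.
retry_b = save_notes(conn, 1, conflict_b, conflict_b + " (B: calls unanswered.)")
```

Whether B proceeds, aborts, or enters something different is up to him; the retry just has to carry the value he was shown as its new "from" value.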
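If you do not have transactions in MySQL, you can use the UPDATE command itself to ensure that the data is not corrupted.
UPDATE tableA SET status=2 WHERE status = 1
If status is 1, then only one process will get the result that a record was updated. The code below returns the number of rows updated: 0 if the update matched no rows (another process got there first), or -1 if the statement failed with an exception.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// s is the UPDATE statement shown above.
static int executeUpdate(Connection connection, String s)
{
    // try-with-resources closes the statement even if executeUpdate throws
    try (PreparedStatement query = connection.prepareStatement(s))
    {
        return query.executeUpdate();
    }
    catch (SQLException e)
    {
        e.printStackTrace();
        return -1;
    }
}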
Things are simple in the application layer - every request is served by a different thread (or process), so unless you have state in your processing classes (services), everything is safe.
Things get more complicated when you reach the database - i.e. where the state is held. There you need transactions to ensure that everything is ok.
Transactions have a set of properties - ACID, that "guarantee database transactions are processed reliably".
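As a small illustration of the "A" in ACID (atomicity), here is a sketch with Python's sqlite3 module. The account table and the transfer function are invented for the example; the CHECK constraint is only there to make the second statement fail on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account").fetchall())


ok = transfer(conn, 1, 2, 30)
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)  # the debit would make the balance negative
after_failed = balances(conn)
```

In the failed transfer the credit to account 2 had already been executed, but the rollback undoes it, so the balances are the same as before the attempt.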
I am writing an application where I have some publicly available information in a database which I want the users to be able to edit. The information is not textual like a wiki but is similar in concept because the edits bring the public information increasingly closer to the truth. The changes will affect multiple tables and the update needs to be automatically checked before affecting the public tables.
I'm working on the design and I'm wondering if there are any best practices that might help with some particular issues.
I want to provide undo capability.
I want to show the user the combined result of all their changes.
When the user says they're done, I need to check the underlying public data to make sure it hasn't been changed by somebody else.
My current plan is to have the user work in a set of tables set up as a private working area. Once they're ready, they can kick off a process to check everything and update the public tables. Undo can be recorded using the Command pattern, saving to a table.
Are there any techniques I might have missed or useful papers or patterns?
Thanks in advance!
I would do it like this:
Use an insert-only database: you never update data, you only add new rows
Each row has a valid-from date, a valid-to date and who made the change
Read the data through a query where the valid-to date is null and the row is approved; this gives the most recent row
When a user adds data, he can see his changes by selecting the last row that he added
When the user is happy with the changes he has made he can mark them as approved
When they are approved they can be seen by other users
Undo is not a problem since you have all the previous versions, you can mark a row as no longer being approved and revert to a previous version.
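One possible shape for this, sketched in Python with sqlite3. The fact_version table, its column names and the helper functions are all assumptions; in particular, closing the previous version when a new one is approved is my reading of how the valid-to date would be maintained. The data values themselves are never overwritten, only the approval flag and valid-to date change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_version (row_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "fact_id INTEGER, value TEXT, changed_by TEXT, valid_from TEXT, "
    "valid_to TEXT, approved INTEGER DEFAULT 0)"
)


def public_value(conn, fact_id):
    """What other users see: the open (valid_to IS NULL), approved row."""
    row = conn.execute(
        "SELECT value FROM fact_version "
        "WHERE fact_id = ? AND valid_to IS NULL AND approved = 1",
        (fact_id,),
    ).fetchone()
    return row[0] if row else None


def propose(conn, fact_id, value, user, now):
    """Add a new, not yet approved version; returns its row_id."""
    cur = conn.execute(
        "INSERT INTO fact_version (fact_id, value, changed_by, valid_from) "
        "VALUES (?, ?, ?, ?)",
        (fact_id, value, user, now),
    )
    return cur.lastrowid


def approve(conn, row_id, now):
    fact_id = conn.execute(
        "SELECT fact_id FROM fact_version WHERE row_id = ?", (row_id,)
    ).fetchone()[0]
    # Close the version that was public until now, then publish the new one.
    conn.execute(
        "UPDATE fact_version SET valid_to = ? "
        "WHERE fact_id = ? AND approved = 1 AND valid_to IS NULL",
        (now, fact_id),
    )
    conn.execute("UPDATE fact_version SET approved = 1 WHERE row_id = ?", (row_id,))


def undo(conn, row_id):
    """Withdraw an approved row and reopen the approved version before it."""
    fact_id = conn.execute(
        "SELECT fact_id FROM fact_version WHERE row_id = ?", (row_id,)
    ).fetchone()[0]
    conn.execute("UPDATE fact_version SET approved = 0 WHERE row_id = ?", (row_id,))
    conn.execute(
        "UPDATE fact_version SET valid_to = NULL WHERE row_id = ("
        "SELECT MAX(row_id) FROM fact_version "
        "WHERE fact_id = ? AND approved = 1 AND row_id < ?)",
        (fact_id, row_id),
    )


v1 = propose(conn, 7, "population 1200", "ann", "2024-01-01")
approve(conn, v1, "2024-01-01")
v2 = propose(conn, 7, "population 1250", "bob", "2024-02-01")
before = public_value(conn, 7)  # bob's row is not approved yet
approve(conn, v2, "2024-02-02")
after = public_value(conn, 7)
undo(conn, v2)
reverted = public_value(conn, 7)
```

The user's own view of pending work would be a similar query filtered on changed_by instead of approved.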