I am using a DAL service to retrieve data from the database.
Let's look at the simplest case, where I retrieve a single object from the database.
After retrieving that object I make some changes to its properties according to some business logic,
and then I want to update the object in the persistent database.
However, some other client (maybe even one I am not aware exists) has changed the state of the underlying object in the database, and I only discover this when I try to update.
What should I do in this case?
Should I throw an exception?
Should I try to update only the fields that I changed?
Should I lock that table for writing while I am performing business logic based on the persistent data?
Guy
I think what you should do depends on what you are trying to achieve.
Your main options, as I see it:
Lock beforehand - main pros & cons: much simpler, but it occupies the database until you commit.
Don't lock beforehand, and merge if someone else updated it - main disadvantage: merging can be very complex.
I would go with the first one, but I would try to minimize the locking time (i.e. I would figure out all the changes I want to make before locking the object).
In any case, I don't think this is an exceptional case, so I wouldn't go with throwing an exception.
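A minimal T-SQL sketch of the "lock beforehand" option; the Orders table, its columns, and the @... values are hypothetical placeholders (e.g. parameters supplied by your DAL), not anything from your schema:

```sql
BEGIN TRANSACTION;

-- Take an update lock on the row so no other writer can touch it until we
-- commit; plain readers are still allowed.
SELECT @CurrentStatus = Status, @CurrentAmount = Amount
FROM   dbo.Orders WITH (UPDLOCK, ROWLOCK)
WHERE  OrderId = @OrderId;

-- ... apply the business logic to the values just read ...

UPDATE dbo.Orders
SET    Status = @NewStatus,
       Amount = @NewAmount
WHERE  OrderId = @OrderId;

COMMIT TRANSACTION;
```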
This is very subjective and it depends on what exactly you are trying to do.
Should I throw an exception?
You should if you are not expecting the update by another user. For instance, if your software is trying to book a seat that has already been booked by somebody else, you would throw, say, a SeatAlreadyBookedException and handle it appropriately by logging it or showing a proper message.
Should I try to update only the fields that I changed?
You can do that if you have not used the existing state to compute your update, or if you want your changes to be the final ones, overriding any changes already made by other users. For instance, if you want to set a new date for a project's deadline.
Should I lock that table for writing while I am performing business logic based on the persistent data?
Locking a table will affect the overall throughput. Your application logic should take care of these transactions and maintain data integrity.
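For the "throw an exception" route, a common pattern is an explicit version column checked in the WHERE clause. A minimal T-SQL sketch, where the Seats table, its columns and the @... parameters are hypothetical:

```sql
-- The DAL remembered the Version value it read earlier; the update only
-- succeeds if nobody has changed the row in the meantime.
UPDATE dbo.Seats
SET    BookedBy = @UserId,
       Version  = Version + 1
WHERE  SeatId   = @SeatId
  AND  Version  = @VersionReadEarlier;

IF @@ROWCOUNT = 0
    -- The row was changed (or removed) by someone else; surface this to the
    -- caller, e.g. as a SeatAlreadyBookedException in the DAL.
    RAISERROR('Seat was modified by another user.', 16, 1);
```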
Is it possible to know when and if the contents of certain tables in a database have changed?
I'm building a multiuser app, and I want to notify the user if somebody else has modified any relevant data. I'm using an Oracle 10g database and a .NET WinForms app.
Thanks!
One approach is to increment a counter associated with a user each time you make an update to that user's data. Your application can then read that counter from time to time and, if it has increased, know that it needs to refresh itself because something has changed. This is easy to implement.
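A minimal sketch of the counter idea; the change_counters table and the bind variables are placeholders, not part of your existing schema:

```sql
-- Every write that touches a user's data also bumps that user's counter
-- (done in the same transaction as the data change).
UPDATE change_counters
SET    change_count = change_count + 1
WHERE  user_id = :user_id;

-- The WinForms client polls this from time to time and refreshes itself
-- whenever the value is higher than the one it last saw.
SELECT change_count
FROM   change_counters
WHERE  user_id = :user_id;
```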
I am sure Oracle has some callback mechanism that allows it to tell you when an update has occurred. This would be more efficient, but I do not know enough about Oracle to provide more information about that approach.
Let's say there's a Teacher object, and that Teacher holds a list of Course objects.
The courses for the Teacher can change. Some get added, some get deleted.
What's the best approach to update these changes in the database?
1. Right after every change, update the database. E.g., a course gets added; immediately add it to the database as well.
2. After all changes are made to the entity/object Teacher (courses are added, courses are deleted), only then update the database with all the changes.
3. Others ??
I can see for both 1 and 2 advantages and disadvantages.
For 1: I don't know how good it is when data models have direct access to the database.
For 2: The algorithm is more complex because you have to compare the information in the data models with the information in the database all at once.
Thank you
Take a look at some of the ORM tools available for your language or platform. An object's representation of an entity in the database can always get out of sync. For example, a user changes a certain property and a database update is attempted, but for whatever reason that update fails; now the two are no longer synchronized.
It may also depend on the frequency of updates to your database. If you don't expect massive activity in terms of writes to the database, triggering instant database updates on any property change may be an option.
Otherwise, you may include a dirty flag in your object to indicate when it goes out of sync with the database. Object changes could be written as they happen, when an event is triggered such as the user deciding to save their progress, or periodically, say every x minutes.
Different languages and frameworks implement these model objects differently. In Rails, a model object is subclassed from ActiveRecord and knows how to persist itself to a database. In Java, you would rarely mix domain objects with the way they're persisted; that's usually taken care of by an ORM framework or custom DAO objects.
I found out about Hibernate. It's exactly what I need, and it's simple.
I have a really odd user requirement. I have tried to explain to them that there are much better ways of supporting their business process, and they don't want to hear it. I am tempted to walk away, but first I want to see if maybe there is another way.
Is there any way that I can lock a whole database as opposed to row-lock or table-lock. I know I can perhaps put the database into single-user mode but that means only one person can use it at a time. I would like many people to be able to read at a time but only one person to be able to write to it at a time.
They are trying to do some really odd data migration.
What do you want to achieve?
Do you want to make the whole database read-only? You can definitely do that
Do you want to prevent any new clients from connecting to the database? You can definitely do that too
But there's really no concept of a "database lock" in the sense of only ever allowing one person to use the database. At least not in SQL Server, as far as I'm aware. What good would that do you, anyway?
If you want to do data migration out of this database, then setting the database into read-only mode (or creating a snapshot copy of it) will probably be sufficient and the easiest way to go.
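For example, in SQL Server (with "LegacyDb" as a placeholder database name, and a file path you would adjust to your environment), that might look like this:

```sql
-- Everyone can still read, nobody can write:
ALTER DATABASE LegacyDb SET READ_ONLY WITH ROLLBACK IMMEDIATE;

-- ... perform the data migration / extract ...

ALTER DATABASE LegacyDb SET READ_WRITE;

-- Or take a point-in-time snapshot to migrate from, leaving the source untouched
-- (the logical file name must match the source database's data file):
CREATE DATABASE LegacyDb_Snapshot
ON (NAME = LegacyDb_Data, FILENAME = 'C:\Snapshots\LegacyDb_Snapshot.ss')
AS SNAPSHOT OF LegacyDb;
```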
UPDATE: for the scenario you mention (grab the data for people with laptops, and then re-synchronize), you should definitely check out ADO.NET Sync Services - that's exactly what it's made for!
Even if you can't use ADO.NET Sync Services, you should still be able to selectively and intelligently update your central database with the changes from laptops without locking the entire database. SQL Server has several methods to update rows even while the database is in use - there's really no need to completely lock the whole database just to update a few rows!
For instance: you should have a TIMESTAMP (or ROWVERSION) column on each of your data tables, which would easily allow you to see if any changes have occurred at all. If the TIMESTAMP field (which is really just a counter - it has nothing to do with date or time) has not changed, the row has not changed and thus doesn't need to be considered for an update.
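A rough sketch of that, assuming a hypothetical Customers table and a @LastSyncMarker value remembered from the previous sync run:

```sql
-- Add the counter column once; SQL Server bumps it automatically on every change.
ALTER TABLE dbo.Customers ADD RowVer rowversion;

-- During a sync, pick up only the rows that changed since the last run:
SELECT CustomerId, Name, Phone
FROM   dbo.Customers
WHERE  RowVer > @LastSyncMarker;
```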
I am currently playing around with the idea of having history tables for some of my tables in my database. Basically I have the main table and a copy of that table with a modified date and an action column to store what action was performed, e.g., Update, Delete or Insert.
So far I can think of three different places that you can do the history table work.
Triggers on the main table for update, insert and delete. (Database)
Stored procedures. (Database)
Application layer. (Application)
My main question is: what are the pros, cons and gotchas of doing the work in each of these layers?
One advantage I can think of by using the triggers way is that integrity is always maintained no matter what is implemented on top of the database.
I'd put it this way:
Stored procs: they're bypassed if you modify the table directly. Security on the database can control this
Application: same deal. Also if you have multiple applications, possibly in different languages, it needs to be implemented in each stack, which is somewhat redundant; and
Triggers: transparent to the application and will capture all changes. This is my preferred method.
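As an illustration, a minimal T-SQL audit trigger might look like the sketch below; the Customer and Customer_History tables and their columns are placeholders:

```sql
CREATE TRIGGER trg_Customer_History
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Work out which action fired the trigger.
    DECLARE @action varchar(10) =
        CASE
            WHEN EXISTS (SELECT 1 FROM inserted)
             AND EXISTS (SELECT 1 FROM deleted) THEN 'UPDATE'
            WHEN EXISTS (SELECT 1 FROM inserted) THEN 'INSERT'
            ELSE 'DELETE'
        END;

    -- New (or current) values for inserts and updates...
    INSERT INTO dbo.Customer_History (CustomerId, Name, Action, ModifiedDate)
    SELECT CustomerId, Name, @action, GETDATE()
    FROM   inserted
    UNION ALL
    -- ...and the last known values for deletes.
    SELECT CustomerId, Name, @action, GETDATE()
    FROM   deleted
    WHERE  @action = 'DELETE';
END;
```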
Triggers are the quickest and easiest way to achieve simple history. The following information assumes a more complex example where history processing may include some business rules and may require logging information not found in the table being tracked.
To those that think that triggers are safer than sprocs because they cannot be bypassed, I remind them that they are making the following assumptions:
1) Permissions exist that stop users from executing DISABLE TRIGGER [but then permissions could also exist to limit all access to the database except for EXECUTE on sprocs, which is a common pattern for enterprise applications] - therefore one must assume correct permissions, and with correct permissions sprocs equal triggers in terms of security and ability to be bypassed.
2) Depending on the database, it may be possible to execute update statements that do not fire triggers. I could take advantage of knowledge of nested trigger execution depth to bypass a trigger. The only sure solution includes security in the database and limiting access to data through approved mechanisms only - whether these be triggers, sprocs or data access layers.
I think the choices are clear here. If the data is being accessed by multiple applications then you want to control the history from the lowest common layer and this will mean the database.
Following the above logic, the choice of triggers or stored procedures depends again on whether the stored procedure is the lowest common layer. Prefer the sproc over the trigger, as you can control performance and side effects better, and the code is easier to maintain.
Triggers are acceptable, but try to make sure that you do not increase locks by reading data outside of the tables being updated. Limit triggers to inserts into the log tables, log only what you need to.
If the application uses a common logical access layer and it is unlikely that this would change over time, I would prefer to implement the logic there. Use a Chain of Responsibility pattern and a plug-in architecture, driven by Dependency Injection, to allow for all manner of processing in your history module, including logging to completely different types of technology, different databases, a history service, or anything else you could imagine.
Have been using the trigger based approach for years and it has definitely worked well for us, but then you do have the following points to ponder over:
Triggers on a heavily used database (say, a multi-tenant SaaS-based application) can be extremely expensive
In some scenarios, a few fields can become redundant. Triggers are good only when you are crystal clear on the fields to be logged; with an application you could have an interceptor layer that lets you log certain fields based on "configuration", though with its own share of overhead
Without adequate database control, a person could easily disable the triggers, modify the data and enable the triggers; all without raising any alarms
In case of web applications, where the connections are established from a pool, tracking the actual users who made the changes can be tedious. A possible solution would be to have the "EditedBy" field in every transaction table.
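A sketch of that last point, where Invoice, Invoice_History and the @... parameters are placeholders: the application stamps the real user onto the row, and the audit trigger simply carries it along.

```sql
-- The application performs the update and records the logged-in user,
-- since the pooled connection only identifies the shared DB login.
UPDATE dbo.Invoice
SET    Amount   = @NewAmount,
       EditedBy = @ApplicationUserName
WHERE  InvoiceId = @InvoiceId;

-- The audit trigger then just copies the value into the history row:
-- INSERT INTO dbo.Invoice_History (InvoiceId, Amount, EditedBy, ...)
-- SELECT InvoiceId, Amount, EditedBy, ... FROM inserted;
```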
A late one, but it adds a couple more options that can be considered.
Change Data Capture: This feature is available in SQL Server 2008 and later, but only in Enterprise edition. It allows you to select the tables you want to track, and SQL Server does the job for you: it reads the transaction log and populates history tables with the data (a sketch of enabling it follows after this list).
Reading the transaction log: If the database is in full recovery mode, the transaction log can be read and details of almost all transactions can be found.
The downside is that this is not supported out of the box. Options are to read the transaction log using undocumented functions like fn_dblog, or third-party tools such as ApexSQL Log.
Triggers: Work just fine for a small number of tables where there are not too many triggers to manage. If you have a lot of tables you want to audit, then you should consider a third-party tool for this.
All of these work at the database level and are completely transparent to the application.
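The Change Data Capture sketch mentioned above is just two system procedure calls; dbo.Orders is a placeholder table name:

```sql
-- Enable CDC for the database, then for each table you want to track.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;   -- NULL = no gating role; restrict access yourself

-- SQL Server now maintains cdc.dbo_Orders_CT with one row per change,
-- populated asynchronously from the transaction log by a capture job.
```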
Triggers are the only reliable way to capture changes. If you do it in stored procs or the app, you can always go in and SQL away a change that you don't have a log for (inadvertently). Of course, somebody who doesn't want to leave a log can disable triggers. But you'd rather force somebody to disable the logging than hope that they remember to include it.
Usually if you choose the application layer, you can design your app code to do the logging at a single point, which handles all your historical tables consistently. Triggers, by contrast, are a more complicated approach to maintain because (depending on the database technology) they are replicated for every table: with hundreds of tables, the amount of trigger code can become a problem.
If you have a support organization that will maintain the code you are writing now, and you don't know who will maintain your code (typical in big industries), you cannot assume the skill level of the person who will fix your application. In that case, in my opinion, it is better to keep the working principle of the historical tables as simple as possible, and the application layer is probably the best place for that.
What are some strategies that people have had success with for maintaining a change history for data in a fairly complex database? One of the applications that I frequently use and develop for could really benefit from a more comprehensive way of tracking how records have changed over time. For instance, right now records can have a number of timestamp and modified-user fields, but we currently don't have a scheme for logging multiple changes, for instance if an operation is rolled back. In a perfect world, it would be possible to reconstruct the record as it was after each save, etc.
Some info on the DB:
Needs to have the capacity to grow by thousands of records per week
50-60 Tables
Main revisioned tables may have several million records each
Reasonable amount of foreign keys and indexes set
Using PostgreSQL 8.x
One strategy you could use is MVCC, Multi-Version Concurrency Control. In this scheme, you never update any of your tables, you only insert, maintaining version numbers for each record. This has the advantage of providing an exact snapshot from any point in time, and it also completely sidesteps the update lock problems that plague many databases.
But it makes for a huge database, and all selects require an extra clause to pick the current version of a record.
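A minimal PostgreSQL-flavoured sketch of the idea, using a hypothetical document table:

```sql
CREATE TABLE document (
    doc_id     integer   NOT NULL,
    version    integer   NOT NULL,
    body       text,
    changed_at timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY (doc_id, version)
);

-- An "update" is really an insert of the next version number:
INSERT INTO document (doc_id, version, body)
SELECT doc_id, max(version) + 1, 'new body'
FROM   document
WHERE  doc_id = 42
GROUP  BY doc_id;

-- The extra clause every read needs, to see only the current versions:
SELECT DISTINCT ON (doc_id) *
FROM   document
ORDER  BY doc_id, version DESC;
```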
If you are using Hibernate, take a look at JBoss Envers. From the project homepage:
The Envers project aims to enable easy versioning of persistent JPA classes. All that you have to do is annotate your persistent class, or some of its properties that you want to version, with @Versioned. For each versioned entity, a table will be created which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.
This is somewhat similar to Eric's approach, but probably much less effort. I don't know what language/technology you use to access the database, though.
In the past I have used triggers to construct db update/insert/delete logging.
You could insert a record into a logging table each time one of the above actions is performed on a specific table, keeping track of the action, which db user did it, the timestamp, the table it was performed on, and the previous value.
There is probably a better answer, though, as this would require you to cache the value before the actual delete or update is performed, I think. But you could use this to do rollbacks.
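A rough PostgreSQL sketch of that, using a hypothetical widget table; inside the trigger the OLD record already gives you the previous values, so no separate caching is needed:

```sql
-- History table: same columns as the tracked table plus audit metadata.
CREATE TABLE widget_history (
    LIKE widget,
    changed_at  timestamp NOT NULL,
    changed_by  text      NOT NULL,
    action      text      NOT NULL   -- 'UPDATE' or 'DELETE'
);

CREATE OR REPLACE FUNCTION log_widget_change() RETURNS trigger AS $$
BEGIN
    -- OLD holds the row as it was before the update or delete.
    INSERT INTO widget_history
    SELECT OLD.*, now(), current_user, TG_OP;
    RETURN NULL;   -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_audit
    AFTER UPDATE OR DELETE ON widget
    FOR EACH ROW EXECUTE PROCEDURE log_widget_change();
```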
The only problem with using triggers is that they add performance overhead to every insert/update/delete. For higher scalability and performance, you want to keep database transactions to a minimum. Auditing via triggers increases the time required for each transaction and, depending on the volume, may cause performance issues.
Another way is to explore whether the database provides a way of mining the "redo" logs, as is the case in Oracle. Redo logs are what the database uses to recreate the data if it fails and has to recover.
Similar to a trigger (or even together with one), you can have every transaction fire a logging event asynchronously and have another process (or just a thread) actually handle the logging. There are many ways to implement this depending on your application. I suggest having the application fire the event so that it does not add unnecessary load to the original transaction (which sometimes leads to locks from cascading audit logs).
In addition, you may be able to improve performance to the primary database by keeping the audit database in a separate location.
I use SQL Server, not PostgreSQL, so I'm not sure if this will work for you or not, but Pop Rivett had a great article on creating an audit trail here:
Pop rivett's SQL Server FAQ No.5: Pop on the Audit Trail
Build an audit table, then create a trigger for each table you want to audit.
Hint: use Codesmith to build your triggers.