What are the scenarios in which I should use timestamps? - database

What are the scenarios in which I should use timestamps? I know what a timestamp is, but I'd like to hear how it's helped folks in real-life projects.

Like @Paul McCowat, we also used to use a timestamp for concurrency handling long ago. Since switching to NHibernate (an ORM), the trend has been to use a simpler version number rather than a timestamp. The Rails framework uses version numbers instead of timestamps for concurrency as well. We've removed the timestamps from our database structure as we've migrated to newer ORMs.

Before I used an ORM (LINQ to SQL or Entity Framework in my case), I read an interesting article by Imar Spaanjaars about n-tier design in ASP.NET using ADO.NET. In that example, concurrency was handled by reading the timestamp column when a record was retrieved (into, say, a GridView) and checking it again on the database before an edit was applied. By comparing the timestamp in memory with the one in the database, you can tell whether another change was made between retrieving the record and saving the edit, and report the conflict back to the user.
So in summary, I have used the timestamp column to handle concurrency.
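To make that check concrete, here is a minimal T-SQL sketch of optimistic concurrency with a rowversion (timestamp) column. The table and column names (Products, RowVer) and the @-parameters are invented for the example; think of steps 2 and 3 as the body of a save procedure.

    -- Hypothetical table with a rowversion column for concurrency checks
    CREATE TABLE dbo.Products
    (
        ProductId INT IDENTITY PRIMARY KEY,
        Name      NVARCHAR(100)  NOT NULL,
        Price     DECIMAL(10, 2) NOT NULL,
        RowVer    ROWVERSION                -- updated automatically on every write
    );

    -- 1) Read the row and keep the RowVer value in memory (e.g. behind the grid row)
    SELECT ProductId, Name, Price, RowVer
    FROM dbo.Products
    WHERE ProductId = @ProductId;

    -- 2) On save, only update if nobody changed the row in the meantime
    UPDATE dbo.Products
    SET Name = @Name, Price = @Price
    WHERE ProductId = @ProductId
      AND RowVer = @OriginalRowVer;

    -- 3) If no row was touched, someone else modified (or deleted) it first,
    --    so report a concurrency conflict instead of silently overwriting.
    IF @@ROWCOUNT = 0
        RAISERROR('The record was changed by another user.', 16, 1);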

Related

How to store CQRS Read Models in SQL Server Table?

I'm looking into storing CQRS read models in SQL Server tables due to legacy system concerns (see approaches 2 & 3 of this question).
While I'd like to implement the read models using a document database such as MongoDB, due to outside systems that can't be reworked at this time, I'm stuck with keeping everything in the RDBMS for now.
Since I'm looking at storing records in a properly denormalized way, what's the best way to actually store them when dealing with typical hierarchical data, such as the usual Customer / Order / LineItems etc., that must all be displayed in the same view? [EDIT: What I'm thinking is that I put the data needed to query the model in separate fields, but store the full object in an "object data" field alongside them.]
Due to my legacy systems (mostly out of my control) I'm thinking that I'll add triggers to the legacy system tables or make sproc changes to keep my read models current, but how should I actually store the data itself?
I considered simply storing them as JSON in a field, or storing them as XML, as both can easily be serialized/deserialized from a .NET application, and both can reasonably easily be updated by triggers from other activities in the database. (XPath/XQuery isn't so bad once you get used to it, and from another answer here, I found a JSON parser for T-SQL.)
Is there a better approach? If not, should I use XML or JSON?
I would go with XML, as it has built-in support in SQL Server. In general I would avoid any additional machinery written in T-SQL, as maintaining it can be a nightmare.
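For illustration, a minimal T-SQL sketch of what such a read model table could look like: the fields you query by are plain columns, and the full object graph sits in an XML column next to them. The table, column, and element names are assumptions made up for the example.

    -- Hypothetical read model: query fields denormalized into columns,
    -- the full Customer/Order/LineItems object kept as XML alongside them
    CREATE TABLE dbo.CustomerOrderReadModel
    (
        OrderId      INT            NOT NULL PRIMARY KEY,
        CustomerName NVARCHAR(200)  NOT NULL,
        OrderTotal   DECIMAL(18, 2) NOT NULL,
        Payload      XML            NOT NULL   -- serialized object graph
    );

    -- Example query: filter on a plain column, pull a detail out of the XML payload
    SELECT OrderId,
           Payload.value('(/Order/LineItems/LineItem/@Sku)[1]', 'NVARCHAR(50)') AS FirstSku
    FROM dbo.CustomerOrderReadModel
    WHERE CustomerName = N'Contoso Ltd';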

Save Database field changes - best practices? Versioning, Loggable?

I am using Symfony 2 with Doctrine as the ORM framework. I am searching for the best way to save changes made to database fields. I will have about 100 tables, each with about 50 fields and a few thousand rows. I would like to record all changes made to these fields.
Possibilities I have thought about:
Doctrine extension "Loggable" - saves changes in a separate table, but I don't know whether it can handle this volume of entries.
A MySQL trigger for each table that saves changes into a new table?
But what is the best practice to save changes?
You can use either MySQL triggers or the mentioned DoctrineExtensions Loggable feature. Both work; both have pros and cons. A MySQL trigger can write into a separate table (see the MySQL trigger FAQ).
triggers:
++ framework, programming language independent
++ works when you want to modify the data by hand or by a script.
-- You have to write the triggers for every table, or figure out some generic solution in SQL (I can't help with that).
-- If you are not familiar with stored procedures and PL/SQL, there is a learning curve.
doctrine extensions:
++ Just put your annotation on classes and you're done.
++ You can query the history, revert changes through the Repository API
-- You lock yourself into a vendor; sometimes this is a problem, sometimes it isn't.
-- Doesn't work when you modify the data by hand or with third-party scripts.
If the chance of switching away from Doctrine is low, I would start with the Doctrine extension. It is a tool whose whole purpose is to help you deal with SQL, after all.
I'd suggest going with triggers, especially if you want your logging functionality to stay application independent — that is, it will work even if you decide to rewrite your app on a different framework or completely different programming language.
P.S. I don't know how good trigger support is in MySQL, since I switched to PostgreSQL before MySQL even had them.
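As a rough sketch of the trigger route, here is what one audit trigger might look like in MySQL. The table and column names (product, product_history, price) are invented for the example, and in practice you would generate one such trigger per audited table and column set.

    -- Hypothetical audited table and its history table
    CREATE TABLE product (
        id    INT PRIMARY KEY,
        price DECIMAL(10,2) NOT NULL
    );

    CREATE TABLE product_history (
        history_id BIGINT AUTO_INCREMENT PRIMARY KEY,
        product_id INT           NOT NULL,
        old_price  DECIMAL(10,2),
        new_price  DECIMAL(10,2),
        changed_by VARCHAR(64)   NOT NULL,
        changed_at TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    DELIMITER //
    CREATE TRIGGER product_after_update
    AFTER UPDATE ON product
    FOR EACH ROW
    BEGIN
        -- Only log rows where the audited column actually changed
        IF NOT (OLD.price <=> NEW.price) THEN
            INSERT INTO product_history (product_id, old_price, new_price, changed_by)
            VALUES (OLD.id, OLD.price, NEW.price, CURRENT_USER());
        END IF;
    END//
    DELIMITER ;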
Such a thing is commonly called "change data capture". It's been asked about with reference to MySQL before on SO:
Change Data Capture in MySQL
Maybe this answer can help you.
Different vendors make this a built-in feature to varying degrees.
The following article has a step-by-step explanation plus sample code for doing versioning with triggers.
http://www.jasny.net/articles/versioning-mysql-data/

Should I use messaging instead of a database?

I am designing a system that will allow users to take data from one system and send to other systems. One of the destination systems has a sophisticated SOA (web services) and the other is a mainframe that accepts flat files for input.
I have created a database that has a PublishEvent table and PublishEventType table. There are also normalized tables that are specific to the type of event being published.
I also have an "interface" table that is a flatened out version of the normalized data tables. The end user has a process that puts data into the interface table. I am not sure of the exact process - I think it's some kind of reporting application that they can export results to a SQL table. I then use an SSIS package to take the data out of the interface table and put it into the normalized data structure and create new rows in the PublishEvent table. I use the flat table because when I first showed them the relational tables they seemed to be very confused.
I have a windows service that watches for new rows in the PublishEvent table. The windows service is extended with plug-ins (using the MEF framework). Which plug-in is called depends on the value of the PublishEventTypeID field in the PublishEvent row.
PublishEventTypeID 1 calls the plug-in that reads data from one set of tables and calls the SOA web service. PublishEventTypeID 2 calls the plug-in that reads data from a different set of tables and creates the flat file to be sent to the mainframe.
This seems like I am implementing the "Database as IPC" anti-pattern. Should I change my design to use a messaging-based system? Is the process of putting data into the flat table and then into the normalized tables redundant?
EDIT: This is being developed in .NET 3.5
A MOM (message-oriented middleware) is probably the better solution, but you also have to take the following points into account:
Do you have a message-based system already in place as part of your customer's architecture? If not, maybe introducing one is overkill.
Do you have any experience with message-based systems? As Jason Plank correctly mentioned, you have to take into account specific patterns for these, like having to ensure the chronological order of messages, managing dead letter channels, and so on (see this book for more).
You mentioned a mainframe system which apparently has limited options for interfacing with it. Who will take care of the layer that transforms "messages" (either DB or MOM based) into something the mainframe can digest? Assuming it is you, would it be easier (for you) to do that by accessing the DB (maybe you have already worked on the problem in the past), or would the effort differ depending on whether you use a DB or a MOM?
To sum it up: if you are more confident going the DB route, maybe it's better to do that, even if, as you correctly suggested yourself, it is a bit of an "anti-pattern".
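If you do stay with the DB route, one detail worth sketching is how the polling Windows service claims new PublishEvent rows so that two workers never pick up the same event. Below is a minimal T-SQL sketch; it assumes a Status column (0 = new, 1 = in progress) that the table described above may not have yet.

    -- Hypothetical "claim" query for the polling service.
    -- OUTPUT returns the rows this worker just claimed; READPAST/UPDLOCK keep
    -- two concurrent workers from grabbing the same event.
    UPDATE TOP (10) pe
    SET    pe.Status = 1
    OUTPUT inserted.PublishEventID, inserted.PublishEventTypeID
    FROM   dbo.PublishEvent AS pe WITH (READPAST, UPDLOCK, ROWLOCK)
    WHERE  pe.Status = 0;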
Some key items to keep in mind are:
Row order consistency - Does your data model depend on the order in which the data is generated? If so, does your scheme ensure that the publish and subscribe activity happens in the same order in which the original data is created?
Do you have identity columns on either side? They are a problem, since their values change depending on the order in which data is inserted. If an identity column is the sole primary key (a surrogate key), a change in its value may make the data unusable.
How do you prove that you have not lost a record? This is the trickiest part of the solution, especially if you have millions of rows.
As for the architecture, you may want to check out the XMPP protocol - Smack for the client (if Java) and ejabberd for the server.
Have a look at NServiceBus, MassTransit or Rhino Service Bus if you're using .NET.

Version Controlled Database with efficient use of diff

I have a project involving a web voting system. The current values and related data are stored in several tables. Historical data will be an important aspect of this project, so I've also created audit tables to which current data will be moved on a regular basis.
I find this strategy highly inefficient. Even if I only archive data on a daily basis, the number of rows will become huge even if only 1 or 2 users make updates on a given day.
The next alternative I can think of is only storing entries that have changed. This will mean having to build logic to automatically create a view of a given day. This means less stored rows, but considerable complexity.
My final idea is a bit less conventional. Since the historical data will be for reporting purposes, there's no need for web users to have quick access. I'm thinking that my db could have no historical data in it. DB only represents current state. Then, daily, the entire db could be loaded into objects (number of users/data is relatively low) and then serialized to something like XML or JSON. These files could be diffed with the previous day and stored. In fact, SVN could do this for me. When I want the data for a given past day, the system has to retrieve the version for that day and deserialize into objects. This is obviously a costly operation but performance is not so much a concern here. I'm considering using LINQ for this which I think would simplify things. The serialization procedure would have to be pretty organized for the diff to work well.
Which approach would you take?
Thanks
If you're basically wondering how revisions of data are stored in relational databases, then I would look into how wikis do it.
Wikis are all about keeping detailed revision history. They use simple relational databases for storage.
Consider Wikipedia's database schema.
All you've told us about your system is that it involves votes. As long as you store timestamps for when votes were cast you should be able to generate a report describing the vote state tally at any point in time... no?
For example, say I have a system that tallies favorite features (eyes, smile, butt, ...). If I want to know how many votes there were for a particular feature as of a particular date, then I would simply tally all the votes for that feature with a timestamp less than or equal to that date.
If you want to have a history of other things, then you would follow a similar approach.
I think this is the way it is done.
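As a minimal SQL sketch of that point-in-time tally: the votes table and its columns here are assumptions made up for the example, and the cutoff date is a placeholder literal.

    -- Hypothetical votes table: one row per vote, with the time it was cast
    CREATE TABLE votes (
        vote_id  BIGINT      PRIMARY KEY,
        feature  VARCHAR(50) NOT NULL,   -- e.g. 'eyes', 'smile'
        cast_at  TIMESTAMP   NOT NULL
    );

    -- Tally per feature as of a given date: just filter on the timestamp
    SELECT feature, COUNT(*) AS votes_as_of
    FROM   votes
    WHERE  cast_at <= '2013-06-30 23:59:59'
    GROUP  BY feature;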
Have you considered using a real version control system rather than trying to shoehorn a database in its place? I myself am quite partial to git, but there are many options. They all have good support for differences between versions, and they tend to be well optimised for this kind of workload.

Mobile/PDA + SQL Server data synchronization

Need a little advice here. We do some Windows Mobile development using the .NET Compact Framework and SQL CE on the mobile device, along with a central SQL Server 2005 database at the customer's offices. Currently we synchronize the data using merge replication.
Lately we've had some annoying problems with synchronization throwing errors and generally being a bit unreliable. This is compounded by the fact that there seems to be limited information out there on replication issues. This suggests to me that it isn't a commonly used technology.
So, I was just wondering whether replication is the way to go for synchronizing data, or are there more reliable methods? I was thinking web services, maybe, or something like that. What do you use when implementing this kind of solution?
Dave
I haven't used replication a great deal, but I have used it, and I haven't had problems with it. The thing is, you need to set things up carefully. No matter which method you use, you need to decide on the rules governing all of the various possible situations - changes in both databases, etc.
If you are more specific about the "generally being a bit unreliable" then maybe you'll get more useful advice. As it is all I can say is, I haven't had issues with it.
EDIT: Given your response below I'll just say that you can certainly go with a custom replication that uses SSIS or some other method, but there are definitely shops out there using replication successfully in a production environment.
Well, we've had the error occur twice, which was a real pain to fix:
The insert failed. It conflicted with an identity range check constraint in database 'egScheduler', replicated table 'dbo.tblServiceEvent', column 'serviceEventID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
When we tried running the stored procedure it messed with the identities so now when we try to synchronize it throws the following error in the replication monitor.
The row operation cannot be reapplied due to an integrity violation. Check the Publication filter. [,,,Table,Operation,RowGuid] (Source: MSSQLServer, Error number: 28549)
We've also had a few issues where snapshots became invalid, but these were relatively easy to fix. However, all this is making me wonder whether replication is the best method for what we're trying to do here, or whether there's an easier method. This is what prompted my original question.
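(For reference, the identity range error quoted above is usually tackled by enlarging the automatically managed identity ranges on the merge article. A rough sketch of the relevant settings when the article is added is below; the publication and table names and the numeric values are placeholders and worth checking against the merge replication documentation for your version.)

    -- Hypothetical: give subscribers a much larger identity range so that
    -- bulk inserts (e.g. ~15,000 scheduled rows) don't exhaust it between merges.
    EXEC sp_addmergearticle
        @publication                   = N'egSchedulerPub',
        @article                       = N'tblServiceEvent',
        @source_object                 = N'tblServiceEvent',
        @identityrangemanagementoption = N'auto',
        @pub_identity_range            = 100000,  -- range kept back by the publisher
        @identity_range                = 50000,   -- range handed to each subscriber
        @threshold                     = 80;      -- % used before a new range is assigned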
We're working on a similar situation, but ours involves a tool that works in a disconnected model and runs on the Windows desktop... We're using SQL Server Compact Edition for the clients and Microsoft SQL Server 2005 with a web service for the server solution.
To enable synchronization, we initially started by building our own synchronization framework, but after many issues keeping that framework in sync with the rest of the system, we opted to go with the Microsoft Sync Framework (http://msdn.microsoft.com/en-us/sync/default.aspx for reference). Our initial requirement was to make the application as easy to use as installing other packages like Intuit QuickBooks, and I think that we have come close to succeeding.
The Synchronization Framework from Microsoft has its ups and downs, but the only bad thing that I can say at this point is that documentation is horrendous.
We're in discussions now to decide whether or not to continue using it or to go back to maintaining our own synchronization subsystem. YMMV on it, but for us, it was a quick fix to the issue.
You're definitely pushing the stability envelope for CE, aren't you?
When I've done this, I've found it necessary to add a fair amount of conflict tolerance, by thinking of it not so much as synchronization but as simultaneous asynchronous data collection, with intermittent mutual updates and/or refreshes. In particular, I've always avoided using identity columns for anything. If you can strictly adhere to true primary keys based on real (not surrogate) data, it makes things easier. Sometimes a PK comprising SourceUnitNumber and a timestamp works well.
If you can, view the remotely collected data as a simple timestamped, source-identified, user-identified log of cumulative, chronologically ordered transactions. Going the other way, the host provides static validation info which never needs to come back - send back the CRUD transactions instead.
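A rough sketch of what such a log table might look like on the central server, with a natural composite key instead of an identity column; the table and column names are invented for the example, and the payload is kept as plain text for simplicity.

    -- Hypothetical transaction log collected from the devices: no identity column,
    -- the natural key is (source unit, user, timestamp)
    CREATE TABLE dbo.CollectedTransaction
    (
        SourceUnitNumber INT            NOT NULL,
        UserId           INT            NOT NULL,
        CollectedAtUtc   DATETIME       NOT NULL,
        Operation        CHAR(1)        NOT NULL,  -- 'C', 'U' or 'D'
        Payload          NVARCHAR(4000) NOT NULL,  -- the row data being sent up
        CONSTRAINT PK_CollectedTransaction
            PRIMARY KEY (SourceUnitNumber, UserId, CollectedAtUtc)
    );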
Post back how this turns out. I'm interested in seeing any kind of reliable Microsoft technology that helps with this.
TomH & le dorfier - I think that part of our problem is that we're allowing the customer to insert a large number of rows into one of the replicated table with an identity field. Its a scheduling application which can automatically multiple tasks up to a specified month/year. One of the times that it failed was around the time they entered 15000 rows into the table. We'll look into increasing the identity range.
The synchronization framework sounds interesting, but it sounds like it suffers from a similar problem to replication: poor documentation. Trying to find help on replication is a bit of a nightmare, and I'm not sure I want us to move to something with similar issues. I wish Microsoft would stop releasing stuff that seems to have the support of beta software!

Resources