Which ORM framework can best handle an MVCC database design?

When designing a database to use MVCC (Multi-Version Concurrency Control), you create tables with either a boolean field like "IsLatest" or an integer "VersionId", and you never do updates; you only insert new records when things change.
MVCC gives you automatic auditing for applications that require a detailed history, and it also relieves pressure on the database with regard to update locks. The cons are that it makes your data size much bigger and slows down selects, due to the extra clause necessary to get the latest version. It also makes foreign keys more complicated.
(Note that I'm not talking about the native MVCC support in RDBMSs like SQL Server's snapshot isolation level)
This has been discussed in other posts here on Stack Overflow. [todo - links]
I am wondering, which of the prevalent entity/ORM frameworks (Linq to Sql, ADO.NET EF, Hibernate, etc) can cleanly support this type of design? This is a major change to the typical ActiveRecord design pattern, so I'm not sure if the majority of tools that are out there could help someone who decides to go this route with their data model. I'm particularly interested in how foreign keys would be handled, because I'm not even sure of the best way to data model them to support MVCC.
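To make it concrete, here is a rough sketch of the kind of schema I have in mind (hypothetical names; the foreign-key modelling shown, a narrow un-versioned key table per entity, is just one idea, and it's exactly the part I'm unsure about):
CREATE TABLE AuthorKey (AuthorId uniqueidentifier PRIMARY KEY);
CREATE TABLE BookKey   (BookId   uniqueidentifier PRIMARY KEY);

-- Versioned table: history rows reference the un-versioned keys,
-- so foreign keys stay usable even though rows are never updated.
CREATE TABLE BookVersion (
    BookId    uniqueidentifier NOT NULL REFERENCES BookKey (BookId),
    VersionId bigint           NOT NULL,
    Title     varchar(256)     NOT NULL,
    AuthorId  uniqueidentifier NOT NULL REFERENCES AuthorKey (AuthorId),
    Deleted   bit              NOT NULL DEFAULT 0,
    PRIMARY KEY (BookId, VersionId)
);

-- The "extra clause" every current-state select needs:
SELECT b.*
FROM BookVersion b
WHERE b.VersionId = (SELECT MAX(VersionId)
                     FROM BookVersion
                     WHERE BookId = b.BookId)
  AND b.Deleted = 0;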

I might consider implementing the MVCC tier purely in the DB, using stored procs and views to handle my data operations. Then you could present a reasonable API to any ORM that was capable of mapping to and from stored procs, and you could let the DB deal with the data integrity issues (since it's pretty much built for that). If you went this way, you might want to look at a purer mapping solution like IBatis or IBatis.net.

I designed a database similarly (only INSERTs — no UPDATEs, no DELETEs).
Almost all of my SELECT queries were against views of only the current rows for each table (highest revision number).
The views looked like this…
SELECT
    dbo.tblBook.BookId,
    dbo.tblBook.RevisionId,
    dbo.tblBook.Title,
    dbo.tblBook.AuthorId,
    dbo.tblBook.Price,
    dbo.tblBook.Deleted
FROM
    dbo.tblBook INNER JOIN
    (
        SELECT
            BookId,
            MAX(RevisionId) AS RevisionId
        FROM
            dbo.tblBook
        GROUP BY
            BookId
    ) AS CurrentBookRevision ON
        dbo.tblBook.BookId = CurrentBookRevision.BookId AND
        dbo.tblBook.RevisionId = CurrentBookRevision.RevisionId
WHERE
    dbo.tblBook.Deleted = 0
And my inserts (and updates and deletes) were all handled by stored procedures (one per table).
The stored procedures looked like this…
ALTER procedure [dbo].[sp_Book_CreateUpdateDelete]
    @BookId uniqueidentifier,
    @RevisionId bigint,
    @Title varchar(256),
    @AuthorId uniqueidentifier,
    @Price smallmoney,
    @Deleted bit
as
insert into tblBook
(
    BookId,
    RevisionId,
    Title,
    AuthorId,
    Price,
    Deleted
)
values
(
    @BookId,
    @RevisionId,
    @Title,
    @AuthorId,
    @Price,
    @Deleted
)
Revision numbers were handled per-transaction in the Visual Basic code…
Shared Sub Save(ByVal UserId As Guid, ByVal Explanation As String, ByVal Commands As Collections.Generic.Queue(Of SqlCommand))
    Dim Connection As SqlConnection = New SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings("Connection").ConnectionString)
    Connection.Open()
    Dim Transaction As SqlTransaction = Connection.BeginTransaction
    Try
        Dim RevisionId As Integer = Nothing
        Dim RevisionCommand As SqlCommand = New SqlCommand("sp_Revision_Create", Connection)
        RevisionCommand.CommandType = CommandType.StoredProcedure
        RevisionCommand.Parameters.AddWithValue("@RevisionId", 0)
        RevisionCommand.Parameters(0).SqlDbType = SqlDbType.BigInt
        RevisionCommand.Parameters(0).Direction = ParameterDirection.Output
        RevisionCommand.Parameters.AddWithValue("@UserId", UserId)
        RevisionCommand.Parameters.AddWithValue("@Explanation", Explanation)
        RevisionCommand.Transaction = Transaction
        LogDatabaseActivity(RevisionCommand)
        If RevisionCommand.ExecuteNonQuery() = 1 Then 'rows inserted
            RevisionId = CInt(RevisionCommand.Parameters(0).Value) 'generated key
        Else
            Throw New Exception("Zero rows affected.")
        End If
        For Each Command As SqlCommand In Commands
            Command.Connection = Connection
            Command.Transaction = Transaction
            Command.CommandType = CommandType.StoredProcedure
            Command.Parameters.AddWithValue("@RevisionId", RevisionId)
            LogDatabaseActivity(Command)
            If Command.ExecuteNonQuery() < 1 Then 'rows inserted
                Throw New Exception("Zero rows affected.")
            End If
        Next
        Transaction.Commit()
    Catch ex As Exception
        Transaction.Rollback()
        Throw New Exception("Rolled back transaction", ex)
    Finally
        Connection.Close()
    End Try
End Sub
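sp_Revision_Create just inserted a row into a revision table and returned the generated key; roughly like this (the revision table's name and columns are approximate):
create procedure [dbo].[sp_Revision_Create]
    @RevisionId bigint output,
    @UserId uniqueidentifier,
    @Explanation varchar(512)
as
-- tblRevision is assumed to have an identity RevisionId column
insert into tblRevision (UserId, Explanation, Created)
values (@UserId, @Explanation, getdate())
set @RevisionId = scope_identity()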
I created an object for each table, each with constructors, instance properties and methods, create-update-delete commands, a bunch of finder functions, and IComparable sorting functions. It was a huge amount of code.
One-to-one DB table to VB object...
Public Class Book
    Implements IComparable

#Region " Constructors "

    Private _BookId As Guid
    Private _RevisionId As Integer
    Private _Title As String
    Private _AuthorId As Guid
    Private _Price As Decimal
    Private _Deleted As Boolean
    ...
    Sub New(ByVal BookRow As DataRow)
        Try
            _BookId = New Guid(BookRow("BookId").ToString)
            _RevisionId = CInt(BookRow("RevisionId"))
            _Title = CStr(BookRow("Title"))
            _AuthorId = New Guid(BookRow("AuthorId").ToString)
            _Price = CDec(BookRow("Price"))
            _Deleted = CBool(BookRow("Deleted"))
        Catch ex As Exception
            'TO DO: log exception
            Throw New Exception("DataRow does not contain valid Book data.", ex)
        End Try
    End Sub

#End Region
...
#Region " Create, Update & Delete "
Function Save() As SqlCommand
If _BookId = Guid.Empty Then
_BookId = Guid.NewGuid()
End If
Dim Command As SqlCommand = New SqlCommand("sp_Book_CreateUpdateDelete")
Command.Parameters.AddWithValue("#BookId", _BookId)
Command.Parameters.AddWithValue("#Title", _Title)
Command.Parameters.AddWithValue("#AuthorId", _AuthorId)
Command.Parameters.AddWithValue("#Price", _Price)
Command.Parameters.AddWithValue("#Deleted", _Deleted)
Return Command
End Function
Shared Function Delete(ByVal BookId As Guid) As SqlCommand
Dim Doomed As Book = FindByBookId(BookId)
Doomed.Deleted = True
Return Doomed.Save()
End Function
...
#End Region
...
#Region " Finders "
Shared Function FindByBookId(ByVal BookId As Guid, Optional ByVal TryDeleted As Boolean = False) As Book
Dim Command As SqlCommand
If TryDeleted Then
Command = New SqlCommand("sp_Book_FindByBookIdTryDeleted")
Else
Command = New SqlCommand("sp_Book_FindByBookId")
End If
Command.Parameters.AddWithValue("#BookId", BookId)
If Database.Find(Command).Rows.Count > 0 Then
Return New Book(Database.Find(Command).Rows(0))
Else
Return Nothing
End If
End Function
Such a system preserves all past versions of each row, but can be a real pain to manage.
PROS:
Total history preserved
Fewer stored procedures
CONS:
Relies on the application, not the database, for data integrity
Huge amount of code to be written
No foreign keys managed within the database (goodbye automatic Linq-to-SQL-style object generation)
I still haven't come up with a good user interface to retrieve all that preserved past versioning.
CONCLUSION:
I wouldn't go to such trouble on a new project without some easy-to-use out-of-the-box ORM solution.
I'm curious if the Microsoft Entity Framework can handle such database designs well.
Jeff and the rest of the Stack Overflow team must have had to deal with similar issues while developing Stack Overflow: past revisions of edited questions and answers are saved and retrievable.
I believe Jeff has stated that his team used Linq to SQL and MS SQL Server.
I wonder how they handled these issues.

To the best of my knowledge, ORM frameworks are going to want to generate the CRUD code for you, so they would have to be explicitly designed to implement an MVCC option; I don't know of any that do so out of the box.
From an entity-framework standpoint, CSLA doesn't implement persistence for you at all -- it just defines a "Data Adapter" interface that you use to implement whatever persistence you need. So you could set up code-generation (CodeSmith, etc.) templates to auto-generate CRUD logic for your CSLA entities that goes along with an MVCC database architecture.
This approach would most likely work with any entity framework, not just CSLA, but it would be a very "clean" implementation in CSLA.

Check out the Envers project - it works nicely with JPA/Hibernate applications and basically does that for you - it keeps track of different versions of each entity in another table and gives you SVN-like possibilities ("Gimme the version of Person being used 2008-11-05...").
http://www.jboss.org/envers/
/Jens

I always figured you'd use a db trigger on update and delete to push those rows out into a TableName_Audit table.
That'd work with ORMs, give you your history and wouldn't decimate select performance on that table. Is that a good idea or am I missing something?
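Something like this is what I have in mind (SQL Server syntax; assuming the audit table mirrors the base table plus an audit timestamp):
CREATE TRIGGER tr_Book_Audit ON tblBook
AFTER UPDATE, DELETE
AS
BEGIN
    -- "deleted" holds the old row images for both updates and deletes
    INSERT INTO tblBook_Audit (BookId, Title, AuthorId, Price, AuditedAt)
    SELECT d.BookId, d.Title, d.AuthorId, d.Price, GETDATE()
    FROM deleted d;
END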

What we do is just use a normal ORM (Hibernate) and handle the MVCC with views plus INSTEAD OF triggers.
So there is a v_emp view, which just looks like a normal table; you can insert and update into it fine, and when you do, the triggers handle actually inserting the correct data into the base table.
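A simplified sketch of the idea (made-up table and columns, SQL Server syntax):
CREATE VIEW v_emp AS
    SELECT e.EmpId, e.Name, e.Salary
    FROM tblEmp e
    WHERE e.RevisionId = (SELECT MAX(RevisionId) FROM tblEmp WHERE EmpId = e.EmpId);
GO

CREATE TRIGGER tr_v_emp_Update ON v_emp
INSTEAD OF UPDATE
AS
BEGIN
    -- an ORM "update" against the view becomes an insert of a new version row
    INSERT INTO tblEmp (EmpId, RevisionId, Name, Salary)
    SELECT i.EmpId,
           (SELECT MAX(RevisionId) + 1 FROM tblEmp WHERE EmpId = i.EmpId),
           i.Name, i.Salary
    FROM inserted i;
END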
Note: I hate this method :) I'd go with a stored procedure API as suggested by Tim.

Related

Insert/Update rows in a DataTable, VB.NET

I'm having issues with a piece of code that should load a database table into a TypedTable and insert (or update if the key is already present); on the update part, though, my code runs extremely slowly.
Now, most of the tables I handle require a full refresh, so I wipe the data and re-add everything from another table into the typed table using a simple AddTableRow(row) procedure that works just fine. But when I need to update the data I use the LoadDataRow(row, fAcceptChanges) function, and even with .BeginLoadData() -> .EndLoadData() it gets extremely slow (2/3 updates per second) with a table containing around 500k rows of data (every row has like 15 cols).
I'm pretty new to VB.NET, so I don't know much about the alternatives I have for updating the DataTable, but if anyone knows any way to speed it up I'll be really glad to hear about it.
Some more info:
The main reason I'm inserting the data row by row is that I need to check the constraints for my table so I can handle exceptions raised by the insert part; plus the automatic constraint check of the TypedDataTable is pretty good, considering I have to handle more than 10 DB tables.
My code for the update runs like this at the moment:
Table = Parser.GetData()
TypedTable = TableAdapter.GetData()
For Each row In Table
    Try
        Dim TypedRow = TypedTable.NewRow()
        LoadNotTypedIntoTyped(row, TypedRow)
        TypedTable.BeginLoadData()
        TypedTable.LoadDataRow(TypedRow.ItemArray, True) 'TODO speed up this
        TypedTable.EndLoadData()
    Catch ex As Exception
        'Generic exception handling here
    End Try
Next
SqlBulkCopyLoadProcedure()
I found a good solution to my particular problem; using a typed table means I have more control over the table constraints, because my data source is related to the DB table, so I created a new empty typed table to load the new data, then I load the current data from the DB and use Table1.Merge(Table2) to merge the data.
In my case this is possible because the amount of data I handle is not too big (around 500k records); if memory becomes a problem, I think a viable solution could be to create a support table and merge directly in SQL, but I'm a DB newbie, so correct me if I'm wrong here (see the SQL sketch after the code below).
Code of what I did:
Dim SupportTable As TypedTable = MyTypedTable.Clone()
For Each row In TableToLoad
    Dim NewTypedRow = SupportTable.NewRow()
    For Each col In Columns
        'Load every column
    Next
    SupportTable.AddTypedRow(NewTypedRow)
Next
MyTypedTable.Merge(SupportTable)
MyTypedTable.AcceptChanges()
'Load to database
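If the merge were pushed down to SQL Server instead (the "support table" idea above), it might look roughly like this; the staging table, target table and column names are all made up:
-- Bulk copy the new data into a staging table first, then merge it
-- into the real table in one statement.
MERGE dbo.TargetTable AS t
USING dbo.StagingTable AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Value = s.Value
WHEN NOT MATCHED THEN
    INSERT (Id, Value) VALUES (s.Id, s.Value);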

Batch insert with EclipseLink not working

I am using EclipseLink 2.6.1 with an Amazon RDS database instance. The following code is used to insert new entities into the database:
tx = em.getTransaction();
tx.begin();
for (T item : persistCollection) {
    em.merge(item);
}
tx.commit();
The object being persisted has a composite primary key (not a generated one). Locally, queries run super fast, but when inserting into the remote DB it is a really slow process (~20 times slower). I have tried to implement JDBC batch writing but had no success with it (eclipselink.jdbc.batch-writing and rewriteBatchedStatements=true). When logging the queries being executed I only see lots of SELECTs and not one INSERT (the SELECTs are probably there because the objects are detached at first).
My question is how to proceed with this problem? (I would like to have batch writing and then see how the performance changes, but any help is appreciated.)
Thank you!
Edit:
When using em.persist(item) the loop finishes almost instantly, but after tx.commit() there are lots of queries (I guess one for every persisted item) like:
[EL Fine]: sql: ServerSession(925803196) Connection(60187547) SELECT NAME FROM TICKER WHERE (NAME = ?), bind => [AA]
My model has a @ManyToOne relationship with ticker_name. Why are there again so many slow SELECT queries?

TransactionScope And Function Scope- Are These Connections In-Scope?

Suppose you set up a TransactionScope object as illustrated per the Microsoft example here. Now suppose that you need to update a lot of database tables, and you want them all in the scope of the TransactionScope object. Continually nesting SqlConnection and SqlCommand objects 10 deep will create a source code mess. If instead you call other functions which create connections (in your data access layer, for example), will they be within scope of the TransactionScope object?
Example:
' Assume variable "x" is a business object declared and populated with data.
Using scope As New TransactionScope()
    Dal.Foo.SaveProducts(x.Products)
    Dal.Foo.SaveCustomer(x.Customer)
    Dal.Foo.SaveDetails(x.Details)
    ' more DAL calls ...
    Dal.Foo.SaveSomethingElse(x.SomethingElse)
    scope.Complete()
End Using
Assume that each DAL function contains its own using statements for connections. Example:
Public Shared Sub SaveProducts(x As Object)
    Using conn As New SqlConnection("connection string")
        Using cmd As New SqlCommand("stored procedure name", conn)
            With cmd
                ' etc.
            End With
        End Using
    End Using
End Sub
Yes, they will be inside the TransactionScope. What the TransactionScope basically does is to create a Transaction object and set Transaction.Current to that.
In other words, this:
Using scope As New TransactionScope()
    ... blah blah blah ...
End Using
is basically the same as this:
try
{
    // Transaction.Current is a thread-static field
    Transaction.Current = new CommittableTransaction();
    ... blah blah blah ...
}
finally
{
    Transaction.Current.Commit(); // or Rollback(), depending on whether the scope was completed
    Transaction.Current = null;
}
When a SqlConnection is opened, it checks if Transaction.Current (on this thread) is null or not, and if it is not null then it enlists (unless enlist=false in the connection string). So this means that SqlConnection.Open() doesn't know or care if the TransactionScope was opened in this method or a method that called this one.
(Note that if you wanted the SqlConnection in the child methods to NOT be in a transaction, you can make an inner TransactionScope with TransactionScopeOption.Suppress)
When you create a TransactionScope, all connections you open while the TransactionScope exists join the transaction automatically (they're 'auto enlisted'). So you don't need to pass connection strings around.
You may still want to, though: when SQL Server sees different transactions (even if they are all contained by one DTC transaction), it doesn't share locks between them. If you open too many connections and do a lot of reading and writing, you're headed for a deadlock.
Why not put the active connection in some global place and use it?
Some more info after some research. Read this: TransactionScope automatically escalating to MSDTC on some machines?
If you're using SQL Server 2008 (and probably 2012, but not any other database), some magic is done behind the scenes, and if you open two SQL Connections one after the other, they are going to be united into a single SQL transaction, and you're not going to have any locking problem.
However, if you're using a different database, or if you might open two connections concurrently, you will get a DTC transaction, which means SQL Server will not manage the locks properly, and you may encounter very unpleasant and unexpected deadlocks.
While it's easy to make sure you're only running on SQL Server 2008, making sure you don't open two connections at the same time is a bit harder. It's very easy to forget it and do something like this:
class MyPersistentObject
{
    public void Persist()
    {
        using (SQLConnection conn = ...)
        {
            conn.Open();
            WriteOurStuff();
            foreach (var child in this.PersistedChildren)
                child.Persist();
            WriteLogMessage();
        }
    }
}
If the child's Persist method opens another connection, your transaction is escalated into a DTC transaction and you're facing potential locking issues.
So I still suggest maintaining the connection in one place and using it through your DAL. It doesn't have to be a simple global static variable; you can create a simple ConnectionManager class with a ConnectionManager.Current property which will hold the current connection. Mark ConnectionManager.Current as [ThreadStatic] and you've solved most of your potential problems. That's exactly how the TransactionScope works behind the scenes.

Is there an automatic way to generate a rollback script when inserting data with LINQ2SQL?

Let's assume we have a bunch of LINQ2SQL InsertOnSubmit statements against a given DataContext. If the SubmitChanges call is successful, is there any way to automatically generate a list of SQL commands (or even LINQ2SQL statements) that could undo everything that was submitted at a later time? It's like executing a rollback even though everything worked as expected.
Note: The destination database will either be Oracle or SQL Server, so if there is specific functionality for both databases that will achieve this, I'm happy to use that as well.
Clarification:
I do not want the "rollback" to happen automatically as soon as the inserts have successfully completed. I want to have the ability to "undo" the INSERT statements via DELETE (or some other means) up to 24 hours (for example) after the original program finished inserting data. We can ignore any possible referential integrity issues that may come up.
Assume a Table A with two columns: Id (autogenerated unique id) and Value (string)
If the LINQ2SQL code performs two inserts
INSERT INTO A (Value) VALUES ('a') -- Creates new row with Id = 1
INSERT INTO A (Value) VALUES ('z') -- Creates new row with Id = 2
<< time passes>>
At some point later I would want to be able "undo" this by executing
DELETE FROM A Where Id = 1
DELETE FROM A Where Id = 2
or something similar. I want to be able to generate the DELETE statements to match the INSERT ones. Or use some functionality that would let me capture a transaction and perform a rollback later.
We cannot just 'reset the database' to a certain point in time either as other changes not initiated by our program could have taken place since.
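To illustrate what I mean by "capture" (this is not something I have working, just a sketch): on the SQL Server side, the OUTPUT clause can record the generated keys at insert time, and the undo script could be built from those later.
DECLARE @captured TABLE (Id int);

INSERT INTO A (Value)
OUTPUT inserted.Id INTO @captured
VALUES ('a'), ('z');

-- A table variable only lives for the batch; in practice the captured ids
-- would be written to a permanent "undo" table so they survive 24 hours.
DELETE FROM A WHERE Id IN (SELECT Id FROM @captured);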
It is actually quite easy to do this, because you can pass a SqlConnection into the LINQ to SQL DataContext on construction. Just run this connection in a transaction and roll that transaction back as soon as you're done.
Here's an example:
string output;
using (var connection = new SqlConnection("your conn.string"))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        using (var context = new YourDataContext(connection))
        {
            // This next line is needed in .NET 3.5.
            context.Transaction = transaction;
            var writer = new StringWriter();
            context.Log = writer;
            // *** Do your stuff here ***
            context.SubmitChanges();
            output = writer.ToString();
        }
        transaction.Rollback();
    }
}
I am always required to provide a RollBack script to our QA team for testing before any change script can be executed in PROD.
Example: Files are sent externally with a bunch of mappings between us, the recipient and other third parties. One of these third parties wants to change, on an agreed date, the mappings between the three of us.
The exec script would maybe update some existing records, delete some now-redundant ones and insert some new ones - scope_identity used in subsequent relational setup, etc.
If, for some reason, after we have all executed our changes and the file transport is fired up (just like in UAT), we see errors that were not encountered in UAT, we might multilaterally make the decision to roll back the changes. Hence the rollback script.
SQL has this info from when you BEGIN TRAN until you COMMIT TRAN or ROLLBACK TRAN. I guess your question is the same as mine - can you output that info as a script?
Why do you need this?
Maybe you should explore the flashback possibilities of Oracle. Flashback makes it possible to travel back in time: you can reset the content of a table or a whole database to how it was at a specific moment in time (or at a specific system change number).
See: http://www.oracle.com/technology/deploy/availability/htdocs/Flashback_Overview.htm
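For example (Oracle syntax against a hypothetical orders table; the exact options depend on your version and configuration):
-- Query a table as it was an hour ago
SELECT * FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);

-- Rewind the table itself (requires row movement to be enabled)
ALTER TABLE orders ENABLE ROW MOVEMENT;
FLASHBACK TABLE orders TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);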

Does SQLite support transactions across multiple databases?

I've done some searching and also read the FAQ on the SQLite site, but have had no luck finding an answer to my question.
It could very well be that my database approach is flawed, but at the moment, I would like to store my data in multiple SQLite3 databases, so that means separate files. I am very worried about data corruption due to my application possibly crashing, or a power outage in the middle of changing data in my tables.
In order to ensure data integrity, I basically need to do this:
begin transaction
modify table(s) in database #1
modify table(s) in database #2
commit, or rollback if error
Is this supported by SQLite? Also, I am using sqlite.net, specifically the latest which is based on SQLite 3.6.23.1.
UPDATE
One more question -- is this something people would usually add to their unit tests? I always unit test databases, but have never had a case like this. And if so, how would you do it? It's almost like you have to pass another parameter to the method like bool test_transaction, and if it's true, throw an exception between database accesses. Then test after the call to make sure the first set of data didn't make it into the other database. But maybe this is something that's covered by the SQLite tests, and should not appear in my test cases.
Yes, transactions work across different SQLite databases, and even between SQLite and SQL Server. I have tried it a couple of times.
Some links and info
From here - Transaction between different data sources.
Since the SQLite ADO.NET 2.0 Provider supports transaction enlistment, it is possible not only to perform a transaction spanning several SQLite data sources, but also one spanning other database engines such as SQL Server.
Example:
using (DbConnection cn1 = new SQLiteConnection(" ... "))
using (DbConnection cn2 = new SQLiteConnection(" ... "))
using (DbConnection cn3 = new System.Data.SqlClient.SqlConnection(" ... "))
using (TransactionScope ts = new TransactionScope())
{
    cn1.Open(); cn2.Open(); cn3.Open();
    DoWork1(cn1);
    DoWork2(cn2);
    DoWork3(cn3);
    ts.Complete();
}
How to attach a new database:
SQLiteConnection cnn = new SQLiteConnection("Data Source=C:\\myfirstdatabase.db");
cnn.Open();
using (DbCommand cmd = cnn.CreateCommand())
{
    cmd.CommandText = "ATTACH DATABASE 'c:\\myseconddatabase.db' AS [second]";
    cmd.ExecuteNonQuery();
    cmd.CommandText = "SELECT COUNT(*) FROM main.myfirsttable INNER JOIN second.mysecondtable ON main.myfirsttable.id = second.mysecondtable.myfirstid";
    object o = cmd.ExecuteScalar();
}
Yes, SQLite explicitly supports multi-database transactions (see https://www.sqlite.org/atomiccommit.html#_multi_file_commit for technical details). However, there is a fairly large caveat. If the database file is in WAL mode, then:
Transactions that involve changes against multiple ATTACHed databases
are atomic for each individual database, but are not atomic across all
databases as a set.
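In raw SQLite SQL the same idea looks like this (made-up file and table names), subject to the WAL caveat quoted above:
ATTACH DATABASE 'second.db' AS second;
BEGIN;
UPDATE main.accounts SET balance = balance - 100 WHERE id = 1;
UPDATE second.accounts SET balance = balance + 100 WHERE id = 7;
COMMIT; -- or ROLLBACK; if anything failed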
