Change ID generator in NHibernate and migrate an existing database (SQL Server)

I have an existing product using the increment ID generator for most db entities. A new version should allow clustering of multiple server instances working on the same database. The product supports use of MSSQL and Oracle databases.
So I am considering changing the ID generator to native, but there are some issues with that.
Two different algorithms will be used for Oracle and MSSQL (sequence vs. identity) - will that be transparent when creating objects in the code?
How can I migrate existing databases, and how do I get the generator to avoid IDs that are already in use?
Thanks in advance for any insights on this.

I would suggest looking at a hilo generator strategy. The benefit is that it can be used for multiple processes, and you still retain the performance benefit of using a generated id in NHibernate (specifically allowing batching of inserts).
MSSQL does not allow you to change an existing column into an identity column - you would need to add a new column and then update all the foreign keys. If you have a lot of tables and relationships, this can be very messy.
With the hilo generator strategy you can avoid that issue altogether: it's just a configuration change, plus adding a table to your database to store the high values and populating that table with values above the IDs already in use.
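For illustration, here is a minimal sketch of what the mapping change could look like using Fluent NHibernate (the Order entity and the table/column names are hypothetical; with hbm.xml mappings the equivalent is a `<generator class="hilo">` element with table, column and max_lo params):

```csharp
using FluentNHibernate.Mapping;

public class Order
{
    public virtual long Id { get; set; }
    public virtual string Description { get; set; }
}

public class OrderMap : ClassMap<Order>
{
    public OrderMap()
    {
        // hilo: NHibernate reserves a block of "max_lo" ids per round trip
        // to the hilo table, so multiple server instances can insert
        // concurrently without colliding, and inserts can still be batched.
        Id(x => x.Id)
            .GeneratedBy.HiLo("hibernate_unique_key", "next_hi", "100");
        Map(x => x.Description);
    }
}
```

To keep the generator away from existing IDs, seed next_hi from the current maximum - roughly MAX(Id) / max_lo + 1 - but verify this against the exact hi/lo formula of the NHibernate version you are on before relying on it.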

Related

When and how do I have to create an Index in Grails?

In Grails you can add custom indices to your domain classes.
Does Grails generate indices by default for my tables?
Is there a rule which columns I have to use for my index?
Do my queries change when an index is set?
This isn't really a Grails question, except for the part about when and if Grails creates indexes. You need them like you would in any application that uses a database - create them to improve lookup performance.
Grails doesn't actually create any, Hibernate does that when it generates the DDL that creates your tables. You can see this DDL at any time by running grails schema-export - the generated file will be target/ddl.sql.
In general you'll see unique constraints which will typically create a unique index, and in MySQL and some other databases you'll see indexes created on foreign keys (but this isn't done for Oracle for some reason).
There is some mapping support for getting Hibernate to create indexes as you noted in your question, but in general you'll need to create them yourself since they are often database-specific. Use the http://grails.org/plugin/database-migration plugin for this.
In general you will use indexes on columns that are part of frequent queries and queries with high execution cost. That holds for any relational database and any development framework.
Specific to Grails, I found this post very useful on how indexes are defined: http://grails.asia/grails-how-to-create-custom-table-index-or-composite-index

What is the suitable data type for DB replication?

I'm creating a DB using SQL Server 2008.
This DB will be used in two countries and at some time (every day) they will be synchronized, I'll use the Replication service to accomplish that.
Most of the tables are using an Int column with identity increment. But the tables will be empty when deployed, so both countries will have rows with identity 1, 2, and so on. I've never used replication before, so I want to know whether there will be an error when the tables are synchronized.
Should I use a GUID data type instead?
Replicate Identity Columns (MSDN):
Replication offers three identity range management options:
Automatic. Used for merge replication and transactional replication with updates at the Subscriber...
Manual. Used for snapshot and transactional replication without updates at the Subscriber...
None. This option is recommended only for backwards compatibility...
So, yes, you can continue to use IDENTITY, provided you read through the information on replication and choose an option that makes sense for you.
Under Automatic, each server grabs a range of usable identity values and hands the individual values out as needed. Provided synchronization occurs often enough that the ranges aren't completely exhausted, you'll never notice this detail.
And this allows you to scale out later as needed - as opposed to e.g. a MOD scheme where one server hands out odd values and the other even - you can't easily add a third server to such a scheme.
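For reference, a sketch of how the Automatic option is typically requested when adding an article to a merge publication. The publication name, article, and range sizes below are placeholders; check the sp_addmergearticle documentation for the full signature before reusing this:

```csharp
// T-SQL held in a C# constant (e.g. for execution via SqlCommand).
// All names and range sizes are hypothetical.
const string addArticle = @"
EXEC sp_addmergearticle
    @publication                   = N'MyPublication',
    @article                       = N'Customer',
    @source_object                 = N'Customer',
    @identityrangemanagementoption = N'auto',
    @pub_identity_range            = 10000, -- range kept by the publisher
    @identity_range                = 1000,  -- range given to each subscriber
    @threshold                     = 80;    -- % used before a new range is assigned
";
```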
By your description, it sounds like you want to implement so-called merge replication.
In SQL Server you would not need to change the identity to a GUID; however, if you don't, SQL Server will automatically add another column called rowguid to each table, and you may end up with duplicates of your original identity column. To circumvent this, you could have the servers assign mod-2 IDs (one node uses odd values, the other even).
In my opinion it makes the most sense to use a GUID for the IDs altogether. Don't forget to set the ROWGUIDCOL property on those GUID key columns. Good luck.
Relevant MSDN:
http://technet.microsoft.com/en-us/library/ms152746.aspx
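A minimal sketch of the GUID suggestion above (the Customer table is hypothetical): marking the uniqueidentifier key as ROWGUIDCOL lets merge replication reuse it instead of adding its own rowguid column.

```csharp
// Hypothetical DDL held in a C# constant. NEWSEQUENTIALID() keeps new
// GUIDs roughly ordered, which limits clustered-index fragmentation.
const string createCustomer = @"
CREATE TABLE dbo.Customer (
    CustomerId uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customer_CustomerId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customer PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);";
```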
Consider adding a deviceID field to all tables users can update. With each device making changes using its own ID as part of the PK, there cannot be conflicts across devices.

Are there any in-memory databases that support computed columns?

We have a SQL 2005/2008 database that has a table with a computed column. We're using the computed column as a discriminator in NHibernate so having it in the database is proving to be very useful.
In order to gain the benefits of faster integration tests, I'd like to be able to run our integration tests against an in-memory database such as SQLite or SQL CE. But I don't think either of those support the computed column.
Are there any other solutions to my problem? I have complete access to the database and can modify it if there's a better solution available. I've seen this post that suggests using a view instead of a computed column, is this the best alternative?
What I did was add the computed column to the DataTable when loading the table from SQL CE. I stored the definition of the computed DataColumn in a "configuration" table kept in the database. I was able to do complex calculations that depended on a "chain" of tables, where each table performed a simpler piece of the overall function. (The last table in the chain contained the results.) I used SQL CE because one table of five contained 15 million rows - too much data for in-memory data sets in ADO.NET. (I had a requirement to do local, client-based calculations before posting to the server.)
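A minimal sketch of this technique (the table, columns, and expression are hypothetical): ADO.NET evaluates DataColumn.Expression entirely client-side, so the computed value exists in memory even though SQL CE never stores it.

```csharp
using System;
using System.Data;

class ComputedColumnDemo
{
    static void Main()
    {
        var orders = new DataTable("Orders");
        orders.Columns.Add("Quantity", typeof(int));
        orders.Columns.Add("UnitPrice", typeof(decimal));

        // Re-create the server-side computed column in memory. The
        // expression text could be loaded from a configuration table,
        // as described above.
        orders.Columns.Add(new DataColumn("LineTotal", typeof(decimal))
        {
            Expression = "Quantity * UnitPrice"
        });

        orders.Rows.Add(3, 9.99m);
        Console.WriteLine(orders.Rows[0]["LineTotal"]); // 29.97
    }
}
```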

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All our tables' primary keys are INT identity TableID fields.
I'm looking for a tool or script that would help me extract client data from one database and move it to a brand-new database (so the PKs can stay the same). I'm hoping this already exists so we don't have to build our own. We're pretty flexible in how this could work, but ideally it would just generate one large .sql file with all the necessary INSERTs in the right order to move the data, and another with all the necessary DELETEs to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
If you have Standard or Enterprise edition, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recommend Red Gate SQL Data Compare for this; we use it for syncing data, and their SQL Compare to sync the database schema.
Both tools can either output SQL scripts for you to execute yourself, or execute them for you.
They also have command-line versions, so you could use them in a deployment script, though I haven't tried this.
They both work really well and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have some type of unique identifier for all your records and that you can avoid collisions with identity keys / integer-based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that transferring data between databases would not be that frequent, and that you needed a database architecture suited to that task. I would use the GUID as a lookup key, specifically for validating the transfer of data, but I would NOT use it as a primary key for joins in standard operations (like building URLs). Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDs would exist in addition to your current primary keys, and act as a means of validation for you should something go wrong. If you need ClientID in database A to retain the same value in database B, then an identity column as that identifier will be an issue. You may have to create another identifier that is not auto-generated. This could be something other than a GUID, but my instinct is that integers alone will not be enough. Maybe you can create a column that is a hash of the identity key, customer name, and database name, or more simply, just concatenate those columns into a varchar column.
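As a sketch of that last idea (all names are hypothetical): derive a stable cross-database identifier from values that already exist, so a row can still be recognized after it has moved between databases.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class CrossDbKey
{
    // Plain concatenation would work too; hashing just keeps the
    // resulting column a fixed width.
    public static string Create(int tableId, string customerName, string databaseName)
    {
        var raw = $"{databaseName}|{customerName}|{tableId}";
        using (var sha1 = SHA1.Create())
        {
            var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

// Example: CrossDbKey.Create(42, "Acme Corp", "DatabaseA")
```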

Advice Please: SQL Server Identity vs Unique Identifier keys when using Entity Framework

I'm in the process of designing a fairly complex system. One of our primary concerns is supporting SQL Server peer-to-peer replication. The idea is to support several geographically separated nodes.
A secondary concern has been using a modern ORM in the middle tier. Our first choice has always been Entity Framework, mainly because the developers like working with it. (They love the LINQ support.)
So here's the problem:
With peer-to-peer replication in mind, I settled on using uniqueidentifier with a default value of newsequentialid() for the primary key of every table. This seemed to provide a good balance between avoiding key collisions and reducing index fragmentation.
However, it turns out that the current version of Entity Framework has a very strange limitation: if an entity's key column is a uniqueidentifier (GUID) then it cannot be configured to use the default value (newsequentialid()) provided by the database. The application layer must generate the GUID and populate the key value.
So here's the debate:
1) abandon Entity Framework and use another ORM:
- use NHibernate and give up LINQ support
- use linq2sql and give up future support (not to mention getting bound to SQL Server)
2) abandon GUIDs and go with another PK strategy
3) devise a method to generate sequential GUIDs (COMBs?) at the application layer
I'm leaning towards option 1 with linq2sql (my developers really like linq2[stuff]) and option 3. That's mainly because I'm somewhat ignorant of alternative key strategies that support the replication scheme we're aiming for while also keeping things sane from a developer's perspective.
Any insight or opinion would be greatly appreciated.
I second Craig's suggestion - option 4.
You can always use the GUID column, populated by the middle-tier, as your PRIMARY KEY (that's a LOGICAL construct).
To avoid massive index (and thus table) fragmentation, use some other key (ideally an INT IDENTITY column) as the CLUSTERING KEY - that's a physical database construct, which CAN be separated from the primary key.
By default the primary key is also the clustering key, but it doesn't have to be that way. In fact, I improved performance and drastically lowered fragmentation by doing just that on a database I "inherited": add an INT IDENTITY column and put the clustering key on that small, ever-increasing, never-changing INT - works like a charm!
Marc
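A DDL sketch of the layout Marc describes (table and column names are hypothetical): the GUID stays the logical primary key, while a small identity column carries the clustered index.

```csharp
// Hypothetical DDL held in a C# constant: GUID as a NONCLUSTERED primary
// key, INT IDENTITY as the clustering key (small, ever-increasing,
// never-changing).
const string createOrders = @"
CREATE TABLE dbo.Orders (
    OrderId    uniqueidentifier NOT NULL,
    ClusterKey int IDENTITY(1,1) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId)
);
CREATE UNIQUE CLUSTERED INDEX IX_Orders_ClusterKey
    ON dbo.Orders (ClusterKey);";
```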
Huh? I think your three options are a false choice. Consider option 4:
4) Use the Entity Framework with non-sequential, client-generated GUIDs.
The EF can't see DB-server-generated GUIDs for new rows inserted by the framework itself, sure, but you don't need to generate the GUIDs on the DB server. You can generate them on the client when you create your entity instances. The whole point of a GUID is it doesn't matter where you generate it. As for GUIDs generated by a replicated DB, the EF will see them just fine.
Your client-side GUIDs won't be sequential (use Guid.NewGuid()), but they will be guaranteed globally unique.
We do this in shipping, production software with replication. It does work.
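A minimal sketch of option 4 (the Customer entity is hypothetical): assign the GUID in the entity's constructor, so every new instance already carries its key before it ever reaches SaveChanges.

```csharp
using System;

public class Customer
{
    public Customer()
    {
        // Client-generated key: globally unique, no DB round trip needed.
        // A COMB-style generator could be swapped in here if fragmentation
        // from purely random GUIDs becomes a concern.
        Id = Guid.NewGuid();
    }

    public Guid Id { get; set; }
    public string Name { get; set; }
}
```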
Another option (not available when this was posted) is to upgrade to EF 4, which supports server-generated GUIDs.
Why not use an identity column? If you are doing merge replication, you can have each system start at a separate seed and work in one direction (e.g. node A starts at 1 and adds 1; node B starts at 0 and subtracts 1)...
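A sketch of that seed-and-direction scheme (the Item table is hypothetical): node A counts up from 1 while node B counts down from 0, so the two ranges can never collide.

```csharp
// Hypothetical DDL held in C# constants; run one statement on each node.
const string nodeA = @"
CREATE TABLE dbo.Item (
    ItemId int IDENTITY(1, 1) PRIMARY KEY,  -- 1, 2, 3, ...
    Name   nvarchar(100) NOT NULL);";

const string nodeB = @"
CREATE TABLE dbo.Item (
    ItemId int IDENTITY(0, -1) PRIMARY KEY, -- 0, -1, -2, ...
    Name   nvarchar(100) NOT NULL);";
```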
You can use stored procedures if you are really stuck on using NewSequentialID(). You can bind the result columns from the procedure to the appropriate property and once inserted the SQL-generated GUID will be fed back into the object.
Unfortunately you have to define SPs for all three operations (insert, update, delete) even though the other operations would complete properly using the defaults. You also need to maintain the SP code and ensure it is synchronized with your EF model as you make changes, which may make this option unattractive on account of the additional overhead.
There is a step-by-step example at http://blogs.msdn.com/bags/archive/2009/03/12/entity-framework-modeling-action-stored-procedures.aspx which is pretty straight-forward.
Use newsequentialid() with your own ORM (it's not that hard) together with LINQ.
