What is a suitable data type for DB replication? - sql-server

I'm creating a DB using SQL Server 2008.
This DB will be used in two countries, and at some point each day they will be synchronized; I'll use the Replication service to accomplish that.
Most of the tables use an Int column with Identity increment. But the tables will be empty when deployed, so both countries will have rows with identity 1, 2, and so on. I've never used replication before, so I want to know: will there be an error when the tables are synchronized?
Should I use a GUID data type instead?

Replicate Identity Columns (MSDN):
Replication offers three identity range management options:
Automatic. Used for merge replication and transactional replication with updates at the Subscriber...
Manual. Used for snapshot and transactional replication without updates at the Subscriber...
None. This option is recommended only for backwards compatibility...
So, yes, you can continue to use IDENTITY, provided you read through the information on replication and choose an option that makes sense for you.
Under Automatic, each server grabs a range of usable identity values and hands the individual values out as needed. Provided synchronization occurs often enough that the ranges aren't completely exhausted, you'll never notice this detail.
And this allows you to scale out later as needed - as opposed to e.g. a MOD scheme where one server hands out odd values and the other even - you can't easily add a third server to such a scheme.
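As a concrete sketch of turning on the Automatic option when you define a merge article (the publication and table names here are made up, and the range sizes are only an illustration):

EXEC sp_addmergearticle
    @publication = N'CountrySyncPub',          -- hypothetical publication name
    @article = N'Orders',
    @source_object = N'Orders',
    @identityrangemanagementoption = N'auto',  -- replication manages the ranges
    @identity_range = 10000,                   -- values handed to each Subscriber
    @pub_identity_range = 100000,              -- values reserved for the Publisher
    @threshold = 80;                           -- percent used before a new range is assigned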

By your description, it sounds like you want to implement so-called Merge replication.
In SQL Server you would not need to change the identity to a GUID; however, if you don't, SQL Server will automatically add another column called rowguid to each table, and you may end up with duplicates of your original identity column. To circumvent this, you could have the servers assign mod 2 IDs.
In my opinion it makes most sense to use a GUID for the IDs altogether. Don't forget to set the ROWGUIDCOL property on your uniqueidentifier key columns. Good luck.
Relevant MSDN:
http://technet.microsoft.com/en-us/library/ms152746.aspx
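For illustration, a GUID key with the ROWGUIDCOL property set up front (a sketch; the table is made up) could look like this:

CREATE TABLE dbo.Customer (
    CustomerID uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customer_ID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customer PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);

Because the column already carries ROWGUIDCOL and a unique index (the primary key), merge replication can reuse it instead of adding its own rowguid column.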

Consider adding a deviceID field to all tables users can update. With each device making changes using its own ID as part of the PK, there cannot be conflicts across devices.
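A sketch of that layout (the table and column names are made up):

CREATE TABLE dbo.Orders (
    DeviceID int NOT NULL,               -- fixed, unique per device/site
    OrderID int IDENTITY(1,1) NOT NULL,  -- local counter, may repeat across devices
    Amount money NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (DeviceID, OrderID)
);

Two devices can both generate OrderID = 1, but the (DeviceID, OrderID) pair is still unique, so the rows never conflict when merged.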

Related

Copy Database Data from Many DBs to One. Data Replication (sort of)

This involves data replication, kind of:
We have many sites with SQL Express installed, there is an 'audit' database on each site that has one table in 1st normal form (to make life simple :)
Now I need to get this table from each site and copy the contents (say, those with a DateTime value > 1/1/2000 00:00, but this will change obviously) into a big 'super table' on the full SQL Server. That table's primary key is the site name (which needs injecting) plus the current primary key from the SQL Express table.
e.g. Many SQL Express DBs with the following table columns
ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
And the big super table needs to have:
SiteName, ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
Where ID is the primary key in the SQL Express tables, and SiteName plus ID together in the super table.
Is there a Microsoft (or non-MS, I suppose) app/tool/thing to manage copying all this data across already, or do we need to write our own?
Many thanks.
You can use SSIS (which comes with SQL Server) to populate it; it can be set up with variables to change the connection string to the various databases. I have one that loops through a whole list and does the same process using three different files from three different vendors. You could do something similar to loop through the different site databases. Put the whole list of databases you want to copy the audit data from in a table and loop through it, changing the connection string each time.
However, why on earth would you want one mega audit table per site? If every table in the database populates the audit table as changes happen, then the audit table eventually becomes a huge problem for performance. Every insert, update and delete has to hit this table and then you are proposing to add an export on top of that. This seems to me to be a guaranteed structure for locking and deadlocks and all sorts of nastiness. Do yourself a favor and limit each audit table to the table it is auditing.
Things to consider:
Linked servers and sp_msforeachdb as part of a do-it-yourself solution (see the sketch after this list).
SQL Server Replication (by Microsoft), which I believe can pull data from SQL Server Express.
SQL Server Integration Services which can pull data from SQL Server Express instances.
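For the do-it-yourself route, a minimal linked-server sketch (server, database and column names are all made up) that injects the site name as part of the key might look like this:

DECLARE @LastCopied datetime;
SET @LastCopied = '20000101';  -- replace with the last successful copy time

INSERT INTO dbo.AuditSuper
    (SiteName, ID, DefinitionName, DefinitionType, [DateTime], Success)
SELECT 'SiteA', ID, DefinitionName, DefinitionType, [DateTime], Success
FROM [SITEA_EXPRESS].AuditDb.dbo.Audit  -- linked server to the site's SQL Express
WHERE [DateTime] > @LastCopied;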
Personally, I would investigate Integration Services first.
Good luck.
You could do this with SymmetricDS. SymmetricDS is open source, web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
As of right now, however, you would need to implement a custom IDataLoaderFilter extension point (in Java) to add the extra column. The metadata would be available, though, because your SiteName would be the external_id.

Transactional replication with no primary key (unique index)

I've just come across something disturbing. I was trying to implement transactional replication from a database whose design is not under our control. The replication was meant to support reporting without taxing the system too much. When I tried it, only some of the tables went across.
On investigation, tables were not selected for replication because they don't have a primary key. I thought this couldn't be right: a primary key is even shown if I use ODBC and MS Access, but not in Management Studio. Also, the queries are not ridiculously slow.
I tried inserting a duplicate record and it failed, complaining about a unique index (not a primary key). It seems the tables have been implemented using a unique index as opposed to a primary key. Why, I do not know; I could scream.
Is there any way to perform transactional replication, or an alternative? It needs to be live (within the last minute or two). The main DB server is currently SQL Server 2000 SP3a and the reporting server is SQL Server 2005.
The only thing I have thought of trying so far is setting up the replication as if the target were another type of database. I believe replication to, say, Oracle is possible; would this force the use of an ODBC driver, like the one I assume Access is using (hence showing a primary key)? I don't know if that is accurate; I'm out of my depth on this.
As MSDN states, it is not possible to create a transactional replication on tables without primary keys. You could use merge replication (one way), which doesn't require a primary key and automatically creates a rowguid column if one doesn't exist:
Merge replication uses a globally unique identifier (GUID) column to identify each row during the merge replication process. If a published table does not have a uniqueidentifier column with the ROWGUIDCOL property and a unique index, replication adds one. Ensure that any SELECT and INSERT statements that reference published tables use column lists. If a table is no longer published and replication added the column, the column is removed; if the column already existed, it is not removed.
Unfortunately, you will pay a performance penalty if you use merge replication.
If you need replication for reporting only, and you don't need the data to be exactly the same as on the publisher, then you could also consider snapshot replication.
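If the schema owner ever allows changes, one more option is to promote the unique index to a real primary key so transactional replication will accept the table. A sketch, assuming the uniquely indexed column is NOT NULL (the index, table and column names are made up):

-- SQL Server 2000 syntax for dropping the existing unique index.
DROP INDEX Orders.IX_Orders_OrderNo;
-- Recreate the same column as a proper primary key; this fails if the
-- column is nullable, so check that first.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderNo);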

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All our tables' primary keys are INT identity TableID fields.
I'm looking for a tool/script that would help me extract client data from one database and move it to a brand new database (so the PKs can stay the same). I'm hoping this exists already so we don't have to build our own. I'm pretty flexible in how this could work, but ideally it would just generate a large .sql file with all the necessary INSERTs in the right order to move the data, and another SQL file with all the necessary DELETEs to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
If you have standard or enterprise, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recommend Redgate SQL Data Compare for this; we use it for syncing data, and use their SQL Compare to sync the database schema.
Both tools can either output SQL scripts that you can execute yourself, or execute the scripts themselves.
They have command-line versions of the tools too, so you could use them in a deployment script, though I haven't tried this.
They both work really well, and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have some type of unique identifier for all your records and that you can avoid collisions with identity keys / integer-based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that the operation of transferring data between databases was not that frequent and that you needed a database architecture for that task. I would use the GUID as a lookup key, specifically for validating the transfer of data, but I would NOT use it as a primary key for joins in standard operations like building URLs. Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDs would exist in addition to your current primary keys, and act as a means of validation for you should something go wrong. If you need ClientID in database A to retain the same value in database B, then an identity column as that identifier will be an issue. You may have to create another identifier that is not "auto-generated". This could be something other than the GUID, but my instinct is that integers alone will not be enough. Maybe you can create a column that is a hash of the identity key, customer name and database name, or, more simply, just concatenate those columns into a varchar column.
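As a sketch of the concatenation idea (the table and columns are made up; CONCAT is not available on SQL Server 2008, hence the CAST-and-plus form):

ALTER TABLE dbo.Orders ADD MigrationKey AS
    CAST(ClientID AS varchar(12)) + '|' + CAST(OrderID AS varchar(12));

A computed column like this costs nothing to store (it is not persisted) and gives you a stable cross-database handle for validating a migration.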

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server Database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsoft's SQL Server Integration Services (SSIS). Now I have gotten to the point where I want to split a table vertically for normalization reasons.
A made-up example looks like this:
MS Access table person
ID
Name
Street
SQL Server table person
id
name
SQL Server table address
id
person_id
street
How can I complete this task best with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID. How can I put the correct person_id foreign key in the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to table1 and a row in table3 belongs to a row in table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need to have an automated and repeatable process, because the production database is under heavy usage and we are working on the migration in our development environment with recent, but not up-to-date data. We plan for one test run of the migration and have the customer review the behaviour. If everything is fine, we will go for the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the Execute SQL Task and write the statement yourself.
For the parent table, do a SELECT INTO from the source table, then do the same for the rest as you progress. Make sure you set IDENTITY_INSERT to ON for the parent table and reuse your old IDs. That will help you keep your data integrity.
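A sketch of the identity-insert step (table names are made up; Access_Import stands for a staging copy of the Access data):

SET IDENTITY_INSERT dbo.person ON;
INSERT INTO dbo.person (id, name)   -- the identity column must be listed explicitly
SELECT ID, Name
FROM Access_Import.dbo.person;
SET IDENTITY_INSERT dbo.person OFF;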
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically (at least I couldn't find any), but it's not too difficult to do manually, although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it. I know your example was just an example, but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise but I've seen people going overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution which was not mentioned yet and allows us to use all the comfort and options of the dataflow task:
If the destination database is on a local SQL Server, you can use a dataflow task with SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can tick the "keep identities" option. (I do not know if the English names are correct, because we have a German version.) With this you can write into identity columns.
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with columns
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_ids for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
So we can use a join to translate from old_id to new_id where needed. We use the new_id values to fill the identity (primary key) columns in the new tables with the "keep identities" option and can simply look them up in our mapping table for the foreign keys by a join.
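A sketch of that mapping table and the translating join (column names follow the description above; the source tables are illustrative):

CREATE TABLE dbo.id_map (
    new_id int IDENTITY(1,1) PRIMARY KEY,
    old_id int NOT NULL,
    old_tablename sysname NOT NULL
);

-- Register every old key that the new schema references.
INSERT INTO dbo.id_map (old_id, old_tablename)
SELECT ID, 'person' FROM Access_Import.dbo.person;

-- Fill the child table, translating the old foreign key to the new one.
INSERT INTO dbo.address (person_id, street)
SELECT m.new_id, p.Street
FROM Access_Import.dbo.person AS p
JOIN dbo.id_map AS m
    ON m.old_id = p.ID AND m.old_tablename = 'person';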
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.

Advice Please: SQL Server Identity vs Unique Identifier keys when using Entity Framework

I'm in the process of designing a fairly complex system. One of our primary concerns is supporting SQL Server peer-to-peer replication. The idea is to support several geographically separated nodes.
A secondary concern has been using a modern ORM in the middle tier. Our first choice has always been Entity Framework, mainly because the developers like to work with it. (They love the LINQ support.)
So here's the problem:
With peer-to-peer replication in mind, I settled on using uniqueidentifier with a default value of newsequentialid() for the primary key of every table. This seemed to provide a good balance between avoiding key collisions and reducing index fragmentation.
However, it turns out that the current version of Entity Framework has a very strange limitation: if an entity's key column is a uniqueidentifier (GUID) then it cannot be configured to use the default value (newsequentialid()) provided by the database. The application layer must generate the GUID and populate the key value.
So here's the debate:
abandon Entity Framework and use another ORM:
use NHibernate and give up LINQ support
use linq2sql and give up future support (not to mention getting bound to SQL Server as the DB)
abandon GUIDs and go with another PK strategy
devise a method to generate sequential GUIDs (COMBs?) at the application layer
I'm leaning towards option 1 with linq2sql (my developers really like linq2[stuff]) and 3. That's mainly because I'm somewhat ignorant of alternate key strategies that support the replication scheme we're aiming for while also keeping things sane from a developer's perspective.
Any insight or opinion would be greatly appreciated.
I second Craig's suggestion - option 4.
You can always use the GUID column, populated by the middle-tier, as your PRIMARY KEY (that's a LOGICAL construct).
To avoid massive index (thus: table) fragmentation, use some other key (ideally an INT IDENTITY column) as the CLUSTERING KEY - that's a physical database construct, which CAN be separated from the primary key.
By default, the primary key is the clustering key, but it doesn't have to be that way. In fact, I improved performance and drastically lowered fragmentation by doing just that on a database I "inherited": add an INT IDENTITY column and put the clustering key on that small, ever-increasing, never-changing INT. Works like a charm!
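A sketch of that layout (the table and all names are made up):

CREATE TABLE dbo.Customer (
    CustomerGuid uniqueidentifier NOT NULL
        CONSTRAINT DF_Customer_Guid DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED,  -- logical key
    ClusterKey int IDENTITY(1,1) NOT NULL,                -- physical ordering key
    Name nvarchar(100) NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX CIX_Customer ON dbo.Customer (ClusterKey);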
Marc
Huh? I think your three options are a false choice. Consider option 4:
4) Use the Entity Framework with non-sequential, client-generated GUIDs.
The EF can't see DB-server-generated GUIDs for new rows inserted by the framework itself, sure, but you don't need to generate the GUIDs on the DB server. You can generate them on the client when you create your entity instances. The whole point of a GUID is it doesn't matter where you generate it. As for GUIDs generated by a replicated DB, the EF will see them just fine.
Your client-side GUIDs won't be sequential (use Guid.NewGuid()), but they will be guaranteed unique worldwide.
We do this in shipping, production software with replication. It does work.
Another option (not available when this was posted) is to upgrade to EF 4, which supports server-generated GUIDs.
Why not use an identity column? If you are doing merge replication you can have each system start at a separate seed and work in one direction (e.g. node A starts at 1 and adds 1, node B starts at 0 and subtracts 1)...
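A sketch of that scheme (the table is made up; each CREATE runs on its own node):

-- On node A: start at 1 and count up.
CREATE TABLE dbo.Orders (
    OrderID int IDENTITY(1, 1) PRIMARY KEY,
    Amount money NOT NULL
);

-- On node B: start at 0 and count down.
CREATE TABLE dbo.Orders (
    OrderID int IDENTITY(0, -1) PRIMARY KEY,
    Amount money NOT NULL
);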
You can use stored procedures if you are really stuck on using NewSequentialID(). You can bind the result columns from the procedure to the appropriate property and once inserted the SQL-generated GUID will be fed back into the object.
Unfortunately you have to define SPs for all three operations (insert, update, delete) even though the other operations would complete properly using the defaults. You also need to maintain the SP code and ensure it is synchronized with your EF model as you make changes, which may make this option unattractive on account of the additional overhead.
There is a step-by-step example at http://blogs.msdn.com/bags/archive/2009/03/12/entity-framework-modeling-action-stored-procedures.aspx which is pretty straightforward.
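A sketch of what such an insert procedure might look like (all names are made up, and the table is assumed to have an Id column with DEFAULT NEWSEQUENTIALID()):

CREATE PROCEDURE dbo.Person_Insert
    @Name nvarchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @ids TABLE (Id uniqueidentifier);
    INSERT INTO dbo.Person (Name)   -- Id column defaults to NEWSEQUENTIALID()
    OUTPUT inserted.Id INTO @ids
    VALUES (@Name);
    SELECT Id FROM @ids;            -- EF binds this result column back to the key
END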
Use NEWSEQUENTIALID() with your own ORM (it's not that hard) with LINQ.
