Transactional replication with no primary key (unique index) - sql-server

I've just come across something disturbing, I was trying to implement transactional replication from a database whose design is not under our control . This replication was in order to perform reporting without taxing the system too much. Upon trying the replication only some of the tables went across.
On investigation tables were not selected to be replicated because they don't have a primary key, I thought this cannot be it is even shown as a primary key if I use ODBC and ms access but not in management studio. Also the queries are not ridiculously slow.
I tried inserting a duplicate record and it failed saying about a unique index(not a primary key). Seems to be the tables have been implemented using a unique index as oppose to a primary key. Why I do not know I could scream.
Is there anyway to perform transactional replication or an alternative, it needs to be live (last minute or two). The main db server is currently sql 2000 sp3a and the reporting server 2005.
The only thing I have currently thought of trying is setting the replication up as if it is another type of database. I believe replication to say oracle is possible would this force the use of say an ODBC driver like I assume access is using hence showing a primary key. I don't know if that is accurate out of my depth on this.

As MSDN states, it is not possible to create a transactional replication on tables without primary keys. You could use Merge replication (one way), that doesn't require a primary key, and it automatically creates a rowguid column if it doesn't exist:
Merge replication uses a globally
unique identifier (GUID) column to
identify each row during the merge
replication process. If a published
table does not have a uniqueidentifier
column with the ROWGUIDCOL property
and a unique index, replication adds
one. Ensure that any SELECT and INSERT
statements that reference published
tables use column lists. If a table is
no longer published and replication
added the column, the column is
removed; if the column already
existed, it is not removed.
Unfortunately, you will have a performance penalty if using merge replication.
If you need to use replication for reporting only, and you don't need the data to be exactly the same as on the publisher, then you could consider snapshot replication also

Related

Data Warehousing GUID to Int PrimaryKeys

I'm a (very) junior Analyst responsible for setting up an mssql DWH which hosts data from our CRM for reporting purposes.
The current CRM uses uniqueidentifiers in its mssql database for all keys, and some of the tables have 8m+ rows. In our reporting software (Qlikview) I can swap the GUIDs for ints and take an 800mb data file down to 90mb which is excellent, however I'd like to perform this logic in the DWH if possible to make it faster and a little cleaner.
My issue is I have no idea how to do so while maintaining FK links to other tables. I have considered maintaining a staging table of GUIDs and associated numeric IDs however this seems inefficient and poses a problem of then trying to write some arbitrary numeric ID to the PK column of the destination table which I'm sure is a terrible idea.
The DWH import works as follows: I have USPs on the source db performing SELECTs which are executed by a SSIS package, the output of which are placed in tables of the same name on the [Staging] schema of the DWH. From there, transform is performed by USPs on the DWH, also executed by the same SSIS package, which handles execution order and multi-threading. Whatever implementation I come up with will need to be compatible with this architecture (done within USPs that potentially run asynchronously).
I'm very much a SQL noob so I do ask to please link documentation if necessary or at least describe answers in a google-friendly way.
Is the removal of GUID is the major cause of possible shrink to 90mb ? Do you not need GUID to process the Report?
Do you strip the relationship and join almost all table into as few table as possible when creating the staging table?
If answer to number 1 and 2 is yes then you do not need GUID and simply need to have a int unique column.
I suggest in select command during creating/inserting staging table you use ROW_NUMBER for replacing the GUID column with int unique column. This is only going to work if you recreating the staging table each time running the SSIS Script.
If you are simply inserting data to an already existing Staging Table when running SSIS Script then you can just create an autoincrement primary column. When you insert data to Staging Table, do not insert to autoincrement primary column so the column is automatically generating unique int value.

Change Primary keys from int to guid in existing database

I have taken over a project with an existing SQL server installation. The client wants to move everything to the azure SQL and make several on premises databases sync to azure.
The PK's in the tables are int's and for the Azure datasync to work PK's needs to be guid's. the database consists of several related tables.
My question is therefore. What is the best way to change the PK's to guids and at the same time update the FK's accordingly in existing tables.
The process as far as I see it:
1. Create new guid column
2. fill it with ID's.
3. change the PK to the guid column
4. update data to new guids in the FK tables.
Is there an easy scriptable way to make this magically happen?
No there is nothing built into SQL Server that makes this any easier than the process you described already.

What is the suitable data type for DB replication?

I'm creating a DB using SQL Server 2008.
This DB will be used in two countries and at some time (every day) they will be synchronized, I'll use the Replication service to accomplish that.
Most of the tables are using an Int column with Identity increment. But the tables will be empty when deployed so both countries will have a row with identity 1, 2, and son. I've never use replication before so I wanna know if there will be an error when the tables are synchronized?
Should I use a GUID data type instead?
Replicate Identity Columns (MSDN):
Replication offers three identity range management options:
Automatic. Used for merge replication and transactional replication with updates at the Subscriber...
Manual. Used for snapshot and transactional replication without updates at the Subscriber...
None. This option is recommended only for backwards compatibility...
So, yes, you can continue to use IDENTITY, provided you read through the information on replication and choose an option that makes sense for you.
Under Automatic, what it does is each server grabs a range of usable identity values and hands the individual values out as needed. Provided synchronization occurs often enough so that the ranges aren't completely exhausted, you'll never notice this detail.
And this allows you to scale out later as needed - as opposed to e.g. a MOD scheme where one server hands out odd values and the other even - you can't easily add a third server to such a scheme.
By your description, it sounds like you want to implement so called Merge replication.
In SQL Server you would not need to change the identity to a GUID, however, if you don't SQL server will automatically add another column called rowguid for each table and you may end up with duplicates of your original identity column. To circumvent this, you could have the servers assign mod 2 IDs.
In my opinion it makes most sense to use a GUID for the IDs altogether. Don't forget to set the ROWGUIDCOL property on your identity columns. Good luck.
Relevant MSDN:
http://technet.microsoft.com/en-us/library/ms152746.aspx
Consider adding a deviceID field to all tables users can update. With each device making changes using its own ID as part of the PK, there cannot be conflicts across devices.

Vs2010 Data Generation Plan fails with "Data generation failed because of the following exception: Column "xyz" does not allow DBNull.Value"

I'm fairly new to Vs Data capabilities, and this is my first data generation plan. I have implemented a database using a Vs2010 database project, and used it to deploy to a sql server express 2008 database. All the tables use identity columns as their primary keys, and they're related to one another with foreign keys.
I set up a data generation plan, but when I try to generate data with it, the tables are simply populated in alphabetical order, which is of course going to fail. The only tables that populate correctly are the lookup tables and other sorts of independent entities with no FK constraints. The rest are skipped after the first table fails.
Supposedly the generation plan determines the population order based on FK dependencies. What happened?
edit: someone with the rep for it should make a visual-studio-data-tools tag, since DBPro is no longer (nor really ever was) a product name.
So apparently according to this thread the data generation plan blows up when you have a table containing only a primary key and no other columns. It turns out that one of my independent entities, whose only purpose is to serve as a joinder to one of my other tables, fit this description. After adding a harmless Description column, I was able to proceed fixing other problems until the generation plan completed successfully.

Primary Keys in Oracle and SQL Server

What's the best practice for handling primary keys using an ORM over Oracle or SQL Server?
Oracle - Should I use a sequence and a trigger or let the ORM handle this? Or is there some other way ?
SQL Server - Should I use the identifier data type or somehow else ?
If you are using any kind of ORM, I would suggest you to let it handle your primary keys generation. In SQL Server and Oracle.
With either database, I would use a client-generated Guid for the primary key (which would map to uniqueidentifier in SQL Server, or RAW(20) in Oracle). Despite the performance penalty on JOINs when using a Guid foreign key, I tend to work with disconnected clients and replicated databases, so being able to generate unique IDs on the client is a must. Guid IDs also have advantages when working with an ORM, as they simplify your life considerably.
It is a good idea to remember that databases tend to have a life independent from a front end application. Records can be inserted by batch processes, web services, data exchange with other databases, heck, even different applications sharing the same database.
Consequently it is useful if a database table is in charge of its own identify, or at least has that capability. For instance, in Oracle a BEFORE INSERT trigger can check whether a value has been provided for its primary key, and if not generate its own.
Both Oracle and SQL Server can generate GUIDs, so that is not a sufficient reason for delegating identity generation to the client.
Sometimes, there is a natural, unique identifier for a table. For instance, each row in a User table can be uniquely identified by the UserName column. In that case, it may be best to use UserName as the primary key.
Also, consider tables used to form a many to many relationship. A UserGroupMembership table will contain UserId and GroupId columns, which should be the primary key, as the combination uniquely identifies the fact that a particular user is a member of a particular group.

Resources