Transfer several joined tables from one database to another using SSIS? - sql-server

I have several tables in my database A which are interconnected via foreign keys and contain values. These values need to be transferred to another database B; all dependencies must be preserved, but the actual (numeric) values of the primary and foreign keys are, of course, of no importance.
What would be the easiest way to fulfill this task using SSIS?
Here are the approaches I tried, but without much success:
I implemented a fairly sophisticated view with flattened data and a lot of redundancy in it, and ran into the problem of how to split the data from this flattened view back into several tables connected via foreign keys. This might be a solution, but I would personally prefer to avoid the data flattening step if possible.
I tried to copy the tables one-to-one using the NOCHECK option to disable constraint checks and allow insertion into the PK and FK fields. This, however, confines my transfer to a complete fresh import; I cannot just "add" new data to an existing set of data, which would be nice.
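For reference, the NOCHECK pattern I mean looks roughly like this (the table and constraint names are illustrative):

    USE B;
    GO
    -- Disable the FK check so rows can be inserted with their original key values
    ALTER TABLE dbo.OrderItems NOCHECK CONSTRAINT FK_OrderItems_Orders;

    -- ... bulk-insert into dbo.Orders and dbo.OrderItems with explicit PK/FK values ...

    -- Re-enable the constraint and re-validate the inserted rows (WITH CHECK)
    ALTER TABLE dbo.OrderItems WITH CHECK CHECK CONSTRAINT FK_OrderItems_Orders;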
Any other suggestions?

Integration Services has two Control Flow tasks, the Transfer Database Task and the Transfer SQL Server Objects Task, designed for exactly what you need.
Here is a tutorial for what you need: LINK.

Related

Data Warehousing GUID to Int PrimaryKeys

I'm a (very) junior analyst responsible for setting up an MS SQL DWH which hosts data from our CRM for reporting purposes.
The current CRM uses uniqueidentifiers in its MS SQL database for all keys, and some of the tables have 8M+ rows. In our reporting software (QlikView) I can swap the GUIDs for ints and take an 800 MB data file down to 90 MB, which is excellent; however, I'd like to perform this logic in the DWH if possible to make it faster and a little cleaner.
My issue is that I have no idea how to do so while maintaining the FK links to other tables. I have considered maintaining a staging table of GUIDs and associated numeric IDs; however, this seems inefficient and poses the problem of then trying to write some arbitrary numeric ID to the PK column of the destination table, which I'm sure is a terrible idea.
The DWH import works as follows: I have USPs on the source DB performing SELECTs which are executed by an SSIS package, the output of which is placed in tables of the same name on the [Staging] schema of the DWH. From there, the transform is performed by USPs on the DWH, also executed by the same SSIS package, which handles execution order and multi-threading. Whatever implementation I come up with will need to be compatible with this architecture (done within USPs that potentially run asynchronously).
I'm very much a SQL noob, so please link documentation if necessary, or at least describe answers in a Google-friendly way.
Is the removal of the GUIDs the major cause of the possible shrink to 90 MB? Do you not need the GUIDs to produce the report?
Do you strip the relationships and join almost all tables into as few tables as possible when creating the staging tables?
If the answer to both 1 and 2 is yes, then you do not need the GUIDs and simply need a unique int column.
I suggest that in the SELECT command that creates/populates the staging table you use ROW_NUMBER to replace the GUID column with a unique int column. This will only work if you recreate the staging table on each run of the SSIS package.
If you are instead inserting data into an already existing staging table when the SSIS package runs, then you can just create an auto-increment (identity) primary key column. When you insert data into the staging table, do not insert into the identity column, so that the column automatically generates unique int values.
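A minimal sketch of both options, assuming a CRM source table Crm.Contact keyed by a uniqueidentifier (all names here are illustrative):

    -- Option 1: rebuild the staging table each run, generating an int key
    SELECT
        ROW_NUMBER() OVER (ORDER BY c.ContactGuid) AS ContactId,  -- surrogate int key
        c.FirstName,
        c.LastName
    INTO Staging.Contact
    FROM Crm.Contact AS c;

    -- Option 2 (instead of the above): a persistent staging table with an identity column
    CREATE TABLE Staging.Contact
    (
        ContactId INT IDENTITY(1,1) PRIMARY KEY,  -- auto-generated unique int
        FirstName NVARCHAR(100),
        LastName  NVARCHAR(100)
    );

    INSERT INTO Staging.Contact (FirstName, LastName)  -- identity column omitted
    SELECT FirstName, LastName
    FROM Crm.Contact;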

Better practice for SQL? One database for shared resources or tables in each Database with those resources

I have shared resources across all of my databases: Users, Companies, etc. These are shared between all of my databases and the tables are the same. I want to create one database for these tables and have all of my databases reference it instead of having multiple copies of the same tables. I come from a C# background and I am not very proficient in SQL. I am writing a new application that uses several of the databases we have.
Question: Should I make one database the authoritative source for these resources? The problem I see is that I need foreign key relationships between databases, and without triggers this is not possible. Not to mention that when I write my LINQ statements I cannot query by these items.
We were able to achieve this by having one central database as the source of truth, then having copies of the applicable tables pushed out via triggers to all the databases that needed them. You have to make sure all CRUD is done against the source-of-truth database; otherwise it gets very complicated to manage everything. You can then create the foreign keys against the copy tables.
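A minimal sketch of one such trigger, assuming a Users table in the source-of-truth database and a copy table in AppDB (all names illustrative); UPDATE and DELETE need matching triggers:

    -- On the source-of-truth table: push new rows out to the copy in AppDB
    CREATE TRIGGER trg_Users_CopyOut
    ON dbo.Users
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO AppDB.dbo.Users (UserId, UserName)
        SELECT UserId, UserName
        FROM inserted;
    END;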

Separating weakly linked database schemas

I've been tasked with revisiting a database schema we designed and use internally for various ticketing and reporting systems. Currently there exist about 40 tables in one Oracle database schema supporting perhaps six webapps.
However, there's one unifying relationship amongst them all: a rooms table describing each room. Room name, purpose, and other data are thrown into a shared table used by every app. My initial idea was to pull each of these applications into a separate database and perform joins between a given database and the rooms database. But I've discovered this solution prevents foreign key constraints in SQL Server 2005. It seems silly to duplicate one table for each app and keep those multiple copies synchronized.
Should I just leave everything in one large DB, or is there something else I can do to separate the tables without losing FK constraints?
The only way to achieve built-in referential integrity is to have the table inside the database in which it is referenced. You might be able to achieve the equivalent of referential integrity using triggers, but it would likely be deathly slow.
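A sketch of what such a trigger might look like, assuming an app table dbo.Tickets referencing RoomsDB.dbo.Rooms (names illustrative):

    -- Reject inserts/updates that reference a room missing from RoomsDB
    CREATE TRIGGER trg_Tickets_CheckRoom
    ON dbo.Tickets
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        IF EXISTS (SELECT 1
                   FROM inserted AS i
                   LEFT JOIN RoomsDB.dbo.Rooms AS r ON r.RoomId = i.RoomId
                   WHERE r.RoomId IS NULL)
        BEGIN
            RAISERROR('RoomId does not exist in RoomsDB.dbo.Rooms', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;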
You might be able to use SQL Server replication, in its "Transactional replication" mode/form: http://msdn.microsoft.com/en-us/library/ms151176.aspx
If all the apps truly use and depend on the rooms, then keep them all in one DB.
You can still set privileges on the tables properly and manage the data sets in the non-overlapping areas normally.
Is there any task you imagine you will not be able to perform when things are kept together?

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also to host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All our tables' primary keys are INT identity TableID fields.
I'm looking for a tool/script that would help me extract client data from one database, and move it to a brand new database (so the PKs can stay the same). I'm hoping this exists already so we don't have to build our own. Pretty flexible in how this could work, but ideally it just generates a large .sql file with all the necessary INSERTS in the right order to move the data, and another sql file with all the necessary DELETES to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
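To make that concrete, the generated script for one client might look roughly like this (the schema and values are illustrative); since we want to keep the original identity PKs, it would rely on IDENTITY_INSERT:

    -- Parents first, children after, so the FKs resolve in the new database
    SET IDENTITY_INSERT dbo.Orders ON;
    INSERT INTO dbo.Orders (OrderID, ClientID, OrderDate)
    VALUES (1001, 42, '2010-05-01'), (1002, 42, '2010-05-03');
    SET IDENTITY_INSERT dbo.Orders OFF;

    SET IDENTITY_INSERT dbo.OrderItems ON;
    INSERT INTO dbo.OrderItems (OrderItemID, OrderID, Sku)
    VALUES (5001, 1001, 'ABC'), (5002, 1002, 'XYZ');
    SET IDENTITY_INSERT dbo.OrderItems OFF;

    -- ...and the matching DELETEs against the source, children first
    DELETE oi FROM dbo.OrderItems AS oi
    JOIN dbo.Orders AS o ON o.OrderID = oi.OrderID
    WHERE o.ClientID = 42;
    DELETE FROM dbo.Orders WHERE ClientID = 42;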
If you have Standard or Enterprise edition, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recommend Redgate SQL Data Compare for this; we use it for syncing data, and we use their SQL Compare to sync the database schema.
Both tools can either output SQL scripts that you can execute yourself, or execute the scripts themselves.
There are command-line versions of the tools too, so you could use them in a deployment script, though I haven't tried this.
They both work really well, and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have some type of unique identifier for all your records and that you can avoid collisions with identity keys / integer-based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that the operation of transferring data between databases was not that frequent and that you needed a database architecture for that task. I would use the GUID as a lookup key, specifically for validating the transfer of data, but I would NOT use it as a primary key for joins in standard operations like URLs. Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDs would exist in addition to your current primary keys and act as a means of validation should something go wrong. If you need ClientID in database A to retain the same value in database B, then an identity column as that identifier will be an issue. You may have to create another identifier that is not "auto-generated". This could be something other than a GUID, but my instinct is that integers alone will not be enough. Maybe you can create a column that is a hash of the identity key, customer name, and database name, or more simply, just concatenate those columns into a varchar column.
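A sketch of both ideas, assuming a Clients table with an identity PK (names illustrative):

    -- Add a stable cross-database identifier alongside the existing identity PK
    ALTER TABLE dbo.Clients
        ADD ClientGuid UNIQUEIDENTIFIER NOT NULL
            CONSTRAINT DF_Clients_ClientGuid DEFAULT NEWID();

    -- Or, more simply, a concatenated varchar key populated once
    ALTER TABLE dbo.Clients ADD MigrationKey VARCHAR(200) NULL;

    UPDATE dbo.Clients
    SET MigrationKey = CAST(ClientID AS VARCHAR(10)) + '|' + ClientName + '|' + DB_NAME();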

Using integration service, how can I copy a record, with all the records that the main record has a foreign key relationship to

I am working on creating the necessary views, triggers, and stored procedures to make it easier for people to use Integration Services to copy data to and from our database. The database is an entity-attribute-value schema, so the foreign key relationships are not always explicitly stated in the schema, but in my views I can hopefully make them more explicit.
So if I have a vehicle entity and I want to copy it, and have all the related parts of the vehicle also be copied, what should I be looking at with the service?
I am not very comfortable with Integration Services, so I may ask for some clarification after responses.
Thank you.
SSIS typically loads a single branch of a data flow into a table. A branch can split to load multiple tables.
I'd say it would be better to load into a staging table which always matches the required expectations for an entity, have the users build their dataflows to populate the staging table, and then use a single INSERT/UPDATE in an Execute SQL Task to update your view (via an INSTEAD OF trigger, right?).
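A sketch of the staging-plus-trigger idea, assuming a flattened dbo.VehicleView over generic Entity and AttributeValue tables (all names illustrative):

    -- INSTEAD OF trigger that splits rows inserted into the flattened view
    -- back into the underlying entity-attribute-value tables
    CREATE TRIGGER trg_VehicleView_Insert
    ON dbo.VehicleView
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Create the entity rows first...
        INSERT INTO dbo.Entity (EntityType, Name)
        SELECT 'Vehicle', i.VehicleName
        FROM inserted AS i;

        -- ...then the attribute/value rows keyed to the new entities
        INSERT INTO dbo.AttributeValue (EntityId, AttributeName, Value)
        SELECT e.EntityId, 'Color', i.Color
        FROM inserted AS i
        JOIN dbo.Entity AS e
            ON e.Name = i.VehicleName AND e.EntityType = 'Vehicle';
    END;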
Another good possibility is to create a custom data destination component which enforces all your expectations.
