SqlBulkCopy with relationships - database

I need to copy data from one database to another, and I guess SqlBulkCopy will be the best option.
The problem is that my Product table has 5 more tables connected to it by the product ID.
For example, Product is connected with Product_picture (pictureID, productID).
Is there any way to copy Product and all of the connected tables, keeping the IDs (identity values), from one database to another?

Invoke bulk copy multiple times, once per table. None of this is automated in the product, so you have to build it yourself.
Also look into table-valued parameters. They are much more flexible and offer reasonable performance for bulk jobs; they are probably fast enough.
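A rough sketch of the table-valued parameter route on SQL Server 2008 or later (the type name, procedure name, and the Name column are made up; the question only names Product and Product_picture). The original identity values are preserved with IDENTITY_INSERT so the child tables still line up:

    -- Hypothetical table type mirroring the rows to be copied
    CREATE TYPE dbo.ProductImport AS TABLE
    (
        ProductID int           NOT NULL,
        Name      nvarchar(200) NOT NULL
    );
    GO

    -- Loads a whole batch while keeping the original ProductID values
    CREATE PROCEDURE dbo.ImportProducts
        @Rows dbo.ProductImport READONLY
    AS
    BEGIN
        SET IDENTITY_INSERT dbo.Product ON;

        INSERT INTO dbo.Product (ProductID, Name)
        SELECT ProductID, Name FROM @Rows;

        SET IDENTITY_INSERT dbo.Product OFF;
    END;
    GO

Because the ProductID values survive the copy, dependent tables such as Product_picture can be loaded afterwards with their productID column unchanged.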

Related

Structuring the data warehouse?

I need some advice on structuring my DW. I have some experience in creating a DW (using Pentaho as the BI server): I made an array of scheduled database queries which create 4 dimension tables and 1 fact table (for sales reports).
Now there is a need to expand the DW (for import, warehouse, and logistics reports), so my question is: do I have to make more fact tables for each department (and dimension tables for those), or is there some other structuring model?
Of course, this time it will be done using an ETL tool, but I need general advice.
Thanks,
Stevan
A fact table represents a process or events that you want to analyze.
If there is not a new process or event that you want to analyze, then you do not need a new fact table.
Perhaps you could give us some more details about what you are analyzing...
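As a rough sketch of what that usually looks like (all table and column names here are made up), each process gets its own fact table, while the dimensions are conformed and shared between them:

    -- Existing fact table for the sales process
    CREATE TABLE fact_sales (
        date_key     int NOT NULL,
        product_key  int NOT NULL,
        customer_key int NOT NULL,
        sales_amount decimal(18, 2) NOT NULL
    );

    -- New fact table for a logistics/shipping process,
    -- reusing the same (conformed) date and product dimensions
    CREATE TABLE fact_shipments (
        date_key         int NOT NULL,
        product_key      int NOT NULL,
        warehouse_key    int NOT NULL,
        quantity_shipped int NOT NULL
    );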

PostgreSQL - one database for everyone, or one database per customer

I'm working on a web-based business application where each customer will need to have their own data (think of the basecamphq.com type model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a WHERE clause to every query based on a login ID, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a filtering view of each table for each customer)?
Worst-case scenario, what is the performance/memory overhead of having 1000 100 MB databases vs. a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple when each customer has their own database but quite a bit trickier if they are sharing the database with other customers.
You might want to look into adding Veil to your PostgreSQL installation.
Schemas plus inherited tables might work for this: create your master table, then inherit tables into per-customer schemas that provide a company ID or name field default.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
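A minimal sketch of that setup, assuming a master table called orders and a customer called acme (all names and the password are placeholders):

    -- Master table in the public schema
    CREATE TABLE public.orders (
        order_id   serial PRIMARY KEY,
        company    text NOT NULL,
        order_date date NOT NULL
    );

    -- Per-customer schema with an inherited table that defaults the company field
    CREATE SCHEMA acme;
    CREATE TABLE acme.orders (
        company text NOT NULL DEFAULT 'acme'
    ) INHERITS (public.orders);

    -- Sandbox the customer's role to its own schema
    CREATE ROLE acme_user LOGIN PASSWORD 'changeme';
    GRANT USAGE ON SCHEMA acme TO acme_user;
    GRANT SELECT, INSERT, UPDATE, DELETE ON acme.orders TO acme_user;
    ALTER ROLE acme_user SET search_path = acme;

Queries written as plain SELECT ... FROM orders then resolve to the customer's own schema, while cross-customer reporting can still go through public.orders.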

Grouping ETL Staging Tables With User Schemas?

I was thinking of putting staging tables and the stored procedures that update those tables into their own schema, such that when importing data from SomeTable into the data warehouse, I would run an Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, and so on.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemas to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough; it doesn't really matter. You could use another database if you liked, which is actually what we do.
I'm not sure why you would want a validation schema though, what are you going to do there?
Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data into the first schema, then when it's validated "swap" the partition into dbo, after first swapping the dbo partition into a "dumpster" schema copy of the table. Net production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction.
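A rough illustration of that swap, assuming identically structured and identically partitioned tables staging.Sales, dbo.Sales, and dumpster.Sales (all hypothetical names):

    BEGIN TRANSACTION;

    -- Move the current production data out of the way
    -- (the target partition must be empty before each switch)
    ALTER TABLE dbo.Sales     SWITCH PARTITION 1 TO dumpster.Sales PARTITION 1;

    -- Swap the freshly validated data into production
    ALTER TABLE staging.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1;

    COMMIT TRANSACTION;

The switches are metadata-only operations, which is why the downtime is measured in seconds.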

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also to host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All of our tables' primary keys are INT identity TableID fields.
I'm looking for a tool/script that would help me extract client data from one database and move it to a brand-new database (so the PKs can stay the same). I'm hoping this already exists so we don't have to build our own. We're pretty flexible in how this could work, but ideally it would just generate a large .sql file with all the necessary INSERTs in the right order to move the data, and another .sql file with all the necessary DELETEs to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
If you have Standard or Enterprise edition, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recommend Red Gate SQL Data Compare for this; we use it for syncing data, and their SQL Compare to sync the database schema.
Both tools can either output SQL that you can execute yourself, or execute the SQL scripts themselves.
There are command-line versions of the tools too, so you could use them in a deployment script, though I haven't tried this.
They both work really well, and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have some type of unique identifier for all your records and that you can avoid collisions with identity keys / integer-based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that the operation of transferring data between databases was not that frequent and that you needed a database architecture for that task. I would use the GUID as a lookup key, specifically for validating the transfer of data, but I would NOT use it as a primary key for joins in standard operations such as URLs. Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDs would exist in addition to your current primary keys and act as a means of validation for you should something go wrong. If you need ClientID in database A to retain the same value in database B, then an identity column as that identifier will be an issue. You may have to create another identifier that is not "auto-generated". This could be something other than a GUID, but my instinct is that integers alone will not be enough. Maybe you can create a column that is a hash of the identity key, customer name, and database name, or more simply, just concatenate those columns into a varchar column.
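A minimal sketch of the idea (table and column names are made up): add a GUID that travels with each row, then use it to verify the copy afterwards.

    -- Add a migration identifier that is unique across databases
    ALTER TABLE dbo.Client
        ADD MigrationGuid uniqueidentifier NOT NULL
            CONSTRAINT DF_Client_MigrationGuid DEFAULT NEWID();

    -- After the move, any rows that did not make it can be found by that key
    -- (assumes both databases are reachable from the same instance)
    SELECT s.MigrationGuid
    FROM SourceDB.dbo.Client AS s
    LEFT JOIN TargetDB.dbo.Client AS t
        ON t.MigrationGuid = s.MigrationGuid
    WHERE t.MigrationGuid IS NULL;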

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsoft's SQL Server Integration Services (SSIS). Now I have gotten to the point where I want to split a table vertically for normalization reasons.
A made-up example looks like this:
MS Access table person:
    ID
    Name
    Street

SQL Server table person:
    id
    name

SQL Server table address:
    id
    person_id
    street
How can I best complete this task with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID values. How can I put the correct person_id foreign key into the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to a row in table1 and a row in table3 belongs to a row in table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need an automated and repeatable process, because the production database is under heavy use and we are working on the migration in our development environment with recent, but not up-to-date, data. We plan one test run of the migration and will have the customer review the behaviour. If everything is fine, we will go ahead with the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the Execute SQL Task and write the statement yourself.
For the parent table, do a SELECT INTO table FROM table..., then do the same for the rest as you progress. Make sure you set IDENTITY_INSERT to ON for the parent table and reuse your old IDs; that will help you keep your data integrity.
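A minimal sketch of such a statement, assuming the Access data has already been landed in a staging table (all names are made up, and an INSERT ... SELECT is used so that IDENTITY_INSERT applies):

    -- Keep the original person IDs when loading the parent table
    SET IDENTITY_INSERT dbo.person ON;

    INSERT INTO dbo.person (id, name)
    SELECT ID, Name
    FROM staging.access_person;

    SET IDENTITY_INSERT dbo.person OFF;

    -- Because the old IDs survived, the child table can reference them directly
    INSERT INTO dbo.address (person_id, street)
    SELECT ID, Street
    FROM staging.access_person;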
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically; at least I couldn't find any. It's not too difficult to do manually, although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it: I know your example was just an example, but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise, but I've seen people go overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution that has not been mentioned yet and that lets us use all the comfort and options of the data flow task:
If the destination database is on a local SQL Server, you can use a data flow task with a SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can check the "keep identities" option. (I do not know if the English names are correct, because we have a German version.) With this you can write into identity columns.
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with the following columns:
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_id values for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
This lets us use a join to translate from old_id to new_id wherever needed. We use the new_id values to fill the identity (primary key) columns in the new tables via the "keep identities" option, and for the foreign keys we can simply look them up in our mapping table with a join.
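A rough sketch of the mapping table and the translation join (table names are made up; in our case the actual loading runs through the data flow task with "keep identities" rather than through plain INSERTs):

    -- Temporary mapping from old per-table IDs to new surrogate IDs
    CREATE TABLE dbo.id_mapping (
        new_id        int IDENTITY(1, 1) PRIMARY KEY,
        old_id        int          NOT NULL,
        old_tablename varchar(128) NOT NULL
    );

    -- Register every old key that is referenced by a foreign key in the new schema
    INSERT INTO dbo.id_mapping (old_id, old_tablename)
    SELECT ID, 'person'
    FROM staging.access_person;

    -- Translate the foreign key while loading a child table
    INSERT INTO dbo.address (person_id, street)
    SELECT m.new_id, p.Street
    FROM staging.access_person AS p
    JOIN dbo.id_mapping AS m
      ON m.old_id = p.ID
     AND m.old_tablename = 'person';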
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.
