Vs2010 Data Generation Plan fails with "Data generation failed because of the following exception: Column "xyz" does not allow DBNull.Value" - database

I'm fairly new to the VS data capabilities, and this is my first data generation plan. I have implemented a database using a VS2010 database project and deployed it to a SQL Server Express 2008 database. All the tables use identity columns as their primary keys, and they're related to one another with foreign keys.
I set up a data generation plan, but when I try to generate data with it, the tables are simply populated in alphabetical order, which is of course going to fail. The only tables that populate correctly are the lookup tables and other sorts of independent entities with no FK constraints. The rest are skipped after the first table fails.
Supposedly the generation plan determines the population order based on FK dependencies. What happened?
edit: someone with the rep for it should make a visual-studio-data-tools tag, since DBPro is no longer (nor really ever was) a product name.

So apparently according to this thread the data generation plan blows up when you have a table containing only a primary key and no other columns. It turns out that one of my independent entities, whose only purpose is to serve as a joinder to one of my other tables, fit this description. After adding a harmless Description column, I was able to proceed fixing other problems until the generation plan completed successfully.
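For reference, the workaround amounts to a one-line schema change. This is a minimal sketch, assuming a hypothetical table named dbo.JoinEntity whose only column is its identity primary key (the post does not name the real table, and the column size is arbitrary):

```sql
-- Hypothetical table; the original post does not name it.
-- Before the change it has nothing but an identity primary key:
--   CREATE TABLE dbo.JoinEntity (JoinEntityId INT IDENTITY(1,1) PRIMARY KEY);

-- Add a nullable filler column so the generator has a non-key column to populate.
ALTER TABLE dbo.JoinEntity
    ADD Description NVARCHAR(100) NULL;
```

In a database project you would normally make the same change in the table's script file and redeploy rather than run the ALTER by hand.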

Related

Data Warehousing GUID to Int PrimaryKeys

I'm a (very) junior Analyst responsible for setting up an mssql DWH which hosts data from our CRM for reporting purposes.
The current CRM uses uniqueidentifiers in its mssql database for all keys, and some of the tables have 8m+ rows. In our reporting software (Qlikview) I can swap the GUIDs for ints and take an 800mb data file down to 90mb which is excellent, however I'd like to perform this logic in the DWH if possible to make it faster and a little cleaner.
My issue is that I have no idea how to do so while maintaining FK links to other tables. I have considered maintaining a staging table of GUIDs and associated numeric IDs; however, this seems inefficient, and it leaves me writing some arbitrary numeric ID to the PK column of the destination table, which I'm sure is a terrible idea.
The DWH import works as follows: I have USPs on the source db performing SELECTs which are executed by a SSIS package, the output of which are placed in tables of the same name on the [Staging] schema of the DWH. From there, transform is performed by USPs on the DWH, also executed by the same SSIS package, which handles execution order and multi-threading. Whatever implementation I come up with will need to be compatible with this architecture (done within USPs that potentially run asynchronously).
I'm very much a SQL noob so I do ask to please link documentation if necessary or at least describe answers in a google-friendly way.
1. Is removing the GUIDs the main reason the file shrinks to 90 MB? Do you not need the GUIDs to produce the reports?
2. Do you strip the relationships and join almost all tables into as few tables as possible when creating the staging tables?
If the answer to both 1 and 2 is yes, then you do not need the GUIDs and simply need a unique int column.
I suggest that, in the SELECT that creates or fills the staging table, you use ROW_NUMBER to replace the GUID column with a unique int column. This only works if you recreate the staging table each time the SSIS package runs.
If you are instead inserting into an already existing staging table each time the SSIS package runs, you can create an auto-increment (IDENTITY) primary key column. When you insert into the staging table, leave that column out of the insert so that it generates a unique int value automatically.
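One way to keep foreign key links while swapping GUIDs for ints is a persistent key map with an identity column, which combines the staging-table idea from the question with the auto-increment suggestion above. The following is only a sketch: the table and column names (Staging.Account, Staging.Contact, AccountGuid, ContactName, dbo.AccountKeyMap) are assumptions, not names from the original post, and it assumes AccountGuid is unique in Staging.Account.

```sql
-- Assumed names: Staging.Account holds raw rows keyed by AccountGuid;
-- Staging.Contact is a child table carrying AccountGuid as a foreign key.

-- One row per GUID; SQL Server assigns AccountKey automatically.
-- Created once, not on every load.
CREATE TABLE dbo.AccountKeyMap (
    AccountKey  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AccountGuid UNIQUEIDENTIFIER  NOT NULL UNIQUE
);

-- Register GUIDs not seen before. AccountKey is left out of the column
-- list so the identity property fills it in.
INSERT INTO dbo.AccountKeyMap (AccountGuid)
SELECT s.AccountGuid
FROM Staging.Account AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.AccountKeyMap AS m
    WHERE m.AccountGuid = s.AccountGuid
);

-- Translate the foreign key in a child table by joining through the map.
SELECT c.ContactName, m.AccountKey
FROM Staging.Contact AS c
JOIN dbo.AccountKeyMap AS m
    ON m.AccountGuid = c.AccountGuid;
```

Because the map is persistent, a given GUID keeps the same int across loads. If the transform procedures run in parallel, the step that fills the map would need to finish before any step that joins through it.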

How to turn off foreign key constraints permanently on SQL database?

Running EXEC sp_msforeachtable @command1="ALTER TABLE ? NOCHECK CONSTRAINT ALL" will disable foreign keys on existing tables.
What if tables are created, and insert queries subject to foreign key constraints are run, after this query?
I am encountering this issue during build automation, and what I am ideally looking for is a permanent switch to disable all constraints on the database (I can do that since the database is created as part of the build process).
NOTE: See the 5 steps listed near the end for an idea of the issue faced during build automation.
I have created a build step that runs before the scripts are processed and disables all existing foreign key constraints. The next step packages and runs all release SQL scripts, which may create tables and insert data. The earlier build step that disables constraints has no knowledge of the forthcoming tables, so the foreign key constraints on them are still enforced when the insert scripts run, failing my build process.
Is there a way I can set a flag in the database to stop checking foreign keys?
Adding some more context on what I am doing specifically: I am automating the build using Bamboo, and the following steps are performed at a high level:
1. locate last available deployed db schema
2. build a database using the schema generated script (no master data copied).
3. disable all foreign keys (unable to disable FK for tables yet to be created in next step)
4. merge all release specific db scripts(may contain new db and insert scripts)
5. apply other transformations like running codegeneration, script compare, delta finding etc.
Step 3 is the challenge.
Note: This is automating a legacy system with 300ish master data tables and their data. Since CodeSmith tools are used, schema changes have to be detected and the auto-generated code has to be checked against the last deployed schema. Since the master data is so huge, keeping a reference DB with data for build purposes is out of the question, hence the referential integrity constraint issue is more prominent.
The only thing I can think of is to create a DDL trigger which listens for constraints' creation and, if any are detected, drops them. However, I'm not sure this approach is viable if a constraint is created as a part of the create table statement. You should test it thoroughly before using.
Personally, however, I usually solve this by properly ordering the sequence in which the data is inserted. It's much safer, not prohibitively difficult and, last but not least, always possible to do.
Your basic problem is that the database migrations that create your database are running in the wrong order. Adjust the order of table creation and data insertion so that, at any one time, only data that references already existing data is inserted.
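To work out a safe order, it helps to list which tables reference which. A minimal sketch using the sys.foreign_keys catalog view; nothing in it is specific to the poster's schema:

```sql
-- Lists each foreign key as (child table -> parent table).
-- Parent tables must be created and populated before their children.
SELECT
    OBJECT_SCHEMA_NAME(fk.parent_object_id)     AS child_schema,
    OBJECT_NAME(fk.parent_object_id)            AS child_table,
    OBJECT_SCHEMA_NAME(fk.referenced_object_id) AS parent_schema,
    OBJECT_NAME(fk.referenced_object_id)        AS parent_table,
    fk.name                                     AS constraint_name
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id <> fk.referenced_object_id  -- ignore self-references
ORDER BY parent_table, child_table;
```

Tables that never appear in the child_table column have no outgoing foreign keys and can be loaded first.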
Turning all the constraints off at the start of each script that alters data, loading the data, and turning them all back on at the end is also an option, but you should separate the scripts that make schema changes from the scripts that load data, and run all the schema changes first.
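The off/load/on pattern for a single data script might look like the sketch below. dbo.OrderLine is a hypothetical table name; the table has to exist already when the script runs, which is why the schema scripts go first.

```sql
-- dbo.OrderLine is a hypothetical table that already exists at this point.

-- Stop checking the foreign key and check constraints defined on this table.
ALTER TABLE dbo.OrderLine NOCHECK CONSTRAINT ALL;

-- ... data-loading INSERT statements go here ...

-- Re-enable the constraints and validate the rows loaded in the meantime.
-- WITH CHECK makes this statement fail if any existing row violates a constraint.
ALTER TABLE dbo.OrderLine WITH CHECK CHECK CONSTRAINT ALL;
```

Re-enabling with WITH CHECK means bad data surfaces at the end of the script instead of staying in the table unnoticed.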

Updating Records In a Second Table Using Foreign Keys

I have a database in which two tables have a 1:1 relationship using foreign keys. Table one is called Manifest and table two is called Inventory. When an inventory record is added using the application this is built for it uses a foreign key to reference the matching record in the manifest table. In addition, this causes an update to a column in the manifest table for the matching record called Received (datatype: BIT) to 1. This is used for reconciliation and reporting purposes.
Now here is where it gets tricky: This database is synchronized to a server database using Sync Framework in a client-server relationship. The Manifest table is synchronized in one direction from server to client, and the Inventory table is synchronized from client to server. Because of this the "received" column in the Manifest table is not always updated accurately on the server-side after a sync.
I was thinking of creating a stored procedure to perform this update, but I'm a bit rusty on my SQL (and T-SQL). The SP I was thinking of using would use a CURSOR to locate any records in the Inventory table where the foreign key is NOT NULL (NULL is allowed because of exceptions where we receive something that was not in the manifest). The cursor would then allow me to iterate through all the records to locate the matching record in the Manifest table and update the "received" column. I know that this cannot be the best way to perform this update. Can anyone suggest another way of doing this that would be faster and use fewer resources? Examples would be appreciated =)
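A set-based UPDATE can usually replace this kind of cursor loop. The following is a sketch under assumptions: the key column names (Manifest.ManifestID, Inventory.ManifestID) and the dbo schema are not given in the question, and Received is assumed to be NOT NULL with 0 meaning "not yet received".

```sql
-- Assumed columns: Manifest.ManifestID (primary key),
-- Inventory.ManifestID (nullable foreign key), Manifest.Received (BIT, NOT NULL).
UPDATE m
SET    m.Received = 1
FROM   dbo.Manifest AS m
WHERE  m.Received = 0            -- touch only rows that still need the flag
  AND  EXISTS (
           SELECT 1
           FROM   dbo.Inventory AS i
           WHERE  i.ManifestID = m.ManifestID
       );
```

Inventory rows with a NULL foreign key never satisfy the equality, so they are skipped without an explicit IS NOT NULL test. The statement could be wrapped in a stored procedure and run on the server after each sync.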

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All our table's primary keys are INT identity TableID fields.
I'm looking for a tool/script that would help me extract client data from one database, and move it to a brand new database (so the PKs can stay the same). I'm hoping this exists already so we don't have to build our own. Pretty flexible in how this could work, but ideally it just generates a large .sql file with all the necessary INSERTS in the right order to move the data, and another sql file with all the necessary DELETES to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
If you have standard or enterprise, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recommend Redgate SQL Data Compare for this. We use it for syncing data, and we use their SQL Compare to sync the database schema.
Both tools can either output SQL that you execute yourself, or execute the SQL scripts themselves.
They also have command-line versions of the tools, so you could use them in a deployment script, though I haven't tried this.
They both work really well, and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have a unique identifier for all your records and that you avoid collisions between identity keys or integer-based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that the operation of transferring data between databases was not that frequent and that you needed database architecture for that task. I would use the GUID as a lookup key specifically for validating the transfer of data, but I would NOT use it as a primary key for joins in standard operations like URLs. Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDs would be in addition to your existing primary keys and act as a means of validation should something go wrong. If you need ClientID in Database A to retain the same value in Database B, then an identity column as that identifier will be an issue. You may have to create another identifier that is not auto-generated. This could be something other than a GUID, but my instinct is that integers alone will not be enough. Maybe you can create a column that is a hash of the identity key, customer name and database name, or more simply, just concatenate those columns into a varchar column.

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server Database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsoft's SQL Server Integration Services (SSIS). Now I have gotten to the point where I want to split a table vertically for normalization reasons.
A made-up example looks like this:
MS Access table person: ID, Name, Street
SQL Server table person: id, name
SQL Server table address: id, person_id, street
How can I complete this task best with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID. How can I put the correct person_id foreign key in the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to a row in table1 and a row in table3 belongs to a row in table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need to have an automated and repeatable process, because the production database is under heavy usage and we are working on the migration in our development environment with recent, but not up-to-date data. We plan for one test run of the migration and have the customer review the behaviour. If everything is fine, we will go for the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the execute SQL Task and write the statement yourself.
For the parent table, do an INSERT ... SELECT from the old table into the new one, then do the same for the rest as you progress. Make sure you set IDENTITY_INSERT to ON for the parent table and reuse your old IDs. That will help you keep your data integrity.
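Applied to the person/address example from the question, that could look like the sketch below. It assumes the Access table has first been copied unchanged into a SQL Server staging table, here called dbo.access_person (a hypothetical name), and that the new tables live in the dbo schema.

```sql
-- Assumed staging copy of the Access table: dbo.access_person (ID, Name, Street).

-- Allow explicit values in person.id so the old IDs can be reused.
SET IDENTITY_INSERT dbo.person ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.person (id, name)
SELECT ID, Name
FROM dbo.access_person;

SET IDENTITY_INSERT dbo.person OFF;

-- address.id is generated normally; person_id carries over the old ID.
INSERT INTO dbo.address (person_id, street)
SELECT ID, Street
FROM dbo.access_person
WHERE Street IS NOT NULL;
```

Only one table per session can have IDENTITY_INSERT set to ON at a time, so it is switched off again before moving to the next table. Since this is plain T-SQL, it can run unattended from an Execute SQL Task.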
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically; at least I couldn't find any. It's not too difficult to do manually, although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it: I know your example was just that but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise but I've seen people going overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution which was not mentioned yet and allows us to use all the comfort and options of the dataflow task:
If the destination database is on a local SQL Server, you can use a dataflow task with SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can check the "keep identities" option. (I do not know if the English names are correct, because we have a German version.) With this you can write into identity columns.
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with columns
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_id values for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
So we can use a join to translate from old_id to new_id where needed. We use the new_id values to fill the identity (primary key) columns in the new tables with the "keep identities" option and can simply look them up in our mapping table for the foreign keys by a join.
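In plain T-SQL the mapping-table approach could be sketched roughly as follows. The table name dbo.id_map, the staging table dbo.access_person and the column types are assumptions; only the three column names come from the description above. SET IDENTITY_INSERT stands in here for the "keep identities" option of the SSIS destination.

```sql
-- Mapping table; columns as described above, name and types assumed.
CREATE TABLE dbo.id_map (
    new_id        INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    old_id        INT               NOT NULL,
    old_tablename NVARCHAR(128)     NOT NULL,
    UNIQUE (old_tablename, old_id)
);

-- Register every old key; new_id is generated by SQL Server.
INSERT INTO dbo.id_map (old_id, old_tablename)
SELECT ID, N'person'
FROM dbo.access_person;   -- assumed staging copy of the Access table

-- Fill the new parent table using the generated keys.
SET IDENTITY_INSERT dbo.person ON;
INSERT INTO dbo.person (id, name)
SELECT m.new_id, p.Name
FROM dbo.access_person AS p
JOIN dbo.id_map AS m
    ON m.old_id = p.ID AND m.old_tablename = N'person';
SET IDENTITY_INSERT dbo.person OFF;

-- Resolve the foreign key through the same join.
INSERT INTO dbo.address (person_id, street)
SELECT m.new_id, p.Street
FROM dbo.access_person AS p
JOIN dbo.id_map AS m
    ON m.old_id = p.ID AND m.old_tablename = N'person';
```

Because new_id comes from a single identity sequence shared by all source tables, keys stay unique even when one new table takes a union of records from several old tables.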
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.
