Recently I've been trying to restructure an old database that was not designed with filegroups (just the default PRIMARY) and, among other things, move a bunch of tables to a new Data filegroup residing on a SAN. I know how to migrate the data:
ALTER TABLE MyTable
DROP CONSTRAINT PK_MyTable WITH (MOVE TO [MyDB_Data])
ALTER TABLE MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY CLUSTERED (MyID)
ON [MyDB_Data]
But damned if this isn't the most tedious work I've ever had to do. And it's error-prone. At one point I was halfway (I assume, since there's no progress indicator) through moving a 30 GB table before I realized that I had accidentally included one of the value columns in the PK. So I had to start all over again.
It's even worse when the table has a lot of dependencies. Then I can't just drop the primary key; I have to drop and recreate every foreign key that references it. This leads to hundreds of lines of boilerplate; multiply by 100 tables and it becomes downright asinine. My wrists hurt.
Has anybody come up with a shortcut for this? Are there maybe any tools out there (priced with the notion of one-time-use in mind) that can do it? Perhaps somebody here has had to go through this process before and wrote their own tool/script that they wouldn't mind sharing?
SSMS won't do it, obviously - it can only generate migration scripts for non-clustered indexes (and they have to be indexes, not UNIQUE constraints - on at least a few tables, for better or for worse, the clustered index is not actually the primary key, it's a different UNIQUE constraint).
It's not that the syntax is so complicated that I can't write a code gen for it. At least for the basic drop-and-recreate-the-primary-key part. But add in the overhead of figuring out all the dependencies and generating drop/recreate scripts for all the foreign keys and this starts to feel like it's just over that threshold where it's more work to automate and fully test than it is to just do every table manually as with the example above.
So, the question is: Can this process be automated in any reasonably straightforward way? Are there any alternatives to what I've written above?
Thanks!
The simplest way to do it, IMO, would be to use one of the schema comparison tools (my tool, Red Gate's SQL Compare, and ApexSQL Diff are a couple of examples) to create a script of your schema. Then edit that script so that it creates all the objects, empty, in the right filegroups. Having done that, you can use the same tools to compare your old database with the new one that has the correct filegroups, and they will generate the scripts to migrate the data for you. It's worth testing several of them to find which is the most appropriate for you.
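For the scripted route the question describes, here is a minimal sketch of a generator for the drop, move and recreate boilerplate. It assumes the primary key and foreign key metadata has already been read from the catalog (for example from sys.foreign_keys); the `ForeignKey` class and `move_clustered_pk` function are hypothetical names, and the sketch does not preserve ON DELETE / ON UPDATE actions or WITH NOCHECK state, which a real script would need to carry over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical container for one foreign key read from the catalog."""
    name: str
    child_table: str
    child_columns: tuple
    parent_table: str
    parent_columns: tuple


def _cols(columns):
    # Bracket-quote each column name and join with commas.
    return ", ".join(f"[{c}]" for c in columns)


def move_clustered_pk(table, pk_name, pk_columns, filegroup, referencing_fks):
    """Return T-SQL statements that move a table by rebuilding its clustered PK."""
    statements = []
    # Foreign keys that reference the PK must be dropped before the PK itself.
    for fk in referencing_fks:
        statements.append(
            f"ALTER TABLE [{fk.child_table}] DROP CONSTRAINT [{fk.name}];"
        )
    # Dropping the clustered PK with MOVE TO relocates the data rows.
    statements.append(
        f"ALTER TABLE [{table}] DROP CONSTRAINT [{pk_name}] "
        f"WITH (MOVE TO [{filegroup}]);"
    )
    # Recreate the PK on the new filegroup with the same columns.
    statements.append(
        f"ALTER TABLE [{table}] ADD CONSTRAINT [{pk_name}] "
        f"PRIMARY KEY CLUSTERED ({_cols(pk_columns)}) ON [{filegroup}];"
    )
    # Finally put the foreign keys back.
    for fk in referencing_fks:
        statements.append(
            f"ALTER TABLE [{fk.child_table}] ADD CONSTRAINT [{fk.name}] "
            f"FOREIGN KEY ({_cols(fk.child_columns)}) "
            f"REFERENCES [{fk.parent_table}] ({_cols(fk.parent_columns)});"
        )
    return statements


if __name__ == "__main__":
    fk = ForeignKey("FK_Child_MyTable", "Child", ("MyID",), "MyTable", ("MyID",))
    for line in move_clustered_pk("MyTable", "PK_MyTable", ["MyID"], "MyDB_Data", [fk]):
        print(line)
```

Because the PK columns come from the catalog rather than from typing, this kind of generator would also avoid the "wrong column in the PK" mistake mentioned in the question.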
I'm looking for a way to get a diff of two states (S1, S2) in a database (Oracle), to compare and see what has changed between these two states. Best would be to see what statements I would have to apply to the database in state one (S1) to transform it to state two (S2).
The two states are from the same database (schema) at different points in time (some small amount of time, not weeks).
I was thinking about doing something like a snapshot and compare, but how do I make the snapshots, and how do I compare them in the best way?
Edit: I'm looking for changes in the data (primarily) and if possible objects.
This is one of those questions which are easy to state, and it seems the solution should be equally simple. Alas it is not.
The starting point is the data dictionary. From ALL_TABLES you can generate a set of statements like this:
select * from t1@dbstate2
minus
select * from t1@dbstate1
This will give you the set of rows that have been added or amended in dbstate2. You also need:
select * from t1@dbstate1
minus
select * from t1@dbstate2
This will give you the set of rows that have been deleted or amended in dbstate2. The amended rows also show up in the first set, so the rows that appear only in this second set are the deleted ones.
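As a rough illustration of how the two MINUS results combine, here is a small sketch using Python sets in place of the queries. It assumes rows are tuples whose first element is a stable key shared by both states, which is exactly the assumption that breaks down for surrogate keys; `classify_changes` is a hypothetical helper name.

```python
def classify_changes(state1, state2, key_index=0):
    """Split the two set differences into inserted, updated and deleted rows."""
    s1, s2 = set(state1), set(state2)
    added_or_amended = s2 - s1      # state2 MINUS state1
    deleted_or_amended = s1 - s2    # state1 MINUS state2

    new_keys = {row[key_index] for row in added_or_amended}
    old_keys = {row[key_index] for row in deleted_or_amended}

    return {
        # Key exists only in state2: the row was inserted.
        "inserted": sorted(r for r in added_or_amended if r[key_index] not in old_keys),
        # Key exists in both differences: the row was amended.
        "updated": sorted(r for r in added_or_amended if r[key_index] in old_keys),
        # Key exists only in state1: the row was deleted.
        "deleted": sorted(r for r in deleted_or_amended if r[key_index] not in new_keys),
    }


if __name__ == "__main__":
    before = [(1, "a"), (2, "b"), (3, "c")]
    after = [(1, "a"), (2, "B"), (4, "d")]
    print(classify_changes(before, after))
```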
Except it's not that simple because:
When a table has a surrogate primary key (populated by a sequence) then the primary key for the same record might have a different value in each database. So you should exclude such primary keys from the sets, which means you need to generate tailored projections for each table using ALL_TAB_COLS and ALL_CONSTRAINTS, and you may have to use your skill and judgement to figure out which queries need to exclude the primary key.
Also, resolving foreign keys is problematic. If the foreign key is a surrogate key (or even if it isn't) you need to look up the referenced table to compare the meaning / description columns in the two databases. But of course, the reference data could have different state in the two databases, so you have to resolve that first.
Once you have a set of queries which identify the differences you are ready for the next stage: generating the statements that apply the changes. There are two choices here: generating a set of INSERT, UPDATE and DELETE statements or generating a set of MERGE statements. MERGE has the advantage of idempotency but is a gnarly thing to generate. Probably go for the easier option.
Remember:
For INSERT and UPDATE statements exclude columns which are populated by triggers or are generated (identity, virtual columns).
For INSERT and UPDATE statements you will need to join to referenced tables for populating foreign keys on the basis of description columns (unless you have already synchronised the primary key columns of all foreign key tables).
So this means you need to apply changes in the order dictated by foreign key dependencies.
For DELETE statements you need to cascade foreign key deletions.
You may consider dropping foreign keys and maybe other constraints, but then you may be in a right pickle when you come to re-apply them, only to discover you have constraint violations.
Use DML Error Logging to track errors in bulk operations.
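The ordering constraint in the list above amounts to a topological sort of the foreign key graph. A minimal sketch, assuming the dependencies have already been extracted from ALL_CONSTRAINTS into a dictionary (the table names here are made up) and that Python 3.9+ is available for `graphlib`:

```python
from graphlib import TopologicalSorter


def insert_order(references):
    """Order tables so that every referenced (parent) table comes first.

    `references` maps each table to the set of tables it references.
    Raises graphlib.CycleError if the foreign keys form a cycle.
    """
    return list(TopologicalSorter(references).static_order())


def delete_order(references):
    """Deletes run the other way round: children before their parents."""
    return list(reversed(insert_order(references)))


if __name__ == "__main__":
    # Hypothetical schema: order_items -> orders -> customers.
    fks = {"order_items": {"orders"}, "orders": {"customers"}}
    print(insert_order(fks))
    print(delete_order(fks))
```

Self-referencing or circular foreign keys would raise `CycleError` here and need separate handling, such as deferring the constraint.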
And if you need to manage changes to schema objects too? Oh boy. You need to align the data structures before you can even start on the data comparison. This is simpler than comparing the contents, because it just requires interrogating the data dictionary and generating DDL statements. Even so, you need to run minus queries on ALL_TABLES (perhaps even ALL_OBJECTS) to see whether tables have been added to or dropped from the target database. For tables which are present in both you need to query ALL_TAB_COLS to verify the columns: names, datatype, length and precision, and probably nullability too.
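The column check can be pictured as a dictionary comparison. This sketch assumes the ALL_TAB_COLS rows for each database have already been loaded into dictionaries keyed by (table, column); the `diff_columns` name and the sample metadata are illustrative only.

```python
def diff_columns(source, target):
    """Compare column metadata of two schemas.

    Both arguments map (table_name, column_name) to a tuple such as
    (data_type, length, nullable). `source` is the desired state and
    `target` is the schema that would be altered.
    """
    to_add = sorted(source.keys() - target.keys())      # missing from target
    to_drop = sorted(target.keys() - source.keys())     # no longer in source
    to_modify = sorted(
        key for key in source.keys() & target.keys() if source[key] != target[key]
    )
    return to_add, to_drop, to_modify


if __name__ == "__main__":
    s2 = {("T1", "ID"): ("NUMBER", 22, "N"), ("T1", "NAME"): ("VARCHAR2", 100, "Y")}
    s1 = {("T1", "ID"): ("NUMBER", 22, "N"), ("T1", "NAME"): ("VARCHAR2", 50, "Y"),
          ("T1", "OLD_FLAG"): ("CHAR", 1, "Y")}
    print(diff_columns(s2, s1))
```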
Just synchronising schema structures is sufficiently complex that Oracle sell the capability as a chargeable extra to the Enterprise Edition license, the Change Management Pack.
So, to confess. The above is a thought experiment. I have never done this. I doubt whether anybody ever has done this. For all but the most trivial of schemas generating DML to synchronise state is a monstrous exercise, which could take months to deliver (during which time the states of the two databases continue to diverge).
The straightforward solution? For a one-off exercise, Data Pump Export from S2, Data Pump Import into S1 using the table_exists_action=REPLACE option.
For ongoing data synchronisation Oracle offers a variety of replication solutions. Their recommended approach is GoldenGate but that's a separately licensed product so of course they recommend it :) Replication with Streams is deprecated in 12c but it's still there.
The solution for synchronising schema structure is simply not to need it: store all the DDL scripts in a source control repository and always deploy from there.
I have a tool which uses SQL scripts to apply changes to a customer database. Often this involves changing a column definition (datatype etc.). The problem is that often there are primary keys applied by the user that we don't know about (and they don't remember), which trips up the process (e.g. when changing columns belonging to the indexes or primary keys).
The requirement given to me is that this update process should be 'seamless', with no human involvement to prepare the ground. I have also researched this on this forum, and as far as I can see my particular question has not yet been asked.
I know how to disable and then later rebuild all indexes on a database, and even those only in certain tables, but if the index is on a primary key I still can't change any column that is part of the primary key unless I explicitly drop the PK by name, and later recreate it explicitly, which means I have to know about it at code-time. I can probably write a query to find the name of the primary key on a table if one is there, but how to know how to recreate it?
How can I, using Transact-SQL (or PL/SQL), detect, drop and then recreate the primary keys on given tables, without knowing at code time what they are or what columns belong to them? The key is that the tool cannot know in advance what the primary keys are on any given table, nor what they comprise. The SQL code must handle this itself.
Better still would be to detect if a known column belongs to a primary key, then drop and later recreate that after I have changed the column.
This needs to be done in both Oracle and SQL Server, ideally purely with SQL code.
TIA
I really don't understand why a customer would define their own primary keys for the tables. Moreover, I don't understand why you would let them. In my world, if a customer changes the schema in any way, this automatically means end of support for them.
I would strongly advise against dropping and recreating primary keys on a production database. Any number of bad things can happen, leading to data loss.
And it's not just the PKs; you will have to drop the foreign key constraints first. FKs may reference not only the PKs but unique constraints as well, so you have to deal with those too.
Your best bet would be to create a new table with the required schema, copy the data, drop the original table and rename the new one. Of course, you will have to handle the FKs, but it's easier. Check this link for an example:
http://sqlblog.com/blogs/john_paul_cook/archive/2009/09/17/script-to-create-all-foreign-keys.aspx
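The create, copy, drop and rename sequence can be sketched as follows. This uses Python's built-in sqlite3 module purely as a runnable stand-in; the statements differ in SQL Server (sp_rename) and Oracle (ALTER TABLE ... RENAME TO), and the `customer` table with its `code` column is a made-up example. Foreign key handling is left out.

```python
import sqlite3


def rebuild_customer(conn):
    """Change customer.code from TEXT to INTEGER by rebuilding the table."""
    conn.executescript(
        """
        BEGIN;
        -- 1. Create the new table with the required schema.
        CREATE TABLE customer_new (id INTEGER PRIMARY KEY, code INTEGER NOT NULL);
        -- 2. Copy the data, converting the changed column.
        INSERT INTO customer_new (id, code)
            SELECT id, CAST(code AS INTEGER) FROM customer;
        -- 3. Drop the original and 4. rename the new table into place.
        DROP TABLE customer;
        ALTER TABLE customer_new RENAME TO customer;
        COMMIT;
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, code TEXT);
        INSERT INTO customer VALUES (1, '42'), (2, '7');
        """
    )
    rebuild_customer(conn)
    print(conn.execute("SELECT id, code FROM customer ORDER BY id").fetchall())
```

The point of the pattern is that the tool never needs to know the name of the existing primary key: the old table is discarded together with whatever constraints the customer added to it.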
The two databases have identical schemas, but distinct data. It's possible there will be some duplication of rows, but it's sufficient for the merge to bail noisily and not do the update if duplicates are found, i.e., duplicates should be resolved manually.
Part of the problem is that there are a number of foreign key constraints in the databases in question. Also, there may be some columns which reference foreign keys which do not actually have foreign key constraints. These latter are due to performance issues on insertion. Also, we need to be able to map between the ids from the old databases and the IDs in the new database.
Obviously, we can write a bunch of code to handle this, but we are looking for a solution which is:
Less work
Less overhead on the machines doing the merge.
More reliable. If we have to write code it will need to go through testing, etc. and isn't guaranteed to be bug free
Obviously we are still searching the web and the PostgreSQL documentation for the answer, but what we've found so far has been unhelpful.
Update: One thing I clearly left out is that "duplicates" are clearly defined by unique constraints in the schema. We expect to restore the contents of one database, then restore the contents of a second. Errors during the second restore should be considered fatal to the second restore. The duplicates should then be removed from the second database and a new dump created. We want the IDs to be renumbered, but not the other unique constraints. It's possible, BTW, that there will be a third or even a fourth database to merge after the second.
There's no shortcut to writing a bunch of scripts… This cannot realistically be automated, since managing conflicts requires applying rules that will be specific to your data.
That said, you can reduce the odds of conflicts by removing duplicate surrogate keys…
Say your two databases have only two tables: A (id pkey) and B (id pkey, a_id references A(id)). In the first database, find max_a_id = max(A.id) and max_b_id = max(B.id).
In the second database:
Alter table B if needed so that a_id cascades updates (ON UPDATE CASCADE).
Disable triggers if any have side effects that might erroneously kick in.
Update A and set id = id + max_a_id, and the same kind of thing for B.
Export the data
Next, import this data into the first database, and update sequences accordingly.
You'll still need to be wary of overflows if IDs can end up larger than about 2.1 billion (the maximum of a 32-bit integer column), and of unique keys that might exist in both databases. But at least you won't need to worry about dup IDs.
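The renumbering step can be illustrated without a database. This sketch models rows of A and B as dictionaries and shifts the second database's IDs by the maxima found in the first; `shift_ids` is a hypothetical helper, and the overflow check assumes 32-bit integer ID columns.

```python
INT32_MAX = 2_147_483_647  # upper limit of a 32-bit signed integer column


def shift_ids(rows_a, rows_b, max_a_id, max_b_id):
    """Renumber the second database's rows so they cannot clash with the first.

    rows_a: dicts with an "id" key (table A).
    rows_b: dicts with "id" and "a_id" keys (table B, a_id references A.id).
    """
    shifted_a = [{**row, "id": row["id"] + max_a_id} for row in rows_a]
    shifted_b = [
        # B's own id moves by max_b_id; its reference to A moves by max_a_id.
        {**row, "id": row["id"] + max_b_id, "a_id": row["a_id"] + max_a_id}
        for row in rows_b
    ]
    largest = max((row["id"] for row in shifted_a + shifted_b), default=0)
    if largest > INT32_MAX:
        raise OverflowError("shifted id no longer fits in a 32-bit integer")
    return shifted_a, shifted_b


if __name__ == "__main__":
    a, b = shift_ids([{"id": 1}, {"id": 2}], [{"id": 1, "a_id": 2}], 10, 5)
    print(a, b)
```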
This is the sort of case I'd be looking into ETL tools like CloverETL, Pentaho Kettle or Talend Studio for.
I tend to agree with Denis that there aren't any real shortcuts to avoid dealing with the complexity of a data merge.
I am using LINQ to SQL (.dbml).
What is the best way to add a foreign key constraint to the database?
ALTER TABLE Staffs
ADD CONSTRAINT fk_Staffs FOREIGN KEY (UserId) REFERENCES Users(Id);
I can write this with no problem. But as the number of tables grows, I have a hard time maintaining the ADD CONSTRAINT scripts. Every time I make several changes to the database columns, I rack my brain updating those ALTER TABLE scripts.
Is there a simpler process for this? In the .dbml I can drag and drop an association to add the foreign key. Is there a way to export those foreign keys into a script like the one I wrote above? That would be handy for deployment.
Or must I write the ALTER script by hand and update it whenever the tables change?
Please advise.
You only need to do this once per database update that actually changes a FK relationship.
In the context of a database refactoring this is usually a small part of the whole job.
But if you don't like writing your own scripts, you can use the table designer in SQL Server Management Studio.
Right click table -> Design
Right click on the appropriate database column (one of the rows in the designer) -> Relationships
In the dialog, add a new relationship and select related tables and columns in the properties editor.
Done.
This is the right way to do it. You can also do it in the designer, as described in another answer here, but then when you have to promote from development to production you must do it all by hand, which is very tedious and can easily lead to errors.
A compromise is to make the changes in the designer and then, in SQL Server Management Studio, right click and select `Script object...`. Then you do not have to type as much.
You mention a change of table names. Well, that should not happen that often!
If it happens a lot, I advise you to create some naming conventions with your team about how to name your columns (and stick to them) and the amount of work will be limited.
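If the script does have to be maintained by hand, one option is to keep the associations as data and generate the ALTER TABLE statements from them. A minimal sketch, assuming a naming convention of FK_<child>_<parent> (my assumption, not something the .dbml enforces; it would clash if two foreign keys link the same pair of tables):

```python
def fk_script(associations):
    """Build ALTER TABLE ... ADD CONSTRAINT statements from association tuples.

    Each association is (child_table, child_column, parent_table, parent_column).
    """
    lines = []
    for child_table, child_column, parent_table, parent_column in associations:
        # Constraint name follows the assumed FK_<child>_<parent> convention.
        name = f"FK_{child_table}_{parent_table}"
        lines.append(
            f"ALTER TABLE {child_table} ADD CONSTRAINT {name} "
            f"FOREIGN KEY ({child_column}) REFERENCES {parent_table}({parent_column});"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(fk_script([("Staffs", "UserId", "Users", "Id")]))
```

With a consistent naming convention, a renamed column then means editing one tuple rather than hunting through a long script.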
I have the same database running on two different machines. The DBs make extensive use of Identity columns, and the tables have clashed pretty horribly. I now want to merge these two together before sorting out the underlying issue, which I may do by
A) Using GUIDs (unwieldy but works everywhere)
B) Assigning Identity ranges, kind of naff, but means you can still access records in order, knock up basic SQL and select records easily, and it identifies which machine originated the data.
My question is, what's the best way of re-keying (i.e. changing the primary keys) on one of the databases so the data no longer clashes? We're only looking at 6 tables in total, but lots of rows: ~2M in 3 of the tables.
Update - is there any real SQL code out there that does this? I know about Identity Insert etc. I've solved this issue in a number of inelegant ways before, and I was looking for the elegant solution, preferably with a nice T-SQL SP to do the donkey work. If that doesn't exist I'll code it up and place it on the wiki.
A simplistic way is to change all keys on one of the databases by a fixed increment, say 10,000,000, so that they no longer overlap. In order to do this, you will have to bring the applications down so the database is quiet and drop all FK references affected by this, recreating them when finished. You will also have to reset the seed value on all affected identity columns to an appropriate value.
Some of the tables will be reference data, which will be more complicated to merge if it is not in sync. You could possibly have issues with conflicting codes meaning the same thing on different instances or the same code having different meanings. This may or may not be an issue with your application but if the instances have been run without having this coordinated between them you might want to check carefully for this.
Also, data like names and addresses are very likely to be out of sync if there wasn't a canonical source for these. You may need to get these out, run a matching query and get the business to tidy up any exceptions.
I would add another column to the table first, populate that with the new Primary key.
Then I'd use update statements to update the new foreign key fields in all related tables.
Then you can drop the old Primary key and old foreign key fields.
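The first two steps can be sketched as below, again using sqlite3 only so that the example runs on its own. The `parent` and `child` tables are hypothetical, and the new key is populated with a fixed offset, which is one choice among several. The final step (dropping the old key columns and promoting the new ones) is omitted because its syntax differs a lot between products.

```python
import sqlite3


def add_new_keys(conn, offset):
    """Add and populate new key columns alongside the old ones."""
    conn.executescript(
        f"""
        -- Step 1: new key column on the parent, populated from the old key.
        ALTER TABLE parent ADD COLUMN new_id INTEGER;
        UPDATE parent SET new_id = id + {int(offset)};
        -- Step 2: new foreign key column on the child, looked up via the old key.
        ALTER TABLE child ADD COLUMN new_parent_id INTEGER;
        UPDATE child SET new_parent_id =
            (SELECT p.new_id FROM parent p WHERE p.id = child.parent_id);
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
        INSERT INTO parent VALUES (1, 'x'), (2, 'y');
        INSERT INTO child VALUES (10, 1), (11, 2);
        """
    )
    add_new_keys(conn, 10_000_000)
    print(conn.execute("SELECT id, parent_id, new_parent_id FROM child").fetchall())
```

Keeping the old columns in place until the new ones are verified means the mapping between old and new IDs stays queryable throughout the merge.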