I have a bunch of tables with a foreign key to some main table in an Oracle DB.
In each of these tables there are several rows for a given id value from the main table.
My task is, for a given id from the main table, to duplicate all records referencing that id in the other tables, but with a new id value as the reference to the main table.
I have already written a PL/SQL procedure which goes over each of my tables and does what is required in the most obvious and straightforward way. But I wonder: what would be a more elegant solution to this problem? Maybe writing some general procedure for the generic case and then calling it for each table from my main procedure, or maybe something even more effective can be done here?
A solution that "goes over each table" and that operates in an "obvious and straightforward way" would seem rather likely to be the most elegant solution, at least for a large number of problems.
If you are trying to build a generic utility, you could do something like query the data dictionary to get the foreign keys that relate to a particular table, walk the tree to find the child tables, grandchild tables, etc., and dynamically build the logic you want.
That's going to involve a bunch of dynamic SQL, though, which means that it will take longer to write the code, it will be harder to read and modify in the future, it will probably be slower, and it will be more fragile if you ever want to have different logic for different tables (for example, if you want to treat history or audit tables differently) or if you want to handle things that are related but not enforced by a foreign key. These may be acceptable trade-offs if you want to have similar logic that works for many different base tables, or if you're building an application that you want to work with arbitrary databases. They're probably not acceptable trade-offs if you're just trying to do something that will work with one or two base tables in a custom-built system.
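For reference, the first step of that dictionary walk might look something like the query below (a sketch: MAIN_TABLE is a placeholder, and you would re-run it against each child table to find the grandchildren):
-- every table with a foreign key pointing at MAIN_TABLE
SELECT c.table_name      AS child_table,
       c.constraint_name AS fk_name
  FROM user_constraints c
  JOIN user_constraints p
    ON p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'
   AND p.table_name = 'MAIN_TABLE';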
The following solution is the closest to what I needed; it is based on the answer to this question:
begin
  -- copy every row that references the old id, giving each copy a new
  -- primary key and pointing it at the new id;
  -- :old_fk_value and :new_fk_value are placeholders for the two ids
  FOR r IN (SELECT *
              FROM table_name
             WHERE fk_id = :old_fk_value)
  LOOP
    r.pk_id := pk_seq.NEXTVAL;
    r.fk_id := :new_fk_value;
    INSERT INTO table_name VALUES r;
  END LOOP;
end;
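For what it's worth, when the column list of a table is known, the loop can be collapsed into a single set-based statement. A sketch, with col1 and col2 standing in for the table's remaining columns:
INSERT INTO table_name (pk_id, fk_id, col1, col2)
SELECT pk_seq.NEXTVAL,  -- fresh surrogate key for each copied row
       :new_fk_value,   -- the new reference into the main table
       col1, col2
  FROM table_name
 WHERE fk_id = :old_fk_value;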
Related
I am looking for a solution to detect and delete all records of a table "UniqueKeys" which are no longer referenced by records in any other table. As my question seemed unclear, I have rephrased it.
Challenge:
If there is a table called "UniqueKeys", which consists of an ID and a uniqueIdentifier column, and there are dozens of tables which reference the ID field of the "UniqueKeys" table - and now there are some records in the "UniqueKeys" table whose IDs are not used in any of those other tables' references - then I want to be able to detect and delete them with a SQL query, without hard-coding the joins to all of these other tables.
The solutions I have found so far involve explicitly writing joins with each of the "other" tables, which is exactly what I want to avoid here.
Like this: Other SO answer
The goal: a generic solution, so that devs can add further foreign tables at any time and the solution is still able (without modification) to detect any references to table "X" (and avoid deleting the affected records).
I know that I could simply iterate programmatically (in the programming language of my choice) through all records of table "UniqueKeys" and use exception handling to continue whenever a given record cannot be deleted because of an active constraint.
This is what I am currently doing - and it yields the desired result - but imho this is a very ugly approach.
As I am no SQL expert, please tell me how to rephrase the above if that would help a better understanding of what I am trying to achieve.
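For what it's worth, the joins can at least be generated from the catalog rather than written by hand. A rough sketch (SQL Server flavour, guessed from the uniqueIdentifier column; the schema and table names are illustrative):
-- build one NOT EXISTS predicate per table referencing UniqueKeys(ID);
-- new referencing tables are picked up automatically
DECLARE @sql nvarchar(max) = N'DELETE FROM dbo.UniqueKeys WHERE 1 = 1';

SELECT @sql += N' AND NOT EXISTS (SELECT 1 FROM '
    + QUOTENAME(OBJECT_SCHEMA_NAME(fkc.parent_object_id)) + N'.'
    + QUOTENAME(OBJECT_NAME(fkc.parent_object_id)) + N' t WHERE t.'
    + QUOTENAME(COL_NAME(fkc.parent_object_id, fkc.parent_column_id))
    + N' = UniqueKeys.ID)'
  FROM sys.foreign_key_columns fkc
 WHERE fkc.referenced_object_id = OBJECT_ID(N'dbo.UniqueKeys');

EXEC sp_executesql @sql;
Note this only sees relationships that are declared as foreign key constraints; any undeclared references would still need hand-written predicates.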
I'm looking for a way to get a diff of two states (S1, S2) in a database (Oracle), to compare and see what has changed between these two states. Best would be to see what statements I would have to apply to the database in state one (S1) to transform it to state two (S2).
The two states are from the same database (schema) at different points in time (some small amount of time, not weeks).
I was thinking about doing something like a snapshot and compare - but how do I make the snapshots, and how do I compare them in the best way?
Edit: I'm looking for changes in the data (primarily) and if possible objects.
This is one of those questions which are easy to state, and it seems the solution should be equally simple. Alas it is not.
The starting point is the data dictionary. From ALL_TABLES you can generate a set of statements like this:
select * from t1#dbstate2
minus
select * from t1#dbstate1
This will give you the set of rows that have been added or amended in dbstate2. You also need:
select * from t1#dbstate1
minus
select * from t1#dbstate2
This will give you the set of rows that have been deleted or amended in dbstate2. Obviously the amended rows will also appear in the first set; it's the remainder, once those are accounted for, that gives you the deleted rows.
Except it's not that simple because:
When a table has a surrogate primary key (populated by a sequence), the primary key for the same record might have a different value in each database. So you should exclude such primary keys from the sets, which means you need to generate tailored projections for each table using ALL_TAB_COLS and ALL_CONSTRAINTS, and you may have to use your skill and judgement to figure out which queries need to exclude the primary key.
Also, resolving foreign keys is problematic. If the foreign key is a surrogate key (or even if it isn't) you need to look up the referenced table to compare the meaning / description columns in the two databases. But of course, the reference data could have different states in the two databases, so you have to resolve that first.
Once you have a set of queries which identify the differences, you are ready for the next stage: generating the statements to apply them. There are two choices here: generating a set of INSERT, UPDATE and DELETE statements, or generating a set of MERGE statements. MERGE has the advantage of idempotency but is a gnarly thing to generate. Probably go for the easier option.
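For illustration, here is what a MERGE might look like for one hypothetical table t1 (pk_id plus two data columns), run on the S1 database and pulling S2 through a database link. Note that MERGE alone will not remove rows that exist only in S1:
MERGE INTO t1 dst
USING (SELECT pk_id, col1, col2 FROM t1@dbstate2) src
   ON (dst.pk_id = src.pk_id)
 WHEN MATCHED THEN
   UPDATE SET dst.col1 = src.col1, dst.col2 = src.col2
 WHEN NOT MATCHED THEN
   INSERT (pk_id, col1, col2)
   VALUES (src.pk_id, src.col1, src.col2);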
Remember:
For INSERT and UPDATE statements exclude columns which are populated by triggers or are generated (identity, virtual columns).
For INSERT and UPDATE statements you will need to join to referenced tables for populating foreign keys on the basis of description columns (unless you have already synchronised the primary key columns of all foreign key tables).
So this means you need to apply changes in the order dictated by foreign key dependencies.
For DELETE statements you need to cascade foreign key deletions.
You may consider dropping foreign keys and maybe other constraints, but then you may be in a right pickle when you come to re-apply them, only to discover you have constraint violations.
Use DML Error Logging to track errors in bulk operations. Find out more.
What if you need to manage changes to schema objects too? Oh boy. You need to align the data structures before you can even start on the data comparison task. This is simpler than comparing the contents, because it just requires interrogating the data dictionary and generating DDL statements. Even so, you need to run MINUS queries on ALL_TABLES (perhaps even ALL_OBJECTS) to see whether tables have been added to or dropped from the target database. For tables which are present in both, you need to query ALL_TAB_COLS to verify the columns - names, datatype, length and precision, and probably mandatoriness too.
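The column check can be driven by MINUS queries in the same way. A sketch, to be run in both directions ('APP' stands in for the schema owner):
SELECT table_name, column_name, data_type, data_length,
       data_precision, nullable
  FROM all_tab_cols@dbstate2
 WHERE owner = 'APP'
MINUS
SELECT table_name, column_name, data_type, data_length,
       data_precision, nullable
  FROM all_tab_cols@dbstate1
 WHERE owner = 'APP';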
Just synchronising schema structures is sufficiently complex that Oracle sells the capability as a chargeable extra to the Enterprise Edition license: the Change Management Pack.
So, to confess: the above is a thought experiment. I have never done this, and I doubt whether anybody ever has. For all but the most trivial of schemas, generating DML to synchronise state is a monstrous exercise which could take months to deliver (during which time the states of the two databases continue to diverge).
The straightforward solution? For a one-off exercise, Data Pump Export from S2, Data Pump Import into S1 using the table_exists_action=REPLACE option. Find out more.
For ongoing data synchronisation Oracle offers a variety of replication solutions. Their recommended approach is GoldenGate but that's a separately licensed product so of course they recommend it :) Replication with Streams is deprecated in 12c but it's still there. Find out more.
The solution for synchronising schema structure is simply not to need it: store all the DDL scripts in a source control repository and always deploy from there.
The two databases have identical schemas, but distinct data. It's possible there will be some duplication of rows, but it's sufficient for the merge to bail noisily and not do the update if duplicates are found, i.e., duplicates should be resolved manually.
Part of the problem is that there are a number of foreign key constraints in the databases in question. Also, there may be some columns which reference keys in other tables without actually having foreign key constraints; these were omitted due to performance issues on insertion. Also, we need to be able to map between the IDs from the old databases and the IDs in the new database.
Obviously, we can write a bunch of code to handle this, but we are looking for a solution which is:
Less work
Less overhead on the machines doing the merge.
More reliable: if we have to write code it will need to go through testing, etc., and isn't guaranteed to be bug-free
Obviously we are still searching the web and the PostgreSQL documentation for the answer, but what we've found so far has been unhelpful.
Update: One thing I left out is that "duplicates" are precisely defined by the unique constraints in the schema. We expect to restore the contents of one database, then restore the contents of a second. Errors during the second restore should be considered fatal to it. The duplicates should then be removed from the second database and a new dump created. We want the IDs to be renumbered, but not the other unique constraints. It's possible, BTW, that there will be a third or even a fourth database to merge after the second.
There's no shortcut to writing a bunch of scripts… This cannot realistically be automated, since managing conflicts requires applying rules that will be specific to your data.
That said, you can reduce the odds of conflicts by removing duplicate surrogate keys…
Say your two databases have only two tables: A (id pkey) and B (id pkey, a_id references A(id)). In the first database, find max_a_id = max(A.id) and max_b_id = max(B.id).
In the second database:
Alter table B, if needed, so that the foreign key on a_id cascades updates.
Disable triggers, if any have side effects that might erroneously kick in.
Update A and set id = id + max_a_id, and do the same kind of thing for B.
Export the data.
Next, import this data into the first database, and update sequences accordingly.
You'll still need to be wary of overflows if IDs can end up larger than 2^31 - 1 (about 2.1 billion, the int4 limit), and of unique keys that might exist in both databases. But at least you won't need to worry about duplicate IDs.
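A sketch of those steps for the two-table example (PostgreSQL; the constraint and sequence names are assumptions):
-- in the second database: make the child follow the parent automatically
ALTER TABLE b DROP CONSTRAINT b_a_id_fkey;
ALTER TABLE b ADD CONSTRAINT b_a_id_fkey
  FOREIGN KEY (a_id) REFERENCES a(id) ON UPDATE CASCADE;

-- shift ids above the first database's ranges
UPDATE a SET id = id + 10000000;   -- max_a_id from the first database
UPDATE b SET id = id + 20000000;   -- max_b_id; a_id is rewritten by the cascade

-- after importing the data into the first database, fix its sequences:
SELECT setval('a_id_seq', (SELECT max(id) FROM a));
SELECT setval('b_id_seq', (SELECT max(id) FROM b));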
This is the sort of case I'd be looking into ETL tools like CloverETL, Pentaho Kettle or Talend Studio for.
I tend to agree with Denis that there aren't any real shortcuts to avoid dealing with the complexity of a data merge.
Because I have a poor memory, I want to write a simple application to store table column information, especially the meaning of table columns. Now I have a problem with my table design. I plan to make the table have the following columns:
id, table_name, column_name, data_type, is_pk, meaning
But this design can't express the foreign key relationship. For example, table1.column1 and table2.column3 and table8.column5 all have the same data type and the same meaning; how can I modify my table design to express this information (relationship)?
Many thanks!
PS:
In fact, I'm currently working on a legacy application. The database is poorly designed: the foreign key relationships are not expressed in the database layer but in the application layer. My boss does not allow us to modify the database; we just need to make the application work. So I can't do this work in the database directly.
Depending on your DBMS, you could probably use comments on the table / column to record the meaning of each one of those columns. Most DBMSs allow some kind of annotation.
If you must have it in your table you have a few choices.
Free text: If this is just to serve as a memory aid, it doesn't really need to be machine-readable. This makes it easier for you to read / use directly.
fk_id: Store the ID of the field this foreign key maps onto. You could then define a view that pulls in the meaning column through this foreign key.
Meaning table: Store the meaning as an ID into a separate table and use a view to make it easier to work with.
Create a document: Keep it in a document instead. That way you can print it out and have it handy.
You could try designing a fully de-normalized schema for this, but I'd argue that's seriously over-thinking something that's just meant as a memory aid.
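That said, a minimal sketch of the fk_id option, assuming a metadata table along the lines of the question's design (all names illustrative):
CREATE TABLE column_info (
    id          integer PRIMARY KEY,
    table_name  varchar(128) NOT NULL,
    column_name varchar(128) NOT NULL,
    data_type   varchar(64),
    is_pk       char(1) DEFAULT 'N',
    meaning     varchar(4000),
    fk_id       integer REFERENCES column_info(id)  -- the column this one maps onto
);

-- the view falls back to the referenced column's meaning
CREATE VIEW column_info_v AS
SELECT c.id, c.table_name, c.column_name, c.data_type, c.is_pk,
       COALESCE(c.meaning, p.meaning) AS meaning
  FROM column_info c
  LEFT JOIN column_info p ON p.id = c.fk_id;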
I would just add another column, "FK_Column_ID", to your design; it would hold a reference to the referenced column's ID whenever there is a FK constraint.
The other way would be to create a duplicate of your DB as a DBDefinitions database, or something like that.
Almost all DBMS allow you to attach descriptions or comments to table, index, and column definitions.
Oracle:
COMMENT ON COLUMN employees.job_id IS 'abbreviated job title';
If you specify foreign key relationships as part of the schema, the database will keep track of them, and will display them for you.
It is not possible to define a compound foreign key relationship with a single additional column. I would suggest that you create a second table to define the foreign keys, perhaps with the following columns:
id, fk_name, primary_table_id, foreign_table_id
and add an fk_id column to your columns table to relate the fields used in each foreign key relationship. This works for both single-column and compound foreign keys.
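A sketch of that second table, with the question's columns table (here called my_columns) gaining the fk_id (names illustrative):
CREATE TABLE fk_definitions (
    id               integer PRIMARY KEY,
    fk_name          varchar(128),
    primary_table_id integer NOT NULL,   -- the table being referenced
    foreign_table_id integer NOT NULL    -- the table holding the key
);

-- each participating column row points at its definition; a compound
-- key is simply several column rows sharing the same fk_id
ALTER TABLE my_columns ADD fk_id integer REFERENCES fk_definitions(id);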
Alternatively, and with some attempt at diplomacy, tell your boss that if you can't fix the root cause of an issue, the time required to complete the project will be much longer than expected: first you will spend time implementing a workaround which will not perform adequately, and then you will spend more time implementing the fix you should have implemented in the first place (which in this case is fixing the database).
If you're not allowed to edit the database then presumably you're creating this in another, standalone DBMS. I don't think this is something you can achieve simply, and you may well be better off just writing it up in a text document.
I think that you need more than one table. If you create a table of tables:
id, table_name, meaning
And then a table of columns:
id, column_name, datatype, meaning
You can then create a link table:
table_id, column_id, is_pk, meaning
This will enable you to have the same column linked to more than one table - thus expressing your foreign keys. As I said above, though, it may be more effort than it's worth.
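Sketched as DDL, under illustrative names:
CREATE TABLE tables_info (
    id         integer PRIMARY KEY,
    table_name varchar(128),
    meaning    varchar(4000)
);

CREATE TABLE columns_info (
    id          integer PRIMARY KEY,
    column_name varchar(128),
    datatype    varchar(64),
    meaning     varchar(4000)
);

CREATE TABLE table_columns (
    table_id  integer REFERENCES tables_info(id),
    column_id integer REFERENCES columns_info(id),
    is_pk     char(1) DEFAULT 'N',
    meaning   varchar(4000),
    PRIMARY KEY (table_id, column_id)
);

-- a foreign key shows up as one columns_info row linked, via
-- table_columns, to two or more different table_ids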
FWIW, I do this quite often and the best "simple application" I've found is a spreadsheet.
I use a page for table/column defs, and extra pages as I need them for things like FK relationships, lookup values etc.
One of the great things about a spreadsheet for this application is that you can add columns to the sheet as you need them, and remove them when you don't.
The indexing ability of a spreadsheet is also very useful when you have a large number of tables to work with.
I know this does not answer your question directly, but how about using a database diagram?
I also have a poor memory (age I guess) and I always have an up to date diagram on my wall.
You can show all the tables, fields and foreign keys and also add comments.
I use PowerAMC (aka PowerDesigner, from Sybase) as my database designer. It also generates the SQL script to create the database - perhaps not very useful for legacy databases, although it will reverse-engineer the database and create the diagram automatically (it can take some time to make the diagram readable).
I don't see a reason to implement an application just to store this information. You could equally use something like OneNote, or any other available organizer, development wiki, etc.: there are tons of ways to store information so that it comes in handy when you look it up later.
If you can make some internal changes, you could rename the key constraints to a readable pattern, like table1_colName_table2_colName.
And at the least you can make a diagram, whether hand-drawn or produced with some design application.
If all this doesn't solve your problem, some more details are needed on what exactly you need to solve :)
I have the same database running on two different machines. The DBs make extensive use of identity columns, and the tables have clashed pretty horribly. I now want to merge these two together before sorting out the underlying issue, which I may do by:
A) Using GUIDs (unwieldy, but works everywhere)
B) Assigning identity ranges - kind of naff, but it means you can still access records in order, knock up basic SQL and select records easily, and it identifies which machine originated the data.
My question is: what's the best way of re-keying (i.e. changing the primary keys) on one of the databases so the data no longer clashes? We're only looking at 6 tables in total, but lots of rows - around 2M in 3 of the tables.
Update - is there any real SQL code out there that does this? I know about IDENTITY_INSERT etc. I've solved this issue in a number of inelegant ways before, and I was looking for the elegant solution, preferably with a nice T-SQL stored procedure to do the donkey work - if that doesn't exist I'll code it up and put it on the wiki.
A simplistic way is to change all keys in one of the databases by a fixed increment, say 10,000,000, so that they no longer overlap. In order to do this, you will have to bring the applications down so the database is quiet, and drop all the FK references affected, recreating them when finished. You will also have to reset the seed value on all affected identity columns to an appropriate value.
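Since SQL Server won't let you UPDATE an identity column in place, one way to realise the fixed increment is to apply it while copying the rows across under IDENTITY_INSERT. A sketch for one parent/child pair (the table names, columns, offset and SourceDb reference are all illustrative):
DECLARE @offset int = 10000000;

SET IDENTITY_INSERT dbo.Parent ON;
INSERT INTO dbo.Parent (Id, Name)
SELECT Id + @offset, Name
  FROM SourceDb.dbo.Parent;
SET IDENTITY_INSERT dbo.Parent OFF;

SET IDENTITY_INSERT dbo.Child ON;
INSERT INTO dbo.Child (Id, ParentId, Detail)
SELECT Id + @offset, ParentId + @offset, Detail
  FROM SourceDb.dbo.Child;
SET IDENTITY_INSERT dbo.Child OFF;

-- bump the seeds past the shifted range
DBCC CHECKIDENT ('dbo.Parent', RESEED);
DBCC CHECKIDENT ('dbo.Child', RESEED);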
Some of the tables will be reference data, which will be more complicated to merge if it is not in sync. You could possibly have issues with conflicting codes meaning the same thing on different instances or the same code having different meanings. This may or may not be an issue with your application but if the instances have been run without having this coordinated between them you might want to check carefully for this.
Also, data like names and addresses are very likely to be out of sync if there wasn't a canonical source for these. You may need to get these out, run a matching query and get the business to tidy up any exceptions.
I would add another column to the table first and populate it with the new primary key values.
Then I'd use update statements to set the new foreign key fields in all related tables.
Then you can drop the old primary key and old foreign key fields.
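A sketch of that approach for one parent/child pair (T-SQL; names and the offset are illustrative):
ALTER TABLE Parent ADD NewId int;
UPDATE Parent SET NewId = Id + 10000000;   -- or any non-clashing scheme

ALTER TABLE Child ADD NewParentId int;
UPDATE c
   SET c.NewParentId = p.NewId
  FROM Child c
  JOIN Parent p ON p.Id = c.ParentId;

-- once every relationship is carried over, drop the old key columns,
-- rename the new ones, and re-create the PK / FK constraints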