Maintain legacy table data in parallel with new database design

I am currently working on a project with the final aim of converting a legacy system to a true N-Layer architecture. The initial part of the project involves converting the underlying database to a true relational design.
The underlying database currently runs on the IBM iSeries. The tables are defined using DDS and contain mountains of redundant data, with no integrity checking, poorly designed keys, etc. Basically, refactoring them in place to a fully normalized design is a non-starter.
New tables are going to be designed from scratch. They will also be on the same iSeries but will be defined with DDL. This will also involve re-writing any insert or update code throughout the application to utilise the new tables. There is, however, a large number of legacy apps responsible for reports, displays, etc. that will not be re-written at this point and will still be reading from the original tables. So we need to keep the data in the old, legacy tables in sync with the new table data. I was wondering if anyone had ever done something similar or had any suggestions? I am currently considering:
1) The stored procedures that insert, update, or delete from new table A will also do the same to the matching legacy table B. At some point down the line the stored procedures will need to be modified to stop syncing table B.
2) Place triggers on table A that also modify table B (a rough sketch follows this list). The triggers would be removed down the line; the stored procedures wouldn't need to be changed, but will this still work fine with transaction management?
3) Remove legacy table B and recreate it as a view over table A. I'm not sure whether this will work, as the legacy programs use keyed access against table B and I believe views don't support that?
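For illustration, option 2 might look something like the following sketch (all table and column names here are hypothetical, the exact trigger syntax varies by DB2 for i release, and matching UPDATE and DELETE triggers would be needed as well):

```sql
-- Mirror inserts on the new table into the legacy table.
-- NEW_TABLE_A, LEGACY_TABLE_B and the column names are invented examples.
CREATE TRIGGER SYNC_LEGACY_B_INSERT
AFTER INSERT ON NEW_TABLE_A
REFERENCING NEW AS N
FOR EACH ROW
    INSERT INTO LEGACY_TABLE_B (KEYFLD, DTAFLD1, DTAFLD2)
    VALUES (N.ID, N.COL1, N.COL2);
```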
I'd be interested to hear anyone's thoughts.
Cheers

I favour number 3. I assume that you would run some conversion process that would take the existing data from legacy table B and put it into new table A (or more likely, a set of new tables since it sounds like legacy table B is denormalized).
If your legacy apps responsible for reports, displays, etc. don't need to do any updates, then using views is the ideal situation. I think keeping two separate copies of the data (one normalized, one denormalized) would turn into a headache, especially when the manager's TPS reports don't match what he entered in the new system. At least with only one copy of the data, you don't have to worry about performing data fixes in two places.
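For example, option 3 might look something like this sketch, where the legacy record layout is rebuilt as a view over the new normalized tables (all names here are invented for illustration):

```sql
-- Legacy table B is dropped and replaced by a view with the same name and
-- the same column layout, assembled from the new normalized tables.
CREATE VIEW LEGACY_TABLE_B AS
SELECT o.ORDER_ID      AS ORDNO,
       o.ORDER_DATE    AS ORDDAT,
       o.TOTAL_AMOUNT  AS ORDAMT,
       c.CUSTOMER_NAME AS CUSNAM,
       c.CUSTOMER_CITY AS CUSCTY
FROM   NEW_ORDERS    o
JOIN   NEW_CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID;
```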

Related

Data independence in relational database

Can anybody explain how data independence is ensured in a relational database? What guarantees that nothing will change for the user if the database structure changes?
For example, I have relation R (and have created an application which uses this relation) and the database admin decides to decompose R into R1 and R2. How is application inalterability ensured for the end user?
I asked myself exactly the same question during my Database class.
According to Codd's 12 rules, there are two kinds of data independence:
Physical Data Independence requires that changes at the physical level (like storage structures and indexes) have no impact on the applications that consume the database. For example, let's say you decide to stop using a Hash Index on your table and use a B-Tree Index instead: the application that executes queries against this table doesn't have to change at all.
Logical Data Independence states that changes at the logical level (tables, columns, rows) will have no impact on the applications that access the database. As you already noticed, this feature is harder to achieve than Physical Data Independence, but there are still cases where it holds. For example, if you add tables, columns or rows to your current schema, the queries that already work aren't affected at all.
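A small illustration of both kinds of independence (hypothetical names, PostgreSQL-style syntax):

```sql
-- The application's query:
SELECT name, salary FROM employee WHERE dept_id = 42;

-- Physical change: replace the hash index with a B-tree index.
-- The query above keeps working unchanged.
DROP INDEX employee_dept_hash_idx;
CREATE INDEX employee_dept_btree_idx ON employee (dept_id);

-- Logical change (the easy direction): add a column.
-- Existing queries that don't reference the new column are unaffected.
ALTER TABLE employee ADD COLUMN hire_date DATE;
```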
Your question is not phrased very clearly. I don't see the relationship between "data independence" and "application inalterability".
A proper relational structure decomposes data into entities and relationships. The idea is that when a value changes, it only changes in one place. This is the reasoning behind the various "normal forms" of data.
Most user applications do not want to see data in a normalized form. They want to see data in a denormalized form, often with lots of fields gathered together on one line. Similarly, an update might involve several fields in different entities, but to a user, it is just one thing.
A relational database can maintain the structure of the data and allow you to combine data for different viewpoints. It has nothing to do with your second point. Application independence (I think this is a better term than "inalterability") depends on how the application is designed. A well-designed application has a well-designed application programming interface (API).
It seems that a lot of database developers think that the physical data structure is good enough as an API. However, this is often a bad design decision. Often, a better decision is to have all database operations performed through stored procedures, views, and user-defined functions. In other words, don't directly update a table. Create a stored procedure called something like usp_table_update that takes the fields and updates the table.
With such a structure, you can modify the underlying database structure and maintain user applications at the same time.
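For example, a minimal sketch of such a procedure in SQL Server syntax (the procedure, table, and column names are hypothetical):

```sql
CREATE PROCEDURE usp_customer_update
    @customer_id INT,
    @name        NVARCHAR(100),
    @city        NVARCHAR(100)
AS
BEGIN
    UPDATE dbo.customer
    SET    name = @name,
           city = @city
    WHERE  customer_id = @customer_id;
END;
-- If dbo.customer is later split into two tables, only this procedure's body
-- changes; callers keep passing the same parameters.
```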
What guarantees that nothing will change for the user if the database structure changes?
Well, database structures can change for many reasons. On a high level, I see two possibilities:
Performance / internal database reasons
Business rules / the world outside the application changed
#1: in this case, the DBA has decided to change some structure for performance or other internal reasons. An extra layer, for example using stored procedures or views, can help to "hide" the change from the application/user. A good data layer on the application side can also be helpful.
#2: if the outside world changes, or your business rules change, NOTHING you can do at the database level can keep that away from the user. For example, a company that has always used only ONE currency in the database suddenly goes international: the database has to be adapted to support multiple currencies, and that requires serious alteration both in the database and for the user.
For example, I have relation R (and created an application which uses this relation) and the database admin decides to decompose R into R1 and R2. How is application inalterability ensured for the end user?
The admin should create a view that joins R1 and R2 and presents them as the original R.
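A sketch of that view (the column names are invented for illustration):

```sql
-- R was decomposed into R1 (id, a, b) and R2 (id, c, d); a view named R
-- re-joins them so existing queries against R keep working.
CREATE VIEW R AS
SELECT r1.id, r1.a, r1.b, r2.c, r2.d
FROM   R1 r1
JOIN   R2 r2 ON r2.id = r1.id;
```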

Database Table Synchronization Without Table Dropping?

My company's workflow relies on two MSSQL databases: one for web content data and the other is the ERP. I've been doing some proof of concept on some tools that would serve as an intermediary that builds a relationship between the datasets, and thus far it's proving to be monumentally faster.
Instead of reading out to both datasets, I'd much rather house a database on the local Linux box that represents the data I'm working with. That way, there's less pressure on the system as a whole.
What I don't understand is whether there is a way to update this new database without completely dropping the table each time or running through a punishing line-by-line check. If the records had timestamps, this would be easy... but they don't.
Does anyone have any tips? Am I missing some crucial feature I don't know about, or am I SOL?
Finally, is there one preferred database stack out there anyone thinks might work better than another? I'm not committed to any technology at this point.
Thanks!
Have you read about the MERGE statement in SQL? It allows updates or inserts on existing tables.
I assume your tables have primary keys even though you say there is no timestamp.
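A minimal sketch of such a MERGE in SQL Server syntax, keyed on the primary key; the table and column names are hypothetical, and it assumes both tables can be referenced from one statement:

```sql
MERGE INTO local_product AS target
USING source_product AS src
    ON target.product_id = src.product_id
WHEN MATCHED AND (target.name <> src.name OR target.price <> src.price) THEN
    UPDATE SET name = src.name, price = src.price
WHEN NOT MATCHED BY TARGET THEN
    INSERT (product_id, name, price)
    VALUES (src.product_id, src.name, src.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- remove local rows that no longer exist at the source
```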

Dynamically creating transformations of data from one set of tables to another in SQL Server

I have a scenario in which I receive files from a source and blindly dump the data in the files into a number of tables in a staging-area DB. Now I need to translate the data in the raw data tables into a format that my primary database model would understand, and eventually move the translated tables from the staging area into my primary database.
For example, I may have to join three of the raw staging tables to get the final list of columns needed to set up a primary table that is compatible with my primary DB. I may have a lot of translation rules; a join is just one of them.
So my question is this: what is the best way to do this? I am planning to have a rule table holding the source table, rule set, and destination table, and a stored procedure that reads the rule table and constructs dynamic queries that create the final primary tables in the format the rule table describes.
I am looking for better ideas or more design insight from the experts on this rule table, so the translation can be done seamlessly.
Edit: The idea is that I am going to reuse this DB design for many of our instances, so I intend to populate the rule table and run the procedure instead of building a separate ETL process for each instance.
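A minimal sketch of the rule table the question describes, in SQL Server syntax; the column names and the sample row are invented for illustration:

```sql
CREATE TABLE translation_rule (
    rule_id      INT IDENTITY(1,1) PRIMARY KEY,
    source_table SYSNAME       NOT NULL,  -- staging table to read from
    rule_set     NVARCHAR(50)  NOT NULL,  -- e.g. 'JOIN', 'MAP', 'FILTER'
    rule_text    NVARCHAR(MAX) NULL,      -- join condition, column map, filter, ...
    dest_table   SYSNAME       NOT NULL   -- primary-model table to populate
);

INSERT INTO translation_rule (source_table, rule_set, rule_text, dest_table)
VALUES ('stg_orders', 'JOIN',
        'JOIN stg_customers c ON c.cust_id = stg_orders.cust_id',
        'ord_customer_orders');

-- A stored procedure would read these rows, build the SQL text, and run it
-- with sp_executesql to create and load each destination table.
```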
If you have a good understanding of the scope of the possible transformations, you can probably get this to work.
I'm a fan of getting a few (possibly more difficult) examples under my belt first, before attempting to extract a framework, because a framework without good use cases can 1) have unused features, 2) be difficult to configure and hard to document, and 3) spectacularly fail to meet future use cases.
That's not to say that a framework cannot be refactored, but that refactoring a framework which you know already meets 80% of your use cases (because you understand the scope) is a lot easier than refactoring one which only meets 10% to one which meets 20% to one which meets 30% etc.

Replication - synchronizing most of the data some of the time

I have some data that isn't properly "partitioned" (for lack of a better word).
All inserts, processing and reporting happen on the same table. The bulk of the processing happens not long after the insert, and not long after that the data becomes immutable (we're talking days).
I could do all inserts and processing on a new table that I replicate to the old table. When I detect that the data has become immutable, I would delete it from the new table, but I would edit the delete replication stored procedure so that the delete did not replicate.
How bad an idea is this? (Edit 1: that is, editing the replication stored procedure.)
It seems attractive at the moment (I haven't slept on it yet) because it might mitigate a performance problem with only very small changes to the application. It also seems like it might be a good way to shoot myself in the foot.
Edit1:
I like the idea of inserting into two tables because I can avoid the view and the maintenance window described in Jono's answer. No offense, Jono, I actually use this technique elsewhere.
I might want to use replication because one table might be in another database (I know, I didn't mention this) and that way I don't have to worry about committing to two tables, I just let replication handle that.
My actual concern (that I didn't make clear) is that editing the replication stored procedure could end up being a deployment/maintenance headache.
I wouldn't advocate replication to solve a performance issue (unless it's a problem of physical data distribution); if anything it's going to slow your system down as the changes are propagated to their destination. If you're using a single server, I'd suggest adding a second table with the same schema as the first, but with your indexes optimised for the kind of work you do in your processing phase. Then create a view that selects from both tables, and use that view in any query where you want the union of both tables. You could then throw more hardware at the second table (I'm thinking of a separate file group over more spindles) and then migrate the data on a weekly delay into the first table, during an available maintenance window.
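A sketch of that arrangement (the table names are hypothetical): a second table for the insert/processing phase, and a view that presents the union of both tables to readers:

```sql
-- Optimise transactions_hot for the processing workload and keep the
-- original table for settled data; reports query the view.
CREATE VIEW dbo.all_transactions AS
SELECT id, created_at, amount, status FROM dbo.transactions_hot
UNION ALL
SELECT id, created_at, amount, status FROM dbo.transactions_settled;
-- A weekly job during the maintenance window moves immutable rows from
-- transactions_hot into transactions_settled.
```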

Should I clone or denormalize my database for portable use?

I have a database that has lots of data and is all "neat", normalized (within reason - using EAV), and I have stored procedures to access and modify the data.
I also have a WinForms application that users download to search and view this data (no inserts). To make things handy for use and updates, I've been using SQLite to store this data and it works really well.
I'm working on updating the entire process, and I was wondering if I should ship a denormalized view of the data out to the users, a la one table with all the properties as columns, or continue to use the same schema as the master database?
My initial thoughts are along the lines of :
Denormalized View:
Benefits...
Provides a simple method of querying the data (since I'm not doing a lot of joins, just a bunch of column searching).
Cons...
I'd have to manage a second data access layer. Granted, I don't think it will be difficult, but it is still a bit more work.
If a new property is added, I'd have to modify the schema again and accommodate the changes, whereas now I can simply query the property bag and work from there.
Same Schema:
Pros...
Same layout as master database, so updates are minimal, and I can even use the same queries when building my Data Access Layer since SQLite doesn't support stored procedures.
Cons...
There are a lot of small tables for lookup codes and the like, so I could start running into issues when building the queries and managing them in the DAL.
How should I proceed?
If you develop your application to query views of the data rather than the underlying tables directly, you will be able to keep the same schema for both scenarios without any concern or need to alter your DAL.
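For example, a view like the following sketch (all names are hypothetical) gives the application a flat, search-friendly shape while the underlying tables keep the master layout; the same CREATE VIEW statement works on both the master database and the SQLite copy:

```sql
CREATE VIEW item_search AS
SELECT i.item_id,
       i.name,
       c.description AS category,
       p.value       AS colour      -- one EAV property pivoted out as a column
FROM   item i
JOIN   category c ON c.category_id = i.category_id
LEFT JOIN item_property p
       ON p.item_id = i.item_id
      AND p.property_name = 'colour';
```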
