We are running a fairly uncommon ERP system from a small IT company, and it doesn't let us modify data in bulk. Our plan was to export the data we wanted to change directly from the DB and use Excel VBA to update a batch of rows across different tables. The updated data now sits in Excel and is supposed to be written back into the Oracle DB.
The vendor's support told us not to do so, because of all the triggers that run in the background during a regular data update in their program. We are afraid of damaging the DB, so we are looking for the best way to do the data update without bypassing any trigger. To be more specific, we have made several thousand changes across different columns and tables, all merged into one Excel file. Now we need to be sure that inserting the modified data into the DB fires all the triggers the ERP software fires during a data update.
Is there anyone who knows a good way to do so?
I don't know what ERP system you are using, but I can relate some experiences from Oracle's E-Business Suite.
Nowadays, Oracle's ERP includes a robust set of APIs that will allow your custom programs to safely maintain ERP data. For example, if you want to modify a sales order, you use Oracle's API for that purpose and it makes sure all the necessary, related validations and logic are applied.
So, step #1 -- find out if your ERP system offers any APIs to allow you to safely update your data.
Back in the early days of Oracle's ERP, there were not so many APIs. In those days, when we needed to update a lot of tables and had no API available, the next approach would be to use some sort of data loader tool. The most popular was, in fact, called "Data Loader". What this would do is read your data from an Excel spreadsheet and send it to the ERP's user interface, exactly as though it were being typed in by a user. Since the data went through the ERP's UI, all the necessary validations and logic would automatically be applied.
In really extreme cases, when there was no API and DataLoader was, for whatever reason, not practical, it was still sometimes deemed necessary and worth the risk to attempt our own direct update of the ERP tables. This is, in general, risky and a bad practice, but sometimes we do what we must.
In these cases, we would start a database trace going on a user's session as they keyed in a few updates via the ERP's user interface. Then, we would use the trace to figure out what validations and related logic we needed to apply during our custom direct updates. We would also analyze the source code of the ERP system (since we had it available in the case of Oracle's ERP). Then, we would test it extensively. And, after all that, it was still risky and also prone to break after upgrades. But, in general, it worked as a last resort.
No, my problem is that I need to get the work done quickly by automating part of my process. The work is already done in Excel, that's true, but the data needed the modification anyway. The only question is whether I put it into the DB manually with copy and paste through our ERP, or all at once through I don't know what.
But I guess Mathew is right. There are validation processes in the ERP, so we can't write it directly into the DB.
I don't know, maybe you could contact me if you have a clue how to bypass the ERP in a non-risky manner.
We are currently developing an application which uses a database.
Every time we update the database structure, we have to provide a script to update the database from the previous version to the current one.
So the database currently has a number that gives us its current version, and our software runs the updates when we want to use an "old" database.
The issue we are encountering is when we have branches:
When we create a big new feature that will not be available to users (and not included in releases), we create a branch.
The main branch (trunk) is merged into it regularly to ensure that the created branch has the latest bug fixes.
Here is some illustration:
The issue is with our update scripts. They update from the previous version to the current one, then update the version number of the database.
Imagine that we have the DB version 17 when creating the branch.
We then do the branch, and make changes on the Trunk DB. The DB has now the version 18.
Then we make a db change on the branch. Since we know there has already been a new version "18", we create the version 19 and the updater 18->19.
Then the trunk is merged on the branch.
At this very moment we may have some updaters that will never run.
If someone updated their database before the merge, it will be flagged as having version 19, so the 17->18 update will never be applied.
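To make the failure concrete, here is a minimal simulation of that scenario. It assumes the updater runs every script whose target version is above the stored number and then stores the highest target; `run_updaters` and the tuple layout are made up for illustration and are not your actual updater.

```python
def run_updaters(db_version, updaters):
    """Apply every updater whose target version is above db_version, in order.

    Returns the new version number and the names of the updaters that ran.
    """
    pending = sorted(u for u in updaters if u[0] > db_version)
    ran = [name for _, name in pending]
    new_version = pending[-1][0] if pending else db_version
    return new_version, ran


# Branch before the merge: only the branch updater exists there.
branch_before_merge = [(19, "branch: 18->19")]
version, ran = run_updaters(17, branch_before_merge)

# After trunk is merged in, the 17->18 updater shows up,
# but the database already claims to be at version 19.
branch_after_merge = [(18, "trunk: 17->18"), (19, "branch: 18->19")]
version_after, ran_after = run_updaters(version, branch_after_merge)
```

After the second call `ran_after` is empty, so the trunk change is silently skipped even though the version number looks up to date.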
We want to change this behavior but we can't find how:
Our constraints are:
We are unable to make all changes on the same branch
Sometimes we have more than just 2 branches, and we can only merge from the trunk to the feature branch until the feature is finished
What can we do to ensure a continuity between our database branch?
I think the easiest way is to use the Ruby-on-rails approach. Every DB change is a separate script file, no matter how small. Each script file is numbered, and when you do an upgrade you simply run each script from the number your DB currently is to the last one.
What this means in practice is that your DB version system stops being v18 to v19, and starts being v18.0 to v18.01, then v18.02 etc. What you release to the customer may get rolled up into a big v19 upgrade script, but as you develop, you will be making many, many small upgrades.
You'll have to modify this slightly to work for your system: either each script will have to be renumbered as it gets merged to the branch, or you will have to ensure the upgrade scripts don't simply track the last upgrade number but track each upgrade number, so missing holes will still get filled in as the script gets merged across.
You will also have to roll up these little upgrades into the next major number as you create the release tag (on the trunk first) to keep things sane.
Edit: so fundamentally you first have to get rid of the notion of using an upgrade script to go from version to version. For example, if you start with a table, and trunk adds column A and the branch adds column B, then you merge trunk to branch, you cannot realistically "upgrade" to the version with both unless the branch version number is always greater than the trunk's, and that doesn't work if you subsequently merge trunk to the branch again. So you must scrap the idea of a "version" that applies to development branches. The only way round that is to apply each change independently and track each change individually. Then you can say you need the "last main release plus colA plus colB" (admittedly, if you merge trunk in, you can take the current main release from trunk whether it's v18 or v19, but you still need to apply each branch update individually).
So you start with trunk at DB v18. Branch and make changes. Then you merge trunk later, where the DB is at v19. Your earlier branch changes still need to be applied (or should already be applied, but you may need to write a branch-update script with all branch changes in it, if you re-create your DB). Note the branch does not have a "v20" version number at all, and the branches changes are not made to a single update script like you have on trunk. You can add these changes you make on branch as a single script if you like (or 1 script of 'since the last trunk merge' changes) or as many little scripts.
When the branch is complete, the very last task is to take all the DB changes made for the branch and roll them up into a script that can be applied to the master upgrader, and when it is merged onto trunk, that script is merged into the current upgrade script and the DB version number bumped.
There is an alternative that may work for you, but I found it to be a little flaky when you try to update DBs with data, sometimes it just couldn't manage to do the update and the DB had to be wiped and re-created (which, to be fair, is probably what would have had to happen if I used SQL scripts at the time). That's to use Visual Studio Database project. This stores every part of the schema as a file, so you'll have 1 script per table. These will be hidden from you by Visual Studio itself that will show you designers instead of scripts but they're stored as files in version control. VS can deploy the project and will try to upgrade your DB if it already exists. Be careful of the options, many defaults say "drop and create" instead of using alter to update an existing table.
These projects can generate a (largely machine-readable) SQL script for deployment, we used to generate these and deliver them to a DBA team who didn't use VS and only accepted SQL.
And lastly, there's Roundhouse, which is not something I've used, but it might work for you as the new upgrader "script". It's a free project and I've read it's more powerful and easier to use than VS DB projects. It's a DB versioning and change management tool, integrates with VS, and uses SQL scripts.
We use the following procedure for about 1.5 years now. I don't know if this is the best solution, but we didn't have any trouble with it (except some human errors in a delta-file like forgetting a USE-statement).
It has some similarities with the answer that Krumia gave, but differs in that in this approach only new change scripts/delta files are executed. This makes it a lot easier to write those files.
Delta files
Write all the DB changes you make for a feature in a delta-file. You can have multiple statements in one delta-file or split them up into several files. Once you have committed a file it's best (and once it is merged it's necessary) to start a new one and leave the old one untouched.
Put all the delta-files in one directory and give them a name-pattern like YYYY-MM-DD-HH.mm.description.sql. It's essential that you can sort them in time (therefore the timestamp) so you know what file needs to be executed first. Besides that you don't want to have a merge conflict with those files so it should be unique (over all branches).
Merging/pulling
Create a merge-script (for example a bash-script) that performs the following actions:
Note the current commit-hash
Do the actual merge (or pull)
Get a list of all the delta-files that are added with this merge (git diff --name-only --diff-filter=A $old_hash..HEAD -- path/to/delta-files)
Execute those delta-files, in the order specified by the timestamp
By using git to determine which files are new (and thus which database actions haven't been executed yet on the current branch) you are no longer bound to version numbering.
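For what it's worth, the "list the new delta-files and order them" part of such a merge-script could look roughly like this in Python. This is only a sketch: the function names are invented, it asks git for plain file names with `--name-only --diff-filter=A` (added files only), and actually executing the SQL is left to whatever database client you use.

```python
import subprocess


def order_deltas(paths):
    """Keep only .sql files and sort them by file name, i.e. by timestamp prefix."""
    sql_files = [p for p in paths if p.endswith(".sql")]
    return sorted(sql_files, key=lambda p: p.rsplit("/", 1)[-1])


def added_delta_files(old_hash, delta_dir):
    """Ask git which files under delta_dir were added between old_hash and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=A",
         f"{old_hash}..HEAD", "--", delta_dir],
        check=True, capture_output=True, text=True,
    )
    return order_deltas(result.stdout.splitlines())


# Example with a hand-written file list instead of real git output.
ordered = order_deltas([
    "db/deltas/2014-08-06-10.00.add-index.sql",
    "db/deltas/2014-08-04-09.30.add-column.sql",
    "db/deltas/README.txt",
])
```

Sorting on the base name works because the YYYY-MM-DD-HH.mm prefix sorts lexically in the same order as chronologically.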
Alternating delta-files
It might happen that within one merge delta-files from different branches may be 'new to execute' and that those files alternate like this:
2014-08-04-delta-from-feature_A.sql
2014-08-05-delta-from-feature_B.sql
2014-08-06-delta-from-feature_A.sql
As the timestamp determines the execution order, first something from feature A will be added, then feature B, then feature A again. When you write proper delta-files that are executable on their own (stand-alone), that shouldn't be a problem.
We recently have started using the Sql Server Data Tools (SSDT), which replaced the Visual Studio Database Project type, to version control our SQL databases. It creates a project for each database, with items for views and stored procedures and the ability to create Data-Tier Applications (DACPAC) that can be deployed to SQL Server instances. SSDT also supports Unit Testing and Static Data, and offers developers the option of quick sandbox testing using a LocalDB instance. There is a good TechEd video overview of the SSDT tools and a lot more resources online.
In your situation you would use SSDT to manage your database objects in version control along side your application code, using the same merging process to push features between branches. When it comes time to upgrade an existing install you would create the DACPACs and use the Data-Tier Application upgrade process to apply the changes. Alternatively you could also use database synchronization tools such as DBGhost or RedGate to apply updates to the existing schema.
You want database migrations. Many frameworks have plugins for this. For instance CakePHP uses a plugin from CakeDC to manage. Here are some generic tools: http://en.wikipedia.org/wiki/Schema_migration#Available_Tools.
If you want to roll your own, perhaps instead of keeping the current DB version in the database, you keep a list of which patches have been applied. So instead of version table with one row with value 19, you instead have a patches table with multiple rows:
Patches
1
2
3
4
5
8
Looking at this you need to apply patches 6 and 7.
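A minimal sketch of that lookup, using an in-memory SQLite database purely for illustration (the `patches` table layout and the `missing_patches` helper are assumptions, not part of any framework):

```python
import sqlite3


def missing_patches(conn, available):
    """Return the patch numbers in `available` that are not recorded in the patches table."""
    applied = {row[0] for row in conn.execute("SELECT id FROM patches")}
    return sorted(set(available) - applied)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patches (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO patches (id) VALUES (?)",
    [(n,) for n in (1, 2, 3, 4, 5, 8)],
)

# Patches 1..8 exist in the source tree; find the ones this database lacks.
todo = missing_patches(conn, range(1, 9))
```

Here `todo` comes out as the list of holes (6 and 7), regardless of what the highest applied patch number is.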
I just stumbled upon an older article written in 2008 by Jeff Atwood; hopefully it is still relevant to your problem.
Get Your Database Under Version Control
It mentions a five-part series written by K. Scott Allen:
Three rules for database work
The Baseline
Change Scripts
Views, Stored Procedures and the Like
Branching and Merging
There are tools specifically designed to deal with this type of problems.
One is DBSourceTools
DBSourceTools is a GUI utility to help developers bring SQL Server
databases under source control. A powerful database scripter, code
editor, sql generator, and database versioning tool. Compare Schemas,
create diff scripts, edit T-SQL with ease. Better than Management
Studio.
Another one:
neXtep Designer
NeXtep designer is an Integrated Development Environment for database
developers. The main concept behind the product is to take advantage
of versioning in order to compute the incremental SQL scripts you need
to deliver your developments.
This project aims at building a development platform that provides all
tools which a database developer needs while automating the tasks of
generating the deliveries (= SQL resulting from a development).
To learn more about the problematic of delivering database updates, we
invite you to read the Delivering database updates article which will
present you our vision of best and worst practices.
I think an approach which will satisfy most of your requirements is to embrace the "Database Refactoring" concept.
There is a good book on this topic Refactoring Databases: Evolutionary Database Design
A database refactoring is a small change to your database schema which
improves its design without changing its semantics (e.g. you don't add
anything nor do you break anything). The process of database
refactoring is the evolutionary improvement of your database schema so
as to improve your ability to support the new needs of your customers,
support evolutionary software development, and to fix existing legacy
database design problems.
The book describes database refactoring from the point of view of:
Technology. It includes full source code for how to implement each refactoring at the database level and for most refactorings we
show how the application would change to reflect the change in the
database. Our code examples are in Oracle, Java, and Hibernate
meta-data (the refactorings are easy to translate to other
environments, and sometimes we discuss vendor-specific features which
simplify some refactorings).
Process. It describes in detail the process of database refactoring in both the simple situation of a single application
accessing the database as well as the situation of the database being
accessed by many programs, many of which are out of the scope of your
authority. The technical examples assume the latter situation, so if
you're in the simple situation you may find some of our solutions to
be a little more complicated than you need (lucky you!).
Culture. Although it is technically simple to implement individual refactorings, and clearly possible (albeit a little
complicated) to adapt your internal processes to support database
refactoring, the fact is that cultural challenges within your
organization will likely prove to be the most difficult hurdle to
overcome.
This idea may or may not work, but from reading about your work so far and the previous answer, it looks like reinventing the wheel. The "wheel" is source control, with its branch, merge and version tracking features.
At the moment, for each DB schema change, you have a SQL file containing the changes from the previous one. You already mention the significant issues you have with this approach.
Replace your method with this one: Maintain ONE (and only ONE!) SQL file, which stores all DDL commands for creating tables, indexes, and so on from scratch. You need to add a new field? Add an "ALTER TABLE" line in your SQL file. This way your source control tool will in effect manage your database schema, and each branch can have a different schema.
All of a sudden, the source code is in sync with the database schema, branching and merging works, and so on.
Note: Just to clarify the purpose of the script mentioned here is to recreate the database from scratch up to a specific version, every single time.
EDIT: I spent some time looking for material to support this approach. Here is one that looks particularly good, with a proven track record:
Database Schema Versioning Management 101
Have you seen this situation before?
Your team is writing an enterprise application around a database
Since everyone is building around the same database, the schema of the database is in flux
Everyone has their own "local" copies of the database
Every time someone changes the schema, all of these copies need the latest schema to work with the latest build of the code
Every time you deploy to a staging or production database, the schema needs to work with the latest build of the code
Factors such as schema dependencies, data changes, configuration changes, and remote developers muddy the water
How do you currently address this problem of keeping the database
versions in working order? Do you suspect this is taking more time
than necessary? There are many ways to approach this problem, and the
answer depends on the workflow in your environment. The following
article describes a distilled and simplistic methodology you can use
as a starting point.
Since it can be implemented with ANSI SQL, it is database agnostic
Since it depends on scripting, it requires negligible storage management, and it can fit in your current code version management
program
The database versioning method you are using is certainly wrong, in my opinion. If anything has to have versions, it should be the source code. The source code has versions. Your live environment is only an instance of the source code.
The answer is to apply database changes using redeployable change scripts.
All changes, no matter which branch it is on (even in master/trunk) should be done in a separate script.
Sequence your scripts, so that newer ones will not get executed first. Having a prefix with date in the format YYYYMMDD for filename has worked for us.
When this happens, the change is made to the source code, not the database. You can have as many instances/builds for various tags/branches in the VCS as you like. For example, separate live builds for each branch.
Then you only have to do the build for each instance (probably every day). The build should fetch the files from the relevant branch and perform compiling/deploying. Since the scripts are redeployable, old scripts make no effect on the database. Only the recent changes are deployed to the database.
But, how to make redeployable scripts?
This is a question that is hard to answer, since you have not specified which database you are using. So I will give you an example about how my organization does it.
Let me take a simple example: if we need to add a column to a particular table, we do not just write ALTER TABLE ... ADD COLUMN .... We write code to add a column, if and only if that column does not exist in the given table.
Now, we have separate API to handle all that existence-checking boilerplate code. So our scripts are simply calls to those APIs. You will have to write your own. These API's are not actually that hard (we're using Oracle RDBMS). But they give us a huge gain in version control and deployment.
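To give a feel for what such an existence-checking helper does, here is a rough sketch. It uses SQLite through Python only so that it is runnable anywhere; it is not our Oracle API, and `add_column_if_missing` is a made-up name. On Oracle you would check the data dictionary instead of `PRAGMA table_info`.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl_type):
    """Add `column` to `table` unless it already exists; return True if it was added.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")

first_run = add_column_if_missing(conn, "customer", "email", "TEXT")
second_run = add_column_if_missing(conn, "customer", "email", "TEXT")
```

The second call does nothing, which is exactly what makes the script safe to redeploy.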
But, that's only one scenario, there are gazillion ways a schema definition can change
Yes indeed. Data type of a column can change; A new table can be added; An attribute column can be merged into a primary key (very rare); Sequences can change; Constraints; Foreign keys; They all can change.
But it turns out that all this can be handled by API's with special privileges to read metadata tables. I am not saying it's easy, but I am saying that it is a one time cost.
But, how do you rollback a database change?
My personal experience is, if you put some real effort into designing before banging the keyboard to write ALTER TABLE statements, this scenario is extremely rare. And if there ever is a rollback, you should manually handle it. (e.g. manually remove added column).
Normally, changes to views and stored procedures are rather common, and changes to table definitions are rare.
Building the Database
As I said before, building the database can be done by running all the redeployable scripts. Scripts that were already deployed have no effect.
Your database deployment script should not start with DROP DATABASE. Your database has lots of data which was used for unit tests. Unless you make a really really simple system, these data will be valuable in the future for testing. Your testers will not be too happy about adding ten thousand records to various tables every time a database is upgraded.
Put testers aside, how are you planning to upgrade your client/customers production database without annihilating all their production data? This is why you must use redeployable change scripts.
You can try version number schemes such as 18.1-branchname etc... But they are going to fail utterly, because you can merge your source, not its instances.
I think that the problem as you pose it is impossible to solve, but if you change part of your process there is a solution. Let's start with the first part: why it is impossible to solve using just deltas. In the following I assume you have the main trunk and two branches, dev-a and dev-b; both branches stem from the same point in time.
Why it cannot work
Say Alice adds a delta script to dev-a:
ALTER TABLE t1 ALTER COLUMN col5 char(4)
and Bob adds another script in dev-b:
ALTER TABLE t1 ALTER COLUMN col5 int
The two scripts are clearly incompatible, and you end up breaking code in main when you merge back from either of the two. The merge tool cannot help if the script files have different names.
Possible solution
My suggestion is to describe your database in terms of both baseline and deltas: the delta scripts must always refer to a specific baseline, so you are able to compute a new baseline schema resulting from the application of successive deltas to a specific baseline.
An example
dev-a *--B.A1--D.1#A1--D2#A1--------B.A2--*--B.A3--
/ /
main -- B.0 --*--------------------------*--B.1---*----------
\ /
dev-b *--B.B1--D.1#B1--B.B2--*
Note that after branching you immediately spin off a new baseline, and do the same before every merge. This way you can check that the baselines are compatible.
Final comment
Managing deltas in version control is kind of reinventing the wheel, as each delta script is functionally equivalent to saving different versions of the baseline script. That said, I agree with you that in practice they convey more value and force people to think about what happens in production when you change the database.
If you opt to store only the baseline, you have plenty of tools to support you.
Another option is to serialize work on the database, as a whole or partitioning the schema in separate areas with unique owners.
This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, all come with their benefits and own problems. Roughly, I've come to the following two possibilities;
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order to apply the changes. Also I would note often times you need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first, and then run a script to put in a default value.
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
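The schema-script/data-script split from the "run the scripts" step can be sketched like this. It runs against an in-memory SQLite database only to keep the example self-contained; the `orders` table and `status` column are invented, and tightening the column to NOT NULL afterwards is left out because the syntax differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,)])

# Schema script: add the column as nullable so existing rows stay valid.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")

# Data script: backfill a default for rows that predate the column.
conn.execute("UPDATE orders SET status = 'new' WHERE status IS NULL")
conn.commit()

remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status IS NULL"
).fetchone()[0]
```

Checking that no NULLs remain before adding the constraint is a cheap sanity test to run as part of the update.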
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).
I am not sure whether this has been asked before; I did a few searches but nothing appropriate showed up.
OK, now my problem:
I want to migrate an old application to a different programming language. The only requirement we have is to keep the database structure stable. So no changes in my database schema. For the rest of the application I am basically reimplementing everything from scratch without reusing old code.
My Idea: in order to verify my new code was to let users do certain actions or workflows, capture the state of the database before that and after that and then maybe create unit tests with the help of this data. Does anyone know an elegant solution to keep track of these changes? Copying the database (>10GB) is pretty expensive. I also can't modify the code of the old application in which the users will be performing these sample actions. I have to keep it on the database level.
My database is Oracle 10g.
You could capture the old application's behavior with a trace and then validate the changes against your new code. But, honestly, trying to write a new application by capturing the data modifications the old one makes and then imitating them will be a very difficult task, as the inputs and outputs of the original application are not guaranteed to be stateless (that is, the old application might do the same thing the first 1,000,000 times it is given a certain set of inputs and do something completely different on the 1,000,001st run).
Your best bet is to start over with the business requirements and use the old application as a functional reference.
Take a look at Oracle Flashback Queries.
It lets you execute queries that return past data. The timeframe is limited, but it can be very useful.
In 10g the only way to do this is with FLASHBACK queries. In 11g we can do this with RAT (Real Application Testing). RAT is quite useful for these scenarios and also for load and volume testing.
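As a rough idea of how that could be used for before/after comparisons, the helper below just builds the SQL text for "rows that exist now but did not exist (or looked different) at an earlier time" using Oracle's AS OF TIMESTAMP clause. The function name and the `orders` table are invented; the query is not executed here, and you would run it with whatever Oracle client you have.

```python
def flashback_diff_sql(table, as_of):
    """Build an Oracle query for rows present now that were absent or different at `as_of`.

    `table` is interpolated into the text (table names cannot be bind variables),
    so only pass trusted names. `as_of` is a 'YYYY-MM-DD HH24:MI:SS' string.
    """
    return (
        f"SELECT * FROM {table} "
        f"MINUS "
        f"SELECT * FROM {table} "
        f"AS OF TIMESTAMP TO_TIMESTAMP('{as_of}', 'YYYY-MM-DD HH24:MI:SS')"
    )


# Note the time before the user performs the workflow, then diff afterwards.
sql = flashback_diff_sql("orders", "2010-05-01 09:00:00")
```

Swapping the two halves of the MINUS gives the rows that were deleted or had their old values replaced.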
I am working on a new web app and I need to store any changes in the database to audit table(s). The purpose of such audit tables is that later on, in a real physical audit, we can ascertain what happened in a given situation, who edited what, and what the state of the db was at the time of e.g. a complex calculation.
So the audit tables will mostly be written and not read, though a report may sometimes be generated.
I have looked at the available solutions:
AuditTrail - simple and that is why I am inclining towards it, I can understand it single file code.
Reversion - looks simple enough to use but not sure how easy it would be to modify it if needed.
rcsField seems to be very complex and too much for my needs
I haven't tried any of these, so I wanted to hear about some real experiences and which one I should be using, e.g. which one is faster, uses less space, and is easy to extend and maintain?
Personally I prefer to create audit tables in the database and populate them through triggers, so that any change, even an ad hoc query from the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the sp level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what came from the GUI. We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables store only the changes and not the whole record, we do not need to change them every time a field is added.
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
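A stripped-down illustration of the trigger idea, using SQLite from Python only so that it runs anywhere (this is not our dynamic proc; the `account` tables and the trigger name are invented). A real setup would also record which database user made the change, which SQLite cannot show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE account_audit (
    account_id  INTEGER,
    old_balance INTEGER,
    new_balance INTEGER,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Record only the changed value, whatever client issued the UPDATE.
CREATE TRIGGER account_balance_audit
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO account_audit (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
INSERT INTO account (id, balance) VALUES (1, 100);
""")

# An "ad hoc" update that never goes through the application code.
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")

audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM account_audit"
).fetchall()
```

The audit row is written by the database itself, so it does not matter whether the update came from the web app, an ORM, or a query window.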
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my tables, and maybe come back to those changes later to generate some reports.
So I tested AuditTrail and Reversion.
Reversion seems to be a better full-blown application with many features (which I do not need). Also, as far as I know, it saves data in a single table in XML or YAML format, which I think:
will generate too much data in a single table
means that to read that data I may not be able to use the db tools I already have.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, the per-table data is smaller, and it can easily be manipulated and used for report generation.
So I am going with AuditTrail.