Track changes on a database - database

I am not sure whether this has been asked before; I did a few searches but nothing appropriate showed up.
OK, now my problem:
I want to migrate an old application to a different programming language. The only requirement we have is to keep the database structure stable. So no changes in my database schema. For the rest of the application I am basically reimplementing everything from scratch without reusing old code.
My Idea: in order to verify my new code was to let users do certain actions or workflows, capture the state of the database before that and after that and then maybe create unit tests with the help of this data. Does anyone know an elegant solution to keep track of these changes? Copying the database (>10GB) is pretty expensive. I also can't modify the code of the old application in which the users will be performing these sample actions. I have to keep it on the database level.
My database is Oracle 10g.

You could capture the old application behavior with a trace and then validate the changes against your new code. But, honestly, trying to write a new application by capturing the data modifications it makes and the imitating that will be a very difficult task as the inputs and the outputs to the original application are not guaranteed to be stateless (that is, the old application might do the same thing the first 1,000,000 times it is given a certain set of inputs and do something completely different on the 1,000,001st run.)
Your best bet is to start over with the business requirements and use the old application and a functional reference.

Take a look at Oracle Flashback Queries.
It enables to execute queries which return past data. The timeframe is limited, but it can be very useful.

In 10g the only way is to do with FLASHBACK queries.in 11g we can do this with RAT(Real Application Testing). RAT is quite useful for this senarios and also for load and volume testing.

Related

How to update a bulk of data in Oracle db by not bypassing any trigger?

We are running a pretty uncommon erp-system of a small it-business which doesn't allow us to modify data in an extensive way. We thought about doing a data update by exporting the data we wanted to change directly from the db and by using Excel VBA to update a bunch of data of different tables. Now we got the data updated in excel which is supposed to be written into the Oracle DB.
The it-business support told us not to do so, because of all the triggers running in the background during a regular data update in their program. We are pretty afraid of damaging the db so we are looking for the best way to do the data update without bypassing any trigger. To be more specific there are some thousands of changes we've done in different columns and tables merged all together in one Excel-file. Now we have to be sure to insert the modified data into the db and firing all the triggers the erp-software does during data update.
Is there anyone who knows a good way to do so?
I don't know what ERP system you are using, but I can relate some experiences from Oracle's E-Business Suite.
Nowadays, Oracle's ERP includes a robust set of APIs that will allow your custom programs to safely maintain ERP data. For example, if you want to modify a sales order, you use Oracle's API for that purpose and it makes sure all the necessary, related validations and logic are applied.
So, step #1 -- find out if your ERP system offers any APIs to allow you to safely update your data.
Back in the early days of Oracle's ERP, there were not so many APIs. In those days, when we needed to update a lot of table and had no API available, the next approach would be to use some sort of data loader tool. The most popular was, in fact, called "Data Loader". What this would do is read your data from an Excel spreadsheet and send it to the ERP's user interface -- exactly as though it were being typed in by a user. Since the data went through the ERP's UI, all the necessary validations and logic would automatically be applied.
In really extreme cases, when there was no API and DataLoader was, for whatever reason, not practical, it was still sometimes deemed necessary and worth the risk to attempt our own direct update of the ERP tables. This is, in general, risky and a bad practice, but sometimes we do what we must.
In these cases, we would start a database trace going on a user's session as they keyed in a few updates via the ERP's user interface. Then, we would use the trace to figure out what validations and related logic we needed to apply during our custom direct updates. We would also analyze the source code of the ERP system (since we had it available in the case of Oracle's ERP). Then, we would test it extensively. And, after all that, it was still risky and also prone to break after upgrades. But, in general, it worked as a last resort.
No my problem is that I need to do the work fast by make some automation in my processes. The work is already done on excel that's true but it needed the modification anyway. It's only if I put it manually with c&p into the db over our ERP or all at once over I don't know what.
But I guess Mathew is right. There are validation processes in the ERP so we can't write it directly into the db.
I don't know maybe you could contact me if you have a clue to bypass the ERP in a non risky manner.

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, all come with their benefits and own problems. Roughly, I've come to the following two possibilities;
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order to apply the changes. Also I would note often times you need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first, and then run a script to put in a default value.
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).

Using a common database for collaborative development

Some of the people in my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't and each developer having his own database (with periodic updated data dumps) is the best. Am I right or wrong? Have you encountered any problems in any of these approaches?
Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.
I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.
We share a single database amongst all our developer (20-odd) but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table-prefix it uses anyway so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery and so on).
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).
Having a single point of Failure is not a good thing isn't it?
I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.
If you are working with Hibernate or any hibernate-based platform you can configure your database to be created when you start your server (create-drop option). This is very useful when you are adding new attributes to your classes. If this is the case each developer must have his own copy of the DB.
If you are not changing the DB structure at all then you can use a single shared DB.
In this second case is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand remember that some queries can take a lot of time and this will affect your whole team if you are sharing a DB.

django AuditTrail vs Reversion

I am working on an new web app I need to store any changes in database to audit table(s). Purpose of such audit tables is that later on in a real physical audit we can asecertain what happened in a situation, who edited what and what was the state of db at the time of e.g. a complex calculation.
So mostly audit table will be written and not read. Report may be generated though sometimes.
I have looked for available solution
AuditTrail - simple and that is why I am inclining towards it, I can understand it single file code.
Reversion - looks simple enough to use but not sure how easy it would be to modify it if needed.
rcsField seems to be very complex and too much for my needs
I haven't tried anyone of these, so I wanted to know some real experiences and which one I should be using. e.g. which one is faster uses less space, easy to extend and maintain?
Personally I prefer to create audit tables in the database and populate through triggers so that any change even ad hoc queries from the query window are stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the sp level where they belong. Therefore it is even more important that you capture any possible change to the dat not just what was from the GUI. WE have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables populate only the changes and not the whole record, we do not need to change them every time a field is added.
Also when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do from them. Also consider how hard it will be to maintian the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand, is not generally a good idea. That should be lowest of your selction criteria after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As i stated in my question rcField seems to be to much for my needs, which is simple that i want store any changes to my table, and may be come back later to those changes to generate some reports.
So I tested AuditTrail and Reversion
Reversion seems to be a better full blown application with many features(which i do not need), Also as far as i know it saves data in a single table in XML or YAML format, which i think
will generate too much data in a single table
to read that data I may not be able to use already present db tools.
AuditTrail wins in that regard that for each table it generates a corresponding audit table and hence changes can be tracked easily, per table data is less and can be easily manipulated and user for report generation.
So i am going with AuditTrail.

Database design for physics hardware

I have to develop a database for a unique environment. I don't have experience with database design and could use everybody's wisdom.
My group is designing a database for piece of physics hardware and a data acquisition system. We need a system that will store all the hardware configuration parameters, and track the changes to these parameters as they are changed by the user.
The setup:
We have nearly 200 detectors and roughly 40 parameters associated with each detector. Of these 40 parameters, we expect only a few to change during the course of the experiment. Most parameters associated with a single detector are static.
We collect data for this experiment in timed runs. During these runs, the parameters loaded into the hardware must not change, although we should be able to edit the database at any time to prepare for the next run. The current plan:
The database will provide the difference between the current parameters and the parameters used during last run.
At the start of a new run, the most recent database changes be loaded into hardware.
The settings used for the upcoming run must be tagged with a run number and the current date and time. This is essential. I need a run-by-run history of the experimental setup.
There will be several different clients that both read and write to the database. Although changes to the database will be infrequent, I cannot guarantee that the changes won't happen concurrently.
Must be robust and non-corruptible. The configuration of the experimental system depends on the hardware. Any breakdown of the database would prevent data acquisition, and our time is expensive. Database backups?
My current plan is to implement the above requirements using a sqlite database, although I am unsure if it can support all my requirements. Is there any other technology I should look into? Has anybody done something similar? I am willing to learn any technology, as long as it's mature.
Tips and advice are welcome.
Thank you,
Sean
Update 1:
Database access:
There are three lite applications that can write and read to the database and one application that can only read.
The applications with write access are responsible for setting a non-overlapping subset of the hardware parameters. To be specific, we have one application (of which there may be multiple copies) which sets the high voltage, one application which sets the remainder of the hardware parameters which may change during the experiment, and one GUI which sets the remainder of the parameters which are nearly static and are only essential for the proper reconstruction of the data.
The program with read access only is our data analysis software. It needs access to nearly all of the parameters in the database to properly format the incoming data into something we can analyze properly. The number of connections to the database should be >10.
Backups:
Another setup at our lab dumps an xml file every run. Even though I don't think xml is appropriate, I was planning to back up the system every run, just in case.
Some basic things about the design; you should make sure that you don't delete data from any tables; keep track of the most recent data (probably best with most recent updated datetime); when the data value changes, though, don't delete the old data. When a run is initiated, tag every table used with the Run ID (in another column); this way, you maintain full historical record about every setting, and can pin exactly what the state used at a given run was.
Ask around of your colleagues.
You don't say what kind of physics you're doing, or how big the working group is, but in my discipline (particle physics) there is a deep repository of experience putting up and running just this type of systems (we call it "slow controls" and similar). There is a pretty good chance that someone you work with has either done this or knows someone who has. There may be a detailed description of the last time out in someone's thesis.
I don't personally have much to do with this, but I do know this: one common feature is to have no-delete-no-overwrite design. You can only add data, never remove it. This preserves your chances of figuring out what really happened in the case of trouble
Perhaps I should explain a little more. While this is an important task and has to be done right, it is not really related to physics, so you can't look it up on Spires or on arXive.org. No one writes papers on the design and implementation of medium sized slow controls databases. But they do sometimes put it in their dissertations. The easiest way to find a pointer really is to ask a bunch of people around the lab.
This is not a particularly large database by the sounds of things. So you might be able to get away with using Oracle's free database which will give you all kinds of great flexibility with journaling (not sure if that is an actual word) and administration.
Your mentioning of 'non-corruptible' right after you say "There will be several different clients that both read and write to the database" raises a red flag for me. Are you planning on creating some sort of application that has a interface for this? Or were you planning on direct access to the db via a tool like TOAD?
In order to preserve your data integrity you will need to get really strict on your permissions. I would only allow one (and a backup) person to have admin rights with the ability to do the data manipulation outside the GUI (which will make your life easier).
Backups? Yes, absolutely! Not only should you do daily, weekly and monthly backups you should do full and incremental. Also, test your backup images often to confirm they are in fact working.
As for the data structure I would need much greater detail in what you are trying to store and how you would access it. But from what you have put here I would say you need the following tables (to begin with):
Detectors
Parameters
Detector_Parameters
Some additional notes:
Since you will be doing so many changes I recommend using a version control like SVN to keep track of all your DDLs etc. I would also recommend using something like bugzilla for bug tracking (if needed) and using google docs for team document management.
Hope that helps.

Resources