Why ODI if we have the PL/SQL language - oracle-data-integrator

Someone asked me why we need the ODI tool if we have PL/SQL. ODI generates the PL/SQL code in the back end, so why do we need an ODI interface if we can just use the code the interface generates, even saving one step: instead of loading data into the I$ table, we could push it directly with PL/SQL.
Let's take an example:
If we have to insert 2,000 records from one table into another, we can write PL/SQL code directly instead of designing an ODI interface, which leaves me confused about how ODI is better than plain code.

There is a lot to say, but I can mention the most important aspects, in my opinion:
In ODI, you can write KMs (Knowledge Modules: generic SQL/OS command/Groovy/Java code that generates the statements you need, based on the source and target tables). Once written, a KM can be reused in many mappings. Conclusion: write once, use many times;
ODI has an API: with it, you can generate mappings/objects automatically. So you don't need to create 100 mappings (for example) by hand; instead, you maintain a metadata repository from which the mappings can be generated automatically;
The fact that you can combine SQL with Groovy gives you a kind of power that you can't find in other ETL tools (as far as I know);
ODI contexts let you run the same mapping on different servers, or in parallel;
For your example, it is clearly easier to do it with plain SQL if it is a one-off. But if you have 10 similar loads to build, you may save time by writing a KM that meets your needs and then generating the 10 mappings; a sketch of the kind of statement such a KM might produce is shown below.
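To make the KM idea more concrete, here is a minimal sketch of the staging-plus-merge pattern an integration KM typically generates; the table and column names are invented for illustration and are not from the original question.

-- 1) Load the flow (I$) table from the source
CREATE TABLE I$_CUSTOMER AS
SELECT src.cust_id, src.cust_name, src.cust_email
FROM   staging.customer_src src;

-- 2) Merge the flow table into the target
MERGE INTO dwh.customer tgt
USING I$_CUSTOMER flow
ON (tgt.cust_id = flow.cust_id)
WHEN MATCHED THEN UPDATE SET
     tgt.cust_name  = flow.cust_name,
     tgt.cust_email = flow.cust_email
WHEN NOT MATCHED THEN INSERT (cust_id, cust_name, cust_email)
     VALUES (flow.cust_id, flow.cust_name, flow.cust_email);

-- 3) Clean up the work table
DROP TABLE I$_CUSTOMER;

Written by hand this is quick enough once; the point of a KM is that ODI regenerates the same pattern for every mapping that uses it.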
There is more to say. If you need, I can expand this post further. Feel free to tell me.

Related

Multiple Tables Load from Snowflake to Snowflake using ADF

I have source tables in Snowflake and destination tables in Snowflake.
I need to load data from source to destination using ADF.
Requirement: I need to load the data using a single pipeline for all the tables.
E.g.: suppose I have 40 tables in the source and need to load all 40 tables' data into the destination tables. I need to create a single pipeline that loads all the tables at once.
Can anyone help me in achieving this?
Thanks,
P.
This is a fairly broad question. So take this all as general thoughts, more than specific advice.
Feel free to ask more specific questions, and I'll try to update/expand on this.
ADF is useful as an orchestration/monitoring layer, but it can be tricky to manage the actual copying and maneuvering of data in Snowflake with it. My high-level recommendation is to write your logic and loading code in Snowflake stored procedures; then you can use ADF to orchestrate by simply calling those stored procedures. You get the benefits of using ADF for what it is good at, and you let Snowflake do the heavy lifting, which is what it is good at.
Hopefully you'd be able to parameterize the procedures so that you have one procedure (or a few) that takes a table name and dynamically figures out column names and the like to run your loading process, roughly along the lines of the sketch below.
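As a purely illustrative sketch (the procedure name and parameters are made up, and this uses Snowflake Scripting rather than the JavaScript variant mentioned in the notes), a parameterized loader could look roughly like this:

CREATE OR REPLACE PROCEDURE load_table(src_table STRING, tgt_table STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
    stmt STRING;
BEGIN
    -- Build a dynamic INSERT; column handling could be driven from
    -- INFORMATION_SCHEMA as described in the notes below.
    stmt := 'INSERT INTO ' || tgt_table || ' SELECT * FROM ' || src_table;
    EXECUTE IMMEDIATE :stmt;
    RETURN 'Loaded ' || src_table || ' into ' || tgt_table;
END;
$$;

-- ADF would then simply run something like (names invented):
-- CALL load_table('SRC_DB.PUBLIC.ORDERS', 'TGT_DB.PUBLIC.ORDERS');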
Assorted Notes on implementation.
ADF does have a native Snowflake connector. It is fairly new, so a lot of online posts will tell you how to set up a custom ODBC connector; you don't need to do this. Use the native connector with the auto-resolve integration runtime and it should work for you.
You can write a query in an ADF lookup activity to output your list of tables, along with any needed parameters (like primary key, order by column, procedure name to call, etc.), then feed that list into an ADF foreach loop.
foreach loops are a little limited in that there are some things you can't nest inside a loop (like conditionals). If you need extra functionality, you can have the foreach loop call a child ADF pipeline (passing in those parameters) and have the child pipeline manage your table-processing logic.
Snowflake has pretty good options for querying metadata based on a table name. See INFORMATION_SCHEMA. Between that and just a tiny bit of JavaScript logic, it's not too bad to generate dynamic queries (e.g. with column names specific to a provided table name); a metadata query along those lines is sketched after these notes.
If you do want to use ADF's Copy activities, I think you'll need to set up an intermediary Azure Storage Account connection. I believe this is because it uses COPY INTO under the hood, which requires external storage.
ADF doesn't have many good options for preventing one pipeline from running multiple times at once. Either make sure your code can handle edge cases like this, or make sure your scheduling/timeouts won't let a pipeline run long enough for that scenario to occur.
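For example, a metadata query of the kind mentioned above could look like this (database and schema names are invented; LISTAGG builds an ordered, comma-separated column list per table, which you could feed into an ADF lookup activity or dynamic SQL):

SELECT table_name,
       LISTAGG(column_name, ', ')
         WITHIN GROUP (ORDER BY ordinal_position) AS column_list
FROM   src_db.information_schema.columns
WHERE  table_schema = 'PUBLIC'
GROUP  BY table_name;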
Extra note:
I don't know how tied you are to ADF, but without more context, I might suggest a quick look into DBT for this use case. It's a great tool for this specific scenario of Snowflake to Snowflake processing/transforming. My team's been much happier since moving some of our projects from ADF to DBT. (not sponsored :P )

How to tap into the PostgreSQL DDL parser?

The DDL / SQL scripts used to create my PostgreSQL database are under version control. In theory, any change to the database model is tracked in the source code repository.
In practice however, it happens that the structure of a live database is altered 'on the fly' and if any client scripts fail to insert / select / etc. data, I am put in charge of fixing the problem.
It would help me enormously if I could run a quick test to verify the database still corresponds to the creation scripts in the repo, i.e. is still the 'official' version.
I started using pgTAP for that purpose and so far, it works great. However, whenever a controlled, approved change is done to the DB, the test scripts need changing, too.
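A minimal example of the kind of pgTAP test I mean (the table and column names are invented for illustration; run with pg_prove or psql):

BEGIN;
SELECT plan(3);
SELECT has_table('customers');
SELECT has_column('customers', 'email');
SELECT col_type_is('customers', 'email', 'text');
SELECT * FROM finish();
ROLLBACK;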
Therefore, I considered creating the test scripts automatically. One general approach could be to:
- run the scripts to create the DB,
- access the DB metadata on the server,
- use that metadata to generate test code (a sketch of this step follows below).
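To illustrate that last step (purely a sketch: it assumes the DB has been created in the first step, that the schema to test is public, and it only covers column existence), the catalog could be turned into pgTAP assertions with something like:

-- Hypothetical generator: emit one has_column() check per column in 'public'.
SELECT format('SELECT has_column(%L, %L);', table_name, column_name)
FROM   information_schema.columns
WHERE  table_schema = 'public'
ORDER  BY table_name, ordinal_position;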
I would prefer, though, not having to create the DB, but instead read the DB creation scripts directly. I tried to google a way to tap into the DDL parser and get some kind of metadata representation I could use; so far, I have learned a lot about PostgreSQL internals but couldn't really find a solution to the issue.
Can someone think of a way to have a PostgreSQL DDL script parsed?
Here is my method for ensuring that the live database schema matches the schema definition under version control: as part of the "build" routine of your database schema, set up a temporary database instance, load in all the schema creation scripts the way they were intended, then run pg_dump -s off that, and compare it with a schema dump of your production database. Depending on your exact circumstances, you might need to run a little bit of sed over the final product to get an exact match, but it's usually possible.
You can automate this procedure completely. Run the database "build" on SCM check-in (using a build bot, continuous integration server, or similar), and get the dumps from the live instance via a cron job. Of course, this way you'd get an alert every time someone checks in a database change, so you'll have to tweak the specifics a little.
There is no pgTAP involved there. I love pgTAP and use it for unit testing database functions and the like (also done on the CI server), but not for verifying schema properties, because the above procedure makes that unnecessary. (Also, generating tests automatically from what they are supposed to test seems a little bit like the wrong way around.)
There is a lot of database metadata to be concerned about here. I've been poking around the relevant database internals for a few years, and I wouldn't consider the project you're describing feasible without sinking a few man-months of programming time just to get a rough, alpha-quality tool that handles some specific subset of the changes you care about. If this were easy, there wouldn't be a long-standing open item (as in: people have wanted it for a decade) to build DDL triggers into the database, which is exactly the thing you'd like to have here.
In practice, there are two popular techniques people use to make this class of problem easier to deal with:
Set log_statement to 'ddl' and try to parse the changes it records.
Use pg_dump --schema-only to make a regular snapshot of the database structure. Put that under version control, and use changes in its diff to find the information you're looking for.
Actually taking either of these and deriving the pgTAP scripts you want directly is its own challenge. If the change made is small enough, you might be able to automate that to some degree. At least you'd be starting with a reasonably sized problem to approach from that angle.

Cloning a PHP/MySQL database app (w/ some automation) in MS Access or OpenOffice.org Base

I wasn't sure whether to ask this here or on SuperUser, so I apologize if it doesn't belong here.
I created a small PHP/MySQL database app to manage the customer loyalty data for my mom's shop, intending to set it up locally on her cash register computer with XAMPP. However, I've been asked to reimplement the system in a GUI relational database such as MS Access or OpenOffice Base, primarily so that she can do things like mail merge and graphical reports with a GUI (that I don't have to write).
I can easily replicate my MySQL table structure and relationships, and create a few of the more basic forms and reports, but I've never done any scripting, macros, etc. in Access or Base. My PHP handled a lot more than just form input; there was some scripting involved that I don't know how to implement in Access/Base. Worth noting: if I end up using Access, it'll be Access 2007.
Here's a quick overview of what I'm trying to make, in case it helps. Sorry for the length.
The business is a take & bake food market, and the database is replacing a physical stamp-card loyalty system. Each customer gets a stamp on their card for every $25 they spend. They earn free meals as follows:
- On the 8th stamp, they earn a free side dish.
- On the 16th stamp, they earn a free regular size meal.
- On the 24th stamp, they earn a free family size meal, and their card resets to zero stamps.
The date of each stamp must be recorded (otherwise I'd just increment one field instead of having a stamps table).
I have 3 tables: customers, stamps, and freebies. customers has a 1-to-many relationship with both stamps and freebies.
customers is a simple contact list.
columns: ID, firstname, lastname, email, phone
stamps keeps records of each stamp earned.
columns: ID, customerID, date, index (1-24; the Nth stamp on that customer's card)
freebies keeps records of each free meal they have earned.
columns: ID, customerID, date, size, is_redeemed
Here's the magic from my PHP that I don't know how to implement in Access/Base:
When a user selects a customer and clicks an "add a stamp" button:
stamps is queried to grab the index from the last stamp for that customer => local variable N
if N == 24, set N = 0. Increment N by 1.
a record is inserted to stamps with the current date, customer id and an index of N
if N == 8, 16 or 24, a record is inserted into freebies with the appropriate size, and an alert appears to notify the user that the customer earned some free shit (a rough SQL sketch of these steps follows).
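Roughly, in SQL terms (the customer ID, the computed index and the freebie size below are just example values; the wrap-around/increment of N happens in whatever front-end code drives the button):

-- 1) Last stamp index for the selected customer (the front end reads this into N,
--    resets 24 to 0 and adds 1):
SELECT `index` FROM stamps
WHERE customerID = 42
ORDER BY ID DESC
LIMIT 1;

-- 2) Insert the new stamp with that computed index (here assumed to be 8):
INSERT INTO stamps (customerID, `date`, `index`)
VALUES (42, CURDATE(), 8);

-- 3) Since the new index is 8, record the freebie and alert the cashier:
INSERT INTO freebies (customerID, `date`, size, is_redeemed)
VALUES (42, CURDATE(), 'side', 0);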
Some kind of "view customer" page (form? report?) that shows all the stamps and freebies they've earned, with "redeem" buttons next to the freebies that have not been redeemed.
In general I need to make it fairly idiot-proof and "big-button" -- automation wherever possible -- cashiers at the shop should be able to use it with no prior knowledge of databases.
Is this practical in a program like Access or Base, or should I just convince her to use my PHP version? If I need to write code, what language(s) do I need to teach myself? Should I be structuring my data differently? I'm not sure where to start here.
Really I think this would be a piece of cake. It's true like Tony said that you can continue to use the same tables/backend and that's probably the route I'd recommend. You'll need to install MySQL's ODBC drivers on any machine that will be linking to the MySQL database. After that create a DSN and then access the tables through that from within Access. You may want to add code later to relink the tables every time the software loads using DSN-less tables. This way the database can run on a machine that doesn't have a DSN configured. I do recommend that you go with either MySQL or SQL Server Express as opposed to an MS Access backend but I'm not going to take the time to elaborate on why.
I think you can actually get much more functionality from a traditional Windows Desktop Application (built in MS Access or VB.Net) than you could with PHP. And it's my own opinion that you'll be able to do it with less code and less time invested. I mentioned VB.Net but I'd probably recommend MS Access over VB.Net for databases although either one will do the job.
As Tony already mentioned, Access uses the VBA language. It takes a little while to really pick it up unless you already have some experience with other programming languages that use Basic syntax. I've found that moving from VBA/ASP to PHP/Javascript has been slow going, though not necessarily so difficult. PHP uses C-style code with curly braces and VBA does not.
Coming from PHP, here's some things that may be new to you:
Stronger Variable Typing - In Access you can actually declare your variables with a specified data type such as String, Date, Integer, Long, Single, Double, etc. I recommend using this as much as possible. There are very few times when you will need to use the more general types such as Object or Variant. Variables declared with a specified data type will throw an error if you attempt to put the wrong data type into them. This helps you write better code, in my opinion.
Option Explicit - Option Explicit is a declaration you can put at the top of each code module to enforce that you have to declare a variable with a Dim statement before using it. I highly recommend that you do this. It will save you a lot of time troubleshooting problems.
Set MyVariable = Nothing - Cleaning up object variables after using them is one of the best practices of using MS Access. You'll use this to clean up DAO Recordset variables, ADO Connection variables, ADO Recordset variables, form variables, etc. Any variable that you declare as an object (or some specific type of object) should get cleaned up by setting it to Nothing when you no longer need to use the variable.
No Includes - There is no such thing as an Include statement in MS Access. You can import code modules from other Access databases. You can call functions contained in a DLL. But there is no include in Access like there is in PHP.
DoCmd - You'll have to use MS Access's DoCmd object to open forms and reports and perform other common tasks. Just a warning: it's frequently irrational. Long-time Access users don't think much of it, but I've found these commands to have little cohesion or consistency. Let me give you an example. If you want to close a form you use this code: DoCmd.Close acForm, "frmSomeFormName" but if you want to open a form you use this code: DoCmd.OpenForm "frmName" In this example, why does opening a form get its own OpenForm function while closing a form simply uses Close followed by a constant that tells Access you want to close a form? I have no answer. DoCmd is full of this type of inconsistency. Blueclaw does a pretty good job of listing the most common DoCmd's, although I don't think the examples there are exactly stellar.
References - You shouldn't need to use references very frequently. You will have to use them to enable things like DAO and ADO (see further down) or Microsoft Scripting Runtime (often used for accessing, reading, writing, etc. to files and folders). It's basically something you do once and then you forget about it.
ActiveX Controls - Probably better to try to build your project without using these. They require the same control to be installed on each computer that will run your software. I don't know much about it but I understand there are some compatibility issues that can come up if you use ActiveX controls in your project.
DAO - Data Access Objects - DAO is Access's original, native set of objects used to interface with your data container. Although it is primarily used to access data held in an Access database backend/container, it can also be used for some tasks when you are using ODBC linked tables. DAO is very helpful when you need to loop through recordsets to make changes in bulk. You can even use it to loop through form controls. One place I use this is to reorder line numbers in invoice details after a line gets deleted. Another typical use is in "utility" functions where you need to change something in a given field or fields that can't be done with an update query.
CurrentDb.Execute("Update or Delete query here...") The Execute method of the CurrentDb object is, in my understanding, an implicit DAO call. It allows you to run Update or Delete queries on local and linked tables from VBA code. You can also achieve this using DoCmd.RunSQL but CurrentDb.Execute is the preferred method because it gives you improved error messages if something fails if you append ", dbFailOnError" as a second argument.
ADO - ActiveX Data Objects - I recommended not using ActiveX controls but this is one ActiveX technology you might need. To my knowledge, ADO is the only thing you can use to run stored procedures from Access. ADO is similar to DAO and was supposed to replace DAO although it didn't really. I tend to use both of them in my applications. It takes a while to figure out which one will do the job for you or which one will do it better. In general, I stick with DAO for everything except for running stored procedures or connecting to outside data sources (i.e. not using linked tables). DAO and ADO are both part of MDAC (Microsoft Data Access Components) which gets installed with MS Access.
File System Object - This object, mentioned above, is often used to access files and folders. You'll find you may have to use it to copy files, create text files, read text files, write to text files, etc. It's part of Microsoft Scripting Runtime, which is part of Windows Script Host (exists on all Windows computers, although it can become "broken"). Access does give you some ways of accessing files and folders using VBA's built-in functions/methods, such as Dir(), but these functions don't cover all the bases.
SQL - Structured Query Language - You're probably familiar with SQL already, but you'll need to get used to Access's "superset" of the SQL language. It's not drastically different, but Access does allow you to use Access functions (e.g. Len, Left, Right) or your own custom functions. Your own functions just need to exist in a code module and be declared Public. An example of your own function would be Repeat (doesn't exist in MS Access, exists in MySQL), which is sometimes used to create indentation based on Count(*) in tables with parent-child relationships. I'm giving that as an example although it's unlikely you'll need such a function unless you are going to use the Nested Set Model to hold hierarchical categories.
Variables Cannot be in Literal Strings - This is a massive difference between Access and PHP. PHP lets you write: "SELECT * FROM tag WHERE tagtext = '$mytag'" In MS Access you'd have to write it like this: "SELECT * FROM tag WHERE tagtext = '" & strMyTag & "'" (You may not ever need to worry about this unless you are formatting a query in VBA to retrieve a DAO or ADO recordset. What I've just pointed out doesn't generally affect your form's or report's recordsource or saved queries because you generally don't use variables in those.)
Query - Not difficult to figure out, but in Access a Query is basically a MySQL view. I actually don't save queries very often. I generally use them only to derive my SQL "code", and then I take that SQL and paste it into my form as the Recordsource instead of binding the form to a saved query. It doesn't matter which way you do it; there are pros and cons either way. As a side note, don't be afraid to create views in MySQL and link to them in Access (a tiny example follows). When you link to them, Access sees them as tables. Whether or not a view is updateable/writeable will depend on its construction. Certain types of queries/views (such as unions) are read-only.
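For instance, using the tables from the question (the view name is invented), a MySQL view that Access would see as a linked table:

CREATE VIEW v_customer_stamp_counts AS
SELECT c.ID, c.firstname, c.lastname, COUNT(s.ID) AS stamp_count
FROM   customers c
LEFT JOIN stamps s ON s.customerID = c.ID
GROUP BY c.ID, c.firstname, c.lastname;

Because of the GROUP BY, this particular view would link as read-only, which is fine for reporting.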
As a final note, I recommend MS Access over OpenOffice.org Base. I tried out Base a couple of years ago and found it to lack many features. However, I was already experienced in MS Access, so I'm not sure I gave OpenOffice Base a fair trial. What I found missing was events. I'm accustomed to being able to fine-tune my forms in MS Access to give users a very responsive UI with lots of feedback, and I couldn't figure out how to do this in Base. Maybe things have changed since I last tried it, I don't know. Here's an article comparing Base to MS Access.
Other SO Access gurus, feel free to point out any errors in my answer. I still consider myself a rookie in programming.
I can't speak for Base. However, Access can link to the MySQL database directly, so you don't have to redo the data. As for creating the bits and pieces of code in Access, that would be quite easy. Access, Word and Excel use VBA, which is identical to Visual Basic 6.0 except for the Access, Word or Excel object-model-specific stuff. Indeed, a minor obscure bug in the VBA editor is also present in the VB6 editor.
I will also add that one of my Access databases had 160 tables, 1,200 queries, 350 forms, 450 reports and 70K lines of code. So your app is quite small by comparison.
On the freebies table I would change the is_redeemed field to a date_redeemed. I definitely agree with recording each stamp and freebie earned as separate records in tables. This way it's real easy to show the customer a history rather than just stating you've only got x stamps.
Also consider a bar-code reader and issuing the users bar-coded plastic wallet cards. This will greatly speed up the time the clerk needs to look up their records. Indeed, consider using loyalty cards common to your area that they might already have, such as a Safeway or AirMiles card. I'd put that number in a separate table, though, just in case they lose the first card they were given, or so they can track multiple cards. A family might want to accumulate points onto one account.
Thanks for the lengthy posting. This enables us to give you some suggestions on different facets you might not have thought of in the first place.
My suggestion: don't do it. Run a mysql server on the PC in question, have your PHP app as the front end for the cashiers, and then if you want MS Access's reports feature, just have Access connect to the mysql database with ODBC.
The best implementation is quite frequently the one you already have.

Need help designing big database update process

We have a database with ~100K business objects in it. Each object has about 40 properties which are stored amongst 15 tables. I have to get these objects, perform some transforms on them and then write them to a different database (with the same schema.)
This is ADO.Net 3.5, SQL Server 2005.
We have a library method to write a single property. It figures out which of the 15 tables the property goes into, creates and opens a connection, determines whether the property already exists and does an insert or update accordingly, and closes the connection.
My first pass at the program was to read an object from the source DB, perform the transform, and call the library routine on each of its 40 properties to write the object to the destination DB. Repeat 100,000 times. Obviously this is egregiously inefficient.
What are some good designs for handling this type of problem?
Thanks
This is exactly the sort of thing that SQL Server Integration Services (SSIS) is good for. It's documented in Books Online, same as SQL Server is.
Unfortunately, I would say that you need to forget your client-side library, and do it all in SQL.
How many times do you need to do this? If only once, and it can run unattended, I see no reason why you shouldn't reuse your existing client code. Automating the work of human beings is what computers are for. If it's inefficient, I know that sucks, but if you're going to do a week of work setting up a SSIS package, that's inefficient too. Plus, your client-side solution could contain business logic or validation code that you'd have to remember to carry over to SQL.
You might want to research CREATE ASSEMBLY (SQL CLR), moving your client code across the network to reside on your SQL box. This will avoid network latency, but could destabilize your SQL Server.
Bad news: you have many options
use flat-file transformations: extract all the data into flat files, manipulate them using grep, awk, sed, C or Perl into the required insert/update statements, and execute those against the target database.
PRO: fast. CON: extremely ugly, a nightmare for maintenance; don't do this if you need it for longer than a week or more than a couple dozen executions.
use pure SQL: I don't know much about SQL Server, but I assume it has a way to access one database from within another, so one of the fastest ways to do this is to write it as a collection of insert/update/merge statements fed by select statements (a sketch follows at the end of this list of options).
PRO: fast, one technology only. CON: requires a direct connection between the databases, and you might hit the limits of SQL, or of the available SQL knowledge, pretty fast, depending on the kind of transformation.
use T-SQL, or whatever iterative language the database provides; everything else is similar to the pure SQL approach.
PRO: pretty fast, since you don't leave the database. CON: I don't know T-SQL, but if it is anything like PL/SQL, it is not the nicest language for complex transformations.
use an ETL tool: there are special tools for extracting, transforming and loading data. They often support various databases and have many ready-made strategies for deciding whether an update or an insert is in order.
PRO: sorry, you'll have to ask somebody else about that; so far I have had nothing but bad experiences with these tools. CON: a highly specialized tool that you need to master. In my personal experience they are slower, both in implementing and in executing the transformation, than handwritten SQL, and a nightmare for maintainability, since everything is hidden away in proprietary repositories; for IDE, version control, CI and testing you are stuck with whatever the tool provider gives you, if anything.
use a high-level language (Java, C#, VB ...): you would load your data into proper business objects, manipulate those and store them in the database. This is pretty much what you seem to be doing right now, although it sounds like there are better ORMs available, e.g. NHibernate.
PRO: even complex manipulations can be implemented in a clean, maintainable way, and you can use all the fancy tools (good IDEs, testing frameworks, CI systems) to support you while developing the transformation. CON: it adds a lot of overhead (retrieving the data out of the database, instantiating the objects, and marshalling them back into the target database). I'd go this way if the process is going to be around for a long time.
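To make the pure-SQL option more concrete, here is a rough sketch in SQL Server syntax (database, table and column names are invented, and it deliberately avoids MERGE, which only arrived in SQL Server 2008): update the rows that already exist in the target, then insert the missing ones, applying any transform in the SELECT list.

-- Update properties that already exist in the target
UPDATE tgt
SET    tgt.PropValue = src.PropValue          -- transform could be applied here
FROM   TargetDb.dbo.ObjectProps AS tgt
JOIN   SourceDb.dbo.ObjectProps AS src
       ON src.ObjectId = tgt.ObjectId AND src.PropName = tgt.PropName;

-- Insert properties that are missing from the target
INSERT INTO TargetDb.dbo.ObjectProps (ObjectId, PropName, PropValue)
SELECT src.ObjectId, src.PropName, src.PropValue
FROM   SourceDb.dbo.ObjectProps AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM  TargetDb.dbo.ObjectProps AS tgt
                   WHERE tgt.ObjectId = src.ObjectId
                     AND tgt.PropName = src.PropName);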
Building on the high-level-language option, you could further glorify the architecture by using messaging and web services, which could be relevant if you have more than one source database or more than one target database. Or you could manually implement a multithreaded transformer in order to gain throughput. But I guess I am leaving the scope of your question.
I'm with John, SSIS is the way to go for any repeatable process that imports large amounts of data. It should be much faster than the 30 hours you are currently getting. You could also write pure T-SQL code to do this if the two databases are on the same server or are linked servers. If you go the T-SQL route, you may need a hybrid of set-based and looping code that runs in batches (of, say, 2,000 records at a time), as sketched below, rather than locking up the table for the whole time a large insert would take.
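A rough sketch of that batched, set-based approach (reusing the invented table names from the earlier sketch; the batch size and the NOT EXISTS test are illustrative only):

DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    -- Copy up to 2000 not-yet-migrated rows per iteration to keep locks short
    INSERT INTO TargetDb.dbo.ObjectProps (ObjectId, PropName, PropValue)
    SELECT TOP (2000) src.ObjectId, src.PropName, src.PropValue
    FROM   SourceDb.dbo.ObjectProps AS src
    WHERE  NOT EXISTS (SELECT 1
                       FROM  TargetDb.dbo.ObjectProps AS tgt
                       WHERE tgt.ObjectId = src.ObjectId
                         AND tgt.PropName = src.PropName);
    SET @rows = @@ROWCOUNT;
END;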

Database source control with Oracle

I have been looking for hours for a way to check a database into source control. My first idea was a program for calculating database diffs and asking all the developers to implement their changes as new diff scripts. Now I find that if I can dump a database into a file, I could check it in and use it as just another type of file.
The main conditions are:
Works for Oracle 9R2
Human readable, so we can use diff to see the differences (.dmp files don't seem readable).
All tables in a batch. We have more than 200 tables.
It stores BOTH STRUCTURE AND DATA
It supports CLOB and RAW Types.
It stores procedures, packages and their bodies, functions, tables, views, indexes, constraints, sequences and synonyms.
It can be turned into an executable script to rebuild the database into a clean machine.
Not limited to really small databases (supports at least 200,000 rows).
It is not easy. I have downloaded a lot of demos that fail in one way or another.
EDIT: I wouldn't mind alternative approaches, provided that they allow us to check a working system against our release DATABASE STRUCTURE AND OBJECTS + DATA in batch mode.
By the way, our project has been developed over years. Some approaches can be easily implemented when you make a fresh start but seem hard at this point.
EDIT: To understand the problem better, let's say that some users can sometimes change the config data in the production environment, or developers might create a new field or alter a view without notice in the release branch. I need to be aware of these changes, or it will be complicated to merge the changes into production.
So many people try to do this sort of thing (diff schemas). My opinion is
Source code goes into a version control tool (Subversion, CVS, Git, Perforce ...). Treat it as if it were Java or C code; it's really no different. You should have an install process that checks it out and applies it to the database.
DDL IS SOURCE CODE. It goes into the version control tool too.
Data is a grey area - lookup tables maybe should be in a version control tool. Application generated data certainly should not.
The way I do things these days is to create migration scripts, similar to Ruby on Rails migrations. Put your DDL into scripts and run them to move the database between versions. Group the changes for a release into a single file or set of files. Then you have a script that moves your application from version x to version y (a tiny sketch of such a script follows).
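For example, purely as an illustration (the object names and the schema_version bookkeeping table are invented, not part of the answer):

-- migrations/042_add_loyalty_tier.sql: moves the schema from version 41 to 42
ALTER TABLE customers ADD (loyalty_tier VARCHAR2(20) DEFAULT 'STANDARD');

CREATE INDEX ix_customers_tier ON customers (loyalty_tier);

INSERT INTO schema_version (version_no, applied_on, description)
VALUES (42, SYSDATE, 'Add loyalty_tier to customers');

COMMIT;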
One thing I never ever do anymore (and I used to do it until I learned better) is use any GUI tools to create database objects in my development environment. Write the DDL scripts from day 1 - you will need them anyway to promote the code to test, production, etc. I have seen so many people use the GUIs to create all the objects, and come release time there is a scramble to produce scripts that create/migrate the schema correctly, scripts that are often untested and fail!
Everyone will have their own preference to how to do this, but I have seen a lot of it done badly over the years which formed my opinions above.
Oracle SQL Developer has a "Database Export" function. It can produce a single file which contains all DDL and data.
I use PL/SQL developer with a VCS Plug-in that integrates into Team Foundation Server, but it only has support for database objects, and not with the data itself, which usually is left out of source control anyways.
Here is the link: http://www.allroundautomations.com/bodyplsqldev.html
It may not be as slick as detecting the diffs, however we use a simple ant build file. In our current CVS branch, we'll have the "base" database code broken out into the ddl for tables and triggers and such. We'll also have the delta folder, broken out in the same manner. Starting from scratch, you can run "base" + "delta" and get the current state of the database. When you go to production, you'll simply run the "delta" build and be done. This model doesn't work uber-well if you have a huge schema and you are changing it rapidly. (Note: At least among database objects like tables, indexes and the like. For packages, procedures, functions and triggers, it works well.) Here is a sample ant task:
<target name="buildTables" description="Build Tables with primary keys and sequences">
<sql driver="${conn.jdbc.driver}" password="${conn.user.password}"
url="${conn.jdbc.url}" userid="${conn.user.name}"
classpath="${app.base}/lib/${jdbc.jar.name}">
<fileset dir="${db.dir}/ddl">
<include name="*.sql"/>
</fileset>
</sql>
</target>
I think this is a case of:
You're trying to solve a problem.
You've come up with a solution.
You don't know how to implement the solution.
So now you're asking for help on how to implement the solution.
The better way to get help:
Tell us what the problem is.
Ask for ideas for solving the problem.
Pick the best solution.
I can't tell what problem you're trying to solve. Sometimes it's obvious from the question; this one certainly isn't. But I can tell you that this 'solution' will turn into its own maintenance nightmare. If you think developing the database and the app that uses it is hard, this idea of versioning the entire database in a human-readable form is nothing short of insane.
Have you tried Oracle's Workspace Manager? Not that I have any experience with it in a production database, but I found some toy experiments with it promising.
Don't try to diff the data. Just write a trigger to store whatever you want to capture when the data is changed; a rough sketch follows.
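For example (the table, columns and audit table are invented; this is just one shape such a trigger could take):

CREATE OR REPLACE TRIGGER trg_app_config_audit
AFTER INSERT OR UPDATE OR DELETE ON app_config
FOR EACH ROW
BEGIN
    -- Record who changed what and when, instead of diffing table contents later
    INSERT INTO app_config_audit (changed_on, changed_by, old_value, new_value)
    VALUES (SYSDATE, USER, :OLD.config_value, :NEW.config_value);
END;
/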
Expensive though it may be, a tool like TOAD for Oracle can be ideal for solving this sort of problem.
That said, my preferred solution is to start with all of the DDL (including Stored Procedure definitions) as text, managed under version control, and write scripts that will create a functioning database from source. If someone wants to modify the schema, they must, must, must commit those changes to the repository, not just modify the database directly. No exceptions! That way, if you need to build scripts that reflect updates between versions, it's a matter of taking all of the committed changes, and then adding whatever DML you need to massage any existing data to meet the changes (adding default values for new columns for existing rows, etc.) With all of the DDL (and prepopulated data) as text, collecting differences is as simple as diffing two source trees.
At my last job, I had NAnt scripts that would restore test databases, run all of the upgrade scripts that were needed, based upon the version of the database, and then dump the end result to DDL and DML. I would do the same for an empty database (to create one from scratch) and then compare the results. If the two were significantly different (the dump program wasn't perfect) I could tell immediately what changes needed to be made to the update / creation DDL and DML. While I did use database comparison tools like TOAD, they weren't as useful as hand-written SQL when I needed to produce general scripts for massaging data. (Machine-generated code can be remarkably brittle.)
Try RedGate's Source Control for Oracle. I've never tried the Oracle version, but the MSSQL version is really great.

Resources