Current status of DifferentDatabaseScope implementation - castle-activerecord

My current project requires connecting to two different databases (same schema, different database engines) using Castle ActiveRecord. The connection string for the first database never changes, but the second one changes dynamically based on user input. I decided that the best solution is to use DifferentDatabaseScope for the second database.
The Castle Project documentation states that DifferentDatabaseScope is still very experimental and not bulletproof for all situations. I want to know in which situations DifferentDatabaseScope fails or does not behave as intended.

Related

How do you implement version control in a database application?

I'm working on a web-based Java project that stores end-user data in a MySQL database. I'd like to give users functionality similar to what I have for my source code version control (e.g. Subversion). In other words, I'd like to implement code that allows the user to commit and roll back work and return to an existing branch. Is there an existing framework for this? Putting the database data under version control and exposing the version control functionality to the end user (i.e. writing code that allows the user to commit, roll back, etc.) could be a reasonable approach, but there might also be some problems with it. For example, how would you allow one user to view a rolled-back version of the data (you can't just replace the data the database is pointing to if only one user wants to look at a rolled-back version)? If I could rebuild the system from scratch using any persistence architecture, what could be used to store the data so that this type of functionality is easy to implement?
There are 2 very common solutions for what you need:
http://www.liquibase.org/
https://flywaydb.org/
Branching and merging the user data
Your question is about ways to version the user data in an application, to give your users capabilities such as branching and merging. You considered exposing a real version control system such as SVN.
The side-effects I can foresee are:
You will have to index things by directory and filename, perhaps using directories as an abstraction for entities and filenames as primary keys.
Operating systems (Linux, macOS and Windows alike) do not handle directories with millions of files well. You will have to partition each entity, usually by hashing the ID (with MD5, for example) and using the beginning of the hash as a subdirectory name. The number of digits to take from the hash depends on the expected size of the entity.
Operating systems (Linux, macOS and Windows alike) are not prepared for huge numbers of files. I tested this: it took me days to back up and finally remove a file tree with hundreds of millions of files.
You will not be able to have indexes beyond the primary key; however, you can work around that by creating a data mart, as I describe below.
You will not have database constraints, but similar functionality can be implemented through git/svn/cvs triggers.
You will not have strong transactions, but similar functionality can be implemented through git/svn/cvs triggers.
You will have a working copy for each user, which will consume space depending on the size of the repositories. That way, each user can be at their own point in time.
Git is fast enough at switching branches that going back in time and returning takes only seconds (unless the user data is big, of course).
I saw an interview with Linus Torvalds in which he warned about poor performance in huge Git repositories. It may be best to have one repository per user, or some other way to keep your application from having a single humongous repository.
Granularity of changes: I bet that if you create gazillions of versions, any version control system will complain. I do not know how many "gazillions" is in practice; you will have to test it.
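The hash-based partitioning mentioned above can be sketched as follows. This is a minimal illustration, assuming two directory levels of two hex digits each (256 subdirectories per level); the function name `partitioned_path` and the entity/ID layout are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def partitioned_path(entity, record_id, levels=2, width=2):
    """Map a record ID to entity/<h1>/<h2>/<id>, where h1, h2 are leading MD5 hex digits.

    Each level uses `width` hex digits, i.e. up to 16**width subdirectories,
    so no single directory ends up holding millions of files.
    """
    digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(entity, *parts, record_id)
```

With two levels of two digits, a million records spread over 65,536 leaf directories, roughly 15 files each; increase `levels` or `width` as the expected entity size grows.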
Query database
A version control working copy will be limited to primary key queries using the "=" operator and to sequential scans. This is not enough to produce good reports and statistics for any usage pattern I can think of. That is why you need to build a data mart from your application data, and you have two ways of doing that:
A batch process that reads the whole repository history and builds cubes and other views to allow easier querying.
Git/SVN/CVS triggers, which can call programs you write on file addition, modification and deletion, branch creation and merging. These could be used to update the database when a change happens.
The batch process is easier to implement, but the reports and statistics will lag behind user activity. You will probably want to go that way in version 1.0 and move to triggers over time to make things more dynamic.
Simulating constraints and transactions
Git, SVN and CVS support triggers (hooks) that execute programs when a new version is submitted. These programs can check relationships and consistency and accept or reject the change.
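A hook that simulates a foreign-key constraint could look roughly like this. It is a sketch under assumptions: the repository layout (`customers/<id>.json`, `orders/<id>.json` with a `customer_id` field) and the function names are hypothetical, and how the hook reads the files of the incoming commit is left out. The one behavior relied on from Git is that a non-zero exit status from a pre-commit or pre-receive hook rejects the change.

```python
import json
import sys
from pathlib import PurePosixPath


def check_references(files):
    """files maps repository paths to file contents (JSON text).

    Returns a list of violation messages; an empty list means the change
    can be accepted.
    """
    customer_ids = {PurePosixPath(p).stem for p in files
                    if PurePosixPath(p).parent.name == "customers"}
    violations = []
    for path, text in sorted(files.items()):
        if PurePosixPath(path).parent.name != "orders":
            continue
        try:
            order = json.loads(text)
        except json.JSONDecodeError as exc:
            violations.append(f"{path}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(order, dict):
            violations.append(f"{path}: expected a JSON object")
            continue
        # The "foreign key": every order must point at an existing customer file.
        customer = str(order.get("customer_id"))
        if customer not in customer_ids:
            violations.append(f"{path}: unknown customer {customer}")
    return violations


def main(files):
    problems = check_references(files)
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero exit status makes Git reject the commit or push.
    return 1 if problems else 0
```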
Alternative Solutions
Since you did not specify the kind of application you want, I will talk about blogs, content portals and online stores. For those kinds of applications I see little reason to reinvent the wheel and build a custom database. Most of the versioning needed can be anticipated in the database model. A good event-oriented database design will be enough.
For example, a revision of a blog post could be modeled by marking the end date/time of the current post row and creating a new row for the revised post, incrementing the version number and setting the previous version ID. The same strategy can be used for sales and the catalog of an online store. If you model your application with good logs, you do not need version control.
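The blog-post revision scheme just described can be sketched with SQLite. The table and column names (`post_revision`, `post_key`, `valid_from`, `valid_to`, `previous_id`) are assumptions for illustration; the idea is that a revision never overwrites a row, it closes the current one and inserts its successor.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE post_revision (
    id          INTEGER PRIMARY KEY,
    post_key    TEXT    NOT NULL,   -- identifies the logical post
    version     INTEGER NOT NULL,
    previous_id INTEGER REFERENCES post_revision(id),
    body        TEXT    NOT NULL,
    valid_from  TEXT    NOT NULL,
    valid_to    TEXT                -- NULL marks the current revision
)
"""


def now():
    return datetime.now(timezone.utc).isoformat()


def create_post(conn, post_key, body):
    with conn:  # commit when the block finishes
        conn.execute(
            "INSERT INTO post_revision (post_key, version, body, valid_from)"
            " VALUES (?, 1, ?, ?)", (post_key, body, now()))


def revise_post(conn, post_key, new_body):
    with conn:
        cur_id, version = conn.execute(
            "SELECT id, version FROM post_revision"
            " WHERE post_key = ? AND valid_to IS NULL", (post_key,)).fetchone()
        ts = now()
        # Close the current revision instead of overwriting it...
        conn.execute("UPDATE post_revision SET valid_to = ? WHERE id = ?", (ts, cur_id))
        # ...and link the new revision back to it.
        conn.execute(
            "INSERT INTO post_revision (post_key, version, previous_id, body, valid_from)"
            " VALUES (?, ?, ?, ?, ?)", (post_key, version + 1, cur_id, new_body, ts))


def history(conn, post_key):
    """(version, body, is_current) for every revision, oldest first."""
    return conn.execute(
        "SELECT version, body, valid_to IS NULL FROM post_revision"
        " WHERE post_key = ? ORDER BY version", (post_key,)).fetchall()
```

Viewing an old version is then an ordinary query on `version` or on the `valid_from`/`valid_to` range, so one user can look at the past without affecting what others see.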
Some developers also use a row-level trigger that records everything that changes in the database. This makes things harder for an auditor, who has to reconstruct the past from poorly designed logs. I personally do not like this approach because it is very difficult to index these kinds of queries. I prefer to build my whole application around a well-designed, meaningful log.
For example:
History Table
10/10/2010 [new process] process_id=1; name=john
11/10/2010 [change name] process_id=1; old_name=john; new_name=john doe
12/10/2010 [change name] process_id=1; old_name=john doe; new_name=john doe junior
Process Table after 12/10/2010.
proc_id=1 name=john doe junior
That way I can reconstruct almost anything in the past and still have my operational data in an easy-to-use format.
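Reconstructing the process table from the history log above amounts to replaying the entries in order. A minimal sketch, assuming the log is append-only so its order is the chronological order (the dates are kept exactly as written and used only as labels); `Event`, `replay` and the event kinds are hypothetical names mirroring the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    date: str          # kept as written in the log; only a label here
    kind: str          # "new process" or "change name"
    process_id: int
    data: dict = field(default_factory=dict)


def replay(log, upto=None):
    """Rebuild the process table by applying log entries in order.

    `upto` limits replay to the first `upto` entries, which reconstructs
    the table as it was at that point in the past.
    """
    processes = {}
    for event in log[:upto]:
        if event.kind == "new process":
            processes[event.process_id] = {"name": event.data["name"]}
        elif event.kind == "change name":
            row = processes[event.process_id]
            # The log stores the old value too, so it can double as a consistency check.
            if row["name"] != event.data["old_name"]:
                raise ValueError(f"log inconsistent at {event.date}")
            row["name"] = event.data["new_name"]
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return processes


LOG = [
    Event("10/10/2010", "new process", 1, {"name": "john"}),
    Event("11/10/2010", "change name", 1, {"old_name": "john", "new_name": "john doe"}),
    Event("12/10/2010", "change name", 1,
          {"old_name": "john doe", "new_name": "john doe junior"}),
]
```

`replay(LOG)` yields the current row (`john doe junior`), while `replay(LOG, upto=2)` shows the state after the second entry.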
However, this is not close to the usage pattern you want (branching and merging)
Conclusion
Using version control as a database seems to me very powerful on the one hand and very limited and dangerous on the other. It is very inspiring for auditing and error-correction purposes, but my main concerns would be scale and reliability.
It seems like you want version control for your data rather than the database schema. I could find two databases that implement most of the version control features such as fork, clone, branch, merge, push, and pull:
https://github.com/dolthub/dolt - SQL based
https://github.com/terminusdb/terminusdb - graph based
You mentioned Subversion, which is a Centralized Version Control System. But let us focus on Git, because of reasons. Git is a Decentralized Version Control System. A local copy of a Git repository is the same as a remote copy of the repository, if a remote copy exists at all (services such as GitLab and GitHub provide the remote housing and managing of Git projects). With Git you can have version control in an arbitrary directory on your machine. You can do whatever you are accustomed to doing with SVN, and more, in this arbitrary directory.
What I am getting at is that you could programmatically create per-user directories/repositories on your server and apply version control to them, keeping a separate repository per user (the specifics of the architecture would be decided later, depending on the structure of the user's "work"). Your application would be in charge of adding and removing files on behalf of the user (e.g. Biography, My Sample Project, etc.), editing files, committing the changes, presenting a file history, etc., essentially issuing Git commands. Your application would thus interface with the Git repository, exploiting the advanced version control that Git provides. Your database would just make sure that the user is linked to the directory/repository that contains their "work".
To provide a critical analogy, the GitLab project is an open source web-based Git repository manager with wiki and issue tracking features. GitLab is written in Ruby and uses PostgreSQL (preferably). It is a typical (as in Code - Database - Data directories and files) multiuser web-based application. Its purpose is to manage Git repositories. These Git repositories are stored in a designated directory on the server. Part of the code is responsible for accessing the Git repositories that the logged-in user is authorized to access (as the owner or as a collaborator). An interesting use case is a user editing a file online, which results in a commit in some branch in some repository. Another interesting use case is a user checking the history of a file. A final interesting use case is a user reverting a specific commit. All of these actions are performed online, via a web browser.
To provide an interesting real-world use case, Atlas by O'Reilly is an online platform for publishing-related collaboration using GitLab as the backend.
For Java there is JGit, a lightweight, pure Java library implementing the Git version control system. JGit is used by Eclipse for all actions related to managing Git repositories. Maybe you could look into it. It is an extremely active project, supported by many, Google included.
All of the above make sense, if the "work" you refer to is more than some fields in a database table, which the user will fill in and may later change the values of. For instance, it would make sense for structured text, HTML, etc.
If this "work" is not so large-scale, doing something like what is described above may be overkill. In that case, you could employ some version control concepts in your database design, such as calculating diffs and applying patches (also in reverse, for viewing past versions / rolling back). Your tables should allow for a tree-like structure to store the diffs, so that you can support branches. You could keep the active version of a file readily available, as well as a pointer to the active version (what Git calls HEAD), and navigate to another indexed/hashed/tagged version in the file's history by applying all patches sequentially if moving forward, or by applying patches in reverse, in reverse chronological order, if moving backwards. If this "work" is really small-scale, you could even ditch the diff concept and store each whole version of the "work" in the tree-like structure.
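The tree of diffs described above can be sketched in memory with the standard `difflib` module. This is a simplification under stated assumptions: each version stores a line-based patch against its parent, and a version is reconstructed by replaying patches forward from the root rather than by applying reverse patches. `VersionTree`, `make_patch` and `apply_patch` are hypothetical names; a real design would persist the nodes in tables.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Edits turning `old` into `new`: a list of (start, end, replacement_lines)."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_patch(old, patch):
    """Apply edits from make_patch; they are sorted and non-overlapping."""
    result, pos = [], 0
    for start, end, replacement in patch:
        result.extend(old[pos:start])   # unchanged lines before the edit
        result.extend(replacement)      # inserted or replacing lines
        pos = end                       # skip deleted or replaced lines
    result.extend(old[pos:])
    return result


class VersionTree:
    """Each version stores only a patch against its parent, so versions form a tree."""

    def __init__(self, initial_lines):
        self.root = list(initial_lines)
        self.parent = {0: None}
        self.patch = {0: []}
        self.head = 0  # the active version (what Git calls HEAD)

    def checkout(self, version):
        # Collect the path from `version` up to the root, then replay it root-first.
        path = []
        while version is not None:
            path.append(version)
            version = self.parent[version]
        lines = list(self.root)
        for v in reversed(path):
            lines = apply_patch(lines, self.patch[v])
        return lines

    def commit(self, new_lines, parent=None):
        """Store new_lines as a child of `parent` (default: HEAD) and move HEAD to it."""
        parent = self.head if parent is None else parent
        version = len(self.parent)
        self.patch[version] = make_patch(self.checkout(parent), list(new_lines))
        self.parent[version] = parent
        self.head = version
        return version
```

Committing with an explicit `parent` that is not the current head creates a branch; merging is not covered by this sketch.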
Pure fun.

Creating database tables programmatically in evolutions kingdom

Imagine a program that operates on large hierarchical datasets. The program stores each new dataset in a dedicated table, which is created according to the data types the dataset contains. Nothing very unusual; this is a trivial situation. But how do I make this kind of arrangement in Play 2.0, where the evolutions paradigm rules? I just cannot figure out where to start.
UPDATE
It turns out there is no simple way. OK, then the roundabout way.
Is it possible to:
1) Make the program write the evolution files itself and apply them automatically? Would that go against Play's philosophy?
2) Use another DB system in a separate thread and not use Play's built-in database functionality? Would that hurt much?
UPDATE 2
I am reading through the MongoDB Casbah documentation and I like it a lot. I am planning to use it with my Play application. Is there any evidence against using MongoDB via Casbah with Play?
That's a good question, and unfortunately there's no brilliant answer.
Generally, evolutions are good and desirable when you work in a team. In that case you should switch to manual evolutions (not the ones generated by Ebean, which in their current state are dangerous to your data) and put as much of your initial DDL as possible into the first evolution's CREATE statements.
In later evolutions you can create new tables or alter existing ones, but for god's sake do not try to create an existing table :)
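If the program were to write manual evolution files itself (option 1 in the question), each file would follow Play's `# --- !Ups` / `# --- !Downs` layout. The generator below is a hypothetical sketch: the type mapping, the extra `id` column and the function name are assumptions, and whether Play picks up files written while the application is running is not addressed here.

```python
import re

# Hypothetical mapping from dataset value types to SQL column types.
SQL_TYPES = {int: "BIGINT", float: "DOUBLE", str: "VARCHAR(255)", bool: "BOOLEAN"}
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def evolution_script(table, columns):
    """Render an evolution that creates `table` on the way up and drops it on the way down.

    `columns` maps column names to Python types, e.g. {"depth": int}.
    """
    # Names end up inside SQL text, so refuse anything that is not a plain identifier.
    for name in [table, *columns]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    defs = ["    id BIGINT NOT NULL PRIMARY KEY"]
    defs += [f"    {name} {SQL_TYPES[kind]}" for name, kind in columns.items()]
    return (
        "# --- !Ups\n\n"
        f"CREATE TABLE {table} (\n" + ",\n".join(defs) + "\n);\n\n"
        "# --- !Downs\n\n"
        f"DROP TABLE {table};\n"
    )
```

Each generated script creates exactly one new table, which respects the warning above about never creating an existing table.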
Another approach I was (and still am) thinking about is using Ebean's auto-generated DDL (which always assumes that your DB is empty) to generate differential schemas with an SQL schema migration tool (e.g. MyBatis), but unfortunately this requires additional effort.
The last thing I sometimes use when I'm not sure about the correct evolution syntax is a small test-field app where you can add similar models and watch how Ebean's plugin treats them. Unfortunately, even this won't produce proper ALTER statements, but it's better than testing on the main app.
Well, after some more experiments, I decided to use MongoDB (I had to choose from a wide variety of document-oriented DBMSs and decided to start with MongoDB). I set up a MongoDB server and added its Java driver, Casbah (the driver's Scala wrapper) and all the necessary dependencies to my project, and everything works fine. No need for SQL or the evolutions paradigm whatsoever.
I am not using any of the parts of Play that work with databases (the config file, Anorm, and whatever else is there); I just ignore them and do everything with Mongo.
All works JUST FINE!

In Zend Framework using MultiDB Resource, how do I configure database fallback?

We use Zend Framework's MultiDB resource ( http://framework.zend.com/manual/1.10/en/zend.application.available-resources.html#zend.application.available-resources.multidb )
I've been tasked with adding a new DB resource which has three endpoints for redundancy. I'd like to configure MultiDB so if the connection to the first endpoint fails, it'll connect to the second and, if need be, third endpoint before giving up.
I tried setting resource.multidb.resourcename.host[] but that failed. It looks like it will only accept a single endpoint.
Is there a way to configure fallback? Or do I need to extend Zend_Application_Resource_Multidb?
To be honest, I've never tried software-side fallback for database applications (I'm not even sure it's really possible with Zend Framework "itself"). And there's a good reason for that: it's simply the wrong place for it!
There are several stumbling blocks:
How do you keep data consistent across several databases?
What happens if one database is down?
Assuming you're using MySQL: You might want to take a look at the master - slave replication of MySQL itself: 16.1.1. How to Set Up Replication
Even this might cause you headaches (especially if you have to migrate existing data). I'm currently quite satisfied with a solution I came across several projects ago:
Instead of handling redundancy on your own, simply hand the task to your database/server itself!
The easiest solution I have come across so far is setting up your database on a failover vServer cluster that is independent of your web application.
I'm sorry if my answer doesn't quite match your question, but it might be a thought-provoking impulse toward a different approach.
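For comparison, the application-side fallback the question asks about boils down to trying each endpoint in order and giving up only when all fail. This is a language-neutral sketch in Python, not Zend_Application_Resource_Multidb code; `connect_with_fallback` and the `connect` callable are hypothetical stand-ins for whatever opens a single connection.

```python
def connect_with_fallback(hosts, connect):
    """Try each endpoint in order and return the first connection that succeeds.

    `connect` opens a connection for one host and is expected to raise on failure.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host)
        except Exception as exc:  # a real adapter would catch its driver's error type
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))
```

Note that this only covers connecting; it does nothing about the consistency problems listed above, which is the answer's main objection.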

database versioning

I work as an SCM developer and am currently tasked with an activity that involves database versioning. Although I have done source code management, I am quite new to this, so I would like to hear different views and experiences on how to implement it.
By database (Oracle/Sybase) versioning I mean capturing the changes made to the database schema, triggers, etc. and storing them as revisions. In our company, changes are made to customer databases that we are not aware of, or at least we cannot identify when and by whom a particular change was made. We are just trying to create a record of the changes made to the DB.
Note: I am not a DB guy.
The usual practice is to make changes go through a build process: basically, have a version control tool like CVS where users check in the changes that have to go to the QA and Prod environments.
So, let's say a couple of columns are added to a table: the developer would check in a .ddl script with the "ALTER TABLE ..." command, and it would be "applied" to the database the next time you do a build.
Unless you prevent users (in this case, developers) from making changes directly and instead use a standard build process, tracking changes to objects over time is almost impossible.
Consider the details you'd need later to understand why a change was made: the user who made the change, the time of the change and the reason (check-in comments, bug number, new feature request, etc.). All changes are usually compiled using a standard user like "APPOWNER", and in the absence of a version control system you only have access to the time of the latest change (LAST_DDL_TIME in Oracle's data dictionary).
If your concern is tracking changes to data, you can use triggers or an application like Oracle GoldenGate that reads the redo logs and gives you change capture records. From your question, though, it looks like you are looking for a way to track object changes.
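Trigger-based capture of data changes can be illustrated with SQLite. This is only a sketch of the idea: the `account` and `account_audit` tables are hypothetical, and SQLite has no database users, so unlike an Oracle or Sybase trigger it cannot record who made the change.

```python
import sqlite3

SETUP = """
CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT NOT NULL);

CREATE TABLE account_audit (
    audit_id   INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    operation  TEXT    NOT NULL,
    old_owner  TEXT,
    new_owner  TEXT,
    changed_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- One trigger per operation copies the before/after values into the audit table.
CREATE TRIGGER account_insert AFTER INSERT ON account
BEGIN
    INSERT INTO account_audit (account_id, operation, new_owner)
    VALUES (NEW.id, 'INSERT', NEW.owner);
END;

CREATE TRIGGER account_update AFTER UPDATE ON account
BEGIN
    INSERT INTO account_audit (account_id, operation, old_owner, new_owner)
    VALUES (OLD.id, 'UPDATE', OLD.owner, NEW.owner);
END;

CREATE TRIGGER account_delete AFTER DELETE ON account
BEGIN
    INSERT INTO account_audit (account_id, operation, old_owner)
    VALUES (OLD.id, 'DELETE', OLD.owner);
END;
"""


def open_audited_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn
```

Every INSERT, UPDATE or DELETE on `account` now leaves a row in `account_audit`, whether it comes from the application or from someone typing SQL by hand.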
The best way to do it is to have some kind of DB revision software that manages all changes and makes it easy to apply them to multiple databases (upgrade/downgrade).
It requires saving all changes in the revision software; no direct DB changes.
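The core of such revision software is small: an ordered list of versioned changes plus a table recording which version a database is at. The sketch below uses SQLite and is not the API of Liquibase, Flyway, sqlalchemy-migrate or the depesz scripts; the `MIGRATIONS` list, the `schema_version` table and the example tables are assumptions for illustration.

```python
import sqlite3

# Hypothetical ordered migrations: (version, upgrade SQL, downgrade SQL).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))",
        "DROP TABLE orders"),
]


def current_version(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0  # MAX() of an empty table is NULL, i.e. version 0


def migrate(conn, target):
    """Upgrade or downgrade the database to `target`, recording each step applied."""
    version = current_version(conn)
    with conn:  # commit when the block finishes
        if target > version:
            for v, up, _ in MIGRATIONS:
                if version < v <= target:
                    conn.execute(up)
                    conn.execute("INSERT INTO schema_version (version) VALUES (?)", (v,))
        else:
            # Downgrades run newest-first so dependent objects go before their parents.
            for v, _, down in reversed(MIGRATIONS):
                if target < v <= version:
                    conn.execute(down)
                    conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
    return current_version(conn)
```

Because every database records its own version, the same script can bring development, QA and customer databases to a known state; it only tracks changes that go through it, which is why direct DB changes have to be forbidden.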
Maybe similar tools for PostgreSQL will help:
depesz scripts http://www.depesz.com/index.php/projects/.
Python tool: https://code.google.com/p/sqlalchemy-migrate/

Version track, automate DB schema changes with django

I'm currently looking at the Python framework Django for future DB-based web apps, as well as for a port of some apps currently written in PHP. One of the nastier issues in recent years has been keeping track of database schema changes and deploying them to production systems. I haven't dared ask for the ability to undo them as well, but for testing and debugging that would of course be a great feature. From other questions here (such as this one or this one), I can see that I'm not alone and that this is not a trivial problem. I also found a lot of inspiration in the answers there.
Now, as Django seems to be very powerful, does it have any tools to help with the above? Maybe it's even in their docs and I missed it?
There are at least two third party utilities to handle DB schema migrations, South and Django Evolution. I haven't tried either one, but I have heard some good things about South, though Evolution has been around a little longer.
Also, look at SchemaEvolution on the Django wiki. It is just a wiki page about migrating the db.
Last time I checked (version 0.97), syncdb was able to add tables to sync your DB schema with your models.py file, but it could not:
Rename or add a column on a populated DB. You need to do that by hand.
Refactor your model (e.g. split a table into two) and repopulate your DB accordingly.
It might be possible though to write a Django script to make the migration by playing with the two different managers, but that might take ages if your DB is large.
There was a panel session on DB schema changes at the recent DjangoCon; there is a video of the session (thanks to Google), which should provide some useful information on a number of these utilities.
And now there's also dmigrations. From the announcement:
django-evolution attempts to address this problem the clever way, by detecting changes to models that are not yet reflected in the database schema and figuring out what needs to be done to bring the two back in sync. In contrast, dmigrations takes the stupid approach: it requires you to explicitly state the changes in a sequence of migrations, which will be applied in turn to bring a database up to the most recent state that reflects the underlying models.
This means extra work for developers who create migrations, but it also makes the whole process completely transparent—for our projects, we decided to go with the simplest system that could possibly work.
(My bold)
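To make the contrast concrete, the "clever" detection approach can be sketched for the simplest case: comparing a model's fields with the columns the table actually has. This is not django-evolution's code; `missing_columns` and the field-to-type mapping are hypothetical, and the table name is assumed to be a trusted identifier since it is formatted into the SQL.

```python
import sqlite3


def missing_columns(conn, table, model_fields):
    """Return ALTER TABLE statements for model fields the table lacks.

    model_fields maps column names to SQL types, standing in for a model class.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}"
            for name, sql_type in model_fields.items() if name not in existing]
```

Added columns are easy to detect, but a renamed column looks identical to "drop one column, add another", so detection has to guess. An explicit migration list like dmigrations' simply states the intent instead.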
I have heard a lot of good things about the Django Schema Evolution branch, and those were the opinions of actual users. It mostly works out of the box and does what it should.
You should look up dmigrations; it works a little differently from django-evolution.
It shows you everything it is doing, and for complicated things it asks for your intervention. It should be great.
