I am using Symfony 2 with Doctrine as the ORM framework, and I am looking for the best way to record changes made to database fields. I will have about 100 tables, each with about 50 fields and a few thousand rows, and I would like to keep a record of every change made to those fields.
Possibilities I have considered:
The Doctrine extension "Loggable", which saves changes in a separate table, though I don't know whether it can cope with this many entries.
A MySQL trigger for each table that saves changes in a new table?
What is the best practice for recording changes?
You can use either MySQL triggers or the Loggable feature of the DoctrineExtensions mentioned above. Both work, and both have pros and cons. A MySQL trigger can write into a separate table (see the MySQL trigger FAQ).
triggers:
++ independent of the framework and programming language
++ works even when the data is modified by hand or by a script
-- You have to write triggers for every table, or figure out some generic solution in SQL (I can't help with that).
-- If you are not familiar with stored procedures and procedural SQL, there is a learning curve.
doctrine extensions:
++ Just put your annotation on classes and you're done.
++ You can query the history and revert changes through the repository API
-- you lock yourself into a vendor; sometimes this is a problem, sometimes it isn't
-- doesn't work when you modify the data by hand or with third-party scripts
If the chance of switching from Doctrine to something else is low, I would start with the Doctrine extensions. After all, Doctrine is a tool whose whole purpose is to help you deal with SQL.
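For comparison, here is a minimal sketch of what application-level logging in the spirit of Loggable looks like. It is not Doctrine's API (that is PHP and annotation-driven); it uses Python and SQLite, and the `article`/`change_log` tables and the `save_with_log` helper are hypothetical. The last lines of the demo show the downside listed above: an UPDATE issued outside the application leaves no trace in the log.

```python
import json
import sqlite3

# Hypothetical schema: one entity table plus a generic change log.
SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE change_log (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    version INTEGER NOT NULL,
    data TEXT NOT NULL  -- JSON object holding only the fields that changed
);
"""

def save_with_log(conn, table, row_id, new_values):
    """Update an existing row and log the changed fields.

    Returns the new version number, or 0 if nothing changed. Table and column
    names are interpolated into SQL, so they must come from trusted code.
    """
    cols = list(new_values)
    old = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    changed = {c: new_values[c] for c, o in zip(cols, old) if new_values[c] != o}
    if not changed:
        return 0
    assignments = ", ".join(f"{c} = ?" for c in changed)
    conn.execute(f"UPDATE {table} SET {assignments} WHERE id = ?", (*changed.values(), row_id))
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM change_log WHERE table_name = ? AND row_id = ?",
        (table, row_id),
    ).fetchone()
    conn.execute(
        "INSERT INTO change_log (table_name, row_id, version, data) VALUES (?, ?, ?, ?)",
        (table, row_id, last + 1, json.dumps(changed)),
    )
    return last + 1

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO article (id, title, body) VALUES (1, 'Hello', 'Draft 1')")
    save_with_log(conn, "article", 1, {"title": "Hello", "body": "Draft 2"})
    # This change bypasses the application, so it never reaches change_log.
    conn.execute("UPDATE article SET body = 'edited by hand' WHERE id = 1")
    print(conn.execute("SELECT version, data FROM change_log").fetchall())
```

Storing only the changed fields keeps the log small, which matters with 100 tables of about 50 fields each.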
I'd suggest going with triggers, especially if you want your logging functionality to stay application-independent: it will keep working even if you decide to rewrite your app on a different framework or in a completely different programming language.
P.S. I don't know how good trigger support in MySQL is, since I switched to PostgreSQL before MySQL even had triggers.
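The "generic solution" for triggers can be approached by generating them from the database catalog instead of writing 100 of them by hand. This is a sketch under stated assumptions: it uses SQLite (`PRAGMA table_info`, `sqlite_master`) so it runs self-contained, assumes every audited table has an `id INTEGER PRIMARY KEY`, and writes one `audit_log` row per changed column. MySQL's trigger syntax and catalog (`information_schema.COLUMNS`) differ, so the generated SQL would need adapting.

```python
import sqlite3

# Audit table shared by all audited tables (SQLite column types are optional).
AUDIT_TABLE = """
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    column_name TEXT NOT NULL,
    old_value,
    new_value,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

def audit_trigger_sql(conn, table):
    """Build an AFTER UPDATE trigger that logs one row per column whose value changed.

    Assumes trusted table/column names and an `id` primary key column.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # One SELECT per column; its WHERE clause keeps the row only if that column changed.
    # IS NOT is SQLite's NULL-safe "not equal".
    selects = " UNION ALL ".join(
        f"SELECT '{table}', NEW.id, '{col}', OLD.{col}, NEW.{col} "
        f"WHERE OLD.{col} IS NOT NEW.{col}"
        for col in columns
    )
    return (
        f"CREATE TRIGGER IF NOT EXISTS audit_{table} AFTER UPDATE ON {table} BEGIN "
        "INSERT INTO audit_log (table_name, row_id, column_name, old_value, new_value) "
        f"{selects}; END"
    )

def install_audit_triggers(conn):
    """Create the audit table and one trigger per user table; return the table names."""
    conn.execute(AUDIT_TABLE)
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' AND name != 'audit_log' ORDER BY name"
    )]
    for table in tables:
        conn.execute(audit_trigger_sql(conn, table))
    return tables
```

Because the triggers live in the database, they also fire for changes made by hand or by scripts, which is the main argument for this approach. The generator has to be re-run whenever a table gains columns.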
This is commonly called "change data capture" (CDC). It has been asked about on SO before with reference to MySQL:
Change Data Capture in MySQL
Maybe this answer can help you.
Different vendors offer this as a built-in feature to varying degrees.
The following article gives a step-by-step explanation, with sample code, of versioning data using triggers:
http://www.jasny.net/articles/versioning-mysql-data/
Imagine a program that operates on large hierarchical datasets. The program stores each new dataset in a dedicated table, created according to the data types the dataset contains. Nothing very unusual; this is a trivial situation. But how do I make this kind of arrangement in Play 2.0, where the evolutions paradigm rules? I can't even see where to start.
UPDATE
It turned out there is no simple way. OK, then the roundabout way.
Is it possible to:
1) Make the program write the evolution files itself and apply them automatically? Would that go against Play's philosophy?
2) Use another DB system in a separate thread instead of Play's built-in database functionality? Would that hurt much?
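Option 1 can be sketched as a small generator that renders a dataset's field types into an evolution script and writes it as the next numbered file. This is a hypothetical helper, not part of Play: the type mapping, the `id` column and the directory are assumptions, and only the `# --- !Ups` / `# --- !Downs` markers and the numbered `N.sql` naming follow Play's evolution file format. Whether Play picks up a file written while the application is running depends on when it checks evolutions (normally at startup), so that should be tested before relying on it.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from dataset field types to SQL column types.
SQL_TYPES = {int: "BIGINT", float: "DOUBLE", str: "VARCHAR(255)", bool: "BOOLEAN"}

def evolution_script(table, fields):
    """Render CREATE/DROP statements under Play-style !Ups and !Downs markers."""
    columns = ",\n".join(f"  {name} {SQL_TYPES[kind]}" for name, kind in fields.items())
    return (
        "# --- !Ups\n\n"
        f"CREATE TABLE {table} (\n  id BIGINT PRIMARY KEY,\n{columns}\n);\n\n"
        "# --- !Downs\n\n"
        f"DROP TABLE {table};\n"
    )

def write_next_evolution(evolutions_dir, table, fields):
    """Write the script as the next numbered file (1.sql, 2.sql, ...) and return its path."""
    directory = Path(evolutions_dir)
    directory.mkdir(parents=True, exist_ok=True)
    numbers = [int(p.stem) for p in directory.glob("*.sql") if p.stem.isdigit()]
    path = directory / f"{max(numbers, default=0) + 1}.sql"
    path.write_text(evolution_script(table, fields))
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # In a Play application this directory would be conf/evolutions/default.
        path = write_next_evolution(tmp, "dataset_1", {"name": str, "value": float})
        print(path.name)
        print(path.read_text())
```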
UPDATE 2
I am reading through the MongoDB Casbah documentation and I like it a lot. I am planning to use it with my Play application. Is there any reason not to use MongoDB via Casbah with Play?
That's a good question, and unfortunately there's no brilliant answer.
Generally, evolutions are good and desirable when you work in a team. In that case you should switch to manual evolutions (not the ones generated by Ebean, which in their current state are dangerous to your data) and put as much of your initial DDL as possible into CREATE statements in the first evolution.
In subsequent evolutions you can create new tables or alter existing ones, but for god's sake do not try to create a table that already exists :)
Another approach I was (and still am) thinking about is using Ebean's auto-generated DDL (which always assumes your DB is empty) to generate differential schemas with an SQL schema migration tool (e.g. MyBatis), but unfortunately that requires additional effort.
The last thing I sometimes use when I'm not sure about the correct evolution syntax is a small test app where you can add similar models and watch how Ebean's plugin treats them. Unfortunately even this won't produce proper ALTER statements, but it's better than testing on the main app.
Well, after some more experiments, I decided to use MongoDB (actually, I had to choose from a wide variety of document-oriented DBMSs, and decided to start with MongoDB). I set up a MongoDB server, added its Java driver, Casbah (the driver's Scala wrapper) and all the necessary dependencies to my project, and everything works fine. No need for SQL or the evolutions paradigm whatsoever.
And I am not using any parts of Play that work with databases (the config file, Anorm, and whatever else there is); I just ignore them and do everything with Mongo.
All works JUST FINE!
I'm pretty new to database development in general and I've never used an ORM before. I'm interested in the benefits of using one, specifically saving time writing boilerplate SQL queries. I'd like to use an ORM for a project that I'm working on right now, but I'm not sure it's applicable.
This project is more akin to change tracking for very small (<= 500 characters) documents. I need to track edits and categorizations made by multiple users, not so much to see the specific changes they make as to see whether the users agree with each other. I am using a SQL database for this (as opposed to actual document control) for a few reasons:
The documents are really small, and I'm only interested in the strings, not in files.
I wanted the ability to perform ad-hoc queries against the data for development purposes, and didn't want to be unpleasantly surprised halfway through to find that a particular document control package couldn't do what I wanted.
From most of what I've read, it seems you need a direct mapping from columns to data fields in an object to use an ORM. What I have now does not even come close to this: to create objects representing documents in different stages of editing, I have to cobble together data from columns in different tables, in different combinations.
So my question is: Does an ORM like Hibernate apply to this type of project? And if it does can one be added to an existing application/database?
If it makes a difference: I'm using Java, MySQL, and JDBC. The web app that users access for editing is built with GWT and hosted on Tomcat 6. If I need it, I have complete control of the web server.
Thanks.
Does an ORM like Hibernate apply to this type of project?
Yes
And if it does, can one be added to an existing application/database?
Yes
My opinion is that an ORM tool could be useful for you, but you really need to delve into it to see for yourself. Remember that when you use an ORM tool, you are not forced to use only that to connect to your database. ORM tools generally make the most sense for applications that store data in a very object-like structure. For instance, your user code might be the place to start: usually you create one user at a time, edit one user at a time, and check whether one user is logged in. It also makes sense where you return a list of results, like order lines. Where I have run into issues with ORM tools is with complex data that requires multiple joins, especially joins back to the table you started in. For those cases you might want to keep doing what you are doing. Overall, ORM tools are great, but they are like a lot of other things in software development: try them out on a small part of your code, and use them where they work and not where they don't. Ultimately, you are the one who will have to deal with and maintain whatever you make. Just educate yourself on Hibernate and I am sure you will know what to do!
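As an illustration of "keep doing what you are doing" for the join-heavy parts, here is a minimal hand-written mapping that assembles one object per user from three tables. The schema, the `UserView` class and the agreement rule are hypothetical stand-ins for the document-tracking tables described in the question, with Python and SQLite in place of Java/JDBC and MySQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: original text, per-user edits, per-user categories.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, original TEXT NOT NULL);
CREATE TABLE edit (doc_id INTEGER, user_name TEXT, edited_text TEXT);
CREATE TABLE category (doc_id INTEGER, user_name TEXT, label TEXT);
"""

@dataclass
class UserView:
    """One user's version of a document, assembled from three tables."""
    doc_id: int
    user_name: str
    text: str              # the user's edit, or the original text if they made none
    label: Optional[str]   # the category the user chose

# Every user who categorized the document, plus their edit if one exists.
QUERY = """
SELECT d.id, c.user_name, COALESCE(e.edited_text, d.original), c.label
FROM document d
JOIN category c ON c.doc_id = d.id
LEFT JOIN edit e ON e.doc_id = d.id AND e.user_name = c.user_name
WHERE d.id = ?
ORDER BY c.user_name
"""

def load_views(conn, doc_id):
    """Map each joined row onto a UserView by position."""
    return [UserView(*row) for row in conn.execute(QUERY, (doc_id,))]

def users_agree(views):
    """True when all users ended up with the same text and the same label."""
    return len({(v.text, v.label) for v in views}) <= 1
```

An ORM could then handle the simple one-row-per-object tables, while queries like this one stay hand-written.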
I think that ORM (I would suggest using the JPA standard, probably with Hibernate as the provider) could suit your project.
It is fairly traditional, as you say, for database columns to map directly onto object fields. If you need to keep your existing database structure (which apparently doesn't map at all well to your objects), then you might find that using an ORM is more trouble than it's worth.
While it's certainly possible to use an ORM to map to a specific database schema (perhaps because it's used by other systems), my view is that one of the biggest advantages of ORM is that you can almost ignore the schema. Once you design your objects and tell Hibernate about them, Hibernate will create whatever tables it needs.
If you have to create an application like, let's say, a blog application, creating the database schema is relatively simple. You create some tables, tblPosts, tblAttachments, tblComments, tblBlaBla… and that's it (OK, I know that's a bit simplified, but you get what I mean).
What if you have an application where you want to allow users to define parts of the schema at runtime? Let's say you want to build an application where users can log any kind of data. One user wants to log his working hours (startTime, endTime, project Id, description), the next wants to collect cooking recipes, others maybe stock quotes, the weekly weight of their babies, their monthly food expenses, the results of their favorite football teams, or whatever else you can think of.
How would you design a database to hold all those very different kinds of data? Would you create a generic schema that can hold all kinds of data, would you create new tables reflecting the user's data schema, or do you have another great idea?
If it's important: I have to use SQL Server / Entity Framework.
Let's try again.
If you want them to be able to create their own schema, then why not build the schema using, oh, I dunno, the CREATE TABLE statement? You have a full-boat, fully functional, powerful database that can do amazing things like define schemas and store data. Why not use it?
If you were just going to do some ad-hoc properties, then sure.
But if it's "carte blanche, they can do whatever they want", then let them.
Do they have to know SQL? Umm, no. That's your UI's task. Your job as a tool and application designer is to hide the implementation from the user. So present lists of fields, lines and arrows if you want relationships, etc. Whatever.
Folks have been making "end user", "simple" database tools for years.
"What if they want to add a column?" Then add a column; databases do that, most good ones at least. If not, create the new table, copy the old data, drop the old one.
"What if they want to delete a column?" See above. If yours can't remove columns, then remove it from the user's logical view so it looks like it's deleted.
"What if they have eleventy zillion rows of data?" Then they have eleventy zillion rows of data, and operations take eleventy zillion times longer than if they had one row of data. If they have eleventy zillion rows of data, they probably shouldn't be using your system for this anyway.
The fascination of "implementing databases on databases" eludes me.
"I have Oracle here, how can I offer fewer features and make it slower for the user??"
Gee, I wonder.
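A minimal sketch of this "just use CREATE TABLE" approach, assuming the UI hands over a table name and a map of field names to a few whitelisted types (the type list and function names are hypothetical). Identifiers are quoted rather than trusted, and `drop_column` follows the create/copy/drop route described in the answer. SQLite stands in for SQL Server here; recent SQLite versions (3.35+) and most server databases can also drop a column directly with ALTER TABLE.

```python
import sqlite3

# Column types the (hypothetical) UI offers; anything else raises KeyError.
ALLOWED_TYPES = {"text": "TEXT", "number": "REAL", "integer": "INTEGER", "date": "TEXT"}

def quote_ident(name):
    """Quote an identifier so user-chosen names cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'

def create_user_table(conn, table, fields):
    cols = ", ".join(f"{quote_ident(n)} {ALLOWED_TYPES[t]}" for n, t in fields.items())
    conn.execute(f"CREATE TABLE {quote_ident(table)} (id INTEGER PRIMARY KEY, {cols})")

def add_column(conn, table, name, kind):
    conn.execute(
        f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(name)} {ALLOWED_TYPES[kind]}"
    )

def drop_column(conn, table, name):
    """Create the new table, copy the old data, drop the old one, then rename.

    Copies only column names, declared types and a single-column primary key,
    which is all create_user_table produces.
    """
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    keep = [(row[1], row[2], row[5]) for row in info if row[1] != name]
    defs = ", ".join(
        f"{quote_ident(col)} {decl}" + (" PRIMARY KEY" if pk else "") for col, decl, pk in keep
    )
    names = ", ".join(quote_ident(col) for col, _, _ in keep)
    new_table = quote_ident(table + "__new")
    conn.execute(f"CREATE TABLE {new_table} ({defs})")
    conn.execute(f"INSERT INTO {new_table} ({names}) SELECT {names} FROM {quote_ident(table)}")
    conn.execute(f"DROP TABLE {quote_ident(table)}")
    conn.execute(f"ALTER TABLE {new_table} RENAME TO {quote_ident(table)}")
```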
There's no way you can predict how complex their data requirements will be. Entity-Attribute-Value is one typical solution many programmers use, but it might not be sufficient, for instance if the user's data would conventionally be modeled with multiple tables.
I'd serialize the user's custom data as XML or YAML or JSON or similar semi-structured format, and save it in a text BLOB.
You can even create inverted indexes so you can look up specific values among the attributes in your BLOB. See http://bret.appspot.com/entry/how-friendfeed-uses-mysql (the technique works in any RDBMS, not just MySQL).
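A minimal sketch of the blob-plus-index idea, loosely modeled on the FriendFeed post: each record is stored as a JSON document, and a separate index table holds (attribute, value, entity_id) rows for the attributes you want to search on. The table names and the choice of indexed attributes are assumptions for illustration.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE entity (id INTEGER PRIMARY KEY, body TEXT NOT NULL);  -- JSON document
CREATE TABLE attr_index (
    attr TEXT NOT NULL,
    value TEXT NOT NULL,
    entity_id INTEGER NOT NULL,
    PRIMARY KEY (attr, value, entity_id)
);
"""

def save(conn, entity_id, data, indexed=("user", "kind")):
    """Store the whole record as JSON and refresh the index rows for selected attributes."""
    with conn:  # the blob and its index rows are committed together
        conn.execute(
            "INSERT OR REPLACE INTO entity (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(data)),
        )
        conn.execute("DELETE FROM attr_index WHERE entity_id = ?", (entity_id,))
        conn.executemany(
            "INSERT INTO attr_index (attr, value, entity_id) VALUES (?, ?, ?)",
            [(a, str(data[a]), entity_id) for a in indexed if a in data],
        )

def find(conn, attr, value):
    """Look records up through the index table instead of scanning every blob."""
    rows = conn.execute(
        "SELECT e.body FROM attr_index i JOIN entity e ON e.id = i.entity_id "
        "WHERE i.attr = ? AND i.value = ? ORDER BY e.id",
        (attr, str(value)),
    )
    return [json.loads(body) for (body,) in rows]
```

New attributes can appear in any record without a schema change; only attributes listed in `indexed` become searchable without a full scan.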
Also consider using a document store such as Solr or MongoDB. These technologies do not need to conform to relational database conventions. You can add new attributes to any document at runtime, without needing to redefine the schema. But it's a tradeoff -- having no schema means your app can't depend on documents/rows being similar throughout the collection.
I'm a critic of the Entity-Attribute-Value anti-pattern.
I've written about EAV problems in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Here's an SO answer where I list some problems with Entity-Attribute-Value: "Product table, many kinds of products, each product has many parameters."
Here's a blog I posted the other day with some more discussion of EAV problems: "EAV FAIL."
And be sure to read this blog "Bad CaRMa" about how attempting to make a fully flexible database nearly destroyed a company.
I would go for a hybrid Entity-Attribute-Value model: like Antony's reply, you have EAV tables, but you also have default columns (and class properties) that always exist.
Here's a great article on what you're in for :)
As an additional comment, I knocked up a prototype for this approach using Linq2Sql in a few days, and it was a workable solution. Given that you've mentioned Entity Framework, I'd take a look at version 4 and their POCO support, since this would be a good way to inject a hybrid EAV model without polluting your EF schema.
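A minimal sketch of the hybrid model, with hypothetical names and Python/SQLite instead of Linq2Sql or Entity Framework: fixed columns that every entry has live in `log_entry` and map to ordinary class properties, while user-defined fields go to an EAV table and come back as a dictionary. The EAV `value` column is TEXT, so numbers come back as strings; that loss of typing is one of the EAV drawbacks discussed above.

```python
import sqlite3
from dataclasses import dataclass, field

SCHEMA = """
CREATE TABLE log_entry (id INTEGER PRIMARY KEY, owner TEXT NOT NULL, created TEXT NOT NULL);
CREATE TABLE log_entry_attr (
    entry_id INTEGER NOT NULL REFERENCES log_entry(id),
    name TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (entry_id, name)
);
"""

@dataclass
class LogEntry:
    id: int
    owner: str     # default column: every entry has one
    created: str   # default column
    extras: dict = field(default_factory=dict)  # user-defined fields, stored as EAV rows

def save(conn, entry):
    """Write the fixed columns as one row and the extras as one EAV row each."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO log_entry (id, owner, created) VALUES (?, ?, ?)",
            (entry.id, entry.owner, entry.created),
        )
        conn.execute("DELETE FROM log_entry_attr WHERE entry_id = ?", (entry.id,))
        conn.executemany(
            "INSERT INTO log_entry_attr (entry_id, name, value) VALUES (?, ?, ?)",
            [(entry.id, k, v) for k, v in entry.extras.items()],
        )

def load(conn, entry_id):
    row = conn.execute(
        "SELECT id, owner, created FROM log_entry WHERE id = ?", (entry_id,)
    ).fetchone()
    extras = dict(conn.execute(
        "SELECT name, value FROM log_entry_attr WHERE entry_id = ?", (entry_id,)
    ))
    return LogEntry(*row, extras=extras)
```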
On the surface, a schema-less or document-oriented database such as CouchDB or SimpleDB for the custom user data sounds ideal. But I guess that doesn't help much if you can't use anything but SQL and EF.
I'm not familiar with the Entity Framework, but I would lean towards the Entity-Attribute-Value (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model) database model.
So, rather than creating tables and columns on the fly, your app would create attributes (or collections of attributes) and then your end users would complete the values.
But, as I said, I don't know what the Entity Framework is supposed to do for you, and it may not let you take this approach.
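To make "the app creates attributes and users fill in the values" concrete, here is a minimal EAV sketch with hypothetical table names, using SQLite rather than SQL Server and no Entity Framework. Reading a record back as a row requires a pivot (one conditional aggregate per attribute), which is where much of EAV's query complexity comes from.

```python
import sqlite3

SCHEMA = """
CREATE TABLE attribute (id INTEGER PRIMARY KEY, collection TEXT NOT NULL, name TEXT NOT NULL,
                        UNIQUE (collection, name));
CREATE TABLE record (id INTEGER PRIMARY KEY, collection TEXT NOT NULL);
CREATE TABLE attr_value (record_id INTEGER NOT NULL, attribute_id INTEGER NOT NULL, value TEXT,
                         PRIMARY KEY (record_id, attribute_id));
"""

def define_attributes(conn, collection, names):
    """The application creates the attributes (the user's 'columns')."""
    conn.executemany(
        "INSERT INTO attribute (collection, name) VALUES (?, ?)",
        [(collection, n) for n in names],
    )

def add_record(conn, collection, values):
    """The end user fills in values; an unknown attribute name raises KeyError."""
    ids = dict(conn.execute(
        "SELECT name, id FROM attribute WHERE collection = ?", (collection,)
    ))
    record_id = conn.execute(
        "INSERT INTO record (collection) VALUES (?)", (collection,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO attr_value (record_id, attribute_id, value) VALUES (?, ?, ?)",
        [(record_id, ids[name], v) for name, v in values.items()],
    )
    return record_id

def pivot(conn, collection):
    """Rebuild one row per record; assumes the collection has at least one attribute."""
    names = [n for (n,) in conn.execute(
        "SELECT name FROM attribute WHERE collection = ? ORDER BY id", (collection,)
    )]
    columns = ", ".join("MAX(CASE WHEN a.name = ? THEN v.value END)" for _ in names)
    sql = (
        f"SELECT r.id, {columns} FROM record r "
        "LEFT JOIN attr_value v ON v.record_id = r.id "
        "LEFT JOIN attribute a ON a.id = v.attribute_id "
        "WHERE r.collection = ? GROUP BY r.id ORDER BY r.id"
    )
    return names, conn.execute(sql, (*names, collection)).fetchall()
```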
Not as a critical comment, but it may save you some time to point out that this is one of those Don Quixote, Holy Grail type issues. There has been an eternal quest, for probably over 50 years, to make a user-friendly database design interface.
The only quasi-successful ones that have gained any significant traction that I can think of are 1. Excel (and its predecessors), 2. FileMaker (the original, not its current flavor), and 3. (possibly, but doubtfully) Access. Note that the first two are basically limited to one table.
I'd be surprised if our collective conventional wisdom is going to help you break the barrier. But it would be wonderful.
Rather than re-implement SQL Server's "CREATE TABLE" statement, which was done many years ago by a team of programmers who were probably better than you or I, why not work on exposing SQL Server to the users in a limited way: let them create their own schema within limits and leverage the power of SQL Server to do it properly.
I would just give them a copy of SQL Server Management Studio, and say, "go nuts!" Why reinvent a wheel within a wheel?
Check out this post: you can do it, but it's a lot of hard work :) If performance is not a concern, an XML solution could work too, though that is also a lot of work.
I know that there are a few (automatic) ways to create a data access layer to manipulate an existing database (LINQ to SQL, Hibernate, etc...). But I'm getting kind of tired (and I believe that there should be a better way of doing things) of stuff like:
Creating/altering tables in Visio
Using Visio's "Update Database" to create/alter the database
Importing the tables into a "LINQ to SQL classes" object
Changing the code accordingly
Compiling
What about a way to generate the database schema from the objects/entities definition? I can't seem to find good references for tools like this (and I would expect some kind of built-in support in at least some frameworks).
It would be perfect if I could just:
Change the object definition
Change the code that manipulates the object
Compile (the database changes are done auto-magically)
Check out DataObjects.Net: it is designed to support exactly this case. Code only, and nothing else. Its schema upgrade layer is probably the most full-featured one you can find, and it fully abstracts schema upgrade SQL.
Check out the product video: you'll notice nothing additional is done to sync the schema. The schema upgrade sample shows the intended usage of this feature.
You may be looking for an Object Database.
I believe this is the problem that the Microsoft Entity Framework is trying to address. While it is not specifically designed to "Compile (the database changes are done auto-magically)", it does address the issue of handling changes to the domain model without a heavy dependence on the underlying data model.
As Jason suggested, object db might be a good choice. Take a look at db4objects.
What you described is GORM. It is part of the Grails framework and is built to work with Hibernate (maybe JPA in the future). When I first used Grails it seemed backwards: I was more comfortable with a Rails-style workflow of making the tables and letting the framework generate scaffolding from the database schema. GORM persists your domain objects for you, so you create and change the objects and it manages database creation and updates. This makes more sense now that I have gotten used to it. Sorry to tease you if you aren't looking for a new framework, but making GORM available standalone is on the roadmap for release 1.1.
When we built the first version of our own framework (Inon Datamanager) I had it read pre-existing SQL tables and autogenerate Java objects from them.
When my colleagues who came from a Smalltalkish background built the second version, they started from the objects and then autogenerated the tables.
Actually, they forgot about the SQL part altogether until I came back in and added it. But nowadays we just run a trigger on application startup which iterates over the object model, checks if the tables and all the right columns exist, and creates them if not. Very convenient.
This turned out to be a lot easier than you might expect - if your favourite tool doesn't support a similar process, you could probably write it in a couple of hours - assuming the relational to object mapping is relatively simple.
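A minimal sketch of that startup check, under these assumptions: models are plain Python dataclasses, table names are the lower-cased class names, and only `int`, `float` and `str` fields occur. It creates missing tables and adds missing columns but never drops or alters anything, which is the same limitation other answers here mention for tools like syncdb.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical model classes; their fields are the authoritative schema.
@dataclass
class Customer:
    id: int
    name: str
    email: str

@dataclass
class Invoice:
    id: int
    customer_id: int
    total: float

SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}

def sync_schema(conn, models):
    """Create missing tables and add missing columns; return a list of what changed."""
    changes = []
    for model in models:
        table = model.__name__.lower()
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        cols = [(f.name, SQL_TYPES[f.type]) for f in fields(model)]
        if not existing:  # PRAGMA table_info returns no rows for a missing table
            defs = ", ".join(
                f"{name} {sql_type}" + (" PRIMARY KEY" if name == "id" else "")
                for name, sql_type in cols
            )
            conn.execute(f"CREATE TABLE {table} ({defs})")
            changes.append(f"created {table}")
            continue
        for name, sql_type in cols:
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
                changes.append(f"added {table}.{name}")
    return changes
```

Calling `sync_schema(conn, [Customer, Invoice])` at application startup makes the object model the authoritative source; renames and type changes would still need manual migration.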
But the point is, it seems to depend on whether you're culturally an object person or a database person - you can regard either one as the authoritative source.
Some of the really big dogs, such as ERwin Data Modeler, will go object to DB. You need to have the big bucks to afford the product though.
I kept digging around some of the "major" frameworks and it seems that Django does exactly what I was talking about. Or so it seems from this screencast.
Does anyone have any remark to make about this? Does it work well?
Yes, Django works well.
yes, it will generate your SQL tables from your data model definitions (written in python)
It won't always alter existing tables if you update your structure, you might have to run an ALTER table manually
Ruby on Rails has an even more advanced version of these features (Rails migrations), but I don't like the framework as much; I find Ruby and Rails pretty idiosyncratic.
Kind of a late answer, but here it goes:
I faced the exact same problem and ended up writing my own solution for it, though it works with .NET and SQL Server only. It basically implements the process you describe:
All DB objects are kept as embedded CREATE scripts as part of the source code
DB Objects are set up automatically (or on request) when using the data access functionality
All non-table changes are also performed automatically (or on request) at the same time
Table changes, which may require special attention to migrate data, are performed via (manually created) change scripts, also upon upgrading the database
Even manual changes made to any database object can be detected, so that schema integrity can be verified and rectified
An optional lightweight ORM can map stored procedures and objects as well as result sets (even multiple)
A command-line application helps keep the SQL source files in sync with a development database
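The integrity check in this list can be approximated by comparing the CREATE scripts kept in source control with what the database reports about itself. The sketch below is not that library's code; it assumes SQLite, where `sqlite_master.sql` holds each object's CREATE statement, and compares whitespace- and case-normalized text. As in the list above, views and indexes are recreated automatically, while table differences are only reported, since they need a change script to migrate data.

```python
import re
import sqlite3

# Hypothetical "embedded" CREATE scripts, in dependency order.
EXPECTED = {
    "customer": "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    "customer_name": "CREATE INDEX customer_name ON customer (name)",
    "customer_view": "CREATE VIEW customer_view AS SELECT id, name FROM customer",
}

def normalize(sql):
    """Collapse whitespace, drop a trailing semicolon and lower-case for comparison."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").lower()

def check_schema(conn, expected):
    """Map object names to 'missing', 'changed' or 'unexpected'; empty dict means in sync."""
    live = {name: sql for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'"
    )}
    report = {}
    for name, script in expected.items():
        if name not in live:
            report[name] = "missing"
        elif normalize(live[name]) != normalize(script):
            report[name] = "changed"
    for name in live.keys() - expected.keys():
        report[name] = "unexpected"
    return report

def repair(conn, expected, report):
    """Recreate missing or changed non-table objects; return the names that were fixed."""
    kinds = dict(conn.execute("SELECT name, type FROM sqlite_master"))
    fixed = []
    for name, status in sorted(report.items()):
        script = expected.get(name)
        if script is None or normalize(script).startswith("create table"):
            continue  # unexpected objects and table changes are left to a change script
        if status == "changed":
            conn.execute(f"DROP {kinds[name].upper()} {name}")
        conn.execute(script)
        fixed.append(name)
    return fixed
```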
The library, including its database components, is free under an LGPL license.
http://code.google.com/p/bsn-modulestore/
I'm currently looking at the Python framework Django for future database-backed web apps, as well as for porting some apps currently written in PHP. One of the nastier issues of recent years has been keeping track of database schema changes and deploying them to production systems. I haven't dared ask for the ability to undo them too, but of course for testing and debugging that would be a great feature. From other questions here (such as this one or this one), I can see that I'm not alone and that this is not a trivial problem. I also found a lot of inspiration in the answers there.
Now, as Django seems to be very powerful, does it have any tools to help with the above? Maybe it's even in their docs and I missed it?
There are at least two third party utilities to handle DB schema migrations, South and Django Evolution. I haven't tried either one, but I have heard some good things about South, though Evolution has been around a little longer.
Also, look at SchemaEvolution on the Django wiki. It is just a wiki page about migrating the db.
Last time I checked (version 0.97), syncdb can add tables to sync your DB schema with your models.py file, but it cannot:
Rename or add a column on a populated DB. You need to do that by hand.
Refactor your model (e.g. split a table into two) and repopulate your DB accordingly.
It might be possible though to write a Django script to make the migration by playing with the two different managers, but that might take ages if your DB is large.
There was a panel session on DB schema changes at the recent DjangoCon; there is a video of the session (thanks to Google), which should provide some useful information on a number of these utilities.
And now there's also dmigrations. From the announcement:
django-evolution attempts to address this problem the clever way, by detecting changes to models that are not yet reflected in the database schema and figuring out what needs to be done to bring the two back in sync. In contrast, dmigrations takes the stupid approach: it requires you to explicitly state the changes in a sequence of migrations, which will be applied in turn to bring a database up to the most recent state that reflects the underlying models.
This means extra work for developers who create migrations, but it also makes the whole process completely transparent—for our projects, we decided to go with the simplest system that could possibly work.
(My bold)
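A minimal sketch of the explicit approach, with hypothetical migration names: an ordered list of (name, up, down) steps and a bookkeeping table recording which ones have run. Each step and its bookkeeping row are committed together, and the down steps provide the undo mentioned in the question. This is not dmigrations' actual API, just the same idea in a few lines of Python and SQLite.

```python
import sqlite3

# Hypothetical migrations, applied strictly in list order: (name, up SQL, down SQL).
MIGRATIONS = [
    ("001_create_post",
     "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)",
     "DROP TABLE post"),
    ("002_create_comment",
     "CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)",
     "DROP TABLE comment"),
    ("003_index_comment_post",
     "CREATE INDEX comment_post ON comment (post_id)",
     "DROP INDEX comment_post"),
]

def _applied(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    return {name for (name,) in conn.execute("SELECT name FROM schema_migrations")}

def _run_atomically(conn, schema_sql, bookkeeping_sql, name):
    """Run one schema change and its bookkeeping row in a single transaction.

    Assumes no transaction is already open on the connection.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(schema_sql)
        conn.execute(bookkeeping_sql, (name,))
        conn.commit()
    except Exception:
        conn.rollback()
        raise

def migrate_up(conn, migrations=MIGRATIONS):
    """Apply every migration that has not run yet, in order."""
    done = _applied(conn)
    ran = []
    for name, up, _down in migrations:
        if name not in done:
            _run_atomically(conn, up, "INSERT INTO schema_migrations (name) VALUES (?)", name)
            ran.append(name)
    return ran

def migrate_down(conn, steps=1, migrations=MIGRATIONS):
    """Undo the most recently applied migrations, newest first."""
    done = _applied(conn)
    undone = []
    for name, _up, down in reversed(migrations):
        if len(undone) == steps:
            break
        if name in done:
            _run_atomically(conn, down, "DELETE FROM schema_migrations WHERE name = ?", name)
            undone.append(name)
    return undone
```

The trade-off is the one the announcement describes: every change must be written by hand, but nothing happens that isn't listed.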
I have heard a lot of good things about the Django schema-evolution branch, and those were the opinions of actual users. It mostly works out of the box and does what it should.
You should look at dmigrations; it works a little differently from django-evolution.
It shows you everything it is doing, and for complicated changes it asks for your intervention. It should be great.