I am writing an application using an object database (db4o) and in agile fashion will be starting from a small, minimal implementation and iteratively adding features from there, while releasing new versions of the software as I go.
The main question I have is how to maintain backwards compatibility for the database, as new implementations of the model classes are developed, so that users will be able to use first edition saved data with 10th edition software.
Are there some standard ways to do this, especially using an object database?
db4o supports automatic object schema evolution for basic class model changes (field addition/deletion). More complex class model modifications, like a field rename, a field type change, or a hierarchy move, are not automated out of the box, but can be automated by writing a small utility update program.
See documentation here and here.
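For the rename case, a minimal sketch of such an update step might look like the following (class and field names are placeholders; I believe db4o exposes the rename hook on the class/field configuration, applied before the file is opened):

    using Db4objects.Db4o;
    using Db4objects.Db4o.Config;

    class Customer
    {
        private string _fullName; // formerly _name
        public string FullName
        {
            get { return _fullName; }
            set { _fullName = value; }
        }
    }

    class Migration
    {
        static void Main()
        {
            IEmbeddedConfiguration config = Db4oEmbedded.NewConfiguration();
            // Declare that Customer._name is now called _fullName, so stored
            // objects are read under the new name instead of losing the value:
            config.Common.ObjectClass(typeof(Customer)).ObjectField("_name").Rename("_fullName");

            using (IObjectContainer db = Db4oEmbedded.OpenFile(config, "app.db4o"))
            {
                // From here on, objects load with the renamed field populated.
            }
        }
    }

As far as I know the rename is applied to the stored metadata when the container opens, so a one-off run of such a utility is typically enough.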
It's clear to me that I can customize how models are synced with the DB schema. I am using the DropCreateDatabaseIfModelChanges<> class to do so.
Assume that I have a working project and site, and the DB is filling with data. Everything is working fine.
One day I decide that some functionality needs to be changed. The changes will affect the properties of my models (they can be renamed/deleted/added; some models will be new, some models will be deleted).
My question: What will happen with the already existing data on my deployed site when I check in all of my changes?
Will I lose it? If so, how can I avoid that?
Yes, you will lose your data if your model changes and you are using DropCreateDatabaseIfModelChanges<T>.
To avoid this:
Don't use DB initializers in production (except maybe CreateDatabaseIfNotExists<T>). DB initializers are there to smooth the development experience, not for production use.
What you need is the new Migrations feature of Entity Framework 4.3 (currently in Beta 1), which provides automatic and code-based DB schema migration.
Also, you can now set the DB initializer from the *.config file, so you can easily switch between the development-time DropCreateDatabaseIfModelChanges and no initializer at all in production configurations.
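In code, disabling initialization for production can be as simple as this sketch (MyContext stands in for your own DbContext type; passing null turns initializers off entirely):

    using System.Data.Entity;

    public class MyContext : DbContext { }

    public static class Startup
    {
        public static void ConfigureDatabase()
        {
            // No initializer at all: the database is never created or dropped
            // by EF; schema changes are handled by Migrations instead.
            Database.SetInitializer<MyContext>(null);
        }
    }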
I do a lot of untyped dataset work in my projects and have done so for a while, but when working with in-place editing of a DataGridView I found it's a lot easier for validation and such if you use a typed dataset.
This poses an issue, though, because I don't like using the dataset designers to create strongly typed datatables/datasets. Making simple changes down the road is harder with a typed dataset than with an untyped one. Typed dataset changes require a copy of VS to be installed, whereas untyped ones don't: I can change a SQL view on the DB server and the apps will show the new column in my grid. They may not be able to use the new column, but most of my stuff is info display, so that's OK.
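For context, the untyped pattern I mean looks roughly like this (view name and connection string are placeholders); the columns are inferred from the result set at runtime, so a changed view needs no recompile:

    using System.Data;
    using System.Data.SqlClient;
    using System.Windows.Forms;

    static class UntypedLoad
    {
        public static void Bind(DataGridView grid)
        {
            var table = new DataTable();
            using (var conn = new SqlConnection("connection string here"))
            using (var adapter = new SqlDataAdapter("SELECT * FROM dbo.MyView", conn))
            {
                adapter.Fill(table); // Fill opens and closes the connection itself
            }
            grid.DataSource = table; // new view columns appear automatically
        }
    }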
I looked at Entity Framework, but it too looks like a few wizards must run to build your data model. I'm not against a data model but it would be great if it would generate at runtime so that new changes to the db don't require software recompiling.
Is there a happy medium? Or am I stuck creating untyped datatables at startup for a while longer?
It's all a matter of taste, of course, but I find that Datasets are the root of all evil.
Well, maybe not all evil, but they represent a data structure that has no behavior associated with it => they are not objects (as defined in OOP) => using them promotes non-OOP-style programming (not to mention procedural programming).
some other points:
gridviews and any other control fully support binding to a list of objects (and not just datasets); see the sketch after this list.
I think that, if you have to make changes to your data model, having a copy of VS installed is not too much to ask.
along the same lines- it's also not an exaggerated requirement to have to recompile your code when you make changes to your data model.
if, when you change a table in your db, that forces a change on your UI, I would say it's not "loosely coupled" by any stretch of the imagination.
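To make the first point concrete, here is a minimal sketch of binding a DataGridView to plain objects (Person is an illustrative type, not from any framework):

    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Windows.Forms;

    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    public class MainForm : Form
    {
        public MainForm()
        {
            var grid = new DataGridView { Dock = DockStyle.Fill, AutoGenerateColumns = true };
            // BindingList raises list-changed notifications, so in-place edits
            // and additions show up in the grid without a DataSet in sight:
            grid.DataSource = new BindingList<Person>(new List<Person>
            {
                new Person { Name = "Ada", Age = 36 },
            });
            Controls.Add(grid);
        }
    }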
I believe that the only justification for using datasets is to pull data out of your db and then transfer it into your objects.
but now, as you know, if that isn't necessary- you have ORMs that do that job for you (EF is one, nHibernate is another, better, option).
so, in conclusion- I strongly recommend you reconsider your use of DataSets, as they go against the very basics of object orientation.
p.s.
sorry if this came across as a little emotional- I was talking from bitter personal experience.
I was pulling my hair out for 2 years because the app I was working on had used DataSets all over, and that meant that we had to duplicate the behavior for that data all over as well. uughh....
Is there a way I can generate a database schema from an Erlang application, like I can with Hibernate?
I assume you mean Mnesia, and if that is the case, you don't really understand the nature of the Mnesia database. It is by its very design and implementation "schemaless". You could write some really messy, ugly code that walked a Mnesia database and tried to document the various records in it, but that would be a pretty futile exercise. If you are storing records in Mnesia, you already have the "schema" in the .hrl files where the records are defined.
There's nothing like NHibernate for SQL databases in Erlang.
Check out SumoDB
Overview
sumo_db gives you a standard way to define your db schema, regardless of the db implementation (mongo, mysql, redis, elasticsearch, etc.).
Your entities encapsulate behavior in code (i.e. functions in a module) and state in a sumo:doc() implementation.
sumo is the main module. It translates between sumo's internal records and your own state.
Each store is managed by a worker pool of processes, each one using a module that implements sumo_store and calls the actual db driver (e.g: sumo_store_mnesia).
Some native domain events are supported; they are dispatched automatically through gen_event:notify/2 when an entity is created, updated, or deleted, as well as when a schema is created and when all entities of a given type are deleted. Events are described in this article.
Full conditional-logic support when using the find_by/2 and delete_by/2 functions. You can find more information about the syntax of these conditional-logic operators here.
Support for sorting (asc or desc) on multiple fields using the find_by/5 and find_all/4 functions. For example, [{age, desc}, {name, asc}] will sort descending by age and ascending by name.
Support for docs/models validations through sumo_changeset (check out the Changeset section).
If you are looking for a Java Hibernate-style object-to-SQL mapping framework in Erlang, you may have to write your own mapping module. One option is to map Erlang records to SQL. Any framework has to take care of the type mapping. Here is the link to Erlang's ODBC type mapping: http://erlang.org/doc/apps/odbc/databases.html#type
Erlang's ETS, and Mnesia (which is an extension of ETS), are very flexible and efficient for managing records. If these two databases cannot be your choice, you might have to implement the record mapping yourself.
Is LINQ a kind of Object-Relational Mapper?
LINQ in itself is a set of language extensions to aid querying, improve readability, and reduce code. LINQ to SQL is a kind of OR Mapper, but it isn't particularly powerful. The Entity Framework is often referred to as an OR Mapper, but it does quite a lot more.
There are several other LINQ to X implementations around, including LINQ to NHibernate and LINQ to LLBLGenPro that offer OR Mapping and supporting frameworks in a broadly similar fashion to the Entity Framework.
If you are just learning LINQ though, I'd recommend you stick to LINQ to Objects to get a feel for it, rather than diving into one of the more complicated flavours :-)
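To get that feel, a toy LINQ to Objects query over an in-memory list might look like this (Person is an illustrative type; no database or mapping is involved):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    class Demo
    {
        static void Main()
        {
            var people = new List<Person>
            {
                new Person { Name = "Ada", Age = 36 },
                new Person { Name = "Linus", Age = 12 },
            };

            // The same query syntax you would later use against a database provider:
            var adultNames = from p in people
                             where p.Age >= 18
                             orderby p.Name
                             select p.Name;

            foreach (var name in adultNames)
                Console.WriteLine(name);
        }
    }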
LINQ is not an ORM at all. LINQ is a way of querying "stuff", and can be more or less seen as a SQL-like language extension for different things (IEnumerables).
There are various types of "stuff" that can be queried, among them SQL Server databases. This is called LINQ-to-SQL. The way it works is that it generates (implicit) classes based on the structure of the DB and your query. In this sense it works much more like a code generator.
LINQ-to-SQL is not an ORM because it doesn't try at all to solve the object-relational impedance mismatch. In an ORM you design the classes and then either map them manually to tables or let the ORM generate the database. If you then change the database for whatever reason (typically refactoring, renormalization, denormalization), many times you are able to keep the classes as they are by changing the mapping.
LINQ-to-SQL does nothing of the sort. Your LINQ queries will be tightly coupled to the database structure. If you change the DB, you will probably have to change the LINQ as well.
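A hedged sketch of that coupling (table, column, and connection names are hypothetical): the entity mirrors the table one-to-one, so renaming the City column in the database breaks the query below.

    using System;
    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    [Table(Name = "Customers")]
    public class Customer
    {
        [Column(IsPrimaryKey = true)] public int Id;
        [Column] public string City;
    }

    class Demo
    {
        static void Main()
        {
            using (var db = new DataContext("connection string here"))
            {
                var customers = db.GetTable<Customer>();
                var inLondon = from c in customers
                               where c.City == "London" // tied directly to the column
                               select c;
                foreach (var c in inLondon)
                    Console.WriteLine(c.Id);
            }
        }
    }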
LINQ to SQL (part of Visual Studio 2008) is an OR Mapper.
LINQ is a new query language that can be used to query many different types of sources.
LINQ itself is not an ORM. LINQ is the set of language features and methods that allow you to query objects with a SQL-like syntax.
"LINQ to SQL" is a provider that allows us to use LINQ against SQL strongly-typed objects.
I think a good test to ascertain whether a platform or code block displays the characteristics of an O/R-M is simply:
With their solution hat on, do the developers (or their code generator) have any direct, unabstracted knowledge of what's inside the database?
With this criterion, the answer for differing LINQ implementations can be
Yes, knowledge of the database schema is entirely contained within the roll-your-own, LINQ-utilizing O/R-M code layer, or
No, knowledge of the database schema is scattered throughout the application.
Further, I'd extend this characterization to three simple levels of O/R-M.
1. Abandonment.
It's a small app w/ a couple of developers and the object/data model isn't that complex and doesn't change very often. The small dev team can stay on top of it.
2. Roll your own in the data access layer.
With some manageable refactoring in a data access layer, the desired O/R-M functionality can be effected in an intermediate layer by the relatively small dev team. Enough to keep the entire team on the same page.
3. Enterprise-level O/R-M specification defining/overhead introducing tools.
At some level of complexity, the need to keep all devs on the same page just swamps any overhead introduced by the formality. No need to reinvent the wheel at this level of complexity. NHibernate or the (rough) V1.0 Entity Framework are examples at this scale.
For a richer classification, from which I borrowed and simplified, see Ted Neward's classic post at
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
where he classifies O/R-M treatments (or abdications) as
1. Abandonment. Developers simply give up on objects entirely, and return to a programming model that doesn't create the object/relational impedance mismatch. While distasteful, in certain scenarios an object-oriented approach creates more overhead than it saves, and the ROI simply isn't there to justify the cost of creating a rich domain model. ([Fowler] talks about this to some depth.) This eliminates the problem quite neatly, because if there are no objects, there is no impedance mismatch.
2. Wholehearted acceptance. Developers simply give up on relational storage entirely, and use a storage model that fits the way their languages of choice look at the world. Object-storage systems, such as the db4o project, solve the problem neatly by storing objects directly to disk, eliminating many (but not all) of the aforementioned issues; there is no "second schema", for example, because the only schema used is that of the object definitions themselves. While many DBAs will faint dead away at the thought, in an increasingly service-oriented world, which eschews the idea of direct data access but instead requires all access go through the service gateway thus encapsulating the storage mechanism away from prying eyes, it becomes entirely feasible to imagine developers storing data in a form that's much easier for them to use, rather than DBAs.
3. Manual mapping. Developers simply accept that it's not such a hard problem to solve manually after all, and write straight relational-access code to return relations to the language, access the tuples, and populate objects as necessary. In many cases, this code might even be automatically generated by a tool examining database metadata, eliminating some of the principal criticism of this approach (that being, "It's too much code to write and maintain").
4. Acceptance of O/R-M limitations. Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems. Doing so carries its own fair share of risks, however, as developers using an O/R-M must be aware of any caching the O/R-M solution does within it, because the "raw" relational access will clearly not be able to take advantage of that caching layer.
5. Integration of relational concepts into the languages. Developers simply accept that this is a problem that should be solved by the language, not by a library or framework. For the last decade or more, the emphasis on solutions to the O/R problem have focused on trying to bring objects closer to the database, so that developers can focus exclusively on programming in a single paradigm (that paradigm being, of course, objects). Over the last several years, however, interest in "scripting" languages with far stronger set and list support, like Ruby, has sparked the idea that perhaps another solution is appropriate: bring relational concepts (which, at heart, are set-based) into mainstream programming languages, making it easier to bridge the gap between "sets" and "objects". Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such as functional/object hybrid languages like Scala or F#, as well as direct integration into traditional O-O languages, such as the LINQ project from Microsoft for C# and Visual Basic. One such effort that failed, unfortunately, was the SQL/J strategy; even there, the approach was limited, not seeking to incorporate sets into Java, but simply allow for embedded SQL calls to be preprocessed and translated into JDBC code by a translator.
6. Integration of relational concepts into frameworks. Developers simply accept that this problem is solvable, but only with a change of perspective. Instead of relying on language or library designers to solve this problem, developers take a different view of "objects" that is more relational in nature, building domain frameworks that are more directly built around relational constructs. For example, instead of creating a Person class that holds its instance data directly in fields inside the object, developers create a Person class that holds its instance data in a RowSet (Java) or DataSet (C#) instance, which can be assembled with other RowSets/DataSets into an easy-to-ship block of data for update against the database, or unpacked from the database into the individual objects.
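As a concrete (if minimal) illustration of option 6 above, here is a sketch of a class that keeps its state in a DataRow rather than in plain fields; the table and column names are made up:

    using System;
    using System.Data;

    public class Person
    {
        private readonly DataRow _row;

        public Person(DataRow row) { _row = row; }

        // State lives in the row, so a batch of Person rows can be shipped
        // to the database in one DataSet/DataTable update:
        public string Name
        {
            get { return (string)_row["Name"]; }
            set { _row["Name"] = value; }
        }

        public int Age
        {
            get { return (int)_row["Age"]; }
            set { _row["Age"] = value; }
        }
    }

    class Demo
    {
        static void Main()
        {
            var table = new DataTable("Person");
            table.Columns.Add("Name", typeof(string));
            table.Columns.Add("Age", typeof(int));

            var row = table.NewRow();
            var person = new Person(row);
            person.Name = "Ada";
            person.Age = 36;
            table.Rows.Add(row); // attach before persisting the table

            Console.WriteLine(person.Name);
        }
    }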
LINQ to SQL using the dbml designer: yes. Otherwise LINQ is just a set of extension methods over enumerables.
I know that there are a few (automatic) ways to create a data access layer to manipulate an existing database (LINQ to SQL, Hibernate, etc...). But I'm getting kind of tired (and I believe that there should be a better way of doing things) of stuff like:
Creating/altering tables in Visio
Using Visio's "Update Database" to create/alter the database
Importing the tables into a "LINQ to SQL classes" object
Changing the code accordingly
Compiling
What about a way to generate the database schema from the objects/entities definition? I can't seem to find good references for tools like this (and I would expect some kind of built-in support in at least some frameworks).
It would be perfect if I could just:
Change the object definition
Change the code that manipulates the object
Compile (the database changes are done auto-magically)
Check out DataObjects.Net - it is designed to support exactly this case. Code only, and nothing else. Its schema upgrade layer is probably the most fully featured one you can find, and it completely abstracts away the schema upgrade SQL.
Check out the product video - you'll notice nothing additional is done to sync the schema. The schema upgrade sample shows the intended usage of this feature.
You may be looking for an Object Database.
I believe this is the problem that the Microsoft Entity Framework is trying to address. Whilst not specifically designed to "Compile (the database changes are done auto-magically)", it does address the issue of handling changes to the domain model without a huge dependence on the underlying data model.
As Jason suggested, object db might be a good choice. Take a look at db4objects.
What you described is GORM. It is part of the Grails framework and is built to work with Hibernate (maybe JPA in the future). When I was first using Grails it seemed backwards. I was more comfortable with a Rails style workflow of making the tables and letting the framework generate scaffolding from the database schema. GORM persists your domain objects for you so you create and change the objects, it manages database create/update. This makes more sense now that I have gotten used to it. Sorry to tease you if you aren't looking for a new framework but it is on the roadmap for release 1.1 to make GORM available standalone.
When we built the first version of our own framework (Inon Datamanager) I had it read pre-existing SQL tables and autogenerate Java objects from them.
When my colleagues who came from a Smalltalkish background built the second version, they started from the objects and then autogenerated the tables.
Actually, they forgot about the SQL part altogether until I came back in and added it. But nowadays we just run a trigger on application startup which iterates over the object model, checks if the tables and all the right columns exist, and creates them if not. Very convenient.
This turned out to be a lot easier than you might expect - if your favourite tool doesn't support a similar process, you could probably write it in a couple of hours - assuming the relational to object mapping is relatively simple.
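As a rough sketch of what such a startup check could look like, assuming SQL Server, one table per class, and a naive property-to-column mapping (all names are placeholders; a real version would also diff and add missing columns):

    using System;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Reflection;

    static class SchemaSync
    {
        static string SqlType(Type t) =>
            t == typeof(int) ? "INT" :
            t == typeof(DateTime) ? "DATETIME2" :
            "NVARCHAR(255)"; // naive default, good enough for a sketch

        // conn must already be open; one table per type, one column per property.
        public static void EnsureTables(SqlConnection conn, params Type[] modelTypes)
        {
            foreach (var type in modelTypes)
            {
                var columns = string.Join(", ",
                    type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                        .Select(p => "[" + p.Name + "] " + SqlType(p.PropertyType)));

                // CREATE TABLE only if it does not already exist:
                var sql = "IF OBJECT_ID(N'[" + type.Name + "]', N'U') IS NULL " +
                          "CREATE TABLE [" + type.Name + "] (" + columns + ");";

                using (var cmd = new SqlCommand(sql, conn))
                    cmd.ExecuteNonQuery();
            }
        }
    }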
But the point is, it seems to depend on whether you're culturally an object person or a database person - you can regard either one as the authoritative source.
Some of the really big dogs, such as ERwin Data Modeler, will go object to DB. You need to have the big bucks to afford the product though.
I kept digging around some of the "major" frameworks and it seems that Django does exactly what I was talking about. Or so it seems from this screencast.
Does anyone have any remark to make about this? Does it work well?
Yes, Django works well.
yes, it will generate your SQL tables from your data model definitions (written in python)
It won't always alter existing tables if you update your structure; you might have to run an ALTER TABLE manually
Ruby on Rails has an even more advanced version of these features (Rails migrations), but I don't like the framework as much, I find ruby and rails pretty idiosyncratic
Kind of a late answer, but here it goes:
I faced the exact same problem and ended up writing my own solution for it, working with .NET and SQL Server only, however. It basically implements the process you describe:
All DB objects are kept as embedded CREATE scripts as part of the source code
DB Objects are set up automatically (or on request) when using the data access functionality
All non-table changes are also performed automatically (or on request) at the same time
Table changes, which may require special attention to migrate data, are performed via (manually created) change scripts, also upon upgrading the database
Even manual changes made to any database object can be detected, so that schema integrity can be verified and rectified
An optional lightweight ORM can map stored procedures and objects as well as result sets (even multiple)
A command-line application helps keeping the SQL source files in sync with a development database
The library, including the database, is free under an LGPL license.
http://code.google.com/p/bsn-modulestore/