Object databases are used very seldom, even though they offer a way to live without SQL, which is, I think, a benefit in its own right.
Yet I have almost never seen them in production systems. Is there something fundamentally wrong with object databases? Can I use an object database in a production system?
Edit: So, maybe I should confess that I love object-databases. I cannot really get my head around why they are not used a lot more often.
Sure you could, as long as it was stable. The problem is the relative lack of high quality Object Oriented DB systems, as well as the fact that most people don't even know what one is.
db4o is being used a lot by many Fortune 500 companies (especially for embedded applications), so I wouldn't say that OODBs are not used for real-world production systems.
There are production systems written using the GemStone OODB. It's a distributed, persistent Smalltalk system.
Heard of Caché? It is used by Epic Systems for their Electronic Health Record (EHR) product. Plenty of production shops are using it.
The real question here is whether you require tooling to support your database or not. By tooling I mean reporting, data migration, data mining, etc. Do you need to provide self service reports? Heck, even reports with a fast turnaround time that doesn't require deploying new code? (Providing report functionality in the application is a real drag.)
There are countless tools available to perform these operations against traditional RDBMSs. Against OODBs? I'm not familiar with any major products. Though, admittedly, I'm not an OODB kinda guy.
If you don't need those tools, then go for it. Otherwise, stick to traditional RDBMSs. With current ORM technology, the pain of mapping objects to records is much less than it used to be.
The problem, I believe, is that SQL isn't inherently a bad thing. It is very good at performing set-based operations. From what I've seen, object databases work well when working with individual objects, yet fail when trying to do set-based operations. Also, people are very good at working with SQL databases. It's easy to find people to work with them. Object databases are another story.
Over the past year I've done numerous projects with NoSQL JSON-based databases, the rich kind (not the key/value stores), such as CouchDB, MongoDB, and RavenDB. I often talk to fellow programmers about my adoption, and I notice that I'm always quick to add "of course SQL RDBMS systems still have a place; it's always about what's best for the particular project/task" as a little disclaimer, so as not to be seen as a Kool-Aid drinker. However, it's a pretty shallow statement. Outside of legacy projects that already have an investment in an RDBMS, or corporate mandates insisting on Oracle, I'm struggling to think of any future greenfield project for which I'd opt for a SQL database. It's CouchDB all the way for me as far as I can see, with rich map/reduce, the changes feed, replication support, and a RESTful API (sorry, that's starting to sound like a plug).
I'd like to hear from those who do "get" (beyond screencasts) NoSQL JSON map/reduce databases such as CouchDB: what types of projects do you think you'll use MS SQL, Oracle, PostgreSQL, etc. for in the future?
Thanks
One of the biggest strengths of SQL is that there is a standard way of modelling just about anything - for any given project it may not provide an optimal solution, but it does provide a reasonable one. This means that in a corporate environment you can decide that everything will be stored in Oracle to get the maintenance benefits of having a single system without the risk that it will be completely inappropriate for future projects.
This ability to handle different requirements without needing a lot of planning is also relevant at the start of projects where the design doesn't get signed off until six months into development - again, something that applies mostly in corporate development.
Using NoSQL properly generally requires better developers than you need for SQL development. One of the many SQL code samples available can be edited into a working system by someone who barely knows what they are doing. NoSQL does things like eliminating integrity checks for performance - a good developer produces well tested code that doesn't insert invalid records or understands why a few invalid records don't matter for a particular app. A bad developer thinks anything without error messages is good code. The average developer at a successful web startup can probably handle it. The average developer maintaining internal corporate apps probably can't.
On my own projects where I have complete control of the platform choice and know both the requirements and who will do the development, NoSQL is as good a complete solution as you suggest. Nothing really matters except the technical advantages - easier to code, easier to maintain, better performance, horizontal scaling.
In the corporate environment it's the other realities that dominate - incompetent developers, unknown/changing requirements, long application life cycles, the need to minimize the number of systems - NoSQL wasn't designed for that problem set.
Off-hand, I'd simply say that if the data is inherently relational, an RDBMS is the optimal choice. For instance, an order management system is a good fit for an RDBMS. If the data is not inherently relational (like Google's search index, for example), then NoSQL is a better choice. They both have their place.
My latest application is massively hierarchical/relational with objects containing sub-objects containing sub-objects, all related by natural keys. It's a perfect fit for an RDBMS, and would have been much trickier in a NoSQL DB.
With computer programming evangelists predicting a very bright future for cloud computing, is there a chance that relational databases are on their way out?
What are the DBs that are more suitable for Cloud Computing?
Here's a good article that may answer some of your questions. It features a good comparison between RDBMS systems and the ones usually used for cloud storage infrastructure:
http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
The relational database model has a firm mathematical basis in relational algebra. This makes it easy to reason about, to extend, and to use properly (in theory). Even if database access patterns change significantly as a result of these new APIs and uses, it's likely that a relational database will form the underlying implementation for this reason.
No, RDBMSs will always have a place because of their functionality. Not just on their own, but also as backbones to other systems (like OODBMSs).
Relational databases are still relevant, both for localized storage (such as application-specific storage) and for server storage.
The cloud computing platforms that I've seen each have a relational database offering. So, I don't see cloud computing really changing the picture in reference to database types being used.
However, something will eventually replace the databases that we're all used to. The question is whether that will be a higher-level version of RDBs or something different. Another aspect of that question is how long will it take for the current crop of RDBs to fade out? (I don't have an answer for either.)
Clouds go poof still these days, so I don't think so anytime soon.
I don't think that cloud computing will kill RDBMSs. Something else might though.
First, what type of storage engine a given application uses does not (or should not) depend on where it is running (the cloud or a specific server), but rather on how it needs to store the data.
Second, as far as I can tell the only reason people think RDBMSs are on their way out is because they don't scale as well as non-relational DBMSs (such as document-oriented DBMSs like CouchDB) which can more easily be distributed into the cloud. However, there is no reason that RDBMSs cannot become more cloud-friendly in the future. As an early example, look at Drizzle:
The Drizzle project is building a database optimized for Cloud and Net applications. It is being designed for massive concurrency on modern multi-cpu/core architecture.
So no, I don't think that cloud computing will kill RDBMSs. They will just be forced to adapt. What might kill them, however, is if an existing alternative, or a new one, becomes as robust and easy to use as RDBMSs. What I mean is a solution that has both completely solid software (betas not allowed) and is easy for programmers to switch to. They give out degrees to people who understand RDBMSs. Because of all the assisting software (such as ORMs like ActiveRecord, SQLAlchemy, and whatever the .NET folk use I'm assuming), using RDBMSs has become easy even for people who don't know what the first normal form is. So I think that until there is a way for people to use (for instance) a DODBMS just as easily, RDBMSs will continue to dominate. I'm also not saying that is necessarily bad. Again, which DBMS you use should depend on your data, not what people say is cool and better.
A quote from the article:
"The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don't exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in a properly designed relational database usually don't lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues."
What this means to me is that RDBMS's are doomed, and hotshot new technologies are facing a great and brilliant future, to the same extent that users aren't anywhere near interested in the correctness of their data.
IMHO.
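To make the quoted point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the CHECK rule, and the `try_insert` helper are made up for illustration, and a plain dict stands in for a key/value store; real key/value products differ in what validation they offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL CHECK (amount > 0)
    )
    """
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")


def try_insert(customer_id, amount):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_ok = try_insert(1, 25.0)     # existing customer, positive amount
orphan_ok = try_insert(99, 25.0)   # there is no customer 99
negative_ok = try_insert(1, -5.0)  # violates the CHECK constraint

# A dict standing in for a key/value store: the same bad record is accepted,
# so any validation has to live in application code.
kv_store = {}
kv_store["order:2"] = {"customer_id": 99, "amount": -5.0}
```

Only the first insert gets through; the buggy ones are stopped by the database itself rather than by whatever the application remembered to check.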
There's nothing wrong with relational databases for applications that need to query more structured data (e.g., "How many people bought product XYZ, on this date, paid more than $100, but less than $150?"). There are potentially significant architectural issues that will need to be addressed as these systems scale and grow, though. Once your DB outgrows the one machine you started on and/or traffic/requests begin to overload available resources, then (if you still want to keep your relational database) you have to start adding layers. Thankfully, there are many more options available today than in previous years, including caching, map and reduce, and other functionality, but these add-on layers do add complexity and maintenance overhead.
In one sense I'd consider these engineered "band-aids", which will most likely solve the scalability and distribution problems with a relational DB today, but longer term? Who knows. I also see that these popular layers are all basically trying to emulate functionality already available in object DBs, giving developers a "virtual object DB" layer that they can use with their object languages to do things faster and more efficiently, and to get past the growth and performance obstacles.
So I guess my overall opinion is that relational DBs became the de facto DB probably mostly due to how (relatively) easy it was to query a database and get results back to the one client/app using it. As volumes have grown, though, and application complexity is exponentially greater today, I think more developers will decide to bite the bullet, learn the syntax for object DBs (which is actually about as standardized today as that of relational DBs), and just skip all the middleware and layers that only emulate functionality that one could get natively in an OODBMS. I've seen OODBs that simply get installed on any number of servers, automatically distribute data as needed, and give the developer a single view of any size federation of databases...
It seems to me that the best solution, as systems become more distributed, is to get a DB that has a native distributed architecture. Anyway, just a thought.
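For what it's worth, the kind of structured question quoted in this answer ("How many people bought product XYZ, on this date, paid more than $100, but less than $150?") maps almost word for word onto SQL. A small sketch with Python's built-in sqlite3 module; the `purchases` table and its rows are invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, product TEXT, purchased_on TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "XYZ", "2009-02-12", 120.0),
        ("bob", "XYZ", "2009-02-12", 99.0),     # too cheap
        ("carol", "XYZ", "2009-02-12", 149.5),
        ("dave", "XYZ", "2009-02-13", 130.0),   # wrong date
        ("erin", "ABC", "2009-02-12", 125.0),   # wrong product
        ("alice", "XYZ", "2009-02-12", 110.0),  # same person, counted once
    ],
)

# Each clause of the English question becomes one condition in the WHERE clause.
(buyers,) = conn.execute(
    """
    SELECT COUNT(DISTINCT customer)
    FROM purchases
    WHERE product = ?
      AND purchased_on = ?
      AND price > ? AND price < ?
    """,
    ("XYZ", "2009-02-12", 100, 150),
).fetchone()
```

With this sample data `buyers` comes out as 2 (alice and carol).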
In my experience, this has been a contentious issue between "backend" (database developer) and "frontend" guys (application developer, client and server side).
There have been many heated pub discussions on this subject.
I just want to know whether it is just that people have different mindsets, or are too lazy to learn more and feel comfortable with what they know, or something else.
I might re-phrase the question: why do (some) application developers think they can do "database stuff" without actually bothering to understand it properly? Whereas database developers do not (in general) assume they can write a good application without some training and experience!
It is about levels of abstraction. A database is the lowest level of abstraction in a typical business application (software-wise). It is much more likely that a developer working on an outer layer of the abstraction would have knowledge of an inner layer than a developer in an inner layer would know about the outer layer.
This is because inner layers of abstraction perform best when they are ignorant of the outer layers that depend on them.
So a designer in the presentation layer of a website may know a bit about the server-side code they depend on because they interact with it. But the developer working on the server does not need to know anything about design at all.
I would say it's on a need-to-know basis. Application developers often need to know how to connect to databases, add records, delete records, etc. This is taken further with new technologies such as LINQ, where developers can write database queries within their actual code.
Database developers on the other hand only really need to know how to write database queries as that is their job and probably won't need to worry about the code at application level.
Because programmers very often must understand and interact with databases to do their job, but DBAs very often don't need to do any programming (outside of the DBMS) to do their jobs.
I believe it stems from the fact that programming in SQL looks easy, and to get started you only need a small amount of knowledge (really, for a programmer, learning SELECT * FROM Table is pretty easy). Application programming is not the same. It becomes very complex in a short amount of time, and that discourages a lot of people. Now, I am not saying that database people are any less intelligent; it is just that what they do looks easier than building applications.
If you develop applications, chances are that sooner or later you'll have to connect an app to a back end.
The opposite is not as true.
I think it stems from necessity. If you consider the roles of each person, a programmer needs to do database-related stuff far more than database workers need to do programming tasks.
From my experience, having developed both "databases" and "applications" (following your nomenclature...), I guess there's a big difference in state management.
Properly designed databases are always in a "clean" state, and every transaction keeps this consistency. So when developing a database, you have to very clearly specify your data abstractions into tables and which updates are legal and so on.
I've found that most application developers (myself included :)) do a very sloppy job in keeping this consistent state in the application. Any non-trivial interface has many more possible states to manage than a modest database, and it's not as easy to make sure it's always in a clean state. It's also harder to analyze every possible sequence of steps that users will perform.
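As a rough illustration of "every transaction keeps this consistency", here is a small sketch with Python's built-in sqlite3 module. The `accounts` table, the non-negative balance rule, and the `transfer` helper are invented for the example; the point is only that a failed transaction leaves the data exactly as it was.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])


def transfer(src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this second update breaks the CHECK rule, the first is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer("a", "b", 30)         # legal: a has enough money
rejected = transfer("a", "b", 500)  # illegal: would overdraw a, so nothing changes
```

After both calls the balances are 70 and 30: the rejected transfer did not leave a half-applied credit behind. Getting the same guarantee for in-memory application state usually takes deliberate design.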
From my experience, the application developers don't do all the database stuff. Consider all the administration that is related to the database: backups, replication, etc.
A typical DBA (at least on most of the projects I've been involved in) takes care of everything that is related to project databases: handles all administration, cooperates with application developers on performance tuning, gives advice about the SQL used by the app, does some of the stored proc coding, creates (or at least reviews and consults on) physical DB designs, etc.
So, aren't the database guys "lazy", or "fine with what they already know" just from an application developer's perspective? I'm an app developer myself and there is a whole lot of things that I just don't know about the DBs we're using on our projects.
Part of my education ensured I got a decent understanding of how Databases work. I went into the field expecting to do database work, and a lot of it. I'm a web app guy; it comes with the territory I guess.
My two jobs as a developer have been at two shops that would best be described as tiny (2 people myself included, and then just me) and tiny (3 developers, briefly having a fourth). I have not observed an immediate business need for, nor worked anywhere that had the resources to employ a dedicated DB guy. I can envision some scenarios where that would change (including a new job :P).
As to the rest, I agree that abstraction is also a factor, and as developers we're way up on top/outside looking in. I can't imagine doing web app development without DB skills, and I consider SQL/DB management to be both an important tool and an area I need to stay sharp in.
I'll add that I treat the database side as its own field. There are skills that translate between the two, but there's a lot of specialized knowledge I need to acquire to get better at it, and being a good programmer doesn't necessarily mean I'm doing a good job on the back end either (fortunately, I'm not a good programmer ;) ). Also, I'm pretty sure that's what she said.
2 reasons:
DB vendors facilitate bad SQL, and
SQL is hidden from view while application UI is front and center.
Most naive developers think SQL is a procedural language and write it as such because vendors ensure that the tools exist so that they can do so. DBAs know that good SQL is set-oriented and has optimization principles that are totally different from those involved in application programming.
The visibility aspect makes it so the application developers can write bad SQL against a database and get it to perform in a marginal way, and no one ever sees quite how bad it is. When a DBA writes an application, there are immediate critiques on its appearance and behavior because it's directly visible to the end user.
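A small sketch of the difference between procedural and set-oriented SQL, using Python's built-in sqlite3 module. The `products` table and the function names are made up for illustration. Both functions end with the same data, but the first makes one round trip per row while the second hands the whole job to the database in a single statement that its optimizer can plan.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with a few sample products."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products (category, price) VALUES (?, ?)",
        [("book", 10.0), ("book", 20.0), ("toy", 5.0)],
    )
    return conn


def discount_row_by_row(conn):
    """Procedural style: fetch the rows, then issue one UPDATE per row."""
    rows = conn.execute(
        "SELECT id, price FROM products WHERE category = 'book'"
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?",
            (price * 0.5, product_id),
        )


def discount_set_based(conn):
    """Set-oriented style: one statement describes the whole change."""
    conn.execute("UPDATE products SET price = price * 0.5 WHERE category = 'book'")


def prices(conn):
    """Return all prices ordered by id."""
    return [price for (price,) in conn.execute("SELECT price FROM products ORDER BY id")]


procedural_db = make_db()
discount_row_by_row(procedural_db)

set_based_db = make_db()
discount_set_based(set_based_db)
```

On three rows nobody will notice the difference, which is exactly the visibility problem described above: the row-by-row version only shows how bad it is once the table gets large.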
Good question. Developers do database stuff because, where there are no dedicated database guys, the developers have to do it. But a company that has database guys also has development guys.
:) What is your idea?
My boss asks me to write only ANSI SQL to make our code database independent. But I have learned that it is not that easy, as no database is fully ANSI SQL compatible. SQL code can rarely be ported between database systems without modifications.
I have seen people take different approaches to make their programs database independent. For example:
Externalize SQL statements to resource files.
Write many provider classes to support different databases.
Write only simple SQL, and keep away from advanced functions/joins.
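The first approach in the list above can be sketched roughly as follows. The `STATEMENTS` catalog, the dialect names, and the `sql_for` helper are hypothetical; in a real project the mapping would more likely be loaded from resource files than written inline. Row limiting is used as the example because it is a well-known place where dialects differ: `LIMIT` in SQLite, MySQL and PostgreSQL, `TOP` in SQL Server, and `FETCH FIRST ... ROWS ONLY` in Oracle 12c and later. Only the SQLite variant is actually executed here.

```python
import sqlite3

# Hypothetical statement catalog keyed by (statement name, dialect).
STATEMENTS = {
    ("newest_users", "sqlite"): (
        "SELECT name FROM users ORDER BY created DESC LIMIT 2"
    ),
    ("newest_users", "sqlserver"): (
        "SELECT TOP 2 name FROM users ORDER BY created DESC"
    ),
    ("newest_users", "oracle"): (
        "SELECT name FROM users ORDER BY created DESC FETCH FIRST 2 ROWS ONLY"
    ),
}


def sql_for(name, dialect):
    """Look up the dialect-specific text of a named statement."""
    return STATEMENTS[(name, dialect)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", "2009-01-01"), ("bob", "2009-03-01"), ("cy", "2009-02-01")],
)

# Application code asks for a statement by name and never embeds dialect details.
newest = [name for (name,) in conn.execute(sql_for("newest_users", "sqlite"))]
```

The application code stays the same across databases; the cost is that every statement has to be written and tested once per supported dialect.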
Do you always write your code "any database ready"? Or do it only if needed?
If yes, how do you achieve it?
You could use one of the many Object/Relational Mapper tools, like Hibernate/NHibernate, LLBLGen, and so forth. That can get you a long way to database portability. No matter what you do, you need to have some abstraction layer between your model and the rest of your code. This doesn't mean you need some sort of dependency injection infrastructure, but good OO design can get you pretty far. Also, sticking with plain SQL and thinking that will get you portability is rather naive. That would be true if your application was trivial and only used very trivial queries.
As for always writing an application to be "any database ready," I usually use some sort of abstraction layer so it is not hard to move from one database system to another. However, in many circumstances this is not required: you are developing for the Oracle platform or SQL Server or MySQL or whatever, so you shouldn't sacrifice the benefits of your chosen RDBMS just for the possibility of an entirely seamless transition. Nevertheless, if you build a good abstraction layer, even code targeting a specific RDBMS won't necessarily be too difficult to migrate to a different RDBMS.
To decouple the database engine from your application, use a database abstraction layer (also data access layer, or DAL). You didn't mention what language you use, but there are good database abstraction libraries for all the major languages.
However, by avoiding database-specific optimizations you will be missing out on the advantages of your particular brand. I usually abstract what's possible and use what's available. Changing database engines is a major decision and doesn't happen often, and it's best to use the tools you have available to the max.
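A minimal sketch of what such a data access layer might look like in Python. All names here (`UserStore`, `SqliteUserStore`, `InMemoryUserStore`, `greet`) are invented for illustration; a real project would more likely use an existing abstraction library. The application code depends only on the small interface, so engine-specific SQL and optimizations stay inside one class.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The only surface the rest of the application is allowed to see."""

    def add(self, user_id: int, name: str) -> None: ...

    def get_name(self, user_id: int) -> Optional[str]: ...


class SqliteUserStore:
    """Engine-specific implementation; any SQLite-only tricks would live here."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, user_id: int, name: str) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )

    def get_name(self, user_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryUserStore:
    """A second backend, standing in for 'some other database engine'."""

    def __init__(self) -> None:
        self._users = {}

    def add(self, user_id: int, name: str) -> None:
        self._users[user_id] = name

    def get_name(self, user_id: int) -> Optional[str]:
        return self._users.get(user_id)


def greet(store: UserStore, user_id: int) -> str:
    """Application code: knows nothing about which engine is behind the store."""
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"


stores = [SqliteUserStore(), InMemoryUserStore()]
for store in stores:
    store.add(1, "Ada")
greetings = [greet(store, 1) for store in stores]
```

Swapping engines then means writing one new class rather than hunting through the application for SQL strings, while each implementation remains free to use whatever its engine does best.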
Tell your boss to mind his own business. No, of course one can't say such things to one's boss, but stay tuned.
What's interesting is what business value is supposed to be supported by this requirement. One obvious candidate seems to be that the database code should be ready for working on other database engines than the current. If that's the case then that's what should be stated in the requirement.
From there it's up to you as an engineer to figure out the different ways to achieve that. One might be writing ANSI SQL. One might be using a database abstraction layer.
Further it's your responsibility to inform your boss what the costs of the different alternatives are (in terms of performance, speed of development, etcetera).
"Write ANSI SQL"... gah!
Just for the record, there is a similar question here on Stack Overflow:
Database design for database-agnostic applications
Being 100% compliant to ANSI SQL is a difficult goal to meet, and yet it doesn't guarantee portability anyway. So it's an artificial goal.
Presumably your boss is asking for this in order to make it easy and quick to switch database brands for some hypothetical purpose in the future (which actually may never come). But he's trading that future efficiency for a greater amount of work now, since it's harder to make the code database-neutral.*
So if you can phrase the problem in terms of the goals your manager should be focusing on, like finishing the current project phase on time and on budget, it may be more effective than just telling him it's too difficult to make the code database-neutral.
There is one scenario in which you need to write truly database-neutral code: when you are developing a shrink-wrap application that is required to support multiple brands of database.
Anyway, even if you currently support only one brand, there are certainly cases where you have a choice to code some SQL using proprietary features, but there also exists a more portable way to achieve the same result. You can treat these as "low-hanging fruit" cases, and you can make it easier to port the code in the future if the need arises. But don't limit yourself either; use proprietary solutions if they give good value. Perhaps add a note in the comments that this deserves review if/when you need to make a port.
* I prefer the word "neutral" instead of "agnostic" when talking about platform-independence. It avoids the religious overtone. :-)
There is a lot of information out there on object-relational mappers and how to best avoid impedance mismatch, all of which seem to be moot points if one were to use an object database. My question is why isn't this used more frequently? Is it because of performance reasons or because object databases cause your data to become proprietary to your application or is it due to something else?
Familiarity. The administrators of databases know relational concepts; object ones, not so much.
Performance. Relational databases have been proven to scale far better.
Maturity. SQL is a powerful, long-developed language.
Vendor support. You can pick between many more first-party (SQL servers) and third-party (administrative interfaces, mappings and other kinds of integration) tools than is the case with OODBMSs.
Naturally, the object-oriented model is more familiar to the developer and, as you point out, would spare one the ORM. But thus far, the relational model has proven to be the more workable option.
See also the recent question, Object Orientated vs Relational Databases.
I've been using db4o which is an OODB and it solves most of the cons listed:
Familiarity - Programmers know their language better than SQL (see Native queries)
Performance - this one is highly subjective but you can take a look at PolePosition
Vendor support and maturity - can change over time
Cannot be used by programs that don't also use the same framework - There are OODB standards and you can use different frameworks
Versioning is probably a bit of a bitch - Versioning is actually easier!
The pros I'm interested in are:
Native queries - Db4o lets you write queries in your statically typed language, so you don't have to worry about mistyping a string and finding data missing at runtime.
Ease of use - Defining business logic in the domain layer, the persistence layer (mapping) and finally the SQL database is certainly a violation of DRY. With an OODB you define your domain where it belongs.
I agree - OODBs have a long way to go, but they are going. And there are domain problems out there that are better solved by an OODB.
One objection to object databases is that it creates a tight coupling between the data and your code. For certain apps this may be OK, but not for others. One nice thing that a relational database gives you is the possibility to put many views on your data.
Ted Neward explains this and a lot more about OODBMSs a lot better than this.
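To illustrate the "many views on your data" point, here is a small sketch with Python's built-in sqlite3 module. The `orders` table and both view names are invented for the example; the idea is that different consumers can each get their own shape of the same stored rows without any of them depending on the application's object model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("alice", "north", 100.0), ("bob", "south", 40.0), ("alice", "north", 60.0)],
)

# One view for a reporting tool that cares about revenue per region...
conn.execute(
    "CREATE VIEW revenue_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)
# ...and another for a tool that only cares about activity per customer.
conn.execute(
    "CREATE VIEW orders_per_customer AS "
    "SELECT customer, COUNT(*) AS order_count FROM orders GROUP BY customer"
)

by_region = dict(conn.execute("SELECT region, total FROM revenue_by_region"))
per_customer = dict(conn.execute("SELECT customer, order_count FROM orders_per_customer"))
```

Neither view required touching the code that writes the orders, which is the loose coupling this answer is describing.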
It has nothing to do with performance. That is to say, basically all applications would perform better with an OODB. But that would also put lots of DBAs out of work or force them to learn a new technology. Even more people would be out of work correcting errors in the data. That's unlikely to make OODBs popular with established companies. Gavin seems to be totally clueless; a better link would be Kirk.
Cons:
Cannot be used by programs that don't also use the same framework for accessing the data store, making it more difficult to use across the enterprise.
Fewer resources available online for non-SQL-based databases.
No compatibility across database types (can't swap to a different db provider without changing all the code).
Versioning is probably a bit of a bitch. I'd guess adding a new property to an object isn't quite as easy as adding a new column to a table.
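For reference, the relational half of that last comparison looks roughly like this with Python's built-in sqlite3 module (the `people` table and the default value are made up for illustration). This sketch says nothing about how hard or easy the equivalent change is in any particular object database; it only shows what "adding a new column to a table" involves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO people (name) VALUES ('Ada')")

# Schema change: existing rows pick up the default, no data migration code needed.
conn.execute("ALTER TABLE people ADD COLUMN email TEXT DEFAULT 'unknown'")
conn.execute("INSERT INTO people (name, email) VALUES ('Grace', 'grace@example.com')")

rows = conn.execute("SELECT name, email FROM people ORDER BY id").fetchall()
```

The row inserted before the change reads back with the default email, and the new row carries its own value.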
Sören, all of the reasons you stated are valid, but I see the problem with OODBMSs as being the logical data model. The object model (or rather the network model of the 70s) is not as simple as the relational one, and is therefore inferior.
jodonnel, I don't see how the use of object databases couples application code to the data. You can still abstract your application from the OODB by using a Repository pattern, and replace it with an ORM-backed SQL database if you design things properly.
For an OO application, an OO database will provide a more natural fit for persisting objects.
What's probably true is that you tie your data to your domain model, but then that's the crux!
Wouldn't it be good to have a single way of looking at data, business rules and processes alike, using a domain-centric view?
So, a big pro is that an OODB matches how most modern, enterprise-level object-orientated software applications are designed; there is no extra effort to design a data layer using a different (relational) design. Cheaper to build and maintain, and in many cases generally higher performance.
Cons: just a general lack of maturity and adoption, I reckon...