Related
Just thinking that a relational db with an ORM is in many ways very similar to an object oriented database. My experience lies solely with RDMS with a hint of ORM, so it seems to me that object oriented databases are very similar but without the experience I can't say for sure.
If you have used object oriented databases and ORMs can you compare them? What are the weak points associcated with object oriented databases compared to RBMS+ORM?
What are the weak points associated with object oriented databases compared to RBMS+ORM?
The biggest weakness is the lack of standardization: no standard API, no standard query language (the OQL attempt has been a big failure) and thus the lack of portability and interoperable tools (for backup, archiving, migration, etc). You don't want that when it comes to data.
This explains IMO why OODBMS are a failure from an adoption point of view and why RDBMS will stay around for a while, regardless of the NoSQL movement (I have the feeling that OODBMS vendors see the NoSQL movement as an opportunity for a come back after some rebranding of their products).
My experience:
RDBMS:
Frankly, I don't like working with SQL, a 15-year old language, but the reality is that you're forced to if you want anything usable, such as bulk inserts (the LINQ-to-Entity ORM framework does not support bulk inserts, so it takes 30 seconds for a 20,000 record insert, compared to 500ms for a bulk insert in SQL). You end up having to use ADO.NET for database insertions.
There are some advantages of an RDBMS over an object database, namely, the data is independent of the calling application (however, this is a weakness also, as the mapping layer makes everything slower, more complex, and more brittle).
Bottom line: Spent 6 weeks with the LINQ-to-Entity ORM framework and Microsoft SQL Server 2008 R2. Reasonably steep learning curve.
Object databases: Tried an object database, namely, the free, open source db4o.
Discovered that I could persist my objects with one line of code.
No schema changes to deal with, it just worked.
Instant support for POCO (Plain Old Class Objects). Create your class in code, then persist it. Its possible to do the same with the Entity framework, however, its a lot of work with manual mapping, and it breaks very easy.
Turn on transparent persistence, and it has automatic lazy loading in the background - no more checking for objects that are not loaded because of lazy loading in the Entity framework.
The performance of object databases is also impressive: if you have 10 million rows in an object database, and indexing turned on, its 16ms for a three-column select. That's decent.
Bottom line: After 1 week I had the same solution as using a RDBMS for persistence, however, it was much cleaner, much less code, and more maintainable - and if I really wanted, I could use use a service to synchronize the db4o database with MSSQL.
For really large systems in the enterprise world, consisting of say 250 million rows in a table, things like sharding and options such as clustered indexing vs. non-clustered indexing become important for performance. In this case, db4o wouldn't work, it may require something more enterprisey. However, if you a trivially easy method of persisting objects, then an object database will fit the bill. All the work of learning SQL, setting up the mapping in the ORM, dealing with installation of MSSQL, implementing your own bulk loading procedures, etc just disappears, leaving you with clean, elegant 100% managed code.
I suspect that one of the reasons that vendors have not embraced object databases as that the database market is worth $3 billion per year, and there is no reason to kill the cash cow just yet.
Disclaimer: I have no affiliation with either Microsoft or db4o.
Chris Date agrees:
... 'object/relational' system would
be nothing more nor less than a true
relational system ... A proper
object/relational system is just a
relational system with proper type
support ... which just means it's a
proper relational system, no more and
no less.
SQL and Relational Theory: How to Write Accurate SQL Code, p 36
I would like to understand database scalability so I've just heard a talk about Habits of Highly Scalable Web Applications
http://techportal.inviqa.com/2010/03/02/habits-of-highly-scalable-web-applications/
On it, the presenter mainly talk about relational database scalability.
I also have read something about MapReduce and Column oriented tables, big tables, hypertable etc... trying to understand which are the most up to date methods to scale web application data. But the second group, to me, is being hard to understand where it fits.
It serves as transactional, reliable data store? or not, its just for large access and processing and to handle fine graned operations we will ever need to rely on RDBMSs?
Could someone give a comprehensive landscape for those new technologies and how to use it?
Basically it's about using the right tool for the job. Relational databases have been around for decades, which means they are very good at solving the problems that haven't changed in that time - things like keeping track of sales for example. Although they have become the default data store for just about everything, they are not so good at handling the problems that didn't exist twenty years ago - particularly scalability and data without a clearly defined, unchanging schema.
NOSQL is a class of tools designed to solve the problems that are not perfectly suited to relational databases. Scalability is the best known, though unlikely to be a relevant to most developers. I think the other key use case that we don't see so much of yet is for small projects that don't need to worry about the data storage characteristics at all, and can just use the default - being able to skip database design, ORM and database maintenance is quite attractive.
For Ecommerce specifically you're probably better off using sql at least in part - You might use NOSQL for product details or a recommendation engine, but you want your sales data in an easily queried sql table.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.
While computer programming evangelists predicting the future of Cloud Computing to be very bright, is there a chance for relational databases to be on their way out?
What are the DBs that are more suitable for Cloud Computing?
Here's a good article that may answer some of your questions. It features a good comparison between RDBMS systems and the ones usually used for cloud storage infrastructure:
http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
The relational database model has a firm mathematical basis in relational algebra. This makes it easy to reason about, to extend, and to use properly (in theory). Even if database access patterns change significantly as a result of these new APIs and uses, it's likely that a relational database will form the underlying implementation for this reason.
No, RDBMSs will always have a place because of their functionality. Not just on their own, but also as backbones to other systems (like OODBMSs).
Relational databases are still relevant, both for localized storage (such as application-specific storage) and for server storage.
The cloud computing platforms that I've seen each have a relational database offering. So, I don't see cloud computing really changing the picture in reference to database types being used.
However, something will eventually replace the databases that we're all used to. The question is whether that will be a higher-level version of RDBs or something different. Another aspect of that question is how long will it take for the current crop of RDBs to fade out? (I don't have an answer for either.)
Clouds go poof still these days, so I don't think so anytime soon.
I don't think that cloud computing will kill RDBMSs. Something else might though.
First, what type of storage engine a given application uses does not (or should not) depend on where it is running (the cloud or a specific server), but rather on how it needs to store the data.
Second, as far as I can tell the only reason people think RDBMSs are on their way out is because they don't scale as well as non-relational DBMSs (such as document-oriented DBMSs like CouchDB) which can more easily be distributed into the cloud. However, there is no reason that RDBMSs cannot become more cloud-friendly in the future. As an early example, look at Drizzle:
The Drizzle project is building a database optimized for Cloud and Net applications. It is being designed for massive concurrency on modern multi-cpu/core architecture.
So no, I don't think that cloud computing will kill RDBMSs. They will just be forced to adapt. What might kill them, however, is if an existing alternative, or a new one, becomes as robust and easy to use as RDBMSs. What I mean is a solution that has both completely solid software (betas not allowed) and is easy for programmers to switch to. They give out degrees to people who understand RDBMSs. Because of all the assisting software (such as ORMs like ActiveRecord, SQLAlchemy, and whatever the .NET folk use I'm assuming), using RDBMSs has become easy even for people who don't know what the first normal form is. So I think that until there is a way for people to use (for instance) a DODBMS just as easily, RDBMSs will continue to dominate. I'm also not saying that is necessarily bad. Again, which DBMS you use should depend on your data, not what people say is cool and better.
A quote from the article :
"The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don't exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in a properly designed relational database usually don't lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues."
What this means to me is that RDBMS's are doomed, and hotshot new technologies are facing a great and brilliant future, to the same extent that users aren't anywhere near interested in the correctness of their data.
IMHO.
There's nothing wrong with relational databases for applications that need to query more structured data (e.g., "How many people bought product XYZ, on this date, paid more than $100, but less than $150?"). There are potentially significant architectural issues that will need to be addressed as these systems scale and grow. Once your DB outgrows the one machine you started on and/or traffic/requests begin to overload available resources, then (if you still want to keep your relational database) you have to start adding layers. Thankfully today, there are many more options available then in previous years... including caching, map and reduce, and other functionality - but these add-on layers do add complexity and maintenance overhead. In one sense I'd consider these engineered "band-aids" which will most likely solve the scalability and distribution problems with a relational DB today, but longer term? Who knows. I also see these popular layers today - all of which are basically trying to emulate functionality already available in object DBs, giving developers a "virtual object DB" layer that they can use with their object languages to do things faster and more efficiently, and get past the growth and performance obstacles. So I guess my overall opinion is, relational DBs became the defacto DB probably mostly due to how (relatively) easy it was to query a database, and get results back to the one client/app using it. As volumes have grown though, and application complexity is exponentially greater today, I think more developers will decide to bite the bullet, learn the syntax for object DBs (which is actually about as standardized today as relational DBs), and just skip all the middleware and layers that only emulate functionality that one could get natively in an OODBMS. I've seen OODBs that simply get installed on any number of servers, and automatically distributing data as needed, and giving the developer a single view of any size federation of databases... Seems to me the best solution as systems become more distributed, to get a DB that can has native distributed architecture. Anyway, just a thought.
There is a lot of information out there on object-relational mappers and how to best avoid impedance mismatch, all of which seem to be moot points if one were to use an object database. My question is why isn't this used more frequently? Is it because of performance reasons or because object databases cause your data to become proprietary to your application or is it due to something else?
Familiarity. The administrators of databases know relational concepts; object ones, not so much.
Performance. Relational databases have been proven to scale far better.
Maturity. SQL is a powerful, long-developed language.
Vendor support. You can pick between many more first-party (SQL servers) and third-party (administrative interfaces, mappings and other kinds of integration) tools than is the case with OODBMSs.
Naturally, the object-oriented model is more familiar to the developer, and, as you point out, would spare one of ORM. But thus far, the relational model has proven to be the more workable option.
See also the recent question, Object Orientated vs Relational Databases.
I've been using db4o which is an OODB and it solves most of the cons listed:
Familiarity - Programmers know their language better then SQL (see Native queries)
Performance - this one is highly subjective but you can take a look at PolePosition
Vendor support and maturity - can change over time
Cannot be used by programs that don't also use the same framework - There are OODB standards and you can use different frameworks
Versioning is probably a bit of a bitch - Versioning is actually easier!
The pros I'm interested in are:
Native queries - Db4o lets you write queries in your static typed language so you don't have to worry about mistyping a string and finding data missing at runtime,
Ease of use - Defining buissiness logic in the domain layer, persistence layer (mapping) and finally the SQL database is certainly violation of DRY. With OODB you define your domain where it belongs.
I agree - OODB have a long way to go but they are going. And there are domain problems out there that are better solved by OODB,
One objection to object databases is that it creates a tight coupling between the data and your code. For certain apps this may be OK, but not for others. One nice thing that a relational database gives you is the possibility to put many views on your data.
Ted Neward explains this and a lot more about OODBMSs a lot better than this.
It has nothing to do with performance. That is to say, basically all applications would perform better with an OODB. But that would also put lots of DBA's out of work/having to learn a new technology. Even more people would be out of work correcting errors in the data. That's unlikely to make OODBs popular with established companies. Gavin seems to be totally clueless, a better link would be Kirk
Cons:
Cannot be used by programs that
don't also use the same framework
for accessing the data store, making
it more difficult to use across the
enterprise.
Less resources available online for
non SQL-based database
No compatibility across database
types (can't swap to a different db
provider without changing all the
code)
Versioning is probably a bit of a
bitch. I'd guess adding a new
property to an object isn't quite as
easy as adding a new column to a
table.
Sören
All of the reasons you stated are valid, but I see the problem with OODBMS is the logical data model. The object-model (or rather the network model of the 70s) is not as simple as the relational one, and is therefore inferior.
jodonnel, i dont' see how use of object databases couples application code to the data. You can still abstract your application from the OODB through using a Repository pattern and replace with an ORM backed SQL database if you design things properly.
For an OO application, an OO database will provide a more natural fit for persisting objects.
What's probably true is that you tie your data to your domain model, but then that's the crux!
Wouldn't it be good to have a single way of looking at both data, business rules and processes using a domain centric view?
So, a big pro is that an OODB matches how most modern, enterprise level object orientated software applications are designed, there is no extra effort to design a data layer using a different (relational) design. Cheaper to build and maintain, and in many cases general higher performance.
Cons, just general lack of maturity and adoption i reckon...