Related
“A Graph Database –transforms a–> RDBMS”
The Neo4j site seems to imply that whatever you can do in RDBMS, you can do in Neo4j.
Before choosing Neo4j as a replacement for an RDBMS, I need some doubts answered.
I am interested in Neo4j for
ability to do quickly modify data "schema"
ability to express entities naturally instead of relations and normalizations
...which leads to highly expressive code (better than ORM)
This is a NoSQL solution I am interested in for it's features, not high performance.
Question: Does Neo4j present any issues that may make it unsuitable as a RDBMS replacement?
I am particularly concerned about these:
is there any DB feature I must implement in application logic? (For example, you must implement joins at application layer for a few NoSQL DBs)
Are the fields "indexed" to allow a lookup faster than O(n)?
How do I handle hot backups and replication?
any issues with "altering" schema or letting entities with different versions of the schema living together?
This is an extremely broad topic covering everything from modeling and implementation to IT and support. It's impossible to really answer all those questions here, especially without details on your situation. However, you seem to be exploring options and avenues. So, I'll just pass on some general food for thought as someone that's implemented a number of systems.
Everybody seems to think their new database paradigm is a replacement for relational databases. So, take those claims with a grain of salt.
I like to think in terms of 3 fundamental models: Relational, Document, and Graphing. Depending on your problem space one or even more of these is the right answer. I would not do financial transactions in anything but relational (SQL Based). If you are building a CMS, then a Document DB is the way to go. If my application is modeling networks (roads, people, connections, networks etc.) I use Neo4J.
As far as production quality, there are solid options in each category. Relational has a bunch. For document databases I'd go MongoDB or a higher level JCR system like Apache Jackrabbit. For graphing, I only have experience with Neo4j and it is rock solid for me.
Whatever you do, don't buy into the hype that "We have the one technology that solves all your problems." It's not there and it narrows your thinking.
I 'm convinced Neo4j is a good replacement for relational databases by now.
It is ACID compliant
Though the community version lacks some features like hot backups, the enterprise edition has
You can get support for it
At first sight (and in the new releases where you don't need a START clause) its query language CYPHER can do almost anything SQL can
but
it's harder to find a CYPHER developer than a SQL one
and it does not have an equivalent optimizer: it matters more than with SQL how you write the query
Though it supports replication and Neo explicitly markets it as a big data product, I can't confirm it is scalable enough and I did not study security aspects.
In recent releases (younger that the question above), one can define indexes on labels, which work like indexes on tables in a relational DB, allowing for O(log(n)) lookups.
(fyi: Neo4j has no tables, but each node(~=row) can have different labels, comparable to gmail labels. This is more flexible: you don't have to chose whether you put cars and bicycles in one for vehicles table or not: a bicycle would have both a :vehicle and a :bicycle label.)
To answer the original question: Neo4j does hardly support for schema enforcement. Neo advices implementing automated consistency tests on your database, which you run on your acceptance test instance as part of your release cycle.
Using an enterprise db such as oracle will give you many, many features which may or may not be part of neo. These include:
ACID transactions
High availability / backups / standby
ability to use sql to get data in the most efficient way using a cost based optimizer - the db determines the best way to retrieve the data based on your latest statistics
Scalability, partitioning
support
security
If you are going to implement most of the functionality of your application in code by yourself and don't require the structure and advanced features offered by an rdbms or if your data structures are better suited to a graph based db then by all means trial neo. There is a reason that most corporate apps use a one of the traditional rdbms servers but this may not always be the case in the future
I am looking at rewriting a VB based on-premise (locally installed) application (invoicing+inventory) as a web based Clojure application for small enterprise customers. I am intending this to be offered as a SaaS application for customers in similar trade.
I was looking at database options: My choice was an RDBMS: Postgresql/ MySQL. I might scale up to 400 users in the first year, with typically a 20-40 page views/ per day per user - mostly for transactions not static views. Each view will involve fetch data and update data. ACID compliance is necessary(or so I think). So the transaction volume is not huge.
It would have been a no-brainer to pick either of these based on my preference, but for this one requirement, which I believe is typical of a SaaS app: The Schema will be changing as I add more customers/users and for each customer's changing business requirement (I will be offering some limited flexibility only to start with). As I am not a DB expert, based on what I can think of and has read, I can handle that in a number of ways:
Have a traditional RDBMS schema design in MySQl/Postgresql with a single DB hosting multiple tenants. And add enough "free-floating" columns in each table to allow for future changes as I add more customers or changes for an existing customer. This might have a downside of propagating the changes to the DB every time a small change is made to the Schema. I remember reading that in Postgresql schema updates can be done real time without locking. But not sure, how painful or how practical is it in this use case. And also, as the schema changes might also introduce new/ minor SQL changes as well.
Have an RDBMS, but design the database schema in a flexible manner: with a close to entity-attribute-value or just as a key-value store. (Workday, FriendFeed for example)
Have the entire thing in-memory as objects and store them in log files periodically.(e.g., edval, lmax)
Go for a NoSQL DB like MongoDB or Redis. But based on what I can gather, they are not suitable for this use-case and not fully ACID compliant.
Go for some NewSQL Dbs like VoltDb or JustoneDb(cloud based) which retain the SQL and ACID compliant behaviour and are "new-gen" RDBMS.
I looked at neo4j(graphdb), but not sure if that will fit this use-case
In my use case, more than scalability or distributed computing, I am looking at a better way to achieve "Flexibility in Schema + ACID + some reasonable Performance". Most of the articles I could find on the net speak of flexibility in schema as a cause leading to performance(in the case of NoSQL DBs) and scalability while leaving out the ACID/Transactions side.
Is this an "either or" case of 'Schema flexibility vs ACID' transactions or Is there a better way out?
I think tarantool can help you. That solution have transactions, lua, msgpack, and etc. And also see that video
Recently NoSQL has gained immense popularity.
What are the advantages of NoSQL over traditional RDBMS?
Not all data is relational. For those situations, NoSQL can be helpful.
With that said, NoSQL stands for "Not Only SQL". It's not intended to knock SQL or supplant it.
SQL has several very big advantages:
Strong mathematical basis.
Declarative syntax.
A well-known language in Structured Query Language (SQL).
Those haven't gone away.
It's a mistake to think about this as an either/or argument. NoSQL is an alternative that people need to consider when it fits, that's all.
Documents can be stored in non-relational databases, like CouchDB.
Maybe reading this will help.
The history seem to look like this:
Google needs a storage layer for their inverted search index. They figure a traditional RDBMS is not going to cut it. So they implement a NoSQL data store, BigTable on top of their GFS file system. The major part is that thousands of cheap commodity hardware machines provides the speed and the redundancy.
Everyone else realizes what Google just did.
Brewers CAP theorem is proven. All RDBMS systems of use are CA systems. People begin playing with CP and AP systems as well. K/V stores are vastly simpler, so they are the primary vehicle for the research.
Software-as-a-service systems in general do not provide an SQL-like store. Hence, people get more interested in the NoSQL type stores.
I think much of the take-off can be related to this history. Scaling Google took some new ideas at Google and everyone else follows suit because this is the only solution they know to the scaling problem right now. Hence, you are willing to rework everything around the distributed database idea of Google because it is the only way to scale beyond a certain size.
C - Consistency
A - Availability
P - Partition tolerance
K/V - Key/Value
NoSQL is better than RDBMS because of the following reasons/properities of NoSQL
It supports semi-structured data and volatile data
It does not have schema
Read/Write throughput is very high
Horizontal scalability can be achieved easily
Will support Bigdata in volumes of Terra Bytes & Peta Bytes
Provides good support for Analytic tools on top of Bigdata
Can be hosted in cheaper hardware machines
In-memory caching option is available to increase the performance of queries
Faster development life cycles for developers
EDIT:
To answer "why RDBMS cannot scale", please take a look at RDBMS Overheads pdf written by Stavros Harizopoulos,Daniel J. Abadi,Samuel Madden and Michael Stonebraker
RDBMS's have challenges in handling huge data volumes of Terabytes & Peta bytes. Even if you have Redundant Array of Independent/Inexpensive Disks (RAID) & data shredding, it does not scale well for huge volume of data. You require very expensive hardware.
Logging: Assembling log records and tracking down all changes in database structures slows performance. Logging may not be necessary if recoverability is not a requirement or if recoverability is provided through other means (e.g., other sites on the network).
Locking: Traditional two-phase locking poses a sizeable overhead since all accesses to database structures are governed by a separate entity, the Lock Manager.
Latching: In a multi-threaded database, many data structures have to be latched before they can be accessed. Removing this feature and going to a single-threaded approach has a noticeable performance impact.
Buffer management: A main memory database system does not need to access pages through a buffer pool, eliminating a level of indirection on every record access.
This does not mean that we have to use NoSQL over SQL.
Still, RDBMS is better than NoSQL for the following reasons/properties of RDBMS
Transactions with ACID properties - Atomicity, Consistency, Isolation & Durability
Adherence to Strong Schema of data being written/read
Real time query management ( in case of data size < 10 Tera bytes )
Execution of complex queries involving join & group by clauses
We have to use RDBMS (SQL) and NoSQL (Not only SQL) depending on the business case & requirements
NOSQL has no special advantages over the relational database model. NOSQL does address certain limitations of current SQL DBMSs but it doesn't imply any fundamentally new capabilities over previous data models.
NOSQL means only no SQL (or "not only SQL") but that doesn't mean the same as no relational. A relational database in principle would make a very good NOSQL solution - it's just that none of the current set of NOSQL products uses the relational model.
RDBMS focus more on relationship and NoSQL focus more on storage.
You can consider using NoSQL when your RDBMS reaches bottlenecks. NoSQL makes RDBMS more flexible.
The biggest advantage of NoSQL over RDBMS is Scalability.
NoSQL databases can easily scale-out to many nodes, but for RDBMS it is very hard.
Scalability not only gives you more storage space but also much higher performance since many hosts work at the same time.
If you need to process huge amount of data with high performance
OR
If data model is not predetermined
then
NoSQL database is a better choice.
Just adding to all the information given above
NoSql Advantages:
1) NoSQL is good if you want to be production ready fast due to its support for schema-less and object oriented architecture.
2) NoSql db's are eventually consistent which in simple language means they will not provide any lock on the data(documents) as in case of RDBMS and what does it mean is latest snapshot of data is always available and thus increase the latency of your application.
3) It uses MVCC (Multi view concurrency control) strategy for maintaining and creating snapshot of data(documents).
4) If you want to have indexed data you can create view which will automatically index the data by the view definition you provide.
NoSql Disadvantages:
1) Its definitely not suitable for big heavy transactional applications as it is eventually consistent and does not support ACID properties.
2) Also it creates multiple snapshots (revisions) of your data (documents) as it uses MVCC methodology for concurrency control, as a result of which space get consumed faster than before which makes compaction and hence reindexing more frequent and it will slow down your application response as the data and transaction in your application grows.
To counter that you can horizontally scale the nodes but then again it will be higher cost as compare sql database.
From mongodb.com:
NoSQL databases differ from older, relational technology in four main areas:
Data models: A NoSQL database lets you build an application without having to define the schema first unlike relational databases which make you define your schema before you can add any data to the system. No predefined schema makes NoSQL databases much easier to update as your data and requirements change.
Data structure: Relational databases were built in an era where data was fairly structured and clearly defined by their relationships. NoSQL databases are designed to handle unstructured data (e.g., texts, social media posts, video, email) which makes up much of the data that exists today.
Scaling: It’s much cheaper to scale a NoSQL database than a relational database because you can add capacity by scaling out over cheap, commodity servers. Relational databases, on the other hand, require a single server to host your entire database. To scale, you need to buy a bigger, more expensive server.
Development model: NoSQL databases are open source whereas relational databases typically are closed source with licensing fees baked into the use of their software. With NoSQL, you can get started on a project without any heavy investments in software fees upfront.
Want to improve this post? Provide detailed answers to this question, including citations and an explanation of why your answer is correct. Answers without enough detail may be edited or deleted.
Is there any NoSQL data store that is ACID compliant?
I'll post this as an answer purely to support the conversation - Tim Mahy , nawroth , and CraigTP have suggested viable databases. CouchDB would be my preferred due to the use of Erlang, but there are others out there.
I'd say ACID does not contradict or negate the concept of NoSQL... While there seems to be a trend following the opinion expressed by dove , I would argue the concepts are distinct.
NoSQL is fundamentally about simple key-value (e.g. Redis) or document-style schema (collected key-value pairs in a "document" model, e.g. MongoDB) as a direct alternative to the explicit schema in classical RDBMSs. It allows the developer to treat things asymmetrically, whereas traditional engines have enforced rigid same-ness across the data model. The reason this is so interesting is because it provides a different way to deal with change, and for larger data sets it provides interesting opportunities to deal with volumes and performance.
ACID provides principles governing how changes are applied to a database. In a very simplified way, it states (my own version):
(A) when you do something to change a database the change should work or fail as a whole
(C) the database should remain consistent (this is a pretty broad topic)
(I) if other things are going on at the same time they shouldn't be able to see things mid-update
(D) if the system blows up (hardware or software) the database needs to be able to pick itself back up; and if it says it finished applying an update, it needs to be certain
The conversation gets a little more excitable when it comes to the idea of propagation and constraints. Some RDBMS engines provide the ability to enforce constraints (e.g. foreign keys) which may have propagation elements (a la cascade). In simpler terms, one "thing" may have a relationship with another "thing" in the database, and if you change an attribute of one it may require the other be changed (updated, deleted, ... lots of options). NoSQL databases, being predominantly (at the moment) focused on high data volumes and high traffic, seem to be tackling the idea of distributed updates which take place within (from a consumer perspective) arbitrary time frames. This is basically a specialized form of replication managed via transaction - so I would say that if a traditional distributed database can support ACID, so can a NoSQL database.
Some resources for further reading:
Wikipedia article on ACID
Wikipedia on propagation constraints
Wikipedia (yeah, I like the site, ok?) on database normalization
Apache documentation on CouchDB with a good overview of how it applies ACID
Wikipedia on Cluster Computing
Wikipedia (again...) on database transactions
UPDATE (27 July 2012):
Link to Wikipedia article has been updated to reflect the version of the article that was current when this answer was posted. Please note that the current Wikipedia article has been extensively revised!
Well, according to an older version of a Wikipedia article on NoSQL:
NoSQL is a movement promoting a
loosely defined class of
non-relational data stores that break
with a long history of relational
databases and ACID guarantees.
and also:
The name was an attempt to describe
the emergence of a growing number of
non-relational, distributed data
stores that often did not attempt to
provide ACID guarantees.
and
NoSQL systems often provide weak
consistency guarantees such as
eventual consistency and transactions
restricted to single data items, even
though one can impose full ACID
guarantees by adding a supplementary
middleware layer.
So, in a nutshell, I'd say that one of the main benefits of a "NoSQL" data store is its distinct lack of ACID properties. Furthermore, IMHO, the more one tries to implement and enforce ACID properties, the further away from the "spirit" of a "NoSQL" data store you get, and the closer to a "true" RDBMS you get (relatively speaking, of course).
However, all that said, "NoSQL" is a very vague term and is open to individual interpretations, and depends heavily upon just how much of a purist viewpoint you have. For example, most modern-day RDBMS systems don't actually adhere to all of Edgar F. Codd's 12 rules of his relation model!
Taking a pragmatic approach, it would appear that Apache's CouchDB comes closest to embodying both ACID-compliance whilst retaining loosely-coupled, non-relational "NoSQL" mentality.
Please ensure you read the Martin Fowler introduction about NoSQL databases. And the corresponding video.
First of all, we can distinguish two types of NoSQL databases:
Aggregate-oriented databases;
Graph-oriented databases (e.g. Neo4J).
By design, most Graph-oriented databases are ACID!
Then, what about the other types?
In Aggregate-oriented databases, we can put three sub-types:
Document-based NoSQL databases (e.g. MongoDB, CouchDB);
Key/Value NoSQL databases (e.g. Redis);
Column family NoSQL databases (e.g. Hibase, Cassandra).
What we call an Aggregate here, is what Eric Evans defined in its Domain-Driven Design as a self-sufficient of Entities and Value-Objects in a given Bounded Context.
As a consequence, an aggregate is a collection of data that we
interact with as a unit. Aggregates form the boundaries for ACID
operations with the database. (Martin Fowler)
So, at Aggregate level, we can say that most NoSQL databases can be as safe as ACID RDBMS, with the proper settings. Of source, if you tune your server for the best speed, you may come into something non ACID. But replication will help.
My main point is that you have to use NoSQL databases as they are, not as a (cheap) alternative to RDBMS. I have seen too much projects abusing of relations between documents. This can't be ACID. If you stay at document level, i.e. at Aggregate boundaries, you do not need any transaction. And your data will be as safe as with an ACID database, even if it not truly ACID, since you do not need those transactions! If you need transactions and update several "documents" at once, you are not in the NoSQL world any more - so use a RDBMS engine instead!
some 2019 update: Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets.
In this question someone must mention OrientDB:
OrientDB is a NoSQL database, one of the few, that support fully ACID transactions. ACID is not only for RDBMS because it's not part of the Relational algebra. So it IS possible to have a NoSQL database that support ACID.
This feature is the one I miss the most in MongoDB
FoundationDB is ACID compliant:
http://www.foundationdb.com/
It has proper transactions, so you can update multiple disparate data items in an ACID fashion. This is used as the foundation for maintaining indexes at a higher layer.
ACID and NoSQL are completely orthogonal. One does not imply the other.
I have a notebook on my desk, I use it to keep notes on things that I still have to do. This notebook is a NoSQL database. I query it using a linear search with a "page cache" so I don't always have to search every page. It is also ACID compliant as I ensure that I only write one thing at a time and never while I am reading it.
NoSQL simply means that it isn't SQL. Many people get confused and think it means highly-scaleable-wild-west-super-fast-storage. It doesn't. It doesn't mean key-value store, or eventual consistency. All it means is "not SQL", there are a lot of databases in this planet and most of them are not SQL[citation needed].
You can find many examples in the other answers so I need not list them here, but there are non-SQL databases with ACID compliance for various operations, some are only ACID for single object writes while some guarantee far more. Each database is different.
"NoSQL" is not a well-defined term. It's a very vague concept. As such, it's not even possible to say what is and what is not a "NoSQL" product. Not nearly all of the products typcially branded with the label are key-value stores.
As one of the originators of NoSQL (I was an early contributor to Apache CouchDB, and a speaker at the first NoSQL event held at CBS Interactive / CNET in 2009) I'm excited to see new algorithms create possibilities that didn't exist before. The Calvin protocol offers a new way to think of physical constraints like CAP and PACELC.
Instead of active/passive async replication, or active/active synchronous replication, Calvin preserves correctness and availability during replica outages by using a RAFT-like protocol to maintain a transaction log. Additionally, transactions are processed deterministically at each replica, removing the potential for deadlocks, so agreement is achieved with only a single round of consensus. This makes it fast even on multi-cloud worldwide deployments.
FaunaDB is the only database implementation using the Calvin protocol, making it uniquely suited for workloads that require mainframe-like data integrity with NoSQL scale and flexibility.
Yes, MarkLogic Server is a NoSQL solution (document database I like to call it) that works with ACID transactions
The grandfather of NoSQL: ZODB is ACID compliant. http://www.zodb.org/
However, it's Python only.
If you are looking for an ACID compliant key/value store, there's Berkeley DB. Among graph databases at least Neo4j and HyperGraphDB offer ACID transactions (HyperGraphDB actually uses Berkeley DB for low-level storage at the moment).
FoundationDB was mentioned and at the time it wasn't open source. It's been open sourced by Apple two days ago:
https://www.foundationdb.org/blog/foundationdb-is-open-source/
I believe it is ACID compliant.
MongoDB announced that its 4.0 version will be ACID compliant for multi-document transactions.
Version 4.2. is supposed to support it under sharded setups.
https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb
NewSQL
This concept Wikipedia contributors define as:
[…] a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.[1][2][3]
References
[1] Nancy Lynch and Seth Gilbert, “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59.
[2] "Brewer's CAP Theorem", julianbrowne.com, Retrieved 02-Mar-2010
[3] "Brewers CAP theorem on distributed systems", royans.net
take a look at the CAP theorem
EDIT: RavenDB seems to be ACID compliant
To add to the list of alternatives, another fully ACID compliant NoSQL database is GT.M.
Hyperdex Warp http://hyperdex.org/warp/
Warp (ACID feature) is proprietary, but Hyperdex is free.
db4o
Unlike roll-your-own persistence or
serialization, db4o is ACID
transaction safe and allows for
querying, replication and schema
changes during runtime
http://www.db4o.com/about/productinformation/db4o/
BergDB is a light-weight, open-source, NoSQL database designed from the start to run ACID transactions. Actually, BergDB is "more" ACID than most SQL databases in the sense that the only way to change the state of the database is to run ACID transactions with the highest isolation level (SQL term: "serializable"). There will never be any issues with dirty reads, non-repeatable reads, or phantom reads.
In my opinion, the database is still highly performant; but don't trust me, I created the software. Try it yourself instead.
Tarantool is a fully ACID NoSQL database. You can issue CRUD operations or stored procedures, everything will be run with strict accordance with an ACID property. You can also read about that here: http://stable.tarantool.org/doc/mpage/data-and-persistence.html
MarkLogic is also ACID complient. I think is one of the biggest players now.
Wait is over.
ACID compliant NoSQL DB is out ----------- have a look at citrusleaf
A lot of modern NoSQL solution don't support ACID transactions (atomic isolated multi-key updates), but most of them support primitives which allow you to implement transactions on the application level.
If a data store supports per key linearizability and compare-and-set (document level atomicity) then it's enough to implement client-side transactions, more over you have several options to choose from:
If you need Serializable isolation level then you can follow the same algorithm which Google use for the Percolator system or Cockroach Labs for CockroachDB. I've blogged about it and create a step-by-step visualization, I hope it will help you to understand the main idea behind the algorithm.
If you expect high contention but it's fine for you to have Read Committed isolation level then please take a look on the RAMP transactions by Peter Bailis.
The third approach is to use compensating transactions also known as the saga pattern. It was described in the late 80s in the Sagas paper but became more actual with the raise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.
The list of data stores suitable for client side transactions includes Cassandra with lightweight transactions, Riak with consistent buckets, RethinkDB, ZooKeeper, Etdc, HBase, DynamoDB, MongoDB and others.
YugaByte DB supports an ACID Compliant distributed txns as well as Redis and CQL API compatibility on the query layer.
Google Cloud Datastore is a NoSQL database that supports ACID transactions
DynamoDB is a NoSQL database and has ACID transactions.
VoltDB is an entrant which claims ACID compliance, and while it still uses SQL, its goals are the same in terms of scalability
Whilst it's only an embedded engine and not a server, leveldb has WriteBatch and the ability to turn on Synchronous writes to provide ACID behaviour.
Node levelUP is transactional and built on leveldb https://github.com/rvagg/node-levelup#batch
If you add enough pure water and successfully flip a coin, anything can become acidic. Or basic for that matter.
To say a database is ACID compliant means four specific things. And in defining the system (restricting the range) we can arbitrarily water down the meanings so that the result is ACID compliance.
A—if your NoSQL database only allows one record operation at a time and records either go or they don't then that's atomic.
C—if the only constraints you allow are simple, like checking JSON schemas against a known schema then that's consistent.
I—if just append-only transactions are supported (and schema changes are disallowed) then it is impossible for anything to depend on anything else, that's independent.
D—if you turn off all machines at night and synchronize disks then the transactions will be it in or they won't, that's durable.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.