Want to improve this post? Provide detailed answers to this question, including citations and an explanation of why your answer is correct. Answers without enough detail may be edited or deleted.
Is there any NoSQL data store that is ACID compliant?
I'll post this as an answer purely to support the conversation - Tim Mahy , nawroth , and CraigTP have suggested viable databases. CouchDB would be my preferred due to the use of Erlang, but there are others out there.
I'd say ACID does not contradict or negate the concept of NoSQL... While there seems to be a trend following the opinion expressed by dove , I would argue the concepts are distinct.
NoSQL is fundamentally about simple key-value (e.g. Redis) or document-style schema (collected key-value pairs in a "document" model, e.g. MongoDB) as a direct alternative to the explicit schema in classical RDBMSs. It allows the developer to treat things asymmetrically, whereas traditional engines have enforced rigid same-ness across the data model. The reason this is so interesting is because it provides a different way to deal with change, and for larger data sets it provides interesting opportunities to deal with volumes and performance.
ACID provides principles governing how changes are applied to a database. In a very simplified way, it states (my own version):
(A) when you do something to change a database the change should work or fail as a whole
(C) the database should remain consistent (this is a pretty broad topic)
(I) if other things are going on at the same time they shouldn't be able to see things mid-update
(D) if the system blows up (hardware or software) the database needs to be able to pick itself back up; and if it says it finished applying an update, it needs to be certain
The conversation gets a little more excitable when it comes to the idea of propagation and constraints. Some RDBMS engines provide the ability to enforce constraints (e.g. foreign keys) which may have propagation elements (a la cascade). In simpler terms, one "thing" may have a relationship with another "thing" in the database, and if you change an attribute of one it may require the other be changed (updated, deleted, ... lots of options). NoSQL databases, being predominantly (at the moment) focused on high data volumes and high traffic, seem to be tackling the idea of distributed updates which take place within (from a consumer perspective) arbitrary time frames. This is basically a specialized form of replication managed via transaction - so I would say that if a traditional distributed database can support ACID, so can a NoSQL database.
Some resources for further reading:
Wikipedia article on ACID
Wikipedia on propagation constraints
Wikipedia (yeah, I like the site, ok?) on database normalization
Apache documentation on CouchDB with a good overview of how it applies ACID
Wikipedia on Cluster Computing
Wikipedia (again...) on database transactions
UPDATE (27 July 2012):
Link to Wikipedia article has been updated to reflect the version of the article that was current when this answer was posted. Please note that the current Wikipedia article has been extensively revised!
Well, according to an older version of a Wikipedia article on NoSQL:
NoSQL is a movement promoting a
loosely defined class of
non-relational data stores that break
with a long history of relational
databases and ACID guarantees.
and also:
The name was an attempt to describe
the emergence of a growing number of
non-relational, distributed data
stores that often did not attempt to
provide ACID guarantees.
and
NoSQL systems often provide weak
consistency guarantees such as
eventual consistency and transactions
restricted to single data items, even
though one can impose full ACID
guarantees by adding a supplementary
middleware layer.
So, in a nutshell, I'd say that one of the main benefits of a "NoSQL" data store is its distinct lack of ACID properties. Furthermore, IMHO, the more one tries to implement and enforce ACID properties, the further away from the "spirit" of a "NoSQL" data store you get, and the closer to a "true" RDBMS you get (relatively speaking, of course).
However, all that said, "NoSQL" is a very vague term and is open to individual interpretations, and depends heavily upon just how much of a purist viewpoint you have. For example, most modern-day RDBMS systems don't actually adhere to all of Edgar F. Codd's 12 rules of his relation model!
Taking a pragmatic approach, it would appear that Apache's CouchDB comes closest to embodying both ACID-compliance whilst retaining loosely-coupled, non-relational "NoSQL" mentality.
Please ensure you read the Martin Fowler introduction about NoSQL databases. And the corresponding video.
First of all, we can distinguish two types of NoSQL databases:
Aggregate-oriented databases;
Graph-oriented databases (e.g. Neo4J).
By design, most Graph-oriented databases are ACID!
Then, what about the other types?
In Aggregate-oriented databases, we can put three sub-types:
Document-based NoSQL databases (e.g. MongoDB, CouchDB);
Key/Value NoSQL databases (e.g. Redis);
Column family NoSQL databases (e.g. Hibase, Cassandra).
What we call an Aggregate here, is what Eric Evans defined in its Domain-Driven Design as a self-sufficient of Entities and Value-Objects in a given Bounded Context.
As a consequence, an aggregate is a collection of data that we
interact with as a unit. Aggregates form the boundaries for ACID
operations with the database. (Martin Fowler)
So, at Aggregate level, we can say that most NoSQL databases can be as safe as ACID RDBMS, with the proper settings. Of source, if you tune your server for the best speed, you may come into something non ACID. But replication will help.
My main point is that you have to use NoSQL databases as they are, not as a (cheap) alternative to RDBMS. I have seen too much projects abusing of relations between documents. This can't be ACID. If you stay at document level, i.e. at Aggregate boundaries, you do not need any transaction. And your data will be as safe as with an ACID database, even if it not truly ACID, since you do not need those transactions! If you need transactions and update several "documents" at once, you are not in the NoSQL world any more - so use a RDBMS engine instead!
some 2019 update: Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets.
In this question someone must mention OrientDB:
OrientDB is a NoSQL database, one of the few, that support fully ACID transactions. ACID is not only for RDBMS because it's not part of the Relational algebra. So it IS possible to have a NoSQL database that support ACID.
This feature is the one I miss the most in MongoDB
FoundationDB is ACID compliant:
http://www.foundationdb.com/
It has proper transactions, so you can update multiple disparate data items in an ACID fashion. This is used as the foundation for maintaining indexes at a higher layer.
ACID and NoSQL are completely orthogonal. One does not imply the other.
I have a notebook on my desk, I use it to keep notes on things that I still have to do. This notebook is a NoSQL database. I query it using a linear search with a "page cache" so I don't always have to search every page. It is also ACID compliant as I ensure that I only write one thing at a time and never while I am reading it.
NoSQL simply means that it isn't SQL. Many people get confused and think it means highly-scaleable-wild-west-super-fast-storage. It doesn't. It doesn't mean key-value store, or eventual consistency. All it means is "not SQL", there are a lot of databases in this planet and most of them are not SQL[citation needed].
You can find many examples in the other answers so I need not list them here, but there are non-SQL databases with ACID compliance for various operations, some are only ACID for single object writes while some guarantee far more. Each database is different.
"NoSQL" is not a well-defined term. It's a very vague concept. As such, it's not even possible to say what is and what is not a "NoSQL" product. Not nearly all of the products typcially branded with the label are key-value stores.
As one of the originators of NoSQL (I was an early contributor to Apache CouchDB, and a speaker at the first NoSQL event held at CBS Interactive / CNET in 2009) I'm excited to see new algorithms create possibilities that didn't exist before. The Calvin protocol offers a new way to think of physical constraints like CAP and PACELC.
Instead of active/passive async replication, or active/active synchronous replication, Calvin preserves correctness and availability during replica outages by using a RAFT-like protocol to maintain a transaction log. Additionally, transactions are processed deterministically at each replica, removing the potential for deadlocks, so agreement is achieved with only a single round of consensus. This makes it fast even on multi-cloud worldwide deployments.
FaunaDB is the only database implementation using the Calvin protocol, making it uniquely suited for workloads that require mainframe-like data integrity with NoSQL scale and flexibility.
Yes, MarkLogic Server is a NoSQL solution (document database I like to call it) that works with ACID transactions
The grandfather of NoSQL: ZODB is ACID compliant. http://www.zodb.org/
However, it's Python only.
If you are looking for an ACID compliant key/value store, there's Berkeley DB. Among graph databases at least Neo4j and HyperGraphDB offer ACID transactions (HyperGraphDB actually uses Berkeley DB for low-level storage at the moment).
FoundationDB was mentioned and at the time it wasn't open source. It's been open sourced by Apple two days ago:
https://www.foundationdb.org/blog/foundationdb-is-open-source/
I believe it is ACID compliant.
MongoDB announced that its 4.0 version will be ACID compliant for multi-document transactions.
Version 4.2. is supposed to support it under sharded setups.
https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb
NewSQL
This concept Wikipedia contributors define as:
[…] a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.[1][2][3]
References
[1] Nancy Lynch and Seth Gilbert, “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59.
[2] "Brewer's CAP Theorem", julianbrowne.com, Retrieved 02-Mar-2010
[3] "Brewers CAP theorem on distributed systems", royans.net
take a look at the CAP theorem
EDIT: RavenDB seems to be ACID compliant
To add to the list of alternatives, another fully ACID compliant NoSQL database is GT.M.
Hyperdex Warp http://hyperdex.org/warp/
Warp (ACID feature) is proprietary, but Hyperdex is free.
db4o
Unlike roll-your-own persistence or
serialization, db4o is ACID
transaction safe and allows for
querying, replication and schema
changes during runtime
http://www.db4o.com/about/productinformation/db4o/
BergDB is a light-weight, open-source, NoSQL database designed from the start to run ACID transactions. Actually, BergDB is "more" ACID than most SQL databases in the sense that the only way to change the state of the database is to run ACID transactions with the highest isolation level (SQL term: "serializable"). There will never be any issues with dirty reads, non-repeatable reads, or phantom reads.
In my opinion, the database is still highly performant; but don't trust me, I created the software. Try it yourself instead.
Tarantool is a fully ACID NoSQL database. You can issue CRUD operations or stored procedures, everything will be run with strict accordance with an ACID property. You can also read about that here: http://stable.tarantool.org/doc/mpage/data-and-persistence.html
MarkLogic is also ACID complient. I think is one of the biggest players now.
Wait is over.
ACID compliant NoSQL DB is out ----------- have a look at citrusleaf
A lot of modern NoSQL solution don't support ACID transactions (atomic isolated multi-key updates), but most of them support primitives which allow you to implement transactions on the application level.
If a data store supports per key linearizability and compare-and-set (document level atomicity) then it's enough to implement client-side transactions, more over you have several options to choose from:
If you need Serializable isolation level then you can follow the same algorithm which Google use for the Percolator system or Cockroach Labs for CockroachDB. I've blogged about it and create a step-by-step visualization, I hope it will help you to understand the main idea behind the algorithm.
If you expect high contention but it's fine for you to have Read Committed isolation level then please take a look on the RAMP transactions by Peter Bailis.
The third approach is to use compensating transactions also known as the saga pattern. It was described in the late 80s in the Sagas paper but became more actual with the raise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.
The list of data stores suitable for client side transactions includes Cassandra with lightweight transactions, Riak with consistent buckets, RethinkDB, ZooKeeper, Etdc, HBase, DynamoDB, MongoDB and others.
YugaByte DB supports an ACID Compliant distributed txns as well as Redis and CQL API compatibility on the query layer.
Google Cloud Datastore is a NoSQL database that supports ACID transactions
DynamoDB is a NoSQL database and has ACID transactions.
VoltDB is an entrant which claims ACID compliance, and while it still uses SQL, its goals are the same in terms of scalability
Whilst it's only an embedded engine and not a server, leveldb has WriteBatch and the ability to turn on Synchronous writes to provide ACID behaviour.
Node levelUP is transactional and built on leveldb https://github.com/rvagg/node-levelup#batch
If you add enough pure water and successfully flip a coin, anything can become acidic. Or basic for that matter.
To say a database is ACID compliant means four specific things. And in defining the system (restricting the range) we can arbitrarily water down the meanings so that the result is ACID compliance.
A—if your NoSQL database only allows one record operation at a time and records either go or they don't then that's atomic.
C—if the only constraints you allow are simple, like checking JSON schemas against a known schema then that's consistent.
I—if just append-only transactions are supported (and schema changes are disallowed) then it is impossible for anything to depend on anything else, that's independent.
D—if you turn off all machines at night and synchronize disks then the transactions will be it in or they won't, that's durable.
What exactly is NoSQL? Is it database systems that only work with {key:value} pairs?
As far as I know MemCache is one of such database systems, am I right?
What other popular NoSQL databases are there and where exactly are they useful?
Thanks, Boda Cydo.
I'm not agree with the answers I'm seeing, although it's true that NoSQL solutions tends to break the ACID rules, not all are created from that approach.
I think first you should define what is a SQL Solution and then you can put the "Not Only" in front of it, that will be more accurate definition of what is a NoSQL solution.
With this approach in mind:
SQL databases are a way to group all the data stores that are accessible using Structured Query Language as the main (and most of the time only) way to communicate with them, this means it requires that the database support the structures that are common to those systems like "tables", "columns", "rows", "relationships", etc.
Now, put the "Not Only" in front of the last sentence and you will get a definition of what means "NoSQL". NoSQL groups all the stores created as an attempt to solve problems which cannot fit into the table/column/rows structures or even in SQL Statements, in most of the cases these databases will not support relationships, they're abandoning the well known structures just because the problems have changed since their conception.
If you have a text file, and you create an API to store/retrieve/organize this information, then you have a NoSQL database in your hands.
All of these means that there are several solutions to store the information in a way that traditional SQL systems will not allow to achieve better performance, flexibility, etc etc. Every NoSQL provider tries to solve a different problem and that's why you wont be able to compare two different solutions, for example:
djondb is a document store created to be used as
NoSQL enterprise solution supporting transactions, consistency, etc.
but sacrifice performance of its counterparts.
MongoDB is a document store (similar to
djondb) which accomplish great performance but trades some of the
ACID properties to achieve this.
CouchDB is another document store which
solves the queries slightly different providing views to retrieve the
information without doing a full query every time.
...
As you may have noticed I only talked about the document stores, that's because I wanted to show you that 3 different document stores implementations have different approach, therefore you should keep in mind the golden rule of NoSQL stores "Use the right tool for the right job".
I'm the creator of djondb and I've been doing a lot of research even before trying to start my own NoSQL implementation, but this is a field where the concepts will keep changing the way we see the information storage.
From wikipedia:
NoSQL is an umbrella term for a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations. The term was first popularised in early 2009.
The motivation for such an architecture was high scalability, to support sites such as Facebook, advertising.com, etc...
To quickly get a handle on NoSQL systems, see this blog post I wrote: Visual Guide to NoSQL Systems. Essentially, NoSQL systems sacrifice either consistency or availability in favor of tolerance to network partitions.
What is NoSQL ?
NoSQL is the acronym for Not Only SQL. The basic qualities of NoSQL databases are schemaless, distributed and horizontally scalable on commodity hardware. The NoSQL databases offers variety of functions to solve various problems with variety of data types, where “blob” used to be the only data type in RDBMS to store unstructured data.
1 Dynamic Schema
NoSQL databases allows schema to be flexible. New columns can be added anytime. Rows may or may not have values for those columns and no strict enforcement of data types for columns. This flexibility is handy for developers, especially when they expect frequent changes during the course of product life cycle.
2 Variety of Data
NoSQL databases support any type of data. It supports structured, semi-structured and unstructured data to be stored. Its supports logs, images files, videos, graphs, jpegs, JSON, XML to be stored and operated as it is without any pre-processing. So it reduces the need for ETL (Extract – Transform – Load).
3 High Availability Cluster
NoSQL databases support distributed storage using commodity hardware. It also supports high availability by horizontal scalability. This features enables NoSQL databases get the benefit of elastic nature of the Cloud infrastructure services.
4 Open Source
NoSQL databases are open source software. The usage of software is free and most of them are free to use in commercial products. The open sources codebase can be modified to solve the business needs. There are minor variations in the open source software licenses, users must be aware of license agreements.
5 NoSQL – Not Only SQL
NoSQL databases not only depend SQL to retrieve data. They provide rich API interfaces to perform DML and CRUD operations. These are APIs are move developer friendly and supported in variety of programming languages.
Take a look at these:
http://en.wikipedia.org/wiki/Nosql#List_of_NoSQL_open_source_projects
and this:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
I used something called the Raima Data Manager more than a dozen years ago, that qualifies as NoSQL. It calls itself a "Set Oriented Database" Its not based on tables, and there is no query "language", just an C API for asking for subsets.
It's fast and easier to work with in C/C++ and SQL, there's no building up strings to pass to a query interpreter and the data comes back as an enumerable object rather than as an array. variable sized records are normal and don't waste space. I never saw the source code, but there were some hints at the interface that internally, the code used pointers a lot.
I'm not sure that the product I used is even sold anymore, but the company is still around.
MongoDB looks interesting, SourceForge is now using it.
I listened to a podcast with a team member. The idea with NoSQL isn't so much to replace SQL as it is to provide a solution for problems that aren't solved well with traditional RDBMS. As mentioned elsewhere, they are faster and scale better at the cost of reliability and atomicity (different solutions to different degrees). You wouldn't want to use one for a financial system, but a document based system would work great.
Here is a comprehensive list of NoSQL Databases: http://nosql-database.org/.
I'm glad that you have had success with RDM John! I work at Raima so it's great to hear feedback. For those looking for more information, here are a couple of resources:
Video Overview of RDM's General Architecture
Free Evaluation Download of RDM
Just wonder how good is Propel's support for database sharding? I am thinking about creating my application in PHP, using MySQL as the database server and Propel as the ORM.
I figure out that it may be good to keep the architecture scalable right from the start, just in case my application takes off.
What's your take?
I think that's a very bad idea. Assuming that you need to shard your data is not a good assumption. You don't know, in advance, how you're going to want to scale. Sharding is a very complicated business and needs to be avoided if at all possible. This is an obscene case of premature optimisation.
I agree with MarkR that it's too early to be worrying about sharding, but I disagree that it should be avoided if at all possible. I'd say go with the ORM that seems to fit your style and language choice -- and Propel is probably the right one in your case. Even if your application takes off in a big way, sharding probably won't be necessary -- you can easily pull off 25 million records with a MySQL-based DBMS and some decent caching techniques, so just focus on making your queries fast and design for easy memcache-integration, and you'll be a happy camper even when your app takes off.
Good luck with it!
Propel supports sharding out of the box through connections. check an example here:
http://groups.google.com/group/propel-users/browse_thread/thread/4d19c0668aa17452
Some database sharding middleware can help you.
For example: Apache ShardingSphere (https://shardingsphere.apache.org/document/current/en/features/sharding/)
There are 2 adaptors of ShardingSphere, JDBC and Proxy. JDBC is the java driver to connect database which no fit for PHP, but Proxy is just like the database (MySQL or Postgres).
ShardingSphere-Proxy can isolate the data sharding and business logic. It is better to use third party middleware to handle common problems.