NoSQL has a distributed structure; is that why it does not support ACID properties? NewSQL also has a distributed structure, so how can NewSQL guarantee that ACID is supported?
As you point out, both NoSQL and NewSQL databases often have distributed architectures. Being distributed does not preclude providing ACID guarantees, nor does using or not using SQL, as in fact there are some ACID NoSQL databases. They are separate things.
However, in the early days of NoSQL, it was often argued that in order to provide the scale needed for web applications, not only was it necessary to use a distributed architecture, but also to get rid of SQL, relational tables, and ACID guarantees. NewSQL in many ways refuted this argument, showing that databases could be distributed and scalable without giving up those things, by using a specialized architecture, often a distributed architecture.
Each database is different, and even among databases that are "ACID-compliant" there are many variations in the actual guarantees they provide, so it's often important to read the fine print.
For example, some ACID databases limit the scope of a transaction to a single operation, such as inserting or updating a single record. Others relax the definition of isolation so that it is possible to get incorrect results. Others relax durability, so there is a possibility that some "committed" transactions might not survive an outage. Many databases claim to be ACID-compliant, but that doesn't mean you can use them all to do things like guarantee only one person reserves a seat on a plane, or that orders never exceed inventory, or that purchases never exceed the customer's available balance.
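As a minimal sketch of the "orders never exceed inventory" guarantee, the following uses Python's built-in sqlite3 module. SQLite is a single-node database and is used here only for illustration; the table names and the `make_db` / `place_order` helpers are hypothetical. The point is that the order row and the stock decrement are committed together or not at all.

```python
import sqlite3

def make_db():
    # In-memory database with a hypothetical schema; the CHECK constraint
    # is what forbids negative stock.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, item TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn

def place_order(conn, item, qty):
    """Record an order and decrement stock atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
            )
            cur = conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item)
            )
            if cur.rowcount != 1:
                raise LookupError(item)  # unknown item: abort the transaction
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False
```

If the second order would drive the stock below zero, the constraint fails and the already-inserted order row is rolled back with it. A database that limits transactions to a single record could not express this two-table update as one unit.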
It is possible to adhere to very strict definitions of ACID, and support complex transactions, while still being distributed and scalable. One example is VoltDB. VoltDB has a detailed description of how it handles transactions here: http://voltdb.com/sites/default/files/tn-transactions.pdf
If you are looking for pure ACID properties (Atomicity, Consistency, Isolation, Durability), NoSQL may not be the right solution.
If you are looking for two of the three CAP attributes (Consistency, Availability and Partition tolerance), NoSQL is the right solution.
But some NoSQL databases, such as MongoDB, come close to implementing the ACID properties (with some compromise on durability).
Refer to the article "How ACID is MongoDB" for how MongoDB implements A, C and I (but not complete D).
There are NoSQL ACID (distributed) databases, despite the CAP theorem. How is this possible? What is the relation between the CAP theorem and whether a database can be ACID?
It is impossible for a distributed computer system to simultaneously provide consistency, availability and partition tolerance.
The CAP theorem is actually a bit misleading. The idea that you can have a CA design is nonsense, because when a partition occurs you necessarily have a problem with either consistency (a data synchronization issue, for example) or availability (latency). That's why there is a more accurate formulation stating that:
During a partition in a distributed system, you must choose between consistency and availability.
Still, in practice it is not that simple. You should note that the choice between consistency and availability isn't binary. You can even have some degree of both. For example, regarding ACID, you can have atomic and durable transactions with NoSQL, but forfeit a degree of isolation and consistency for better availability. Availability can then be equated with latency, because your response time will depend on several factors (is the nearest server available?).
So, to answer your question, this is usually marketing bullshit. You need to actually scratch the surface to see what the solution is exactly gaining and forfeiting.
If you want deeper explanations you can look here, here or here.
The PACELC theorem extends CAP to talk about the tradeoffs even when partitions aren't happening. One of the exciting insights for distributed systems is that they can be made partition tolerant without losing consistency when consensus protocols such as Raft or Paxos are used to create a transaction log. The Calvin protocol combines a Raft log with deterministic transaction application.
FaunaDB implements Calvin, allowing it to maintain ACID transactions with strict-serializability, even during partitions or during replica failure, as long as a quorum of replicas is not partitioned.
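The "as long as a quorum of replicas is not partitioned" condition can be illustrated with the majority rule that consensus protocols such as Raft and Paxos rely on. This is only a sketch of that rule, not FaunaDB's or Calvin's implementation, and the function names are hypothetical.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """True if `reachable` replicas form a strict majority of `total`."""
    return reachable > total // 2

def can_commit(partition_sizes, total):
    """For each side of a network partition, report whether it holds a majority."""
    return [has_quorum(size, total) for size in partition_sizes]

# Five replicas split 3/2: only the three-node side can keep committing.
print(can_commit([3, 2], 5))  # [True, False]
# Four replicas split 2/2: neither side has a majority, so writes stop.
print(can_commit([2, 2], 4))  # [False, False]
```

Because two disjoint groups can never both be strict majorities, at most one side of a partition keeps accepting writes. That is how consistency is preserved, at the cost of availability on the minority side.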
From the NimbusDB website:
Our distributed non-blocking atomic commit protocol allows database transaction processing at any available node.
They claim that they can guarantee ACID transactions in a distributed environment, and provide all of: consistency, high availability and partition tolerance. As far as I can tell from the text, their "secret" for overcoming the limitations of CAP theorem is some sort of "predictable and consistent" way to manage network partitions.
I'm wondering if anyone has some insights or more information on what's behind this?
There are multiple possible meanings for the word "consistency". See, e.g., Why is C in CAP theorem not same as C in ACID? .
Plus, some level of debate is also possible as to the meaning of the C in 'ACID': while it is typically defined in a sense that relates to database integrity ("no transaction shall get to see a database state that violates a declared constraint - modulo the inconsistencies that that transaction has created itself of course"), one commenter said he interpreted it as referring to "the database state as seen (or perhaps better, as effectively used) by any transaction does not change while that transaction is in progress". Paraphrased: transactions are ACID-compliant if they are executing in at least repeatable read mode.
If you take the CAP-C to mean "all nodes see the same data at the same time", then availability is necessarily hampered because while the system is busy distributing the data to the various nodes, it cannot allow any transaction access to (the older versions of) that data. (Unless of course access to older versions is precisely what is needed, such as when a transaction is running under MVCC.)
If you take the CAP-C to mean something along the lines of "no transaction can get to see an inconsistent database state", then essentially the same applies, except that it is now the user's update process that should be locking out access for all other transactions.
If you impose a rule to the effect that "whenever a transaction has accessed a particular node N to read from some resource R (assuming R could theoretically be accessed on more than one node), then whenever that transaction accesses R again, it should do so on the same node N", then I can imagine this will increase your guarantee of "consistency", but you pay in availability: if node N fails, then precisely because of the rule imposed, your transaction cannot access R anymore, even if R could be reached on other nodes.
At any rate, I think that if an institution such as Berkeley comes up with a proof of some theorem, then you're on the safe side if you consider vociferous claims such as the one you mention, as marketing lies.
It's been a while since this post was written and since then NuoDB has added a lot to their product marketing and technical resources on their website.
They've achieved data durability and ACID compliance by using their Distributed Data Cache System. They now call it an "Emergent Architecture" (p.6-7):
The architecture opens a variety of possible future directions including “time-travel”, the ability to create a copy of the database that recreates its state at an earlier time; “cloud bursting”, the ability to move a database across cloud systems managed by separate groups; and “coteries”, a mechanism that addresses the CAP Theorem by allowing the DBA to specify which systems survive a network partition to provide consistency and partition resistance with continuous availability.
From the How It Works page :
Today’s database vendors have applied three common design patterns around traditional systems to extend them into distributed scale-out database systems. These approaches – Shared-Disk, Shared-Nothing and Synchronous Commit - overcome some of the limitations of single-server deployments, but remain complex and prone to error.
By stepping back and rethinking database design from the ground up, Jim Starkey, NuoDB’s technical founder, has come up with an entirely new design approach called Durable Distributed Cache (DDC). The net effect is a system that scales-out/in dynamically on commodity machines and virtual machines, has no single point of failure, and delivers full ACID transactional semantics.
The primary architectural difference between NuoDB's NewSQL model and that of more traditional RDBMS systems is that NuoDB inverts the traditional relationship between memory and storage, creating an ACID-compliant RDBMS with an underlying design similar to that of a distributed DRAM cache. From the NuoDB Durable Distributed Cache page:
All general-purpose relational databases to date have been architected around a storage-centric assumption. Unfortunately this creates a fundamental problem relative to scaling out. In effect, these database systems are fancy file systems that arrange for concurrent read/write access to disk-based files such that users do not interfere with each other.
The NuoDB DDC architecture inverts this idea, imagining the database as a set of in-memory container objects that can overflow to disk if necessary and can be retained in backing stores for durability purposes.
All servers in the NuoDB DDC architecture can request and supply objects (referred to as Atoms) thereby acting as peers to each other. Some servers have a subset of the objects at any given time, and can therefore only supply a subset of the database to other servers. Other servers have all the objects and can supply any of them, but will be slower to supply objects that are not resident in memory.
NuoDB consists of two types of servers: Transaction Engines (TEs) hold a subset of the objects; Storage Managers (SMs) are servers that have a complete copy of all objects. TEs are pure in-memory servers that do not need to use disks. They are autonomous and can unilaterally load and eject objects from memory according to their needs. Unlike TEs, SMs can’t just drop objects on the floor when they are finished with them; instead they must ensure that they are safely placed in durable storage.
For those familiar with caching architectures, you might have already recognized that these TEs are in effect a distributed DRAM cache, and the SMs are specialized TEs that ensure durability. Hence the name Durable Distributed Cache.
They also publish a technical white paper that deep-dives into the sub-system components and the way they work together to provide an ACID-compliant RDBMS with most of the performance of a NoSQL system (NOTE: registration on their site is required to download the white paper). The general gist is that they provide an automated network cluster partitioning system that, when combined with their persistent storage system, addresses the concerns of the CAP Theorem.
There are also a lot of informative technical white papers and independent analysis reports on their technology in their Online Documents Library.
I've heard about two kinds of database architectures.
master-master
master-slave
Isn't master-master more suitable for today's web, because it's like Git: every unit has the whole set of data, and if one goes down, it doesn't quite matter?
Master-slave reminds me of SVN (which I don't like), where you have one central unit that handles things.
Questions:
What are the pros and cons of each?
If you want to have a local database in your mobile phone like iPhone, which one is more appropriate?
Is the choice of one of these a critical factor to consider thoroughly?
While researching the various database architectures myself, I compiled a good bit of information that might be relevant to someone else researching this in the future. I came across:
Master-Slave Replication
Master-Master Replication
MySQL Cluster
I have decided to settle on MySQL Cluster for my use case. However, please see below for the various pros and cons that I have compiled.
1. Master-Slave Replication
Pros
Analytic applications can read from the slave(s) without impacting the master
Backups of the entire database have relatively little impact on the master
Slaves can be taken offline and sync back to the master without any downtime
Cons
In the event of a failure, a slave has to be promoted to master to take its place; there is no automatic failover
Downtime and possibly loss of data when a master fails
All writes also have to be made to the master in a master-slave design
Each additional slave adds some load to the master, since the binary log has to be read and the data copied to each slave
Application might have to be restarted
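The read/write split and the manual promotion step described above can be sketched as a tiny router. This is an illustrative toy, not MySQL's replication API: the `ReplicatedRouter` class and the node names are hypothetical, and treating anything that starts with SELECT as a read is a simplifying assumption.

```python
import itertools

class ReplicatedRouter:
    """Send writes to the single master and spread reads over the slaves."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._reset_cycle()

    def _reset_cycle(self):
        # Round-robin iterator over the current slaves (None if there are none).
        self._next = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, statement: str) -> str:
        """Return the node that should execute `statement`."""
        is_read = statement.lstrip().upper().startswith("SELECT")
        if is_read and self._next is not None:
            return next(self._next)
        return self.master  # all writes go to the master

    def promote(self, replica):
        """Manual failover: make `replica` the new master."""
        self.replicas.remove(replica)
        self.master = replica
        self._reset_cycle()

router = ReplicatedRouter("master", ["slave1", "slave2"])
print(router.route("SELECT * FROM t"))         # slave1
print(router.route("INSERT INTO t VALUES 1"))  # master
router.promote("slave1")                       # the old master has failed
print(router.route("UPDATE t SET x = 1"))      # slave1
```

Every write funnels through `self.master`, which is why the master is both the bottleneck and the single point of failure until someone calls `promote`.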
2. Master-Master Replication
Pros
Applications can read from both masters
Distributes write load across both master nodes
Simple, automatic and quick failover
Cons
Loosely consistent
Not as simple as master-slave to configure and deploy
3. MySQL Cluster
The new kid in town, based on the MySQL Cluster design. MySQL Cluster was developed with high availability and scalability in mind and is the ideal solution for environments that require no downtime, high availability and horizontal scalability.
See MySQL Cluster 101 for more information
Pros
(High Availability) No single point of failure
Very high throughput
99.99% uptime
Auto-Sharding
Real-Time Responsiveness
On-Line Operations (Schema changes etc)
Distributed writes
Cons
See known limitations
You can visit my blog for a full breakdown, including architecture diagrams, that goes into further detail about the three architectures mentioned.
We're trading off availability, consistency and complexity. To address the last question first: Does this matter? Yes, very much! The choices concerning how your data is to be managed are absolutely fundamental, and there's no "Best Practice" dodging the decisions. You need to understand your particular requirements.
There's a fundamental tension:
One copy: consistency is easy, but if it happens to be down everybody is dead in the water, and if people are remote they may pay horrid communication costs. Bring portable devices, which may need to operate disconnected, into the picture and one copy won't cut it.
Master-Slave: consistency is not too difficult because each piece of data has exactly one owning master. But what do you do if you can't see that master? Some kind of postponed work is needed.
Master-Master: well if you can make it work then it seems to offer everything, no single point of failure, everyone can work all the time. The trouble with this is that it is very hard to preserve absolute consistency. See the wikipedia article for more.
Wikipedia seems to have a nice summary of the advantages and disadvantages
Advantages
If one master fails, other masters will continue to update the database.
Masters can be located in several physical sites, i.e. distributed across the network.
Disadvantages
Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating ACID properties.
Eager replication systems are complex and introduce some communication latency.
Issues such as conflict resolution can become intractable as the number of nodes involved rises and the required latency decreases.
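To make "loosely consistent" concrete, here is a minimal sketch of one common conflict-resolution policy for lazy multi-master replication, last-writer-wins. It is an assumption for illustration, not something the quoted article prescribes; the `lww_merge` function and the timestamps are hypothetical.

```python
def lww_merge(a, b):
    """Merge two replicas' key -> (timestamp, value) maps, keeping the newer write.

    On equal timestamps the value from `a` is kept.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Both masters start from balance = 100 and accept a withdrawal independently.
master_a = {"balance": (1, 100 - 30)}  # withdrew 30 at time 1
master_b = {"balance": (2, 100 - 50)}  # withdrew 50 at time 2

print(lww_merge(master_a, master_b))  # {'balance': (2, 50)}
```

Both replicas converge on 50, but a serial execution of the two withdrawals would have left 20. The withdrawal of 30 is silently lost, which is the kind of ACID violation the disadvantages above refer to.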
Is there any NoSQL data store that is ACID compliant?
I'll post this as an answer purely to support the conversation - Tim Mahy, nawroth, and CraigTP have suggested viable databases. CouchDB would be my preference due to its use of Erlang, but there are others out there.
I'd say ACID does not contradict or negate the concept of NoSQL... While there seems to be a trend following the opinion expressed by dove , I would argue the concepts are distinct.
NoSQL is fundamentally about simple key-value (e.g. Redis) or document-style schema (collected key-value pairs in a "document" model, e.g. MongoDB) as a direct alternative to the explicit schema in classical RDBMSs. It allows the developer to treat things asymmetrically, whereas traditional engines have enforced rigid same-ness across the data model. The reason this is so interesting is because it provides a different way to deal with change, and for larger data sets it provides interesting opportunities to deal with volumes and performance.
ACID provides principles governing how changes are applied to a database. In a very simplified way, it states (my own version):
(A) when you do something to change a database the change should work or fail as a whole
(C) the database should remain consistent (this is a pretty broad topic)
(I) if other things are going on at the same time they shouldn't be able to see things mid-update
(D) if the system blows up (hardware or software) the database needs to be able to pick itself back up; and if it says it finished applying an update, it needs to be certain
The conversation gets a little more excitable when it comes to the idea of propagation and constraints. Some RDBMS engines provide the ability to enforce constraints (e.g. foreign keys) which may have propagation elements (a la cascade). In simpler terms, one "thing" may have a relationship with another "thing" in the database, and if you change an attribute of one it may require the other be changed (updated, deleted, ... lots of options). NoSQL databases, being predominantly (at the moment) focused on high data volumes and high traffic, seem to be tackling the idea of distributed updates which take place within (from a consumer perspective) arbitrary time frames. This is basically a specialized form of replication managed via transaction - so I would say that if a traditional distributed database can support ACID, so can a NoSQL database.
Some resources for further reading:
Wikipedia article on ACID
Wikipedia on propagation constraints
Wikipedia (yeah, I like the site, ok?) on database normalization
Apache documentation on CouchDB with a good overview of how it applies ACID
Wikipedia on Cluster Computing
Wikipedia (again...) on database transactions
UPDATE (27 July 2012):
Link to Wikipedia article has been updated to reflect the version of the article that was current when this answer was posted. Please note that the current Wikipedia article has been extensively revised!
Well, according to an older version of a Wikipedia article on NoSQL:
NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees.
and also:
The name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees.
and
NoSQL systems often provide weak consistency guarantees such as eventual consistency and transactions restricted to single data items, even though one can impose full ACID guarantees by adding a supplementary middleware layer.
So, in a nutshell, I'd say that one of the main benefits of a "NoSQL" data store is its distinct lack of ACID properties. Furthermore, IMHO, the more one tries to implement and enforce ACID properties, the further away from the "spirit" of a "NoSQL" data store you get, and the closer to a "true" RDBMS you get (relatively speaking, of course).
However, all that said, "NoSQL" is a very vague term that is open to individual interpretation, and depends heavily upon just how much of a purist viewpoint you have. For example, most modern-day RDBMS systems don't actually adhere to all of Edgar F. Codd's 12 rules for his relational model!
Taking a pragmatic approach, it would appear that Apache's CouchDB comes closest to combining ACID compliance with a loosely-coupled, non-relational "NoSQL" mentality.
Please ensure you read the Martin Fowler introduction about NoSQL databases. And the corresponding video.
First of all, we can distinguish two types of NoSQL databases:
Aggregate-oriented databases;
Graph-oriented databases (e.g. Neo4J).
By design, most Graph-oriented databases are ACID!
Then, what about the other types?
In Aggregate-oriented databases, we can put three sub-types:
Document-based NoSQL databases (e.g. MongoDB, CouchDB);
Key/Value NoSQL databases (e.g. Redis);
Column family NoSQL databases (e.g. HBase, Cassandra).
What we call an Aggregate here is what Eric Evans defined in his Domain-Driven Design as a self-sufficient cluster of Entities and Value Objects in a given Bounded Context.
As a consequence, an aggregate is a collection of data that we interact with as a unit. Aggregates form the boundaries for ACID operations with the database. (Martin Fowler)
So, at Aggregate level, we can say that most NoSQL databases can be as safe as an ACID RDBMS, with the proper settings. Of course, if you tune your server for the best speed, you may end up with something non-ACID. But replication will help.
My main point is that you have to use NoSQL databases as they are, not as a (cheap) alternative to an RDBMS. I have seen too many projects abusing relations between documents. This can't be ACID. If you stay at document level, i.e. at Aggregate boundaries, you do not need any transactions. And your data will be as safe as with an ACID database, even if it is not truly ACID, since you do not need those transactions! If you need transactions and update several "documents" at once, you are not in the NoSQL world any more - so use an RDBMS engine instead!
2019 update: Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets.
In this question someone must mention OrientDB:
OrientDB is one of the few NoSQL databases that fully support ACID transactions. ACID is not only for RDBMSs, because it is not part of relational algebra. So it IS possible to have a NoSQL database that supports ACID.
This feature is the one I miss the most in MongoDB
FoundationDB is ACID compliant:
http://www.foundationdb.com/
It has proper transactions, so you can update multiple disparate data items in an ACID fashion. This is used as the foundation for maintaining indexes at a higher layer.
ACID and NoSQL are completely orthogonal. One does not imply the other.
I have a notebook on my desk, I use it to keep notes on things that I still have to do. This notebook is a NoSQL database. I query it using a linear search with a "page cache" so I don't always have to search every page. It is also ACID compliant as I ensure that I only write one thing at a time and never while I am reading it.
NoSQL simply means that it isn't SQL. Many people get confused and think it means highly-scaleable-wild-west-super-fast-storage. It doesn't. It doesn't mean key-value store, or eventual consistency. All it means is "not SQL"; there are a lot of databases on this planet and most of them are not SQL[citation needed].
You can find many examples in the other answers so I need not list them here, but there are non-SQL databases with ACID compliance for various operations; some are only ACID for single-object writes while others guarantee far more. Each database is different.
"NoSQL" is not a well-defined term. It's a very vague concept. As such, it's not even possible to say what is and what is not a "NoSQL" product. Not nearly all of the products typically branded with the label are key-value stores.
As one of the originators of NoSQL (I was an early contributor to Apache CouchDB, and a speaker at the first NoSQL event held at CBS Interactive / CNET in 2009) I'm excited to see new algorithms create possibilities that didn't exist before. The Calvin protocol offers a new way to think of physical constraints like CAP and PACELC.
Instead of active/passive async replication, or active/active synchronous replication, Calvin preserves correctness and availability during replica outages by using a Raft-like protocol to maintain a transaction log. Additionally, transactions are processed deterministically at each replica, removing the potential for deadlocks, so agreement is achieved with only a single round of consensus. This makes it fast even on multi-cloud worldwide deployments.
FaunaDB is the only database implementation using the Calvin protocol, making it uniquely suited for workloads that require mainframe-like data integrity with NoSQL scale and flexibility.
Yes, MarkLogic Server is a NoSQL solution (a document database, as I like to call it) that works with ACID transactions.
The grandfather of NoSQL: ZODB is ACID compliant. http://www.zodb.org/
However, it's Python only.
If you are looking for an ACID compliant key/value store, there's Berkeley DB. Among graph databases at least Neo4j and HyperGraphDB offer ACID transactions (HyperGraphDB actually uses Berkeley DB for low-level storage at the moment).
FoundationDB was mentioned, and at the time it wasn't open source. It was open-sourced by Apple two days ago:
https://www.foundationdb.org/blog/foundationdb-is-open-source/
I believe it is ACID compliant.
MongoDB announced that its 4.0 version will be ACID compliant for multi-document transactions.
Version 4.2 is supposed to support them in sharded setups.
https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb
NewSQL
Wikipedia contributors define this concept as:
[…] a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.[1][2][3]
References
[1] Nancy Lynch and Seth Gilbert, “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59.
[2] "Brewer's CAP Theorem", julianbrowne.com, Retrieved 02-Mar-2010
[3] "Brewers CAP theorem on distributed systems", royans.net
Take a look at the CAP theorem.
EDIT: RavenDB seems to be ACID compliant
To add to the list of alternatives, another fully ACID compliant NoSQL database is GT.M.
Hyperdex Warp http://hyperdex.org/warp/
Warp (ACID feature) is proprietary, but Hyperdex is free.
db4o
Unlike roll-your-own persistence or serialization, db4o is ACID transaction safe and allows for querying, replication and schema changes during runtime
http://www.db4o.com/about/productinformation/db4o/
BergDB is a light-weight, open-source, NoSQL database designed from the start to run ACID transactions. Actually, BergDB is "more" ACID than most SQL databases in the sense that the only way to change the state of the database is to run ACID transactions with the highest isolation level (SQL term: "serializable"). There will never be any issues with dirty reads, non-repeatable reads, or phantom reads.
In my opinion, the database is still highly performant; but don't trust me, I created the software. Try it yourself instead.
Tarantool is a fully ACID NoSQL database. You can issue CRUD operations or stored procedures, and everything will be run in strict accordance with the ACID properties. You can also read about that here: http://stable.tarantool.org/doc/mpage/data-and-persistence.html
MarkLogic is also ACID compliant. I think it is one of the biggest players now.
The wait is over.
An ACID-compliant NoSQL DB is out: have a look at Citrusleaf.
A lot of modern NoSQL solutions don't support ACID transactions (atomic, isolated multi-key updates), but most of them support primitives that allow you to implement transactions at the application level.
If a data store supports per-key linearizability and compare-and-set (document-level atomicity), then that is enough to implement client-side transactions. Moreover, you have several options to choose from:
If you need the Serializable isolation level, then you can follow the same algorithm that Google uses for the Percolator system or Cockroach Labs for CockroachDB. I've blogged about it and created a step-by-step visualization; I hope it will help you to understand the main idea behind the algorithm.
If you expect high contention but it's fine for you to have the Read Committed isolation level, then please take a look at the RAMP transactions by Peter Bailis.
The third approach is to use compensating transactions, also known as the saga pattern. It was described in the late 80s in the Sagas paper but became more relevant with the rise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.
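The compensating-transaction idea can be sketched in a few lines. This is a simplified, in-process illustration under the assumption that every step comes with an undo action; the `run_saga` helper and the step names are hypothetical, and a real saga would also need durable state and retries.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, in reverse order, and return False. Return True otherwise.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True

log = []

def charge_card():
    raise RuntimeError("payment declined")

ok = run_saga([
    (lambda: log.append("reserve seat"), lambda: log.append("release seat")),
    (charge_card, lambda: log.append("refund card")),
])
print(ok, log)  # False ['reserve seat', 'release seat']
```

Note that a saga gives atomicity only in the "eventually undone" sense: between the reservation and its release, other clients can observe the intermediate state, so isolation is not provided.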
The list of data stores suitable for client-side transactions includes Cassandra with lightweight transactions, Riak with consistent buckets, RethinkDB, ZooKeeper, etcd, HBase, DynamoDB, MongoDB and others.
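The compare-and-set primitive that these stores expose can be sketched with a toy in-memory store. The `VersionedStore` class below is a hypothetical stand-in, not the API of any of the products listed, and it shows only the single-key building block on which multi-key protocols are layered.

```python
import threading

class VersionedStore:
    """Toy in-memory stand-in for a store offering per-key compare-and-set."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        """Return (version, value); a missing key reads as (0, None)."""
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write `value` only if the key is still at `expected_version`."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # someone else wrote first
            self._data[key] = (version + 1, value)
            return True

def update(store, key, fn, retries=10):
    """Optimistic read-modify-write: retry until the CAS succeeds."""
    for _ in range(retries):
        version, value = store.get(key)
        if store.compare_and_set(key, version, fn(value)):
            return True
    return False

store = VersionedStore()
store.compare_and_set("balance", 0, 100)
update(store, "balance", lambda v: v - 30)
print(store.get("balance"))  # (2, 70)
```

A writer holding a stale version is rejected rather than silently overwriting a newer value, which is what makes lost updates on a single key impossible.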
YugaByte DB supports ACID-compliant distributed transactions as well as Redis and CQL API compatibility on the query layer.
Google Cloud Datastore is a NoSQL database that supports ACID transactions
DynamoDB is a NoSQL database and has ACID transactions.
VoltDB is an entrant which claims ACID compliance, and while it still uses SQL, its goals are the same in terms of scalability
Whilst it's only an embedded engine and not a server, leveldb has WriteBatch and the ability to turn on Synchronous writes to provide ACID behaviour.
Node levelUP is transactional and built on leveldb https://github.com/rvagg/node-levelup#batch
If you add enough pure water and successfully flip a coin, anything can become acidic. Or basic for that matter.
To say a database is ACID compliant means four specific things. And in defining the system (restricting the range) we can arbitrarily water down the meanings so that the result is ACID compliance.
A: if your NoSQL database only allows one record operation at a time and records either go or they don't, then that's atomic.
C: if the only constraints you allow are simple, like checking JSON documents against a known schema, then that's consistent.
I: if just append-only transactions are supported (and schema changes are disallowed), then it is impossible for anything to depend on anything else; that's independent.
D: if you turn off all machines at night and synchronize disks, then the transactions will be in or they won't; that's durable.