I have read a number of articles and got confused.
Opinion 1:
2PC is very efficient: a minimal number of messages are exchanged and latency is low.
Source:
http://highscalability.com/paper-consensus-protocols-two-phase-commit
Opinion 2:
It is very hard to scale distributed transactions to a high level; moreover, they reduce throughput. Because 2PC guarantees ACID, it puts a great burden on the system due to its complex coordination algorithm.
Source: http://ivoroshilin.com/2014/03/18/distributed-transactions-and-scalability-issues-in-large-scale-distributed-systems/
Opinion 3:
“some authors have claimed that two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.”
Source: http://courses.cs.washington.edu/courses/csep552/13sp/lectures/6/spanner.pdf
Opinion 4:
The 2PC coordinator also represents a Single Point of Failure, which is unacceptable for critical systems - I believe the single point of failure here is the coordinator.
Source: http://www.addsimplicity.com/adding_simplicity_an_engi/2006/12/2pc_or_not_2pc_.html
The first three opinions contradict each other. The fourth one, I think, is correct. Please clarify what is wrong and what is correct. It would also be great to give facts supporting why that is.
The 4th statement is correct, but maybe not in the way you are reading it. In 2PC, if the coordinator fails, the system cannot make progress. It is therefore often desirable to use a fault-tolerant protocol like Paxos (see Gray and Lamport, for example), which will allow the system to safely make progress when there are failures.
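To make that blocking window concrete, here is a toy sketch of the participant side of 2PC (the class and state names are my own, not from any real implementation; there is no networking, durable logging or timeout handling):

# Toy sketch of a 2PC participant's state machine, only to show where the blocking happens.
class TwoPCParticipant:
    def __init__(self):
        self.state = "INIT"

    def on_prepare(self):
        # Phase 1: the participant votes yes and is now bound by that promise.
        self.state = "PREPARED"
        return "VOTE_YES"

    def on_decision(self, decision):
        # Phase 2: only the coordinator's decision releases the participant.
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"

p = TwoPCParticipant()
p.on_prepare()
# If the coordinator crashes here, p is stuck in "PREPARED": it cannot
# unilaterally commit (another participant may have voted no) and it cannot
# unilaterally abort (the coordinator may already have decided to commit).
# Replacing the single coordinator with a replicated one (2PC over Paxos,
# as in Gray and Lamport) is what removes this blocking window.
print(p.state)   # PREPARED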
Opinion 3 should be read in the context of the rest of the Spanner paper. The authors are saying that they have developed a system which allows efficient transactions in a distributed database, and that they think it's the right default tradeoff for users of the system. The way Spanner does that is well detailed in the paper, and it is worth reading. Take note that Spanner is simply a way (a clever way, granted) of organizing the coordination which is inherently required to implement serializable transactions. See Gilbert and Lynch for one way to look at the limits on coordination.
Opinion 2 is a common belief, and there are indeed tradeoffs between availability and richness of transaction semantics in real-world distributed systems. Current research, however, is making it clear that these tradeoffs are not as dire as they have been portrayed in the past. See this talk by Peter Bailis for one of the research directions. If you want true serializability or linearizability in the strictest sense, you need to obey certain lower bounds of coordination in order to achieve them.
Opinion 1 is technically true, but not very helpful in the way you quoted it. 2PC is optimal in some sense, but seldom implemented naively because of the availability tradeoffs. Many ad hoc attempts to address these tradeoffs lead to incorrect protocols. Others, like Paxos and Raft, successfully address them at the cost of some complexity.
Related
Background:
I know of quite a few large-scale systems, especially in the e-commerce domain, where distributed transactions are used along with eventual consistency.
Question:
Is it possible to have a distributed transaction (over two networked resources) with strong consistency guarantees?
I keep hearing/reading about it in theory (using 2 phase commits), but have never had a chance to come across one such system.
Or is it not possible to achieve at all? Any insights/relevant articles appreciated.
Right away I can suggest at least two modern distributed databases that fit your requirements: TiKV and CockroachDB. Both of them are CP systems (in terms of the CAP theorem), both support ACID, and both use two-phase commit for distributed transactions. It is also possible to set up two-phase commits within PostgreSQL. And I believe there are many more databases that support distributed transactions while preserving strong consistency guarantees.
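For the PostgreSQL case, here is a rough sketch of what driving its built-in two-phase commit from application code could look like, using psycopg2's DB-API two-phase-commit helpers (the connection strings, table and transaction ids are placeholders of mine, and both servers need max_prepared_transactions > 0 for PREPARE TRANSACTION to be allowed):

import psycopg2

def transfer(amount):
    # Placeholder DSNs and schema; error handling is reduced to the bare minimum.
    src = psycopg2.connect("dbname=bank_a host=db-a.example")
    dst = psycopg2.connect("dbname=bank_b host=db-b.example")
    xid_src = src.xid(0, "xfer-42", "bank_a")   # same global id, different branch qualifiers
    xid_dst = dst.xid(0, "xfer-42", "bank_b")
    try:
        # Phase 1: do the work on both servers; tpc_prepare() issues
        # PREPARE TRANSACTION on each of them.
        src.tpc_begin(xid_src)
        src.cursor().execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = 1", (amount,))
        src.tpc_prepare()

        dst.tpc_begin(xid_dst)
        dst.cursor().execute(
            "UPDATE accounts SET balance = balance + %s WHERE id = 1", (amount,))
        dst.tpc_prepare()
    except Exception:
        # Abort path: something failed before both sides were prepared.
        for conn in (src, dst):
            try:
                conn.tpc_rollback()
            except Exception:
                pass
        raise
    # Phase 2: both sides promised to commit, so COMMIT PREPARED on each.
    src.tpc_commit()
    dst.tpc_commit()

Note that the classic 2PC caveat still applies: if this coordinator process dies between the prepares and the commits, the prepared transactions linger on the servers (visible in pg_prepared_xacts), holding locks until someone resolves them.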
As far as I know, there are not many options for distributed, strongly consistent database design: you can use either two-phase commit (or one of its variations like three-phase commit), or a distributed consensus algorithm like Raft. I would suggest reading the comprehensive guide by Martin Kleppmann.
As I understand it, in a distributed system we are supposed to handle network partition failures, which is solved by keeping multiple copies of the same data.
Is this the only place where we use a consensus algorithm?
What is the difference between 2PC/3PC/Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC/3PC also a kind of consensus algorithm?)
Network partitions are not "solved" by having many copies of the same data, although redundancy is of course essential to deal with any sort of failure :)
There are many other problems regarding network partitions. Generally, to increase tolerance to network partitions you use an algorithm that relies on a quorum rather than total communication: in the quorum approach you can still make progress on one side of the partition as long as a majority of the nodes is reachable, e.g. f+1 nodes out of 2f+1. Paxos, for example, uses the quorum approach. It is quite clear that a protocol like 2PC cannot make progress under any kind of network partition, since it requires "votes" from all of the nodes.
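A toy illustration of that difference in progress conditions (the function names and the little scenario are made up; this is not a protocol implementation):

# Contrast the progress condition of a quorum-based protocol (Paxos-style)
# with the unanimous-vote condition of 2PC. `votes` is the set of nodes
# that replied before the partition.

def quorum_can_proceed(votes, total_nodes):
    # Majority quorum: with 2f+1 nodes, any f+1 replies are enough,
    # so up to f nodes may be crashed or partitioned away.
    return len(votes) >= total_nodes // 2 + 1

def two_pc_can_commit(votes, total_nodes):
    # 2PC needs a "yes" from every participant; one unreachable node blocks it.
    return len(votes) == total_nodes

nodes = 5
replies = {"n1", "n2", "n3"}               # n4 and n5 are on the other side of the partition
print(quorum_can_proceed(replies, nodes))  # True  -> the majority side keeps working
print(two_pc_can_commit(replies, nodes))   # False -> 2PC is stuck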
What is the difference between 2PC/3PC/Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC/3PC also a kind of consensus algorithm?)
2PC/3PC/Paxos are all variants of consensus protocols, although 2PC and 3PC are often described as handling the more specific scenario of "atomic commit in a distributed system", which is essentially a consensus problem. 2PC, 3PC and Paxos are similar but different. You can easily find detailed information about each algorithm on the web.
Is this the only place where we use a consensus algorithm?
Consensus protocols have many use cases in distributed systems, for example: atomic commit, atomic broadcast, leader election, or basically any algorithm that requires a set of processes to agree upon some value or action.
Caveat: consensus protocols and the related problems of distributed systems are not trivial, and you'll need to do some reading to get a deep understanding. If you are comfortable reading academic papers, you can find most of the famous ones available online, for example "Paxos Made Simple" by Leslie Lamport, or you can find good blog posts by googling. Also, the Wikipedia article on Paxos is of very good quality in my opinion!
Hope that answers some of your questions, although I probably introduced you to even more! (If you're interested, you have some research to do.)
There are NoSQL ACID (distributed) databases, despite the CAP theorem. How is this possible? What's the relation between the CAP theorem and whether or not a database can be ACID?
It is impossible for a distributed computer system to simultaneously provide consistency, availability and partition tolerance.
The CAP theorem is actually a bit misleading. The idea that you can have a CA design is nonsense, because when a partition occurs you necessarily have a problem regarding either consistency (a data synchronization issue, for example) or availability (latency). That's why there is a more accurate formulation stating that:
During a partition in a distributed system, you must choose between consistency and availability.
Still, in practice it is not that simple. You should note that the choice between consistency and availability isn't binary; you can even have some degree of both. For example, regarding ACID, you can have atomic and durable transactions with NoSQL, but forfeit a degree of isolation and consistency for better availability. Availability can then be thought of in terms of latency, because your response time will depend on several factors (is the nearest server available?).
So, to answer your question, this is usually marketing bullshit. You need to actually scratch the surface to see exactly what the solution gains and forfeits.
If you want deeper explanations you can look here, here or here.
The PACELC theorem extends CAP to cover the tradeoffs even when partitions aren't happening. One of the exciting insights in distributed systems is that they can be made partition tolerant without losing consistency when consensus protocols such as Raft or Paxos are used to create a transaction log. The Calvin protocol combines a Raft log with deterministic transaction application.
FaunaDB implements Calvin, allowing it to maintain ACID transactions with strict-serializability, even during partitions or during replica failure, as long as a quorum of replicas is not partitioned.
Recently I read some articles online indicating that relational databases have scaling issues and are not good to use when it comes to big data, especially in cloud computing where the data is big. But by googling I could not find good, solid reasons for why they don't scale well. Can you please explain to me the limitations of relational databases when it comes to scalability?
Thanks.
Imagine two different kinds of crossroads.
One has traffic lights or police officers regulating traffic, motion on the crossroad is at limited speed, and there's a watchdog registering precisely what car drove on the crossroad at what time precisely, and what direction it went.
The other has none of that and everyone who arrives at the crossroad at whatever speed he's driving, just dives in and wants to get through as quick as possible.
The former is any traditional database engine. The crossroad is the data itself. The cars are the transactions that want to access the data. The traffic lights or police officer is the DBMS. The watchdog keeps the logs and journals.
The latter is a NOACID type of engine.
Both have a saturation point, at which point arriving cars are forced to start queueing up at the entry points. Both have a maximal throughput. That threshold lies at a lower value for the former type of crossroad, and the reason should be obvious.
The advantage of the former type of crossroad should however also be obvious. Way less opportunity for accidents to happen. On the second type of crossroad, you can expect accidents not to happen only if traffic density is at a much much lower point than the theoretical maximal throughput of the crossroad. And in translation to data management engines, it translates to a guarantee of consistent and coherent results, which only the former type of crossroad (the classical database engine, whether relational or networked or hierarchical) can deliver.
The analogy can be stretched further. Imagine what happens if an accident DOES happen. On the second type of crossroad, the primary concern will probably be to clear the road as quick as possible, so traffic can resume, and when that is done, what info is still available to investigate who caused the accident and how ? Nothing at all. It won't be known. The crossroad is open just waiting for the next accident to happen. On the regulated crossroad, there's the police officer regulating the traffic who saw what happened and can testify. There's the logs saying which car entered at what time precisely, at which entry point precisely, at what speed precisely, a lot of material is available for inspection to determine the root cause of the accident. But of course none of that comes for free.
Colourful enough as an explanation ?
Relational databases provide solid, mature services according to the ACID properties. We get transaction handling, efficient logging to enable recovery, etc. These are core services of relational databases, and the ones that they are good at. They are hard to customize, and might be considered a bottleneck, especially if you don't need them in a given application (e.g. serving website content of low importance; in this case, for example, the widely used MySQL does not provide transaction handling with its default storage engine, and therefore does not satisfy ACID). Lots of "big data" problems don't require these strict constraints, for example web analytics, web search or processing moving-object trajectories, as they already include uncertainty by nature.
When reaching the limits of a given computer (memory, CPU, disk: the data is too big, or data processing is too complex and costly), distributing the service is a good idea. Lots of relational and NoSQL databases offer distributed storage. In this case, however, ACID turns out to be difficult to satisfy: the CAP theorem states something similar, namely that availability, consistency and partition tolerance cannot all be achieved at the same time. If we give up ACID (satisfying BASE, for example), scalability might be increased.
See this post, for example, for a categorization of storage methods according to CAP.
Another bottleneck might be the flexible and cleverly typed relational model itself with its SQL operations: in lots of cases a simpler model with simpler operations would be sufficient and more efficient (like untyped key-value stores). The common row-wise physical storage model might also be limiting: for example, it isn't optimal for data compression.
There are however fast and scalable ACID compliant relational databases, including new ones like VoltDB, as the technology of relational databases is mature, well-researched and widespread. We just have to select an appropriate solution for the given problem.
Take the simplest example: inserting a row with a generated ID. Since IDs must be unique within a table, the database must somehow lock some sort of persistent counter so that no other INSERT uses the same value. So you have two choices: either allow only one instance to write data, or have a distributed lock. Both solutions are a major bottleneck - and this is the simplest example!
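For illustration, one common way around that shared counter (my own sketch, not something described in the answer above) is to hand each node a disjoint block of IDs, so coordination happens once per block instead of once per INSERT:

import itertools
import threading

class BlockAllocator:
    """Central authority that hands out disjoint ID ranges (the only coordinated part)."""
    def __init__(self, block_size=1000):
        self._blocks = itertools.count(0)
        self._block_size = block_size
        self._lock = threading.Lock()

    def next_block(self):
        with self._lock:                      # coordination happens here, once per block
            start = next(self._blocks) * self._block_size
        return start, start + self._block_size

class NodeIdGenerator:
    """Runs on each database node; generates IDs locally from its current block."""
    def __init__(self, allocator):
        self._allocator = allocator
        self._current = iter(())              # empty, so the first call fetches a block

    def next_id(self):
        try:
            return next(self._current)
        except StopIteration:
            low, high = self._allocator.next_block()
            self._current = iter(range(low, high))
            return next(self._current)

allocator = BlockAllocator()
node_a, node_b = NodeIdGenerator(allocator), NodeIdGenerator(allocator)
print([node_a.next_id() for _ in range(3)])   # [0, 1, 2]
print([node_b.next_id() for _ in range(3)])   # [1000, 1001, 1002]

The tradeoff is that IDs are no longer strictly sequential across nodes, and blocks handed to a crashed node are lost; this is essentially how sequence caching behaves in several databases.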
Imagine you have 2 entities on different systems, and need to perform some sort of transaction that alters one or both of them based on information associated with one or both of them, and you require that either the changes to both entities complete or neither of them does.
A simple example, which essentially has to run these 2 lines on 2 separate pieces of hardware:
my_bank.my_account -= payment
their_bank.their_account += payment
Presumably there are algorithms or idioms that exist specifically for this sort of situation, working correctly (for some predictable definition of correct) in the presence of other attempted access to the same values. The two-phase commit protocol seems to be one such approach. Are there any simpler alternatives, perhaps with more limitations? (E.g. perhaps they require that no system can shut down entirely or fail to respond.) Or maybe there are more complex ones that are better in some way? And is there a standard or well-regarded text on the matter?
There's also 3PC, the "three-phase commit protocol". 3PC solves some of the issues of 2PC by adding an extra phase called pre-commit. A participant in the transaction receives a pre-commit message so it knows that all the other participants have agreed to commit, even though they have not done it yet. This phase removes the uncertainty in 2PC where participants are left waiting for either a commit or abort message from the coordinator.
AFAIK, most databases work just fine with the 2PC protocol, because in the unlikely event that it fails, they always have the transaction logs to undo/redo operations and leave the data in a consistent state.
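To tie this back to your two-account example, here is a minimal in-memory sketch of the 2PC flow (the class and function names are my own invention; there is no real networking, durable logging or coordinator recovery):

class BankParticipant:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.pending = None          # change held while in the "prepared" state

    def prepare(self, delta):
        # Phase 1: validate and record the intent (a write-ahead log entry in a
        # real system), then vote yes or no.
        if self.balance + delta < 0:
            return False             # vote "no": would overdraw the account
        self.pending = delta
        return True                  # vote "yes": promise to commit if told to

    def commit(self):
        # Phase 2a: apply the prepared change.
        self.balance += self.pending
        self.pending = None

    def abort(self):
        # Phase 2b: discard whatever was prepared.
        self.pending = None

def transfer(src, dst, amount):
    """Coordinator: ask both banks to prepare, commit only if both voted yes."""
    participants = [(src, -amount), (dst, +amount)]
    if all(p.prepare(delta) for p, delta in participants):
        for p, _ in participants:
            p.commit()
        return True
    for p, _ in participants:
        p.abort()
    return False

my_bank = BankParticipant("my_bank", 100)
their_bank = BankParticipant("their_bank", 0)
print(transfer(my_bank, their_bank, 30), my_bank.balance, their_bank.balance)   # True 70 30
print(transfer(my_bank, their_bank, 500), my_bank.balance, their_bank.balance)  # False 70 30

In a real database the prepare step forces the undo/redo log to disk, which is exactly what allows recovery to a consistent state after a crash, as mentioned above.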
Most of this stuff is very well discussed in "Database Solutions, second edition" and "Database Systems: The Complete Book".
More on the distributed side, you might want to check the current state of Web Service technology for distributed transactions and workflows. Not my cup of tea, to be honest. There are frameworks for Python, Java and .NET to run this kind of service (an example).
As my final-year project, some years ago, I implemented a distributed 2PC protocol on top of Web Services, and I was able to run transactions on two separate databases, just like the example you gave. However, I am sure today people implement this in a more RESTful approach; for instance, see here. Even though some other protocols are mentioned in these links, in the end they all end up implementing 2PC.
In summary, a 2PC protocol implementation with proper operation logs to undo/redo in case of a crash is one of the most sensible options to go for.