When exactly do you use a consensus algorithm in a distributed system? - database

As I understand it, in a distributed system we are supposed to handle network partition failures, which is solved by keeping multiple copies of the same data.
Is this the only place where we use a consensus algorithm?
What is the difference between 2PC, 3PC and Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC and 3PC also a kind of consensus algorithm?)

Network partitions are not "solved" by having many copies of the same data, although redundancy is of course essential for dealing with any sort of failure :)
There are many other problems regarding network partitions. Generally, to increase tolerance for network partitions you use an algorithm that relies on a quorum rather than on total communication. In the quorum approach you can still make progress on one side of the partition as long as f+1 nodes out of 2f+1 are reachable. Paxos, for example, uses a quorum approach. A protocol like 2PC, by contrast, cannot make progress when a partition cuts off any participant, since it requires "votes" from all of the nodes.
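To make the quorum arithmetic concrete, here is a minimal sketch. It assumes a cluster of n = 2f+1 nodes and a simple majority quorum; the function names are made up for illustration and do not come from any particular library.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest f such that n >= 2f + 1."""
    return (n - 1) // 2


def quorum_can_progress(n: int, reachable: int) -> bool:
    """A majority-based protocol (Paxos-style) needs only a majority."""
    return reachable >= majority_quorum(n)


def all_votes_can_progress(n: int, reachable: int) -> bool:
    """A protocol that needs a vote from every node (2PC-style)."""
    return reachable == n


if __name__ == "__main__":
    n = 5  # 2f + 1 with f = 2
    for reachable in range(n + 1):
        print(reachable,
              quorum_can_progress(n, reachable),
              all_votes_can_progress(n, reachable))
```

With five nodes, the side of a partition that still holds three nodes can keep going under the majority rule, while the all-votes rule stalls as soon as a single node is unreachable.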
What is the difference between 2PC, 3PC and Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC and 3PC also a kind of consensus algorithm?)
2PC, 3PC and Paxos are all variants of consensus protocols, although 2PC and 3PC are often described as handling the more specific scenario of "atomic commit in a distributed system", which is essentially a consensus problem. They differ mainly in fault tolerance: 2PC blocks if the coordinator fails, 3PC adds a pre-commit phase to avoid that blocking but is not safe under network partitions, and Paxos stays safe under partitions and makes progress whenever a majority of nodes can communicate. You can easily find detailed information about each algorithm on the web.
Is this the only place where we use a consensus algorithm?
Consensus protocols have many use cases in distributed systems, for example atomic commit, atomic broadcast, leader election, or basically any algorithm that requires a set of processes to agree upon some value or action.
Caveat: consensus protocols and the related problems of distributed systems are not trivial, and you'll need to do some reading to get a deep understanding. If you are comfortable reading academic papers, you can find most of the famous ones online, for example "Paxos Made Simple" by Leslie Lamport, or you can find good blog posts by searching the web. The Wikipedia article on Paxos is also of very good quality, in my opinion.
I hope that answers some of your questions, although I probably introduced you to even more (if you're interested, you have some research to do).

Related

Consensus algorithm for Node.js

I'm trying to implement a collaborative canvas on which many people can draw freehand or with specific shape tools.
The server has been developed in Node.js and the client with AngularJS (Angular 1), and I am pretty new to both.
I must use a consensus algorithm so that it always shows the same content to all users.
I'm in serious trouble with it since I cannot find a proper tutorial on its use. I have been looking at and studying Paxos implementations, but it seems that Raft is more widely used in practice.
Any suggestions? I would really appreciate it.
Writing a distributed system is not an easy task[1], so I'd recommend using an existing strongly consistent one instead of implementing one from scratch. The usual suspects are ZooKeeper, Consul, etcd and Atomix/Copycat. Some of them offer Node.js clients:
https://github.com/alexguan/node-zookeeper-client
https://www.npmjs.com/package/consul
https://github.com/stianeikeland/node-etcd
I've personally never used any of them with Node.js though, so I won't comment on the maturity of the clients.
If you insist on implementing consensus on your own, then Raft should be easier to understand; the paper is surprisingly accessible: https://raft.github.io/raft.pdf. There are also some Node.js implementations, but again, I haven't used them, so it is hard to recommend any particular one. The Gaggle readme contains an example, and skiff has an integration test which documents its usage.
Taking a step back, I'm not sure that distributed consensus is what you need here. It seems like you have multiple clients and a single server, so you can probably use a centralized data store. The problem domain is not really that distributed either: shapes can be overlaid one on top of the other in the order the server receives them (FIFO). Imagine multiple people writing on the same whiteboard, where the last one wins. The challenge is with concurrent modifications of existing shapes, but maybe you can fall back to "last change wins" or "first change wins" or something like that.
Another interesting avenue to explore here would be Conflict-free Replicated Data Types (CRDTs). Folks at GitHub used them to implement collaborative "pair" programming in Atom. See the Atom Teletype blog post; their implementation may also be useful, as collaborative editing seems to be exactly the problem you are trying to solve.
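To give a feel for the "last change wins" and CRDT ideas, here is a minimal sketch of a last-writer-wins map of shapes. It is an illustration only: the class and method names are made up, it is written in Python rather than Node.js for brevity, and it is not how Teletype works. Each replica stamps its writes with a (logical clock, replica id) pair, and merging keeps the entry with the larger stamp, so replicas converge regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (logical_time, replica_id); the replica id breaks ties deterministically.
Stamp = Tuple[int, str]


@dataclass
class LWWShapeMap:
    """Hypothetical state-based last-writer-wins map: shape id -> shape data."""
    replica_id: str
    clock: int = 0
    entries: Dict[str, Tuple[Stamp, str]] = field(default_factory=dict)

    def set_shape(self, shape_id: str, data: str) -> None:
        """Local write: bump the logical clock and stamp the new value."""
        self.clock += 1
        self.entries[shape_id] = ((self.clock, self.replica_id), data)

    def merge(self, other: "LWWShapeMap") -> None:
        """Keep, per shape, whichever entry carries the larger stamp."""
        for shape_id, (stamp, data) in other.entries.items():
            current = self.entries.get(shape_id)
            if current is None or stamp > current[0]:
                self.entries[shape_id] = (stamp, data)
        # Advance the clock so later local writes win over what was merged.
        self.clock = max(self.clock, other.clock)

    def view(self) -> Dict[str, str]:
        """The shapes as a user would see them, without stamps."""
        return {shape_id: data for shape_id, (_, data) in self.entries.items()}


if __name__ == "__main__":
    a, b = LWWShapeMap("a"), LWWShapeMap("b")
    a.set_shape("s1", "circle red")
    b.set_shape("s1", "circle blue")   # concurrent edit of the same shape
    b.set_shape("s2", "square")
    a.merge(b)
    b.merge(a)
    print(a.view() == b.view())
```

The price of this simplicity is that one of two concurrent edits to the same shape is silently dropped, which may or may not be acceptable for a drawing application.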
Hope this helps.
[1] Take a look at the Jepsen series https://jepsen.io/analyses, where Kyle Kingsbury tests various failure conditions of distributed data stores.
Try reading Understanding Paxos. It's geared towards software developers rather than an academic audience. For this particular application you may also be interested in the Multi-Paxos Example Application referenced by the article. It's intended to help illustrate the concepts behind the consensus algorithm, and it sounds like it's almost exactly what you need for this application. Raft and most Multi-Paxos designs tend to get bogged down with an overabundance of accumulated history, which generates a new set of problems to deal with beyond simple consistency. An initial prototype could easily send the full state of the drawing on each update and ignore the history issue entirely, which is what the example application does. Later optimizations could be made to reduce network overhead.

Two-phase commit: availability, scalability and performance issues

I have read a number of articles and got confused.
Opinion 1:
2PC is very efficient, a minimal number of messages are exchanged and latency is low.
Source:
http://highscalability.com/paper-consensus-protocols-two-phase-commit
Opinion 2:
It is very hard to scale distributed transactions to a high level; moreover, they reduce throughput. As 2PC guarantees ACID, it imposes a great burden due to its complex coordination algorithm.
Source: http://ivoroshilin.com/2014/03/18/distributed-transactions-and-scalability-issues-in-large-scale-distributed-systems/
Opinion 3:
“some authors have claimed that two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.”
Source: http://courses.cs.washington.edu/courses/csep552/13sp/lectures/6/spanner.pdf
Opinion 4:
The 2PC coordinator also represents a Single Point of Failure, which is unacceptable for critical systems - I believe it is a coordinator.
Source: http://www.addsimplicity.com/adding_simplicity_an_engi/2006/12/2pc_or_not_2pc_.html
The first three opinions contradict each other. The fourth one I think is correct. Please clarify what is wrong and what is correct. It would also be great to give the facts behind it.
The fourth statement is correct, but maybe not in the way you are reading it. In 2PC, if the coordinator fails, the system cannot make progress. It is therefore often desirable to use a fault-tolerant protocol like Paxos (see Gray and Lamport, for example), which allows the system to progress safely when there are failures.
Opinion 3 should be read in the context of the rest of the Spanner paper. The authors are saying that they have developed a system which allows efficient transactions in a distributed database, and that they think it's the right default tradeoff for users of the system. The way Spanner does that is well detailed in the paper, and it is worth reading. Take note that Spanner is simply a way (a clever way, granted) of organizing the coordination which is inherently required to implement serializable transactions. (See Gilbert and Lynch for one way to look at the limits on coordination.)
Opinion 2 is a common belief, and there are indeed tradeoffs between availability and richness of transaction semantics in real-world distributed systems. Current research, however, is making it clear that these tradeoffs are not as dire as they have been portrayed in the past. See this talk by Peter Bailis for one of the research directions. If you want true serializability or linearizability in the strictest sense, you need to obey certain lower bounds on coordination in order to achieve them.
Opinion 1 is technically true, but not very helpful in the way you quoted it. 2PC is optimal in some sense, but it is seldom implemented naively because of the availability tradeoffs. Many ad hoc attempts to address these tradeoffs lead to incorrect protocols. Others, like Paxos and Raft, successfully address them at the cost of some complexity.

When should I use Datomic?

I'm intrigued by the database service Datomic, but I'm not sure if it fits the needs of the projects I work on. When is Datomic a good choice, and when should it be avoided?
With the proviso that I haven't used Datomic in production, I thought I'd give you an answer.
Advantages
1. Datalog queries are powerful (more so than non-recursive SQL) and very expressive.
2. Queries can be written with Clojure data structures, and it's NOT a weak DSL like many SQL libraries that allow you to query with data structures.
3. It's immutable, so you get the advantages that immutability gives you in Clojure/other languages as well
   a. This also allows you to store, while saving structures, all past facts in your database—this is VERY useful for auditing & more
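To illustrate what "keeping all past facts" buys you, here is a toy sketch of an append-only fact log with an "as of" read. It is purely illustrative: the names FactLog, transact and as_of are made up here and this is not Datomic's API or storage model, just the general idea that nothing is overwritten, so any past state can be reconstructed.

```python
from typing import Any, Dict, List, NamedTuple, Tuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    tx: int  # number of the transaction that asserted this fact


class FactLog:
    """Hypothetical append-only store: facts are added, never updated in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(self, entity: str, changes: Dict[str, Any]) -> int:
        """Append one fact per changed attribute; return the transaction number."""
        self._tx += 1
        for attribute, value in changes.items():
            self._facts.append(Fact(entity, attribute, value, self._tx))
        return self._tx

    def as_of(self, entity: str, tx: int) -> Dict[str, Any]:
        """Rebuild the entity as it looked right after transaction tx."""
        state: Dict[str, Any] = {}
        for fact in self._facts:  # facts are stored in transaction order
            if fact.entity == entity and fact.tx <= tx:
                state[fact.attribute] = fact.value
        return state

    def history(self, entity: str, attribute: str) -> List[Tuple[int, Any]]:
        """Every value the attribute has ever had, useful for auditing."""
        return [(f.tx, f.value) for f in self._facts
                if f.entity == entity and f.attribute == attribute]


if __name__ == "__main__":
    log = FactLog()
    t1 = log.transact("user-1", {"name": "Ann", "email": "a@example.com"})
    t2 = log.transact("user-1", {"email": "b@example.com"})
    print(log.as_of("user-1", t1))
    print(log.as_of("user-1", t2))
    print(log.history("user-1", "email"))
```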
Disadvantages
1. It can be slow, as Datalog is just going to be slower than equivalent SQL (assuming an equivalent SQL statement can be written).
2. If you are writing a LOT, you may need to worry about the single transactor getting overwhelmed. This seems unlikely for most cases, but it's something to think about (you could do a sort of shard, though, and probably save yourself; but this isn't a DB for e.g. storing stock tick data).
3. It's a bit tricky to get up and running, it's expensive, and the licensing and price make it difficult to use a hosted instance: you'll need to handle the sysadmin work yourself instead of using something like Postgres on Heroku or Mongo at MongoHQ.
I'm sure I'm missing some on each side, and though I have 3 listed under disadvantages, I think that the advantages outweigh them in most circumstances where the disadvantages don't preclude its use. Price is probably the one that will prevent its being used in most small projects (that you expect to outlast the 1 year free trial).
Cf. this short post describing Datomic simply for some more information.
Expressivity (cf. Datalog) and immutability are awesome. It's SO much fun to work with Datomic in that regard, and you can tell it's powerful just by using it a bit.
One important thing when considering whether Datomic is the right fit for your application is to think about the shape of the data you are going to store and query. As Datomic facts are actually very similar to RDF triples (plus a first-class notion of time), it lends itself very well to modeling complex relationships (linked graph data), something which is often cumbersome with traditional SQL databases.
I found this aspect to be one of the most appealing and important for me, and it worked really well. This is of course not exclusive to Datomic, as there are many other high-quality graph databases; Neo4j deserves a mention when we are talking about JVM-based solutions.
Regarding the Datomic schema, I think it's just the right balance between flexibility and stability.
To complete the above answers, I'd like to emphasize that immutability and the ability to remember the past are not 'wizardry features' suited to a few special cases like auditing. It is an approach which has several deep benefits compared to 'mutable cells' databases (which are 99% of databases today). Stuart Halloway demonstrates this nicely in this video: the Impedance Mismatch is our fault.
In my personal opinion, this approach is fundamentally more sane conceptually. Having used it for several months, I don't see Datomic as having crazy magical sophisticated powers, but rather as a more natural paradigm without some of the big problems the others have.
Here are some features of Datomic I find valuable, most of which are enabled by immutability:
because reading is not remote, you don't have to design your queries like an expedition over the wire. In particular, you can separate concerns into several queries (e.g. find the entities which are the input to my query, then answer some business question about these entities, then fetch associated data for presenting the result)
the schema is very flexible, without sacrificing query power
it's comfortable to have your queries integrated in your application programming language
the Entity API brings you the good parts of ORMs
the query language is programmable and has primitives for abstraction and reuse (rules, predicates, database functions)
performance: writers impede only other writers, and no one impedes readers. Plus, lots of caching.
... and yes, a few superpowers like travelling to the past, speculative writes or branching reality.
Regarding when not to use Datomic, here are the current constraints and limitations I see:
you have to be on the JVM (there is also a REST API, but you lose most of the benefits IMO)
not suited for write scale, nor huge data volumes
won't be especially integrated into frameworks, e.g you won't currently find a library which generates CRUD REST endpoints from a Datomic schema
it's a commercial database
since reading happens in the application process (the 'Peer'), you have to make sure that the Peer has enough memory to hold all the data it needs to traverse in a query.
So my very vague and informal answer would be that Datomic is a good fit for most non-trivial applications whose write load is reasonable, provided you don't have a problem with the license and with being on the JVM.
As an analogy, you can ask yourself the same question for Git as compared to other version control systems which are not based on immutability.
Just to tentatively add over the other answers:
It is probably fair to say Datomic presents the best conceptual framework for a queryable data store among the current options out there, while being only partially scalable and not exceptionally performant.
I say only partially scalable because queries need to fit in the peer's RAM or fail. And not exceptionally performant, as top-notch SQL engines can optimize queries to fit in memory through sophisticated execution plans, something I've not yet seen mentioned as a feature of Datomic; Datomic's decoupling of transacting and querying might, overall, offset this.
Unlike many NoSQL engines though, transactions are a first-class citizen, which puts it on a par with RDBMSs in that key regard.
For applications where data is read more often than it is written, transactions are needed, queries always fit in memory (or memory is very cheap), and the overall size of accumulated data isn't too large, it might be a win where a commercial-only product can be afforded, for those who are willing to embrace the novel conceptual framework implied in its API.

Idioms or algorithms for distributed transactions?

Imagine you have 2 entities on different systems, and need to perform some sort of transaction that alters one or both of them based on information associated with one or both of them, and require that either changes to both entities will complete or neither of them will.
Simple example, that essentially has to run the 2 lines on 2 separate pieces of hardware:
my_bank.my_account -= payment
their_bank.their_account += payment
Presumably there are algorithms or idioms that exist specifically for this sort of situation, working correctly (for some predictable definition of correct) in the presence of other attempted access to the same values. The two-phase commit protocol seems to be one such approach. Are there any simpler alternatives, perhaps with more limitations (e.g. perhaps they require that no system can shut down entirely or fail to respond)? Or maybe there are more complex ones that are better in some way? And is there a standard or well-regarded text on the matter?
There's also the 3PC "3 Phase Commit Protocol". 3PC solves some of the issues of 2PC by having an extra phase called pre-commit. A participant in the transaction receives a pre-commit message to know that all the other participants have agreed to commit, but have not done it yet. This phase removes the uncertainty of the 2PC when all participants are waiting for either a commit or abort message from the coordinator.
AFAIK - most databases work just fine with 2PC protocol, because in the unlikely conditions that it fails, they always have the transaction logs to undo/redo operations and leave the data in a consistent state.
Most of this stuff is very well discussed in
"Database Solutions, second edition"
and
"Database Systems: The Complete Book"
In the more distributed world, you might want to check the current state of Web Service technology for distributed transactions and workflows. Not my cup of tea, to be honest. There are frameworks for Python, Java and .NET to run this kind of service (an example).
As my final-year project, some years ago, I implemented a distributed 2PC protocol on top of Web Services and I was able to run transactions on two separate databases, just like the example you gave. However, I am sure that today people implement this in a more RESTful way; for instance, see here. Even though some other protocols are mentioned in these links, in the end they all end up implementing 2PC.
In summary, a 2PC protocol implementation with proper operation logs to undo/redo in case of a crash is one of the most sensible options to go for.
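As a rough illustration of the two phases, here is a minimal in-memory sketch applied to the bank transfer from the question. It is a sketch under simplifying assumptions: the Participant class and two_phase_commit function are hypothetical names, everything runs in one process, the "log" is just a list standing in for a durable operation log, and there are no timeouts or crash recovery, which is where the real difficulty of 2PC lies.

```python
from typing import List, Optional, Tuple


class Participant:
    """Hypothetical resource manager holding one account balance."""

    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance
        self.pending: Optional[int] = None   # staged change awaiting a decision
        self.log: List[Tuple[str, int]] = []  # stand-in for a durable log

    def prepare(self, delta: int) -> bool:
        """Phase 1: vote yes only if the change could be applied."""
        if self.balance + delta < 0:
            self.log.append(("vote-no", delta))
            return False
        self.pending = delta
        self.log.append(("prepared", delta))
        return True

    def commit(self) -> None:
        """Phase 2, commit branch: apply the staged change."""
        if self.pending is not None:
            self.balance += self.pending
            self.log.append(("committed", self.pending))
            self.pending = None

    def abort(self) -> None:
        """Phase 2, abort branch: discard the staged change."""
        if self.pending is not None:
            self.log.append(("aborted", self.pending))
            self.pending = None


def two_phase_commit(changes: List[Tuple[Participant, int]]) -> bool:
    """Coordinator: commit only if every participant votes yes."""
    votes = [participant.prepare(delta) for participant, delta in changes]
    decision = all(votes)
    for participant, _ in changes:
        if decision:
            participant.commit()
        else:
            participant.abort()
    return decision


if __name__ == "__main__":
    mine = Participant("my_bank", 100)
    theirs = Participant("their_bank", 50)
    print(two_phase_commit([(mine, -30), (theirs, +30)]), mine.balance, theirs.balance)
    print(two_phase_commit([(mine, -500), (theirs, +500)]), mine.balance, theirs.balance)
```

Note that the coordinator here is exactly the single point of failure people complain about: if it stopped between the two phases, prepared participants would be left holding staged changes until it came back.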

How to design and verify distributed systems?

I've been working on a project, which is a combination of an application server and an object database, and is currently running on a single machine only. Some time ago I read a paper which describes a distributed relational database, and got some ideas on how to apply the ideas in that paper to my project, so that I could make a high-availability version of it running on a cluster using a shared-nothing architecture.
My problem is, that I don't have experience on designing distributed systems and their protocols - I did not take the advanced CS courses about distributed systems at university. So I'm worried about being able to design a protocol, which does not cause deadlock, starvation, split brain and other problems.
Question: Where can I find good material about designing distributed systems? What methods are there for verifying that a distributed protocol works right? Recommendations of books, academic articles and others are welcome.
I learned a lot by looking at what is published about really huge web-based platforms, and especially how their systems evolved over time to meet their growth.
Here are some examples I found enlightening:
eBay Architecture: Nice history of their architecture and the issues they had. Obviously they can't use a lot of caching for the auctions and bids, so their story is different in that point from many others. As of 2006, they deployed 100,000 new lines of code every two weeks - and are able to roll back an ongoing deployment if issues arise.
Paper on Google File System: Nice analysis of what they needed, how they implemented it and how it performs in production use. After reading this, I found it less scary to build parts of the infrastructure myself to meet exactly my needs, if necessary, and that such a solution can and probably should be quite simple and straight-forward. There is also a lot of interesting stuff on the net (including YouTube videos) on BigTable and MapReduce, other important parts of Google's architecture.
Inside MySpace: One of the few really huge sites built on the Microsoft stack. You can learn a lot about what not to do with your data layer.
A great start for finding many more resources on this topic is the Real Life Architectures section on the "High Scalability" web site. For example, they have a good summary of Amazon's architecture.
Learning distributed computing isn't easy. It's really a very vast field covering areas such as communication, security, reliability and concurrency, each of which would take years to master. Understanding will eventually come through a lot of reading and practical experience. You seem to have a challenging project to start with, so here's your chance :)
The two most popular books on distributed computing are, I believe:
1) Distributed Systems: Concepts and Design - George Coulouris et al.
2) Distributed Systems: Principles and Paradigms - A. S. Tanenbaum and M. Van Steen
Both these books give a very good introduction to current approaches (including communication protocols) that are being used to build successful distributed systems. I've personally used the latter mostly and I've found it to be an excellent text. If you think the reviews on Amazon aren't very good, it's because most readers compare this book to other books written by A. S. Tanenbaum (who IMO is one of the best authors in the field of Computer Science), which are quite frankly better written.
PS: I really question your need to design and verify a new protocol. If you are working with application servers and databases, what you need is probably already available.
I liked the book Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen.
At a more abstract and formal level, Communicating and Mobile Systems: The Pi-Calculus by Robin Milner gives a calculus for verifying systems. There are variants of pi-calculus for verifying protocols, such as SPI-calculus (the wikipedia page for which has disappeared since I last looked), and implementations, some of which are also verification tools.
Where can I find good material about designing distributed systems?
I have never been able to finish the famous book from Nancy Lynch. However, I find that the book from Sukumar Ghosh Distributed Systems: An Algorithmic Approach is much easier to read, and it points to the original papers if needed.
It is nevertheless true that I didn't read the books from Gerard Tel and Nicola Santoro. Perhaps they are still easier to read...
What methods are there for verifying that a distributed protocol works right?
In order to survey the possibilities (and also in order to understand the question), I think that it is useful to get an overview of the possible tools from the book Software Specification Methods.
My final decision was to learn TLA+. Why? Even if the language and tools seem better, I really decided to try TLA+ because the guy behind it is Leslie Lamport. That is, not just a prominent figure in distributed systems, but also the author of LaTeX!
You can get the TLA+ book and several examples for free.
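To give an idea of what such tools do, here is a tiny explicit-state checker sketched in Python. It is not TLA+ and not how its tooling is implemented; it only illustrates the general idea of enumerating every reachable state of a small model and testing an invariant in each one. The toy model (two processes and a lock whose check and set are separate steps) and all names are made up for this example.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[Tuple[str, str], bool]  # (program counters of both processes, lock taken?)


def _with(pcs: Tuple[str, str], i: int, value: str) -> Tuple[str, str]:
    """Copy of pcs with process i moved to a new program counter."""
    new = list(pcs)
    new[i] = value
    return (new[0], new[1])


def naive_steps(state: State) -> Iterable[State]:
    """Broken lock: checking the lock and taking it are two separate steps."""
    pcs, locked = state
    for i in (0, 1):
        if pcs[i] == "idle" and not locked:
            yield _with(pcs, i, "checked"), locked
        elif pcs[i] == "checked":
            yield _with(pcs, i, "critical"), True
        elif pcs[i] == "critical":
            yield _with(pcs, i, "idle"), False


def atomic_steps(state: State) -> Iterable[State]:
    """Fixed lock: check and take happen in one atomic step."""
    pcs, locked = state
    for i in (0, 1):
        if pcs[i] == "idle" and not locked:
            yield _with(pcs, i, "critical"), True
        elif pcs[i] == "critical":
            yield _with(pcs, i, "idle"), False


def mutual_exclusion(state: State) -> bool:
    """Invariant: at most one process is in its critical section."""
    return state[0].count("critical") <= 1


def check(initial: State,
          steps: Callable[[State], Iterable[State]],
          invariant: Callable[[State], bool]) -> Optional[State]:
    """Breadth-first search of all reachable states; returns a violation or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in steps(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    start: State = (("idle", "idle"), False)
    print("naive lock violation:", check(start, naive_steps, mutual_exclusion))
    print("atomic lock violation:", check(start, atomic_steps, mutual_exclusion))
```

The search finds the interleaving in which both processes observe the lock as free before either takes it, which is the kind of bug that is easy to miss in testing and that exhaustive exploration catches on small models.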
There are many classic papers written by Leslie Lamport (http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html) and Edsger Dijkstra (http://www.cs.utexas.edu/users/EWD/).
On the database side, a major trend is the NoSQL movement: many projects are appearing on the market, including CouchDB (couchdb.apache.org), MongoDB and Cassandra. These all promise scalability and manageability (replication, fault tolerance, high availability).
One good book is Birman's Reliable Distributed Systems, although it has its detractors.
If you want to formally verify your protocol you could look at some of the techniques in Lynch's Distributed Algorithms.
It is likely that whatever protocol you are trying to implement has been designed and analysed before. I'll just plug my own blog, which covers e.g. consensus algorithms.

Resources