What are the essential differences between SAGA and TCC for the distributed transaction problem?

I am reading about the SAGA pattern as a solution to the distributed transaction problem. Although the names of the concepts in SAGA differ from those in TCC (try-confirm-cancel), I can't find any essentially different ideas between the two patterns.
Both TCC and SAGA rely on local transactions: trying, retrying, cancelling, and committing local transactions to guarantee the overall transaction. Could somebody explain the key differences that differentiate the two patterns and make both models valuable?
I learned about SAGA from this document:
https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga

The key difference is data visibility. If you use a Saga to do a cross-bank transfer, each local transaction commits immediately, so the client may see an unexpected balance change if the transfer later fails and is compensated. TCC avoids that: the Try phase only reserves the funds, and the balance change becomes visible only at Confirm.
It is explained in detail in this article:
https://dev.to/yedf2/best-practice-for-tcc-distributed-transaction-in-go-402m
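To make the visibility difference concrete, here is a minimal in-memory sketch in TypeScript (account names and helpers are illustrative, not taken from the article):

```typescript
// A minimal sketch of the visibility difference. In a Saga, each local
// transaction commits immediately and a failure is undone by a
// compensating transaction, so other readers can observe the
// intermediate balance. In TCC, Try only reserves funds; the visible
// balance changes at Confirm.
type Account = { balance: number; frozen: number };

const accounts = new Map<string, Account>([
  ["alice@bankA", { balance: 100, frozen: 0 }],
  ["bob@bankB", { balance: 0, frozen: 0 }],
]);

// Saga style: commit each step immediately, compensate on failure.
function sagaTransfer(from: string, to: string, amount: number, fail = false): void {
  const src = accounts.get(from)!;
  src.balance -= amount; // step 1 commits: visible to every reader now
  try {
    if (fail) throw new Error("credit step failed"); // simulated remote failure
    accounts.get(to)!.balance += amount; // step 2 commits
  } catch {
    src.balance += amount; // compensating transaction; between step 1 and
                           // here a client could read the "wrong" balance
  }
}

// TCC style: Try reserves, Confirm applies, Cancel releases.
function tccTransfer(from: string, to: string, amount: number, fail = false): void {
  const src = accounts.get(from)!;
  src.frozen += amount; // Try: reserve funds, visible balance unchanged
  try {
    if (fail) throw new Error("a participant rejected Try"); // simulated failure
    src.frozen -= amount; // Confirm: only now does the transfer become visible
    src.balance -= amount;
    accounts.get(to)!.balance += amount;
  } catch {
    src.frozen -= amount; // Cancel: release the reservation; no visible
                          // balance change ever happened
  }
}
```

The point of the sketch is only where the intermediate state becomes readable; the error handling is deliberately simplified.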

I think TCC is a special kind of Saga orchestration.

Related

Consensus algorithm for Node.js

I'm trying to implement a collaborative canvas on which many people can draw freehand or with specific shape tools.
The server has been developed in Node.js and the client in AngularJS 1 (and I am pretty new to both).
I must use a consensus algorithm so that it always shows the same content to all users.
I'm seriously struggling with it, since I cannot find a proper tutorial on its use. I have been looking at and studying Paxos implementations, but it seems like Raft is much more widely used in practice.
Any suggestions? I would really appreciate it.
Writing a distributed system is not an easy task[1], so I'd recommend using an existing strongly consistent one instead of implementing one from scratch. The usual suspects are ZooKeeper, Consul, etcd, and Atomix/Copycat. Some of them offer Node.js clients:
https://github.com/alexguan/node-zookeeper-client
https://www.npmjs.com/package/consul
https://github.com/stianeikeland/node-etcd
I've personally never used any of them with Node.js though, so I won't comment on the maturity of the clients.
If you insist on implementing consensus on your own, then Raft should be easier to understand; the paper is surprisingly accessible: https://raft.github.io/raft.pdf. There are also some Node.js implementations, but again, I haven't used them, so it is hard to recommend any particular one. The Gaggle README contains an example, and Skiff has an integration test which documents its usage.
Taking a step back, I'm not sure distributed consensus is what you need here. It seems like you have multiple clients and a single server, so you can probably use a centralized data store. The problem domain is not really that distributed either: shapes can be overlaid one on top of the other in the order they are received by the server, i.e. FIFO (imagine multiple people writing on the same whiteboard; the last one wins). The challenge is with concurrent modifications of existing shapes, but maybe you can fall back to last/first change wins or something like that, as in the sketch below.
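A minimal sketch of that centralized approach (the operation type and server shape are illustrative assumptions, not a library API):

```typescript
// A single server applies drawing operations in arrival order (FIFO) and
// rebroadcasts them, so every client converges on the same canvas without
// any consensus protocol.
type ShapeOp = { shapeId: string; kind: "draw" | "move" | "delete"; payload: unknown };

const log: ShapeOp[] = [];                 // authoritative order = arrival order
const clients = new Set<(op: ShapeOp) => void>();

function handleIncoming(op: ShapeOp): void {
  log.push(op);                            // last write wins for the same shape
  for (const send of clients) send(op);    // fan out in the same order to everyone
}
```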
Another interesting avenue to explore here would be Conflict-free Replicated Data Types (CRDTs). Folks at GitHub used them to implement collaborative "pair" programming in Atom. See the Atom Teletype blog post; their implementation may also be useful, as collaborative editing seems to be exactly the problem you're trying to solve.
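For a taste of the idea, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins register (Teletype itself uses more sophisticated structures):

```typescript
// Each replica keeps (value, timestamp, replicaId). The merge function is
// commutative, associative and idempotent, so replicas converge no matter
// in which order they exchange state.
type LWW<T> = { value: T; ts: number; replica: string };

function set<T>(value: T, replica: string): LWW<T> {
  return { value, ts: Date.now(), replica };
}

function merge<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.replica > b.replica ? a : b; // deterministic tie-break
}
```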
Hope this helps.
[1] Take a look at the Jepsen series, https://jepsen.io/analyses, where Kyle Kingsbury tests various failure conditions of distributed data stores.
Try reading Understanding Paxos. It's geared towards software developers rather than an academic audience. For this particular application you may also be interested in the Multi-Paxos Example Application referenced by the article. It's intended to help illustrate the concepts behind the consensus algorithm, and it sounds like it's almost exactly what you need for this application. Raft and most Multi-Paxos designs tend to get bogged down with an overabundance of accumulated history, which generates a new set of problems to deal with beyond simple consistency. An initial prototype could easily handle sending the full state of the drawing on each update and ignore the history issue entirely, which is what the example application does. Later optimizations could be made to reduce network overhead.
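A minimal sketch of that "full state on each update" simplification (types are illustrative, not from the example application):

```typescript
// The value agreed upon by consensus is the entire drawing, so there is
// no operation history to store, compact, or replay.
type Drawing = { shapes: unknown[] };

let canvas: Drawing = { shapes: [] };

function onAgreedUpdate(newState: Drawing): void {
  canvas = newState; // replace wholesale; nothing accumulates over time
  // ...broadcast `canvas` to all connected clients here
}
```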

Accessing same db with multiple clients - Pattern?

I am having trouble finding a pattern or literature on how to build a client-server environment with multiple clients without being plagued by conflicting changes. Does there exist a pattern or some good literature on this concept?
You have to handle conflicting changes, unless you want to keep your transaction open during the whole user interaction (hint: you don't want to do that).
The concept to look for is "optimistic locking" or "optimistic concurrency control".
If you have already looked into that, you need to explain what specific problem you want to avoid, and why optimistic locking won't work.
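The core of optimistic locking fits in a few lines. A minimal sketch, assuming a version column on the table (the items schema and the execute callback are hypothetical stand-ins for your driver):

```typescript
// Optimistic locking with a version column. Assumed schema:
// items(id, data, version). `execute` is assumed to resolve to the
// number of rows the statement updated.
async function saveItem(
  execute: (sql: string, params: unknown[]) => Promise<number>,
  id: number,
  data: string,
  readVersion: number,
): Promise<void> {
  // The UPDATE succeeds only if nobody changed the row since we read it.
  const affected = await execute(
    "UPDATE items SET data = ?, version = version + 1 WHERE id = ? AND version = ?",
    [data, id, readVersion],
  );
  if (affected === 0) {
    // Someone else committed first: reload the row, re-apply the change
    // against the fresh data (or ask the user), then retry.
    throw new Error("Concurrent modification detected, please retry");
  }
}
```

No locks are held during user think time; the conflict is detected only at write time, which is exactly why the transaction can stay short.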

SQL Server Bi-Directional Transactional Replication - Is it a good use-case?

We're having a problem scaling out with SQL Server. This is largely for a few reasons: 1) poorly designed data structures, and 2) heavy lifting and business/processing logic all being done in T-SQL. This was verified by a Microsoft SQL guy from Redmond whom we hired to perform an analysis on our server. We're literally solving issues by continually increasing the command timeout, which is ridiculous and not a good long-term solution. We have since put together the following strategy and set of phases:
Phase 1: Throw hardware/software at the issue to stop the bleeding.
This includes a few different things, like a caching server, but what I'd like to ask everyone here about is specifically related to implementing bi-directional transactional replication on a new SQL Server instance. We have two use-cases for wanting to implement this:
We were thinking of running the long-running (and table/row-locking) SELECTs on this new SQL "processing box", throwing the results into a caching layer, and having the UI read them from the cache. These SELECTs generate reports and also return results on the web.
Most of the business logic is in SQL. We have some long-running queries for SELECTs, INSERTs, UPDATEs, and DELETEs which perform processing logic. The end result is really just a handful of INSERTs, UPDATEs, and DELETEs after the processing is complete (lots of cursors). The thought would be to balance the load between these two servers.
I have some questions:
Are these good use-cases for bi-directional transactional replication?
I need to ensure that this solution is going to "just work" and that I don't have to worry about conflicts. Where would conflicts arise within this solution? I have read a few articles about resetting the increment on your identity seed in order to prevent collisions, which makes sense, but how does it handle UPDATEs/DELETEs or other places where conflicts might occur?
What other issues might I run into that we need to watch out for?
Is there a better solution to this problem?
Phase 2: Rewrite the logic in .NET, where it belongs, and optimize the SQL stored procedures to perform only set-based operations, as they should.
This will obviously take a while, which is why we wanted to see if there were some preliminary steps we could take to stop the pain our users are experiencing.
Thanks.
IMHO bidirectional replication is very, very far from "it will just work". Preventing update conflicts requires exquisite planning, ensuring that all that "processing" is carefully orchestrated to never work on overlapping data. Master-master replication is one of the most difficult solutions to pull off.
Consider this: you envision a solution that provides a cheap 2x scale-out with nearly no code modification. Such a solution would be quite useful; one would expect to see it deployed everywhere. Yet it is nowhere to be seen.
I recommend you search for the many blogs and articles describing gotchas and warnings about (the much more popular) MySQL master-master deployments (e.g. If You Must Deploy Multi-Master Replication, Read This First), and judge for yourself whether the trouble is worth it.
I don't have all the details you do, but I would focus on the application. If you want to just throw money at the problem short term, I would make sure that cheap scale-up is exhausted before considering scale-out (SSD/Fusion-io drives, more RAM). Also investigate snapshot isolation level / read committed snapshot first, if locking is the main issue.

Idioms or algorithms for distributed transactions?

Imagine you have 2 entities on different systems and need to perform some sort of transaction that alters one or both of them, based on information associated with one or both of them, and require that either the changes to both entities complete or neither of them does.
Simple example, that essentially has to run the 2 lines on 2 separate pieces of hardware:
my_bank.my_account -= payment
their_bank.their_account += payment
Presumably there are algorithms or idioms that exist specifically for this sort of situation, working correctly (for some predictable definition of correct) in the presence of other attempted accesses to the same values. The two-phase commit protocol seems to be one such approach. Are there any simpler alternatives, perhaps with more limitations? (E.g. perhaps they require that no system can shut down entirely or fail to respond.) Or maybe there are more complex ones that are better in some way? And is there a standard or well-regarded text on the matter?
There's also 3PC, the "3 Phase Commit Protocol". 3PC solves some of the issues of 2PC by having an extra phase called pre-commit. A participant in the transaction receives a pre-commit message to know that all the other participants have agreed to commit but have not yet done so. This phase removes the uncertainty of 2PC when all participants are waiting for either a commit or abort message from the coordinator.
AFAIK, most databases work just fine with the 2PC protocol, because in the unlikely conditions where it fails, they always have the transaction logs to undo/redo operations and leave the data in a consistent state.
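For reference, the 2PC coordinator logic is short. A minimal sketch over an abstract participant interface (the interface is an assumption; a real implementation would also persist a durable decision record between the two phases so a coordinator crash can be recovered):

```typescript
// Two-phase commit coordinator over abstract participants.
interface Participant {
  prepare(): Promise<boolean>; // phase 1: vote commit/abort, hold locks
  commit(): Promise<void>;     // phase 2a: make the change permanent
  abort(): Promise<void>;      // phase 2b: roll the change back
}

async function twoPhaseCommit(participants: Participant[]): Promise<boolean> {
  // Phase 1: ask everyone to prepare; any "no" vote (or failure) aborts.
  const votes = await Promise.all(
    participants.map((p) => p.prepare().catch(() => false)),
  );
  const decision = votes.every((v) => v);
  // (A durable "commit"/"abort" log record would be written here.)
  // Phase 2: broadcast the decision to all participants.
  await Promise.all(
    participants.map((p) => (decision ? p.commit() : p.abort())),
  );
  return decision;
}
```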
Most of this stuff is very well discussed in
"Database Solutions, second edition"
and
"Database Systems: The Complete Book"
More in the distributed world, you might want to check the current state of Web Service technology for distributed transactions and workflows. Not my cup of tea, to be honest. There are frameworks for Python, Java and .NET to run this kind of service (an example).
As my final-year project, some years ago, I implemented a distributed 2PC protocol on top of Web Services, and I was able to run transactions on two separate databases, just like the example you gave. However, I am sure that today people implement this in a more RESTful approach, for instance see here. Even though some other protocols are mentioned in those links, in the end they all end up implementing 2PC.
In summary, a 2PC protocol implementation with proper operation logs to undo/redo in case of a crash is one of the most sensible options to go for.

What are the pros and cons of using database for IPC to share data instead of message passing?

To be more specific for my application: the shared data are mostly persistent data such as monitoring status and configurations; there are no more than a few hundred items, and they are updated and read frequently, but at no more than 1 or 2 Hz. The processes are local to each other on the same machine.
EDIT 1: more info
- Processes are expected to poll on the set of data they are interested in (i.e. monitoring).
- Most of the data are persistent during the lifetime of the program, but some (e.g. configs) are required to be restored after a software restart.
- Data are updated by their owner only (assume one owner for each piece of data).
- The number of processes is small too (no more than 10).
Although using a database is notably more reliable and scalable, it has always seemed to me to be overkill, or too heavyweight, when all I am doing with it is sharing data within an application. Message passing with e.g. JMS also has a middleware part, but it is more lightweight and has a more natural and flexible communication API. Implementing event notification and the command pattern is also easier with messaging, I think.
It will be of great help if anyone can give me an example of what circumstance would one be more preferable to the other.
For example, I know we can more readily share persistent data between processes using a database, although it is also doable with messaging by distributing the data across processes and/or storing it in some XML files.
According to http://en.wikipedia.org/wiki/Database-as-IPC and http://tripatlas.com/Database_as_an_IPC, using a database this way is an anti-pattern when used in place of message passing, but they do not elaborate on, for example, how bad the performance hit of using a database can be compared to message passing.
I have gone through several previous posts that asked similar questions, but I am hoping to find an answer that focuses on design justification. From the questions I have read so far, I can see that a lot of people do use a database for IPC (or implement a message queue with a database).
Thanks!
I once wrote a data acquisition system that ran on about 20 lab computers. All the communication between the machines took place through small tables on a MySQL database. This scheme worked great, and as the years went by and the system expanded everything scaled well and remained very responsive. It was easy to add features and fix bugs. You could debug it easily because all the message passing was easy to tap into.
What the database does for you is provide a fast, well-debugged way of maintaining concurrency while the network traffic gets very busy. Someone at MySQL spent a lot of time making the network stuff work well under high load, and if you write your own system using TCP/IP sockets you're going to be re-inventing those wheels at great expense of time and effort.
Another advantage is that if you're using the database anyway for actual data storage then you don't have to add anything else to your system. You keep the possible points of failure to a minimum.
So all these people who say IPC through databases is bad aren't always right.
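A minimal sketch of the "small tables as mailboxes" scheme described above (the messages table, its columns, and the query/execute callbacks are illustrative assumptions, not a specific client's API):

```typescript
// Writers INSERT rows; readers poll for rows they haven't seen yet.
type Row = { id: number; topic: string; payload: string };

async function publish(
  execute: (sql: string, params: unknown[]) => Promise<void>,
  topic: string,
  payload: string,
): Promise<void> {
  await execute("INSERT INTO messages (topic, payload) VALUES (?, ?)", [topic, payload]);
}

async function pollOnce(
  query: (sql: string, params: unknown[]) => Promise<Row[]>,
  topic: string,
  lastSeenId: number,
): Promise<Row[]> {
  // Polling at 1-2 Hz, as in the question, is cheap; an AUTO_INCREMENT id
  // gives a simple "what's new since last time" cursor per reader.
  return query(
    "SELECT id, topic, payload FROM messages WHERE topic = ? AND id > ? ORDER BY id",
    [topic, lastSeenId],
  );
}
```

A nice side effect of this design, as noted above, is that every message is trivially observable for debugging: just SELECT from the table.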
Taking into account that a DBMS is a way to store information and messages are a way to transport information, your decision should be based on the answer to the question: "Do I need the data to persist over time, or is the data consumed by the recipient?"
