What assigns timestamps to read-only transactions in Spanner? - distributed-transactions

The Spanner paper makes it clear that commit timestamps for read-write transactions are chosen by the coordinator leader. However, I'm not sure where timestamps are chosen for read-only transactions.
The documentation here says:
API Layer will pick the read timestamp by using the current TrueTime.
But where is that API layer situated? Does that refer to the location proxy that the paper says clients use to locate the relevant spanservers? The paper says it uses TT.now().latest but I can't tell where that gets invoked.
I had assumed that the timestamp could be chosen by any of the Paxos leaders involved in the multi-site read, but apparently not. Can someone please help clarify?

But where is that API layer situated? Does that refer to the location proxy that the paper says clients use to locate the relevant spanservers?
That's correct, it's in the API proxy.
I had assumed that the timestamp could be chosen by any of the Paxos leaders involved in the multi-site read, but apparently not.
To negotiate a timestamp, strong read requests must contact the Paxos leader for each Spanner group involved in that read.
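For illustration, here is how these read modes surface in the public Cloud Spanner Java client (a sketch; the project, instance, database, and table names are hypothetical). The strong read is the case discussed above, where the API layer picks TT.now().latest and negotiates it with each group's Paxos leader; a bounded-staleness read can instead be served by any replica that is caught up far enough.

import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import com.google.cloud.spanner.Statement;
import com.google.cloud.spanner.TimestampBound;
import java.util.concurrent.TimeUnit;

public class ReadTimestampDemo {
    public static void main(String[] args) {
        Spanner spanner = SpannerOptions.newBuilder().build().getService();
        DatabaseClient db = spanner.getDatabaseClient(
                DatabaseId.of("my-project", "my-instance", "my-db")); // hypothetical names

        // Strong read-only transaction: the read timestamp is chosen at the API
        // layer (TT.now().latest) and negotiated with each group's Paxos leader.
        try (ResultSet rs = db.singleUse(TimestampBound.strong())
                .executeQuery(Statement.of("SELECT Id FROM Orders"))) { // hypothetical table
            while (rs.next()) { /* consume rows */ }
        }

        // Bounded-staleness read: any replica that is caught up to the chosen
        // timestamp can serve it, so no leader negotiation is required.
        try (ResultSet rs = db.singleUse(TimestampBound.ofMaxStaleness(10, TimeUnit.SECONDS))
                .executeQuery(Statement.of("SELECT Id FROM Orders"))) {
            while (rs.next()) { /* consume rows */ }
        }
    }
}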

Related

MongoDB transactions and readPreference option

Is there any way to use MongoDB multi-document transactions and the readPreference=secondaryPreferred option at the same time? My goal: I have some functionality that performs a lot of heavy read operations, and I want to reduce the load on the primary replica by executing reads on secondary replicas.
The MongoDB docs say that readPreference should be primary if transactions are used, so I am wondering how I can split the load across read replicas. Does anyone know a way to achieve this?
There is no way to do that, because transactions can only be executed on the primary node. After the commit point, the operations are replicated to the secondary members too.
Transactions in MongoDB automatically guarantee Read Your Own Writes and Monotonic Reads (by design) in order to ensure a good level of consistency. Even if readPreference=secondaryPreferred were possible, these properties would not be guaranteed and the results would be quite unpredictable. The safest way to implement transactions (in particular, ACID properties) in a NoSQL DBMS is to pretend to be working on a single-instance DB.
This does not mean you cannot have distributed transactions (in fact they exist in MongoDB), but a single source of truth is kind of necessary for each shard.
As @dododo pointed out, this is a restriction that they might relax in the future, as described in the Driver Transactions Specification.
According to the official MongoDB Manual,
If transaction-level and the session-level read preference are unset, the transaction uses the client-level read preference. By default, the client-level read preference is primary.
Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
So, coming to your question: I was able to update the read preference for transactions using TransactionOptions in the Mongo configuration.
@Bean
MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
    TransactionOptions transactionOptions = TransactionOptions.builder()
            .readPreference(com.mongodb.ReadPreference.primary())
            .build();
    return new MongoTransactionManager(dbFactory, transactionOptions);
}
This worked for me. Thanks.

DB consistency with microservices

What is the best way to achieve DB consistency in microservice-based systems?
At the GOTO in Berlin, Martin Fowler was talking about microservices and one "rule" he mentioned was to keep "per-service" databases, which means that services cannot directly connect to a DB "owned" by another service.
This is super-nice and elegant but in practice it becomes a bit tricky. Suppose that you have a few services:
a frontend
an order-management service
a loyalty-program service
Now, a customer makes a purchase on your frontend, which will call the order-management service, which will save everything in the DB -- no problem. At this point, there will also be a call to the loyalty-program service so that it credits / debits points from your account.
Now, when everything is on the same DB / DB server it all becomes easy since you can run everything in one transaction: if the loyalty program service fails to write to the DB we can roll the whole thing back.
When we do DB operations across multiple services this isn't possible, since we can't rely on a single connection or take advantage of running everything in one transaction.
What are the best patterns to keep things consistent and live a happy life?
I'm quite eager to hear your suggestions! And thanks in advance!
This is super-nice and elegant but in practice it becomes a bit tricky
What it means "in practice" is that you need to design your microservices in such a way that the necessary business consistency is fulfilled when following the rule:
that services cannot directly connect to a DB "owned" by another service.
In other words - don't make any assumptions about their responsibilities and change the boundaries as needed until you can find a way to make that work.
Now, to your question:
What are the best patterns to keep things consistent and live a happy life?
For things that don't require immediate consistency, and updating loyalty points seems to fall in that category, you could use a reliable pub/sub pattern to dispatch events from one microservice to be processed by others. The reliable bit is that you'd want good retries, rollback, and idempotence (or transactionality) for the event processing stuff.
If you're running on .NET some examples of infrastructure that support this kind of reliability include NServiceBus and MassTransit. Full disclosure - I'm the founder of NServiceBus.
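Whatever the stack, the shape of such a reliable, idempotent event handler is roughly the following (a minimal Java sketch; the event type, store, and method names are made up, and a real implementation would keep the de-duplication set in the service's own database, updated in the same local transaction as the balance):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical event published by the order-management service.
record PointsEarned(String eventId, String customerId, int points) {}

class LoyaltyEventHandler {
    // Processed-event IDs; in production this lives in the loyalty service's DB.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Returns true when handled; false tells the bus to redeliver later. */
    boolean handle(PointsEarned event) {
        if (!processed.add(event.eventId())) {
            return true; // duplicate delivery: already applied, just acknowledge
        }
        try {
            creditPoints(event.customerId(), event.points());
            return true;
        } catch (Exception e) {
            processed.remove(event.eventId()); // clear marker so a retry can apply it
            return false; // leave the event on the queue for a retry
        }
    }

    private void creditPoints(String customerId, int points) {
        // update the loyalty service's own database here
    }
}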
Update: Following comments regarding concerns about the loyalty points: "if balance updates are processed with delay, a customer may actually be able to order more items than they have points for".
Many people struggle with these kinds of requirements for strong consistency. The thing is that these kinds of scenarios can usually be dealt with by introducing additional rules: for example, if a user ends up with negative loyalty points, notify them; if time T goes by without the loyalty points being sorted out, notify the user that they will be charged amount M based on some conversion rate. This policy should be visible to customers when they use points to purchase stuff.
I don’t usually deal with microservices, and this might not be a good way of doing things, but here’s an idea:
To restate the problem, the system consists of three independent-but-communicating parts: the frontend, the order-management backend, and the loyalty-program backend. The frontend wants to make sure some state is saved in both the order-management backend and the loyalty-program backend.
One possible solution would be to implement some type of two-phase commit:
First, the frontend places a record in its own database with all the data. Call this the frontend record.
The frontend asks the order-management backend for a transaction ID, and passes it whatever data it would need to complete the action. The order-management backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The order-management transaction ID is stored as part of the frontend record.
The frontend asks the loyalty-program backend for a transaction ID, and passes it whatever data it would need to complete the action. The loyalty-program backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The loyalty-program transaction ID is stored as part of the frontend record.
The frontend tells the order-management backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend tells the loyalty-program backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend deletes its frontend record.
If this is implemented, the changes will not necessarily be atomic, but the system will be eventually consistent. Let’s think of the places it could fail:
If it fails in the first step, no data will change.
If it fails in the second, third, fourth, or fifth step, when the system comes back online it can scan through all frontend records, looking for records without an associated transaction ID (of either type). If it comes across any such record, it can replay beginning at step 2. (If there is a failure in step 3 or 5, there will be some abandoned records left in the backends, but they are never moved out of the staging area, so that is OK.)
If it fails in the sixth, seventh, or eighth step, when the system comes back online it can look for all frontend records with both transaction IDs filled in. It can then query the backends to see the state of these transactions—committed or uncommitted. Depending on which have been committed, it can resume from the appropriate step.
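Here is a compact sketch of that coordinator logic, using hypothetical client interfaces for the two backends (a real implementation would persist the frontend record durably and run the recovery scan at startup):

// Hypothetical client interface exposed by each backend.
interface Backend {
    String stage(String payload);      // store data in the staging area, return a txn ID
    void commit(String txnId);         // finalize: move staged data to the real tables
    boolean isCommitted(String txnId); // for recovery: has commit already happened?
}

class FrontendRecord {
    String payload;
    String orderTxnId;   // null until steps 2-3 complete
    String loyaltyTxnId; // null until steps 4-5 complete
}

class PurchaseCoordinator {
    private final Backend orders;
    private final Backend loyalty;

    PurchaseCoordinator(Backend orders, Backend loyalty) {
        this.orders = orders;
        this.loyalty = loyalty;
    }

    void purchase(FrontendRecord rec) {
        // step 1 (persisting rec in the frontend DB) is assumed to have happened
        rec.orderTxnId = orders.stage(rec.payload);     // steps 2-3
        rec.loyaltyTxnId = loyalty.stage(rec.payload);  // steps 4-5
        orders.commit(rec.orderTxnId);                  // step 6
        loyalty.commit(rec.loyaltyTxnId);               // step 7
        // step 8: delete rec from the frontend's own database
    }

    // Recovery after a crash: replay or resume based on what the record shows.
    void recover(FrontendRecord rec) {
        if (rec.orderTxnId == null || rec.loyaltyTxnId == null) {
            purchase(rec); // failed before step 5: replay from step 2
            return;
        }
        if (!orders.isCommitted(rec.orderTxnId)) orders.commit(rec.orderTxnId);
        if (!loyalty.isCommitted(rec.loyaltyTxnId)) loyalty.commit(rec.loyaltyTxnId);
        // then delete rec, as in step 8
    }
}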
I agree with what @Udi Dahan said. Just want to add to his answer.
I think you need to persist the request to the loyalty program so that if it fails, it can be completed at some later point. There are various ways to frame/do this.
1) Make the loyalty program API failure recoverable. That is to say it can persist requests so that they do not get lost and can be recovered (re-executed) at some later point.
2) Execute the loyalty program requests asynchronously. That is to say, persist the request somewhere first then allow the service to read it from this persisted store. Only remove from the persisted store when successfully executed.
3) Do what Udi said and place it on a good queue (the pub/sub pattern, to be exact). This usually requires that the subscriber do one of two things: either persist the request before removing it from the queue (go to 1), or first borrow the request from the queue and, only after successfully processing the request, have it removed from the queue (this is my preference).
All three accomplish the same thing. They move the request to a persisted place where it can be worked on till successful completion. The request is never lost, and retried if necessary till a satisfactory state is reached.
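As an illustration of option 2, the borrow-then-remove loop might look like this (a sketch; the store interface is made up, and the key point is that the request is removed only after the call succeeds):

import java.util.Optional;

// Hypothetical durable store of pending loyalty-program requests.
interface PendingRequestStore {
    Optional<String> peekOldest();   // read ("borrow") without removing
    void remove(String requestId);   // called only after success
}

class LoyaltyRequestWorker {
    private final PendingRequestStore store;

    LoyaltyRequestWorker(PendingRequestStore store) {
        this.store = store;
    }

    void drainOnce() {
        store.peekOldest().ifPresent(requestId -> {
            try {
                callLoyaltyService(requestId); // may fail; request stays persisted
                store.remove(requestId);       // success: now it may be forgotten
            } catch (Exception e) {
                // leave the request in the store; a later run will retry it
            }
        });
    }

    private void callLoyaltyService(String requestId) {
        // HTTP/RPC call to the loyalty-program service
    }
}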
I like to use the example of a relay race. Each service or piece of code must take hold and ownership of the request before allowing the previous piece of code to let go of it. Once it's handed off, the current owner must not lose the request till it gets processed or handed off to some other piece of code.
Even for distributed transactions you can get into "transaction in doubt" status if one of the participants crashes in the middle of the transaction. If you design the services as idempotent operations, then life becomes a bit easier. One can write programs to fulfill business conditions without XA. Pat Helland has written an excellent paper on this called "Life beyond Distributed Transactions". Basically, the approach is to make as few assumptions about remote entities as possible. He also illustrated an approach called Open Nested Transactions (http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper142.pdf) to model business processes. In this specific case, the purchase transaction would be the top-level flow, and loyalty and order management would be next-level flows. The trick is to create granular services as idempotent services with compensation logic. So if anything fails anywhere in the flow, individual services can compensate for it. For example, if the order fails for some reason, loyalty can deduct the accrued points for that purchase.
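A tiny sketch of that compensation, with hypothetical names (accrue/compensate are illustrative, not from any particular framework):

// Hypothetical idempotent loyalty operations, keyed by purchase ID.
class LoyaltyService {
    void accrue(String purchaseId, int points) { /* record points for purchaseId */ }
    void compensate(String purchaseId) { /* deduct the points accrued for purchaseId */ }
}

class PurchaseFlow {
    private final LoyaltyService loyalty = new LoyaltyService();

    void run(String purchaseId, int points) {
        loyalty.accrue(purchaseId, points);
        try {
            placeOrder(purchaseId);
        } catch (Exception e) {
            // No rollback is possible across services, so undo by compensating.
            loyalty.compensate(purchaseId);
        }
    }

    private void placeOrder(String purchaseId) { /* order-management call */ }
}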
Another approach is to model eventual consistency using CALM or CRDTs. I've written a blog post highlighting the use of CALM in real life - http://shripad-agashe.github.io/2015/08/Art-Of-Disorderly-Programming Maybe it will help you.

Distributed transactions - why do we save tranlogs to file system?

All transaction managers (Atomikos, Bitronix, IBM WebSphere TM etc) save some "transaction logs" into 'tranlogs' folder to file system.
When something terrible happens and the server goes down, the tranlogs sometimes become corrupted.
They require some manual recovery procedure.
I've been told that by simply clearing the broken tranlogs folder I risk ending up with an inconsistent state of the resources that participated in the transactions.
As a "dumb" developer I feel more comfortable with simple concepts. I want to think that distributed transaction management should be like regular transaction management:
If something went wrong at any party (network, app error, timeout) - I expect the whole multi-resource transaction not to be committed in any part of it. All leftovers should be cleaned up sooner or later automatically.
If the transaction manager fails (file-system fault, power-supply fault) - I expect all the transactions under this TM to be rolled back (apparently, at the DB timeout level).
File storage for tranlogs is optional if I don't want to have any automatic TX recovery (whatever it would mean).
Questions
Why can't I think like this? What's so complicated about 2PC?
What are the exact risks when I clear broken tranlogs?
If I am wrong and I really do need all this mess with 2PC file-system state: don't you feel sick about the fact that a TX manager can actually break storage state in such an easy and ugly manner?
When I was first confronted with two-phase commit in real life in 1994 (initially in a larger Oracle7 environment), I had a similar initial reaction. What a bloody shame that it is not generally possible to make it simple. But looking back at my university algorithm books, it became clear that there is no general solution for 2PC.
See for instance how to come to consensus in a distributed environment
Of course, there are many specific cases where a 2PC commit of a transaction can be resolved more easily, either to complete or to roll back completely, and with less impact. But the general problem stays and cannot be solved.
In this case, a transaction manager has to decide at some point what to do; a transaction cannot remain open forever. Therefore, as an ultimate solution, they will always need to go back to their own transaction logs, since one or more of the other parties may not be able to reliably communicate status now or in the near future. Some transaction managers might be more advanced and know how to resolve some cases more easily, but the need for an ultimate fallback stays.
I am sorry for you. Fixing it generally seems to be identical to "Falsity implies anything" in binary logic.
Summarizing
On Why can't I think like this? and What's so complicated about 2PC: see above. This algorithmic problem can't be solved universally.
On What are the exact risks when I clear broken tranlogs?: the transaction manager has some database backing it. Deleting tranlogs poses the same problem as in general relational database software: you lose information on the transactions in process. Some DB platforms may still be left with somewhat or largely intact files. For background and some database theory, see Wikipedia.
On Don't you feel sick about the fact that TX manager can actually break storage state in an easy and ugly manner?: yes, sometimes when I have to get a lot of work done by the team, I really hate it. But well, it keeps me having a job :-)
Addition: to 2PC or not
From your addition I understand that you are considering whether or not to include 2PC in your projects.
In my opinion, your mileage may vary. Our company's policy for 2PC is: avoid it whenever possible. However, in some environments, especially with legacy systems and complex environments such as those found in banking, you cannot work around it. The customer requires it, and they may not be willing to allow you to perform a major change in other infrastructural components.
When you must do 2PC: do it well. I like a clean architecture of the software and infrastructure, and something that is so simple that even 5 years from now it is clear how it works.
For all other cases, we stay away from two-phase commit. We have our own framework (Invantive Producer) from client, to application server, to database backend. In this framework we have chosen to sacrifice elements of ACID when normally working in a distributed environment. The application developer must take care of things such as atomicity himself. Often that is possible with little effort, or even without thinking about it. For instance, all software must be safe for restart. Even with atomicity of transactions, this requires some thinking to do well in a massive multi-user environment (for instance, locking issues).
In general this stupid approach is very easy to understand and maintain. In cases where we have been required to do two phase commit, we have been able to just replace some plug-ins on the framework and make some changes to client-side code.
So my advice would be:
Try to avoid 2PC.
But encapsulate your transaction logic nicely.
This allows you to add 2PC without a complete rebuild, changing things only where needed.
I hope this helps you. If you can tell me more about your typical environments (size in #tables, size in GB of persistent data, size in #concurrent users, typical transaction-management software and platform), maybe I can make some additions or improvements.
Addition: Email and avoiding message loss in 2PC
Regarding the suggestion of combining the DB with JMS: no, combining the DB with JMS is normally of little use; JMS will itself already have some DB backing it, hence the original question about transaction logs.
Regarding your business case: I understand that per event an email is sent from a template and that the outgoing mail is registered as an event in the database.
This is a hard nut to crack; I've enjoyed doing security audits, and one of the easiest security issues to score was the use of email.
Email - besides being, like a postcard, neither confidential nor tamper-proof in most situations - has no guarantees for delivery and/or reading without additional measures. For instance, even when email is delivered directly between your mail transfer agent and the recipient, data loss can occur without the transaction monitor being informed. That gets even worse when multiple hops are involved. For instance, each MTA has its own queueing mechanism on which a "bomb can be dropped", leading to data loss. But you can also think of spam measures, bad configuration, mail loops, deleting a file by accident, etc. Even if you can register the sending of the email without any loss of transaction information using 2PC, this gives absolutely no clue as to whether the email will arrive at all or even make it across the first hop.
The company I work for sells a large software package for project-driven businesses. This package has an integrated queueing mechanism, which also handles email events; in most implementations it is typically combined with Exchange nowadays. A few months ago we had a nice problem: transaction started, mail channel opened, mail delivered to Exchange as the MTA, registered that the mail was handled... transaction aborted, since the Oracle tablespace was full. On the next run, the mail was delivered to Exchange again, again an abort, etc. The algorithm has since been enhanced, but from this simple example you can see that you need all endpoints to cooperate in your 2PC, even when some of the endpoints are far away in an organisation receiving and displaying your email.
If you need measures to ensure that an email is delivered or read, you will need to supplement it with additional measures. Please pick from application controls, user controls, and process controls in the literature.

Should Conversation Handles be the Same at Each End of a Service Broker?

I'm new to Service Broker. We've set up a development system that appears to be working well.
One strange thing I've noticed is that the conversation handles in the audit tables at each end of the Service Broker are different for the same message. I had assumed the same conversation handle would be used by the initiator and target ends so I'm wondering if we've got something configured wrong.
Is it normal for a single message to have different conversation handles at each end of a conversation?
There is a conversation_id, which is the same at both endpoints.
And there is conversation_handle, which must be different at each endpoint. Think of the very simple scenario in which the initiator and target are in the same database. Had the conversation_handle been the same, it would be ambiguous which endpoint you mean when you issue a SEND. Did you send from initiator to target, or from target to initiator? If the handle were the same, you couldn't distinguish! Therefore the handle must be different.
You always interact in your application with conversation_handle. This is the value you pass to your T-SQL statements (SEND, END, MOVE), and this is what BEGIN DIALOG returns. The handle value never leaves the box (is not part of the message wire format).
You use the conversation_id mostly in debugging. It can be used to identify your peer conversation endpoint on a remote machine, and it can be used to identify the conversation in message-related events in SQL Profiler (Broker Event Category).
After you get past this a-ha moment, soon you'll be confused by the fact that conversation_group is always local and does not travel on the wire. It has been asked about before; see questions 1314050 or 6434464.
And there is one more ID: the message_id. This is assigned to the message during SEND. It travels on the wire, and its use is only for debugging and troubleshooting. E.g., it features in the Broker:Forwarded Message Dropped event class.
I've been looking for definitive information about this on the web. Here's what I found:
Conversation handles are definitely different at each end of a conversation. They uniquely identify the endpoint of the conversation, not the conversation as a whole. The conversation_id is the conversation identifier that is shared by both ends of the conversation.
Here's the Books Online entry for sys.conversation_endpoints. It says:
conversation_handle: Identifier for this conversation endpoint. (emphasis mine)
conversation_id: Identifier for the conversation. This identifier is shared by both participants in the conversation. (emphasis mine)
In addition, Microsoft has a PDF online with a sample chapter of the book The Rational Guide To SQL Server 2005 Service Broker, written by the program manager for Service Broker, Roger Wolter. For those with access to the book this is chapter 2, Conversations. The section Conversation Persistence has this explanation of conversation_ids and why conversation handles have to be different at each end of the conversation:
conversation_id — This is the "on the wire" identifier for a conversation. This identifier is included in each message sent across the network. Each message of this conversation has this uniqueidentifier in its header so that the endpoint knows which conversation the message belongs to. At this point, you are probably wondering when to use the conversation_handle instead of another uniqueidentifier. When both endpoints of a conversation are in the same database (as they were in the ISBN Lookup example in Chapter 1), there will be two rows in the table for the same conversation. Therefore, we need two handles as keys.
This puzzled me when I first started using Service Broker, but you're not seeing things - the IDs are different at each end.
I gave this quite a lot of thought (couldn't find any definitive documentation on it) and came up with the following reasoning.
The conversation Id is really a correlation identifier that logically groups messages inside a service boundary, i.e. on either the sender or the receiving end, so there is no compelling reason why the ID has to be the same on both sides (even if that would be convenient for debugging).
If there is no good reason why they should be the same then it makes sense for them not to be the same. There is a tiny, tiny chance that you could get a GUID collision if you used one on a machine that didn't create it, so I guess they decided it would be better to generate a new one.
[Happy to be corrected by anyone who knows more about this....]

writing then reading entity does not fetch entity from datastore

I am having the following problem. I am now using the low-level Google Datastore API rather than JDO, so that I am in a better position to see exactly what is happening in my code. I am writing an entity to the datastore and shortly thereafter reading it from the datastore, using Jetty and Eclipse. Sometimes the written entity is not read back. This would be a real problem if it were to happen in production code. I am using the 2.0 RC2 API.
I have tried this several times; sometimes the entity is retrieved from the datastore and sometimes it is not. I am doing a simple query on the datastore just after committing a write transaction. (If I run the code through the debugger, things run slowly enough that the entity has a chance of being read back on the second pass.)
Any help with this issue would be greatly appreciated.
Regards,
The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
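For illustration, with the low-level Java API an entity-group write plus a strongly consistent ancestor query might look like this (a sketch; the kind and key names are made up):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

public class AncestorQueryDemo {
    public static void main(String[] args) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // All of this user's data lives in one entity group rooted at this key.
        Key userKey = KeyFactory.createKey("User", "alice"); // hypothetical kind/name

        // Write a child entity into the user's entity group.
        Entity order = new Entity("Order", userKey);
        order.setProperty("total", 42L);
        ds.put(order);

        // An ancestor query is strongly consistent: it is guaranteed to see the
        // put above, unlike a global query on the eventually consistent index.
        Query q = new Query("Order").setAncestor(userKey);
        PreparedQuery pq = ds.prepare(q);
        for (Entity e : pq.asIterable()) {
            System.out.println(e.getKey() + " -> " + e.getProperty("total"));
        }
    }
}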
See the docs on queries and transactions.
