Is there any way to use MongoDB multi-document transactions and the readPreference=secondaryPreferred option at the same time? My goal: I have some functionality that performs a lot of heavy read operations, and I want to reduce the load on the primary by executing read operations on secondary replicas.
The MongoDB docs say that readPreference must be primary when transactions are used, so I am wondering how I can split the load across read replicas. Does anyone know how to achieve this?
There is no way to do that, because transactions can only be executed on the primary node. After the commit point, the operations are replicated to the secondary members too.
Transactions in MongoDB automatically guarantee Read Your Own Writes and Monotonic Reads (by design) in order to ensure a good level of consistency. Even if readPreference=secondaryPreferred were possible, these properties would not be guaranteed and the results would be quite unpredictable. The safer way to implement transactions (in particular ACID properties) in a NoSQL DBMS is to pretend to be working on a single-instance DB.
This does not mean you cannot have distributed transactions (in fact they exist in MongoDB), but a single source of truth is kind of necessary for each shard.
As @dododo pointed out, this is a restriction that they might relax in the future, as described in the Driver Transactions Specification.
According to the official MongoDB Manual:
If transaction-level and the session-level read preference are unset, the transaction uses the client-level read preference. By default, the client-level read preference is primary.
Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
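The two rules quoted above can be read as a fallback chain followed by a check. Here is a minimal sketch of that reading in plain Java; ReadPref and ReadPrefResolver are hypothetical names made up for illustration and are not part of the MongoDB driver.

```java
import java.util.Optional;

// Hypothetical stand-in for the driver's read preference modes.
enum ReadPref { PRIMARY, SECONDARY_PREFERRED }

final class ReadPrefResolver {
    // The most specific setting wins: transaction, then session, then client.
    // If none is set, the default is primary.
    static ReadPref effective(Optional<ReadPref> txn,
                              Optional<ReadPref> session,
                              Optional<ReadPref> client) {
        return txn.or(() -> session).or(() -> client).orElse(ReadPref.PRIMARY);
    }

    // Mirrors the manual's rule that a transaction containing reads must use primary.
    static void checkTransactionReads(ReadPref effective) {
        if (effective != ReadPref.PRIMARY) {
            throw new IllegalStateException("read preference in a transaction must be primary");
        }
    }
}
```

With nothing set at any level, effective(...) returns PRIMARY, so the check passes; setting SECONDARY_PREFERRED at any level makes the check fail in this sketch.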
So coming to your question, I was able to update the read preferences for transactions using TransactionOptions in the mongo configurations.
@Bean
MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
    // Transactions that contain reads must use the primary read preference.
    TransactionOptions transactionOptions = TransactionOptions.builder()
            .readPreference(com.mongodb.ReadPreference.primary())
            .build();
    return new MongoTransactionManager(dbFactory, transactionOptions);
}
This worked for me. Thanks.
As per this answer, it is recommended to go for single table in Cassandra.
Cassandra 3.0
We are planning for below schema:
Second table has composite key. PK(domain_id, item_id). So, domain_id is partition key & item_id will be clustering key.
GET request handler will access(read) two tables
POST request handler will access(write) into two tables
PUT request handler will access(write) details table(only)
As per CAP theorem,
What are the consistency issues in having multi-table schema? in Cassandra...
Can we avoid consistency issues in Cassandra? with these terms QUORUM, consistency level etc...
recommended to go for single table in Cassandra.
I would recommend the opposite. If you have to support multiple queries for the same data in Apache Cassandra, you should have one table for each query.
What are the consistency issues in having multi-table schema? in Cassandra...
Consistency issues between query tables can happen when writes are applied to one table but not the other(s). In that case, the application should have a way to gracefully handle it. If it becomes problematic, perhaps running a nightly job to keep them in-sync might be necessary.
You can also have consistency issues within a table. Maybe something happens (node crashes, down longer than 3 hours, hints not replayed) during the write process. In that case, a given data point may have only a subset of its intended replicas.
This scenario can be countered by running regularly-scheduled repairs. Additionally, consistency can be increased on a per-query basis (QUORUM vs. ONE, etc), and consistency levels of QUORUM and higher will occasionally trigger a read-repair (which syncs all replicas in the current operation).
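To make the QUORUM vs. ONE trade-off concrete, here is a small arithmetic sketch. It assumes a replication factor of 3, which the question does not state. QuorumMath is a hypothetical helper, not a Cassandra API. It uses the usual rules that a quorum is floor(RF / 2) + 1 replicas and that a read is guaranteed to overlap the latest acknowledged write when R + W > RF.

```java
final class QuorumMath {
    // A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read replica set must intersect every write replica set,
    // i.e. when R + W > RF.
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With RF = 3, quorum(3) is 2, so QUORUM writes with QUORUM reads give 2 + 2 > 3 and the read overlaps the write. ONE with ONE gives 1 + 1 = 2, which is not greater than 3, so a read may land on a replica that has not yet received the write.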
Can we avoid consistency issues in Cassandra? with these terms QUORUM, consistency level etc...
So Apache Cassandra was engineered to be highly-available (HA), thereby embracing the paradigm of eventual consistency. Some might interpret that to mean Cassandra is inconsistent by design, and they would not be incorrect. I can say, after several years of supporting hundreds of clusters at web/retail scale, that consistency issues (while they do happen) are rare, and are usually caused by failures of components outside of a Cassandra cluster.
Ultimately though, it comes down to the business requirements of the application. For some applications like product reviews or recommendations, a little inconsistency shouldn't be a problem. On the other hand, things like location-based pricing may need a higher level of query consistency. And if 100% consistency is indeed a hard requirement, I would question whether or not Cassandra is the proper choice for data storage.
Edit
I did not get this: "Consistency issues between query tables can happen when writes are applied to one table but not the other(s)." When writes are applied to one table but not the other(s), what happens?
So let's say that a new domain is added. Perhaps a scenario arises where the domain_details_table gets updated, but the id_table does not. Nothing is wrong here on the database side, except that the application expects to find that domain_id in the id_table, but cannot.
In that case, maybe the application can retry using a secondary index on domain_details_table.domain_id. It won't be fast, but the decision to be made is more around which scenario is more preferable; no answer, or a slow answer? Again, application requirements come into play here.
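A rough sketch of that "fast path, then slow fallback" decision, using in-memory maps as stand-ins for the two tables. The class name, the map-based storage, and the String row type are all assumptions for illustration; a real application would issue CQL queries instead.

```java
import java.util.Map;
import java.util.Optional;

final class DomainLookup {
    private final Map<String, String> idTable;       // stand-in for id_table, keyed by domain_id
    private final Map<String, String> detailsTable;  // stand-in for domain_details_table

    DomainLookup(Map<String, String> idTable, Map<String, String> detailsTable) {
        this.idTable = idTable;
        this.detailsTable = detailsTable;
    }

    // Try the query table first; if the write never reached it, fall back to
    // the slower lookup instead of returning no answer at all.
    Optional<String> find(String domainId) {
        String fast = idTable.get(domainId);
        if (fast != null) {
            return Optional.of(fast);
        }
        // Slow path: stands in for a secondary-index query on domain_details_table.domain_id.
        return Optional.ofNullable(detailsTable.get(domainId));
    }
}
```

If the slow answer is not acceptable for a given request, the fallback branch can simply return Optional.empty(), which corresponds to the "no answer" choice.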
For your point: "You can also have consistency issues within a table. Maybe something happens (node crashes, down longer than 3 hours, hints not replayed) during the write process." How does RDBMS(like MySQL) deal with this?
So the answer to this used to be simple. RDBMSs only run on a single server, so there's only one replica to keep in-sync. But today, most RDBMSs have HA solutions which can be used, and thus have to be kept in-sync. In that case (from what I understand), most of them will asynchronously update the secondary replica(s), while restricting traffic only to the primary.
It's also good to remember that RDBMSs enforce consistency through locking strategies, as well. So even a single-instance RDBMS will lock a data point during an update, blocking any reads until the lock is released.
In a node-down scenario, a single-instance RDBMS will be completely offline, so instead of inconsistent data you'd have data loss. In an HA RDBMS scenario, there would be a short pause (during which you would likely encounter connection/query failures) until it has failed over to the new primary. Once the replica comes up, there would probably be additional time necessary to sync up the replicas, until HA can be restored.
I am confused about how causal consistency affects the decision of choosing between read-concern "local" vs "available".
Why is read concern "available" the default for secondaries when the session is not causally consistent?
I understand how "available" behaves for sharded clusters vs "local" behaves for unsharded collections.
I just cannot make the connection based on reading the documents.
I would really appreciate if someone helped me bridge. Thanks ahead.
Here's a summary of read concern levels in terms of a sharded cluster:
majority: only returns data that was written to the majority of voting nodes and will not be rolled back.
local: returns data on the local node, but with orphaned documents filtered out. This requires the node to communicate with the shard's primary (if this read is on a secondary), or the config server to service the read. In a degraded sharded cluster, this read may stall indefinitely. This is not an issue for an unsharded collection, though. Possible to return data that could be rolled back.
available: returns whatever data is available. This prioritizes read availability over correctness. Possible to return data that could be rolled back. See Read Concern "available"
Shard secondaries default to "available" read concern to maintain behaviour compatibility with MongoDB 3.4 (see SERVER-31032)
Causal consistency can provide different guarantees depending on the read & write concern used (see Causal Consistency and Read and Write Concerns for exhaustive detail), where a balance between read own writes, monotonic reads, monotonic writes, and writes follow reads can be achieved by using different levels of read & write concerns.
Since causal consistency provides a semblance of guarantee of data integrity, it is not compatible with "available" read concern, since "available" is meant to provide no guarantee regarding integrity, but to emphasize availability instead.
I am not completely understanding how these two features relate to one another in a (WiredTiger) MongoDB program:
1) WiredTiger Snapshots
2) Data Locking
If each read operation using the WiredTiger engine is, at read-time, provided with a database level 'snapshot' (so as to create consistency, the C in ACID), why then do we also need locking? Let's use an example.
I perform a query at the Document level (a read operation). Okay, so I know I get the database level snapshot, so that my data is consistent EVEN IF another user is concurrently writing to that same Document, updating it.
So at this point, what is the use for having a Shared-Lock on that document, which is blocking all write (exclusive) operations on that document until the Shared-Lock is released? What could possibly go wrong in writing to that Document concurrently while I'm reading it, if I am, in fact, using a snapshot of the Document that was provided to me at read-time? Why would I care if the Document is locked during my read-operation period or not? I already have my (consistent) data from that point-in-time, no?
I'm obviously missing a key concept here... Any help?
Thanks.
You are right that the read operation will acquire a snapshot. When using the WiredTiger storage engine, MongoDB does not lock individual documents for either reads or writes. Instead WiredTiger uses Multi-Version Concurrency Control, MVCC. When performing an update of a document, that update will succeed as long as the document still has the same version as it had when acquiring the snapshot. If not, WiredTiger will return an error (WT_ROLLBACK) indicating that the update had write conflicts. In this case, the update will abort and all pending changes are undone. MongoDB will then transparently retry the operation.
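The "update succeeds only if the version is unchanged, otherwise retry" pattern can be sketched in a few lines of plain Java. This is only an analogy built on a compare-and-set, not WiredTiger's actual implementation; VersionedDocument and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class VersionedDocument {
    // Immutable (version, value) pair; a reader holding one keeps a stable snapshot.
    static final class Snapshot {
        final long version;
        final String value;

        Snapshot(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedDocument(String initialValue) {
        this.current = new AtomicReference<>(new Snapshot(0, initialValue));
    }

    // Reads never block writers: they just take the current snapshot.
    Snapshot read() {
        return current.get();
    }

    // Succeeds only if nobody committed a newer version since 'seen' was read.
    boolean tryUpdate(Snapshot seen, String newValue) {
        return current.compareAndSet(seen, new Snapshot(seen.version + 1, newValue));
    }

    // On a write conflict, take a fresh snapshot and reapply the change.
    Snapshot updateWithRetry(UnaryOperator<String> change) {
        while (true) {
            Snapshot seen = read();
            Snapshot next = new Snapshot(seen.version + 1, change.apply(seen.value));
            if (current.compareAndSet(seen, next)) {
                return next;
            }
        }
    }
}
```

A reader that called read() before a concurrent update keeps seeing its old Snapshot, while a writer holding that same stale Snapshot gets false from tryUpdate and has to retry. No lock is held on the document in either case.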
I have a web application where I want to ensure concurrency with a DB level lock on the object I am trying to update. I want to make sure that a batch change or another user or process may not end up introducing inconsistency in the DB.
I see that isolation levels ensure read consistency, and optimistic locking with a @Version field can ensure data is written in a consistent state.
My question is: can't we ensure consistency with the isolation level only? By making any transaction that updates the record Serializable (not considering performance), will I not ensure that a proper lock is taken by the transaction, and that any other transaction trying to update or acquire the lock, or this transaction, will fail?
Do I really need version or timestamp management for this?
Depending on the isolation level you've chosen, a specific resource is going to be locked until the given transaction commits or rolls back. It can be a lock on a whole table, a row, or a block of SQL. This is pessimistic locking, and it is enforced at the database level when running a transaction.
Optimistic locking, on the other hand, assumes that multiple transactions rarely interfere with each other, so no locks are required in this approach. It is an application-side check that uses the @Version attribute to establish whether the version of a record has changed between fetching it and attempting to update it.
It is reasonable to use the optimistic locking approach in web applications, as most operations span multiple HTTP requests. Usually you fetch some information from the database in one request and update it in another. It would be very expensive and unwise to keep transactions open, holding locks on database resources, for that long. That's why we assume that nobody else is going to use the set of data we're working on: it's cheaper. If the assumption happens to be wrong and the version has been changed between requests by someone else, Hibernate won't update the row and will throw an OptimisticLockException. As a developer, you are responsible for managing this situation.
A simple example: an online auction service. You're watching an item page, reading its description and specification. All of it takes, let's say, 5 minutes. With pessimistic locking and some isolation levels you'd block other users from this particular item page (or even from all of the items!). With optimistic locking everybody can access it. After reading about the item you decide to bid on it, so you click the proper button. If any of the other users watching this item changed its state in the meantime (the owner changed its description, someone else bid on it), you will probably (depending on the app implementation) be informed about the changes before the application accepts your bid, because the version you've got is not the same as the version persisted in the database.
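The version check behind this can be sketched without any ORM. With a @Version column the update is effectively "UPDATE ... WHERE id = ? AND version = ?", and zero affected rows means someone else got there first. The in-memory AuctionStore, its Item type, and StaleVersionException below are hypothetical stand-ins for the database and the ORM's exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the exception an ORM raises on a version mismatch.
final class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

final class AuctionStore {
    static final class Item {
        final long version;
        final int highestBid;

        Item(long version, int highestBid) {
            this.version = version;
            this.highestBid = highestBid;
        }
    }

    private final Map<String, Item> rows = new HashMap<>();

    synchronized void create(String id, int startingBid) {
        rows.put(id, new Item(0, startingBid));
    }

    // First HTTP request: fetch the item and remember its version.
    synchronized Item load(String id) {
        return rows.get(id);
    }

    // Second HTTP request: write only if the version is still the one we read.
    synchronized void placeBid(String id, long expectedVersion, int bid) {
        Item current = rows.get(id);
        if (current.version != expectedVersion) {
            throw new StaleVersionException("item " + id + " changed since it was loaded");
        }
        rows.put(id, new Item(current.version + 1, bid));
    }
}
```

If two users load the same item and both bid, the first placeBid succeeds and bumps the version; the second throws, and the application can reload the item and show the user what changed. No lock is held during the five minutes of reading.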
Hope that clarifies a few things for you.
Unless we are talking about some small, isolated web application (only app that is working on a database), then making all of your transactions to be Serializable would mean having a lot of confidence in your design, not taking into account the fact that it may not be the only application hitting on that certain database.
In my opinion, adopting the Serializable isolation level, or a Pessimistic Lock in other words, should be a very well thought-out decision, applied for:
Large databases and short transactions that update only a few rows
Where the chance that two concurrent transactions will modify the same rows is relatively low.
Where relatively long-running transactions are primarily read-only.
Based on my experience, in most cases using just Optimistic Locking would be the most beneficial decision, as frequent concurrent modifications happen in only a small percentage of cases.
Optimistic locking definitely also helps other applications run faster (don't think only of yourself!).
So when we take the Pessimistic - Optimistic locking strategies spectrum, in my opinion the truth lies somewhere more towards Optimistic locking, with a flavor of Serializable here and there.
I really cannot reference anything here, as the answer is based on my personal experience with many complex web projects and on my notes from when I was preparing for my JPA certification.
Hope that helps.
The Google App Engine documentation contains this paragraph:
Note: If your application receives an exception when committing a
transaction, it does not always mean that the transaction failed. You
can receive DatastoreTimeoutException,
ConcurrentModificationException, or DatastoreFailureException
exceptions in cases where transactions have been committed and
eventually will be applied successfully. Whenever possible, make your
Datastore transactions idempotent so that if you repeat a transaction,
the end result will be the same.
Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?
Edit:
it seems that this is a problem inherent in distributed systems, as per none other than Guido van Rossum -- see this link:
app engine datastore transaction exception
So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.
I was wondering if it was possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generate a GUID, and then include that GUID with the request (the same GUID would be re-sent on retries for the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found it, then this is a repeated transaction, so it would return without doing anything.
Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. Also there would be a performance hit if failed entity key lookups are slow, because almost every transaction would include a failed lookup, because most GUIDs would be new.
As for the additional $ cost of the extra datastore interactions, this would probably still be less than if I had to make every transaction idempotent individually, since that would require a lot of checking of what's in the datastore at each level.
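A minimal in-memory sketch of the GUID idea, applied to the like counter. The synchronized method stands in for a datastore transaction that writes the log record and the counter atomically; the class name and the in-memory set are assumptions for illustration, not App Engine APIs.

```java
import java.util.HashSet;
import java.util.Set;

final class IdempotentLikeCounter {
    // Stand-in for the transaction log: one record per request GUID already applied.
    private final Set<String> appliedRequestIds = new HashSet<>();
    private int likes;

    // Returns true if the like was applied, false if this GUID was already seen
    // (i.e. the request is a retry of a commit that actually succeeded).
    synchronized boolean like(String requestId) {
        if (!appliedRequestIds.add(requestId)) {
            return false;
        }
        likes++;
        return true;
    }

    synchronized int count() {
        return likes;
    }
}
```

The client generates the GUID once per click (for example with java.util.UUID.randomUUID().toString()) and re-sends the same value on every retry, so a commit that "failed" but was really applied is not counted twice.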
dan wilkerson, simon goldsmith, et al. designed a thorough global transaction system on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," ie the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in tapioca-orm.
i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.
in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.
the datastore is implemented on top of megastore, which is described in depth in this paper. in short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.
amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided eventual consistency, and eventually added a very limited kind of transactions they call conditional writes, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.
basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. this is best known by eric brewer's CAP theorem, which says the three axes of the tradeoff are consistency, availability, and partition tolerance.
The best way I came up with making counters idempotent is using a set instead of an integer in order to count. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing like this:
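import java.util.HashSet;
import java.util.Set;

class Thing {
    // Each user appears at most once, so repeating a "like" changes nothing.
    // User must implement equals() and hashCode() for this to hold.
    private final Set<User> likes = new HashSet<>();

    public void like(User u) {
        likes.add(u);
    }

    public int getLikeCount() {
        return likes.size();
    }
}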
this is in java, but i hope you get my point even if you are using python.
This method is idempotent: you can add the same user as many times as you like, and it will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of likes anyway? If you don't want to bloat the Thing object, create another object ThingLikes, and cache the like count on the Thing object.
another option worth looking into is app engine's built in cross-group transaction support, which lets you operate on up to five entity groups in a single datastore transaction.
if you prefer reading on stack overflow, this SO question has more details.