My program has users and groups.
Use-case: Users have credits they can give to the groups.
I want no credits to be lost.
The problem with an RDBMS-like schema is that MongoDB's atomicity is document-level. I reduce the credits of the user and then increase the credits of the group. If the application crashes between the two operations, the credits being sent are lost. (If I swap the order of the two actions, credits can appear without being withdrawn from the user.)
One solution could be to use a collection called transactions. The user's credits can then be calculated by subtracting the sum of the credits they have sent from their original credits, and the group's credits can be calculated as the sum of the credits it has received. The problem is that as the database grows, computing these sums will take a while.
Can you give me any acceptable, scalable and robust solution for this with NoSQL? (I know it is really easy in an RDBMS.)
Two phase commits in MongoDB
You can perform multi-document updates or multi-document transactions using a two phase commit approach.
Using two-phase commit ensures that data is consistent and, in case of an error, the state that preceded the transaction is recoverable. During the procedure, however, documents can represent pending data and states.
For more details about two-phase commits, you can refer to the docs here
NOTE
Because only single-document operations are atomic with MongoDB, two-phase commits can only offer transaction-like semantics. It is possible for applications to return intermediate data at intermediate points during the two-phase commit or rollback.
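For the use case in the question, a condensed sketch of that two-phase commit pattern with the Node.js driver could look like the following. The users, groups and transactions collections and the credits / pendingTransactions fields are assumptions fitted to this question, not verbatim from the docs:

```ts
import { Db } from 'mongodb';

// Move credits from a user to a group without losing them on a crash.
async function transferCredits(db: Db, userId: string, groupId: string, amount: number) {
  const transactions = db.collection('transactions');
  const users = db.collection('users');
  const groups = db.collection('groups');

  // 1. Record the intent as its own document (this insert is atomic on its own).
  const { insertedId: txnId } = await transactions.insertOne({
    source: userId, destination: groupId, amount,
    state: 'initial', lastModified: new Date(),
  });

  // 2. Mark it pending before touching either side.
  await transactions.updateOne({ _id: txnId }, { $set: { state: 'pending', lastModified: new Date() } });

  // 3. Apply it to both sides; the pendingTransactions guard prevents double application on retry.
  await users.updateOne(
    { _id: userId, pendingTransactions: { $ne: txnId } },
    { $inc: { credits: -amount }, $push: { pendingTransactions: txnId } }
  );
  await groups.updateOne(
    { _id: groupId, pendingTransactions: { $ne: txnId } },
    { $inc: { credits: amount }, $push: { pendingTransactions: txnId } }
  );

  // 4. Mark the transaction applied, clean up the markers, then mark it done.
  await transactions.updateOne({ _id: txnId }, { $set: { state: 'applied', lastModified: new Date() } });
  await users.updateOne({ _id: userId }, { $pull: { pendingTransactions: txnId } });
  await groups.updateOne({ _id: groupId }, { $pull: { pendingTransactions: txnId } });
  await transactions.updateOne({ _id: txnId }, { $set: { state: 'done', lastModified: new Date() } });
}
```

The other half of the pattern is a recovery job that periodically scans for transactions stuck in the pending or applied state and either finishes the remaining steps or rolls them back, so a crash in the middle never loses credits permanently.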
Related
I am building a home budget app with NextJS + Prisma ORM + PostgreSQL.
I am not sure if my current strategy for handling deleting/reverting past transactions makes sense in terms of scaling up / DB performance.
So, the app functions in this way:
The user adds transactions that are assigned to a chosen bank account. Every transaction row in the DB includes fields like amount, balanceBefore and balanceAfter.
After a successful transaction, the bank account's current balance is updated.
Now, assume a situation where multiple transactions have been inserted and the user realises he made a mistake somewhere along the line. He would then need to select the transaction and delete/update it, which would require updating the balanceBefore and balanceAfter fields of every row following this transaction so that the transaction history stays correct.
Is there a better way of handling this kind of situation? Having to update rows that are a couple of thousand records in the past might be heavy on resources.
Not only should you never delete or update a financial transaction, but your input data should not contain balances (before or after) either. Instead of updating or deleting a transaction, you generate two new ones: one that reverses the incorrect transaction (thus restoring the balances) and one that inserts the correct values.
As for balances, do not store them; just store the transaction amount. Then create a view which calculates the balances on the fly when needed. By creating a view, you do not need to perform any calculations when DML is performed on your transactions. See the following example for both of the above.
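A minimal sketch of both ideas, assuming a Prisma setup with a hypothetical Transaction model (fields id, accountId, amount) and a plain SQL view over the underlying table:

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Correct a mistake by appending two new rows: a reversal and the corrected entry.
// The original row is never updated or deleted.
async function correctTransaction(id: number, correctedAmount: number) {
  const wrong = await prisma.transaction.findUniqueOrThrow({ where: { id } });
  await prisma.transaction.createMany({
    data: [
      { accountId: wrong.accountId, amount: -wrong.amount },   // reversal restores the balance
      { accountId: wrong.accountId, amount: correctedAmount }, // corrected entry
    ],
  });
}

// Balances on the fly: a view created once (e.g. in a migration), so no balanceBefore /
// balanceAfter columns ever need maintaining when rows are inserted.
const createBalanceView = `
  CREATE VIEW account_balances AS
  SELECT "accountId", SUM(amount) AS balance
  FROM "Transaction"
  GROUP BY "accountId";
`;
```

If you also need a per-row running balance for a statement view, a window function such as SUM(amount) OVER (PARTITION BY "accountId" ORDER BY "createdAt") does the same thing without storing anything.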
We can undo an action using the Command or Memento pattern.
If we are using Kafka, then we can replay the stream in reverse order to go back to the previous state.
For example, Google docs/sheet etc. also has version history.
In the case of PCPartPicker, it looks like the following:
To be safe, I want to commit everything but be able to go back to the previous state if needed.
I know we can disable auto-commit and use Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT). But I am talking about undoing even after I have committed the change.
How can I do that?
There isn't a real generic answer to this question. It all depends on the structure of your database, the span of the transactions across entities, distributed transactions, how much time / how many transactions are allowed to pass before you can revert the change, etc.
Memento-like pattern
The Memento pattern is one of the possible approaches; however, it needs to be modified due to the nature of relational databases, as follows:
You need a transaction log table/list that holds the information about the entities and attributes (tables and columns) that were affected by the transaction, with their primary key, the old and new values (values before the transaction occurred and values after it), as well as a timestamp. This is the same as in the Command (Memento) pattern.
Next you need a mechanism to identify the non-explicit updates that were triggered by stored procedures in the database as a consequence of the transaction. This is important, since a change in one table can trigger changes in other tables which were not explicitly captured by the command.
The rollback mechanism needs to determine whether the transaction is eligible for rollback by building a list of subsequent transactions on the same entities: either this transaction can be rolled back on its own, or some subsequent transactions need to be rolled back as well before it can be.
If a rollback is allowed after a longer period of time, or the data is consumed in near real time, there should also be a list of transaction observers: processes that need to be informed that the transaction is no longer valid, since they have already read the new data and made decisions based on it. An example would be a process generating a cumulative report. When the transaction is rolled back, the rollback invalidates the report, so the report needs to be generated again.
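A rough sketch of what such a transaction-log entry and the eligibility check could look like; every name here is illustrative rather than taken from any particular schema:

```ts
// One row in the transaction log: which table/column/row changed, from what to what, and when.
interface TransactionLogEntry {
  transactionId: string;
  tableName: string;        // affected entity
  columnName: string;       // affected attribute
  primaryKey: string;       // row identifier
  oldValue: string | null;  // value before the transaction
  newValue: string | null;  // value after the transaction
  occurredAt: Date;
  explicit: boolean;        // false for changes made by triggers/stored procedures
}

// A transaction is only safely reversible if no later transaction touched the same rows;
// otherwise those later transactions would have to be rolled back first.
function canRollBack(target: TransactionLogEntry[], log: TransactionLogEntry[]): boolean {
  const touched = new Set(target.map(e => `${e.tableName}:${e.primaryKey}`));
  const latest = Math.max(...target.map(e => e.occurredAt.getTime()));
  return !log.some(e =>
    e.occurredAt.getTime() > latest && touched.has(`${e.tableName}:${e.primaryKey}`)
  );
}
```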
For a short-term rollback, mainly used for distributed transactions, you can look at the microservices Saga pattern and use it as a starting point for building your solution.
History tables
Another approach is to keep incremental updates, also known as history tables, where each update of a row becomes an insert of a new version into the history table. As in the previous case, you need to decide how far back in the history you can go when you try to roll back a committed transaction.
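A minimal sketch of the history-table idea, using Prisma with a hypothetical accountHistory model (accountId, balance, version); the model and field names are assumptions:

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Never update a row in place; append the next version instead.
async function updateBalance(accountId: number, newBalance: number) {
  const latest = await prisma.accountHistory.findFirst({
    where: { accountId },
    orderBy: { version: 'desc' },
  });
  await prisma.accountHistory.create({
    data: { accountId, balance: newBalance, version: (latest?.version ?? 0) + 1 },
  });
}
```

"Rolling back" a committed change is then just appending an older version's values as a newer version, which keeps the full history intact.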
Regulation issues
Finally, when you work with business data such as invoices, inventory, etc., you also need to check the regulations related to cancelling committed transactions. For example, in accounting systems it is not allowed to delete data; instead, a new compensating row is added (e.g. removing a product from a shipment list will not delete the product, but will add a row with a negative quantity to cancel the effect of the original row and keep an audit trail of the change at the same time).
I have an Entity that represents a Payment Method. I want to have an entity group for all the payment attempts performed with that payment method.
The 1 write-per-second limitation is fine and actually good for my use case, as there is no good reason to charge a specific credit card more frequently than that, but I could not find any specifications on the max size of an entity group.
My concern is would a very active corporate account hit any limitations in terms of number of records within an entity group (when they perform their 1 millionth transaction with us)?
No, there isn't a limit on the entity group size; all datastore-related limits are documented at Limits.
But be aware that the entity group size matters when it comes to data contention, see Keep entity groups small. Please note that contention happens not only when writing entities, but also when reading them inside a transaction (see Contention problems in Google App Engine) or, occasionally, maybe even outside transactions (see TransactionFailedError on GAE when no transaction).
IMHO your use case is not worth the risk of dealing with these issues (fairly difficult to debug and address), so I wouldn't use a single entity group in this case.
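For illustration, here is roughly what the two choices look like with the Node.js Datastore client; the kind names and the paymentMethodId property are assumptions for this question:

```ts
import { Datastore } from '@google-cloud/datastore';

const datastore = new Datastore();

// Option A: child entity in the PaymentMethod entity group (ancestor key) --
// every attempt write contends with every other write in that group.
const groupedKey = datastore.key(['PaymentMethod', 'pm-123', 'PaymentAttempt', 'attempt-1']);

// Option B: root entity that merely references the payment method by a property --
// no shared entity group, so no 1-write-per-second ceiling across attempts.
const rootKey = datastore.key(['PaymentAttempt', 'attempt-1']);
await datastore.save({
  key: rootKey,
  data: { paymentMethodId: 'pm-123', amount: 100, createdAt: new Date() },
});
```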
After posting the question here, I got to know that NoSQL databases are better at scaling out because they trade off transaction support for scalability.
So I wonder: in what circumstances are transactions not that important, so that scalability is preferable to transaction support?
Well, I would say first that NoSQL is better at scaling in some situations, but not all.
Full ACID transactions are Atomic, Consistent, Isolated and Durable. If you lose transactions, you will lose some or all of ACID within the datastore.
There are many ways to restore these functions with other asynchronous systems like message queues that themselves are durable. You can shove data onto a durable message queue, pop the data and deal with it in your NoSQL, then, when you can confirm it's stored to your required minimum, you can flag the message as consumed. It's the D in ACID, but distributed and asynchronous. There are ways to ensure the others, but they are often sacrificed to some extent, or moved into another place in the system. With some NoSQL solutions, you just have to move consistency into the application so it doesn't try to store invalid data.
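A sketch of that queue-backed durability, using deliberately hypothetical queue and store interfaces (any durable queue such as Kafka, SQS or RabbitMQ, and any NoSQL client with write acknowledgement, would fit the same shape):

```ts
// Hypothetical interfaces for illustration only.
interface DurableQueue {
  pop(): Promise<{ id: string; payload: unknown } | null>;
  ack(id: string): Promise<void>;                        // flag the message as consumed
}
interface NoSqlStore {
  write(doc: unknown): Promise<{ acknowledged: boolean }>; // e.g. a majority-acknowledged write
}

// Durability lives on the queue: the message is only acknowledged once the store
// confirms the write, so a crash in between just means the message is redelivered.
async function drain(queue: DurableQueue, store: NoSqlStore): Promise<void> {
  for (let msg = await queue.pop(); msg !== null; msg = await queue.pop()) {
    const result = await store.write(msg.payload);
    if (result.acknowledged) {
      await queue.ack(msg.id);
    }
    // If not acknowledged (or we crash here), the message comes back later,
    // so the write handler must be idempotent.
  }
}
```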
When you start moving away from database driven transactions, you must increase your application testing dramatically to ensure your system doesn't fail (for some values of fail).
There are essentially no situations where transactions and constraints are not important in a system that has both read and write requirements. If they weren't, you wouldn't care about your data at all (and some people don't, but regret it later). There are, however, levels of "caring". It's just a matter of how you end up at ACID or some pseudo-ACID that's "good enough". An RDBMS makes caring about your data cheap. NoSQL makes caring about your data expensive, but it makes scaling cheap(er) (in some cases). There are many companies with multi-terabyte databases in RDBMSes, so to say unilaterally that "they don't scale" is simply inaccurate. Multi-terabyte SQL databases, however, can cost lots of money, depending on the use case (you can after all just slap a RAID 10 array with a few 3TB drives onto a computer and throw a database engine on it; it might take several minutes to a few hours to do any kind of table scan on a big table, or even an indexed look-up, but if you don't care, it's cheap and multi-terabyte).
The biggest category is read-only type queries, where an aborted or botched transaction can simply be repeated. Anything where you are changing an underlying state, or want to guarantee once and only once activity, should have proper transactional semantics.
That is, "I want to order one widget, charge my credit card" should be a proper transaction: I don't want my card charged unless the widget is ordered, and the vendor doesn't want the widget sent unless the card is charged. "Report the shipment status of order xyz" doesn't need to be transactional -- if I don't get an answer, I can hit reload.
Much of it is just a bit of lateral thinking.
The whole point of a transaction is that you wrap up several operations, and should any fail, all that have succeeded get rolled back; and while the transaction is in progress, records are locked, and unless you have read uncommitted enabled, you don't see any of the individual changes of state until the transaction is committed.
Doing all that with distributed systems is expensive, because you need one 'central', difficult-to-scale point that needs to 'know' all about the others.
So instead of "Order this, charge my card, and show me my current balance",
you do "Try to order this; if it's in stock, charge my card; and if my card gets charged, the current known balance will be this."
There's a risk that the order will be placed but payment will fail, so you need to deal with that. There's a risk that the proposed balance of the card may not be entirely accurate, hence add weasel words and show the potential effect of the payment as opposed to the result.
It's not so much "are transactions important?" as "seeing that they aren't as well supported in NoSQL systems, where and how can I get away with not using them?"
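A sketch of that decomposed flow: each step is its own small operation, and a failure is handled with a compensating action instead of one big locking transaction. The service calls here (reserveStock, chargeCard, cancelReservation, estimateBalance) are assumed stubs, not a real API:

```ts
// Assumed service calls, declared as stubs for illustration only.
declare function reserveStock(item: string): Promise<{ ok: boolean; id: string }>;
declare function chargeCard(card: string, amount: number): Promise<void>;
declare function cancelReservation(id: string): Promise<void>;
declare function estimateBalance(card: string): Promise<number>;

async function tryToOrder(item: string, card: string, amount: number) {
  const reservation = await reserveStock(item);            // "try to order this"
  if (!reservation.ok) return { status: 'OUT_OF_STOCK' as const };

  try {
    await chargeCard(card, amount);                        // "if it's in stock, charge my card"
  } catch {
    await cancelReservation(reservation.id);               // compensate: undo the reservation
    return { status: 'PAYMENT_FAILED' as const };
  }

  // "the current known balance will be this" -- an estimate, hence the weasel words
  return { status: 'ORDERED' as const, estimatedBalance: await estimateBalance(card) };
}
```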
I saw this sentence in more than one place:
"A transaction should be kept as short as possible to avoid concurrency issues and to enable maximum number of positive commits."
What does this really mean?
It puzzles me because I want to use transactions for my app, which in normal use will deal with inserting hundreds of rows from many clients, concurrently.
For example, I have a service which exposes a method: AddObjects(List<Objects>), and of course these objects contain other, different nested objects.
I was thinking of starting a transaction for each call from the client, performing the appropriate actions (a bunch of inserts/updates/deletes for each object with its nested objects). EDIT1: I meant a transaction for the entire "AddObjects" call in order to prevent undefined states/behaviour.
Am I going in the wrong direction? If yes, how would you do that and what are your recommendations?
EDIT2: Also, I understood that transactions are fast for bulk operations, but that somehow contradicts the quoted sentence. What is the conclusion?
Thanks in advance!
A transaction has to cover a business-specific unit of work. It has nothing to do with generic 'objects'; it must always be expressed in domain-specific terms: 'debit of account X and credit of account Y must be in a transaction', 'subtraction of an inventory item and the sale must be in a transaction', etc. Everything that must either succeed together or fail together must be in a transaction. If you are down an abstract path of 'adding objects to a list is a transaction' then yes, you are on the wrong path. The fact that all inserts/updates/deletes triggered by an object save are in a transaction is not a purpose, but a side effect. The correct semantics should be 'update of object X and update of object Y must be in a transaction'. Even the degenerate case of a single 'object' being updated should still be regarded in domain-specific terms.
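For illustration, here is what a transaction expressed in domain terms might look like with Prisma; the account model and balance field are assumptions. The unit of work is "debit X and credit Y together", not "save this list of objects":

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Both updates commit together or not at all; that pairing is the business rule.
async function transfer(fromId: number, toId: number, amount: number) {
  await prisma.$transaction([
    prisma.account.update({ where: { id: fromId }, data: { balance: { decrement: amount } } }),
    prisma.account.update({ where: { id: toId },   data: { balance: { increment: amount } } }),
  ]);
}
```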
That recommendation is best understood as: do not allow user interaction within a transaction. If you need to ask the user during a transaction, roll back, ask, and run it again.
Other than that, do use transactions whenever you need to ensure atomicity.
It is not the transactions' fault that they may cause "concurrency issues"; it is the fact that the database might need some more thought, a better set of indices or a more standardized data access order.
"A transaction should be kept as short as possible to avoid concurrency issues and to enable maximum number of positive commits."
The longer a transaction is kept open the more likely it will lock resources that are needed by other transactions. This blocking will cause other concurrent transactions to wait for the resources (or fail depending on the design).
SQL Server is usually set up in autocommit mode. This means that every SQL statement is a distinct transaction. Many times you want to use a multi-statement transaction so you can commit or roll back multiple updates together. The longer the updates take, the more likely other transactions will conflict.
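One way to follow the "keep it short" advice is to do all the slow work before opening the transaction and wrap only the related writes. A sketch using Prisma with assumed invoice/ledgerEntry models:

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function postInvoice(invoiceId: number, total: number) {
  // Reads, validation, and anything user-facing happen before the transaction,
  // so no locks are held while they run.
  const invoice = await prisma.invoice.findUniqueOrThrow({ where: { id: invoiceId } });
  if (invoice.status !== 'DRAFT') throw new Error('invoice already posted');

  // Only the multi-statement update is wrapped, so locks are held as briefly as possible.
  await prisma.$transaction([
    prisma.invoice.update({ where: { id: invoiceId }, data: { status: 'POSTED' } }),
    prisma.ledgerEntry.create({ data: { invoiceId, amount: total } }),
  ]);
}
```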