We can undo an action using the Command or Memento pattern.
If we are using Kafka, then we can replay the stream to go back to a previous state.
For example, Google Docs and Sheets also have version history, and PCPartPicker offers something similar.
To be safe, I want to commit everything, but I also want to be able to go back to a previous state if needed.
I know we can disable auto-commit and use Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT). But I am talking about undoing a change even after I have committed it.
How can I do that?
There isn't a real generic answer to this question. It all depends on the structure of your database, the span of the transactions across entities, distributed transactions, how much time or how many transactions are allowed to pass before you can revert the change, etc.
Memento-like pattern
The Memento pattern is one possible approach; however, it needs to be adapted to the nature of relational databases, as follows:
You need a transaction log table that holds, for each transaction, the entities and attributes (tables and columns) that were affected, their primary keys, the old and new values (the values before and after the transaction occurred), and a timestamp. This is the same idea as in the Command (Memento) pattern; a sketch of such a table follows these steps.
Next, you need a mechanism to identify the non-explicit updates triggered by stored procedures in the database as a consequence of the transaction. This is important, since a change in one table can trigger changes in other tables that were not explicitly captured by the command.
The rollback mechanism needs to determine whether a transaction is eligible for rollback by building a list of subsequent transactions on the same entities, and checking whether some of those subsequent transactions would need to be rolled back first.
If a rollback is allowed after a longer period of time, or the data is consumed in near real time, there should also be a list of transaction observers: processes that need to be informed that the transaction is no longer valid, because they have already read the new data and made decisions based on it. An example is a process generating a cumulative report; rolling the transaction back invalidates the report, so the report needs to be generated again.
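As a rough illustration of the first step, a transaction log table in generic SQL could look like the following (all table and column names here are illustrative assumptions, not a prescribed schema):

-- One row per affected column, per affected row, per transaction
CREATE TABLE transaction_log (
    log_id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    transaction_id  BIGINT        NOT NULL,   -- groups everything touched by one business transaction
    table_name      VARCHAR(128)  NOT NULL,   -- affected entity (table)
    column_name     VARCHAR(128)  NOT NULL,   -- affected attribute (column)
    row_pk          VARCHAR(256)  NOT NULL,   -- primary key of the affected row
    old_value       VARCHAR(4000),            -- value before the transaction
    new_value       VARCHAR(4000),            -- value after the transaction
    changed_at      TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Undoing a logged transaction then means writing old_value back to the row identified by (table_name, row_pk) for every log entry, after the eligibility checks described above.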
For a short-term rollback, mainly used for distributed transactions, you can look at the microservices Saga pattern and use it as a starting point for your solution.
History tables
Another approach is to keep incremental updates, also known as history tables, where each update of a row is recorded as an insert of a new version into the history table. As in the previous case, you need to decide how far back in the history you can go when rolling back a committed transaction.
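A rough sketch in generic SQL (illustrative names; the real structure depends on your tables):

-- Current state lives in customer; every change also inserts a new version here
CREATE TABLE customer_history (
    customer_id  BIGINT        NOT NULL,
    version_no   INT           NOT NULL,   -- incremented on every change
    name         VARCHAR(200)  NOT NULL,
    email        VARCHAR(200),
    valid_from   TIMESTAMP     NOT NULL,
    PRIMARY KEY (customer_id, version_no)
);

-- An update of customer 42 becomes an insert of its next version
INSERT INTO customer_history (customer_id, version_no, name, email, valid_from)
VALUES (42, 7, 'Jane Doe', 'jane@example.com', CURRENT_TIMESTAMP);

Rolling back a committed change then means re-applying the values of an earlier version to the main table, subject to the same eligibility checks as above.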
Regulatory issues
Finally, when you work with business data such as invoices or inventory, you also need to check the regulations related to cancelling committed transactions. For example, in accounting systems it is not allowed to delete data; instead, a compensating row is added (e.g. removing a product from a shipment list does not delete the original row, but adds a row with a negative quantity to cancel its effect while keeping an audit trail of the change).
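For illustration, assuming a shipment_line table, the cancellation is an insert rather than a DELETE:

-- Original line: 3 units of product 10 on shipment 555
INSERT INTO shipment_line (shipment_id, product_id, quantity) VALUES (555, 10, 3);

-- "Removing" the product later: add a compensating row that cancels the original one
INSERT INTO shipment_line (shipment_id, product_id, quantity) VALUES (555, 10, -3);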
Related
I am building a home budget app with NextJS + Prisma ORM + PostgreSQL.
I am not sure whether my current strategy for handling the deletion/reverting of past transactions makes sense in terms of scaling and DB performance.
So, the app functions in this way:
The user adds transactions that are assigned to a chosen bank account. Every transaction row in the DB includes fields like amount, balanceBefore and balanceAfter.
After a successful transaction, the bank account's current balance is updated.
Now, assume multiple transactions have been inserted and the user realises he made a mistake somewhere along the line. He would then need to select the transaction and delete/update it, which would require updating every row that follows it so that the balanceBefore and balanceAfter fields in the transaction history stay correct.
Is there a better way of handling this kind of situation? Having to update rows that are a couple of thousand records in the past might be heavy on resources.
Not only should you never delete or update a financial transaction, but your input data should not contain balances (before or after) either. Instead of updating or deleting a transaction, you generate two new ones: one that reverses the incorrect transaction (thus restoring the balances) and one that inserts the correct values.
As for balances, do not store them; just store the transaction amount. Then create a view which calculates the balances on the fly when needed. By creating a view you do not need to perform any calculations when DML is performed on your transactions. See the following example for both of the above.
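The example below is a sketch in PostgreSQL; all table and column names are illustrative:

CREATE TABLE account_transaction (
    id          BIGSERIAL PRIMARY KEY,
    account_id  BIGINT         NOT NULL,
    amount      NUMERIC(12,2)  NOT NULL,  -- signed: deposits positive, withdrawals negative
    created_at  TIMESTAMP      NOT NULL DEFAULT now(),
    reverses_id BIGINT REFERENCES account_transaction (id)  -- set when this row reverses another one
);

-- Correcting a mistaken 100.00 (transaction 42) that should have been 10.00:
INSERT INTO account_transaction (account_id, amount, reverses_id) VALUES (1, -100.00, 42);  -- reversal
INSERT INTO account_transaction (account_id, amount) VALUES (1, 10.00);                     -- correct entry

-- Balances are never stored; they are calculated on the fly
CREATE VIEW account_balance AS
SELECT account_id, SUM(amount) AS balance
FROM account_transaction
GROUP BY account_id;

-- A running balance per transaction (the balanceAfter you currently store) can be derived the same way
CREATE VIEW transaction_with_balance AS
SELECT id, account_id, amount,
       SUM(amount) OVER (PARTITION BY account_id ORDER BY created_at, id) AS balance_after
FROM account_transaction;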
When running a CREATE OR REPLACE TABLE AS statement in one session, are other sessions able to query the existing table, before the transaction opened by CORTAS is committed?
From reading the usage notes section of the documentation, it appears this is the case. Ideally I'm looking for someone who's validated this in practice and at scale, with a large number of read operations on the target table.
Using OR REPLACE is the equivalent of using DROP TABLE on the existing table and then creating a new table with the same name; however, the dropped table is not permanently removed from the system. Instead, it is retained in Time Travel. This is important to note because dropped tables in Time Travel can be recovered, but they also contribute to data storage for your account. For more information, see Storage Costs for Time Travel and Fail-safe.
In addition, note that the drop and create actions occur in a single atomic operation. This means that any queries concurrent with the CREATE OR REPLACE TABLE operation use either the old or new table version.
Recreating or swapping a table drops its change data. Any stream on the table becomes stale. In addition, any stream on a view that has this table as an underlying table becomes stale. A stale stream is unreadable.
I have not "proving it via performance tests to prove it happens" but we did run for 5 years, where we read from tables of on set of warehouses and rebuilts underlying tables of overs and never noticed "corruption of results".
I always thought of snowflake like double buffer in computer graphics, you have the active buffer, that the video signal is reading from (the existing tables state) and you are writing to the back buffer while a MERGE/INSERT/UPDATE/DELETE is running, and when that write transaction is complete the active "current page/files/buffer" is flipped, all going forward reads are now from the "new" state.
Given the files are immutable, the double buffer analogy holds really well (aka this is how time travel works also). Thus there is just a "global state of what is current" maintained in the meta data.
To the CORTAS / Transaction, I would assume as that is an DDL operation, if you had any transactions that it completes them, like all DDL operations do. So perhaps prior in my double buffer story, that is a hiccup to understand.
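To make that concrete, here is a sketch with illustrative table names:

-- Session 1: rebuild the table; the swap to the new version happens atomically
CREATE OR REPLACE TABLE reporting.orders_flat AS
SELECT o.order_id, o.customer_id, c.region, o.amount
FROM raw.orders o
JOIN raw.customers c ON c.customer_id = o.customer_id;

-- Session 2, running concurrently: resolves against either the old or the new version, never a mix
SELECT COUNT(*) FROM reporting.orders_flat;

-- The replaced (dropped) version is retained in Time Travel; if needed it can be recovered with
-- UNDROP TABLE after renaming the current table out of the way:
-- ALTER TABLE reporting.orders_flat RENAME TO orders_flat_rebuilt;
-- UNDROP TABLE reporting.orders_flat;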
I have two tables in DynamoDB. One has data about homes, one has data about businesses. The homes table has a list of the closest businesses to it, with walking times to each of them. That is, the homes table has a list of IDs which refer to items in the businesses table. Since businesses are constantly opening and closing, both these tables need to be updated frequently.
The problem I'm facing is that, when either one of the tables is updated, the other table will have incorrect data until it is updated itself. To make this clearer: let's say one business closes and another one opens. I could update the businesses table first to remove the old business and add the new one, but the homes table would then still refer to the now-removed business. Similarly, if I updated the homes table first to refer to the new business, the businesses table would not yet have this new business' data yet. Whichever table I update first, there will always be a period of time where the two tables are not in synch.
What's the best way to deal with this problem? One way I've considered is to do all the updates to a secondary database and then swap it with my primary database, but I'm wondering if there's a better way.
Thanks!
DynamoDB only offers atomic operations at the item level, not at the transaction level, but you can get something similar to an atomic transaction by enforcing some rules in your application.
Let's say you need to run a transaction with two operations:
Delete Business(id=123) from the table.
Update Home(id=456) to remove association with Business(id=123) from the home.businesses array.
Here's what you can do to mimic a transaction:
Generate a timestamp for locking the items
Let's say our current timestamp is 1234567890. Using a timestamp will allow you to clean up failed transactions (I'll explain later).
Lock the two items
Update both Business-123 and Home-456 and set an attribute lock=1234567890.
Do not change any other attributes yet on this update operation!
Use a ConditionExpression (check the Developer Guide and API) to verify attribute_not_exists(lock) before updating. This way you're sure no other process is using the same items.
Handle update lock responses
Check whether both updates, on Home and on Business, succeeded. If both did, you can proceed with the actual changes you need to make: delete Business-123 and update Home-456, removing the Business association.
For extra care, also use a ConditionExpression in both updates again, but now ensuring that lock == 1234567890. This way you're extra sure no other process overwrote your lock.
If both updates succeed again, you can consider the two items updated and consistent to be read by other processes. To do this, run a third update removing the lock attribute from both items.
When one of the operations fails, you may retry X times, for example. If it fails all X times, make sure the process cleans up the lock that was successfully set on the other item.
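If you prefer to express these conditional writes in SQL form, DynamoDB's PartiQL interface (ExecuteStatement) can be used for the same checks. This is only a sketch: it assumes tables named Business and Home with an id key, and it assumes that extra WHERE predicates on an UPDATE/DELETE behave as a condition check (failing with ConditionalCheckFailedException when not met), i.e. the same attribute_not_exists(lock) and lock = timestamp conditions described above. The same pattern also covers the read-locking steps further below.

-- Acquire the lock on both items, only if nobody else holds one
UPDATE "Business" SET "lock" = 1234567890 WHERE "id" = '123' AND "lock" IS MISSING;
UPDATE "Home"     SET "lock" = 1234567890 WHERE "id" = '456' AND "lock" IS MISSING;

-- Both lock updates succeeded: perform the real changes, still guarded by our lock value
DELETE FROM "Business" WHERE "id" = '123' AND "lock" = 1234567890;
UPDATE "Home" SET "businesses" = ? WHERE "id" = '456' AND "lock" = 1234567890;  -- new list without Business-123, passed as a bound parameter

-- Release the remaining lock (Business-123 is already gone)
UPDATE "Home" REMOVE "lock" WHERE "id" = '456' AND "lock" = 1234567890;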
Enforce the transaction lock throughout your code
Always use a ConditionExpression in any part of your code that may update/delete Home and Business items. This is crucial for the solution to work.
When reading Home and Business items, you'll need to do the following (this may not be necessary for all reads; you decide whether you need to ensure consistency from start to finish while working with an item read from the DB):
Retrieve the item you want to read
Generate a lock timestamp
Update the item with lock=timestamp using a ConditionExpression
If the update succeeds, continue using the item normally; if not, wait one or two seconds and try again;
When you're done, update the item removing the lock
Regularly clean up failed transactions
Every minute or so, run a background process to look for potentially failed transactions. If your processes take at most 60 seconds to finish and there is an item whose lock value is older than, say, 5 minutes (remember the lock value is the time the transaction started), it's safe to say that the transaction failed at some point and whatever process was running it didn't properly clean up the locks.
This background job ensures that no item stays locked forever.
Beware that this implementation does not ensure a truly atomic and consistent transaction in the sense that traditional ACID databases do. If this is mission-critical for you (e.g. you're dealing with financial transactions), do not attempt to implement it. Since you said you're OK with atomicity being broken on rare failure occasions, you may live with it happily. ;)
Hope this helps!
We have a fairly large stored proc that merges two people found in our system with similar names. It deletes and updates rows in many different tables (about 10 or so, all very different). It's all wrapped in a transaction and rolls back if it fails, of course. This may be a dumb question, but is it possible to somehow store and roll back just this specific transaction at a later time without having to create and insert into many "history" tables that track exactly what happened? I don't want to restore the whole database, just the results of a stored procedure's specific transaction, and at a later date.
It sounds like you may want to investigate Change Data Capture.
It will still be capturing data as it changes, and if you're only doing it for one execution or for a very small amount of data, other methods may be better.
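For reference, enabling it in SQL Server looks roughly like this (the schema and table names are placeholders); CDC then records before and after images of changed rows, which you could use to build a reversal script:

-- Once per database
EXEC sys.sp_cdc_enable_db;

-- Once per table touched by the merge procedure
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Person',
    @role_name     = NULL;

-- Captured changes (including pre-update values) become queryable, e.g.:
SELECT * FROM cdc.dbo_Person_CT;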
Once a transaction has been committed, it's not possible to roll back just that one transaction at a later date. You're "committed" in quite a literal sense. You can obviously restore from a backup and roll back every transaction since a particular point, but that's probably not what you are looking for.
So creating audit tables is about your only option. As another answer pointed out, you can use Change Data Capture, but unless you have forked out the big money for Enterprise Edition, that isn't an option for you. If you're just interested in undoing this particular type of change, it's probably easiest to add some code to the procedure that does the merge to store the records necessary to re-split the people, and to create a procedure that performs the actual split. Keep in mind that you must handle any changes to the "merged" data that might break your ability to perform the split. This is why SQL can't do it for you automatically: it doesn't know how you might want to handle changes to the data that occur after your original transaction.
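A minimal sketch of that idea using the T-SQL OUTPUT clause inside the merge procedure (all table, column, and variable names here are made up for illustration):

-- Stash rows being deleted so a "split" procedure can re-insert them later
DELETE FROM dbo.PersonPhone
OUTPUT deleted.PersonPhoneId, deleted.PersonId, deleted.PhoneNumber, @MergeId, SYSUTCDATETIME()
INTO dbo.PersonPhone_MergeAudit (PersonPhoneId, PersonId, PhoneNumber, MergeId, AuditedAt)
WHERE PersonId = @LosingPersonId;

-- Stash the pre-update image of rows being repointed to the surviving person
UPDATE dbo.Invoice
SET PersonId = @SurvivingPersonId
OUTPUT deleted.InvoiceId, deleted.PersonId, @MergeId, SYSUTCDATETIME()
INTO dbo.Invoice_MergeAudit (InvoiceId, OldPersonId, MergeId, AuditedAt)
WHERE PersonId = @LosingPersonId;

The split procedure then reads the audit rows for a given @MergeId and reverses each change, subject to the caveat above about data that changed in the meantime.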
In my case, I have groups of records, and when I update one of them, I need to update the rest of the records in the group. So I need to ensure that I update all of them, and that no other records are added to the group while I am updating one record of it. I also need to ensure that a record is not removed from the group in the middle of the process, because then I would be updating a record that no longer belongs to the group.
So I am thinking about the possibility of locking a record as soon as it is read. In the documentation I see that the most restrictive isolation level is SERIALIZABLE, but I have a doubt, because it says:
Statements cannot read data that has been modified but not yet committed by other transactions.
Other statements can't read the record if it has been modified but not committed, but they can read it if it has not been modified yet, so I could get stale information that I need in order to decide which related records to update.
So I would like to know whether there is another way to lock records as soon as they are read. I know that with hints I can lock a table when I execute a statement, but that locks the whole table. The process I need to execute is very fast, but I would like to avoid locking the whole table and lock only the records I need.
Yes, you can use the SERIALIZABLE isolation level. The next points in the document you linked say:
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Which is what you need.
Remember that you can change the isolation level. You can raise it to SERIALIZABLE to do your job and then set it back to what you had before. The locks acquired while the isolation level was SERIALIZABLE will stay in place until the end of the transaction.
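A minimal sketch of that flow in T-SQL (table, column, and variable names are assumptions; range locking works best with an index on the grouping column):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

    -- Reading the group takes key-range locks: existing members cannot be changed or removed,
    -- and no new rows can be inserted into this group, until we commit
    SELECT RecordId, Amount
    FROM dbo.GroupRecord
    WHERE GroupId = @GroupId;

    -- Now it is safe to update every member based on what was just read
    UPDATE dbo.GroupRecord
    SET Amount = Amount + @Delta
    WHERE GroupId = @GroupId;

COMMIT TRANSACTION;

-- Optionally return the session to its previous isolation level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;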
Manipulating multi-version concurrency control (MVCC) is something to be done with care. Yes, SERIALIZABLE is a two-edged sword that can get you into trouble. But here's the key point in the documentation you referenced:
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
It sounds like protecting key ranges is really what you want to do, and SERIALIZABLE is the only somewhat sane way to do that (that I know of).
So, you are on the right track with SERIALIZABLE. Just be careful, test thoroughly, and make your transactions complete quickly.