How to block records when are read? - sql-server

In my case, I have groups of records and when I update one of them, I need to update the rest of the records of the group. So to ensure that I update all of them and I need to ensure that no other records are added to the group when I am updating one record of the group. Also I need to ensure that a record of the group is not removed in the middle of the process because then I update a record that really isn't belong to the group.
So I am thinking in the possibility to block a record just when it is read. In the documenation I see the the more restrictive isolation is the serializable isolation, but I have a doubt because it says:
Statements cannot read data that has been modified but not yet committed by other transactions.
Other statement can't read the record if it modified and not committed, but can be read if still is not modified, so I can get unupdated information that I need to decide what related records I need to update.
So I would like to know if it is another way to block records just when it is read. I know that with the hints I can block a table when I execute a statement, but block all the table. The process that I need to execute it's very fast, but I would like to avoid the need to block all the table and block only the records that I need.

Yes, you can use SERIALIZABLE isolation level. In the next point in the document you linked, it says:
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Which is what you need.
Remember that you can change the isolation level. You can raise it to SERIALIZABLE to do your job and then move it back to what you had before. The locks put in place when the isolation level was SERIALIZABLE will stay in place until the end of transaction.

Manipulating multi-view concurrency control (MVCC) is something to be done with care. Yes, SERIALIZABLE is a two-edged sword that can get you into trouble. But, here's the key point in the documentation you referenced:
Other transactions cannot insert new rows with key values that would
fall in the range of keys read by any statements in the current
transaction until the current transaction completes.
It sounds like protecting key ranges is really what you want to do, and serialized is the only somewhat sane way to do that (that I know of).
So, you are on the right track with SERIALIZABLE. Just be careful, test thoroughly, and make your transactions complete quickly.

Related

design pattern for undoing after I have commited the changes

We can undo an action using Command or Memento pattern.
If we are using kafka then we can replay the stream in reverse order to go back to the previous state.
For example, Google docs/sheet etc. also has version history.
in case of pcpartpicker, it looks like the following:
For being safe, I want to commit everything but want to go back to the previous state if needed.
I know we can disable auto-commit and use Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT). But I am talking about undoing even after I have commited the change.
How can I do That?
There isn't a real generic answer to this question. It all depends on the structure of your database, span of the transactions across entities, distributed transactions, how much time/transactions are allowed to pass before your can revert the change, etc.
Memento like pattern
Memento Pattern is one of the possible approaches, however it needs to be modified due to the nature of the relational databases as follows:
You need to have transaction log table/list that will hold the information of the entities and attributes (tables and columns) that ware affected by the transaction with their primary key, the old and new values (values before the transaction had occurred, and values after the transaction) as well as datetimestamp. This is same with the command (memento) pattern.
Next you need a mechanism to identify the non-explicit updates that ware triggered by the stored procedures in the database as a consequence of the transaction. This is important, since a change in a table can trigger changes in other tables which ware not explicitly captured by the command.
Mechanism for rollback will need to determine if the transaction is eligible for roll-back by building a list of subsequent transactions on the same entities and determine if this transaction is eligible for roll-back, or some subsequent transaction would need to be rolled-back as well before this transaction can be rolled-back.
In case of a roll-back is allowed after longer period of time, or a near-realtime consumption of the data, there should also be a list of transaction observers, processes that need to be informed that the transaction is no longer valid since they already read the new data and took a decision based on it. Example would be a process generating a cumulative report. When transaction is rolled-back, the rollback will invalidate the report, so the report needs to be generated again.
For a short term roll-back, mainly used for distributed transactions, you can check the Microservices Saga Pattern, and use it as a starting point to build your solution.
History tables
Another approach is to keep incremental updates or also known as history tables. Where each update of the row will be an insert in the history table with new version. Similar to previous case, you need to decide how far back you can go in the history when you try to rollback the committed transaction.
Regulation issues
Finally, when you work with business data such as invoice, inventory, etc. you also need to check what are the regulations related with the cancelation of committed transactions. As example, in the accounting systems, it's not allowed to delete data, rather a new row with the compensation is added (ex. removing product from shipment list will not delete the product, but add a row with -quantity to cancel the effect of the original row and keep audit track of the change at the same time.

Do transactions in SQL Server lock all tables within the statement by default?

Suppose I have a T-SQL statement like so:
BEGIN TRAN
UPDATE dbo.TableA
...
...
...
DELETE FROM dbo.TableB
COMMIT TRAN
Suppose that the update on TableA is going to take some time.
By default, would SQL Server lock TableB until the transaction is completed? Would that mean you can't read or write to it while the update is ongoing?
Short answer: NO and NO.
Long answer:
This is, in fact, a great question as it goes deep in transaction concepts and how the engine works but I guess a complete answer can occupy a good deal of a chapter on a good book and is out of the scope of this site.
First, keep in mind the engine can work in several isolation modes: snapshot, read committed, etc. I can recommend good research on this topic (this can take a few days).
Second, the engine has the granularity level and will try to use the "smallest" one but can escalate it on demand, depending on many factors, for example: "will this operation need a page split?"
Third, BEGIN, COMMIT, ROLLBACK work more in a "semaphore" way, flagging how changes are being phased from "memory" to "disk". It's a lot more complicated than it and that why I use quotes.
That said a "default transaction" will use a row granularity in a read committed isolation mode. Nothing says how locks will be issued one way or another.
It depends on stuff like foreign keys, triggers, how much of the table is being changed, etc.
TLDR: It depends on a lot of minor details particular to your scenario. The best way to find out is by testing.
Following the comments of #Jeroen Mostert, #marc_s, and #Cato under the question, your locks on TableA and TableB here are likely to escalate to table exclusive locks as there is no "where" clause. If so, the other read and write operations from different connections may be affected based on their transaction isolation level until the end of this transaction.
Besides, locks are created on-demand; it means that the query first puts a lock on the tableA and after the execution of the update operation, it puts another lock on the tableB.

Updating database keys where one table's keys refer to another's

I have two tables in DynamoDB. One has data about homes, one has data about businesses. The homes table has a list of the closest businesses to it, with walking times to each of them. That is, the homes table has a list of IDs which refer to items in the businesses table. Since businesses are constantly opening and closing, both these tables need to be updated frequently.
The problem I'm facing is that, when either one of the tables is updated, the other table will have incorrect data until it is updated itself. To make this clearer: let's say one business closes and another one opens. I could update the businesses table first to remove the old business and add the new one, but the homes table would then still refer to the now-removed business. Similarly, if I updated the homes table first to refer to the new business, the businesses table would not yet have this new business' data yet. Whichever table I update first, there will always be a period of time where the two tables are not in synch.
What's the best way to deal with this problem? One way I've considered is to do all the updates to a secondary database and then swap it with my primary database, but I'm wondering if there's a better way.
Thanks!
Dynamo only offers atomic operations on the item level, not transaction level, but you can have something similar to an atomic transaction by enforcing some rules in your application.
Let's say you need to run a transaction with two operations:
Delete Business(id=123) from the table.
Update Home(id=456) to remove association with Business(id=123) from the home.businesses array.
Here's what you can do to mimic a transaction:
Generate a timestamp for locking the items
Let's say our current timestamp is 1234567890. Using a timestamp will allow you to clean up failed transactions (I'll explain later).
Lock the two items
Update both Business-123 and Home-456 and set an attribute lock=1234567890.
Do not change any other attributes yet on this update operation!
Use a ConditionalExpression (check the Developer Guide and API) to verify that attribute_not_exists(lock) before updating. This way you're sure there's no other process using the same items.
Handle update lock responses
Check if both updates succeeded to Home and Business. If yes to both, it means you can proceed with the actual changes you need to make: delete the Business-123 and update the Home-456 removing the Business association.
For extra care, also use a ConditionExpression in both updates again, but now ensuring that lock == 1234567890. This way you're extra sure no other process overwrote your lock.
If both updates succeed again, you can consider the two items updated and consistent to be read by other processes. To do this, run a third update removing the lock attribute from both items.
When one of the operations fail, you may try again X times for example. If it fails all X times, make sure the process cleans up the other lock that succeeded previously.
Enforce the transaction lock throught your code
Always use a ConditionExpression in any part of your code that may update/delete Home and Business items. This is crucial for the solution to work.
When reading Home and Business items, you'll need to do this (this may not be necessary in all reads, you'll decide if you need to ensure consistency from start to finish while working with an item read from DB):
Retrieve the item you want to read
Generate a lock timestamp
Update the item with lock=timestamp using a ConditionExpression
If the update succeeds, continue using the item normally; if not, wait one or two seconds and try again;
When you're done, update the item removing the lock
Regularly clean up failed transactions
Every minute or so, run a background process to look for potentially failed transactions. If your processes take at max 60 seconds to finish and there's an item with lock value older than, say 5 minutes (remember lock value is the time the transaction started), it's safe to say that this transaction failed at some point and whatever process running it didn't properly clean up the locks.
This background job would ensure that no items keep locked for eternity.
Beware this implementation do not assure a real atomic and consistent transaction in the sense traditional ACID DBs do. If this is mission critical for you (e.g. you're dealing with financial transactions), do not attempt to implement this. Since you said you're ok if atomicity is broken on rare failure occasions, you may live with it happily. ;)
Hope this helps!

is insert-select statement massive?

When multiple inserts are used with a select statement in a transaction, how does the database keep track of the changes during the transaction? Can there be problems with resources (such as memory or hard disk space) if a transaction is held open too long?
The short answer is, it depends on the size of the select. The select is part of the transaction, technically, but most selects don't have to be "rolled back", so the actual log of DB changes wouldn't include the select by itself. What it WILL include is a new row for every result from the select statement as an insert statement. If that select statement is 10k rows, the commit will be rather large, but no more so than if you'd written 10k individual insert statements within an explicit transaction.
Exactly how this works depends on the database. For example, in Oracle, it will require UNDO space (and eventually, if you run out, your transaction will be aborted, or your DBA will yell at you). In PostgreSQL, it'll prevent the vacuuming of old row versions. In MySQL/InnoDB, it'll use rollback space, and possibly cause lock timeouts.
There are several things the database must use space for:
Storing which rows your transaction has changed (the old values, the new values, or both) so that rollback can be performed
Keeping track of which data is visible to your transaction so that a consistent view is maintained (in transaction isolation levels other than read uncommitted). This overhead will often be greater the more isolation you request.
Keeping track of which data is visible to other transactions (unless the whole database is running in read uncommitted)
Keeping track of which objects which transactions have changed, so isolation rules are followed, especially in serializable isolation. (Probably not much space, but plenty of locks).
In general, you want your transactions to commit as soon as possible. So, e.g., you don't want to hold one open on an idle connection. How to best batch inserts depends on the database (often, many inserts on one transaction is better than one transaction per insert). And of course, the primary purpose of transactions is data integrity.
You can have many problems with the large transaction. First, in most databases you do not want to run row-by-row because for a million records that will take hours. But to insert a million records in one complex statement can cause locking on the tables involved and harm performance for everyone else. And a rollback if you kill the transaction can take a good while too. Usually the best alternative is to loop in batches. I usually test 50,000 at a time and raise or lower the set depending on how long that takes. I've had some databases where I do no more that 1000 in one set-based operation. If possible large inserts or updates should be scheduled for the off-peak hours that the database operates. If really large (and one-time - usually a large data migration) you might even want to close the database for maintenance, put it in single user mode and drop the indexes, do the insert and reindex.

Transactional theory

(I have a simple CRUD API in a DAO pattern implementation.)
All operations (save, load, update, delete) have a transaction id that must be supplied.
So eg. its possible to do:
...
id = begintransaction();
dao.save(o, id);
sao.update(o2, id);
rollback(id);
All examples excluding load invocations seems intuitive. But as soon as you start to load objects from the database, things "feel" a little bit different. Are load-operations, per definition, tied to a transaction? Or should my load operations be counted as a single amount of work?
Depends on the transaction isolation level (http://en.wikipedia.org/wiki/Isolation_(database_systems)) you're using, but in general they should be part of the transaction. What if somebody else is just in the middle of updating the data you're trying to read? If the read operation is not transactional, you would get old data, and maybe you're interested in the latest data.
If the database is set to decent isolation level, uncommited writes can only be read from the transaction that created them. For example, in Oracle, if a procedure inserts or updates a row and then (without commiting) calls another procedure, which uses "pragma autonomous_transaction" to run in a seperate transaction, that other procedure does not see the new row. (An excellent way to shoot yourself in the foot, btw).
For that reason, you should always consider your load operations as tied to the transaction.

Resources