Read consistency on page split - SQL Server

For simplicity, let's suppose we have some non-leaf page A whose key is an int.
We want to find key 4812, and at this point the page has entries 2311 and 5974.
So the current thread acquires a shared latch on that page and determines that it needs leaf page B (for data between 2311 and 5974).
At the same time, some other thread is inserting on page B, having previously acquired an exclusive latch on it.
Because of the insert, it has to split the page at entry 3742 and create a new page C with the upper half of the data.
The first thread has finished reading and releases the latch on page A.
If it then tries to find key 4812 on page B (after the exclusive latch is released), it won't find it, because the key was moved to page C during the page split.
If I understand correctly, a latch is implemented with a spinlock and should be short-lived.
To prevent this kind of problem, the writer thread would have to keep latches on all traversed non-leaf pages, which would be extremely inefficient.
I basically have two questions:
Are latches taken at the page level only, or can they be at the row level as well? I couldn't find information about that. If row-level latches existed, the impact wouldn't be that big, but it would still be wasteful when there are no page splits (and that's mostly the case).
Is there some other mechanism to cover this?
My question is about SQL Server because I'm familiar with its internals, but this should apply to almost any other database.
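For what it's worth, the classic mechanism that closes this window is latch coupling (also called "crabbing"): the reader keeps the parent's latch until it has latched the child, so a splitter can never move the key between the reader's two steps. A minimal sketch in Python, using plain locks as stand-ins for shared/exclusive latches (the Page class and tree shape are invented for illustration):

```python
import threading

class Page:
    """Toy B-tree page: 'latch' stands in for a real page latch."""
    def __init__(self, keys, children=None):
        self.latch = threading.Lock()
        self.keys = keys            # separator keys (internal) or stored keys (leaf)
        self.children = children    # None for leaf pages

def find_leaf(root, key):
    """Descend with latch coupling: latch the child BEFORE releasing the parent."""
    page = root
    page.latch.acquire()
    while page.children is not None:
        # pick the child subtree for 'key': count separators <= key
        idx = sum(1 for k in page.keys if k <= key)
        child = page.children[idx]
        child.latch.acquire()   # take the child latch first...
        page.latch.release()    # ...only then drop the parent latch
        page = child
    return page                 # the returned leaf is still latched by the caller

# Toy tree from the question: root [2311, 5974] -> leaf holding 3742 and 4812
leaf_b = Page([2311, 3742, 4812])
root = Page([2311, 5974], children=[Page([100]), leaf_b, Page([6000])])
leaf = find_leaf(root, 4812)
print(4812 in leaf.keys)  # True
leaf.latch.release()
```

Because the splitter needs an exclusive latch on B, and the reader only releases A after latching B, the split in the question's scenario cannot happen between the reader's lookup on A and its arrival at B. Real engines refine this further (shared/exclusive latch modes, or B-link-style right-sibling pointers so readers can chase a split instead of blocking); the sketch only shows why coupling closes the race.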

ACCESS_METHODS_HOBT_VIRTUAL_ROOT event

According to reputable websites:
ACCESS_METHODS_HOBT_VIRTUAL_ROOT
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-latch-stats-transact-sql?view=sql-server-ver15
ACCESS_METHODS_HOBT_VIRTUAL_ROOT: used to synchronize access to the root page abstraction of an internal B-tree.
Paul Randal
https://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/
ACCESS_METHODS_HOBT_VIRTUAL_ROOT
This latch is used to access the metadata for an index that contains the page ID of the index's root page. Contention on this latch can occur when a B-tree root page split occurs (requiring the latch in EX mode) and threads wanting to navigate down the B-tree (requiring the latch in SH mode) have to wait. This could be from very fast population of a small index using many concurrent connections, with or without page splits from random key values causing cascading page splits (from leaf to root).
How do I tune SQL Server to limit this wait type?

How to efficiently wait for data becoming available in an RDBMS (PostgreSQL) table?

I'm building a web service which reserves unique items for users.
The service is required to handle a high volume of concurrent requests that should avoid blocking each other as much as possible. Each incoming request must reserve n unique items of the desired type and then either process them successfully or release them back to the reservables list so they can be reserved by another request. Successful processing involves multiple steps, like communicating with integrated services and other time-consuming work, so keeping items reserved in a DB transaction until the end would not be an efficient solution.
Currently I've implemented a solution where reservable items are stored in a buffer DB table, from which incoming requests lock and delete items with SELECT FOR UPDATE SKIP LOCKED. As the service must support multiple item types, this buffer table holds only n items per type at a time; the table would otherwise grow too big, since there are about ten thousand different types. When all items of a certain type are reserved (selected and removed), the request locks the item type and adds more reservable items into the buffer. This fill operation requires integration calls and may take some time. During the fill, all other operations need to wait until the filling finishes and items become available. This is where the problem arises: when thousands of requests wait for the same item type to become available in the buffer, each needs to poll for this information somehow.
What could be an efficient solution for this kind of polling?
I think the "real" answer is to start the refill process when the stock gets low, rather than when it is completely depleted. Then the buffer would already be refilled by the time anyone needs to block on it. Or perhaps you could make the refill process work asynchronously, so that the new rows are generated near-instantly and the integrations are called later; you would enqueue the integrations rather than the consumers.
But barring that, it seems like you want the waiters to lock the "item type" in a mode incompatible with how the refiller locks it. Then they will naturally block, and be released once the refiller is done. The problem is: if you want to assemble an order of 50 things and the 47th is depleted, do you want to maintain the reservation on the previous 46 things while you wait?
Presumably your reservation is not blocking anyone else, unless the item you have reserved is the last one available. In that case you are not really blocking them, just forcing them to go through the refill process, which would have had to happen eventually anyway.
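The "block instead of poll" shape can be sketched outside the database with a condition variable: exactly one waiter performs the slow refill while every other waiter sleeps until it finishes. In PostgreSQL itself the same shape is achievable with advisory locks or LISTEN/NOTIFY; the ItemBuffer class below is invented purely for illustration:

```python
import threading

class ItemBuffer:
    """Toy per-type buffer: consumers block until a refill completes, no polling."""
    def __init__(self, refill_fn):
        self.items = []
        self.refill_fn = refill_fn          # slow integration call
        self.cond = threading.Condition()
        self.refilling = False

    def take(self):
        with self.cond:
            while not self.items:
                if self.refilling:
                    self.cond.wait()            # another thread refills; sleep
                else:
                    self.refilling = True
                    self.cond.release()         # don't hold the lock during slow I/O
                    try:
                        new_items = self.refill_fn()
                    finally:
                        self.cond.acquire()
                    self.items.extend(new_items)
                    self.refilling = False
                    self.cond.notify_all()      # wake every waiter at once
            return self.items.pop()

buf = ItemBuffer(lambda: ["item-%d" % i for i in range(100)])
print(buf.take())   # "item-99"
```

The key property matches the answer above: waiters acquire something held in an incompatible mode by the refiller, so they block and are all released together when the refill commits, instead of each one re-querying the table in a loop.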

How to increase concurrent performance of Buffer Pool in database systems?

In database system tutorials, like the textbook Database System Concepts, there is a module called the Buffer Pool / Buffer Manager / Pager / whatever. I didn't see much detail about it, so I'm curious: how do you increase its concurrent performance?
For example, let's say we have a trie index. If we do the paging inside the trie, without a buffer pool, we can easily have multiple threads concurrently load or evict leaf nodes: all you need to do is acquire shared locks on the nodes from top to bottom and an exclusive lock on the parent of the leaf node.
However, if you instead let the buffer pool handle the paging, then I suppose you might need to acquire an exclusive lock on the buffer pool. Then only a single thread can load or evict pages at a time.
Actually, I have tried this in a database implementation. The old version doesn't have a buffer pool and manages paging in the trie index itself; the new version has a buffer pool that does the job instead. There is a big lock protecting the hashmap that maps page IDs to the corresponding pages in the buffer pool. The single-thread test is 40% faster; however, with 10 concurrent threads, it is 5x slower!
I guess lock-free data structures may help? But I also guess that's going to be hard to get right. So how do you design and implement the buffer pool? Thanks!
I solved this problem thanks to the discussion here (in Chinese, sorry). The solution is quite simple: just shard the buffer manager. Each page is delegated to a shard by hashing the page number. As long as this hash function produces a uniform distribution, the probability of multiple threads waiting on the same lock will be low.
In my case, I divided the buffer manager into 128 shards, and the hash function is just page_no % 128. With 10 threads, the result of a simple benchmark looks quite amazing:
with Sharded Buffer Manager: 7.73s
with Buffer Manager: 123s
without Buffer Manager, i.e. the trie does the paging itself: 19.7s
BTW, MySQL seems to also take this approach (correct me if I misunderstood it): https://dev.mysql.com/doc/refman/5.7/en/innodb-multiple-buffer-pools.html
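A minimal sketch of the sharding idea described above (class and function names are invented; a real buffer manager also handles eviction, pinning, and dirty pages):

```python
import threading

class ShardedBufferManager:
    """Page table split into shards, each with its own lock, so lookups
    for different pages rarely contend on the same mutex."""
    def __init__(self, n_shards=128):
        self.n_shards = n_shards
        self.tables = [{} for _ in range(n_shards)]
        self.locks = [threading.Lock() for _ in range(n_shards)]

    def _shard(self, page_no):
        return page_no % self.n_shards   # same hash as in the answer above

    def get_page(self, page_no, load_fn):
        s = self._shard(page_no)
        with self.locks[s]:              # only this one shard is serialized
            page = self.tables[s].get(page_no)
            if page is None:
                page = load_fn(page_no)  # "read from disk" on a miss
                self.tables[s][page_no] = page
            return page

bm = ShardedBufferManager()
page = bm.get_page(4812, lambda n: b"data-for-%d" % n)
print(page)  # b'data-for-4812'
```

One simplification worth noting: this sketch holds the shard lock across load_fn, so a slow disk read blocks every lookup in that shard. Production pools typically drop the map lock before I/O and latch the individual frame instead, which is a further application of the same "shrink the critical section" idea.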

Why use lock mode page on a table

I was wondering why I would need to use lock mode page on a table.
Recently I came across a pretty good case of why not. While I was trying to insert a row into a table, I got a deadlock. After a lot of investigation, I figured out that the lock level of my table was page, and this was the actual reason that led to the deadlock.
My guess is that this is a common scenario in large-scale, high-performance environments with multiple applications hitting the same DB.
The only thing I found is that I should use page locking if I am processing rows in the same order as the paging occurs. This looks like a weak condition that can seldom be met (especially under scaling, which could render this case obsolete).
I can see why one would lock a full table or use per-row locking, but page locking does not make much sense. Or does it?
You never need to use LOCK MODE PAGE on a table, but you may choose to do so.
It does no damage whatsoever if only a single row fits on a page (or a single row requires more than one page).
If you can fit multiple rows on a page, though, you have a meaningful choice between LOCK MODE PAGE and LOCK MODE ROW. Clearly, if you use LOCK MODE ROW, then the fact that one process has a lock on one row of a page won't prevent another process from gaining a lock on a different row on the same page, whereas LOCK MODE PAGE will prevent that.
The advantage of LOCK MODE PAGE is that it requires fewer locks when a single process updates multiple rows on a page in a single transaction.
So you have to do a balancing act. You can take the view that there are so many rows in the database that the chances of two processes needing to lock different rows on the same page are negligible, and use LOCK MODE PAGE knowing that there's a small risk that you'll have processes blocking other processes that would not be blocked if you used LOCK MODE ROW. Alternatively, you can take the view that the risk of such blocking is unacceptable and the increased number of locks is not a problem, and decide to use LOCK MODE ROW instead.
Historically, when the number of locks was a problem because memory was scarce (in the days when big machines had less than 100 MiB of main memory!), saving locks by using LOCK MODE PAGE made more sense than it does now, when systems have multiple gigabytes of main memory.
Note that it doesn't matter which lock mode you use if two processes want to update the same row; one will get a lock and block the other until the transaction commits (or until the statement completes if you aren't using explicit transactions).
Note that the default lock mode is still LOCK MODE PAGE, mainly in deference to history where that has always been the case. However, there is an ONCONFIG parameter, DEF_TABLE_LOCKMODE, that you can set to row (instead of page) that will set the default table lock mode to LOCK MODE ROW. You can still override that explicitly in a DDL statement, but if you don't specify an explicit lock mode, the default will be row or page depending on the setting of DEF_TABLE_LOCKMODE.
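The blocking difference between the two modes comes down to what the lock key is: the row ID itself, or the page the row lives on. A toy Python simulation of just that arithmetic (this is not any real lock manager; the page capacity and class names are invented):

```python
import threading

ROWS_PER_PAGE = 100   # assumed page capacity, for illustration only

class LockTable:
    """Toy lock manager: lock either the row's page or the row itself."""
    def __init__(self, mode):
        self.mode = mode            # "page" or "row"
        self.locks = {}             # lock key -> owning transaction
        self.mutex = threading.Lock()

    def try_lock(self, txn, rowid):
        # page mode keys on the page number, row mode on the row id
        key = rowid // ROWS_PER_PAGE if self.mode == "page" else rowid
        with self.mutex:
            owner = self.locks.setdefault(key, txn)
            return owner == txn     # False -> would block on another txn

# Rows 5 and 6 share page 0: page mode conflicts, row mode does not.
page_mode = LockTable("page")
print(page_mode.try_lock("A", 5))   # True
print(page_mode.try_lock("B", 6))   # False (different row, same page)

row_mode = LockTable("row")
print(row_mode.try_lock("A", 5))    # True
print(row_mode.try_lock("B", 6))    # True
```

This is exactly the trade-off in the answer: page mode takes one lock per page touched (cheaper when one transaction updates many adjacent rows) at the cost of false conflicts between transactions touching different rows on the same page.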

Multiple instances of a service processing rows from a single table, what's the best way to prevent collision

We are trying to build a system with multiple instances of a service on different machines that share the processing load.
Each instance will check a table; if there are rows to be processed, it will pick the first, mark it as processing, process it, and then mark it as done. Rinse, repeat.
What is the best way to prevent a race condition where two instances A and B do the following:
A (1) reads the table, finds row 1 to process
B (1) reads the table, finds row 1 to process
A (2) marks row 1 as processing
B (2) marks row 1 as processing
In a single app we could use locks or mutexes.
I could just put A (1) and A (2) in a single transaction. Is it that simple, or is there a better, faster way to do this?
Should I just turn it on its head, so that the steps are:
A (1) mark the next row as mine to process
A (2) return it to me for processing
I figure this has to have been solved many times before, so I'm looking for the "standard" solutions, and if there is more than one, their benefits and disadvantages.
Transactions are a nice, simple answer, with two possible drawbacks:
1) You might want to check the fine print of your database. Sometimes the default consistency settings don't guarantee absolute consistency in every possible circumstance.
2) Sometimes the access pattern associated with using a database to queue and distribute work is hard on a database that isn't expecting it.
One possibility is to look at reliable message queuing systems, which seem to be a pretty good match for what you are looking for: worker machines could just read work from a shared queue. Possible jumping-off points are http://en.wikipedia.org/wiki/Message_queue and http://docs.oracle.com/cd/B10500_01/appdev.920/a96587/qintro.htm
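The question's "turn it on its head" variant can be expressed as a single atomic UPDATE, which removes the read-then-mark race entirely. A sketch using Python's bundled sqlite3 (the table and column names are invented; in PostgreSQL you would add FOR UPDATE SKIP LOCKED to the inner SELECT so idle workers skip claimed rows instead of blocking):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)")
for _ in range(3):
    conn.execute("INSERT INTO jobs (status) VALUES ('pending')")

def claim_next(conn, worker):
    """Atomically mark one pending row as ours; returns its id, or None."""
    cur = conn.execute(
        """UPDATE jobs
           SET status = 'processing', owner = ?
           WHERE id = (SELECT id FROM jobs WHERE status = 'pending'
                       ORDER BY id LIMIT 1)
             AND status = 'pending'""",
        (worker,))
    conn.commit()
    if cur.rowcount == 0:
        return None                     # nothing pending (or we lost the race)
    row = conn.execute(
        "SELECT id FROM jobs WHERE owner = ? AND status = 'processing'",
        (worker,)).fetchone()
    return row[0]

print(claim_next(conn, "A"))  # 1
print(claim_next(conn, "B"))  # 2
```

Because the claim is one statement, steps A (1) and A (2) can no longer interleave with another worker's: each row's status flips from pending to processing exactly once, and the `AND status = 'pending'` guard makes a lost race show up as rowcount 0 rather than a double claim.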
