I am currently attempting to make a B+ tree concurrent.
So far, the approach I had in mind as a starting point is to walk down the tree when inserting, locking each node (each node has its own lock) and unlocking it once the lock on the next node in the path has been acquired, until I reach a node whose child holds order-1 keys (a full child that may split); since anything under that node can be modified, I keep it locked, run all the necessary insert operations, and then unlock it.
This is obviously a very naive approach and doesn't offer much in the way of concurrency, so I was wondering if there is a better way to go about this? Any input whatsoever would be greatly appreciated!
I have just finished one project on implementing a concurrent B+ tree. You can find some intuition from CMU 15-445 (Database Systems):
https://15445.courses.cs.cmu.edu/fall2018/slides/09-indexconcurrency.pdf (Slides)
https://www.youtube.com/watch?v=6AiAR_giC6A&list=PLSE8ODhjZXja3hgmuwhf89qboV1kOxMx7&index=9 (Video)
One way to do this is called "latch crabbing". Basically, you need an RWLock for each node.
When you are searching for a leaf node, you take a Read (Search) or Write (Insert/Delete) lock on each node you visit. Once you discover that a node is "safe" (i.e. it won't split on insert, or it won't merge/redistribute with its neighbors on delete), you can release the locks on its ancestors, since you know the modification is confined to the subtree under this node. In this way, you acquire locks at the front and release locks at the back, walking like a crab; that's why it is called "latch crabbing". (I am misusing "latch" and "lock" here.)
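For illustration, here is a minimal sketch of the insert descent in C, assuming each node embeds a pthread_rwlock_t, a bounded tree height, and integer keys; the node layout and all names are invented for this sketch, not taken from any real implementation:

#include <pthread.h>
#include <stdbool.h>

#define ORDER      64   /* max children per node (illustrative) */
#define MAX_HEIGHT 64   /* assumed bound on tree height */

struct bpt_node {
    pthread_rwlock_t latch;
    int nkeys;
    bool is_leaf;
    int keys[ORDER - 1];
    struct bpt_node *children[ORDER];
};

/* A node is "safe" for insert if adding one key cannot make it split. */
static bool insert_safe(const struct bpt_node *n) {
    return n->nkeys < ORDER - 1;
}

/* Crabbing descent for insert: write-latch the child, then release all
   latched ancestors as soon as the child turns out to be safe. On return,
   'held' contains the still-latched path ending at the leaf; the caller
   performs the insert (and any splits) and then unlocks held[0..*nheld). */
struct bpt_node *descend_for_insert(struct bpt_node *root, int key,
                                    struct bpt_node **held, int *nheld) {
    *nheld = 0;
    pthread_rwlock_wrlock(&root->latch);
    held[(*nheld)++] = root;

    struct bpt_node *cur = root;
    while (!cur->is_leaf) {
        int i = 0;
        while (i < cur->nkeys && key >= cur->keys[i])
            i++;
        struct bpt_node *child = cur->children[i];
        pthread_rwlock_wrlock(&child->latch);
        if (insert_safe(child)) {
            while (*nheld > 0)   /* child can't split: the ancestors are done */
                pthread_rwlock_unlock(&held[--(*nheld)]->latch);
        }
        held[(*nheld)++] = child;
        cur = child;
    }
    return cur;
}

A Search descent does the same walk with read latches and can always release the parent as soon as the child is latched; only a chain of full nodes forces the insert path above to stay pinned.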
This can be hard to implement, good luck :)
I am trying to figure out how to insert an item into a B+ tree using locks and don't really understand the theory behind it.
So for searching, my view is that I put a lock on the root node, decide which child node to go to, and lock it; at that point I can release the parent and continue this way until I reach the leaf node.
But inserting is a lot more complicated, because I can't allow any other threads to interfere with the insertion. My idea is to put a lock on each node along the path to the leaf, but holding that many locks is quite expensive. And then the question I have is: what happens when the leaf node splits because it is too large?
Does anyone know how to properly insert an item into a B+ tree using locks?
There are many different strategies for dealing with locking in B-Trees in general; most of these actually deal with B+Trees and their variations, since they have been dominating the field for decades. Summarising these strategies would be tantamount to summarising the progress of four decades; it's virtually impossible. Here are some highlights.
One strategy for minimising the amount of locking during initial descent is to lock not the whole path starting from the root, but only the sub-path beginning at the last 'stable' node (i.e. a node that won't split or merge as a result of the currently planned operation).
Another strategy is to assume that no split or merge will happen, which is true most of the time anyway. This means the descent can be done by locking only the current node and the child node one will descend into next, then release the lock on the previously 'current' node and so on. If it turns out that a split or merge is necessary after all then re-descend from the root under a heavier locking regime (i.e. path rooted at last stable node).
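A rough sketch of that optimistic descent, under an invented node layout similar to the sketch above; it additionally assumes the root is an internal node and that a node's leaf/internal status never changes after creation:

#include <pthread.h>
#include <stdbool.h>

#define ORDER 64

struct node {
    pthread_rwlock_t latch;
    int nkeys;
    bool is_leaf;          /* assumed fixed once the node is created */
    int keys[ORDER - 1];
    struct node *child[ORDER];
};

/* Optimistic descent: read-latch hand over hand, write-latch only the leaf.
   Returns the write-latched leaf, or NULL if the leaf is full and the caller
   must redo the descent under the heavier path-locking regime. */
struct node *find_leaf_optimistic(struct node *root, int key) {
    pthread_rwlock_rdlock(&root->latch);
    struct node *cur = root;
    while (!cur->is_leaf) {
        int i = 0;
        while (i < cur->nkeys && key >= cur->keys[i])
            i++;
        struct node *next = cur->child[i];
        if (next->is_leaf)
            pthread_rwlock_wrlock(&next->latch);
        else
            pthread_rwlock_rdlock(&next->latch);
        pthread_rwlock_unlock(&cur->latch);
        cur = next;
    }
    if (cur->nkeys == ORDER - 1) {       /* would split: the bet failed */
        pthread_rwlock_unlock(&cur->latch);
        return NULL;                     /* re-descend with full locking */
    }
    return cur;                          /* caller inserts, then unlocks */
}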
Another staple in the bag of tricks is to ensure that each node 'descended through' is stable by preventative splitting/merging; that is, when the current node would split or merge under a change bubbling up from below then it gets split/merged right away before continuing the descent. This can simplify operations (including locking) and it is somewhat popular in reinventions of the wheel - homework assignments and 'me too' implementations, rather than sophisticated production-grade systems.
Some strategies allow most normal operations to be performed without any locking at all but usually they require that the standard B+Tree structure be slightly modified; see B-link trees for example. This means that different concurrent threads operating on the tree can 'see' different physical views of this tree - depending on when they got where and followed which link - but they all see the same logical view.
Seminal papers and good overviews:
Efficient Locking for Concurrent Operations on B-Trees (Lehman/Yao 1981)
Concurrent Operations on B*-Trees with Overtaking (Sagiv 1986)
A survey of B-tree locking techniques (Graefe 2010)
B+Tree Locking (slides from Stanford U, including Blink trees)
A Blink Tree method and latch protocol for synchronous deletion in a high concurrency environment (Malbrain 2010)
A Lock-Free B+Tree (Braginsky/Petrank 2012)
I am writing an application on Linux using C, pthreads and sockets.
It will be a client-server application; the server will have N+2 threads, where N is the number of active clients: one thread for accepting new connections and creating client threads, and a last one for accepting user input.
I will be using a linked list to store data relevant to my application, with one node in the list associated with every client. The client threads will update the information stored in their nodes at some interval - it could be one second, it could be two minutes; it will change dynamically.
Now here is the problem: if the user requests it, the information stored in the linked list needs to be written to standard output. Of course, while writing I should hold a mutex. I am worried that one mutex for the whole list will hinder performance.
I was thinking about associating a mutex with every node, but that would complicate removing a specified node (first, I would need to make sure the 'stdout writer' thread isn't traversing the list; I would also need to acquire the mutex of my node and of the previous one to change the pointer to the next node, and so on - so either I would have to traverse all the way back to the previous node, or make the list doubly linked).
So I am wondering whether the solution with multiple mutexes is even worth the considerably more complicated code, with all of its locking, waiting and unlocking.
You are right that having a per-node mutex will make code more complex. That's a tradeoff you will have to decide the value of. You can either have a single lock for the entire list, that might cause lock contention, but the code is largely not impacted by the presence of the lock and thus easier to write, or you can have more locks with considerably less opportunity for contention, leading to better performance, but the code is harder to write and get correct. You could even have something in the middle by having a lock per group of nodes - allocate a few nodes together and have a lock for that group - but then you'll have issues with tracking a free list and the potential for fragmentation.
You'll need to consider the relative frequency of add operations, delete operations, and full-list iterations, as well as others (reorganization, searching, whatever else your application will require). If add/delete are extremely frequent, but walking the list is once every third blue moon, the single lock could easily be appropriate. But if walking the list (whether for a full dump of the data, or to search or something else) is very common, the more granular approach becomes more attractive. You might even need to consider reader/writer locks instead of mutexes.
You don't need to traverse the list all the way back: while you traverse it, you test whether the next element is the one you want to remove, and then you lock both nodes - always in the same order throughout the code, so you avoid deadlocking. You can also use the double-checking idiom: lock a node's mutex when you need to be sure of its contents.
/* remove: hand-over-hand with a double check, sketched in C */
struct node { struct node *next; pthread_mutex_t lock; /* ...data... */ };

void remove_node(struct node *head, struct node *target) {
    for (struct node *n = head; n->next != NULL; n = n->next) {
        if (n->next == target) {
            pthread_mutex_lock(&n->lock);
            pthread_mutex_lock(&n->next->lock);
            if (n->next == target) {            /* still there: do the removing */
                n->next = target->next;
                pthread_mutex_unlock(&target->lock);
                pthread_mutex_unlock(&n->lock);
                return;
            }
            /* concurrent modification - retry, maybe? */
            pthread_mutex_unlock(&n->next->lock);
            pthread_mutex_unlock(&n->lock);
        }
    }
}
With this idiom you don't need to lock the entire list while reading it, and it also catches a modification performed between the first test and the locking. I don't believe the code gets that much more complicated with per-node mutexes, and the locking overhead is nothing compared with the operations you may be doing, such as I/O.
Unless you have tens or even hundreds of thousands of users, it won't take that long to read the list. You might want to create a local, intermediate list so the original is not locked while writing, which might take some time. This also means you get a snapshot of the list at one point in time. If you lock individual nodes, you could remove A, then remove element B, and yet have A appear in the displayed list when B does not.
As I understand it, if you do want to lock individual nodes, your list must be singly linked. Additions and removals get rather tricky. In Java, there are several system classes that do this using fast compare-and-swap techniques. There must be code like it in C, but I don't know where to look for it. And you will get those chronologically-challenged results.
If you are going to have N threads for N active clients, then think about the option of using pthread_setspecific and pthread_getspecific.
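A minimal sketch of that idea (illustrative names; note it only lets each client thread find its own node quickly, it does not replace locking on the shared list):

#include <pthread.h>

static pthread_key_t client_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    pthread_key_create(&client_key, NULL);   /* no destructor: the list owns the nodes */
}

/* Called once by each client thread with a pointer to its list node. */
void set_client_node(void *node) {
    pthread_once(&key_once, make_key);
    pthread_setspecific(client_key, node);
}

/* Later calls in the same thread get that pointer back without any search. */
void *get_client_node(void) {
    return pthread_getspecific(client_key);
}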
I'm trying to implement a (special kind of) doubly-linked list in C, in a pthreads environment but using only C-wrapped synchronization instructions like atomic CAS, etc. rather than pthread primitives. (The elements of the list are fixed-size chunks of memory and almost surely cannot fit pthread_mutex_t etc. inside them.) I don't actually need full arbitrary doubly-linked list methods, only:
insertion at the end of the list
deletion from the beginning of the list
deletion at arbitrary points in the list based on a pointer to the member to be removed, which was obtained from a source other than by traversing the list.
So perhaps a better way to describe this data structure would be a queue/fifo with the possibility of removing items mid-queue.
Is there a standard approach to synchronizing this? I'm getting stuck on possible deadlock issues, some of which are probably inherent to the algorithms involved and others of which might stem from the fact that I'm trying to work in a confined space with other constraints on what I can do.
Edit: In particular, I'm stuck on what to do if adjacent objects are to be removed simultaneously. Presumably when removing an object, you need to obtain locks on both the previous and next objects in the list and update their next/prev pointers to point to one another. But if either neighbor is already locked, this would result in a deadlock. I've tried to work out a way that any/all of the removals taking place could walk the locked part of the list and determine the maximal sublist that's currently in the process of removal, then lock the nodes adjacent to that sublist so that the whole sublist gets removed as a whole, but my head is starting to hurt.. :-P
Conclusion(?): To follow up, I do have some code I want to get working, but I'm also interested in the theoretical problem. Everyone's answers have been quite helpful, and combined with details of the constraints outside what I expressed here (you really don't want to know where the pointer-to-element-to-be-removed came from and the synchronization involved there!) I've decided to abandon the local-lock code for now and focus on:
using a larger number of smaller lists which each have individual locks.
minimizing the number of instructions over which locks are held and poking at memory (in a safe way) prior to acquiring a lock to reduce the possibility of page faults and cache misses while a lock is held.
measuring the contention under artificially-high load and evaluating whether this approach is satisfactory.
Thanks again to everybody who gave answers. If my experiment doesn't go well I might come back to the approaches outlined (especially Vlad's) and try again.
Why not just apply a coarse-grained lock? Just lock the whole queue.
A more elaborate (though not necessarily more efficient; it depends on your usage pattern) solution would be to use a read-write lock, for reading and writing respectively.
Using lock-free operations seems to me not a very good idea for your case. Imagine that some thread is traversing your queue, and at the same moment the "current" item is deleted. No matter how many additional links your traversal algorithm holds, all of those items may be deleted, so your code would have no chance to finish the traversal.
Another issue with compare-and-swap is that with pointers you never know whether a pointer really points to the same old structure, or whether the old structure has been freed and a new structure allocated at the same address (the classic ABA problem). This may or may not be an issue for your algorithms.
For the case of "local" locking (i.e., the possibility of locking each list item separately), an idea would be to make the locks ordered. Ordering the locks ensures the impossibility of a deadlock. So your operations would look like this:
Delete by the pointer p to the previous item:
lock p, check (using perhaps special flag in the item) that the item is still in the list
lock p->next, check that it's not zero and in the list; this way you ensure that the p->next->next won't be removed in the meantime
lock p->next->next
set a flag in p->next indicating that it's not in the list
(p->next->next->prev, p->next->prev) = (p, null); (p->next, p->next->next) = (p->next->next, null)
release the locks
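Sketched in C, those delete steps might look like this. Purely illustrative: it assumes a list item can hold a pthread_mutex_t and an in-list flag (which the question says may not be possible), and it ignores the refcounting needed to keep the victim alive:

#include <pthread.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
    int in_list;                /* flag: is this item still linked in? */
    pthread_mutex_t lock;
};

/* Delete the successor of p. Locks are taken in list order (p, then
   p->next, then p->next->next), which is what rules out deadlock. */
int delete_after(struct node *p) {
    pthread_mutex_lock(&p->lock);
    struct node *victim = p->next;
    if (!p->in_list || victim == NULL) {
        pthread_mutex_unlock(&p->lock);
        return -1;              /* p left the list, or nothing after it */
    }
    pthread_mutex_lock(&victim->lock);
    struct node *succ = victim->next;   /* victim can't be unlinked while
                                           we hold p's lock */
    if (succ != NULL)
        pthread_mutex_lock(&succ->lock);

    victim->in_list = 0;        /* mark before unlinking */
    p->next = succ;
    if (succ != NULL)
        succ->prev = p;
    victim->next = victim->prev = NULL;

    if (succ != NULL)
        pthread_mutex_unlock(&succ->lock);
    pthread_mutex_unlock(&victim->lock);
    pthread_mutex_unlock(&p->lock);
    return 0;
}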
Insert into the beginning:
lock head
set the flag in the new item indicating that it's in the list
lock the new item
lock head->next
(head->next->prev, new->prev) = (new, head); (new->next, head) = (head, new)
release the locks
This seems to be correct; I didn't, however, try this idea.
Essentially, this makes the double-linked list work as if it were a single-linked list.
If you don't have a pointer to the previous list element (which is of course usually the case, as it's virtually impossible to keep such a pointer in a consistent state), you can do the following:
Delete by the pointer c to the item to be deleted:
lock c, check if it is still a part of the list (this has to be a flag in the list item), if not, operation fails
obtain pointer p = c->prev
unlock c (now, c may be moved or deleted by other thread, p may be moved or deleted from the list as well) [in order to avoid the deallocation of c, you need to have something like shared pointer or at least a kind of refcounting for list items here]
lock p
check if p is a part of the list (it could be deleted after step 3); if not, unlock p and restart from the beginning
check if p->next equals c, if not, unlock p and restart from the beginning [here we can maybe optimize out the restart, not sure ATM]
lock p->next; here you can be sure that p->next==c and is not deleted, because the deletion of c would have required locking of p
lock p->next->next; now all the locks are taken, so we can proceed
set the flag that c is not a part of the list
perform the customary (p->next, c->next, c->prev, c->next->prev) = (c->next, null, null, p)
release all the locks
Note that just having a pointer to some list item cannot ensure that the item is not deallocated, so you'll need to have a kind of refcounting, so that the item is not destroyed at the very moment you try to lock it.
Note that in the last algorithm the number of retries is bounded. Indeed, new items cannot appear on the left of c (insertion is at the rightmost position). If our step 5 fails and thus we need a retry, this can be caused only by having p removed from the list in the meanwhile. Such a removal can occur not more than N-1 times, where N is the initial position of c in the list. Of course, this worst case is rather unlikely to happen.
Please don't take this answer harshly, but don't do this.
You will almost certainly wind up with bugs, and very hard bugs to find at that. Use the pthreads lock primitives. They are your friends, and have been written by people who deeply understand the memory model provided by your processor of choice. If you try to do the same thing with CAS and atomic increment and the like, you will almost certainly make some subtle mistake that you won't find until it's far too late.
Here's a little code example to help illustrate the point. What's wrong with this lock?
volatile int lockTaken = 0;

void EnterSpinLock(void) {
    while (!__sync_bool_compare_and_swap(&lockTaken, 0, 1)) { /* wait */ }
}

void LeaveSpinLock(void) {
    lockTaken = 0;
}
The answer is: there's no memory barrier when releasing the lock, meaning that some of the write operations executed within the lock may not have happened before the next thread gets into the lock. Yikes! (There are probably many more bugs too, for example, the function doesn't do the platform-appropriate yield inside the spin loop and so is hugely wasteful of CPU cycles. &c.)
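For contrast, here is roughly what a correct release looks like with GCC's paired builtins: __sync_lock_test_and_set acts as an acquire barrier and __sync_lock_release as a release barrier. Still a toy spinlock that burns CPU while waiting, not production code:

volatile int lockTaken = 0;

void EnterSpinLock(void) {
    while (__sync_lock_test_and_set(&lockTaken, 1)) {
        /* spin; a real lock would yield or pause here */
    }
}

void LeaveSpinLock(void) {
    /* release barrier: writes made under the lock become visible first */
    __sync_lock_release(&lockTaken);
}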
If you implement your doubly-linked list as a circular list with a sentinel node, then you only need to perform two pointer assignments in order to remove an item from the list, and four to add an item. I'm sure you can afford to hold a well-written exclusive lock over those pointer assignments.
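To make those counts concrete, a sentinel-based circular list looks roughly like this (a sketch; the exclusive lock around the calls is omitted):

struct dnode {
    struct dnode *prev, *next;
};

/* The sentinel is always present, so there are no NULL checks and no
   special cases for an empty list. */
void dlist_init(struct dnode *sentinel) {
    sentinel->prev = sentinel->next = sentinel;
}

/* Removal: two pointer assignments. */
void dlist_remove(struct dnode *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Insertion at the tail (just before the sentinel): four assignments. */
void dlist_insert_tail(struct dnode *sentinel, struct dnode *n) {
    n->prev = sentinel->prev;
    n->next = sentinel;
    sentinel->prev->next = n;
    sentinel->prev = n;
}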
Note that I am assuming that you are not one of the few people who deeply understand memory models only because there are very few of them in the world. If you are one of these people, the fact that even you can't figure it out ought to be an indication of how tricky it is. :)
I am also assuming that you're asking this question because you have some code you'd actually like to get working. If this is simply an academic exercise in order to learn more about threading (perhaps as a step on your way to becoming a deep low-level concurrency expert) then by all means, ignore me, and do your research on the details of the memory model of the platform you're targeting. :)
You can avoid deadlock if you maintain a strict hierarchy of locks: if you're locking multiple nodes, always lock the ones closer to the head of the list first. So, to delete an element, first lock the node's predecessor, then lock the node, then lock the node's successor, unlink the node, and then release the locks in reverse order.
This way, if multiple threads try to delete adjacent nodes simultaneously (say, nodes B and C in the chain A-B-C-D), then whichever thread first gets the lock to node B will be the one that will unlink first. Thread 1 will lock A, then B, then C, and thread 2 will lock B, then C, then D. There's only competition for B, and there's no way that thread 1 can hold a lock while waiting for a lock held by thread 2 and while thread 2 is waiting on the lock held by thread 1 (i.e. deadlock).
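In code, that hierarchy might look like this (illustrative only; it sidesteps the question, raised elsewhere in this thread, of whether n->prev can change before you manage to lock it):

#include <pthread.h>

struct node {
    struct node *prev, *next;
    pthread_mutex_t lock;
};

/* Always lock towards the tail: predecessor, then node, then successor. */
void delete_node(struct node *n) {
    struct node *p = n->prev;       /* assumed stable; see the caveat above */
    pthread_mutex_lock(&p->lock);   /* closest to the head first */
    pthread_mutex_lock(&n->lock);
    struct node *s = n->next;
    pthread_mutex_lock(&s->lock);

    p->next = s;                    /* unlink n */
    s->prev = p;

    pthread_mutex_unlock(&s->lock); /* release in reverse order */
    pthread_mutex_unlock(&n->lock);
    pthread_mutex_unlock(&p->lock);
}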
You cannot get away without a lock for the whole list. Here's why:
Insert into an Empty List
Threads A and B want to insert an object.
Thread A examines the list, finds it empty
A context switch occurs.
Thread B examines the list, finds it empty and updates the head and tail to point to its object.
A context switch occurs
Thread A updates the head and tail to point to its object. Thread B's object has been lost.
Delete an item from the middle of the list
Thread A wants to delete node X. For this it first has to lock X's predecessor, X itself and X's successor since all of these nodes will be affected by the operation. To lock X's predecessor you must do something like
spin_lock(&(X->prev->lockFlag));
Although I've used function call syntax, if spin_lock is a function, you are dead in the water because that involves at least three operations before you actually have the lock:
place the address of the lock flag on the stack (or in a register)
call the function
do the atomic test and set
There are two places there where thread A can be swapped out and another thread can get in and remove X's predecessor without thread A knowing that it has changed. So you have to implement the spin lock itself atomically: that is, you have to add an offset to X to get X->prev, dereference it to get *(X->prev), add an offset to that to get lockFlag, and then do the atomic test-and-set, all in one atomic unit. Otherwise there is always an opportunity for something to sneak in after you have committed to locking a particular node but before you have actually locked it.
I note that the only reason you need a doubly-linked list here is because of the requirement to delete nodes from the middle of the list, that were obtained without walking the list. A simple FIFO can obviously be implemented with a singly-linked list (with both head and tail pointers).
You could avoid the deletion-from-the-middle case by introducing another layer of indirection - if the list nodes simply contain a next pointer and a payload pointer, with the actual data pointed to elsewhere (you say memory allocation is not possible at the point of insertion, so you'll just need to allocate the list node structure at the same point that you allocate the payload itself).
In the delete-from-the-middle case, you simply set the payload pointer to NULL and leave the orphaned node in the list. If the FIFO pop operation encounters such an empty node, it just frees it and tries again. This deferral lets you use a singly-linked list, and a lockless singly-linked list implementation is significantly easier to get right.
Of course, there is still an essential race here around the removal of a node in the middle of the queue - nothing appears to stop that node coming to the front of the queue and being removed by another thread before the thread that has decided it wants to remove it actually gets a chance to do so. This race appears to be outside the scope of the details provided in your question.
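A sketch of the pop side of that scheme, with all names invented; it is shown with a single queue mutex purely to keep the tombstone idea visible (the lockless variant would CAS the head pointer instead), a mid-queue delete just sets the payload pointer to NULL, and the tail pointer bookkeeping of a real FIFO is omitted:

#include <pthread.h>
#include <stdlib.h>

struct qnode {
    struct qnode *next;
    void *payload;               /* NULL marks a node deleted from the middle */
};

/* Pop: skip and free any tombstoned node, return the first live payload,
   or NULL if the queue is empty. */
void *fifo_pop(struct qnode **head, pthread_mutex_t *qlock) {
    pthread_mutex_lock(qlock);
    while (*head != NULL) {
        struct qnode *n = *head;
        *head = n->next;
        if (n->payload != NULL) {        /* live node */
            void *p = n->payload;
            pthread_mutex_unlock(qlock);
            free(n);
            return p;
        }
        free(n);                         /* orphaned node: discard and retry */
    }
    pthread_mutex_unlock(qlock);
    return NULL;
}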
Two ideas.
First, to avoid the deadlock problem I would do some sort of spinlock:
lock the item that is to be deleted
try to lock one of the neighbors; if you have cheap random bits available, choose the side randomly
  if this doesn't succeed, abandon your first lock and loop
try to lock the other one
  if this succeeds, delete your item
  else abandon both locks and loop
Since splicing an element out of a list is not a lengthy operation, this shouldn't cost you much performance overhead. And in case you really do have a rush to delete all elements at the same time, it should still give you some good parallelism.
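A sketch of that back-off loop (illustrative: rand() stands in for the "cheap random bits", and the splice assumes both locked neighbours are still the victim's neighbours, per the protocol above):

#include <pthread.h>
#include <stdlib.h>

struct node {
    struct node *prev, *next;
    pthread_mutex_t lock;
};

/* Hold the victim, take the neighbours with trylock, and back off
   completely on failure so no deadlock can persist. */
void delete_with_backoff(struct node *n) {
    for (;;) {
        pthread_mutex_lock(&n->lock);
        struct node *first  = (rand() & 1) ? n->prev : n->next;
        struct node *second = (first == n->prev) ? n->next : n->prev;

        if (pthread_mutex_trylock(&first->lock) != 0) {
            pthread_mutex_unlock(&n->lock);      /* abandon and loop */
            continue;
        }
        if (pthread_mutex_trylock(&second->lock) != 0) {
            pthread_mutex_unlock(&first->lock);  /* abandon both and loop */
            pthread_mutex_unlock(&n->lock);
            continue;
        }
        n->prev->next = n->next;                 /* splice the item out */
        n->next->prev = n->prev;
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
        pthread_mutex_unlock(&n->lock);
        return;
    }
}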
The second idea would be to do lazy deletion. Mark the elements that are to be deleted, and only remove them effectively when they reach the end of the list. Since you are only interested in the head and the tail, the actual users of the list items can do this. The advantage is that because elements are only deleted once they are at the end, the deadlock problem disappears. The disadvantage is that this makes the final deletion a sequential operation.
I have been searching concurrent linked list implementations/academic papers that allow for concurrent insertions to disjoint places in the list. I would prefer a lock based approach.
Unfortunately, all the implementations I've checked out so far use list based locking as opposed to something akin to node based locking.
Any help, people?
EDIT 1: Thanks all for the initial responses. Using node-based locking means that for inserting after a node, or deleting a node, I need to lock the previous and the next node. Now it is entirely possible that by the time Thread 1 tries to lock the previous node, it has been deleted by Thread 2. How do I guard against such accidents?
I'm not able to recommend any libraries that do this for C specifically, but if you end up doing it yourself you could potentially avoid needing thousands of locks by re-using a small number of locks and some "hashing" to decide which to use for each node. You'd get quite a number of cases where there wouldn't be any contention, provided the number of locks is suitably large relative to the number of threads, for little space overhead (and it's fixed, not per node).
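For example, a fixed pool of mutexes with an address hash might look like this (a sketch; the pool size and the hash are arbitrary choices, not a known library):

#include <pthread.h>
#include <stdint.h>

#define NLOCKS 64   /* fixed pool, regardless of how many nodes exist */

static pthread_mutex_t lock_pool[NLOCKS];

void lock_pool_init(void) {
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&lock_pool[i], NULL);
}

/* Map a node to one of the pooled locks by hashing its address. */
pthread_mutex_t *lock_for(const void *node) {
    uintptr_t h = (uintptr_t)node;
    h ^= h >> 16;                /* cheap mixing; collisions just share a lock */
    return &lock_pool[h % NLOCKS];
}

If an operation needs the locks of two nodes, compare the two pooled locks first: take them in address order, and take only one if both nodes happen to hash to the same mutex.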
Update, for EDIT 1
You could work around this by having a per-list multiple-reader, single-writer lock (rwlock): you acquire a "read" lock prior to getting the per-node lock for inserts, but for a delete you need to take the single "write" lock. You avoid unnecessary synchronisation issues for the read/insert operations fairly easily, and deleting is simple enough. (The assumption is that deletes are much rarer than inserts, though.)
You may want to look at using a lock-free implementation. The idea is to use an atomic test-set operation when inserting/deleting a node.
Unfortunately, there are not many widely known implementations. You may have to roll your own. Here is the gcc documentation about atomic operation support:
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
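As a taste of what rolling your own looks like with the builtins on that page, here is a CAS-based push at the head (insertion only; deletion is where the traversal and ABA problems discussed in the other answers bite, and the names here are invented):

struct lf_node {
    struct lf_node *next;
    void *data;
};

/* Lock-free push: publish n at the head with compare-and-swap, retrying
   if another thread moved the head between the read and the swap. */
void lf_push(struct lf_node **head, struct lf_node *n) {
    struct lf_node *old;
    do {
        old = *head;
        n->next = old;
    } while (!__sync_bool_compare_and_swap(head, old, n));
}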
The trouble with node based locking is that you normally have to lock two nodes for each insertion. This can be more expensive in some situations.
Worse, you get dining-philosophers-style deadlock possibilities that you have to deal with.
So list-based locking is easier, and that's why you see more of it.
If the performance characteristics of list-based locking are not favorable for your application, consider switching to a data structure other than a singly linked list.