Making a B+ tree concurrent (C)

I am currently attempting to make a B+ tree concurrent.
So far, the starting approach I had in mind for insertion is to walk down the tree, locking each node (each node has its own lock) and unlocking it once the lock on the next node down has been acquired, until reaching a node with a child that holds order - 1 keys (the maximum), since anything under that node can be modified by a split; after all the necessary insert operations have run, that node is unlocked.
This is obviously a very naive approach and doesn't offer much in the way of concurrency, so I was wondering if there is a better way to go about this? Any input whatsoever would be greatly appreciated!

I have just finished a project implementing a concurrent B+ tree. You can find some intuition in CMU 15-445 (Database Systems):
https://15445.courses.cs.cmu.edu/fall2018/slides/09-indexconcurrency.pdf (Slides)
https://www.youtube.com/watch?v=6AiAR_giC6A&list=PLSE8ODhjZXja3hgmuwhf89qboV1kOxMx7&index=9 (Video)
One way to do this is called "latch crabbing". Basically, you need an RWLock for each node.
When you are searching for a leaf node, you take a Read (search) or Write (insert/delete) lock on each node you visit. Once you discover that a node is "safe" (i.e. it won't split on insert, or it won't merge/redistribute with neighbors on delete), you can release the locks of its ancestors, since you know the modification is limited to the subtree under this node. In this way you are acquiring locks at the front and releasing locks at the back, walking like a crab; that's why it is called "latch crabbing". (I am misusing "latch" and "lock" here.)
This can be hard to implement, good lock :)
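For what it's worth, here is a minimal sketch of that write-latch crabbing on the insert path, using pthread_rwlock_t as the per-node latch. The node layout, MAX_HEIGHT bound, and choose_child helper are illustrative assumptions, not taken from the course materials:

#include <pthread.h>
#include <stdbool.h>

#define MAX_HEIGHT 16   /* assumed bound on tree height, sizes the latch stack */

/* Illustrative node layout; none of these names come from the course. */
struct bpt_node {
    pthread_rwlock_t latch;
    bool is_leaf;
    int nkeys;
    /* keys, children, values ... */
};

struct bpt_node *choose_child(struct bpt_node *parent, int key);   /* assumed helper */

/* "Safe" for insert: room for one more key, so a split below cannot
   propagate into this node (order - 1 keys is the maximum). */
static bool safe_for_insert(struct bpt_node *n, int order) {
    return n->nkeys < order - 1;
}

/* Write-latch crabbing for insert: latch the child before unlatching
   ancestors, and drop every held ancestor latch as soon as the child is
   known to be safe. held[] (of size MAX_HEIGHT) and *nheld report what
   the caller must release after performing the insert. */
struct bpt_node *descend_for_insert(struct bpt_node *root, int key, int order,
                                    struct bpt_node *held[], int *nheld) {
    pthread_rwlock_wrlock(&root->latch);
    *nheld = 0;
    held[(*nheld)++] = root;
    struct bpt_node *cur = root;
    while (!cur->is_leaf) {
        struct bpt_node *child = choose_child(cur, key);
        pthread_rwlock_wrlock(&child->latch);
        if (safe_for_insert(child, order)) {
            while (*nheld > 0)   /* child cannot split, so ancestors are releasable */
                pthread_rwlock_unlock(&held[--(*nheld)]->latch);
        }
        held[(*nheld)++] = child;
        cur = child;
    }
    return cur;   /* the leaf; insert, then unlock everything left in held[] */
}

Search is the same shape but with read latches and pure hand-over-hand release, since a search never modifies anything.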

Related

Unavailable nodes in consistent hashing

From everything I have read, in consistent hashing, if a node crashes, the keys handled by that node will be re-mapped to the adjacent node in the hash ring. This conceptually makes sense to me.
What I don't understand is how this would work in practice for a distributed database. How can the data be moved to another node if the node has crashed? Does it assume there is a backup/standby cluster available? Or redundant nodes it can be copied from?
Yes. Data is copied from other nodes in the cluster. If the data is not replicated, there is no way to bring back the data.
Consistent hashing gives us a single node to which a key is assigned. How are the other nodes, onto which the key is replicated, identified?
The answer is that the replication strategy is built on top of consistent hashing. First, the node to which the key belongs is identified using consistent hashing. Second, the system replicates the data using another algorithm. One strategy is for the system to write the data to the nodes that come next, in a clockwise direction, after the primary node on the consistent hash ring. As an example, you can find some other replication strategies here.
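To make the clockwise-successor strategy concrete, here is a small sketch in C. It treats the ring as a sorted array of node hashes and ignores virtual nodes; all names are illustrative assumptions:

#include <stddef.h>

/* The ring as a sorted (ascending) array of node hash positions. */
struct ring {
    unsigned long *positions;
    size_t count;
};

/* Primary node for a key: the first position >= hash(key), wrapping to
   index 0; that is the "clockwise successor" on the ring. */
size_t primary_index(const struct ring *r, unsigned long key_hash) {
    for (size_t i = 0; i < r->count; i++)
        if (r->positions[i] >= key_hash)
            return i;
    return 0;   /* wrapped past the largest position */
}

/* Successor-list replication: the key lives on the primary and on the
   next (replicas - 1) nodes clockwise around the ring. */
void replica_indices(const struct ring *r, unsigned long key_hash,
                     size_t replicas, size_t *out) {
    size_t start = primary_index(r, key_hash);
    for (size_t i = 0; i < replicas && i < r->count; i++)
        out[i] = (start + i) % r->count;
}

With virtual nodes, the same walk applies but you would skip successors that map back to a physical node already holding a replica.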

Inserting into B+ tree using locks

I am trying to figure out how to insert an item into a B+ tree using locks and don't really understand the theory behind it.
So for searching, my view is that I put a lock on the root node, then decide which child node I should go to and lock it; at that point I can release the parent node, and I continue this way until I reach the leaf node.
But inserting is a lot more complicated, because I can't allow any other threads to interfere with the insertion. My idea is to put a lock on each node along the path to the leaf node, but putting that many locks is quite expensive. And then the question I have is: what happens when the leaf node splits because it is too large?
Does anyone know how to properly insert an item into a B+ tree using locks?
There are many different strategies for dealing with locking in B-Trees in general; most of these actually deal with B+Trees and their variations, since those have been dominating the field for decades. Summarising these strategies would be tantamount to summarising the progress of four decades; it's virtually impossible. Here are some highlights.
One strategy for minimising the amount of locking during initial descent is to lock not the whole path starting from the root, but only the sub-path beginning at the last 'stable' node (i.e. a node that won't split or merge as a result of the currently planned operation).
Another strategy is to assume that no split or merge will happen, which is true most of the time anyway. This means the descent can be done by locking only the current node and the child node one will descend into next, then releasing the lock on the previously 'current' node, and so on (sketched in code after these highlights). If it turns out that a split or merge is necessary after all, then re-descend from the root under a heavier locking regime (i.e. the path rooted at the last stable node).
Another staple in the bag of tricks is to ensure that each node 'descended through' is stable by preventative splitting/merging; that is, when the current node would split or merge under a change bubbling up from below then it gets split/merged right away before continuing the descent. This can simplify operations (including locking) and it is somewhat popular in reinventions of the wheel - homework assignments and 'me too' implementations, rather than sophisticated production-grade systems.
Some strategies allow most normal operations to be performed without any locking at all but usually they require that the standard B+Tree structure be slightly modified; see B-link trees for example. This means that different concurrent threads operating on the tree can 'see' different physical views of this tree - depending on when they got where and followed which link - but they all see the same logical view.
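As one concrete illustration, here is a rough sketch of the optimistic-descent strategy in C with pthread read/write locks. The node layout and helper names are assumptions made for the example, and the pessimistic restart path is left to the caller:

#include <pthread.h>
#include <stdbool.h>

struct bpt_node;                                     /* opaque here */
struct bpt_node *child_for(struct bpt_node *, int);  /* assumed helpers */
bool node_is_leaf(struct bpt_node *);
bool leaf_has_room(struct bpt_node *);
pthread_rwlock_t *latch_of(struct bpt_node *);

/* Optimistic insert descent: read-lock coupling down the tree, write
   lock only on the leaf. Returns the write-locked leaf, or NULL when
   the leaf is full and the caller must re-descend pessimistically. */
struct bpt_node *optimistic_descent(struct bpt_node *root, int key) {
    if (node_is_leaf(root))
        pthread_rwlock_wrlock(latch_of(root));
    else
        pthread_rwlock_rdlock(latch_of(root));

    struct bpt_node *cur = root;
    while (!node_is_leaf(cur)) {
        struct bpt_node *child = child_for(cur, key);
        if (node_is_leaf(child))
            pthread_rwlock_wrlock(latch_of(child));  /* only the leaf is write-locked */
        else
            pthread_rwlock_rdlock(latch_of(child));
        pthread_rwlock_unlock(latch_of(cur));        /* release the parent */
        cur = child;
    }

    if (!leaf_has_room(cur)) {                       /* the optimistic bet failed */
        pthread_rwlock_unlock(latch_of(cur));
        return NULL;
    }
    return cur;   /* caller inserts, then unlocks */
}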
Seminal papers and good overviews:
Efficient Locking for Concurrent Operations on B-Trees (Lehman/Yao 1981)
Concurrent Operations on B*-Trees with Overtaking (Sagiv 1986)
A survey of B-tree locking techniques (Graefe 2010)
B+Tree Locking (slides from Stanford U, including Blink trees)
A Blink Tree method and latch protocol for synchronous deletion in a high concurrency environment (Malbrain 2010)
A Lock-Free B+Tree (Braginsky/Petrank 2012)

Multithreaded access to data structure

I am writing an application on Linux using C, pthreads and sockets.
This will be a client-server application; the server will have N+2 threads, where N is the number of active clients: one thread for accepting new connections and creating threads for clients, and the last one for accepting user input.
I will be using a linked list to store some data relevant to my application, with one node in the list associated with every client. The client threads will update the information stored in their nodes at some interval; it could be one second, it could be two minutes, it will change dynamically.
Now here is the problem: if the user requests it, the information stored in the linked list needs to be written to standard output. Of course, during the write I should hold a mutex. I am worried that one mutex for the whole list will hinder performance.
I was thinking about associating a mutex with every node, but that would complicate removal of a specified node: first, I would need to make sure that the 'stdout writer' thread isn't traversing the list; I would also need to acquire the mutex of my node and of the previous one to change the pointer that points to the next node, and so on (either I would need to traverse all the way around to the previous node, or I would need to make the list doubly linked).
So I am wondering whether a solution involving multiple mutexes is even worth the much more complicated code, with all its conditions, locking, waiting and unlocking.
You are right that having a per-node mutex will make the code more complex; that's a tradeoff whose value you will have to decide. You can have a single lock for the entire list, which may cause lock contention but leaves the code largely unaffected by the lock's presence and thus easier to write. Or you can have more locks with considerably less opportunity for contention, leading to better performance, at the cost of code that is harder to write and get correct. You could even have something in the middle by having a lock per group of nodes (allocate a few nodes together and have a lock for that group), but then you'll have issues with tracking a free list and the potential for fragmentation.
You'll need to consider the relative frequency of add operations, delete operations, and full-list iterations, as well as others (reorganization, searching, whatever else your application will require). If add/delete are extremely frequent, but walking the list is once every third blue moon, the single lock could easily be appropriate. But if walking the list (whether for a full dump of the data, or to search or something else) is very common, the more granular approach becomes more attractive. You might even need to consider reader/writer locks instead of mutexes.
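If reader/writer locks turn out to fit (frequent dumps and searches, rarer modifications), the basic pthread pattern is small; this is just the bare shape, with the list operations elided:

#include <pthread.h>

/* One reader/writer lock for the whole list: many dump or search
   threads may hold it shared at once; add/delete takes it exclusively. */
static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;

void dump_list(void) {
    pthread_rwlock_rdlock(&list_lock);
    /* walk the list and write it to stdout; other readers are not blocked */
    pthread_rwlock_unlock(&list_lock);
}

void add_or_remove_node(void) {
    pthread_rwlock_wrlock(&list_lock);
    /* relink pointers here, with exclusive access */
    pthread_rwlock_unlock(&list_lock);
}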
You don't need to traverse the list all the way back: while you traverse it, you test whether the next element is the one you want to remove, and then you lock both nodes, always in the same order throughout the code, so you avoid deadlock. Also, you can use the double-checked locking idiom: lock the node's mutex when you need to be sure of what it contains.
remove:
    for node in list:
        if node->next is the desired node:
            lock(node)
            lock(node->next)
            if node->next is still the desired node:
                do removing stuff
            else:
                treat concurrent modification (retry, maybe?)
            release(node->next)
            release(node)
With this idiom you don't need to lock the entire list while reading it, and it also catches a modification performed between the first test and the locking. I don't believe the code would get that much more complicated with an array of mutexes, and the locking overhead is nothing compared with the operations you may be doing, such as I/O.
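Here is roughly what that idiom looks like in C with pthreads. The node layout is an assumption, and it presumes a dummy head node so the victim always has a predecessor:

#include <pthread.h>
#include <stdlib.h>

struct cnode {
    pthread_mutex_t lock;
    struct cnode *next;
    /* per-client data ... */
};

/* Double-checked removal: find the predecessor without locks, lock
   predecessor then victim (always in list order, so no deadlock),
   then re-check the link before unlinking. Returns 0 on success. */
int remove_node(struct cnode *head, struct cnode *victim) {
    for (struct cnode *n = head; n != NULL; n = n->next) {
        if (n->next == victim) {
            pthread_mutex_lock(&n->lock);
            pthread_mutex_lock(&victim->lock);
            if (n->next == victim) {      /* still linked after locking? */
                n->next = victim->next;   /* unlink */
                pthread_mutex_unlock(&victim->lock);
                pthread_mutex_unlock(&n->lock);
                free(victim);   /* safe only if no other thread still holds a pointer to it */
                return 0;
            }
            pthread_mutex_unlock(&victim->lock);
            pthread_mutex_unlock(&n->lock);
            return -1;   /* concurrent modification; caller may retry */
        }
    }
    return -1;   /* not found */
}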
Unless you have tens or even hundreds of thousands of users, it won't take that long to read the list. You might want to create a local, intermediate list so the original is not locked during the write, which might take some time. This also means you get a snapshot of the list at one point in time. If you lock individual nodes instead, you could remove element A, then remove element B, and yet have A appear in the displayed list while B does not.
As I understand it, if you do want to lock individual nodes, your list must be singly linked. Additions and removals get rather tricky. In Java, there are several system classes that do this using fast compare-and-swap techniques. There must be code like it in C, but I don't know where to look for it. And you will still get those chronologically-challenged results.
If you are going to have N threads for N active clients, then think about the option of using pthread_setspecific and pthread_getspecific.
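For reference, the usual shape of pthread_setspecific/pthread_getspecific is below; each client thread stashes its own node under one shared key, so it never has to search the list for its own data (names here are illustrative, and the per-client node is assumed heap-allocated):

#include <pthread.h>
#include <stdlib.h>

/* One process-wide key; each client thread attaches its own node. */
static pthread_key_t client_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    pthread_key_create(&client_key, free);   /* destructor runs at thread exit */
}

void *client_thread(void *arg) {             /* arg: this client's malloc'd node */
    pthread_once(&key_once, make_key);
    pthread_setspecific(client_key, arg);    /* stash it for this thread */
    /* ... anywhere later in this thread: */
    void *mine = pthread_getspecific(client_key);
    (void)mine;
    return NULL;
}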

concurrent access and free of a data structure

The problem is like this:
I have an array of 500 pointers which point to 500 elements in a doubly linked list. There are 10 threads which run in parallel. Each thread runs 50 loops, and tries to free some element in the list.
The list is sorted (it contains simple integers), and there are 10 other threads running in parallel, searching for the node that contains a particular integer and accessing the other satellite data in that node. So the node looks like:
struct node
{
    int key;            // Key used to search for this node
    int x, y, z;        // Satellite data
    struct node *prev;
    struct node *next;
};
The problem is easily solvable if I just lock the list before each search/delete. But that is too coarse-grained. How do I synchronize these threads so that I can achieve better concurrency?
Edits:
This is not a homework question. I do not belong to academia.
The array holding 500 pointers seems weird. I have made it like that to visualize my problem with the least possible complexity.
I can think of a couple of broad approaches which don't involve a global lock, and should allow some degree of forward progress:
1. mark but don't remove
When a deletion thread identifies its victim, mark it as deleted but leave it in place.
When a search thread encounters a node with this deleted mark, it just ignores it.
You'll need to issue a write/release barrier after marking the node deleted, and an acquire barrier before inspecting the value: you'll need platform-specific, compiler-specific extensions, otherwise you're writing those barriers in assembler.
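The original answer predates C11, but with a modern compiler <stdatomic.h> expresses exactly this release/acquire pairing without dropping to assembler. The deleted flag below is a hypothetical addition to the question's struct:

#include <stdatomic.h>

/* The question's node, plus a hypothetical "deleted" mark. */
struct mnode {
    int key;
    int x, y, z;
    struct mnode *prev;
    struct mnode *next;
    atomic_int deleted;
};

/* Deleter: the release store guarantees everything written to the node
   before the mark is visible to any thread that observes the mark. */
void mark_deleted(struct mnode *n) {
    atomic_store_explicit(&n->deleted, 1, memory_order_release);
}

/* Searcher: the acquire load pairs with the release store above. A node
   reported live here may still be marked a moment later; callers must
   tolerate that. */
int node_is_live(struct mnode *n) {
    return !atomic_load_explicit(&n->deleted, memory_order_acquire);
}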
2. genuine removal with a lockfree list
As per the paper in Peeyush's answer; similar platform- or compiler-specific requirements for CAS, and significant care is required. Options such as refcounts or hazard pointers can allow the node to be genuinely deleted once no-one is looking at it. You may find you need to replace your prev/next pointers by short indices you can pack into a single word for CAS to work: this means bounding the number of nodes and allocating them in an array.
Also note that although every thread should be able to make progress with this sort of scheme, individual operations (e.g. traversing to the next node) may become much more expensive due to the synchronisation requirements.
You might consider a lock-free linked list using the compare-and-swap operation.
link to paper
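For a flavour of what CAS buys you, here is the easy half, insertion, using gcc's __sync_bool_compare_and_swap builtin. Safe concurrent deletion is the hard part and needs the marked-pointer technique from the paper, so it is deliberately not shown; the node layout is made up for the sketch:

struct lfnode {
    int key;
    struct lfnode *next;
};

/* Lock-free insertion of n after prev: retry until prev->next is
   unchanged between the snapshot and the swap. */
void insert_after(struct lfnode *prev, struct lfnode *n) {
    struct lfnode *succ;
    do {
        succ = prev->next;   /* snapshot the current successor */
        n->next = succ;      /* link the new node in front of it */
    } while (!__sync_bool_compare_and_swap(&prev->next, succ, n));
}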
You need to lock any data that can change. If there will be a lot of concurrent work, create one lock per item in the list. A thread has to have the previous, the current, and the next item locked in order to remove the middle one. Make sure locks are always acquired in the same order to avoid deadlocks.
Other delete threads and the search threads will have to wait until the object is removed and the new links are set up. Then the locks are released and they can continue.

List insertion, disjoint n parallel?

I have been searching concurrent linked list implementations/academic papers that allow for concurrent insertions to disjoint places in the list. I would prefer a lock based approach.
Unfortunately, all the implementations I've checked out so far use list-based locking as opposed to something akin to node-based locking.
Any help people?
EDIT 1: Thanks all for the initial responses. Using node-based locking means that for insertion after a node or deletion of a node I need to lock the previous and the next node. Now it is entirely possible that by the time Thread 1 tries to lock the previous node, it has already been deleted by Thread 2. How do I guard against such accidents?
I'm not able to recommend any libraries that do this for C specifically, but if you end up doing it yourself you could avoid having thousands of locks by re-using a small number of locks and some "hashing" to decide which lock to use for each node. You'd get quite a number of cases where there wouldn't be any contention, provided the number of locks is suitably larger than the number of threads, for little space overhead (and it's fixed, not per node).
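A minimal sketch of that lock-striping idea, assuming gcc (the array initializer below is a GNU extension); the pool size and hash are arbitrary illustrative choices:

#include <pthread.h>
#include <stdint.h>

/* A fixed pool of locks shared by all nodes; hash the node's address
   to pick one. NLOCKS is a tunable: comfortably more locks than
   threads keeps collisions, and hence contention, unlikely. */
#define NLOCKS 64

static pthread_mutex_t lock_pool[NLOCKS] = {
    [0 ... NLOCKS - 1] = PTHREAD_MUTEX_INITIALIZER   /* GNU range initializer */
};

static pthread_mutex_t *lock_for(const void *node) {
    uintptr_t h = (uintptr_t)node;
    h ^= h >> 9;   /* cheap mix so neighbouring allocations spread out */
    return &lock_pool[h % NLOCKS];
}

One caveat: when an operation needs the locks of two nodes, compare the lock_for() results first; lock once if they collide, otherwise lock in a fixed (e.g. address) order to avoid deadlock.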
Update, for EDIT 1
You could work around this by having a per-list multiple-reader, single-writer lock (rwlock), where you acquire a "read" lock prior to taking the per-node lock for inserts, but for a delete you need the single "write" lock. You avoid unnecessary synchronisation issues for the read/insert operations fairly easily, and deleting is simple enough. (The assumption is that deletes are much rarer than inserts, though.)
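Sketched in C, under the same assumption that deletes are rare, that two-level scheme might look like this; the node layout is made up for the example:

#include <pthread.h>
#include <stdlib.h>

struct lnode {
    pthread_mutex_t lock;
    struct lnode *next;
    /* payload ... */
};

/* Per-list guard: inserts share it, deletes take it exclusively. */
static pthread_rwlock_t list_guard = PTHREAD_RWLOCK_INITIALIZER;

/* Insert: shared list lock plus the predecessor's own mutex, so
   disjoint inserts proceed in parallel. */
void insert_after_node(struct lnode *pos, struct lnode *n) {
    pthread_rwlock_rdlock(&list_guard);
    pthread_mutex_lock(&pos->lock);
    n->next = pos->next;
    pos->next = n;
    pthread_mutex_unlock(&pos->lock);
    pthread_rwlock_unlock(&list_guard);
}

/* Delete: the exclusive lock guarantees no insert is mid-flight
   anywhere in the list, so unlinking is safe without node locks. */
void delete_after(struct lnode *prev) {
    pthread_rwlock_wrlock(&list_guard);
    struct lnode *victim = prev->next;
    if (victim != NULL)
        prev->next = victim->next;
    pthread_rwlock_unlock(&list_guard);
    free(victim);   /* free(NULL) is a no-op */
}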
You may want to look at using a lock-free implementation. The idea is to use an atomic test-and-set operation when inserting/deleting a node.
Unfortunately, there are not many widely known implementations. You may have to roll your own. Here is the gcc documentation about atomic operation support:
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
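Those builtins are enough to construct, for example, a tiny test-and-set spinlock that can serve as a very cheap per-node lock; a sketch, assuming gcc:

/* A test-and-set spinlock from the builtins documented at that link:
   __sync_lock_test_and_set acquires, __sync_lock_release releases. */
typedef volatile int spinlock_t;

static void spin_lock(spinlock_t *l) {
    while (__sync_lock_test_and_set(l, 1))
        ;   /* returned 1: someone else holds it; busy-wait */
}

static void spin_unlock(spinlock_t *l) {
    __sync_lock_release(l);   /* stores 0 with release semantics */
}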
The trouble with node-based locking is that you normally have to lock two nodes for each insertion, which can be more expensive in some situations.
Worse, you get dining-philosophers-style deadlock possibilities that you have to handle.
That is why list-based locking is easier, and why you see more written about it.
If the performance characteristics of list-based locking are not favourable to your application, consider changing to a different data structure than a singly linked list.
