Raft: what happens when the leader changes during an operation

I want to understand the following scenario:
3-node cluster (n1, n2, n3)
Assume n1 is the leader
A client sends an operation/request to n1
For some reason during the operation (i.e. AppendEntries, commit ...) the leader changes to n2
But n1 has successfully written the entry to its log
Should this be considered a failure of the operation/request, or should it return "Leader Change"?

This is an indeterminate result because we don't know at the time if the value will be committed or not. That is, the new leader may decide to keep or overwrite the value; we just do not have the information to know.
In some systems that I have maintained, the lower-level system (the consensus layer) returns two success results for each request: it first returns Accepted when the leader puts the entry into its log, and then Committed when the value is sufficiently replicated.
The Accepted result means that the value may be committed later, but it may not be.
The layers that wrap the consensus layer can be a bit smarter about the return value to the client. They have a broader view of the system and may retry the request and/or query the new leader about the value. That way the system as a whole can present the customary single return value.
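The wrapper's job can be sketched in a few lines of C. This is a hedged illustration, not a real API: `consensus_result` and `resolve_result` are made-up names, and the "polls" stand in for successive queries to whichever node is currently the leader.

```c
/* Made-up two-level result type: Accepted is the indeterminate state
   described above; Committed and Rejected are final. */
typedef enum { REJECTED, ACCEPTED, COMMITTED } consensus_result;

/* polls[] holds the answers from successive queries to the current
   (possibly new) leader; the first non-Accepted answer is the single,
   final result the wrapper reports to the client. */
consensus_result resolve_result(const consensus_result *polls, int n)
{
    for (int i = 0; i < n; i++)
        if (polls[i] != ACCEPTED)
            return polls[i];
    return ACCEPTED; /* still indeterminate: caller should keep retrying */
}
```

The point is that "leader changed after Accepted" never reaches the client as a distinct outcome; the wrapper keeps asking until the entry is known committed or known lost.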

Related

How to do a strongly consistent read when a node goes down

To my understanding, strong consistency can be achieved when Vr + Vw > V, where Vr is the read quorum, Vw is the write quorum, and V is the number of replicas. Assume V = 3.
When writing a value (val = 2) to the DB, the write only needs to succeed on 2 machines (e.g. machines A and B). When reading a value from the DB, it only needs to read from 2 machines, and if they return the same versioned value, strong consistency is achieved.
What if, after successfully persisting val = 2 to machines A and B, A goes down before the value has been replicated to machine C? Then the read has to go to machines B and C, which hold different values. Which value will it choose as the latest result?
It can't pick a value because there is no quorum on the version to pick. So effectively the system is unavailable for that read.
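That unavailability case can be made concrete with a small sketch. This is an illustration of the rule the answer states, not a real replication protocol: given the versions reported by the replicas we reached, a value is only returnable if at least `vr` replicas agree on its version.

```c
/* Return the version that at least vr of the n contacted replicas agree
   on, or -1 if no version reaches the read quorum (the "unavailable"
   case described above). O(n^2), fine for illustration. */
int quorum_read(const int *versions, int n, int vr)
{
    for (int i = 0; i < n; i++) {
        int count = 0;
        for (int j = 0; j < n; j++)
            if (versions[j] == versions[i])
                count++;
        if (count >= vr)
            return versions[i];
    }
    return -1; /* e.g. replies {2, 1} with vr = 2: no quorum, read fails */
}
```

With B and C disagreeing (versions 2 and 1) and a read quorum of 2, this returns -1: the read cannot complete until A recovers or C catches up.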

Why a Paxos acceptor must send back any value it has already accepted

I'm taking the MIT 6.824 class and I have a question about Paxos. When a proposer sends a prepare to an acceptor, the acceptor returns prepare_ok with the n and v of the highest-numbered proposal it has accepted. I wonder why the acceptor needs to return n and v?
In a nutshell, the acceptor must return v because if the value was already committed then the new proposer needs to know what it is. There is no global "is_committed" flag, so the proposer gathers all these values and their associated rounds from at least a majority of the acceptors. The proposer then sends the value with the highest round to all the acceptors.
As you can see, a proposer always finishes what another proposer has started when it receives a value from an acceptor. This is a little bit similar to a lot of wait-free algorithms.
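The proposer-side rule the answer describes can be sketched as follows. This is a hedged illustration of just the value-selection step, with made-up names (`accepted`, `choose_value`); by convention here, n == 0 means "nothing accepted yet".

```c
/* One (round, value) pair reported by an acceptor in prepare_ok. */
typedef struct { int n; int v; } accepted;

/* Among the replies from a majority of acceptors, adopt the value with
   the highest round n; if no acceptor has accepted anything, the
   proposer is free to propose its own value. */
int choose_value(const accepted *replies, int count, int my_value)
{
    int best_n = 0, chosen = my_value;
    for (int i = 0; i < count; i++) {
        if (replies[i].n > best_n) {
            best_n = replies[i].n;
            chosen = replies[i].v; /* finish what another proposer started */
        }
    }
    return chosen;
}
```

If any acceptor in the majority already accepted a value that might be committed, this rule guarantees the new proposer re-proposes it rather than overwriting it.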

Efficient data structure and strategy for synchronizing several item collections

I want one primary collection of items of a single type that modifications are made to over time. Periodically, several slave collections are going to synchronize with the primary collection. The primary collection should send a delta of items to the slave collections.
Primary Collection: A, C, D
Slave Collection 1: A, C (add D)
Slave Collection 2: A, B (add C, D; remove B)
The slave collections cannot add or remove items on their own, and they may exist in a different process, so I'm probably going to use pipes to push the data.
I don't want to push more data than necessary since the collection may become quite large.
What kind of data structures and strategies would be ideal for this?
For that I use differential execution.
(BTW, the word "slave" is uncomfortable for some people, with reason.)
For each remote site, there is a sequential file at the primary site representing what exists on the remote site.
There is a procedure at the primary site that walks through the primary collection, and as it walks it reads the corresponding file, detecting differences between what currently exists on the remote site and what should exist.
Those differences produce deltas, which are transmitted to the remote site.
At the same time, the procedure writes a new file representing what will exist at the remote site after the deltas are processed.
The advantage of this is that it does not depend on detecting change events in the primary collection. Change events are often unreliable, can be self-cancelling, or are made irrelevant by other changes, so this cuts way down on needless transmissions to the remote site.
In the case that the collections are simple lists of things, this boils down to having local copies of the remote collections and running a diff algorithm to get the delta.
Here are a couple such algorithms:
If the collections can be sorted (like your A,B,C example), just run a merge loop:
while (ix < nx && iy < ny) {
    if (X[ix] < Y[iy]) {
        // X[ix] was inserted in X
        ix++;
    } else if (Y[iy] < X[ix]) {
        // Y[iy] was deleted from X
        iy++;
    } else {
        // the two elements are equal; skip them both
        ix++; iy++;
    }
}
while (ix < nx) {
    // X[ix] was inserted in X
    ix++;
}
while (iy < ny) {
    // Y[iy] was deleted from X
    iy++;
}
If the collections cannot be sorted (note relationship to Levenshtein distance),
Until we have read through both collections X and Y,
See if the current items are equal
else see if a single item was inserted in X
else see if a single item was deleted from X
else see if 2 items were inserted in X
else see if a single item was replaced in X
else see if 2 items were deleted from X
else see if 3 items were inserted in X
else see if 2 items in X replaced 1 item in Y
else see if 1 item in X replaced 2 items in Y
else see if 3 items were deleted from X
etc. etc. up to some limit
Performance is generally not an issue, because the procedure does not have to be run at high frequency.
There's a crude video demonstrating this concept, and source code where it is used for dynamically changing user interfaces.
If one doesn't push all data, some sort of log is required, which uses main memory instead of pipe bandwidth. The parameter for finding a good balance between CPU and memory usage would be the push frequency.
From your question, I assume you have more than one slave process. In that case, a shared-memory or CMA (Linux) approach with double buffering in the master process should outperform multiple pipes by far, as it doesn't even require multithreaded pushing (which would otherwise be used to optimize overall pipe throughput during synchronization).
The slave processes could be notified via a global synchronization barrier to read from masterCollectionA without copying, while the master modifies masterCollectionB (initialized as a copy of masterCollectionA), and vice versa. Access to a collection should be interlocked between slaves and master. The slaves could copy a collection (snapshot) if they would otherwise block it past the next update attempt from the master, thus allowing the master to continue. Modifications in slave processes could be implemented with a copy-on-write strategy for single elements. This cooperative approach is rather simple to implement, and as long as the slave processes don't copy whole snapshots every time, the overall memory consumption is low.

C - How can this simple transaction() function be completely free of deadlocks?

So I have this basic transaction() function written in C:
void transaction(Account from, Account to, double amount) {
    mutex lock1, lock2;
    lock1 = get_lock(from);
    lock2 = get_lock(to);
    acquire(lock1);
    acquire(lock2);
    withdraw(from, amount);
    deposit(to, amount);
    release(lock2);
    release(lock1);
}
It's my understanding that the function is mostly deadlock-free since it locks one account and then the other (instead of locking one, making changes, and then locking the other). However, if this function were called simultaneously by these two calls:
transaction (savings_account, checking_account, 500);
transaction (checking_account, savings_account, 300);
I am told that this would result in a deadlock. How can I edit this function so that it's completely free of deadlocks?
You need to create a total ordering of objects (Account objects, in this case) and then always lock them in the same order, according to that total ordering. You can decide what order to lock them in, but the simple thing would be to first lock the one that comes first in the total ordering, then the other.
For example, let's say each account has an account number, which is a unique* integer. (* meaning no two accounts have the same number) Then you could always lock the one with the smaller account number first. Using your example:
void transaction(Account from, Account to, double amount)
{
    mutex first_lock, second_lock;
    if (acct_no(from) < acct_no(to))
    {
        first_lock = get_lock(from);
        second_lock = get_lock(to);
    }
    else
    {
        assert(acct_no(to) < acct_no(from));  // total ordering, so == is not possible!
        assert(acct_no(to) != acct_no(from)); // this assert is essentially equivalent
        first_lock = get_lock(to);
        second_lock = get_lock(from);
    }
    acquire(first_lock);
    acquire(second_lock);
    withdraw(from, amount);
    deposit(to, amount);
    release(second_lock);
    release(first_lock);
}
So following this example, if checking_account has account no. 1 and savings_account has account no. 2, transaction (savings_account, checking_account, 500); will lock checking_account first and then savings_account, and transaction (checking_account, savings_account, 300); will also lock checking_account first and then savings_account.
If you don't have account numbers (say you're working with class Foo instead of class Account), then you need to find something else to establish a total ordering. If each object has a name as a string, you can do an alphabetic comparison to determine which string is "less". Or you can use any other type that is comparable with > and <.
However, it is very important that the values be unique for each and every object! If two objects have the same value in whichever field you're testing, then they occupy the same spot in the ordering. If that can happen, then it is a "partial ordering", not a "total ordering", and a total ordering is essential for this locking application.
If necessary, you can make up a "key value" that is an arbitrary number that doesn't mean anything, but is guaranteed unique for each object of that type. Assign a new, unique value to each object when it is created.
Another alternative is to keep all the objects of that type in some kind of list. Then their list position serves to put them in a total ordering. (Frankly, the "key value" approach is better, but some applications may be keeping the objects in a list already for application logic purposes so you can leverage the existing list in that case.) However, take care that you don't end up taking O(n) time (instead of O(1) like the other approaches*) to determine which one comes first in the total ordering when you use this approach.
(* If you're using a string to determine the total ordering, then it's not really O(1); it's linear in the length of the strings, but constant w.r.t. the number of objects that hold those strings. However, depending on your application, the string length may be much more reasonably bounded than the number of objects.)
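A common variant of the same "key value" idea is to use the objects' memory addresses as the total ordering, which works when the objects are never moved or destroyed while locks are held. Here is a hedged pthreads sketch of the answer's scheme using that trick; the cast through uintptr_t sidesteps the fact that relational comparison of unrelated pointers is not strictly defined in ISO C.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    double balance;
} Account;

/* Always lock the lower-addressed account first, so any two concurrent
   transfers over the same pair of accounts agree on the lock order. */
void transfer(Account *from, Account *to, double amount)
{
    Account *first  = ((uintptr_t)from < (uintptr_t)to) ? from : to;
    Account *second = ((uintptr_t)from < (uintptr_t)to) ? to : from;
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

With this, transfer(&savings, &checking, 500) and transfer(&checking, &savings, 300) acquire the two mutexes in the same order, so the circular wait from the question cannot occur.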
The problem you are trying to solve is related to the dining philosophers problem, a well-known concurrency problem.
In your case, the naive solution would be to change acquire to take 2 parameters (to and from) and only return once it can get both locks at the same time, taking neither lock if it can't have both (because the deadlock occurs exactly when you hold one lock and wait for the other). Read about the dining philosophers problem and you'll understand why.
Hope it helps!

Recursion within a thread

Is it a good idea to call a recursive function inside a thread?
I am creating 10 threads, and each thread function in turn calls a recursive function. The bad part is:
ThreadFunc()
{
    for (; condn; )
        recursiveFunc(objectId);
}
bool recursiveFunc(objectId)
{
    // Get an instance of the database connection
    // Query for attributes of this objectId
    if (attributes satisfy some condition)
        return true;
    else
        return recursiveFunc(objectId); // that's the next level of objectId
}
The recursive function makes some calls to the database.
My guess is that the call to the recursive function inside a loop is causing the performance degradation. Can anyone confirm?
Calling a function recursively inside a thread is not a bad idea per se. The only thing you have to be aware of is to limit the recursion depth, or you may produce a (wait for it...) stack overflow. This is not specific to multithreading but applies in any case where you use recursion.
In this case, I would recommend against recursion because it's not necessary. Your code is an example of tail recursion, which can always be replaced with a loop. This eliminates the stack overflow concern:
bool recursiveFunc(objectId)
{
    do
    {
        // Get an instance to the database connection
        // Query for attributes of this objectId
        // Update objectId if necessary (not sure what the "next level of objectId" is)
    }
    while (!attributes satisfy some condition);
    return true;
}
There's no technical reason why this wouldn't work - it's perfectly legal.
Why is this code the "bad part"?
You'll need to debug/profile this and recursiveFunc to see where the performance degradation is.
Going by the code you've posted, have you checked that condn is ever satisfied so that your loop terminates? If not, it will loop forever.
Also what does recursiveFunc actually do?
UPDATE
Based on your comment that each thread performs 15,000 iterations, the first thing I'd do would be to move the "Get an instance to the database connection" code outside recursiveFunc so that you only get the connection once per thread.
Even if you rewrite into a loop (as per Martin B's answer) you would still want to do this.
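That suggestion might look like the sketch below. Everything here is a made-up stand-in (`DbConn`, `db_connect`, `attributes_ok`) since the original code only hints at the database API; the point is simply that the connection is acquired once, outside the loop that replaced the recursion.

```c
/* Stand-in connection type that just counts how often it is opened. */
typedef struct { int opens; } DbConn;

static DbConn *db_connect(DbConn *c) { c->opens++; return c; }

/* Stand-in query: pretend the attributes satisfy the condition once
   object_id reaches 3. */
static int attributes_ok(DbConn *c, int object_id)
{
    (void)c;
    return object_id >= 3;
}

/* Connection acquired once per thread, then reused on every iteration. */
int process(DbConn *storage, int object_id)
{
    DbConn *conn = db_connect(storage);  /* once, not once per level */
    while (!attributes_ok(conn, object_id))
        object_id++;                     /* the "next level" of objectId */
    return object_id;
}
```

Whatever the real API is, the shape is the same: one connect per thread, many queries per connection.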
It depends on how the recursive function talks to the database. If each (or many) level of recursion reopens the database, that can be the reason for the degradation. If they all share the same "connection" to the database, the problem is not the recursion but the number of threads concurrently accessing the database.
The only potential problem I see with the posted code is that it can turn into an infinite loop, and that's usually not what you want, so you'd have to force a break somewhere on known reachable conditions to avoid having to abend the application in order to break out of the loop (and subsequently the thread).
Performance degradation can happen with threading, recursion, and database access alike, for a variety of reasons.
Whether any or all of them are at fault for your problems is impossible to ascertain from the little you're showing us.
