Concurrent loads and stores

In C I have:
double balance;
void deposit(double amount)
{
    balance = balance + amount;
}
machine language:
load R1, balance
load R2, amount
add R1, R2
store R1, balance
If the variable balance contains 500 and two threads run the procedure to deposit 300 and 200 respectively concurrently, how can this be problematic? And how do I use a concurrency mechanism to make this procedure thread safe?

Concurrency 101
Thread 1             Thread 2
load R1, balance
load R2, amount      load R1, balance
add R1, R2           load R2, amount
store R1, balance    add R1, R2
                     store R1, balance
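Each thread has its own registers, so both threads load 500 from balance. Assuming Thread 1 deposits 300 and Thread 2 deposits 200, Thread 1 stores 800 and Thread 2 then overwrites it with 700. The write by Thread 1 is lost, and the final balance is 700 instead of 1000. (There are many interleavings that produce approximately the same result.)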
You fix it by locking balance so that only one thread or the other has access to it between the load and the store. Acquire a mutex on balance at the start of the sequence and release it at the end. Consider loading amount before loading balance to reduce the scope of the mutex to the minimum.
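As a minimal sketch of that fix, assuming POSIX threads are available: the mutex name balance_lock and the helper run_two_deposits are made up for illustration and are not part of the original question.

```c
#include <pthread.h>
#include <stddef.h>

static double balance = 500.0;
/* Hypothetical lock guarding every access to balance. */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

void deposit(double amount)
{
    /* amount is a private argument, so only the load/add/store
       on balance needs to sit inside the critical section. */
    pthread_mutex_lock(&balance_lock);
    balance = balance + amount;
    pthread_mutex_unlock(&balance_lock);
}

static void *deposit_thread(void *arg)
{
    deposit(*(const double *)arg);
    return NULL;
}

/* Runs the two deposits from the question concurrently and
   returns the balance after both threads have finished. */
double run_two_deposits(void)
{
    double amounts[2] = {300.0, 200.0};
    pthread_t threads[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, deposit_thread, &amounts[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return balance;
}
```

With the lock in place, whichever thread gets there second always sees the first thread's store, so the result is 1000 regardless of scheduling. Build with the compiler's pthread option (for example -pthread on GCC or Clang).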

Related

How to implement a LEFT OUTER JOIN with streams in Apache Flink

I have two streams left and right.
For the same time window, let's say that:
the left stream contains the elements L1, L2 (the number is the key)
the right stream contains the elements R1, R3
I wonder how to implement a LEFT OUTER JOIN in Apache Flink so that the result obtained when processing this window is the following:
(L1, R1), (L2, null)
L1 and R1 match by key (1), while L2 and R3 do not match anything. L2 is still included because it is on the left side.

You should be able to obtain the proper results with the coGroup operator and a suitably implemented CoGroupFunction. The function gives you access to the whole group in the coGroup method. The documentation states that for CoGroupFunction one of the groups may be empty, so this should allow you to implement the outer join: emit (left, null) whenever the right group is empty. The only issue is that groups are currently built in memory, so you need to verify that your groups won't grow too big, as they can effectively kill the JVM.

Fastest way to check for absolute bound

In a function, I want to check the absolute bound of a number as a condition for an operation. I want to do abs(r1) > 15 and right now I have an unoptimized way of doing it which is:
CMP r1, #15
ADDGT //operation
CMP r1, #-15
ADDLT //operation
Does anyone think there could be a faster way? I was thinking of maybe right shifting by 4, so that a value within +/-15 becomes all 1s or all 0s, but I couldn't find a good way of doing it.
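One common way to fold the two comparisons into one is the unsigned range-check trick, sketched here in C rather than assembly. The function name abs_exceeds_15 is made up for illustration, and whether it is actually faster on a given core is an assumption that would need measuring.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when |r1| > 15.
   Adding 15 moves the "inside" range [-15, 15] to [0, 30]. Done in
   unsigned arithmetic, anything below -15 wraps around to a very large
   number, so one unsigned comparison covers both bounds. */
bool abs_exceeds_15(int32_t r1)
{
    return (uint32_t)r1 + 15u > 30u;
}
```

On ARM this would correspond roughly to one ADD into a scratch register, one CMP against #30, and the operation made conditional on HI (unsigned higher) instead of GT/LT. The shift-by-4 idea is close, but an arithmetic shift tests the asymmetric range -16..15, so -16 would be misclassified.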

Paxos question: what happens if a proposer goes down?

I'm going through various scenarios to see how the basic Paxos algorithm reaches agreement on a final result. There's one case whose outcome I can't explain.
There are two proposers, P1 and P2, and three acceptors, A1, A2, and A3. P1 proposes value u and P2 proposes value v.
1. P1 (sending id n) finishes the prepare step and receives promises from A1, A2, and A3, so all three acceptors store n as the id.
2. P2 (sending id n+1) then prepares, so A1, A2, and A3 store n+1 as the id.
3. P2 goes down.
4. P1 sends an accept request with (n, u) to A1, A2, and A3. Of course they refuse the request, but by this time P2 is already down.
When a proposer goes down like this, what should we do next? Start another round of Paxos?

Start a new Paxos round; this is exactly what it is for.
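Note that in basic Paxos the Prepare message carries only the proposal number; values travel in Accept messages, and an acceptor's promise reports the highest-numbered proposal it has already accepted, if any. Here P2 went down before sending any Accept, so no acceptor has accepted a value, and P1 is free to propose u when it retries with a higher number. Had P2 managed to get (n+1, v) accepted somewhere before failing, the acceptors would report v to P1 in the next round and P1 would have to propose v instead.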
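The scenario from the question can be replayed against a toy acceptor. This is a minimal sketch of the usual basic-Paxos acceptor rules, not a complete implementation: the struct, the function names, and the use of a single character as the value are all made up for illustration, and only one acceptor is shown because A1, A2, and A3 behave identically here.

```c
#include <stdbool.h>

/* State one acceptor keeps (illustrative names). 0 means "none yet". */
typedef struct {
    int  promised_id;     /* highest proposal number promised so far */
    int  accepted_id;     /* number of the proposal accepted, if any */
    char accepted_value;  /* value of that proposal, if any */
} Acceptor;

/* Phase 1: promise only if id beats every earlier promise. */
bool on_prepare(Acceptor *a, int id)
{
    if (id <= a->promised_id)
        return false;
    a->promised_id = id;
    return true;
}

/* Phase 2: accept unless a higher-numbered prepare was promised. */
bool on_accept(Acceptor *a, int id, char value)
{
    if (id < a->promised_id)
        return false;
    a->promised_id = id;
    a->accepted_id = id;
    a->accepted_value = value;
    return true;
}

/* Replays steps 1-4 of the question, then P1's retry.
   Returns the value the acceptor ends up accepting. */
char replay_scenario(void)
{
    Acceptor a = {0, 0, 0};
    int n = 1;

    on_prepare(&a, n);              /* step 1: P1 prepares with n */
    on_prepare(&a, n + 1);          /* step 2: P2 prepares with n+1, then fails */
    if (!on_accept(&a, n, 'u')) {   /* step 4: refused, since n < n+1 */
        on_prepare(&a, n + 2);      /* P1 starts a new round with n+2 */
        /* Nothing had been accepted, so P1 may still propose u. */
        on_accept(&a, n + 2, 'u');
    }
    return a.accepted_value;
}
```

In this replay the first accept request fails and the retry succeeds, so the acceptor ends up holding u under proposal number n+2.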

I reviewed all my notes and material from the class where I studied this a few years ago. A correct Paxos implementation must be fault tolerant and must never stop, on both sides: proposers and acceptors. Since the question is about fault tolerance for proposers, I'll focus on them.
A solution (but not the only one) to this issue is to replicate the proposers, having several instances of each proposer type. One of them is chosen as the leader/master at the beginning, and it is the one that sends the proposals. If the master fails, another instance, which may be decided by a new election or by a priority established at initialization, steps up as the new master and takes its place.
In this case you could have 3 instances of P2: P2-1, P2-2, and P2-3, with P2-1 being the leader by default. If P2-1 fails, then P2-2 can step up.
Keep in mind that the acceptors may ask P2 for an acknowledgement while P2-2 is still in the middle of stepping up as the new leader, so it is probably a good idea to retry after a timeout, to give P2-2 enough time to be ready.

"Paxos Made Simple" Lamport
"It’s easy to construct a scenario in which two proposers each keep issuing a sequence of proposals with increasing numbers, none of which are ever chosen. Proposer p completes phase 1 for a proposal number n1. Another proposer q then completes phase 1 for a proposal number n2 > n1. Proposer p’s phase 2 accept requests for a proposal numbered n1 are ignored because the acceptors have all promised not to accept any new proposal numbered less than n2. So, proposer p then begins and completes phase 1 for a new proposal number n3 > n2, causing the second phase 2 accept requests of proposer q to be ignored. And so on."
Regarding step 4 of the description:
A1, A2, and A3 would reply to P1's accept request by sending back the higher id (n+1). Once notified, P1 increases its id to n+2 and sends a prepare request with n+2 to A1, A2, and A3 again. To avoid live-lock between P1 and P2, it is better to have only a single proposer (see section 2.4 of "Paxos Made Simple").

SPSS creating a loop for a multiple regression over several variables

For my master thesis I have to use SPSS to analyse my data. Actually I thought that I don't have to deal with very difficult statistical issues, which is still true regarding the concepts of my analysis. BUT the problem is now that in order to create my dependent variable I need to use the syntax editor/ programming in general and I have no experience in this area at all. I hope you can help me in the process of creating my syntax.
I have in total approximately 900 companies with 6 year observations. For all of these companies I need the predicted values of the following company-specific regression:
Y = β1*X1 + β2*X2 + β3*X3 + error
(I know, the β very likely won't be significant, but this is nothing to worry about in my thesis; it will be mentioned in the limitations though.)
So far my data are ordered in the following way
COMPANY YEAR X1 X2 X3
1 2002
2 2002
1 2003
2 2003
But I could easily change the order, e.g. in
1
1
2
2 etc.
OK, let's say I have rearranged the data: what I need now is for SPSS to compute the specific β for each company and return the output in one column (the predicted values, i.e. those β multiplied by the specific X in each row). So I guess what I need is a loop that does a multiple linear regression over the 6 rows of each of the 939 companies, am I right?
As I said I have no experience at all, so every hint is valuable for me.
Thank you in advance,
Janina.

Bear in mind that with only six observations per company and three (or 4 if you also have a constant term) coefficients to estimate, the coefficient estimates are likely to be very imprecise. You might want to consider whether companies can be pooled at least in part.

You can use SPLIT FILE to estimate the regressions specific for each company, example below. Note that one would likely want to consider other panel data models, and assess whether there is autocorrelation in the residuals. (This is IMO a useful approach though for exploratory analysis of multi-level models.)
The example declares a new dataset to pipe the regression estimates to (see the OUTFILE subcommand on REGRESSION) and suppresses the other tables (with 900+ tables much of the time is spent rendering the output). If you need other statistics either omit the OMS that suppresses the tables, or tweak it to only show the tables you want. (You can use OMS to pipe other results to other datasets as well.)
************************************************************.
*Making Fake data.
SET SEED 10.
INPUT PROGRAM.
LOOP #Comp = 1 to 1000.
COMPUTE #R1 = RV.NORMAL(10,2).
COMPUTE #R2 = RV.NORMAL(-3,1).
COMPUTE #R3 = RV.NORMAL(0,5).
LOOP Year = 2003 to 2008.
COMPUTE Company = #Comp.
COMPUTE Rand1 = #R1.
COMPUTE Rand2 = #R2.
COMPUTE Rand3 = #R3.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Companies.
COMPUTE x1 = RV.NORMAL(0,1).
COMPUTE x2 = RV.NORMAL(0,1).
COMPUTE x3 = RV.NORMAL(0,1).
COMPUTE y = Rand1*x1 + Rand2*x2 + Rand3*x3 + RV.NORMAL(0,1).
FORMATS Company Year (F4.0).
*Now sorting cases by Company and Year, then using SPLIT file to estimate
*the regression.
SORT CASES BY Company Year.
*Declare new set and have OMS suppress the other results.
DATASET DECLARE CoeffTable.
OMS
/SELECT TABLES
/IF COMMANDS = 'Regression'
/DESTINATION VIEWER = NO.
*Now split file to get the coefficients.
SPLIT FILE BY Company.
REGRESSION
/DEPENDENT y
/METHOD=ENTER x1 x2 x3
/SAVE PRED (CompSpePred)
/OUTFILE = COVB ('CoeffTable').
SPLIT FILE OFF.
OMSEND.
************************************************************.

How do I figure out provisioned throughput for an AWS DynamoDB table?

My system is supposed to write a large amount of data into a DynamoDB table every day. These writes come in bursts, i.e. at certain times each day several different processes have to dump their output data into the same table. Speed of writing is not critical as long as all the daily data gets written before the next dump occurs. I need to figure out the right way of calculating the provisioned capacity for my table.
So for simplicity let's assume that I have only one process writing data once a day and it has to write up to X items into the table (each item < 1KB). Is the capacity I would have to specify essentially equal to X / 24 / 3600 writes/second?
Thanks

The provisioned capacity is in terms of writes/second. You need to make sure that you can handle the PEAK number of writes/second that you are going to expect, not the average over the day. So, if you have a single process that runs once a day and makes X number of writes, of Y size (in KB, rounded up), over Z number of seconds, your formula would be
capacity = (X * Y) / Z
So, say you had 100K writes over 100 seconds and each write < 1KB, you would need 1000 w/s capacity.
Note that in order to minimize provisioned write capacity needs, it is best to add data into the system on a more continuous basis, so as to reduce peaks in necessary read/write capacity.
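The formula above can be written out as a small helper. This is only a sketch of the arithmetic: the function name required_write_capacity is made up, the item size is assumed to be already rounded up to a whole number of KB, and the result is rounded up on the assumption that capacity is provisioned in whole units.

```c
#include <stdint.h>

/* Writes/second needed to push `items` items of `item_kb` KB each
   (item_kb already rounded up to a whole KB) within `seconds` seconds.
   Implements capacity = (X * Y) / Z, rounded up to a whole unit. */
uint64_t required_write_capacity(uint64_t items, uint64_t item_kb,
                                 uint64_t seconds)
{
    uint64_t total_units = items * item_kb;
    return (total_units + seconds - 1) / seconds;
}
```

For the example in the answer, required_write_capacity(100000, 1, 100) gives 1000. Spreading the same 100K writes over a full day, required_write_capacity(100000, 1, 24 * 3600), gives 2, which is the X / 24 / 3600 figure from the question and only holds if the writes really are spread evenly across the day.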
