shared / exclusive locking scheme and strict two-phase locking protocol - database

[exam question]Given a key:
S: r3(X); r1(Y); w3(Z); w3(X); w1(Z); w1(X); r2(X); w2(Z); w2(Y); r1(X);
and also
T1: r1(Y); w1(Z); w1(X); r1(X);
T2: r2(X); w2(Z); w2(Y);
T3: r3(X); w3(Z); w3(X);
Indicate after each operation which locks are there. I really dont understand it and I need your help. thank you.
this is the answer: but I have no idea how its done.
time stamp ordering for this question:

Note: This should be a comment, but I don't have reputation for it
I don't think you should be asking for help about homework, though I can point that if S is the sequence of operations (it looks like it is) then the answer is pretty straighforward.
EDIT:
On chapter 16, Section 4, Figure 16.15 of Distributed Systems: Concepts and Design 5th edition (link) you have the lock compatibility table.
Also, Figure 16.16 has everything you need to know to execute the 2PL.
As for the commits (the c1, c2, c3), you just need to look at the last operation of each transaction.
Thus, first build the table with the operations and the commit events and then, for each object (X, Y, Z) determine who has the locks by using the references I just gave you.

Related

Basic interpretation of two table operations (row wise/not row wise) in Postgis

I'm new in PostGIS, I has been reading the docs, usually the docs are very good written, at least for tables of 1 row D:
Probs this will be a silly question, or obvs too all ones that know postgis, but plis help a little to can go inside from other languages.
I have checked a lot from:
https://postgis.net/workshops/postgis-intro/
Sadly, I still can't get an answer for a simple question, the behavior of a lot of functions in table-table operations.
I know R/sf, and I'm trying to learn Postgis, but usually, every function have its own way to relate the functions, as example, IIRC intersects exists in sf and geopandas, but..., the behavior of the function is different, even when they have the same name.
Lets pick an example:
https://postgis.net/docs/ST_Intersects.html
The function is defined as:
boolean ST_Intersects( geometry geomA , geometry geomB );
All the params are defined a geometry, that means it can be a column or a singular geometry, but we don't know what will be the behavior if the tables has more than 1 row, maybe when it says "geometries" will interpret the full table as one big geometry.
Then I can go to this link:
https://postgis.net/docs/geometry_overlaps.html
Where I can finally see a result that seems a matrix operation..., at some extent, here is where the possibilities starts to open.
Intersects is a row wise function?
Intersects will intersects every from from the first table over the second table? in case how would be the return...? (need a table of rows(table1)*rows(table2), this is not written in the docs)
Here above, are just the questions and what is confusing, checking intersects, now lets go back to the specific issue.
Probably, the relation of the functions is a common sense in postgis, because in the doc that is omitted, and not only in intersects in others like intersection, disjoint, etc. I think all of them has the same behavior, is just implicit.
So, postgis works in a element-element? table-element? element-table? table-table? or other interpretation? or every function have its own way but is not written or I need search on other place?
Thx!

Apriori algrithm - finding associations in production data

I have problem with finding "correct" associations within production data.
The data looks like this
A;B;C;D;E;F;G
1;0;1;0;0;0;0
0;1;0;0;0;0;0
0;0;0;1;0;0;0
0;0;1;0;1;0;0
1;0;0;0;0;0;0
0;0;0;0;0;1;0
0;0;0;0;0;0;1
1;0;1;0;0;0;0
(Of course I have a lot more steps and rows)
Where A,B,C etc are production steps. 0 means that a worker did not perform this production step and 1 means that this step was performed by a worker. For example, first row - 1;0;1;0;0;0;0 means that steps A & C where performed at the same time by a worker. And second row -0;1;0;0;0;0;0 means that (perhaps another worker) performed only production step B.
So it happens that some of the production steps are usually performed simultaneously by the same worker, just like step A & C in the example above (2 out of 3 times they occur together). In order to find which steps tend to be performed together I applied apriori algorithm.
I hoped to receive answer like "If there is 1 in column A, it is likely that 1 will appear in column C". But instead, apriori algorithm found for me this "cool" rules which basically say that there are a lot of 0s in the table. Rules found where like this "If there is 0 in columns A and G, it is likely that there is 0 in column E" - thanks Sherlock
I need this algorithm to focus on rules connected to where are 1s in the table, not 0s. Basically any rule that looks at 0s can be ignored. I just want rules that look at 1s because I want to know which production steps tend to be performed together and I don't care which production steps are not performed together (0s) because obviously majority of the steps are not performed simultaneously.
Does anybody have some idea how to find associations between 1s instead of 0s?
I use Weka software to do the data mining.
Apriori has no notion of what the labels represent, they are just strings.
Have you tried the -Z option, treating the first label in attribute as missing?

Flink Streaming Job execution graph analyse

I have some question in terms of performance with Flink,
Can someone tell me what is wrong if my program has an execution plan same as the picture below?
Thank you.
enter image description here
From your description I cannot immediately see, why you need more than one hash per source. Any kind of network shuffle limits throughput, so avoiding all unnecessary shuffles seems like the best solution in your case.
The final picture should look like
Source 1 --\
\
Source 2 ----\
+---> Map ---> Sink
... /
/
Source N --/
Such that each input record is only rekeyed once.
Aside from these general consideration, I'd need much more details and CEP pseudocode to give more specific recommendations.

what is the serializability graph of this?

I try to figure out a question, however I do not how to solve it, I am unannounced most of the terms in the question. Here is the question:
Three transactions; T1, T2 and T3 and schedule program s1 are given
below. Please draw the precedence or serializability graph of the s1
and specify the serializability of the schedule S1. If possible, write
at least one serial schedule. r ==> read, w ==> write
T1: r1(X);r1(Z);w1(X);
T2: r2(Z);r2(Y);w2(Z);w2(Y);
T3: r3(X);r3(Y);w3(Y);
S1: r1(X);r2(Z);r1(Z);r3(Y);r3(Y);w1(X);w3(Y);r2(Y);w2(Z);w2(Y);
I do not have any idea about how to solve this question, I need a detailed description. In which resource should I look for? Thank in advance.
There are various ways to test for serializability. The Objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another.
First we do a Conflict-Equivalent Test. This will tell us whether the schedule is serializable.
To do this, we must define some rules (i & j are 2 transactions, R=Read, W=Write).
We cannot Swap the order of actions if equivalent to:
1. Ri(x), Wi(y) - Conflicts
2. Wi(x), Wj(x) - Conflicts
3. Ri(x), Wj(x) - Conflicts
4. Wi(x), Rj(x) - Conflicts
But these are perfectly valid:
R1(x), Rj(y) - No conflict (2 reads never conflict)
Ri(x), Wj(y) - No conflict (working on different items)
Wi(x), Rj(y) - No conflict (same as above)
Wi(x), Wj(y) - No conflict (same as above)
So applying the rules above we can derive this (using excel for simplicity):
From the result, we can clearly see with managed to derive a serial-relation (i.e. The schedule you have above, can be split into S(T1, T3, T2).
Now that we have a serializable schedule and we have the serial schedule, we now do the Conflict-Serialazabile test:
Simplest way to do this, using the same rules as the conflict-equivalent test, look for any combinations which would conflict.
r1(x); r2(z); r1(z); r3(y); r3(y); w1(x); w3(y); r2(y); w2(z); w2(y);
----------------------------------------------------------------------
r1(z) w2(z)
r3(y) w2(y)
w3(y) r2(y)
w3(y) w2(y)
Using the rules above, we end up with a table like above (e.g. we know reading z from one transaction and then writing z from another transaction will cause a conflict (look at rule 3).
Given the table, from left to right, we can create a precedence graph with these conditions:
T1 -> T2
T3 -> T2 (only 1 arrow per combination)
Thus we end up with a graph looking like this:
From the graph, since there it's acyclic (no cycle) we can conclude the schedule is conflict-serializable. Furthermore, since its also view-serializable (since every schedule that's conflict-s is also view-s). We could test the view-s to prove this, but it's rather complicated.
Regarding sources to learn this material, I recommend:
"Database Systems: A practical Approach To design, implementation and management: International Edition" by Thomas Connolly; Carolyn Begg - (It is rather expensive so I suggest looking for a cheaper, pdf copy)
Good luck!
Update
I've developed a little tool which will do all of the above for you (including graph). It's pretty simple to use, I've also added some examples.

Logic programming with integer or even floating point domains

I am reading a lot about logic programming - ASP (Answer Set Programming) is one example or this. They (logic programs) are usually in the form:
[Program 1]
Rule1: a <- a1, a2, ..., not am, am+1;
Rule2: ...
This set of rules is called the logic program and the s.c. model is the result of such computation - some kind of assignment of True/False values to each of a1, a2, ...
There is lot of research going on - e.g. how such kind of programs (rules) can be integrated with the (semantic web) ontologies to build knowledge bases that contain both - rules and ontologies (some kind of constraints/behaviour and data); there is lot of research about ASP itself - like parallel extensions, extensions for probabilistic logic, for temporal logic and so on.
My question is - is there some kind of research and maybe some proof-of-concept projects where this analysis is extended from Boolean variables to variables with integer and maybe even float domains? Currently I have not found any research that could address the following programs:
[Program 2]
Rule1 a1:=5 <- a2=5, a3=7, a4<8, ...
Rule2 ...
...
[the final assignment of values to a1, a2, etc., is the solution of this program]
Currently - as I understand - if one could like to perform some kind of analysis on Program-2 (e.g. to find if this program is correct in some sense - e.g. if it satisfies some properties, if it terminates, what domains are allowed not to violate some kind of properties and so on), then he or she must restate Program-2 in terms of Program-1 and then proceed in way which seems to be completely unexplored - to my knowledge (and I don't believe that is it unexplored, simply - I don't know some sources or trend). There is constraint logic programming that allow the use of statements with inequalities in Program-1, but it is too focused on Boolean variables as well. Actually - Programm-2 is of kind that can be fairly common in business rules systems, that was the cause of my interest in logic programming.
SO - my question has some history - my practical experience has led me to appreciate business rules systems/engines, especially - JBoss project Drools and it was my intention to do some kind of research of theory underlying s.c. production rules systems (I was and I am planning to do my thesis about them - if I would spot what can be done here), but I can say that there is little to do - after going over the literature (e.g. http://www.computer.org/csdl/trans/tk/2010/11/index.html was excellent IEEE TKDE special issues with some articles about them, one of them was writter by Drools leader) one can see that there is some kind of technical improvements of the decades old Rete algorithm but there is no theory of Drools or other production rule systems that could help with to do some formal analysis about them. So - the other question is - is there theory of production rule systems (for rule engines like Drools, Jess, CLIPS and so on) and is there practical need for such theory and what are the practical issues of using Drools and other systems that can be addressed by the theory of production rule systems.
p.s. I know - all these are questions that should be directed to thesis advisor, but my current position is that there is no (up to my knowledge) person in department where I am enrolled with who could fit to answer them, so - I am reading journals and also conference proceedings (there are nice conference series series of Lecture Notes in Computer Science - RuleML and RR)...
Thanks for any hint in advance!
In a sense the boolean systems already do what you suggest.
to ensure A=5 is part of your solution, consider the rules (I forget my ASP syntax so bear with me)
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //there is one A(X) that is true, where X is an integer
A(5) //A(5) is true
and I think your clause would require:
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //A1 can take only one value and must take a value
1{B(X) : integer(X)}1 //A2 ``
1{C(X) : integer(X)}1 //A3 ``
1{D(X) : integer(X)}1 //A4 ``
A(5) :- B(5), C(7), D(8) //A2=5, A3=7, A4=8 ==> A1=5
I hope I've understood the question correctly.
Recent versions of Clojure core.logic (since 0.8) include exactly this kind of support, based on cKanren
See an example here: https://gist.github.com/4229449

Resources