DB Transaction schedules - database

I basically understand what Serial and Serializable Schedules are, but the question I want to answer is:
Briefly explain the terms Serializable Schedule, Serial Schedule and Equivalent Schedule.
What is the difference between a Serializable Schedule and an Equivalent Schedule?
Does it have anything to do with the order of the transactions, or with the values written by them?

Serial Schedule
One transaction executes at a time
Operations of different transactions do not interleave
Equivalent Schedule
Two schedules are equivalent when executing them has the same effect
Serializable Schedule
A schedule that is equivalent to some serial schedule
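To make these definitions concrete, here is a minimal Python sketch. It assumes a schedule is a list of (transaction, action, item) tuples; `is_serial` and `serial_schedules` are hypothetical helper names. A schedule is serializable if it is equivalent to one of the schedules that `serial_schedules` returns, for whichever notion of equivalence (conflict, view, or result) is in use.

```python
from itertools import permutations


def is_serial(schedule: list[tuple]) -> bool:
    """True if no transaction's operations are interleaved with another's."""
    finished: set[str] = set()
    current = None
    for txn, _, _ in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn resumed after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True


def serial_schedules(schedule: list[tuple]) -> list[list[tuple]]:
    """All serial schedules over the same transactions, keeping each
    transaction's own operations in their original order."""
    txns = list(dict.fromkeys(txn for txn, _, _ in schedule))  # first-appearance order
    per_txn = {t: [op for op in schedule if op[0] == t] for t in txns}
    return [[op for t in order for op in per_txn[t]] for order in permutations(txns)]


s = [("T1", "r", "X"), ("T2", "r", "Y"), ("T1", "w", "X"), ("T2", "w", "Y")]
print(is_serial(s))               # False: T1 and T2 interleave
print(len(serial_schedules(s)))   # 2: T1 T2 and T2 T1
```

With n transactions there are n! candidate serial schedules, which is why practical tests (such as the precedence graph) avoid enumerating them.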


Kinds of multi-partitioned stored procedures and will they still lock the entire cluster in VoltDB 9?

I am trying to understand the impact of multi-partitioned transactions in VoltDB 9.x. I know it is designed for single-partitioned transactions, but I want to know what it will cost me if I can't avoid them.
In summary, my question is whether multi-partitioned transactions in VoltDB still always lock the entire cluster, and how the different kinds of multi-partitioned transactions relate to each other in terms of their execution behaviour.
From H-Store-FAQ:
[...] this allows H-Store to support additional optimizations, such as speculative execution and arbitrary multi-partition transactions. For example, in VoltDB every transaction is either single-partition or all-partition. That is, any transaction that needs to touch multiple partitions will cause the VoltDB’s transaction coordinator to lock all partitions in the cluster, even if the transaction only needs to touch data at two partitions. [...] It is likely VoltDB will support these features in the future [...]
The papers The VoltDB Main Memory DBMS and How VoltDB does Transactions suggest that VoltDB splits multi-partitioned transactions into at least two kinds: One-Shot-Reads and General-2PC-Transactions.
In the class MpTransactionTaskQueue there is a distinction between routing a transaction to the multi-partitioned site (count 1) or to a pool of read-only sites (default count up to 20) of the MPI, and the two can't be executed interleaved.
So these are my sub questions:
Are One-Shot-Reads always executed on RO-Sites?
Do RO-Sites additionally execute read-only multi-partitioned transactions that are not one-phase?
If a multi-partitioned transaction contains at least one write fragment, is it executed on the RW-Site and committed atomically with 2PC?
In both cases it is possible that I don't have to touch all partitions in the cluster. Are uninvolved partitions locked, or can they execute single-partitioned transactions in the meantime (if several One-Shot-Reads or one 2PC-Transaction are running on other partitions)? If they are locked, how? Do they receive a FragmentTaskMessage with an empty or dummy plan fragment, for example?
The class SystemProcedureCatalog defines an "Every-Flag", which is checked in code in addition to the read-only and single-partitioned flags. How is this flag related to One-Shot-Reads or the Run-Everywhere-Pattern?
To make things easier for developers, procedures are called the same way regardless of what type they are. Internally there are different types of multi-partition procedures as they provide some optimizations, although there is more to be done and some H-Store projects have done research in these areas.
MP transactions still ultimately involve sending tasks to be done on all the partitions. The one exception you noticed is a special two-partition transaction that is only used in rebalancing data during elastic add or shrink.
Partitions consist of one or more sites (on separate servers) depending on kfactor. These sites stay in sync without a 2PC by requiring deterministic procedures. The partitions work through the backlog in a queue as fast as the process time (or local execution time) allows. All sites handle both reads and writes.
MP tasks sent to those partition queues have to wait for all the pending items to finish. That is why there is a pool of 20 (by default) threads for MP reads. This allows 20 tasks to be sent out at once, so that the next MP read usually doesn't have to wait for 2 network hops + the max queue wait time + processing time before it can even get queued.
MP reads that are not "single-shot" would be Java procedures with multiple voltExecuteSQL() calls, such as a procedure where subsequent SQL queries depend on the results of prior queries. When these transactions send tasks to the partitions, the partitions have to wait for the max queue wait time + processing time + 2 network hops before they can do the next part of the transaction.
MP writes can also have multiple voltExecuteSQL() calls, plus they have to wait for a final commit signal, so this all delays the progress on the partitions.
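The waiting described above can be put into a toy cost model. This is only a sketch under assumptions, not VoltDB code: `partition_stall_ms` and `mp_blocked_ms` are hypothetical names, the per-step stall uses the terms from the answer (2 network hops + the max queue wait time + processing time), the wait before the first part gets queued is ignored, and the final commit signal of an MP write is assumed to cost one more such stall.

```python
def partition_stall_ms(hop_ms: float, queue_waits_ms: list[float], processing_ms: float) -> float:
    """Time the partitions wait before the next part of an MP transaction can run:
    2 network hops + the longest queue wait among the partitions + processing time."""
    return 2 * hop_ms + max(queue_waits_ms) + processing_ms


def mp_blocked_ms(sql_batches: int, hop_ms: float, queue_waits_ms: list[float],
                  processing_ms: float, is_write: bool) -> float:
    """Total stall between the parts of one MP transaction (toy model).
    A single-shot read (one batch) has no follow-up parts, so no stall; each extra
    voltExecuteSQL() batch adds one stall, and a write is assumed to add one more
    for the final commit signal."""
    follow_ups = sql_batches - 1 + (1 if is_write else 0)
    return follow_ups * partition_stall_ms(hop_ms, queue_waits_ms, processing_ms)


print(mp_blocked_ms(3, 0.5, [1.0, 2.0], 1.0, is_write=False))  # 8.0
print(mp_blocked_ms(3, 0.5, [1.0, 2.0], 1.0, is_write=True))   # 12.0
```

With these made-up values (0.5 ms per hop, queue waits of 1 and 2 ms, 1 ms of processing), a three-batch MP read stalls the partitions for 8 ms in this model and the same procedure as a write for 12 ms, which illustrates why single-shot reads are the cheap case.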
There are certainly examples of MP transactions that shouldn't need to involve all of the partitions and could benefit from future optimizations, but it's not as easy as it may seem on a database that has to support durability to disk, k-safety, elastic add and shrink, multi-cluster active-active replication, and many of the other features that have been added to VoltDB over the years since it grew out of the H-Store project.
Disclosure: I work at VoltDB

View Serializability

If a schedule of transactions is not view-serialisable, can we conclude that the schedule is non-serialisable?
In other words, is it possible for a schedule to be serialisable and yet not view serialisable?
If possible, please give an example.

Is a strict schedule always serializable?

In DBMS, is a strict schedule always serializable?
If the answer is no, then please provide an example of a strict schedule which is not serializable.
(For the following, I assume conflict-serializability.)
No, a strict schedule is not always serializable. The following schedule is strict (no data item is read or overwritten before the transaction that wrote it commits), but it is not serializable: there are two conflicting pairs, r_1[x] with w_2[x] and w_2[x] with w_1[x], and they order the transactions in opposite directions (the first requires T_1 before T_2, the second T_2 before T_1). The operations with subscript 1 belong to transaction T_1, those with subscript 2 to transaction T_2.
r_1[x] w_2[x] c_2 w_1[x] c_1
Contrary to another answer here, a serializable schedule is not necessarily strict. See the following example:
w_1[x] w_2[x] c_2 r_1[y] c_1
Transaction T_2 overwrites the w_1[x] of T_1 before the commit of T_1. Nevertheless, the schedule is (conflict-)equivalent to the serial schedule T_1 T_2.
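The conflict argument above can be checked mechanically with a precedence graph: add an edge from Ti to Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict-serializable exactly when the graph has no cycle. A minimal Python sketch, assuming the r_1[x] / w_2[x] / c_2 notation used here; `parse`, `precedence_edges`, `has_cycle` and `conflict_serializable` are hypothetical helper names.

```python
import re


def parse(schedule: str) -> list[tuple]:
    """Turn "r_1[x] w_2[x] c_2" into [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "c", None)]."""
    return [("T" + txn, action, item or None)
            for action, txn, item in re.findall(r"([rwc])_(\d+)(?:\[(\w+)\])?", schedule)]


def precedence_edges(ops: list[tuple]) -> set[tuple[str, str]]:
    """Edge (Ti, Tj) if an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for i, (t1, a1, x1) in enumerate(ops):
        for t2, a2, x2 in ops[i + 1:]:
            # Commits have no data item, so they never conflict.
            if t1 != t2 and x1 is not None and x1 == x2 and "w" in (a1, a2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges: set[tuple[str, str]]) -> bool:
    """Depth-first search that reports whether the directed graph has a cycle."""
    graph: dict[str, list[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    on_path, finished = set(), set()

    def visit(node: str) -> bool:
        if node in on_path:
            return True  # back edge: reached a node already on the current path
        if node in finished:
            return False
        on_path.add(node)
        if any(visit(nxt) for nxt in graph.get(node, [])):
            return True
        on_path.remove(node)
        finished.add(node)
        return False

    return any(visit(node) for node in list(graph))


def conflict_serializable(schedule: str) -> bool:
    return not has_cycle(precedence_edges(parse(schedule)))


print(conflict_serializable("r_1[x] w_2[x] c_2 w_1[x] c_1"))  # False: edges T1->T2 and T2->T1
print(conflict_serializable("w_1[x] w_2[x] c_2 r_1[y] c_1"))  # True: only T1->T2
```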
Every serializable schedule is strict.
But not every strict schedule is serializable.
A schedule is serializable if its outcome equals the outcome of executing its transactions serially, i.e. without interleaving them.
A strict schedule, by contrast, is one in which no other transaction T2 can read or write data written by T1 until T1 has committed.
In other words, in a strict schedule T2 can read data that T1 has only read, but once T1 writes the data, T2 cannot read or write it from that point until T1 commits.
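The strictness condition can be checked in a single pass: remember, for each data item, which transactions have written it and not yet committed, and reject any read or write of that item by another transaction in the meantime. A minimal Python sketch using the r_1[x] / w_2[x] / c_2 notation of the earlier answer; `is_strict` is a hypothetical name, and aborts are not modelled.

```python
import re


def is_strict(schedule: str) -> bool:
    """No transaction may read or overwrite an item written by another
    transaction until that writer has committed (aborts are not modelled)."""
    uncommitted: dict[str, set[str]] = {}  # item -> writers that have not committed yet
    for action, txn, item in re.findall(r"([rwc])_(\d+)(?:\[(\w+)\])?", schedule):
        if action == "c":
            for writers in uncommitted.values():
                writers.discard(txn)
        elif uncommitted.get(item, set()) - {txn}:
            return False  # touches another transaction's uncommitted write
        elif action == "w":
            uncommitted.setdefault(item, set()).add(txn)
    return True


print(is_strict("r_1[x] w_2[x] c_2 w_1[x] c_1"))        # True
print(is_strict("w_1[x] w_2[x] c_2 r_1[y] c_1"))        # False: w_2[x] overwrites uncommitted w_1[x]
print(is_strict("r_1[x] r_2[x] w_1[x] c_1 r_2[x] c_2"))  # True: T2 re-reads x only after c_1
```

Together with a conflict-serializability test, this shows that the two properties are independent: the first schedule is strict but not serializable, the second serializable but not strict.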

What is Conflict serializability?

I have been reading about conflict serializability and serializability online,
but I cannot find a clear definition of either, or of the difference between them.
The only thing I have understood is that conflict serializability implies serializability.
Many sources say that serializability and conflict serializability are mostly the same.
Can anyone please explain what conflict serializability is, and the difference between conflict serializability and serializability, with examples?
Thanks in advance!
I found an answer to my question.
Serializable means the transactions are executed in a serial manner: the schedule may interleave them, but the transactions do not use the same variable for reading and writing.
Example:
T1 T2
Read(X)
Read(Y)
Write(X)
Write(Y)
In this example, the two transactions do not use a shared variable.
So there is no conflict here.
Conflict serializability means the transactions are executed concurrently. When the two transactions use the same variable, their operations conflict.
Example:
T1 T2
Read(X)
Read(X)
Write(X)
Write(X)
Read(Y)
Write(Y)
Read(Y)
Write(Y)
In this example, the two transactions T1 and T2 use the same variable.
Transaction T2 writes X before T1 does, and then T1 writes X, so T2's write has no lasting effect. This is conflict serializability.
To answer your question, I'll first explain a few terms, quoting from the Operating Systems book by Galvin.
Serial Schedule: A schedule in which each transaction is executed atomically is called a serial schedule, as shown in Premraj's answer.
Likewise, when we allow the transactions to overlap their execution, i.e. they are no longer atomic, the schedule is called a Non Serial Schedule.
Conflicting Operations: This helps determine whether a Non Serial Schedule is equivalent to a Serial Schedule. Two consecutive operations of different transactions are said to be in conflict if they access the same data item and at least one of them is a write operation.
We repeatedly swap adjacent non-conflicting operations, and if the given Non Serial Schedule can be transformed into a Serial Schedule this way, we say the given schedule is Conflict Serializable.
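The swapping procedure can be written as a breadth-first search over all schedules reachable by swapping adjacent non-conflicting operations of different transactions. This is a minimal, exponential-time Python sketch meant only for small examples; operations are assumed to be (transaction, action, item) tuples, and `conflicts`, `is_serial` and `serialize_by_swaps` are hypothetical names. The precedence-graph test gives the same answer in polynomial time.

```python
from collections import deque


def conflicts(a: tuple, b: tuple) -> bool:
    """Different transactions, same data item, and at least one write."""
    return a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1])


def is_serial(ops: tuple) -> bool:
    """True if each transaction's operations form one contiguous block."""
    seen: list[str] = []
    for txn, _, _ in ops:
        if not seen or seen[-1] != txn:
            if txn in seen:
                return False  # transaction resumed after another one ran
            seen.append(txn)
    return True


def serialize_by_swaps(ops: list[tuple]):
    """Return a serial schedule reachable by swapping adjacent non-conflicting
    operations of different transactions, or None if there is none."""
    start = tuple(ops)
    queue, visited = deque([start]), {start}
    while queue:
        current = queue.popleft()
        if is_serial(current):
            return list(current)
        for i in range(len(current) - 1):
            a, b = current[i], current[i + 1]
            # Never swap two operations of the same transaction.
            if a[0] != b[0] and not conflicts(a, b):
                swapped = current[:i] + (b, a) + current[i + 2:]
                if swapped not in visited:
                    visited.add(swapped)
                    queue.append(swapped)
    return None


s = [("T1", "r", "X"), ("T2", "r", "Y"), ("T1", "w", "X"), ("T2", "w", "Y")]
print(serialize_by_swaps(s))  # [('T1', 'r', 'X'), ('T1', 'w', 'X'), ('T2', 'r', 'Y'), ('T2', 'w', 'Y')]
print(serialize_by_swaps([("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X")]))  # None
```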
Schedule: a sequence of the operations of a set of transactions, in the order in which they are executed.
Serializability is a property of a transaction schedule. It relates to the isolation property of a database transaction. Serializability of a schedule means equivalence to a serial schedule.
A conflict between two operations of a Non-Serial Schedule occurs under the following 3 conditions:
They must belong to different transactions.
They must operate on the same data item.
At least one of them must be a write operation.
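The three conditions translate directly into a predicate. A short Python sketch, assuming operations are (transaction, action, item) tuples with action "r" or "w"; `conflicting_pairs` is a hypothetical name.

```python
from itertools import combinations


def conflicting_pairs(schedule: list[tuple]) -> list[tuple]:
    """Pairs (earlier, later) that meet all three conditions."""
    return [
        (a, b)
        for a, b in combinations(schedule, 2)  # keeps schedule order: a comes before b
        if a[0] != b[0]            # 1. different transactions
        and a[2] == b[2]           # 2. same data item
        and "w" in (a[1], b[1])    # 3. at least one write
    ]


schedule = [("T1", "r", "X"), ("T2", "r", "X"), ("T2", "w", "X"), ("T1", "w", "Y")]
print(conflicting_pairs(schedule))  # [(('T1', 'r', 'X'), ('T2', 'w', 'X'))]
```

The two reads of X do not conflict, and T1's write of Y touches a different item, so only one pair remains.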

What is the difference between "conflict serializable" and "conflict equivalent"?

In database theory, what is the difference between "conflict serializable" and "conflict equivalent"?
My textbook has a section on conflict serializable but glosses over conflict equivalence. These are probably both concepts I am familiar with, but I am not familiar with the terminology, so I am looking for an explanation.
A conflict in a DBMS occurs when two or more different transactions access the same variable and at least one of the accesses is a write operation.
For example:
T1: Read(X)
T2: Read(X)
In this case there's no conflict because both transactions are performing just read operations.
But in the following case:
T1: Read(X)
T2: Write(X)
there's a conflict.
Let's say we have a schedule S, and we reorder its instructions to create two more schedules, S1 and S2.
Conflict equivalent: Refers to the schedules S1 and S2 where they maintain the ordering of the conflicting instructions in both of the schedules. For example, if T1 has to read X before T2 writes X in S1, then it should be the same in S2 also. (Ordering should be maintained only for the conflicting operations).
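The definition can be checked directly: both schedules must contain the same transactions with the same internal order of operations, and every conflicting pair must appear in the same order. A minimal Python sketch, assuming operations are (transaction, action, item) tuples and that no transaction repeats exactly the same operation (otherwise identical tuples would be ambiguous); `conflict_order` and `conflict_equivalent` are hypothetical names.

```python
from itertools import combinations


def conflict_order(schedule: list[tuple]) -> set[tuple]:
    """Every conflicting pair (earlier op, later op) in the schedule."""
    return {
        (a, b)
        for a, b in combinations(schedule, 2)
        if a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1])
    }


def conflict_equivalent(s1: list[tuple], s2: list[tuple]) -> bool:
    """Same transactions with the same internal order, and conflicting
    pairs ordered the same way in both schedules."""
    txns = {op[0] for op in s1}
    if txns != {op[0] for op in s2}:
        return False
    for t in txns:
        # Each transaction's own operations must appear in the same order.
        if [op for op in s1 if op[0] == t] != [op for op in s2 if op[0] == t]:
            return False
    return conflict_order(s1) == conflict_order(s2)


s1 = [("T1", "r", "X"), ("T2", "r", "Y"), ("T1", "w", "X"), ("T2", "w", "Y")]
s2 = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "Y"), ("T2", "w", "Y")]
print(conflict_equivalent(s1, s2))  # True: no conflicting pairs at all
s3 = [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X")]
s4 = [("T2", "w", "X"), ("T1", "r", "X"), ("T1", "w", "X")]
print(conflict_equivalent(s3, s4))  # False: r1(X) and w2(X) are ordered differently
```

Since s2 is serial, the first check also shows that s1 is conflict serializable.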
Conflict Serializability: S is said to be conflict serializable if it is conflict equivalent to a serial schedule (i.e., where the transactions are executed one after the other).
From Wikipedia.
Conflict-equivalence
The schedules S1 and S2 are said to be conflict-equivalent if the following conditions are satisfied:
Both schedules S1 and S2 involve the same set of transactions (including ordering of actions within each transaction).
The order of each pair of conflicting actions in S1 and S2 is the same.
Conflict-serializable
A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules.
Another definition for conflict-serializability is that a schedule is conflict-serializable if and only if its precedence graph/serializability graph, when only committed transactions are considered, is acyclic (if the graph is defined to include also uncommitted transactions, then cycles involving uncommitted transactions may occur without conflict serializability violation).
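When the precedence graph is acyclic, any topological order of it gives a serial schedule that the original schedule is conflict equivalent to. A minimal Python sketch using Kahn's algorithm, assuming (transaction, action, item) tuples with action "c" marking a commit (item None); following the remark above, only committed transactions are included. `equivalent_serial_order` is a hypothetical name.

```python
from collections import defaultdict, deque


def equivalent_serial_order(schedule: list[tuple]):
    """Serial order of the committed transactions that the schedule is
    conflict equivalent to, or None if the precedence graph has a cycle."""
    committed = {t for t, a, _ in schedule if a == "c"}
    ops = [op for op in schedule if op[0] in committed and op[1] != "c"]
    edges = defaultdict(set)
    indegree = {t: 0 for t in committed}
    for i, (t1, a1, x1) in enumerate(ops):
        for t2, a2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (a1, a2) and t2 not in edges[t1]:
                edges[t1].add(t2)
                indegree[t2] += 1
    # Kahn's algorithm: repeatedly emit a transaction with no remaining incoming edges.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in sorted(edges[t]):
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(committed) else None  # leftovers mean a cycle


print(equivalent_serial_order([("T1", "w", "x"), ("T2", "w", "x"), ("T2", "c", None),
                               ("T1", "r", "y"), ("T1", "c", None)]))  # ['T1', 'T2']
print(equivalent_serial_order([("T1", "r", "x"), ("T2", "w", "x"), ("T2", "c", None),
                               ("T1", "w", "x"), ("T1", "c", None)]))  # None
```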
Just two terms to describe one thing in different ways.
Conflict equivalent: you say Schedule A is conflict equivalent to Schedule B; the term always involves two schedules.
Conflict serializable: still using Schedules A and B, we can say Schedule A is conflict serializable, or Schedule B is conflict serializable.
We don't say Schedule A/B is conflict equivalent.
We don't say Schedule A is conflict serializable to Schedule B.
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Conflict serializable means conflict equivalent to some serial schedule.
Conflict Equivalent Schedules: if a Schedule S can be transformed into a schedule S' by a series of swaps of non conflicting instructions, we say that schedule S & S' are conflict equivalent.
Conflict Serializable Schedule: Schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Definitions have already been explained perfectly, but I feel this will be very useful to some.
I've developed a small console program (on github) which can test any schedule for conflict serializability and will also draw a precedence graph.
If there is at least one serial schedule that is conflict equivalent to the considered transaction schedule, the schedule is conflict serializable.
