Can tuple variables bound by quantifiers occur on the left of '|' in Tuple Relational Calculus?

Quoting the general expression of Tuple Relational Calculus (Fundamentals of Database Systems - Elmasri, Navathe; 6th edition)
A general expression of the tuple relational calculus is of the form
{t1.Aj, t2.Ak, ..., tn.Am | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}
where t1, t2, ..., tn, tn+1, ..., tn+m are tuple variables, each Ai is an attribute of the relation on which ti ranges, and COND is a condition or formula of the tuple relational calculus. A formula is made up of predicate calculus atoms, which can be one of the following:
1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable. This atom identifies the range of the tuple variable ti as the relation whose name is R. It evaluates to TRUE if ti is a tuple in the relation R, and evaluates to FALSE otherwise.
2. An atom of the form ti.A op tj.B, where op is one of the comparison operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, and B is an attribute of the relation on which tj ranges. An atom of the form ti.A op c or c op tj.B, where op is one of the comparison operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, B is an attribute of the relation on which tj ranges, and c is a constant value.
Edit (thanks to philipxy): the meaning of a query in TRC with respect to the above general expression would be,
For {t|p}--"The result of such a query is the set of all tuples t that evaluate COND(t) to TRUE". For {t.a1,t.a2,...|p}--"we first specify the requested attributes […] for each selected tuple t. Then we specify the condition for selecting a tuple following the bar".
There is also a mention of,
The only free tuple variables in a tuple relational calculus expression should be those that appear to the left of the bar (|).
Consider, for example, a relation Students(id, grade), and suppose we would like to find the "ids of all students who have secured the highest grade". The query in Tuple Relational Calculus could be
Q1 = {s1.id | Students(s1) ^ ¬(∃ s2, Students(s2) ^ (s2.grade > s1.grade))}
Here, s1 is the free variable.
Q1 can be interpreted as: all id values of the tuple variable s1, where s1 ranges over the relation Students (i.e. s1 belongs to Students) AND there does not exist a tuple variable s2 such that s2 belongs to Students and s2.grade > s1.grade.
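For comparison, here is a rough SQL rendering of Q1 (just a sketch, assuming a table Students(id, grade) as above); it returns the ids of the students with the highest grade, which is what Q1 expresses:
SELECT s1.id
FROM   Students s1
WHERE  NOT EXISTS (SELECT 1
                   FROM   Students s2
                   WHERE  s2.grade > s1.grade);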
Consider the queries,
Q2 = {s1.id | ∃ s1, Students(s1) ^ ¬(∃ s2, Students(s2) ^ (s2.grade > s1.grade))}
and
Q3 = {s1.id | ∀ s1, Students(s1) ^ ¬(∃ s2, Students(s2) ^ (s2.grade > s1.grade))}
As we can see, s1 (the variable on the left of the bar) in Q2 and Q3 is also bound by ∃ and ∀ respectively.
How is the interpretation of Q2 and Q3 different from that of Q1, assuming Q2 and Q3 are even possible?
Note:
Queries Q2 and Q3 were made up from Q1 with the purpose of trying to understand what the queries would mean if the variables on the left side of '|' were bound by existential or universal quantifiers.
(Edit, after thinking a bit more) My interpretation is that the result of Q2 will not be the same as that of Q1: Q2 will produce all id values of s1 if there exists at least one s1 that belongs to Students for which there does not exist an s2 such that s2 belongs to Students and s2.grade > s1.grade (meaning the result of Q2 is "the set of all student ids, if there exists at least one student who secured the highest grade"). Q3 will produce all id values of s1 if every s1 that belongs to Students has no s2 such that s2 belongs to Students and s2.grade > s1.grade (meaning the result of Q3 is "the set of all student ids, if every student secured the highest grade"). But I'm not sure whether the queries Q2 and Q3 are even possible, or whether such a scenario, where the variables on the left of the bar are also bound by quantifiers, could occur in general.

Related

Creating formula in SQL server

Maybe some of you could help me with building this formula calculation in SQL. I need to compute the result of the expression for every given formula. The notation of a formula is simple: P(X) means that the expression adds the integer X in parentheses, M(Y) means that the expression subtracts the integer Y in parentheses, and the "+" symbol combines elements of the formula.
Example: given the formula P(10)+M(5)+M(3)+P(1), it translates to 10 - 5 - 3 + 1 = 3.
Those easy formulas can be handled in Microsoft SQL Server:
1. Split the formula into its parts with STRING_SPLIT, using + as the separator.
2. Use REPLACE to apply a negative number sign: P(X) --> X and M(X) --> -X.
3. Use CONVERT to turn the string parts into numbers.
4. Add everything up with a SUM aggregation and a GROUP BY clause.
Sample data
create table input
(
    formula nvarchar(50)
);

insert into input (formula) values
('P(10)+M(5)+M(3)+P(1)'),
('P(7)+M(3)+M(4)');
Solution
select i.formula,
       sum(convert(int, replace(replace(replace(s.value, 'P(', ''), 'M(', '-'), ')', ''))) as rez
from input i
cross apply string_split(i.formula, '+') s
group by i.formula;
Result
formula rez
-------------------- ---
P(10)+M(5)+M(3)+P(1) 3
P(7)+M(3)+M(4) 0
Fiddle to see everything in action with intermediate steps.
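For reference, the intermediate step (the split parts and their converted values, before aggregation) can be inspected with a query roughly like this, a sketch built on the same pieces as the solution above:
select i.formula,
       s.value as part,
       convert(int, replace(replace(replace(s.value, 'P(', ''), 'M(', '-'), ')', '')) as part_value
from input i
cross apply string_split(i.formula, '+') s;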

DBMS: Relational Algebra Execution Plan Cost Calculation

I have been trying for the last few days to come up with a solution to the following question.
Let's suppose that we have the following tables:
Film(ID',Title,Country,Production_Date)
Actor(ID',Name,Genre,Nationality)
Cast(Actor_ID',Film_ID',Role)
Given information:
Film holds N(film) = 50,000 records, r(film) = 40 bytes, sequentially organized, index on the PK
Actor holds N(actor) = 200,000 records, r(actor) = 80 bytes, heap organized, index on the PK
Cast holds N(cast) = 100,000 records, r(cast) = 25 bytes, heap organized, no indexes
The execution tree and the relational algebra expression for the execution plan are shown in the following picture:
For the lower-level join between Cast and Film I'm calculating the following:
Block Nested Loop Join: B(cast) × B(film)
Index Nested Loop Join: B(cast) + N(cast) × C(film)
I keep the smaller value, which is given by the INLJ.
Question:
Now, how can I calculate the size of the joined table and the new r (the size of a record in the joined table), so that I can then compute the cost B in blocks of the upper-level join between this intermediate result and the Actor table?
I assume you want to do a natural join on FILM.ID = CAST.FILM_ID and CAST.FILM_ID is a foreign key referencing FILM.ID.
1) Size of one row:
A join of Film and Cast results in tuples of the form
[FILM_ID, TITLE, COUNTRY, PRODUCTION_DATE, ACTOR_ID, ROLE].
Hence the row size should be something like
R(FILM JOIN CAST) = R(FILM) + R(CAST) - R(FILM_ID)
since the FILM_ID is the only column which is shared.
2) Number of rows:
N(FILM JOIN CAST) = N(CAST)
This is because there is exactly one matching row in FILM for every row in CAST.
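To put numbers on this (a sketch only; the size of FILM_ID and the block size are not given, so assume a 4-byte integer key and 4,000-byte blocks):
R(FILM JOIN CAST) = 40 + 25 - 4 = 61 bytes
N(FILM JOIN CAST) = N(CAST) = 100,000 rows
blocking factor = floor(4,000 / 61) = 65 rows per block
B(FILM JOIN CAST) = ceil(100,000 / 65) = 1,539 blocks
These N, R and B values are what you would then feed into the cost formula for the upper-level join with Actor.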

Postgres SQL function string_to_array

I have a table:
c1 | c2 | c3 | c4
---+----+----+----
a  | b  | c  | 10
a  | a  | b  | 20
c  | a  | c  | 10
b  | b  | c  | 10
c  | b  | c  | 30
I want to write a function whose inputs are 3 strings / text values, e.g. ('a b c', 'b d', 'c'). It should compare every element to each other, find whether a row exists with that combination, and sum up the numbers in the fourth column (c4). A constellation of b a c or c a b would match the row a b c 10. If there is a row like b c c, then there won't be a row like c b b: every matchup is unique.
I think the best approach would be to use string_to_array(text, text).
I put together some pseudocode, but I have no idea how to write it in SQL. Maybe the logic is wrong, too.
function (x, y, z)
    res = 0
    x_array = string_to_array(x, ' ')
    y_array = string_to_array(y, ' ')
    z_array = string_to_array(z, ' ')
    foreach (x_item in x_array)
        foreach (y_item in y_array)
            foreach (z_item in z_array)
                if (c1 = (x_item || y_item || z_item) && c2 = (x_item || y_item || z_item) && c3 = (x_item || y_item || z_item))
                    res++
EDIT
First of all, there was a mistake in the example table: there was a row a b c and a row c b a. That can't be; a b c = c b a, and each row must be unique.
example: three text inputs a b c | b c | c
each element vs each element: a b c , a c c, b b c, b c c, c b c, c c c
a b c = 10;
a c c (is the same as c a c) = 10;
b b c = 10;
b c c (is the same as c b c) = 30;
c b c = 30;
c c c (no match) = 0; result = 90
I think this might be what you want:
Return the sum of column c4 from all rows where a given set of three tokens matches the columns (c1, c2, c3).
Simple version
Much simpler with the contains @> and is contained by <@ operators:
SELECT sum(c4) AS sum_of_matching_c4
FROM   tbl
WHERE  ARRAY[c1,c2,c3] <@ ARRAY['b', 'a', 'c']  -- strings in arbitrary order
AND    ARRAY[c1,c2,c3] @> ARRAY['b', 'a', 'c'];
Sorry, that would fail for ('b', 'c', 'c') vs. ('c', 'b', 'b').
Slow and sure
WITH i(arr) AS (
   SELECT ARRAY(VALUES ('b'), ('c'), ('c') ORDER BY 1)  -- input once, in arbitrary order
   )
SELECT sum(c4) AS sum_of_matching_c4
FROM  (
   SELECT c4, array_agg(x ORDER BY x) AS arr
   FROM  (
      SELECT ctid, c4, unnest(ARRAY[c1,c2,c3]) AS x
      FROM   tbl t, i
      WHERE  ARRAY[c1,c2,c3] <@ arr  -- optional pre-selection
      AND    ARRAY[c1,c2,c3] @> arr  -- for better performance?
      ) a
   GROUP BY ctid, c4
   ) b
JOIN  i USING (arr);
-> sqlfiddle demo.
The major difficulty is to order the values of the columns within the row.
For your input (3 strings) I achieve this with a VALUES expression in the CTE, which I order right away and collect into an array. I use a CTE for convenience, so we only have to enter the values in one place.
It's more complicated for the row values. I put the three columns into an array and break that up into rows with unnest(). As you did not provide a primary key, I use the ctid as an ad-hoc surrogate primary key instead, which I need for the GROUP BY that stuffs the now-sorted (c1, c2, c3) back into an array.
Finally I sum up all c4 of rows where the now sorted arrays match exactly.
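To see just that intermediate step in isolation, here is a rough sketch (using the same assumed table tbl as above) of the per-row sorted arrays before the final join:
SELECT ctid, c4, array_agg(x ORDER BY x) AS arr
FROM  (SELECT ctid, c4, unnest(ARRAY[c1,c2,c3]) AS x FROM tbl) a
GROUP BY ctid, c4;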
Note: I expressly do not use string_agg() because that does not produce distinct results. Consider:
'abc' 'cde' 'fgh'
'ab' 'ccdef' 'gh'
.. resulting in the same string when concatenated.
Index / Performance
You might consider saving pre-ordered data to speed up queries; doing it on the fly is expensive. I.e. you could pre-generate the sorted array and save it in a redundant column, which you can then support with an index. This should be faster by several orders of magnitude, at the cost of redundant data storage.
If you are dealing with long strings, a solution similar to what I outlined in this related answer on dba.SE might be the best course of action.
Alternatively (preferred!), guarantee that (c1, c2, c3) are always stored in ascending order. You could use a trigger BEFORE INSERT OR UPDATE to keep the values within the row ordered, as sketched below. No redundant storage, and you can simply create a multi-column index on the three columns and compare to them one by one (instead of comparing arrays as in my example).
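A minimal sketch of such a trigger, assuming the table is named tbl and c1, c2, c3 are text columns (the function and trigger names are only illustrative):
CREATE OR REPLACE FUNCTION tbl_order_c1_c3()
  RETURNS trigger AS
$$
DECLARE
   sorted text[];
BEGIN
   -- sort the three incoming values ascending
   SELECT array_agg(x ORDER BY x) INTO sorted
   FROM   unnest(ARRAY[NEW.c1, NEW.c2, NEW.c3]) x;
   NEW.c1 := sorted[1];
   NEW.c2 := sorted[2];
   NEW.c3 := sorted[3];
   RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tbl_order_row
BEFORE INSERT OR UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE tbl_order_c1_c3();
With rows kept ordered, a plain comparison against a likewise-ordered input, e.g. WHERE (c1, c2, c3) = ('b', 'c', 'c'), is enough and can use the multi-column index.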
You don't need to write a function for that.
First, there are no "strings" in PostgreSQL (SQL); the types are "text" or "varchar".
Second, what you need is an SQL query like this:
SELECT (c1 || c2 || c3) AS txtcol, SUM(c4) AS rowsum FROM tbl;
or
SELECT (c1 || c2 || c3) AS txtcol, SUM(c4) AS numsum FROM tbl GROUP BY txtcol;
I can't recall the exact syntax at the moment, so you may need to work it out; anyway, the point is that you need to concatenate the varchar columns with some built-in function like CONCAT or the "||" operator, and then sum/group by the numeric column. All you need is to concatenate the columns and give the resulting combined column a name.
Strictly speaking, you don't even need to show the concatenated column in the resulting table; you could output just the sums and the number of rows summarized, for example.
Theoretically you could write an SQL function or a PL/pgSQL function for that, but I'm sure it's just not necessary; your case seems simple enough to achieve the result you want without a function. The built-in summarizing function SUM() is called an "aggregate" function; other examples of aggregate functions are e.g. MIN() or MAX().
Note that what you're actually trying to do is group rows by a resulting VARCHAR column produced by per-row concatenation.
EDIT: "Arrays" in SQL or procedural SQL are internally handled arrays; do not confuse them with relations (tables in the database) or with tables as SELECT results. I also think you don't need SQL arrays for this; the task really isn't as hard as it looks.

Compare arrays for equality, ignoring order of elements

I have a table with 4 array columns. The rows look like:
ids     | signed_ids | new_ids | new_ids_signed
{1,2,3} | {2,1,3}    | {4,5,6} | {6,5,4}
Is there any way to compare ids and signed_ids so that they come out equal, ignoring the order of the elements?
You can use the contains and contained by operators:
(array1 <@ array2 and array1 @> array2)
The additional module intarray provides operators for arrays of integer, which are typically (much) faster. Install once per database with (in Postgres 9.1 or later):
CREATE EXTENSION intarray;
Then you can:
SELECT uniq(sort(ids)) = uniq(sort(signed_ids));
Or:
SELECT ids @> signed_ids AND ids <@ signed_ids;
The functions uniq() and sort() and the operators above come from intarray.
In the second example, operator resolution arrives at the specialized intarray operators if both the left and right arguments are of type integer[].
Both expressions ignore order and duplication of elements. Further reading in the helpful manual here.
intarray operators only work for arrays of integer (int4), not bigint (int8) or smallint (int2) or any other data type.
Unlike the default generic operators, intarray operators do not accept NULL values in arrays. NULL in any involved array raises an exception. If you need to work with NULL values, you can default to the standard, generic operators by schema-qualifying the operator with the OPERATOR construct:
SELECT ARRAY[1,4,null,3]::int[] OPERATOR(pg_catalog.@>) ARRAY[3,1]::int[];
The generic operators can't use indexes with an intarray operator class and vice versa.
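To make that concrete (a sketch; the table and index names are only illustrative, assuming a table aa with a column ids integer[]):
CREATE INDEX aa_ids_gin ON aa USING gin (ids gin__int_ops);  -- serves the intarray operators
A plain GIN index created with USING gin (ids) uses the default array_ops operator class and only serves the generic operators.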
Related:
GIN index on smallint[] column not used or error "operator is not unique"
The simplest thing to do is sort them and compare them sorted. See sorting arrays in PostgreSQL.
Given sample data:
CREATE TABLE aa(ids integer[], signed_ids integer[]);
INSERT INTO aa(ids, signed_ids) VALUES (ARRAY[1,2,3], ARRAY[2,1,3]);
the best thing to do, if the array entries are always integers, is to use the intarray extension, as Erwin explains in his answer. It's a lot faster than any pure SQL formulation.
Otherwise, for a general version that works for any data type, define an array_sort(anyarray):
CREATE OR REPLACE FUNCTION array_sort(anyarray) RETURNS anyarray AS $$
SELECT array_agg(x order by x) FROM unnest($1) x;
$$ LANGUAGE 'SQL';
and use it to sort and compare the sorted arrays:
SELECT array_sort(ids) = array_sort(signed_ids) FROM aa;
There's an important caveat:
SELECT array_sort( ARRAY[1,2,2,4,4] ) = array_sort( ARRAY[1,2,4] );
will be false. This may or may not be what you want, depending on your intentions.
Alternately, define a function array_compare_as_set:
CREATE OR REPLACE FUNCTION array_compare_as_set(anyarray, anyarray) RETURNS boolean AS $$
SELECT CASE
  WHEN array_dims($1) <> array_dims($2) THEN
    'f'
  WHEN array_length($1,1) <> array_length($2,1) THEN
    'f'
  ELSE
    NOT EXISTS (
      SELECT 1
      FROM unnest($1) a
      FULL JOIN unnest($2) b ON (a = b)
      WHERE a IS NULL OR b IS NULL
    )
  END
$$ LANGUAGE 'SQL' IMMUTABLE;
and then:
SELECT array_compare_as_set(ids, signed_ids) FROM aa;
This is subtly different from comparing two array_sorted values. array_compare_as_set will eliminate duplicates, making array_compare_as_set(ARRAY[1,2,3,3],ARRAY[1,2,3]) true, whereas array_sort(ARRAY[1,2,3,3]) = array_sort(ARRAY[1,2,3]) will be false.
Both of these approaches will have pretty bad performance. Consider ensuring that you always store your arrays sorted in the first place.
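For example (a sketch, reusing array_sort() from above and the sample table aa), you could normalize on the way in:
INSERT INTO aa(ids, signed_ids)
VALUES (array_sort(ARRAY[3,1,2]), array_sort(ARRAY[2,1,3]));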
If your arrays have no duplicates and are of the same dimension:
use the array contains operator @>
AND array_length, where the length must match the size you want on both sides
select (string_agg(a, ',' order by a) = string_agg(b, ',' order by b))
from  (select unnest(array[1,2,3,2])::text as a,
              unnest(array[2,2,3,1])::text as b) A;

"if, then, else" in SQLite

Without using custom functions, is it possible in SQLite to do the following? I have two tables, which are linked via common id numbers. In the second table there are two variables. What I would like to do is return a list of results consisting of: the row id, and NULL if all instances of those two variables (and there may be more than two) are NULL, 1 if they are all 0, and 2 if one or more is 1.
What I have right now is as follows:
SELECT
    a.aid,
    (SELECT count(*) FROM W3S19 b WHERE a.aid = b.aid) AS num,
    (SELECT count(*) FROM W3S19 c WHERE a.aid = c.aid AND H110 IS NULL AND H112 IS NULL) AS num_null,
    (SELECT count(*) FROM W3S19 d WHERE a.aid = d.aid AND (H110 = 1 OR H112 = 1)) AS num_yes
FROM W3 a
So what this requires is to step through each result as follows (rough Python pseudocode):
if row['num_yes'] > 0:
    out[aid] = 2
elif row['num_null'] == row['num']:
    out[aid] = 'NULL'
else:
    out[aid] = 1
Is there an easier way? Thanks!
Use CASE...WHEN, e.g.
CASE x WHEN w1 THEN r1 WHEN w2 THEN r2 ELSE r3 END
Read more in the SQLite syntax manual (see the section "The CASE expression").
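Applied to the query in the question, a rough sketch (untested, reusing the tables and columns above) that mirrors the Python logic could look like this:
SELECT a.aid,
       CASE
           WHEN (SELECT count(*) FROM W3S19 d
                 WHERE a.aid = d.aid AND (d.H110 = 1 OR d.H112 = 1)) > 0
               THEN 2
           WHEN (SELECT count(*) FROM W3S19 c
                 WHERE a.aid = c.aid AND c.H110 IS NULL AND c.H112 IS NULL)
              = (SELECT count(*) FROM W3S19 b WHERE a.aid = b.aid)
               THEN NULL
           ELSE 1
       END AS result
FROM W3 a;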
There's another way, for numeric values, which might be easier for certain specific cases.
It's based on the fact that boolean values are 1 or 0, and an "if"-style condition gives a boolean result
(this will only work for an "or"-type condition, depending on the usage):
SELECT (w1=TRUE)*r1 + (w2=TRUE)*r2 + ...
Of course, #evan's answer is the general-purpose, correct answer.
