number of functional dependencies - database

Find the number of functional dependencies in a relation having n attributes?
At first I figured that the left-hand side can have n+1 possibilities (null can also be there), and similarly n+1 possibilities for the right-hand side.
Hence the total number of FDs should be
(n+1)*(n+1) - 1
But the answer given is 2^(n+1).
Analyzing the answer, we can see that it does not include trivial ones like ABC -> A, etc.
So what should the correct answer be?

There are 2^n possible attribute sets on the LHS, and again 2^n possible attribute sets on the RHS. Both counts include the empty set.
The number of possible distinct pairs between those is 2^n * 2^n = 4^n.
While technically correct, this answer also counts FDs such as {AB} -> {}. How many such trivial FDs are there? An LHS of cardinality l has 2^l subsets, each of which gives a trivial FD when it appears on the RHS, and there are C(n, l) (n choose l) different LHS sets of cardinality l. So the number of trivial FDs is C(n,0)*2^0 + C(n,1)*2^1 + ... + C(n,n)*2^n = 3^n (by the binomial theorem), leaving 4^n - 3^n in total.
But now we have excluded only {AB} -> {A} and the like, and not {AB} -> {AC}. If we want the RHS to mention only attributes that are not mentioned on the LHS, then for an LHS of cardinality l there are 2^(n-l) - 1 possible subsets on the RHS (the extra minus one is needed because the empty set must be excluded). Summing over all C(n, l) LHS sets of each cardinality gives C(n,0)*(2^n - 1) + C(n,1)*(2^(n-1) - 1) + ... + C(n,n)*(2^0 - 1) = 3^n - 2^n.
Still different from the answer given. And at any rate, the question was formulated hopelessly poorly. The question DID NOT STATE that trivial FDs were to be excluded. The question DID NOT STATE that "partially trivial" FDs were to be excluded.
BTW, let's put the answers to the test. Choose a relation of degree 1, {A}. There are four possible FDs:
{} -> {} trivial, RHS subset of LHS
{} -> {A}
{A} -> {} trivial, RHS subset of LHS
{A} -> {A} trivial, RHS subset of LHS.
The correct answer, if trivial FDs are to be excluded, is "one". Your textbook says it's "four".
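For small n these counts are easy to check by brute force. The sketch below (the name `count_fds` is my own) enumerates every LHS/RHS pair of attribute subsets, empty set included, and tallies all pairs, the trivial ones (RHS a subset of LHS), and those whose RHS is non-empty and disjoint from the LHS. Because an LHS of size l can be chosen in C(n, l) ways, the three counts should come out as 4^n, 3^n and 3^n - 2^n.

```python
from itertools import combinations

def count_fds(n):
    """Return (all FDs, trivial FDs, FDs with non-empty RHS disjoint from LHS)
    over n attributes, counting every LHS/RHS pair of subsets (empty set included)."""
    attrs = range(n)
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(attrs, k)]
    total = trivial = disjoint = 0
    for lhs in subsets:
        for rhs in subsets:
            total += 1
            if rhs <= lhs:                  # RHS is a subset of LHS: trivial
                trivial += 1
            if rhs and not (rhs & lhs):     # non-empty RHS sharing nothing with LHS
                disjoint += 1
    return total, trivial, disjoint
```

For n = 1 it returns (4, 3, 1), matching the four FDs listed above: three trivial ones and the single {} -> {A}.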

I think 2^n * 2 is the answer, because it includes all the basic minimal FDs; the rest can either be derived from the axioms or are trivial. For example, A -> BC gives A -> B, A -> C, etc., and ABC -> A, etc. are trivial.
However, the question does not make clear whether all trivial cases should be included. If they are, the answer is 2^n * 2^n, because there are 2^n choices each for the LHS and the RHS.

Related

Daily Coding Problem 260: Reconstruct a jumbled array - Intuition?

I'm going through the question below.
The sequence [0, 1, ..., N] has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it.
For example, given [None, +, +, -, +], you could return [1, 2, 3, 0, 4].
I went through the solution on this post, but I'm still unable to understand why it works. I don't think I would be able to come up with the solution if I had this in front of me during an interview. Can anyone explain the intuition behind it? Thanks in advance!
This answer tries to give a general strategy for finding an algorithm for this type of problem. It does not try to prove why the given solution is correct, but lays out a route towards such a solution.
A tried and tested way to tackle this kind of problem (actually a wide range of problems) is to start with small examples and work your way up. This works for puzzles, but also for problems encountered in practice.
First, note that the question is formulated deliberately to not point you in the right direction too easily. It makes you think there is some magic involved. How can you reconstruct a list of N numbers given only the list of plusses and minuses?
Well, you can't. For 10 numbers, there are 10! = 3628800 possible permutations, and only 2⁹ = 512 possible lists of signs. That's a huge difference: most original lists cannot be recovered, and the reconstruction will usually be completely different from the original.
Here's an overview of how to approach the problem:
Start with very simple examples
Try to work your way up, adding a bit of complexity
If you see something that seems a dead end, try increasing complexity in another way; don't spend too much time with situations where you don't see progress
While exploring alternatives, revisit old dead ends, as you might have gained new insights
Check whether recursion could work:
given a solution for N, can we easily construct a solution for N+1?
or even better: given a solution for N, can we easily construct a solution for 2N?
Given a recursive solution, can it be converted to an iterative solution?
Does the algorithm do some repetitive work that can be postponed to the end?
....
So, let's start simple (writing 0 for the None at the start):
very short lists are easy to guess:
'0++' → 0 1 2 → clearly only one solution
'0--' → 2 1 0 → only one solution
'0-+' → 1 0 2 or 2 0 1 → hey, there is no unique outcome, though the question only asks for one of the possible outcomes
lists with only plusses:
'0++++++' → 0 1 2 3 4 5 6 → only possibility
lists with only minuses:
'0-------'→ 7 6 5 4 3 2 1 0 → only possibility
lists with one minus, the rest plusses:
'0-++++' → 1 0 2 3 4 5 or 5 0 1 2 3 4 or ...
'0+-+++' → 0 2 1 3 4 5 or 1 5 0 2 3 4 or ...
→ no very obvious pattern seems to emerge
maybe some recursion could help?
given a solution for N, appending one sign more?
appending a plus is easy: just keep the solution and append its largest number plus 1
appending a minus, after some thought: increase all the numbers by 1 and append a zero
→ hey, we have a working solution, but maybe not the most efficient one
the algorithm just appends to an existing list, no need to really write it recursively (although the idea is expressed recursively)
appending a plus can be improved, by storing the largest number in a variable so it doesn't need to be searched at every step; no further improvements seem necessary
appending a minus is more troublesome: the list needs to be traversed with each append
what if instead of appending a zero, we append -1, and do the adding at the end?
this clearly works when there is only one minus
when two minus signs are encountered, the first time append -1, the second time -2
→ hey, this works for any number of minuses encountered: just keep the count of minuses in a variable and add it to every number at the end of the algorithm
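Put together, these steps give a short linear-time algorithm. A minimal Python sketch, assuming the input is a list whose first element is None and whose other elements are the strings '+' and '-' (the name `reconstruct` is hypothetical):

```python
def reconstruct(signs):
    """signs[0] is None; signs[1:] are '+' or '-'.
    Returns a permutation of 0..N consistent with the signs."""
    result = [0]
    highest = 0   # largest value appended so far
    lowest = 0    # smallest value appended so far (goes negative on '-')
    for s in signs[1:]:
        if s == '+':
            highest += 1          # one more than everything so far
            result.append(highest)
        else:
            lowest -= 1           # one less than everything so far
            result.append(lowest)
    # Shift up by the number of minuses so the smallest value becomes 0.
    return [x - lowest for x in result]
```

On the example from the question, reconstruct([None, '+', '+', '-', '+']) returns [1, 2, 3, 0, 4].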
This is, in a bird's-eye view, one possible route towards a solution. Many roads lead to Rome. Introducing negative numbers might seem tricky, but it is a logical conclusion after contemplating the recursive algorithm for a while.
It works because all changes are sequential (either adding one to the current maximum or subtracting one from the current minimum), and both the increasing and the decreasing sequences start from the same place. That guarantees that, overall, the values form one contiguous range. For example, given the arbitrary input
[None, +, -, +, +, -]
turned vertically for convenience, we can see
None 0
+ 1
- -1
+ 2
+ 3
- -2
Now just shift them up by two (to account for -2):
2 3 1 4 5 0
+ - + + -
Let's first look at a solution which (I think) is easier to understand, formalize and prove correct (though I will only explain it, not prove it formally):
We name A[0..N] our input array (where A[k] is None if k = 0 and is + or - otherwise) and B[0..N] our output array (where B[k] is in the range [0, N] and all values are unique)
First, note that any solution of a stricter problem also solves ours (find B such that B[k] > B[k-1] if A[k] == + and B[k] < B[k-1] if A[k] == -). The stricter problem is:
Find B such that B[k] == max(B[0..k]) if A[k] == + and B[k] == min(B[0..k]) if A[k] == -.
This strengthens "a value must be larger or smaller than the previous one" to "a value must be larger or smaller than every value before it".
So a solution to this problem is a solution to the original one as well.
Now how do we approach this problem?
A greedy solution is sufficient; indeed, it is easy to show that the value associated with the last + will be the biggest number overall (which is N), the one associated with the second-last + will be the second biggest (which is N-1), etc.
At the same time, the value associated with the last - will be the smallest number overall (which is 0), the one associated with the second-last - will be the second smallest (which is 1), etc.
So we can fill B from right to left, remembering how many + we have seen (call this X) and how many - we have seen (call this Y). If the current symbol is a +, we put N-X in B and increase X by 1; if it is a -, we put 0+Y in B and increase Y by 1.
In the end we need to fill B[0] with the only remaining value, which (with X and Y now being the total numbers of + and -) is equal to Y and to N-X.
An interesting property of this solution is that the values associated with a - are all the values from 0 to Y-1 (where Y is the total number of -) in reverse sorted order; the values associated with a + are all the values from N-X+1 to N (where X is the total number of +) in sorted order; and B[0] is always Y, which equals N-X.
So the - will have all the values strictly smaller than B[0] and reverse sorted and the + will have all the values strictly bigger than B[0] and sorted.
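The right-to-left greedy fill can be sketched in Python as follows. The name `fill_right_to_left` is hypothetical, and A is assumed to be a list with A[0] = None. Since X and Y are counts in this sketch, the value left over for B[0] after the loop is Y, which equals N - X.

```python
def fill_right_to_left(A):
    """A[0] is None, A[1..N] are '+' or '-'. Returns B as described above."""
    N = len(A) - 1
    B = [None] * (N + 1)
    X = Y = 0                 # number of '+' / '-' seen so far (from the right)
    for k in range(N, 0, -1):
        if A[k] == '+':
            B[k] = N - X      # last '+' gets N, the one before it N-1, ...
            X += 1
        else:
            B[k] = Y          # last '-' gets 0, the one before it 1, ...
            Y += 1
    B[0] = Y                  # the only value left; Y == N - X here
    return B
```

For [None, '+', '+', '-', '+'] it returns [1, 2, 3, 0, 4], the answer from the question.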
This property is the key to understanding why the solution proposed here works:
It considers B[0] to be 0 and then fills B following the property. This isn't a solution yet, because the values are not in the range [0, N], but a simple translation moves them into [0, N].
The idea is to produce a permutation of [0,1...N] that follows the pattern of [+,-...]. Many permutations will fit, not just one. For instance, look at the example provided:
[None, +, +, -, +], you could return [1, 2, 3, 0, 4].
But you could also have returned other, equally valid solutions: [2,3,4,0,1] and [0,3,4,1,2] are also solutions. The only requirements are that the first number has at least two larger numbers after it for positions [1] and [2], and that the number at position [3] is lower than the ones before and after it.
So the question isn't about finding the one and only pattern that was scrambled, but about producing any permutation that works with these rules.
This algorithm answers two questions for the next member of the list: how to get a number that is higher/lower than the previous one, and how to get a number that hasn't been used yet. It takes a starting number and essentially creates two lists: an ascending list for the '+' and a descending list for the '-'. This guarantees that the next member is higher/lower than the previous one (because it is in fact higher/lower than all previous members, a stricter condition than the one required), and for the same reason we know this number wasn't used before.
So the intuition of the referenced algorithm is to start with a reference number and work your way through. Let's assume we start from 0. In the first place we put 0+1, which is 1. We keep 0 as our lowest and 1 as the highest.
l[0] h[1] list[1]
the next symbol is '+' so we take the highest number and raise it by one to 2, and update both the list with a new member and the highest number.
l[0] h[2] list [1,2]
The next symbol is '+' again, and so:
l[0] h[3] list [1,2,3]
The next symbol is '-', so we put in our 0. Note that if the following symbol were also '-', we would have to stop, since we have no lower number to produce.
l[0] h[3] list [1,2,3,0]
Luckily for us, we've chosen well and the last symbol is '+', so we can put our 4 and call it a day.
l[0] h[4] list [1,2,3,0,4]
This is not necessarily the smartest solution, as it can never know whether the chosen starting number will solve the sequence, and it always progresses by 1. That means that for some patterns [+,-...] it will not find a solution. But for the pattern provided it works well with 0 as the starting point. If we chose the number 1, it would also work and produce [2,3,4,0,1], but for 2 and above it will fail. It will never produce the solution [0,3,4,1,2].
I hope this helps understanding the approach.
This is not an explanation for the question put forward by OP.
Just want to share a possible approach.
Given: N = 7
Index: 0 1 2 3 4 5 6 7
Pattern: X + - + - + - + //X = None
Assign the values 0 to N in increasing order:
[1] fill all '-' starting from right going left.
Index: 0 1 2 3 4 5 6 7
Pattern: X + - + - + - + //X = None
Answer: _ _ 2 _ 1 _ 0 _
[2] fill all the vacant places i.e [X & +] starting from left going right.
Index: 0 1 2 3 4 5 6 7
Pattern: X + - + - + - + //X = None
Answer: 3 4 _ 5 _ 6 _ 7
Final:
Pattern: X + - + - + - + //X = None
Answer: 3 4 2 5 1 6 0 7
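The two passes translate directly into code. A sketch in Python, assuming the pattern is a list with None at index 0 (the function name is hypothetical):

```python
def fill_minus_then_rest(pattern):
    """pattern[0] is None; pattern[1:] are '+' or '-'."""
    N = len(pattern) - 1
    answer = [None] * (N + 1)
    value = 0
    # [1] '-' positions, from right to left, get 0, 1, 2, ...
    for k in range(N, 0, -1):
        if pattern[k] == '-':
            answer[k] = value
            value += 1
    # [2] the vacant positions (None and '+'), from left to right, get the rest
    for k in range(N + 1):
        if answer[k] is None:
            answer[k] = value
            value += 1
    return answer
```

For the pattern above it returns [3, 4, 2, 5, 1, 6, 0, 7]. Every '-' value is smaller than every other value and the '-' values decrease from left to right, while the remaining values increase from left to right, so each sign is satisfied.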
My answer is definitely too late for your problem, but if you need a simple proof, you may like to read it:
min_last (or min_so_far) is a decreasing value starting from 0.
max_last (or max_so_far) is an increasing value starting from 0.
Every input symbol after the leading None is either "+" or "-": a "+" increases max_so_far by one and a "-" decreases min_so_far by one. So at the end, max_so_far is the number of "+"s, min_so_far is minus the number of "-"s, and max_so_far - min_so_far is exactly N. The values used are therefore exactly the N+1 integers in [min_so_far, max_so_far]. Since you need the range [0, N] and min_so_far <= 0, the final step is to subtract min_so_far from every value of the current answer (that is, add abs(min_so_far)).

Theorem Solution by Resolution Refutation

I have the following problem, which I need to solve by the resolution method in Artificial Intelligence.
I don't understand why the negation of dog(x) is added in the first clause, and why the negation of animal(Y) is added in the fourth clause.
I mean, what is the need for negation there?
Recall that logical implication P → Q is equivalent to ¬P ∨ Q. You can verify this by looking at the truth table:
P Q P → Q
0 0 1
1 0 0
0 1 1
1 1 1
Now clearly dog(X) → animal(X) is equivalent to ¬dog(X) ∨ animal(X), which is a disjunction of literals and therefore a clause.
The same reasoning applies to animal(Y) → die(Y).
Once you have a set of formulas in clausal form that is equivalent to your input knowledge base, you can apply binary resolution to check whether your knowledge base is consistent, or to prove a goal.
To prove a goal, you add its negation to your consistent knowledge base and check whether the knowledge base with the added negated goal becomes inconsistent.
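A small ground (variable-free) version of this procedure can be sketched in Python. Since the original exercise is not reproduced here, the knowledge base below is an assumption: the two implications from the question instantiated with a hypothetical constant `fido`, plus a hypothetical fact dog(fido), with the goal die(fido). Full first-order resolution would also need unification, which this sketch omits.

```python
from itertools import combinations

def negate(lit):
    """'~p' <-> 'p' for a literal written as a string."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refutes(clauses):
    """Saturate the clause set; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True      # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False             # nothing new: no refutation exists
        clauses |= new

# Hypothetical ground knowledge base (X and Y instantiated to fido).
KB = [
    frozenset({"~dog(fido)", "animal(fido)"}),   # dog(X) -> animal(X)
    frozenset({"~animal(fido)", "die(fido)"}),   # animal(Y) -> die(Y)
    frozenset({"dog(fido)"}),                    # assumed fact
]
GOAL = "die(fido)"
```

Adding ¬die(fido) yields the empty clause (dog(fido) with ¬dog(fido) ∨ animal(fido) gives animal(fido), then die(fido), which clashes with ¬die(fido)), so refutes(KB + [frozenset({negate(GOAL)})]) is True, while refutes(KB) on its own is False.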

Compact storage coefficients of a multivariate polynomial

The setup
I am writing code for dealing with polynomials of degree n over a d-dimensional variable x, and ran into a problem that others have likely faced in the past. Such a polynomial can be characterized by coefficients c(alpha) corresponding to x^alpha, where alpha is a length-d multi-index specifying the powers the d variables must be raised to.
The dimension and order are completely general, but known at compile time, and could easily be as high as n = 30 and d = 10, though probably not at the same time. The coefficients are dense, in the sense that most coefficients are non-zero.
The number of coefficients required to specify such a polynomial is (n + d) choose n, which in high dimensions is much less than the n^d coefficients that could fill a cube of side length n. As a result, in my situation I have to store the coefficients rather compactly. This comes at a price, because retrieving the coefficient for a given multi-index alpha requires knowing its location.
The question
Is there a (straightforward) function mapping a d-dimensional multi-index alpha to a position in an array of length (n + d) choose n?
Ordering combinations
A well-known way to order combinations can be found on this Wikipedia page. Very briefly, you order the combinations lexicographically so that you can easily count the number of combinations that come before a given one. An explanation can be found in the sections Ordering combinations and Place of a combination in the ordering.
Precomputing the binomial coefficients will speed up the index calculation.
Associating monomials with combinations
If we can now associate each monomial with a combination, we can order them with the method above. Since each coefficient corresponds to such a monomial, this would provide the answer you're looking for. Luckily, if
alpha = (a[1], a[2], ..., a[d])
then the combination you're looking for is
combination = (a[1] + 0, a[1] + a[2] + 1, ..., a[1] + a[2] + ... + a[d] + d - 1)
The index can then readily be calculated with the formula from the wikipedia page.
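As an illustration, here is a Python sketch of the mapping combined with the rank formula of the combinatorial number system: for 0 <= c_1 < ... < c_d, the index is C(c_1, 1) + C(c_2, 2) + ... + C(c_d, d). It uses `math.comb` in place of a precomputed table, and the function names are hypothetical. The largest element c_d = |alpha| + d - 1 grows with the total degree, so monomials of lower total degree get lower indices. This means the indices of all monomials of degree <= n fill 0 .. C(n+d, n) - 1 exactly, for every n.

```python
from itertools import product
from math import comb

def monomial_to_combination(alpha):
    """(a1, ..., ad) -> (a1 + 0, a1 + a2 + 1, ..., a1 + ... + ad + d - 1)."""
    combo, running = [], 0
    for i, a in enumerate(alpha):
        running += a
        combo.append(running + i)   # strictly increasing because of the "+ i"
    return combo

def monomial_index(alpha):
    """Rank of the combination in the combinatorial number system."""
    return sum(comb(c, i + 1) for i, c in enumerate(monomial_to_combination(alpha)))

def monomials(n, d):
    """Brute-force list of all exponent tuples of length d with total degree <= n."""
    return [a for a in product(range(n + 1), repeat=d) if sum(a) <= n]
```

For d = 2, monomial_index((0, 0)), monomial_index((0, 1)) and monomial_index((1, 0)) give 0, 1 and 2. The brute-force `monomials` helper is only there to check that the indices form a bijection onto 0 .. C(n+d, n) - 1.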
A better, more object-oriented solution would be to create Monomial and Polynomial classes. The Polynomial class would encapsulate a collection of Monomials. That way you can easily model a pathological case like
y(x) = 1.0 + x^50
using just two terms rather than 51.
Another solution would be a map/dictionary where the key is the exponent and the value is the coefficient. That would require only two entries for my pathological case. You're in business if you have a C/C++ hash map.
Personally, I don't think doing it the naive way with arrays is so terrible, even with a polynomial containing 1000 terms. RAM is cheap; that array won't make or break you.

Non trivial functional dependency in DBMS

What are the non-trivial functional dependencies in the following table?
A B C
1 1 1
1 1 0
2 3 2
2 3 2
What is the basic concept?
A functional dependency answers the question, "Given one value for X, do I find one and only one value for Y?" Both X and Y are sets; each one represents one or more attributes.
So we can ask ourselves, "Given one value for 'A', do I find one and only one value for 'B'?" And the answer is "Yes". (Assuming the sample data is representative.) That leads to the nontrivial functional dependency A->B.
And we continue with the question, "Given one value for 'A', do I find one and only one value for 'C'?" And the answer is "No". Given 1 for 'A', we find two different values for 'C': 1 and 0. No functional dependency there.
Repeat for every possible combination of attributes.
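The "repeat for every combination" step is mechanical enough to script. Below is a Python sketch (all names are hypothetical) that tests every FD X -> Y with non-empty X and non-empty Y disjoint from X against the sample rows. The result is only meaningful if the sample data is representative, since rows can refute an FD but never prove it.

```python
from itertools import combinations

ROWS = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 2, "B": 3, "C": 2},
    {"A": 2, "B": 3, "C": 2},
]

def holds(lhs, rhs, rows):
    """X -> Y holds if equal X-values always come with equal Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def subsets(attrs):
    """All non-empty subsets of attrs, as tuples."""
    for k in range(1, len(attrs) + 1):
        yield from combinations(attrs, k)

def completely_nontrivial_fds(rows):
    attrs = sorted(rows[0])
    found = []
    for lhs in subsets(attrs):
        rest = [a for a in attrs if a not in lhs]   # RHS must avoid LHS attributes
        for rhs in subsets(rest):
            if holds(lhs, rhs, rows):
                found.append(("".join(lhs), "".join(rhs)))
    return found
```

On this table it reports A -> B, B -> A, C -> A, C -> B, C -> AB, AC -> B and BC -> A.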
Trivial: If an FD X → Y holds where Y is a subset of X, it is called a trivial FD. Trivial FDs always hold.
Non-trivial: If an FD X → Y holds where Y is not a subset of X, it is called a non-trivial FD.
Completely non-trivial: If an FD X → Y holds where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
For example:
X = { b, c } and Y = { b, a }. If X → Y, then the FD is non-trivial but not completely non-trivial.
See the examples here: http://en.wikipedia.org/wiki/Functional_dependency
Especially the lecture one. I think in this case (for the data set you show), for instance, A=1 always goes with B=1 and A=2 always goes with B=3. That is probably the dependency you are talking about.
A non-trivial dependency X --> Y is one where Y is not a subset of X; such an FD in a table or relation is said to be a non-trivial functional dependency.
An FD (functional dependency) is trivial, non-trivial or semitrivial.
Write what all attributes have functional dependency between them:
A->B, B->A, C->A, C->B
Using the augmentation inference rule we also get:
AC->B, BC->A
Augmentation says that if A -> B holds then AX -> BX holds.
So in total we have listed 6 non-trivial functional dependencies (the list is not exhaustive; for example, C -> AB also holds by the union rule).
Trivial FD: x, y are some attribute sets; if y is a subset of x, then x->y is a trivial FD.
Non-trivial FD: x, y are some attribute sets;
if x intersection y is phi, then x->

Can all objects be paired in a sequence?

First of all, this is not homework or anything like that; it is a problem suggested by my last question, Candidate in finding majority element in an array.
There are n kinds of objects O1, O2, ..., On, and an array F[1...n], where F[i] is the number of objects Oi (i.e. there are F[i] Ois; the array F[1...n] is given), and every F[i] > 0.
Now use the following rule to pair objects:
if i != j, Oi can be paired with Oj,
else if i == j, Oi can not be paired with Oj.
i.e., only two different kinds of objects can be paired with each other.
Does a valid pairing exist for the input array F[1...n]? Give an algorithm with the best possible time complexity that answers true or false, and prove its correctness.
If one exists, output one valid pairing sequence. Give your algorithm and its time/space complexity analysis.
For example,
input F[] = {10, 2, 4, 4};
Then there exists at least one valid pairing method:
2 O1s and 2 O2s paired, 4 O1s and 4 O3s paired, 4 O1s and 4 O4s paired.
One valid pairing sequence is:
(1,2) (1,2) (1,3) (1,3) (1,3) (1,3) (1,4) (1,4) (1,4) (1,4)
Checking if a solution exists in O(n)
Let s be the sum of F.
If s is odd there is no solution (intuitive)
If there exists an i such that F[i] > s/2 there is no solution (intuitive)
Otherwise, a solution exists (proof by construction follows)
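The two failure conditions translate directly into an O(n) check. A minimal sketch, assuming F is a non-empty Python list of positive counts (the function name is hypothetical); the "otherwise a solution exists" direction relies on the construction that follows.

```python
def pairing_exists(F):
    """O(n) check: a pairing exists iff the total is even
    and no single type exceeds half of the total."""
    s = sum(F)
    return s % 2 == 0 and 2 * max(F) <= s
```

For the example F = [10, 2, 4, 4], s = 20 and max(F) = 10, so it returns True; for [3, 1] it returns False because 3 > 4/2.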
Finding a solution in O(n)
def pairing(F):
    """Return a list of (type_a, type_b, count) pairings with 1-based types, or None."""
    F = list(F)                          # work on a copy of the frequencies
    # Find s
    s = sum(F)
    # Find m such that F[m] is maximal (types are 0-based internally)
    m = F.index(max(F))
    if s % 2 != 0 or 2 * F[m] > s:
        return None                      # fail: no valid pairing exists
    result = []

    def pair(a, b, count):
        # Pairs off 'count' pairs of types a and b
        nonlocal s
        F[a] -= count
        F[b] -= count
        s -= 2 * count
        result.append((a + 1, b + 1, count))   # "Pairing off a and b (count times)"

    a = b = 0
    # Pair off arbitrary objects (except those of type m) until F[m] == s/2
    while 2 * F[m] < s:
        # Find a type with a non-zero frequency
        while F[a] == 0 or a == m:
            a += 1
        # Find another type with a non-zero frequency
        while F[b] == 0 or b == m or b == a:
            b += 1
        pair(a, b, min(F[a], F[b], s // 2 - F[m]))
    # Pair off objects of type m with objects of different types
    while F[m] > 0:
        # Find a type with a non-zero frequency
        while F[a] == 0 or a == m:
            a += 1
        pair(a, m, F[a])
    return result
The two while loops are both linear. The first while loop increments either a or b by at least one at each iteration, because after matching up count pairs either F[a] is zero, or F[b] is zero, or s/2 = F[m] and the loop terminates. a and b can each be incremented at most n times before all the elements have been visited. The second while loop is also linear, since it increments a by at least one during each iteration.
The key invariants are
(1) F[m] is the largest element of F
(2) F[m] <= s/2
I think both are fairly obvious upon inspection.
For the inner search loops, note that as long as s/2 > F[m], there must be at least two other object types with non-zero frequencies. If there was only one, say a, then
F[a] + F[m] = s
F[a] = s - F[m] > s - s/2 (from the loop condition)
F[a] > s/2
F[a] > F[m]
which is a contradiction of invariant (1).
Since there are at least two types with non-zero frequencies (besides m) the loop will be able to find types a and b and pair off objects until s/2 = F[m].
The second loop is trivial. Since exactly half the objects are of type m, each object of type m can be paired with an object of a different type.
Here's one suggestion. I'm not sure if it succeeds for every possible case, though, or if it's the most efficient possible algorithm.
Let n be the total number of indices. Construct a highest-value priority queue of the numbers of object types, where each object type is its index i. In other words, make a priority queue where the sorting values in the queue are the values of F. Associate each node with the list of all objects of that type. This will take O(n log(n)) time.
For each object type, starting with the type that has the most duplicates and proceeding to the type with the fewest, pair one object from that "class" of objects with one object for each of the other classes that still have objects remaining in them, and remove that object from that node in the queue. Since every queue item except the top one will have one fewer object in it, most of the queue will still be in priority-queue order; the top node, however, will have n-1 fewer items (or it will be empty), so heapify down to maintain the queue. Also, remove nodes with no objects left. Repeat this process with the new highest queue value until all nodes are paired. This will take O(n log(n) + k) time, where k is the total number of items. Assuming that k is significantly larger than n, the total time complexity is O(k).
Again, I'm not quite sure that this will always find a solution if one is possible. My intuition is that if you re-heapify (if necessary) after every pairing, you'll ensure that if a full pairing is possible it will be found, but (1) this would be much less efficient, and (2) I'm not sure what cases it would succeed at that the original algorithm wouldn't, and (3) I'm not entirely certain even that would work every time.
As for what values of F have no solution, obviously if there exists one class of objects that has more elements than all the other classes combined, no pairing is possible. Beyond that...I'm not really sure. It would be interesting to investigate whether my "improved" algorithm correctly evaluates every case or not.
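The "re-heapify after every pairing" idea can be sketched with Python's heapq by always taking one object from each of the two largest remaining classes. Under the conditions from the first answer (an even total s and no class larger than s/2), this greedy step keeps both conditions true. The largest class drops by one while s drops by two, and the third-largest class z satisfies 3z <= s with s even, which gives 2z <= s - 2 whenever z >= 1. So it never gets stuck. The function name is hypothetical, and the cost is O(k log n) for k objects rather than the O(n) of the counting approach.

```python
import heapq

def greedy_pairs(F):
    """Repeatedly pair one object from each of the two largest classes.
    F holds the counts; types are reported 1-based as in the question.
    Returns a list of (type, type) pairs, or None if some objects stay unpaired."""
    heap = [(-f, t) for t, f in enumerate(F, start=1) if f > 0]  # max-heap via negation
    heapq.heapify(heap)
    pairs = []
    while len(heap) >= 2:
        f1, t1 = heapq.heappop(heap)   # largest class
        f2, t2 = heapq.heappop(heap)   # second-largest class
        pairs.append((t1, t2))
        # Put each class back if it still has objects (counts are negated).
        if f1 + 1 < 0:
            heapq.heappush(heap, (f1 + 1, t1))
        if f2 + 1 < 0:
            heapq.heappush(heap, (f2 + 1, t2))
    return None if heap else pairs
```

For F = [10, 2, 4, 4] it returns ten pairs, each containing one object of type 1; for [3, 1] it returns None, since one class has more objects than all the others combined.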
