Determining functional dependencies from a chart

Can anyone explain to me how to go about figuring out which dependencies the following instance satisfies?
A B C
1 0 1
1 1 1
I know it satisfies B → A, B → C, A → C and C → A (and other implied dependencies), but I haven't been able to grasp how to see that just from this chart. Can anyone explain how I am supposed to read it and determine which dependencies it satisfies with only the 0s and 1s?
Adding another example to help better understand:
A B C
1 0 1
1 1 1
2 2 1
Since there is only one row where B = 0 and only one where B = 2, can B → A be based on a single row with a unique value? That is, since there is only one row where B = 0 (with A = 1), does the dependency automatically hold because no other row has B = 0?

One way to answer a question like this is to look at all the possible proper subsets of columns, call them X1, X2, ..., starting with single columns (so in this case X1 = A, X2 = B, X3 = C), and to check, for rows with identical values in Xi, which other columns also have identical values.
For instance, starting with A, we see that for A = 1, B has two different values: this means that B cannot depend on A (B does not take a single value for each value of A, which is what a functional dependency requires), while C has a single value (1), so this instance of the relation satisfies A → C.
Looking at B, we see that all its values are different, so all the other columns depend on it, and we add B → A and B → C. Finally, analysing C, we see that when the values of C are equal, only the values of A are also equal, so C → A.
We can stop here without considering the pairs of attributes AB, AC and BC: in this simple case every attribute is the determinant of some dependency, so the dependencies with a set of attributes as determinant are implied by those already found.
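The column-by-column procedure above can be mechanised. Below is a minimal Python sketch, assuming the instance is given as a list of tuples and the columns have one-letter names; `single_rhs_fds` and its `max_lhs` parameter are hypothetical names chosen for illustration. It groups the rows by their values on a determinant X and keeps X -> c whenever every group shows a single value of c.

```python
from itertools import combinations


def single_rhs_fds(header, rows, max_lhs=2):
    """List the non-trivial X -> c dependencies satisfied by this instance.

    header: column names, e.g. "ABC"; rows: tuples of values in header order.
    max_lhs: largest determinant size to try (hypothetical parameter).
    """
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(range(len(header)), size):
            # Group the rows by their values on the determinant X
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[i] for i in lhs), []).append(row)
            for c in range(len(header)):
                if c in lhs:
                    continue  # skip trivial dependencies such as AB -> A
                # X -> c holds if every group shows a single value of c
                if all(len({r[c] for r in g}) == 1 for g in groups.values()):
                    found.append("".join(header[i] for i in lhs) + " -> " + header[c])
    return found


rows = [(1, 0, 1), (1, 1, 1)]
fds = single_rhs_fds("ABC", rows)
```

For the first instance this reports A -> C, B -> A, B -> C and C -> A, plus AB -> C and BC -> A, which are implied by the single-attribute ones; AC -> B is correctly missing because the two rows agree on A and C but not on B.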
In summary
In a given instance, to know whether a dependency X → Y holds, we check:
if all the values of X are different, then the dependency holds; if some rows repeat a value of X, the dependency holds only if, for every group of rows with the same value of X, the value of Y is always the same.
Here is another example:
A B C
1 2 2
0 3 3
1 2 4
2 2 4
In this instance, does A → B hold? Yes: there are two rows (the first and the third) with the same value of A (1), and in both rows the value of B is the same (2). Does A → C hold? No, since C has two different values in the first and third rows.
Does B → A hold? No: three rows share the same value of B (2), and A takes different values in those rows (1 and 2).
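The summary rule can be written as a short check. This is a minimal Python sketch, assuming rows are represented as dicts keyed by column name; `fd_holds` and `make_rows` are hypothetical helpers, not part of any library. A value of X that occurs in only one row is simply recorded and can never cause a violation, which is why B → A holds in the second example.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the instance `rows` satisfies the dependency lhs -> rhs."""
    first_seen = {}  # X-value -> Y-value observed the first time that X-value appeared
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # Unique X-values can never violate the dependency; repeated ones must agree on Y
        if first_seen.setdefault(x, y) != y:
            return False
    return True


def make_rows(header, *values):
    """Turn tuples of values into {column: value} dicts (hypothetical helper)."""
    return [dict(zip(header, v)) for v in values]


rows = make_rows("ABC", (1, 2, 2), (0, 3, 3), (1, 2, 4), (2, 2, 4))
print(fd_holds(rows, "A", "B"), fd_holds(rows, "A", "C"), fd_holds(rows, "B", "A"))
```

Passing a string such as "AC" as `lhs` treats it as the set of columns A and C, so the same function also checks dependencies with composite determinants.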

Related

Extracting positions of elements from two Matlab vectors satisfying some criteria

Consider three row vectors in Matlab, A, B, C, each of size 1xJ. I want to construct a matrix D of size Kx3 listing every triplet (a,b,c) such that:
a is the position in A of A(a).
b is the position in B of B(b).
A(a)-B(b) is an element of C.
c is the position in C of A(a)-B(b).
A(a) and B(b) are different from Inf, -Inf.
For example,
A=[-3 3 0 Inf -Inf];
B=[-2 2 0 Inf -Inf];
C=[Inf -Inf -1 1 0];
D=[1 1 3; %-3-(-2)=-1
2 2 4; % 3-2=1
3 3 5]; % 0-0=0
I would like this code to be efficient, because in my real example I have to repeat it many times.
This question relates to my previous question here, but now I'm looking for the positions of the elements.

You can use combvec (from the Deep Learning Toolbox), or any number of alternatives, to get all pairings of indices a and b for the arrays A and B. Then it's simply a case of following your criteria:
Find the differences
Check which differences are in C
Remove elements you don't care about
Like so:
% Generate all index pairings
D = combvec( 1:numel(A), 1:numel(B) ).';
% Calculate deltas
delta = A(D(:,1)) - B(D(:,2));
delta = delta(:); % make it a column
% Get delta index in C (0 if not present)
[~,D(:,3)] = ismember(delta,C);
% If A or B are inf then the delta is Inf or NaN, remove these
idxRemove = isinf(delta) | isnan(delta) | D(:,3) == 0;
D(idxRemove,:) = [];
For your example, this yields the expected results from the question.
You said that A and B are at most 7 elements long, so you have up to 49 pairings to check. This isn't too bad, but note that the number of pairings is numel(A)*numel(B), so it grows quickly for larger inputs.
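For readers without the toolbox, here is a minimal Python sketch of the same pairing-and-lookup idea (not a translation of any MATLAB built-in). It assumes duplicates in C should resolve to their first position, and `difference_positions` is a hypothetical name. Instead of computing every delta and then deleting rows, it skips infinite entries of A or B up front and looks each difference up in a dictionary.

```python
import math
from itertools import product


def difference_positions(A, B, C):
    """1-based (a, b, c) triples with A[a] - B[b] == C[c], skipping infinite A or B."""
    # Map each finite value of C to the first position where it occurs (assumption)
    pos_in_C = {}
    for c, value in enumerate(C, start=1):
        if math.isfinite(value):
            pos_in_C.setdefault(value, c)
    D = []
    # b varies slowest and a fastest, matching the row order of combvec(1:nA, 1:nB).'
    for (b, y), (a, x) in product(enumerate(B, start=1), enumerate(A, start=1)):
        if math.isinf(x) or math.isinf(y):
            continue  # criterion: A(a) and B(b) must differ from Inf and -Inf
        c = pos_in_C.get(x - y)
        if c is not None:
            D.append((a, b, c))
    return D


inf = math.inf
A = [-3, 3, 0, inf, -inf]
B = [-2, 2, 0, inf, -inf]
C = [inf, -inf, -1, 1, 0]
D = difference_positions(A, B, C)
```

The dictionary makes each membership test constant time on average, but the loop still visits all numel(A)*numel(B) pairings.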

SUMPRODUCT() handling arrays in Excel

I have three arrays A, B, and C = A - B. They are split into columns and laid out in Excel like this:
A, B, C, A, B, C, A, B, C... D, E
I want D to be the sum of all C > 0 and E the sum of all C < 0. The problem is that C is broken up for human readability, so I only want to reference every third column.
My solution:
Following a variation on the method given here and here, and a simple test array of data:
1 0 1 1
1 1 0 1
-1 -1 -1 1
0 -1 -1 0
-1 -1 1 0
1 1 0 1
I will pull out the even columns and do the conditional sums:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1>0),A1:D1)
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1<0),A1:D1)
Which produces the correct result:
1 0
2 0
1 -1
0 -1
0 -1
2 0
But I am absolutely baffled as to why this works. For one thing, I didn't put in the double negative (--), so upon getting a "TRUE" or "FALSE" value, the formula should have spit an error at me. For another, this works just fine even though I'm not running it as a CSE array function in excel. And the part I get least of all is the arguments for SUMPRODUCT().
MOD() is just acting as a filter for the conditional; that I get. But I don't understand how it's handling A1:D1 when all it gets from COLUMN is a single number. COLUMN(A1:D1) just returns a single scalar value in Excel, the first column in the range, in this case 1. How is that being turned into the needed array [1, 3], especially since I'm not using CSE?

SUMPRODUCT natively evaluates its arguments as arrays, which is why you do not need CSE:
=SUMPRODUCT(A1:A6,B1:B6)
This will iterate and do A1*B1+A2*B2+...
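So using this functionality we can do:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)))
Inside SUMPRODUCT, COLUMN(A1:D1) returns the whole array {1,2,3,4} rather than just 1, so this iterates through the columns and returns 2: MOD(1,2)+MOD(2,2)+MOD(3,2)+MOD(4,2) = 1+0+1+0.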
You do not need the double unary (--) because you already have the * in the expression: any arithmetic on a Boolean (TRUE/FALSE) converts it to its numeric value (1/0).
The -- is shorthand for multiplying by -1 twice (-1 * -1 *).
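You would need the -- if you passed the conditions as separate arguments, because SUMPRODUCT treats non-numeric entries such as TRUE and FALSE as zeros: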
=SUMPRODUCT(--(MOD(COLUMN(A1:D1),2)=0),--(A1:D1>0),A1:D1)
So with this you would end up multiplying a series of 1s and 0s against the series in A1:D1.
When either condition is FALSE the product is 0, and 0 times anything is 0.
So the value in a cell is added to the running sum only when both conditions are TRUE (1).
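To see the masking at work, here is a minimal Python emulation of the two formulas, applied row by row to the test data from the question. It is only an analogy, not Excel's evaluation engine: Python's True/False also become 1/0 under *, which mirrors the coercion described above, and `conditional_sums` is a hypothetical name.

```python
def conditional_sums(row):
    """Emulate the two SUMPRODUCT formulas on one row of A1:D1-style data."""
    columns = range(1, len(row) + 1)  # what COLUMN(A1:D1) yields: 1, 2, 3, 4
    # (even column) * (condition) * value: both Booleans coerce to 1/0 under *
    pos = sum((col % 2 == 0) * (v > 0) * v for col, v in zip(columns, row))
    neg = sum((col % 2 == 0) * (v < 0) * v for col, v in zip(columns, row))
    return pos, neg


data = [
    [1, 0, 1, 1], [1, 1, 0, 1], [-1, -1, -1, 1],
    [0, -1, -1, 0], [-1, -1, 1, 0], [1, 1, 0, 1],
]
results = [conditional_sums(r) for r in data]
```

The pairs in `results` reproduce the two result columns shown in the question. For the every-third-column layout, the same idea would test `col % 3 == 0` instead.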

Efficient algorithm to print sum of elements at all possible subsequences of length 2 to n+1 [duplicate]

This question already has answers here:
Sum of products of elements of all subarrays of length k
(2 answers)
Permutation of array
(13 answers)
Closed 7 years ago.
I will start with an example. Suppose we have an array of size 3 with elements a, b and c (where a, b and c are some numerical values):
index:   1 2 3
element: a b c
(Assume indexing starts from 1, as shown above.)
Now all possible increasing subsequences of length 2 are:
12 23 13
so the required value is the sum of the products of the elements at those indices, that is, ab+bc+ac.
For length 3 we have only one increasing sub-sequence, that is, 123 so abc should be printed.
For length 4 we have no sequence so 0 is printed and the program terminates.
So output for the given array will be:
ab+bc+ac,abc,0
So for example if the elements a, b and c are 1, 2 and 3 respectively then the output should be 11,6,0
Similarly, for an array of size 4 with elements a,b,c,d the output will be:
ab+ac+ad+bc+bd+cd,abc+abd+acd+bcd,abcd,0
and so on...
Obviously brute force will be too inefficient for large arrays. I was wondering if there is an efficient algorithm to compute the output for an array of a given size?
Edit 1: I tried finding a pattern. For example for an array of size 4:
The first value we need is: (ab+ac+bc) + d(a+b+c) = ab+ac+ad+bc+bd+cd (take A = ab+ac+bc)
then the second value we need is: (abc) + d(A) = abc+abd+acd+bcd (B = abc)
then the third value we need is: (0) + d(B) = abcd (let's take 0 as C)
then the fourth value we need is: (0) + d(C) = 0
But it still requires a lot of computation and I can't figure out an efficient way to implement this.
Edit 2: My question is different from this since:
I don't need all possible permutations. I need all possible increasing subsequences from length 2 to n+1.
I also don't need to print all such sequences; I just need the value obtained from them (as explained above), so I am looking for a mathematical concept and/or a dynamic programming approach to solve this problem efficiently.
Note that I find the set of all such increasing subsequences based on the index values, and then compute the result from the values at those index positions, as explained above.

As a post that seems to have disappeared pointed out, one way is to find a recurrence relation. Let S(n,k) be the sum, over increasing subsequences of 1..n of length k, of the product of the array elements indexed by the subsequence. Such a subsequence either ends in n or not: in the first case it is the concatenation of a subsequence of length k-1 of 1..n-1 and {n}; in the second case it is a subsequence of 1..n-1 of length k. Thus:
S(n,k) = S(n-1,k) + A[n] * S(n-1,k-1)
For this to always make sense we need to add:
S(n,0) = 1
S(n,m) = 0 for m>n
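The recurrence turns directly into a dynamic program. These sums are the elementary symmetric polynomials of the array values. Below is a minimal Python sketch, with `subsequence_product_sums` as a hypothetical name. It keeps a single row e[k] = S(i,k) and updates it in place for each new element, iterating k downward so that e[k-1] still holds S(i-1,k-1). This costs O(n^2) time and O(n) extra space instead of enumerating all 2^n subsequences.

```python
def subsequence_product_sums(values):
    """Return [S(n,2), S(n,3), ..., S(n,n+1)] using S(n,k) = S(n-1,k) + A[n]*S(n-1,k-1)."""
    n = len(values)
    # e[k] holds S(i, k) after processing the first i elements; e[0] = S(i, 0) = 1,
    # and entries with k > i stay 0, matching S(n, m) = 0 for m > n
    e = [1] + [0] * (n + 1)
    for x in values:
        # Go from high k to low k so e[k - 1] is still the previous row's value
        for k in range(n + 1, 0, -1):
            e[k] += x * e[k - 1]
    return e[2:n + 2]


print(subsequence_product_sums([1, 2, 3]))
```

For the question's example (a, b, c = 1, 2, 3) this gives 11, 6 and 0.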

Hash Table: Which is the right linear-probing array?

I am studying data structures right now, specifically hash tables. I came across the following question:
Imagine that we have placed the following keys in an initially empty hash table of length 7 with linear probing, using the following table of hash values:
key: A B C D E F G
hash: 3 1 4 1 5 2 5
Which of the following arrays could be the linear-probing array?
1.
0 1 2 3 4 5 6
G B D F A C E
2.
0 1 2 3 4 5 6
B G D F A C E
3.
0 1 2 3 4 5 6
E G F A B C D
When I create the linear-probing array I get this:
0 1 2 3 4 5 6
G B D A C E F
Could somebody please tell me why I am wrong and what the right answer is?

Notice that the question doesn't specify the order in which the keys are inserted. Your answer is only correct if the keys are inserted in the order A-B-C-D-E-F-G, but since the question doesn't state the order, you need to dig deeper.
What you do know, however, is that one of the keys is inserted first, and it goes to its designated slot from the key-to-hash table, since the hash table is initially empty. This immediately rules out choice 2, because none of its keys is in its designated slot, leaving choices 1 and 3.
In table 1, B is in slot 1, which corresponds to its hash value; in table 3, keys F and A are in their home slots.
It's simple to prove that no sequence of insertions yields table 3: C sits in slot 5 although it hashes to 4, so B (in slot 4) must be inserted before C; B hashes to 1 but sits in slot 4, so G (in slot 1) must be inserted before B; and G hashes to 5 but sits in slot 1, so C (in slot 5) must be inserted before G, which is circular. Likewise, it's easy to check that the insertion sequence B-D-F-A-C-E-G results in table 1.
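Both claims can also be confirmed by brute force, since there are only 7! = 5040 insertion orders. Below is a minimal Python sketch using the hash values from the question; `insert_all` and `reachable` are hypothetical names. It simulates linear probing with wrap-around and checks whether any order reproduces each candidate array.

```python
from itertools import permutations

HASH = {"A": 3, "B": 1, "C": 4, "D": 1, "E": 5, "F": 2, "G": 5}
SIZE = 7


def insert_all(order):
    """Insert keys in the given order with linear probing; return the table."""
    table = [None] * SIZE
    for key in order:
        slot = HASH[key]
        while table[slot] is not None:  # occupied: probe the next slot, wrapping around
            slot = (slot + 1) % SIZE
        table[slot] = key
    return table


def reachable(target):
    """True if some insertion order of the keys produces `target`."""
    return any(insert_all(p) == target for p in permutations(HASH))


options = {
    1: list("GBDFACE"),
    2: list("BGDFACE"),
    3: list("EGFABCD"),
}
```

Inserting in alphabetical order reproduces the asker's table G B D A C E F, and only option 1 turns out to be reachable.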
Although this is a question based on hash tables, I honestly don't consider it a good way to assess your knowledge of linear probing; it is more of a puzzle, as @gnasher729 mentioned.

Comparing a cell with a vector in Matlab

I have a cell A in Matlab of dimension 1x3, e.g.
A={{1,2,3,4} {5,6} {7,8,9} }
A contains all the integers from 1 to n in increasing order. In the example n=9. However, the number of elements within each sub-cell can be different. Each sub-cell is non-empty.
Consider the vector B of dimension nx1 containing some integers from 1 to n in increasing order (repetitions are allowed), e.g.
B=[1 1 2 2 4 7 7 8 9]'
I want to construct (without using loops) the vector C of dimension nx1 such that each C(i) tells which sub-cell of A the element B(i) belongs to. In the example
C=[1 1 1 1 1 3 3 3 3]'

With that structure, A is uniquely determined by the number of elements of each of its cells, and the result can be obtained as
C = sum(bsxfun(@gt, B, cumsum(cellfun(@numel, A))), 2)+1;
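The one-liner relies on cumsum(cellfun(@numel, A)) being the last value in each sub-cell, so C(i) is 1 plus the number of those boundaries that are strictly smaller than B(i). Here is a minimal Python sketch of the same idea, assuming only the sub-cell sizes are known; `subcell_index` is a hypothetical name. As a swapped-in technique, it uses binary search (`bisect_left`) to count the smaller boundaries instead of comparing B(i) against every boundary.

```python
from bisect import bisect_left
from itertools import accumulate


def subcell_index(sizes, B):
    """1-based index of the sub-cell holding each value in B.

    sizes[j] is the number of elements in the j-th sub-cell; the sub-cells
    together hold 1..n in increasing order.
    """
    ends = list(accumulate(sizes))  # like cumsum(cellfun(@numel, A)): last value of each sub-cell
    # bisect_left counts the ends strictly below b, mirroring sum(b > ends) in the MATLAB line
    return [bisect_left(ends, b) + 1 for b in B]


A = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
B = [1, 1, 2, 2, 4, 7, 7, 8, 9]
C = subcell_index([len(a) for a in A], B)
```

For the example in the question, C comes out as 1 1 1 1 1 3 3 3 3.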

I don't know whether it's faster than for loops, but how about
C = arrayfun(@(b) find(cellfun(@(a) any(cell2mat(a) == b), A)), B);
Explanation: take each element b in B; then take every sub-cell a in A and check for equality with b, returning the index of the sub-cell that b belongs to.