I am having some trouble understanding Multi-Valued Dependencies. The definition being: A multivalued dependency exists when there are at least 3 attributes (like X,Y and Z) in a relation and for value of X there is a well defined set of values of Y and a well defined set of values of Z. However, the set of values of Y is independent of set Z and vice versa.
Suppose we have a relation R(A,B,C,D,E) that satisfies the MVD's
A →→ B and B →→ D
How does MVD play into A->B and B->D here? Honestly I'm not sure I really understand the definition after looking at example problems.
If R contains the tuples (0,1,2,3,4) and (0,5,6,7,8), what other tuples must
necessarily be in R? Identify one such tuple in the list below.
a) (0,5,2,3,8)
b) (0,5,6,3,8)
c) (0,5,6,7,4)
d) (0,1,6,3,4)
I would have thought AB is 0,1 and 0,5 and BD is 1,3 and 5,7. None of the answers have 0,1,3,5,7.
MVDs (multi-valued dependencies) have nothing to do with "at least 3 attributes". (You seem to be quoting the Wikipedia article but that informal definition is partly wrong and partly unintelligible.) You need to read and think through a clear, precise and correct definition. (Which you were probably given.)
MVD X ↠ Y mentions two subsets of the set S of all attributes, X & Y. There are lots of ways to define when a MVD holds in a relation but the simplest to state & envisage is probably that the two projections XY and X(S-Y) join to the original relation. Which also mentions a third subset, S-Y. Which is what the (binary) JD (join dependency) {XY, X(S-Y)} says.
Wikipedia (although that article is a mess):
A decomposition of R into (X, Y) and (X, R−Y) is a lossless-join decomposition if and only if X ↠ Y holds in R.
From this answer:
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
For R =
A,B,C,D,E
0,1,2,3,4
0,5,6,7,8
...
A ↠ B tells us that the following join to R:
A,B A,C,D,E
0,1 0,2,3,4
0,5 0,6,7,8
... ...
so R has at least
A,B,C,D,E
0,1,2,3,4
0,1,6,7,8
0,5,2,3,4
0,5,6,7,8
of which two are given and two are new but not choices.
B ↠ D tells us that the following join to R:
B,D A,B,C,E
1,3 0,1,2,4
5,7 0,5,6,8
... ...
so R has at least
A,B,C,D,E
0,1,2,3,4
0,5,6,7,8
which we already know.
So we don't yet know whether any of the choices are in R. But we do now know R is
A,B,C,D,E
0,1,2,3,4
0,1,6,7,8
0,5,2,3,4
0,5,6,7,8
...
Repeating, A ↠ B adds no new tuples but B ↠ D now gives this join:
B,D A,B,C,E
1,3 0,1,2,4
1,7 0,1,6,8
5,3 0,5,2,4
5,7 0,5,6,8
... ...
And one of the tuples in that join is choice b) (0,5,6,3,8).
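The repeated project-and-join steps above can be sketched in Python. This is only an illustration of the reasoning, not a general MVD tool: the helper names project, apply_mvd and chase are made up here, and the relation is assumed to have exactly the attributes A to E in that order.

```python
from itertools import product

ATTRS = ("A", "B", "C", "D", "E")  # assumed attribute order of R

def project(rows, attrs):
    """Project a set of tuples onto the named attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}

def apply_mvd(rows, lhs, rhs):
    """Join the projections on lhs+rhs and lhs+(everything else)."""
    rest = [a for a in ATTRS if a not in lhs and a not in rhs]
    left_attrs = list(lhs) + list(rhs)
    right_attrs = list(lhs) + rest
    left = project(rows, left_attrs)
    right = project(rows, right_attrs)
    n = len(lhs)
    joined = set()
    for l, r in product(left, right):
        if l[:n] == r[:n]:  # natural join on the lhs attributes
            values = dict(zip(left_attrs, l))
            values.update(zip(right_attrs, r))
            joined.add(tuple(values[a] for a in ATTRS))
    return joined

def chase(rows, mvds):
    """Apply every MVD repeatedly until no new tuples appear."""
    rows = set(rows)
    while True:
        new_rows = set(rows)
        for lhs, rhs in mvds:
            new_rows |= apply_mvd(new_rows, lhs, rhs)
        if new_rows == rows:
            return rows
        rows = new_rows

given = {(0, 1, 2, 3, 4), (0, 5, 6, 7, 8)}
closure = chase(given, [(("A",), ("B",)), (("B",), ("D",))])
choices = {"a": (0, 5, 2, 3, 8), "b": (0, 5, 6, 3, 8),
           "c": (0, 5, 6, 7, 4), "d": (0, 1, 6, 3, 4)}
forced = sorted(k for k, t in choices.items() if t in closure)
print(forced)  # ['b']
```

The fixpoint holds eight tuples, and b) is the only choice among them.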
The way the question is phrased, they are probably expecting you to use a definition that they will have given you that is like another two in Wikipedia. One says that α ↠ β holds in R when
[...] if we denote by (x, y, z) the tuple having values for α, β, R − α − β collectively equal to x, y, z, then whenever the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
(The only sense in which this gives the "formal" definition "in more simple words" is that this is also a definition. Because this isn't actually paraphrasing it, because this uses R − α − β whereas it uses R − β.)
By applying this rule to repeatedly generate further tuples for R starting from the given ones, we end up generating b) (0,5,6,3,8) much as we did above.
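As a rough sketch of that tuple-generating rule in Python (the function names apply_rule and close_under are invented for this illustration, and attributes are addressed by position, A=0 through E=4):

```python
def apply_rule(rows, x_pos, y_pos):
    """For every pair t, u that agrees on the X positions, add the tuple
    that takes its Y positions from u and everything else from t."""
    out = set(rows)
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in x_pos):
                out.add(tuple(u[i] if i in y_pos else t[i]
                              for i in range(len(t))))
    return out

def close_under(rows, rules):
    """Keep applying all rules until the set of tuples stops growing."""
    rows = set(rows)
    while True:
        bigger = set(rows)
        for x_pos, y_pos in rules:
            bigger = apply_rule(bigger, x_pos, y_pos)
        if bigger == rows:
            return rows
        rows = bigger

# Positions: A=0, B=1, C=2, D=3, E=4
RULES = [((0,), (1,)), ((1,), (3,))]  # A ->> B and B ->> D
closed = close_under({(0, 1, 2, 3, 4), (0, 5, 6, 7, 8)}, RULES)
print((0, 5, 6, 3, 8) in closed)  # True
```

Because the loop visits every ordered pair (t, u), both swapped tuples required by the rule are produced.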
PS I would normally suggest that you review the (unsound) reasoning that led you to expect "AB is 0,1 and 0,5 and BD is 1,3 and 5,7" (whatever that means) or "0,1,3,5,7". But the "definition" you give (from Wikipedia) doesn't make any sense. So I suggest that you consider what you were doing with it.
I've been trying to define a function compdiff in the Wolfram Language that takes two mathematical expressions f and g and a variable x as input and outputs the difference of their compositions f[g[x]]-g[f[x]] (a sort of commutator if you are into abstract algebra).
For example: compdiff[x^2,x+1,x] = (x+1)^2-(x^2+1).
I've tried with
compdiff[f_,g_,x_]:= Composition[f,g][x]-Composition[g,f][x]
and
compdiff[f_,g_,x_]:= f @* g @ x-g @* f @ x
but when I input
compdiff[x^2,x+1,x]
it outputs
(x^2)[(1 + x)[x]] - (1 + x)[(x^2)[x]]
What am I doing wrong?
You need to use functions instead of expressions. For example:
f[x_] := x^2
g[x_] := x+1
Then compdiff[f, g, x] will work:
In[398]:= compdiff[f,g,x]
Out[398]= -1-x^2+(1+x)^2
Alternatively, you could use pure functions, as in:
In[399]:= compdiff[#^2&,#+1&,x]
Out[399]= -1-x^2+(1+x)^2
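The same distinction between an expression and a function shows up in other languages. As a loose Python analogue (this is not Wolfram code, and the name compdiff is simply reused here for illustration), composition only makes sense when f and g are callables:

```python
def compdiff(f, g, x):
    """Return f(g(x)) - g(f(x)) for two callables f and g."""
    return f(g(x)) - g(f(x))

square = lambda t: t ** 2   # plays the role of #^2&
succ = lambda t: t + 1      # plays the role of #+1&

print(compdiff(square, succ, 3))  # (3+1)^2 - (3^2+1) = 6
```

Passing the value of an expression such as 3**2 instead of the function square would fail for the same reason the original call did: a number cannot be applied to an argument.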
I am trying to convert a big expression from sage into valid C code using ccode() from sympy. However, my expression has many squared and cube terms. As pow(x,2) is far slower than x*x, I'm trying to expand those terms in my expression before the conversion.
Based on this conversation, I wrote the following code :
from sympy import Symbol, Mul, Pow, pprint, Matrix, symbols
from sympy.core import numbers
def pow_to_mul(expr):
    """
    Convert integer powers in an expression to Muls, like a**2 => a*a.
    """
    pows = list(expr.atoms(Pow))
    pows = [p for p in pows if p.as_base_exp()[1]>=0]
    if any(not e.is_Integer for b, e in (i.as_base_exp() for i in pows)):
        raise ValueError("A power contains a non-integer exponent")
    repl = zip(pows, (Mul(*[b]*e,evaluate=False) for b,e in (i.as_base_exp() for i in pows)))
    return expr.subs(repl)
It partially works, but fails whenever the power is an argument of a multiplication:
>>>_=var('x')
>>>print pow_to_mul((x^3+2*x^2)._sympy_())
2*x**2 + x*x*x
>>>print pow_to_mul((x^2/(1+x^2)+(1-x^2)/(1+x^2))._sympy_())
x**2/(x*x + 1) - (x*x - 1)/(x*x + 1)
Why? And how can I change that?
Thank you very much,
If you compile with -ffast-math the compiler will do this optimization for you. If you are using an ancient compiler or cannot affect the level of optimization used in the build process you may pass a user defined function to ccode (using SymPy master branch):
>>> ccode(x**97 + 4*x**7 + 5*x**3 + 3**pi, user_functions={'Pow': [
... (lambda b, e: e.is_Integer and e < 42, lambda b, e: '*'.join([b]*int(e))),
... (lambda b, e: not e.is_Integer, 'pow')]})
'pow(x, 97) + 4*x*x*x*x*x*x*x + 5*x*x*x + pow(3, M_PI)'
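To make the pair of lambdas above easier to read, here is a plain-Python sketch of the same decision with no SymPy involved. The helper name c_power and its cutoff parameter are hypothetical, and base is assumed to be an already-printed C expression that needs no extra parentheses.

```python
def c_power(base, exponent, cutoff=42):
    """Return a C expression string for base**exponent.

    Small positive integer exponents become repeated multiplication;
    everything else falls back to a pow() call. Unlike the lambdas above,
    this sketch also sends exponents <= 0 to pow() instead of joining
    an empty list.
    """
    if isinstance(exponent, int) and 0 < exponent < cutoff:
        return "*".join([base] * exponent)
    return "pow({}, {})".format(base, exponent)

print(c_power("x", 3))   # x*x*x
print(c_power("x", 97))  # pow(x, 97)
```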
I want to break a loop in a situation like this:
import Data.Maybe (fromJust, isJust, Maybe(Just))
tryCombination :: Int -> Int -> Maybe String
tryCombination x y
  | x * y == 20 = Just "Okay"
  | otherwise = Nothing
result :: [String]
result = map (fromJust) $
  filter (isJust) [tryCombination x y | x <- [1..5], y <- [1..5]]
main = putStrLn $ unlines $ result
Imagine that "tryCombination" is a lot more complicated than in this example and consumes a lot of CPU time, and that it has to evaluate not 25 possibilities but 26^3.
So when "tryCombination" finds a solution for a given combination, it returns a Just, otherwise a Nothing. How can I break the loop instantly on the first found solution?
Simple solution: find and join
It looks like you're looking for Data.List.find. find has the type signature
find :: (a -> Bool) -> [a] -> Maybe a
So you'd do something like
result :: Maybe (Maybe String)
result = find isJust [tryCombination x y | x <- [1..5], y <- [1..5]]
Or, if you don't want a Maybe (Maybe String) (why would you?), you can fold them together with Control.Monad.join, which has the signature
join :: Maybe (Maybe a) -> Maybe a
so that you have
result :: Maybe String
result = join $ find isJust [tryCombination x y | x <- [1..5], y <- [1..5]]
More advanced solution: asum
If you wanted a slightly more advanced solution, you could use Data.Foldable.asum, which has the signature
asum :: [Maybe a] -> Maybe a
What it does is pick out the first Just value from a list of many. It does this by using the Alternative instance of Maybe. The Alternative instance of Maybe works like this: (import Control.Applicative to get access to the <|> operator)
λ> Nothing <|> Nothing
Nothing
λ> Nothing <|> Just "world"
Just "world"
λ> Just "hello" <|> Just "world"
Just "hello"
In other words, it picks the first Just value from two alternatives. Imagine putting <|> between every element of your list, so that
[Nothing, Nothing, Just "okay", Nothing, Nothing, Nothing, Just "okay"]
gets turned to
Nothing <|> Nothing <|> Just "okay" <|> Nothing <|> Nothing <|> Nothing <|> Just "okay"
This is exactly what the asum function does! Since <|> is short-circuiting, it will only evaluate up to the first Just value. With that, your function would be as simple as
result :: Maybe String
result = asum [tryCombination x y | x <- [1..5], y <- [1..5]]
Why would you want this more advanced solution? Not only is it shorter; once you know the idiom (i.e. when you are familiar with Alternative and asum) it is much more clear what the function does, just by reading the first few characters of the code.
To answer your question, find function is what you need. After you get Maybe (Maybe String) you can transform it into Maybe String with join
While find is nicer, more readable and surely does only what's needed, I wouldn't be so sure about inefficiency of the code that you have in a question. The lazy evaluation would probably take care of that and compute only what's needed, (extra memory can still be consumed). If you are interested, try to benchmark.
Laziness can actually take care of that in this situation.
By calling unlines you are requesting all of the output of your "loop" [1], so obviously it can't stop after the first successful tryCombination. But if you only need one match, just use listToMaybe (from Data.Maybe); it will convert your list to Nothing if there are no matches at all, or Just the first match found.
Laziness means that the results in the list will only be evaluated on demand; if you never demand any more elements of the list, the computations necessary to produce them (or even see whether there are any more elements in the list) will never be run!
This means you often don't have to "break loops" the way you do in imperative languages. You can write the full "loop" as a list generator, and the consumer(s) can decide independently how much of it they want. The extreme case of this idea is that Haskell is perfectly happy to generate and even filter infinite lists; it will only run the generation code just enough to produce exactly as many elements as you later end up examining.
[1] Actually even unlines produces a lazy string, so if you e.g. only read the first line of the resulting joined string you could still "break the loop" early! But you print the whole thing here.
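For readers more at home in an imperative language, the "producer describes everything, consumer decides how much to take" idea can be imitated with Python generators. This is only an analogy (Python is strict everywhere except inside generators), and the call counter is added purely to show where evaluation stops:

```python
from itertools import product

calls = 0  # counts how many combinations were actually tried

def try_combination(x, y):
    """Stand-in for the expensive check; None plays the role of Nothing."""
    global calls
    calls += 1
    return "Okay" if x * y == 20 else None

# A generator expression: nothing is evaluated until a consumer asks.
candidates = (try_combination(x, y)
              for x, y in product(range(1, 6), repeat=2))

# next() pulls elements only until the first non-None result.
result = next((r for r in candidates if r is not None), None)

print(result)  # Okay
print(calls)   # 20: the search stopped at (4, 5), skipping the x = 5 row
```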
The evaluation strategy you are looking for is exactly the purpose of the Maybe instance of MonadPlus. In particular, there is the function msum whose type specializes in this case to
msum :: [Maybe a] -> Maybe a
Intuitively, this version of msum takes a list of potentially failing computations, executes them one after another until the first computation succeeds, and returns the corresponding result. So, result would become
result :: Maybe String
result = msum [tryCombination x y | x <- [1..5], y <- [1..5]]
On top of that, you could make your code in some sense agnostic to the exact evaluation strategy by generalizing from Maybe to any instance of MonadPlus:
import Control.Monad (MonadPlus, mzero, msum)
tryCombination :: MonadPlus m => Int -> Int -> m (Int,Int)
-- For the sake of illustration I changed to a more verbose result than "Okay".
tryCombination x y
  | x * y == 20 = return (x,y) -- `return` specializes to `Just`.
  | otherwise = mzero -- `mzero` specializes to `Nothing`.
result :: MonadPlus m => m (Int,Int)
result = msum [tryCombination x y | x <- [1..5], y <- [1..5]]
To get your desired behavior, just run the following:
*Main> result :: Maybe (Int,Int)
Just (4,5)
However, if you decide you need not only the first combination but all of them, just use the [] instance of MonadPlus:
*Main> result :: [(Int,Int)]
[(4,5),(5,4)]
I hope this helps more on a conceptual level than just providing a solution.
PS: I just noticed that MonadPlus and msum are indeed a bit too restrictive for this purpose, Alternative and asum would have been enough.
Excuse me if I get a little mathy for a second:
I have two sets, X and Y, and a many-to-many relation ℜ ⊆ X × Y.
For all x ∈ X, let xℜ = { y | (x,y) ∈ ℜ } ⊆ Y, the subset of Y associated with x by ℜ.
For all y ∈ Y, let ℜy = { x | (x,y) ∈ ℜ } ⊆ X, the subset of X associated with y by ℜ.
Define a query as a set of subsets of Y, Q ⊆ ℘(Y).
Let the image of the query be the union of the subsets in Q: image(Q) = ⋃_{q ∈ Q} q
Say an element x of X satisfies a query Q if for all q ∈ Q, q ∩ xℜ ≠ ∅, that is, if every subset in Q overlaps with the subset of Y associated with x.
Define the evidence of satisfaction of a query Q by an element x as: evidence(x,Q) = xℜ ∩ image(Q)
That is, the parts of Y that are associated with x and were used to match some part of Q. This could be used to verify whether x satisfies Q.
My question is how should I store my relation ℜ so that I can efficiently report which x∈X satisfy queries, and preferably report evidence of satisfaction?
The relation isn't overly huge, as csv it's only about 6GB. I've got a couple ideas, neither of which I'm particularly happy with:
1. I could store { (x, xℜ) | ∀ x∈X } just in a flat file, then do O(|X||Q||Y|) work checking each x to see if it satisfies the query. This could be parallelized, but feels wrong.
2. I could store ℜ in a DB table indexed on Y, retrieve { (y, ℜy) | ∀ y∈image(Q) }, then invert it to get { (x, evidence(x,Q)) | ∀ x s.t. evidence(x,Q) ≠ ∅ }, then check just that to find the x that satisfy Q and the evidence. This seems a little better, but I feel like inverting it myself might be doing something I could ask my RDBMS to do.
How could I be doing this better?
I think #2 is the way to go. Also, if Q can be represented in CNF you can use several queries plus INTERSECT to get the RDBMS to do some of the heavy lifting. (Similarly with DNF and UNION.)
This also looks a bit like you want an "inverted index", which some RDBMSs have support for. X = set of documents, Y = set of words, q = set of words matching the glob "a*c".
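A small in-memory sketch of that idea in Python, using the question's definitions. It is only meant to show the shape of the computation (a union of ℜy within each q, then an intersection across Q); the function names and the toy data are made up, and a 6GB relation would of course live in a database or on disk rather than in dictionaries.

```python
from collections import defaultdict

def build_index(pairs):
    """Build both directions of the relation: x -> xℜ and y -> ℜy."""
    by_x = defaultdict(set)
    by_y = defaultdict(set)
    for x, y in pairs:
        by_x[x].add(y)
        by_y[y].add(x)
    return by_x, by_y

def satisfying(query, by_x, by_y):
    """Map each x that satisfies the query to evidence(x, Q).

    query is a list of sets of y values (the q in Q).
    """
    if not query:
        # An empty query is satisfied vacuously, with empty evidence.
        return {x: set() for x in by_x}
    # For each q: every x associated with at least one y in q (a UNION).
    per_q = [set().union(*(by_y.get(y, set()) for y in q)) for q in query]
    # x must overlap with every q (an INTERSECT).
    matches = set.intersection(*per_q)
    image = set().union(*query)
    return {x: by_x[x] & image for x in matches}

# Toy data in the spirit of the documents/words reading.
pairs = [("d1", "apple"), ("d1", "cat"), ("d2", "apple"),
         ("d3", "cat"), ("d3", "dog")]
by_x, by_y = build_index(pairs)
hits = satisfying([{"apple", "avocado"}, {"cat", "dog"}], by_x, by_y)
print(sorted(hits))         # ['d1']
print(sorted(hits["d1"]))   # ['apple', 'cat']
```

Only the y values in image(Q) are ever looked up, which is the point of indexing on Y.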
HTH