Saying R exactly 18 C means that for an individual x there are exactly 18 individuals y_i such that R(x, y_i) holds and each y_i is in C.
My question is: are the y_i necessarily different individuals? In other words, will they necessarily be assigned to different domain elements in every interpretation?
I know that there is an owl:differentFrom construct, but is it assumed when we say R exactly 18 C, or do I have to declare it every time I want 18 different individuals?
The language below is the intersection of two simpler languages. First, identify the simpler languages and give the state diagrams of the DFAs that recognize them. Then, use the product construction to build a DFA that recognizes the language specified below; give its state diagram before and after simplification if there are any unneeded states or states that can be combined.
Language: {w ∈ {0,1}* | w contains an odd number of 0s and the sum of its 0s and 1s is equal to 1}
This is my proposed solution: https://imgur.com/a/lh5Hwfr Should the bottom two states be connected with 0s?
Also, what would be the DFA if it was OR instead of AND?
Here's a drawing that I hope will help you understand how to do this:
Language A is "odd number of zeros". States are labeled Z0 and Z1, indicating an even or odd number of zeros.
Language B is "exactly one one" (which is equivalent to "sum of digits equals one"). States are labeled I0, I1 and I2, indicating zero, one, or more ones.
Language A+B can be interpreted as A∩B (ignoring the dotted circles) or A∪B (counting the dotted circles). If building A∩B, states Z0I2 and Z1I2 can be joined together.
I hope this gives not only an answer to the exact problem in the question, but also an idea of how to build similar answers for similar problems.
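To make the construction concrete, here is a minimal Python sketch of the product automaton (the transition tables and the helper function are my own illustration, not part of the original answer; the state names follow the labels above, and the intersection flag covers the OR variant asked about in the question):

from __future__ import annotations

# Transition tables for the two component DFAs, keyed by (state, symbol).
A = {  # language A: "odd number of 0s" (accepting state: Z1)
    ('Z0', '0'): 'Z1', ('Z0', '1'): 'Z0',
    ('Z1', '0'): 'Z0', ('Z1', '1'): 'Z1',
}
B = {  # language B: "exactly one 1" (accepting state: I1)
    ('I0', '0'): 'I0', ('I0', '1'): 'I1',
    ('I1', '0'): 'I1', ('I1', '1'): 'I2',
    ('I2', '0'): 'I2', ('I2', '1'): 'I2',  # I2 is a dead state: two or more 1s
}

def product_accepts(w: str, intersection: bool = True) -> bool:
    """Run both DFAs in lockstep (the product construction) and combine
    their verdicts with AND (for A∩B) or OR (for A∪B)."""
    a, b = 'Z0', 'I0'  # start states of A and B
    for ch in w:
        a, b = A[(a, ch)], B[(b, ch)]
    in_a, in_b = a == 'Z1', b == 'I1'
    return (in_a and in_b) if intersection else (in_a or in_b)

assert product_accepts('01')                     # odd 0s and exactly one 1
assert not product_accepts('0')                  # odd 0s, but no 1
assert product_accepts('0', intersection=False)  # accepted by the union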
In SHOIN(D), which is the DL family corresponding to OWL-DL, is this expression legal:
F ⊑ (≤1 r. D) ⊓ (¬ (=0 r. D))
where F and D are concepts and r is a role. I want to express that each instance of F is related through r to at most one instance of D, and not to zero instances.
In general, how do I decide whether some expression is legal w.r.t. a specific variant of DL? I thought that checking it against the BNF syntax of the variant might be what I'm after.
One easy way is to check whether you can write it in Protege. Most of the things that you can write in Protege will be legal OWL-DL. In Protege you can write:
F SubClassOf ((r max 1 D) and not(r exactly 0 D))
Of course, saying that something has at most 1 value, and not exactly 0 values, is exactly the same as saying that it has exactly 1:
F SubClassOf r exactly 1 D
But there are a few things that you'll be able to do in Protege that won't be legal OWL-DL. The most direct way to find out what these are is the standard, specifically §11 Global Restrictions on Axioms in OWL 2 DL. Generally, the only problems you might run into are attempts to use composite properties where you're not allowed to.
If you don't want to check by hand, then you could try uploading your ontology into the OWL Validator and selecting the OWL2 DL profile.
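If you'd rather build and check the axiom programmatically, here is a rough sketch using the owlready2 Python library (the ontology IRI and all names are placeholders, and the restriction syntax reflects my understanding of owlready2's API; treat this as illustrative, not authoritative):

from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/onto.owl")  # placeholder IRI

with onto:
    class D(Thing): pass
    class r(ObjectProperty): pass
    class F(Thing): pass

# F SubClassOf r exactly 1 D
F.is_a.append(r.exactly(1, D))

# Save and feed the file to the OWL Validator mentioned above.
onto.save(file="onto.owl")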
In a relation R of N attributes, how many functional dependencies are there (including trivial ones)?
I know that trivial dependencies are those where the right-hand side is a subset of the left-hand side, but I'm not sure how to calculate the upper bound on the number of dependencies.
Any information regarding the answer and the approach to it would be greatly appreciated.
The maximum possible number of functional dependencies is
the number of possible left-hand sides * the number of possible right-hand sides
We're including trivial functional dependencies, so the number of possible left-hand sides equals the number of possible right-hand sides. So this simplifies to
(the number of possible left-hand sides)²
Let's say you have R{∅AB}. There are three attributes.¹ The number of possible left-hand sides is
combinations of 3 attributes taken 1 at a time, plus
combinations of 3 attributes taken 2 at a time, plus
combinations of 3 attributes taken 3 at a time
which equals 3+3+1, or 7. So there are at most 7² possible functional dependencies for any R having three attributes: 49. The order of attributes doesn't matter, so we use formulas for combinations, not for permutations.
If you start with R{∅ABC}, you have
combinations of 4 attributes taken 1 at a time, plus
combinations of 4 attributes taken 2 at a time, plus
combinations of 4 attributes taken 3 at a time, plus
combinations of 4 attributes taken 4 at a time
which equals 4+6+4+1, or 15. So there are at most 15² possible functional dependencies for any R having four attributes: 225.
Once you know this formula, these calculations are simple using a spreadsheet. It's also pretty easy to write a program to generate every possible functional dependency using a scripting language like Ruby or Python.
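For example, here's a short Python sketch of that enumeration (the function name and attribute strings are just for illustration):

from itertools import combinations

def all_fds(attrs):
    """Enumerate every possible FD X -> Y, where X and Y range over
    all nonempty subsets of attrs (trivial FDs included)."""
    subsets = [set(c) for r in range(1, len(attrs) + 1)
                      for c in combinations(attrs, r)]
    return [(lhs, rhs) for lhs in subsets for rhs in subsets]

print(len(all_fds("AB∅")))   # 49, i.e. 7**2, matching the example above
print(len(all_fds("ABC∅")))  # 225, i.e. 15**2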
The Wikipedia article on combinations has examples of how to count the combinations, with and without using factorials.
All possible combinations from R{∅AB}:
A->A A->B A->∅ A->AB A->A∅ A->B∅ A->AB∅
B->A B->B B->∅ B->AB B->A∅ B->B∅ B->AB∅
∅->A ∅->B ∅->∅ ∅->AB ∅->A∅ ∅->B∅ ∅->AB∅
AB->A AB->B AB->∅ AB->AB AB->A∅ AB->B∅ AB->AB∅
A∅->A A∅->B A∅->∅ A∅->AB A∅->A∅ A∅->B∅ A∅->AB∅
B∅->A B∅->B B∅->∅ B∅->AB B∅->A∅ B∅->B∅ B∅->AB∅
AB∅->A AB∅->B AB∅->∅ AB∅->AB AB∅->A∅ AB∅->B∅ AB∅->AB∅
¹ Most people ignore the empty set. They'd say R{∅AB} has only two attributes, A and B, and they'd write it as R{AB}.
I am designing a metric to measure when a search term is "ambiguous." A score near one means that it is ambiguous ("Ajax" could be a programming language, a cleaning solution, a Greek hero, a European soccer club, etc.), and a score near zero means it is pretty clear what the user meant ("Lady Gaga" probably means only one thing). Part of this metric is that I have a list of possible interpretations and the frequency of those interpretations from past data, and I need to turn this into a number between 0 and 1.
For example: let's say the term is "Cats" -- of a million trials, 850,000 times the user meant the furry thing that meows, 80,000 times they meant the musical by that name, and the rest were abbreviations for things, each meant only a trivial number of times. I would say this should have a low ambiguity score because even though there were multiple possible meanings, one was by far the preferred meaning. In contrast, let's say the term is "Friends" -- of a million trials, 500,000 times the user meant the people they hang out with all the time, 450,000 times they meant the TV show by that name, and the rest were some other meaning. This should get a higher ambiguity score because the different meanings were much closer in frequency.
TL;DR: If I sort the array in decreasing order, I need a way to map arrays that fall off quickly to numbers close to zero, and arrays that fall off more slowly to numbers closer to one. If the array were [1,0,0,0,...] it should get a perfect score of 0, and if it were [1/n,1/n,1/n,...] it should get a perfect score of 1. Any suggestions?
What you are looking for sounds very similar to the entropy measure from information theory. It is a measure of how uncertain a random variable is, based on the probabilities of each outcome. It is given by:
H(X) = -sum(p(x[i]) * log(p(x[i])))
where p(x[i]) is the probability of the i-th possibility. So in your case, p(x[i]) would be the probability that a certain search phrase corresponds to an actual meaning. In the Cats example, you would have:
p(x[0]) = 850,000 / (850,000+80,000) = 0.914
p(x[1]) = 80,000 / (850,000+80,000) = 0.086
H(X) = -(0.914*log2(0.914) + 0.086*log2(0.086)) = 0.423
For the Friends case, you would have (assuming only one other category):
H(X) = -(0.5*log2(0.5) + 0.45*log2(0.45) + 0.05*log2(0.05)) = 1.234
The higher number here means more uncertainty.
Note that I am using log base 2 in both cases, but if you use a logarithm whose base equals the number of possibilities, the scale works out to 0 to 1:
H(X) = -(0.5*log3(0.5) + 0.45*log3(0.45) + 0.05*log3(0.05)) = 0.779
Note also that the most ambiguous case is when all possibilities have the same probability:
H(X) = -(0.33*log3(0.33) + 0.33*log3(0.33) + 0.33*log3(0.33)) = 1.0
and the least ambiguous case is when there is only one possibility:
H(X) = -log(1) = 0.0
Since you want the most ambiguous terms to be near 1, you can use the base-n H(X) directly as your metric: as the two cases above show, it is already 1 in the most ambiguous case and 0 in the least ambiguous one.
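Here is a minimal Python sketch of that normalized entropy (the function name and the handling of a single interpretation are my own choices):

import math

def ambiguity(counts):
    """Normalized Shannon entropy of interpretation counts:
    0 = one clear meaning, 1 = all meanings equally likely."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) <= 1:
        return 0.0  # a single interpretation is perfectly unambiguous
    # log base n (the number of interpretations) rescales H(X) to [0, 1]
    return -sum(p * math.log(p, len(probs)) for p in probs)

print(ambiguity([850_000, 80_000]))           # "Cats": ~0.42
print(ambiguity([500_000, 450_000, 50_000]))  # "Friends": ~0.78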
As you probably know, the SUBSET-SUM problem is defined as determining whether a subset of a set of whole numbers sums to a specified whole number. (There is another definition of subset-sum, where a group of integers sums to zero, but let's use this definition for now.)
For example, ((1,2,4,5), 6) is true because (2,4) sums to 6. We say that (2,4) is a "solution".
Furthermore, ((1,5,10), 7) is false because no subset of the arguments sums to 7.
My question is: given a set of argument numbers for SUBSET-SUM, is there a polynomial upper bound on the number of possible solutions? In the first example there were two solutions, (2,4) and (1,5).
We know that since SUBSET-SUM is NP-complete, deciding it in polynomial time is probably impossible. However, my question is not about the decision time; I'm asking strictly about the size of the list of solutions.
Obviously the size of the power set of the argument numbers is an upper bound on the size of the solution list, but this grows exponentially. My intuition is that there should be a polynomial bound, but I cannot prove it.
NB: I know this sounds like a homework question, but please trust me, it isn't. I am trying to teach myself certain aspects of CS theory, and this is where my thoughts have taken me.
No; take the numbers
(1, 2, 1+2, 4, 8, 4+8, 16, 32, 16+32, ..., 2^(2n-2), 2^(2n-1), 2^(2n-2)+2^(2n-1))
and ask about forming the sum 1 + 2 + 4 + ... + 2^(2n-2) + 2^(2n-1). (For example: with n = 3, take the set (1, 2, 3, 4, 8, 12, 16, 32, 48) and ask about the subsets summing to 63.)
You can form 1+2 either using 1 and 2 or using 1+2.
You can form 4+8 either using 4 and 8 or using 4+8.
...
You can form 2^(2n-2) + 2^(2n-1) either using 2^(2n-2) and 2^(2n-1) or using 2^(2n-2)+2^(2n-1).
The choices in the n triples are independent, so there are at least 2^n = 2^(m/3) solutions, where m is the size of your set. I bet this can be sharply strengthened, but it already proves there's no polynomial bound.
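You can verify the count for the n = 3 instance by brute force; here is a quick Python check (not part of the original argument):

from itertools import combinations

nums = (1, 2, 3, 4, 8, 12, 16, 32, 48)  # the n = 3 construction above
target = 63
solutions = [c for r in range(1, len(nums) + 1)
               for c in combinations(nums, r) if sum(c) == target]
print(len(solutions))  # 8 == 2**3, one binary choice per triple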
Sperner's Theorem provides a nice (albeit non-polynomial) upper bound, at least in the case when the numbers in the set are strictly greater than zero (as seems to be the case in your problem).
The family of all subsets with a given sum forms a Sperner family, which is a collection of subsets in which no member is a subset of any other member. This is where the assumption that the elements are strictly greater than zero is used. Sperner's theorem states that the maximum size of such a Sperner family is given by the binomial coefficient n choose floor(n/2).
If you drop the assumption that the n numbers are distinct, it is easy to see that this upper bound can't be improved (just take all numbers equal to 1 and let the target sum be floor(n/2)). I don't know whether it can be improved under the assumption that the numbers are distinct.
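A quick Python sanity check of the bound and its tightness witness (illustrative only):

from itertools import combinations
from math import comb

n = 6
nums = [1] * n  # all numbers equal to 1
target = n // 2
# Count index subsets (so the equal values are distinct set elements) summing to the target.
count = sum(1 for r in range(1, n + 1)
              for idx in combinations(range(n), r)
              if sum(nums[i] for i in idx) == target)
print(count, comb(n, n // 2))  # both 20: the bound is met exactly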