Spring Expression Language nested projections - spring-el

Is there any way (or a workaround) to perform nested projection in expressions with access to the parent projection context? What I want to evaluate is
collection.![#this.someList.?[#this.id == #parent.id]]
where #parent would refer to the #this of the outer collection.! projection.
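A possible workaround (a rough sketch, not tested; someList and id are the names from the expression above, everything else is illustrative) would be to run the outer projection in plain Java and expose each outer element to the inner selection as a context variable:

import java.util.ArrayList;
import java.util.List;
import org.springframework.expression.ExpressionParser;
import org.springframework.expression.spel.standard.SpelExpressionParser;
import org.springframework.expression.spel.support.StandardEvaluationContext;

class ParentAwareProjection {
    // Do the outer "!" projection in Java and pass the current outer element
    // into the inner selection as the #parent variable.
    static List<Object> project(List<?> collection) {
        ExpressionParser parser = new SpelExpressionParser();
        List<Object> results = new ArrayList<>();
        for (Object parent : collection) {
            StandardEvaluationContext ctx = new StandardEvaluationContext(parent);
            ctx.setVariable("parent", parent); // emulates the missing #parent reference
            results.add(parser.parseExpression("someList.?[id == #parent.id]").getValue(ctx));
        }
        return results;
    }
}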

Related

Best way to represent part-of (mereological) transitivity for OWL classes?

I have a background in frame-based ontologies, in which classes represent concepts and there's no restriction against assertion of class:class relationships.
I am now working with an OWL2 ontology and trying to determine the best/recommended way to represent "canonical part-of" relationships - conceptually, these are relationships that are true, by definition, of the things represented by each class (i.e., all instances). The part-of relationship is transitive, and I want to take advantage of that so that I'd be able to query the ontology for "all parts of (a canonical) X".
For example, I might want to represent:
"engine" is a part of "car", and
"piston" is a part of "engine"
and then query transitively, using SPARQL, for parts of cars, getting back both engine and piston. Note that I want to be able to represent individual cars later (and be able to deduce their parts by reference to their rdf:type), and of course I want to be able to represent sub-classes of cars as well, so I can't model the above-described classes as individuals - they must be classes.
It seems that I have 3 options using OWL, none ideal. Is one of these recommended (or are any discouraged), and am I missing any?
Option 1 - OWL restriction axioms:
rdfs:subClassOf(engine, someValuesFrom(partOf, car))
rdfs:subClassOf(piston, someValuesFrom(partOf, engine))
The major drawback of the above is that there's no way in SPARQL to query transitively over the partOf relationship, since it's embedded in an OWL restriction. I would need some kind of generalized recursion feature in SPARQL - or I would need the following rule, which is not part of any standard OWL profile as far as I can tell:
antecedent (body):
  subClassOf(B, (P some A)) ^
  subClassOf(C, (P some B)) ^
  transitiveProperty(P)
consequent (head):
  subClassOf(C, (P some A))
Option 2 - OWL2 punning: I could effectively represent the partOf relationships on canonical instances of the classes, by asserting the object property directly on the classes. I'm not sure that would work transparently with SPARQL, though, since the partOf relationships would be asserted on instances (via punning) and any subClassOf relationships would be asserted on classes. So if I had, for example, a subclass six_cylinder_engine, the following SPARQL snippet would not bind six_cylinder_engine:
?part (rdfs:subClassOf*/partOf*)+ car
Option 3 - Annotation property: I could assert partOf as an annotation property on the classes, with values that are also classes. I think that would work (minus transitivity, but I could recover that easily enough with SPARQL paths as in the query above), but it seems like an abuse of the intended use of annotation properties.
I think you have performed a good analysis of the problem and the advantages/disadvantages of different approaches. I don't know if any one is discouraged or encouraged. IMHO this problem has not received sufficient attention, and is a bigger problem in some domains than others (I work in bio-ontologies which frequently use partonomies, and hence this is very important).
For option 1, your rule is valid and justified by OWL semantics. There are other ways to implement this using OWL reasoners, as well as RDF-level reasoners. For example, using ROBOT, a command-line wrapper for the OWLAPI, you can run the reason command with an Expression Materializing Reasoner (EMR), e.g.:
robot reason --exclude-tautologies true --include-indirect true -r emr -i engine.owl -o engine-reasoned.owl
This will give you an axiom piston subClassOf partOf some car that can be queried using a non-transitive SPARQL query.
The --exclude-tautologies flag removes inferences to owl:Thing, and --include-indirect includes indirect (e.g. transitive) inferences.
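As an illustration only, here is a minimal Apache Jena sketch of querying the materialized restriction non-transitively (the example.org namespace, the :partOf and :car IRIs, and the file name are assumptions made for the example):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class PartOfQuery {
    public static void main(String[] args) {
        // Load the ROBOT/EMR output and look for classes asserted to be
        // subClassOf (partOf some car) after materialization.
        Model model = ModelFactory.createDefaultModel();
        model.read("engine-reasoned.owl");

        String query =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "PREFIX owl:  <http://www.w3.org/2002/07/owl#> " +
            "PREFIX :     <http://example.org/vehicles#> " +   // assumed namespace
            "SELECT ?part WHERE { " +
            "  ?part rdfs:subClassOf " +
            "        [ owl:onProperty :partOf ; owl:someValuesFrom :car ] }";

        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("part"));
            }
        }
    }
}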
For your option 2, you have to be careful in that you may introduce incorrect inferences. For example, assume there are some engines without pistons, i.e. engine subClassOf inverse(partOf) some piston does not hold. However, in your punned shadow world, this would be entailed. This may or may not be a problem depending on your use case.
A variant of your option 2 is to introduce different mapping rules for layering OWL T-boxes onto RDF, as described in my OWLStar proposal. With this proposal, existentials would be mapped to direct triples, but there is another mechanism (e.g. reification) to indicate the intended quantification. This allows writing rules that are both safe (no undesired inferences) and complete (for anything expressible in OWL-RL). Here there is no punning (under the alternative RDF-to-OWL interpretation). You can also use the exact same transitive SPARQL query you wrote to get the desired results.

Is multiplication allowed in relational algebra?

I have a relation
R
-------------------------------
cid    sid  gradepoint  credits
CS425  001  4.0         3
I need to calculate the GPA. There are more rows, but I believe if I just get this answered I should be ok with the rest. I need to do gradepoint * credits. How do I express this with a relational algebra expression?
My best guess is:
, but I'm not sure if I can multiply attributes with anything other than a constant.
Relational algebra doesn't address domain-specific operations. It neither includes nor excludes them, just as real algebra neither includes nor excludes operations on relations.
If you allow multiplication by constants, you're already combining algebras (which is pretty much required for any practical application) so I see no reason to disallow multiplication between attributes.
Notice that if expressions like the one you are using are allowed, then it is projection that is doing the multiplying. Instead of its inputs being a relation value & attribute names, its inputs are a relation value and expressions of some sort that include names of operators whose inputs are values of attribute types. Your projection is parsing and multiplying, so it is a different operator than one that only accepts attribute names.
A projection that takes attribute expressions raises the question of how to implement it given an algebra whose projection takes only a relation value and attribute names. This matters in an academic setting because a question may want you to actually figure out how to do that, or because the difficulty of a question depends on the operators available. So find out what algebra you are supposed to use.
We can introduce an operator on attribute values when we only have basic relation operators taking attribute names and relation values. Each such operator can be associated with a relation value that has an attribute for each operand and an attribute for the result. The relation holds the tuples where the result value is equal to the result of the operator called on the operand values. (The result is functionally dependent on the operands.)
So suppose we have the following table value Times holding tuples where left * right = result:
left  right  result
--------------------
0     0      0
1     0      0
...
0     1      0
1     1      1
2     1      2
...
If your calculated attribute is result then you want
/* tuples where for some credits & gradepoint,
   course cid's student sid earned grade gradepoint and credits credits
   and credits * gradepoint = result
*/
project cid, sid, result (
    R natural join (rename left\credits right\gradepoint (Times))
)
See also: Relational algebra - recode column values
PS re algebra vs language: What is the reference for the "relational algebra" you are using? There are many, and they even have different notions of what a "relation" is. Some so-called "algebras" are really languages, because their expressions don't only represent the results of operators being called on values. (Although it is possible for an algebra to have operand values that represent expressions and/or relation values that contain names for themselves.)
PS re separation of concerns: You haven't said what the attribute name of the multiplication result is. If you're expecting it to be credits * gradepoint then you're also expecting projection to map expression-valued inputs to attribute names. Except you are expecting credits * gradepoint to be recognized as an expression with two attribute names and an operator name in one place but as just one attribute name in another. There are solutions for these things in language design, e.g. in SQL the optional quotes specialized for attribute names. But maybe you can see why something as simple as an algebra operating on just attribute names & relation values with unique, unordered attribute names helps us understand in terms of self-contained chunks.

Why are union, intersection and difference operations called boolean operations in relational algebra?

Why are union, intersection and difference operations of relational algebra called boolean operations?
I found them called that in the first line of section 5.4.1, Boolean operations (Section 5.4 is Relational Algebra and Datalog), in the book A First Course in Database Systems by Ullman & Widom.
The statement that you are citing is from Chapter 5 (Algebraic and Logical Query Languages) of the book A First Course in Database Systems by Ullman and Widom (Pearson 2013).
In particular, Sections 5.3 and 5.4 of that chapter treat the language Datalog, which can be used to work on a relational database and in which a relation is seen as a predicate, not a set (see Section 5.3.1).
In other words, a tuple (x1, x2, ..., xn) of R is seen as the fact that the relation R is true for the specified arguments (x1, x2, ..., xn). In this way the set operators discussed in the context of relational algebra (union, difference, intersection) can be rewritten as Datalog rules expressed through the use of boolean operators like AND, NOT, etc.
In fact, you can see that in the same book, in section 2.4.4, they are called set operators (as they actually are), so I think the name "boolean operators" is due to the fact that they are closely related to boolean operators and are discussed in the context of a logical view of a database.
The notion of "boolean" relational operations is an idiosyncratic ad hoc distinction/categorization in that book and chapter. The authors identify 3 common relation operators that "can each be expressed simply in Datalog". The next sections at the same level go on to each express some other relational operator. ("Selections can be somewhat more difficult to express in Datalog.") The 3 operators' treatments are similar to each other and different from others' in a certain context-specific "boolean" way. (So it's not a particularly helpful or deep distinction.)
5.4.1 Boolean Operations
The boolean operations of relational algebra--union, intersection, and set
difference--can each be expressed simply in Datalog.
To take the union R ∪ S, [...] As a result, each tuple from R and each tuple of S is put into the answer relation.
To take the intersection R ∩ S, [...] Then, a tuple is in the answer relation if and only if it is in both R and S.
To take the difference R - S, [...] Then, a tuple is in the answer relation if and only if it is in R but not in S.
So the "boolean" is because each relation operator corresponds to a certain boolean operator in that context:
UNION returns tuples that are in one operand OR the other
INTERSECTION returns tuples that are in one operand AND the other
DIFFERENCE returns tuples that are in one operand AND NOT the other
(Datalog also literally uses AND and AND NOT, but OR is implicit.)
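To make the correspondence concrete, here is a small Java sketch (not from the book; relations are reduced to plain sets of tuples purely for illustration) in which the three operations are exactly set union, intersection and difference over tuple membership:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BooleanOps {
    public static void main(String[] args) {
        // Each "relation" is just a set of tuples here.
        Set<List<String>> r = Set.of(List.of("a", "1"), List.of("b", "2"));
        Set<List<String>> s = Set.of(List.of("b", "2"), List.of("c", "3"));

        Set<List<String>> union = new HashSet<>(r);
        union.addAll(s);           // tuples in R OR S

        Set<List<String>> intersection = new HashSet<>(r);
        intersection.retainAll(s); // tuples in R AND S

        Set<List<String>> difference = new HashSet<>(r);
        difference.removeAll(s);   // tuples in R AND NOT S

        System.out.println(union);
        System.out.println(intersection);
        System.out.println(difference);
    }
}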
More importantly, this can be put another way: if every base relation holds the tuples that make a true proposition (statement) from some associated predicate (sentence template parameterized by attribute names), then every result has a predicate made from its operand predicates, and its value holds the tuples that make a true proposition from that predicate:
U holds the tuples where U
V holds the tuples where V
U UNION V holds the tuples where U OR V
U INTERSECTION V holds the tuples where U AND V
U DIFFERENCE V holds the tuples where U AND NOT V
But more than that: it's not just that those operators correspond to some boolean propositional logic connectives/nonterminals; every relational operator corresponds to a predicate logic connective/nonterminal, every query expression has an associated predicate, and every query result value holds the tuples that make a true proposition from that predicate:
U JOIN V holds the tuples where U AND V
U RESTRICT condition holds the tuples where U AND condition
U PROJECT A holds the tuples where FORSOME values for all attributes but A, U
U RENAME A A' holds the tuples where U with A replaced by A'
So the book ties those three operators to the syntax & semantics of Datalog, but really all operators can be tied to the syntax & semantics of predicate logic, which is the language/notation of precision in mathematics, science (including computer science) & engineering (including software engineering). In fact, that's how we know what a query means. So it's not really that those three operators are in some sense boolean so much as that all relational operators are logical: every relation expression corresponds to a predicate logic expression.
(The three operators are also frequently called the "set operators", since the set that is the body of the result relation arises from the sets that are the bodies of the operand relations according to the set operators of the same names. That might be a helpful way to group them in your mind or remember them or their names, and it no doubt inspired their names. But in light of the correspondence between relation operators and predicate logic connectives/nonterminals, between relation expressions and predicate logic expressions, and between predicates and relation values, the fact that some relation operators are reminiscent of some set operators is totally irrelevant, just as it is that some operators are reminiscent of boolean operators, and just as you can make an algebra with "relations" whose bodies are bags instead of sets.)

Short-circuit OR operator in Lucene/Solr

I understand that Lucene's AND (&&), OR (||) and NOT (!) operators are shorthand for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why one can't treat them as boolean operators (adhering to boolean algebra).
I have been trying to construct a simple OR expression, as follows
q = +(field1:value1 OR field2:value2)
intending a match on either field1 or field2. But since OR merely marks a clause as optional, for documents where both field1:value1 and field2:value2 match, the query returns a score reflecting a match on both clauses.
How do I enforce short-circuiting in this context? In other words, how do I implement short-circuiting as in boolean algebra, where an expression A || B || C returns true if A is true, without even looking at whether B or C could be true?
Strictly speaking, no, there is no short-circuiting boolean logic. If a document is found for one term, you can't simply tell it not to check for the other. Lucene is an inverted index, so it doesn't really check documents for matches directly. If you search for A OR B, it finds A and gets all the documents which have indexed that value. Then it gets B in the index, and then the list of all documents containing it (this is simplifying somewhat, but I hope it gets the point across). It doesn't really make sense for it to not check the documents in which A is found. Further, for the query provided, all the matches on a document still need to be enumerated in order to acquire a correct score.
However, you did mention scores! I suspect what you are really trying to get at is that if one query term in a set is found, to not compound the score with other elements. That is, for (A OR B), the score is either the score-A or the score-B, rather than score-A * score-B or some such (Sorry if I am making a wrong assumption here, of course).
That is what DisjunctionMaxQuery is for. Adding each subquery to it will render a score from it equal to the maximum of the scores of all subqueries, rather than a product.
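For illustration, here is a sketch of that query with the Lucene API (assuming a reasonably recent Lucene version in which DisjunctionMaxQuery takes its disjuncts as a collection; the field and value names are from the question):

import java.util.Arrays;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class DisMaxExample {
    static Query build() {
        // Score = max(score of field1:value1, score of field2:value2);
        // a tie-breaker of 0.0f means the non-maximum clause contributes nothing.
        return new DisjunctionMaxQuery(
                Arrays.asList(
                        new TermQuery(new Term("field1", "value1")),
                        new TermQuery(new Term("field2", "value2"))),
                0.0f);
    }
}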
In Solr, you should learn about the DisMaxQParserPlugin and its more recent incarnation, the ExtendedDisMax (edismax), which, if I'm close to the mark here, should serve you very well.
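As a rough sketch of the request parameters (this assumes the same query text is searched across both fields, which is how qf works; tie=0.0 keeps only the best-scoring field's contribution):

q=value&defType=edismax&qf=field1 field2&tie=0.0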

index many documents to enable queries that supports AND/OR operations

We have many documents that consist of words.
What is the most appropriate way to index the documents?
A search query should support the AND/OR operations.
The query runtime should be as efficient as possible.
Please describe the space required for the index.
The documents contain words only (excluding AND/OR) and the query contains words and the keywords AND/OR.
EDIT: what would the algorithm be if I allowed only 2 keywords and one operation
(e.g. w1 AND w2)?
The basic data structure needed is an inverted index. This maps each word to the set of documents that contain it. Let's say lookup is a function from words to document sets: lookup: W -> Pow(D) (where W is the set of words, D the set of documents, and Pow(D) the power set of D).
Let's say you have a query expression of the form:
Expr ::= Word | AndExpression | OrExpression
AndExpression ::= Expr 'AND' Expr
OrExpression ::= Expr 'OR' Expr
So you get an abstract syntax tree representing your query. That's a tree with the following kinds of nodes:
abstract class Expression { }

class Word extends Expression {
    String word;
}

class AndExpression extends Expression {
    Expression left;
    Expression right;
}

class OrExpression extends Expression {
    Expression left;
    Expression right;
}
For example, foo AND (bar OR baz) would be translated to this tree:
            AndExpression
            /           \
           /             \
    Word('foo')      OrExpression
                      /         \
                     /           \
              Word('bar')    Word('baz')
To evaluate this tree, follow these simple rules, expressed in pseudocode:
Set<Document> evaluate(Expr e) {
    if (e is Word)
        return lookup(e.word)
    else if (e is AndExpression)
        return intersection(evaluate(e.left), evaluate(e.right))
    else if (e is OrExpression)
        return union(evaluate(e.left), evaluate(e.right))
    // otherwise, throw assertion error: no other case remaining
}

// implemented by the inverted index, not shown
Set<Document> lookup(String word)
Thus, AND expressions are basically translated to set intersections, while OR expressions are translated to set unions, all evaluated recursively. I'm sure if you stare long enough at the above, you'll see its beauty :)
You could represent each set (that lookup returns) as a HashSet. If you use Java, you could also use Guava's lazy union and intersection implementations; that should be fun (especially if you study the code or use your imagination to see what 'lazy' really means in this context).
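A tiny sketch of what that looks like with Guava (the document IDs are made up; Sets.union and Sets.intersection return lazy views over the underlying sets):

import java.util.Set;
import com.google.common.collect.Sets;

class LazySetOps {
    public static void main(String[] args) {
        Set<String> docsFoo = Set.of("d1", "d2", "d3"); // documents containing 'foo'
        Set<String> docsBar = Set.of("d2", "d3", "d4"); // documents containing 'bar'

        // Views: nothing is copied until the result is iterated.
        Set<String> and = Sets.intersection(docsFoo, docsBar); // {d2, d3}
        Set<String> or = Sets.union(docsFoo, docsBar);         // {d1, d2, d3, d4}

        System.out.println(and);
        System.out.println(or);
    }
}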
To the best of my knowledge, though, intersections are rarely computed by intersecting hashtables. Instead, what usually happens is the following: assume there are 3 sets to be intersected; we pick one (preferably the smallest) and assign a counter (equal to 1) to each of its documents. We then iterate the other sets, incrementing the counter of each document we find. Finally, we report each document whose counter reaches 3 (which means the document appeared in all sets, and thus exists in their intersection).
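Here is a sketch of that counting approach (a hypothetical helper, not from the original answer), generalized to any number of posting sets:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CountingIntersection {
    // Intersect posting sets by counting: seed counters from the smallest set,
    // bump them while scanning the others, keep documents counted in every set.
    static Set<String> intersect(List<Set<String>> postings) {
        List<Set<String>> sorted = new ArrayList<>(postings);
        sorted.sort(Comparator.comparingInt(Set::size));

        Map<String, Integer> counts = new HashMap<>();
        for (String doc : sorted.get(0)) {
            counts.put(doc, 1);
        }
        for (int i = 1; i < sorted.size(); i++) {
            for (String doc : sorted.get(i)) {
                counts.computeIfPresent(doc, (d, c) -> c + 1); // only count candidates
            }
        }

        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == postings.size()) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}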
In case only 2 keywords are allowed (e.g. "key1 AND key2")
This is the solution I found so far:
1) keyMap, a HashMap where the key is a keyword and the value is a LinkedList of the documents containing it.
2) docMap, a HashMap where the key is a document id and the value is a HashSet of the keywords that document contains.
Now on such a query ("key1 AND key2") I would:
LinkedList<Document> docs = keyMap.get(key1);
for (Document doc : docs)
    if (docMap.get(doc.id).contains(key2))
        result.add(doc);
return result;
What do you think?
Is there any better way?
How about 3 keywords?
Google Desktop springs to mind, but all the major O/S have similar features built in or easily added. I'd describe the space used by Spotlight on my Mac as 'reasonable'.
With many documents, a dedicated package like Lucene becomes very attractive.
As @Wrikken said, use Lucene.
As you are interested in the algorithms used: there are many; you can find a starting point here and more information here.

Resources