How do I prevent a Datalog rule from pruning nulls?

I have the following facts and rules:
% frequents(D,P) % D=drinker, P=pub
% serves(P,B) % B=beer
% likes(D,B)
frequents(janus, godthaab).
frequents(janus, goldenekrone).
frequents(yanai, goldenekrone).
frequents(dimi, schlosskeller).
serves(godthaab, tuborg).
serves(godthaab, carlsberg).
serves(goldenekrone, pfungstaedter).
serves(schlosskeller, fix).
likes(janus, tuborg).
likes(janus, carlsberg).
count_good_beers_for_at(D,P,F) :- group_by((frequents(D,P), serves(P,B), likes(D,B)),[D,P],(F = count)).
possible_beers_served_for_at(D,P,B) :- lj(serves(P,B), frequents(D,R), P=R).
Now I would like to construct a rule that works like a predicate: it should be "true" exactly when every pub a "drinker" "frequents" has more than 0 available "liked" beers.
I would consider the predicate true when the rule returns no tuples. If the predicate is false, I was planning to make it return the bars that don't have a single "liked" beer.
As you can see, I already have a rule counting the good beers for a given drinker at a given pub. I also have a rule giving me the servable beers.
DES> count_good_beers_for_at(A,B,C)
{
count_good_beers_for_at(janus,godthaab,2)
}
Info: 1 tuple computed.
As you can see, the counter doesn't return the pubs that are frequented but have 0 liked beers. I was planning to work around this with a left outer join.
DES> is_happy_at(D,P,Z) :- lj(serves(P,B), count_good_beers_for_at(D,Y,Z), (Y=P))
Info: Processing:
is_happy_at(D,P,Z) :-
lj(serves(P,B),count_good_beers_for_at(D,Y,Z),Y = P).
{
is_happy_at(janus,godthaab,2),
is_happy_at(null,goldenekrone,null),
is_happy_at(null,schlosskeller,null)
}
Info: 3 tuples computed.
This is almost right, except that it also gives me pubs that are not frequented. I try adding an extra condition:
DES> is_happy_at(D,P,Z) :- lj(serves(P,B), count_good_beers_for_at(D,Y,Z), (Y=P)), frequents(D,P)
Info: Processing:
is_happy_at(D,P,Z) :-
lj(serves(P,B),count_good_beers_for_at(D,Y,Z),Y = P),
frequents(D,P).
{
is_happy_at(janus,godthaab,2)
}
Info: 1 tuple computed.
Now I have somehow filtered away everything containing nulls! I suspect this is due to the null-value logic in DES.
I recognize that I might be approaching this whole problem the wrong way. Any help is appreciated.
EDIT: The assignment is "very_happy(D) ist wahr, genau dann wenn jede Bar, die Trinker D besucht, wenigstens ein Bier ausschenkt, das er mag.", which translates to "very_happy(D) is true if and only if every bar that drinker D visits serves at least one beer that he likes". Since this assignment is about Datalog, I would think it is definitely possible to solve without using Prolog.

I think that for your assignment you should use basic Datalog, without resorting to aggregates. The point of the question is how to express universally quantified conditions. I googled 'universal quantification datalog', and the first hit was deductnotes.pdf, which asserts:
A universally quantified condition can only be expressed by an equivalent condition with existential quantification and negation.
In that PDF you will also find a useful example (pages 9 and 10).
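Concretely, for this assignment the rewrite reads (my formulation, not from the PDF):
∀P (frequents(D,P) → ∃B (serves(P,B) ∧ likes(D,B)))  ≡  ¬∃P (frequents(D,P) ∧ ¬∃B (serves(P,B) ∧ likes(D,B)))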
Thus we must rephrase our question. I ended up with this code:
not_happy(D) :-
    frequents(D, P),
    likes(D, B),
    not(serves(P, B)).

very_happy(D) :-
    likes(D, _),
    not(not_happy(D)).
that seems what's required:
DES> very_happy(D)
{
}
Info: 0 tuples computed.
Note the likes(D, _): it's required to avoid that yanai and dimi get listed as very_happy without any explicit assertion of what they like.
EDIT: I'm sorry, but the above solution doesn't work: not_happy/1 flags a drinker as soon as one of his liked beers is missing from some pub he visits, not when a pub serves none of his liked beers. I've rewritten it this way:
likes_pub(D, P) :-
    likes(D, B),
    serves(P, B).

unhappy(D) :-
    frequents(D, P),
    not(likes_pub(D, P)).

very_happy(D) :-
    likes(D, _),
    not(unhappy(D)).
test:
DES> unhappy(D)
{
unhappy(dimi),
unhappy(janus),
unhappy(yanai)
}
Info: 3 tuples computed.
DES> very_happy(D)
{
}
Info: 0 tuples computed.
Now we add a fact:
serves(goldenekrone, tuborg).
and we can see the corrected code outcome:
DES> unhappy(D)
{
unhappy(dimi),
unhappy(yanai)
}
Info: 2 tuples computed.
DES> very_happy(D)
{
very_happy(janus)
}
Info: 1 tuple computed.

Maybe not the answer you are expecting. But you can use ordinary Prolog and easily do group-by queries with the bagof/3 or setof/3 built-in predicates.
?- bagof(B,(frequents(D,P), serves(P,B), likes(D,B)),L), length(L,N).
D = janus,
P = godthaab,
L = [tuborg,carlsberg],
N = 2
The semantics of bagof/3 is such that it does not compute an outer join for the given query. The query is executed by Prolog as usual; the results are first accumulated and key-sorted, and finally returned through backtracking. If your Datalog cannot do without nulls, then yes, you have to filter.
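As an aside (my sketch, not part of the original answer): SWI-Prolog's aggregate_all/3 counts solutions per group without pruning the empty groups, so no outer join and no null filtering is needed. The rule name below is mine:
good_beer_count(D, P, N) :-
    frequents(D, P),
    aggregate_all(count, (serves(P, B), likes(D, B)), N).
?- good_beer_count(D, P, N).
D = janus, P = godthaab, N = 2 ;
D = janus, P = goldenekrone, N = 0 ;
D = yanai, P = goldenekrone, N = 0 ;
D = dimi, P = schlosskeller, N = 0.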
But you don't need to go into aggregates when you only want to know whether a liked beer exists. You can do it directly via a query without any aggregates:
is_happy_at(D,P) :- frequents(D,P), once((serves(P,B), likes(D,B))).
?- is_happy_at(D,P).
D = janus,
P = godthaab ;
No
The once/1 prevents unnecessary backtracking. A Datalog system might avoid the unnecessary backtracking automatically when it sees the projection in is_happy_at/2 (B is projected away), or you might need to explicitly use whatever corresponds to SQL DISTINCT. Or your Datalog may provide something corresponding to SQL EXISTS, which most closely matches once/1.
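In Datalog's set semantics, for instance, simply projecting B away already behaves like EXISTS, because duplicate tuples cannot occur. A minimal sketch over the question's predicates (the rule name is mine):
happy_at(D, P) :- frequents(D, P), serves(P, B), likes(D, B).
Each (D,P) pair appears at most once in the result, no matter how many liked beers support it.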
Bye

Related

Clingo program expected to be satisfiable

I am testing some programs involving arithmetic in Clingo 5.0.0 and I don't understand why the program below is unsatisfiable:
#const v = 1.
a(object1).
a(object2).
b(object3).
value(object1,object2,object3) = "1.5".
value(X,Y,Z) > v, a(X), a(Y), b(Z), X!=Y :- go(X,Y,Z).
I expected an answer containing: a(object1) a(object2) b(object3) go(object1,object2,object3).
There is probably something I am missing regarding arithmetic in Clingo.
I fear there are quite a few misunderstandings about ASP here.
You cannot assign values to predicates (value(a,b,c) = 1.5). Predicates form atoms, which can be true or false (contained in an answer set or not).
I assume that your last rule is meant to derive the atom go(X,Y,Z). Rules work the other way around: what is derived is on the left-hand side.
There is no floating-point arithmetic; you would have to scale your values up to integers (e.g. represent 1.5 as 15 with an implicit factor of 10).
Your problem might look like this, but this is just groping in the dark:
#const v = 1.
a(object1).
a(object2).
b(object3).
value(object1,object2,object3,2).
go(X,Y,Z) :- value(X,Y,Z,Value), Value > v, a(X), a(Y), b(Z), X!=Y.
The last rule states:
Derive go(object1,object2,object3) if value(object1,object2,object3,2) is true and 2 > 1 and a(object1) is true and a(object2) is true and b(object3) is true and object1 != object2.

Excluding certain conditioned variable in calculation using nested ifelse statement

I have a formula, say:
v12 = -(ln(v6)*v6 + ln(v7)*v7 + ln(v8)*v8 + ln(v9)*v9).
One or two of the variables in the calculation are 0. Since ln(0) is undefined, the calculation is not performed. Is there any way to ignore any variables that are 0 and proceed without them? I tried using NA, but it failed.
Two choices (the first is in R, which is what you originally asked for):
1) What you literally asked for, in R, using vectorization:
vec = c(v6,v7,v8,v9)
result = -sum(ifelse(vec!=0, vec*log(vec), 0))
2) log1p() is a safer and numerically better-behaved alternative for small numbers; note that log1p(x) computes log(1 + x), which is defined (and zero) at x = 0:
v12 = -sum(sapply(c(v6,v7,v8,v9), function(x) { x*log1p(x) }))
You can use log1p in C too!
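For instance, with made-up values where one variable is zero (my example, not from the question):
v6 <- 0.2; v7 <- 0; v8 <- 0.5; v9 <- 0.3
vec <- c(v6, v7, v8, v9)
-sum(ifelse(vec != 0, vec * log(vec), 0))
# [1] 1.029653  -- the zero entry contributes 0 instead of NaN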

SWI Prolog Array Retrieve [Index and Element]

I'm trying to program array retrieval in SWI-Prolog. With the current code printed below I can retrieve the element at a given index, but I also want to be able to retrieve the index(es) of a given element.
aget([_|X],Y,Z) :- Y \= 0, Y2 is (Y-1), aget(X,Y2,Z).
aget([W|_],Y,Z) :- Y = 0, Z is W.
Example 1: aget([9,8,7,6,5],1,N) {retrieve the element 8 at index 1}
output: N = 8. {Correct}
Example 2: aget([9,8,7,6,5],N,7) {retrieve the index 2 for element 7}
output: false {incorrect}
The way I understood it, SWI-Prolog should work this way with little to no additional programming, so clearly I'm doing something wrong. If you could point me in the right direction or tell me what I'm doing wrong, I would greatly appreciate it.
Your code is too procedural, and the second clause is plainly wrong: because it uses Z is W, it works only for numbers.
The functionality you're looking for is implemented by nth0/3. In SWI-Prolog you can see the optimized source with ?- edit(nth0). An alternative implementation has been discussed here on SO.
Note that Prolog doesn't have arrays, but lists. When an algorithm can be rephrased to avoid indexing, we should do so.
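For example, nth0/3 runs in both modes (a quick sketch):
?- nth0(1, [9,8,7,6,5], E).
E = 8.
?- nth0(I, [9,8,7,6,5], 7).
I = 2 ;
false.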
If you represent arrays as compounds, you can also use the ISO standard predicate arg/3 to access an array element. Here is an example run:
?- X = array(11,33,44,77), arg(2,X,Y).
X = array(11, 33, 44, 77),
Y = 33.
The advantage over lists is that compound access takes O(1) time, whereas list access takes O(n) time, where n is the length of the array.
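The reverse lookup also works on compounds; a sketch combining the standard functor/3, between/3 and arg/3:
?- X = array(11,33,44,77), functor(X, _, N), between(1, N, I), arg(I, X, 33).
X = array(11, 33, 44, 77),
N = 4,
I = 2 ;
false.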

Prolog - Little exercise on facts

OK, here's my problem: I need to implement a predicate that sums up the prices of all the products in the list, but for now I'm not getting anywhere with it.
What am I doing wrong?
Thanks in advance.
domains
    state = reduced ; normal

database
    producte(string, integer, state)

predicates
    nondeterm calculate(integer)

clauses
    % ---> producte( description , price , state )
    producte("Enciam", 2, normal).
    producte("Llet", 1, reduced).
    producte("Formatge", 5, normal).

    calculate(Import) :-
        producte(_, Import, _).
    calculate(Import) :-
        producte(_, Import, _),
        calculate(Import2),
        Import = Import2 + Import, !.

goal
    calculate(I).
Disclaimer: I'm a bit daft when it comes to prolog. Also, I don't have access to a prolog interpreter right now.
The canonical example, sum of a list:
sum([], 0).
sum([Head | Tail], Total) :- sum(Tail, Temp), Total is Head + Temp.
making a list with findall/3:
findall(Val, producte(_, Val, _), Vals).
Vals has your list you want to sum.
Update: per your comment, I'm a bit out of my depth without access to an interpreter, but this should do it (note that findall/3 has to run first, so that Vals is bound before sum/2 walks it):
calculate(I) :- findall(Val, producte(_, Val, _), Vals), sum(Vals, I).
What this does:
your single goal calculate(I) generates the Vals list with findall/3 and then receives the result of summing it in I.
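With the corrected goal order and the three producte/3 facts loaded as plain Prolog facts, I would expect a session like this (untested sketch):
?- calculate(I).
I = 8.
(2 + 1 + 5 = 8.)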
The findall part (note the head must carry Sum, not Price):
calculate(Sum) :-
    List = [ Price || producte(_, Price, _) ],
    sum_list(List, 0, Sum).
The sum_list part:
sum_list([], Acc, Acc).
sum_list([Head|Tail], Acc, Sum) :-
    NewAcc is Acc + Head,
    sum_list(Tail, NewAcc, Sum).
I guess something along these lines should work according to the Visual Prolog docs, but I don't really want to install Visual Prolog to test it...

Determine regular expression's specificity

Given the following regular expressions:
- alice@[a-z]+\.[a-z]+
- [a-z]+@[a-z]+\.[a-z]+
- .*
The string alice@myprovider.com will obviously match all three regular expressions. In the application I am developing, we are only interested in the 'most specific' match. In this case this is obviously the first one.
Unfortunately there seems to be no way to do this. We are using PCRE, and I did not find a way; a search on the Internet was also not fruitful.
A possible way would be to keep the regular expressions sorted by descending specificity and then simply take the first match. Of course, the next question is how to sort the array of regular expressions. It is not an option to hand the end user the responsibility of keeping the array sorted.
So I hope you guys could help me out here...
Thanks !!
Paul
The following is the solution to this problem I developed based on Donald Miner's research paper, implemented in Python, for rules applied to MAC addresses.
Basically, the most specific match comes from the pattern that is not a superset of any other matching pattern. For a particular problem domain, you create a series of tests (functions) that compare two REs and return which is the superset, or whether they are orthogonal. This lets you build a tree of matches. For a particular input string, you go through the root patterns and find any matches, then go through their subpatterns. If at any point orthogonal patterns match, an error is raised.
Setup
import re

class RegexElement:
    def __init__(self, string, index):
        self.string = string
        # sets rather than lists: the graph code below uses .add() and .copy()
        self.supersets = set()
        self.subsets = set()
        self.disjoints = set()
        self.intersects = set()
        self.maybes = []
        self.precompilation = {}
        self.compiled = re.compile(string, re.IGNORECASE)
        self.index = index

# strings rather than opaque objects, so that 'add_' + relationship
# (see SubsetGraph.process below) names a valid method
SUPERSET = 'SUPERSET'
SUBSET = 'SUBSET'
INTERSECT = 'INTERSECT'
DISJOINT = 'DISJOINT'
EQUAL = 'EQUAL'
The Tests
Each test takes two strings (a and b) and tries to determine how they are related. If the test cannot determine the relation, it returns None.
SUPERSET means a is a superset of b. All matches of b will match a.
SUBSET means b is a superset of a.
INTERSECT means some matches of a will match b, but some won't and some matches of b won't match a.
DISJOINT means no matches of a will match b.
EQUAL means all matches of a will match b and all matches of b will match a.
def equal_test(a, b):
    if a == b:
        return EQUAL
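The remaining tests are domain specific and not shown in the original answer. Purely as an illustration, here is a hypothetical extra test (my sketch) exploiting patterns that contain no metacharacters:
METACHARS = set(".^$*+?{}[]\\|()")

def literal_test(a, b):
    # A pattern with no metacharacters denotes exactly one string, so if
    # the other pattern matches that string, the literal is a subset of it.
    a_lit = not any(c in METACHARS for c in a)
    b_lit = not any(c in METACHARS for c in b)
    if a_lit and b_lit:
        # compare case-insensitively, matching the re.IGNORECASE compilation
        return EQUAL if a.lower() == b.lower() else DISJOINT
    if a_lit and re.fullmatch(b, a, re.IGNORECASE):
        return SUBSET      # b is a superset of a
    if b_lit and re.fullmatch(a, b, re.IGNORECASE):
        return SUPERSET    # a is a superset of b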
The graph
class SubsetGraph(object):
    def __init__(self, tests):
        self.regexps = []
        self.tests = tests
        self._dirty = True
        self._roots = None

    @property
    def roots(self):
        # recompute only when the graph has changed
        if self._dirty:
            self._roots = [i for i in self.regexps if not i.supersets]
            self._dirty = False
        return self._roots

    def add_regex(self, new_regex):
        roots = self.roots
        new_re = RegexElement(new_regex, len(self.regexps))
        for element in roots:
            self.process(new_re, element)
        self.regexps.append(new_re)
        self._dirty = True

    def process(self, new_re, element):
        relationship = self.compare(new_re, element)
        if relationship:
            getattr(self, 'add_' + relationship)(new_re, element)

    def add_SUPERSET(self, new_re, element):
        for i in element.subsets:
            i.supersets.add(new_re)
            new_re.subsets.add(i)
        element.supersets.add(new_re)
        new_re.subsets.add(element)

    def add_SUBSET(self, new_re, element):
        for i in element.subsets:
            self.process(new_re, i)
        element.subsets.add(new_re)
        new_re.supersets.add(element)

    def add_DISJOINT(self, new_re, element):
        for i in element.subsets:
            i.disjoints.add(new_re)
            new_re.disjoints.add(i)
        new_re.disjoints.add(element)
        element.disjoints.add(new_re)

    def add_INTERSECT(self, new_re, element):
        for i in element.subsets:
            self.process(new_re, i)
        new_re.intersects.add(element)
        element.intersects.add(new_re)

    def add_EQUAL(self, new_re, element):
        new_re.supersets = element.supersets.copy()
        new_re.subsets = element.subsets.copy()
        new_re.disjoints = element.disjoints.copy()
        new_re.intersects = element.intersects.copy()

    def compare(self, a, b):
        for test in self.tests:
            result = test(a.string, b.string)
            if result:
                return result

    def match(self, text, strict=True):
        matches = set()
        self._match(text, self.roots, matches)
        out = []
        for e in matches:
            # keep only the most specific matches: those with no matching subset
            for s in e.subsets:
                if s in matches:
                    break
            else:
                out.append(e)
        if strict and len(out) > 1:
            for i in out:
                print(i.string)
            raise Exception("Multiple equally specific matches found for " + text)
        return out

    def _match(self, text, elements, matches):
        new_elements = []
        for element in elements:
            m = element.compiled.match(text)
            if m:
                matches.add(element)
                new_elements.extend(element.subsets)
        if new_elements:
            self._match(text, new_elements, matches)
Usage
graph = SubsetGraph([equal_test, test_2, test_3, ...])
graph.add_regex("00:11:22:..:..:..")
graph.add_regex("..(:..){5,5}")
graph.match("00:de:ad:be:ef:00")
A complete usable version is here.
My gut instinct says that not only is this a hard problem, both in terms of computational cost and implementation difficulty, but it may be unsolvable in any realistic fashion. Consider the two following regular expressions to accept the string alice@myprovider.com:
alice@[a-z]+\.[a-z]+
[a-z]+@myprovider.com
Which one of these is more specific?
This is a bit of a hack, but it could provide a practical solution to this question asked nearly 10 years ago.
As pointed out by @torak, there are difficulties in defining what it means for one regular expression to be more specific than another.
My suggestion is to look at how stable the regular expression is with respect to a string that matches it. The usual way to investigate stability is to make minor changes to the inputs, and see if you still get the same result.
For example, the string alice@myprovider.com matches the regex /alice@myprovider\.com/, but if you make any change to the string, it will not match. So this regex is very unstable. But the regex /.*/ is very stable, because you can make any change to the string and it still matches.
So, in looking for the most specific regex, we are looking for the least stable one with respect to a string that matches it.
In order to implement this test for stability, we need to define how we choose a minor change to the string that matches the regex. This is another can of worms. We could for example, choose to change each character of the string to something random and test that against the regex, or any number of other possible choices. For simplicity, I suggest deleting one character at a time from the string, and testing that.
So, if the string that matches is N characters long, we have N tests to make. Let's look at deleting one character at a time from the string alice@foo.com, which matches all of the regular expressions in the table below. It's 13 characters long, so there are 13 tests. In the table below,
0 means the regex does not match (unstable),
1 means it matches (stable).
                /alice@[a-z]+\.[a-z]+/  /[a-z]+@[a-z]+\.[a-z]+/  /.*/
lice@foo.com             0                        1                1
aice@foo.com             0                        1                1
alce@foo.com             0                        1                1
alie@foo.com             0                        1                1
alic@foo.com             0                        1                1
alicefoo.com             0                        0                1
alice@oo.com             1                        1                1
alice@fo.com             1                        1                1
alice@fo.com             1                        1                1
alice@foocom             0                        0                1
alice@foo.om             1                        1                1
alice@foo.cm             1                        1                1
alice@foo.co             1                        1                1
                       ---                      ---              ---
total score:             6                       11               13
The regex with the lowest score is the most specific. Of course, in general there may be more than one regex with the same score, which reflects the fact that there are regular expressions which, by any reasonable way of measuring specificity, are as specific as one another. It may also yield the same score for regular expressions that one can easily argue are not as specific as each other (if you can think of an example, please comment).
But coming back to the question asked by @torak, which of these is more specific:
alice@[a-z]+\.[a-z]+
[a-z]+@myprovider.com
We could argue that the second is more specific because it constrains more characters, and the above test will agree with that view.
As I said, the way we choose to make minor changes to the string that matches more than one regex is a can of worms, and the answer this method yields may depend on that choice. But as I said, this is an easily implementable hack; it is not rigorous.
And of course the method breaks down if the matching string is empty. The usefulness of the test increases with the length of the string: with very short strings, it is more likely to produce equal scores for regular expressions that clearly differ in specificity.
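Though it is not part of the original answer, the deletion test is only a few lines of Python (fullmatch semantics assumed, as in the table; the function name is mine):
import re

def stability_score(pattern, s):
    # Count the single-character deletions of s that still match pattern.
    # Lower score = less stable = more specific, per the argument above.
    rx = re.compile(pattern)
    return sum(1 for i in range(len(s))
               if rx.fullmatch(s[:i] + s[i + 1:]) is not None)

s = "alice@foo.com"
for p in [r"alice@[a-z]+\.[a-z]+", r"[a-z]+@[a-z]+\.[a-z]+", r".*"]:
    print(p, stability_score(p, s))
# prints 6, 11 and 13, matching the totals in the table above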
I'm thinking about a similar problem for a PHP project's route parser. After reading the other answers and comments here, and thinking about the cost involved, I might go in another direction altogether.
A solution, however, would be to simply sort the regular expression list by string length.
It's not perfect, but by removing the []-groups it would be much closer. On the first example in the question it would turn this list:
- alice@[a-z]+\.[a-z]+
- [a-z]+@[a-z]+\.[a-z]+
- .*
into this, after removing the content of any []-group:
- alice@+\.+
- +@+\.+
- .*
The same goes for the second example in another answer; with the []-groups completely removed and sorted by length, this:
alice@[a-z]+\.[a-z]+
[a-z]+@myprovider.com
would become sorted as:
+@myprovider.com
alice@+\.+
This is a good enough solution, at least for me, if I choose to use it. The downside is the overhead of removing all []-groups before sorting while applying the sort to the unmodified list of regexes, but hey, you can't get everything.
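A sketch of that idea in Python (the helper name is made up by me):
import re

def specificity_key(pattern):
    # Length of the pattern with every []-group removed.
    return len(re.sub(r"\[[^\]]*\]", "", pattern))

patterns = [r".*", r"[a-z]+@[a-z]+\.[a-z]+", r"alice@[a-z]+\.[a-z]+"]
patterns.sort(key=specificity_key, reverse=True)
# most specific first: alice@[a-z]+\.[a-z]+, then [a-z]+@[a-z]+\.[a-z]+, then .*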
