Prolog - Help fixing a rule - database

I have a database full of facts such as:
overground(newcrossgate,brockley,2).
overground(brockley,honoroakpark,3).
overground(honoroakpark,foresthill,3).
overground(foresthill,sydenham,2).
overground(sydenham,pengewest,3).
overground(pengewest,anerley,2).
overground(anerley,norwoodjunction,3).
overground(norwoodjunction,westcroydon,8).
overground(sydenham,crystalpalace,5).
overground(highburyandislington,canonbury,2).
overground(canonbury,dalstonjunction,3).
overground(dalstonjunction,haggerston,1).
overground(haggerston,hoxton,2).
overground(hoxton,shoreditchhighstreet,3).
example: newcrossgate to brockley takes 2 minutes.
I then created a rule so that if I enter the query istime(newcrossgate,honoroakpark,Z). then prolog should give me the time it takes to travel between those two stations. (The rule I made is designed to calculate the distance between any two stations at all, not just adjacent ones).
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
istime(X,Y,T,Z):- overground(X,Y,Z), T1 is T + Z.
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
it seems to work perfectly for newcrossgate to the first couple stations, e.g newcrossgate to foresthill or sydenham. However, after testing newcrossgate to westcroydon which takes 26mins, I tried newcrossgate to crystalpalace and prolog said it should take 15 mins...despite the fact its the next station after westcroydon. Clearly somethings wrong here, however it works for most of the stations while coming up with a occasional error in time every now and again, can anyone tell me whats wrong? :S

This is essentially the same problem as your previous question, the only difference is that you need to accumulate the time as you go.
One thing I see is that your "public" predicate, istime/3 tries to do too much. All it should do is seed the accumulator and invoke the worker predicate istime/4. Since you're looking for route/time in both directions, the public predicate should be just
istime( X , Y , Z ) :- istime( X , Y , 0 , Z ) .
istime( X , Y , Z ) :- istime( Y , X , 0 , Z ) .
The above is essentially the first clause of your istime/3 predicate
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
The remaining clauses of istime/3, the recursive ones:
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
should properly be part of istime/4 and have the accumulator present. That's where your problem is.
Give it another shot and edit your question to show the next iteration. If you still can't figure it out, I'll show you some different ways to do it.
Some Hints
Your "worker" predicate will likely look a lot like your earlier "find a route between two stations" exercise, but it will have an extra argument, the accumulator for elapsed time.
There are two special cases. If you use the approach you used in your "find a route between two stations" solution, the special cases are
A and B are directly adjacent.
A and B are connected via at least one intermediate stop.
There's another approach as well, that might be described as using lookahead, in which case the special cases are
A and B are the same, in which case you're arrived.
A and B are not and are connected by zero or more intermediate stops.
FWIW, You shouldn't necessarily expect the route with the shortest elapsed time or the minimal number of hops to be the first solution found. Backtracking will produce all the routes, but the order in which they are found has to do with the order in which the facts are stored in the database. A minimal cost search of the graph is another kettle of fish entirely.

Have you tried to cycle through answers with ;? 26mins is not the shortest time between newcrossgate and westcroydon...
Edit: my bad! Apparently the shorter results were due to a bug in your code (see my comment about the 4th clause). However, your code is correct, 15mins is the shortest route between newcrossgate and crystalpalace. Only because there is a route that goes from newcrossgate to westcroydon, then crystalpalace, that doesn't mean it's the shortest route, or the route your program will yield first.
Update: if you're running into problems to find answers to some routes, I'd suggest changing the 3rd clause to:
istime(X,Y,_,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
The reason is simple: your first clause swaps X with Y, which is good, since with that you're saying the routes are symmetrical. However, the 3rd clause does not benefit from that, because it's never called by the swapped one. Ignoring the 3rd argument (which you're not using anyway) and thus letting the 1st clause call the 3rd might fix your issue, since some valid routes that were not used previously will be now.
(also: I agree with Nicholas Carey's answer, it would be better to use the third argument as an accumulator; but as I said, ignoring it for now might just work)

To make it work you need to do the reverse of both journeys stated in your last clause.
Keep the predicate as it is, istime(X,Y,Z) and just make another clause containing the reverse journeys.
This way it works with all the stations. (Tried and Tested)

Related

(SPSS) Assign values to remaining time point based on value on another variable, and looping for each case

I am currently working on analyzing a within-subject dataset with 8 time-ordered assessment points for each subject.
The variables of interest in this example is ID, time point, and accident.
I want to create two variables: accident_intercept and accident_slope, based on the value on accident at a particular time point.
For the accident_intercept variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the values for that time point and the remaining time points to be 1.
For the accident_slope variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the value of that time point to be 0, but count up by 1 for the remaining time points until the end time point, for each subject.
The main challenge here is that the process stated above need to be repeated/looped for each participant that occupies 8 rows of data.
Please see how the newly created variables would look like:
I have looked into the instruction for different SPSS syntax, such as loop, the lag/lead functions. I also tried to break my task into different components and google each one. However, I have not made any progress :)
I would be really grateful of any helps and directions that you provide.
Here is one way to do what you need using aggregate to calculate "accident time":
if accident=1 accidentTime=TimePoint.
aggregate out=* mode=addvariables overwrite=yes /break=ID/accidentTime=max(accidentTime).
if TimePoint>=accidentTime Accident_Intercept=1.
if TimePoint>=accidentTime Accident_Slope=TimePoint-accidentTime.
recode Accident_Slope accidentTime (miss=0).
Here is another approach using the lag function:
compute Accident_Intercept=0.
if accident=1 Accident_Intercept=1.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Intercept=1.
compute Accident_Slope=0.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Slope=lag(Accident_Slope) +1.
exe.

Looping and selectig different value in each line

I have a problem that I think would be solved relatively quickly with a loop. I have to work with SPSS and I think it can only be solved in syntax.
Unfortunately I am not good with loops, so I hope that one of you can help me.
I have done a study on reasons for abortions. Now I would like to present the distribution of reasons.
The problem is that each person was first asked about all their pregnancies (because this is also relevant for the later analysis), then the pregnancy was determined to which the questionnaire will further refer.
So the further questionnaire was only about one of the pregnancies, whereas the first questions (f.ex. year of pregnancy, reason for abortion) were answered for each pregnancy. For the reasons I only need the information that refers to the pregnancy that was also used for the further questionnaire.
I have an index variable that determines the loop at which pass the relevant pregnancy is asked ("index"). Then I have the variable "Loop_1_R" to "Loop_5_R" which queries the reasons for each up to 5 abortions (of course, for each woman, only the number of pregnancies that she also indicated). In between there are some missing data, for ex. it could be that a woman said that she had 5 pregnancies, but only two of them were abortions (f.ex. the third and fifth). So then she would only give reasons for an abortion in loop3 and loop5.
Now I want to create a new variable which contains only the reason which refers to the relevant pregnancy. So for each woman only one value. I was thinking, you could build a loop in the sense of calculate new variable in such a way that loop i is taken at index i.
I could of course do it by hand, but with a VPN count of over 3000 it will obviously take considerably longer.
I hope someone can help me! This is an example dataset with less loops and VPN:
You can use do repeat to loop and catch the value you need this way:
do repeat vr=Loop_1_R to Loop_5_R/vl=1 to 5.
if Index=vl reason=vr.
end repeat.

AI - what is included in the state

I'm taking my first course in AI, and I have to define some problems in my homework (not yet solve them, just supply a definition).
So I have to define about boolean satisfiability problem
:
What is a state?
What is the initial state?
What is a final state?
What are the operators?
My question is: Should the formula be a part of the state?
Considerations so far:
The operator doesn't change it, and it's constant through the computation, so it's not.
If I do include it, in theory, the search space gets much bigger, since more states are possible, but in reality the formula can't be changed, so I get a big state, and a branching factor that is not corresponding.
It's varying from one execution to the next, so it should be a part of the state.
You need only really consider the varying parts of the problem to be a state when conducting a search such as this, although I'd say in this case it really comes down to how you define the problem.
The search space for a given run of the algorithm depends upon the input formula, but after that is fixed, ie you are searching the space of n length bit vectors where n is the number of variables in the formula. So the formula is not part of the state because it does not vary.
The counter claim is that you are searching in a larger space of formula-vector pairs, but as you cannot change the formula as part of the problem, this has not really increased the size of the search space. So I would not make the claim that "If I do include it, in theory the search space gets much bigger". It does not, the reachable states are the same, the branching is the same, the space that requires exploring to solve the problem is the same.
Given this, my answer would that the formula is not part of the state, but defines the nature of the state space. So the answers to your four questions will each be functionally dependant on the formula in some way, but the state depends only on the length of the formula.
Hope that makes sense!
This is just a note for future readers - Not an answer
Vic Smith is right, another way to look at the fact that in theory there are more states but in practice not (my second dot), is just to think about it as separate bondage spaces. For example for the formula X or Y there is one bondage space, and for not X and Y there is another one and they have no common nodes in the representation.
So it can vary from one execution to another, but still has the same "reachable" states, and same branching factor. And each execution has a different starting state.

Mathematica Findroot Exploring the parameter space

I am solving three non-linear equations in three variables (H0D,H0S and H1S) using FindRoot. In addition to the three variables of interest, there are four parameters in these equations that I would like to be able to vary. My parameters and the range in which I want to vary them are as follows:
CF∈{0,15} , CR∈{0,8} , T∈{0,0.35} , H1R∈{40,79}
The problem is that my non-linear system may not have any solutions for part of this parameter range. What I basically want to ask is if there is a smart way to find out exactly what part of my parameter range admits real solutions.
I could run a FindRoot inside a loop but because of non-linearity, FindRoot is very sensitive to initial conditions so frequently error messages could be because of bad initial conditions rather than absence of a solution.
Is there a way for me to find out what parameter space works, short of plugging 10^4 combinations of parameter values by hand and playing around with the initial conditions and hoping that FindRoot gives me a solution?
Thanks a lot,

How many possible bugs are there?

A user has to complete ten steps to achieve a desired result. The ten steps can be completed in any order.
If there is a bug, the bug is dependent only on the steps that have been taken, not the order in which they were taken (i.e., the bug is path independent).
For example: If the user performs three steps in the order 10, 1, 2 and produces a bug the exact same bug will be produced if the user performs the same three steps in the order 1, 2, 10.
What is the maximum number of unique bugs this program can have?
You mean what is the number of distinct sets pickable from 10 elements? That's a powerset: 2**10.
Much later: some knowledgeable others have suggested that having no bugs should not be counted as a bug. Accordingly, I revise my count: 2**10 - 1.
hughdbrown's answer is correct, but there is another possible interpretation of the question. Suppose that a sequence of operations can never produce more than one bug (i.e. that it should just be counted as one bug). For example, if the operations (3,6,2) is a bug, then you shouldn't be allowed to count (3,6,2,5) as another bug. In that case, rather than finding the maximum possible number of subsets of {1,2,...,10}, you want to find the maximum number of possible subsets so that no one contains another. The answer to this version of the question is "10 choose 5"=252.
Edit: by the way, the result that says this is maximal is called Sperner's Theorem.
It depends entirely how many ways there are of doing each step. If you have a process that involves only one step, but there are multiple ways of doing that step, every step could have an associated bug.
There's also the misuse of functions, which you cant prevent against, which could be considered a bug. ie:
If a user was to think that
rm -rf /
was short for
remove media --really fast /
ie: eject all devices1
I would guess that would be a potential bug. Its user error really, but its still a singular thing that can occur that produces results other than that were wanted.
You could argue the above is a bit over the top, but ultimately, there is no limitation on the ways users can do things wrong.
When users are there, assume, anything that can go wrong, will.
The only problem with the above reasoning, is you have to prematurely delete powerful things so users don't hurt themselves, which leads to less effective tools for those who know how to use them. Like corks on forks sort of rationale.
The only way to solve this concern effectively is give newbs blunt objects to learn with, and then give them an option which takes away all the foam padding once they learn the ropes, so experienced users don't have to keep working with blunt tools, and don't have to deblunten every tool themself.
( If there are infinite numbers of possibly ways to do 1 step, I don't even want to begin to think of the numbers of ways to do 10 steps wrong )
1: If you don't know, this will erase lots of your hard drive and cause much pain. Don't do it.
One, a designer fault? :)

Resources