How many possible bugs are there? - theory

A user has to complete ten steps to achieve a desired result. The ten steps can be completed in any order.
If there is a bug, the bug is dependent only on the steps that have been taken, not the order in which they were taken (i.e., the bug is path independent).
For example: If the user performs three steps in the order 10, 1, 2 and produces a bug the exact same bug will be produced if the user performs the same three steps in the order 1, 2, 10.
What is the maximum number of unique bugs this program can have?

You mean what is the number of distinct sets pickable from 10 elements? That's a powerset: 2**10.
Much later: some knowledgeable others have suggested that having no bugs should not be counted as a bug. Accordingly, I revise my count: 2**10 - 1.

hughdbrown's answer is correct, but there is another possible interpretation of the question. Suppose that a sequence of operations can never produce more than one bug (i.e. that it should just be counted as one bug). For example, if the operations (3,6,2) is a bug, then you shouldn't be allowed to count (3,6,2,5) as another bug. In that case, rather than finding the maximum possible number of subsets of {1,2,...,10}, you want to find the maximum number of possible subsets so that no one contains another. The answer to this version of the question is "10 choose 5"=252.
Edit: by the way, the result that says this is maximal is called Sperner's Theorem.

It depends entirely how many ways there are of doing each step. If you have a process that involves only one step, but there are multiple ways of doing that step, every step could have an associated bug.
There's also the misuse of functions, which you cant prevent against, which could be considered a bug. ie:
If a user was to think that
rm -rf /
was short for
remove media --really fast /
ie: eject all devices1
I would guess that would be a potential bug. Its user error really, but its still a singular thing that can occur that produces results other than that were wanted.
You could argue the above is a bit over the top, but ultimately, there is no limitation on the ways users can do things wrong.
When users are there, assume, anything that can go wrong, will.
The only problem with the above reasoning, is you have to prematurely delete powerful things so users don't hurt themselves, which leads to less effective tools for those who know how to use them. Like corks on forks sort of rationale.
The only way to solve this concern effectively is give newbs blunt objects to learn with, and then give them an option which takes away all the foam padding once they learn the ropes, so experienced users don't have to keep working with blunt tools, and don't have to deblunten every tool themself.
( If there are infinite numbers of possibly ways to do 1 step, I don't even want to begin to think of the numbers of ways to do 10 steps wrong )
1: If you don't know, this will erase lots of your hard drive and cause much pain. Don't do it.

One, a designer fault? :)

Related

Looping and selectig different value in each line

I have a problem that I think would be solved relatively quickly with a loop. I have to work with SPSS and I think it can only be solved in syntax.
Unfortunately I am not good with loops, so I hope that one of you can help me.
I have done a study on reasons for abortions. Now I would like to present the distribution of reasons.
The problem is that each person was first asked about all their pregnancies (because this is also relevant for the later analysis), then the pregnancy was determined to which the questionnaire will further refer.
So the further questionnaire was only about one of the pregnancies, whereas the first questions (f.ex. year of pregnancy, reason for abortion) were answered for each pregnancy. For the reasons I only need the information that refers to the pregnancy that was also used for the further questionnaire.
I have an index variable that determines the loop at which pass the relevant pregnancy is asked ("index"). Then I have the variable "Loop_1_R" to "Loop_5_R" which queries the reasons for each up to 5 abortions (of course, for each woman, only the number of pregnancies that she also indicated). In between there are some missing data, for ex. it could be that a woman said that she had 5 pregnancies, but only two of them were abortions (f.ex. the third and fifth). So then she would only give reasons for an abortion in loop3 and loop5.
Now I want to create a new variable which contains only the reason which refers to the relevant pregnancy. So for each woman only one value. I was thinking, you could build a loop in the sense of calculate new variable in such a way that loop i is taken at index i.
I could of course do it by hand, but with a VPN count of over 3000 it will obviously take considerably longer.
I hope someone can help me! This is an example dataset with less loops and VPN:
You can use do repeat to loop and catch the value you need this way:
do repeat vr=Loop_1_R to Loop_5_R/vl=1 to 5.
if Index=vl reason=vr.
end repeat.

Tree search based Game AI: How to avoid AI 'wandering'/'procrastination' with sparse rewards?

My game AI makes use of an algorithm that searches all possible future states based on the moves I can make (minimax / monte carlo esque). It evaluates these states using a scoring system, picks the highest scored final state and follows it.
This works well in most situations, but awfully when rewards are sparse. For example: there's a desirable collectable object that's 3 tiles to the right of me. The natural solution would be to go right->right->right.
But, my algorithm searches 6 turns deep. And it will will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might for example find a path that's: up->right->down->right->right->down, collecting the object on turn 5 instead.
Since in both cases, the final leaf nodes detect the object as collected, it doesn't naturally prefer one or the other. So, instead of going right on turn 1, it might go up, or down, or left. This behavior will be repeated exactly on the next turn, so that it basically ends up dancing randomly in front of the collectable object, only luck will make it step on it.
That's clearly suboptimal and I want to fix it, but have run out of ideas how to handle this appropriately. Are there any solutions for this issue or is there any theoretical work that deals with handling this issue?
Solutions I've tried:
Make it value object collection more on earlier turns. While this works, to beat evaluator 'noise', the difference between turns must be quite high. Turn 1 must be rated higher than 2, turn 2 rated higher than 3, etc. The difference between turn 1 and 6 needs to be so high that it ends up making the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might end up choosing the path that grabs an object on turn 1, instead of the much better path that can grab objects on turn 5 and 6.
Assign the object as a target and value distance to it. If not done on a turn to turn basis, the original problem persists. If done on a turn to turn basis, the difference in importance required per turn once again makes it too greedy. This method also decreases flexibility and causes other issues. Target selection is not trivial and kind of ruins the point of a minimax style algorithm
Going much deeper in my searches so that it can always find a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'll be back at the same problem since I won't know how to get it to prefer pruning the 5 turn version over the 3 turn version.
Give extra value to the plans laid out last turn. If it can at least follow the suboptimal path, there wouldn't be as much of an issue. Unfortunately, this once again has to be a pretty strong effect for it to work reliably, making it follow sub-optimal paths in all scenarios, hurting overall performance.
When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?
I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.
This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.

Code optimization - cicle

I want to do a large block of code until at least one of the elements of array 1 is equal to 1 of the elements of array 2.
I'm asking the community to share, if possible, the best(fastest to process) ways possible to do this "while"
Sum up:
while (none of the elements from arr1 is equal to any of arr2)
{
(code)
}
Reason: In my code, depending on some dimensions set by the user, my program may need to make this n^2 complexity comparison a lot of times, so I'm looking for a way to make it as light as I can.
I'm sorry in advance, and please let me know, if this type of questions are not suitable for StackOverflow.
Edit: My bad in not giving information about the arrays. As I said, it's dimension may vary based on what the user choses, but each one's size should be between 3 and 1000. Both arrays of integers.
Their values do change, the bigger the dimensions, the more it can happen.
The comments mention a hash and I agree, it could work, but it's very size-dependent. O(n^2)'s overhead will be negligible a lot of the time.
Otherwise, just add all the elements of arr1 into the hashset, then go through arr2's elements to see if they're in there. You'll get O(n) time. Honestly though, unless you're working with elements in the hundred thousands, even millions, I don't think the pay-out will be that tangible, but it's machine dependent and I haven't tested it myself.
C++ has std::unordered_set in the standard. If you're using pure C, I'm sure there's implementations available online.

How to go about creating a prolog program that can work backwards to determine steps needed to reach a goal

I'm not sure what exactly I'm trying to ask. I want to be able to make some code that can easily take an initial and final state and some rules, and determine paths/choices to get there.
So think, for example, in a game like Starcraft. To build a factory I need to have a barracks and a command center already built. So if I have nothing and I want a factory I might say ->Command Center->Barracks->Factory. Each thing takes time and resources, and that should be noted and considered in the path. If I want my factory at 5 minutes there are less options then if I want it at 10.
Also, the engine should be able to calculate available resources and utilize them effectively. Those three buildings might cost 600 total minerals but the engine should plan the Command Center when it would have 200 (or w/e it costs).
This would ultimately have requirements similar to 10 marines # 5 minutes, infantry weapons upgrade at 6:30, 30 marines at 10 minutes, Factory # 11, etc...
So, how do I go about doing something like this? My first thought was to use some procedural language and make all the decisions from the ground up. I could simulate the system and branching and making different choices. Ultimately, some choices are going quickly make it impossible to reach goals later (If I build 20 Supply Depots I'm prob not going to make that factory on time.)
So then I thought weren't functional languages designed for this? I tried to write some prolog but I've been having trouble with stuff like time and distance calculations. And I'm not sure the best way to return the "plan".
I was thinking I could write:
depends_on(factory, barracks)
depends_on(barracks, command_center)
builds_from(marine, barracks)
build_time(command_center, 60)
build_time(barracks, 45)
build_time(factory, 30)
minerals(command_center, 400)
...
build(X) :-
depends_on(X, Y),
build_time(X, T),
minerals(X, M),
...
Here's where I get confused. I'm not sure how to construct this function and a query to get anything even close to what I want. I would have to somehow account for rate at which minerals are gathered during the time spent building and other possible paths with extra gold. If I only want 1 marine in 10 minutes I would want the engine to generate lots of plans because there are lots of ways to end with 1 marine at 10 minutes (maybe cut it off after so many, not sure how you do that in prolog).
I'm looking for advice on how to continue down this path or advice about other options. I haven't been able to find anything more useful than towers of hanoi and ancestry examples for AI so even some good articles explaining how to use prolog to DO REAL THINGS would be amazing. And if I somehow can get these rules set up in a useful way how to I get the "plans" prolog came up with (ways to solve the query) other than writing to stdout like all the towers of hanoi examples do? Or is that the preferred way?
My other question is, my main code is in ruby (and potentially other languages) and the options to communicate with prolog are calling my prolog program from within ruby, accessing a virtual file system from within prolog, or some kind of database structure (unlikely). I'm using SWI-Prolog atm, would I be better off doing this procedurally in Ruby or would constructing this in a functional language like prolog or haskall be worth the extra effort integrating?
I'm sorry if this is unclear, I appreciate any attempt to help, and I'll re-word things that are unclear.
Your question is typical and very common for users of procedural languages who first try Prolog. It is very easy to solve: You need to think in terms of relations between successive states of your world. A state of your world consists for example of the time elapsed, the minerals available, the things you already built etc. Such a state can be easily represented with a Prolog term, and could look for example like time_minerals_buildings(10, 10000, [barracks,factory])). Given such a state, you need to describe what the state's possible successor states look like. For example:
state_successor(State0, State) :-
State0 = time_minerals_buildings(Time0, Minerals0, Buildings0),
Time is Time0 + 1,
can_build_new_building(Buildings0, Building),
building_minerals(Building, MB),
Minerals is Minerals0 - MB,
Minerals >= 0,
State = time_minerals_buildings(Time, Minerals, Building).
I am using the explicit naming convention (State0 -> State) to make clear that we are talking about successive states. You can of course also pull the unifications into the clause head. The example code is purely hypothetical and could look rather different in your final application. In this case, I am describing that the new state's elapsed time is the old state's time + 1, that the new amount of minerals decreases by the amount required to build Building, and that I have a predicate can_build_new_building(Bs, B), which is true when a new building B can be built assuming that the buildings given in Bs are already built. I assume it is a non-deterministic predicate in general, and will yield all possible answers (= new buildings that can be built) on backtracking, and I leave it as an exercise for you to define such a predicate.
Given such a predicate state_successor/2, which relates a state of the world to its direct possible successors, you can easily define a path of states that lead to a desired final state. In its simplest form, it will look similar to the following DCG that describes a list of successive states:
states(State0) -->
( { final_state(State0) } -> []
; [State0],
{ state_successor(State0, State1) },
states(State1)
).
You can then use for example iterative deepening to search for solutions:
?- initial_state(S0), length(Path, _), phrase(states(S0), Path).
Also, you can keep track of states you already considered and avoid re-exploring them etc.
The reason you get confused with the example code you posted is essentially that build/1 does not have enough arguments to describe what you want. You need at least two arguments: One is the current state of the world, and the other is a possible successor to this given state. Given such a relation, everything else you need can be described easily. I hope this answers your question.
Caveat: my Prolog is rusty and shallow, so this may be off base
Perhaps a 'difference engine' approach would be appropriate:
given a goal like 'build factory',
backwards-chaining relations would check for has-barracks and tell you first to build-barracks,
which would check for has-command-center and tell you to build-command-center,
and so on,
accumulating a plan (and costs) along the way
If this is practical, it may be more flexible than a state-based approach... or it may be the same thing wearing a different t-shirt!

Help--100% accuracy with LibSVM?

Nominally a good problem to have, but I'm pretty sure it is because something funny is going on...
As context, I'm working on a problem in the facial expression/recognition space, so getting 100% accuracy seems incredibly implausible (not that it would be plausible in most applications...). I'm guessing there is either some consistent bias in the data set that it making it overly easy for an SVM to pull out the answer, =or=, more likely, I've done something wrong on the SVM side.
I'm looking for suggestions to help understand what is going on--is it me (=my usage of LibSVM)? Or is it the data?
The details:
About ~2500 labeled data vectors/instances (transformed video frames of individuals--<20 individual persons total), binary classification problem. ~900 features/instance. Unbalanced data set at about a 1:4 ratio.
Ran subset.py to separate the data into test (500 instances) and train (remaining).
Ran "svm-train -t 0 ". (Note: apparently no need for '-w1 1 -w-1 4'...)
Ran svm-predict on the test file. Accuracy=100%!
Things tried:
Checked about 10 times over that I'm not training & testing on the same data files, through some inadvertent command-line argument error
re-ran subset.py (even with -s 1) multiple times and did train/test only multiple different data sets (in case I randomly upon the most magical train/test pa
ran a simple diff-like check to confirm that the test file is not a subset of the training data
svm-scale on the data has no effect on accuracy (accuracy=100%). (Although the number of support vectors does drop from nSV=127, bSV=64 to nBSV=72, bSV=0.)
((weird)) using the default RBF kernel (vice linear -- i.e., removing '-t 0') results in accuracy going to garbage(?!)
(sanity check) running svm-predict using a model trained on a scaled data set against an unscaled data set results in accuracy = 80% (i.e., it always guesses the dominant class). This is strictly a sanity check to make sure that somehow svm-predict is nominally acting right on my machine.
Tentative conclusion?:
Something with the data is wacked--somehow, within the data set, there is a subtle, experimenter-driven effect that the SVM is picking up on.
(This doesn't, on first pass, explain why the RBF kernel gives garbage results, however.)
Would greatly appreciate any suggestions on a) how to fix my usage of LibSVM (if that is actually the problem) or b) determine what subtle experimenter-bias in the data LibSVM is picking up on.
Two other ideas:
Make sure you're not training and testing on the same data. This sounds kind of dumb, but in computer vision applications you should take care that: make sure you're not repeating data (say two frames of the same video fall on different folds), you're not training and testing on the same individual, etc. It is more subtle than it sounds.
Make sure you search for gamma and C parameters for the RBF kernel. There are good theoretical (asymptotic) results that justify that a linear classifier is just a degenerate RBF classifier. So you should just look for a good (C, gamma) pair.
Notwithstanding that the devil is in the details, here are three simple tests you could try:
Quickie (~2 minutes): Run the data through a decision tree algorithm. This is available in Matlab via classregtree, or you can load into R and use rpart. This could tell you if one or just a few features happen to give a perfect separation.
Not-so-quickie (~10-60 minutes, depending on your infrastructure): Iteratively split the features (i.e. from 900 to 2 sets of 450), train, and test. If one of the subsets gives you perfect classification, split it again. It would take fewer than 10 such splits to find out where the problem variables are. If it happens to "break" with many variables remaining (or even in the first split), select a different random subset of features, shave off fewer variables at a time, etc. It can't possibly need all 900 to split the data.
Deeper analysis (minutes to several hours): try permutations of labels. If you can permute all of them and still get perfect separation, you have some problem in your train/test setup. If you select increasingly larger subsets to permute (or, if going in the other direction, to leave static), you can see where you begin to lose separability. Alternatively, consider decreasing your training set size and if you get separability even with a very small training set, then something is weird.
Method #1 is fast & should be insightful. There are some other methods I could recommend, but #1 and #2 are easy and it would be odd if they don't give any insights.

Resources