Find linearity in a graph - c

I am doing automation for a project, and the results I get are in the form of a graph from which I take the performance figures.
The performance results I take generally lie along a straight line in the graph.
For example, let's say the results from the graph, collected in a list, could look like this:
10, 30, 90, 100, 150, 200, 250, 300, 350, 400, 450, 800, 1000, 1500, 2000, 2010, 2006, 2004, 2000, 1900, 1800, 1700, 1600, 1000, 500, 400, 0.
As you can see, the performance of the device starts increasing, then at a certain point it stays roughly level, and as failures occur it starts dropping.
The part I want to extract is that flat, linear stretch.
In the list of numbers you can see that around (2000, 2010, 2006, 2004, 2000) there is some kind of linear segment.
I am not asking for code or a complete algorithm to solve this; I do not need a full answer. If anyone can just give me a hint or a small clue, I will try to do the rest.

Do you mean constant or linear?
If you mean linear:
Why not take the differences of adjacent values and search for a sequence that stays close to constant?
If you mean constant:
Why not take the differences of adjacent values and search for a sequence that stays close to 0?
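For example, a minimal Python sketch of the difference-based idea (the tolerance value here is an arbitrary placeholder; for the constant case you would instead require each difference to stay close to 0):

    def longest_linear_run(values, tol):
        """Return the longest stretch of `values` whose adjacent differences
        stay within `tol` of the first difference in the stretch."""
        diffs = [b - a for a, b in zip(values, values[1:])]
        best_start, best_len = 0, 1          # best run seen so far, in `diffs`
        start = 0
        for i in range(1, len(diffs)):
            if abs(diffs[i] - diffs[start]) > tol:
                start = i                    # slope changed too much: new run
            if i - start + 1 > best_len:
                best_start, best_len = start, i - start + 1
        return values[best_start : best_start + best_len + 1]

    readings = [10, 30, 90, 100, 150, 200, 250, 300, 350, 400, 450, 800, 1000,
                1500, 2000, 2010, 2006, 2004, 2000, 1900, 1800, 1700, 1600,
                1000, 500, 400, 0]
    print(longest_linear_run(readings, tol=15))   # longest stretch with a roughly constant slope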

First decide on the absolute or relative tolerance you can handle; that decides what counts as a straight line.
Then iterate through the array, comparing each point's value with the next point's. If they are within tolerance, keep iterating until you hit a point that is not, and store those points: they represent a straight line.
This solution is very simple, not perfect and takes O(n) time.
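A minimal Python version of that scan, here collecting flat (constant) runs; the tolerance is a placeholder you would tune to your data:

    def plateau_runs(values, tol):
        """Return (start, end) index pairs of maximal runs where every value
        stays within `tol` of the run's first value."""
        runs, start = [], 0
        for i in range(1, len(values)):
            if abs(values[i] - values[start]) > tol:
                if i - start > 1:
                    runs.append((start, i - 1))   # close the current run
                start = i
        if len(values) - start > 1:
            runs.append((start, len(values) - 1))
        return runs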

Related

What is the fastest way to check threefold repetition of a chess position?

I am writing a chess AI as a project. If a position repeats 3 times, it is a draw. I could create an array with all previous positions, then for every updated position iterate over every previous one and see whether the same position already appears twice in the array, but this seems like a lot of work for the computer and it will make calculating moves for the AI hard. Is there any better way to do this?
I would suggest using Zobrist Hashing, which is designed to handle this kind of situation. Simply store a list of hash values for each position as you go along. You could also use a Bloom Filter.
It does not matter so much if you get some false positives if you keep track of the actual board configurations as well, so if you do get a collision, you can then quickly check if you have come across the current position before; this should not happen very often if you use sufficiently large hash values.
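A rough Python sketch of the Zobrist idea; the piece symbols and the dictionary board representation are illustrative assumptions on my part, and a real implementation would also fold in castling rights and the en-passant square:

    import random

    random.seed(0)                                 # fixed seed so the table is reproducible
    PIECES = "PNBRQKpnbrqk"                        # one symbol per piece type and colour
    # one random 64-bit number per (piece, square) pair, plus one for the side to move
    ZOBRIST = {(p, sq): random.getrandbits(64) for p in PIECES for sq in range(64)}
    SIDE_TO_MOVE = random.getrandbits(64)

    def zobrist_hash(board, white_to_move):
        """`board` is assumed to be a dict mapping square index -> piece symbol."""
        h = 0
        for sq, piece in board.items():
            h ^= ZOBRIST[(piece, sq)]
        if white_to_move:
            h ^= SIDE_TO_MOVE
        return h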
As Oliver mentioned in his answer, you should use something like Zobrist hashing; each position then has an (almost) unique number. Zobrist hashing is hard to implement, so you need to make sure your implementation is bug-free, which is easier said than done. You then store this value in a list or similar that you loop over backwards to find how many times the same position has occurred.
To make the lookup faster, you only need to look at every other entry, since you are checking positions with the same colour to move. You can also break out of the loop immediately if you find a pawn move or a capture, since after such a move the position can never be repeated again.
You can look at Chessprogramming - Repetitions for more inspiration, especially the section "List of Keys".
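A sketch of that backward loop, assuming the history is kept as (hash, irreversible) pairs where `irreversible` marks pawn moves and captures (these names are mine, not a standard API):

    def is_threefold(history, current_hash):
        """`history` is assumed to hold (hash, irreversible) pairs for the positions
        before the current one, oldest first; `irreversible` is True when the move
        that produced that position was a pawn move or a capture."""
        count = 1                                    # the current position itself
        for steps_back, (h, irreversible) in enumerate(reversed(history), start=1):
            if steps_back % 2 == 0 and h == current_hash:
                count += 1                           # same side to move, same hash
                if count >= 3:
                    return True
            if irreversible:
                break        # nothing before a pawn move or capture can repeat
        return False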

Fastest way to find minimum distance of one point to points on a curve

I'm looking for a fast solution for the following problem:
I have a fixed point (let's say the upper right on the white measurement line) and need to find the closest point on a curve made of equally spaced points (the lower curve). Additionally, I do this for every point on the upper curve to draw the distances between the curves with different colours (three levels: below minimum [red], between minimum and maximum [orange] and above maximum [green]).
My current solution is a tradeoff: I take the fixed point, iterate through an arbitrary interval (e.g. 50 units to the left and right of the fixed point) and calculate the distance to each point in it. This saves some CPU power, but it is neither elegant nor accurate, since I could miss a minimum distance outside my chosen interval.
Any proposals for a faster algorithm?
Edit: Equally spaced means all points have the same distance on the x-axis, this is true for both curves. Also I do not need to interpolate between the points, this would be too time consuming.
Rather than an arbitrary distance, you could perhaps iterate until "out of range".
In your example, suppose you start with the point on the upper curve at the top-right of your line. Dropping vertically downwards, you get a distance of (by my eye) about 200um.
Now you can move right from here testing points until the horizontal distance is 200um. Beyond that, it's impossible to get a distance less than 200um.
Moving left, the distance goes down until you find the 150um minimum, then starts rising again. Once you're 150um to the left of your upper point, again, it's impossible to beat the minimum you've found.
If you'd gone left first, you wouldn't have had to go so far right, so as an optimization either follow the direction in which the distance falls, or else work out from the middle in both directions at once.
I don't know how many um 50 units is, so this might be slower or faster than what you have. It does avoid the risk of missing a lower value, though.
Since you're doing lots of tests against the same set of points on the lower curve, you can probably improve on this by ignoring the fact that the points form a curve at all. Stick them all in a k-d tree or similar, and search that repeatedly. This is called a nearest neighbour search.
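A quick sketch of that approach, assuming SciPy is available (the example curves are made up for illustration):

    import numpy as np
    from scipy.spatial import cKDTree

    # lower_curve and upper_curve are assumed to be (N, 2) arrays of x/y points
    xs = np.arange(100.0)
    lower_curve = np.column_stack([xs, np.sin(xs / 10.0)])
    upper_curve = np.column_stack([xs, 1.5 + np.sin(xs / 10.0)])

    tree = cKDTree(lower_curve)                   # build once, reuse for every query
    distances, indices = tree.query(upper_curve)  # nearest lower point for every upper point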
It may help to identify this problem as a nearest neighbour search problem. That link includes a good discussion about the various algorithms that are used for this. If you are OK with using C++ rather than straight C, ANN looks like a good library for this.
It also looks as though this question has been asked before.
We can label the top curve y=t(x) and the bottom curve y=b(x). Label the closest-function x_b=c(x_t). We know that the closest-function is weakly monotone non-decreasing as two shortest paths never cross each other.
If you know that the distance function d(x_t,x_b) has only one local minimum for every fixed x_t (this happens if the curve is "smooth enough"), then you can save time by "walking" the curve:
- start with x_t=0, x_b=0
- while x_t <= x_max
-- find the closest x_b by local search
(increment x_b while the distance is decreasing)
-- add {x_t, x_b} to the result set
-- increment x_t
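In Python, the walk above might look roughly like this (assuming both curves are lists of (x, y) points ordered by x, and that the single-local-minimum assumption holds):

    import math

    def walk_closest(top, bottom):
        """For each point on `top`, find the closest point on `bottom`.
        Relies on the closest index never moving left as x_t increases."""
        result = []
        j = 0
        for p in top:
            # advance along the bottom curve while the distance keeps decreasing
            while j + 1 < len(bottom) and math.dist(p, bottom[j + 1]) < math.dist(p, bottom[j]):
                j += 1
            result.append((p, bottom[j]))
        return result

Because j is never reset, the whole pass over both curves is linear in the total number of points.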
If you expect b(x) to be smooth but cannot assume it, and you want an exact result:
Walk the curve in both directions. Where the results agree, they are correct. Where they disagree, run a complete search between the two results (the leftmost and the rightmost local minima). Sample the "ambiguous block" in an order (binary subdivision) that allows the most pruning due to the monotonicity.
As a middle ground:
Walk the curve in both directions. If the results disagree, choose the better of the two. If you can guarantee at most two local minima for each fixed x_t, this produces the optimal solution. There are still some pathological cases where the optimal solution is not found: those containing a local minimum flanked by two other local minima that are both worse than it. I dare say it is uncommon to find a case where the solution is far from optimal (assuming a smooth y=b(x)).

Implementing a basic predator-prey simulation

I am trying to implement a predator-prey simulation, but I am running into a problem.
A predator searches for nearby prey and eats it. If there is no nearby prey, it moves to a random vacant cell.
Basically, the part I am having trouble with is when I advance a "generation."
Say I have a grid that is 3x3, with each cell numbered from 0 to 8.
If I have 2 predators in cells 0 and 1, predator 0 is checked first and moves to either cell 3 or 4.
If it goes to cell 3, for example, the simulation then goes on to check predator 1. This may seem correct,
but it kind of "gives priority" to the organisms with lower index values. I've tried using 2 arrays, but that doesn't seem to work either, as it checks places where organisms appear to be but actually aren't.
Anyone have an idea of how to do this "fairly" and "correctly?"
I recently did a similar task in Java. Processing the predators from the top row to the bottom not only gives an "unfair advantage" to lower indices but also creates patterns in the movement of both the prey and the predators.
I overcame this problem by visiting the rows and columns in random order. This way, every predator/prey has the same chance of being processed early in a generation.
One way to randomize would be to create a linked list of (row, column) pairs, then shuffle the linked list. At each generation, choose a random index to start from and keep processing.
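A minimal sketch of that shuffling (in Python for brevity; `update_cell` is a hypothetical per-cell update routine standing in for your own logic):

    import random

    ROWS, COLS = 3, 3
    CELLS = [(r, c) for r in range(ROWS) for c in range(COLS)]

    def process_generation(grid):
        """Visit every cell once per generation in a freshly shuffled order,
        so no cell is systematically favoured."""
        order = CELLS[:]               # copy so the master list stays intact
        random.shuffle(order)
        for r, c in order:
            update_cell(grid, r, c)    # hypothetical: apply your movement/eating rules here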
More as a comment than anything else: if your prey are so dense that this is a common problem, I suspect you don't have a population that will live long. Also, as a comment, update your predators randomly. That is, instead of stepping through your array of locations, take your list of predators, randomize it, and then update them one by one. I think this is necessary, but I don't know if it is sufficient.
This problem is solved with a technique called double buffering, which is also used in computer graphics (in order to prevent the image currently being drawn from disturbing the image currently being displayed on the screen). Use two arrays. The first one holds the current state, and you make all decisions about movement based on the first array, but you perform the movement in the other array. Then, you swap their roles.
Edit: Looks like I didn't read your question thoroughly enough. Double buffering and randomization might both be needed, depending on how complex your rules are (but if there are no rules other than the ones you've described, randomization should suffice). They solve two distinct problems, though:
Double buffering solves the problem of correctness when you have rules where decisions about what will happen to a creature in a cell depends on the contents of neighbouring cells, and the decisions about neighbouring cells also depend on this cell. If you e.g. have a rule that says that if two predators are adjacent, they will both move away from each other, you need double buffering. Otherwise, after you've moved the first predator, the second one won't see any adjacent predator and will remain in place.
Randomization solves the problem of fairness when there are limited resources, such as when a prey only can be eaten by one predator (which seems to be the problem that concerned you).
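A bare-bones sketch of double buffering (`decide` is a hypothetical function standing in for your movement and eating rules):

    def step(current):
        """One generation with double buffering: read every decision from
        `current`, write every result into `nxt`, then hand `nxt` back.
        Grids are assumed to be lists of lists of cell states."""
        rows, cols = len(current), len(current[0])
        nxt = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                nxt[r][c] = decide(current, r, c)   # hypothetical rule function
        return nxt          # caller swaps the buffers: grid = step(grid)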
How about some sort of round-robin method? Put your predators in a circular linked list and keep a pointer to the node that's currently "first". Then advance that pointer to the next place in the list each generation. You could insert new predators at either the front or the back of your circular list with ease.

How do you solve the 15-puzzle with A-Star or Dijkstra's Algorithm?

I've read in one of my AI books that popular path-finding algorithms in simulations and games (A-Star, Dijkstra) are also used to solve the well-known "15-puzzle".
Can anyone give me some pointers on how I would reduce the 15-puzzle to a graph of nodes and edges so that I could apply one of these algorithms?
If I were to treat each node in the graph as a game state then wouldn't that tree become quite large? Or is that just the way to do it?
A good heuristic for A-Star with the 15 puzzle is the number of squares that are in the wrong location. Because you need at least 1 move per square that is out of place, the number of squares out of place is guaranteed to be less than or equal to the number of moves required to solve the puzzle, making it an appropriate heuristic for A-Star.
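That heuristic is tiny to compute; a sketch, assuming states are stored as flat tuples with 0 for the blank:

    def misplaced_tiles(state, goal):
        """Number of tiles out of place, ignoring the blank (0).
        For the 15-puzzle, states are flat tuples of 16 values."""
        return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)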
A quick Google search turns up a couple papers that cover this in some detail: one on Parallel Combinatorial Search, and one on External-Memory Graph Search
General rule of thumb when it comes to algorithmic problems: someone has likely done it before you, and published their findings.
Here is an assignment for the 8-puzzle problem that discusses the A* algorithm in some detail, but is also fairly straightforward:
http://www.cs.princeton.edu/courses/archive/spring09/cos226/assignments/8puzzle.html
The graph-theoretic way to solve the problem is to imagine every configuration of the board as a vertex of the graph and then use a breadth-first search with pruning, based on something like the Manhattan distance of the board, to derive a shortest path from the starting configuration to the solution.
One problem with this approach is that for any n x n board where n > 3 the game space becomes so large that it is not clear how you can efficiently mark the visited vertices. In other words, there is no obvious way to tell whether the current configuration of the board is identical to one that was previously discovered through traversing some other path. Another problem is that the graph size grows so quickly with n (it's approximately (n^2)!) that it is just not suitable for a brute-force attack, as the number of paths becomes computationally infeasible to traverse.
This paper by Ian Parberry, A Real-Time Algorithm for the (n^2 − 1)-Puzzle, describes a simple greedy algorithm that iteratively arrives at a solution by completing the first row, then the first column, then the second row... It arrives at a solution almost immediately; however, the solution is far from optimal. Essentially it solves the problem the way a human would, without leveraging any computational muscle.
This problem is closely related to that of solving the Rubik's cube. The graph of all game states is too large to solve by brute force, but there is a fairly simple 7-step method that a dexterous human can use to solve any cube in about 1 to 2 minutes. This path is of course non-optimal. By learning to recognise patterns that define sequences of moves, the time can be brought down to 17 seconds. However, this feat by Jiri is somewhat superhuman!
The method Parberry describes moves only one tile at a time; one imagines that the algorithm could be improved by employing Jiri's dexterity and moving multiple tiles at a time. This would not, as Parberry proves, reduce the path length from n^3, but it would reduce the coefficient of the leading term.
Remember that A* will search through the problem space proceeding down the most likely path to the goal as defined by your heuristic.
Only in the worst case will it end up having to flood-fill the entire problem space; this tends to happen when there is no actual solution to your problem.
Just use the game tree. Remember that a tree is a special form of graph.
In your case, the children of each node will be the game positions reached by making one of the moves available at the current node.
Here you go http://www.heyes-jones.com/astar.html
Also, be mindful that with the A-Star algorithm, at least, you will need to figure out an admissible heuristic to determine whether a possible next step is closer to the finished route than another step.
From my own experience with solving an 8-puzzle:
you need to create nodes and keep track of each step taken,
compute the Manhattan distance for each of the possible next steps, and take the one with the shortest distance,
then update the nodes and continue until you reach the goal.
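To make the above concrete, here is a compact A* sketch for the 3x3 (8-puzzle) case using the Manhattan-distance heuristic; the 15-puzzle version only changes the board dimensions, and the flat-tuple state encoding is my own choice:

    import heapq

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 is the blank; board stored row-major

    def manhattan(state):
        """Sum of horizontal and vertical distances of every tile from its goal cell."""
        total = 0
        for idx, tile in enumerate(state):
            if tile == 0:
                continue
            goal_idx = tile - 1
            total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
        return total

    def neighbours(state):
        """All states reachable by sliding one tile into the blank."""
        blank = state.index(0)
        r, c = divmod(blank, 3)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 3 and 0 <= nc < 3:
                swap = nr * 3 + nc
                new = list(state)
                new[blank], new[swap] = new[swap], new[blank]
                yield tuple(new)

    def astar(start):
        """Return the length of a shortest solution (an unsolvable start never reaches GOAL)."""
        frontier = [(manhattan(start), 0, start)]   # entries are (f = g + h, g, state)
        best_g = {start: 0}
        while frontier:
            f, g, state = heapq.heappop(frontier)
            if state == GOAL:
                return g
            if g > best_g.get(state, float("inf")):
                continue                            # stale queue entry, skip it
            for nxt in neighbours(state):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + manhattan(nxt), ng, nxt))
        return None

    print(astar((1, 2, 3, 4, 5, 6, 0, 7, 8)))       # a two-move example: prints 2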

Similarity between line strings

I have a number of tracks recorded by a GPS, which more formally can be described as a number of line strings.
Now, some of the recorded tracks might be recordings of the same route, but because of inaccuracies in the GPS system, the fact that the recordings were made on separate occasions, and the fact that they might have been recorded while travelling at different speeds, they won't match up perfectly. Still, they look close enough when viewed on a map by a human to tell that it is actually the same route that has been recorded.
I want to find an algorithm that calculates the similarity between two line strings. I have come up with some home-grown methods to do this, but would like to know whether this is a problem that already has good algorithms to solve it.
How would you calculate the similarity, given that similar means represents the same path on a map?
Edit: For those unsure of what I'm talking about, please look at this link for a definition of what a line string is: http://msdn.microsoft.com/en-us/library/bb895372.aspx - I'm not asking about character strings.
Compute the Fréchet distance on each pair of tracks. The distance can be used to gauge the similarity of your tracks.
Math alert: Fréchet was a pioneer in the field of metric spaces, which is relevant to your problem.
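A straightforward sketch of the discrete Fréchet distance (a common approximation of the continuous one), assuming tracks are lists of (x, y) points:

    import math

    def discrete_frechet(p, q):
        """Discrete Fréchet distance between two polylines given as lists of (x, y) points."""
        n, m = len(p), len(q)
        ca = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                d = math.dist(p[i], q[j])
                if i == 0 and j == 0:
                    ca[i][j] = d
                elif i == 0:
                    ca[i][j] = max(ca[i][j - 1], d)
                elif j == 0:
                    ca[i][j] = max(ca[i - 1][j], d)
                else:
                    ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1]), d)
        return ca[n - 1][m - 1]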
I would add a buffer around the first line based on the estimated probable error, and then determine if the second line fits entirely within the buffer.
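A sketch of the buffer idea using the Shapely library (Shapely, the example coordinates, and the tolerance value are my own assumptions, not part of the original suggestion):

    from shapely.geometry import LineString

    track_a = LineString([(0, 0), (5, 1), (10, 0)])
    track_b = LineString([(0, 0.3), (5, 1.2), (10, -0.2)])

    tolerance = 0.5                # estimated GPS error, in the same units as the coordinates
    same_route = track_a.buffer(tolerance).contains(track_b)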
To determine "same route," create the minimal set of normalized path vectors, calculate the total power differences and compare the total to a quality measure.
Normalize the GPS waypoints on total path length,
walk the vectors of the paths together, creating a new set of path vectors for each path based upon the shortest vector at each waypoint,
calculate the total power differences between endpoints of each vector in the normalized paths weighting for vector length, and
compare against a quality measure.
Tune the power of the differences (start with, say, squared differences) and the quality measure (say as a percent of the total power differences) visually. This algorithm produces a continuous quality measure of the path match as well as a binary result (Are the paths the same?)
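One rough interpretation of those steps in Python: resample both paths to the same number of points by normalized arc length, then sum the squared differences between corresponding points (the resampling scheme and the squared power are assumptions to be tuned as described above):

    import math

    def resample(points, n):
        """Resample a polyline (>= 2 points) to `n` points equally spaced along its length."""
        cum = [0.0]                                  # cumulative arc length at each point
        for a, b in zip(points, points[1:]):
            cum.append(cum[-1] + math.dist(a, b))
        total = cum[-1]
        out, j = [], 0
        for k in range(n):
            target = total * k / (n - 1)
            while j < len(cum) - 2 and cum[j + 1] < target:
                j += 1
            seg = cum[j + 1] - cum[j]
            t = 0.0 if seg == 0 else (target - cum[j]) / seg
            out.append((points[j][0] + t * (points[j + 1][0] - points[j][0]),
                        points[j][1] + t * (points[j + 1][1] - points[j][1])))
        return out

    def path_difference(a, b, n=100):
        """Total squared distance between corresponding resampled points."""
        ra, rb = resample(a, n), resample(b, n)
        return sum(math.dist(p, q) ** 2 for p, q in zip(ra, rb))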
Paul Tomblin said: "I would add a buffer around the first line based on the estimated probable error, and then determine if the second line fits entirely within the buffer."
You could modify the algorithm at the point where the normalized vector endpoints are compared. You could determine whether any endpoint difference is above a certain size (implementing Paul's buffer idea), or perhaps, if the endpoints fall outside the "buffer", use that fact to ignore that endpoint difference, allowing a comparison that ignores side trips.
You could walk along each point (Pa) of LineString A and measure the distance from Pa to the nearest line-segment of LineString B, averaging each of these distances.
This is not a quick or perfect method, but it should give you a useful number and is pretty quick to implement.
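A small sketch of that averaging, assuming line strings are lists of (x, y) tuples:

    import math

    def point_segment_distance(p, a, b):
        """Distance from point p to the segment a-b."""
        (ax, ay), (bx, by), (px, py) = a, b, p
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.dist(p, a)
        # projection of p onto the segment, clamped to [0, 1]
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
        return math.dist(p, (ax + t * dx, ay + t * dy))

    def mean_distance(line_a, line_b):
        """Average, over every point of line_a, of its distance to the nearest segment of line_b."""
        segments = list(zip(line_b, line_b[1:]))
        return sum(min(point_segment_distance(p, a, b) for a, b in segments)
                   for p in line_a) / len(line_a)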
Do the line strings start and finish at similar points, or are they of very different extents?
If you consider a single line string to be a sequence of [x,y] points (or [x,y,z] points), then you could compute the similarity between each pair of line strings using the Needleman-Wunsch algorithm. As described in the referenced Wikipedia article, the Needleman-Wunsch algorithm requires a "similarity matrix" which defines the distance between a pair of points. However, it would be easy to use a function instead of a matrix. In your case you could simply use the 2D Euclidean distance function (or a 3D Euclidean function if your points have elevation) to provide the distance between each pair of points.
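A sketch of Needleman-Wunsch adapted to point sequences, using Euclidean distance as the substitution cost and a constant gap penalty (the penalty value is an assumption you would tune for your data):

    import math

    def needleman_wunsch_cost(a, b, gap_penalty=1.0):
        """Global alignment cost of two point sequences: matching a pair of points
        costs their Euclidean distance, skipping a point costs `gap_penalty`;
        lower totals mean more similar line strings."""
        n, m = len(a), len(b)
        dp = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap_penalty
        for j in range(1, m + 1):
            dp[0][j] = j * gap_penalty
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                dp[i][j] = min(dp[i - 1][j - 1] + math.dist(a[i - 1], b[j - 1]),
                               dp[i - 1][j] + gap_penalty,
                               dp[i][j - 1] + gap_penalty)
        return dp[n][m]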
I actually side with the person (Aaron F) who said that you might be interested in the Levenshtein distance problem (and cited this). His answer seems to me to be the best so far.
More specifically, Levenshtein distance (also called edit distance), does not measure strictly the character-by-character distance, but also allows you to perform insertions and deletions. The best algorithm for this distance measure can be computed in quadratic time (pretty slow if your strings are long), but the computational biologists have pretty good heuristics for this, that might be of interest to you on their own. Check out BLAST and FASTA.
In your problem, it seems that you are dealing with differences between strings of numbers, and you care about the numbers. If you give more information, I might be able to direct you to the right variant of BLAST/FASTA/etc for your purposes. In any case, you might consider adapting BLAST and FASTA for your needs. They're quite simple.
1: http://en.wikipedia.org/wiki/Levenshtein_distance, http://www.nist.gov/dads/HTML/Levenshtein.html
