First, I should say this is not homework or anything like that; it is a problem from a game called Freeciv.
OK, in the game we have n cities, usually 8-12; each city can have at most k trade routes, usually 4; and each trade route must span a distance of at least d, usually 8 Manhattan tiles.
The problem consists of finding the k*n trade routes with maximum (or minimum) total distance. Obviously this can be solved with a brute-force algorithm, but that becomes really slow once the player has more than 10 cities, because the program has to make a huge number of iterations. I tried to solve it using graph theory, but I am not an expert in it; I even asked some of my teachers and none of them could explain a smart algorithm to me. So I am not here looking for the exact solution, but for the idea or the steps needed to analyze this.
The problem has two parts:
Calculate pair-wise distances between the cities
Select which pairs should become trade routes
I don't think the first part can be computed faster than O(n·t), where t is the number of tiles, since each run of Dijkstra's algorithm gives you the distances from one city to all the other cities. However, if I understand correctly, the distance between two cities never changes and is symmetric. So whenever a new city is built, you just need to run Dijkstra's algorithm from it and cache the distances.
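As a rough sketch of that caching idea (the neighbours callback and the tile representation are placeholders here, not Freeciv's actual API):

import heapq

def dijkstra_from(start, neighbours):
    """Distances from start to every reachable tile.
    neighbours(tile) is a hypothetical callback yielding (next_tile, move_cost)
    pairs for the game map; tiles are assumed to be comparable, e.g. (x, y) tuples."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, tile = heapq.heappop(heap)
        if d > dist[tile]:
            continue                      # stale queue entry
        for nxt, cost in neighbours(tile):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

# Distances are symmetric and never change, so they can be cached once per
# founded city, e.g. distance_cache[(a, b)] = dijkstra_from(a_tile, neighbours)[b_tile]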
For the second part I would expect a greedy algorithm to work: order all pairs of cities by suitability and, in each step, pick the first pair that does not violate the constraint of k routes per city. I am not sure whether it can be proven correct (the proof, if one exists, should be similar to the one for Kruskal's minimum spanning tree algorithm), but I suspect it will work fine in practice even if you find that it does not work in theory (I haven't tried to either prove or disprove it; that's up to you).
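A minimal sketch of that greedy selection, assuming the pairwise distances are already cached in a dict-of-dicts, that "suitability" is simply distance (sorted descending for the max-distance variant), and with the minimum-distance requirement d from the question applied as a filter:

def pick_trade_routes(cities, distance, k, d_min):
    """Kruskal-style greedy: consider the longest pairs first and skip any pair
    that is too short or that would exceed k routes for either city."""
    pairs = [(distance[a][b], a, b)
             for i, a in enumerate(cities)
             for b in cities[i + 1:]
             if distance[a][b] >= d_min]
    pairs.sort(key=lambda p: p[0], reverse=True)   # descending: longest routes first
    routes, count = [], {c: 0 for c in cities}
    for dist, a, b in pairs:
        if count[a] < k and count[b] < k:
            routes.append((a, b, dist))
            count[a] += 1
            count[b] += 1
    return routes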
Continuing from Jan Hudec's approach:
Init Stage:
Let's say you have N cities (c1, c2, ..., cN). Build a list of connections where each entry has the form (cX, cY, Distance) (with X < Y, so about N^2/2 entries), and order it by distance (ascending if you want maximum-distance routes, so the shortest connections are deleted first, or descending if you want minimum-distance routes). You should also keep an array/list holding the number of connections per city (cZ = W), initialized to N-1 for each city, because at the beginning every city is connected to every other.
Iterations:
Iterate over the list of connections:
For each (cX, cY, D): if the connection counts (in the connection-count array) of both cX and cY are greater than k, delete (cX, cY, D) from the connection list and also decrease the counts of cX and cY by one.
In the end, you'll have the connection list you wanted.
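A rough sketch of that pruning pass, assuming the pairwise distances are already cached; the list is sorted ascending so that, for the max-distance goal, the shortest connections are the ones dropped first:

def prune_connections(cities, distance, k):
    """Start fully connected and delete connections until every city has at most k."""
    connections = [(distance[a][b], a, b)
                   for i, a in enumerate(cities) for b in cities[i + 1:]]
    connections.sort(key=lambda c: c[0])           # ascending; reverse for the min-distance goal
    count = {c: len(cities) - 1 for c in cities}   # every city starts fully connected
    kept = []
    for dist, a, b in connections:
        if count[a] > k and count[b] > k:
            count[a] -= 1                          # delete this connection
            count[b] -= 1
        else:
            kept.append((a, b, dist))
    return kept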
This is more of an algorithm/approach question, and I'm looking for any thoughts or insights on how to tackle it. I'm browsing through a set of programming problems and came across one where I'm required to find the minimum number of moves needed to sort a list of items. Although the problem is marked as 'Easy', I can't find a good solution for it. Your thoughts are welcome.
The problem statement is something like this.
X has N disks of equal radius. Every disk has a distinct number out of 1 to N associated with it. Disks are placed one over another in a single pile in random order. X wants to sort this pile of disks in increasing order, top to bottom. But he has a very special method of doing this: in a single step he can only choose one disk out of the pile and put it at the top. X wants to sort his pile of disks in the minimum possible number of steps. Can you find the minimum number of moves required to sort this pile of randomly ordered disks?
The easy way to solve it, without worrying about the minimum number of moves, would be:
Take the disk with the maximum value and put it on top. Then take the second maximum and put it on top, and so on until all are sorted. But this greedy approach will not always give you the minimum number of steps.
Consider the example [5,4,1,2,3]. With the above greedy approach it goes like this:
[5,4,1,2,3]
[4,1,2,3,5]
[1,2,3,5,4]
[1,2,5,4,3]
[1,5,4,3,2]
[5,4,3,2,1]
That takes 5 moves, but the minimum sequence is:
[5,4,1,2,3]
[5,4,1,3,2]
[5,4,3,2,1]
which takes only 2 moves.
To get the minimum number of moves, first count how many values, starting from N and going down, already appear in descending order within the pile; those are the ones you don't need to move. Everything else has to be moved, and the number of those is the minimum number of moves. For example:
[1,5,2,3,10,4,9,6,8,7]
Here, starting from 10, there are 4 numbers already in descending order, [10,9,8,7]; the rest need to be moved. So the minimum number of moves is 10 - 4 = 6:
[1,5,2,3,10,4,9,6,8,7]
[1,5,2,3,10,4,9,8,7,6]
[1,2,3,10,4,9,8,7,6,5]
[1,2,3,10,9,8,7,6,5,4]
[1,2,10,9,8,7,6,5,4,3]
[1,10,9,8,7,6,5,4,3,2]
[10,9,8,7,6,5,4,3,2,1]
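A minimal sketch of that counting rule in Python, assuming the "top" of the pile is the right end of the list, as in the sequences above:

def min_moves_to_sort(pile):
    """Minimum moves to sort the pile into [N, N-1, ..., 1] (left to right),
    where one move takes a disk and puts it on top (the right end)."""
    expected = len(pile)      # next value we hope to see, scanning left to right
    for value in pile:
        if value == expected:
            expected -= 1     # this disk is already in its relative place
    return expected           # disks 1..expected still have to be moved

print(min_moves_to_sort([5, 4, 1, 2, 3]))                  # 2
print(min_moves_to_sort([1, 5, 2, 3, 10, 4, 9, 6, 8, 7]))  # 6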
Let's assume we have some data structure like an array of n entries, and for argument's sake let's assume that the data has bounded numerical values.
Is there a way to determine the profile of the data, say monotonically ascending, descending, etc., to a reasonable degree, perhaps with a certainty value of z after checking k entries of the data structure?
Assuming we have an array of size N, this means we have N-1 comparisons between adjacent elements in the array. Let M = N-1; M represents the number of relations. The probability of the array not being in the correct order is
1/M
If you select a subset of K relations to determine monotonic ascent or descent, the theoretical probability of certainty is
K / M
Since these are linear relations, it is easy to see that if you want to be 0.9 sure, you will need to check about 90% of the entries.
This only takes into account the assumptions in your question. If you are aware of the probability distribution, then using statistics you could randomly check a small percentage of the array.
If you only care about the array being in rough relative order (for example, on the interval [0,10], most 1s would be close to the beginning), that is another question altogether. An algorithm that checks this, as opposed to just sorting, would have to have a high cost for swapping elements and a cheap cost for comparisons; otherwise there would be no performance payoff from writing a complex algorithm to handle the check.
It is important to note that this is theoretically speaking. I am assuming no distribution in the array.
The easier problem is to check the probability of encountering such orderly behavior from random data.
E.g. if the numbers are arranged randomly, there is probability p = 0.5 that the first number of a pair is lower than the second (we will come to the case of repetitions later). Now, if you sample k pairs and in every case the first number is lower than the second, the probability of observing that by chance is 2^(-k).
Coming back to repetitions, keep track of the observed repetitions and factor them in. E.g. if the probability of a repetition is q, the probability of not observing a repetition is (1-q), and the probability of observing an increasing-or-equal pair is q + (1-q)/2; raise that to the power k to get the probability.
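A small sketch of that test in Python, assuming k adjacent pairs are sampled uniformly at random and the repetition probability q is known (or estimated from the sample):

import random

def chance_of_ascending_evidence(data, k, q=0.0):
    """Sample k adjacent pairs; if any pair is descending, the data is not
    ascending. Otherwise return the probability that purely random data would
    have shown the same k ascending-or-equal pairs: (q + (1 - q) / 2) ** k."""
    idx = random.sample(range(len(data) - 1), k)
    if any(data[i] > data[i + 1] for i in idx):
        return None                    # found a counterexample
    return (q + (1 - q) / 2) ** k      # smaller value = stronger evidence of order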
This is an algorithm question.
Given 1 million points, each with x and y coordinates that are floating-point numbers.
Find the 10 closest points for the given point as fast as possible.
The closeness can be measured as Euclidean distance on a plane or other kind of distance on a globe. I prefer binary search due to the large number of points.
My idea:
Save the points in a database.
1. Multiply x by a large factor, e.g. 10^4, cut off the decimal part, and then multiply the integer part by 10^4 again.
2. Multiply y by a smaller factor, e.g. 10^2, and cut off the decimal part.
3. Sum the results of steps 1 and 2; call this sum the associate_value.
4. Repeat steps 1-3 for every point in the database.
E.g.
x = 12.3456789 , y = 98.7654321
x times 10^4 = 123456.789, cut to 123456, then times 10^4 to get 1234560000
y times 10^2 = 9876.54321, cut to 9876
Sum them, get 1234560000 + 9876 = 1234569876
In this way, I transform the 2-D data into 1-D data. In the database, each point is associated with an integer (its associate_value), and the integer column can be set as an index in the database for fast search.
For a given point (x, y), I perform steps 1-3 for it and then find the points in the database whose associate_value is close to the given point's associate_value.
e.g.
x = 59.469797 , y = 96.4976416
its associate_value is 5946979649
Then in the database, I search for associate_values that are close to 5946979649, for example 5946979649 + 50, 5946979649 - 50, and also 5946979649 + 50000000, 5946979649 - 50000000. This can be done with an index search in the database.
In this way, I can find a group of points that are close to the given point. I can reduce the search space greatly. Then, I can use Euclidean or other distance formula to find the closest points.
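For concreteness, here is the encoding from the worked example as a small function (the 10^4 and 10^2 factors are taken from the numbers above):

def associate_value(x, y):
    """The 1-D key described above: integer part of x*10**4, shifted left by
    four decimal digits, plus the integer part of y*10**2."""
    return int(x * 10**4) * 10**4 + int(y * 10**2)

print(associate_value(12.3456789, 98.7654321))   # 1234569876
print(associate_value(59.469797, 96.4976416))    # 5946979649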
I am not sure about the efficiency of this algorithm, especially the process of generating the associate_values.
Does my idea work or not? Any better ideas?
Thanks
Your idea seems like it may work, but I would be concerned about degenerate cases (like no points falling in your specified ranges, though maybe that's not possible given the constraints). Either way, since you asked for other ideas, here's my stab at it: store all of your points in a quad tree, then just walk down the quad tree until you have a sufficiently small group to search through. Since the points are fixed, building the quad tree is a one-time cost, and each query should be logarithmic in the number of points you have.
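In the same spirit as the quad tree, a ready-made spatial index such as SciPy's k-d tree avoids writing the tree yourself; a small sketch:

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1_000_000, 2) * 100     # stand-in for the real data
tree = cKDTree(points)                          # built once, reused for every query

dist, idx = tree.query([59.469797, 96.4976416], k=10)
print(points[idx])                              # the 10 nearest points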
You can do better by interleaving the bits of the x and y coordinates. Instead of ordering the points along a straight line, this orders them along a Z-curve. You can then compute upper bounds from the most significant bits. The Z-curve is often used in mapping applications: http://msdn.microsoft.com/en-us/library/bb259689.aspx.
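A minimal sketch of such a Z-order key, interleaving the bits of the scaled coordinates (the scale factor and the non-negativity of the coordinates are assumptions):

def z_order_key(x, y, scale=10**4, bits=32):
    """Interleave the bits of the scaled x and y coordinates so that points
    close together in 2-D tend to get close 1-D keys (a Z-curve ordering)."""
    xi, yi = int(x * scale), int(y * scale)     # coordinates assumed non-negative
    key = 0
    for i in range(bits):
        key |= ((xi >> i) & 1) << (2 * i + 1)
        key |= ((yi >> i) & 1) << (2 * i)
    return key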
The way I read your algorithm, you are selecting values along a line with a slope of -1 that are similar to your point, i.e. if your point is (2,2) you would look at points (1,3), (0,4) and (-1,5), and likely miss closer points. Most algorithms that solve this are O(n), which isn't terribly bad.
A simple algorithm to solve this problem is to keep a priority queue of the closest ten points, along with the distance of the furthest of those ten, as you iterate over the set. If the x or y difference alone is already greater than that furthest distance, discard the point immediately. Otherwise calculate the distance with whatever measurement you're using and see if it gets inserted into the queue. If so, update your furthest-of-the-top-ten threshold and continue iterating.
If your points are pre-sorted on one of the axes, you can further optimize the algorithm by starting at the matching point on that axis and radiating outward until the difference is greater than the distance to your tenth-closest point. I did not include sorting in the description above because sorting is O(n log n), which is slower than O(n). If you are doing this multiple times on the same set, it could be beneficial to sort it.
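A sketch of the single-pass scan described above, keeping the ten closest points seen so far in a heap:

import heapq

def ten_closest_scan(points, target, k=10):
    """One O(n) pass keeping the k closest points seen so far. heapq is a
    min-heap, so squared distances are stored negated, making the root the
    current worst (furthest) of the top k."""
    tx, ty = target
    heap = []                                   # entries: (-squared_distance, point)
    for x, y in points:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        if len(heap) < k:
            heapq.heappush(heap, (-d2, (x, y)))
        elif d2 < -heap[0][0]:                  # beats the current 10th closest
            heapq.heapreplace(heap, (-d2, (x, y)))
    return [p for _, p in sorted(heap, reverse=True)]   # closest first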
I want to implement the double-bridge move for the TSP. I know that I have to select 3 random positions, split the permutation into 4 parts, and then reconnect those parts in a different order, but I would like to know all the possible combinations the double bridge can produce for a TSP problem.
Assuming the number of cities is n, will the number of all possible double-bridge combinations be n?
If we divide the whole permutation into roughly four equal parts and reconnect them to find a new solution, then the approximate number of neighborhood solutions is [(n-2)/4]^3 for n cities. Here, [x] denotes the least integer value that is greater than or equal to x.
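For reference, a minimal sketch of one double-bridge move (three random cut points, segments reconnected as A-C-B-D), assuming the tour has at least 4 cities:

import random

def double_bridge(tour):
    """Cut the tour into four segments A|B|C|D at three random positions
    and reconnect them as A|C|B|D."""
    i, j, k = sorted(random.sample(range(1, len(tour)), 3))
    a, b, c, d = tour[:i], tour[i:j], tour[j:k], tour[k:]
    return a + c + b + d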
I have a simple machine learning question:
I have n (~110) elements, and a matrix of all the pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to
Maximize:
Choose 10 different elements.
Return min distance over (all pairings within the 10).
My distance metric is symmetric and respects the triangle inequality.
What kind of algorithm can I use? My first instinct is to do the following:
1. Cluster the n elements into 20 clusters.
2. Replace each cluster with just the element of that cluster that is furthest from the mean element of the original n.
3. Use brute force to solve the problem on the remaining 20 candidates (sketched below). Luckily, 20 choose 10 is only 184,756.
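A sketch of step 3, brute-forcing the max-min objective over the surviving candidates; dist is assumed to be the full pairwise distance matrix indexed by the original element indices:

from itertools import combinations

def best_ten(candidates, dist):
    """Try every 10-element subset of the candidates and keep the one whose
    minimum pairwise distance is largest."""
    best_subset, best_value = None, float("-inf")
    for subset in combinations(candidates, 10):
        value = min(dist[i][j] for i, j in combinations(subset, 2))
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value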
Edit: thanks to etarion's insightful comment, changed "Return sum of (distances)" to "Return min distance" in the optimization problem statement.
Here's how you might approach this combinatorial optimization problem by taking the convex relaxation.
Let D be an upper triangular matrix with your distances on the upper triangle. I.e. where i < j, D_i,j is the distance between elements i and j. (Presumably, you'll have zeros on the diagonal, as well.)
Then your objective is to maximize x'*D*x, where x is binary valued with 10 elements set to 1 and the rest to 0. (Setting the ith entry in x to 1 is analogous to selecting the ith element as one of your 10 elements.)
The "standard" convex optimization thing to do with a combinatorial problem like this is to relax the constraints such that x need not be discrete valued. Doing so gives us the following problem:
maximize y'*D*y
subject to: 0 <= y_i <= 1 for all i, 1'*y = 10
This is (morally) a quadratic program. (If we replace D with D + D', it'll become a bona fide quadratic program and the y you get out should be no different.) You can use an off-the-shelf QP solver, or just plug it in to the convex optimization solver of your choice (e.g. cvx).
The y you get out need not be (and probably won't be) a binary vector, but you can convert the scalar values to discrete ones in a bunch of ways. (The simplest is probably to let x be 1 in the 10 entries where y_i is highest, but you might need to do something a little more complicated.) In any case, y'*D*y with the y you get out does give you an upper bound for the optimal value of x'*D*x, so if the x you construct from y has x'*D*x very close to y'*D*y, you can be pretty happy with your approximation.
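A tiny sketch of the simplest rounding mentioned above, assuming the relaxed solution y comes back as a NumPy array:

import numpy as np

def round_relaxed_solution(y, m=10):
    """Round the relaxed solution y to a 0/1 vector x by keeping its m largest entries."""
    x = np.zeros_like(y)
    x[np.argsort(y)[-m:]] = 1.0
    return x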
Let me know if any of this is unclear, notation or otherwise.
Nice question.
I'm not sure whether it can be solved exactly in an efficient manner, and your clustering-based solution seems reasonable. Another direction to look at would be local search methods such as simulated annealing and hill climbing.
Here's an obvious baseline I would compare any other solution against:
Repeat 100 times:
Greedily select the data point whose removal decreases the objective function the least, and remove it.
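A sketch of that baseline, assuming dist is the n×n distance matrix:

def greedy_drop(dist, n, keep=10):
    """Start with all n elements and repeatedly drop the element whose removal
    hurts the minimum pairwise distance the least (i.e. leaves it largest)."""
    def min_pairwise(s):
        return min(dist[i][j] for i in s for j in s if i < j)

    selected = set(range(n))
    while len(selected) > keep:
        drop = max(selected, key=lambda e: min_pairwise(selected - {e}))
        selected.remove(drop)
    return selected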