Say I need to find the Euclidean distance from one (x, y) coordinate to every coordinate in an array of a million coordinates, and then select the coordinate with the smallest distance.
At present I loop through the million-element array, calculating each distance and keeping track of the minimum. Is there a way I could do it differently and faster?
Thanks
You can improve your algorithm significantly by using a more sophisticated data structure, for instance a k-d tree. Still, if all you need is a single nearest-neighbour search, you cannot do better than iterating over all points.
What you can do, though, is optimize the function that computes the distance. Also (as mentioned in the comments), you can omit the square root, since comparing the squares of two non-negative numbers gives the same result as comparing the values themselves.
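A minimal sketch of that linear scan in Python, comparing squared distances and assuming the points are (x, y) tuples:

```python
def nearest_point(target, points):
    """Linear scan keeping the minimum squared distance (no sqrt needed)."""
    tx, ty = target
    best, best_d2 = None, float("inf")
    for x, y in points:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        if d2 < best_d2:
            best, best_d2 = (x, y), d2
    return best
```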
What I understand from the question is that you want to find the closest pair of points. There is a well-known algorithm for the closest pair of points problem that solves this.
Closest Pair of a set of points:
Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
Let d be the minimum of the two minimal distances.
Eliminate points that lie farther than d from l.
Sort the remaining points according to their y-coordinates
Scan the remaining points in y order and compute the distance from each point to its next five neighbors.
If any of these distances is less than d then update d.
The whole Closest Pair algorithm then takes O(log n · n log n) = O(n log² n) time.
We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by requiring that the recursive calls in Step 1 return their points in sorted order by y-coordinate. This yields two sorted lists of points, which need only be merged (a linear-time operation) in Step 4 to produce a complete sorted list. Hence the revised algorithm involves making the following changes:
Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate.
Step 4: Merge the two sorted lists into one sorted list in O(n) time.
Since the work at each level of the recursion is now dominated by linear-time steps, this yields an O(n log n) algorithm for finding the closest pair of a set of points in the plane.
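For reference, here is a rough Python sketch of the divide-and-conquer scheme above, including the merge-by-y refinement; it assumes at least two distinct points and returns only the minimum distance (tracking the actual pair is a small extension):

```python
import math

def closest_pair_distance(points):
    """Divide-and-conquer closest pair; returns the minimum pairwise distance."""
    def rec(p):                      # p is sorted by x; returns (min dist, p sorted by y)
        n = len(p)
        if n <= 3:                   # base case: brute force
            d = min(math.dist(a, b) for i, a in enumerate(p) for b in p[i + 1:])
            return d, sorted(p, key=lambda q: q[1])
        mid = n // 2
        mid_x = p[mid][0]            # dividing line l
        dl, left = rec(p[:mid])      # recurse on each half
        dr, right = rec(p[mid:])
        d = min(dl, dr)
        # merge the two y-sorted halves in linear time (the refinement above)
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i][1] <= right[j][1]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        # keep only points within d of the dividing line, scan them in y order
        strip = [q for q in merged if abs(q[0] - mid_x) < d]
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:   # only a constant number of neighbours matter
                d = min(d, math.dist(a, b))
        return d, merged
    return rec(sorted(points))[0]
```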
You could save quite a chunk of time by first checking whether the squared x-offset and the squared y-offset are each smaller than the best squared distance stored so far. Only if both are do you carry on and compute the full squared distance. Of course, how much time you save depends on how the points are distributed.
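A sketch of that early-rejection idea layered onto the same linear scan (squared offsets compared against the best squared distance so far):

```python
def nearest_point_with_rejection(target, points):
    """Linear scan that skips a point as soon as one squared offset alone
    already exceeds the best squared distance found so far."""
    tx, ty = target
    best, best_d2 = None, float("inf")
    for x, y in points:
        dx2 = (x - tx) ** 2
        if dx2 >= best_d2:
            continue                 # the x-offset alone rules this point out
        dy2 = (y - ty) ** 2
        if dy2 >= best_d2:
            continue                 # the y-offset alone rules this point out
        d2 = dx2 + dy2
        if d2 < best_d2:
            best, best_d2 = (x, y), d2
    return best
```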
I'm designing a game in Scratch. The game is supposed to have a spaceship travel through space, avoiding asteroids. These asteroids start at a fixed X position on the right side of the screen and move horizontally to the left until they hit a fixed X position, where they disappear. The asteroids spawn in groups of 2-6 (a randomly generated number), and each group is about 1 second apart.
Assuming the game throws out up to 6 asteroids at once, I want to make sure each asteroid is distant from the next. I tried using two variables and comparing the distance, but this did not work. I can put the group's Y spawning positions into a list. So say, for instance, my list contains:
0, 100, 5, 30, -20
As you can see, there are two items in that list that are close together. What I'm trying to do is prevent this, so the third item would be something else, like -50, and if a sixth item is generated, ensure it's also distant from the rest.
Can someone pseudocode how to achieve something like this? It doesn't matter what programming language it's in; I can probably translate the general idea into Scratch.
There is a way to do this without a trial-and-error loop.
Instead of picking random positions, pick random distances, then scale them to fit the screen.
Roughly, the approach is as follows.
The lists below represent the distances between neighboring asteroids (ordered by Y coordinate), as well as distances between the outermost asteroids and the edges of the screen.
For example, if a group contains 6 asteroids, then you need lists of 7 elements each.
Create a list L1 of minimal distances. Obviously, these are all fixed values.
Create a list L2 of random numbers. Take them from some arbitrary, fixed range with a positive lower bound, e.g. [1..100].
Calculate the total 'slack' = height of screen minus sum(L1).
Calculate a multiplication factor = slack divided by sum(L2).
Multiply every element of L2 with the multiplication factor.
Add every value from L1 to the value in L2 at the same index.
L2 now contains a list of distances that:
obey the minimal distances specified in L1
together equal the height of the screen
The final step is to position every asteroid relative to its neighbor, based on the distances in L2.
Note: if step 3 gives a negative number, then obviously there is not enough room on screen for all asteroids. What's worse, a naive 'trial-and-error' algorithm would then result in an infinite loop. The solution is of course to fix your parameters; you cannot fit 6 asteroids in 360 pixels with a minimal distance of 100.
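A rough Python sketch of this approach; the function name, the uniform [1..100] range for L2, and the bottom-up Y convention are illustrative assumptions:

```python
import random

def asteroid_y_positions(count, screen_height, min_gap):
    """Pick `count` Y positions with at least `min_gap` between neighbours
    and between the outermost asteroids and the screen edges.
    Positions are measured upward from the bottom edge (0)."""
    slots = count + 1                                    # gaps, including the two edges
    l1 = [min_gap] * slots                               # step 1: minimal distances
    l2 = [random.uniform(1, 100) for _ in range(slots)]  # step 2: random numbers
    slack = screen_height - sum(l1)                      # step 3
    if slack < 0:
        raise ValueError("not enough room for this many asteroids")
    factor = slack / sum(l2)                             # step 4
    gaps = [a + b * factor for a, b in zip(l1, l2)]      # steps 5 and 6
    positions, y = [], 0.0
    for gap in gaps[:-1]:                                # place each asteroid relative to its neighbour
        y += gap
        positions.append(y)
    return positions
```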
To do this, you need to go through each previous entry in the list, compare that value to the new value, and if any element is too close, change the value. This process repeats until a suitable number is found. If the difference is less than some minimum distance, a variable tooClose is set to yes and the value is re-rolled. At the beginning of the outer loop tooClose is set to yes, so that at least one random number is generated. Then, inside the loop, the value is randomized and tooClose is set to no; I then loop through all the previous entries with an index i, comparing each element and setting tooClose to yes if it is too close. The comparison between two numbers is done with a subtraction followed by an absolute value, which gives the difference between them as a positive number.
Here is a screenshot of the code:
And here is the project:
https://scratch.mit.edu/projects/408196031/
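For readers without Scratch handy, here is a rough Python sketch of the same trial-and-error loop; the minimum distance and the Y range are placeholders rather than the project's actual values, and (as noted in the other answer) the loop can run forever if no valid value exists:

```python
import random

def pick_distant_y(existing, min_distance=50, low=-180, high=180):
    """Re-roll a random Y until it is at least `min_distance` away from
    every Y already in `existing`."""
    too_close = True
    while too_close:
        value = random.randint(low, high)
        too_close = False
        for other in existing:
            if abs(value - other) < min_distance:   # |difference| comparison
                too_close = True
                break
    return value

# build a group of, say, 5 spawn positions
group = []
for _ in range(5):
    group.append(pick_distant_y(group))
```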
I am currently working on an implementation of a Local Search algorithm over an array of complex numbers in C. The points over which I run my algorithm are the following:
As you can see it on the figure, the numbers are spread in a circular shape over the complex plane.
Now, the first implementation of my algorithm takes a random point in this space and finds its neighbours by computing the modulus (distance) to all the other points in the complex plane, iterating over a 1D array of complex numbers.
Once the neighbours have been found, the cost function is evaluated on each of them; only the neighbour that gives the best value is kept, and the process repeats until a maximum of the cost function is reached. This method is not really a local search, because I could just as well compute the cost function directly on all the points instead of having to find the neighbours first.
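A rough Python sketch of that loop, using Python complex numbers; the cost function and the neighbourhood radius are placeholders, since neither is specified here:

```python
import random

def local_search(points, cost, radius):
    """Start at a random point; repeatedly move to the best-scoring
    neighbour (every point within `radius` by modulus of the difference)
    until no neighbour improves the cost."""
    current = random.choice(points)
    while True:
        neighbours = [z for z in points
                      if z != current and abs(z - current) <= radius]
        if not neighbours:
            return current
        best = max(neighbours, key=cost)
        if cost(best) <= cost(current):
            return current           # local maximum reached
        current = best
```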
As you understand, the current way of doing it is not efficient at all and therefore I thought of moving to a 2D implementation.
The problem is that I don't know how to sort my complex numbers into a 2D array in which I could look up the neighbours directly instead of running through all the points.
Suppose you have an array of integers (e.g. [1 5 3 4 6]). The elements are rearranged according to the following rule: every element can hop forward (towards the left), and the elements it hops over slide one index to the right. The process starts with the element at the second index (i.e. 5). It has a choice to hop over element 1 or to stay in its own position. If it does choose to hop, element 1 slides down to index 2. Let us assume it does choose to hop; our resultant array is then [5 1 3 4 6]. Element 3 can now hop over 1 or 2 positions and the process repeats. If 3 hops over one position the array becomes [5 3 1 4 6], and if it hops over two positions it becomes [3 5 1 4 6].
It is very easy to show that every permutation of the elements can be reached in this way. Also, any final configuration can be reached by a unique sequence of hops.
The question is, given a final array and a source array, find the total number of hops required to arrive at the final array from the source. An O(N²) implementation is easy to find; however, I believe this can be done in O(N) or O(N log N). If it is not possible to do better than O(N²), it would also be great to know.
For example if the final array is [3,5,1,4,6] and the source array [1,5,3,4,6], the answer will be 3.
My O(N²) algorithm is like this: loop over the positions of the source array from the end, since we know the last element is the last to move. Here it is 6, and we check its position in the final array. We calculate the number of hops necessary and then rearrange the final array to put that element back in its original position from the source array. The rearranging step goes over all the elements in the array, and the process loops over all the elements, hence O(N²). Using a HashMap or map can help with the searching, but the map needs to be updated after every step, which keeps it at O(N²).
P.S. I am trying to model the correlation between two permutations in a Bayesian way, and this is a sub-problem of that. Any ideas on modifying the generative process to make the problem simpler would also be helpful.
Edit: I have found my answer. This is exactly what the Kendall tau distance measures. There is an easy merge-sort-based algorithm to compute it in O(N log N).
Consider the target array as defining an ordering. A target array [2 5 4 1 3] can be seen as [1 2 3 4 5], just by relabeling. You only have to know the mapping to be able to compare elements in constant time. In this instance, to compare 4 and 5 you check: index[4] = 2 > index[5] = 1 (in the target array), and so 4 > 5 (meaning: 4 must be to the right of 5 at the end).
So what you really have is just a vanilla sorting problem. The ordering is just different from the usual numerical ordering. The only thing that changes is your comparison function. The rest is basically the same. Sorting can be achieved in O(nlgn), or even O(n) (radix sort). That said, you have some additional constraints: you must sort in-place, and you can only swap two adjacent elements.
A strong and simple candidate would be selection sort, which will do just that in O(n^2) time. On each iteration, you identify the "leftiest" remaining element in the "unplaced" portion and swap it until it lands at the end of the "placed" portion. It can improve to O(nlgn) with the use of an appropriate data structure (priority queue for identifying the "leftiest" remaining element in O(lgn) time). Since nlgn is a lower bound for comparison based sorting, I really don't think you can do better than that.
Edit: So you're not interested in the sequence of swaps at all, only the minimum number of swaps required. This is exactly the number of inversions in the array (adapted to your particular needs: "non natural ordering" comparison function, but it doesn't change the maths). See this answer for a proof of that assertion.
One way to find the number of inversions is to adapt the merge sort algorithm. Since you have to actually sort the array to compute it, it still turns out to be O(n lg n) time. For an implementation, see this answer or this (again, remember that you'll have to adapt).
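A sketch of that adaptation in Python: relabel the source by each element's position in the final array, then count inversions during the merge:

```python
def count_inversions(source, final):
    """Kendall tau distance between two permutations of the same elements."""
    rank = {v: i for i, v in enumerate(final)}   # the "non natural ordering"
    seq = [rank[v] for v in source]              # relabeled source

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = sort_count(a[:mid])
        right, inv_r = sort_count(a[mid:])
        merged, inv = [], inv_l + inv_r
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i             # everything left in `left` is greater
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]

# count_inversions([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]) == 3
```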
From your answer, I assume the number of hops is the total number of adjacent-element swaps needed to transform the original array into the final array.
I suggest using something like insertion sort, but without the insertion part; the data in the arrays will not be altered.
You can implement the queue t of stalled hoppers as a balanced binary search tree with counters (the number of elements in each subtree).
You can add an element to t, remove an element from t, rebalance t, and find an element's position in t in O(log C) time, where C is the number of elements in t.
A few words on finding an element's position: it is a binary search that accumulates the counts of the skipped left subtrees (plus 1 for each passed node, if you keep elements in internal nodes).
A few words on balancing/insertion/removal: you have to traverse upward from the removed/added node and update the counters along the way. The overall number of operations still stays at O(log C) for insert+rebalance and remove+rebalance.
Let t be that (balanced-search-tree) queue, p the current index into the original array, q the current index into the final array, a the original array, and f the final array.
Now we have one loop starting from the left side (say, p = 0, q = 0):
If a[p] == f[q], then original array element hops over the whole queue. Add t.count to the answer, increment p, increment q.
If a[p] != f[q] and f[q] is not in t, then insert a[p] into t and increment p.
If a[p] != f[q] and f[q] is in t, then add f[q]'s position in queue to answer, remove f[q] from t and increment q.
I like the magic that ensures this process moves p and q to the ends of their arrays at the same time if the arrays really are permutations of one another. Nevertheless, you should probably check p and q for overflow to detect incorrect data, as there is no faster way to prove the data is correct.
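To illustrate the loop in the last three bullets, here is a simplified Python sketch that keeps the queue as a plain list in insertion order; each membership test, position lookup and removal is then O(C) rather than the O(log C) a balanced tree with counters gives, but the counting logic is the same:

```python
def count_hops(source, final):
    """Count the hops needed to turn source into final (assumes both are
    valid permutations of the same elements)."""
    t = []                       # queue of stalled hoppers, in insertion order
    hops = 0
    p = q = 0
    n = len(source)
    while q < n:
        if p < n and source[p] == final[q]:
            hops += len(t)       # hops over the whole queue
            p += 1
            q += 1
        elif p < n and final[q] not in t:
            t.append(source[p])  # stall this element
            p += 1
        else:
            i = t.index(final[q])
            hops += i            # hops over the elements queued before it
            del t[i]
            q += 1
    return hops

# count_hops([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]) == 3
```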
This question already has answers here: O(n) algorithm to find the median of n² implicit numbers (3 answers). Closed 7 years ago.
Is there a way to find the median of an unsorted array:
1- without sorting it.
2- without using the selection algorithm or the median of medians?
I found a lot of other questions similar to mine, but the solutions, most if not all of them, discussed the SelectProblem and the MedianOfMedians.
You can certainly find the median of an array without sorting it. What is not easy is doing that efficiently.
For example, you could just iterate over the elements of the array; for each element, count the number of elements less than and equal to it, until you find a value with the correct count. That will be O(n²) time but only O(1) space.
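A sketch of that counting approach (it returns the lower median for even-length input):

```python
def median_by_counting(a):
    """O(n^2) time, O(1) extra space median by counting smaller/equal elements."""
    n = len(a)
    target = (n - 1) // 2                 # rank of the (lower) median
    for x in a:
        less = sum(1 for y in a if y < x)
        equal = sum(1 for y in a if y == x)
        if less <= target < less + equal:
            return x
```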
Or you could use a min-heap whose size is just over half the size of the array. (That is, if the array has 2k or 2k+1 elements, the heap should have k+1 elements.) Build the heap from the first k+1 array elements, using the standard heap-building algorithm (which is O(N)). Then, for each remaining element x, if x is greater than the heap's minimum, replace the minimum element with x and sift it down (which is O(log N)). At the end, the median is either the heap's minimum element (if the original array's size was odd) or the average of the two smallest elements in the heap. So that's a total of O(n log n) time, and O(n) space if you cannot rearrange array elements. (If you can rearrange them, you can do this in place.)
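A sketch of the heap approach using Python's heapq; replacing the minimum and restoring the heap is done by heapreplace, and for even length it averages the two smallest elements left in the heap:

```python
import heapq

def median_via_heap(a):
    """Median using a min-heap holding the largest half of the elements."""
    n = len(a)
    k = n // 2                       # heap will hold the k+1 largest elements
    heap = list(a[:k + 1])
    heapq.heapify(heap)              # O(k) heap construction
    for x in a[k + 1:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the min, push x: O(log k)
    if n % 2 == 1:
        return heap[0]
    two_smallest = heapq.nsmallest(2, heap)
    return (two_smallest[0] + two_smallest[1]) / 2
```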
There is a randomized algorithm able to accomplish this task in O(n) steps (average case scenario), but it does involve sorting some subsets of the array. And, because of its random nature, there is no guarantee it will actually ever finish (though this unfortunate event should happen with vanishing probability).
I will leave the main idea here. For a more detailed description and for the proof of why this algorithm works, check here.
Let A be your array and let n = |A|. Let's assume all elements of A are distinct. The algorithm goes like this:
Randomly select t = n^(3/4) elements from A.
Let T be the "set" of the selected elements.Sort T.
Set pl = T[t/2-sqrt(n)] and pr = T[t/2+sqrt(n)].
Iterate through the elements of A and determine how many elements are less than pl (denoted by l) and how many are greater than pr (denoted by r). If l > n/2 or r > n/2, go back to step 1.
Let M be the set of elements in A in between pl and pr. M can be determined in step 4, just in case we reach step 5. If the size of M is no more than 4t, sort M. Otherwise, go back to step 1.
Return m = M[n/2-l] as the median element.
The main idea behind the algorithm is to obtain two elements (pl and pr) that enclose the median element (i.e. pl < m < pr) and are very close to each other in the ordered version of the array, without actually sorting the array. With high probability, all six steps need to execute only once (i.e. you will get pl and pr with these "good" properties from the first and only pass through steps 1-5, so there is no going back to step 1). Once you find two such elements, you can simply sort the elements in between them and find the median element of A.
Steps 2 and 5 do involve some sorting (which might be against the "rules" you've mysteriously established :p). If sorting a sub-array is on the table, you should use a sorting method that runs in O(s log s) steps, where s is the size of the array you are sorting. Since T and M are significantly smaller than A, these sorting steps take "less than" O(n) steps. If it is also against the rules to sort a sub-array, then consider that in both cases the sorting is not really needed: you only need a way to determine pl, pr and m, which is just another selection problem (with the respective indices). While sorting T and M does accomplish this, you could use any other selection method (perhaps something rici suggested earlier).
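A rough Python sketch of steps 1-6, assuming a large array of distinct values; it returns the element of rank ⌊n/2⌋ (0-indexed), and the bound on r is tightened slightly so that this index stays valid for even n:

```python
import math
import random

def randomized_median(a):
    """Randomized median via sampling (sketch of the six steps above)."""
    n = len(a)
    t = int(n ** 0.75)
    s = int(math.sqrt(n))
    while True:
        sample = sorted(random.choices(a, k=t))        # steps 1-2: sample and sort T
        pl = sample[max(t // 2 - s, 0)]                 # step 3
        pr = sample[min(t // 2 + s, t - 1)]
        l = sum(1 for x in a if x < pl)                 # step 4: count below pl
        r = sum(1 for x in a if x > pr)                 #         and above pr
        if l > n // 2 or r > (n - 1) // 2:
            continue                                    # unlucky sample: back to step 1
        m = sorted(x for x in a if pl <= x <= pr)       # step 5
        if len(m) > 4 * t:
            continue                                    # back to step 1
        return m[n // 2 - l]                            # step 6
```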
A non-destructive routine selip() is described at http://www.aip.de/groups/soe/local/numres/bookfpdf/f8-5.pdf. It makes multiple passes through the data, at each stage making a random choice of items within the current range of values and then counting the number of items to establish the ranks of the random selection.
By googling for a few minutes, I found the basic idea.
Let A,B,and C be sorted arrays containing n elements.
Pick median in each array and call them medA, medB, and medC.
Without loss of generality, suppose that medA > medB > medC.
The elements bigger than medA in array A cannot become the median of three arrays. Likewise, the elements smaller than medC in array C cannot, so such elements will be ignored.
Repeat steps 2-4 recursively.
My question is, what is the base case?
Trying many candidate base cases, I tested the algorithm by hand for hours, but I was not able to find a correct one.
Also, the lengths of the three arrays become different at every recursive step. Does step 4 work even if the lengths of the three arrays are different?
This algorithm works for two sorted arrays of the same size, but not for three. After one iteration, you eliminate half of the elements in A and C but leave B unchanged, so the numbers of elements in these arrays are no longer the same and the method no longer applies. For arrays of different sizes, if you apply the same method, you will be removing different numbers of elements from the lower half and the upper half, so the median of the remaining elements is not the same as the median of the original arrays.
That being said, you can modify the algorithm to eliminate the same number of elements at both ends in each iteration; this can be inefficient when some of the arrays are very small and some are very large. You can also turn this into the problem of finding the k-th element: track the number of elements being thrown away and change the value of k at each iteration. Either way, this is much trickier than the two-array situation.
There is another post talking about a general case: Median of 5 sorted arrays
I think you can use the selection algorithm, slightly modified to handle more arrays.
You're looking for the median, which is the p=[n/2]th element.
Pick the median of the largest array and find, for that value, the splitting point in the other two arrays (binary search, O(log n)). Now you know the selected number is the k-th overall (k = sum of the positions).
If k > p, discard the elements above it in the 3 arrays; if k < p, discard the elements below it (discarding can be implemented by maintaining lower and upper indexes for each array separately). If it was smaller, also update p = p - k.
Repeat until k=p.
Oops, I think this is log(n)^2, let me think about it...