I am currently working on an implementation of a local search algorithm over an array of complex numbers in C. The points over which I run my algorithm are the following:
As you can see in the figure, the numbers are spread in a circular shape over the complex plane.
Now, the first implementation of my algorithm takes a random point in this space and tries to find its neighbours by computing the modulus of the difference to all the other points, running through a 1D array of complex numbers.
Once the neighbours have been found, the cost function is evaluated on each of them; only the neighbour that gives the best value is kept, and the process is repeated until the maximum of the cost function has been reached. This method is not really a local search, because I could directly compute the cost function on all the points instead of having to find the neighbours first.
As you can see, the current way of doing it is not efficient at all, so I thought of moving to a 2D implementation.
The problem is that I don't know how I could sort my complex numbers into a 2D array in which I could look up the neighbours directly instead of running through all the points.
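One common approach is to bucket the points into a uniform 2D grid over their bounding box, so that a neighbour lookup only touches the 3x3 block of cells around a point. Below is a minimal sketch under assumptions: the grid resolution G, the bounding half-width R, and the fixed per-cell capacity MAX_PER_CELL are all made up for illustration.

```c
#include <complex.h>
#include <stdio.h>

/* Sketch: bucket complex points into a G x G uniform grid over the
   square [-R, R] x [-R, R]; neighbours of a point are then found by
   scanning only its own cell and the 8 adjacent cells. */
#define G 32            /* grid resolution (assumed) */
#define MAX_PER_CELL 64 /* assumed per-cell capacity */

typedef struct {
    double complex pts[MAX_PER_CELL];
    int count;
} Cell;

static Cell grid[G][G];

/* Map a complex number to grid indices; R bounds |Re| and |Im|. */
static void cell_of(double complex z, double R, int *i, int *j) {
    *i = (int)((creal(z) + R) / (2 * R) * G);
    *j = (int)((cimag(z) + R) / (2 * R) * G);
    if (*i < 0) *i = 0; if (*i >= G) *i = G - 1;
    if (*j < 0) *j = 0; if (*j >= G) *j = G - 1;
}

static void insert_point(double complex z, double R) {
    int i, j;
    cell_of(z, R, &i, &j);
    if (grid[i][j].count < MAX_PER_CELL)
        grid[i][j].pts[grid[i][j].count++] = z;
}

/* Visit every candidate neighbour of z: the 3x3 block of cells around it. */
static void for_each_neighbour(double complex z, double R,
                               void (*visit)(double complex)) {
    int ci, cj;
    cell_of(z, R, &ci, &cj);
    for (int i = ci - 1; i <= ci + 1; i++)
        for (int j = cj - 1; j <= cj + 1; j++) {
            if (i < 0 || i >= G || j < 0 || j >= G) continue;
            for (int k = 0; k < grid[i][j].count; k++)
                visit(grid[i][j].pts[k]);
        }
}

static void print_pt(double complex z) {
    printf("(%.2f, %.2f)\n", creal(z), cimag(z));
}

int main(void) {
    double R = 1.0;
    insert_point(0.1 + 0.2 * I, R);
    insert_point(0.12 + 0.18 * I, R);
    insert_point(-0.9 - 0.9 * I, R);
    for_each_neighbour(0.1 + 0.2 * I, R, print_pt); /* the two close points */
    return 0;
}
```

The cell size should be on the order of the neighbourhood radius, so that all true neighbours of a point fall within the adjacent cells.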
I'm struggling with one question on my assignment regarding bottom-up merge sort.
Bottom-up merge sort divides an array into sub-arrays of size two and sorts their members, then combines (merges) every two consecutive sub-arrays into a set of sub-arrays of size 4, and so on, until there are two arrays of size n/2, which merge into a completely sorted array.
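For reference, here is a minimal C sketch of the procedure as described, assuming (as in the assignment) that n is a power of two:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) into tmp, then copy back. */
static void merge(int *a, int *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Bottom-up merge sort: the pass with run width 2^(i-1) (pass i)
   merges adjacent runs pairwise, so after pass i there are n / 2^i
   sorted runs of length 2^i each. */
static void bottom_up_mergesort(int *a, int n) {
    int *tmp = malloc(n * sizeof *tmp);
    for (int width = 1; width < n; width *= 2)
        for (int lo = 0; lo < n - width; lo += 2 * width) {
            int hi = lo + 2 * width;
            if (hi > n) hi = n;
            merge(a, tmp, lo, lo + width, hi);
        }
    free(tmp);
}

int main(void) {
    int a[] = {5, 3, 8, 1, 9, 2, 7, 4}; /* n = 8, a power of two */
    bottom_up_mergesort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```

Note how after pass i there are n/2^i sorted runs, which is exactly the quantity tracked in the induction attempt below.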
I completely understand the algorithm but I'm having trouble with proving it formally using induction.
I'm supposed to prove its correctness under the assumption that n is a power of 2.
Then I'm asked to calculate and prove its running time, also by induction.
My current progress consists of proving that at each iteration i the number of sorted sub-arrays is n/2^i, but I'm not getting anywhere with that, maybe because I'm looking at it the wrong way.
Any guidance on how to prove this using induction?
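Not a full answer, but one way to phrase the invariant so the induction has something to grab onto (a sketch only; the lemma that merging two sorted runs yields a sorted run still has to be proven separately):

```latex
% Sketch of one possible induction (not a full proof):
\noindent\textbf{Invariant} $P(i)$: after pass $i$, the array is a
concatenation of $n/2^{i}$ sorted runs, each of length $2^{i}$.

\medskip\noindent\textbf{Base case.} $P(0)$ holds: every element on its
own is a sorted run of length $2^{0}=1$.

\medskip\noindent\textbf{Inductive step.} Assume $P(i)$. Pass $i+1$
merges the $n/2^{i}$ sorted runs of length $2^{i}$ in adjacent pairs;
merging two sorted runs yields one sorted run of length $2^{i+1}$, so
$P(i+1)$ holds.

\medskip\noindent\textbf{Conclusion.} $P(\log_{2} n)$ states the array
is one sorted run of length $n$, i.e.\ fully sorted. For the running
time: each pass does $O(n)$ merge work and there are $\log_{2} n$
passes, so $T(n) = O(n \log n)$.
```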
I have an array of ~1000 objects that are float values which evolve over time (in a manner which cannot be predetermined; assume it is a black box). At every fixed time interval, I want to set a threshold value that separates the top 5-15% of values, making the cut wherever a distinction can be made most "naturally," in the sense that there are the largest gaps between data points in the array.
What is the best way for me to implement such an algorithm? Obviously (I think) the first step to take at the end of each time interval is to sort the array, but then after that I am not sure what the most efficient way to resolve this problem is. I have a feeling that it is not necessary to tabulate all of the gaps between consecutive data points in the region of interest in the sorted array, and that there is a much faster way than brute-force to solve this, but I am not sure what it is. Any ideas?
You could write your own quicksort/select routine that doesn't issue recursive calls for subarrays lying entirely outside of the 5%-15%ile range. For only 1,000 items, though, I'm not sure if it would be worth the trouble.
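A sketch of that pruned quicksort idea (hypothetical code: Lomuto partitioning, ascending order, so the candidate cut points for the top 5-15% sit at ranks between 0.85n and 0.95n):

```c
#include <stdio.h>
#include <stdlib.h>

static void swap(float *a, float *b) { float t = *a; *a = *b; *b = t; }

/* Quicksort that finalizes only ranks in [lo_rank, hi_rank]; subarrays
   lying entirely outside that range are left unsorted (pruned). */
static void partial_qsort(float *v, int lo, int hi, int lo_rank, int hi_rank) {
    if (lo >= hi || hi < lo_rank || lo > hi_rank) return;
    float pivot = v[hi];                 /* Lomuto partition */
    int i = lo;
    for (int j = lo; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);
    partial_qsort(v, lo, i - 1, lo_rank, hi_rank);
    partial_qsort(v, i + 1, hi, lo_rank, hi_rank);
}

int main(void) {
    int n = 1000;
    float *v = malloc(n * sizeof *v);
    for (int i = 0; i < n; i++) v[i] = (float)rand() / RAND_MAX;

    /* ascending order: a cut at rank c keeps the top n - c values */
    int hi_rank = n - (int)(0.05 * n);     /* cut here -> top 5%  */
    int lo_rank = n - (int)(0.15 * n) - 1; /* cut above here -> top 15% */
    partial_qsort(v, 0, n - 1, lo_rank, hi_rank);

    /* pick the cut with the largest gap between consecutive values */
    int cut = lo_rank + 1;
    for (int c = lo_rank + 1; c <= hi_rank; c++)
        if (v[c] - v[c - 1] > v[cut] - v[cut - 1]) cut = c;
    printf("threshold between %f and %f (top %d values)\n",
           v[cut - 1], v[cut], n - cut);
    free(v);
    return 0;
}
```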
Another possibility would be to use fancy data structures to track the largest gaps online as the values evolve (e.g., a binary search tree decorated with subtree counts (for fast indexing) and largest subtree gaps). It's definitely not clear if this would be worth the trouble.
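For concreteness, a sketch of what such a decorated node might look like: only the struct and the bottom-up field update are shown; insertion and rebalancing are omitted.

```c
#include <stdio.h>

/* BST node augmented for the online-gap idea: each node stores its
   subtree size (for rank queries) and the largest gap between
   consecutive values anywhere in its subtree. */
typedef struct Node {
    float value;
    int size;               /* number of nodes in this subtree */
    float min, max;         /* extreme values in this subtree */
    float max_gap;          /* largest consecutive-value gap in subtree */
    struct Node *left, *right;
} Node;

/* Recompute the augmented fields from the children; call on the way
   back up after every insertion, deletion, or rotation. */
static void pull(Node *n) {
    n->size = 1;
    n->min = n->max = n->value;
    n->max_gap = 0;
    if (n->left) {
        n->size += n->left->size;
        n->min = n->left->min;
        float boundary = n->value - n->left->max; /* gap at the seam */
        if (boundary > n->max_gap) n->max_gap = boundary;
        if (n->left->max_gap > n->max_gap) n->max_gap = n->left->max_gap;
    }
    if (n->right) {
        n->size += n->right->size;
        n->max = n->right->max;
        float boundary = n->right->min - n->value;
        if (boundary > n->max_gap) n->max_gap = boundary;
        if (n->right->max_gap > n->max_gap) n->max_gap = n->right->max_gap;
    }
}

int main(void) {
    /* tiny hand-built tree: 2.0 <- 5.0 -> 9.5 */
    Node l = {2.0f, 0, 0, 0, 0, NULL, NULL};
    Node r = {9.5f, 0, 0, 0, 0, NULL, NULL};
    Node root = {5.0f, 0, 0, 0, 0, &l, &r};
    pull(&l); pull(&r); pull(&root);
    printf("size=%d max_gap=%.1f\n", root.size, root.max_gap); /* 3, 4.5 */
    return 0;
}
```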
Say I need to find the Euclidean distance from one (x,y) coordinate to every coordinate in an array of a million coordinates, and then select the coordinate with the smallest distance.
At present I loop through the million-element array, calculating the distance and keeping track of the minimum. Is there a way I could do it differently and faster?
Thanks
You can improve your algorithm significantly by using a more complex data structure, for instance a k-d tree. Still, if all you expect to do is a single nearest-neighbour search, you cannot possibly do better than iterating over all points.
What you can do, though, is optimize the function that computes the distance. Also (as mentioned in the comments), you may omit the square root, since comparing the squares of two non-negative numbers is the same as comparing the values themselves.
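A minimal sketch of that square-root-free scan (hypothetical names throughout):

```c
#include <stdio.h>
#include <float.h>

typedef struct { double x, y; } Point;

/* Linear scan for the point nearest to q, comparing squared
   distances so no sqrt is ever taken. */
static int nearest(const Point *pts, int n, Point q) {
    int best = 0;
    double best_d2 = DBL_MAX;
    for (int i = 0; i < n; i++) {
        double dx = pts[i].x - q.x;
        double dy = pts[i].y - q.y;
        double d2 = dx * dx + dy * dy;  /* squared distance */
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}

int main(void) {
    Point pts[] = {{0, 0}, {3, 4}, {1, 1}, {-2, 0.5}};
    Point q = {0.9, 1.2};
    printf("nearest index: %d\n", nearest(pts, 4, q)); /* index 2 */
    return 0;
}
```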
What I understand from the question is that you want to find the closest pair of points. There is a known algorithm for the closest pair of points problem that solves this.
Closest Pair of a set of points:

1. Divide the set into two equal-sized parts by a line l, and recursively compute the minimal distance in each part.
2. Let d be the minimum of the two minimal distances.
3. Eliminate points that lie farther than d from l.
4. Sort the remaining points according to their y-coordinates.
5. Scan the remaining points in y order and compute the distance from each point to its five neighbors.
6. If any of these distances is less than d, update d.
The whole Closest Pair algorithm takes O(log n · n log n) = O(n log² n) time.
We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by requiring that the recursive calls in Step 1 return the points in sorted order by their y-coordinates. This yields two sorted lists of points, which need only be merged (a linear-time operation) in Step 4 to produce a complete sorted list. Hence the revised algorithm involves making the following changes:
Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate.
Step 4: Merge the two sorted lists into one sorted list in O(n) time.
Hence the sorting in Step 4 is now dominated by the linear-time merge, yielding an O(n log n) algorithm for finding the closest pair of a set of points in the plane.
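For concreteness, here is a compact C sketch of the unoptimized O(n log² n) variant described first (the strip is re-sorted by y in every call rather than merged; it returns the squared distance):

```c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

typedef struct { double x, y; } Point;

static int cmp_x(const void *a, const void *b) {
    double d = ((const Point *)a)->x - ((const Point *)b)->x;
    return (d > 0) - (d < 0);
}
static int cmp_y(const void *a, const void *b) {
    double d = ((const Point *)a)->y - ((const Point *)b)->y;
    return (d > 0) - (d < 0);
}
static double dist2(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}
/* brute force for tiny subproblems */
static double brute(const Point *p, int n) {
    double best = DBL_MAX;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double d = dist2(p[i], p[j]);
            if (d < best) best = d;
        }
    return best;
}
/* closest pair on p[0..n), which must be sorted by x */
static double closest(Point *p, int n) {
    if (n <= 3) return brute(p, n);
    int mid = n / 2;
    double midx = p[mid].x;
    double dl = closest(p, mid);            /* Step 1: recurse on halves */
    double dr = closest(p + mid, n - mid);
    double d = dl < dr ? dl : dr;           /* Step 2: d = min of the two */
    /* Steps 3-4: keep points near the dividing line, sort them by y */
    Point *strip = malloc(n * sizeof *strip);
    int m = 0;
    for (int i = 0; i < n; i++)
        if ((p[i].x - midx) * (p[i].x - midx) < d)
            strip[m++] = p[i];
    qsort(strip, m, sizeof *strip, cmp_y);
    /* Steps 5-6: each point checks only a constant number of successors */
    for (int i = 0; i < m; i++)
        for (int j = i + 1;
             j < m && (strip[j].y - strip[i].y) * (strip[j].y - strip[i].y) < d;
             j++) {
            double dd = dist2(strip[i], strip[j]);
            if (dd < d) d = dd;
        }
    free(strip);
    return d;
}

int main(void) {
    Point pts[] = {{0, 0}, {3, 4}, {1, 1}, {5, 2}, {2, 2}};
    int n = (int)(sizeof pts / sizeof pts[0]);
    qsort(pts, n, sizeof pts[0], cmp_x);
    printf("closest squared distance: %f\n", closest(pts, n)); /* 2.0 */
    return 0;
}
```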
You could save quite a chunk of time by first checking whether the squared differences along x and along y are each smaller than the best squared distance you have stored so far; only if both checks pass do you go on to compute the full squared distance. Of course, the amount of time you save depends on how the points are distributed.
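As a sketch, the body of the nearest() loop from the earlier example could be rewritten like this, comparing squared per-axis differences against the best squared distance so the comparison stays unit-consistent:

```c
/* drop-in replacement for the loop body of nearest() above */
double dx = pts[i].x - q.x;
double dx2 = dx * dx;
if (dx2 >= best_d2) continue;   /* x alone is already too far: skip */
double dy = pts[i].y - q.y;
double dy2 = dy * dy;
if (dy2 >= best_d2) continue;   /* y alone is already too far: skip */
double d2 = dx2 + dy2;
if (d2 < best_d2) { best_d2 = d2; best = i; }
```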
I'm writing a program for a numerical simulation in C. Part of the simulation consists of spatially fixed nodes, each of which has some float value with respect to every other node. It is like a directed graph. However, if two nodes are too far apart (farther than some cut-off length a), this value is 0.
To represent all these "correlations", or float values, I tried to use a 2D array, but since I have 100,000 and more nodes, that would correspond to about 40 GB of memory (100,000² floats at 4 bytes each).
Now I am trying to think of different solutions to this problem. I don't want to save all these values to the hard disk. I also don't want to calculate them on the fly. One idea was some sort of sparse matrix, like the ones one can use in MATLAB.
Do you have any other ideas, how to store these values?
I am new to C, so please don't expect too much experience.
Thanks and best regards,
Jan Oliver
The average number of nodes within the cutoff distance of a given node determines your memory requirement and tells you whether you need to page to disk. The solution taking the least memory is probably a hash table that maps a pair of nodes to a distance. Since the distance is the same each way, you only need to enter it into the hash table once per pair: put the two node numbers in numerical order and then combine them to form a hash key. You could use the POSIX hsearch/hcreate/hdestroy functions for the hash table, although they are less than ideal.
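A sketch of that scheme using the POSIX hash functions (the key format and table size are made up; error handling omitted):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <search.h> /* POSIX hcreate/hsearch/hdestroy */

/* Build a canonical key for the unordered pair (i, j): smaller id
   first, so each pair is stored exactly once. */
static void pair_key(char *buf, size_t len, int i, int j) {
    if (i > j) { int t = i; i = j; j = t; }
    snprintf(buf, len, "%d-%d", i, j);
}

int main(void) {
    hcreate(1 << 20);  /* sized for the expected number of stored pairs */

    /* store value 0.42 for the pair (17, 3) */
    char buf[32];
    pair_key(buf, sizeof buf, 17, 3);
    double *v = malloc(sizeof *v);
    *v = 0.42;
    ENTRY e;
    e.key = strdup(buf);  /* hsearch keeps the key pointer */
    e.data = v;
    hsearch(e, ENTER);

    /* look it up with the node ids in the other order */
    pair_key(buf, sizeof buf, 3, 17);
    ENTRY q = { buf, NULL };
    ENTRY *found = hsearch(q, FIND);
    printf("value = %f\n", found ? *(double *)found->data : 0.0);

    hdestroy();
    return 0;
}
```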
A sparse matrix approach sounds ideal for this. The Wikipedia article on sparse matrices discusses several approaches to implementation.
A sparse adjacency matrix is one idea, or you could use an adjacency list, allowing you to store only the edges that are closer than your cutoff value.
You could also hold a list for each node containing the other nodes that node is related to. You would then have 2·k list entries overall, where k is the number of non-zero values in the virtual matrix.
Implementing the whole system as a combination of hashes/sets/maps is still expected to be acceptable with regard to speed/performance compared to a "real" matrix allowing random access.
edit: This solution is one possible form of an implementation of a sparse matrix. (See also Jim Balter's note below. Thank you, Jim.)
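For illustration, a minimal C sketch of such per-node lists (names and sizes are made up; error handling omitted):

```c
#include <stdio.h>
#include <stdlib.h>

/* Adjacency-list sketch: each node keeps a linked list of the nodes
   within the cutoff length, with the float value stored on the edge.
   Pairs farther apart than the cutoff are simply never inserted. */
typedef struct Edge {
    int to;             /* index of the other node */
    float value;        /* the "correlation" for this pair */
    struct Edge *next;
} Edge;

#define N_NODES 100000
static Edge *adj[N_NODES]; /* adj[i]: list of neighbours of node i */

static void add_edge(int from, int to, float value) {
    Edge *e = malloc(sizeof *e);
    e->to = to;
    e->value = value;
    e->next = adj[from];
    adj[from] = e;
}

/* Look up the value for (from, to); 0 means "farther than the cutoff". */
static float get_value(int from, int to) {
    for (Edge *e = adj[from]; e; e = e->next)
        if (e->to == to) return e->value;
    return 0.0f;
}

int main(void) {
    add_edge(17, 42, 0.25f);  /* directed: store (42, 17) too if needed */
    printf("%f\n", get_value(17, 42)); /* 0.250000 */
    printf("%f\n", get_value(17, 99)); /* 0.000000 */
    return 0;
}
```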
You should indeed use sparse matrices if possible. In scipy, we have support for sparse matrices, so you can experiment in Python, although to be honest the sparse support still has rough edges.
If you have access to MATLAB, it will definitely be better at the moment.
Without using a sparse matrix, you could think about using memmap-based arrays so that you don't need 40 GB of RAM, but it will still be slow, and it only really makes sense if you have a low degree of sparsity (say, if 10-20% of your 100,000 x 100,000 matrix has items in it, then full arrays will actually be faster and may even take less space than sparse matrices).
I am working on a project which will be using large datasets (both 2D and 3D) which I will be turning into triangles, or tetrahedra, in order to render them.
I will also be performing calculations on these tris/tets. Which tris/tets to use for each calculation depends on the greatest and smallest values of their vertices.
So I need to sort the tris/tets in order of their greatest valued vertex.
--
I have tried quicksort and binary insertion sort. Quicksort so far offers the quickest solution, but it is still quite slow due to the size of the data sets.
I was thinking along the lines of a bucket/map sort when creating the tris/tets in the first place: a bucket for each of the greatest vertex values encountered, holding pointers to all the triangles that have that value as their greatest vertex value.
This approach should be linear in time, but obviously requires more memory. That is not an issue, but my programming language of choice is C, and I'm not entirely sure how I would go about coding such a thing.
So my question to you is: how would you go about arranging the triangles/tets so that you could iterate through them, from the triangle whose greatest vertex value is the greatest in the entire data set, all the way down to the triangle with the smallest greatest vertex value? :)
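One hypothetical way to code the bucket idea in C, assuming the vertex values can be quantized into a fixed number of integer bins over a known range (N_BUCKETS, VMIN, and VMAX are assumptions):

```c
#include <stdio.h>
#include <stdlib.h>

/* Bucket-sort sketch: each triangle goes into the bin matching its
   greatest vertex value, as a linked list of pointers, so building
   the structure is O(n). */
#define N_BUCKETS 1024
#define VMIN 0.0f
#define VMAX 100.0f

typedef struct { float v[3]; /* per-vertex values */ } Tri;

typedef struct BucketEntry {
    Tri *tri;
    struct BucketEntry *next;
} BucketEntry;

static BucketEntry *buckets[N_BUCKETS];

static float tri_max(const Tri *t) {
    float m = t->v[0];
    if (t->v[1] > m) m = t->v[1];
    if (t->v[2] > m) m = t->v[2];
    return m;
}

static void insert_tri(Tri *t) {
    int b = (int)((tri_max(t) - VMIN) / (VMAX - VMIN) * (N_BUCKETS - 1));
    if (b < 0) b = 0;
    if (b >= N_BUCKETS) b = N_BUCKETS - 1;
    BucketEntry *e = malloc(sizeof *e);
    e->tri = t;
    e->next = buckets[b];
    buckets[b] = e;
}

int main(void) {
    Tri tris[] = {{{1, 7, 3}}, {{90, 2, 5}}, {{40, 41, 39}}};
    for (int i = 0; i < 3; i++) insert_tri(&tris[i]);

    /* iterate from the greatest max-vertex value down to the smallest */
    for (int b = N_BUCKETS - 1; b >= 0; b--)
        for (BucketEntry *e = buckets[b]; e; e = e->next)
            printf("triangle with max vertex value %.1f\n", tri_max(e->tri));
    return 0;
}
```

Note that triangles within one bucket come out in arbitrary order; if the exact order matters, the bins must be fine enough, or each bucket can be sorted individually, which stays cheap while buckets are small.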
Can't you just store them in a binary search tree as you generate them? That would keep them in order and make them easily searchable (O(log n) for both insertion and lookup).
You could use a priority queue based on a heap data structure. This also gives you O(log n) insertion and extraction.
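A sketch of that priority queue as a binary max-heap keyed on each triangle's greatest vertex value (hypothetical code; the tri payload pointer stands in for your triangle data):

```c
#include <stdio.h>
#include <stdlib.h>

/* Binary max-heap: push() and pop() are O(log n); popping repeatedly
   yields items from the greatest key down to the smallest. */
typedef struct { float key; void *tri; } Item; /* key = max vertex value */

typedef struct {
    Item *a;
    int n, cap;
} Heap;

static void heap_init(Heap *h, int cap) {
    h->a = malloc(cap * sizeof *h->a);
    h->n = 0;
    h->cap = cap;
}

static void push(Heap *h, Item it) {
    int i = h->n++;
    while (i > 0 && h->a[(i - 1) / 2].key < it.key) { /* sift up */
        h->a[i] = h->a[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->a[i] = it;
}

static Item pop(Heap *h) {
    Item top = h->a[0];
    Item last = h->a[--h->n];
    int i = 0;
    for (;;) {                                        /* sift down */
        int c = 2 * i + 1;
        if (c >= h->n) break;
        if (c + 1 < h->n && h->a[c + 1].key > h->a[c].key) c++;
        if (h->a[c].key <= last.key) break;
        h->a[i] = h->a[c];
        i = c;
    }
    h->a[i] = last;
    return top;
}

int main(void) {
    Heap h;
    heap_init(&h, 8);
    push(&h, (Item){7.0f, NULL});
    push(&h, (Item){90.0f, NULL});
    push(&h, (Item){41.0f, NULL});
    while (h.n > 0) printf("%.1f\n", pop(&h).key); /* 90.0 41.0 7.0 */
    free(h.a);
    return 0;
}
```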