I have an array that changes rapidly and has variable length (100 minimum, about 5k maximum). I'm going to use these values to colour a data column that I produce by drawing lines one by one. This will be something like a scan graph.
Another thing is that I have a fixed column length but a variable data array length, so every member of the array should fit into the graph. If the data is shorter than the column length, I should expand the array, which is the easier case and I have done that. But if the data is longer, I have to do something like decimation.
The problem is that I should keep the characteristics of the array during the decimation. When I tried calculating the arithmetic mean of every group of N members, the graph got smoother, which I don't want.
What should I do to fit this array into the graph without changing its characteristics?
This is what the graph looks like: http://imgur.com/KFAzaAQ
I guess you mean that your relevant information is the location of the spikes and their height, so you want to rescale the graph while preserving both.
One way to achieve that is to remove the data points whose values differ least from their neighbours'. You could compute a score for each data point, proportional to the difference between its value and its neighbours' values, and decimate the data points with the smallest scores. A candidate score function is the sum of squared differences.
You then create a score vector as big as your data vector, and for each data value you compute its score like this:
score[i] = square(data[i]-data[i-1]) + square(data[i+1]-data[i]);
Another candidate is
score[i] = abs(data[i]-data[i-1]) + abs(data[i+1]-data[i]);
You also want this decimation to be applied as uniformly as possible over your data so that the graph doesn't become distorted. One way to achieve this is to split your data into buckets and decimate the required number of data points in each bucket.
If you have to remove many data points (N > 2), it may be preferable to do it in multiple passes, where each pass recomputes the scores. That way you don't distort the content of each bucket too much.
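Here is a minimal sketch of one such pass, assuming a plain float array with to_remove < n; the bucket layout and in-place compaction are just one possible arrangement:

#include <math.h>
#include <stdlib.h>

/* Remove `to_remove` points from data[0..n-1], one per bucket: score each
   interior point by its squared differences with its neighbours, drop the
   lowest-scoring point in each bucket, and always keep the endpoints.
   Returns the new length. */
size_t decimate_pass(float *data, size_t n, size_t to_remove)
{
    float *score = malloc(n * sizeof *score);
    size_t i, b, out;

    score[0] = score[n - 1] = INFINITY;   /* endpoints are never removed */
    for (i = 1; i + 1 < n; i++)
        score[i] = (data[i] - data[i - 1]) * (data[i] - data[i - 1])
                 + (data[i + 1] - data[i]) * (data[i + 1] - data[i]);

    /* Flag the lowest-scoring point inside each of `to_remove` buckets,
       so the removals are spread uniformly over the data. */
    for (b = 0; b < to_remove; b++) {
        size_t lo = b * n / to_remove, hi = (b + 1) * n / to_remove;
        size_t min_i = lo;
        for (i = lo; i < hi; i++)
            if (score[i] < score[min_i])
                min_i = i;
        score[min_i] = -1.0f;             /* flagged for removal */
    }

    for (i = 0, out = 0; i < n; i++)      /* compact the survivors */
        if (score[i] >= 0.0f)
            data[out++] = data[i];

    free(score);
    return out;
}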
Related
I have an array of structures containing two float values, x and y.
x is the position on a x-axis in mm (0.0mm to e.g. 2500.0mm)
y is a height measurement in mm at the x-position (0.0mm to e.g. 50.0mm)
With a length of 2500.0mm I will have an array filled with 2501 values (one for each mm). As I want to send this array to a visualization which will draw it on an x/y plot, I want to reduce the array to exactly 500 values (more than 500 values will slow down the communication too much). Now you might say: well okay, then just take every 5th value. But what do I do if my array has 1653 values? I would have to take every 3.306th value. I definitely need the first and the last value.
Is there any elegant algorithm that might help me out?
You could use interpolation to estimate a function that is similar to yours, then choose the desired number of points in the data range and estimate their values. Then you can simply plot these values.
This is elegant and pretty easy to generalize to more or fewer points, as long as you stay within the range of the original data (and do not try to extrapolate outside that range).
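A sketch of what this could look like with simple linear interpolation, assuming the x positions are equally spaced (one per mm, as described), so only the y values need interpolating; the function name and signature are illustrative:

#include <stddef.h>

/* Resample n input values down to m output values (e.g. 1653 -> 500),
   keeping the first and last values exactly. Assumes n >= 2 and m >= 2. */
void resample(const float *in, size_t n, float *out, size_t m)
{
    for (size_t j = 0; j < m; j++) {
        /* Map output index j to a fractional input position t. */
        float t = (float)j * (n - 1) / (m - 1);
        size_t i = (size_t)t;        /* input sample to the left of t */
        float frac = t - (float)i;
        if (i + 1 < n)
            out[j] = in[i] + frac * (in[i + 1] - in[i]);
        else
            out[j] = in[n - 1];      /* j == m-1: keep the last value */
    }
}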
I would like to fill a plane with randomly placed points, check whether any of them overlap (and if they do, move one of them to an empty place) and then calculate the average distance between them. Later I plan on extending that to 3D, so that it is a bit like having particles in a box.
I know there must be better ways of doing it but here's what I came up with. For placing random points in a plane:
int pos[NUMBER][2]; /* An array of NUMBER points, each with an x and y coordinate */
int a, b;
srand(time(NULL));
for (a = 0; a < NUMBER; a++)
    for (b = 0; b < 2; b++)
        pos[a][b] = rand() % 11; /* Using modulus is random enough for now */
The next stage is finding points that overlap:
for (a = 0; a < NUMBER - 1; a++)
    for (b = a + 1; b < NUMBER; b++)
        if (pos[a][0] == pos[b][0] && pos[a][1] == pos[b][1])
            printf("These points overlap: (%d, %d)\n", pos[a][0], pos[a][1]);
Now when I identify which points overlap I have to move one of them, but the point in its new position might overlap with one of the earlier ones. Is there any accepted way of solving this problem? One way is an infinite while(true) loop with a breaking condition, but that seems very inefficient, especially when the system gets dense.
Thank you!
Here's a sketch of a solution that I think could work:
Your point generation algorithm is good and can be left as is.
The correct time to check for overlap is when the point is generated. Simply generate new points until you get one that doesn't overlap with any previous point.
To quickly find overlaps, use a hash table such as the one from glib. The key could be two int32_t values packed into an int64_t union:
typedef union _Point {
    struct {
        int32_t x;
        int32_t y;
    };
    int64_t hashkey;
} Point;
Use the "iterate over all keys" functionality of your hash table to build the output array.
I haven't been able to test this, but it should work. It assumes that the plane is large relative to the number of points, so that overlaps are unlikely. If the opposite is true, you can invert the logic: start with a full plane and punch holes at random.
The average complexity of this algorithm is O(n).
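For illustration, a rough sketch of the generation loop with glib's GHashTable used as a set (assuming glib >= 2.32 for g_hash_table_add/g_hash_table_contains, and C11 anonymous structs for the union above); it builds the output array directly as points are accepted, rather than iterating over the keys afterwards:

#include <glib.h>
#include <stdint.h>
#include <stdlib.h>

/* Generate `count` distinct random points on a `side` x `side` plane. */
Point *generate_points(int count, int side)
{
    GHashTable *seen = g_hash_table_new_full(g_int64_hash, g_int64_equal,
                                             g_free, NULL);
    Point *out = g_new(Point, count);
    int n = 0;
    while (n < count) {
        Point p;
        p.x = rand() % side;
        p.y = rand() % side;
        if (g_hash_table_contains(seen, &p.hashkey))
            continue;                 /* overlap: draw a new point */
        int64_t *key = g_new(int64_t, 1);
        *key = p.hashkey;
        g_hash_table_add(seen, key);  /* remember this position */
        out[n++] = p;
    }
    g_hash_table_destroy(seen);
    return out;
}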
As you hinted that it should work for high densities as well, the best course of action is to create a 2D array of booleans (or bit vectors if you want to save space), where all elements are initially set to false. Then you loop NUMBER times, generating a random coordinate and checking whether the corresponding element in the array is true. If true, you generate another random coordinate. If false, you add the coordinate to the list and set the element in the array to true.
The above assumes you want exactly NUMBER points and a completely uniform chance of placing them. If either of those constraints is not necessary, there are other algorithms possible that use much less memory.
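A minimal sketch of that approach, reusing the question's NUMBER and rand() % 11 grid (SIDE is an assumed name for the plane's width):

#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define SIDE   11
#define NUMBER 20      /* must be <= SIDE * SIDE */

int main(void)
{
    bool occupied[SIDE][SIDE] = { false };
    int pos[NUMBER][2];
    srand(time(NULL));
    for (int n = 0; n < NUMBER; ) {
        int x = rand() % SIDE;
        int y = rand() % SIDE;
        if (occupied[x][y])
            continue;            /* cell already taken: draw again */
        occupied[x][y] = true;   /* mark the cell */
        pos[n][0] = x;
        pos[n][1] = y;
        n++;
    }
    return 0;
}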
One solution is to place points at random, check for overlap, and retry on overlap. To avoid testing every point, you need to set up a spatial index: if you have a 100*100 plane and a cut-off distance of 3-4, you could use 10*10 grid squares. Then you only have to search the nearest few grid squares to check whether you have a hit.
But there are other ways of doing it. Placing points uniformly on a grid gives each cell a Poisson-distributed occupancy count, so for each cell you can draw a random number from the Poisson distribution. What happens when you get 2 or more? This method forces you to answer that question: maybe you artificially clamp it to one, maybe you move the extras into a neighbouring slot. This method won't create exactly N points, so if you must have N, you can put in a fudge (randomly add/remove the last few points).
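A sketch of the grid-square index idea, checking the 3x3 block of cells around the point (a safe superset of the few squares mentioned above); all names and sizes are illustrative:

#define PLANE 100
#define CELL  10
#define GRID  (PLANE / CELL)
#define MAX_PER_CELL 64

typedef struct { float x, y; } Pt;

typedef struct {
    Pt  pts[MAX_PER_CELL];
    int count;
} Cell;

Cell grid[GRID][GRID];

/* Returns nonzero if p lies within `cutoff` of any stored point. Only the
   cell containing p and its neighbours are searched, since cutoff < CELL. */
int near_existing(Pt p, float cutoff)
{
    int cx = (int)(p.x / CELL), cy = (int)(p.y / CELL);
    for (int gx = cx - 1; gx <= cx + 1; gx++)
        for (int gy = cy - 1; gy <= cy + 1; gy++) {
            if (gx < 0 || gy < 0 || gx >= GRID || gy >= GRID)
                continue;                 /* off the edge of the plane */
            for (int i = 0; i < grid[gx][gy].count; i++) {
                float dx = p.x - grid[gx][gy].pts[i].x;
                float dy = p.y - grid[gx][gy].pts[i].y;
                if (dx * dx + dy * dy < cutoff * cutoff)
                    return 1;
            }
        }
    return 0;
}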
Find the maximum size of a rectangular contiguous submatrix whose elements are unique (i.e. not repeated within that submatrix).
How can I solve this?
You should set a maximum value to 0 and iterate over the candidate rectangles of the matrix. If a rectangle is not repeating (i.e. contains no duplicate element), compare its size to the maximum; if it is bigger, store the new maximum value and use that for further iterations, along with whatever else you need to store (for example the rectangle's position). So the algorithm looks like this:
maximum <- 0
for all candidate rectangles as rect
    if (rect is not repeating) then
        if (size of rect > maximum) then
            maximum <- size of rect
            store whatever you need to store
        end if
    end if
end for
Note that if you do not have further information, it is pointless to do a binary search, since you will have to check each rectangle anyway. If you have further knowledge about your rectangles, the algorithm might be optimized.
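A brute-force sketch of that loop in C, assuming a small matrix and element values bounded by MAXV (ROWS, COLS and MAXV are illustrative):

#include <stdbool.h>
#include <string.h>

#define ROWS 4
#define COLS 5
#define MAXV 100   /* assumed exclusive upper bound on element values */

/* True if every element in the rectangle [r0..r1] x [c0..c1] is unique. */
static bool is_unique(int m[ROWS][COLS], int r0, int c0, int r1, int c1)
{
    bool seen[MAXV];
    memset(seen, 0, sizeof seen);
    for (int r = r0; r <= r1; r++)
        for (int c = c0; c <= c1; c++) {
            if (seen[m[r][c]])
                return false;   /* value repeated: rectangle is invalid */
            seen[m[r][c]] = true;
        }
    return true;
}

/* Enumerate every rectangle and keep the maximum area among unique ones. */
int max_unique_rect(int m[ROWS][COLS])
{
    int maximum = 0;
    for (int r0 = 0; r0 < ROWS; r0++)
        for (int c0 = 0; c0 < COLS; c0++)
            for (int r1 = r0; r1 < ROWS; r1++)
                for (int c1 = c0; c1 < COLS; c1++) {
                    int area = (r1 - r0 + 1) * (c1 - c0 + 1);
                    if (area > maximum && is_unique(m, r0, c0, r1, c1))
                        maximum = area;
                }
    return maximum;
}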
A first idea (recursion): identify pairs of equal values in the whole array; these give you constraints to respect. If a value v appears at both positions (x0,y0) and (x1,y1), then no valid rectangle can contain both positions, so you can construct the possible rectangles from these constraints and recurse on them.
Another one (dynamic programming): start with elementary arrays (size 1x1) and try to merge them while respecting the constraint.
I have n arrays of data, each sorted by the same criteria.
The number of arrays will, in almost all cases, not exceed 10, so it is a relatively small number. Each array, however, can contain a large number of objects, which should be treated as infinite for the algorithm I am looking for.
I now want to treat these arrays as if they were one array. However, I need a way to retrieve the objects in a given range as fast as possible, without touching all objects before and/or after the range. It is therefore not an option to iterate over all objects and store them in a single array. Fetches with low start values are also more likely than fetches with high start values; e.g. fetching objects [20,40) is much more likely than fetching objects [1000,1020), but the latter could happen.
The range itself will be pretty small, around 20 objects, but it can be increased if that helps performance, as long as it does not hit memory limits; I would guess a couple of hundred objects would be fine as well.
Example:
3 arrays, each containing a couple of thousand entries. I now want to get the overall objects in the range [60, 80) without touching either the first 60 objects in each array or the objects that come after object 80.
I am thinking about some sort of combined, modified binary search. My current idea is something like the following (note that this is not fully thought through yet; it is just an idea):
get object 60 of each array - the beginning of the range cannot come after any of these, as each single array on its own already supplies 60 objects
use these objects as the maximum value for the binary search in every array
from one of the arrays, get the centered object (e.g. 30)
with a binary search in all the other arrays, find the object in each array that comes before, but as close as possible to, the picked object.
we now have 3 objects, e.g. objects 15, 10 and 20. The sum of their positions is 45, so there are 42 objects in front of them, which is more than the beginning of the range we are looking for (30). We therefore continue our binary search in the remaining left half of one of the arrays.
if we instead get a value where the sum is smaller than the beginning of the range we are looking for, we continue our search on the right.
at some point we will hit object 30. From there on, we can simply add the objects from each array, one by one, with an insertion sort until we hit the range length.
My questions are:
Is there any name for this kind of algorithm I described here?
Are there other algorithms or ideas for this problem, that might be better suited for this issue?
Thanks in advance for any idea or help!
People usually call this problem something like "selection in the union of multiple sorted arrays". One of the questions in the sidebar is about the special case of two sorted arrays, and this question is about the general case. Several comparison-based approaches appear in the combined answers; they more or less have to determine where the lower endpoint of the range falls in each individual array. Your binary search answer is one of the better approaches; there's an asymptotically faster algorithm due to Frederickson and Johnson, but it's complicated and not obviously an improvement for small ranks.
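The building block these approaches share is a rank query: counting, per array, how many elements fall below a candidate value. A sketch of that in C (names illustrative, arrays assumed sorted ascending):

#include <stddef.h>

/* Index of the first element in a[0..n-1] that is >= x. */
static size_t lower_bound(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Number of elements strictly less than x in the union of k sorted arrays.
   Binary-searching over candidate elements with this rank function locates
   the start of a range without touching the objects outside the search
   paths, which is the behaviour the question asks for. */
size_t union_rank(int x, const int *arrays[], const size_t lens[], size_t k)
{
    size_t r = 0;
    for (size_t i = 0; i < k; i++)
        r += lower_bound(arrays[i], lens[i], x);
    return r;
}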
I have my graph implemented with linked lists, for both vertices and edges, and that is becoming an issue for Dijkstra's algorithm. As I said in a previous question, I'm converting code that uses an adjacency matrix to work with my graph implementation.
The problem is that when I find the minimum value I get an array index. This index would have matched the vertex index if the graph's vertices were stored in an array instead, and access to the vertex would be constant.
I don't have time to change my graph implementation, but I do have a hash table, indexed by a unique number (one that does not start at 0; it's like 100090000), which is the problem I'm having. Whenever I need to, I use the modulo operator to get a number between 0 and the total number of vertices.
This works fine when I need an array index from the number, but when I need the number from the array index (to access the calculated minimum-distance vertex in constant time), not so much.
I tried to search for how to invert the modulo operation, like 100090000 mod 18000 = 10000 and 10000 invmod 18000 = 100090000, but couldn't find a way to do it.
My next alternative is to build some sort of reference array where, in the example above, arr[10000] = 100090000. That would fix the problem, but would require looping over the whole graph one more time.
Is there any better/easier solution with my current graph implementation?
In your array, instead of just storing the count (or whatever you're storing there), store a structure which contains the count as well as the vertex's unique number.
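A sketch of what that could look like, with hypothetical field names and the modulus from the question:

#define TOTAL_VERTICES 18000

typedef struct {
    int  distance;    /* the value being minimized by Dijkstra */
    long vertex_id;   /* the original unique number, e.g. 100090000 */
} DistEntry;

DistEntry dist[TOTAL_VERTICES];

/* dist[i].vertex_id is filled in once, when each vertex is hashed to
   slot i = vertex_id % TOTAL_VERTICES. After locating the index m of
   the minimum distance, dist[m].vertex_id recovers the vertex in O(1),
   with no need to invert the modulo. */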