Looking for an array reducing algorithm

I have an array of structures, each containing two float values, x and y.
x is the position on the x-axis in mm (0.0 mm to e.g. 2500.0 mm).
y is a height measurement in mm at that x-position (0.0 mm to e.g. 50.0 mm).
With a length of 2500.0 mm I will have an array of 2501 values (one for each mm). I want to send this array to a visualization that draws it as an x/y plot, so I want to reduce it to exactly 500 values (more than 500 values slows down the communication too much). Now you might say: well, then just take every 5th value. But what if my array has 1653 values? I would have to take every 3.306th value. I definitely need the first and the last value.
Is there an elegant algorithm that might help me out?

You could use interpolation to estimate a function that approximates your data, then choose the desired x-positions within the data range, estimate the value at each one, and plot those values.
This is elegant and easy to generalize to more or fewer points, as long as you stay within the range of the original data (and do not try to extrapolate beyond it).
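A minimal sketch of this idea in Python, assuming plain linear interpolation between neighbouring samples and strictly increasing x values. The function name `resample` and its parameters are made up for illustration; they are not from the question.

```python
from bisect import bisect_right

def resample(xs, ys, count):
    """Return `count` evenly spaced (x, y) samples using linear interpolation.

    Assumes xs is strictly increasing. The first and last output samples
    are exactly the first and last input points.
    """
    if count < 2 or len(xs) < 2:
        raise ValueError("need at least two input points and two output points")
    x_first, x_last = xs[0], xs[-1]
    out = []
    for k in range(count - 1):
        # Target position, evenly spaced over the original x range.
        x = x_first + (x_last - x_first) * k / (count - 1)
        # Index j of the segment with xs[j] <= x < xs[j + 1].
        j = min(bisect_right(xs, x) - 1, len(xs) - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        out.append((x, ys[j] + t * (ys[j + 1] - ys[j])))
    # Emit the last point verbatim so rounding cannot shift it.
    out.append((x_last, ys[-1]))
    return out
```

With 1653 input values and `count = 500`, this yields exactly 500 points whose first and last entries match the input. Note that linear interpolation can skip over narrow peaks that lie between two chosen positions, so it suits smooth height profiles best.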

Related

Excel maximum difference between consecutive values in array

I have an array of numbers:
46.50, 46.50, 46.50, 50.00, 60.00, 57.00, 50.00, 48.00, 44.00, 42.00
I'd like to create a formula that finds the maximum positive difference between two consecutive values. So in the above example, the intermediate calculation would be:
0,0,-3.50,-10.00,3.00,7.00,2.00,4.00,2.00
Therefore the answer would be 7.00

I'm going to go with a basic array formula for this one. Assuming your data is in cells B3:B12, offset the range by one row and subtract one range from the other, then take the maximum of the results. This can be achieved using the following formula entered as an array formula, meaning you need to confirm it with CTRL+SHIFT+ENTER. You will know you have done it right when {} show up around your formula. They cannot be added manually.
=MAX(B3:B11-B4:B12)
As an alternative, non-array formula you can go with AGGREGATE, which performs array-like calculations:
=AGGREGATE(14,6,B3:B11-B4:B12,1)
The above formulas give you the largest positive difference. If, however, you need the largest difference in either direction, then -10 is a larger difference than 7; it is just in the opposite direction. To find this, add ABS to the above formulas as follows:
=MAX(ABS(B3:B11-B4:B12))
OR
=AGGREGATE(14,6,ABS(B3:B11-B4:B12),1)
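For readers who want to check the arithmetic outside Excel, here is a small Python sketch of the same calculation on the sample data from the question. The variable names are illustrative only.

```python
values = [46.50, 46.50, 46.50, 50.00, 60.00, 57.00, 50.00, 48.00, 44.00, 42.00]

# Each value minus the one that follows it, like B3:B11-B4:B12.
diffs = [a - b for a, b in zip(values, values[1:])]

# Counterpart of =MAX(B3:B11-B4:B12)
largest_signed = max(diffs)

# Counterpart of =MAX(ABS(B3:B11-B4:B12))
largest_absolute = max(abs(d) for d in diffs)

print(diffs)             # [0.0, 0.0, -3.5, -10.0, 3.0, 7.0, 2.0, 4.0, 2.0]
print(largest_signed)    # 7.0
print(largest_absolute)  # 10.0
```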

Use an array formula. If your values are in column A (rows 1 to 10 in this case), use
=MAX(A1:A9-A2:A10)
And enter it with CTRL-SHIFT-ENTER instead of just Enter.

Excel Formula with dynamic input range

I have the following problem. I use Excel's SLOPE function. However, I constantly have to adjust the input range manually, e.g. sometimes it's SLOPE(A2:A50) and then SLOPE(A2:A75), depending on how many input variables I have.
Is there a way I can change the function so that it always takes the range up to the last non-empty cell, so that I don't have to adjust it manually every time?
Many thanks in advance!

Here is a simple explanation:
You can use an intermediate cell to calculate your ranges and then pass those values to the SLOPE function.
In the example below, if X and Y are extended to any number of observations, the formula will calculate the range dynamically.

Filling a plane with random points

I would like to fill a plane with randomly placed points, check whether any of them overlap (and if they do, move one of them to an empty place), and then calculate the average distance between them. Later I plan on extending that to 3D, so that it is like having particles in a box.
I know there must be better ways of doing it but here's what I came up with. For placing random points in a plane:
int pos[NUMBER][2]; /* NUMBER points, each with an x and a y coordinate */
int a, b;
srand( time(NULL) );
for(a=0;a<NUMBER;a++)
    for(b=0;b<2;b++)
        pos[a][b]=rand()%11; /* Using modulus is random enough for now */
The next stage is finding points that overlap:
for(a=0;a<NUMBER-1;a++)
    for(b=a+1;b<NUMBER;b++)
        if( pos[a][0] == pos[b][0] && pos[a][1] == pos[b][1])
            printf("These points overlap:\t%d, %d\n", pos[a][0], pos[a][1]);
Once I have identified which points overlap, I have to move one of them, but the point in its new position might then overlap with one of the earlier ones. Is there an accepted way of solving this problem? One way is an infinite while(true) loop with a breaking condition, but that seems very inefficient, especially when the system gets dense.
Thank you!

Here's a sketch of a solution that I think could work:
Your point generation algorithm is fine and can be left as is.
The right time to check for overlap is when a point is generated: simply keep generating candidates until you get one that doesn't overlap any previous point.
To find overlaps quickly, use a hash table such as the one from glib. The key could be two int32_t values packed into an int64_t via a union:
typedef union _Point {
    struct {
        int32_t x;
        int32_t y;
    };
    int64_t hashkey;
} Point;
Use the "iterate over all keys" functionality of your hash table to build the output array.
I haven't been able to test this but it should work. This assumes that the plane is large in relation to the number of points, so that overlaps are less likely. If the opposite is true, you can invert the logic: start with a full plane and add holes randomly.
The average complexity of this algorithm is O(n), as long as the plane is sparse enough that retries are rare.
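The same generate-and-retry idea can be sketched in Python, where a built-in set of (x, y) tuples stands in for the hash table with a packed key. The function names `random_distinct_points` and `mean_pairwise_distance` are hypothetical, and the second function is only a straightforward O(n^2) way to get the average distance the question asks about.

```python
import itertools
import math
import random

def random_distinct_points(count, size, rng=random):
    """Draw `count` distinct integer points from a size x size grid."""
    if count > size * size:
        raise ValueError("more points requested than grid positions")
    seen = set()    # hash set of points already taken
    points = []
    while len(points) < count:
        candidate = (rng.randrange(size), rng.randrange(size))
        if candidate not in seen:   # average O(1) overlap check
            seen.add(candidate)
            points.append(candidate)
    return points

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)
```

For example, `random_distinct_points(50, 11)` mirrors the 0..10 coordinate range produced by `rand()%11` in the question. Because the point is a tuple, extending this to 3D only needs a third coordinate.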

As you hinted that it should work for high densities as well, the best course of action is to create a 2D array of booleans (or bit vectors if you want to save space), where all elements are set to false initially. Then you loop NUMBER times, generating a random coordinate, and check whether the value in the array is true or not. If true, you generate another random coordinate. If false, you add the coordinate to the list, and set the corresponding element in the array to true.
The above assumes you want exactly NUMBER points and a completely uniform chance of placing them. If either of those constraints is not necessary, other algorithms are possible that use much less memory.

One solution is to place points at random, check whether they overlap, and retry on overlap. To avoid testing against every point, you need a spatial index: if you have a 100*100 plane and a cut-off of 3-4, you could use 10*10 grid squares. Then you only have to search four grid squares to check that you don't have a hit.
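A rough Python sketch of such a grid index, assuming "overlap" means two points closer than a cut-off distance. For simplicity it scans the full 3x3 block of cells around a candidate rather than only the four squares mentioned above. The function name and all parameters are invented for illustration.

```python
import math
import random

def place_with_min_distance(count, size, cutoff, cell, rng=random,
                            max_tries=100000):
    """Place points in [0, size] x [0, size] at least `cutoff` apart."""
    if cell < cutoff:
        raise ValueError("cell size must be at least the cut-off")
    grid = {}       # (cell_x, cell_y) -> points stored in that cell
    points = []
    tries = 0
    while len(points) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("gave up; the plane may be too dense")
        p = (rng.uniform(0, size), rng.uniform(0, size))
        cx, cy = int(p[0] // cell), int(p[1] // cell)
        # Only points in the surrounding 3x3 cells can be within the cut-off.
        nearby = (q
                  for dx in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for q in grid.get((cx + dx, cy + dy), ()))
        if all(math.dist(p, q) >= cutoff for q in nearby):
            grid.setdefault((cx, cy), []).append(p)
            points.append(p)
    return points
```

The `max_tries` guard is there because plain retrying has no upper bound on running time once the plane is nearly full.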
But there are other ways of doing it. When points are placed uniformly at random on a grid, the number of points landing in each slot approximately follows a Poisson distribution. So for each slot, you can draw a random number from the Poisson distribution. What happens when you get 2 or more? This method forces you to answer that question. Maybe you artificially clamp to one, maybe you move the extra point into a neighbouring slot. This method won't create exactly N points, so if you must have N, you can put in a fudge (randomly add or remove the last few points).

Data Array Decimation

I have an array that changes rapidly and has a variable length (from a minimum of 100 to a maximum of about 5k). I'm going to use these values to colour a data column that I produce by drawing lines one by one. The result will be something like a scan graph.
Also, I have a fixed column length but a variable data array length, so every member of the array has to fit into the graph. If the data is shorter than the column, I have to expand the array, which is the easier case, and I have already done that. But if the data is longer, I have to do something like decimation.
The problem is that I need to keep the characteristics of the array during decimation. When I tried taking the arithmetic mean of every group of N members, the graph got smoother, which I don't want.
What should I do to fit this array into the graph without changing its characteristics?
This is what the graph looks like: http://imgur.com/KFAzaAQ

I guess you mean that your relevant information is the location of spikes and their height. So you want to rescale the graph while preserving the spikes and their height.
One way to achieve that is to remove the data points whose values differ least from their neighbours. You could compute a score for each data point that grows with the difference between its value and its neighbours' values, and decimate the data points with the smallest scores. A candidate score function is the sum of the squared differences.
You then create a score vector as big as your data vector and compute the score for each data value like this:
score[i] = square(data[i]-data[i-1]) + square(data[i+1]-data[i]);
Another candidate is
score[i] = abs(data[i]-data[i-1]) + abs(data[i+1]-data[i]);
You also want this decimation to be applied as uniformly as possible over your data so that the graph doesn't become distorted. One way to achieve this is to split your data into buckets and remove the required number of data points from each bucket.
If you have to remove many data points (N > 2), it might be preferable to do it in multiple passes, where each pass recomputes the scores. That way you don't distort the content of each bucket too much.
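Here is a single-pass Python sketch of the bucket idea, using the abs-based score. It assumes one bucket per output sample and keeps only the highest-scoring point of each bucket, which amounts to removing all lower-scoring points in that bucket. The function names are illustrative, and the multi-pass refinement is not shown.

```python
def spike_scores(data):
    """Score each sample by how much it differs from its neighbours."""
    n = len(data)
    scores = [0.0] * n
    for i in range(1, n - 1):
        scores[i] = abs(data[i] - data[i - 1]) + abs(data[i + 1] - data[i])
    if n > 1:
        # The end points have only one neighbour each.
        scores[0] = abs(data[1] - data[0])
        scores[-1] = abs(data[-1] - data[-2])
    return scores

def decimate(data, target):
    """Shrink `data` to `target` samples, keeping the spikiest per bucket."""
    n = len(data)
    if target >= n:
        return list(data)
    scores = spike_scores(data)
    out = []
    for b in range(target):
        # Bucket b covers [start, end); it is never empty when target < n.
        start = b * n // target
        end = (b + 1) * n // target
        best = max(range(start, end), key=scores.__getitem__)
        out.append(data[best])
    return out
```

For a flat signal of 20 zeros with a single spike of 9 at index 7, `decimate(data, 5)` gives `[0, 9, 0, 0, 0]`, so the spike keeps its full height, whereas averaging each group of four would flatten it to 2.25. One known weakness of this score is that the direct neighbours of a spike also score high, so a neighbour can win its bucket when the spike sits just across a bucket boundary.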

signrank test in a three-dimensional array in MATLAB

I have a 60x60x35 array and would like to run the Wilcoxon signed rank test to determine whether the median of each element across the third array dimension (i.e. over 35 values) differs from zero. Thus, I would like my results in two 60x60 arrays: one with values of 0 and 1 depending on the test result, and a separate one with the corresponding p-values.
The problem I am facing is specifying the command so that the output has the appropriate dimensions and is calculated across the appropriate dimension of the array.
Thanks for your help and all the best!

So one way to solve your problem is to use a nested for-loop. Let's say your data is stored in data:
data=rand(60,60,35);
size_data=size(data);
% Preallocate with NaN so that entries never written are easy to spot
p=NaN(size_data(1),size_data(2));
h=NaN(size_data(1),size_data(2));
for k=1:size_data(1)
    for l=1:size_data(2)
        tmp_data=data(k,l,:);
        % Reshape the 1x1x35 slice into a 1x35 row vector
        tmp_data=reshape(tmp_data,1,numel(tmp_data));
        [p(k,l), h(k,l)]=signrank(tmp_data);
    end
end
What I am doing is preallocating p and h as 60x60 matrices filled with NaN, so that you can easily see if something went wrong (0 would be an acceptable result). Then I loop over all elements and store the values along the third dimension in a new variable. signrank needs the data to be a vector, so I reshape the 1x1x35 slice into a 1x35 row.
I guess you could skip those loops by using bsxfun
