LED bar indicator flickering between adjacent values and how to avoid this (embedded C)

I am designing a measurement instrument that has a visible user output on a 30-LED bar. The program logic acts in this fashion (pseudo-code):
while(1)
{
    (1) Sensor_Read();
    (2) Transform_counts_into_leds();
    (3) Send_to_bar();
}
The relevant function (2) is a simple algorithm that transforms the counts from the I2C sensor into a value serially sent to the shift registers that control the individual LEDs.
The variable passed to function (3) is simply the number of LEDs that have to stay on (0 for all LEDs off, 30 for all LEDs on):
uint8_t Transform_counts_into_leds(uint16_t counts)
{
    uint8_t on_leds;
    on_leds = (uint8_t)(counts * 0.134); /* 0.134 is a dummy value */
    return on_leds;
}
With this program logic, when the counts value sits on the threshold between two LED levels, the next LED flickers.
I think this is a bad user experience for my device, and I want the LEDs, once lit, to stay stable over a small range of values.
QUESTION: How could a solution to this problem be implemented in my project?

Hysteresis is useful for a number of applications, but I would suggest it is not appropriate in this instance. The problem is that if the level genuinely falls from, say, 8 to 7, you would not see any change until there was at least one sample at 6, at which point the display would jump to 6, and there would then have to be a sample at 8 before it went back to 7.
A more appropriate solution in this case is a moving average, although it is simpler and more useful to use a moving sum and exploit the higher resolution that gives. For example, a moving sum of 16 effectively adds (almost) 4 bits of resolution, making an 8-bit sensor effectively 12-bit - at the cost of bandwidth, of course; you don't get something for nothing. In this case lower bandwidth (i.e. less responsiveness to higher frequencies) is exactly what you need.
Moving sum:
#include <stdint.h>

#define BUFFER_LEN 16
#define SUM_MAX (255 * BUFFER_LEN)
#define LED_MAX 30

uint8_t buffer[BUFFER_LEN] = {0};
int index = 0;
uint16_t sum = 0;

for(;;)
{
    uint8_t sample = Sensor_Read();

    // Maintain sum of buffered values by
    // subtracting oldest buffered value and
    // adding the new sample
    sum -= buffer[index];
    sum += sample;

    // Replace oldest sample with new sample
    // and increment index to next oldest sample
    buffer[index] = sample;
    index = (index + 1) % BUFFER_LEN;

    // Transform to LED bar level (32-bit intermediate to
    // avoid overflow of LED_MAX * sum where int is 16 bits)
    int led_level = (int)(((uint32_t)LED_MAX * sum) / SUM_MAX);

    // Show level
    setLedBar(led_level);
}
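One optional refinement, as a sketch (assuming Sensor_Read() returns a valid reading at start-up): prime the buffer with the first sample so the bar does not ramp up from zero at power-on. The second answer below does the same thing in its sensor_init().

uint8_t first = Sensor_Read();
for (int i = 0; i < BUFFER_LEN; i++)
    buffer[i] = first;              // pre-fill history with the first sample
sum = (uint16_t)first * BUFFER_LEN; // keep the running sum consistent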

The underlying problem -- displaying sensor data in a human-friendly way -- is very interesting. Here's my approach in pseudocode:
Loop:
    Read sensor
    If sensor outside valid range:
        Enable warning LED
        Sleep in a low-power state for a while
        Restart loop
    Else:
        Disable warning LED
        Filter sensor value
        Compute display value from sensor value with extra precision
        If new display value differs sufficiently from current value:
            Update current displayed value
        Update display with scaled-down display value
Filtering deals with noise in the measurements. Filtering smoothes out any sudden changes in the measurement, removing sudden spikes. It is like erosion, turning sharp and jagged mountains into rolling fells and hills.
Hysteresis hides small changes, but does not otherwise filter the results. Hysteresis won't affect noisy or jagged data, it only hides small changes.
Thus, the two are separate, but complementary methods, that affect the readout in different ways.
Below, I shall describe two different filters, and two variants of simple hysteresis implementation suitable for numeric and bar graph displays.
If possible, I'd recommend you write some scripts or test programs that output the input data and the variously filtered output data, and plot it in your favourite plotting program (mine is Gnuplot). Or, better yet, experiment! Nothing beats practical experiments for human interface stuff (at least if you use existing suggestions and known theory as your basis, and leap forward from there).
Moving average:
You create an array of N sensor readings, updating them in a round-robin fashion, and using their average as the current reading. This produces very nice (as in human-friendly, intuitive) results, as only the N latest sensor readings affect the average.
When the application is first started, you should copy the very first reading into all N entries in the averaging array. For example:
#define SENSOR_READINGS 32

int sensor_reading[SENSOR_READINGS];
int sensor_reading_index;

void sensor_init(const int reading)
{
    int i;
    for (i = 0; i < SENSOR_READINGS; i++)
        sensor_reading[i] = reading;
    sensor_reading_index = 0;
}

int sensor_update(const int reading)
{
    int i, sum;
    sensor_reading_index = (sensor_reading_index + 1) % SENSOR_READINGS;
    sensor_reading[sensor_reading_index] = reading;
    sum = sensor_reading[0];
    for (i = 1; i < SENSOR_READINGS; i++)
        sum += sensor_reading[i];
    return sum / SENSOR_READINGS;
}
At start-up, you call sensor_init() with the very first valid sensor reading, and sensor_update() with the following sensor readings. sensor_update() returns the filtered result.
The above works best when the sensor is polled regularly, and SENSOR_READINGS can be chosen large enough to properly filter out any unwanted noise in the sensor readings. Of course, the array requires RAM, which may be in short supply on some microcontrollers.
Exponential smoothing:
When there is not enough RAM to use a moving average to filter data, an exponential smoothing filter is often applied.
The idea is that we keep a running average, and recalculate it from each new sensor reading as (A * average + B * reading) / (A + B). The effect of each sensor reading on the average decays exponentially: the weight of the most recent sensor reading is always B/(A+B), the weight of the previous one is A*B/(A+B)^2, the weight of the one before that is A^2*B/(A+B)^3, and so on (^ indicating exponentiation); the weight of the n'th sensor reading in the past (the current one being n=0) is A^n*B/(A+B)^(n+1).
The code corresponding to the previous filter is now
#define SENSOR_AVERAGE_WEIGHT 31
#define SENSOR_CURRENT_WEIGHT 1

int sensor_reading;

void sensor_init(const int reading)
{
    sensor_reading = reading;
}

int sensor_update(const int reading)
{
    return sensor_reading = (sensor_reading * SENSOR_AVERAGE_WEIGHT +
                             reading * SENSOR_CURRENT_WEIGHT) /
                            (SENSOR_AVERAGE_WEIGHT + SENSOR_CURRENT_WEIGHT);
}
Note that if you choose the weights so that their sum is a power of two, most compilers optimize the division into a simple bit shift.
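For illustration, here is a sketch of the same update written with an explicit shift (assuming the readings, and hence the running average, are non-negative; right-shifting negative values is not equivalent to division):

/* 31 + 1 = 32 = 2^5, so dividing by the weight sum is a shift by 5 */
sensor_reading = (sensor_reading * SENSOR_AVERAGE_WEIGHT +
                  reading * SENSOR_CURRENT_WEIGHT) >> 5;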
Applying hysteresis:
Proper hysteresis support involves keeping the displayed value in higher precision than is used for output. Otherwise, your output value with hysteresis applied will never change by a single unit, which I would consider bad design in a user interface. (I'd much prefer a value to flicker between two consecutive values every few seconds, to be honest -- and that's what I see in e.g. the weather stations I like best, with good temperature sensors.)
There are two typical variants in how hysteresis is applied to readouts: fixed, and dynamic. Fixed hysteresis means that the displayed value is updated whenever the value differs by a fixed limit; dynamic means the limits are set dynamically. (The dynamic hysteresis is much rarer, but it may be very useful when coupled with the moving average; one can use the standard deviation (or error bars) to set the hysteresis limits, or set asymmetric limits depending on whether the new value is smaller or greater than the previous one.)
The fixed hysteresis is very simple to implement. First, because we need to apply the hysteresis to a higher-precision value than the output, we choose a suitable multiplier. That is, display_value = value / DISPLAY_MULTIPLIER, where value is the possibly filtered sensor value, and display_value is the integer value displayed (number of bars lit, for example).
Note that below, display_value and the value returned by the functions refer to the integer value displayed, for example the number of lit LED bars. value is the (possibly filtered) sensor reading, and saved_value contains the sensor reading that is currently displayed.
#define DISPLAY_HYSTERESIS 10
#define DISPLAY_MULTIPLIER 32

int saved_value;

void display_init(const int value)
{
    saved_value = value;
}

int display_update(const int value)
{
    const int delta = value - saved_value;
    if (delta < -DISPLAY_HYSTERESIS ||
        delta > DISPLAY_HYSTERESIS)
        saved_value = value;
    return saved_value / DISPLAY_MULTIPLIER;
}
The delta is just the difference between the new sensor value, and the sensor value corresponding to the currently displayed value.
The effective hysteresis, in units of displayed value, is DISPLAY_HYSTERESIS/DISPLAY_MULTIPLIER = 10/32 = 0.3125 here. It means that the displayed value can be updated three times before a visible change is seen (if e.g. slowly decreasing or increasing; more if the value is just fluctuating, of course). This eliminates rapid flickering between two visible values (when the value is in the middle of two displayed values), but ensures the error of the reading is less than half a display unit on average (half a unit plus the effective hysteresis in the worst case).
In a real-life application, you usually use the more complete form return (saved_value * DISPLAY_SCALE + DISPLAY_OFFSET) / DISPLAY_MULTIPLIER, which scales the filtered sensor value by DISPLAY_SCALE/DISPLAY_MULTIPLIER and moves the zero point by DISPLAY_OFFSET/DISPLAY_MULTIPLIER, both evaluated at 1.0/DISPLAY_MULTIPLIER precision, but only using integer operations. However, for simplicity, I'll just assume that to derive the display value, say the number of lit LED bars, you just divide the sensor value by DISPLAY_MULTIPLIER. In either case, the hysteresis is DISPLAY_HYSTERESIS/DISPLAY_MULTIPLIER of the output unit. Ratios of about 0.1 to 0.5 work fine, and the test values below, 10 and 32, yield 0.3125, which is about midway through the range of ratios that I believe work best.
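As a sketch of that fuller form (the constants here are illustrative; saved_value, DISPLAY_HYSTERESIS and DISPLAY_MULTIPLIER are as in the code above):

#define DISPLAY_SCALE  30 /* output units per DISPLAY_MULTIPLIER input units */
#define DISPLAY_OFFSET 16 /* shifts the zero point by OFFSET/MULTIPLIER units */

int display_update_scaled(const int value)
{
    const int delta = value - saved_value;
    if (delta < -DISPLAY_HYSTERESIS || delta > DISPLAY_HYSTERESIS)
        saved_value = value;
    /* Scale and offset at 1/DISPLAY_MULTIPLIER precision, integers only */
    return (saved_value * DISPLAY_SCALE + DISPLAY_OFFSET) / DISPLAY_MULTIPLIER;
}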
Dynamic hysteresis is very similar to above:
#define DISPLAY_MULTIPLIER 32

int saved_value_below;
int saved_value;
int saved_value_above;

void display_init(const int value, const int below, const int above)
{
    saved_value_below = below;
    saved_value = value;
    saved_value_above = above;
}

int display_update(const int value, const int below, const int above)
{
    if (value < saved_value - saved_value_below ||
        value > saved_value + saved_value_above) {
        saved_value_below = below;
        saved_value = value;
        saved_value_above = above;
    }
    return saved_value / DISPLAY_MULTIPLIER;
}
Note that if DISPLAY_HYSTERESIS*2 <= DISPLAY_MULTIPLIER, the displayed value is always within a display unit of the actual (filtered) sensor value. In other words, hysteresis can easily deal with flickering, but it does not need to add much error to the displayed value.
In many practical cases the best amount of hysteresis to apply depends on the amount of short-term variation in the sensor samples. This includes not only noise, but also the kinds of signals that are to be measured. A hysteresis of just 0.3 (relative to the output unit) is sufficient to completely eliminate the flicker that occurs when sensor readings flip the filtered sensor value between two consecutive integers that map to different integer outputs, as it ensures that the filtered sensor value must change by at least 0.3 (in output display units) before it effects a change in the display.
The maximum error with hysteresis is half a display unit plus the current hysteresis. The half unit is the minimum error possible (since consecutive units are one unit apart, so when the true value is in the middle, either value shown is correct to within half a unit). With dynamic hysteresis, you can start with some fixed hysteresis value whenever a reading changes by more than the current hysteresis, and simply decrease the hysteresis (if greater than zero) while the reading stays within it. This approach leads to a changing sensor value being tracked correctly (the maximum error being half a unit plus the initial hysteresis), while a relatively static value is displayed as accurately as possible (at half a unit maximum error). I don't show an example of this, because it adds another tunable (how the hysteresis decays towards zero), and requires that you verify (calibrate) the sensor (including any filtering) first; otherwise it's like polishing a turd: possible, but not useful.
Also note that if you have 30 bars in the display, you actually have 31 states (zero bars, one bar, .., 30 bars), and thus the proper range for the value is 0 to 31*DISPLAY_MULTIPLIER - 1, inclusive.
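A minimal sketch of enforcing that range before display (value here is the filtered sensor reading, as above):

/* Clamp the filtered value to the displayable range of 31 states */
if (value < 0)
    value = 0;
else if (value > 31 * DISPLAY_MULTIPLIER - 1)
    value = 31 * DISPLAY_MULTIPLIER - 1;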

Related

Compute efficiently Max and Min on data stream

I'm working in C with a stream of data. Basically I receive a column array of 6 elements every n milliseconds, and I would like to compute the max and min value for each row of data.
To make this clear, this is what my data looks like (this is a toy example; in reality I'll have thousands of columns acquired):
[ 6] [-10] [ 5]
[ 1] [  5] [ 3]
[ 5] [ 30] [10]
[ 2] [-10] [ 0]
[-2] [  5] [10]
[-5] [  0] [ 1]
So basically (as I said) I receive a column of data every n milliseconds, and I want to compute the max and min value row-wise. So in my previous example my result would be:
max_values=[6,5,30,2,10,1]
min_values=[-10,1,5,-10,-2,-5]
I want to point out that I have no access to the full matrix, I can only work over single columns of 6 elements that I receive every n milliseconds.
This is my simple algorithm so far (I'm omitting the rest of the code since it's part of a bigger project):
for (int i = 0; i < 6; i++) {
    if (input[i] > temp_max[i]) {
        temp_max[i] = input[i];
    }
    if (input[i] < temp_min[i]) {
        temp_min[i] = input[i];
    }
}
Where input, temp_max and temp_min are all float arrays of dimension 6.
Basically my code executes this piece of code every time a new input array is available and updates the maximum and minimum accordingly.
Since I'm interested in performance (this is going to run on an embedded system), is there any way to improve this part of the code? Calling a comparison for each single element of the two arrays doesn't seem like the smartest idea.
With random input data (i.e. unordered data), it'll be pretty hard (aka impossible) to find min/max without a comparison per element.
You may get some minor improvement from something like:
for (int i = 0; i < 6; i++) {
    if (input[i] > temp_max[i]) {
        temp_max[i] = input[i];
    }
    else // If current element was larger than max, you don't need to check min
    {
        if (input[i] < temp_min[i]) {
            temp_min[i] = input[i];
        }
    }
}
but I doubt this will be a significant improvement.
Branching is slow, especially on embedded systems. Scalar computation too.
Fortunately, your target processor appears to be an ARM-based processor supporting the NEON SIMD instruction set (apparently one based on the 64-bit ARMv8 Cortex-A53 architecture). NEON can compute four 32-bit floating-point operations at once. This should be much faster than the current code (which compilers apparently fail to vectorize).
Here is an example code (untested):
#include <arm_neon.h>

void minmax_optim(float temp_min[6], float temp_max[6], float input[6]) {
    /* Process the first 4 floats */
    float32x4_t vInput = vld1q_f32(input);
    float32x4_t vMin = vld1q_f32(temp_min);
    float32x4_t vMax = vld1q_f32(temp_max);
    vMin = vminq_f32(vInput, vMin);
    vMax = vmaxq_f32(vInput, vMax);
    vst1q_f32(temp_min, vMin);
    vst1q_f32(temp_max, vMax);

    /* Process the remaining 2 floats */
    float32x2_t vLastInput = vld1_f32(input + 4);
    float32x2_t vLastMin = vld1_f32(temp_min + 4);
    float32x2_t vLastMax = vld1_f32(temp_max + 4);
    vLastMin = vmin_f32(vLastInput, vLastMin);
    vLastMax = vmax_f32(vLastInput, vLastMax);
    vst1_f32(temp_min + 4, vLastMin);
    vst1_f32(temp_max + 4, vLastMax);
}
The resulting code should be much faster. One can see on Godbolt that the number of instructions in this vectorized implementation is drastically smaller than in the reference implementation, without any conditional jump instructions.
You nailed it -- you have to keep temporary max and min arrays. Unfortunately, if we're talking strictly C, that seems to be the only possible, and thus the most performant, algorithm.
Since you've mentioned it's going to run on an embedded system (but omitted which), please make sure you have hardware floating-point support. If there isn't any, that's going to incur a high performance penalty. If you have high-end hardware, you can look into the availability of vector instructions, but that is platform-specific, possibly requiring the use of assembly.
My impression is that the approach as such cannot be substantially improved, as the input is not available as a whole. That being said, the inner comparisons can be compacted. The assignments
if (input[i] > temp_max[i]) {
    temp_max[i] = input[i];
}
if (input[i] < temp_min[i]) {
    temp_min[i] = input[i];
}
can be improved to
if (input[i] > temp_max[i]) {
    temp_max[i] = input[i];
}
else if (input[i] < temp_min[i]) {
    temp_min[i] = input[i];
}
because if the current value replaces the temporary maximum, it cannot also replace the temporary minimum (assuming some sensible initialization).
This covers only max, but it is easy to extend:
#include <stddef.h>

#define MAX(a,b,c) ((a) > (b) ? ((a) > (c) ? (a) : (c)) \
                              : ((b) > (c) ? (b) : (c)))

void rowmax(int *a, int *b, int *c, int *result, size_t size)
{
    for (size_t index = 0; index < size; index++)
    {
        result[index] = MAX(a[index], b[index], c[index]);
    }
}

Low Pass Filter in OpenCL

I am trying to implement a low pass filter in OpenCL and the theory behind all this has me confused a bit. I have attached my code at the bottom after my explanation of the scenario.
First off, let me try to explain the whole scenario in point form.
For the input, we have a cosine signal with a sample size, a frequency (the sampling frequency is obtained by multiplying sample size by frequency) and a step size.
The value of the signal at each step is stored in an array, with the frequency and step size multiplied into the function's argument.
This array is then passed into the kernel, which then will execute the low pass filter function.
Kernel returns an output array with the new filtered values.
The cos function is always returning a value from (-1,1), the only thing that modifies this value is the frequency. So it may repeat faster or slower depending on the frequency BUT it is always between (-1,1).
This is where I am confused: I am not sure how to apply a low pass filter to these values. Let's say the cutoff for the filter was 100 Hz. I can't just say:
if(array[i] > 100 ) { //delete or ignore this value. Else store in a array }
The reason this won't work is that the value of array[i] ranges over (-1, 1). So how then would I apply this filter? What values am I going to compare?
From a physical perspective, I can see how it works: a capacitor and a resistor to set the cut-off frequency, and the input sent through the circuit. But programmatically, I do not see how I can implement this. I have seen many implementations of this online but the code wasn't documented enough to get a good understanding of what was going on.
Here is the code on my host side:
//Array to hold the information of signal
float *Array;
//Number of sampling points
int sampleSize = 100;
float h = 0;
//Signal Frequency in Hz
float signalFreq = 10;
//Number of points between 0 and max val (T_Sample)
float freqSample = sampleSize * signalFreq;
//Step = max value or T_Sample
float stepSize = 1.0 / freqSample;
//Allocate enough memory for the array
Array = (float*)malloc(sampleSize * sizeof(float));
//Populate the array with modified cosine
for (int i = 0; i < sampleSize; i++) {
    Array[i] = cos(2 * CL_M_PI * signalFreq * h); // index with i, not 0
    h = h + stepSize;
    printf("Value of current sample for cos is: %f \n", Array[i]);
}
My kernel is only as follows: (Obviously this is not the code for the filter, this is where I am confused).
__kernel void lowpass(__global float *Array, __local float *cutOffValue, __global float *Output) {
    // float buffers, to match the float array built on the host
    int idx = get_global_id(0);
    Output[idx] = Array[idx];
}
I found this PDF that implements a lot of filters. Near the end of the document you can find a float implementation of the Low Pass Filter.
http://scholar.uwindsor.ca/cgi/viewcontent.cgi?article=6242&context=etd
In the filter implementation in that PDF, they compare data[j] to a value. Also, I have no idea what numItems or workItems is.
If someone can provide some insight on this, that would be great. I have searched a lot of other examples of low pass filters but I just can't wrap my head around the implementation. I hope I made this question clear. Again, I know what the low pass filter does; I just have no idea what values I need to compare in order for the filtering to take place.
Found this question as well:
Low Pass filter in C
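For reference, the approach in that linked question is a single-pole (RC-style) low-pass filter. A minimal sketch of the idea, assuming a cutoff frequency fc in Hz and a sample period dt in seconds (the function name is illustrative):

#include <math.h>

/* Discrete single-pole RC low-pass: the output moves toward the input
   by a fraction alpha per sample, with alpha set by the cutoff. */
float rc_lowpass_step(float input, float prev_output, float fc, float dt)
{
    float RC = 1.0f / (2.0f * (float)M_PI * fc); /* analog RC time constant */
    float alpha = dt / (RC + dt);                /* smoothing factor, 0..1 */
    return prev_output + alpha * (input - prev_output);
}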
I have a possible solution. What I am attempting is a moving-average FIR filter (which I am told is the easiest form of a low-pass filter that one can implement).
What is required:
FIFO buffer
Coefficient values (I generated and obtained mine from Matlab for a specific cut-off frequency)
Input and Output arrays for the program
I have not implemented this in code yet, but I do understand how to use it on a theoretical level. I have created a diagram below to try and explain the process.
Essentially, values from another input array will be passed into the FIFO buffer one at a time. Every time a value is passed in, the kernel will do a multiplication across the FIFO buffer, which has 'n' taps. Each tap has a coefficient value associated with it. So the input at a particular element gets multiplied with its coefficient value, and all the products are then accumulated and stored in one element of the output buffer.
Note that the coefficients were generated in Matlab. I didn't know how else to grab these values. At first I was going to just use the coefficient of 1/n, but I am pretty sure that is just going to distort the values of the signal.
And that should do the trick, I am going to implement this in the code now, but if there is anything wrong with this theory feel free to correct it.
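If the theory above is right, a minimal sketch of one filter step might look like this in plain C (NTAPS and the 1/n moving-average coefficients are illustrative; Matlab-designed coefficients would simply replace them):

#define NTAPS 8

/* One FIR step: shift the FIFO, then multiply-accumulate across taps. */
float fir_step(float fifo[NTAPS], const float coeff[NTAPS], float sample)
{
    for (int i = NTAPS - 1; i > 0; i--)
        fifo[i] = fifo[i - 1]; /* age the stored samples */
    fifo[0] = sample;          /* newest sample enters the FIFO */

    float acc = 0.0f;
    for (int i = 0; i < NTAPS; i++)
        acc += coeff[i] * fifo[i]; /* tap value times its coefficient */
    return acc;                    /* one element of the output buffer */
}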

Restricted 3 body simulation not working correctly

I am currently trying to do an assignment where I have to write a simulation of the restricted three-body gravitational problem, with two fixed masses and one test mass. The information I have been given on the problem is here: Check out this link. And here is my program so far:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char* argv[])
{
    double dt = 0.005, x[20000], y[20000], xv, yv, ax, ay, mneg, mpos, radius = 0.01;
    int n;
    FILE* output = fopen("proj1.out", "w");
    printf("\n");
    if (argc < 7) /* argv[1]..argv[6] must all exist */
    {
        printf("************************ ERROR! ***********************\n");
        printf("** Not enough command line arguments input.          **\n");
        printf("** Please run again with the correct amount (6).     **\n");
        printf("*******************************************************\n");
        goto VALIDATIONFAIL;
    }
    /* sscanf returns the number of items converted, so compare with 1 */
    if ((sscanf(argv[1], "%lf", &mneg) != 1) || (sscanf(argv[2], "%lf", &mpos) != 1) ||
        (sscanf(argv[3], "%lf", &x[0]) != 1) || (sscanf(argv[4], "%lf", &y[0]) != 1) ||
        (sscanf(argv[5], "%lf", &xv) != 1)   || (sscanf(argv[6], "%lf", &yv) != 1))
    {
        printf("************************* ERROR! ************************\n");
        printf("** Input values must be numbers. Please run again with **\n");
        printf("** numerical inputs (6).                               **\n");
        printf("*********************************************************\n");
        goto VALIDATIONFAIL;
    }
    x[1] = x[0] + (xv * dt);
    y[1] = y[0] + (yv * dt);
    for (n = 1; n < 10000; n++)
    {
        if (x[n-1] >= (1 - radius) && x[n-1] <= (1 + radius) &&
            y[n-1] >= (0 - radius) && y[n-1] <= (0 + radius))
        {
            printf("Test mass has collided with M+ at (1,0), Exiting...\n");
            break;
        }
        else if (x[n-1] >= (-1 - radius) && x[n-1] <= (-1 + radius) &&
                 y[n-1] >= (0 - radius) && y[n-1] <= (0 + radius))
        {
            printf("Test mass has collided with M- at (-1,0), Exiting...\n");
            break;
        }
        else
        {
            double dxn = x[n] + 1;
            double dxp = x[n] - 1;
            double mnegdist = pow((dxn*dxn + (y[n]*y[n])), -1.5);
            double mposdist = pow((dxp*dxp + (y[n]*y[n])), -1.5);
            ax = -(mpos*dxp*mposdist + mneg*dxn*mnegdist);
            ay = -(mpos*y[n]*mposdist + mneg*y[n]*mnegdist);
            x[n+1] = (2 * x[n]) - x[n-1] + (dt * dt * ax);
            y[n+1] = (2 * y[n]) - y[n-1] + (dt * dt * ay);
            fprintf(output, "%lf %lf\n", x[n-1], y[n-1]);
        }
    }
    fclose(output);
    return EXIT_SUCCESS;

VALIDATIONFAIL:
    printf("\n");
    return EXIT_FAILURE;
}
My program works to a certain extent, but I am getting some weird problems that I hope someone can help me with.
The main issue is that when the test mass gets to a point in its trajectory where it should go off and start to orbit the other mass, it instead just shoots off in a straight line to infinity! At first I thought the masses were colliding, so I put in the radius check; in some cases this does work, in some cases it doesn't, and in some cases the masses collide earlier on, before the trajectory goes wrong anyway, so this clearly isn't the issue. I am not sure if I have explained that all too well, so here is a picture to show you what I mean. (The simulation on the right is from here.)
However, this is not always the case; sometimes instead of going in a straight line, the trajectory just goes crazy when it should go over to the other mass, like this:
I really have absolutely no idea what's going on. I have spent days trying to figure this out but just can't seem to get anywhere, so any help in identifying where my problem is would be very much appreciated.
This is just too long to fit in a comment, and it also might be of use to future visitors.
The proper choice of timestep for a given computation is not an easy task. The family of Verlet integrators are symplectic, which means that they preserve the phase-space volume and hence should preserve the total energy of the system given infinite precision and an infinitely small timestep; but unfortunately real computers operate with finite precision, and human life is too short for us to wait for an infinite number of timesteps.
The Verlet integrator, like the one that you have implemented, and the velocity Verlet scheme have a global error which is O(Δt²). It means that the algorithm is only quadratically sensitive to the timestep, and in order to greatly improve the precision, one has to decrease the timestep by as many times as the square root of the desired precision improvement factor. Click on the info button of the Flash applet that you compare your trajectories with and you'll see that it uses a completely different integrator - the Euler-Cromer algorithm (also known as the semi-implicit Euler method). It has different precision given the same timestep (actually worse than that of the Verlet scheme at the same timestep), and hence you cannot and should not directly compare the two trajectories, but rather only their statistical properties (e.g. mean energy, mean velocity, etc.)
My point was that you have to decrease the timestep, because it is too large to handle the cases when the test body comes too close to one of the gravitational centres. There is another problem hidden here and it is the finite numerical precision. Observe this term:
double dxp = x[n] - 1;
double mposdist = pow((dxp*dxp + (y[n]*y[n])), -1.5);
Whenever you subtract two closely valued floating point numbers (x[n] and 1.0), a very unfortunate event happens, known as precision loss: most of the higher significant bits in their mantissas cancel each other, and in the end, after the normalisation step, you get a number with far fewer significant bits than either of the original two numbers. This precision loss gets even bigger as the result is then squared and used in a denominator. Note that this mostly happens near the axis of symmetry of the system, where y[n] comes close to 0. Otherwise y[n] might be big enough that dxp*dxp is only a tiny correction to the value of y[n]*y[n]. The net result is that the force comes out totally wrong near each fixed mass, and is usually greater in magnitude than the actual force. This is prevented in your case as you test for the point being outside the prescribed radius.
Greater forces lead to greater displacements given a fixed timestep. This also leads to an artificial increase in the total energy of the system, i.e. the test mass tends to move faster than in a finer simulation. It also might happen that the test body ends up so close to a gravitational centre that the huge force times the square of the timestep gives a huge displacement, leaving the test mass far away but with increased total energy; the resulting high kinetic energy means the body is practically ejected from the simulation volume. This could also happen even if you computed the force with infinite precision: the displacement between two timesteps might simply be so large (because of the large timestep) that the system makes an unrealistic jump in phase space to a completely different energy isosurface. And with gravity (as well as with electrostatics) it is easy to hit such a case, as the force increases as 1/r^2 and near the radius it is many orders of magnitude stronger than in the initial state.
One might come up with different rules of thumb to estimate the size of the timestep given the largest expected force value, but in general, the higher the maximum force, the lower the timestep should be. These kinds of things can usually be roughly estimated from the initial conditions, which saves a lot of failed simulations due to "ejection" effects.
Now as the Verlet schemes are symplectic, the best way to control the correctness of the simulation is to observe the total energy of the system. Note that the velocity Verlet integrator is a bit better as it is numerically more stable (but still it has the same dependence of the accuracy on the square of the timestep). With the standard Verlet scheme you can get an approximation of the speed v[i] by taking (x[i+1] - x[i-1])/(2*dt). With the velocity Verlet, the speed is included explicitly in the equations.
Either way, it would make sense to take the speed and to compute the total energy of the system at each timestep and to observe the value. If the timestep is Just Right (tm), then the total should be almost conserved with relatively small oscillations around the mean value. If it goes crazy upwards, then your timestep is too big and should be decreased.
Decreasing the timestep increases the run-time of the simulation accordingly. One could also observe that in the far field the forces are small, the point moves slowly, and a long timestep is just fine. A shorter timestep would not improve the solution there but would only increase the run-time. That's why people have invented multi-timestep algorithms, as well as adaptive-timestep algorithms that automatically refine the solution in the near field. A different method to compute the forces might also be applied there, by transforming the equations so as not to include subtraction of closely valued variables.
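A minimal sketch of the energy check suggested above, using the question's conventions (G = 1, unit test mass, fixed masses mneg and mpos at (-1,0) and (1,0)) and the central-difference speed, so it is valid for 1 <= n <= N-2:

#include <math.h>

double total_energy(const double x[], const double y[], int n,
                    double mneg, double mpos, double dt)
{
    /* Central-difference velocity of the standard Verlet scheme */
    double vx = (x[n + 1] - x[n - 1]) / (2.0 * dt);
    double vy = (y[n + 1] - y[n - 1]) / (2.0 * dt);
    double kinetic = 0.5 * (vx * vx + vy * vy);

    /* Distances to the two fixed masses */
    double rneg = sqrt((x[n] + 1.0) * (x[n] + 1.0) + y[n] * y[n]);
    double rpos = sqrt((x[n] - 1.0) * (x[n] - 1.0) + y[n] * y[n]);
    double potential = -mneg / rneg - mpos / rpos;

    return kinetic + potential; /* should stay nearly constant */
}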
(well, this came out way larger than even several comments)
I found it difficult to understand your code, so I will try to be as helpful as I can and apologize, if I am telling you things you already know.
The best way I have found to calculate the physics for simulations involving gravity is to use Newton's Law of Universal Gravitation. This is given by the formula:
F = ( ( -G * M1 * M2 ) / ( R * R ) ) * r_unit_vector;
Where:
G ~= 6.67e-11,
M1 is the mass of the first object,
M2 is the mass of the second object,
R is the distance between the two objects: sqrt(pow(X2 - X1, 2) + pow(Y2 - Y1, 2))
X1 is the X-coordinate of object 1,
X2 is the X-coordinate of object 2,
Y1 is the Y-coordinate of object 1,
Y2 is the Y-coordinate of object 2.
r_unit_vector is a unit vector pointing from object 2 to object 1
struct r_unit_vector_struct {
    double x, y;
} r_unit_vector;
r_unit_vector has an x component, which is object 1's x-coordinate - object 2's x-coordinate (since the vector points from object 2 to object 1),
r_unit_vector has a y component, which is object 1's y-coordinate - object 2's y-coordinate.
To make r_unit_vector a unit vector, you must divide (both the x and y components separately) by its length, which is given by sqrt(pow(r_unit_vector.x, 2) + pow(r_unit_vector.y, 2))
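Putting the pieces together, a minimal sketch (the struct and function names here are illustrative, not from the original code):

#include <math.h>

typedef struct { double x, y; } vec2;

/* Gravitational force on object 1 due to object 2, per the formula above:
   F = ((-G * M1 * M2) / (R * R)) * r_unit_vector, with r_unit_vector
   pointing from object 2 to object 1. */
vec2 gravity_force(double G, double M1, double M2, vec2 p1, vec2 p2)
{
    vec2 r = { p1.x - p2.x, p1.y - p2.y };  /* from object 2 to object 1 */
    double R = sqrt(r.x * r.x + r.y * r.y); /* distance between objects */
    double s = -G * M1 * M2 / (R * R * R);  /* magnitude / R, normalising r */
    vec2 F = { s * r.x, s * r.y };          /* attractive: points toward 2 */
    return F;
}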
And you should be all ready to go! Hopefully this makes sense. If not, I will write you a class to do this stuff, or explain it further if I can!

Arrival-time handling with wrap around in C

I'm currently writing software in ANSI C and am struggling to get one of the basic pieces of functionality to work.
The software will receive messages over a CAN network, and when these messages arrive, I need to make sure that they were delivered before an expected time and after the previous message.
Only unsigned variables are allowed to be used, so there will be problems with wrap-around when the timers reach their maximum value (255 in my case).
It is easy to verify that messages arrive before the expected time, since I know the maximum time between two messages.
This example handles wrap around and discovers messages that are late:
UC_8 arrival = 250;
UC_8 expected = 15;
UC_8 maxInterArrTime = 30;
UC_8 result;

result = expected - arrival;
if (result <= maxInterArrTime) {
    // ON TIME!
}
else {
    // DELAYED
}
This is the easy part, but I must also check that the arrived message actually arrived after the previous message. My problem is that I do not know how to solve this with the wrap-around problem. I tried to mimic the solution that finds delayed messages, but without any luck.
UC_8 arrival = 10; // Wrapped around
UC_8 lastArrival = 250;
UC_8 expected = 15;
UC_8 maxInterArrTime = 30;
UC_8 result, result2;

result = expected - arrival;
result2 = lastArrival - arrival; // Problem
if (result2 >= ???) { // How should I compare and with what?
    // Message received after previous msg
    if (result <= maxInterArrTime) {
        // ON TIME!
    }
    else {
        // DELAYED
    }
}
else {
    // Message received before previous msg - ERROR
}
My problem is when the arrival time value is lower than the previous arrival time, but is actually "larger" since it has wrapped around. I guess I might need to do it in several steps.
Any suggestions on how I can solve this? I need to keep the number of if-statements low, as the code will be analysed for complexity and other things.
If you can GUARANTEE that the delay between packets will not be 256 ticks or more, then the following will account for wrap-around:
if (newerTime >= olderTime)
    delay = newerTime - olderTime;
else
    delay = 256 - olderTime + newerTime;
If you can't guarantee the delay is less than 256 then unwind is correct, and you can't do what you want to do.
Huh? You can't magically code your way around a case of missing information. If you only have 8-bit unsigned timestamps, then you will not be able to differentiate between something that happened 3 ticks ago, and something that happened 259 ticks ago, and so on.
Look into making larger (more bits) timestamps available.
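A common way to do that in software, sketched under the assumption that some read_hw_ticks() (a hypothetical name) returns the raw 8-bit counter and that this function is called at least once per 256 ticks:

#include <stdint.h>

static uint8_t  last_raw;
static uint32_t extended;

uint32_t read_extended_ticks(void)
{
    uint8_t raw = read_hw_ticks();         /* hypothetical 8-bit HW counter */
    extended += (uint8_t)(raw - last_raw); /* unsigned subtraction handles wrap */
    last_raw = raw;
    return extended;                       /* 32-bit monotonic tick count */
}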
If you can ensure that the absolute value of the time delta is less than 1/2 of the maximum measurable time span then you can determine the time delta.
int8_t delta_u8(uint8_t a, uint8_t b) {
    int8_t delta = a - b;
    return delta;
}
...
delta = delta_u8(newerTime, olderTime);
delay = abs((int) delta); // or you could have a byte version of abs, since
                          // I suspect that you may be doing embedded stuff
                          // and care about such things.
If you can ensure that time always moves forward then you can do better. By time moving forward I mean that in your case newerTime is always greater than olderTime, regardless of how they numerically compare. In this case you can measure deltas up to the maximum measurable time span -- which really goes without saying.
uint8_t delta_i8(uint8_t a, uint8_t b) {
    return a - b;
}
If you know that two events can't happen during the same tick, you can do better by one. If you know that two events can't happen closer together than a certain amount of time, then you can calculate deltas up to the maximum time span representable by your timestamp plus the minimum time that must separate events, but then you have to use a larger variable size to do the actual math.
All of these work because the values wrap around when you do math on them. You can very easily think of this as turning one of your known time values into the new origin (0) and adjusting the other time value to match this shift.
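A small worked example of that origin shift, using the numbers from the question (10 - 250 wraps to 16 in 8-bit unsigned arithmetic):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t lastArrival = 250;
    uint8_t arrival = 10; /* wrapped around */
    uint8_t elapsed = (uint8_t)(arrival - lastArrival); /* (10 - 250) mod 256 */
    printf("elapsed ticks: %u\n", elapsed);             /* prints 16 */
    return 0;
}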

Efficient Implementation of Fitness-Proportionate "Roulette" Selection

I am currently writing a keyboard layout optimization algorithm in C (such as the one designed by Peter Klausler) and I want to implement a fitness-proportionate selection as described here (PDF Link):
With roulette selection you select members of the population based on a roulette-wheel model. Make a pie chart, where the area of a member's slice relative to the whole circle is the ratio of that member's fitness to the total population's. As you can see, if a point on the circumference of the circle is picked at random, those population members with higher fitness will have a higher probability of being picked. This ensures natural selection takes place.
The problem is, I don't see how to implement it efficiently. I've thought of two methods: one is unreliable, and the other is slow.
First, the slow one:
For a keyboard pool of length N, create an array of length N where each element of the array actually contains two elements, a minimum and a maximum value. Each keyboard has a corresponding minimum and maximum value, and the range is based on the fitness of the keyboard. For example, if keyboard zero has a fitness of 10, keyboard one has a fitness of 20, and keyboard two has a fitness of 25, it would look like this:
Code:
array[0][0] = 0; // minimum
array[0][1] = 9; // maximum
array[1][0] = 10;
array[1][1] = 30;
array[2][0] = 31;
array[2][1] = 55;
(In this case a lower fitness is better, since it means less effort is required.)
Then generate a random number. For whichever range that number falls into, the corresponding keyboard is "killed" and replaced with the offspring of a different keyboard. Repeat this as many times as desired.
The problem with this is that it is very slow. It takes O(N^2) operations to finish.
Next the fast one:
First figure out what the lowest and highest fitnesses for the keyboards are. Then generate a random number between (lowest fitness) and (highest fitness) and kill all keyboards with a fitness higher than the generated number. This is efficient, but it's not guaranteed to only kill half the keyboards. It also has somewhat different mechanics from a "roulette wheel" selection, so it may not even be applicable.
So the question is, what is an efficient implementation?
There is a somewhat efficient algorithm on page 36 of this book (Link), but the problem is, it's only efficient if you do the roulette selection only one or a few times. Is there any efficient way to do many roulette selections in parallel?
For one thing, it sounds like you are talking about unfitness scores if you want to "kill off" your selection (which is likely to be a keyboard with a high score).
I see no need to maintain two arrays. I think the simplest way is to maintain a single array of scores, which you then iterate through to make a choice:
/* These will need to be populated at the outset */
int scores[100];
int totalScore;

for (gen = 0; gen < nGenerations; ++gen) {
    /* Perform a selection and update */
    int r = rand() % totalScore; /* HACK: using % introduces bias */
    int t = 0;
    for (i = 0; i < 100; ++i) {
        t += scores[i];
        if (r < t) {
            /* Bingo! */
            totalScore -= scores[i];
            keyboards[i] = generate_new_keyboard_somehow();
            scores[i] = score_keyboard(keyboards[i]);
            totalScore += scores[i]; /* Now totalScore is correct again */
            break; /* stop after one replacement; without this, every
                      later keyboard would be replaced too, since r < t
                      stays true once it first becomes true */
        }
    }
}
Each selection/update takes O(n) time for n keyboards.
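For completeness, the initial population might be set up along these lines (a sketch using the same hypothetical helpers as the loop above):

/* Populate keyboards, scores and totalScore before the generation loop */
totalScore = 0;
for (i = 0; i < 100; ++i) {
    keyboards[i] = generate_new_keyboard_somehow();
    scores[i] = score_keyboard(keyboards[i]);
    totalScore += scores[i];
}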
