I am writing a piece of C code to solve the Laplace equation at a boundary with charge density. It uses the following loop:
chargeold[i] = charge[i];
charge[i] = -0.05*sgn(mat[i][j])*mat[i][j]*mat[i][j];
charge[i] = (1.0 - alpha)*charge[i] + alpha*chargeold[i];
mat[i][j] = (( mat[i][j-1] + di2*mat[i][j+1] ) / ( di2 + 1.0)) + 80.0*charge[i];
where the constant alpha is an under-relaxation parameter, 0 < alpha < 1. This hasn't worked, and I think it may be due to numerical instability in the code. So far I have tried changing the constants (here -0.05, alpha and 80.0) and the signs in the lines that calculate charge and mat[i][j], and I get wildly different results depending on what I put in. For example, changing the coefficient of charge[i] on the last line by just 10.0 can cause the program to get trapped inside a loop, diverge to infinity or -infinity, or quickly converge to 0 (which is not expected). This suggests to me a problem with the code I have written.
I have also tried condensing the calculation down into one line, or doing the same steps on different lines, and these also change the result wildly.
Any help on this would be appreciated. Thanks.
n.b. all data types are double
EDIT - full loop looks like:
do
{
sum_matdiff = 0;
for (i = 1; i < meshno; i++)
{
for (j = 1; j < meshno; j++)
{
if (bound[i][j] == 1) // holds boundary conditions
continue;
else
matold[i][j] = mat[i][j];
if ((i + j + count) % 2 == 1)
{
continue;
}
else if (j == (int)(0.3 * meshno))
{ // if statement to calculate at boundary
chargeold[i] = charge[i];
charge[i] = -0.05 * sgn(mat[i][j]) * mat[i][j] * mat[i][j];
charge[i] = (1.0 - alpha) * charge[i] + alpha * chargeold[i];
mat[i][j] =
((mat[i][j - 1] + di2 * mat[i][j + 1]) / (di2 + 1.0)) +
80.0 * charge[i];
if (i == 50)
{
printf("%f\n", charge[50]);
}
}
else
{ // calculates outside boundary
omega = 1.0 / (1.0 - 0.25 * omega * rho_sq);
mat[i][j] =
0.25 * (mat[i + 1][j] + mat[i - 1][j] + mat[i][j + 1] +
mat[i][j - 1] + (mat[i + 1][j] - mat[i - 1][j]) / (2 * i));
}
mat[i][j] = (1.0 - omega) * matold[i][j] + omega * mat[i][j];
sum_matdiff += fabs(1.0 - matold[i][j] / mat[i][j]);
}
}
count += 1;
av_diff = sum_matdiff / N;
}
while (av_diff > 0.01 || count < meshno * 2);
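For reference, the effect of the under-relaxation step on its own can be seen on a scalar model problem. This is only a sketch under stated assumptions: g and relaxed_fixed_point are hypothetical names that do not appear in the code above, and the model g(x) = -2x + 3 is chosen purely because plain iteration on it diverges. The blend uses the same convention as the loop above, where alpha weights the old value.

```c
#include <math.h>

/* Hypothetical model problem: x = g(x) with g(x) = -2x + 3.
   The fixed point is x = 1, and plain iteration diverges because |g'(x)| = 2. */
static double g(double x)
{
    return -2.0 * x + 3.0;
}

/* Under-relaxed fixed-point iteration.
   alpha weights the OLD value and (1 - alpha) weights the NEW one,
   matching the charge[i] update in the loop above. */
double relaxed_fixed_point(double x0, double alpha, int iterations)
{
    double x = x0;
    for (int k = 0; k < iterations; k++) {
        double x_old = x;
        double x_new = g(x_old);
        x = (1.0 - alpha) * x_new + alpha * x_old;
    }
    return x;
}
```

For this model the error is multiplied by (3 * alpha - 2) on every step, so the iteration only settles for 1/3 < alpha < 1. With alpha = 0.6 it converges to 1, while alpha = 0 (no relaxation) blows up. Small changes to a coefficient can move a scheme like this across the stability limit, which is consistent with the behaviour described above, although whether that is the cause here is not established.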
I have a simple for loop that I want to parallelize using Rayon in Rust. However I am stuck at applying the boundary conditions.
The loop in Rust is:
let mut u: Vec<Vec<f64>> = vec![vec!(10.0; x.len()); y.len()]; // Set initial condition = u(x,0) = 10
let mut u_new: Vec<Vec<f64>> = vec![vec!(0.0; x.len()); y.len()]; // Set initial condition = u_new(x,0) = 0
for t in 1..N_t - 1 {
for i in 1..N_x - 1 {
for j in 1..N_y - 1 {
u_new[i][j] = u[i][j]
+ (dt / (dx * dx)) * (u[i - 1][j] + u[i + 1][j] - 2.0 * u[i][j])
+ (dt / (dy * dy)) * (u[i][j - 1] + u[i][j + 1] - 2.0 * u[i][j]);
u_new[0][j] = 0.0;
u_new[N_x - 1][j] = 0.0;
u_new[i][0] = 0.0;
u_new[i][N_y - 1] = 0.0;
}
}
u = u_new.clone();
}
I managed to parallelize the loop with Rayon, but I am not able to add the boundary conditions:
for t in 1..N_t - 1 {
u_new.par_iter_mut().enumerate().for_each(|(i, r)| {
if (i != 0) && (i != N_x - 1) {
for (j, c) in r.iter_mut().enumerate() {
if (j != 0) && (j != N_y - 1) {
*c = u[i][j]
+ (dt / (dx * dx)) * (u[i - 1][j] + u[i + 1][j] - 2.0 * u[i][j])
+ (dt / (dy * dy)) * (u[i][j - 1] + u[i][j + 1] - 2.0 * u[i][j]);
}
}
}
});
// TODO: add boundary conditions
u = u_new.clone();
}
I want to add the boundary conditions in parallel using Rayon parallel iteration. Any ideas please?
So I'm making a simple oscilloscope in C. It reads audio data from the output buffer (and resets the buffer write counter when called so the buffer is refreshed). I tried implementing simple zero-cross triggering, since most of the time users will see simple (sine, pulse, saw, triangle) waves, but the best result I got with the code below is a wave that jumps back and forth for half of its cycle. What is wrong?
Signal that is fed in goes from -32768 to 32767 so zero is where it should be.
If you didn't understand what I meant you can see the video: click
Upd: Removed the code unrelated to triggering so the whole function is easier to understand.
extern Mused mused;
void update_oscillscope_view(GfxDomain *dest, const SDL_Rect* area)
{
if (mused.output_buffer_counter >= OSC_SIZE * 12) {
mused.output_buffer_counter = 0;
}
for (int x = 0; x < area->h * 0.5; x++) {
//drawing a black rect so bevel is hidden when it is under oscilloscope
gfx_line(domain,
area->x, area->y + 2 * x,
area->x + area->w - 1, area->y + 2 * x,
colors[COLOR_WAVETABLE_BACKGROUND]);
}
Sint32 sample, last_sample, scaled_sample;
for (int i = 0; i < 2048; i++) {
if (mused.output_buffer[i] < 0 && mused.output_buffer[i - 1] > 0) {
//here comes the part with triggering
if (i < OSC_SIZE * 2) {
for (int x = i; x < area->w + i; ++x) {
last_sample = scaled_sample;
sample = (mused.output_buffer[2 * x] + mused.output_buffer[2 * x + 1]) / 2;
if (sample > OSC_MAX_CLAMP) { sample = OSC_MAX_CLAMP; }
if (sample < -OSC_MAX_CLAMP) { sample = -OSC_MAX_CLAMP; }
if (last_sample > OSC_MAX_CLAMP) { last_sample = OSC_MAX_CLAMP; }
if (last_sample < -OSC_MAX_CLAMP) { last_sample = -OSC_MAX_CLAMP; }
scaled_sample = (sample * OSC_SIZE) / 32768;
if(x != i) {
gfx_line(domain,
area->x + x - i - 1, area->h / 2 + area->y + last_sample,
area->x + x - i, area->h / 2 + area->y + scaled_sample,
colors[COLOR_WAVETABLE_SAMPLE]);
}
}
}
return;
}
}
}
During debugging, I simplified the code until it started working. Thanks Clifford.
I found a trigger index i (let's say it is array index 300). I had modified the code so that the oscilloscope was drawing lines from [(2 * i) + offset] to [(2 * i + 1) + offset], and thus an incorrect picture was formed.
I used (2 * i) because I wanted long waves to fit into the oscilloscope. I replaced it with drawing from [i + offset] to [i + 1 + offset], and that solved the problem.
Afterwards, I implemented "horizontal scale 0.5x" properly.
The output waveform still jumps a little, but overall it holds it in place.
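The trigger search on its own can be sketched like this. It is a minimal illustration, not the code from the project: find_falling_zero_cross is a hypothetical name, and the buffer is assumed to hold signed 16-bit samples because the signal is said to range from -32768 to 32767. The search starts at index 1 so that the previous-sample lookup never reads before the start of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first falling zero crossing, meaning the previous
   sample is > 0 and the current sample is < 0, or -1 if there is none.
   The loop starts at 1 so that buf[i - 1] is always a valid element. */
long find_falling_zero_cross(const int16_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        if (buf[i] < 0 && buf[i - 1] > 0) {
            return (long)i;
        }
    }
    return -1;
}
```

Drawing would then read consecutive samples buf[trigger + x] for each pixel column x, using the same stride for the trigger search and for the drawing loop.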
I have a 3-stage biquad filter to filter the input signal data (input). While optimizing, I unrolled the entire loop and used the load (vld1_s32) and multiply-accumulate (vmlal_s32) intrinsics, thinking that this optimization would reduce the time the CPU spends on the code. But I didn't get better results. Is it possible to optimize the following code in any other way using ARM NEON intrinsics (SIMD architecture)?
Note: Values of buffer (x) and coefficients (h) are 32-bit integers.
long long int sum;
for(i1=0;i1<100;i1++)
{
temp = input[i1];//32 bit (int) datatype
for(i=0;i<3;i++)
{
sum = x[1+2*i] * h[1+5*i];
sum += x[3+2*i] * h[3+5*i];
sum = sum<<1; //center tapping
sum += x[2+2*i] * h[2+5*i];
sum += x[4+2*i] * h[4+5*i];
sum += temp * h[0+5*i];
x[2+2*i] = x[1+2*i];
x[1+2*i] = temp;
temp = sum>>31;
}
printf("%d\n",temp);
}
Optimized code (I believe it's only a partial optimization):
int32x2_t x_vec,h_vec;
int64x2_t result_vec;
for(i1=0;i1<100;i1++)
{
temp =input[i1];
for (i = 0; i < 3; i++)
{
result_vec = vdupq_n_s64(0);
for (j = 0; j < 2; j++) // 5 coefficients in total for a biquad - 4 used here
{
x_vec = vld1_s32(&x[1 + 2 * i + 2 * j]);
h_vec = vld1_s32(&h[1 + 5 * i + 2 * j]);
result_vec = vmlal_s32(result_vec, x_vec, h_vec);
}
sum = vgetq_lane_s64(result_vec, 0) + vgetq_lane_s64(result_vec, 1); // horizontal add of the two 64-bit lanes
sum = sum + temp * h[0 + 5 * i]; //- remaining one used here
sum += x[1 + 2 * i] * h[1 + 5 * i];//adding one more time
sum += x[3 + 2 * i] * h[3 + 5 * i];//adding one more time
x[2 + 2 * i] = x[1 + 2 * i];
x[1 + 2 * i] = temp;
temp = sum>>31;
}
printf("%d\n",temp);
}
I would like to evaluate Pi approximately by running the following code which fits a regular polygon of n sides inside a circle with unit diameter and calculates its perimeter using the function in the code. However the output after the 34th term is 0 when long double variable type is used or it increases without bounds when double variable type is used. How can I remedy this situation? Any suggestion or help is appreciated and welcome.
Thanks
P.S: Operating system: Ubuntu 12.04 LTS 32-bit, Compiler: GCC 4.6.3
#include <stdio.h>
#include <math.h>
#include <limits.h>
#include <stdlib.h>
#define increment 0.25
int main()
{
int i = 0, k = 0, n[6] = {3, 6, 12, 24, 48, 96};
double per[61] = {0}, per2[6] = {0};
// Since the above algorithm is recursive we need to specify the perimeter for n = 3;
per[3] = 0.5 * 3 * sqrtl(3);
for(i = 3; i <= 60; i++)
{
per[i + 1] = powl(2, i) * sqrtl(2 * (1.0 - sqrtl(1.0 - (per[i] / powl(2, i)) * (per[i] / powl(2, i)))));
printf("%d %f \n", i, per[i]);
}
return 0;
for(k = 0; k < 6; k++)
{
//p[k] = k
}
}
Some ideas:
Use y = (1.0 - x)*( 1.0 + x) instead of y = 1.0 - x*x. This helps with 1 stage of "subtraction of nearly equal values", but I am still stuck on the next 1.0 - sqrtl(y) as y approaches 1.0.
// per[i + 1] = powl(2, i) * sqrtl(2 * (1.0 - sqrtl(1.0 - (per[i] / powl(2, i)) * (per[i] / powl(2, i)))));
long double p = powl(2, i);
// per[i + 1] = p * sqrtl(2 * (1.0 - sqrtl(1.0 - (per[i] / p) * (per[i] / p))));
long double x = per[i] / p;
// per[i + 1] = p * sqrtl(2 * (1.0 - sqrtl(1.0 - x * x)));
// per[i + 1] = p * sqrtl(2 * (1.0 - sqrtl((1.0 - x)*(1.0 + x)) ));
long double y = (1.0 - x)*( 1.0 + x);
per[i + 1] = p * sqrtl(2 * (1.0 - sqrtl(y) ));
Change array size or for()
double per[61+1] = { 0 }; // Add 1 here
...
for (i = 3; i <= 60; i++) {
...
per[i + 1] =
Following is a similar method for pi
unsigned n = 6;
double sine = 0.5;
double cosine = sqrt(0.75);
double pi = n*sine;
static const double mpi = 3.1415926535897932384626433832795;
do {
sine = sqrt((1 - cosine)/2);
cosine = sqrt((1 + cosine)/2);
n *= 2;
pi = n*sine;
printf("%6u s:%.17e c:%.17e pi:%.17e %%:%.6e\n", n, sine, cosine, pi, (pi-mpi)/mpi);
} while (n <500000);
Subtracting 1.0 from a nearly-1.0 number is leading to "catastrophic cancellation", where the relative error in a FP calculation skyrockets due to the loss of significant digits. Try evaluating pow(2, i) - (pow(2, i) - 1.0) for each i between 0 and 60 and you'll see what I mean.
The only real solution to this issue is reorganizing your equations to avoid subtracting nearly-equal nonzero quantities. For more details, see Acton, Real Computing Made Real, or Higham, Accuracy and Stability of Numerical Algorithms.
I have a problem: I'm stuck with an underflow issue in my algorithm.
I'm basically designing a path from a Bezier curve, and to do this I had to work with vector multiplication (cross and dot products) to get the angle between two vectors and the clockwise or counterclockwise direction from one to the other.
The problem is that when the path is a straight line, one of the control variables underflows, which blocks execution and causes errors.
Here is the code:
void BezierInterp() {
NumOfSetpoints = 10;
float seqTH[11];
float orient[10];
float divider;
math.MatrixMult((float*) BCoeff, (float*) waypointX, 11, 4, 1,
(float*) setpoint0);
math.MatrixMult((float*) BCoeff, (float*) waypointY, 11, 4, 1,
(float*) setpoint1);
float dx1, dy1, dx2, dy2, dxy1, dxy2, dir;
dx1 = cos(state[2]);
dy1 = sin(state[2]);
dx2 = setpoint0[1] - setpoint0[0];
dy2 = setpoint1[1] - setpoint1[0];
dxy2 = sqrt(sq(dx2) + sq(dy2));
dir = dx1 * dy2 - dx2 * dy1;
if (dxy2<0.0001 && dxy2>-0.0001) {
seqTH[0] = 0.0;
}
else{
if (dir >= 0) {
seqTH[0] = acos((dx1 * dx2 + dy1 * dy2) / (dxy2));
} else {
seqTH[0] = -acos((dx1 * dx2 + dy1 * dy2) / (dxy2));
}}
for (uint8_t i = 1; i <= 9; i = i + 1) {
dx2 = setpoint0[i + 1] - setpoint0[i];
dy2 = setpoint1[i + 1] - setpoint1[i];
dxy2 = sqrt(sq(dx2) + sq(dy2));
dx1 = setpoint0[i] - setpoint0[i - 1];
dy1 = setpoint1[i] - setpoint1[i - 1];
dxy1 = sqrt(sq(dx1) + sq(dy1));
dir = dx1 * dy2 - dx2 * dy1;
divider= dxy1 * dxy2;
if (divider<0.0001 && divider>-0.0001) {
seqTH[0] = 0.0;
}
else {
if (dir >= 0) {
seqTH[i] = acos((dx1 * dx2 + dy1 * dy2) / (divider));
} else {
seqTH[i] = -acos((dx1 * dx2 + dy1 * dy2) / (divider));
}}
}
print_array("seqTh", seqTH, 11, 6);
orient[0] = state[2] + seqTH[0];
if (orient[0]<0.0001 && orient[0]>-0.0001){orient[0]=0.0001;}
for (uint8_t i = 1; i <= 9; i = i + 1) {
orient[i] = orient[i - 1] + seqTH[i];
if (orient[i]<0.0001 && orient[i]>-0.0001){orient[i]=0.0001;}
}
print_array("orient", orient, 10, 6);
for (uint8_t i = 1; i <= 9; i = i + 1) {
setpoint2[i] = orient[i - 1];
setpoint3[i] = Vref * cos(orient[i - 1]);
setpoint4[i] = Vref * sin(orient[i - 1]);
}
setpoint2[10] = orient[9];
setpoint3[10] = 0;
setpoint4[10] = 0;
setpoint5[10] = 0;
}
}
As you can see, I added several if conditions in an attempt to avoid the error, but that was not enough.
The problem probably comes from dir = dx1 * dy2 - dx2 * dy1; which, when moving along the x or y axis, is too small to be represented as a float.
A friend suggested using a boolean value, but I'm not sure how.
Maybe by defining boolean dir; so that a value that is too small becomes 0 and anything else becomes 1, in which case I could use the same procedure I'm using now to detect the direction.
Do you have any suggestions, or maybe a different solution?
Thanks in advance
Ned
I'm not familiar with the method you're using, but when I've done this in the past I've detected the degenerate case of the Bezier (where the two end points and two control points fall on a straight line) as a special case.
This is also much faster to draw of course.