C - Haversine formula slightly off

My output seems to be slightly off when calculating the distance between two lat/lon coordinates and I can't seem to work out why. Below is my code (an implementation of the Haversine formula):
float calcDistance(double latHome, double lonHome, double latDest, double lonDest) {
double pi = 3.141592653589793;
int R = 6371; //Radius of the Earth
latHome = (pi/180)*(latHome);
latDest = (pi/180)*(latDest);
double differenceLon = (pi/180)*(lonDest - lonHome);
double differenceLat = (pi/180)*(latDest - latHome);
double a = sin(differenceLat/2) * sin(differenceLat/2) +
cos(latHome) * cos(latDest) *
sin(differenceLon/2) * sin(differenceLon/2);
double c = 2 * atan2(sqrt(a), sqrt(1-a));
double distance = R * c;
printf("%f\n", distance);
return distance;
}
Input: 38.898556 -77.037852 38.897147 -77.043934
Output: 0.526339
Supposed to get 0.5492

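The only thing I can think of (and fixing it gives the expected result) is that you overwrite the parameters latHome and latDest with their values in radians, and then use the overwritten values when calculating differenceLat, so the degree-to-radian conversion is applied to the latitude difference twice. differenceLon is unaffected because lonHome and lonDest are never overwritten. Use different names for the converted values, like latHomeTmp, keep computing differenceLat from the original values in degrees, then calculate a with the new variables, and it will work, like: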
double latHomeTmp = (pi/180)*(latHome);
double latDestTmp = (pi/180)*(latDest);
double a = sin(differenceLat/2.) * sin(differenceLat/2.) +
cos(latHomeTmp) * cos(latDestTmp) *
sin(differenceLon/2.) * sin(differenceLon/2.);
In general it is a good idea to keep the parameters passed to a function immutable, to avoid situations like this one (of course, not a set in stone rule, but I usually obey it).

Related

If I run a loop a million times, do I have to worry about declaring doubles in each iteration?

I run a loop a million times. Within the loop I call a C function to do some math (generating random variables from various distributions, to be exact). As part of that function, I declare a couple of double variables to hold parts of the transformation. An example:
void getRandNorm(double *randnorm, double mean, double var, int n)
{
// Declare variables
double u1;
double u2;
int arrptr = 0;
double sigma = sqrt(var); // the standard deviation
while (arrptr < n) {
// Generate two uniform random variables
u1 = rand() / (double)RAND_MAX;
u2 = rand() / (double)RAND_MAX;
// Box-Muller transform
randnorm[arrptr] = sqrt(-2*log(u1))*cos(2*pi*u2)*sigma+mean;
arrptr++;
if (arrptr < n) { // for an odd n, we cannot add off the end
randnorm[arrptr] = sqrt(-2*log(u2))*cos(2*pi*u1)*sigma+mean;
arrptr++;
}
}
}
And the calling loop:
iter = 1000000; // or something
for (i = 0; i < iter; i++) {
// lots of if statements
getRandNorm(sample1, truemean1, truevar1, n);
// some more analysis
}
I am working on speeding up the runtime. It occurs to me that I don't know what is happening with all these double variables that I am declaring. I assume a new 8 byte chunk of memory is allocated for the double for each of the one million loops. What happens to all those memory locations? They are declared within a C function; do they survive that function? Are they still locked up until the script exits?
The context for this question is wrapping this C program into a python function. If I'm going to execute this function multiple times in parallel from python, I want to be sure that I'm being as thrifty with memory usage as possible.
If you're talking about something like this:
for(int i=0;i<100000;i++){
double d = 5;
// some other stuff here
}
d is only allocated once by the compiler. It's mostly equivalent to declaring it above the for loop, except that the scope doesn't extend as far.
However, if you are doing something like this:
for(int i=0;i<1000000;i++){
double *d = malloc(sizeof(double));
free(d);
}
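Then yes, you will allocate a double 1 million times, but the allocator will likely reuse the same memory for subsequent allocations. Finally, if you don't free the memory in my second example, you'll leak roughly 16-32MB of memory (each double needs 8 bytes, and the allocator typically adds its own overhead per block).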
The short answer is: NO, it should not matter if you declare these double variables inside the loop in C. By double variable, I assume you mean variables of type double.
The long answer is: Please post your code so people can tell you if you do something wrong and how to fix it to improve correctness and/or performance (a vast subject).
The final answer is: with the code provided, it makes no difference whether you declare u1 and u2 inside the body of the loop or outside. A good compiler will likely generate the same code.
You can improve the code a tiny bit by testing the odd case just once:
void getRandNorm(double *randnorm, double mean, double var, int n, double pi) {
// Declare variables
double u1, u2;
double sigma = sqrt(var); // the standard deviation
int arrptr, odd;
odd = n & 1; // check if n is odd
n -= odd; // make n even
for (arrptr = 0; arrptr < n; arrptr += 2) {
// Generate two uniform random variables
u1 = rand() / (double)RAND_MAX;
u2 = rand() / (double)RAND_MAX;
// Box-Muller transform
randnorm[arrptr + 0] = sqrt(-2*log(u1)) * cos(2*pi*u2) * sigma + mean;
randnorm[arrptr + 1] = sqrt(-2*log(u2)) * cos(2*pi*u1) * sigma + mean;
}
if (odd) {
u1 = rand() / (double)RAND_MAX;
u2 = rand() / (double)RAND_MAX;
randnorm[arrptr++] = sqrt(-2*log(u1)) * cos(2*pi*u2) * sigma + mean;
}
}
Note: arrptr + 0 is here for symmetry, the compiler will not generate any code for this addition.
Regarding your question: If I run a loop a million times, do I have to worry about declaring doubles in each iteration?
The variables are being declared on the stack. So they 'disappear' when the function exits. The next execution of the function 're-creates' the variables, so (in reality) there is only a single instance of the variables and even then, only while the function is being executed.
So it does not matter how many times you call the function.

1D Adaptive Mesh Refinement in C

I am a Physics PhD student currently entering the field of numerical relativity and I have to implement an adaptive mesh refinement code in 2 dimensions. As with every other bit of my program, I usually prefer to do something much simpler to understand what is going on before jumping to a more sophisticated case. However, I still seem to be doing something fundamentally wrong.
My code performs (or at least should perform) the following procedure: I discretize the x-axis in N intervals of size h. Every time a point is computed, the program stops and computes that point again by changing the interval of size h to another interval with two steps h/2. The program checks if the results are below some user specified tolerance and, if not, the process starts again with step size h/4 and so on. The following sketch illustrates the procedure
After the refinement function acts, I have absolutely no interest in keeping the values of the function on the refined grids. All I want is to compute the function on the coarse grid with maximum accuracy (in the image all I want to keep - and change - are the values of the black dots of the coarse [base] grid).
Unfortunately I see no improvement on the solution after the refinement algorithm is passed. I do not expect the plot of the function to be perfect, but I expect every point to be very close to the analytic solution. This is my refinement function (the function is called recursively until a maximum level of refinement - user specified - is reached):
void refine( int l, long double dx, long double x_min, long double x_max, long double f_min, long double *f_max ){
// l = level of refinement, dx = step size, x_min is current x position, x_max = point we want to calculate, f_min = function evaluated at x_min, f_max = function evaluated at x_max
int i;
long double *f_aux, f_point;
f_aux = (long double *) malloc ( (2*l + 1) * sizeof (long double) );
dx = 0.5 * dx;
f_aux[0] = f_min;
for( i=1; i<2*l+1; i++ ){
f_aux[i] = ( 1.0 - 2.0 * dx * ( x_min + (i-1)*dx - X0 ) / DELTA ) * f_aux[i-1];
}
if( l < lMAX ){
if( fabs( f_aux[2*l] - *f_max ) > TOL ){
f_point = f_aux[2*l];
free( f_aux );
l++;
refine( l, dx, x_min+dx, x_max, f_min, &f_point );
}
else{
*f_max = f_aux[2*l];
free( f_aux );
}
}
else{
*f_max = f_aux[2*l];
free( f_aux );
}
return;
}
Can anyone shed some light on the problem? I feel completely stuck.
Thanks in advance!
It looks like your iteration refines the grid around the last point, but the grid is still coarse around the start point:
*---------------* refine 0
*-------*-------* refine 1
*-------*---*---* refine 2
*-------*---*-*-* refine 3
And as your equation looks hyperbolic (each value depends on all previous iterations), the solution error will be cumulative in nature. I think you should iterate with the fine grid from the beginning to get a proper solution, not only in the neighborhood of the last point. E.g., for the simple equation df/dx = f with an implementation such as:
float update(float f_prev, float dx, float /* unused */ target_dx) {
return f_prev*dx + f_prev;
}
int main(void)
{
int n = 8;
float dx = 2.f/(float)(n-1);
float f[n];
f[0] = 1.f;
for (int i = 1; i < n; i++) {
f[i] = update(f[i-1], dx, dx / 16.f);
}
return 0;
}
I would go with simple recursive formula:
float update_recursive(float f_prev, float dx, float target_dx) {
if (dx > target_dx) {
return update_recursive(
update_recursive(f_prev, dx / 2.f, target_dx), dx / 2.f, target_dx
);
}
return dx * f_prev + f_prev;
}
This approach improves the quality of the solution. The terminating condition could adapt to the solution better than a fixed target_dx does. Of course, you need to ensure that the recursion is well bounded.
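To see the effect, the two update rules can be compared against the exact solution exp(x) of df/dx = f with f(0) = 1. The following is a minimal sketch under that assumption; the function integrate and its refine_factor parameter are hypothetical helpers introduced here, where a refine_factor of 1 reproduces the plain coarse Euler step.

```c
#include <math.h>

/* One explicit Euler step for df/dx = f. */
static float update(float f_prev, float dx)
{
    return f_prev * dx + f_prev;
}

/* Halve the step until it is no larger than target_dx, then do Euler steps. */
static float update_recursive(float f_prev, float dx, float target_dx)
{
    if (dx > target_dx) {
        float half = update_recursive(f_prev, dx / 2.f, target_dx);
        return update_recursive(half, dx / 2.f, target_dx);
    }
    return update(f_prev, dx);
}

/* Integrate from x = 0 to x_end on n coarse grid points, starting at f = 1.
 * Every coarse interval is subdivided into steps of size dx / refine_factor
 * (hypothetical helper; refine_factor should be a power of two). */
float integrate(int n, float x_end, float refine_factor)
{
    float dx = x_end / (float)(n - 1);
    float f = 1.f;
    for (int i = 1; i < n; i++) {
        f = update_recursive(f, dx, dx / refine_factor);
    }
    return f;
}
```

Comparing integrate(8, 2.f, 1.f) and integrate(8, 2.f, 16.f) with expf(2.f) should show the refined version landing much closer to the exact value, because every coarse interval is refined and not only the last one.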

How to write this function C

Hi, I am new to C programming. I am just trying to replace part of my code with a function call, but I don't know how to do it properly. Please help.
I just want the line d = ... to be equivalent to the line e = ...
#include <stdio.h>
#include <math.h>
double dist(int i, int j, double v[100][2])
{
return sqrt( pow((v[j][0] - v[i][0]),2) + pow((v[j][1] - v[i][1]), 2) )
}
main()
{
double v[100][2], d, e;
v[1][0] = 0;
v[1][1] = 1;
v[2][0] = 1;
v[2][1] = 1;
d = sqrt( pow((v[1][0] - v[2][0]),2) + pow((v[1][1] - v[2][1]), 2) );
e = dist(1,2,v);
printf("\n%f\n",d);
printf("\n%f\n",e);
}
double dist(int i, int j, double (*v)[2])
{
return sqrt( pow((v[j][0] - v[i][0]),2) + pow((v[j][1] - v[i][1]), 2) );
}
d = dist(0,1,v)
Or dist(1,0,v)
Distance between point 0 and point 1 ... Order does not matter.
EDIT: What I have above is a function CALL, as requested. d= is equivalent to e= ... to write another function is quite the waste of code and more importantly, not a realization of what a function is used for. I stick by my answer.
If you wanted the same thing for different types, you can use a macro (not recommended for this case since a decent compiler will inline the function call to Cato's function) but just for educational purposes
#define dist(i,j,v) sqrt(pow((v[j][0]-v[i][0]),2)+pow((v[j][1]-v[i][1]),2))
Just keep in mind that sqrt returns a double, so if you want float or long double returns, you'll need sqrtf or sqrtl.
The advantage to using macros for mathematical "functions" is that they get expanded out into the code prior to compile such that constants can be evaluated into the computation and can sometimes reduce the entire calculation down to a much simpler computation or sometimes even a constant value.
Mike is correct on the mathematical properties, though precision may cause the 2 values to differ slightly (usually this difference is unwanted).

How to speed up this mex code?

I am reprogramming a piece of MATLAB code in mex (using C). So far my C version is about twice as fast as the MATLAB code. Now I have three questions, all related to the code below:
How can I speed up this code more?
Do you see any problems with this code? I ask this because I don't know mex very well and I am also not a C guru ;-) ... I am aware that there should be some checks in the code (for example, whether there is still heap space when using realloc), but I left these out for the sake of simplicity for the moment.
Is it possible, that MATLAB is optimizing so well, that I really can't get much more than twice as fast code in C...?
The code should be more or less platform independent (Win, Linux, Unix, Mac, different hardware), so I don't want to use assembler or specific linear algebra libraries. That's why I programmed the stuff myself...
#include <mex.h>
#include <math.h>
#include <matrix.h>
void mexFunction(
int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
double epsilon = ((double)(mxGetScalar(prhs[0])));
int strengthDim = ((int)(mxGetScalar(prhs[1])));
int lenPartMat = ((int)(mxGetScalar(prhs[2])));
int numParts = ((int)(mxGetScalar(prhs[3])));
double *partMat = mxGetPr(prhs[4]);
const mxArray* verletListCells = prhs[5];
mxArray *verletList;
double *pseSum = (double *) malloc(numParts * sizeof(double));
for(int i = 0; i < numParts; i++) pseSum[i] = 0.0;
float *tempVar = NULL;
for(int i = 0; i < numParts; i++)
{
verletList = mxGetCell(verletListCells,i);
int numberVerlet = mxGetM(verletList);
tempVar = (float *) realloc(tempVar, numberVerlet * sizeof(float) * 2);
for(int a = 0; a < numberVerlet; a++)
{
tempVar[a*2] = partMat[((int) (*(mxGetPr(verletList) + a))) - 1] - partMat[i];
tempVar[a*2 + 1] = partMat[((int) (*(mxGetPr(verletList) + a))) - 1 + lenPartMat] - partMat[i + lenPartMat];
tempVar[a*2] = pow(tempVar[a*2],2);
tempVar[a*2 + 1] = pow(tempVar[a*2 + 1],2);
tempVar[a*2] = tempVar[a*2] + tempVar[a*2 + 1];
tempVar[a*2] = sqrt(tempVar[a*2]);
tempVar[a*2] = 4.0/(pow(epsilon,2) * M_PI) * exp(-(pow((tempVar[a*2]/epsilon),2)));
pseSum[i] = pseSum[i] + ((partMat[((int) (*(mxGetPr(verletList) + a))) - 1 + 2*lenPartMat] - partMat[i + (2 * lenPartMat)]) * tempVar[a*2]);
}
}
plhs[0] = mxCreateDoubleMatrix(numParts,1,mxREAL);
for(int a = 0; a < numParts; a++)
{
*(mxGetPr(plhs[0]) + a) = pseSum[a];
}
free(tempVar);
free(pseSum);
}
So this is the improved version, which is about 12 times faster than the MATLAB version. The conversion is still eating up much time, but I will leave that aside for now, because I have to change something in MATLAB for it. So first I focus on the remaining C code. Do you see any more potential in the following code?
#include <mex.h>
#include <math.h>
#include <matrix.h>
void mexFunction(
int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
double epsilon = ((double)(mxGetScalar(prhs[0])));
int strengthDim = ((int)(mxGetScalar(prhs[1])));
int lenPartMat = ((int)(mxGetScalar(prhs[2])));
double *partMat = mxGetPr(prhs[3]);
const mxArray* verletListCells = prhs[4];
int numParts = mxGetM(verletListCells);
mxArray *verletList;
plhs[0] = mxCreateDoubleMatrix(numParts,1,mxREAL);
double *pseSum = mxGetPr(plhs[0]);
double epsilonSquared = epsilon*epsilon;
double preConst = 4.0/((epsilonSquared) * M_PI);
int numberVerlet = 0;
double tempVar[2];
for(int i = 0; i < numParts; i++)
{
verletList = mxGetCell(verletListCells,i);
double *verletListPtr = mxGetPr(verletList);
numberVerlet = mxGetM(verletList);
for(int a = 0; a < numberVerlet; a++)
{
int adress = ((int) (*(verletListPtr + a))) - 1;
tempVar[0] = partMat[adress] - partMat[i];
tempVar[1] = partMat[adress + lenPartMat] - partMat[i + lenPartMat];
tempVar[0] = tempVar[0]*tempVar[0] + tempVar[1]*tempVar[1];
tempVar[0] = preConst * exp(-(tempVar[0]/epsilonSquared));
pseSum[i] += ((partMat[adress + 2*lenPartMat] - partMat[i + (2*lenPartMat)]) * tempVar[0]);
}
}
}
You do not need to allocate the pseSum for local use and then later copy the data to the output. You can simply allocate a MATLAB object and get the pointer to the memory :
plhs[0] = mxCreateDoubleMatrix(numParts,1,mxREAL);
pseSum = mxGetPr(plhs[0]);
Thus you will not have to initialize pseSum to 0, because MATLAB already does it in mxCreateDoubleMatrix.
Remove all the mxGetPr from the inner loop and assign them to variables before.
Instead of casting doubles to ints consider using int32 or uint32 arrays in MATLAB. Casting double to int is expensive. The internal loop computations would look like
tempVar[a*2] = partMat[somevar[a] - 1] - partMat[i];
You use such constructs in your code
((int) (*(mxGetPr(verletList) + a)))
You do it because verletList is a 'double' array (that is the case by default in MATLAB) which holds integer values. Instead, you should use an integer array. Before you call your mex file, type in MATLAB:
verletList = int32(verletList);
Then you will not need the type cast to int above. You will simply write
((int*)mxGetData(verletList))[a]
or better yet, assign earlier
somevar = (int*)mxGetData(verletList);
and later write
somevar[a]
Precompute 4.0/(pow(epsilon,2) * M_PI) before all loops! That is one expensive constant.
pow((tempVar[a*2]/epsilon),2) is simply tempVar[a*2]^2/epsilon^2. You calculate sqrt(tempVar[a*2]) just before. Why do you square it now?
Generally, do not use pow(x, 2). Just write x*x.
I would add some sanity checks on the parameters, especially if you demand integers. Either use MATLABs int32/uint32 type, or check that what you get actually is an integer.
Edit, regarding the new code:
Compute -1/epsilonSquared before the loops and compute exp(minvepssq*tempVar[0]). Note that the result might differ slightly. It depends on what you need, but if you don't care about the exact order of operations, do it.
Define a register variable pseSum_r and use it to sum the results in the inner loop. After the loop, assign it to pseSum[i]. If you want more fun, you can write the result to memory using an SSE streaming store (the _mm_stream_pd compiler intrinsic).
Do remove the double to int cast.
Most likely irrelevant, but try changing tempVar[0/1] to normal variables. Irrelevant, because the compiler should do that for you. But again, an array is not needed here.
Parallelise the outer loop with OpenMP. This is trivial (at least the simplest version, without thinking about data layout for NUMA architectures) since there is no dependence between the iterations.
Can you estimate ahead of time what will be the maximum size of tempVar and allocate memory for it before the loop instead of using realloc? Reallocating memory is a time consuming operation and if your numParts is large, this could have a huge impact. Take a look at this question.

C array of functions

I have a problem with a series of functions. I have an array of 'return values' (I compute them through matrices) from a single function sys, which depends on an integer variable, let's say j. I want to return them according to this j: if I want equation number j, for example, I just write sys(j).
For this I used a for loop, but I don't know if it's well defined, because when I run my code I don't get the right values.
Is there a better way to have an array of functions and call them in an easy way? That would make it easier to work with a function in a Runge-Kutta method to solve a differential equation.
I leave this part of the code here (c is just the integer j I used in the explanation above):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int N=3;
double s=10.;
//float r=28.;
double b=8.0/3.0;
/* Define functions */
double sys(int c,double r,double y[])
{
int l,m,n,p=0;
double tmp;
double t[3][3]={0};
double j[3][3]={{-s,s,0},{r-y[2],-1,-y[0]},{y[1],y[0],-b}}; //Jacobiano
double id[3][3] = { {y[3],y[6],y[9]} , {y[4],y[7],y[10]} , {y[5],y[8],y[11]} };
double flat[N*(N+1)];
// Multiplication of matrices J * Y
for(l=0;l<N;l++)
{
for(m=0;m<N;m++)
{
for(n=0;n<N;n++)
{
t[l][m] += j[l][n] * id[n][m];
}
}
}
// Transpose the matrix (J * Y) -> () t
for(l=0;l<N;l++)
{
for(m=l+1;m<N;m++)
{
tmp = t[l][m];
t[l][m] = t[m][l];
t[m][l] = tmp;
}
}
// We flatten the array to be left in one array
for(l=0;l<N;l++)
{
for(m=0;m<N;m++)
{
flat[p+N] = t[l][m];
}
}
flat[0] = s*(y[1]-y[0]);
flat[1] = y[0]*(r-y[2])-y[1];
flat[2] = y[0]*y[1]-b*y[2];
for(l=0;l<(N*(N+1));l++)
{
if(c==l)
{
return flat[c];
}
}
}
EDIT ----------------------------------------------------------------
OK, this is the part of the code where I use the function:
int main(){
output = fopen("lyapcoef.dat","w");
int j,k;
int N2 = N*N;
int NN = N*(N+1);
double r;
double rmax = 29;
double t = 0;
double dt = 0.05;
double tf = 50;
double z[NN]; // Temporary matrix for RK4
double k1[N2],k2[N2],k3[N2],k4[N2];
double y[NN]; // Matrix for all variables
/* Initial conditions */
double u[N];
double phi[N][N];
double phiu[N];
double norm;
double lyap;
//Here we integrate the system using Runge-Kutta of fourth order
for(r=28;r<rmax;r++){
y[0]=19;
y[1]=20;
y[2]=50;
for(j=N;j<NN;j++) y[j]=0;
for(j=N;j<NN;j=j+3) y[j]=1; // Identity matrix for y from 3 to 11
while(t<tf){
/* RK4 step 1 */
for(j=0;j<NN;j++){
k1[j] = sys(j,r,y)*dt;
z[j] = y[j] + k1[j]*0.5;
}
/* RK4 step 2 */
for(j=0;j<NN;j++){
k2[j] = sys(j,r,z)*dt;
z[j] = y[j] + k2[j]*0.5;
}
/* RK4 step 3 */
for(j=0;j<NN;j++){
k3[j] = sys(j,r,z)*dt;
z[j] = y[j] + k3[j];
}
/* RK4 step 4 */
for(j=0;j<NN;j++){
k4[j] = sys(j,r,z)*dt;
}
/* Updating y matrix with new values */
for(j=0;j<NN;j++){
y[j] += (k1[j]/6.0 + k2[j]/3.0 + k3[j]/3.0 + k4[j]/6.0);
}
printf("%lf %lf %lf \n",y[0],y[1],y[2]);
t += dt;
}
Since you're actually computing all these values at the same time, what you really want is for the function to return them all together. The easiest way to do this is to pass in a pointer to an array, into which the function will write the values. Or perhaps two arrays; it looks to me as if the output of your function is (conceptually) a 3x3 matrix together with a length-3 vector.
So the declaration of sys would look something like this:
void sys(double v[3], double JYt[3][3], double r, const double y[12]);
where v would end up containing the first three elements of your flat and JYt would contain the rest. (More informative names are probably possible.)
Incidentally, the for loop at the end of your code is exactly equivalent to just saying return flat[c]; except that if c happens not to be >=0 and <N*(N+1) then control will just fall off the end of your function. Using the return value in that case is undefined behavior, which in practice means that you will get some random number that almost certainly isn't what you want.
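The all-at-once signature suggested above could be filled in roughly as follows. This is a sketch under the assumption that y[3..11] holds the 3x3 matrix column by column, which is how the question's id matrix reads it; the constants s and b are copied from the question. It also writes each product directly into its transposed position, which sidesteps the flattening loop in the question, where the index p is never incremented, so every product is written to flat[N].

```c
/* Lorenz parameters, as in the question. */
static const double s = 10.0;
static const double b = 8.0 / 3.0;

/* v receives the three right-hand sides of the Lorenz system,
 * JYt receives the transpose of J * Y, where Y[n][m] = y[3 + n + 3*m]. */
void sys(double v[3], double JYt[3][3], double r, const double y[12])
{
    /* Jacobian of the Lorenz system at (y[0], y[1], y[2]). */
    const double J[3][3] = {
        { -s,        s,     0.0  },
        { r - y[2], -1.0,  -y[0] },
        { y[1],      y[0], -b    }
    };

    v[0] = s * (y[1] - y[0]);
    v[1] = y[0] * (r - y[2]) - y[1];
    v[2] = y[0] * y[1] - b * y[2];

    for (int l = 0; l < 3; l++) {
        for (int m = 0; m < 3; m++) {
            double sum = 0.0;
            for (int n = 0; n < 3; n++) {
                sum += J[l][n] * y[3 + n + 3 * m];
            }
            JYt[m][l] = sum;   /* store (J*Y)[l][m] transposed */
        }
    }
}
```

Each Runge-Kutta stage then calls sys once and reads all twelve values from v and JYt, instead of calling it twelve times and discarding eleven results each time.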
Your function sys() does an O(N^3) calculation to multiply two matrices, then does a couple of O(N^2) operations, and finally selects a single number to return. Then it is called the next time and goes through most of the same processing. It feels a tad wasteful unless (even if?) the matrices are really small.
The final loop in the function is a little odd, too:
for(l=0;l<(N*(N+1));l++)
{
if(c==l)
{
return flat[c];
}
}
Isn't that more simply written as:
return flat[c];
Or, perhaps:
if (c < N * (N+1))
return flat[c];
else
...do something on disastrous error other than fall off the end of the
...function without returning a value as the code currently does...
I don't see where you are selecting an algorithm by the value of j. If that's what you're trying to describe, in C you can have an array of pointers to functions; you could use a numerical index to choose a function from the array, but you can also pass a pointer-to-a-function to another function that will call it.
That said: Judging from your code, you should keep it simple. If you want to use a number to control which code gets executed, just use an if or switch statement.
switch (c) {
case 0:
/* Algorithm 0 */
break;
case 1:
/* Algorithm 1 */
etc.
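For comparison, the array of pointers to functions mentioned above can be sketched like this. All names (rhs_fn, lorenz_dx, lorenz_dy, lorenz_dz, rhs_table, eval_rhs) are hypothetical, and the three functions simply reuse the Lorenz right-hand sides from the question with s = 10 and b = 8/3.

```c
#include <math.h>

/* Common signature shared by every equation (hypothetical typedef). */
typedef double (*rhs_fn)(double r, const double y[3]);

static double lorenz_dx(double r, const double y[3])
{
    (void)r;                                /* r is unused in this equation */
    return 10.0 * (y[1] - y[0]);
}

static double lorenz_dy(double r, const double y[3])
{
    return y[0] * (r - y[2]) - y[1];
}

static double lorenz_dz(double r, const double y[3])
{
    (void)r;
    return y[0] * y[1] - (8.0 / 3.0) * y[2];
}

/* The array of function pointers, indexed by equation number. */
static const rhs_fn rhs_table[3] = { lorenz_dx, lorenz_dy, lorenz_dz };

/* Evaluate equation j; returns NAN for an index outside the table. */
double eval_rhs(int j, double r, const double y[3])
{
    if (j < 0 || j >= 3) {
        return NAN;
    }
    return rhs_table[j](r, y);
}
```

The explicit range check makes the failure case visible, in contrast to falling off the end of the function without a return value.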
